
- Permanent Link:
- http://ufdc.ufl.edu/AA00003326/00001
## Material Information
- Title:
- Association algebra: a mathematical foundation for object-oriented databases
- Creator:
- Guo, Mingsen, 1947-
- Publication Date:
- 1990
- Language:
- English
- Physical Description:
- viii, 159 leaves : ill. ; 29 cm.
## Subjects
- Subjects / Keywords:
- Algebra ( jstor )
- Data models ( jstor )
- Database design ( jstor )
- Databases ( jstor )
- Departmental majors ( jstor )
- Distributivity ( jstor )
- Mathematics ( jstor )
- Query languages ( jstor )
- Relational database models ( jstor )
- Undergraduate students ( jstor )
- Genre:
- bibliography ( marcgt )
- theses ( marcgt )
- non-fiction ( marcgt )
## Notes
- Thesis:
- Thesis (Ph. D.)--University of Florida, 1990.
- Bibliography:
- Includes bibliographical references (leaves 135-140).
- General Note:
- Typescript.
- General Note:
- Vita.
- Statement of Responsibility:
- by Mingsen Guo.
## Record Information
- Source Institution:
- University of Florida
- Holding Location:
- University of Florida
- Rights Management:
- Copyright [name of dissertation author]. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
- Resource Identifier:
- 025013339 ( ALEPH )
- AHR3687 ( NOTIS )
- 24160849 ( OCLC )
Full Text |

ASSOCIATION ALGEBRA: A MATHEMATICAL FOUNDATION FOR OBJECT-ORIENTED DATABASES

By MINGSEN GUO

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA 1990

Copyright 1990 by Mingsen Guo

Dedicated to my dear wife Zhu (Susie) and lovely daughter Jialan. And to our parents, Jingcheng Guo and Ruiying Zhang, Shuyan Huang and Chuanxiang Chen: this was their dream before it was mine.

ACKNOWLEDGEMENTS

I would like to express my sincere appreciation to Dr. Stanley Su, chairman of my supervisory committee, for giving me the opportunity to work on this interesting and important topic in the area of object-oriented database systems. Without his patient guidance and continuous support, this work could not have been completed. I am grateful to Dr. Herman Lam, cochairman of my supervisory committee, for his thought-provoking suggestions on this work. I thank Dr. Sham Navathe for his comments and his personal library. I thank Dr. Randy Chow for his encouragement throughout my graduate study. I would like to thank Dr. John Staudhammer for his time and for being on my supervisory committee. My special thanks go to Sharon Grant, the secretary of the Database Systems Research and Development Center, whose help has always been friendly and timely.

This research was supported by the National Science Foundation (DMC-8814989) and the National Institute of Standards and Technology (60NANB4D0017). The development effort is supported by the Florida High Technology and Industrial Council (UPN88092237).

TABLE OF CONTENTS

- ACKNOWLEDGMENTS
- ABSTRACT
- CHAPTER
- 1 INTRODUCTION
- 2 A SURVEY OF RELATED WORK
  - 2.1 Relational Model and Relational Algebra
  - 2.2 Existing O-O Query Languages
  - 2.3 ENCORE O-O Data Model and Its Underlying Query Algebra
- 3 OVERVIEW OF O-O DATABASES AND ASSOCIATION-BASED QUERY FORMULATION
  - 3.1 Overview of O-O Databases
  - 3.2 Pattern-based Query Formulation
  - 3.3 Conclusion
- 4 ASSOCIATION ALGEBRA
  - 4.1 Definitions
  - 4.2 Relationship Between Two Patterns
  - 4.3 Association Operators
  - 4.4 Query Examples
- 5 MATHEMATICAL PROPERTIES OF OPERATORS AND THEIR APPLICATIONS IN QUERY OPTIMIZATION AND QUERY DECOMPOSITION
  - 5.1 Conventional Algebraic Properties
  - 5.2 Nesting of Two Unary Operators
  - 5.3 Nesting of Binary Operator in Unary Operator
  - 5.4 Cascading of Two Binary Operators
  - 5.5 General Identities
  - 5.6 Transformation of Operators
  - 5.7 Applications in Query Optimization and Decomposition
- 6 COMPLETENESS OF THE A-ALGEBRA
- 7 CONCLUSION
- REFERENCES
- APPENDIX
- BIOGRAPHICAL SKETCH

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

ASSOCIATION ALGEBRA: A MATHEMATICAL FOUNDATION FOR OBJECT-ORIENTED DATABASES

By Mingsen Guo

December 1990

Chairman: Dr. Stanley Y.W. Su

Major Department: Electrical Engineering

Existing O-O DBMSs lack a solid mathematical foundation for the manipulation of O-O databases, the optimization of queries, and the design and selection of storage structures for supporting O-O database manipulations. An association algebra (A-algebra) is proposed as a mathematical foundation for processing O-O databases, analogous to the use of the relational algebra for processing relational databases. In this algebra, objects and their associations in an O-O database are uniformly represented by association patterns, which are manipulated by a number of operators to produce other association patterns. Unlike the relational algebra, in which set operations operate on relations with union-compatible structures, the A-algebra operators can operate on association patterns of both homogeneous and heterogeneous structures.
Unlike traditional record-based relational processing, the A-algebra allows very complex patterns of object associations to be manipulated directly. Pattern-based query formulation and the A-algebra operators are described. Some mathematical properties of the algebraic operators are presented together with their application in query decomposition and optimization. The completeness of the A-algebra is also defined and proven. The A-algebra has been used as the basis for the design and implementation of an object-oriented query language, OQL, which is the query language used in a prototype Knowledge Base Management System, OSAM*.KBMS.

CHAPTER 1

INTRODUCTION

In the past two decades, techniques of data modeling have gone through two major conceptual changes. First, in the early 1970s, E. F. Codd observed that future database systems should allow application programs and terminal users to remain unaffected by changes made to the internal data representation (or the storage structure) of a database. He introduced the relational data model [COD70] and proposed the relational algebra and relational calculus [COD72a] as the mathematical foundation for processing relational databases. The relational model provides two levels of data independence in a three-level architecture for a database management system, as shown in Figure 1.1 (figures of each chapter are placed at the end of the chapter). At the lower level, physical data independence is provided: the logical representation of a relational database is a set of relations (i.e., flat tables), which is independent of the physical (data and storage) structures in which data are stored. At the higher level, logical data independence is provided: the external view remains unchanged when the logical view of a database is modified (note that the external view remains unchanged only for some schema modifications).
Besides simple logical representation and data independence, the fact that the relational model has a solid mathematical foundation is very important and has contributed to the success of the model and the existing relational database management systems.

However, the relational model and relational systems have some limitations. For example, the model captures rather limited structural properties of real-world entities or objects. The construct of aggregation hierarchy, which models complex objects, and the construct of generalization, which models the superclass-subclass relationship, are not provided. In the relational model, data which describe a complex object are scattered among a number of normalized relations, and accessing that data involves time-consuming traversal and assembly of data stored in multiple relations. The model also does not allow behavioral properties of entities/objects to be explicitly defined.

The second conceptual change of data modeling techniques occurred in the early 1980s. The object-oriented paradigm, first introduced in the programming language SIMULA [DAH67] and made very popular through the language SMALLTALK [GOL81], allows richer structural constructs and behavioral properties of objects to be specified at the logical level independent of their physical implementations. Several features of the paradigm, such as abstract data types, inheritance, encapsulation, information hiding, and polymorphism, have been shown to be useful for data modeling and system development. The object encapsulation concept adds a level of data independence between the physical and the logical independence introduced in the relational model, as depicted in Figure 1.2. It requires that the structural and behavioral properties of an object be (logically) encapsulated in its class in the conceptual view of an O-O database.
Since then, a number of Object-Oriented (O-O) and semantic data models have been proposed [HAM81, BAT84, KIN84, ZAN85a, ZAN85b, DAD86, MAI86, MAN86, SU86, ZDO86, WOE86, BAN87, FIS87, HOR87, HUL87, KIM87, ROW87, CAR88, COL89, SU89], which offer more powerful constructs for modeling the structural and behavioral properties of objects found in advanced applications such as CAD/CAM, CASE, and decision support systems.

An O-O semantic data model can be structurally and/or behaviorally object-oriented [DIT86]. A structurally O-O data model is one that encompasses at least the following characteristics:

(1) It supports the unique identification of objects, that is, each object has a unique object identifier (surrogate) which is valid for the lifetime of the object.

(2) It categorizes those objects which can be described by the same set of characteristics (attributes) into an object class.

(3) It allows aggregation (association) hierarchies to be defined.

(4) It allows generalization (association) hierarchies to be defined.

The O-O view of an application world is represented in the form of a network of classes and associations. An object class can be either a primitive class, whose instances are of simple data types (e.g., string, integer), or a nonprimitive class (e.g., Part, Student, Teacher). At the extensional level, instances of different classes can be related (associated) with each other, forming patterns of object associations. A behaviorally object-oriented data model, on the other hand, is one in which operations that describe the behavior of the objects of a class can be defined and registered with that class. Programs or methods that implement the operations defined for an object are transparent to the user of the objects.

For these models to be truly useful, they must provide object manipulation languages which can take advantage of the expressive power of the models and provide the users with simple and powerful querying facilities.
Recently, several query languages such as DAPLEX [SHI81], GEM [ZAN83, TSU84], ARIEL [MAC85], FAD [BAN87], POSTQUEL [ROW87], EXCESS [CAR88], and others reported in [DAD86, MAN86, SER86, BAN87, FIS87, BAN88, COL89, SHA90] have been proposed. These languages were developed based on different paradigms. For example, DAPLEX and the query language of [MAN86] are based on the functional paradigm. The query language of [BAN88] is based on the message passing paradigm. Other query languages are based on the relational paradigm: an extension of QUEL [ROW87, CAR88]; an extension of SQL [DAD86]; and an extension of the relational algebra [COL89]. The query language of [FIS87] is based on both functional and relational paradigms, allowing functions to be used in object-oriented SQL (OSQL) constructs.

The above languages have an O-O flavor and have taken significant steps towards the development of a powerful O-O query language. Query languages such as DAPLEX [SHI81], GEM [ZAN83], ARIEL [MAC85], and the object-oriented query language described in [BAN88] are based on the view of a database defined in terms of objects, object classes, and their associations. A query in these languages is formulated by specifying one class in the schema (usually a nonprimitive class, whose instances are real-world objects) as a central class with some path expressions. Each path expression starts from the central class and ends at another class (usually a primitive class, whose instances are of basic data types such as integer, string, set, etc.). A restriction condition can be specified on the class referenced at the end of a path expression. This class can also be specified in the list of attributes to be retrieved. The result of a query is a set of tuples, each of which corresponds to a single instance of the central class and contains values related to that instance which are collected from the classes specified in the list.
A major drawback of these query languages is that they do not maintain the closure property [ALA89b]. A query language is said to be closed if the result of a query can be further queried by other queries specified in the same language. In the above mentioned languages, the input to a query has an O-O representation (i.e., a network of objects, classes, and their associations) whereas its output is a relation which does not have the same structural and behavioral properties as the original objects. Consequently, the result of a query cannot be further processed by the same set of operators. The design of these languages is very much influenced by the relational model and relational languages, which are concerned mainly with retrieval and storage operations. In O-O processing, objects in different classes that satisfy some search conditions are subject to different user-defined operations. The idea of collecting data to form a resulting relation does not satisfy this processing model.

The query languages proposed in [DAD86, MAN86, BAN87, ROW87, CAR88, COL89] use nested relations as their logical views of O-O databases. Although these languages are closed, i.e., operators in these languages operate on nested relations to produce nested relations, the nested relation is not a proper logical representation for an O-O database, which is basically a network structure of object associations. Mapping from a network representation to nested relations is an additional process. Furthermore, in order to use nested relations to represent complex network structures, a considerable amount of data has to be introduced to relate these nested relations. It is our view that the query language and its underlying algebra should directly support the manipulation of network structures.

A query algebra [SHA90] was proposed recently based on the O-O model ENCORE [ELM89].
Although ENCORE models applications as networks of objects, object types, and their associations, the domain of the algebra is defined as sets of objects of the Tuple type, which is essentially the nested relation representation since it allows the nesting of tuples. Therefore, the mapping problem addressed above still remains. In this algebra, two identical queries or two identical operations in a single query do not give the same response, since each produces a new object in the database. To eliminate duplicated copies of the same newly created object, the algebra introduces operations like DupEliminate and Coalesce, which would not have been necessary if the algebra were to directly support the network-structured processing of O-O databases. We further observe that the union operation in this algebra may produce a collection of objects having the same data type but with different structures (e.g., the union of two collections of objects of the Tuple type with different arities). Nevertheless, the other operators introduced in the algebra are not defined to operate on collections of objects with heterogeneous structures.

A common limitation of many existing query languages is that they cannot easily express "non-association" relationships between objects, i.e., identify objects in two classes that are not associated with each other while their classes are. For example, in an O-O database, let us assume that Suppliers s1 and s2 supply Parts p1 and p2, respectively. GEM, POSTQUEL, and several other query languages provide the "dot" construct (Suppliers.Parts), and ARIEL provides the "of" construct (Parts of Suppliers), to navigate from the class Suppliers to the class Parts to produce the object pairs (s1, p1) and (s2, p2). However, they do not have a language construct for specifying the semantics that s1 does not supply p2 and s2 does not supply p1.
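The supplier and part example can be made concrete with a minimal Python sketch. It is only an illustration of the distinction between associated and non-associated object pairs, not the formal NonAssociate operator of the A-algebra; the function names `associated` and `non_associated` and the set-of-pairs encoding are assumptions made for this example.

```python
# Hypothetical extension of two classes and the association between them:
# s1 supplies p1 and s2 supplies p2, as in the example in the text.
suppliers = {"s1", "s2"}
parts = {"p1", "p2"}
supplies = {("s1", "p1"), ("s2", "p2")}


def associated(left, right, links):
    """Pairs reachable by navigating the association (what the 'dot' or 'of' constructs yield)."""
    return {(a, b) for a in left for b in right if (a, b) in links}


def non_associated(left, right, links):
    """Pairs of objects whose classes are associated but which are not linked themselves."""
    return {(a, b) for a in left for b in right if (a, b) not in links}


print(sorted(associated(suppliers, parts, supplies)))      # [('s1', 'p1'), ('s2', 'p2')]
print(sorted(non_associated(suppliers, parts, supplies)))  # [('s1', 'p2'), ('s2', 'p1')]
```

Path navigation alone produces only the first set; the second set is the information that the surveyed languages have no direct construct to request.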
Similarly, in functional languages, only the function Parts(Suppliers) is provided to specify the associations (s1, p1) and (s2, p2), but not the non-association of suppliers and parts.

In view of the disadvantages of the existing O-O query languages, we would like to stress the importance of using a graph as the logical representation of an O-O database at both the intensional and extensional levels, as exemplified by O2 [LEC88], FAD [BAN87], and OSAM* [SU89]. The query language and its underlying algebra should provide constructs to directly process graphs with different degrees of complexity. They should also support the specification of non-associations and the processing of heterogeneous structures. Furthermore, the closure property should be maintained.

In this dissertation, we propose an association algebra (A-algebra) based on the graph representation of O-O databases and the association-based query formulation (refer to Chapter 3). Analogous to the development of the relational algebra for relational databases, the development of the A-algebra provides the formal foundation for query processing and optimization in O-O databases and for designing O-O query languages. Unlike the record (tuple)-based relational algebra [COD70 and COD72] and the query algebra [SHA90], the A-algebra is association-based, i.e., the domain of the algebra is sets of association patterns (e.g., linear structures, trees, lattices, networks, etc.), and processing an O-O database is based on the matching and manipulation of homogeneous as well as heterogeneous patterns of object associations. Operators of the A-algebra can be used to navigate a network of interconnected object classes along the path of interest to construct a complex pattern as the search condition. They can also be used to decompose a complicated pattern into simple ones.
Ten operators have been defined for the algebra: three unary operators [A-Select (σ), A-Project (Π), and A-Integrate (f)], and seven binary operators [Associate (*), A-Complement (|), A-Union (+), A-Difference (-), A-Divide (÷), NonAssociate (!), and A-Intersect (•)], where the prefix A stands for "Association". Although many of these operators correspond to the relational algebra operators, they differ from them in that they can operate on complicated heterogeneous structures. In this respect, the A-algebra is more general than the relational algebra.

The rest of this dissertation is organized as follows. A detailed survey of the relational model and the relational algebra, the existing O-O query languages, and a recently proposed query algebra is provided in Chapter 2. The graphical representation of O-O databases and the association-based query formulation are described in Chapter 3 with the help of examples. Chapter 4 formally defines the concepts of Schema Graph (SG), Object Graph (OG), and association patterns. The formal definitions of the association operators and their simple mathematical properties are also presented. The A-algebra expressions for some example queries are given to demonstrate the utility of the algebra. Chapter 5 presents the mathematical properties of the association operators and their utility in query optimization and query decomposition. The proofs of the mathematical properties of the operators can be found in the Appendix. The completeness of the A-algebra is shown in Chapter 6, and the conclusion is given in Chapter 7.

Figure 1.1 Data independencies in relational databases (figure labels: logical data independence, physical data independence)

Figure 1.2 Architecture of O-O databases (figure labels: logical data independence, encapsulation, physical data independence)

CHAPTER 2

A SURVEY OF RELATED RESEARCH

This chapter surveys some of the existing work related to the development of the A-algebra.
Section 2.1 describes the relational model and the relational algebra, while Section 2.2 surveys some existing query languages designed for O-O semantic data models. A query algebra that recently appeared in the literature is surveyed in Section 2.3.

2.1 Relational Model and Relational Algebra

When the hierarchical and network data models were used extensively in information systems in the late 1960s, Codd [COD70] raised an interesting and important question: Can application programs and terminal activities remain invariant as the internal data representations (physical representations) change? He asserted that the future users of large data banks must be protected from having to know how the data were organized in the machine. Following this rationale, he conceived the notion of data independence, which suggests that the logical organization of data should be independent of its physical representation. Determined to demonstrate the validity of his data independence concept, he proposed a relational data model based on n-ary relations.

The scheme of a relation, R, of an entity set {E1, E2, ..., En} is defined on a set of m attributes {A1, A2, ..., Am} which correspond to m domains {D1, D2, ..., Dm} (not necessarily distinct). Each entity (an instance of the scheme) is represented by an m-ary tuple which has its first attribute value from D1, its second attribute value from D2, and so forth. A set of attributes of a relation is called a key if the entities of the relation can be uniquely identified by the values of these attributes. For example, the information about suppliers, such as their names, addresses, the items they supply, and the prices of the items, can be represented by the relation SUPPLIERS with the following scheme

SUPPLIERS(SNAME, ADDRESS, ITEM, PRICE)

where the attributes SNAME and ITEM form a composite key. Data represented in this form, which intuitively is a flat table, is the logical view of an application world.
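As a rough illustration of the key definition, the following Python sketch stores a few SUPPLIERS tuples and tests whether a set of attributes identifies every tuple uniquely. The sample rows and the helper name `is_key` are assumptions introduced here; note also that a key is a property of the scheme, so a check on one instance can refute a candidate key but cannot prove it.

```python
# Attribute order of the SUPPLIERS scheme given in the text.
SCHEME = ("SNAME", "ADDRESS", "ITEM", "PRICE")

# Hypothetical sample tuples; the values are illustrative only.
SUPPLIERS = {
    ("Acme", "12 Main St", "piston", 40),
    ("Acme", "12 Main St", "valve", 15),
    ("Bolt Co", "9 Elm Rd", "piston", 42),
}


def is_key(relation, scheme, attrs):
    """Return True if the values of `attrs` are unique across the tuples of this instance."""
    positions = [scheme.index(a) for a in attrs]
    seen = set()
    for row in relation:
        key_value = tuple(row[i] for i in positions)
        if key_value in seen:
            return False  # two tuples share the same values for attrs
        seen.add(key_value)
    return True


print(is_key(SUPPLIERS, SCHEME, ["SNAME", "ITEM"]))  # True
print(is_key(SUPPLIERS, SCHEME, ["SNAME"]))          # False
```

SNAME alone fails because the hypothetical supplier "Acme" appears once per item supplied, which is why the composite key (SNAME, ITEM) is needed.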
It has nothing to do with the physical representation of the data.

When designing a database using the relational model, one is often faced with a choice among alternative sets of relation schemes. Some choices are more favorable than others for various reasons. For example, the relation SUPPLIERS is not a desirable scheme because it has the following potential problems:

(1) Redundancy: the address of the supplier is repeated once for each item supplied.

(2) Potential inconsistency (update anomalies): as a consequence of the redundancy, the update of the address of a supplier in one tuple will leave it inconsistent with the address in another tuple.

(3) Insertion anomalies: the address of a supplier cannot be recorded if that supplier does not currently supply at least one item, since SNAME and ITEM form a composite key of the relation SUPPLIERS.

(4) Deletion anomalies: the inverse of problem (3) is that, should all the items supplied by one supplier be deleted, we unintentionally lose the address of that supplier.

The causes of these problems and their solutions are related to the functional dependencies among the attributes of a relation [COD70, ULL82]. Suppose X and Y are two sets of attributes of a relation. Y functionally depends on X (or X functionally determines Y), denoted by X → Y, if any two tuples of the relation having the same values in the attributes of X also agree on the values of the attributes in Y. The above four problems emerge if X → Y and X′ → Z hold simultaneously, where X′ stands for a proper subset of X and Z for a set of attributes of the relation.

The solution to these problems is to decompose a relation based on the functional dependencies among its attributes. For example, the functional dependencies among the attributes of the relation SUPPLIERS are (SNAME, ITEM) → PRICE and SNAME → ADDRESS, which give rise to the redundancy, update, insertion, and deletion anomalies.
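The definition of a functional dependency translates directly into a check over one relation instance. The Python sketch below is a hedged illustration: the sample tuples, the helper name `holds`, and the simulated address update are assumptions, and the attribute is called ADDRESS as in the SUPPLIERS scheme.

```python
SCHEME = ("SNAME", "ADDRESS", "ITEM", "PRICE")

# Hypothetical sample tuples; the values are illustrative only.
SUPPLIERS = {
    ("Acme", "12 Main St", "piston", 40),
    ("Acme", "12 Main St", "valve", 15),
    ("Bolt Co", "9 Elm Rd", "piston", 42),
}


def holds(relation, scheme, lhs, rhs):
    """Check X -> Y on one instance: tuples that agree on X must agree on Y."""
    x_pos = [scheme.index(a) for a in lhs]
    y_pos = [scheme.index(a) for a in rhs]
    first_seen = {}
    for row in relation:
        x = tuple(row[i] for i in x_pos)
        y = tuple(row[i] for i in y_pos)
        # setdefault stores y for a new x and returns the stored value otherwise.
        if first_seen.setdefault(x, y) != y:
            return False
    return True


print(holds(SUPPLIERS, SCHEME, ["SNAME", "ITEM"], ["PRICE"]))  # True
print(holds(SUPPLIERS, SCHEME, ["SNAME"], ["ADDRESS"]))        # True

# Update anomaly: changing the address in only one of Acme's tuples
# leaves the relation inconsistent with SNAME -> ADDRESS.
updated = (SUPPLIERS - {("Acme", "12 Main St", "valve", 15)}) | {
    ("Acme", "77 New Ave", "valve", 15)
}
print(holds(updated, SCHEME, ["SNAME"], ["ADDRESS"]))          # False
```

Here SNAME is a proper subset of the key (SNAME, ITEM) and determines ADDRESS, which is exactly the situation X → Y together with X′ → Z described above.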
It should be clear to the reader that these problems are eliminated if the relation SUPPLIERS is decomposed into two relations, SA(SNAME, ADDRESS) and SIP(SNAME, ITEM, PRICE). There is, however, a disadvantage to this decomposition: to find the address of a supplier who supplies the item "piston", a join operation has to be applied, since ADDRESS and ITEM are logically distributed over two relations.

The decomposition of a relation based on the functional dependencies among its attributes is a novel issue of normalization in the relational model. Four types of normal forms, denoted by 1NF, 2NF, 3NF, and Boyce-Codd-NF, respectively, have been recognized in considering functional dependencies [COD70, ARM74, and BEE77]. The Boyce-Codd-NF is the strongest of these normal forms. Relations in these normal forms may have to be further decomposed into 4NF or 5NF to eliminate multivalued dependencies [FAG77, DEL78, and ZAN76] and join dependencies [AHO79]. This decomposition is needed to eliminate further redundancy and anomalies.

The success and popularity of the relational model and the relational database management systems (DBMSs) are due to the simplicity of the structural (tabular) representation and to the sound theoretical basis of the model: the relational algebra and the relational calculus [COD72a]. The relational algebra defines five primitive operators, of which two are unary operators [Projection (Π) and Selection (σ)] and three are binary operators [Cross-product (×), Union (+), and Difference (-)]. Other operators such as Join, Natural-join, Set-intersection, and Set-division are also defined in the algebra. Although these latter operators are easy to use, they are not primitive, since they can be expressed in terms of the primitive operators.

The relational algebra has the closure property, since every operator operates on one or more relations and produces a new relation.
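To make the role of the primitive operators and of closure more tangible, here is a small Python sketch. It assumes a toy `Relation` class (hypothetical, not part of any DBMS or of the dissertation) with the five primitives, derives set intersection and a join from them, and answers the "piston" query over the decomposed relations SA and SIP. Qualified attribute names such as `SA.SNAME` are an assumption used to keep the cross-product attributes distinct.

```python
from itertools import product


class Relation:
    """A toy relation: a tuple of attribute names plus a set of rows."""

    def __init__(self, attrs, rows):
        self.attrs = tuple(attrs)
        self.rows = frozenset(tuple(r) for r in rows)

    def select(self, predicate):
        # Selection: keep the rows for which the predicate holds.
        return Relation(self.attrs,
                        (r for r in self.rows if predicate(dict(zip(self.attrs, r)))))

    def project(self, attrs):
        # Projection: keep the named columns; duplicates vanish in the set.
        idx = [self.attrs.index(a) for a in attrs]
        return Relation(attrs, (tuple(r[i] for i in idx) for r in self.rows))

    def cross(self, other):
        # Cross-product: attribute names are assumed to be disjoint.
        if set(self.attrs) & set(other.attrs):
            raise ValueError("attribute names must be disjoint")
        return Relation(self.attrs + other.attrs,
                        (a + b for a, b in product(self.rows, other.rows)))

    def union(self, other):
        # Union: operands must be union-compatible (same attributes here).
        if self.attrs != other.attrs:
            raise ValueError("relations are not union-compatible")
        return Relation(self.attrs, self.rows | other.rows)

    def difference(self, other):
        if self.attrs != other.attrs:
            raise ValueError("relations are not union-compatible")
        return Relation(self.attrs, self.rows - other.rows)


def intersection(r, s):
    """Derived operator: R intersect S = R - (R - S)."""
    return r.difference(r.difference(s))


# Hypothetical contents of the decomposed relations SA and SIP.
SA = Relation(("SA.SNAME", "ADDRESS"),
              [("Acme", "12 Main St"), ("Bolt Co", "9 Elm Rd")])
SIP = Relation(("SIP.SNAME", "ITEM", "PRICE"),
               [("Acme", "piston", 40), ("Acme", "valve", 15), ("Bolt Co", "valve", 14)])

# Join expressed with primitives: cross-product, then selection, then projection.
result = (SA.cross(SIP)
            .select(lambda t: t["SA.SNAME"] == t["SIP.SNAME"] and t["ITEM"] == "piston")
            .project(["ADDRESS"]))
print(sorted(result.rows))  # [('12 Main St',)]
```

Every method returns another `Relation`, so operators can be chained freely; that chaining is the closure property in miniature.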
Operators of the relational algebra basically operate on the values of tuples in relations. Structurally speaking, they are defined to operate on tuples whose structures are union-compatible (homogeneous). The relational algebra is complete in the sense that it has expressive power equivalent to that of the relational calculus [COD72a and ULL82]. Because of this, it serves as the theoretical basis for the relational model. The relational algebra has been used for the following three purposes, although it has not been implemented in any existing DBMS exactly as defined [ULL82].

(1) It creates a new class of query languages called algebraic languages. Based on the relational algebra, languages that directly adopt the relational operators can be developed, such as ISBL [TOD76], which is a close approximation to the relational algebra. Although languages of this type are mostly procedural, it is relatively easy to demonstrate their completeness, and the mathematical properties of the relational algebra can be readily applied to query optimization and query decomposition.

(2) It serves not only as a benchmark for evaluating query languages in existing systems, but also as the criterion for designing new languages for relational DBMSs. A relational language will not have the necessary expressive power if it is not relationally complete [ULL82].

(3) It provides a mathematical basis for transforming expressions in query decomposition and (logical or conceptual) query optimization. Because it is an algebra, the mathematical properties of the relational algebra can be explored precisely and systematically. For query languages construed as algebraic languages, these mathematical properties have a straightforward application [HAL76]. Query languages like SQUARE or SEQUEL, which have certain algebraic features, may also use these properties, since the parse of a query yields a tree in which some nodes represent relational algebra operators [AST76].
Even if a query language such as QUEL is a relational calculus language, its calculus-like expressions are translated into relational algebra expressions in the QUEL optimizer [WON76].

The total content proposed by Codd before 1979 on the relational model is referred to as Version 1 of the relational model (RM/V1), whose modeling capabilities were extended by Codd in 1979 [COD79] to version RM/T (T for Tasmania). Based on these two versions, Codd [COD90] introduces Version 2 of the relational model (RM/V2). The most important additional features in RM/V2 are as follows:

(1) A new treatment of items of data missing because they represent properties that happen to be inapplicable to certain object instances.

(2) New features supporting all kinds of integrity constraints, especially the user-defined integrity constraints.

(3) A more detailed account of view updatability.

(4) New features pertaining to the management of distributed databases.

It is important to recognize the fact that the hierarchical and network models as well as the relational model evolved during a time in which the primary applications of information systems were business-oriented. In attempts to apply these techniques to more complicated application areas such as CAD/CAM, CASE, and decision support, it has been found that the relational model is no longer adequate for modeling these advanced applications. The inadequacies of the relational model are summarized as follows. First, the relational model has limited modeling capabilities. When data are logically represented in the form of relations, the relationships among entities in these relations are represented by matching values of the attributes or keys in one relation with values of the attributes or foreign keys in other relations. The actual semantics among the data, such as generalization and aggregation (the abstract data type), cannot be modeled by the relational model.
Second, the relational model only models the structural aspects of entities and thus ignores their behavioral aspects (e.g., system-defined and user-defined operations). Third, in these advanced applications, the concept of data independence should be further extended to the concept of object encapsulation, i.e., not only should the logical representation of an object be separated from its physical representation, but its structural and behavioral properties should be logically encapsulated in its class. The object encapsulation concept cannot be realized in the relational model, since the data describing an entity may be logically scattered among several relations due to normalization [COD70, COD72b, BEE77, and ULL82]. Fourth, entities with complex structures and complicated relationships among entities are not representable by flat tables (relations). Finally, the relational model cannot represent and operate on entities with different (heterogeneous) structures.

2.2 Existing O-O Query Languages

An extensive literature search has been carried out on query languages for accessing O-O databases, such as GEM [ZAN83, TSU84], ARIEL [MAC85], DAPLEX [SHI81], FAD [BAN87], POSTQUEL [ROW87], and EXCESS [CAR88], as well as other proposed languages [STO84, DAD86, MAN86, SER86, BAN87, FIS87, BAN88, COL89, SHA90]. This section surveys a representative sample of these languages. Most existing query languages have capabilities beyond those provided by their theoretical bases. For example, the arithmetic operations and aggregation functions provided by the relational languages are not available in the relational algebra. Therefore, this survey is limited to those features which are relevant to the proposed algebra. To demonstrate the similarities and differences of these languages, the same database schema, shown in Figure 2.1, is used for the example queries written in GEM, ARIEL, and DAPLEX.
The sample schema of Figure 2.1 is for a government-owned laboratory system, where rectangles represent classes and edges (links) represent attributes.

QUEL [STO76, WON76, ZOO77] is a tuple-calculus-oriented query language for the relational DBMS INGRES [STO76]. To avoid the ambiguity that arises when two attributes of different relations with the same name are addressed in a single query, QUEL uses a "dot" mechanism to qualify an attribute of a relation (i.e., a dot is inserted between the name of the relation and the name of the attribute). For example, Equipment.Name refers to the attribute Name of the relation Equipment. Influenced by this mechanism, the existing O-O query languages use similar notations for navigating the database schema from one class to another, or from one relation to other relations in systems which use relational databases as their back-ends.

The language GEM [ZAN83, TSU84] is an extension of QUEL for the data model DSIS, which supports aggregation, generalization, and unique identification of objects. In GEM, a class in an aggregation hierarchy that has a link emanating to another class has the name of the latter class as the data type of one of its attributes. For example, the class Lab has an attribute, Facility, of the type Equipment, and has another attribute, Locality, of the type Location, and so forth. The dot notation is used in GEM for navigating along the reference attributes (links) in query formulation. The following GEM query retrieves the name of the manager, the serial number of the equipment, and the address for each laboratory whose headquarters is located in New York.

    Range of Lab is Lab
    Retrieve Lab.Manager.Name
             Lab.Equipment.Serial#
             Lab.Location.Address
    Where Lab.Manager.Department.Headquarters.City = "New York"

This query returns a set of tuples in a tabular form. Each tuple contains values for the manager's name, the equipment serial number, and the address of the laboratory of interest.
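The way a dot-notation query navigates reference attributes and then flattens the result into rows can be sketched in Python. This is only an illustration under stated assumptions: the helper names `follow` and `retrieve`, the dictionary encoding of objects, and all instance data are hypothetical and are not taken from GEM or from Figure 2.1.

```python
from itertools import product

def follow(obj, path):
    """Return every value reached from obj by following a dotted attribute path."""
    frontier = [obj]
    for attr in path.split("."):
        reached = []
        for current in frontier:
            value = current.get(attr, [])
            # A reference attribute may point at one object or at several.
            reached.extend(value if isinstance(value, list) else [value])
        frontier = reached
    return frontier

def retrieve(labs, target_paths, where_path, where_value):
    """Evaluate a GEM-like Retrieve ... Where ... query and return flat rows."""
    rows = []
    for lab in labs:
        if where_value in follow(lab, where_path):
            columns = [follow(lab, path) for path in target_paths]
            rows.extend(product(*columns))  # flattening repeats shared values
    return rows

# Hypothetical instance data (not taken from Figure 2.1).
new_york = {"City": "New York"}
miami = {"City": "Miami"}
lab_a = {
    "Manager": {"Name": "Ada", "Department": {"Headquarters": new_york}},
    "Equipment": [{"Serial#": "E-1"}, {"Serial#": "E-2"}],
    "Location": {"Address": "1 Main St"},
}
lab_b = {
    "Manager": {"Name": "Bob", "Department": {"Headquarters": miami}},
    "Equipment": [{"Serial#": "E-3"}],
    "Location": {"Address": "9 Bay Rd"},
}

rows = retrieve(
    [lab_a, lab_b],
    ["Manager.Name", "Equipment.Serial#", "Location.Address"],
    "Manager.Department.Headquarters.City",
    "New York",
)
for row in rows:
    print(row)
```

In this sketch the rows are plain tuples of values: the manager name is repeated once per piece of equipment, and the links between the underlying objects are no longer present in the result.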
In the approach described in Stonebraker et al. [STO84], the dot notation is used in a manner similar to that found in GEM to implement the abstract data type (ADT) concept. In addition, QUEL is used as a data type to facilitate the navigation from one relation to another. A relation may have a field of type QUEL which may contain expressions or commands (queries). Whenever the field is addressed in a query, these expressions, in whole or in part, will be activated. In general, if X is the tuple variable of the relation R1, Y is a field of type QUEL in relation R1, and the query stored in Y retrieves field Z of another relation, R2, then the expression X.Y.Z is a field in a collection of this view. In other words, the expression will return the values of the Z field of tuples (in R2) that are related to X through Y. For example, let the relation Manager have a field called OfficeInfo of type QUEL which contains a query that retrieves the telephone number of the relation Location. The expression Manager.OfficeInfo.Tel# returns the telephone number for each manager in a tabular format. Clearly, the implementation of QUEL as a data type provides a way to relate data in two relations without modifying the database schema.

Instead of using the dot notation, ARIEL [MAC85] takes advantage of the "OF" notation, which is linguistically more natural than the dot notation. The example query described for GEM can be restated using the "OF" notation as follows.

    Range of Lab is Lab
    Retrieve Name OF Manager OF Lab
             Serial# OF Equipment OF Lab
             Address OF Location OF Lab
    Where City OF Headquarters OF Department OF Manager OF Lab = "New York"

However, the result of this query is also represented by a flat table (relation).

DAPLEX [SHI81] is a functional data language. The data retrieval component of DAPLEX is similar to the languages described above, although it is interpreted differently.
In the functional paradigm, the class having a link (i.e., attribute) emanating to another class is considered as a function. The function has, by default, the name of the class to which the link points. For example, Location(Lab) and Department(Headquarters) represent the facts that Lab has Location and Headquarters has Department as an attribute, respectively. When the function Location(Lab) is applied to an object of the class Lab, it returns a value which is an object in the domain class over which the attribute is defined. If the navigation is from one class to another through a sequence of classes, a nested function is used. For instance, the expression Name(Manager(Lab)) specifies the name of the manager who is responsible for a laboratory. For a particular object of Lab, the manager of the laboratory is produced first; then, the function Name() is applied to the returned manager and returns the name of the manager. The example query can be expressed in DAPLEX as follows.

    FOR EACH Lab
      SUCH THAT City(Headquarters(Department(Manager(Lab)))) = "New York"
      PRINT Name(Manager(Lab)), Serial#(Equipment(Lab)), Address(Location(Lab))

Even though DAPLEX is based on the functional paradigm, it returns data in the form of a relation, just as GEM and ARIEL do.

Banerjee et al. [BAN88] introduce a query language based on message passing. In the message passing paradigm, the name of a link emanating from a class is interpreted as the name of a message which is stored within that class. One can assume there is actually a message created by the system and having, by default, the same name as its corresponding attribute. When such a message is sent to an instance of the class, it returns the value of the attribute. For example, the following is an expression for selecting a laboratory that has a manager who belongs to a subordinate department of its New York headquarters.
    (Lab SELECT :S (:S Manager Department Headquarters City = "New York"))

SELECT in this expression is a message sent to the class Lab. The first argument of SELECT is :S, an iteration variable. The SELECT message iterates over the instances of the class Lab with :S bound to one instance at a time. The block of code within the parentheses is the second argument of SELECT, and is executed for each value bound to :S. In this particular block, the message Manager is sent to the instance bound to :S in order to return the related Manager instance. Similarly, Department, Headquarters, and City are messages. To elaborate, Department is sent to the returned Manager instance, Headquarters is sent to the returned Department instance, and City is sent to the returned Headquarters instance. The sign "=" is also a message which has the argument "New York". When this message is sent to the resulting City value, it returns a logical object TRUE or FALSE. An instance of Lab qualifies for the above expression if and only if the returned logical object is TRUE. The logical AND or OR message can be sent to this object with an argument that specifies some other condition on the instance of Lab. In principle, though not described in Banerjee et al. [BAN88], similar message-based expressions can be used to retrieve attribute values of the resulting Lab instance. The result of a query which involves such conditions is the set of the instances of Lab along with their attribute values, and is represented in a tabular form.

As shown in the samples of these query languages, their query formulations, though interpreted differently, are very similar to each other. This is evident in the fact that queries are formulated by navigating the graphically represented database schema from class to class through their respective links.
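The message-passing reading of the SELECT expression can be sketched in Python as follows. The classes `Instance` and `ObjectClass`, the method names `send`, `new`, and `select`, the `Name` attribute, and the instance data are all hypothetical; the sketch assumes single-valued attributes and models the "=" message with Python's `==`.

```python
class Instance:
    """An object that answers a message named after each of its attributes."""

    def __init__(self, **attributes):
        self._attributes = attributes

    def send(self, message):
        # By assumption, the message name equals the attribute name.
        return self._attributes[message]


class ObjectClass:
    """A class object that holds its instances and understands SELECT."""

    def __init__(self, name):
        self.name = name
        self.instances = []

    def new(self, **attributes):
        instance = Instance(**attributes)
        self.instances.append(instance)
        return instance

    def select(self, block):
        # Bind the iteration variable to one instance at a time and keep
        # the instances for which the block answers True.
        return [s for s in self.instances if block(s)]


def in_new_york(s):
    """The block of the SELECT expression: a chain of messages, then '='."""
    city = (
        s.send("Manager")
        .send("Department")
        .send("Headquarters")
        .send("City")
    )
    return city == "New York"


# Hypothetical instance data.
hq_ny = Instance(City="New York")
hq_mia = Instance(City="Miami")
mgr_1 = Instance(Department=Instance(Headquarters=hq_ny))
mgr_2 = Instance(Department=Instance(Headquarters=hq_mia))

Lab = ObjectClass("Lab")
Lab.new(Name="Lab-1", Manager=mgr_1)
Lab.new(Name="Lab-2", Manager=mgr_2)

selected = Lab.select(in_new_york)
print([lab.send("Name") for lab in selected])
```

The chain of `send` calls follows the same Manager, Department, Headquarters, City path that the dot, "OF", and nested-function notations follow, which is the similarity the survey points out.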
In each of these languages, however, a query operates on a database that is structurally represented using an O-O data model and returns a result whose structure is represented in a tabular form. Consequently, the result of a query cannot be further queried by other queries written in the same language. Therefore, these languages are not closed.

Another drawback of these languages is seen in their navigation mechanisms, which can only formulate queries against classes (or relations) that are interrelated in simpler patterns like the linear and forest structures shown in Figure 2.2a. However, in O-O databases, the graphical patterns in which objects are interrelated with each other are basically networks which are not restricted to plane graphs (a graph is a plane graph if it can be drawn on a plane without any intersection of two edges). They can be as complicated as surface graphs (a graph is a surface graph if it can be drawn on a surface without any intersection of two edges). Phrasing queries against classes that are interrelated in the more complicated patterns depicted in Figure 2.2b is beyond the capabilities of these languages.

A third drawback of these languages, which renders their navigation mechanisms insufficient, is that only one type of relationship (an object is related to another object) between objects of two classes can be expressed. In fact, when two classes are directly linked at the schema level, objects in these two classes may have another type of relationship: an object is not related to another object. This type of relationship represents the complement aspect of the semantics specified for the two associated classes, such as not-a-part-of, not-a-function-of, or is-not-a, which is often needed in querying the databases. For example, "For each laboratory, list the equipment that is not available" is a reasonable query.
The proposed query languages [DAD86, MAN86, BAN87, ROW87, CAR88, COL89] use nested relations as their logical views of databases. A nested relation is a generalized relation, i.e., a recursively defined relation: the attributes of a relation can be either atomic values or another relation in which the attributes can be a third relation, and so forth. Figure 2.3 shows an example of a nested relation. Nested relations are particularly suitable for representing data in forest structures. The above languages are considered to be closed, since operators in these languages operate on nested relations and produce nested relations. However, they also have the drawbacks mentioned above, and it is our view that a nested relation is not a proper logical representation for an O-O database, which consists of networks of objects, object classes, and their associations. Using nested relations to represent data in network structures introduces one level of indirection. Mapping from a network representation to nested relations is an extra process. Furthermore, in order to use a nested relation to represent complex structures, a large amount of data has to be replicated in the representation. Figure 2.4 shows an example of using a nested relation to represent a graph having loops. Note that vertex F has to be replicated three times.

2.3 ENCORE O-O Data Model and Its Underlying Query Algebra

In spite of the popularity of the O-O paradigm and its application in the field of database management, the existing O-O database management systems still lack a solid mathematical foundation for the manipulation of an O-O database and the optimization of queries. Recently, a query algebra [SHA90] was proposed for the ENCORE O-O data model [ELM89]. This section surveys the query algebra as well as the ENCORE model. It also serves as a comparison to the association algebra proposed in this dissertation.
2.3.1 The ENCORE Model

The ENCORE O-O data model [ELM89] supports abstract data types, type inheritance, typed collections of typed objects, objects with identity, and object encapsulation. It models an application as networks of objects, object types, and their associations. The definition of an abstract data type in this model includes the Name of the type, a set of Properties defined for instances of the type, and a set of Operations which can be applied to instances of the type. Properties reflect the state of an object, while operations may perform arbitrary actions. Properties are typed objects that may be implemented as stored values, procedures, or functions. The implementation of a property is invisible to the user and is assumed to return an object of the correct type and to have no side-effects.

In addition to user-defined abstract data types and a collection of atomic types such as Int, String, Boolean, etc. (i.e., primitive-classes), ENCORE provides two parameterized types and a global Object type which is the supertype of all other types. The parameterized type Set[T] defines T as the type, or supertype, of objects in a collection having type Set, and T is called the member type of the set. The parameterized tuple type associates types ($T_i$) with attribute names ($A_i$) and defines properties Get-attribute-value and operations Set-attribute-value for each attribute. The $T_i$'s can be any database types, thus allowing nesting of tuple types. The value of a tuple is represented as $\langle (A_1, o_1), \ldots, (A_n, o_n) \rangle$, where the $A_i$'s are attributes of the tuple and the $o_i$'s are objects of the corresponding types.

The global supertype Object defines a family of operations for equality called i-equality, where i indicates how "deeply" a comparison of two objects must search before finding equality. Two objects are identical when they are the same object, i.e., they have the same identity.
Identical objects are 0-equal ($=_0$ or just $=$) and, for $i > 0$, two objects are i-equal ($=_i$) if (1) they are both collections of the same cardinality and there is a one-to-one correspondence between the collections such that corresponding members are $=_{i-1}$, or (2) they both have the same type (not a collection type) and the values of corresponding properties are $=_{i-1}$.

Type Object also defines a stronger notion of equality called id-equality. Two objects are id-equal at depth i if they are i-equal and the graphical representations of the objects are isomorphic.

2.3.2 The Underlying Query Algebra of ENCORE

The query algebra [SHA90] is proposed based on the O-O model ENCORE. The domain of the query algebra is defined as a typed collection of typed objects. A typed collection is of parameterized type Set[T] and the objects in the collection are of type T. If objects of a collection are collected from different types, T is their most specific common type in the type lattice. For example, if object s is of type S, object p is of type P, and S is a supertype of P, the collection of objects s and p is of type Set[S]. The query algebra is closed, since the operators of the query algebra operate on collection(s) of objects with type Set[$T_i$] and produce a collection with type Set[$T_k$], where type $T_k$ is defined by the query. Similar to the languages surveyed in Section 2.2, the query algebra addresses a property of an object using 'dot' notation (e.g., s.p.q, where s is an object of type $T_1$, p is a property of s and is of type $T_2$, and q is a property of p and is of type $T_3$).

Twelve operators are defined in this algebra. We give their brief definitions followed by some example queries to illustrate the major concepts of this algebra.

(1) The Select operation creates a collection of objects which satisfy a selection predicate.

    $\mathrm{Select}(S, p) = \{\, s \mid (s \in S) \wedge p(s) \,\}$

where p is the predicate.
(2) The Image operation is used to return a single object for each object in the queried collection and has the form:

    $\mathrm{Image}(S, f\colon T) = \{\, f(s) \mid s \in S \,\}$

where S is a collection of objects and f returns an object of type T.

(3) The Project operation extends Image by allowing the application of many functions to an object, thus supporting the creation and maintenance of selected relationships between objects. The relationships are stored as tuples with Tuple type.

    $\mathrm{Project}(S, \langle (A_1, f_1), \ldots, (A_n, f_n) \rangle) = \{\, \langle (A_1, f_1(s)), \ldots, (A_n, f_n(s)) \rangle \mid s \in S \,\}$

where S is of type Set[T], the $A_i$'s are unique attribute names, and each $f_i$ takes a single input of type T and returns an object of type $T_i$. Project returns one tuple for each object in the collection being queried. Each newly created tuple is a new object with a unique object identifier.

(4) The Ojoin operator is an explicit join operator used to create relationships which are not defined between objects of two collections in the database. It is essentially a Cartesian product of collections of objects, followed by a selection of result tuples. For collections S and R, the Ojoin is defined as follows:

    $\mathrm{Ojoin}(S, R, A_1, A_2, p) = \{\, \langle (A_1, s), (A_2, r) \rangle \mid s \in S \wedge r \in R \wedge p(s, r) \,\}$

where p is a predicate (as in Select) defined over objects from S and R. The Ojoin operation creates new tuples in the database to store the generated relationships. The tuples created will have unique object identifiers.

(5) Union, Difference, and Intersection are the usual set operations with object comparisons and set membership based on object identity ($=_0$). The result of these operations is considered to be a collection of objects of type T, where T is the most specific common supertype (in the type lattice) of the types of the objects in the operands.

(6) The Flatten operation is used to restructure sets of sets, and Nest and UnNest allow the representation of tuples as flat or nested relations.
(7) For the above operators, two identical operations cannot give identical responses, since each result collection is a newly identified object in the database and the objects in a result collection may be either existing database objects or new tuple objects created during the operation. Operators DupEliminate and Coalesce are introduced to handle situations where equal objects are created by a query.

The example queries are issued against the Supplier-Parts-Job database shown in Figure 2.5. For the purpose of these examples, it is assumed that Type Object is the only supertype for each of the given types.

Example 1: Find all red parts. Which suppliers can supply all of the red parts?

    P_red := Select(Parts, λp p.color = "Red")
    S_Pred := Select(Suppliers, λs P_red subset-of s.Inventory)

The first selection finds the red parts and the second selection finds all suppliers for which the inventory includes that set of parts. The subset-of operation is available since property Inventory and result P_red both have type Set[Part].

Example 2: What parts are needed by jobs in Boston?

    BosJobs := Select(Jobs, λj j.address.city = "Boston")
    BosJobParts := Project(BosJobs, λj <(J,j),(Pt,j.PartsNeeded)>)

The select operation finds the jobs in Boston and the project operation gives information about which parts are needed for each job in Boston. The result of the projection is of type Set[Tuple]. Note that operation NewPart (of type Job) cannot be applied to members of BosJobParts, since they have type Tuple. However, it is appropriate for objects BosJobParts.J.

Example 3: Find all local suppliers for each job.

    LocalS := Ojoin(Jobs, Suppliers, J, S, λj λs j.address.city = s.address.city)

This Ojoin operation produces a set of tuples of type <(J,Job),(S,Supplier)>, which is similar to a normalized relation. To get a set of suppliers for each job, a Nest operation needs to be applied: Nest(LocalS, S).
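The operator definitions and Examples 2 and 3 above can be approximated by a small Python sketch. It makes several simplifying assumptions that are not part of the query algebra: collections are Python lists rather than typed sets, objects and tuples are dictionaries, and object identity is approximated with `id()`. The lowercase function names and all instance data are hypothetical.

```python
def select(S, p):
    """Select: keep the objects of S that satisfy predicate p."""
    return [s for s in S if p(s)]

def image(S, f):
    """Image: apply one function f to every object of S."""
    return [f(s) for s in S]

def project(S, **functions):
    """Project: build one new tuple (dict) per object, one attribute per function."""
    return [{name: f(s) for name, f in functions.items()} for s in S]

def ojoin(S, R, a, b, p):
    """Ojoin: pair every s with every r, keep the pairs satisfying p(s, r)."""
    return [{a: s, b: r} for s in S for r in R if p(s, r)]

def nest(tuples, attr):
    """Nest: group tuples that agree (by identity) on all attributes except attr."""
    groups = {}
    for t in tuples:
        rest = {k: v for k, v in t.items() if k != attr}
        key = tuple(sorted((k, id(v)) for k, v in rest.items()))
        groups.setdefault(key, (rest, []))[1].append(t[attr])
    return [{**rest, attr: members} for rest, members in groups.values()]

# Hypothetical instance data, loosely shaped after the Job and Supplier types.
jobs = [
    {"Num": "j1", "Address": {"City": "Boston"}, "PartsNeeded": ["p1", "p2"]},
    {"Num": "j2", "Address": {"City": "Miami"}, "PartsNeeded": ["p3"]},
]
suppliers = [
    {"Ident": "s1", "Address": {"City": "Boston"}},
    {"Ident": "s2", "Address": {"City": "Boston"}},
    {"Ident": "s3", "Address": {"City": "Miami"}},
]

# Example 2: parts needed by jobs in Boston.
bos_jobs = select(jobs, lambda j: j["Address"]["City"] == "Boston")
bos_job_parts = project(bos_jobs, J=lambda j: j, Pt=lambda j: j["PartsNeeded"])

# Example 3: local suppliers for each job, then nested per job.
local_s = ojoin(jobs, suppliers, "J", "S",
                lambda j, s: j["Address"]["City"] == s["Address"]["City"])
nested = nest(local_s, "S")

for t in nested:
    print(t["J"]["Num"], [s["Ident"] for s in t["S"]])
```

Because `ojoin` and `project` build new dictionaries on every call, running the same expression twice yields distinct result objects, which mirrors the observation in item (7).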
From the above description, we can see that the query algebra supports many features of O-O databases and has taken significant steps towards a powerful O-O query algebra to serve as the mathematical foundation for O-O databases. However, it still has the following limitations.

(1) Although ENCORE models an application as networks of types, objects, and their associations, the domain of its underlying query algebra is defined as collections of objects having type Set[T], which is essentially a nested relation representation, since the member type T of the set type can be a parameterized Tuple type which may in turn contain attributes of Tuple types. Therefore, the query algebra cannot represent network-structured relationships among objects efficiently, and the mapping problem addressed before still remains.

(2) In this algebra, two identical expressions or two identical operations in a single expression do not give identical responses, since each result collection is a newly identified object in the database. To eliminate duplicated copies of the same newly created object, the algebra introduces the DupEliminate and Coalesce operations, which are not necessary if the algebra directly supports the network view of O-O databases.

(3) In this algebra, a collection may contain objects with heterogeneous structures. For example, two objects may both be of Tuple type but with different arities, and the union of the two objects is also a collection of objects having Tuple type. However, other operators in this algebra are not defined to operate on such collections.

(4) Since the query algebra is developed for a specific model (i.e., ENCORE), it is difficult to apply to other O-O models.
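Limitation (2) can be made concrete with a sketch of i-equality as defined in Section 2.3.1. The sketch rests on assumptions that are not part of ENCORE: collections are Python lists, tuple objects are dictionaries whose key sets stand in for their types, atomic values are compared by value, and the helper `dup_eliminate` is only a hypothetical stand-in for the idea behind DupEliminate, not its actual definition.

```python
ATOMIC = (str, int, float, bool)

def i_equal(a, b, i):
    """Return True if a and b are i-equal under the assumptions stated above."""
    if isinstance(a, ATOMIC) or isinstance(b, ATOMIC):
        # Assumption: self-named (primitive) objects are identical when equal.
        return type(a) is type(b) and a == b
    if a is b:
        return True               # identical objects are equal at every depth
    if i == 0:
        return False              # 0-equality is identity only
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b) and _matches(a, b, i - 1)
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(
            i_equal(a[k], b[k], i - 1) for k in a
        )
    return False

def _matches(xs, ys, depth):
    """Search for a one-to-one correspondence of pairwise depth-equal members."""
    if not xs:
        return True
    head, rest = xs[0], xs[1:]
    for index, y in enumerate(ys):
        remaining = ys[:index] + ys[index + 1:]
        if i_equal(head, y, depth) and _matches(rest, remaining, depth):
            return True
    return False

def dup_eliminate(collection, i):
    """Keep one representative of every group of i-equal objects."""
    kept = []
    for obj in collection:
        if not any(i_equal(obj, k, i) for k in kept):
            kept.append(obj)
    return kept

# Two result tuples built by "the same operation" over the same database objects.
job = {"Num": "j1"}
parts = [{"Num": "p1"}]
first = {"J": job, "Pt": parts}
second = {"J": job, "Pt": parts}
# A structurally equal tuple built from copies of the objects.
copy = {"J": {"Num": "j1"}, "Pt": [{"Num": "p1"}]}

print(i_equal(first, second, 0), i_equal(first, second, 1))
print(i_equal(first, copy, 1), i_equal(first, copy, 3))
print(len(dup_eliminate([first, second], 1)))
```

In this sketch the two freshly created tuples are not identical (not 0-equal) even though they refer to the same database objects, so a duplicate-elimination step at depth 1 is needed to collapse them.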
Figure 2.1 A sample schema

Figure 2.2 Simple and complex query patterns: (a) simple query patterns; (b) complex query patterns (plane graphs, surface graphs)

Figure 2.3 An example of a nested relation

Figure 2.4 Using a nested relation to represent a complex structure

    Type Supplier
      properties:
        Ident: string
        Address: Addr
        Inventory: Set[Part]
      operations:
        RecvOrder: Supplier, Set[Part] --> Supplier

    Type Job
      properties:
        Num: string
        Address: Addr
        PartsNeeded: Set[Part]
        Preferred_Suppliers: Ordered_list[Supp
      operations:
        NewPart: Job, Part --> Job

    Type Part
      properties:
        Num: string
        Address: Addr
        Color: string
        Components: Set[Tuple[<(P,Part),(Qty,Int)>]]
        Plan: drawing
        BillofMaterial: list[Part]
      operations:
        Order: Part --> Part
        SamePart: Part, Part --> Boolean

    Type Addr
      properties:
        Street: string
        City: string
        State: string

Figure 2.5 A Supplier-Parts-Job database

CHAPTER 3
OVERVIEW OF O-O DATABASES AND ASSOCIATION-BASED QUERY FORMULATION

This chapter informally introduces the graphical view of O-O databases and illustrates the association-based query formulation mechanism. The graphical view captures the most important characteristics of O-O databases, in which object classes and their objects are associated with each other. Based on this view, query formulation and processing can be carried out by specifying and manipulating association patterns in which objects are inter-related with each other, unlike the traditional attribute-based query formulation and processing, which match values in different relations. Since the graphical view is suitable for many O-O data models, the association algebra developed based on this view can be used as a general algebra for supporting these O-O databases. The graphical view of O-O databases is formalized in the next chapter.

3.1 Overview of O-O Databases

O-O semantic data models provide a conceptual basis for defining O-O databases.
Although each model has some unique constructs that distinguish it from the others, there are several common structural and behavioral properties based on which an algebra can be developed and used to support these models.

First, objects are physical entities, abstract concepts, events, processes, functions, or anything that an application cares to capture and represent.

Second, objects having the same structural and behavioral properties are grouped together to form an object class. Object classes can be categorized into two general categories: (1) the nonprimitive-class, which represents a set of objects of interest in an application world, each of which is assigned a system-wide unique object identifier (OID) and whose data are explicitly entered in a database by the user; and (2) the primitive-class, which represents a class of self-named objects serving as a domain for defining other object classes, such as a class of symbols or numerical values. The behavioral properties of an object class are defined in terms of system-defined or user-defined operations (e.g., retrieve, display, delete, insert, rotate a design object, hire an employee, etc.), which can meaningfully operate on its objects using their corresponding programs (or methods). The structural properties of an object class and, thus, of its objects consist of two types of data: (1) descriptive data (or instance variables), which define the states of the objects; and (2) association data, which specify the relationships between its objects and the objects of some related classes.

Third, different O-O models recognize different types of associations. Two of the most commonly recognized associations are aggregation and generalization. Aggregation models the a-part-of, a-function-of, or a-composition-of relationship.
For instance, a complex object can be modeled by an aggregation hierarchy (abstract data type) in which a complex object is defined in terms of its associations with objects in other defined classes. Generalization models the is-a or the superclass-subclass relationship, in which an object in a subclass inherits both the structural and the behavioral properties of its superclass(es).

Thus, from the algebra point of view, an O-O database can be viewed as a collection of objects, grouped together in classes and interrelated through associations. It can be represented by graphs at both the intensional and the extensional levels. At the intensional (schema) level, a database is defined by a collection of inter-related object classes and is represented by a Schema Graph (SG). For example, the SG for a university database is illustrated in Figure 3.1, in which each rectangle denotes a nonprimitive-class, such as a class of person objects or a class of department objects, and each circle denotes a primitive-class, such as a class of names or ages. The associations among classes are represented by the edges in the SG. For example, there is an association between the class Course and the class Department (an Aggregation association), and an association between the class Person and the class Student (a Generalization association). Since the semantic distinctions of these and other association types recognized by different semantic models can be either hard-coded in a DBMS or declaratively specified by some rules and used by a rule processor to govern the manipulation of the associated classes, the underlying algebra does not have to incorporate the semantics of these association types. All it has to be concerned with is whether or not an object class and its objects are associated with some other classes and their objects, i.e., the edges (or associations) are type-less in the SG.
For example, the semantics of inheritance can be incorporated in a query language translator which translates a high-level language statement into its underlying algebraic representation. The algebra does not have to deal directly with the semantics of inheritance. This is particularly important if the algebra is to be used as a general algebra for supporting various O-O data models in which the semantics of an association type may have slightly different meanings.

At the extensional (instance) level, a database can be viewed as a collection of objects, grouped together in classes and inter-related through some type-less associations; as such, it can be represented by an Object Graph (OG). For example, the OG corresponding to a portion of the university schema graph is shown in Figure 3.2. In this example, the Teacher object t4 is associated with two Section objects, thereby representing the fact that he/she is teaching two sections, sc3 and sc4. The Student object s1 is associated with Undergrad object u1 which, in turn, is associated with Department object d1, thereby representing that s1 is an undergraduate student who minors in the department d1. Finally, the Section object sc2 is not associated with any object of the Student class, which represents the fact that it is not taken by any student. Object associations expressed by different graph patterns represent the semantic relationships among these objects in an application world.

3.2 Pattern-based Query Formulation

Based on this view of an O-O database, users can query the database by specifying patterns of object associations as search conditions. Once these objects are selected, they can be further processed by either system-defined operations (Retrieval, Display, Update, Insert, Delete, etc.) or user-defined operations (RotatePart, PurchasePart, HireFaculty, etc.).
For example, the following queries can be issued against the university database illustrated in Figures 3.1 and 3.2 (the algebraic expressions for these queries will be given in Section 4.4).

Query 1: For all sections, get the majors of students who are taking these sections.

To satisfy this query, we can specify a linear pattern containing the classes Section, Student, and Department, as shown in Figure 3.3a. In this pattern, a circle represents a class and an edge represents that the objects of the two adjacent circles (classes) must be associated with each other. This pattern is called an intensional pattern; it specifies that sections taken by students who major in some departments are to be identified. The answer to this query can be found in Figure 3.2 by checking whether the objects of these three classes satisfy such a pattern. There are five object patterns (called extensional patterns) which satisfy the intensional pattern, as shown in Figure 3.3b. The Section object sc2 and the Student object s3 do not appear in these extensional patterns, since sc2 is not taken by any student and s3 does not have a major yet. These patterns can also be identified in two sequential steps. First, get all the patterns in which the Section objects are associated with the Student objects. Then, if a pattern generated in the first step (i.e., a Section-Student pair) is further associated with an object of Department, a new pattern consisting of three objects is constructed and retained in the result; otherwise, the pair is dropped.

Once these objects (as well as their associations) have been identified, different system-defined or user-defined operations defined on their corresponding classes can be applied to these selected objects. For example, Inform(Department) can be an operation defined on the class Department. It sends each of the selected departments a letter concerning the majors of the students.
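The two sequential steps described for Query 1 can be sketched in Python. The edge sets below are hypothetical: they are not a transcription of Figure 3.2, and were only chosen so that sc2 has no student, s3 has no major, and five extensional patterns result. The function name `associate` is likewise an assumption made for illustration and is not the operator defined later in the dissertation.

```python
# Hypothetical object-graph edges (not a transcription of Figure 3.2).
section_student = {
    ("sc1", "s1"), ("sc1", "s2"),
    ("sc3", "s2"), ("sc3", "s3"), ("sc3", "s4"),
    ("sc4", "s4"),
}
student_major = {("s1", "d1"), ("s2", "d1"), ("s4", "d2")}

def associate(patterns, edges):
    """Extend each pattern through an edge leaving its last object.

    A pattern whose last object has no such edge is dropped.
    """
    return [
        pattern + (target,)
        for pattern in patterns
        for (source, target) in sorted(edges)
        if pattern[-1] == source
    ]

# Step 1: all Section-Student pairs.
step1 = sorted(section_student)
# Step 2: keep and extend the pairs whose student is associated with a department.
step2 = associate(step1, student_major)

for pattern in step2:
    print(pattern)
```

With this data the pair ("sc3", "s3") is produced in the first step and dropped in the second, because s3 is not associated with any department, and sc2 never appears because it occurs in no Section-Student edge.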
Suppose there is a rule in the university that a student cannot major and minor in the same department. To check whether there is such a case in the database, the following query can be issued.

Query 2: List students who major and minor in the same department.

The intensional pattern for this query is shown in Figure 3.3c. It can be formed by starting from the class Student and navigating the schema in two traversal paths (refer to Figure 3.1). One path is from Student to Department, which means that a student majors in a certain department; the other path is from Student to Department through Undergrad, which means that a student is an undergraduate and minors in a certain department (we can see from the SG that only undergraduates may have minors). According to the query, a single student should be associated with objects in both Undergrad and Department, and these two paths should merge at Department, thereby forming a loop. This implies two logical AND conditions, one at the Student class and the other at the Department class. We use double arcs to denote such conditions, as shown in Figure 3.3c. From Figure 3.2, we can see that the student s1 has his major and minor in the department d1. This extensional pattern is depicted in Figure 3.3d.

Query 3: For those students taking section 300 and having majors and/or minors, get their majors and/or minors.

There are several ways to form an intensional pattern for the query. We may start from Section# and traverse to Student through Section and, then, navigate the schema in two paths as we did for Query 2. According to the query, a student who has either a major or a minor should be included in the result (in this database, it is assumed that graduate students do not have minors). This means that either path of the navigation will construct a pattern that would satisfy the query. Thus, a logical OR condition exists at Student. We use a single arc to indicate the OR condition, as shown in Figure 3.4a.
Like Query 2, these two branches merge at Department. However, this query does not require that they merge at the same Department object. This is specified by the second OR condition at Department in Figure 3.4a.

The extensional patterns that satisfy this query have heterogeneous structures: two types of linear patterns as shown in Figure 3.4b. The first type includes patterns that represent the minors of the undergraduates; and the second type includes patterns that represent the majors of the students who are either undergraduates or graduates. In both types of patterns, a student is associated with section 300, which is assumed to be the Section# for sc3. Figure 3.4c will be described later in Section 4.4.

We have given some example queries which specify how objects are associated with one another. In the graphical representation of an O-O database, when there is no edge between two objects even though there is one between their classes, it implies that the two objects are not associated with each other. This represents the complement aspect of the semantics between two associated classes. It is necessary to allow a user to retrieve this type of object non-association from a database. The following query is such an example. It can also be specified by a pattern.

Query 4: For each teacher, list the sections which he/she does not teach.

We use a dashed line to represent the fact that two objects are not associated with each other. Therefore, the intensional pattern for this query can be drawn as in Figure 3.4d. There are twelve extensional patterns that match the intensional pattern. Figure 3.4e shows a portion of them. Non-association relationships among objects are not explicitly stored in a database. However, they can be derived during the processing of this type of query.
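Since non-association relationships are derived rather than stored, a query such as Query 4 can be answered by enumerating the pairs of objects of the two classes and keeping those without a stored edge. The following Python sketch shows that derivation; the teacher and section identifiers and the `teaches` edges are hypothetical and do not reproduce Figure 3.2.

```python
from itertools import product


def non_associations(left, right, edges):
    """Pairs (a, b) of objects whose classes are associated but that share no edge."""
    return sorted((a, b) for a, b in product(left, right) if (a, b) not in edges)


# Hypothetical objects and stored Teacher-Section associations.
teachers = ["t1", "t2"]
sections = ["sc1", "sc2", "sc3"]
teaches = {("t1", "sc1"), ("t2", "sc1"), ("t2", "sc3")}

not_taught = non_associations(teachers, sections, teaches)
for pair in not_taught:
    print(pair)
```

The number of derived pairs is the size of the cross product minus the number of stored edges, which is why a database need not store them explicitly.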
Using the above examples, we hope that we have convinced the reader that the pattern-based query formulation is suitable for query specification based on a graphical view of an O-O database.

3.3 Conclusion

The (type-less) graphical representation of O-O databases is applicable to most O-O data models, since it captures the essential characteristics of O-O data models in which object classes as well as their objects are inter-related with each other in different association patterns. Querying such databases can be made by specifying patterns in which objects of interest are associated with each other. It should be clear that this formulation is quite different from the attribute-based query formulation in the existing relational query languages, which is based on matching the attributes (or the key or composite key) of one relation with the attributes (foreign keys) in other relations. A query that requires the specification of a complex pattern of object associations can be specified in a rather straightforward manner in an association-based language, whereas in an attribute-based language, complex nestings of query blocks or multiple queries would be required [ALA89a].

It is our view that an algebra developed for processing data based on the graphical view of O-O databases and the pattern-based query formulation should satisfy the following requirements. First, it should allow direct manipulation of complex patterns of object associations. Second, the closure property should be maintained. Third, both association and non-association relationships among objects should be expressible as search conditions. Fourth, it should be complete in the sense that it can be used to describe all possible patterns in a database. Lastly, it must be able to represent and process patterns with both homogeneous and heterogeneous structures.
Figure 3.1 Schema graph of a university database

Figure 3.2 Object graph

Figure 3.3 Pattern specifications for Query 1 and Query 2: (a) intensional pattern for Query 1 over Section, Student, and Dept; (b) extensional patterns for Query 1: (sc1, s1, d1), (sc3, s2, d3), (sc3, s4, d3), (sc3, s5, d4), (sc4, s7, d6); (c) intensional pattern for Query 2 over Student, Undergrad, and Dept; (d) extensional pattern for Query 2 over s1, u1, and d1

Figure 3.4 Pattern specifications for Query 3 and Query 4: (a) intensional pattern for Query 3 over Section#, Section, Student, Undergrad, and Dept; (b) extensional patterns for Query 3; (c) see Section 4.4; (d) intensional pattern for Query 4 over Teacher and Section; (e) a portion of the extensional patterns for Query 4

CHAPTER 4
ASSOCIATION ALGEBRA

The association algebra (A-algebra) is defined based on a uniform representation of an O-O database in terms of objects, object classes, and type-less associations, as described in Chapter 3. The algebra contains a number of operators which operate on graph structures of object associations to produce graph structures. The closure property of the algebra ensures that the result of a query can be further manipulated by other queries.

4.1 Definitions

First, we formally define an O-O database at both schema and object levels.

Schema Graph (the intensional database): The schema graph of an O-O database is defined as $SG(C, A)$, where $C = \{C_i\}$ is a set of vertices representing object classes; $A$ is a set of edges, each of which, $A_{ij}(k)$, represents an association between classes $C_i$ and $C_j$, where $k$ is a number for distinguishing the edges from one another when there is more than one edge between two vertices.

Object Graph (the extensional database): The object graph of an O-O database is defined as $OG(O, E)$, where $O = \{O_{ij}\}$ is a set of vertices representing object instances ($O_{ij}$ is the $j$th object in class $C_i$); and $E = \{O_{ij} \stackrel{k}{-} O_{mn}\}$ is a set of edges representing the associations among object instances.
When one object instance is connected with another in the object graph, a regular-edge (solid line) is drawn between the corresponding vertices as $O_{ij} \stackrel{k}{-} O_{mn}$, which specifies that the $j$th object instance in class $C_i$ is related to the $n$th object instance in class $C_m$ through the $k$th association of classes $C_i$ and $C_m$. If two object instances $O_{ij}$ and $O_{mn}$ are not connected in the object graph but their classes $C_i$ and $C_m$ in the corresponding SG are directly connected, a complement-edge (dotted line) is drawn between them and is denoted by $O_{ij} \cdots O_{mn}$.

In O-O models, an object may participate in several classes (e.g., in a generalization hierarchy). Its representation in a class is called an object instance. Since in most cases in this dissertation "object" and "object instance" can be used interchangeably without any ambiguity, we shall use "object" unless a distinction is required between the two.

The reason for explicitly introducing complement-edges into the OG is to allow the A-algebra to manipulate both association and non-association between objects of two adjacent classes. In an actual O-O database, it is not necessary to explicitly store the complement-edges. Figure 4.1 illustrates the regular-edges and complement-edges among the objects of three object classes. For example, we see that section sc1 is taken by students s2 and s3 (regular-edges) and not taken by students s1 and s4 (complement-edges).

The relationship between an OG and its corresponding SG is formally described by the following proposition.

Proposition 1: An $OG(O, E)$ is a morphism of its corresponding $SG(C, A)$. The mapping function $F_m$ is defined as $F_{m1}: C_i \Rightarrow \{O_{ij}\}$, and $F_{m2}: A_{im}(k) \Rightarrow \{O_{ij} \stackrel{k}{-} O_{mn}\}$.

The mapping between SG and OG is one-to-many, since a database is dynamically changing and may have different instantiations at different times for the same schema graph.
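One practical reading of Proposition 1 is that every edge of an object graph must map back to an edge of the schema graph. The Python sketch below checks that condition. It is a simplification under stated assumptions: object instances are modeled as hypothetical (class, identifier) tuples, at most one association per pair of classes is assumed (so the index k is ignored), and the function name `conforms` is illustrative only.

```python
def conforms(schema_edges, object_edges):
    """True if every object edge links instances of two classes adjacent in SG."""
    return all(frozenset((x[0], y[0])) in schema_edges for x, y in object_edges)


# Schema graph: unordered pairs of associated classes.
schema_edges = {frozenset(("Section", "Student")),
                frozenset(("Student", "Department"))}

# Regular-edges named in the text for Figure 4.1: sc1 is taken by s2 and s3.
object_edges = [(("Section", "sc1"), ("Student", "s2")),
                (("Section", "sc1"), ("Student", "s3"))]

# A hypothetical edge that has no counterpart in the schema graph.
bad_edge = (("Section", "sc1"), ("Department", "d1"))

print(conforms(schema_edges, object_edges))
print(conforms(schema_edges, object_edges + [bad_edge]))
```

Different lists of object edges that pass this check correspond to the different instantiations of the same schema graph mentioned above.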
To define "association pattern", we first extend the concept of connected graph in graph theory by treating complement-edges as edges, i.e., a connected graph is a graph in which there exists at least one path between any two vertices and each path may contain regular-edges, complement-edges, or a combination of the two. We shall from now on use an upper-case letter to denote a class and the corresponding lower-case letter with a subscript to denote an object instance in that class. We shall assume that there is only one edge between any two vertices in SG unless otherwise specified so as not to complicate the notation.

Association Pattern: A connected subgraph of an OG is an association pattern (or pattern for short).

By this definition, a single vertex (or object instance) in OG, which is a connected subgraph, is also a pattern. We call it an Inner-association-pattern (or Inner-pattern for short). It is algebraically represented by $(a_i)$ for an object instance of class A. Thus, object instances are treated as Inner-patterns in the A-algebra. A regular-edge together with the two vertices (i.e., two Inner-patterns) it connects is called an Inter-association-pattern (or Inter-pattern), which is represented by $(a_i b_j)$. A complement-edge together with the two Inner-patterns it connects is called a Complement-association-pattern (or Complement-pattern) and is represented by $(\overline{a_i b_j})$. This pattern states that $a_i$ and $b_j$ are not associated with each other in OG. If a path between vertices $a_i$ and $b_j$ consists of only regular-edges, it can be represented by a Derived-inter-association-pattern (D-inter-pattern), denoted by $(\underline{a_i b_j})$; otherwise, it can be represented by a Derived-complement-association-pattern (D-complement-pattern), denoted by $(\underline{\overline{a_i b_j}})$. When a path is represented by a derived pattern, it simply means that two vertices are indirectly associated or non-associated, but how they are interrelated (the actual path) is of no importance.
A D-inter-pattern is treated as an Inter-pattern and a D-complement-pattern is treated as a Complement-pattern in the algebraic operations.

The above five types of patterns are the primitive patterns, the latter four being binary patterns. Their graphical and algebraic representations are summarized in Figure 4.2a. All other connected subgraphs are called complex patterns. For example, the complex pattern shown in Figure 4.2b1 contains three primitive patterns: two Inter-patterns $(a_1 b_1)$ and $(b_1 d_1)$, and a Complement-pattern $(\overline{b_1 c_1})$. It can be uniquely defined by its algebraic representation as a set of primitive patterns, i.e., $(a_1 b_1, \overline{b_1 c_1}, b_1 d_1)$. More examples of complex patterns are shown in Figure 4.2b. From these examples, one can observe that a complex pattern can be decomposed into a set of binary patterns which cannot be further decomposed. This implies that, in the algebraic representation of a complex pattern, an Inner-pattern may not occur as an element and a binary pattern may appear only once. A pattern in this algebraic format is called a normalized pattern; otherwise it is called an unnormalized pattern. $(b_1, b_1 c_1)$, $(b_2, b_2)$, and $(a_1 b_1, b_1 c_2, a_1 b_1)$ are examples of unnormalized patterns. During the process of constructing an association pattern, we always normalize it by eliminating the duplicates. The above three patterns have the normalized forms of $(b_1 c_1)$, $(b_2)$, and $(a_1 b_1, b_1 c_2)$, respectively.

The definitions of OG and association pattern imply that a pattern is a non-directional graph, i.e., $(a_i b_j) = (b_j a_i)$, and that the sequence of primitive patterns in the algebraic representation of a complex pattern is not important, hence $(a_i b_j, b_j c_k) = (c_k b_j, a_i b_j)$.

Based on the above definition and notion of association pattern, we view an OG as an Association Graph (AG), and all the association patterns in AG form the domain of the A-algebra, denoted by $\mathcal{A}$.
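The normalization rule (no Inner-pattern as an element of a complex pattern, no binary pattern listed twice, direction and order irrelevant) can be sketched with Python sets. The representation is an assumption made for illustration: an Inner-pattern is a string identifier, a binary pattern is a pair made of an unordered set of two identifiers and a type tag, and the helper names `pair` and `normalize` are hypothetical. The inputs are small examples in the spirit of the unnormalized patterns discussed above.

```python
def pair(x, y, kind="inter"):
    """A binary primitive pattern: an unordered pair of object ids plus its type."""
    return (frozenset((x, y)), kind)


def normalize(elements):
    """Drop duplicates and Inner-patterns already covered by a binary pattern."""
    binaries = {e for e in elements if not isinstance(e, str)}
    covered = set().union(*[e[0] for e in binaries])
    inners = {e for e in elements if isinstance(e, str) and e not in covered}
    return frozenset(binaries | inners)


# An Inner-pattern listed next to a binary pattern that already contains it.
n1 = normalize(["b1", pair("b1", "c1")])
# The same Inner-pattern listed twice.
n2 = normalize(["b2", "b2"])
# A binary pattern listed twice, the second time in the opposite direction.
n3 = normalize([pair("a1", "b1"), pair("b1", "c2"), pair("b1", "a1")])

print(len(n1), len(n2), len(n3))
```

Using unordered sets at both levels makes the two stated equalities (direction of a binary pattern, order of primitive patterns) hold by construction rather than by an explicit check.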
4.2 Relationship Between Two Association Patterns

The operators of the A-algebra are defined based on the possible relationships between two patterns in $\mathcal{A}$, so that they can be used either to construct complex patterns using simpler patterns or to decompose a complex pattern into several patterns of simpler structures. There are four possible relationships between two patterns $p^1$ and $p^2$: non-overlap, overlap, contain, and equal.

(1) Non-overlap: Two patterns are said to be non-overlapped, denoted by $p^1 \supset\subset p^2$, if they have no common Inner-pattern.
(2) Overlap: Two patterns are said to be overlapped, denoted by $p^1 \cap p^2$, if they have at least one common Inner-pattern.
(3) Contain: Contain is a special case of (2) when all the primitive patterns of $p^1$ are contained in $p^2$. We say that $p^1$ is a subpattern of $p^2$ and denote this relationship by $p^1 \subseteq p^2$.
(4) Equal: This is a special case of (3) when $p^1$ contains all the primitive patterns of $p^2$, and vice versa. It is denoted by $p^1 = p^2$.

Before defining the association operators, we give the definition of "Association-set", the operand of the association operators.

Association-set: An association-set, denoted by a Greek letter $\alpha$ (or $\beta, \gamma, \ldots$), is a set of association patterns without duplicates. $\alpha^i$ designates the $i$th pattern in $\alpha$, where $\alpha^i \neq \alpha^j$ ($\forall i \neq j$). An empty set is also an association-set, denoted by $\emptyset$.

A special type of association-set is called a homogeneous association-set, which is important to the A-algebra, since some of the mathematical properties hold only when operands are homogeneous association-sets.

Homogeneous Association-set: An association-set is homogeneous if (1) all patterns are formed by the Inner-patterns (or object instances) of the same set of object classes; and (2) all patterns have the same number of Inner-patterns from each class in the set; and (3) corresponding primitive patterns belong to the same association and are of the same type; and (4) all patterns have the same topology.
Otherwise, it is a heterogeneous association-set. Figure 4.3 depicts three example association-sets: $\alpha$ is homogeneous, whereas $\beta$ is not, since one of its patterns has only one Inner-pattern of class C instead of two like the others. $\gamma$ is not homogeneous because $\gamma^3$ contains a Complement-pattern, which makes it different from the other patterns of $\gamma$ (i.e., different topologies).

4.3 Association Operators

Ten association operators are formally defined in this section: three unary operators [A-Project ($\Pi$), A-Select ($\sigma$), and A-Integrate ($\int$)] and seven binary operators [Associate ($*$), A-Complement ($|$), A-Union ($+$), A-Difference ($-$), A-Divide ($\div$), NonAssociate ($!$), and A-Intersect ($\bullet$)]. The examples used to explain these operators will make use of the domain $\mathcal{A}$ shown in Figure 4.4. To keep the graph simple, the Complement-patterns are not shown in the figure. The simple mathematical properties such as commutativity, associativity, idempotency, and nilpotency satisfied by the operators are given after each definition.

4.3.1 Notations

Notations that will be used in the subsequent sections are listed below.

$A, B, \ldots, K$: Denote classes.
$CL_n$: Denotes a variable for a class.
$[R(CL_1, CL_2)]$: Denotes the association between classes $CL_1$ and $CL_2$.
$a_i$: Denotes the $i$th Inner-pattern of class A.
$@$: Denotes an Inner-pattern variable.
$(a_i b_j)$: Denotes an Inter-pattern between two classes A and B.
$(\overline{a_i b_j})$: Denotes a Complement-pattern between two classes A and B.
$(\underline{a_i c_k})$: Denotes a Derived-pattern from class A to class C.
$\alpha, \beta, \gamma, \ldots$: Denote association-sets.
$\alpha^i$: Denotes the $i$th pattern of association-set $\alpha$.
$\{W\}, \{X\}, \{Y\}, \ldots$: Denote sets of classes. Hence, $\alpha_{\{X\}}$ represents association-set $\alpha$ which has Inner-pattern(s) from the classes in $\{X\}$.
It should be noted that an Inner-pattern is represented by an object instance identifier (IID), which is a system-assigned object identifier (OID) prefixed by a class identification, so that the object instances of an object in multiple classes can be unambiguously distinguished and the fact that these object instances are instances of the same object can easily be recognized.

4.3.2 Operators

All relational algebraic operators operate on relations of homogeneous (or union-compatible) structures with the exception of Cartesian-product and Join. The Cartesian-product and Join provide the mechanism to concatenate two relations of different structures into a single relation, so that it can be further manipulated by other operators. In the A-algebra, all the operators are defined to operate on association patterns of homogeneous as well as heterogeneous structures. Therefore, the relational algebra is a special case of the A-algebra in this respect.

(1) Associate ($*$):

The Associate operator is a binary operator which constructs an association-set of complex patterns by concatenating the patterns represented by two operand association-sets. Since a pattern may involve many classes and an object class may have more than one association with another class, it is necessary to specify through which association the concatenation of two patterns is intended. The Associate operation on association-sets $\alpha$ and $\beta$ over the association R between classes A and B is defined as follows:

$$\alpha *[R(A,B)]\, \beta = \{\, \gamma \mid \gamma^k = (\alpha^i, \beta^j, a_m b_n) : (a_m b_n) \in [R(A,B)] \wedge a_m \in \alpha^i \wedge b_n \in \beta^j \,\}$$

The result of an Associate operation is an association-set containing no duplicates. Each of its patterns is the concatenation of two patterns (one from each operand association-set). More specifically, if the Inner-pattern (or object $a_m$) of A in $\alpha^i$ is associated with the Inner-pattern (or object $b_n$) of B in $\beta^j$ in the domain of the algebra $\mathcal{A}$ shown in Figure 4.4, then $\alpha^i$ and $\beta^j$ are concatenated via the primitive pattern $(a_m b_n)$.
We do not restrict A and B to be different classes in $[R(A,B)]$, i.e., $\alpha *[R(A,A)]\, \beta$ is a legitimate operation, which concatenates two patterns (one from each operand association-set) if they have a common Inner-pattern of class A.

An example of the Associate operation is shown in Figure 4.5a (for convenience, a copy of the sample database is shown in each figure for illustrating an operation; for clarity, we use graphical notation in the figures). In the example, $\alpha^1$ is concatenated with $\beta^1$ and $\beta^2$, respectively, due to the existence of $(b_1 c_1)$ and $(b_1 c_2)$ in $\mathcal{A}$ as shown in Figure 4.4. $\alpha^2$ is dropped simply because it does not have an Inner-pattern of class B. $\alpha^3$ is dropped because $(b_2)$ is not associated with any Inner-pattern of class C in $\mathcal{A}$. $\beta^3$ cannot be concatenated through $(c_4)$ with any pattern in $\alpha$ because no pattern in $\alpha$ has an Inner-pattern of B that is associated with $(c_4)$ in $\mathcal{A}$. For the same reason $\beta^4$ is dropped.

For the Associate operator, $[R(A,B)]$ can be omitted if the following conditions hold: (1) both $\alpha$ and $\beta$ are A-algebra expressions, (2) the Associate operator operates on the last class in a linear expression $\alpha$ and the first class in a linear expression $\beta$, and (3) there is a unique association between these two classes. For example, $A *[R(A,B)]\, B$ can be written as $A * B$, if class A is associated with class B through the attribute $[R(A,B)]$ of A.

It should be pointed out that A-algebra allows an attribute to be defined by a computed value (or object), for instance, $B = f(A)$. The implementations of the function and the procedure are invisible to the algebra. However, they should not have side effects, i.e., the computed result must be of the same type as B.
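The Associate definition can be sketched as a nested loop over the two operands. The sketch makes several assumptions that are not part of the dissertation: a pattern is a frozenset whose elements are either object identifiers (Inner-patterns) or typed unordered pairs, the class of an object is taken from the first letter of its identifier, the domain is given as a set of unordered Inter-pattern pairs, and only the case of two different classes A and B is handled. The edges used are a small assumed domain.

```python
def pair(x, y, kind="inter"):
    """A binary primitive pattern: an unordered pair of object ids plus its type."""
    return (frozenset((x, y)), kind)


def objects(pattern):
    """All object ids occurring in a pattern."""
    ids = set()
    for element in pattern:
        ids |= {element} if isinstance(element, str) else set(element[0])
    return ids


def normalize(elements):
    """Drop duplicates and Inner-patterns already covered by a binary pattern."""
    binaries = {e for e in elements if not isinstance(e, str)}
    covered = set().union(*[e[0] for e in binaries])
    return frozenset(binaries | {e for e in elements
                                 if isinstance(e, str) and e not in covered})


def associate(alpha, beta, cls_a, cls_b, inter_edges):
    """Concatenate patterns of alpha and beta via Inter-patterns between cls_a and cls_b."""
    result = set()
    for p in alpha:
        for q in beta:
            for a in objects(p):
                for b in objects(q):
                    edge = frozenset((a, b))
                    if a[0] == cls_a and b[0] == cls_b and edge in inter_edges:
                        result.add(normalize(p | q | {(edge, "inter")}))
    return result


# Assumed domain: Inter-patterns a1-b1, b1-c1, b1-c2, c2-d1.
E = {frozenset(e) for e in [("a1", "b1"), ("b1", "c1"), ("b1", "c2"), ("c2", "d1")]}
alpha = {frozenset({pair("a1", "b1"), pair("b1", "c2")})}
beta = {frozenset({pair("b1", "c1")})}
gamma = {frozenset({"d1"})}

left = associate(associate(alpha, beta, "a", "b", E), gamma, "c", "d", E)
right = associate(alpha, associate(beta, gamma, "c", "d", E), "a", "b", E)
print(len(left), len(right))
```

With these operands the two groupings give different results, because `alpha` already carries an object of class C (c2) that can reach d1 while `beta` alone cannot; the order of evaluation therefore matters whenever an operand contains objects of the class used by the other concatenation.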
The Associate operator is commutative and conditionally associative as defined below:

$$\alpha *[R(A,B)]\, \beta = \beta *[R(B,A)]\, \alpha \quad \text{(commutativity)}$$

$$(\alpha_{\{X\}} *[R(A,B)]\, \beta_{\{Y\}}) *[R(C,D)]\, \gamma_{\{Z\}} = \alpha_{\{X\}} *[R(A,B)]\, (\beta_{\{Y\}} *[R(C,D)]\, \gamma_{\{Z\}}) \quad (\text{if } C \notin \{X\} \wedge B \notin \{Z\}) \quad \text{(associativity)}$$

$$A *[R(A,A)]\, A = A \quad \text{(idempotency)}$$

The associativity holds true if $\alpha$ and $\gamma$ do not have Inner-patterns of classes C and B, respectively. Otherwise, the associativity does not hold. For example, if $\alpha = (a_1 b_1, b_1 c_2)$, $\beta = (b_1 c_1)$, $\gamma = (d_1)$, and $\mathcal{A}$ is as shown in Figure 4.4 (the domain of the algebra), then

$$(\alpha *[R(A,B)]\, \beta) *[R(C,D)]\, \gamma = (a_1 b_1, b_1 c_1, b_1 c_2, c_2 d_1) \quad \text{and} \quad \alpha *[R(A,B)]\, (\beta *[R(C,D)]\, \gamma) = \emptyset$$

(2) A-Complement ($|$):

The A-Complement operator is a binary operator which concatenates the patterns of two operand association-sets over Complement-patterns. It is used to identify the objects in two classes which are not associated with each other in $\mathcal{A}$. The A-Complement operator is defined as follows:

$$\alpha\, |[R(A,B)]\, \beta = \{\, \gamma \mid \gamma^k = (\alpha^i, \beta^j, \overline{a_m b_n}) : (\overline{a_m b_n}) \in [R(A,B)] \wedge a_m \in \alpha^i \wedge b_n \in \beta^j$$
$$\text{or } \gamma^k = \alpha^i : \exists(m)(a_m \in \alpha^i) \wedge \nexists(n)(b_n \in \beta)$$
$$\text{or } \gamma^k = \beta^j : \exists(n)(b_n \in \beta^j) \wedge \nexists(m)(a_m \in \alpha) \,\}$$

The result of an A-Complement operation is an association-set. Each of its patterns is formed by concatenating two patterns (one from each operand association-set) via a Complement-pattern $(\overline{a_m b_n})$, where $a_m$ and $b_n$ belong to $\alpha^i$ and $\beta^j$, respectively, and the Complement-pattern $(\overline{a_m b_n})$ is in $\mathcal{A}$. In the special case when $\alpha$ (or $\beta$) is an empty association-set or does not have Inner-patterns of class A (or B), all patterns of $\beta$ (or $\alpha$) that have Inner-patterns of B (or A) are retained in the resulting association-set.

An example of the A-Complement operation is shown in Figure 4.5b. It operates over the association between classes B and C. $\alpha^2$ does not appear in the resultant association-set because it contains no Inner-patterns of B. $\alpha^1$ cannot be A-Complemented with $\beta^1$ and $\beta^2$ because it is connected with $\beta^1$ and $\beta^2$ by Inter-patterns $(b_1 c_1)$ and $(b_1 c_2)$ in $\mathcal{A}$, respectively.
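The A-Complement definition, including its two special cases, can be sketched in the same style. As before, the representation is assumed for illustration only: patterns are frozensets of object identifiers and typed unordered pairs, the class of an object is the first letter of its identifier, a Complement-pattern exists between two objects of the given classes exactly when no Inter-pattern joins them, and the operands and edges are hypothetical.

```python
def pair(x, y, kind="inter"):
    """A binary primitive pattern: an unordered pair of object ids plus its type."""
    return (frozenset((x, y)), kind)


def objects(pattern):
    """All object ids occurring in a pattern."""
    ids = set()
    for element in pattern:
        ids |= {element} if isinstance(element, str) else set(element[0])
    return ids


def normalize(elements):
    """Drop duplicates and Inner-patterns already covered by a binary pattern."""
    binaries = {e for e in elements if not isinstance(e, str)}
    covered = set().union(*[e[0] for e in binaries])
    return frozenset(binaries | {e for e in elements
                                 if isinstance(e, str) and e not in covered})


def with_class(patterns, cls):
    """Patterns that contain at least one Inner-pattern of the given class."""
    return [p for p in patterns if any(o[0] == cls for o in objects(p))]


def a_complement(alpha, beta, cls_a, cls_b, inter_edges):
    left, right = with_class(alpha, cls_a), with_class(beta, cls_b)
    if not right:      # beta is empty or has no objects of cls_b: keep alpha's patterns
        return set(left)
    if not left:       # alpha is empty or has no objects of cls_a: keep beta's patterns
        return set(right)
    result = set()
    for p in left:
        for q in right:
            for a in objects(p):
                for b in objects(q):
                    edge = frozenset((a, b))
                    if a[0] == cls_a and b[0] == cls_b and edge not in inter_edges:
                        result.add(normalize(p | q | {(edge, "complement")}))
    return result


# Hypothetical domain and operands; the operation is over classes B and C.
E = {frozenset(e) for e in [("a1", "b1"), ("a3", "b2"), ("b1", "c1"), ("b1", "c2")]}
alpha = {frozenset({pair("a1", "b1")}), frozenset({"a2"}), frozenset({pair("a3", "b2")})}
beta = {frozenset({"c1"}), frozenset({"c2"})}

result = a_complement(alpha, beta, "b", "c", E)
print(len(result))
```

In this run the pattern containing b1 contributes nothing because b1 is associated with both c1 and c2, the pattern without any object of class B is ignored, and the pattern containing b2 is concatenated with each pattern of `beta` through a Complement-pattern.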
Under the same conditions as given in the Associate operator, $[R(A,B)]$ need not be specified with the A-Complement operator unless there is an ambiguity. The A-Complement operator is commutative and associative. For a reason similar to that described for the Associate operator, the associativity holds true conditionally.

$$\alpha\, |[R(A,B)]\, \beta = \beta\, |[R(B,A)]\, \alpha \quad \text{(commutativity)}$$

$$(\alpha_{\{X\}}\, |[R(A,B)]\, \beta_{\{Y\}})\, |[R(C,D)]\, \gamma_{\{Z\}} = \alpha_{\{X\}}\, |[R(A,B)]\, (\beta_{\{Y\}}\, |[R(C,D)]\, \gamma_{\{Z\}}) \quad (\text{if } C \notin \{X\} \wedge B \notin \{Z\}) \quad \text{(associativity)}$$

$$A\, |[R(A,A)]\, A = \emptyset \quad \text{(nilpotency)}$$

(3) A-Select ($\sigma$):

The A-Select is a unary operator which operates on an association-set $\alpha$ to produce a subset of patterns that satisfy a specified predicate $P$. A pattern in the operand association-set is retained iff the predicate evaluates to true for that pattern.

$$\sigma(\alpha)[P] = \{\, \gamma \mid \gamma^k = \alpha^i : P(\alpha^i) = \text{true} \,\}$$

where $\alpha$ is defined by an algebraic expression, and $P = T_1 \theta_1 T_2 \theta_2 \cdots \theta_{n-1} T_n$. Each term $T_i$ ($i = 1, 2, \ldots, n$) is a comparison between two expressions and $\theta_i$ ($i = 1, 2, \ldots, n-1$) is a Boolean operator ($\wedge$ or $\vee$). $P(\alpha^i) = \text{true}$ represents that a pattern is evaluated true for that predicate. The expressions on the left- and right-hand sides of a comparison operation may contain constants, functions, and/or operations on objects, but cannot both be constants. The comparison terms are type sensitive, i.e., the results of the two expressions in a term should be data of the same type for primitive-classes or both IIDs for nonprimitive-classes. $=$, $>$, $<$, $\geq$, $\leq$, and $\neq$ are the legitimate comparisons for numerical types; $=$ and $\neq$ for character, string, and IID types; and $=$, $\subset$, $\supset$, $\subseteq$, $\supseteq$, and $\neq$ for set types. The comparison of two IIDs is performed by comparing their OID portions, since IIDs are the concatenations of the class identifiers and OIDs. A single valued object or a single IID can be treated either as its own data type in numerical, string, or IID comparison, or as a set type containing one element in a set comparison.
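The rule that two IIDs are compared through their OID portions, and the way A-Select applies a predicate to each pattern, can be sketched as follows. The `IID` record, the dictionary layout of a pattern (one IID per class name), and the sample values are assumptions made for illustration; they are not the storage format of any actual system.

```python
from typing import NamedTuple


class IID(NamedTuple):
    """An object instance identifier: a class identifier followed by an OID."""
    class_id: str
    oid: int


def same_object(x: IID, y: IID) -> bool:
    """Two IIDs denote the same object when their OID portions are equal."""
    return x.oid == y.oid


def a_select(alpha, predicate):
    """Retain the patterns of alpha for which the predicate evaluates to true."""
    return [p for p in alpha if predicate(p)]


# Hypothetical patterns, each holding one instance of Student and one of Undergrad.
alpha = [
    {"Student": IID("Student", 7), "Undergrad": IID("Undergrad", 7)},
    {"Student": IID("Student", 8), "Undergrad": IID("Undergrad", 9)},
]

selected = a_select(alpha, lambda p: same_object(p["Student"], p["Undergrad"]))
print(selected)
```

Comparing only the OID portion is what lets instances of one object in two classes be recognized as the same object even though their IIDs differ in the class prefix.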
As an example of A-Select, we assume that there are two associated classes: S for stack and Q for queue. To select associated stack and queue object pairs in which the top and the bottom of the stack have some common object(s) with those in the head and the tail of the queue, it can be written as

$$\sigma(S * Q)[(top(S) \cup bottom(S)) \cap (head(Q) \cup tail(Q)) \neq \emptyset]$$

For the case where the top equals the head and the bottom equals the tail, we have

$$\sigma(S * Q)[top(S) = head(Q) \wedge bottom(S) = tail(Q)]$$

(4) A-Project ($\Pi$):

Similar to the projection operation in the relational algebra, an A-Project operation is defined to project subpattern(s) of a pattern. However, in the relational algebra, the relationship among the projected attributes is not important, whereas in A-algebra, the association among the projected subpatterns must be maintained so that the associations among the objects in these subpatterns will be retained. The A-Project operator is defined as follows:

$$\Pi(\alpha)[E, T]$$

where $\alpha$ is an association-set defined by an A-algebra expression; $E = (e_1, e_2, \ldots, e_n)$ is a set of expressions which specify subpatterns to be projected; and $T = (t_1, t_2, \ldots, t_m)$ is a set of ordered sets of classes. Each ordered set, $t_k$, specifies a path connecting two projected subpatterns defined by the $E$ expressions. $e_i$ ($i = 1, 2, \ldots, n$) is a subexpression of the expression which defines $\alpha$. $e_i$ and $e_j$ ($\forall i \neq j$) should not contain a common class. There may be many paths that connect two subpatterns in the original pattern. The path to be retained can be specified in $t_k$. If a specific path is chosen, a minimal number of classes along the path which can uniquely identify the path should be specified. The result of an A-Project operation over a pattern is its subpatterns defined by $E$ and some paths defined by $T$ that connect these subpatterns. If a path in the original pattern consists of all Inter-patterns, a D-inter-pattern is retained. Otherwise, a D-complement-pattern is included.
Multiple paths between two projected subpatterns can be declared in $T$, if it is so desired.

Figure 4.5c shows an example of A-Project from an association-set $\alpha$ over $A * B$ and $D$. For $\alpha^1$, the subpatterns $(a_1 b_1)$ and $(d_1)$ satisfy $A * B$ and $D$, respectively. Therefore, they are kept in the result. According to the path specification stated in the operation, a Derived-pattern $(\underline{b_1 d_1})$ is added to the result, thus $\gamma^1 = (a_1 b_1, d_1, \underline{b_1 d_1})$. Its normalized form is $\gamma^1 = (a_1 b_1, \underline{b_1 d_1})$. $\gamma^2$ is produced for the same reason. Since $\alpha^3$ does not have a subpattern satisfying $A * B$, only $(d_3)$ is retained.

(5) NonAssociate ($!$):

The NonAssociate operator is a binary operator used to identify the association patterns in one operand association-set that are not associated (over a specified association) with any pattern in the other association-set, and vice versa, in the domain of the algebra $\mathcal{A}$. The NonAssociate operator is defined as follows:

$$\alpha\, ![R(A,B)]\, \beta = \{\, \gamma \mid \gamma^k = (\alpha^i, \beta^j, \overline{a_m b_n}) : (\overline{a_m b_n}) \in [R(A,B)] \wedge a_m \in \alpha^i \wedge b_n \in \beta^j \wedge \forall((a_m b_q), (a_p b_n) \in \mathcal{A})(a_p \notin \alpha \wedge b_q \notin \beta)$$
$$\text{or } \gamma^k = \alpha^i : \exists(m)(a_m \in \alpha^i) \wedge \nexists(n)(b_n \in \beta) \vee \forall(b_n \in \beta)\exists(k, k \neq m)(a_k \in \alpha \wedge (a_k b_n) \in [R(A,B)])$$
$$\text{or } \gamma^k = \beta^j : \exists(n)(b_n \in \beta^j) \wedge \nexists(m)(a_m \in \alpha) \vee \forall(a_m \in \alpha)\exists(k, k \neq n)(b_k \in \beta \wedge (a_m b_k) \in [R(A,B)]) \,\}$$

The result of a NonAssociate operation is an association-set. Each of its patterns is formed by concatenating two patterns $\alpha^i$ and $\beta^j$ via a Complement-pattern $(\overline{a_m b_n})$ under the condition that $\alpha^i$ is not associated with any pattern of $\beta$ and vice versa. Furthermore, in the special case where the patterns of $\alpha$ (or $\beta$) have Inner-patterns of A (or B) and cannot be concatenated with any pattern of $\beta$ (or $\alpha$), these patterns of $\alpha$ (or $\beta$) will be retained in the result if one of the following three conditions holds: (1) $\beta$ (or $\alpha$) is an empty association-set, (2) all patterns of $\beta$ (or $\alpha$) do not have Inner-patterns of B (or A), or (3) all patterns of $\beta$ (or $\alpha$) that have Inner-patterns of B (or A) can be concatenated with patterns of $\alpha$ (or $\beta$).

An example of the NonAssociate operation is shown in Figure 4.5d.
In the example, $\alpha^1$ and one pattern of $\beta$ are dropped due to the existence of an Inter-pattern between them in Figure 4.4. $\alpha^2$ is dropped because it does not contain an Inner-pattern of class B. Another pattern of $\beta$ is dropped because it does not contain an Inner-pattern of class C. $\gamma^1$ is in the resultant association-set because $(b_2)$ is not associated with $(c_4)$ in $\mathcal{A}$ as shown in Figure 4.4 and $(b_3)$ does not appear in $\alpha$. The other resultant pattern exists because $(b_2)$ is likewise not associated with the corresponding Inner-pattern of class C in $\mathcal{A}$.

Note that the NonAssociate operator produces a resultant association-set which is a subset of that produced by the A-Complement operator, because $\alpha^i$, $\beta^j$, and $(\overline{a_m b_n})$ may form a new pattern only when $a_m$ of $\alpha^i$ does not associate with any object of B in $\beta$ and $b_n$ of $\beta^j$ does not associate with any object of A in $\alpha$. In fact, the NonAssociate operator can be expressed in terms of A-Complement and other operators as follows:

$$A\, ![R(A,B)]\, B = (A - \Pi(A *[R(A,B)]\, B)[A])\ |[R(A,B)]\ (B - \Pi(A *[R(A,B)]\, B)[B])$$

Thus, NonAssociate is not a primitive operator in a strict sense. However, it is very useful for query formulation and is therefore included in the set of A-algebra operators.

Under the same conditions as given in the Associate operator, $[R(A,B)]$ need not be specified unless there is an ambiguity. The NonAssociate operator is commutative but not associative.

$$\alpha\, ![R(A,B)]\, \beta = \beta\, ![R(B,A)]\, \alpha \quad \text{(commutativity)}$$

$$A\, ![R(A,A)]\, A = \emptyset \quad \text{(nilpotency)}$$

(6) A-Intersect ($\bullet$):

The A-Intersect operation is convenient for constructing a pattern with a branch or a lattice structure (a pattern that has a loop), since a pattern in such structures can be viewed as the intersection of two patterns. Conceptually, the A-Intersect operator is equivalent to the JOIN operator in the relational algebra. It operates on two operand association-sets over a set of specified classes. Two patterns, one from each association-set, are combined into one if they contain the same set of Inner-patterns for each specified class.
The A-Intersect operation is defined as follows:

$$\alpha_{\{X\}} \bullet\{W\}\, \beta_{\{Y\}} = \{\, \gamma \mid \gamma^k = (\alpha^i, \beta^j) : \forall(CL_n \in \{W\})\forall(@ \in CL_n \wedge @ \in \alpha^i)(@ \in \beta^j) \wedge \forall(CL_n \in \{W\})\forall(@ \in CL_n \wedge @ \in \beta^j)(@ \in \alpha^i) \,\}$$

Figure 4.5e shows an example of the A-Intersect operation over classes B and C. The resultant association-set contains four patterns, which are the intersections of $\alpha^1$ with $\beta^1$, $\alpha^1$ with $\beta^2$, $\alpha^2$ with $\beta^1$, and $\alpha^2$ with $\beta^2$, respectively, since they all have Inner-patterns $(b_1)$ and $(c_2)$. Other patterns ($\alpha^3$, $\alpha^4$, $\beta^3$, $\beta^4$) fail to produce new patterns because they either have no Inner-pattern in both classes B and C or have no common Inner-pattern of class C.

The set of classes $\{W\}$ can be omitted when the A-Intersect operation is performed on all the common classes of its operands, i.e., $\{W\} = \{X\} \cap \{Y\}$ is implied. Since a lattice pattern can be transformed into a set of other simple patterns, an A-Intersect operation for building a complex pattern can be replaced by an Associate operation followed by an A-Select operation (see Section 4 for detail). The A-Intersect operator is commutative, conditionally associative, and idempotent.

$$\alpha \bullet\{W\}\, \beta = \beta \bullet\{W\}\, \alpha \quad \text{(commutativity)}$$

$$(\alpha_{\{X\}} \bullet\{W_1\}\, \beta_{\{Y\}}) \bullet\{W_2\}\, \gamma_{\{Z\}} = \alpha_{\{X\}} \bullet\{W_1\}\, (\beta_{\{Y\}} \bullet\{W_2\}\, \gamma_{\{Z\}}) \quad (\text{if } (\{W_1\} - \{W_2\}) \cap \{Z\} = \emptyset \wedge (\{W_2\} - \{W_1\}) \cap \{X\} = \emptyset) \quad \text{(associativity)}$$

$$\alpha \bullet \alpha = \alpha \quad (\text{if } \alpha \text{ is a homogeneous association-set}) \quad \text{(idempotency)}$$

The associativity is not always true because there are cases in which a pattern of $\beta$ which fails to intersect with any pattern of $\gamma$ may succeed by first intersecting with a pattern of $\alpha$ in the operation ($\bullet\{W_1\}$) and then intersecting with a pattern of $\gamma$ in the operation ($\bullet\{W_2\}$).

Now we define three set operators, which are different from the corresponding set operators in relational algebra, since they operate on heterogeneous structures as well as homogeneous structures.

(7) A-Integrate ($\int$):

The A-Integrate is a unary operator. It reorganizes patterns in an association-set according to the relationships among patterns with respect to the classes specified.
The A-Integrate operation is defined as follows:

$$\int(\alpha)\{W\} = \{\, \gamma \mid \gamma = (\alpha_s) : \forall(k, CL_n \in \{W\} \wedge @ \in CL_n \wedge @ \in \alpha^j \wedge \alpha^j \in \alpha_s)(@ \in \alpha^k \wedge \alpha^k \in \alpha_s) \,\}$$

By this definition, a subset of patterns ($\alpha_s$) of $\alpha$ is combined into a single pattern if every object instance of classes in $\{W\}$ that appears in a pattern in the subset is also contained in all other patterns in the subset. If a pattern of $\alpha$ cannot be combined with any other pattern, it is retained in the resultant association-set as it is. If no class is specified, patterns in which every pattern has at least one object instance (of any class) common to another will be integrated into one pattern. The reorganized association-set will contain patterns which are apart from each other (refer to Section 4.2).

Figure 4.5f shows two examples. The first example shows an A-Integrate operation over class A. Patterns that have a common Inner-pattern of class A are grouped into one ($\gamma^1$ is the integration of $\alpha^1$, $\alpha^2$, and $\alpha^3$; and another resultant pattern is the integration of two further patterns of $\alpha$). All other patterns in $\alpha$ are retained in the result as they are. The second example illustrates an A-Integrate operation on the same association-set of the first example but without specifying a class. The result becomes two patterns, which are apart and are exactly the same as they appear in the original database, whereas in the result of the first example the same primitive patterns appear more than once.

(8) A-Union ($+$):

Similar to the UNION operation of the relational algebra, A-Union combines two association-sets into one. However, these two association-sets can contain heterogeneous association structures. It is important for A-algebra to be able to operate on heterogeneous structures because some prior operations may produce heterogeneous association-sets, which may need to be further processed over the objects of a common class against other patterns of associations. Unlike the relational algebra and other O-O query languages, union-compatibility is not a restriction in A-algebra.
For this reason, A-algebra has more expressive power. Any query that can be expressed by a single expression in other languages can be expressed as a single A-algebra expression, but not vice versa. The A-Union operation is defined as follows:

$$\alpha + \beta = \{\, \gamma \mid \gamma \in \alpha \vee \gamma \in \beta \,\}$$

The A-Union operator is commutative, associative, and idempotent:

$$\alpha + \beta = \beta + \alpha \quad \text{(commutativity)}$$

$$(\alpha + \beta) + \gamma = \alpha + (\beta + \gamma) \quad \text{(associativity)}$$

$$\alpha + \alpha = \alpha \quad \text{(idempotency)}$$

(9) A-Difference (-): The A-Difference implements the same concept as the DIFFERENCE operator in relational algebra, but with two differences. First, its operands do not have to be union-compatible. Second, a pattern in the minuend is retained if it does not contain any of the patterns in the subtrahend.

$$\alpha - \beta = \{\, \gamma \mid \gamma = \alpha^i : \nexists (\beta^j)(\beta^j \subseteq \alpha^i) \,\}$$

The example depicted in Figure 4.5g shows that $\alpha^1$ and $\alpha^3$ are dropped since they both contain a pattern of $\beta$.

(10) A-Divide (÷): The A-Divide operator implements the concept that a group of patterns with certain common features contains another set of patterns.

$$\alpha \div_{\{W\}} \beta = \{\, \gamma \mid \gamma = \alpha^i,\ \alpha^i \in \alpha_s : \forall (\beta^k)(\beta^k \subseteq \alpha_s) \,\}$$

where $\alpha_s$ is a subset of the patterns of $\alpha$ that have common Inner-patterns for all classes of {W} and that together contain all patterns of $\beta$. If {W} is not specified, the A-Divide operation retains all the patterns of $\alpha$, provided that each of them contains at least one pattern of $\beta$ and that together they contain all patterns of $\beta$.

Figure 4.5h shows an example of $\alpha$ being divided by $\beta$ with respect to class B. The A-Divide operation retains $\alpha^1$, $\alpha^2$, and $\alpha^3$ since they all contain Inner-pattern ($b_1$) of B and together contain all patterns of $\beta$.

4.3.3 Precedence

The precedence relationships of the above operators are as follows. Unary operators have higher precedence than binary operators. The precedence of the seven binary association operators is given in the following order: *, |, ,, , and +. Parentheses can be used to alter the precedence relationships.

4.3.4 Summary of operators

(1) Associate (*): Two patterns are concatenated via an Inter-pattern.
(2) A-Complement (|): Two patterns are concatenated via a Complement-pattern.
(3) A-Select (σ): A pattern is retained if it satisfies the predicate.
(4) A-Project (Π): A subpattern is projected from the original pattern.
(5) NonAssociate (!): Two patterns are concatenated via a Complement-pattern only if each of them cannot be concatenated with any pattern of the other operand via an Inter-pattern.
(6) A-Intersect (•): Two patterns are combined into a single pattern if their common classes have common objects.
(7) A-Integrate (∫): Patterns in an association-set are combined if objects of a specified class in a pattern are common to these patterns.
(8) A-Union (+): Two association-sets are lumped into a single set.
(9) A-Difference (-): A pattern in the minuend is retained if it does not contain any pattern in the subtrahend.
(10) A-Divide (÷): A subset of patterns in the dividend that have certain common features and contain all the patterns in the divisor is retained.

4.4 Query Examples

We have formally defined ten association operators and given their simple mathematical properties. Before exploring other properties, we give some examples to illustrate how these operators can be used to formulate queries for processing an O-O database. There can be many alternative expressions for the same query. Choosing the best one for execution is the task of a query optimizer. The mathematical properties of these operators can be used for that purpose. In the following formulation of algebraic expressions, we assume that the user is using the algebra directly instead of a high-level query language. In the latter case, the task of generating algebraic expressions would belong to the translator.

To formulate an A-algebra expression for a query, we first need to construct an intensional pattern for it by navigating the schema graph of the database, as illustrated in Chapter 3. Then, each edge of the pattern is marked with an operator *, |, or !, depending on the intended semantics.
For simple patterns, the formulation is straightforward. For patterns with complex structures, we may have to decompose them into patterns with simpler structures. The expression for the original pattern is then the A-Intersect of the expressions for the decomposed patterns.

First, we formulate expressions for Query 1 to Query 4 given in Chapter 3. We have identified the intensional patterns for these queries (see Figure 3.3).

Query 1: For all sections, get the majors of students who are taking these sections.

It is trivial to write an algebraic expression for Query 1, which is represented by a linear pattern. For this pattern, the two edges are both marked with *, and the algebraic expression can be formulated as follows:

    ∫(Π(Section * Student * Department)[Section, Department; Section:Department]){Section}

where the A-Integrate operation groups the resultant patterns by Section.

Query 2: List students who major and minor in the same department.

For Query 2, the edges of the intensional pattern shown in Figure 3.3c are all marked with *. Since this loop structure can be viewed as the A-Intersect of two linear patterns involving both Student and Department, we have

    Π(Student * Undergrad * Department • Student * Department)[Student]

where the A-Project operation gets the student objects that satisfy the association pattern as required by the query.

Query 3: For those students taking section 300 and having majors and/or minors, get their majors and/or minors.

The expression for the intensional pattern of Query 3 is as follows:

    Section# * Section * (Student * Department + Student * Undergrad * Department_1)

where the A-Union operator is used to realize the OR condition at the class Student. As long as a student has a major or a minor, the linear pattern from Student to Department and the linear pattern from Student to Undergrad and then to Department should be retained. In the expression, Department_1 is an alias of Department, which is used to distinguish major and minor departments.
Since the query asks for the majors and minors of students who are taking section 300, the A-Select and A-Project operations are used. Thus, we have

    ∫(Π(σ(α)[Section#=300])[Student, Department, Department_1; Student:Department, Student:Department_1]){Student}

where α is the expression for the intensional pattern given above. As shown in Figure 3.3g, the result of this expression contains the derived patterns, which are specified by the clause following the semicolon in the A-Project operation, and is reorganized by an A-Integrate operation. Note that Query 3 cannot be phrased in a single relational algebra expression since (a) the union operation in relational algebra requires operands to be union-compatible, (b) using a join operation on Student can cause a loss of information because not every student has both a major and a minor, (c) the Cartesian product of the majors and minors will produce erroneous results, and (d) no other operation in the relational algebra can combine two relations into one.

Query 4: For each teacher, list the sections which he/she does not teach.

The algebraic expression for Query 4 can be easily formulated as follows, since it is represented by the linear pattern shown in Figure 3.3h. We note that the A-Complement operator |, rather than the NonAssociate operator !, should be used for this query, since a teacher may be teaching some courses.

    Teacher | Section

Several other query examples are given below. They use the schema graph given in Figure 3.1. Their corresponding intensional patterns are depicted in Figure 4.6.

Query 5: List the names of students who teach in the same departments as their major departments.

We can see from Figure 4.6 that the intensional pattern for this query can be constructed in two ways. One way is to decompose it into three linear patterns: Name-Person-Student, Student-Department, and Student-Grad-TA-Teacher-Department. The A-Intersect of these three patterns will produce a pattern that satisfies this query.
    Π(Student * Person * Name • Student * Department • Student * Grad * TA * Teacher * Department)[Name]

where the first A-Intersect operation operates over Student and the second operates over Student and Department. The A-Project operation projects the names of these students. Another way is to decompose the intensional pattern into two linear patterns: Name-Person-Student-Department and Student-Grad-TA-Teacher-Department. Therefore, we have an alternative expression

    Π(Name * Person * Student * Department • Student * Grad * TA * Teacher * Department)[Name]

Query 6: List the section# of those sections which have not been assigned a room or have not been assigned a teacher.

Since the query requests sections that have not been assigned a room or a teacher, these sections must not be connected with any room or any teacher (i.e., a section which is associated with neither a room nor a teacher should also be retained in the result). Therefore, there should be Complement-patterns between Section and Teacher and between Section and Room, and a single arc between these two branches, as shown in Figure 4.6. We emphasize that the ! operation, instead of |, should be used to construct these two Complement-patterns. The algebraic expression for this query can then be easily formulated as follows:

    Π(Section# * (Section ! Room# + Section ! Teacher))[Section#]

Query 7: List the names of students who take courses 6010 and 6020.

We shall show three ways of formulating an expression for this query. First, the intensional pattern for Query 7 shown in Figure 4.6 can be constructed by the A-Intersect of two linear patterns, as we did for Query 5:

    Π(σ(Name * Person * Student * Enrollment * Course * Course#)[Course#=6010] • σ(Student * Enrollment_1 * Course_1 * Course#_1)[Course#=6020])[Name]

where Enrollment_1, Course_1, and Course#_1 are the aliases of the classes Enrollment, Course, and Course#, respectively. This ensures that the A-Intersect operation will be performed only over the Student class.
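The first formulation of Query 7 can be illustrated with a minimal Python sketch. It is not the implementation used in this work: it assumes that a pattern can be modelled as a frozenset of (class name, object) pairs, ignoring the edges between objects, and that two patterns are intersected when, for every class in {W}, they hold the same non-empty set of objects. The names `pat`, `objects_of`, and `a_intersect`, as well as the sample data, are hypothetical.

```python
from itertools import product


def pat(*pairs):
    """Build a pattern from (class_name, object_id) pairs (assumed model)."""
    return frozenset(pairs)


def objects_of(pattern, cls):
    """Return the objects of class `cls` that occur in `pattern`."""
    return {obj for (c, obj) in pattern if c == cls}


def a_intersect(alpha, beta, classes):
    """Sketch of A-Intersect over `classes`: a pattern of alpha and a pattern
    of beta are combined when, for every class in `classes`, both hold the
    same non-empty set of objects (an assumption about the definition)."""
    result = set()
    for p, q in product(alpha, beta):
        if all(objects_of(p, c) and objects_of(p, c) == objects_of(q, c)
               for c in classes):
            result.add(p | q)
    return result


# Patterns already restricted by A-Select: Course# = 6010 on the left,
# the aliased Course#_1 = 6020 on the right (hypothetical sample data).
alpha = {
    pat(("Name", "Ann"), ("Student", "s1"), ("Course#", 6010)),
    pat(("Name", "Bob"), ("Student", "s2"), ("Course#", 6010)),
}
beta = {pat(("Student", "s1"), ("Course#_1", 6020))}

combined = a_intersect(alpha, beta, {"Student"})
# A-Project on Name: keep only the Name objects of the combined patterns.
names = {obj for p in combined for (c, obj) in p if c == "Name"}
print(sorted(names))  # ['Ann']
```

Because the right-hand operand uses aliased class names, Student is the only class the two operands share, which mirrors the role of the aliases in the expression above.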
A second way is to view the original pattern as a linear pattern without restriction on Course#, as follows:

    Name-Person-Student-Enrollment-Course-Course#

Students who are taking both courses must participate in at least two such patterns, with Course#=6010 and Course#=6020, respectively. This implies an A-Divide operation. Thus, the query can be formulated as follows:

    Π(Name * Person * Student * Enrollment * Course * Course# ÷{Student} σ(Course.Course#)[Course#=6010 ∨ Course#=6020])[Name]

where the dot in Course.Course# is used only for identifying the Course# class which is defined in the Course class. It does not represent a function or a method as in other languages. This expression can also be rewritten as follows:

    Π(Name * Person * Π(Student * Enrollment * Course * Course# ÷{Student} σ(Course.Course#)[Course#=6010 ∨ Course#=6020])[Student])[Name]

which is more suitable for execution than the previous one, since the inner A-Project gets the student objects who are taking these two courses, so that all other data associated with these students, such as Enrollment, Course, and Course#, do not have to be carried along in further processing to get the names of these students. Details of optimization issues will be addressed in the next chapter.

We stress that the above association pattern expressions represent the internal algebraic operations that need to be performed if the dynamic inheritance method is used. The high-level query statements issued by the user that correspond to these algebraic expressions can be much simpler, due to the inheritance of attributes in the generalization hierarchy or lattice.
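The A-Divide formulation of Query 7 can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation of the algebra: a pattern is modelled as a frozenset of (class name, object) pairs, "contains" is taken to mean set inclusion, and a group of dividend patterns is retained as a whole when every divisor pattern is contained in at least one member of the group. The names `pat` and `a_divide` and the sample data are hypothetical.

```python
from collections import defaultdict


def pat(*pairs):
    """Build a pattern from (class_name, object_id) pairs (assumed model)."""
    return frozenset(pairs)


def a_divide(alpha, beta, classes):
    """Sketch of A-Divide with respect to `classes`.

    Patterns of alpha are grouped by their objects of the given classes.
    A group is retained when every pattern of beta is contained in at least
    one pattern of the group (one reading of "together contain")."""
    groups = defaultdict(set)
    for p in alpha:
        key = frozenset(pair for pair in p if pair[0] in classes)
        if key:  # patterns without any object of the classes are skipped
            groups[key].add(p)
    kept = set()
    for group in groups.values():
        if all(any(b <= p for p in group) for b in beta):
            kept |= group
    return kept


# Dividend: one linear pattern per enrollment (hypothetical sample data).
alpha = {
    pat(("Name", "Ann"), ("Student", "s1"), ("Course#", 6010)),
    pat(("Name", "Ann"), ("Student", "s1"), ("Course#", 6020)),
    pat(("Name", "Bob"), ("Student", "s2"), ("Course#", 6010)),
}
# Divisor: the Course# objects selected by Course#=6010 or Course#=6020.
beta = {pat(("Course#", 6010)), pat(("Course#", 6020))}

kept = a_divide(alpha, beta, {"Student"})
# A-Project on Name: keep only the Name objects of the retained patterns.
names = {obj for p in kept for (c, obj) in p if c == "Name"}
print(sorted(names))  # ['Ann']
```

Only the student whose patterns together cover both course numbers survives the division, which is the behaviour the second formulation relies on.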
Figure 4.1 Regular-edges and Complement-edges in an OG

Figure 4.2 Examples of association patterns: (a) primitive association patterns (Inner-pattern, Inter-pattern, Complement-pattern, D-Inter-pattern, and D-Complement-pattern, each with a graphical and an algebraic representation); (b) complex association patterns

Figure 4.3 Examples of association-sets

Figure 4.4 A sample database association graph (The Complement-patterns are not shown)

Figure 4.5 Example of operations: (a) an Associate operation; (b) an A-Complement operation; (c) an A-Project operation; (d) a NonAssociate operation; (e) an A-Intersect operation; (f) A-Integrate operations; (g) an A-Difference operation; (h) an A-Divide operation

Figure 4.6 Intensional patterns of Query 5, 6, and 7

CHAPTER 5
MATHEMATICAL PROPERTIES OF OPERATORS AND THEIR APPLICATIONS IN QUERY OPTIMIZATION AND QUERY DECOMPOSITION

In Section 4.3, we have shown some mathematical properties of individual operators. In this chapter, we shall study their properties systematically. The properties of A-algebra are classified into six categories: (1) conventional algebraic properties such as commutativity, associativity, idempotency, nilpotency, and distributivity; (2) nesting of two unary operations; (3) a binary operation nested in a unary operation; (4) cascading of two different binary operations; (5) general identities; and (6) operation transformation. The properties presented in this dissertation are extensive, but may not be complete.
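These properties provide the mathematical foundation for query decomposition and query optimization. Their use in these two applications is also illustrated in this chapter. The proofs of the properties that are marked with † can be found in the Appendix. The others can be proved similarly.

5.1 Conventional Algebraic Properties

To be systematic, we first list the properties given in Section 4.3 without explanation, since they have been illustrated previously. Then, we give the properties of distributivity.

A. Commutativity

$$\alpha *[R(A,B)]\ \beta = \beta *[R(B,A)]\ \alpha \tag{5.1 †}$$

$$\alpha\ |[R(A,B)]\ \beta = \beta\ |[R(B,A)]\ \alpha \tag{5.2 †}$$

$$\alpha\ ![R(A,B)]\ \beta = \beta\ ![R(B,A)]\ \alpha \tag{5.3 †}$$

$$\alpha \bullet_{\{W\}} \beta = \beta \bullet_{\{W\}} \alpha \tag{5.4 †}$$

$$\alpha + \beta = \beta + \alpha \tag{5.5 †}$$

B. Associativity

$$(\alpha_{\{X\}} *[R(A,B)]\ \beta_{\{Y\}}) *[R(C,D)]\ \gamma_{\{Z\}} = \alpha_{\{X\}} *[R(A,B)]\ (\beta_{\{Y\}} *[R(C,D)]\ \gamma_{\{Z\}}) \quad (C \notin \{X\} \wedge B \notin \{Z\}) \tag{5.6 †}$$

$$(\alpha_{\{X\}}\ |[R(A,B)]\ \beta_{\{Y\}})\ |[R(C,D)]\ \gamma_{\{Z\}} = \alpha_{\{X\}}\ |[R(A,B)]\ (\beta_{\{Y\}}\ |[R(C,D)]\ \gamma_{\{Z\}}) \quad (C \notin \{X\} \wedge B \notin \{Z\}) \tag{5.7 †}$$

$$(\alpha_{\{X\}} \bullet_{\{W_1\}} \beta_{\{Y\}}) \bullet_{\{W_2\}} \gamma_{\{Z\}} = \alpha_{\{X\}} \bullet_{\{W_1\}} (\beta_{\{Y\}} \bullet_{\{W_2\}} \gamma_{\{Z\}}) \quad ((\{W_1\} - \{W_2\}) \cap \{Z\} = \varnothing \wedge (\{W_2\} - \{W_1\}) \cap \{X\} = \varnothing) \tag{5.8 †}$$

$$(\alpha + \beta) + \gamma = \alpha + (\beta + \gamma) \tag{5.9 †}$$

C. Idempotency and Nilpotency

$$\alpha \bullet \alpha = \alpha \quad \text{(if } \alpha \text{ is a homogeneous association-set)} \tag{5.10}$$

$$\alpha + \alpha = \alpha \tag{5.11}$$

$$A *[R(A,A)]\ A = A \tag{5.12}$$

$$A\ ![R(A,A)]\ A = \varnothing \tag{5.13}$$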
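Properties (5.5), (5.9), and (5.11) can be checked mechanically on small examples. The following Python sketch assumes that an association-set is modelled as a set of patterns and that each pattern is a frozenset of object labels, so that A-Union reduces to ordinary set union over possibly heterogeneous patterns. The names `a_union` and `union_laws_hold` and the sample association-sets are hypothetical.

```python
def a_union(alpha, beta):
    """A-Union as plain set union of two association-sets (assumed model);
    the operands need not contain patterns of the same structure."""
    return frozenset(alpha) | frozenset(beta)


def union_laws_hold(a, b, c):
    """Check commutativity (5.5), associativity (5.9), and idempotency (5.11)
    of A-Union on three given association-sets."""
    commutative = a_union(a, b) == a_union(b, a)
    associative = a_union(a_union(a, b), c) == a_union(a, a_union(b, c))
    idempotent = a_union(a, a) == frozenset(a)
    return commutative and associative and idempotent


# Heterogeneous association-sets: the patterns span different classes.
a = {frozenset({"a1", "b1"}), frozenset({"a2", "b2", "c3"})}
b = {frozenset({"b1", "c1", "d1"})}
c = {frozenset({"a1", "b1"}), frozenset({"c4"})}

laws_ok = union_laws_hold(a, b, c)
print(laws_ok)  # True
```

A query optimizer may use such identities to reorder operands; a check of this kind only confirms the laws on the given samples and does not replace the proofs in the Appendix.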

Full Text |

xml version 1.0 encoding UTF-8
REPORT xmlns http:www.fcla.edudlsmddaitss xmlns:xsi http:www.w3.org2001XMLSchema-instance xsi:schemaLocation http:www.fcla.edudlsmddaitssdaitssReport.xsd INGEST IEID EZS2WFJTR_O925AA INGEST_TIME 2017-07-12T21:18:32Z PACKAGE AA00003326_00001 AGREEMENT_INFO ACCOUNT UF PROJECT UFDC FILES PAGE 1 $662&,$7,21 $/*(%5$ $ 0$7+(0$7,&$/ )281'$7,21 )25 2%-(&725,(17(' '$7$%$6(6 %\ 0,1*6(1 *82 $ ',66(57$7,21 35(6(17(' 72 7+( *5$'8$7( 6&+22/ 2) 7+( 81,9(56,7< 2) )/25,'$ ,1 3$57,$/ )8/),//0(17 2) 7+( 5(48,5(0(176 )25 7+( '(*5(( 2) '2&725 2) 3+,/2623+< 81,9(56,7< 2) )/25,'$ PAGE 2 &RS\ULJKW E\ 0LQJVHQ *XR PAGE 3 'HGLFDWHG WR P\ GHDU ZLIH =KX 6XVLHfn PHQWV DQG KLV SHUVRQDO OLEUDU\ WKDQN 'U 5DQG\ &KRZ IRU KLV HQFRXUDJHPHQW WKURXJKRXW P\ JUDGXDWH VWXG\ ZRXOG OLNH WR WKDQN 'U -RKQ 6WDXGKDPPHU IRU KLV WLPH DQG IRU EHLQJ RQ P\ VXSHUYLVRU\ FRPPLWWHH 0\ VSHFLDO WKDQNV JR WR 6KDURQ *UDQW WKH VHFUHWDU\ RI WKH 'DWDEDVH 6\VWHPV 5HVHDUFK DQG 'HYHORSPHQW &HQWHU ZKRVH KHOS WR PH LV DOZD\V IULHQGO\ DQG LQ WLPH 7KLV UHVHDUFK ZDV VXSSRUWHG E\ WKH 1DWLRQDO 6FLHQFH )RXQGDWLRQ '0& f DQG WKH 1DWLRQDO ,QVWLWXWH RI 6WDQGDUG DQG 7HFKQRORJ\ 1$1%'f 7KH GHYHORSPHQW HIIRUW LV VXSSRUWHG E\ WKH )ORULGD +LJK 7HFKQRORJ\ DQG ,QGXVWULDO &RXQFLO 831ff LV SUHVFULEHG IRU VHUYLQJ DV D PDWKHPDWLFDO IRXQGDWLRQ IRU SURFHVVLQJ GDWDEDVHV ZKLFK LV DQDORJRXV WR WKH XVH RI UHODWLRQDO DOJHEUD IRU SURFHVVLQJ UHODWLRQDO GDWDEDVHV ,Q WKLV DOJHEUD REMHFWV DQG WKHLU DVVRFLDWLRQV LQ DQ GDWDEDVH DUH XQLn IRUPO\ UHSUHVHQWHG E\ DVVRFLDWLRQ SDWWHUQV ZKLFK DUH PDQLSXODWHG E\ D QXPEHU RI RSHUDWRUV WR SURGXFH RWKHU DVVRFLDWLRQ SDWWHUQV 'LIIHUHQW IURP WKH UHODWLRQDO DOJHn EUD LQ ZKLFK VHW RSHUDWLRQV RSHUDWH RQ UHODWLRQV ZLWK XQLRQFRPSDWLEOH VWUXFWXUHV WKH $DOJHEUD RSHUDWRUV FDQ RSHUDWH RQ DVVRFLDWLRQ SDWWHUQV RI ERWK KRPRJHQHRXV DQG KHWHURJHQHRXV VWUXFWXUHV 'LIIHUHQW IURP WKH WUDGLWLRQDO UHFRUGEDVHG UHODWLRQDO SURnr.%06 9OOO PAGE 9 &+$37(5 ,1752'8&7,21 ,Q WKH SDVW WZR GHFDGHV WHFKQLTXHV RI GDWD PRGHOLQJ KDYH JRQH WKURXJK WZR PDMRU FRQFHSWXDO 
FKDQJHV )LUVW LQ HDUO\ V ( ) &RGG REVHUYHG WKDW IXWXUH GDWDEDVH V\VWHPV VKRXOG DOORZ DSSOLFDWLRQ SURJUDPV DQG WHUPLQDO XVHUV WR UHPDLQ XQDIIHFWHG E\ FKDQJHV PDGH WR WKH LQWHUQDO GDWD UHSUHVHQWDWLRQ RU WKH VWRUDJH VWUXFWXUHf RI D GDWDEDVH +H LQWURGXFHG WKH UHODWLRQDO GDWD PRGHO >&2'@ DQG SURSRVHG WKH UHODWLRQDO DOJHEUD DQG UHODWLRQDO FDOFXOXV >&2'D@ DV WKH PDWKHPDWLFDO IRXQGDWLRQ IRU SURFHVVLQJ UHODWLRQDO GDWDEDVHV 7KH UHODWLRQDO PRGHO SURYLGHV WZR OHYHOV RI GDWD LQGHSHQGHQFH LQ D WKUHHOHYHO DUFKLWHFWXUH IRU D GDWDn EDVH PDQDJHPHQW V\VWHP DV VKRZQ LQ )LJXUH ILJXUHV RI HDFK FKDSWHU DUH SODFHG DW WKH HQG RI WKH FKDSWHUf $W WKH ORZHU OHYHO WKH SK\VLFDO GDWD LQGHSHQn GHQFH LV SURYLGHG LH WKH ORJLFDO UHSUHVHQWDWLRQ RI D UHODWLRQDO GDWDEDVH LV D VHW RI UHODWLRQV LH IODW WDEOHVf ZKLFK LV LQGHSHQGHQW RI WKH SK\VLFDO GDWD DQG VWRUDJHf VWUXFWXUHV LQ ZKLFK GDWD DUH VWRUHG $W WKH KLJKHU OHYHO WKH ORJLFDO GDWD LQGHSHQn GHQFH LV SURYLGHG LH WKH H[WHUQDO YLHZ UHPDLQV XQFKDQJHG ZKHQ WKH ORJLFDO YLHZ RI D GDWDEDVH LV PRGLILHG QRWH WKDW WKH H[WHUQDO YLHZ UHPDLQV XQFKDQJHG RQO\ IRU VRPH VFKHPD PRGLILFDWLRQVf %HVLGHV VLPSOH ORJLFDO UHSUHVHQWDWLRQ DQG GDWD LQGHSHQGHQFH WKH IDFW WKDW WKH UHODWLRQDO PRGHO KDV D VROLG PDWKHPDWLFDO IRXQGDnn SOH[ REMHFW DUH VFDWWHUHG DPRQJ D QXPEHU RI QRUPDOL]HG UHODWLRQV DQG DFFHVVLQJ WKDW GDWD LQYROYHV WLPHFRQVXPLQJ WUDYHUVDO DQG DVVHPEO\ RI GDWD VWRUHG LQ PXOWLn SOH UHODWLRQV 7KH PRGHO DOVR GRHV QRW DOORZ EHKDYLRUDO SURSHUWLHV RI HQWLWLHVREMHFWV WR EH H[SOLFLWO\ GHILQHG 7KH VHFRQG FRQFHSWXDO FKDQJH RI GDWD PRGHOLQJ WHFKQLTXHV RFFXUUHG LQ WKH HDUO\ V 7KH REMHFWRULHQWHG SDUDGLJP ILUVW LQWURGXFHG LQ WKH SURJUDPPLQJ ODQJXDJH 6,08/$ >'$+@ DQG PDGH YHU\ SRSXODU WKURXJK WKH ODQJXDJH 60$//7$/. 
>*2/@ DOORZV ULFKHU VWUXFWXUDO FRQVWUXFWV DQG EHKDYLRUDO SURSHUn WLHV RI REMHFWV WR EH VSHFLILHG DW WKH ORJLFDO OHYHO LQGHSHQGHQW RI WKHLU SK\VLFDO LPSOHPHQWDWLRQV 6HYHUDO IHDWXUHV RI WKH SDUDGLJP VXFK DV DEVWUDFW GDWD W\SHV LQKHULWDQFH HQFDSVXODWLRQ LQIRUPDWLRQ KLGLQJ SRO\PRUSKLVP HWF KDYH EHHQ VKRZQ WR EH XVHIXO IRU GDWD PRGHOLQJ DQG V\VWHP GHYHORSPHQW 7KH REMHFW HQFDSn VXODWLRQ FRQFHSW DGGV D OHYHO RI GDWD LQGHSHQGHQFH EHWZHHQ WKH SK\VLFDO DQG WKH ORJLFDO LQGHSHQGHQFHV LQWURGXFHG LQ WKH UHODWLRQDO PRGHO DV GHSLFWHG LQ )LJXUH ,W UHTXLUHV WKDW WKH VWUXFWXUDO DQG EHKDYLRUDO SURSHUWLHV RI DQ REMHFW EH ORJLFDOO\f HQFDSVXODWHG LQ LWV FODVV LQ WKH FRQFHSWXDO YLHZ RI DQ GDWDEDVH 6LQFH WKHQ D QXPEHU RI 2EMHFW2ULHQWHG f DQG VHPDQWLF GDWD PRGHOV KDYH EHHQ SURSRVHG >+$0 %$7 .,1 =$1D =$1E '$' 0$, 0$1 68 PAGE 11 =' :2( %$1 ),6 +25 +8/ .,0 52: &$5 &2/ 68@ ZKLFK RIIHU PRUH SRZHUIXO FRQVWUXFWV IRU PRGHOLQJ WKH VWUXFWXUDO DQG EHKDYLRUDO SURSHUWLHV RI REMHFWV IRXQG LQ DGYDQFHG DSSOLFDWLRQV VXFK DV &$'&$0 &$6( DQG GHFLVLRQ VXSSRUW V\VWHPV $Q VHPDQWLF GDWD PRGHO FDQ EH VWUXFWXUDOO\ DQGRU EHKDYLRUDOO\ REMHFW RULHQWHG >',7@ $ VWUXFWXUDOO\ GDWD PRGHO LV RQH WKDW HQFRPSDVVHV DW OHDVW WKH IROORZLQJ FKDUDFWHULVWLFV f ,W VXSSRUWV WKH XQLTXH LGHQWLILFDWLRQ RI REMHFWV WKDW LV HDFK REMHFW KDV D XQLTXH REMHFW LGHQWLILHU VXUURJDWHf ZKLFK LV YDOLG IRU WKH OLIHWLPH RI WKH REMHFW f ,W FDWHJRUL]HV WKRVH REMHFWV ZKLFK FDQ EH GHVFULEHG E\ WKH VDPH VHW RI FKDUDFn WHULVWLFV DWWULEXWHVf LQWR DQ REMHFW FODVV f ,W DOORZV DJJUHJDWLRQ DVVRFLDWLRQf KLHUDUFKLHV WR EH GHILQHG f ,W DOORZV JHQHUDOL]DWLRQ DVVRFLDWLRQf KLHUDUFKLHV WR EH GHILQHG 7KH YLHZ RI DQ DSSOLFDWLRQ ZRUOG LV UHSUHVHQWHG LQ WKH IRUP RI D QHWn ZRUN RI FODVVHV DQG DVVRFLDWLRQV 2EMHFW FODVV FDQ EH HLWKHU D SULPLWLYHFODVV ZKRVH LQVWDQFHV DUH RI VLPSOH GDWD W\SHV HJ VWULQJ LQWHJHUf RU D QRQSULPLWLYH FODVV HJ 3DUW 6WXGHQW 7HDFKHUf $W WKH H[WHQVLRQDO OHYHO LQVWDQFHV RI GLIIHUHQW FODVVHV FDQ EH UHODWHG DVVRFLDWHGf ZLWK HDFK RWKHU IRUPLQJ SDWWHUQV RI 
REMHFW DVVRn FLDWLRQV $ EHKDYLRUDOO\ REMHFWRULHQWHG GDWD PRGHO RQ WKH RWKHU KDQG LV RQH LQ ZKLFK RSHUDWLRQV WKDW GHVFULEH WKH EHKDYLRU RI WKH REMHFWV RI D FODVV FDQ EH GHILQHG DQG UHJLVWHUHG ZLWK WKDW FODVV 3URJUDPV RU PHWKRGV WKDW LPSOHPHQW WKH RSHUDn WLRQV GHILQHG IRU DQ REMHFW DUH WUDQVSDUHQW WR WKH XVHU RI WKH REMHFWV PAGE 12 )RU WKHVH PRGHOV WR EH WUXO\ XVHIXO WKH\ PXVW SURYLGH VRPH REMHFW PDQLSXODn WLRQ ODQJXDJHV ZKLFK FDQ WDNH DGYDQWDJH RI WKH H[SUHVVLYH SRZHU RI WKH PRGHOV DQG SURYLGH WKH XVHUV ZLWK VLPSOH DQG SRZHUIXO TXHU\LQJ IDFLOLWLHV 5HFHQWO\ VHYHUDO TXHU\ ODQJXDJHV VXFK DV '$3/$; >6+,@ *(0 >=$1 768@ $5,(/ >0$&@ )$' >%$1@ 326748(/ >52:@ (;&(66 >&$5@ DQG RWKHUV UHSRUWHG LQ >'$' 0$1 6(5 %$1 ),6 %$1 &2/ 6+$@ KDYH EHHQ SURSRVHG 7KHVH ODQJXDJHV ZHUH GHYHORSHG EDVHG RQ GLIIHUHQW SDUDn GLJPV )RU H[DPSOH '$3/$; DQG WKH TXHU\ ODQJXDJH RI >0$1@ DUH EDVHG RQ WKH IXQFWLRQDO SDUDGLJP 7KH TXHU\ ODQJXDJH RI >%$1@ LV EDVHG RQ WKH PHVVDJH SDVVLQJ SDUDGLJP 2WKHU TXHU\ ODQJXDJHV DUH EDVHG RQ WKH UHODWLRQDO SDUDGLJP DQ H[WHQVLRQ RI 48(/ >52: &$5@ DQ H[WHQVLRQ RI 64/ >'$'@ DQG DQ H[WHQVLRQ RI WKH UHODWLRQDO DOJHEUD >&2/@ 7KH TXHU\ ODQJXDJH RI >),6@ LV EDVHG RQ ERWK IXQFWLRQDO DQG UHODWLRQDO SDUDGLJPV DOORZLQJ IXQFWLRQV WR EH XVHG LQ REMHFWRULHQWHG 64/ 264/f FRQVWUXFWV 7KH DERYH ODQJXDJHV KDYH DQ IODYRU DQG KDYH WDNHQ VLJQLILFDQW VWHSV WRZDUGV WKH GHYHORSPHQW RI D SRZHUIXO TXHU\ ODQJXDJH 4XHU\ ODQJXDJHV VXFK DV '$3/$; >6+,@ *(0 >=$1@ $5,(/ >0$&@ DQG WKH REMHFW RULHQWHG TXHU\ ODQJXDJH GHVFULEHG LQ >%$1@ DUH EDVHG RQ WKH YLHZ RI D GDWDn EDVH GHILQHG LQ WHUPV RI REMHFWV REMHFW FODVVHV DQG WKHLU DVVRFLDWLRQV $ TXHU\ LQ WKHVH ODQJXDJHV LV IRUPXODWHG E\ VSHFLI\LQJ RQH FODVV XVXDOO\ D QRQSULPLWLYHFODVV ZKRVH LQVWDQFHV DUH UHDO ZRUOG REMHFWVf LQ WKH VFKHPD DV D FHQWUDO FODVV ZLWK VRPH SDWK H[SUHVVLRQV (DFK SDWK H[SUHVVLRQ VWDUWV IURP WKH FHQWUDO FODVV DQG HQGV DW DQRWKHU FODVV XVXDOO\ D SULPLWLYHFODVV ZKRVH LQVWDQFHV DUH RI EDVLF GDWD W\SHV PAGE 13 VXFK DV LQWHJHU VWULQJ VHW 
HWFffnf 1HYHUWKHOHVV WKH RWKHU RSHUDnf DQG $5,(/ SURYLGHV WKH RI FRQn VWUXFW 3DUWV RI 6XSSOLHUVf WR QDYLJDWH IURP WKH FODVV 6XSSOLHUV WR WKH FODVV 3DUWV WR SURGXFH REMHFW SDLUV VLSL DQG VSf +RZHYHU WKH\ GR QRW KDYH D ODQJXDJH FRQVWUXFW IRU VSHFLI\LQJ WKH VHPDQWLFV WKDW VL GRHV QRW VXSSO\ S DQG V GRHV QRW VXSSO\ SL 6LPLODUO\ LQ IXQFWLRQDO ODQJXDJHV RQO\ WKH IXQFWLRQ 3DUWV6XSSOLHUVf LV SURYLGHG WR VSHFLI\ WKH DVVRFLDWLRQV RI VLSL DQG VS EXW QRW WKH QRQDVVRFLDWLRQ RI VXSSOLHUV DQG SDUWV ,Q YLHZ RI WKH GLVDGYDQWDJHV RI WKH H[LVWLQJ TXHU\ ODQJXDJHV ZH ZRXOG OLNH WR VWUHVV WKH LPSRUWDQFH RI XVLQJ D JUDSK DV WKH ORJLFDO UHSUHVHQWDWLRQ RI DQ GDWDEDVH DW ERWK LQWHQVLRQDO DQG H[WHQVLRQDO OHYHOV DV H[HPSOLILHG E\ >/(&@ )$' >%$1@ DQG 26$0r >68@ 7KH TXHU\ ODQJXDJH DQG LWV XQGHUn O\LQJ DOJHEUD VKRXOG SURYLGH FRQVWUXFWV WR GLUHFWO\ SURFHVV JUDSKV ZLWK GLIIHUHQW GHJUHHV RI FRPSOH[LW\ 7KH\ VKRXOG DOVR VXSSRUW WKH VSHFLILFDWLRQ RI QRQn DVVRFLDWLRQV DQG WKH SURFHVVLQJ RI KHWHURJHQHRXV VWUXFWXUHV )XUWKHUPRUH WKH FORn VXUH SURSHUW\ VKRXOG EH PDLQWDLQHG ,Q WKLV GLVVHUWDWLRQ ZH SURSRVH DQ DVVRFLDWLRQ DOJHEUD $DOJHEUDf EDVHG RQ WKH JUDSK UHSUHVHQWDWLRQ RI GDWDEDVHV DQG WKH DVVRFLDWLRQEDVHG TXHU\ IRUPXn ODWLRQ UHIHU WR &KDSWHU f $QDORJRXV WR WKH GHYHORSPHQW RI WKH UHODWLRQDO DOJHn EUD IRU UHODWLRQDO GDWDEDVHV WKH GHYHORSPHQW RI WKH $DOJHEUD SURYLGHV WKH IRUPDO IRXQGDWLRQ IRU TXHU\ SURFHVVLQJ DQG RSWLPL]DWLRQ LQ GDWDEDVHV DQG IRU PAGE 16 GHVLJQLQJ TXHU\ ODQJXDJHV 8QOLNH WKH UHFRUGWXSOHfEDVHG UHODWLRQDO DOJHEUD >&2' DQG &2'@ DQG WKH TXHU\ DOJHEUD >6+$@ WKH $DOJHEUD LV DVVRFLDWLRQEDVHG LH WKH GRPDLQ RI WKH DOJHEUD LV VHWV RI DVVRFLDWLRQ SDWWHUQV HJ OLQHDU VWUXFWXUHV WUHHV ODWWLFHV QHWZRUNV HWFf DQG SURFHVVLQJ DQ GDWDn EDVH LV EDVHG RQ WKH PDWFKLQJ DQG PDQLSXODWLRQ RI KRPRJHQHRXV DV ZHOO DV KHWHURn JHQHRXV SDWWHUQV RI REMHFW DVVRFLDWLRQV 2SHUDWRUV RI WKH $DOJHEUD FDQ EH XVHG WR QDYLJDWH D QHWZRUN RI LQWHUFRQQHFWHG REMHFW FODVVHV DORQJ WKH SDWK RI LQWHUHVW WR FRQVWUXFW D FRPSOH[ SDWWHUQ DV WKH 
VHDUFK FRQGLWLRQ 7KH\ FDQ DOVR EH XVHG WR GHFRPSRVH D FRPSOLFDWHG SDWWHUQ LQWR VLPSOH RQHV 7HQ RSHUDWRUV KDYH EHHQ GHILQHG IRU WKH DOJHEUD WKUHH XQDU\ RSHUDWRUV >$6HOHFW Uf $3URMHFW f DQG $ ,QWHJUDWH f@ DQG VHYHQ ELQDU\ RSHUDWRUV >$VVRFLDWH rf $&RPSOHPHQW _f $ 8QLRQ f $'LIIHUHQFH f $'LYLGH Af 1RQ$VVRFLDWH f DQG $,QWHUVHFW fff 2EMHFW *UDSK 2*fa? ORJLFDO GDWD LQGHSHQGHQFH O SK\VLFDO GDWD U LQGHSHQGHQFH )LJXUH 'DWD LQGHSHQGHQFLHV LQ UHODWLRQDO GDWDEDVHV PAGE 19 a? ORJLFDO GDWD LQGHSHQGHQFH HQFDSVXODWLRQ SK\VLFDO GDWD U LQGHSHQGHQFH )LJXUH $UFKLWHFWXUH RI GDWDEDVHV PAGE 20 &+$37(5 $ 6859(< 2) 5(/$7(' 5(6($5&+ 7KLV VHFWLRQ VXUYH\V VRPH RI WKH H[LVWLQJ ZRUN UHODWHG WR WKH GHYHORSPHQW RI WKH $DOJHEUD 6HFWLRQ GHVFULEHV WKH UHODWLRQDO PRGHO DQG WKH UHODWLRQDO DOJHn EUD ZKLOH 6HFWLRQ VXUYH\V VRPH H[LVWLQJ TXHU\ ODQJXDJHV GHVLJQHG IRU VHPDQWLF GDWD PRGHOV 7KH TXHU\ DOJHEUD UHFHQWO\ DSSHDUHG LQ WKH OLWHUDWXUH LV VXUYH\HG LQ 6HFWLRQ 5HODWLRQDO 0RGHO DQG 5HODWLRQDO $OJHEUD :KHQ WKH KLHUDUFKLFDO DQG QHWZRUN GDWD PRGHOV ZHUH XVHG H[WHQVLYHO\ LQ LQIRUPDWLRQ V\VWHPV LQ WKH ODWH V &RGG >&2'@ UDLVHG DQ LQWHUHVWLQJ DQG LPSRUWDQW TXHVWLRQ &DQ DSSOLFDWLRQ SURJUDPV DQG WHUPLQDO DFWLYLWLHV UHPDLQ LQYDULDQW DV WKH LQWHUQDO GDWD UHSUHVHQWDWLRQV SK\VLFDO UHSUHVHQWDWLRQVf FKDQJH" +H DVVHUWHG WKDW WKH IXWXUH XVHUV RI ODUJH GDWD EDQNV PXVW EH SURWHFWHG IURP KDYn LQJ WR NQRZ KRZ WKH GDWD ZHUH RUJDQL]HG LQ WKH PDFKLQH )ROORZLQJ WKLV UDWLRQDOH KH FRQFHLYHG WKH QRWLRQ RI GDWD LQGHSHQGHQFH ZKLFK VXJJHVWV WKDW WKH ORJLFDO RUJDQL]DWLRQ RI GDWD VKRXOG EH LQGHSHQGHQW RI LWV SK\VLFDO UHSUHVHQWDWLRQ 'HWHUPLQHG WR GHPRQVWUDWH WKH YDOLGLW\ RI KLV GDWD LQGHSHQGHQFH FRQFHSW KH SURn SRVHG D UHODWLRQDO GDWD PRGHO EDVHG RQ QDU\ UHODWLRQV PAGE 21 7KH VFKHPH RI D UHODWLRQ 5 RI DQ HQWLW\ VHW ^(Y ( (Q` LV GHILQHG RQ D VHW RI P DWWULEXWHV ^$Y $ $P` ZKLFK FRUUHVSRQG WR P GRPDLQV ^'Y QRW QHFHVVDULO\ GLVWLQFWf (DFK HQWLW\ WKH LQVWDQFH RI WKH VFKHPHf LV UHSUHVHQWHG E\ DQ PDU\ WXSOH ZKLFK KDV LWV ILUVW DWWULEXWH YDOXH IURP 
'Y LWV VHFRQG DWWULEXWH IURP 'Y DQG VR IRUWK $ VHW RI DWWULEXWHV RI D UHODWLRQ LV FDOOHG D NH\ LI WKH HQWLWLHV RI WKH UHODWLRQ FDQ EH XQLTXHO\ LGHQWLILHG E\ WKH YDOXHV RI WKHVH DWWULEXWHV ,Q SDUWLFXODU WKH LQIRUPDWLRQ RI WKH VXSSOLHUV VXFK DV WKHLU QDPHV DGGUHVVHV LWHPV WKH\ VXSSO\ DQG WKH SULFHV RI WKH LWHPV FDQ EH UHSUHVHQWHG E\ WKH UHODWLRQ 6833/,(56 RI WKH IROORZLQJ VFKHPH 6833/,(5661$0( 6$''5(66 ,7(0 35,&(f ZKHUH WKH DWWULEXWHV 61$0( DQG ,7(0 IRUP D FRPSRVLWH NH\ 'DWD UHSUHVHQWHG LQ WKLV IRUP ZKLFK LQWXLWLYHO\ LV D IODW WDEOH LV WKH ORJLFDO YLHZ RI DQ DSSOLFDWLRQ ZRUOG ,W KDV QRWKLQJ WR GR ZLWK WKH SK\VLFDO UHSUHVHQWDWLRQ RI WKH GDWD :KHQ GHVLJQLQJ D GDWDEDVH XVLQJ WKH UHODWLRQDO PRGHO RQH LV RIWHQ IDFHG ZLWK D FKRLFH DPRQJ DOWHUQDWLYH VHWV RI UHODWLRQ VFKHPHV 6RPH FKRLFHV DUH PRUH IDYRUn DEOH WKDQ RWKHUV IRU YDULRXV UHDVRQV )RU H[DPSOH WKH UHODWLRQ 6833/,(56 LV QRW D GHVLUDEOH VFKHPH EHFDXVH LW KDV WKH IROORZLQJ SRWHQWLDO SUREOHPV f 5HGXQn GDQF\ WKH DGGUHVV RI WKH VXSSOLHU LV UHSHDWHG RQFH IRU HDFK LWHP VXSSOLHG f 3RWHQWLDO LQFRQVLVWHQF\ XSGDWH DQRPDOLHVf fÂ§ DV D FRQVHTXHQFH RI WKH UHGXQGDQF\ WKH XSGDWH RI WKH DGGUHVV RI D VXSSOLHU LQ RQH WXSOH ZLOO OHDYH LW LQFRQVLVWHQW ZLWK WKH DGGUHVV RI DQRWKHU WXSOH f ,QVHUWLRQ DQRPDOLHV WKH DGGUHVV RI D VXSSOLHU FDQQRW EH UHFRUGHG LI WKDW VXSSOLHU GRHV QRW FXUUHQWO\ VXSSO\ DW OHDVW RQH LWHP PAGE 22 VLQFH 61$0( DQG ,7(0 IRUP D FRPSRVLWH NH\ RI WKH UHODWLRQ 6833/,(56 f 'HOHWLRQ DQRPDOLHV WKH LQYHUVH WR SUREOHP f LV WKDW VKRXOG DOO WKH LWHPV VXSn SOLHG E\ RQH VXSSOLHU EH GHOHWHG ZH XQLQWHQWLRQDOO\ ORVH WKH DGGUHVV RI WKDW VXSn SOLHU 7KH FDXVHV RI WKHVH SUREOHPV DQG WKHLU VROXWLRQV DUH UHOHYDQW WR WKH IXQFnn WLRQV LQ WKHVH QRUPDO IRUPV PD\ KDYH WR EH IXUWKHU GHFRPSRVHG LQWR 1) RU 1) WR HOLPLQDWH PXOWLYDOXHG GHSHQGHQFLHV >)$* '(/ DQG =$1@ DQG MRLQ GHSHQGHQFLHV >$+@ 7KLV GHFRPSRVLWLRQ LV QHHGHG WR HOLPLQDWH IXUWKHU UHGXQn GDQF\ DQG DQRPDOLHV 7KH VXFFHVV DQG SRSXODULW\ RI WKH UHODWLRQDO PRGHO DQG WKH UHODWLRQDO GDWDn EDVH 
PDQDJHPHQW V\VWHPV '%06Vf DUH GXH WR LWV VLPSOLFLW\ LQ VWUXFWXUDO WDEXODUf UHSUHVHQWDWLRQ DQG LWV VRXQG WKHRUHWLFDO EDVLV WKH UHODWLRQDO DOJHEUD DQG WKH UHODn WLRQDO FDOFXOXV >&2'D@ 7KH UHODWLRQDO DOJHEUD GHILQHV ILYH SULPLWLYH RSHUDWRUV RI ZKLFK WZR DUH XQDU\ RSHUDWRUV >3URMHFWLRQ f DQG 6HOHFWLRQ Uf> DQG WKUHH DUH ELQDU\ RSHUDWRUV >&URVVSURGXFW [f 8QLRQ f DQG 'LIIHUHQFH f@ 2WKHU RSHUDn WRUV VXFK DV -RLQ 1DWXUDOMRLQ 6HWLQWHUVHFWLRQ DQG 6HWGLYLVLRQ DUH DOVR GHILQHG LQ WKH DOJHEUD $OWKRXJK WKHVH ODWHU RSHUDWRUV DUH HDV\ WR XVH WKH\ DUH QRW SULPLn WLYH VLQFH WKH\ FDQ EH H[SUHVVHG LQ WHUPV RI WKH SULPLWLYH RSHUDWRUV 7KH UHODWLRQDO DOJHEUD KDV WKH FORVXUH SURSHUW\ VLQFH HYHU\ RSHUDWRU PXVW RSHUDWH RQ RQH RU PRUH UHODWLRQV DQG SURGXFHV D QHZ UHODWLRQ 2SHUDWRUV RI WKH UHODWLRQDO DOJHEUD EDVLFDOO\ RSHUDWH RQ WKH YDOXHV RI WXSOHV LQ UHODWLRQV 6WUXFWXUn DOO\ VSHDNLQJ WKH\ DUH GHILQHG WR RSHUDWH RQ WXSOHV ZKRVH VWUXFWXUHV DUH XQLRQ FRPSDWLEOH KRPRJHQHRXVf 7KH UHODWLRQDO DOJHEUD LV FRPSOHWH LQ WKH VHQVH WKDW LW PAGE 24 KDV WKH HTXLYDOHQW H[SUHVVLYH SRZHU WR WKH UHODWLRQDO FDOFXOXV >&2'D DQG 8//@ %HFDXVH RI WKLV LW VHUYHV DV WKH WKHRUHWLFDO EDVLV IRU WKH UHODWLRQDO PRGHO 7KH UHODWLRQDO DOJHEUD KDV EHHQ XVHG IRU WKH IROORZLQJ WKUHH SXUSRVHV DOWKRXJK LW KDV QRW EHHQ SUHYLRXVO\ LPSOHPHQWHG LQ DQ\ H[LVWLQJ '%06V H[DFWO\ DV GHILQHG >8//@ ff ,W QRW RQO\ VHUYHV DV D EHQFKPDUN IRU HYDOXDWLQJ TXHU\ ODQJXDJHV LQ H[LVWLQJ V\VWHPV EXW DOVR DV WKH FULWHULRQ IRU GHVLJQLQJ QHZ ODQJXDJHV IRU UHODWLRQDO '%06V $ UHODWLRQDO ODQJXDJH ZLOO QRW KDYH WKH QHFHVVDU\ H[SUHVVLYH SRZHU LI LW LV QRW UHODWLRQDOO\ FRPSOHWH >8//@ f ,W SURYLGHV D PDWKHPDWLFDO EDVLV IRU WUDQVIRUPLQJ H[SUHVVLRQV LQ TXHU\ GHFRPn SRVLWLRQ DQG ORJLFDO RU FRQFHSWXDOff ZKRVH PRGHOLQJ FDSDELOLWLHV ZHUH H[WHQGHG E\ &RGG LQ >&2'@ WR YHUVLRQ 507 7 IRU 7DVPDQLDf %DVHG RQ WKHVH WZR YHUVLRQV &RGG >&2'@ LQWURGXFHV 9HUVLRQ RI WKH UHODWLRQDO PRGHO 509f 7KH PRVW LPSRUWDQW DGGLWLRQDO IHDWXUHV LQ 509 DUH DV IROn ORZV f $ QHZ WUHDWPHQW RI LWHPV RI 
data missing because they represent properties that happen to be inapplicable to certain object instances.

(2) New features supporting all kinds of integrity constraints, especially the user-defined integrity constraints.

(3) A more detailed account of view updatability.

(4) New features pertaining to the management of distributed databases.

It is important to recognize the fact that hierarchical and network models, as well as the relational model, evolved during a time in which the primary applications of information systems were business-oriented. In an attempt to apply these techniques to the more complicated application areas such as CAD/CAM, CASE, and decision support, it is found that the relational model is no longer adequate for modeling these advanced applications.

The inadequacies of the relational model are summarized as follows. First, the relational model has limited modeling capabilities. When data are logically represented in the form of relations, the relationships among entities in these relations are represented by matching values of the attributes or keys in one relation with values of the attributes or foreign keys in other relations. The actual semantics among the data, such as generalization and aggregation (the abstract data type), cannot be modeled by the relational model. Second, the relational model only models the structural aspects of entities and thus ignores their behavioral aspects (e.g., system-defined and user-defined operations). Third, in these advanced applications, the concept of data indepen-

Finally, it cannot represent and operate on entities with different (heterogeneous) structures.

Existing Query Languages

An extensive literature search on query languages for accessing data-

represent attributes. QUEL [ST, WON, and Z] is a tuple-calculus oriented query language for the relational DBMS INGRES [ST]. In order to avoid the ambiguity which arises when two attributes of different relations having the same name are addressed in a single query, QUEL uses a dot
mechanism to qualify an attribute of a relation (i.e., a dot is inserted between the name of the relation and the name of the attribute).

bute. For example, the class Lab has an attribute Facility of the type Equipment, and has another attribute Locality of the type Location, and so forth. The dot notation is used in GEM for navigating along the reference attributes (links) in query formulation. The following GEM query retrieves the name of the manager, the serial number of the equipment, and the address for each laboratory whose headquarter is located in New

then the expression X.Y.Z is a field in a collection of this view. In other words, the expression will return the values of the Z field of tuples (in R) that are related to X through Y. For example, let the relation Manager have a field called OfficeInfo of type QUEL, which contains a query that retrieves the telephone number of the relation Location. The expression Manager.OfficeInfo.Tel returns the telephone number for each manager in a tabular format. Clearly, the implementation of QUEL as a data type provides a way to relate data in two relations without modifying the database schema.

Instead of using the dot notation, ARIEL [MAC] takes advantage of the OF notation. The example query described for GEM can be restated as

Range of Lab is Lab
Retrieve Name OF Manager OF Lab, Serial OF Equipment OF Lab, Address OF Location OF Lab
Where City OF Headquarters OF Department OF Manager OF Lab = New

Location(Lab) and Department(Headquarters) represent the facts that Lab has Location and Headquarters has Department as attribute, respectively. When the function Location(Lab) is applied to an object of the class Lab, it returns a value which is an object in the domain class over which the attribute is defined. If the navigation is from one class to another through a sequence of classes, a nested function is used. For instance, the expression Name(Manager(Lab)) specifies the name of the manager of a laboratory to which the manager is
responsible. For a particular object of Lab, the manager of the laboratory is produced first; then the function Name() is applied to the returned manager and returns the name of the manager. The example query can be expressed in DAPLEX as follows:

FOR EACH Lab SUCH THAT City(Headquarters(Department(Manager(Lab)))) = New

lowing is an expression for selecting a laboratory that has a manager who belongs to a subordinate department of its New

values and is represented in a tabular form.

As shown in the samples of these query languages, their query formulations, though interpreted differently, are very similar to each other. This is evident in the fact that the formulating of queries is accomplished by navigating the graphically represented database schema from class to class through their respective links. In each of these languages, however, a query operates on a database that is structurally represented using an O-O data model and returns a result whose structure is represented in a tabular form. Consequently, the result of a query cannot be further queried by other queries written in the same language. Therefore, these languages are not closed.

Another drawback of these languages is seen in their navigation mechanisms, which can only formulate queries against classes (or relations) that are interrelated in simpler patterns like the linear and forest structures shown in Figure a. However, in databases, the graphical patterns in which objects are interrelated with each other are basically networks, which are not restricted to plane graphs (a graph is a plane graph if it can be drawn on a plane without any intersection of two edges). They can be as complicated as surface graphs (a graph is a surface graph if it can be drawn on a surface without any intersection of two edges). Phrasing queries against classes that are interrelated in more complicated patterns, depicted in Figure b, is beyond the capabilities of these languages.

A third drawback of these languages, which renders
their navigation mechanisms insufficient, is that only one type of the relationship (an object is related to another object) between objects of two classes can be expressed. In fact, when two classes are directly linked at the schema level, objects in these two classes may have another type of relationship: an object is not related to another object. This type of relationship represents the complement aspect of the semantics specified for the two associated classes, such as not-a-part-of, not-a-function-of, or is-not-a, which is often needed in querying the databases. For example, "For each laboratory, list the equipment that is not available" is a reasonable query.

The proposed query languages [DAD, MAN, BAN, ROW, CAR, COL] use nested relations as their logical views of databases. A nested relation is a generalized relation, i.e., a recursively defined relation: the attributes of a relation can be either atomic values or another relation, in which the attributes can be a third relation, and so forth. The figure shows an example of a nested relation. Nested relations are particularly suitable for representing data in forest structures.

The above languages are considered to be closed since operators in these languages operate on nested relations and produce nested relations. However, they also have the drawbacks mentioned above, and it is our view that a nested relation is not a proper logical representation for an O-O database, which is a network of objects, object classes, and their associations. Using nested relations to represent data in network structures introduces one level of indirection. Mapping from a network representation to nested relations is an extra process. Furthermore, in order to use a nested relation to represent complex structures, a large amount of data has to be replicated in the representation. The figure shows an example of using a nested relation to represent a graph having loops. Note that vertex F has to be replicated three times.

ENCORE Data Model and Its Underlying Query
Algebra

In spite of the popularity of the O-O paradigm and its application in the field of database management, the existing database management systems still lack a solid mathematical foundation for the manipulation of an O-O database and the optimization of queries. Recently, a query algebra [SHA] was proposed for the ENCORE data model [ELM]. This section surveys the query algebra as well as the ENCORE model. It also serves as a comparison to the association algebra proposed in this dissertation.

The ENCORE Model

The ENCORE data model [ELM] supports abstract data types, type inheritance, typed collections of typed objects, objects with identity, and object encapsu-

ENCORE provides two parameterized types and a global Object type, which is the supertype of all other types. The parameterized type Set[T] defines T as the type (or supertype) of objects in a collection having type Set, and T is called the member type of the set. The parameterized tuple type associates types ($T_i$) with attribute names ($A_i$) and defines properties Get_attribute_value and operations Set_attribute_value for each attribute. The $T_i$'s can be any database types, thus allowing nesting of tuple types. The value of a tuple is represented as $\langle A_1\colon o_1,\ A_2\colon o_2,\ \ldots,\ A_n\colon o_n \rangle$, where the $A_i$'s are attributes of the tuple and the $o_i$'s are objects of the corresponding types.

The global supertype Object defines a family of operations for equality, called i-equality, where i indicates how deeply a comparison of two objects must search before finding equality. Two objects are identical when they are the same object, i.e., they have the same identity. Identical objects are 0-equal ($=_0$, or just $=$), and for $i > 0$,
two objects are i-equal ($=_i$) if (1) they are both collections of the same cardinality and there is a one-to-one correspondence between the collections such that corresponding members are $=_{i-1}$, or (2) they both have the same type (not a collection type) and the values of corresponding properties are $=_{i-1}$. Type Object also defines a stronger notion of equality called id-equality. Two objects are id-equal at depth i if they are i-equal and graphical representa-

is a supertype of P, the collection of objects a and p is of type Set[S].

The query algebra is closed since the operators of the query algebra operate on collection(s) of objects with type Set[$T_i$] and produce a collection with type Set[$T_k$], where type $T_k$ is defined by the query. Similar to the languages surveyed above, the query algebra addresses a property of an object using "dot" notation (e.g., a.p.q, where a is an object of type $T_1$, p is a property of a and is of type $T_2$, and q is a property of p and is of type $T_3$). Twelve operators are defined in this algebra. We give their brief definitions, followed by some example queries to illustrate the major concepts of this algebra.

(1) The Select operation creates a collection of objects which satisfy a selection predicate:

$\mathrm{Select}(S, p) = \{\, s \mid (s \text{ in } S) \wedge p(s) \,\}$

where p is the predicate.

(2) The Image operation is used to return a single object for each object in the queried collection and has the form

$\mathrm{Image}(S, f\colon T) = \{\, f(s) \mid s \text{ in } S \,\}$

where S is a collection of objects and f returns an object of type T.

(3) The Project operation extends Image by allowing the application of many functions to an object, thus supporting the creation and maintenance of selected relationships between objects. The relationships are stored as tuples with Tuple type:

$\mathrm{Project}(S, \langle (A_1, f_1), \ldots, (A_n, f_n) \rangle) = \{\, \langle A_1\colon f_1(s), \ldots, A_n\colon f_n(s) \rangle \mid s \text{ in } S \,\}$

where S is of type Set[T],
the $A_i$'s are unique attribute names, and each $f_i$ takes a single input of type T and returns an object of type $T_i$. Project returns one tuple for each object in the collection being queried. Each newly created tuple is a new object with a unique object identifier.

(4) The Ojoin operator is an explicit join operator used to create relationships which are not defined between objects of two collections in the database. It is essentially a Cartesian product of collections of objects followed by a selection of result tuples. For collections S and R, the Ojoin is defined as follows:

$\mathrm{Ojoin}(S, R, A_1, A_2, p) = \{\, \langle A_1\colon s,\ A_2\colon r \rangle \mid s \text{ in } S \wedge r \text{ in } R \wedge p(s, r) \,\}$

where p is a predicate (as in Select) defined over objects from S and R. The Ojoin operation creates new tuples in the database to store the generated relationships. The tuples created will have unique object identifiers.

(5) Union, Difference, and Intersection are the usual set operations, with object comparisons and set membership based on object identity. The result of these operations is considered to be a collection of objects of type T, where T is the most specific common supertype (in the type lattice) of the types of the objects in the operands.

(6) The Flatten operation is used to restructure sets of sets, and Nest and UnNest allow the representation of tuples as flat or nested relations.

(7) For the above operators, two identical operations cannot give an identical response since each result collection is a newly identified object in the data-

The first selection finds the red parts, and the second selection finds all suppliers for which the inventory includes that set of parts. The subset_of operation is available since property Inventory and result P_red both have type Set[Part].

Example: What parts are needed by jobs in Boston?

BosJobs = Select(Jobs, λj j.address.city = "Boston")
BosJobParts = Project(BosJobs, λj <(J, j), (Pt, j.PartsNeeded)>)

The select operation finds the jobs in Boston, and the project operation gives information about which parts are needed for each job in
Boston. The result of the projection is of type Set[Tuple]. Note that operation NewPart (of type Job) cannot be applied to members of BosJobParts since they have type Tuple. However, it is appropriate for objects BosJobParts

Example: Find all local suppliers for each job.

LocalS = Ojoin(Jobs, Suppliers, J, S, λj λs j.address.city = s.address.city)

This Ojoin operation produces a set of tuples of type <(J, Job), (S, Supplier)>, which is similar to a normalized relation. To get a set of suppliers for each job, a Nest operation needs to be applied:

Nest(LocalS, S)

From the above description, we can see that the query algebra supports many features of databases and has taken significant steps towards a powerful query algebra to serve as the mathematical foundation for databases. However, it still has the following limitations:

(1) Although ENCORE models an application as networks of types, objects, and their associations, the domain of its underlying query algebra is defined as collections of objects having type Set[T], which is essentially a nested relation representation, since the member type T of the set type can be a parameterized Tuple type, which may in turn contain attributes of Tuple types. Therefore, the query algebra cannot represent network-structured relationships among objects efficiently, and the mapping problem addressed before still remains.

(2) In this algebra, two identical expressions, or two identical operations in a single expression, do not give an identical response since each result collection is a newly identified object in the database. To eliminate duplicated copies of the same newly created object, the algebra introduces DupEliminate and Coalesce operations, which are not necessary if it directly supports the network view of databases.

(3) In this algebra, a collection may contain objects with heterogeneous structures. For example, two objects are both of Tuple type but with different arities, and the union of the two objects is also a collection of objects having Tuple type. However, other
operators in this algebra are not defined to operate on such collection(s).

(4) Since the query algebra is developed for a specific model (i.e., Encore), it is difficult to apply to other models.

Figure: A sample schema

Figure: Simple and complex query patterns. Panel (a): simple query patterns.

Figure: An example of a nested relation. Column headings: NAME, ADDRESS, INVESTMENTS, COMPANY, SHARES, PURCHASE, PRICE, DATE.

Figure: Using a nested relation to represent a complex structure. Column headings: Pattern Number, A, B, C, E, F, F, F, H.

Type Supplier
  properties: Ident: string; Address: Addr; Inventory: Set[Part]
  operations: RecvOrder: Supplier, Set[Part] -> Supplier

Type Job
  properties: Num: string; Address: Addr; PartsNeeded: Set[Part]; Preferred_Suppliers: Ordered_list[Supplier]
  operations: NewPart: Job, Part -> Job

Type Part
  properties: Num: string; Address: Addr; Color: string; Components: Set[Tuple[P: Part, Qty: Int)
  operations: Order: Part -> Part; Same_Part: Part, Part -> Boolean

bases. Although each model has some unique constructs that distinguish one model from the others, there are several common structural and behavioral properties based on which an algebra can be developed and used to support these models.

First, objects are physical entities, abstract concepts, events, processes, functions, or anything that an application cares to capture and represent.

Second, objects having the same structural and behavioral properties are grouped together to form an object class. Object classes can be categorized into two general categories: (1) the nonprimitive-class, which represents a set of objects of interest in an application world, each of which is assigned a system-wide unique object identifier (OID) and whose data are explicitly entered in a database by the user, and (2) the primitive-class, which represents a class of self-named objects serving as a domain for defining other object classes, such as a class of symbols or
numerical values. The behavioral properties of an object class are defined in terms of system-defined or user-defined operations (e.g., retrieve, display, delete, insert, rotate a design object, hire an employee, etc.) which can meaningfully operate on its objects using their corresponding programs (or methods). The structural properties of an object class, and thus its objects, consist of two types of data: (1) descriptive data (or instance variables), which define the states of the objects, and (2) association data, which specify the relationships between its objects and the objects of some related classes.

Third, different models recognize different types of associations. Two of the most commonly recognized associations are aggregation and generalization. Aggregation models the a-part-of, a-function-of, or a-composition-of relationship. For instance, a complex object can be modeled by an aggregation hierarchy (abstract data type) in which a complex object is defined in terms of its associations with objects in other defined classes. Generalization models the is-a or the superclass-subclass relationship, in which an object in a subclass inherits both the structural and the behavioral properties of its superclass(es).

Thus, from the algebra point of view, an O-O database can be viewed as a collection of objects, grouped together in classes and interrelated through associations. It can be represented by graphs at both the intensional and the extensional levels. At the intensional (schema) level, a database is defined by a collection of interrelated object classes and is represented by a Schema Graph (SG). For example, the SG for a university database is illustrated in the figure, in which each rectangle denotes a nonprimitive-class, such as a class of person objects or a class of department objects, and each circle denotes a primitive-class, such as a class of names or ages. The associations among classes are represented by the edges in SG. For example, there is an association between the class Course
and the class Department (an Aggregation association), and an association between the class Person and the class Student (a Generalization association). Since the semantic distinctions of these and other association types recognized by different semantic models can be either hard-coded in a DBMS or declaratively specified by some rules and used by a rule processor to govern the manipulation of the associated classes, the underlying algebra does not have to incorporate the semantics of these association types. All it has to be concerned with is whether or not an object class and its objects are associated with some other classes and their objects, i.e., the edges (or associations)

level, a database can be viewed as a collection of objects, grouped together in classes and interrelated through some type-less associations, and as such, it can be represented by an Object Graph (OG)

or user-defined operations (Rotate-Part, Purchase-Part, Hire-Faculty, etc.). For example, the following queries can be issued against the university database as illustrated in the figures (the algebraic expressions for these queries will be given in a later section).

Query: For all sections, get the majors of students who are taking these sections.

To satisfy this query, we can specify a linear pattern containing the classes Section, Student, and Department, as shown in Figure a. In this pattern, a circle represents a class, and an edge represents that the objects of the two adjacent circles (classes) must be associated with each other. This pattern is called an intensional pattern, which represents that sections taken by students who major in some departments are to be identified. The answer to this query can be found in the figure by checking if the objects of these three classes satisfy such a pattern. There are five object patterns (called extensional patterns) which satisfy the intensional pattern, as shown in Figure b. The Section object sc and the Student object s do not appear in these extensional patterns since sc is not
taken by any student and s does not have a major yet.

These patterns can also be identified in two sequential steps. First, get all the patterns in which the Section objects are associated with the Student objects. Then, if a pattern generated in the first step (i.e., a Section-Student pair) is further associated with an object of Department, a new pattern consisting of three objects is constructed and retained in the result; otherwise, the pair is dropped.

Once these objects (as well as their associations) have been identified, different system-defined or user-defined operations defined on their corresponding classes can be applied to these selected objects. For example, Inform-Department

One path is from Student to Department, which means that a student majors in a certain department, and the other path is from Student to Department through Undergrad, which means that a student is an undergraduate and minors in a certain department (we can see from the SG that only undergraduates may have minors). According to the query, a single student should associate with objects in both Undergrad and Department, and these two paths should merge at Department, thereby forming a loop. This implies two logical AND conditions, one at the Student class and the other at the Department class. We use double arcs to denote such conditions, as shown in Figure c. From the figure, we can see that the student s1 has his major and minor in the department d1. This extensional pattern is depicted in Figure d.

Query: For those students taking a given section and having majors and/or minors, get their majors and/or minors.

There are several ways to form an intensional pattern for the query. We may start from Section and traverse to Student through Section, and then navigate the schema in two paths as we did for the preceding query. According to the query, a student who has either a major or a minor should be included in the result (in this database, it is assumed that graduate students do not have minors). This means that either path of the
navigation will construct a pattern that would satisfy the query. Thus, a logical OR condition exists at Student. We use a single arc to indicate the OR condition, as shown in Figure a. Like the preceding query, these two branches merge at Department. However, this query does not require that they merge at the same Department object. This is specified by the second OR condition at Department in Figure a.

The extensional patterns that satisfy this query have heterogeneous structures: two types of linear patterns, as shown in Figure b. The first type includes patterns that represent the minors of the undergraduates, and the second type includes patterns that represent the majors of the students who are either undergraduates or graduates. In both types of patterns, a student is associated with the given section, which is assumed to be the Section for sc. Figure c will be described in a later section.

We have given some example queries which specify how objects are associ-

Conclusion

The type-less

of one relation with the attributes (foreign keys) in other relations. A query that requires the specification of a complex pattern of object associations can be specified in a rather straightfor-

Figure: Pattern specifications for Query and Query

Figure: Pattern specifications for Query and Query

CHAPTER: ASSOCIATION ALGEBRA

The association algebra (A-algebra) is defined based on a uniform representation of an O-O database in terms of objects, object classes, and type-less associations, as described earlier. The algebra contains a number of operators which operate on graph structures of object associations to produce graph structures. The closure property of the algebra ensures that the result of a query can be further manipulated by other queries.

Definitions

First, we formally define an O-O database at both schema and object
levels.

Schema Graph (the intensional database): The schema graph of an O-O database is defined as $SG(C, A)$, where $C = \{C_i\}$ is a set of vertices representing object classes, and $A = \{A_{ij}(k)\}$ is a set of edges, each of which, $A_{ij}(k)$, represents an association between classes $C_i$ and $C_j$, where $k$ is a number for distinguishing the edges from one another when there is more than one edge between two vertices.

Object Graph (the extensional database): The object graph of an O-O database is defined as $OG(O, E)$, where $O = \{O_{ij}\}$ is a set of vertices representing object instances ($O_{ij}$ is the j-th object in class $C_i$), and $E = \{E_{ij,mn}(k)\}$ is a set of edges representing the associations among object instances. When one object instance is connected with another in the object graph, a regular-edge (solid line) is drawn between the corresponding vertices $O_{ij}$ and $O_{mn}$, which specifies that the j-th object instance in class $C_i$ is related to the n-th object instance in class $C_m$ through the k-th association of classes $C_i$ and $C_m$. If two object instances $O_{ij}$ and $O_{mn}$ are not connected in the object graph, but their classes $C_i$ and $C_m$ in the corresponding SG are directly connected, a complement-edge (dotted line) is drawn between them.

In these models, an object may participate in several classes (e.g., in a generalization hierarchy)

and not taken by students s1 and s (complement-edges).

The relationship between an OG and its corresponding SG is formally described by the following proposition.

Proposition: An $OG(O, E)$ is a morphism of its corresponding $SG(C, A)$. The mapping function $F_m$ is defined as $F_m(O_{ij}) = C_i$ and $F_m(E_{ij,mn}(k)) = A_{im}(k)$.

By this definition, a single vertex (or object instance) in OG, which is a connected subgraph, is also a pattern. We call it an Inner-association-pattern (or Inner-pattern, for short). It is algebraically represented by $(a_i)$ for a vertex of class A in SG. Thus, object instances are treated as Inner-patterns in the A-algebra. A regular-edge, together with the two vertices (i.e., two Inner-patterns) it connects, is called an Inter-association-pattern (or Inter-pattern), which is
represented by $(a_i b_j)$. A complement-edge, together with the two Inner-patterns it connects, is called a Complement-association-pattern (or Complement-pattern) and is represented by $(\overline{a_i b_j})$. This pattern states that $a_i$ and $b_j$ are not associated with each other in OG. If a path consisting of only regular-edges exists between vertices $a_i$ and $b_j$, it can be represented by a Derived-inter-association-pattern (D-inter-pattern); otherwise, it can be represented by a Derived-complement-association-pattern (D-complement-pattern). When a path is represented by a derived pattern, it simply means that two vertices are indirectly associated (or non-associated), but how they are interrelated (the actual path) is of no importance. A D-inter-pattern is treated as an Inter-pattern, and a D-complement-pattern is treated as a Complement-pattern in the algebraic operations.

The above five types of patterns are the primitive patterns, the latter four being binary patterns. Their graphical and algebraic representations are summarized in Figure a. All other connected subgraphs are called complex patterns. For example, the complex pattern shown in Figure b1 contains three primitive patterns: two Inter-patterns and a Complement-pattern. It can be uniquely defined by its algebraic representation as a set of primitive patterns. More examples of complex patterns are shown in Figure b.

From these examples, one can observe that a complex pattern can be decomposed into a set of binary patterns which cannot be further decomposed. This implies that, in the algebraic representation of a complex pattern, an Inner-pattern may not occur as an element, and a binary pattern may appear only once. A pattern in this algebraic format is called a normalized pattern; otherwise, it is called an unnormalized pattern. For example, $(a_i b_j,\ b_j c_k,\ a_i b_j)$, which repeats a binary pattern, is an unnormalized pattern. During the process of constructing an association pattern, we always normalize it by eliminating the duplicates; the pattern above has the normalized form $(a_i b_j,\ b_j c_k)$.

The definitions of OG and association pattern imply that a pattern is a non-directional graph, i.e., $(a_i b_j) = (b_j a_i)$, and that the sequence of primitive patterns in the algebraic representation of a complex pattern is not important; hence $(a_i b_j,\ b_j c_k) = (c_k b_j,\ a_i b_j)$.

Based on the above definition and notion of association pattern, we view an OG as an Association Graph (AG), and all the association patterns in AG form the domain of the A-algebra, denoted by A.

Relationship Between Two Association Patterns

The operators of the A-algebra are defined based on the possible relationships between two patterns in A, so that they can be used either to construct complex patterns using simpler patterns or to decompose a complex pattern into several patterns of simpler structures. There are four possible relationships between two patterns $p^1$ and $p^2$: non-overlap, overlap, contain, and equal.

(1) Non-overlap: Two patterns are said to be non-overlapping if they have no common Inner-pattern.

(2) Overlap: Two patterns are said to be overlapped, denoted by $p^1 \cap p^2$, if they have at least one common Inner-pattern.

(3) Contain: Contain is a special case of (2) in which all the primitive patterns of $p^1$ are contained in $p^2$. We say that $p^1$ is a subpattern of $p^2$ and denote this relationship by $p^1 \subset p^2$.

(4) Equal: This is a special case of (3) in which $p^1$ contains all the primitive patterns of $p^2$ and vice versa. It is denoted by $p^1 = p^2$.

Before defining the association operators, we give the definition of the Association-set, the operand of the association operators.

Association-set: An association-set, denoted by a Greek letter ($\alpha$ or $\beta$), is a set of association patterns without duplicates. $\alpha^i$ designates the i-th pattern in $\alpha$, where $\alpha^i \neq \alpha^j$ for $i \neq j$. An empty set is also an association-set, denoted by $\phi$.
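The normalization rule and the four relationships above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the dissertation: a primitive pattern is encoded as a `(kind, vertices)` pair, a normalized pattern as a `frozenset` of such pairs, and the names `inter`, `complement`, `pattern`, `inner_patterns`, `relationship`, and `association_set` are hypothetical. "Contain" is interpreted here as plain set inclusion of primitive patterns, in either direction.

```python
# Hypothetical encoding (an assumption, not the dissertation's data structure):
# a primitive pattern is (kind, vertices), where kind is "inter" or
# "complement" and vertices is a frozenset of instance identifiers, so that
# (a1 b1) and (b1 a1) compare equal (patterns are non-directional).

def inter(u, v):
    """Inter-pattern between instances u and v."""
    return ("inter", frozenset({u, v}))


def complement(u, v):
    """Complement-pattern between instances u and v."""
    return ("complement", frozenset({u, v}))


def pattern(*primitives):
    """Normalized pattern: a frozenset ignores order and drops repeats."""
    return frozenset(primitives)


def inner_patterns(p):
    """All instance identifiers (Inner-patterns) occurring in pattern p."""
    return frozenset(v for _kind, vertices in p for v in vertices)


def relationship(p1, p2):
    """Classify two patterns as equal, contain, overlap, or non-overlap."""
    if p1 == p2:
        return "equal"
    if p1.issubset(p2) or p2.issubset(p1):
        return "contain"
    if inner_patterns(p1) & inner_patterns(p2):
        return "overlap"
    return "non-overlap"


def association_set(*patterns):
    """Association-set: a set of patterns without duplicates."""
    return frozenset(patterns)


p = pattern(inter("a1", "b1"), inter("b1", "c1"))
q = pattern(inter("c1", "b1"), inter("b1", "a1"), inter("a1", "b1"))
print(relationship(p, q))  # equal: order and the repeated pattern do not matter
```

Using immutable sets makes normalization automatic rather than a separate step, which mirrors the statement that a pattern is always normalized while it is being constructed; an implementation that needs to keep derived patterns distinct from direct ones would need additional `kind` values.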
A special type of association-set is called a homogeneous association-set, which is important to the A-algebra since some of the mathematical properties hold only when operands are homogeneous association-sets.

Homogeneous Association-set: An association-set is homogeneous if (1) all patterns are formed by the Inner-patterns (or object instances) of the same set of object classes, (2) all patterns have the same number of Inner-patterns from each class in the set, (3) corresponding primitive patterns belong to the same association and are of the same type, and (4) all patterns have the same topology. Otherwise, it is a heterogeneous association-set.

Figure [...] depicts three example association-sets. α is homogeneous, whereas β is not, since one of its patterns has only one Inner-pattern of class C instead of two like the others; and the third association-set is not homogeneous because one of its patterns contains a Complement-pattern, which makes it different from the other patterns (i.e., different topologies).

Association Operators

Ten association operators are formally defined in this section: three unary operators [A-Project (Π), A-Select (σ), and A-Integrate] and seven binary operators [Associate (*), A-Complement (|), A-Union (+), A-Difference (−), A-Divide (÷), NonAssociate (!), and A-Intersect (•)]. The examples used to explain these operators will make use of the domain A shown in Figure [...]. To keep the graph simple, the Complement-patterns are not shown in the figure. The simple mathematical properties, such as commutativity, associativity, idempotency, and nilpotency, satisfied by the operators are given after each definition.

Notations

Notations that will be used in the subsequent sections are listed below.

A, B, C, ...               Denote classes.
CL                         Denotes a variable for a class.
[R(CL1,CL2)]               Denotes the association between classes CL1 and CL2.
a_i                        Denotes the i-th Inner-pattern of class A.
[...]                      Denotes an Inner-pattern variable.
$(a_i b_j)$                Denotes an Inter-pattern between two classes A and B.
$\overline{(a_i b_j)}$     Denotes a Complement-pattern between two classes A and B.
$(a_i c_k)$                Denotes a Derived-pattern from class A to class C.
α, β, ...                  Denote association-sets.
α^i                        Denotes the i-th pattern of association-set α.
{X}, {Y}, ...              Denote sets of classes. Hence, α{X} represents association-set α which has Inner-pattern(s) from the classes in {X}.

It should be noted that an Inner-pattern is represented by an object instance identifier (IID), which is a system-assigned object identifier (OID) prefixed by a class identification, so that the object instances of an object in multiple classes can be unambiguously distinguished, and the fact that these object instances are instances of the same object can easily be recognized.

Operators

All relational algebraic operators operate on relations of homogeneous (or union-compatible) structures, with the exception of Cartesian-product and Join. The Cartesian-product and Join provide the mechanism to concatenate two relations of different structures into a single relation so that it can be further manipulated by other operators. In the A-algebra, all the operators are defined to operate on association patterns of homogeneous as well as heterogeneous structures. Therefore, the relational algebra is a special case of the A-algebra in this respect.

(1) Associate (*)

$\alpha *[R(A,B)]\, \beta = \{\, \gamma \mid \gamma = (\alpha^i, \beta^j, (a_m b_n)),\ (a_m b_n) \in [R(A,B)] \wedge a_m \in \alpha^i \wedge b_n \in \beta^j \,\}$

The result of an Associate operation is an association-set containing no duplicates. Each of its patterns is the concatenation of two patterns (one from each operand association-set). More specifically, if the Inner-pattern (or object) a_m of A in α^i is associated with the Inner-pattern (or object) b_n of B in β^j in the domain of the algebra A shown in Figure [...], then α^i and β^j are concatenated via the primitive pattern $(a_m b_n)$. We do not restrict A and B to be different classes in
*[R(A,B)], i.e., α *[R(A,A)] β is a legitimate operation, which concatenates two patterns (one from each operand association-set) if they have a common Inner-pattern of class A.

An example of the Associate operation is shown in Figure [...]a (for convenience, a copy of the sample database is shown in each figure for illustrating an operation). For clarity, we use graphical notation in the figures. In the example, α[...] is concatenated with β[...] and β[...], respectively, due to the existence of the corresponding Inter-patterns between B and C in A as shown in Figure [...]. α[...] is dropped simply because it does not have an Inner-pattern of class B. α[...] is dropped because its Inner-pattern of class B is not associated with any Inner-pattern of class C in A. β[...] cannot be concatenated through its Inner-pattern of class C with any pattern in α because no pattern in α has an Inner-pattern of B that is associated with it in A. For the same reason, β[...] is dropped.

For the Associate operator, [R(A,B)] can be omitted if the following conditions hold: (1) both α and β are A-algebra expressions, (2) the Associate operator operates on the last class in a linear expression α and the first class in a linear expression β, and (3) there is a unique association between these two classes. For example, A *[R(A,B)] B can be written as A*B if class A is associated with class B through the attribute [...] of A. It should be pointed out that the A-algebra allows an attribute to be defined by a computed value (or object), for instance, B = f(A). The implementations of the function and the procedure are invisible to the algebra. However, they should not have side effects, i.e., the computed result must be of the same type as B.

The Associate operator is commutative and conditionally associative, as defined below.

$\alpha *[R(A,B)]\, \beta = \beta *[R(B,A)]\, \alpha$ (commutativity)
$(\alpha_{\{X\}} *[R(A,B)]\, \beta_{\{Y\}}) *[R(C,D)]\, \gamma_{\{Z\}} = \alpha_{\{X\}} *[R(A,B)]\, (\beta_{\{Y\}} *[R(C,D)]\, \gamma_{\{Z\}})$ (associativity)

[...]

(2) A-Complement (|)

The A-Complement operator is a binary operator which concatenates the patterns of two operand association-sets over Complement-patterns. It is used to identify the objects in two classes which are not associated with each other in A. The A-Complement operator is defined as follows:

$\alpha\, |[R(A,B)]\, \beta = \{\, \gamma \mid \gamma = (\alpha^i, \beta^j, \overline{(a_m b_n)}),\ \overline{(a_m b_n)} \in [R(A,B)] \wedge a_m \in \alpha^i \wedge b_n \in \beta^j;$
$\quad$ or $\gamma = \alpha^i,\ (\exists a_m)(a_m \in \alpha^i) \wedge \neg(\exists b_n)(b_n \in \beta);$
$\quad$ or $\gamma = \beta^j,\ (\exists b_n)(b_n \in \beta^j) \wedge \neg(\exists a_m)(a_m \in \alpha) \,\}$

The result of an A-Complement operation is an association-set. Each of its patterns is formed by concatenating two patterns (one from each operand association-set) via a Complement-pattern $\overline{(a_m b_n)}$, where a_m and b_n belong to α^i and β^j, respectively, and the Complement-pattern is in A. In the special case when α (or β) is an empty association-set or does not have Inner-patterns of class A (or B), all patterns of β (or α) that have Inner-patterns of B (or A) are retained in the resulting association-set.

An example of the A-Complement operation is shown in Figure [...]b. It operates over the association between classes B and C. α[...] does not appear in the resultant association-set because it contains no Inner-patterns of B. α[...] cannot be A-Complemented with β[...] and β[...] because it is connected with them by Inter-patterns in A.

Under the same conditions as given for the Associate operator, [R(A,B)] need not be specified with the A-Complement operator unless there is an ambiguity.

The A-Complement operator is commutative and associative. For a reason similar to that described for the Associate operator, the associativity holds true conditionally.

$\alpha\, |[R(A,B)]\, \beta = \beta\, |[R(B,A)]\, \alpha$ (commutativity)
$(\alpha_{\{X\}}\, |[R(A,B)]\, \beta_{\{Y\}})\, |[R(C,D)]\, \gamma_{\{Z\}} = \alpha_{\{X\}}\, |[R(A,B)]\, (\beta_{\{Y\}}\, |[R(C,D)]\, \gamma_{\{Z\}})$ (associativity)

[...]

A single valued object or a single IID can be treated either as its own data type in a numerical, string, or IID comparison, or as a set type containing one element in a set comparison. As an example of A-Select, we assume that there are two associated classes: S for stack and Q for queue. To select associated stack and queue object pairs
in which the top and the bottom of the stack have some common object(s) with those in the head and the tail of the queue, it can be written as

$\sigma(S * Q)[\,(top(S) \cup bottom(S)) \cap (head(Q) \cup tail(Q)) \neq \phi\,]$

For the case in which the top equals the head and the bottom equals the tail, we have

$\sigma(S * Q)[\,top(S) = head(Q) \wedge bottom(S) = tail(Q)\,]$

(4) A-Project (Π)

Similar to the projection operation in the relational algebra, an A-Project operation is defined to project subpattern(s) of a pattern. However, in the relational algebra, the relationship among the projected attributes is not important, whereas in the A-algebra the association among the projected subpatterns must be maintained so that the associations among the objects in these subpatterns will be retained. The A-Project operator is defined as follows:

Π(α)[E; T]

where α is an association-set defined by an A-algebra expression, E = (e_1, e_2, ..., e_n) is a set of expressions which specify subpatterns to be projected, and T = (t_1, t_2, ..., t_m) is a set of ordered sets of classes. Each ordered set t_k specifies a path connecting two projected subpatterns defined by the E expressions. Each e_i (i = 1, ..., n) is a subexpression of the expression which defines α. e_i and e_j (i ≠ j) should not contain a common class. There may be many paths connecting two subpatterns in the original pattern. The path to be retained can be specified in t_k. If a specific path is chosen, a minimal number of classes along the path which can uniquely identify the path should be specified. The result of an A-Project operation over a pattern is its subpatterns defined by E and some paths defined by T that connect these subpatterns. If a path in the original pattern consists of all Inter-patterns, a D-inter-pattern is retained. Otherwise, a complement-pattern is included. Multiple paths between two projected subpatterns can be declared in T if it is so desired.

Figure [...]c shows an example of A-Project from a pattern α over A*B and D. For α[...], the subpatterns (a b) and d satisfy A*B and D, respectively. Therefore, they are kept in the result. According to the path
specification stated in the operation, a Derived-pattern between the b and d objects is added to the result, and the result is then written in its normalized form. γ[...] is produced for the same reason. Since α[...] does not have a subpattern satisfying A*B, only its d object is retained.

(5) NonAssociate (!)

The NonAssociate operator is a binary operator used to identify the association patterns in one operand association-set that are not associated (over a specified association) with any pattern in the other association-set, and vice versa, in the domain of the algebra A. The NonAssociate operator is defined as follows:

α ![R(A,B)] β = { [...] }

The result of a NonAssociate operation is an association-set. Each of its patterns is formed by concatenating two patterns α^i and β^j via a Complement-pattern $\overline{(a_m b_n)}$ under the condition that α^i is not associated with any β^j, and vice versa. Furthermore, in the special case where the patterns of α (or β) have Inner-patterns of A (or B) and cannot be concatenated with any pattern of β (or α), these patterns of α (or β) will be retained in the result if one of the following three conditions holds: (1) β (or α) is an empty association-set, (2) no pattern of β (or α) has Inner-patterns of B (or A), or (3) all patterns of β (or α) that have Inner-patterns of B (or A) can be concatenated with patterns of α (or β).

An example of the NonAssociate operation is shown in Figure [...]d. In the example, α[...] and β[...] are dropped due to the existence of an Inter-pattern between them in Figure [...]. α[...] is dropped because it does not contain an Inner-pattern of class B. β[...] is dropped because it does not contain an Inner-pattern of class C. One resultant pattern is in the resultant association-set because its Inner-pattern of B is not associated with the Inner-pattern of C in A, as shown in Figure [...], and [...] does not appear in α. Another resultant pattern exists because its Inner-pattern of B is not associated with its Inner-pattern of C in A.

Note that the NonAssociate operator produces a resultant association-set
which is a subset of that produced by the A-Complement operator, because α^i, β^j and $\overline{(a_m b_n)}$ may form a new pattern only when a_m of α^i does not associate with any object of B in β and b_n of β^j does not associate with any object of A in α. In fact, the NonAssociate operator can be expressed in terms of A-Complement and other operators as follows:

A ![R(A,B)] B = (A − Π(A *[R(A,B)] B)[A]) |[R(A,B)] (B − Π(A * B)[B])

Thus, NonAssociate is not a primitive operator in a strict sense. However, it is very useful for query formulation and is, therefore, included in the set of A-algebra operators.

Under the same conditions as given for the Associate operator, [R(A,B)] need not be specified unless there is an ambiguity. The NonAssociate operator is commutative but not associative.

$\alpha\, ![R(A,B)]\, \beta = \beta\, ![R(B,A)]\, \alpha$ (commutativity)
$A\, ![R(A,A)]\, A = \phi$ (nilpotency)

(6) A-Intersect (•)

The A-Intersect operation is convenient for constructing a pattern with a branch or a lattice structure (a pattern that has a loop), since a pattern in such structures can be viewed as the intersection of two patterns. Conceptually, the A-Intersect operator is equivalent to the JOIN operator in the relational algebra. It operates on two operand association-sets over a set of specified classes. Two patterns, one from each association-set, are combined into one if they contain the same set of Inner-patterns for each specified class. The A-Intersect operation is defined as follows:

α{X} •{W} β{Y} = [...]

Now we define three set operators, which are different from the corresponding set operators in relational algebra since they operate on heterogeneous structures as well as homogeneous structures.

(7) A-Integrate

The A-Integrate is a unary operator. It reorganizes patterns in an association-set according to the relationships among patterns with respect to the classes specified. The A-Integrate operation is defined as follows:

[...]

By this definition, a subset of patterns α_t of α is combined into a single pattern
if every object instance of the classes in {W} that appears in a pattern in the subset is also contained in all other patterns in the subset. If a pattern of α cannot be combined with any other pattern, it is retained in the resultant association-set as it is. If no class is specified, patterns in which every pattern has at least one object instance (of any class) common to another will be integrated into one pattern. The reorganized association-set will contain patterns which are apart from each other (refer to Section [...]).

Figure [...]f shows two examples. The first example shows an A-Integrate operation over class A. Patterns that have a common Inner-pattern of class A are grouped into one: one resultant pattern is the integration of three patterns of α, and another is the integration of two patterns of α. All other patterns in α are retained in the result as they are. The second example illustrates an A-Integrate operation on the same association-set as the first example, but without specifying a class. The result becomes two patterns which are apart and are exactly the same as they appear in the original database, whereas the same primitive patterns appear more than once in the result of the first example.

(8) A-Union (+)

Similar to the UNION operation of the relational algebra, A-Union combines two association-sets into one. However, these two association-sets can contain heterogeneous association structures. It is important for the A-algebra to be able to operate on heterogeneous structures because some prior operations may produce heterogeneous association-sets, which may need to be further processed over the objects of a common class against other patterns of associations. Unlike the relational algebra and other query languages, union-compatibility is not a restriction in the A-algebra. For this reason, the A-algebra has more expressive power. Any query that can be expressed by a single expression in other languages can be expressed as a single A-algebra expression, but not vice versa. The A-Union operation is defined as follows:

$\alpha + \beta = \{\, \gamma \mid \gamma \in \alpha \vee \gamma \in \beta \,\}$

The A-Union
operator is commutative, associative and idempotent.

$\alpha + \beta = \beta + \alpha$ (commutativity)
$(\alpha + \beta) + \gamma = \alpha + (\beta + \gamma)$ (associativity)
$\alpha + \alpha = \alpha$ (idempotency)

(9) A-Difference (−)

The A-Difference implements the same concept as the DIFFERENCE operator in relational algebra, but with two differences. First, its operands do not have to be union-compatible. Second, a pattern in the minuend is retained if it does not contain any of the patterns in the subtrahend.

$\alpha - \beta = \{\, \gamma \mid \gamma = \alpha^i,\ \neg(\exists \beta^j)(\beta^j \subseteq \alpha^i) \,\}$

The example depicted in Figure [...]g shows that two patterns of α are dropped since they both contain a pattern of β.

(10) A-Divide (÷)

The A-Divide operator implements the concept that a group of patterns with certain common features contains another set of patterns.

α ÷{W} β = { [...] }

where α_t is a subset of the patterns of α which have common Inner-patterns for all classes of {W}, and which together contain all patterns of β. If {W} is not specified, the A-Divide operation retains all the patterns of α if each of them contains at least one pattern of β and they together contain all patterns of β.

Figure [...]h shows an example of α being divided by β with respect to class B. The A-Divide operation retains three patterns of α since they all contain the same Inner-pattern of B and together contain all patterns of β.

Precedence

The precedence relationships of the above operators are as follows. Unary operators have higher precedence than binary operators. The precedence of the seven binary association operators is given in the following order: [...]. Parentheses can be used to alter the precedence relationships.

Summary of operators

(1) Associate (*): Two patterns are concatenated via an Inter-pattern.
(2) A-Complement (|): Two patterns are concatenated via a Complement-pattern.
(3) A-Select (σ): A pattern is retained if it satisfies the predicate.
(4) A-Project (Π): A subpattern is projected from the original pattern.
(5) NonAssociate (!): Two patterns are concatenated via a Complement-pattern only if each of them cannot be concatenated with any pattern of the other operand via an Inter-pattern.
(6) A-Intersect (•):
Two patterns are combined into a single pattern if their common classes have common object(s).
(7) A-Integrate: Patterns in an association-set are combined if objects of a specified class in a pattern are common to these patterns.
(8) A-Union (+): Two association-sets are lumped into a single set.
(9) A-Difference (−): A pattern in the minuend is retained if it does not contain any pattern in the subtrahend.
(10) A-Divide (÷): A subset of patterns in the dividend that have certain common features and contain all the patterns in the divisor is retained.

Query Examples

We have formally defined ten association operators and given their simple mathematical properties. Before exploring other properties, we give some examples to illustrate how these operators can be used to formulate queries for process[...] on the intended semantics. For simple patterns, the formulation is straightforward. For patterns with complex structures, we may have to decompose them into patterns with simpler structures. The expression for the original pattern is then the A-Intersect of the expressions for the decomposed patterns.

First, we formulate expressions for Query [...] to Query [...] given in Chapter [...]. We have identified the intensional patterns for these queries (see Figure [...]).

Query [...]: For all sections, get the majors of students who are taking these sections.

It is trivial to write an algebraic expression for this query, which is represented by a linear pattern. For this pattern, the two edges are both marked with *, and the algebraic expression can be formulated as follows:

A-Integrate{Section}( Π(Section * Student * Department)[Section, Department; Section:Department] )

where the A-Integrate operation groups the resultant patterns by Sections.

Query [...]: List students who major and minor in the same department.

For this query, the edges of the intensional pattern shown in Figure [...]c are all marked with *. Since this loop structure can be viewed as the A-Intersect of two linear patterns involving both Student and Department, we have

Π(Student * Undergrad * Department •
Student * Department)[Student]

where the A-Project operation gets the student objects that satisfy the association pattern as required by the query.

Query [...]: For those students taking section [...] and having majors and/or minors, get their majors and/or minors.

The expression for the intensional pattern of this query is as follows:

Section# * Section * (Student * Department + Student * Undergrad * Department_1)

where the A-Union operator is used to realize the OR condition at the class Student. As long as a student has a major or a minor, the linear pattern from Student to Department and the linear pattern from Student to Undergrad and to Department should be retained. In the expression, Department_1 is an alias of Department, which is used to distinguish major and minor departments. Since the query asks for the majors and minors of students who are taking section [...], the A-Select and A-Project operations are used. Thus, we have

[...]

where α is the intensional pattern given above. As shown in Figure [...]g, the result of this expression will contain the derived patterns, which are specified by the [E; T] clause of the projection operation, and is reorganized by an A-Integrate operation. Note that this query cannot be phrased in a single relational algebra expression, since (a) the union operation in relational algebra requires operands to be union-compatible, (b) using a join operation on Student can cause a loss of information because not every student has both a major and a minor, (c) the Cartesian-product of the majors and minors will produce erroneous results, and (d) no other operation in the relational algebra can combine two rela[...]

Query [...]: List the names of students who teach in the same departments as their major departments.

We can see from Figure [...] that the intensional pattern for this query can be constructed in two ways. One way is to decompose it into three linear patterns:
Name - Person - Student, Student - Department, and Student - Grad - TA - Teacher - Department. The A-Intersect of these three patterns will produce a pattern that satisfies this query:

Π(Student * Person * Name • Student * Department • Student * Grad * TA * Department)[Name]

where the first A-Intersect operation operates over Student and the second operates over Student and Department. The A-Project operation projects the names of these students.

Another way is to decompose the intensional pattern into two linear patterns: Name - Person - Student - Department and Student - Grad - TA - Teacher - Department. Therefore, we have an alternative expression:

Π(Name * Person * Student * Department • Student * Grad * TA * Teacher * Department)[Name]

Query [...]: List the section# of those sections which have not been assigned a room or have not been assigned a teacher.

Since the query requests sections that have not been assigned a room or a teacher, these sections must not be connected with any room or any teacher (i.e., a section which does not associate with any room and teacher should also be retained in the result). Therefore, there should be Complement-patterns between Section and Teacher and between Section and Room, and a single arc between these two branches, as shown in Figure [...]. We emphasize that the ! operation, instead of the | operation, should be used to construct these two Complement-patterns. Then, the algebra expression for this query can be easily formulated as follows:

Π(Section# * (Section ! Room + Section ! Teacher))[Section#]
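The intent of the query above (sections with no room, or with no teacher) can be illustrated with a small Python sketch. It is a deliberately simplified model and not the A-algebra itself: it only tracks the Section side of each NonAssociate operation, omits the Complement-pattern concatenation, and uses plain set union for the A-Union. The function `non_associated` and all sample data are hypothetical.

```python
def non_associated(left, right, links):
    """Return the objects of `left` that are linked to no object of `right`.

    `links` is a set of (left_object, right_object) pairs standing in for
    the Inter-patterns of one association in the sample database.
    """
    linked = {l for (l, r) in links if r in right}
    return {obj for obj in left if obj not in linked}


# Hypothetical sample data.
sections = {"s1", "s2", "s3", "s4"}
rooms = {"r1", "r2"}
teachers = {"t1"}
section_room = {("s1", "r1"), ("s2", "r2")}
section_teacher = {("s1", "t1"), ("s3", "t1")}

# Section ! Room  and  Section ! Teacher, restricted to the Section objects.
no_room = non_associated(sections, rooms, section_room)
no_teacher = non_associated(sections, teachers, section_teacher)

# The A-Union of the two branches realizes the OR condition of the query.
unassigned = no_room | no_teacher
```

Under these assumptions `unassigned` holds s2, s3 and s4; section s4, which has neither a room nor a teacher, is kept once because the union removes duplicates.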
Query [...]: List the names of students who take courses [...] and [...].

We shall show three ways of formulating an expression for this query. First, the intensional pattern for this query, shown in Figure [...], can be constructed by the A-Intersect of two linear patterns, as we did for Query [...]:

Π( σ(Name * Person * Student * Enrollment * Course * Course#)[Course# = ...] • σ(Student * Enrollment_1 * Course_1 * Course#_1)[Course#_1 = ...] )[Name]

where Enrollment_1, Course_1 and Course#_1 are the aliases of the classes Enrollment, Course and Course#, respectively. This ensures that the A-Intersect operation will be performed only over the Student class.

A second way is to view the original pattern as a linear pattern without restriction on Course#, as follows:

Name - Person - Student - Enrollment - Course - Course#

Students who are taking both courses must participate in at least two such patterns, with Course# [...] and Course# [...], respectively. This implies an A-Divide operation. Thus, the query can be formulated as follows:

Π( Name * Person * Student * Enrollment * Course * Course# ÷{Student} σ(Course.Course#)[Course# = ... ∨ Course# = ...] )[Name]

where the dot in Course.Course# is used only for identifying the Course# class which is defined in the Course class. It does not represent a function or a method as in other languages. This expression can also be rewritten as follows:

Π( Name * Person * Π(Student * Enrollment * Course * Course# ÷{Student} σ(Course.Course#)[Course# = ... ∨ Course# = ...])[Student] )[Name]

which is more suitable for execution than the first, since the inner A-Project gets the student objects who are taking these two courses, so that all other data associated with these students, such as Enrollment, Course and Course#, do not have to be carried along in further processing to get the names of these students. Details of optimization issues will be addressed in the next chapter.

We stress that the above association pattern expressions represent the internal algebraic operations that need to be performed if the dynamic inheritance method is used. The high-level query statements
corresponding to these algebraic expressions, issued by the user, can be much simpler due to the inheritance of attributes in the generalization hierarchy or lattice.

Figure [...]: Regular-edges and Complement-edges in an OG.
Figure [...]: Examples of association patterns: (a) primitive association patterns; (b) complex association patterns.
Figure [...]: Examples of association-sets.
Figure [...]: A sample database (association graph). (The Complement-patterns are not shown.)
Figure [...]: Example of operations: (a) an Associate operation; (b) an A-Complement operation; (c) an A-Project operation;
(d) a NonAssociate operation; (e) an A-Intersect operation; (f) A-Integrate operations; (g) an A-Difference operation; (h) an A-Divide operation.
Figure [...]: Intensional patterns of Query [...] and [...].

CHAPTER [...]
MATHEMATICAL PROPERTIES OF OPERATORS AND THEIR APPLICATIONS IN QUERY OPTIMIZATION AND QUERY DECOMPOSITION

In Section [...], we have shown some mathematical properties of individual operators. In this section, we shall study their properties systematically. The properties of the A-algebra are classified into six categories: (1) conventional algebraic properties such as commutativity, associativity, idempotency, nilpotency and distributivity, (2) nesting of two unary operations, (3) a binary operation nested in a unary operation, (4) cascading of two different binary operations, (5) general identities, and (6) operation transformation. The properties presented in this dissertation are quite exhaustive, but may not be complete. These properties provide the mathematical foundation for query decomposition and query optimization. Their utilities in these two applications are also illustrated in this chapter. The proofs of properties that are marked with a dagger (†) can be found in the Appendix. Others can be proved similarly.

Conventional Algebraic Properties

To be systematic, first we list the properties given in Section [...] without
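explanation, since they have been illustrated previously. Then we give the properties of distributivity.

A. Commutativity

$\alpha *[R(A,B)]\, \beta = \beta *[R(B,A)]\, \alpha$ (†)
$\alpha\, |[R(A,B)]\, \beta = \beta\, |[R(B,A)]\, \alpha$ (†)
$\alpha\, ![R(A,B)]\, \beta = \beta\, ![R(B,A)]\, \alpha$ (†)
$\alpha \bullet_{\{W\}} \beta = \beta \bullet_{\{W\}} \alpha$ (†)
$\alpha + \beta = \beta + \alpha$

B. Associativity

$(\alpha_{\{X\}} *[R(A,B)]\, \beta_{\{Y\}}) *[R(C,D)]\, \gamma_{\{Z\}} = \alpha_{\{X\}} *[R(A,B)]\, (\beta_{\{Y\}} *[R(C,D)]\, \gamma_{\{Z\}})$, where $C \notin \{X\} \wedge B \notin \{Z\}$ (†)
$(\alpha_{\{X\}}\, |[R(A,B)]\, \beta_{\{Y\}})\, |[R(C,D)]\, \gamma_{\{Z\}} = \alpha_{\{X\}}\, |[R(A,B)]\, (\beta_{\{Y\}}\, |[R(C,D)]\, \gamma_{\{Z\}})$, where $C \notin \{X\} \wedge B \notin \{Z\}$ (†)
$(\alpha_{\{X\}} \bullet_{\{W_1\}} \beta_{\{Y\}}) \bullet_{\{W_2\}} \gamma_{\{Z\}} = \alpha_{\{X\}} \bullet_{\{W_1\}} (\beta_{\{Y\}} \bullet_{\{W_2\}} \gamma_{\{Z\}})$, where [...]
$(\alpha + \beta) + \gamma = \alpha + (\beta + \gamma)$ (†)

C. Idempotency and Nilpotency

$\alpha \bullet \alpha = \alpha$ (if α is a homogeneous association-set)
$\alpha + \alpha = \alpha$
$A *[R(A,A)]\, A = A$
$A\, ![R(A,A)]\, A = \phi$
[...]

D. Distributivity

(a) Distributive property of * with respect to +

$\alpha *[R(A,B)]\, (\beta + \gamma) = \alpha *[R(A,B)]\, \beta + \alpha *[R(A,B)]\, \gamma$ (†)

(b) Distributive property of | with respect to +

$\alpha\, |[R(A,B)]\, (\beta + \gamma) = \alpha\, |[R(A,B)]\, \beta + \alpha\, |[R(A,B)]\, \gamma$ (†)

(c) Distributive property of • with respect to +

$\alpha \bullet_{\{X\}} (\beta + \gamma) = \alpha \bullet_{\{X\}} \beta + \alpha \bullet_{\{X\}} \gamma$ (†)

These three properties hold true for the same reasons. First, the A-Union operation simply lumps together patterns of two association-sets without modifying them. Second, when two patterns are operated on by *, | or •, the production of a new pattern is independent of other patterns in the operand association-sets, i.e., the decision whether a new pattern is produced or not is determined only by the structure of the two patterns being operated on.

(d) Distributive property of * with respect to •

$\alpha_{\{X\}} *[R(CL_1,CL_2)]\, (\beta_{\{Y\}} \bullet_{\{W\}} \gamma_{\{Z\}}) = (\alpha_{\{X\}} *[R(CL_1,CL_2)]\, \beta_{\{Y\}}) \bullet_{\{W \cup X\}} (\alpha_{\{X\}} *[R(CL_1,CL_2)]\, \gamma_{\{Z\}})$ (†)

(e) Distributive property of | with respect to •

$\alpha_{\{X\}}\, |[R(CL_1,CL_2)]\, (\beta_{\{Y\}} \bullet_{\{W\}} \gamma_{\{Z\}}) = (\alpha_{\{X\}}\, |[R(CL_1,CL_2)]\, \beta_{\{Y\}}) \bullet_{\{W \cup X\}} (\alpha_{\{X\}}\, |[R(CL_1,CL_2)]\, \gamma_{\{Z\}})$

Distributive properties (d) and (e) hold true under the following three conditions: (i) $CL_2 \in W$, (ii) $X \cap Y = X \cap Z = \phi$, and (iii) α is a homogeneous association-set. The first condition ensures that the * and | operations are performed on the intersection of β and γ.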
Otherwise, it does not make sense to have an operation between α and γ. The second condition states that α patterns are non-overlapping with β and γ patterns. The third condition states that, on the right-hand side of the expression, only the patterns having the same α patterns as their subpatterns will succeed in the A-Intersect operation. Although these two distri[...]

[...] and [...] have totally different semantics. The former stands for patterns in α that are not associated with patterns in both β and γ, whereas the latter specifies those patterns in α that are not associated with any pattern in either β or γ. Second, ! is not distributive with respect to •. This property does not hold because performing the A-Intersect operation first may drop some β patterns which may be associated with some α patterns, and the dropped β patterns may allow those α patterns to be non-associated with the result of the A-Intersect operation, whereas, when performing the NonAssociate operation first, those α patterns may not appear in the final result.

The reason that the NonAssociate operator is not distributive with respect to the A-Union and A-Intersect operations is mainly that it is not associative. We shall see from the rest of this chapter that it has fewer properties than other operators.

Nesting of Two Unary Operations

(a) Two A-Select operations (one nested in the other)

Similar to the relational algebra, the order of the nesting of two selections can be exchanged without affecting the final result. Alternatively, they can be combined into a single selection operation. The selection condition of the combined A-Select operation is the conjunction of the predicates of the original two A-Select operations.

$\sigma(\sigma(\alpha)[P_1])[P_2] = \sigma(\sigma(\alpha)[P_2])[P_1] = \sigma(\alpha)[P_1 \wedge P_2]$
(†)

(b) Two A-Project operations (one nested in the other)

It should be obvious that the order of the nesting of two projection operations cannot be exchanged, except when they project the same thing, which is not meaningful. However, they are equivalent to a single projection if the outer A-Project operation projects subpatterns over patterns produced by the inner A-Project:

[...]

where the e_1i are subpattern expressions of the first A-Project operation, the e_2j are subpattern expressions of the second A-Project operation, and e_1i ⊂ e_2j means that e_1i defines a subpattern of e_2j.

(c) Two A-Integrate operations (one nested in the other)

By the definition of the A-Integrate operation, if an A-Integrate operation is applied a second time on an association-set, it will have no effect on the result of the first operation. Therefore, we have

A-Integrate{W}(A-Integrate{W}(α)) = A-Integrate{W}(α)
A-Integrate(A-Integrate(α)) = A-Integrate(α)

Since an A-Integrate operation with a set of specified classes only performs part of the function of an A-Integrate operation without a set of specified classes, the following equations also hold true:

A-Integrate(A-Integrate{W}(α)) = A-Integrate(α)
A-Integrate{W}(A-Integrate(α)) = A-Integrate(α)

(d) A-Select nested in A-Project or vice versa

A selection operation performed on the result of a projection operation is equivalent to the projection performed on the result of the selection, since the selection condition applicable to the projected subpatterns must be applicable to the patterns before the projection. However, it is not true for the other direction.

$\sigma(\Pi(\alpha)[E;T])[P] = \Pi(\sigma(\alpha)[P])[E;T]$

For the other direction to be true, the classes involved in the predicate of the selection condition should also appear in the [E] clause of the projection operation (denoted as P ⊂ E), which defines the subpattern(s) to be projected out. Otherwise, the result of the selection is always an empty set because the predicate is not applicable to the projected patterns. Therefore, the above property holds true for both directions when the condition holds; thus we have

$\Pi(\sigma(\alpha)[P])[E;T] = \sigma(\Pi(\alpha)[E;T])[P]$
A Binary Operation Nested in a Unary Operation

(1) Binary operation nested in an A-Select

a) Associate, A-Complement, or A-Intersect nested in A-Select

Generally speaking, transforming an expression of a binary operation (Associate, A-Complement, or A-Intersect) nested in a selection into another expression is impossible, since the predicate of the selection operation can be very complicated. For this reason, we study only the simple case in which the predicate has the form $P_1 \wedge P_2$ or $P_1 \vee P_2$, and $P_1$ and $P_2$ are only applicable to α and β, respectively. The following properties are similar to those in relational algebra. They do not need an explanation.

For $P_1 \wedge P_2$, we have

$$\sigma(\alpha *[R(A,B)] \beta)[P_1 \wedge P_2] = \sigma(\alpha)[P_1] *[R(A,B)] \sigma(\beta)[P_2]$$
$$\sigma(\alpha \,|[R(A,B)]\, \beta)[P_1 \wedge P_2] = \sigma(\alpha)[P_1] \,|[R(A,B)]\, \sigma(\beta)[P_2]$$
$$\sigma(\alpha \bullet \beta)[P_1 \wedge P_2] = \sigma(\alpha)[P_1] \bullet \sigma(\beta)[P_2]$$

For $P_1 \vee P_2$, we have

$$\sigma(\alpha *[R(A,B)] \beta)[P_1 \vee P_2] = \sigma(\alpha)[P_1] *[R(A,B)] \beta + \alpha *[R(A,B)] \sigma(\beta)[P_2]$$
$$\sigma(\alpha \,|[R(A,B)]\, \beta)[P_1 \vee P_2] = \sigma(\alpha)[P_1] \,|[R(A,B)]\, \beta + \alpha \,|[R(A,B)]\, \sigma(\beta)[P_2]$$
$$\sigma(\alpha \bullet \beta)[P_1 \vee P_2] = \sigma(\alpha)[P_1] \bullet \beta + \alpha \bullet \sigma(\beta)[P_2]$$

We note that the above properties are not true for a NonAssociate operation nested in an A-Select. The reason is similar to what we have explained in the section on the distributive property.

b) A-Difference nested in A-Select

Since both A-Difference and A-Select operations perform a restriction on an association-set and produce a subset of patterns without changing their original structures, an A-Select operation performed on the minuend or on the result of the A-Difference operation will produce the same result:

$$\sigma(\alpha - \beta)[P] = \sigma(\alpha)[P] - \beta$$

c) A-Union nested in A-Select

It should be obvious that the following equation is always true:

$$\sigma(\alpha + \beta)[P] = \sigma(\alpha)[P] + \sigma(\beta)[P]$$

In the special case that P has the form $P_1 \vee P_2$, and $P_1$ and $P_2$ can be applied to α and β, respectively, we have

$$\sigma(\alpha + \beta)[P_1 \vee P_2] = \sigma(\alpha)[P_1] + \sigma(\beta)[P_2]$$
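The A-Difference and A-Union cases can be illustrated with ordinary sets. This is a sketch under the assumption that, for this purpose, an association-set behaves like a plain set of hashable patterns, so that A-Union and A-Difference reduce to set union and set difference; the pattern encoding and all names below are hypothetical.

```python
from typing import Callable, FrozenSet, Tuple

# Assumption: a pattern is a hashable (label, value) pair.
Pattern = Tuple[str, int]
AssocSet = FrozenSet[Pattern]


def a_select(s: AssocSet, pred: Callable[[Pattern], bool]) -> AssocSet:
    """Restrict an association-set to the patterns satisfying pred."""
    return frozenset(p for p in s if pred(p))


alpha: AssocSet = frozenset({("a", 1), ("a", 2), ("a", 3)})
beta: AssocSet = frozenset({("a", 2), ("b", 7)})


def pred(p: Pattern) -> bool:
    return p[1] >= 2


# sigma(alpha + beta)[P] versus sigma(alpha)[P] + sigma(beta)[P]
union_then_select = a_select(alpha | beta, pred)
select_then_union = a_select(alpha, pred) | a_select(beta, pred)

# sigma(alpha - beta)[P] versus sigma(alpha)[P] - beta
diff_then_select = a_select(alpha - beta, pred)
select_then_diff = a_select(alpha, pred) - beta
```

Pushing the selection below the union or onto the minuend lets it run on smaller inputs, which is the usual reason an optimizer applies these rewrites.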
(2) Binary operation nested in A-Project or A-Integrate

Since A-Project and A-Integrate operations produce patterns which may contain subpatterns of both operands of the nested binary operation, properties similar to those presented above do not hold in general, except for the nesting of an A-Union operation:

$$\Pi(\alpha + \beta)[\mathcal{E}; T] = \Pi(\alpha)[\mathcal{E}; T] + \Pi(\beta)[\mathcal{E}; T]$$

[A-Integrate equations illegible in source]

Cascading of Two Binary Operations

(1) Cascading of two identical binary operators

Most cases have been covered by the associativity properties. Although associativity does not hold for the A-Difference and A-Divide operators, there exist some equivalent expressions. The cascading of two A-Difference operations follows the set difference in set theory:

$$\alpha - \beta - \gamma = \alpha - \gamma - \beta = \alpha - (\beta + \gamma)$$

The cascading of two A-Divide operations is equivalent to the dividend divided by the A-Union of the two divisors, because an A-Divide operation retains patterns of the dividend without modifying their structures (note that the divide operation in relational algebra retains a substructure of the dividend). Therefore, the order of the two A-Divide operations is not important:

$$\alpha \div\{W\}\, \beta \div\{W\}\, \gamma = \alpha \div\{W\}\, \gamma \div\{W\}\, \beta = \alpha \div\{W\}\, (\beta + \gamma)$$

(2) Cascading of two different binary operations

Many cases have been covered by the distributive properties. Although the distributivity properties of [...] and [...] with respect to [...] do not hold, there still exist some equivalent expressions. These properties are listed below according to their first operators.

a) * with other binary operators

The cascading of * and | operators is associative:

[equation and its condition illegible in source]

The condition ensures that the operation over R(A,B) does not operate on γ-patterns and the operation over R(C,D) does not operate on α-patterns.

For the cascading of * and - operators (in that order), it should be obvious that when the subtrahend is only applicable to one of the operands of the * operation, the - operation can be performed first and just against that operand:

[equation illegible in source]
[text missing in source]

b) | with other binary operators

Similar to the above two properties, we have

[equations illegible in source]

[text missing in source]

The significance of equations [...] and [...] is that they can be used to transform the original expressions in which the operators operate on heterogeneous association-sets (e.g., α [...]), for which the distributivity cannot be applied, into expressions in the format of A-Unions of homogeneous association-sets.

e) A-Divide with other operators

An association-set (α) divided by the A-Union of two other association-sets (β and γ) is equivalent to two consecutive A-Divide operations of α divided by β and γ in turn, as indicated in equation [...]. The order of the two A-Divide operations is not important:

$$\alpha \div\{W\}\, (\beta + \gamma) = \alpha \div\{W\}\, \beta \div\{W\}\, \gamma = \alpha \div\{W\}\, \gamma \div\{W\}\, \beta$$

The A-Divide operator also has fewer properties because it is not associative.

f) [...] with other binary operators

The properties of the [...] operator cascaded with other operators are covered by [...] and [...].

g) [...] with other binary operators

The equation below follows the set-union and set-difference operations in set theory:

[equation illegible in source]

The properties of cascading of [...] with the *, [...], • and [...] operators can be found in [...] and [...], since the latter operators are commutative.

General Identities

There are many other properties which are unique to the A-algebra but cannot be classified into the above categories. Listed below are some identity properties. These identities are useful for expression reduction.

$$A \bullet A * B = A * B$$
[two further identities illegible in source]
$$A * B * C \bullet A * B = A * B * C$$

Transformation of Operators

An important fact we have observed is that the same pattern can be constructed by different algebraic expressions using different operators. For example, pattern A-B-C can be constructed either by A*B*C or by B*A • B*C; hence,

$$B * A \bullet B * C = A * B * C$$

Formally, their equivalence can be derived using the properties presented in the previous sections (the property numbers cited at each step are lost in the source):

$$B * A \bullet B * C = (B \bullet B * C) *[R(B,A)] A = (B * C) *[R(B,A)] A = A * (B * C) = A * B * C$$

For the other
direction, we have (intermediate steps and cited property numbers are partly illegible in the source)

$$A * B * C = A * (B \bullet B) * C = (A * B \bullet B) * C = \dots = A * B \bullet B * C$$

Using this property, a pattern of tree structure can be described without using the A-Intersect operator, which is relatively more expensive to implement. For example,

$$A * (B * C \bullet B * D) = A *[R(A,B)] (C * B * D) = A * (B * C *[R(B,D)] D) = A * B * C *[R(B,D)] D$$

Another useful transformation is possible because a pattern of lattice structure expressed by an intersection of two linear patterns can be viewed as a selection on linear patterns, which avoids the expensive A-Intersect operation. For example,

$$A * B * C * D \bullet B * E * D = \sigma(A * B * C * D * E * B_1)[B = B_1]$$

The left-hand side constructs a lattice pattern by intersecting two linear patterns over classes B and D. By breaking the lattice pattern at B, it becomes a single linear pattern, as seen on the right-hand side of the above expression. Here, $B_1$ is an alias of B. By specifying that $B = B_1$ in the association-set defined by $A * B * C * D * E * B_1$, we obtain the same result as the expression defined on the left-hand side.

Based on these two transformation properties, a complicated network structure can be viewed as a forest structure by properly breaking all the loops in the network, and its algebraic expression can be specified using the σ, *, and [...] operators.

Applications in Query Optimization and Query Decomposition

We have systematically presented the mathematical properties of the opera[text missing in source] and is optimal for execution. Finally, the new access plan is scheduled for exe[text missing in source]culus (refer to Chapter [...]).

Query optimization is, without loss of generality, an NP-hard problem. Therefore, an access plan generated by the optimizer is optimal only in a very restrictive sense. Furthermore, to be practical, the overhead of the optimizer should never exceed the advantage of query optimization. In general, a query optimizer generates an optimal access plan in two steps: (1) generate a (limited number of) equivalent access plans, and (2) evaluate these access plans based on (a few) system
parameters and criteria.

The mathematical properties of the A-algebra presented above are the foundation for the first step of query optimization in databases. In the second step, the system/application chooses one or more of the following as the goal of its query optimization: minimal response time, minimal execution time, minimal communication time, minimal storage space, maximal resource utilization, etc. The parameters used in estimating the performance of an access plan include communication cost (per block), CPU cost (per unit), I/O cost (per I/O), buffer size, selectivities of operations (e.g., Selection and Join in relational databases), data structure, algorithms of the operations (e.g., nested-join, hash-join), etc.

Since the criteria of optimization are system/application dependent and the optimization strategies vary from system to system, a detailed study is out of the scope of this dissertation. We shall give an example to demonstrate the importance of the A-algebra in query optimization.

Query [...]: List GPAs of students who major and minor in the same departments.

The intensional pattern for this query is shown in Figure [...]a. Suppose that the algebraic expression produced by the query translator is as follows, which corresponds to an access plan represented by the query tree shown in Figure [...]b:

$$\Pi(GPA * (Student * Department \bullet Student * Undergrad * Department))[GPA]$$

To make the evaluation easy, we assume that every student has a major, a minor and a GPA (i.e., the selectivities of all * operations are [...]), and that [...] out of [...] students major and minor in the same departments (i.e., the selectivity of the • operation is [...]). If the time to perform an A-Select on a pattern is [...] unit, to perform an Associate operation is [...] units, and to perform an A-Intersect operation is [...] units, the total execution time can be calculated as follows (not including the time for the A-Project operation):

$$T_1 = [\ldots]$$

where the first term is the time for identifying students' majors, the second term is for identifying students' minors, the third term is
for the A-Intersect operation, and the last term is for identifying the GPAs. In Figure [...]b, the costs of operations are depicted next to the operator nodes. Here, the time for the A-Intersect operation is small because each student has only one major and one minor, and indices may be used to speed up the operation.

Using property [...], the same intensional pattern can be viewed as a linear pattern shown in Figure [...]a, and thus the optimizer generates a new algebraic expression which corresponds to the access plan shown in Figure [...]b:

$$\Pi(\sigma(GPA * Student * Department * Undergrad * Student_1)[Student = Student_1])[GPA]$$

The total execution time for this access plan is

$$T_2 = [\ldots]$$

where the first term is the time for four Associate operations, and the second term is the time for the selection operation. It is less expensive than the original access plan, and thus a better plan.

However, assume now that the database is a distributed one in which data of students' GPAs are in site [...] and other data are in site [...] (the class Student has to be replicated in both sites). The communication cost is assumed to be [...] units per block, with a block size of [...] patterns. The total execution times for these two access plans can be calculated as follows:

$$T_1 = [\ldots]$$
$$T_2 = [\ldots]$$

In $T_1$, the fourth term is the communication cost for sending qualified students to site [...]. In $T_2$, the third term is the communication cost (the communication costs are the same for sending GPAs of all students to site [...] and for sending students' majors and minors to site [...]). In this case, the first access plan is better than the second. Figure [...]a and b depict the costs of operations (next to the operations) and the costs of communications (on the edges) for these two access plans.

The optimizer of the distributed system may generate another access plan by applying property [...] to the algebraic expression of the second access plan, and we have

$$\Pi(GPA * \sigma(Student * Department * Undergrad * Student_1)[Student = Student_1])[GPA]$$

which corresponds to the access plan shown in Figure [...]c.
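The comparison of access plans described above can be sketched as a small cost model. The numeric figures of the original example did not survive in this copy, so every constant below is a made-up assumption, and the way each plan's terms are combined is one plausible reading of the prose rather than the dissertation's own formulas. The function names are hypothetical.

```python
import math

# All values are illustrative assumptions, not the source's figures.
N_STUDENTS = 1000      # students in the database
QUALIFIED = 100        # students whose major and minor coincide
T_SELECT = 1           # time units per A-Select on a pattern
T_ASSOCIATE = 2        # time units per Associate operation
T_INTERSECT = 5        # time units per A-Intersect operation
COMM_PER_BLOCK = 50    # communication cost per block
BLOCK_SIZE = 10        # patterns per block


def comm(patterns: int) -> int:
    """Cost of shipping the given number of patterns between sites."""
    return math.ceil(patterns / BLOCK_SIZE) * COMM_PER_BLOCK


def plan1(distributed: bool = False) -> int:
    """Intersect the major and minor patterns, then fetch GPAs."""
    majors = N_STUDENTS * T_ASSOCIATE          # Student * Department
    minors = N_STUDENTS * 2 * T_ASSOCIATE      # Student * Undergrad * Department
    intersect = N_STUDENTS * T_INTERSECT
    gpas = QUALIFIED * T_ASSOCIATE             # only qualified students
    ship = comm(QUALIFIED) if distributed else 0
    return majors + minors + intersect + gpas + ship


def plan2(distributed: bool = False) -> int:
    """One linear pattern with four Associates, then a selection."""
    associates = N_STUDENTS * 4 * T_ASSOCIATE
    select = N_STUDENTS * T_SELECT
    ship = comm(N_STUDENTS) if distributed else 0   # GPAs of all students
    return associates + select + ship


def plan3() -> int:
    """Distributed variant: select first, ship qualified students, then GPA."""
    associates = N_STUDENTS * 3 * T_ASSOCIATE
    select = N_STUDENTS * T_SELECT
    ship = comm(QUALIFIED)
    gpas = QUALIFIED * T_ASSOCIATE
    return associates + select + ship + gpas
```

With these assumed constants the ranking behaves as the text describes: the second plan is cheaper on a single site, while the first is cheaper once communication is charged. The third plan comes out cheapest here, but that outcome depends entirely on the chosen constants.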
The total execution time for this access plan is

$$T_3 = [\ldots]$$

[text missing in source]trators and application programs in conventional database systems. To ensure good performance, DBMSs need the support of parallel and distributed processing techniques.

In a distributed and parallel processing environment, a query is decomposed into subqueries according to the processing capabilities of processors and/or the data distribution. The algebraic representation of a query can be manipulated mathematically for this purpose. For example, suppose a query is represented by the intensional pattern shown in Figure [...]a. The algebra expression for this query can be written as follows:

$$expr = A * (B * E * F + B * (C * D * H \bullet C * G))$$

By applying the distributivity properties, the above expression can be written as below:

$$expr = A * (B * E * F + (B * C * D * H \bullet B * C * G))$$
$$= A * B * E * F + A * (B * C * D * H \bullet B * C * G)$$
$$= A * B * E * F + (A * B * C * D * H \bullet A * B * C * G)$$

The decomposed expression is the A-Union of two subexpressions representing the two subpatterns shown in Figure [...]b. These subexpressions are independent of each other and can be processed in parallel in a parallel system. The second subexpression can be further optimized, as shown in the following expression, in which *[R(C,G)] indicates that the Associate operation is performed through the association between C and G:

$$expr = A * B * E * F + (A * B * C * D * H) *[R(C,G)] G$$
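The decomposition above can be sketched as a rewrite over a small expression tree. This is a minimal illustration, assuming only the distributivity of * over + that the example itself uses; the tuple encoding, the operator tokens and the helper names are hypothetical, and the sketch does not attempt the later step that replaces the A-Intersect with an Associate over R(C,G).

```python
from typing import List, Tuple, Union

# Assumption: an expression is a class name or a tuple (op, left, right).
Expr = Union[str, Tuple[str, "Expr", "Expr"]]


def chain(*names: str) -> Expr:
    """Build a left-nested Associate chain, e.g. chain('B','E','F') = (B*E)*F."""
    expr: Expr = names[0]
    for name in names[1:]:
        expr = ("*", expr, name)
    return expr


def union_terms(e: Expr) -> List[Expr]:
    """Return union-free terms whose A-Union is equivalent to e."""
    if isinstance(e, str):
        return [e]
    op, left, right = e
    ls, rs = union_terms(left), union_terms(right)
    if op == "+":
        return ls + rs
    if op == "*":
        # Distribute Associate over A-Union on either side.
        return [("*", l, r) for l in ls for r in rs]
    if len(ls) == 1 and len(rs) == 1:
        return [(op, ls[0], rs[0])]
    raise NotImplementedError(f"no distributivity rule assumed for {op!r}")


def has_union(e: Expr) -> bool:
    """True if an A-Union still occurs anywhere inside e."""
    return not isinstance(e, str) and (
        e[0] == "+" or has_union(e[1]) or has_union(e[2])
    )


# expr = A * (B*E*F + B * (C*D*H . C*G)), with '.' standing for A-Intersect.
expr: Expr = (
    "*",
    "A",
    ("+", chain("B", "E", "F"),
          ("*", "B", (".", chain("C", "D", "H"), chain("C", "G")))),
)
terms = union_terms(expr)
```

Each element of `terms` is free of A-Union and can be handed to a separate processor, which mirrors the two independent subexpressions of the example.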
In addition, since each subexpression represents a homogeneous association-set, its processing will be more efficient than processing over heterogeneous association-sets. Next, we present two theorems of the A-algebra which ensure that the decomposed subexpressions produce homogeneous association-sets.

Theorem [...]: Operators (except A-Union and A-Integrate) of the A-algebra produce homogeneous association-sets if their operands are homogeneous association-sets.

Proof: This is true by the definitions of the operators (the A-Intersect operation should be used without specifying the classes on which it is performed, i.e., it performs on the common classes of its operands). Note that for A-Difference and A-Divide operations, this is also true if only the first operand (the minuend or the dividend) is a homogeneous association-set.

Theorem [...]: If an A-algebra expression does not contain an A-Integrate operation or an A-Divide operation whose dividend is a heterogeneous association-set, it can be decomposed into the A-Unions of some subexpressions, each of which produces a homogeneous association-set.

Proof: According to Theorem [...], besides the A-Integrate operation, the A-Union is the only operator that can produce a heterogeneous association-set when its operands are homogeneous association-sets. Therefore, it suffices to prove that whenever such a heterogeneous association-set appears in an expression, the expression can be decomposed into the A-Union of subexpressions which produce homogeneous association-sets.

Let α, β, and [...] all be homogeneous association-sets. By properties [...] and [...], we have

[equations illegible in source]

By properties [...], we have

[equation illegible in source]

By properties [...], we have

[equations illegible in source]
In the above decompositions, each term of the A-Union operations represents a homogeneous association-set. □

[Figure [...]: Access plan of Query [...]. (a) intensional pattern of Query [...]; (b) access plan of Query [...].]

[Figure [...]: Access plan of Query [...]. (a) alternative intensional pattern of Query [...]; (b) access plan of Query [...].]

[Figure [...]: Costs in a distributed system. Panels (b) and (c): cost of access plan [...].]

[Figure [...]: Example of query decomposition, panels (a) and (b).]

CHAPTER [...]
COMPLETENESS OF THE A-ALGEBRA

We have shown in the preceding sections that a query issued against an object-oriented database can be specified by an association (or graphic) pattern in which object instances of interest are related (associated or non-associated), and that the A-algebra provides a useful mathematical method for specifying and manipulating such patterns to produce the result for the query. However, for the algebra to be truly useful, the completeness of the algebra needs to be addressed.

Due to the closure property of the A-algebra, the result of a query is represented intensionally by a subdatabase schema graph ($SG_t$) and extensionally by a subdatabase object graph ($OG_t$), where $SG_t$ is a subgraph of the SG of the original database and $OG_t$ is a subset of the association patterns in the original object graph (OG). A subdatabase can be further operated upon by the A-algebra operators to produce other subdatabases. We can, therefore, define the completeness of the algebra in the following way.

Completeness Theorem: The A-algebra is complete if it can define all possible subdatabases of an object-oriented database.

Before proving the theorem, we first give the formal definitions of the SG and OG of the subdatabases of an object-oriented database.

Subdatabase Schema Graph

A subdatabase schema graph ($SG_t$) is a set of m connected subgraphs $\{SG_k(C_k, A_k)\}$ ($k = 1, \dots, m$) from the original database schema graph SG(C, A), where C is a set of vertices representing
classes $\{C_i\}$, and A is a set of edges representing associations between classes, each of which is denoted by $A_{ij}$ for an association between classes $C_i$ and $C_j$.

If $C_i \in SG_j$, then $C_i \notin SG_k$ ($\forall k \neq j$).

The condition ensures that a class does not appear in two different connected graphs in a subdatabase. If it does, the two connected graphs should have been a single connected graph.

Subdatabase Object (Association) Graph

A subdatabase object graph ($OG_t(O_t, E_t)$) contains a subset of the association patterns of the original database object graph (OG(O, E)), where O is a set of vertices representing object instances and E is a set of edges representing associations between object instances.

An Inner-pattern or object instance ($O_{ij}$) belongs to $OG_t$ only if $C_i \in SG_t$ and $O_{ij} \in C_i$. An Inter-pattern or a Complement-pattern ($O_{ij} O_{mn}$) belongs to $OG_t$ only if $C_i, C_m \in SG_t$ and $A_{im} \in SG_t$, where $O_{ij} \in C_i$, $O_{mn} \in C_m$, and $O_{ij} O_{mn} \in A_{im}$.

The above conditions state that a primitive association pattern should not be included in $OG_t$ if the corresponding classes and/or associations of the original database are not in $SG_t$.

Instead of proving the completeness theorem as stated above, we make the following observations and restate the theorem as shown below.

First, although the SG of an object-oriented database may consist of more than one connected graph, it suffices to prove the case in which the SG is a single connected graph, since if two classes do not have a path between them in the SG, they will not be associated with each other in any of the subdatabases. Therefore, each connected graph of SG can be treated as an independent database, and a subdatabase defined on more than one connected graph of SG can be represented by the A-Union of the subdatabases defined on the different connected graphs of SG.

Second, it suffices to prove the case in which a subdatabase consists of only one connected subgraph of SG, although in general the $SG_t$ of a subdatabase may contain more than one subgraph of SG. This is because the general case can be represented by the A-Union of the expressions for individual
subgraphs.

Third, since an object-oriented database is a collection of association patterns, it should be obvious that if there exists an A-algebra expression for every association pattern of an object-oriented database, then the subdatabases can be represented by the A-Union of a subset of these association patterns.

Therefore, the completeness theorem can be restated as follows.

Completeness Theorem: The A-algebra is complete if there exists an expression for every association pattern in the OG of an object-oriented database.

We prove the above theorem by induction on the number of object instances in an association pattern.

Proof:

Basis: We first show that there is an expression for the case in which an association pattern contains a single object instance. Since the name of a class, say $C_1$, represents all the object instances of the class, an association pattern containing a single object instance of that class can be represented by an A-Select operation over the object instances of $C_1$ to select a particular object instance of interest, as shown below:

$$\sigma(C_1)[B]$$

where B is the condition an object instance of $C_1$ must satisfy.

Hypothesis: Assume that there exists an expression for every association pattern that contains n-1 object instances. These n-1 object instances must form a connected graph, i.e., there must be at least one path between any two object instances in the graph. Otherwise, they would have formed multiple association patterns.

Induction: Suppose there exists an expression for an association pattern $P_{n-1}$ which contains n-1 object instances. When adding the nth object instance to this pattern, a new pattern $P_n$ containing n object instances can be formed in the following two ways, as depicted in Figure [...]: (a) the nth object instance belongs to class $C_k$, and the object instances of $C_k$ do not participate in $P_{n-1}$, and (b) the nth object instance belongs to a class, say $C_j$, which has some object instance(s) participating in $P_{n-1}$.

To avoid using complicated notation, we will show the formulations for two specific patterns
depicted in Figure [...]a and b, which correspond to the cases of Figure [...]a and b, respectively. Patterns in general forms can be formulated using the same mechanism described below. We shall discuss cases a and b in turn.

Case a: When adding an object instance of $C_{[...]}$ to a pattern $P_{[...]}$ containing [...] object instances, various new patterns $P_{[...]}$ can be formed, depending on the associations between the new object instance and the other existing object instances. The new object instance can only have one association with an existing object instance if their classes are directly connected in SG by a single association type (we will consider later the case in which there is more than one association type between two classes). There are only three possible choices for the new object instance to relate to an existing object instance: (1) the association is of no interest, i.e., the association is not included in the pattern, (2) they are associated with each other, (3) they are not associated with each other. Graphically, we use a solid line (an Inter-pattern) to represent choice (2), and a dashed line (a Complement-pattern) [text missing in source]tern P' in Figure [...]a, provided that the object instances of the aliasing classes of the same class are not the same object instances.

Next, the equivalent pattern is decomposed into a set of patterns, each of which is a subpattern (i.e., subgraph) of the pattern in Figure [...]a and consists of $P_{[...]}$, the new object instance, and its relationship with one object instance in $P_{[...]}$. If we can derive expressions for these subpatterns individually, the A-Intersect of these expressions will be the expression for the pattern in Figure [...]a, which is equivalent to the pattern in Figure [...]a. In this example, the pattern in Figure [...]a is decomposed into six subpatterns, as shown in Figure [...]a, which can be easily expressed as follows:

[six expressions illegible in source]

respectively. Here, E stands for the algebraic expression of the association
pattern specified by its subscript. In each expression, an operation * [text missing in source] after the pattern transformation process (see Figure [...]b). As shown in Figure [...]b, the equivalent pattern depicted in Figure [...]b is decomposed into four patterns, which can be expressed by

[four expressions illegible in source]

respectively. Therefore, for the pattern $P'_{[...]}$, we have the expression

[equation illegible in source]

However, the above expression does not exclude the case in which two object instances in aliasing classes of $C_{[...]}$ refer to the same object instance. Hence, it is necessary to perform an A-Select operation to eliminate such a case, and we have

[equation illegible in source]

So far, we have shown that there exists at least one expression for a pattern consisting of any number of object instances. We note that there may exist more than one expression for a pattern. We illustrate this by showing an alternative way of transforming a pattern into an equivalent one so that different expressions can be derived.

Figure [...]a shows another pattern which is equivalent to the pattern in Figure [...]a if, in Figure [...]a, the object instances of the aliasing classes $C_{[...]}$ through $C_{[...]}$ that participate in $P_{[...]}$ refer to the same object. Therefore, we have an alternative expression for $P'_a$:

[equation illegible in source]

which is a sequence of * and/or | operations on $E_{[...]}$ over classes $C_{[...]}$ and their associated classes. The selection condition [...] ensures that the object instances in all aliasing classes of $C_{[...]}$ refer to the same object.

Similarly, the pattern in Figure [...]b is equivalent to the pattern in Figure [...]b if the object instances in $C_{[...]}$ through $C_{[...]}$ that participate in $P_{[...]}$ are the same object, and this object is different from the one in $C_{[...]}$. Hence, an alternative expression can be derived as follows:

[equation illegible in source]

We have shown that there exists an expression for every association
pattern when there is a single association between two classes. Now we prove this is also true when there is more than one association between two classes. There are also two cases, as described in the proof above. We only prove case a, in which the new object instance belongs to $C_k$ and the object instances of $C_k$ do not participate in $P_{n-1}$. Case b can be proven using the same methodology.

Figure [...]a shows an SG in which there are two associations between $C_{[...]}$ and $C_k$. The two associations are denoted as $[R_1(C_{[...]}, C_k)]$ and $[R_2(C_{[...]}, C_k)]$, respectively. Figure [...]b shows a pattern in which the new object instance of $C_k$ has two associations with each object instance of $C_{[...]}$. The associations between object instances of $C_{[...]}$ and $C_k$ are labeled by numbers corresponding to the associations of their classes. To derive the algebraic expression for this pattern, we first decompose it into two patterns, as shown in Figure [...]c. The decomposition is done by making two copies of the pattern. In one copy, the associations labeled [...] are dropped, and in the other, the associations labeled [...] are dropped. From the earlier discussion, we can derive expressions for these two patterns, and the expression for the original pattern can be represented by the A-Intersect of the two:

[equation illegible in source]

To ensure that the A-Intersect operation will produce the pattern as required, the same object instance in the two copies should use the same aliasing class name when the two expressions are formulated.

Generally, if the new object instance of $C_k$ has multiple associations with object instances of several classes, the association pattern is decomposed into m patterns, where m is the maximum number of associations $C_k$ has with another class.

Since it has been shown that we can formulate algebraic expressions for all possible patterns in which object instances are associated or non-associated, and the A-Union of these expressions forms a single expression for the subdatabase of interest, we have shown that the
A-algebra is complete by induction. □

[Figure [...]: Two ways of forming new patterns. (a) the nth object is in $C_k$; (b) the nth object is in $C_j$.]

[Figure [...]: Two specific examples of new patterns. (a) the [...]th object is in $C_{[...]}$; (b) the [...]th object is in $C_{[...]}$.]

[Figure [...]: Equivalent patterns, panels (a) and (b).]

[Figure [...]: Decomposed patterns, panels (a) and (b).]

[Figure [...]: Other equivalent patterns, panels (a) and (b).]

[Figure [...]: New object instance having multiple associations with those of $C_{[...]}$. (a) Two classes have multiple associations; (b) Two objects have multiple associations in a pattern.]

CHAPTER [...]
CONCLUSION

Object-Oriented DBMSs and their underlying models exhibit several desirable features that are suitable for modeling and processing complex objects found in more advanced database applications. However, they still do not have a solid mathematical foundation. Such a foundation is important for the efficient manipulation of databases and for the design of high-level query languages to ease the user's task in accessing and manipulating databases.

In this dissertation, we have presented an algebra for database processing based on the uniform representation of object instances and their associa[text missing in source]databases that are derivable from an object-oriented database can be expressed in A-algebra expressions.

The A-algebra has been used in the design and implementation of a high-level object-oriented query language, OQL, for processing databases [ALA..b, WU..]. A graphic interface for the language and a prototype knowledge base management system based on the semantic association model OSAM* [SU.. and SU..] are presented in [DS.., TY.., SU.., LAM.., PAN.., CHU.., SIN..].

REFERENCES

[AH..] Aho, A.V., Beeri, C., and Ullman, J.D., The Theory of Joins in Rela[text missing in source] Dependency Structures of Data Base Relationships, [...] ACM, New [text missing in source]
Keys whose entries are missing in the source: [ALA..a], [ALA..b], [ALA..], [ARM..], [AST..], [BAN..], [BAN..], [BAT..], [COD..], [COL..], [DAH..], [DEL..].
[BAT..] Batory and Kim, W., Modeling Concepts for VLSI CAD Objects, ACM Transactions on Database Systems, pp. [...].
[BEE..] Beeri, C.,
Fagin, R., and Howard, J.H., A Complete Axiomatization for Functional and Multivalued Dependencies, ACM SIGMOD International Symposium on Management of Data, Los Angeles, CA, pp. [...].
[CAR..] Carey, M., DeWitt, D., and Vandenberg, S.L., A Data Model and Query Language for EXODUS, ACM-SIGMOD Conference, pp. [...].
[CHU..] Chuang, H.S., Operational Role Processing in a Prototype OSAM* KBMS, Master's thesis, University of Florida.
[COD..] Codd, E., A Relational Model of Data for Large Shared Data Banks, CACM, pp. [...].
[COD..a] Codd, E., Relational Completeness of Database Sublanguages, in Data Base Systems (Rustin, R., ed.), Prentice-Hall, Inc., Englewood Cliffs, NJ, pp. [...].
[COD..b] Codd, E.F., Further Normalization of the Data Base Relational Model, in Data Base Systems (R. Rustin, ed.), Prentice-Hall, Englewood Cliffs, NJ, pp. [...].
[COD..] Codd, E.F., [text missing in source]
[DS..] D'Souza, T., Graphic Semantic Data Definition Language and a Graphic Browser for the Object-oriented Semantic Association Model, Master's thesis, University of Florida.
[ELM..] Elmore, P., Shaw, G.M., and Zdonik, S.B., The ENCORE Object-Oriented Data Model, tech. rep., Brown University, November [...].
[FAG..] Fagin, R., Multivalued Dependencies and a New Normal Form for Relational Databases, ACM Transactions on Database Systems, pp. [...].
[FIS..] Fishman, D.H., Beech, Cate, H.P., Chow, E.C., Connors, T., Davis, J.W., Derrett, N., Hoch, C.G., Kent, W., Lyngbaek, P., Mahbod, B., Neimat, M.A., Ryan, T.A., and Shan, M.C., Iris: An Object-Oriented Database Management System, ACM Transactions on Office Information Systems, pp. [...].
[GOL..] Goldberg, A., Introducing the Smalltalk-80 System, Byte, Aug. [...], pp. [...].
[HAL..] Hall, P.A.V., Optimization of a Single Relational Expression in a Relational Database, IBM Research and Development, pp. [...].
[HAM..] Hammer, M., and McLeod, Database Description with SDM: A Semantic Association Model, ACM TODS, pp. [...].
[HOR..] Hornick, M.F., and Zdonik, S.B., A Shared, Segmented Memory System for an Object-oriented Database System, ACM Transactions on Office Information Systems, pp. [...].
[HUL..] Hull, R., and King, R., Semantic Database Modeling: Survey, Applications, and Research Issues, ACM Computing Surveys, pp. [...].
[KIM..] Kim, W., Banerjee, Chou, H.T., Garza, J.F., and Woelk, Composite Object Support in an Object-oriented
Database System, Proceedings of OOPSLA, FL, Oct. [...], pp. [...].
[KIN..] King, R., Sembase: A Semantic DBMS, Proceedings of the First International Workshop on Expert Database Systems, Oct. [...], pp. [...].
[KLE..] Kleene, S.C., Mathematical Logic, John Wiley & Sons, Inc.
[LAM..] Lam, H., Xia, Qiu, and Wu, P., Prototype Implementation of an Object-oriented Knowledge Base Management System, to appear in the Proceedings of PROCIEM '[..], Orlando, FL, Nov. [...].
[LEC..] Lecluse, C., Richard, P., and Velez, F., O2, an Object-Oriented Data Model, ACM-SIGMOD Conference, Chicago, IL, June [...], pp. [...].
[MAC..] MacGregor, R., ARIEL: A Semantic Front-End to Relational DBMSs, Proceedings of VLDB, April [...], pp. [...].
[MAI..] Maier and Stein, Development of an Object-oriented DBMS, Proc. of OOPSLA '[..] Conference, Portland, OR, Sept.-Oct. [...], pp. [...].
[MAN..] Manola, F., and Dayal, U., PDM: An Object-Oriented Model, Int'l Workshop on Object-Oriented Database Systems, pp. [...].
[PAN..] Pant, S., An Intelligent Schema Design Tool for OSAM*, Master's thesis, University of Florida.
[ROW..] Rowe, L.A., and Stonebraker, M.R., The POSTGRES Data Model, Proceedings of the [..]th VLDB Conference, Brighton, pp. [...].
[SER..] Servio Logic Development Corporation, Programming in OPAL, a manual published by Servio Logic Development Corporation, Beaverton, OR.
[SHA..] Shaw, M., and Zdonik, S.B., A Query Algebra for Object-Oriented Databases, IEEE Trans. on Data Engineering, pp. [...], Feb. [...].
[SHI..] Shipman, The Functional Data Model and the Data Language DAPLEX, ACM TODS, pp. [...].
[SIN..] Singh, M., Transaction Oriented Rule Processing in an Object-Oriented Knowledge Base Management System, Master's thesis, University of Florida.
[ST..] Stonebraker, M., Wong, E., Kreps, P., and Held, The Design and Implementation of INGRES, ACM Transactions on Database Systems, pp. [...].
[ST..] Stonebraker, M., Anderson, E., Hanson, E., and Rubenstein, B., Quel as a Data Type, Proceedings of the ACM SIGMOD Conference on Management of Data, Boston, MA, June [...], pp. [...].
Keys whose entries are missing in the source: [ULL..], [WOE..], [WON..], [WU..], [ZAN..].
[SU..] Su, S.Y.W., Modeling Integrated Manufacturing Data With SAM*, IEEE Computer, January [...], pp. [...].
[SU..] Su, S.Y.W., Lam, H., and Navathe, S.N., An
2EMHFWRULHQWHG &RPn SXWLQJ (QYLURQPHQW IRU 3URGXFWLYLW\ ,PSURYHPHQW LQ $XWRPDWHG 'HVLJQ DQG 0DQXIDFWXULQJ 3URMHFW 6XPPDU\ 352&,(0 f2UODQGR )/ 1RY 6X 6<: .ULVKQDPXUWK\ 9 DQG /DP + $Q 2EMHFWRULHQWHG 6HPDQWLF $VVRFLDWLRQ 0RGHO 26$0rf $, ,QGXVWULDO (QJLQHHULQJ DQG 0DQXIDFWXULQJ 7KHRUHWLFDO ,VVXHV DQG $SSOLFDWLRQV 6 .XPDUD $/ 6R\VWHU DQG 5/ .DVK\DS HGVf 7KH ,QVWLWXWH RI ,QGXVWULDO (QJLQHHULQJ ,QGXVWULDO (QJLQHHULQJ DQG 0DQDJHPHPQW 3UHVV 1RU FURVV *$ 7RGG 6-3 7KH 3HWHUOHH 5HODWLRQDO 7HVW 9HKLFOH $ 6\VWHP 2YHUn YLHV ,%0 6\VWHPV SS 7VXUW 6 DQG =DQLROR & $Q ,PSOHPHQWDWLRQ RI *(0 6XSSRUWLQJ D 6HPDQWLF 'DWD 0RGHO RQ D 5HODWLRQDO %DFN (QG 3URFHHGLQJV RI WKH $&0 6,*02' ,QWL &RQIHUHQFH RQ WKH 0DQDJHPHQW RI 'DWD %RVWRQ 0$ -XQH SS )UHGHULFN 7\ 7KH 'HVLJQ DQG ,PSOHPHQWDWLRQ RI D *UDSKLFV ,QWHUn IDFH IRU DQ 2EMHFWRULHQWHG /DQJXDJH 0DVWHUff %HQMDPLQ&XQQLQJV 3XEOLVKLQJ 0HXOR 3DUN &$ SS =GRQLN 6 % 6NDUUD $ + DQG 5HLVV 6 3 $Q REMHFW 6HUYHU IRU DQ 2EMHFWRULHQWHG 'DWDEDVH 6\VWHP ,QWHUQDWLRQDO :RUNVKRS RQ 2EMHFWRULHQWHG 'DWDEDVH 6\VWHPV 3DFLILF *URYH &$ 6HSW =RRN : PAGE 149 $33(1',; 7KH IRUPDO SURRIV RI WKH PDWKHPDWLFDO SURSHUWLHV RI WKH $DOJHEUD RSHUDWRUV DUH JLYHQ EHORZ $ &RPPXWDWLYLW\ f DA5^$%f?3 3A%$f?D f 3URRI ,I D SDWWHUQ LQ D FDQ EH FRQFDWHQDWHG ZLWK D SDWWHUQ LQ RYHU DQ ,QWHUn SDWWHUQ DEM WKHQ WKH SDWWHUQ LQ FDQ EH FRQFDWHQDWHG ZLWK WKDW SDWWHUQ LQ D RYHU WKH ,QWHUSDWWHUQ LAD 6LQFH SDWWHUQV DUH QRQGLUHFWLRQDO LH DLEM ED^ WKH OHIW KDQG VLGH DQG WKH ULJKWKDQG VLGH RI WKH HTXDWLRQ ZRXOG SURGXFH WKH VDPH UHVXOW 2Q WKH RWKHU KDQG LI DQ D SDWWHUQ FDQQRW EH FRQFDWHQDWHG ZLWK D c SDWWHUQ E\ WKH RSHUDWLRQ RQ WKH OHIWKDQG VLGH WKHQ WKH VDPH SDWWHUQ FDQQRW EH FRQFDWHQDWHG ZLWK WKDW D SDWWHUQ E\ WKH RSHUDWLRQ RQ WKH ULJKWKDQG VLGH Â’ f D?>5$%f@3 3+5%$f`Fr f 3URRI 6LQFH D &RPSOHPHQWSDWWHUQ LV QRQGLUHFWLRQDO DQG LI D FRPSOHPHQW SDWWHUQ DLEM FRQQHFWV DQ D SDWWHUQ ZLWK D I SDWWHUQ WKHVH WZR SDWWHUQV WRJHWKHU ZLWK WKH &RPSOHPHQWSDWWHUQ DLEM ZLOO DOO EH UHWDLQHG LQ WKH UHVXOWV RI WKH H[SUHVVLRQV RQ 
ERWK VLGHV RI WKH HTXDWLRQ )RU WKH VDPH UHDVRQ D QHZ SDWWHUQ ZKLFK FDQQRW EH SURGXFHG E\ WKH RSHUDWLRQ RQ WKH OHIWKDQG VLGH RI WKH HTXDWLRQ FDQQRW EH SURGXFHG E\ WKH RSHUDWLRQ RQ WKH ULJKWKDQG VLGH Â’ PAGE 150 f D?^5$%f@3 3n>5%$f@[ f 3URRI $FFRUGLQJ WR WKH FRQQHFWLRQV EHWZHHQ SDWWHUQV RI D DQG 3 WKURXJK VRPH ,QWHUSDWWHUQV D DQG 3 FDQ EH GHFRPSRVHG LQWR WKH $8QLRQ RI WZR VXEVHWV RI SDWn WHUQV UHVSHFWLYHO\ LQ LQ Dff?^5$%f@3 3f D LI 3 Q S ,, Q Q D 3 RWKHUZLVH ULJKWKDQG VLGH 3 f?>5%$f?DW Df Q Q D LI 3 W! ,, ,, 3 LI D I! 3 D RWKHUZLVH 6LQFH D &RPSOHPHQWSDWWHUQ LV QRQGLUHFWLRQDO LH D 3 3 D WKH FRPPXWDWLYLW\ KROGV IRU DOO FDVHV Â’ PAGE 151 f &I^;f3 3r^;`D f 3URRI ,I WKH ,QQHUSDWWHUQV REMHFW LQVWDQFHVf RI WKH FODVVHV VSHFLILHG LQ ^;` FRQn WDLQHG LQ DQ D SDWWHUQ DUH FRPPRQ WR D 3 SDWWHUQ WKH QHZ SDWWHUQ ZKLFK LV WKH LQWHUVHFWLRQ RI WKH WZR SDWWHUQV ZLOO EH SURGXFHG E\ ERWK VLGHV RI WKH HTXDWLRQ 2Q WKH RWKHU KDQG LI DQ D SDWWHUQ ZKLFK GRHV QRW LQWHUVHFW ZLWK D 3 SDWWHUQ E\ WKH RSHUDWLRQ RQ WKH OHIWKDQG VLGH RI WKH HTXDWLRQ WKH VDPH 3 SDWWHUQ ZLOO QRW LQWHUVHFW ZLWK WKDW D SDWWHUQ E\ WKH RSHUDWLRQ RQ WKH ULJKWKDQG VLGH Â’ f D3 3D f 3URRI 6LQFH WKH $8QLRQ RSHUDWLRQ VLPSO\ OXPSV SDWWHUQV QDPHG E\ WZR RSHUDQGV LQWR D VLQJOH DVVRFLDWLRQVHW DQG WKH SDWWHUQV LQ DQ DVVRFLDWLRQVHW DUH QRW RUGHUHG ERWK VLGHV RI WKH HTXDWLRQ ZLOO SURGXFH WKH VDPH UHVXOW Â’ % $VVRFLDWLYLW\ f DZr^5&/9&/f?3^ PAGE 152 ZKHUH D UHSUHVHQWV D VXEVHW RI D SDWWHUQV ZKLFK FDQ EH FRQFDWHQDWHG ZLWK D VXEVHW RI IW SDWWHUQV DQG WKHUHDIWHU EH FRQFDWHQDWHG WKURXJK IW SDWWHUQVf ZLWK D VXEVHW RI SDWWHUQV D UHSUHVHQWV D VXEVHW RI D SDWWHUQV ZKLFK FDQ EH FRQn FDWHQDWHG ZLWK D VXEVHW RI IW SDWWHUQV ZKLFK KRZHYHU FDQQRW EH FRQFDWHQDWHG ZLWK DQ\ SDWWHUQ DQG D UHSUHVHQWV D VXEVHW RI SDWWHUQV ZKLFK HLWKHU GRHV QRW KDYH WKH ,QQHUSDWWHUQV RI &/W RU FDQQRW EH FRQFDWHQDWHG ZLWK DQ\ IW SDWWHUQ U 1RWH WKDW DQ D SDWWHUQ PD\ EHORQJ WR D DQG D f P Ef IW IW IW IW IW Q ZKHUH 3 FDQ EH FRQFDWHQDWHG ZLWK D DQG FDQ EH FRQFDWHQDWHG ZLWK D EXW 
QRW ZLWK If FDQ EH FRQFDWHQDWHG ZLWK EXW QRW ZLWK RU DQG FDQQRW EH L FRQFDWHQDWHG ZLWK HLWKHU D RU IW 1RWH WKDW SDWWHUQV RI IW IW IW DQG IW DUH PXWXDOO\ H[FOXVLYH Ff ZKHUH DQG KDYH WKH VLPLODU LQWHUSUHWDWLRQV DV RU RU DQG D UHVSHF WLYHO\ LOO ,I DIW D IW IWnf IW DQG DIW DUH XVHG WR UHSUHVHQW WKH UHVXOWV RI WKH $VVRFLDWH RSHUDWLRQV DFFRUGLQJ WR WKH GHILQLWLRQ RI $VVRFLDWH ZH KDYH ,, ,,, LOO OHIWKDQG VLGH D D D fr^5>&/Y&/A?^IW IW IW IW ff 5&/&/f@Q f ^DIW D IWfr>5^&/Y&/f`^ f , DIWO ULJKWKDQG VLGH RU D D f r^5&/Y&/f?IW IW IW IW f ,,, r>M5&/&/f@ ff L Q KL LW LQ Q D D D fr?5&/Y&/f@>S IW f , DIW Â’ PAGE 153 f D^;`?>5&/Y&/f@3^ PAGE 154 OLQQ LQ LQ >D3 D 3 f?>5&/O&/f`nf f D3 L LW QL L Q LQ QQ ULJKWKDQG VLGH RU D D f ?>5&/9&/f@3 3 3 3 f ,, ,,, _>M5&/M&/f@ ff L Q QU L L LQ Q D D D f?^5^&/Y&/f@3Q 3 Of , D 3L ZKHUH D3 D 3 3nf 3 RSHUDWLRQV Â’ DQG D3 UHSUHVHQW WKH UHVXOWV RI WKH $&RPSOHPHQW f DZm^:n`"P0:`: r^[`:`S> PAGE 155 WKH SDWWHUQV LQ D VHW DUH QRW RUGHUHG WKH RUGHU RI SHUIRUPLQJ $8QLRQ RSHUDWLRQV RQ D QXPEHU RI DVVRFLDWLRQVHWV ZLOO KDYH QR HIIHFW RQ WKH ILQDO UHVXOW Â’ 'LVWULEXWLYLW\ f RWr>5^$%f@^37f [r^5^$%f@3 DIr>-$IOf@ f 3URRI )LUVW D DQG FDQ EH GHFRPSRVHG DV IROORZV L Q LQ Df D D D D W Q P ZKHUH D FDQ EH FRQFDWHQDWHG ZLWK D FDQ EH FRQFDWHQDWHG ZLWK DQG D FDQQRW EH FRQFDWHQDWHG ZLWK HLWKHU 3 RU 1RWH WKDW DQ DW SDWWHUQ PD\ EHORQJ ,, WR DW DQG D Ef 3 3 P ,, ZKHUH IW FDQ EH FRQFDWHQDWHG ZLWK D EXW If FDQQRW Ff Q ZKHUH FDQ EH FRQFDWHQDWHG ZLWK D EXW FDQQRW %\ WKH GHILQLWLRQ RI WKH $VVRFLDWH RSHUDWLRQ ZH KDYH ,, ,,, ,, W ,, OHIWKDQG VLGH D D D f r>5$%f?3 IW f ,, ,, DI D ,, ,,, ,, ,, ,,, OO ULJKWKDQG VLGH D D D fr>5$%f`3 IWf DW D D fr>5$%f@ f ,, ,, DIf DW Â’ f FW?^5$%f?3nOf r?>5$%f` D_>-$IOf@ f 3URRI D IW DQG FDQ EH GHFRPSRVHG DV IROORZV P Df D D D D P D ZKHUH D FRQWDLQV SDWWHUQV WKDW DUH FRQQHFWHG WR IW E\ &RPSOHPHQWSDWWHUQV D ,,, FRQWDLQV SDWWHUQV WKDW DUH FRQQHFWHG WR E\ &RPSOHPHQWSDWWHUQV DQG D FDQ PAGE 156 QRW EH FRQQHFWHG WR HLWKHU 3 RU E\ &RPSOHPHQWSDWWHUQV 1RWH WKDW DQ D 
SDW WHUQ PD\ EHORQJ WR D DQG D Ef 3 3 3 Q ZKHUH c FDQ EH FRQQHFWHG WR D E\ &RPSOHPHQWSDWWHUQV EXW I FDQQRW Q ZKHUH FDQ EH FRQQHFWHG WR D E\ &RPSOHPHQWSDWWHUQV EXW FDQQRW %\ WKH GHILQLWLRQ RI WKH $&RPSOHPHQW RSHUDWLRQ ZH KDYH OHIWKDQG VLGH D D RU f ?>5$%f@3 3 f ,, ,, D D ,, +, W ,, ,, +, ,, ULJKWKDQG VLGH D D D f??5$%f@3 3f D D D f_>L($%f@ f ,, +, DIO DW Â’ f Dr^;`3f RW}^;` D^;` f 3URRI D IW DQG FDQ EH GHFRPSRVHG DV IROORZV + +, Df D D Fr DU P D KL ZKHUH D LQWHUVHFWV ZLWK D LQWHUVHFWV ZLWK DQG D GRHV QRW LQWHUVHFW ZLWK HLWKHU If RU 1RWH WKDW DQ D SDWWHUQ PD\ EHORQJ WR D DQG D Ef 3 3 3 D ,, ZKHUH 3 LQWHUVHFWV ZLWK D EXW S GRHV QRW ,, ZKHUH LQWHUVHFWV ZLWK D EXW GRHV QRW %\ WKH GHILQLWLRQ RI WKH $,QWHUVHFW RSHUDWLRQ ZH KDYH + ,+ ,, + OHIWKDQG VLGH RU D D f^;`" 3 f ,, +, RWS D + ,,, ,+ ,, +, ,, ULJKWKDQG VLGH m DW D f}^;`3 3f DW DW DW f^;` f ,, +, D3 D Â’ PAGE 157 f DZ r^5&/9&/f@"P^ :`^]`f m r^5 &/9 &/f?3>\@^ :M;`DUZ r?5 &/Y&/f?>]` f f rZ?>5&/Y&/f`3^\f^:`nL^]ff D_>L&/&/3^\`^:8;`DZ_>L&/f&/f@^]` f 7KH DERYH WZR GLVWULEXWLYH SURSHUWLHV KROG ZKHQ WKH IROORZLQJ FRQGLWLRQV DUH WUXH Lf &/e^:` LLf ;U?< ;QZ M! 
DQG LLLf RU LV D KRPRJHQHRXV DVVRFLDWLRQfÂ§VHW 7KH ILUVW FRQGLWLRQ HQVXUHV WKDW WKH RSHUDWLRQV $VVRFLDWH $&RPSOHPHQW DQG 1RQ$VVRFLDWH ZLOO RSHUDWH RQ WKH FRPPRQ FODVV RI DQG DV VKRZQ LQ Df RI WKH IROn ORZLQJ ILJXUH 2WKHUZLVH WKH GLVWULEXWLRQV RI WKHVH RSHUDWLRQV WR DQG GR QRW PDNH VHQVH DV VKRZQ LQ Ef DQG Ff 7KH VHFRQG FRQGLWLRQ HQVXUHV WKDW D SDWWHUQV PXVW QRW LQWHUVHFW ZLWK DQ\ SDWWHUQ RI HLWKHU A RU VR WKDW WKH f^;X :` RSHUDWLRQV RQ WKH ULJKWKDQG VLGHV RI WKH HTXDWLRQV ZLOO H[DPLQH WKH LQWHUVHFWLRQV RQ WKH SRUWLRQV RI D DQG VHSDUDWHO\ 7KH WKLUG FRQGLWLRQ HQVXUHV WKDW RQ WKH ULJKWKDQG VLGHV RI WKH HTXDWLRQV RQO\ WKRVH SDWWHUQV WKDW KDYH WKH VDPH D SDWWHUQ ZLOO LQWHUVHFW DQG EH UHWDLQHG LQ WKH UHVXOW Df Ef Ff :H VKDOO RQO\ JLYH WKH SURRI RI FDQ EH SURYHG XVLQJ WKH VDPH WHFKn QLTXH PAGE 158 :KHQ WKH FRQGLWLRQV DUH WUXH D 3 DQG FDQ EH GHFRPSRVHG DV IROORZV Q LQ QQ Df D D D r RU ZKHUH D FDQ EH FRQFDWHQDWHG ZLWK 3 DQG D FDQ EH FRQFDWHQDWHG ZLWK IW EXW LQ QQ QRW ZLWK D FDQ EH FRQFDWHQDWHG ZLWK EXW QRW ZLWK DQG D FDQQRW EH P QQ FRQFDWHQDWHG ZLWK HLWKHU c RU 1RWH WKDW D D D DQG D DUH PXWXDOO\ H[FOXVLYH f L Q QL QQ Ef S S S S S L Q ZKHUH 3 FDQ EH FRQFDWHQDWHG ZLWK D DQG GRHV LQWHUVHFW ZLWK IW FDQ EH FRQ +, FDWHQDWHG ZLWK D EXW GRHV QRW LQWHUVHFW ZLWK I FDQQRW EH FRQFDWHQDWHG ZLWK QQ D EXW GRHV LQWHUVHFW ZLWK DQG IW FDQ QHLWKHU EH FRQFDWHQDWHG ZLWK D QRU L Q P QQ LQWHUVHFW ZLWK 1RWH WKDW IW c DQG c DUH DOVR PXWXDOO\ H[FOXVLYH Q LQ QQ Ef ZKHUH DQG KDYH WKH VLPLODU LQWHUSUHWDWLRQV DV 3 IW IW DQG 3 UHVSHFWLYHO\ %\ WKH GHILQLWLRQ RI WKH RSHUDWLRQV RI $VVRFLDWH DQG $,QWHUVHFW ZH KDYH L LQ QQ Q P QQ OHIWKDQG VLGH D D D D f r^5&/9&/f?3 3 3 3 f L Q LQ QQ f^:` ff L Q P QQ LQ LQ D D D D fr^5&/9&/f?3 3 f ,,, ,,, 6LQFH &/H^:` DQG 3 FDQQRW EH SURGXFHG E\ WKH ^;` RSHUDWRU DFFRUGLQJ WR LQ QU WKH GHFRPSRVLWLRQV RI 3 DQG 2WKHUZLVH RU 3f PXVW FRQWDLQ WKH VDPH ,QQHU SDWWHUQ RI &/ DV FRQWDLQHG LQ 3 RU f DQG PXVW EH DEOH WR FRQFDWHQDWH ZLWK D $SSO\LQJ WKH GLVWULEXWLYH SURSHUW\ ZH REWDLQ PAGE 
159 m5&/Y&/f?3L G r?5&/Y&/f?3 Q WLQ WQ P D r>5&/Y&/f?3 D r>5^&/9&/f?3 ,W , ,,, WWW ,,, D 5^&/Y&/f@SnL D r^5&/Y&/f@S XQ W L WLQ WWW WWW m 5^&/Y&/f`3 D r^5^&/Y&/f@3 %DVHG RQ WKH GHFRPSRVLWLRQV RI D 3 DQG RQO\ WKH ILUVW LWHP ZLOO SURGXFH QHZ SDWn WHUQV DQG LV UHWDLQHG +HQFH G r>5&/Y&/f`3G U DSn\ 2Q WKH ULJKWKDQG VLGH RI WKH HTXDWLRQ ZH KDYH W LW WWW WLQ W WW WWW WWQ ULJKWKDQG VLGH D D RU D fr>5&/O&/f@3 3 3 3 ff W LW WQ WWQ W Q WQ WWQ ^;X:`D D D D fr^5^&/9&/f`^ ff W L LQ WW U LQ WQ WQ Q D3 D3 D 3 D "fr^;X:`Fr D D D f $SSO\LQJ WKH GLVWULEXWLYH SURSHUW\ ZH KDYH LW W W L LW LW WWW L U W WQ Q ULJKWKDQG VLGH D3}^;?M:`DWnL D3}^;?>M:`DWnL DAW^-XL9`D D3}^;?M:fD LQ Q WW W WW WQ W WW WQ WW D3f^;?M:`Dnf D3f^;?M:fDAc D3}^;?M:`D D 3 f^;?M:`RW Q W WWW LQ Q WWW W WW WQ Q D 3}^;?M:`DL D "r^;X:n`D D 3}^;?M:fD D 3}^;?M:`D D 3 f^;?M:`DWnf D 3 f^;?M:fDWnf D 3 r^;X :`D D 3 f^;X:`DU 2I WKH VL[WHHQ LWHPV RQO\ WKH ILUVW RQH LV UHWDLQHG 7KH UHVW RI LWHPV DUH GURSSHG EHFDXVH WKH\ GR QRW LQWHUVHFW HLWKHU RYHU FODVVHV LQ ^;` RU RYHU FODVVHV LQ ^:f 7KHUHn IRUH ,W WW ULJKWKDQG VLGH mAf^;X:`Dn W D3 Â’ PAGE 160 ( 2WKHU 3URSHUWLHV f DZPP ÂÂm0P f Q P QQ L Q 3URRI D FDQ EH GHFRPSRVHG LQWR D D D D ZKHUH D VDWLVILHV 3[ DQG 3 D P QQ RQO\ VDWLVILHV 3Y D RQO\ VDWLVILHV 3 DQG D GRHV QRW VDWLVI\ HLWKHU 3[ RU 3 DLm m9M m I Z m;} m f>A@ m DWmf>3$!3@ D D f ,O RLDPZ R Df>I@f>A 3=ef f 3URRI )LUVW D LV GHFRPSRVHG LQWR D r r ZKHUH D VDWLVILHV WKH VHOHFWLRQ FRQGLWLRQ Q LQ EXW D GRHV QRW 7KHQ OHW DQG UHSUHVHQW WKH UHVXOWV RI WKH SURMHFWLRQ RSHUDWLRQ FRUUHVSRQGLQJ WR D DQG D UHVSHFWLYHO\ 6LQFH 3&6 VDWLVILHV 3 EXW GRHV QRW DQG ZH KDYH ,, rD3f>I7, 9f> 3 D> ,ODf>eO@f>3L R^Â f Â’ f R>D r>IO$%f@ IOIW$\ A0}@ L$%f@ Df>3@ f ZKHUH 3[ DQG 3 DUH DSSOLFDEOH WR D DQG UHVSHFWLYHO\ Q QL QQ L Q 3URRI )LUVW D LV GHFRPSRVHG LQWR D D D D ZKHUH D DQG D VDWLVI\ 3[ EXW QL QQ L LQ P Q D DQG D GR QRW DQG D DQG D FDQ EH FRQFDWHQDWHG ZLWK VRPH SDWWHUQV EXW D QQ L Q LQ QQ DQG RU GR QRW If FDQ EH GHFRPSRVHG LQWR c 3 ZLWK D VLPLODU LQWHUSUH WDWLRQ 
7KHUHIRUH ZH KDYH ,, OLW LQ ,,, ,,, R>D r>5$%f` f>3[D3` M>D D c D D f>3[D3? , D rLrf>!L@ YÂf>3D Df r?5^$%f? f , D Â’ PAGE 161 f RD r>Â$%f@ 3f>3^63Â rrf>!@ r>5$0 3 D f ZKHUH 3O DQG 3 DUH DSSOLFDEOH WR D DQG 3 UHVSHFWLYHO\ 3URRI D DQG 3 DUH GHFRPSRVHG DV LQ WKH DERYH SURRI 7KXV ZH KDYH ,W O ,,, ,,, O ,,, LQ R^D r^5^$%f` 3f>3O?3Â R>D3 r3 D 3 DW 3 f>3OY3@ ,, ,,, ,,, DW3 DW3 DW 3 mm0O 3 r m:LQ L LW LQ PL L Q P PL L LW DW RW fr>-$%f@ 3 3 3 "f mfm DW fr>5$%f@3 3f ,W ,,, ,,, DW3 RW3 D 3 Â’ f R^DW 3f3c R^Df>3L 3 f L LL LQ QQ L Q Q 3URRI :H GHFRPSRVH D LQWR D D D DW ZKHUH D DQG D VDWLVI\ 3 EXW D DQG PL L P P Q QQ RW GR QRW DQG D DQG D FRQWDLQ 3 SDWWHUQV EXW RW DQG D GR QRW 7KHQ ZH KDYH Q WP Q R^DW Sf>WI RDW D f>A RU DDf>3? 3 DW DWf3 DW Â’ f !} 3f>3L RDf3> R3f>3> f OO OO 3URRI 6XSSRVH D DQG 3 DUH GHFRPSRVHG LQWR VXEVHWV D DQG D DQG 3 DQG 3 UHVSHF ,, OO OO WLYHO\ ZKHUH RW DQG 3 VDWLVI\ 3 EXW D DQG 3 GR QRW %\ WKH GHILQLWLRQ RI $6HOHFW RSHUDWLRQ ZH KDYH DD 3f>3? RW 3 RDf>A R3f^3? f RRW 3f>393Â UDP RÂ3f>3Â f ZKHUH 3O DQG 3 DUH DSSOLFDEOH WR D DQG 3 UHVSHFWLYHO\ OO O OO 3URRI 6XSSRVH D DQG 3 DUH GHFRPSRVHG LQWR VXEVHWV D DQG RU DQG 3 DQG 3 UHVSHF ,, OO WLYHO\ ZKHUH D VDWLVILHV 3[ EXW DW GRHV QRW DQG 3 VDWLVILHV 3 EXW 3 GRHV QRW %\ WKH GHILQLWLRQ RI $6HOHFW RSHUDWLRQ ZH KDYH RD 3f>3\3? D 3 ADIO\ r03Â Â’ PAGE 162 f Â,D 3f>e7? ,-DWf>fn7? 
Q^Sf>e7c f WL L Q 3URRI 6XSSRVH WKDW RU DQG 3 DUH GHFRPSRVHG LQWR VXEVHWV D DQG D DQG 3 DQG 3 UHVSHFWLYHO\ ZKHUH D DQG 3 FRQWDLQ VXESDWWHUQV GHILQHG E\ >eEXW D DQG S GR QRW 7KH UHVXOWV RI WKH WZR $3URMHFW RSHUDWLRQV RQ D DQG 3 DUH UHSUHVHQWHG E\ D DQG IW UHVSHFWLYHO\ %\ WKH GHILQLWLRQ RI $3URMHFW RSHUDWLRQ ZH KDYH QD "f>e @ D 3 Df>e-ef>e7Â’ f RU Sf D f f f ,, 3URRI D DQG c DUH GHFRPSRVHG LQWR VXEVHWV D DQG D DQG If DQG UHVSHFWLYHO\ L ZKHUH D DQG 3 FRQWDLQ SDWWHUQV EXW RW DQG S GR QRW 7KXV ZH KDYH D Sf D S D f f Â’ f D U: 3 f m S f 3URRI %\ WKH GHILQLWLRQ RI WKH $'LYLGH RSHUDWLRQ RQ WKH OHIWKDQG VLGH RI WKH HTXDn WLRQ DQ D SDWWHUQ ZLOO EH UHWDLQHG LQ WKH UHVXOW LI Df LW KDV ,QQHUSDWWHUQV RI FODVVHV LQ ^:` DQG FRQWDLQV DOO SDWWHUQV RI 3 DQG RU Ef WKH ,QQHUSDWWHUQV RI FODVVHV LQ ^:` WKDW DQ D SDWWHUQ KDV DUH FRPPRQ WR VRPH RWKHU D SDWWHUQV DQG WKHVH SDWWHUQV WRJHWKHU GHQRWHG E\ D FRQWDLQ DOO SDWWHUQV RI 3 DQG $Q D SDWWHUQ RU SDWWHUQV LQ Df ZKLFK LV UHWDLQHG RQ WKH OHIWKDQG VLGH RI WKH HTXDWLRQ ZLOO EH UHWDLQHG DIWHU WKH ILUVW $'LYLGH RSHUDWLRQ RQ WKH ULJKWKDQG VLGH VLQFH LW PXVW FRQWDLQ DOO WKH 3 SDWWHUQV ,W ZLOO DOVR EH UHWDLQHG LQ WKH ILQDO UHVXOW DIWHU WKH VHFRQG $'LYLGH RSHUDWLRQ VLQFH LW PXVW FRQWDLQ DOO WKH SDWWHUQV Â’ PAGE 163 f RUZ r^5$%f` S> PAGE 164 f m^[` r>A$%f@ 3^\`f f ^]` fÂ§ ASWf f ^]`f r>A$%f@ 3^ PAGE 165 W WW W Q H[FOXVLYH 3 LV GHFRPSRVHG LQWR " ZKHUH 3 FDQ EH FRQFDWHQDWHG ZLWK D EXW 3 FDQQRW FDQ EH GHFRPSRVHG DV S %\ WKH GHILQLWLRQ RI WKH 1RQ$VVRFLDWH RSHUDWLRQ ZH KDYH W WW P PW WW W WW OHIWKDQG VLGH WW r D WW f >L$f@ ^II 3 f PW WW WW RU R ,, 0W WW WW D 3 LI M! WW WW PW 3 LI D f WW PW WW LI D 3 M! WW OLOW WW 3 LI D I! WWUW WW WW D LI S PW WW PW WW D 3 D RWKHUZLVH W W P U W P 6LQFH D r>5$%f? RWIf RW ZH KDYH ,,Dr>5$%f?Sf>D? RW RW 6LPLODUO\ -Dr>5$M%f@f>RM D D 7KHUHIRUH RQ WKH ULJKWKDQG VLGH ZH KDYH PW 2W D WWW D f WW 3 PW LI m M! 
PW WW D 3 RWKHUZLVH PW 2W ,, D P D f WW ,, m+ PW WW D RWKHUZLVH PAGE 166 +HQFH W UWI LW P ULJKWKDQG VLGH D?>5$%f?3 fÂ§ D D f D>L$%f@Uf fÂ§ D D f WWI D?>5$%f?3 D D f fÂ§ LW WWW D?>5$%f@nf D D f f D^3 nLf D 3 f 3URRI %\ WKH GHILQLWLRQ RI $'LIIHUHQFH RSHUDWLRQ WKH OHIWKDQG VLGH RI WKH HTXDWLRQ UHWDLQV D SDWWHUQV WKDW GR QRW FRQWDLQ DQ\ SDWWHUQ RI S RU 2Q WKH ULJKWKDQG VLGH WKH ILUVW $'LIIHUHQFH RSHUDWLRQ UHWDLQV D SDWWHUQV WKDW GR QRW FRQWDLQ DQ\ 3 SDWWHUQ DQG WKHQ WKH VHFRQG RSHUDWLRQ UHWDLQV D SDWWHUQV WKDW GR QRW FRQWDLQ DQ\ SDWWHUQ RI 3 WWWW LW IW D ,, PL Q IW D c ,, Ha IW IW 3 WWWW LI D I! IW UXW LW LI RU c I! IW 3 WWWW ,W LI D W! WXW WW IW D LI A WWW IW WWWO IW D 3 D RWKHUZLVH RU Â’n PHQW &HQWHU DW WKH 8QLYHUVLW\ RI )ORULGD ZKHUH KH UHFHLYHG KLV 06 GHJUHH LQ HOHFWULFDO HQJLQHHULQJ LQ PAGE 168 , FHUWLI\ WKDW KDYH UHDG WKLV VWXG\ DQG WKDW LQ P\ RSLQLRQ LW FRQIRUPV WR DFFHSWDEOH VWDQGDUGV RI VFKRODUO\ SUHVHQWDWLRQ DQG LV IXOO\ DGHTXDWH LQ VFRSH DQG TXDOLW\ DV D GLVVHUWDWLRQ IRU WKH GHJUHH RI 'RFWRU RI 3KLORVRSK\ 6WDQOH\ <: 6X &KDLUPDQ 3URIHVVRU RI(OHFWULFDO (QJLQHHULQJ FHUWLI\ WKDW KDYH UHDG WKLV VWXG\ DQG WKDW LQ P\ RSLQLRQ LW FRQIRUPV WR DFFHSWDEOH VWDQGDUGV RI VFKRODUO\ SUHVHQWDWLRQ DQG LV IXOO\ DGHTXDWH LQ VFRSH DQG TXDOLW\ DV D GLVVHUWDWLRQ IRU WKH GHJUHH RI 'RFWRU +H $VVRFLDWH 3URIHVVRU RI (OHFWULFDO (QJLQHHULQJ RI 3KLORVRSK\ DQ ; /DP &RFKDLUPDQ FHUWLI\ WKDW KDYH UHDG WKLV VWXG\ DQG WKDW LQ P\ RSLQLRQ LW FRQIRUPV WR DFFHSWDEOH VWDQGDUGV RI VFKRODUO\ SUHVHQWDWLRQ DQG LV IXOO\ DGHTXDWH LQ VFRSH DQG TXDOLW\ DV D GLVVHUWDWLRQ IRU WKH GHJUHH RI 'RFWRU RI 3KLORVRSK\ [LNIL 6KDPNDQW % 1DYDWKH 3URIHVVRU RI &RPSXWHU DQG ,QIRUPDWLRQ 6FLHQFHV FHUWLI\ WKDW KDYH UHDG WKLV VWXG\ DQG WKDW LQ P\ RSLQLRQ LW FRQIRUPV WR DFFHSWDEOH VWDQGDUGV RI VFKRODUO\ SUHVHQWDWLRQ DQG LV IXOO\ DGHTXDWH LQ VFRSH DQG TXDOLW\ DV D GLVVHUWDWLRQ IRU WKH GHJUHH RI 'RFWRU RI 3KLORVRSK\ 8 L ODQG\ < 4 &KRZ fURIHVVRU RI &RPSXWHU DQG ,QIRUPDWLRQ 5 3URIHVVRU 6FLHQFHV PAGE 169 , FHUWLI\ WKDW KDYH UHDG 
WKLV VWXG\ DQG WKDW LQ P\ RSLQLRQ LW FRQIRUPV WR DFFHSWDEOH VWDQGDUGV RI VFKRODUO\ SUHVHQWDWLRQ DQG LV IXOO\ DGHTXDWH LQ VFRSH DQG TXDOLW\ DV D GLVVHUWDWLRQ IRU WKH GHJUHH RI 'RFWRU RI 3KLORVRSK\ -RKQ 6WDXGKDPPHU 3URIHVVRU RI (OHFWULFDO (QJLQHHULQJ 7KLV GLVVHUWDWLRQ ZDV VXEPLWWHG WR WKH *UDGXDWH )DFXLW\ RI WKH &ROOHJH RI (QJLQHHUn LQJ DQG WR WKH *UDGXDWH 6FKRRO DQG ZDV DFFHSWHG DV SDUWLDO IXOILOOPHQW RI WKH UHTXLUHn PHQWV IRU WKH GHJUHH RI 'RFWRU RI 3KLORVRSK\ 'HFHPEHU I! :LQIUHG 0 3KLOOLSV 'HDQ &ROOHJH RI (QJLQHHULQJ 0DGHO\Q 0 /RFNKDUW 'HDQ *UDGXDWH 6FKRRO PAGE 170 81,9(56,7< 2) )/25,'$ ASSOCIATION ALGEBRA: A MATHEMATICAL FOUNDATION FOR OBJECT-ORIENTED DATABASES By MINGSEN GUO A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORIDA 1990 Copyright 1990 by Mingsen Guo Dedicated to my dear wife Zhu (Susie) and lovely daughter Jialan. And to our parents Jingcheng Guo and Ruiying Zhang Shuyan Huang and Chuanxiang Chen, this was their dream before it was mine. ACKNOWLEDGEMENTS I would like to express my sincere appreciation to Dr. Stanley Su, chairman of my supervisory committee, for giving me the opportunity to work on this interesting and important topic in the area of object-oriented database systems. Without his patient guidance and continuous support, this work could not have been completed. I am grateful to Dr. Herman Lam, cochairman of my supervisory committee, for his thought-provoking suggestions on this work. I thank Dr. Sham Navathe for his comÂ¬ ments and his personal library. I thank Dr. Randy Chow for his encouragement throughout my graduate study. I would like to thank Dr. John Staudhammer for his time and for being on my supervisory committee. My special thanks go to Sharon Grant, the secretary of the Database Systems Research and Development Center, whose help to me is always friendly and in time. 
This research was supported by the National Science Foundation (DMC- 8814989) and the National Institute of Standard and Technology (60NANB4D0017). The development effort is supported by the Florida High Technology and Industrial Council (UPN88092237). IV TABLE OF CONTENTS Page ACKNOWLEDGMENTS iv ABSTRACT vii CHAPTER 1 INTRODUCTION 1 2 A SURVEY OF RELATED WORK 12 2.1 Relational Model and Relational Algebra 12 2.2 Existing 0-0 Query Languages 18 2.3 ENCORE 0-0 Data Model and Its Underlying Query Algebra. 25 3 OVERVIEW OF 0-0 DATABASES AND ASSOCIATION-BASED QUERY FORMULATION 38 3.1 Overview of 0-0 Databases 38 3.2 Pattern-based Query Formulation 41 3.3 Conclusion 45 4 ASSOCIATION ALGEBRA 51 4.1 Definitions 51 4.2 Relationship Between Two Patterns 55 4.3 Association Operators 56 4.4 Query Examples 71 5 MATHEMATICAL PROPERTIES OF OPERATORS AND THEIR APPLICATIONS IN QUERY OPTIMIZATION AND QUERY DECOMPOSITION 91 5.1 Conventional Algebraic Properties 91 5.2 Nesting of Two Unary Operators 95 5.3 Nesting of Binary Operator in Unary Operator 97 5.4 Cascading of Two Binary Operators 99 5.5 General Identities 104 5.6 Transformation of Operators 104 5.7 Applications in Query Optimization and Decomposition 106 6 COMPLETENESS OF THE A-ALGEBRA 118 7 CONCLUSION 133 v REFERENCES 135 APPENDIX 141 BIOGRAPHICAL SKETCH 159 vi Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy ASSOCIATION ALGEBRA: A MATHEMATICAL FOUNDATION FOR OBJECT-ORIENTED DATABASES By Mingsen Guo December 1990 Chairman: Dr. Stanley Y.W. Su Major Department: Electrical Engineering Existing 0-0 DBMSs lack a solid mathematical foundation for the manipulation of 0-0 databases, optimization of queries, and the design and selection of storage structures for supporting 0-0 database manipulations. 
An association algebra (A-algebra) is proposed as a mathematical foundation for processing O-O databases, analogous to the use of relational algebra for processing relational databases. In this algebra, objects and their associations in an O-O database are uniformly represented by association patterns, which are manipulated by a number of operators to produce other association patterns. Unlike the relational algebra, in which set operations operate on relations with union-compatible structures, the A-algebra operators can operate on association patterns of both homogeneous and heterogeneous structures. Unlike the traditional record-based relational processing, the A-algebra allows very complex patterns of object associations to be directly manipulated. Pattern-based query formulation and the A-algebra operators are described. Some mathematical properties of the algebraic operators are presented together with their application in query decomposition and optimization. The completeness of the A-algebra is also defined and proven. The A-algebra has been used as the basis for the design and implementation of an object-oriented query language, OQL, which is the query language used in a prototype Knowledge Base Management System, OSAM*.KBMS.

CHAPTER 1
INTRODUCTION

In the past two decades, techniques of data modeling have gone through two major conceptual changes. First, in the early 1970s, E. F. Codd observed that future database systems should allow application programs and terminal users to remain unaffected by changes made to the internal data representation (or the storage structure) of a database. He introduced the relational data model [COD70] and proposed the relational algebra and relational calculus [COD72a] as the mathematical foundation for processing relational databases.
The relational model provides two levels of data independence in a three-level architecture for a database management system, as shown in Figure 1.1 (the figures of each chapter are placed at the end of the chapter). At the lower level, physical data independence is provided, i.e., the logical representation of a relational database is a set of relations (i.e., flat tables), which is independent of the physical (data and storage) structures in which data are stored. At the higher level, logical data independence is provided, i.e., the external view remains unchanged when the logical view of a database is modified (note that the external view remains unchanged only for some schema modifications). Besides simple logical representation and data independence, the fact that the relational model has a solid mathematical foundation is very important and has contributed to the success of the model and the existing relational database management systems.

However, the relational model and relational systems have some limitations. For example, the model captures rather limited structural properties of real-world entities or objects. The construct of aggregation hierarchy, which models complex objects, and the construct of generalization, which models the superclass-subclass relationship, are not provided. In the relational model, data which describe a complex object are scattered among a number of normalized relations, and accessing that data involves time-consuming traversal and assembly of data stored in multiple relations. The model also does not allow behavioral properties of entities/objects to be explicitly defined.

The second conceptual change of data modeling techniques occurred in the early 1980s.
The object-oriented paradigm, first introduced in the programming language SIMULA [DAH67] and made very popular through the language SMALLTALK [GOL81], allows richer structural constructs and behavioral properties of objects to be specified at the logical level independent of their physical implementations. Several features of the paradigm, such as abstract data types, inheritance, encapsulation, information hiding, polymorphism, etc., have been shown to be useful for data modeling and system development. The object encapsulation concept adds a level of data independence between the physical and the logical independences introduced in the relational model, as depicted in Figure 1.2. It requires that the structural and behavioral properties of an object be (logically) encapsulated in its class in the conceptual view of an O-O database. Since then, a number of Object-Oriented (O-O) and semantic data models have been proposed [HAM81, BAT84, KIN84, ZAN85a, ZAN85b, DAD86, MAI86, MAN86, SU86, ZDO86, WOE86, BAN87, FIS87, HOR87, HUL87, KIM87, ROW87, CAR88, COL89, SU89], which offer more powerful constructs for modeling the structural and behavioral properties of objects found in advanced applications such as CAD/CAM, CASE, and decision support systems.

An O-O semantic data model can be structurally and/or behaviorally object-oriented [DIT86]. A structurally O-O data model is one that encompasses at least the following characteristics:
(1) It supports the unique identification of objects, that is, each object has a unique object identifier (surrogate) which is valid for the lifetime of the object.
(2) It categorizes those objects which can be described by the same set of characteristics (attributes) into an object class.
(3) It allows aggregation (association) hierarchies to be defined.
(4) It allows generalization (association) hierarchies to be defined.
The O-O view of an application world is represented in the form of a network of classes and associations.
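The four structural characteristics listed above can be illustrated with a minimal Python sketch. It is not taken from the dissertation: the names `ObjectClass`, `Association`, `Instance`, and the `Person` superclass are hypothetical, chosen only to show surrogates, classes, and the two kinds of hierarchy links as a small network of classes and associations.

```python
from dataclasses import dataclass, field
from itertools import count

_oid_source = count(1)  # hypothetical surrogate generator, not from the source


@dataclass(frozen=True)
class ObjectClass:
    name: str
    primitive: bool = False  # True for simple data types such as string or integer


@dataclass(frozen=True)
class Association:
    kind: str            # "aggregation" or "generalization"
    source: ObjectClass
    target: ObjectClass


@dataclass(frozen=True)
class Instance:
    cls: ObjectClass
    # Characteristic (1): every object gets a unique identifier for its lifetime.
    oid: int = field(default_factory=lambda: next(_oid_source))


# Characteristic (2): objects described by the same attributes share a class.
person = ObjectClass("Person")              # hypothetical superclass
student = ObjectClass("Student")
teacher = ObjectClass("Teacher")
name = ObjectClass("Name", primitive=True)

# Characteristics (3) and (4): aggregation and generalization links.
schema = {
    Association("aggregation", person, name),
    Association("generalization", person, student),
    Association("generalization", person, teacher),
}

s1, s2 = Instance(student), Instance(student)
print(s1.oid != s2.oid)  # two Student objects never share a surrogate
```

The schema is kept as a plain set of links so that it reads as a graph of classes and associations rather than as a set of tables.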
An object class can be either a primitive-class, whose instances are of simple data types (e.g., string, integer), or a nonprimitive-class (e.g., Part, Student, Teacher). At the extensional level, instances of different classes can be related (associated) with each other, forming patterns of object associations. A behaviorally object-oriented data model, on the other hand, is one in which operations that describe the behavior of the objects of a class can be defined and registered with that class. Programs or methods that implement the operations defined for an object are transparent to the user of the objects.

For these models to be truly useful, they must provide some object manipulation languages, which can take advantage of the expressive power of the models and provide the users with simple and powerful querying facilities. Recently, several query languages such as DAPLEX [SHI81], GEM [ZAN83, TSU84], ARIEL [MAC85], FAD [BAN87], POSTQUEL [ROW87], EXCESS [CAR88], and others reported in [DAD86, MAN86, SER86, BAN87, FIS87, BAN88, COL89, SHA90] have been proposed. These languages were developed based on different paradigms. For example, DAPLEX and the query language of [MAN86] are based on the functional paradigm. The query language of [BAN88] is based on the message passing paradigm. Other query languages are based on the relational paradigm: an extension of QUEL [ROW87, CAR88]; an extension of SQL [DAD86]; and an extension of the relational algebra [COL89]. The query language of [FIS87] is based on both functional and relational paradigms, allowing functions to be used in object-oriented SQL (OSQL) constructs.

The above languages have an O-O flavor and have taken significant steps towards the development of a powerful O-O query language.
Query languages such as DAPLEX [SHI81], GEM [ZAN83], ARIEL [MAC85], and the object-oriented query language described in [BAN88] are based on the view of a database defined in terms of objects, object classes, and their associations. A query in these languages is formulated by specifying one class (usually a nonprimitive-class, whose instances are real-world objects) in the schema as a central class with some path expressions. Each path expression starts from the central class and ends at another class (usually a primitive-class, whose instances are of basic data types such as integer, string, set, etc.). A restriction condition can be specified on the class referenced at the end of a path expression. This class can also be specified in the list of attributes to be retrieved. The result of a query is a set of tuples, each of which corresponds to a single instance of the central class and contains values related to that instance which are collected from classes specified in the list.

A major drawback of these query languages is that they do not maintain the closure property [ALA89b]. A query language is said to be closed if the result of a query can be further queried by other queries specified in the same language. In the above mentioned languages, the input to a query has an O-O representation (i.e., a network of objects, classes, and their associations), whereas its output is a relation which does not have the same structural and behavioral properties as the original objects. Consequently, the result of a query cannot be further processed by the same set of operators. The design of these languages is very much influenced by the relational model and relational languages, which are concerned mainly with retrieval and storage operations. In O-O processing, objects in different classes that satisfy some search conditions are subject to different user-defined operations.
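The closure property discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a pattern is modeled here as a frozenset of (object, object) links, and `select_closed` and `select_flat` are hypothetical helpers, not operators defined in the dissertation.

```python
from typing import Callable, FrozenSet, List, Set, Tuple

# Assumed encoding: a pattern is a set of links between object identifiers.
Pattern = FrozenSet[Tuple[str, str]]


def select_closed(patterns: Set[Pattern],
                  keep: Callable[[Pattern], bool]) -> Set[Pattern]:
    """Closed operator: a set of patterns in, a set of patterns out."""
    return {p for p in patterns if keep(p)}


def select_flat(patterns: Set[Pattern],
                keep: Callable[[Pattern], bool]) -> List[Tuple[str, ...]]:
    """Non-closed operator: flattens each matching pattern into a value tuple,
    so the links between objects are no longer available to later queries."""
    return [tuple(sorted(obj for link in p for obj in link))
            for p in patterns if keep(p)]


db: Set[Pattern] = {frozenset({("s1", "p1")}), frozenset({("s2", "p2")})}

# The output of a closed operator can be queried again by the same operator.
first = select_closed(db, lambda p: len(p) == 1)
second = select_closed(first, lambda p: ("s1", "p1") in p)

# The flat result is a list of plain tuples with the link structure gone.
rows = select_flat(db, lambda p: True)
```

Here `second` is still a set of patterns, whereas `rows` holds only object identifiers, which mirrors the drawback described for languages whose output is a relation.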
The idea of collecting data to form a resulting relation does not satisfy this processing model.

The query languages proposed in [DAD86, MAN86, BAN87, ROW87, CAR88, COL89] use nested relations as their logical views of O-O databases. Although these languages are closed, i.e., operators in these languages operate on nested relations to produce nested relations, the nested relation is not a proper logical representation for an O-O database, which is basically a network structure of object associations. Mapping from a network representation to nested relations is an additional process. Furthermore, in order to use a nested relation to represent complex network structures, a considerable amount of data has to be introduced to relate these nested relations. It is our view that the query language and its underlying algebra should directly support the manipulation of network structures.

A query algebra [SHA90] was proposed recently based on the O-O model ENCORE [ELM89]. Although ENCORE models applications as networks of objects, object types, and their associations, the domain of the algebra is defined as sets of objects of the Tuple type, which is essentially the nested relation representation since it allows the nesting of tuples. Therefore, the mapping problem addressed above still remains. In this algebra, two identical queries or two identical operations in a single query do not give the same response, since each produces a new object in the database. To eliminate duplicated copies of the same newly created object, the algebra introduces operations like DupEliminate and Coalesce, which would not have been necessary if the algebra were to directly support the network-structured processing of O-O databases. We further observe that the union operation in this algebra may produce a collection of objects having the same data type but with different structures (e.g., the union of two collections of objects of the Tuple type with different arities).
Nevertheless, the other operators introduced in the algebra are not defined to operate on collections of objects with heterogeneous structures.

A common limitation of many existing query languages is that they cannot easily express a "non-association" relationship between objects, i.e., identify objects in two classes that are not associated with each other while their classes are. For example, in an O-O database, let us assume that Suppliers s1 and s2 supply Parts p1 and p2, respectively. GEM, POSTQUEL, and several other query languages provide the "dot" construct (Suppliers.Parts), and ARIEL provides the "of" construct (Parts of Suppliers), to navigate from the class Suppliers to the class Parts to produce object pairs (s1,p1 and s2,p2). However, they do not have a language construct for specifying the semantics that s1 does not supply p2 and s2 does not supply p1. Similarly, in functional languages, only the function Parts(Suppliers) is provided to specify the associations of s1,p1 and s2,p2 but not the non-association of suppliers and parts.

In view of the disadvantages of the existing O-O query languages, we would like to stress the importance of using a graph as the logical representation of an O-O database at both intensional and extensional levels, as exemplified by O2 [LEC88], FAD [BAN87], and OSAM* [SU89]. The query language and its underlying algebra should provide constructs to directly process graphs with different degrees of complexity. They should also support the specification of non-associations and the processing of heterogeneous structures. Furthermore, the closure property should be maintained.

In this dissertation, we propose an association algebra (A-algebra) based on the graph representation of O-O databases and the association-based query formulation (refer to Chapter 3).
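The supplier and part example above can be made concrete with a small Python sketch. It is only an illustration: the functions `associated` and `non_associated` are hypothetical and compute plain object pairs; they are not the formal Associate, A-Complement, or NonAssociate operators, which the dissertation defines in Chapter 4.

```python
from typing import Set, Tuple

suppliers = {"s1", "s2"}
parts = {"p1", "p2"}
# Extension of the Suppliers-Parts association: s1 supplies p1, s2 supplies p2.
supplies = {("s1", "p1"), ("s2", "p2")}


def associated(left: Set[str], right: Set[str],
               links: Set[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Pairs reachable by a 'dot' or 'of' style navigation."""
    return {(a, b) for a in left for b in right if (a, b) in links}


def non_associated(left: Set[str], right: Set[str],
                   links: Set[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Pairs whose classes are associated but whose objects are not."""
    return {(a, b) for a in left for b in right if (a, b) not in links}


print(sorted(associated(suppliers, parts, supplies)))
print(sorted(non_associated(suppliers, parts, supplies)))
```

The second call yields the pairs (s1, p2) and (s2, p1), which is exactly the information that navigation-only constructs cannot express.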
Analogous to the development of the relational algebra for relational databases, the development of the A-algebra provides the formal foundation for query processing and optimization in O-O databases and for designing O-O query languages. Unlike the record (tuple)-based relational algebra [COD70 and COD72] and the query algebra [SHA90], the A-algebra is association-based, i.e., the domain of the algebra is sets of association patterns (e.g., linear structures, trees, lattices, networks, etc.) and processing an O-O database is based on the matching and manipulation of homogeneous as well as heterogeneous patterns of object associations. Operators of the A-algebra can be used to navigate a network of interconnected object classes along the path of interest to construct a complex pattern as the search condition. They can also be used to decompose a complicated pattern into simple ones. Ten operators have been defined for the algebra: three unary operators [A-Select ( ... Union (+), A-Difference (-), A-Divide (÷), NonAssociate (!), and A-Intersect (•)], where the prefix A stands for "Association". Although many of these operators correspond to the relational algebra operators, they are different from them in that they can operate on complicated heterogeneous structures. In this respect, the A-algebra is more general than the relational algebra.

The rest of this dissertation is organized as follows. A detailed survey on the relational model and the relational algebra, the existing O-O query languages, and a recently proposed query algebra is provided in Chapter 2. The graphical representation of O-O databases and the association-based query formulation are described in Chapter 3 with the help of examples. Chapter 4 formally defines the concepts of Schema Graph (SG), Object Graph (OG), and association patterns. The formal definitions of the association operators and their simple mathematical properties are also presented.
The A-algebra expressions for some example queries are given to demonstrate the utility of the algebra. Chapter 5 presents the mathematical properties of the association operators and their utility in query optimization and query decomposition. The proofs of the mathematical properties of the operators can be found in the Appendix. The completeness of the A-algebra is shown in Chapter 6 and the conclusion is given in Chapter 7.

[Figure 1.1 Data independencies in relational databases (logical data independence, physical data independence)]

[Figure 1.2 Architecture of O-O databases (logical data independence, encapsulation, physical data independence)]

CHAPTER 2
A SURVEY OF RELATED RESEARCH

This chapter surveys some of the existing work related to the development of the A-algebra. Section 2.1 describes the relational model and the relational algebra, while Section 2.2 surveys some existing query languages designed for O-O semantic data models. The query algebra that recently appeared in the literature is surveyed in Section 2.3.

2.1 Relational Model and Relational Algebra

When the hierarchical and network data models were used extensively in information systems in the late 1960s, Codd [COD70] raised an interesting and important question: Can application programs and terminal activities remain invariant as the internal data representations (physical representations) change? He asserted that the future users of large data banks must be protected from having to know how the data were organized in the machine. Following this rationale, he conceived the notion of data independence, which suggests that the logical organization of data should be independent of its physical representation. Determined to demonstrate the validity of his data independence concept, he proposed a relational data model based on n-ary relations.
The scheme of a relation, R, of an entity set {E1, E2, ..., En} is defined on a set of m attributes {A1, A2, ..., Am} which correspond to m domains {D1, D2, ..., Dm} (not necessarily distinct). Each entity (an instance of the scheme) is represented by an m-ary tuple which has its first attribute value from D1, its second attribute value from D2, and so forth. A set of attributes of a relation is called a key if the entities of the relation can be uniquely identified by the values of these attributes. For example, information about suppliers such as their names, addresses, the items they supply, and the prices of the items can be represented by the relation SUPPLIERS of the following scheme

    SUPPLIERS(SNAME, SADDRESS, ITEM, PRICE)

where the attributes SNAME and ITEM form a composite key. Data represented in this form, which intuitively is a flat table, is the logical view of an application world. It has nothing to do with the physical representation of the data.

When designing a database using the relational model, one is often faced with a choice among alternative sets of relation schemes. Some choices are more favorable than others for various reasons. For example, the relation SUPPLIERS is not a desirable scheme because it has the following potential problems:

(1) Redundancy -- the address of the supplier is repeated once for each item supplied.

(2) Potential inconsistency (update anomalies) -- as a consequence of the redundancy, the update of the address of a supplier in one tuple will leave it inconsistent with the address in another tuple.

(3) Insertion anomalies -- the address of a supplier cannot be recorded if that supplier does not currently supply at least one item, since SNAME and ITEM form a composite key of the relation SUPPLIERS.

(4) Deletion anomalies -- the inverse of problem (3) is that, should all the items supplied by one supplier be deleted, we unintentionally lose the address of that supplier.
The causes of these problems and their solutions are related to the functional dependencies among the attributes of a relation [COD70, ULL82]. Suppose X and Y are two sets of attributes of a relation. Y functionally depends on X (or X functionally determines Y), denoted by X→Y, if any two tuples of the relation having the same values in the attributes of X agree on the values of the attributes in Y. The above four problems emerge if X→Y and X1→Z hold simultaneously, where X1 stands for a proper subset of X and Z for a set of attributes of the relation.

The solution to these problems is to decompose a relation based on the functional dependencies among attributes. For example, the functional dependencies among attributes of the relation SUPPLIERS are (SNAME, ITEM)→PRICE and SNAME→SADDRESS, which give rise to the redundancy and the update, insertion, and deletion anomalies. It should be clear to the reader that these problems will be eliminated if the relation SUPPLIERS is decomposed into two relations SA(SNAME, SADDRESS) and SIP(SNAME, ITEM, PRICE). There is, however, a disadvantage to the above decomposition: to find the address of a supplier who supplies the item "piston", a join operation has to be applied, since SADDRESS and ITEM are logically distributed over two relations.

The decomposition of a relation based on the functional dependencies among its attributes is a novel issue of normalization in the relational model. Four types of normal forms, denoted by 1NF, 2NF, 3NF, and Boyce-Codd NF, respectively, have been recognized in considering the functional dependency [COD70, ARM74, and BEE77]. The Boyce-Codd NF is the strongest of these normal forms. Relations in these normal forms may have to be further decomposed into 4NF or 5NF to eliminate multivalued dependencies [FAG77, DEL78, and ZAN76] and join dependencies [AHO79]. This decomposition is needed to eliminate further redundancy and anomalies.
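The functional dependencies and the decomposition of SUPPLIERS into SA and SIP discussed above can be checked on a small instance. The following is a minimal sketch: the tuples are invented for illustration, and the helper names `holds` and `project` are hypothetical, not part of any relational system.

```python
# Hypothetical SUPPLIERS tuples: (SNAME, SADDRESS, ITEM, PRICE)
ATTRS = ("SNAME", "SADDRESS", "ITEM", "PRICE")
SUPPLIERS = {
    ("Acme", "12 Oak St", "piston", 9.50),
    ("Acme", "12 Oak St", "valve", 4.25),
    ("Bolt Co", "7 Elm Ave", "piston", 9.75),
}


def holds(relation, attrs, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in the relation."""
    pos = {a: i for i, a in enumerate(attrs)}
    seen = {}
    for t in relation:
        key = tuple(t[pos[a]] for a in lhs)
        val = tuple(t[pos[a]] for a in rhs)
        # Two tuples agreeing on lhs must agree on rhs.
        if seen.setdefault(key, val) != val:
            return False
    return True


def project(relation, attrs, keep):
    """Projection onto the attributes in keep; duplicates vanish because the result is a set."""
    pos = {a: i for i, a in enumerate(attrs)}
    return {tuple(t[pos[a]] for a in keep) for t in relation}


# Decomposition along SNAME -> SADDRESS and (SNAME, ITEM) -> PRICE.
SA = project(SUPPLIERS, ATTRS, ("SNAME", "SADDRESS"))
SIP = project(SUPPLIERS, ATTRS, ("SNAME", "ITEM", "PRICE"))

# Natural join of SA and SIP on SNAME reconstructs the original relation.
rejoined = {
    (name, addr, item, price)
    for (name, addr) in SA
    for (other, item, price) in SIP
    if name == other
}

# The address of every supplier of "piston" now requires the join.
piston_addresses = {addr for (_, addr, item, _) in rejoined if item == "piston"}
```

In SA the address of each supplier is stored once, which removes the redundancy, while the query for piston suppliers has to go through the join, which is the disadvantage noted in the text.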
The success and popularity of the relational model and the relational database management systems (DBMSs) are due to the model's simplicity in structural (tabular) representation and its sound theoretical basis -- the relational algebra and the relational calculus [COD72a]. The relational algebra defines five primitive operators, of which two are unary operators [Projection (π) and Selection ( ... operators such as Join, Natural-join, Set-intersection, and Set-division are also defined in the algebra. Although these latter operators are easy to use, they are not primitive since they can be expressed in terms of the primitive operators.

The relational algebra has the closure property, since every operator operates on one or more relations and produces a new relation. Operators of the relational algebra basically operate on the values of tuples in relations. Structurally speaking, they are defined to operate on tuples whose structures are union-compatible (homogeneous). The relational algebra is complete in the sense that it has expressive power equivalent to that of the relational calculus [COD72a and ULL82]. Because of this, it serves as the theoretical basis for the relational model.

The relational algebra has been used for the following three purposes, although it has not been previously implemented in any existing DBMS exactly as defined [ULL82]:

(1) It creates a new class of query languages called algebraic languages. Based on the relational algebra, languages that directly adopt the relational operators can be developed, such as ISBL [TOD76], which is a close approximation to the relational algebra. Although languages of this type are mostly procedural, it is relatively easy to demonstrate their completeness, and the mathematical properties of the relational algebra can be readily applied to query optimization and query decomposition.
(2) It not only serves as a benchmark for evaluating query languages in existing systems, but also as the criterion for designing new languages for relational DBMSs. A relational language will not have the necessary expressive power if it is not relationally complete [ULL82].

(3) It provides a mathematical basis for transforming expressions in query decomposition and (logical or conceptual) query optimization. As an algebraic form, the mathematical properties of the relational algebra can be explored precisely and systematically. For query languages construed as algebraic languages, these mathematical properties have a straightforward application [HAL76]. Query languages like SQUARE or SEQUEL having certain algebraic features may also use these properties, since the parse of a query yields a tree in which some nodes represent relational algebra operators [AST76]. Even though a query language such as QUEL is a relational calculus language, its calculus-like expressions are translated into relational algebra expressions in the QUEL optimizer [WON76].

The total content proposed by Codd before 1979 on the relational model is referred to as Version 1 of the relational model (RM/V1), whose modeling capabilities were extended by Codd in 1979 [COD79] to version RM/T (T for Tasmania). Based on these two versions, Codd [COD90] introduces Version 2 of the relational model (RM/V2). The most important additional features in RM/V2 are as follows:

(1) A new treatment of items of data missing because they represent properties that happen to be inapplicable to certain object instances.

(2) New features supporting all kinds of integrity constraints, especially the user-defined integrity constraints.

(3) A more detailed account of view updatability.

(4) New features pertaining to the management of distributed databases.
It is important to recognize the fact that hierarchical and network models as well as the relational model evolved during a time in which the primary applications of information systems were business-oriented. In attempts to apply these techniques to more complicated application areas such as CAD/CAM, CASE, and decision support, it has been found that the relational model is no longer adequate for modeling these advanced applications. The inadequacies of the relational model are summarized as follows. First, the relational model has limited modeling capabilities. When data are logically represented in the form of relations, the relationships among entities in these relations are represented by matching values of the attributes or keys in one relation with values of the attributes or foreign keys in other relations. The actual semantics among the data such as generalization and aggregation (the abstract data type) cannot be modeled by the relational model. Second, the relational model only models the structural aspects of entities, and thus ignores their behavioral aspects (e.g., system-defined and user-defined operations). Third, in these advanced applications, the concept of data independence should be further extended to the concept of object encapsulation, i.e., not only should the logical representation of an object be separated from its physical representation, but its structural and behavioral properties should be logically encapsulated in its class. The object encapsulation concept cannot be realized in the relational model, since the data describing an entity may be logically scattered among several relations due to normalization [COD70, COD72b, BEE77, and ULL82]. Fourth, entities with complex structures and complicated relationships among entities are not representable by flat tables (relations). Finally, the relational model cannot represent and operate on entities with different (heterogeneous) structures.

2.2 Existing O-O Query Languages

An extensive literature search on query languages for accessing O-O databases such as GEM [ZAN83, TSU84], ARIEL [MAC85], DAPLEX [SHI81], FAD [BAN87], POSTQUEL [ROW87], EXCESS [CAR88], as well as other proposed languages [STO84, DAD86, MAN86, SER86, BAN87, FIS87, BAN88, COL89, SHA90] has been carried out. This section surveys a representative sample of these languages. Most existing query languages have capabilities beyond those provided by their theoretical bases. For example, the arithmetic operations and aggregation functions provided by the relational languages are not available in the relational algebra. Therefore, this survey is limited to those features which are relevant to the proposed algebra. To demonstrate the similarities and differences of these languages, the same database schema, shown in Figure 2.1, is used for example queries written in GEM, ARIEL, and DAPLEX. The sample schema of Figure 2.1 is for a government-owned laboratory system, where rectangles represent classes and edges (links) represent attributes.

QUEL [STO76, WON76, and ZOO77] is a tuple-calculus oriented query language for the relational DBMS INGRES [STO76]. In order to avoid the ambiguity which arises when two attributes of different relations having the same name are addressed in a single query, QUEL uses a "dot" mechanism to qualify an attribute of a relation (i.e., a dot is inserted between the name of the relation and the name of the attribute). For example, Equipment.Name refers to the attribute Name of the relation Equipment. Influenced by this mechanism, the existing O-O query languages use similar notations for navigating the database schema from one class to another, or from one relation to other relations in systems which use relational databases as their back-ends.

The language GEM [ZAN83, TSU84] is an extension of QUEL for the data model DSIS, which supports aggregation, generalization, and unique identification of objects.
In GEM, a class in an aggregation hierarchy that has a link emanating to another class has the name of the latter class as the data type of one of its attributes. For example, the class Lab has an attribute, Facility, of the type Equipment, and has another attribute, Locality, of the type Location, and so forth. The dot notation is used in GEM for navigating along the reference attributes (links) in query formulation. The following GEM query retrieves the name of the manager, the serial number of the equipment, and the address for each laboratory whose headquarters is located in New York.

    Range of Lab is Lab
    Retrieve Lab.Manager.Name
             Lab.Equipment.Serial#
             Lab.Location.Address
    Where Lab.Manager.Department.Headquarters.City = "New York"

This query returns a set of tuples in a tabular form. Each tuple contains values for the manager's name, the equipment serial number, and the address of the laboratory of interest.

In the approach described in Stonebraker et al. [STO84], the dot notation is used in a manner similar to that found in GEM to implement the abstract data type (ADT) concept. In addition, QUEL is used as a data type to facilitate the navigation from one relation to another. A relation may have a field of type QUEL which may contain expressions or commands (queries). Whenever the field is addressed in a query, these expressions, in whole or in part, will be activated. In general, if X is the tuple variable of the relation R1, Y is a field of type QUEL in relation R1, and the query stored in Y retrieves field Z of another relation, R2, then the expression X.Y.Z is a field in a collection of this view. In other words, the expression will return the values of the Z field of tuples (in R2) that are related to X through Y. For example, let the relation Manager have a field called OfficeInfo of type QUEL which contains a query that retrieves the telephone number of the relation Location.
The expression Manager.OfficeInfo.Tel# returns the telephone number for each manager in a tabular format. Clearly, the implementation of QUEL as a data type provides a way to relate data in two relations without modifying the database schema.

Instead of using the dot notation, ARIEL [MAC85] takes advantage of the "OF" notation. The example query described for GEM can be restated as

    Range of Lab is Lab
    Retrieve Name OF Manager OF Lab
             Serial# OF Equipment OF Lab
             Address OF Location OF Lab
    Where City OF Headquarters OF Department OF Manager OF Lab = "New York"

using the "OF" notation, which is linguistically more natural than the dot notation. However, the result of this query is also represented by a flat table (relation).

DAPLEX [SHI81] is a functional data language. The data retrieval component of DAPLEX is similar to the languages described above, although it is interpreted differently. In the functional paradigm, the class having a link (i.e., attribute) emanating to another class is considered as a function. The function has, by default, the name of the class to which the link points. For example, Location(Lab) and Department(Headquarters) represent the facts that Lab has Location and Headquarters has Department as an attribute, respectively. When the function Location(Lab) is applied to an object of the class Lab, it returns a value which is an object in the domain class over which the attribute is defined. If the navigation is from one class to another through a sequence of classes, a nested function is used. For instance, the expression Name(Manager(Lab)) specifies the name of the manager of a laboratory for which the manager is responsible. For a particular object of Lab, the manager of the laboratory is produced first; then, the function Name() is applied to the returned manager and returns the name of the manager. The example query can be expressed in DAPLEX as follows.
    FOR EACH Lab
        SUCH THAT City(Headquarters(Department(Manager(Lab)))) = "New York"
        PRINT Name(Manager(Lab)), Serial#(Equipment(Lab)), Address(Location(Lab))

Even though DAPLEX is based on the functional paradigm, it returns data in the form of a relation just as GEM and ARIEL do.

Banerjee et al. [BAN88] introduce a query language based on message passing. In the message passing paradigm, the name of a link emanating from a class is interpreted as the name of a message which is stored within that class. One can assume there is actually a message created by the system and having, by default, the same name as its corresponding attribute. When such a message is sent to an instance of the class, it returns the value of the attribute. For example, the following is an expression for selecting a laboratory that has a manager who belongs to a subordinate department of its New York headquarters.

    (Lab SELECT :S (:S Manager Department Headquarters City = "New York"))

SELECT in this expression is a message sent to the class Lab. The first argument of SELECT is :S, an iteration variable. The SELECT message iterates over the instances of the class Lab with :S bound to one instance at a time. The block of code within the parentheses is the second argument of SELECT, and is executed for each value bound to :S. In this particular block, the message Manager is sent to the instance bound to :S in order to return the related Manager instance. Similarly, Department and Headquarters are messages. To elaborate, Department is sent to the returned Manager instance, and Headquarters is sent to the returned Department instance. The sign "=" is also a message which has the argument "New York". When this message is sent to the resulting headquarters instance, it returns a logical object TRUE or FALSE. An instance of Lab is qualified for the above expression if and only if the returned logical object is TRUE.
The logical AND or OR message can be sent to this object with an argument that specifies some other condition on the instance of Lab. In principle, though not described in Banerjee et al. [BAN88], similar message-based expressions can be used to retrieve attribute values of the resulting Lab instance. The result of a query which involves such conditions is the set of the instances of Lab along with their attribute values, and is represented in a tabular form.

As shown in the samples of these query languages, their query formulations, though interpreted differently, are very similar to each other. This is evident in the fact that the formulation of queries is accomplished by navigating the graphically represented database schema from class to class through their respective links. In each of these languages, however, a query operates on a database that is structurally represented using an O-O data model and returns a result whose structure is represented in a tabular form. Consequently, the result of a query cannot be further queried by other queries written in the same language. Therefore, these languages are not closed.

Another drawback of these languages is seen in their navigation mechanisms, which can only formulate queries against classes (or relations) that are interrelated in simpler patterns like the linear and forest structures shown in Figure 2.2a. However, in O-O databases, the graphical patterns in which objects are interrelated with each other are basically networks which are not restricted to plane graphs (a graph is a plane graph if it can be drawn on a plane without any intersection of two edges). They can be as complicated as surface graphs (a graph is a surface graph if it can be drawn on a surface without any intersection of two edges). Phrasing queries against classes that are interrelated in the more complicated patterns depicted in Figure 2.2b is beyond the capabilities of these languages.
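The lack of closure noted above can be seen in a small sketch of path navigation. The instances below are invented, each Lab is assumed to have a single manager, equipment item, and location, and `navigate` is a hypothetical helper that mimics the dot, OF, and nested-function constructs; none of this is the implementation of GEM, ARIEL, or DAPLEX.

```python
from functools import reduce

# Hypothetical instances following the class names used in the example query.
lab1 = {
    "Manager": {"Name": "Lee", "Department": {"Headquarters": {"City": "New York"}}},
    "Equipment": {"Serial#": "E-100"},
    "Location": {"Address": "5 Pine Rd"},
}
lab2 = {
    "Manager": {"Name": "Kim", "Department": {"Headquarters": {"City": "Washington"}}},
    "Equipment": {"Serial#": "E-200"},
    "Location": {"Address": "9 Bay Ave"},
}
labs = [lab1, lab2]


def navigate(obj, path):
    """Follow a dotted path such as 'Manager.Department.Headquarters.City'."""
    return reduce(lambda current, attr: current[attr], path.split("."), obj)


# The example query: the input is a network of objects, the output is a flat table.
result = [
    (
        navigate(lab, "Manager.Name"),
        navigate(lab, "Equipment.Serial#"),
        navigate(lab, "Location.Address"),
    )
    for lab in labs
    if navigate(lab, "Manager.Department.Headquarters.City") == "New York"
]
```

Each element of `result` is a tuple of values rather than a Lab object, so `navigate` cannot be applied to the result again; the output has left the structure that the query language operates on.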
A third drawback of these languages, which renders their navigation mechanisms insufficient, is that only one type of relationship (an object is related to another object) between objects of two classes can be expressed. In fact, when two classes are directly linked at the schema level, objects in these two classes may have another type of relationship -- an object is not related to another object. This type of relationship represents the complement aspect of the semantics specified for the two associated classes, such as not-a-part-of, not-a-function-of, or is-not-a, which is often needed in querying the databases. For example, "For each laboratory, list the equipment that is not available" is a reasonable query.

The proposed query languages [DAD86, MAN86, BAN87, ROW87, CAR88, COL89] use nested relations as their logical views of databases. A nested relation is a generalized relation, i.e., a recursively defined relation: the attributes of a relation can be either atomic values or another relation in which the attributes can be a third relation, and so forth. Figure 2.3 shows an example of a nested relation. Nested relations are particularly suitable for representing data in forest structures. The above languages are considered to be closed, since operators in these languages operate on nested relations and produce nested relations. However, they also have the drawbacks mentioned above, and it is our view that the nested relation is not a proper logical representation for an O-O database, which consists of networks of objects, object classes, and their associations. Using nested relations to represent data in network structures introduces one level of indirection. Mapping from a network representation to nested relations is an extra process. Furthermore, in order to use a nested relation to represent complex structures, a large amount of data has to be replicated in the representation.
Figure 2.4 shows an example of using a nested relation to represent a graph having loops. Note that vertex F has to be replicated three times.

2.3 ENCORE O-O Data Model and Its Underlying Query Algebra

In spite of the popularity of the O-O paradigm and its application in the field of database management, the existing O-O database management systems still lack a solid mathematical foundation for the manipulation of an O-O database and the optimization of queries. Recently, a query algebra [SHA90] was proposed for the ENCORE O-O data model [ELM89]. This section surveys the query algebra as well as the ENCORE model. It also serves as a comparison to the association algebra proposed in this dissertation.

2.3.1 The ENCORE Model

The ENCORE O-O data model [ELM89] supports abstract data types, type inheritance, typed collections of typed objects, objects with identity, and object encapsulation. It models an application as networks of objects, object types, and their associations. The definition of an abstract data type in this model includes the Name of the type, a set of Properties defined for instances of the type, and a set of Operations which can be applied to instances of the type. Properties reflect the state of an object while operations may perform arbitrary actions. Properties are typed objects that may be implemented as stored values, procedures, or functions. The implementation of a property is invisible to the user and is assumed to return an object of the correct type and to have no side-effects.

In addition to user-defined abstract data types and a collection of atomic types such as Int, String, Boolean, etc. (i.e., primitive classes), ENCORE provides two parameterized types and a global Object type which is the supertype of all other types. The parameterized type Set[T] defines T as the type, or supertype, of objects in a collection having type Set, and T is called the member type of the set. The parameterized tuple type associates types (T_i)
with attribute names (A_i) and defines properties Get_attribute_value and operations Set_attribute_value for each attribute. The T_i's can be any database types, thus allowing the nesting of tuple types. The value of a tuple is represented as <A_1: o_1, A_2: o_2, ..., A_n: o_n>, where the A's are attributes of the tuple and the o's are objects of the corresponding types.

The global supertype Object defines a family of operations for equality called i-equality, where i indicates how "deeply" a comparison of two objects must search before finding equality. Two objects are identical when they are the same object, i.e., they have the same identity. Identical objects are 0-equal (=_0 or just =) and, for i>0, two objects are i-equal (=_i) if (1) they are both collections of the same cardinality and there is a one-to-one correspondence between the collections such that corresponding members are =_(i-1), or (2) they both have the same type (not a collection type) and the values of corresponding properties are =_(i-1). Type Object also defines a stronger notion of equality called id-equality. Two objects are id-equal at depth i if they are i-equal and the graphical representations of the objects are isomorphic.

2.3.2 The Underlying Query Algebra of ENCORE

The query algebra [SHA90] is proposed based on the O-O model ENCORE. The domain of the query algebra is defined as a typed collection of typed objects. A typed collection is of parameterized type Set[T] and the objects in the collection are of type T. If objects of a collection are collected from different types, T is their most specific common type in the type lattice. For example, if object s is of type S, object p is of type P, and S is a supertype of P, the collection of objects s and p is of type Set[S]. The query algebra is closed since the operators of the query algebra operate on collection(s) of objects with type Set[T_i] and produce a collection with type Set[T_k], where type T_k is defined by the query.
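The recursive definition of i-equality given above can be sketched directly. The sketch rests on several assumptions that are not part of ENCORE: Python object identity stands in for object identity, atomic values (strings, numbers, Booleans) are compared by value, collections are modelled as Python lists, and the one-to-one correspondence is found by brute force over permutations, which is only practical for very small collections. The class `Obj` and the function `i_equal` are hypothetical names.

```python
from itertools import permutations

ATOMIC = (str, int, float, bool)


class Obj:
    """A typed object whose properties are held in a dictionary (hypothetical model)."""

    def __init__(self, type_name, **props):
        self.type_name = type_name
        self.props = props


def i_equal(a, b, i):
    """Return True if a and b are i-equal in the sense described in the text."""
    # Assumption: atomic values are treated as identical when they have the same value.
    if isinstance(a, ATOMIC) or isinstance(b, ATOMIC):
        return type(a) is type(b) and a == b
    if a is b:  # identical objects are 0-equal, hence i-equal for every i
        return True
    if i == 0:
        return False
    # Case (1): collections of the same cardinality with a one-to-one
    # correspondence whose corresponding members are (i-1)-equal.
    if isinstance(a, list) and isinstance(b, list):
        if len(a) != len(b):
            return False
        return any(
            all(i_equal(x, y, i - 1) for x, y in zip(a, perm))
            for perm in permutations(b)
        )
    # Case (2): same non-collection type and (i-1)-equal corresponding properties.
    if isinstance(a, Obj) and isinstance(b, Obj):
        return (
            a.type_name == b.type_name
            and a.props.keys() == b.props.keys()
            and all(i_equal(a.props[k], b.props[k], i - 1) for k in a.props)
        )
    return False


addr1 = Obj("Addr", City="Boston")
addr2 = Obj("Addr", City="Boston")
sup1 = Obj("Supplier", Address=addr1)
sup2 = Obj("Supplier", Address=addr2)
```

With these instances, `addr1` and `addr2` are distinct objects that become equal at depth 1, while `sup1` and `sup2` only become equal at depth 2, because their Address properties are themselves distinct objects.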
Similar to the languages surveyed in Section 2.2, the query algebra addresses a property of an object using 'dot' notation (e.g., a.p.q where a is an object of type T_1, p is a property of a and is of type T_2, and q is a property of p and is of type T_3).

Twelve operators are defined in this algebra. We give their brief definitions followed by some example queries to illustrate the major concepts of this algebra.

(1) The Select operation creates a collection of objects which satisfy a selection predicate.

    Select(S, p) = { s | (s in S) ∧ p(s) }

where p is the predicate.

(2) The Image operation is used to return a single object for each object in the queried collection and has the form:

    Image(S, f : T) = { f(s) | s in S }

where S is a collection of objects and f returns an object of type T.

(3) The Project operation extends Image by allowing the application of many functions to an object, thus supporting the creation and maintenance of selected relationships between objects. The relationships are stored as tuples with Tuple type.

    Project(S, { ...

where S is of type Set[T], the A_i's are unique attribute names, and each function takes a single input of type T and returns an object of type T_i. Project returns one tuple for each object in the collection being queried. Each newly created tuple is a new object with a unique object identifier.

(4) The Ojoin operator is an explicit join operator used to create relationships which are not defined between objects of two collections in the database. It is essentially a Cartesian product of collections of objects, followed by a selection of result tuples. For collections S and R, the Ojoin is defined as follows:

    Ojoin(S, R, A_1, A_2, p) = { ...

where p is a predicate (as in Select) defined over objects from S and R. The Ojoin operation creates new tuples in the database to store the generated relationships. The tuples created will have unique object identifiers.
(5) Union, Difference, and Intersection are the usual set operations with object comparisons and set membership based on object identity (=_0). The result of these operations is considered to be a collection of objects of type T, where T is the most specific common supertype (in the type lattice) of the types of the objects in the operands.

(6) The Flatten operation is used to restructure sets of sets, and Nest and UnNest allow the representation of tuples as flat or nested relations.

(7) For the above operators, two identical operations cannot give identical responses, since each result collection is a newly identified object in the database and the objects in a result collection may be either existing database objects or new tuple objects created during the operation. Operators DupEliminate and Coalesce are introduced to handle situations where equal objects are created by a query.

The example queries are issued against the Supplier-Parts-Job database shown in Figure 2.5. For the purpose of these examples, it is assumed that Type Object is the only supertype for each of the given types.

Example 1: Find all red parts. Which suppliers can supply all of the red parts?

    P_red := Select(Parts, λp p.color = "Red")
    S_Pred := Select(Suppliers, λs P_red subset_of s.Inventory)

The first selection finds the red parts and the second selection finds all suppliers for which the inventory includes that set of parts. The subset_of operation is available since property Inventory and result P_red both have type Set[Part].

Example 2: What parts are needed by jobs in Boston?

    BosJobs := Select(Jobs, λj j.address.city = "Boston")
    BosJobParts := Project(BosJobs, λj <(J,j),(Pt,j.PartsNeeded)>)

The select operation finds the jobs in Boston and the project operation gives information about which parts are needed for each job in Boston. The result of the projection is of type Set[Tuple].
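The behaviour of Select and Project in Example 2, and the remark in item (7) that identical operations do not give identical responses, can be sketched as follows. This is a rough model under stated assumptions rather than the algebra of [SHA90]: collections are Python lists, object identifiers are drawn from a counter, and the names `Obj`, `select`, and `project` are hypothetical.

```python
import itertools

_next_oid = itertools.count(1)


class Obj:
    """A database object with a system-assigned identifier (hypothetical model)."""

    def __init__(self, type_name, **props):
        self.oid = next(_next_oid)
        self.type_name = type_name
        self.__dict__.update(props)


def select(collection, pred):
    """Select(S, p): the existing objects of S that satisfy the predicate p."""
    return [s for s in collection if pred(s)]


def project(collection, **funcs):
    """Project: one newly created Tuple object for each member of S."""
    return [
        Obj("Tuple", **{name: f(s) for name, f in funcs.items()})
        for s in collection
    ]


boston = Obj("Addr", city="Boston")
miami = Obj("Addr", city="Miami")
bolt = Obj("Part", color="Red")
nut = Obj("Part", color="Blue")
jobs = [
    Obj("Job", address=boston, PartsNeeded=[bolt, nut]),
    Obj("Job", address=miami, PartsNeeded=[nut]),
]

# Example 2: select the Boston jobs, then pair each job with the parts it needs.
bos_jobs = select(jobs, lambda j: j.address.city == "Boston")
first = project(bos_jobs, J=lambda j: j, Pt=lambda j: j.PartsNeeded)
second = project(bos_jobs, J=lambda j: j, Pt=lambda j: j.PartsNeeded)
```

Here `select` returns the existing Job object, whereas the two identical `project` calls return tuple objects with different identifiers that refer to the same Job. This is the situation that DupEliminate and Coalesce are introduced to handle.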
Note that the operation NewPart (of type Job) cannot be applied to members of BosJobParts, since they have type Tuple. However, it is appropriate for the objects BosJobParts.J.

Example 3: Find all local suppliers for each job.
LocalS := Ojoin(Jobs, Suppliers, J, S, λj λs j.address.city = s.address.city)
This Ojoin operation produces a set of tuples of type <(J, Job),(S,Supplier)>, which is similar to a normalized relation. To get a set of suppliers for each job, a Nest operation needs to be applied: Nest(LocalS, S).

From the above description, we can see that the query algebra supports many features of O-O databases and has taken significant steps towards a powerful O-O query algebra that can serve as the mathematical foundation for O-O databases. However, it still has the following limitations.

(1) Although ENCORE models an application as networks of types, objects, and their associations, the domain of its underlying query algebra is defined as collections of objects having type Set[T], which is essentially a nested relation representation, since the member type T of the set type can be a parameterized Tuple type which may in turn contain attributes of Tuple types. Therefore, the query algebra cannot represent network-structured relationships among objects efficiently, and the mapping problem addressed before still remains.

(2) In this algebra, two identical expressions or two identical operations in a single expression do not give identical responses, since each result collection is a newly identified object in the database. To eliminate duplicated copies of the same newly created object, the algebra introduces the DupEliminate and Coalesce operations, which would not be necessary if it directly supported the network view of O-O databases.

(3) In this algebra, a collection may contain objects with heterogeneous structures.
For example, two objects may both be of Tuple type but with different arities, and the union of the two objects is also a collection of objects having Tuple type. However, other operators in this algebra are not defined to operate on such collection(s).

(4) Since the query algebra is developed for a specific model (i.e., ENCORE), it is difficult to apply to other O-O models.

Figure 2.1 A sample schema

Figure 2.2 Simple and complex query patterns

Figure 2.3 An example of a nested relation

Figure 2.4 Using a nested relation to represent a complex structure

Type Supplier
  properties:
    Ident: string
    Address: Addr
    Inventory: Set[Part]
  operations:
    RecvOrder: Supplier, Set[Part] --> Supplier

Type Job
  properties:
    Num: string
    Address: Addr
    PartsNeeded: Set[Part]
    Preferred_Suppliers: Ordered_list[Supplier]
  operations:
    NewPart: Job, Part --> Job

Type Part
  properties:
    Num: string
    Address: Addr
    Color: string
    Components: Set[Tuple[<(P,Part),(Qty,Int)>]]
    Plan: drawing
    BillofMaterial: list[Part]
  operations:
    Order: Part --> Part
    Same_Part: Part, Part --> Boolean

Type Addr
  properties:
    Street: string
    City: string
    State: string

Figure 2.5 A Supplier-Parts-Job database

CHAPTER 3
OVERVIEW OF O-O DATABASES AND ASSOCIATION-BASED QUERY FORMULATION

This chapter informally introduces the graphical view of O-O databases and illustrates the association-based query formulation mechanism.
The graphical view captures the most important characteristics of O-O databases, in which object classes and their objects are associated with each other. Based on this view, query formulation and processing can be carried out by specifying and manipulating association patterns in which objects are inter-related with each other, unlike the traditional attribute-based query formulation and processing, which match values in different relations. Since the graphical view is suitable for many O-O data models, the association algebra developed based on this view can be used as a general algebra for supporting these O-O databases. The graphical view of O-O databases is formalized in the next chapter.

3.1 Overview of O-O Databases

O-O semantic data models provide a conceptual basis for defining O-O databases. Although each model has some unique constructs that distinguish it from the others, there are several common structural and behavioral properties based on which an algebra can be developed and used to support these models.

First, objects are physical entities, abstract concepts, events, processes, functions, or anything that an application cares to capture and represent.

Second, objects having the same structural and behavioral properties are grouped together to form an object class. Object classes can be categorized into two general categories: (1) the nonprimitive-class, which represents a set of objects of interest in an application world, each of which is assigned a system-wide unique object identifier (OID) and whose data are explicitly entered in a database by the user; and (2) the primitive-class, which represents a class of self-named objects serving as a domain for defining other object classes, such as a class of symbols or numerical values.
The behavioral properties of an object class are defined in terms of system-defined or user-defined operations (e.g., retrieve, display, delete, insert, rotate a design object, hire an employee, etc.), which can meaningfully operate on its objects using their corresponding programs (or methods). The structural properties of an object class and, thus, of its objects consist of two types of data: (1) descriptive data (or instance variables), which define the states of the objects; and (2) association data, which specify the relationships between its objects and the objects of some related classes.

Third, different O-O models recognize different types of associations. Two of the most commonly recognized associations are aggregation and generalization. Aggregation models the a-part-of, a-function-of, or a-composition-of relationship. For instance, a complex object can be modeled by an aggregation hierarchy (abstract data type) in which a complex object is defined in terms of its associations with objects in other defined classes. Generalization models the is-a or the superclass-subclass relationship, in which an object in a subclass inherits both the structural and the behavioral properties of its superclass(es).

Thus, from the algebra point of view, an O-O database can be viewed as a collection of objects, grouped together in classes and interrelated through associations. It can be represented by graphs at both the intensional and the extensional levels.

At the intensional (schema) level, a database is defined by a collection of inter-related object classes and is represented by a Schema Graph (SG). For example, the SG for a university database is illustrated in Figure 3.1, in which each rectangle denotes a nonprimitive-class, such as a class of person objects or a class of department objects, and each circle denotes a primitive-class, such as a class of names or ages. The associations among classes are represented by the edges in SG.
For example, there is an association between the class Course and the class Department (an Aggregation association), and an association between the class Person and the class Student (a Generalization association). Since the semantic distinctions of these and other association types recognized by different semantic models can be either hard-coded in a DBMS or declaratively specified by some rules and used by a rule processor to govern the manipulation of the associated classes, the underlying algebra does not have to incorporate the semantics of these association types. All it has to be concerned with is whether or not an object class and its objects are associated with some other classes and their objects, i.e., the edges (or associations) are type-less in SG. For example, the semantics of inheritance can be incorporated in a query language translator which translates a high-level language statement into its underlying algebraic representation. The algebra does not have to deal directly with the semantics of inheritance. This is particularly important if the algebra is to be used as a general algebra for supporting various O-O data models in which the semantics of an association type may have slightly different meanings.

At the extensional (instance) level, a database can be viewed as a collection of objects, grouped together in classes and inter-related through some type-less associations; as such, it can be represented by an Object Graph (OG). For example, the OG corresponding to a portion of the university schema graph is shown in Figure 3.2. In this example, the Teacher object t4 is associated with two Section objects, thereby representing the fact that he/she is teaching two sections, sc3 and sc4. The Student object s1 is associated with the Undergrad object u1 which, in turn, is associated with the Department object d1, thereby representing that s1 is an undergraduate student who minors in the department d1.
Finally, the Section object sc2 is not associated with any object of the Student class, which represents the fact that it is not taken by any student. Object associations expressed by different graph patterns represent the semantic relationships among these objects in an application world.

3.2 Pattern-based Query Formulation

Based on this view of an O-O database, users can query the database by specifying patterns of object associations as search conditions. Once these objects are selected, they can be further processed by either system-defined operations (Retrieval, Display, Update, Insert, Delete, etc.) or user-defined operations (RotatePart, PurchasePart, HireFaculty, etc.). For example, the following queries can be issued against the university database as illustrated in Figures 3.1 and 3.2 (the algebraic expressions for these queries will be given in Section 4.4).

Query 1: For all sections, get the majors of students who are taking these sections.

To satisfy this query, we can specify a linear pattern containing the classes Section, Student, and Department, as shown in Figure 3.3a. In this pattern, a circle represents a class and an edge represents that the objects of the two adjacent circles (classes) must be associated with each other. This pattern is called an intensional pattern; it represents that sections taken by students who major in some departments are to be identified. The answer to this query can be found in Figure 3.2 by checking whether the objects of these three classes satisfy such a pattern. There are five object patterns (called extensional patterns) which satisfy the intensional pattern, as shown in Figure 3.3b. The Section object sc2 and the Student object s3 do not appear in these extensional patterns, since sc2 is not taken by any student and s3 does not have a major yet. These patterns can also be identified in two sequential steps. First, get all the patterns in which the Section objects are associated with the Student objects.
Then, if a pattern generated in the first step (i.e., a Section-Student pair) is further associated with an object of Department, a new pattern consisting of three objects is constructed and retained in the result; otherwise, the pair is dropped.

Once these objects (as well as their associations) have been identified, different system-defined or user-defined operations defined on their corresponding classes can be applied to these selected objects. For example, Inform(Department) can be an operation defined on the class Department. It sends each of the selected departments a letter concerning the majors of the students.

Suppose there is a rule in the university that a student cannot major and minor in the same department. To check whether there is such a case in the database, the following query can be issued.

Query 2: List students who major and minor in the same department.

The intensional pattern for this query is shown in Figure 3.3c. It can be formed by starting from the class Student and navigating the schema along two traversal paths (refer to Figure 3.1). One path is from Student to Department, which means that a student majors in a certain department; the other path is from Student to Department through Undergrad, which means that a student is an undergraduate and minors in a certain department (we can see from the SG that only undergraduates may have minors). According to the query, a single student should be associated with objects in both Undergrad and Department, and these two paths should merge at Department, thereby forming a loop. This implies two logical AND conditions, one at the Student class and the other at the Department class. We use double arcs to denote such conditions, as shown in Figure 3.3c. From Figure 3.2, we can see that the student s1 has his major and minor in the department d1. This extensional pattern is depicted in Figure 3.3d.
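The two-step evaluation of Query 1 and the loop condition of Query 2 can be sketched in Python as follows. This is only an illustration under stated assumptions: Figure 3.2 is not reproduced here, so the object identifiers and edges below are hypothetical, chosen so that five extensional patterns result for Query 1 and s1 satisfies Query 2. The names takes, major, is_undergrad, minor, query1, and query2 are invented for the sketch and are not part of the algebra.

```python
# Hypothetical fragment of an object graph. Each association is stored as
# a set of (object, object) pairs; all identifiers are assumptions.
takes = {("sc1", "s1"), ("sc3", "s2"), ("sc3", "s4"),
         ("sc3", "s5"), ("sc4", "s7"),
         ("sc4", "s3")}            # assumed: s3 takes a section but has no major
major = {("s1", "d1"), ("s2", "d3"), ("s4", "d3"),
         ("s5", "d4"), ("s7", "d6")}
is_undergrad = {("s1", "u1")}      # Student -- Undergrad
minor = {("u1", "d1")}             # Undergrad -- Department


def query1(takes, major):
    """Linear pattern Section -- Student -- Department, built in two steps."""
    pairs = set(takes)                         # step 1: Section-Student pairs
    return {(sc, s, d)                         # step 2: extend a pair or drop it
            for (sc, s) in pairs
            for (s2, d) in major if s2 == s}


def query2(major, is_undergrad, minor):
    """Loop pattern: both paths from a student must reach the same department."""
    return {s
            for (s, d) in major
            for (s2, u) in is_undergrad if s2 == s
            for (u2, d2) in minor if u2 == u and d2 == d}


q1 = query1(takes, major)
q2 = query2(major, is_undergrad, minor)
print(sorted(q1))
print(sorted(q2))
```

In this sketch the pair (sc4, s3) is dropped in the second step because s3 has no major, which mirrors the treatment of s3 described for Query 1.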
Query 3: For those students taking section 300 and having majors and/or minors, get their majors and/or minors.

There are several ways to form an intensional pattern for the query. We may start from Section# and traverse to Student through Section and then navigate the schema along two paths, as we did for Query 2. According to the query, a student who has either a major or a minor should be included in the result (in this database, it is assumed that graduate students do not have minors). This means that either path of the navigation will construct a pattern that satisfies the query. Thus, a logical OR condition exists at Student. We use a single arc to indicate the OR condition, as shown in Figure 3.4a. As in Query 2, these two branches merge at Department. However, this query does not require that they merge at the same Department object. This is specified by the second OR condition at Department in Figure 3.4a.

The extensional patterns that satisfy this query have heterogeneous structures: two types of linear patterns, as shown in Figure 3.4b. The first type includes patterns that represent the minors of the undergraduates, and the second type includes patterns that represent the majors of the students who are either undergraduates or graduates. In both types of patterns, a student is associated with section 300, which is assumed to be the Section# for sc3. Figure 3.4c will be described later in Section 4.4.

We have given some example queries which specify how objects are associated with one another. In the graphical representation of an O-O database, when there is no edge between two objects even though there is one between their classes, it implies that the two objects are not associated with each other. This represents the complement aspect of the semantics between two associated classes. It is necessary to allow a user to retrieve this type of object non-association from a database. The following query is such an example.
It can also be specified by a pattern.

Query 4: For each teacher, list the sections which he/she does not teach.

We use a dashed line to represent the fact that two objects are not associated with each other. Therefore, the intensional pattern for this query can be drawn as in Figure 3.4d. There are twelve extensional patterns that match the intensional pattern. Figure 3.4e shows a portion of them. Non-association relationships among objects are not explicitly stored in a database. However, they can be derived during the processing of queries of this type.

Using the above examples, we hope that we have convinced the reader that the pattern-based query formulation is suitable for query specification based on a graphical view of an O-O database.

3.3 Conclusion

The (type-less) graphical representation of O-O databases is applicable to most O-O data models, since it captures the essential characteristics of O-O data models, in which object classes as well as their objects are inter-related with each other in different association patterns. Such databases can be queried by specifying patterns in which objects of interest are associated with each other. It should be clear that this formulation is quite different from the attribute-based query formulation in the existing relational query languages, which is based on matching the attributes (or the key or composite key) of one relation with the attributes (foreign keys) in other relations. A query that requires the specification of a complex pattern of object associations can be specified in a rather straightforward manner in an association-based language, whereas in an attribute-based language, complex nestings of query blocks or multiple queries would be required [ALA89a].

It is our view that an algebra developed for processing data based on the graphical view of O-O databases and the pattern-based query formulation should satisfy the following requirements.
First, it should allow direct manipulation of complex patterns of object associations. Second, the closure property should be maintained. Third, both association and non-association relationships among objects should be expressible as search conditions. Fourth, it should be complete in the sense that it can be used to describe all possible patterns in a database. Lastly, it must be able to represent and process patterns with both homogeneous and heterogeneous structures.

Figure 3.1 Schema graph of a university database

Figure 3.2 Object graph

Figure 3.3 Pattern specifications for Query 1 and Query 2. (a) Intensional pattern for Query 1 over Section, Student, and Dept. (b) Extensional patterns for Query 1: (sc1, s1, d1), (sc3, s2, d3), (sc3, s4, d3), (sc3, s5, d4), (sc4, s7, d6).

Figure 3.4 Pattern specifications for Query 3 and Query 4

CHAPTER 4
ASSOCIATION ALGEBRA

The association algebra (A-algebra) is defined based on a uniform representation of an O-O database in terms of objects, object classes, and type-less associations, as described in Chapter 3. The algebra contains a number of operators which operate on graph structures of object associations to produce graph structures. The closure property of the algebra ensures that the result of a query can be further manipulated by other queries.

4.1 Definitions

First, we formally define an O-O database at both the schema and the object levels.

Schema Graph (the intensional database): The schema graph of an O-O database is defined as $SG(C, A)$, where $C = \{C_i\}$ is a set of vertices representing object classes; $A$ is a set of edges, each of which, $A_{ij}(k)$, represents an association between classes $C_i$ and $C_j$, where $k$ is a number for distinguishing the edges from one another when there is more than one edge between two vertices.
Object Graph (the extensional database): The object graph of an 0-0 database is defined as OG(OtE), where 0={O^} is a set of vertices representing object instances (j'th object in class C{); and E={0iX=OmJ is a set of edges representing the associations among object instances. When one object instance is connected with another in the object graph, a regular-edge (solid line) is drawn between the corresponding verÂ¬ tices as Oi^â€”Omn which specifies that j'th object instance in class <7,. is related to nth object instance in class Cm through the fcth association of classes CÂ¡ and Cm. If two object instances Ot j and Om n are not connected in the object graph but their classes <7,- and Cm in the corresponding SG are 51 52 directly connected, a complement-edge (dotted line) is drawn between them and is denoted by ^ J ij m, n In this 0-0 models, an object may participate in several classes (e.g., in a generalization hierarchy). Its representation in a class is called an object instance. Since in most cases in this dissertation, "object" and "object instance" can be used interchangeably without any ambiguity, we shall use "object" unless a distinction is required between the two. The reason for explicitly introducing complement-edges into the OG is to allow the A-algebra to manipulate both association and non-association between objects of two adjacent classes. In an actual 0-0 database, it is not necessary to explicitly store the complement-edges. Figure 4.1 illustrates the regular-edges and complement-edges among the objects of three object classes. For example, we see that section scl is taken by students s2 and s3 (regular-edges) and not taken by students si and s4 (complement-edges). The relationship between an OG and its corresponding SG is formally described by the following proposition. Proposition 1: An 0G(0,E) is a morphism of its corresponding SG(C,A). The mapping function Fm is defined as FmV Ci => and Fm2' => {OiJ===Omn}. 
The mapping between SG and OG is one-to-many, since a database is dynamically changing and may have different instantiations at different times for the same schema graph. 53 To define "association pattern", we first extend the concept of connected graph in graph theory by treating complement-edges as edges, i.e., a connected graph is a graph in which there exists at least one path between any two vertices and each path may contain regular-edges, complement-edges, or a combination of the two. We shall from now on use an upper-case letter to denote a class and the corresponding lower-case letter with a subscript to denote an object instance in that class. We shall assume that there is only one edge between any two vertices in SG unless otherwise specified so as not to complicate the notation. Association Pattern: A connected subgraph of an OG is an association pattern (or pattern for short). By this definition, a single vertex (or object instance) in OG, which is a conÂ¬ nected subgraph, is also a pattern. We call it an Inner-association-pattern (or Inner-pattern for short). It is algebraically represented by (a,.) for a vertex of class A in SG. Thus, object instances are treated as Inner-patterns in the A-algebra. A regular-edge together with two vertices (i.e., two Inner-patterns) it connects is called an Inter-association-pattern (or Inter-pattern) which is represented by (a{bj). A complement-edge together with the two Inner-patterns it connects is called a Complement-association-pattern (or Complement-pattern) and is represented by This pattern states that of and bj are not associated with each other in OG. If a path consisting of only regular-edges between vertices at and bj. it can be represented by a Derived-inter-association-pattern (D-inter-pattern), denoted by (a.-bj); otherwise, it can be represented by a Derived-complement-association- 54 pattern (D-complement-pattern), denoted by (a{bj). 
When a path is represented by a derived pattern, it simply means that two vertices are indirectly associated or non-associated, but how they are interrelated (the actual path) is of no importance. A D-inter-pattern is treated as an Inter-pattern and a D-complement-pattern is treated as a Complement-pattern in the algebraic operations.

The above five types of patterns are the primitive patterns, the latter four being binary patterns. Their graphical and algebraic representations are summarized in Figure 4.2a. All other connected subgraphs are called complex patterns. For example, the complex pattern shown in Figure 4.2b1 contains three primitive patterns: two Inter-patterns $(a_1b_1)$ and $(b_1d_1)$, and a Complement-pattern $(\overline{b_1c_1})$. It can be uniquely defined by its algebraic representation as a set of primitive patterns, i.e., $(a_1b_1, \overline{b_1c_1}, b_1d_1)$. More examples of complex patterns are shown in Figure 4.2b. From these examples, one can observe that a complex pattern can be decomposed into a set of binary patterns which cannot be further decomposed. This implies that, in the algebraic representation of a complex pattern, an Inner-pattern may not occur as an element and a binary pattern may appear only once. A pattern in this algebraic format is called a normalized pattern; otherwise it is called an unnormalized pattern. For example, $(b_2, b_2c_2)$ is an unnormalized pattern, since the Inner-pattern $(b_2)$ is already part of the binary pattern $(b_2c_2)$. During the process of constructing an association pattern, we always normalize it by eliminating the duplicates. The normalized form of the above pattern is $(b_2c_2)$.
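The normalization rule can be sketched in Python as follows. This is a minimal illustration, not the dissertation's implementation: the tagged-tuple encoding of primitive patterns and the helper names inner, inter, complement, and normalize are assumptions made for the sketch. A binary pattern stores its two objects in a frozenset, so the order of the objects does not matter, and a pattern is a frozenset of primitive patterns, so the order of the primitive patterns does not matter either. The sketch assumes its input is a connected pattern and does not check connectivity.

```python
def inner(v):
    """Inner-pattern: a single object instance."""
    return ("inner", frozenset([v]))


def inter(u, v):
    """Inter-pattern: a regular-edge between two object instances."""
    return ("inter", frozenset([u, v]))


def complement(u, v):
    """Complement-pattern: a complement-edge between two object instances."""
    return ("complement", frozenset([u, v]))


def normalize(primitives):
    """Return the normalized pattern as a frozenset of primitive patterns.

    Duplicates disappear because a set is built, and an Inner-pattern is
    dropped when its object already occurs in some binary pattern.
    """
    prims = set(primitives)
    binary = {p for p in prims if p[0] != "inner"}
    covered = set().union(*(objs for _, objs in binary))
    lone = {p for p in prims if p[0] == "inner" and not p[1] <= covered}
    return frozenset(binary | lone)


unnormalized = [inner("b2"), inter("b2", "c2")]
print(normalize(unnormalized) == frozenset([inter("b2", "c2")]))
```

A single vertex such as [inner("a1")] is left unchanged by normalize, which is consistent with treating an object instance as a pattern in its own right.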
The definitions of OG and association pattern imply that a pattern is a non- directional graph, i.e., (a{bj) = (6,-a,.), and that the sequence of primitive patterns in 55 the algebraic representation of a complex pattern is not important, hence (aibr bjck) = (ckbj, aibj)- Based on the above definition and notion of association pattern, we view an OG as an Association Graph (AG) and all the association patterns in AG form the domain of the A-algebra, denoted by A. 4*2 Relationship Between Two Association Patterns The operators of the A-algebra are defined based on the possible relationships between two patterns in A, so that they can be used either to construct complex patterns using simpler patterns or to decompose a complex pattern into several patterns of simpler structures. There are four possible relationships between two patterns p1 and p2: non-overlap, overlap, contain, and equal. (1) Non-overlap: Two patterns are said to be non-overlap, denoted by p'zxip2, if they have no common Inner-pattern. (2) Overlap: Two patterns are said to be overlapped, denoted by p'np2, if they have at least one common Inner-pattern. (3) Contain: Contain is a special case of (2) when all the primitive patterns of p1 are contained in p . We say that p is a subpattern of p and denote this relationship by p'Cp2. (4) Equal: This is a special case of (3) when p1 contains all the primitive patÂ¬ terns of p2, and vice versa. It is denoted by p=p. Before defining the association operators, we give the definition of "Association-set" â€” the operand of the association operators. Association-set: An association-set, denoted by a Greek letter a (or #7,...), is a set of associaÂ¬ tion patterns without duplicates, a designates the *th pattern in a, where 56 a'^a3 (ViVj). 
An empty set is also an association-set, denoted by A special type of association-set is called homogeneous association-set, which is important to the A-algebra, since some of the mathematical properties hold only when operands are homogeneous association-sets. Homogeneous Association-set: An association-set is homogeneous, if (1) all patterns are formed by the Inner-patterns (or object instances) of the same set of object classes; and (2) all patterns have the same number of Inner-patterns from each class in the set; and (3) corresponding primitive patterns belong to the same association and are of the same type; and (4) all patterns have the same topology. Otherwise, it is a heterogeneous association-set. Figure 4.3 depicts three example association-sets: a is homogeneous, whereas P is not since pattern f? has only one Inner-pattern of class C instead of two like $ and ft. 7 is not homogeneous because 7s contains a Complement-pattern which is different from 71 and 7s (i.e., different topologies). 4*3 Association Operators Ten association operators are formally defined in this section: three unary operators [A-Project (77), A-Select ( Divide (-f), NonAssociate (l), and A-Intersect (â€¢)]. The examples used to explain 57 these operators will make use of the domain A shown in Figure 4.4. To keep the graph simple, the Complement-patterns are not shown in the figure. The simple mathematical properties such as commutativity, associativity, idempotency, and nilpotency satisfied by the operators are given after each definition. 4.3.1 Notations Notations that will be used in the subsequent sections are fisted below. A, CL,â– \R(CLvCL2)\ (a,bj) (aibj) (aick) or, P, a Denote classes. Denotes a variable for a class. Denotes the association between classes CLl and CL2. Denotes the *th Inner-pattern of class A. Denotes an Inner-pattern variable. Denotes an Inter-pattern between two classes A and B. Denotes a Complement-pattern between two classes A and B. 
Denotes a Derived-pattern from class A to class C. Denote association-sets. Denotes the ith pattern of association-set $\alpha$. Denote sets of classes. Hence, represents association-set $\alpha$ which has Inner-pattern(s) from the classes in {A}.

It should be noted that an Inner-pattern is represented by an object instance identifier (IID), which is a system-assigned object identifier (OID) prefixed by a class identification, so that the object instances of an object in multiple classes can be unambiguously distinguished and the fact that these object instances are instances of the same object can easily be recognized.

4.3.2 Operators

All relational algebraic operators operate on relations of homogeneous (or union-compatible) structures, with the exception of Cartesian-product and Join. The Cartesian-product and Join provide the mechanism to concatenate two relations of different structures into a single relation, so that it can be further manipulated by other operators. In the A-algebra, all the operators are defined to operate on association patterns of homogeneous as well as heterogeneous structures. Therefore, the relational algebra is a special case of the A-algebra in this respect.

(1) Associate (*):

The Associate operator is a binary operator which constructs an association-set of complex patterns by concatenating the patterns represented by two operand association-sets. Since a pattern may involve many classes and an object class may have more than one association with another class, it is necessary to specify through which association the concatenation of two patterns is intended. The Associate operation on association-sets $\alpha$ and $\beta$ over the association R between classes A and B is defined as follows:

$$\alpha *[R(A,B)]\, \beta = \{\, \gamma \mid \gamma = (\alpha^i, \beta^j, a_m b_n) : a_m b_n \in [R(A,B)] \wedge a_m \in \alpha^i \wedge b_n \in \beta^j \,\}$$

The result of an Associate operation is an association-set containing no duplicates. Each of its patterns is the concatenation of two patterns (one from each operand association-set).
More specifically, if the Inner-pattern (or object $a_m$) of A in $\alpha^i$ is associated with the Inner-pattern (or object $b_n$) of B in $\beta^j$ in the domain of the algebra $\mathcal{A}$ shown in Figure 4.4, then $\alpha^i$ and $\beta^j$ are concatenated via the primitive pattern $(a_m b_n)$. We do not restrict A and B to be different classes in $*[R(A,B)]$, i.e., $\alpha *[R(A,A)]\, \beta$ is a legitimate operation, which concatenates two patterns (one from each operand association-set) if they have a common Inner-pattern of class A.

An example of the Associate operation is shown in Figure 4.5a (for convenience, a copy of the sample database is shown in each figure for illustrating an operation). For clarity, we use graphical notation in the figures. In the example, $\alpha^1$ is concatenated with two patterns of $\beta$ due to the existence of $(b_1c_1)$ and $(b_1c_2)$, respectively, in $\mathcal{A}$, as shown in Figure 4.4. One pattern of $\alpha$ is dropped simply because it does not have an Inner-pattern of class B. $\alpha^3$ is dropped because $(b_2)$ is not associated with any Inner-pattern of class C in $\mathcal{A}$. One pattern of $\beta$ cannot be concatenated through $(c_4)$ with any pattern in $\alpha$, because no pattern in $\alpha$ has an Inner-pattern of B that is associated with $(c_4)$ in $\mathcal{A}$. Another pattern of $\beta$ is dropped for the same reason.

For the Associate operator, $[R(A,B)]$ can be omitted if the following conditions hold: (1) both $\alpha$ and $\beta$ are A-algebra expressions, (2) the Associate operator operates on the last class in a linear expression $\alpha$ and the first class in a linear expression $\beta$, and (3) there is a unique association between these two classes. For example, $A *[R(A,B)]\, B$ can be written as $A * B$, if class A is associated with class B through the attribute $[R(A,B)]$ of A. It should be pointed out that the A-algebra allows an attribute to be defined by a computed value (or object), for instance, $B = f(A)$. The implementations of the function and the procedure are invisible to the algebra. However, they should not have side effects, i.e., the computed result must be of the same type as B.
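The concatenation performed by Associate can be sketched in Python as follows. This is a hedged illustration rather than the dissertation's implementation. Several things are assumptions made only for the sketch: the tagged-tuple encoding of primitive patterns, the helper names (inner, inter, vertices, normalize, associate), the convention that the first letter of an object name identifies its class, and the sample data, which is hypothetical and only loosely follows the example above because Figure 4.5a is not reproduced. The case in which both classes are the same, i.e., [R(A,A)], is not covered.

```python
def inner(v):
    return ("inner", frozenset([v]))


def inter(u, v):
    return ("inter", frozenset([u, v]))


def vertices(pattern):
    """All object instances that occur in a pattern."""
    return set().union(*(objs for _, objs in pattern))


def normalize(primitives):
    """Drop Inner-patterns whose object already occurs in a binary pattern."""
    prims = set(primitives)
    binary = {p for p in prims if p[0] != "inner"}
    covered = set().union(*(objs for _, objs in binary))
    lone = {p for p in prims if p[0] == "inner" and not p[1] <= covered}
    return frozenset(binary | lone)


def associate(alpha, beta, cls_a, cls_b, regular_edges):
    """alpha *[R(A,B)] beta over the regular-edges of one association.

    A pattern of alpha and a pattern of beta are concatenated whenever an
    object of class cls_a in the first is connected by a regular-edge to
    an object of class cls_b in the second.
    """
    result = set()
    for pa in alpha:
        for pb in beta:
            for am in (v for v in vertices(pa) if v[0] == cls_a):
                for bn in (v for v in vertices(pb) if v[0] == cls_b):
                    if frozenset([am, bn]) in regular_edges:
                        result.add(normalize(pa | pb | {inter(am, bn)}))
    return result


# Hypothetical regular-edges of the association between classes B and C.
bc_edges = {frozenset(["b1", "c1"]), frozenset(["b1", "c2"])}
alpha = {frozenset([inter("a1", "b1")]),
         frozenset([inner("a2")]),      # no object of class B: dropped
         frozenset([inner("b2")])}      # b2 has no regular-edge to C: dropped
beta = {frozenset([inner("c1")]),
        frozenset([inner("c2")]),
        frozenset([inner("c4")])}       # c4 is not reachable from alpha: dropped
result = associate(alpha, beta, "b", "c", bc_edges)
for pattern in result:
    print(sorted((kind, sorted(objs)) for kind, objs in pattern))
```

Because the result is a Python set of frozensets, duplicate patterns are eliminated automatically, which corresponds to the requirement that an association-set contains no duplicates.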
The Associate operator is commutative and conditionally associative as defined below:

$$\alpha *[R(A,B)]\ \beta = \beta *[R(B,A)]\ \alpha \quad \text{(commutativity)}$$
$$(\alpha_{\{X\}} *[R(A,B)]\ \beta_{\{Y\}}) *[R(C,D)]\ \gamma_{\{Z\}} = \alpha_{\{X\}} *[R(A,B)]\ (\beta_{\{Y\}} *[R(C,D)]\ \gamma_{\{Z\}}) \quad (\text{if } C \notin \{X\} \wedge B \notin \{Z\}) \quad \text{(associativity)}$$
$$A *[R(A,A)]\ A = A \quad \text{(idempotency)}$$

The associativity holds true if $\alpha$ and $\gamma$ do not have Inner-patterns of classes C and B, respectively. Otherwise, the associativity does not hold. For example, if $\alpha = (a_1 b_1, b_1 c_2)$, $\beta = (b_1 c_1)$, $\gamma = (d_1)$, and $\mathcal{A}$ is as shown in Figure 4.4 (the domain of the algebra), then

$$(\alpha *[R(A,B)]\ \beta) *[R(C,D)]\ \gamma = (a_1 b_1, b_1 c_1, b_1 c_2, c_2 d_1)$$

and

$$\alpha *[R(A,B)]\ (\beta *[R(C,D)]\ \gamma) = \emptyset$$

(2) A-Complement (|):

The A-Complement operator is a binary operator which concatenates the patterns of two operand association-sets over Complement-patterns. It is used to identify the objects in two classes which are not associated with each other in $\mathcal{A}$. The A-Complement operator is defined as follows, where $(\overline{a_m b_n})$ denotes a Complement-pattern:

$$\alpha \mid [R(A,B)]\ \beta = \{\, \gamma \mid \gamma = (\alpha^i, \beta^j, \overline{a_m b_n}) : (\overline{a_m b_n}) \in [R(A,B)] \wedge a_m \in \alpha^i \wedge b_n \in \beta^j$$
$$\text{or } \gamma = \alpha^i : \exists (m)(a_m \in \alpha^i) \wedge \nexists (n)(b_n \in \beta)$$
$$\text{or } \gamma = \beta^j : \exists (n)(b_n \in \beta^j) \wedge \nexists (m)(a_m \in \alpha) \,\}$$

The result of an A-Complement operation is an association-set. Each of its patterns is formed by concatenating two patterns (one from each operand association-set) via a Complement-pattern $(\overline{a_m b_n})$, where $a_m$ and $b_n$ belong to $\alpha^i$ and $\beta^j$, respectively, and the Complement-pattern $(\overline{a_m b_n})$ is in $\mathcal{A}$. In the special case when $\alpha$ (or $\beta$) is an empty association-set or does not have Inner-patterns of class A (or B), all patterns of $\beta$ (or $\alpha$) that have Inner-patterns of B (or A) are retained in the resulting association-set.

An example of the A-Complement operation is shown in Figure 4.5b. It operates over the association between classes B and C. $\alpha^2$ does not appear in the resultant association-set because it contains no Inner-patterns of B. $\alpha^1$ cannot be A-Complemented with $\beta^1$ and $\beta^2$ because it is connected with $\beta^1$ and $\beta^2$ by Inter-patterns $(b_1 c_1)$ and $(b_1 c_2)$ in $\mathcal{A}$, respectively.
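The A-Complement definition, including its two special cases, can be sketched in the same spirit. As before, the encoding is an assumption for illustration only: object instances are strings whose first letter gives the class, primitive patterns are tagged tuples (`"I"` Inner-pattern, `"R"` Inter-pattern, `"C"` Complement-pattern), and the sample data are hypothetical. A pair of objects is treated as a Complement-pattern whenever it is absent from the relation `rel`.

```python
from itertools import product

def cls(obj):
    """Class of an object instance, e.g. 'c3' -> 'C' (naming convention assumed here)."""
    return obj[0].upper()

def inner(pattern, klass):
    """Inner-patterns (object instances) of class `klass` that occur in a pattern."""
    return {o for prim in pattern for o in prim[1:] if cls(o) == klass}

def pat(*prims):
    """Normalized pattern: drop a bare Inner-pattern whose object is already linked."""
    linked = {o for p in prims if p[0] != "I" for o in p[1:]}
    return frozenset(p for p in prims if p[0] != "I" or p[1] not in linked)

def a_complement(alpha, beta, rel, a_cls, b_cls):
    """alpha |[R(A,B)] beta: concatenate via Complement-patterns (pairs not in `rel`)."""
    result = set()
    for pa, pb in product(alpha, beta):
        for am, bn in product(inner(pa, a_cls), inner(pb, b_cls)):
            if (am, bn) not in rel:
                result.add(pat(*pa, *pb, ("C", am, bn)))
    # Special cases: one operand is empty or has no Inner-pattern of its class.
    if not any(inner(pb, b_cls) for pb in beta):
        result |= {pa for pa in alpha if inner(pa, a_cls)}
    if not any(inner(pa, a_cls) for pa in alpha):
        result |= {pb for pb in beta if inner(pb, b_cls)}
    return result

# Hypothetical sample data (not taken from Figure 4.4).
R_BC = {("b1", "c1"), ("b1", "c2")}
alpha = {pat(("R", "a1", "b1")), pat(("I", "a2")), pat(("R", "a4", "b3"))}
beta = {pat(("R", "c1", "d1")), pat(("R", "c2", "d2")), pat(("I", "c3"))}

gamma = a_complement(alpha, beta, R_BC, "B", "C")
retained = a_complement(alpha, set(), R_BC, "B", "C")  # empty second operand
print(len(gamma), len(retained))
```

With an empty second operand, the sketch keeps exactly those patterns of the first operand that have an Inner-pattern of class B, which corresponds to the special case stated above.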
Under the same conditions as given for the Associate operator, $[R(A,B)]$ need not be specified with the A-Complement operator unless there is an ambiguity. The A-Complement operator is commutative and associative. For a reason similar to that described for the Associate operator, the associativity holds true conditionally.

$$\alpha \mid [R(A,B)]\ \beta = \beta \mid [R(B,A)]\ \alpha \quad \text{(commutativity)}$$
$$(\alpha_{\{X\}} \mid [R(A,B)]\ \beta_{\{Y\}}) \mid [R(C,D)]\ \gamma_{\{Z\}} = \alpha_{\{X\}} \mid [R(A,B)]\ (\beta_{\{Y\}} \mid [R(C,D)]\ \gamma_{\{Z\}}) \quad (\text{if } C \notin \{X\} \wedge B \notin \{Z\}) \quad \text{(associativity)}$$
$$A \mid [R(A,A)]\ A = \emptyset$$

(3) A-Select ($\sigma$):

The A-Select is a unary operator, which operates on an association-set $\alpha$ to produce a subset of patterns that satisfy a specified predicate P. A pattern in the operand association-set is retained iff the predicate evaluates to true for that pattern.

$$\sigma(\alpha)[P] = \{\, \gamma \mid \gamma = \alpha^i : P(\alpha^i) = \text{true} \,\}$$

where $\alpha$ is defined by an algebraic expression, and $P = T_1\, \theta_1\, T_2\, \theta_2 \cdots \theta_{n-1}\, T_n$. Each term $T_i$ $(i = 1, 2, \ldots, n)$ is a comparison between two expressions and $\theta_i$ $(i = 1, 2, \ldots, n-1)$ is a Boolean operator ($\wedge$ or $\vee$). $P(\alpha^i) = \text{true}$ represents that pattern $\alpha^i$ is evaluated true for that predicate. The expressions on the left- and right-hand sides of a comparison operation may contain constants, functions, and/or operations on objects, but cannot both be constants. The comparison terms are type sensitive, i.e., the results of the two expressions in a term should be data of the same type for primitive-classes or both IIDs for nonprimitive-classes. $=, >, <, \geq, \leq,$ and $\neq$ are the legitimate comparisons for numerical types; $=$ and $\neq$ for character, string, and IID types; and $=, \subset, \supset, \subseteq, \supseteq,$ and $\neq$ for set types. The comparison of two IIDs is performed by comparing their OID portions, since IIDs are the concatenations of the class identifiers and OIDs.

A single valued object or a single IID can be treated either as its own data type in a numerical, string, or IID comparison, or as a set type containing one element in a set comparison.
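The A-Select definition and the rule that IIDs are compared by their OID portions can be sketched as follows. The encoding is an assumption for illustration only: an IID is a `(class_id, oid)` tuple, a pattern is a `frozenset` of Inter-patterns tagged `"R"`, and the class names and OID values in the sample data are hypothetical.

```python
def oid(iid):
    """OID portion of an IID; an IID is modelled here as a (class_id, oid) tuple."""
    return iid[1]

def inner(pattern, klass):
    """IIDs of class `klass` that occur in a pattern."""
    return {o for prim in pattern for o in prim[1:] if o[0] == klass}

def a_select(alpha, predicate):
    """sigma(alpha)[P]: keep exactly the patterns for which P evaluates to true."""
    return {p for p in alpha if predicate(p)}

def same_object(p):
    """One comparison term: a Student IID and a Teacher IID in the pattern
    are equal, compared by their OID portions only."""
    return any(oid(s) == oid(t)
               for s in inner(p, "Student") for t in inner(p, "Teacher"))

def oid_at_least_9(p):
    """A second, numerical comparison term on the Teacher OID (purely illustrative)."""
    return any(oid(t) >= 9 for t in inner(p, "Teacher"))

# Hypothetical association-set of three Student-Teacher patterns.
p1 = frozenset({("R", ("Student", 7), ("Teacher", 7))})
p2 = frozenset({("R", ("Student", 8), ("Teacher", 9))})
p3 = frozenset({("R", ("Student", 5), ("Teacher", 6))})
alpha = {p1, p2, p3}

both = a_select(alpha, lambda p: same_object(p) and oid_at_least_9(p))   # T1 AND T2
either = a_select(alpha, lambda p: same_object(p) or oid_at_least_9(p))  # T1 OR T2
print(len(both), len(either))
```

The two calls combine the same comparison terms with the Boolean operators AND and OR, respectively, as the predicate $P$ in the definition permits.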
As an example of A-Select, we assume that there are two associated classes: S for stack and Q for queue. To select associated stack and queue object pairs in which the top and the bottom of the stack have some common object(s) with those in the head and the tail of the queue, it can be written as

$$\sigma(S * Q)[(top(S) \cup bottom(S)) \cap (head(Q) \cup tail(Q)) \neq \emptyset]$$

For the case in which the top equals the head and the bottom equals the tail, we have

$$\sigma(S * Q)[top(S) = head(Q) \wedge bottom(S) = tail(Q)]$$

(4) A-Project ($\Pi$):

Similar to the projection operation in the relational algebra, an A-Project operation is defined to project subpattern(s) of a pattern. However, in the relational algebra, the relationship among the projected attributes is not important, whereas in the A-algebra the association among the projected subpatterns must be maintained so that the associations among the objects in these subpatterns will be retained. The A-Project operator is defined as follows:

$$\Pi(\alpha)[\mathcal{E}; \mathcal{T}]$$

where $\alpha$ is an association-set defined by an A-algebra expression; $\mathcal{E} = (e_1, e_2, \ldots, e_n)$ is a set of expressions which specify subpatterns to be projected; and $\mathcal{T} = (t_1, t_2, \ldots, t_m)$ is a set of ordered sets of classes. Each ordered set, $t_k$, specifies a path connecting two projected subpatterns defined by the expressions in $\mathcal{E}$. Each $e_i$ $(i = 1, 2, \ldots, n)$ is a subexpression of the expression which defines $\alpha$. $e_i$ and $e_j$ should not contain a common class. There may be many paths connecting two subpatterns in the original pattern. The path to be retained can be specified in $t_k$. If a specific path is chosen, a minimal number of classes along the path which can uniquely identify the path should be specified. The result of an A-Project operation over a pattern is its subpatterns defined by $\mathcal{E}$ and some paths defined by $\mathcal{T}$ that connect these subpatterns. If a path in the original pattern consists of all Inter-patterns, a D-inter-pattern is retained. Otherwise, a D-complement-pattern is included.
Multiple paths between two projected subpatterns can be declared in $\mathcal{T}$, if it is so desired.

Figure 4.5c shows an example of A-Project from an association-set $\alpha$ over A*B and D. For $\alpha^1$, the subpatterns $(a_1 b_1)$ and $(d_1)$ satisfy A*B and D, respectively. Therefore, they are kept in the result. According to the path specification stated in the operation, a Derived-pattern $(b_1 d_1)$ is added to the result, thus $\gamma^1 = (a_1 b_1, d_1, b_1 d_1)$. Its normalized form is $\gamma^1 = (a_1 b_1, b_1 d_1)$. $\gamma^2$ is produced for the same reason. Since one pattern of $\alpha$ does not have a subpattern satisfying A*B, only $(d_3)$ is retained for it.

(5) NonAssociate (!):

The NonAssociate operator is a binary operator used to identify the association patterns in one operand association-set that are not associated (over a specified association) with any pattern in the other association-set, and vice versa, in the domain of the algebra $\mathcal{A}$. The NonAssociate operator is defined as follows:

$$\alpha \ ![R(A,B)]\ \beta = \{\, \gamma \mid \gamma = (\alpha^i, \beta^j, \overline{a_m b_n}) : (\overline{a_m b_n}) \in [R(A,B)] \wedge a_m \in \alpha^i \wedge b_n \in \beta^j \wedge \forall ((a_m b_k), (a_l b_n) \in \mathcal{A})(a_l \notin \alpha \wedge b_k \notin \beta)$$
$$\text{or } \gamma = \alpha^i : \exists (m)(a_m \in \alpha^i) \wedge \big(\nexists (n)(b_n \in \beta) \vee \forall (b_n \in \beta) \exists (k, k \neq m)(a_k \in \alpha \wedge (a_k b_n) \in [R(A,B)])\big)$$
$$\text{or } \gamma = \beta^j : \exists (n)(b_n \in \beta^j) \wedge \big(\nexists (m)(a_m \in \alpha) \vee \forall (a_m \in \alpha) \exists (k, k \neq n)(b_k \in \beta \wedge (a_m b_k) \in [R(A,B)])\big) \,\}$$

The result of a NonAssociate operation is an association-set. Each of its patterns is formed by concatenating two patterns $\alpha^i$ and $\beta^j$ via a Complement-pattern $(\overline{a_m b_n})$ under the condition that $\alpha^i$ is not associated with any pattern of $\beta$ and vice versa. Furthermore, in the special case where the patterns of $\alpha$ (or $\beta$) have Inner-patterns of A (or B) and cannot be concatenated with any pattern of $\beta$ (or $\alpha$), these patterns of $\alpha$ (or $\beta$) will be retained in the result if one of the following three conditions holds: (1) $\beta$ (or $\alpha$) is an empty association-set, (2) no pattern of $\beta$ (or $\alpha$) has an Inner-pattern of B (or A), or (3) all patterns of $\beta$ (or $\alpha$) that have Inner-patterns of B (or A) can be concatenated with patterns of $\alpha$ (or $\beta$).
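One possible reading of the NonAssociate definition is sketched below. It is an illustration under stated assumptions, not the definition itself: object instances are strings whose first letter gives the class, primitive patterns are tagged tuples (`"I"`, `"R"`, `"C"`), the test is made per object (an object is called "free" when it is associated with no object of the other operand), and the three special-case conditions are folded into the single check that the other operand has no free object. The sample data are hypothetical.

```python
from itertools import product

def cls(obj):
    """Class of an object instance, e.g. 'b2' -> 'B' (naming convention assumed here)."""
    return obj[0].upper()

def inner(pattern, klass):
    """Inner-patterns (object instances) of class `klass` that occur in a pattern."""
    return {o for prim in pattern for o in prim[1:] if cls(o) == klass}

def pat(*prims):
    """Normalized pattern: drop a bare Inner-pattern whose object is already linked."""
    linked = {o for p in prims if p[0] != "I" for o in p[1:]}
    return frozenset(p for p in prims if p[0] != "I" or p[1] not in linked)

def non_associate(alpha, beta, rel, a_cls, b_cls):
    """alpha ![R(A,B)] beta under the object-level reading described above."""
    a_objs = {o for p in alpha for o in inner(p, a_cls)}
    b_objs = {o for p in beta for o in inner(p, b_cls)}
    # Objects associated with no object of the other operand.
    free_a = {a for a in a_objs if not any((a, b) in rel for b in b_objs)}
    free_b = {b for b in b_objs if not any((a, b) in rel for a in a_objs)}
    result = set()
    for pa, pb in product(alpha, beta):
        for am, bn in product(inner(pa, a_cls) & free_a, inner(pb, b_cls) & free_b):
            result.add(pat(*pa, *pb, ("C", am, bn)))
    # Special cases: nothing in the other operand is left to pair with.
    if not free_b:
        result |= {pa for pa in alpha if inner(pa, a_cls) & free_a}
    if not free_a:
        result |= {pb for pb in beta if inner(pb, b_cls) & free_b}
    return result

# Hypothetical sample data (not taken from Figure 4.4).
R_BC = {("b1", "c1"), ("b1", "c2"), ("b3", "c3"), ("b3", "c4")}
alpha = {pat(("R", "a1", "b1")), pat(("I", "a2")), pat(("R", "a4", "b2"))}
beta = {pat(("R", "c2", "d2")), pat(("R", "c3", "d3")), pat(("R", "c4", "d4"))}

gamma = non_associate(alpha, beta, R_BC, "B", "C")
only_c2 = non_associate(alpha, {pat(("R", "c2", "d2"))}, R_BC, "B", "C")
print(len(gamma), len(only_c2))
```

In the second call the only pattern of the second operand is associated with `b1`, so no Complement-pattern can be formed and the pattern `(a4 b2)` is retained as it is, in the manner of condition (3).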
An example of the NonAssociate operation is shown in Figure 4.5d. In the example, $\alpha^1$ and $\beta^1$ are dropped due to the existence of $(b_1 c_2)$ in Figure 4.4. $\alpha^2$ is dropped because it does not contain an Inner-pattern of class B. One pattern of $\beta$ is dropped because it does not contain an Inner-pattern of class C. $\gamma^1$ is in the resultant association-set because $(b_2)$ is not associated with $(c_4)$ in $\mathcal{A}$ as shown in Figure 4.4 and $(b_3)$ does not appear in $\alpha$. $\gamma^2$ exists because $(b_2)$ is not associated with $(c_3)$ in $\mathcal{A}$.

Note that the NonAssociate operator produces a resultant association-set which is a subset of that produced by the A-Complement operator, because $\alpha^i$, $\beta^j$, and $(\overline{a_m b_n})$ may form a new pattern only when $a_m$ of $\alpha^i$ does not associate with any object of B in $\beta$ and $b_n$ of $\beta^j$ does not associate with any object of A in $\alpha$. In fact, the NonAssociate operator can be expressed in terms of A-Complement and other operators as follows:

$$A \ ![R(A,B)]\ B = \big(A - \Pi(A *[R(A,B)]\ B)[A]\big) \mid [R(A,B)]\ \big(B - \Pi(A *[R(A,B)]\ B)[B]\big)$$

Thus, NonAssociate is not a primitive operator in a strict sense. However, it is very useful for query formulation and is therefore included in the set of A-algebra operators.

Under the same conditions as given for the Associate operator, $[R(A,B)]$ need not be specified unless there is an ambiguity. The NonAssociate operator is commutative but not associative.

$$\alpha \ ![R(A,B)]\ \beta = \beta \ ![R(B,A)]\ \alpha \quad \text{(commutativity)}$$
$$A \ ![R(A,A)]\ A = \emptyset$$

(6) A-Intersect ($\bullet$):

The A-Intersect operation is convenient for constructing a pattern with a branch or a lattice structure (a pattern that has a loop), since a pattern in such structures can be viewed as the intersection of two patterns. Conceptually, the A-Intersect operator is equivalent to the JOIN operator in the relational algebra. It operates on two operand association-sets over a set of specified classes. Two patterns, one from each association-set, are combined into one if they contain the same set of Inner-patterns for each specified class.
The A-Intersect operation is defined as follows:

$$\alpha_{\{X\}} \bullet\{W\}\ \beta_{\{Y\}} = \{\, \gamma \mid \gamma^k = (\alpha^i, \beta^j) : \forall (CL_n \in \{W\}) \forall (@ \in CL_n, @ \in \alpha^i)(@ \in \beta^j) \wedge \forall (CL_n \in \{W\}) \forall (@ \in CL_n, @ \in \beta^j)(@ \in \alpha^i) \,\}$$

Figure 4.5e shows an example of the A-Intersect operation over classes B and C. The resultant association-set contains four patterns, which are the intersections $\alpha^1 \cap \beta^1$, $\alpha^1 \cap \beta^2$, $\alpha^2 \cap \beta^1$, and $\alpha^2 \cap \beta^2$, respectively, since they all have Inner-patterns $(b_1)$ and $(c_2)$. Other patterns ($\alpha^3$, $\alpha^4$, and the two remaining $\beta$ patterns) fail to produce new patterns because they either have no Inner-pattern in both classes B and C or have no common Inner-pattern of class C.

The set of classes $\{W\}$ can be omitted when the A-Intersect operation is performed on all the common classes of its operands, i.e., $\{W\} = \{X\} \cap \{Y\}$ is implied. Since a lattice pattern can be transformed into a set of other simple patterns, an A-Intersect operation for building a complex pattern can be replaced by an Associate operation followed by an A-Select operation (see Section 4 for detail). The A-Intersect operator is commutative, conditionally associative, and idempotent.

$$\alpha \bullet\{W\}\ \beta = \beta \bullet\{W\}\ \alpha \quad \text{(commutativity)}$$
$$(\alpha_{\{X\}} \bullet\{W_1\}\ \beta_{\{Y\}}) \bullet\{W_2\}\ \gamma_{\{Z\}} = \alpha_{\{X\}} \bullet\{W_1\}\ (\beta_{\{Y\}} \bullet\{W_2\}\ \gamma_{\{Z\}}) \quad \text{(associativity)}$$

The associativity is not always true because there are cases in which a pattern of $\beta$ which fails to intersect with any pattern of $\gamma$ may succeed by first intersecting with a pattern of $\alpha$ in the operation $(\bullet\{W_1\})$ and then intersecting with a pattern of $\gamma$ in the operation $(\bullet\{W_2\})$.

Now we define three set operators, which are different from the corresponding set operators in relational algebra, since they operate on heterogeneous structures as well as homogeneous structures.

(7) A-Integrate ($\int$):

The A-Integrate is a unary operator. It reorganizes patterns in an association-set according to the relationships among patterns with respect to the classes specified.
The A-Integrate operation is defined as follows:

$$\int_{\{W\}}(\alpha) = \{\, \gamma \mid \gamma = (\alpha_t) : \forall (k,\ CL_n \in \{W\} \wedge @ \in CL_n \wedge @ \in \alpha^i \wedge \alpha^i \in \alpha_t)(@ \in \alpha^k \wedge \alpha^k \in \alpha_t) \,\}$$

By this definition, a subset of patterns ($\alpha_t$) of $\alpha$ is combined into a single pattern if every object instance of classes in $\{W\}$ that appears in a pattern in the subset is also contained in all other patterns in the subset. If a pattern of $\alpha$ cannot be combined with any other pattern, it is retained in the resultant association-set as it is. If no class is specified, patterns in which every pattern has at least one object instance (of any class) common to another will be integrated into one pattern. The reorganized association-set will contain patterns which are apart from each other (refer to Section 4.2).

Figure 4.5f shows two examples. The first example shows an A-Integrate operation over class A. Patterns that have a common Inner-pattern of class A are grouped into one ($\gamma^1$ is the integration of $\alpha^1$, $\alpha^2$, and $\alpha^3$; and $\gamma^2$ is the integration of $\alpha^4$ and $\alpha^5$). All other patterns in $\alpha$ are retained in the result as they are. The second example illustrates an A-Integrate operation on the same association-set of the first example but without specifying a class. The result becomes two patterns, which are apart and are exactly the same as they appear in the original database, whereas the same primitive patterns appear more than once in the result of the first example.

(8) A-Union (+):

Similar to the UNION operation of the relational algebra, A-Union combines two association-sets into one. However, these two association-sets can contain heterogeneous association structures. It is important for the A-algebra to be able to operate on heterogeneous structures because some prior operations may produce heterogeneous association-sets, which may need to be further processed over the objects of a common class against other patterns of associations.
Unlike the relational algebra and other O-O query languages, union-compatibility is not a restriction in the A-algebra. For this reason, the A-algebra has more expressive power. Any query that can be expressed by a single expression in other languages can be expressed as a single A-algebra expression, but not vice versa. The A-Union operation is defined as follows:

$$\alpha + \beta = \{\, \gamma \mid \gamma \in \alpha \vee \gamma \in \beta \,\}$$

The A-Union operator is commutative, associative, and idempotent:

$$\alpha + \beta = \beta + \alpha \quad \text{(commutativity)}$$
$$(\alpha + \beta) + \gamma = \alpha + (\beta + \gamma) \quad \text{(associativity)}$$
$$\alpha + \alpha = \alpha \quad \text{(idempotency)}$$

(9) A-Difference ($-$):

The A-Difference implements the same concept as the DIFFERENCE operator in relational algebra but with two differences. First, its operands do not have to be union-compatible. Secondly, a pattern in the minuend is retained if it does not contain any of the patterns in the subtrahend.

$$\alpha - \beta = \{\, \gamma \mid \gamma^k = \alpha^i : \forall (\beta^j)(\beta^j \not\subseteq \alpha^i) \,\}$$

The example depicted in Figure 4.5g shows that $\alpha^1$ and one other pattern of $\alpha$ are dropped since they both contain a pattern of $\beta$.

(10) A-Divide ($\div$):

The A-Divide operator implements the concept that a group of patterns with certain common features contains another set of patterns.

$$\alpha \div\{W\}\ \beta = \{\, \gamma \mid \gamma = \alpha^i : \alpha^i \in \alpha_t \wedge \forall (\beta^j)(\beta^j \subseteq \alpha_t) \,\}$$

where $\alpha_t$ is a subset of the patterns of $\alpha$ which have common Inner-patterns for all classes of $\{W\}$ and which together contain all patterns of $\beta$. If $\{W\}$ is not specified, the A-Divide operation retains all the patterns of $\alpha$ if each of them contains at least one pattern of $\beta$ and they together contain all patterns of $\beta$.

Figure 4.5h shows an example of $\alpha$ being divided by $\beta$ with respect to class B. The A-Divide operation retains $\alpha^1$, $\alpha^2$, and $\alpha^3$ since they all contain Inner-pattern $(b_1)$ of B and together contain all patterns of $\beta$.

4.3.3 Precedence

The precedence relationships of the above operators are as follows. Unary operators have higher precedence than binary operators. The precedence of the seven binary association operators is given in the following order: $*$, $\mid$, $!$, $\bullet$, $\div$, and $+$.
Parentheses can be used to alter the precedence relationships.

4.3.4 Summary of operators

(1) Associate ($*$): Two patterns are concatenated via an Inter-pattern.
(2) A-Complement ($\mid$): Two patterns are concatenated via a Complement-pattern.
(3) A-Select ($\sigma$): A pattern is retained if it satisfies the specified predicate.
(4) A-Project ($\Pi$): Subpatterns of a pattern are projected together with the specified paths that connect them.
(5) NonAssociate ($!$): Two patterns are concatenated via a Complement-pattern only if each of them cannot be concatenated with any pattern of the other operand via an Inter-pattern.
(6) A-Intersect ($\bullet$): Two patterns are combined into a single pattern if their common classes have common object(s).
(7) A-Integrate ($\int$): Patterns in an association-set are combined if objects of a specified class in a pattern are common to these patterns.
(8) A-Union ($+$): Two association-sets are lumped into a single set.
(9) A-Difference ($-$): A pattern in the minuend is retained if it does not contain any pattern in the subtrahend.
(10) A-Divide ($\div$): A subset of patterns in the dividend that have certain common feature(s) and contain all the patterns in the divisor is retained.

4.4 Query Examples

We have formally defined nine association operators and given their simple mathematical properties. Before exploring other properties, we give some examples to illustrate how these operators can be used to formulate queries for processing an O-O database. There can be many alternative expressions for the same query. Choosing the best one for execution is the task of a query optimizer. The mathematical properties of these operators can be used for that purpose. In the following formulation of algebraic expressions, we assume that the user is using the algebra directly instead of a high-level query language. In the latter case, the task of generating algebraic expressions would belong to the translator.

To formulate an A-algebra expression for a query, first, we need to construct an intensional pattern for it by navigating the schema graph of the database as illustrated in Chapter 3.
Then, each edge of the pattern is marked with an operator $*$, $\mid$, or $!$ based on the intended semantics. For simple patterns, the formulation is straightforward. For patterns with complex structures, we may have to decompose them into patterns with simpler structures. The expression for the original pattern is the A-Intersect of the expressions for the decomposed patterns.

First, we formulate expressions for Query 1 to Query 4 given in Chapter 3. We have identified the intensional patterns for these queries (see Figure 3.3).

Query 1: For all sections, get the majors of students who are taking these sections.

It is trivial to write an algebraic expression for Query 1, which is represented by a linear pattern. For this pattern, the two edges are both marked with $*$ and the algebraic expression can be formulated as follows:

$$\int_{\{\text{Section}\}}\big(\Pi(\text{Section} * \text{Student} * \text{Department})[\text{Section}, \text{Department};\ \text{Section.Department}]\big)$$

where the A-Integrate operation groups the resultant patterns by Sections.

Query 2: List students who major and minor in the same department.

For Query 2, the edges of the intensional pattern shown in Figure 3.3c are all marked with $*$. Since this loop structure can be viewed as the A-Intersect of two linear patterns involving both Student and Department, we have

$$\Pi(\text{Student} * \text{Undergrad} * \text{Department} \bullet \text{Student} * \text{Department})[\text{Student}]$$

where the A-Project operation gets the student objects that satisfy the association pattern as required by the query.

Query 3: For those students taking section 300 and having majors and/or minors, get their majors and/or minors.

The expression for the intensional pattern of Query 3 is as follows:

$$\text{Section\#} * \text{Section} * (\text{Student} * \text{Department} + \text{Student} * \text{Undergrad} * \text{Department\_1})$$

where the A-Union operator is used to realize the OR condition at the class Student.
As long as a student has a major or a minor, the linear pattern from Student to Department and the linear pattern from Student to Undergrad and to Department should be retained. In the expression, Department_1 is an alias of Department, which is used to distinguish major and minor departments. Since the query asks for the majors and minors of students who are taking section 300, the A-Select and A-Project operations are used. Thus, we have

$$\int_{\{\text{Student}\}}\big(\Pi(\sigma(\alpha)[\text{Section\#} = 300])[\text{Student}, \text{Department}, \text{Department\_1};\ \text{Student.Department}, \text{Student.Department\_1}]\big)$$

where $\alpha$ is the intensional pattern given above. As shown in Figure 3.3g, the result of this expression will contain the derived patterns which are specified by the $[\mathcal{E}; \mathcal{T}]$ clause of the projection operation and are reorganized by an A-Integrate operation. Note that Query 3 cannot be phrased in a single relational algebra expression since (a) the union operation in relational algebra requires operands to be union-compatible, (b) using a join operation on Student can cause a loss of information because not every student has both a major and a minor, (c) the Cartesian-product of the majors and minors will produce erroneous results, and (d) no other operation in the relational algebra can combine two relations into one.

Query 4: For each teacher, list the sections which he/she does not teach.

The algebraic expression for Query 4 can be easily formulated as follows, since it is represented by a linear pattern shown in Figure 3.3h. We note that the A-Complement operator $\mid$, rather than the NonAssociate operator $!$, should be used for this query, since a teacher may be teaching some courses.

$$\text{Teacher} \mid \text{Section}$$

Several other query examples are given below. They use the schema graph given in Figure 3.1. Their corresponding intensional patterns are depicted in Figure 4.6.

Query 5: List the names of students who teach in the same departments as their major departments.
We can see from Figure 4.6 that the intensional pattern for this query can be constructed in two ways. One way is to decompose it into three linear patterns:

Name-Person-Student, Student-Department, and Student-Grad-TA-Teacher-Department

The A-Intersect of these three patterns will produce a pattern that satisfies this query.

$$\Pi(\text{Student} * \text{Person} * \text{Name} \bullet \text{Student} * \text{Department} \bullet \text{Student} * \text{Grad} * \text{TA} * \text{Department})[\text{Name}]$$

where the first A-Intersect operation operates over Student and the second operates over Student and Department. The A-Project operation projects the names of these students.

Another way is to decompose the intensional pattern into two linear patterns:

Name-Person-Student-Department and Student-Grad-TA-Teacher-Department

Therefore, we have an alternative expression

$$\Pi(\text{Name} * \text{Person} * \text{Student} * \text{Department} \bullet \text{Student} * \text{Grad} * \text{TA} * \text{Teacher} * \text{Department})[\text{Name}]$$

Query 6: List the section# of those sections which have not been assigned a room or have not been assigned a teacher.

Since the query requests sections that have not been assigned a room or a teacher, these sections must not be connected with any room or any teacher (i.e., a section which does not associate with any room and teacher should also be retained in the result). Therefore, there should be Complement-patterns between Section and Teacher and between Section and Room, and a single arc between these two branches as shown in Figure 4.6. We emphasize that the $!$ operation, instead of $\mid$, should be used to construct these two Complement-patterns. Then the algebraic expression for this query can be easily formulated as follows:

$$\Pi(\text{Section\#} * (\text{Section} \ !\ \text{Room\#} + \text{Section} \ !\ \text{Teacher}))[\text{Section\#}]$$

Query 7: List the names of students who take courses 6010 and 6020.

We shall show three ways of formulating an expression for this query.
First, the intensional pattern for Query 7 shown in Figure 4.6 can be constructed by the A-Intersect of two linear patterns as we did for Query 5:

$$\Pi\big(\sigma(\text{Name} * \text{Person} * \text{Student} * \text{Enrollment} * \text{Course} * \text{Course\#})[\text{Course\#} = 6010] \bullet \sigma(\text{Student} * \text{Enrollment\_1} * \text{Course\_1} * \text{Course\#\_1})[\text{Course\#} = 6020]\big)[\text{Name}]$$

where Enrollment_1, Course_1, and Course#_1 are the aliases of the classes Enrollment, Course, and Course#, respectively. This ensures that the A-Intersect operation will be performed only over the Student class.

A second way is to view the original pattern as a linear pattern without restriction on Course# as follows:

Name-Person-Student-Enrollment-Course-Course#

Students who are taking both courses must participate in at least two such patterns with Course#=6010 and Course#=6020, respectively. This implies an A-Divide operation. Thus, the query can be formulated as follows:

$$\Pi\big(\text{Name} * \text{Person} * \text{Student} * \text{Enrollment} * \text{Course} * \text{Course\#} \div\{\text{Student}\}\ \sigma(\text{Course.Course\#})[\text{Course\#} = 6010 \vee \text{Course\#} = 6020]\big)[\text{Name}]$$

where the dot in Course.Course# is used only for identifying the Course# class which is defined in the Course class. It does not represent a function or a method as in other languages. This expression can also be rewritten as follows:

$$\Pi(\text{Name} * \text{Person} * \Pi(\text{Student} * \text{Enrollment} * \text{Course} * \text{Course\#} \div\{\text{Student}\}\ \ldots$$

which is more suitable for execution than the first, since the inner A-Project gets the student objects who are taking these two courses so that all other data associated with these students, such as Enrollment, Course, and Course#, do not have to be carried along in further processing to get the names of these students. Details of optimization issues will be addressed in the next chapter.

We stress that the above association pattern expressions represent the internal algebraic operations that need to be performed if the dynamic inheritance method is used.
The high-level query statements corresponding to these algebraic expressions issued by the user can be much simpler due to the inheritance of attributes in the generalization hierarchy or lattice.

Figure 4.1 Regular-edges and Complement-edges in an OG

Figure 4.2 Examples of association patterns: (a) primitive association patterns; (b) complex association patterns

Figure 4.3 Examples of association-sets

Figure 4.4 A sample database association graph (The Complement-patterns are not shown)

Figure 4.5 Example of operations: (a) an Associate operation; (b) an A-Complement operation; (c) an A-Project operation; (d) a NonAssociate operation; (e) an A-Intersect operation; (f) A-Integrate operations; (g) an A-Difference operation; (h) an A-Divide operation

Figure 4.6 Intensional patterns of Query 5, 6, and 7

CHAPTER 5
MATHEMATICAL PROPERTIES OF OPERATORS AND THEIR APPLICATIONS IN QUERY OPTIMIZATION AND QUERY DECOMPOSITION

In Section 4.3, we have shown some mathematical properties of individual operators. In this chapter, we shall study their properties systematically. The properties of A-algebra are classified into six categories: (1) conventional algebraic properties such as commutativity, associativity, idempotency, nilpotency, and distributivity; (2) nesting of two unary operations; (3) a binary operation nested in a unary operation; (4) cascading of two different binary operations; (5) general identities; and (6) operation transformation. The properties presented in this dissertation are quite extensive, but may not be complete. These properties provide the mathematical foundation for query decomposition and query optimization. Their utilities in these two applications are also illustrated in this chapter. The proofs of properties that are marked with †'s can be found in the Appendix.
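Others can be proved similarly.

5.1 Conventional Algebraic Properties

To be systematic, first we list the properties given in Section 4.3 without explanation, since they have been illustrated previously. Then, we give the properties of distributivity.

A. Commutativity

$$\alpha *[R(A,B)]\ \beta = \beta *[R(B,A)]\ \alpha \quad (5.1\ \dagger)$$
$$\alpha \mid [R(A,B)]\ \beta = \beta \mid [R(B,A)]\ \alpha \quad (5.2\ \dagger)$$
$$\alpha \ ![R(A,B)]\ \beta = \beta \ ![R(B,A)]\ \alpha \quad (5.3\ \dagger)$$
$$\alpha \bullet\{W\}\ \beta = \beta \bullet\{W\}\ \alpha \quad (5.4\ \dagger)$$
$$\alpha + \beta = \beta + \alpha \quad (5.5\ \dagger)$$

B. Associativity

$$(\alpha_{\{X\}} *[R(A,B)]\ \beta_{\{Y\}}) *[R(C,D)]\ \gamma_{\{Z\}} = \alpha_{\{X\}} *[R(A,B)]\ (\beta_{\{Y\}} *[R(C,D)]\ \gamma_{\{Z\}}) \quad (C \notin \{X\} \wedge B \notin \{Z\}) \quad (5.6\ \dagger)$$
$$(\alpha_{\{X\}} \mid [R(A,B)]\ \beta_{\{Y\}}) \mid [R(C,D)]\ \gamma_{\{Z\}} = \alpha_{\{X\}} \mid [R(A,B)]\ (\beta_{\{Y\}} \mid [R(C,D)]\ \gamma_{\{Z\}}) \quad (C \notin \{X\} \wedge B \notin \{Z\}) \quad (5.7\ \dagger)$$
$$(\alpha_{\{X\}} \bullet\{W_1\}\ \beta_{\{Y\}}) \bullet\{W_2\}\ \gamma_{\{Z\}} = \alpha_{\{X\}} \bullet\{W_1\}\ (\beta_{\{Y\}} \bullet\{W_2\}\ \gamma_{\{Z\}}) \quad \text{[side condition illegible in the source]} \quad (5.8\ \dagger)$$
$$(\alpha + \beta) + \gamma = \alpha + (\beta + \gamma) \quad (5.9\ \dagger)$$

C. Idempotency and Nilpotency

$$\alpha \bullet \alpha = \alpha \quad (\text{if } \alpha \text{ is a homogeneous association-set}) \quad (5.10)$$
$$\alpha + \alpha = \alpha \quad (5.11)$$
$$A *[R(A,A)]\ A = A \quad (5.12)$$
$$A \ ![R(A,A)]\ A = \emptyset \quad (5.13)$$
$$\alpha \div \alpha = \alpha \quad (5.14)$$

D. Distributivity

a) distributive property of $*$ with respect to $+$:

$$\alpha *[R(A,B)]\ (\beta + \gamma) = \alpha *[R(A,B)]\ \beta + \alpha *[R(A,B)]\ \gamma \quad (5.15\ \dagger)$$

b) distributive property of $\mid$ with respect to $+$:

$$\alpha \mid [R(A,B)]\ (\beta + \gamma) = \alpha \mid [R(A,B)]\ \beta + \alpha \mid [R(A,B)]\ \gamma \quad (5.16\ \dagger)$$

c) distributive property of $\bullet$ with respect to $+$:

$$\alpha \bullet\{X\}\ (\beta + \gamma) = \alpha \bullet\{X\}\ \beta + \alpha \bullet\{X\}\ \gamma \quad (5.17\ \dagger)$$

These three properties hold true for the same reasons. First, the A-Union operation simply lumps together patterns of two association-sets without modifying them. Second, when two patterns are operated on by $*$, $\mid$, or $\bullet$, the production of a new pattern is independent of other patterns in the operand association-sets, i.e., the decision whether a new pattern is produced or not is determined only by the structure of the two patterns being operated on.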
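Property (5.15) can be checked on a small instance with the following sketch. The encoding is an assumption for illustration only (object instances are strings whose first letter gives the class, primitive patterns are tagged tuples, patterns are `frozenset`s, and A-Union is modelled by ordinary set union), and the sample data are hypothetical. The check succeeds for the reason given above: each result pattern depends only on the pair of operand patterns that produced it.

```python
from itertools import product

def cls(obj):
    """Class of an object instance, e.g. 'b1' -> 'B' (naming convention assumed here)."""
    return obj[0].upper()

def inner(pattern, klass):
    """Inner-patterns (object instances) of class `klass` that occur in a pattern."""
    return {o for prim in pattern for o in prim[1:] if cls(o) == klass}

def associate(alpha, beta, rel, a_cls, b_cls):
    """alpha *[R(A,B)] beta, where `rel` is the set of (a_m, b_n) pairs in R(A,B)."""
    return {
        frozenset(pa | pb | {("R", am, bn)})
        for pa, pb in product(alpha, beta)
        for am, bn in product(inner(pa, a_cls), inner(pb, b_cls))
        if (am, bn) in rel
    }

# Hypothetical sample data.
R_BC = {("b1", "c1"), ("b1", "c2"), ("b2", "c3")}
alpha = {frozenset({("R", "a1", "b1")}), frozenset({("R", "a3", "b2")})}
beta = {frozenset({("R", "c1", "d1")})}
gamma = {frozenset({("R", "c2", "d2")}), frozenset({("R", "c3", "d3")})}

# alpha * (beta + gamma) versus alpha * beta + alpha * gamma
lhs = associate(alpha, beta | gamma, R_BC, "B", "C")
rhs = associate(alpha, beta, R_BC, "B", "C") | associate(alpha, gamma, R_BC, "B", "C")
print(lhs == rhs, len(lhs))
```

A single instance is of course not a proof; the sketch only makes the pairwise independence argument concrete.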
d) distributive property of $*$ with respect to $\bullet$:

$$\alpha_{\{X\}} *[R(CL_1,CL_2)]\ (\beta_{\{Y\}} \bullet\{W\}\ \gamma_{\{Z\}}) = \alpha_{\{X\}} *[R(CL_1,CL_2)]\ \beta_{\{Y\}} \ \bullet\{W \cup X\}\ \alpha_{\{X\}} *[R(CL_1,CL_2)]\ \gamma_{\{Z\}} \quad (5.18\ \dagger)$$

e) distributive property of $\mid$ with respect to $\bullet$:

$$\alpha_{\{X\}} \mid [R(CL_1,CL_2)]\ (\beta_{\{Y\}} \bullet\{W\}\ \gamma_{\{Z\}}) = \alpha_{\{X\}} \mid [R(CL_1,CL_2)]\ \beta_{\{Y\}} \ \bullet\{W \cup X\}\ \alpha_{\{X\}} \mid [R(CL_1,CL_2)]\ \gamma_{\{Z\}} \quad (5.19)$$

Distributive properties d and e hold true under the following three conditions:

i) $CL_2 \in W$,
ii) $X \cap Y = X \cap Z = \emptyset$,
iii) $\alpha$ is a homogeneous association-set.

The first condition ensures that the $*$, $\mid$, and $!$ operations are performed on the intersection of $\beta$ and $\gamma$. Otherwise, it does not make sense to have an operation between $\alpha$ and $\gamma$. The second condition states that $\alpha$ patterns are non-overlapping with $\beta$ and $\gamma$ patterns. The third condition states that, on the right-hand side of the expression, only the patterns having the same $\alpha$ patterns as their sub-patterns will succeed in the A-Intersect operation. Although these two distributive properties do not hold when one of the above three conditions is not true, they are equivalent to some other expressions under a less restrictive condition. These properties are classified in other categories.

It should be noted that two possible distributive properties are missing in the above list. First, $!$ is not distributive with respect to $+$. This property does not exist because of the way the NonAssociate operation is defined. By its definition, a pattern in one association-set will be included in the resultant pattern iff it does not connect to any pattern in the other association-set. This implies a logical AND concept. Therefore, expressions $\alpha\ !\ (\beta + \gamma)$ and $\alpha\ !\ \beta + \alpha\ !\ \gamma$ have totally different semantics. The former stands for patterns in $\alpha$ that are not associated with patterns in both $\beta$ and $\gamma$, whereas the latter specifies those patterns in $\alpha$ that are not associated with any pattern in either $\beta$ or $\gamma$. Second, $!$ is not distributive with respect to $\bullet$.
This property does not hold because performing the A-Intersect operation first may drop some $\beta$ patterns that are associated with some $\alpha$ patterns, and dropping those $\beta$ patterns may allow those $\alpha$ patterns to be non-associated with the result of the A-Intersect operation. When the NonAssociate operation is performed first, by contrast, those $\alpha$ patterns may not appear in the final result.

The main reason the NonAssociate operator is not distributive with respect to the A-Union and A-Intersect operations is that it is not associative. We shall see in the rest of this chapter that it has fewer properties than the other operators.

5.2 Nesting of Two Unary Operations

a) Two A-Select operations (one nested in the other):

As in relational algebra, the order of nesting of two selections can be exchanged without affecting the final result. Alternatively, they can be combined into a single selection operation. The selection condition of the combined A-Select operation is the conjunction of the predicates of the original two A-Select operations.

$\sigma(\sigma(\alpha)[P_1])[P_2] = \sigma(\alpha)[P_1 \wedge P_2]$ (5.20 †)

b) Two A-Project operations (one nested in the other):

The order of nesting of two projection operations cannot be exchanged unless they project the same thing, which is not meaningful. However, they are equivalent to a single projection if the outer A-Project operation projects subpatterns of the patterns produced by the inner A-Project.

$\Pi(\Pi(\alpha)[E_2;T_2])[E_1;T_1] = \Pi(\alpha)[E_1;T_1]$, where $\forall e_{1i}\, \exists e_{2j}\, (e_{1i} \in E_1 \wedge e_{2j} \in E_2 \rightarrow e_{1i} \subseteq e_{2j})$ (5.21)

Here the $e_{1i}$ are the subpattern expressions of the first A-Project operation, the $e_{2j}$ are the subpattern expressions of the second A-Project operation, and $e_{1i} \subseteq e_{2j}$ means that $e_{1i}$ defines a subpattern of $e_{2j}$.

c) Two A-Integrate operations (one nested in the other):

By the definition of the A-Integrate operation, if an A-Integrate operation is applied a second time to an association-set, it has no effect on the result of the first operation.
Therefore, we have

$I_{\{W\}}(I_{\{W\}}(\alpha)) = I_{\{W\}}(\alpha)$ (5.22)
$I(I(\alpha)) = I(\alpha)$ (5.23)

Since an A-Integrate operation with a set of specified classes performs only part of the function of an A-Integrate operation without a set of specified classes, the following equations also hold.

$I(I_{\{W\}}(\alpha)) = I(\alpha)$ (5.24)
$I_{\{W\}}(I(\alpha)) = I(\alpha)$ (5.25)

d) A-Select nested in A-Project, or vice versa:

A selection performed on the result of a projection is equivalent to the projection performed on the result of the selection, since a selection condition applicable to the projected subpatterns must also be applicable to the patterns before the projection. However, this is not true in the other direction.

$\sigma(\Pi(\alpha)[E;T])[P] \Rightarrow \Pi(\sigma(\alpha)[P])[E;T]$ (5.26)

For the other direction to be true, the classes involved in the predicate of the selection condition must also appear in the $[E;T]$ clause of the projection operation (denoted PCS), which defines the subpattern(s) to be projected out. Otherwise, the result of the selection is always an empty set because the predicate is not applicable to the projected patterns. The property therefore holds in both directions when this condition is satisfied, and we have

$\Pi(\sigma(\alpha)[P])[E;T] = \sigma(\Pi(\alpha)[E;T])[P]$ (5.27 †)

5.3 A Binary Operation Nested in a Unary Operation

5.3.1 Binary operation nested in an A-Select

a) Associate, A-Complement, or A-Intersect nested in A-Select

Generally speaking, transforming an expression of a binary operation (Associate, A-Complement, or A-Intersect) nested in a selection into another expression is impossible, since the predicate of the selection operation can be very complicated. For this reason, we study only the simple case in which the predicate has the form $P_1 \wedge P_2$ or $P_1 \vee P_2$, and $P_1$ and $P_2$ are applicable only to $\alpha$ and $\beta$, respectively. The following properties are similar to those in relational algebra. They do not need an explanation.
For $P_1 \wedge P_2$, we have

$\sigma(\alpha *[R(A,B)] \beta)[P_1 \wedge P_2] = \sigma(\alpha)[P_1] *[R(A,B)] \sigma(\beta)[P_2]$ (5.28 †)
$\sigma(\alpha \,|[R(A,B)]\, \beta)[P_1 \wedge P_2] = \sigma(\alpha)[P_1] \,|[R(A,B)]\, \sigma(\beta)[P_2]$ (5.29)
$\sigma(\alpha \bullet \beta)[P_1 \wedge P_2] = \sigma(\alpha)[P_1] \bullet \sigma(\beta)[P_2]$ (5.30)

For $P_1 \vee P_2$, we have

$\sigma(\alpha *[R(A,B)] \beta)[P_1 \vee P_2] = \sigma(\alpha)[P_1] *[R(A,B)] \beta + \alpha *[R(A,B)] \sigma(\beta)[P_2]$ (5.31 †)
$\sigma(\alpha \,|[R(A,B)]\, \beta)[P_1 \vee P_2] = \sigma(\alpha)[P_1] \,|[R(A,B)]\, \beta + \alpha \,|[R(A,B)]\, \sigma(\beta)[P_2]$ (5.32)
$\sigma(\alpha \bullet \beta)[P_1 \vee P_2] = \sigma(\alpha)[P_1] \bullet \beta + \alpha \bullet \sigma(\beta)[P_2]$ (5.33)

We note that the above properties are not true for a NonAssociate operation nested in an A-Select. The reason is similar to the one explained in the section on distributive properties.

b) A-Difference nested in A-Select

Since both the A-Difference and A-Select operations restrict an association-set and produce a subset of its patterns without changing their original structures, an A-Select operation performed on the minuend or on the result of the A-Difference operation produces the same result.

$\sigma(\alpha - \beta)[P] = \sigma(\alpha)[P] - \beta$ (5.34 †)

c) A-Union nested in A-Select

The following equation is always true:

$\sigma(\alpha + \beta)[P] = \sigma(\alpha)[P] + \sigma(\beta)[P]$ (5.35 †)

In the special case that $P$ has the form $P_1 \vee P_2$, where $P_1$ and $P_2$ can be applied to $\alpha$ and $\beta$, respectively, we have

$\sigma(\alpha + \beta)[P_1 \vee P_2] = \sigma(\alpha)[P_1] + \sigma(\beta)[P_2]$ (5.36 †)

5.3.2 Binary operation nested in A-Project or A-Integrate

Since A-Project and A-Integrate operations produce patterns that may contain subpatterns of both operands of the nested binary operation, properties similar to those presented above do not hold in general, except for the nesting of an A-Union operation.

$\Pi(\alpha + \beta)[E;T] = \Pi(\alpha)[E;T] + \Pi(\beta)[E;T]$ (5.37 †)
$I(\alpha + \beta) = I(I(\alpha) + I(\beta))$ (5.38)
$I_{\{W\}}(\alpha + \beta) = I_{\{W\}}(I_{\{W\}}(\alpha) + I_{\{W\}}(\beta))$ (5.39)

5.4 Cascading of Two Binary Operations

5.4.1 Cascading of two identical binary operators

Most cases have been covered by the associativity properties. Although associativity does not hold for the operators $-$ and $\div$, there exist some equivalent expressions.
The cascading of two A-Difference operations follows set difference in set theory.

$\alpha - \beta - \gamma = \alpha - \gamma - \beta = \alpha - (\beta + \gamma)$ (5.40 †)

The cascading of two A-Divide operations is equivalent to the dividend divided by the A-Union of the two divisors, because an A-Divide operation retains patterns of the dividend without modifying their structures (note that the divide operation in relational algebra retains a substructure of the dividend). Therefore, the order of the two A-Divide operations is not important.

$\alpha \div_{\{W\}} \beta \div_{\{W\}} \gamma = \alpha \div_{\{W\}} \gamma \div_{\{W\}} \beta = \alpha \div_{\{W\}} (\beta + \gamma)$ (5.41 †)

5.4.2 Cascading of two different binary operations

Many cases have been covered by the distributive properties. Although the distributivity properties of $!$ and $\div$ with respect to $+$ do not hold, there still exist some equivalent expressions. These properties are listed below according to their first operators.

a) * with other binary operators

The cascading of the * and | operators is associative.

$(\alpha_{\{X\}} *[R(A,B)] \beta_{\{Y\}}) \,|[R(C,D)]\, \gamma_{\{Z\}} = \alpha_{\{X\}} *[R(A,B)] (\beta_{\{Y\}} \,|[R(C,D)]\, \gamma_{\{Z\}})$, where $C \notin \{X\} \wedge B \notin \{Z\}$ (5.42 †)

The condition ensures that the operation $*[R(A,B)]$ does not operate on $\gamma$ patterns and $|[R(C,D)]$ does not operate on $\alpha$ patterns.

For the cascading of the * and $-$ operators (in that order), when the subtrahend is applicable to only one of the operands of the * operation, the $-$ operation can be performed first and against just that operand.

$(\alpha_{\{X\}} *[R(A,B)] \beta_{\{Y\}}) - \gamma_{\{Z\}} = (\alpha_{\{X\}} - \gamma_{\{Z\}}) *[R(A,B)] \beta_{\{Y\}}$, where $\{Y\} \cap \{Z\} = \emptyset$ (5.43 †)
$= \alpha_{\{X\}} *[R(A,B)] (\beta_{\{Y\}} - \gamma_{\{Z\}})$, where $\{X\} \cap \{Z\} = \emptyset$

For a similar reason, the following property holds.

$(\alpha_{\{X\}} *[R(A,B)] \beta_{\{Y\}}) \bullet \gamma_{\{Z\}} = (\alpha_{\{X\}} \bullet \gamma_{\{Z\}}) *[R(A,B)] \beta_{\{Y\}}$, where $\{Y\} \cap \{Z\} = \emptyset \wedge \{Y\} \cap \{X\} = \emptyset \wedge A \in \{X\}$ (5.44 †)
$= \alpha_{\{X\}} *[R(A,B)] (\beta_{\{Y\}} \bullet \gamma_{\{Z\}})$, where $\{X\} \cap \{Z\} = \emptyset \wedge \{X\} \cap \{Y\} = \emptyset \wedge B \in \{Y\}$

The first two conditions ensure that $\gamma$ patterns do not intersect with $\alpha$ and $\beta$ patterns. Otherwise, the A-Intersect operation will be performed over the common classes of $\beta$ and $\gamma$ if the * operation is performed first.
The third condition ensures that $\alpha$ ($\beta$) must contain object instances of $A$ ($B$). In other words, the algebraic expression that defines $\alpha$ ($\beta$) must contain $A$ ($B$). Otherwise, performing the A-Intersect operation first may produce a false result when $\gamma$ contains object instances of $A$. Note that the right-hand side of the equation is in a distributive form of * with respect to •. However, the distributive property cannot be applied, since it requires that $A$ belong to $\alpha$ and $\beta$, and that $\gamma$ be a homogeneous association-set (refer to Section 5.1).

b) | with other binary operators

Similar to the above two properties, we have

$(\alpha_{\{X\}} \,|[R(A,B)]\, \beta_{\{Y\}}) - \gamma_{\{Z\}} = (\alpha_{\{X\}} - \gamma_{\{Z\}}) \,|[R(A,B)]\, \beta_{\{Y\}}$, where $\{Y\} \cap \{Z\} = \emptyset$ (5.45)
$= \alpha_{\{X\}} \,|[R(A,B)]\, (\beta_{\{Y\}} - \gamma_{\{Z\}})$, where $\{X\} \cap \{Z\} = \emptyset$

$(\alpha_{\{X\}} \,|[R(A,B)]\, \beta_{\{Y\}}) \bullet \gamma_{\{Z\}} = (\alpha_{\{X\}} \bullet \gamma_{\{Z\}}) \,|[R(A,B)]\, \beta_{\{Y\}}$, where $\{Y\} \cap \{Z\} = \emptyset \wedge \{Y\} \cap \{X\} = \emptyset \wedge A \in \{X\}$ (5.46)
$= \alpha_{\{X\}} \,|[R(A,B)]\, (\beta_{\{Y\}} \bullet \gamma_{\{Z\}})$, where $\{X\} \cap \{Z\} = \emptyset \wedge \{X\} \cap \{Y\} = \emptyset \wedge B \in \{Y\}$

c) • with other binary operators

Similar to equations 5.43 and 5.45, we have

$(\alpha_{\{X\}} \bullet \beta_{\{Y\}}) - \gamma_{\{Z\}} = (\alpha_{\{X\}} - \gamma_{\{Z\}}) \bullet \beta_{\{Y\}}$, where $\{Y\} \cap \{Z\} = \emptyset$ (5.47)
$= \alpha_{\{X\}} \bullet (\beta_{\{Y\}} - \gamma_{\{Z\}})$, where $\{X\} \cap \{Z\} = \emptyset$

d) ! with other operators

As mentioned earlier, the ! operator has fewer properties because it is not associative. Although ! is not distributive with respect to +, the following decomposition holds:

$\alpha \,![R(A,B)]\, (\beta + \gamma)$ (5.48 †)
$= \alpha \,![R(A,B)]\, \beta - \Pi(\alpha *[R(A,B)] \gamma)[\alpha] + \alpha \,![R(A,B)]\, \gamma - \Pi(\alpha *[R(A,B)] \beta)[\alpha]$

or

$\alpha \,![R(A,B)]\, (\beta + \gamma)$ (5.49)
$= (\alpha - \Pi(\alpha *[R(A,B)] \beta)[\alpha] - \Pi(\alpha *[R(A,B)] \gamma)[\alpha]) \,![R(A,B)]\, ((\beta - \Pi(\alpha *[R(A,B)] \beta)[\beta]) + (\gamma - \Pi(\ldots$

The significance of equations 5.48 and 5.49 is that they can be used to transform the original expressions, in which the ! operators operate on heterogeneous association-sets (e.g., $\alpha + \beta$) for which distributivity cannot be applied, into expressions in the form of A-Unions of homogeneous association-sets.

e) $\div$ with other operators

An association-set ($\alpha$) divided by the A-Union of two other association-sets ($\beta$
and $\gamma$) is equivalent to two consecutive A-Divide operations of $\alpha$ divided by $\beta$ and $\gamma$ in turn, as indicated in equation 5.41. The order of the two A-Divide operations is not important.

$\alpha \div_{\{W\}} (\beta + \gamma) = \alpha \div_{\{W\}} \beta \div_{\{W\}} \gamma = \alpha \div_{\{W\}} \gamma \div_{\{W\}} \beta$ (5.50)

The A-Divide operator also has fewer properties because it is not associative.

f) $-$ with other binary operators

The properties of the operator $-$ cascaded with other operators are covered by 5.43, 5.45, and 5.47.

g) + with other binary operators

The equation below follows the set-union and set-difference operations in set theory.

$(\alpha + \beta) - \gamma = (\alpha - \gamma) + (\beta - \gamma)$ (5.51 †)

The properties of the cascading of + with the operators *, |, •, and ! can be found in 5.15, 5.16, 5.17, 5.48, and 5.49, since the latter operators are commutative.

5.5 General Identities

There are many other properties that are unique to the A-algebra but cannot be classified into the above categories. Listed below are some identity properties. These identities are useful for expression reduction.

$A \bullet A * B = A * B$ (5.52)
$A \bullet A \,!\, B = A \,!\, B$ (5.53)
$A + \Pi(A \,|\, B)[A] = A$ (5.54)
$A * B * C \bullet A * B = A * B * C$ (5.55)

5.6 Transformation of Operators

An important fact we have observed is that the same pattern can be constructed by different algebraic expressions using different operators. For example, the pattern A-B-C can be constructed either by $A * B * C$ or by $B * A \bullet B * C$; hence

$B * A \bullet B * C = A * B * C$ (5.56)

Formally, their equivalence can be derived using the properties presented in the previous sections:

$B * A \bullet B * C = (B \bullet B * C) *[R(B,A)] A$ (by 5.44)
$= (B * C) *[R(B,A)] A$ (by 5.52)
$= A * (B * C)$ (by 5.1)
$= A * B * C$ (by 5.6)

For the other direction, we have

$A * B * C = A * (B * B) * C$ (by 5.10)
$= A * (B \bullet B) * C$ (by 5.10, 5.12)
$= (A * B \bullet B) * C$ (by 5.44)
$= A * B \bullet B * C$ (by 5.44)

Using this property, a tree-structured pattern can be described without using the A-Intersect operator, which is relatively more expensive to implement.
For example,

$A * (B * C \bullet B * D) = A *[R(A,B)] (C * B * D)$ (by 5.56)
$= A * (B * C *[R(B,D)] D)$ (by 5.1, 5.6)
$= A * B * C *[R(B,D)] D$ (by 5.6)

Another useful transformation is possible because a lattice-structured pattern expressed by an intersection of two linear patterns can be viewed as a selection on a linear pattern, which avoids the expensive A-Intersect operation. For example,

$A * B * C * D \bullet B * E * D = \sigma(A * B * C * D * E * B_1)[B = B_1]$ (5.57)

The left-hand side constructs a lattice pattern by intersecting two linear patterns over the classes B and D. By breaking the lattice pattern at B, it becomes a single linear pattern, as seen on the right-hand side of the above expression. Here, $B_1$ is an alias of B. By specifying that $B = B_1$ in the association-set defined by $A * B * C * D * E * B_1$, we obtain the same result as the expression on the left-hand side.

Based on these two transformation properties, a complicated network structure can be viewed as a forest structure by properly breaking all the loops in the network, and its algebraic expression can be specified using the $\sigma$, *, |, and ! operators.

5.7 Applications in Query Optimization and Query Decomposition

We have systematically presented the mathematical properties of the operators of the A-algebra. In this section, their uses in query optimization and query decomposition are illustrated.

5.7.1 Applications in query optimization

Generally, query processing consists of three phases: translation, optimization, and execution. A query issued by the user is expressed in a high-level language. First, it is translated into an internal representation, an access plan, which may not be efficient for execution. Then, the optimizer generates a new access plan that is equivalent to the original access plan (i.e., they produce the same result) and is "optimal" for execution. Finally, the new access plan is scheduled for execution by the transaction manager to produce the result of the query.
Since it is difficult to determine the equivalence of two statements in a high-level language, alternative access plans cannot be generated by the query translator. In relational databases, the access plan generated by the query translator takes the form of a query tree built from relational algebra operators, so that the mathematical properties of the algebra can be used to generate equivalent access plans, even if the high-level language is based on the relational tuple calculus or domain calculus (refer to Chapter 2).

Query optimization is, in general, an NP-hard problem. Therefore, an access plan generated by the optimizer is optimal only in a very restrictive sense. Furthermore, to be practical, the overhead of the optimizer should never exceed the benefit of query optimization. In general, a query optimizer generates an optimal access plan in two steps: (1) generate a (limited number of) equivalent access plans, and (2) evaluate these access plans based on (a few) system parameters and criteria.

The mathematical properties of the A-algebra presented above are the foundation for the first step of query optimization in O-O databases. In the second step, the system or application chooses one or more of the following as the goal of its query optimization: minimal response time, minimal execution time, minimal communication time, minimal storage space, maximal resource utilization, etc. The parameters used in estimating the performance of an access plan include communication cost (per block), CPU cost (per unit), I/O cost (per I/O), buffer size, selectivities of operations (e.g., Selection and Join in relational databases), data structures, algorithms of the operations (e.g., nested-join, hash-join), etc.

Since the criteria of optimization are system and application dependent, and the optimization strategies vary from system to system, a detailed study is outside the scope of this dissertation.
We shall give an example to demonstrate the importance of the A-algebra in query optimization.

Query 8: List GPAs of students who major and minor in the same departments.

The intensional pattern for this query is shown in Figure 5.1a. Suppose that the algebraic expression produced by the query translator is as follows, which corresponds to the access plan represented by the query tree shown in Figure 5.1b.

$\Pi(GPA * (Student * Department \bullet Student * Undergrad * Department))[GPA]$

To make the evaluation easy, we assume that every student has a major, a minor, and a GPA (i.e., the selectivities of all * operations are 1.0) and that 100 out of $10^4$ students major and minor in the same departments (i.e., the selectivity of the • operation is $1/10^2$). If the time to perform an A-Select on a pattern is 1 unit, to perform an Associate operation is 2 units, and to perform an A-Intersect operation is 5 units, the total execution time, not including the time for the A-Project operation, can be calculated as follows:

$T_1 = (2 \times 10^4) + (4 \times 10^4) + (5 \times 10^4) + 200 = 11.02 \times 10^4$

where the first term is the time for identifying students' majors, the second term is for identifying students' minors, the third term is for the A-Intersect operation, and the last term is for identifying the GPAs. In Figure 5.1b, the costs of operations are depicted next to the operator nodes. Here, the time for the A-Intersect operation is small because each student has only one major and one minor, and indices may be used to speed up the operation.

Using property 5.57, the same intensional pattern can be viewed as the linear pattern shown in Figure 5.2a, and thus the optimizer generates a new algebraic expression, which corresponds to the access plan shown in Figure 5.2b.
$\Pi(\sigma(GPA * Student * Department * Undergrad * Student_1)[Student = Student_1])[GPA]$

The total execution time for this access plan is

$T_2 = (8 \times 10^4) + (10^4) = 9 \times 10^4$

where the first term is the time for the four Associate operations and the second term is the time for the selection operation. It is less expensive than the original access plan and is thus a better plan.

Now suppose that the database is distributed, with the data on students' GPAs at site 1 and the other data at site 2 (the class Student has to be replicated at both sites). The communication cost is assumed to be 1000 units per block, with a block size of 100 patterns. The total execution times for the two access plans can be calculated as follows:

$T_1 = (2 \times 10^4) + (4 \times 10^4) + (5 \times 10^4) + 1000 + 200 = 11.12 \times 10^4$
$T_2 = (8 \times 10^4) + (10^4) + 10^5 = 19 \times 10^4$

In $T_1$, the fourth term is the communication cost for sending the qualified students to site 1. In $T_2$, the third term is the communication cost (the communication costs are the same for sending the GPAs of all students to site 2 and for sending students' majors and minors to site 1). In this case, the first access plan is better than the second. Figures 5.3a and 5.3b depict the costs of operations (next to the operations) and the costs of communications (on the edges) for these two access plans.

The optimizer of the distributed system may generate another access plan by applying property 5.28 to the algebraic expression of the second access plan, which gives

$\Pi(GPA * \sigma(Student * Department * Undergrad * Student_1)[Student = Student_1])[GPA]$

which corresponds to the access plan shown in Figure 5.3c. The total execution time for this access plan is

$T_3 = (6 \times 10^4) + 10^4 + 10^3 + 200 = 7.12 \times 10^4$

where the first term is the time for the three Associate operations nested in the A-Select, the second term accounts for the selection operation, the third term accounts for the communication cost, and the last term is the time for getting the GPAs.
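The cost figures for the three access plans can be re-derived with a short script. This is only a sketch of the arithmetic in the example: the unit costs, student counts, and block size are taken from the text, while the constant and function names and the breakdown of each total into operation counts are assumptions made here for illustration.

```python
# Parameters stated in the example.
N_STUDENTS = 10**4      # students, each with a major, a minor, and a GPA
N_QUALIFIED = 100       # students who major and minor in the same department
SELECT, ASSOCIATE, INTERSECT = 1, 2, 5   # time units per pattern
BLOCK_SIZE = 100        # patterns per block
COMM_PER_BLOCK = 1000   # communication cost per block

def comm_cost(patterns):
    """Cost of shipping `patterns` patterns between sites, in whole blocks."""
    blocks = -(-patterns // BLOCK_SIZE)  # ceiling division
    return blocks * COMM_PER_BLOCK

# Plan 1: majors (1 Associate), minors (2 Associates), A-Intersect,
# then one Associate with GPA for the qualified students only.
t1_central = (ASSOCIATE * N_STUDENTS + 2 * ASSOCIATE * N_STUDENTS
              + INTERSECT * N_STUDENTS + ASSOCIATE * N_QUALIFIED)

# Plan 2: four Associates over all students, then one A-Select.
t2_central = 4 * ASSOCIATE * N_STUDENTS + SELECT * N_STUDENTS

# Distributed setting: plan 1 ships only the qualified students,
# plan 2 ships one pattern per student.
t1_dist = t1_central + comm_cost(N_QUALIFIED)
t2_dist = t2_central + comm_cost(N_STUDENTS)

# Plan 3: three Associates and the A-Select at one site, ship the
# qualified students, then one Associate with GPA.
t3_dist = (3 * ASSOCIATE * N_STUDENTS + SELECT * N_STUDENTS
           + comm_cost(N_QUALIFIED) + ASSOCIATE * N_QUALIFIED)

for name, cost in [("T1", t1_central), ("T2", t2_central),
                   ("T1 distributed", t1_dist), ("T2 distributed", t2_dist),
                   ("T3 distributed", t3_dist)]:
    print(f"{name}: {cost / 10**4:.2f} x 10^4")
```

Under these assumptions the totals match the ones given in the text, and the ranking of plans flips between the centralized and distributed settings because the communication term dominates plan 2.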
Therefore, the third access plan is the optimal one for execution.

5.7.2 Applications in query decomposition

O-O modeling techniques incorporate many high-level features, such as association types, inheritance, behavioral properties of objects, knowledge and rules, etc., in the DBMS. In conventional database systems, these features were taken care of by database administrators and application programs. To ensure good performance, O-O DBMSs need the support of parallel and distributed processing techniques.

In a distributed and parallel processing environment, a query is decomposed into subqueries according to the processing capabilities of the processors and/or the data distribution. The algebraic representation of a query can be manipulated mathematically for this purpose. For example, suppose a query is represented by the intensional pattern shown in Figure 5.4a. The algebraic expression for this query can be written as follows:

$expr = A * (B*E*F + B*(C*D*H \bullet C*G))$.

By applying the distributivity properties, the above expression can be written as:

$expr = A * (B*E*F + B*C*D*H \bullet B*C*G)$
$= A*B*E*F + A * (B*C*D*H \bullet B*C*G)$
$= A*B*E*F + A*B*C*D*H \bullet A*B*C*G$.

The decomposed expression is the A-Union of two subexpressions representing the two subpatterns shown in Figure 5.4b. These subexpressions are independent of each other and can be processed in parallel in a parallel system. The second subexpression can be further optimized as shown in the following expression, in which $*[R(C,G)]$ indicates that the Associate operation is performed through the association between C and G.

$expr = A*B*E*F + (A*B*C*D*H) *[R(C,G)] G$.

In addition, since each subexpression represents a homogeneous association-set, its processing will be more efficient than processing over heterogeneous association-sets.

Next, we present two theorems of the A-algebra, which ensure that the decomposed subexpressions produce homogeneous association-sets.
Theorem 5.1: Operators of the A-algebra (except A-Union and A-Integrate) produce homogeneous association-sets if their operands are homogeneous association-sets.

Proof: This is true by the definitions of the operators (the A-Intersect operation should be used without specifying the classes on which it is performed, i.e., it is performed on the common classes of its operands). Note that, for the A-Difference and A-Divide operations, this is also true if only the first operand (the minuend or the dividend) is a homogeneous association-set.

Theorem 5.2: If an A-algebra expression contains neither an A-Integrate operation nor an A-Divide operation whose dividend is a heterogeneous association-set, it can be decomposed into the A-Union of subexpressions, each of which produces a homogeneous association-set.

Proof: According to Theorem 5.1, besides the A-Integrate operation, the A-Union is the only operator that can produce a heterogeneous association-set when its operands are homogeneous association-sets. Therefore, it suffices to prove that whenever such a heterogeneous association-set appears in an expression, the expression can be decomposed into the A-Union of subexpressions that produce homogeneous association-sets.

Let $\alpha$, $\beta$, $\gamma$, and $\lambda$ all be homogeneous association-sets. By properties 5.15, 5.16, 5.17, 5.35, and 5.37 we have

$(\alpha + \beta) * (\gamma + \lambda) = \alpha * \gamma + \alpha * \lambda + \beta * \gamma + \beta * \lambda$
$(\alpha + \beta) \,|\, (\gamma + \lambda) = \alpha \,|\, \gamma + \alpha \,|\, \lambda + \beta \,|\, \gamma + \beta \,|\, \lambda$
$(\alpha + \beta) \bullet (\gamma + \lambda) = \alpha \bullet \gamma + \alpha \bullet \lambda + \beta \bullet \gamma + \beta \bullet \lambda$
$\sigma(\alpha + \beta)[P] = \sigma(\alpha)[P] + \sigma(\beta)[P]$
$\Pi(\alpha + \beta)[E;T] = \Pi(\alpha)[E;T] + \Pi(\beta)[E;T]$

By property 5.51, we have

$(\alpha + \beta) - \gamma = (\alpha - \gamma) + (\beta - \gamma)$

By property 5.48, we have

$(\alpha + \beta) \,!\, (\gamma + \lambda) = (\alpha \,!\, \gamma - \Pi(\alpha * \lambda)[\alpha] - \Pi(\beta * \gamma)[\gamma])$
$+ (\alpha \,!\, \lambda - \Pi(\alpha * \gamma)[\alpha] - \Pi(\beta * \lambda)[\lambda])$
$+ (\beta \,!\, \gamma - \Pi(\beta * \lambda)[\beta] - \Pi(\alpha * \gamma)[\gamma])$
$+ (\beta \,!\, \lambda - \Pi(\beta * \gamma)[\beta] - \Pi(\alpha * \lambda)[\lambda])$

In the above decompositions, each term of the A-Union operations represents a homogeneous association-set.
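The decomposition used in the proof, and in the example of Section 5.7.2, amounts to pushing every A-Union to the top of the expression. The sketch below does this symbolically for the operators that distribute over + without side conditions (properties 5.15 to 5.17). The tuple-based expression encoding and the function names are assumptions made for illustration, and the ASCII character `&` stands in for the A-Intersect operator. The sketch does not attempt the conditional rewrites (5.18, 5.19) or the NonAssociate decomposition (5.48).

```python
def union_free_terms(expr):
    """Return the list of union-free subexpressions whose A-Union equals expr.

    An expression is either a class name (str) or a tuple (op, left, right)
    with op in {"+", "*", "|", "&"}; "&" is a stand-in for A-Intersect.
    """
    if isinstance(expr, str):
        return [expr]
    op, left, right = expr
    if op == "+":
        # A-Union: concatenate the terms of both operands.
        return union_free_terms(left) + union_free_terms(right)
    # *, | and & distribute over + (properties 5.15, 5.16, 5.17).
    return [(op, l, r)
            for l in union_free_terms(left)
            for r in union_free_terms(right)]

def show(expr):
    """Render an expression with explicit parentheses."""
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    return f"({show(left)} {op} {show(right)})"

# expr = A * (B*E*F + B*(C*D*H & C*G)) from Section 5.7.2.
bef = ("*", ("*", "B", "E"), "F")
cdh = ("*", ("*", "C", "D"), "H")
cg = ("*", "C", "G")
expr = ("*", "A", ("+", bef, ("*", "B", ("&", cdh, cg))))

for term in union_free_terms(expr):
    print(show(term))
```

Each printed term contains no A-Union, so by Theorem 5.1 it denotes a homogeneous association-set and can be evaluated independently of the others.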
□

Figure 5.1 Access plan 1 of Query 8: (a) intensional pattern of Query 8 (GPA, Student, Department); (b) access plan 1 of Query 8.

Figure 5.2 Access plan 2 of Query 8: (a) alternative intensional pattern of Query 8 (GPA, Student, Department, Undergrad, Student_1); (b) access plan 2 of Query 8.

Figure 5.3 Costs in a distributed system: (b) cost of access plan 2; (c) cost of access plan 3.

Figure 5.4 Example of query decomposition: (a), (b).

CHAPTER 6
COMPLETENESS OF THE A-ALGEBRA

We have shown in the preceding sections that a query issued against an O-O database can be specified by an association (or graphic) pattern, in which the object instances of interest are related (associated or nonassociated), and that the A-algebra provides a useful mathematical method for specifying and manipulating such patterns to produce the result of the query. However, for the algebra to be truly useful, its completeness needs to be addressed.

Due to the closure property of the A-algebra, the result of a query is represented intensionally by a subdatabase schema graph $SG_t$ and extensionally by a subdatabase object graph $OG_t$, where $SG_t$ is a subgraph of the SG of the original database and $OG_t$ is a subset of the association patterns in the original object graph OG. A subdatabase can be further operated upon by the A-algebra operators to produce other subdatabases. We can therefore define the completeness of the algebra in the following way.

Completeness Theorem: The A-algebra is complete if it can define all possible subdatabases of an O-O database.

Before proving the theorem, we first give the formal definitions of the $SG_t$ and $OG_t$ of the subdatabases of an O-O database.
Subdatabase Schema Graph:

A subdatabase schema graph ($SG_t$) is a set of $m$ connected subgraphs $\{SG'_k(C,A)\}$ of the original database schema graph $SG(C,A)$, where $C$ is a set of vertices representing classes $\{C_i\}$ and $A$ is a set of edges representing associations between classes, each of which is denoted by $A_{ij}$ for an association between classes $C_i$ and $C_j$. If $C_i \in SG'_j$, then $C_i \notin SG'_k$ ($\forall k \neq j$).

The condition ensures that a class does not appear in two different connected graphs in a subdatabase. If it did, the two connected graphs would have been a single connected graph.

Subdatabase Object (Association) Graph:

A subdatabase object graph ($OG_t(O,E)$) contains a subset of the association patterns of the original database object graph ($OG(O,E)$), where $O$ is a set of vertices representing object instances and $E$ is a set of edges representing associations between object instances. An Inner-pattern (or object instance $O_{ij}$) belongs to $OG_t$ only if $C_i \in SG_t$ and $O_{ij} \in C_i$. An Inter-pattern or a Complement-pattern ($O_{ij}$===$O_{mn}$) belongs to $OG_t$ only if $C_i, C_m \in SG_t$ and $A_{im} \in SG_t$, where $O_{ij} \in C_i$, $O_{mn} \in C_m$, and $O_{ij}$===$O_{mn} \in A_{im}$.

The above conditions state that a primitive association pattern should not be included in $OG_t$ if the corresponding classes and/or associations of the original database are not in $SG_t$.

Instead of proving the completeness theorem as stated above, we make the following observations and then restate the theorem.

First, although the SG of an O-O database may consist of more than one connected graph, it suffices to prove the case in which the SG is a single connected graph: if two classes do not have a path between them in the SG, they will not be associated with each other in any of the subdatabases. Therefore, each connected graph of SG can be treated as an independent database, and a subdatabase defined on more than one connected graph of SG can be represented by the A-Union of the subdatabases defined on the different connected graphs of SG.
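The first observation treats each connected graph of SG as an independent database. A minimal sketch of that partitioning step is given below, assuming a schema graph encoded as a list of class names plus a list of undirected association pairs. The class names and both function names are hypothetical; the second function checks only the disjointness condition from the definition of $SG_t$.

```python
def connected_components(classes, associations):
    """Split a schema graph into its connected graphs (sets of class names)."""
    adjacent = {c: set() for c in classes}
    for u, v in associations:
        adjacent[u].add(v)
        adjacent[v].add(u)
    seen, components = set(), []
    for start in classes:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:                      # depth-first traversal
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacent[node] - component)
        seen |= component
        components.append(component)
    return components

def classes_are_disjoint(subgraphs):
    """Check that no class appears in two different connected subgraphs."""
    seen = set()
    for classes in subgraphs:
        if seen & classes:
            return False
        seen |= classes
    return True

# Hypothetical schema with two unrelated parts.
classes = ["Student", "Department", "GPA", "Part", "Supplier"]
associations = [("Student", "Department"), ("Student", "GPA"),
                ("Part", "Supplier")]

components = connected_components(classes, associations)
print(components)
print(classes_are_disjoint(components))
```

Each component returned here could be handled as its own database in the proof, with the results combined by A-Union.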
Second, it suffices to prove the case in which a subdatabase consists of only one connected subgraph of SG, although in general the $SG_t$ of a subdatabase may contain more than one subgraph of SG. This is because the general case can be represented by the A-Union of the expressions for the individual subgraphs.

Third, since an O-O database is a collection of association patterns, if there exists an A-algebra expression for every association pattern of an O-O database, then the subdatabases can be represented by the A-Union of a subset of these association patterns.

Therefore, the completeness theorem can be restated as follows:

Completeness Theorem: The A-algebra is complete if there exists an expression for every association pattern in the OG of an O-O database.

We prove the above theorem by induction on the number of object instances in an association pattern.

Proof:

Basis: We first show that there is an expression for the case in which an association pattern contains a single object instance. Since the name of a class, say $C_1$, represents all the object instances of the class, an association pattern containing a single object instance of that class can be represented by an A-Select operation over the object instances of $C_1$ that selects the particular object instance of interest, as shown below:

$\sigma(C_1)[\ldots]$

Hypothesis: Assume that there exists an expression for every association pattern that contains $n-1$ object instances. These $n-1$ object instances must form a connected graph, i.e., there must be at least one path between any two object instances in the graph. Otherwise, they would have formed multiple association patterns.

Induction: Suppose there exists an expression for an association pattern $P^{n-1}$ which contains $n-1$ object instances.
When the $n$th object instance is added to this pattern, a new pattern $P^n$ containing $n$ object instances can be formed in the following two ways, as depicted in Figure 6.1: (a) the $n$th object instance belongs to a class $C_k$ whose object instances do not participate in $P^{n-1}$; and (b) the $n$th object instance belongs to a class, say $C_i$, which has some object instance(s) participating in $P^{n-1}$.

To avoid complicated notation, we show the formulations for the two specific patterns depicted in Figures 6.2a and 6.2b, which correspond to the cases of Figures 6.1a and 6.1b, respectively. Patterns in general form can be formulated using the same mechanism described below. We discuss cases a and b in turn.

Case a: When an object instance of $C_7$ is added to a pattern $P^{11}$ containing 11 object instances, various new patterns $P^{12}$ can be formed, depending on the associations between the new object instance and the existing object instances. The new object instance can have only one association with an existing object instance if their classes are directly connected in SG by a single association type (we consider later the case in which there is more than one association type between two classes). There are only three possible choices for how the new object instance relates to an existing object instance: (1) the association is of no interest, i.e., the association is not included in the pattern; (2) they are associated with each other; (3) they are not associated with each other. Graphically, we use a solid line (an Inter-pattern) to represent choice 2 and a dashed line (a Complement-pattern) to represent choice 3. No line is drawn between the two object instances for choice 1. Note that at least one of the associations of the new object instance with the existing object instances must have choice 2 or 3. Otherwise, the new object instance and $P^{11}$ are two separate patterns, which are covered by the basis and the hypothesis.
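The counting behind Case a can be made concrete. The sketch below enumerates the ways a new object instance can relate to the existing object instances whose classes are directly connected to its class, using the three choices named above and discarding the combination in which every association is of no interest. The labels, the function name, and the example instance names are assumptions made for illustration.

```python
from itertools import product

# The three choices from the text: 1 = of no interest,
# 2 = associated (solid line), 3 = not associated (dashed line).
CHOICES = ("ignore", "associated", "non-associated")

def candidate_connections(neighbours):
    """Yield every valid assignment of a choice to each neighbouring object.

    The all-"ignore" assignment is skipped: in that case the new object and
    the old pattern would be two separate patterns.
    """
    for combo in product(CHOICES, repeat=len(neighbours)):
        if all(choice == "ignore" for choice in combo):
            continue
        yield dict(zip(neighbours, combo))

# Hypothetical existing object instances adjacent to the new one.
neighbours = ["o1", "o2"]
options = list(candidate_connections(neighbours))
print(len(options))
for option in options:
    print(option)
```

For k adjacent object instances this yields 3^k - 1 candidate patterns, each of which the proof then expresses as an A-Intersect of one Associate or A-Complement expression per non-ignored connection.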
To formulate an expression for the new pattern shown in Figure 6.2a, we first transform the pattern $P^{11}$ by treating its object instances as if they were from different classes, using aliases of their original classes, as shown in Figure 6.3a. The pattern $P_a^{12}$ in Figure 6.2a is equivalent to the pattern in Figure 6.3a provided that the object instances of the aliasing classes of the same class are not the same object instances. Next, the equivalent pattern is decomposed into a set of patterns, each of which is a subpattern (i.e., subgraph) of the pattern in Figure 6.3a and consists of $P^{11}$, the new object instance, and its relationship with one object instance in $P^{11}$. If we can derive expressions for these subpatterns individually, the A-Intersect of these expressions will be the expression for the pattern in Figure 6.3a, which is equivalent to the pattern in Figure 6.2a. In this example, the pattern in Figure 6.3a is decomposed into six subpatterns, as shown in Figure 6.4a, which can be expressed as follows:

$E_{P_{a1}^{12}} = (E_{P^{11}}) \,|[R(C_{1\_1}, C_7)]\, C_7$
$E_{P_{a2}^{12}} = (E_{P^{11}}) *[R(C_{1\_2}, C_7)] C_7$
$E_{P_{a3}^{12}} = (E_{P^{11}}) *[R(C_{3\_1}, C_7)] C_7$
$E_{P_{a4}^{12}} = (E_{P^{11}}) *[R(\ldots, C_7)] C_7$
$E_{P_{a5}^{12}} = (E_{P^{11}}) \,|[R(\ldots, C_7)]\, C_7$
$E_{P_{a6}^{12}} = (E_{P^{11}}) *[R(\ldots, C_7)] C_7$

respectively. Here, $E$ stands for the algebraic expression of the association pattern specified by its subscript. In each expression, the operation * or | is chosen according to the type of connection between the object instances, and $E_{P^{11}}$ is parenthesized to ensure the correct execution sequence. The expression for the pattern of Figure 6.2a can then be formulated by a sequence of A-Intersect operations on the expressions of these individual patterns:

$E_{P_a^{12}} = E_{P_{a1}^{12}} \bullet E_{P_{a2}^{12}} \bullet E_{P_{a3}^{12}} \bullet E_{P_{a4}^{12}} \bullet E_{P_{a5}^{12}} \bullet E_{P_{a6}^{12}}$

Case b: Figure 6.2b depicts the case in which the new object instance belongs to an existing class $C_6$ and may have associations with object instances of other classes that have associations with $C_6$.
The formulation for the new pattern $P_b^{12}$ shown in Figure 6.2b can be obtained similarly, as depicted in Figures 6.3b and 6.4b. Note that the new object instance belongs to the aliasing class $C_{6\_2}$ after the pattern transformation process (see Figure 6.3b). As shown in Figure 6.4b, the equivalent pattern depicted in Figure 6.3b is decomposed into four patterns, which can be expressed by

$E_{P_{b1}^{12}} = (E_{P^{11}})\ *[R(C_{\ldots\_2}, C_{6\_2})]\ C_{6\_2}$;
$E_{P_{b2}^{12}} = (E_{P^{11}})\ *[R(C_{4\_1}, C_{6\_2})]\ C_{6\_2}$;
$E_{P_{b3}^{12}} = (E_{P^{11}})\ |[R(C_{4\_2}, C_{6\_2})]\ C_{6\_2}$;
$E_{P_{b4}^{12}} = (E_{P^{11}})\ |[R(C_{\ldots}, C_{6\_2})]\ C_{6\_2}$,

respectively. Therefore, for the pattern $P_b^{12}$ we have the expression

$E_{P_b^{12}} = E_{P_{b1}^{12}} \bullet E_{P_{b2}^{12}} \bullet E_{P_{b3}^{12}} \bullet E_{P_{b4}^{12}}$.

However, the above expression does not exclude the case in which two object instances in aliasing classes of $C_6$ refer to the same object instance. Hence, it is necessary to perform an A-Select operation to eliminate such a case, and we have

$E_{P_b^{12}} = \sigma(E_{P_{b1}^{12}} \bullet E_{P_{b2}^{12}} \bullet E_{P_{b3}^{12}} \bullet E_{P_{b4}^{12}})[C_{6\_1} \neq C_{6\_2}]$.

So far we have shown that there exists at least one expression for a pattern consisting of any number of object instances. We note that there may exist more than one expression for a pattern. We illustrate this by showing an alternative way of transforming a pattern into an equivalent one so that different expressions can be derived.

Figure 6.5a shows another pattern, which is equivalent to the pattern in Figure 6.2a if, in Figure 6.5a, the object instances of the aliasing classes $C_{7\_1}$ through $C_{7\_6}$ that participate in the pattern refer to the same object. Therefore, we have an alternative expression for $P_a^{12}$:

$E_{P_a^{12}} = \sigma(\cdots(((E_{P^{11}})\ |[R(C_{1\_1}, C_{7\_1})]\ C_{7\_1})\ *[R(C_{1\_2}, C_{7\_2})]\ C_{7\_2}) \cdots *[R(C_5, C_{7\_6})]\ C_{7\_6})[C_{7\_1} = C_{7\_2} = \ldots = C_{7\_6}]$,

which is a sequence of * and/or | operations on $E_{P^{11}}$ over classes $C_{7\_i}$ ($i = 1, 2, \ldots, 6$) and their associated classes. The selection condition $[C_{7\_1} = C_{7\_2} = \ldots = C_{7\_6}]$ ensures that the object instances in all aliasing classes of $C_7$ refer to the same object.
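The role of the selection conditions on aliasing classes can be illustrated with a toy model. The sketch below is only an assumption-laden illustration, not the dissertation's A-Select operator: a pattern is modelled as a plain dictionary from aliasing class name to object identifier, and the helper names `select_same_object` and `select_distinct` are hypothetical.

```python
# Toy model (an assumption for illustration): a pattern is a dict that maps
# an aliasing class name to the identifier of the object instance it holds.


def select_same_object(patterns, aliases):
    """Keep patterns in which every listed alias holds the same object,
    mimicking a condition such as [C7_1 = C7_2 = ... = C7_6]."""
    return [p for p in patterns if len({p[a] for a in aliases}) == 1]


def select_distinct(patterns, alias_a, alias_b):
    """Keep patterns in which two aliases hold different objects,
    mimicking a condition such as [C6_1 != C6_2]."""
    return [p for p in patterns if p[alias_a] != p[alias_b]]


candidates = [
    {"C7_1": "x1", "C7_2": "x1", "C7_3": "x1"},  # one C7 object throughout
    {"C7_1": "x1", "C7_2": "x2", "C7_3": "x1"},  # two different C7 objects
]
same = select_same_object(candidates, ["C7_1", "C7_2", "C7_3"])

pairs = [
    {"C6_1": "y1", "C6_2": "y2"},  # two distinct C6 objects
    {"C6_1": "y1", "C6_2": "y1"},  # one C6 object under two aliases
]
distinct = select_distinct(pairs, "C6_1", "C6_2")
```

In this toy model the equality condition keeps only the first candidate, and the inequality condition discards the pair in which both aliases name the same object, which is the filtering effect the selection conditions are used for in the text.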
Similarly, the pattern in Figure 6.5b is equivalent to the pattern in Figure 6.2b if the object instances in $C_{6\_2}$ through $C_{6\_5}$ that participate in the pattern are the same object and this object is different from the one in $C_{6\_1}$. Hence an alternative expression can be derived as follows:

$E_{P_b^{12}} = \sigma(\cdots(((E_{P^{11}})\ *[R(C_{\ldots\_2}, C_{6\_2})]\ C_{6\_2})\ *[R(C_{4\_1}, C_{6\_3})]\ C_{6\_3}) \cdots |[R(C_{4\_2}, C_{6\_5})]\ C_{6\_5})[C_{6\_2} = C_{6\_3} = C_{6\_4} = C_{6\_5} \neq C_{6\_1}]$.

We have shown that there exists an expression for every association pattern when there is a single association between two classes. Now we prove that this is also true when there is more than one association between two classes. There are again two cases, as described in the proof above. We prove only case a, in which the new object instance belongs to $C_k$ and the object instances of $C_k$ do not participate in $P^{n-1}$. Case b can be proven using the same methodology.

Figure 6.6a shows an SG in which there are two associations between $C_{i-1}$ and $C_k$. The two associations are denoted as $[R_1(C_{i-1}, C_k)]$ and $[R_2(C_{i-1}, C_k)]$, respectively. Figure 6.6b shows a pattern in which the new object instance of $C_k$ has two associations with each object instance of $C_{i-1}$. The associations between object instances of $C_{i-1}$ and $C_k$ are labeled by numbers corresponding to the associations of their classes. To derive the algebraic expression for this pattern, we first decompose it into two patterns, $P_a^{n}$ and $P_b^{n}$, as shown in Figure 6.6c. The decomposition is done by making two copies of the pattern. In one copy the associations labeled 2 are dropped, and in the other the associations labeled 1 are dropped. From the earlier discussion, we can derive expressions for these two patterns, and the expression for the original pattern can be represented by the A-Intersect of the two:

$E_{P^{n}} = E_{P_a^{n}} \bullet E_{P_b^{n}}$.

To ensure that the A-Intersect operation will produce the pattern as required, the same object instance in the two copies should use the same aliasing class name when the expressions $E_{P_a^{n}}$ and $E_{P_b^{n}}$ are formulated.

Generally, if the new object instance of $C_k$ has multiple associations with object instances of several classes, the association pattern is decomposed into m patterns, where m is the maximum number of associations $C_k$ has with another class.

Since it has been shown that we can formulate algebraic expressions for all possible patterns in which object instances are associated or nonassociated, and the A-Union of these expressions forms a single expression for the subdatabase of interest, we have shown by induction that the A-algebra is complete. □

Figure 6.1 Two ways of forming new patterns: (a) the nth object is in $C_k$; (b) the nth object is in $C_j$
Figure 6.2 Two specific examples of new patterns: (a) the 12th object is in $C_7$; (b) the 12th object is in $C_6$
Figure 6.3 Equivalent patterns: (a), (b)
Figure 6.4 Decomposed patterns: (a), (b)
Figure 6.5 Other equivalent patterns: (a), (b)
Figure 6.6 New object instance having multiple associations with those of $C_{i-1}$: (a) two classes have multiple associations; (b) two objects have multiple associations in a pattern

CHAPTER 7
CONCLUSION

Object-oriented DBMSs and their underlying models exhibit several desirable features that are suitable for modeling and processing complex objects found in more advanced database applications. However, they still do not have a solid mathematical foundation. Such a foundation is important for the efficient manipulation of O-O databases and for the design of high-level query languages to ease the user's task in accessing and manipulating O-O databases.
In this dissertation, we have presented an algebra for O-O database processing based on the uniform representation of object instances and their associations in an O-O database: association patterns. Nine algebra operators have been introduced for manipulating patterns of both heterogeneous and homogeneous structures. The closure property of the algebra allows the result of an algebraic expression to be further processed by the algebra.

Several mathematical properties of the A-algebra operators have been studied and formally proven. Their utility in query decomposition and optimization has been demonstrated. The A-algebra is complete in the sense that all possible subdatabases that are derivable from an O-O database can be expressed in A-algebra expressions.

The A-algebra has been used in the design and implementation of a high-level object-oriented query language, OQL, for processing O-O databases [ALA89b, WU89]. A graphic interface for the language and a prototype knowledge base management system based on the O-O semantic association model OSAM* [SU86 and SU89] are presented in [DSO88, TY88, SU88, LAM89, PAN89, CHU90, SIN90].

REFERENCES

[AHO79] Aho, A.V., Beeri, C., and Ullman, J.D., "The Theory of Joins in Relational Databases," ACM Transactions on Database Systems 4:3, 1979, pp. 297-314.
[ALA89a] Alashqur, A.M., "A Query Model and Query and Knowledge Definition Languages for Object-oriented Databases," doctoral dissertation, University of Florida, 1989.
[ALA89b] Alashqur, A.M., Su, S.Y.W., and Lam, H., "OQL: A Query Language for Manipulating Object-oriented Databases," Proceedings of the 15th Intl. Conference on VLDB, Amsterdam, The Netherlands, 1989, pp. 433-442.
[ALA90] Alashqur, A.M., Su, S.Y.W., and Lam, H., "A Rule-based Language for Deductive Object-Oriented Databases," Proceedings of the 6th International Conference on Data Engineering, Los Angeles, CA, Feb. 5-9, 1990.
[ARM74] Armstrong, W.W., "Dependency Structures of Data Base Relationships," FDT: ACM, New York, 1974.
[AST76] Astrahan, M.M. and Chamberlin, D.D., "System R: a relational approach to data management," ACM Transactions on Database Systems 1:2, 1976, pp. 97-137.
[BAN87] Bancilhon, F., Briggs, T., Khoshafian, S., and Valduriez, P., "FAD, a Powerful and Simple Database Language," Proceedings of the 13th VLDB Conference, Brighton, 1987, pp. 97-105.
[BAN88] Banerjee, J., Kim, W., and Kim, K.C., "Queries in Object-oriented Databases," Proceedings of the 4th Intl. Conference on Data Engineering, Los Angeles, CA, 1988, pp. 31-38.
[BAT84] Batory, D.S. and Buchmann, A.P., "Molecular Objects, Abstract Data Types and Data Models: A Framework," Proceedings Intl. Conference on VLDB, 1984, pp. 172-184.
[BAT85] Batory, D., and Kim, W., "Modeling Concepts for VLSI CAD Objects," ACM Transactions on Database Systems, 10:3, 1985, pp. 322-346.
[BEE77] Beeri, C., Fagin, R., and Howard, J.H., "A Complete Axiomatization for Functional and Multivalued Dependencies," ACM SIGMOD International Symposium on Management of Data, Los Angeles, CA, 1977, pp. 47-61.
[CAR88] Carey, M.J., DeWitt, D.J., and Vandenberg, S.L., "A Data Model and Query Language for EXODUS," ACM-SIGMOD Conference 1988, pp. 413-423.
[CHU90] Chuang, H.S., "Operational Role Processing in a Prototype OSAM* KBMS," Master's thesis, University of Florida, 1990.
[COD70] Codd, E., "A Relational Model of Data for Large Shared Data Banks," CACM, 13:6, 1970, pp. 377-387.
[COD72a] Codd, E., "Relational Completeness of Database Sublanguages," in Data Base Systems (R. Rustin, ed.), Prentice-Hall Inc., Englewood Cliffs, NJ, 1972, pp. 65-98.
[COD72b] Codd, E.F., "Further Normalization of the Data Base Relational Model," in Data Base Systems (R. Rustin, ed.), Prentice-Hall, Englewood Cliffs, NJ, pp. 33-64.
[COD79] Codd, E.F., "Extending the Database Relational Model to Capture More Meaning," ACM Trans. on Database Systems, 4:4, 1979, pp. 262-294.
[COD90] Codd, E.F., The Relational Model for Database Management, Addison-Wesley, 1990.
[COL89] Colby, L.S., "A Recursive Algebra and Query Optimization for Nested Relations," ACM-SIGMOD Conference, Portland OR, 1989, pp. 273-283.
[DAH67] Dahl, O.J., Myhrhaug, B., and Nygaard, K., "SIMULA 67: Common Base Language," NCC Publ. S22, Norwegian Computing Center, Oslo, Norway, 1967.
[DEL78] Delobel, C., "Normalization and Hierarchical Dependencies in the Relational Data Model," ACM Transactions on Database Systems, 3:3, 1978, pp. 201-222.
[DSO88] D'Souza, G.T., "Graphic Semantic Data Definition Language and a Graphic Browser for the Object-oriented Semantic Association Model," Master's thesis, University of Florida, 1988.
[ELM89] Elmore, P., Shaw, G.M., and Zdonik, S.B., "The ENCORE Object-Oriented Data Model," tech. rep., Brown University, November, 1989.
[FAG77] Fagin, R., "Multivalued Dependencies and a New Normal Form for Relational Databases," ACM Transactions on Database Systems, 2:3, 1977, pp. 262-278.
[FIS87] Fishman, D.H., Beech, D., Cate, H.P., Chow, E.C., Connors, T., Davis, J.W., Derrett, N., Hoch, C.G., Kent, W., Lyngbaek, P., Mahbod, B., Neimat, M.A., Ryan, T.A., and Shan, M.C., "Iris: An Object-Oriented Database Management System," ACM Transactions on Office Information Systems, 5:1, 1987, pp. 49-69.
[GOL81] Goldberg, A., "Introducing the Smalltalk-80 System," Byte, Aug. 1981, pp. 14-26.
[HAL76] Hall, P.A.V., "Optimization of a Single Relational Expression in a Relational Database," IBM J. Research and Development 20:3, 1976, pp. 244-257.
[HAM81] Hammer, M. and McLeod, D., "Database Description with SDM: A Semantic Database Model," ACM TODS, 6:3, 1981, pp. 351-368.
[HOR87] Hornick, M.F. and Zdonik, S.B., "A Shared, Segmented Memory System for an Object-oriented Database System," ACM Transactions on Office Information Systems, 5:1, 1987, pp. 70-95.
[HUL87] Hull, R. and King, R., "Semantic Database Modeling: Survey, Applications, and Research Issues," ACM Computing Surveys, 19:3, 1987, pp. 201-260.
[KIM87] Kim, W., Banerjee, J., Chou, H.T., Garza, J.F., and Woelk, D., "Composite Object Support in an Object-oriented Database System," Proceedings of OOPSLA, FL, Oct. 4-8, 1987, pp. 118-125.
[KIN84] King, R., "Sembase: A Semantic DBMS," Proceedings of the First International Workshop on Expert Database Systems, Atlanta, GA, Oct. 1984, pp. 151-171.
[KLE67] Kleene, S.C., Mathematical Logic, John Wiley & Sons Inc., 1967.
[LAM89] Lam, H., Xia, D., Qiu, J., and Wu, P., "Prototype Implementation of an Object-oriented Knowledge Base Management System," to appear in the Proceedings of PROCIEM '89, Orlando, FL, Nov. 13-15, 1989.
[LEC88] Lecluse, C., Richard, P., and Velez, F., "O2, an Object-Oriented Data Model," ACM-SIGMOD Conference, Chicago IL, June 1-3, 1988, pp. 425-433.
[MAC85] MacGregor, R., "ARIEL--A Semantic Front-End to Relational DBMSs," Proceedings of VLDB 85, Atlanta, GA, April 1985, pp. 305-315.
[MAI86] Maier, D. and Stein, J., "Development of an Object-oriented DBMS," Proc. of OOPSLA '86 Conference, Portland OR, Sept. 29 - Oct. 2, 1986, pp. 472-482.
[MAN86] Manola, F. and Dayal, U., "PDM: An Object-Oriented Model," Int'l Workshop on Object-Oriented Database Systems, 1986, pp. 18-25.
[PAN89] Pant, S., "An Intelligent Schema Design Tool for OSAM*," Master's thesis, University of Florida, 1990.
[ROW87] Rowe, L.A. and Stonebraker, M.R., "The POSTGRES Data Model," Proceedings of the 13th VLDB Conference, Brighton, 1987, pp. 83-96.
[SER86] Servio Logic Development Corporation, Programming in OPAL, a Manual, Servio Logic Development Corporation, Beaverton, OR, 1986.
[SHA90] Shaw, G.M., and Zdonik, S.B., "A Query Algebra for Object-Oriented Databases," IEEE Trans. on Data Engineering, 12:3, 1990, pp. 154-162, Feb. 1990.
[SHI81] Shipman, D., "The Functional Data Model and the Data Language DAPLEX," ACM TODS, 6:1, 1981, pp. 140-173.
[SIN90] Singh, M., "Transaction Oriented Rule Processing in an Object-Oriented Knowledge Base Management System," Master's thesis, University of Florida, 1990.
[STO76] Stonebraker, M., Wong, E., Kreps, P., and Held, G., "The Design and Implementation of INGRES," ACM Transactions on Database Systems, 1:3, 1976, pp. 189-222.
[STO84] Stonebraker, M., Anderson, E., Hanson, E., and Rubenstein, B., "Quel as a Data Type," Proceedings of the 1984 ACM SIGMOD Conference on Management of Data, Boston, MA, June, 1984, pp. 208-214.
[SU86] Su, S.Y.W., "Modeling Integrated Manufacturing Data With SAM*," IEEE Computer, January, 1986, pp. 34-49.
[SU88] Su, S.Y.W., Lam, H., and Navathe, S.N., "An Object-oriented Computing Environment for Productivity Improvement in Automated Design and Manufacturing: Project Summary," PROCIEM '88, Orlando, FL, Nov. 14-15, 1988.
[SU89] Su, S.Y.W., Krishnamurthy, V., and Lam, H., "An Object-oriented Semantic Association Model (OSAM*)," A.I. Industrial Engineering and Manufacturing: Theoretical Issues and Applications (S. Kumara, A.L. Soyster, and R.L. Kashyap, eds.), The Institute of Industrial Engineering, Industrial Engineering and Management Press, Norcross, GA, 1989.
[TOD76] Todd, S.J.P., "The Peterlee Relational Test Vehicle -- A System Overview," IBM Systems J. 15:4, 1976, pp. 285-308.
[TSU84] Tsur, S. and Zaniolo, C., "An Implementation of GEM -- Supporting a Semantic Data Model on a Relational Back End," Proceedings of the ACM SIGMOD Intl. Conference on the Management of Data, Boston MA, June 18-21, 1984, pp. 286-295.
[TY88] Ty, Frederick, "The Design and Implementation of a Graphics Interface for an Object-oriented Language," Master's thesis, University of Florida, 1988.
[ULL82] Ullman, J.D., Principles of Database Systems, Computer Science Press, 1982.
[WOE86] Woelk, D., Kim, W., and Luther, W., "An Object-Oriented Approach to Multimedia Databases," ACM SIGMOD Conference Proceedings, Washington, D.C., May 1986, pp. 311-325.
[WON76] Wong, E. and Youssefi, K., "Decomposition -- A Strategy for Query Processing," ACM Transactions on Database Systems, 1:3, 1976, pp. 223-241.
[WU89] Wu, Ping, "Implementation Concepts for OSAM* Data Model and OQL language," Master's thesis, University of Florida, 1989.
[ZAN76] Zaniolo, C., "Analysis and Design of Relational Schemata for Database Systems," Doctoral Dissertation, UCLA, July, 1976.
[ZAN83] Zaniolo, C., "The Database Language GEM," Proceedings of the ACM SIGMOD Intl. Conference on the Management of Data, San Jose, CA, 1983.
[ZAN85a] Zaniolo, C., "The Representation and Deductive Retrieval of Complex Objects," Proceedings of VLDB, Stockholm, Sweden, 1985, pp. 458-469.
[ZAN85b] Zaniolo, C., Ait-Kaci, H., Beech, D., Cammarata, S., Kerschberg, L., and Maier, D., "Object-Oriented Database Systems and Knowledge Systems," in Expert Database Systems (Larry Kerschberg, ed.), Benjamin/Cummings Publishing, Menlo Park, CA, 1985, pp. 49-63.
[ZDO86] Zdonik, S.B., Skarra, A.H., and Reiss, S.P., "An Object Server for an Object-oriented Database System," International Workshop on Object-oriented Database Systems, Pacific Grove, CA, Sept. 1986.
[ZOO77] Zook, W., Youssefi, K., Whyte, N., Rubinstein, P., Kreps, P., Held, G., Ford, J., Berman, B., and Allman, E., INGRES Reference Manual, Dept. of EECS, Univ. of California, Berkeley, 1977.

APPENDIX

The formal proofs of the mathematical properties of the A-algebra operators are given below:

A. Commutativity

(1) $\alpha\ *[R(A,B)]\ \beta = \beta\ *[R(B,A)]\ \alpha$ (5.1)

Proof: If a pattern in $\alpha$ can be concatenated with a pattern in $\beta$
over an Inter-pattern $a_i b_j$, then the pattern in $\beta$ can be concatenated with that pattern in $\alpha$ over the Inter-pattern $b_j a_i$. Since patterns are non-directional, i.e., $a_i b_j = b_j a_i$, the left-hand side and the right-hand side of the equation produce the same result. On the other hand, if an $\alpha$ pattern cannot be concatenated with a $\beta$ pattern by the operation on the left-hand side, then the same $\beta$ pattern cannot be concatenated with that $\alpha$ pattern by the operation on the right-hand side. □

(2) $\alpha\ |[R(A,B)]\ \beta = \beta\ |[R(B,A)]\ \alpha$ (5.2)

Proof: Since a Complement-pattern is non-directional, if a Complement-pattern $\overline{a_i b_j}$ connects an $\alpha$ pattern with a $\beta$ pattern, these two patterns together with the Complement-pattern $\overline{a_i b_j}$ will all be retained in the results of the expressions on both sides of the equation. For the same reason, a new pattern which cannot be produced by the operation on the left-hand side of the equation cannot be produced by the operation on the right-hand side. □

(3) $\alpha\ ![R(A,B)]\ \beta = \beta\ ![R(B,A)]\ \alpha$ (5.3)

Proof: According to the connections between patterns of $\alpha$ and $\beta$ through some Inter-patterns, $\alpha$ and $\beta$ can each be decomposed into the A-Union of two subsets of patterns:

$\alpha = \alpha' + \alpha''$ and $\beta = \beta' + \beta''$,

where $\alpha'$ represents a subset of $\alpha$ patterns that can be concatenated with the $\beta$ patterns and $\alpha''$ represents a subset of $\alpha$ patterns that cannot be concatenated with $\beta$ patterns. The decomposition of $\beta$ can be interpreted similarly.

Assume that $\alpha''\beta''$ and $\beta''\alpha''$ are used to denote the new patterns produced by the NonAssociate operations on the left- and right-hand sides of the equation, respectively. Each of the new patterns consists of one $\alpha$ pattern, one $\beta$ pattern, and a Complement-pattern which connects the two.
By the definition of the NonAssociate operation, we have

left-hand side $= (\alpha' + \alpha'')\ ![R(A,B)]\ (\beta' + \beta'') = \begin{cases} \alpha'' & \text{if } \beta'' = \phi \\ \beta'' & \text{if } \alpha'' = \phi \\ \alpha''\beta'' & \text{otherwise} \end{cases}$

right-hand side $= (\beta' + \beta'')\ ![R(B,A)]\ (\alpha' + \alpha'') = \begin{cases} \alpha'' & \text{if } \beta'' = \phi \\ \beta'' & \text{if } \alpha'' = \phi \\ \beta''\alpha'' & \text{otherwise} \end{cases}$

Since a Complement-pattern is non-directional, i.e., $\alpha''\beta'' = \beta''\alpha''$, the commutativity holds for all cases. □

(4) $\alpha\ \bullet\{X\}\ \beta = \beta\ \bullet\{X\}\ \alpha$ (5.4)

Proof: If the Inner-patterns (object instances) of the classes specified in $\{X\}$ that are contained in an $\alpha$ pattern are common to a $\beta$ pattern, the new pattern, which is the intersection of the two patterns, will be produced by both sides of the equation. On the other hand, if an $\alpha$ pattern does not intersect with a $\beta$ pattern by the operation on the left-hand side of the equation, the same $\beta$ pattern will not intersect with that $\alpha$ pattern by the operation on the right-hand side. □

(5) $\alpha + \beta = \beta + \alpha$ (5.5)

Proof: Since the A-Union operation simply lumps patterns named by two operands into a single association-set and the patterns in an association-set are not ordered, both sides of the equation produce the same result. □

B. Associativity

(1) $(\alpha_{\{X\}}\ *[R(CL_1,CL_2)]\ \beta_{\{Y\}})\ *[R(CL_3,CL_4)]\ \gamma_{\{Z\}} = \alpha_{\{X\}}\ *[R(CL_1,CL_2)]\ (\beta_{\{Y\}}\ *[R(CL_3,CL_4)]\ \gamma_{\{Z\}})$ (5.6)
if $CL_3 \notin \{X\} \wedge CL_2 \notin \{Z\}$.

Proof: The associativity holds only under the stated condition. The condition states that $\alpha$ does not contain Inner-patterns of class $CL_3$ and $\gamma$ does not contain Inner-patterns (or object instances) of class $CL_2$, so that $\alpha$ will have no effect on the operation $*[R(CL_3,CL_4)]$ on the left-hand side and $\gamma$ will have no effect on the operation $*[R(CL_1,CL_2)]$ on the right-hand side.
Given that the above condition holds, $\alpha$, $\beta$ and $\gamma$ can be decomposed as follows:

a) $\alpha = \alpha' + \alpha'' + \alpha'''$, where $\alpha'$ represents a subset of $\alpha$ patterns which can be concatenated with a subset of $\beta$ patterns and thereafter be concatenated (through $\beta$ patterns) with a subset of $\gamma$ patterns, $\alpha''$ represents a subset of $\alpha$ patterns which can be concatenated with a subset of $\beta$ patterns which, however, cannot be concatenated with any $\gamma$ pattern, and $\alpha'''$ represents a subset of patterns which either does not have the Inner-patterns of $CL_1$ or cannot be concatenated with any $\beta$ pattern. Note that an $\alpha$ pattern may belong to both $\alpha'$ and $\alpha''$.

b) $\beta = \beta' + \beta'' + \beta''' + \beta''''$, where $\beta'$ can be concatenated with $\alpha$ and $\gamma$, $\beta''$ can be concatenated with $\alpha$ but not with $\gamma$, $\beta'''$ can be concatenated with $\gamma$ but not with $\alpha$, and $\beta''''$ cannot be concatenated with either $\alpha$ or $\gamma$. Note that patterns of $\beta'$, $\beta''$, $\beta'''$, and $\beta''''$ are mutually exclusive.

c) $\gamma = \gamma' + \gamma'' + \gamma'''$, where $\gamma'$, $\gamma''$, and $\gamma'''$ have interpretations similar to those of $\alpha'$, $\alpha''$, and $\alpha'''$, respectively.

If $\alpha'\beta'$, $\alpha''\beta''$, $\beta'\gamma'$, $\beta'''\gamma''$, and $\alpha'\beta'\gamma'$ are used to represent the results of the Associate operations, then according to the definition of Associate we have

left-hand side $= ((\alpha' + \alpha'' + \alpha''')\ *[R(CL_1,CL_2)]\ (\beta' + \beta'' + \beta''' + \beta''''))\ *[R(CL_3,CL_4)]\ (\gamma' + \gamma'' + \gamma''')$
$= (\alpha'\beta' + \alpha''\beta'')\ *[R(CL_3,CL_4)]\ (\gamma' + \gamma'' + \gamma''')$
$= \alpha'\beta'\gamma'$

right-hand side $= (\alpha' + \alpha'' + \alpha''')\ *[R(CL_1,CL_2)]\ ((\beta' + \beta'' + \beta''' + \beta'''')\ *[R(CL_3,CL_4)]\ (\gamma' + \gamma'' + \gamma'''))$
$= (\alpha' + \alpha'' + \alpha''')\ *[R(CL_1,CL_2)]\ (\beta'\gamma' + \beta'''\gamma'')$
$= \alpha'\beta'\gamma'$ □

(2) $(\alpha_{\{X\}}\ |[R(CL_1,CL_2)]\ \beta_{\{Y\}})\ |[R(CL_3,CL_4)]\ \gamma_{\{Z\}} = \alpha_{\{X\}}\ |[R(CL_1,CL_2)]\ (\beta_{\{Y\}}\ |[R(CL_3,CL_4)]\ \gamma_{\{Z\}})$ (5.7)
if $CL_3 \notin \{X\} \wedge CL_2 \notin \{Z\}$.
Proof: For reasons similar to those given in the discussion of the associativity of the * operator, $\alpha$, $\beta$, and $\gamma$ can be decomposed as follows:

a) $\alpha = \alpha' + \alpha'' + \alpha'''$, where $\alpha'$ can be connected to $\beta$ patterns by Complement-patterns and then be connected to $\gamma$ patterns, $\alpha''$ can be connected to $\beta$ patterns by Complement-patterns but cannot be further connected to $\gamma$ patterns, and $\alpha'''$ either has no Inner-patterns of $CL_1$ or cannot be connected to any $\beta$ pattern by Complement-patterns. Also, patterns of $\alpha'$ and $\alpha''$ may not be mutually exclusive.

b) $\beta = \beta' + \beta'' + \beta''' + \beta''''$, where $\beta'$ can be connected to $\alpha$ and $\gamma$ patterns by Complement-patterns, $\beta''$ can be connected to $\alpha$ patterns by Complement-patterns but not to $\gamma$ patterns, $\beta'''$ can be connected to $\gamma$ patterns by Complement-patterns but not to $\alpha$ patterns, and $\beta''''$ cannot be connected to the patterns of either $\alpha$ or $\gamma$. Also, patterns of $\beta'$, $\beta''$, $\beta'''$, and $\beta''''$ are mutually exclusive.

c) $\gamma = \gamma' + \gamma'' + \gamma'''$, where $\gamma'$, $\gamma''$, and $\gamma'''$ have interpretations similar to those of $\alpha'$, $\alpha''$, and $\alpha'''$, respectively.

Then, by the definition of the A-Complement operation, we have

left-hand side $= ((\alpha' + \alpha'' + \alpha''')\ |[R(CL_1,CL_2)]\ (\beta' + \beta'' + \beta''' + \beta''''))\ |[R(CL_3,CL_4)]\ (\gamma' + \gamma'' + \gamma''')$
$= (\alpha'\beta' + \alpha''\beta'')\ |[R(CL_3,CL_4)]\ (\gamma' + \gamma'' + \gamma''')$
$= \alpha'\beta'\gamma'$

right-hand side $= (\alpha' + \alpha'' + \alpha''')\ |[R(CL_1,CL_2)]\ ((\beta' + \beta'' + \beta''' + \beta'''')\ |[R(CL_3,CL_4)]\ (\gamma' + \gamma'' + \gamma'''))$
$= (\alpha' + \alpha'' + \alpha''')\ |[R(CL_1,CL_2)]\ (\beta'\gamma' + \beta'''\gamma'')$
$= \alpha'\beta'\gamma'$

where $\alpha'\beta'$, $\alpha''\beta''$, $\beta'\gamma'$, $\beta'''\gamma''$, and $\alpha'\beta'\gamma'$ represent the results of the A-Complement operations. □

(3) $(\alpha_{\{X\}}\ \bullet\{W_1\}\ \beta_{\{Y\}})\ \bullet\{W_2\}\ \gamma_{\{Z\}} = \alpha_{\{X\}}\ \bullet\{W_1\}\ (\beta_{\{Y\}}\ \bullet\{W_2\}\ \gamma_{\{Z\}})$ (5.8)
where $\{W_1 - W_2\} \cap \{Z\} = \phi \wedge \{W_2 - W_1\} \cap \{X\} = \phi$.

Proof: The condition ensures that the operation $\bullet\{W_1\}$ operates only on patterns of $\alpha$ and $\beta$ and $\bullet\{W_2\}$ operates only on patterns of $\beta$ and $\gamma$. The following figure shows four possible cases in which three patterns intersect with one another.
It should be clear that the associativity does not hold for case (d), because it violates the condition, i.e., the second A-Intersect operation operates on $\alpha$ and $\beta$. When the condition is true, the proof is similar to the proofs for the above two associative properties, i.e., by decomposing $\alpha$, $\beta$, and $\gamma$ accordingly.

[Figure: four cases, (a) through (d), of three intersecting patterns]

(4) $(\alpha + \beta) + \gamma = \alpha + (\beta + \gamma)$ (5.9)

Proof: Since the A-Union operation simply lumps two association-sets into one and the patterns in a set are not ordered, the order of performing A-Union operations on a number of association-sets has no effect on the final result. □

D. Distributivity

(1) $\alpha\ *[R(A,B)]\ (\beta + \gamma) = \alpha\ *[R(A,B)]\ \beta + \alpha\ *[R(A,B)]\ \gamma$ (5.15)

Proof: First, $\alpha$, $\beta$ and $\gamma$ can be decomposed as follows.

a) $\alpha = \alpha' + \alpha'' + \alpha'''$, where $\alpha'$ can be concatenated with $\beta$, $\alpha''$ can be concatenated with $\gamma$, and $\alpha'''$ cannot be concatenated with either $\beta$ or $\gamma$. Note that an $\alpha$ pattern may belong to both $\alpha'$ and $\alpha''$.

b) $\beta = \beta' + \beta''$, where $\beta'$ can be concatenated with $\alpha$ but $\beta''$ cannot.

c) $\gamma = \gamma' + \gamma''$, where $\gamma'$ can be concatenated with $\alpha$ but $\gamma''$ cannot.

By the definition of the Associate operation, we have

left-hand side $= (\alpha' + \alpha'' + \alpha''')\ *[R(A,B)]\ (\beta' + \beta'' + \gamma' + \gamma'')$
$= \alpha'\beta' + \alpha''\gamma'$

right-hand side $= (\alpha' + \alpha'' + \alpha''')\ *[R(A,B)]\ (\beta' + \beta'') + (\alpha' + \alpha'' + \alpha''')\ *[R(A,B)]\ (\gamma' + \gamma'')$
$= \alpha'\beta' + \alpha''\gamma'$ □

(2) $\alpha\ |[R(A,B)]\ (\beta + \gamma) = \alpha\ |[R(A,B)]\ \beta + \alpha\ |[R(A,B)]\ \gamma$ (5.16)

Proof: $\alpha$, $\beta$ and $\gamma$ can be decomposed as follows.

a) $\alpha = \alpha' + \alpha'' + \alpha'''$, where $\alpha'$ contains patterns that are connected to $\beta$ by Complement-patterns, $\alpha''$ contains patterns that are connected to $\gamma$ by Complement-patterns, and $\alpha'''$ cannot be connected to either $\beta$ or $\gamma$ by Complement-patterns. Note that an $\alpha$ pattern may belong to both $\alpha'$ and $\alpha''$.

b) $\beta = \beta' + \beta''$, where $\beta'$ can be connected to $\alpha$ by Complement-patterns but $\beta''$ cannot.

c) $\gamma = \gamma' + \gamma''$, where $\gamma'$ can be connected to $\alpha$ by Complement-patterns but $\gamma''$ cannot.
By the definition of the A-Complement operation, we have

left-hand side $= (\alpha' + \alpha'' + \alpha''')\ |[R(A,B)]\ (\beta' + \beta'' + \gamma' + \gamma'')$
$= \alpha'\beta' + \alpha''\gamma'$

right-hand side $= (\alpha' + \alpha'' + \alpha''')\ |[R(A,B)]\ (\beta' + \beta'') + (\alpha' + \alpha'' + \alpha''')\ |[R(A,B)]\ (\gamma' + \gamma'')$
$= \alpha'\beta' + \alpha''\gamma'$ □

(3) $\alpha\ \bullet\{X\}\ (\beta + \gamma) = \alpha\ \bullet\{X\}\ \beta + \alpha\ \bullet\{X\}\ \gamma$ (5.17)

Proof: $\alpha$, $\beta$ and $\gamma$ can be decomposed as follows.

a) $\alpha = \alpha' + \alpha'' + \alpha'''$, where $\alpha'$ intersects with $\beta$, $\alpha''$ intersects with $\gamma$, and $\alpha'''$ does not intersect with either $\beta$ or $\gamma$. Note that an $\alpha$ pattern may belong to both $\alpha'$ and $\alpha''$.

b) $\beta = \beta' + \beta''$, where $\beta'$ intersects with $\alpha$ but $\beta''$ does not.

c) $\gamma = \gamma' + \gamma''$, where $\gamma'$ intersects with $\alpha$ but $\gamma''$ does not.

By the definition of the A-Intersect operation, we have

left-hand side $= (\alpha' + \alpha'' + \alpha''')\ \bullet\{X\}\ (\beta' + \beta'' + \gamma' + \gamma'')$
$= \alpha'\beta' + \alpha''\gamma'$

right-hand side $= (\alpha' + \alpha'' + \alpha''')\ \bullet\{X\}\ (\beta' + \beta'') + (\alpha' + \alpha'' + \alpha''')\ \bullet\{X\}\ (\gamma' + \gamma'')$
$= \alpha'\beta' + \alpha''\gamma'$ □

(4) $\alpha_{\{X\}}\ *[R(CL_1,CL_2)]\ (\beta_{\{Y\}}\ \bullet\{W\}\ \gamma_{\{Z\}}) = \alpha_{\{X\}}\ *[R(CL_1,CL_2)]\ \beta_{\{Y\}}\ \bullet\{W \cup X\}\ \alpha_{\{X\}}\ *[R(CL_1,CL_2)]\ \gamma_{\{Z\}}$ (5.18)

(5) $\alpha_{\{X\}}\ |[R(CL_1,CL_2)]\ (\beta_{\{Y\}}\ \bullet\{W\}\ \gamma_{\{Z\}}) = \alpha_{\{X\}}\ |[R(CL_1,CL_2)]\ \beta_{\{Y\}}\ \bullet\{W \cup X\}\ \alpha_{\{X\}}\ |[R(CL_1,CL_2)]\ \gamma_{\{Z\}}$ (5.19)

The above two distributive properties hold when the following conditions are true: i) $CL_2 \in \{W\}$; ii) $X \cap Y = X \cap W = \phi$; and iii) $\alpha$ is a homogeneous association-set.

The first condition ensures that the operations Associate, A-Complement, and NonAssociate will operate on the common class of $\beta$ and $\gamma$, as shown in (a) of the following figure. Otherwise, the distributions of these operations to $\beta$ and $\gamma$ do not make sense, as shown in (b) and (c). The second condition ensures that $\alpha$ patterns do not intersect with any pattern of either $\beta$ or $\gamma$, so that the $\bullet\{X \cup W\}$ operations on the right-hand sides of the equations will examine the intersections on the portions of $\alpha$ and $\gamma$ separately. The third condition ensures that, on the right-hand sides of the equations, only those patterns that have the same $\alpha$ pattern will intersect and be retained in the result.

[Figure: cases (a), (b), and (c)]

We shall only give the proof of 5.18.
5.19 can be proved using the same technique.

When the conditions are true, $\alpha$, $\beta$ and $\gamma$ can be decomposed as follows.

a) $\alpha = \alpha' + \alpha'' + \alpha''' + \alpha''''$, where $\alpha'$ can be concatenated with $\beta$ and $\gamma$, $\alpha''$ can be concatenated with $\beta$ but not with $\gamma$, $\alpha'''$ can be concatenated with $\gamma$ but not with $\beta$, and $\alpha''''$ cannot be concatenated with either $\beta$ or $\gamma$. Note that $\alpha'$, $\alpha''$, $\alpha'''$, and $\alpha''''$ are mutually exclusive.

b) $\beta = \beta' + \beta'' + \beta''' + \beta''''$, where $\beta'$ can be concatenated with $\alpha$ and does intersect with $\gamma$, $\beta''$ can be concatenated with $\alpha$ but does not intersect with $\gamma$, $\beta'''$ cannot be concatenated with $\alpha$ but does intersect with $\gamma$, and $\beta''''$ can neither be concatenated with $\alpha$ nor intersect with $\gamma$. Note that $\beta'$, $\beta''$, $\beta'''$, and $\beta''''$ are also mutually exclusive.

c) $\gamma = \gamma' + \gamma'' + \gamma''' + \gamma''''$, where $\gamma'$, $\gamma''$, $\gamma'''$ and $\gamma''''$ have interpretations similar to those of $\beta'$, $\beta''$, $\beta'''$, and $\beta''''$, respectively.

By the definition of the operations of Associate and A-Intersect we have

left-hand side $= (\alpha' + \alpha'' + \alpha''' + \alpha'''')\ *[R(CL_1,CL_2)]\ ((\beta' + \beta'' + \beta''' + \beta'''')\ \bullet\{W\}\ (\gamma' + \gamma'' + \gamma''' + \gamma''''))$
$= (\alpha' + \alpha'' + \alpha''' + \alpha'''')\ *[R(CL_1,CL_2)]\ (\beta'\gamma' + \beta'''\gamma''')$

Since $CL_2 \in \{W\}$, $\beta'\gamma'''$ and $\beta'''\gamma'$ cannot be produced by the $\bullet\{W\}$ operator according to the decompositions of $\beta$ and $\gamma$. Otherwise, $\gamma'''$ (or $\beta'''$) must contain the same Inner-pattern of $CL_2$ as contained in $\beta'$ (or $\gamma'$) and must be able to concatenate with $\alpha$. Applying the distributive property 5.15, we obtain

$= \alpha'\ *[R(CL_1,CL_2)]\ \beta'\gamma' + \alpha'\ *[R(CL_1,CL_2)]\ \beta'''\gamma'''$
$+ \alpha''\ *[R(CL_1,CL_2)]\ \beta'\gamma' + \alpha''\ *[R(CL_1,CL_2)]\ \beta'''\gamma'''$
$+ \alpha'''\ *[R(CL_1,CL_2)]\ \beta'\gamma' + \alpha'''\ *[R(CL_1,CL_2)]\ \beta'''\gamma'''$
$+ \alpha''''\ *[R(CL_1,CL_2)]\ \beta'\gamma' + \alpha''''\ *[R(CL_1,CL_2)]\ \beta'''\gamma'''$

Based on the decompositions of $\alpha$, $\beta$, and $\gamma$, only the first item will produce new patterns and is retained.
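Hence,

left-hand side $= \alpha'\ *[R(CL_1,CL_2)]\ \beta'\gamma' = \alpha'\beta'\gamma'$

On the right-hand side of the equation we have

right-hand side $= ((\alpha' + \alpha'' + \alpha''' + \alpha'''')\ *[R(CL_1,CL_2)]\ (\beta' + \beta'' + \beta''' + \beta''''))\ \bullet\{X \cup W\}\ ((\alpha' + \alpha'' + \alpha''' + \alpha'''')\ *[R(CL_1,CL_2)]\ (\gamma' + \gamma'' + \gamma''' + \gamma''''))$
$= (\alpha'\beta' + \alpha'\beta'' + \alpha''\beta' + \alpha''\beta'')\ \bullet\{X \cup W\}\ (\alpha'\gamma' + \alpha'\gamma'' + \alpha'''\gamma' + \alpha'''\gamma'')$

Applying the distributive property 5.17, we have

right-hand side $= \alpha'\beta' \bullet\{X \cup W\} \alpha'\gamma' + \alpha'\beta' \bullet\{X \cup W\} \alpha'\gamma'' + \alpha'\beta' \bullet\{X \cup W\} \alpha'''\gamma' + \alpha'\beta' \bullet\{X \cup W\} \alpha'''\gamma''$
$+ \alpha'\beta'' \bullet\{X \cup W\} \alpha'\gamma' + \alpha'\beta'' \bullet\{X \cup W\} \alpha'\gamma'' + \alpha'\beta'' \bullet\{X \cup W\} \alpha'''\gamma' + \alpha'\beta'' \bullet\{X \cup W\} \alpha'''\gamma''$
$+ \alpha''\beta' \bullet\{X \cup W\} \alpha'\gamma' + \alpha''\beta' \bullet\{X \cup W\} \alpha'\gamma'' + \alpha''\beta' \bullet\{X \cup W\} \alpha'''\gamma' + \alpha''\beta' \bullet\{X \cup W\} \alpha'''\gamma''$
$+ \alpha''\beta'' \bullet\{X \cup W\} \alpha'\gamma' + \alpha''\beta'' \bullet\{X \cup W\} \alpha'\gamma'' + \alpha''\beta'' \bullet\{X \cup W\} \alpha'''\gamma' + \alpha''\beta'' \bullet\{X \cup W\} \alpha'''\gamma''$

Of the sixteen items, only the first one is retained. The rest of the items are dropped because they do not intersect either over classes in $\{X\}$ or over classes in $\{W\}$. Therefore,

right-hand side $= \alpha'\beta'\ \bullet\{X \cup W\}\ \alpha'\gamma' = \alpha'\beta'\gamma'$ □

E. Other Properties

(1) $\sigma_1(\sigma_2(\alpha)[P_2])[P_1] = \sigma_2(\sigma_1(\alpha)[P_1])[P_2] = \sigma(\alpha)[P_1 \wedge P_2]$ (5.26)

Proof: $\alpha$ can be decomposed into $\alpha' + \alpha'' + \alpha''' + \alpha''''$, where $\alpha'$ satisfies $P_1$ and $P_2$, $\alpha''$ only satisfies $P_1$, $\alpha'''$ only satisfies $P_2$, and $\alpha''''$ does not satisfy either $P_1$ or $P_2$.

$\sigma_1(\sigma_2(\alpha)[P_2])[P_1] = \sigma_1(\alpha' + \alpha''')[P_1] = \alpha'$
$\sigma_2(\sigma_1(\alpha)[P_1])[P_2] = \sigma_2(\alpha' + \alpha'')[P_2] = \alpha'$
$\sigma(\alpha)[P_1 \wedge P_2] = \alpha'$ □

(2) $\Pi(\sigma(\alpha)[P])[\mathcal{E};\mathcal{T}] = \sigma(\Pi(\alpha)[\mathcal{E};\mathcal{T}])[P]$ ($P \subseteq \mathcal{E}$) (5.27)

Proof: First, $\alpha$ is decomposed into $\alpha' + \alpha''$, where $\alpha'$ satisfies the selection condition but $\alpha''$ does not. Then, let $\beta'$ and $\beta''$ represent the results of the projection operation corresponding to $\alpha'$ and $\alpha''$, respectively. Since $P \subseteq \mathcal{E}$, $\beta'$ satisfies $P$ but $\beta''$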
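does not and we have

$\Pi(\sigma(\alpha)[P])[\mathcal{E};\mathcal{T}] = \Pi(\alpha')[\mathcal{E};\mathcal{T}] = \beta'$
$\sigma(\Pi(\alpha)[\mathcal{E};\mathcal{T}])[P] = \sigma(\beta' + \beta'')[P] = \beta'$ □

(3) $\sigma(\alpha\ *[R(A,B)]\ \beta)[P_1 \wedge P_2] = \sigma_1(\alpha)[P_1]\ *[R(A,B)]\ \sigma_2(\beta)[P_2]$

Proof: First, $\alpha$ is decomposed into $\alpha' + \alpha'' + \alpha''' + \alpha''''$, where $\alpha'$ and $\alpha''$ satisfy $P_1$ but $\alpha'''$ and $\alpha''''$ do not; and $\alpha'$ and $\alpha'''$ can be concatenated with some $\beta$ patterns but $\alpha''$ and $\alpha''''$ cannot. $\beta$ can be decomposed into $\beta' + \beta'' + \beta''' + \beta''''$ with a similar interpretation. Therefore, we have

$\sigma(\alpha\ *[R(A,B)]\ \beta)[P_1 \wedge P_2] = \sigma(\alpha'\beta' + \alpha'\beta''' + \alpha'''\beta' + \alpha'''\beta''')[P_1 \wedge P_2] = \alpha'\beta'$
$\sigma_1(\alpha)[P_1]\ *[R(A,B)]\ \sigma_2(\beta)[P_2] = (\alpha' + \alpha'')\ *[R(A,B)]\ (\beta' + \beta'') = \alpha'\beta'$ □

(4) $\sigma(\alpha\ *[R(A,B)]\ \beta)[P_1 \vee P_2] = \sigma(\alpha)[P_1]\ *[R(A,B)]\ \beta + \alpha\ *[R(A,B)]\ \sigma(\beta)[P_2]$ (5.31)
where $P_1$ and $P_2$ are applicable to $\alpha$ and $\beta$, respectively.

Proof: $\alpha$ and $\beta$ are decomposed as in the above proof. Thus, we have

$\sigma(\alpha\ *[R(A,B)]\ \beta)[P_1 \vee P_2] = \sigma(\alpha'\beta' + \alpha'\beta''' + \alpha'''\beta' + \alpha'''\beta''')[P_1 \vee P_2] = \alpha'\beta' + \alpha'\beta''' + \alpha'''\beta'$
$\sigma(\alpha)[P_1]\ *[R(A,B)]\ \beta + \alpha\ *[R(A,B)]\ \sigma(\beta)[P_2] = (\alpha' + \alpha'')\ *[R(A,B)]\ (\beta' + \beta'' + \beta''' + \beta'''') + (\alpha' + \alpha'' + \alpha''' + \alpha'''')\ *[R(A,B)]\ (\beta' + \beta'') = \alpha'\beta' + \alpha'\beta''' + \alpha'''\beta'$ □

(5) $\sigma(\alpha - \beta)[P] = \sigma(\alpha)[P] - \beta$ (5.34)

Proof: We decompose $\alpha$ into $\alpha' + \alpha'' + \alpha''' + \alpha''''$, where $\alpha'$ and $\alpha''$ satisfy $P$ but $\alpha'''$ and $\alpha''''$ do not; and $\alpha'$ and $\alpha'''$ contain $\beta$ patterns but $\alpha''$ and $\alpha''''$ do not. Then, we have

$\sigma(\alpha - \beta)[P] = \sigma(\alpha'' + \alpha'''')[P] = \alpha''$
$\sigma(\alpha)[P] - \beta = (\alpha' + \alpha'') - \beta = \alpha''$ □

(6) $\sigma(\alpha + \beta)[P] = \sigma(\alpha)[P] + \sigma(\beta)[P]$ (5.35)

Proof: Suppose $\alpha$ and $\beta$ are decomposed into subsets $\alpha'$ and $\alpha''$ and $\beta'$ and $\beta''$, respectively, where $\alpha'$ and $\beta'$ satisfy $P$ but $\alpha''$ and $\beta''$ do not. By the definition of the A-Select operation, we have

$\sigma(\alpha + \beta)[P] = \alpha' + \beta' = \sigma(\alpha)[P] + \sigma(\beta)[P]$ □

(7) $\sigma(\alpha + \beta)[P_1 \vee P_2] = \sigma_1(\alpha)[P_1] + \sigma_2(\beta)[P_2]$ (5.36)
where $P_1$ and $P_2$ are applicable to $\alpha$ and $\beta$, respectively.

Proof: Suppose $\alpha$ and $\beta$ are decomposed into subsets $\alpha'$ and $\alpha''$ and $\beta'$ and $\beta''$, respectively, where $\alpha'$ satisfies $P_1$ but $\alpha''$ does not and $\beta'$ satisfies $P_2$ but $\beta''$ does not.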
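By the definition of the A-Select operation, we have

$$\sigma(\alpha + \beta)[P_1 \vee P_2] = \alpha' + \beta' = \sigma(\alpha)[P_1] + \sigma(\beta)[P_2]$$ □

(8) $\Pi(\alpha + \beta)[\mathcal{E};\mathcal{T}] = \Pi(\alpha)[\mathcal{E};\mathcal{T}] + \Pi(\beta)[\mathcal{E};\mathcal{T}]$ (5.37)

Proof: Suppose that $\alpha$ and $\beta$ are decomposed into subsets $\alpha'$ and $\alpha''$ and $\beta'$ and $\beta''$, respectively, where $\alpha'$ and $\beta'$ contain subpatterns defined by $[\mathcal{E};\mathcal{T}]$ but $\alpha''$ and $\beta''$ do not. The results of the two A-Project operations on $\alpha$ and $\beta$ are therefore $\Pi(\alpha')[\mathcal{E};\mathcal{T}]$ and $\Pi(\beta')[\mathcal{E};\mathcal{T}]$, respectively. By the definition of the A-Project operation, we have

$$\Pi(\alpha + \beta)[\mathcal{E};\mathcal{T}] = \Pi(\alpha')[\mathcal{E};\mathcal{T}] + \Pi(\beta')[\mathcal{E};\mathcal{T}] = \Pi(\alpha)[\mathcal{E};\mathcal{T}] + \Pi(\beta)[\mathcal{E};\mathcal{T}]$$ □

(9) $(\alpha + \beta) - \gamma = (\alpha - \gamma) + (\beta - \gamma)$ (5.40)

Proof: $\alpha$ and $\beta$ are decomposed into subsets $\alpha'$ and $\alpha''$ and $\beta'$ and $\beta''$, respectively, where $\alpha'$ and $\beta'$ contain $\gamma$ patterns but $\alpha''$ and $\beta''$ do not. Thus, we have

$$(\alpha + \beta) - \gamma = \alpha'' + \beta'' = (\alpha - \gamma) + (\beta - \gamma)$$ □

(10) $\alpha \div_{\{W\}} (\beta + \gamma) = \alpha \div_{\{W\}} \beta \div_{\{W\}} \gamma$ (5.41)

Proof: By the definition of the A-Divide operation, on the left-hand side of the equation, an $\alpha$ pattern will be retained in the result if (a) it has Inner-patterns of classes in {W} and contains all patterns of $\beta$ and $\gamma$, or (b) the Inner-patterns of classes in {W} that an $\alpha$ pattern has are common to some other $\alpha$ patterns and these patterns together, denoted by $\alpha'$, contain all patterns of $\beta$ and $\gamma$. An $\alpha$ pattern (or the patterns in $\alpha'$) retained on the left-hand side of the equation will be retained after the first A-Divide operation on the right-hand side since it must contain all the $\beta$ patterns. It will also be retained in the final result after the second A-Divide operation since it must contain all the $\gamma$ patterns. □

(11) $(\alpha_{\{X\}} *[R(A,B)]\, \beta_{\{Y\}})\ |[R(C,D)]\ \gamma_{\{Z\}} = \alpha_{\{X\}} *[R(A,B)]\, (\beta_{\{Y\}}\ |[R(C,D)]\ \gamma_{\{Z\}}) \quad (C \notin \{X\} \text{ and } B \notin \{Z\})$ (5.42)

Proof: $\beta$ is decomposed into $\beta' + \beta'' + \beta''' + \beta'''' + \beta'''''$, where $\beta'$ and $\beta''$ can be concatenated with $\alpha$ patterns but $\beta'''$ and $\beta''''$ cannot; $\beta''$ and $\beta''''$ can be concatenated with $\gamma$ patterns by Complement-patterns but $\beta'$ and $\beta'''$ cannot; and $\beta'''''$ can be neither concatenated with $\alpha$ patterns nor concatenated with $\gamma$ patterns by Complement-patterns. $\alpha$ is decomposed into $\alpha' + \alpha''$, where $\alpha'$ can be concatenated with $\beta$ patterns but $\alpha''$ cannot. $\gamma$ is decomposed into $\gamma' + \gamma''$ with a similar interpretation. Thus, we have

$$(\alpha *[R(A,B)]\, \beta)\ |[R(C,D)]\ \gamma = (\alpha'\beta' + \alpha'\beta'')\ |[R(C,D)]\ \gamma = \alpha'\beta''\gamma'$$

$$\alpha_{\{X\}} *[R(A,B)]\, (\beta_{\{Y\}}\ |[R(C,D)]\ \gamma_{\{Z\}}) = \alpha *[R(A,B)]\, (\beta''\gamma' + \beta''''\gamma') = \alpha'\beta''\gamma'$$ □

(12) $(\alpha_{\{X\}} *[R(A,B)]\, \beta_{\{Y\}}) - \gamma_{\{Z\}} = (\alpha_{\{X\}} - \gamma_{\{Z\}}) *[R(A,B)]\, \beta_{\{Y\}} \quad (\{Y\} \cap \{Z\} = \emptyset)$ (5.43)

$\phantom{(12)\ (\alpha_{\{X\}} *[R(A,B)]\, \beta_{\{Y\}}) - \gamma_{\{Z\}}} = \alpha_{\{X\}} *[R(A,B)]\, (\beta_{\{Y\}} - \gamma_{\{Z\}}) \quad (\{X\} \cap \{Z\} = \emptyset)$

Proof: We shall prove the first case. The second case can be proved similarly. $\alpha$ is decomposed into $\alpha' + \alpha'' + \alpha''' + \alpha''''$, where $\alpha'$ and $\alpha''$ can be concatenated with $\beta$ patterns but $\alpha'''$ and $\alpha''''$ cannot; and $\alpha'$ and $\alpha'''$ contain $\gamma$ patterns but $\alpha''$ and $\alpha''''$ do not. $\beta$ is decomposed into $\beta' + \beta''$, where $\beta'$ can be concatenated with $\alpha$ patterns but $\beta''$ cannot. Since $\{Y\} \cap \{Z\} = \emptyset$,

$$(\alpha_{\{X\}} *[R(A,B)]\, \beta_{\{Y\}}) - \gamma_{\{Z\}} = (\alpha'\beta' + \alpha''\beta') - \gamma = \alpha''\beta'$$

$$(\alpha_{\{X\}} - \gamma_{\{Z\}}) *[R(A,B)]\, \beta_{\{Y\}} = (\alpha'' + \alpha'''') *[R(A,B)]\, (\beta' + \beta'') = \alpha''\beta'$$ □

(13) $(\alpha_{\{X\}} *[R(A,B)]\, \beta_{\{Y\}}) \bullet \gamma_{\{Z\}} = (\alpha_{\{X\}} \bullet \gamma_{\{Z\}}) *[R(A,B)]\, \beta_{\{Y\}} \quad (\{Y\} \cap \{Z\} = \emptyset \wedge A \in \{W\})$ (5.44)

$\phantom{(13)\ (\alpha_{\{X\}} *[R(A,B)]\, \beta_{\{Y\}}) \bullet \gamma_{\{Z\}}} = \alpha_{\{X\}} *[R(A,B)]\, (\beta_{\{Y\}} \bullet \gamma_{\{Z\}}) \quad (\{X\} \cap \{Z\} = \emptyset \wedge B \in \{W\})$

Proof: We only give the proof of the first case. The decompositions of $\alpha$, $\beta$, and $\gamma$ are as follows: $\alpha = \alpha' + \alpha'' + \alpha''' + \alpha''''$, where $\alpha'$ and $\alpha''$ can be concatenated with $\beta$ patterns and $\alpha'''$ and $\alpha''''$ cannot, and $\alpha'$ and $\alpha'''$ intersect with $\gamma$ patterns and $\alpha''$ and $\alpha''''$ do not; $\beta = \beta' + \beta''$, where $\beta'$ can be concatenated with $\alpha$ patterns and $\beta''$ cannot; $\gamma = \gamma' + \gamma''$, where $\gamma'$ intersects $\alpha$ patterns and $\gamma''$ does not. When $\{Y\} \cap \{Z\} = \emptyset$, patterns of $\beta$ and $\gamma$ do not intersect with each other and we have

$$(\alpha_{\{X\}} *[R(A,B)]\, \beta_{\{Y\}}) \bullet \gamma_{\{Z\}} = (\alpha'\beta' + \alpha''\beta') \bullet (\gamma' + \gamma'') = \alpha'\beta'\gamma'$$

$$(\alpha_{\{X\}} \bullet \gamma_{\{Z\}}) *[R(A,B)]\, \beta_{\{Y\}} = (\alpha'\gamma' + \alpha'''\gamma') *[R(A,B)]\, (\beta' + \beta'') = \alpha'\beta'\gamma'$$ □

Note that the left-hand side of 5.44 is in a distributive form of $*$ with respect to $\bullet$, but the distributive property cannot be applied because it requires that A be in both $\alpha$ and $\beta$ and $\gamma$ be a homogeneous association-set.

(14) $\alpha\ ![R(A,B)]\ (\beta + \gamma) = \alpha\ ![R(A,B)]\ \beta - \Pi(\alpha *[R(A,B)]\, \gamma)[\alpha] + \alpha\ ![R(A,B)]\ \gamma - \Pi(\alpha *[R(A,B)]\, \beta)[\alpha]$ (5.48)

where $\alpha$, $\beta$, and $\gamma$ are homogeneous association-sets.

Proof: $\alpha$ can be decomposed as $\alpha = \alpha' + \alpha'' + \alpha''' + \alpha''''$, where $\alpha'$ can be concatenated with $\beta$ by Inter-patterns but not with $\gamma$; $\alpha''$ can be concatenated with $\gamma$ by Inter-patterns but not with $\beta$; $\alpha'''$ can be concatenated with both $\beta$ and $\gamma$ by Inter-patterns; and $\alpha''''$ cannot be concatenated with $\beta$ or $\gamma$. $\alpha'$, $\alpha''$, $\alpha'''$, and $\alpha''''$ are mutually exclusive. $\beta$ is decomposed into $\beta' + \beta''$, where $\beta'$ can be concatenated with $\alpha$ but $\beta''$ cannot. $\gamma$ is decomposed into $\gamma' + \gamma''$ in the same way as $\beta$. By the definition of the NonAssociate operation we have

$$\text{left-hand side} = (\alpha' + \alpha'' + \alpha''' + \alpha'''')\ ![R(A,B)]\ (\beta' + \beta'' + \gamma' + \gamma'') =
\begin{cases}
\alpha''''\gamma'' & \text{if } \beta'' = \emptyset \\
\alpha''''\beta'' & \text{if } \gamma'' = \emptyset \\
\beta'' + \gamma'' & \text{if } \alpha'''' = \emptyset \\
\gamma'' & \text{if } \alpha'''' = \beta'' = \emptyset \\
\beta'' & \text{if } \alpha'''' = \gamma'' = \emptyset \\
\alpha'''' & \text{if } \beta'' = \gamma'' = \emptyset \\
\alpha''''\beta'' + \alpha''''\gamma'' & \text{otherwise}
\end{cases}$$

Since $\alpha *[R(A,B)]\, \beta = \alpha'\beta' + \alpha'''\beta'$, we have $\Pi(\alpha *[R(A,B)]\, \beta)[\alpha] = \alpha' + \alpha'''$; in the same way, $\Pi(\alpha *[R(A,B)]\, \gamma)[\alpha] = \alpha'' + \alpha'''$. Therefore, on the right-hand side

$$\alpha\ ![R(A,B)]\ \beta - (\alpha'' + \alpha''') =
\begin{cases}
\alpha'''' & \text{if } \beta'' = \emptyset \\
\beta'' & \text{if } \alpha'''' = \emptyset \\
\alpha''''\beta'' & \text{otherwise}
\end{cases}$$

$$\alpha\ ![R(A,B)]\ \gamma - (\alpha' + \alpha''') =
\begin{cases}
\alpha'''' & \text{if } \gamma'' = \emptyset \\
\gamma'' & \text{if } \alpha'''' = \emptyset \\
\alpha''''\gamma'' & \text{otherwise}
\end{cases}$$

Hence,

$$\text{right-hand side} = \alpha\ ![R(A,B)]\ \beta - (\alpha'' + \alpha''') + \alpha\ ![R(A,B)]\ \gamma - (\alpha' + \alpha''') =
\begin{cases}
\alpha''''\gamma'' & \text{if } \beta'' = \emptyset \\
\alpha''''\beta'' & \text{if } \gamma'' = \emptyset \\
\beta'' + \gamma'' & \text{if } \alpha'''' = \emptyset \\
\gamma'' & \text{if } \alpha'''' = \beta'' = \emptyset \\
\beta'' & \text{if } \alpha'''' = \gamma'' = \emptyset \\
\alpha'''' & \text{if } \beta'' = \gamma'' = \emptyset \\
\alpha''''\beta'' + \alpha''''\gamma'' & \text{otherwise}
\end{cases}$$ □

(15) $\alpha - (\beta + \gamma) = \alpha - \beta - \gamma$ (5.51)

Proof: By the definition of the A-Difference operation, the left-hand side of the equation retains $\alpha$ patterns that do not contain any pattern of $\beta$ or $\gamma$.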
On the right-hand side, the first A-Difference operation retains $\alpha$ patterns that do not contain any $\beta$ pattern, and then the second operation retains $\alpha$ patterns that do not contain any pattern of $\beta$ or $\gamma$. □

BIOGRAPHICAL SKETCH

The author has been a research assistant in the Database Systems Research and Development Center at the University of Florida since 1985, where he has been working towards the Ph.D. degree in electrical engineering. His research interests include semantic data modeling, query models for object-oriented databases, knowledge and rule representation and processing, query optimization, concurrency control, and parallel processing for O-O databases. In 1970, he received his B.S. degree in mathematics from Fudan University, Shanghai, China, where he was a faculty member of the Computer Center from 1970 to 1983. Between 1983 and 1985, he was a visiting scholar at the Database Systems Research and Development Center at the University of Florida, where he received his M.S. degree in electrical engineering in 1987.

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.
Stanley Y.W. Su, Chairman
Professor of Electrical Engineering

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.
Herman X. Lam, Cochairman
Associate Professor of Electrical Engineering

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.
Shamkant B. Navathe
Professor of Computer and Information Sciences

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.
Randy Y. C. Chow
Professor of Computer and Information Sciences

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.
John Staudhammer
Professor of Electrical Engineering

This dissertation was submitted to the Graduate Faculty of the College of Engineering and to the Graduate School and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy.

December, 1990

Winfred M. Phillips
Dean, College of Engineering

Madelyn M. Lockhart
Dean, Graduate School