A neutral data model for the design and implementation of a heterogeneous data model and schema translation system


Material Information

Title:
A neutral data model for the design and implementation of a heterogeneous data model and schema translation system
Physical Description:
vii, 151 leaves : ill. ; 29 cm.
Language:
English
Creator:
Fang, Suey-Chyun
Publication Date:

Subjects

Subjects / Keywords:
Electrical Engineering thesis, Ph.D   ( lcsh )
Dissertations, Academic -- Electrical Engineering -- UF   ( lcsh )
Genre:
bibliography   ( marcgt )
theses   ( marcgt )
non-fiction   ( marcgt )

Notes

Thesis:
Thesis (Ph. D.)--University of Florida, 1993.
Bibliography:
Includes bibliographical references (leaves 144-150).
Additional Physical Form:
Also available online.
Statement of Responsibility:
by Suey-Chyun Fang.
General Note:
Typescript.
General Note:
Vita.

Record Information

Source Institution:
University of Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 030326650
oclc - 30784220
System ID:
AA00025827:00001

Full Text









A NEUTRAL DATA MODEL FOR THE DESIGN AND IMPLEMENTATION OF A HETEROGENEOUS DATA MODEL AND SCHEMA TRANSLATION SYSTEM













By


SUEY-CHYUN FANG














A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY


UNIVERSITY OF FLORIDA 1993















Dedicated to



My parents TPun Lin and Yaw-Ming Fang who instilled within me the values to succeed in any endeavor. My wife Rouh-Iong Huang who patiently and lovingly provided support necessary for the completion of this effort. My children Shin-Chiao and Terry who challenged me to finish my degree program.














ACKNOWLEDGEMENTS


It is my pleasure to express my gratitude for the warm encouragement and support of my dissertation advisor, Prof. Stanley Y. W. Su. I also wish to thank the members of my dissertation committee, Prof. Herman Lam, Prof. Jose C. Principe, Prof. Sharma Chakravarthy, and Prof. Nabil Kamel for their prompt reading of this dissertation and insightful comments.

My special appreciation goes to my colleague Chong-Shi Zhang for his dedicated work on the implementation of the system. I thank all my colleagues on the ORECOM project including Ching-Shou Chen, Shashikant Garje, Rangarajan Uppilisrinivasan, Prakash Sreewastav, and Denver Williams (University of Central Florida) whose efforts have resulted in a working system. I am also grateful to our secretary, Sharon Grant, for her constant help during my research.

Finally, I acknowledge the financial support of the National Institute of Standards and Technology (Grant No. 60NANB7DO714) and the Florida High Technology and Industry Council (Grant No. UPN 90090708, 91092323, and 92110316).































TABLE OF CONTENTS

Page

ACKNOWLEDGEMENTS .......... iii

ABSTRACT .......... vi

CHAPTERS

1 INTRODUCTION .......... 1

1.1 Motivation .......... 1
1.2 General Approach .......... 4
1.3 Dissertation Organization .......... 5

2 SURVEY OF RELATED WORK .......... 7

2.1 Translation Capability .......... 7
2.2 Translation Approach .......... 8

3 AN APPROACH TO DATA MODEL ANALYSIS .......... 15

3.1 Semantic Decomposition .......... 15
3.2 Support of a Neutral Core Model .......... 16

4 THE OBJECT-ORIENTED RULE-BASED EXTENSIBLE CORE MODEL (ORECOM) .......... 19

4.1 Structural Primitives .......... 19
4.1.1 Object .......... 19
4.1.2 Association .......... 20
4.1.3 Class .......... 21
4.2 Behavioral Primitives .......... 25
4.2.1 Method Specification .......... 26
4.2.2 Rule Specification .......... 27

5 THE GENERAL CONSTRAINT TYPES (MACROS) .......... 31

5.1 Intra-class Constraints .......... 32
5.2 Inter-class and Inter-association Constraints .......... 34
5.2.1 PARTICIPATION (PT) .......... 35
5.2.2 CARDINALITY (CD) .......... 41
5.2.3 INHERITANCE (IH) .......... 47
5.2.4 PRIVACY (PV) .......... 50
5.2.5 TRANSITION (TS) .......... 53
5.2.6 MATHEMATICAL-DEPENDENCE (MD) .......... 56
5.2.7 LOGICAL-DEPENDENCE (LD) .......... 58

6 SEMANTIC DECOMPOSITION AND DATA MODEL ANALYSIS .......... 64

6.1 IDEF-1X .......... 64
6.2 EXPRESS .......... 67
6.3 NIAM .......... 71
6.4 OSAM* .......... 74

7 THE DATA MODEL AND SCHEMA TRANSLATION SYSTEM .......... 77

7.1 System Overview .......... 77
7.1.1 Subsystem 1: Data Model Translation .......... 78
7.1.2 Subsystem 2: Schema Translation .......... 80
7.2 System Architecture .......... 81
7.3 An Example of OSAM* to IDEF-1X Schema Translation .......... 83
7.4 System Implementation .......... 88
7.5 Applications .......... 89
7.5.1 Data Model Learning .......... 89
7.5.2 Schema Integration .......... 90
7.5.3 Schema Verification and Optimization .......... 91

8 CONCLUSIONS .......... 92

8.1 The Contributions of the Research .......... 93
8.2 Future Research .......... 95

APPENDICES

A DEFINITIONS OF MACROS .......... 96

B SEMANTIC DECOMPOSITION OF IDEF-1X, EXPRESS, NIAM, AND OSAM* .......... 104

C EQUIVALENCE MATRICES FOR DATA MODEL TRANSLATIONS OF IDEF-1X, EXPRESS, NIAM, AND OSAM* .......... 132

REFERENCES .......... 144

BIOGRAPHICAL SKETCH .......... 151






























Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy


A NEUTRAL DATA MODEL FOR THE DESIGN AND IMPLEMENTATION OF A HETEROGENEOUS DATA MODEL AND SCHEMA TRANSLATION SYSTEM

By

Suey-Chyun Fang

August, 1993


Chairman: Dr. Stanley Y. W. Su
Cochairman: Dr. Herman Lam
Major Department: Electrical Engineering



To achieve data sharing among heterogeneous database management systems, one essential step is the conversion of the schemata defined in the diverse data models used by these systems. A semantics-preserving translation of different modeling constructs and constraints is necessary to ensure that the semantics of applications remain intact after the translation. However, it is difficult to translate the constructs and constraints of one model directly into those of another model because (1) their terminologies and modeling constructs are often different and (2) being high-level, user-oriented models, their modeling constructs may have many implied semantic properties which may or may not correspond to those of the others. If these high-level constructs and constraints are decomposed into low-level, neutral, primitive semantic representations, then a system can be built to compare different sets of primitives in order to identify whether these constructs and constraints are identical, slightly different, or totally unrelated. Discrepancies among them can be explicitly specified by the primitive representations and used during application development to account for the missing semantics. This dissertation contributes to the development of an object-oriented rule-based extensible core model (ORECOM) which is used as a neutral, intermediate model through which diverse modeling constructs and constraints are translated.

The core model provides a small number of general structural constructs for representing the structural properties of high-level modeling constructs including those of object-oriented data models. It also provides a powerful knowledge rule specification language for defining those semantic properties not captured by the general structural constructs. The language is calculus-based and allows complex semantic constraints to be defined in terms of triggers and the alternative actions to be taken when some complex data conditions have or have not been satisfied. This dissertation also presents eight basic constraint types found in many semantically rich data models including IDEF-1X, EXPRESS, NIAM, and OSAM*. These constraint types are represented in the neutral model by parameterized macros and their corresponding semantic rules. Parameterized macros are compact forms of neutral semantic representations which are used in an implemented system for comparing and translating the modeling constructs and constraints of different data models. The corresponding semantic rules are the detailed semantic descriptions of the constraint types.

The modeling constructs and constraints of the above mentioned data models have been analyzed and decomposed into the macro- and micro-rule representations which are used in the design and implementation of a data model and schema translation system. In addition to schema translations, several applications of the neutral core model and the translation system including data model learning, schema integration, schema verification and optimization are also described.




















CHAPTER 1

INTRODUCTION


1.1 Motivation


Recent progress in computer networking technologies and fast-emerging nontraditional applications have created an increased need for an integrated or federated computing system to achieve more coordination and interoperability of heterogeneous systems [GUP89, LIT90, SHE90, THO90, HSI92a, HSI92b]. The main feature of such an integrated or federated computing system is that the data and schemata of all component systems can be shared and interchanged among multiple divisions of an enterprise or multiple enterprises. This feature has been shown to be increasingly important for supporting the business activities of present and future organizations. It is critical for effectively reducing product or application development costs, helping to strengthen an organization's capabilities, and thereby increasing the competitiveness of its products in the world market.

In an integrated or federated computing environment, the data and schemata have usually been independently developed for supporting different aspects of applications using different computing and database systems. The main barrier to effective data and schema sharing is the rapid proliferation and heterogeneity of data models and data management systems. For example, in a manufacturing environment, data used to support various design, process planning, and manufacturing activities are often defined by different data models and stored and processed by different file management systems, database management systems, and in-house software of all kinds. They have different logical as well as physical structures and are processed by different application programs which run on different hardware systems and operating systems. To share the data generated by








different activities of the entire product life cycle, from product design, analysis, process planning, and manufacturing to sales and service, a heterogeneous database management system (HDBMS) needs to be developed to run on top of the existing data management facilities used by a manufacturing enterprise. According to the three-layered structure of a HDBMS [ELM90], a HDBMS must be able to perform the following functions to achieve interoperability: (1) convert different data models and query languages, (2) translate and integrate schemata, and (3) process global as well as local transactions. Above all, it must be able to translate the logical (schema) and physical (data) representations of the data residing on one component system into the representation suitable for use by another component system. The translation of logical schemata is essential for achieving data sharing and improving interoperability among heterogeneous component systems. Other problems and solutions related to many issues of interoperability among heterogeneous systems have been well addressed and documented [HEI85, LIT86, NSF89, SHE90, ALO91, GAN91, KEN91, URB91, KRI91, BAR92, BEN92].

Although there has been much work on the research and development of logical schema translation, most of the existing work has been limited to traditional data models (e.g., hierarchical, network, and relational models) [BIL79, VAS80, CAR80a,b, LAR83, HWA83, etc.]. In today's complex and nontraditional applications such as CAD/CAM, office automation, knowledgebases, multimedia data management, integrated manufacturing, and so on, the translation becomes more difficult because the schemata of different applications are defined by much more complex high-level object-oriented and semantic models such as EXPRESS [SCH89, ISO92], NIAM [VER82], IDEF-1X [LOO86, LOO87], OSAM* [SU89], and others surveyed in [HUL87, PEC88]. Each of these models has different modeling constructs (e.g., entity or class, object or instance, relation or association, etc.) for modeling the structural properties of an application. These constructs often have a number of semantic constraints associated with them. They are either explicitly specified by some keywords or rules (e.g., primary key, non-null, total







participation, one-to-one mapping, etc.) or implicitly represented by the constructs (e.g., all instances of a subtype entity are also members of a supertype entity). Furthermore, some models, especially object-oriented models (e.g., OSAM* and EXPRESS), also allow the encapsulation of user-defined operations and rules for modeling the behavioral properties of an application. Different data models may also name their constructs and constraints differently even though they may represent the same semantic properties. Others may have the same names but refer to totally different properties. In addition, constructs and constraints of different models may share similar but not identical properties. All of these phenomena are referred to as the data inconsistency problem, also called the domain mismatch problem [BRE90] or the representational heterogeneity problem [GAN91]. These data inconsistencies or semantic discrepancies create great difficulties in data conversion, query translation, schema integration as well as transaction execution in heterogeneous systems. They need to be fully accounted for when data of one system are accessed by another system so that no semantic properties will be lost. Thus, the analysis of the semantic properties underlying the existing modeling constructs and the identification of the semantic similarities and differences of these constructs are the fundamental problems that need to be solved before the true interoperability of heterogeneous database systems can be achieved.

Motivated by the above problems, we have studied a number of popular data models in an attempt to identify the underlying semantic properties of their modeling constructs and their associated constraints. The objective is to use their common properties and differences as a basis for the design and development of a neutral data model through which the modeling constructs and constraints of one model can be converted into those of another, and semantic discrepancies, if they exist, can be identified and specified explicitly. These discrepancies can be used for the generation of explanations and can be used by application developers, who use the converted schemata and databases, for incorporation into the new application systems so that no semantic loss will result. If these discrepancies







can be specified in enough detail, they can also be used for automatic generation of a program code which can be incorporated into the new application systems.
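The comparison of primitive representations sketched above can be illustrated in code. This is a hypothetical sketch: the primitive names and the three-way classification are invented for illustration and are not the dissertation's actual representation.

```python
# Hypothetical comparison of two modeling constructs after semantic
# decomposition: each construct is reduced to a set of primitive semantic
# properties, and the sets are compared to classify the degree of match.
def compare_constructs(a: set[str], b: set[str]) -> str:
    """Classify two primitive-property sets as identical, overlapping, or unrelated."""
    if a == b:
        return "identical"
    if a & b:
        return "overlapping"  # shared primitives; discrepancies are a ^ b
    return "unrelated"

# Illustrative primitive sets (names are invented for this sketch):
idef1x_identifying_rel = {"association", "total-participation", "key-migration"}
express_entity_attr = {"association", "single-valued"}

print(compare_constructs(idef1x_identifying_rel, express_entity_attr))  # overlapping
print(sorted(idef1x_identifying_rel ^ express_entity_attr))
```

The symmetric difference in the last line is exactly the set of discrepancies that would need to be reported to the application developer to account for missing semantics.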



1.2 General Approach

Our main approach to solving the data inconsistency problem is first to develop a neutral core model which provides some primitive modeling constructs and then to decompose the complex semantic properties of high-level modeling constructs and constraints into the primitive constructs of the core model. The objective of the decomposition is to precisely capture the structural and behavioral properties captured by the existing data models in order to explicitly identify and represent the similarities and differences among these data models. A core model, called ORECOM, has been developed to serve this purpose. It is an object-oriented, rule-based, extensible core model which provides a few very general structural constructs and a powerful rule specification facility for specifying primitive semantic properties. The high-level modeling constructs of the existing data models can then be decomposed and represented by ORECOM's primitive structural constructs and semantic rules. By comparing their low-level neutral representations, the semantic discrepancies of high-level constructs can thus be identified and explicitly specified by semantic rules. In ORECOM, these rules are called micro-rules, which allow the specification of database operations (the trigger conditions) under which certain detailed database states need to be verified to determine the alternative database operations to be taken based on the result of the verification. A calculus-based language is used as the rule specification language. For the convenience of expressing the semantic constraint types frequently found in data models and for avoiding the repeated specifications of these constraint types in terms of detailed micro-rules, we use parameterized macros to represent them. Each macro captures a generic constraint type (e.g., cardinality) and its variations are specified by the parameters of the macro. Thus, a








macro corresponds to sets of micro-rules which define the constraint type and its variations. In this dissertation (Chapter 6), examples of high-level modeling constructs taken from several semantically rich data models such as IDEF-1X, NIAM, EXPRESS, and OSAM* are used to show how they can be mapped into the underlying ORECOM macro- and micro-rule representations through semantic decompositions. The first three models have been used by many industrial companies for product specification and exchange. They are of great interest to the industrial community involved in STEP¹ and IGES/PDES² activities in developing an international standard. OSAM* is an object-oriented semantic association model developed at the University of Florida. A knowledgebase management system, OSAM*.KBMS, has been implemented based on this data model for supporting advanced data/knowledge base applications.
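The correspondence between a parameterized macro and its micro-rules can be sketched as follows. This is a hypothetical encoding for illustration only: the rule fields, the textual conditions, and the `cardinality_macro` function are invented here and are not ORECOM's actual syntax.

```python
from dataclasses import dataclass

@dataclass
class MicroRule:
    """A trigger-condition-action rule, in the spirit of ORECOM's micro-rules."""
    trigger: str    # the database operation that fires the rule
    condition: str  # the database state to verify
    action: str     # the alternative action if the condition fails

def cardinality_macro(assoc: str, cls: str, lo: int, hi: int) -> list[MicroRule]:
    """Expand a CARDINALITY-style macro into its micro-rules (hypothetical)."""
    return [
        MicroRule(
            trigger=f"insert link into {assoc}",
            condition=f"count({assoc} links of each {cls} object) <= {hi}",
            action="reject the insertion",
        ),
        MicroRule(
            trigger=f"delete link from {assoc}",
            condition=f"count({assoc} links of each {cls} object) >= {lo}",
            action="reject the deletion",
        ),
    ]

rules = cardinality_macro("advises", "Professor", 1, 5)
print(len(rules))  # one micro-rule per triggering operation
```

The point of the sketch is that the macro is a compact parameterized form, while the micro-rules carry the detailed trigger-condition-action semantics.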

The major contributions of this research are (1) the introduction of an object-oriented, rule-based, extensible core model (ORECOM) which is used to represent in a neutral form the semantic properties of diverse high-level modeling constructs and constraints, (2) the formalization of some basic constraint types (i.e., macros) which provide the needed semantic foundation for data model analysis, and (3) the development of a semantics-preserving schema translation technique and system.


1.3 Dissertation Organization


The rest of this dissertation is organized as follows. Chapter 2 surveys the related work in the schema translation area with emphasis on two aspects: translation capability and translation approach. In Chapter 3, we present the approach for data model analysis based on the concept of semantic decomposition and the use of a neutral core model. In Chapter 4, we describe the structural primitives and behavioral primitives of the neutral core model ORECOM. The macro representations of basic constraint types are described in Chapter 5. The use of ORECOM macros in performing data model analysis of IDEF-1X, EXPRESS, NIAM, and OSAM* is given in Chapter 6. In addition to the data model analysis, other applications of ORECOM are also discussed. In Chapter 7, we present the design and implementation of a schema translation system which uses ORECOM as a neutral model. Finally, Chapter 8 gives our conclusions for this research and outlines future directions.

¹STEP (the Standard for the Exchange of Product Model Data) is a project under development by the ISO Technical Committee 184 (TC184)/Sub-Committee 4 (SC4).
²IGES (Initial Graphics Exchange Specification) and PDES (Product Data Exchange Specification) are U.S.-based contributors to the development of STEP.














CHAPTER 2
SURVEY OF RELATED WORK

In this chapter, a survey of some related work on data model and schema translation is given. The survey and discussion concentrate on two aspects: translation capability and translation approach. On translation capability, we are concerned with the data models and the directions of translation that are supported by a translation system. On translation approach, we compare the direct approach with the indirect approach to translation. The results of our survey are summarized in Table 2.1.


2.1 Translation Capability

Most of the existing work on data model and schema translation was developed in the early 1980s and focused on transformations among the traditional data models only (i.e., hierarchical, network, and relational models). Some of this work can transform in both directions, i.e., schemata expressed in any one of the three models into those of the others [BIL79, CAR80a,b, POT84, GRA84, JAC85, BRE86, CAR87]. Some can perform only specific transformations, for example, between the relational and network models [DEV80, KAT80, LAR83], or from the hierarchical and network models to the relational model [VAS80, HWA83].

In recent years, the fast-growing demand for database support for complex and non-traditional applications such as CAD/CAM, computer-integrated manufacturing (CIM), office automation, and multimedia systems has spurred the introduction of a number of new high-level data models, such as the semantic data models exemplified by E-R [CHE76], EER [ELM89], E2/R [OZK90], NIAM [VER82], and EXPRESS [SCH89, ISO92], and the object-oriented data models used in Iris [FIS87] and O2 [BAN88]. It has also motivated











more research on data model and schema translation. For example, in the work of Elmasri and Navathe [ELM89], E-R and EER models can be transformed to one of the traditional models. Similarly, the translation between the E2/R model and the traditional models has also been studied [OZK90]. Bidirectional translations between the E-R model and the traditional models [SEN86] and between EXPRESS and the extended E-R model [SAN92] are available. Translation in a heterogeneous database system involving the functional data model [SHI81] and the traditional models has been reported [DEM87,88, HSI89]. To name a few more, there are transformations from NIAM to EXPRESS [CHA92], from NIAM to ACS [FLY85], from Iris to the relational model [LYN87], from NIAM to the relational model [LEU88], and from E-R to the relational and network models [BRI85]. However, to our knowledge, there has been little attempt to translate schemata defined by high-level, semantically rich data models such as IDEF-1X, EXPRESS, NIAM, and OSAM*. Furthermore, most of the existing work does not handle bi-directional transformations. None of the existing translation systems can be easily extended or modified to include the semantics introduced in new models.


2.2 Translation Approach


As far as the translation approach is concerned, a model/schema translation can be performed either directly or indirectly. In the direct translation approach illustrated by Figure 2.1(a), the translation of one model to another is carried out by applying a dedicated translation algorithm which is designed to satisfy only the conversion requirements of a particular pair of models. Therefore, it cannot be used for the translation of a different pair. For instance, in Figure 2.1(a), the algorithm for transforming model A to model C cannot be used for other transformations (e.g., from A to B, from A to D, or from C to A). Consequently, the mapping between E-R and the relational model as described in [VAS80] needs two different algorithms, one for each direction. Similarly, the mappings between E-R, EER, or E2/R and the relational, hierarchical, or network model [ELM89, OZK90], Iris








and the relational model [LYN87], NIAM and the relational model [LEU88], NIAM and ACS [FLY85], the relational model and the network model [LAR83], etc., all need different translation or mapping algorithms. There is no uniform approach to support the algorithm development for different transformations.



[Figure omitted: (a) shows models A, B, C, and D connected pairwise by dedicated translation algorithms; (b) shows the same models connected through a common intermediate representation by transformation rules.]

Figure 2.1 Two different translation approaches.




The indirect translation approach, on the other hand, avoids the development of a large number of unsharable translation algorithms by using a common intermediate model. Instead of transforming a source model to a target model directly, it divides the translation into two steps. In the first step, the source model is converted into an intermediate representation, which is then converted into the target model in the second step. Clearly, only two sets of algorithms are needed, one for each step of the translation. With these two sets of algorithms, any source model can be transformed to any target model without additional algorithms. The concept of indirect translation is illustrated in Figure 2.1(b).
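The two-step structure can be sketched as follows. This is a minimal illustration, not the dissertation's system: the registry layout, the `primitives(...)` string encoding, and the function names are all invented placeholders.

```python
# Hypothetical indirect translation: every model registers exactly two
# algorithms -- a decomposer (model -> neutral representation) and a
# composer (neutral representation -> model).
decomposers = {
    "IDEF-1X": lambda schema: {"neutral": f"primitives({schema})"},
    "EXPRESS": lambda schema: {"neutral": f"primitives({schema})"},
}
composers = {
    "IDEF-1X": lambda neutral: f"idef1x<{neutral['neutral']}>",
    "EXPRESS": lambda neutral: f"express<{neutral['neutral']}>",
}

def translate(schema: str, source: str, target: str) -> str:
    """Translate via the intermediate model: two steps, no pairwise algorithms."""
    neutral = decomposers[source](schema)  # step 1: source -> neutral core
    return composers[target](neutral)      # step 2: neutral core -> target

print(translate("Employee", "EXPRESS", "IDEF-1X"))  # idef1x<primitives(Employee)>
```

Adding a new model to this registry means adding one decomposer and one composer; every existing translation path then works unchanged.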

For a translation system that handles a few data models, a direct translation can be a good approach from the performance point of view because of its dedicated translation algorithms. However, for a large system containing many different models, an indirect








translation approach is more appropriate. There are at least two significant advantages. (1) Low complexity -- For N data models, N*(N-1) dedicated translation algorithms have to be developed and implemented in the direct approach, while only 2N algorithms are needed for the indirect approach. The complexity of the direct approach is O(N²) as compared with O(N) for the indirect approach [SEN86]. (2) High extensibility -- To add a new model to a translation system that can handle schema translation of N models, only two new translation algorithms need to be added using the indirect approach, one for each direction between the new model and the intermediate model. In the case of direct translation, 2N new algorithms are needed. The indirect approach is obviously more advantageous than the direct approach.
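A quick arithmetic check of the two counts above (the helper names are mine, for illustration):

```python
def direct_algorithms(n: int) -> int:
    # one dedicated algorithm per ordered pair of distinct models: N*(N-1)
    return n * (n - 1)

def indirect_algorithms(n: int) -> int:
    # two per model: to and from the intermediate representation
    return 2 * n

for n in (4, 10, 20):
    print(n, direct_algorithms(n), indirect_algorithms(n))

# Cost of adding one model to an existing N-model system:
n = 10
print(direct_algorithms(n + 1) - direct_algorithms(n))      # 2N = 20 new algorithms
print(indirect_algorithms(n + 1) - indirect_algorithms(n))  # 2 new algorithms
```

Even at N = 10 the direct approach already needs 90 algorithms versus 20, and the gap grows quadratically.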

To use the indirect translation approach, an important decision to be made is the selection of an intermediate model. In the work we surveyed, many existing or specially designed data models have been used as the intermediate model. The E-R model in HDDBMS [CAR80a,b, 87], the functional model in Multibase [SMI81], the ECR in DDTS [DEV80], the IDEF-1X in IISS [IIS83], the attribute-based model in MLDS [DEM87,88, HSI89], the OSAM* in IMDAS [KIR88], the canonical model in PRECI [DEE81], and the EXPRESS in SUMM [FUL92] are just some examples. Other representations are also possible. For example, first-order logic is used by Jacobs [JAC85] and Grant [GRA84], a Logical Data Definition Language (LDDL) is developed by Biller [BIL79], a hypergraph is considered by Morgenstern [MOR83], and mathematically precise definitions are proposed by Klug [KLU78]. The selection of an intermediate model is important because it not only determines the capability of a translation system but also affects the correctness of the result. For example, if a data model has some constructs or constraints that cannot be captured by the selected intermediate model, then their semantic properties cannot be preserved in the translation. For achieving correct transformations, an effective mechanism is also needed to handle semantic discrepancies between the source and target constructs. It has to first identify the differences and represent them precisely in some form, such as rules, that








can be used for generating explanations or a program code to account for the discrepancies. Otherwise, semantically lossless translations cannot be guaranteed.

We can categorize the existing intermediate models into two groups: high-level data models and low-level data models. Using a high-level (usually semantic or object-oriented) data model as the intermediate model can be viewed as a "super model" approach. This approach aims to incorporate the modeling concepts of the existing data models in the intermediate model so that it can "subsume" all the existing data models. There are several examples of this approach, such as the E-R model [POT84, CAR80a,b, 87], the functional model [SMI81], the XER model [MAR83], and EXPRESS [ALM92, CLE92, FUL92]. However, such a super model is generally difficult to achieve and is impractical for the following reasons. First, it is difficult for a single model to incorporate in a "seamless" manner all the constructs and constraints of the existing data models, which have different complexities and incompatible features. Second, the resulting model, even if it could be developed, would be too complicated to use. Furthermore, it is virtually impossible to anticipate and accommodate the changing requirements for modeling application data (i.e., new semantic properties) even if a complete set of modeling constructs and constraints could be defined based on the existing models.

Using a low-level data model as the intermediate model seems to be a more suitable approach. A low-level data model is one that provides a set of primitive, general, and neutral constructs. For precise semantic specification, the high-level modeling constructs of the existing data models can be decomposed into and represented by these primitive constructs, and their discrepancies can be identified by matching the sets of low-level constructs. The work of Grant [GRA84] and Jacobs [JAC85] is a good example: they developed a database logic for data model and database transformations. The IM model [SEN86] and the attribute-based model [DEM87,88, HSI89] are two more examples of using primitive constructs and operations to handle data model transformations. However, these intermediate models were used to capture the semantic properties of the traditional and







functional data models. They are not adequate for capturing the complex structural and behavioral properties of new object-oriented data models.

As a general conclusion of this survey, we believe that (1) the indirect approach of model/schema translation is less complicated and more extensible, and (2) a low-level model is necessary to support the analysis of complex semantic properties of high-level data models and to effectively handle the semantic discrepancies found in the modeling constructs and constraints of the existing data models.








Table 2.1 Related work on data model and schema translation.


Work                        Source        Target        Direction  Direct/    Intermediate       Applications
                                                                   Indirect   Model
-------------------------------------------------------------------------------------------------------------
Ozkarahan [OZK90]           E2/R          R,H,N         1-D        Direct     --                 multi-model database system
Elmasri & Navathe [ELM89]   ER, EER       R,H,N         1-D        Direct     --                 model comparison & database design
Vassiliou & Lochovsky       H, N          R             1-D        Direct     --                 transaction translation
  [VAS80]
Katz [KAT80]                R             N             2-D        Direct     --                 model translation & database design
Sen [SEN86]                 R,H,N,ER      R,H,N,ER      2-D        Indirect   IM Data Model      schema translation
Demurjian & Hsiao           R,H,N,F       R,H,N,F       2-D        Indirect   attribute-based    heterogeneous databases
  [DEM87,88, HSI89]                                                           model
Potter [POT84]              R,H,N         R,H,N         2-D        Indirect   ER                 multi-model schema design
Smith [SMI81]               R,N,file      R,N,file      2-D        Indirect   functional/DAPLEX  database integration (Multibase)
Devor [DEV80]               R,N           R,N           2-D        Indirect   ECR                database integration (DDTS)
Jacobs & Grant              R,H,N         R,H,N         2-D        Indirect   database logic     database conversion & view integration
  [JAC85] [GRA84]
Navathe & Cheng [NAV83]     EER           H             1-D        Direct     --                 schema translation
Hwang & Dayal [HWA83]       H, N          R             1-D        Indirect   ER/ERL             multi-model database system
Dumpala & Arora [DUM83]     R,H,N         ER            2-D        Direct     --                 multi-model database system

(R : relational   H : hierarchical   N : network   F : functional   1-D : unidirectional   2-D : bidirectional)








Table 2.1 (continued).


Work                        Source        Target         Direction  Direct/    Intermediate       Applications
                                                                    Indirect   Model
--------------------------------------------------------------------------------------------------------------
Biller [BIL79]              R,H,N         R,H,N          2-D        Indirect   LDDL               data translation
Flynn & Laender [FLY85]     NIAM          ACS            1-D        Direct     --                 schema comparison
Cardenas [CAR80a,b,87]      R,H,N         R,H,N          2-D        Indirect   ER+DIAM            heterogeneous database system (HD-DBMS)
Breitbart [BRE86]           R,H,N         R,H,N          2-D        Indirect   extended           heterogeneous database system (ADDS)
                                                                               relational
Morgenstern [MOR83]         R,H,N,        R,H,N,         2-D        Indirect   hypergraph         schema translation
                            entity-based  entity-based
Deen [DEE80,81]             R,N,Adabas    R,N,Adabas     2-D        Indirect   canonical          heterogeneous databases (PRECI)
                                                                               data model
Lyngbaek & Vianu [LYN87]    Iris          R              1-D        Direct     --                 schema comparison & optimization
Leung & Nijssen [LEU88]     NIAM          R              1-D        Direct     --                 database design
Briand et al. [BRI85]       ER            R, N           1-D        Direct     --                 database design
Larson [LAR83]              R             N              2-D        Direct     --                 multi-model database system
Margrave et al. [MAR83]     ER            IMS, IDMS,     1-D        Indirect   XER                database design
                                          ADABAS, TOTAL

(R : relational   H : hierarchical   N : network   F : functional   1-D : unidirectional   2-D : bidirectional)














CHAPTER 3
AN APPROACH TO DATA MODEL ANALYSIS AND TRANSLATION



As concluded from the previous chapter, it is difficult to translate the constructs and constraints of one model directly into those of another model because 1) their terminologies and modeling constructs are often different and 2) being high-level, user-oriented models, their modeling constructs may carry many implied semantic properties which may or may not correspond to those of the others. If these high-level constructs and constraints are decomposed into low-level, neutral, primitive semantic representations, then a system can be built to compare different sets of primitives in order to identify whether these constructs and constraints are identical, slightly different, or totally unrelated. Discrepancies among them can be explicitly specified by the primitive representations and used in application development to account for the missing semantics. To support the above semantic decomposition of high-level data models and to facilitate the translation of the schemata defined on these data models, a neutral low-level core model is required. This model must be very primitive so that it can be used to capture the structural and behavioral properties of other high-level data models. In this chapter, we shall first present the concept of semantic decomposition and then discuss the required characteristics of the neutral data model.


3.1 Semantic Decomposition

For the convenience of database users, all existing data models are designed in such a way that they provide a number of high-level structural constructs for defining the structural properties (attributes and relationships) of real world entities. Associated with these constructs, there are a number of constraints (keys, non-null, total participation,








cardinality, etc.) which are either explicitly specified by some reserved keywords or implicitly specified in the structures for expressing various semantic restrictions that should be enforced by DBMSs. Because these modeling constructs are high-level and user-oriented, they often carry a lot of semantics which may or may not be equivalent to those in the modeling constructs of another data model. For example, the association between two entity types in the relational model is expressed by means of a cross reference between keys and foreign keys, and the referential constraint can be implemented in a DBMS using different deletion rules, whereas in the entity-relationship (ER) model [CHE76] the association is explicitly modeled by a relationship which implicitly uses cascaded deletion for implementing the referential constraint. As another example, the generalization construct or superclass-subclass association between two entity types in all object-oriented data models implies the property of inheritance, yet their inheritance models can be rather different and thus have different implied object insertion and deletion behaviors. Due to these and many other types of differences, a direct translation between the modeling constructs and constraints of different data models is not workable, since many fine semantic distinctions cannot be captured and there will be semantic losses or additions in the translation. In order to achieve a semantic-preserving translation, it is necessary to decompose a modeling construct and its associated constraints into low-level structural and behavioral primitives which can then be used for comparing the modeling constructs and constraints of different models and for explicitly representing their discrepancies.
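The comparison step described above can be sketched in code. The following Python fragment is illustrative only: the primitive names (e.g., "deletion-rule:cascade") are hypothetical stand-ins, not ORECOM's actual primitives, and the sets model the decomposed semantics of a relational key/foreign-key reference versus an ER relationship.

```python
# Hypothetical sketch: comparing two modeling constructs after semantic
# decomposition into sets of low-level primitives.

def compare_constructs(primitives_a, primitives_b):
    """Classify two constructs by comparing their primitive decompositions."""
    a, b = set(primitives_a), set(primitives_b)
    if a == b:
        return "identical", set()
    if a & b:
        # The symmetric difference is the discrepancy that must be
        # accounted for (e.g., by rules or generated explanations).
        return "slightly different", a ^ b
    return "unrelated", a | b

# A relational key/foreign-key reference vs. an ER relationship:
relational_ref = {"binary-association", "referential-integrity",
                  "deletion-rule:restrict"}
er_relationship = {"binary-association", "referential-integrity",
                   "deletion-rule:cascade"}

verdict, discrepancy = compare_constructs(relational_ref, er_relationship)
print(verdict)      # slightly different
print(discrepancy)  # the two deletion rules that differ
```

The point of the sketch is only that once constructs are decomposed into neutral primitives, "identical," "slightly different," and "unrelated" become set comparisons, and the discrepancy set is exactly what a translator must explain or compensate for.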


3.2 Support of a Neutral Core Model

The existing user-oriented data models use different terminologies to name the things of interest in an application world (e.g., entities, objects, concepts, tuples, etc.) and the collections of these things (e.g., entity types, object types or classes, concept types, relations, etc.). They also use different terms to specify the structural relationships among








these collections (e.g., attributes, instance variables, associations, relationships, bridges, links, etc.). In order to map the structural constructs to a neutral representation, the neutral model should adopt a general terminology to name things and their relationships. It should also provide a number of very primitive structural constructs so that the basic structural properties of high-level data models can all be represented by these primitives. Those semantic properties in these high-level constructs that are not captured by these structural primitives can be specified using a knowledge specification language. The separation of structural primitives from detailed semantic specifications using a rule language is important since the former provides a common structural representation and the latter can be used to explicitly state the different semantic properties associated with different high-level constructs.

In any database management system, the semantics of the modeling constructs and constraints of a data model can be stated in terms of what the DBMS should do when data defined by these constructs and constraints are retrieved and manipulated. In other words, their semantic properties can be defined by the conditions under which certain actions need to be taken by the DBMS. For example, part of the meaning of a key attribute is that, upon the insertion of an entity instance (or a tuple of a relation) or the update of a key attribute value, the DBMS needs to verify that its key attribute value is different from those of the other instances. As another example, part of the semantics of a superclass-subclass association is that, upon the deletion of a superclass object, the corresponding object instances in all its subclasses in the class hierarchy or lattice need to be deleted also. Thus, the semantics of data models can be defined in terms of the "operational semantics" of a DBMS rather than the semantics of words and languages that linguists and philosophers are interested in.
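As a minimal sketch of this "operational semantics" view, the key-uniqueness behavior just described can be expressed as a check a DBMS runs before an insertion. All class, attribute, and exception names here are illustrative assumptions, not part of any model described in this dissertation.

```python
# Sketch: part of the meaning of a key attribute, expressed as a
# "before insert" check rather than as a declarative keyword.

class UniqueKeyViolation(Exception):
    pass

class EntityClass:
    def __init__(self, key_attr):
        self.key_attr = key_attr
        self.instances = []

    def insert(self, instance):
        # Before-insert behavior: the new key value must differ from the
        # key values of all existing instances of the class.
        new_key = instance[self.key_attr]
        if any(i[self.key_attr] == new_key for i in self.instances):
            raise UniqueKeyViolation(f"duplicate key {new_key!r}")
        self.instances.append(instance)

emp = EntityClass(key_attr="Eid")
emp.insert({"Eid": 1, "Name": "Smith"})
try:
    emp.insert({"Eid": 1, "Name": "Jones"})   # violates the key constraint
except UniqueKeyViolation as e:
    print("rejected:", e)
```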

For the purpose of data model and schema translation, the neutral model should be "low-level" enough to allow the subtle semantic differences among data models to be distinguished, yet "high-level" enough so that the comparison between the neutral







representations of the modeling constructs or constraints of any two data models can be easily carried out by a translation system.

In the neutral model to be described in the next chapter, we adopt the object-oriented paradigm for structural representation due to its generality, and we use a calculus-based knowledge rule specification language which uses triggers and patterns of object associations for the specification of the operational semantics associated with modeling constructs and constraints.














CHAPTER 4
THE OBJECT-ORIENTED RULE-BASED EXTENSIBLE CORE MODEL (ORECOM)


ORECOM stands for an object-oriented, rule-based, and extensible core model. It is a low-level neutral data model designed for inter-model and schema translation. Its object orientation allows all things of interest (e.g., physical objects, abstract things, events, processes, functions, etc.) to be represented uniformly in terms of objects and object associations. Its rule specification facility allows complex semantic constraints found in the existing data models to be explicitly specified by knowledge rules with triggers. Its extensible feature allows new semantic properties to be introduced into ORECOM to account for possible semantic extensions of the existing models or the introduction of new models. The extensible feature is achieved by extending the model of the model (or the meta-model) using knowledge rules and methods. It has been reported by Yaseen [YAS91a,b] and will not be addressed in this dissertation. The above features allow the various semantic properties (structural and behavioral) of data models to be explicitly defined in terms of ORECOM's modeling primitives.


4.1 Structural Primitives


4.1.1 Object


An object is the atomic unit for modeling an application world. It can represent a physical entity such as a building, a computer, or an employee; an abstract concept such as a number, a project, or a business process; an event such as an employee working on a project; or anything of interest to an application. Two general types of objects are distinguished in ORECOM: self-naming objects and system-named objects. Self-naming objects are those identified by their values, such as integer or real numbers, character







strings, or some structures such as a set, bag, list, or array of atomic data items used for defining other complex self-naming objects or system-named objects. System-named objects are those of interest to the application users, such as employees, parts, projects, etc. They are uniquely identified by system-assigned object identifiers (OIDs) and are described in terms of self-naming and/or system-named objects. This distinction is commonly made in data models, e.g., the lexical and non-lexical objects in NIAM, and domain and entity objects in OSAM*.

Another way of distinguishing objects is by their structures, i.e., simple objects and complex objects. A simple object represents a single thing or concept that cannot be further decomposed into other objects with respect to an application world. A complex object, on the other hand, is composed of other simple or complex objects with a structure such as set, list, bag, array, tuple, etc., or their mixture. For example, a list of five to ten employees can be defined to be a complex object with the structure "list[5,10]", and an array of four such complex objects (e.g., array[-2,1] of list[5,10] of employee) can be further formed as another complex object.
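A structure such as list[5,10] carries a cardinality constraint along with its shape. A small illustrative sketch (the constructor and names are assumptions, not ORECOM syntax) shows how such a bounded structure can be enforced when a complex object is built:

```python
# Sketch: a constructor for a "list[lo,hi]" complex-object structure that
# enforces the cardinality bounds on construction.

def make_bounded_list(lo, hi):
    def build(items):
        items = list(items)
        if not (lo <= len(items) <= hi):
            raise ValueError(
                f"list[{lo},{hi}] requires {lo}..{hi} elements, got {len(items)}")
        return items
    return build

# list[5,10] of employee, as in the example above:
team = make_bounded_list(5, 10)
print(len(team([f"emp{i}" for i in range(7)])))  # 7 -- a valid complex object
```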


4.1.2 Association


An association is a bi-directional link connecting two objects. It specifies that the pair of objects are structurally related. However, the specific semantic properties between them, such as a system-named object's relationship with a self-naming object (or a domain value) through an attribute, a superclass object's relationship with its representation in a subclass, or a system-named object's relationship with another system-named object, etc., are not represented by the association. In ORECOM, the semantic properties of an association link are specified by micro-rules (to be described later). This separation of the general structural relationship from the specific semantic properties allows heterogeneous data modeling constructs to be mapped to the neutral structural representation and their semantic similarities and/or differences to be explicitly represented by micro-rules. An n-ary association where n > 2 is represented by n binary associations, and the semantics of the n-ary relationship is again captured by rules.

An association in ORECOM is labeled by an association name as well as the direction of the association. For example, that a company object (c) hires an employee object (e) is specified by "c *>Hires e", where the symbol "*" represents an association, ">" indicates the direction of the association from c to e, and "Hires" is the association name. This association can also be equivalently represented as "e *<Hires c". Note that "c *<Belongs-to e" and "c *>Hires e" are two associations with different meanings between the same pair of objects.


4.1.3 Class


An object class is an abstraction of, or a type specification for, a collection of object instances which share some common structural and behavioral properties. Object instances of a class are designated uniquely by instance identifiers (i.e., IIDs), each of which is a concatenation of a class identifier (i.e., CID) and an object identifier (i.e., OID). Using this identification scheme, the same real world object, which can be identified in ORECOM by its OID, can have instances in different classes which can be identified by their IIDs. Corresponding to the distinction of basic object types made in many existing data models, ORECOM categorizes object classes into two general types: entity-class and domain-class. An entity-class (or E-class) defines the structural and behavioral properties of a set of system-named objects. It also serves as the "container" or "holder" of a set of object instances which are the data representations of the collection of objects in that class. In ORECOM, an E-class is defined in terms of a class name, a number of associations with








other classes, a set of method specifications (or signatures) and their implementations for defining the "procedural semantics" implemented in program code, and a set of micro-rules for defining semantic constraints applicable to its objects. A skeletal class definition of an Employee class is given in Figure 4.1. A domain-class (or D-class) in ORECOM defines a set of self-naming objects (e.g., simple self-naming objects such as integers, reals, etc., or complex self-naming objects such as a list of integers, an array of reals, etc.). It specifies the data type and, optionally, some constraints of a set of simple or structured values which are used for representing the instances of E-class objects. The values of a D-class are not explicitly entered and stored in a database. They are contained in the instances of E-class objects. In Figure 4.1, Eid, E_Salary, and Works_for are class association names which connect the E-class Employee to the D-classes Integer and Salary and to another E-class Company, respectively. An instance of Employee consists of an Eid value, an E_Salary value, and an instance reference (IID) to a company object. We note that the traditional concepts of "attributes" and "entity associations/relationships" are uniformly represented in structure by "associations" in ORECOM. Similarly, the relationship between a superclass and a subclass recognized in object-oriented models (the same relationship is called a generalization or categorization in some semantic models), say, between class Person and class Employee, is represented by a class association linking Person to Employee. As we pointed out before, the detailed semantic properties of different association types such as "attribute," "entity association," and "superclass/subclass or generalization association" are captured in ORECOM by micro-rules.

When two object classes are associated with each other as defined in a schema, their objects and thus object instances may also be associated with each other through the same association. For example, the class association "Company *>Hires Employee" implies that there can be an object association "c *>Hires e" for some c in class Company and some e in class Employee.
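The relationship between class-level and object-level associations can be sketched as a schema that licenses instance links of the same name. The data structures and names below are illustrative assumptions, not ORECOM's implementation:

```python
# Sketch: a class association (Company *>Hires Employee) licenses object
# associations of the same name between instances of those classes.

schema = {("Company", "Hires", "Employee")}   # class associations

def associate(obj_assocs, src_cls, src, name, tgt_cls, tgt):
    """Add an object association, but only if the schema defines the
    corresponding class association."""
    if (src_cls, name, tgt_cls) not in schema:
        raise ValueError(f"no class association {src_cls} *>{name} {tgt_cls}")
    obj_assocs.add((src, name, tgt))

links = set()
associate(links, "Company", "c1", "Hires", "Employee", "e1")   # allowed
try:
    associate(links, "Company", "c1", "Rents", "Employee", "e1")
except ValueError as e:
    print(e)   # rejected: not defined in the schema
```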













ENTITY-CLASS Employee
ASSOCIATIONS
    Eid       : Integer;
    E_Salary  : Salary;
    Works_for : Company;
METHODS
    DisplayData();
    SetSalary();
IMPLEMENTATION
    DisplayData() is begin ......... end;
    SetSalary()  is begin ......... end;
MICRO-RULES
    rule Emp001 is
        triggered before this.DISASSOCIATE(Works_for, c:Company)
        condition .....
        action .....
        otherwise .....
    rule Emp002 is .........
END Employee;


Figure 4.1. Example of an entity-class Employee.



Based on the primitive structural constructs (objects, object associations, classes, and class associations) described above, the modeling constructs of many high-level data models can be represented in ORECOM's neutral representation. For example, Figure 4.2 shows the graphical representations of the constructs used in EXPRESS, IDEF-1X, NIAM, OSAM*, SDM [HAM81, THO89], and OMT [RUM91] for modeling the superclass-subclass relationship between Employee and Secretary. Although these data models use different terminologies and graphical symbols to represent their constructs, the underlying common structural properties can be specified by the concepts of the E-classes Employee and Secretary and their class association. In addition to the structural primitives, a set of semantic constraints representing the implied meaning of the superclass-subclass











[Figure: the Employee/Secretary superclass-subclass relationship drawn in the notations of EXPRESS, IDEF-1X, NIAM, OSAM*, SDM, and OMT, and its ORECOM representation as two entity-classes connected by a basic association. Each of the following implied constraints is captured by operations and micro-rules:
    - A secretary inherits all the properties of an employee, including
      attributes, operations, and rules.
    - An employee may or may not be a secretary.
    - A secretary must also be an employee.
    - There is a 1-1 correspondence between an employee and his/her
      role as a secretary.]


Figure 4.2. Semantic decomposition of different models to ORECOM's neutral
representation.





relationship can be explicitly defined by a set of methods and micro-rules which correspond to the constraint statements shown in the figure for capturing the behavioral properties of these two classes and their association. By translating high-level modeling constructs into ORECOM's general structural primitives and detailed constraint rules, subtle differences








among these constructs can be uncovered. For example, in IDEF-1X, it is required to specify a discriminator for the superclass-subclass relationship between Employee and Secretary. The discriminator is an attribute of the superclass whose value determines which subclass a superclass object should be a member of. This requirement does not exist in the other models mentioned above. It can be defined in terms of the insertion behavior followed by a DBMS.
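The discriminator's insertion behavior can be sketched as follows. The attribute name "role", the role-to-subclass mapping, and the dictionary-based store are all illustrative assumptions, not taken from IDEF-1X or this dissertation:

```python
# Sketch of IDEF-1X-style discriminator semantics on insert: the value of
# one superclass attribute decides which subclass the object also joins.

SUBCLASS_BY_ROLE = {"secretary": "Secretary", "engineer": "Engineer"}

def insert_employee(db, emp, discriminator="role"):
    db.setdefault("Employee", []).append(emp)
    subclass = SUBCLASS_BY_ROLE.get(emp.get(discriminator))
    if subclass is not None:
        # The same real-world object also becomes an instance of the
        # subclass selected by the discriminator value.
        db.setdefault(subclass, []).append(emp)

db = {}
insert_employee(db, {"Eid": 1, "role": "secretary"})
print(sorted(db))   # ['Employee', 'Secretary']
```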


4.2 Behavioral Primitives


Object behavior means how an object performs an operation (or a method) upon other objects as well as how it reacts in response to an operation. Thus, the behavioral semantics of an application world can be defined by the operations that can be performed on objects and the rules that these objects and their operations have to obey. The traditional data models such as the relational, network, and hierarchical models provide only some structural constructs and very limited constraint specification facilities (e.g., keywords such as Keys, Non-Null, domain constraints, etc.). They do not capture the behavioral properties of objects as object-oriented models do in terms of operations or methods which represent the "procedural semantics" of objects. In the traditional DBMSs, this type of semantic property is either implemented in application programs or hard-coded in the DBMSs. More recent data models such as the ones used in ODE [AGR89, GEH91], OSAM* [SU89], STARBURST [LOH91], and POSTGRES [STO91] provide high-level rule specification facilities for defining different types of semantic constraints. Thus, in order to accommodate these existing data models and their translations, a neutral data model like ORECOM needs to provide facilities for method and rule specifications. There are five primitive system-defined object operations and a rule specification language provided in ORECOM. They are described in the following sections.








4.2.1 Method Specification


As in other object-oriented data models, each system-predefined or user-defined object class in ORECOM may contain a number of method specifications, each of which defines an operation that can be performed on its objects. Each method specification has a name, a number of parameters associated with the operation, and optionally a returned value (e.g., DisplayData and SetSalary in Figure 4.1). The activation of an operation on an object is carried out by message passing. Methods can be either system-defined or user-defined. System-defined methods are object operations that are common to all objects; they are defined in some system-predefined classes and inherited by user-defined classes. User-defined methods are object operations that are application-specific and applicable only to the objects of the user-defined classes in which these operations are specified and their subclasses. Corresponding to each method specification, there is a method implementation containing the program code that carries out the operation.

Since user-defined methods are application dependent and the procedural semantics captured by them are expressed in program code, there is no known method for analyzing the programs to capture the semantics of the implemented algorithms. In a data model or schema translation system, programs that implement user-defined methods therefore have to be converted, either manually or automatically, to programs that suit the target application system. In this work, we shall consider only the system-defined operations that manipulate a database. We shall present the system-defined methods of ORECOM in the remainder of this section. We use the dot notation "x.op" to specify that the object operation op is performed on the object x. There are five system-defined object operations supported in ORECOM: CREATE, DESTROY, ASSOCIATE, DISASSOCIATE, and READ. These operations correspond to the structural primitives of objects and object associations discussed before. The operation x.CREATE establishes the object x in the class where the operation is executed. Its inverse operation, x.DESTROY, terminates the membership of x in that class. For establishing an object association between two objects x and y, the associate operation x.ASSOCIATE(α, y) is used. Here, α specifies the name of the association. The inverse of ASSOCIATE is DISASSOCIATE, as in x.DISASSOCIATE(α, y). We note here that, since many types of relationships found in the existing data models are modeled uniformly in ORECOM as objects and class associations, data manipulation operations such as update, insert, and delete are modeled by ASSOCIATE and DISASSOCIATE operations in ORECOM. For example, updating an attribute value of an object can be represented by disassociating the object from the old value (a D-class object) and associating the object with the new value. Inserting an object instance is represented by creating the object and associating the object with a number of D-class and/or E-class objects through different association names (i.e., attributes). The inverse operations can be done for the deletion operation. The last system-defined operation, READ, which is represented as x*α, is used for specifying the retrieval of all objects that are associated with x through the association α.
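The five operations, and the modeling of an attribute update as a DISASSOCIATE of the old value followed by an ASSOCIATE of the new one, can be sketched with a small in-memory store. The class and attribute names are illustrative only:

```python
# Illustrative in-memory sketch of the five system-defined operations:
# CREATE, DESTROY, ASSOCIATE, DISASSOCIATE, and READ.

class Store:
    def __init__(self):
        self.objects = set()
        self.links = set()          # (x, association_name, y) triples

    def create(self, x):
        self.objects.add(x)

    def destroy(self, x):
        self.objects.discard(x)
        self.links = {l for l in self.links if x not in (l[0], l[2])}

    def associate(self, x, a, y):
        self.links.add((x, a, y))

    def disassociate(self, x, a, y):
        self.links.discard((x, a, y))

    def read(self, x, a):
        """All objects associated with x through association a."""
        return {y for (s, n, y) in self.links if s == x and n == a}

db = Store()
db.create("e1")
db.associate("e1", "E_Salary", 30000)      # insert an attribute value
db.disassociate("e1", "E_Salary", 30000)   # update = disassociate old value ...
db.associate("e1", "E_Salary", 35000)      # ... then associate the new one
print(db.read("e1", "E_Salary"))           # {35000}
```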


4.2.2 Rule Specification


Constraints captured by high-level data models are used by DBMSs to control the data retrieval and manipulation operations so that these constraints are enforced and the database integrity is maintained. They can be explicitly defined in terms of knowledge rules which specify the operational behaviors of objects and their associations. In ORECOM, these knowledge rules are called "micro-rules" since they are used to specify the detailed semantic properties of high-level data modeling constructs and constraints.

The rule specification language used in ORECOM for defining micro-rules is the rule specification part of a general-purpose knowledge base programming language called K which has been designed and implemented at the Database Systems Research and Development Center of the University of Florida [SHY92, ARR92]. The language is a









calculus-based language and has triggers and association pattern specification capabilities. The syntax of a micro-rule is given below:


rule rule-id is
    triggered trigger-conditions
    [condition guarded-expression]
    [action statements]
    [otherwise statements]
end rule-id;


The rule-id is a unique name for rule identification purposes. Besides the rule-id, a rule consists of a set of trigger-conditions and a rule body defined by condition, action, and otherwise clauses. When any of the trigger-conditions is satisfied, the rule is triggered and the rule body is evaluated. A trigger-condition is defined by a triggering time (i.e., before, after, or immediate-after) and a system-defined or user-defined method. For example, the triggering condition "after x.ASSOCIATE(α, y)" states that after establishing an association named α between objects x and y, the following rule body should be evaluated. The condition clause can be specified by a guarded expression in the form "G1, ..., Gn | T", where the Gi, i = 1..n, are the guards and T is the target. The G's and T are Boolean expressions and may contain association patterns (to be explained below). A guarded expression is evaluated to TRUE, SKIP, or FALSE. The value TRUE is returned if G1, ..., Gn, and T are all true. In this case, the statements specified in the action clause will be executed. If any one of G1, ..., Gn is false in a sequential evaluation of these expressions, the guarded expression returns the value SKIP, which will cause the rest of the rule body to be ignored. If all the guards are true but the target T is false, then the expression returns FALSE. A false result will cause the statements specified in the otherwise clause to be executed. The guarded expression is a shorthand for a complex expression that involves the nesting of many condition-action-otherwise sub-expressions. The statements in both the action clause and the otherwise clause can be expressions of various kinds, including system-defined and user-defined methods or any of K's computation statements, including assignments,







quantified expressions, conditional statements, repetitive statements, and so on [SHY91,92].
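The three-valued evaluation of a guarded expression described above can be sketched directly. This is an illustrative model of the TRUE/SKIP/FALSE semantics, not the K language implementation; guards and targets are modeled as Python callables:

```python
# Sketch of guarded-expression evaluation "G1, ..., Gn | T":
# SKIP if some guard fails, TRUE/FALSE by the target otherwise.

def eval_guarded(guards, target):
    for g in guards:                 # sequential evaluation of the guards
        if not g():
            return "SKIP"            # the rest of the rule body is ignored
    return "TRUE" if target() else "FALSE"

def fire_rule(guards, target, action, otherwise):
    result = eval_guarded(guards, target)
    if result == "TRUE":
        action()                     # TRUE selects the action clause
    elif result == "FALSE":
        otherwise()                  # FALSE selects the otherwise clause

fire_rule(guards=[lambda: True],
          target=lambda: False,
          action=lambda: print("action"),
          otherwise=lambda: print("otherwise"))   # prints "otherwise"
```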

Besides the usual predicates used in Boolean expressions, the rule specification language of ORECOM allows "patterns of object associations," or simply "association patterns," to be specified in the condition clause. For example, the expression "x:X[Px]" is a simple pattern which specifies that all objects of class X which satisfy the predicate expression Px can be referenced by the object variable x. Thus, the expression "e:Employee[Age > 50 AND Salary = 70K]" identifies all Employee objects whose age is greater than 50 and whose salary is equal to 70K. The variable e ranges over this set of employee objects. A more complex association pattern may involve two classes in the form (x:X *>α y:Y) or (x:X !>α y:Y). (Here, the predicates Px and Py associated with classes X and Y are omitted to keep the expressions simpler.) The pattern (x:X *>α y:Y) returns all pairs of X and Y objects which are associated (as specified by the association operator "*") through α, whereas the pattern (x:X !>α y:Y) returns all X objects which are not associated (as specified by the non-association operator "!") with any Y objects through α and all Y objects which are not associated with any X objects through the same association. The object variables x and y are used to reference the objects that satisfy the expressions. As discussed before, the direction symbols ">" and "<" distinguish the subject/agent from the object of an association, and the following conditions hold:


"x:X *>a y:Y" = "x:X * "x:X !> y:Y" = "x:X !

where cx-' is the inverse association of mx.

Association patterns may involve a long linear structure of object classes. They may also contain AND/OR branches and loops. They provide a simpler way of specifying complex associations among object classes and thus their objects. We shall explain the branching structures "x:X AND(L1 P1, L2 P2, ..., Ln Pn)" and "x:X OR(L1 P1, L2 P2, ..., Ln Pn)". The Li in these expressions specifies an association link such as "*>αi" or "!>αi", and the Pi specifies the sub-pattern reached through that link. In an AND branch, an object x must satisfy all of the branch expressions Li Pi, whereas in an OR branch, x needs to satisfy at least one of the branch expressions Li Pi, for i = 1..n.

The syntax and semantics of the micro-rules and the association patterns explained above provide a powerful means for specifying a variety of semantic constraints found in the existing data models. We have used this rule specification language to define all the constructs and constraints of several semantically rich data models such as EXPRESS, IDEF-1X, NIAM, and OSAM*.
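The simple-pattern case "x:X[Px]" can be sketched as a filter over the instances of a class. The employee data and attribute names below are illustrative assumptions used only to mirror the e:Employee example above:

```python
# Sketch: evaluating a simple association pattern such as
# e:Employee[Age > 50 AND Salary = 70K] over in-memory instances.

employees = [
    {"Name": "Lee",  "Age": 55, "Salary": 70000},
    {"Name": "Chan", "Age": 40, "Salary": 70000},
    {"Name": "Wu",   "Age": 60, "Salary": 50000},
]

def pattern(instances, predicate):
    """Return the objects of a class that satisfy the pattern's predicate;
    the caller's variable then ranges over this set."""
    return [x for x in instances if predicate(x)]

matches = pattern(employees, lambda e: e["Age"] > 50 and e["Salary"] == 70000)
print([e["Name"] for e in matches])   # ['Lee']
```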














CHAPTER 5
THE GENERAL CONSTRAINT TYPES (MACROS)


In this chapter, we shall present a number of semantic constraint types commonly found in the existing modeling constructs. They are the results of analyzing and relating the constructs and constraints of a number of semantically rich data models. For each constraint type, we introduce a "macro" representation, which is a parameterized way of specifying the constraint type and its variations. Corresponding to each variation, a set of micro-rules can be defined to specify its detailed operational semantics. Thus, a variation of a macro is an abstraction of a set of detailed micro-rules. The macro representation is particularly suitable for the comparison and conversion of modeling constructs and constraints since it is a more compact representation than the micro-rule representation. The former can be used by a data model and schema translation system to compare the ORECOM representations of modeling constructs and constraints and the latter can be used by the system to generate explanations or program code to account for the discrepancies found in a translation.

Our analysis of the modeling constructs and constraints of data models proceeds as follows. We first determine the constraint types and their variations that are meaningful to a set of objects in an object class (i.e., intra-class constraints). We then examine the possible constraint types and variations that are meaningful to the association of two object classes (i.e., the semantics of a binary association, or inter-class constraints). Lastly, we examine those constraint types and variations related to an object class and its multiple associations with other object classes (i.e., the semantics of n-ary associations, or inter-association constraints). This approach provides a systematic
way of determining the semantic properties that can exist in the structural primitives of ORECOM.

In the following sections, eight general constraint types or macros are discussed, each with its syntax definition, semantic description, part of its corresponding micro-rule representation, and some examples to illustrate its uses. The complete micro-rule representation of each macro is given in APPENDIX A of this dissertation.


5.1 Intra-class Constraints

In spite of the differences in terminologies and notations, all existing data models provide some way of specifying the structures (or data types) of a set of objects that form a class (or its equivalent concept), the ways objects can be individually identified, and the constraints associated with the membership of objects in the class. Figure 5.1 provides some examples. In the IDEF-1X notation shown in Figure 5.1(a), Eid and Ename are used



[Figure: four schema fragments — (a) an IDEF-1X entity Employee with attributes Eid and Ename; (b) an EXPRESS TYPE Positive = INTEGER with WHERE SELF > 0; (c) an OSAM* class Company whose attribute ProjTeam is defined over EmpSet, a SET[5,20] of Employee; (d) a NIAM lexical object type Date composed of Month, Day, and Year.]

Figure 5.1. Examples of modeling constructs that contain intra-class constraints.


together as a required external identifier (or primary key) of Employee objects. In Figure 5.1(b), the data type Positive defined in EXPRESS allows only positive non-zero integers to be its members. Figure 5.1(c) shows that the attribute ProjTeam of the entity class Company in OSAM* is defined over the domain class EmpSet, having a constraint of five to twenty
employee instances which in turn is defined over the class Employee. In Figure 5.1 (d), each member of the NIAM's so-called lexical object type (LOT) Date is assumed to represent a tuple of objects from other LOTs Month, Day, and Year.

In ORECOM, we introduce a constraint type or macro called MEMBERSHIP (MB), which has the following syntax (a parameterized notation) and semantics (in the form of a micro-rule):

Macro:  MB (X, O, I, S, T, C, M)                                  (M1.1)
Micro-rule:
rule MB-01 is  /* defined in the meta-class named CLASS whose instances
                  are definitions of all classes defined in a system */
triggered immediate-after this.CREATE
action
    this.ClassName := X;
    this.ObjectType := O;
    if O = "system-named" then this.Identifier := I;
    else this.Identifier := -;
    endif;
    this.Structure := S;
    this.ClassType := T;
    this.Constraint := C;
    this.LocalOperation := M;


The above MB macro specifies properties and constraints of an object class that must be satisfied by its members. These properties and constraints are specified by a class name (X), an object type (O) which takes the value of "self-naming" or "system-named", a user-defined object identifier (I) (i.e., the attribute(s) that serve as a primary key) if the object type is system-named, an object structure (S) which specifies the structure of its members (simple or complex structures such as SET, LIST, BAG, ARRAY, TUPLE, etc.), a class type (T) which is either an E-class or a D-class, a set of membership constraints (C) which either list the possible members, specify the range of values that constrain the members, or are expressed as logical expressions that evaluate to true or false, and a set of methods (M)
which specify the meaningful operations that can be performed on the class members. A type definition in a high-level data model can be mapped into the MB macro with specific values assigned to its parameters. For example, the macro representations of the four classes shown in Figure 5.1 are as follows:

MB(Employee, system-named, (Eid, Ename), simple, E-CLASS, -, -)
MB(Positive, self-naming, -, simple, Integer, Positive > 0, -)
MB(EmpSet, self-naming, -, SET, Employee, -, -)
MB(Date, self-naming, -, TUPLE, (Month, Day, Year), -, -)


The "-" sign in the above macros indicates that a parameter is not specified in the original construct, and the "E-CLASS" of the first macro is a system-defined class which contains all defined entity classes. By translating the constructs of different data models which capture the concept and constraints of MEMBERSHIP into the above macro representations and comparing their parameter values, a schema translation system will be able to identify their semantic similarities and differences.

The system-defined meta-class CLASS is a class containing all definitions of classes in terms of the following seven attributes: ClassName, ObjectType, Identifier, Structure, ClassType, Constraint, and LocalOperation. The micro-rule MB-01 shown above simply sets the attribute values for a class, which are specified in a MB macro, after the class is created as an object instance (identified by "this") in the class CLASS.
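The parameter-by-parameter comparison described above can be sketched in ordinary code. The following Python fragment is an illustrative sketch only (the dataclass and its field names are our own, not part of ORECOM); it stores the seven MB parameters and reports the parameters on which two translated constructs disagree:

```python
from dataclasses import dataclass, fields

# "-" means "not specified in the original construct", as in the macros above.
UNSPECIFIED = "-"

@dataclass
class MB:
    """One MEMBERSHIP macro: MB(X, O, I, S, T, C, M)."""
    class_name: str        # X
    object_type: str       # O: "self-naming" or "system-named"
    identifier: object     # I: primary-key attribute(s), or "-"
    structure: object      # S: simple, SET, LIST, TUPLE, ...
    class_type: object     # T: E-class or a domain class
    constraint: object     # C: membership constraint, or "-"
    operations: object     # M: local operations, or "-"

def compare(a: MB, b: MB) -> dict:
    """Return the parameters on which two MB macros disagree."""
    diffs = {}
    for f in fields(MB):
        va, vb = getattr(a, f.name), getattr(b, f.name)
        if va != vb:
            diffs[f.name] = (va, vb)
    return diffs

# The Employee and Date classes of Figure 5.1:
emp = MB("Employee", "system-named", ("Eid", "Ename"),
         "simple", "E-CLASS", UNSPECIFIED, UNSPECIFIED)
date = MB("Date", "self-naming", UNSPECIFIED,
          "TUPLE", ("Month", "Day", "Year"), UNSPECIFIED, UNSPECIFIED)

print(sorted(compare(emp, date)))
```

Comparing the Employee and Date macros flags every parameter except the two unspecified ones (C and M), which is the kind of discrepancy report a translation system would act on.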


5.2 Inter-Class and Inter-Association Constraints


We now examine some of the constraint types commonly seen in the existing data models for restricting an association between two object classes (inter-class constraints) and multiple associations of an object class (inter-association constraints).


5.2.1 PARTICIPATION (PT)


As an inter-class constraint, this constraint type restricts the total number of objects of one class that must participate in an association with the objects of another class. It is a very general constraint type which can be used as a common representation of a number of constraints used in different data models. Figure 5.2 shows some schema examples taken from different data models. The alternate key, SSN, of the Employee entity in the IDEF-1X



[Figure: six schema fragments — S1, an IDEF-1X Employee entity with alternate key SSN; S2, an EXPRESS entity Company with OPTIONAL INTEGER attributes Fax and Phone and WHERE rules over them; S3, an SDM class Engineer with Position : INTEGER (INITIALVALUE 10) and Salary : REAL (DERIVED BY Position * 325.3 + 20000); S4, an OSAM* interaction class Works_for between Employee and Project with mapping [3,15] : [1,2]; S5, a NIAM specialization of Student into FullTimeStudent and PartTimeStudent with total ("T") and exclusion ("X") constraints; S6, an OMT aggregation of Car from Engine, Body, and Wheels.]

Figure 5.2. Examples of schemata of different data models.



schema S1 is a non-null attribute. It requires that every Employee entity must be associated with a social security number. In other words, there is a total participation constraint associated with the Employees' associations with social security numbers. In the schema
S2 defined in EXPRESS, both Fax and Phone are optional attributes. In ORECOM's representation, this is a partial participation constraint associated with the Company class's associations with the Fax class and the Phone class, since company objects may, but do not have to, have Fax and Phone numbers. In the SDM model, an attribute can have an initial value assigned to it. The schema S3 shows that the initial value for the Position of each Engineer object is '10'. This default attribute value assignment implies the constraint of total participation, since an engineer will have a position value equal to either 10 or some other number but not null. The schema S4 shows an interaction association (I) defined in the OSAM* model to capture the semantics that each Works_for object represents the fact that an Employee object is associated with a Project object and the association is itself modeled as an object. The construct, among other semantic properties, has a total participation constraint, namely all Works_for objects have to be associated with some Employee objects as well as with some Project objects. Existence dependency is another frequently used inter-class constraint, which is present in both the NIAM schema S5 and the OMT schema S6 in Figure 5.2. In S5, every FullTimeStudent object and every PartTimeStudent object must also be a Student object; that is, they are all "existence dependent" on the Student objects. From ORECOM's point of view, both the FullTimeStudent and PartTimeStudent classes participate totally in their associations with the Student class. (The symbol "T" in S5 specifies a total specialization, which means that a student object must be in either the FullTimeStudent class or the PartTimeStudent class. This total specialization is an example of an inter-association constraint of the PARTICIPATION constraint type which will be described later.
The other symbol "X" in S5 specifies a set-exclusion constraint, which means that an object in the FullTimeStudent class cannot be in the PartTimeStudent class and vice versa. The set-exclusion constraint belongs to another inter-association constraint type called Logical-Dependence, which will be addressed in Section 5.2.7.) The Car class in S6 is an aggregation of the classes Engine, Body, and Wheels, and according to the semantics of aggregation defined in the OMT model
[RUM91], a Car object cannot exist if any of its aggregation components (i.e., an Engine object, a Body object, and a Wheels object) does not exist. In this case, there are three total participation constraints on the Car class, one for each of these three associations. On the other hand, not every Engine (or Body, or Wheels) object is used in a car; that is, there is only a partial participation constraint associated with Engine's (or Body's, or Wheels') association with Car. The above examples clearly show that the modeling constructs of different data models may look very different. However, if their semantics are decomposed into more primitive representations, part of their semantics may overlap and can be explicitly identified (in the above examples, the common semantics are the variations of the PARTICIPATION constraint type).

We shall now consider the more general representation of the PARTICIPATION constraint type and its macro and micro-rule representations.


Macro:  PT (X, min, max, a, Y)                                    (M2.1)
Micro:
Rule PT-01 is  /* defined in class X */
triggered after this.CREATE, immediate-after this.DISASSOCIATE(a, y)
condition exist x in (x:X *>a Y where Count(x) < min)
action REJECT;

Rule PT-02 is  /* defined in class X */
triggered immediate-after this.ASSOCIATE(a, y)
condition exist x in (x:X *>a Y where Count(x) > max)
action REJECT;

The inter-class PARTICIPATION (PT) macro uses two parameters, min and max, to specify a lower-bound and an upper-bound on the number of X objects which participate in the association named a with the class Y. The values of min and max can be zero, a positive integer, or any expression that returns a positive integer. The min value is always less than the max value. If min equals zero, it means that there is no constraint on the
lower-bound. If it equals the expression "Count(X)", which returns the current total number of objects in class X, then a total participation constraint is specified. On the other hand, if max equals zero, it means that no object can participate in the association. Max equal to "Count(X)" simply means that all X objects can participate. As an example, the macro "PT (Employee, 5, Count(Employee), SSN, Integer)" specifies that at least five employees must have social security numbers, whereas the macro "PT (Employee, Count(Employee), Count(Employee), SSN, Integer)" specifies a total participation which states that every employee must have a social security number.

Using the macro defined in (M2.1), the macro representations of some variations of the PARTICIPATION constraint type which exist in the modeling constructs of Figure 5.2 are shown below:

S1: PT(Employee, Count(Employee), Count(Employee), SSN, Integer)
    --- a non-null attribute
    PT(Integer, 0, Count(Integer), SSN-1, Employee)
    --- a partial participation
S2: PT(Company, 0, Count(Company), Fax, Integer)
    --- an optional attribute
S3: PT(Engineer, Count(Engineer), Count(Engineer), Position, Integer)
    --- a default attribute value which implies a non-null constraint
S4: PT(Works_for, Count(Works_for), Count(Works_for), -, Employee)
    --- an implied total participation
S5: PT(FullTimeStudent, Count(FullTimeStudent), Count(FullTimeStudent), -, Student)
    PT(PartTimeStudent, Count(PartTimeStudent), Count(PartTimeStudent), -, Student)
    --- existence dependencies
S6: PT(Car, Count(Car), Count(Car), -, Engine)
    --- an existence dependency
    PT(Engine, 0, Count(Engine), -, Car)
    --- a partial participation


The micro-rule PT-01 is an example operational rule which can be defined in class X to enforce the participation constraint of X objects. It specifies the enforcement of the constraint after a CREATE transaction has been performed on X or after a
DISASSOCIATE operation is executed on an object of X identified by "this" and an object of Y identified by "y". The function "Count(x)" returns the number of those X objects which fall in the association pattern "x:X *>a Y". This value is compared with the min value. Here, x is an object variable which ranges over those X objects that satisfy the pattern. "After" means that the rule is verified after the CREATE transaction instead of right after the CREATE operation which is specified using the key-word "immediate-after". In the CREATE transaction, the association between the created X object and some Y object(s) can be established. If the minimum participation constraint is violated after the creation transaction or after the disassociation operation, the operation will be rejected. The rejection of the creation transaction will cause the transaction to be rolled back and the rejection of the disassociation operation will cause it to be aborted. The rule PT-02 ensures that the upper-bound max is not violated by an ASSOCIATE operation by comparing it with the current number of the participating X objects (i.e., Count(x)).
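The enforcement behaviour of PT-01 and PT-02 can be mimicked in a conventional language. The sketch below is illustrative only (the class and method names are our own, not ORECOM's): it counts the X objects currently participating in the association and rejects, by undoing, any ASSOCIATE or DISASSOCIATE that would push the count outside [min, max].

```python
# Illustrative Python analogue of micro-rules PT-01/PT-02.
# links[x] holds the Y objects that x currently participates with.

class ParticipationViolation(Exception):
    pass

class Participation:
    """Enforce PT(X, min, max, a, Y): the number of X objects that
    participate in association a must stay within [min, max]."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.links = {}                      # x -> set of associated y's

    def participants(self):
        # Count(x) over the pattern "x:X *>a Y": x's with at least one y.
        return sum(1 for ys in self.links.values() if ys)

    def associate(self, x, y):               # cf. rule PT-02
        self.links.setdefault(x, set()).add(y)
        if self.participants() > self.hi:
            self.links[x].discard(y)         # REJECT: undo the operation
            raise ParticipationViolation("upper bound exceeded")

    def disassociate(self, x, y):            # cf. rule PT-01
        self.links.get(x, set()).discard(y)
        if self.participants() < self.lo:
            self.links.setdefault(x, set()).add(y)   # REJECT: undo
            raise ParticipationViolation("lower bound violated")

# A hypothetical bound: at least 1 and at most 2 employees may hold an SSN.
pt = Participation(lo=1, hi=2)
pt.associate("e1", "ssn-111")
pt.associate("e2", "ssn-222")
try:
    pt.associate("e3", "ssn-333")            # third participant: rejected
except ParticipationViolation:
    pass
print(pt.participants())                     # still 2 after the rejection
```

The undo-then-raise pattern plays the role of REJECT: in ORECOM the rejection rolls back the triggering transaction, which this single-operation sketch approximates by reversing the offending update.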

Sometimes it is desired that a participation constraint apply only to a subset of the objects of a class rather than the entire set of objects as in the examples shown above. For instance, in OSAM*, a user-defined constraint can be stated as a rule inside the Employee class such as "all employees who have no Eids must have SSNs". This is a total participation constraint for only a subset of employees rather than all employees. The condition "all employees who have no Eids" is called a selection condition on the Employee class. The incorporation of a selection condition for each class specified in the participation macro makes it even more general and useful for capturing some user-defined constraints. For this reason, we extend the participation macro M2.1 into the following form:


Macro:  PT (X, Px, min, max, a, Y, Py)                            (M2.2)

The semantics of this macro then becomes that at least min and at most max X objects which satisfy the selection condition Px must participate in the association named a with those Y objects which satisfy the selection condition Py. The macro defined in M2.1 can be viewed as a special case of M2.2 with both selection conditions omitted. Using M2.2, the above OSAM* example can be represented as "PT(Employee, Employee.Eid = nil, Count(Employee), Count(Employee), SSN, Integer, -)" where no selection condition is specified for the Integer class. Accordingly, micro-rules PT-01 and PT-02 can be modified to include the expressions Px and Py. Additional rules must also be introduced to account for those operations that may change the qualification of some X or Y objects with respect to Px and Py.
A participation constraint can also exist in a class which has multiple associations with other classes. For example, as shown in the NIAM schema S5 of Figure 5.2, the Student class has a total specialization constraint (identified by the symbol "T") in its associations with the FullTimeStudent and PartTimeStudent classes, meaning that every student must be in either one of the subclasses. Or, in terms of the participation concept, Student objects must participate totally in some associations with some objects of these two subclasses (or any number of subclasses in a more general case). Thus, the PARTICIPATION macro is further generalized in the following form:


Macro:  PT (X, Px, min, max, ((a1, Y1, Py1), ..., (an, Yn, Pyn)))  (M2.3)


This macro states that the participation of those X objects that satisfy Px and are in the associations a1, a2, ..., an with those Y1, Y2, ..., Yn objects that satisfy Py1, Py2, ..., Pyn, respectively, must be within the range of [min, max]. It is of no importance which association and how many associations a qualified X object actually participates in. Using the macro M2.3, the total specialization constraint of the NIAM schema S5 can be represented in ORECOM by "PT (Student, -, Count(Student), Count(Student), ((-,
FullTimeStudent, -), (-, PartTimeStudent, -)))", in which no selection conditions are specified for the three classes in this example. The corresponding micro-rule representation of the generalized PT macro is given in APPENDIX A.2 of this dissertation.


5.2.2 CARDINALITY (CD)


The CARDINALITY constraint type specifies that the number of objects of one class which can be associated with an object of another class must be within a specified range. It is a general constraint type that can exist in an association between two classes as well as among associations of multiple classes. In the inter-class case, a cardinality constraint is commonly expressed as follows:


X : Y = [minX, maxX] : [minY, maxY]

This expression means that an X object can associate with a minimum number of Y objects specified by minY and a maximum number of Y objects specified by maxY, and that a Y object can be associated with a minimum number of X objects specified by minX and a maximum number of X objects specified by maxX. Using this format, we can define some cardinality constraints for the schemata given in Figure 5.2 as below (the Ms in the expressions mean "many"):

S1: Employee : SSN = [1,1] : [1,1]
    -- employee's SSN is single-valued and unique
S2: Company : Fax = [1,M] : [1,1]
    -- company's Fax is single-valued and non-unique
S3: Engineer : Salary = [1,M] : [1,1]
    -- engineer's salary is single-valued and non-unique
S4: Employee : Works_for = [1,1] : [1,M]
    -- each Works_for object involves one employee but more than one
       Works_for object can be associated with an employee
S5: Student : FullTimeStudent = [1,1] : [1,1]
    -- a student can only have one representation as a full time student
       and vice versa
S6: Car : Engine = [1,1] : [1,1]
    -- one car, one engine and vice versa

The general representation of the inter-class CARDINALITY constraint type is defined by the following macro and micro-rules.


Macro:  CD (X, a, Y, minX, maxX, minY, maxY)                      (M3.1)
Micro:
Rule CD-01 is  /* defined in class X */
triggered before this.DISASSOCIATE(a, y)
condition exist y' in (this *>a y':Y where Count(y') = minY)
action REJECT

Rule CD-02 is  /* defined in class X */
triggered before this.ASSOCIATE(a, y)
condition exist y' in (this *>a y':Y where Count(y') = maxY)
action REJECT

Rule CD-03 is  /* defined in class Y */
triggered before this.DISASSOCIATE(a-1, x)
condition exist x' in (this *>a-1 x':X where Count(x') = minX)
action REJECT

Rule CD-04 is  /* defined in class Y */
triggered before this.ASSOCIATE(a-1, x)
condition exist x' in (this *>a-1 x':X where Count(x') = maxX)
action REJECT


The values of the four parameters in the CD macro, i.e., minX, maxX, minY, and maxY, can be any positive integer or the special character 'M'. For example, the value [1, M] of [minY, maxY] means that an X object can associate with many Y objects, whereas the value [3, M] limits the minimum number of Y objects that can be associated with an X object to three. In the latter case, the non-zero minY does not imply that every X object must be associated with some Y object. But, if an X object is associated with some Y object, it must also be associated with two other Y objects to satisfy the minimum
cardinality constraint. Using M3.1, the cardinality constraints of the above examples can be represented in the neutral macro forms:


S1: CD(Employee, SSN, Integer, 1, 1, 1, 1)
S2: CD(Company, Fax, Integer, 1, M, 1, 1)
S3: CD(Engineer, Salary, Real, 1, M, 1, 1)
S4: CD(Employee, -, Works_for, 1, 1, 1, M)
S5: CD(Student, -, FullTimeStudent, 1, 1, 1, 1)
S6: CD(Car, -, Engine, 1, 1, 1, 1)


To enforce the cardinality constraint specified in a CD macro, four micro-rules named CD-01, CD-02, CD-03, and CD-04 are defined as shown above to maintain the four bounds. The lower-bound minY can be violated only by a DISASSOCIATE operation which removes the association of an X object with a Y object and hence may decrease the number of the associated Y objects to a value less than minY. Rule CD-01 checks if the number of Y objects associated with the X object is already equal to minY. If so, the disassociation operation is rejected. In this rule, the function Count(y') gives the total number of Y objects that satisfy the pattern "this *>a y':Y", where 'this' names the X object operated on by the triggering DISASSOCIATE operation. The interpretations of CD-02, CD-03, and CD-04 are similar to that of CD-01.
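A conventional-language analogue of CD-01 through CD-04 simply keeps both directions of the association and checks the appropriate bound before each operation. The Python sketch below is illustrative only (the names are ours, not ORECOM's); it enforces the [1,1] : [1,1] constraint of schema S1:

```python
# Illustrative analogue of micro-rules CD-01..CD-04:
# enforce X : Y = [minX, maxX] : [minY, maxY] on one association.

class CardinalityViolation(Exception):
    pass

class Cardinality:
    def __init__(self, min_x, max_x, min_y, max_y):
        self.min_x, self.max_x = min_x, max_x
        self.min_y, self.max_y = min_y, max_y
        self.x_to_y = {}   # x -> set of associated y's
        self.y_to_x = {}   # y -> set of associated x's (the inverse a-1)

    def associate(self, x, y):
        # CD-02: x may not already hold maxY associated y's;
        # CD-04: y may not already hold maxX associated x's.
        if len(self.x_to_y.get(x, ())) == self.max_y:
            raise CardinalityViolation("maxY reached for %r" % (x,))
        if len(self.y_to_x.get(y, ())) == self.max_x:
            raise CardinalityViolation("maxX reached for %r" % (y,))
        self.x_to_y.setdefault(x, set()).add(y)
        self.y_to_x.setdefault(y, set()).add(x)

    def disassociate(self, x, y):
        # CD-01 / CD-03: neither side may drop below its lower bound.
        if len(self.x_to_y.get(x, ())) == self.min_y:
            raise CardinalityViolation("minY reached for %r" % (x,))
        if len(self.y_to_x.get(y, ())) == self.min_x:
            raise CardinalityViolation("minX reached for %r" % (y,))
        self.x_to_y[x].discard(y)
        self.y_to_x[y].discard(x)

# S1 of Figure 5.2: Employee : SSN = [1,1] : [1,1]
cd = Cardinality(min_x=1, max_x=1, min_y=1, max_y=1)
cd.associate("e1", "ssn-111")
try:
    cd.associate("e1", "ssn-222")   # a second SSN: rejected, cf. CD-02
except CardinalityViolation:
    pass
```

Note that, as in the micro-rules, the checks run *before* the update, so a rejected operation leaves both index dictionaries untouched.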

Similar to the extension of the PARTICIPATION macro, the CD macro in M3.1 can be extended as below by incorporating selection conditions to allow a cardinality constraint to be applied on selected subsets of objects.


Macro:  CD (X, Px, a, Y, Py, minX, maxX, minY, maxY)              (M3.2)


The advantage of using selection conditions in a CD macro is that user-defined cardinality constraints can be captured by these selection conditions. Constraints of this type are usually expressed using a constraint language or implemented in an application program.
For example, the following cardinality constraint can be expressed in different ways using different constraint specification languages or can be implemented differently by different application programs.

"If an employee's Eid is less than 999, then he or she can associate with
at most three Works_for objects whose job type is 'type-A'."

However, they can all be translated into the following macro representation.


"CD(Employee, Employee.Eid < 999, -, Works_for,
Works_for.JobType = 'type-A', 1, 1, 1, 3)"

We note here that the micro-rules CD-01 to CD-04 can be extended to incorporate Px and Py, and additional micro-rules need to be added so that violations caused by operations that involve Px and Py can be checked. For example, two more micro-rules are needed in the last example to check the [1,1] : [1,3] cardinality whenever an employee's Eid is changed and whenever the job type value of a Works_for object is updated.

In the above, we have presented a CD macro and micro-rules for representing a cardinality constraint on a single association between two classes (i.e., an inter-class cardinality constraint). The same constraint type may exist among multiple associations of a class. We shall use the SDM schema S3 of Figure 5.2 as an example. In that schema, an additional cardinality constraint may state that a pair of Position and Salary values may be associated with a minimum of one and a maximum of six Engineer objects. Here, the cardinality constraint is added between the Engineer class and the pair of Integer and Real classes through the two associations Position and Salary. In the following, we shall introduce two generalized CD macros for two cases of inter-association cardinality constraints. The first one is a macro for the cardinality constraint between one class and a set of directly associated classes as shown in Figure 5.3(a), and the second one is for the cardinality constraint between two indirectly associated sets of classes as shown in Figure 5.3(b).

Figure 5.3. Two inter-association cardinality constraints:
(a) X : (Y1, ..., Yn) = [minX, maxX] : [minY, maxY]
(b) (Z1, ..., Zm) : (Y1, ..., Yn) = [minZ, maxZ] : [minY, maxY]


The macro representation of the first case is given as M3.3 below. This macro specifies two bounds, minY and maxY, which stand for the minimum and maximum numbers of tuples of qualified Y1, Y2, ..., Yn objects that a qualified X object can be associated with (through the associations a1, a2, ..., and an, respectively). It also specifies the minimum and maximum numbers of qualified X objects with which a tuple of qualified Y1, Y2, ..., and Yn objects can be associated through the associations a1-1, a2-1, ..., and an-1, respectively.


Macro:
CD (X, Px, ((a1, Y1, Py1), ..., (an, Yn, Pyn)), minX, maxX, minY, maxY)   (M3.3)


Using M3.3, our previous example of inter-association cardinality constraint can be represented as "CD(Engineer, -, ((Position, Integer, -), (Salary, Real, -)), 1, 6, 1, 1)" with no selection conditions specified.

The micro-rule representation of M3.3 is basically the same as that of M3.2 except that additional rules of forms similar to CD-03 and CD-04 are needed because of the multiple classes Y1, Y2, ..., and Yn, and the Count function of CD-01 and CD-02 is
extended to count the number of tuples of Y1, ..., Yn objects associated with an X object.

The second case of a cardinality constraint is illustrated in Figure 5.3(b), where two sets of classes, (Z1, ..., Zm) and (Y1, ..., Yn), are indirectly associated through another set of classes (X1, ..., Xk). A cardinality constraint can be specified for the two sets of associations or attributes, (b1, ..., bm) and (a1, ..., an), such that

(Z1, ..., Zm) : (Y1, ..., Yn) = [minZ, maxZ] : [minY, maxY].


The macro that represents such an inter-association cardinality constraint is defined as follows:


Macro:
CD (((b1, Z1, Pz1), ..., (bm, Zm, Pzm)), ((X1, Px1), ..., (Xk, Pxk)),
    ((a1, Y1, Py1), ..., (an, Yn, Pyn)), minZ, maxZ, minY, maxY)          (M3.4)



This macro is the most general form of all cardinality constraints. In other words, all the previously presented CD macros, including M3.1, M3.2, and M3.3, are simply special cases of M3.4. As the number of classes in M3.4 increases, the number of micro-rules of M3.4 will also increase. An extension of the micro-rules is straightforward using the rule specification language. Specifically, several additional rules are needed to account for those ASSOCIATE and DISASSOCIATE operations on objects of Xi and Xi+1, i = 1..k-1, since their executions may result in violations of the two lower bounds and the two upper bounds of the specified cardinality constraint. (See APPENDIX A.3 for the detailed micro-rule representation of M3.4, where the rule-ids are re-arranged and are different from those we used in this section.) An example of this general constraint type is found in the OSAM* model. The indirect cardinality constraint in the OSAM* schema S4 of Figure 5.2 can be represented in M3.4 form with the indices m = k = n = 1. In this schema, the Employee and Project classes are indirectly associated through the Works_for class and their mapping
relationship is specified as [3,15] : [1,2], meaning an employee can work for at most two projects and a project can have at least three and at most fifteen employees. This constraint can be translated, using M3.4, into its macro representation "CD((-, Employee, -), (Works_for, -), (-, Project, -), 3, 15, 1, 2)" in which no attribute names and no selection conditions are specified.
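As a hypothetical illustration of this indirect constraint, the following Python sketch derives the Employee-to-Project mapping from a set of Works_for links and reports the objects whose counts fall outside the [3,15] : [1,2] bounds (encoding each Works_for object as an (employee, project) pair is our own simplification, not ORECOM notation):

```python
# Illustrative check of the indirect cardinality Employee : Project =
# [3,15] : [1,2] of schema S4, with Works_for as the intermediate class.

def violations(works_for, min_emp=3, max_emp=15, min_proj=1, max_proj=2):
    """Return the employees and projects whose indirect association
    counts fall outside the declared bounds."""
    projects_of = {}    # employee -> set of projects (via Works_for)
    employees_of = {}   # project  -> set of employees (via Works_for)
    for emp, proj in works_for:
        projects_of.setdefault(emp, set()).add(proj)
        employees_of.setdefault(proj, set()).add(emp)
    bad_emps = [e for e, ps in projects_of.items()
                if not (min_proj <= len(ps) <= max_proj)]
    bad_projs = [p for p, es in employees_of.items()
                 if not (min_emp <= len(es) <= max_emp)]
    return bad_emps, bad_projs

links = [("e1", "p1"), ("e2", "p1"), ("e3", "p1"),   # p1 has 3 employees: ok
         ("e1", "p2"), ("e1", "p3")]                 # e1 has 3 projects: too many
bad_emps, bad_projs = violations(links)
print(bad_emps, bad_projs)
```

A rule-based enforcement would run a check of this kind incrementally after each ASSOCIATE or DISASSOCIATE on the intermediate Works_for class rather than over the whole extension.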


5.2.3 INHERITANCE (IH)


Inheritance is a modeling mechanism which allows object attributes, operations, and rules of a superclass to be inherited by objects of a subclass. It is a very useful and commonly supported modeling construct available in most of the new generation of semantic and object-oriented data models. However, in more traditional data models such as the relational model and the E-R model, this construct is not supported. To accommodate both types of data models, ORECOM does not treat inheritance as one of its basic structural constructs. Instead, it is treated as a basic constraint type whose semantics is expressed by rules. The following IH macro and its micro-rule illustrate one aspect of the operational semantics of inheritance.


Macro:  IH (X, Y)                                                 (M4.1)
Micro:
rule IH-01 is  /* defined in class Y (the subclass) */
triggered before this.att, this.DISASSOCIATE(att, v),
          this.ASSOCIATE(att, v), this.op
condition ((not In(att, this.LocalAttribute)) or (not In(op, this.LocalOperation)))
          | exist x in ((this *> x:X) where OID(this) = OID(x))
action x $ this.thisOperation;


The IH macro has two parameters: X specifies a superclass, and Y specifies a subclass. The six inheritance constructs of EXPRESS, IDEF-1X, NIAM, OSAM*, SDM, and OMT shown in Figure 2 can be uniformly represented in ORECOM as "IH(Employee, Secretary)". The IH macro is an inter-class constraint type. If a superclass has more than
one subclass, then each superclass-subclass association will be represented by an individual IH macro. For example, in the NIAM schema S5 of Figure 5.2 where the Student object type is a supertype of both FullTimeStudent and PartTimeStudent object types, the inheritance of these two associations is represented by "IH(Student, FullTimeStudent)" and "IH(Student, PartTimeStudent)", respectively. Similarly, in the case of multiple inheritance, there will be one IH macro for each association between the subclass and one of its superclasses. Other constraints between the superclass and the subclass (e.g., existence dependency, cardinality, or the discriminator specification as required by the IDEF-IX model) and among the multiple superclass-subclass associations (e.g., total specialization, set exclusion, set subset, etc.) are represented by other types of macros.

The semantics of one aspect of inheritance is described by micro-rule IH-01, which is defined in the subclass of the association, identified by Y. Generally speaking, this rule allows an operation defined in a superclass to be performed on objects of a subclass. The four triggering operations of the rule IH-01 are attribute retrieval, attribute association, attribute disassociation, and activation of a user-defined operation. In these triggering specifications, 'this' represents a Y object, 'att' and 'v' represent an attribute and its value, and 'op' represents a user-defined operation. A prerequisite condition for evaluating the rule body is that an attempt is made to access or manipulate an attribute (att) or to perform an operation (op). The condition-clause of IH-01 is a guarded expression which states that, if the attribute being retrieved or manipulated or the operation being performed is not a member of the set of attributes or operations defined in class Y, then we want to check if there exists an object instance in class X that corresponds to "this" Y instance, having the same OID. If both conditions, verified in sequence, are true, then the triggering operation is redirected to be performed on the corresponding X object. This is done by using a casting operator "$" which replaces the original operand, a Y object instance (this), with its corresponding superclass object instance (x) to avoid a type checking error.

The general concept of inheritance should apply not only to the inheritance of attributes (or associations) and operations but also to the inheritance of rules. However, it is noted that the definition of IH-01 does not show the rule inheritance. This is because references to the inherited attributes and operations in a subclass are replaced by references to these attributes and operations defined in its superclass. Processing of these inherited attributes and operations is thus automatically subject to the semantic rules of the superclass. In other words, the function of rule inheritance is achieved by the object casting mechanism.

In the following, a more general IH macro is given to allow selection conditions to be used with both the superclass and the subclass. The micro-rule representation of this generalized IH macro is slightly modified from what is shown as IH-01 above and is defined in APPENDIX A.4. To our knowledge, none of the existing models provides an explicit way to model the inheritance property for some selected objects. However, such a restriction could have been introduced in a model or defined by user-defined rules.


Macro: IH (X, Px, Y, Py) (M4.2)


This general IH macro does not change the original semantic property of inheritance represented by M4.1 and its micro-rule. It only restricts the inheritance to those X and Y objects which satisfy Px and Py, respectively. We use the NIAM schema S5 of Figure 5.2 again to illustrate this "conditional inheritance" concept. In the original schema, every FullTimeStudent object can inherit an attribute, say, GPA, and an operation, say, DisplayTranscript(), from its corresponding Student object. Now, it is assumed that a FullTimeStudent object can inherit GPA or DisplayTranscript() if and only if the local attribute Status of the FullTimeStudent object is equal to "tuition-paid" and the attribute Major of the Student object is not equal to "CIS". This conditional inheritance is made possible by adding the two selection conditions, (Student.Major # "CIS") and (FullTimeStudent.Status = "tuition-paid"), to the macro IH(Student, FullTimeStudent). If a student's major is CIS, or a full-time student has not paid his or her tuition, then the inheritance property will not be applicable to that student.
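The conditional form of the macro can be sketched as a simple predicate check. The Python fragment below is an illustrative sketch of IH(X, Px, Y, Py) using the Student example; all names are ours, and plain dictionaries stand in for object instances.

```python
# Sketch of the generalized IH macro: inheritance is granted only when the
# subclass and superclass objects share an OID and both selection conditions
# (Px on the superclass, Py on the subclass) hold. Illustrative only.

def conditional_inherit(sub, sup, p_sub, p_sup):
    """Return True when 'sub' may inherit from 'sup', mirroring IH(X, Px, Y, Py)."""
    return sub["oid"] == sup["oid"] and p_sup(sup) and p_sub(sub)

student = {"oid": "o1", "Major": "EE"}
ft_student = {"oid": "o1", "Status": "tuition-paid"}

ok = conditional_inherit(
    ft_student, student,
    p_sub=lambda y: y["Status"] == "tuition-paid",   # FullTimeStudent.Status
    p_sup=lambda x: x["Major"] != "CIS",             # Student.Major # "CIS"
)
```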


5.2.4 PRIVACY (PV)


Constraints of this category provide access protection to the structural and behavioral properties of objects. An association between two classes can be declared to be private, protected, or public. For example, suppose α is an attribute defined in class X, whose domain is class Y. If this attribute (or association between X and Y) is private, then only objects of X can initiate operations to access the Y objects that are associated with the X objects through α, or to associate or disassociate with Y objects. This type of privacy constraint can also be applied to a superclass-subclass construct. For example, if X is a subclass of Y, only X objects can initiate an inherited operation or access an inherited attribute from class Y and Y's superclasses. In both modeling constructs, the association is private, or "invisible," to all objects other than those of class X. For a protected association, on the other hand, the privilege of traversing the association to access Y objects and their attributes and operations is also granted to objects of the subclasses of X. In this case, if class Z is a subclass of X and X is a protected subclass of Y, then objects of Y and its superclasses and their operations and attributes will all be inheritable to both X and Z objects. Similarly, if Y is the domain of the protected attribute α of X, and z represents an object of the subclass Z, then the retrieve operation "z.α" will be granted. An association which is neither private nor protected is accessible to all objects and is called a public association.

Privacy constraints are not commonly supported by the existing data models. However, they are supported by some object-oriented programming languages such as C++. The two simple C++ schemata shown in Figure 5.4 give some examples of privacy constraints. In schema S7, the Employee class has a public attribute Eid, a protected attribute Address, and a private attribute Salary. The Eid can be accessed by objects of all classes since it is public. The Salary is a private attribute and therefore can be accessed and updated by Employee objects only. All other indirect accesses from outside the Employee class (e.g., from a subclass) are not permitted. The last attribute, Address, is protected but not private. In addition to the Employee objects, objects of Employee's subclasses are also allowed to access this attribute. In schema S8, the superclass-subclass association between the Employee and Manager classes is defined to be a private association. Because of this privacy constraint, the inheritance of Employee's instances, attributes, and operations is limited to objects of Manager only. They can neither be further inherited by subclasses of Manager nor be accessible to Manager's other associated classes. The C++ privacy constraints have been adopted in the underlying data model of our implemented knowledge base programming language K.1 [ARR92].


S7 (C++):

class Employee {
public:
    int Eid;
protected:
    char *Address;
private:
    float Salary;
};

S8 (C++):

class Manager : private Employee {
public:
    int Office#;
};

Figure 5.4. Examples of public, protected, and private associations in C++.



Macro: PV( X, α, Y, privacy_type ) (M5.1)
Micro:
rule PV-01 is /* defined in class X */
triggered before this.α, this.ASSOCIATE(α, y), this.DISASSOCIATE(α, y),
    y $ this.thisOperation
condition ((privacy_type = "private") => (this.prior = this)) OR
    ((privacy_type = "protected") =>
        ((this.prior = this) OR ((this.prior *> this) AND (OID(this.prior) = OID(this)))))
otherwise REJECT;








The PV macro shown in M5.1 is a general form to represent the above privacy constraints. The same macro can represent a privacy constraint of a superclass-subclass association or of a simple association between two classes. The value of the parameter 'privacy_type' can be specified as "private" or "protected". Using this PV macro, the privacy constraints in S7 and S8 of Figure 5.4 are represented as follows:


S7: PV(Employee, Salary, float, private)
S7: PV(Employee, Address, char, protected)
S8: PV(Manager, -, Employee, private)


The PV macro is enforced by the micro-rule PV-01. This rule is defined in class X and can be triggered by two sets of operations, depending on whether there is inheritance on the association. If there is no inheritance, then PV-01 can be triggered by one of the three operations: this.α, this.ASSOCIATE(α, y), or this.DISASSOCIATE(α, y). If Y is a superclass of X, then it can be triggered by a casting operation which transfers an operation from class X to class Y. The main task of PV-01 is to find out, through the use of a system-defined method called prior, the object which initiates the triggering operation. It can be the currently operated X object identified by 'this', or an object of some other class. For the former case, the method "this.prior" will return a value which is the instance identifier (IID) of the object represented by 'this'. The returned value must be equal to 'this' for a private association, because only objects of X can initiate the triggering operations. However, for a protected association, the value of "this.prior" can be equal to 'this' or to the IID of a subclass instance of 'this'. In our last example shown in Figure 5.4, we can assume Secretary is another subclass of Employee and let 's001' and 'e001' represent the IIDs of the same person in the Secretary and Employee classes, respectively. Then, the operation "s001.Salary" will be replaced with another operation "e001 $ s001.Salary" and, before it is executed, the rule PV-01 in the Employee class is triggered. In evaluating this PV-01, the value returned by the method "e001.prior" is s001, not e001. According to the private constraint, which requires "e001.prior" to be equal to e001, the operation "e001 $ s001.Salary" and hence the original "s001.Salary" will thus be rejected. On the other hand, the retrieve operation "s001.Address" will be accepted since Address is a protected attribute of Employee and can be accessed by the subclass objects of Employee (e.g., s001).
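The role of the prior method in PV-01 can be sketched as follows. This Python fragment is an illustrative approximation, not ORECOM code; the function name and the IID strings are our own.

```python
# Sketch of rule PV-01: the initiator of an access ("prior") decides whether
# a private or protected attribute may be reached. Illustrative names only.

def access_allowed(privacy_type, this_iid, prior_iid, subclass_iids_of_this):
    """this_iid: the object owning the attribute; prior_iid: the object that
    initiated the triggering operation."""
    if privacy_type == "public":
        return True
    if privacy_type == "private":
        return prior_iid == this_iid          # only the owner itself
    if privacy_type == "protected":           # owner or one of its subclass instances
        return prior_iid == this_iid or prior_iid in subclass_iids_of_this
    return False

# e001 is an Employee instance; s001 is the same person's Secretary instance.
salary_ok = access_allowed("private", "e001", "s001", {"s001"})    # rejected
address_ok = access_allowed("protected", "e001", "s001", {"s001"}) # accepted
```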

The following macro is a generalized PV macro of M5.1, which allows selection conditions to be specified with both classes.


Macro: PV( X, Px, α, Y, Py, privacy_type ) (M5.2)


As in other generalized macros, the purpose of adding the selection conditions Px and Py to this macro is to allow the specified privacy constraint to be applied to selected objects. For example, in the macro "PV(Employee, Employee.Position > 5, Salary, Float, Float > 100K, private)", the Salary is specified to be a private attribute of Employee, but the privacy holds only for those employees whose Position value is greater than five and whose Salary value is higher than 100K. The micro-rule representation of M5.2 would be similar to that of M5.1 except that the condition-clause has to be modified to include the verification of the two selection conditions (see APPENDIX A.5 for details).


5.2.5 TRANSITION (TS)


Constraints of this type deal with updating an association, or the transition of an association from one state to another. Upon updating an association, one object is disassociated from the other, and it may or may not be associated again with another object in the same class. A transition constraint can be defined in this situation to regulate how an association can be changed. Though most existing data models do not provide explicit notations or facilities for specifying a transition constraint, it is an important constraint type in database applications. We take the IDEF-1X schema S1 in Figure 5.2 as an example.







The SSN in this schema is an alternate key of the Employee entity. Since a key is normally not updatable once its value has been given, there must exist a "non-updatable" constraint for this attribute. This non-updatable constraint is one example of transition constraints. A transition constraint can also be used to specify the relationship between the two values of an attribute before and after its update. For instance, a transition constraint can be added to the Salary attribute of Engineer in the SDM schema S3 in Figure 5.2 so that an engineer's salary can only be increased and the increment should be at least 10% of its current value. A rule like this is an application-dependent transition constraint and has to be defined or implemented in an application program. To represent and enforce a transition constraint in a general manner, no matter how it is actually defined or implemented and no matter what special language is used, we define the following TS macro and micro-rules.


Macro: TS ( X, α, Y, texp(α, α_old) ) (M6.1)
Micro:
rule TS-01 is /* defined in class X */
triggered before this.DISASSOCIATE(α, y)
action this.α_old := y;

rule TS-02 is /* defined in class X */
triggered before this.ASSOCIATE(α, y)
condition this.α_old # nil | texp(y, this.α_old) otherwise REJECT;


The TS macro in M6.1 is an inter-class constraint type, which specifies a transition rule in the parameter 'texp' for the two associated classes, X and Y. These two classes are associated with each other through an association named α, and, as specified in the order given in the macro, α serves as an attribute of X with its domain Y. The two arguments of the parameter texp (i.e., α and α_old) hold separately the current value of the attribute and the old value before the current one is assigned. These two values should satisfy the transition rule specified by texp. Otherwise, the last update of this attribute, which changed its value from the value of α_old to that of α, would have violated the constraint. Here we assume that α_old is a system-generated attribute which records the old value of attribute α during an update. Using M6.1, the above example of the non-updatable SSN attribute of the Employee entity can be represented as "TS(Employee, SSN, Integer, Employee.SSN = Employee.SSN_old)". And the example of the increase-only Salary attribute of the Engineer class can be represented by the macro "TS(Engineer, Salary, Real, Engineer.Salary >= 1.1 * Engineer.Salary_old)".

An attribute update is carried out in ORECOM by two primitive operations, i.e., a DISASSOCIATE operation for removing the old value and an optional ASSOCIATE operation for assigning a new value. Therefore, the enforcement of a transition rule can be achieved in two steps. First, the current attribute value has to be recorded before it is removed, and then the rule is evaluated using the recorded value and the new value. These two steps are carried out by the micro-rules TS-01 and TS-02, respectively. Both of them are defined in the X class, i.e., the owner of the attribute α. The rule TS-01 simply puts the current attribute value, identified by 'y', into α_old before the DISASSOCIATE operation removes it from the X object identified by 'this'. Then, in rule TS-02, before an X object is associated with another Y object (identified by 'y') as its new attribute value, the Y object and the one recorded in α_old are evaluated together to see if this attribute update satisfies the transition rule specified in texp. If the evaluation of texp fails, the ASSOCIATE operation is rejected.
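The two-step enforcement can be sketched in ordinary code. The Python class below is an illustrative approximation of TS-01/TS-02 (all names are ours): DISASSOCIATE records the old value, and the next ASSOCIATE is rejected when texp fails. As in the macro, a nil old value lets the update through.

```python
# Sketch of TS-01/TS-02: record the old value on DISASSOCIATE, then check the
# transition expression texp(new, old) on the next ASSOCIATE. Illustrative only.

class TransitionGuard:
    def __init__(self, texp):
        self.texp = texp       # texp(new_value, old_value) -> bool
        self.value = None
        self.old = None        # plays the role of the system-generated alpha_old

    def disassociate(self):    # TS-01: remember the value being removed
        self.old = self.value
        self.value = None

    def associate(self, new):  # TS-02: reject the update if texp fails
        if self.old is not None and not self.texp(new, self.old):
            return "REJECT"
        self.value = new
        return "OK"

# Increase-only salary with at least a 10% raise (the Engineer example):
salary = TransitionGuard(lambda new, old: new >= 1.1 * old)
salary.associate(50000)            # first assignment, old is nil: accepted
salary.disassociate()
result = salary.associate(51000)   # less than a 10% raise
```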

There are two things to be noted about the TS macro and its micro-rules. First, in defining the TS macro and its micro-rules, we have made an assumption that the represented transition rule applies only to those X objects whose α values have been assigned. For example, the constraint of non-updatable SSN would not be applicable to employees whose SSNs have never been filled in, and similarly, the constraint of an at least 10% increase on Salary would not be applicable to engineers whose salaries have not been decided. In general, if an X object's α attribute has never been assigned a value, then its α_old value is kept as null by default. Therefore, according to this assumption, the evaluation of the transition rule texp would not be executed unless "this.α_old # nil". Second, although a TS macro contains two micro-rules, it is not necessary that both are fired for the same update. It is possible that only TS-01 is fired if an update removes an attribute value without assigning a new value. It is also possible that only TS-02 is fired if an update assigns a value to an attribute whose current value is null. For example, an employee's SSN can be removed in one update and re-assigned in another update. The original non-updatable rule still holds even if the two micro-rules, TS-01 and TS-02, are triggered by two separate updates.

The TRANSITION constraint type defined in M6.1 can be further generalized, just like the other constraint types we have discussed, so that it can be applied only to some selected subsets of objects. The generalized TS macro with its micro-rule representation is available in APPENDIX A.6 of this dissertation.


5.2.6 MATHEMATICAL-DEPENDENCE (MD)


This macro is an inter-association constraint type. A set of attributes of a class are mathematically dependent if one attribute can be derived from the others by using a mathematical formula. A mathematical formula can be specified with arithmetic functions and operators such as 'sin', 'log', 'sqr' and '+', '-', '*', '/', etc. It is possible for a mathematical formula to be specified in many different but mathematically equivalent ways. For example, a formula for the attributes a, b, c, and d of some class could be in the form of "a = b + c - d", or "a - b = c - d", or "a + d = b + c", and so on. Such a formula specifies a "value relationship" constraint for the related associations which needs to be maintained during data entry and update. An example of this type of constraint can be found in the SDM schema S3 shown in Figure 5.2. In this schema, the formula "Salary = Position * 325.3 + 20000" derives a Salary value from a Position value, and it also imposes a mathematical dependence constraint upon the two attributes so that if one is updated, the other one must also be updated accordingly. Similar formulas can also be defined in other models such as EXPRESS, OSAM*, and NIAM.

The following MD macro and the micro-rule MD-01 provide a general form to represent and enforce a mathematical dependence constraint on the associations or attributes α1, α2, ..., αn of the class X. The constraint is specified as a function mexp(α1, ..., αn) which returns TRUE or FALSE. The classes Y1, Y2, ..., and Yn are the domains of α1, α2, ..., and αn, respectively.


Macro: MD (X, ((α1, Y1), ..., (αn, Yn)), mexp(α1, ..., αn)) (M7.1)
Micro:
rule MD-01 is /* defined in class X */
triggered immediate_after this.ASSOCIATE(α1, y1), ...,
    immediate_after this.ASSOCIATE(αn, yn)
condition exist this in (this AND(*>α1 y1:Y1, ..., *>αn yn:Yn)) | mexp(y1, ..., yn) otherwise REJECT;


The value relationship between the attributes Salary and Position has the MD macro representation "MD(Engineer, ((Salary, Real), (Position, Integer)), Salary = Position * 325.3 + 20000)". This value relationship needs to be maintained whenever an engineer has both Salary and Position values. In the micro-rule MD-01, the condition clause specifies a guarded expression which is verified after an X object (i.e., this) is associated with some yi through the association named αi. The function mexp is evaluated if the X object is associated with objects y1, y2, ..., and yn through the associations α1, α2, ..., and αn, respectively. In the evaluation, the function mexp(α1, ..., αn) is instantiated to mexp(y1, ..., yn). A false result will cause the triggering ASSOCIATE operation to be rejected and aborted. We note that the trigger of the rule contains associate operations only. Disassociate and retrieve operations are not included because their execution will not affect the value relationship of the attributes.
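The guarded evaluation of MD-01 can be sketched as follows. This Python fragment is illustrative only (function and variable names are ours): the formula is checked only once all dependent attributes have values, and a false result means the triggering update is rejected.

```python
# Sketch of rule MD-01: after an ASSOCIATE on one of the dependent attributes,
# evaluate the formula once all of them have been assigned. Illustrative only.

def md_check(obj, attrs, mexp):
    """Guarded condition: mexp is evaluated only when every attribute in
    'attrs' is assigned; a False result means REJECT."""
    values = [obj.get(a) for a in attrs]
    if any(v is None for v in values):      # guard not satisfied: accept
        return "OK"
    return "OK" if mexp(*values) else "REJECT"

# Salary = Position * 325.3 + 20000 (the Engineer example)
formula = lambda salary, position: salary == position * 325.3 + 20000

engineer = {"Salary": None, "Position": 10}
r1 = md_check(engineer, ["Salary", "Position"], formula)  # Salary unassigned

engineer["Salary"] = 10 * 325.3 + 20000
r2 = md_check(engineer, ["Salary", "Position"], formula)  # formula holds

engineer["Salary"] = 99999
r3 = md_check(engineer, ["Salary", "Position"], formula)  # formula violated
```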







The next representation, M7.2, is a generalized MD macro which allows a mathematical dependence constraint to be applied to subsets of objects selected by the expressions Py1, Py2, ..., and Pyn.


Macro: MD (X, Px, ((α1, Y1, Py1), ..., (αn, Yn, Pyn)), mexp(α1, ..., αn)) (M7.2)

An example of this generalized MD macro is "MD(Engineer, Engineer.Degree = "ME", ((Salary, Real, -), (Position, Integer, Integer < 25)), Salary = Position * 325.3 + 20000)". The same formula now is not applicable to all engineers, due to the added selection conditions. Only those engineers whose degree is "ME" and whose positions are lower than 25 use the formula to decide their salaries. In addition, every engineer's salary must be re-verified whenever his/her degree or position is updated, no matter whether he/she originally satisfied the selection conditions or not. This is necessary because, after an update of degree or position, an engineer may become qualified with respect to the two selection conditions, which means the engineer's salary has to be adjusted to satisfy the formula. In terms of micro-rule representation, the verification of the selection conditions has to be incorporated into the rule MD-01, and more micro-rules are needed to account for those operations that change the qualification of objects with respect to the selection conditions. (See APPENDIX A.7 for the detailed definition.)


5.2.7 LOGICAL-DEPENDENCE (LD)

Constraints of this type specify some logical relationships among a set of associations of a class. A logical relationship can be specified in a general way using a quantified association pattern such as "forall x in x:X suchthat exist y1, y2 in (x OR(*>α1 y1:Y1, *>α2 y2:Y2))", which means every X object must have at least one of the two associations, α1 or α2, with a Y1 or Y2 object. An example of a logical relationship existing in the two associations between the Student class and the Full_Time_Student and Part_Time_Student classes is illustrated in the NIAM schema (S5) in Figure 5.2. These two associations are logically dependent on each other because of the "EXCLUSION" constraint denoted by the symbol "X". Due to this EXCLUSION constraint, the two associations cannot co-exist. That is, a student can be either a full-time student or a part-time student, but not both. In the OSAM* model, a different symbol "SX", meaning "SET-EXCLUSION", is used, and in EXPRESS it is represented by "ENTITY Student SUPERTYPE OF (ONEOF (Full_Time_Student, Part_Time_Student));", where the keyword "ONEOF" means that a student can only be in one of the subclasses. This constraint can be uniformly specified in ORECOM by the following quantified expression:


forall s in (s:Student *> Full_Time_Student)
suchthat NOT exist p in (s *> p:Part_Time_Student)
AND
forall s in (s:Student *> Part_Time_Student)
suchthat NOT exist f in (s *> f:Full_Time_Student) (lexp1)


The NIAM schema shown in Figure 5.5 demonstrates another example of LOGICAL-DEPENDENCE constraints. In this schema, there is a "SUBSET" (S) constraint between the two associations Pays and Enrolls of the Student class. This



[Figure omitted: the Student class has an association Pays with TuitionFee and an association Enrolls with Course; a subset (S) constraint links the two associations.]

Figure 5.5. A NIAM schema with a "subset" constraint.


constraint requires that the set of students who have paid the tuition fee must be a subset of those who have enrolled in some course(s). The subset constraint is not available in EXPRESS and IDEF-1X models, but it is equivalent to the SET-SUBSET (SS) constraint







of OSAM* except that the latter is defined on two subclass associations. In the language K, it can be represented more generally as follows:


forall s in (s:Student *>Pays TuitionFee)
suchthat exist c in (s *>Enrolls c:Course) (lexp2)


The EXCLUSION and SUBSET constraints in the last two examples are both model-supported constraints. Many user-defined constraints which are embedded in application programs can also be specified by logic expressions. For example, the following conjunctive expression is used to represent a special constraint for the two attributes Position and Salary of the SDM schema (S3) in Figure 5.2. This constraint has to be implemented in a program to meet a particular application's need since it cannot be captured explicitly in the referenced SDM schema.


forall e in (e:Engineer *>Position Integer)
suchthat exist r in (e *>Salary r:Real)
AND
forall e in (e:Engineer *>Salary Real)
suchthat exist i in (e *>Position i:Integer) (lexp3)


According to the original schema, both Position and Salary are optional attributes. However, with the above constraint, if an engineer has one of these two attribute values, the other attribute must also be assigned. A similar kind of constraint, called SET-EQUALITY (SE) and EQUALITY (E), is supported by OSAM* and NIAM, respectively.

A logical dependence constraint can be used to specify the constraint on the domain of an attribute. An example of this is the first local rule defined in the WHERE clause of the EXPRESS schema S2 in Figure 5.2. According to this rule, a company's Fax number should always be an integer less than 9999999999. In the form of a logical expression, this can be specified as follows:


forall c in (c:Company *>Fax i:INTEGER)
suchthat (i < 9999999999) (lexp4)









In the following, we shall introduce a more general form, in terms of a macro and micro-rules, to represent the various logical dependence constraints on the associations α1, α2, ..., and αn.


Macro: LD (X, ((α1, Y1), ..., (αn, Yn)), lexp(X, ((α1, Y1), ..., (αn, Yn)))) (M8.1)
Micro:
rule LD-01 is /* defined in class X */
triggered immediate_after this.ASSOCIATE(α1, y1), ...,
    immediate_after this.ASSOCIATE(αn, yn),
    immediate_after this.DISASSOCIATE(α1, y1), ...,
    immediate_after this.DISASSOCIATE(αn, yn)
condition lexp(this, ((α1, Y1), ..., (αn, Yn))) otherwise REJECT;


The function lexp(X, ((α1, Y1), ..., (αn, Yn))) of the LD macro specifies a logical expression in the language K for the associations α1, α2, ..., αn of the class X. The logic expression can be a simple quantified association pattern such as "forall x in (x:X OR(*>α1 Y1, *>α2 Y2)) suchthat exist y3, ..., yn in (x OR(*>α3 y3:Y3, ..., *>αn yn:Yn))", or a compound quantified association pattern with the Boolean operators NOT, AND (∧), OR (∨), and logic implication (=>). Using M8.1, the above examples of logical dependence constraints can be represented as follows:

LD(Student, ((-, Full_Time_Student), (-, Part_Time_Student)), lexp1)
LD(Student, ((Pays, TuitionFee), (Enrolls, Course)), lexp2)
LD(Engineer, ((Position, Integer), (Salary, Real)), lexp3)
LD(Company, ((Fax, INTEGER)), lexp4)

The micro-rule LD-01 defined in the class X is the only rule for the LD macro. Its main task is to evaluate the logic expression specified in an LD macro. Since the logical relationship among the associations can be affected by any update of the associations, the micro-rule can be triggered by an ASSOCIATE or a DISASSOCIATE operation on any of the associations α1, α2, ..., and αn. When LD-01 is triggered, the argument X in the specified function lexp(X, ((α1, Y1), ..., (αn, Yn))) is instantiated to the X object of the triggering operation, i.e., lexp(this, ((α1, Y1), ..., (αn, Yn))). The binding of the other variables for the classes Y1, Y2, ..., and Yn still depends on their original quantifiers in the expression. The evaluation of lexp(this, ((α1, Y1), ..., (αn, Yn))) decides if the triggering operation has to be rejected. For example, in the second LD macro shown above, if a student has enrolled in only one course and he/she has paid the tuition fee, then a disassociation of the student from his/her associated course will be rejected because it violates the "SUBSET" relationship described in the logical expression.
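The re-evaluation behavior of LD-01 can be sketched in ordinary code. The Python fragment below is an illustrative approximation using the subset constraint between Pays and Enrolls; the predicate encoding and all names are ours, not language K.

```python
# Sketch of rule LD-01: re-evaluate the logical expression after any update of
# the involved associations. The subset constraint (every student who Pays must
# also Enroll in some course) serves as the example. Illustrative only.

def ld_check(student, lexp):
    """Evaluate the logical expression for one student; False means REJECT."""
    return "OK" if lexp(student) else "REJECT"

# The subset constraint as a Python predicate over one student object:
subset_rule = lambda s: (not s["pays"]) or len(s["enrolls"]) > 0

s = {"pays": True, "enrolls": ["COP5725"]}
r1 = ld_check(s, subset_rule)       # accepted

s["enrolls"].remove("COP5725")      # DISASSOCIATE the only enrolled course
r2 = ld_check(s, subset_rule)       # violates the subset constraint: rejected
```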

The macro defined in M8.1 can be extended to include some selection conditions for the involved classes as below:


Macro: LD (X, Px, ((α1, Y1, Py1), ..., (αn, Yn, Pyn)),
    lexp(X, ((α1, Y1), ..., (αn, Yn)))) (M8.2)


This generalized LD macro can be used to represent logical dependence constraints in a more general way. For example, a number of different logical relationships can exist in different subsets of objects on the same set of associations. This can be illustrated by a modified constraint of the NIAM schema in Figure 5.5. The original SUBSET (S) constraint applies to every student in the schema. By modifying it as shown below, only non-graduating students have to obey the subset constraint:

LD(Student, Student.Status # 'graduating', ((Pays, TuitionFee), (Enrolls, Course)), lexp2)


The micro-rules corresponding to the generalized LD macro need to be modified to include the selection conditions. Additional micro-rules are also needed so that operations that change the qualification of objects of X, Y1, ..., or Yn with respect to the selection conditions Px, Py1, ..., Pyn will trigger rules to verify the specified logical expressions. A complete micro-rule representation of M8.2 can be found in APPENDIX A.8.














CHAPTER 6
DATA MODEL ANALYSIS


In our study, we have examined a number of data models with a special emphasis on the semantics-rich models such as IDEF-1X, NIAM, EXPRESS, and OSAM*. Our objective is to use ORECOM's macro representation as a neutral representation to capture the underlying semantic properties of their modeling constructs and constraints. We have manually translated all the constructs and constraints of these models into parameterized macros, each of which can be further expressed by a set of micro-rules representing the operational semantics of a DBMS. Additionally, we have implemented a schema translation system to demonstrate the workability of schema translation through this neutral representation. In the following sections, we shall use some selected constructs and constraints of the above-mentioned four data models as examples to illustrate the concept of semantic decomposition. (A complete analysis of these four models is given in APPENDIX B of this dissertation.)


6.1 IDEF-1X


IDEF-1X [LOO86, 87] is an extension of the data model IDEF-1 (or Integrated Computer-Aided Manufacturing Definition Method 1). IDEF-1 was developed in the late 1970's under the auspices of the U.S. Air Force, and later became one of the best known data modeling techniques in the industry. This model is a hybrid of the ER model and the relational model. It uses the concepts of entities, attributes, and entity relationships to express data semantics, and it provides a nice graphical notation for representing some structural properties and constraints. Table 6.1 shows some examples of IDEF-1X construct and constraint patterns and their corresponding macro representations. Construct patterns I-C-01 to I-T-04-2 describe an entity, an attribute, and an alternate-key attribute, respectively. These concepts are commonly supported in other data models even though different terminologies and notations have been used. For example, corresponding to an IDEF-1X entity, NIAM uses the non-lexical object type (NOLOT) and OSAM* uses the entity class (E-class). An IDEF-1X alternate key is captured in EXPRESS and NIAM by a uniqueness constraint.

The construct I-C-01 of Table 6.1 is an IDEF-1X entity whose semantic properties can be represented in ORECOM by a Membership (MB) macro. This macro states that the IDEF-1X entity, X, can be mapped to an entity class of ORECOM whose members are system-named objects and which has an external identifier (Y1, ..., Yn), i.e., the composite primary key of X. The default object structure of X is 'simple' (denoted by the first '-' in the macro), and the class type is 'E-CLASS' because it defines a set of system-named objects. It has neither a membership constraint nor method specifications (denoted by the last two '-'s in the macro). The pattern I-C-04 captures the constraints of an attribute (Y), which relates instances of an entity class (X) to instances of the underlying domain of Y (i.e., dom(Y)). The symbol "(O)" after Y means that it is an optional attribute, which is represented in ORECOM by a partial participation constraint: the first Participation (PT) macro of I-C-04. On the other hand, not all instances of the domain of Y are associated with X instances, which is also a partial participation constraint, represented by the second Participation (PT) macro. The mapping between the entity class X and the domain class of Y is many-to-one, which is captured by the Cardinality (CD) macro of I-C-04. If Y is an alternate key (AK) of X as shown in I-T-04-2, then the macros in I-C-04 need to be modified to capture the "non-null" (or total participation) constraint and the "uniqueness" (or one-to-one cardinality mapping) constraint associated with an alternate-key attribute. We note here that the concepts of a primary key and an alternate key can be similarly decomposed into participation and cardinality constraints.







Table 6.1. Examples of IDEF-1X construct and constraint patterns.

No.        Pattern                          Macro Representation
I-C-01     an entity X with primary         MB( X, system-named, (Y1, ..., Yn), -, E-CLASS, -, - )
           key (Y1, ..., Yn)
I-C-04     an optional attribute            PT( X, -, 0, count(X), (Y, dom(Y), -) )
           Y(O) of X                        PT( dom(Y), -, 0, count(dom(Y)), (Y^-1, X, -) )
                                            CD( (-), (X, -), (Y, dom(Y), -), 1, M, 1, 1 )
I-T-04-2   an alternate-key attribute       PT( X, -, count(X), count(X), (Y, dom(Y), -) )
           Y(AK) of X                       PT( dom(Y), -, 0, count(dom(Y)), (Y^-1, X, -) )
                                            CD( (-), (X, -), (Y, dom(Y), -), 1, 1, 1, 1 )
I-C-05     a connection relationship R      PT( X, -, 0, count(X), (R^-1, Y, -) )
           between X and Y with an          PT( Y, -, 0, count(Y), (R, X, -) )
           optional foreign key Z(FK)(O)    CD( (-), (X, -), (R^-1, Y, -), 1, M, 1, 1 )
I-T-06-1   an identifying connection        PT( X, -, count(X), count(X), (R^-1, Y, -) )
           relationship R between X         PT( Y, -, count(Y), count(Y), (R, X, -) )
           and Y with the symbol "P"        CD( (-), (X, -), (R^-1, Y, -), 1, M, 1, 1 )
The last two constructs of Table 6.1, I-C-05 and I-T-06-1, are two examples of entity relationship between entities X and Y which is called in IDEF-1X the connection relationship. The former (see I-C-05) is graphically represented in IDEF-1X by a dashed line labelled with a verb 'R', and the primary key of entity Y is a foreign key (FK) of X. In ORECOM, we treat the inverse of R (i.e., R-1) as an attribute of X and the domain of R-1 is Y. In I-C-05, the foreign key Z is located below the line of entity X and is an optional attribute, which means that instances of X do not have to be associated with any Y instance (i.e., no existence dependency). This semantic property is captured in ORECOM by the first Participation (PT) macro. The black dot on the X side of the link indicates that a Y instance can be connected to zero, one, or many instances of X, which implies that Y is partially participated in the association R with X. This property is captured by the second







Participation (PT) macro. The cardinality mapping from X to Y in I-C-05 is many-to-one as captured by the Cardinality (CD) macro.

Compared with I-C-05, the construct I-T-06-1 is more constrained in two ways. First, the foreign key becomes a part of the primary key of X, which enforces an identifier dependency constraint and hence an existence dependency constraint. Second, every Y instance must be associated with at least one X instance (indicated by the symbol "P"). The identifier dependency is depicted in IDEF-1X by a solid line instead of the dashed line of I-C-05, and the entity box of X is changed to a round-cornered box, which indicates that the entity X is identifier-dependent on at least one other entity (e.g., entity Y in I-T-06-1). This constraint is represented by changing the first Participation

(PT) macro of I-C-05 to a total participation constraint. The constraint imposed by the symbol "P" is represented by changing the second Participation (PT) macro of I-C-05 to a total participation. The cardinality is not changed, i.e., it is still many-to-one. In a connection relationship in IDEF-1X, a label "Z" can be used in place of "P" to mean that zero or one Y instance is connected to an X instance. This constraint can be similarly expressed in a cardinality macro with different parameter values.


6.2 EXPRESS


EXPRESS [SCH89, ISO92] is an information modeling language which is a strong candidate for an international standard for product specification. An EXPRESS schema defines data types and their constraints. The definition of a data type 'positive' is shown below:
TYPE positive = INTEGER;
WHERE self > 0;
END_TYPE;
The above is called a defined data type, which is similar to the domain class (D-class) of OSAM* and the lexical object type (LOT) of NIAM. However, it is not explicitly supported by IDEF-1X since all domains of attributes are hidden from an IDEF-1X schema. Table








6.2 shows some general constructs of EXPRESS. The X in the construct E-C-08 is a defined data type whose domain is Y (which is implicitly a simple data type in EXPRESS). A defined data type can have a domain rule specified by an expression 'exp' in the WHERE clause. In ORECOM, this construct can be represented by a Membership (MB) macro having X as a domain class (i.e., containing self-naming objects, because the domain Y is a simple type containing self-naming objects) and the expression exp as its membership constraint. Since every member of the defined type X must be an instance of Y that satisfies exp, and every such qualified Y instance automatically becomes a member of X, two Participation (PT) macros are used to describe these two total participation constraints. In addition, a Cardinality (CD) macro is used to capture the one-to-one mapping relationship between X and Y.
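The membership semantics of E-C-08, in which X is exactly the subset of Y satisfying exp, can be sketched as follows. This is an illustrative Python fragment using our own names, not ORECOM's notation:

```python
def defined_type(base, exp):
    # Members of X are exactly the members of Y satisfying exp: total
    # participation in both directions over the qualified subset, with a
    # one-to-one mapping between X members and the Y instances they come from.
    return [v for v in base if exp(v)]

# e.g., TYPE positive = INTEGER; WHERE self > 0; END_TYPE;
positive = defined_type(range(-3, 4), lambda v: v > 0)
assert positive == [1, 2, 3]
```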



Table 6.2. Examples of EXPRESS construct and constraint patterns.

No.         Pattern                            Macro Representation

E-C-08      TYPE X = Y;                        MB( X, self-naming, -, -, Y, exp, - )
            WHERE exp;                         PT( X, -, count(X), count(X), (-, Y, exp) )
            END_TYPE;                          PT( Y, exp, count(Y), count(Y), (-, X, -) )
                                               CD( (-), (X, -), (-, Y, exp), 1, 1, 1, 1 )

E-C-01      SET [k1, k2] OF X                  MB( Y, self-naming, -, SET, X, -, - )
                                               PT( Y, -, count(Y), count(Y), (-, X, -) )
                                               PT( X, -, 0, count(X), (-, Y, -) )
                                               CD( (-), (Y, -), (-, X, -), 1, M, k1, k2 )

E-T-1516-3  ENTITY X;                          MD( X, -, ((Y, dom(Y), -), (Z1, dom(Z1), -), ...,
            DERIVE                                 (Zn, dom(Zn), -)), Y = exp(Z1 ... Zn) )
              Y : [agg] W := exp(Z1 ... Zn);
            END_ENTITY;

E-T-1516-5  ENTITY X;                          LD( X, -, ((Z1, dom(Z1), -), ..., (Zn, dom(Zn), -)),
            ...                                    exp(Z1 ... Zn) )
            WHERE exp(Z1 ... Zn);
            END_ENTITY;








For defining complex data types, EXPRESS provides four "aggregations" (i.e., SET, LIST, BAG, and ARRAY), which can be used in any combination and to any depth (e.g., LIST OF ARRAY OF ARRAY OF SET OF INTEGER). An equivalent feature is supported in OSAM*, but it is not generally available in other existing models. In EXPRESS, aggregations can be used in a TYPE declaration for defined data types or in an entity declaration for defining complex domains of attributes. In either case, each aggregation of a complex data type is viewed in ORECOM as defining a new class from the base (or domain) of that aggregation, whose members are complex objects having the structure specified by the aggregation. For example, the "SET OF INTEGER" in the above example is treated as defining a new class from INTEGER, say, SET-INTEGER, and each object of this new class represents a set of INTEGER objects. Based on this SET-INTEGER, a higher-level aggregation such as "ARRAY OF SET OF INTEGER" would define another new class, ARRAY-SET-INTEGER, whose members represent arrays of objects of the class SET-INTEGER. The general construct shown in E-C-01 of Table 6.2 is EXPRESS's way of specifying an aggregation of X with a minimum of zero (k1 = 0) and a maximum of u (k2 = u) elements of X. It can be translated into a number of macros. The Membership macro specifies that a newly created class named Y is defined as an aggregation of X and is a domain class. (Here X is assumed to be a simple data type or an aggregation of simple data types in EXPRESS.) The 'agg' in E-C-01 can be SET, LIST, or BAG only, because the lower and upper bounds of the remaining aggregation, ARRAY, have a different meaning and need to be represented by a separate construct. The lower bound of the 'agg' in E-C-01 is zero, which means that an aggregated instance of Y may not contain any X instance, and also, as a default condition, not every X instance has to become an element of some aggregated object of Y.
Both of these conditions are partial participation constraints and therefore can be represented by the two Participation macros in E-C-01. The upper bound (u) of the 'agg' determines the cardinality mapping between Y and X as many-to-u (or M-to-u), as shown in E-C-01. By changing the parameters of the macros associated with
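The layer-by-layer introduction of new aggregate classes can be sketched with a small helper. The function below is hypothetical (the dissertation does not give an algorithm for this), and the hyphenated class names follow the naming used above:

```python
def aggregation_class(aggs, base):
    """Name the new class defined by nested aggregations over a base type.

    The innermost aggregation is applied first, so ["ARRAY", "SET"] over
    INTEGER first yields SET-INTEGER, then ARRAY-SET-INTEGER.
    """
    name = base
    for agg in reversed(aggs):
        name = f"{agg}-{name}"
    return name

assert aggregation_class(["SET"], "INTEGER") == "SET-INTEGER"
assert aggregation_class(["ARRAY", "SET"], "INTEGER") == "ARRAY-SET-INTEGER"
```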







this construct, a number of other similar EXPRESS constructs, such as an aggregation with a non-zero lower bound or a LIST of unique elements, can be represented.

Besides TYPE declarations, entity declarations are the major part of an EXPRESS schema. An EXPRESS entity type is defined in terms of its properties (or attributes), each of which has an associated domain and an optional constraint on the domain. An attribute can be further constrained to be a non-optional attribute, a unique attribute, or a derived attribute. The first two types of constraints are similar to the non-null attribute and alternate-key attribute of IDEF-1X, which have been discussed previously. A derived attribute is shown in E-T-1516-3 of Table 6.2, in which the value of attribute Y is derived by the expression 'exp(Z1...Zn)'. Since the domain of Y (i.e., [agg] W) can be a simple data type, a defined data type (specified in the TYPE section), an entity type, or a complex data type specified by an aggregation of W, we simply use 'dom(Y)' as the general representation of the domain of Y. To capture the value relationship between the values of Y (i.e., members of [agg] W) and the domains of the attributes Z1...Zn from which Y is derived, a Mathematical-Dependence (MD) macro is used to specify "Y = exp(Z1...Zn)".
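The MD macro's semantics, namely that Y's value is functionally determined by the Z attributes, can be sketched as follows. This Python fragment is illustrative; the helper name and the entity representation as a dictionary are our own assumptions:

```python
def derive(entity, exp, z_names):
    # MD macro semantics: the derived attribute's value is a function of the
    # entity's Z attribute values, "Y = exp(Z1 ... Zn)".
    return exp(*(entity[z] for z in z_names))

e = {"Z1": 3, "Z2": 4}
assert derive(e, lambda z1, z2: z1 + z2, ["Z1", "Z2"]) == 7
```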

Another way to specify a constraint on an attribute in EXPRESS is to define a "local rule" in a WHERE clause, as illustrated in E-T-1516-5 of Table 6.2. It specifies that the values of attributes Z1, Z2, ..., and Zn of each entity of X have to satisfy a local rule expressed by an expression exp(Z1...Zn). Two examples of local rules are: "Z1 + 10 > Z2 + Z3" and "Z1 :=: Z2". The symbol ":=:" in the second expression represents an instance equality operator for two entity-typed attributes. This expression returns TRUE if both Z1 and Z2 refer to the same entity instance. A similar rule specification facility is also available in OSAM*, but each model has its own rule language syntax. In E-T-1516-5, the local rule is represented in ORECOM as an inter-association constraint using a Logical-Dependence

(LD) macro as shown in Table 6.2.








6.3 NIAM

NIAM is an information modeling methodology pioneered by G. M. Nijssen [VER82]. This model is sometimes referred to as a "binary semantic model" because it provides a binary representation of data, semantics, and constraints. The building blocks of NIAM are lexical objects (LOTs), non-lexical objects (NOLOTs), associations between LOTs and NOLOTs (called BRIDGEs), and associations between different NOLOTs (called IDEAs). Furthermore, both BRIDGEs and IDEAs are composed of a pair of ROLEs, which are usually verbs that describe the semantics of the associations. The most interesting feature of NIAM is that it supports many kinds of constraints on multiple associations. Besides the UNIQUENESS (U) constraint, which is similar to the alternate-key (AK) of IDEF-1X or the UNIQUE constraint of EXPRESS, there are three association constraints: EQUALITY (E), EXCLUSION (X), and SUBSET (S). These constraints are described separately in the patterns of N-C-08, N-C-10, and N-C-12 in Table 6.3. In these patterns, the NOLOTs Y1 and Y2 are assumed to be the domains of attributes a1 and a2 of X, respectively. We show these constraints on a pair of IDEAs even though they can also exist on a pair of BRIDGEs or between an IDEA and a BRIDGE. For each construct, a Logical-Dependence (LD) macro is used to capture the constraint in ORECOM's representation. Since the constraints are different, the logic expressions of their corresponding macros are different. For the constraint of EQUALITY (E) in N-C-08, objects of X must either have both values of a1 and a2 or neither of them. Therefore, the logical expression for N-C-08 is as follows:

(forall x in (x:X *>a1 Y1) suchthat exist y2 in (x *>a2 y2:Y2))
AND (forall x in (x:X *>a2 Y2) suchthat exist y1 in (x *>a1 y1:Y1))

This conjunctive association pattern expression ensures that there does not exist any X object which is associated with only one of the two classes, Y1 or Y2. As we discussed in the section on the LD macro, the 'x's in the above expression will be bound to the X object









instance of a triggering operation, which triggers the micro-rule LD-01 to evaluate the expression.





Table 6.3. Examples of NIAM construct and constraint patterns.

No.       Pattern                            Macro Representation

N-C-08    EQUALITY (E) between roles a1      LD( X, -, ((-, Y1, -), (-, Y2, -)), exp1 ) where
          (to Y1) and a2 (to Y2) of X        exp1 = (forall x in (x:X *>a1 Y1) suchthat exist y2 in
                                                 (x *>a2 y2:Y2)) AND
                                             (forall x in (x:X *>a2 Y2) suchthat exist y1 in
                                                 (x *>a1 y1:Y1))

N-C-10    EXCLUSION (X) between roles        LD( X, -, ((-, Y1, -), (-, Y2, -)), exp2 ) where
          a1 and a2 of X                     exp2 = (forall x in (x:X *>a1 Y1) suchthat NOT exist
                                                 y2 in (x *>a2 y2:Y2)) AND
                                             (forall x in (x:X *>a2 Y2) suchthat NOT exist
                                                 y1 in (x *>a1 y1:Y1))

N-C-12    SUBSET (S) between roles           LD( X, -, ((-, Y1, -), (-, Y2, -)), exp3 ) where
          a1 and a2 of X                     exp3 = (forall x in (x:X *>a1 Y1) suchthat
                                                 exist y2 in (x *>a2 y2:Y2))

N-C-15    supertype X with subtype Y         IH( X, -, Y )
                                             PT( X, -, 0, count(X), (-, Y, -) )
                                             PT( Y, -, count(Y), count(Y), (-, X, -) )
                                             CD( (-), (X, -), (-, Y, -), 1, 1, 1, 1 )

N-T-16-1  TOTALITY (T) on subtypes           PT( X, -, count(X), count(X), ((-, Y1, -), ...,
          Y1 ... Yn of X                         (-, Yn, -)) )

N-T-16-2  DISJOINT (#) on subtypes           LD( X, -, ((-, Y1, -), ..., (-, Yn, -)), exp4 ) where
          Y1 ... Yn of X                     exp4 = (forall x in (x:X *>a1 Y1) suchthat NOT exist
                                                 y2, ..., yn in (x AND(*>a2 y2:Y2, ..., *>an yn:Yn)))
                                             AND ... AND
                                             (forall x in (x:X *>an Yn) suchthat NOT exist y1, ...,
                                                 yn-1 in (x AND(*>a1 y1:Y1, ..., *>an-1 yn-1:Yn-1)))







The constraint of EXCLUSION (X) shown in N-C-10 requires that no object of X can have both values of a1 and a2 at the same time. The logical expression for this constraint in the macro representation is:


(forall x in (x:X *>a1 Y1) suchthat NOT exist y2 in (x *>a2 y2:Y2))
AND (forall x in (x:X *>a2 Y2) suchthat NOT exist y1 in (x *>a1 y1:Y1))

The semantics of the SUBSET (S) constraint of N-C-12 is that the set of X objects which have an a1 value is a subset of those X objects which have an a2 value; equivalently, the association a1 between an X object and a Y1 object implies the association a2 between that X object and a Y2 object. The logical expression for this constraint is shown below.


(forall x in (x:X *>a1 Y1) suchthat exist y2 in (x *>a2 y2:Y2))
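Each of the three constraints reduces to a simple set relation over the X objects that play each role. The following Python sketch is illustrative only; the helper names and the representation of role links as (x, y) pairs are our own assumptions:

```python
def holders(links):
    # the set of X objects that actually play the given role
    return {x for x, _ in links}

def equality(a1_links, a2_links):   # EQUALITY (E): same X objects play both
    return holders(a1_links) == holders(a2_links)

def exclusion(a1_links, a2_links):  # EXCLUSION (X): no X object plays both
    return not (holders(a1_links) & holders(a2_links))

def subset(a1_links, a2_links):     # SUBSET (S): a1-players also play a2
    return holders(a1_links) <= holders(a2_links)

a1 = {("x1", "y1")}
a2 = {("x1", "y2"), ("x2", "y3")}
assert subset(a1, a2)
assert not equality(a1, a2)
assert not exclusion(a1, a2)
```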

In addition to the above inter-association constraints, NIAM allows object types to form a supertype-subtype hierarchy. This important concept is supported in almost every new semantic or object-oriented data model, e.g., the generalization (G) of OSAM* and the supertype-subtype of EXPRESS and IDEF-1X. What is shown in the pattern of N-C-15 is a supertype-subtype constraint of NIAM between the NOLOTs X and Y. Since X is the supertype of Y, Y inherits all properties of X. The inheritance semantics of this construct is represented by the Inheritance (IH) macro of N-C-15. The two Participation (PT) macros capture the partial participation of X with Y and the total participation of Y with X, and the Cardinality (CD) macro captures the one-to-one mapping between X and Y.

In NIAM, a set of subtype associations may have two kinds of constraints: TOTALITY (T) and DISJOINT (#). The construct N-T-16-1 shows a TOTALITY (T) constraint on the subtypes (Y1, Y2, ..., Yn). This constraint states that the union of all the subtype objects should be equal to the set of objects of X. In other words, X participates totally in the set of subtypes (Y1, Y2, ..., Yn). This TOTALITY (T) constraint is called a "total specialization" in OSAM* and IDEF-1X. In ORECOM, it is neutrally








represented by an inter-association Participation (PT) macro as shown in N-T-16-1. The second type of constraint among subtypes is DISJOINT (#), as shown in N-T-16-2. It specifies that the objects of the subtypes of X cannot overlap. This constraint is similar to the EXCLUSION (X) of N-C-10 except that it is applicable to subtype classes only. The DISJOINT constraint is also available in OSAM* (i.e., the set-exclusion or SX constraint) and EXPRESS (i.e., the ONEOF constraint). In IDEF-1X, however, it is defined as a default constraint among subtype entities. In ORECOM, the DISJOINT (#) constraint specifies a logical relationship among subtypes and therefore is captured by a Logical-Dependence (LD) macro containing a logic expression as specified in Table 6.3.
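TOTALITY and DISJOINT likewise reduce to set relations over the subtype extents. A minimal sketch, with hypothetical helper names and extents represented as Python sets:

```python
def totality(x_objects, subtype_extents):
    # TOTALITY (T): the union of the subtype extents covers all of X.
    return set().union(*subtype_extents) == set(x_objects)

def disjoint(subtype_extents):
    # DISJOINT (#): no object belongs to more than one subtype.
    seen = set()
    for extent in subtype_extents:
        if seen & set(extent):
            return False
        seen |= set(extent)
    return True

assert totality(["o1", "o2", "o3"], [{"o1"}, {"o2", "o3"}])
assert disjoint([{"o1"}, {"o2", "o3"}])
assert not disjoint([{"o1"}, {"o1", "o2"}])
```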


6.4 OSAM*


OSAM* [SU89] is an object-oriented semantic association model developed at the Database Systems Research and Development Center of the University of Florida. The basic structural modeling concepts of this model are object classes (i.e., E-class and D-class) and associations between/among classes. There are five system-defined association types in OSAM* to represent different object/class relationships: aggregation (A), generalization (G), interaction (I), composition (C), and cross-product (X) associations. The A-association is similar to the attribute of EXPRESS, the attribute and connection relationship of IDEF-1X, and the BRIDGE (connecting one LOT and one NOLOT) and the IDEA (connecting two NOLOTs) of NIAM. The G-association is identical to the supertype-subtype relation of these models except for a few different optional constraints. The I-association is a special association which models the interactions among a set of entity classes and is similar to the relationship construct of the ER model. For example, the interactions among Student, Instructor, and Course classes can be modeled as objects of another class called Registration. In the construct shown in O-C-15 of Table 6.4, the entity class X is defined by an interaction among a number of classes including class Y. Z is an optional name of the association between X and Y. Three macros are needed for specifying







the constraints between X and Y or between X and any other constituent class. The first macro represents a total participation constraint, so that every X instance is existence-dependent on each of its constituent classes' instances. (It is not meaningful to record an interaction among some objects if they do not exist as members of their corresponding classes.) The second



Table 6.4. Examples of OSAM* construct and constraint patterns.

No.       Pattern                              Macro Representation

O-C-15    an I-association Z between the       PT( X, -, count(X), count(X), (Z, Y, -) )
          interaction class X and a            PT( Y, -, 0, count(Y), (Z-1, X, -) )
          constituent class Y                  CD( (-), (X, -), (Z, Y, -), 1, M, 1, 1 )

O-T-15-1  an I-association Z with "TP" on      PT( X, -, count(X), count(X), (Z, Y, -) )
          the constituent class Y              PT( Y, -, count(Y), count(Y), (Z-1, X, -) )
                                               CD( (-), (X, -), (Z, Y, -), 1, M, 1, 1 )

O-T-15-2  an indirect mapping                  CD( (Z1, Y1, -), (X, -), (Z2, Y2, -), p1, q1, p2, q2 )
          [p1, q1] : [p2, q2] between
          constituent classes Y1 and Y2

O-C-16    a C-association between X and        MB( X, system-named, -, SET, ENTITYOBJECT, Px, - )
          constituent classes Y1 ... Yn        where
                                               Px = (exist x in (x:X where x = Instance(Y1))) AND
                                               ... AND
                                               (exist x in (x:X where x = Instance(Yn)))





macro is a partial participation because not every Y instance has to participate in interactions with instances of other classes. The cardinality mapping between X and Y is many-to-one (i.e., a Y instance can participate in many interactions with the object instances of other constituent classes), as captured by the Cardinality (CD) macro. If all Y instances have to participate in some interactions with other instances as defined by instances of X, then a







keyword "TP" is specified above the class Y (see O-T-15-1) and the second Participation macro of O-C-15 is replaced by a total participation macro. An important characteristic of the I-association is that an indirect cardinality mapping constraint can be added for a pair of interacting classes. As shown in O-T-15-2, the indirect mapping between Y1 and Y2 is [p1, q1]-to-[p2, q2]; that is, every Y1 instance can participate in interactions with at least p2 and at most q2 Y2 instances, and every Y2 instance can participate in interactions with at least p1 and at most q1 Y1 instances. To represent this indirect mapping constraint in ORECOM, an inter-association Cardinality (CD) macro is used. Note that, in this construct, the constraints of each individual pair (X and one of its constituent classes) are supposed to be captured by the constructs O-C-15 or O-T-15-1.
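The indirect mapping check can be sketched by counting, through the interaction objects, how many partners each constituent instance has. The fragment below is illustrative; the function name, the triple representation, and the small bounds used in the example are our own assumptions (not the dissertation's [7, 120] : [1, 5] example):

```python
from collections import defaultdict

def indirect_mapping_ok(interactions, y1s, y2s, p1, q1, p2, q2):
    # interactions: (x, y1, y2) triples recorded by the interaction class X.
    to_y2, to_y1 = defaultdict(set), defaultdict(set)
    for _, y1, y2 in interactions:
        to_y2[y1].add(y2)
        to_y1[y2].add(y1)
    # Every Y1 instance has between p2 and q2 Y2 partners, and every Y2
    # instance has between p1 and q1 Y1 partners.
    return (all(p2 <= len(to_y2[y]) <= q2 for y in y1s) and
            all(p1 <= len(to_y1[y]) <= q1 for y in y2s))

regs = [("r1", "s1", "c1"), ("r2", "s2", "c1")]
assert indirect_mapping_ok(regs, ["s1", "s2"], ["c1"], p1=2, q1=120, p2=1, q2=5)
```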

The last example OSAM* construct in Table 6.4 (i.e., O-C-16) is particularly useful for statistical database applications. This construct represents a Composition or C-association. It means that the dynamic sets of objects in classes Y1, Y2, ..., Yn are instances of X. Any attribute (usually a statistical summary attribute) of class X (defined by an aggregation association not shown in O-C-16) would characterize the set-structured instances rather than the individual members of the sets. As a consequence, the Membership (MB) macro representation of this construct shows that X is a system-named class, its structure is SET, and its class type is ENTITYOBJECT, where ENTITYOBJECT is a system-defined class of all entity objects of a database. The constraint on the members of X is that each member of X corresponds to the entire set of instances of Yi for i = 1..n. Here, Instance(Yi) denotes the entire set of instances of Yi.
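The distinguishing point, that a C-association member is the entire current extent of a constituent class rather than an individual object, can be sketched as follows (illustrative Python; the helper name and dictionary representation are our own):

```python
def composition_instances(extents):
    # Each member of the C-association class X corresponds to the entire
    # extent of one constituent class Yi; summary attributes of X would
    # characterize these set-structured members, not their elements.
    return {name: frozenset(instances) for name, instances in extents.items()}

x = composition_instances({"Y1": {"a", "b"}, "Y2": {"c"}})
assert x["Y1"] == frozenset({"a", "b"})   # one member per constituent class
assert len(x) == 2
```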














CHAPTER 7
THE DATA MODEL AND SCHEMA TRANSLATION SYSTEM


In order to achieve the interoperability of heterogeneous database systems, a semantic-preserving translation of the modeling constructs and constraints captured by different data models and defined in different schemata is a necessity. In the previous chapters, we have introduced the technique of data model analysis, which is based on the principle of semantic decomposition using the developed core model ORECOM. This work provides us with a framework for the development of a semantic-preserving schema translation system for the translation of schemata defined in EXPRESS, IDEF-1X, NIAM, and OSAM*. This chapter describes the translation process and the architecture of such a system.


7.1 System Overview

The objective of a heterogeneous schema translation system is to translate a source schema of one data model into an equivalent target schema of another data model. Since a schema is defined in terms of the modeling constructs and constraints of its underlying data model, the translation between schemata needs information about the equivalence relationships between the modeling constructs and constraints of the source and target data models. The ORECOM-based translation system therefore consists of two subsystems: a data model translation subsystem, which provides the equivalence relationships for each pair of data models, and a schema translation subsystem, which uses the equivalence information to perform the translation of schemata.














7.1.1 Subsystem-1: Data Model Translation


The analysis of data models to establish their equivalences and discrepancies consists of the following three steps:


Step 1: Identify the basic modeling constructs and their associated constraints of both the
source and target data models. Each identified construct and its constraints is
represented as a syntactical or graphical pattern and is labeled with a pattern index.
(Data model analysis)

Step 2: Determine the macro and micro-rule representations of each pattern identified in
Step 1. (Decomposition)

Step 3: Based on the macro representations of the source and target patterns, determine
their equivalences and discrepancies. (Equivalence analysis)



These steps are carried out by persons who are knowledgeable about the data models being considered. As illustrated by Figure 7.1, the two models are individually processed in Steps 1 and 2. The results are two sets of patterns and their macro and micro-rule representations, which are stored as the metadata of the pair of processed models. Then, an equivalence analysis is carried out using their macro representations to derive their equivalence relationships. The output of the equivalence analysis is an equivalence matrix with its rows and columns representing the source and target patterns, respectively. Each non-empty element of the matrix is called an adjustment factor, which specifies the discrepancy (in terms of macros) between a source pattern and the closest pattern that can be found in the target model. If two patterns are equivalent, the corresponding matrix element is marked by an equal sign (=). As an example, let us consider the patterns indexed as AC1 of model A and BC5 of model B in Figure 7.1. AC1 is assumed to be decomposed into three macros, M1, M2 and M3, and BC5 is decomposed into M1 and M2. If BC5 is the closest construct of model B to AC1, then, to make these two patterns







equivalent, the macro M3 has to be added to BC5 as the adjustment factor. That is, "AC1 = BC5 + M3". We note that, although we are using an equivalence matrix for the mapping of each pair of data models, this approach is different from the direct translation approach discussed in Chapter 2, since the schema translation method (see Section 7.1.2) is separated from the data (i.e., the equivalence information provided by the matrices) which drive the translation process. In other words, our schema translation method is general in the sense that it is independent of the source and target models. Only the equivalence matrices are specific to pairs of data models.
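The derivation of an adjustment factor amounts to a set difference over macro decompositions. A minimal sketch (illustrative Python; the function name is our own):

```python
def adjustment_factor(source_macros, target_macros):
    # Macros to add to (and remove from) the closest target pattern so that
    # it becomes semantically equivalent to the source pattern.
    add = sorted(set(source_macros) - set(target_macros))
    subtract = sorted(set(target_macros) - set(source_macros))
    return add, subtract

# AC1 = {M1, M2, M3}; closest target pattern BC5 = {M1, M2}  =>  AC1 = BC5 + M3
assert adjustment_factor({"M1", "M2", "M3"}, {"M1", "M2"}) == (["M3"], [])
```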


Figure 7.1 Data model translation.


To translate any pair of data models among EXPRESS, IDEF-1X, NIAM, and OSAM*, twelve equivalence matrices are needed; they have been manually analyzed and their results are included in Appendix C of this dissertation.








7.1.2 Subsystem-2: Schema Translation


Schema translation is driven by the equivalence information generated by subsystem-1. Given a source schema, its equivalent target schema defined in a different data model is generated by the following three steps:

Step 1: Based on the syntax of the data definition language (textual or graphical) of the
source data model, compile the input schema into a set of basic patterns of
constructs and their constraints which have been identified in Step 1 of
subsystem-1. (Schema compilation)

Step 2: For each pattern of the input schema, search the equivalence matrix of
subsystem-1 for an equivalent pattern of the target data model. (Equivalence search)

Step 3: Collect all searched patterns of the target data model and generate the target
schema. If there are discrepancies found in Step 2, they are used either to generate
an explanation to the user so that he/she can take care of the discrepancies in
his/her application development using the target schema, or to generate program
code to be incorporated into the user's application system. The automatic
generation of explanations and programs is possible since the semantics of the
discrepancies have been explicitly defined by macros and micro-rules, which
define the trigger conditions and database operations in great detail. (Schema
generation)
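The three steps above can be sketched as a small matrix-driven pipeline. The Python fragment below is an illustrative simplification, not the system's implementation; pattern and instance names follow the examples used later in this chapter:

```python
def translate(source_patterns, matrix):
    """source_patterns: (pattern index, instance) pairs from Step 1.

    matrix maps each source pattern to (target pattern, adjustment factor);
    an absent entry or a non-empty adjustment becomes a reported discrepancy.
    """
    target, discrepancies = [], []
    for pattern, instance in source_patterns:
        entry = matrix.get(pattern)              # Step 2: equivalence search
        if entry is None:                        # no counterpart in the target model
            discrepancies.append((pattern, instance))
            continue
        target_pattern, adjust = entry
        target.append((target_pattern, instance))  # input to Step 3 generation
        if adjust:
            discrepancies.append((pattern, adjust))
    return target, discrepancies

matrix = {"O-C-06": ("I-C-01", None),
          "O-T-11-3": ("I-C-04", "- CD(I-C-04) + CD(O-T-11-3)")}
tgt, disc = translate([("O-C-06", "STUDENT"), ("O-T-15-2", "[7,120]:[1,5]")],
                      matrix)
assert tgt == [("I-C-01", "STUDENT")]
assert ("O-T-15-2", "[7,120]:[1,5]") in disc
```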


The overall procedure of a schema translation is illustrated by Figure 7.2. Subsystem-2 has an optional input specifying additional constraints to be added to the source schema. The added constraints are semantic properties of the applications that were not modeled in the source schema3 but that the user wants to have incorporated in the target schema. These added constraints can be specified in either


3Some added constraints may be extracted from the code embedded in application programs, in the method implementations of an object-oriented model, or in the rule specification block of a schema defined in a data model like EXPRESS. Or they can be obtained directly from the original schema designer or the schema translation administrator.







ORECOM macros or in some rule specification language. In the latter case, the specification needs to be translated into macros which can then be merged with the compiled source schema for translation into the target schema.




Figure 7.2 Schema translation.



7.2 System Architecture


Combining the above two subsystems, the architecture of a data model and schema translation system is shown in Figure 7.3. Subsystem-1 contains two major processing steps (i.e., semantic analysis and equivalence analysis) and a database to store the macro/micro-rule definitions, decomposed data model patterns, and equivalence matrices. Semantic analysis is a manual process which carries out the tasks of Step 1 and Step 2 of subsystem-1. For each data model, its semantics is analyzed and the result (i.e., decomposed patterns with macro representation) is stored for use in the equivalence analysis. During a semantic analysis, the stored macro definitions (and their micro-rules)







Figure 7.3 Architecture of the data model and schema translation system.







can be accessed by the human analyzer to aid his/her analysis. Additional micro-rules or macros can be defined by the analyzer to extend the system's capability to capture new semantic properties. The equivalence analysis step performs the manual task stated in Step 3 of subsystem-1 to generate an equivalence matrix for a given pair of source and target data models. This analysis is intended to be a general process: by taking advantage of the neutral representation of each model, the equivalence relationships between a source and a target model can be derived without referring to their original representations. Basically, it functions as a pattern matching process by looking into the macro representation of every target pattern to find the one closest to the given source pattern. Subsystem-2 is an automated system. It has three sets of working modules. The first set forms the front-end interface, which contains schema compilers and rule translators to execute the tasks specified in Step 1 of schema translation. There are as many schema compilers and rule translators as the number of data models handled by the translation system. The second set contains only a single module, which carries out the equivalence search (i.e., Step 2 of subsystem-2). It takes the merged output of the schema compiler and rule translator and makes use of the equivalence matrix to map the source patterns of constructs and constraints to the target patterns and possibly a list of macros which represent the discrepancies between the two schemata. The third set of modules is composed of schema generators and rule translators. A schema generator does exactly the reverse task of a schema compiler: it generates the target schema using the set of target patterns. Similarly, a rule translator translates the discrepancies expressed in macros into some representation such as plain English statements, rules of a particular constraint language, or programs in some programming language.


7.3 An Example of OSAM* to IDEF-1X Schema Translation


In this section, we use an example of translating an OSAM* schema into its equivalent IDEF-1X schema to illustrate the translation process performed by the system








presented in the last section. The source schema is defined in OSAM* and is shown in Figure 7.4. This schema models the interaction between the STUDENT and COURSE classes, which is modeled as another E-class, REGISTRATION, using the I-association. These three classes have their own user-defined keys (i.e., S#, C#, and R#, respectively), all of which have INTEGER as their domain class. The REGISTRATION class has an optional but one-to-one attribute named ProcessedBy, which is defined on another D-class, STRING. The semantics of this schema is that each REGISTRATION object represents a particular registration made by a single student for a single course, but a student can register for many courses and a course can be registered for by many students. Besides this general semantics, there are two additional constraints: (1) the symbol "TP" alongside COURSE specifies that every course must be registered for by some student(s), and (2) the indirect mapping of "STUDENT : COURSE" is "[7, 120] : [1, 5]", meaning a student can register for at most five courses and a course must be registered for by at least seven and at most 120 students.













Figure 7.4 The OSAM* source schema.







The translation starts with the OSAM* schema compiler, which reads and compiles the source schema into a number of OSAM* pattern types, shown in Figure 7.5. Each pattern type has a list of instances, each of which contains the class or attribute names defined in the schema that form the pattern instance. A pair of braces ("{ }") is used to enclose a pattern instance. For example, the first pattern, O-C-06, in Figure 7.5 specifies that the three classes STUDENT, COURSE, and REGISTRATION satisfy this pattern type, which denotes all E-classes.




O-C-06 [ {STUDENT, (S#)}
         {COURSE, (C#)}
         {REGISTRATION, (R#)} ]

O-T-11-3 [ {REGISTRATION, ProcessedBy, STRING} ]

O-T-1112-4 [ {STUDENT, (S#, INTEGER)}
             {COURSE, (C#, INTEGER)}
             {REGISTRATION, (R#, INTEGER)} ]

O-C-15 [ {REGISTRATION, RequestedBy, STUDENT}
         {REGISTRATION, Title, COURSE} ]

O-T-15-1 [ {REGISTRATION, Title, COURSE} ]

O-T-15-2 [ {REGISTRATION, ((RequestedBy, STUDENT), (Title, COURSE)), 7, 120, 1, 5} ]

Figure 7.5 The OSAM* source schema after compilation.
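The compiled output in Figure 7.5 can be mirrored in a simple data structure. The sketch below is our own layout (the prototype's internal C structures are not described at this level of detail, and the helper `classes_in` is hypothetical): each pattern-type name maps to its list of instances.

```python
# Compiled source schema, mirroring Figure 7.5: pattern type -> list of instances.
compiled_source = {
    "O-C-06":    [("STUDENT", ("S#",)),
                  ("COURSE", ("C#",)),
                  ("REGISTRATION", ("R#",))],
    "O-T-11-3":  [("REGISTRATION", "ProcessedBy", "STRING")],
    "O-T-1112-4": [("STUDENT", ("S#", "INTEGER")),
                   ("COURSE", ("C#", "INTEGER")),
                   ("REGISTRATION", ("R#", "INTEGER"))],
    "O-C-15":    [("REGISTRATION", "RequestedBy", "STUDENT"),
                  ("REGISTRATION", "Title", "COURSE")],
    "O-T-15-1":  [("REGISTRATION", "Title", "COURSE")],
    "O-T-15-2":  [("REGISTRATION",
                   (("RequestedBy", "STUDENT"), ("Title", "COURSE")),
                   (7, 120, 1, 5))],
}

def classes_in(pattern):
    """All class names appearing as the first element of an instance."""
    return sorted({inst[0] for inst in compiled_source[pattern]})
```

With this layout, querying O-C-06 yields the three E-classes of the source schema.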




The list of OSAM* schema pattern types is then processed by the equivalence search module to map them into a list of pattern types of the target model (IDEF-1X). Here, the equivalence matrix of OSAM*-to-IDEF-1X generated by subsystem-1 is used for this purpose. A part of this matrix is shown in the table of Figure 7.6. In this table, most OSAM* patterns have an equivalent IDEF-1X pattern except O-T-11-3 and O-T-15-2. The pattern O-T-11-3 captures the constraint of an optional attribute with a one-to-one







cardinality constraint, whose closest pattern in IDEF-1X is I-C-04 (see Figure 6.1 or APPENDIX B.1). In order to preserve the semantics in this translation, Cardinality (CD) macros need to be added and subtracted. For O-T-15-2, which specifies the indirect mapping constraint of OSAM*'s I-association, no corresponding pattern exists in IDEF-1X. Therefore, it becomes a discrepancy between these two models. For those OSAM* patterns which have equivalent IDEF-1X patterns, their associated values are copied to instantiate the IDEF-1X patterns as shown in Figure 7.7.




OSAM*          IDEF-1X
(source)       (target)

O-C-06     =   I-C-01
O-T-11-3   =   I-C-04 - CD(I-C-04) + CD(O-T-11-3)
O-T-1112-4 =   I-T-04-2
O-C-15     =   I-T-05-1
O-T-15-1   =   I-T-06-1
O-T-15-2   =   (no equivalent)





Figure 7.6 Part of the equivalence matrix for translating OSAM* to IDEF-1X.
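The equivalence search step described above can be sketched as a lookup in such a matrix. The fragment below is a hypothetical Python rendering (the table contents follow Figure 7.6, but the function and data layout are ours, not the prototype's): a matched source pattern instantiates its target pattern and contributes any macro adjustments, and an unmatched pattern is reported as a discrepancy.

```python
EQUIV_MATRIX = {  # OSAM* -> IDEF-1X, after Figure 7.6
    "O-C-06":     ("I-C-01",   []),
    "O-T-11-3":   ("I-C-04",   ["- CD(I-C-04)", "+ CD(O-T-11-3)"]),
    "O-T-1112-4": ("I-T-04-2", []),
    "O-C-15":     ("I-T-05-1", []),
    "O-T-15-1":   ("I-T-06-1", []),
    # O-T-15-2 has no IDEF-1X equivalent, so it is absent from the matrix.
}

def translate(compiled):
    """Map compiled source pattern types to target ones via the matrix."""
    target, macros, discrepancies = {}, [], []
    for pattern, instances in compiled.items():
        if pattern in EQUIV_MATRIX:
            tgt, adjustments = EQUIV_MATRIX[pattern]
            target[tgt] = list(instances)   # copy values to instantiate target
            macros.extend(adjustments)      # CD macros to add and subtract
        else:
            discrepancies.append(pattern)   # unrepresentable constraint
    return target, macros, discrepancies
```

Feeding it the compiled schema of Figure 7.5 would reproduce the instantiated patterns of Figure 7.7 with O-T-15-2 reported as the sole discrepancy.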


The last step of the translation is to generate the target schema from the list of instantiated IDEF-1X pattern types. The IDEF-1X schema generator first parses the list of patterns into its internal dictionary and then composes and displays the target schema. All discrepancies, in terms of the adjusted macros, are also displayed with the target schema. The IDEF-1X target schema is shown by the diagram of Figure 7.8. (In this target schema, we have assumed that the two verb phrases 'IsTitleOf' and 'Registers' are the inverses of the attributes 'Title' and 'RequestedBy', respectively.)










I-C-01 [ {STUDENT, (S#)}
         {COURSE, (C#)}
         {REGISTRATION, (R#)} ]

I-C-04 [ {REGISTRATION, (ProcessedBy, STRING)} ]

I-T-04-2 [ {STUDENT, (S#, INTEGER)}
           {COURSE, (C#, INTEGER)}
           {REGISTRATION, (R#, INTEGER)} ]

I-T-05-1 [ {REGISTRATION, (RequestedBy, STUDENT)} ]

I-T-06-1 [ {REGISTRATION, (Title, COURSE)} ]

- CD ((-), (REGISTRATION, -), (ProcessedBy, String, -), 1, 1, 1, 1)
+ CD ((-), (REGISTRATION, -), (ProcessedBy, String, -), 1, M, 1, 1)
+ CD ((RequestedBy, STUDENT, -), (REGISTRATION, -), (Title, COURSE, -), 7, 120, 1, 5)


Figure 7.7 The search module output for generating an IDEF-1X target schema.




STUDENT                        COURSE
  S#                             C#

   Registers                  IsTitleOf
   (RequestedBy)              (Title)
                                 P

            REGISTRATION
              R#
              S# (FK)
              C# (FK)
              ProcessedBy (AK)

- CD ((-), (REGISTRATION, -), (ProcessedBy, String, -), 1, 1, 1, 1)
+ CD ((-), (REGISTRATION, -), (ProcessedBy, String, -), 1, M, 1, 1)
+ CD ((RequestedBy, STUDENT, -), (REGISTRATION, -), (Title, COURSE, -), 7, 120, 1, 5)

Figure 7.8 The generated equivalent IDEF-1X target schema with the discrepancies.
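The generator's composition step can be illustrated with a toy sketch. The code below is a deliberate Python simplification (function names and the textual layout are ours, not the prototype's): it emits one entity box per class, migrating each parent's key into the child as a foreign key, as in Figure 7.8.

```python
def generate_entities(key_pattern, rel_patterns, alt_keys=()):
    """key_pattern: (entity, primary_key) pairs; rel_patterns: (child, attr,
    parent) triples; alt_keys: (entity, attribute) alternate-key pairs."""
    keys = dict(key_pattern)
    lines = []
    for entity, pk in key_pattern:
        lines.append(entity)
        lines.append("  " + pk)
        for child, _, parent in rel_patterns:
            if child == entity:
                lines.append("  %s (FK)" % keys[parent])  # migrated parent key
        for ent, attr in alt_keys:
            if ent == entity:
                lines.append("  %s (AK)" % attr)          # alternate key
        lines.append("")
    return "\n".join(lines)
```

Run over the instantiated patterns of Figure 7.7, this yields the REGISTRATION box of Figure 7.8 with S# and C# as foreign keys and ProcessedBy as an alternate key.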









7.4 System Implementation


The implementation of the data model and schema translation system started in 1990 at the Database Systems R&D Center of the University of Florida. Six Master's students have collaborated in this project to develop the schema translation subsystem, including four pairs of schema compilers and generators and one equivalence search module. A prototype system which can translate schemata of any pair (in either direction) of the four data models (EXPRESS, IDEF-1X, NIAM, and OSAM*) has been completed. This prototype is implemented in C and runs on both UNIX-based Sun workstations and AIX-based IBM RS-6000 workstations. The schema translation system features a graphical user interface (GUI) implemented in an X Window environment using the OSF/Motif widget set. Although subsystem-1 can also be implemented to automatically translate data models, we have not done so at this stage. We have, however, manually converted the modeling constructs and constraints into their ORECOM representations (i.e., data model pattern types) and derived the pair-wise equivalence matrices for all the combinations of the above four models to support the schema translation (see APPENDIX C). The prototype system has been demonstrated at RPI, NIST, and IBM Kingston in May 1992 and at the second annual EXPRESS User's Group meeting (EUG'92) in Texas in October 1992.

In this prototype system, a schema compiler and a schema generator are implemented for each of the four data models. In particular, for IDEF-1X, both graphical and textual interfaces have been implemented so that an IDEF-1X schema can be processed (input and output) in either format: diagrammatically [GAR91] or in its textual language, called SML. EXPRESS has an official graphic representation called EXPRESS-G, but it only supports a subset of EXPRESS. The schema translation project is currently involved in developing a graphics-based EXPRESS toolset including an EXPRESS-G editor, a browser, and a query processor. In addition, an EXPRESS schema compiler [CHE91] and a generator were implemented for the old version of EXPRESS (ISO TC184/SC4/WG5,








N14, 1989). They have been upgraded to the new version released in August 1992 (N151). For NIAM, only a textual interface (schema compiler and generator) has been implemented [UPP93]. It is based on a constraint language of NIAM called RIDL [NIE88]. The OSAM* schema compiler and generator are adopted from the OSAM*.KBMS project [SRE93].

The equivalence search module of the schema translation subsystem has also been implemented. A future plan for this module is to enhance its capability so that it can accept macros directly from the system input and merge them with the compiled source schema. The implementation of the two rule translators is yet to be carried out. In the absence of these rule translators, the present system allows additional constraints to be input in terms of ORECOM macros, and the discrepancies found in a translation are presented to the user in macros and micro-rules without interpretation.


7.5 Applications


The data model and schema translation system presented above provides a general framework for resolving the data model heterogeneity problem found in multimodel database systems. Many problems associated with the interoperability of multimodel database systems, such as data model learning, schema integration, and schema verification and optimization, can also benefit from the system and its semantic decomposition technique.


7.5.1 Data Model Learning


One possible application of ORECOM is to assist a user or a database system designer or developer in learning the semantics of a new data model in two ways. First, when a data model is mapped to an ORECOM representation, each of its modeling constructs and associated constraints is fully decomposed (as presented in the previous sections) into a concise parameterized representation (i.e., the macro representation) and its







corresponding operational semantics (i.e., the micro-rule representation). A user can learn quite easily the modeling concepts of a new data model from the macro representations of its constructs, and a system designer or developer can gain precise information about the DBMS operations needed to implement a construct or enforce a constraint from its micro-rule representation. This information is useful in designing a schema, implementing a database, or writing a control or interface program. Second, since different data models can be mapped to the neutral ORECOM representation, the data model learning process can be considerably eased by comparing the neutral representation of a new construct or constraint with that of the corresponding construct or constraint in a model familiar to the user. Because all decomposed modeling constructs and constraints share the neutral representation, a cross-reference between a new model and a well-understood model can be automatically generated to compare their commonalities and differences.
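The automatic cross-reference generation mentioned above can be sketched as a comparison of macro sets. In the hypothetical Python fragment below, each construct of a model is represented by the set of neutral macros it decomposes into; the construct and macro names used here are illustrative, not ORECOM's actual inventory.

```python
def cross_reference(model_a, model_b):
    """model_x: dict mapping construct name -> frozenset of neutral macros.
    Returns (construct_a, construct_b, relation) triples for constructs whose
    neutral decompositions are identical or merely overlapping."""
    pairs = []
    for ca, macros_a in model_a.items():
        for cb, macros_b in model_b.items():
            if macros_a == macros_b:
                pairs.append((ca, cb, "identical"))
            elif macros_a & macros_b:
                pairs.append((ca, cb, "overlapping"))
    return pairs
```

A construct in the new model that shares all of its macros with a familiar construct is flagged as identical; partial sharing exposes exactly which semantics differ.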


7.5.2 Schema Integration


In a multimodel database system environment, schema integration is a necessary task in the development of a federated or integrated database management system. A desirable approach to schema integration is to first translate heterogeneous schemata into common representations and then integrate them. This approach is used in the work reported in [SPA92]. The use of ORECOM as the neutral model has the following benefits: 1) the integration can be carried out on the basis of macros, whose low-level and primitive representations can distinguish the fine semantic differences of modeling constructs and constraints which cannot be distinguished by a high-level common model, 2) a tightly integrated global schema can be generated to capture not only the structures but also the constraints and operations of the component schemata, and 3) the mapping relationship between the global and the component schemata can be recorded and reported in detail (in terms of macros or micro-rules) for interfacing the global and local query and transaction processes.









7.5.3 Schema Verification and Optimization


The existing tools for checking the correctness of a schema (i.e., verification) and for removing the redundant constructs or constraints from a schema (i.e., optimization) are usually data-model-dependent. Using ORECOM as a neutral model, the development of a common schema management tool for these two tasks is possible because heterogeneous schemata can be translated into the primitive ORECOM representations before applying verification and optimization techniques on them.













CHAPTER 8
CONCLUSIONS


In this dissertation, we have described the design and development of a data model and schema translation system. The underlying core data model, the semantic decomposition technique, the translation methodology, and the implementation architecture of the system have been presented. The development of the translation system is based upon the following two basic principles. First, a neutral data model is used as the intermediate representation in data model and schema translations to reduce the complexity and to avoid a large number of pair-wise direct translations. Second, high-level modeling constructs and constraints are decomposed into some low-level, neutral, primitive semantic representations so that the equivalence relationship between different constructs and constraints of different data models can be determined precisely and specified explicitly.

We have presented ORECOM as the neutral representation to facilitate the semantic decomposition of high-level data models. Different from those semantically-rich data models which provide high-level structural constructs and constraints, ORECOM is a low-level data model consisting of a number of modeling primitives, i.e., objects, classes, associations, object operations, micro-rules, and macros. High-level structural constructs and constraints are decomposed into ORECOM's modeling primitives. For the convenience of data model and schema translations, we have defined eight macros and their semantics using the powerful knowledge rule specification language K to represent the eight basic constraint types found in many existing data models. Due to their compact and uniform representations, the process of semantic decomposition is made simpler. The analysis of these constraint types is based on our study of the semantic properties of modeling










constructs and constraints found in several semantically-rich data models, including IDEF-1X, EXPRESS, NIAM, and OSAM*.

To verify our semantic analysis of data models, the technique of semantic decomposition, and the utility of the neutral core model ORECOM for schema translation, we have developed a schema translation system which is capable of translating schemata defined in the IDEF-1X, EXPRESS, NIAM, and OSAM* models. The translation of high-level constructs and constraints to ORECOM representations, the matching and transformation of the ORECOM representations of one model to those of another, and the generation of target schemata have been described in this dissertation. Other applications of the neutral model in addition to schema translation have also been presented.

We shall in the following summarize the contributions of this research effort and provide the direction for future work.


8.1 The Contributions of the Research


One of the major contributions of our research is the development of the Object-oriented Rule-based Extensible Core Model (ORECOM) and the formalization of its general semantic constraint types. It has been shown that, by using ORECOM's modeling primitives as the neutral representation, the semantics captured by many different data models can be evaluated and decomposed. By comparing the primitive representations of the decomposed constructs and constraints, we are able to identify whether these constructs and constraints are identical, slightly different, or totally unrelated. In the case of non-identical constructs and constraints, the discrepancies between them can be explicitly specified by the primitive representations and be used in an application development to account for the missing semantics. Semantic decomposition of high-level data models to ORECOM representations will be very useful for (i) verifying a data model to see if the semantics of its constructs and constraints are defined consistently, (ii) learning a new data model by comparing its primitive representation with those of others familiar to the user, and (iii)