Group Title: Department of Computer and Information Science and Engineering Technical Reports
Title: Distributed information mediation and query processing in a CORBA environment
Permanent Link: http://ufdc.ufl.edu/UF00095411/00001
 Material Information
Title: Distributed information mediation and query processing in a CORBA environment
Series Title: Department of Computer and Information Science and Engineering Technical Reports
Physical Description: Book
Language: English
Creator: Su, Stanley Y. W.
Yu, Tsae-Feng
Affiliation: University of Florida
University of Florida
Publisher: Department of Computer and Information Science and Engineering, University of Florida
Place of Publication: Gainesville, Fla.
Copyright Date: 1997
 Record Information
Bibliographic ID: UF00095411
Volume ID: VID00001
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.

Distributed Information Mediation and Query Processing in a
CORBA Environment *

Stanley Y. W. Su and Tsae-Feng Yu
Database Systems Research and Development Center
Department of Computer and Information Science and Engineering
University of Florida
{su,yu} ". '- .ufl.edu



Abstract
A heterogeneous information system can be built on a large number of component systems which
are interconnected by a LAN or WAN. The data stored in these component systems are likely
to have very different naming, structural and semantic representations. The query processing
facility of the information system needs to be coupled with an information mediation facility to
resolve data heterogeneity problems so that a user can issue queries in the terms and receive data
in the data representations that are familiar to him/her. In this work, we introduce an
object-oriented modeling language for modeling the data resources and object services of the
component systems and a mediation specification language for explicitly specifying the similarities
and differences among data representations, as well as the methods needed to do data conversions.
Based on the schemas and the mediation specifications defined for the component systems, a
compiler has been developed to generate enhanced program bindings for the component systems.
These bindings contain subquery, rule and mediation processing code to perform distributed
mediation, query and active rule processing tasks at run-time over the OMG's CORBA communication
infrastructure. System scalability is thus achieved by the compilation approach and the distributed
processing of generated code.


1 Introduction


In a heterogeneous information system, component systems operating on different computing
platforms need to exchange and share data resources. Data residing in these systems would generally
have different structural and semantic representations. A traditional approach taken by the existing
heterogeneous systems is to establish an "integrated global schema" over the conceptual models
of the data resources stored in these systems. The integrated global schema forces all semantic
ambiguities, naming conflicts, and structural/semantic discrepancies among dissimilar systems to
be resolved at the time when the global schema is being designed. This means that some users of
the existing systems will be forced to view and make reference to the global database in a way not
familiar to them, both structurally and semantically. The recent thinking in the database
community is to "mediate" dissimilar data representations at run-time instead of "integrating" them
at build-time [WIE92]. The term "mediation" has been defined in a very broad sense in
[WIE94]. Almost all tasks which facilitate the communication and interchange of dissimilar data or
the resolution of disagreements or disputes can be considered as some form of mediation tasks.
However, as pointed out in the literature [BRE90, CHAL94, CHA91, GOH94, I l: ii.;, KIM91, VEN91],
the most critical problem in a heterogeneous information system is the resolution of various kinds
of data heterogeneities. This problem has not been effectively solved, especially in a large-scale
system environment. Therefore, in this work, we restrict the domain of mediation to information
mediation, which deals with problems of naming, structural and semantic differences.

*This research is supported by the Advanced Research Projects Agency under ARPA Order No. B761-00 and
managed by the United States Air Force under contract F33615-94-2-4447. This is a part of the R&D effort of the
NIIIP Consortium. The views and conclusions contained in this paper are those of the authors and should not be
interpreted as necessarily representing the official policies, either expressed or implied, of all the NIIIP Consortium
members, the Advanced Research Projects Agency, or the United States Government.

The basic idea of information mediation is to allow a user of a heterogeneous information system
to view data the way he/she wants and is accustomed to. The heterogeneous information system
will transform the semantic and/or structural representations of data at run-time so that the data
retrieved from the dissimilar component systems can be converted, assembled and presented to the
user in the structure and semantics familiar to him/her.

Information mediation needs to be closely coupled with distributed query processing. A global
query is issued by a client to a Distributed Query Processor (DQP), which decomposes the global
query into a number of subqueries depending on the locations of the data referenced in the query.
Each subquery is to be processed against some data residing in a different component system.
Since the global query uses terms (object class names, attribute names, data value representations,
etc.) which are familiar to the user (i.e., based on the data representation defined in a component
schema), the terms in a subquery may have to be converted by a mediator to fit the naming
convention and the syntactic and semantic representations of another component system (i.e., a
server) before the subquery can be processed by it. Each server would then translate the mediated
subquery into a native query, command or application programming interface (API) call processable
by the server, and the retrieved data will have to be transformed into a standard data interchange
form before they are returned to the mediator. The data returned from the servers of all the
subqueries may have to be converted by the mediator to conform to the user's view of data before
they can be correctly assembled by a Data Assembler (a component of the DQP). The assembled data
are then forwarded to the client.

To carry out the mediation tasks in the above scenario, the approach taken in this work is
to use a high-level object-oriented modeling language (NCL [SU'Li]) to uniformly model the data
resources as well as the component systems as object classes, each of which is defined in terms
of attributes, associations, methods and event-condition-action-alternative-action rules. Multiple
component schemas having dissimilar data representations are produced in the modeling process.
A high-level mediation specification language is then used to specify the mediation specification
that captures the naming, structural, and semantic similarities and differences among component
schemas. The mediation specification is then compiled to generate a set of object classes which
model the DQP, the subquery processors and mediators. These classes contain active mediation
rules for triggering the mediation operations at run-time. The specifications of the component
systems and the generated classes are combined to form a mediated global schema which is used
by the user to issue queries and by a compiler to generate executable code to support distributed
mediation and query processing.

The main differences between our work and some existing mediation efforts [AMBD' 1, CHAW94,
GOH94, FLO96, SAU' ii] are: 1) the use of an object-oriented modeling language, which combines
the features of two standard languages (ISO's EXPRESS and OMG's IDL) and the association
and rule specification facilities of our own modeling language K [SHY'lli], for defining the
information resources, as well as the component systems of a heterogeneous network, 2) the use of a
high-level mediation specification language (instead of low-level mediation logic rules) for
mediation specifications, 3) the use of a compilation approach (instead of an interpretive approach)
to automatically generate distributed mediation and query processing code so that the overloading
problem of a centralized mediation system can be avoided, and 4) the use of a set of mediated
component schemas (instead of a single or multiple integrated schemas) to allow the user to issue
queries based on the schema familiar to him/her and receive data in his/her own view. A prototype
distributed mediation and query processing system with the above features has been implemented
as a part of a DARPA-supported project entitled "National Industrial Information Infrastructure
Protocols (NIIIP)".
The remainder of this paper is organized as follows. Section 2 defines and categorizes data
heterogeneity problems. Section 3 presents our approach to dealing with these problems. It describes
the object-oriented information modeling language, the mediation specification language with some
mediation specification examples, the translation and code generation process, and the distributed
query processing and mediation. Section 4 summarizes the main features of this proposed approach
and reports on the implementation status.










2 Problems of Data Heterogeneity


Data heterogeneity problems have been thoroughly discussed in several publications [BRE90, CHA91,
KIM91, SU91, VEN91, l.\: i'l.;, CHAL94, GOH94]. Based on the work of Goh, Madnick and Siegel
[GOH94], these problems are categorized as schematic heterogeneity and semantic heterogeneity.
Schematic heterogeneity includes two types of problems: naming and structural conflicts. The
naming conflicts include the synonym and homonym problems on both attribute and entity type
names. The structural conflicts are due to different ways of modeling the same piece of information.
For example, in Figure 1, the company name, "IBM", can be used as an entity type name, an
attribute name, and a value of an attribute in different systems.

Database 1 (IBM is an attribute value)
Date      StkName   TradePrice
1/20/95   IBM       50.00
1/20/95   HP        40.00

Database 2 (IBM is an attribute name)
Date      IBM       HP
1/20/95   50.00     40.00
...

Database 3 (IBM is an entity name)
Entity IBM                 Entity HP
Date      TradePrice       Date      TradePrice
1/20/95   50.00            1/20/95   40.00

Figure 1: Three Examples Showing Structural Conflicts
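
The three layouts in Figure 1 carry the same facts, which can be checked with a short sketch. The following Python fragment is only an illustration and is not part of the system described in this paper; the in-memory dictionaries and the helper names from_db1, from_db2 and from_db3 are assumptions introduced here. Each helper flattens one layout into (date, company, price) triples.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, str, float]  # (date, company name, trade price)

# Database 1: the company name is an attribute value.
db1 = [
    {"Date": "1/20/95", "StkName": "IBM", "TradePrice": 50.00},
    {"Date": "1/20/95", "StkName": "HP", "TradePrice": 40.00},
]
# Database 2: the company name is an attribute name.
db2 = [{"Date": "1/20/95", "IBM": 50.00, "HP": 40.00}]
# Database 3: the company name is an entity name.
db3 = {
    "IBM": [{"Date": "1/20/95", "TradePrice": 50.00}],
    "HP": [{"Date": "1/20/95", "TradePrice": 40.00}],
}


def from_db1(rows: List[dict]) -> List[Row]:
    """The company is read from the StkName column."""
    return sorted((r["Date"], r["StkName"], r["TradePrice"]) for r in rows)


def from_db2(rows: List[dict]) -> List[Row]:
    """Every column other than Date is a company; its cell is the price."""
    return sorted(
        (r["Date"], name, price)
        for r in rows
        for name, price in r.items()
        if name != "Date"
    )


def from_db3(entities: Dict[str, List[dict]]) -> List[Row]:
    """The entity name itself is the company."""
    return sorted(
        (r["Date"], name, r["TradePrice"])
        for name, rows in entities.items()
        for r in rows
    )


# All three layouts flatten to the same set of facts.
print(from_db1(db1) == from_db2(db2) == from_db3(db3))
```

The point of the sketch is that a structural conflict changes where a value lives (cell, column name, or entity name), not what the value is.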


Semantic heterogeneity is due to different representations of data values. It includes naming and
other representational conflicts. The naming conflicts in attribute values are seen as synonyms (e.g.,
'IBM' and 'IBM Corp') and homonyms (e.g., persons with the same name). Other representational
conflicts in this category include: (1) measurement conflicts (e.g., US Dollar vs. Yen), (2)
representation conflicts (e.g., New York Stock Exchange representation vs. decimal representation),
(3) confounding conflicts (e.g., latest closing price vs. latest trade price), (4) granularity
conflicts (e.g., monthly pay vs. yearly pay), and (5) domain type conflicts (e.g., numerical type
vs. string type).


3 Approach

3.1 Object-oriented Modeling

A heterogeneous information system may contain component systems that run on different computing
platforms and use different types of data management systems. CORBA's approach to achieving
data and program sharing is to model all data and software resources as distributed objects, and
their interfaces are uniformly defined in an Interface Definition Language (IDL). However, IDL has a
very limited number of modeling constructs. Much of the semantic information associated with data
entities and software systems can be lost in the IDL specifications. In our work, a semantically-rich
modeling language is used to model the resources of the component systems, resulting in a number
of component schemas. Each component schema preserves the naming, structural and semantic
representations of the data stored in each component system. For details of the modeling language,
the readers are referred to [SUL'iii].

3.2 Mediation Specification

A high-level mediation language is introduced to explicitly specify the similarities and differences

among data elements and the mediation operations needed for modifying queries and converting

data from one representation to another. For a number of component schemas that contain seman-

tically related data, a mediation specification is defined by the mediation system administrator.

The design of the mediation specification language meets the following requirements: (1) the
language should be able to explicitly specify the naming, structural and semantic relationships of
heterogeneous component systems, (2) the syntax of the language should be high-level and easy to
use, and (3) the syntactic constructs should conform to the common modeling language (i.e., NCL)
so that they can be easily translated into NCL class specifications.

The heterogeneity problems discussed in Section 2 provide a good guideline in our design of the
mediation language. The overall structure of a mediation specification is shown below:

SCHEMA MedSchema;
USE FROM schema_1(entity_1,...);
USE FROM schema_2(entity_2,...);

ENTITY super_entity_id
ABSTRACT SUPERTYPE OF (supertype_expression);
ENTITY EQUIVALENCE (sch_1::entity_1,sch_2::entity_2,...);
ATTRIBUTE EQUIVALENCE [(sch_1::entity_1.attr_1,sch_2::entity_2.attr_2,...);
                       (attr_set_1, attr_set_2,...);]
[VALUE EQUIVALENCE((sch_1::entity_1.attr_1,conv_method(sch_2::entity_2.attr_2),...);
                   (sch_1::entity_1.attr_1,conv_method(attr_set_2),...);
                  ); ]
[WHERE
    simple_condition;
]
END_ENTITY;

END_SCHEMA;


The mediation language conforms to the standard information specification language, EXPRESS
[ISO92] (a part of NCL), by using some of its keywords. Four additional syntactic constructs
are introduced in the mediation language. They are:


* The ENTITY EQUIVALENCE clause is used to declare the equivalence relationship between
two or more entity types and to resolve the problem of synonymous entity type names.
Entity type names enclosed in the clause are declared to be synonyms and are semantically
equivalent. The synonym relationship is used by the mediation language compiler to generate
code to do query modifications (a method of the Mediator).

* The ATTRIBUTE EQUIVALENCE clause is used to represent the synonym relationship
among a set of attributes, each of which can be composite. These attributes are of the same
meaning and their values are convertible. This clause is also used to generate part of the
implementation code for query modification. The information on attribute name mappings
is embedded in the code of that method.

* The VALUE EQUIVALENCE clause specifies the method to be used to perform data
conversions between two different systems. The data conversion method specified in the clause
converts data of one or a set of attributes into the representation of another attribute.
Data conversions are needed to resolve the semantic heterogeneity problems, which may
involve both irregular (e.g., synonymous data values) and regular (e.g., unit or measurement
conflict) data mappings. The synonym problem in data values can be resolved by pairwise
mappings of equivalent data values (e.g., the value 'IBM' in system A is equivalent to the
value 'IBM Corp' in system B) in the implementation of the conversion method. Similarly,
mathematical functions (usually simple ones) can be embedded in the conversion method to
deal with regular data mappings. The implementations of conversion methods can be done
in NCL or other programming languages.

* The WHERE clause following the ATTRIBUTE EQUIVALENCE clause is used to resolve
both the homonym problem on values and the structural conflict between two systems.
Homonymous values are identified by specifying the equality conditions of key attributes
in the WHERE clause. The attribute values are identical only when the key attribute values
are the same. In data modeling, the same piece of information can be modeled in different
ways, which causes the problem of structural conflicts. The conversion between two attribute
values is possible when a special condition is true (e.g., an entity name is equal to a
particular attribute value). It represents a conditional mapping relationship between related
classes and is needed in query modification. The WHERE clause is also used to specify the special
condition for the mediation.


It is noted that, by default, all entity and attribute names defined in different systems are
assumed to be unrelated, unless ENTITY EQUIVALENCE and ATTRIBUTE EQUIVALENCE
clauses are used to explicitly specify their synonymous relationships. The homonym problem on
attribute values is handled by using the WHERE clause to explicitly specify the equality condition
of key attributes as explained above.

The following examples illustrate the use of the mediation language to handle the heterogeneity
problems discussed in this paper.



(1) Example of Schematic Heterogeneity: Naming and Structural Conflicts

In Figure 2, stock information is modeled in different ways in schemas DB_1 and DB_2. For
example, "IBM" is an attribute value of StkCode in DB_1, but is an attribute name in DB_2. The
value of TradePrice in DB_1 is equal to that of IBM in DB_2 only if StkCode = "IBM" in DB_1.
A similar condition needs to hold for the HP stock price in the two semantically related component
schemas. Also, Date and StkCode of DB_1 form a composite key, whereas Date of DB_2 is the key
(indicated by a double-slash link). Stock_1_2 is the generalization of the two Stock classes in the
two component schemas with an assumed constraint of SI (i.e., Set-Intersection, which is the same
as the ANDOR constraint in EXPRESS).

SCHEMA Mediation;
USE FROM DB_1 (Stock);
USE FROM DB_2 (Stock);
ENTITY Stock_1_2;
ABSTRACT SUPERTYPE OF (DB_1::Stock ANDOR DB_2::Stock);
ENTITY EQUIVALENCE (DB_1::Stock, DB_2::Stock);
ATTRIBUTE EQUIVALENCE (DB_1::Stock.Date, DB_2::Stock.Date);
ATTRIBUTE EQUIVALENCE (DB_1::Stock.TradePrice, DB_2::Stock.HP);
WHERE
    DB_1::Stock.StkCode = "HP";
    (* DB_1::Stock.TradePrice and DB_2::Stock.HP are equivalent only
       when DB_1::Stock.StkCode is equal to "HP" *)
ATTRIBUTE EQUIVALENCE (DB_1::Stock.TradePrice, DB_2::Stock.IBM);
WHERE
    DB_1::Stock.StkCode = "IBM";
    (* DB_1::Stock.TradePrice and DB_2::Stock.IBM are equivalent only
       when DB_1::Stock.StkCode is equal to "IBM" *)
END_ENTITY;
END_SCHEMA;


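[Figure 2 diagram: in the schema Mediation, the class Stock_1_2 generalizes, under an SI
constraint, the Stock class of local schema DB_1 (attributes Date, StkCode, TradePrice) and the
Stock class of local schema DB_2 (attributes Date, HP, IBM).]

Figure 2: Example of Structural Conflict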
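
To make the conditional mapping in this example concrete, the following Python sketch rewrites a query phrased against DB_1 into DB_2 terms. It is a hypothetical illustration and not the code generated by the mediation language compiler: the dictionary-based query format, the table COND_EQUIV and the function db1_to_db2 are assumptions introduced here. The WHERE conditions of the specification decide which DB_2 attribute replaces TradePrice, and the StkCode condition disappears because DB_2 has no such attribute.

```python
# Conditional attribute equivalences taken from the specification above:
# DB_1::Stock.TradePrice corresponds to DB_2::Stock.<name> when StkCode = <name>.
COND_EQUIV = {"HP": "HP", "IBM": "IBM"}  # StkCode value -> DB_2 attribute name


def db1_to_db2(query: dict) -> dict:
    """Rewrite a DB_1-style query on Stock into DB_2 terms (sketch only)."""
    codes = [
        value
        for (attr, op, value) in query["where"]
        if attr == "StkCode" and op == "="
    ]
    if len(codes) != 1 or codes[0] not in COND_EQUIV:
        raise ValueError("this sketch handles exactly one known StkCode equality")
    target = COND_EQUIV[codes[0]]
    return {
        "entity": "Stock",
        # The StkCode condition is absorbed into the choice of attribute.
        "where": [cond for cond in query["where"] if cond[0] != "StkCode"],
        "retrieve": [target if a == "TradePrice" else a for a in query["retrieve"]],
    }


db1_query = {
    "entity": "Stock",
    "where": [("Date", ">=", "1/1/1995"), ("StkCode", "=", "IBM")],
    "retrieve": ["Date", "TradePrice"],
}
print(db1_to_db2(db1_query))
```

The reverse direction (a DB_2 query rewritten for DB_1) would rename the attribute to TradePrice and add the StkCode condition instead of removing it.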




(2) Example of Semantic Heterogeneity: Measurement Conflicts

The example shown in Figure 3 represents a group of heterogeneity problems in the category
of semantic heterogeneity which are resolvable by using conversion methods. In this example, two
component systems USA and Japan contain stock data which are represented in different currencies
(i.e., US Dollar and Yen). The two currencies are convertible by applying some simple conversion
functions (e.g., USD_to_Yen and Yen_to_USD). Some other semantic heterogeneities, including
representation, confounding, granularity and data type conflicts, can also be resolved by data
conversion methods specified in mediation specifications. Their program code needs to be provided
by the mediation system administrator.

SCHEMA Mediation;
USE FROM Japan (Stock);
USE FROM US (Stock);
ENTITY Stock_JP_US;
ABSTRACT SUPERTYPE OF (Japan::Stock ANDOR US::Stock);
ENTITY EQUIVALENCE (Japan::Stock, US::Stock);
ATTRIBUTE EQUIVALENCE (Japan::Stock.Price, US::Stock.Price);
VALUE EQUIVALENCE (
    (Japan::Stock.Price, USD_to_Yen(US::Stock.Price));
    (US::Stock.Price, Yen_to_USD(Japan::Stock.Price));
);
(* Data conversions between two currencies *)
END_ENTITY;
END_SCHEMA;










[Figure 3 diagram: in the schema Mediation, the class Stock_JP_US generalizes the Stock class of
local schema Japan and the Stock class of local schema USA.]

Figure 3: Example of Semantic Heterogeneity: Measurement Conflicts
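
The VALUE EQUIVALENCE clause in this example pairs a target attribute with a conversion method applied to a source attribute. The Python sketch below mirrors that pairing as a lookup table. It is an illustration under stated assumptions: the fixed rate YEN_PER_USD is invented for the example (the paper does not give one), and the function convert is a hypothetical stand-in for the generated data conversion code.

```python
YEN_PER_USD = 100.0  # assumed fixed exchange rate, for illustration only


def usd_to_yen(usd: float) -> float:
    return usd * YEN_PER_USD


def yen_to_usd(yen: float) -> float:
    return yen / YEN_PER_USD


# One entry per VALUE EQUIVALENCE pair:
# (target attribute, source attribute) -> conversion method.
VALUE_EQUIVALENCE = {
    ("Japan::Stock.Price", "US::Stock.Price"): usd_to_yen,
    ("US::Stock.Price", "Japan::Stock.Price"): yen_to_usd,
}


def convert(values, source_attr: str, target_attr: str):
    """Convert a column of values from the source to the target representation."""
    if source_attr == target_attr:
        return list(values)  # same representation, nothing to convert
    method = VALUE_EQUIVALENCE[(target_attr, source_attr)]
    return [method(v) for v in values]


print(convert([50.0, 40.0], "US::Stock.Price", "Japan::Stock.Price"))
```

A realistic conversion method would look up a dated exchange rate rather than a constant; the table-driven dispatch stays the same.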


3.3 Translation of Mediation Specifications

The mediation language illustrated by examples in the last section is used to mediate a number of
component schemas defined in NCL which capture the semantics of some existing systems. Given
some mediation specifications, a mediation language translator is used to generate the "mediation
elements" defined in NCL based on these specifications. This set of mediation elements contains
the following object classes with active mediation rules and methods for carrying out the run-time
mediation tasks.


* Supertype entity classes: These superclasses generalize the ENTITY classes of different
component systems if they contain semantically related objects. They upward inherit the
attributes and methods associated with these entity classes. The specifications of these
superclasses also capture the set membership constraints among the related ENTITY classes,
such as Set-Equality, Set-Exclusion, or Set-Intersection. This information is useful for query
optimization (e.g., if two component systems contain identical objects and their data are the
same, then there is no need to send a query to both systems).

* Mediator object classes: These classes model the mediators which are generated based on
the mediation specifications associated with semantically related schemas. For achieving
processing efficiency, the Mediator classes are distributed and linked with different component
systems at different sites so that mediation operations can be processed locally. Each Mediator
contains methods which perform query modification and data conversion. These methods are
invoked by the Subquery Processor of the component system when mediation is needed.











* Distributed Query Processor (DQP) object class: The Distributed Query Processor class
contains a method which performs distributed query processing in the heterogeneous system.
This method, with a query as its parameter, is invoked by the user/application program. The
implementation of this method is based on a built-in query processing algorithm. The DQP
class has an ECAA mediation rule which, when triggered, calls a knowledgebase management
system (OSAM*.KBMS [SU'lt-) to obtain the meta data for locating the information sources
and propagating subqueries.

    Algorithm of the DQP's global query execution
    (* Input: a textual OQL global query *)
    (* Output: a set of data results *)
    1. Query Parsing: Parse the global query into a parse tree structure.
    2. Tree Transformation: Transform the tree to generate simple subqueries,
       each of which involves only a single class.
    3. Subquery Propagation: For each subquery, a mediation rule is triggered
       to look up the meta data from the KBMS to locate the information sources
       which can answer the subquery. For different information sources, more
       subqueries are propagated.
    4. Subquery Dispatching: Dispatch each subquery to the information source
       for processing.
    5. Global OID Assignment: Data returned from information sources are already
       mediated and uniformly represented by the local mediator before being
       received by the DQP. The DQP first assigns global object IDs to the data
       instances based on the key attribute values. Then, duplicated data
       instances are identified.
    6. Data Assembly: First, perform union operations on the data generated from
       the propagated subqueries (i.e., generating a set of data results for simple
       subqueries). Then, perform join operations to combine the data results
       (based on the join conditions) to produce the final result.
    7. Data Returning: Return the final result to the client.



* Subquery Processor (SQP) object classes: These classes are Subquery Processors generated for
the component systems. Each SQP receives a subquery from the DQP and calls the Mediator,
if a subquery conversion is needed, to generate a mediated subquery. The SQP then sends
the mediated subquery to the wrapper, which converts it into a native query, command or
API call processable by the component system. The SQP object class contains a main method
which triggers two mediation rules. One is to modify the subquery to conform to the naming
and structural convention of the component system before the main method is executed. The
other is to convert the returned data from its local representation to the one expected by the
user after the main method finishes its execution.

* Mediation methods: Based on the mediation clauses (i.e., ENTITY EQUIVALENCE,
ATTRIBUTE EQUIVALENCE, VALUE EQUIVALENCE and WHERE), the implementation
code for the mediation methods, including query modification, data conversion, etc., can
either be generated automatically or provided by the system administrator.
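
The division of labor between a Subquery Processor and its Mediator can be sketched in a few lines of Python. This is a hypothetical illustration rather than the generated NCL code: the classes Mediator and SubqueryProcessor, the function native_wrapper, the attribute names, and the cents-to-dollars conversion are all assumptions chosen to keep the example small. The two mediation rules described above appear as the calls made before and after the main step.

```python
from typing import Callable, Dict, List


class Mediator:
    """Hypothetical mediator: renames attributes and converts their values."""

    def __init__(self, rename: Dict[str, str], convert: Callable[[float], float]):
        self.rename = rename    # user-view attribute name -> local attribute name
        self.convert = convert  # local value -> user-view value

    def modify_query(self, attrs: List[str]) -> List[str]:
        return [self.rename.get(a, a) for a in attrs]

    def data_conversion(self, rows: List[dict]) -> List[dict]:
        back = {local: user for user, local in self.rename.items()}
        converted = []
        for row in rows:
            new_row = {}
            for name, value in row.items():
                if name in back:  # only mediated attributes are converted
                    new_row[back[name]] = self.convert(value)
                else:
                    new_row[name] = value
            converted.append(new_row)
        return converted


class SubqueryProcessor:
    def __init__(self, mediator: Mediator, wrapper: Callable[[List[str]], List[dict]]):
        self.mediator = mediator
        self.wrapper = wrapper  # stands in for the wrapper and the native system

    def process(self, attrs: List[str]) -> List[dict]:
        mediated = self.mediator.modify_query(attrs)  # rule fired before the main method
        rows = self.wrapper(mediated)                 # main method: run the subquery
        return self.mediator.data_conversion(rows)    # rule fired after the main method


def native_wrapper(attrs: List[str]) -> List[dict]:
    # One stored row; the local system is assumed to keep prices in cents.
    stored = {"Date": "2/1/95", "IBM": 1150}
    return [{a: stored[a] for a in attrs}]


sqp = SubqueryProcessor(
    Mediator({"TradePrice": "IBM"}, lambda cents: cents / 100),
    native_wrapper,
)
print(sqp.process(["Date", "TradePrice"]))
```

Because the mediator is an argument of the processor, each component system can be given its own mediator at its own site, which is the distribution property the text emphasizes.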


The generated mediation elements and the component schemas defined in NCL form a mediated
global schema which is then compiled and stored in the KBMS. The meta information of the
mediated global schema is accessed by the DQP at run-time to locate component systems that
contain the relevant data.

The following example illustrates the mediation language translation process.

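[Figure 4 diagram: in the schema Mediation, the class Stock_1_2_3 generalizes, under SI
constraints, the Stock class of local schema DB_1 (attributes Date, StkCode, TradePrice), the
Stock class of local schema DB_2 (attributes Date, HP, IBM) and the IBM class of local schema
DB_3 (attributes Date, StockPrice).]

Figure 4: Mediation Schema on Top of Three Component Schemas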



SCHEMA Mediation;
USE FROM DB_1(Stock);
USE FROM DB_2(Stock);
USE FROM DB_3(IBM);

ENTITY Stock_1_2_3;
ABSTRACT SUPERTYPE OF (DB_1::Stock ANDOR DB_2::Stock ANDOR DB_3::IBM);
ENTITY EQUIVALENCE (DB_1::Stock,DB_2::Stock,DB_3::IBM);
ATTRIBUTE EQUIVALENCE (DB_1::Stock.Date,DB_2::Stock.Date,DB_3::IBM.Date);
ATTRIBUTE EQUIVALENCE (DB_1::Stock.TradePrice,DB_2::Stock.HP);
VALUE EQUIVALENCE(
    (DB_1::Stock.TradePrice,Decimal_to_NewYork(DB_2::Stock.HP));
    (DB_2::Stock.HP,NewYork_to_Decimal(DB_1::Stock.TradePrice)));
WHERE
    DB_1::Stock.StkCode = 'HP';
    (* DB_1::Stock.TradePrice and DB_2::Stock.HP are equivalent only when
       DB_1::Stock.StkCode is equal to 'HP' *)
ATTRIBUTE EQUIVALENCE (DB_1::Stock.TradePrice,DB_2::Stock.IBM,
                       DB_3::IBM.StockPrice);
VALUE EQUIVALENCE(
    (DB_1::Stock.TradePrice,Decimal_to_NewYork(DB_2::Stock.IBM));
    (DB_2::Stock.IBM,NewYork_to_Decimal(DB_1::Stock.TradePrice));
    (DB_1::Stock.TradePrice,Decimal_to_NewYork(DB_3::IBM.StockPrice));
    (DB_3::IBM.StockPrice,NewYork_to_Decimal(DB_1::Stock.TradePrice)));
WHERE
    DB_1::Stock.StkCode = 'IBM';
    (* DB_1::Stock.TradePrice is equivalent to DB_2::Stock.IBM and DB_3::IBM.StockPrice
       only when DB_1::Stock.StkCode is equal to 'IBM' *)
END_ENTITY;

END_SCHEMA;


Figure 4 shows that three component databases, DB_1, DB_2, and DB_3, which contain
semantically related stock information, are mediated by the mediation specification shown above.
In Figure 4, Stock_1_2_3 is a generalization (or superclass) of DB_1::Stock, DB_2::Stock and
DB_3::IBM with two assumed pair-wise constraints of Set-Intersection (SI). The constraint specifies
that objects in each pair of subclasses can overlap. The main difference among these three
databases is that the company name is modeled as an attribute value, an attribute name, and an
entity name in DB_1, DB_2 and DB_3, respectively. Additionally, the data representation of the
stock price in DB_1 is different from that in DB_2 and DB_3. In DB_1, the stock price is
represented in the New York Stock Exchange representation (e.g., 6\08), whereas in DB_2 and DB_3
it is in the common decimal representation (e.g., 6.5).

Suppose a client wants to get the stock price of "IBM" by issuing a global query against the DQP
based on the view of schema DB_1, as shown in Figure 5. The global query service is a method of the
DQP that contains the program code of the query execution plan. The DQP first parses, checks and
decomposes the global query into simple subqueries. Since the query given above is already simple
(i.e., referencing a single class), it is not decomposed. The query is stored in a data structure for
initialization. After the query is initialized, a mediation rule (i.e., rule_query_propagation shown
in the translated results given below) associated with the DQP is triggered to access the
meta-information from the KBMS. The meta-information of the mediated global schema indicates that
two other information sources (i.e., DB_2 and DB_3) contain related data. Thus, three subquery
instances are established by "propagating" the global query. Since the global query is issued based
on schema DB_1, the subqueries generated for DB_2 and DB_3 need to be modified to conform to
the naming convention of these two systems. The mediation rules associated with the SQP of the
component system are triggered to invoke the methods of the mediator to 1) modify the subquery
properly before the mediated subquery is executed in the local server and 2) convert data into the
representation expected by the client (i.e., the New York Stock Exchange representation) after the
mediated subquery result is returned. Once data are converted by the mediator into a uniform
representation, the global query processor assembles the data and returns the assembled data to
the client.
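
The propagation and assembly steps of this walk-through can be sketched as follows. The Python fragment is an illustration under stated assumptions, not the DQP implementation: the table RELATED_SOURCES stands in for the meta data held by the KBMS, the functions propagate and assemble are hypothetical, and the prices are invented decimal values. Rows are assumed to arrive already mediated into the client's view, so assembly only has to assign a global identifier from the key attribute values and drop duplicates.

```python
from typing import Dict, List, Tuple

# Hypothetical meta data: sources that hold data related to a given source.
RELATED_SOURCES: Dict[str, List[str]] = {"DB_1": ["DB_2", "DB_3"]}


def propagate(source: str) -> List[str]:
    """Return every source that should receive a copy of the subquery."""
    return [source] + RELATED_SOURCES.get(source, [])


def assemble(results: Dict[str, List[dict]], key_attrs: Tuple[str, ...]) -> List[dict]:
    """Union mediated rows; rows with equal key values share one global OID."""
    by_oid: Dict[tuple, dict] = {}
    for source in sorted(results):
        for row in results[source]:
            oid = tuple(row[k] for k in key_attrs)  # global OID from key values
            by_oid.setdefault(oid, row)             # a duplicate instance is kept once
    return [by_oid[oid] for oid in sorted(by_oid)]


# Already-mediated results, one list per source (illustrative values).
mediated = {
    "DB_1": [{"Date": "1/1/95", "StkCode": "IBM", "TradePrice": 20.5}],
    "DB_2": [{"Date": "2/1/95", "StkCode": "IBM", "TradePrice": 11.5}],
    "DB_3": [
        {"Date": "2/1/95", "StkCode": "IBM", "TradePrice": 11.5},  # duplicate of DB_2 row
        {"Date": "3/1/95", "StkCode": "IBM", "TradePrice": 14.5},
    ],
}

targets = propagate("DB_1")
answer = assemble({s: mediated[s] for s in targets}, ("Date", "StkCode"))
for row in answer:
    print(row["Date"], row["TradePrice"])
```

The composite key (Date, StkCode) follows the key of DB_1::Stock given earlier. A join step across several simple subqueries is omitted because the example query references a single class.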

The following object classes defined in NCL are part of the translated results for the mediation
specification given above.

(* Common data structures of Query and returned data *)
(* Object class 'Query' for storing query statement and data result *)
DEFINE ENTITY Query;
  query_node: Query_node;  (* query node *)
  s_schema: STRING;        (* source schema name *)
  t_schema: STRING;        (* target schema name *)
  result: SET OF Data;     (* query result *)
END_DEFINE;
(* Object class 'Data' is defined for standard data interchange *)
DEFINE ENTITY Data;
  value_set: SET OF Generic_Data_Type;  (* result values *)
  size: INTEGER;                        (* size of SET *)
  data_type: STRING;                    (* data type *)
  s_attr: STRING;                       (* attribute name in source schema *)
  t_attr: STRING;                       (* attribute name in target schema *)
END_DEFINE;
(* Generic Data Type: a union of all possible data types *)
DEFINE TYPE Generic_Data_Type =
  SELECT OF (string_type, number_type, boolean_type, Data);
END_DEFINE;
(* Mediator class in DB_1 generated from 'Stock' mediation specification *)
DEFINE ENTITY Mediator IN DB_1;
METHODS:
  (* Two main methods: modify_query, data_conversion *)
  METHOD modify_query(INOUT ps:Query):VOID;
  METHOD data_conversion(INOUT result:SET OF Data):VOID;
  METHOD convert(INOUT data:Data):VOID;
END_DEFINE;
(* Implementation of Method modify_query() *)
METHOD Mediator::modify_query(INOUT ps:Query):VOID;
  IF (ps.s_schema = 'DB_2' AND ps.t_schema = 'DB_1')
  THEN
    change_name_add_cond('IBM','TradePrice',ps,'StkCode="IBM"');
  END_IF; ...
END_METHOD;
(* Implementation of Method data_conversion() *)
METHOD Mediator::data_conversion(INOUT result:SET OF Data):VOID;
  (* For each column of data *)
  IF (result[i].s_attr <> result[i].t_attr)
  THEN convert(result[i]);
  END_IF;
END_METHOD;
(* Subroutine of data_conversion() *)
METHOD Mediator::convert(INOUT data:Data):VOID;
  (* For each data value in the column *)
  IF (data.s_attr = 'DB_1.Stock.TradePrice' AND data.t_attr = 'DB_2.Stock.HP')
  THEN NewYork_to_Decimal(data.value_set[i]);
  END_IF; ...
END_METHOD;
(* Definition of class 'Distributed Query_Processor' *)
DEFINE ENTITY DQP IN MEDIATOR;
  global_query: Query;       (* for global query *)
  subqueries: SET OF Query;  (* for subqueries *)

















Figure 5: Example of a Global Query Processing
(Diagram: a client program submits a global query on the DB_1 Stock class, with the conditions
Date >= '1/1/1995' and StkCode = 'IBM' and the retrieved attributes Date and TradePrice, by
calling global_query_execution on the query processor. The query then passes through the stages
Call KBMS, Query Modification, Subquery Execution, Result Conversion and Results Assembly, and
the assembled Date/TradePrice rows are returned to the client.)











METHODS:
  (* global_query_execution() is the main method *)
  METHOD global_query_execution(IN query_txt:STRING):SET OF Data;

  (* rule triggered to get the meta data from the KBMS to locate information
     sources *)
  RULE rule_query_propagation;
    TRIGGER IMMEDIATE AFTER subq_initialization(s)
    ACTION
      prop_subqs := KBMS::propagate_queries(s);
  END_RULE;
END_DEFINE;
METHOD DQP::global_query_execution(IN query_txt:STRING):SET OF Data;
  query_initialization(query_txt);
  IF (syntax_semantic_checking() == FALSE)
  THEN
    error_handler();
  END_IF;
  subquery_generation(global_query.query_node);
  FOREACH (s:subqueries)
    subq_initialization(s);
    FOREACH (ps:prop_subqs)
      dispatch_subquery(ps);
    END_FOREACH;
    assemble_mediated_results(s,prop_subqs);
  END_FOREACH;
  result_merge_join();
  RETURN (global_query.result);
END_METHOD;
(* Component System DB_1 *)
DEFINE SCHEMA DB_1;
END_DEFINE;
DEFINE ENTITY SQP IN DB_1;
  m: Mediator;
METHODS:
  METHOD query_execution(INOUT q_obj: Query_obj): VOID;
RULES:
  (* rule triggered to modify the subquery into the view of local systems *)
  RULE rule_query_modification;
    TRIGGER BEFORE query_execution(q_obj)
    CONDITION q_obj.s_schema <> q_obj.t_schema
    ACTION
      m.modify_query(q_obj);
  END_RULE;
  (* rule triggered to convert the data into the view of the original query *)
  RULE rule_data_conversion;
    TRIGGER IMMEDIATE AFTER query_execution(q_obj)
    ACTION
      m.data_conversion(q_obj.result);
  END_RULE;
END_DEFINE;
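
The mediator's data_conversion() and convert() methods in the listing above can be illustrated
with a minimal Python sketch. It is not the generated code: the Data class keeps only three of
the NCL attributes, fraction_to_decimal is a hypothetical stand-in for NewYork_to_Decimal, and
the fractional price format ('11 3/8') is an assumption, since the format is not spelled out here.

```python
from dataclasses import dataclass, field
from fractions import Fraction


@dataclass
class Data:
    """One column of a query result, loosely mirroring the NCL 'Data' entity."""
    s_attr: str                      # attribute name in source schema
    t_attr: str                      # attribute name in target schema
    value_set: list = field(default_factory=list)


def fraction_to_decimal(price: str) -> float:
    """Hypothetical stand-in for NewYork_to_Decimal.

    Assumes a whole number optionally followed by a fraction, e.g. '11 3/8'.
    """
    whole, _, frac = price.partition(" ")
    return float(int(whole) + (Fraction(frac) if frac else 0))


# Conversion table keyed by (source attribute, target attribute).
CONVERTERS = {
    ("DB_1.Stock.TradePrice", "DB_2.Stock.HP"): fraction_to_decimal,
}


def convert(column: Data) -> None:
    """Convert every value in one column if a converter is registered for it."""
    converter = CONVERTERS.get((column.s_attr, column.t_attr))
    if converter is not None:
        column.value_set = [converter(v) for v in column.value_set]


def data_conversion(result: list) -> None:
    """Convert only the columns whose source and target attributes differ."""
    for column in result:
        if column.s_attr != column.t_attr:
            convert(column)
```

Keeping the converters in a table keyed by attribute pair is a design choice of this sketch; the
NCL listing expresses the same dispatch as a chain of IF statements.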


3.4 Generation of CORBA Enhanced Program Binding


The above NCL class specifications are used to generate program code for the DQP, SQPs and
mediators. The code for SQPs and mediators is linked with the program code of component
systems. As shown in Figure 6, component systems, the DQP, the client and the KBMS communicate
with each other over the ORB, which is the CORBA communication infrastructure. CORBA is a
client/server architecture in which the client makes a service request by issuing a method call
which is transparently dispatched to the server. In the architecture, each component system can
be a client, a server or both.

Figure 6: CORBA-based Mediation System Architecture
(Diagram: the Client, the Distributed Query Processor and the KBMS sit above the ORB. Below the
ORB are Component Systems 1, 2 and 3, each consisting of a Subquery Processor, a Mediator and a
Wrapper on top of its database: DB 1, DB 2 and DB 3 respectively.)


To carry out the code generation for use in the CORBA environment, an NCL compiler is
developed to translate the NCL specifications into CORBA program bindings. All the semantic
properties captured by NCL, but not IDL (e.g., keyword constraints and associations), are first
converted into ECAA rules. These and other explicitly specified ECAA rules are translated into
C or C++ code. Parts of NCL class definitions which are equivalent to IDL specifications are
translated into IDL specifications. The IDL specifications are then translated by the IDL
compiler to produce C or C++ stubs for the clients and C or C++ skeletons for the servers. The
rule code, mediation code, and subquery processing code are incorporated in the skeletons to
form the so-called enhanced bindings for the servers, as illustrated in the top diagram of
Figure 7. We shall use the ECAA rules in the generated SQP class, which is shown at the end of
Section 3.3, as an example to illustrate the process of generating enhanced program bindings.

Figure 7 shows the compilation of a method (query_execution) with its associated before- and
immediate-after rules (rule_query_modification and rule_data_conversion) into C or C++ code.
During the compilation of the SQP class, the NCL compiler translates each rule into a C or C++
method. For the rule named rule_query_modification, the C or C++ code in the method named
query_execution_R1 checks the condition ps.s_schema <> ps.t_schema and calls the method
modify_query if the condition evaluates to True. Similarly, for the rule named
rule_data_conversion, a C or C++ method named query_execution_R2 is generated.

Figure 7: Compilation of an NCL Method query_execution() and Its Associated Rules
(Top diagram: the definition of a class in NCL form is passed through the NIIIP NCL compiler,
which emits an IDL specification and C/C++ code; the IDL compiler turns the IDL specification
into C/C++ bindings. Bottom diagram, for class SQP in server DB_1: the IDL for query_execution
is compiled into a C/C++ binding (stub) for the client and a C/C++ skeleton for
query_execution. The code inserted into the skeleton consists of the surrogate query_execution,
which acts as a request monitor and calls query_execution_R1, query_execution_p and
query_execution_R2 in turn; query_execution_R1, the rule processing for rule_query_modification
(R1: before query_execution(ps), IF ps.s_schema <> ps.t_schema THEN call modify_query(ps));
query_execution_R2, the rule processing for rule_data_conversion (R2: after query_execution(ps),
call data_conversion(ps.result)); and the C/C++ code that implements the original
query_execution, renamed to query_execution_p.)

For each method in the class, an equivalent IDL specification is generated. For example, an
IDL specification is generated for the method query_execution, and the original implementation
of that method is renamed query_execution_p. Furthermore, a new method named query_execution
(i.e., a surrogate query_execution in C or C++ code) is generated. The new implementation
consists of three method calls. First, a call to method query_execution_R1 is made to process
the before-rule rule_query_modification. Then, a call is made to the original implementation of
the method query_execution, which has been renamed as query_execution_p, to perform the
subquery processing. Finally, a call is made to method query_execution_R2 to process the
immediate-after-rule rule_data_conversion.
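
The three-call structure of the surrogate method can be sketched in Python as follows. This is
only an illustration of the control flow described above, not the generated C or C++ binding:
the Query fields beyond s_schema, t_schema and result, the StubMediator class, and the bodies
of its methods are hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Query:
    s_schema: str                    # source schema name
    t_schema: str                    # target schema name
    text: str                        # query text (hypothetical field)
    result: list = field(default_factory=list)


class StubMediator:
    """Hypothetical mediator that only records which methods were called."""

    def __init__(self):
        self.calls = []

    def modify_query(self, ps):
        self.calls.append("modify_query")

    def data_conversion(self, result):
        self.calls.append("data_conversion")


class SQP:
    def __init__(self, mediator):
        self.m = mediator

    def query_execution_R1(self, ps):
        # Before-rule (rule_query_modification): condition, then action.
        if ps.s_schema != ps.t_schema:
            self.m.modify_query(ps)

    def query_execution_p(self, ps):
        # Original query_execution, renamed; here it just fakes a result.
        ps.result = ["rows for: " + ps.text]

    def query_execution_R2(self, ps):
        # Immediate-after-rule (rule_data_conversion): unconditional action.
        self.m.data_conversion(ps.result)

    def query_execution(self, ps):
        # Surrogate: callers use the original name and get all three steps.
        self.query_execution_R1(ps)
        self.query_execution_p(ps)
        self.query_execution_R2(ps)
```

Because the surrogate takes over the original method name, callers need no changes to have the
rules applied around every invocation.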

In the final step of the compilation process, the IDL compiler is used to generate the C or
C++ bindings for all the methods which have been translated into IDL specifications, including
the method query_execution. After the bindings have been generated, the corresponding C or C++
implementation code for the surrogate query_execution, query_execution_R1, query_execution_p
(the original query_execution) and query_execution_R2 is inserted into the skeleton of
query_execution to produce the enhanced bindings. At run-time, an activation of the method
query_execution causes the surrogate method to be executed because it carries that name. The
surrogate method calls the before-rule method to carry out the query modification task, the
original query_execution method to process the subquery, and the immediate-after-rule method
to do the data conversion. Thus, the mediation and query processing tasks are carried out in a
distributed and active manner. This compilation approach offers the needed run-time efficiency.
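
As a rough illustration of this distributed processing, the sketch below shows a query
processor dispatching one subquery to several component systems and assembling their results.
It makes several assumptions that are not part of the system described here: threads stand in
for remote ORB invocations, make_component is a hypothetical factory for a component's
query_execution, rows are plain dictionaries, and dates are ISO strings so that they compare
correctly as text.

```python
from concurrent.futures import ThreadPoolExecutor


def make_component(name, rows):
    """Hypothetical component system holding some local rows."""

    def query_execution(subquery):
        # Stands for query modification, local execution and data conversion.
        return [
            dict(row, source=name)
            for row in rows
            if row["Date"] >= subquery["from_date"]
        ]

    return query_execution


def global_query_execution(subquery, components):
    """Dispatch the subquery to every component, then assemble the results."""
    with ThreadPoolExecutor() as pool:
        # pool.map returns the partial results in the order of 'components'.
        partial_results = list(pool.map(lambda c: c(subquery), components))
    merged = [row for part in partial_results for row in part]
    return sorted(merged, key=lambda row: row["Date"])
```

Each component does its own mediation before returning rows, so the assembling step only has
to merge data that are already in a uniform representation.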


4 Conclusion

In this paper, we have presented an approach to achieve distributed information mediation and
query processing in a CORBA environment. The key features of the proposed approach are
summarized below. First, a semantically-rich object-oriented common modeling language is used
to capture the semantics of the data resources and object services of component systems. Some
of these semantic properties will be lost if IDL is used to define these resources and
services. Second, a high-level mediation specification language is used to explicitly specify
the similarities and discrepancies of the data resources and the conversion methods needed for
data conversions. Third, the information and mediation specifications using the above two
languages are used to generate distributed code which is linked to the code of the component
systems. Thus, much of the mediation and query processing work can be carried out locally and
in a distributed and parallel fashion, without the overloading problem commonly seen in a
centralized mediation and query processing system. Lastly, since a mediated global schema
formed by a set of mediated component schemas is used instead of a traditional integrated
global schema, a query issuer can use the naming, structural and semantic representations of
the component schema familiar to him/her to state queries and receive data from multiple data
sources in his/her own view. This is more advantageous than the traditional integrated schema
approach in which all users are forced to see data elements in the same way.
At the time of writing this paper, the mediation language translator, the NCL compiler and its
code generation facility, and the supporting KBMS have been implemented. The generated
components for distributed mediation and query processing have been tested using IBM's
implementation of ORB (the SOM software package) as the communication infrastructure. Further
testing of these components using different application data is under way.





