Group Title: Optimized trigger condition testing in Ariel using Gator Networks
Title: Optimized trigger condition testing in Ariel using Gator networks
Permanent Link: http://ufdc.ufl.edu/UF00095400/00001
 Material Information
Title: Optimized trigger condition testing in Ariel using Gator networks
Series Title: Department of Computer and Information Science and Engineering Technical Reports
Physical Description: Book
Language: English
Creator: Hanson, Eric N.
Bodagala, Sreenath
Chadaga, Ullas
Hasan, Mohammed
Kulkarni, Goutam
Rangarajan, Jayashree
Affiliation: University of Florida (all authors)
Publisher: Department of Computer and Information Science and Engineering, University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: February 20, 1997
Copyright Date: 1997
 Record Information
Bibliographic ID: UF00095400
Volume ID: VID00001
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.











Optimized Trigger Condition Testing in Ariel using Gator Networks*

Eric N. Hanson, Sreenath Bodagala, Ullas Chadaga,
Mohammed Hasan, Goutam Kulkarni and Jayashree Rangarajan

Rm. 301 CSE, P.O. Box 116120
CISE Department
University of Florida
Gainesville, FL 32611-6120

http://www.cise.ufl.edu/~hanson/

20 February 1997


TR 97-002

Abstract
This paper presents an active database discrimination network algorithm called Gator, and
its implementation in a modified version of the Ariel active DBMS. Gator is a generalization of
the widely known Rete and TREAT algorithms. Gator pattern matching is explained, and it is
shown how a discrimination network can speed up condition testing for multi-table triggers. The
structure of a Gator network optimizer is described. This optimizer can choose an efficient Gator
network for testing the conditions of a set of triggers, given information about the structure of the
triggers, database size, attribute cardinality and update frequency distribution. The optimizer
uses a randomized strategy to deal with the problem of a large search space. The results
show that optimal Gator networks normally have a shape which is neither pure Rete nor pure
TREAT, but an intermediate form where one or a few inner joins (β nodes) are materialized. In
addition, this study shows that it is indeed feasible to optimize Gator networks which perform
rule condition testing more efficiently than either TREAT or Rete networks. In certain cases,
the best Gator network is an order of magnitude or more faster than Rete and TREAT (a factor
of 23 in one case).


1 Introduction

A crucial component of an active database system is the mechanism it uses to test trigger
conditions as the database changes. This paper presents an efficient trigger (rule) condition-testing
strategy based on a new type of discrimination network called the Gator network, or Generalized
TREAT/Rete network [16, 3]. It is assumed here that a trigger condition can be based on multiple
tables, and can involve both selections and joins, as in the Ariel active DBMS [4]. Gator networks
are general structures that allow condition testing to be done for trigger conditions involving one
or more tables. In general, there are many possible Gator networks for a given trigger condition,
just as there are many possible query execution plans for evaluating a given query. Each will allow

*This work was supported by National Science Foundation grant IRI-9318607.










the trigger condition to be tested correctly, but some are much more efficient than others. Hence,
an optimizer has also been developed for choosing a good Gator network for a trigger.
Rete and TREAT are rule condition testing structures that have been used both in production-rule
systems such as OPS5, and in active database systems [2, 1, 4]. It has been observed in
a simulation study that TREAT normally outperforms Rete, but the "right" Rete network can
vastly outperform TREAT in some situations [21]. The reason that TREAT is usually better than
Rete is that the cost of maintaining β nodes usually is greater than their benefit. However, if, for
example, update frequency is skewed toward one or a few relations in the database, a particular
Rete network structure can significantly outperform TREAT, as well as other Rete structures. It
has been shown that Rete networks can be optimized, giving speedups of a factor of three or more
in real forward-chaining rule system programs [13], which are like sets of triggers operating on a
small, main-memory database. But even optimized Rete networks still have a fixed number of β
nodes, which take time to maintain and use up space. With Gator, it is possible to get additional
advantages from optimization, since β nodes are only materialized when they are beneficial.
This paper describes how trigger conditions can be tested using a Gator network, outlines a cost
model for Gator networks, and presents how the Gator optimizer and trigger condition matching
algorithm have been implemented in a modified version of Ariel [15]. Performance figures are given
that demonstrate a substantial speedup in the trigger condition testing performance of Ariel.


2 Gator Networks

Gator networks are made up of the following general components:

selection nodes These test single-relation selection conditions against descriptions of database
tuple updates, or "tokens." Selection nodes are also sometimes called "t-const" nodes, since
they typically test tuples to see if they match constant values.

stored-α nodes These hold the set of tuples matching a single-table selection condition.

virtual-α nodes These are views containing a single-relation selection condition, but not the
tuples matching the condition.

β nodes These hold sets of tuples resulting from the join of two or more α nodes.

P-nodes There is one P-node for each trigger. If a trigger only involves one table, then its P-node
has a selection node as its input. If a trigger involves two or more tables, its P-node has as
input one or more α and/or β nodes. If new tuples arrive at the P-node, the trigger is fired.
The P-node is logically the root of a tree joining all the α and β nodes for the trigger.

root node The purpose of this node is to pass tokens to the selection nodes for testing. The root
node is not the root of the join tree. The term "root" is used for historical reasons because
it is used in the Rete algorithm [3].

By convention, the α nodes are drawn at the top, and the P-node is drawn at the bottom. In
Gator networks for triggers involving more than one table, β nodes and P-nodes can have two or
more child nodes, or "inputs." These inputs can be either α or β nodes. Every α and β node has
a parent node that is either a β node or a P-node.
Rete and TREAT networks are special cases of Gator networks. Rete networks are always
binary trees, with a full set of β nodes, all of which have two inputs. TREAT networks have no β
nodes; all α nodes in a TREAT network feed into the P-node.











Figure 1: A rule condition graph for IrisRule. (Nodes: house, customer, desired_nh, covers_nh
and salesperson, the last carrying the selection name="Iris"; each edge is labeled with the join
condition between the two relations it connects.)


To begin illustrating Gator networks with an example, consider the following schema describing
real estate for sale in a city, real estate customers and salespeople, and neighborhoods in the city.

customer(cno,name,phone,minprice,maxprice,spno)
salesperson(spno,name)
neighborhood(nno, name, desc)
desired_nh(cno,nno) ; desired neighborhoods for customers
covers_nh(spno,nno) ; neighborhoods covered by salespeople
house(hno, spno, address, nno, price, desc)

A trigger defined on this schema might be "If a customer of salesperson Iris is interested in a house
in a neighborhood that Iris represents, and there is a house available in the customer's desired
price range in that neighborhood, make this information known to Iris." This could be expressed
as follows in the Ariel rule language [4]:

define rule IrisRule
if salesperson.name = "Iris"
and customer.spno = salesperson.spno
and customer.cno = desired_nh.cno
and salesperson.spno = covers_nh.spno
and desired_nh.nno = covers_nh.nno
and house.nno = desired_nh.nno
and house.price >= customer.minprice
and house.price <= customer.maxprice
then raise event CustomerHouseMatch("Iris",customer.cno,house.hno)

The raise event command in the rule action is used to signal an application program, which would
take appropriate action [7]. Internally, Ariel represents the condition of a rule as a rule condition
graph, similar to a connection graph for a query [20]. The structure of the rule condition graph for
IrisRule is shown in Figure 1. Sample Rete, TREAT and Gator networks for IrisRule are shown
in Figures 2, 3, and 4, respectively. Gator networks use objects called "+" tokens to represent
inserted tuples, and "-" tokens to represent deleted tuples. Modified tuples are treated as deletes
followed by inserts.
When a + token is generated due to inserting a tuple in a table, it is propagated through
the Gator network to see if any triggers need to fire. Token propagation is explained below in




























Figure 2: A Rete network for the rule IrisRule. (The root feeds t-const selection nodes for
salesperson, customer, desired_nh, covers_nh and house, whose results are held in α1 through α5;
a chain of two-input AND nodes joins them in turn, ending at P-node(IrisRule).)























Figure 3: A TREAT network for the rule IrisRule. (The α nodes for salesperson, customer,
desired_nh, covers_nh and house all feed P-node(IrisRule) directly; the join conditions label the
edges between α nodes.)















Figure 4: A Gator network for the rule IrisRule. (Nodes α1, α2 and α3, for salesperson, customer
and desired_nh, feed the β node β1; β1, α4 (covers_nh) and α5 (house) feed P-node(IrisRule).)


object-oriented terms, describing what happens when a token arrives at the types of nodes listed:1

root When the token arrives at the root node, the token is passed through a selection predicate
index [6, 8] to reduce the set of selection nodes whose conditions must be tested against the
token. The token is tested against each selection node that is not eliminated from consideration
in the previous step. This identifies each α node to which the token must be passed. The
token is passed to each of these nodes in turn.

stored α node The tuple contained in the token is inserted into the node. The node will have a
list of one or more other nodes called its "sibling" nodes, all of which have the same parent
node. The token is joined with its siblings, using a specific join order that was saved at the
time the Gator network was created (the choice of this join order is discussed in more detail
later). A set of tuples is produced by this join operation. These tuples are packaged as +
tokens and passed to the parent node.

virtual α node The work done is the same as that for a stored α node, except that the token is
not inserted into the virtual α node.

P-node The rule is triggered for the data in the token.

β node The logic for this case is the same as for a stored α node.

As an example of Gator matching, suppose that the Gator network shown in Figure 4 is being
used, and a new customer for Iris is inserted. This would cause the creation of a "+" token t1
containing the new customer tuple. Token t1 would arrive at α2 and be inserted into α2. Then,
it could be joined with either α1 or α3. Assume that it is joined first with α1, where it matches
with the tuple for Iris. The resulting joining pair is joined with α3. If elements of α3 join with
this pair, each joining triple is packaged as a + token and forwarded to β1. Upon arriving at β1,
a + token is stored in β1. Then, the token can be joined to either α4 or α5 via the join conditions

1 The actual Ariel implementation has a few other more specific types of nodes (see [4]), but the token propagation
logic works as









shown on the dashed edges from β1 to α4 or α5, respectively. Assume it is joined to α4 first. The
results would be joined next to α5. If a combination of tokens matched all the way across the three
nodes β1, α4 and α5 in this example, then that combination would be packaged as one + token
and placed in the P-node, triggering the rule.
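
The propagation logic for stored α nodes, β nodes and the P-node can be sketched in Python as
follows. This is only an illustration under stated assumptions, not Ariel's implementation: the
names MemoryNode, PNode, Cond, holds, link and demo are hypothetical, tuples are modeled as
dictionaries keyed by "table.attr", the sibling join order is simply list order, virtual α nodes,
"-" tokens and the selection predicate index at the root are omitted, and the rule is a
three-table simplification of IrisRule.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]  # a (possibly joined) tuple, keyed by "table.attr"


@dataclass
class Cond:
    attrs: Tuple[str, ...]          # attributes the condition reads
    test: Callable[[Row], bool]     # the join condition itself


def holds(row: Row, conds: List[Cond]) -> bool:
    """Check every condition whose attributes are all present in row."""
    return all(c.test(row) for c in conds if all(a in row for a in c.attrs))


class PNode:
    """Collects the combinations that fire the trigger."""
    def __init__(self):
        self.fired: List[Row] = []

    def receive(self, row: Row) -> None:
        self.fired.append(row)


class MemoryNode:
    """A stored alpha node or a beta node; both follow the same logic."""
    def __init__(self, name: str, conds: List[Cond]):
        self.name, self.conds = name, conds
        self.rows: List[Row] = []
        self.parent = None                        # a beta MemoryNode or the PNode
        self.join_order: List["MemoryNode"] = []  # siblings, in saved order

    def receive(self, row: Row) -> None:
        self.rows.append(row)                     # store the + token
        partial = [row]
        for sibling in self.join_order:           # join with each sibling in turn
            partial = [{**p, **s} for p in partial for s in sibling.rows
                       if holds({**p, **s}, self.conds)]
        for result in partial:                    # forward results as + tokens
            self.parent.receive(result)


def link(parent, children: List[MemoryNode]) -> None:
    """Attach children to parent; each child's join order is its siblings."""
    for child in children:
        child.parent = parent
        child.join_order = [s for s in children if s is not child]


def demo() -> List[Row]:
    conds = [
        Cond(("customer.spno", "salesperson.spno"),
             lambda r: r["customer.spno"] == r["salesperson.spno"]),
        Cond(("house.price", "customer.minprice", "customer.maxprice"),
             lambda r: r["customer.minprice"] <= r["house.price"] <= r["customer.maxprice"]),
    ]
    p = PNode()
    sales, cust, house, beta1 = (MemoryNode(n, conds)
                                 for n in ("salesperson", "customer", "house", "beta1"))
    link(beta1, [sales, cust])     # beta1 materializes salesperson JOIN customer
    link(p, [beta1, house])        # the P-node has inputs beta1 and house
    # The salesperson tuple is assumed to have passed the name = "Iris" selection node.
    sales.receive({"salesperson.spno": 7, "salesperson.name": "Iris"})
    house.receive({"house.hno": 1, "house.price": 150})
    cust.receive({"customer.cno": 3, "customer.spno": 7,
                  "customer.minprice": 100, "customer.maxprice": 200})
    cust.receive({"customer.cno": 4, "customer.spno": 8,
                  "customer.minprice": 100, "customer.maxprice": 200})
    return p.fired
```

In this sketch only the first customer reaches the P-node: its token joins with the Iris tuple,
is stored in beta1, and then joins with the house tuple. The second customer fails the spno join
and goes no further than its own α node.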


3 Cost Functions

As part of this work, cost functions were developed to estimate the cost of a Gator network relative
to other Gator networks for a particular trigger. These functions are based on standard catalog
statistics, such as relation cardinality and attribute cardinality, as well as on update frequency.
The catalogs of Ariel have been extended to keep track of insert, delete and update frequency for
each table. An update is considered equivalent to a delete followed by an insert, except in the
special case of triggers that have ON UPDATE event specifications. The cost functions estimate
the expense to propagate tokens through a Gator network, assuming a frequency of token arrival
at different nodes determined by the frequency statistics, relation cardinality, attribute cardinality,
selection and join predicate selectivity, and the presence of ON EVENT specifications for relations
appearing in a trigger condition. In the analysis presented in this paper, insert, delete and update
frequency are assumed to be equal (1/3 each). A presentation of the cost functions is beyond the
scope of this paper. Details on the cost functions are presented elsewhere [5].


4 Optimization Strategy

For a given rule there can be many possible Gator networks. The efficiency of the rule condition
testing mechanism depends on the shape of the Gator network used. An optimizer was implemented
in Ariel that uses a randomized state-space search technique to get optimally shaped Gator
networks. The use of a randomized approach to Gator network optimization was motivated by
the fact that it has been used successfully for optimizing large join queries [10], a problem with
a similarly large search space. Experiments were conducted [9, 17] which demonstrated that a
randomized approach is superior to a dynamic programming approach like that used in traditional
query optimizers [18].
Three randomized state-space search strategies were considered: iterative improvement (II),
simulated annealing (SA) and two-phase optimization (2PO, a combination of II and SA). These
generic algorithms require the specification of three problem-specific parameters, namely state
space, neighbors function and cost function [12, 10, 11].
In the following discussion two sibling nodes in the discrimination network are said to be connected
if the following holds. First, the condition graph node set of a Gator network node N,
CGNS(N), is defined to be the set of condition graph nodes corresponding to the leaf α nodes of
N. Two sibling Gator network nodes N1 and N2 are connected if there is a rule condition graph
edge between an element of CGNS(N1) and an element of CGNS(N2).
For the optimization of Gator networks, the following parameters were defined:

State Space The state space of the Gator network optimization problem for a given trigger is
defined as the set of all possible shapes of the complete Gator network for that trigger. Each
possible shape of the Gator network corresponds to a state in the state space. The state space
is constrained so that no β node is created that requires a cross product to be formed among
two or more of its children. It is assumed that all trigger condition graphs are connected, so










Figure 5: Local change operators (Create-Beta, Kill-Beta and Merge-Sibling).


it is always possible to find a Gator network that does not require cross products.2

Neighbors Function The neighbors function in the optimization problem is specified by the
following set of transformation rules, which are also illustrated using examples in Figure 5.

Kill-Beta: Kill-Beta removes a randomly selected β node, say KB, and adds the children
of the node KB as children of the parent of the node KB.
Create-Beta: Create-Beta adds a new β node, say CB, to the discrimination network. It
first selects a random β node or the P-node (call this node PARENT). If PARENT has
more than two children, Create-Beta randomly selects two connected siblings rooted at
PARENT, makes them the children of CB, and makes CB the child of PARENT.
Merge-Sibling: Merge-Sibling makes a node the child of one of its siblings. This operation
first selects a random β node or the P-node. If this node has more than two children,
then two connected siblings rooted at this node are randomly selected and one of them
is made a child of the other. The node to which a child is added must be a β node.

Cost Function The cost function is briefly outlined in Section 3.
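
The three transformation rules can be sketched on a small tree structure. The code below is a
hypothetical illustration, not the Ariel optimizer: the Node class, the helper names (cgns,
connected, inner_nodes, connected_pairs, treat_network) and the representation of the rule
condition graph as a set of two-element frozensets are all assumptions made for the example.
Each operator returns False when it does not apply to the randomly selected node.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(eq=False)            # eq=False: nodes are compared by identity
class Node:
    kind: str                   # "alpha", "beta" or "p"
    rel: Optional[str] = None   # condition graph node of an alpha leaf
    children: List["Node"] = field(default_factory=list)


def cgns(n: Node) -> Set[str]:
    """Condition graph node set: the relations at the leaves below n."""
    if n.kind == "alpha":
        return {n.rel}
    return set().union(*(cgns(c) for c in n.children))


def connected(a: Node, b: Node, edges) -> bool:
    """True if a condition graph edge links CGNS(a) and CGNS(b)."""
    return any(frozenset((x, y)) in edges for x in cgns(a) for y in cgns(b))


def inner_nodes(root: Node) -> List[Node]:
    """The P-node followed by every beta node in the network."""
    found = [root]
    for c in root.children:
        if c.kind == "beta":
            found.extend(inner_nodes(c))
    return found


def connected_pairs(parent: Node, edges):
    kids = parent.children
    return [(a, b) for i, a in enumerate(kids) for b in kids[i + 1:]
            if connected(a, b, edges)]


def kill_beta(root: Node, rng: random.Random) -> bool:
    betas = inner_nodes(root)[1:]
    if not betas:
        return False
    kb = rng.choice(betas)
    parent = next(n for n in inner_nodes(root) if kb in n.children)
    i = parent.children.index(kb)
    parent.children[i:i + 1] = kb.children   # splice KB's children into its parent
    return True


def create_beta(root: Node, edges, rng: random.Random) -> bool:
    parent = rng.choice(inner_nodes(root))
    pairs = connected_pairs(parent, edges) if len(parent.children) > 2 else []
    if not pairs:
        return False
    a, b = rng.choice(pairs)
    cb = Node("beta", children=[a, b])
    parent.children = [c for c in parent.children if c is not a and c is not b] + [cb]
    return True


def merge_sibling(root: Node, edges, rng: random.Random) -> bool:
    parent = rng.choice(inner_nodes(root))
    pairs = connected_pairs(parent, edges) if len(parent.children) > 2 else []
    # Orient each pair so that the node receiving the child is a beta node.
    moves = [(x, y) for a, b in pairs for x, y in ((a, b), (b, a)) if y.kind == "beta"]
    if not moves:
        return False
    child, target = rng.choice(moves)
    parent.children.remove(child)
    target.children.append(child)
    return True


def treat_network(relations) -> Node:
    """TREAT shape: every alpha node feeds the P-node directly."""
    return Node("p", children=[Node("alpha", rel=r) for r in relations])
```

Starting from treat_network(["R1", "R2", "R3", "R4"]) with string-shaped edges, create_beta
groups two joinable α nodes under a new β node, and a following kill_beta restores the TREAT
shape. Because only connected siblings are ever grouped, no operator introduces a cross product.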

2 If trigger condition graphs are not connected, the implementation adds dummy join edges with "true" as the join
condition to make them connected.












The optimizer implemented is capable of using II, SA and 2PO. The 2PO strategy was used for
all actual performance measurements. However, all three strategies are explained below since 2PO
is a combination of II and SA. Each of the II, SA and 2PO algorithms needs to be able to construct
a random start state (feasible Gator network) given a condition graph for a trigger. Random start
states are built in the following way:

1. Assume the condition graph has N nodes. Then N α nodes are created and inserted into a
list.

2. While there is more than one element in the list, a number K where 2 ≤ K ≤ N is generated.
A single starting element is selected from the list. Then, K − 1 siblings for this node are
selected from among the other elements of the list. This is done by following join edges
leading out of the initially selected element to identify other elements of the list that have a
join relationship with the initially selected element. The total of K elements identified are
removed from the list, and a β node with them as children is formed. This β node is inserted in
the list.

When the list has only one element, that element is a complete Gator network for the trigger. A
general description of II, SA and 2PO is given below.
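
A minimal Python sketch of the random start state construction follows. It rests on several
assumptions that the text does not settle: networks are represented as nested tuples (a string is
an α node, a tuple is a β node or, at the top level, the P-node), K is bounded by the current
length of the list, and when fewer than K − 1 joinable elements exist all of them are taken. The
function names are hypothetical.

```python
import random


def linked(a, b, edges) -> bool:
    """True if some join edge connects the leaf sets of list elements a and b."""
    return any(frozenset((x, y)) in edges for x in a[1] for y in b[1])


def random_start_state(nodes, edges, rng: random.Random):
    # Each list element is (subtree, leaf set); an alpha node is just its name.
    items = [(n, frozenset([n])) for n in nodes]
    while len(items) > 1:
        k = rng.randint(2, len(items))            # group size K
        start = rng.choice(items)                 # the starting element
        neighbors = [it for it in items
                     if it is not start and linked(start, it, edges)]
        if not neighbors:
            raise ValueError("condition graph must be connected")
        rng.shuffle(neighbors)
        group = [start] + neighbors[:k - 1]       # up to K-1 joinable siblings
        items = [it for it in items if all(it is not g for g in group)]
        subtree = tuple(g[0] for g in group)      # a new node with the group as children
        leaves = frozenset().union(*(g[1] for g in group))
        items.append((subtree, leaves))
    return items[0][0]


def leaves_of(tree) -> set:
    """All alpha nodes (relation names) below a subtree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves_of(t) for t in tree))
```

Since every chosen sibling has a join edge to the starting element, no node built this way needs
a cross product, which matches the constraint placed on the state space.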

4.1 Iterative Improvement

The Iterative Improvement (II) technique performs a sequence of local optimizations initiated at
multiple random starting states. In each local optimization, it accepts random downhill movements
until a local minimum is reached. This sequence of starting with a random state and performing
local optimizations is repeated until a stopping condition is met. The final result is the local
minimum with the lowest cost.
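
The following generic sketch shows the II control loop. It is written against a toy state space
(integers with a bumpy cost function) rather than Gator networks, the stopping condition is
assumed to be a fixed number of random starts, and a local minimum is detected by examining all
neighbors in random order. All function names are hypothetical.

```python
import random


def local_optimize(state, neighbors, cost, rng: random.Random):
    """Accept random downhill moves until no neighbor is cheaper."""
    current = cost(state)
    while True:
        candidates = list(neighbors(state))
        rng.shuffle(candidates)
        for cand in candidates:
            c = cost(cand)
            if c < current:                  # downhill: take the move and restart
                state, current = cand, c
                break
        else:                                # no downhill neighbor: local minimum
            return state, current


def iterative_improvement(random_state, neighbors, cost, rng: random.Random, starts=10):
    best, best_cost = None, float("inf")
    for _ in range(starts):                  # assumed stopping condition: fixed start count
        state, c = local_optimize(random_state(rng), neighbors, cost, rng)
        if c < best_cost:
            best, best_cost = state, c
    return best, best_cost


# Toy stand-in for the Gator state space: integers with many local minima.
def toy_cost(x: int) -> int:
    return abs(x - 7) + 5 * (x % 3)


def toy_neighbors(x: int):
    return [n for n in (x - 1, x + 1) if -50 <= n <= 50]


def toy_random_state(rng: random.Random) -> int:
    return rng.randint(-50, 50)
```

For Gator networks, the state would be a network shape, the neighbors would come from the
Kill-Beta, Create-Beta and Merge-Sibling operators, and the cost would be the estimate from
Section 3.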

4.2 Simulated Annealing

Simulated Annealing (SA) is a Monte Carlo optimization technique proposed by Kirkpatrick et al.
[14] for problems with many degrees of freedom. It is a probabilistic hill-climbing approach where
both uphill and downhill moves are accepted. A downhill move (i.e. a move to a lower-cost state) is
always accepted. The probability with which uphill moves are accepted is controlled by a parameter
called temperature. The higher the value of temperature, the higher the probability of an uphill
move. However, as the temperature decreases with time, the chance of an uphill move tends to
zero [14, 12].
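
A compact sketch of this acceptance rule is given below. The exponential form exp(-Δ/T) is the
standard Metropolis criterion used in simulated annealing; the geometric cooling schedule, the
parameter values and the toy integer state space are assumptions chosen for the example and are
not taken from the Ariel optimizer.

```python
import math
import random


def accept_probability(delta: float, temperature: float) -> float:
    """Downhill moves (delta <= 0) are always accepted; uphill with exp(-delta/T)."""
    return 1.0 if delta <= 0 else math.exp(-delta / temperature)


def simulated_annealing(state, neighbor, cost, rng: random.Random,
                        temperature=10.0, cooling=0.9, moves_per_stage=20, frozen=0.01):
    current = cost(state)
    best, best_cost = state, current
    while temperature > frozen:
        for _ in range(moves_per_stage):
            cand = neighbor(state, rng)
            cand_cost = cost(cand)
            if rng.random() < accept_probability(cand_cost - current, temperature):
                state, current = cand, cand_cost
                if current < best_cost:      # remember the cheapest state seen
                    best, best_cost = state, current
        temperature *= cooling               # the temperature decreases over time
    return best, best_cost


# Toy stand-in for the Gator state space.
def toy_cost(x: int) -> int:
    return abs(x - 7) + 5 * (x % 3)


def toy_neighbor(x: int, rng: random.Random) -> int:
    return max(-50, min(50, x + rng.choice((-1, 1))))
```

At a high temperature almost any move is accepted, so the search can leave a local minimum; as
the temperature approaches the frozen threshold the procedure behaves like pure downhill search.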

4.3 Two Phase Optimization

In its first phase, 2PO runs II for a small period of time, performing a few local optimizations. The
output of the first phase, i.e. the best local minimum, is input as the initial state to SA, which is
run with a very low initial temperature. Intuitively this approach picks a local minimum and then
searches the space around it. It is interesting to observe that this approach is capable of extricating
itself from local minima. However, the low initial temperature makes climbing very high
hills virtually impossible. It has been observed that 2PO performs better than both II and SA
approaches for optimizing large join queries [10]. The details of the actual implementation of 2PO
discussed in this paper, such as the crossover point between II and SA, the performance of II and
SA individually, etc., are beyond the scope of this paper.
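
The two phases can be combined as in the sketch below. Since the paper leaves the crossover
point and other implementation details out of scope, every parameter here (number of II starts,
initial temperature, cooling factor, moves per stage) is an arbitrary assumption, and the toy
integer state space again stands in for the space of Gator networks.

```python
import math
import random


def descend(state, neighbors, cost, rng: random.Random):
    """Phase-one local optimization: random downhill moves to a local minimum."""
    current = cost(state)
    improved = True
    while improved:
        improved = False
        candidates = list(neighbors(state))
        rng.shuffle(candidates)
        for cand in candidates:
            if cost(cand) < current:
                state, current, improved = cand, cost(cand), True
                break
    return state, current


def two_phase(random_state, neighbors, cost, rng: random.Random,
              ii_starts=3, temperature=1.0, cooling=0.9, moves_per_stage=20, frozen=0.01):
    # Phase 1 (II): a few local optimizations; keep the best local minimum.
    start, start_cost = min((descend(random_state(rng), neighbors, cost, rng)
                             for _ in range(ii_starts)), key=lambda r: r[1])
    # Phase 2 (SA): anneal from that minimum with a low initial temperature.
    state, current = start, start_cost
    best, best_cost = start, start_cost
    while temperature > frozen:
        for _ in range(moves_per_stage):
            cand = rng.choice(neighbors(state))
            delta = cost(cand) - current
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                state, current = cand, current + delta
                if current < best_cost:
                    best, best_cost = state, current
        temperature *= cooling
    return best, best_cost, start_cost


# Toy stand-in for the Gator state space.
def toy_cost(x: int) -> int:
    return abs(x - 7) + 5 * (x % 3)


def toy_neighbors(x: int):
    return [n for n in (x - 1, x + 1) if -50 <= n <= 50]


def toy_random_state(rng: random.Random) -> int:
    return rng.randint(-50, 50)
```

The function returns the best state found, its cost, and the cost of the phase-one minimum, so
the gain from the annealing phase can be inspected. The result is never worse than the
phase-one minimum because the annealing phase keeps track of the cheapest state it has seen.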










5 Modifications to Ariel


The first implementation of the Ariel active DBMS was based on the A-TREAT algorithm, which
did not use β nodes. Ariel was thus modified to support β nodes. A discrimination network must
be "primed" at the time a trigger is created; in other words, its stored α and β nodes must be
loaded with data. Ariel's priming mechanism was modified to allow β nodes to be primed. Also,
Ariel's token propagation strategy was modified to make use of and maintain β nodes.

5.1 New Discrimination Network Node Types

In the original Ariel system, there were seven different types of α nodes with slightly different
behavior [4]. Memory nodes in Ariel can be either static, in which case their contents are persistent
and are stored between transactions, or dynamic, in which case they are flushed after each
transaction.
To implement Gator, the memory node class hierarchy was modified to include the following
types of β nodes:

BetaMemory This is the superclass of the other β node types.

StaticBeta An ordinary β node. If none of the children of a β node is a dynamic node, i.e.
neither dynamic-α nor dynamic-β, then that β node is a StaticBeta.

DynamicBeta If any of the children of a β node is a dynamic node, i.e. either dynamic-α
or dynamic-β, then that β node is a DynamicBeta.

TransBeta (short for Transparent Beta). An instance of this class is used at the root of the
Gator network as a place holder for the P-node.

Virtual β nodes similar to virtual α nodes are not needed since the non-existence of a β node
implies the need to reconstruct its contents as required.

5.2 Details on Priming

In Ariel, stored α and β nodes are primed. However, since the contents of dynamic α and β nodes
do not outlive a transaction, they need not be primed. Also, the virtual-α nodes are not materialized
during priming.
To prime a stored-α node, a one-tuple-variable query is formed internally to retrieve the data to be
stored in the α node. This one-variable query is passed to the query optimizer, and the resulting
plan is executed. The data retrieved are stored in the α memory. To prime a β node, first its
children are primed, and then the children are joined to find the data to put in the β node. The
priming strategy used can be described with the following recursive algorithm. The Prime method
is invoked on the root (P-node) of the Gator network to prime the network:

Prime(Node)
{
    if (childrenExist(Node))
        for each C in ChildOf(Node)
            Prime(C)
    materializeTuples(Node)
}










Figure 6: Join order plan for a Gator node. (a) Gator network with join order plan for
N1=(N4,N3,N2). (b) Propagating a token arriving at node N1.


The materializeTuples method forms the tuples for Node, by running the one-variable query in the
case of an α node, and by joining the children of Node in the case of a β node.
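
A runnable Python rendering of the recursive priming procedure is sketched below. It is a
simplified stand-in for Ariel's mechanism: the database is an in-memory dictionary of tables,
the one-variable query is a plain filter instead of an optimizer-produced plan, join conditions
are restricted to equality pairs, dynamic nodes are not modeled, and all class and function
names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, object]  # a (possibly joined) tuple, keyed by "table.attr"


class Node:
    def __init__(self, kind: str, children: Tuple["Node", ...] = (),
                 table: Optional[str] = None,
                 select: Callable[[Row], bool] = lambda r: True,
                 joins: Tuple[Tuple[str, str], ...] = ()):
        self.kind = kind            # "stored_alpha", "virtual_alpha", "beta" or "p"
        self.children = children
        self.table, self.select, self.joins = table, select, joins
        self.rows: List[Row] = []   # the node's memory (stays empty if nothing is stored)


def one_variable_query(node: Node, db) -> List[Row]:
    """Stand-in for the optimizer-planned single-table query."""
    return [r for r in db[node.table] if node.select(r)]


def contents(node: Node, db) -> List[Row]:
    # A virtual alpha node stores nothing, so its tuples are recomputed on demand.
    return one_variable_query(node, db) if node.kind == "virtual_alpha" else node.rows


def joinable(row: Row, joins) -> bool:
    """Check each equality join whose two attributes are both present."""
    return all(row[x] == row[y] for x, y in joins if x in row and y in row)


def materialize_tuples(node: Node, db) -> None:
    if node.kind == "stored_alpha":
        node.rows = one_variable_query(node, db)
    elif node.kind == "beta":
        rows: List[Row] = [{}]
        for child in node.children:               # join the children one by one
            rows = [m for a in rows for b in contents(child, db)
                    for m in [{**a, **b}] if joinable(m, node.joins)]
        node.rows = rows
    # Virtual alpha nodes and the P-node are left empty in this sketch.


def prime(node: Node, db) -> None:
    for child in node.children:                   # children first, then the node itself
        prime(child, db)
    materialize_tuples(node, db)
```

Calling prime on the P-node visits the network bottom-up in the sense of the join tree: every
child is loaded before the β node that joins it, so each β node can be filled from memories that
are already primed.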

5.3 Generating Token Join Order Plans

Every node with a sibling in the Gator network has a join plan attached to it. The join plan is
a sequence of two-way joins regulating the order in which tokens arriving at the node would be
joined with each of its siblings. For instance, in Figure 6(a) the join plan attached to node N1 is
(N4,N3,N2). When a token arrives at node N1, it is first joined with the contents of node N4. The
resulting Temporary Result (TR) of the join is then joined with contents of the node N3 and so on,
as shown in Figure 6(b). The TRs are not stored. They are generated dynamically and discarded.
An important objective is to choose a join plan with the minimum cost. However, since choosing
token join plans must be done very frequently (hundreds or thousands of times) while finding an
optimized Gator network for one trigger, it is too expensive to use traditional query optimization
[18] to find the join order plan. Instead, the following heuristic is used: during each of the two-way
joins, the current result should be joined with that sibling that would give the join result with
smallest estimated size. This gives a reasonable join order plan quickly.
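
The greedy heuristic can be illustrated with a short Python sketch. The size estimate used here
(current result size times sibling size times the selectivities of all join edges to the nodes
already joined) is a textbook-style assumption; the paper does not give Ariel's estimation
formula, and the sizes and selectivities in the usage note are made-up numbers.

```python
from typing import Dict, FrozenSet, List


def greedy_join_order(start: str, siblings: List[str],
                      size: Dict[str, float],
                      selectivity: Dict[FrozenSet[str], float]) -> List[str]:
    """Pick, at each step, the sibling giving the smallest estimated join result."""
    joined = {start}
    current = 1.0                      # a single arriving token
    remaining = list(siblings)
    plan: List[str] = []

    def estimate(sibling: str) -> float:
        factor = 1.0
        for j in joined:               # a missing edge means no join condition (factor 1.0)
            factor *= selectivity.get(frozenset((j, sibling)), 1.0)
        return current * size[sibling] * factor

    while remaining:
        nxt = min(remaining, key=estimate)
        current = estimate(nxt)        # estimated size of the new temporary result
        plan.append(nxt)
        joined.add(nxt)
        remaining.remove(nxt)
    return plan
```

With sizes N2=1000, N3=100, N4=10 and selectivities 0.1 for N1-N4, 0.01 for N4-N3 and 0.001 for
N3-N2, the sketch yields the plan (N4, N3, N2) for a token arriving at N1, the same order as the
example plan in Figure 6(a).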


6 Performance Evaluation

This section presents the details of various experiments conducted to study the performance of
Gator, Rete and TREAT discrimination networks. The performance metric in all the experiments
is the rule condition evaluation time. The rule condition evaluation time is the time to evaluate
a rule condition using a discrimination network (i.e. the time to pass a set of tokens through the
network).
The Ariel active relational DBMS was used as a testbed for conducting all the experiments.
The average rule activation time was measured by processing a randomly generated stream of
updates. The table to which an update was applied was determined using a frequency distribution
equivalent to the update frequency statistics maintained in the system catalog. Inserts were done
on each table, and the time for each was measured. Then, a weighted rule condition testing time was
calculated by multiplying the time spent propagating a token for each table by the insert frequency
for the table.
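
Read as a weighted average, this calculation sums the per-table products of propagation time
and frequency. The sketch below assumes that reading; the function name is hypothetical and the
per-table times in the usage note are invented for illustration, not measurements from the
experiments.

```python
from typing import Dict


def weighted_testing_time(token_time: Dict[str, float],
                          frequency: Dict[str, float]) -> float:
    """Sum over tables of (time to propagate one token) x (update frequency)."""
    if abs(sum(frequency.values()) - 1.0) > 1e-9:
        raise ValueError("frequencies must sum to one")
    return sum(frequency[table] * token_time[table] for table in frequency)
```

For example, with a skewed distribution (R3 at 0.8 and the other four relations at 0.05 each)
and illustrative per-token times of 10, 20, 5, 40 and 10 time units for R1 through R5, the
weighted time is 0.5 + 1 + 4 + 2 + 0.5 = 8 time units, dominated by the frequently updated R3.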
Rules were created on a synthetically generated database and each of the three different
discrimination networks, Rete, Gator and TREAT, were generated for them. The synthetic database









generated had the following properties. The relation sizes (number of tuples) were randomly chosen
to be in the range [200, 300] or [2000, 10000]. For each relation, a range was randomly selected,
and within each range, a number was randomly chosen that would be that relation's cardinality.
The number of unique values of an attribute of a relation was chosen to give a roughly even mix of
low and high cardinality attributes. The cardinalities were chosen as follows. For each attribute,
a number s was chosen from one of two ranges, selected at random: [0.01, 0.1] or [0.5, 1.0]. The
cardinality of the attribute was given by multiplying that relation's cardinality by s.
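
The statistics of such a synthetic relation could be drawn as in the sketch below. The ranges
come from the description above; the uniform choice within each range, the rounding to whole
numbers, the number of attributes per relation and the function name are assumptions made for
the example.

```python
import random
from typing import List, Tuple

SIZE_RANGES = [(200, 300), (2000, 10000)]        # relation cardinality ranges
FRACTION_RANGES = [(0.01, 0.1), (0.5, 1.0)]      # ranges for the factor s


def random_relation_stats(n_attrs: int, rng: random.Random) -> Tuple[int, List[int]]:
    """Return (relation cardinality, list of attribute cardinalities)."""
    low, high = rng.choice(SIZE_RANGES)          # pick a size range at random
    cardinality = rng.randint(low, high)         # then a size within that range
    attr_cardinalities = []
    for _ in range(n_attrs):
        s_low, s_high = rng.choice(FRACTION_RANGES)
        s = rng.uniform(s_low, s_high)
        # Attribute cardinality = relation cardinality x s, at least one value.
        attr_cardinalities.append(max(1, round(cardinality * s)))
    return cardinality, attr_cardinalities
```

Choosing s from the low range yields attributes with many duplicate values, while the high range
yields nearly unique attributes, which gives the mix of low and high cardinality attributes
described above.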
Experiments were performed on rules having the following types of Rule Condition Graphs
(RCGs):

string type Each relation in the rule condition participates in a join with two other relations such
that the rule condition graph looks like a string. The two relations at the two ends of the
string participate in only one join. The following is an example of a rule with a string type
RCG. R1, R2, R3 and R4 are the relations, a is an attribute of R1, b and c are attributes of
R2, d is an attribute of R3 and e is an attribute of R4.

define rule Rule1
if R1.a = R2.b and R2.c = R3.d and R3.d = R4.e
then <action>

star type One relation participates in a join with all the other relations in the rule condition. The
following is an example of a rule with a star type RCG.

define rule Rule2
if R1.a = R2.b
and R1.c = R3.c
and R1.b = R4.d
then <action>

The update frequency distribution of various relations in the database significantly affects the
performance of discrimination networks. The following three update frequency distributions were
chosen:

Skewed One of the relations has a very high update frequency and the other relations have low
frequencies. In all cases, one relation (always R3) is assigned 0.8 as its update frequency, and
the rest are assigned 0.05.

Even All the relations have the same update frequencies.

Ramp The frequencies of relations decrease in a ramp-like or stair-like manner. For all the cases,
the following distribution was used: (R2=.4, R4=.3, R3=.2, R1=.05, R5=.05).

In all cases, frequencies sum to one.
The results of various experiments conducted on rules with five tuple variables, with different
frequencies and with different rule condition graphs, are explained next. Two graph structures were
considered. The first was a string type RCG with selection conditions on two of the five relations.
The second was a star type RCG with selections on one of the five relations.
For all the tests discussed, an optimized Gator network is compared with a TREAT network
(for which there is only one choice) and a non-optimized, arbitrarily chosen Rete network. Rete













Figure 7: String and star type RCG and skewed frequency distribution. (Bar chart comparing
Rete, Gator and TREAT; horizontal axis: Rule Condition Graph Type, STRING and STAR.)


networks can be optimized, but this was not done since the focus of this work is on the more general
Gator structure. Figure 7 shows the results for a rule with five tuple variables, with string and star
type RCGs and skewed frequency distribution.
It can be seen that Gator performs much better than Rete and TREAT (a factor of 23) in this
case. The Gator, Rete and TREAT networks generated are shown in Figure 8. In all the networks,
the optimizer decided to create virtual α nodes for relations with no selection predicate in the rule
condition, preventing the duplication of relations and thus saving space. In Gator, the relation with
high update frequency (R3=0.8) was pushed down the discrimination network toward the P-node.
This means fewer token joins need to be done as tokens propagate through the network due to
updates. Also, the stored α nodes with low size are at the top of the network which helps to reduce
the size of the β nodes below them.
In the case of TREAT, shown in Figure 8, whenever a new token enters the network it always has
to participate in a join with four other α nodes and that explains its higher rule condition testing
time. In the case of Rete (A) in Figure 8, the virtual α node corresponding to the relation with
the highest frequency (R3) is near the top of the network, and that means higher join processing
costs and β-node maintenance costs.
Figure 9 shows the results for the same rule with even frequency distribution. Gator again
performs better than Rete and TREAT. The Gator network for this case is given in Figure 8 (D).
The Gator network has a few β nodes. This result shows that it can be beneficial to maintain
a few β nodes (though not as many as contained by a Rete) to get optimal performance. The
intuition is that having a few β nodes in the right place reduces the number of high-cost joins to
be performed, and the benefit of this is greater than the cost of maintaining the β nodes.
It was observed during these experiments that the speedup of Gator over Rete and TREAT for
star type RCGs was less than for string type RCGs. It appears that the optimizer as implemented
does not explore the search space in a smooth manner for star type RCGs. The reason for this
seems to be that the connectivity between different relations is poor for star type RCGs. Addition
of more edges to the rule condition graph with "true" as the join condition is being investigated.
This will generate many more states in the state space, many of which will have a high cost, but it
may allow a smoother transition between states that are "interesting" for the optimizer to explore.
In many cases it is not intuitive why one network is better than another, in large part because
there are so many competing factors that influence the performance of a network. Hence a conclusion
of this work is that it is better to use cost functions and search to perform optimization of
Gator networks than to use heuristics to pick a good network.


[Network diagrams; panels: (A) Rete, Skewed; (B) Treat, Skewed; (C) Gator, Skewed; (D) Gator, even]

Figure 8: Gator, Rete and TREAT networks generated for string type RCG. In this figure,
SA=stored α, VA=virtual α, B=β, and TB=trans-β (P-node). The symbols R1 through R5
represent the base relations from which the labeled α nodes are derived.


[Bar chart; series: Rete, Gator, Treat; x-axis: Rule Condition Graph Type (STRING, STAR)]

Figure 9: String and star type RCG and even frequency distribution.


[Bar chart; series: Rete, Gator, Treat; x-axis: Rule Condition Graph Type (STRING, STAR)]

Figure 10: String and star type RCG and ramp frequency distribution.

Figure 10 shows the results for the string and star RCGs with a ramp frequency distribution.
For the rule with the string RCG, the performance of the Gator network was slightly better than
Rete and TREAT. However, for the rule with the star RCG, TREAT does better than Gator.
It appears that the cost formulas may have an inaccuracy in this particular case which prevents
the actual best Gator network from being found. However, overall, the results presented indicate
that the cost formulas are accurate enough to allow a good Gator network to be found in a large
majority of cases. Refinements to the cost formulas to better handle star type rule condition graphs
are being considered.
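The idea of optimizing by search over a cost function, rather than by a fixed heuristic, can be sketched with a generic iterative-improvement loop. This is a minimal illustration and not the optimizer implemented in Ariel: the search space is restricted to left-deep leaf orderings, the only move is swapping two leaves, and the cost function is a hypothetical stand-in that counts expected join steps per token. The frequencies other than R3=0.8, the starting order, and all names are assumptions.

```python
import random

# Assumed update frequencies; only R3 = 0.8 appears in the text.
FREQ = {"R1": 0.05, "R2": 0.05, "R3": 0.8, "R4": 0.05, "R5": 0.05}


def expected_joins(order, freq):
    """Stand-in cost: frequency-weighted join steps in a left-deep chain.
    The first two leaves traverse n - 1 joins, leaf k >= 2 traverses n - k."""
    n = len(order)
    return sum(freq[rel] * (n - 1 if k < 2 else n - k)
               for k, rel in enumerate(order))


def iterative_improvement(start, freq, steps=2000, seed=0):
    """Randomized local search: try random leaf swaps and keep a swap
    only when it strictly lowers the cost."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    current = list(start)
    current_cost = expected_joins(current, freq)
    for _ in range(steps):
        i, j = rng.sample(range(len(current)), 2)  # two distinct positions
        candidate = list(current)
        candidate[i], candidate[j] = candidate[j], candidate[i]
        candidate_cost = expected_joins(candidate, freq)
        if candidate_cost < current_cost:
            current, current_cost = candidate, candidate_cost
    return current, current_cost


if __name__ == "__main__":
    start = ["R3", "R1", "R2", "R4", "R5"]  # hypothetical starting order
    best, cost = iterative_improvement(start, FREQ)
    print(best, cost)
```

With this toy cost the search moves the frequently updated relation to the position nearest the P-node. A realistic optimizer would use a richer move set that can also create and remove β nodes, and a cost function based on catalog statistics as well as update frequencies.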


7 Conclusion

This paper has introduced Gator networks, a new discrimination network structure for optimized
rule condition testing in active databases. A cost model for Gator has been developed, which
is based on traditional database catalog statistics, plus additional information regarding update
frequency. A randomized Gator network optimizer has been implemented and tested as part of the
Ariel active DBMS.
An interesting result of this work is that in most cases, even with even update frequency
distributions, the optimal Gator network has a few β nodes; it is not a TREAT network. In addition,
this work shows that it is beneficial to use a general discrimination network structure (Gator),
instead of limiting the possibilities to TREAT or Rete. Also, it shows that update frequency
distribution has a tremendous influence on the choice of the best discrimination network. Moreover, it
is indeed feasible to develop a cost model and search strategies that allow effective Gator network
optimization.
This work has clearly demonstrated the value of optimizing the testing of trigger conditions
involving joins in active databases. This can help make it possible to implement the capability to
efficiently and incrementally process triggers with joins in their conditions in commercial database
systems, thus making a new, powerful tool available to database application developers.






