|
Citation |
- Permanent Link:
- http://ufdc.ufl.edu/UF00101372/00001
Material Information
- Title:
- Enhanced techniques for timer trigger processing
- Creator:
- Noronha, Lloyd X. 1973- ( Dissertant )
- Place of Publication:
- Gainesville, Fla.
- Publisher:
- Department of Computer and Information Science and Engineering, University of Florida
- Publication Date:
- 1999
- Copyright Date:
- 1999
- Language:
- English
- Physical Description:
- vii, 73 leaves : ill. ; 29 cm.
Thesis/Dissertation Information
- Degree:
- Master's ( Master of Science (M))
- Degree Grantor:
- University of Florida
- Degree Disciplines:
- Computer Engineering (CEN)
- Committee Chair:
- Hanson, Eric N
- Committee Members:
- Hammer, Joachim
Su, Stanley Y
Subjects
- Subjects / Keywords:
- Algorithms ( jstor )
Art dealing ( jstor ) Databases ( jstor ) Email ( jstor ) Expiration ( jstor ) Indexing ( jstor ) Predicates ( jstor ) Sales personnel ( jstor ) Timing devices ( jstor ) Vacations ( jstor ) Computer and Information Science and Engineering thesis, M.S Database management ( lcsh ) Dissertations, Academic -- UF -- Computer and Information Science and Engineering ( lcsh ) City of Gainesville ( local )
- Genre:
- bibliography ( marcgt )
theses ( marcgt )
Notes
- Abstract:
- Timer-Driven Triggers allow users to monitor interesting tuples or interesting
changes to tuples of a view. Trigger condition testing takes place periodically, i.e., once
every few hours, days or weeks. When the timer of a trigger expires, a new copy of the
view is retrieved and the trigger predicates applied to that view. Trigger predicates could
be transition predicates referring to the attribute values of tuples of a view from the
previous timer expiration.
Timer Triggers differ from conventional triggers, in that the trigger testing is not
done for updated tuples only. Timer triggers could be fired on tuples of a view simply
because they satisfy the predicate trigger predicate. Since the trigger conditions are tested
every so often, high update rate tables can have a lot of timer triggers defined on them.
The goal of this thesis is to design a timer trigger system that works hand-in-hand
with a trigger processor and complements its features. The semantics of the timer trigger
differ from that of regular triggers and are explained by a brute-force approach. Several enhanced algorithms can be written that do better than the brute-force approach, but do
not change the semantics. A timer trigger system also provides the user an option of
replacing expensive join and aggregate triggers with timer triggers.
- General Note:
- Printout.
- General Note:
- Vita.
- Thesis:
- Thesis (M.S.)--University of Florida, 1999.
- Bibliography:
- Includes bibliographical references (leaves 70-72).
Record Information
- Source Institution:
- University of Florida
- Holding Location:
- University of Florida
- Rights Management:
- Copyright Lloyd X Noronha. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
- Resource Identifier:
- 2531467 ( ALEPH )
43757201 ( OCLC )
|
Downloads |
This item has the following downloads:
|
Full Text |
Enhanced Techniques for Timer Trigger Processingi
Lloyd Noronha and Eric Hanson
301 CSE
CISE Department
University of Florida
Gainesville FL 32611-6120
(352) 392-2691
hanson@cise.ufl.edu
http://www.cise.ufl.edu/-hanson
TR-99-015
9 Sep 1999
1 This report contains the MS thesis of the same title written by Lloyd Noronha under the
supervision of Eric Hanson, September 1999.
ACKNOWLEDGMENTS
I would like to express my sincere gratitude to my advisor, Dr. Eric
Hanson, for giving me an opportunity to work on this challenging project and also
for providing continuous guidance, advice, and support throughout the course of
my work.
I thank Dr. Joachim Hammer and Dr. Stanley Su for serving on my
supervisory committee and for their careful perusal of this thesis. I would like to
thank Ms. Sharon Grant for maintaining a great research environment at the
Database Systems Research and Development Center.
I am thankful to Chris Carnes, Lan Huang, Mohan Konyala, Sasi Kumar,
J. B. Park, and Albert Vernon for their invaluable help and the fruitful discussions
that we had during the course of this work.
On a more personal note, I would like to thank my family whose love,
support, and constant encouragement was of great importance through this work.
TABLE OF CONTENTS
Page
A C K N O W L E D G M E N T S ............................ ................. ................................................ii
T A B L E O F C O N T E N T S ......... ................. ......................................... ..........................iii
LIST OF FIGURES ........................ ... ..................... v
CHAPTERS
1 IN T R O D U C T IO N ......... .......... .. .. ...... .. ......................................................
2 TRIGGERMAN, A FAST ASYNCHRONOUS TRIGGER PROCESSOR.................. 5
2.1 Introduction ......................................................... ......... 5
2.2 TriggerM an A rchitecture............................................................... .. ............ 6
2.3 The Predicate Indexing Strategy of TriggerMan............................................. 8
2 .4 C onclu sion................................................. ................. 10
3 TIMER TRIGGER LANGUAGE ................................................... .................. 12
3 .1 Introdu action .............................................. .......... . ........... 12
3.2 T im er T rigger L anguage......................................................................... ... 14
3.3 Trigger Sem antics....................................... .. ................. ...... ... ....... 15
3.4 Tim er trigger exam ples .............. ............................................ .............. 16
3.5 Conclusion ................................................... 20
4 ENHANCED ALGORITHMS FOR SINGLE TABLE AND JOIN TRIGGERS........ 21
4.1 Introduction ................. ..... ...... ......... .......... ........................... 21
4.2 The Brute Force Implementation for Single Table and Join Triggers ............ 22
4.3 Optimization Goals and Strategy..................... ......... ...... .............. 23
4.4 Enhanced Algorithms for Single Table Triggers............... .......... .. 24
4.4.1 Single Table on U pdate ...................................................... .............. 25
4.4.2 Single Table on Insert................................ ................................. 28
4.4.3 Single Table on D elete .................................. ......... .............. 29
4.4.4 Single Table on Update with Transition Predicate............................. 30
4.5 Enhanced algorithms for Join Triggers ................................. .............. 32
4.5.1 Join Trigger on Update................ ..... ................ .. ............. 32
4.5.2 Join Trigger on Update with Transition Predicate ............................... 34
4.6 Combining the View Predicate and the Trigger Predicate ........................... 36
4.7 Examples ............................................... .............. ......... 37
4.8 C conclusions ............................................................................... 41
5 ENHANCED ALGORITHMS FOR AGGREGATE TRIGGERS............................ 42
5.1 Introduction ......................................... 42
5.2 The Brute Force Implementation for Aggregate Triggers............................ 42
5.3 Single Table Aggregate Triggers..... ................. ...................... .............. 44
5.3.1 Single Table, Aggregate, Non-Transition, on Update...............................45
5.3.2 Single Table, Aggregate, Transition, on Update .............................. 47
5.4 M ulti Table Aggregate Triggers ...................................... ....... .............. 48
5.4.1 Multi Table, Aggregate, Non-Transition, on Update ........................... 48
5.4.2 Multi Table, Aggregate, Transition, on Update ................................... 52
5.5 E xam ples .................................................................. ......... 53
5.6 Conclusion ........................................ .......... 55
6 OPTIMIZATION AND IMPLEMENTATION ISSUES .......................................... 56
6 .1 Introdu action .............................................. .......... . ........... 56
6.2 Architecture ......................................................................... ......... ................... 56
6.3 T thoughts on O ptim ization ..................................................... ... ................. 59
6.4 Optimization Strategies .................. ................................. 60
6.5 Conclusion........................................ .............. 62
7 SUMMARY AND FUTURE WORK.......... ....... .............................................64
7.1 Summ ary ............................ ....... .............. 64
7.2 Thoughts on Future W ork ........................................ ......................... 65
REFERENCES ........... .............. .. ......................... .... 70
BIOGRAPHICAL SKETCH...... ........ ................... ...... ... .............. 73
LIST OF FIGURES
Figure Page
2 .1: T riggerM an A rchitecture....................................................... ................................. 7
2.2: Predicate Index A architecture .......................................................... .............. 8
6.1: A architecture of a Tim er Trigger System .................................................................... 57
6.2: Performance of the brute force method vs. the enhanced method
for v trying u p date rates. ......................................................................................... 60
Abstract of Thesis Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Master of Science
ENHANCED TECHNIQUES FOR TIMER TRIGGER PROCESSING
By
Lloyd X. Noronha
August, 1999
Chairman: Eric Hanson
Major Department: Computer and Information Science and Engineering
Timer-Driven Triggers allow users to monitor interesting tuples or interesting
changes to tuples of a view. Trigger condition testing takes place periodically, i.e., once
every few hours, days or weeks. When the timer of a trigger expires, a new copy of the
view is retrieved and the trigger predicates applied to that view. Trigger predicates could
be transition predicates referring to the attribute values of tuples of a view from the
previous timer expiration.
Timer Triggers differ from conventional triggers, in that the trigger testing is not
done for updated tuples only. Timer triggers could be fired on tuples of a view simply
because they satisfy the predicate trigger predicate. Since the trigger conditions are tested
every so often, high update rate tables can have a lot of timer triggers defined on them.
The goal of this thesis is to design a timer trigger system that works hand-in-hand
with a trigger processor and complements its features. The semantics of the timer trigger
differ from that of regular triggers and are explained by a brute-force approach. Several
enhanced algorithms can be written that do better than the brute-force approach, but do
not change the semantics. A timer trigger system also provides the user an option of
replacing expensive join and aggregate triggers with timer triggers.
CHAPTER 1
INTRODUCTION
Timer Triggers test and fire triggers periodically i.e., every few minutes, hours,
days, weeks etc. Views are created on the database and triggers are fired periodically on
interesting tuples or interesting changes to tuples of the view. This idea differs from the
conventional philosophy of synchronous [Sil98] and asynchronous [Han97] triggers
which fire triggers when a tuple is updated.
Current database systems process triggers synchronously. Synchronous trigger
processing lengthens the response time of update transactions, because the transaction
cannot commit until all the trigger conditions defined on the updated tables are tested and
actions fired. The response time of update transaction is thus directly affected by the
increase in the number of triggers, which limits the number of triggers that can be defined
on a table.
In the case of assertions synchronous triggers are indispensable. But in most other
cases, delaying the trigger processing to until after the triggering transaction commits (i.e.
asynchronous trigger processing) should work just as fine. This is especially true in the
case of alerters. In an asynchronous trigger system, the only work expended during a
transaction is to capture the updates for later processing [Han98]. This time spent in
capturing updates is a constant and totally unrelated to the number of triggers defined on
the tables involved in the transaction. Thus, it is possible to increase the number of
triggers defined on a table to a very large number without affecting the response time of
update transactions.
After the transaction completes, each of the updated tuples captured (also known
as tokens) are individually tested against the trigger condition of each trigger and actions
fired accordingly. This process of testing all the trigger conditions can be done
sequentially. However, a huge speedup can be achieved by means of an efficient
discrimination network for evaluating predicates. A Selection Predicate Index, is one such
discrimination network [Han98], which causes a factor of improvement in the processing
time per updated token.
This strategy works really well for all types of triggers, especially with single table
triggers because the entire condition evaluation on an update token takes place in main
memory i.e., without accessing the database. In the case of join triggers and aggregate
triggers, there is a relative increase in time spent on processing a token, due to a possible
table-access involved during condition evaluation of the trigger. In the case of join
triggers, when an update token of a table arrives, a part of the condition evaluation
involves performing a join. In the case of aggregate triggers, we have to identify the group
to which the update token belongs and test the aggregate condition for that group. This is
covered in more detail in Chapters 4 and 5.
In both these cases, the Selection Predicate Index is still used to evaluate a part of
the predicate that does not involve a table access. If this part evaluates to true, we then
proceed to run an SQL query to read one or more tables, to evaluate the rest of the
predicate. Thus, although an efficient discrimination network still evaluates a part of the
predicate quickly, it is the I/O costs that dominate the processing time of join and
aggregate triggers. Timer triggers can be thought of as an alternative option to some
expensive join and aggregate triggers because they access the database less often.
Timer triggers enable the user to define triggers on a view of the database. The
view could be a join or an aggregation of one or more tables or simply a snapshot of a
single table. Trigger conditions are tested periodically i.e., once every few minutes, hours,
days, weeks etc. On a timer expiration, a new copy of the view is created and the
predicates of the trigger are applied to the view. These predicates could be transition
predicates, which compare some attribute values of the current view with the
corresponding values of the copy of the view created on the previous timer expiration.
It is easy to conceive of many applications and examples where users would like
to be intimated of interesting conditions once every so often. Consider an example of a
manager of a computer company, who is interested in knowing every week, which of his
hardware salesmen have a two fold increase in sales from the previous week. A timer
trigger can be defined with the timer duration of a week. The view here, keeps the sales
information of only the hardware salesmen. When the timer expires, a fresh copy of this
view is created from the sales table and the sales information for each salesman is
compared with his sales from the view of the previous week. The action is fired for all
tuples of the view that satisfy the predicate.
Asynchronous triggers, like synchronous triggers, test the trigger condition for
every updated tuple, albeit only after the transaction commits. Thus, only an updated
tuple can cause a trigger to fire. However, in timer triggers, all tuples of a view that
satisfy the predicate can fire the trigger. These include tuples which may have not been
touched since the last timer expiration.
By definition, timer triggers are defined to achieve an objective different from
asynchronous triggers and thus have different semantics. However, as indicated above,
for expensive join and aggregate triggers, it is possible to replace an asynchronous trigger
with a timer trigger to achieve similar (but not always the same) results.
The goal of this thesis is to propose a good design of a timer trigger system to
complement an existing trigger processor. Chapter 2 explains the architecture and
predicate indexing mechanism of an asynchronous trigger processor, TriggerMan. The
syntax and semantics of the timer trigger language are presented in Chapter 3 along with
some examples. Implementation of timer triggers with efficient algorithms for single
table and join triggers are discussed in Chapter 4. Chapter 5 has a similar discussion for
aggregate triggers. Chapter 6 focuses on the architecture of a timer trigger system and
some optimization details. Finally, Chapter 7 summarizes the discussion on timer triggers
and provides some thoughts for future work.
CHAPTER 2
TRIGGERMAN, A FAST ASYNCHRONOUS TRIGGER PROCESSOR
2.1 Introduction
The design of a timer trigger system in this thesis focuses on working with a
conventional (synchronous or asynchronous) trigger processor to complement and
enhance its functionality. In later chapters, we present some enhanced algorithms for the
implementation of timer triggers. These algorithms need to define some regular single-
table triggers to capture interesting updates and thus it is very important for the
underlying trigger processor to have a fast predicate evaluation mechanism for single-
table triggers. In this chapter we explain the architecture of such a trigger processor called
TriggerMan [Han98].
TriggerMan is a scalable and extensible trigger processor. The triggers in
TriggerMan can have user defined data types and the system scales well even for millions
of triggers for certain applications. The scalability of the system can be attributed mainly
to its asynchronous trigger processing mechanism and an efficient predicate indexing
strategy. As the number of triggers in a system increases, more time is spent on testing
each updated tuple against all the trigger conditions. Making this testing process
asynchronous alleviates the problem of delays in update transactions, which is an inherent
problem of synchronous trigger processing. TriggerMan additionally implements a
predicate indexing strategy [Han99] that is responsible for a huge speedup in evaluating
the predicates of the triggers and thus reduces the processing time per updated tuple.
Extensibility will soon be an indispensable requirement in trigger systems. With
the advent of Object Relational Database Management Systems (ORDBMS), extending
data types and adding new routines in a database system have gained popularity amongst
users. Users are allowed to create new User-Defined Data Types (UDT) and User-
Defined Routines (UDR) in the database system and use them with regular SQL.
TriggerMan is designed to recognize UDT's and UDR's of the underlying database system
and allow their use in triggers.
The other possible choices of trigger processors for our timer trigger system are in
the active database systems of Ariel [Han96a], POSTGRES [Sto91], HiPAC [Cha89],
Starbust [Wid91], and Ode [Geh91]. But they do not provide the scalability feature of
TriggerMan and very limited, if any, extensibility support. Among them, Ariel and
POSTGRES do implement a selection predicate indexing architecture, which however, is
not extensible.
2.2 TriggerMan Architecture
TriggerMan is designed to work as a stand-alone module independent of the
underlying system. However, for simplicity of explanation, the architecture that is
discussed here is as a module that resides on an Object-Relational DBMS. The
architecture of TriggerMan is shown in Figure 2.1.
The main components of the architecture are explained as follows. The
TriggerMan Module resides and runs in the address space of the DBMS server. The task
queue is a to-do list of tasks for the trigger processor. Once every so often, multiple
TriggerMan drivers run in parallel to read and process jobs from the queue. The Data
Source applications are responsible for capturing updates on local and remote data
sources and putting them in an update queue. These updates are later gathered up from
the update queue by the TriggerMan drivers and added as tasks to be processed in the task
queue.
Figure 2.1: TriggerMan Architecture
Lastly, the TriggerMan console is an application program that interfaces to the
user and allows him to initialize the system, create triggers, drop triggers, shut down the
system etc.
2.3 The Predicate Indexing Strategy of TrigerMan
The major work done by a TriggerMan driver program in Figure 2.1 is in
evaluating the predicates for an updated tuple against each of the trigger definitions.
TriggerMan employs a novel selection predicate indexing strategy [Han98] that makes the
predicate evaluation faster by a huge factor.
Selection Predicate Index
Predicate Index Root
Data Source Node
Expression Signature List
Sets of the rest of the predicate and trigger id's
for a particular constant
Figure 2.2: Predicate Index Architecture
Constant Sets
If a system has a large number of triggers then there is a fair chance that some
clusters of triggers have a part of their predicate looking similar. This is the basic premise
of TriggerMan's predicate indexing strategy. For example a majority of triggers defined
by a stock investor will be alerters where the predicates check for stock prices going
above or below a certain interesting value.
The predicate indexing strategy of TriggerMan, simplified for explanatory
purposes, is shown in Figure 2.2. All the data sources defined in TriggerMan have a node
at the first level below the root. When a trigger is defined, its predicate is converted to
Conjunctive Normal Form (CNF), and the most selective conjunct qualifies to be the
expression signature for that trigger. If the same signature already exists, then the
constants of this conjunct are plugged into the constant list under that signature and nodes
containing the rest of the predicate and the id of the trigger are placed under this constant.
If the same signature does not already exist then a new signature node is created and the
same steps followed.
For example consider the following two triggers having the same expression
signature:
create Trigger buy_informix
from stock
where (name="Informix") and (price<5)
do email("david", "Stock price of Informix within 5");
create Trigger sell_oracle
from stock
where (name="Oracle Corp.") and (price>40)
do email("david", "Sell Oracle Corp. now. The price is :price");
The trigger processor isolates the predicate "stock.name=constant" as the
expression signature for both the triggers because it finds it the most selective. It adds this
signature into the list of expression signatures under the data source "stock" (if it does not
already exist). The constants "Informix" and "Oracle" are inserted into the list of
constants (if they do not already exist). Finally, a node for the id's of the triggers and the
remaining part of the predicate are inserted under these constants.
Now, when the information about an updated tuple of the stock relation is pulled
out from the task queue by the TriggerMan driver, it is made to pass through this
discrimination network. The attribute values within this tuple are evaluated against the
constants within each expression signature. For every constant for where the expression
evaluates to true, we proceed with testing the rest of the predicate and then firing the
trigger action if necessary.
The predicate for single table triggers can be evaluated completely within this
main memory discrimination network. However for join and aggregate triggers,
evaluation of the rest of predicate involves reading relations to perform joins and
aggregations.
2.4 Conclusion
The optimization algorithms for timer triggers need to define single table triggers
for each of the data sources with the timer trigger definition. The predicate indexing
strategy of TriggerMan works best for single table triggers and hence is ideal for the timer
11
trigger algorithms. Along with a fast predicate evaluation technique, TriggerMan also
scales well for a large number of triggers and works with extensible data types.
CHAPTER 3
TIMER TRIGGER LANGUAGE
3.1 Introduction
The preceding chapters emphasized the importance of an asynchronous trigger
processor like TriggerMan, which provides a more powerful language than current trigger
systems while at the same time providing the features of scalability and extensibility.
There are trigger situations however, where the events of interest for a user do not lie in
individual updates to the database but in interesting happenings to the database over a
period of time. Take the case of a stock market investor who wishes to be alerted at the
end of the week when he spends over a $10,000 in purchasing stock of a single company.
Here, the investor requires the trigger system to keep track of all his purchases for the
week. In essence, the user wants the trigger system to keep track of a certain view of the
database and notify him periodically of interesting changes to that view. This is the
fundamental motivation behind timer triggers.
Apart from fulfilling it's basic role of periodic notification of interesting changes
to a view, timer triggers also provide users an option of off-loading, where possible,
expensive triggers from asynchronous processing to timer based processing. An
asynchronous trigger processor tests for trigger conditions on each updated tuple. While
this strategy works out well for single table triggers (due to no table access), join and
aggregate triggers could suffer sometimes. Both join and aggregate triggers potentially
involve a table access during predicate evaluation on an update token.
A part of the predicate not dependent on the table-access, can still be evaluated
using the fast predicate indexing strategy as that of TriggerMan. However, evaluating the
remaining portion of the predicate in join triggers requires tuples from one or more tables
to be read. In aggregate triggers all tuples belonging to the same group as the updated
tuple have to be read and an aggregation performed on them. This leads to a higher
processing cost per update token, whose effect could be augmented by the increase in the
number of join and aggregate triggers. In systems where triggers are defined on high
update-rate tables, this problem could get compounded causing the system to get
overloaded.
In situations such as these, the timer trigger system in addition to fulfilling it's
semantic role, could lend a helping hand to the asynchronous trigger system by offloading
some expensive join and aggregate triggers to timer triggers. Instead of performing table
accesses often, the advantage lies in doing them once every so often. Timer triggers thus,
albeit semantically different, can be used to meet objectives similar to asynchronous
triggers for a better system performance. Due to differing semantics however, the
decision to switch to a timer trigger must not be one done automatically by the trigger
processing system but should be an explicit decision of the user. The sections to follow
explain the syntax and semantics of a timer trigger system.
3.2 Timer Trigger Language
A Timer Trigger System is designed to work as a component of an asynchronous
trigger processor like TriggerMan, making the schema definitions of both the DBMS and
trigger processor available to it. Timer triggers can be created on data sources or views.
The asynchronous trigger processor is already familiar with the schema and the primary
key of all data sources specified in the command. The syntax for creating timer triggers is
as follows:
create timer trigger
from
{on event}
{when }
do
check every
{initialize {immediately I on first timer expiration}}
The from clause specifies the name of a data source or view on which the trigger
is defined. This field can also contain a SQL select statement. The on event clause could
either be insert, delete or update indicating that the trigger should fired only when inserts,
deletes or updates take place in the view. The default is update.
The when clause has the trigger condition on the view. When the timer expires,
this boolean expression is applied on the view to identify tuples satisfying the trigger
condition. Although any kind of predicates are allowed to be in the when clause, its
primary purpose is to specify transition predicates. The do clause specifies the trigger
action. The check every clause specifies the timer period or the day of the week and time
when the trigger should fire. The initialize clause is optional and specifies whether the
first instance of the view is created immediately or the first time the timer expires.
3.3 Trigger Semantics
The semantics of the timer trigger command is now explained for the three
different events viz. insert, delete and update. Consider a timer trigger defined on a data
source or a view with a when clause W. The predicates could be transition predicates
referring to old values of attributes obtained from the copy of the view created during the
previous timer expiration. The view itself, could be a single relation or a result of
applying a select statement with predicates on single tables, joins or aggregate queries.
Periodic processing: In the explanation to follow V is the current copy of the view
created when the timer expires by executing the query in the from clause of the create
timer trigger statement and oldV is the copy of the view from the previous timer
expiration. The following steps are executed when the timer expires:
Event ON UPDATE:
1. Retrieve the contents of the view into V.
2. If the when clause W has a transition condition then the action is run on every
tuple resulting from the operation aw (V JOIN oldV). If there is no transition
condition, then the action is run on every tuple resulting from the operation aw
(V).
3. Destroy oldV and rename V to oldV.
Event ON INSERT:
1. Retrieve the contents of the view into V.
2. The action is run on every tuple resulting from the operation aw (V old V).
3. Destroy oldV and rename V to oldV.
Event ON DELETE:
1. Retrieve the contents of the view into V.
2. The action is run on every tuple resulting from the operation aw (oldV V).
3. Destroy oldV and rename V to oldV.
3.4 Timer trigger examples
Example 1: Send an alert to employees who have taken more vacation than their allowed
quota but haven't filled out a special requisition form.
The schema definition of the relation is:
employee(empid, empname, email_addr, vacationtaken, vacationleft,
vacationreq filled .........) with emp_id as the primary key.
The trigger is defined as follows:
create timer trigger excessvacation
from (select email_addr, vacationreqfilled
from employee
where vacationtaken > vacationleft)
when vacationreqfilled = False
check every week beginning Monday at 8:00 AM
do email(:email_addr, "Please fill out the Excess Vacations form");
Here the view is simply the details of the employees who have exceeded their
allowed vacation quota. When the timer expires, it identifies the employees who haven't
filled out the necessary form from among those in the view. The notation :email_addr in
the action substitutes the actual email addresses by macro expansion while running the
action for qualifying tuples. Since no event is specified, on update is assumed. Moving
the predicate of the when clause to the definition of the view does not change the
meaning of the trigger in this example.
The above is a good example illustrating the use of timer triggers and it's differing
semantics from conventional triggers. The intent is to email a forgetful employee every
Monday morning until he does the needful. Trying to achieve the same with a
synchronous or an asynchronous trigger would send an email to the employee every time
his details are touched or it is possible that he does not get emailed at all if his details are
not touched.
Example 2: Alert all junior managers when they have two or fewer days of vacation left
in the year. The schema of the relation employee is the same as the one above.
create timer trigger lessvacationleft
from (select email_addr
from employee
where (designation='Jr. Managers) and
(vacationleft vacationtaken) <= 2)
on insert
check every week
do email(:email_addr, "You have only two or less days of vacation left in the
year");
The view has all the junior managers having two or less days of vacation. The on
insert clause ensures that the employee is emailed only the first time his vacation dips
below two days.
Example 3: Notify the finance department of the customers who haven't honored their
invoices for more than ninety days past the due date.
The schema definition of the relations are:
invoice (num, cno, due_date, amount, ........) with primary key num and
customer (cno, name, addr, ...... ) with cno as the primary key.
The trigger is defined as follows:
create timer trigger unhonoredinvoice
from (select invoice.num as invnum, customer.name as custname
from customer, invoices
where (invoice.cno = customer.cno) and (CURRENT DATE -
invoice.due_date) > 90 and (invoice. status='UNPAID')
check every week
do email("finance@experts.com", "Customer :custname has not honored the
invoice :invnum for more than 90 days");
This is a good example of a join trigger where the relation of interest is only
invoice. The customer relation is present only to extract the information on defaulting
customers. Another important characteristic of this trigger is that tuples who qualify are
mostly tuples lying untouched for more than 90 days.
Example 4: The managers of a famous record label wish to let their new pop and rock
artists know when their album sales have crossed a million.
The schema definition of the relations are:
album (albumid, artistname, sales, artist_type, ......) with album_id as the
primary key and
artist (artistname, category, ......) with artistname as the primary key.
The trigger is defined as follows:
create timer trigger newartistalbum_sales
from (select artist.email as artistemail, album.sales as sales
from album, artist
where (album.artistname = artist.artistname) and
((artist.category="rock") or (artist.category="pop")) and
(album.artist_type="new")
when old.sales < 1000000 and sales>=1000000
check every day
do email(:artistemail, "Congratulations! Your album sales have crossed a
million");
The transition predicate in the when clause ensures that only one email is sent to
the artist on the day when his album sales have crossed a million.
Example 5: The managers of a departmental store chain wish to keep the salaries of all
the sales personnel to within 50% of the budgeted salary.
The schema definition of the relations are:
emp(empid, name, dno, sal, desig, ..........) with empid as the primary key and
dept(deptid, name, budget, manageremail, ....) with dept id as the primary key.
The trigger is defined as follows:
create timer trigger excess_salary
from (select dept.manager email as mgremail, sum (emp.sal) as sum_sal,
dept.budget as budget
from emp, dept
where (emp.dno = dept.dno) and
(emp.desig="sales personnel") and
(dept.name="toy")
group by dept.name, dept.manageremail, dept.budget
when sum_sal > .50 budget
check every week
do email(:mgremail, "You are exceeding the allowed budget for salespersons.");
This is an example of an aggregate trigger involving two tables. A trigger can fire
due to changing information on either of the tables and both tables also have selection
predicates. The view here has aggregate information about the total salaries of sales
personnel in the toy dept.
3.5 Conclusion
The syntax and semantics of a timer trigger language are explained. The examples
illustrate how users can use timer triggers to be informed periodically of interesting tuples
in a view. In example 3, we saw an interesting property of timer triggers where some
tuples satisfy the trigger condition although they did not do so in the previous timer
expiration and have not been touched since. They qualified for the action simply because
they aged enough to satisfy a date-time predicate. The examples brought about the
similarity and the differences in timer triggers against the more conventional synchronous
and asynchronous triggers.
CHAPTER 4
ENHANCED ALGORITHMS FOR SINGLE TABLE AND JOIN TRIGGERS
4.1 Introduction
In the previous chapter we presented the syntax of the "create timer trigger"
command and explained the semantic definition of the different types of single table
triggers, join triggers and aggregate triggers with some examples. A brute force method
for implementing any timer trigger involves running a query against the relations in the
trigger definition to generate the new copy of the view and then evaluating the predicates
of the trigger for all tuples in that view. While the brute force method always works and
is the simplest implementation, it is often an inefficient way to go. Consider for example,
the following read-mostly table employee_details which stores the name and information
of all employees within a company,
employee details(emp id, emp name, emp addr,.....)
Changes to this table normally take place only when a new employee is recruited
or an existing employee leaves. If one or more timer triggers are defined on this low
update rate table, then it is totally unnecessary to re-run queries on the entire relation
employee_details to create the new view, every time the timer goes off. In the case when
there are no updates at all, between two timer expirations, the new view is simply the old
view. If only a few updates take place in this interval, then the new view can be obtained
from the old view by simply keeping track of the updates since the last timer expiration
and applying those updates to the old view during processing.
In this chapter we try to look at some enhanced strategies to generate the old view
from the new view without the brute force method. Several enhanced methods seem
appealing, but with improper optimization methods several cases of anomaly arise when
insert, updates and deletes take place on the same tuple many times in one interval. Our
enhanced technique endeavors to rid ourselves of these anomalies, and generate an
efficient technique to handle almost all types of timer triggers.
In the next section we illustrate the brute force approach to implement timer
triggers. With the disadvantages of this approach in mind, we highlight our points of
focus while developing the various enhanced algorithms, followed by the actual
algorithms for timer triggers, on single relations and joins. We end the chapter with some
examples and thoughts.
4.2 The Brute Force Implementation for Single Table and Join Triggers
Consider a general trigger T defined on one or more relations R. The predicate of
the trigger can be expressed in CNF as PAPtran, where P represents all the non-transition
predicates and Ptran represents the transition predicates. Although the view and the trigger
have their separate predicates Pv and Pw respectively, we combine them together
discussion and represent them as PAPtran. More details on combining predicates and
applying them are covered later in this chapter in section 4.6. The current view created is
CurrView=op(R) and PrevView is the copy of the view from the previous timer
expiration. Note that CurrView and PrevView are views we use for implementation, they
are different from the view defined in the trigger i.e. V=opv(R). The operations to be
performed to implement this trigger are:
1. Create a new copy of the view, CurrView=op(R)
2. Run the action on all the tuples resulting from the following operation,
GPtran (CurrView JOIN PrevView).
The above two steps translate into two SQL queries. For non-transition triggers,
the second step is not needed and the action is run on all the tuples of CurrView, resulting
in only one SQL query.
4.3 Optimization Goals and Strategy
The primary goal of developing enhanced algorithms for timer triggers is to
generate the new view from the old view with a method more efficient than the brute
force method, in terms of CPU and I/O processing time. This, as might be expected,
implies that we keep track of some information about the tuples updated on a table since
the last timer expiration. When the timer goes off we can modify the old view to obtain
the new view. As mentioned previously, improper optimization methods can cause
anomalies, whereby trigger behavior is different than when using the brute-force method.
We overcome these anomalies by keeping track of only the primary key of the tuples that
were updated. When the timer goes off, we perform a join of these keys with the original
relation, to get the correct picture of which among the touched tuples were effectively
inserted, deleted or updated. We then go on to apply these operations to the old view to
obtain the new view.
The timer trigger system is designed to complement an asynchronous trigger
system. As mentioned before, for a table on which a timer trigger is defined, we keep
track of all updates to the table within each processing interval. We can go a step further
and keep track of only those tuples which are of interest to the trigger i.e. those tuples,
which satisfy the predicates of the trigger. This can be done by using the existing
asynchronous trigger system of TriggerMan. Predicates for updated tuples can be
evaluated asynchronously using the fast and efficient Selection Predicate Index in main
memory [Han98].
Another focus while developing our algorithms is to query the relations on which
triggers are defined, only once. It is reasonable to assume these relations are large or will
keep growing. It is therefore important for our algorithms to not touch the relations more
times than necessary. So, in almost all cases, we query the original relation only once for
its attribute values.
4.4 Enhanced Algorithms for Single Table Triggers
In this section we present our enhanced algorithms to implement the various timer
triggers on single relations. These include triggers with or without transition predicates
and with any of the various events viz. on insert, on update and on delete.
The general schema of the view CurrView, on a single table R is (key, fields,
tranfields), where key is the primary key of the table R and fields are one or more
attributes of R whose values are required in the action. The third field tranfields is present
only in a transition trigger and includes those attributes of R, whose values are required to
evaluate the transition predicate.
4.4.1 Single Table on Update
Consider a timer trigger defined on a single relation R (key, field, field,
field3,...fieldn), with one or more predicates P, none of which have a transition
condition. The brute force method to implement this trigger is to evaluate the query
currView= Hn ,.,. (7p(R)), every time the timer goes off. The specified action is then
run for all the resulting tuples of the view currView.
In our enhanced algorithms we maintain a relation S(key), which is a set of unique
primary keys of tuples of R, that were touched (i.e. inserted, deleted or updated), since the
last timer expiration. Additionally, the keys of only those tuples are inserted into the
relation S, which either satisfy the predicate clause P or used to satisfy it before the
update. Regular triggers are defined on the relation R to collect this set S. This is
explained in greater detail with examples in later sections. The following operations can
now be performed to generate a new copy of the view:
1. TempCandidates = Hi ,.,1. ((p(S JOIN R))
2. Refresh CurrView :
2a. CurrView = CurrView (S JOIN CurrView)
2b. CurrView = CurrView u TempCandidates
3. Candidates for the action are all the tuples of the view CurrView.
Explanation: CurrView is the view from the last timer expiration. When the timer
goes off, we need to refresh the view CurrView before running the action on all its tuples.
The first step is to identify tuples to be inserted into the view or have their existing
values updated. These tuples could only be from among the touched tuples of relation R
in the interval since the last timer expiration, i.e., from the set of keys in relation S. These
tuples are collected into a temporary relation TempCandidates. A tuple updated in
relation R may cause an insert into the view currView, if that tuple did not previously
satisfy the predicate P, but does so after the update. In performing a join on the original
relation R, we get the current and thus the correct values of the attributes for evaluating
the predicate. TempCandidates should not include any tuples which need to be deleted
from the view. A tuple will be deleted from the view, either because it is physically
deleted from the relation R or because it no longer satisfies the predicate due to the
update. Performing the join between the relation S to the original relation R takes care of
eliminating the former and evaluating the predicate P takes care of eliminating the latter.
The second step refreshes the view currView. The first of the two queries deletes
all the tuples in CurrView that were touched, i.e. all keys in S. The second query inserts
any of the touched tuples that still belong to the view. Let's separately consider how the
inserts, updates and deletes to the view are correctly handled. A tuple to be inserted into
the view will be present in the temporary relation TempCandidates, and thus gets inserted
into the view CurrView by the second query. A tuple that is updated needs its attribute
values refreshed in the view CurrView. This tuple is present in TempCandidates. It is
updated by first deleting it from the view CurrView with the first query and then
reinserting it into the view from TempCandidates with the second query. A tuple to be
deleted from the view will not be present in the temporary relation, TempCandidates. The
first query will delete this tuple and the second query will not reinsert it into the view.
Finally the third step simply calls a User Defined Routine to run the action on
each of the tuples in the view CurrView.
The above steps can be translated into SQL queries as follows:
1. Select S.key as key, R.fields as fields
from S, R
where S.key = R.key and op
into temp TempCandidates;
2.
2a. Delete from CurrView
where exists (select S.key from S where S.key=CurrView.key);
2b. Insert into CurrView
select key, fields from NewCandidates;
3. Select trigAction (key, fields)
from CurrView;
where trigAction is a UDR for the specified trigger action
Salient features of the algorithm are the following. The above algorithm is well
enhanced and should work especially well on tables with a low update rate. The set of
keys of touched tuples in relation S is kept to as small a size as possible. This is done by
evaluating the predicate P, on each tuple, before inserting its key into S. If the predicate P
is highly selective, only a small fraction of the updates to R, have their keys stored in S.
There is however, a small cost overhead of evaluating the predicates P on touched tuples
before inserting it into the relation S. This cost is significantly reduced by making use of
the existing fast and efficient discrimination network of TriggerMan.
The number of queries in the above algorithm is restricted to three, of which only
one accesses the original relation R. The other two queries operate on much smaller
relations and should not cause a performance overhead. Processing is made more efficient
when the relation R has an index on the primary key, because then, in the worst case, only
as many pages of relation R are accessed as the number of unique keys in S. Correlated
sub-queries, as the one above in step 2, are well optimized by several existing DBMS
products. This query is flattened to perform a join ignoring any duplicates.
4.4.2 Single Table on Insert
In the case of single table triggers with the on insert event, we are interested in
rows inserted into the view CurrView, since the last timer expiration. When the timer
goes off, we create the current copy of the view, identify all new entries from the previous
copy and run the action on them. As before the relation S(key) is the set of keys of
touched tuples of relation R. The operations to process the trigger are as follows:
1. TempCandidates = Hn 1. ,.,i. ((p(S JOIN R))
2. Candidates for the action are all the tuples of (TempCandidates CurrView)
3. Refresh CurrView :
2a. CurrView = CurrView (S JOIN CurrView)
2b. CurrView = CurrView u TempCandidates
The first step creates a temporary relation TempCandidates of all the inserts and
updates as before. The next step runs the trigger action on all the tuples in
TempCandidates that are not in the previous copy of the view CurrView. Finally
CurrView is refreshed as before. The SQL statement to run the action on new inserts in
step 2 can be written as follows:
select trigAction (key, fields)
from TempCandidates
where not exists (select CurrView.key from CurrView
where key = TempCandidates.key)
where trigAction is a UDR for the specified trigger action. It accepts the key and fields of
the view and executes the action.
4.4.3 Single Table on Delete
In the case of single table triggers with the on delete event, we are interested in
rows deleted from the view CurrView, since the last timer expiration. When the timer
goes off, we create the current copy of the view, identify rows deleted from the previous
copy and run the action on them. The operations to process the trigger are as follows:
1. TempCandidates= Hn 1. ,,.d. ((p(S JOIN R))
2. Candidates for the action are all the tuples of
((S JOIN CurrView) TempCandidates)
3. Refresh CurrView :
2a. CurrView = CurrView (S JOIN CurrView)
2b. CurrView = CurrView u TempCandidates
The first and the third steps are no different from before. The second step
identifies tuples that were present in the previous copy of the view, but were deleted
since. Of course, these could only be from among the set S. (S JOIN CurrView) returns
all the tuples in the previous copy of the view, that were touched since. From this set, we
ignore the inserts and updates (i.e. tuples in TempCandidates) to get only the deletes. The
SQL query looks as follows:
select trigAction (key, fields)
from S, CurrView
where S.key = CurrView.key and
not exists (select TempCandidates.key from TempCandidates
where S.key = TempCandidates.key).
where trigAction is a UDR for the specified trigger action. It accepts the key and fields of
the view and executes the action. Again, correlated sub-queries are flattened and well
optimized by major commercial DBMS products.
4.4.4 Single Table on Update with Transition Predicate
Timer triggers with transition predicates compare old and new values of attributes
of corresponding tuples using the old and new keywords as qualifiers. As discussed later
in section 4.6, a predicate for a transition trigger can be written in Conjunctive Normal
Form (CNF) as PAPtran. P has all the non-transition predicates as before and is used to
define the view currView= HR. n ,.,j ,,i,a,-. (op(R)). Ptran is the transition predicate
operating on the old and new values of certain fields written as tranFields in the view. For
simplicity, we assume that the predicate Ptran is such that only a change in the value of
tranFields may cause it to evaluate to true. A more general discussion on evaluating
transition predicates which evaluate to true even when the fields tranFields are untouched
(e.g. old.value=new.value) is present in Chapter 7 under future work.
In order to process a transition trigger, we create the current copy of the view, and
then apply the transition predicates on the old and new copies of the view. The operations
to process the trigger are as follows:
1. TempCandidates = key, fields, tranFields ((Yp(S JOIN R))
2. Candidates for the action are all the tuples in
UPtran (CurrView JOIN TempCandidates).
3. Refresh CurrView :
2a. CurrView = CurrView (S JOIN CurrView)
2b. CurrView = CurrView u TempCandidates
The first and third steps are same as before. In the second step we identify the
tuples which satisfy the transition condition. Only those tuples that were updated, since
the last timer expiration, are candidates to satisfy the transition condition. The join
identifies only the updated tuples and applies the transition predicate Ptran on them. An
SQL query for this step would look as follows:
select trigAction (CurrView.key, CurrView.fields, CurrView.tranFields,
TempCandidates.tranFields)
from TempCandidates, CurrView
where TempCandidates.key = CurrView.key
and Ptran (TempCandidates.tranFields, CurrView.tranFields)
where Ptran is the transition predicate applied on field tranFields of the view.
4.5 Enhanced algorithms for Join Triggers
A join trigger is defined on multiple tables. The schema for a view on N relations
R1 to RN is CurrView(keyl, ..., keyN, fields, tranfields) where keyI is the primary key of
relation RI, "fields" are one or more attributes from any of the N relations whose values
are required in the action and the last field "tranfields" have one or more attributes
required to evaluate the transition predicate, if any. The selection predicate, normalized to
CNF is P1AP2A... APNAPJoIN, where PI is the selection predicate on RI and PJOIN
represents the join predicates. The view definition is:
CurrView= FI keyl, ..., keyN, fields, tranFields
((TP1,P2,... ,PN (cPJOIN(R1 X R2 X ....X RI X ......RN)))
The brute force method accesses all the tuples of all the relations to create the new
copy of the view.
4.5.1 Join Trigger on Update
We maintain individual sets for each relation to register the updated tuples since
the last timer expiration. Each of these relations SI(keyI) has the set of all the keys of
relation RI that were touched since the last processing period. When the timer goes off
the following operations are performed:
1. Create TempCandidates
ForI= 1 toN
do
If SI is empty then skip to the next iteration;
TempCandidatel= fn keyl, ..., keyN, fields, tranFields
((P1,P2,... ,PN ((PJOIN(R1 X... (SI JOIN RI) ... X RN)))
TempCandidate = TempCandidate u TempCandidatel.;
done
2. Refresh CurrView :
2a. For I= 1 to N
do
If SI is empty then skip to the next iteration;
CurrView = CurrView (SI JOIN CurrView)
done
2b. CurrView = CurrView u TempCandidates
3. Candidates for the action are all the tuples of the view CurrView.
The relation TempCandidatel represents the set of inserted and updated tuples of
the view due to the updates on RI only. The relation TempCandidates represents all the
inserts and updates on the view since the last timer expiration and is created by
performing the union of all the relations TempCandidatel. Step la and lb can be
implemented by a single query. Step 1 can thus be implemented using N+l queries, N
queries for each of the TempCandidatel's and one query to remove any duplicates from
TempCandidates.
In step 2, the view is refreshed by first deleting all the tuples touched due to
updates on each of the relations and then inserting the tuples from TempCandidates. Step
2 can also be written using N+1 queries. Finally, the last step runs the action on all tuples
in the view.
Salient features of the algorithm are the following. When the timer goes off, we
have to execute many queries to refresh the view and run the trigger action. In the worst
case when updates to all the relations are registered 2N+2 queries will be run, of which
each of the N queries of step 1 access all the relations. Clearly, our implementation is not
efficient in this case. The brute force method runs only one query to access all the
relations. However, if only one relation registered updates then our algorithm need run
only one query for the first step, two for the second and one for the third i.e. only four in
all, three of which are on smaller relations. Our implementation thus, works really well in
cases when only a small number of the tables are updated.
4.5.2 Join Trigger on Update with Transition Predicate
In this case a transition condition is a part of the predicate. The predicate can be
written in CNF as P1AP2A... APNAPJoINAPtran. The view definition does not change and
is as follows:
CurrView= Hl keyl, ..., keyN, fields, tranFields
(GP1lP2,... APN (TPJOIN (R1 X R2 X ....X RI X ......RN)))
When the timer goes off, the trigger is implemented as follows:
1. Create TempCandidates
ForI= 1 toN
do
If SI is empty then skip to the next iteration;
TempCandidatel = nfl key ... keN, fields, tranFields
(G(P1,P2,... PN (GPJOIN(R1 X... (SI JOIN RI) ... X RN)))
TempCandidate = TempCandidate u TempCandidatel.;
done
2. Candidates for the action are all the tuples in
(Ptran (CurrView JOIN TempCandidates).
3. Refresh CurrView :
3a. For I= 1 toN
do
If SI is empty then skip to the next iteration;
CurrView = CurrView (SI JOIN CurrView)
done
3b. CurrView = CurrView u TempCandidates
All the steps look similar to the previous discussion on the non-transition case. In
step 2 the action is run on only tuples of the view that satisfy the transition predicate by
comparing the corresponding values of these tuples attributes in the old and new view.
Again as stated for the transition case of single table triggers, we assume for
simplicity that the predicate Ptan is such that only a change in the value of tranFields may
cause it to evaluate to true. A more general discussion on evaluating transition predicates
which evaluate to true even when the fields tranFields are untouched (e.g.
old.value=new.value) is present in Chapter 7 under future work.
4.6 Combining the View Predicate and the Trigger Predicate
Consider a general timer trigger defined on one or more relations R1, ... Rn with
a view predicate Pv and a trigger condition Pw, both in Conjunctive Normal Form (CNF).
The view definition is V=nfields((Gpv(R1 X R2 X ... X Rn)). On timer expiration, the view
is retrieved and the trigger predicate Pw is applied to it. The view CurrView that we create
and keep around for enhanced trigger processing need not be the same as the view V but
possibly a more efficient representation.
In the case when the trigger has no transition predicates (i.e. no reference to old
attribute values), we generate CurrView by applying the predicates P=PvAPw on the
relations i.e. CurrView=n fields (oP (R1 X R2 X ... X Rn)). Since Pv and Pw are both in
CNF, P is also on CNF.
In the case when the trigger has transition predicates, the trigger predicate Pw has a
non-transition part PnoTran and a transition part Ptran i.e. Pw=PnoTranAPtran. We generate
CurrView by applying the predicates P=PvAPnoTran on the relations i.e., CurrView=n fields
(OPvAPnoTran (R1 X R2 X ... X Rn)). The procedure for breaking up Pw into PnoTran and Ptran
is explained as follows. If the old value of an attribute is referenced in the predicate, then
all the conjuncts containing that attribute make up Ptran. The conjuncts of Pw that are not
in Ptran make up PnoTran.
From the above two cases, it is clear that we can represent the predicate for a
trigger as P when there are no transition predicates and PAPtran when there are transition
predicates.
4.7 Examples
Example of a Single Table Trigger: Consider this example of a non-transition single
table trigger from Chapter 3.
create timer trigger excessvacation
from (select email_addr, vacationreqfilled
from employee
where vacationtaken > vacationleft)
when vacationreqfilled = False
check every week beginning Monday at 8:00 AM
do email(:email_addr, "Please fill up the Excess Vacations form");
The predicate P here is
P=(vacationtaken>vacationleft) and (vacationreq_filled=False)
The various schema definitions are as follows:
CurrView(empid, email_addr);
TempCandidates(empid, email_addr);
S(emp_id);
The asynchronous trigger to collect the updates in S is
create trigger S_employee_excessvacation
from employee
when ((vacationtaken>vacationleft) and
(vacationreqfilled=False))
or
((old.vacationtaken>vacationleft) and
(old.vacationreq_filled=False))
do execSQL'insert into S values :emp_id';
From the discussion in section 4.4.1 for the case of single table triggers with an
ON UPDATE event, the steps to process the trigger on timer expiration are written as
follows:
1. TempCandidates = emp id, e-mail addr
((C(vacation taken>vacation left) and (vacation reqfilled=False) (S JOIN employee))
2. Refresh CurrView :
2a. CurrView = CurrView (S JOIN CurrView)
2b. CurrView = CurrView u TempCandidates
3. Candidates for the action are all the tuples of the view CurrView.
Example of a Join Trigger: Consider this example of a transition join trigger from
Chapter 3.
create timer trigger newartistalbum_sales
from (select artist.email as artistemail, album.sales as sales
from album, artist
where (album.artistname = artist.artistname) and
((artist.category="rock") or (artist.category="pop")) and
(album.artist_type="new")
when old.sales < 1000000 and sales>=1000000
check every day
do email(:artistemail, "Congratulations! Your album sales have crossed a
million");
The predicate here is PartistAPalbumAPJoINAPtran where
Partist= (artist.category="rock") or (artist.category="pop")
Palbum = album.artist type="new"
PJoIN = album.artist name = artist.artist name
Ptran = old.sales < 1000000 and sales>1000000
The various schema definitions are as follows:
CurrView(artistid, albumid, sales);
TempCandidates(artistid, album_id, sales);
Sartist(artist id); Salbum(albumid);
The asynchronous triggers to collect the updates in Sartist and Salbum are:
create trigger S_artistnewalbum_sales
from artist
when (artist.category="rock") or (artist.category="pop")
or (old.artist. category="rock") or (old.artist.category="pop")
do execSQL insert into St,, values :artist id';
create trigger S_albumnewalbum_sales
from album
when album.artisttype="new"
or old.album.artisttype="new"
do execSQL insert into Salbum values :album id';
The view is initialized as follows:
CurrView= H artist id, album id, sales (GPartist,Palbum((GPJOIN(artist X album)))
From the discussion in section 4.5.2 for the case of Join Triggers with an ON
UPDATE event and a Transition predicate, the steps to process the trigger on timer
expiration are written as follows:
1. Create TempCandidates:
la. If Sartist is not empty then
TempCandidateartist = n artist id, album id, sales
((yPartist,Palbum((PJOmI(Sartist JOIN artist) X album)))
TempCandidate = TempCandidate u TempCandidateartist.;
end if
lb. If Salbum is not empty then
TempCandidatealbum = artist id, album id, sales
((yPartist,Palbum((PJOmI(Salbum JOIN album) X artist)))
TempCandidate = TempCandidate u TempCandidatealbum.;
end if
2. Candidates for the action are all the tuples in
(Ptran (CurrView JOIN TempCandidates).
3. Refresh CurrView :
3a. If Salbum is not empty then
CurrView = CurrView (Salbum JOIN CurrView);
If Sartist is not empty then
CurrView = CurrView (Sartist JOIN CurrView);
3b. CurrView = CurrView u TempCandidates
41
4.8 Conclusions
The brute force implementation of timer triggers is expensive because of the CPU and I/O
cost involved in reading entire relations and applying selection and join predicates on
them. In single table triggers where the update rate on relations are not very high, our
strategy of keeping track of the tuples touched and performing queries only on this
touched set works very well. Performance improves even more if the selection predicate
is very selective or if there is an index on key of the relation. In multi-table triggers, our
strategy works well when only a few of the relations are updated and the update rate is not
very high.
CHAPTER 5
ENHANCED ALGORITHMS FOR AGGREGATE TRIGGERS
5.1 Introduction
Aggregate triggers are triggers whose condition of interest involve an aggregation
on all tuples or groups of them. If a tuple of a relation is touched then the group it belongs
to needs its aggregate information re-calculated and the trigger condition checked. Thus,
aggregate triggers implemented as synchronous or asynchronous triggers have a high
processing cost per updated tuple. The timer trigger mechanism to test for interesting
aggregate conditions periodically is a better option to implement aggregate triggers. The
view here is an aggregation of tuples from one or more relations grouped by some
grouping attributes. Again, the brute force method is the simplest implementation and
always works. There are enhanced algorithms discussed in the next few sections which go
along similar lines of the algorithms in the previous chapter that make aggregate triggers
very efficient under some cases.
5.2 The Brute Force Implementation for Aggregate Triggers
Consider a general aggregate trigger defined on relations R. The selection
predicate of the trigger is P, the grouping attributes are gAttr and the aggregate predicate
is PaggrAPtranAggr in CNF, where Paggr represents all the non-transition aggregate predicates
and PaggrTran represents the transition aggregate predicates. Although the view and the
trigger have their separate aggregate predicates Pvaggr and Pwaggr respectively, we combine
them together into one Boolean and represent them as PaggrAPtranAggr, where PtranAggr
contains the transition predicate part of the aggregate predicate and Paggr contains the non-
transtion part. This discussion of combining predicates is identical to the one we had in
section 4.6, the only difference being that the selection predicates are replaced by
aggregate predicates. The schema for the view CurrView is (gAttr, aggrValues,
tranAggrValues) where aggrValues are one or more fields in the view required for the
action and tranAggrValues are one or more fields required to evaluate the aggregate
predicate. The view is defined as follows.
CurrView= H gAttr, aggrValues, tranAggrValues ((UPaggr(gAttr G aggrFunc fields((cP (R))))
where aggrFunc is an aggregate function applied on some attribute "fields" of the relation.
The aggregate values generated by aggrFunc are to evaluate the aggregate predicate Paggr
and also to evaluate the fields aggrValues and tranAggrValues of the view. PrevView is
the copy of the view from the previous timer expiration. Note that CurrView and
PrevView are views we use for implementation, they are different from the view defined
in the trigger. The operations to be performed to implement this trigger are:
1. Create a new copy of the view CurrView,
CurrView= H gAttr, aggrValues, tranAggrValues ((UPaggr(gAttr G aggrFunc fields((cP (R))))
2. Run the action on all the tuples resulting from the following operation,
(Ptran (CurrView JOIN PrevView).
The above two steps translate into two SQL queries. For non-transition triggers,
the second step is not necessary and the action is run on all the tuples of CurrView,
resulting in only one SQL query.
5.3 Single Table Aggregate Triggers
In this section we present some enhanced algorithms to implement the various
timer triggers with aggregate conditions on single relations. These include triggers with
any of the various events viz. on insert, on update and on delete and with or without
aggregate transition predicates on the groups within the view. Again, the simplest method
that works is the brute force method. When the timer goes off the new view is created
from all the tuples of the relations and an aggregation performed on all the groups.
However, if only a small number of groups are touched or many tuples of only one group
are touched then recomputing the aggregation for all the tuples in all the groups may be
extremely inefficient in terms of I/O and CPU.
The general schema of an aggregate view currView on a single table R is (gAttr,
aggrValues, aggrTranfields) where gAttr comprises the group attributes and aggrValues
are the values of aggregate functions applied on the group, required for the action. The
aggrTranfields values are required only for transition triggers to evaluate the transition
predicate. These values are also obtained by applying some aggregate functions on the
group.
5.3.1 Single Table, Aggregate, Non-Transition, on Update
Consider a timer trigger defined on a relation R with selection predicate P, one or
more grouping attributes gAttr and the aggregate predicate Paggr. The view can be defined
as follows:
CurrView= H gAttr, aggrValues ((UPaggr (gAttr G aggrFunc fields ((P (R))))
where aggrfunc are one or more aggregate functions applied on some fields
"fields" of relation R. The values of these aggrFunc may either be needed to evaluate the
aggregate predicate Paggr, or to simply compute the aggregate values aggrValues required
for the action.
As before, we maintain a relation S(gAttr), which is the set of groups touched due
to updates on relation R since the last timer expiration. The algorithm to process this
trigger is as follows:
4. TempCandidates = H gAttr, aggrValues ((UPaggr (gAttr G aggrFunc field ((P (S JOIN R))))
5. Refresh CurrView :
2a. CurrView = CurrView (S JOIN CurrView)
2b. CurrView = CurrView u TempCandidates
6. Candidates for the action are all the tuples of the view CurrView.
TempCandidates is a temporary relation of current aggregate information of only
the groups to be inserted or updated in the view CurrView. 7p (S JOIN R) returns all the
tuples belonging to the touched groups, satisfying the selection predicates. We then group
these tuples on the grouping attributes gAttr and apply the aggregate predicates Paggr, to
obtain new aggregate values of all the touched groups. The second step refreshes the view
CurrView by deleting all the groups touched and inserting all tuples in TempCandidates.
Finally, the trigger action is run on all tuples in the view CurrView.
As consistently seen in all examples until now, the main difference in the various
algorithms for single table triggers lies in creating the relation TempCandidates consisting
of inserts and updates to the view. After this relation is created, the steps to refresh the
view and run the action on the tuples are similar.
The salient features of the algorithm are the following. This algorithm works best
if tuples from only a few groups are touched because then the data is read and aggregation
performed only on tuples belonging to this small set of groups. If there is an index on the
group attributes gAttr, then I/O is performed only on those pages having the tuples of the
touched groups. Applying the selection predicate P on all touched tuples before inserting
it into the set S, reduces the number of groups especially if the predicate is highly
selective. Only one of the queries accesses a part of the original relation R. The remaining
three operate on smaller relations and involve simple join and predicate application.
The SQL query to implement the trigger looks as follows:
1. Select S.gAttr as gAttr, aggrFunc(R.field) as aggrValues
from S, R
where S.gAttr=R.gAttr and P
group by gAttr
having Paggr
into temp TempCandidates
2. Refresh CurrView :
2a. Delete from CurrView
where exists (select gAttr from S where S.gAttr=CurrView.gAttr);
2b. Insert into CurrView
select gAttr, aggrValues from TempCandidates;
3. select trigAction (gAttr, aggrValues)
from CurrView.
5.3.2 Single Table, Aggregate, Transition, on Update
In this case one or more of the aggregate predicates involve a transition condition
on a aggregate value (e.g. old.sum(sal) < sum(sal) ). A general representation of the
aggregate predicate can be written in CNF as PaggrAPtranAggr, where Paggr is the regular
aggregate predicate from before and PtranAggr has one or more transition predicates. The
view is defined as follows:
CurrView= H gAttr, aggrValues, tranAggrValues ((UPaggr (gAttr G aggrFunc field ((P (R))))
where now the aggrfunc additionally generates one or more fields tranAggrValues
to evaluate the transition predicate.
1. TempCandidates =
I gAttr, aggrValues, tranAggrValues ((Paggr (gAttr G aggrFunc field ((P (S JOIN R))))
2. Candidates for the action are all the tuples in
(PtranAggr (CurrView JOIN TempCandidates)
3. Refresh CurrView :
2a. CurrView = CurrView (S JOIN CurrView)
2b. CurrView = CurrView u TempCandidates.
Only step 2 above looks different. In this step we identify the tuples which satisfy
the transition condition. Only those groups touched since the last timer expiration are
candidates to satisfy the transition condition. These groups are in TempCandidates. The
join identifies only the updated tuples and applies the transition predicate PtranAggr on
them.
5.4 Multi Table Aggregate Triggers
In the case of multi-table aggregate triggers, a trigger is defined on multiple tables.
The schema for the view CurrView on N relations R1 to RN is (gAttrl, ..., gAttrN,
aggrValues, aggrTranfields). The grouping attributes in this case are gAttrl, gAttr2, ...,
gAttrN, where a gAttrl could be zero or more attributes of the relation RI. This number is
zero when a relation does not have any of its attributes in the group-by clause. The
selection predicate, normalized to CNF is P1AP2A... APNAPJoIN, where PI is the selection
predicate on RI and PJON represents the join predicates. The view definition is:
CurrView= gAtrgAttrl ... gAttrN, aggrValues
((Paggr ( gAttrl, ... gAttrN G aggrFunc field
((TP1,P2,... ,PN (cPJOIN(R1 X R2 X ....X RI X ......RN)))))
5.4.1 Multi Table, Aggregate, Non-Transition, on Update
When the timer goes off, we re-aggregate on the groups touched due to updates on
each table. The operations are as follows:
4. Create TempCandidates
ForI= 1 toN
do
If SI is empty then skip to the next iteration;
create TempCandidatel;
TempCandidate = TempCandidate u TempCandidatel.;
done
5. Refresh CurrView :
2a. For I = 1 to N
do
If SI is empty then skip to the next iteration;
CurrView = CurrView (SI JOIN CurrView)
done
2b. CurrView = CurrView u TempCandidates
6. Candidates for the action are all the tuples of the view CurrView.
The relation SI (gAttrI) has the set of gAttrI values touched due to updates on
tuples of relation RI, after applying the predicate PI on each tuple. The join predicate
might cause only a subset of the groups having the values in gAttrI to be affected,
however, for simplicity of the algorithm we recompute for all groups having the gAttrI
values.
TempCandidatesI represents the set of groups touched due to the updates on
relation RI only. The process to create it is discussed below. TempCandidates is the union
of all these temporary relations and thus is a cumulative set of touched groups due to
updates on all relations on the view. Step la and lb can be implemented by a single
query. Step 1 can thus be implemented using N+1 queries, N queries for each of the RI's
and one query to remove any duplicates.
In step 2, the view is refreshed. First all the touched groups due to updates on each
relation are deleted from the view, then we reinsert the appropriate groups with their new
values from TempCandidates. Step 2 also can be written using N+1 queries. Finally, the
last step runs the action on all tuples in the view.
The process of creating the relation TempCandidatesI for any relation RI is done
in two different ways depending on whether any of its attributes contribute to the
grouping or not. In the first case we consider a relation RI whose attributes are present in
the grouping as the fields gAttrI. TempCandiatesI for this case can be created as follows:
TempCandidates = H gAttrl, ..., gAttrN, aggrValues
((Paggr ( gAttrl, ..., gAttrN G aggrFunc field
((Gp1,P2,... ,PN (cPJOIN(R1 X... (SI JOIN RI) ... X RN)))))
where SI (gAttrl) has the values of all the gAttrl touched since the last timer
expiration. The aggregation is performed for all the groups whose gAttrl values are in S.
In the second case we consider a relation RI which does not have any attributes in
the grouping clause. This makes the task of finding the groups touched due to updates on
relation RI a little difficult. In this case, we collect the set S(pKey), the set of primary
keys of tuples updated in RI. Then we use the selection and join predicates to find another
relation RJ whose gAttrJ values are touched due to updates on RI. These values are
collected in a relation SJ'(gAttrJ) and appended to SJ(gAttrJ). So SJ would now consist of
all the gAttrJ values touched due to updates on RJ as well as RI.
SJ'(gAttrJ) = n gAttrj (P+I (,PJOIN-IJ((SI JOIN RI) X RI+1 X .... X RJ)))
SJ(gAttrJ)= SJ(gAttrJ)ASJ'(gAttrJ)
where (PJOIN-IJ includes only those join predicates involving relations from RI to RJ
The relation TempCandidatesJ is created the usual way from SJ as shown above,
however it reflects the refreshed values of groups touched by updates on RI and RJ. RJ is
any relation which contributes to the grouping attributes and is connected to RI in the rule
condition graph. A rule condition graph is like a query graph where the various relations
(with their selection predicates) are the vertices and the join predicates are the edges
between the vertices. The relations RI+1 to RJ-1 are the relations along the path from RI
to RJ in the query graph.
The salient features of the algorithm are the following. When the timer goes off,
we have to execute many queries to refresh the view and run the trigger action. In the
worst case when updates to all the relations are registered 2N+2 queries will be run, of
which almost N queries could access all the relations. Clearly, our implementation is not
efficient in this case. The brute force method runs only one query to access all the
relations, but involves a lot of aggregate computation on possibly a large number of
groups. However, if only one relation registered updates then our algorithm need run only
one query for the first step, two for the second and one for the third i.e. only four in all,
three of which are on smaller relations. Our implementation thus, works really well in
cases when only a small number of the tables are updated.
5.4.2 Multi Table, Aggregate, Transition, on Update
The only difference in the case of a multi-table aggregate trigger with a transition
predicate lies in a transition condition being a part of the aggregate predicate. The
aggregate predicate can be written in CNF as PaggrAPtranAggr. The view definition does not
change and is as follows:
CurrView= gAtrgAttr ... gAttrN, aggrValues, tranAggrValues
((Paggr ( gAttrl, ..., gAttrN G aggrFunc field
((GPIP2,... ,PN (cPJOIN(R1 X... (SI JOIN RI) ... X RN)))))
When the timer goes off, the trigger is implemented as follows:
1. Create TempCandidates
ForI= 1 toN
do
If SI is empty then skip to the next iteration;
create TempCandidatel;
TempCandidate = TempCandidate u TempCandidatel.;
done
2. Candidates for the action are all the tuples in
(PtranAggr (CurrView JOIN TempCandidates).
3. Refresh CurrView :
3a. For I= 1 toN
do
If SI is empty then skip to the next iteration;
CurrView = CurrView (SI JOIN CurrView)
done
3b. CurrView = CurrView u TempCandidates.
All the steps look similar to the previous discussion on the non-transition case. In
step 2 the action is run on only the groups satisfying the transition predicate by comparing
the corresponding values of these groups in the old and new view.
5.5 Examples
Example of a Aggregate Trigger: Consider this example of an aggregate trigger with no
transition predicate from Chapter 3.
create timer trigger excess_salary
from (select dept.manageremail as mgremail, sum (emp.sal) as sum sal, dept.budget as budget
from emp, dept
where (emp.dno = dept.dno) and
(emp.desig="sales personnel") and
(dept.name="toy")
group by dept.name, dept.manageremail, dept.budget
when sumsal > .50 budget
check every week
do email(:mgremail, "You are exceeding the allowed budget for salespersons.");
The selection predicate is PempAPdeptAPJoIN where
Pempis (emp.desig="sales personnel")
Pdept is (dept.name="toy")
PJIN is (emp.dno = dept.dno)
The aggregate predicate is Paggr where
Paggr is (sum_sal > .50 budget)
The various schema definitions are as follows:
CurrView(dept name, mgr email, budget, sum_sal);
TempCandidates(dept name, mgremail, budget, sum_sal);
Semp(emp_id); Sdept(name, manager email, budget);
CurrView and TempCandidates have the same schema consisting of the grouping
attributes and a field containing the result of summing the groups on sal. Since only the
relation Dept contributes to the grouping attributes we collect the set of touched groups in
Sdept. Emp does not contribute to the grouping at all so we keep track of the primary keys
of interesting tuples updated.
The asynchronous triggers to collect the updates in Smp and Sdept are:
create trigger Semp_ excess_salary
from emp
when emp.desig="sales personnel"
or old.emp.desig="sales personnel"
do execSQL 'insert into Semp values :emp_id';
create trigger Sdept excess_salary
from dept
when dept.name= "toy"
or old.dept.name="toy"
do execSQL 'insert into Sdept values (:name, :manageremail, :budget)'
From the discussion in section 5.4.1 for the case of a multi-table aggregate trigger
with an ON UPDATE event and no transition predicate, the steps to process the trigger on
timer expiration are written as follows:
1. Create TempCandidates
la. If Smp is not empty then
Sdept- Hname, manager email, budget
((PempPdept ((Semp JOIN emp) JOIN dept)
endif
lb. If Sdept is not empty then
TempCandidatedept= Iname, manager e-mail, budget, sum sal
(dept.name, dept.manager email, dept.budget G sum sal
(GPempPdept (emp JOIN (Sdept JOIN dept))))
TempCandidate = TempCandidatedept;
endif
2. Refresh CurrView :
If Sdept is not empty then
CurrView = CurrView (Sdept JOIN CurrView);
CurrView = CurrView u TempCandidates
3. Candidates for the action are all the tuples of the view CurrView.
5.6 Conclusion
Implementing aggregate triggers by the brute force method may not be the most
efficient method in some cases. Reading entire relations and performing aggregations
involves a lot of expense in terms of both CPU and /O. In single table triggers where the
update rate on relations are not very high, our strategy of keeping track of the groups
touched and aggregating on only this small set of groups works well. Performance
improves even more if the selection predicate is very selective or if there is an index on
the relation. In multi-table triggers, our strategy works well if only a few of those
relations are updated and the update rate is not very high.
CHAPTER 6
OPTIMIZATION AND IMPLEMENTATION ISSUES
6.1 Introduction
In the preceding chapters, we looked at the syntax and semantics of timer triggers
with its brute-force and enhanced implementation algorithms. In this chapter, we discuss
the architecture and some optimization issues of a Timer Trigger System. Although the
enhanced algorithms contain more queries than the brute force approach, they can be
orders of magnitude faster under certain conditions. The same algorithms however, may
perform poorly compared to the brute-force approach under different conditions. This
chapter looks at the factors which determine the performance of our enhanced algorithms
and provides some thoughts in making a choice between the better of the two approaches.
Trying to make a comprehensive assessment using extensive statistics, like the query
optimizer does [Chm76, Se179], is extremely complex to implement.
6.2 Architecture
The architecture of a timer trigger system is shown in figure 6.1. As discussed
before, a Timer Trigger System is designed to complement the powers of an
asynchronous trigger processor. A console program interfaces to the user and accepts the
"create timer trigger" command string. This console could be same as the one used by the
asynchronous trigger processor to accept its commands. This command string is then sent
to the parser where it gets parsed for syntactic correctness. The semantic analyzer
validates the semantics of the various entities present in the command string viz. the data
sources, views, fields etc. It also checks to see if the view on which the trigger is defined
has a primary key on it, and if the action present in the do clause is valid.
Figure 6.1: Architecture of a Timer Trigger System
The code generator then generates the SQL scripts for both the brute-force and
the enhanced strategy. The default strategy is initially set to brute-force. The
implementation of the enhanced strategy requires us to create asynchronous triggers to
keep track of the updates to each of the data sources. As many asynchronous triggers are
needed as there are data sources in the timer trigger definition. The scripts for the
asynchronous trigger processor are created by the code-generator. All the scripts created
are stored in a catalog with the trigger definition. A request is then sent to the timer
manager for a periodic wake-up signal.
If the "initialize immediately" clause is set in the timer trigger string, the view is
initialized the first time on trigger definition by brute force. This view, as discussed
before, needs to be kept until the next timer expiration to evaluate transition predicates.
The timer manager accepts entries for timer requests when a trigger is defined. It
registers these entries into the its list of wake-up times. On expiration of this timer, it
sends a wake-up signal to the execution unit identifying the trigger to be processed.
The execution unit, on receiving a wake-up signal, extracts the definition and the
scripts of the specified trigger from the catalogs. Depending on which of the two
strategies were set active, the execution unit runs the set of SQL scripts from either the
brute-force or the enhanced algorithm. These scripts generate the new view, apply the
predicates of the trigger referring to the previous copy of the view if necessary, and
finally run the action on the qualifying tuples.
After firing the actions, the trigger strategy is reassessed in the reassess block.
The values of a set of parameters, on which the execution cost of the trigger is dependent,
is examined and a decision in made to choose between the two strategies. If the time
spent on re-assessing is a fair fraction of the processing time of the trigger, then this step
can be executed every few times after the processing instead of every time. The
mechanism for reassessing the trigger processing strategy is covered in more detail later
in this chapter.
6.3 Thoughts on Optimization
The run-time performance of the enhanced strategy depends on the following
factors:
update rate of the data sources
selectivity of the selection predicate of the view
indexes present on the data sources on their primary key
In Chapters 4 and 5, we mentioned that for each relation involved in the trigger, a
relation S(pKey) or S(gAttr) is maintained to store the keys of updated tuples. This list of
updated tuples is filtered to a potentially small size by applying the selection predicate
while gathering them. If the number of keys in S is kept to a small size then the time
spent for creating the new view is reduced. The size of the set S is small if either the
update rate is low or the predicate is very selective or both. If the data source has an index
on its primary key pKey or the grouping attribute gAttr, then the time spent on processing
the trigger is significantly reduced, since only a few pages of the data source are touched.
Thus, one or more of the above mentioned factors contribute to the better performance of
the enhanced algorithms.
So, if a plot of the processing time versus the update rate of the table is to be
drawn, then it should look like the one shown in figure 6.2. From this plot it is clear that
the enhanced strategy works well for low update rates. As the update rate increases there
is a crossover point, above which the enhanced strategy is worse than the brute force one.
60
It is our goal to work in the shaded region of the graph i.e., use the enhanced strategy until
the update rate reaches the crossover point, after which we want to switch over to the
brute-force approach.
Crossover Point
Update Rate
-4- Brute Force ----Enhanced I
Figure 6.2: Performance of the brute force method vs. the enhanced method for varying
update rates.
6.4 Optimization Strategies
The first optimization strategy is the simplest and the easiest to implement. Actual
run times are used to arrive at a decision on which among the two algorithms is the better
one. By actual time we mean the wall-clock time, as it is very difficult to obtain the actual
CPU and I/O time spent during query executions due to the inherent multi-threading in
DBMSs [Inf99]. Brute force is initially used to create the view for the first time and the
\"", ""--J_'
time spent in milliseconds is stored with the trigger information. In every subsequent
processing, the enhanced strategy is run and the cost recorded and compared with the
brute-force cost. When the former cost exceeds the latter in two or more consecutive runs,
we switch to the brute-force approach. In the enhanced region of the graph, this algorithm
works fine because we know with a fair degree of certainty that the enhanced algorithm
has lesser processing cost. But once we cross over to the brute-force region, it is difficult
to make a decision when to return back to the enhanced approach, due to the lack of
sufficient information collected.
In the second optimization strategy, we use the estimated costs from the query
optimizer in arriving at a relation between the update rate of the data sources to the
execution cost of the enhanced algorithm. However, this information is not reliable as the
size of the relation S will never be accurately determined by the optimizer due to the
dynamically varying size of S. But, if the optimizer provides a function to calculate the
cost of a query by simply giving to it the size of S then we can approximately determine
the size of S that causes the crossover in costs between the two algorithms. As the update-
rate rises the number of touched keys collected in S will increase too. When the number
of keys in S crosses this threshold number, we switch to the brute force approach and stop
gathering the updates. At a later time, when the update rate drops, we can resume
collecting keys in S to see if this number has dropped below the threshold number.
Stopping and restarting the process of collecting the updated keys is as simple as
disabling and enabling the asynchronous triggers defined on the data sources.
The update-rate of data sources can be kept track of by defining simple main-
memory counters (variables) for each data source. These counters are incremented on
every update to the corresponding data source and once every hour the update rate can be
calculated (by dividing the value of this counter by sixty minutes). Since this counter is
going to be shared among the different transactions updating the data source, a locking
overhead will be introduced per increment. However, since we are interested in only an
approximate number for the update-rate, we can avoid this expensive process of locking
the counter.
While the second strategy seems appealing, it is based on the assumption that the
query optimizer provide an interface function that accepts the size of S and returns the
cost of the enhanced query. However query optimizers compute the estimated costs of
queries based purely on statistics collected by the database and do not accept parameters.
It will be helpful if DBMS vendors could open up their query optimizers provide
interfaces that accept some parameters while optimizing a query. One way to get around
this problem is to enter some dummy rows into S and call the gather statistics command
on S, before optimizing the query.
The first strategy does seem like a more practical and a simple one. However, we
need to find some way to return back to the enhanced approach from the brute force one,
when the update rate drops. This can be done by recording the update-rate at the
crossover point. Later, when the update rate drops below this recorded value we can
switch back to the enhanced strategy.
6.5 Conclusion
The architecture of a timer trigger system is simple. It functions hand in hand with
an Asynchronous Trigger Processor. The enhanced algorithm works well for low update
63
rate tables, high trigger predicate selectivity and the presence of primary key indexes on
data sources. A plot of the enhanced vs. the brute-force strategy reveals that our enhanced
algorithms perform well for low update rates. As the update rates increases there is a
crossover point when this strategy is worse than the brute force one. It is important to
identify the crossover point until which the enhanced algorithm should be used. A more
detailed design and implementation of optimized techniques for a timer trigger system is
a subject of future work.
CHAPTER 7
SUMMARY AND FUTURE WORK
7.1 Summary
Timer triggers allow users to define views and monitor interesting tuples or
interesting changes to tuples in these views. The process of retrieving views, testing
trigger conditions and firing actions takes place periodically. All this gives timer triggers
different semantics from those of the more conventional synchronous (and asynchronous)
triggers, in that the tuples do not need to be updated to satisfy trigger conditions.
The examples covered in Chapter 3 sought to reassert the point that some trigger
needs of users require periodic testing and notification. These translate best to timer
triggers and sometimes cannot be replaced by regular triggers. In example 1, where it was
required to notify some employees weekly of a form to be filled, only timer triggers could
be written here to correctly send one notification per week to forgetful employees.
Some join triggers and many aggregate triggers could be expensive to implement
using asynchronous triggers due to frequent accesses on the original relations. These
triggers can be evaluated more efficiently by the periodic testing mechanism on a group
of updates.
The simplest implementation for timer triggers is the brute-force approach, which
retrieves the view using its definition and then applies the predicates of the trigger. The
enhanced algorithms of simply refreshing the view by recording its updates lead to a huge
performance benefit during periodic processing. This is especially true if one or more of
the following hold viz. the update-rate on the relations is low, the trigger predicate is very
selective or indexes are defined on the primary key of relations.
Formal cost analysis and models in trying to choose the better of the two
algorithms has not been presented and is the scope for future work. However, our
approach of making a choice between the better of the two algorithm and identifying the
point of crossover by using actual run times is a simple and a feasible idea.
The most useful contribution of this thesis is the idea of keeping a set of keys of
updated tuples and restricting the processing to this set. This allows us to be oblivious of
the anomalies arising due to the different sequence of inserts, updates and deletes on the
same tuple of the view. This is achieved by actually retrieving the tuples associated with
these keys from the original relation and thus getting the correct information. Maintaining
this set however, requires us to create asynchronous triggers as explained in Chapter 4.
An asynchronous trigger processor like TriggerMan with its fast predicate indexing
mechanism makes it possible to collect this set of keys with a very low (and almost
insignificant) overhead in many cases.
The architecture of a timer trigger system presented in Chapter 6 is simple and
reuses a lot of the functionality available from the asynchronous trigger processor like its
console, data source definition etc.
7.2 Thoughts on Future Work
In the enhanced algorithms for join triggers we assume that updates to all the
tables in the trigger (or view) definition are important for refreshing the views. Thus, we
keep track of the updates on all the tables. In many cases however, updates to fewer than
all relations are relevant and it may not be necessary to keep track of the updates to all the
relations. This is especially true in example 4 of Chapter 4 which sends a complimentary
email to new pop and rock artists when their album sales cross a million. While updates
take place on both the relations, album and artist, the updates to only album are relevant
as we are only interested when the sales cross the million mark. Thus, there should be an
option in the timer trigger language that specifies the relations on which the updates are
relevant so that our optimizer has a more efficient execution.
While making a choice between the better of the two approaches discussed, our
main basis of comparison was the total run times spent on creating the views and firing
the actions. However, the enhanced algorithms also incur a small cost during the update
to each tuple. This is because, the asynchronous trigger processor evaluates some part of
the predicate on the updated tuples and puts the key in the update set S if necessary.
Although the cost of evaluating some part of the predicate can be very small with an
efficient predicate indexing mechanism, like that of TriggerMan, it could still be
significant for a high update-rate table or an expensive predicate. Other issues such as
locking on the update set S could also escalate cost. Thus, we need to arrive at a more
mathematical model to compute costs of the enhanced algorithms, taking into account the
costs for both collecting the updates and processing them.
When the enhanced algorithms are processing the set of keys of updated tuples
(i.e., keys in S), updates on the original relation still proceed concurrently. Some of these
updates may qualify to have their keys included in the set S. This represents a boundary
case where keys are getting inserted into S while S is being processed. Some analysis
needs to be done to see if any unexpected behavior is possible leading to some tuples
getting processed at the wrong timer expiration.
In our enhanced algorithm for triggers with a transition predicate (i.e., referring to
old and new values from the view), we assumed that a tuple of the view must change to
satisfy the transition predicate. We run the action on all the tuples that satisfy the
following condition:
UPtran (CurrView JOIN TempCandidates)
where CurrView is the old copy of the view from the previous timer expiration
(and is yet to be refreshed) and TempCandidates is the view of the tuples inserted or
updated in the interval since the last timer expiration. By performing the join (CurrView
JOIN TempCandidates), we isolate only those tuples which have been updated since the
last timer expiration and apply the transition predicate Ptran (CurrView.tranFields,
TempCandidates.tranFields) on the old and new copy of the relevant fields. However,
depending on the transition predicate (e.g. old.value = new.value), there may be cases of
tuples that satisfy the predicate without being updated. The brute-force approach will
account for these tuples, as it compares all tuples of the old view to corresponding tuples
of the new view. However, our enhanced approach does not identify tuples that have not
been updated but still satisfy the transition predicate because we apply the predicate only
on the updated tuples. In order to make our enhanced strategy work we need to change the
predicate to the following:
(Ptran' (CurrView OUTER-JOIN TempCandidates)
where Ptran' =
(TempCandidate.key is not Null AND
Ptran(CurrView.tranFields, TempCandidates.tranFields))
OR
(TempCandidate.key is Null AND
Ptran (CurrView.tranFields, CurrView.tranFields))
The predicate Ptran' is explained as follows. We pull in all tuples of the view
CurrView by performing an outer join with TempCandidates. The predicate Ptran' has two
disjuncts, the first one applies the predicate Ptran on old-new pairs of updated tuples in the
view and the second one applies the predicate on only the old values of the untouched
tuples.
The second strategy is a general one and works for both the cases. This can be the
default strategy. However, if we can determine from the transition predicate that an
attribute value has to change to satisfy the predicate then we can switch to the more
efficient approach of applying the predicate Ptran on only updated tuples of the view.
Examples of such predicates are old.value < new.value, new.value> 10*old.value etc. A
theorem prover can be introduced into the code generation block of figure 6.1 to examine
the transition predicates and make this determination. This theorem prover is a subject of
future work.
It has been mentioned at many places, how some expensive join and aggregate
triggers can be implemented more efficiently using timer triggers. More work needs to be
done on this issue. We have to identify factors within the trigger definition and run-time
characteristics that help us conclude the above. Having identified such triggers, we have
to work on a method to inform the owner of that trigger or the system administrator of
possible improvements in system performance using a timer trigger, along with some
numbers possibly. We also need to research the idea of directly switching to timer triggers
where possible, without changing the semantics of the trigger.
It is possible for many triggers to be defined on similar or even the same views.
Moreover, the trigger predicates themselves may be similar. In cases such as these, we
can group the processing of two or more similar triggers to prevent multiple predicate
evaluation and table accesses. This assumes more significance in aggregate triggers, due
to the higher processing costs required for the aggregation. The issue of grouping the
processing of two or more triggers needs to be addressed.
In example 3 of Chapter 3, a trigger was written to send an email weekly to the
finance department on the customers who have defaulted on their payment for ninety days
or more. So, a tuple qualifies for the action simply because its due date has aged for more
than 90 days. Triggers such as these, that do date arithmetic using the current date, can
only be implemented using brute-force. More analysis needs to be done, identifying
predicates like the one using current date, time and timestamps where the trigger will
work correctly using only the brute-force approach.
REFERENCES
[Alf98] Al-Fayoumi, N., Temporal trigger processing in the TriggerMan active
DBMS. Ph.D. dissertation, University of Florida, Gainesville, 1998.
[Car98] Carnes, C., A flexible data source architecture for an asynchronous trigger
processor. Unpublished paper, University of Florida, Gainesville, 1998.
[Cat99] Categoric http://www.categoric.com Palo Alto, CA, 1999.
[Cha89] Chakravarthy, S., HiPAC: A research project in active, time-constrained
database management. Final Technical Report. Technical Report XAIT-89-
02. Xerox Advanced Information Technology, Cambridge, MA, August
1989.
[For82] Forgy, C. L., Rete: A fast algorithm for the many pattern/many object
pattern match problem. Artificial Intelligence 19(1):17-37, September
1982.
[Geh91] Gehani, N., & Jagadish, H. V., Ode as an active database: Constraints and
triggers. In Proceedings of the seventeenth International Conference on
Very Large Data Bases 1991, pages 327-336. Morgan Kaufmann, San
Francisco, September 1991.
[Han90] Hanson, E. N., Chaabouni, M., Kim, C., & Wang, Y., A predicate
matching algorithm for database rule systems. In Proceedings SIGMOD
Conference 1990, pages 271-280. ACM Press, New York, June 1990.
[Han94] Hanson, E. N., & Chaabouni, M., The IBS tree: A data structure for
finding all intervals that overlap a point. Technical Report WSU-CS-90-
11. Wright State University, Dayton, OH, April 1990.
[Han96a] Hanson, E. N., The design and implementation of the Ariel active database
rule system. IEEE Transactions on Knowledge and Data Engineering
8(1):157-172, February 1996.
[Han96b] Hanson, E. N., & Johnson, T., Selection predicate indexing in active
databases using Interval Skip Lists. Information Systems, pp. 269-298,
1996.
[Han97] Hanson, E. N., Bodagala, S., & Chadaga, U., Optimized trigger condition
testing in Ariel using Gator Networks, University of Florida CISE
Department. Technical Report 97-021, Gainesville, November 1997.
http://www.cise.ufl.edu
[Han98] Hanson, E. N., Konyala, M., Vernon, A., Noronha, L., & Park, J., Scalable
trigger processing in TriggerMan. University of Florida CISE Department.
Technical Report 98-008, Gainesville, July 1998. http://www.cise.ufl.edu
[Han99] Hanson, E. N., Carnes, C., Huang, L., Konyala, M., Noronha, L.,
Parasarathy, S., Park, J., & Vernon, A., Scalable Trigger Processing. In
Proceedings International Conference on Data Engineering 1999, pages
266-275. IEEE Computer Society, Los Alamitos, CA, March 1999.
[Inf98] Informix Inc. http://www.informix.com Palo Alto, CA, 1998.
[Kon98] Konyala, M., Predicate indexing in TriggerMan. Master's thesis,
University of Florida, Gainesville, 1998.
[Kan98] Kandil, M., Predicate placement in active database discrimination
networks. Ph.D. dissertation, University of Florida, Gainesville, 1998.
[Mir87] Miranker, D. P., TREAT: A better match algorithm for AI production
systems. In Proceedings AAAI-87 Six\ih National Conference on Artificial
Intelligence, pages 42-47. Morgan Kaufmann, San Francisco, August
1987.
[Net99] NetMind http://www.netmind.com Campbell, CA, 1999.
[Oas98] Ozsu, M. T., & Valduriez, P., Principles of distributed database systems.
Prentice-Hall, Englewood Cliffs, NJ, 1998.
[Par98] Park, J., Implementation and performance tuning of discrimination
networks for asynchronous trigger processing view maintenance.
Unpublished paper, University of Florida, Gainesville, 1998.
[Pat98] Paton, N. W., ed. Active rules in database systems. Springer Verlag, New
York, 1998.
[Pug90] Pugh, W., Skip lists: A probabilistic alternative to balanced trees.
Communications of the ACM 33(6):668-676, June 1990.
[Rao98] Rao, J., & Ross, K. A., Reusing invariants: A new strategy for correlated
queries. In Proceedings SIGMOD Conference 1998, pages 37-48. ACM
Press, New York, June 1998.
[Sel79] Selinger, P. G., Astrahan, M. M., Chamberlin, D. D., Lorie, R. A., & Price,
T. G., Access path selection in a relational database management system.
In Proceedings SIGMOD Conference 1979, pages 23-34. ACM Press,
New York, May-June 1979.
[Ses96] Seshadri, P., Pirahesh, H., & Cliff Leung, T. Y., Complex query
decorrelation. In Proceedings International Conference on Data
Engineering 1996, pages 450-458. IEEE Computer Society, Los Alamitos,
CA, March 1996.
[Si198] Silberschatz, A., Korth, H. F., & Sudarshan, S., Database management
systems, 3rd ed., McGraw-Hill, New York, 1998.
[Sis95a] Sistla, A. P., & Wolfson, O., Temporal conditions and integrity constraints
in active database systems. In Proceedings SIGMOD Conference 1995,
24(2):269-280, San Jose, CA, June 1995.
[Sis95b] Sistla, A. P. & Wolfson, O., Temporal triggers in active databases. IEEE
Transactions on Knowledge and Data Engineering 7(3):471-486, June
1995.
[Sto91] Stonebraker, M., & Kernnitz, G., The POSTGRES next-generation
database management system. Communications of the ACM 34(10):78-92,
October 1991.
[Syb99] Sybase Replication Server http://www.sybase.com Emeryville, CA, 1999.
[Wan92] Wang, Y. W., & Hanson, E. N., A performance comparison of the Rete
and TREAT algorithms for testing database rule conditions. In
Proceedings IEEE Data Engineering Conference 1992, pages 88-97. IEEE
Computer Society, Los Alamitos, CA, February 1992.
[Wid91] Widom, J., Cochrane, R., & Lindsay, B. G., Implementing set-oriented
production rules as an extension to Starbust. In Proceedings of the
seventeenth International Conference on Very Large Data Bases 1991,
pages 275-285. Morgan Kaufmann, San Francisco, September 1991.
[Wid95] Widom, J. & Ceri, S., eds., Active database systems: Triggers and rules
for advanced data processing. Morgan Kaufmann, San Francisco,
September 1995.
BIOGRAPHICAL SKETCH
Lloyd Noronha was born on December 7, 1973 in Bombay, India. He received a
bachelor's degree in electronic engineering securing first class with honors from Bombay
University, Bombay, India, in May 1995.
He joined University of Florida in January 1998 to pursue a master's degree in
computer and information science and engineering.
He has worked as a research assistant with Dr. Hanson at the Database Systems
Research and Development Center.
His research interests are in active database systems.
|
Full Text |
PAGE 1
Enhanced Techniques for Timer Trigger Processing1Lloyd Noronha and Eric Hanson 301 CSE CISE Department University of Florida Gainesville FL 32611-6120 (352) 392-2691 hanson@cise.ufl.edu http://www.cise.ufl.edu/~hanson TR-99-015 9 Sep 1999 1 This report contains the MS thesis of the same title written by Lloyd Noronha under the supervision of Eric Hanson, September 1999.
PAGE 2
ii ACKNOWLEDGMENTS I would like to express my sincere gratitude to my advisor, Dr. Eric Hanson, for giving me an opportunity to work on this challenging project and also for providing continuous guidance, advice, and support throughout the course of my work. I thank Dr. Joachim Hammer and Dr. Stanley Su for serving on my supervisory committee and for their careful perusal of this thesis. I would like to thank Ms. Sharon Grant for maintaining a great research environment at the Database Systems Research and Development Center. I am thankful to Chris Carnes, Lan Huang, Mohan Konyala, Sasi Kumar, J. B. Park, and Albert Vernon for their invaluable help and the fruitful discussions that we had during the course of this work. On a more personal note, I would like to thank my family whose love, support, and constant encouragement was of great importance through this work.
PAGE 3
iii TABLE OF CONTENTS Page ACKNOWLEDGMENTS...................................................................................................ii TABLE OF CONTENTS...................................................................................................iii LIST OF FIGURES.............................................................................................................v CHAPTERS 1 INTRODUCTION..........................................................................................................1 2 TRIGGERMAN, A FAST ASYNCHRONOUS TRIGGER PROCESSOR..................5 2.1 Introduction.......................................................................................................5 2.2 TriggerMan Architecture...................................................................................6 2.3 The Predicate Indexing Strategy of TriggerMan...............................................8 2.4 Conclusion.......................................................................................................10 3 TIMER TRIGGER LANGUAGE................................................................................12 3.1 Introduction.....................................................................................................12 3.2 Timer Trigger Language..................................................................................14 3.3 Trigger Semantics............................................................................................15 3.4 Timer trigger examples...................................................................................16 3.5 Conclusion.......................................................................................................20 4 ENHANCED ALGORITHMS FOR SINGLE TABLE AND JOIN TRIGGERS........21 4.1 Introduction.....................................................................................................21 4.2 The Brute Force Implementation for Single Table and Join Triggers.............22 4.3 Optimization Goals and Strategy.....................................................................23 4.4 Enhanced Algorithms for Single Table Triggers.............................................24 4.4.1 Single Table on Update..........................................................................25 4.4.2 Single Table on Insert.............................................................................28 4.4.3 Single Table on Delete...........................................................................29 4.4.4 Single Table on Update with Transition Predicate.................................30 4.5 Enhanced algorithms for Join Triggers...........................................................32 4.5.1 Join Trigger on Update...........................................................................32 4.5.2 Join Trigger on Update with Transition Predicate.................................34 4.6 Combining the View Predicate and the Trigger Predicate..............................36
PAGE 4
iv 4.7 Examples.........................................................................................................37 4.8 Conclusions.....................................................................................................41 5 ENHANCED ALGORITHMS FOR AGGREGATE TRIGGERS..............................42 5.1 Introduction.....................................................................................................42 5.2 The Brute Force Implementation for Aggregate Triggers...............................42 5.3 Single Table Aggregate Triggers.....................................................................44 5.3.1 Single Table, Aggregate, Non-Transition, on Update............................45 5.3.2 Single Table, Aggregate, Transition, on Update....................................47 5.4 Multi Table Aggregate Triggers......................................................................48 5.4.1 Multi Table, Aggregate, Non-Transition, on Update.............................48 5.4.2 Multi Table, Aggregate, Transition, on Update.....................................52 5.5 Examples.........................................................................................................53 5.6 Conclusion.......................................................................................................55 6 OPTIMIZATION AND IMPLEMENTATION ISSUES.............................................56 6.1 Introduction.....................................................................................................56 6.2 Architecture.....................................................................................................56 6.3 Thoughts on Optimization...............................................................................59 6.4 Optimization Strategies...................................................................................60 6.5 Conclusion.......................................................................................................62 7 SUMMARY AND FUTURE WORK..........................................................................64 7.1 Summary.........................................................................................................64 7.2 Thoughts on Future Work...............................................................................65 REFERENCES..................................................................................................................70 BIOGRAPHICAL SKETCH.............................................................................................73
PAGE 5
v LIST OF FIGURES Figure Page 2.1: TriggerMan Architecture..............................................................................................7 2.2: Predicate Index Architecture........................................................................................8 6.1: Architecture of a Timer Trigger System....................................................................57 6.2: Performance of the brute force method vs. the enhanced method for varying update rates..........................................................................................60
PAGE 6
vi Abstract of Thesis Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Master of Science ENHANCED TECHNIQUES FOR TIMER TRIGGER PROCESSING By Lloyd X. Noronha August, 1999 Chairman: Eric Hanson Major Department: Computer and Information Science and Engineering Timer-Driven Triggers allow users to monitor interesting tuples or interesting changes to tuples of a view. Trigger condition testing takes place periodically, i.e., once every few hours, days or weeks. When the timer of a trigger expires, a new copy of the view is retrieved and the trigger predicates applied to that view. Trigger predicates could be transition predicates referring to the attribute values of tuples of a view from the previous timer expiration. Timer Triggers differ from conventional triggers, in that the trigger testing is not done for updated tuples only. Timer triggers could be fired on tuples of a view simply because they satisfy the predicate trigger predicate. Since the trigger conditions are tested every so often, high update rate tables can have a lot of timer triggers defined on them. The goal of this thesis is to design a timer trigger system that works hand-in-hand with a trigger processor and complements its features. The semantics of the timer trigger differ from that of regular triggers and are explained by a brute-force approach. Several
PAGE 7
vii enhanced algorithms can be written that do better than the brute-force approach, but do not change the semantics. A timer trigger system also provides the user an option of replacing expensive join and aggregate triggers with timer triggers.
PAGE 8
1 CHAPTER 1 INTRODUCTION Timer Triggers test and fire triggers periodically i.e., every few minutes, hours, days, weeks etc. Views are created on the database and triggers are fired periodically on interesting tuples or interesting changes to tuples of the view. This idea differs from the conventional philosophy of synchronous [Sil98] and asynchronous [Han97] triggers which fire triggers when a tuple is updated. Current database systems process triggers synchronously. Synchronous trigger processing lengthens the response time of update transactions, because the transaction cannot commit until all the trigger conditions defined on the updated tables are tested and actions fired. The response time of update transaction is thus directly affected by the increase in the number of triggers, which limits the number of triggers that can be defined on a table. In the case of assertions synchronous triggers are indispensable. But in most other cases, delaying the trigger processing to until after the triggering transaction commits (i.e. asynchronous trigger processing) should work just as fine. This is especially true in the case of alerters. In an asynchronous trigger system, the only work expended during a transaction is to capture the updates for later processing [Han98]. This time spent in capturing updates is a constant and totally unrelated to the number of triggers defined on the tables involved in the transaction. Thus, it is possible to increase the number of
PAGE 9
2 triggers defined on a table to a very large number without affecting the response time of update transactions. After the transaction completes, each of the updated tuples captured (also known as tokens) are individually tested against the trigger condition of each trigger and actions fired accordingly. This process of testing all the trigger conditions can be done sequentially. However, a huge speedup can be achieved by means of an efficient discrimination network for evaluating predicates. A Selection Predicate Index, is one such discrimination network [Han98], which causes a factor of improvement in the processing time per updated token. This strategy works really well for all types of triggers, especially with single table triggers because the entire condition evaluation on an update token takes place in main memory i.e., without accessing the database. In the case of join triggers and aggregate triggers, there is a relative increase in time spent on processing a token, due to a possible table-access involved during condition evaluation of the trigger. In the case of join triggers, when an update token of a table arrives, a part of the condition evaluation involves performing a join. In the case of aggregate triggers, we have to identify the group to which the update token belongs and test the aggregate condition for that group. This is covered in more detail in Chapters 4 and 5. In both these cases, the Selection Predicate Index is still used to evaluate a part of the predicate that does not involve a table access. If this part evaluates to true, we then proceed to run an SQL query to read one or more tables, to evaluate the rest of the predicate. Thus, although an efficient discrimination network still evaluates a part of the predicate quickly, it is the I/O costs that dominate the processing time of join and
PAGE 10
3 aggregate triggers. Timer triggers can be thought of as an alternative option to some expensive join and aggregate triggers because they access the database less often. Timer triggers enable the user to define triggers on a view of the database. The view could be a join or an aggregation of one or more tables or simply a snapshot of a single table. Trigger conditions are tested periodically i.e., once every few minutes, hours, days, weeks etc. On a timer expiration, a new copy of the view is created and the predicates of the trigger are applied to the view. These predicates could be transition predicates, which compare some attribute values of the current view with the corresponding values of the copy of the view created on the previous timer expiration. It is easy to conceive of many applications and examples where users would like to be intimated of interesting conditions once every so often. Consider an example of a manager of a computer company, who is interested in knowing every week, which of his hardware salesmen have a two fold increase in sales from the previous week. A timer trigger can be defined with the timer duration of a week. The view here, keeps the sales information of only the hardware salesmen. When the timer expires, a fresh copy of this view is created from the sales table and the sales information for each salesman is compared with his sales from the view of the previous week. The action is fired for all tuples of the view that satisfy the predicate. Asynchronous triggers, like synchronous triggers, test the trigger condition for every updated tuple, albeit only after the transaction commits. Thus, only an updated tuple can cause a trigger to fire. However, in timer triggers, all tuples of a view that satisfy the predicate can fire the trigger. These include tuples which may have not been touched since the last timer expiration.
PAGE 11
4 By definition, timer triggers are defined to achieve an objective different from asynchronous triggers and thus have different semantics. However, as indicated above, for expensive join and aggregate triggers, it is possible to replace an asynchronous trigger with a timer trigger to achieve similar (but not always the same) results. The goal of this thesis is to propose a good design of a timer trigger system to complement an existing trigger processor. Chapter 2 explains the architecture and predicate indexing mechanism of an asynchronous trigger processor, TriggerMan. The syntax and semantics of the timer trigger language are presented in Chapter 3 along with some examples. Implementation of timer triggers with efficient algorithms for single table and join triggers are discussed in Chapter 4. Chapter 5 has a similar discussion for aggregate triggers. Chapter 6 focuses on the architecture of a timer trigger system and some optimization details. Finally, Chapter 7 summarizes the discussion on timer triggers and provides some thoughts for future work.
PAGE 12
5 CHAPTER 2 TRIGGERMAN, A FAST ASYNCHRONOUS TRIGGER PROCESSOR 2.1 Introduction The design of a timer trigger system in this thesis focuses on working with a conventional (synchronous or asynchronous) trigger processor to complement and enhance its functionality. In later chapters, we present some enhanced algorithms for the implementation of timer triggers. These algorithms need to define some regular singletable triggers to capture interesting updates and thus it is very important for the underlying trigger processor to have a fast predicate evaluation mechanism for singletable triggers. In this chapter we explain the architecture of such a trigger processor called TriggerMan [Han98]. TriggerMan is a scalable and extensible trigger processor. The triggers in TriggerMan can have user defined data types and the system scales well even for millions of triggers for certain applications. The scalability of the system can be attributed mainly to its asynchronous trigger processing mechanism and an efficient predicate indexing strategy. As the number of triggers in a system increases, more time is spent on testing each updated tuple against all the trigger conditions. Making this testing process asynchronous alleviates the problem of delays in update transactions, which is an inherent problem of synchronous trigger processing. TriggerMan additionally implements a
PAGE 13
6 predicate indexing strategy [Han99] that is responsible for a huge speedup in evaluating the predicates of the triggers and thus reduces the processing time per updated tuple. Extensibility will soon be an indispensable requirement in trigger systems. With the advent of Object Relational Database Management Systems (ORDBMS), extending data types and adding new routines in a database system have gained popularity amongst users. Users are allowed to create new User-Defined Data Types (UDT) and UserDefined Routines (UDR) in the database system and use them with regular SQL. TriggerMan is designed to recognize UDT's and UDR's of the underlying database system and allow their use in triggers. The other possible choices of trigger processors for our timer trigger system are in the active database systems of Ariel [Han96a], POSTGRES [Sto91], HiPAC [Cha89], Starbust [Wid91], and Ode [Geh91]. But they do not provide the scalability feature of TriggerMan and very limited, if any, extensibility support. Among them, Ariel and POSTGRES do implement a selection predicate indexing architecture, which however, is not extensible. 2.2 TriggerMan Architecture TriggerMan is designed to work as a stand-alone module independent of the underlying system. However, for simplicity of explanation, the architecture that is discussed here is as a module that resides on an Object-Relational DBMS. The architecture of TriggerMan is shown in Figure 2.1. The main components of the architecture are explained as follows. The TriggerMan Module resides and runs in the address space of the DBMS server. The task
PAGE 14
7 queue is a to-do list of tasks for the trigger processor. Once every so often, multiple TriggerMan drivers run in parallel to read and process jobs from the queue. The Data Source applications are responsible for capturing updates on local and remote data sources and putting them in an update queue. These updates are later gathered up from the update queue by the TriggerMan drivers and added as tasks to be processed in the task queue. Figure 2.1: TriggerMan Architecture Lastly, the TriggerMan console is an application program that interfaces to the user and allows him to initialize the system, create triggers, drop triggers, shut down the system etc. Host ORDBMSUpdate Queue Task Queue TriggerMan Module Data Source App TriggerMan Driver TriggerMan Console
PAGE 15
8 2.3 The Predicate Indexing Strategy of TriggerMan The major work done by a TriggerMan driver program in Figure 2.1 is in evaluating the predicates for an updated tuple against each of the trigger definitions. TriggerMan employs a novel selection predicate indexing strategy [Han98] that makes the predicate evaluation faster by a huge factor. Figure 2.2: Predicate Index Architecture Selection Predicate Index Predicate Index Root Data Source Node Expression Signature List Constant Sets Sets of the rest of the predicate and trigger id's for a particular constant
PAGE 16
9 If a system has a large number of triggers then there is a fair chance that some clusters of triggers have a part of their predicate looking similar. This is the basic premise of TriggerMan's predicate indexing strategy. For example a majority of triggers defined by a stock investor will be alerters where the predicates check for stock prices going above or below a certain interesting value. The predicate indexing strategy of TriggerMan, simplified for explanatory purposes, is shown in Figure 2.2. All the data sources defined in TriggerMan have a node at the first level below the root. When a trigger is defined, its predicate is converted to Conjunctive Normal Form (CNF), and the most selective conjunct qualifies to be the expression signature for that trigger. If the same signature already exists, then the constants of this conjunct are plugged into the constant list under that signature and nodes containing the rest of the predicate and the id of the trigger are placed under this constant. If the same signature does not already exist then a new signature node is created and the same steps followed. For example consider the following two triggers having the same expression signature: create Trigger buy_informix from stock where (name="Informix") and (price<5) do email("david", "Stock price of Informix within 5"); create Trigger sell_oracle from stock where (name="Oracle Corp.") and (price>40)
PAGE 17
10 do email("david", "Sell Oracle Corp. now. The price is :price"); The trigger processor isolates the predicate "stock.name=constant" as the expression signature for both the triggers because it finds it the most selective. It adds this signature into the list of expression signatures under the data source "stock" (if it does not already exist). The constants "Informix" and "Oracle" are inserted into the list of constants (if they do not already exist). Finally, a node for the id's of the triggers and the remaining part of the predicate are inserted under these constants. Now, when the information about an updated tuple of the stock relation is pulled out from the task queue by the TriggerMan driver, it is made to pass through this discrimination network. The attribute values within this tuple are evaluated against the constants within each expression signature. For every constant for where the expression evaluates to true, we proceed with testing the rest of the predicate and then firing the trigger action if necessary. The predicate for single table triggers can be evaluated completely within this main memory discrimination network. However for join and aggregate triggers, evaluation of the rest of predicate involves reading relations to perform joins and aggregations. 2.4 Conclusion The optimization algorithms for timer triggers need to define single table triggers for each of the data sources with the timer trigger definition. The predicate indexing strategy of TriggerMan works best for single table triggers and hence is ideal for the timer
PAGE 18
11 trigger algorithms. Along with a fast predicate evaluation technique, TriggerMan also scales well for a large number of triggers and works with extensible data types.
PAGE 19
12 CHAPTER 3 TIMER TRIGGER LANGUAGE 3.1 Introduction The preceding chapters emphasized the importance of an asynchronous trigger processor like TriggerMan, which provides a more powerful language than current trigger systems while at the same time providing the features of scalability and extensibility. There are trigger situations however, where the events of interest for a user do not lie in individual updates to the database but in interesting happenings to the database over a period of time. Take the case of a stock market investor who wishes to be alerted at the end of the week when he spends over a $10,000 in purchasing stock of a single company. Here, the investor requires the trigger system to keep track of all his purchases for the week. In essence, the user wants the trigger system to keep track of a certain view of the database and notify him periodically of interesting changes to that view. This is the fundamental motivation behind timer triggers. Apart from fulfilling it's basic role of periodic notification of interesting changes to a view, timer triggers also provide users an option of off-loading, where possible, expensive triggers from asynchronous processing to timer based processing. An asynchronous trigger processor tests for trigger conditions on each updated tuple. While this strategy works out well for single table triggers (due to no table access), join and
PAGE 20
13 aggregate triggers could suffer sometimes. Both join and aggregate triggers potentially involve a table access during predicate evaluation on an update token. A part of the predicate not dependent on the table-access, can still be evaluated using the fast predicate indexing strategy as that of TriggerMan. However, evaluating the remaining portion of the predicate in join triggers requires tuples from one or more tables to be read. In aggregate triggers all tuples belonging to the same group as the updated tuple have to be read and an aggregation performed on them. This leads to a higher processing cost per update token, whose effect could be augmented by the increase in the number of join and aggregate triggers. In systems where triggers are defined on high update-rate tables, this problem could get compounded causing the system to get overloaded. In situations such as these, the timer trigger system in addition to fulfilling it's semantic role, could lend a helping hand to the asynchronous trigger system by offloading some expensive join and aggregate triggers to timer triggers. Instead of performing table accesses often, the advantage lies in doing them once every so often. Timer triggers thus, albeit semantically different, can be used to meet objectives similar to asynchronous triggers for a better system performance. Due to differing semantics however, the decision to switch to a timer trigger must not be one done automatically by the trigger processing system but should be an explicit decision of the user. The sections to follow explain the syntax and semantics of a timer trigger system.
PAGE 21
14 3.2 Timer Trigger Language A Timer Trigger System is designed to work as a component of an asynchronous trigger processor like TriggerMan, making the schema definitions of both the DBMS and trigger processor available to it. Timer triggers can be created on data sources or views. The asynchronous trigger processor is already familiar with the schema and the primary key of all data sources specified in the command. The syntax for creating timer triggers is as follows: create timer trigger from {on event} {when } do check every {initialize {immediately | on first timer expiration}} The from clause specifies the name of a data source or view on which the trigger is defined. This field can also contain a SQL select statement. The on event clause could either be insert, delete or update indicating that the trigger should fired only when inserts, deletes or updates take place in the view. The default is update. The when clause has the trigger condition on the view. When the timer expires, this boolean expression is applied on the view to identify tuples satisfying the trigger condition. Although any kind of predicates are allowed to be in the when clause, its primary purpose is to specify transition predicates. The do clause specifies the trigger action. The check every clause specifies the timer period or the day of the week and time when the trigger should fire. The initialize clause is optional and specifies whether the first instance of the view is created immediately or the first time the timer expires.
PAGE 22
15 3.3 Trigger Semantics The semantics of the timer trigger command is now explained for the three different events viz. insert, delete and update. Consider a timer trigger defined on a data source or a view with a when clause W . The predicates could be transition predicates referring to old values of attributes obtained from the copy of the view created during the previous timer expiration. The view itself, could be a single relation or a result of applying a select statement with predicates on single tables, joins or aggregate queries. Periodic processing: In the explanation to follow V is the current copy of the view created when the timer expires by executing the query in the from clause of the create timer trigger statement and old_V is the copy of the view from the previous timer expiration. The following steps are executed when the timer expires: Event ON UPDATE: 1. Retrieve the contents of the view into V. 2. If the when clause W has a transition condition then the action is run on every tuple resulting from the operation sw (V JOIN old_V). If there is no transition condition, then the action is run on every tuple resulting from the operation sw(V). 3. Destroy old_V and rename V to old_V. Event ON INSERT: 1. Retrieve the contents of the view into V. 2. The action is run on every tuple resulting from the operation sw (V old_V).
PAGE 23
16 3. Destroy old_V and rename V to old_V. Event ON DELETE: 1. Retrieve the contents of the view into V. 2. The action is run on every tuple resulting from the operation sw (old_V V). 3. Destroy old_V and rename V to old_V. 3.4 Timer trigger examples Example 1 : Send an alert to employees who have taken more vacation than their allowed quota but haven't filled out a special requisition form. The schema definition of the relation is: employee(emp_id, emp_name, email_addr, vacation_taken, vacation_left, vacation _req_filled .) with emp_id as the primary key. The trigger is defined as follows: create timer trigger excess_vacation from (select email_addr, vacation_req_filled from employee where vacation_taken > vacation_left ) when vacation_req_filled = False check every week beginning Monday at 8:00 AM do email(:email_addr, "Please fill out the Excess Vacations form"); Here the view is simply the details of the employees who have exceeded their allowed vacation quota. When the timer expires, it identifies the employees who haven't filled out the necessary form from among those in the view. The notation :email_addr in the action substitutes the actual email addresses by macro expansion while running the action for qualifying tuples. Since no event is specified, on update is assumed. Moving
PAGE 24
17 the predicate of the when clause to the definition of the view does not change the meaning of the trigger in this example. The above is a good example illustrating the use of timer triggers and it's differing semantics from conventional triggers. The intent is to email a forgetful employee every Monday morning until he does the needful. Trying to achieve the same with a synchronous or an asynchronous trigger would send an email to the employee every time his details are touched or it is possible that he does not get emailed at all if his details are not touched. Example 2 : Alert all junior managers when they have two or fewer days of vacation left in the year. The schema of the relation employee is the same as the one above. create timer trigger less_vacation_left from (select email_addr from employee where (designation='Jr. Managers') and (vacation_left vacation_taken) <= 2 ) on insert check every week do email(:email_addr, "You have only two or less days of vacation left in the year"); The view has all the junior managers having two or less days of vacation. The on insert clause ensures that the employee is emailed only the first time his vacation dips below two days. Example 3 : Notify the finance department of the customers who haven't honored their invoices for more than ninety days past the due date.
PAGE 25
18 The schema definition of the relations are: invoice (num, cno, due_date, amount, ..) with primary key num and customer (cno, name, addr, ) with cno as the primary key. The trigger is defined as follows: create timer trigger unhonored_invoice from (select invoice.num as inv_num, customer.name as cust_name from customer, invoices where (invoice.cno = customer.cno) and (CURRENT_DATE invoice.due_date) > 90 and (invoice.status='UNPAID') check every week do email("finance@experts.com", "Customer :cust_name has not honored the invoice :inv_num for more than 90 days"); This is a good example of a join trigger where the relation of interest is only invoice. The customer relation is present only to extract the information on defaulting customers. Another important characteristic of this trigger is that tuples who qualify are mostly tuples lying untouched for more than 90 days. Example 4 : The managers of a famous record label wish to let their new pop and rock artists know when their album sales have crossed a million. The schema definition of the relations are: album (album_id, artist_name, sales, artist_type, ) with album_id as the primary key and artist (artist_name, category, ) with artist_name as the primary key. The trigger is defined as follows: create timer trigger new_artist_album_sales from (select artist.email as artist_email, album.sales as sales from album, artist where (album.artist_name = artist.artist_name) and ((artist.category="rock") or (artist.category="pop")) and (album.artist_type="new") when old.sales < 1000000 and sales>=1000000
PAGE 26
19 check every day do email(:artist_email, "Congratulations! Your album sales have crossed a million"); The transition predicate in the when clause ensures that only one email is sent to the artist on the day when his album sales have crossed a million. Example 5 : The managers of a departmental store chain wish to keep the salaries of all the sales personnel to within 50% of the budgeted salary. The schema definition of the relations are: emp(emp_id, name, dno, sal, desig, .) with emp_id as the primary key and dept(dept_id, name, budget, manager_email, .) with dept_id as the primary key. The trigger is defined as follows: create timer trigger excess_salary from (select dept.manager_email as mgr_email, sum (emp.sal) as sum_sal, dept.budget as budget from emp, dept where (emp.dno = dept.dno) and (emp.desig="sales personnel") and (dept.name="toy") group by dept.name, dept.manager_email, dept.budget when sum_sal > .50 * budget check every week do email(:mgr_email, "You are exceeding the allowed budget for salespersons."); This is an example of an aggregate trigger involving two tables. A trigger can fire due to changing information on either of the tables and both tables also have selection predicates. The view here has aggregate information about the total salaries of sales personnel in the toy dept.
PAGE 27
20 3.5 Conclusion The syntax and semantics of a timer trigger language are explained. The examples illustrate how users can use timer triggers to be informed periodically of interesting tuples in a view. In example 3, we saw an interesting property of timer triggers where some tuples satisfy the trigger condition although they did not do so in the previous timer expiration and have not been touched since. They qualified for the action simply because they aged enough to satisfy a date-time predicate. The examples brought about the similarity and the differences in timer triggers against the more conventional synchronous and asynchronous triggers.
PAGE 28
21 CHAPTER 4 ENHANCED ALGORITHMS FOR SINGLE TABLE AND JOIN TRIGGERS 4.1 Introduction In the previous chapter we presented the syntax of the "create timer trigger" command and explained the semantic definition of the different types of single table triggers, join triggers and aggregate triggers with some examples. A brute force method for implementing any timer trigger involves running a query against the relations in the trigger definition to generate the new copy of the view and then evaluating the predicates of the trigger for all tuples in that view. While the brute force method always works and is the simplest implementation, it is often an inefficient way to go. Consider for example, the following read-mostly table employee_details which stores the name and information of all employees within a company, employee_details(emp_id, emp_name, emp_addr,..) Changes to this table normally take place only when a new employee is recruited or an existing employee leaves. If one or more timer triggers are defined on this low update rate table, then it is totally unnecessary to re-run queries on the entire relation employee_details to create the new view, every time the timer goes off. In the case when there are no updates at all, between two timer expirations, the new view is simply the old view. If only a few updates take place in this interval, then the new view can be obtained
PAGE 29
22 from the old view by simply keeping track of the updates since the last timer expiration and applying those updates to the old view during processing. In this chapter we try to look at some enhanced strategies to generate the old view from the new view without the brute force method. Several enhanced methods seem appealing, but with improper optimization methods several cases of anomaly arise when insert, updates and deletes take place on the same tuple many times in one interval. Our enhanced technique endeavors to rid ourselves of these anomalies, and generate an efficient technique to handle almost all types of timer triggers. In the next section we illustrate the brute force approach to implement timer triggers. With the disadvantages of this approach in mind, we highlight our points of focus while developing the various enhanced algorithms, followed by the actual algorithms for timer triggers, on single relations and joins. We end the chapter with some examples and thoughts. 4.2 The Brute Force Implementation for Single Table and Join Triggers Consider a general trigger T defined on one or more relations R. The predicate of the trigger can be expressed in CNF as P Ptran, where P represents all the non-transition predicates and Ptran represents the transition predicates. Although the view and the trigger have their separate predicates Pv and Pw respectively, we combine them together discussion and represent them as P Ptran. More details on combining predicates and applying them are covered later in this chapter in section 4.6. The current view created is CurrView= sP(R) and PrevView is the copy of the view from the previous timer expiration. Note that CurrView and PrevView are views we use for implementation, they
PAGE 30
23 are different from the view defined in the trigger i.e. V= sPv(R). The operations to be performed to implement this trigger are: 1. Create a new copy of the view, CurrView= sP(R) 2. Run the action on all the tuples resulting from the following operation, sPtran (CurrView JOIN PrevView). The above two steps translate into two SQL queries. For non-transition triggers, the second step is not needed and the action is run on all the tuples of CurrView, resulting in only one SQL query. 4.3 Optimization Goals and Strategy The primary goal of developing enhanced algorithms for timer triggers is to generate the new view from the old view with a method more efficient than the brute force method, in terms of CPU and I/O processing time. This, as might be expected, implies that we keep track of some information about the tuples updated on a table since the last timer expiration. When the timer goes off we can modify the old view to obtain the new view. As mentioned previously, improper optimization methods can cause anomalies, whereby trigger behavior is different than when using the brute-force method. We overcome these anomalies by keeping track of only the primary key of the tuples that were updated. When the timer goes off, we perform a join of these keys with the original relation, to get the correct picture of which among the touched tuples were effectively inserted, deleted or updated. We then go on to apply these operations to the old view to obtain the new view.
PAGE 31
24 The timer trigger system is designed to complement an asynchronous trigger system. As mentioned before, for a table on which a timer trigger is defined, we keep track of all updates to the table within each processing interval. We can go a step further and keep track of only those tuples which are of interest to the trigger i.e. those tuples, which satisfy the predicates of the trigger. This can be done by using the existing asynchronous trigger system of TriggerMan. Predicates for updated tuples can be evaluated asynchronously using the fast and efficient Selection Predicate Index in main memory [Han98]. Another focus while developing our algorithms is to query the relations on which triggers are defined, only once. It is reasonable to assume these relations are large or will keep growing. It is therefore important for our algorithms to not touch the relations more times than necessary. So, in almost all cases, we query the original relation only once for its attribute values. 4.4 Enhanced Algorithms for Single Table Triggers In this section we present our enhanced algorithms to implement the various timer triggers on single relations. These include triggers with or without transition predicates and with any of the various events viz. on insert, on update and on delete. The general schema of the view CurrView, on a single table R is (key, fields, tranfields), where key is the primary key of the table R and fields are one or more attributes of R whose values are required in the action. The third field tranfields is present only in a transition trigger and includes those attributes of R, whose values are required to evaluate the transition predicate.
PAGE 32
25 4.4.1 Single Table on Update Consider a timer trigger defined on a single relation R (key, field1, field2, field3 ,fieldn), with one or more predicates P, none of which have a transition condition. The brute force method to implement this trigger is to evaluate the query currView= P key,fields ( sP(R)), every time the timer goes off. The specified action is then run for all the resulting tuples of the view currView. In our enhanced algorithms we maintain a relation S(key), which is a set of unique primary keys of tuples of R, that were touched (i.e. inserted, deleted or updated), since the last timer expiration. Additionally, the keys of only those tuples are inserted into the relation S, which either satisfy the predicate clause P or used to satisfy it before the update. Regular triggers are defined on the relation R to collect this set S. This is explained in greater detail with examples in later sections. The following operations can now be performed to generate a new copy of the view: 1. TempCandidates = P key,fields ( sP(S JOIN R)) 2. Refresh CurrView : 2a. CurrView = CurrView (S JOIN CurrView) 2b. CurrView = CurrView TempCandidates 3. Candidates for the action are all the tuples of the view CurrView. Explanation: CurrView is the view from the last timer expiration. When the timer goes off, we need to refresh the view CurrView before running the action on all its tuples. The first step is to identify tuples to be inserted into the view or have their existing values updated. These tuples could only be from among the touched tuples of relation R
PAGE 33
26 in the interval since the last timer expiration, i.e., from the set of keys in relation S. These tuples are collected into a temporary relation TempCandidates. A tuple updated in relation R may cause an insert into the view currView, if that tuple did not previously satisfy the predicate P, but does so after the update. In performing a join on the original relation R, we get the current and thus the correct values of the attributes for evaluating the predicate. TempCandidates should not include any tuples which need to be deleted from the view. A tuple will be deleted from the view, either because it is physically deleted from the relation R or because it no longer satisfies the predicate due to the update. Performing the join between the relation S to the original relation R takes care of eliminating the former and evaluating the predicate P takes care of eliminating the latter. The second step refreshes the view currView. The first of the two queries deletes all the tuples in CurrView that were touched, i.e. all keys in S. The second query inserts any of the touched tuples that still belong to the view. Let's separately consider how the inserts, updates and deletes to the view are correctly handled. A tuple to be inserted into the view will be present in the temporary relation TempCandidates, and thus gets inserted into the view CurrView by the second query. A tuple that is updated needs its attribute values refreshed in the view CurrView. This tuple is present in TempCandidates. It is updated by first deleting it from the view CurrView with the first query and then reinserting it into the view from TempCandidates with the second query. A tuple to be deleted from the view will not be present in the temporary relation, TempCandidates. The first query will delete this tuple and the second query will not reinsert it into the view. Finally the third step simply calls a User Defined Routine to run the action on each of the tuples in the view CurrView.
PAGE 34
27 The above steps can be translated into SQL queries as follows: 1. Select S.key as key, R.fields as fields from S, R where S.key = R.key and sPinto temp TempCandidates; 2. 2a. Delete from CurrView where exists (select S.key from S where S.key=CurrView.key); 2b. Insert into CurrView select key, fields from NewCandidates; 3. Select trigAction (key, fields) from CurrView; where trigAction is a UDR for the specified trigger action Salient features of the algorithm are the following. The above algorithm is well enhanced and should work especially well on tables with a low update rate. The set of keys of touched tuples in relation S is kept to as small a size as possible. This is done by evaluating the predicate P, on each tuple, before inserting its key into S. If the predicate P is highly selective, only a small fraction of the updates to R, have their keys stored in S. There is however, a small cost overhead of evaluating the predicates P on touched tuples before inserting it into the relation S. This cost is significantly reduced by making use of the existing fast and efficient discrimination network of TriggerMan. The number of queries in the above algorithm is restricted to three, of which only one accesses the original relation R. The other two queries operate on much smaller
PAGE 35
28 relations and should not cause a performance overhead. Processing is made more efficient when the relation R has an index on the primary key, because then, in the worst case, only as many pages of relation R are accessed as the number of unique keys in S. Correlated sub-queries, as the one above in step 2, are well optimized by several existing DBMS products. This query is flattened to perform a join ignoring any duplicates. 4.4.2 Single Table on Insert In the case of single table triggers with the on insert event, we are interested in rows inserted into the view CurrView, since the last timer expiration. When the timer goes off, we create the current copy of the view, identify all new entries from the previous copy and run the action on them. As before the relation S(key) is the set of keys of touched tuples of relation R. The operations to process the trigger are as follows: 1. TempCandidates = P key,fields ( sP(S JOIN R)) 2. Candidates for the action are all the tuples of (TempCandidates CurrView) 3. Refresh CurrView : 2a. CurrView = CurrView (S JOIN CurrView) 2b. CurrView = CurrView TempCandidates The first step creates a temporary relation TempCandidates of all the inserts and updates as before. The next step runs the trigger action on all the tuples in TempCandidates that are not in the previous copy of the view CurrView. Finally CurrView is refreshed as before. The SQL statement to run the action on new inserts in step 2 can be written as follows:
PAGE 36
29 select trigAction (key, fields) from TempCandidates where not exists (select CurrView.key from CurrView where key = TempCandidates.key) where trigAction is a UDR for the specified trigger action. It accepts the key and fields of the view and executes the action. 4.4.3 Single Table on Delete In the case of single table triggers with the on delete event, we are interested in rows deleted from the view CurrView, since the last timer expiration. When the timer goes off, we create the current copy of the view, identify rows deleted from the previous copy and run the action on them. The operations to process the trigger are as follows: 1. TempCandidates = P key,fields ( sP(S JOIN R)) 2. Candidates for the action are all the tuples of ((S JOIN CurrView) TempCandidates) 3. Refresh CurrView : 2a. CurrView = CurrView (S JOIN CurrView) 2b. CurrView = CurrView TempCandidates The first and the third steps are no different from before. The second step identifies tuples that were present in the previous copy of the view, but were deleted since. Of course, these could only be from among the set S. (S JOIN CurrView) returns all the tuples in the previous copy of the view, that were touched since. From this set, we
PAGE 37
30 ignore the inserts and updates (i.e. tuples in TempCandidates) to get only the deletes. The SQL query looks as follows: select trigAction (key, fields) from S, CurrView where S.key = CurrView.key and not exists (select TempCandidates.key from TempCandidates where S.key = TempCandidates.key). where trigAction is a UDR for the specified trigger action. It accepts the key and fields of the view and executes the action. Again, correlated sub-queries are flattened and well optimized by major commercial DBMS products. 4.4.4 Single Table on Update with Transition Predicate Timer triggers with transition predicates compare old and new values of attributes of corresponding tuples using the old and new keywords as qualifiers. As discussed later in section 4.6, a predicate for a transition trigger can be written in Conjunctive Normal Form (CNF) as P Ptran. P has all the non-transition predicates as before and is used to define the view currView= Pkey,fields,tranFields ( sP(R)). Ptran is the transition predicate operating on the old and new values of certain fields written as tranFields in the view. For simplicity, we assume that the predicate Ptran is such that only a change in the value of tranFields may cause it to evaluate to true. A more general discussion on evaluating transition predicates which evaluate to true even when the fields tranFields are untouched (e.g. old.value=new.value) is present in Chapter 7 under future work.
PAGE 38
31 In order to process a transition trigger, we create the current copy of the view, and then apply the transition predicates on the old and new copies of the view. The operations to process the trigger are as follows: 1. TempCandidates = P key, fields, tranFields ( sP(S JOIN R)) 2. Candidates for the action are all the tuples in sPtran (CurrView JOIN TempCandidates). 3. Refresh CurrView : 2a. CurrView = CurrView (S JOIN CurrView) 2b. CurrView = CurrView TempCandidates The first and third steps are same as before. In the second step we identify the tuples which satisfy the transition condition. Only those tuples that were updated, since the last timer expiration, are candidates to satisfy the transition condition. The join identifies only the updated tuples and applies the transition predicate Ptran on them. An SQL query for this step would look as follows: select trigAction (CurrView.key, CurrView.fields, CurrView.tranFields, TempCandidates.tranFields) from TempCandidates, CurrView where TempCandidates.key = CurrView.key and Ptran (TempCandidates.tranFields, CurrView.tranFields) where Ptran is the transition predicate applied on field tranFields of the view.
PAGE 39
32 4.5 Enhanced algorithms for Join Triggers A join trigger is defined on multiple tables. The schema for a view on N relations R1 to RN is CurrView(key1, , keyN, fields, tranfields) where keyI is the primary key of relation RI, "fields" are one or more attributes from any of the N relations whose values are required in the action and the last field "tranfields" have one or more attributes required to evaluate the transition predicate, if any. The selection predicate, normalized to CNF is P1 P2 PN PJOIN, where PI is the selection predicate on RI and PJOINrepresents the join predicates. The view definition is: CurrView= P key1 , , keyN, fields, tranFields( sP1P2 PN ( sPJOIN(R1 X R2 X .X RI X RN))) The brute force method accesses all the tuples of all the relations to create the new copy of the view. 4.5.1 Join Trigger on Update We maintain individual sets for each relation to register the updated tuples since the last timer expiration. Each of these relations SI(keyI) has the set of all the keys of relation RI that were touched since the last processing period. When the timer goes off the following operations are performed: 1. Create TempCandidates For I = 1 to N do If SI is empty then skip to the next iteration;
PAGE 40
33 TempCandidateI = P key 1, , keyN, fields, tranFields( sP1P2 PN ( sPJOIN(R1 X (SI JOIN RI) X RN))) TempCandidate = TempCandidate TempCandidateI.; done 2. Refresh CurrView : 2a. For I = 1 to N do If SI is empty then skip to the next iteration; CurrView = CurrView (SI JOIN CurrView) done 2b. CurrView = CurrView TempCandidates 3. Candidates for the action are all the tuples of the view CurrView. The relation TempCandidateI represents the set of inserted and updated tuples of the view due to the updates on RI only. The relation TempCandidates represents all the inserts and updates on the view since the last timer expiration and is created by performing the union of all the relations TempCandidateI. Step 1a and 1b can be implemented by a single query. Step 1 can thus be implemented using N+1 queries, N queries for each of the TempCandidateI's and one query to remove any duplicates from TempCandidates. In step 2, the view is refreshed by first deleting all the tuples touched due to updates on each of the relations and then inserting the tuples from TempCandidates. Step
PAGE 41
34 2 can also be written using N+1 queries. Finally, the last step runs the action on all tuples in the view. Salient features of the algorithm are the following. When the timer goes off, we have to execute many queries to refresh the view and run the trigger action. In the worst case when updates to all the relations are registered 2N+2 queries will be run, of which each of the N queries of step 1 access all the relations. Clearly, our implementation is not efficient in this case. The brute force method runs only one query to access all the relations. However, if only one relation registered updates then our algorithm need run only one query for the first step, two for the second and one for the third i.e. only four in all, three of which are on smaller relations. Our implementation thus, works really well in cases when only a small number of the tables are updated. 4.5.2 Join Trigger on Update with Transition Predicate In this case a transition condition is a part of the predicate. The predicate can be written in CNF as P1 P2 PN PJOIN Ptran. The view definition does not change and is as follows: CurrView= P key1, , keyN, fields, tranFields( sP1P2 PN ( sPJOIN (R1 X R2 X .X RI X RN))) When the timer goes off, the trigger is implemented as follows: 1. Create TempCandidates For I = 1 to N
PAGE 42
35 do If SI is empty then skip to the next iteration; TempCandidateI = P key1, , keyN, fields, tranFields( sP1P2 PN ( sPJOIN(R1 X (SI JOIN RI) X RN))) TempCandidate = TempCandidate TempCandidateI.; done 2. Candidates for the action are all the tuples in sPtran (CurrView JOIN TempCandidates). 3. Refresh CurrView : 3a. For I = 1 to N do If SI is empty then skip to the next iteration; CurrView = CurrView (SI JOIN CurrView) done 3b. CurrView = CurrView TempCandidates All the steps look similar to the previous discussion on the non-transition case. In step 2 the action is run on only tuples of the view that satisfy the transition predicate by comparing the corresponding values of these tuples attributes in the old and new view. Again as stated for the transition case of single table triggers, we assume for simplicity that the predicate Ptran is such that only a change in the value of tranFields may cause it to evaluate to true. A more general discussion on evaluating transition predicates
PAGE 43
36 which evaluate to true even when the fields tranFields are untouched (e.g. old.value=new.value) is present in Chapter 7 under future work. 4.6 Combining the View Predicate and the Trigger Predicate Consider a general timer trigger defined on one or more relations R1, Rn with a view predicate Pv and a trigger condition Pw, both in Conjunctive Normal Form (CNF). The view definition is V= Pfields( sPv(R1 X R2 X X Rn)). On timer expiration, the view is retrieved and the trigger predicate Pw is applied to it. The view CurrView that we create and keep around for enhanced trigger processing need not be the same as the view V but possibly a more efficient representation. In the case when the trigger has no transition predicates (i.e. no reference to old attribute values), we generate CurrView by applying the predicates P=Pv Pw on the relations i.e. CurrView= P fields ( sP (R1 X R2 X X Rn)). Since Pv and Pw are both in CNF, P is also on CNF. In the case when the trigger has transition predicates, the trigger predicate Pw has a non-transition part PnoTran and a transition part Ptran i.e. Pw=PnoTran Ptran. We generate CurrView by applying the predicates P=Pv PnoTran on the relations i.e., CurrView= P fields( sPvPnoTran (R1 X R2 X X Rn)). The procedure for breaking up Pw into PnoTran and Ptranis explained as follows. If the old value of an attribute is referenced in the predicate, then all the conjuncts containing that attribute make up Ptran. The conjuncts of Pw that are not in Ptran make up PnoTran.
PAGE 44
37 From the above two cases, it is clear that we can represent the predicate for a trigger as P when there are no transition predicates and P Ptran when there are transition predicates. 4.7 Examples Example of a Single Table Trigger : Consider this example of a non-transition single table trigger from Chapter 3. create timer trigger excess_vacation from (select email_addr, vacation_req_filled from employee where vacation_taken > vacation_left ) when vacation_req_filled = False check every week beginning Monday at 8:00 AM do email(:email_addr, "Please fill up the Excess Vacations form"); The predicate P here is P=(vacation_taken>vacation_left) and (vacation_req_filled=False) The various schema definitions are as follows: CurrView(emp_id, email_addr); TempCandidates(emp_id, email_addr); S(emp_id); The asynchronous trigger to collect the updates in S is create trigger S_employee_excess_vacation from employee when ((vacation_taken>vacation_left) and (vacation_req_filled=False)) or ((old.vacation_taken>vacation_left) and (old.vacation_req_filled=False))
PAGE 45
38 do execSQL'insert into S values :emp_id'; From the discussion in section 4.4.1 for the case of single table triggers with an ON UPDATE event, the steps to process the trigger on timer expiration are written as follows: 1. TempCandidates = P emp_id, email_addr( s(vacation_taken>vacation_left) and (vacation_req_filled=False) (S JOIN employee)) 2. Refresh CurrView : 2a. CurrView = CurrView (S JOIN CurrView) 2b. CurrView = CurrView TempCandidates 3. Candidates for the action are all the tuples of the view CurrView. Example of a Join Trigger : Consider this example of a transition join trigger from Chapter 3. create timer trigger new_artist_album_sales from (select artist.email as artist_email, album.sales as sales from album, artist where (album.artist_name = artist.artist_name) and ((artist.category="rock") or (artist.category="pop")) and (album.artist_type="new") when old.sales < 1000000 and sales>=1000000 check every day do email(:artist_email, "Congratulations! Your album sales have crossed a million"); The predicate here is Partist Palbum PJOIN Ptran where Partist = (artist.category="rock") or (artist.category="pop") Palbum = album.artist_type="new"
PAGE 46
39 PJOIN = album.artist_name = artist.artist_name Ptran = old.sales < 1000000 and sales>1000000 The various schema definitions are as follows: CurrView(artist_id, album_id, sales); TempCandidates(artist_id, album_id, sales); Sartist(artist_id); Salbum(album_id); The asynchronous triggers to collect the updates in Sartist and Salbum are: create trigger S_artist_new_album_sales from artist when (artist.category="rock") or (artist.category="pop") or (old.artist.category="rock") or (old.artist.category="pop") do execSQL 'insert into Sartist values :artist_id'; create trigger S_album_new_album_sales from album when album.artist_type="new" or old.album.artist_type="new" do execSQL 'insert into Salbum values :album_id'; The view is initialized as follows: CurrView= P artist_id, album_id, sales ( sPartistPalbum( sPJOIN(artist X album))) From the discussion in section 4.5.2 for the case of Join Triggers with an ON UPDATE event and a Transition predicate, the steps to process the trigger on timer expiration are written as follows:
PAGE 47
40 1. Create TempCandidates: 1a. If Sartist is not empty then TempCandidateartist = P artist_id, album_id, sales( sPartistPalbum( sPJOIN(Sartist JOIN artist) X album))) TempCandidate = TempCandidate TempCandidateartist.; end if 1b.If Salbum is not empty then TempCandidatealbum = P artist_id, album_id, sales( sPartistPalbum( sPJOIN(Salbum JOIN album) X artist))) TempCandidate = TempCandidate TempCandidatealbum.; end if 2. Candidates for the action are all the tuples in sPtran (CurrView JOIN TempCandidates). 3. Refresh CurrView : 3a. If Salbum is not empty then CurrView = CurrView (Salbum JOIN CurrView); If Sartist is not empty then CurrView = CurrView (Sartist JOIN CurrView); 3b. CurrView = CurrView TempCandidates
PAGE 48
41 4.8 Conclusions The brute force implementation of timer triggers is expensive because of the CPU and I/O cost involved in reading entire relations and applying selection and join predicates on them. In single table triggers where the update rate on relations are not very high, our strategy of keeping track of the tuples touched and performing queries only on this touched set works very well. Performance improves even more if the selection predicate is very selective or if there is an index on key of the relation. In multi-table triggers, our strategy works well when only a few of the relations are updated and the update rate is not very high.
PAGE 49
42
PAGE 50
42 CHAPTER 5 ENHANCED ALGORITHMS FOR AGGREGATE TRIGGERS 5.1 Introduction Aggregate triggers are triggers whose condition of interest involve an aggregation on all tuples or groups of them. If a tuple of a relation is touched then the group it belongs to needs its aggregate information re-calculated and the trigger condition checked. Thus, aggregate triggers implemented as synchronous or asynchronous triggers have a high processing cost per updated tuple. The timer trigger mechanism to test for interesting aggregate conditions periodically is a better option to implement aggregate triggers. The view here is an aggregation of tuples from one or more relations grouped by some grouping attributes. Again, the brute force method is the simplest implementation and always works. There are enhanced algorithms discussed in the next few sections which go along similar lines of the algorithms in the previous chapter that make aggregate triggers very efficient under some cases. 5.2 The Brute Force Implementation for Aggregate Triggers Consider a general aggregate trigger defined on relations R. The selection predicate of the trigger is P, the grouping attributes are gAttr and the aggregate predicate is Paggr PtranAggr in CNF, where Paggr represents all the non-transition aggregate predicates and PaggrTran represents the transition aggregate predicates. Although the view and the
PAGE 51
43 trigger have their separate aggregate predicates PVaggr and PWaggr respectively, we combine them together into one Boolean and represent them as Paggr PtranAggr, where PtranAggrcontains the transition predicate part of the aggregate predicate and Paggr contains the nontranstion part. This discussion of combining predicates is identical to the one we had in section 4.6, the only difference being that the selection predicates are replaced by aggregate predicates. The schema for the view CurrView is (gAttr, aggrValues, tranAggrValues) where aggrValues are one or more fields in the view required for the action and tranAggrValues are one or more fields required to evaluate the aggregate predicate. The view is defined as follows. CurrView= P gAttr, aggrValues, tranAggrValues ( sPaggr(gAttr * aggrFunc fields( sP (R)))) where aggrFunc is an aggregate function applied on some attribute "fields" of the relation. The aggregate values generated by aggrFunc are to evaluate the aggregate predicate Paggrand also to evaluate the fields aggrValues and tranAggrValues of the view. PrevView is the copy of the view from the previous timer expiration. Note that CurrView and PrevView are views we use for implementation, they are different from the view defined in the trigger. The operations to be performed to implement this trigger are: 1. Create a new copy of the view CurrView, CurrView= P gAttr, aggrValues, tranAggrValues ( sPaggr(gAttr * aggrFunc fields( sP (R)))) 2. Run the action on all the tuples resulting from the following operation, sPtran (CurrView JOIN PrevView).
PAGE 52
44 The above two steps translate into two SQL queries. For non-transition triggers, the second step is not necessary and the action is run on all the tuples of CurrView, resulting in only one SQL query. 5.3 Single Table Aggregate Triggers In this section we present some enhanced algorithms to implement the various timer triggers with aggregate conditions on single relations. These include triggers with any of the various events viz. on insert, on update and on delete and with or without aggregate transition predicates on the groups within the view. Again, the simplest method that works is the brute force method. When the timer goes off the new view is created from all the tuples of the relations and an aggregation performed on all the groups. However, if only a small number of groups are touched or many tuples of only one group are touched then recomputing the aggregation for all the tuples in all the groups may be extremely inefficient in terms of I/O and CPU. The general schema of an aggregate view currView on a single table R is (gAttr, aggrValues, aggrTranfields) where gAttr comprises the group attributes and aggrValues are the values of aggregate functions applied on the group, required for the action. The aggrTranfields values are required only for transition triggers to evaluate the transition predicate. These values are also obtained by applying some aggregate functions on the group.
PAGE 53
45 5.3.1 Single Table, Aggregate, Non-Transition, on Update Consider a timer trigger defined on a relation R with selection predicate P, one or more grouping attributes gAttr and the aggregate predicate Paggr. The view can be defined as follows: CurrView= P gAttr, aggrValues ( sPaggr (gAttr * aggrFunc fields ( sP (R)))) where aggrfunc are one or more aggregate functions applied on some fields "fields" of relation R. The values of these aggrFunc may either be needed to evaluate the aggregate predicate Paggr, or to simply compute the aggregate values aggrValues required for the action. As before, we maintain a relation S(gAttr), which is the set of groups touched due to updates on relation R since the last timer expiration. The algorithm to process this trigger is as follows: 4. TempCandidates = P gAttr, aggrValues ( sPaggr (gAttr * aggrFunc field ( sP (S JOIN R)))) 5. Refresh CurrView : 2a. CurrView = CurrView (S JOIN CurrView) 2b. CurrView = CurrView TempCandidates 6. Candidates for the action are all the tuples of the view CurrView. TempCandidates is a temporary relation of current aggregate information of only the groups to be inserted or updated in the view CurrView. sP (S JOIN R) returns all the tuples belonging to the touched groups, satisfying the selection predicates. We then group these tuples on the grouping attributes gAttr and apply the aggregate predicates Paggr, to obtain new aggregate values of all the touched groups. The second step refreshes the view
PAGE 54
46 CurrView by deleting all the groups touched and inserting all tuples in TempCandidates. Finally, the trigger action is run on all tuples in the view CurrView. As consistently seen in all examples until now, the main difference in the various algorithms for single table triggers lies in creating the relation TempCandidates consisting of inserts and updates to the view. After this relation is created, the steps to refresh the view and run the action on the tuples are similar. The salient features of the algorithm are the following. This algorithm works best if tuples from only a few groups are touched because then the data is read and aggregation performed only on tuples belonging to this small set of groups. If there is an index on the group attributes gAttr, then I/O is performed only on those pages having the tuples of the touched groups. Applying the selection predicate P on all touched tuples before inserting it into the set S, reduces the number of groups especially if the predicate is highly selective. Only one of the queries accesses a part of the original relation R. The remaining three operate on smaller relations and involve simple join and predicate application. The SQL query to implement the trigger looks as follows: 1. Select S.gAttr as gAttr, aggrFunc(R.field) as aggrValues from S, R where S.gAttr=R.gAttr and P group by gAttr having Paggrinto temp TempCandidates 2. Refresh CurrView : 2a. Delete from CurrView
PAGE 55
47 where exists (select gAttr from S where S.gAttr=CurrView.gAttr); 2b. Insert into CurrView select gAttr, aggrValues from TempCandidates; 3. select trigAction (gAttr, aggrValues) from CurrView. 5.3.2 Single Table, Aggregate, Transition, on Update In this case one or more of the aggregate predicates involve a transition condition on a aggregate value (e.g. old.sum(sal) < sum(sal) ). A general representation of the aggregate predicate can be written in CNF as Paggr PtranAggr, where Paggr is the regular aggregate predicate from before and PtranAggr has one or more transition predicates. The view is defined as follows: CurrView= P gAttr, aggrValues, tranAggrValues ( sPaggr (gAttr * aggrFunc field ( sP (R)))) where now the aggrfunc additionally generates one or more fields tranAggrValues to evaluate the transition predicate. 1. TempCandidates = P gAttr, aggrValues, tranAggrValues ( sPaggr (gAttr * aggrFunc field ( sP (S JOIN R)))) 2. Candidates for the action are all the tuples in sPtranAggr (CurrView JOIN TempCandidates) 3. Refresh CurrView : 2a. CurrView = CurrView (S JOIN CurrView) 2b. CurrView = CurrView TempCandidates.
PAGE 56
48 Only step 2 above looks different. In this step we identify the tuples which satisfy the transition condition. Only those groups touched since the last timer expiration are candidates to satisfy the transition condition. These groups are in TempCandidates. The join identifies only the updated tuples and applies the transition predicate PtranAggr on them. 5.4 Multi Table Aggregate Triggers In the case of multi-table aggregate triggers, a trigger is defined on multiple tables. The schema for the view CurrView on N relations R1 to RN is (gAttr1 , , gAttrN, aggrValues, aggrTranfields). The grouping attributes in this case are gAttr1, gAttr2, , gAttrN, where a gAttrI could be zero or more attributes of the relation RI. This number is zero when a relation does not have any of its attributes in the group-by clause. The selection predicate, normalized to CNF is P1 P2 PN PJOIN, where PI is the selection predicate on RI and PJOIN represents the join predicates. The view definition is: CurrView= P gAttr1, , gAttrN, aggrValues( sPaggr ( gAttr1, , gAttrN * aggrFunc field( sP1P2 PN ( sPJOIN(R1 X R2 X .X RI X RN))))) 5.4.1 Multi Table, Aggregate, Non-Transition, on Update When the timer goes off, we re-aggregate on the groups touched due to updates on each table. The operations are as follows: 4. Create TempCandidates For I = 1 to N
PAGE 57
49 do If SI is empty then skip to the next iteration; create TempCandidateI; TempCandidate = TempCandidate TempCandidateI.; done 5. Refresh CurrView : 2a. For I = 1 to N do If SI is empty then skip to the next iteration; CurrView = CurrView (SI JOIN CurrView) done 2b. CurrView = CurrView TempCandidates 6. Candidates for the action are all the tuples of the view CurrView. The relation SI (gAttrI) has the set of gAttrI values touched due to updates on tuples of relation RI, after applying the predicate PI on each tuple. The join predicate might cause only a subset of the groups having the values in gAttrI to be affected, however, for simplicity of the algorithm we recompute for all groups having the gAttrI values. TempCandidatesI represents the set of groups touched due to the updates on relation RI only. The process to create it is discussed below. TempCandidates is the union of all these temporary relations and thus is a cumulative set of touched groups due to updates on all relations on the view. Step 1a and 1b can be implemented by a single
PAGE 58
50 query. Step 1 can thus be implemented using N+1 queries, N queries for each of the RI's and one query to remove any duplicates. In step 2, the view is refreshed. First all the touched groups due to updates on each relation are deleted from the view, then we reinsert the appropriate groups with their new values from TempCandidates. Step 2 also can be written using N+1 queries. Finally, the last step runs the action on all tuples in the view. The process of creating the relation TempCandidatesI for any relation RI is done in two different ways depending on whether any of its attributes contribute to the grouping or not. In the first case we consider a relation RI whose attributes are present in the grouping as the fields gAttrI. TempCandiatesI for this case can be created as follows: TempCandidates = P gAttr1, , gAttrN, aggrValues( sPaggr ( gAttr1, , gAttrN * aggrFunc field ( sP1P2 PN ( sPJOIN(R1 X (SI JOIN RI) X RN))))) where SI (gAttrI) has the values of all the gAttrI touched since the last timer expiration. The aggregation is performed for all the groups whose gAttrI values are in S. In the second case we consider a relation RI which does not have any attributes in the grouping clause. This makes the task of finding the groups touched due to updates on relation RI a little difficult. In this case, we collect the set S(pKey), the set of primary keys of tuples updated in RI. Then we use the selection and join predicates to find another relation RJ whose gAttrJ values are touched due to updates on RI. These values are collected in a relation SJ'(gAttrJ) and appended to SJ(gAttrJ). So SJ would now consist of all the gAttrJ values touched due to updates on RJ as well as RI.
PAGE 59
51 SJ'(gAttrJ) = P gAttrj ( sPI+1 PJ ( sPJOIN-IJ((SI JOIN RI) X RI+1 X . X RJ))) SJ(gAttrJ) = SJ(gAttrJ) SJ'(gAttrJ) where sPJOIN-IJ includes only those join predicates involving relations from RI to RJ The relation TempCandidatesJ is created the usual way from SJ as shown above, however it reflects the refreshed values of groups touched by updates on RI and RJ. RJ is any relation which contributes to the grouping attributes and is connected to RI in the rule condition graph. A rule condition graph is like a query graph where the various relations (with their selection predicates) are the vertices and the join predicates are the edges between the vertices. The relations RI+1 to RJ-1 are the relations along the path from RI to RJ in the query graph. The salient features of the algorithm are the following. When the timer goes off, we have to execute many queries to refresh the view and run the trigger action. In the worst case when updates to all the relations are registered 2N+2 queries will be run, of which almost N queries could access all the relations. Clearly, our implementation is not efficient in this case. The brute force method runs only one query to access all the relations, but involves a lot of aggregate computation on possibly a large number of groups. However, if only one relation registered updates then our algorithm need run only one query for the first step, two for the second and one for the third i.e. only four in all, three of which are on smaller relations. Our implementation thus, works really well in cases when only a small number of the tables are updated.
PAGE 60
52 5.4.2 Multi Table, Aggregate, Transition, on Update The only difference in the case of a multi-table aggregate trigger with a transition predicate lies in a transition condition being a part of the aggregate predicate. The aggregate predicate can be written in CNF as Paggr PtranAggr. The view definition does not change and is as follows: CurrView= P gA ttr1, , gAttrN, aggrValues, tranAggrValues( sPaggr ( gAttr1, , gAttrN * aggrFunc field( sP1P2 PN ( sPJOIN(R1 X (SI JOIN RI) X RN))))) When the timer goes off, the trigger is implemented as follows: 1. Create TempCandidates For I = 1 to N do If SI is empty then skip to the next iteration; create TempCandidateI; TempCandidate = TempCandidate TempCandidateI.; done 2. Candidates for the action are all the tuples in sPtranAggr (CurrView JOIN TempCandidates). 3. Refresh CurrView : 3a. For I = 1 to N do If SI is empty then skip to the next iteration;
PAGE 61
53 CurrView = CurrView (SI JOIN CurrView) done 3b. CurrView = CurrView TempCandidates. All the steps look similar to the previous discussion on the non-transition case. In step 2 the action is run on only the groups satisfying the transition predicate by comparing the corresponding values of these groups in the old and new view. 5.5 Examples Example of a Aggregate Trigger: Consider this example of an aggregate trigger with no transition predicate from Chapter 3.create timer trigger excess_salary from (select dept.manager_email as mgr_email, sum (emp.sal) as sum_sal, dept.budget as budget from emp, dept where (emp.dno = dept.dno) and (emp.desig="sales personnel") and (dept.name="toy") group by dept.name, dept.manager_email, dept.budget when sum_sal > .50 * budget check every week do email(:mgr_email, "You are exceeding the allowed budget for salespersons.");The selection predicate is Pemp Pdept PJOIN where Pemp is (emp.desig="sales personnel") Pdept is (dept.name="toy") PJOIN is (emp.dno = dept.dno) The aggregate predicate is Paggr where Paggr is (sum_sal > .50 * budget) The various schema definitions are as follows: CurrView(dept_name, mgr_email, budget, sum_sal);
PAGE 62
54 TempCandidates(dept_name, mgr_email, budget, sum_sal); Semp(emp_id); Sdept(name, manager_email, budget); CurrView and TempCandidates have the same schema consisting of the grouping attributes and a field containing the result of summing the groups on sal. Since only the relation Dept contributes to the grouping attributes we collect the set of touched groups in Sdept. Emp does not contribute to the grouping at all so we keep track of the primary keys of interesting tuples updated. The asynchronous triggers to collect the updates in Semp and Sdept are:create trigger S_emp_ excess_salary from emp when emp.desig="sales personnel" or old.emp.desig="sales personnel" do execSQL 'insert into Semp values :emp_id'; create trigger S_dept_ excess_salary from dept when dept.name="toy" or old.dept.name="toy" do execSQL 'insert into Sdept values (:name, :manager_email, :budget)';From the discussion in section 5.4.1 for the case of a multi-table aggregate trigger with an ON UPDATE event and no transition predicate, the steps to process the trigger on timer expiration are written as follows: 1. Create TempCandidates 1a.If Semp is not empty then Sdept= Pname, manager_email, budget( sPempPdept ((Semp JOIN emp) JOIN dept)
PAGE 63
55 endif 1b.If Sdept is not empty then TempCandidatedept= Pname, manager_email, budget, sum_sal(dept.name, dept.manager_email, dept.budget * sum sal( sPempPdept (emp JOIN (Sdept JOIN dept)))) TempCandidate = TempCandidatedept; endif 2. Refresh CurrView : If Sdept is not empty then CurrView = CurrView (Sdept JOIN CurrView); CurrView = CurrView TempCandidates 3. Candidates for the action are all the tuples of the view CurrView. 5.6 Conclusion Implementing aggregate triggers by the brute force method may not be the most efficient method in some cases. Reading entire relations and performing aggregations involves a lot of expense in terms of both CPU and I/O. In single table triggers where the update rate on relations are not very high, our strategy of keeping track of the groups touched and aggregating on only this small set of groups works well. Performance improves even more if the selection predicate is very selective or if there is an index on the relation. In multi-table triggers, our strategy works well if only a few of those relations are updated and the update rate is not very high.
PAGE 64
56 CHAPTER 6 OPTIMIZATION AND IMPLEMENTATION ISSUES 6.1 Introduction In the preceding chapters, we looked at the syntax and semantics of timer triggers with its brute-force and enhanced implementation algorithms. In this chapter, we discuss the architecture and some optimization issues of a Timer Trigger System. Although the enhanced algorithms contain more queries than the brute force approach, they can be orders of magnitude faster under certain conditions. The same algorithms however, may perform poorly compared to the brute-force approach under different conditions. This chapter looks at the factors which determine the performance of our enhanced algorithms and provides some thoughts in making a choice between the better of the two approaches. Trying to make a comprehensive assessment using extensive statistics, like the query optimizer does [Chm76, Sel79], is extremely complex to implement. 6.2 Architecture The architecture of a timer trigger system is shown in figure 6.1. As discussed before, a Timer Trigger System is designed to complement the powers of an asynchronous trigger processor. A console program interfaces to the user and accepts the "create timer trigger" command string. This console could be same as the one used by the asynchronous trigger processor to accept its commands. This command string is then sent
PAGE 65
57 to the parser where it gets parsed for syntactic correctness. The semantic analyzer validates the semantics of the various entities present in the command string viz. the data sources, views, fields etc. It also checks to see if the view on which the trigger is defined has a primary key on it, and if the action present in the do clause is valid. Figure 6.1: Architecture of a Timer Trigger System The code generator then generates the SQL scripts for both the brute-force and the enhanced strategy. The default strategy is initially set to brute-force. The implementation of the enhanced strategy requires us to create asynchronous triggers to Parser and Semantic Analyzer Code Generator Re-assess Strategy Timer Manager Asynchronous Trigger Processor Execution Unit Timer Trigger Console Timer Trigger System Execute Scripts
PAGE 66
58 keep track of the updates to each of the data sources. As many asynchronous triggers are needed as there are data sources in the timer trigger definition. The scripts for the asynchronous trigger processor are created by the code-generator. All the scripts created are stored in a catalog with the trigger definition. A request is then sent to the timer manager for a periodic wake-up signal. If the "initialize immediately" clause is set in the timer trigger string, the view is initialized the first time on trigger definition by brute force. This view, as discussed before, needs to be kept until the next timer expiration to evaluate transition predicates. The timer manager accepts entries for timer requests when a trigger is defined. It registers these entries into the its list of wake-up times. On expiration of this timer, it sends a wake-up signal to the execution unit identifying the trigger to be processed. The execution unit , on receiving a wake-up signal, extracts the definition and the scripts of the specified trigger from the catalogs. Depending on which of the two strategies were set active, the execution unit runs the set of SQL scripts from either the brute-force or the enhanced algorithm. These scripts generate the new view, apply the predicates of the trigger referring to the previous copy of the view if necessary, and finally run the action on the qualifying tuples. After firing the actions, the trigger strategy is reassessed in the reassess block . The values of a set of parameters, on which the execution cost of the trigger is dependent, is examined and a decision in made to choose between the two strategies. If the time spent on re-assessing is a fair fraction of the processing time of the trigger, then this step can be executed every few times after the processing instead of every time. The
PAGE 67
59 mechanism for reassessing the trigger processing strategy is covered in more detail later in this chapter. 6.3 Thoughts on Optimization The run-time performance of the enhanced strategy depends on the following factors : update rate of the data sources selectivity of the selection predicate of the view indexes present on the data sources on their primary key In Chapters 4 and 5, we mentioned that for each relation involved in the trigger, a relation S(pKey) or S(gAttr) is maintained to store the keys of updated tuples. This list of updated tuples is filtered to a potentially small size by applying the selection predicate while gathering them. If the number of keys in S is kept to a small size then the time spent for creating the new view is reduced. The size of the set S is small if either the update rate is low or the predicate is very selective or both. If the data source has an index on its primary key pKey or the grouping attribute gAttr, then the time spent on processing the trigger is significantly reduced, since only a few pages of the data source are touched. Thus, one or more of the above mentioned factors contribute to the better performance of the enhanced algorithms. So, if a plot of the processing time versus the update rate of the table is to be drawn, then it should look like the one shown in figure 6.2. From this plot it is clear that the enhanced strategy works well for low update rates. As the update rate increases there is a crossover point, above which the enhanced strategy is worse than the brute force one.
PAGE 68
60 It is our goal to work in the shaded region of the graph i.e., use the enhanced strategy until the update rate reaches the crossover point, after which we want to switch over to the brute-force approach. Update RateTrigger Processing Time Brute Force EnhancedFigure 6.2: Performance of the brute force method vs. the enhanced method for varying update rates. 6.4 Optimization Strategies The first optimization strategy is the simplest and the easiest to implement. Actual run times are used to arrive at a decision on which among the two algorithms is the better one. By actual time we mean the wall-clock time, as it is very difficult to obtain the actual CPU and I/O time spent during query executions due to the inherent multi-threading in DBMSs [Inf99]. Brute force is initially used to create the view for the first time and the CrossoverPoint
PAGE 69
61 time spent in milliseconds is stored with the trigger information. In every subsequent processing, the enhanced strategy is run and the cost recorded and compared with the brute-force cost. When the former cost exceeds the latter in two or more consecutive runs, we switch to the brute-force approach. In the enhanced region of the graph, this algorithm works fine because we know with a fair degree of certainty that the enhanced algorithm has lesser processing cost. But once we cross over to the brute-force region, it is difficult to make a decision when to return back to the enhanced approach, due to the lack of sufficient information collected. In the second optimization strategy, we use the estimated costs from the query optimizer in arriving at a relation between the update rate of the data sources to the execution cost of the enhanced algorithm. However, this information is not reliable as the size of the relation S will never be accurately determined by the optimizer due to the dynamically varying size of S. But, if the optimizer provides a function to calculate the cost of a query by simply giving to it the size of S then we can approximately determine the size of S that causes the crossover in costs between the two algorithms. As the updaterate rises the number of touched keys collected in S will increase too. When the number of keys in S crosses this threshold number, we switch to the brute force approach and stop gathering the updates. At a later time, when the update rate drops, we can resume collecting keys in S to see if this number has dropped below the threshold number. Stopping and restarting the process of collecting the updated keys is as simple as disabling and enabling the asynchronous triggers defined on the data sources. The update-rate of data sources can be kept track of by defining simple mainmemory counters (variables) for each data source. These counters are incremented on
PAGE 70
62 every update to the corresponding data source and once every hour the update rate can be calculated (by dividing the value of this counter by sixty minutes). Since this counter is going to be shared among the different transactions updating the data source, a locking overhead will be introduced per increment. However, since we are interested in only an approximate number for the update-rate, we can avoid this expensive process of locking the counter. While the second strategy seems appealing, it is based on the assumption that the query optimizer provide an interface function that accepts the size of S and returns the cost of the enhanced query. However query optimizers compute the estimated costs of queries based purely on statistics collected by the database and do not accept parameters. It will be helpful if DBMS vendors could open up their query optimizers provide interfaces that accept some parameters while optimizing a query. One way to get around this problem is to enter some dummy rows into S and call the gather statistics command on S, before optimizing the query. The first strategy does seem like a more practical and a simple one. However, we need to find some way to return back to the enhanced approach from the brute force one, when the update rate drops. This can be done by recording the update-rate at the crossover point. Later, when the update rate drops below this recorded value we can switch back to the enhanced strategy. 6.5 Conclusion The architecture of a timer trigger system is simple. It functions hand in hand with an Asynchronous Trigger Processor. The enhanced algorithm works well for low update
PAGE 71
63 rate tables, high trigger predicate selectivity and the presence of primary key indexes on data sources. A plot of the enhanced vs. the brute-force strategy reveals that our enhanced algorithms perform well for low update rates. As the update rates increases there is a crossover point when this strategy is worse than the brute force one. It is important to identify the crossover point until which the enhanced algorithm should be used. A more detailed design and implementation of optimized techniques for a timer trigger system is a subject of future work.
PAGE 72
64 CHAPTER 7 SUMMARY AND FUTURE WORK 7.1 Summary Timer triggers allow users to define views and monitor interesting tuples or interesting changes to tuples in these views. The process of retrieving views, testing trigger conditions and firing actions takes place periodically. All this gives timer triggers different semantics from those of the more conventional synchronous (and asynchronous) triggers, in that the tuples do not need to be updated to satisfy trigger conditions. The examples covered in Chapter 3 sought to reassert the point that some trigger needs of users require periodic testing and notification. These translate best to timer triggers and sometimes cannot be replaced by regular triggers. In example 1, where it was required to notify some employees weekly of a form to be filled, only timer triggers could be written here to correctly send one notification per week to forgetful employees. Some join triggers and many aggregate triggers could be expensive to implement using asynchronous triggers due to frequent accesses on the original relations. These triggers can be evaluated more efficiently by the periodic testing mechanism on a group of updates. The simplest implementation for timer triggers is the brute-force approach, which retrieves the view using its definition and then applies the predicates of the trigger. The enhanced algorithms of simply refreshing the view by recording its updates lead to a huge
PAGE 73
65 performance benefit during periodic processing. This is especially true if one or more of the following hold viz. the update-rate on the relations is low, the trigger predicate is very selective or indexes are defined on the primary key of relations. Formal cost analysis and models in trying to choose the better of the two algorithms has not been presented and is the scope for future work. However, our approach of making a choice between the better of the two algorithm and identifying the point of crossover by using actual run times is a simple and a feasible idea. The most useful contribution of this thesis is the idea of keeping a set of keys of updated tuples and restricting the processing to this set. This allows us to be oblivious of the anomalies arising due to the different sequence of inserts, updates and deletes on the same tuple of the view. This is achieved by actually retrieving the tuples associated with these keys from the original relation and thus getting the correct information. Maintaining this set however, requires us to create asynchronous triggers as explained in Chapter 4. An asynchronous trigger processor like TriggerMan with its fast predicate indexing mechanism makes it possible to collect this set of keys with a very low (and almost insignificant) overhead in many cases. The architecture of a timer trigger system presented in Chapter 6 is simple and reuses a lot of the functionality available from the asynchronous trigger processor like its console, data source definition etc. 7.2 Thoughts on Future Work In the enhanced algorithms for join triggers we assume that updates to all the tables in the trigger (or view) definition are important for refreshing the views. Thus, we
PAGE 74
66 keep track of the updates on all the tables. In many cases however, updates to fewer than all relations are relevant and it may not be necessary to keep track of the updates to all the relations. This is especially true in example 4 of Chapter 4 which sends a complimentary email to new pop and rock artists when their album sales cross a million. While updates take place on both the relations, album and artist, the updates to only album are relevant as we are only interested when the sales cross the million mark. Thus, there should be an option in the timer trigger language that specifies the relations on which the updates are relevant so that our optimizer has a more efficient execution. While making a choice between the better of the two approaches discussed, our main basis of comparison was the total run times spent on creating the views and firing the actions. However, the enhanced algorithms also incur a small cost during the update to each tuple. This is because, the asynchronous trigger processor evaluates some part of the predicate on the updated tuples and puts the key in the update set S if necessary. Although the cost of evaluating some part of the predicate can be very small with an efficient predicate indexing mechanism, like that of TriggerMan, it could still be significant for a high update-rate table or an expensive predicate. Other issues such as locking on the update set S could also escalate cost. Thus, we need to arrive at a more mathematical model to compute costs of the enhanced algorithms, taking into account the costs for both collecting the updates and processing them. When the enhanced algorithms are processing the set of keys of updated tuples (i.e., keys in S), updates on the original relation still proceed concurrently. Some of these updates may qualify to have their keys included in the set S. This represents a boundary case where keys are getting inserted into S while S is being processed. Some analysis
PAGE 75
67 needs to be done to see if any unexpected behavior is possible leading to some tuples getting processed at the wrong timer expiration. In our enhanced algorithm for triggers with a transition predicate (i.e., referring to old and new values from the view), we assumed that a tuple of the view must change to satisfy the transition predicate. We run the action on all the tuples that satisfy the following condition: sPtran (CurrView JOIN TempCandidates) where CurrView is the old copy of the view from the previous timer expiration (and is yet to be refreshed) and TempCandidates is the view of the tuples inserted or updated in the interval since the last timer expiration. By performing the join (CurrView JOIN TempCandidates), we isolate only those tuples which have been updated since the last timer expiration and apply the transition predicate Ptran (CurrView.tranFields, TempCandidates.tranFields) on the old and new copy of the relevant fields. However, depending on the transition predicate (e.g. old.value = new.value), there may be cases of tuples that satisfy the predicate without being updated. The brute-force approach will account for these tuples, as it compares all tuples of the old view to corresponding tuples of the new view. However, our enhanced approach does not identify tuples that have not been updated but still satisfy the transition predicate because we apply the predicate only on the updated tuples. In order to make our enhanced strategy work we need to change the predicate to the following: sPtran' (CurrView OUTER-JOIN TempCandidates) where Ptran' = (TempCandidate.key is not Null AND
PAGE 76
68 Ptran(CurrView.tranFields, TempCandidates.tranFields)) OR (TempCandidate.key is Null AND Ptran (CurrView.tranFields, CurrView.tranFields)) The predicate Ptran' is explained as follows. We pull in all tuples of the view CurrView by performing an outer join with TempCandidates. The predicate Ptran' has two disjuncts, the first one applies the predicate Ptran on old-new pairs of updated tuples in the view and the second one applies the predicate on only the old values of the untouched tuples. The second strategy is a general one and works for both the cases. This can be the default strategy. However, if we can determine from the transition predicate that an attribute value has to change to satisfy the predicate then we can switch to the more efficient approach of applying the predicate Ptran on only updated tuples of the view. Examples of such predicates are old.value < new.value, new.value> 10*old.value etc. A theorem prover can be introduced into the code generation block of figure 6.1 to examine the transition predicates and make this determination. This theorem prover is a subject of future work. It has been mentioned at many places, how some expensive join and aggregate triggers can be implemented more efficiently using timer triggers. More work needs to be done on this issue. We have to identify factors within the trigger definition and run-time characteristics that help us conclude the above. Having identified such triggers, we have to work on a method to inform the owner of that trigger or the system administrator of possible improvements in system performance using a timer trigger, along with some
PAGE 77
69 numbers possibly. We also need to research the idea of directly switching to timer triggers where possible, without changing the semantics of the trigger. It is possible for many triggers to be defined on similar or even the same views. Moreover, the trigger predicates themselves may be similar. In cases such as these, we can group the processing of two or more similar triggers to prevent multiple predicate evaluation and table accesses. This assumes more significance in aggregate triggers, due to the higher processing costs required for the aggregation. The issue of grouping the processing of two or more triggers needs to be addressed. In example 3 of Chapter 3, a trigger was written to send an email weekly to the finance department on the customers who have defaulted on their payment for ninety days or more. So, a tuple qualifies for the action simply because its due date has aged for more than 90 days. Triggers such as these, that do date arithmetic using the current date, can only be implemented using brute-force. More analysis needs to be done, identifying predicates like the one using current date, time and timestamps where the trigger will work correctly using only the brute-force approach.
PAGE 78
70 REFERENCES [Alf98]Al-Fayoumi, N., Temporal trigger processing in the TriggerMan active DBMS. Ph.D. dissertation, University of Florida, Gainesville, 1998. [Car98]Carnes, C., A flexible data source architecture for an asynchronous trigger processor. Unpublished paper, University of Florida, Gainesville, 1998. [Cat99]Categoric http://www.categoric.com Palo Alto, CA, 1999. [Cha89]Chakravarthy, S., HiPAC: A research project in active, time-constrained database management. Final Technical Report. Technical Report XAIT-8902. Xerox Advanced Information Technology, Cambridge, MA, August 1989. [For82]Forgy, C. L., Rete: A fast algorithm for the many pattern/many object pattern match problem. Artificial Intelligence 19(1):17-37, September 1982. [Geh91]Gehani, N., & Jagadish, H. V., Ode as an active database: Constraints and triggers. In Proceedings of the seventeenth International Conference on Very Large Data Bases 1991 , pages 327-336. Morgan Kaufmann, San Francisco, September 1991. [Han90]Hanson, E. N., Chaabouni, M., Kim, C., & Wang, Y., A predicate matching algorithm for database rule systems. In Proceedings SIGMOD Conference 1990, pages 271-280. ACM Press, New York, June 1990. [Han94]Hanson, E. N., & Chaabouni, M., The IBS tree: A data structure for finding all intervals that overlap a point. Technical Report WSU-CS-9011. Wright State University, Dayton, OH, April 1990. [Han96a]Hanson, E. N., The design and implementation of the Ariel active database rule system. IEEE Transactions on Knowledge and Data Engineering 8(1):157-172, February 1996. [Han96b]Hanson, E. N., & Johnson, T., Selection predicate indexing in active databases using Interval Skip Lists. Information System s, pp. 269-298, 1996.
PAGE 79
71 [Han97]Hanson, E. N., Bodagala, S., & Chadaga, U., Optimized trigger condition testing in Ariel using Gator Networks, University of Florida CISE Department. Technical Report 97-021, Gainesville, November 1997. http://www.cise.ufl.edu [Han98]Hanson, E. N., Konyala, M., Vernon, A., Noronha, L., & Park, J., Scalable trigger processing in TriggerMan. University of Florida CISE Department. Technical Report 98-008, Gainesville, July 1998. http://www.cise.ufl.edu [Han99]Hanson, E. N., Carnes, C., Huang, L., Konyala, M., Noronha, L., Parasarathy, S., Park, J., & Vernon, A., Scalable Trigger Processing. In Proceedings International Conference on Data Engineering 1999, pages 266-275. IEEE Computer Society, Los Alamitos, CA, March 1999. [Inf98]Informix Inc. http://www.informix.com Palo Alto, CA, 1998. [Kon98]Konyala, M., Predicate indexing in TriggerMan. Master's thesis, University of Florida, Gainesville, 1998. [Kan98]Kandil, M., Predicate placement in active database discrimination networks. Ph.D. dissertation, University of Florida, Gainesville, 1998. [Mir87]Miranker, D. P., TREAT: A better match algorithm for AI production systems. In Proceedings AAAI-87 Sixth National Conference on Artificial Intelligence , pages 42-47. Morgan Kaufmann, San Francisco, August 1987. [Net99]NetMind http://www.netmind.com Campbell, CA, 1999. [Oas98]Ozsu, M. T., & Valduriez, P., Principles of distributed database systems . Prentice-Hall, Englewood Cliffs, NJ, 1998. [Par98]Park, J., Implementation and performance tuning of discrimination networks for asynchronous trigger processing view maintenance. Unpublished paper, University of Florida, Gainesville, 1998. [Pat98]Paton, N. W., ed. Active rules in database systems. Springer Verlag, New York, 1998. [Pug90]Pugh, W., Skip lists: A probabilistic alternative to balanced trees. Communications of the ACM 33(6):668-676, June 1990. [Rao98]Rao, J., & Ross, K. A., Reusing invariants: A new strategy for correlated queries. In Proceedings SIGMOD Conference 1998, pages 37-48. ACM Press, New York, June 1998.
PAGE 80
72 [Sel79]Selinger, P. G., Astrahan, M. M., Chamberlin, D. D., Lorie, R. A., & Price, T. G., Access path selection in a relational database management system. In Proceedings SIGMOD Conference 1979, pages 23-34. ACM Press, New York, May-June 1979. [Ses96]Seshadri, P., Pirahesh, H., & Cliff Leung, T. Y., Complex query decorrelation. In Proceedings International Conference on Data Engineering 1996, pages 450-458. IEEE Computer Society, Los Alamitos, CA, March 1996. [Sil98]Silberschatz, A., Korth, H. F., & Sudarshan, S., Database management systems , 3rd ed., McGraw-Hill, New York, 1998. [Sis95a]Sistla, A. P., & Wolfson, O., Temporal conditions and integrity constraints in active database systems. In Proceedings SIGMOD Conference 1995 , 24(2):269-280, San Jose, CA, June 1995. [Sis95b]Sistla, A. P. & Wolfson, O., Temporal triggers in active databases. IEEE Transactions on Knowledge and Data Engineeri ng 7(3):471-486, June 1995. [Sto91]Stonebraker, M., & Kernnitz, G., The POSTGRES next-generation database management system. Communications of the AC M 34(10):78-92, October 1991. [Syb99]Sybase Replication Server http://www.sybase.com Emeryville, CA, 1999. [Wan92]Wang, Y. W., & Hanson, E. N., A performance comparison of the Rete and TREAT algorithms for testing database rule conditions. In Proceedings IEEE Data Engineering Conference 1992 , pages 88-97. IEEE Computer Society, Los Alamitos, CA, February 1992. [Wid91]Widom, J., Cochrane, R., & Lindsay, B. G., Implementing set-oriented production rules as an extension to Starbust. In Proceedings of the seventeenth International Conference on Very Large Data Bases 1991 , pages 275-285. Morgan Kaufmann, San Francisco, September 1991. [Wid95]Widom, J. & Ceri, S., eds., Active database systems: Triggers and rules for advanced data processing . Morgan Kaufmann, San Francisco, September 1995.
PAGE 81
73 BIOGRAPHICAL SKETCH Lloyd Noronha was born on December 7, 1973 in Bombay, India. He received a bachelor's degree in electronic engineering securing first class with honors from Bombay University, Bombay, India, in May 1995. He joined University of Florida in January 1998 to pursue a master’s degree in computer and information science and engineering. He has worked as a research assistant with Dr. Hanson at the Database Systems Research and Development Center. His research interests are in active database systems.
xml version 1.0 encoding UTF-8
REPORT xmlns http:www.fcla.edudlsmddaitss xmlns:xsi http:www.w3.org2001XMLSchema-instance xsi:schemaLocation http:www.fcla.edudlsmddaitssdaitssReport.xsd
INGEST IEID EWQFNRL4L_6S1TAX INGEST_TIME 2017-07-20T20:59:02Z PACKAGE UF00101372_00001
AGREEMENT_INFO ACCOUNT UF PROJECT UFDC
FILES
|
|