
Record for a UF thesis. Title & abstract won't display until thesis is accessible after 2010-05-31.

Permanent Link: http://ufdc.ufl.edu/UFE0021854/00001

Material Information

Title: Record for a UF thesis. Title & abstract won't display until thesis is accessible after 2010-05-31.
Physical Description: Book
Language: english
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2008

Subjects

Subjects / Keywords: Computer and Information Science and Engineering -- Dissertations, Academic -- UF
Genre: Computer Engineering thesis, M.S.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Thesis: Thesis (M.S.)--University of Florida, 2008.
Local: Adviser: Helal, Abdelsalam A.
Electronic Access: INACCESSIBLE UNTIL 2010-05-31

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2008
System ID: UFE0021854:00001

Full Text

SELECTIVITY ESTIMATION OF SENSOR DATA IN SENSABLE

By

ANIL MOOLA

A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

UNIVERSITY OF FLORIDA

2008

Anil Moola

To My Parents

ACKNOWLEDGMENTS

I would like to thank my advisor, Dr. Abdelsalam (Sumi) Helal, for his constant help, guidance, and support. I would also like to thank all past and present members of the Mobile and Pervasive Computing Laboratory who were involved in the design and development of the Atlas platform. I especially thank Raja Bose for his constructive criticism and valuable contributions.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION

2 RELATED WORK

3 ATLAS SENSOR PLATFORM
    Atlas Node
    Firmware
    Middleware

4 SENSABLE QUERY PROCESSOR ARCHITECTURE

5 FILTERS AND MERGING
    Filters
    Merging

6 MANAGING SENSOR HISTORY
    Distributed History
    Distributed History Management
        History of a Sensor Entirely on the Node
        History of a Sensor Entirely on the Middleware
        History of a Sensor Partially Stored in the Middleware and the Node
        Handling Delayed Values
        Dealing with Lost Packets

7 ESTIMATING SELECTIVITY OF A SENSOR
    Selectivity
    Selectivity Score
    Dynamic Scheduling of Sub-Queries of a Continuous Query

8 EXPERIMENTAL EVALUATION
    Results

9 CONCLUSION AND FUTURE WORK
    Conclusion
    Future Work

LIST OF REFERENCES

BIOGRAPHICAL SKETCH

LIST OF TABLES

5-1 Energy costs of various sensors

LIST OF FIGURES

3-2 Software components of the Atlas middleware
4-1 Basic architecture for SENSABLE
4-2 Service-oriented view of the Atlas sensor platform
4-3 Query processor architecture in SENSABLE
5-2 Containment of filters
5-3 Disjoint filters
6-1 History representation
6-2 Example 1: Distributed history management
6-3 Example 2: Distributed history management
7-1 Selectivity graph of two sensors
7-2 Flow chart for dynamic scheduling of filters
8-1 Selectivity estimation with history size = 50
8-2 Selectivity estimation with history size = 100

Abstract of Thesis Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Master of Science

SELECTIVITY ESTIMATION OF SENSOR DATA IN SENSABLE

By Anil Moola

May 2008

Chair: Abdelsalam (Sumi) Helal
Major: Computer Engineering

A pervasive space is a concept in which different kinds of smart devices are connected together to form a network that senses data and actuates devices. Atlas is a plug-and-play sensor and actuator platform being developed at the Mobile and Pervasive Computing Laboratory to become the basic building block for pervasive spaces. Atlas aims to provide physical nodes for connecting various heterogeneous devices such as sensors and actuators.

Although a large number of sensors, actuators, and computational devices form the pervasive space, there is no efficient way of querying those devices to cater to the growing needs of distributed applications.

A new query processor architecture and optimization methods are proposed for the Atlas platform to enable programmers to develop distributed applications that access a large number of different devices in the space. The optimization methods also conserve the limited resources of smart devices, such as the bandwidth and processing power of nodes.

CHAPTER 1
INTRODUCTION

A pervasive space is a concept in which different kinds of smart devices are connected together to form a network that senses data and actuates devices. Atlas is a plug-and-play sensor and actuator platform being developed at the Mobile and Pervasive Computing Laboratory to become the basic building block for pervasive spaces. Atlas aims to provide physical nodes for connecting various heterogeneous devices such as sensors and actuators.

Smart homes and pervasive computing environments require efficient querying of a large number of sensors, actuators, and computational devices to provide data to the distributed applications running on top of the middleware framework. Although a large number of sensors, actuators, and computational devices form a sensor network, there is no efficient way of querying those devices to cater to the growing needs of multiple applications.

In the current version of the Atlas platform, applications are developed by hard-coding queries according to the needs of each application, and these queries are executed in isolation. Moreover, there is no centralized mechanism, such as a query processor, that optimizes queries by exploiting the obvious advantages of executing them as a group and of ordering the execution of the sub-queries of a query.

A new query processor architecture and optimization methods are proposed for the Atlas platform to enable programmers to develop distributed applications that access a large number of different devices in the space. The optimization methods involve merging sub-queries based on their epochs, maintaining a fixed-size history of each sensor, and estimating the selectivity of each predicate to order the execution of the sub-queries.

CHAPTER 2
RELATED WORK

An acquisitional query processor for data collection in sensor networks was designed as part of TinyDB to enable query processing in ad hoc sensor networks. Acquisitional issues are those that pertain to where, when, and how often data is physically acquired and delivered to query processing operators.

In the acquisitional design, the focus was on the cost of sampling a sensor in terms of energy requirements; based on the sampling costs of the physical sensors, they ordered the sampling of the sensors to reduce power consumption. They did not address storing a time-dependent history of each sensor to determine the selectivity of a sensor, nor did they exploit the selectivity of a sensor to determine the order in which predicates are evaluated.

Although we use some of the same acquisitional issues, such as where and when, they are applied in a completely different context: service-oriented sensor networks. The methods used to determine when and where data needs to be collected are completely new and have more relevance to service-oriented sensor networks.

CHAPTER 3
ATLAS SENSOR PLATFORM

The Atlas platform is a combination of hardware, firmware running on special hardware known as the Atlas node, and a software middleware that provides services and an execution environment. Together these components form the Atlas sensor platform, which allows any kind of sensor, actuator, or other device to be integrated into a network of devices, all of which can be queried or controlled through an interface specific to that device, and which facilitates the development of applications that use the devices. This section describes the different components of the Atlas platform.

Atlas Node

Each Atlas node is a modular hardware device composed of stackable, swappable layers, with each layer providing specific functionality. The modular design and easy, reliable quick-connect system allow users to change node configurations on the fly. A basic Atlas node configuration (Fig. 3-1) consists of three layers: the Processing Layer, the Communication Layer, and the Device Connection Layer.

Figure 3-1. Hardware layers of an Atlas node

Processing Layer: The Processing Layer is responsible for the main operation of the Atlas node. It consists of the Atmel ATmega128L microcontroller.

The ATmega128L is an 8 MHz chip that includes 128 KB of Flash memory, 4 KB of SRAM, 4 KB of EEPROM, and an 8-channel 10-bit A/D converter. The microcontroller operates at a core voltage between 2.7 and 5.5 V.

Communication Layer: The Communication Layer is responsible for data transfer.

Device Connection Layer: The Device Connection Layer is used to connect the various sensors and actuators to the platform. Integrating any number of analog and digital sensors is simple and easy. We currently have layers capable of connecting up to 32 sensors to a single node.

Firmware

The firmware runs on the Processing Layer of the Atlas platform hardware and allows the various sensors, actuators, and the platform itself to automatically integrate into the middleware framework when they are connected to the node. The firmware detects the type of Communication Layer attached to the Atlas node, then connects to the Atlas middleware and uploads its configuration details, which registers all the devices connected to that particular node and makes them available as OSGi services in the framework. Once the initial setup is done, the node goes into data-processing mode: it begins sending data from sampled sensors and receiving commands to control sensor and actuator operations.

Middleware

Although the middleware does, in part, run on the Atlas nodes, the majority of the framework operates on a stand-alone PC. The OSGi framework forms the basis of the middleware. OSGi provides many service discovery and configuration mechanisms that are used in creating programmable pervasive spaces. When an Atlas node comes online, it performs some handshaking with the middleware. The handshaking process begins with the node sending a request to the Network Listener, which listens on a dedicated port for new nodes joining the network.

Figure 3-2. Software components of the Atlas middleware

As shown in Fig. 3-2(1), after the initial handshake, the Network Manager spawns a Communicator Thread that will exclusively handle all network communication with this particular node from then on. A Node Service Handler (NSH) is also created, which registers the various devices connected to the node as OSGi services, as shown in Fig. 3-2(2), and handles the routing of commands and data between the service bundles and their respective devices. Applications are able to locate and use the services provided by the new devices (Fig. 3-2(3)).
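
To make the registration step concrete, the following is a minimal Java sketch of how a Node Service Handler might publish a newly reported device as an OSGi service. The SensorService interface, the property keys, and the class shape are illustrative assumptions rather than the actual Atlas API; only BundleContext.registerService is standard OSGi.

import org.osgi.framework.BundleContext;
import org.osgi.framework.ServiceRegistration;
import java.util.Dictionary;
import java.util.Hashtable;

// Hypothetical device-service interface; the real Atlas service API may differ.
interface SensorService {
    int sample();            // request a reading from the physical sensor
    String getSensorId();
}

class NodeServiceHandler {
    private final BundleContext context;

    NodeServiceHandler(BundleContext context) {
        this.context = context;
    }

    // Register one device reported in the node's configuration upload so that
    // applications can discover it through the OSGi service registry.
    ServiceRegistration<SensorService> registerDevice(String nodeId, String sensorId,
                                                      SensorService impl) {
        Dictionary<String, Object> props = new Hashtable<>();
        props.put("atlas.node.id", nodeId);      // illustrative property keys
        props.put("atlas.sensor.id", sensorId);
        return context.registerService(SensorService.class, impl, props);
    }
}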

CHAPTER 4
SENSABLE QUERY PROCESSOR ARCHITECTURE

Query processing in SENSABLE is both centralized and distributed. Completely centralized processing of all queries on a stand-alone PC acting as the middleware does not scale well: typical pervasive spaces contain a large number of devices sensing data from different sensors, along with the applications that operate on that data. Conversely, if query processing is completely distributed among the sensor nodes, the traffic among the nodes needed to complete a task increases, and the nodes soon run out of limited resources such as bandwidth and battery energy.

To gain the advantages of both distributed and centralized query processing, the SENSABLE query processor has been designed to distribute query processing across the middleware as well as the nodes. The following diagram shows the approach adopted in the SENSABLE query processor.

Figure 4-1. Basic architecture for SENSABLE

Figure 4-2. Service-oriented view of the Atlas sensor platform

As the diagram shows, the services and the query processing component reside in the middleware, and the various Atlas nodes, which provide the interface to sensors and actuators, are connected to the middleware over wireless and wired networks. Atlas is a service-oriented sensor platform: sensors and actuators are represented as services in the middleware. Representing each sensor and actuator as a service gives the middleware better control in handling them and allows logical sharing of a sensor by means of its service. These services can be grouped in different ways to create virtual sensors. The query processing engine uses the services to interact with the nodes, and data is fetched from the nodes through them. The services reside in the middleware and are managed by a service manager. As shown in Fig. 4-2, service s1 represents sensor s1 of a node, service s2 represents sensor s2, and so on.

Components of the Query Processor: Compilation and execution are two distinct phases inside the Query Engine.

The gap between compilation of a query and its execution can be very small, a few microseconds or seconds, but it forces the executor to check again, at execution time, whether the sensors and actuators specified in the query are still alive, since sensors can go offline between compilation and execution.

Figure 4-3. Query processor architecture in SENSABLE

During compilation (which includes optimization), it is necessary to distinguish what kind of knowledge is used as part of compilation. The compilation process encompasses a few things. First, there is parsing and normalization. Parsing is the actual dissecting of a query statement, turning it into data structures that can be processed more readily by a computer. Parsing also includes validating that the query has legal syntax.

Parsing does not include checks such as verifying the sensors, and their services, mentioned in the query; those checks are taken care of during normalization. Normalization is basically intended to resolve the things referred to inside the query into their actual characteristics in the sensor network. It checks whether the table mentioned is a valid sensor table and whether the sensors mentioned in the query actually exist in the current context.

One of the things the optimizer does is optimize based on a fairly stable state. The Query Engine may recompile a statement depending on certain criteria, for example when some of the sensors are dead or when the sensor history changes. The optimizer is responsible for generating the order in which the WHERE predicates of a continuous query are executed, based on the history available for each sensor referred to in the query.

The end product of the compilation phase is a query plan, which is put into the cache. This plan is used until there is a need to recompile the continuous query. The cache stores the execution plans of currently running continuous queries and should always hold the latest execution plan: if an execution plan changes because some of the sensors are offline, the query is processed again to produce a new execution plan for the current context, and the new plan replaces the old one in the cache.

After the plan has been put in the cache, the Executor goes back through the logic of executing it, checking whether anything has changed and whether the plan needs to be recompiled. Even though only microseconds may separate compilation from execution, the executor checks whether the services and sensors mentioned in the plan are alive.
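
A minimal sketch of this compile, cache, and recompile cycle. All names (PlanCache, QueryPlan, SensorRegistry) are hypothetical; the sketch only illustrates the rule that a cached plan is reused until a sensor it references goes offline, at which point the query is recompiled and the stale plan replaced.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical plan cache: compile once, reuse the cached plan each epoch,
// and recompile only when the plan has become stale (e.g., a sensor the
// plan references has gone offline).
class PlanCache {
    private final Map<String, QueryPlan> cache = new ConcurrentHashMap<>();

    QueryPlan planFor(String queryText, Compiler compiler, SensorRegistry registry) {
        QueryPlan plan = cache.get(queryText);
        if (plan == null || !plan.sensorsAlive(registry)) {
            plan = compiler.compile(queryText);   // parse, normalize, optimize
            cache.put(queryText, plan);           // replace any stale plan
        }
        return plan;
    }
}

interface Compiler { QueryPlan compile(String queryText); }

interface SensorRegistry { boolean isAlive(String sensorId); }

interface QueryPlan {
    // True only if every sensor the plan references is still online.
    boolean sensorsAlive(SensorRegistry registry);
}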

The Executor executes the plan. It executes each range predicate of a continuous query as a sub-query. These sub-queries are made into processes that wake up at the beginning of the epoch specified in the query and perform the required task. The sub-queries are pushed onto the nodes of the sensor network as processes, which are registered in the process tables of their respective nodes. The optimizations determine which sub-queries are pushed onto a node and in what order.

The scheduler is responsible for keeping track of the sub-queries pushed onto the nodes. It also maintains a queue for each node to keep track of the queries waiting to go onto that node. If a node is heavily loaded and cannot serve more queries, the requests wait in the middleware scheduler queue: when a query arrives at the scheduler while a particular node is heavily loaded, the scheduler puts the query into the queue and waits for the node to serve the waiting requests. The scheduler resides in the middleware. When data arrives from the sensor nodes as part of some continuous query and can satisfy some of the queries waiting in the queue, the scheduler simply pops the queries that can be satisfied with the data from the other query.

The SensorServiceManager keeps track of all the active sensors and their corresponding services. It interacts directly with the sensor network to learn whether sensors are active, and it provides information about services and sensors to the compiler and the executor.

The Sensor History holds the complete history, or a portion of the history, of each sensor that is active in the sensor network. Each element in the history contains a value and its timestamp.

CHAPTER 5
FILTERS AND MERGING

Filters

Filters are the range predicates of a continuous query that are pushed onto an Atlas node to filter out data at the node according to the filtering conditions. Filters play an important role in reducing the traffic from the node to the middleware: if a sampled value does not satisfy any of the filters running on the node, it does not need to be sent to the middleware.

For example: Select Temperature from sensors where node1.temperature < 20 epoch 20 sec for 1 hr.

In the above query, temperature < 20 acts as a filter. When that range predicate is pushed onto the node and executed at the beginning of its epoch, the predicate filters out, at the node, all values of 20 or above.
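
A minimal sketch of such a filter, assuming closed-interval predicates [low, high] and per-filter epochs; field and method names are illustrative.

// A range filter admits a sampled value only if it lies inside the predicate's
// range, so out-of-range samples never leave the node.
class RangeFilter {
    final double low, high;   // predicate bounds; "temperature < 20" can be
                              // approximated as low = -infinity, high = 20
    final int epochSeconds;   // how often the sensor is sampled for this filter

    RangeFilter(double low, double high, int epochSeconds) {
        this.low = low;
        this.high = high;
        this.epochSeconds = epochSeconds;
    }

    // true: forward the sample to the middleware; false: keep it on the node.
    boolean passes(double sampledValue) {
        return sampledValue >= low && sampledValue <= high;
    }
}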

Merging

Sampling a sensor is a costly operation in sensor networks, as sensor nodes have limited energy and CPU capability. Table 5-1 shows the energy requirements of some sensors.

Table 5-1. Energy costs of various sensors

When a filter is pushed onto a node, it is checked against the filters already running on the node to see whether it can be merged with any of them. Merging lets a sensor be sampled once for all similar filters, so the sampled value is shared by many processes, saving the cost of sampling the sensor for each filter. Merging identical filters is not a problem, but identifying similar filters is a difficult task. The following cases illustrate when filters can be merged; a short arithmetic sketch follows the cases.

Consider two filters A and B such that filter A needs to be executed every a seconds and filter B every b seconds, and assume a <= b, where a and b are the epochs. In the rules below, "except where a*k = b*l for some integers k, l" excludes the ticks at which the two sampling schedules coincide.

Merge: Partial Commonality of Filters

Figure 5-1. Partial commonality of filters

GCD(a,b) = a:
1. Run Filter A every a seconds except where a*k = b*l for some k, l.
2. Run Filter A U B every b seconds.

GCD(a,b) = a = b:
Run Filter A U B every a seconds.

GCD(a,b) = c for some c < a, with L = LCM(a,b):
1. Run Filter A every a seconds except where a*k = b*l for some k, l.
2. Run Filter B every b seconds except where a*k = b*l for some k, l.
3. Run Filter A U B every L seconds.

GCD(a,b) = 1, i.e., a and b are co-prime, with L = LCM(a,b):
1. Run Filter A every a seconds except where a*k = b*l for some k, l.
2. Run Filter B every b seconds except where a*k = b*l for some k, l.
3. Run Filter A U B every L seconds.

Merge: Containment of Filters

Figure 5-2. Containment of filters

GCD(a,b) = a:
Run Filter A every a seconds.

GCD(a,b) = a = b:
Run Filter A every a seconds.

GCD(a,b) = c for some c < a, with L = LCM(a,b):
1. Run Filter A every a seconds except where a*k = b*l for some k, l.
2. Run Filter B every b seconds except where a*k = b*l for some k, l.
3. Run Filter A every L seconds.

GCD(a,b) = 1, i.e., a and b are co-prime, with L = LCM(a,b):
1. Run Filter A every a seconds except where a*k = b*l for some k, l.
2. Run Filter B every b seconds except where a*k = b*l for some k, l.
3. Run Filter A every L seconds.

Disjoint Filters

Figure 5-3. Disjoint filters

GCD(a,b) = a:
1. Run Filter A every a seconds.
2. Run Filter B every b seconds.

GCD(a,b) = a = b:
1. Run Filter A every a seconds.
2. Run Filter B every a seconds.

GCD(a,b) = c for some c < a:
1. Run Filter A every a seconds.
2. Run Filter B every b seconds.

GCD(a,b) = 1, i.e., a and b are co-prime:
1. Run Filter A every a seconds.
2. Run Filter B every b seconds.
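
The case analysis above reduces to greatest-common-divisor and least-common-multiple arithmetic on the two epochs: the sampling schedules of A and B coincide (a*k = b*l) exactly at multiples of LCM(a,b), and those are the only ticks at which a merged filter can replace the individual ones. The following is an illustrative sketch of that arithmetic, not the Atlas node firmware.

// For two filters with epochs a <= b (in seconds), decide what runs at each tick.
final class MergeSchedule {
    static long gcd(long a, long b) { return b == 0 ? a : gcd(b, a % b); }

    static long lcm(long a, long b) { return a / gcd(a, b) * b; }

    // What should run at time t (seconds since both filters started)?
    static String actionAt(long t, long a, long b) {
        long l = lcm(a, b);
        if (t % l == 0) return "run merged filter A U B";  // schedules coincide
        if (t % a == 0) return "run filter A alone";
        if (t % b == 0) return "run filter B alone";
        return "idle";
    }

    public static void main(String[] args) {
        // Example: a = 4 s, b = 6 s, so GCD = 2 and LCM = 12; at t = 12 one
        // merged filter replaces two separate samplings of the same sensor.
        for (long t = 0; t <= 12; t += 2) {
            System.out.println("t=" + t + ": " + actionAt(t, 4, 6));
        }
    }
}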

CHAPTER 6
MANAGING SENSOR HISTORY

Unlike ad hoc sensor networks, a service-oriented architecture represents each sensor and actuator as a service in the middleware. Programmers get better control over the sensors and actuators because they are represented as services; hence, a programmer can specify the sensor ID, the node ID, or both in a query to access those sensors directly.

It is important for the optimizer to optimize the execution of a query to increase the scalability of the system. Some of the optimizations involve selecting the most selective range predicate among the sensors. For example:

Select Temperature from sensors where node1.Temperature > 50 and node2.Pressure > 40 epoch 30 mins for 24 hrs;

In the above query, Temperature > 50 and Pressure > 40 are the two range predicates. One way to execute the query is to sample the pressure and temperature sensors at the beginning of the epoch and compare the values in the middleware to see whether they satisfy their range predicates. A simple optimization is to sample only the most selective predicate first; only if it satisfies its predicate condition is the value pulled from the other node.

To determine the selectivity of a sensor, it is necessary to have some history of the sensor. The importance of history data decreases as time elapses, so it is not worth storing the entire history of a sensor. Once some history data has been collected, a selectivity score is calculated and used to determine how likely the sensor is to produce its next value within its range predicate. Comparing the selectivity scores of different sensors determines which sensor is the most selective and which the least selective.

To determine the selectivity of a sensor, a fixed-size history is maintained. This history can be imagined as a pipe of fixed length in which old data is pushed out as new data arrives, maintaining the capacity of the pipe. Managing the history becomes more difficult when it is distributed between a node and the middleware. The following figure illustrates the history representation.

Figure 6-1. History representation

Distributed History

The distribution of queries is done by the query processor at the middleware, which needs access to the history of all the sensors involved in a query before making any optimization decision. However, maintaining the entire history in the middleware increases the traffic from node to middleware, because data that the node's filters would otherwise hold back would have to be pulled separately just for history management.

To avoid pulling filtered-out data, a partial history is maintained on the node, which uses only a small portion of the node's available memory. The history on a node is formed from the values that fall outside the filters running on the node; the history in the middleware is formed from the values that pass those filters and are used to answer queries. Hence there is no explicit cost involved in maintaining the history beyond the storage required at the node.

In the history, each value has a different weight associated with it. The values are sampled at different points in time, and the significance of a value is time dependent: recent values are more likely to reflect the current trend of a sensor than relatively older values.

In order to give weights to the different values of the history, it is important to keep track of the relative position of each value; each value of the history is therefore associated with a timestamp.

Since the amount of history maintained for each sensor is bounded by some window size, and the history data may be distributed between a node and the middleware depending on the characteristics of the filters running on the node, a synchronization mechanism is needed to maintain the fixed-size history when a new value is sampled.

Distributed History Management

Although the history of a sensor is distributed between the middleware and a sensor node based on the characteristics of the filters running on the node, we can imagine the history as a fixed-length pipe in which the arrival of a new value pushes the oldest value out of the window. Since the size of the history window for each sensor is fixed, once the window reaches its maximum size it is important to find the value to be replaced whenever a newly sampled value arrives. Because the history data can be distributed across both the node and the middleware, it must be managed efficiently. The following sections explain the different ways in which history data can be distributed.
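
Before turning to the distribution cases, here is a minimal sketch of one history entry, assuming each record pairs a sampled value with its timestamp and that a record's weight decays with age. Chapter 7 reports that a logarithmic ageing function worked well for temperature, pressure, and humidity sensors; the exact function below is an illustrative stand-in.

// One element of a sensor's history window.
class HistoryRecord {
    final double value;
    final long timestampMillis;

    HistoryRecord(double value, long timestampMillis) {
        this.value = value;
        this.timestampMillis = timestampMillis;
    }

    // Weight decays as the record ages, so recent values dominate the trend.
    double weight(long nowMillis) {
        double ageSeconds = Math.max(1.0, (nowMillis - timestampMillis) / 1000.0);
        return 1.0 / (1.0 + Math.log(ageSeconds));  // illustrative ageing function
    }
}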

History of a Sensor Entirely on the Node

The history data of a sensor may be entirely present on the node. This is the case when none of the values sampled from the sensor pass through any of the filters on the node. These values are not sent to the middleware, as no query requires them: they fail to satisfy the filters. When a newly sampled value does not pass any of the filters running on the node, it is stored on the node by placing it in a queue. Each queue entry contains the timestamp of the arrival of the data along with the value.

History of a Sensor Entirely on the Middleware

The history may instead be entirely present in the middleware. This is the case when every value sampled from the sensor passes through at least one of the filters running on the node. History data is stored in the middleware using a queue. When a newly sampled value passes any of the filters running on the node as part of some continuous query, the value is sent to the middleware and placed in the middleware queue. The oldest value is removed from the queue if the addition of the new value would exceed the maximum size of the history window.

History of a Sensor Partially Stored in the Middleware and the Node

Finally, some of the history may be present on the node while the rest is present in the middleware. This happens when only a portion of the sampled values of a sensor pass through the filters on the node, and it is the case most likely to occur in practice. Once a partial window has formed at the middleware and the rest of the window has formed at the node, it is important to find and remove the oldest value of the two history windows whenever the addition of a new value would make the sum of the two windows exceed the maximum history window size of the sensor. Finding the oldest value can easily be achieved by comparing the timestamps of the oldest values of the two partial windows, but doing so requires at least two messages. After a new value is sampled, if the value goes to the middleware, the middleware takes the timestamp of the value at the head of its queue (its oldest value), puts the timestamp into a message, and sends it to the node process so that the node can compare its local oldest timestamp with the middleware's. The node process then checks whether its local value is the oldest by comparing the incoming timestamp with the timestamp at the head of its local queue.

If the node holds the oldest value, it deletes that value and sends a message to the middleware reporting the action. If the node does not hold the oldest value, it simply sends a message to the middleware without deleting anything from its history window; the middleware then deletes the first element of its own queue, since it holds the oldest value, and places the newly sampled value in its queue.

If the newly sampled value stays at the node because it does not pass any of the filters running there, the node initiates the process by sending a message with the node's oldest timestamp. The middleware compares this timestamp with the timestamp of the value at the head of its queue to determine which is older. If the middleware finds that it has the oldest value, it removes that value and reports the action to the node; the node then adds the newly sampled element to its queue.

Thus the oldest value is removed from one of the two history windows, and the new value is placed into one of the two windows depending on the filters running on the node. However, this method requires two messages every time a sample is taken. A sensor may be sampled every second; in such cases, two messages must be transferred between the node and the middleware for every sample just to maintain the distributed history, which proves costly in terms of bandwidth utilization. As the sample rate increases, more messages travel between the node and the middleware, causing excessive use of bandwidth. To reduce this overhead and maintain the history irrespective of the sample rate, the following method is proposed.

In this method, two bit queues are used, each having a number of bits equal to the total history window of the sensor. One bit queue is maintained at the middleware and the other at the node. Each bit queue is a replica of the other; they are simply maintained in different places.

The k-th position of the bit queue (1 <= k <= n, where n is the size of the history window) indicates in which partial history window, middleware or node, the k-th element of the sensor's history is present.

Figure 6-2. Example 1: Distributed history management

As shown in the diagram above, suppose that a range filter of a continuous query, 1 <= x <= 5, is running on a node with epoch e, and suppose that the total history window consists of 4 values. Bit 1 at the k-th position of the bit queue indicates that the k-th element sampled by the sensor was sent to the middleware and is maintained in the middleware queue; bit 0 at the k-th position indicates that the k-th element is stored on the node.

Suppose the sampled value of the sensor is 2. The value is compared to the filter [1, 5]; since it passes the filter, it is sent to the middleware. The first bit of the bit queue on the node is then set to 1, because the first sampled value was sent to the middleware. In the middleware, the value is put into the queue as shown in the diagram, and the middleware sets the first bit of its bit queue to 1. Say the next sampled value is 6. This value does not pass the filter running on the node and hence is stored at the node.

The value is put into the node queue after the second bit of the node's bit queue is set to 0, indicating that the second oldest value is present on the node.

Since the middleware knows the epoch of the filter running on the node, it waits for the value 6 to arrive at the beginning of the epoch. If the value is not sent to the middleware, the middleware assumes that the newly sampled value did not pass the filter, and it sets the second bit of its bit queue for that sensor to 0, indicating that the second value of the history data is stored on the node. One problem with this assumption is that the value may have failed to reach the middleware not because it failed the filter, but because it was delayed or lost. It will be shown in the subsequent sections that a lost or delayed value has no effect on history management or on calculating the selectivity score.

The process continues until the maximum history window size is reached (i.e., the sum of the sizes of the middleware window and the node window equals the total history size, which is 4 in this example). After 4 values have been sampled, the history window of size 4 is distributed across the middleware and the node. At this stage, suppose that the next sampled value is 8. The value 8 must be accommodated in the history by removing the oldest value from the window. When the value is sampled, it is compared to the filter running on the node; since 8 does not satisfy the filter [1, 5], it remains on the node. It is now necessary to find the oldest value of the distributed history and delete it. At the beginning of the epoch in which 8 is sampled, the middleware and the node each examine their respective bit queues. The first element of the bit queue is 1, which means the oldest value is present in the middleware, so the middleware removes the first value of its history window.

At the node, no value is removed, since the first bit value of 1 indicates that the oldest value resides in the middleware; the value 8 is simply placed in the node's value queue. Both the middleware and the node then left-shift their bit queues and set the rightmost bit to 0, indicating that 8 is on the node.

This process is repeated for every newly sampled value, and the fixed-size history window is maintained without any communication overhead. The following diagram gives another example of distributed history management.

Figure 6-3. Example 2: Distributed history management
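
The following is a sketch of the bit-queue bookkeeping used in the two examples above. The node and the middleware each hold an identical copy of this structure; a 1 bit in position k means the k-th oldest history value lives in the middleware queue, a 0 bit that it stayed on the node, so once the window is full the oldest bit alone decides which side evicts, with no extra messages. Class and method names are illustrative.

import java.util.ArrayDeque;
import java.util.Deque;

class HistoryBitQueue {
    private final int capacity;                        // total history window size
    private final Deque<Boolean> bits = new ArrayDeque<>();

    HistoryBitQueue(int capacity) { this.capacity = capacity; }

    enum Evict { NONE, NODE, MIDDLEWARE }

    // Called once per epoch on both sides with the same outcome (the middleware
    // infers "stayed on the node" when no value arrives for the epoch). Returns
    // which side must evict its oldest value, or NONE while the window fills.
    Evict recordSample(boolean sentToMiddleware) {
        Evict evict = Evict.NONE;
        if (bits.size() == capacity) {
            evict = bits.removeFirst() ? Evict.MIDDLEWARE : Evict.NODE;
        }
        bits.addLast(sentToMiddleware);   // the "left shift and set rightmost bit"
        return evict;
    }
}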

Handling Delayed Values

A value sent to the middleware may be delayed for various reasons. Since the order in which values are produced determines their position in the history window, it is important to deal with delayed values so that they can be placed at the appropriate position in the time-sensitive history window. Delayed values are handled by maintaining a sequence number for each sampled value. The sequence number starts at 0 and wraps around to 0 when it reaches one less than the maximum history window size.

Suppose a value is sampled and passes a filter running on the node. The value is sent to the middleware and placed in the sensor's history window there. Suppose the next value is sampled at time t + epoch and sent to the middleware, but is delayed for some reason and arrives late. The node that produced the value assumes it has reached the middleware and performs the necessary operations on its bit queue, updating its local history window (a queue) if necessary. From the middleware's perspective, the value sampled at t + epoch appears not to have passed the filters running on the node, so the middleware updates its bit queue and local history window accordingly. Now suppose the next value is sampled at time t + 2*epoch and arrives at the middleware. Since messages from the node to the middleware carry the sequence number of the sampled value (the sequence number is incremented for every value produced and wraps around when it reaches one less than the maximum history window size of the sensor), the middleware sees the sequence number of the value produced at t + 2*epoch and realizes that the value sampled at t + epoch was either lost or stayed at the node. Assume that after the value produced at t + 2*epoch has been processed in the middleware, the value sampled at t + epoch arrives. From the delayed message's sequence number, the middleware identifies that the message was delayed; even so, it can identify the proper slot in the history window based on that sequence number. It is also possible for a delayed value to arrive at the middleware after the middleware's history window has wrapped around. To handle such delays, the middleware records the time at which the history window last wrapped around: if the timestamp of the delayed value carried in the message is earlier than the last wrap-around time of the history window, the message is discarded, as it belongs to a previous history window.
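
A sketch of slotting a delayed value back into the window by its sequence number, following the scheme above and reusing the HistoryRecord sketch from earlier in this chapter. The wrap-around timestamp check discards values that belong to an already-overwritten window. All names are illustrative.

// Sequence numbers run 0..windowSize-1 and wrap around.
class DelayedValueHandler {
    private final HistoryRecord[] window;    // middleware's partial window by slot
    private long lastWrapAroundMillis = 0;   // when the sequence counter last wrapped

    DelayedValueHandler(int windowSize) {
        this.window = new HistoryRecord[windowSize];
    }

    void onWrapAround(long nowMillis) {
        lastWrapAroundMillis = nowMillis;
    }

    // seq identifies the slot; the embedded timestamp detects stale arrivals.
    void accept(int seq, double value, long sampledAtMillis) {
        if (sampledAtMillis < lastWrapAroundMillis) {
            return;   // belongs to a window that has already wrapped around
        }
        window[seq % window.length] = new HistoryRecord(value, sampledAtMillis);
    }
}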

Dealing with Lost Packets

A message sent from the node to the middleware may be lost. Suppose a message is lost: the node assumes the value was delivered to the middleware and performs the required operations on its bit queue, while the middleware assumes the value sampled at the beginning of the epoch stayed at the node and performs the required operation on its own bit queue. Because of these assumptions, the oldest element is deleted, the middleware appends bit 0 to its bit queue (indicating that the recent value stayed at the node), and the node appends bit 1 to its bit queue (indicating that the recent value was sent to the middleware). Because of these changes, the history is not affected except that the oldest value is lost; the current history size shrinks by one, which has no effect on the history. Thus, by maintaining a sequence number and the time at which the sequence number wraps around, delayed and lost messages can be handled effectively.

CHAPTER 7
ESTIMATING SELECTIVITY OF A SENSOR

Selectivity

The selectivity of a sensor is a relative term. It gives an idea of how likely the sensor is to produce its next value inside the range predicate specified for it in a query. If a sensor is the most likely candidate among all the sensors specified in a query to produce its next value inside its range predicate, then that sensor is the least selective sensor of the query. Consider the following query:

Select temperature from sensor table where 0 <= sensor.temperature <= 30 and 0 <= sensor.pressure <= 20;

The temperature sensor is less selective than the pressure sensor if the former is more likely to produce its next value within [0, 30] than the latter is to produce its next value within [0, 20]. Thus, the selectivity of a sensor is not an absolute value; the selectivity score is a relative term. Sometimes it may be necessary to compare the selectivity scores of sensors of heterogeneous types, and in such cases a common measurement is needed that takes into consideration the physical characteristics of a sensor as well as the bounds of the sensor. The selectivity score, described in the following sections, is a relative measurement that allows comparison of the selectivity of heterogeneous as well as homogeneous sensors.

Selectivity is a very useful component in service-oriented architectures, as well as in ad hoc wireless sensor networks, for determining the most and least selective sensors of a query. If a query involves multiple predicates over different sensors, it is a good idea to evaluate first the most selective sensor, as determined by the selectivity score, to have a better chance of eliminating the query without probing further. Selectivity is a major component in the optimization of continuous queries, which run for long durations and require sampling the sensors at the beginning of each epoch.

Selectivity Score

Based on our experience in real-life sensor network deployments [4,7,11], we found that the future behavior of most analog sensors, such as temperature, pressure, and humidity, can be predicted from their behavior in the recent past. Based on this assumption, we propose a simple model for predicting the behavior of a sensor and show, through experiments on real-life sensor data, that it works reasonably well in a variety of situations.

The selectivity score S of a predicate a <= x <= b is

S = Σ_i [ 2 · w(t_i) · |y_i − m| ] / |b − a|

where m = (a+b)/2, the (y_i, t_i) are sensor history records containing values that fall within the range of this predicate, and w(t) is the ageing function. The higher the selectivity score, the more selective the predicate. Figure 7-1 shows the idea behind the concept of selectivity scores.

Bounds of S

Let K = [ 2 · w(t_i) · |m − y_i| ] / |a − b| be a single term of the sum. The term K has a lower bound of 0. To find the upper bound, assume the sensor has sensitivity ε and output range [A, B]. Then K achieves its maximum value of (B − A) / (ε/2) under the following boundary conditions:
a) B = a = b = m and y_i = A
b) A = a = b = m and y_i = B

Using the definition of sensor sensitivity (if |b − a| <= ε then, as far as the sensor is concerned, b = a), we get that if b = a then (b − a)/2 = ε/2. Together with the boundary conditions above, this gives the maximum value of the term.
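
A sketch of the score computed directly from the formula above, reusing the HistoryRecord sketch from Chapter 6. Treating |b − a| as at least the sensitivity ε mirrors the bounds argument when a = b; that guard, and all names, are assumptions of this sketch.

import java.util.List;

class SelectivityScore {
    // S = sum over history records of 2 * w(t_i) * |y_i - m| / |b - a|,
    // where m is the midpoint of the predicate range [a, b].
    static double score(List<HistoryRecord> history, double a, double b,
                        double sensitivity, long nowMillis) {
        double m = (a + b) / 2.0;
        double width = Math.max(Math.abs(b - a), sensitivity); // |b-a| <= eps acts as eps
        double s = 0.0;
        for (HistoryRecord r : history) {
            s += 2.0 * r.weight(nowMillis) * Math.abs(r.value - m) / width;
        }
        return s;   // higher score = more selective predicate
    }
}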

Figure 7-1. Selectivity graph of two sensors

As shown in Figure 7-1, suppose that A and B are the actual sensor ranges for sensor1 and sensor2, and that a and b are the bounds of the range predicates of sensor1 and sensor2. Let m = (a+b)/2 be the midpoint of the bounds [a, b]. The green points are the history values of sensor1, with the x-axis representing time; the yellow points are the history values of sensor2. From Figure 7-1 we can clearly see that sensor1 is less likely than sensor2 to produce its next value in the range [a, b]. One observation that can be made from the diagram is that the distances of all of sensor1's points from the midpoint of the bounds [a, b] are greater than those of sensor2's points; hence sensor1 is less likely to produce its next value in the range [a, b] than sensor2.

By giving different weights to the history values based on their timestamps (generally more weight to more recent values), recent values dominate the selectivity score more than older values.

Weighting values by their position in the history window suppresses occasional spikes in the history values and captures the trend more accurately. It is important that recent values receive more weight than earlier ones, so that the recent values dominate in determining the trend of a sensor; otherwise, older values may dominate the selectivity score.

The weight function can vary depending on the type and general behavior of a sensor. In our experiments, a logarithmic function performed well for temperature, pressure, and humidity sensors.

The selectivity score is the sum of the products of the distance of each history value from the midpoint of the bounds [a, b] and the corresponding weight function value. The higher the selectivity score, the farther the sensor's trend is from the midpoint of the bounds of its range predicate; hence the sensor is less likely to produce its next value within its range predicate than a sensor with a lower selectivity score.

Dynamic Scheduling of Sub-Queries of a Continuous Query

Suppose the following query has been issued by an application and is subject to dynamic scheduling of filters; this section illustrates how the query is executed.

select s1, s2, s3 from sensors where s1 > 20 and s2 > 30 and s3 > 40 epoch X for Y

Assume that s1 is a sensor connected to sensor node1, s2 is a sensor connected to node2, and s3 is a sensor connected to node3. The simple way of executing the query is to make s1 > 20, s2 > 30, and s3 > 40 sub-queries and push them onto node1, node2, and node3, respectively. At the beginning of the epoch, each sub-query samples its sensor, checks the sampled value against its condition (s1 > 20, s2 > 30, or s3 > 40), and sends the data back to the middleware if the sampled value satisfies the filtering condition on the node.

At the middleware, if the query engine receives all three values, it can assume that all the conditions have been satisfied, and it can proceed with executing the query for the selected attributes.

The disadvantage of this approach is that some values travel from the nodes to the middleware even though other sensor values fail to satisfy the filtering conditions on their respective nodes. For example, assume node1, node2, and node3 produce s1 = 19, s2 = 31, and s3 = 41, respectively. Since the value of s1 falls outside the range of the filter running on node1, it is not sent to the middleware, but the values of s2 and s3 are sent, as they satisfy their respective filtering conditions. Observe that the values of s2 and s3 are sent to the middleware even though they cannot be used until s1 satisfies its own filter and sends its value to the middleware.

The best way to handle such queries is to pick the most selective of all the filters, push that filter onto its node as a process, and keep the rest of the filters in the middleware. The process samples the value at the beginning of each epoch and checks it against its filter condition. If the value satisfies the condition, it is sent to the middleware; otherwise the value stays at the node. In the above example, assume s1 > 20 is the most selective filter, meaning that its sensor is the least likely to produce a value that falls within its filtering condition, and assume the order of selectivity of the sensors, from most to least selective, is s1 > 20, s2 > 30, s3 > 40. According to this optimization method, we push the filter s1 > 20 onto node1 as a process and let it run continuously. At the beginning of each epoch of the query, s1 is sampled and the value is checked to see whether it is greater than 20. If the condition is not satisfied, the middleware does not evaluate the query further. If the value satisfies the filtering condition, it is sent to the middleware, and the middleware starts evaluating the other filters using single-time requests to their respective nodes.

Single-time requests are like single-time queries: they do not need to run as processes on the node.

The calculation of the selectivity of a sensor is based on its history. Since continuous queries are long-running, the filter that is most selective at one moment may not remain the most selective later. The following feedback mechanism is proposed to ensure that the most selective sub-query is the one pushed onto the node (a sketch of the counter appears after the summary list below).

Feedback mechanism:
1. This mechanism is required because selectivity calculations done at a particular point in time will not remain valid for the entire life of a long-running query.
2. Keep track of the number of false positives: cases where the predicate/filter running on the node returns a result but the entire set of predicates is not satisfied. If this count exceeds a threshold, the predicate is no longer the most selective, since another predicate is being satisfied fewer times than this one.
3. Recalculate the selectivity scores and push the most selective subset of predicates, canceling the invalidated predicates previously running on the nodes.

This mechanism ensures that selectivity scores are not recalculated at the beginning of every epoch and makes the system more robust to occasional spikes in the data. The optimization can be summarized as follows:
1. Use selectivity to determine which subset of predicates to push as filters onto the nodes.
2. Push the subset of most selective predicates onto the nodes.
3. Use the selectivity score to determine selectivity. If filters are already running for a sensor, use the sensor's history to calculate the score. If no filters are running, push the predicate onto the node and let it run for a fixed number of readings; this fills up the history, which can then be used to calculate the score.
4. Use the feedback mechanism to indicate when the current selectivity scores have been invalidated.
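
A minimal sketch of the false-positive counter behind the feedback mechanism; the threshold and all names are illustrative.

// A false positive is an epoch in which the pushed (most selective) filter
// fired but the full conjunction of predicates failed. Crossing the threshold
// triggers rescoring instead of recomputing selectivity every epoch.
class SelectivityFeedback {
    private final int threshold;
    private int falsePositives = 0;

    SelectivityFeedback(int threshold) { this.threshold = threshold; }

    // Returns true when selectivity scores should be recalculated and a
    // (possibly different) subset of predicates pushed onto the nodes.
    boolean onEpochResult(boolean pushedFilterFired, boolean allPredicatesSatisfied) {
        if (pushedFilterFired && !allPredicatesSatisfied) {
            falsePositives++;
        }
        if (falsePositives >= threshold) {
            falsePositives = 0;   // reset after triggering a rescore
            return true;
        }
        return false;
    }
}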

The following flow chart shows how continuous queries with dynamic scheduling of sub-queries are executed. The diagram also illustrates the feedback mechanism that deals with false positives; the mechanism is needed to prevent the selectivity score from being recalculated too frequently and to reduce the effect of occasional spikes in the data. The flow chart also covers the different mechanisms for dealing with single-time queries as well as continuous queries.

Figure 7-2. Flow chart for dynamic scheduling of filters

CHAPTER 8
EXPERIMENTAL EVALUATION

Results

The simulator simulates the execution of continuous queries whose range filters are of the form A AND B, where A and B are range filters on sensors s1 and s2, respectively. The simulations are run on a real-life sensor data set logged by the Intel Research lab at Berkeley; it contains temperature, pressure, and humidity data collected by 54 sensors between February 28 and April 5, 2004. In order to evaluate our selectivity model both where a sensor behaves normally and where it behaves erratically as a result of external environmental conditions or malfunction, we deliberately chose data logs containing incidents of both types.

Given a data set consisting of a large number of values, the simulator generates a random range, and the width of that range is used to pick a contiguous block of data from a sensor's data set. The contiguous block is divided into two parts: the first portion is the history data and the second portion is the test data. After picking the contiguous blocks of data for the two sensors and partitioning each block, the selectivity score of each sensor is calculated using the history portion of its block.

The range filter of each sensor is then compared against the second portion of the sensor's contiguous block, i.e., the sensor's test data, and the number of values that fall within the range filter is counted. The simulator then checks that the sensor with the higher selectivity score has fewer test data values falling within its range filter and, similarly, that the sensor with the lower selectivity score has more test data values falling within its range filter. The total number of iterations per history size is 1000.


The total number of iterations per history size is 1000. The history size was varied between 50 and 250 in increments of 50, and for each value of the history size, the test data size was varied from 50 to 250 in increments of 50, so a total of 5000 iterations were conducted. The success % plotted on the graph for a particular history size is the average over the different test data sizes.

The 'clear decision' plot in the graph corresponds to the following cases (sensor1 is assigned selectivityScore1 and sensor2 is assigned selectivityScore2):
a) If selectivityScore1 > selectivityScore2, then sensor1 is more selective than sensor2 (as per the test data).
b) If selectivityScore2 > selectivityScore1, then sensor2 is more selective than sensor1 (as per the test data).
c) If selectivityScore1 = selectivityScore2, then sensor1 is as selective as sensor2 (as per the test data).

Hence, the 'clear decision' plot excludes cases where the selectivity scores are unequal but the observed selectivities are equal (for example, cases where none of the readings in the test data set falls within the range limits; this is a very common occurrence, since a range filter covers only a small portion of the sensor's total possible output range). A sketch of this decision rule appears below, followed by the plots.
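A compact sketch of the decision rule, using as inputs the selectivity scores computed from the history portions and the match counts observed on the test portions; the enum and method names are hypothetical.

enum Outcome { SUCCESS, FAILURE, NO_CLEAR_DECISION }

class ClearDecision {
    /** score1/score2: selectivity scores from the history portions.
     *  matches1/matches2: test readings inside each range filter.
     *  A higher score should correspond to fewer matches. */
    static Outcome classify(double score1, double score2,
                            long matches1, long matches2) {
        if (score1 == score2) {
            // Equal scores predict equal selectivity.
            return matches1 == matches2 ? Outcome.SUCCESS : Outcome.FAILURE;
        }
        if (matches1 == matches2) {
            // Unequal scores but equal observed selectivity (e.g., no test
            // reading falls in either range): excluded from the plot.
            return Outcome.NO_CLEAR_DECISION;
        }
        boolean predictedS1MoreSelective = score1 > score2;
        boolean observedS1MoreSelective = matches1 < matches2;
        return predictedS1MoreSelective == observedS1MoreSelective
                ? Outcome.SUCCESS : Outcome.FAILURE;
    }
}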


Figure 8-1: Selectivity estimation with history size = 50

Figure 8-2: Selectivity estimation with history size = 100

Figure 8-3: Selectivity estimation with history size = 256


It can be observed that, as the test data size increases, i.e., as the same selectivity score is used for an increasing amount of test data, the success % decreases. This is because the relevance of the history decreases over time. It can also be observed from the graphs that, for smaller history sizes, the success % is not stable as the test data size increases. The figures show that a history size of 256 performs better than history sizes of 50 and 100.

For a history size of 256 values, the success % remains good until the test data size reaches around 4000 values. That means the same 256 values and their selectivity score perform well until the next 4000 values are sampled; after that, the selectivity score must be recalculated from the new values to prevent the success % from degrading further. This answers the question of how much data the history remains valid for and how frequently it should be recalculated.

A history of 256 values per sensor thus performs well, and the value 256 is also chosen to meet the memory constraints of the sensor node. An Atlas sensor node has around 4 KB of SRAM and 32 KB of external memory, and each node is generally used to connect four sensors. The voltage coming from a sensor is converted into a 10-bit value by the ADC, so the values coming from a sensor range from 0 to 1023. Each record maintains a value together with its timestamp, where the timestamp is stored as an offset with respect to an actual reference timestamp. Hence, around 2 bytes of memory are required per record to store the value and its timestamp offset, and storing 256 records requires around 512 bytes per sensor. As mentioned earlier, there are very few situations in which the entire history is stored on the node; even in that case, a node requires only about 2 KB of memory for the history of all its sensors.
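One plausible 2-byte record layout consistent with these numbers packs the 10-bit ADC value together with a 6-bit timestamp offset into a single 16-bit slot. The text states only that each record takes about 2 bytes, so this particular bit split is an illustrative assumption.

class SensorHistory {
    // 256 records * 2 bytes = 512 bytes per sensor; with four sensors
    // per Atlas node, a full on-node history needs about 2 KB.
    private final char[] records = new char[256]; // char = 16-bit slot
    private int next = 0;

    void add(int adcValue, int tsOffset) {
        assert adcValue >= 0 && adcValue <= 0x3FF; // 10-bit ADC value
        assert tsOffset >= 0 && tsOffset <= 0x3F;  // 6-bit offset
        records[next] = (char) ((tsOffset << 10) | adcValue);
        next = (next + 1) % records.length;        // circular buffer
    }

    static int value(char record)    { return record & 0x3FF; }
    static int tsOffset(char record) { return (record >>> 10) & 0x3F; }
}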


CHAPTER 9
CONCLUSION AND FUTURE WORK

Conclusion

This thesis is an attempt at providing an efficient query processing architecture and optimization methods for querying the smart devices in a pervasive space that form its network of sensors and actuators. Specifically, it provides methods to merge similar filters of different queries to reduce sampling costs, to store time-dependent history data for use in selectivity calculations, to calculate the selectivity scores of sensors, and to allow dynamic scheduling of filters to reduce bandwidth utilization.

These optimization methods, when incorporated in a centralized mechanism called the query processor, enable queries to be executed efficiently and provide a centralized means of grouping queries to marshal the scarce resources of an Atlas node: bandwidth, CPU processing, and battery. They increase the scalability of pervasive spaces and should prove valuable for a long time to come, given the increasing number of applications that access pervasive spaces and of devices that exist within them.

Future Work

One of the main ongoing improvements is in the area of distributed joins. Other optimization methods being worked on include phenomena detection and tracking, and optimizing sliding-window joins by deriving filters from materialized views and pushing them onto the nodes to reduce traffic from the nodes.






BIOGRAPHICAL SKETCH

Anil Moola was born in Thirumalagiri in 1983 and raised in Hyderabad, India. He obtained his Bachelor of Engineering degree in computer science from Osmania University in 2004 and then worked in the software industry for two years. In 2006, he joined the Department of Computer Engineering at the University of Florida as a graduate student. While pursuing his Master of Science, he conducted research on scalable query processing on the Atlas platform in the Mobile and Pervasive Computing Laboratory. After receiving his Master of Science degree in computer science, Anil plans to work as a development engineer in one of the leading software companies.