Theory of Linear Operators for Aggregate Stream Query Processing

Ingest record: IEID E20101123_AAAAAR; ingest time 2010-11-23T07:26:37Z; package UFE0011876_00001
Agreement: account UF; project UFDC
75396 F20101123_AAAKPP golani_g_Page_25.jpg
9eb06669514d6c6a683bcba6a86671b6
0df11e4426dbba4d81e301849dad9eea6fcea598
2882 F20101123_AAAKZK golani_g_Page_39.pro
0e850eef58cee454d53a35d18f84c75d
df855ac2a366729c97467d6f302a88892d905cca
20284 F20101123_AAALBK golani_g_Page_18.QC.jpg
858d2aff2f1c10590737330f62362631
7c36c50add75aeef2061d3e21c85c2ac0a0f38ca
23413 F20101123_AAAKUN golani_g_Page_01.jpg
d595431a7d558c3c0ff9c96b6d1af56d
bd9a90375fb40208cd74d4fc860d2c0957faa183
63205 F20101123_AAAKPQ golani_g_Page_26.jpg
e61a921dae7380a971ec1decd1bed8fc
44ecf34a9e71cbf19c9eb630815999786a3effb0
33628 F20101123_AAAKZL golani_g_Page_41.pro
2efdb150a57093f88e65b3b6f7173b37
8a2203c23883e091563b43165a1ccd0494ec39b3
7257 F20101123_AAALBL golani_g_Page_19thm.jpg
f99f151aa7f3d7302ff30e54df7be7b4
9ebc2977379b7183e095a7a46dff140d1b9947bf
F20101123_AAAKUO golani_g_Page_28.tif
c374aa160d40b42f0a8a39b957d21cbc
2b55b3936c86f928cd6be59f3cd8b76c8f5036c7
7187 F20101123_AAAKPR golani_g_Page_44.QC.jpg
374154c84c6c162eea2d3e4fc6b6df1d
6ce5bb2b14a86c0b13c46847a5c4c22a05157242
28735 F20101123_AAAKZM golani_g_Page_42.pro
dfd376012692944314b418ade7d11717
1a8642b96e0cbb19be7f8a98451241b618ed5621
30996 F20101123_AAALBM golani_g_Page_19.QC.jpg
8b4041e3c248b4d28ad0b62808d25304
3b1be0ef5ce349e787c93ebd88c932c3e87e4c37
6049 F20101123_AAAKPS golani_g_Page_27thm.jpg
d6a7c360a6e1e0d0cb5b92e15adc51db
be9fa3b3589d8e82f67cf6c5aebb08b48f5d7da8
33461 F20101123_AAAKZN golani_g_Page_43.pro
9ce656bb530076dc7a5a7fb41519fe1f
6a4cfc4ced8bd77b74bae015d53895cb4decc337
3986 F20101123_AAALBN golani_g_Page_20thm.jpg
d67d5c95151a1ae8c95cad73770c524e
1da984b0dae775d7c0b1cda8201a27446cf871f4
7554 F20101123_AAAKUP golani_g_Page_01.pro
cf87d68ee88bb738d60a487fb9d09767
4a62c710e057a7cd50e0a584fde78d2dacb97081
646536 F20101123_AAAKPT golani_g_Page_38.jp2
d22c922661250938323cfd8f3d9cb7dc
817c5f379b36ebb65290eceb142654f804efeb6d
52649 F20101123_AAAKZO golani_g_Page_46.pro
e46e4706560b83cec69e470096048775
3978b75b740671415b0e90bc583c3843c0047878
16225 F20101123_AAALBO golani_g_Page_20.QC.jpg
b07ba45eebad6e07a2bc602e54f77d97
1a2e1b8c7ba34da111114170aac8f607b699e4e8
1116 F20101123_AAAKUQ golani_g_Page_02.pro
ba9d516e41907a7cf5dc6994599c4586
85c62946c76e119271e7d6a2fa58f5c2715cda16
37352 F20101123_AAAKPU golani_g_Page_23.pro
516bb94b4947882155aac72e554bcd9a
0c99753179646ad44b7593d54a45fd96fcbaf442
64640 F20101123_AAAKZP golani_g_Page_48.pro
b5895e915a0c4878d695ea7f71ee90d7
8abd1cae4022f756e5e1d8d6814112495d3e5f59
5662 F20101123_AAALBP golani_g_Page_23thm.jpg
d6372b88353e74d8b729fe59f584ed75
681d1a8d43f75955a777d542142ef97386c58a40
21614 F20101123_AAAKUR golani_g_Page_21.QC.jpg
656fadf0322a7c33a27ab99770a7f816
72a6fb2b803c7c54bc9513024cca17fd4fa61467
48343 F20101123_AAAKPV golani_g_Page_09.pro
9c1275a71b5388c5681d085d41c2d5c9
089e4ad0f349c0615a9a3019ebc6fad4a8cb85d0
18493 F20101123_AAAKZQ golani_g_Page_49.pro
6dab4356c379e8b39d6d571fe9cd98c9
539d052405de38a3d41edfc64c391a3c7e2745a9
23527 F20101123_AAALBQ golani_g_Page_23.QC.jpg
a906c6f2d85a7c44698bffcd596ccf57
1b92c80cde02d05fbd51d9e6f22feee59d4122d3
710794 F20101123_AAAKUS golani_g_Page_26.jp2
d56a5dd562665c17730c3fe5e354c9c3
ec9d6309d513cf8dbda93126e138cf3fae40e542
7779 F20101123_AAAKPW golani_g_Page_47thm.jpg
5c7417f1e0b09f4e8f215548e2fa906d
3136318bffa2bb5c11b41346f44053cbf2a08b2a
442 F20101123_AAAKZR golani_g_Page_01.txt
b3b9460f94f3842b8c76d35611c13947
8b197d03ef87a3db247b66376308a0ea82699437
5541 F20101123_AAALBR golani_g_Page_24thm.jpg
b3fbc70389950e637db97c02338ba86e
a8c941c067030ca2ad830d6b7e3106cfc8553922
1566 F20101123_AAAKUT golani_g_Page_25.txt
94fba598797a03ef9ee797493ef54ee1
b61b8eff95d6f6ab6a9726f9ad959d84f088ed7a
93075 F20101123_AAAKPX golani_g_Page_08.jpg
5c74f34f2135a5a1b601a03434aed586
545f575b945caca63742f256d33b4f4805425cae
1299 F20101123_AAAKZS golani_g_Page_04.txt
dd9806e3a5f3d9f4ca1658201e61a9f3
4f55d5a3cdc86397b2d2f78120cc07cc9dede480
6231 F20101123_AAALBS golani_g_Page_25thm.jpg
6a5807d09fd0d141576bc05f5a9ce92f
88369704ca9d2a7ed580db4a1a6d5c56eab04346
697997 F20101123_AAAKUU golani_g_Page_42.jp2
e3393f9d35ed1493c16bd6cb9017a6fb
880a420dfcfe43f7d3fc6764ef0d11948f2f5e6d
72926 F20101123_AAAKPY golani_g_Page_07.jpg
90060da395571e7bfd0ef111d09f273d
56a046a4b85ca8d66af9fd846bbaaee22eb9bb81
2017 F20101123_AAAKZT golani_g_Page_08.txt
fb70c28c56f21529cdadd63bb7bfa0d3
1f3ab2049788dd8c9ec8a5193c3afe3fbceed4e6
20740 F20101123_AAALBT golani_g_Page_26.QC.jpg
3ef87eb4052982d3dbb27f88b8fba5b4
bce42bedfbcf2494e2b0ca080fa587d11236b0a6
F20101123_AAAKUV golani_g_Page_48.txt
c0cb5ab6fadc77aacf69660504cf80e9
de40b59a0085af797c3076e8a62dc1cd3239b363
74851 F20101123_AAAKPZ golani_g_Page_21.jp2
2e108b07d5af1067d85233417b981f5e
5395c940d8333a7adf60bbcc21a118ee1f8a7132
22277 F20101123_AAALBU golani_g_Page_27.QC.jpg
fd5806df029f688c7b3fdc7b3a77d340
4f248a57025604206fcb207c4034e3b98167ed7f
76255 F20101123_AAAKUW UFE0011876_00001.xml
bceafbff3c561c1e620b2fdda4759436
d504477b76a7ddd1cc9dd8c4322e417912290189
1915 F20101123_AAAKZU golani_g_Page_09.txt
f2c97c3b02eda748bd395ab3b5e37d46
2729409d17035dbbebfe9e784cb64e03cbedd908
4736 F20101123_AAALBV golani_g_Page_28thm.jpg
212f5b9a1c1f70c2fa3d72d451843b36
2cc2c46543782bd62788d9bf1cb0d751dfdf684b
1832 F20101123_AAAKZV golani_g_Page_10.txt
3020f82068191a5e3047e967c48f249c
120ab93e230ceab9cd13b35a935c95c27f6a2252
16808 F20101123_AAALBW golani_g_Page_28.QC.jpg
f555529619237f227157fdc59c4eb622
338e39fd0e7ad2ec324958f2135c55733880d925
65 F20101123_AAAKSA golani_g_Page_02.txt
ab7eca003b71ef6ee7677ce76522a093
fe2e944e6cb74130eae94b94c27cbc6e69269f15
2003 F20101123_AAAKZW golani_g_Page_11.txt
0601b5272c4313dbb7d0e23ab90b48ab
bab08cfb94f83b43562853e802e39681c527a305
27799 F20101123_AAALBX golani_g_Page_29.QC.jpg
abd4fb792202e1efe2db755f5826f820
01b21496116190c35891ed458295053f2934ae6f
1910 F20101123_AAAKSB golani_g_Page_17.txt
9e1c914fa3d6e9ac4dfdec9d2f4bf1e1
19cfddbc6a006cf66a6820f2bb5f4091f20753aa
3829 F20101123_AAAKUZ golani_g_Page_02.jpg
7cae1d1babd67d2410b81d330c290965
658a4121e11804b9d566e31e4e5725c92934aaec
1770 F20101123_AAAKZX golani_g_Page_13.txt
2eb83a4e1448ae074b41a871bb176d76
17d5d4ba257291fece2d31e455fa53877a15d26c
22900 F20101123_AAALBY golani_g_Page_30.QC.jpg
09182c86c950fc2dcd3a5a1cb711b657
12734bfbbd9f7b7c50164c0141e3b3d603be6680
582227 F20101123_AAAKSC golani_g_Page_28.jp2
db8832db3d0723ad3276b4c41e6861f0
1149d26423e398f63e3fed43e2b84c219c9afb32
795078 F20101123_AAAKXA golani_g_Page_30.jp2
69380113ed799f8acaf1f77974ce9ad6
07c827820495a41fec7d3bf6c318459e923fd113
1900 F20101123_AAAKZY golani_g_Page_14.txt
9b9c4a0ab063a49480aab44f7479a430
3d6a33d4dfb1274194c696be46a230695bdacd7c
28383 F20101123_AAALBZ golani_g_Page_31.QC.jpg
be84a6bc38d254d6189db33bdd61b6e1
2724393145eeb6da90bf721d2ba175e3e6f740c6
F20101123_AAAKSD golani_g_Page_13thm.jpg
f5ab039a6e80b95c80ade7aeaf52987a
237b62c62ab4231e29a772a15a38b93b222bd9d1
790574 F20101123_AAAKXB golani_g_Page_33.jp2
92d47dda3f285f1c5c6c1d2d30754255
7bf054685f03662e8f559b5ca80b87b6f9730e0c
1875 F20101123_AAAKZZ golani_g_Page_15.txt
b92ad284f63353157dcbf626113c48ab
803d7096580682067f57f8ace88fd6cd4500f756
43891 F20101123_AAAKSE golani_g_Page_12.pro
733dda2d53dbe7fda1008aa073c6dcfb
8601b8eec1c69b0bec87426954fa97defc596790



THEORY OF LINEAR OPERATORS FOR AGGREGATE STREAM QUERY PROCESSING

By
GURUDITTA GOLANI

A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

UNIVERSITY OF FLORIDA
2005

To my parents and my brother.

ACKNOWLEDGMENTS

I would like to express my deepest gratitude to my supervisor, Dr. Alin Dobra. His enthusiasm and integral view on research and his mission for providing "only high-quality work and not less" have made a deep impression on me. I have greatly benefited from his guidance and emphasis on me pursuing my own ideas in research. I owe him lots of gratitude for having shown me this way of research.

I feel a deep sense of gratitude for my loving father and mother, who formed part of my vision and taught me to be a good human being first and then anything else. My brother, Pankaj Golani, is a great source of inspiration that anything is possible with hard work and honest effort. Time and again he has proven all assumptions wrong. I am fortunate to be born in such a family.

Finally, I would like to thank my friend Srikant Ranjan for helping me during various stages of thesis writing and formatting.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF FIGURES
ABSTRACT

CHAPTER
1 INTRODUCTION
  1.1 New Streaming Models
  1.2 Synopsis Operators
  1.3 Our Contribution
  1.4 Organization of Thesis
2 RECENT WORK
  2.1 Streaming Models
    2.1.1 Infinite Stream Data Model
    2.1.2 Sliding Window Model
    2.1.3 Distributed Stream Model
  2.2 Synopsis Operators
3 LINEAR OPERATORS AND DATA STREAM ALGORITHMS
4 INFINITE DATA STREAMS
5 SLIDING WINDOW STREAMS
  5.1 Definition
  5.2 Synchronization Boundary
  5.3 Linearity
  5.4 Examples
6 DISTRIBUTED STREAMS
7 LINEAR OPERATOR AS ABSTRACT DATA TYPE
8 TOWARDS A FRAMEWORK OF DESIGN OF STREAMING ALGORITHMS
9 CONCLUSION

REFERENCES
BIOGRAPHICAL SKETCH

LIST OF FIGURES

Figure
1-1 Various data stream models
2-1 Exponential histogram constructed over binary stream
2-2 Wave synopsis: the x-axis shows the 1-ranks and, on the y-axis, each dyadic level i is labelled by 2^i
5-1 A typical error profile
5-2 Exponential histogram with nonoverlapping buckets
5-3 Waves synopsis with overlapping bucket boundaries
6-1 Statistic computation on distributed streams
8-1 Componentized design of streaming algorithms
8-2 Weighted sliding window computation using Exponential histograms

Abstract of Thesis Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Master of Science

THEORY OF LINEAR OPERATORS FOR AGGREGATE STREAM QUERY PROCESSING

By
Guruditta Golani
August 2005

Chair: Alin Dobra
Major Department: Computer and Information Science and Engineering

There has been a growing research interest in addressing challenges of data streaming applications, leading to a huge growth in the data streaming models and the number of algorithms to compute statistics sufficiently under the constraints of limited space and single pass scan. This has led to a separate class of algorithms for the same statistic computation for different streaming models. We show that the algorithms over these models have an underlying structure of linearity which makes computation efficient. We show that any algorithm specific to the basic data stream models encodes a linear operator. An application of this underlying linearity structure is that an algorithm particular to one streaming model can be tuned to other models by replacing the linear operator by a new linear operator particular to synopsis generation on the new model. These operators provide important structure that can be utilized in a componentized design of streaming algorithms.

CHAPTER 1
INTRODUCTION

Data streams are characterized as a non-persistent form of massive data that arrives continuously and can be processed only once. The need to process data streams arises in a number of application domains. Telecom and Internet service providers collect data continuously at various network units to find interesting usage statistics. A Call Data Record (CDR) is generated at the end of each call, and the generated CDRs form a massive stream which is analyzed for interesting patterns such as misuse and fraud. Web logs at Internet Service Providers form a stream which is analyzed for DOS attacks and load usage patterns.

Sensor networks are another area which requires processing information continuously from individual sensors. Sensors are distributed over an area and data is collected from the sensors to process this information. The distributive and decentralized nature of sensor applications poses interesting challenges to processing data streams at each node. Data streams are also created as intermediate results of operators during evaluation of a query plan by an optimizer. Data streams also find application in the financial domain, for example in ATM transaction record analysis and streaming tickers.

All the above application fields produce volumes of data continuously which cannot be stored on disk and have to be processed on-line only once. Thus, data streams have grown to be a significant area of research.

Queries over data streams resemble those in traditional persistent database fields. We are interested in computing aggregate statistics like join size, sum, average over joins and number of distinct elements, and order statistics like quantiles and frequently occurring items. However, the massive size of the data and a single pass scan pose severe problems that do not exist with traditional solutions. The query is
continuous in nature, thus demanding an answer at every time instant, and should reflect any changes or updates to the stream. Munro and Paterson [1] showed that any algorithm that computes quantiles over N items exactly in p passes requires $\Omega(N^{1/p})$ space. One solution is to maintain a snapshot synopsis of the massive data stream, which reduces space though it may not give an exact answer. This has led to the design of approximate algorithms which are based on space-accuracy tradeoffs. Thus, the challenge in designing algorithms for any statistic over data streams is to keep a synopsis of the data in space sub-linear in the size of the stream while keeping the per-item processing time small. For example, Alon et al. [2] give a randomized approximate algorithm which computes the self-join size of a stream in space $O\left(\frac{\log(1/\delta)}{\epsilon^2}(\log N + \log M)\right)$, where N is the size of the stream, M is the size of the universe, $\delta$ is the error probability and $\epsilon$ is the relative error.

The focus then shifts to synopsis generation techniques which save space optimally with good error bounds. Histograms, sampling, wavelets and randomized sketches are some of the techniques currently in the literature.

Added complexity is introduced by putting more constraints on the statistics computation. What may be a trivial computation in one case can become "hard" on adding a simple constraint.

Example 1 Basic Counting [3]: Given a stream of data elements consisting of 0's and 1's, maintain at every time instant the count of the number of 1's in the last N elements.

The unbounded version of the problem would be to keep a count of 1's at all times from the beginning, and a trivial solution requires $O(\log n)$ space for an exact answer or $O(\log \log n)$ space for an approximate answer, for n being the size of the stream. However, under the constraint of a window of the last N elements, it can be
shown that the best possible solution takes $\Theta(N)$ space for an exact answer and $\Theta(\frac{1}{\epsilon}\log^2 N)$ space for an approximate answer.

Data stream models have been developed to broadly encompass the added constraints, which relate more to the way a synopsis is to be collected from the data stream than to the nature of the statistics to be computed. Constraints can be physical, such as different physical locations, or logical in nature. An example of a logical constraint is that a query be answered only for the most recent portion of the data. An example of a physical constraint is quantile computation over the union of streams in a distributed network. Broadly, there are three types of streaming models: the infinite data stream model, the distributed streaming model and the sliding window model. Hybrid versions of the above add further complexity to the design of the required algorithm.

Figure 1-1: Various data stream models

1.1 New Streaming Models

The above models are by no means the only existing models of streaming data. For example, distributed sliding window models exist and calculating statistics over them is a separate challenge. In a distributed setting, a different streaming rate at each node can represent a challenge not covered by the above settings. The weighted sliding window model is more generic than the uniform weight sliding window model.

Interesting questions to ask are: Are the algorithms devised for each particular streaming model really a new class of algorithms, or can we find an underlying
theme to the design of such algorithms? For example, can a new algorithm that improves space bounds for a particular statistic calculation in one streaming model class help improve results in another, or are they all fundamentally different? Another question concerns, for example, the problem of a join between streams arriving at different rates. Does this problem require defining a new model? Or can we treat this constraint of different streaming rates like an operator applied to the data before the join is processed, much like in relational algebra?

1.2 Synopsis Operators

One of the challenges in designing a data stream query management system (DSMS) is to integrate streaming operators with existing relational operator based architectures. These streaming operators should be nonblocking in nature. Also, the huge size of intermediate results demands a small-space internal state from these operators. Approximate solutions using small-space stream synopsis operators are used to answer such queries [4, 5]. In distributed stream architectures, that is the only solution. The challenge with DSMS architecture is to find useful structure in these synopsis operators to use them in a generic manner.

1.3 Our Contribution

With the above context, we show that algorithms associated with the infinite streaming, distributed and sliding window models have an underlying algebraic structure of linearity that is utilized by any algorithm to efficiently compute the statistics. We prove that any streaming algorithm encodes a linear operator. We link the distributed streaming model to the infinite stream setting, prove equivalence between them, and arrive at a tight bound between the space requirements of both models in computing a statistic. Window-constrained streaming algorithms are also shown to utilize linearity in a fundamental way.

Finally, we look at applications of the underlying linearity structure and propose that the same algorithm can be used in multiple streaming model instances by replacing one linear operator, corresponding to synopsis creation specific to that model, by another. We call this compositionality and propose a clean component structure for the design of algorithms over the set of streaming models.

1.4 Organization of Thesis

Having defined the motivation for this research, here is an outline for the rest of the thesis. In Chapter 2, we discuss previous work related to streaming algorithms for various statistics and partial generalizations. In Chapter 3, we give basic definitions for linear operators and a few notations. In Chapter 4, we prove that every infinite streaming algorithm encodes a linear operator at its core and give examples. In Chapter 5, we give precise definitions related to sliding windows and show how linearity is essential in designing any sliding window algorithm. In Chapter 6, we show that distributed streaming algorithms are a generalization of the single-party infinite stream algorithms, with tight space bounds with respect to the unbounded stream model, and the proof reveals that linearity is essential to arrive at the efficient bounds. In Chapter 7, we discuss the implications of the linear structure of these operators and how existing research has been using these properties while missing the underlying structure. In Chapter 8, we provide an initial framework towards the design of streaming algorithms in general. Finally, in Chapter 9, we summarize the observations and contributions presented in this thesis.

CHAPTER 2
RECENT WORK

Small workspace and single pass scan for query answering on streams are fundamental requirements on any algorithm which answers that query. Since streams can be massive in size, a space cost that is not sublinear in the size of the stream is infeasible for many applications. Munro and Paterson [1] showed that any algorithm that computes quantiles exactly in one pass requires $\Omega(N)$ space. Alon et al. [6] showed that computing frequency moments on streams exactly would require $\Omega(N)$ space. This prompted research to develop approximate algorithms which save space while allowing for errors.

An $(\epsilon, \delta)$ streaming algorithm for computing any function f is an approximate algorithm which takes as input an error parameter $\epsilon$ and a confidence parameter $\delta$ and computes an $\epsilon$-approximate statistic f over the entire stream seen so far with probability at least $1 - \delta$.

Streaming models have been introduced to represent various physical and logical constraints over which to compute approximate answers to the statistics. We list the properties of each model and recent advances in statistic computation over the basic stream models.

2.1 Streaming Models

2.1.1 Infinite Stream Data Model

This is the basic model where the statistic is computed over the entire stream seen so far at one location in one pass. (This model can be easily extended to incorporate multiple streams arriving at the same location from different sources. Henceforth we will consider only a single source stream, but that does not limit the scope of the model.) Thus the size of the stream over which a query has to be answered can go to infinity. This model demands the following properties from a synopsis: small space, orderlessness and one-pass construction. Orderlessness implies that the answer computed by the algorithm has to be the same irrespective of the order in which the input is seen. Examples include computing frequency moments and join sizes [2, 7-10], order statistics such as quantiles [11-13] and frequent items [14].

Let us look at two of the examples in more detail:

Example 2 AMS sketch for L2 norm computation. AMS sketches are random linear projections of the stream, viewed as a vector whose size is the size of the universe U and whose ith position holds the frequency of the universe element indexed by i. A sketch is created as follows:

1. First select a family of four-wise independent binary random variables $\xi_i$, $i = 1, \ldots, U$, where each $\xi_i \in \{-1, 1\}$ and $P[\xi_i = 1] = P[\xi_i = -1] = 1/2$.
2. On arrival of a new item i, a count X, maintained at all times, is incremented by $\xi_i$. Thus, $X \leftarrow X + \xi_i$.
3. On querying, we define another random variable, $Z = X^2$. It can be shown that $E[Z] = \sum_i f_i^2$, where $f_i$ is the frequency of item i. This is an unbiased estimate of the second frequency moment of the data stream seen so far.
4. To further improve the quality of estimation, we keep $s_1 s_2$ such Z samples, take $s_2$ averages of $s_1$ samples each and, from this set of averages, pick the median value. This is called the boosting technique to reduce variance.

The space used is sublinear, $O\left(\frac{\log(1/\delta)}{\epsilon^2}(\log N + \log M)\right)$, where N is the size of the stream, M is the size of the universe, $\delta$ is the error probability and $\epsilon$ is the relative error.

Example 3 Quantiles using GK sketch [12]. The GK algorithm is a deterministic approximate algorithm to compute quantiles from a stream. The synopsis is created as follows:

1. Maintain sample values such that any one sample value covers a range of at most $2\epsilon N$ values between it and the prior value in the sample. Thus, there can be an error of at most $\epsilon N$.
2. On arrival of a new element x, a tuple is created. The second element of the tuple gives the capacity of this tuple, and the invariance condition on the combination of the second and third elements of the tuple guarantees that the number of elements covered between the tuple and the tuple previous to it is not more than $2\epsilon N$.
3. If the space bound is exceeded, the synopsis is compressed such that the above invariance property holds for the compressed tuples.
4. A query is answered by finding the tuple in the synopsis with rank closest to the desired rank.

The space used is sublinear, $O(\frac{1}{\epsilon}\log(\epsilon N))$, where $\epsilon$ is the relative error and N is the size of the stream, and the method is deterministic. A restriction of this algorithm is that it does not handle deletions.

2.1.2 Sliding Window Model

Many applications require decision making based on statistics over stream elements with more weight given to recent stream elements. The querying is also expected to be continuous. Examples include financial stock ticker information over a 5 minute window, or computing web traffic statistics over the last day from web log data. To address this, the sliding window model has been introduced [15]. A timestamp-based sliding window of size N consists of all data elements whose arrival timestamp is within an interval N of the current timestamp. A count-based sliding window of size N consists of the N most recent data elements that have arrived so far. The more general sliding window model, where a weight function is assigned to the window, was introduced by Cohen et al. [16].
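As a point of reference for the count-based definition, here is a minimal sketch of an exact counter for the Basic Counting problem of Example 1. It stores the whole window, so its space grows linearly with N, which is the cost that the approximate synopses try to avoid. The class name `ExactWindowCounter` and its methods are illustrative and are not taken from the cited work.

```python
from collections import deque


class ExactWindowCounter:
    """Exact count of 1's among the last n bits of a 0/1 stream."""

    def __init__(self, n):
        if n <= 0:
            raise ValueError("window size must be positive")
        self.n = n
        self.window = deque()  # the bits currently in the window, oldest first
        self.ones = 0          # number of 1's currently in the window

    def add(self, bit):
        # Admit the new bit, then expire the oldest one if the window overflows.
        self.window.append(bit)
        self.ones += bit
        if len(self.window) > self.n:
            self.ones -= self.window.popleft()

    def count(self):
        return self.ones
```

Each update takes constant time, but the deque holds up to N bits.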

The challenge in the sliding window model is to forget the expired items from the window as new items arrive and old items expire. Even the simple problem of Basic Counting, as described above, can be solved exactly in space $\Theta(N)$, where N is the size of the window. However, many applications require memory usage sublinear in the size of the window, and the window size N can be large too. Following are a few synopsis techniques prevalent in the literature that address the problem of forgetting efficiently in an approximate manner:

Example 4 Basic Window [17]: In this scheme, the sliding window is subdivided equally into shorter, basic windows. Arriving elements are kept in a basic window until it is filled, and then a new basic window is used. A query is answered by adding up the contributions of all filled basic windows. A basic window's contribution is added as long as it has even one element that is active. Contributions of new elements are ignored until their basic window is filled. Then, the contribution of the oldest window is subtracted and that of the newest one added. However, this does not give a close error bound.

Figure 2-1: Exponential histogram constructed over binary stream
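The basic window scheme of Example 4 can be sketched as follows, assuming a count-based window whose size is a multiple of the basic window size and taking the sum as the statistic. The names `BasicWindowSum`, `window` and `basic` are hypothetical and chosen only for this illustration.

```python
from collections import deque


class BasicWindowSum:
    """Approximate sliding-window sum using equal-sized basic windows."""

    def __init__(self, window, basic):
        if basic <= 0 or window <= 0 or window % basic != 0:
            raise ValueError("window must be a positive multiple of basic")
        self.basic = basic
        self.max_filled = window // basic
        self.filled = deque()  # sums of filled basic windows, oldest first
        self.partial = 0       # sum of the basic window being filled
        self.pending = 0       # number of elements in the partial window
        self.total = 0         # sum over all retained filled windows

    def add(self, value):
        self.partial += value
        self.pending += 1
        if self.pending == self.basic:
            # The newest basic window is full: add its contribution.
            self.filled.append(self.partial)
            self.total += self.partial
            self.partial = 0
            self.pending = 0
            if len(self.filled) > self.max_filled:
                # The oldest basic window has fully expired: subtract it.
                self.total -= self.filled.popleft()

    def query(self):
        # Elements of the partially filled basic window are ignored.
        return self.total
```

With `window=4`, `basic=2` and the stream 1, 2, ..., 6, the answer lags behind between fills and equals the exact window sum 18 once the third basic window is filled. Only `window // basic` sums are stored, at the price of an error that depends on the contents of one basic window.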

Example 5 Exponential Histograms (EH) [3]: This scheme is adaptive and gives answers which are always within a relative error bound. New elements are kept in single-element buckets and, as the elements get older, they are merged together into bigger buckets. Thus bucket sizes are exponentially increasing. A size invariant is always maintained such that the contribution of the elements in the oldest bucket never exceeds $\epsilon$ times that of the other, more recent buckets. The sizes of the buckets are kept exponentially increasing so that the number of buckets is logarithmic in the size of the window. Expired buckets are dropped. In statistics computation, the contribution of the oldest bucket is ignored if its timestamp has expired. This scheme has been proven to give tight space bounds.

Example 6 Waves [18]: This is an improvement over the EH scheme: merging in EH takes $O(\log N)$ time per element, and Waves avoids the merge step by keeping several buckets at each dyadic level of the most recent elements. Each bucket is marked by the rank of the 1 element (the 1-rank) as seen in the sliding window stream. The composite structure looks like a wave and, since buckets do not have to merge together, the update time per element is $O(1)$. The buckets from different levels overlap each other but maintain the same invariant as above: the contribution of the oldest bucket is never more than $\epsilon$ times that of the other buckets that cover the window without overlap. Expired buckets are dropped.

Figure 2-2: Wave synopsis. The x-axis shows the 1-ranks and, on the y-axis, each dyadic level i is labelled by $2^i$.

Datar et al. [3] showed a technique by which algorithms for computing a statistic f over the infinite streaming model can be applied to the sliding window model, provided the function f satisfies a few constraints listed by them. This, to our knowledge, is a first attempt at generating automatic guarantees and a plug-and-play framework for computation on the sliding window model given error guarantees for infinite streaming algorithms. In this thesis, we show that such a plug-and-play framework is possible due to the underlying algebraic structure of linearity, and we show applications of it.

2.1.3 Distributed Stream Model

In this model, exemplified by sensor networks, streams arrive at various nodes of a network and the statistic is to be computed over the union of these streams. It is impractical to send all the stream data to one central point due to communication costs and the single point of failure. So each node maintains a synopsis in limited workspace for its own stream. The challenges posed by this model are (a) small-space computation at each node and (b) a low number of communication bits between nodes. Demands on the synopsis by the model also include taking care of error in aggregation at each node due to the various diffusion speeds of the network. Examples
include computing aggregate statistics over networks [19], distributed set expression cardinality [20] and distributed distinct value estimation [18].

Example 7 Distributed Top-K monitoring [21]: Each node in the distributed setting receives a stream, and the top-k frequent items are to be identified over the union of these streams. Each node keeps an approximate top-k synopsis over its stream and relays this synopsis to the central node at appropriate times. The central node adds up these individual node synopses to arrive at a top-k synopsis over the union of these streams. The challenges are to add up these synopses to get a distributed top-k synopsis over the union of the streams, and to keep communication costs at a minimum by relaying synopsis data to the central node only when the changes in the underlying stream's top-k are sufficient to warrant informing the central node.

2.2 Synopsis Operators

One of the challenges in designing a data stream query management system (DSMS) is to integrate streaming operators with existing relational operator based architectures. Thus, it should be able to handle queries over streams and/or relations. The problem arises in the nature of the operators that are to be evaluated. Pipelined joins take a lot of space in handling intermediate data. Relational aggregation based operators such as Min, Sum, Avg and joins are blocking operators, i.e., they need to see all the input before they output one tuple of output. Clearly, in the case of streams, this is infeasible due to the large amount of stream data and the need for continuous evaluation of queries.

Various schemes have been designed in existing architectures for streams to unblock such operators. Windowed operators are used so that only the recent stream data is used in answering queries. Other ways include maintaining incremental views. However, these solutions fail when the size of the window is large and cannot be
stored in main memory. In those cases, approximate solutions using small-space stream synopsis operators are used to answer such queries [4, 5]. In distributed stream architectures, that is the only solution.

Ioannidis and Poosala [22] suggested a histogram based algebra to answer approximate queries in a relational setting by considering histogram buckets to represent relation-like tuples. Each bucket forms a tuple (lo, hi, distinct count, average) and the same query tree can be run on this set of histogram tuples to get approximate answers. Chakrabarti et al. [23] similarly proposed a wavelet algebra to answer queries in an SQL manner. The challenge with DSMS architecture is to find interesting structure in these synopsis operators to create their libraries and use them in a generic manner. In this thesis, we describe a powerful property of these synopsis operators which is useful in query engine design.

CHAPTER 3
LINEAR OPERATORS AND DATA STREAM ALGORITHMS

We propose to look at any data stream algorithm for statistic f as equivalent to $z(L(D))$, with D a multiset of items $\in$

Let us take an example:

Example 8 Computing Average on a stream. Let the stream of elements arrive as $a_1, a_2, a_3, \ldots, a_n$. Let $\mathrm{Average} = z(L_{avg}(D))$. The linear operator $L_{avg}$ for Average is

$L_{avg}(\emptyset) = \langle 0, 0 \rangle$
$L_{avg}(a_i) = \langle a_i, 1 \rangle$

Let $L_{avg}(D_1) = \langle s_1, c_1 \rangle$ and $L_{avg}(D_2) = \langle s_2, c_2 \rangle$ for $D_1$, $D_2$ parts of the stream. Then $\oplus$ is defined as

$L_{avg}(D_1) \oplus L_{avg}(D_2) = \langle s_1 + s_2, c_1 + c_2 \rangle$

and z is $z(\langle s, c \rangle) = s / c$.

The Average for the stream is computed by adding the linear operators for each element of the stream as it arrives, and it can be continuously answered. The linear operator for Average has interesting properties. It does not require stream elements to arrive in any particular order. This is due to the fact that $\oplus$ on $L_{avg}$ is commutative and associative. Thus, it is order independent. If the stream is split into two arbitrary streams, the linear operator for Average can work on each of them separately, and the average over the union of the two streams can be computed using $\oplus$. Thus, it can handle distributed computation.

We can see that linear operators are useful in data stream computations, as they have the properties of order independence and allow distributed computation. With the above definition and properties of linear operators, we now look at each of the three basic models and identify the role of linear operators in the class of algorithms specific to each.
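The Average operator of Example 8 can be sketched in a few lines. The function names are hypothetical: `lift` plays the role of $L_{avg}$ on a single element, `combine` the role of $\oplus$, and `finalize` the role of z. The synopsis is assumed to be a (sum, count) pair.

```python
from functools import reduce

IDENTITY = (0, 0)  # synopsis of the empty stream: (sum, count)


def lift(a):
    """Synopsis of a single stream element."""
    return (a, 1)


def combine(x, y):
    """The addition operator: component-wise sum of two synopses."""
    return (x[0] + y[0], x[1] + y[1])


def finalize(state):
    """The function z: turn a synopsis into the statistic."""
    s, c = state
    if c == 0:
        raise ValueError("average of an empty stream is undefined")
    return s / c


def synopsis(stream):
    """Fold the per-element synopses together, one element at a time."""
    return reduce(combine, map(lift, stream), IDENTITY)
```

For instance, `finalize(combine(synopsis([1, 2, 3]), synopsis([4, 5])))` gives the same result as `finalize(synopsis([1, 2, 3, 4, 5]))`, which is the distributed-computation property described in the text.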

PAGE 23

CHAPTER4INFINITEDATASTREAMSInthissection,welookatalgorithmsthatcomputestatisticsoveraninnitestreamthatarrivesatthesamelocationwherequeryistobeanswered. Denition2. AnalgorithmisaninnitedatastreamDSalgorithmforanyfunctionf:An!Bisanapproximatealgorithmwhichtakesasinputanerrorparameterandacondenceintervalandcomputesan-approximatestatisticfoverentirestreamseensofarwithprobabilityatleast1)]TJ/F23 11.955 Tf 11.955 0 Td[(.AnyDSalgorithmshouldpossesspropertiesofsmall-spacecomputation,one-passandorderlessassumptiononarrivalofinput.Orderlessnessimpliesthatalgorithmmakesnoassumptionaboutthedistributionofthedata.Wenowshowthatinnitedatastreamalgorithmsuselinearoperatorsattheircoretoformsynopsis. Theorem1. Anyinnitedatastreamalgorithmcanbeseenasanapplicationoflinearoperatortoadatastream. Proof. WegiveourproofbyconstructingsuchalinearoperatorgivenaDSalgorithm.Weshowthatforanystatisticfcomputed,thezLDoperationofanyalgorithmhasthelinearsynopsisconstructionpartascoreofit.LettheTuringmachinethatsimulatestheDSalgorithmbeM.WedeneafunctionSD1;D2=D1kD2,whereDcorrespondstostateofthemachineafterinputD.Wedeneourlinearoperatorasfollows:IdentityelementL=4.1 16

where $\sigma(\emptyset)$ is the state of the machine when there is no input.

Addition operator: $L(D_1) \oplus L(D_2) \stackrel{\mathrm{def}}{=} S(D_1, D_2)$ (4.2)

We now show the commutativity and associativity of the $\oplus$ operator.

$$L(D_1) \oplus L(D_2) = S(D_1, D_2) = \sigma(D_1 \| D_2) = \sigma(D_2 \| D_1) = S(D_2, D_1) = L(D_2) \oplus L(D_1)$$

The key step is to note that, due to the orderless property of the DS algorithm machine, $\sigma(D_1 \| D_2) = \sigma(D_2 \| D_1)$. That is, the final state after first inputting $D_1$ followed by $D_2$ is the same as the final state after the sequence $D_2$ followed by $D_1$. We prove associativity by a similar argument. Hence, we have defined a linear operator given a machine that simulates a DS algorithm. It is easy to prove the other direction: if we have a linear operator for a given statistic computation, it can easily be written as a DS algorithm.

This theorem says that any infinite data streaming algorithm applies a linear operator to the data stream to collect a synopsis and compute statistics on it. The properties of orderlessness and one-pass scan are derived from the linear operator involved.

Examples

We show examples of algorithms proposed in the literature to compute statistics over infinite streaming data and derive the linear operator that creates the synopsis from the data stream.

Example 9 Histogram based range sum computation on a stream [22]. Given buckets $B_1, B_2, \ldots, B_m$, each maintaining a tuple (lo, hi, totalcount), and a function $h$ that hashes new items to the appropriate bucket, the linear operator $\ell$ that works on each item as it arrives is as follows:

$$\ell(x_i) = B_{h(i)}.\mathrm{totalcount} + x_i$$

The query is answered by locating the buckets which fall in the range given by the predicate and adding up the total counts of those buckets. The $\oplus$ operator here is the arithmetic $+$, and the operation is commutative and associative.

Example 10 L2 norm computation using AMS sketches [2]. As described previously in Example 2, AMS sketches are constructed by projecting incoming stream elements onto random binary vectors. The linear operator $\ell$ that works on each item as it arrives is as follows:

$$\ell(x_i) = \xi_i x_i$$

where $\xi_i$, $\forall i \in U$, is the set of random 4-wise independent $\pm 1$ variables. The $\oplus$ operator is the arithmetic $+$.

Example 11 $L_p$ norm for $0 < p \le \epsilon / \log U$ [9]. Distributions with stability parameter $p$ have the following property: if random variables $X_1, X_2, \ldots, X_n$ have stable distributions with parameter $p$, then $a_1 X_1 + a_2 X_2 + \cdots + a_n X_n$ is distributed as $\left(\sum_i |a_i|^p\right)^{1/p} X_0$, where $X_0$ is a random variable with $p$-stable distribution. The Cauchy distribution is stable with $p = 1$. The Gaussian distribution is stable with parameter $p = 2$. There exist formulae to generate stable distributions for $0 < p \le 2$.

Thus, given independent random variables $x_{i,j}$ from a stable distribution with parameter $p$, for any arriving element $a_i$ we maintain

$$sk_j(a) = \sum_{i=1}^{n} x_{i,j}\, a_i$$

and we are guaranteed that each $sk_j(a)$ is distributed as $\left(\sum_i |a_i|^p\right)^{1/p} X$, where $X$ is a random variable with stable parameter $p$. By taking the median of all entries $|sk_j(a)|^p$, we get a good estimate for the $L_p^p$ norm.

To compute $L_p$, we thus construct beforehand independent random variables $x_i$ from a stable distribution with parameter $p$. On arrival of a new item $a_i$, the linear operator works as follows:

$$\ell(a_i) = x_i a_i$$

The $\oplus$ operator is the arithmetic $+$ operator. The $L_p$ norm is calculated by maintaining several such synopses and picking the median of them:

$$L_p(D) = \mathrm{med}_j \left| \sum_{i=1}^{n} x_{i,j}\, a_i \right|$$

Example 12 Computing SUM and other aggregates over join [7]. For a query such as SELECT SUM($R_i.A_j$) FROM $R_1, R_2, \ldots, R_r$ WHERE $\mathcal{E}$, where $\mathcal{E}$ gives the join condition, and under the assumptions on acyclicity of attributes given by [7], AMS sketches are extended to compute aggregates over joins. The linear operator for all but $X_1$ works as

$$\ell(x_i) = x_i \prod_j \xi_j$$

where each $\xi_j$ is a random 4-wise independent $\pm 1$ variable over one set of attribute joins.

For $X_1$, the linear operator works as

$$\ell(x_i) = \mathrm{SUM}_{A_j}(x_i) \prod_j \xi_j$$

The $\oplus$ operator is the arithmetic $+$ operator, so it is associative and commutative. The query is answered by

$$Q_{SUM} = \prod_{k=1}^{r} X_k$$

Example 13 Point query on data stream using wavelet decomposition [23, 24]. Wavelet transforms are useful in capturing trends in data series. Coefficients are computed by a series of low pass filters, resulting in averaging coefficients, and high pass filters, resulting in difference coefficients over the data, and this can be done in one pass. The coefficients form a synopsis of the incoming data stream. Space saving is obtained by retaining only coefficients above a certain threshold value. To answer the point query, a column vector is constructed with all entries 0 except the entry for the point being queried, whose value is 1. The query is answered by computing the inner product of the wavelet coefficient synopsis vector with the wavelet synopsis vector of the point query vector.

If $V$ is the vector of wavelet synopsis coefficients and $\psi$ is the vector of wavelet basis vectors, the linear operator $\ell$ that works on each item as it arrives is as follows:

$$\ell(x_i) = \langle \psi, [0\; 0\; 0\; 0\; \ldots\; 1 \cdot x_i\; 0\; 0] \rangle$$

and the synopsis is incremented as

$$V = V + \ell(x_i)$$

The $\oplus$ operator is the vector addition operator, and it is commutative and associative.
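To make the sketch-based examples of this chapter concrete, the following is a minimal sketch of the AMS-style operator of Example 10, $\ell(x_i) = \xi_i x_i$ with $\oplus$ as arithmetic $+$. Several details are assumptions made for illustration and are not taken from the thesis or from [2]: the class name `AMSSketch`, the use of a random cubic polynomial modulo a prime to obtain (approximately) 4-wise independent signs, and the plain mean of squared counters as the estimate of the squared L2 norm.

```python
import random

PRIME = 2_147_483_647  # Mersenne prime 2^31 - 1, used as the hash modulus


class AMSSketch:
    """Vector of counters, each holding sum_i xi_i * x_i for its own sign family."""

    def __init__(self, num_counters=64, seed=0):
        self.seed = seed
        rng = random.Random(seed)
        # One random cubic polynomial per counter (assumed sign-hash construction).
        self.coeffs = [tuple(rng.randrange(PRIME) for _ in range(4))
                       for _ in range(num_counters)]
        self.counters = [0] * num_counters

    def _sign(self, j, i):
        """xi_i for counter j: +1 or -1, from the low bit of a cubic hash of i."""
        a, b, c, d = self.coeffs[j]
        h = (((a * i + b) * i + c) * i + d) % PRIME
        return 1 if h & 1 else -1

    def update(self, i, x):
        """Apply l(x_i) = xi_i * x_i and add it into every counter."""
        for j in range(len(self.counters)):
            self.counters[j] += self._sign(j, i) * x

    def merge(self, other):
        """The (+) operator: counter-wise addition of two compatible sketches."""
        if self.coeffs != other.coeffs:
            raise ValueError("sketches must share the same sign families")
        merged = AMSSketch(len(self.counters), self.seed)
        merged.counters = [u + v for u, v in zip(self.counters, other.counters)]
        return merged

    def f2_estimate(self):
        """Estimate of the squared L2 norm: mean of the squared counters."""
        return sum(c * c for c in self.counters) / len(self.counters)
```

Because every update is an addition into the counters, the sketch does not depend on arrival order, and sketches built on two parts of a stream can be merged into the sketch of the whole stream.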

Example 14 Quantile query on infinite data stream using the GK method [12]. The GK algorithm for quantile computation is a deterministic approximate algorithm with the best space bounds for a given error. It works by keeping ranges of values in memory and compressing memory by evicting values when memory is full. Thus it has an INSERT and a COMPRESS operation. The linear operator $\ell$ that works on each item as it arrives is as follows:

$$\ell(x) = \text{create tuple}$$

We define our $\oplus$ operator as

$$x \oplus y = \text{INSERT}, \text{COMPRESS}$$

where INSERT and COMPRESS are as defined in [12]. It is easy to see that $\oplus$ is commutative, as the INSERT operation creates a sorted list, and that it has an identity element of 0. We argue associativity as below:

$$(x \oplus y) \oplus z = x \oplus (y \oplus z)$$

This is because the algorithm guarantees that, irrespective of the order in which $x$, $y$, $z$ arrive, the error will be upper bounded by the maximum error, and the equality holds in that respect.

CHAPTER 5
SLIDING WINDOW STREAMS

In this chapter, we concentrate on timestamp based sliding windows. We introduce a definition for a sliding window algorithm and show how linearity is at the heart of any efficient window algorithm.

5.1 Definition

We previously identified a data stream algorithm as having the properties of one-pass scan, sub-linear space efficiency and order invariance in computing the result. Sliding window algorithms should have similar properties. We need to redefine order invariance with respect to a sliding window. The order invariance property of a sliding window algorithm means that, for any order of data within the window bound, the answer computed is the same and does not depend upon the order of the data.

Sliding window algorithms have to answer queries continuously over the moving window $N$, and it can be shown that they cannot exactly answer even the most basic queries in space $O(\log N)$. This is due to the fact that it takes a lot of space to remember accurately which elements contribute to the calculation of the statistic and which to forget. Thus, they introduce error when computing approximate answers in space $O(\log N)$. Exponential Histograms [3], Basic Windows [17] and Waves [18] are some of the common schemes. A common error profile over time is shown in the figure below.

The important thing to notice from such an error profile is that, since sliding window algorithms do an approximate job of forgetting the expired data items from a window of size $N$, the error due to this approximation should not increase with time and should be kept within a bound. We thus identify a constraint on a sliding

window algorithm that we put together in a definition of a sliding window algorithm as below:

Figure 5-1: A typical error profile

Definition 3. Along with the properties of sublinear space, order invariance, and one-pass scan, an algorithm is a sliding window algorithm if, for a window size of $N$, after at most $N$ timestamps, the error goes to zero at least once.

Intuitively speaking, the algorithm should give a perfect answer at least once every $N$ elements inserted. We define the point in the summary window where the error is zero as a synchronization boundary.

Definition 4. A synchronization boundary in a sliding window is a timestamp at which the error due to the algorithm is zero.

Thus, in the above figure, the synchronization boundaries are the timestamps for which the error due to items from that timestamp to the most recent item seen is zero and the algorithm gives a perfect answer. There can be more than one synchronization boundary in the sliding window synopsis. It should be noted that if the algorithm is deterministic, we achieve zero error at least once every $N$ timestamps. If, however, to save on space, the algorithm itself has a randomized

guarantee within certain error bounds, our definition implies that at least once every $N$ timestamps, the error contribution due to the sliding window model is zero.

Example 15 Basic Counting algorithm due to Datar et al. [3]. The problem of Basic Counting, as previously defined in Example 1, is to count the number of 1's in a bit stream for the most recent $N$ items. Datar et al. gave a deterministic algorithm using Exponential Histograms (see Example 5). The synchronization boundaries are the bucket boundaries of the exponential histogram, and the error contribution of elements from the latest up to any of the synchronization boundaries is zero.

Example 16 L2 norm using AMS sketches on sliding window [3]. L2 norm using AMS sketches is a randomized scheme applied to the sliding window scheme. It works as follows: AMS sketches are maintained for each bucket of the Exponential Histogram synopsis. The sketches are composable, and only the sketches of buckets all of whose elements are within the $N$ most recent items are added, and the L2 norm is computed on the combined sketch. There are two sources of error: error due to the approximate computation of the L2 norm using sketches, and error due to imperfectly forgetting the expired items by not including sketches of buckets some of whose elements may be among the most recent $N$ items. Our constraint implies that the error due to imperfectly forgetting the expired items should reduce to zero at least once every $N$ timestamps.

5.2 Synchronization Boundary

This additional constraint on an algorithm to qualify as a sliding window algorithm suggests introducing more than one synchronization boundary per $N$ items scanned, to bring the error down to zero more often. Exponential Histograms and

Basic Windows schemes do just that. Synchronization boundaries for both schemes are the timestamps of the earliest element in each bucket. While a synopsis of the data is being maintained, to answer queries efficiently, data from the synchronization boundary to the latest timestamped entry has to be kept exact, as per the definition of synchronization boundaries. Error is introduced by the inexact information kept before the earliest synchronization boundary, due to incomplete information about expired elements versus active elements.

5.3 Linearity

Thus any scheme that follows the above constraints maintains a separate synopsis for the data between any two synchronization boundaries. Hence, bucketization is a structure inherent to any such scheme.

Example 17 Exponential histograms (Example 5) maintain nonoverlapping buckets where each bucket represents the data synopsis between two synchronization boundaries.

Figure 5-2: Exponential histogram with nonoverlapping buckets

Example 18 Waves synopsis (Example 6) maintains overlapping buckets with synchronization boundaries as bucket boundaries, and several buckets are maintained at each dyadic level.

A sliding window algorithm has to do the following:

1. Incorporate the new item.
2. From the bucket synopsis, compute the result of the query.

Figure 5-3: Waves synopsis with overlapping bucket boundaries

This can be done by defining a linear operator to add a new item to a bucket. If there is more than one synchronization boundary, and hence more than one bucket, the bucket synopses can be merged linearly by defining a suitable plus operator to arrive at the answer. Hence, we can say that any sliding window scheme has to have a linear synopsis creation and merge method at its core.

5.4 Examples

We now show two examples where we derive the linearity structure of algorithms proposed in the literature as applied to sliding windows. We do not deal with how each algorithm creates synchronization boundaries along the sliding window length. That is specific to the algorithm designer and depends on how much error the designer is willing to allow in spacing the synchronization boundaries apart. We show how new elements are incorporated into the synopsis structure and how queries are answered. Mathematically, we show the underlying linear operator and the

$\oplus$ operator. We take two examples, one each for computing aggregation statistics and order statistics.

Example 19 L2 norm computation on sliding windows [3]. AMS sketches are maintained for each bucket of the sliding window, and the L2 norm is computed by adding these sketches together from the active buckets using a technique similar to that for the infinite case (see Example 2). The linear operator $\ell$ that works on each item as it arrives and forms a single bucket is as follows:

$$\ell(x_i) = \xi_i x_i$$

where $\xi_i$, $\forall i \in U$, is the set of random pairwise independent $\pm 1$ variables. The $\oplus$ is the arithmetic $+$, which is used to merge buckets together and to compute the combined synopsis from the active buckets for answering the L2 norm query.

Example 20 Quantile query on sliding windows [25] [12]. A GK synopsis is maintained for each bucket in the sliding window, the synopses of the active buckets are merged together, and the computation is the same as in the infinite streaming model. The linear operator $\ell$ that works on each item as it arrives is the same as defined for the infinite data stream case:

$$\ell(x) = \text{create tuple}$$

We expand the functionality of our $\oplus$ operator by defining two $\oplus$ operators. They are $\oplus_{intrabucket}$ and $\oplus_{interbucket}$. The $\oplus_{intrabucket}$ is the same as the $\oplus$ operator in the infinite stream case.

The other plus operator works on buckets of synopsis and is as follows:

$x \oplus_{interbucket} y$ =
  from MIN(x, y) form tuple T
  INSERT(S, T)
  form tuple T' from MAX(x, y)
  INSERT(S, T')

Commutativity is guaranteed by the MIN function, and associativity holds by an order invariance argument similar to that for $\oplus_{intrabucket}$, as below:

$$(x \oplus_{interbucket} y) \oplus_{interbucket} z = x \oplus_{interbucket} (y \oplus_{interbucket} z)$$

This is because the algorithm guarantees that, irrespective of the order in which the buckets $x$, $y$, $z$ are formed, the error will be upper bounded by the maximum error, and the equality holds in that respect.
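The bucket structure described in this chapter can be illustrated with a deliberately simple scheme. The sketch below is an assumption-laden illustration, not the Exponential Histogram of [3] or the GK-based scheme of [25]: it keeps one partial sum per fixed-width time bucket (closer in spirit to Basic Windows), adds each new item to the newest bucket, and answers a query by adding the synopses of all buckets that are not entirely expired. The class name `BucketedWindowSum` and its parameters are hypothetical.

```python
from collections import deque


class BucketedWindowSum:
    """Approximate sum over the last `window` timestamps using fixed-width buckets."""

    def __init__(self, window, bucket_width):
        if window % bucket_width != 0:
            raise ValueError("window must be a multiple of bucket_width")
        self.window = window
        self.width = bucket_width
        self.buckets = deque()  # (bucket_start_time, partial_sum), oldest first

    def insert(self, t, x):
        """Linear operator on one item: add x into the bucket that covers time t."""
        start = t - t % self.width
        if self.buckets and self.buckets[-1][0] == start:
            s, total = self.buckets[-1]
            self.buckets[-1] = (s, total + x)
        else:
            self.buckets.append((start, x))
        self._expire(t)

    def _expire(self, now):
        # Drop buckets whose last timestamp lies outside (now - window, now].
        while self.buckets and self.buckets[0][0] + self.width - 1 <= now - self.window:
            self.buckets.popleft()

    def query(self, now):
        """Merge (add) the synopses of all remaining buckets."""
        self._expire(now)
        return sum(total for _, total in self.buckets)
```

With a window of 8 and buckets of width 4, a stream of ones gives the exact answer 8 whenever the window start coincides with a bucket boundary (a synchronization boundary), and an overestimate in between because the oldest bucket is only partially expired.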

CHAPTER 6
DISTRIBUTED STREAMS

With respect to our definition of a data stream algorithm as equivalent to $z(L(D))$, the distributed stream model can be seen as a generalization of the infinite stream, single receiver model. Separate streams arrive at each node of the network, and the statistic is to be computed over the union of all these streams. In this model, since transmitting all the stream items to a central point involves a lot of communication and is not desirable, each node computes a small-space synopsis over the stream it receives, and at least the coordinating node receives synopsis information from all other nodes and computes the statistic over the union of the synopses across all nodes. The key challenges here are to maintain a small-space synopsis at each node and to keep the communication bits at a minimum.

It is easy to see that a distributed algorithm for statistic computation can easily be applied to a single-party streaming model with the same space-error bounds. However, the reverse is not true. An algorithm designed for the single-party streaming model cannot be trivially applied to distributed streams. However, it has been shown that the space costs over both models are tight with respect to each other [26].

Formalizing this, consider a single-party streaming setting where a function $f$ is to be computed over $n$ streams $X_1, X_2, \ldots, X_n$ interleaved in random order to produce one single stream $X$. Let the space cost of a deterministic computation be $SSC(f)$. The distributed setting equivalent of this is to compute $f$ over the streams $X_i$ assigned to each node $i$. Let the space cost of deterministically computing $f$ over this be $DSC(f)$. Gibbons et al. [26] show that:

Theorem 2. For any $n \ge 1$ and any function $f$, $SSC(f) \le DSC(f) \le n \cdot SSC(f)$.

This theorem says that the cost of distributed computing over streams is at least that of computing at a single location, but is tightly bounded by the single-stream cost times a factor of $n$. It is not clear if $DSC(f)$

the central node which has to figure out how to combine them such that

$$L(D) = L(D_1) \oplus L(D_2) \oplus \cdots \oplus L(D_n)$$

Figure 6-1: Statistic computation on distributed streams

Thus, each node computes its own synopsis from its incoming stream and then, through a protocol, transmits the synopsis to a central point, which then computes the statistic on them. Mathematically, if there are $n$ nodes, and each node $i$ receives data $D_i$, then the statistic is computed as

$$\text{Statistic} = z(L(D_1) \oplus L(D_2) \oplus \cdots \oplus L(D_n)) = z(L(D_1 \cup D_2 \cup \cdots \cup D_n)) \qquad (6.1)$$

Thus, the underlying structure of any statistic computation over the distributed model is linear, as synopses from various nodes are combined and the final result is irrespective of the order of combination, since we do not make any assumption on which stream should arrive at which node. That gives us properties

of commutativity and associativity with the linear operator embedded within such a computation.
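Equation 6.1 can be illustrated with the histogram synopsis of Example 9. In the sketch below, each node folds its own stream into a vector of bucket totals and a coordinator adds the vectors. The class name `RangeSumHistogram`, the fixed bucket boundaries shared by all nodes, and the use of binary search in place of the hash function $h$ are assumptions made for this illustration only.

```python
from bisect import bisect_right
from functools import reduce


class RangeSumHistogram:
    """Bucket totals over fixed boundaries; bucket k covers [b_k, b_{k+1})."""

    def __init__(self, boundaries):
        self.boundaries = list(boundaries)
        self.totals = [0] * (len(self.boundaries) - 1)

    def add(self, x):
        """Linear operator on one item: add x to the total of its bucket."""
        k = bisect_right(self.boundaries, x) - 1
        if not 0 <= k < len(self.totals):
            raise ValueError("item outside the histogram range")
        self.totals[k] += x

    def merge(self, other):
        """The (+) operator: bucket-wise addition of two node synopses."""
        if self.boundaries != other.boundaries:
            raise ValueError("histograms must share bucket boundaries")
        merged = RangeSumHistogram(self.boundaries)
        merged.totals = [u + v for u, v in zip(self.totals, other.totals)]
        return merged

    def range_sum(self, lo, hi):
        """z: add the totals of buckets lying entirely inside [lo, hi)."""
        return sum(t for k, t in enumerate(self.totals)
                   if lo <= self.boundaries[k] and self.boundaries[k + 1] <= hi)


def node_synopsis(boundaries, stream):
    """Synopsis L(D_i) computed locally at one node."""
    hist = RangeSumHistogram(boundaries)
    for x in stream:
        hist.add(x)
    return hist


def coordinator(synopses):
    """L(D_1) (+) ... (+) L(D_n), combined in whatever order the synopses arrive."""
    return reduce(lambda a, b: a.merge(b), synopses)
```

Only the bucket totals travel to the coordinator, and the merged synopsis equals the one a single node would have built over the union of the streams.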

CHAPTER 7
LINEAR OPERATOR AS ABSTRACT DATA TYPE

Linear operators are essential for any small-space streaming algorithm design. They have nice properties that are used extensively in the existing literature without being explicitly mentioned. Recent research literature abounds in applications of the same algorithm to different streaming models by defining or modifying the $\oplus$ operator without realizing the power of the underlying structure. Nath et al. [27] describe order insensitivity and duplicate insensitivity of the synopsis as required for distributed computation. Duplicate insensitivity is a form of constraint that can be handled by appropriately defining a $\oplus$ operator. The union property of Count-Min and FM sketches used by Hadjieleftheriou [28] to create a distributed synopsis is inherent in their linearity structure.

A linear operator can be viewed as an abstract data type (ADT) with a null element, a $\oplus$ addition and a get-query function. There are substantial benefits to viewing a linear operator as an ADT:

- Various linear operators as ADTs will have a similar interface which can be used by an extensible database system like PREDATOR [29] to answer streaming queries.
- Linear operators as ADTs lead to generic algorithms with predefined interfaces.
- Linear operators as ADTs have the added property that they are composable, commutative and associative, and thus any streaming query engine can utilize these properties to obtain flexible optimizations.

A linear operator ADT has an interesting relation to the user-defined relational aggregate operator. It can define INITIALIZE, ITERATE and TERMINATE functions and can answer queries continuously.

Example 21

QUERY: SELECT COUNT(*) FROM X1, X2 WHERE X1.a = X2.a

LinearOperator(x1_i, x2_i)
  INITIALIZE:
    state: Y_X1 = 0; Y_X2 = 0;
  ITERATE:
    UPDATE state: Y_X1 += xi_i * x1_i, Y_X2 += xi_i * x2_i;
    RETURN output: Y_X1 * Y_X2;
  TERMINATE:
    RETURN output: Y_X1 * Y_X2;

This linear operator follows the AMS scheme of calculating the size of a join, where $\xi_i$ is the family of four-wise independent binary random variables, with each $\xi_i \in \{-1, 1\}$ and $P[\xi_i = 1] = P[\xi_i = -1] = 1/2$.

While user-defined relational aggregate operators do not allow for composability in their definition, even though the aggregate function itself is composable, linear operators as ADTs have this property in their definition and can be used more efficiently. Special properties of individual synopsis generation techniques proposed previously by researchers are actually due to linear operators.

Linear operators with their $\oplus$ can be used to process incoming stream data in a parallel manner. Thus the stream can be divided into multiple parts, each processor can work on a part of the stream, and the final result can be obtained by adding the synopsis operators together. Due to their order insensitivity and the $\oplus$, linear operators are easily extensible to distributed stream computation.
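One possible reading of the ADT described in this chapter, with Example 21 as an instance, is sketched below. The interface names (`zero`, `lift`, `plus`, `answer`, `fold`) are assumptions chosen for this illustration; they stand for the null element, the per-item linear operator, the $\oplus$ addition and the get-query function. The sign family is again an assumed cubic-polynomial hash, and each arriving tuple is treated as a unit-frequency update for its join attribute value, which is a simplification of the $\xi_i x_i$ update in the example.

```python
import random
from abc import ABC, abstractmethod
from functools import reduce


class LinearOperatorADT(ABC):
    """Hypothetical ADT interface: null element, lift, plus, and query."""

    @abstractmethod
    def zero(self): ...

    @abstractmethod
    def lift(self, item): ...

    @abstractmethod
    def plus(self, a, b): ...

    @abstractmethod
    def answer(self, synopsis): ...

    def fold(self, items):
        """INITIALIZE with zero, ITERATE with plus(acc, lift(item))."""
        return reduce(lambda acc, it: self.plus(acc, self.lift(it)),
                      items, self.zero())


class JoinSizeOperator(LinearOperatorADT):
    """AMS-style join-size synopsis (Y_X1, Y_X2) for X1 joined with X2 on attribute a."""

    PRIME = 2_147_483_647

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.coeffs = tuple(rng.randrange(self.PRIME) for _ in range(4))

    def _xi(self, v):
        # Assumed sign hash: low bit of a random cubic polynomial modulo a prime.
        a, b, c, d = self.coeffs
        return 1 if ((((a * v + b) * v + c) * v + d) % self.PRIME) & 1 else -1

    def zero(self):
        return (0, 0)

    def lift(self, item):
        relation, value = item          # e.g. ("X1", 5): a tuple of X1 with a = 5
        xi = self._xi(value)
        return (xi, 0) if relation == "X1" else (0, xi)

    def plus(self, a, b):
        return (a[0] + b[0], a[1] + b[1])

    def answer(self, synopsis):
        # TERMINATE: Y_X1 * Y_X2, a single-counter estimate of the join size.
        return synopsis[0] * synopsis[1]
```

A query engine written against `LinearOperatorADT` could fold, split and merge synopses without knowing which statistic is being computed, which is the composability benefit argued for above. A single counter gives a high-variance estimate; a practical scheme would keep many independent copies.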

CHAPTER 8
TOWARDS A FRAMEWORK OF DESIGN OF STREAMING ALGORITHMS

The linear nature of these synopsis operators gives us insights into a framework for designing streaming algorithms. We can componentize the design by selecting the right linear operator for the stream given the model constraints, and separately selecting the appropriate algorithm for the statistic computation given our space-error bounds. There are immediate benefits to this framework. For varying model constraints but the same statistic computation, we do not have to come up with an altogether new design; we only have to adjust the linear synopsis operator to reflect the new constraint.

Figure 8-1: Componentized design of streaming algorithms

Example 22 Basic Counting over any weighted sliding window $N$, with weight function $g$, can be estimated using an Exponential Histogram with window size $N$ [16].

Cohen et al. [16] prove that Basic Counting over any weighted function on sliding windows can be approximated by a linear combination of predefined weights on the bucket counts of an Exponential Histogram. Thus, for this new model, we only need to replace our linear operator with this new weighted linear operator and use essentially the same algorithm for the statistic computation that we would use over the uniformly weighted sliding window model.

Figure 8-2: Weighted sliding window computation using Exponential histograms

For example, if the weights are 8, 5, 3, 2 respectively for a time window extending over the four buckets, the result of Basic Counting will be

$$\text{BasicCount} = 8 \cdot L(D_4) \oplus 5 \cdot L(D_3) \oplus 3 \cdot L(D_2) \oplus 2 \cdot L(D_1)$$

This example shows that, by modifying our linear operator for synopsis collection to adjust for the constraint of a weighted function over the window, we avoid coming up with a new algorithm.

Another benefit of the component framework of design is that if an efficient algorithm is found for any model, it can be easily adapted to other models due to its linear nature. For example, the GK algorithm for computing quantiles over

an infinite stream can be modified to compute quantiles over sliding windows by appropriately adjusting the linear operator corresponding to the synopsis (see [25]). Datar et al. [3] have given automatic error guarantees for the computation of a statistic $f$ over sliding windows, given its error guarantees for the infinite streaming case, by a similar composition of linear operators.
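The weighted Basic Counting computation of Example 22 reduces to a weighted sum of per-bucket synopses. The following worked sketch uses the weights 8, 5, 3, 2 from the example; the bucket contents and the function names `bucket_count` and `weighted_basic_count` are made up for illustration.

```python
def bucket_count(bits):
    """L(D_k): the synopsis of one bucket is its number of 1's."""
    return sum(1 for b in bits if b == 1)


def weighted_basic_count(bucket_counts, weights):
    """Weighted combination of bucket synopses: sum over k of w_k * L(D_k)."""
    if len(bucket_counts) != len(weights):
        raise ValueError("need exactly one weight per bucket")
    return sum(w * c for w, c in zip(weights, bucket_counts))


# Hypothetical buckets D1 (oldest) to D4 (newest) of a bit stream.
buckets = [[1, 0, 1, 1], [0, 1], [1], [1, 1]]
counts = [bucket_count(b) for b in buckets]   # [3, 1, 1, 2]

# Weights 2, 3, 5, 8 for D1..D4, so the newest bucket weighs the most.
weighted = weighted_basic_count(counts, [2, 3, 5, 8])   # 2*3 + 3*1 + 5*1 + 8*2

# Uniform weights recover the ordinary (unweighted) Basic Counting answer.
uniform = weighted_basic_count(counts, [1, 1, 1, 1])
```

Only the combination step changes between the weighted and the uniform model; the per-bucket operator that counts 1's is reused unchanged.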

CHAPTER 9
CONCLUSION

We have shown that streaming algorithms possess the underlying algebraic structure of linearity. We showed that algorithms on the infinite streaming model, the distributed streaming model and the sliding window model all have linear operators embedded within them. We proved tight bounds for distributed algorithms compared to their single-party case. Thus we expect distributed stream algorithms to have space bounds comparable to those of single-stream models. We showed by numerous examples that it is due to the property of linearity that these algorithms have the properties of orderlessness and one-pass scan. Moreover, they can easily be made into parallel algorithms due to the compositionality of the linear operators embedded within these algorithms.

We proposed a componentized framework to design streaming algorithms. This component structure is beneficial in understanding how the algorithms work and also in modifying them for new constraints added by the model. Also, if new efficient algorithms are proposed in the future for any particular streaming model, they can be adapted to other models by changing the underlying linear operator to reflect the constraints of the required model.

We found sampling difficult to encode as a linear operator. A sampling based synopsis is always order dependent and, as suggested by Chaudhuri and Motwani [30], a better way to sample streams is to use data distribution information, when available, to arrive at a biased sampling probability that overcomes the problem of order dependence.

In the future we expect to extend the framework to arrive at a generic method by which algorithms in one model can be adapted to any other model, and to come up with automatic error guarantees that generalize the work of Datar et al. [3].

REFERENCES

[1] J. I. Munro and M. S. Paterson, "Selection and sorting with limited storage," in TCS 12, 1980, pp. 315-323.
[2] Noga Alon, Yossi Matias, and Mario Szegedy, "The space complexity of approximating the frequency moments," in Proceedings of the Twenty-eighth Annual ACM Symposium on Theory of Computing, New York, NY, USA, 1996, pp. 20-29, ACM Press.
[3] Mayur Datar, Aristides Gionis, Piotr Indyk, and Rajeev Motwani, "Maintaining stream statistics over sliding windows: extended abstract," in SODA '02: Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, Philadelphia, PA, USA, 2002, pp. 635-644, Society for Industrial and Applied Mathematics.
[4] Shivnath Babu and Jennifer Widom, "Continuous queries over data streams," SIGMOD Rec., vol. 30, no. 3, pp. 109-120, 2001.
[5] Lukasz Golab and M. Tamer, "Issues in data stream management," SIGMOD Rec., vol. 32, no. 2, pp. 5-14, 2003.
[6] Noga Alon, Phillip B. Gibbons, Yossi Matias, and Mario Szegedy, "Tracking join and self-join sizes in limited storage," in Proceedings of the Eighteenth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, New York, NY, USA, 1999, pp. 10-20, ACM Press.
[7] Alin Dobra, Minos Garofalakis, Johannes Gehrke, and Rajeev Rastogi, "Processing complex aggregate queries over data streams," in Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, New York, NY, USA, 2002, pp. 61-72, ACM Press.
[8] Ganguly S., Garofalakis M., and Rastogi R., "Processing data-stream join aggregates using skimmed sketches," in Proceedings of the 9th International Conference on Extending Database Technology, London, UK, 2004, pp. 569-586, Springer.
[9] Graham Cormode, Mayur Datar, Piotr Indyk, and S. Muthukrishnan, "Comparing data streams using hamming norms (how to zero in)," IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 3, pp. 529-540, 2003.

[10] Ziv Bar-Yossef, T. S. Jayram, Ravi Kumar, D. Sivakumar, and Luca Trevisan, "Counting distinct elements in a data stream," in RANDOM '02: Proceedings of the 6th International Workshop on Randomization and Approximation Techniques, London, UK, 2002, pp. 1-10, Springer-Verlag.
[11] Gurmeet Singh Manku, Sridhar Rajagopalan, and Bruce G. Lindsay, "Random sampling techniques for space efficient online computation of order statistics of large datasets," in Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data, New York, NY, USA, 1999, pp. 251-262, ACM Press.
[12] Michael Greenwald and Sanjeev Khanna, "Space-efficient online computation of quantile summaries," in SIGMOD '01: Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, New York, NY, USA, 2001, pp. 58-66, ACM Press.
[13] A. C. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. Strauss, "How to summarize the universe: Dynamic maintenance of quantiles," in Proceedings of 28th International Conference on Very Large Data Bases, 2002, pp. 454-465.
[14] Moses Charikar, Kevin Chen, and Martin Farach-Colton, "Finding frequent items in data streams," in Proceedings of the 29th International Colloquium on Automata, Languages and Programming, London, UK, 2002, pp. 693-703, Springer-Verlag.
[15] Shivnath Babu and Jennifer Widom, "Continuous queries over data streams," SIGMOD Rec., vol. 30, no. 3, pp. 109-120, 2001.
[16] Edith Cohen and Martin Strauss, "Maintaining time-decaying stream aggregates," in PODS '03: Proceedings of the Twenty-second ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, New York, NY, USA, 2003, pp. 223-233, ACM Press.
[17] Y. Zhu and D. Shasha, "Statstream: Statistical monitoring of thousands of data streams in real time," Technical Report TR2002-827, New York University, CS Dept, New York, NY 10012, 2002.
[18] Phillip B. Gibbons and Srikanta Tirthapura, "Distributed streams algorithms for sliding windows," in SPAA '02: Proceedings of the Fourteenth Annual ACM Symposium on Parallel Algorithms and Architectures, New York, NY, USA, 2002, pp. 63-72, ACM Press.
[19] David Kempe, Alin Dobra, and Johannes Gehrke, "Gossip-based computation of aggregate information," in FOCS '03: Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science, Washington, DC, USA, 2003, p. 482, IEEE Computer Society.

[20] Abhinandan Das, Sumit Ganguly, Minos Garofalakis, and Rajeev Rastogi, "Distributed set-expression cardinality estimation," in Proceedings of VLDB, 2004, pp. 312-323.
[21] B. Babcock and C. Olston, "Distributed top-k monitoring," in Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 28-39, 2003.
[22] Yannis E. Ioannidis and Viswanath Poosala, "Histogram-based approximation of set-valued query-answers," in VLDB '99: Proceedings of the 25th International Conference on Very Large Data Bases, San Francisco, CA, USA, 1999, pp. 174-185, Morgan Kaufmann Publishers Inc.
[23] Kaushik Chakrabarti, Minos Garofalakis, Rajeev Rastogi, and Kyuseok Shim, "Approximate query processing using wavelets," VLDB Journal: Very Large Data Bases, vol. 10, no. 2-3, pp. 199-223, 2001.
[24] Anna C. Gilbert, Yannis Kotidis, S. Muthukrishnan, and Martin Strauss, "One-pass wavelet decompositions of data streams," IEEE Trans. Knowl. Data Eng., vol. 15, no. 3, pp. 541-554, 2003.
[25] Xuemin Lin, Hongjun Lu, Jian Xu, and Jeffrey Xu Yu, "Continuously maintaining quantile summaries of the most recent n elements over a data stream," in ICDE '04: Proceedings of the 20th International Conference on Data Engineering, Washington, DC, USA, 2004, p. 362, IEEE Computer Society.
[26] Phillip B. Gibbons and Srikanta Tirthapura, "Estimating simple functions on the union of data streams," in SPAA '01: Proceedings of the Thirteenth Annual ACM Symposium on Parallel Algorithms and Architectures, New York, NY, USA, 2001, pp. 281-291, ACM Press.
[27] Suman Nath, Phillip B. Gibbons, Srinivasan Seshan, and Zachary R. Anderson, "Synopsis diffusion for robust aggregation in sensor networks," in SenSys '04: Proceedings of the 2nd International Conference on Embedded Networked Sensor Systems, New York, NY, USA, 2004, pp. 250-262, ACM Press.
[28] Marios Hadjieleftheriou, John W. Byers, and George Kollios, "Robust sketching and aggregation of distributed data streams," Tech. Rep., 2005.
[29] Praveen Seshadri and Mark Paskin, "Predator: an or-dbms with enhanced data types," in SIGMOD '97: Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data, New York, NY, USA, 1997, pp. 568-571, ACM Press.
[30] Surajit Chaudhuri and Rajeev Motwani, "On sampling and relational operators," IEEE Data Eng. Bull., vol. 22, no. 4, pp. 41-46, 1999.

BIOGRAPHICAL SKETCH

Guruditta Golani was born in Ajmer, India, in 1977. He did his schooling in various cities in Rajasthan. Afterwards, he joined the premier engineering institute of India, the Indian Institute of Technology, Kharagpur (IIT-KGP), in 1995. He received his B.Tech (Honors) degree in electrical engineering in 1999 and worked for the next four years in the software field in various companies. He joined the Computer and Information Science and Engineering Department at the University of Florida in August 2003 to pursue a Master of Science. Under the tutelage of Dr. Alin Dobra, he worked on stream algorithms towards his master's thesis and is expected to graduate in August 2005.


Permanent Link: http://ufdc.ufl.edu/UFE0011876/00001

Material Information

Title: Theory of Linear Operators for Aggregate Stream Query Processing
Physical Description: Mixed Material
Copyright Date: 2008

Record Information

Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.
System ID: UFE0011876:00001

THEORY OF LINEAR OPERATORS FOR AGGREGATE STREAM QUERY
PROCESSING















By

GURUDITTA GOLANI


A THESIS PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF SCIENCE

UNIVERSITY OF FLORIDA


2005
















To

my parents and my brother.















ACKNOWLEDGMENTS

I would like to express my deepest gratitude to my supervisor, Dr. Alin Dobra. His enthusiasm and integral view on research and his mission of providing 'only high-quality work and not less' have made a deep impression on me. I have greatly benefited from his guidance and his emphasis on my pursuing my own ideas in research. I owe him much gratitude for having shown me this way of research.

I feel a deep sense of gratitude to my loving father and mother, who formed part of my vision and taught me to be a good human being first and then anything else. My brother, Pankaj Golani, is a great source of inspiration that anything is possible with hard work and honest effort. Time and again he has proven all assumptions wrong. I am fortunate to be born in such a family.

Finally I would like to thank my friend Srikant R n,-i -in for helping me during

various stages of thesis writing and formatting.















TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
  1.1 New Streaming Models
  1.2 Synopsis Operators
  1.3 Our Contribution
  1.4 Organization of Thesis

2 RECENT WORK
  2.1 Streaming Models
    2.1.1 Infinite Stream Data Model
    2.1.2 Sliding Window Model
    2.1.3 Distributed Stream Model
  2.2 Synopsis Operators

3 LINEAR OPERATORS AND DATA STREAM ALGORITHMS

4 INFINITE DATA STREAMS

5 SLIDING WINDOW STREAMS
  5.1 Definition
  5.2 Synchronization Boundary
  5.3 Linearity
  5.4 Examples

6 DISTRIBUTED STREAMS

7 LINEAR OPERATOR AS ABSTRACT DATA TYPE

8 TOWARDS A FRAMEWORK OF DESIGN OF STREAMING ALGORITHMS

9 CONCLUSION

REFERENCES

BIOGRAPHICAL SKETCH















LIST OF FIGURES

Figure

1-1 Various data stream models
2-1 Exponential histogram constructed over binary stream
2-2 Wave synopsis: The x-axis shows the 1-ranks and on the y-axis, each dyadic level i is labelled by 2^i
5-1 A typical error profile
5-2 Exponential histogram with nonoverlapping buckets
5-3 Waves synopsis with overlapping bucket boundaries
6-1 Statistic computation on distributed streams
8-1 Componentized design of streaming algorithms
8-2 Weighted sliding window computation using Exponential histograms















Abstract of Thesis Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Master of Science

THEORY OF LINEAR OPERATORS FOR AGGREGATE STREAM QUERY
PROCESSING

By

Guruditta Golani

August 2005

Chair: Alin Dobra
Major Department: Computer and Information Science and Engineering

There has been a growing research interest in addressing challenges of data streaming applications, leading to a huge growth in the data streaming models and the number of algorithms to compute statistics sufficiently under the constraints of limited space and single pass scan. This has led to a separate class of algorithms for the same statistic computation for different streaming models. We show that the algorithms over these models have an underlying structure of linearity which makes computation efficient. We show that any algorithm specific to the basic data stream models encodes a linear operator. An application of this underlying linearity structure is that an algorithm particular to one streaming model can be tuned to other models by replacing the linear operator by a new linear operator particular to synopsis generation on the new model. These operators provide important structure that can be utilized in a componentized design of streaming algorithms.















CHAPTER 1
INTRODUCTION

Data streams are a non-persistent form of massive data that arrives
continuously and can be processed only once. The need to process data streams
arises in a number of application domains. Telecom and Internet service providers
collect data continuously at various network units to find interesting usage
statistics. A Call Data Record (CDR) is generated at the end of each call, and the
generated CDRs form a massive stream that is analyzed for interesting patterns
such as misuse and fraud. Web logs at Internet service providers form a stream
that is analyzed for DoS attacks and load usage patterns.

Sensor networks are another area that requires processing information
continuously from individual sensors. Sensors are distributed over an area, and
data is collected from them for processing. The distributed and decentralized
nature of sensor applications poses interesting challenges for processing data
streams at each node. Data streams are also created as intermediate results of
operators during the evaluation of a query plan by an optimizer. Data streams also
find application in the financial domain, for example in the analysis of ATM
transaction records and in streaming tickers. All of the above application fields
continuously produce volumes of data that cannot be stored on disk and have to be
processed on-line, only once. Thus, data streams have grown to be a significant
area of research.

Queries over data streams resemble those in traditional persistent databases.
We are interested in computing aggregate statistics such as join size, sum,
average over joins and number of distinct elements, and order statistics such as
quantiles and frequently occurring items. However, the massive size of the data
and the single pass scan pose severe problems that do not exist with traditional
solutions. The query is continuous in nature, thus demanding an answer at every
time instant, and should reflect any changes or updates to the stream. Munro and
Paterson [1] showed that any algorithm that computes quantiles over N items
exactly in p passes requires $\Omega(N^{1/p})$ space.

One solution is to maintain a snapshot (synopsis) of the massive data stream,
which reduces space though it may not give an exact answer. This has led to the
design of approximate algorithms based on space-accuracy tradeoffs. Thus, the
challenge in designing algorithms for any statistic over data streams is to keep
a synopsis of the data in space sub-linear in the size of the stream while keeping
the per-item processing time small. For example, Alon et al. [2] give a randomized
approximate algorithm which computes the self join size of a stream in space
$O\left(\frac{\log(1/\epsilon)}{\lambda^2}(\log N + \log M)\right)$, where N = size
of stream, M = size of universe, $\epsilon$ is the error probability and $\lambda$
is the relative error. The focus then shifts to synopsis generation techniques
which save space optimally with good error bounds. Histograms, sampling, wavelets
and randomized sketches are some of the techniques currently in the literature.

Added complexity is introduced by putting more constraints on the statistics
computation. What may be a trivial computation in one case can become hard
on adding a simple constraint.

Example 1

BasicCounting [3]:

Given a stream of data elements consisting of 0's and 1's, maintain at every
time instant the count of the number of 1's in the last N elements.

The unbounded version of the problem is to keep the count of 1's at all
times from the beginning; a trivial solution requires O(log n) space for an
exact answer or O(log log n) space for an approximate answer, where n is the size
of the stream. However, under the constraint of a window of the last N elements,
it can be shown that the best possible solution takes $\Theta(N)$ space for an
exact answer and $O(\frac{1}{\epsilon} \log^2 N)$ space for an approximate answer
with relative error $\epsilon$.
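The contrast between the two versions of BasicCounting can be sketched in a few lines of Python. This is only an illustration under our own naming (`UnboundedCounter` and `ExactWindowCounter` are hypothetical names, not taken from [3]): the unbounded counter keeps a single integer, while the exact window counter has to remember every bit of the window in order to know what to forget.

```python
from collections import deque


class UnboundedCounter:
    """Count of 1's since the start of the stream: one integer of state."""

    def __init__(self):
        self.count = 0

    def add(self, bit):
        self.count += bit


class ExactWindowCounter:
    """Exact count of 1's among the last `window` elements.

    The counter stores every bit of the window so that it can subtract
    the bit that expires, hence memory linear in the window size.
    """

    def __init__(self, window):
        self.window = window
        self.bits = deque()
        self.count = 0

    def add(self, bit):
        self.bits.append(bit)
        self.count += bit
        if len(self.bits) > self.window:
            # The oldest element leaves the window: forget its contribution.
            self.count -= self.bits.popleft()
```

Feeding the same bits to both counters shows that only the windowed one needs the stored bits; the approximate schemes discussed later trade this storage for a bounded error.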

Data stream models have been developed to broadly encompass the added
constraints, which relate more to the way a synopsis is to be collected from the
data stream than to the nature of the statistics to be computed. Constraints can
be physical, such as different physical locations, or logical in nature. An
example of a logical constraint is that the query be answered only for the most
recent portion of the data. An example of a physical constraint is quantile
computation over the union of streams in a distributed network. Broadly, there
are three types of streaming models: the infinite data stream model, the
distributed streaming model and the sliding window model. Hybrid versions of the
above add further complexity to the design of the required algorithm.

[Figure 1-1: Various data stream models. Streaming models: infinite stream model, distributed stream model, window stream model.]


1.1 New Streaming Models

The above models are by no means the only existing models of streaming data.

For example, distributed sliding window models exist and calculating statistics over

them is a separate challenge. In a distributed setting, different streaming rate at

each node can represent a challenge not covered by the above settings. Weighted

sliding window model is more generic than the (uniform weight) sliding window

model.

Interesting questions to ask are: are the algorithms devised for each particular
streaming model really a new class of algorithms, or can we find an underlying
theme to the design of such algorithms? For example, can a new algorithm that
improves space bounds for a particular statistic calculation in one streaming
model class help improve results in another, or are they all fundamentally
different? Another question concerns, for example, the problem of a join between
streams arriving at different rates. Does this problem require defining a new
model? Or can we treat this constraint of different streaming rates as an operator
applied to the data before the join is processed, much like in relational algebra?

1.2 Synopsis Operators

One of the challenges in designing a data stream query management system
(DSMS) is to integrate streaming operators with existing relational operator
based architectures. These streaming operators should be nonblocking in nature.
Also, the huge size of intermediate results demands a small-space internal state
from these operators. Approximate solutions using small-space stream synopsis
operators are used to answer such queries [4, 5]. In distributed stream
architectures, that is the only solution. The challenge with a DSMS architecture
is to find useful structure in these synopsis operators so that they can be used
in a generic manner.

1.3 Our Contribution

With the above context, we show that algorithms associated with the infinite

streaming, distributed and sliding window models have an underlying algebraic

structure of linearity to them that is utilized by any algorithm to efficiently

compute the statistics.

We prove any streaming algorithm encodes a linear operator. We link the

distributed streaming model to the infinite stream setting and prove equivalence

between them and arrive at a tight bound between the space requirements of both

models in computing a statistic. Window constrained streaming algorithms are also

shown to utilize linearity in a fundamental way.









Finally, we look at applications of the underlying linearity structure and
propose that the same algorithm can be used in multiple streaming model instances
by replacing one linear operator, corresponding to synopsis creation specific to
one model, by another. We call this compositionality and propose a component
structure for the design of algorithms over the set of streaming models.

1.4 Organization of Thesis

Having defined the motivation for this research, here is an outline for the
rest of the thesis. In Chapter 2, we discuss previous work related to streaming
algorithms for various statistics and partial generalizations. In Chapter 3, we
give basic definitions for linear operators and a few notations.

In Chapter 4, we prove that every infinite streaming algorithm encodes a linear
operator at its core and give examples. In Chapter 5, we give precise definitions
related to sliding windows and show how linearity is essential in designing any
sliding window algorithm. In Chapter 6, we show that distributed streaming
algorithms are a generalization of the single party infinite stream algorithms with
tight space bounds with respect to the unbounded stream model, and the proof
reveals that linearity is essential to arrive at the efficient bounds.

In Chapter 7, we discuss the implications of the linear structure of these
operators and how existing research has been using these properties but missing
the underlying structure. In Chapter 8, we provide an initial framework towards
the design of streaming algorithms in general. Finally, in Chapter 9, we summarize
our observations and contributions presented in this thesis.















CHAPTER 2
RECENT WORK

Small workspace and a single pass scan are fundamental requirements on any
algorithm that answers queries on streams. Since streams can be massive in size,
a space cost that is not sublinear in the size of the stream is infeasible for
many applications. Munro and Paterson [1] showed that any algorithm that computes
quantiles exactly in one pass requires $\Omega(N)$ space. Alon et al. [6] showed
that computing frequency moments on streams exactly would require $\Omega(N)$
space. This prompted research to develop approximate algorithms which save space
while allowing for errors. An $(\epsilon, \delta)$ streaming algorithm for
computing a function f is an approximate algorithm which takes as input an error
parameter $\epsilon$ and a confidence parameter $\delta$ and computes an
$\epsilon$-approximate statistic f over the entire stream seen so far with
probability at least $1 - \delta$.

Streaming models have been introduced to represent various physical and

logical constraints over which to compute approximate answers to the statistics.

We list the properties of each model and recent advances in statistic computation

over the basic stream models.

2.1 Streaming Models

2.1.1 Infinite Stream Data Model

This is the basic model, where the statistic is computed over the entire stream
seen so far at one location in one pass.1 Thus the size of the stream over which a
query has to be answered can go to infinity. This model demands the following
properties from a synopsis: small space, orderlessness and one-pass construction.
Orderlessness implies that the answer computed by the algorithm has to be the
same irrespective of the order in which the input is seen. Examples include
computing frequency moments and join sizes [2, 7-10], order statistics such as
quantiles [11-13] and frequent items [14].

1 This model can be easily extended to incorporate multiple streams arriving
at the same location from different sources. Henceforth we will consider only a
single source stream, but that does not limit the scope of the model.

Let us look at two of the examples in more detail:

Example 2

AMS sketch for L2 norm computation.

AMS sketches are random linear projections of the stream, viewed as a vector
whose size is the size of the universe U and whose ith entry is the frequency
$f_i$ of the universe element indexed by i. A sketch is created as follows:

1. Select a family of four-wise independent binary random variables
$\xi_i$, $i = 1, \ldots, U$, where each $\xi_i \in \{-1, +1\}$ and
$P[\xi_i = 1] = P[\xi_i = -1] = 1/2$.

2. On arrival of a new item i, a count X, maintained at all times, is
incremented by $\xi_i$. Thus, $X \mathrel{+}= \xi_i$.

3. On querying, we define another random variable, $Z = X^2$. It can be shown
that $E[Z] = \sum_i f_i^2$. This is the unbiased estimate of the second
frequency moment of the data stream seen so far.

4. To further improve the quality of the estimate, we keep $s_1 s_2$ such
samples of Z, take $s_2$ averages of $s_1$ samples each, and from this set of
averages pick the median value. This boosting technique reduces the variance.

The space used is sublinear,
$O\left(\frac{\log(1/\epsilon)}{\lambda^2}(\log N + \log M)\right)$, where N = size
of stream, M = size of universe, $\epsilon$ is the error probability and
$\lambda$ is the relative error.
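The four steps above can be sketched as follows. This is a minimal illustration, not the construction of [2] verbatim: the class name `AMSSketch`, the parameter names, and the way the $\pm 1$ variables are generated (the low bit of a random cubic polynomial over a prime field, which is only approximately unbiased) are all assumptions made for the example. Items are assumed to be non-negative integers smaller than `PRIME`.

```python
import random
import statistics

PRIME = 2_147_483_647  # 2^31 - 1; items are assumed to be smaller than this


class AMSSketch:
    """Keeps s1 * s2 counters X; estimates the second frequency moment."""

    def __init__(self, s1, s2, seed=0):
        rng = random.Random(seed)
        self.s1, self.s2 = s1, s2
        # One random cubic polynomial per counter. A cubic over a prime
        # field is a standard way to obtain four-wise independent values
        # (an assumption of this sketch, not a detail given in the text).
        self.coeffs = [tuple(rng.randrange(PRIME) for _ in range(4))
                       for _ in range(s1 * s2)]
        self.counters = [0] * (s1 * s2)

    @staticmethod
    def _xi(coeffs, item):
        a, b, c, d = coeffs
        h = (((a * item + b) * item + c) * item + d) % PRIME
        return 1 if h & 1 else -1

    def update(self, item, count=1):
        # Step 2: X += xi_item, once per arriving copy of the item.
        for k, coeffs in enumerate(self.coeffs):
            self.counters[k] += count * self._xi(coeffs, item)

    def estimate_f2(self):
        # Steps 3 and 4: Z = X^2, s2 averages of s1 samples, then the median.
        squares = [x * x for x in self.counters]
        means = [sum(squares[g * self.s1:(g + 1) * self.s1]) / self.s1
                 for g in range(self.s2)]
        return statistics.median(means)
```

Two sketches built with the same seed share their random variables, so adding their counters entry by entry gives the sketch of the concatenated stream.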

Example 3

Quantiles using GK sketch [12].

The GK algorithm is a deterministic approximate algorithm to compute quantiles
from a stream. The synopsis is created as follows:

1. Maintain sample values such that any one sample value covers a range of at
most $2\epsilon N$ values between it and the prior value in the sample. Thus,
there can be an error of at most $\epsilon N$.

2. On arrival of a new element x, a tuple $\langle x, 1, 2\epsilon N \rangle$ is
created. The second element of the tuple gives the capacity of this tuple, and
the invariance condition on the combination of the second and third elements of
the tuple guarantees that the number of elements covered between the tuple and
the tuple previous to it is not more than $2\epsilon N$.

3. If the space bound is exceeded, the synopsis is compressed such that the above
invariance property holds for the compressed tuples.

4. A query is answered by finding the tuple in the synopsis with rank closest to
the desired rank.

The space used is sublinear, $O(\frac{1}{\epsilon} \log(\epsilon N))$, where
$\epsilon$ is the relative error and N = size of stream, and the method is
deterministic. A restriction of this algorithm is that it does not handle
deletions.

2.1.2 Sliding Window Model

Many applications require decision making based on statistics over stream
elements with more weight given to recent stream elements. The querying is also
expected to be continuous. Examples include financial stock ticker information in
a 5 minute setting, or computing web traffic statistics over the last day from web
log data. To address this, the sliding window model has been introduced [15]. A
timestamp based sliding window of size N consists of all data elements whose
arrival timestamp is within an interval N of the current timestamp. A count based
sliding window of size N consists of the N most recent data elements that have
arrived so far. The more general sliding window model, where a weight function is
assigned to the window, is introduced by Cohen et al. [16].









The challenge in the sliding window model is to forget the expired items from
the window as new items arrive and old items expire. Even the simple problem of
BasicCounting, as described above, requires $\Omega(N)$ space to be solved
exactly, where N is the size of the window. However, many applications require
memory usage sublinear in the size of the window, and the window size N can be
large too. Following are a few synopsis techniques prevalent in the literature
that address the problem of forgetting efficiently in an approximate manner:

Example 4

Basic Window [17]:

In this scheme, the sliding window is subdivided equally into shorter, basic
windows. Arriving elements are kept in a basic window until it is filled, and then
a new basic window is used. A query is answered by adding up the contributions of
all filled basic windows. A basic window's contribution is counted as long as it
has even one element that is active. The contribution of new elements is ignored
until their basic window is filled. Then, the contribution of the oldest basic
window is subtracted and that of the newest one added. However, this does not give
a tight error bound.




[Figure 2-1: Exponential histogram constructed over binary stream. The figure shows a binary stream, its direction, and exponentially sized buckets.]









Example 5

Exponential Histograms (EH) [3]:

This scheme is adaptive and gives answers that are always within a relative
error bound. New elements are kept in single-element buckets and, as the elements
get older, buckets are merged into bigger ones. Thus bucket sizes increase
exponentially with age, so that the number of buckets is logarithmic in the size
of the window. A size invariant is always maintained such that the contribution of
the elements in the oldest bucket never exceeds $\epsilon$ times that of the
other, more recent buckets. Expired buckets are dropped.

In statistics computation, the contribution of the oldest bucket is ignored if its
timestamp has expired. This scheme has been proven to give tight space bounds.
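The bucket maintenance described above can be sketched for BasicCounting as follows. This is an illustrative sketch, not the algorithm of [3] verbatim: the class and method names are hypothetical, the merge threshold of $\lceil k/2 \rceil + 1$ buckets per size with $k = \lceil 1/\epsilon \rceil$ is our reading of the scheme, and the estimate returned here is the midpoint of the interval in which the true count must lie.

```python
import math


class ExponentialHistogram:
    """Approximate count of 1's in the last `window` elements of a bit stream."""

    def __init__(self, window, eps):
        self.window = window
        k = math.ceil(1 / eps)
        self.max_same = math.ceil(k / 2) + 1  # buckets allowed per size (assumed)
        self.buckets = []  # newest first: (timestamp of newest 1, size)
        self.time = 0
        self.total = 0  # sum of the sizes of all stored buckets

    def add(self, bit):
        self.time += 1
        # Drop the oldest bucket once its newest 1 has left the window.
        while self.buckets and self.buckets[-1][0] <= self.time - self.window:
            self.total -= self.buckets.pop()[1]
        if bit != 1:
            return
        self.buckets.insert(0, (self.time, 1))
        self.total += 1
        start, size = 0, 1
        while True:
            end = start
            while end < len(self.buckets) and self.buckets[end][1] == size:
                end += 1
            if end - start <= self.max_same:
                break
            # Too many buckets of this size: merge the two oldest ones and
            # keep the timestamp of the newer of the two.
            newer = self.buckets[end - 2]
            self.buckets[end - 2:end] = [(newer[0], 2 * size)]
            start, size = end - 2, 2 * size

    def bounds(self):
        """Interval that contains the exact count of 1's in the window."""
        if not self.buckets:
            return (0, 0)
        oldest = self.buckets[-1][1]
        # Only the oldest bucket can straddle the window boundary, and its
        # newest 1 is still inside the window.
        return (self.total - oldest + 1, self.total)

    def estimate(self):
        low, high = self.bounds()
        return (low + high) / 2
```

All uncertainty is confined to the oldest bucket, which is why keeping that bucket small relative to the rest bounds the relative error.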

Example 6

Waves [18]:

This is an improvement over the EH scheme: merging in EH takes O(log N) time
per element, and Waves avoid the merge step. They do so by keeping several
buckets at each dyadic level of the most recent elements. Each bucket is marked
by the rank of its 1 element (its 1-rank) as seen in the sliding window stream.
The composite structure looks like a wave and, since buckets do not have to merge
together, the update time per element is O(1).

The buckets from different levels overlap each other but maintain the same
invariant as above: the contribution of the oldest bucket is never more than
$\epsilon$ times that of the other buckets that cover the window without
overlapping. Expired buckets are dropped.

Datar et al. [3] showed a technique by which algorithms for computing a
statistic f over the infinite streaming model can be applied to the sliding window
model, provided the function f satisfies a few constraints listed by them. This,
to our knowledge, is a first attempt at generating automatic guarantees and a plug
and play framework for computation on the sliding window model given error
guarantees for infinite streaming algorithms.

[Figure 2-2: Wave synopsis. The x axis shows the 1-ranks and, on the y axis, each dyadic level i is labelled by 2^i (levels "by 2", "by 4", "by 8" and "by 16").]

In this thesis, we show that such a plug and play framework is possible due to
the underlying algebraic structure of linearity, and we show applications of it.

2.1.3 Distributed Stream Model

In this model, of which sensor networks are an example, streams arrive at
various nodes of a network and the statistic is to be computed over the union of
these streams. It is impractical to send all the stream data to one central point
due to communication costs and the single point of failure. So each node maintains
a synopsis in limited workspace for its own stream. The challenges posed by this
model are (a) small space computation at each node, and (b) a low number of
communication bits between nodes. The demands on the synopsis by the model also
include taking care of errors in aggregation at each node due to the various
diffusion speeds of the network. Examples include computing aggregate statistics
over networks [19], distributed set expression cardinality [20], and distributed
distinct value estimation [18].

Example 7

Distributed Top-K monitoring [21]:

Each node in the distributed setting receives a stream, and the top-k frequent
items are to be identified over the union of these streams. Each node keeps
an approximate top-k synopsis over its stream and relays this synopsis to the
central node at appropriate times. The central node adds up these individual
node synopses to arrive at a top-k synopsis over the union of the streams. The
challenges are to add up these synopses to get a distributed top-k synopsis, and
to keep communication costs at a minimum by relaying synopsis data to the central
node only when changes in the underlying stream's top-k are significant enough to
inform the central node.

2.2 Synopsis Operators

One of the challenges in designing a data stream query management system
(DSMS) is to integrate streaming operators with existing relational operator
based architectures. Thus, it should be able to handle queries over streams and/or
relations. The problem arises from the nature of the operators that are to be
evaluated. Pipelined joins take a lot of space in handling intermediate data.
Relational aggregation based operators such as Min, Sum, Avg joins are blocking
operators, i.e., they need to see all of the input before they produce one tuple
of output. Clearly, in the case of streams, this is infeasible due to the large
amount of stream data and the need for continuous evaluation of queries.

Various schemes have been designed in existing stream architectures to unblock
such operators. Windowed operators are used so that only the recent stream
data is used in answering queries. Other ways include maintaining incremental
views. However, these solutions fail when the size of the window is large and
cannot be stored in main memory. In those cases, approximate solutions using
small-space stream synopsis operators are used to answer such queries [4, 5]. In
distributed stream architectures, that is the only solution.

Ioannidis and Poosala [22] suggested a histogram based algebra to answer
approximate queries in a relational setting by considering histogram buckets to
represent relation-like tuples. Each bucket forms a tuple (lo, hi, distinctcount,
average), and the same query tree can be run on this set of histogram tuples to
get approximate answers. Chakrabarti et al. [23] similarly proposed a wavelet
algebra to answer queries in an SQL manner. The challenge with a DSMS
architecture is to find interesting structure in these synopsis operators in order
to create libraries of them and use them in a generic manner.

In this thesis, we describe a powerful property of these synopsis operators

which is useful in query engine design.















CHAPTER 3
LINEAR OPERATORS AND DATA STREAM ALGORITHMS

We propose to look at any data stream algorithm for a statistic f as equivalent
to $F(L(D))$, with D, a multiset of items $\in \mathbb{R}^m$ (m being the size of
the universe), as the streaming data input, $L : \mathbb{R}^m \to \mathbb{R}^n$ as
the linear operator which computes the synopsis, and
$F : \mathbb{R}^n \to \mathbb{R}$ as an aggregate or order function over the
synopsis. If the function is to be computed over multiple streams, for example the
join of two separate streams r and s, it is equivalent to $F(L(D_r), L(D_s))$.
This can be extended to any number of streams, but from now on we concentrate on
functions over single streams.

Definition 1. An operator L defined as an $(L, \oplus)$ pair is linear if it
possesses a null element and, along with a commutative and associative addition
$\oplus$, satisfies the following:

$\forall D_1, D_2$ multisets of items $\in \mathbb{R}^m$: $L(D_1 \| D_2) = L(D_1) \oplus L(D_2)$

A few properties become immediately obvious:

Linear operators reduce the incoming data from dimension m to a much
smaller dimension $n \ll m$. Thus, L(D) forms a summary of D.

The above condition can be written as

$L(D) = \bigoplus_{d \in D} L(d)$

Thus, we need to define the linear operator only on the smallest element
and, together with $\oplus$, a small dimension summary can be computed over
which F can compute the statistic.









Let us take an example:

Example 8

Computing Average on a stream

Let the stream of elements arrive as $a_1, a_2, a_3, \ldots, a_n$. Let
Average $= F(L_{avg}(D))$.

The linear operator for Average, $L_{avg}$, is

$L_{avg}(\emptyset) = \langle 0, 0 \rangle$

$L_{avg}(a_i) = \langle a_i, 1 \rangle$

Let $L_{avg}(D_1) = \langle s_1, c_1 \rangle$ and
$L_{avg}(D_2) = \langle s_2, c_2 \rangle$ for $D_1, D_2$ parts of the stream.
Then $\oplus$ is defined as

$L_{avg}(D_1) \oplus L_{avg}(D_2) = \langle s_1 + s_2, c_1 + c_2 \rangle$

F is

$F(\langle s, c \rangle) = s / c$

The average for the stream is computed by adding the linear operators for each
element of the stream as it arrives, and the query can be answered continuously.

The linear operator for Average has interesting properties. It does not require
stream elements to arrive in any particular order. This is due to the fact that
the addition $\oplus$ defined for $L_{avg}$ is commutative and associative. Thus,
it is order independent. If the stream is split into two arbitrary streams, the
linear operator for Average can work on each of them separately, and the average
over the union of the two streams can be computed using $\oplus$. Thus, it can
handle distributed computation. We can see that linear operators are useful in
data stream computations, as they have the properties of order independence and
allow distributed computation.
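The Average example translates almost directly into code. In the sketch below the names `L_avg`, `oplus`, `F` and `synopsis` are ours and simply mirror the notation of the example; the synopsis is the pair (sum, count).

```python
from functools import reduce

NULL = (0, 0)  # null element: the synopsis of the empty stream


def L_avg(a):
    """Linear operator on a single stream element: <a, 1>."""
    return (a, 1)


def oplus(x, y):
    """Addition of two synopses: <s1 + s2, c1 + c2>."""
    return (x[0] + y[0], x[1] + y[1])


def F(s):
    """Aggregate over the synopsis: sum divided by count."""
    total, count = s
    return total / count


def synopsis(stream):
    """L(D) as the oplus-sum of L(d) over every element d of the stream."""
    return reduce(oplus, map(L_avg, stream), NULL)
```

Because `oplus` is commutative and associative, `synopsis` returns the same pair for any ordering of the elements, and the synopses of two halves of a stream can be combined with `oplus` before applying `F`.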

With the above definition and properties of linear operators, we now look at

each of the three basic models and identify the role of linear operators in the class

of algorithms specific to each.















CHAPTER 4
INFINITE DATA STREAMS

In this section, we look at algorithms that compute statistics over an infinite

stream that arrives at the same location where query is to be answered.

Definition 2. An algorithm is an infinite data stream (DS) algorithm for a
function $f : A^n \to B$ if it is an approximate algorithm which takes as input an
error parameter $\epsilon$ and a confidence parameter $\delta$ and computes an
$\epsilon$-approximate statistic f over the entire stream seen so far with
probability at least $1 - \delta$.

Any DS algorithm should possess the properties of small-space computation,
one-pass scan and an orderless assumption on the arrival of input. Orderlessness
implies that the algorithm makes no assumption about the distribution of the data.

We now show that infinite data stream algorithms use linear operators at their

core to form synopsis.

Theorem 1. Any infinite data stream algorithm can be seen as an application of
a linear operator to a data stream.

Proof. We give our proof by constructing such a linear operator given a DS
algorithm. We show that for any statistic f computed, the F(L(D)) operation of
any algorithm has the linear synopsis construction part at its core.

Let the Turing machine that simulates the DS algorithm be M. We define a
function $S(Q_{D_1}, Q_{D_2}) = Q_{D_1 \| D_2}$, where $Q_D$ corresponds to the
state of the machine after input D.

We define our linear operator as follows:

Identity element

$L(\emptyset) = Q_{\emptyset}$    (4.1)

where $Q_{\emptyset}$ is the state when there is no input.

Addition operator

$L(D_1) \oplus L(D_2) = S(Q_{D_1}, Q_{D_2})$    (4.2)

We now show the commutativity and associativity properties of the operator.

$L(D_1) \oplus L(D_2) = S(Q_{D_1}, Q_{D_2})$
$= Q_{D_1 \| D_2} = Q_{D_2 \| D_1}$
$= S(Q_{D_2}, Q_{D_1})$
$= L(D_2) \oplus L(D_1)$

The key step is to note that, due to the orderless property of the DS algorithm
machine, $Q_{D_1 \| D_2} = Q_{D_2 \| D_1}$. That is, the final state after first
inputting $D_1$ followed by $D_2$ is the same as the final state after the
sequence $D_2$ followed by $D_1$.

We prove associativity by a similar argument. Hence, we have defined a linear
operator given a machine that simulates a DS algorithm.

It is easy to prove the other direction. If we have a linear operator for a given
statistic computation, it can be easily written as a DS algorithm.



This theorem says that any infinite data streaming algorithm applies a linear
operator to the data stream to collect a synopsis and compute statistics on it.
The properties of orderlessness and one pass scan derive from the linear operator
involved.

Examples

We show examples of algorithms proposed in literature to compute statistics

over infinite streaming data and derive the linear operator that creates synopsis

from the data stream.









Example 9

Histogram based range sum computation on a stream [22].

Given buckets $B_1, B_2, \ldots, B_m$, each of which maintains a tuple (lo, hi,
totalcount), and a function h that hashes new items to the appropriate buckets,
the linear operator that works on each item as it arrives is as follows

$L(x_i) = B_{h(i)}.totalcount + x_i$

The query is answered by locating the buckets which fall in the range given by the
predicate and adding up the totalcounts of those buckets. The $\oplus$ operator
here is the arithmetic +, and the operation is commutative and associative.
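A small sketch makes the histogram operator concrete. Everything specific here is an assumption made for illustration: equal-width buckets over the values 0..49, the bucketing function `h`, and a synopsis that stores only the totalcount of each bucket (lo and hi are implied by the bucket index).

```python
BUCKET_WIDTH = 10
NUM_BUCKETS = 5  # buckets cover the values 0..49 (illustrative choice)


def h(x):
    """Maps an item to the index of the bucket whose range contains it."""
    return x // BUCKET_WIDTH


def L(x):
    """Linear operator on one item: add x to the totalcount of bucket h(x)."""
    contribution = [0] * NUM_BUCKETS
    contribution[h(x)] += x
    return contribution


def oplus(a, b):
    """Synopsis addition: arithmetic + applied bucket by bucket."""
    return [p + q for p, q in zip(a, b)]


def build(stream):
    total = [0] * NUM_BUCKETS
    for x in stream:
        total = oplus(total, L(x))
    return total


def range_sum(synopsis, lo, hi):
    """Sum of the totalcounts of buckets lying entirely inside [lo, hi)."""
    return sum(count for b, count in enumerate(synopsis)
               if lo <= b * BUCKET_WIDTH and (b + 1) * BUCKET_WIDTH <= hi)
```

Histograms built over two parts of a stream can be added with `oplus` to obtain the histogram of the whole stream, which is the linearity property used throughout this chapter.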

Example 10

L2 norm computation using AMS sketches [2].

As described previously in Example 2, AMS sketches are constructed by projecting
incoming stream elements onto random $\xi$ binary vectors. The linear operator
that works on each item as it arrives is as follows

$L(x_i) = \xi_i \times x_i$

where $\xi_i$, $\forall i \in U$, is the set of random 4-wise independent $\pm 1$
variables.

And the $\oplus$ operator is the arithmetic +.

Example 11

$L_p$ norm for $0 < p < \epsilon / \log U$ [9].

Distributions with stability parameter p have the following property: if
random variables $X_1, X_2, \ldots, X_n$ are independent and have stable
distributions with parameter p, then $a_1 X_1 + a_2 X_2 + \cdots + a_n X_n$ is
distributed as $(\sum_i |a_i|^p)^{1/p} X_0$, where $X_0$ is a random variable with
a p-stable distribution. The Cauchy distribution is stable with p = 1. The
Gaussian distribution is stable with parameter p = 2. There exist formulae to
generate stable distributions for 0 < p < 2.

Thus, given independent random variables $x_{ij}$ from a stable distribution with
parameter p, for any arriving element $a_i$ we maintain

$sk(a)_j = \sum_i x_{ij} \times a_i$

We are guaranteed that each $sk(a)_j$ is distributed as
$(\sum_i |a_i|^p)^{1/p} X$, where X is a random variable with a stable
distribution with parameter p. By taking the median of all entries $|sk(a)_j|^p$
we get a good estimate of the $(L_p)^p$ norm.

Thus, to compute $L_p$, we construct beforehand independent random variables
$x_i$ from a stable distribution with parameter p. On arrival of a new item
$a_i$, the linear operator works as follows

$L(a_i) = x_i \times a_i$

The $\oplus$ operator is the arithmetic + operator. The $L_p$ norm is calculated
by maintaining several such synopses and picking the median of them.

$L_p(D) = \mathrm{med}_j \left| \sum_i x_{ij} \times a_i \right|$
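For p = 1 the stable distribution is the Cauchy distribution, which makes the construction easy to sketch. The class below is an illustration under our own naming (`L1Sketch`, `rows`, `universe` are hypothetical); it draws standard Cauchy variates as $\tan(\pi(u - 1/2))$ for uniform u and uses the fact that the median of the absolute value of a standard Cauchy variable is 1, so the median of the absolute sketch entries estimates the $L_1$ norm.

```python
import math
import random
import statistics


class L1Sketch:
    """Synopsis for the L1 norm built from 1-stable (Cauchy) projections."""

    def __init__(self, universe, rows, seed=0):
        rng = random.Random(seed)
        # x[j][i]: Cauchy variates fixed before any stream element arrives.
        self.x = [[math.tan(math.pi * (rng.random() - 0.5))
                   for _ in range(universe)]
                  for _ in range(rows)]
        self.sk = [0.0] * rows

    def update(self, i, a):
        """Element i arrives with value a: add x_ij * a to every entry j."""
        for j, row in enumerate(self.x):
            self.sk[j] += row[i] * a

    def merge(self, other):
        """Synopsis addition; both sketches must share the same seed."""
        self.sk = [p + q for p, q in zip(self.sk, other.sk)]

    def estimate(self):
        return statistics.median(abs(v) for v in self.sk)
```

Since each entry is a plain sum of per-element contributions, merging sketches of two sub-streams gives the sketch of their concatenation. The accuracy of `estimate` depends on the number of rows and is not analysed here.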

Example 12

Computing SUM and other aggregates over join [7].

For a query such as: SELECT SUM($R_i.A_j$) FROM $R_1, R_2, \ldots, R_n$ WHERE
$\mathcal{E}$, where $\mathcal{E}$ gives the join condition, and under the
assumptions on multiplicity of attributes given by [7], AMS sketches are extended
to compute aggregates over joins.

The linear operator for all but $X_1$ works as

$L(x_i) = \prod_j \xi_j$

where each $\xi_j$ is a random 4-wise independent $\pm 1$ variable over one set of
attribute joins.

For $X_1$, the linear operator works as

$L(x_i) = \mathrm{Sum}(A_1) \times \prod_j \xi_j$

The $\oplus$ operator is the arithmetic + operator, so it is associative and
commutative. The query is answered by

$Q(\mathrm{SUM}) = \prod_k X_k$

Example 13

Point query on data stream using wavelet decomposition [23, 24].

Wavelet transforms are useful in capturing trends in data series. Coefficients

are computed by a series of low pass(resulting in averaging coefficients) and high

pass filters(resulting in difference coefficients) over the data and can be done in one

pass. The Coefficients form a synopsis of incoming data streams. Space saving is

obtained by retaining coefficients above a certain threshold value.

To answer the point query, a column vector is constructed with all entries
0 except the one at which the point query is being asked, where the value is 1.
The query is answered by computing the inner product of the wavelet coefficient
synopsis vector with the wavelet synopsis vector of the point query vector.

If V is the vector of wavelet synopsis coefficients and $\psi$ is the vector of
wavelet basis vectors, the linear operator that works on each item as it arrives
is as follows

$L(x_i) = \langle \psi, [0\ 0\ 0\ 0 \ldots x_i \ldots 0\ 0\ 0] \rangle$

and the synopsis is incremented as

$V \leftarrow V + L(x_i)$

The $\oplus$ operator is the vector addition operator, and it is commutative and
associative.
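The linearity of the wavelet synopsis can be checked with a small sketch. The code below uses the unnormalised Haar transform (pairwise averages and differences) as one concrete choice of wavelet; the function names are ours, and the thresholding of small coefficients mentioned above is deliberately left out because it is a separate, lossy step.

```python
def haar(values):
    """Unnormalised Haar transform of a list whose length is a power of two."""
    out = list(values)
    n = len(out)
    while n > 1:
        half = n // 2
        averages = [(out[2 * i] + out[2 * i + 1]) / 2 for i in range(half)]
        differences = [(out[2 * i] - out[2 * i + 1]) / 2 for i in range(half)]
        out[:n] = averages + differences
        n = half
    return out


def L(i, x, m):
    """Contribution of value x at position i: the transform of [0 .. x .. 0]."""
    point = [0] * m
    point[i] = x
    return haar(point)


def oplus(a, b):
    """Synopsis addition: plain vector addition."""
    return [p + q for p, q in zip(a, b)]
```

Adding the per-item contributions `L(i, x, m)` one at a time with `oplus` yields the same coefficient vector as transforming the whole data vector at once, which is the increment rule for V stated above.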









Example 14

Quantile query on infinite data stream using GK method [12].

The GK algorithm for quantile computation is a deterministic approximate
algorithm with the best space bounds for a given error. It works by keeping ranges
of values in memory and compressing memory by evicting values when memory is
full. Thus it has an INSERT and a COMPRESS operation.

The linear operator that works on each item as it arrives is as follows

$L(x) = (\text{create tuple}) = \langle x, 1, 2\epsilon N \rangle$

We define our $\oplus$ operator as

$x \oplus y$ = INSERT followed by COMPRESS

INSERT and COMPRESS are as defined in [12]. It is easy to see that $\oplus$ is
commutative, as the INSERT operation creates a sorted list, and that it has an
identity element $\emptyset$.

We state associativity as below

$(x \oplus y) \oplus z = x \oplus (y \oplus z)$

This holds because the algorithm guarantees that, irrespective of the order in
which x, y, z arrive, the error is upper bounded by the maximum error, and the
equality holds in that respect.















CHAPTER 5
SLIDING WINDOW STREAMS

In this chapter, we concentrate on timestamp based sliding windows. We
introduce a definition of a sliding window algorithm and show how linearity is at
the heart of any efficient window algorithm.

5.1 Definition

We previously identified a data stream algorithm as having the properties of one-pass scan, sublinear space efficiency, and order invariance in computing the result. Sliding window algorithms should have similar properties, but order invariance needs to be redefined with respect to a sliding window: for any order of the data within the window bound, the answer computed is the same.

Sliding window algorithms have to answer queries continuously over the moving window N, and it can be shown that they cannot exactly answer even the most basic queries in space O(log(N)). This is because it takes a lot of space to remember accurately which elements contribute to the calculation of the statistic and which are to be forgotten. Thus, they introduce error and compute approximate answers in space O(log(N)). Exponential Histograms [3], Basic Windows [17] and Waves [18] are some of the common schemes. A common error profile over time is shown in Figure 5-1.

Figure 5-1: A typical error profile (horizontal axis: timestamp; marked feature: synchronization boundary)

The important thing to notice from such an error profile is that, since sliding window algorithms do an approximate job of forgetting the expired data items from a window of size N, the error due to this approximation should not increase with time and should be kept within a bound. We thus identify a constraint on a sliding window algorithm, which we state as the following definition:

Definition 3. Along with the properties of sublinear space, order invariance, and one-pass scan, an algorithm is a sliding window algorithm if, for a window size of N, the error goes to zero at least once within at most N timestamps.

Intuitively speaking, the algorithm should give a perfect answer at least once every N elements inserted. We call the point in the summary window where the error is zero a synchronization boundary.

Definition 4. A synchronization boundary in a sliding window is a timestamp at which the error due to the algorithm is zero.

Thus, in Figure 5-1, the synchronization boundaries lie along the timestamps for which the error due to the items from that timestamp to the most recent item seen is zero, so that the algorithm gives a perfect answer. There can be more than one synchronization boundary in the sliding window synopsis. Note that if the algorithm is deterministic, we achieve zero error at least once every N timestamps. If, however, to save space the algorithm itself has only a randomized guarantee within certain error bounds, our definition implies that at least once every N timestamps the error contribution due to the sliding window model is zero.

Example 15

BasicCounting algorithm due to Datar et al. [3].

The problem of BasicCounting, as previously defined in Example 1, is to count the number of 1's in a bitstream over the most recent N items. Datar et al. gave a deterministic algorithm using Exponential Histograms (see Example 5). The synchronization boundaries are the bucket boundaries of the exponential histogram, and the error contribution of the elements from the latest one up to any of the synchronization boundaries is zero.

Example 16

L2 norm using AMS sketches on sliding windows [3].

Computing the L2 norm using AMS sketches is a randomized scheme applied to the sliding window model. It works as follows: an AMS sketch is maintained for each bucket of the Exponential Histogram synopsis. The sketches are composable, so only the sketches of buckets whose elements all lie within the most recent N items are added together, and the L2 norm is computed on the combined sketch.

There are two sources of error: the approximate computation of the L2 norm using sketches, and imperfectly forgetting the expired items (by leaving out the sketches of buckets only some of whose elements are among the most recent N items). Our constraint implies that the error due to imperfectly forgetting the expired items should reduce to zero at least once every N timestamps.

5.2 Synchronization Boundary

This additional constraint on an algorithm to qualify as a sliding window algorithm suggests introducing more than one synchronization boundary per N items scanned, so as to bring the error down to zero more often. The Exponential Histograms and Basic Windows schemes do just that. The synchronization boundaries for both schemes are the timestamps of the earliest element in each bucket.

While a synopsis of the data is being maintained, to answer queries efficiently the data from a synchronization boundary to the latest timestamped entry has to be kept exact, as per the definition of synchronization boundaries. Error is introduced by the inexact information kept before the earliest synchronization boundary, where it is not known precisely which elements have expired and which are still active.

5.3 Linearity

Thus, any scheme that follows the above constraints maintains a separate synopsis for the data between any two synchronization boundaries. Hence, bucketization is an inherent structure of any such scheme.

Example 17

Exponential histograms (Example 5) maintain nonoverlapping buckets, where each bucket represents the synopsis of the data between two synchronization boundaries.


Figure 5-2: Exponential histogram with nonoverlapping buckets (a bitstream of window size N shown along the stream direction, with synchronization boundaries separating imperfectly remembered data from perfectly remembered data)


Example 18

The Waves synopsis (Example 6) maintains overlapping buckets with synchronization boundaries as bucket boundaries, and several buckets are maintained at each dyadic level.

Figure 5-3: Waves synopsis with overlapping bucket boundaries (a sliding window of size N with synchronization boundaries, drawn at the dyadic levels "by 2", "by 4", "by 8" and "by 16")

A sliding window algorithm has to do the following:

1. Incorporate new item.

2. From the bucket synopsis, compute the result of the query.

This can be done by defining a linear operator that adds a new item to a bucket. If there is more than one synchronization boundary, and hence more than one bucket, the bucket synopses can be merged linearly by defining a suitable plus operator to arrive at the answer. Hence, we can say that any sliding window scheme has to have a linear synopsis creation and merge method at its core.
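The two steps, incorporating an item into a bucket and merging bucket synopses to answer a query, can be sketched as follows. This is an illustrative sketch only: it assumes fixed-width buckets in the style of Basic Windows rather than Exponential Histograms, a simple count of 1's as the synopsis, and integer timestamps. The class `BucketedWindow` and its method names are hypothetical.

```python
from collections import deque


class BucketedWindow:
    """Sliding-window count of 1's kept as fixed-width bucket synopses."""

    def __init__(self, window, bucket_width):
        self.window = window              # window size N, in timestamps
        self.bucket_width = bucket_width  # spacing of synchronization boundaries
        self.buckets = deque()            # (start timestamp, synopsis), oldest first

    @staticmethod
    def linear_operator(bit):
        return bit  # L(x): a 1 contributes 1 to the count

    @staticmethod
    def plus(a, b):
        return a + b  # the plus operator is arithmetic addition

    def insert(self, timestamp, bit):
        start = timestamp - timestamp % self.bucket_width
        if self.buckets and self.buckets[-1][0] == start:
            old_start, synopsis = self.buckets.pop()
            synopsis = self.plus(synopsis, self.linear_operator(bit))
            self.buckets.append((old_start, synopsis))
        else:
            self.buckets.append((start, self.linear_operator(bit)))
        # Drop buckets that lie entirely outside the window.
        while self.buckets[0][0] + self.bucket_width <= timestamp - self.window + 1:
            self.buckets.popleft()

    def query(self, now):
        """Merge the synopses of buckets that lie fully inside the window."""
        oldest_allowed = now - self.window + 1
        result = 0
        for start, synopsis in self.buckets:
            if start >= oldest_allowed:
                result = self.plus(result, synopsis)
        return result


bits = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0]  # hypothetical bitstream
window = BucketedWindow(window=8, bucket_width=4)
for timestamp, bit in enumerate(bits):
    window.insert(timestamp, bit)
print(window.query(now=11))  # 5: the window [4, 11] starts on a bucket boundary
```

When the oldest timestamp of the window coincides with a bucket start, the merged answer is exact; between such boundaries the partially expired bucket is left out and the count is underestimated, which mirrors the error profile described in Section 5.1.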

5.4 Examples

We now show two examples where we derive the linearity structure of algorithms proposed in the literature as applied to sliding windows. We do not deal with how each algorithm creates synchronization boundaries along the sliding window length; it is up to the algorithm designer to decide how much error to allow when spacing the synchronization boundaries apart. We show how new elements are incorporated into the synopsis structure and how queries are answered. Mathematically, we show the underlying linear operator and the $\oplus$ operator. We take two examples, one each for computing aggregation statistics and order statistics.

Example 19

L2 norm computation on sliding windows [3].

AMS sketches are maintained for each bucket of the sliding window, and the L2 norm is computed by adding together the sketches from the active buckets, using a technique similar to that for the infinite case (see Example 2). The linear operator $L$ that works on each item as it arrives and forms a single bucket is

$$L(x_i) = \xi_i \times x_i$$

where, $\forall i \in U$, $\xi_i$ is the set of random pairwise independent $\pm 1$ variables. The $\oplus$ operator is arithmetic +, which is used to merge buckets together and to compute the combined synopsis from the active buckets for answering the L2 norm query.
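A short derivation shows why adding bucket sketches is valid. The notation here is an assumption introduced for illustration: $x_i^{(k)}$ denotes the frequency of value $i$ in bucket $B_k$, $Y_{B_k}$ is the single counter kept for that bucket, and every bucket is assumed to use the same family of $\pm 1$ variables $\xi_i$ with zero mean.

```latex
\begin{align*}
Y_{B_k} &= \sum_{i \in U} \xi_i \, x_i^{(k)}, \\
Y_{B_1} \oplus Y_{B_2} &= \sum_{i \in U} \xi_i \left( x_i^{(1)} + x_i^{(2)} \right) = Y_{B_1 \cup B_2}, \\
E\left[ Y^2 \right] &= \sum_{i \in U} x_i^2 \, E\left[\xi_i^2\right] + \sum_{i \neq j} x_i x_j \, E\left[\xi_i \xi_j\right] = \sum_{i \in U} x_i^2 .
\end{align*}
```

The second line is the merge step: the sum of two bucket counters is the counter of the combined bucket. The third line uses $\xi_i^2 = 1$ and, by pairwise independence, $E[\xi_i \xi_j] = E[\xi_i]E[\xi_j] = 0$ for $i \neq j$, so the square of the merged counter is an unbiased estimate of the squared L2 norm over the active buckets.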

Example 20

Quantile query on sliding windows [25], [12].

A GK synopsis is maintained for each bucket in the sliding window; the synopses of the active buckets are merged together, and the computation is the same as in the infinite streaming model. The linear operator $L$ that works on each item as it arrives is the same as the one defined for the infinite data stream:

$$L(x) = (\text{create tuple}) = \langle x, 1, 2\epsilon N \rangle$$

We expand the functionality of our $\oplus$ operator by defining two $\oplus$ operators, $\oplus_{\text{intra bucket}}$ and $\oplus_{\text{inter bucket}}$. The $\oplus_{\text{intra bucket}}$ operator is the same as the operator in the infinite stream case.









The other plus operator works on buckets of synopses and is as follows:

$$x \oplus_{\text{inter bucket}} y = \text{MIN}(x, y) \Rightarrow \text{form tuple } T \Rightarrow \text{INSERT}(S, T)$$

$$\Rightarrow \text{form tuple } T' \text{ from MAX}(x, y) \Rightarrow \text{INSERT}(S, T')$$

Commutativity is guaranteed by the MIN function, and associativity holds by an order invariance argument similar to that for $\oplus_{\text{intra bucket}}$:

$$(x \oplus_{\text{inter bucket}} y) \oplus_{\text{inter bucket}} z = x \oplus_{\text{inter bucket}} (y \oplus_{\text{inter bucket}} z)$$

This is because the algorithm guarantees that, irrespective of the order in which the buckets x, y, z are formed, the error is upper bounded by the maximum error, and the equality holds in that respect.














CHAPTER 6
DISTRIBUTED STREAMS

With respect to our definition of a data stream algorithm as equivalent to F(L(D)), the distributed stream model can be seen as a generalization of the infinite stream, single receiver model. Separate streams arrive at each node of the network, and the statistic is to be computed over the union of all these streams. In this model, transmitting all the stream items to a central point is not desirable because it involves a lot of communication. Instead, each node computes a small-space synopsis over the stream it receives, and at least the coordinating node receives synopsis information from all other nodes and computes the statistic over the union of the synopses across all nodes. The key challenges here are to maintain a small-space synopsis at each node and to keep the number of communication bits to a minimum.

It is easy to see that a distributed algorithm for statistic computation can be applied to a single party streaming model with the same space-error bounds. The reverse, however, is not true: an algorithm designed for the single party streaming model cannot be trivially applied to distributed streams. Nevertheless, it has been shown that the space costs of the two models are tight with respect to each other [26].

To formalize this, consider a single party streaming setting where a function f is to be computed over n streams $X_1, X_2, \ldots, X_n$ interleaved in random order to produce one single stream X. Let the space cost of a deterministic computation be SSC(f). The distributed setting equivalent of this is to compute f over the streams $X_i$ assigned to each node i. Let the space cost of deterministically computing f in this setting be DSC(f). Gibbons et al. [26] show that:

Theorem 2. For any n > 1 and any function f, $SSC(f) \le DSC(f) \le n\,SSC(f)$.









This theorem says that the cost of distributed computing over streams is at least that of computing at a single location, but is bounded above by the single stream cost times a factor of n. It is not clear whether DSC(f) < nSSC(f) or whether the equality holds at all times.

We improve the bounds on the above as follows:

Theorem 3. For any n > 1 and any function f, DSC(f) = nSSC(f).

Proof. As in the proof of Theorem 2, let $P_m$ be the best protocol, with cost SSC(f), for calculating f over a single interleaved stream, and let $P_d$ be the protocol for calculating f over the distributed streams in which each node runs $P_m$ to compute a synopsis $L(X_i)$ of size s and transmits it to the coordinator node, which calculates $f(L(X_1), L(X_2), \ldots, L(X_n))$. The total space used is thus n times the size of each synopsis, and hence DSC(f) = nSSC(f).

Assume to the contrary that there exists a protocol $P_d$ such that DSC(f) < nSSC(f). Then there is at least one node i which computes the synopsis $L(X_i)$ and sends less than that data to the central node. Thus, what we have is a protocol $P_m$ that computes an inefficient synopsis, which can be improved upon by discarding the piece of data not deemed useful to send to the central node. We can therefore have a better streaming algorithm/protocol for the single party case, thereby reducing SSC(f), a contradiction. $\square$

This theorem says that the space complexity of a deterministic computation of any f (whether exact or approximate) over the distributed streaming model is n times the space complexity over the single party streaming model. It says that we cannot find a distributed protocol that does better than this. This gives us two observations:

1. Any efficient distributed algorithm for a statistic computation f should have space bounds matching those of a single party streaming algorithm for the same f.

2. The underlying structure of efficient distributed computation is linear. We achieve the DSC(f) = nSSC(f) bound by sending all the $L(D_i)$ synopses to the central node, which has to figure out how to combine them such that

$$L(D) = L(D_1) \oplus L(D_2) \oplus \cdots \oplus L(D_n)$$

Thus, each node computes its own synopsis from its incoming stream and then, through a protocol, transmits the synopsis to a central point, which computes the statistic on them. Mathematically, if there are n nodes and each node i receives data $D_i$, then the statistic is computed as

$$\text{Statistic} = F(L(D_1) \oplus L(D_2) \oplus \cdots \oplus L(D_n)) = F(L(D_1 \cup D_2 \cup \cdots \cup D_n)) \qquad (6.1)$$

Figure 6-1: Statistic computation on distributed streams (panels: statistics computation on a single unbounded stream, where $D = D_1 \cup D_2 \cup \cdots \cup D_n$ is mapped to L(D) and then to F(L(D)); statistics computation on distributed streams, $F(L(D_1) + L(D_2) + \cdots + L(D_n))$)


Thus, the underlying structure of any statistic computation over the distributed model is linear: the synopses from the various nodes are combined ($\oplus$), and the final result is independent of the order of combination, since we make no assumption about which stream arrives at which node. That gives us the properties of commutativity and associativity for the linear operator embedded within such a computation.
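The order independence of the merge at the central node can be illustrated with a small Python sketch. It is not taken from any of the cited protocols: the synopsis chosen here (count, sum, sum of squares) and the names `linear_operator`, `plus`, `synopsis` and `statistic` are assumptions made for illustration, and the node streams are made-up data.

```python
from functools import reduce
from itertools import permutations

NULL = (0, 0, 0)  # identity element of the plus operator


def linear_operator(x):
    """L(x): contribution of one item to (count, sum, sum of squares)."""
    return (1, x, x * x)


def plus(a, b):
    """The plus operator: component-wise addition of two synopses."""
    return tuple(p + q for p, q in zip(a, b))


def synopsis(stream):
    """L(D): fold the linear operator over a stream."""
    return reduce(plus, map(linear_operator, stream), NULL)


def statistic(s):
    """F: mean and variance computed from the merged synopsis."""
    count, total, total_sq = s
    mean = total / count
    return mean, total_sq / count - mean * mean


node_streams = [[4, 8, 15], [16, 23], [42]]      # D_1, D_2, D_3
node_synopses = [synopsis(d) for d in node_streams]

# The coordinator may receive the synopses in any order.
merged_in_every_order = {
    reduce(plus, order, NULL) for order in permutations(node_synopses)
}

union = [x for d in node_streams for x in d]     # D = D_1 U D_2 U D_3
print(merged_in_every_order == {synopsis(union)})
print(statistic(synopsis(union)))
```

Every arrival order produces the same merged synopsis, and it equals the synopsis computed directly over the union of the streams, which is the content of equation (6.1) for this particular choice of L and F.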















CHAPTER 7
LINEAR OPERATOR AS ABSTRACT DATA TYPE

Linear operators are essential to the design of any small-space streaming algorithm. They have nice properties that are used extensively in the existing literature without being mentioned explicitly. Recent research literature abounds in applications of the same algorithm to different streaming models by defining or modifying the $\oplus$ operator, without realizing the power of the underlying structure. Nath et al. [27] describe order insensitivity and duplicate insensitivity of the synopsis as requirements for distributed computation. Duplicate insensitivity is a form of constraint that can be handled by appropriately defining a $\oplus$ operator. The Union Property of Count-Min and FM sketches, used by Hadjieleftheriou et al. [28] to create distributed synopses, is inherent in their linear structure.

A linear operator can be viewed as an Abstract Data Type (ADT) with a null element, a $\oplus$ addition and a getquery function. There are substantial benefits to viewing a linear operator as an ADT:

• Various linear operators as ADTs will have a similar interface, which can be used by an extensible database system like PREDATOR [29] to answer streaming queries.

• Linear operators as ADTs lead to generic algorithms with predefined interfaces.

• Linear operators as ADTs have the added property that they are composable, commutative and associative, and any streaming query engine can exploit these properties for flexible optimizations.
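One possible shape for such an ADT is sketched below in Python. The interface names (`null`, `lift`, `plus`, `get_query`) and the example synopsis `RangeSynopsis` are hypothetical and chosen only to illustrate the idea; the range (maximum minus minimum) is used because its plus operator is also duplicate insensitive.

```python
from abc import ABC, abstractmethod
from functools import reduce


class LinearOperatorADT(ABC):
    """Interface: a null element, a plus operator and a query function."""

    @classmethod
    @abstractmethod
    def null(cls):
        """Return the identity element of plus."""

    @classmethod
    @abstractmethod
    def lift(cls, item):
        """Return the synopsis L(x) of a single stream item."""

    @abstractmethod
    def plus(self, other):
        """Return the merged synopsis of self and other."""

    @abstractmethod
    def get_query(self):
        """Return the query answer computed from the synopsis."""


class RangeSynopsis(LinearOperatorADT):
    """Tracks the range (max - min) of the values seen so far."""

    def __init__(self, low, high):
        self.low, self.high = low, high

    @classmethod
    def null(cls):
        return cls(float("inf"), float("-inf"))

    @classmethod
    def lift(cls, item):
        return cls(item, item)

    def plus(self, other):
        return RangeSynopsis(min(self.low, other.low), max(self.high, other.high))

    def get_query(self):
        return self.high - self.low


def summarize(adt, stream):
    """Generic algorithm: works for any class implementing the interface."""
    return reduce(lambda acc, item: acc.plus(adt.lift(item)), stream, adt.null())


left = summarize(RangeSynopsis, [7, 3, 9])
right = summarize(RangeSynopsis, [4, 12])
print(left.plus(right).get_query())  # 9, the range of all five values
```

The function `summarize` never looks inside the synopsis, so the same driver code could serve any other operator that implements the interface, which is the kind of reuse the list above describes.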

A linear operator ADT has an interesting relation to the user-defined relational aggregate operator. It can define INITIALIZE, ITERATE and TERMINATE functions and can answer queries continuously.









Example 21

QUERY: SELECT COUNT(*) FROM X1, X2 WHERE X1.a = X2.a

Linear Operator($x_{1i}$, $x_{2i}$)

INITIALIZE:

    state: $Y_{X_1} = 0$; $Y_{X_2} = 0$;

ITERATE:

    UPDATE state: $Y_{X_1} \mathrel{+}= \xi_i x_{1i}$; $Y_{X_2} \mathrel{+}= \xi_i x_{2i}$;

    RETURN output: $Y_{X_1} Y_{X_2}$;

TERMINATE:

    RETURN output: $Y_{X_1} Y_{X_2}$;



This linear operator follows the AMS scheme for calculating the size of a join, where $\xi_i$ is a family of four-wise independent binary random variables with each $\xi_i \in \{-1, 1\}$ and $P[\xi_i = 1] = P[\xi_i = -1] = 1/2$.
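The INITIALIZE, ITERATE and TERMINATE steps of this example can be sketched in Python as follows. Several details are assumptions made for illustration and are not specified in the text: each arriving tuple is represented by the integer value of its join attribute, the $\pm 1$ variables are derived from the low bit of a random cubic polynomial over a prime field (a standard four-wise independent construction whose sign is only nearly unbiased), and a `plus` method is added to show composability. The class name `JoinSizeOperator` is hypothetical.

```python
import random

PRIME = 2**31 - 1  # field size for the hash polynomial (illustrative choice)


class JoinSizeOperator:
    """One pair of AMS counters with INITIALIZE / ITERATE / TERMINATE steps."""

    def __init__(self, seed):
        self.seed = seed
        rng = random.Random(seed)
        # Coefficients of a random cubic polynomial over GF(PRIME).
        self.coeffs = [rng.randrange(PRIME) for _ in range(4)]
        self.initialize()

    def xi(self, value):
        """Map a join attribute value in [0, PRIME) to +1 or -1."""
        a, b, c, d = self.coeffs
        h = (((a * value + b) * value + c) * value + d) % PRIME
        return 1 if h & 1 else -1

    def initialize(self):
        self.y1 = 0
        self.y2 = 0

    def iterate(self, x1=None, x2=None):
        """Consume one tuple from X1 and/or X2; return the current estimate."""
        if x1 is not None:
            self.y1 += self.xi(x1)
        if x2 is not None:
            self.y2 += self.xi(x2)
        return self.y1 * self.y2

    def plus(self, other):
        """Merge two operators that share the same xi family."""
        if self.seed != other.seed:
            raise ValueError("operators must share the same xi family")
        merged = JoinSizeOperator(self.seed)
        merged.y1 = self.y1 + other.y1
        merged.y2 = self.y2 + other.y2
        return merged

    def terminate(self):
        return self.y1 * self.y2


op = JoinSizeOperator(seed=1)
for a in [5, 5, 5]:
    op.iterate(x1=a)
for a in [5, 5]:
    op.iterate(x2=a)
print(op.terminate())  # 3 * 2 = 6, since xi(5) squared is 1
```

A single counter pair gives a noisy estimate in general; averaging many independent copies is the usual way to reduce the variance and is left out of this sketch.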

However, while user-defined relational aggregate operators do not allow for composability in their definition, even though the aggregate function itself is composable, linear operators as ADTs have this property in their definition and can be used more efficiently.

The special properties of the individual synopsis generation techniques previously proposed by researchers are actually due to linear operators. Linear operators with their $\oplus$ can be used to process incoming stream data in a parallel manner: the stream can be divided into multiple parts, each processor can work on a part of the stream, and the final result can be obtained by adding the synopses together. Owing to their order insensitivity and the $\oplus$ operator, linear operators extend easily to distributed stream computation.















CHAPTER 8
TOWARDS A FRAMEWORK OF DESIGN OF STREAMING ALGORITHMS

The linear nature of these synopsis operators gives us insight into a framework for designing streaming algorithms. We can componentize the design by selecting the right linear operator for the stream given the model constraints, and separately selecting the appropriate algorithm for the statistic computation given our space-error bounds. There are immediate benefits to this framework. For varying model constraints but the same statistic computation, we do not have to come up with an altogether new design; we only have to adjust the linear synopsis operator to reflect the new constraint.

Figure 8-1: Componentized design of streaming algorithms (the two components of F(L(D)): define a linear operator L(D) satisfying the model constraints; provide an approximate algorithm that works on the linear synopsis operator)

Example 22

BasicCounting over any weighted sliding window N, with weight function g, can be estimated using an Exponential Histogram with window size N [16].









Cohen et al. [16] prove that BasicCounting over any weighted function on

sliding windows can be approximated by a linear combination of predefined weights

on bucket counts of Exponential Histogram. Thus, for this new model, we only

need to modify our Linear operator to this new weighted Linear operator and

essentially use the same algorithm to come up with statistic computation that we

would over the (uniform weighted) sliding window model.






Figure 8-2: Weighted sliding window computation using Exponential histograms (a weight function drawn over the Exponential Histogram buckets B1, B2, B3, B4)


For example, if the weights are 8, 5, 3, 2 respectively for a time window spanning the four buckets, the result of BasicCounting will be

$$\text{BasicCount} = 8L(D_4) + 5L(D_3) + 3L(D_2) + 2L(D_1)$$

This example shows that by modifying our linear operator for synopsis collection to account for the constraint of a weight function over the window, we avoid coming up with a new algorithm.
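The weighted combination can be written as a one-line computation over the bucket synopses. In the sketch below the bucket counts are made-up values and the function name `weighted_basic_count` is hypothetical; only the weights 8, 5, 3, 2 come from the example above.

```python
def weighted_basic_count(bucket_counts, weights):
    """Linear combination of bucket synopses with one weight per bucket."""
    if len(bucket_counts) != len(weights):
        raise ValueError("one weight is needed per bucket")
    return sum(w * c for w, c in zip(weights, bucket_counts))


# Hypothetical counts L(D4), L(D3), L(D2), L(D1), listed in the same order
# as the weights 8, 5, 3, 2 of the example.
bucket_counts = [1, 2, 4, 8]
weights = [8, 5, 3, 2]

print(weighted_basic_count(bucket_counts, weights))       # 8 + 10 + 12 + 16 = 46
print(weighted_basic_count(bucket_counts, [1, 1, 1, 1]))  # uniform weights: 15
```

Setting every weight to 1 recovers the ordinary (uniformly weighted) sliding window count, so the same bucket synopses serve both models and only the combination step changes.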

Another benefit of the component framework of design is that if an efficient algorithm is found for any model, it can be easily adapted to other models owing to its linear nature. For example, the GK algorithm for computing quantiles over an infinite stream can be modified to compute quantiles over sliding windows by appropriately adjusting the linear operator corresponding to the synopsis (see [25]). Datar et al. [3] have given automatic error guarantees for the computation of a statistic f over sliding windows, given its error guarantees for the infinite streaming case, by a similar composition of linear operators.















CHAPTER 9
CONCLUSION

We have shown that streaming algorithms possess the underlying algebraic structure of linearity. We showed that algorithms on the infinite streaming model, the distributed streaming model and the sliding window model all have linear operators embedded within them. We proved tight bounds for distributed algorithms relative to their single party case. Thus, we expect distributed stream algorithms to have space bounds comparable to those of single stream models.

We showed by numerous examples that it is due to the property of linearity

that these algorithms have the property of orderlessness and one pass. Moreover,

they can be easily made into parallel algorithms due to compositionality of linear

operators embedded within these algorithms.

We proposed a componentized framework to design streaming algorithms. This

component structure is beneficial in understanding how the algorithms work and

also to modify them for new constraints added by the model. Also if new efficient

algorithms are proposed in future for any particular streaming model, they can

be adapted to other models by changing the underlying linear operator to reflect

constraints of the required model.

We found sampling difficult to encode as a linear operator. A sampling-based synopsis is always order dependent and, as suggested by Chaudhuri et al. [30], a better way to sample streams is to use data distribution information, when available, to arrive at a biased sampling probability that overcomes the problem of order dependence.

In future we expect to extend the framework to arrive at a generic method by which algorithms in one model can be adapted to any other model, and to come up with automatic error guarantees that generalize the work of Datar et al. [3].















REFERENCES


[1] J. I. Munro and M. S. Paterson, "Selection and sorting with limited storage,"
in TCS 12, 1980, pp. 315-323.

[2] Noga Alon, Yossi Matias, and Mario Szegedy, "The space complexity of
approximating the frequency moments," in Proceedings of the Twenty-eighth
Annual ACM Symposium on Theory of Computing, New York, NY, USA,
1996, pp. 20-29, ACM Press.

[3] Mayur Datar, Aristides Gionis, Piotr Indyk, and Rajeev Motwani, "Maintaining
stream statistics over sliding windows: (extended abstract)," in SODA
'02: Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete
Algorithms, Philadelphia, PA, USA, 2002, pp. 635-644, Society for Industrial
and Applied Mathematics.

[4] Shivnath Babu and Jennifer Widom, "Continuous queries over data streams,"
SIGMOD Rec., vol. 30, no. 3, pp. 109-120, 2001.

[5] Lukasz Golab and M. Tamer Özsu, "Issues in data stream management," SIGMOD
Rec., vol. 32, no. 2, pp. 5-14, 2003.

[6] Noga Alon, Phillip B. Gibbons, Yossi Matias, and Mario Szegedy, "Tracking
join and self-join sizes in limited storage," in Proceedings of the Eighteenth
ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database
Systems, New York, NY, USA, 1999, pp. 10-20, ACM Press.

[7] Alin Dobra, Minos Garofalakis, Johannes Gehrke, and Rajeev Rastogi,
"Processing complex aggregate queries over data streams," in Proceedings of
the 2002 ACM SIGMOD International Conference on Management of Data,
New York, NY, USA, 2002, pp. 61-72, ACM Press.

[8] S. Ganguly, M. Garofalakis, and R. Rastogi, "Processing data-stream join
aggregates using skimmed sketches," in Proceedings of the 9th International
Conference on Extending Database Technology, London, UK, 2004, pp. 569-586,
Springer.

[9] Graham Cormode, Mayur Datar, Piotr Indyk, and S. Muthukrishnan, "Com-
paring data streams using hamming norms (how to zero in)," IEEE Trans-
actions on Knowledge and Data Engineering, vol. 15, no. 3, pp. 529-540,
2003.









[10] Ziv Bar-Yossef, T. S. Jayram, Ravi Kumar, D. Sivakumar, and Luca Trevisan,
"Counting distinct elements in a data stream," in RANDOM '02: Proceedings
of the 6th International Workshop on Randomization and Approximation
Techniques, London, UK, 2002, pp. 1-10, Springer-Verlag.

[11] Gurmeet Singh Manku, Sridhar Rajagopalan, and Bruce G. Lindsay, "Random
sampling techniques for space efficient online computation of order statistics
of large datasets," in Proceedings of the 1999 ACM SIGMOD International
Conference on Management of Data, New York, NY, USA, 1999, pp. 251-262,
ACM Press.

[12] Michael Greenwald and Sanjeev Khanna, "Space-efficient online computation
of quantile summaries," in SIGMOD '01: Proceedings of the 2001 ACM
SIGMOD International Conference on Management of Data, New York, NY,
USA, 2001, pp. 58-66, ACM Press.

[13] A. C. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. Strauss, "How
to summarize the universe: Dynamic maintenance of quantiles," in
Proceedings of the 28th International Conference on Very Large Data Bases, 2002,
pp. 454-465.

[14] Moses Charikar, Kevin Chen, and Martin Farach-Colton, "Finding frequent
items in data streams," in Proceedings of the 29th International Colloquium
on Automata, Languages and Programming, London, UK, 2002, pp. 693-703,
Springer-Verlag.

[15] Shivnath Babu and Jennifer Widom, "Continuous queries over data streams,"
SIGMOD Rec., vol. 30, no. 3, pp. 109-120, 2001.

[16] Edith Cohen and Martin Strauss, "Maintaining time-decaying stream
aggregates," in PODS '03: Proceedings of the Twenty-second ACM SIGMOD-
SIGACT-SIGART Symposium on Principles of Database Systems, New York,
NY, USA, 2003, pp. 223-233, ACM Press.

[17] Y. Zhu and D. Shasha, "Statstream: Statistical monitoring of thousands
of data streams in real time," Technical Report TR2002-827, New York
University CS Dept, New York, NY 10012, 2002.

[18] Phillip B. Gibbons and Srikanta Tirthapura, "Distributed streams algorithms
for sliding windows," in SPAA '02: Proceedings of the Fourteenth Annual
ACM Symposium on Parallel Algorithms and Architectures, New York, NY,
USA, 2002, pp. 63-72, ACM Press.

[19] David Kempe, Alin Dobra, and Johannes Gehrke, "Gossip-based computation
of aggregate information," in FOCS '03: Proceedings of the 44th Annual IEEE
Symposium on Foundations of Computer Science, Washington, DC, USA,
2003, p. 482, IEEE Computer Society.









[20] Abhinandan Das, Sumit Ganguly, Minos Garofalakis, and Rajeev Rastogi,
"Distributed set-expression cardinality estimation," in Proceedings of
VLDB, 2004, pp. 312-323.

[21] B. Babcock and C. Olston, "Distributed top-k monitoring," in Proceedings
of the ACM SIGMOD International Conference on Management of Data, pp.
28-39, 2003.

[22] Yannis E. Ioannidis and Viswanath Poosala, "Histogram-based approximation
of set-valued query-answers," in VLDB '99: Proceedings of the 25th Interna-
tional Conference on Very Large Data Bases, San Francisco, CA, USA, 1999,
pp. 174-185, Morgan Kaufmann Publishers Inc.

[23] Kaushik Chakrabarti, Minos Garofalakis, Rajeev Rastogi, and Kyuseok Shim,
"Approximate query processing using wavelets," VLDB Journal: Very Large
Data Bases, vol. 10, no. 2-3, pp. 199-223, 2001.

[24] Anna C. Gilbert, Yannis Kotidis, S. Muthukrishnan, and Martin Strauss,
"One-pass wavelet decompositions of data streams.," IEEE Trans. Knowl.
Data Eng., vol. 15, no. 3, pp. 541-554, 2003.

[25] Xuemin Lin, Hongjun Lu, Jian Xu, and Jeffrey Xu Yu, "Continuously
maintaining quantile summaries of the most recent n elements over a data
stream," in ICDE '04: Proceedings of the 20th International Conference
on Data Engineering, Washington, DC, USA, 2004, p. 362, IEEE Computer
Society.

[26] Phillip B. Gibbons and Srikanta Tirthapura, "Estimating simple functions
on the union of data streams," in SPAA '01: Proceedings of the Thirteenth
Annual ACM Symposium on Parallel Algorithms and Architectures, New York,
NY, USA, 2001, pp. 281-291, ACM Press.

[27] Suman Nath, Phillip B. Gibbons, Srinivasan Seshan, and Zachary R. Anderson,
"Synopsis diffusion for robust aggregation in sensor networks," in SenSys
'04: Proceedings of the 2nd International Conference on Embedded Networked
Sensor Systems, New York, NY, USA, 2004, pp. 250-262, ACM Press.

[28] Marios Hadjieleftheriou, John W. Byers, and George Kollios, "Robust
sketching and aggregation of distributed data streams," Tech. Rep., 2005.

[29] Praveen Seshadri and Mark Paskin, "Predator: an or-dbms with enhanced
data types," in SIGMOD '97: Proceedings of the 1997 ACM SIGMOD
International Conference on Management of Data, New York, NY, USA, 1997,
pp. 568-571, ACM Press.

[30] Surajit Chaudhuri and Rajeev Motwani, "On sampling and relational
operators," IEEE Data Eng. Bull., vol. 22, no. 4, pp. 41-46, 1999.















BIOGRAPHICAL SKETCH

Guruditta Golani was born in Ajmer, India, in 1977. He did his schooling in various cities in Rajasthan. Afterwards, he joined the premier engineering institute of India, the Indian Institute of Technology, Kharagpur (IIT-KGP), in 1995. He received his B. Tech (Honors) degree in electrical engineering in 1999 and worked for the next four years in the software field in various companies. He joined the Computer and Information Science and Engineering Department at the University of Florida in August 2003 to pursue a Master of Science. Under the tutelage of Dr. Alin Dobra, he worked on stream algorithms towards his master's thesis and is expected to graduate in August 2005.