Query Optimization Using Frequent Itemset Mining

Permanent Link: http://ufdc.ufl.edu/UFE0010844/00001

Material Information

Title: Query Optimization Using Frequent Itemset Mining
Physical Description: Mixed Material
Copyright Date: 2008

Record Information

Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.
System ID: UFE0010844:00001


281e8c53cf1bfbe6ff7c2adaa24b27db4a810af6
3302 F20101130_AAAUZR eom_b_Page_66thm.jpg
bfafa70e6a0260c8183ddee45a5c7fe1
7b9c63f95c54990ff5251d86d51aee324f7cdc0e
F20101130_AAAUUU eom_b_Page_80.tif
118490d5842e1d040e15baecfc903584
fa3090a3ff732ded87924d051e995323716cc245
88425 F20101130_AAAUPX eom_b_Page_29.jp2
f668fceb07d8958aae1b8d5acde831a0
b51c146ff34b449544b4217213e2ae4b5236ed47
10325 F20101130_AAAUKZ eom_b_Page_73.QC.jpg
5e5a71ed7fe9a632b739f282a9e5af3e
61b01f447113b982f4d721a6ebf785d13eafd91d
14446 F20101130_AAAUZS eom_b_Page_67.QC.jpg
8b415f824a709cefdb8a91f789194924
8f1815f243420b6749728165e0e17e3814f8a127
F20101130_AAAUUV eom_b_Page_82.tif
066b33fdf94122ddc0268e2f75a1f355
974cb18fc07de4120687735b0ed60d940a2dc98d
113747 F20101130_AAAUPY eom_b_Page_30.jp2
9a3d2c122732f15815c5724c2aa983bd
0ea4db6088025cee2c50a7d2663a0c8f968dd977
4136 F20101130_AAAUZT eom_b_Page_68thm.jpg
362d155a9c8c9731ffd189c54ca82a05
79993da288e1371e2516d96c0fb6f3f82c7e9eab
F20101130_AAAUUW eom_b_Page_84.tif
82174cb8f086cad86c2d95512087ffe7
e2ab26705ae3f10534993d7c0c95c49fc7d23a50
59364 F20101130_AAAUNA eom_b_Page_43.jpg
9646dea624cfc52f39141072db81570f
8ce56dc54fb035813c28d84cd855fa6eda4f40ba
107797 F20101130_AAAUPZ eom_b_Page_32.jp2
e3f79be26f5dfce1326766b3d43e5a5a
0f65e52bb835f2af7460e5bf51fd5316fe5f61c4
12167 F20101130_AAAUZU eom_b_Page_69.QC.jpg
62659737066046a91e00bcf0fc386b35
fd9321c5263230d5f33da53add30c2d5c3b3e3d6
F20101130_AAAUUX eom_b_Page_86.tif
634fc702bb69a5b151e8ada3318d5982
a1b1c651a7aef260fef40abafa1dbccc18e16699
50928 F20101130_AAAUNB eom_b_Page_44.jpg
a9d31dd99532dc36f01968cfc562753d
447ca73284cb0477a0702223e87a38cc085e6e6a
3941 F20101130_AAAUZV eom_b_Page_69thm.jpg
e62c3b04e3018cf96baf32a878805ed1
5b8da3ae73c354a9e8e3029202a3d529f28abd82
F20101130_AAAUUY eom_b_Page_87.tif
018791ad7ecbe9850f667926cecde489
a99f78bd1a2c7b6c032d889e3be4112cc384428d
72127 F20101130_AAAUNC eom_b_Page_45.jpg
cdfa681ec6668feea05a1b6d4723a213
68677b7468d87d9d01b8479dbef3c7af81d8cbd1
F20101130_AAAUIF eom_b_Page_83.tif
4bfd4cbbc00fe3a4ddac14b7c00ada1a
bef693b1ac9ab8abcd834ae39646b2fd4be95dbc
10327 F20101130_AAAUZW eom_b_Page_70.QC.jpg
22ce89cb5b7d2c0450cbbc14beb93249
445f72926f482ab9be2ce511915fed2014695724
F20101130_AAAUUZ eom_b_Page_88.tif
fd590b2b130030bcd016368f8d6342cc
ddd142e324042f3b3e98484d511badee446aeef1
56196 F20101130_AAAUND eom_b_Page_46.jpg
e078136f96a416b695ff47afcd67ed3d
cc959338edac5114dc6a9e9a4e360750f2014146
F20101130_AAAUIG eom_b_Page_30.tif
bd6b427480665c79c8d5d8bc55758c10
718584f016cc2c22948ca1b8ed5c7c58d163afbb
590893 F20101130_AAAUSA eom_b_Page_95.jp2
23cbd181a0051171560fc7bd122126af
af92030bbce2d121e97ee080aac7dd7386a9e359
3410 F20101130_AAAUZX eom_b_Page_70thm.jpg
76cbb10e622b63477f7563c0e2842432
8f94a273a014b95ca46c28e5be910a4b6c7cf0b2
67928 F20101130_AAAUNE eom_b_Page_47.jpg
8d2fc2b90cfcf888e021491219fa7dfd
789909199824c73d8218b7f63d52f1098bf95509
3929 F20101130_AAAUIH eom_b_Page_83thm.jpg
5191ef9be9d459e4d74ddd7d0a9c31c6
aa5ddd46218c1b027db3ebe686355f9e5827567f
123726 F20101130_AAAUSB eom_b_Page_96.jp2
dcdd7a5bd95a9a42b271387e7cb3afa5
24ad28657871e4c16d6c1a6ec78451aa33091085
11623 F20101130_AAAUZY eom_b_Page_71.QC.jpg
563b2121dfa6d3d38e5962f57d978160
61c9aa57657a40c3755054770de4ad23bfae67f6
60132 F20101130_AAAUNF eom_b_Page_48.jpg
de6a17376c1f80dc3d0b39ffc0881af8
6c38b68b99f0aace53f5b6760bac496ec74c3d19
5965 F20101130_AAAUXA eom_b_Page_16thm.jpg
a19e2be227e7056244dfe8a9d3920fc5
8814d85125af44e210ef3e9c01d35aaf1bff6d4c
1372 F20101130_AAAUII eom_b_Page_02thm.jpg
24821f28341edef02c43c811ba9d2020
a78b8a27bcfce7ae003d97c9d8d85fab5b0a2b34
150266 F20101130_AAAUSC eom_b_Page_97.jp2
1619122551ba142f986842f96fa5fa98
26830feac599cf044e6e01232aa5527bf6a30bae
3768 F20101130_AAAUZZ eom_b_Page_71thm.jpg
2bf7a1879e9baafe805e9bab9ccb0392
f65cda8bf25e343744d926ebc8ba74d70398919f
54823 F20101130_AAAUNG eom_b_Page_49.jpg
31248fbff7c852c92c1e25d91eef475a
ad65c5bce80016778c51909707eac8b77a897aae
15050 F20101130_AAAUXB eom_b_Page_18.QC.jpg
c7f742dd09325c30b79cada0df611f31
5f72c321083517bf816a68db8d1dc47522d13be5
60585 F20101130_AAAUIJ eom_b_Page_38.jpg
bc5fd1fa2a17f9b8b615d65ab2e15056
0f32a936ef1b60a63144710937c4a361eedd5e21
415691 F20101130_AAAUSD eom_b_Page_98.jp2
2be0b4fe2aa879c1266fa209a2eb7ceb
b754a68094554ba0e1a3a131c38203a28d57ab71
71521 F20101130_AAAUNH eom_b_Page_50.jpg
166c7dec43c6674ea2f095795e7d77f2
127b56de39c4111ba6012f9deff5532100a278ac
4422 F20101130_AAAUXC eom_b_Page_18thm.jpg
12e8c394479fb64da9cad663f2ea0f33
f855c1b41fc05d1da96eb296bc89924c9424c41a
17871 F20101130_AAAUIK eom_b_Page_43.QC.jpg
2fde985ddd7c4b583303a9c27ed8ae47
2fe53a4965288a743cf9492a6c70cd9c9cd50e3b
42473 F20101130_AAAUSE eom_b_Page_99.jp2
55ef4cda3cc90f343f495c340d02b3e8
f0f716640a918dd72369ef969afd4819b1001d3d
66346 F20101130_AAAUNI eom_b_Page_51.jpg
a607cd05d361839672e3d534a6d2cf3e
3d5a9f29c5da35afc8285f78e802fae56538356b
12598 F20101130_AAAUXD eom_b_Page_19.QC.jpg
3fbe1d7aa50fb646f64acc4f52886782
4248ae757fbdb020d227e5808a5dc10514dca7da
92315 F20101130_AAAUIL eom_b_Page_59.jp2
706b34fda42cd61e8048fe94dd6d1681
e7bfd34672253ebd693cba2d051c3b16430a39b7
F20101130_AAAUSF eom_b_Page_01.tif
fdf28f0db9735e36ad81a0b5af2d2af2
2265e216654c47074c1794a2c1a79662d935b15c
63604 F20101130_AAAUNJ eom_b_Page_52.jpg
fd4e674aadd1c65d5d5fcc52784a2b02
0f22f2df07c791569aca3a04277332017c2b2b41
5554 F20101130_AAAUIM eom_b_Page_38thm.jpg
39c869ab504bfcff1e17ba3a8fafef15
ff5b76eadecd15806ac897b8734b412d13946afa
F20101130_AAAUSG eom_b_Page_03.tif
d86875da22040b90d6b17adefba44dfb
3c1b12b83da1ad9f01d197941a78fe16a0161e3b
73572 F20101130_AAAUNK eom_b_Page_53.jpg
eb1ff76058847712fd4815b6922a1799
af3c3e87ce12c2a8c27c4789a946a7085d7693f8
22523 F20101130_AAAUXE eom_b_Page_20.QC.jpg
fab109b4bbc882e0949b391c9e9dd353
06b6545a81881706ddacc848f63e8eb84c3add79
3992 F20101130_AAAUIN eom_b_Page_88thm.jpg
4f5f009cb7ebd82a9fd3422c67aa29d2
439138213f35b7b24d84e42198291704dc506411
F20101130_AAAUSH eom_b_Page_04.tif
ebfd0a080c8d9acd0622382b5ab90f50
749de291b64e6e45638dd6fa82feaad48037e3f9
47915 F20101130_AAAUNL eom_b_Page_54.jpg
12bed4bb2bca44542a5461cc0deab765
524b2435a034a9118d75aa8a4d8ad8d30d82dbe1
6384 F20101130_AAAUXF eom_b_Page_20thm.jpg
14472f737b74ed14e870fa95cb9e00fc
c47e540fadea62074441ef8d563a17121a27e302
24293 F20101130_AAAUIO eom_b_Page_17.QC.jpg
fed899563508e5b4089e068d1276dd76
c544dbd5448f967b6ae1e7c51fb9d6041b9c0105
F20101130_AAAUSI eom_b_Page_05.tif
1d8505c65ca9466bd999f34b603dc67d
2ef673ed1bd35780c4dbbefe8817b0928e89345a
60193 F20101130_AAAUNM eom_b_Page_55.jpg
55f0bb4d3e5e2e7eacea8fbd040d8432
1456f7f732dc31310120664a83f7606c809c5331
19006 F20101130_AAAUXG eom_b_Page_21.QC.jpg
67460aa610b33c197b3a36dcf96e207a
02851e9d9538689c9eb25f03c5f1a5738a87126c
109314 F20101130_AAAUIP eom_b_Page_25.jp2
26e87302e25007c6682e96373e2b9121
5bff50368f24f193a8575992473abe7cfc341e71
F20101130_AAAUSJ eom_b_Page_07.tif
fa652a71bed4281385bb8dcda11633c8
4b83dc18cf9fa6762367a78e52eebef36db033b3
32842 F20101130_AAAUNN eom_b_Page_56.jpg
46d72def69212aa31ebbda5a7065b377
170e3768b0f786a3c26ef70989dad494624c3caa
5403 F20101130_AAAUXH eom_b_Page_21thm.jpg
b429e2e638d2e160b4831cce30747814
67e36b95f4b01305823ec4a0440bc42e4514d94c
36339 F20101130_AAAUIQ eom_b_Page_87.jpg
fba722552810479ca2dac80e7132eac5
12bd34414e1f695c8e77feec9633da5835c79dd4
F20101130_AAAUSK eom_b_Page_08.tif
4876bc2075269171934ae60eb321cdbc
38a488fb1d4b117ac4620b857d720870d170abad
59561 F20101130_AAAUNO eom_b_Page_57.jpg
a971419bddc17b29f576146120d4af26
be060702da72b0a8cae5c6e4fe67ce7338aebdcc
16858 F20101130_AAAUXI eom_b_Page_23.QC.jpg
7334b0103d830d771d005ea1510d76ac
0aaebcb3739e3f06d9e46334f14e886c8d96a5c4
585356 F20101130_AAAUIR eom_b_Page_81.jp2
abe115ff7539354c699e8cdd8c550443
71bcc47c4d3bdb8afc06e210fc0d7617e744d66b
F20101130_AAAUSL eom_b_Page_10.tif
5f212f2f309a60c80dcfc4075b93b4e0
5d582097e30725297701dcf5d772a1724b8e65a7
73640 F20101130_AAAUNP eom_b_Page_58.jpg
735af4543e798a44de021b155e771525
8b34ac92823125a352cbfc2be029173f4012cf3f
4922 F20101130_AAAUXJ eom_b_Page_23thm.jpg
b62444d40283a7fb99193d7d96764d3d
c8cc1632d14dae7f3dacb08129f6252ea5865c32
6674 F20101130_AAAUIS eom_b_Page_14thm.jpg
871d3628c8f2285919b1ac1ac18cd9b6
83cfd1e83345c6d90a622fd7aea9d03eeae9454f
F20101130_AAAUSM eom_b_Page_11.tif
ae1eaeb359aac9796b01982e0cff7ce2
18f21cebcac3fcd56189a57b60843b4a08ab0326
15698 F20101130_AAAUXK eom_b_Page_24.QC.jpg
591b71a0bf53d1decb8c4dc298a6ea66
ad522c1625385505d2bf205afcc916463db457b9
19620 F20101130_AAAUIT eom_b_Page_38.QC.jpg
da67dcd843efcbacc55da9b59dce97eb
17494d46619dda18e61cbc234c40010ffecf091b
F20101130_AAAUSN eom_b_Page_12.tif
e10d7aea402fa547391ab84d5ec26d60
05a3bf8338376b6e25c8ff44c5b5c1b05c6ae893
61736 F20101130_AAAUNQ eom_b_Page_59.jpg
ec7a36fb3a3589a72d5993cc0b944cf9
a5dfe9ef4340b1df91ea751a217095833f2377e9
4724 F20101130_AAAUXL eom_b_Page_24thm.jpg
585af838a3d08fd22e13a7ea35c6dd21
e51011e6220dd0a030fca9cbbf63b6c464b809aa
3524 F20101130_AAAUIU eom_b_Page_79thm.jpg
795ec15ca0b89094235b05cb10ad75d8
57dfd530c9b28b2c4a95b52f891bade3caa9b0b1
F20101130_AAAUSO eom_b_Page_14.tif
fcfddd988ee6915d8200648cf57da281
5b62e7eb03921521252d55d6eab334ba5e83959f
33044 F20101130_AAAUNR eom_b_Page_60.jpg
e3f8863330f66feb32b255631c3a2ea4
fb556644e15e55435cfc397e6ec0594b057908c8
23946 F20101130_AAAUXM eom_b_Page_25.QC.jpg
effc72de048e72527a36484965c03d19
c3da5c96dae22f76635864e31792e4ab3416a4a7
F20101130_AAAUSP eom_b_Page_15.tif
dc2da9f5b53ffef34395b1f287becda9
fba68b3d73df5eaf08c1f5b3f12bbd6ab9d3881f
68638 F20101130_AAAUNS eom_b_Page_61.jpg
89a5d81309a20a25b31d327ab66ecc66
a991517c4946ff79655382945c92eeded43496ee
6696 F20101130_AAAUXN eom_b_Page_25thm.jpg
4a5dc469d66ca0833206bdf066b1076f
79f91336605291ec526bd0d46adba7cad94ffb04
F20101130_AAAUIV eom_b_Page_49.tif
f6d60dfa7330822168c2cd2b54a0bf36
991c7d84a14022b02be36043598f91e0922b96f7
F20101130_AAAUSQ eom_b_Page_16.tif
cd8e045aa98770c6e8a9bbbd159c868d
ad4fc6fd463a83c966b71c5035f7825743940e7d
85429 F20101130_AAAUNT eom_b_Page_62.jpg
63168e7d93f16f2edc5e3bff6fd3db38
ab61dfacbcb0ba42c474118a50af80aee0a3837e
8201 F20101130_AAAUXO eom_b_Page_26.QC.jpg
5722c9b344ab8f202da5b17d5914b66a
580445835ec4fae94f2b009c0b3d3249b6568a9c
F20101130_AAAUIW eom_b_Page_22.tif
46880fcc5b12d2d9026a4a479a64faba
b1ff197f7cc9e62caeb55d054aafd1be25b6371e
F20101130_AAAUSR eom_b_Page_17.tif
14dd87626e5ae8e660894ad9c228a918
3f6c67c043461b221f439941eb192acffe1709b0
80240 F20101130_AAAUNU eom_b_Page_63.jpg
ddef88771e6cb16476743cbe1246cf70
2702439e336b5f22d73bf6e512a2a029185eb4b7
21951 F20101130_AAAUXP eom_b_Page_27.QC.jpg
1e6bdb3eb705a645dc6f1122a5cb05cc
72fa2ea433d9118a3595d8a58e94b7774179b4a6
3301 F20101130_AAAUIX eom_b_Page_78thm.jpg
76644105b46b1628a9299e43e1954aef
c60500f9fedd64b3117f2d8fbc6c6e140da40b52
F20101130_AAAUSS eom_b_Page_18.tif
b11077e83fa4c942515f98025694009c
bc3ae0e7cea52cf9eebe3e06b898bb1d9516c96b
78782 F20101130_AAAUNV eom_b_Page_64.jpg
89e0ffe88ded17f954e71b22be779a5b
9a6cc11dea62401cf79965914789d6019fca1c0b
6227 F20101130_AAAUXQ eom_b_Page_27thm.jpg
cb84c588cc8d6a42e4c0ad9911caaa14
59325fe76027b7b61a669a48b97fb678ce3a7190
35940 F20101130_AAAUIY eom_b_Page_91.jpg
1bb03c80ac3983d2c5ee721e37fc50ef
7dd1875036861bcfb47955651727d9a3c916118c
F20101130_AAAUST eom_b_Page_19.tif
f05da384bd00ddff1d93dc9d3a525668
fa463a06dff29edfb65a34a6779d77ba5dace61e
44112 F20101130_AAAUNW eom_b_Page_65.jpg
6e6996ed5c662e81ea7004eb4164ba76
e8878fa96d5a91c39f0cd8ef26a76671759028e0
18619 F20101130_AAAUXR eom_b_Page_28.QC.jpg
f054b9caeacce4ec78c09b7c7635db15
477a507eec20ba8d6322b3c95fc28c81b9420a73
10947 F20101130_AAAUIZ eom_b_Page_77.QC.jpg
b470ed472aaef6c6f19967fcdb88c996
596af8b05055f385d31deebdd9e70ea0d9d77478
F20101130_AAAUSU eom_b_Page_20.tif
249e0d3a3560bbe5b283238ca849c0aa
017456c76fbe162528cd910916a5393717324348
33447 F20101130_AAAUNX eom_b_Page_66.jpg
73d921f0373d6e367a9334cdac47d6e0
f229dcd1eccda8e558b602b284691200a9d2a0eb
5402 F20101130_AAAUXS eom_b_Page_29thm.jpg
9b5813c66c70eea0977458d93164b24a
7940ea3b2e3d31014cec422328b8ea1b6c30d339
F20101130_AAAUSV eom_b_Page_21.tif
ec9a9f72ceb96ec623288a300ecf9f08
b3564b520399b0b4852507dba3766db6e72c5ca3
43329 F20101130_AAAUNY eom_b_Page_67.jpg
ebf66773ca4c78397b5a3429f2ee88ef
726f35c85e64717dde86e57becfd42d0b7e5dff6
21645 F20101130_AAAUXT eom_b_Page_31.QC.jpg
e9462e84bd9153b5daa0f316cead6241
38ea56cd4553be3dd4bbc33ef88b7b808533ac7a
F20101130_AAAUSW eom_b_Page_23.tif
532e1664b369d40ac8eafdc0250570d2
70404ee203d4169048097157808317de67ce97e9
5477 F20101130_AAAULA eom_b_Page_28thm.jpg
5e250a96c2f8b03eef38bea383981807
8b0337fdfdf8cd80e22791ea79abb80613ff0f2b
44366 F20101130_AAAUNZ eom_b_Page_68.jpg
f050d757d75917d139af4ad4a99a01d8
d739321940bbe12743108ac9669170a471cb4f38
23338 F20101130_AAAUXU eom_b_Page_32.QC.jpg
e8555bf9ea0c9266397822e10d556255
374171f5323ce559acb5f69614e8c9d20e4438fc
F20101130_AAAUSX eom_b_Page_24.tif
bd0e1e27997f3282b181de7adebb586c
0eb4cf38b0df5b3592bb258c95a077ca10fefec9
69498 F20101130_AAAULB eom_b_Page_40.jpg
db2c4e5ec0495963171746d5ccb04ba1
afe72c678633a6a3675d216141e8d6cc0632725e
6416 F20101130_AAAUXV eom_b_Page_32thm.jpg
4be28ef81a1a084e0523f6eb5a4f7601
93a84352c0300e72a71d23d93c0c89632d087bf3
F20101130_AAAUSY eom_b_Page_25.tif
b0cf9064308cb5e223378b0089342bde
b93d62b7025a1a2e7f8d906d79226f6ee6aa29cf
F20101130_AAAULC eom_b_Page_09.tif
ab86cfb8558dbc673606c5e58b1d58e0
86dd19ec79a25149657ee20aa87e824904fc31ad
13653 F20101130_AAAUXW eom_b_Page_33.QC.jpg
6d55bc111185c78737bd08df29bf222c
0267a702380b2429b5f9aeec6bcf9ecaeb9d8925
561784 F20101130_AAAUQA eom_b_Page_33.jp2
761550070ffaaa05a3645a744a10879b
a9ea93e0e7d1e27058868a58d6efff3c17b55033
F20101130_AAAUSZ eom_b_Page_26.tif
7a504945070d0348285318b8cd58c29a
63cbcaee896d7abf7baf0fca192fd1447a9156b7
70898 F20101130_AAAULD eom_b_Page_24.jp2
2501917a9326175e5242c9d3210e2e15
d58b7a2682cdd8f5df1dc3edcbc8c4072f2666ae
4143 F20101130_AAAUXX eom_b_Page_33thm.jpg
791f8cbf68319fb62a8eecf50c1760ce
7185c9a6c3bfc7540594b866996b54537b6c7bf4
935993 F20101130_AAAUQB eom_b_Page_34.jp2
6a5d6b4c5a1bab4abc9fb5098c2cad9d
5a50263e4c4231200273f4a639790efe60c95bb0
F20101130_AAAULE eom_b_Page_35.tif
1ea3b876e2ce45cb74a69d9b824984e6
47b424ebd661bd136195df25e3b0db389fadc43d
20955 F20101130_AAAUXY eom_b_Page_34.QC.jpg
6c6fbf1a242c61cbe759d898e9fd7be8
4d0edc025d0423a62d2064baedb467ac5bac49de
91806 F20101130_AAAUQC eom_b_Page_35.jp2
700fd7185f90fbab4bd5700b56bc697a
4a8611d3918a14fd9ffc45af6c930c6028544c2b
40069 F20101130_AAAULF eom_b_Page_82.jpg
2aac73342c091b58480dd6f7fea1020d
c97600ec56accec50f970c1422c35e1bc553cb41
F20101130_AAAUVA eom_b_Page_89.tif
7177f7c88ad225a9c2bbffb8a13681c9
e4b2aa742d7ecd22795b6be08b6f1358d690eecd
5637 F20101130_AAAUXZ eom_b_Page_34thm.jpg
8fb1a4c151311ae766e6e64ce2ba2020
03e6231f583b6d2822e13c1d8512359bc3489e13
115944 F20101130_AAAUQD eom_b_Page_36.jp2
68aaaeaa6dc034b524a7dfaae7e085e5
5052979d1b48f82423aa7c3821f0646a26b2c557
4001 F20101130_AAAULG eom_b_Page_65thm.jpg
133f87ace7d8bb71cf26a4de0a2c7eb8
b9b79d9d65956baef38908f78c8bce377f0c36ea
F20101130_AAAUVB eom_b_Page_90.tif
ac08d5b150da602fc0c40e6c82f751a1
08d4a8c523b815712d0f20b3cdccdaa212ed8a02
925580 F20101130_AAAUQE eom_b_Page_37.jp2
34ad62192dc06d8498123624b863d2e7
ee07c04f352c2eeb62767196adee0280f8bb3203
4527 F20101130_AAAULH eom_b_Page_41thm.jpg
61afc3743636360e0bc612eb03d141d3
d45be5b372528dac2f327a17bb74c89b9fa3a8ad
87895 F20101130_AAAUQF eom_b_Page_38.jp2
ecb1ee974a695bf914de6b442afce0c2
8fe372fc466989e75a52b47f31f740aa9fa85529
11348 F20101130_AAAULI eom_b_Page_93.QC.jpg
805b8056452649c917087e4fa8de1fe5
3bf0a139da1d05d5c2f6526114f710ff382fee94
F20101130_AAAUVC eom_b_Page_91.tif
83e8d1c1da389b47c76d1b3c1dcc43da
5cf2856bde879bde905e1b9dc47f5d065f6161d7
103909 F20101130_AAAUQG eom_b_Page_39.jp2
0242d1e3bc4be8506e73bb9d82d4147a
2551d864d4f17049b25b15cb82984b4b8d3b0299
5720 F20101130_AAAULJ eom_b_Page_22thm.jpg
4918e82a43cc9d3f7336d19c4d5c1f71
09b298b50f8515ac237e7ddacbe840935239a39c
F20101130_AAAUVD eom_b_Page_94.tif
b805c7461dea7a372e3a1b33416dd09e
c44076d194b26c27105f292bb7127f8be8565ec9
102540 F20101130_AAAUQH eom_b_Page_40.jp2
bae5257894359251e3dd9b0290acf4e2
4de54f3be913cc59b67e3802d0015996e9b5ada5
F20101130_AAAULK eom_b_Page_81.tif
9e09c10ddd42fc2fe590222b10114241
448eb5e1052df291b3a2195785c8276b543c9c7b
F20101130_AAAUVE eom_b_Page_95.tif
cc370b81dfa6f85c70cb5a3fc8ab1165
39a544544628fdc433451506b7075c77f4ad7f24
692095 F20101130_AAAUQI eom_b_Page_41.jp2
041831ed99759f029755f85471009a85
3ab0e8f64945da64126402e4fb59bc73b18bb864
73127 F20101130_AAAULL UFE0010844_00001.mets
56a6bc57b69bc429b382be5d1012faa5
1297e01dc167678d64575e875db8ad1480aeab52
F20101130_AAAUVF eom_b_Page_96.tif
ef28f67ec3814d0704af7c39bc2e3fff
a33eabdca5062e59952cc89b27323cb75e7c3db0
833234 F20101130_AAAUQJ eom_b_Page_42.jp2
17e6e57f03964ccc727f9e1485e083d2
8a761a03d4690b50aee294a374f22bbe4d2f4dbd
F20101130_AAAUVG eom_b_Page_98.tif
eb84a118049174ca791e0523c74cefe8
78034199afad7e5674e26c4682014a6592159465
740874 F20101130_AAAUQK eom_b_Page_43.jp2
5664b0606a7df1f903b292266e7320a0
3b445d67aa0ed8b0baaae60cc720bddb401d3c8b
F20101130_AAAUVH eom_b_Page_99.tif
80e804dd63d2a767d1afa8081b24ec47
6ff499589464b3281ab1538388af6069a1374476
1051974 F20101130_AAAUQL eom_b_Page_45.jp2
72753cb7236c2acfd87ea1aa5a594baa
3e2f2390e87298a91da04c63ae774dc2e0285b0e
21151 F20101130_AAAULO eom_b_Page_01.jpg
1f5d68e00238c7fe465327333021602c
efb269e4f57664cffa1f083e6c6b3bd4cf2b52cb
407056 F20101130_AAAUVI eom_b.pdf
9825da5389d391ed8146574ad4bac611
e4476eaf4f0f3009e38f6264311e3611d8dddf26
10029 F20101130_AAAULP eom_b_Page_02.jpg
f4e3d40951548546c78684b946c0cad4
17b366d3d50eefd0dfd77765313e23794be15b77
10659 F20101130_AAAUVJ eom_b_Page_72.QC.jpg
741d9fdf893045068c56823ccb166dc0
04e81715bb2b8f63381708b5853080b89b461e1b
85379 F20101130_AAAUQM eom_b_Page_46.jp2
65075a441b2a464af5fa5391b5fdc78c
3a3b7905ef9e05806b06ee9d2c60413dbf954c84
10140 F20101130_AAAULQ eom_b_Page_03.jpg
cf7ebc30221a2f0f5d635532dac37ef0
43e37b75d3cf2f617f63e0e07fcbcf44c8ede974
4122 F20101130_AAAUVK eom_b_Page_67thm.jpg
52b7ebcb74e4c0d40d061541ea3a90c9
904076fca4785c0234ccd6da2c2a88b298e005c8
848052 F20101130_AAAUQN eom_b_Page_48.jp2
504ca31b92d17a8a8689f8976824e08d
9c34c8824dc0035634e5894211d598fab608a873
59053 F20101130_AAAULR eom_b_Page_04.jpg
e4b8abf08e4aefcfb9cdd8d190828f50
41a58ea7f564369f39ce7359a4be87ccd11d2eef
6159 F20101130_AAAUVL eom_b_Page_31thm.jpg
e088e3fe32121a6e5cd2abc71d4d7a4e
0019438b87eabdc00c1660bbd225e7df83216578
83403 F20101130_AAAUQO eom_b_Page_49.jp2
9fffbe92c8309c17b506460836d638ff
deb0f10a2536453b829ef8d5b3443294d7b4ce43
50131 F20101130_AAAULS eom_b_Page_05.jpg
50c2cce2d314137016495690a471289a
4caa7398fdaa1b429299cad94b20813cc53915ad
19035 F20101130_AAAUVM eom_b_Page_29.QC.jpg
ce9afc52d46f44d895d4a7a73e7bfbe8
72b0412b6033ea908ae06277a0f87bebda93c8c2
990252 F20101130_AAAUQP eom_b_Page_51.jp2
b4b10666564f548be1b3515ab9531084
eb520896d37d2a9088362d4984455eab966f2b3f
77361 F20101130_AAAULT eom_b_Page_06.jpg
1b5eb64d985a85a155fe5eb5330e0990
74d1f8105a6f8dd122649e6b18458f4332711992
18293 F20101130_AAAUVN eom_b_Page_49.QC.jpg
d64f327251a98166cf8b851f0e67d963
932dc454bbd247416ec740e034976f9013c8827a
1051984 F20101130_AAAUQQ eom_b_Page_52.jp2
53262625ef60bae9a6bbe027fa096d3c
ff782da6cf7be25ddb4a5f3a32214fe829c56347
38227 F20101130_AAAULU eom_b_Page_07.jpg
db8119ec2f4e372d0c51e46695818ebd
d2e7087bd6b8324e16c6d543a0325f370dd1d722
19248 F20101130_AAAUVO eom_b_Page_46.QC.jpg
1b2b5ba5da30ba1774b46dc356373fdf
8ffa14bacc007ce46e03501c3fa839d3c448dd8d
112534 F20101130_AAAUQR eom_b_Page_53.jp2
17b6ec5943160b6cf3ca92dc95ca4229
2e3b3d135d14494b864352559ad882540a515529
26673 F20101130_AAAULV eom_b_Page_08.jpg
eb7b58274abda024b20b4dcf399b5ca2
84615fed292e1ba23a3822e5bf38326c16c2bf7d
25320 F20101130_AAAUVP eom_b_Page_64.QC.jpg
52d09519a6cb8ec6bf14051ce2227d5c
861e217d547b3955c456fce22f2268fc98ddd475
661523 F20101130_AAAUQS eom_b_Page_54.jp2
1b7a9e16363487e43ed44801e94a9134
83f23a4397306a2ce9ba8dd1e3b7c76fd97e60ab
50837 F20101130_AAAULW eom_b_Page_09.jpg
299359855783373e0200f3ea44fd22b8
621028a11ad553243dc8d69250f599d3cdbd9ae8
22148 F20101130_AAAUVQ eom_b_Page_47.QC.jpg
cc045b065a1e544cb09086fd0efaac21
789f68677aa7866cc55d731f7a6556e307def644
84386 F20101130_AAAUQT eom_b_Page_55.jp2
e570fafd938b0a42e782dae5cd551cf3
6e9eae503d761dcb02f1ae252ad65834b1c1efcf
55451 F20101130_AAAULX eom_b_Page_10.jpg
4ab803315a8cc6b56b8ee2a450a2ff02
95f1707fb469fe47437e2b8fe5fbcbdd56b8ce85
16375 F20101130_AAAUVR eom_b_Page_44.QC.jpg
8d8b9da6d144875dd29fb3d2837f5414
3fead50551843278fa16880a772bc23c147835e2
39997 F20101130_AAAUQU eom_b_Page_56.jp2
ce11cbf1031635a484fa656223eca999
0e557facc99e6b53f2a89a63fae19090b4eccf96
2726 F20101130_AAAUVS eom_b_Page_08thm.jpg
41d6d7d871b9513fc631e9d3a26a4213
5c9ecbff2db006d1eb2b97855756e231449a2cbd
86111 F20101130_AAAUQV eom_b_Page_57.jp2
43d244e543e9d2a6852860d9a7c2ec57
2a54d868b96bbfb7f0895b7e748eb5853db6fec8
23443 F20101130_AAAULY eom_b_Page_11.jpg
fdf0909eb223bd5e5271eee55f31b338
8fdcb4eab263475665784e35103f6174413efffc
2667 F20101130_AAAUVT eom_b_Page_26thm.jpg
abe5dd48ae05d7195b05fa2f9ec93567
c8340f71fed93fe681d691c538928afb04973ec6
113947 F20101130_AAAUQW eom_b_Page_58.jp2
a0c7f00c8dc0d644598ea8e96eda112a
56b83ba7e8bb00ba51a9b7bc1817d03178096471
6133 F20101130_AAAUJA eom_b_Page_37thm.jpg
ccc8144f0d0fbf3d82ce45ef8e347299
65a088ef7f410f995db71711ca4f47ffac6b34a1
47635 F20101130_AAAULZ eom_b_Page_12.jpg
1baeaf77a97e58c7af4c9d1bbcdaad55
d4aaa6e8bff90cd389ef2586bc30b6d6ee62e10d
6119 F20101130_AAAUVU eom_b_Page_40thm.jpg
7dcf462253a75e83be000ba3cc7d9752
cc4310e667007dea7c9d6bb9c7e755b18195d03c
31880 F20101130_AAAUQX eom_b_Page_60.jp2
e9e9cb83c65df3a6b1565468b25ee16a
3f2cf51603c485e2e547953f672a98b07bbb4cc0
5136 F20101130_AAAUJB eom_b_Page_03.jp2
c70fcc3c3cfea3eec545394c7fe90739
dad0cc74622c9dbb510d16566ce70e4e6d54dd69
27378 F20101130_AAAUVV eom_b_Page_97.QC.jpg
60841a306b5110edd27ca5104be4674e
9b8d64b62387766ca973da88ccf60c40e4c8ddea
103982 F20101130_AAAUQY eom_b_Page_61.jp2
318e183babfe81aeeba91cb06fd25b39
c8254615dcf799e326c86d3bde0ab044a2b51a8d
6794 F20101130_AAAUJC eom_b_Page_36thm.jpg
2d06e97fb1bb8dba066ed58e94e387c3
8cd98856be250c3c913eef05eb24ca200f0934e0
15301 F20101130_AAAUVW eom_b_Page_12.QC.jpg
748327d8bc14761a1efa52978747f767
c48493f922eee9a253fe4b089d4dc3fc7f14f627
42111 F20101130_AAAUOA eom_b_Page_69.jpg
975ba7a6525c0729580e58a10f317c25
d24629375d8e6664abdc68cb87356868a0130cd9
102292 F20101130_AAAUQZ eom_b_Page_62.jp2
71ea1e1f16ad1cc13da1d6ccf19e246d
3393f0d6b12fd5b01a2c6225274cc82fc9566ec9
24462 F20101130_AAAUJD eom_b_Page_30.QC.jpg
d6ce71b47f9492703b02a15d5b450bf7
f7c5e0c3a9c1d938505aebb40b2acc56fe028c71
5725 F20101130_AAAUVX eom_b_Page_46thm.jpg
0d60e333d01aa292388d2deab02bb970
78b4596d43e183e506501d6fff872c8573e86f53
F20101130_AAAUOB eom_b_Page_70.jpg
0c906d591c7fa136ed8f38e4a622dcda
a8b3a48df78a0facf305def713468417d30d94d6
F20101130_AAAUJE eom_b_Page_02.tif
cad105b1573304277d671d4bd7961921
da9de83acd0b808d9ff3682d06b347e0b6638e7d
19082 F20101130_AAAUVY eom_b_Page_22.QC.jpg
acf6831660673f1d703d03e9db149d2d
e6c23abe29dcee8ffdf9dcc6b61781873b817c08
37701 F20101130_AAAUOC eom_b_Page_71.jpg
34d2dfc6bffb98b17f3e6b8445c95de3
8b16ef89263c7b1e3807dd34e850c8685424ba5e
6668 F20101130_AAAUJF eom_b_Page_30thm.jpg
b5770ffcf23563cb7c50598d2508754d
5aca04269b4e9ded9b7511be066bdc7e28342d1d
F20101130_AAAUTA eom_b_Page_27.tif
e89d953a80c0de22c6bb598145922fee
dd777a424c9ee4fd1589305854fabe9a0fb1913c
21544 F20101130_AAAUVZ eom_b_Page_50.QC.jpg
aa01363026add45b2fbac733006c78e0
5fe212b85cae0de77daffe189a11cc1e24650931
34745 F20101130_AAAUOD eom_b_Page_72.jpg
3fd770e51be528d1dbd15e5098baff86
861e8bd578adc437c2ebf04479b8115b9d627ee0
F20101130_AAAUJG eom_b_Page_97.tif
c0040f425b3ce675d2d448bc0ff0fda8
ec05a2f15506b151cc83863d5adaa8753a9bc03d
F20101130_AAAUTB eom_b_Page_28.tif
0bd459891e8b8eddf5450bb9b6f19a80
8b917a8988e3a31464e52e12d099d07f17f81eb6
36027 F20101130_AAAUOE eom_b_Page_73.jpg
15ec2a5d08a1d69ef2ab59f771ad5571
b4edd99e92b9c05564dd1d3c276738826aa17cad
3903 F20101130_AAAUJH eom_b_Page_19thm.jpg
1972bd364dae6d498b6d794e2386d35c
24b491b21261718b91b64da18e2442dc32c01407
3539 F20101130_AAAVAA eom_b_Page_72thm.jpg
954a1c1ce4d4dde1d3616e83c13f001c
399a8af4f2520f23db47cf21de8413372f208f88
20417 F20101130_AAAUYA eom_b_Page_35.QC.jpg
a092411e768a7f5f02a06f73bdbfc68c
501c4df80a95737b73aaa3d3ea6f4434f95a0869
F20101130_AAAUTC eom_b_Page_29.tif
d9c2727584a001062d896f79404af9c5
5db0de19063d5ceea082b0d2f5f2f1ad7748717e
43897 F20101130_AAAUOF eom_b_Page_74.jpg
e831950fd4bf77d0c81c0f7175a523d3
e189e84fd43af415b90d27b05898415b46fc8e8b
3292 F20101130_AAAUJI eom_b_Page_99thm.jpg
75ed23022977e810ba8f5bf0956663c2
24a969b2a6f066d4f6750732ce0fde939c71e220



PAGE 1

QUERY OPTIMIZATION USING FREQUENT ITEMSET MINING

By

BOYUN EOM

A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

UNIVERSITY OF FLORIDA

2005

PAGE 2

Copyright 2005 by BoYun Eom

PAGE 3

I devote this work to my parents

PAGE 4

ACKNOWLEDGMENTS

Lots of thanks should go to many people for this thesis. First of all, I would like to express my sincere gratitude to my adviser, Dr. Chris Jermaine, for giving me the chance to work with him on the interesting topic of this research and for his enthusiastic guidance. I was so lucky to have such a wonderful adviser. He has inspired me constantly through this work, and he will be a source of inspiration in whatever I do in my career. Owing to his invaluable suggestions and patience, I have enjoyed this thesis and I have been able to complete it. I would also like to thank Dr. Alin Dobra and Dr. Sanjay Ranka for their willingness to serve on my supervisory committee.

Next, I want to thank my colleagues in our lab, Abhi, Shantanu, and Subi, and the colleagues in the DB center, Amit, Gilliean, Jungmin and Seema. I really appreciate their friendly help and their keeping me company during the day and late at night. I would like to thank my Soulmates and my friends in Gainesville. I would also like to give my special thanks to all of my friends in Korea for their continual love and heartfelt cheers.

Now, I wish to express my deep appreciation and admiration to the Caviedes for being my family here in a strange land. Owing to their warm love and thoughtful concern, I have felt more comfortable and have enjoyed my life in Gainesville. I will not forget the amazing friendship and the memories we have shared together.

My thanks next go to all of my family members: my only sister, JuYun, and my brother-in-law, DongWoo; my older brother, SungHyuk, and sister-in-law, JungA; my younger brother, KyungSik,

PAGE 5

and his new bride. Also, I would like to give my thanks to my beloved nephews, HeeWon and HeeWong. Their great encouragement has been very helpful for my work.

I greatly appreciate my grandma, KyungIee Kim, who raised me. Especially, I would like to thank her for all of her efforts for me and for being there and patiently waiting for me. I cannot help mentioning my love and thanks toward my grandma, YongCheol Jin, who passed away right after my departure to the USA. I have felt sorry that I could not keep the last appointment with her; I have been missing her and will miss her forever. I also feel that I have to show my thanks to my uncle and aunt, who have encouraged me to do what I wished.

Most of all, I am extremely thankful to my parents for their acceptance of my decision and their prayers for my will. Since I know that it was not easy for my parents to let me take an untraditional route, I really thank them for their trust and support.

Last, I thank God, who has allowed me all of these wonderful people and experiences in my life. Even though it was a really exciting and happy journey for me to spend time on this work, the most valuable thing I have gained through this experience is the realization of His endless love and that I am nothing without Him.

Words cannot express my gratitude to all those who have supported me.

PAGE 6

TABLE OF CONTENTS

page

ACKNOWLEDGMENTS .... iv
LIST OF TABLES .... viii
LIST OF FIGURES .... ix
ABSTRACT .... x

CHAPTER

1 INTRODUCTION .... 1
2 RELATED WORK .... 5
  2.1 Histograms .... 6
    2.1.1 Equi-Width Histograms .... 6
    2.1.2 Equi-Depth Histograms .... 7
    2.1.3 End-Biased Histograms .... 7
  2.2 Selectivity Estimation Using One-Dimensional Histograms .... 9
  2.3 Multi-Dimensional Histograms .... 14
  2.4 BHUNT .... 14
3 CARDINALITY ESTIMATION USING FREQUENT ITEMSET MINING .... 16
  3.1 Background in Data Mining and Frequent Itemsets .... 16
  3.2 Overview of Our Approach .... 19
  3.3 Data Structure of the FI Tree .... 21
  3.4 Selectivity Estimation for Equality Selection .... 24
    3.4.1 Step One: Get the Cardinality of Resulting Relation Using the FI Tree .... 24
    3.4.2 Step Two: Get the Cardinality of Result Relation Using the Uniform Distribution Assumption .... 25
    3.4.3 Step Three: Create a New FI Tree for the Result after Selection .... 29
    3.4.4 Step Four: Update the Table for Attribute Counts for the Result after Selection .... 32
  3.5 Selectivity Estimation for Join .... 33
    3.5.1 Notations Used .... 35
    3.5.2 Step One: Get the Cardinality for the Result after Join .... 36
    3.5.3 Step Two: Get the FI tree for the Result after the Join .... 39

PAGE 7

    3.5.4 Step Three: Update the Table for Attribute Counts for the Result after Join ... 44
4 EXPERIMENTS AND RESULTS ... 46
  4.1 Goal of Experiments ... 46
  4.2 Methodology ... 47
  4.3 Results ... 50
    4.3.1 Dataset 1: Slightly Correlated Data ... 51
    4.3.2 Dataset 2: More Correlated Data ... 51
  4.4 Discussion

PAGE 8

LIST OF TABLES

Table ... page

1: Distribution of attribute values on Quantity ... 8
2: Transactions in R1 ... 17
3: Frequent itemsets from R1 and their supports ... 18
4: The updated table for numbers of attribute after selection ... 33
5: Transactions in R2 ... 37
6: Number of attribute values for result relation after join R1 and R2 ... 45

PAGE 9

LIST OF FIGURES

Figure ... page

1: Overall flow for query processing in DBMS ... 1
2: Histograms (A) The Equi-width, (B) The Equi-depth and (C) The End-biased histograms ... 8
3: A query plan ... 10
4: More possible query plans ... 11
5: Query plan1 after performing S1 in Figure 3 ... 13
6: A query plan tree for the example of the selection ... 20
7: An example of an FI tree ... 22
8: The FI tree for R1 ... 23
9: The numbers of attribute values over R1 ... 26
10: Overall process for getting the FI tree ... 30
11: Join modeling ... 34
12: Profiles for R2 a) the FI tree for R2 b) table for attribute value count for R2 ... 37
13: After rebuilding and deleting uncommon items ... 40
14: Changing frequencies for nodes ... 41
15: The FI tree for result relation ... 43
16: Partial schema of TPC-R database ... 49

PAGE 10

Abstract of Thesis Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Master of Science

QUERY OPTIMIZATION USING FREQUENT ITEMSET MINING

By BoYun Eom

August 2005

Chair: Christopher M. Jermaine
Major Department: Computer and Information Science and Engineering

Depending on the query plan chosen by a database query optimizer, the overall query execution time can vary dramatically. Consequently, the query optimizer in a database management system is a critically important component for reducing the execution time of query processing. To recognize the best plan among many possible query plans using cost-based optimization, the query optimizer estimates the cost of every plan based on the underlying data distribution. The query optimizer makes use of synopses of the database relations to perform this task. The most common synopses in commercial databases have been one-dimensional histograms. However, histograms can provide poor estimation quality, especially when there is correlation among data.

Motivated by this, we propose a new method to perform more accurate selectivity estimation, even for correlated data. To model the correlation that may exist among data, we borrow one of the well-known techniques in data mining and extract attribute values

PAGE 11

that occur together frequently using frequent itemset mining. In our method, frequent itemsets are used as synopses of the database relations. Through experimentation, we found that our approach is effective in modeling correlations, and we confirm that this method approximates result relations more accurately than one-dimensional histograms do. Our approach gives precise estimates, particularly for correlated and/or skewed data.

PAGE 12

CHAPTER 1
INTRODUCTION

A DataBase Management System (DBMS) has several components used in query execution: the query parser, the query optimizer, the code generator or the interpreter, and the query processor.

Figure 1: Overall flow for query processing in DBMS (Query -> Query Parser -> Query Optimizer [Plan Generator, Cost Estimator] -> Code Generator -> Query Processor -> Answer)

When a user gives a query to a DBMS, the DBMS evaluates the query as follows. First, the query parser checks the syntax and translates the given query into an extended form of relational algebra. This is then passed to the query optimizer, and the plan generator in the query optimizer produces many different query plans. Though all query plans produce the same result, their individual costs of execution vary widely. So, the cost estimator in the query

PAGE 13

optimizer is assigned to choose only one among all the plans that the plan generator created. Once the query optimizer has determined the execution plan, the code generator or the interpreter transforms the plan into the actual access routines as the next step. Finally, the query processor performs the query and the DBMS returns the result [1]. Figure 1 shows the overall query processing flow.

The performance and efficiency of a DBMS can be significantly affected by the quality of the plan chosen by the query optimizer. Hence the query optimizer is arguably the most important component of a DBMS for query execution time. Not surprisingly, much research has focused on query optimization. Our research is also focused on this particular area.

The job of a query optimizer is to seek the plan that has the cheapest execution cost, and in particular to avoid picking a poor plan. In cost-based optimization, the main idea is to compare plans in terms of the number of disk accesses that would be performed as each query plan is executed. Depending on the order of relational algebra operators such as select, join, or project, and on the size of the intermediate relations, the number of disk I/Os required to implement the plan can differ. A query optimizer approximately calculates the total I/Os for each plan and finally chooses the least costly plan for the given query. The plan chosen should have the fewest disk accesses among all plans.

To estimate the approximate cost of a query plan, a query optimizer needs information on the data distribution, and the DBMS typically makes use of a statistical synopsis for this purpose. As a consequence of limited time and space, most current commercial DBMSs use one-dimensional histograms as synopses. The Attribute Value Independence (AVI) assumption is needed when one-dimensional histograms are used over multiple attributes in a query. Under the AVI assumption, the attributes in a relation are assumed to have data distributions that are independent of one another, and the joint data distribution can be obtained from these independent individual distributions [2].

PAGE 14

Often, some data have strong dependencies. For example, taller people tend to be heavier than shorter people. Most people who are from Asia have black hair. People with higher education tend to earn more money. One may imagine many other cases in which data are correlated with each other. The correlation between attributes affects the distribution of data. If we ignore these characteristics of data and simply use the AVI assumption for estimating cost, a query optimizer can make very poor decisions.

Motivated by this problem, this thesis considers a method for computing high-quality selectivity estimates, even over correlated data. In our method, instead of using histograms with the AVI assumption, we adopt one of the well-known techniques from data mining to build a model for the data set. We use frequent itemsets [3, 4] to capture the correlation among data for estimating the size of the result of selection and join operations.

Our method is described at a high level as follows. Using the Apriori algorithm [3], we determine whether there exists an attribute value set that satisfies a predetermined frequency threshold. If there is, we regard this set as a frequent itemset and keep the values in this set together with the frequency. For instance, suppose we treat any set of attribute values occurring together in at least 50% of a relation’s records as a frequent itemset. If half of the tuples in a relation have value ‘a1’ on attribute A and value ‘b2’ on attribute B, then these two values, ‘a1’ and ‘b2’, are frequent items, and we store this itemset, {a1, b2}, and its frequency in our profile. Based on these frequent values, we build an FI (Frequent Itemset) tree. In response to a selectivity estimate request by the query optimizer, we traverse the FI trees corresponding to the queried relations and look for frequent itemsets containing values on the attributes in the plan. If the values that we are looking for are present, then we use their frequency to estimate the cardinality for the operation in question. For the remaining attribute values that do not appear in the FI tree, we use heuristic methods under the uniform distribution assumption to calculate the cardinality and the total numbers of attribute

PAGE 15

values are used together. Even though we do not keep information on non-frequent attribute values, by definition they cannot occur frequently, so discarding them should not have a serious effect on the final cost.

The benefits of our approach are as follows:

(1) Our method outperforms one-dimensional histograms, especially when data are not independent of each other, because it is able to capture correlation in the data. We keep data distribution information as the histogram method does, but unlike histograms, our FI trees capture correlations between those attribute values. This gives us more accurate estimation for non-independent data.

(2) Our approach is also good for skewed data. Unlike histograms, which assume data are uniformly distributed within a bucket, we keep the real frequency of frequent items, which form the majority of attribute values in the relation. This means that we can estimate selectivity more precisely for skewed data.

Through extensive experimentation using the TPC-R benchmark, we confirm these useful characteristics of the FI-based approach.

PAGE 16

CHAPTER 2
RELATED WORK

Typically, there are two forms of query optimization: rule-based optimization and cost-based optimization. In rule-based optimization, the optimizer ranks the available access paths by heuristic rules and chooses an execution plan for the query. So, this optimization is sometimes called heuristic optimization. On the other hand, cost-based optimization estimates the costs of executing possible plans and, as a result, the overall cost of executing the query is systematically reduced. For this reason, it is also called systematic optimization.

Many techniques have been proposed to make estimation in cost-based optimization more accurate. These can be classified as sampling techniques, parametric techniques, and non-parametric techniques [5-7]. As the name implies, sampling techniques collect samples from the database and use those to calculate intermediate result sizes. Despite its potentially high accuracy, sampling is expensive and of questionable value in query optimization, since it is performed mostly at run time and query optimization should estimate the cost quickly [5]. Parametric techniques, on the other hand, require little overhead but can give bad estimates [5]. In such techniques, a mathematical function is used, and the data distribution is approximated by fitting the parameters of this function [5, 6]. The last class of techniques in query optimization is non-parametric techniques. These are sometimes called histogram-based techniques [5, 6]. Even though histograms also have their drawbacks, histogram-based techniques are simple and inexpensive [5] and are the most widely used method in commercial systems.

In the remainder of this chapter, we discuss prior research related to our approach in more detail. We discuss histograms in general, selectivity estimation using histograms, and multi-dimensional histograms.

PAGE 17

2.1 Histograms

In cost-based optimization, the query optimizer uses statistics in the data dictionary to estimate the cost of plans [8]. To support such statistics, a database administrator periodically instructs the DBMS to construct and maintain data profiles. A profile in a database is a statistical summary of a relation, such as the number of tuples, the number of attribute values, and the distribution of values. Since a query optimizer relies entirely on these profiles to calculate costs rather than using the real relations, profiles play a significant role in query optimization.

The oldest and most common form of profile is a histogram [9, 10]. To build a histogram in a DBMS, the domain of attribute values for a single attribute is sorted and partitioned into buckets, and the histogram keeps the minimum and maximum attribute values in each bucket. Every bucket also stores the sum of the frequencies of the attribute values in it. Since a histogram does not store the individual frequencies of the values in a bucket, the frequency of every value in a bucket is assumed to be equal to the average of the frequencies of all values in that bucket. This is known as the uniform distribution assumption [9, 11]. The advantages of using histograms for query optimization are that there is little run-time overhead, that they are inexpensive to store, maintain, and compute, and that they may give low-error estimates [2, 12]. Various classes of histograms have been proposed, but it is known that only a few fundamental approaches are used in practice: equi-width histograms, equi-depth histograms, and end-biased histograms [6].

2.1.1 Equi-Width Histograms

In the equi-width histogram, the width of each bucket is equal but the frequencies differ from bucket to bucket. In other words, the number of consecutive attribute values is the same for all buckets, but the sums of the frequencies of these values are not. The equi-width histogram is one of the oldest histograms; it is relatively cheap to build and very easy to apply in selectivity estimation. However, it is not good at handling skewed data and has a much higher worst-case and average error for selection queries than the equi-depth histogram [9, 12, 13].

PAGE 18

2.1.2 Equi-Depth Histograms

As an alternative to equi-width histograms, equi-depth histograms have been suggested. All buckets have almost the same height in an equi-depth histogram; i.e., when the domain is partitioned, the frequencies of the attribute values are counted and the range of each bucket is decided by these frequencies. Most current commercial DBMSs use this histogram [2, 9, 13].

2.1.3 End-Biased Histograms

This distinctly different histogram puts each of the most frequent attribute values, with its frequency, in an individual bucket. The rest of the attribute values and the average frequency of these values are stored together in one bucket. Together with equi-depth histograms, end-biased histograms are more accurate for approximating distributions [6]. Below is an example of these three types of histograms.

Example 1: In the relation lineitem in the TPC-R benchmark that we used for the experiments described in Chapter 4, there is an attribute named Quantity. Assume that Table 1 gives the distribution of attribute values on that attribute, and that we construct an equi-width histogram and an equi-depth histogram from this distribution using 4 buckets. Figure 2 shows the resulting histograms for Table 1.

PAGE 19

Table 1: Distribution of attribute values on Quantity

Quantity   Count   Cumulative
20         2       2
21         3       5
22         5       10
23         8       18
24         2       20
25         0       20
26         0       20
27         0       20
28         30      50
29         2       52
30         8       60
31         5       65
32         5       70
33         0       70
34         10      80
35         14      94
36         2       96
37         1       97
38         1       98
39         1       99
40         1       100

Figure 2: Histograms (A) The Equi-width, (B) The Equi-depth and (C) The End-biased histograms (axes: Quantity, Frequency)
(A) an equi-width histogram: buckets 20-24, 25-29, 30-34, 35-40 with frequencies 20, 32, 28, 20
(B) an equi-depth histogram: buckets 20-28, 28, 29-34, 34-40 with frequencies 25, 25, 25, 25
(C) an end-biased histogram: values 28, 34, and 35 in individual buckets with frequencies 30, 10, and 14; the remaining values share one bucket with total frequency 46
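The bucket totals in panels (A) and (C) can be recomputed directly from Table 1. The following is a minimal Python sketch, not code from the thesis: the function names `equi_width` and `end_biased` and the choice of three singleton buckets are assumptions made for illustration.

```python
# Counts for Quantity values 20..40, copied from Table 1.
COUNTS = dict(zip(
    range(20, 41),
    [2, 3, 5, 8, 2, 0, 0, 0, 30, 2, 8, 5, 5, 0, 10, 14, 2, 1, 1, 1, 1],
))


def equi_width(counts, edges):
    """Sum the frequencies per bucket; edges are inclusive (low, high) ranges."""
    return {
        (low, high): sum(c for v, c in counts.items() if low <= v <= high)
        for low, high in edges
    }


def end_biased(counts, top):
    """Keep the `top` most frequent values exactly; lump the rest into one total."""
    ranked = sorted(counts, key=counts.get, reverse=True)
    singles = {v: counts[v] for v in ranked[:top]}
    rest = sum(c for v, c in counts.items() if v not in singles)
    return singles, rest


# Bucket boundaries as in panel (A) of Figure 2.
EDGES = [(20, 24), (25, 29), (30, 34), (35, 40)]
width_hist = equi_width(COUNTS, EDGES)
singles, rest = end_biased(COUNTS, top=3)  # top=3 is an illustrative choice

print(width_hist)
print(singles, rest)
```

The equi-width totals come out as 20, 32, 28, and 20, and the end-biased split keeps 28, 35, and 34 as singletons with 46 left for the shared bucket, which agrees with the figure.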

PAGE 20

Note that in the equi-width histogram (A) in Figure 2, the frequencies of the buckets differ but the ranges of attribute values of the buckets are almost the same. On the other hand, in the equi-depth histogram (B), the frequencies of all buckets are (almost) the same.

When we judge a histogram, we need to consider 1) accuracy, 2) efficiency of maintenance, and 3) extensibility to multi-dimensional data [10]. Considering these criteria, the major drawback of partition-based histograms such as equi-width and equi-depth histograms is the difficulty of extending them to complex multi-dimensional data [10].

Besides these traditional partition-based histograms, one interesting class of histograms is wavelet-based histograms [10, 14, 15, 16]. In this approach, a mathematical function is used for a decomposition process. Through this decomposition process, the original data are transformed and compressed into a set of numbers, the wavelet coefficients. The decomposition is repeated until a single coefficient remains, and this is called wavelet decomposition [10, 14, 15, 16]. The wavelet-based histogram is more accurate than traditional histograms and, most importantly, can be naturally extended to multi-dimensional distributions in the course of wavelet decomposition and reconstruction. Yet maintenance under changes in the data distribution is much more difficult than for partition-based histograms.

2.2 Selectivity Estimation Using One-Dimensional Histograms

Selectivity is the ratio of the number of tuples that satisfy the predicates (conditions) in a query to the total number of tuples of the relation [13]. As a plan is being evaluated, an optimizer calculates this selectivity for the operations in a tree. We now consider an example of how histograms can be used to evaluate the quality of a query plan.

Example 2: Below is one of the queries we used in our experiments:

    SELECT l_orderkey, o_orderdate, o_shippriority
    FROM customer, orders, lineitem
    WHERE c_mktsegment = 'AUTOMOBILE'

PAGE 21

    AND c_custkey = o_custkey
    AND l_orderkey = o_orderkey;

For convenience, we will refer to c_mktsegment = 'AUTOMOBILE' as P1, c_custkey = o_custkey as P2, and l_orderkey = o_orderkey as P3. Then the predicate set P for this query can be expressed as P = {P1, P2, P3}. Since we deal only with selection and join operations in this thesis, we ignore the project operation in our query plan trees. For this query, many query plans would be correct in the sense that they give the correct result, though each will have a different cost. Figure 3 describes one of those possible plans.

Figure 3: A query plan (Query Plan1: selection S1, customer.mktsegment = 'AUTOMOBILE', over <customer>; join J1 with <orders> on custkey; join J2 with <lineitem> on orderkey)

A leaf node of the query plan tree denotes a base relation, and a profile such as a histogram for a base relation is called a base profile. Internal nodes are operations. Performing operations on the base profiles results in intermediate relations and profiles. Unlike base profiles, intermediate profiles are estimated. Therefore, the farther an intermediate table is from the base tables in a tree, the lower the accuracy of the corresponding profile. In Figure 3, there are three base tables in our tree, and three base profiles are used for the statistics of these relations. Query plan1 first performs the selection operation with predicate P1 over the customer relation and then joins the result with the orders relation on custkey. Finally, it joins the intermediate relation after join J1

PAGE 22

with the lineitem relation on orderkey. We can see some other possible query plans in Figure 4. As this figure shows, depending on the order of operations or on the combination of relations for each operation, we can have different plan trees. Note that all three plans have the same relational algebra operations.

Figure 4: More possible query plans (Query Plan2 and Query Plan3, each with the selection customer.mktsegment = 'AUTOMOBILE' and joins on custkey and orderkey over <customer>, <orders>, and <lineitem>)

How then are histograms used when the optimizer estimates the cost of a plan tree? When the optimizer calculates the cost of all these possible plan trees, it starts from the base tables. First, it checks whether there is a histogram on the attribute in the selection predicate over the base table. If there is, the optimizer searches for the bucket(s) containing the attribute values named in the selection predicate. With that statistic, the optimizer can estimate the selectivity. If the operation involves multiple attributes, then under the AVI assumption that attributes are independent, we estimate each selectivity and multiply them together [17-22].

For a more detailed description of the process, we go back to our first example, query plan1. Assume that all attributes used in this query have histograms. Normally, histograms are built on integer attributes. If an attribute is non-numeric, we may need to transform the values of that attribute. We assume that we have already done this and have

PAGE 23

a histogram on attribute mktsegment in the relation customer. We now consider how the optimizer makes use of histograms to estimate the cost of query plan1. In query plan1 there are one selection, S1, and two join operations, J1 and J2. The total cost of this plan tree is the sum of the costs of these operations. We will discuss the selectivity for S1 and J1 in query plan1. Over the relations in this tree, we first define some notation:

$N_c$: cardinality of the customer relation
$N_o$: cardinality of the orders relation
$N_l$: cardinality of the lineitem relation
$B_k$: the kth bucket of the histogram on mktsegment, which contains the value corresponding to AUTOMOBILE
$F_k$: the frequency of the kth bucket
$f_i$: the frequency of the ith value after sorting
$\mathrm{Count}(B_k)$: the number of attribute values in $B_k$
$\mathrm{Cost}(s)$: the cost for the selectivity s in terms of the number of tuples

For the selectivity of S1 in query plan1, a query optimizer finds the bucket(s) $B_k$ that contain(s) the predicate value, 'AUTOMOBILE', and gets the frequency of the bucket(s), $F_k$. By adopting the uniform distribution assumption, the frequency f of each value in the bucket(s) would be:

$$f = \frac{F_k}{\mathrm{Count}(B_k)}$$

With this frequency, we can get the cost of the equality selection S1:

$$\mathrm{Cost}(S1) = f \times N_c$$

If we expand this equality selection to a range query over the customer relation, then the cost of the range selection r would be

$$\mathrm{Cost}(r) = N_c \times \sum_i f_i,$$

where i ranges over all values within the range in the query.
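The two cost formulas can be tried out on numbers the thesis does provide. The sketch below is an illustration under stated assumptions, not the thesis implementation: it uses the equi-width histogram on Quantity from Example 1 instead of mktsegment (for which no bucket values are given), treats each $F_k$ as a relative frequency, and assumes a cardinality of 100, the total of Table 1. The names `value_frequency`, `cost_equality`, and `cost_range` are hypothetical.

```python
N_C = 100  # assumed relation cardinality (the counts in Table 1 sum to 100)

# Equi-width buckets of Example 1 as (low, high, F_k), F_k a relative frequency.
BUCKETS = [(20, 24, 0.20), (25, 29, 0.32), (30, 34, 0.28), (35, 40, 0.20)]


def value_frequency(value):
    """f = F_k / Count(B_k) for the bucket B_k that contains `value`."""
    for low, high, f_k in BUCKETS:
        if low <= value <= high:
            return f_k / (high - low + 1)  # Count(B_k): integer values in bucket
    return 0.0  # value lies outside every bucket


def cost_equality(value):
    """Cost(S) = f * N_c, in tuples."""
    return value_frequency(value) * N_C


def cost_range(low, high):
    """Cost(r) = N_c * sum of f_i over the integer values in [low, high]."""
    return N_C * sum(value_frequency(v) for v in range(low, high + 1))


print(cost_equality(28))   # estimated tuples with Quantity = 28
print(cost_range(25, 29))  # estimated tuples with 25 <= Quantity <= 29
```

For Quantity = 28 the estimate is about 6.4 tuples, whereas Table 1 lists a count of 30 for that value. The gap comes entirely from the uniform distribution assumption inside the 25-29 bucket, which is the kind of error on skewed data that the thesis sets out to reduce.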

PAGE 24

The selectivity for a join operation is more complicated. After performing the selection operation S1, our query plan1 would have the form shown in Figure 5.

Figure 5: Query plan1 after performing S1 in Figure 3 (join J1 of <intermediate R1> with <orders> on custkey; join J2 with <lineitem> on orderkey)

To get the selectivity for the J1 operation in query plan1, an optimizer compares all buckets on custkey in the intermediate relation R1, which is the result of S1, with those in the orders relation. Note that since we use the AVI rule, which assumes that all attribute values are independent, we still use the histogram on attribute custkey of relation customer for relation R1. Just as for the equality selection, the number of tuples for every attribute value is calculated if the value is common to both relations. Let I be the set of all common values of the join key in both relations. Then for each common attribute value i, the number of tuples after the join operation would be:

$$T_i = \left(f_i^{(R1)} \times C_{R1}\right) \times \left(f_i^{(Orders)} \times C_{O}\right), \quad i \in I$$

where $f_i^{(R1)}$ and $f_i^{(Orders)}$ are the frequencies of value i in relation R1 and in relation orders, and $C_{R1}$ and $C_{O}$ are taken to denote the cardinalities of R1 and orders. Finally, the cost of the join J1 is the sum of all these tuples:

$$\mathrm{Cost}(J1) = \sum_{i \in I} T_i$$

In the end, the total cost of query plan1 is the sum of the costs of each operation in the tree, in terms of the number of tuples.
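The join estimate amounts to multiplying, for each join-key value present on both sides, the estimated tuple counts of that value, and summing the products. A minimal Python sketch follows; the function name `estimate_join_size` and the per-value counts are made up for illustration and do not come from the thesis or from TPC-R data.

```python
def estimate_join_size(counts_left, counts_right):
    """Sum T_i over the join-key values common to both inputs.

    Each argument maps a join-key value to the estimated number of tuples
    carrying that value (frequency times cardinality).
    """
    common = counts_left.keys() & counts_right.keys()  # the set I
    return sum(counts_left[v] * counts_right[v] for v in common)


# Hypothetical per-value tuple-count estimates on custkey.
r1_counts = {1: 1, 2: 1, 4: 1}        # intermediate relation R1
orders_counts = {1: 10, 2: 5, 3: 7}   # orders relation

print(estimate_join_size(r1_counts, orders_counts))
```

Only custkey values 1 and 2 occur on both sides, so the estimate is 1 × 10 + 1 × 5 = 15 tuples; values 3 and 4 contribute nothing because they have no join partner.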

PAGE 25

2.3 Multi-Dimensional Histograms

In reality, most queries contain more than two attributes, and the results of those queries depend on the joint data distribution [2]. One-dimensional histograms must rely on the AVI assumption for multiple attributes and will produce good estimates only if the attributes are totally independent. Rather than relying on the unrealistic AVI assumption, what if we capture the data distribution over multiple attributes and use this statistic for estimating selectivity? Multi-dimensional histograms can be used for this purpose. Consider the joint frequency distribution of two attributes in a multi-dimensional space. Each axis represents one of the attributes, and each point carries the frequency of the corresponding pair of attribute values. We need to sort and partition the tuples and put these partitions into buckets [2, 23]. Depending on the way the domain of attribute values is partitioned, several multi-dimensional histograms have been proposed: phased, mhist, genhist, stholes, vi and hmf [2, 24, 25].

Conceptually, the advantage of multi-dimensional histograms is that we can expand the dimensions as needed and no longer need to depend on the problematic AVI assumption. However, the joint frequency distribution is often very complex and expensive to construct. In addition to this cost, it is known that in practice multi-dimensional histograms do not work well when the dimension is higher than three [13]. Thus, especially for the join operation, which increases the data dimensionality, multi-dimensional histograms may be poor in terms of accuracy as well as cost. As a result, multi-dimensional histograms are typically not used in production systems.

2.4 BHUNT

We conclude this chapter by introducing one more recent related work that in many ways is close to the method suggested in this thesis. Brown and Haas [26] suggest a “data-driven technique” in which functional dependencies between attributes are automatically discovered to improve query optimization. A fuzzy algebraic constraint and other useful

PAGE 26

relationships are found by the BHUNT methodology if there is a correlation between data. First, BHUNT generates candidates that satisfy an algebraic constraint, and then, for each candidate, it constructs algebraic constraints. Statistical histogramming, segmentation, or clustering techniques are used for this construction. During query processing, the optimizer uses the constraints to discover more efficient access paths [26]. In the sense that they use a data mining technique to obtain good query evaluation, their approach may be closely related to ours.

PAGE 27

CHAPTER 3
CARDINALITY ESTIMATION USING FREQUENT ITEMSET MINING

Our new method for estimating the size of resulting relations makes use of an idea from data mining: the frequent itemset. We begin by describing data mining and the idea of frequent itemsets, before moving on to our estimation method.

3.1 Background in Data Mining and Frequent Itemsets

For those who might not be familiar with the area of “data mining”, we start with a brief discussion. As mine workers dig gold from a heap of earth, the purpose of data mining is to find precious information that is hidden under a stack of data and that we might overlook most of the time. Among the data that we store in our repository, some interesting and valuable information may be buried. One of the often-mentioned examples for data mining is market basket data [3, 4]. For example, it may be the case that when men go to the market to buy diapers, they are likely to purchase beer as well. By finding this correlation between the items diapers and beer, we can generate an association rule stating that if diapers are purchased, then beer is also sold. This rule is denoted as “diaper → beer” [4, 27].

Among the total transactions, if some items are found together frequently, we put these items in a set and mark it as a frequent itemset (FI). If k is the total number of items in a frequent itemset, it is called a k-size itemset; that is, the number of items in an FI is the size of the set. Mining frequent itemsets is one of the interesting challenges in data mining, and many approaches have been suggested. For the purpose of selectivity estimation using frequent itemsets, we used the well-known Apriori algorithm [3].

To judge whether an itemset is eligible as a frequent itemset or not, we need pre-specified measures. Support and confidence are such measures. Suppose we consider the correlation between some items and

PAGE 28

put the items that are in the antecedent and in the consequent together as all items. While support is the fraction of the total transactions that contain all items, confidence is the ratio of the number of transactions that contain all items to the number of transactions that include the items in the antecedent. For example, if a basket database has 100 transactions, out of which 80 include the item diaper and 40 of these contain the item beer, the support for these items is 40% (40 out of 100) and the confidence is 50% (40 out of 80).

One important property of frequent itemsets is the upward closure property, which means that if an itemset is a frequent itemset, with support higher than or equal to the threshold, all of its subsets are also frequent itemsets. For example, if {‘a1’, ‘b2’} is a frequent itemset, all of its subsets, {‘a1’} and {‘b2’} in this case, are also frequent itemsets. Apart from the itemset {‘a1’, ‘b2’} itself, all subsets of this itemset, {‘a1’} and {‘b2’}, are smaller than the itemset. Since the itemset {‘a1’} may also occur with attribute values other than ‘b2’, the frequency of {‘a1’} must be equal to or larger than that of {‘a1’, ‘b2’}.

To make the discussion of FI mining concrete, we provide an example.

Example 3: In the relation R1, there are 4 attributes, A, B, C and D. Table 2 shows the values of those attributes over all tuples in R1. Let support = 30% be the threshold.

Table 2: Transactions in R1

Transaction ID   A    B    C    D
1                a1   b2   c3   d4
2                a1   b3   c4   d5
3                a2   b3   c5   d5
4                a1   b2   c3   d4
5                a2   b5   c3   d4
6                a2   b3   c5   d4
7                a1   b2   c3   d4
8                a1   b2   c5
9                a3   b2   c3
10               a2   b5        d4

By the definition, we can get the frequent itemsets as below:


Table 3: Frequent itemsets from R1 and their supports

Frequent itemset        Number of tuples   Support
{a1}                    5                  5/10 × 100 = 50%
{a2}                    4                  4/10 × 100 = 40%
{b2}                    5                  5/10 × 100 = 50%
{b3}                    3                  3/10 × 100 = 30%
{c3}                    5                  5/10 × 100 = 50%
{c5}                    3                  3/10 × 100 = 30%
{d4}                    6                  6/10 × 100 = 60%
{a1, b2}                4                  4/10 × 100 = 40%
{a1, c3}                3                  3/10 × 100 = 30%
{a1, d4}                3                  3/10 × 100 = 30%
{a2, d4}                3                  3/10 × 100 = 30%
{b2, c3}                4                  4/10 × 100 = 40%
{b2, d4}                3                  3/10 × 100 = 30%
{c3, d4}                4                  4/10 × 100 = 40%
{a1, b2, c3}            3                  3/10 × 100 = 30%
{a1, b2, d4}            3                  3/10 × 100 = 30%
{a1, c3, d4}            3                  3/10 × 100 = 30%
{b2, c3, d4}            3                  3/10 × 100 = 30%
{a1, b2, c3, d4}        3                  3/10 × 100 = 30%

Among the 10 transactions, 3 transactions contain the items ‘a1’, ‘b2’, ‘c3’ and ‘d4’ together, and the support for this itemset is 3 divided by 10 = 30%. Excluding the null set, there are 15 (15 = 2^4 - 1) subsets of the frequent itemset {‘a1’, ‘b2’, ‘c3’, ‘d4’}, and all those subsets are also frequent itemsets. As we see, the upward closure property holds for every frequent itemset in Table 3.

Before we finish this section, we clarify a couple of terms we use in this thesis. In the data mining area, especially for transaction data like market basket data where all transactions consist of items of goods, people use the word items to refer to the values in transactions. Although attribute value is the more general-purpose terminology in a database, it will sometimes be more suitable for us to refer to an attribute value as an item in this thesis. The reason is that once we adopt frequent itemset mining for the task of selectivity estimation, it is more natural to use the original terminology used in data mining. So, we will use the terms item and


attribute value interchangeably, and we will also use the word itemset to refer to an attribute value set. We also note that while a cardinality is in general the number of members in a set, the cardinality in a database is the total number of tuples in a relation.

3.2 Overview of Our Approach

In this thesis, we describe a new summarization method based on frequent itemsets and consider two fundamental operations for selectivity estimation, equality selection and join. Selectivity for range queries can be obtained by aggregating the selectivity of equality selections within the range. We begin with a high-level description of how our approach works for estimating cardinality.

Before the optimizer is invoked, the first thing we need to do is to construct profiles of every relation in the database. In our approach, we use a frequent itemset file to model a relation. Using the Apriori algorithm, we begin by mining frequent itemsets among all attribute values. We find all frequent itemsets over a relation and build an FI tree with these itemsets. Every time we need the data distribution of a relation to estimate the cardinality of the result of a relational algebra operation over the relation, we use the FI tree, in addition to a few other simple statistics on the relation.

To estimate the size of the relation resulting from a selection operation using our profile, we first consider all predicates present in the selection operation. As long as the values that satisfy the predicates together form a frequent itemset, we can compute the exact cardinality of the result relation, since the FI tree keeps the actual frequencies of frequent itemsets. For example, if the selection predicates are R.A = ‘a1’ and R.D = ‘d4’ in the query and we have {‘a1’, ‘d4’} as a frequent itemset whose frequency is 60%, we can get the exact cardinality for the result directly by acquiring the frequency from the FI tree.
In addition to the FI tree, which contains the cardinality of the relation, we also store the number of distinct attribute values for every attribute of the relation. If we cannot find some of the values that satisfy the predicates in the FI tree, we use those additional counts to estimate the frequencies of tuples containing these non-frequent items


using standard heuristic methods. Finally, taking the product of the frequencies from frequent itemsets and non-frequent itemsets, we approximate the frequency for the predicates in the selection query. As an example, consider these steps for the equality selection operation over the query plan in Figure 6.

[Figure 6: A query plan tree for the example of the selection. Node labels: selection S1 with predicate R.A = ‘a1’ and R.B = ‘b1’ and R.D = ‘d4’; relations < R > and < S >.]

For the selection operation S1, the predicate set P is P = {‘a1’, ‘b1’, ‘d4’}. Assume that the cardinality of R is 10 and V(R, B) = 5. In the FI tree, {‘b1’} is not shown, but {‘a1’, ‘d4’} is an FI with 60% frequency. We can see three attribute values on B in the tree, and the summation of the frequencies of those frequent items is 80%. With these facts, we know that there are 2 non-frequent attribute values (2 = 5 - 3) on B, and each is assumed to have a frequency of 10% (100% - 80% = 20% frequency divided by 2 attribute values). Therefore, the final frequency for S1 is 10% × 60% = 6%, and the estimated cardinality for the result is 0.6 tuples (10 × 0.06).

After computing the cardinality of the result relation, we then build a new FI tree for the resulting relation. The query optimizer will again use this new FI tree as a profile of the result relation when it processes an operation that involves the result relation. In other words, this new FI tree and the estimated cardinality after each operation are propagated to further operations in the query plan.

Joins are processed in a similar fashion, but for the join operation, we first need to rebuild the FI trees of the two input relations so that the attribute values belonging to the joined attribute are found at level 1 in the tree. For example, if an FI set is {‘b2’, ‘d1’, ‘c3’, ‘a1’} and we join


on A, then we put the set {‘b2’, ‘d1’, ‘c3’} under {‘a1’} at level 1 in the tree. After rebuilding the two trees that will be used to estimate the result of the join, we compare all items at level 1 and get the common join attribute values across both FI trees. If there is a value of the joined attribute that is present in one relation’s FI tree but not the other, we drop that node and all its branches from the trees. At this point, we have only common items for the joining attribute on both trees and can easily calculate the frequencies for the relation resulting from joining these values together. Of course, it is possible that there are common values among the non-frequent items and the values dropped from the trees. For this case, we again use standard heuristic methods [28] under the uniform distribution assumption for each pair of attribute values. As in the selection operation, the final frequency for the resulting relation combines the frequency obtained from the FI trees with the one estimated for the attribute values that are not in the FI trees; for the join, the two are added (Section 3.5).

3.3 Data Structure of the FI Tree

During our process, the frequent itemsets that are extracted from the relation form a tree (the FI tree). Unlike one-dimensional histograms, which are constructed along a single attribute, an FI tree exists over a relation. So, there is only one FI tree for a relation, while a histogram file exists for each attribute. The root of an FI tree holds the cardinality of the relation. Every node other than the root keeps item information (such as value and attribute name), a support giving the frequency of the itemset, and pointers to its children. The level of a node in this tree implies the size of an itemset, i.e., all nodes at level 1 are the 1-frequent itemsets and their children stand for the itemsets that contain this item. One thing to be aware of with this tree is that the supports on the nodes deeper than level 1 are not the supports of the items themselves.
Rather, those are the supports for itemsets that contain all items on their ancestor nodes as well as that node. For instance, we use one of the frequent itemsets of Table 3 to show this. The itemset {‘a1’, ‘c3’} has support 30%. The


subsets of this itemset are {‘a1’} and {‘c3’}, and the supports of those 1-itemsets are both 50%. For these items, Figure 7 shows the corresponding part of the FI tree built from Table 3.

[Figure 7: An example of an FI tree. Root: 10. Level 1: {a1} 50% and {c3} 50% (node n1). Level 2, under {a1}: {c3} 30% (node n2).]

In the FI tree shown above, there are two nodes that contain the item ‘c3’. The node that contains the value ‘c3’ at level 1 denotes the 1-frequent itemset {‘c3’}, and we refer to this node as n1. The other node that contains the value ‘c3’, at level 2, represents the 2-frequent itemset {‘a1’, ‘c3’} (that is, a frequent itemset of size 2); let this node be n2. The support of n2 is not for an item but for an itemset. Since {‘a1’} has a child for the itemset {‘a1’, ‘c3’}, n1 does not need a child node for ‘a1’ for that itemset; this avoids duplication.

Now, we provide the full FI tree of the relation R1 built from Table 3. Figure 8 depicts the profiles, such as the FI tree and the table of attribute value counts of relation R1, and we will use these often in further examples.


[Figure 8: The FI tree for R1. The root is { R1 : 10 }. The level-1 nodes shown are A:‘a1’ 50%, A:‘a2’ 40%, B:‘b2’ 50%, B:‘b3’ 40%, C:‘c3’ 50%, C:‘c5’ 30% and D:‘d4’ 50%; deeper nodes carry the supports of the larger itemsets, for example the leftmost branch A:‘a1’ 50%, B:‘b2’ 40%, C:‘c3’ 30%, D:‘d4’ 30%.]

After this short description of the FI tree, we move to the next section for a more comprehensive explanation. From the root node of the FI tree in Figure 8, we learn that the cardinality of this relation is 10. If we traverse the leftmost branch, i.e., in a depth-first pattern, we obtain some of the actual data distribution: the frequency of tuples that contain ‘a1’ is 50%, the frequency of tuples that contain ‘a1’ and ‘b2’ at the same time is 40%, the frequency of tuples that contain ‘a1’, ‘b2’ and ‘c3’ at the same time is 30%, and the frequency of tuples that contain ‘a1’, ‘b2’, ‘c3’ and ‘d4’ at the same time is 30%.

On the other hand, if we traverse the nodes at level 1, i.e., using a breadth-first search pattern, we obtain some information about the attributes. For example, at level 1 of this tree, there are two 1-items whose attribute is ‘A’, and the summation of their supports is 90%. If we use a smaller support threshold, the size of the FI tree increases and we can achieve better accuracy owing to the additional information on more frequent itemsets.
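The structure and the two traversals described above can be sketched as a small prefix tree. This is an illustrative sketch only: the class names `FINode` and `FITree`, the storage of each itemset along a single path in sorted item order, and the convention that the attribute name is the uppercased first letter of a value are all assumptions made for the example, and only a few itemsets from Table 3 are loaded.

```python
from dataclasses import dataclass, field


@dataclass
class FINode:
    attribute: str      # e.g. "A"
    value: str          # e.g. "a1"
    support: int = 0    # percent; support of the itemset spelled by the path to this node
    children: dict = field(default_factory=dict)


@dataclass
class FITree:
    cardinality: int                                # kept at the root
    children: dict = field(default_factory=dict)    # level-1 nodes, keyed by value

    def insert(self, itemset, support):
        """Store an itemset on one path only (sorted order), avoiding duplicates."""
        level, node = self.children, None
        for value in sorted(itemset):
            # Assumed convention: attribute name = first letter of the value.
            node = level.setdefault(value, FINode(value[0].upper(), value))
            level = node.children
        node.support = support

    def support(self, itemset):
        """Depth-first walk along the path of the itemset; None if not stored."""
        level, node = self.children, None
        for value in sorted(itemset):
            node = level.get(value)
            if node is None:
                return None
            level = node.children
        return node.support if node else None

    def level1_summary(self, attribute):
        """Breadth-first view of level 1: (number of nodes, sum of supports)."""
        nodes = [n for n in self.children.values() if n.attribute == attribute]
        return len(nodes), sum(n.support for n in nodes)


tree = FITree(cardinality=10)
for items, pct in [({"a1"}, 50), ({"a2"}, 40), ({"b2"}, 50), ({"c3"}, 50),
                   ({"a1", "b2"}, 40), ({"a1", "c3"}, 30),
                   ({"a1", "b2", "c3"}, 30), ({"a1", "b2", "c3", "d4"}, 30)]:
    tree.insert(items, pct)

print(tree.support({"a1", "c3"}))    # support of the itemset, not of 'c3' alone
print(tree.level1_summary("A"))      # two level-1 nodes on A, supports summing to 90
```

Keeping every itemset on a single sorted path is one way to obtain the property noted for Figure 7, where the level-1 node for ‘c3’ needs no child for ‘a1’.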


Note that although an FI tree keeps information only on frequent itemsets, we can also draw some information on non-frequent itemsets from the FI tree.

3.4 Selectivity Estimation for Equality Selection

This section describes in detail our method for equality selection predicates, which takes advantage of the FI tree and of standard heuristics under the uniform distribution assumption. The entire process is outlined below:

1) First, we discuss the way to estimate the cardinality of the relation resulting from the selection operation.
2) Next, we show how the profiles are propagated in a query plan.
3) Last, we explain how we recompute the statistics for attribute values after the selection operation. This is needed for further operations over the resulting relation.

3.4.1 Step One: Get the Cardinality of Resulting Relation Using the FI Tree.

When using the FI tree to estimate the selectivity of a relational selection query, there are two cases. In the first case, all of the predicates present in the query can be handled directly using the FI tree. For example, consider the query in Example 4.

Example 4: Suppose this relational selection query is a part of a query plan.

SELECT * FROM R1 WHERE A = a1 AND C = c3;

We refer to the set of predicates in the query as the predicate set, denoted by P. All predicate values on the relation in a query belong to this predicate set. For example, if we have predicate values p1, p2, …, pn, then the predicate set P for the selection is P = {p1, p2, …, pn}. In this example, P = {‘a1’, ‘c3’}.

To process a selection query, we first need to check the predicate set against the 1-itemsets that reside at level 1 of the FI tree and partition it into two sets: the predicate set whose members


are all 1-frequent itemsets, f-P, and the set of the remaining predicates, Nf-P. We treat these two sets differently. For f-P, we use the FI tree to get the number of tuples that satisfy the predicates in this set. The main task when we handle f-P is to discover the completeFI(s). If there is an itemset I whose members are exactly the same as those in f-P, we define the node for I as a completeFI. By taking the product of the frequencies of these completeFIs on the FI tree, we estimate the selectivity of the part of the result relation that is determined by frequent items. For Nf-P (those predicates that do not match any frequent itemset), we assume that all tuples that contain one of the values in this set are uniformly distributed. So we simply calculate the number of tuples that have a non-frequent attribute value under the uniform distribution assumption.

Suppose the FI tree in Figure 8 is the profile we use for this query. When there is an itemset in the FI tree that is the same as P, we can get the exact frequency of the predicates. After checking all items at level 1, we put all of these predicates into the frequent predicate set, f-P, so Nf-P is empty for this query. Now we traverse the FI tree in a depth-first pattern to find the completeFI. We find the itemset {‘a1’, ‘c3’}, take its support (which is 30%), and multiply this frequency by the cardinality of the relation. By doing this, we obtain the exact number of tuples that satisfy the selection predicates: 3 tuples (3 = 30% × 10 tuples).

If the FI tree has one completeFI and the set Nf-P is empty, the estimated cardinality is the same as the actual one, i.e., the accuracy is 100%. However, it is still possible that there is no node whose frequent itemset is the same as f-P. In this case, we find the largest frequent itemset in the tree all of whose members are in the frequent predicate set, and then remove those items from f-P.
By doing this recursively, we acquire several frequencies and take the product of those frequencies.

3.4.2 Step Two: Get the Cardinality of Result Relation Using the Uniform Distribution Assumption.

Sometimes, not all of the predicates in P can be satisfied directly using the FI tree. To handle such cases, we store an additional set of statistics along with the tree. For example, in the


FI tree in Figure 8, there are only two attribute values on A, ‘a1’ and ‘a2’, and the summation of their frequencies is 90%. If a predicate value is ‘a3’, which we cannot find in the tree, we need to guess how many tuples contain this non-frequent attribute value. For this purpose, we maintain a table that provides the number of distinct attribute values for each attribute of the relation. Figure 9 shows this table for relation R1.

Figure 9: The numbers of attribute values over R1

Attribute   # of Attribute Values   Distinct values
A           3                       a1, a2, a3
B           3                       b2, b3, b5
C           4                       null, c3, c4, c5
D           3                       null, d4, d5

Using the FI tree and this table, we can infer some information about the non-frequent items that we need in order to estimate cardinality. For instance, since the attribute value count for A is 3, by subtracting the two items in the FI tree in Figure 8, we know that there is only one attribute value that is not frequent, and its support is 10% (10% = 100% - 90%). This information on tuples that have non-frequent items is needed to apply the heuristics that give the cardinality without knowledge of the data distribution. In the heuristic method for calculating cardinalities, we assume a uniform distribution over all attribute values. We will introduce the formula for this later.

To see how these statistics are used in the case that not all of the predicates in P are present in the FI tree, consider the following example query:

Example 5: Here we have a query on R1. For this query, we use both the FI tree in Figure 8 and the attribute counts table in Figure 9 as profiles of R1.

SELECT *


FROM R1 WHERE B = b5 AND C = c5 AND D = d4 AND A = a1;

We refer to the predicate “B = b5” as p1, “C = c5” as p2, “D = d4” as p3 and “A = a1” as p4. Then the predicate set is P = {‘b5’, ‘c5’, ‘d4’, ‘a1’}. By comparing against the 1-frequent items on the tree, we divide this set into a frequent predicate set, f-P = {‘c5’, ‘d4’, ‘a1’}, and a non-frequent predicate set, Nf-P = {‘b5’}. Again, for f-P we use the FI tree to get the frequency of the values, but for the non-frequent predicate set Nf-P, since we do not have any information such as the distribution of its attribute values, we simply adopt the heuristic method to get the estimate. We now discuss the heuristic method for calculating the number of tuples resulting from the equality selection operation.

In the heuristic method, things become much simpler. With only two pieces of information about a relation, we can approximate the selectivity. Under the uniform distribution assumption, the only things we need are the cardinality of the relation and the number of distinct attribute values for each attribute in the relation. Consider a relation R with 1000 tuples. If we do not know any of the distributions of attribute values, but we know there are 10 attribute values for the attribute of the equality predicate, then we assume every attribute value has the same frequency in that relation and get 100 tuples as the number of tuples for any predicate value of an equality selection. To give the exact algorithm, we first define some notation:

N_R        Cardinality of relation R
V(R, A)    Value count for attribute A on the relation R
Cost(S)    The cost of predicate S, in terms of the number of tuples

Then, the general expression for the cost of equality selection S is:

Cost(S) = N_R / V(R, A)
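As a quick check of this expression, the following sketch evaluates it for the 1000-tuple relation mentioned above. The function name `uniform_selection_cardinality` is an assumption chosen for illustration.

```python
def uniform_selection_cardinality(n_r, v_r_a):
    """Cost(S) = N_R / V(R, A): expected tuples per value under uniformity."""
    if v_r_a <= 0:
        raise ValueError("V(R, A) must be positive")
    return n_r / v_r_a


# Relation R with 1000 tuples and 10 distinct values on the predicate attribute.
print(uniform_selection_cardinality(1000, 10))
```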


The FI tree has the cardinality of the relation. Note that the summation of the supports of the nodes that are at level 1 and whose attribute name is found in the predicates is the frequency of the tuples that contain frequent items on that attribute. Thus, using the cardinality and this summation, we can estimate the number of tuples that contain non-frequent items on the attribute.

In our example, we had previously broken the selection predicates into two sets, f-P = {‘c5’, ‘d4’, ‘a1’} and Nf-P = {‘b5’}. For processing the set f-P, we find the completeFI(s) as discussed in 3.4.1. If there is no frequent itemset that is exactly the same as f-P, we search again for a completeFI among the subsets of f-P, treating each subset as if it were f-P, and prefer a larger subset as a completeFI since it covers more elements in one set. In our example we have more than one completeFI, namely {‘a1’, ‘d4’} and {‘c5’}. By multiplying the supports of these frequent itemsets, we get the frequency of the tuples that contain the frequent items:

30% × 30% = 9%

For the set Nf-P = {‘b5’}, the heuristic method under the uniform distribution assumption is used to obtain the support:

(100% - sum of all supports on the level-1 nodes whose attribute is B) / (V(R1, B) - number of nodes that are at level 1 and whose attribute is B) = (100% - 90%) / (3 - 2) = 10%

The product of the supports of the completeFIs and the support for Nf-P is the frequency for the original predicates. In our example, this is 9% × 10% = 0.9%. Therefore, the cardinality of the resulting relation is the estimated frequency multiplied by the cardinality of the relation in the query. We get 0.09 tuples (0.9% × 10 tuples = 0.09) as the cardinality of the result relation for our example. From the actual tuples in R1, we can see that there are no tuples that satisfy the given predicates in our query, which means that the actual cardinality for this query is 0. So, our estimate is very close to the actual result.
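The arithmetic of Examples 4 and 5 can be retraced with a short sketch. It is only an illustration under stated assumptions: the function name `selection_cardinality`, the flat dictionary standing in for the FI tree, and the first-letter convention for attribute names are hypothetical, and the supports are the ones the worked example relies on, as read from Figure 8 (so the level-1 supports on B sum to 90%).

```python
def attr_of(value):
    # Assumed convention for this example: 'b5' belongs to attribute 'B'.
    return value[0].upper()


def selection_cardinality(cardinality, predicate_values, fi_supports, value_counts):
    """
    predicate_values: values of the equality predicates, e.g. {"a1", "c3"}
    fi_supports:      {frozenset of values: support as a fraction}
    value_counts:     {attribute: number of distinct values}, i.e. V(R, attribute)
    """
    level1 = {next(iter(s)): sup for s, sup in fi_supports.items() if len(s) == 1}
    f_p = {v for v in predicate_values if v in level1}          # frequent part
    nf_p = [v for v in predicate_values if v not in level1]     # non-frequent part

    frequency = 1.0
    # Repeatedly take the largest frequent itemset contained in f-P (a completeFI).
    while f_p:
        complete_fi = max((s for s in fi_supports if s <= f_p), key=len)
        frequency *= fi_supports[complete_fi]
        f_p -= complete_fi

    # Spread the leftover support uniformly over the non-frequent values.
    for value in nf_p:
        attr = attr_of(value)
        same_attr = [sup for v, sup in level1.items() if attr_of(v) == attr]
        remaining_values = value_counts[attr] - len(same_attr)
        if remaining_values <= 0:
            return 0.0
        frequency *= (1.0 - sum(same_attr)) / remaining_values

    return cardinality * frequency


# Supports as read from Figure 8 (fractions); only the itemsets needed here.
fis = {frozenset(k): v for k, v in [
    (("a1",), 0.5), (("a2",), 0.4), (("b2",), 0.5), (("b3",), 0.4),
    (("c3",), 0.5), (("c5",), 0.3), (("d4",), 0.5),
    (("a1", "c3"), 0.3), (("a1", "d4"), 0.3),
]}
counts = {"A": 3, "B": 3, "C": 4, "D": 3}   # Figure 9

example4 = selection_cardinality(10, {"a1", "c3"}, fis, counts)
example5 = selection_cardinality(10, {"b5", "c5", "d4", "a1"}, fis, counts)
print(round(example4, 4), round(example5, 4))
```

Under these inputs the sketch gives about 3 tuples for Example 4 and about 0.09 tuples for Example 5, matching the hand computation.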
Below, we provide the complete algorithm for estimating the cardinality.


Algorithm 1: GetCardinality in selection

GetCardinality(freqPredicate fP, Non-freqPredicate NfP, FI tree T) {
    supportFI = 1; supportNFI = 1;
    while there is any frequent item in fP do
        complete_FI = the largest subset of fP that is the same as one FI in T;
        supportFI = supportFI × support(complete_FI);
        remove this subset from fP;
    end while;
    for all values a in NfP do
        let attr be a.attr;
        let C_attr be the number of 1-FIs on T whose attribute is attr;
        let S_attr be the sum of the supports of all 1-FIs on T whose attribute is attr;
        supportNFI = supportNFI × ((100% - S_attr) / (V(R, attr) - C_attr));
    end for;
    frequency = supportFI × supportNFI;
    newCardinality = orgCardinality × frequency;
    return newCardinality;
}

3.4.3 Step Three: Create a New FI Tree for the Result after Selection.

A result relation after a relational operation can be used as an input relation for the next relational operation in a query plan. Thus, we also need profiles for the intermediate relations, so the base profiles are propagated as the operations are considered. As a result, for every operation, we produce a new FI tree as well as the cardinality of the result relation. This function should be run after calculating the cardinality of the result, since GetCardinality() finds the completeFI(s) on the tree and we need the completeFI(s) to rebuild the FI tree.

To construct the result FI tree from the input FI tree, we check all nodes at level 1 and delete all branches whose attribute is in the predicates and which do not contain a completeFI. For every remaining node, if its attribute is the same as that of one of the predicates but its value is different, we delete that node, since after an equality selection only one value of the attribute is possible. After finishing this trimming, we reconstruct the FI tree to satisfy the upward closure property, under which all subsets of frequent itemsets are also frequent itemsets. We also add the nodes for the values in the non-frequent predicate set during this reconstruction. Finally, we


calculate the support for all newly added nodes. Here the supports for the completeFIs are 100% because this is an equality selection. For the children nodes (T) of these completeFIs, the support can be calculated as:

supp_new(T) = supp_old(T) / supp_old(completeFIs)

We depict the process in Figure 10, followed by summarized pseudo-code for this process.

[Figure 10: Overall process for getting the FI tree. Panel 1, Searching Mode: finding the complete-FI(s) on the FI tree of R1 from Figure 8.]


[Figure 10, continued. Panel 2, Trimming Mode: the FI tree of R1 is reduced to { R1 : 10 } with the nodes A:‘a1’ 50%, D:‘d4’ 30% and C:‘c5’ 30%. Panel 3, Rebuilding Mode (adding nodes): the new nodes from the non-frequent predicate set, B:‘b5’ 100% and D:‘d4’ 100%, are added.]


[Figure 10, continued. Panel 4, Updating Support Mode: { Result relation: 0.09 } with the nodes A:‘a1’ 100%, C:‘c5’ 100%, D:‘d4’ 100%, B:‘b5’ 100% and D:‘d4’ 100%.]

Algorithm 2: GetFITree in selection

GetFITree(freqPredicate fP, Non-freqPredicate NfP, FI tree T) {
    for every level-1 node t of T do
        if t does not have a complete-FI
            drop t from T;
    end for;
    for all nodes t1 of T do
        let p be an item in fP such that t1.attribute = p.attribute;
        if p exists and t1.value <> p.value
            drop t1 from T;
    end for;
    add all items in NfP onto T;
    rebuild T to satisfy the upward closure property;
    recalculate supports for all nodes;
}

3.4.4 Step Four: Update the Table for Attribute Counts for the Result after Selection.

After the selection operation, the numbers of attribute values for the attributes that are involved in the predicates will change. So we need to update the attribute value counts for the resulting relation. Since this is an equality selection, only two cases are possible for such an attribute after this operation. First, if the predicate value is one of the values that occur in the domain of the attribute, all tuples will have this predicate value for that attribute as a result of this operation. Otherwise,


the cardinality after this operation would be 0. Therefore, for those attributes that are part of the predicates, we set the number of attribute values to 1, and for the rest of the attributes, which are not in the predicates of the given query, we keep the original value count.

For our example query, all attributes in this relation are used in predicates. So, the table in Figure 9 would be updated as Table 4:

Table 4: The updated table for numbers of attribute values after selection

Attribute   # of Attribute Values
A           1
B           1
C           1
D           1

Below is the summary of this procedure.

Algorithm 3: UpdateAttrCounts in selection

UpdateAttrCounts {
    for every attribute a on V(R) do
        if a is the same as one in P(Q)
            V(R, a) = 1;
        else
            keep the value;
        end if
    end for
}

3.5 Selectivity Estimation for Join

In this section, we demonstrate how we use FI trees for the join operation. The basic idea behind the cardinality estimation for the join operation using FI trees is straightforward. For the join operation, our concern is only with the items that are common to both relations. So we find the common items on the joined attributes in both FI trees. We take the product of the two frequencies on each tree for each common item, and the summation of all these frequencies after


taking the products is the frequency for the common items among the frequent items. If an attribute value is one of the frequent items in one tree but is not on the other tree, we drop this item and all its branches from the FI tree and treat this item as a non-frequent item. For the join operation with non-frequent items, we again assume that all tuples that contain one of these non-frequent items are uniformly distributed. Using the formula for the cardinality of the join operation under the uniform distribution assumption, we estimate the frequency of the join result for these tuples. The final frequency for the resulting relation is the summation of these two frequencies. Figure 11 illustrates the process.

[Figure 11: Join modeling. Each whole relation (with its attribute counts and cardinality) is split into frequent items (FIRelation, for which all attribute values and supports are kept) and non-frequent items (NonFIRelation). After the common items are found, each FIRelation is split into the CommonFIRelation (all attribute values and supports are kept) and the FIs that are uncommon. The result for the CommonFIRelations and the result for the remaining tuples (uncommon FIs and non-FIs) are added.]

The remainder of this section is outlined as follows:

1) First, we define some notation that we need for an effective description of all processes for the join operation.


2) Then, we show how to get the cardinality of the resulting relation after the join operation.
3) For future operations in the query plan, we need to build an FI tree profile of the resulting relation. Using an example, we illustrate how we construct a new FI tree from the two input FI trees of the joined relations.
4) After the join operation, the number of attribute values on the join attributes may change. Since the table of attribute value counts is also used as a profile together with the FI tree, we describe how to construct a new table for the resulting relation.

3.5.1 Notations Used.

Assume that we have a relation R and we are joining this relation with another relation, S, on attribute A. Attribute A in R has n values, {a1, a2, a3, …, an}. Let Tuples(ak) be all tuples that have attribute value ak; then:

|R| = f(a1) + f(a2) + f(a3) + … + f(an)

Here, f(ak) is the frequency of attribute value ak, such that f(ak) = Count(Tuples(ak)). Since every tuple has exactly one value on A, the total number of tuples, |R|, i.e., the cardinality of the relation, can be expressed as the sum of the frequencies of all attribute values of a given attribute. Based on this idea, we define some of the notation we will use in this section:

N_R                  Cardinality of relation R
V(R, A)              Value count for attribute A on the relation R
JoiningFI            The set of values of the join attribute that are frequent items.
CommonFI             The set of values that belong to JoiningFI and are common to both FI trees.
CommonFItuples       All tuples that contain one of the attribute values in CommonFI.
FIRelation(R)        A virtual relation that is extracted from relation R when a tuple contains


one of the members in JoiningFI. All tuples in this relation can be divided into CommonFItuples and uncommonFItuples.
NonFIRelation(R)     A virtual relation that is comprised of the remaining tuples when FIRelation(R) is extracted from R.
CommonFIRelation(R)  A virtual relation that is comprised of CommonFItuples.
OtherRelation(R)     A virtual relation that is comprised of uncommonFItuples and NonFIRelation(R).

3.5.2 Step One: Get the Cardinality for the Result after Join.

We begin by assuming that we have already extracted the common items, CommonFI, by comparing the 1-frequent itemsets on the join attributes in both relations, and that we have dropped all uncommon 1-itemsets and their branches. So, now the FI trees have only common items on the join attribute. From the tree, we get the actual frequency for every item on a node at level 1 and hence the actual number of tuples that contain those attribute values. These tuples are defined as the CommonFIRelation, and we do a join over the two CommonFIRelations. By multiplying, for each common value, the tuple counts in these virtual relations, we get the exact tuple count for the join result of these two CommonFIRelations.

For the other virtual relations, the OtherRelations produced from both relations (which are made up of the uncommonFItuples and NonFIRelation(R)), we know only the total number of tuples and the attribute value counts on the join attribute. So, we simply adopt the simple heuristic method for calculating the join result under the uniform distribution assumption. The reason we do not ignore uncommon items even after deleting them from the trees is that the same attribute value may exist among the non-frequent-item tuples in the other relation. In this case, these two values can be joined together and would contribute to the join result.

Since this is a join operation over two relations, we need one more relation, R2, for our running example.
We introduce R2 in Table 5 and the corresponding FI tree and the table for attribute value counts are depicted in Figure 12.


Table 5: Transactions in R2

Transaction ID   A    E    F
1                a1   e3   f4
2                a2   e5   f5
3                a7   e5   f5
4                a1   e3   f2
5                a2   e5   null

Figure 12: Profiles for R2. a) The FI tree for R2. b) Table for attribute value counts for R2.

a) Frequent itemsets over R2 (root of the FI tree: R2 : 5)

Frequent itemset   Support
{a1}               40%
{a2}               40%
{e3}               40%
{e5}               60%
{f5}               40%
{a1, e3}           40%
{a2, e5}           40%
{e5, f5}           40%

In the FI tree, the level-1 nodes are {a1} 40% (with child {e3} 40%), {a2} 40% (with child {e5} 40%), {e3} 40%, {e5} 60% (with child {f5} 40%) and {f5} 40%.

b) Attribute values over R2

Attribute   # of Attribute Values   Distinct values
A           3                       a1, a2, a7
E           2                       e3, e5
F           4                       null, f2, f4, f5

Using these two relations, we consider a join example.

SELECT * FROM R1, R2 WHERE R1.A = R2.A;

For the relation R1, the frequent itemsets of size 1 on A are {‘a1’: 50%} and {‘a2’: 40%}, and R2 has the same 1-frequent itemsets but with different frequencies, {‘a1’: 40%} and {‘a2’: 40%}. So, after comparing the 1-frequent itemsets of these two relations to get the common item values, the CommonFIRelations are the same as the FIRelations for both relations in this case. The cardinality when we join these two virtual relations is the sum of the number of tuples obtained after joining attribute value ‘a1’ and the number obtained after joining attribute value ‘a2’, which is 18 tuples:


Tuples after joining attribute value ‘a1’: $(10 \text{ tuples} \times 50\%) \times (5 \text{ tuples} \times 40\%) = 10$ tuples

Tuples after joining attribute value ‘a2’: $(10 \text{ tuples} \times 40\%) \times (5 \text{ tuples} \times 40\%) = 8$ tuples

Cardinality(CommonFIRelations): $10 + 8 = 18$ tuples

For NonFIRelation(R1) and NonFIRelation(R2), which are also part of the original relations, there is 1 attribute value on A in both R1 and R2. Since V(R1,A) = 3 and there are 2 values of attribute A in the FI tree, subtracting the number of values in FIRelation(R1) from the number of values in R1 gives 3 - 2 = 1 value for the virtual relation NonFIRelation(R1). We have the same result for R2. The cardinality of OtherRelation(R1) is 1 tuple, since $10 \text{ tuples} \times (100\% - (50\% + 40\%)) = 1$ tuple, and OtherRelation(R2) also has 1 tuple. Therefore, by adopting the heuristic under the uniform distribution assumption, we get the cardinality after joining these OtherRelations:

$$\text{Cardinality(OtherRelations)} = \frac{N_{R1} \times N_{R2}}{\mathrm{MAX}(V(R1, A),\, V(R2, A))} = \frac{1 \times 1}{\mathrm{MAX}(1, 1)} = 1$$

The final cardinality of the join operation is the sum of the two cardinalities, Cardinality(CommonFIRelations) and Cardinality(OtherRelations), computed over the virtual relations of R1 and R2. In our example, we get the result cardinality after joining R1 and R2 by adding these two cardinalities: 18 + 1 = 19 tuples. Considering that the actual cardinality after joining R1 and R2 on attribute A is 18 tuples, this estimate is reasonable. The summarized algorithm for this procedure follows.
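Before the pseudocode, the arithmetic of the running example can be replayed in a few lines of Python. This is a minimal sketch under stated assumptions: the function name, its parameters and the dictionary representation of the 1-frequent itemsets are illustrative choices, not the thesis implementation.

```python
def estimate_join_cardinality(n_left, n_right, freq_left, freq_right,
                              distinct_left, distinct_right):
    """Estimate the size of an equi-join on one attribute.

    freq_left / freq_right map each frequent join-attribute value to its
    support (a fraction of the relation); distinct_* is V(R, a).
    """
    common = set(freq_left) & set(freq_right)  # the common items, CommonFI

    # Exact part: join of the two CommonFIRelations, value by value.
    card_common = sum(
        (n_left * freq_left[v]) * (n_right * freq_right[v]) for v in common
    )

    # The remaining tuples and distinct values describe the OtherRelations.
    other_left = n_left - sum(n_left * freq_left[v] for v in common)
    other_right = n_right - sum(n_right * freq_right[v] for v in common)
    v_left = distinct_left - len(common)
    v_right = distinct_right - len(common)

    # Uniform-distribution heuristic for the join of the two OtherRelations.
    denominator = max(v_left, v_right)
    card_other = other_left * other_right / denominator if denominator > 0 else 0.0

    return card_common, card_other, card_common + card_other


common_part, other_part, total = estimate_join_cardinality(
    n_left=10, n_right=5,
    freq_left={"a1": 0.5, "a2": 0.4},
    freq_right={"a1": 0.4, "a2": 0.4},
    distinct_left=3, distinct_right=3,
)
print(round(common_part), round(other_part), round(total))
```

For R1 and R2 the sketch returns 18 tuples for the common part, 1 tuple for the other part, and 19 tuples in total, matching the worked example.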


Algorithm 4: GetCardinality in Join

GetCardinality ( left-join-relation left, right-join-relation right ) {
 1. Let a be the joining attribute in both relations
 2. Let T(left) be the FI tree for left and T(right) be the FI tree for right
 3. Find the common items, commonFI, and delete the uncommon items from both FI trees
 4. For all i in commonFI
 5.     cardinality1 += (|left| x f(i) in left) x (|right| x f(i) in right);
 6. end for;
 7. V(OtherRelation(left),a) = V(left.a) - Count(left.commonFI);
 8. |OtherRelation(left)| = |left| - |CommonFIRelation(left)|;
 9. V(OtherRelation(right),a) = V(right.a) - Count(right.commonFI);
10. |OtherRelation(right)| = |right| - |CommonFIRelation(right)|;
11. cardinality2 = (|OtherRelation(left)| x |OtherRelation(right)|) / MAX(V(OtherRelation(left),a), V(OtherRelation(right),a));
12. |result| = cardinality1 + cardinality2;
}

3.5.3 Step Two: Get the FI Tree for the Result after the Join

The first thing we have to do to build a new FI tree for the result relation is to rebuild the input FI trees. All nodes whose attribute value is in CommonFI have first priority, so we let the value on the join attribute be the base node of each frequent itemset in an FI tree. We define a base node to be the first node we meet when we traverse an itemset in depth-first order. It resides at level 2. For example, suppose a frequent itemset is {‘b2’, ‘d1’, ‘c3’, ‘a1’} and ‘a1’ is the value on A. When we join on A, we rebuild the FI tree so that all items on A become base nodes. To do this, we put all items in {‘b2’, ‘d1’, ‘c3’} under {‘a1’}, which means that all of the items in this set become descendants of ‘a1’.

After the rebuilding process, we drop all 1-itemsets whose attribute is the join attribute but whose value is not in CommonFI. When we drop them, we also remove all their branches. In our example, since all values on A in the frequent itemsets are in CommonFI over R1 and R2, no node or branch is deleted. Figure 13 shows the result of this process in our example.
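The rebuilding and dropping steps of Section 3.5.3 can be illustrated on flat itemsets. The sketch below assumes each itemset is a non-empty list of (attribute, value) pairs; this representation and both function names are hypothetical simplifications, since a real FI tree shares prefixes between itemsets.

```python
def rebuild_itemset(itemset, join_attr):
    """Move the item on the join attribute (if any) to the front of the itemset."""
    base = [item for item in itemset if item[0] == join_attr]
    rest = [item for item in itemset if item[0] != join_attr]
    return base + rest


def drop_uncommon(itemsets, join_attr, common_values):
    """Rebuild every itemset, then drop those whose base node is an uncommon join value."""
    kept = []
    for itemset in itemsets:
        rebuilt = rebuild_itemset(itemset, join_attr)
        attr, value = rebuilt[0]
        if attr == join_attr and value not in common_values:
            continue  # the whole branch under an uncommon base node is removed
        kept.append(rebuilt)
    return kept


example = [("B", "b2"), ("D", "d1"), ("C", "c3"), ("A", "a1")]
rebuilt = rebuild_itemset(example, "A")
print(rebuilt)  # 'a1' is now the base node, followed by b2, d1, c3

# "a9" is a made-up uncommon value used only to exercise the drop step.
kept = drop_uncommon(
    [example, [("A", "a9"), ("B", "b2")], [("B", "b3")]],
    join_attr="A",
    common_values={"a1", "a2"},
)
print(kept)
```

Itemsets that contain no value on the join attribute are kept unchanged; they correspond to the part of the tree that is not under a joining item.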


Figure 13: After rebuilding and deleting uncommon items (FI trees for R1, 10 tuples, and R2, 5 tuples; ‘a1’ and ‘a2’ are common items in both relations)

We call the part of the tree made up of all 1-itemsets whose attribute is not the join attribute, together with all their descendant nodes, the unJoinedgroup. We call the part made up of all 1-itemsets whose attribute is the join attribute, together with all of their descendants, the Joinedgroup. Now we check whether any itemset appears in both groups. If there is a node n under a joining item in the Joinedgroup, we subtract the support of n from that of the matching node in the unJoinedgroup. In our example, in the FI tree of R1, {‘b2’}, {‘b2’, ‘c3’} and {‘b2’, ‘c3’, ‘d4’} exist under {‘a1’}, and the same three itemsets exist in the unJoinedgroup. So we subtract the supports of those three nodes under {‘a1’} from the corresponding nodes in the unJoinedgroup.

The reason is that the frequency of {‘b2’, ‘c3’, ‘d4’} is the sum of the frequency of the tuples in which it occurs with attribute value ‘a1’ and the frequency of the tuples in which it occurs with other attribute values on A. We handle the two kinds of virtual relations with different methods (the two CommonFIRelations using FI trees, and the two OtherRelations using the formula under the uniform distribution assumption). By subtracting the frequency with which the itemset occurs together with one of the items in CommonFIRelation, we obtain its frequency in the OtherRelation while avoiding double counting. The FI trees after this process are shown in Figure 14.

Figure 14: Changing frequencies for nodes (FI trees for R1 and R2 after subtracting the supports of nodes that can also be found under the joined attribute in the Joinedgroup)


For example, consider Figure 14. In the FI tree for R1, the two 1-frequent itemsets {‘a1’} and {‘a2’} and all of their descendants make up the Joinedgroup. The rest of the nodes in the tree are in the unJoinedgroup. There are two nodes whose value is ‘d4’ in the Joinedgroup, and there is one node with value ‘d4’, the 1-frequent itemset, in the unJoinedgroup. So we subtract the sum, 30% + 30% = 60%, from the support of the 1-frequent itemset {‘d4’}, 60%. As a result, the support for {‘d4’} in the unJoinedgroup is 60% - 60% = 0. Subtracting the supports in the Joinedgroup gives the support of this item in OtherRelation(R1). So our example implies that attribute value ‘d4’ always occurs with attribute value ‘a1’ or ‘a2’, and that no tuple in OtherRelation(R1) contains ‘d4’.

After subtracting the frequency with which an item occurs with one of the CommonFI values, we consider the join of the two OtherRelations. We calculate this frequency by adopting the uniform distribution assumption. In Figure 14, {‘b2’, ‘c3’} in the unJoinedgroup has 10% support. Since we have already removed the frequency of the cases in which this itemset occurs with any item in CommonFI, this frequency means that 1 tuple ($10 \text{ tuples} \times 10\% = 1$ tuple) contains {‘b2’, ‘c3’}, and this tuple has an unknown value on attribute A. From the table of attribute counts and CommonFI, we know that OtherRelation(R1) has 1 attribute value on A and 1 tuple ($10 \times (100\% - (50\% + 40\%)) = 1$ tuple). We can compute the same information for OtherRelation(R2) in this way. This is all the information needed under the uniform distribution assumption.

In the next step, we handle all nodes in the Joinedgroup. We obtain the frequency of each value in CommonFI after the join by multiplying its two frequencies in the two FI trees. As in selection queries, we update the frequencies of the descendants of those joining attribute values. Once we have all the supports for the nodes related to CommonFI, we add the new supports of the nodes in the Joinedgroup to the nodes with the same value in the unJoinedgroup. This gives the new supports of all nodes for the new FI tree. Finally, we merge both trees; Figure 15 shows the new FI tree.

Figure 15: The FI tree for result relation (Result Relation: 19 tuples; the annotations in the figure give each merged support as a sum of two contributions, for example (5.3 + 42.1)% and (10.5 + 31.6)%)

We provide the algorithm for this procedure, and move to the last function of this operation.
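As a check on the subtraction step in the running example, the ‘d4’ and {‘b2’, ‘c3’} figures can be replayed with integer percentages. The helper names below are hypothetical, and the sketch covers only the subtraction and the conversion to tuple counts, not the full tree update.

```python
def remaining_support(unjoined_support, joined_supports):
    """Support left for OtherRelation after removing co-occurrences with CommonFI.

    Supports are whole percentages so that the arithmetic stays exact.
    """
    remainder = unjoined_support - sum(joined_supports)
    if remainder < 0:
        raise ValueError("Joinedgroup supports exceed the unJoinedgroup support")
    return remainder


def tuples_for(support_percent, relation_size):
    """Convert a support percentage into a tuple count."""
    return relation_size * support_percent // 100


# {'d4'}: 60% in the unJoinedgroup, 30% under each of the two joined branches.
d4_other = remaining_support(60, [30, 30])

# {'b2', 'c3'} keeps 10% support in a relation of 10 tuples.
b2c3_tuples = tuples_for(10, 10)

print(d4_other, b2c3_tuples)
```

The result, 0% for ‘d4’ and 1 tuple for {‘b2’, ‘c3’}, agrees with the values discussed for Figure 14.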


Algorithm 5: GetFITree in Join

GetFITree ( left-FItree T(left), right-FItree T(right) ) {
 1. Let a be the join attribute of both relations
 2. For each tree T in T(left) and T(right)
 3.     Rebuild T so that all values whose attribute is a exist only at level 1
 4.     For all nodes at level 1
 5.         drop every node whose attribute is a and whose value is not in commonFI
 6.     end for
 7.     Let T1 be the tree that consists of all items on a and their branches
 8.     Let T2 be the tree T - T1
 9.     for all nodes n in T2
10.         if the same itemset as n exists in T1, subtract its frequency from the support of n
11.     end for
12.     for all nodes n in T2
13.         calculate the support of n using the heuristic
14.     end for
15.     for all nodes n in T1
16.         calculate the support of the node at level 1 by multiplying by its frequency in the other relation
17.         calculate the supports of all of its child nodes
18.     end for
19.     for all nodes n in T2
20.         if the same itemset as n exists in T1, add its frequency to the frequency of n
21.     end for
22. end for
23. newTree = Merge(T(left), T(right))
}

3.5.4 Step Three: Update the Table for Attribute Counts for the Result after Join

When we join, we keep all values on every attribute other than the join attribute. So the numbers of attribute values for the other attributes after the join are the same as the original value counts. We merge the two input tables of attribute value counts and update only the value count of the join attribute, using the smaller of the counts for the join attribute in the two relations. Now we go back to our example. After merging the tables in Figure 9 and Figure 12, we update the value counts for attribute A. As a result, we have Table 6.


Table 6: Number of attribute values for result relation after join R1 and R2

Attribute   # of Attribute Values
R1.A        Min(V(R1,A), V(R2,A)) = Min(3, 3) = 3
R1.B        3
R1.C        4
R1.D        3
R2.A        Min(V(R1,A), V(R2,A)) = Min(3, 3) = 3
R2.E        2
R2.F        4

By including the algorithm for this process below, we conclude this chapter.

Algorithm 6: UpdateAttrCount for Join

UpdateAttrCounts {
 1. Let Table(left) be the table of attribute counts for the left relation;
 2. Let Table(right) be the table of attribute counts for the right relation;
 3. Table(result) = Merge(Table(left), Table(right));
 4. For the joined attributes in Table(result), update the number of attribute values with MIN(V(left.joinAttribute), V(right.joinAttribute));
}
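Algorithm 6 maps naturally onto a dictionary merge. The sketch below is one possible reading of it: the function name, the "relation.attribute" key format and the default relation names are assumptions made for illustration.

```python
def update_attr_counts(left, right, join_attr, left_name="R1", right_name="R2"):
    """Merge two attribute-count tables and cap the join attribute at the smaller count."""
    result = {f"{left_name}.{attr}": count for attr, count in left.items()}
    result.update({f"{right_name}.{attr}": count for attr, count in right.items()})

    # Only the join attribute changes: it takes MIN(V(left.a), V(right.a)).
    joined_count = min(left[join_attr], right[join_attr])
    result[f"{left_name}.{join_attr}"] = joined_count
    result[f"{right_name}.{join_attr}"] = joined_count
    return result


# Value counts for R1 and R2 as they appear in Table 6.
r1_counts = {"A": 3, "B": 3, "C": 4, "D": 3}
r2_counts = {"A": 3, "E": 2, "F": 4}

table6 = update_attr_counts(r1_counts, r2_counts, join_attr="A")
print(table6)
```

For the running example both relations have 3 distinct values on A, so the merged table keeps 3 for R1.A and R2.A and leaves all other counts untouched.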


CHAPTER 4
EXPERIMENTS AND RESULTS

4.1 Goal of Experiments

Our goal in these experiments is to test how well our new method estimates the cardinality of every resulting relation after each operation in a query plan, compared with histograms. We tested two kinds of histograms. First, we used one of the most popular histograms, the equi-depth histogram. The second kind has as many buckets as there are attribute values on an attribute. We call this a Complete histogram in this thesis. Complete histograms have the best performance among all possible one-dimensional histograms. By experimenting with Complete histograms, we wanted to compare our technique with the best possible one-dimensional histogram.

We performed our experiments with two realistic datasets. We generated dataset 1 using the TPC-R benchmark, which models a real-life database. For dataset 2, we manipulated some data values in dataset 1 to produce more correlated data.

While both histogram methods estimated costs very poorly for correlated data, we found, surprisingly, that none of the methods was bad at choosing the optimal plan. For example, though equi-depth histograms showed the worst error rate for every plan of a query, they still selected the best plan with high accuracy. For correlated and/or skewed data, our frequent itemset method significantly outperformed the other methods in terms of absolute error, though all methods still generally agreed on which plan was the best.


4.2 Methodology

In our experiments, we tested three different options for selectivity estimation: 1) equi-depth histograms, 2) FI trees and 3) Complete histograms.

In the histogram method, the number of buckets is a factor that can make a difference in estimation. We used the same number of buckets for all equi-depth histograms, and the total file size for the equi-depth histograms was set to be as close to 10MB as possible. The synopsis size for each attribute was chosen to be proportional to its number of distinct values. For example, assume that one of our relations, R, has two attributes, where attribute A has 100 distinct attribute values and attribute B has 200. If we use 10 as the number of buckets, there would be 10 attribute values in each bucket of the histogram for A and 20 for B.

For FI trees, we also used the same profile size, 10MB over all relations. We tried to find the threshold that makes the size of the frequent itemset file almost the same as the sum of the sizes of the histograms for the relation. For instance, if there are 5 attributes and the sum of those five histogram files is 1MB, then the threshold for frequent itemsets would be chosen so that the size of the FI tree file is almost 1MB. If the data is not skewed, it is not always easy to keep the sizes comparable, because the FI tree size increases suddenly when most of the frequent itemsets have similar frequencies. In our experiments, almost all of the FI files are smaller than the sum of the histogram file sizes. This was done to ensure fairness.

The third method we tested is histograms of unlimited size, which we refer to as Complete histograms. In a Complete histogram, every attribute value and its frequency are stored in a separate bucket, so the number of buckets equals the number of attribute values and we get the actual distribution of every attribute value. If we disregard the size of the profile and can store the entire data distribution in a histogram, this is the best possible histogram among all one-dimensional histograms. In reality, however, this kind of histogram is not usually practical, since the purpose of such profiles in a database is to model the data distribution from compressed information. Nevertheless, we used these histograms in our experiments because we wanted to check whether even the best possible histogram can give good accuracy when data attributes are correlated with each other, and whether our method is comparable with the best possible histogram.

We used the popular TPC-R benchmark for our testing [29]. TPC-R is a decision support benchmark. It evaluates the performance of various decision support systems by executing sets of business-oriented queries. However, the data generated by the TPC-R data generation program is uniformly distributed on all attributes, which is not realistic. So, to get skewed data, we downloaded the publicly available program from researchers at Microsoft [30] and generated more realistic data with it. This program takes a Zipfian parameter for the degree of skew and uses the Zipfian distribution to skew the data. The Zipfian parameter z can range from 0 to 4. The parameter z = 0 generates a uniform distribution for each attribute, whereas z = 4 generates highly skewed data. By using the Zipfian value 2, we introduced a moderate amount of skew into the data for our experiments. We also used 0.1 as the scale factor governing the size of the database produced by the data generation program.

There are 8 relations in total in this database. The largest relation is LINEITEM, with a cardinality of 600,000 tuples at a scale factor of 0.1. The relation PARTSUPP has 80,000 tuples, PART contains 20,000 tuples, ORDERS has 150,000 tuples, CUSTOMER has 15,000 tuples and SUPPLIER has 10,000 tuples. NATION has 25 tuples and REGION has 5 tuples. Figure 16 depicts a partial schema of the database we used.


Figure 16: Partial schema of TPC-R database

Relation    Cardinality
PART        20,000
PARTSUPP    80,000
LINEITEM    600,000
ORDERS      150,000
SUPPLIER    10,000
CUSTOMER    15,000
NATION      25
REGION      5

Key attributes shown in the figure: PARTKEY, SUPPKEY, ORDERKEY, CUSTKEY, NATIONKEY, REGIONKEY
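The effect of the Zipfian parameter z described in the methodology can be illustrated with a few lines of Python. This is not the Microsoft data generator used for the experiments; the function names, the rank-based weighting and the fixed seed are assumptions chosen only to show how z = 0 yields a uniform distribution while larger z concentrates the mass on a few values.

```python
import random


def zipf_weights(n_values, z):
    """Probability of the value at each rank 1..n_values under a Zipfian law."""
    raw = [1.0 / (rank ** z) for rank in range(1, n_values + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def generate_column(n_tuples, n_values, z, seed=0):
    """Draw n_tuples attribute values (coded as 1..n_values) with skew z."""
    rng = random.Random(seed)
    values = list(range(1, n_values + 1))
    return rng.choices(values, weights=zipf_weights(n_values, z), k=n_tuples)


uniform = zipf_weights(4, 0)   # z = 0: every value equally likely
skewed = zipf_weights(5, 2)    # z = 2: the setting used for the experiments
column = generate_column(1000, 10, 2)

print(uniform)
print([round(w, 3) for w in skewed])
```

With z = 2 the most frequent value already receives well over half of the probability mass, which is the kind of skew under which frequent itemsets capture most of a relation.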


We generated two different databases for our experiments. First, we generated a database with a medium amount of skew, using a skew factor of 2. In this set, there is only a very weak correlation between data values.

To see how our three methods work with correlated data, we used another dataset. The schema of the database and the query plans were identical, except that we modified some of the data in the database to introduce additional correlation. For example, we made all tuples that contain ‘445’ on SUPPKEY in the LINEITEM relation have the value ‘653’ for attribute PARTKEY. As a result, attribute value ‘445’ on SUPPKEY and attribute value ‘653’ on PARTKEY are correlated with each other (the dependency ‘445’ → ‘653’ holds in this relation). We updated around 10 data values whose attributes are used in our test queries to introduce some correlation.

We chose 5 queries from the TPC-R benchmark and tested 5 query plans for each query. As our concern was estimating the selectivity of equality selection and join operations, it was necessary to change some parts of the queries. For example, if there was a range query, a LIKE operation, or a subquery, we changed the query to be more appropriate for our implementation.

For every query plan we created for each query, we used all three methods to calculate costs. To compare the accuracy of the estimates from these three methods, we imported the data from the database generated by the TPC benchmark into an Oracle DBMS. By doing this, we could obtain the actual number of tuples for every operation in the query plans and compute the error rate for each plan. The formula we used for the error rate is

$$\text{error rate} = \frac{|\,\text{estimated cost} - \text{actual cost}\,|}{\text{actual cost}}$$

4.3 Results

The following tables present our results. The quantity compared in our experiments is the number of tuples; we consider the number of tuples as the cost in this thesis. All queries and query plans we used in our experiments and the corresponding test results are provided in the appendix.
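The error rate can be checked against the appendix. The sketch below takes the total costs reported for Query Plan1 of Query 1 on dataset 1 and computes the relative error of each method. The function name is illustrative, and the use of an absolute value is an assumption made because all reported error rates are non-negative.

```python
def error_rate(estimated_cost, actual_cost):
    """Relative error of an estimate, as a fraction of the actual cost."""
    return abs(estimated_cost - actual_cost) / actual_cost


# Total costs of Query 1, Query Plan1, dataset 1, as listed in the appendix.
actual = 17_897_813
estimates = {
    "equi-depth H": 1_438_905_516,
    "FI": 27_391_518,
    "Complete H": 18_031_547,
}

rates = {method: error_rate(cost, actual) for method, cost in estimates.items()}
for method, rate in rates.items():
    print(f"{method}: {rate:.3f}")
```

The three values round to 79.396, 0.530 and 0.007, the plan1 entries of the first results table.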


In all tables below, bold font marks the plan that each method chose as the best. The best plan according to the actual data is marked on the plan title. For example, for Query 5 in Dataset 1, the actual best plan is plan 5; the FI tree method and the Complete histogram method selected the same plan, but equi-depth histograms chose plan 4 as the best.

4.3.1 Dataset 1: Slightly Correlated Data

               plan1       plan2       plan3     plan4     plan5       Avg Error Rate
equi-depth H   79.396      20.157      78.819    79.130    46.836      60.86761043
FI             0.530       0.086       0.522     0.518     0.280       0.387463054
Complete H     0.007       0.001       0.007     0.000     0.000       0.003219214

               plan1       plan2       plan3     plan4     plan5       Avg Error Rate
equi-depth H   39.411      7.960       25.142    9.057     10.806      18.47518007
FI             0.072       0.112       0.063     0.300     0.447       0.198344041
Complete H     0.005       0.121       0.085     0.306     0.487       0.200156443

               plan1       plan2       plan3     plan4     plan5       Avg Error Rate
equi-depth H   597.180     609.966     596.337   68.960    490.425     472.573748
FI             0.518       0.519       0.520     0.535     0.484       0.514999855
Complete H     0.002       0.007       0.002     0.052     0.047       0.021898833

               plan1       plan2       plan3     plan4     plan5       Avg Error Rate
equi-depth H   357.749     640.466     640.348   530.899   1,372.049   708.3024413
FI             0.032       0.069       0.069     0.734     1.926       0.565942203
Complete H     0.036       0.083       0.083     0.681     1.884       0.553669026

               plan1       plan2       plan3     plan4     plan5       Avg Error Rate
equi-depth H   8,030.033   4,185.088   62.072    629.358   760.616     2733.433287
FI             27.684      15.175      0.235     2.431     2.705       9.645993616
Complete H     27.970      15.146      0.145     2.351     2.489       9.620216726

4.3.2 Dataset 2: More Correlated Data

               plan1       plan2       plan3     plan4     plan5       Avg Error Rate
equi-depth H   0.663       1.490       0.663     3.728     3.911       2.090932
FI             0.002       0.001       0.532     0.225     0.026       0.1572
Complete H     0.883       0.776       0.882     0.669     0.628       0.767486


               plan1     plan2     plan3     plan4     plan5       Avg Error Rate
equi-depth H   19.966    8.213     17.572    8.984     11.816      13.30994855
FI             0.402     0.102     0.339     0.248     0.407       0.299642622
Complete H     0.434     0.111     0.351     0.246     0.447       0.317814133

               plan1     plan2     plan3     plan4     plan5       Avg Error Rate
equi-depth H   47.990    46.004    47.718    2.261     24.352      33.6647
FI             0.441     0.445     0.043     0.281     0.475       0.337125
Complete H     0.541     0.571     0.541     0.752     0.748       0.630725

               plan1     plan2     plan3     plan4     plan5       Avg Error Rate
equi-depth H   146.101   169.110   169.055   507.066   1,261.875   450.641351
FI             0.528     0.686     0.686     0.619     1.647       0.833272693
Complete H     0.501     0.646     0.645     0.883     2.253       0.985678458

               plan1     plan2     plan3     plan4     plan5       Avg Error Rate
equi-depth H   2.917     3.096     57.677    24.944    461.466     110.020
FI             0.983     0.981     0.357     0.837     1.469       0.925325
Complete H     0.981     0.979     0.217     0.812     1.669       0.931379

4.4 Discussion

Overall, the FI tree method always outperformed the equi-depth histogram in terms of accuracy in calculating cardinality. The experiments with dataset 1 showed that when data has little correlation, the Complete histograms (which contain all one-dimensional data distribution information) have almost perfect accuracy, as we expected.

The second dataset, which has more correlation, showed that our new method can work very well for correlated data. In this environment, our FI method returns even more precise results than the Complete histograms, since the attribute value independence (AVI) assumption, which all one-dimensional histograms (including Complete histograms) have to adopt for multi-dimensional data, performed poorly. Thus, even though the Complete histograms keep all information on the data distribution of each single attribute, the results from this method were less accurate than those from the FI tree method. Therefore, we conclude that our method estimates better than any one-dimensional histogram for correlated data.

Among the results we obtained, those for query 5 in dataset 2 are particularly interesting. The relatively high error rate of the Complete histogram method suggests that there is correlation among the data in the relations used in the query. Nevertheless, our FI tree method also showed poor estimates for this query. Why did the FI tree (which is designed for use with correlated data) not work well? Although we created some correlation between partkey and size in the PART relation, if the data is not skewed, it is more likely that the correlation will not be kept in our profile, which lowers the accuracy. Despite the correlation we added to the data in the PART relation, only one itemset in our FI tree file has a value on partkey together with a value on size. This is because all of the itemsets that contain a partkey value and a size value have almost the same frequency, and if we stored all of these itemsets in a file, the FI tree file would balloon and the synopsis would become too large. Consequently, we lost much of the correlation information, and the estimation was comparatively poor when these two attributes were involved, as in query 5. However, the estimates from this method are still better than those from equi-depth histograms.

Even though our motivation started from the fact that the current histogram method performs poorly (especially when data is correlated and/or skewed), the results of our experiments show that the FI tree method works better than equi-depth histograms even when there is little correlation.

Like most profiles, this technique also has a trade-off between synopsis size and accuracy. If we use a lower threshold to mine frequent itemsets, the estimation can be more precise, but the synopsis (which resides in memory when used) becomes bigger.
As a result, deciding on an appropriate threshold is a challenge when using this method.

When we look over the tables above to see how well the methods choose the best plan, we encounter a bizarre situation: the accuracy in choosing the best plan is not related to the accuracy in calculating costs. For query 2 in dataset 2, the average error rate of the FI technique is very low and it returns the most accurate estimated costs for that query, but it is the only method that selected the best plan incorrectly. As we discussed above, in dataset 2 the FI tree method shows the most precise results, but it is the worst method at choosing the best plan. We might assume that we do not have enough data to judge the ability to select the best plan; moreover, these results remind us that the purpose of a query optimizer is to avoid executing a poor plan.

In our implementation, we put the values on all attributes into the same pool at the same time to find the frequent itemsets over a relation. However, for query optimization, we may give options to DBAs so that they can avoid keeping unnecessary values in a profile and keep more useful information. Based on query statistics, a DBA can decide which attributes are mined together when constructing FI tree files. For instance, if relation R has 10 attributes but only two or three of them are used often in queries, then instead of keeping all frequent itemsets over the 10 attributes, we can extract the frequent itemsets from the data on just these two or three attributes.
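The weakness of the AVI assumption discussed in this section can be made concrete with a toy relation. The data below is invented for illustration (only the pair ‘445’ and ‘653’ echoes the correlation introduced into LINEITEM), and both function names are hypothetical. The first estimate multiplies per-attribute frequencies, as a one-dimensional histogram must; the second reads the joint support of the pair, as a stored frequent itemset would allow.

```python
from collections import Counter

# Toy relation of (SUPPKEY, PARTKEY) pairs: '445' always occurs with '653',
# while the other 90 tuples carry unrelated, distinct values.
rows = [("445", "653")] * 10 + [(str(1000 + i), str(2000 + i)) for i in range(90)]


def avi_estimate(rows, suppkey, partkey):
    """Estimate the result size by treating the two attributes as independent."""
    n = len(rows)
    supp_counts = Counter(s for s, _ in rows)
    part_counts = Counter(p for _, p in rows)
    return supp_counts[suppkey] * part_counts[partkey] / n


def itemset_estimate(rows, suppkey, partkey):
    """Estimate the result size from the joint support of the value pair."""
    pair_counts = Counter(rows)
    return pair_counts[(suppkey, partkey)]


print(avi_estimate(rows, "445", "653"))      # independence assumption
print(itemset_estimate(rows, "445", "653"))  # joint support of the itemset
```

On this toy data the independence-based estimate is 1 tuple while 10 tuples actually qualify, which is the kind of underestimate that even a Complete histogram cannot avoid on correlated attributes.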


CHAPTER 5
CONCLUSION AND FUTURE WORKS

In this thesis, we proposed a new method to get better selectivity estimation for correlated data. We used the frequent itemset mining technique for cost-based query optimization. Our approach is effective for skewed data as well. Our experiments show the potential utility of our method, since the FI tree method always outperforms equi-depth histograms. In terms of choosing the best query plan, no method was clearly the best, but our method shows an excellent ability to calculate frequencies for result relations.

In our work, to avoid complexity, we did not use all the information we can draw from an FI tree when calculating the costs. If we use as much information as possible from the FI tree, we may get better selectivity estimates. We leave this as future work.


APPENDIX
EXPERIMENTAL RESULTS

Here we provide all the test results we obtained from the two datasets. For each dataset, we used five different queries and five query plans for each query, so there are 25 different query plans for each dataset. We used three different methods to estimate the cost of every operation in a query plan.

We provide the five queries and the corresponding query plans we used, and we give the estimates for every operation in a plan. The total cost is calculated by summing all the costs in a plan for each method. Among the total costs, the one presented in bold font is the best approximation to the actual cost.

There are four numbers in the cost of an operation. The first number, in angle brackets, is the actual execution result for the operation. We got those numbers by executing the operation in the ORACLE DBMS. The number that appears alongside ‘e)’ is the estimate from equi-depth histograms, and the estimate from frequent itemsets is shown on the third line, starting with ‘f)’. The last line, which starts with ‘c)’, is the estimate from Complete histograms.
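The cost annotations and the total cost rule described above can be checked mechanically. The sketch below parses annotations in the `<actual> e) ... f) ... c) ...` form and sums them per method. The regular expression and the function names are assumptions for illustration, and the three annotations are the operation costs of Query Plan1 of Query 1 on dataset 1.

```python
import re

# One annotation: <actual> e) equi-depth f) FI tree c) Complete histogram
ANNOTATION = re.compile(
    r"<\s*([\d,]+)\s*>\s*e\)\s*([\d,]+)\s*f\)\s*([\d,]+)\s*c\)\s*([\d,]+)"
)


def parse_cost(annotation):
    """Split one cost annotation into its four tuple counts."""
    match = ANNOTATION.search(annotation)
    if match is None:
        raise ValueError(f"not a cost annotation: {annotation!r}")
    actual, equi, fi, complete = (int(g.replace(",", "")) for g in match.groups())
    return {"actual": actual, "e": equi, "f": fi, "c": complete}


def total_cost(annotations):
    """Sum the costs of all operations in a plan, separately for each method."""
    totals = {"actual": 0, "e": 0, "f": 0, "c": 0}
    for annotation in annotations:
        for key, value in parse_cost(annotation).items():
            totals[key] += value
    return totals


plan1 = [
    "<3,013> e)3,000 f)3,013 c)3,013",
    "<29,624> e)250,191 f) 45,195 c) 29,704",
    "<17,865,176> e)1,438,652,325 f) 27,343,310 c) 17,998,830",
]
totals = total_cost(plan1)
print(totals)
```

The sums reproduce the total cost reported for that plan: 17,897,813 actual tuples against estimates of 1,438,905,516, 27,391,518 and 18,031,547.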


DataSet 1: Slightly Correlated Data

Query 1 (modified from Query #3 in TPC-R)

select l_orderkey, o_orderdate, o_shippriority
from customer, orders, lineitem
where c_mktsegment = 'AUTOMOBILE'
and c_custkey = o_custkey
and l_orderkey = o_orderkey;

Cost notation: <n> actual number of tuples, e) estimate from equi-depth histograms, f) estimate from FI trees, c) estimate from Complete histograms

Operators in the plan trees: selection customer.mktsegment = ‘AUTOMOBILE’, join on custkey, join on orderkey

Query Plan1
<3,013> e) 3,000 f) 3,013 c) 3,013
<29,624> e) 250,191 f) 45,195 c) 29,704
<17,865,176> e) 1,438,652,325 f) 27,343,310 c) 17,998,830
Total Cost: <17,897,813> e) 1,438,905,516 f) 27,391,518 c) 18,031,547

Query Plan2
<90,905,297> e) 862,533,324 f) 90,750,963 c) 90,892,314
<3,013> e) 3,000 f) 3,013 c) 3,013
<17,865,176> e) 1,438,824,355 f) 27,343,310 c) 17,998,830
Total Cost: <108,773,486> e) 2,301,360,679 f) 118,097,286 c) 108,894,157


Operators in the plan trees: selection customer.mktsegment = ‘AUTOMOBILE’, join on custkey, join on orderkey

Query Plan3
<148,044> e) 1,250,958 f) 225,000 c) 147,875
<29,624> e) 250,192 f) 45,195 c) 29,704
<17,865,176> e) 1,438,652,325 f) 27,198,641 c) 17,998,830
Total Cost: <18,042,844> e) 1,440,153,475 f) 27,468,836 c) 18,176,409

Query Plan4
<148,044> e) 1,250,958 f) 225,000 c) 147,875
<89,726,688> e) 7,193,261,627 f) 136,126,444 c) 89,604,370
<17,865,176> e) 1,438,652,325 f) 27,198,635 c) 17,998,830
Total Cost: <107,739,908> e) 8,633,164,910 f) 163,550,079 c) 107,751,075

Query Plan5
<17,865,176> e) 1,438,824,355 f) 27,343,310 c) 17,998,830
<89,726,688> e) 7,194,121,776 f) 136,126,444 c) 89,604,370
<90,905,297> e) 862,533,324 f) 90,750,963 c) 90,892,314
Total Cost: <198,497,161> e) 9,495,479,455 f) 254,220,717 c) 198,495,514


Query 2 (modified from Query #2 in TPC-R)

select s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment
from part, supplier, partsupp, nation, region
where p_partkey = ps_partkey
and s_suppkey = ps_suppkey
and p_size = '1'
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'ASIA';

Operators in the plan trees: selections Part.size = ‘1’ and region.name = ‘ASIA’, joins on partkey, suppkey, nationkey and regionkey

Query Plan1
<593> e) 400 f) 543 c) 543
<45,940> e) 322,365 f) 45,075 c) 45,699
<33,776> e) 2,677,943 f) 48,739 c) 45,860
<33,776> e) 2,477,151 f) 48,781 c) 45,860
<1> e) 1 f) 1 c) 1
<33,776> e) 495,430 f) 15,334 c) 9,172
Total Cost: <147,812> e) 5,973,290 f) 158,473 c) 147,135


Operators in the plan trees: selections Part.size = ‘1’ and region.name = ‘ASIA’, joins on partkey, suppkey, nationkey and regionkey

Query Plan2
<80,273> e) 664,618 f) 80,157 c) 80,282
<1> e) 1 f) 1 c) 1
<80,273> e) 614,785 f) 80,178 c) 80,282
<16,980> e) 122,957 f) 16,988 c) 16,056
<33,776> e) 495,463 f) 10,315 c) 9,172
<593> e) 400 f) 543 c) 543
Total Cost: <211,846> e) 1,898,224 f) 188,182 c) 186,336

Query Plan3
<593> e) 400 f) 543 c) 543
<1> e) 1 f) 1 c) 1
<5> e) 5 f) 5 c) 5
<80,273> e) 664,618 f) 80,157 c) 80,282
<33,776> e) 2,678,119 f) 49,122 c) 45,860
<33,776> e) 535,635 f) 9,252 c) 9,172
Total Cost: <148,374> e) 3,878,778 f) 139,080 c) 135,863


Operators in the plan trees: selections Part.size = ‘1’ and region.name = ‘ASIA’, joins on suppkey, nationkey, regionkey and partkey

Query Plan4
<1> e) 1 f) 1 c) 1
<593> e) 400 f) 543 c) 543
<1,000> e) 925 f) 1,000 c) 1,000
<212> e) 185 f) 212 c) 200
<33,776> e) 495,463 f) 10,266 c) 9,172
<45,940> e) 322,365 f) 45,075 c) 45,699
Total Cost: <81,472> e) 819,339 f) 57,097 c) 56,615

Query Plan5
<593> e) 400 f) 543 c) 543
<1,000> e) 925 f) 1,000 c) 1,000
<212> e) 185 f) 212 c) 200
<16,980> e) 122,957 f) 16,991 c) 16,056
<1> e) 1 f) 1 c) 1
<33,776> e) 495,463 f) 10,341 c) 9,172
Total Cost: <52,512> e) 619,931 f) 29,088 c) 26,972


Query 3 (modified from Query #5 in TPC-R)

select n_name
from customer, orders, lineitem, supplier, nation, region
where c_custkey = o_custkey
and l_orderkey = o_orderkey
and c_nationkey = s_nationkey
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'ASIA';

Operators in the plan trees: selection region.name = ‘ASIA’, joins on custkey, orderkey, nationkey (twice) and regionkey

Query Plan1
<148,044> e) 1,250,958 f) 225,000 c) 147,875
<89,726,688> e) 7,193,261,627 f) 136,126,444 c) 89,604,370
<3,568,287,410> e) 2,273,652,177,498 f) 5,408,583,824 c) 3,579,044,050
<1> e) 1 f) 1 c) 1
<3,568,287,410> e) 2,078,049,880,667 f) 5,408,582,742 c) 3,579,151,422
<755,267,706> e) 415,609,976,134 f) 1,159,484,396 c) 715,830,284
Total Cost: <7,981,717,259> e) 4,774,506,546,885 f) 12,113,002,407 c) 7,963,778,002

Query Plan2 region.name = ‘ASIA’ <1> e) 1 f) 1 c) 1 <755,267,706> e) 415,609,976,134 f) 1,159,484,628 c) 715,830,284 <4,413,429,854> e) 2,696,456,666,223 f) 6,704,419,902 c) 4,384,626,585 Total Cost custkey nationkey regionkey orderkey nationkey <148,044> e) 1,250,958 f) 225,000 c) 147,875 <89,726,688> e) 7,193,261,627 f) 136,126,444 c) 89,604,370 <3,568,287,410> e) 2,273,652,177,498 f) 5,408,583,824 c) 3,579,044,050 <5> e) 5 f) 5 c) 5

Query Plan3 region.name = ‘ASIA’ custkey orderkey nationkey regionkey nationkey <90,905,297> e) 862,533,324 f) 90,750,963 c) 90,892,314 <3,568,287,410> e) 2,273,924,054,132 f) 5,437,352,201 c) 3,579,044,050 <3,568,287,410> e) 2,078,298,367,755 f) 5,437,352,745 c) 3,579,151,422 <1> e) 1 f) 1 c) 1 <599,152> e) 4,741,213 f) 599,151 c) 599,141 <7,983,346,976> e) 4,768,749,369,976 f) 12,131,707,131 c) 7,965,517,212 Total Cost <755,267,706> e) 415,659,673,551 f) 1,165,652,070 c) 715,830,284

region.name = ‘ASIA’ custkey orderkey nationkey regionkey nationkey Query Plan5 region.name = ‘ASIA’ custkey regionkey nationkey nationkey orderkey Query Plan4 <755,267,706> e) 53,289,813,663 f) 1,159,191,380 c) 715,808,810 <1,229,734> e) 9,267,464 f) 1,923,950 c) 1,181,303 <148,044> e) 1,250,958 f) 225,000 c) 147,875 <1,000> e) 1,000 f) 1,000 c) 1,000 <5,887,806> e) 46,337,318 f) 8,987,264 c) 5,906,513 <1> e) 1 f) 1 c) 1 <755,267,706> e) 415,611,233,761 f) 1,165,652,070 c) 715,830,284 <90,905,297> e) 862,533,324 f) 90,750,963 c) 90,892,314 <128,446> e) 866,665 f) 128,445 c) 119,832 <599,152> e) 4,741,213 f) 599,151 c) 599,141 <599,152> e) 4,333,326 f) 599,151 c) 599,159 <762,534,291> e) 53,346,670,404 f) 1,170,328,595 c) 723,045,502 Total Cost <847,499,754> e) 416,482,841,625 f) 1,257,601,336 c) 807,920,899 Total Cost <1> e) 1 f) 1 c) 1

Query 4 Modified from Query #7 in TPC-R.

select n_name as suppnation, p_name as line_part, s_name as supplier_name from supplier, lineitem, orders, part, nation where s_suppkey = l_suppkey and o_orderkey = l_orderkey and l_partkey = p_partkey and s_nationkey = n_nationkey and n_name = 'IRAN';

Query Plan1 partkey suppkey nationkey orderkey name = ‘IRAN’ <601,672> e) 5,053,818 f) 605,105 c) 601,662 <181,964,334> e) 65,279,590,833 f) 187,869,257 c) 188,555,570 Total Cost <92,037,885> e) 7,265,143,374 f) 91,740,173 c) 91,144,086 <88,430,967> e) 55,939,584,270 f) 91,740,173 c) 93,086,366 <893,809> e) 2,069,809,370 f) 3,783,805 c) 3,723,455 <1> e) 1 f) 1 c) 1
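As a reading aid for the annotations above, the following Python sketch compares the three values labelled e), f) and c) on the Total Cost annotation of Query Plan1 with the bracketed value printed beside them. It assumes that the bracketed number is the reference figure and treats the three letters as opaque labels; the function name `ratio_to_reference` is introduced here for illustration only and is not part of the thesis.

```python
# Total Cost annotation of Query Plan1 for Query 4, copied from the figure text.
# Assumption: the bracketed number is the reference value and e), f), c) are
# three estimates of it. The letters are kept as opaque labels.
reference = 181_964_334
estimates = {"e": 65_279_590_833, "f": 187_869_257, "c": 188_555_570}


def ratio_to_reference(estimate: int, reference: int) -> float:
    """Return estimate / reference; 1.0 means the two values coincide."""
    return estimate / reference


ratios = {
    label: ratio_to_reference(value, reference)
    for label, value in estimates.items()
}

for label, ratio in ratios.items():
    print(f"{label}) is {ratio:.2f} times the bracketed value")
```

Under that assumption, the same ratio can be computed for any other node annotation in these figures by substituting its four numbers.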

Query Plan2 partkey suppkey nationkey orderkey name = ‘IRAN’ <601,672> e) 5,053,818 f) 605,105 c) 601,662 <574,567> e) 38,912,990 f) 605,105 c) 614,483 <88,430,967> e) 55,939,584,270 f) 91,740,172 c) 93,086,366 <893,809> e) 2,069,809,370 f) 3,783,805 c) 3,723,455 <90,501,016> e) 58,053,360,449 f) 96,734,188 c) 98,025,967 Total Cost Query Plan3 partkey suppkey nationkey orderkey name = ‘IRAN’ <612,048> e) 4,618,961 f) 600,000 c) 612,786 <574,567> e) 38,910,541 f) 605,105 c) 614,483 <88,430,967> e) 55,936,063,813 f) 91,740,173 c) 93,086,366 <893,809> e) 2,069,809,110 f) 3,783,805 c) 3,723,455 <90,511,392> e) 58,049,272,426 f) 96,729,084 c) 98,037,091 Total Cost <1> e) 1 f) 1 c) 1 <1> e) 1 f) 1 c) 1

Query Plan4 Query Plan5 partkey suppkey nationkey orderkey name = ‘IRAN’ partkey suppkey nationkey orderkey name = ‘IRAN’ <39> e) 37 f) 41 c) 40 <4,397,101> e) 2,338,812,520 f) 7,622,643 c) 7,393,326 Total Cost <1> e) 1 f) 1 c) 1 <23,346> e) 186,995 f) 25,159 c) 24,067 <3,479,906> e) 268,816,117 f) 3,798,721 c) 3,645,763 <893,809> e) 2,069,809,370 f) 3,798,721 c) 3,723,455 <1> e) 1 f) 1 c) 1 <5,971> e) 1,439,812 f) 25,159 c) 24,579 <39> e) 37 f) 41 c) 40 <612,048> e) 4,619,833 f) 600,000 c) 612,786 <893,809> e) 2,069,809,370 f) 3,798,721 c) 3,723,455 <1,511,868> e) 2,075,869,053 f) 4,423,922 c) 4,360,861 Total Cost

Query 5 Modified from Query #8 in TPC-R.

select count(*) from part, supplier, lineitem, orders, nation, region where p_partkey = l_partkey and s_suppkey = l_suppkey and l_orderkey = o_orderkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'ASIA' and o_orderstatus = 'O' and p_size = '1';

Query Plan1 suppkey orderkey nationkey region.name = ‘ASIA’ partkey regionkey orders.orderstatus = ‘O’ part.size = ‘1’ <543> e) 400 f) 543 c) 543 <16,694> e) 92,379 f) 16,290 c) 16,637 <7,093> e) 778,211 f) 16,429 c) 16,683 <0> e) 372,903,363 f) 1,239,261 c) 1,242,721 <0> e) 344,943,069 f) 1,214,949 c) 1,242,721 <0> e) 68,988,614 f) 252,338 c) 248,544 <73,758> e) 50,000 f) 73,758 c) 73,758 <98,089> e) 787,756,037 f) 2,813,569 c) 2,841,608 Total Cost <1> e) 1 f) 1 c) 1
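Because the plan figures survive only as flattened text, it can help to pull the node annotations into a table before comparing them. The sketch below is one possible way to do that in Python, assuming every annotation follows the printed pattern of a bracketed number followed by values labelled e), f) and c). The helper name `parse_annotations` is hypothetical, and the sample string is the first two annotations and the last one of Query Plan1 above.

```python
import re

# One node annotation as printed in the plan figures:
#   <bracketed value> e) n f) n c) n
# Optional spaces are tolerated because the extracted text is not uniform.
ANNOTATION = re.compile(
    r"<\s*([\d,]+)\s*>\s*e\)\s*([\d,]+)\s*f\)\s*([\d,]+)\s*c\)\s*([\d,]+)"
)


def parse_annotations(text):
    """Return one dict per annotation with integer fields bracket, e, f, c."""
    rows = []
    for match in ANNOTATION.finditer(text):
        # Strip thousands separators before converting to int.
        bracket, e, f, c = (int(g.replace(",", "")) for g in match.groups())
        rows.append({"bracket": bracket, "e": e, "f": f, "c": c})
    return rows


sample = (
    "<543> e) 400 f) 543 c) 543 "
    "<16,694> e) 92,379 f) 16,290 c) 16,637 "
    "<1> e) 1 f) 1 c) 1"
)
rows = parse_annotations(sample)
for row in rows:
    print(row)
```

The pattern only recognises well-formed annotations, so entries damaged by extraction (for example a number with a stray space inside it) are skipped rather than guessed.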

Query Plan2 Query Plan3 suppkey orderkey nationkey region.name = ‘ASIA’ partkey regionkey orders.orderstatus = ‘O’ part.size = ‘1’ suppkey orderkey nationkey region.name = ‘ASIA’ partkey regionkey orders.orderstatus = ‘O’ part.size = ‘1’ <543> e) 400 f) 543 c) 543 <16,694> e) 92,379 f) 16,290 c) 16,637 <1,000> e) 925 f) 1,000 c) 1,000 <7,093> e) 719,861 f) 16,429 c) 16,683 <0> e) 344,943,069 f) 1,239,261 c) 1,242,721 <0> e) 68,988,614 f) 255,488 c) 248,544 <1> e) 1 f) 1 c) 1 <73,758> e) 50,000 f) 73,758 c) 73,758 <1,000> e) 925 f) 1,000 c) 1,000 <543> e) 400 f) 543 c) 543 <1> e) 1 f) 1 c) 1 <212> e) 185 f) 212 c) 200 <127,658> e) 934,977 f) 128,205 c) 120,332 <8,009,730> e) 448,022,381 f) 9,672,965 c) 8,963,474 <0> e) 68,992,956 f) 262,621 c) 248,544 <73,758> e) 50,000 f) 73,758 c) 73,758 <99,089> e) 414,795,249 f) 1,602,770 c) 1,599,887 Total Cost <8,212,902> e) 518,001,825 f) 10,139,305 c) 9,407,852 Total Cost

Query Plan4 Query Plan5 suppkey orderkey nationkey region.name = ‘ASIA’ partkey regionkey orders.orderstatus = ‘O’ part.size = ‘1’ suppkey orderkey nationkey region.name = ‘ASIA’ regionkey orders.orderstatus = ‘O’ partkey part.size = ‘1’ <543> e) 400 f) 543 c) 543 <16,694> e) 92,379 f) 16,290 c) 16,637 <7,093> e) 778,211 f) 16,429 c) 16,683 <7,093> e) 719,861 f) 16,423 c) 16,683 <7,093> e) 143,972 f) 3,478 c) 3,337 <0> e) 68,988,614 f) 258,325 c) 248,544 <73,758> e) 50,000 f) 73,758 c) 73,758 <1> e) 1 f) 1 c) 1 <1> e) 1 f) 1 c) 1 <73,758> e) 50,000 f) 73,758 c) 73,758 <543> e) 400 f) 543 c) 543 <5> e) 5 f) 5 c) 5 <212> e) 200 f) 212 c) 200 <16,694> e) 92,379 f) 16,290 c) 16,637 <7,093> e) 155,626 f) 3,480 c) 3,337 <0> e) 74,572,773 f) 269,958 c) 248,544 <112,275> e) 70,773,438 f) 385,247 c) 376,186 Total Cost <98,306> e) 74,871,384 f) 364,247 c) 343,025 Total Cost

DataSet 2: More Correlated Data

Query 1 Modified from Query #3 in TPC-R.

select l_orderkey, o_orderdate, o_shippriority from customer, orders, lineitem where c_mktsegment = 'AUTOMOBILE' and c_custkey = o_custkey and l_orderkey = o_orderkey;

Query Plan1 customer.mktsegment = ‘AUTOMOBILE’ custkey orderkey Query Plan2 customer.mktsegment = ‘AUTOMOBILE’ custkey orderkey <3,013> e) 3,000 f) 3,013 c) 3,013 <509,197> e) 209,279 f) 509,349 c) 126,352 <655,864,130> e) 1,091,331,609 f) 657,059,559 c) 76,637,793 <656,376,340> e) 1,091,543,888 f) 657,571,921 c) 76,767,158 <90,910,769> e) 782,204,056 f) 90,749,343 c) 90,980,494 <3,013> e) 3,000 f) 3,013 c) 3,013 <655,864,130> e) 1,077,293,021 f) 657,048,020 c) 76,637,793 Total Cost <746,777,912> e) 1,859,500,077 f) 747,800,376 c) 167,621,300 Total Cost

Query Plan3 custkey orderkey customer.mktsegment = ‘AUTOMOBILE’ custkey orderkey customer.mktsegment = ‘AUTOMOBILE’ Query Plan5 customer.mktsegment = ‘AUTOMOBILE’ custkey orderkey Query Plan4 <627,617> e) 1,046,393 f) 689,239 c) 628,994 <509,197> e) 209,279 f) 509,349 c) 126,352 <655,864,130> e) 1,091,331,604 f) 306,528,686 c) 76,637,793 <657,000,944> e) 1,092,587,276 f) 307,727,274 c) 77,393,139 Total Cost <627,617> e) 1,046,393 f) 689,239 c) 628,994 <728,725,642> e) 5,456,658,047 f) 765,753,258 c) 381,510,318 <655,864,130> e) 1,091,331,609 f) 306,528,655 c) 76,637,793 <1,385,217,389> e) 6,549,036,049 f) 1,072,971,152 c) 458,777,105 Total Cost <655,864,130> e) 1,077,293,021 f) 657,048,052 c) 76,637,793 <728,725,642> e) 5,386,465,103 f) 765,741,720 c) 381,510,318 <90,910,769> e) 782,204,056 f) 90,749,343 c) 90,980,494 <1,475,500,541> e) 7,245,962,180 f) 1,513,539,115 c) 549,128,605 Total Cost

Query 2 Modified from Query #2 in TPC-R.

select s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment from part, supplier, partsupp, nation, region where p_partkey = ps_partkey and s_suppkey = ps_suppkey and p_size = '1' and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'ASIA';

Query Plan1 Part.size = ‘1’ region.name = ‘ASIA’ partkey suppkey nationkey regionkey <661> e) 444 f) 661 c) 661 <55,428> e) 357,807 f) 54,537 c) 55,627 <113,183> e) 2,972,364 f) 58,207 c) 55,823 <113,183> e) 2,749,496 f) 58,251 c) 55,823 <1> e) 1 f) 1 c) 1 <33,776> e) 549,899 f) 17,322 c) 11,165 <316,232> e) 6,630,011 f) 188,979 c) 179,100 Total Cost

Query Plan2 Part.size = ‘1’ region.name = ‘ASIA’ suppkey nationkey regionkey partkey <80,273> e) 664,618 f) 80,157 c) 80,282 <1> e) 1 f) 1 c) 1 <211,964> e) 1,952,790 f) 190,307 c) 188,447 Total Cost <80,273> e) 614,785 f) 80,178 c) 80,282 <16,980> e) 122,957 f) 16,988 c) 16,056 <33,776> e) 549,985 f) 12,322 c) 11,165 <661> e) 444 f) 661 c) 661

Query Plan3 region.name = ‘ASIA’ partkey suppkey nationkey regionkey Part.size = ‘1’ <661> e) 444 f) 661 c) 661 <1> e) 1 f) 1 c) 1 <5> e) 5 f) 5 c) 5 <80,273> e) 664,618 f) 80,157 c) 80,282 <113,183> e) 2,972,828 f) 58,589 c) 55,823 <33,776> e) 594,577 f) 11,263 c) 11,165 <227,899> e) 4,232,473 f) 150,676 c) 147,937 Total Cost

Query Plan4 Part.size = ‘1’ region.name = ‘ASIA’ suppkey nationkey regionkey partkey Part.size = ‘1’ region.name = ‘ASIA’ suppkey nationkey regionkey partkey Query Plan5 <661> e) 444 f) 661 c) 661 <1,000> e) 925 f) 1,000 c) 1,000 <212> e) 185 f) 212 c) 200 <16,980> e) 122,957 f) 16,991 c) 16,056 <1> e) 1 f) 1 c) 1 <1> e) 1 f) 1 c) 1 <33,776> e) 549,985 f) 12,348 c) 11,165 <661> e) 444 f) 661 c) 661 <1,000> e) 925 f) 1,000 c) 1,000 <212> e) 185 f) 212 c) 200 <33,776> e) 549,935 f) 12,273 c) 11,165 <55,428> e) 357,807 f) 54,357 c) 55,627 <91,078> e) 909,297 f) 68,504 c) 68,654 Total Cost <52,630> e) 674,497 f) 31,213 c) 29,083 Total Cost

Query 3 Modified from Query #5 in TPC-R.

select n_name from customer, orders, lineitem, supplier, nation, region where c_custkey = o_custkey and l_orderkey = o_orderkey and c_nationkey = s_nationkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'ASIA';

Query Plan1 region.name = ‘ASIA’ custkey nationkey regionkey orderkey nationkey <627,617> e) 1,046,393 f) 689,239 c) 628,994 <73,930,353,260> e) 3,621,841,053,516 f) 41,305,045,793 c) 33,910,566,700 Total Cost <728,725,642> e) 5,456,658,047 f) 765,753,258 c) 381,510,318 <30,413,000,000> e) 1,724,745,059,090 f) 17,150,034,704 c) 15,240,360,525 <1> e) 1 f) 1 c) 1 <30,413,000,000> e) 1,576,365,241,654 f) 17,150,038,134 c) 15,240,055,718 <12,375,000,000> e) 315,273,048,331 f) 6,238,530,457 c) 3,048,011,144

Query Plan2 region.name = ‘ASIA’ <1> e) 1 f) 1 c) 1 <12,375,000,000> e) 315,273,048,331 f) 6,238,530,924 c) 3,048,011,144 <43,517,353,265> e) 2,045,475,811,867 f) 24,155,008,131 c) 18,670,510,987 Total Cost custkey nationkey regionkey orderkey nationkey <627,617> e) 1,046,393 f) 689,239 c) 628,994 <728,725,642> e) 5,456,658,047 f) 765,753,258 c) 381,510,318 <30,413,000,000> e) 1,724,745,059,090 f) 17,150,034,704 c) 15,240,360,525 <5> e) 5 f) 5 c) 5

Query Plan3 region.name = ‘ASIA’ custkey orderkey nationkey regionkey nationkey <90,910,769> e) 782,204,056 f) 90,749,343 c) 90,980,494 <30,413,000,000> e) 1,702,558,414,418 f) 31,851,690,316 c) 15,240,360,525 <30,413,000,000> e) 1,556,087,314,026 f) 31,898,496,375 c) 15,240,055,718 <1> e) 1 f) 1 c) 1 <599,152> e) 4,740,897 f) 599,150 c) 599,172 <73,292,509,922> e) 3,570,650,136,203 f) 76,455,817,905 c) 33,620,007,054 Total Cost <12,375,000,000> e) 311,217,462,805 f) 12,614,282,720 c) 3,048,011,144

region.name = ‘ASIA’ custkey orderkey nationkey regionkey nationkey Query Plan5 region.name = ‘ASIA’ custkey regionkey nationkey nationkey orderkey Query Plan4 <12,375,000,000> e) 40,424,539,745 f) 8,886,229,242 c) 3,048,072,105 <9,960,508> e) 7,751,992 f) 10,366,363 c) 5,025,338 <627,617> e) 1,046,393 f) 689,239 c) 628,994 <1,000> e) 1,000 f) 1,000 c) 1,000 <26,033,882> e) 38,759,959 f) 28,497,651 c) 25,126,692 <1> e) 1 f) 1 c) 1 <12,375,000,000> e) 315,272,986,664 f) 6,448,456,007 c) 3,048,011,144 <128,446> e) 866,607 f) 128,445 c) 119,832 <599,152> e) 4,740,897 f) 599,150 c) 599,172 <599,152> e) 4,333,037 f) 599,151 c) 599,160 <12,411,623,008> e) 40,472,099,090 f) 8,925,783,496 c) 3,078,854,130 Total Cost <12,467,237,520> e) 316,065,131,262 f) 6,540,532,097 c) 3,140,309,803 Total Cost <1> e) 1 f) 1 c) 1 <90,910,769> e) 782,204,056 f) 90,749,343 c) 90,980,494

Query 4 Modified from Query #7 in TPC-R.

select n_name as suppnation, p_name as line_part, s_name as supplier_name from supplier, lineitem, orders, part, nation where s_suppkey = l_suppkey and o_orderkey = l_orderkey and l_partkey = p_partkey and s_nationkey = n_nationkey and n_name = 'IRAN';

Query Plan1 partkey suppkey nationkey orderkey name = ‘IRAN’ <632,147> e) 5,065,628 f) 635,889 c) 632,465 <423,975,095> e) 62,367,000,567 f) 200,080,612 c) 211,463,085 Total Cost <97,842,390> e) 6,603,935,378 f) 97,678,643 c) 95,903,448 <324,606,748> e) 53,768,521,259 f) 98,223,785 c) 110,506,895 <893,809> e) 1,989,478,301 f) 3,542,294 c) 4,420,276 <1> e) 1 f) 1 c) 1

Query Plan2 partkey suppkey nationkey orderkey name = ‘IRAN’ <632,147> e) 5,065,628 f) 635,889 c) 632,465 <1,914,433> e) 41,243,789 f) 639,438 c) 728,772 <324,607,748> e) 53,768,521,259 f) 98,223,785 c) 110,506,895 <893,809> e) 1,989,478,301 f) 3,542,294 c) 4,420,276 <328,048,138> e) 55,804,308,978 f) 103,041,407 c) 116,288,409 Total Cost Query Plan3 partkey suppkey nationkey orderkey name = ‘IRAN’ <680,613> e) 4,883,985 f) 603,349 c) 691,362 <1,914,433> e) 41,236,655 f) 639,438 c) 728,772 <324,607,748> e) 53,759,221,173 f) 98,223,785 c) 110,506,895 <893,809> e) 1,989,134,191 f) 3,542,294 c) 4,420,276 <328,096,604> e) 55,794,476,005 f) 103,008,867 c) 116,347,306 Total Cost <1> e) 1 f) 1 c) 1 <1> e) 1 f) 1 c) 1

Query Plan4 Query Plan5 partkey suppkey nationkey orderkey name = ‘IRAN’ partkey suppkey nationkey orderkey name = ‘IRAN’ <39> e) 37 f) 39 c) 40 <4,397,101> e) 2,234,016,663 f) 7,118,228 c) 8,281,754 Total Cost <1> e) 1 f) 1 c) 1 <23,346> e) 187,432 f) 23,419 c) 25,299 <3,479,906> e) 244,350,892 f) 3,537,513 c) 3,836,138 <893,809> e) 1,989,478,301 f) 3,557,256 c) 4,420,276 <1> e) 1 f) 1 c) 1 <5,971> e) 1,526,053 f) 23,550 c) 29,151 <39> e) 37 f) 39 c) 40 <680,613> e) 4,885,216 f) 603,349 c) 691,362 <893,809> e) 1,989,478,301 f) 3,557,256 c) 4,420,276 <1,580,433> e) 1,995,889,608 f) 4,184,195 c) 5,140,830 Total Cost

Query 5 Modified from Query #8 in TPC-R.

select count(*) from part, supplier, lineitem, orders, nation, region where p_partkey = l_partkey and s_suppkey = l_suppkey and l_orderkey = o_orderkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'ASIA' and o_orderstatus = 'O' and p_size = '1';

Query Plan1 part.size = ‘1’ <661> e) 444 f) 661 c) 661 <91,199> e) 108,522 f) 19,941 c) 22,850 suppkey orderkey nationkey region.name = ‘ASIA’ partkey regionkey orders.orderstatus = ‘O’ <1,373,617> e) 916,279 f) 21,133 c) 24,086 <106,604,004> e) 398,172,650 f) 1,668,966 c) 1,795,886 <106,604,004> e) 368,317,664 f) 1,562,908 c) 1,795,886 <0> e) 73,663,533 f) 293,750 c) 395,177 <73,758> e) 49,999 f) 73,758 c) 73,758 <214,747,244> e) 841,229,092 f) 3,641,118 c) 4,108,305 Total Cost <1> e) 1 f) 1 c) 1

Query Plan2 Query Plan3 suppkey orderkey nationkey region.name = ‘ASIA’ partkey regionkey orders.orderstatus = ‘O’ part.size = ‘1’ suppkey orderkey nationkey region.name = ‘ASIA’ partkey regionkey orders.orderstatus = ‘O’ part.size = ‘1’ <661> e) 444 f) 661 c) 661 <91,199> e) 108,522 f) 19,941 c) 22,850 <1,000> e) 925 f) 1,000 c) 1,000 <1,373,617> e) 847,576 f) 21,133 c) 24,086 <106,604,004> e) 368,317,665 f) 1,668,966 c) 1,795,886 <0> e) 73,663,533 f) 312,744 c) 359,177 <1> e) 1 f) 1 c) 1 <73,758> e) 49,999 f) 73,758 c) 73,758 <1,000> e) 925 f) 1,000 c) 1,000 <661> e) 444 f) 661 c) 661 <1> e) 1 f) 1 c) 1 <212> e) 185 f) 212 c) 200 <127,658> e) 937,161 f) 134,343 c) 126,493 <8,009,730> e) 407,247,414 f) 10,584,431 c) 9,431,529 <0> e) 73,676,276 f) 351,768 c) 359,177 <73,758> e) 49,999 f) 73,758 c) 73,758 <108,144,240> e) 442,988,665 f) 2,098,204 c) 2,277,419 Total Cost <8,213,020> e) 481,912,405 f) 11,146,174 c) 9,992,819 Total Cost

Query Plan4 Query Plan5 suppkey orderkey nationkey region.name = ‘ASIA’ partkey regionkey orders.orderstatus = ‘O’ part.size = ‘1’ suppkey orderkey nationkey region.name = ‘ASIA’ regionkey orders.orderstatus = ‘O’ partkey part.size = ‘1’ <661> e) 444 f) 661 c) 661 <91,199> e) 108,522 f) 19,941 c) 22,850 <1,373,617> e) 916,279 f) 21,133 c) 24,086 <1,373,617> e) 847,576 f) 21,126 c) 24,086 <7,093> e) 169,515 f) 4,239 c) 4,817 <0> e) 73,663,533 f) 334,438 c) 399,177 <73,758> e) 49,999 f) 73,758 c) 73,758 <1> e) 1 f) 1 c) 1 <1> e) 1 f) 1 c) 1 <73,758> e) 49,999 f) 73,758 c) 73,758 <661> e) 444 f) 661 c) 661 <5> e) 5 f) 5 c) 5 <212> e) 200 f) 212 c) 200 <91,199> e) 108,522 f) 19,941 c) 22,850 <7,093> e) 183,248 f) 4,242 c) 4,817 <0> e) 79,631,293 f) 328,077 c) 359,177 <2,919,946> e) 75,755,869 f) 475,297 c) 549,436 Total Cost <172,929> e) 79,973,712 f) 426,897 c) 461,469 Total Cost


BIOGRAPHICAL SKETCH

BoYun Eom was born in Cheongju, Republic of Korea. She received her Bachelor's degree in Computer Engineering from Chungbuk National University in February 1995. She has worked as a software developer for Lucky Goldstar Mart, Co., Ltd., and the Korea Consumer Protection Board in Seoul, Republic of Korea. She started her studies in the CISE graduate program at the University of Florida in August 2002 and joined the Data Base Research and Development Center in August 2003. She will receive her Master of Science degree from the Department of Computer and Information Science and Engineering at the University of Florida in August 2005. Her research area is query optimization in databases, and Dr. Jermaine is her adviser.