Disclosure Control of Confidential Data by Applying PAC Learning Theory

573d670f20cf209dc4561a24e50ebe094ac18e7a
18467 F20101124_AAATLJ he_l_Page_114.QC.jpg
6d45cd38ae0b187c36f7641c30559ea4
3f082aab56bcf6206336b0a15185ccecac6f7d7c
5576 F20101124_AAATKV he_l_Page_106thm.jpg
909ff90d40629486a2fa935dffd4e4a2
b96729652e75548aa45b103b6c7c4a7a15a8a704
42390 F20101124_AAASHT he_l_Page_077.jpg
d180bec5148c2c59ec509862af8a54a9
83112c1614750b858135250529047e39af2afb35
32651 F20101124_AAASIH he_l_Page_093.jpg
ae38e4c881d486356367227d37242650
cb36963b9685ce418911a943a90039bf54b024cc
4915 F20101124_AAATLK he_l_Page_114thm.jpg
b0685701d69aa455b4324c740cc87e23
126ba9a4cf485dbbdb029edca6941708280b178b
18278 F20101124_AAATKW he_l_Page_107.QC.jpg
02f8cf7d3b7760215c6f59e2f7947924
29d08e126b28aa90e8c09e5e5c4e00e4d74bb346
47691 F20101124_AAASHU he_l_Page_078.jpg
f2e160bbc2e99a1613d8279ce6d88cae
4c62d4f339b1e2ae718301f624fc6e417b460a85
45688 F20101124_AAASII he_l_Page_094.jpg
e6f8e1955df99b83c7323655b06f05b9
0de052d7f0a29735bb8c2d79989dc363b826036c
14989 F20101124_AAATLL he_l_Page_115.QC.jpg
666a3f36839caa8a077a7ff1c8a551c8
6d3377cfc0abd980837afbaa8210a51dfa3ab8fb
5392 F20101124_AAATKX he_l_Page_107thm.jpg
e4f1b8bcae72064d16626dbbff1a4e59
c24eed8a5b5d8227efd94bb1ea4f4c2ab2f57a17
42401 F20101124_AAASHV he_l_Page_079.jpg
0fa17ec358a1c760642256dbc0677be2
751f747083b9adc251017f0f948493cd8452d1fb
40520 F20101124_AAASIJ he_l_Page_095.jpg
8beda7020151e858343154c8d0b04c1f
cc5b13691e18468d4f70c2e42f2020199b10f23c
17363 F20101124_AAATMA he_l_Page_123.QC.jpg
340291693de1417cf85af0dad4c9544a
a2075c053ebb75958add2d140e3c23bc8a557416
4306 F20101124_AAATLM he_l_Page_115thm.jpg
dd19faf8a527a756feb77eeed9c6e992
f2d4286f7574d8f6d662135510fa63035386ed0d
19931 F20101124_AAATKY he_l_Page_108.QC.jpg
f2a619933c279b6cab930b2adcdfbe85
c73f2f374a8ac4fb782278947e72a443cbb02467
43574 F20101124_AAASHW he_l_Page_080.jpg
c78f235fd26602fc936dc59243d7ebab
d93b16d636c27edc624e2cff5cb08ce5f46ffb0b
36608 F20101124_AAASIK he_l_Page_097.jpg
79cdddaf081af2a10136bd9a646342ab
d8f51192e970b9b886934e2e88a282ac0112909e
5594 F20101124_AAATMB he_l_Page_124.QC.jpg
f161f7d60b4a6a8a0e755ff205bade6a
bc69bfc96ee30ff1002cdb727901dea0f873856c
20699 F20101124_AAATLN he_l_Page_116.QC.jpg
fa38cfb4e29230f75a760056f9badfd8
61350790bd779bb0410bc67423c2148e6dcb6617
17456 F20101124_AAATKZ he_l_Page_109.QC.jpg
589294586405f28e531118ad29571d02
d0a47a835bc3ae6d4fb4c172d44bb7b4e40ff721
26519 F20101124_AAASHX he_l_Page_082.jpg
4f2dc0598c6ce13c87bb6688ad8c50c6
35688975fb7947fe2289f3f84efa4157557e206c
58724 F20101124_AAASIL he_l_Page_098.jpg
0ba016fdd049f89760fb2b9527bd5cb2
3825be5f24912a6baff4ba4c925faf3a250aaa28
1924 F20101124_AAATMC he_l_Page_124thm.jpg
de5b06396dadf010066ca443dcb49d48
773618b3f232b4230c3db680193eea5f4539f1d7
5934 F20101124_AAATLO he_l_Page_116thm.jpg
2f08d7c21b76311ff701e2f3afa9e73a
4994ec85326334f900cfa04a66cb0509c417a3f0
47157 F20101124_AAASHY he_l_Page_083.jpg
6e3dfde5892c4f5df3e166b4923ffdfd
f1d33d8d1dce194df020fd7cf20d8f82c545b81b
70836 F20101124_AAASJA he_l_Page_117.jpg
1bb9ce2f80bb6c01ed4c91d940f1bdb8
3608ee385c2257f9382178037161049d4a9124b9
54610 F20101124_AAASIM he_l_Page_099.jpg
2420c1f02a36a786ea72177b2bb67cef
4400d1099231704e5179813bc09c3cae0cba60a9
17628 F20101124_AAATMD he_l_Page_125.QC.jpg
cb2339ddcf8a7cfe21f705a3e1776c22
f22e46269df87c52d4f5474da0056e655166c0a8
23463 F20101124_AAATLP he_l_Page_117.QC.jpg
f6bad7c678cb8ce3e73957ca6345eaf9
4c4a47f5863d71a48e711153236580b2c2f79d20
60803 F20101124_AAASHZ he_l_Page_084.jpg
2f852f8a85dee37ffd46a86f34495c5f
db7f8ca25097c8a37e9d6f73535e969bd2f041d3
70297 F20101124_AAASJB he_l_Page_118.jpg
32ee6d830a624e1bb1b00cc232be1928
c6bc28e8cbfd4ae57d80d809957c55cbfd6f6e47
50831 F20101124_AAASIN he_l_Page_100.jpg
19cf689f20d51dd6dc529032617a9722
54ba42bea4d00655966344f6f6d5b0666692246d
4597 F20101124_AAATME he_l_Page_125thm.jpg
6791457079c5f70219d0da89951d6aa5
5ad6518734556f1472a00895aabf8d78fe348063
6659 F20101124_AAATLQ he_l_Page_117thm.jpg
1be9b02f8830954382b081906f056446
a06a21e2e3572a030e65bb5ae058812045b909f7
53980 F20101124_AAASJC he_l_Page_119.jpg
998177c248a6e32aaf9e1481c0dd3725
51863f8cb694cfa418cc08b6cce45228c8c4e125
62340 F20101124_AAASIO he_l_Page_101.jpg
5cf4a9064f44a52d4a293595950acfa0
fc48538c52821dc119b88ef1a84a7b7942e61b9d
17699 F20101124_AAATMF he_l_Page_126.QC.jpg
f0bc599b4cd7750e6145a320e0274cc1
a372df15809e7848e2e097faa8786814d07379c4
23027 F20101124_AAATLR he_l_Page_118.QC.jpg
8509c29b97b10db93b019716a06d41f4
4314ff6cf577572e4121654b4fd612fb373f5d38
49411 F20101124_AAASJD he_l_Page_120.jpg
095c4967b783ff5164e3a978440da702
57d980600993d398965b452286a7b4529c009138
64986 F20101124_AAASIP he_l_Page_103.jpg
2457f821e3a6d9c7810aca728af3a309
1a71a59a679a72e391fe247f1ce3f5ce62a1a7a8
4809 F20101124_AAATMG he_l_Page_126thm.jpg
ce2dd7c3e1f8f64d1933bdb83493e7cd
683a4257cc88648898207423d85f0190e8b8ebea
6590 F20101124_AAATLS he_l_Page_118thm.jpg
f335ac00e057ab598e5b8b92fdcc1139
c12126a62428d8c2b37e7d2991e08bbd28d382f1
51147 F20101124_AAASJE he_l_Page_121.jpg
3150c6313c41f3ca9bebaec2a402c844
ccdfa2c8a114eb0ac1db9a16b7dfe8cb94145dd4
52676 F20101124_AAASIQ he_l_Page_104.jpg
31cd6deb9fbdfb410b4eb7565b4416c6
bbee2a1c76dd39b66e94204d3dc87bc80dbbe1c1
5860 F20101124_AAATMH he_l_Page_127.QC.jpg
11abdcc0e3d6384dc557d3113325767f
d3aee7257a7f3b623989b5c8a896aa2712b7a0c3
17608 F20101124_AAATLT he_l_Page_119.QC.jpg
91f54dea99294b09d14e509ae1d8f3e7
8f2339d71a0503c9609fc74c6f130b01e456d28c
54684 F20101124_AAASJF he_l_Page_122.jpg
4306ea88dd2ce4026629603f71e09084
a2c15a1abd7d0e3c0ec9b46bdab16d1be3ba5207
57076 F20101124_AAASIR he_l_Page_105.jpg
a9c6b9d49e89ffd589f99e5d9f2dc882
7a9df427ddecf0c091886d42585ed5a15b3847c9
2018 F20101124_AAATMI he_l_Page_127thm.jpg
90d686a7b2eea88cd62495cbec7d0b27
424809e6b4a8d1739e57b320f9379422465c5659
5016 F20101124_AAATLU he_l_Page_119thm.jpg
6ef8ca7fad5b925ff83900a9d1f73f31
d3569fa124760b6884b42b5fb96f40221131364e
56090 F20101124_AAASJG he_l_Page_123.jpg
9ee9f9efdaba61bcb7cc6f0f0a3677d8
d19c481278085b22d9bc5593b600a4c9801908d5
63687 F20101124_AAASIS he_l_Page_106.jpg
9dacef14ad9895e3450295f8754a3b27
fa90df889f6bcde319d906271cedb5a8e6693242
17905 F20101124_AAATMJ he_l_Page_128.QC.jpg
c7ad67dbb3634461cb643d4274ece5ba
60f865f3334c6ebf786c8b3d6eefb4826d0e4d05
15358 F20101124_AAATLV he_l_Page_120.QC.jpg
8c58e32ccd5fc536381fc48057025c0f
d602b6526e1edb52c358fd80570908cf321f80cc
56482 F20101124_AAASJH he_l_Page_125.jpg
94c0c00bcaaf4665335a2171d6d14d87
d4f29bb63271a1d879616a01ff5e488e132eb6ec
56224 F20101124_AAASIT he_l_Page_109.jpg
017223a8d51dd7eda8712689fdb5c8aa
c4560ea4b6208132dc863f9aed3b8cef7202a832
4681 F20101124_AAATMK he_l_Page_128thm.jpg
edd4493024eb1fe3b00060f0eaf1f5c7
5ddc610812c5911d63a0fc32e0d701354e239663
3986 F20101124_AAATLW he_l_Page_120thm.jpg
cea0bee0ae2c4420759b4bffbf31f0f6
7cd0c12dc2b48656fd63108a92d8ef200569f111
57231 F20101124_AAASJI he_l_Page_126.jpg
8557b8cb072809341979b3e4c09d1e39
00efe03e0facb3f55aec59279f1c7e926ddff9a1
71485 F20101124_AAASIU he_l_Page_110.jpg
fb5709e6118c7f3466a0fc1f1afd1446
3b5b54dd9b5497f44a4d48e8f5d10fba0059af1f
17587 F20101124_AAATML he_l_Page_129.QC.jpg
164b050b4fea60b676f4803aea0431cd
079f0dc1a7bcaf3b5e4a44d89cd9d404405c3aef
15759 F20101124_AAATLX he_l_Page_121.QC.jpg
c835430e65fd0852265b2a929c1cfe13
0d559538d294bc8d61cd45ba9cf5e63bb1b37579
17990 F20101124_AAASJJ he_l_Page_127.jpg
7850fcfee712ca02846e3f9dbd724a35
7f261611541dff1e18ce884c497ea35789a8a3cf
61550 F20101124_AAASIV he_l_Page_111.jpg
d3d193bbb59e70f79836895d8ab0dc14
919f65a67f3f389b082a505fb4563773ed1c0444
6454 F20101124_AAATNA he_l_Page_139thm.jpg
bd0ad7891025208a4bd283093782cda4
f56e6ffc4c8cee5f0336a6babb7132a073f3a8c0
5589 F20101124_AAATMM he_l_Page_130.QC.jpg
b4fe53e5a0e9fb3fc45a6d1b0e514caf
deb7d62903bcb10b61a9342c606b0cfbfc457a53
4299 F20101124_AAATLY he_l_Page_121thm.jpg
1d27234904686439f7c6a44c4864ef71
961bd82ef52e43df6ca15a3a25958ec88060065b
56926 F20101124_AAASJK he_l_Page_129.jpg
60541bda1c9a2ba7599c412786f9eb5e
925119468462a450e1cc03870cca90a338eb8677
53235 F20101124_AAASIW he_l_Page_113.jpg
fb68a91f5637586fd9f413c4e0b85d8c
2318ae42ee98c4d8b175ebe007a045d16870ba0e
25829 F20101124_AAATNB he_l_Page_140.QC.jpg
acebb814f23c2b3ec4ab15a3a447de49
67ff5f24db640421fbf86e75b210d0a109d64e24
1928 F20101124_AAATMN he_l_Page_130thm.jpg
08f9110593445c670a5672bf2e647faf
ea1b5231e99af560bcfea4e53dbbd46bd7401a52
17079 F20101124_AAATLZ he_l_Page_122.QC.jpg
39502f75d4418d0b22ce054f8e3a8610
438aecdb32db35eec51884710b70279cb449ec33
17049 F20101124_AAASJL he_l_Page_130.jpg
a6fb7f4e25469459e7adafca2fcd2f56
63003da213448d69733277a5e434130e7cbc4062
58666 F20101124_AAASIX he_l_Page_114.jpg
977974b313077f492d0807dbd175bf2e
fd9737d7637e6f16ceb9bbe6d71b2558413e7f6c
6781 F20101124_AAATNC he_l_Page_140thm.jpg
208c39d0941ec79fc361a24b2ca61b1e
254f73be506e895f59211f6084b169590cf9bf89
17949 F20101124_AAATMO he_l_Page_131.QC.jpg
b77afceaa44a9df5617b068e5a074add
e8a8ced78d1994292165ba13c53b711c1b0c8485
5145 F20101124_AAASKA he_l_Page_002.jp2
d43f954e502578a3b3f292c5e2491797
9a6cfd388762f6bb54692930442f816215810d10
57562 F20101124_AAASJM he_l_Page_131.jpg
c8e5e5d807671c3c412e7597fbae6b6d
079800b951d3efb43c8bb0bf20ab3fb5f09cbfd6
45107 F20101124_AAASIY he_l_Page_115.jpg
e747c223ba7b3682ff7049f1f761f596
17f4a38ae106b4c2647f32c030154b52da213f5d
25266 F20101124_AAATND he_l_Page_141.QC.jpg
9ece3d3be7e58e2e44c903cfaf247081
3dc5ebf0020d9eedc4131a9fb9aa6d94ada54185
4713 F20101124_AAATMP he_l_Page_131thm.jpg
3a3a8b854003ee128dd77df20340556b
1eab6d21bc7bfe619ec2754c018067dde3b3f4e4
11044 F20101124_AAASKB he_l_Page_003.jp2
1c63ddb0fd8e17c27c3b49db2adbaab1
657900b31a1f451d044af261c53e0887991971c4
56380 F20101124_AAASJN he_l_Page_132.jpg
1bf352c698814aeafa0802635eeddac0
19d2063fa00fc98cf69816af2df87f33e26e06ee
64100 F20101124_AAASIZ he_l_Page_116.jpg
c2f743264a3fb1e3bb4bc714850f646c
34941c1bbe13240661d6f787e45475c37412ba8f
6768 F20101124_AAATNE he_l_Page_141thm.jpg
8f8f835ed45995db148eca079ebd6d90
f0598971f7203dab33686c385cb1e4c0bfba7387
F20101124_AAATMQ he_l_Page_132.QC.jpg
0d94dd1145267e9b16b038809a2b4bee
0e979ccdaa6b91e8a996910164c94858f0e8eec6
83518 F20101124_AAASKC he_l_Page_004.jp2
1196269c86a084071a355a5223fe2d47
37779ecfa93f02c1239b2e6a62946e61c715b800
17031 F20101124_AAASJO he_l_Page_133.jpg
bde3cdb21cb2cbf395acf6c7d804ad4d
3dd392e1f486339af5094a539357f65cd719851d
24867 F20101124_AAATNF he_l_Page_142.QC.jpg
a14bfe1a17dcc60bcb100b3ed9084d1c
549dd3f74c2140160a849bbfe78a20de2e186722
5592 F20101124_AAATMR he_l_Page_133.QC.jpg
722adbdc72cbfc51e1aa3bd2a2dfed09
83bcd90fa42598d6e1cb61f7698a6a2422a490a4
1051986 F20101124_AAASKD he_l_Page_005.jp2
39f67efe628a8dad0317f94986db0f8f
a0a09ca7a32b8db990fe13387283d1befba3f29b
78473 F20101124_AAASJP he_l_Page_134.jpg
3857ad4828d40cd29a83b66bce8f7ca3
18e091eb7936de7e4d7591746e8752240cc05e33
F20101124_AAATNG he_l_Page_143.QC.jpg
60a43790f1ea41a44d437cf82d72187b
89cd6c3b1f92f60ab0c68b244db4d8a93aeef139
F20101124_AAATMS he_l_Page_133thm.jpg
22599aa1f60e261070392b1a40417b56
2de7ebb7e25c4f18d2b7544640f9fe850562d4ba
1051982 F20101124_AAASKE he_l_Page_006.jp2
f1560d5bfc87bee9dc0bc90f5cce2355
95c2d37a7a42e47d365a2930cc927b46516d2067
88080 F20101124_AAASJQ he_l_Page_135.jpg
eae5b991edcf1df3b5224b429c54397c
9cb63e97e829576cd9b68b0f13f274c25bb242f0
F20101124_AAATNH he_l_Page_144thm.jpg
d9ea18dec1a50c88fa783e292d392a44
5192a51a95524a3a83118f94e71be53895736e59
5916 F20101124_AAATMT he_l_Page_134thm.jpg
91729bb46f0dae6cff61e56d0511fb5e
77ad886b14c6924d4581b50d584f18090b4e3a7d
1051966 F20101124_AAASKF he_l_Page_007.jp2
89363b7beff2c73873782dd9189566f6
7fe8a9ef88c0204243d30e27390f7efbc0676bd0
80358 F20101124_AAASJR he_l_Page_136.jpg
d0c5dddab64d6bb3574ac6dc1947f244
793e548c67d3f4b48010eaeeaec80ec25976338b
9586 F20101124_AAATNI he_l_Page_145.QC.jpg
95811773af1f2c980078e093b6a0ba26
7c3a141bd44a2222b0fa5c4f057111cc180ca5ed
24705 F20101124_AAATMU he_l_Page_135.QC.jpg
17f4256d5f2e151714368115162343f7
90bf6e86e0fbeab93e7a4e452553c217ad4f8624
1051971 F20101124_AAASKG he_l_Page_009.jp2
8eb3f005f612d0c3e1a121a69efec5aa
93422074138e119da64c0548d91273c91bf5ff15
90168 F20101124_AAASJS he_l_Page_137.jpg
b108e7c5d07d2864123c6497e0339ea8
bffe8f492c08d11e425041ec72b3192ca29e60df
3008 F20101124_AAATNJ he_l_Page_145thm.jpg
2ced5e583cb6ed15f035db55c502b5d2
a508b3ebc798fb19d5f2860b886c3ddaa3a30658
6857 F20101124_AAATMV he_l_Page_135thm.jpg
3a0f7f238210d8b233085cb4a55fb2b6
3fd6ca5ba37edacbc1104a8693627c06f6c9ade6
219758 F20101124_AAASKH he_l_Page_010.jp2
73aa28cb63e489d61ce38014795a9ba5
44c43cc9cbf15f64bdeb9ea6830343d7a127a50b
91666 F20101124_AAASJT he_l_Page_138.jpg
c2a3f24d1494eea57577ac2d208afb15
e03830b50cc1af8a4337fe28fd89eb20a82ca32b
164447 F20101124_AAATNK UFE0011440_00001.mets
00d7839d8d5cbf1998f8742cddf9585c
ad8780608339477ef0acf87c4402a09bcf6abeb2
6759 F20101124_AAATMW he_l_Page_136thm.jpg
30008f2d9d5a7af26961137f33258e01
4a8acbfa05d4ed1489b3392b3abb1f3dc9067196
82866 F20101124_AAASKI he_l_Page_011.jp2
d5972af7fbea793610b741971d409e62
b2b272a7efdc464c84f1d8dc3adb0d66a2f380c7
90101 F20101124_AAASJU he_l_Page_140.jpg
59e1167f82e132e0238471c2b17f907f
42b823c6a6d7b13039a76ac06f68c27491be22df
25360 F20101124_AAATMX he_l_Page_137.QC.jpg
326f5b4d5e93963f6ff7103d5e2f8365
e51804aef43db18382481faedf0facd47afb78e3
61171 F20101124_AAASKJ he_l_Page_012.jp2
88dd0f64c5bc8e2148774398045bbb7e
193e1858f845234bcc768f3a5d0ae5d3f1dc35de
87504 F20101124_AAASJV he_l_Page_141.jpg
ee8c5cbaed30b64884dba63a29a04d33
e4064eff14ae7809344497d6b71f7ec59db986cd
6667 F20101124_AAATMY he_l_Page_138thm.jpg
b96fe4195ef651f8a13bd82e6d5670f1
e514c2557da431055340e166cc43a0c312f363ce
92902 F20101124_AAASKK he_l_Page_013.jp2
df8dd82673d63cb652880caf21796088
3a27c0359a72facf2371fbc739c6d94046484fec
87069 F20101124_AAASJW he_l_Page_142.jpg
0db6102f14ff19df880c0b1b8050bc86
db41f6c1e3b188fc620181b772bf80d319225f8b
23849 F20101124_AAATMZ he_l_Page_139.QC.jpg
9d72470ac57be046a0e8727e7633fa22
22631a10e0f03348530f77933849f21f0185b7e1
105380 F20101124_AAASKL he_l_Page_014.jp2
2e5c64b1f2bb96ddb3ab7c8d9bcfcb0e
4ce0c90cb7bc9d51d72963098da2ee8014a03379
62886 F20101124_AAASJX he_l_Page_144.jpg
3186b404bed1fc286234823c1c70c17e
57e90116ef7355dc9ac7a0bee1427a560896d196
103978 F20101124_AAASKM he_l_Page_015.jp2
f9caa625f2e9cf477917abf72236c83a
6467e61c2ad15c42cae0c09c4377271a3f1f7bcf
29619 F20101124_AAASJY he_l_Page_145.jpg
aac1db79a7e753fcef0e6b6020a569ee
cfa5e872e767cc622cdd3fa8da07fa19a93f83e5
103718 F20101124_AAASLA he_l_Page_030.jp2
3328bb55df36132d13f7e117ffb418c1
20f2cd468c74ac502a376a40b95ab8a8f426f760
111076 F20101124_AAASKN he_l_Page_016.jp2
4ce284f7d1654f6ea11dad04882cacdd
ffb41442c88399a288d2646a2128b7c64887ab2e
22975 F20101124_AAASJZ he_l_Page_001.jp2
066c5d8b012b17df7892231ccf37996b
f03c3b6b5de94ffd8571aa7aed7b00b25cbfc592
93662 F20101124_AAASLB he_l_Page_031.jp2
0cfff06f0b91ad8f114be3e89991d75f
a3a5ca8b1b8432549352fd257610cbcdc3384e6e
19186 F20101124_AAASKO he_l_Page_017.jp2
323240cf7e9e7cb96299e989954ea177
575708d5fa234b752c669a237bb79dff391b369b
96024 F20101124_AAASLC he_l_Page_032.jp2
9f9bb0691a21a11e88ab753877d9205a
6aca67513953db97f46a88fc75f616bb16310d83
94151 F20101124_AAASKP he_l_Page_018.jp2
a1b77c7e806d7a4f070d11fb0fed29ad
68cb391638c636c1b7e6500b856d2c2df622a503
104127 F20101124_AAASLD he_l_Page_033.jp2
f72057e180ab0fbdf6aeac9eebb2d6c4
5b1d7d06b86aee5d1af3079c11f14fb6680857d8
94898 F20101124_AAASKQ he_l_Page_019.jp2
b6be4923afba7353d68b57f29543ab4d
d3b7f6354022dde61611fdab2ea507146619f100
108421 F20101124_AAASLE he_l_Page_034.jp2
19c79cf27c0baf259ed8cfc2374d0dc0
9c81f90a8b45e2017e1590592377d4dbc4125ee9
96643 F20101124_AAASKR he_l_Page_020.jp2
84d1e62e6575650d3e2fb8cef2eb0b72
387c51787723cc169106c1a7f5b1063b8147ce68
1051977 F20101124_AAASLF he_l_Page_035.jp2
af2a37fd886a3bf891de76319eb035c5
032c7ddf6c0a704ef83a4354e152a5f940e685a5
88638 F20101124_AAASKS he_l_Page_021.jp2
f9ed70911c3f4605e62a3fd647dec1e8
25b7a0d932a38decdbfaefd7d3d0d2a08b5758fe
1051974 F20101124_AAASLG he_l_Page_037.jp2
21a48fad998206b7ecc2f4dc3fd954a0
49262108fd0fddbb9fbd28440c5a65529887b1bc
689184 F20101124_AAASKT he_l_Page_022.jp2
2a8dd17198bfe0407f51fc07fae8cdf3
edf599252371c33b7731bedf51662cd289f1b02d
1027937 F20101124_AAASLH he_l_Page_038.jp2
38d0c7998b6d92bbd97d76baff7878a5
a08323d30d0414ed8119a03ecbd30fea130a9c93
76232 F20101124_AAASKU he_l_Page_023.jp2
911a7a4fb22e0b3a86c7b2b5ecce0a4b
7d13663f00d868fe134aca1eccd461099e20a67f
103804 F20101124_AAASLI he_l_Page_039.jp2
694198c5399ebe8e0964c901d85f3f67
49ab94c850f4305e3916f5239da51339f8ae748c
75130 F20101124_AAASKV he_l_Page_024.jp2
535172c94ae617984a53f4770ebe458c
bc9c69ae0f107a4f5178611cae4f82d911e77717
103038 F20101124_AAASLJ he_l_Page_040.jp2
8358b33276659a83e9a1b2c1979e5869
9e8f31827a22040c474facd11227cb35ac11b44a
829893 F20101124_AAASKW he_l_Page_025.jp2
bae238bade45d4f39b0d806f3de025b1
05a4d952a08cf1af8d27e329f66e2b96501a9606
66879 F20101124_AAASLK he_l_Page_042.jp2
5cdc1a15f9288dbab82e8c19a52b2c82
22fe26c698af9027f77fd1f4578a74c7b995a5f9
859439 F20101124_AAASKX he_l_Page_026.jp2
9967f2502dfd4ff1743ea06ed3f3a5c2
a0e9f7071d337241c44b1aef869cda342820aca1
96702 F20101124_AAASLL he_l_Page_043.jp2
d77abdf2ff689d8fb8070ccb5951ba3c
9a20f379bee2e17355e0df693bfc08c6a83c8903
106075 F20101124_AAASKY he_l_Page_028.jp2
d583b36cccb7394e0731e2082e5d911e
3fd13301a5814430a7c6d37b90f68c74fa0d79e1
94494 F20101124_AAASMA he_l_Page_062.jp2
955032654b38388c7f13da1bec7ff51e
53d515146309ec4f8599fceffd9627d16272be7c
101744 F20101124_AAASLM he_l_Page_044.jp2
9fe6b24a48cd836490ff0b89f66b08b8
6e820299acf0101fbc460ab4c8eec69ccd77c865
103491 F20101124_AAASKZ he_l_Page_029.jp2
5406008ce4bc99a0793e281e436c9bc6
c0afccdda85258d1be3eec4e7fcb1496d25f6839
103983 F20101124_AAASMB he_l_Page_063.jp2
897d71f2d5f61d5821c0239a56c2794c
e197fc53f42d958e10ceb2c4aa3b2a899f1ef3f5
93466 F20101124_AAASLN he_l_Page_045.jp2
a94089e9ba0b887c40e0544b9ab296d1
1ad33b36e688605918f01d26f98aace4dd39dc01
877400 F20101124_AAASMC he_l_Page_064.jp2
dddbe8d730879fa1aacce4cc4c22360c
446ce82e507b6e1398f8c1c79bdd99a8dc9e36b6
64193 F20101124_AAASLO he_l_Page_046.jp2
98201fa510047d428203568aafcf3c10
5dea48b8b1845ea7376d520200eaa0cc8684ab33
88893 F20101124_AAASMD he_l_Page_065.jp2
143549704366c04b0983fa247e3816e8
2fe69d09da5922d58f1282d9c04c22b8de6babad
88436 F20101124_AAASLP he_l_Page_047.jp2
d47ed189522d3a53cedefe26cd55a064
dd09b82e092400b5356f748a94f06619079f831d
1035789 F20101124_AAASME he_l_Page_066.jp2
2742a4c874ebb68e9c5ae9543ce07f82
f6627e31791a52998d3d3f5e1b813d2cb4ad9d45
99886 F20101124_AAASLQ he_l_Page_048.jp2
07e341687c92715c178c361bcbf23e04
9c00eae091713693dd159720e06124083fb8513c
69016 F20101124_AAASMF he_l_Page_067.jp2
5d5dd6f7dafad2853bfc0f7a00576596
bfe4b4627a4068fef4412d7f153f723215fff3f2
102380 F20101124_AAASLR he_l_Page_049.jp2
489b17b7824cb47be1e55f52d92416e5
16e324c6db77f6adbb2da99bfa0ec4358a151622
67461 F20101124_AAASMG he_l_Page_068.jp2
a9efea6516f39b5dbc3fab9eedebdd76
54e8aa062279f53a22f653c872aaeb545b079587
93924 F20101124_AAASLS he_l_Page_051.jp2
bb9a17f1daeaac339bc0f7f5cdf63a25
e2aa8c0464f564493d0276d556bfec5c8f38838d
745539 F20101124_AAASMH he_l_Page_069.jp2
ddb1d12607d58acadce4bf0a770c9920
2ec939ae84894318f07b53ed28f902a6ca248b63
774105 F20101124_AAASLT he_l_Page_052.jp2
195ee20d79a89f76a7de55f48cb818e6
c9e96ce911b0723e8fd22dbff76c3cde9d9a5805
69081 F20101124_AAASMI he_l_Page_070.jp2
24de78734ec65d23e5af4c283fc83ed8
ca653f7707842a6fbc3f095b583e003fc533cff6
23962 F20101124_AAASLU he_l_Page_053.jp2
5f7d35e6a0d9fc0f37987cd52508dc67
851750d8b718052902298cf504f51fdeddf9db6e
88471 F20101124_AAASMJ he_l_Page_071.jp2
371d4c671cd62a0a19059dc06aed7681
ff03ae0bb392bff50f70fef7291bc559936cb95a
94010 F20101124_AAASLV he_l_Page_054.jp2
7c6daf1e58db793eff0e6554157d9309
d0b2479d3615ef57a350fca7ba0cb4d2bebe9f73
100005 F20101124_AAASMK he_l_Page_072.jp2
f68d0a50fb8d3963e659af853b1921e7
7b992f753433db553d2e081f2cade5bf805c4a95
102433 F20101124_AAASLW he_l_Page_055.jp2
947fb98c89b55acc37890077a539c063
d99fb03c01dbe15eecd57d15729f4d83db301458
602896 F20101124_AAASNA he_l_Page_089.jp2
5f1a842f2ba478a9cd2f03bcbfd648d0
2d43316fc08fb2e4879a9092498da94fe754ecd7
82499 F20101124_AAASML he_l_Page_073.jp2
8921105766bf98e5c59974d4e6862906
cb03cf68c9f29bd8cec725158bf487d00c8d08b1
80237 F20101124_AAASLX he_l_Page_057.jp2
d6c311f3e942b11d5abf01ec3cbc6948
211fce6e4a6b6ce5eaeb89d9bdc0a9e5bbad033d
91880 F20101124_AAASMM he_l_Page_074.jp2
2f024ad08031f0f4468b50c0573f0c82
f202b2cb362fa0e261e92904175d5ee6a3a32bca
105353 F20101124_AAASLY he_l_Page_058.jp2
c327415d283c9946c7759ce6aa99a21b
546f059b58857922c3e11726476fb49b4a740edd
95175 F20101124_AAASMN he_l_Page_075.jp2
9ebfb53e905df777837b6df9d06b310f
7df9122186770e68bd944c7a9d98c416a5866d01
837514 F20101124_AAASLZ he_l_Page_061.jp2
d2bdd0030b151b93bb5c726eebc4cb0d
fc42900e2d0aa7b9bb02b7c66940bfde3c893729
55590 F20101124_AAASNB he_l_Page_090.jp2
3ee6e5a461d25148baf67c3d1f6ef8ff
5001b88a204c835da6ced3eee77d368641a073db
75281 F20101124_AAASMO he_l_Page_076.jp2
08ba5d9c096bc754ab11cfd431e91ddf
cfc7ded0916fe15d2b88bce391216ed8105b5ff0
53201 F20101124_AAASNC he_l_Page_091.jp2
85e7363e1563dc3c102bda36fbe7c73b
c4be28899fc21d7ce5d3456e1025dc08767f34cb
55171 F20101124_AAASMP he_l_Page_077.jp2
693b4f9e05bbab728b0f8e72de9b3dcb
611d769a8c0cf5581875afa4edd392a4e0f31695
50437 F20101124_AAASND he_l_Page_092.jp2
fe578f66738f5fd3ecb2126c8fa61a6b
39d96cafd8b6eb35b26caefec492f32dc528824d
609110 F20101124_AAASMQ he_l_Page_078.jp2
0f09ed64fdbee915851b5c0da8585745
f24474b043b183473d18404439262bc34f7efcf2
67258 F20101124_AAASNE he_l_Page_094.jp2
ca6a675d4536985223aa87bbd0ac4cc7
88d862dced06d31fec772d10f3e346ae066cba96
60138 F20101124_AAASMR he_l_Page_079.jp2
23cfe648d03551945c00d8dcc3517ce7
53d898cac8fa28bd4669d692e53ad6416995b05c
54241 F20101124_AAASNF he_l_Page_095.jp2
f21dd1c1bc0835f477985f4d343de2fd
8623347583f6e52597fd341caff2706709a52ce1
61375 F20101124_AAASMS he_l_Page_080.jp2
07a171b2aacaa762ee78c063da16698a
4dbf2a5f8a438d536f99b22c2a9be3f6f361c98b
44254 F20101124_AAASNG he_l_Page_096.jp2
b4c14f43811751cdb89a1ca227cee714
240b1979ebb20b33588cfd6d43fa1aa8d61122e6
40518 F20101124_AAASMT he_l_Page_081.jp2
915877b644b6484389326c07c0d23578
ca115a7e2fb534f29a4da081fa93a9409c35c4dc
815529 F20101124_AAASNH he_l_Page_098.jp2
d75de8cd2a5577a995ac91091c29376a
6b28978be44f593b0b77710e45d4e16e4df7e094
33354 F20101124_AAASMU he_l_Page_082.jp2
eec4cb536373ad2cbbcf9d42f212362d
295f2deedefd1c6df6bac208a367b5ea0028f40f
68201 F20101124_AAASNI he_l_Page_100.jp2
3bf862156a34feb0b1525cfdcf7f4c23
e8d51fa97f04f464ecd271a42a9d725fab270a31
64909 F20101124_AAASMV he_l_Page_083.jp2
afb6731f56552cf7f542e34d74785579
fbfb53206bdc04c21f0a47b8f61ba9e1470fb7e1
832824 F20101124_AAASNJ he_l_Page_101.jp2
859d0737d8596aa806844dbd71dc71af
b7c842ca05cdfd6c18826fbe2198261a07323c5f
61537 F20101124_AAASMW he_l_Page_085.jp2
af746103641263d646fe22c05ec0acc9
f0b0fce472a6fed420901bdb3902405a8f855116
46100 F20101124_AAASNK he_l_Page_102.jp2
bfa9e130f147afbb6df31edc04dbc524
df0d0ae0f2c84febf9393a4dfa6dfe6fbe3c23b2
57884 F20101124_AAASMX he_l_Page_086.jp2
5e0a96346bc336c5e15435b35e2a4122
62745eca2735c38c9ea70131565302000721a2aa
80222 F20101124_AAASOA he_l_Page_119.jp2
5d4b959895bd2dcf032511618cfe113e
cee436eba10c58ad5b0c5da442b3a7aca2d575c9
93799 F20101124_AAASNL he_l_Page_103.jp2
21b37343b1eb3f24158b7b3be759fd21
d123614779522f8b8aa101d0405f94d3f584f94d
75891 F20101124_AAASMY he_l_Page_087.jp2
6052510b61ffea8dfe6e7b9c2a18003d
d085424bd3a9da13fcef648e55fd6cfd676a3243
51436 F20101124_AAASOB he_l_Page_120.jp2
e6eec3d9494d3ce2a95eb3e027f5a2ad
ddfce804fb52b142f04290759b074a8e6782efda
78810 F20101124_AAASNM he_l_Page_104.jp2
e6a30f3971cf97f79a134542634851be
712bb5ee3f858531d26bcff2ec45190b6c49de2e
677607 F20101124_AAASMZ he_l_Page_088.jp2
10a093a2c2ca00022e48cd4f44cc5807
5d1a1398e4e411b85f7908b65a39f6ba5499558e
863363 F20101124_AAASNN he_l_Page_106.jp2
e0a3111eec9bc59bcdd71bd240dd15ac
9229fcfc077b6e582a7e7980d343f64ea9e9b283
56284 F20101124_AAASOC he_l_Page_121.jp2
0b5fc3047d041e7fe9678b7e553b0e9e
16e91e54d2af5232334e89553d5ba7612651eb80
838935 F20101124_AAASNO he_l_Page_107.jp2
7a9ebc6fc25f9ebf9d622009f21c06f4
7cdfba2ad5ba4dc4f95ad7fd39a292042b457b44
54072 F20101124_AAASOD he_l_Page_122.jp2
8204d66cc721d6676195efdc0c0dead5
43802393d6b71b7ec0c1da6bf8b6b40b15664e40
901848 F20101124_AAASNP he_l_Page_108.jp2
a306de4494751c8cd326cf513a69f1ec
967156826a396b1cf3b7ef7f7a5d6d8c7970238a
47545 F20101124_AAASOE he_l_Page_123.jp2
eb48315b345c9b7c172285d88628d9fe
3747a333e466eedaac30597cabeace491de2b1de
729251 F20101124_AAASNQ he_l_Page_109.jp2
40114c2b85712f9c44dddaf254ac0922
0a75934320849b4ae17dbd9a53410a46d530fd17
11412 F20101124_AAASOF he_l_Page_124.jp2
105342445afebb9c874f5d4de9e30134
652c35d7a451482a043e97cd73de0ea408ab2e29
893742 F20101124_AAASNR he_l_Page_110.jp2
56d9fafa98892153a68be745373aed2e
a000bf6f4a88f3aaa9839a63d1c4f29bc5811342
57456 F20101124_AAASOG he_l_Page_125.jp2
3d4bd7fa0179e08f5c927c8152962de3
6dfc51ec1b5e0cab37d8ced109c732f453db9fa4
885209 F20101124_AAASNS he_l_Page_111.jp2
a94620e5219e2bde1260d663493039a6
b7edd46ecf4fd6c38098f490ce7aebe465849d41
51720 F20101124_AAASOH he_l_Page_126.jp2
def9eabefc319a137b5c5cbb2f10748e
f455de4d91bdbbd1040c27434bc62aee567c6048
821036 F20101124_AAASNT he_l_Page_112.jp2
919ada356fc2a950ca0a40f49db5db79
0cb122a581eb010b5b55e91063e1500ff463e38a
12007 F20101124_AAASOI he_l_Page_127.jp2
c57310aa77826e290b18a2bd8f47f17b
3a74080a6feae8e549d65e1a70cbd5b220e0b1ba
694713 F20101124_AAASNU he_l_Page_113.jp2
ffdc033d7f9635db2dd482482614da03
ff3de96e393347cc229cb66a8ba7a48e7eb0454d
58963 F20101124_AAASOJ he_l_Page_128.jp2
1c3c040a45bc002cd2f70b661e683334
1951355bd29b77ac4d8889c4f12301da806cc01d
761470 F20101124_AAASNV he_l_Page_114.jp2
94af0fcdb70b47663c38a1948dfc5715
55f7954c43a33915fc539bf729262cb49b5a6e5b
50305 F20101124_AAASOK he_l_Page_129.jp2
edb7ad5395a01cebef1a3c6ce12c5d49
c71a8db56c039ff1e1a3add83de32b09bc4c17d8
635051 F20101124_AAASNW he_l_Page_115.jp2
ef506ed267d0490762e8a138cadba572
681f825a2aeb180ee625a4d352ab76d9c2d09fda
F20101124_AAASPA he_l_Page_004.tif
0a3c21341961ba68e799cb7a5b9d1095
d077775c81a36c3e38b0855eaaaf1da8f35d9851
11426 F20101124_AAASOL he_l_Page_130.jp2
d3864d2de4241bb7f1acb4ec7d51fd22
90a35f6ef7c0925ebdde23512b7ed699dcbe969c
94349 F20101124_AAASNX he_l_Page_116.jp2
e1b28c184c0717ce31c3022caae0f62c
a137367d9734d641d3dda758a9ab75174fbfb198
F20101124_AAASPB he_l_Page_005.tif
aea4933e9c562ebfa6d5743ff85f2931
499e185253f0bef32b63c5bdc2b0b636ecbc0200
58967 F20101124_AAASOM he_l_Page_131.jp2
bbb5e65bb1d84f1a57433d53802a7d49
de64e615a5092be33f36e338b831a8774da05cfd
107886 F20101124_AAASNY he_l_Page_117.jp2
e7bce5b05538a4c1634fdb828cb08db1
8cac55d060b351b5d4f746e7e2dcfdbea34327f4
F20101124_AAASPC he_l_Page_006.tif
a94fa4ed7869ecfc2810c288217a9602
fae3a22662c2aa892a4c2b6d4fd8d50c90718585
49420 F20101124_AAASON he_l_Page_132.jp2
d37eccb7da2bf96765b1043a9904510b
15dd0e5dc26957a6dcdc2ac35fc220a6e7920f43
107320 F20101124_AAASNZ he_l_Page_118.jp2
cdc50906f1adccbe17c8a8ea92eb0cae
b1df852f617a3a5bf1f811ecf15f76af73e0054b
11385 F20101124_AAASOO he_l_Page_133.jp2
51c6801f78ebbe801d70c8fc30bea812
b78405b1878c8883e123d4c3612b3fd1aa83a8eb
F20101124_AAASPD he_l_Page_007.tif
7dcc8aad96fd13d4091dc39a2120cf72
68ed388f147ed7fd73b10dac64d5090add135b4e
110985 F20101124_AAASOP he_l_Page_134.jp2
a8a77eb998fae534b948c9b9622a1a19
78a1f1a4bcbfc0ef5aaf46b7eb990a35adeaca10
F20101124_AAASPE he_l_Page_008.tif
98eb4f34f2a24960138841f5a77c8fd4
d03b81a5864c71e795a511622f889465e40b6be1
129535 F20101124_AAASOQ he_l_Page_135.jp2
d13b641831cdba8f078d253e699ec4b8
2d939ccff9a01bcdd3879ed4c0cf20ae1a09fce9
F20101124_AAASPF he_l_Page_009.tif
aea7f37e8d28ce8a80d273b4108a1dca
74850140fb7af8ab71b7ca882d13c0a0061b1673
121286 F20101124_AAASOR he_l_Page_136.jp2
b6f4d35ac4edab8d44306b72b6ce9f5b
c1c9eed18fb686a053302878b0e52089bc14fa70
F20101124_AAASPG he_l_Page_010.tif
5b6e00336453eaef579c221d05774672
bc311c48c2054d8fb63928269cae7e4d30839456
135615 F20101124_AAASOS he_l_Page_138.jp2
5e975937d8ad22872f5789db68c7864a
81306c68f90f1c44b71d1c05f3015efcf730a1c0
F20101124_AAASPH he_l_Page_011.tif
704897dcf2b1d3f1c9a885b1f6337545
d60f4cf16900997aec208fcc7bf4dbe1b615b679
121644 F20101124_AAASOT he_l_Page_139.jp2
706648a7c3b325218278e71e988de380
6d6e1ec9827139e11567e0a18337172102d105f6
F20101124_AAASPI he_l_Page_012.tif
404046ee1586009a50c97466b15f7b3c
7966cbbe5a08e767a4a24df8cb53e23497058564
128159 F20101124_AAASOU he_l_Page_140.jp2
5a7431b20559cea79a2611f5d8c07767
71f89e838327cf4e1e8b823f721521c40154b667
F20101124_AAASPJ he_l_Page_013.tif
fb7eb1e939c044d63143023032b9ff61
1593b0b7d6ac9d777fb88d92996bb95442450d9a
132935 F20101124_AAASOV he_l_Page_141.jp2
3a257e80ffe5161718d4fb3f35bca3e4
1240e50f26088b523ef9d4a3142a7d948d415a41
F20101124_AAASPK he_l_Page_014.tif
38f9d1e0f27bd568c5e0e3967b32b766
4cb9d5d9d5c60693df79cc0b2f87322d8a5617d6
129422 F20101124_AAASOW he_l_Page_142.jp2
5932fe564e07f229e6fb980ae5e1be85
27d5755019c22e0de15f53222977159bfae44abc
F20101124_AAASPL he_l_Page_015.tif
fdc58b119d57f52f264d8e864f396a7f
76a72b4fe90549ca76e7376eb0a25a9a42b58b21
88693 F20101124_AAASOX he_l_Page_144.jp2
8ae3fa82830d332802c5235a7b5f8aca
0f12d7bf765c3af8728a3a1b244b6b52741e687b
F20101124_AAASQA he_l_Page_033.tif
a94edfe2b9538b316d67ad63e9f81ea8
88d79ff767f3b85c68417d7e240bee0b51b7f8ff
F20101124_AAASPM he_l_Page_016.tif
0d405ded1904011415d1870c40e121b6
04133b18f0df954e3f91c6b18e19e2dc92fbc277
F20101124_AAASOY he_l_Page_001.tif
f1433c3f468f7a8e854e1b56e6d8379a
4e8cc8958e33ed2c840ab0d132dcb94c2cf04e83
F20101124_AAASQB he_l_Page_034.tif
b3dd8f1019357a70575ee2027afd171b
0739d9db7b4d533724e582c61ef40fe952429bec
F20101124_AAASPN he_l_Page_017.tif
14a4146eb8001bd06615b119315d4428
1b27616014005e307e0f0d69df2fdcba717f35d3
F20101124_AAASOZ he_l_Page_003.tif
d0ff29f9dca00a3bf592c4d8137cd703
1afc80f47aace90c616feda2b5e266808a70ecef
F20101124_AAASQC he_l_Page_035.tif
6c10e90107e598a7ebb3352a83f08fcc
fd1aaa09c7c4d0eafcf72e2961d856a2bf9eeba2
F20101124_AAASPO he_l_Page_018.tif
780022bf364e100eeca578ec5eaa4708
d57269f23c2b3e9e0d9a35612c57b528d52b3ceb
F20101124_AAASQD he_l_Page_036.tif
815f98b8c8b3403f14ec121a40155669
7e68e6ee1b3bd5bfd0cac7e70c0ecbee8511e4fe
F20101124_AAASPP he_l_Page_019.tif
73c474c56d4ad00619fdb9cdb9feb818
c6ee9d1090994b9b9232c05e66bb65be3a1f166f
F20101124_AAASPQ he_l_Page_020.tif
1854575ee6df277f04be791b10636018
8fa0a3921ad8b92202397dcda40dec49726afda7
F20101124_AAASQE he_l_Page_037.tif
cedeccbfa02ba47e93ad42cfe11c0339
dcef35176f40278b9704003b6631fbd9f41b2c7c
F20101124_AAASPR he_l_Page_022.tif
a152b4986d3605518202c7272fac39fb
f28dd08cda223ff400a8ec303c0cee415fb7b59b
F20101124_AAASQF he_l_Page_039.tif
a4dd5ad6dd27b342da3c523894f85b1b
045e282b40b1a9f492fad0b2a5f251c27d8d4487
F20101124_AAASPS he_l_Page_023.tif
99cc77761d5a229e691d0796ac23992c
b1120f22426e1f73d45d0fab7cec27bf53b580c7
F20101124_AAASQG he_l_Page_040.tif
43c77e6a6c48952234749f9bb10b6c35
3d2c43326599eb16898632bd27359eb4cdd0772b
F20101124_AAASPT he_l_Page_024.tif
c150c931cfb9b5adf7f9d9d8e624c80f
9c53d82ac131d73013fa1e9306550ae02987aca0
F20101124_AAASQH he_l_Page_042.tif
720c7e7c5b4a33b02889955562c1d926
dfb2b0af69aee34f027fdfb581fbe1b75aca8892
F20101124_AAASPU he_l_Page_025.tif
3b8975d825388b96b743c2ad6c26846e
6832459e11fe6670f341e1f4880c4b6f1d78bfc2
F20101124_AAASQI he_l_Page_043.tif
79d1027bb6ac309832732f0f295fb37e
368a5303dca23f13fdafd6006199bec1a5aaddb5
F20101124_AAASPV he_l_Page_026.tif
643354b4bd29ef18789b3a2ebb1731c0
8d7c217df851ec465cc7148256cd00d2e87f8c78
F20101124_AAASQJ he_l_Page_044.tif
ea425789c1bbdba799b24d17dc0ecc8f
316a127ffdc622dafbced58ee2b4dd6a940ca933
F20101124_AAASPW he_l_Page_028.tif
1a494d94631798bea72e4fd22a92f3cc
661feac48ee9997d6826b50d456bf6a738d2fd77
F20101124_AAASQK he_l_Page_045.tif
ed567317d3921277f6b7fea5b5bc3604
09bbd2579df614360e7eb147529967a9645f142f
F20101124_AAASPX he_l_Page_029.tif
b3224c0a00dad7b614806d15ea93537b
ecbd643fb73c9c4071418fe758a89849440d54bc
F20101124_AAASRA he_l_Page_066.tif
4d8395902260567b8745f56b23d1a46e
7b58779ba84548ecf1b11ce7f8290cee1b96def6
F20101124_AAASQL he_l_Page_046.tif
a6ac75a3ab4abe6f343d9df98e6507c9
779b9ea8194a7876287fe796712bd8ad1c955f3f
F20101124_AAASPY he_l_Page_030.tif
98fd6e00139fd58068a324ee029244c5
c1befddfe8870709847b25be3ef20f1aa59a8623
F20101124_AAASRB he_l_Page_067.tif
9a05554bf57a862eeca9aabd302ac828
8119b0446003da6e7300b01241679878ab0479a3
F20101124_AAASQM he_l_Page_047.tif
9c6b219dd1df8d88ec64ca3f53edefd6
0a1d6071304ccaa65a24eeb94f6de769850c7eed
F20101124_AAASPZ he_l_Page_032.tif
48b3777e9cb7f790d50e64ed520a6c96
b5c24ca411054e1b97b294530b026ca2104ddc50
F20101124_AAASRC he_l_Page_068.tif
988f7f1ff7cb366c4ff6f9ace767f102
6193f3c8aba0b6cf016c5b564b8d2f31a7352bf6
F20101124_AAASQN he_l_Page_048.tif
799f5a4ee1b4754657f055050b8fbc45
31a41e99a0271c84ca26daac2e2f16016c8e9734
F20101124_AAASRD he_l_Page_070.tif
7d139f34890b4a4d64a6ad4533cbf6c4
b4caf9a91dd5cb41271c2babb1ccb31144530a0d
F20101124_AAASQO he_l_Page_050.tif
ee25ab670dcf65c289c3de9ef966f613
abf5d698742bdd937010b5934d9a9b245831e048
F20101124_AAASRE he_l_Page_071.tif
77e6a42713a6d5df139912a14fe5bd76
355e039ca458163118a81398b562373760a7e8e2
F20101124_AAASQP he_l_Page_053.tif
49b0d8e7da523755204c50945af62cf1
88326b0a5d1940d8a73df4d87af3a16abafe3efe
F20101124_AAASQQ he_l_Page_054.tif
0cc21ba1590842667c90d668c55c1881
a43a0874ce7048c7b6b45112c55f6dde644338cf
F20101124_AAASRF he_l_Page_072.tif
9348dadff0bb75aa8a22b6fa2108f371
d900c00d8552801a0bf1ea4da77b60ca93615485
F20101124_AAASQR he_l_Page_055.tif
be67ba6a86014c40c532721085526a43
7188ba3882666fcb1cc4fbcfadab6b67ed7323a1
F20101124_AAASRG he_l_Page_073.tif
d0e63808db341cc75df7f83b63d37d46
89c0c89d69a0b4437269d33c7240288a0e96a02c
F20101124_AAASQS he_l_Page_056.tif
e459db24db06160f3991dfe3ecd033a0
eb2eb49f08ac8618bf70f9e35b07147198ca15ba
F20101124_AAASRH he_l_Page_074.tif
7d978fe6107366965a1dba162cb99170
3905b548a2a9a96c3aa24505dfc9c1f0708b41b3
F20101124_AAASQT he_l_Page_057.tif
f3d4136ca0a9ea52b71af13ebf85bea1
f41e90df7a093f3ed49c21f749de9b561adcecc6
F20101124_AAASRI he_l_Page_076.tif
b5aa1aa49f83317ef5366f1dc6c295a8
3301e2b3c964f206ce888c49a27a881a8f99a2ab
F20101124_AAASQU he_l_Page_058.tif
18c07ed6f9c67242848d2ad4191bbd98
ef8d5de6e30a086c4bb79d4d6d86b5fbf8c65f50
F20101124_AAASRJ he_l_Page_077.tif
c36b86c3f2300ccd926d084ee2260e0c
642c632cb57d9175c1792ab7ca7fd04cc9315a10
F20101124_AAASQV he_l_Page_059.tif
8c1046dca491ab1312500463d7e3f779
27c6c0ed38977cce1f447ae073df6c8dd3214d36
F20101124_AAASRK he_l_Page_078.tif
7338330b1643f8277d8a9d13d48dbba5
b76624b76ef377fbb085782e035c73c96f06e8e7
F20101124_AAASQW he_l_Page_060.tif
289bc3dc8acd9de28e3449319e2372c5
6fe4a9d7640255f5766c7f282f240c4e3086442b
F20101124_AAASRL he_l_Page_080.tif
7716b05122bc64964eff2a8ea265b970
659c16d05e9e57edebd88bd03ac099418a7d206b
F20101124_AAASQX he_l_Page_061.tif
ff32f41348ec47bd4efaed7ad362f002
d6603c2632546136dd6b42899e2321379b38ce8d
F20101124_AAASSA he_l_Page_100.tif
03ba360576b8a94c20b61bc1ca6188ab
742b148998d7254390ec566578d1acfc011b4067
F20101124_AAASRM he_l_Page_081.tif
96dd59bd4d8b1f43f3bd763de3631790
9a55727c95e935b05fe16a3efe3fb46a22bd3954
F20101124_AAASQY he_l_Page_063.tif
e852806968875c155cbec9171d87dc1b
124520f68c88e93a178eb8a51ab5154441307198
F20101124_AAASSB he_l_Page_101.tif
b8ba759dff498d8166eb74a33403f802
2b34ccdb9a5abcbfdb9d24147da15c41da57794c
F20101124_AAASRN he_l_Page_082.tif
29ee776774b7b8109afddfa6d08c9631
ddca9eaaad37f817c218d90dcbabd763f59a59e0
F20101124_AAASQZ he_l_Page_065.tif
360ccce6169e0d83a87fc326c004b8f7
2ec1b6ff53072aa3f941eccbbaaa38d3d52cbaec
F20101124_AAASSC he_l_Page_105.tif
3f94ff62433daa890a701ce57ee5522e
4a99b9686b3c33fd128e0737c9ef6a8da943d161
F20101124_AAASRO he_l_Page_085.tif
e5b8c9722406e36d1efec924c4cbe10f
c2c1086ee1bccdc9cace5fc1194a810a3e8c2489
F20101124_AAASSD he_l_Page_106.tif
2df689b6dae8aee9a4f1b31e549448ca
a74d8dc591a5d52882f33a7a94157a04d48e4f34
F20101124_AAASRP he_l_Page_086.tif
2923a6b299c3b0cceb6d3e8f6b5f41bd
66962c84485a9c7f3bac6f99630da93b55c2dfaa
F20101124_AAASSE he_l_Page_107.tif
d0d00d83a57ec4626cf523e6d5098808
e67cc621a83396141163ebd68b36d83a11ca3f05
F20101124_AAASRQ he_l_Page_087.tif
09dd1cfcf202dcb7d9404752e6bbd49a
ec074434088a9251f19ff9335c6dac176848e18b
F20101124_AAASSF he_l_Page_108.tif
b9fc832b5dfd56ebcda1337231f281ca
3449fc68168603ce4b4a0528bf2773f74f0461b1
F20101124_AAASRR he_l_Page_088.tif
c9948852356211577a047a85836e67f2
d6605a6eb75990b0fcd66731bdaf5d211cf0f4f6
F20101124_AAASRS he_l_Page_089.tif
08b2a3553b5f29809c8bb184301ff40f
139e0c61a78bc26a0fc0933a9f1af3d273309d50
F20101124_AAASSG he_l_Page_110.tif
ede033ca28769b41e1f8639eac1c256c
a58731212094b5757ef26b1ce52e79cbb9d998d8
F20101124_AAASRT he_l_Page_091.tif
291f6e78bc6026c5d19539e94927772c
31e52834363d2eeaf96ced5cd238d44c448c2c80
F20101124_AAASSH he_l_Page_111.tif
b20161555e2a4c06c90d4ab10e79835b
874a963fc7a795c2a040a93800a16f742eb9d8f3
F20101124_AAASRU he_l_Page_092.tif
794074ee39422b4de706e86bd3f76a57
7556ae217c77e701ff6318f0d3e7ddc0553b8bae
F20101124_AAASSI he_l_Page_112.tif
af395c814ed85c8c39217efbada78c22
0ce014b0610ddb145436deeb7f8fc5241970e013
F20101124_AAASRV he_l_Page_093.tif
96349d289caddc0bdd4a174a78812ebc
7dedc2874aafc72666bbd61d80329071b55a52d1
F20101124_AAASSJ he_l_Page_113.tif
f507852cece7006488606f68dc720f7d
c461bedaea5b93216fb3250e50d8fd52152e63be
F20101124_AAASRW he_l_Page_094.tif
5ed1427baa22ea9381ed9292f1ccff9e
907e6f68d11c74c43c71c7e277729e113131f205
F20101124_AAASSK he_l_Page_114.tif
a5410412584bafec03c8cd04db5d76c1
201ee65e14a4d5b4159556e401bd0d6aa54cea62
F20101124_AAASRX he_l_Page_096.tif
aaf8d792bad32faa8fe9e5400e74edfb
cac4ba8ecb91e0507f2801a1cf60b28c58c29082
F20101124_AAASTA he_l_Page_133.tif
af66763576867dad0bb959eec7282620
005af3ae9a8b27fa8e64901b29ac5a3c274f502f
F20101124_AAASSL he_l_Page_115.tif
21b34a03f27bbc4264f2930530d00e61
bc784b0766a3bbb133fa32fbdd38427a56e7a0c3
F20101124_AAASRY he_l_Page_098.tif
3d8c60ae80ea85fd1d6bbc1550c5755b
00e9e869b60d7d9c9b5c6c35226d706a19e00ae4
F20101124_AAASTB he_l_Page_134.tif
200e732298cc6f6bb00f8836b60f2fad
ef6d428097ca1ff1964e86c7cccf412d09857a7a
F20101124_AAASSM he_l_Page_116.tif
7dab6992a4ad89e8b5f591fa70cd8a52
a7aaa43263d310133235de3bafe8f1c4c64c75d2
F20101124_AAASRZ he_l_Page_099.tif
ca99f9e598b5cf4d29b088a4555b9f5e
ec065728f77326d4d5e68c26a2ecf0836e7dc30c
F20101124_AAASTC he_l_Page_135.tif
892269b06b8ac99f41d6350fdfa1d8bb
196522b4e94f46e785be38cd1144718bc68dcf5c
F20101124_AAASSN he_l_Page_117.tif
5154ee0240d47eb7260e14ac6b9a047e
b2fb43fe0763c81663093f4e68c924b20078dc40
F20101124_AAASTD he_l_Page_136.tif
76c0cebf0472024888ee5436dd61f3e7
c6d040974e2f99ebf3b4157fa9e68a48d6b5af6e
F20101124_AAASSO he_l_Page_118.tif
0f8561375e507c7ffbdb2dd71f6f567d
88963a1f8c3015053aa2d7e0fbe76e8ed216cab7
F20101124_AAASTE he_l_Page_138.tif
b5abb9052c11face657b7440b02cd09d
a33d23d29210862f021379d03febafec37819074
F20101124_AAASSP he_l_Page_119.tif
17b3f456a28706a9b3b2d6599aabc8b4
4fce1654ef9a4586d4d9037c17b8254c984c3a36
F20101124_AAASTF he_l_Page_139.tif
8b43209c298355ea0433a93e2c63413f
d7605ee0549b85421eee715ea02de19228869a1c
F20101124_AAASSQ he_l_Page_121.tif
cb45885d277b381ccc4c3a8edb4fc1cf
0b0d2ec5da7de92ef5bf1055a9d72fef1f539745
F20101124_AAASTG he_l_Page_140.tif
4c9401131de325fb0df036835ab04f6f
627ea67243539589c0d6daabcc06a00343331136
F20101124_AAASSR he_l_Page_122.tif
bb848d1243bb566d5fd50055938e67c0
0cdba4aeec05d261fc5bd5549f877d54a09e994a
F20101124_AAASSS he_l_Page_123.tif
bf30c5c599b41d265ac72aa426adfff7
a33cb3e9fd2fecffa7fbfbfe6906f2d19d934ddc
F20101124_AAASTH he_l_Page_141.tif
51590f926f06f84714576f26075d2744
1945b8376a419665d1dd13a6a8a62b0498aa52de
F20101124_AAASST he_l_Page_124.tif
94b91f8af2071a89bb522a566be46688
03031b59bd0ebc86a5cddba83070a4bb674446a6
F20101124_AAASTI he_l_Page_143.tif
feee3c76f2419e3f3085f63b0963d5d6
fa5d608005725333ce70acf1999a2bccd115a449
F20101124_AAASSU he_l_Page_126.tif
d2a342a3907908ba8a383c7d122b33ff
1e96557baa1f5ef8a6aab91f50c15060f25fe85a
F20101124_AAASTJ he_l_Page_144.tif
cea7524be6d23004ff08046e119c6fe2
a0db53bcb1e206828ab6d9c7fb6ec2b5d0f1b8e2
F20101124_AAASSV he_l_Page_127.tif
45f264205f6357798ec4d2634873eb4a
1b56165aab44ea12f3d0ebe88c2ff1db9161b06a
F20101124_AAASTK he_l_Page_145.tif
2eb0a871f767874287e5dfb92d9e9a89
f9baa9a04e33b69ce41b25f37bc7ce7369d4505d
F20101124_AAASSW he_l_Page_128.tif
165276ffcc7d6090c856fff97d794c4a
601ee525ca99a71db1f8c8e4b3a0ea10fa54113a
43182 F20101124_AAASUA he_l_Page_018.pro
6ef03b769738dbd81f08f84f560060b8
b0b38a671d30b0b3a6891ba8ac3a14b2fd38fb8e
7896 F20101124_AAASTL he_l_Page_001.pro
408e206509e8c9610feda0bc43db945c
b629a9a53e1672054157c54ae09ea48e4f66ae75
F20101124_AAASSX he_l_Page_129.tif
88216da7390bbe1fce6e7e85f3c7b19c
9663359cd4e5ddca9934eead0730f0942cd4b629
43617 F20101124_AAASUB he_l_Page_019.pro
97310c7189a7a61204e09a46d249b509
c42f03310280c65817df54aa84ce10598cfb4c12
962 F20101124_AAASTM he_l_Page_002.pro
da3f801bf9d584d0aae518bf5898870c
557b9317827a59d2de31c492611fcfa2e217315c
F20101124_AAASSY he_l_Page_130.tif
5105b81e302ddca47c2f6cf23d287b88
82d998c79d209ed2506d9cd824c08f28487f0ec6
43939 F20101124_AAASUC he_l_Page_020.pro
918889d812b9368c7eba7f391baed3fe
fc06ae9d7dafb2efbe3f30c29339d7c5f78f28c0
3911 F20101124_AAASTN he_l_Page_003.pro
5857e8f5fe8574821db7c9644d69a48c
8ba174524d058cd7eef169dc47df7896a1336e4c
F20101124_AAASSZ he_l_Page_132.tif
8b4bd93b41bd4fdf86ef86cdbb6c061f
d1f8c1c2d541d0ef0697bffab74ac4fd6193ba16
39804 F20101124_AAASUD he_l_Page_021.pro
94992a145f2375e8ce31eb22cef6e82c
1b37c4b93dd11cc39c8a150dceb93e448351f955
38031 F20101124_AAASTO he_l_Page_004.pro
32f6d0f339a6fd917e580bf329e64f05
a53a75e21e53db3a227b955cda1a435b8466ad4c
27374 F20101124_AAASUE he_l_Page_022.pro
681552dfab921fc9e939c86be9492f69
9640863baabecc27151b8c80c5d299acda410c13
75501 F20101124_AAASTP he_l_Page_005.pro
1dd15deb437a929e00d7570417448d5a
583b599a8beb2553447a2b13c491496615a709da
35230 F20101124_AAASUF he_l_Page_023.pro
3bf39ecded202b639b7c773b6eef28c9
69186892df9f6b61fba51ef4a273f6a17e073345
38584 F20101124_AAASTQ he_l_Page_007.pro
3ec1bf8381b5dc4d9cab3a6ee3164998
23817b41bed6906d6fb87a0bc9fb7506f6858e16
34137 F20101124_AAASUG he_l_Page_024.pro
835419babb384cf55aa86efe72dac82f
cbfea039233744c30e6efbe63ad950842135d357
49646 F20101124_AAASTR he_l_Page_008.pro
6d294696aba37af25cca934a4bdc0129
13eb3be0b1ddc729b9600f5e771d3c66e2cfd0e3
1129 F20101124_AAATAA he_l_Page_046.txt
f34adb0edd37922ed0bf172502781a71
a8216bda1e476673e43ba8e8b7d631798c61d6cf
37343 F20101124_AAASUH he_l_Page_025.pro
dc07ae7283f306f5ec2e4ac3c064dba5
68fbcd3e7bf3805f024838f9acee759eebc575e9
53029 F20101124_AAASTS he_l_Page_009.pro
5b9c32eb22c677e8a38229d74aae50cf
e39e81bf58c10b8659b6f1e54ff6d9a3e5264785
1693 F20101124_AAATAB he_l_Page_047.txt
3d92e3e4fb5ba471ff08431719185603
3272f1c8c936ad6fbad34fe1fbefc6b3168a881a
5355 F20101124_AAASTT he_l_Page_010.pro
a9df04e6d07f73b40513716a755e04d1
d814f6de7da7aadd5664f49fcf59a8783806b1f9
1862 F20101124_AAATAC he_l_Page_049.txt
638bff2c6b89258cabf0f03645e5a0ff
ee6b4521ff7874972fa7d3e3e23c7a4ddd8968b2
39750 F20101124_AAASUI he_l_Page_026.pro
7af491a86235111b03b3255181765357
38db52d62c266d5164c7b56dc290a4c4180edddd
36945 F20101124_AAASTU he_l_Page_011.pro
4190f6501bc67ced72148ab8382b4528
096648fbc3a683b32bc61be5ef46426bc2b4bb44
1845 F20101124_AAATAD he_l_Page_050.txt
9aec8b903f6d33968b9bb0a981082284
7392ed778065527d661791a726f3a19ee2c63f78
39288 F20101124_AAASUJ he_l_Page_027.pro
d01e14232556d38f8622f49d57b26723
282fada29315348a489fa89079361482a8932265
26708 F20101124_AAASTV he_l_Page_012.pro
8abb3e54bb2aa2b26b0d50da6c6b5ced
b96049c127ca9cc59d506da1652809112f07c131
1702 F20101124_AAATAE he_l_Page_051.txt
a097cba323a8d0ac6257d009fd50b470
83a95dfe97958f7769e9e281c219a5f68f106733
49373 F20101124_AAASUK he_l_Page_028.pro
16aa3ee790b4111a8d80e708e3256047
3eb25f1c3fcd7028d30e3b50321824a00683fe36
42491 F20101124_AAASTW he_l_Page_013.pro
7f6c1a97edbdb6229c67bcecba46efad
583eccbb495161f3af830cc10a2827202163b22f
428 F20101124_AAATAF he_l_Page_053.txt
646d481fab34cbfcbdf7873e0f4d31c1
eeafc200a956d4b5a564e5598a9324aa4ff8d1ac
48087 F20101124_AAASUL he_l_Page_029.pro
104630c69fa98429128fed9cca542254
f1b03af206a1094c06d34b595f6a5d6f239e04e0
47710 F20101124_AAASTX he_l_Page_014.pro
72e026bd2987b72a2bee23200b60755c
9dc7807043010e4ec9395b63442295729addb5b3
1800 F20101124_AAATAG he_l_Page_054.txt
1e5d0350f7d43adf4e2e4779d01bc9bd
99c5f8dc2f8c373f1fca00853634c4005c5f0960
45314 F20101124_AAASVA he_l_Page_048.pro
fab1c386a6666d7b20144c6eb570f20f
a371ef6d6b8665aca8eefaa5d4a0cfdb9fbee055
47429 F20101124_AAASUM he_l_Page_030.pro
13dfa4e3743ff239eae66b8270d44fc4
2e4ea81c4f4680aa9355663922737af8146c8e2e
47606 F20101124_AAASTY he_l_Page_015.pro
6c9dc5f5e0bf41d4eb07faa960de266a
c55a01d8eff71bc1f07ea6531c8a492e63f2eaea
1858 F20101124_AAATAH he_l_Page_055.txt
2822f7d2e9cd9e84a7bcb9201772c8e2
a9c6fbb3c7a4be295be8d21031f04ad54878884e
46551 F20101124_AAASVB he_l_Page_049.pro
dc20a9ac1b9b3a966989300be25f8102
2f63224558fac37660f5f4e093532d201692be35
44064 F20101124_AAASUN he_l_Page_032.pro
f40e7a2a64fc4aef2ed7c6a042304bfa
a61249efbac5bf0ee4a0175049623e9c1c607476
51301 F20101124_AAASTZ he_l_Page_016.pro
b69ca24883dd0346a5ee31cfbf0bdad9
b335665b82cddd7e8221ed848b7f96ce9398b785
1721 F20101124_AAATAI he_l_Page_056.txt
af0db69235409a1007dc2d596a3c18fd
2d6342fc1b8bea5668db0e63c6fa355a6bae45f2
46157 F20101124_AAASVC he_l_Page_050.pro
5743a94741e4cf474110158d37fe0109
616dd83949549f924d47404dcc3ea7b53c6adeb7
48091 F20101124_AAASUO he_l_Page_033.pro
4bbb3cf31b6af161f52e8283116286f1
c49930d0b79638d3eaf29d3caccb3c7b2336f302
1480 F20101124_AAATAJ he_l_Page_057.txt
f753a89df359fc004002635e04a40c36
a05169887665cb83c4c7063f3c399391043d3303
33880 F20101124_AAASVD he_l_Page_052.pro
aa38f114ca53ad9cd3522c1bdd5fdf1d
5ab7ad6a73a8022de39f26a01a01f6062752ecbc
49991 F20101124_AAASUP he_l_Page_034.pro
8c39719ec8ff85b8147490b8e208d882
7cc77cf7cf6acdf56aaca0b350f56fd7c3f7d3f0
1949 F20101124_AAATAK he_l_Page_058.txt
3a5871514559d67f1d869680022590d2
924198c3eb4c38b100693575ef59dbefecbb55c4
9552 F20101124_AAASVE he_l_Page_053.pro
196676248bafa6d27c015e269190d0c7
712709d1858a2f0436612e9934f60469436d4ca8
45567 F20101124_AAASUQ he_l_Page_036.pro
ce5fdf67aa52d8e58787f54f11283761
26ed038e673852e72e6a7d34c1bb2ae8fe290fe6
1947 F20101124_AAATAL he_l_Page_059.txt
aa4810fabfc3f983558d36a67333785f
4c5b0fca6e7a88701136cd166349525fd5f3388c
42822 F20101124_AAASVF he_l_Page_054.pro
522fa064b607241237e943f1a8d6a878
3a2b807119afed8296c8eca3ca9c6c5d74480cd0
48912 F20101124_AAASUR he_l_Page_037.pro
64a278906666e65ee99d5e926bfc086d
34c24f1d614c7536f719e64a4d9dd362be7a1d50
1432 F20101124_AAATBA he_l_Page_076.txt
f4e0431170e283cefac05772cd5a2912
81968b33f24be68c4ef1327668ac591b6e4b9108
1742 F20101124_AAATAM he_l_Page_060.txt
80773bdd62885d615bfabd393ddb3dde
488d4cc6369b7fc2722e2c1d4bea830c842bf614
46141 F20101124_AAASVG he_l_Page_055.pro
24b4b3db1d704a0cbc4703bd845facca
0c98f94e1e3d22f2ddf4af02c1b44c1dd10add35
49036 F20101124_AAASUS he_l_Page_038.pro
1ee2be43d62eae9d91b98e17b1225332
43288438db10d9c6ac91a8b86ec37a0243d8db16
1005 F20101124_AAATBB he_l_Page_077.txt
a130920d13d25e4e8aa368edd09df85c
71be7d1f48434b9da05b4e40e05ec755f593505c
1450 F20101124_AAATAN he_l_Page_061.txt
20c3941e2c2b14b7667f6faade3d7ef8
e8ce5aa4742af4364610f91b0dae08a6e7933ae8
36079 F20101124_AAASVH he_l_Page_057.pro
014c1e2443fb0b9cefb4acbc21e3c8f1
8ee2baffd5efd3d7b5f0c64a3535328bb93df3aa
47631 F20101124_AAASUT he_l_Page_039.pro
949a2f0efdd7e437795abc5217929641
376f1272b0e42f4ff35cb490c99ff8cbae0a6ba3
1308 F20101124_AAATBC he_l_Page_078.txt
8a69be7b1714917fe0279b9df7ed67eb
3bca2a46b6e1fdb13014ea7d8883d61be9932827
1726 F20101124_AAATAO he_l_Page_062.txt
b069369b7666e26d6d000d9a2bba2865
9811087805597416f0599156c530a750ca991543
48726 F20101124_AAASVI he_l_Page_058.pro
b6b036af25cf66145b73bf1dbcda107f
175aa6b7316cd92e9c35329989e92acdadb760c0
46009 F20101124_AAASUU he_l_Page_040.pro
443322f90f37b61e11b499aeeaf23898
9cbaa16090b7b14aca07a7cb30f1888c143d8649
1127 F20101124_AAATBD he_l_Page_079.txt
b688f1fe44ee7cb35431ce926e8acb6b
ad94b3937d1053f46c7adeae842d0d9987a385bb
F20101124_AAATAP he_l_Page_063.txt
5a2467778f4d933f7d01d0120e47ca58
dd91094f24e7e10850367309cb86e1a17fa5b1dd
48462 F20101124_AAASUV he_l_Page_041.pro
0a57ac248ab4d354eca4c500afc692af
c24b7fe2d6ae76e7e61bf5c37ebc1b77c742d6e2
857 F20101124_AAATBE he_l_Page_081.txt
8242afafd5c68920a4d69cdfc7723137
7f112966242ded8f90d42bd893fb96af6549ec8a
3184 F20101124_AAATAQ he_l_Page_064.txt
3e46a2f3ea2d2c41ffcde62ea30b4946
820280b651bad27d7af821954536050efb3109ee
45303 F20101124_AAASVJ he_l_Page_059.pro
735567195321e0e6b6879a9c2e2b6a9e
a04dd99931528214e854da85e8badcbe7f4dd0aa
43334 F20101124_AAASUW he_l_Page_043.pro
4e3ef8f988ab71fd02c8e9a537134d92
d671647dd41d48447a9ab4c047b44bc40a13bdc8
693 F20101124_AAATBF he_l_Page_082.txt
288407638d5579073c65336f0760c30f
e2db94b17ca3f355c5820007874f8647bfc27bd5
1634 F20101124_AAATAR he_l_Page_065.txt
1bc77f7ca525c8bb9d6f629a920a03d8
b14d54c3953e20583bb7e5354552b9ba64727e5a
37710 F20101124_AAASVK he_l_Page_060.pro
bb3213d91de3f6c60d0941bcbc99bd1e
f11ab7d2f2fed2afbd7fecc836c165a52ba5f590
F20101124_AAATBG he_l_Page_083.txt
1fc902878e3ad2bfdd643d3adc283c58
a8d00fafcc32d82f18dca0f456699ff2481438c4
33409 F20101124_AAASWA he_l_Page_076.pro
bf5fe460cc13d684a085cb184187869b
fa827626ce68400a7fb2a4c1df8452c11a659834
2221 F20101124_AAATAS he_l_Page_066.txt
5dda468b22213a377370a6845aa2ea7f
44237e4a9447759b1c3f9635287a6b9a89f20da8
35771 F20101124_AAASVL he_l_Page_061.pro
aa4938c8569213b5fa3d231429f10253
d62c517e4722686949d979a2dfd60097d018f1a9
46198 F20101124_AAASUX he_l_Page_044.pro
2917c51db6a7d901970c56afeca7682c
5e0720bff242ddfc7dfb68c89ceaa88dedaeca2f
1803 F20101124_AAATBH he_l_Page_084.txt
3db2b7090e8a700e4d4f5cdca873d134
08bb9481b9a64bdb0a56da9e56efeb408b5f16d2
20535 F20101124_AAASWB he_l_Page_077.pro
6b8e8f6b125fd9dd753be1a873478030
cae1a7c40bab4a3af6d5d0b1dfbbf974c3203bab
43464 F20101124_AAASVM he_l_Page_062.pro
27886f9e3a9d34a2f827a3836f89d73e
592fc91fdd298a852deea3d65d4d48c08f41d647
42538 F20101124_AAASUY he_l_Page_045.pro
DISCLOSURE CONTROL OF CONFIDENTIAL DATA BY APPLYING PAC LEARNING THEORY

By

LING HE

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2005

Copyright 2005 by Ling He

I would like to dedicate this work to my parents, Tianqin He and Yan Gao, for their endless love and encouragement through all these years.

ACKNOWLEDGMENTS

I would like to express my complete gratitude to my advisor, Dr. Gary Koehler. This dissertation would not have been possible without his support, guidance, and encouragement. I have been very fortunate to have an advisor who is always willing to devote his time, patience, and expertise to his students. During my Ph.D. program, he taught me invaluable lessons and gave me insights into the workings of academic research. As a distinguished scholar and a great person, he sets an example that always encourages me to seek excellence in the academic area as well as in my personal life.

I am very grateful to my dissertation cochair, Dr. Haldun Aytug. His advice, support, and help in various aspects of my research carried me through a lot of difficult times. In addition, I would like to thank the rest of my thesis committee members, Dr. Selwyn Piramuthu and Dr. Anand Rangarajan. Their valuable feedback and comments helped me to improve the dissertation in many ways.

I would also like to acknowledge all the faculty members in my department, especially the department chair, Dr. Asoo Vakharia, for their support, help, and patience. I also thank my friends for their generous help, understanding, and friendship over the past years. My thanks also go to my colleagues in the Ph.D. program for their precious moral support and encouragement. Last, but not least, I would like to thank my parents for always believing in me.

TABLE OF CONTENTS

ACKNOWLEDGMENTS

LIST OF TABLES

LIST OF FIGURES

ABSTRACT

CHAPTER

1 INTRODUCTION
  1.1 Background
  1.2 Motivation
  1.3 Research Problem
  1.4 Contribution
  1.5 Organization of Dissertation

2 STATISTICAL AND COMPUTATIONAL LEARNING THEORY
  2.1 Introduction
  2.2 Machine Learning
    2.2.1 Introduction
    2.2.2 Machine Learning Model
  2.3 Probably Approximately Correct Learning Model
    2.3.1 Introduction
    2.3.2 The Basic PAC Model: Learning Binary Functions
    2.3.3 Finite Hypothesis Space
    2.3.4 Infinite Hypothesis Space
  2.4 Empirical Risk Minimization and Structural Risk Minimization
    2.4.1 Empirical Risk Minimization
    2.4.2 Structural Risk Minimization
  2.5 Learning with Noise
    2.5.1 Introduction
    2.5.2 Types of Noise
    2.5.3 Learning from Statistical Query
  2.6 Learning with Queries

3 DATABASE SECURITY-CONTROL METHODS
  3.1 A Survey of Database Security
    3.1.1 Introduction
    3.1.2 Database Security Techniques
    3.1.3 Microdata Files
    3.1.4 Tabular Data Files
  3.2 Statistical Database
    3.2.1 Introduction
    3.2.2 An Example: The Compromise of Statistical Databases
    3.2.3 Disclosure Control Methods for Statistical Databases

4 INFORMATION LOSS AND DISCLOSURE RISK
  4.1 Introduction
  4.2 Literature Review

5 DATA PERTURBATION
  5.1 Introduction
  5.2 Random Data Perturbation
    5.2.1 Introduction
    5.2.2 Literature Review
  5.3 Variable Data Perturbation
    5.3.1 CVC Interval Protection for Confidential Data
    5.3.2 Variable-Data Perturbation
    5.3.3 Discussion
  5.4 A Bound for the Fixed-Data Perturbation (Theoretical Basis)
  5.5 Proposed Approach

6 DISCLOSURE CONTROL BY APPLYING LEARNING THEORY
  6.1 Research Problems
  6.2 The PAC Model for the Fixed-Data Perturbation
  6.3 The PAC Model for the Variable-Data Perturbation
    6.3.1 PAC Model Setup
    6.3.2 Disqualifying Lemma 2
  6.4 The Bound on the Sample Size for the Variable-Data Perturbation Case
    6.4.1 The Bound Based on the Disqualifying Lemma Proof
    6.4.2 The Bound Based on the Sample Size
    6.4.3 Discussion
  6.5 Estimating the Mean and Standard Deviation

7 EXPERIMENTAL DESIGN AND RESULTS
  7.1 Experimental Environment and Setup
  7.2 Data Generation
  7.3 Experimental Results
    7.3.1 Experiment 1
    7.3.2 Experiment 2

8 CONCLUSION
  8.1 Overview and Contribution
  8.2 Limitations
  8.3 Directions for Future Research

APPENDIX

A NOTATION TABLES
B DATA GENERATED FOR THE UNIFORM DISTRIBUTION
C DATA GENERATED FOR THE SYMMETRIC DISTRIBUTION
D DATA GENERATED FOR THE DISTRIBUTION WITH POSITIVE SKEWNESS
E DATA GENERATED FOR THE DISTRIBUTION WITH NEGATIVE SKEWNESS

LIST OF REFERENCES

BIOGRAPHICAL SKETCH

LIST OF TABLES

3-1: Original Records
3-2: Masked Records
3-3: Original Table
3-4: Published Table
3-5: A Hospital's Database
5-1: An Example Database
5-2: The Example Database with Camouflage Vector
5-3: An Example of Interval Disclosure
5-4: LP Algorithm
6-1: Bounds on the Sample Size with Different Values of n
6-2: The Relationship among ε, s, and l
6-3: Heuristic to Estimate the Mean, Standard Deviation, and the Bound l
6-4: Summary of the Estimated μ_i, σ_i, and l_i in the CVC Example Network
7-1: Summary of Four Cases with Different Means and Standard Deviations
7-2: The Intervals of [a, b] under the Four Cases
7-3: Experiment Results on 16 Tests with the Means, Standard Deviations, Sample Sizes, and Average Error Rates
7-4: Experimental Results on the Average Error Rates with l = 6,000 for 16 Cases

LIST OF FIGURES

2-1: Error Probability
3-1: Microdata File That Has Been Read into SPSS
4-1: R-U Confidentiality Map, Univariate Case
5-1: Network with m = 1, w = 3 (data source: Garfinkel et al. 2002)
5-2: Discrete Distribution of Perturbations from the Bin-CVC Network Algorithm
5-3: Relationships of c, c′, c″, and d
5-4: Illustration of the Connection between PAC Learning and Data Perturbation
6-1: Relationships of H_0, H_1, H_2, h_0, h_1, and d in the Fixed-Data Perturbation
6-2: Relationships of H_0, H_1, H_2, h_0, h_1, and d in the Variable-Data Perturbation
6-3: A Bimodal Distribution of Perturbations in the CVC Network
6-4: A Distribution of Perturbations in the CVC Network
7-1: Plots of Four Uniform Distributions of Perturbations at Different Means and Standard Deviations
7-2: Plots of Four Symmetric Distributions of Perturbations at Different Means and Standard Deviations
7-3: Plots of Four Distributions with Positive Skewness of Perturbations at Different Means and Standard Deviations
7-4: Plots of Four Distributions with Negative Skewness of Perturbations at Different Means and Standard Deviations
7-5: Plot of Average Error Rates (%) for 16 Tests
7-6: The Probability Histogram of the Perturbation Distribution for the CVC Network
7-7: Plot of Bounds on the Sample Size for 16 Tests

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

DISCLOSURE CONTROL OF CONFIDENTIAL DATA BY APPLYING PAC LEARNING THEORY

By

Ling He

August 2005

Chair: Gary Koehler
Cochair: Haldun Aytug
Major Department: Decision and Information Sciences

With the rapid development of information technology, massive data collection is relatively easier and cheaper than ever before. Thus, the efficient and safe exchange of information has become the renewed focus of database management as a pervasive issue. The challenge we face today is to provide users with reliable and useful data while protecting the privacy of confidential information contained in the database.

Our research concentrates on statistical databases, which usually store a large number of data records and are open to the public, where users are allowed to ask only limited types of queries, such as Sum, Count, and Mean. Responses to those queries are aggregate statistics that are intended to prevent disclosing the identity of a unique record in the database.

My dissertation aims to analyze these problems from a new perspective using Probably Approximately Correct (PAC) learning theory, which attempts to discover the true function by learning from examples. Different from traditional methods, in which database administrators apply security methods to protect the privacy of statistical databases, we regard the true database as the target concept that an adversary tries to discover using a limited number of queries, in the presence of some systematic perturbation of the true answers. We extend previous work and classify a new data perturbation method, the variable data perturbation, which protects the database by adding random noise to the confidential field. This method uses a parametrically driven algorithm that can be viewed as generating random perturbations by some (unknown) discrete distribution with known parameters, such as the mean and standard deviation. The bounds we derive for this new method show how much protection is necessary to prevent the adversary from discovering the database with high probability at small error. Put in PAC learning terms, we derive bounds on the amount of error an adversary makes given a general perturbation scheme, a number of queries, and a confidence level.

CHAPTER 1
INTRODUCTION

1.1 Background

Statistical organizations, such as the U.S. Census Bureau, National Statistical Offices (NSOs), and Eurostat, collect large amounts of data every year by conducting different types of surveys of assorted individuals. Meanwhile, the data stored in statistical databases (SDBs) are disseminated to the public in various forms, including microdata files, tabular data files, or sequential queries to online databases. The data are retrieved, summarized, and analyzed by various database users, e.g., researchers, medical institutions, or business companies. Among the published data, restrictions are placed on the release of sensitive data in order to comply with the confidentiality agreements imposed by the sources or providers of the original information. Therefore, the protection of confidential information becomes a critical issue with serious economic and legal implications, which in turn expands the scope and necessity of improved security in the database field.

Statistical databases usually store a large number of data records and are open to the public, where users are allowed to ask only limited types of queries, such as Sum, Count, and Mean. Responses to those queries are aggregate statistics that aim to prevent disclosing the identity of a unique record in the database.

With the rapid development of information technology, it has become relatively easier and cheaper to obtain data than ever before. With the recent passage of The Personal Responsibility and Work Opportunity Act of 1996 (The Welfare Reform Act) (Fienberg 2000) and the Health Insurance Portability and Accountability Act of 1996 (HIPAA) in the United States, the protection of confidential information collected by statistical organizations has again become a focus of database management, as it has been a pervasive issue since the 70s and 80s. Those statistical organizations have the legal and ethical obligations to maintain the accuracy, integrity, and privacy of the information contained in their databases.

1.2 Motivation

Traditional research on SDB privacy, also called Statistical Disclosure Control (SDC), has been under way for over 30 years. SDC provides all types of security-control methods. Among them, microaggregation, cell suppression, and random data perturbation are some of the most promising SDC methods. Recently, Garfinkel et al. (2002) developed a new technique called CVC protection, which uses a network algorithm to construct a series of camouflage vectors that hide the true confidential vector. This CVC technique provides interval answers to ad-hoc queries. All those SDC methods attempt to provide the SDB users with reliable and useful data (minimizing the information loss) while also protecting the privacy of the confidential information in the database (minimizing the disclosure risk).

Probably Approximately Correct (PAC) learning theory is a framework for analyzing machine learning algorithms. It attempts to discover the true function by learning from examples that are randomly drawn from an unknown but fixed distribution. Given accuracy and confidence parameters, the PAC model bounds the error that the learned function makes.

Different from the traditional methods, in which database administrators apply SDC methods to protect the privacy of SDBs, we approach the database security problem from a new perspective: we assume that an adversary regards the true confidential data in the database as the target concept and tries to discover it within a limited number of queries by applying PAC learning theory.

We describe how much protection is necessary to guarantee that the adversary cannot uncover the database's confidential information with high probability. Put in PAC learning terms, we derive bounds on the amount of error an adversary makes given a general perturbation scheme, a number of queries, and a confidence level.

1.3 Research Problem

Additive data perturbation includes some of the most popular database security methods. Inspired by the CVC technique, we classify a new method into this category: the variable data perturbation, which protects a database by adding random noise. Different from the fixed random data perturbation method, this method effectively generates random perturbations that have an unknown discrete distribution. However, parameters such as the mean and standard deviation can be estimated. The variable data perturbation method is the focus of our research. We intend to derive a bound on the level of error that an adversary may make while compromising a database.

We extend the previous work by Dinur and Nissim (2003), who found a bound for the fixed data perturbation method, and deploy PAC learning theory to develop a new bound for the variable data perturbation. A threshold on the number of queries is developed from the error bound. With high probability, the adversary can disclose the database at small error if this certain number of queries is asked. Therefore, we may find out how much protection would be necessary to prevent the disclosure of the confidential information in a statistical database.

Our experiments indicate that a high level of protection may yield answers that are not useful, whereas useful answers can lead to the compromise of a database.

1.4 Contribution

Two major contributions are expected from this research. First, we approach the database security problem from a new perspective instead of following the traditional research paths in this field. By applying PAC learning theory, we regard an adversary of the database as a learner who tries to discover the confidential information within a certain number of queries. We show that both SDC methods and PAC learning theory actually use similar methodologies for different purposes. We also derive a PAC-like bound on the sample size for the variable data perturbation method, within which the database can be compromised with high probability at small error. Based on this result, we can determine whether a security method provides enough protection to the database.

1.5 Organization of Dissertation

The dissertation is organized into eight parts. Chapter 2 provides an overview of the important concepts, methodologies, and models in the fields of machine learning and PAC learning theory. In Chapter 3, we summarize database security-control methods for microdata files, tabular data files, and the statistical database, which is the emphasis of our efforts. We review the literature on performance measurements for database protection methods in Chapter 4. Following that, in Chapter 5, random data perturbation methods are reviewed and a new data perturbation method, variable-data perturbation, is defined and developed. Two papers that motivated our research are reviewed and explained, and we propose our approach at the end of that chapter. In Chapter 6, we introduce our methodology and develop the research model. A bound on the sample size for the variable data perturbation method is derived, within which the confidential information can be disclosed. In Chapter 7, experiments are designed and conducted to test our theoretical conclusions from previous chapters. Experimental results are summarized and analyzed at the end. Chapter 8 concludes our work and gives directions for future research.

CHAPTER 2
STATISTICAL AND COMPUTATIONAL LEARNING THEORY

In this chapter, we introduce statistical and computational learning theory, a formal mathematical model of learning. The overview focuses on the PAC model, the most commonly used theoretical framework in this area. We then move to a brief review of statistical learning theory and its two important principles: the empirical and structural risk minimization principles. Other well-known concepts and theorems are also investigated here. At the end of the chapter, we extend the basic PAC framework to more practical models, that is, learning with noise and query learning models.

2.1 Introduction

Since the 1960s, researchers have been diligently working on how to make computing machines learn. Research has focused on both empirical and theoretical approaches. The area is now called machine learning in computer science but is referred to as data mining, knowledge discovery, or pattern recognition in other disciplines. Machine learning is a mainstream of artificial intelligence. It aims to design learning algorithms that identify a target object automatically without human involvement. In the machine learning area, it is very common to measure the quality of a learning algorithm by its performance on a sample dataset. It is therefore difficult to compare two algorithms strictly and rigorously if the criterion depends only on empirical results. Computational learning theory defines a formal mathematical model of learning, and it makes it possible to analyze the efficiency and complexity of learning algorithms at a theoretical level (Goldman 1991).

2.2 Machine Learning

2.2.1 Introduction

In this section we start our review with an introduction to important concepts in the machine learning field, such as hypotheses, training samples, instances, and instance spaces. This is followed by a demonstration of the basic machine learning model, which is designed to generate an hypothesis that closely approximates the unknown target concept. See Natarajan (1991) for a complete introduction.

2.2.2 Machine Learning Model

Many machine learning algorithms are used to tackle classification problems, which attempt to classify objects into particular classes. The three types of classification problems include binary classification, with two classes; multi-class classification, handling a finite number of output categories; and regression, whose outputs are real values (Cristianini and Shawe-Taylor 2000).

Most machine learning methods learn from examples of the target concept. This is called supervised learning. The target concept (or target function) f is an underlying function that maps data from the input space to the output space. The input space is also called an instance space, denoted X, and is used to describe each instance x ∈ X. Here n represents the number of dimensions or attributes of an input instance. The output space, denoted Y, contains every possible output label y ∈ Y. In the binary classification case, the target concept f(x) classifies all instances x ∈ X into negative and positive classes, labeled 0 and 1: X = {0,1}^n, Y = {0,1}. Let f(x) = 1 if x belongs to the positive (true) class, and f(x) = 0 (false) otherwise.

Suppose a sample S includes l pairs of training examples, S = ((x_1, y_1), ..., (x_l, y_l)). Each x_i is an instance, and the output y_i is x_i's classification label.

The learning algorithm takes the training sample as input and outputs an hypothesis h(x), chosen from the set of all hypotheses under consideration, that best approximates the target concept f(x) according to its criteria. The hypothesis space H is the set of all possible hypotheses. The target concept is chosen from the concept space C, f ∈ C, which consists of all possible concepts (functions).

2.3 Probably Approximately Correct Learning Model

2.3.1 Introduction

The PAC model proposed by Valiant in 1984 is considered the first formal theoretical framework for analyzing machine learning algorithms, and it formally initiated the field of computational learning theory. By learning from examples, the PAC model combines methods from complexity theory and probability theory, aimed at measuring the complexity of learning algorithms. The core idea is that the hypothesis generated by the learning algorithm approximates the target concept with high probability at a small error in polynomial time and/or space.

2.3.2 The Basic PAC Model: Learning Binary Functions

The PAC learning model quantifies the worst-case risk associated with learning a function. We discuss its details using binary functions as the learning domain. Suppose there is a training sample S of size l. Every example is generated independently and identically from an unknown but fixed probability distribution D over the instance space X = {0,1}^n; thus, the PAC model is also called a distribution-free model.
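
To make the sampling process concrete, the following minimal Python sketch (illustrative only; the conjunction target and the uniform choice of D are assumptions, not part of the model) draws an i.i.d. labeled sample S of size l:

import random
random.seed(0)

n, l = 8, 5
f = lambda x: int(x[0] == 1 and x[1] == 1)   # hypothetical target concept

def draw():
    # one instance drawn i.i.d. from D (uniform over {0,1}^n as an example)
    return tuple(random.randint(0, 1) for _ in range(n))

# S = ((x_1, y_1), ..., (x_l, y_l)), each instance labeled by the target f
S = [(x, f(x)) for x in (draw() for _ in range(l))]
for x, y in S:
    print(x, y)

Any other fixed distribution D could replace the uniform draw here; the PAC guarantees developed below hold for all such distributions simultaneously.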

Each instance is an n-bit binary vector, x ∈ X = {0,1}^n. The learning task is to choose a specific boolean function that approximates the target concept f: {0,1}^n → {0,1}, f ∈ C. The target concept f is chosen from the concept space C = 2^X of all possible boolean functions. According to the PAC requirements, a learning algorithm must output an hypothesis h ∈ H in polynomial time, where H ⊆ 2^X. We hope that the target function f ∈ H and that the hypothesis h can approximate the target function f as accurately as possible. If f ∉ H, then classification errors are inevitable.

Consider a concept space C = 2^X, an hypothesis space H ⊆ 2^X, and an unknown but fixed probability distribution D over the instance space X = {0,1}^n. The error of an hypothesis h ∈ H with respect to a target concept f ∈ C is the probability that h and f disagree on the classification of an instance x ∈ X drawn from D. This probability of error is denoted by the risk functional

    err_D(h) = Pr_D{x : h(x) ≠ f(x)}.

To understand the error more intuitively, see Figure 2-1. The error probability corresponds to regions I and II of the figure, which show where h(x) disagrees with f(x); we can think of them as Type I and Type II errors. Regions III and IV contain the instances on which h(x) and f(x) agree.
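
The risk functional can be estimated by Monte Carlo sampling from D. A minimal sketch, assuming a uniform D and a hypothetical target and hypothesis (here h errs exactly when x_1 = 1 and x_2 = 0, so err_D(h) = 1/4 under the uniform distribution):

import random
random.seed(1)

n = 10
f = lambda x: int(x[0] == 1 and x[1] == 1)   # assumed target concept
h = lambda x: int(x[0] == 1)                 # assumed candidate hypothesis

def draw():
    return [random.randint(0, 1) for _ in range(n)]

# Monte Carlo estimate of err_D(h) = Pr_D{x : h(x) != f(x)}
trials = 100_000
errors = sum(h(x) != f(x) for x in (draw() for _ in range(trials)))
print(errors / trials)   # close to 0.25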

The PAC model uses an accuracy parameter ε and a confidence parameter δ to measure the quality of an hypothesis h. Given a sample S of size l and a distribution D from which all training examples are drawn, the PAC model strives to bound by δ the probability that an hypothesis gives large error, as in

    Pr_{D^l}{S : err_D(h_S) > ε} ≤ δ,

where h_S means that the training set decides the selection of the hypothesis.

[Figure 2-1: Error Probability. Regions I and II of the instance space X mark the disagreement h(x) ≠ f(x); regions III and IV mark agreement.]

Definition: PAC Learnable. A concept class C of boolean functions is PAC learnable if there exists a learning algorithm A using an hypothesis space H such that, for every f ∈ C, for every probability distribution D, for every 0 < ε < 1/2, and for every 0 < δ < 1/2:

(1) The hypothesis h ∈ H produced by algorithm A approximates the target function f, with probability at least 1 - δ, such that error(h) ≤ ε.

(2) The complexity of the learning algorithm A is bounded by a polynomial in the size of the target concept n, 1/ε, and 1/δ. The sample complexity refers to the sample size within which the algorithm A needs to output an hypothesis h.
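
The definition can be exercised empirically. The sketch below is a toy experiment, not part of the dissertation's method: it uses the classic most-specific-conjunction (FIND-S style) learner on an assumed conjunction target under a uniform D, and checks that the fraction of training runs producing err_D(h_S) > ε stays below δ. All parameter values are illustrative.

import random
random.seed(1)

n, l, eps, delta = 8, 200, 0.10, 0.05
target = {0: 1, 3: 0}                    # hypothetical target: x1=1 AND x4=0

def f(x):
    return int(all(x[i] == v for i, v in target.items()))

def draw():
    return [random.randint(0, 1) for _ in range(n)]

def learn(sample):
    # FIND-S style: most specific conjunction consistent with the positives
    lits = None
    for x, y in sample:
        if y == 1:
            if lits is None:
                lits = dict(enumerate(x))
            else:
                lits = {i: v for i, v in lits.items() if x[i] == v}
    if lits is None:                     # no positive examples observed
        lits = dict(enumerate([1] * n))
    return lambda x, lits=lits: int(all(x[i] == v for i, v in lits.items()))

def err(h, trials=5_000):
    return sum(h(x) != f(x) for x in (draw() for _ in range(trials))) / trials

runs = 200
bad = sum(err(learn([(x, f(x)) for x in (draw() for _ in range(l))])) > eps
          for _ in range(runs))
print(bad / runs, "<= delta =", delta)   # empirical failure rate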

2.3.3 Finite Hypothesis Space

An hypothesis space H can be finite or infinite. If an hypothesis h classifies all training examples correctly, it is called a consistent hypothesis. We will derive the main PAC result in multiple steps using well-known inequalities from probability theory.

2.3.3.1 Finite consistent hypothesis space

Assume the hypothesis space H is finite. If we choose an hypothesis h with risk greater than ε, the probability that it is consistent on a training sample S of size l is bounded as

    Pr_{D^l}{S : h consistent and err_D(h) > ε} ≤ (1 - ε)^l ≤ e^{-εl}.

To see this, observe that the probability that such an hypothesis h_1 classifies one input pair (x_1, f(x_1)) correctly is

    Pr{h_1(x_1) = f(x_1)} < 1 - ε.

Given l examples, the probability that h_1 classifies (x_1, f(x_1)), ..., (x_l, f(x_l)) all correctly is

    Pr{h_1(x_1) = f(x_1), ..., h_1(x_l) = f(x_l)} < (1 - ε)^l

because the sampling is i.i.d. Thus, the probability of finding some hypothesis h with error greater than ε that is consistent with the training set (of size l) is bounded by the union bound (i.e., the worst case) |H|(1 - ε)^l. To see this latter step, first define E_i as the event that h_i is consistent. Then we know that

    Pr{E_1 ∪ ... ∪ E_|H|} ≤ Σ_{i=1}^{|H|} Pr{E_i} ≤ |H|(1 - ε)^l.

Finally, (1 - ε)^l ≤ e^{-εl} is a commonly known simple algebraic inequality.
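
A quick numerical check of the first inequality (a simulation sketch; the ε-bad hypothesis is abstracted as an event that each i.i.d. example independently catches with probability ε, and the parameter values are arbitrary):

import math, random
random.seed(2)

eps, l, trials = 0.1, 50, 100_000
# a fixed eps-bad hypothesis survives (stays consistent) only if no
# example catches it
survive = sum(all(random.random() > eps for _ in range(l))
              for _ in range(trials)) / trials
print(survive, (1 - eps) ** l, math.exp(-eps * l))
# the empirical survival rate matches (1-eps)^l and sits below e^{-eps*l}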

The idea behind the PAC bound is to bound this unlucky scenario (i.e., algorithm A finds a consistent hypothesis that happens to be one with error greater than ε). The following result formalizes this.

Blumer Bound (Blumer et al. 1987). Require |H|(1 - ε)^l ≤ δ. Thus, the sample complexity l for a consistent hypothesis h over a finite hypothesis space H is bounded by

    l ≥ (1/ε)(ln|H| + ln(1/δ)).

2.3.3.2 Finite inconsistent hypothesis space

An hypothesis h is called inconsistent if there are misclassification errors ε_s > 0 on the training sample. The sample complexity is then bounded by

    l ≥ (1/(2(ε - ε_s)²))(ln|H| + ln(1/δ)),

and the error is bounded by

    ε ≤ ε_s + sqrt((ln|H| + ln(1/δ)) / (2l)).

We can see from the above inequality that ε is usually larger than the training error rate ε_s. Interested readers can see Goldman (1991) for further explanations.
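
Both sample-complexity bounds are directly computable. A minimal sketch (the instantiation |H| = 3^n + 1, the number of conjunctions over n boolean attributes, is only an illustrative choice of hypothesis space):

import math

def l_consistent(H_size, eps, delta):
    # Blumer bound: l >= (1/eps)(ln|H| + ln(1/delta))
    return math.ceil((math.log(H_size) + math.log(1 / delta)) / eps)

def l_inconsistent(H_size, eps, eps_s, delta):
    # Hoeffding-style bound with training error eps_s < eps
    return math.ceil((math.log(H_size) + math.log(1 / delta))
                     / (2 * (eps - eps_s) ** 2))

n = 20
H = 3 ** n + 1
print(l_consistent(H, eps=0.05, delta=0.01))                # about 532
print(l_inconsistent(H, eps=0.10, eps_s=0.05, delta=0.01))  # about 5,316

Note how the inconsistent case pays a 1/(ε - ε_s)² rather than a 1/ε price, which is why noisy or imperfect fitting inflates the required sample size so sharply.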

2.3.4 Infinite hypothesis space

When H is finite we can use |H| directly to bound the sample complexity. When H is infinite we need to utilize a different measure of capacity. One such measure is called the VC dimension, which was first proposed by Vapnik and Chervonenkis (1971).

Definition: VC Dimension. The VC dimension of an hypothesis space is the maximum number d of points of the instance space that can be separated into two classes in all possible 2^d ways using functions in the hypothesis space. It measures the richness or capacity of H (i.e., the higher d is, the richer the representation).

Given H with a VC dimension d and a consistent hypothesis h ∈ H, the PAC error bound is (Cristianini and Shawe-Taylor 2000)

    ε(l, d, δ) = (2/l)(d log₂(2el/d) + log₂(2/δ)),

provided d ≤ l and l ≥ 2/ε.
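
The bound is easy to tabulate. A minimal sketch (d = 10, δ = 0.01, and the sample sizes are illustrative values):

import math

def vc_error_bound(l, d, delta):
    # eps(l, d, delta) = (2/l)(d*log2(2*e*l/d) + log2(2/delta))
    return (2 / l) * (d * math.log2(2 * math.e * l / d) + math.log2(2 / delta))

for l in (1_000, 10_000, 100_000):
    print(l, round(vc_error_bound(l, d=10, delta=0.01), 4))
# roughly 0.197, 0.0263, 0.0033: the bound shrinks like (d/l) log l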

2.4 Empirical Risk Minimization and Structural Risk Minimization

2.4.1 Empirical Risk Minimization

Given a VC dimension d and an hypothesis h ∈ H with a training error ε_s, the error rate is bounded by

    ε ≤ ε_s + sqrt((4/l)(d ln(2el/d) + ln(4/δ))).

Therefore, the empirical risk can be minimized directly by minimizing the number of misclassifications on the sample. This principle is called the Empirical Risk Minimization principle.

2.4.2 Structural Risk Minimization

As is well known, one disadvantage of empirical risk minimization is the overfitting problem; that is, for small sample sizes, a small empirical risk does not guarantee a small overall risk. Statistical learning theory uses the structural risk minimization (SRM) principle (Schölkopf and Smola 2001, Vapnik 1998) to solve this problem. The SRM principle focuses on minimizing a bound on the risk functional. Minimizing a risk functional is formally developed as the goal of learning a function from examples by statistical learning theory (Vapnik 1998):

    R(α) = ∫ L(z, g(z, α)) dF(z)  over α ∈ Λ,

where L is a loss function for misclassified points, g is an instance of a collection of target functions parametrically defined by α ∈ Λ, and z is the training pair assumed to be drawn randomly and independently according to an unknown but fixed probability distribution F(z). Since F(z) is unknown, an induction principle must be invoked. It has been shown that for any α ∈ Λ, with probability at least 1 - δ, the bound on a consistent hypothesis

    R ≤ R_emp + (R_struct(d, l, δ)/2)(1 + sqrt(1 + 4 R_emp / R_struct(d, l, δ))) ≡ R_bound

holds, where the structural risk R_struct depends on the sample size l, the confidence level δ, and the capacity d of the target function. The bound is tight, up to log factors, for some distributions (Cristianini and Shawe-Taylor 2000). When the loss function is the number of misclassifications, the exact form of R_struct is

    R_struct(d, l, δ) = 4 (d(ln(2l/d) + 1) + ln(4/δ)) / l.

It is a common learning strategy to find consistent target functions that minimize a bound on the risk functional. This strategy provides the best "worst case" solution, but it does not guarantee finding target functions that actually minimize the true risk functional.
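
The following sketch computes R_struct and R_bound and illustrates SRM-style model selection: among hypothetical (capacity, training error) pairs, the intermediate-capacity model minimizes the bound. The (d, R_emp) pairs are invented purely for illustration.

import math

def r_struct(d, l, delta):
    # structural risk term for the 0/1 loss
    return 4 * (d * (math.log(2 * l / d) + 1) + math.log(4 / delta)) / l

def r_bound(r_emp, d, l, delta):
    e = r_struct(d, l, delta)
    return r_emp + (e / 2) * (1 + math.sqrt(1 + 4 * r_emp / e))

# richer models fit the sample better, but the bound favors the middle one
l, delta = 5_000, 0.05
for d, r_emp in [(5, 0.20), (20, 0.10), (80, 0.04)]:
    print(d, round(r_bound(r_emp, d, l, delta), 3))   # 0.308, 0.284, 0.453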

2.5 Learning with Noise

2.5.1 Introduction

The basic PAC model is also called the noise-free model since it assumes that the training set is error-free, meaning that the given training examples are correctly labeled and not corrupted. In order to be more practical in the real world, the PAC algorithm has been extended to account for noisy inputs (defined below). Kearns (1993) initiated another well-studied model in the machine learning area, the Statistical Query (SQ) model, which provides a framework for noise-tolerant learning algorithms.

2.5.2 Types of Noise

Four types of noise are summarized in Sloan's paper (Sloan 1995):

(1) Random Misclassification Noise (RMN)

Random misclassification noise occurs when the learning algorithm, with probability 1 - η, receives noiseless samples (x, y) from the oracle and, with probability η, receives noisy samples (x, ¬y) (i.e., x with an incorrect classification). Angluin and Laird (1988) first theoretically modeled PAC learning with RMN. Their model presents a benign form of misclassification noise. They concluded that if the rate of misclassification η is less than 1/2, then the true concept can be learned by a polynomial algorithm. Within l samples, the algorithm can find an hypothesis h minimizing the number of disagreements F(σ, h), where F(σ, h) denotes the number of times that hypothesis h disagrees with the training sample σ. The sample size l is bounded by

    l ≥ (2 / (ε²(1 - 2η_b)²)) ln(2|H|/δ),

provided 0 ≤ η ≤ η_b < 1/2. Extensive studies can be found in Aslam and Decatur (1993), Blum et al. (1994), Bshouty et al. (2003), Decatur and Gennaro (1995), and Kearns (1993).
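
A minimal sketch of the bound and of an RMN oracle (the instantiation |H| = 3^20 and the rates are illustrative assumptions):

import math, random
random.seed(3)

def l_rmn(H_size, eps, delta, eta_b):
    # Angluin-Laird style bound:
    # l >= 2 ln(2|H|/delta) / (eps^2 (1 - 2*eta_b)^2)
    return math.ceil(2 * math.log(2 * H_size / delta)
                     / (eps ** 2 * (1 - 2 * eta_b) ** 2))

print(l_rmn(3 ** 20, eps=0.1, delta=0.05, eta_b=0.2))   # about 14,257

def rmn_oracle(x, f, eta=0.2):
    # return (x, y) with the true label flipped with probability eta
    y = f(x)
    return (x, 1 - y) if random.random() < eta else (x, y)

Compare the printed value with the noise-free consistent case for the same |H|, ε, and δ: the factor 1/(1 - 2η_b)² quantifies the extra samples the noise costs.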

(2) Malicious Noise (MN)

Malicious noise occurs when the learning algorithm, with probability 1 - η, gets correct samples but, with probability η, the oracle returns noisy data, which may be chosen by a powerful malicious adversary. No assumption is made about the corrupted data, and the nature of the noise is also unknown. Valiant (1985) first simulated this situation of learning from MN. Kearns and Li (1993) further analyzed this worst-case model of noise and presented some general methods that any learning algorithm can apply to bound the error rate, and they showed that learning-with-noise problems are equivalent to standard combinatorial optimization problems. Additional work can be found in Bshouty (1998), Cesa-Bianchi et al. (1999), and Decatur (1996, 1997).

(3) Malicious Misclassification Noise (MMN)

Malicious misclassification (labeling) noise is noise in which misclassification is the only possible corruption. The adversary can choose only to change the label y of the sample pair (x, y), with probability η, while no assumption is made about how y is changed. Sloan (1988) extended Angluin and Laird's (1988) result to this type of noise.

(4) Random Attribute Noise (RAN)

Random attribute noise is as follows. Suppose the instance space is {0,1}^n. For every instance x in a sample pair (x, y), each attribute x_i, 1 ≤ i ≤ n, is flipped to its complement independently and randomly with a fixed probability η. This kind of noise is called uniform attribute noise. In this case, the noise affects only the input instance, not the output label. Shackelford and Volper (1988) probed RAN for the problem of k-DNF expressions; a k-DNF is a disjunction of terms, where each term is a conjunction of at most k literals. Later, Bshouty et al. (2003) defined a noisy distance measure for function classes, which they proved gives the best possible learning style in an attribute noise case.

They also indicated that a concept class C is not learnable if this measure is small (relative to C and the attribute noise distribution D).

Goldman and Sloan (1995) extended the uniform attribute noise model to product random attribute noise, in which each attribute x_i is flipped with its own probability η_i, 1 ≤ i ≤ n. They demonstrated that if the algorithm focuses only on minimizing disagreements, this type of noise is nearly as harmful as malicious noise. They also proved that no algorithm can exist if the noise rates η_i (1 ≤ i ≤ n) are unknown and the noise rate is higher than 2ε (ε is the accuracy parameter in the PAC model). Decatur and Gennaro (1995) further proved that if each noise probability η_i (or an upper bound on it) is known, then a PAC algorithm may exist for the simple classification problem.
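
Product random attribute noise is straightforward to simulate; each bit is flipped with its own rate η_i (the bit vector and per-attribute rates below are hypothetical):

import random
random.seed(4)

def attribute_noise(x, rates):
    # product random attribute noise: bit i flips with its own probability eta_i
    return [1 - b if random.random() < r else b for b, r in zip(x, rates)]

x = [1, 0, 1, 1, 0, 0]
rates = [0.05, 0.20, 0.00, 0.50, 0.10, 0.30]   # hypothetical per-attribute rates
print(attribute_noise(x, rates))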

2.5.3 Learning from Statistical Query

The Statistical Query (SQ) model introduced by Kearns (1993) provides a general framework for efficient PAC learning in the presence of classification noise. Kearns proved that if any function class can be learned efficiently in the SQ model, then it is also learnable in the PAC model; such algorithms are called SQ-typed. In the SQ model, the learning algorithm sends predicates χ(x, f(x)) to the SQ oracle and asks for the probabilities P_χ that the predicate is satisfied. Instead of answering with the exact probabilities, the oracle gives only probabilities P̂_χ within an allowed approximation error τ, which here indicates a tolerance for error, i.e., |P̂_χ - P_χ| ≤ τ.

The approach the SQ model suggests for generating noise-tolerant algorithms has been successful: a large number of noise-tolerant algorithms are formulated as SQ algorithms. Aslam and Decatur (1993) presented a general method to boost the accuracy of a weak SQ learning algorithm. A later study by Blum et al. (1994) proved that a concept class can be weakly learned with at least d^{1/3} queries, and that the upper bound on the number of queries is O(d). The SQ dimension d is defined as the number of "almost uncorrelated" concepts in the concept class. Jackson (2003) further improved the lower bound to 2^n while learning the class of parity functions on an n-bit input space.

However, the SQ model has its limitations. Blumer et al. (1989) proved that there exists a class that cannot be efficiently learned by SQ but is actually efficiently learnable. Kearns (1993) showed that the SQ model cannot generate efficient algorithms for parity functions, which can be learned in a noiseless-data PAC model. Jackson (2003) later showed that noise-tolerant PAC algorithms derived from the SQ model cannot be guaranteed to be optimally efficient.

2.6 Learning with Queries

Angluin (1988) initiated the area of query learning. In the basic framework, the learner needs to identify an unknown concept f from some finite or countable concept space C of subsets of a universal set. The learner is allowed to ask specific queries about the unknown concept f to an oracle, which responds according to the query type. Angluin studied different kinds of queries, such as membership queries, equivalence queries, subset queries, and so forth. Different from the PAC model, which requires only an approximation to the target concept, query learning is a non-statistical framework, and the learner must identify the target concept exactly. An efficient algorithm and lower bounds are described in Angluin's research. Any efficient algorithm using equivalence queries in query learning can also be converted to satisfy the PAC criterion Pr[error(h) > ε] ≤ δ.
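
A simulated SQ oracle can be sketched as follows. The predicate, target, and tolerance are illustrative; a true SQ oracle is an abstraction rather than an empirical estimator, but by Hoeffding's inequality an estimate built from about 1/τ² samples is within τ of P_χ with high probability.

import random
random.seed(5)

def sq_oracle(chi, f, draw, tau):
    # estimate P_chi = Pr{chi(x, f(x)) = 1} to within tolerance tau
    trials = int(4 / tau ** 2)
    return sum(chi(x, f(x)) for x in (draw() for _ in range(trials))) / trials

n = 6
f = lambda x: x[0] & x[1]                      # hypothetical target
draw = lambda: [random.randint(0, 1) for _ in range(n)]
chi = lambda x, y: int(x[2] == 1 and y == 1)   # query: Pr{x3 = 1 and f(x) = 1}
print(sq_oracle(chi, f, draw, tau=0.02))       # true value is 1/8 = 0.125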

CHAPTER 3
DATABASE SECURITY-CONTROL METHODS

In this chapter, we survey important concepts and techniques in the area of database security, such as compromise of a database, inference, disclosure risk, and disclosure control methods, among other issues. According to the way that confidential data are released, we organize the review of database security methods into three parts: microdata, tabular data, and sequential queries to databases. Our main efforts concentrate on the security control of a special type of database, the statistical database (SDB), which accepts only limited types of queries sent by users. Basic SDB protection techniques in the literature are reviewed.

3.1 A Survey of Database Security

For many decades, computerized databases, designed to store, manage, and retrieve information, have been implemented successfully and widely in many areas, such as business, government, research, and health care organizations. Statistical organizations intend to provide database users with the maximum amount of information with the least disclosure risk for sensitive and confidential data. With the rapid expansion of the Internet, both the general public and the research community have become much more attentive to issues of database security. In the following sections, we introduce basic concepts and techniques commonly applied in a general database.

3.1.1 Introduction

A database consists of multiple tables. Each table is constructed of rows and columns representing entities (or records) and attributes (fields), respectively.

Some attributes may store confidential information, such as income, medical history, or financial status. Necessary security methods have been designed and applied to protect the privacy of specific data from outsiders or illegitimate users.

Database security has its own terminology for research purposes, so we first clarify certain important definitions and concepts. These are used repeatedly in this research and may have varied implications under different circumstances. When talking about the confidentiality, privacy, or security of a database, we refer to the disclosure risk of the confidential data. A compromise of the database occurs when the confidential information is disclosed to illegitimate users exactly, partially, or inferentially.

Based on the amount of compromised sensitive information, disclosure can be classified into exact disclosure and partial disclosure (Denning et al. 1979, Beck 1980). Exact disclosure, or exact inference, refers to the situation in which illegitimate users can infer the exact true confidential information by sending sequential queries to the database, while in the case of partial disclosure, the true confidential data can be inferred only to a certain level of accuracy.

Inferential disclosure, or statistical inference, is another type of disclosure, referring to the situation in which an illegitimate user can infer the confidential data, with a probability exceeding a disclosure threshold predetermined by the database administrator, by sending sequential queries to the database. This is known as an inference problem, which also falls within our research focus.

There are mainly two types of disclosure in terms of the disclosure objects: identity disclosure and attribute disclosure. Identity disclosure occurs if the identity of a subject is linked to any particular disseminated data record (Spruill 1983). Attribute disclosure implies that users could learn the attribute value, or an estimate of the attribute value, for the record (Duncan and Lambert 1989, Lambert 1993). Currently, most of the research focuses on identity disclosure.

3.1.2 Database Security Techniques

Database security concerns the privacy of confidential data stored in a database. Two fundamental tools are applied to prevent compromising a database (Duncan and Fienberg 1999): (1) restricting access and (2) restricting data. For example, a statistical office or the U.S. Census Bureau disseminating data to the public may enforce administrative policies to limit users' access to data. The common method is for the database administrator to assign IDs and passwords to different types of users to restrict access at different security levels. In a medical database, for example, doctors could have full access to all kinds of information while researchers may obtain only the non-confidential records. This security mechanism is referred to as restricting access.

When all users have the same level of access to the database, usually only transformed data are allowed to be released for the purpose of security. This protection approach, which falls in the data restriction category, reduces disclosure risk. However, for some public databases, access control alone is neither feasible nor sufficient to prevent inferential disclosure; thus, the two tools are complementary and may be used together. We prioritize our research in the second category, the data restriction approach.

Database privacy is also known as Statistical Disclosure Control (SDC) or Statistical Disclosure Limitation (SDL). SDC techniques, which are used to modify original confidential data before their release, try to balance the tradeoff between information loss (or data utility) and disclosure risk. Some measures evaluating the performance of SDC methods are discussed in Chapter 4.

Based on the way that data are released publicly, all responses to queries can be classified into three types: microdata files, tabular data files, and statistical responses to sequential queries to databases (Ms 2000). Most typical databases deal with all three dissemination formats. Our research focuses on a portion of the third category, sequential queries to a statistical database (SDB), which differs from a regular database due to its limited querying interface. Normally, only a few types of queries, such as SUM, COUNT, and MEAN, can be run against an SDB. The goal of applying disclosure control methods is to prevent users from inferring confidential data on the basis of those successive statistical queries. We briefly describe protection mechanisms for microdata and tabular data in the next two subsections, 3.1.3 and 3.1.4. Security-control techniques for the statistical database are discussed in detail in Section 3.2.
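
To make the restricted querying interface concrete, the following toy sketch (hypothetical throughout: the table, field names, and the minimum query-set size are illustrative, not taken from any system discussed here) answers only SUM, COUNT, and MEAN over a selected subset of records:

records = [{"name": "A", "salary": 52_000}, {"name": "B", "salary": 61_000},
           {"name": "C", "salary": 45_000}, {"name": "D", "salary": 70_000}]

MIN_QUERY_SET = 2      # refuse query sets small enough to single out a record

def query(kind, field, predicate):
    subset = [r[field] for r in records if predicate(r)]
    if len(subset) < MIN_QUERY_SET:
        raise ValueError("query set too small; refused")
    if kind == "SUM":
        return sum(subset)
    if kind == "COUNT":
        return len(subset)
    if kind == "MEAN":
        return sum(subset) / len(subset)
    raise ValueError("only SUM, COUNT, and MEAN are supported")

print(query("MEAN", "salary", lambda r: r["salary"] > 50_000))   # 61000.0

Even with such a query-set-size restriction, carefully chosen overlapping queries can still compromise individual records, which is exactly the inference problem this dissertation studies.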

3.1.3 Microdata Files

Microdata are unaggregated, unsummarized original sample data containing every anonymized individual record (such as a person or a business company) in the file. Normally, microdata come from the responses to census surveys issued by statistical organizations, such as the U.S. Census Bureau (see Figure 3-1 for an example), and include detailed information with many attributes (probably over 40), such as income, occupation, and household composition. The data are released in the form of flat tables, where rows and columns represent records and attributes for each individual respondent, respectively. Microdata can usually be read, manipulated, and analyzed by computers with statistical software. See Figure 3-1 for an example of microdata read into SPSS (Statistical Package for the Social Sciences).

[Figure 3-1: Microdata File That Has Been Read into SPSS. Data source: Indiana University Bloomington Libraries, Data Services & Resources, http://www.indiana.edu/~libgpd/data/microdata/what.html]

3.1.3.1 Protection techniques for microdata files

Before disseminating microdata files to the public, statistical organizations apply SDC techniques either to distort or to remove certain information from the original data files, thereby protecting the anonymity of individual records. The two generic types of microdata protection methods are (Crises 2004a):

(1) Masking methods

The basic idea of masking is to add errors to the elements of a dataset before the data are released. Masking methods fall into two categories: perturbative (see Crises 2004d for a survey) and non-perturbative (see Crises 2004c for a survey).

The perturbative category modifies the original microdata before release. It includes methods such as adding noise (Sullivan 1989, Brand 2002, Domingo-Ferrer et al. 2004), rounding (Willenborg 1996 and 2000), microaggregation (Defays and Nanopoulos 1993, Anwar 1993, Mateo and Domingo 1999, Domingo and Mateo 2002, Li et al. 2002b, Hansen and Mukherjee 2003), data swapping (Dalenius and Reiss 1982, Reiss 1984, Feinberg 2000, Fienberg and McIntyre 2004), and others.

The non-perturbative category does not change the data but makes partial suppressions or reductions of detail in the microdata set, applying methods such as sampling, suppression, and recoding (DeWaal and Willenborg 1995, Willenborg 1996 and 2000).

Tables 3-1 and 3-2 are simple illustrations of three masking methods: data swapping, additive noise, and microaggregation (data source: Domingo-Ferrer and Torra 2003). First, microaggregation is used to group "Divorced" and "Widow" into one category, "Widow/er-or-divorced," in the field "Marital Status." Second, the values of record 3 and record 5 in the "Age" column are switched by applying data swapping. Finally, the value of record 4 in the "Age" attribute is perturbed from "36" to "40" by adding noise of "4".

Table 3-1: Original Records

Record  Illness       ...  Sex  Marital Status  Town       Age
1       Heart         ...  M    Married         Barcelona  33
2       Pregnancy     ...  F    Divorced        Tarragona  40
3       Pregnancy     ...  F    Married         Barcelona  36
4       Appendicitis  ...  M    Single          Barcelona  36
5       Fracture      ...  M    Single          Barcelona  33
6       Fracture      ...  M    Widow           Barcelona  81

Table 3-2: Masked Records

Record  Illness       ...  Sex  Marital Status        Town       Age
1       Heart         ...  M    Married               Barcelona  33
2       Pregnancy     ...  F    Widow/er-or-divorced  Tarragona  40
3       Pregnancy     ...  F    Married               Barcelona  33
4       Appendicitis  ...  M    Single                Barcelona  40
5       Fracture      ...  M    Single                Barcelona  36
6       Fracture      ...  M    Widow/er-or-divorced  Barcelona  81
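
The three masking operations above are easy to express in code. A minimal sketch over the Age and Marital Status fields of Table 3-1 (the fixed +4 perturbation mirrors the table; a real scheme would draw the noise from a distribution):

ages = {1: 33, 2: 40, 3: 36, 4: 36, 5: 33, 6: 81}
marital = {1: "Married", 2: "Divorced", 3: "Married",
           4: "Single", 5: "Single", 6: "Widow"}

# microaggregation-style recoding: merge two sparse categories into one
masked_marital = {k: ("Widow/er-or-divorced" if v in ("Divorced", "Widow") else v)
                  for k, v in marital.items()}

# data swapping: exchange the Age values of records 3 and 5
masked_ages = dict(ages)
masked_ages[3], masked_ages[5] = masked_ages[5], masked_ages[3]

# additive noise: perturb record 4's Age by +4
masked_ages[4] += 4

print(masked_marital)
print(masked_ages)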

PAGE 37

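
The three masking steps just described can be reproduced with a short Python sketch (ours, for illustration only; the record set mirrors Table 3-1):

records = [
    {"id": 1, "marital": "Married",  "age": 33},
    {"id": 2, "marital": "Divorced", "age": 40},
    {"id": 3, "marital": "Married",  "age": 36},
    {"id": 4, "marital": "Single",   "age": 36},
    {"id": 5, "marital": "Single",   "age": 33},
    {"id": 6, "marital": "Widow",    "age": 81},
]

# Step 1 -- microaggregation/recoding: merge two categories into one group
for r in records:
    if r["marital"] in ("Divorced", "Widow"):
        r["marital"] = "Widow/er-or-divorced"

# Step 2 -- data swapping: exchange the Age values of records 3 and 5
records[2]["age"], records[4]["age"] = records[4]["age"], records[2]["age"]

# Step 3 -- additive noise: perturb record 4's Age by +4 (36 -> 40)
records[3]["age"] += 4

for r in records:
    print(r)   # matches the masked records of Table 3-2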

(2) Synthetic data generation

Liew et al. (1985) first proposed this protection approach, which identifies the underlying density function, with associated parameters, of the confidential attribute and then generates a protected dataset by randomly drawing from that estimated density function. Even though the data generated by this method do not derive from the original data, they preserve some statistical properties of the original distributions. However, the utility of such simulated data for the user has always been an issue. See Crises (2004b) for an overview of this method.

3.1.4 Tabular data files

Another common way to release data is the tabular data format (also called macrodata), obtained by aggregating microdata (Willenborg 2000). It is also called summary data, table data, or compiled data. The numeric data are summarized into certain units or groups, such as geographic area, racial group, industry, age, or occupation. Depending on the aggregation process, published tables can be classified into several types, such as magnitude tables, frequency count tables, and linked tables.

3.1.4.1 Protection techniques for tabular data

Tabular data files collect data at a higher level of aggregation, since they summarize individual atomic information. They therefore provide more security for a database than microdata files do. However, the disclosure risk is not completely eliminated, and intruders can still infer confidential data from an aggregated table (see Tables 3-3 and 3-4 for an example).
Protection techniques, such as cell suppression (Cox 1975, 1980, Malvestuto et al. 1991, Kelly et al. 1992, Chu 1997), table redesign, noise addition, rounding, and swapping, among others, have to be adopted before release. See Sullivan (1992), Willenborg (2000), and Oganian (2002) for an overview.

Table 3-3 illustrates tabular data. It shows state-level data for various types of food stores; the Economic Division published the economic data by geography and standard industrial classification (SIC) codes. The "Value of Sales" field is considered confidential. Table 3-4 demonstrates how the cell suppression technique is applied to protect the confidential data. (Data source: U.S. Bureau of the Census Statistical Research Division, Sullivan 1992.)

Table 3-3: Original Table

SIC                   ...  Number of Establishments  Value of Sales ($)
54  All Food Stores   ...  347                       200,900
541 Grocery           ...  333                       196,000
542 Meat and Fish     ...  11                        1,500
543 Fruit Stores      ...  2                         2,400
544 Candy             ...  1                         1,000

Table 3-4: Published Table After Applying Cell Suppression

SIC                   ...  Number of Establishments  Value of Sales ($)
54  All Food Stores   ...  347                       200,900
541 Grocery           ...  333                       196,000
542 Meat and Fish     ...  11                        1,500
543 Fruit Stores      ...  2                         D
544 Candy             ...  1                         D

Only one Candy store reported a sales value for this state in Table 3-3. If the table were released as is, any user would learn the exact sales value for this specific store. A sales value is also listed for the two Fruit stores in this state, so by knowing its own sales figure, either of these two stores could infer its competitor's sales volume. A disclosure occurs in either situation. Thus, SDC methods have to be applied to the original table before its publication. Table 3-4 shows the compromising confidential cells suppressed and replaced by a "D". The technique applied is called cell suppression, which is commonly used by the U.S. Census Bureau today.
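
A minimal Python sketch of the primary suppression step follows. The threshold rule used here (suppress any value cell with fewer than three contributing establishments) is our illustrative assumption; statistical agencies rely on formal sensitivity rules such as the dominance and p%-rules discussed in Chapter 4, and must additionally suppress complementary cells so that suppressed values cannot be recovered from totals:

table = [
    ("54  All Food Stores", 347, 200_900),
    ("541 Grocery",         333, 196_000),
    ("542 Meat and Fish",    11,   1_500),
    ("543 Fruit Stores",      2,   2_400),
    ("544 Candy",             1,   1_000),
]

THRESHOLD = 3  # minimum number of contributors for a publishable cell

for sic, n_estab, sales in table:
    value = f"{sales:,}" if n_estab >= THRESHOLD else "D"  # "D" = suppressed
    print(f"{sic:<22} {n_estab:>5} {value:>10}")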

3.2 Statistical Database

3.2.1 Introduction

A statistical database (SDB) differs from a regular database due to its limited querying interface: its users can retrieve only aggregate statistics of confidential attributes, such as SUM, COUNT, and MEAN, for subsets of the records stored in the database. Those aggregate statistics are calculated from tables in the database, which may hold microdata or tabular data. In other words, query responses in an SDB can be treated as views of microdata or tabular data tables; however, those views answer only limited types of queries, and the aggregate statistics are computed per query. An SDB is compromised if sensitive data are disclosed through the answers to a set of queries. Note that some of the protection methods used for SDBs overlap with those for microdata and tabular data files; SDB security methods, however, emphasize preventing disclosure through responses to sequential queries.

Many government agencies, businesses, and research institutions collect and analyze aggregate data for their own purposes. For instance, medical researchers may need to know the total number of HIV-positive patients within a certain age range and gender. Users should not be allowed to link the sensitive information to any specific record in the SDB by asking sequential statistical queries. We illustrate how a
statistical database could be compromised with the following example, and thereby explain the necessity of applying statistical disclosure control methods before data are released.

3.2.2 An Example: The Compromise of a Statistical Database

Adam and Wortmann (1989) described three basic types of authorized users of a statistical database: non-statistical users, who access the database to send queries and update data; researchers, who are authorized to receive only aggregate statistics; and snoopers (attackers, adversaries), who seek to compromise the database. The purpose of database security is to provide researchers with useful information while limiting the disclosure risk posed by attackers.

For instance (example from Adam and Wortmann 1989 and Garfinkel et al. 2002), a hospital database (see Table 3-5) that provides aggregate statistics to outsiders contains one confidential field, HIV status, denoted by "1" for positive and "0" otherwise. Suppose a snooper knows that Cooper, who works for company D, is a male under the age of 30, and attempts to find out whether Cooper is HIV-positive. He issues the following queries:

Query 1: SUM = (Sex=M) & (Company=D) & (Age<30);
Query 2: SUM = (Sex=M) & (Company=D) & (HIV=1) & (Age<30);

The response to Query 1 is 1, and the response to Query 2 is 1. Neither query is a threat to database privacy individually. Put together, however, they allow the attacker, who knows Cooper's personal information, to locate Cooper from Query 1's answer and immediately infer from Query 2's answer that Cooper is HIV-positive. The confidential data are disclosed, and we refer to this case as a compromise of the database.

From this example we can see that a snooper is able to infer true confidential data by analyzing the aggregate statistics returned by sequential queries. Security mechanisms therefore have to be established prior to data release.

Table 3-5: A Hospital's Database (data source: in part from Garfinkel et al. 2002)

Record  Name       Job      Age  Sex  Company  HIV
1       Daniel     Manager  27   F    A        0
2       Smith      Trainee  42   M    B        0
3       Jane       Manager  63   F    C        0
4       Mary       Trainee  28   F    B        1
5       Selkirk    Manager  57   M    A        0
6       Daphne     Manager  55   F    B        0
7       Cooper     Trainee  21   M    D        1
8       Nevins     Trainee  32   M    C        1
9       Granville  Manager  46   M    C        0
10      Remminger  Trainee  36   M    D        1
11      Larson     Manager  47   M    B        1
12      Barbara    Trainee  38   F    D        0
13      Early      Manager  64   M    A        1
14      Hodge      Manager  35   M    B        0
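
The attack is easy to reproduce in code. The following Python sketch (ours, for illustration) runs the two queries against the data of Table 3-5:

db = [
    # (name, age, sex, company, hiv)
    ("Daniel", 27, "F", "A", 0), ("Smith", 42, "M", "B", 0),
    ("Jane", 63, "F", "C", 0), ("Mary", 28, "F", "B", 1),
    ("Selkirk", 57, "M", "A", 0), ("Daphne", 55, "F", "B", 0),
    ("Cooper", 21, "M", "D", 1), ("Nevins", 32, "M", "C", 1),
    ("Granville", 46, "M", "C", 0), ("Remminger", 36, "M", "D", 1),
    ("Larson", 47, "M", "B", 1), ("Barbara", 38, "F", "D", 0),
    ("Early", 64, "M", "A", 1), ("Hodge", 35, "M", "B", 0),
]

def count(pred):
    """COUNT query: number of records satisfying the predicate."""
    return sum(1 for row in db if pred(row))

q1 = count(lambda r: r[2] == "M" and r[3] == "D" and r[1] < 30)
q2 = count(lambda r: r[2] == "M" and r[3] == "D" and r[1] < 30 and r[4] == 1)

print(q1, q2)  # 1 1 -> the single matching record (Cooper) is HIV-positive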

3.2.3 Disclosure Control Methods for Statistical Databases

Some basic security control methods for microdata and tabular data were summarized in the previous sections. In this section we concentrate on security control methods for statistical databases; some methods used for microdata and tabular data can also be utilized here. Adam and Wortmann (1989) conducted a comprehensive survey of security techniques for statistical databases. They classified all security methods for SDBs into four categories: conceptual, query restriction, data perturbation, and output perturbation. In addition, Adam and Wortmann provided five criteria for evaluating the performance of security mechanisms. Our literature review follows suit and discusses the major security control methods in the following sections.

Figure 3-2: Three Approaches in Statistical Database Security. A) Query Restriction, B) Data Perturbation, and C) Perturbed Responses. (Data source: Adam and Wortmann 1989.)

Figure 3-2 illustrates the three approaches. Figure 3-2A shows how the query restriction method works: the technique either returns exact answers to the user or refuses to respond at all. Figure 3-2B introduces the data perturbation method, which creates a perturbed SDB from the original SDB and answers all queries from it, so the user receives only perturbed responses. The output perturbation method is shown in Figure 3-2C: each query answer is modified before being sent back to the user.

3.2.3.1 Conceptual approach

The conceptual approach includes two basic models: the Conceptual model and the Lattice model. The Conceptual model, proposed by Chin and Ozsoyoglu (1981, 1982), addresses security issues at the conceptual data model level, where users access only entities with common attributes and their statistics. The Lattice model, developed by Denning (1983) and Denning and Schlorer (1983), retrieves data from SDBs in tabular form at different aggregation levels. Both models provide a fundamental framework for understanding and analyzing SDB security problems, but neither seems practical at the implementation level.

3.2.3.2 Query restriction approach

Based on the user's query history, the SDB either provides the exact answer or declines the query (see Figure 3-2A). The five major methods in this approach are listed below; a code sketch of the first and third follows the list.

(1) Query-set-size control (Hoffman and Miller 1970, Fellegi 1972, Schlorer 1975 and 1980, Denning et al. 1979, Schwartz et al. 1979, Denning and Schlorer 1980, Friedman and Hoffman 1980, Jonge 1983). This method releases the data only if the query-set size (the number of records included in the query response) meets specific conditions.

(2) Query-set-overlap control (Dobkin et al. 1979). This mechanism builds on query-set-size control and further examines the entities that overlap across successive queries.

(3) Auditing (Schlorer 1976, Hoffman 1977, Chin and Ozsoyoglu 1982, Chin et al. 1984, Brankovic et al. 1997, Malvestuto and Moscarini 1998, Kleinberg et al. 2000, Li et al. 2002a, Malvestuto and Mezzini 2003).
This technique keeps a record of the queries issued by each user and, before answering a new query, checks whether the response could lead to a disclosure of the confidential data.

(4) Partitioning (Yu and Chin 1977, Chin and Ozsoyoglu 1979, 1981, Schlorer 1983). This method groups all entities into a number of disjoint subsets, and queries are answered on the basis of those subsets instead of the original data.

(5) Cell suppression (Cox 1975, 1980, Denning et al. 1982, Sande 1983, Malvestuto and Moscarini 1990, Kelly et al. 1992, Malvestuto 1993). The basic idea of this technique is to suppress all cells that may result in the compromise of the SDB.

Some methods in this category have so far proved either inefficient or infeasible. For instance, a statistical database normally contains a large number of records, and a traditional auditing method then becomes impractical because of its requirements for large memory storage and strong computing power. The most promising method among them is the cell suppression technique, which has been implemented successfully by the U.S. Census Bureau and is widely adopted in the real world.
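
The following Python sketch illustrates query-set-size control combined with a deliberately naive audit rule; the parameter K, the data, and the overlap test are our simplifying assumptions for illustration, not a production design:

K = 2            # minimum query-set size (and n - K the maximum)
answered = []    # audit log of previously answered query sets

def sum_query(query_set, data, n):
    if not (K <= len(query_set) <= n - K):
        return "DENIED (query-set-size control)"
    for prev in answered:
        # the difference of two answered SUMs would expose a lone record
        if len(prev ^ query_set) == 1:
            return "DENIED (audit: overlap isolates one record)"
    answered.append(set(query_set))
    return sum(data[i] for i in query_set)

data = [5, 3, 8, 2, 7, 4]              # a confidential numeric column
print(sum_query({0, 1, 2}, data, 6))   # answered: 16
print(sum_query({0, 1}, data, 6))      # denied: would isolate record 2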

3.2.3.3 Data perturbation approach

In this approach, a dedicated perturbed database is constructed once and for all by altering the original database, and it is used to answer users' queries (see Figure 3-2B). According to Adam and Wortmann (1989), all methods fall into two categories:

(1) Probability distribution. This category treats the SDB as a sample drawn from some distribution. The original SDB is replaced either by another sample coming from the same distribution or by the distribution itself (Lefons et al. 1983). Techniques in this category include data swapping (Reiss 1984), multidimensional transformation of attributes (Schlorer 1981), data distortion by probability distribution (Liew et al. 1985), and others.

(2) Fixed data perturbation. This category includes some of the most successful database protection mechanisms. It can be realized by either an additive or a multiplicative technique (Muralidhar et al. 1995, 1999). An additive technique (Muralidhar et al. 1999) adds noise to the confidential data; multiplicative data perturbation (Muralidhar et al. 1995) protects the sensitive information by multiplying the original data by a random variable with a mean of 1 and a prespecified variance. Our study focuses on additive data perturbation, which we classify into two types in our research: random data perturbation and variable data perturbation. We introduce these two methods separately in Chapter 5.

3.2.3.4 Output perturbation approach

Output perturbation is also called query-based perturbation. The response to each query is first computed from the original database and then perturbed before being returned (see Figure 3-2C). Three methods belong to this approach:

(1) The random-sample queries technique was proposed by Denning (1980); later, Leiss (1982) suggested a variant of Denning's method. The basic rationale is that the query response is calculated from a randomly selected sample of the query set, chosen from the original query set subject to specific conditions. However, an attacker may compromise the confidential information by repeating the same query and averaging the results.

(2) Varying-output perturbation (Beck 1980) works for SUM, COUNT, and PERCENTILE queries. This method assigns a varying perturbation to the data used to compute the response statistic.

(3) Rounding includes three types of output perturbation: systematic rounding (Achugbue and Chin 1979), random rounding (Fellegi and Phillips 1974, Haq 1975, 1977), and controlled rounding (Dalenius 1981). This technique computes the query answer from the unperturbed data and then rounds it up or down to the nearest multiple of a base number set by the database administrator (DBA). Query results do not change when the same query is repeated, which provides good protection against averaging attacks.

In this chapter we summarized different types of database security-control methods. For a specific database, one SDC method can be more effective and efficient than another; how to select the most suitable security method is therefore a critical issue in database privacy. We review various performance measurements for SDC in the next chapter.

CHAPTER 4
INFORMATION LOSS AND DISCLOSURE RISK

Chapter 3 provided an overview of the important SDC methods applied to protect the privacy of a database. However, since SDC methods reach their goals by transforming the original data, users of the database obtain only approximate results from the modified data. A fundamental issue that every statistical organization has to address, therefore, is how to protect confidential data maximally while providing database users with as much useful and accurate information as possible. In this chapter we review the main performance measurements for SDC methods. These assessments evaluate the information loss (used interchangeably with data utility) and the disclosure risk of a database, and they have become standard criteria for choosing appropriate protection techniques for SDBs.

4.1 Introduction

All SDC methods attempt to optimize two conflicting goals:

(1) Maximizing the data utility, or minimizing the information loss, for legitimate data users.

(2) Minimizing the disclosure risk of the confidential information that data organizations take on by publishing the data.

Efforts to obtain greater protection usually reduce the quality of the released data, so database administrators always seek to optimize the tradeoff between information loss and disclosure risk. The definitions of information loss and disclosure risk are as follows:

Information Loss (IL) refers to the loss of data utility after release. It measures the damage that the application of SDC methods does to data quality for legitimate users.

Disclosure Risk (DR) refers to the risk that confidential information in the database is disclosed. It measures how dangerous it is for a statistical organization to publish the modified data.

The problem that statistical organizations constantly confront is how to choose an appropriate SDC method, with suitable parameters, from among many potential protection mechanisms; the selected mechanism should minimize both the disclosure risk and the information loss. One of the best solutions is to rely on performance measures to evaluate the suitability of different SDC techniques for the database. Well-designed performance criteria quantifying information loss and disclosure risk are therefore desirable and necessary.

4.2 Literature Review

Designing good performance measures is challenging because different users collect data for different purposes, and organizations define disclosure risk to different extents. Many performance assessment methods exist in the literature; based on their properties, we divide these measurement techniques into five categories in our research:

(1) Information loss measures for specific protection methods.

This type of measurement assesses how much the masked (modified) data differ from the original data after a specific protection method has been applied; see Willenborg and Waal (2000) and Oganian (2002) for examples. If the variances of the original microdata are critical for the user, then the information loss can be estimated as
$\mathrm{IL} = \operatorname{Var}(\widehat{\mathrm{data}}_{\mathrm{masked}}) - \operatorname{Var}(\widehat{\mathrm{data}}_{\mathrm{original}})$,

where $\widehat{\mathrm{data}}_{\mathrm{original}}$ is a consistent estimator computed from the original data and $\widehat{\mathrm{data}}_{\mathrm{masked}}$ is the corresponding estimator computed from the modified data. As this criterion shows, such a measurement depends on a specific intended use of the data, such as means or variances.

(2) Generic information loss measures for different protection methods.

A generic information loss measure, not limited to any particular data use, is designed to compare different protection methods. Two well-known general information loss measures are as follows.

Shannon's entropy, discussed in Kooiman et al. (1998) and Willenborg and Waal (2000), can be applied to any SDC technique to define and quantify information loss. This measurement models masking as noise added to the original dataset, which is then sent through a noisy channel; the receiver of the noisy data tries to reconstruct the probability distribution of the original data. The entropy of this probability distribution measures the uncertainty about the original data that remains after the masked data are released. An entropy-based measurement is not an ideal criterion, however, since it ignores the impact of covariances and means, and whether those two statistics are properly preserved directly affects the validity and quality of the altered data.
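
The entropy idea can be illustrated numerically. In the following Python sketch, the released value of a record induces a probability distribution over the possible original values, and the Shannon entropy of that distribution quantifies the remaining uncertainty; the posterior distributions are invented for illustration:

from math import log2

def shannon_entropy(dist):
    """H(p) = -sum(p_i * log2(p_i)), in bits."""
    return -sum(p * log2(p) for p in dist if p > 0)

# Posterior over an original categorical value, given the released value:
no_masking    = [1.0, 0.0, 0.0, 0.0]      # the release reveals the value
light_masking = [0.7, 0.1, 0.1, 0.1]
heavy_masking = [0.25, 0.25, 0.25, 0.25]  # the release reveals nothing

for name, dist in [("none", no_masking), ("light", light_masking),
                   ("heavy", heavy_masking)]:
    print(f"{name:>5}: H = {shannon_entropy(dist):.3f} bits")
# Heavier masking leaves more uncertainty (higher entropy), i.e., more
# information loss for the legitimate user.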

Another measurement, by Domingo-Ferrer et al. (2001) and Oganian (2002), suggests that IL is small if the original and masked data have a similar analytical structure (though the disclosure risk is then higher). This method compares statistics, such as the mean square error, mean absolute error, and mean variation, calculated from the differences between the covariance matrices, coefficient matrices, correlation matrices, etc. of the original and the modified data.

(3) Disclosure risk measures for specific protection methods.

The disclosure risk also affects the quality of an SDC method. Compared with IL measures, DR measures are more method-specific. The idea of assessing disclosure risk was initially proposed by Lambert (1993). Later, different DR measures were developed for particular SDC methods: for sampling methods by Chen and Keller-McNulty (1998), Samuel (1998), Skinner et al. (1994), and Truta et al. (2004), and for microaggregation masking methods by Jaro (1989) and Pagliuca and Seri (1998).

(4) Generic disclosure risk measures for different protection methods.

The two main types of general DR measurements are applied to assess the quality of different protection methods for tabular data. The first is sensitivity rules, used to estimate DR prior to the publication of data tables. There are three such rules: the $(n,k)$-dominance rule, the $p\%$-rule, and the $pq$-rule (Felso et al. 2001, Holvast 1999, Luige and Meliskova 1999). In contrast to the dominance rule, which is criticized for failing to reflect the disclosure risk properly, a new a priori measure is proposed by Oganian (2002), who also introduced a posterior DR measure that takes the modified data into account and operates after the SDC method has been applied.

A method based on canonical correlation analysis was introduced by Sarathy and Muralidhar (2002) to evaluate the security level of different SDC methods. This methodology can also be used to select an appropriate inference control method; for more details, refer to Sarathy and Muralidhar (2002).

(5) Generic performance measures that encompass both disclosure risk and information loss for different protection methods.

A sound SDC method should achieve an optimal tradeoff between disclosure risk and information loss, so a joint framework is desirable for examining the tradeoffs and comparing the performance of distinct SDC methods. Two popular performance measures in the literature are score construction and the R-U confidentiality map.

Score construction, proposed by Domingo-Ferrer and Torra (2001), ranks different SDC methods based on scores obtained by averaging their information loss and disclosure risk measures. For example (Crises 2004e),

$\mathrm{Score}(V, V') = \dfrac{\mathrm{IL}(V, V') + \mathrm{DR}(V, V')}{2}$,

where $V$ is the original data, $V'$ is the modified data, and IL and DR are information loss and disclosure risk measures, respectively. Refer to Crises (2004e), Domingo-Ferrer et al. (2001), Sebe et al. (2002), and Yancey et al. (2002) for more examples.

The R-U confidentiality map, first proposed by Duncan and Fienberg (1999), constructs a general analytical framework for information organizations to trace the tradeoff between disclosure risk and data utility. It was further developed by Duncan et al. (2001, 2004) and Gomatam et al. (2004); Trottini and Fienberg (2002) later illustrated two examples of R-U maps, and an application is given in Boyen et al. (2004). Database administrators can select the most appropriate SDC method from the R-U map by observing the influence of a particular method with a given parameter
choice. See Figure 4-1 (data source: Trottini and Fienberg 2002) for an example.

Figure 4-1: R-U Confidentiality Map, Univariate Case ($\sigma^2 = 10$, $n = 5$, $\delta^2 = 2$).

$M_0$, $M_1$, and $M_2$, represented by a diamond, a circle, and a dashed line in the figure, indicate three types of SDC methods: trivial microaggregation, microaggregation, and the combination of additive noise and microaggregation, respectively. The disclosure risk and the data utility are functions determined by the data size $n$, the known variance (prior belief) $\sigma^2$, the known population variance $\delta^2$, and the standard deviation $r$ of the noise added to the original data. The y-axis measures the disclosure risk, while the x-axis measures the data utility. For example, if the database administrator intends to keep the disclosure risk below 0.5, Figure 4-1 shows that the appropriate SDC method satisfying this requirement is $M_2$, the mixed strategy of additive noise plus microaggregation; from the x-axis, the corresponding data utility is 2.65. The choice of $r$ also affects the R-U map: if $r$ is large, the mixed strategy $M_2$ comes close to releasing no data at all, while as $r$ is chosen close to zero,
$M_2$ becomes equivalent to the microaggregation method with a specific parameter. In Figure 4-1, $r = 2.081$.

We do not differentiate between the measurements for microdata and for tabular data in this overview, since our research focuses on statistical databases. All the examples and methods mentioned above are applied to microdata, to tabular data, or to both.

CHAPTER 5
DATA PERTURBATION

This chapter provides an introduction to additive data perturbation methods. Based on the different ways of generating perturbative values, additive data perturbation methods are classified into three categories: random-data perturbation, fixed-data perturbation, and variable-data perturbation. The first category, random-data perturbation, with five types of perturbation methods, can be found in Kim (1986), Muralidhar et al. (1999), Sullivan (1989), Tendick (1991), and Tendick and Matloff (1994). Our proposed variable-data perturbation method is a new category that includes the interval protection technique given by Gopal et al. (1998, 2002) and Garfinkel et al. (2002). In both random-data perturbation and variable-data perturbation, a perturbed database is constructed by adding noise to the confidential data in the original database, and all query responses are computed from the perturbed database.

We also review an algorithm by Dinur and Nissim (2003) that establishes a bound for fixed-data perturbation, in which noise is added to each query response. This bound applies to both data perturbation and output perturbation methods; their work considers the tradeoff between the privacy and the usability of a statistical database. We end the chapter with our proposed approach to the database security problem.

5.1 Introduction

Our study focuses on additive noise perturbation methods, which are usually employed to protect confidential numerical data. Perturbation methods can guarantee the prevention of exact disclosure by adding noise to sensitive data; however, they are still
susceptible to partial disclosure and inferential disclosure (see Chapter 3 for definitions of exact, partial, and inferential disclosure). Two types of additive perturbation methods are described in the following sections, distinguished by how they generate noise. An algorithm by Dinur and Nissim (2003), which provides a theoretical basis for our study, is also reviewed, and our proposed research approach is discussed at the end of the chapter.

5.2 Random Data Perturbation

5.2.1 Introduction

Random Data Perturbation (RDP) is one of the most popular and practical data protection methods employed in statistical databases today. To effectively prevent statistical inference by a snooper, the DBA provides an appropriate level of security by distorting the sensitive data with random noise. The RDP method can ensure adequate protection of confidential information while satisfying legitimate users' needs for aggregate statistics of the database.

5.2.2 Literature Review

In the RDP method, a perturbed database is created by adding random noise to the confidential numerical attribute(s). We discuss four types of RDP summarized by Crises (2004) and describe a general RDP method given by Muralidhar et al. (1999).

Before walking through the different types of RDP methods, we first discuss the main disadvantage of data perturbation methods: RDP methods may introduce bias into statistical characteristics of the database, such as PERCENTILEs, conditional SUMs, and COUNTs. Matloff (1986) introduced this concept of bias, which occurs when the responses to certain queries computed from the perturbed database differ from
the responses computed from the original database. Four types of bias, A, B, C, and D, are defined and analyzed by Muralidhar et al. (1999). Type A bias occurs when a change in variance alters the summary measures of a perturbed attribute. Type B bias occurs when the perturbation distorts the relationships between confidential attributes. Type C bias occurs when the perturbation changes the relationships between confidential and non-confidential attributes. Type D bias occurs when the underlying distribution of the perturbed database cannot be determined, because the original database or the noise term has a non-multivariate-normal distribution. Improved perturbation methods are designed to avoid bias (Matloff 1986, Tendick 1991, Tendick and Matloff 1994, Muralidhar et al. 1995); the General Additive Data Perturbation (GADP) method, proposed by Muralidhar et al. (1999), removes all of these types of bias from additive perturbation methods completely. GADP is described later in this section.

(1) Masking by uncorrelated noise addition

This method is also called the Simple Additive Data Perturbation (SADP) method (Muralidhar et al. 1999). The vector of confidential values $d_m$, representing the $m$-th attribute of the original database of $n$ records, is replaced by the vector

$y_m = d_m + e_m$,

where each element of the noise term $e_m$ is drawn from a normally distributed random variable $\epsilon_m \sim N(0, \sigma_{\epsilon_m}^2)$. Each noise term is generated independently of the others, so that $\operatorname{Cov}(\epsilon_i, \epsilon_j) = 0$ for all $i \neq j$. The variances of $\epsilon_m$ are generally assumed
proportional to those of the original vector $d_m$; that is, if the variance of $d_m$ is $\sigma_m^2$, then $\sigma_{\epsilon_m}^2 = \alpha \sigma_m^2$. The distribution of $\epsilon_m$ and the parameter $\alpha$ are decided by the DBA. This perturbation method introduces Type A, B, and C bias.

(2) Masking by correlated noise addition

This method, proposed by Kim (1986) and Tendick (1991), uses correlated noise to perturb the database. It is also called the Correlated-Noise Additive Data Perturbation (CADP) method. Here the noise is drawn as $\epsilon \sim N(0, V_\epsilon)$, with the covariance matrix of the errors proportional to the covariance matrix $V$ of the original data, $V_\epsilon = \alpha V$, so that the covariance matrix of the perturbed data is

$V_y = V + V_\epsilon = (1 + \alpha) V$.

The CADP method generates Type A and Type C bias.

(3) Masking by noise addition and linear transformations

In Kim (1986), Tendick and Matloff (1994), Crises (2004), and Muralidhar et al. (1999), masking by correlated noise addition is modified with additional linear transformations to eliminate certain types of bias, so that the sample covariance matrix of the masked data is an unbiased estimator of the covariance matrix of the original data. This method is also named Bias-Corrected Correlated-Noise Additive Data Perturbation (BCADP) and results only in Type C bias.
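
The contrast between uncorrelated and correlated noise addition is easy to demonstrate numerically. The following numpy sketch (the data and the choice of alpha = 0.2 are ours, for illustration) shows the Type B bias of SADP, which attenuates the correlation between two confidential attributes, while CADP preserves it:

import numpy as np

rng = np.random.default_rng(0)
mean = [50.0, 100.0]
V = np.array([[25.0,  60.0],
              [60.0, 400.0]])     # corr(d1, d2) = 60 / (5 * 20) = 0.6
D = rng.multivariate_normal(mean, V, size=5000)  # original confidential data
alpha = 0.2                                      # DBA's noise-variance ratio

# SADP: independent noise per attribute, Var(e_m) = alpha * Var(d_m)
e_sadp = rng.normal(0.0, np.sqrt(alpha * D.var(axis=0)), size=D.shape)

# CADP: multivariate normal noise with covariance alpha * V
e_cadp = rng.multivariate_normal([0.0, 0.0], alpha * V, size=len(D))

for name, Y in [("original", D), ("SADP", D + e_sadp), ("CADP", D + e_cadp)]:
    print(name, round(np.corrcoef(Y, rowvar=False)[0, 1], 3))
# SADP shrinks the correlation toward 0.6/(1 + alpha) = 0.5 (Type B bias);
# CADP keeps it near 0.6.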

(4) Masking by noise addition and nonlinear transformation

Sullivan (1989) proposed a complex algorithm (not discussed here) combining simple additive noise with a nonlinear transformation. This masking method applies to discrete attributes.

Muralidhar et al. (1999) introduced the General Additive Data Perturbation (GADP) method, a further improvement on the previous RDP methods. Suppose a database U with n records has a set C of confidential attributes and a set NC of non-confidential attributes. A perturbed database P, which alters only the attributes in C, is constructed on the basis of the original database U. The perturbation process preserves all statistical relationships, such as the mean values of C and measures of the covariance and canonical correlation between C and NC; the values in C for each record are generated from a multivariate normal distribution, and this process is repeated for all records. The GADP method guarantees that the statistical properties of and between all attributes are the same before and after perturbation, thereby eliminating all types of bias; hence GADP is called a bias-free RDP method. Comparing it with other perturbation methods empirically, Muralidhar et al. suggested that GADP provides the highest level of security and represents a general form of additive noise perturbation.

5.3 Variable Data Perturbation

5.3.1 CVC Interval Protection for Confidential Data

Gopal, Goes, and Garfinkel (1998) initiated the idea of interval protection for confidential information in a database and introduced the concept of interval disclosure. They developed three techniques, which they called Technique-LP, Technique-ELS, and Technique-RP, for various query types; the query types a user may ask are limited to SUM (COUNT), MEAN, MIN, and MAX for numerical data. This
method was further studied in Gopal et al. (2000). Later, Gopal et al. (2002) formally proposed the Confidentiality via Camouflage (CVC) interval protection technique, designed to answer ad hoc numerical statistical queries to an online database. Garfinkel et al. (2002, 2004) further extended this technique.

Garfinkel et al. (2002) explored the CVC technique for privacy protection of binary confidential data, answering only ad hoc COUNT queries (the same as SUM queries here); the extended technique is called Bin-CVC. Consider a database of $n$ records and, without loss of generality, assume it contains a single binary confidential field. The Bin-CVC technique introduces $s$ binary camouflage vectors $\mathcal{P} = \{P^1, \ldots, P^{s-1}, P^s\}$, which are used to camouflage, or hide, the true confidential vector $d = P^s$. Each camouflage vector is denoted $P^j = (p_1^j, \ldots, p_n^j)$. When a user asks a query $q$, an interval answer $I(q) = [l(q), u(q)]$ is returned, whose upper and lower bounds are computed as the maximum and minimum, over the camouflage vectors, of the sums over the record set of the query:

$u(q) = \max_j \sum_{i \in q} p_i^j$ and $l(q) = \min_j \sum_{i \in q} p_i^j$.

The true answer is guaranteed to lie inside the interval response: $\sum_{i \in q} d_i \in I(q)$.

Table 5-1: An Example Database (data source: Garfinkel et al. 2002)

Record  Name       Job      Age  Company  HIV
1       Jones      Manager  27   A        0
2       Smith      Trainee  42   B        0
3       Johnson    Manager  63   C        0
4       Andres     Trainee  28   B        1
5       Selkirk    Manager  57   A        0
6       Clark      Manager  55   B        0
7       Cooper     Trainee  21   D        1
8       Nevins     Trainee  32   C        1
9       Granville  Manager  46   C        0
10      Brady      Trainee  36   D        1
11      Larson     Manager  47   B        1
12      Remminger  Trainee  28   D        0
13      Early      Manager  64   A        1
14      Hodge      Manager  35   B        0

The HIV status field is a binary confidential field with 14 records (see Table 5-1). All query responses involving this sensitive field are computed from camouflage vectors generated by the Bin-CVC technique. Table 5-2 shows an example of camouflage vectors for this database, where vector $P^3$ is the true vector.

Table 5-2: The Example Database with Camouflage Vectors (data source: Garfinkel et al. 2002)

Record  P1  P2  P3 = d
1       1   0   0
2       0   1   0
3       1   0   0
4       0   0   1
5       0   1   0
6       1   0   0
7       0   0   1
8       0   0   1
9       0   1   0
10      0   0   1
11      0   0   1
12      1   0   0
13      0   0   1
14      0   1   0
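
The interval computation is easy to make concrete. The following Python sketch (ours, for illustration) answers a COUNT query from the camouflage vectors of Table 5-2; since the true vector is among them, the returned interval always contains the true answer:

P1 = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0]
P2 = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1]
P3 = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0]   # P3 = d, the true vector
vectors = [P1, P2, P3]

def interval_answer(q):
    """q is a set of 0-based record indices; returns [l(q), u(q)]."""
    sums = [sum(p[i] for i in q) for p in vectors]
    return min(sums), max(sums)

q = {3, 6, 7, 9}                   # records 4, 7, 8, 10 (0-based indices)
l, u = interval_answer(q)
true_answer = sum(P3[i] for i in q)
print(l, u, true_answer)           # 0 4 4 -- the truth lies inside [0, 4]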

Camouflage vectors are generated by a network algorithm, whose design, with disjoint paths that jointly construct the different camouflage vectors, is a critical step in the success of the Bin-CVC model. The network represents the $n$ records of the confidential field with variables $x_1, \ldots, x_n$, and all paths run from the source to the destination. The network is constructed using two parameters: $w$ gives the total number of paths, and $m$ is the number of paths consisting only of true-value edges. These determine the number of camouflage vectors, $s = \binom{w}{m}$. An illustration of the network construction for the example database (Table 5-1) using three camouflage vectors (Table 5-2) is shown in Figure 5-1.

Figure 5-1: Network with $m = 1$, $w = 3$ (data source: Garfinkel et al. 2002)

In the example database (Table 5-1), the 14 records of the confidential field are denoted by the variables $x_1, \ldots, x_{14}$. The parameter $w = 3$ indicates that 3 disjoint paths are constructed in the network, and $m = 1$ implies that all variables with true value 1 in the confidential field are assigned to one of the three paths; the variables representing the other records, with value zero, are assigned as evenly as possible to the remaining two paths. The total number of camouflage vectors is $s = \binom{3}{1} = 3$, every camouflage vector being a combination of $m$ edges chosen from the $w$ paths. Comparing with Table 5-2, camouflage vector $P^1$ has value one in records 1, 3, 6, and 12 and zero in the remaining records; accordingly, the network contains one path that includes only the variables $x_1, x_3, x_6, x_{12}$.

A performance measure $CB$ (for Column Balancing), a function of $p^*$, $m$, and $w$, is employed to assess the quality of the networks obtained for a given database with different $w$ and $m$ values. The usefulness of each query answer is computed by the formula

$Z = 1 - \dfrac{u(q) - l(q)}{|q|}$ (reported as a percentage by multiplying by 100),

where $|q|$ denotes the cardinality of the query $q$, that is, the number of records involved in that query. The closer $Z$ is to 1.0, the better the query answer. The ideal network, which yields the tightest interval responses, has a small $s$, and every camouflage vector has the same number of ones as the true confidential field; that is, $p^j = p^*$, where $p^j$ is the proportion of ones in $P^j$ and $p^*$ is the proportion of ones in $P^s = d$. This ideal structure is called "perfect column balancing"; see Table 5-2 for an example, where $p^1 = p^2 = 0.4$ and $p^* = 0.6$. A good $CB$ "increases the probability of (a) better query answer".

Bin-CVC is a very promising methodology for database privacy. However, instead of an exact answer it responds to each query with an interval, which reduces the data utility. We define the information loss of the CVC technique as the width of the interval, given by $e_q = u(q) - l(q)$.
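
Continuing the Bin-CVC sketch from Section 5.3.1, the usefulness score Z and the information loss e_q of a query can be computed as follows (illustrative code; interval_answer is the function defined in the earlier sketch):

def usefulness_and_loss(q):
    l, u = interval_answer(q)   # from the Bin-CVC sketch above
    z = 1.0 - (u - l) / len(q)  # closer to 1.0 = tighter, more useful answer
    return z, u - l             # (Z score, information loss e_q)

for q in [{3, 6, 7, 9}, {0, 1, 2, 3, 4, 5, 6}]:
    z, e_q = usefulness_and_loss(q)
    print(f"|q| = {len(q)}  Z = {z:.2f}  e_q = {e_q}")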

5.3.2 Variable-Data Perturbation

Inspired by the CVC technique, we propose a new data perturbation method: variable-data perturbation. Unlike random data perturbation, whose noise is drawn from a normal distribution $N(0, \sigma^2)$, variable-data perturbation is defined as a data perturbation method that modifies the confidential information by adding discrete noise generated by a parametrically driven algorithm, with parameters such as $w$ and $m$ in the CVC interval protection method. The perturbed database is created once and for all, and the algorithm can choose various parameters to produce different types of noise. We can view the output of the algorithm as if it were drawing values randomly from some distribution D with known parameters: a non-zero mean $\mu$ and a variance $\sigma^2$, both always finite. Each query answer is computed from the perturbed data.

A discrete random data perturbation method builds a perturbed database from which all query responses are computed; an output perturbation method does not alter the database, but perturbs query answers before they are returned to the user. The variable-data perturbation method is a hybrid of data perturbation and output perturbation: it generates noise for the confidential field, and the perturbed answer to each query involving sensitive data is calculated only from the perturbed confidential vector. We treat variable-data perturbation as a data perturbation method with query protection.

Consider the Bin-CVC technique as an example of variable-data perturbation. The network algorithm creates camouflage vectors to disguise the true confidential vector once and for all. Each query answer is an interval computed from the camouflage vectors, and the true answer is assured to be inside it. In a worst-case scenario, the noise or perturbation can be regarded as the difference between the lower and upper bounds of the interval, $e_q = u(q) - l(q)$, where $e_q$ is a discrete random variable.

We simulated the network algorithm on the example database of Garfinkel et al. (2002) (see Table 5-1) and computed the interval answers for all queries. Since the confidential vector in the database is a 14-bit binary string, the total number of queries
involving this binary vector is $2^{14}$. Figure 5-2 (A-D) shows four cases, with the parameters of the network algorithm set to (1) $w = 5$, $m = 2$; (2) $w = 7$, $m = 3$; (3) $w = 8$, $m = 5$; and (4) $w = 12$, $m = 6$. Among these networks, $w = 7$ and $m = 3$ creates perfect column balancing, and from the frequency of each noise value over all $2^{14}$ queries we obtain a noise distribution with mean $\mu = 3.302$ and variance $\sigma^2 = 1.379$, shown in Figure 5-2B.

Figure 5-2: Discrete Distributions of Perturbations from the Bin-CVC Network Algorithm. A) $w = 5$, $m = 2$; B) $w = 7$, $m = 3$; C) $w = 8$, $m = 5$; and D) $w = 12$, $m = 6$. (Each panel plots the frequency of each perturbation value over all $2^{14}$ queries.)

After the network is set up with parameters $w$ and $m$, the noise distribution D is fixed, and its mean $\mu$ and variance $\sigma^2$ are finite and known; Figure 5-2 illustrates this property. We intend to bound the noise $e_q$ drawn from D in terms of $\mu$ and $\sigma^2$. We will continue the discussion of how to estimate the mean $\mu$ and variance $\sigma^2$ in the next chapter.

5.3.3 Discussion

For Bin-CVC there is a conflict between the two performance measures, $CB$ and the $Z$-score: a high column balancing value, which indicates good protection of the whole database with some specific $w$ and $m$, cannot guarantee good query answers (i.e., high $Z$ values).

We say that interval disclosure, or interval inference, occurs when the maximum error of the snooper's estimate of the true confidential value is less than a tolerance threshold predetermined by the DBA. Exact inference can be treated as the special case of interval inference with an error of 0.

Gopal et al. (2002) state that the CVC technique completely eliminates exact disclosure and interval inference. However, Muralidhar et al. (2004) showed empirically that the CVC technique is sometimes vulnerable to interval inference: with a simple deterministic procedure, the snooper can sometimes compromise the database by shrinking the interval answers into a range narrower than the predetermined threshold. Suppose the $i$-th query is answered with the interval $[l_i, u_i]$. In their example, they show how a snooper can compute the midpoint $m_i = (l_i + u_i)/2$ and the half-width $w_i = (u_i - l_i)/2$ of the interval and then build the new interval $m_i \pm 0.5\,w_i$, which still includes the true value but is narrower than the original interval and, hence, narrower than the threshold. See Table 5-3 for this example.

Table 5-3: An Example of Interval Disclosure (data source: Muralidhar et al. 2004)

                                          Original Interval            Intruder Interval
Query  True Value  P1     P2     P3       Lower  Upper  Width (%)      Lower  Upper  Width (%)
1      276.3       275.2  302.8  263.5    263.5  302.8  14.2           273.3  293.0  7.1
2      35.4        36.2   32.7   36.3     32.7   36.2   10.2           33.6   35.4   5.1
3      37.4        37.4   41.1   35.5     35.5   41.1   14.9           36.9   39.7   7.5
...

In Gopal et al. (2002), the interval protection requires the interval length to be at least 10% of the true value. In Table 5-3, the intruder's intervals, computed with the method of Muralidhar et al. (2004), are narrower than this 10% threshold; the database is thus compromised in terms of interval disclosure. However, Muralidhar et al. (2004) examined the CVC interval protection only empirically; for networks with different $w$ and $m$, this deterministic method may not apply.
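
The interval-shrinking procedure is simple to state in code. The following Python sketch (ours, for illustration) applies it to the three published intervals of Table 5-3; small differences from the table's intruder intervals are due to rounding in the source:

intervals = [(263.5, 302.8), (32.7, 36.2), (35.5, 41.1)]  # from Table 5-3

for l_i, u_i in intervals:
    m_i = (l_i + u_i) / 2                      # midpoint
    w_i = (u_i - l_i) / 2                      # half-width
    lo, hi = m_i - 0.5 * w_i, m_i + 0.5 * w_i  # shrunken interval
    print(f"[{l_i}, {u_i}] -> [{lo:.1f}, {hi:.1f}]")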

5.4 A Bound for Fixed-Data Perturbation (Theoretical Basis)

Dinur and Nissim (2003) studied the theoretical tradeoff between the privacy and the usability of statistical databases. They concluded that a perturbation magnitude of at least $\sqrt{n}$ is required for each query $q$ in order to maintain even weak privacy of the database; otherwise, an adversary can reconstruct the statistical database using $n \lg^2 n$ queries (base-2 logarithm) with high probability in polynomial time. Conversely, the SDB can be protected from disclosure if the perturbation magnitude exceeds this bound, but the data utility may then be too low to be useful. Since Dinur and Nissim make no assumptions beyond the additive error being fixed, their results are valid both for data perturbation and for output perturbation methods that use a fixed additive error. We review their results and methodology in the following sections.

Dinur and Nissim (2003) modeled the confidential field of the database as an $n$-bit binary string $d = (d_1, \ldots, d_n) \in \{0,1\}^n$. The true answer to a SUM query $q \subseteq \{1, \ldots, n\}$ is $\sum_{i \in q} d_i$. The perturbed answer to a query $q$ is $A(q)$, obtained by adding a perturbation such that $\left|A(q) - \sum_{i \in q} d_i\right| \leq e$, where $e = o(\sqrt{n})$ is the bound on the perturbation of each query. The authors developed a linear programming (LP) algorithm that generates the candidate confidential vector, that is, the vector an adversary would use to compromise the database. See Table 5-4 for the details of the LP algorithm.

Table 5-4: LP Algorithm (data source: Dinur and Nissim 2003)

[Query Phase] Let $l = n \lg^2 n$. For $1 \leq j \leq l$, choose $q_j \subseteq \{1, \ldots, n\}$ uniformly at random and set $\tilde{a}_j = A(q_j)$.

[Weeding Phase] Using any linear objective, solve the following linear program with unknowns $c_1, \ldots, c_n$:

  $\tilde{a}_j - e \leq \sum_{i \in q_j} c_i \leq \tilde{a}_j + e$  for $1 \leq j \leq l$,
  $0 \leq c_i \leq 1$  for $1 \leq i \leq n$.

[Rounding Phase] Let $c_i' = 1$ if $c_i > \frac{1}{2}$ and $c_i' = 0$ otherwise. Output $c'$.
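
The LP algorithm of Table 5-4 can be implemented directly. The sketch below uses scipy.optimize.linprog as the solver on a small synthetic database; the database size n = 32 and the noise bound e = 2 are toy choices for illustration:

import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n = 32
d = rng.integers(0, 2, size=n)                 # true confidential bits
e = 2                                          # fixed perturbation bound
l = int(n * np.log2(n) ** 2)                   # number of random queries

# Query phase: random subsets, answers perturbed by at most e
Q = rng.integers(0, 2, size=(l, n))            # row j = indicator of q_j
a = Q @ d + rng.integers(-e, e + 1, size=l)    # perturbed answers

# Weeding phase: find c in [0,1]^n with |sum_{i in q_j} c_i - a_j| <= e
A_ub = np.vstack([Q, -Q])
b_ub = np.concatenate([a + e, -(a - e)])
res = linprog(c=np.zeros(n), A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, 1)] * n, method="highs")

# Rounding phase: round each coordinate to the nearest bit
c_prime = (res.x > 0.5).astype(int)
print("fraction of bits recovered:", (c_prime == d).mean())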

Other vectors, far away from the true confidential vector $d$, are weeded out by the algorithm; the output of the LP algorithm is the candidate vector that best estimates the confidential vector. The $n$-bit binary vector $c'$ is obtained by rounding $c$, the vector of real numbers produced by the LP.

Dinur and Nissim (2003) also introduced a vector $\bar{c}$, obtained by rounding $c$ to the nearest integer multiple of $\frac{1}{k}$, where $k \geq n$ is a precision parameter and $K = \left\{0, \frac{1}{k}, \frac{2}{k}, \ldots, \frac{k-1}{k}, 1\right\}$; hence $\bar{c} \in K^n$. They proved that

$\left|\sum_{i \in q_j} (\bar{c}_i - d_i)\right| \leq 2e + 1$.

To prove that the candidate vector $c'$ obtained from the algorithm is close to the true confidential field $d$, Dinur and Nissim (2003) introduced a Disqualifying Lemma, which shows that the random queries $q_1, \ldots, q_l$ weed out all vectors $x \in X$, where

$X = \left\{ x \in K^n \;\middle|\; \Pr_i\left[\, |x_i - d_i| \geq \tfrac{1}{3} \,\right] > \epsilon \right\}$ for $\epsilon > 0$.    (1)

The probability $\Pr_i\left[\,|x_i - d_i| \geq \frac{1}{3}\,\right]$ in Equation (1) is taken over a randomly chosen index $i$, so membership in $X$ means that more than $\epsilon n$ records, in expectation, obey $|x_i - d_i| \geq \frac{1}{3}$; $X$ therefore denotes the set of all vectors that are far away from the true vector $d$. The Disqualifying Lemma states that

$\Pr_{q \in_R \{1,\ldots,n\}}\left[\, \left|\sum_{i \in q} (x_i - d_i)\right| > 2e + 1 \,\right] \geq \delta$.    (2)

The lemma proves that there exists a probability $\delta > 0$ such that a random query $q$ disqualifies $x$ whenever $\left|\sum_{i \in q}(x_i - d_i)\right| > 2e + 1$; if such a $q$ exists, $x$ is not a valid LP solution. The lemma thus guarantees that if $x$ is far away from $d$, at least one of the $l$ queries $q_1, \ldots, q_l$ disqualifies $x$ with high probability. One missing piece is the relationship between inequalities (1) and (2), that is, between $\epsilon$ and $\delta$; the proof of the Disqualifying Lemma establishes this link, and it is possible to think of $\delta$ as a function of $\epsilon$. We discuss this further in Chapter 6.

If the $l$ queries $q_1, \ldots, q_l$ are chosen independently and randomly, then for each $x \in X$ the probability that none of the $l$ queries disqualifies $x$ is at most $(1 - \delta)^l$. A conclusion derived from the Disqualifying Lemma is

$\Pr_{q_1, \ldots, q_l \in_R \{1,\ldots,n\}}\left[\, \exists x \in X \text{ such that no } q_i \text{ disqualifies } x \,\right] \leq |K^n| (1 - \delta)^l \leq neg(n)$;

equivalently, with probability at least $1 - neg(n)$, every $x \in X$ is disqualified by at least one of the $l$ queries. Thus the probability that none of the $l$ queries disqualifies some $x \in X$ is bounded by a negligible quantity $neg(n) \rightarrow 0$. The Disqualifying Lemma therefore guarantees that all far-away vectors $x \in X$ are ruled out with high probability ($1 - neg(n)$), and hence that the Hamming distance between the final candidate vector $c'$ and the true vector $d$ is small; that is, $dist(c', d) \leq \epsilon n$.

The number of queries required to weed out the disqualified vectors is computed from the Disqualifying Lemma: $l = n \lg^2 n$. See Figure 5-3 for an illustration of the relationships among $c$, $c'$, $\bar{c}$, and $d$.

Figure 5-3: Relationships of $c$, $c'$, $\bar{c}$, and $d$. (The LP produces $c \in [0,1]^n$; rounding each coordinate of $c$ to the nearest multiple of $\frac{1}{k}$ gives $\bar{c} \in K^n$, which satisfies $\left|\sum_{i \in q}(\bar{c}_i - d_i)\right| \leq 2e + 1$; rounding each coordinate of $c$ to the nearest bit ($c_i' = 1$ if $c_i > \frac{1}{2}$, $c_i' = 0$ otherwise) gives $c' \in \{0,1\}^n$, with $dist(c', d) \leq \epsilon n$.)

5.5 Proposed Approach

Although SDC methods and machine learning (ML) have completely opposite research goals, similar methodologies are applied in both areas (Domingo-Ferrer and Torra 2003). SDC methods intentionally modify the data before public release; the distortion should be large enough to protect the privacy of the confidential data yet small enough to minimize the information loss. ML seeks to learn from noisy examples and designs error-resilient algorithms to uncover the true information (Angluin and Laird 1988, Goldman and Sloan 1995, Shackelford and Volper 1988, Sloan 1988, Valiant
1985). SDC methods protect the confidential data stored in a database with $n$ records and $m$ fields; ML learns a true function from $l$ examples, each having $m$ attributes. A common structure can therefore express the information in both SDC methods and ML. Although the two areas have different research purposes and often use different terminologies, the underlying methodologies are often the same.

In our research, we approach the database privacy problem from a machine learning perspective by applying PAC learning theory. We consider a scenario in which a snooper uses a learning algorithm to discover the true confidential data protected by an SDC method. For example, Figure 5-4 demonstrates the connection between the methodology employed in PAC learning theory and that of the database protection approach of Dinur and Nissim (2003).

Figure 5-4: Illustration of the Connection Between PAC Learning and Data Perturbation. (The figure juxtaposes the Disqualifying Lemma bound, $\Pr_{q_1,\ldots,q_l \in_R \{1,\ldots,n\}}\left[\exists x \in X \text{ not disqualified}\right] \leq |K^n|(1-\delta)^l \leq neg(n)$, with the PAC learning bound, $\Pr\left[\exists h \in H: h \text{ consistent and } err(h) > \epsilon\right] \leq |H|(1-\epsilon)^l$, matching the random sample of size $l$, the error, the cardinality of the hypothesis space, the accuracy parameter, and the confidence level.)

Figure 5-4 indicates that both approaches determine a training sample size $l$ necessary to accomplish the desired goal. The probability that some vector
$x \in X$, each of which a random query disqualifies with probability greater than $\delta$, survives all queries is bounded by the union bound over $X$ and is further bounded by the small probability $neg(n)$. These quantities correspond to the cardinality of the hypothesis space $H$, the accuracy parameter, and the confidence level in PAC learning theory; they are shown in Figure 5-4 as matched terms, even though different notation and terminology are adopted. We can therefore conclude that PAC learning theory and the Disqualifying Lemma address their problems with the same methodology for different purposes, and the same parameters are required to build the models.

From the perspective of PAC learning theory, we regard the true confidential field as the target concept that an adversary seeks to discover within a limited number of queries in the presence of noise, such as random data perturbation or variable-data perturbation. In Chapter 6 we raise our research questions and extend the work of Dinur and Nissim (2003) using PAC learning theory. We set up a model describing how much protection is necessary to guarantee that the adversary cannot discover the database with high probability. Put in PAC learning terms, we derive bounds on the amount of error an adversary makes, given a general perturbation scheme, the number of queries, and a confidence level. Three types of data perturbation bounds, in terms of different error distributions, are summarized below; a numerical comparison of the first two follows the list.

(1) Perturbation with a general bound: the general PAC bound

The error is generated identically and independently at random from an unknown distribution D, so this is also called the perturbation-with-a-distribution-free-bound case. A general PAC bound is derived as
$l \geq \dfrac{1}{\epsilon} \left( \ln|H| + \ln\dfrac{1}{\delta} \right)$,

where $l$ is the number of queries needed to discover the binary confidential data, $\epsilon$ is the amount of error that an adversary may make in compromising the database, and $\delta$ sets the confidence level. $|H| = 2^n$ is the number of candidate confidential vectors in the hypothesis space $H$. Without specific information about the distribution of the noise, the derivation of $l$ depends wholly on $\epsilon$ and $\delta$, so this bound is relatively loose.

(2) Perturbation with a fixed-data bound: fixed data perturbation

Dinur and Nissim (2003) derived the fixed-data bound $e = o(\sqrt{n})$ for the perturbation added to query responses. They also developed a bound on the number of queries, $l = n \lg^2 n$, which is sufficient to discover the true confidential vector in the database with high probability and small error.

(3) Perturbation with a random-variable bound: variable data perturbation (proposed research)

We assume that the random perturbations added to the query responses have an unknown discrete distribution whose moments, such as the mean and the standard deviation, can be estimated; variable-data perturbation belongs to this case. In the next chapter we derive an error bound for this case by applying PAC learning theory. The bound provides the minimum number of queries needed to discover the protected column with specified error and accuracy.
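
The first two bounds can be compared numerically. The following Python sketch (the parameter values are ours, for illustration) evaluates the general PAC sample bound with |H| = 2^n alongside the Dinur-Nissim query bound:

from math import ceil, log, log2

def pac_queries(n, eps, delta):
    """General PAC bound: l >= (1/eps) * (ln|H| + ln(1/delta)), |H| = 2**n."""
    return ceil((n * log(2) + log(1 / delta)) / eps)

def dinur_nissim_queries(n):
    """Fixed-perturbation bound: l = n * (lg n)^2."""
    return ceil(n * log2(n) ** 2)

for n in (100, 1_000, 10_000):
    print(n, pac_queries(n, eps=0.05, delta=0.01), dinur_nissim_queries(n))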

CHAPTER 6
DISCLOSURE CONTROL BY APPLYING LEARNING THEORY

In Chapters 2 and 3 we reviewed PAC learning theory and database security methods. In this chapter, we approach the database privacy problem using ideas from Probably Approximately Correct learning theory. Our research delves into the additive noise perturbation masking method, classified into three categories: random data perturbation, fixed data perturbation (reviewed in Chapter 5), and variable-data perturbation. Based on the work of Garfinkel et al. (2002) and Dinur and Nissim (2003), we raise our research questions and construct a theoretical model from the perspective of PAC learning theory. We attempt to derive an error bound for perturbations whose distribution is specified by its first two moments, and we also develop a heuristic method to estimate the mean and standard deviation for the variable-data perturbation method. Dinur and Nissim (2003) studied the case of data perturbation bounded by a fixed number, and their work provides a theoretical foundation for our research.

6.1 Research Problems

Our research focuses on the category of variable-data perturbation. First, we intend to derive a bound on the level of error that an adversary may make, given the variable-data perturbation method. We extend the fixed-data perturbation bound of Dinur and Nissim (2003) by attempting to bound the perturbation of each query with a random variable $e_q$ that has a discrete distribution with known parameters, such as a finite mean and variance. We need to develop a new
63 Disqualifying Lemma, analogous to Dinur and Nissim’s (200 3), for the variable-data perturbation by deploying PA C learning theory. Like the Disqualifying Lemma in Dinur and Nissim (2003), our result bounds the probabi lity that a query does not eliminate hypotheses that are far away fr om the true confidential answ er. Using this, we develop an error bound on the number of queries within whic h the database could be compromised with high probability. 6.2 The PAC Model For the Fixed-data Perturbation We start our model by interp reting the results of Dinur and Nissim (2003) within the methodology of PAC learning theory. Suppose an adversary attempts to comp romise the SDB by ap plying PAC learning theory. We define a Non-Private Database as follows: a database is non-private if a computationally-bound adversary can expose 1 fraction of the confidential data for 0 with probability 1 where 0 We call 1 the confidence level. Consider a statistical database with n records. Its confidential field is a binary string denoted as 1,...,'0,1n ndd See Table 5-1 for an example database. In this table, “HIV” status is the column we represent. An hypothesis space 0H contains n-bit binary vectors, each of which is an hypothesis 00,1nhH and denotes a candidate vector for the confidential field of the data base. The cardinality of the hypothesis space, or the number of hypothesis is 02nH The true confidential field is regarded as the target concept 0dH The online database receives a SUM (or COUNT) query 1,...,qn sent by the user and responds with a perturbed answer Aq of the true


answer $a_q = \sum_{i \in q} d_i$. A perturbation is added to each query answer instead of to every record, and it is bounded by a fixed number: $|a_q - A(q)| \le e$.

PAC learning starts by random sampling. We take l samples consisting of queries and their perturbed responses, $S = \left((q_1, A(q_1)), \dots, (q_l, A(q_l))\right)$. Since $A(q)$ is a perturbed answer, we will consider this learning from noisy data. Our learning algorithm is a linear program. As such, answers can be continuous and will be rounded. Thus it is useful to define another hypothesis space $H_2 = [0,1]^n$. For analysis, a grid will prove useful: let the hypothesis space $H_1 = K^n$ where $K = \left\{0, \frac{1}{n}, \frac{2}{n}, \dots, \frac{n-1}{n}, 1\right\}$. Note that $H_0 \subset H_1 \subset H_2$, where all containments are strict when $n > 1$.

Let $h_1 : H_2 \to H_1$ round each component of a vector in $H_2$ to the nearest integer multiple of $\frac{1}{n}$ (midpoints rounded down). Further, let $h_0 : H_i \to H_0$, $i = 1,2$, round each component in $H_i$ to the nearest of 0 and 1 (0.5 rounds down). Note that $h_1(c) = c + f$ where $|f_i| \le \frac{1}{n}$, $i = 1,\dots,n$.

Given a sample S and a fixed perturbation e, Dinur and Nissim (2003) gave a polynomial algorithm that finds $c \in H_2$ from which one can output $h_0(c)$. We represent this algorithm by $c(S)$. As already discussed, the specific algorithm is a linear program (see Table 5-4).
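The two rounding maps are simple to state in code. The following C++ sketch is ours; the vector-based representations of d, q and c are an assumption for illustration:

    #include <vector>
    #include <cmath>

    // A query is a subset of record indices; a_q = sum_{i in q} d_i.
    double trueAnswer(const std::vector<int>& d, const std::vector<int>& q) {
        double a = 0.0;
        for (int i : q) a += d[i];
        return a;
    }

    // h1 : H2 -> H1, round each component of c in [0,1]^n to the nearest
    // multiple of 1/n, with midpoints rounded down.
    std::vector<double> h1(const std::vector<double>& c) {
        int n = (int)c.size();
        std::vector<double> r(n);
        for (int i = 0; i < n; ++i) r[i] = std::ceil(c[i] * n - 0.5) / n;
        return r;
    }

    // h0 : round each component to the nearest of 0 and 1 (0.5 rounds down).
    std::vector<int> h0(const std::vector<double>& c) {
        std::vector<int> r(c.size());
        for (size_t i = 0; i < c.size(); ++i) r[i] = (c[i] > 0.5) ? 1 : 0;
        return r;
    }

Rounding midpoints down matches the convention stated above: with $n = 4$, for instance, the component 0.375 maps to 0.25 under $h_1$ and to 0 under $h_0$.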


See Figure 6-1 for an illustration of the relationships of $H_0, H_1, H_2, h_0, h_1$ and d.

Figure 6-1: Relationships of $H_0, H_1, H_2, h_0, h_1$ and d in the Fixed-Data Perturbation.

Let $c \in H_0$; then the Hamming distance between c and d is

$dist(c,d) = \left|\left\{i \in \{1,\dots,n\} : c_i \ne d_i\right\}\right| = \sum_{i=1}^{n} |c_i - d_i|.$

Let $x \in H_2$. $\Pr_i\left[|x_i - d_i| \ge \frac{1}{3}\right]$ denotes the probability, when $i \in \{1,\dots,n\}$ is chosen randomly, that $|x_i - d_i| \ge \frac{1}{3}$. That is, for this x there are $\epsilon n$ expected records with $|x_i - d_i| \ge \frac{1}{3}$. Denote this by $E\left[\#\left\{i : |x_i - d_i| \ge \frac{1}{3}\right\}\right] = \epsilon n$, where $\epsilon \ge 0$ is arbitrary. Ultimately, we wish to show how to choose a sample size l so that $dist\left(h_0(c(S)), d\right) \le \epsilon n$.

Lemma 1: If $x \in K^n$ and $E\left[\#\left\{i : |x_i - d_i| \ge \frac{1}{3}\right\}\right] \le \epsilon n$, then $dist(h_0(x), d) \le \epsilon n$.


Proof: First note that if $|x_i - d_i| < \frac{1}{3}$ then $h_0(x)_i = d_i$, since $d_i \in \{0,1\}$ and a component within $\frac{1}{3}$ of 0 or 1 rounds to that same value. Thus, since no more than $\epsilon n$ indices i, on average, have $|x_i - d_i| \ge \frac{1}{3}$, no more than $\epsilon n$ records, on average, of $h_0(x)$ can have $h_0(x)_i \ne d_i$. The number $\frac{1}{3}$ in $|x_i - d_i| < \frac{1}{3}$ guarantees that $x_i$ rounds to the same number as $d_i$. End of proof.

Let

$T = \left\{x \in K^n : E\left[\#\left\{i : |x_i - d_i| \ge \tfrac{1}{3}\right\}\right] \ge \epsilon n\right\}.$

From the point of view of the intruder, we want our sample to disqualify all points of T with high probability $1-\delta$, where $\delta \in (0,1)$ and is usually chosen so that $1-\delta$ is large. For a sample of size l, generated independently and identically according to an unknown but fixed distribution D, the probability that an hypothesis c is far away from the true target d is measured by the risk functional

$err_D(c) = \Pr_{q \sim D}\left[\,\left|\sum_{i \in q}(c_i - d_i)\right| > 2e + 1\,\right],$

where $c \in H_1$. As we stated before (see Figure 6-1), the solution c from the LP can be rounded either to a binary vector $h_0(c)$ or to a vector $h_1(c) \in K^n$.
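A small worked instance of Lemma 1 (our own illustration, not from the original): take $n = 3$, $d = (1,0,1)$ and $x = (0.8,\ 0.2,\ 0.4)$. Only $i = 3$ has $|x_i - d_i| \ge \frac{1}{3}$, so the expected count of far-off coordinates is at most $\epsilon n$ with $\epsilon = \frac{1}{3}$; rounding gives $h_0(x) = (1,0,0)$ and indeed $dist(h_0(x), d) = 1 = \epsilon n$.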


The probability that the distance between the true vector d and the rounded vector $h_1(c)$ exceeds $\frac{1}{3}$ in a randomly chosen coordinate is bounded by $\epsilon$. Based on this condition, for any random query, the difference between the answers from these two vectors is bounded by a function of the perturbation, $2e + 1$. So we can see that e and $\epsilon$ are related: they describe the error from different perspectives. We then use a probability which is a function of $\epsilon$, denoted $\beta$, to bound the risk functional as $err_D(c) \ge \beta$.

We intend to bound $\Pr_{S \sim D^l}\left[err_D(h_1(c(S))) > \epsilon\right]$ by $\delta > 0$. Provided $e = o(\sqrt{n})$, the Disqualifying Lemma of Dinur and Nissim (2003) proves $\beta > 0$. Then,

$\Pr_{S \sim D^l}\left[err_D(h_1(c(S))) > \epsilon\right] \le |T|\,(1-\beta)^l \le (1+n)^n (1-\beta)^l \qquad (6.1)$

where $(1+n)^n = |K^n| \ge |T|$ gives the union bound over T, and therefore the worst-case scenario is bounded. The proof of the Disqualifying Lemma in Dinur and Nissim (2003) shows

$\beta = \min\left\{\underbrace{\tfrac{1}{2}\left(1 - 2e^{-T^2/8}\right)}_{(1)},\ \underbrace{\tfrac{1}{3}\left(1-\gamma\right)}_{(2)}\right\}$

for a constant $T \ge 500$.


Recall that the Disqualifying Lemma (Dinur and Nissim 2003) proves

$\Pr_{q \in_R [n]}\left[\,\left|\sum_{i \in q}(x_i - d_i)\right| > 2e + 1\,\right] \ge \beta.$

In the proof, $\lambda_1, \lambda_2, \dots, \lambda_n$ are defined as independent random variables such that $\lambda_i = x_i - d_i$ and $\lambda_i = 0$, each with probability $\frac{1}{2}$. Let $\Lambda = \sum_{i=1}^{n} \lambda_i$. The authors approached the proof by dividing it into two cases based on the size of the expected value of $\Lambda$, denoted $E[\Lambda]$, relative to the constant $T \ge 500$ to be specified later in the proof. In the first case (small $E[\Lambda]$), the probability satisfies

$\Pr_{q \in_R [n]}\left[\,\left|\sum_{i \in q}(x_i - d_i)\right| > 2e + 1\,\right] \ge \frac{1}{2}\left(1 - 2e^{-T^2/8}\right).$

In the second case (large $E[\Lambda]$), the probability satisfies

$\Pr_{q \in_R [n]}\left[\,\left|\sum_{i \in q}(x_i - d_i)\right| > 2e + 1\,\right] \ge \frac{1}{3}\left(1-\gamma\right),$

where the role of $\gamma$ is discussed below. (For the proof details, see Appendix A of Dinur and Nissim 2003.) From the result of the Disqualifying Lemma, we choose $\beta$ to be the minimum of the probabilities from these two cases. In term (1), $T \ge 500$ makes $e^{-T^2/8}$ negligible, so $\frac{1}{2}\left(1 - 2e^{-T^2/8}\right) \approx \frac{1}{2}$. In term (2), $\gamma$ is bounded away from 1, so $\frac{1}{3}(1-\gamma) > 0$. Hence


$\beta = \min\left\{\tfrac{1}{2}\left(1 - 2e^{-T^2/8}\right),\ \tfrac{1}{3}\left(1-\gamma\right)\right\} = \tfrac{1}{3}\left(1-\gamma\right),$

where we take the second term for the worst case. Dinur and Nissim choose $\zeta$ large enough that the tail sum $\sum_{k \ge \zeta} k\,e^{-k/2}$ becomes sufficiently small (note that the tail is decreasing in $\zeta$). Simple geometric-series manipulations give a closed form for this tail, and taking the partial derivative with respect to $\zeta$ confirms that it decreases monotonically, so a sufficiently large $\zeta$ always exists.


Let $x = e^{-\zeta/2}$. Then the bound on $\gamma$ becomes an explicit function of x, determined once the pre-defined parameter $\epsilon$ is fixed. For $0 < \delta < 1$, numerical calculations show we need $\zeta \ge 17$, thus giving $x \le 0.0002$. Plugging this back into $\frac{1}{3}(1-\gamma)$ turns $\beta$ into an explicit positive function of x alone, which we denote $\beta(x)$; with $x \le 0.0002$, $\beta(x)$ is on the order of $10^{-4}$. Now back to inequality (6.1),

$\Pr_{S \sim D^l}\left[err_D(h_1(c(S))) > \epsilon\right] \le (1+n)^n(1-\beta)^l.$


If we bound the probability with the parameter $\delta > 0$ we get

$\Pr_{S \sim D^l}\left[err_D(h_1(c(S))) > \epsilon\right] \le (1+n)^n(1-\beta)^l \le \delta,$

where $\delta > 0$ is the confidence parameter. Then take the base-2 logarithm (denoted lg in all the following formulas) on both sides of the last two terms, $(1+n)^n(1-\beta)^l \le \delta$, to get

$n\lg(1+n) + l\,\lg(1-\beta) \le \lg\delta.$

Given a pre-defined parameter $\delta$, the minimum sample size is computed as

$l \ge \frac{n\lg(1+n) + \lg\frac{1}{\delta}}{\lg\frac{1}{1-\beta}} \qquad (6.2)$

where $\beta = \beta(x)$ and $x = e^{-\zeta/2}$ with $\zeta$ chosen large enough. Thus l is bounded by three parameters: $\beta$, $\delta$ and n. Since $\beta$ is a very small number, applying it directly in formula (6.2) makes the resulting bound on the sample size l quite large, much more than $n\lg^2 n$ from Dinur and Nissim (2003), even for a small n. See Table 6-1 for examples of the two bounds on the sample size with different values of n when $\delta = 0.05$.

Table 6-1 shows that by interpreting Dinur and Nissim (2003)'s Disqualifying Lemma, we get a PAC bound which is looser than the one derived in Dinur and Nissim


(2003), no matter what n is. However, this PAC bound is still much less than the total number of possible queries in a database, $2^n$, except when n is very small, such as $n = 10$.

Table 6-1: Bounds on the Sample Size with Different Values of n.

  n       $n\lg^2 n$    l from (6.2)    $2^n$
  10      111           373,643         1,024
  50      1,593         2,274,447       1.1259E+15
  100     4,415         5,191,750       1.2677E+30
  500     40,193        34,338,167      3.2734E+150
  1000    99,317        76,188,677      1.0715E+301
  5000    754,940       469,076,527     --

In section 6.4 we will show how to replace $\beta(x)$ with a more practical number by using the bound in Dinur and Nissim (2003), thereby deriving a tighter bound for the variable-data perturbation case.
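The two computed columns are easy to regenerate. A small C++ sketch of ours follows; $\beta$ is passed in as a constant because its exact value depends on the choice of $\zeta$ in the derivation above, so the value used here is an assumed order of magnitude:

    #include <cmath>
    #include <cstdio>
    #include <initializer_list>

    int main() {
        const double delta = 0.05;
        const double beta  = 1e-4;   // assumed order of magnitude for beta(x)
        for (int n : {10, 50, 100, 500, 1000, 5000}) {
            double dn  = n * std::pow(std::log2((double)n), 2.0);      // n lg^2 n
            double num = n * std::log2(1.0 + n) + std::log2(1.0 / delta);
            double l   = num / std::log2(1.0 / (1.0 - beta));          // formula (6.2)
            std::printf("n=%5d  n lg^2 n=%9.0f  l=%12.0f\n", n, dn, l);
        }
        return 0;
    }

With $\beta \approx 10^{-4}$ this reproduces the order of magnitude of the middle column of Table 6-1.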


6.3 The PAC Model for the Variable-Data Perturbation

In this section, we move to the case where an adversary tries to compromise a database in which the confidential data is modified by adding variable-data perturbation. In this method, each query q is answered with a perturbation created by a database protection algorithm: the perturbed response is $A(q)$, while the true query answer is $a_q = \sum_{i \in q} d_i$.

6.3.1 PAC Model Setup

In the fixed-data perturbation case, a fixed number bounds the perturbation: $|a_q - A(q)| \le e$. In the variable-data perturbation case, $|a_q - A(q)| \le e_q$, and we assume that the perturbation $e_q$ is a random variable with an unknown discrete distribution with known finite mean $\mu$ and variance $\sigma^2$. Based on the knowledge of these parameters, we attempt to develop a bound on the error that an adversary makes; the bound will be expressed in terms of these parameters. A threshold on the number of queries, within which the database is compromised, can then be derived from this error bound.

Given S and $e_q$ for each $q \in S$, we develop a polynomial algorithm that obtains an hypothesis $c \in H_2$ from which we can output $h_0(c)$. The algorithm, $c(S)$, is a linear program:

min $\sum_{i=1}^{n} c_i$
s.t. $\left|\sum_{i \in q_j} c_i - A(q_j)\right| \le e_{q_j}$, $j = 1,\dots,l$
     $0 \le c_i \le 1$, $i = 1,\dots,n$

where $e_{q_j}$ is the realization of the random variable $e_q$ in the LP algorithm and is sampled from the perturbation distribution. Then the distance between $h_1(c)$ and the true vector d is bounded by

$\left|\sum_{i \in q}\left(h_1(c)_i - d_i\right)\right| \le \left|\sum_{i \in q}\left(h_1(c)_i - c_i\right)\right| + \left|\sum_{i \in q}\left(c_i - d_i\right)\right| \le \frac{|q|}{n} + e_q \le 1 + e_q,$

where $h_1(c) = c + f$ with $|f_i| \le \frac{1}{n}$. Recall that $|q|$ denotes the cardinality of the query q.
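For concreteness, the LP can be handed to any solver in the standard CPLEX LP file format. The sketch below is ours; the query sets, perturbed answers $A(q_j)$ and realized widths $e_{q_j}$ are assumed to be available as inputs:

    #include <cstdio>
    #include <vector>

    // Emit "min sum c_i  s.t.  A(q_j)-e_j <= sum_{i in q_j} c_i <= A(q_j)+e_j,
    // 0 <= c_i <= 1" in CPLEX LP file format.
    void emitLP(int n, const std::vector<std::vector<int>>& q,
                const std::vector<double>& A, const std::vector<double>& e,
                std::FILE* out) {
        std::fprintf(out, "Minimize\n obj:");
        for (int i = 0; i < n; ++i) std::fprintf(out, " + c%d", i);
        std::fprintf(out, "\nSubject To\n");
        for (size_t j = 0; j < q.size(); ++j) {
            std::fprintf(out, " lo%zu:", j);            // lower side of |.| <= e_j
            for (int i : q[j]) std::fprintf(out, " + c%d", i);
            std::fprintf(out, " >= %g\n", A[j] - e[j]);
            std::fprintf(out, " up%zu:", j);            // upper side of |.| <= e_j
            for (int i : q[j]) std::fprintf(out, " + c%d", i);
            std::fprintf(out, " <= %g\n", A[j] + e[j]);
        }
        std::fprintf(out, "Bounds\n");
        for (int i = 0; i < n; ++i) std::fprintf(out, " 0 <= c%d <= 1\n", i);
        std::fprintf(out, "End\n");
    }

The resulting vector c is then rounded with $h_0$ to produce the candidate confidential vector.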


In the variable-data perturbation case, we need to develop a new Disqualifying Lemma which disqualifies all $h_1(c)$ that are far away from the true vector d. That is, for any $x \in H_2$, query q disqualifies x if

$\left|\sum_{i \in q}\left(h_1(x)_i - d_i\right)\right| > e_q + 1.$

See Figure 6-2 for an illustration of the relationships of $H_0, H_1, H_2, h_0, h_1$ and d.

Figure 6-2: Relationships of $H_0, H_1, H_2, h_0, h_1$ and d in the Variable-Data Perturbation.

6.3.2 Disqualifying Lemma 2

For a sample of size l, generated i.i.d. according to an unknown but fixed discrete distribution D, the probability that an hypothesis $h_1(c)$ is far away from the true target d is measured by the risk functional

$err_D(h_1(c)) = \Pr_{q \sim D}\left[\,\left|\sum_{i \in q}\left(h_1(c)_i - d_i\right)\right| > e_q + 1\,\right].$


We intend to bound this error rate. As in section 6.2, we want

$\Pr_{S \sim D^l}\left[err_D(h_1(c(S))) > \epsilon\right] \le \delta \qquad (6.3)$

where $\delta \in (0,1)$. We now develop our Lemma 2, a disqualifying lemma analogous to Dinur and Nissim's Disqualifying Lemma. Lemma 2 assumes that the mean $\mu$ and standard deviation $\sigma$ of the distribution of $e_q$ satisfy $\mu + 2\sigma \le \frac{\epsilon n}{2}$ and $\sigma \le \mu$. Practical reasons motivate these respective conditions, as we now discuss.

(1) if $\sigma > \mu$: Since the standard deviation measures how spread out the perturbations ($e_q$ values) can be, if $\sigma > \mu$ many perturbations will be widely dispersed, meaning that the corresponding intervals offer little information. This can take many forms. For example (see Figure 6-3), with a bimodal distribution some intervals will be tight and others very disperse. The tight ones might give an attacker the ability to easily disclose parts of the confidential information, while the wide intervals may provide too little usable information to be meaningful for the user.

(2) if $\mu + 2\sigma > \frac{\epsilon n}{2}$, there are several possible cases:

a. $\mu$ large, $\sigma$ small. In this case, most perturbations are clustered around a large mean. Although a large perturbation provides better protection of the database, it reduces the usability of the


query answers: the user gets very little information. For a demonstration of this case, see Figure 6-4. Consequently, a database security method is meaningless if it produces perturbations with a large mean and relatively small standard deviation.

Figure 6-3: A Bimodal Distribution of Perturbations in the CVC Network with $\sigma > \mu$.

b. $\mu$ and $\sigma$ both very large. A very high mean and standard deviation imply two situations: (1) all query responses are perturbed with big noises which are widely spread out around the high mean; in this case, the user cannot get any useful data from the query answers; and (2) many query answers have large perturbations while others provide users with very tight answers which can reveal the confidential data easily. Neither of the above distributions is meaningful for our research.

c. $\sigma > \mu$. The same reason described in (1) applies here also.


Figure 6-4: A Distribution of Perturbations in the CVC Network with High Mean and Small Standard Deviation.

(3) if $\mu \ge \frac{\epsilon n}{2}$ holds: A database usually includes a large number of records, so the mean of the perturbations is likely less than $\frac{\epsilon n}{2}$ in most cases. If $\mu \ge \frac{\epsilon n}{2}$ is true, then the security method likely offers little information to the users, no matter what the standard deviation is. See the discussion in (2) a, b and c for similar explanations.

Lemma 2: Let $x \in [0,1]^n$, $d \in \{0,1\}^n$, and let $e_q$ be a random variable generated from a distribution with mean $\mu = E[e_q]$ and variance $\sigma^2$, where $\mu + 2\sigma \le \frac{\epsilon n}{2}$ and $\sigma \le \mu$. If

$\Pr_i\left[\,\left|h_1(x)_i - d_i\right| \ge \tfrac{1}{3}\,\right] \ge \epsilon,$

then there exists a constant $\beta > 0$ such that

$\Pr_{q \in_R [n]}\left[\,\left|\sum_{i \in q}\left(h_1(x)_i - d_i\right)\right| > e_q + 1\,\right] \ge \beta,$

where $\beta$ is a function of $\epsilon$, $\mu$ and $\sigma$.


Disqualifying Lemma 2 Proof: Let $Y_i = h_1(x)_i - d_i$ with probability $\frac{1}{2}$ and $Y_i = 0$ with probability $\frac{1}{2}$, independently, as in Dinur and Nissim's proof. For any fixed query q, let $m = |q|$, the cardinality of q; without loss of generality, assume $q = \{1,\dots,m\}$. Given the random variable $e_q$ and a constant $a \in \left(\frac{\mu+\sigma}{2}, \frac{\epsilon n}{4}\right]$, we have

$P\left(\left|\sum_{i=1}^{m} Y_i\right| \le e_q\right) = P\left(\left|\sum_{i=1}^{m} Y_i\right| \le e_q,\ e_q \le 2a\right) + P\left(\left|\sum_{i=1}^{m} Y_i\right| \le e_q,\ e_q > 2a\right)$
$\le P\left(\left|\sum_{i=1}^{m} Y_i\right| \le e_q \,\middle|\, e_q \le 2a\right) + P(e_q > 2a)$
$= P\left(\left|\sum_{i=1}^{m} Y_i\right| \le e_q \,\middle|\, e_q \le 2a\right) + 1 - P(e_q \le 2a).$

According to Chebyshev's inequality, since $e_q$ is a random variable with mean $\mu$ and variance $\sigma^2$,

$P(e_q > 2a) \le P\left(|e_q - \mu| \ge 2a - \mu\right) \le \frac{\sigma^2}{(2a-\mu)^2}.$

Then we obtain

$P\left(\left|\sum_{i=1}^{m} Y_i\right| \le e_q\right) \le 1 - \underbrace{\left[1 - P\left(\left|\sum_{i=1}^{m} Y_i\right| \le e_q \,\middle|\, e_q \le 2a\right)\right]}_{(1)} \underbrace{\left[1 - \frac{\sigma^2}{(2a-\mu)^2}\right]}_{(2)} \qquad (6.4)$

Let the probability $\beta$ be equal to the product of term (1) and term (2) in formula (6.4). Next, we continue our proof by solving two problems, respectively.

(1) Prove $\beta$ is a positive number:


In all steps of Dinur and Nissim (2003)'s proof of their Disqualifying Lemma, term (1), involving $P\left(\left|\sum_{i=1}^{m} Y_i\right| \le e_q \mid e_q \le 2a\right)$, can be substituted for $P\left(\left|\sum_{i=1}^{m} Y_i\right| \le 2e+1\right)$, provided $a \le \frac{\epsilon n}{4}$. To see this, condition on the realized perturbation:

$P\left(\left|\sum_{i=1}^{m} Y_i\right| \le e_q \,\middle|\, e_q \le 2a\right) = \sum_{j=0}^{2a} P\left(\left|\sum_{i=1}^{m} Y_i\right| \le j\right) P\left(e_q = j \mid e_q \le 2a\right).$

Since $E[e_q] = \mu < \frac{\epsilon n}{2}$, for any $a \le \frac{\epsilon n}{4}$ we have $\sum_{j=0}^{2a} P(e_q = j) > 0$. Now, Dinur and Nissim (2003) proved

$P\left(\left|\sum_{i=1}^{m} Y_i\right| \le 2e+1\right) \le 1 - \frac{1}{2}\left(1 - 2e^{-T^2/8}\right)$

for the appropriate choice of T. Rescaling T in proportion to $\sum_{j=0}^{2a} P(e_q = j)$ proves our point. Similarly, for the second part of their proof, the parameters $\gamma$ and $\zeta$ can be rescaled in proportion to $\sum_{j=0}^{2a} P(e_q = j)$. This then gives

$1 - P\left(\left|\sum_{i=1}^{m} Y_i\right| \le e_q \,\middle|\, e_q \le 2a\right) \ge \min\left\{\frac{1}{2}\left(1 - 2e^{-T^2/8}\right),\ \beta(x)\right\} = \beta(x),$

where $x = e^{-\zeta/2}$ with $\zeta$ chosen large enough, as seen in Section 6.2. Thus


$P\left(\left|\sum_{i=1}^{m} Y_i\right| \le e_q\right) \le 1 - \beta(x)\left[1 - \frac{\sigma^2}{(2a-\mu)^2}\right].$

So the probability $\beta$ will be a positive number as long as term (2) is greater than 0. Thus we need

$1 - \frac{\sigma^2}{(2a-\mu)^2} > 0,$

which is true when $a > \frac{\mu+\sigma}{2}$ and $a \le \frac{\epsilon n}{4}$, provided $\sigma \le \mu$ and $\mu + 2\sigma \le \frac{\epsilon n}{2}$, respectively; these latter two conditions are assumed in Lemma 2 and make the admissible range $\left(\frac{\mu+\sigma}{2}, \frac{\epsilon n}{4}\right]$ nonempty. Thus,

$\Pr_{q \in_R [n]}\left[\,\left|\sum_{i \in q}\left(h_1(x)_i - d_i\right)\right| > e_q + 1\,\right] \ge \beta(x)\left[1 - \frac{\sigma^2}{(2a-\mu)^2}\right] \qquad (6.5)$

for any parameter $a \in \left(\frac{\mu+\sigma}{2}, \frac{\epsilon n}{4}\right]$.

(2) We now maximize the lower bound over a. In order to derive a tight bound, we seek the maximum value of (6.5) subject to $a \in \left(\frac{\mu+\sigma}{2}, \frac{\epsilon n}{4}\right]$:

$\max_a\ \left[1 - P\left(\left|\sum_{i=1}^{m} Y_i\right| \le e_q \,\middle|\, e_q \le 2a\right)\right]\left[1 - \frac{\sigma^2}{(2a-\mu)^2}\right]$


where the a in the first term is any value in $\left(\frac{\mu+\sigma}{2}, \frac{\epsilon n}{4}\right]$. Using (for this term) the bound $\beta(x)$, which does not depend on a, gives us

$\max_a\ \beta(x)\left[1 - \frac{\sigma^2}{(2a-\mu)^2}\right].$

Note that $\frac{\sigma^2}{(2a-\mu)^2}$ is decreasing in a over the admissible range, so the bracketed term is maximized at the right endpoint, and we merely need to take $a = \frac{\epsilon n}{4}$. By assumption $\mu + 2\sigma \le \frac{\epsilon n}{2}$, so this choice is feasible and maximal. Thus

$P\left(\left|\sum_{i=1}^{m} Y_i\right| \le e_q\right) \le 1 - \beta(x)\left[1 - \frac{\sigma^2}{\left(\frac{\epsilon n}{2}-\mu\right)^2}\right] < 1.$

End of proof.

Lemma 2 is a crucial step for our model. The successful proof provides a bound on the error in terms of the mean and variance of $e_q$. In the next section, we will continue


discussing these two parameters. Based on the results of Lemma 2, we are able to derive a bound on the number of queries within which the adversary would be able to compromise a database protected by the variable-data perturbation method with high probability $(1-\delta)$.

6.4 The Bound on the Sample Size for the Variable-Data Perturbation Case

In this section, based on the proof of Lemma 2, we develop the sampling bound for the variable-data perturbation case from two approaches. In the first approach, we use Dinur and Nissim (2003)'s result directly from their Disqualifying Lemma proof in our bound; the second approach applies instead their sample bound, to obtain a tighter result.

6.4.1 The Bound Based on the Disqualifying Lemma Proof

Recall that $err_D(h_1(c)) \ge \beta$ (see section 6.3), and we intend to bound $\Pr_{S \sim D^l}\left[err_D(h_1(c(S))) > \epsilon\right]$ by the confidence parameter $\delta > 0$. We use the probability $\beta$ from Lemma 2 to bound $err_D(h_1(c))$. Then,

$\Pr_{S \sim D^l}\left[err_D(h_1(c(S))) > \epsilon\right] \le (1+n)^n(1-\beta)^l,$

where $\beta = \beta(x)\left[1 - \frac{\sigma^2}{\left(\frac{\epsilon n}{2}-\mu\right)^2}\right]$. Thus we get


$\Pr_{S \sim D^l}\left[err_D(h_1(c(S))) > \epsilon\right] \le (1+n)^n\left[1 - \beta(x)\left(1 - \frac{\sigma^2}{\left(\frac{\epsilon n}{2}-\mu\right)^2}\right)\right]^l.$

Bounding this with $\delta$ gives

$(1+n)^n\left[1 - \beta(x)\left(1 - \frac{\sigma^2}{\left(\frac{\epsilon n}{2}-\mu\right)^2}\right)\right]^l \le \delta.$

Then we take the base-2 logarithm of both of the latter two sides to obtain

$n\lg(1+n) + l\,\lg\left[1 - \beta(x)\left(1 - \frac{\sigma^2}{\left(\frac{\epsilon n}{2}-\mu\right)^2}\right)\right] \le \lg\delta.$

The minimum sample size is thus

$l \ge \frac{n\lg(1+n) + \lg\frac{1}{\delta}}{\lg\left[1 - \beta(x)\left(1 - \frac{\sigma^2}{\left(\frac{\epsilon n}{2}-\mu\right)^2}\right)\right]^{-1}} \qquad (6.6)$

Since $\beta(x)$, where $x = e^{-\zeta/2}$, is a very small number, the resulting bound is very loose (as was the similar bound under the Dinur and Nissim framework discussed earlier). If n is small, the sample size l can be even greater than $2^n$, the total number of all possible queries. With larger n, l becomes much smaller than $2^n$, but it is still a very large number. In order to reduce the sample size l, we need to find a more practical value to use instead of $\beta(x)$.
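As a rough sanity check (our own arithmetic; the value of $\beta$ is read back from Table 6-1 rather than derived): for $n = 100$ and $\delta = 0.05$ the numerator of (6.2) is $100\lg 101 + \lg 20 \approx 670.1$, and a $\beta$ on the order of $9\times10^{-5}$ gives

$l \approx \frac{670.1}{\lg\frac{1}{1-9\times10^{-5}}} \approx \frac{670.1}{1.3\times10^{-4}} \approx 5.2\times10^{6},$

which matches the 5,191,750 reported in Table 6-1 and shows how directly the tiny $\beta$ inflates the bound.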


6.4.2 The Bound Based on the Sample Size

Starting from Dinur and Nissim (2003), the sample size l is bounded by $n\lg^2 n$ when the fixed perturbation is less than $\sqrt{n}$. Therefore, we have a sufficient bound for the fixed-data perturbation case (see section 6.2 for the details):

$\frac{n\lg(1+n) + \lg\frac{1}{\delta}}{\lg\frac{1}{1-\beta}} \le n\lg^2 n.$

Consider the boundary case

$\frac{n\lg(1+n) + \lg\frac{1}{\delta}}{\lg\frac{1}{1-\beta}} = n\lg^2 n.$

Then

$\lg\frac{1}{1-\beta} = \frac{n\lg(1+n) + \lg\frac{1}{\delta}}{n\lg^2 n},$

so that

$\beta = 1 - 2^{-\frac{n\lg(1+n) + \lg\frac{1}{\delta}}{n\lg^2 n}}.$

Based on the above result for $\beta$, we replace $\beta(x)$ with $1 - 2^{-\frac{n\lg(1+n) + \lg\frac{1}{\delta}}{n\lg^2 n}}$. This formula provides a better value than $\beta(x)$, thereby developing a tighter bound on the sample size in the variable-data perturbation case.


Since the reasoning used by Dinur and Nissim (2003) to arrive at $n\lg^2 n$ remains unchanged for our case, we can use

$1 - 2^{-\frac{n\lg(1+n) + \lg\frac{1}{\delta}}{n\lg^2 n}}$

in place of our $\beta(x)$. This gives

$(1+n)^n\left[1 - \left(1 - 2^{-\frac{n\lg(1+n)+\lg\frac{1}{\delta}}{n\lg^2 n}}\right)\left(1 - \frac{\sigma^2}{\left(\frac{\epsilon n}{2}-\mu\right)^2}\right)\right]^l \le \delta,$

from which we obtain

$n\lg(1+n) + l\,\lg\left[1 - \left(1 - 2^{-\frac{n\lg(1+n)+\lg\frac{1}{\delta}}{n\lg^2 n}}\right)\left(1 - \frac{\sigma^2}{\left(\frac{\epsilon n}{2}-\mu\right)^2}\right)\right] \le \lg\delta,$

giving

$l \ge \frac{n\lg(1+n) + \lg\frac{1}{\delta}}{\lg\left[1 - \left(1 - 2^{-\frac{n\lg(1+n)+\lg\frac{1}{\delta}}{n\lg^2 n}}\right)\left(1 - \frac{\sigma^2}{\left(\frac{\epsilon n}{2}-\mu\right)^2}\right)\right]^{-1}} \qquad (6.7)$

From formula (6.7) we can see that the sample size l decreases as $\mu$ and $\sigma$ decrease.
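The bound (6.7) is a one-liner to compute. The following C++ sketch is ours; it treats n, $\delta$, $\epsilon$, $\mu$ and $\sigma$ as inputs and checks Lemma 2's assumptions first:

    #include <cmath>
    #include <stdexcept>

    // Sketch (ours) of the sample-size bound in formula (6.7).
    // Requires Lemma 2's assumptions: mu + 2*sigma <= eps*n/2 and sigma <= mu.
    long long sampleBound(int n, double delta, double eps, double mu, double sigma) {
        if (mu + 2.0 * sigma > eps * n / 2.0 || sigma > mu)
            throw std::invalid_argument("Lemma 2 conditions violated");
        double num    = n * std::log2(1.0 + n) + std::log2(1.0 / delta);
        double bstar  = 1.0 - std::pow(2.0, -num / (n * std::pow(std::log2((double)n), 2.0)));
        double shrink = 1.0 - (sigma * sigma) / std::pow(eps * n / 2.0 - mu, 2.0);
        double beta   = bstar * shrink;   // effective disqualifying probability
        return (long long)std::ceil(num / std::log2(1.0 / (1.0 - beta)));
    }

For example, with $n = 100$, $\delta = 0.05$ and an $\epsilon$ for which $\frac{\epsilon n}{2} = 20$ (the value of $\epsilon$ is not stated with Table 7-3; this is our back-fitted assumption), the inputs $\mu = 9.5$ and $\sigma = 4.91$ give $l \approx 5{,}711$, essentially the 5,715 reported for uniform case 1 in Table 7-3.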


6.4.3 Discussion

As we know from section 6.2, the larger the number of camouflage vectors s, the larger the response intervals, which leads to a larger perturbation mean and standard deviation. This simply implies that the sample size l increases with an increase of s. Our experiments based on the three examples in Garfinkel et al. (2002) support these conclusions. The database has 14 records, $n = 14$. Three cases are considered in Table 6-2.

Table 6-2: The Relationship among $\mu$, $\sigma$, s and l.

  Network variable    w = 3, m = 1    w = 5, m = 2    w = 7, m = 3
  s                   3               10              35
  $\mu$               2.0236          2.7760          3.3019
  $\sigma$            1.1150          1.1114          1.174
  l                   213             217             223

From Table 6-2, we can see that the sample size l increases as $\mu$, $\sigma$ and s increase. These sample sizes are very close to the bound $n\lg^2 n$ from Dinur and Nissim (2003) and much less than $2^{14} = 16{,}384$.

6.5 Estimating the Mean and Standard Deviation

In the previous section, we derived a bound on the sample size, which is the minimum number of queries required to disclose the binary confidential information in a database protected by the variable-data perturbation method. The bound (see formula 6.7) is decided by four parameters: the number of database records n, the confidence parameter $\delta$, and the mean and standard deviation of the perturbation distribution. Among these parameters, n and $\delta$ are known and predetermined. In this section, we develop a method to identify the estimated mean and standard deviation of the perturbation distribution.


Perturbations' mean and standard deviation are fixed in Garfinkel et al. (2002) as soon as the algorithm design is finished, as with the networks for camouflage vectors in the CVC technique. However, the actual mean and standard deviation can be calculated only if the responses from all $2^n$ queries are obtained, which is not practical in most situations. Instead of computing the true mean and standard deviation from $2^n$ queries, our heuristic method estimates these two values approximately, denoting the estimates $\hat{\mu}$ and $\hat{\sigma}$, by using the following random sampling method. Let

i : index of query
$q_i$ : the i-th query
$e_{q_i}$ : interval length of query $q_i$
$\hat{\mu}_i$ : mean of the perturbations using queries $1,\dots,i$
$\hat{\sigma}_i$ : standard deviation of the perturbations using queries $1,\dots,i$
$l_i$ : sample size computed from $\hat{\mu}_i$ and $\hat{\sigma}_i$ using formula (6.7)

Table 6-3 lists the heuristic steps for estimating the mean, the standard deviation and the bound on the sample size.

We use the network example in Garfinkel et al. (2002) to illustrate our heuristic. The basic setting for the network algorithm is: there are $n = 14$ database records, and the parameters are $w = 3$ and $m = 1$. The true mean and standard deviation computed from all $2^{14}$ queries are $\mu = 2.023$ and $\sigma = 1.115$, which give a sample size $l = 213$ from formula (6.7). Also see Table 5-2 and Figure 5-1 for all camouflage vectors and the CVC network


algorithm. Next, we show how the heuristic is applied to estimate $\hat{\mu}$ and $\hat{\sigma}$ for the CVC technique example in Garfinkel et al. (2002).

Table 6-3: Heuristic to Estimate the Mean, the Standard Deviation and the Bound l.

Heuristic:
0. For $i = 1,\dots,30$: generate query $q_i$ and record its perturbation $e_{q_i}$.
1. Generate query $q_i$ and record its perturbation $e_{q_i}$.
2. Compute $\hat{\mu}_i$ and $\hat{\sigma}_i$ using $e_{q_1},\dots,e_{q_i}$.
3. Compute $l_i$ from formula (6.7) using the estimated $\hat{\mu}_i$ and $\hat{\sigma}_i$.
4. Increment i and repeat steps 1 to 3 until $i \ge l_i$. This $l_i$ is the final bound on the sample size, l; $\hat{\mu}_i$ and $\hat{\sigma}_i$ are the final values of the estimates $\hat{\mu}$ and $\hat{\sigma}$.

For example, the intruder sends a random query $q_i$ to the database, asking how many employees in Company B have positive HIV status (see Table 5-1). The query returns an interval answer [1, 2] (see Table 5-2 for the set of camouflage vectors), from which the random perturbation is recorded as $e_{q_i} = 2 - 1 = 1$. Continue sending queries and recording perturbations. The mean and standard deviation are computed as

$\hat{\mu}_i = \frac{1}{i}\sum_{j=1}^{i} e_{q_j}$ and $\hat{\sigma}_i = \sqrt{\frac{1}{i}\sum_{j=1}^{i}\left(e_{q_j} - \hat{\mu}_i\right)^2}$

using $e_{q_1},\dots,e_{q_i}$, once the number of queries is more than 30,


which is considered a large enough, representative sample size in statistics. The bound on the sample size, $l_i$, is also computed from the estimated $\hat{\mu}_i$ and $\hat{\sigma}_i$ by formula (6.7). Keep updating the values of $\hat{\mu}_i$, $\hat{\sigma}_i$ and $l_i$ while receiving new query responses, comparing i and $l_i$ at each step. The intruder stops sending queries when $i \ge l_i$.

In Table 6-4, we simulate this heuristic in C++ and record the data for $\hat{\mu}_i$, $\hat{\sigma}_i$ and $l_i$ as the number of queries increases until $i \ge l_i$.

Table 6-4: Summary of the Estimated $\hat{\mu}_i$, $\hat{\sigma}_i$ and $l_i$ in the CVC Example Network.

  i-th query    $\hat{\mu}_i$    $\hat{\sigma}_i$    $l_i$
  30            1.935            0.948               210
  50            2.118            0.983               211
  80            2.098            1.018               211
  110           2.099            1.036               212
  140           2.135            1.069               212
  170           2.140            1.085               213
  200           2.124            1.102               213
  212           2.118            1.105               213

In our example, the sample size computed from the estimated $\hat{\mu}$ and $\hat{\sigma}$ is the same as the true bound: $\hat{l} = l = 213$. Although $\hat{\mu}$ and $\hat{\sigma}$ are not exactly equal to the true $\mu$ and $\sigma$, they are close enough to yield the same or a very similar bound on the sample size. After computing the bound, we run the LP algorithm (specified in section 6.2) to discover the true confidential vector from the l perturbed answers.
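A compact version of the Table 6-3 loop, in C++ (ours; `nextPerturbation` stands in for observing one more interval width, and `sampleBound` is the formula (6.7) sketch from section 6.4.2):

    #include <cmath>
    #include <vector>

    long long sampleBound(int n, double delta, double eps, double mu, double sigma);

    // Keep querying, update the running mean/std of the observed interval
    // widths, recompute the bound (6.7), and stop once i >= l_i.
    long long estimateAndBound(int n, double delta, double eps,
                               double (*nextPerturbation)()) {
        std::vector<double> e;
        for (int i = 1; ; ++i) {
            e.push_back(nextPerturbation());
            if (i < 30) continue;                 // warm-up sample of 30 queries
            double mu = 0.0, var = 0.0;
            for (double v : e) mu += v;
            mu /= i;
            for (double v : e) var += (v - mu) * (v - mu);
            double sigma = std::sqrt(var / i);
            long long li = sampleBound(n, delta, eps, mu, sigma);
            if (i >= li) return li;               // stopping rule: i >= l_i
        }
    }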


In this chapter, we built PAC models and derived bounds for both the fixed-data perturbation and the variable-data perturbation methods. First, interpreting Dinur and Nissim (2003)'s Disqualifying Lemma, we derived a PAC bound for the fixed-data perturbation,

$l \ge \frac{n\lg(1+n) + \lg\frac{1}{\delta}}{\lg\frac{1}{1-\beta}}.$

This bound is much looser than the sampling bound $n\lg^2 n$ developed in Dinur and Nissim (2003). The second bound was derived from our Lemma 2 for the variable-data perturbation:

$l \ge \frac{n\lg(1+n) + \lg\frac{1}{\delta}}{\lg\left[1 - \beta(x)\left(1 - \frac{\sigma^2}{\left(\frac{\epsilon n}{2}-\mu\right)^2}\right)\right]^{-1}},$

where $x = e^{-\zeta/2}$. This bound is still not tight enough to be useful. We therefore derived the third bound,

$l \ge \frac{n\lg(1+n) + \lg\frac{1}{\delta}}{\lg\left[1 - \left(1 - 2^{-\frac{n\lg(1+n)+\lg\frac{1}{\delta}}{n\lg^2 n}}\right)\left(1 - \frac{\sigma^2}{\left(\frac{\epsilon n}{2}-\mu\right)^2}\right)\right]^{-1}}.$

This bound is practical and can be applied to a real database. In the next chapter, we design experiments to test our bound under different situations.


CHAPTER 7
EXPERIMENTAL DESIGN AND RESULTS

In Chapter 6, we built the PAC model and derived an error bound from the new Disqualifying Lemma (Lemma 2) for the variable-data perturbation method. The bound determines the number of queries necessary to compromise binary confidential data in a database. In this chapter, we create a simulated database with one binary confidential field which is protected by variable-data perturbation. All experiments are conducted to illustrate our results from the previous chapter. Computational results are analyzed and compared to examine how the perturbations' mean, standard deviation and distribution affect the bound on the sample size and the level of disclosure accuracy.

7.1 Experimental Environment and Setup

Experiments are designed to empirically illustrate the level of accuracy with which an adversary can compromise a database protected by the variable-data perturbation method within the specific number of queries derived from our new Disqualifying Lemma (Lemma 2). Due to the limited capacity of our testing software, CPLEX 8.0 (ILOG), our experiments consider only the bound in formula (6.7), because the sample size computed from formula (6.6) generates an LP that is too large to be solved. For the same reason, we relax the requirement that n be large enough for formula (6.7) to hold for a specified $\delta$, and instead choose a relatively small n so that the LP problem remains solvable. Thus, all bounds used in our tests are computed from formula (6.7). This bound is sufficient for large enough n, so the cases we examine correspond to the more benign


distribution cases. As we will see, the attack shows some efficacy even under the various cases studied here.

In our tests, the simulated database has 100 records with one confidential binary field. For each query we sample a perturbation width $e_q$ and generate an interval answer by randomly splitting $e_q$ into two values, one of which is deducted from the true query answer to form the lower bound while the other is added to it to form the upper bound. The heuristic in Table 6-3 shows how to estimate the mean and standard deviation of the perturbations, from which the bound on the sample size l can be computed. The LP discussed in Chapter 6 is applied by the adversary to output the candidate binary vector. The sample size is computed using (6.7):

$l \ge \frac{n\lg(1+n) + \lg\frac{1}{\delta}}{\lg\left[1 - \left(1 - 2^{-\frac{n\lg(1+n)+\lg\frac{1}{\delta}}{n\lg^2 n}}\right)\left(1 - \frac{\sigma^2}{\left(\frac{\epsilon n}{2}-\mu\right)^2}\right)\right]^{-1}}$

where $n = 100$ and $\delta = 0.05$ in our experiments. Four types of discrete perturbation distributions, $e_q \sim D(\mu, \sigma^2)$, are considered in the experiments:

(1) Uniform distribution
(2) Symmetric distribution
(3) Distribution with positive skewness (skewed to the right)
(4) Distribution with negative skewness (skewed to the left)

There are four cases with different means and standard deviations under each type of distribution, so a total of 16 experiments are conducted. See Table 7-1 for the summary of the four cases.


Table 7-1: Summary of Four Cases with Different Means and Standard Deviations.

  Case      $\mu$    $\sigma$
  Case 1    high     high
  Case 2    high     low
  Case 3    low      high
  Case 4    low      low

7.2 Data Generation

During the experiments, we assume all perturbations are distributed within different intervals [a, b] under each of the four cases. Table 7-2 lists those four intervals.

Table 7-2: The Intervals [a, b] under the Four Cases.

  Case                                a    b
  Case 1: $\mu$ high, $\sigma$ high   1    18
  Case 2: $\mu$ high, $\sigma$ low    5    14
  Case 3: $\mu$ low, $\sigma$ high    1    10
  Case 4: $\mu$ low, $\sigma$ low     3    7

In each test, we use the inverse transform method to sample random perturbations from a given distribution. The LP algorithm takes those perturbation values $e_q$ as inputs and then outputs a candidate binary vector. We then compute the errors, which record the differences between the candidate vector and the true confidential data. We define the error rate as the percentage of errors among the total number of database records. The average error rate is computed by running each case 100 times to reduce possible bias. The mean and standard deviation of every perturbation distribution need to satisfy three requirements (assumptions discussed in section 6.3 for the Lemma 2


proof): (1) $\mu + 2\sigma \le \frac{\epsilon n}{2}$; (2) $\sigma \le \mu$; and (3) $\mu < \frac{\epsilon n}{2}$. In our tests $n = 100$, and the different means and standard deviations are summarized in Table 7-3.

The following part presents the 16 distribution plots with different means, standard deviations and distribution types. All perturbations randomly generated from those distributions for the tests are shown in the Appendix. All experimental results are summarized in Table 7-3.

1. Uniform Distribution

In this category, perturbations are randomly generated from the given uniform distributions, so every perturbation value is produced with the same probability. Four distributions with different means and standard deviations are shown in Figure 7-1.

Figure 7-1: Plots of Four Uniform Distributions of Perturbations at Different Means and Standard Deviations. A) Case 1: $\mu$ high, $\sigma$ high. B) Case 2: $\mu$ high, $\sigma$ low. C) Case 3: $\mu$ low, $\sigma$ high. D) Case 4: $\mu$ low, $\sigma$ low.
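The inverse transform step is straightforward for a discrete distribution given as a CDF table, as in Appendices B through E. A minimal C++ sketch (ours):

    #include <random>
    #include <vector>

    // Inverse-transform sampling from a discrete perturbation distribution
    // given as a CDF table: draw u ~ U(0,1) and return the smallest
    // perturbation value whose cumulative probability reaches u.
    int samplePerturbation(const std::vector<double>& cdf, std::mt19937& rng) {
        std::uniform_real_distribution<double> unif(0.0, 1.0);
        double u = unif(rng);
        for (size_t k = 0; k < cdf.size(); ++k)
            if (u <= cdf[k]) return (int)k + 1;   // perturbation values start at 1
        return (int)cdf.size();
    }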


2. Symmetric Distribution

Perturbations are distributed symmetrically in the following four cases; every distribution's mean, median and mode are all equal. The four given distributions used to generate random perturbations in our tests are shown in Figure 7-2.

Figure 7-2: Plots of Four Symmetric Distributions of Perturbations at Different Means and Standard Deviations. A) Case 1: $\mu$ high, $\sigma$ high. B) Case 2: $\mu$ high, $\sigma$ low. C) Case 3: $\mu$ low, $\sigma$ high. D) Case 4: $\mu$ low, $\sigma$ low.

3. Distribution with Positive Skewness

Positive skewness indicates that the distribution skews to the right and its mean is greater than the median; most of the perturbations are less than the average. Random perturbations are generated from the four given distributions with different means and standard deviations shown in Figure 7-3.


Figure 7-3: Plots of Four Distributions with Positive Skewness of Perturbations at Different Means and Standard Deviations. A) Case 1: $\mu$ high, $\sigma$ high. B) Case 2: $\mu$ high, $\sigma$ low. C) Case 3: $\mu$ low, $\sigma$ high. D) Case 4: $\mu$ low, $\sigma$ low.

4. Distribution with Negative Skewness

Negative skewness indicates that the distribution skews to the left and its median is greater than its mean. Random perturbations are generated from the four given distributions with different means and standard deviations shown in Figure 7-4.

7.3 Experimental Results

Experiments are conducted to disclose the confidential binary data in a simulated database. Four types of perturbation distributions, each of which has four cases, are considered. The LP algorithm, run through C++ and CPLEX, outputs the candidate confidential vector. Our program, which simulates queries and their perturbations,


does not handle specially the case where the same query recurs and should receive the same perturbation, since the probability that one query is chosen twice during one run is very small ($1/2^{100}$).

Figure 7-4: Plots of Four Distributions with Negative Skewness of Perturbations at Different Means and Standard Deviations. A) Case 1: $\mu$ high, $\sigma$ high. B) Case 2: $\mu$ high, $\sigma$ low. C) Case 3: $\mu$ low, $\sigma$ high. D) Case 4: $\mu$ low, $\sigma$ low.

7.3.1 Experiment 1

The bound on the sample size (from formula 6.7) and the average error rate are computed. Table 7-3 lists the mean and standard deviation of each test and records the computational results for the sample size and average error rate.


Table 7-3: Experimental Results on the 16 Tests, with Means, Standard Deviations, $(\mu+\sigma)/2$, Sample Sizes and Average Error Rates.

  Distribution / Case    $\mu$    $\sigma$    $(\mu+\sigma)/2$    l       Avg. error rate (%)
  Uniform, Case 1        9.50     4.91        7.20                5715    12.12
  Uniform, Case 2        9.50     2.60        6.05                4719    14.05
  Uniform, Case 3        5.50     2.60        4.05                4569    13.29
  Uniform, Case 4        5.50     0.87        3.18                4443    13.83
  Symmetric, Case 1      9.50     4.85        7.18                5678    12.28
  Symmetric, Case 2      9.50     1.96        5.73                4584    13.94
  Symmetric, Case 3      5.50     2.74        4.12                4587    13.71
  Symmetric, Case 4      5.00     1.06        3.03                4438    13.84
  Pos. skew, Case 1      8.36     4.74        6.56                5342    13.15
  Pos. skew, Case 2      8.12     2.16        5.14                4574    14.26
  Pos. skew, Case 3      4.22     2.53        3.37                4537    13.75
  Pos. skew, Case 4      4.51     1.13        2.82                4440    13.46
  Neg. skew, Case 1      9.99     4.41        7.20                5536    12.70
  Neg. skew, Case 2      9.99     2.47        6.23                4717    13.79
  Neg. skew, Case 3      5.60     2.54        4.07                4564    12.83
  Neg. skew, Case 4      5.49     1.13        3.31                4443    13.49

Figure 7-5 shows that case 1 can always be compromised more than the other three cases, no matter what type of distribution it has. Although the result seems counterintuitive at first sight, it supports one of our assumptions in section 6.3: a high mean and high standard deviation of a perturbation distribution indicate that many query responses have large perturbations, which may perturb the true answer too much to be useful for the user, while other queries provide very tight answers which can reveal the


confidential data easily. This explains why case 1, with a high mean and high standard deviation, can still have a low error rate.

Figure 7-5: Plot of Average Error Rates (%) for the 16 Tests.

Among the four cases, case 2 is the most difficult to disclose for any type of distribution. This is because a high mean and low standard deviation indicate that most of the perturbations are clustered around the high mean value, which provides good protection to the database, though the user may get little information from the query answers. An example similar to case 2 occurs in Garfinkel et al. (2002). The CVC technique designed three sample networks to construct the camouflage vectors for an example database (Table 5-1). Among those three networks, the one with perfect column balancing, which provides the best protection to the database according to the paper, has a high perturbation mean $\mu = 3.302$, close to $\sqrt{14} \approx 3.742$, and a low standard deviation $\sigma = 1.174$, similar to our case 2. Based on our experimental results, this network does protect the database well (recall that case 2 always has a high error


rate); however, most of the time the user gets little information from query answers protected by this security method. Figure 7-6 shows that 61% of the perturbations are clustered around the mean, and the standard deviation is small.

Figure 7-6: The Probability Histogram of the Perturbation Distribution for the CVC Network with w = 7 and m = 3.

Error rates of case 3 and case 4 always lie between those of case 1 and case 2, and these cases offer the user more accurate data than case 2. Case 3 would be expected to have a lower error rate than case 1 because its low mean and high standard deviation indicate that most of the query answers have small perturbations. However, a low mean and high standard deviation also generate a smaller sample size, which may explain why case 3 has a higher error rate than case 1 in our tests.

Figure 7-7 records the bounds on the sample size for the 16 tests. It shows that the bound increases with increases in the mean and standard deviation for all types of distributions. Dinur and Nissim (2003) gave the bound for the fixed-data perturbation as $n\lg^2 n$, which is 4,415 in our experiments. Most of our bounds are a little looser than this value.


Figure 7-7: Plot of Bounds on the Sample Size for the 16 Tests.

7.3.2 Experiment 2

The results in Experiment 1 are based on sample sizes computed from the different means and standard deviations of the 16 cases. In order to remove the bias introduced by the differing sample sizes, we also computed the average error rates using the same $l = 6{,}000$ for each case. Table 7-4 shows the average error rate for each case when the sample size is the same; all other variables comply with those in Table 7-3.

Table 7-4: Experimental Results on the Average Error Rates with $l = 6{,}000$ for the 16 Cases.

  Distribution / Case    $\mu$    $\sigma$    Avg. error rate (%), $l = 6{,}000$
  Uniform, Case 1        9.50     4.91        11.91
  Uniform, Case 2        9.50     2.60        12.92
  Uniform, Case 3        5.50     2.60        11.72
  Uniform, Case 4        5.50     0.87        12.28


Table 7-4. Continued.

  Distribution / Case    $\mu$    $\sigma$    Avg. error rate (%), $l = 6{,}000$
  Symmetric, Case 1      9.50     4.85        12.16
  Symmetric, Case 2      9.50     1.96        12.98
  Symmetric, Case 3      5.50     2.74        11.51
  Symmetric, Case 4      5.00     1.06        12.21
  Pos. skew, Case 1      8.36     4.74        12.11
  Pos. skew, Case 2      8.12     2.16        12.52
  Pos. skew, Case 3      4.22     2.53        11.35
  Pos. skew, Case 4      4.51     1.13        12.28
  Neg. skew, Case 1      9.99     4.41        12.09
  Neg. skew, Case 2      9.99     2.47        12.53
  Neg. skew, Case 3      5.60     2.54        11.24
  Neg. skew, Case 4      5.49     1.13        12.25

As we suspected, case 3, with low mean and high standard deviation, becomes the most unsafe situation for database security, in contrast with the conclusions from Table 7-3. Figure 7-8 displays the results for the 16 tests.

In sum, Experiment 1 suggests that case 1 is always worse than case 2 in terms of protecting the database; we can conclude from its results that a database may be compromised more easily if its perturbation distribution has a high mean and high standard deviation. A high mean and low standard deviation can best protect a database, but the query answers may be useless because of the large perturbations. Case 3, with low mean and high standard deviation, usually provides the user with the most


useful query responses. Experiment 2 shows that, with the same sample sizes, a database with perturbations as in case 3 is the easiest to discover.

Figure 7-8: Plot of Average Error Rates (%) for the 16 Tests with the Same Sample Size $l = 6{,}000$.

In general, we see that a high level of protection may yield answers that are not useful, while useful answers compromise the database. The experimental results also support our observation from Chapter 6 and show that, with high probability, the binary confidential information in a database protected by variable-data perturbation can be disclosed at small error within a certain number of queries, as suggested by inequality (6.7).


CHAPTER 8
CONCLUSION

8.1 Overview and Contribution

In this dissertation, we address statistical database security problems from a new perspective by applying PAC learning theory. The main idea of the PAC model, which learns from examples, is that the hypothesis generated by the learning algorithm approximates the target concept with high probability and small error in polynomial time and/or space. By deploying PAC learning theory, we regard the adversary of the database as a learner who tries to discover the confidential data within a certain number of queries. This new approach is different from the traditional methods in the literature: instead of building models to protect the confidential information, we focus on how to compromise the database, thereby finding out how much protection is necessary to prevent the disclosure of sensitive information contained in a database.

First, we review the SDC methods in the literature and focus our research on a new data perturbation method. Inspired by the CVC interval protection technique developed by Garfinkel et al. (2002), we define this new technique as the variable-data perturbation method, which can be viewed as modifying the confidential information by adding discrete noise $e_q$. Although the random perturbations have an unknown distribution from an intruder's perspective, their parameters, such as the mean and variance, can be estimated using the heuristic method detailed in Chapter 6.

We also extend the work of Dinur and Nissim (2003). In their study, all queries have fixed perturbations e, and no information is provided about the distribution of the


perturbations. They derived a bound on the sample size within which the true confidential binary string should be discovered with high probability by running an LP algorithm; it is assumed that the fixed perturbations may occur on either side of the query responses when setting up the constraints of the LP algorithm.

In our work, we interpret their results within the methodology of PAC learning theory and derive a bound for the fixed-data perturbation method. Then, we develop the PAC bounds on the sample size from our Lemma 2 for the variable-data perturbation method. Within the PAC number of queries, a database protected by variable-data perturbation can be compromised with high probability and small error. Since the bound is decided by parameters such as the mean and standard deviation, a heuristic method is also introduced to estimate these two values.

To illustrate our results, we perform a number of numerical experiments on a simulated database over four types of perturbation distributions with different means and standard deviations. The test results show that these databases can be compromised at fairly high levels, and also show that the mean and standard deviation of the perturbation distribution are more important factors than the type of the distribution in affecting the error rate and the sample size.

8.2 Limitations

There are three main limitations in our work on the database security problems. First, we consider only the case where the confidential data is perturbed by discrete random perturbations, even though continuous noise can also be added to a database protected by the variable-data perturbation method. Second, when deriving the bound, we assume that the confidential item is binary valued. In general, a confidential field can contain many types of data, such as real numbers or categorical data, so this assumption


may constrain the application of this bound. Last, the experiments are conducted on a simulated database rather than a real database; moreover, the simulated database used in the experiments had a relatively small number of records (100) due to the limitations of our testing software.

8.3 Directions for Future Research

In future research, we can consider other types of confidential items, such as real-valued or categorical ones. We may also examine other types of perturbations, such as real-valued ones. Even within the Dinur and Nissim (2003) type of setting, we might consider a case where the perturbation is fixed but initially drawn from some known distribution.

A typical example of variable-data perturbation is the CVC technique developed by Garfinkel et al. (2002). We simulated their network algorithm with different parameters w and m on the example database in Garfinkel et al. (2002) and observed, in a number of cases, that given a large enough number of random queries, all camouflage vectors could be discovered by running the LP algorithm used in Chapter 6. Based on these experimental results, we conjecture that every camouflage vector in the Bin-CVC technique is an extreme point of the polyhedron formed by all $2^n$ queries and, conversely, that all the extreme points are camouflage vectors. How this (if true) pertains to polytopes formed with a subset of the $2^n$ possible queries needs to be investigated. We suspect that if the output from the LP algorithm is an integer vector, then it will be one of the extreme points, and therefore one of the camouflage vectors. This is an important possible weakness of the CVC method, since there are generally few camouflage vectors and one of them is the true vector of database values. The discovery of the camouflage vectors reduces the intrusion problem to discovering which, among a small


number of vectors, is the true vector. Insider information on a small number of query values could then easily determine which is the true vector.

Muralidhar et al. (2004) compromised CVC interval protection empirically by employing a simple deterministic procedure. They also claimed that if the CVC technique intends to prevent interval disclosure, for example by increasing the number of camouflage vectors, data utility has to be damaged substantially. Our future research will try to extend their work and propose a more general theoretical method to address the problem.

Since choosing an appropriate security method depends greatly on how well it balances the tradeoff between information loss and disclosure risk, another future task is to develop a general performance measure which can be used to assess comprehensively the disclosure risk and information loss of the variable-data perturbation method, such as a measure for the interval protection. By applying this measure, we would be able to check the utility of the interval answers from the Bin-CVC technique and investigate whether CVC interval protection is practical, or whether the quality of responses to queries outweighs the high level of protection for the database. We hope this evaluation scheme can become a guideline for selecting ideal security methods in SDBs under specific situations.


APPENDIX A
NOTATION TABLES

Table A-1: Notations in Machine Learning and PAC Learning Theory.

  Notation             Definition
  f                    Target concept (or target function)
  x                    Instance
  X                    Instance space or input space
  y                    Output
  Y                    Output space
  S                    Sample
  l                    Sample size
  n                    Number of attributes, or number of records in the database
  h                    Hypothesis
  H                    Hypothesis space
  |H|                  Cardinality of H
  C                    Concept space
  D                    Probability distribution
  $err_D(\cdot)$       Probability of error
  $\epsilon$           Accuracy parameter
  $\delta$             Confidence parameter
  $E_i$                Event i
  s                    Training error
  d                    VC dimension
  $L(\cdot)$           Loss function
  R                    Risk functional
  z                    Observation pairs
  $g(z, \alpha)$       Set of target functions with parameters $\alpha$
  $F(z)$               Unknown probability distribution
  $R_{emp}(\cdot)$     Empirical risk
  $R_{struct}(\cdot)$  Structural risk
  $R_{bound}(\cdot)$   Risk bound


Table A-2: Notations in Statistical Disclosure Control Methods.

  Notation      Definition
  $\mu$         Mean
  $\sigma^2$    Variance
  d             True confidential vector
  e             Perturbation vector
  $\xi$         Random variable with a normal distribution
  V             Covariance matrix
  s             Number of camouflage vectors
  P             Set of camouflage vectors
  $P^j$         j-th camouflage vector, $j = 1,\dots,s$
  $p_i^j$       i-th element in the j-th camouflage vector, $i = 1,\dots,n$
  q             Query
  $u(\cdot)$    Upper bound
  $l(\cdot)$    Lower bound
  $I(\cdot)$    Interval between $l(\cdot)$ and $u(\cdot)$
  w             Total number of paths in the network algorithm
  m             Number of paths consisting only of true-value edges
  p             Proportion of ones in the confidential vector
  $p_j$         Proportion of ones in the j-th camouflage vector
  $card(q)$     Cardinality of query q
  $e_q$         Perturbation generated from an algorithm
  $a_q$         True query answer
  $A(q)$        Perturbed query answer
  k             Precision parameter
  K             Set of precision parameters


APPENDIX B
DATA GENERATED FOR THE UNIFORM DISTRIBUTION

Table B-1: Case 1 with High Mean and High Standard Deviation.

  Perturbation   Frequency   Probability   CDF
  1              600         0.06          0.06
  2              600         0.06          0.11
  3              600         0.06          0.17
  4              600         0.06          0.22
  5              600         0.06          0.28
  6              600         0.06          0.33
  7              600         0.06          0.39
  8              600         0.06          0.44
  9              600         0.06          0.50
  10             600         0.06          0.56
  11             600         0.06          0.61
  12             600         0.06          0.67
  13             600         0.06          0.72
  14             600         0.06          0.78
  15             600         0.06          0.83
  16             600         0.06          0.89
  17             600         0.06          0.94
  18             600         0.06          1.00
  Total          10800       1.00

Table B-2: Case 2 with High Mean and Low Standard Deviation.

  Perturbation   Frequency   Probability   CDF
  1-4            0           0             0
  5              1000        0.1           0.1
  6              1000        0.1           0.2
  7              1000        0.1           0.3
  8              1000        0.1           0.4
  9              1000        0.1           0.5
  10             1000        0.1           0.6
  11             1000        0.1           0.7
  12             1000        0.1           0.8
  13             1000        0.1           0.9
  14             1000        0.1           1.0


Table B-2. Continued.

  Perturbation   Frequency   Probability   CDF
  15-18          0           0
  Total          10000       1

Table B-3: Case 3 with Low Mean and High Standard Deviation.

  Perturbation   Frequency   Probability   CDF
  1              1000        0.1           0.1
  2              1000        0.1           0.2
  3              1000        0.1           0.3
  4              1000        0.1           0.4
  5              1000        0.1           0.5
  6              1000        0.1           0.6
  7              1000        0.1           0.7
  8              1000        0.1           0.8
  9              1000        0.1           0.9
  10             1000        0.1           1.0
  11-18          0           0
  Total          10000       1

Table B-4: Case 4 with Low Mean and Low Standard Deviation.

  Perturbation   Frequency   Probability   CDF
  1-2            0           0             0
  3              2000        0.2           0.2
  4              2000        0.2           0.4
  5              2000        0.2           0.6
  6              2000        0.2           0.8
  7              2000        0.2           1.0
  8-13           0           0


Table B-4. Continued.

  Perturbation   Frequency   Probability   CDF
  14-18          0           0
  Total          10000       1


APPENDIX C
DATA GENERATED FOR THE SYMMETRIC DISTRIBUTION

Table C-1: Case 1 with High Mean and High Standard Deviation.

  Perturbation   Frequency   Probability   CDF
  1              450         0.045         0.045
  2              460         0.046         0.091
  3              480         0.048         0.139
  4              520         0.052         0.191
  5              530         0.053         0.244
  6              570         0.057         0.301
  7              620         0.062         0.363
  8              670         0.067         0.430
  9              700         0.070         0.500
  10             700         0.070         0.570
  11             670         0.067         0.637
  12             620         0.062         0.699
  13             570         0.057         0.756
  14             530         0.053         0.809
  15             520         0.052         0.861
  16             480         0.048         0.909
  17             460         0.046         0.955
  18             450         0.045         1.000
  Total          10000       1

Table C-2: Case 2 with High Mean and Low Standard Deviation.

  Perturbation   Frequency   Probability   CDF
  1-4            0           0             0
  5              200         0.02          0.02
  6              500         0.05          0.07
  7              900         0.09          0.16
  8              1300        0.13          0.29
  9              2100        0.21          0.50
  10             2100        0.21          0.71
  11             1300        0.13          0.84
  12             900         0.09          0.93
  13             500         0.05          0.98
  14             200         0.02          1.00


Table C-2. Continued.

  Perturbation   Frequency   Probability   CDF
  15-18          0           0
  Total          10000       1

Table C-3: Case 3 with Low Mean and High Standard Deviation.

  Perturbation   Frequency   Probability   CDF
  1              875         0.0875        0.0875
  2              905         0.0905        0.1780
  3              970         0.0970        0.2750
  4              1050        0.1050        0.3800
  5              1200        0.1200        0.5000
  6              1200        0.1200        0.6200
  7              1050        0.1050        0.7250
  8              970         0.0970        0.8220
  9              905         0.0905        0.9125
  10             875         0.0875        1.0000
  11-18          0           0
  Total          10000       1

Table C-4: Case 4 with Low Mean and Low Standard Deviation.

  Perturbation   Frequency   Probability   CDF
  1-2            0           0             0
  3              800         0.08          0.08
  4              2400        0.24          0.32
  5              3600        0.36          0.68
  6              2400        0.24          0.92
  7              800         0.08          1.00
  8-13           0           0


Table C-4. Continued.

  Perturbation   Frequency   Probability   CDF
  14-18          0           0
  Total          10000       1


APPENDIX D
DATA GENERATED FOR THE DISTRIBUTION WITH POSITIVE SKEWNESS

Table D-1: Case 1 with High Mean and High Standard Deviation.

  Perturbation   Frequency   Probability   CDF
  1              500         0.050         0.050
  2              550         0.055         0.105
  3              600         0.060         0.165
  4              700         0.070         0.235
  5              800         0.080         0.315
  6              905         0.0905        0.4055
  7              950         0.095         0.5005
  8              755         0.0755        0.576
  9              600         0.060         0.636
  10             500         0.050         0.686
  11             470         0.047         0.733
  12             420         0.042         0.775
  13             400         0.040         0.815
  14             390         0.039         0.854
  15             380         0.038         0.892
  16             370         0.037         0.929
  17             360         0.036         0.965
  18             350         0.035         1.000
  Total          10000       1

Table D-2: Case 2 with High Mean and Low Standard Deviation.

  Perturbation   Frequency   Probability   CDF
  1-4            0           0             0
  5              600         0.06          0.06
  6              1500        0.15          0.21
  7              3000        0.30          0.51
  8              1600        0.16          0.67
  9              900         0.09          0.76
  10             800         0.08          0.84
  11             600         0.06          0.90
  12             500         0.05          0.95
  13             300         0.03          0.98
  14             200         0.02          1.00


Table D-2. Continued.

  Perturbation   Frequency   Probability   CDF
  15-18          0           0
  Total          10000       1

Table D-3: Case 3 with Low Mean and High Standard Deviation.

  Perturbation   Frequency   Probability   CDF
  1              1200        0.120         0.120
  2              1600        0.160         0.280
  3              2250        0.225         0.505
  4              1300        0.130         0.635
  5              900         0.090         0.725
  6              700         0.070         0.795
  7              600         0.060         0.855
  8              550         0.055         0.910
  9              500         0.050         0.960
  10             400         0.040         1.000
  11-18          0           0
  Total          10000       1

Table D-4: Case 4 with Low Mean and Low Standard Deviation.

  Perturbation   Frequency   Probability   CDF
  1-2            0           0             0
  3              2000        0.20          0.20
  4              3500        0.35          0.55
  5              2400        0.24          0.79
  6              1600        0.16          0.95
  7              500         0.05          1.00
  8-13           0           0


Table D-4. Continued.

  Perturbation   Frequency   Probability   CDF
  14-18          0           0
  Total          10000       1


APPENDIX E
DATA GENERATED FOR THE DISTRIBUTION WITH NEGATIVE SKEWNESS

Table E-1: Case 1 with High Mean and High Standard Deviation.

  Perturbation   Frequency   Probability   CDF
  1              320         0.032         0.032
  2              350         0.035         0.067
  3              380         0.038         0.105
  4              410         0.041         0.146
  5              450         0.045         0.191
  6              480         0.048         0.239
  7              520         0.052         0.291
  8              550         0.055         0.346
  9              600         0.060         0.406
  10             850         0.085         0.491
  11             1090        0.109         0.600
  12             850         0.085         0.685
  13             800         0.080         0.765
  14             700         0.070         0.835
  15             600         0.060         0.895
  16             500         0.050         0.945
  17             300         0.030         0.975
  18             250         0.025         1.000
  Total          10000       1

Table E-2: Case 2 with High Mean and Low Standard Deviation.

  Perturbation   Frequency   Probability   CDF
  1-4            0           0             0
  5              550         0.055         0.055
  6              650         0.065         0.120
  7              750         0.075         0.195
  8              800         0.080         0.275
  9              950         0.095         0.370
  10             1500        0.150         0.520
  11             1800        0.180         0.700
  12             1400        0.140         0.840
  13             1000        0.100         0.940
  14             600         0.060         1.000


Table E-2. Continued.

  Perturbation   Frequency   Probability   CDF
  15-18          0           0
  Total          10000       1

Table E-3: Case 3 with Low Mean and High Standard Deviation.

  Perturbation   Frequency   Probability   CDF
  1              700         0.07          0.07
  2              800         0.08          0.15
  3              900         0.09          0.24
  4              1000        0.10          0.34
  5              1100        0.11          0.45
  6              1400        0.14          0.59
  7              1700        0.17          0.76
  8              1000        0.10          0.86
  9              800         0.08          0.94
  10             600         0.06          1.00
  11-18          0           0
  Total          10000       1

Table E-4: Case 4 with Low Mean and Low Standard Deviation.

  Perturbation   Frequency   Probability   CDF
  1-2            0           0             0
  3              500         0.05          0.05
  4              1600        0.16          0.21
  5              2400        0.24          0.45
  6              3500        0.35          0.80
  7              2000        0.20          1.00
  8-13           0           0


Table E-4. Continued.

  Perturbation   Frequency   Probability   CDF
  14-18          0           0
  Total          10000       1


LIST OF REFERENCES

Achugbue, J. O. and Chin, F. Y. (1979). "The Effectiveness of Output Modification by Rounding for Protection of Statistical Databases." INFOR 17(3): 209-218.

Adam, N. R. and Jones, D. H. (1989). "Security of Statistical Databases with an Output Perturbation Technique." Journal of Management Information Systems 6(1): 101-110.

Adam, N. R. and Wortmann, J. C. (1989). "Security-Control Methods for Statistical Databases: A Comparative Study." ACM Computing Surveys 21(4): 515-556.

Angluin, D. (1988). "Queries and Concept Learning." Machine Learning 2(4): 319-342.

Angluin, D. and Laird, P. (1988). "Learning from Noisy Examples." Machine Learning 2(4): 343-370.

Anwar, N. (1993). "Micro-Aggregation – the Small-Aggregates Method." Internal report. Luxembourg, Eurostat.

Aslam, J. A. and Decatur, S. E. (1993). "General Bounds on Statistical Query Learning and PAC Learning with Noise via Hypothesis Boosting." In Proceedings of the 34th Annual IEEE Symposium on Foundations of Computer Science (FOCS '93), Palo Alto, California: 282-291.

Beck, L. L. (1980). "A Security Mechanism for Statistical Databases." ACM Transactions on Database Systems 5(3): 316-338.

Blum, A., Furst, M., Jackson, J., Kearns, M. J., Mansour, Y. and Rudich, S. (1994). "Weakly Learning DNF and Characterizing Statistical Query Learning Using Fourier Analysis." In Proceedings of the 26th Annual ACM Symposium on the Theory of Computing (STOC '94), Montreal, Canada: 253-262.

Blumer, A., Ehrenfeucht, A., Haussler, D., and Warmuth, M. K. (1987). "Occam's Razor." Information Processing Letters 24(6): 377-380.

Blumer, A., Ehrenfeucht, A., Haussler, D., and Warmuth, M. K. (1989). "Learnability and the Vapnik-Chervonenkis Dimension." Journal of the ACM 36(4): 929-965.

Brand, R. (2002). "Microdata Protection through Noise Addition." In Inference Control in Statistical Databases. Berlin Heidelberg, Springer. 2316: 97-116.

PAGE 135

123 Brankovic, L., Miller, M., Horak, P. a nd Wrightson, G. (1997). “Usability of Compromise-free Statistical Databases for Range Sum Queries.” In Proceedings of 9th International Conference on Scientific and Statistical Database Management (SSDBM ‘97), Olympia, Washington: 144-154. Bshouty, N. H. (1998). “A New Composition Theorem for Learning Algorithms.” In Proceedings of the 30th Annual ACM Sy mposium on the Theory of Computing (STOC ’98), Dallas, Texas: 583-589. Bshouty, N. H., Jackson, J., and Tamon, C. (2003). “Uniform-Distribution Attribute Noise Learnability.” Info rmation and Computation 187(2): 277-290. Cesa-Bianchi, N., Dichterman, E., Fischer, P., Shamir, E., and Simon, H.U. (1999). “Sample-Efficient Strategies for Learning in the Presence of Noise.” Journal of the ACM 46(5): 684-719. Chen, G. and Keller-McNulty, S. (1998). “Estima tion of Identification Disclosure Risk in Microdata.” Journal of Official Statistics 14: 79-95. Chin, F. Y., Kossowski, P., and Loh, S. C. (1984). “Efficient Inference Control For Range Sum Queries.” Theore tical Computer Science 32:77-86. Chin, F. Y. and Ozsoyoglu, G. (1979). “Secur ity in Partitioned Dynamic Statistical Databases.” In Proceedings of the IEEE International Computer Software and Applications Conference (COMPSA C ’79), Chicago, Illinois: 594-601. Chin, F. Y. and Ozsoyoglu, G. (1981). “Statis tical Database Design.” ACM Transactions on Database Systems 6(1): 113–139. Chin, F. Y. and Ozsoyoglu, G. (1982). “Auditi ng and Inference Control in Statistical Databases.” IEEE Transacti on Software Engineering 8(6): 574-582. Chu, P. C. (1997). “Cell Suppression Me thodology: The Importance of Suppressing Marginal Totals.” IEEE Transactions on Knowledge and Data Engineering 9(4): 513-523. Cox, L.H. (1975). “Disclosure Analysis and Ce ll Suppression.” In Proceedings of the American Statistical Association (S ocial Statistics Section): 750-755. Cox, L.H. (1980). “Suppression Methodology and St atistical Disclosure Control.” Journal of American Statistical Association (Theory and Methods Section) 75(370): 377385. Crises G. (2004). “Additive Noise for Micr odata Privacy Protection in Statistical Databases.” Research Report. http://vneumann.etse.urv.es/publications/reports/a dditivenoise.pdf (accessed July 2005)

PAGE 136

124 Crises, G. (2004a). “An Introduction to Micr odata Protection for Database Privacy.” Research Report. http://vneumann.etse.urv.es/ publications/reports/mic rodata_introduction.pdf (accessed July 2005) Crises, G. (2004b). “Synthetic Microdata Gene ration for Database Privacy Protection.” Research Report. http://vneumann.etse.urv.es/publicatio ns/reports/synthetic_methods.pdf (accessed July 2005) Crises, G. (2004c). “Non-Perturbative Met hods for Microdata Privacy in Statistical Databases.” Research Report. http://vneumann.etse.urv.es/reports/nonperturbative_met hods.pdf (accessed July 2005) Crises, G. (2004d). “Perturbation Masking for Microdata Privacy Protection in Statistical Databases.” Research Report. http://vneumann.etse.urv.es/reports/per turbative_methods.pdf (accessed July 2005) Crises, G. (2004e). “Trading Off Informati on Loss and Disclosure Risk in Database Privacy Protection.” Research Report. http://vneumann.etse.urv.es/ publications/reports /combining.pdf (accessed July 2005) Cristianini, N. and Shawe-Taylor, J. (2000) An Introduction to Support Vector Machines and Other Kernel-based Learning Methods Cambridge, Cambridge University Press. Dalenius, T. (1981). “A Simple Procedure for Controlled Ro unding.” Statistik Tidskrift 3: 202-208. Dalenius, T and Reiss, S.P. (1982). “Dat a-swapping: A Technique for Disclosure Control.” Journal of Statis tical Planning and Inference 6: 73–85. Decatur, S. E. (1996). “Learning in Hybrid Noise Environments Using Statistical Queries.” In Learning From Data: Artificial Intelligence and Statistics V. Edited by Fisher, V. D. and Lenz, H.J. New York, Springer Verlag: 259-270. Decatur, S.E. (1997). “PAC Learning with C onstant-Partition Classification Noise and Applications to Decision Tr ee Induction.” In Proceedings of the 6th International Workshop on Artificial Intelligence and St atistics, Fort Lauderdale, Florida: 147156. Decatur, S. E. and Gennaro, R. (1995). “On Learning from Noisy and Incomplete Examples.” In Proceedings of the 8th Annual ACM Conference on Computational Learning Theory (COLT ,95), Sa nta Cruz, California: 353-360.

PAGE 137

125 Defays, D. and Nanopoulos, P. (1993). “Panel s of Enterprises and Confidentiality: the Small Aggregates Method.” In Proc eedings of 92 Symposium on Design and Analysis of Longitudinal Surveys, Ottawa, Canada: 195-204. Denning, D. E. (1980). “Secure Statistical Databases with Random Sample Queries.” ACM Transactions on Database Systems 5(3): 291-315. Denning, D. E. (1983). “A security Model fo r the Statistical Database Problem.” In Proceedings of the 2nd Internationa l Workshop on Statistical Database Management (SSDBM ’83), Los Altos, California: 368-390. Denning, D. E., Denning, P. J. and Schwartz, M. D. (1979). “The Tr acker: A Threat to Statistical Database Security.” ACM Transactions on Database Systems 4(1): 7679. Denning, D. E. and Schlorer, J. (1980). “A Fast Procedure for Fi nding a Tracker in A Statistical Database” ACM Tran sactions on Database Systems 5(1): 88-102. Denning, D. E. and Schlorer, J. (1983). “Inf erence Control for St atistical Databases.” Computer 16(7): 69–82. Denning, D. E., Schlorer, J., and Wehrle, E. (1982). Memoryless Inference Controls for Statistical Satabases Purdue University. DeWaal, A. G. and Willenborg, L. C. R. J. (1995). “Global Recordings and Local Suppressions in Microdata Sets.” In Pro ceedings of Statistics Canada Symposium 95, Ottawa, Canada: 121–132. Dinur, I. and Nissim, K. (2003). “Revealing Information while Preserving Privacy.” ACM Press 9(12): 202–210. Dobkin, D., Jones, A. K., and Lipton, R. J. (1979). “Secure Databases: Protection Against User Influence.” ACM Transa ctions on Database Systems 4(1): 97-106. Domingo-Ferrer, J. and Mate o-Sanz, J. M. (2002). “P ractical Data-oriented Microaggregation for Statis tical Disclosure Contro l.” IEEE Transactions on Knowledge and Data Engineering 14(1):189–201. Domingo-Ferrer, J., Mateo-Sanz J. and To rra, V. (2001). “Comparing SDC Methods for Microdata on the Basis of Information Lo ss and Disclosure Risk.” Pre-proceedings of Exchange of Technology and Know-how. and. New Techniques and Technologies for Statistics ( ETK-NTTS ‘01), Crete, Greece. 2: 807-826. Domingo-Ferrer, J. and Mateo-Sa nz J. (1998). “Current Direc tions in Statistical Data Protection.” Research in Official Statistics 2: 105-112.

PAGE 138

126 Domingo-Ferrer, J. and Torra, V. (2001). “A Quantitative Comparison of Disclosure Control Methods for Microdata.” Confid entiality, Disclosure and Data Access: Theory and Practical Applica tions for Statistical Agencies Edited by P Doyle, P., Lane, J., Theeuwes, J. and Zayatz, L. Amsterdam, North-Holland: 111-134. Domingo-Ferrer, J. and Torra, V. (2003). “O n the Connections Be tween Statistical Disclosure Control for Microdata and Some Artificial Intelligence Tools.” Information Sciences 151: 153–170. Duncan, G. T. and Fienberg S. E. (19 99). “Obtaining Information While Preserving Privacy: a Markov Perturbation Method for Tabular Data.”. In Proceedings of Statistical Data Protection, Li sbon. Luxembourg, Eurostat: 351-362. Duncan, G. T., Fienberg S. E., Krishnan, R., Padman, R. and Roehrig, S. F. (2001). “Disclosure Limitation Methods and Info rmation Loss for Tabular Data.” In Confidentiality, Disclosure and Data Access: Theory and Practical Applications for Statistical Agencies Edited by P Doyle, P., Lane, J., Theeuwes, J. and Zayatz, L. Amsterdam, North-Holland:135-166. Duncan, G. T., Keller-McNulty, S. A. and St okes, S. L. (2004). “Database Security and Confidentiality: Examining Disclosure Risk vs. Data Utility through the R-U Confidentiality Map.” http://www.niss.org/technical reports/tr142.pdf (accessed July 2005) Duncan, G. T. and Lambert, D. (1989). “The Risk of Disclosure of Microdata.” Journal of Business and Economic Statistics 7: 207-17. Fellegi, I. P. (1972). “On the Question of Statistical Confidentiality.” Journal of American Statistical Association 67(337): 7-18. Fellegi, I. P. and Phillips, J. L. (1974). “S tatistical Confidentialit y: Some Theory and Applications to Data Disseminati on.” Annals of Economic and Social Measurement 3(2): 399-409. Felso, F., Theeuwes, J. and Wanger G. G. (2001). “Disclosure Limitation Methods in Use: Results of a Survey.” In Confidentia lity, Disclosure and Data Access: Theory and Practical Applications for Statistical Agencies Edited by P Doyle, P., Lane, J., Theeuwes, J. and Zayatz, L. Am sterdam, North-Holland: 17-42. Fienberg, S. E. and McIntyre J. (2004). Data swapping: Variations On A Theme By Dalenius and Reiss. Privacy in Statistical Databases Berlin Heidelberg, Springer. 3050: 14–29. Friedman, A. D., and Hoffman, L. J. (1980). “Towards A Fail-safe Approach to Secure Databases.” In Proceedings of IEEE Sym posium on Security and Privacy, Oakland, California: 18-22.

PAGE 139

127 Garfinkel, R., Gopal, R., Goes, P. (2002). “Privacy Protection of Binary Confidential Data Against Deterministic, Stochastic, a nd Insider Threat.” Management Science 48(6): 749-764. Garfinkel, R., Gopal, R. and Rice, D. (2004) “New Approaches to Disclosure Limitation While Answering Queries to a Database: Protecting Numerical Confidential Data Against Insider Threat Based on Data or Algorithms.” http://www-eio.upc.es/seminar/04/garfinke l.pdf (accessed July 2005) Goldman, S. A. (1991). Com putational Learning Theory Lecture Notes. http://www.cs.wustl.edu/cs/cs/archive/CS582_SP96/ (accessed July 2005) Goldman, S. A. and Sloan, R. (1995). “Can PAC learning Algorithms Tolerate Random Attribute Noise?” Algorithmica 14(1): 70-84. Gomatam, S., Karr, A. F., Reiter, J. P. and Senil, A. P. (2004). “Data Dissemination and Disclosure Limitation in a World Without Microdata: A Risk-Utility Framework for Remote Access Analysis Servers.” http://www.niss.org/techni calreports/tr138.pdf (accessed July 2005) Gopal, R., Garfinkel, R. and Goes, P. ( 2000). “Confidentiality via Camouflage: The CVC Approach to Disclosure Limitation When Answering Queries to Databases.” Operations Research 50(3): 501-516. Gopal, R., Goes, P. and Garfinkel, R. ( 1998). “Interval Protection of Confidential Information in a Database.” Journal on Computing 10(3): 309-322. Hansen, S. L. and Mukherjee. S. (2003). “A Po lynomial Plgorithm for Pptimal Univariate Microaggregation.” IEEE Transactions on Knowledge and Data Engineering 15(4): 1043-1044. Haq, M. I. (1977). “On Safeguarding Statis tical Disclosure by Giving Approximate Answers to Queries.” In Proceedings of In ternational Computer Symposium, Lige, Belgium: 491-495. Haq, M. I. (1975). “Insuring Individual’s Priv acy from Statistical Database Users.” In Proceedings of National Computer Conf erence, Anaheim, California. 44: 941-946. Hoffman, L. J. (1977). Modern Methods for Computer Security and Priuacy New Jersey, Prentice-Hall, Englewood Cliffs. Hoffman, L. J., and Miller, W. F. (1970). “G etting A Personal Dossier From a Statistical Data Bank.” Datamation 16(5): 74-75.

PAGE 140

128 Holvast, J. (1999). “Statistical Dissemin ation, Confidentiality and Disclosure.” In Proceedings of the Joint Eurostat/UNE CE Work Session on Statistical Data Confidentiality. Luxemb ourg, Eurostat: 191-207. ILOG (1999). ILOG CPLEX 6.5 User’s Manual Jackson, J. (2003). “On the Efficiency of No ise-Tolerant PAC Algorithms Derived from Statistical Queries.” Annals of Mathematics and Artificial Intelligence 39(3): 291313. Jaro, M. A. (1989). “Advances in Record-li nkage Methodology as A pplied to Matching the 1985 Census of Tampa, Florida.” Journa l of American Sta tistical Association 84: 414-420. Jonge, W. DE. (1983). “Compromising statisti cal databases: Respondi ng to Queries about Means.” ACM Transactions on Database Systems 8(1): 60-80. Kearns, M. (1993). “Efficient Noise-Tolerant Learning from Statis tical Queries.” In Proceedings of the 25th Annual ACM Sy mposium on Theory of Computing, San Diego, California: 392-401. Kearns, M. and Li, Ming. (1993) “Learning in the Presence of Malicious Errors.” Journal on Computing 22(4): 392-401. Kelly, J.P., Golden, B.L., and Assad, A.A. (1992). “Cell Suppression: Disclosure Protection for Sensitive Tabular Data.” Networks 22: 397-417. Kim, J. J. (1986). “A Method for Limiting Disclosure in Microdata Based on Random Noise and Transformation.” In Proceedings of the 2nd on Survey Research Methods, Alexandria, Virginia: 303-308. Kleinberg, J., Papadimitriou, C. and Raghava n, P. (2000). “Auditing boolean attributes.” In Proceedings of the 9th ACM SIGMOD-SIGACT-SI GART Symposium on Principles of Database Systems, Dallas, Texas: 86-91. Kooiman, P., Willenborg, L., and Gouweleeuw, J. (1998). “A Method of Disclosure Limitation of Microdata.” Research Report. Statistics Netherlands. Lambert, D. (1993). “Measures of Disclosure Risk and Harm.” Journal of Official Statistics 9: 461–8. Lefons, D., Silverstri, A., and Tangorra, F. ( 1983). “An Analytic Appr oach to Statistical Databases.” In Proceedings of 9th Confer ence on Very Large Databases, Florence, Italy: 260-273. Leiss, E. (1982). “Randomizing a Practical Method for Protecting Statistical Databases against Compromise.” In Proceedings of 8th Conference on Very Large Databases, Mexico City. Mexico: 189-196.

PAGE 141

129 Li, Y., Wang, L., Wang, X.S. and Jajodia, S. (2002a). “Auditing Interval-based Inference.” In Proceedings of the14t h Conference on Advanced Information Systems Engineering (CAiSE ’02), Toronto, Canada: 553-568. Li, Y., Wang, L., Zhu, S.C. and Jajodi a, S. (2002b). “A Privacy Enhanced Microaggregation Method.” In Proceedings of the 2nd International Symposium on Foundations of Information and Knowledge Systems (FoIKS ‘02), Salzau Castle, Germany: 148–159. Liew, C. K., Choi, W. J., and Liew, C. J. (1985). “A Data Distortion by Probability Distribution.” ACM Transactions on Database Systems 10(3): 395-411. Luige, T. and Meliskova, J. (1999). “Conf identiality Practices in the Trnsition Countries.” In Proceedings of the Jo int Eurostat/UNECE Work Session on Statistical Data Confidentialit y, Luxembourg, Eurostat: 287-319. Malvestuto, F.M. (1993). “A Universal che me Approach to Statistical Databases Containing Homogeneous Summary Tables .” ACM Trans. Database Systems 18(4): 678-708. Malvestuto, F.M. and Mezzini, M. (2003). “A uditing Sum Queries.” In Proceedings of the 9th International Confer ence on Database Theory (IC DT ’03), Siena, Italy: 126–146. Malvestuto, F.M. and Moscarini, M. ( 1990). “Query Evaluability in Statistical Databases.” IEEE Transactions on Knowledge and Data Engineering 2(4): 425430. Malvestuto, F.M. and Moscarini, M. (1998). “Computational Issues Connected with the Protection of Sensitive Statistics by Aud iting Sum-queries.” In Proceedings of IEEE Scientific and Statistical Databa se Management, Capri, Italy: 134–144. Malvestuto, F.M., Moscarini, M. and Rafa nelli, M. (1991). “Suppressing Marginal Cells to Protect Sensitive Information in a Two-Dimensional Statistical Table.” In Proceedings of the10th ACM SIGACT-S IGMOD-SIGART Symposium Principles of Database Systems, Denver, Colorado: 252-258. Ms, M. (2000). “Statistical Data Protection Techniques.” http://www.eustat.es/document/datos/prot_ seguridat_i.pdf (assessed July 2005) Mateo-Sanz, J. M. and Domingo-Ferrer. J. (1999). “A Method for Data-Oriented Multivariate Microaggregation.” In Proceedi ngs of Statistical Data Protection '98, Luxembourg, UK: 89-99. Matloff, N. E. (1986). “Another Look at the Use of Noise Addition for Database Security.” In Proceedings of IEEE Sym posium on Security and Privacy, Oakland, California: 173-180.

PAGE 142

130 Muralidhar, K., Batra, D. and Kirs, P. J. ( 1995). “Accessibility, Security, and Accuracy in Statistical Database: The Case for the Multiplicative Fixed Data Perturbation Approach.” Management Science 41(9): 1549-1564. Muralidhar, K., Parsa, R., and Sarathy, R. (1999). “A General Add itive Data Perturbation Method for Database Security.” Management Science 45(10): 1399-1415. Muralidhar, K., Li, H. and Sarathy, R. (2004). “Disclosure Risk Problems with Confidentiality via Camouflage.” Natarajan, B. K. (1991). Machine Learning: A Theoretical Approach. San Francisco, California, Morgan Kaufmann Publishers, Inc. Oganian, A. (2002). Security and Informati on Loss in the Protection of Statistical Databases Dissertation Thesis, Universi tat Politcnica de Catalunya. Ozsoyoglu, G. and Chin, F. Y. (1982). “Enhancin g the Security of Statistical Database with a Question-Answering System and a Kernel Design.” IEEE Transactions in Software Engineering 8(3): 223-234. Pagliuca, D. and Seri, G. (1998). “Some Results of Individual Ranking Method on the System of Enterprise Accounts Annual Su rvey.” Esprit SDC Project, Deliverable MI-3/D2. Reiss, S.P. (1984). “Practical Data-swapping: The First Steps.” IEEE Transactions on Database Systems 9: 20-37. Samuel, S. M. (1998). “A Bayesian, Speices-S ampling-Inspired Appr oach to the Unique Problem in Microdata Disclosure Risk A ssessment.” Journal of Official Statistics 14: 373-383. Sande, G. (1983). “Automated Cell Suppression to Reserve Confidentiality of Business Statistics.” In Proceedings of the 2nd International Workshop on Statistical Database Management, Los Altos, California: 346-353. Sarathy, R. and Muralidhar, K. (2002). “The Security of Confidential Numerical Data in Databases.” Information Systems Research 13(4): 389-403. Schlorer, J. (1975). “Identific ation and Retrieval of Persona l Records From a Statistical Data Bank.” Methods of Information in Medicine : 14(1): 7-13. Schlorer, J. (1976). “Confidentia lity of Statistical Records: A Threat Monitoring Scheme of On-line Dialogue.” Methods of Information in Medicine 15(1): 36-42. Schlorer, J. (1980). “Disclosure From Statis tical Databases: Quantitative Aspects of Trackers.” ACM Transactions on Database Systems 5(4): 467-492.

PAGE 143

131 Schlorer, J. (1981). “Security of Stat istical Databases: Multidimensional Transformation.” ACM Transactions on Database Systems 6(1): 95-112. Schlorer, J. (1983). “Information Loss in Part itioned Statistical Databases.” Computer Journal 26(3): 218-223. Schwartz, M. D., Denning, D. E., and Denning, P. J. (1979). “Linear Queries in statistical Databases.” ACM Transactions on Database Systems 4(2): 156-167. Schlkopf, B. and A. J. Smola. (2001). Learni ng with Kernels: Support Vector Machines, Regularization, Optimization and Beyond Boston, MIT Press. Seb, F., Domingo-Ferrer, J., Mateo-Sanz and Torra, V. (2002). “Postmasking Optimization of the Tradeoff between Information Loss and Disclosure Risk in Masked Microdata Sets.” Inference Control in Statistical Databases Berlin Hedelberg, Springer. 2316: 63-171. Shackelford, G. and Volper, D. (1988). “Learni ng k-DNF with Noise in the Attributes.” In Proceedings of the 1988 Workshop on Computing Learning Theory, MIT, Massachusetts: 97-103. Skinner, C., Marsh, C., Openshaw, S., and Wy mer, C. (1994). “Disclosure Control for Census Microdata.” Journa l of Official Statistics 10: 31-51. Sloan, R. (1988). “Types Noise in Data fo r Concept Learning.” In the 1988 Workshop on Computational Learning Theor y, MIT, Massachusetts: 91-96. Sloan, R. (1995). “Four Types of Noise in Data for PAC Learning.” Information Processing Letter 54: 157-162. Spruill, N. L. (1983). “The Confidentiality a nd Analytic Usefulness of Masked Business Microdata.” In Proceedings of the Secti on on Survey Research Methods, American Statistical Association: 602-607. Sullivan, G.R. (1989). The Use of Added E rror to Avoid Disclosure in Microdata Release Dissertation Thesis, Iowa State University. Sullivan, C. M. (1992). “An Overview of Disclo sure Principles.” Bureau of the Census Statistical Research Division Res earch Report Series No. RR-92/09. Tendick, P. (1991). “Optimal Noise Add ition for Preserving Confidentiality in Multivariate Data.” Journal of Statistical Planning and Inference 27(2): 341-353. Tendick, P. and Matloff, N. (1994). “A modified Ra ndom Perturbation Method for Database Security.” ACM Transactions on Database Systems 19(1): 47-63.

PAGE 144

132 Trottini, M. and Fienberg, S. (2002). “Modeling User Uncerta inty for Disclosure Risk and Data Utility.” International Journal of Uncertainty, Fuzziness and KnowledgeBased Systems 10(5): 511-528. Truta, T. M., Fotouhi, F. and Barth-Jones, D. (2004). “Disclosure Risk Measures for the Sampling Disclosure Control Method.” In Proceedings of ACM 2004 Symposium on Applied Computing, Nicosia, Cyprus: 301-306. Valiant, L. G. (1984). “A Theory of th e Learnable.” Communications of the ACM 27: 1134-1142. Valiant, L. G. (1985). “Learni ng Disjunctions of Conjunction.” In Proceedings of 9th International Joint Conference on Artificial Intelligence, Los Angeles, California: Vapnik, V. (1998). Statistical Learning Theory New York, John Wiley & Sons. Vapnik, V. and Chervonenkis, A. (1971). “O n the Uniform Convergence of Relative Frequencies of Events to their Probab ilities.” Theory of Probability and its Applications 16(2): 264-280. Willenborg, L. and Waal, T. (2000). Elements of Statistical Disclosure Control New York. Springer. Willenborg, L. and Waal, T. (1996). Statis tical Disclosure Control in Practice New York, Springer. Yancey, W. E., Winkler, W. E. and Creecy, R. H. (2002). “D isclosure Risk Assessment in Perturbative Microdata Protection.” Infe rence Control in Statistical Databases Berlin Hedelberg, Springer. 2316: 135-152. Yu, C. T., and Chin, F. Y. (1977). “A Study on the Protection of Statistical Databases.” In Proceedings of ACM SIGMOD Intern ational Conference on Management of Data, Toronto, Canada: 169-181.

BIOGRAPHICAL SKETCH

Ling He graduated from the University of International Business and Economics with a Bachelor of Arts degree in economics in 1996. She received a Master of Science degree in Decision and Information Sciences in 2003 and a Ph.D. degree in 2005 at the University of Florida. Her research interests focus on database management, machine learning theory and applications, statistical learning theory, data mining, information security, information retrieval, and e-commerce.

She intends to pursue an academic research and teaching career after the completion of her doctoral degree.














DISCLOSURE CONTROL OF CONFIDENTIAL DATA
BY APPLYING PAC LEARNING THEORY















By

LING HE

















A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY


UNIVERSITY OF FLORIDA


2005





























Copyright 2005

by

Ling He

































I would like to dedicate this work to my parents, Tianqin He and Yan Gao, for their
endless love and encouragement through all these years.















ACKNOWLEDGMENTS

I would like to express my complete gratitude to my advisor, Dr. Gary Koehler.

This dissertation would not have been possible without his support, guidance, and

encouragement. I have been very fortunate to have an advisor who is always willing to

devote his time, patience and expertise to the students. During my Ph.D. program, he

taught me invaluable lessons and insights on the workings of academic research. As a

distinguished scholar and a great person, he sets an example that always encourages me

to seek excellence in the academic area as well as my personal life.

I am very grateful to my dissertation cochair, Dr. Haldun Aytug. His advice,

support and help in various aspects of my research carried me on through a lot of difficult

times. In addition, I would like to thank the rest of my thesis committee members: Dr.

Selwyn Piramuthu and Dr. Anand Rangarajan. Their valuable feedback and comments

helped me to improve the dissertation in many ways.

I would also like to acknowledge all the faculty members in my department,

especially the department chair, Dr. Asoo Vakharia, for their support, help and patience.

I also thank my friends for their generous help, understanding and friendship in the

past years. My thanks also go to my colleagues in the Ph.D. program for their precious

moral support and encouragement.

Last, but not least, I would like to thank my parents for always believing in me.
















TABLE OF CONTENTS

ACKNOWLEDGMENTS

LIST OF TABLES

LIST OF FIGURES

ABSTRACT

CHAPTER

1 INTRODUCTION
    1.1 Background
    1.2 Motivation
    1.3 Research Problem
    1.4 Contribution
    1.5 Organization of Dissertation

2 STATISTICAL AND COMPUTATIONAL LEARNING THEORY
    2.1 Introduction
    2.2 Machine Learning
        2.2.1 Introduction
        2.2.2 Machine Learning Model
    2.3 Probably Approximately Correct Learning Model
        2.3.1 Introduction
        2.3.2 The Basic PAC Model: Learning Binary Functions
        2.3.3 Finite Hypothesis Space
        2.3.4 Infinite Hypothesis Space
    2.4 Empirical Risk Minimization and Structural Risk Minimization
        2.4.1 Empirical Risk Minimization
        2.4.2 Structural Risk Minimization
    2.5 Learning with Noise
        2.5.1 Introduction
        2.5.2 Types of Noise
        2.5.3 Learning from Statistical Query
    2.6 Learning with Queries

3 DATABASE SECURITY-CONTROL METHODS
    3.1 A Survey of Database Security
        3.1.1 Introduction
        3.1.2 Database Security Techniques
        3.1.3 Microdata Files
        3.1.4 Tabular Data Files
    3.2 Statistical Database
        3.2.1 Introduction
        3.2.2 An Example: The Compromise of Statistical Databases
        3.2.3 Disclosure Control Methods for Statistical Databases

4 INFORMATION LOSS AND DISCLOSURE RISK
    4.1 Introduction
    4.2 Literature Review

5 DATA PERTURBATION
    5.1 Introduction
    5.2 Random Data Perturbation
        5.2.1 Introduction
        5.2.2 Literature Review
    5.3 Variable Data Perturbation
        5.3.1 CVC Interval Protection for Confidential Data
        5.3.2 Variable-data Perturbation
        5.3.3 Discussion
    5.4 A Bound for the Fixed-data Perturbation (Theoretical Basis)
    5.5 Proposed Approach

6 DISCLOSURE CONTROL BY APPLYING LEARNING THEORY
    6.1 Research Problems
    6.2 The PAC Model for the Fixed-data Perturbation
    6.3 The PAC Model for the Variable-data Perturbation
        6.3.1 PAC Model Setup
        6.3.2 Disqualifying Lemma 2
    6.4 The Bound of the Sample Size for the Variable-data Perturbation Case
        6.4.1 The Bound Based on the Disqualifying Lemma Proof
        6.4.2 The Bound Based on the Sample Size
        6.4.3 Discussion
    6.5 Estimating the Mean and Standard Deviation

7 EXPERIMENTAL DESIGN AND RESULTS
    7.1 Experimental Environment and Setup
    7.2 Data Generation
    7.3 Experimental Results
        7.3.1 Experiment 1
        7.3.2 Experiment 2

8 CONCLUSION
    8.1 Overview and Contribution
    8.2 Limitations
    8.3 Directions for Future Research

APPENDIX

A NOTATION TABLES

B DATA GENERATED FOR THE UNIFORM DISTRIBUTION

C DATA GENERATED FOR THE SYMMETRIC DISTRIBUTION

D DATA GENERATED FOR THE DISTRIBUTION WITH POSITIVE SKEWNESS

E DATA GENERATED FOR THE DISTRIBUTION WITH NEGATIVE SKEWNESS

LIST OF REFERENCES

BIOGRAPHICAL SKETCH
















LIST OF TABLES


Table

3-1: Original Records
3-2: Masked Records
3-3: Original Table
3-4: Published Table
3-5: A Hospital's Database
5-1: An Example Database
5-2: The Example Database with Camouflage Vector
5-3: An Example of Interval Disclosure
5-4: LP Algorithm
6-1: Bounds on the Sample Size with Different Values of n
6-2: The Relationship among μ, σ, ε and l
6-3: Heuristic to Estimate the Mean μ, Standard Deviation σ and the Bound l
6-4: Summary of the Estimated μ, σ and l in the CVC Example Network
7-1: Summary of Four Cases with Different Means and Standard Deviations
7-2: The Intervals of [a, b] under the Four Cases
7-3: Experiment Results on 16 Tests with the Means, Standard Deviations, Sample Sizes and Average Error Rates
7-4: Experimental Results on the Average Error Rates with l = 6,000 for 16 Cases















LIST OF FIGURES


Figure

2-1: Error Probability
3-1: Microdata File That Has Been Read Into SPSS
4-1: R-U Confidentiality Map, Univariate Case, n = 10
5-1: Network with (m, w) = (1, 3) (data source: Garfinkel et al. 2002)
5-2: Discrete Distribution of Perturbations from the Bin-CVC Network Algorithm
5-3: Relationships of c, c', c and d
5-4: Illustration of the Connection between PAC Learning and Data Perturbation
6-1: Relationships of H0, H1, H2, h0, h1 and d in the Fixed-Data Perturbation
6-2: Relationships of H0, H1, H2, h0, h1 and d in the Variable-Data Perturbation
6-3: A Bimodal Distribution of Perturbations in the CVC Network
6-4: A Distribution of Perturbations in the CVC Network with μ > σn > a
7-1: Plots of Four Uniform Distributions of Perturbations at Different Means and Standard Deviations
7-2: Plots of Four Symmetric Distributions of Perturbations at Different Means and Standard Deviations
7-3: Plots of Four Distributions with Positive Skewness of Perturbations at Different Means and Standard Deviations
7-4: Plots of Four Distributions with Negative Skewness of Perturbations at Different Means and Standard Deviations
7-5: Plot of Average Error Rates (%) for 16 Tests
7-6: The Probability Histogram of the Perturbation Distribution for the CVC Network
7-7: Plot of Bounds on the Sample Size for 16 Tests















Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

DISCLOSURE CONTROL OF CONFIDENTIAL DATA
BY APPLYING PAC LEARNING THEORY
By

Ling He

August 2005

Chair: Gary Koehler
Cochair: Haldun Aytug
Major Department: Decision and Information Sciences

With the rapid development of information technology, massive data collection is

relatively easier and cheaper than ever before. Thus, the efficient and safe exchange of

information has again become a central concern of database management.

The challenge we face today is to provide users with reliable and useful data while

protecting the privacy of confidential information contained in the database.

Our research concentrates on statistical databases, which usually store a large

number of data records and are open to the public where users are allowed to ask only

limited types of queries, such as Sum, Count and Mean. Responses for those queries are

aggregate statistics intended to prevent disclosure of the identity of any unique record in the

database.

My dissertation aims to analyze these problems from a new perspective using

Probably Approximately Correct (PAC) learning theory, which attempts to discover the

true function by learning from examples. Different from traditional methods from which









database administrators apply security methods to protect the privacy of statistical

databases, we regard the true database as the target concept that an adversary tries to

discover using a limited number of queries, in the presence of some systematic

perturbations of the true answer. We extend previous work and classify a new data

perturbation method, the variable data perturbation, which protects the database by adding random noise to the confidential field. This method uses a parametrically driven

algorithm that can be viewed as generating random perturbations by some (unknown)

discrete distribution with known parameters, such as the mean and standard deviation.

The bounds we derive for this new method show how much protection is necessary to

prevent the adversary from discovering the database with high probability at small error.

Put in PAC learning terms, we derive bounds on the amount of error an adversary makes

given a general perturbation scheme, number of queries and a confidence level.














CHAPTER 1
INTRODUCTION

1.1 Background

Statistical organizations, such as U.S. Census Bureau, National Statistical Offices

(NSOs), and Eurostat, collect large amounts of data every year by conducting different

types of surveys from assorted individuals. Meanwhile, the data stored in the statistical

databases (SDBs) are disseminated to the public in various forms, including microdata

files, tabular data files or sequential queries to the online databases. The data are

retrieved, summarized and analyzed by various database users, i.e., researchers, medical

institutions or business companies. Among the published data, restrictions are established

on the release of sensitive data in order to comply with the confidentiality agreements

imposed by the sources or providers of the original information. Therefore, the protection

of confidential information becomes a critical issue with serious economic and legal

implications which in turn expands the scope and necessity of improved security in the

database field.

Statistical databases usually store a large number of data records and are open to

the public where users are allowed to ask only limited types of queries, such as Sum,

Count and Mean. Responses for those queries are aggregate statistics that aim to prevent

disclosing the identity of a unique record in the database.

With the rapid development of information technology, it becomes relatively easier

and cheaper to obtain data than ever before. With the recent passage of The Personal

Responsibility and Work Opportunity Act of 1996 (The Welfare Reform Act) (Fienberg









2000) and the Health Insurance Portability and Accountability Act of 1996 (HIPAA) in the

United States, the protection of confidential information collected by statistical

organizations has again become a central issue of database management, as it was in the 1970s and 1980s. These statistical organizations have the legal and ethical

obligations to maintain the accuracy, integrity and privacy of the information contained

in their databases.

1.2 Motivation

Traditional research on SDB privacy, which is also called Statistical Disclosure

Control (SDC), has been under way for over 30 years. SDC encompasses a broad range of security-

control methods. Among them, microaggregation, cell suppression and random data

perturbation are some of the most promising SDC methods. Recently, Garfinkel et al.

(2002) developed a new technique called CVC protection which designs a network

algorithm to construct a series of camouflage vectors which hides the true confidential

vector. This CVC technique provides interval answers to ad-hoc queries. All those SDC

methods attempt to provide the SDB users with reliable and useful data (minimizing the

information loss) while protecting the privacy of the confidential information in the

database (minimizing the disclosure risk) as well.

Probably Approximately Correct (PAC) learning theory is a framework for

analyzing machine learning algorithms. It attempts to discover the true function by

learning from examples which are randomly drawn from an unknown but fixed

distribution. Given accuracy and confidence parameters, the PAC model bounds the error

that the true function makes.

Different from the traditional methods from which database administrators apply

SDC methods to protect the privacy of SDBs, we approach the database security problem









from a new perspective, from which we assume that an adversary regards the true

confidential data in the database as the target concept and tries to discover it within a

limited number of queries by applying PAC learning theory.

We describe how much protection is necessary to guarantee that the adversary

cannot uncover the database's confidential information with high probability. Put in PAC

learning terms, we derive bounds on the amount of error an adversary makes given a

general perturbation scheme, number of queries and a confidence level.

1.3 Research Problem

Additive data perturbation includes some of the most popular database security

methods. Inspired by the CVC technique, we classify a new method into this category: the variable data perturbation, which protects a database by adding random noise.

Different from the fixed random data perturbation method, this method effectively

generates random perturbations which have an unknown discrete distribution. However,

parameters, such as the mean and standard deviation, can be estimated. The variable data

perturbation method is the focus of our research.

We intend to derive a bound on the level of error that an adversary may make while

compromising a database. We extend the previous work by Dinur and Nissim (2003),

who found a bound for the fixed data perturbation method, and deploy the PAC learning

theory to develop a new bound for the variable data perturbation.

A threshold on the number of queries is developed from the error bound. With high

probability, the adversary can disclose the database with small error if this number

of queries is asked. Therefore, we may find out how much protection would be necessary

to prevent the disclosure of the confidential information in a statistical database.









Our experiments indicate that a high level of protection may yield answers that are

not useful whereas useful answers can lead to the compromise of a database.

1.4 Contribution

Two major contributions are expected from this research. First, we approach the

database security problem from a new perspective instead of following the traditional

research paths in this field. By applying PAC learning theory, we regard an adversary of

the database as a learner who tries to discover the confidential information within a

certain number of queries. We show that both SDC methods and PAC learning theory

actually use similar methodology for different purposes. Second, we derive a PAC-like

bound on the sample size for the variable data perturbation method, within which the

database can be compromised with a high probability at small error. Based on this result,

we can determine whether a security method provides enough protection to the database.

1.5 Organization of Dissertation

The dissertation is organized into eight chapters. Chapter 2 provides an overview of the

important concepts, methodologies and models in the fields of machine learning and PAC

learning theory. In Chapter 3, we summarize database security-control methods in

microdata files, tabular data files and the statistical database which is the emphasis of our

efforts. We review the literature of performance measurements for the database

protection methods in Chapter 4. Following that, in Chapter 5 random data perturbation

methods are reviewed and a new data perturbation method, variable-data perturbation, is

defined and developed. Two papers that motivated our research are reviewed and

explained. We propose our approach at the end of this chapter. In Chapter 6, we introduce

our methodology and develop the research model. A bound on the sample size for the

variable data perturbation method is derived, within which the confidential information








can be disclosed. In Chapter 7, experiments are designed and conducted to test our

theoretical conclusions from previous chapters. Experimental results are summarized and

analyzed at the end. Chapter 8 concludes our work and gives directions for future

research.














CHAPTER 2
STATISTICAL AND COMPUTATIONAL LEARNING THEORY

In this chapter, we introduce Statistical and Computational Learning Theory, a

formal mathematical model of learning. The overview focuses on the PAC model, the

most commonly used theoretical framework in this area. We then move to a brief review

of statistical learning theory and its two important principles: the empirical and structural risk minimization principles. Other well-known concepts and theorems are also investigated

here. At the end of the chapter, we extend the basic PAC framework to more practical

models, that is, learning with noise and query learning models.

2.1 Introduction

Since the 1960s, researchers have been diligently working on how to make

computing machines learn. Research has focused on both empirical and theoretical

approaches. The area is now called machine learning in computer science but referred to

as data mining, knowledge discovery, or pattern recognition in other disciplines.

Machine learning is a mainstream of artificial intelligence. It aims to design learning

algorithms that identify a target object automatically without human involvement. In the

machine learning area, it is very common to measure the quality of a learning algorithm

based on its performance on a sample dataset. It is therefore difficult to compare two

algorithms strictly and rigorously if the criterion depends only on empirical results.

Computational learning theory defines a formal mathematical model of learning, and it

makes it possible to analyze the efficiency and complexity of learning algorithms at a

theoretical level (Goldman 1991).









2.2 Machine Learning

2.2.1 Introduction

In this section we start our review with an introduction to important concepts in the

machine learning field, such as hypotheses, training samples, instances, instance spaces,

etc. This is followed by a demonstration of the basic machine learning model which is

designed to generate an hypothesis that closely approximates the unknown target concept.

See Natarajan (1991) for a complete introduction.

2.2.2 Machine Learning Model

Many machine learning algorithms are utilized to tackle classification problems

which attempt to classify objects into particular classes. Three types of classification

problems include binary classification, with two classes; multi-class classification, handling a finite number of output categories; and regression, whose outputs are real

values (Cristianini and Shawe-Taylor 2000).

Most machine learning methods learn from examples of the target concept. This is

called supervised learning. The target concept (or target function) f is an underlying

function that maps data from the input space to the output space. The input space is also

called an instance space, denoted as X, which is used to describe each instance

x ∈ X ⊆ ℝ^n. Here n represents the number of dimensions or attributes of the input instance. The output space, denoted as Y, contains every possible output label y ∈ Y. In the binary classification case, the target concept (or target function) f(x) classifies all instances x ∈ X into negative and positive classes, labeled 0 and 1, with X ⊆ ℝ^n and Y = {0,1}. Let f(x) = 1 if x belongs to the positive (true) class, and f(x) = 0 (false) otherwise.









Suppose a sample S includes l pairs of training examples, S = ((x_1, y_1), ..., (x_l, y_l)). Each x_i is an instance, and the output y_i is x_i's classification label.

The learning algorithm inputs the training sample and outputs an hypothesis h(x)

from the set of all hypotheses under consideration which best approximates the target

concept f(x) according to its criteria. An hypothesis space H is a set of all possible

hypotheses. The target concept is chosen from the concept space, f ∈ C, which consists

of a set of all possible concepts (functions).
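To make these definitions concrete, the following short Python sketch (our own illustration; the names sample and h are hypothetical) writes down a small training sample and an hypothesis, and computes how often they disagree:

    # A training sample S of l = 4 examples over n = 3 binary attributes.
    sample = [((0, 0, 1), 1), ((0, 1, 1), 1), ((1, 0, 0), 0), ((1, 1, 0), 0)]

    # An hypothesis h : X -> {0, 1}; here h labels x by the complement of
    # its first attribute.
    def h(x):
        return 1 - x[0]

    # Fraction of training examples on which h disagrees with the labels.
    empirical_error = sum(h(x) != y for x, y in sample) / len(sample)
    print(empirical_error)  # 0.0, so h is consistent with this sample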

2.3 Probably Approximately Correct Learning Model

2.3.1 Introduction

The PAC model proposed by Valiant in 1984 is considered the first formal

theoretical framework to analyze machine learning algorithms, and it formally initiated

the field of computational learning theory. By learning from examples, the PAC model

combines methods from complexity theory and probability theory, aimed at measuring

the complexity of learning algorithms. The core idea is that the hypothesis generated

from the learning algorithm approximates the target concept with a high probability at a

small error in polynomial time and/or space.

2.3.2 The Basic PAC Model: Learning Binary Functions

The PAC learning model quantifies the worst-case risk associated with learning a

function. We discuss its details using binary functions as the learning domain. Suppose

there is a training sample S of size l. Every example is generated independently and identically from an unknown but fixed probability distribution D over the instance space X ⊆ {0,1}^n. Thus, the PAC model is also named a distribution-free model. Each instance









is an n-bit binary vector, x ∈ X ⊆ {0,1}^n. The learning task is to choose a specific boolean function that approximates the target concept f : {0,1}^n → {0,1}, f ∈ C. The target concept f is chosen from the concept space C = 2^X of all possible boolean functions. According to PAC requirements, a learning algorithm must output an hypothesis h ∈ H in polynomial time, where H ⊆ 2^X. We hope that the target function f ∈ H, so that an hypothesis h can approximate f as accurately as possible. If f ∉ H, then classification errors are inevitable.

Consider a concept space C = 2^X, an hypothesis space H ⊆ 2^X, and an unknown but fixed probability distribution D over an instance space X ⊆ {0,1}^n. The error of an hypothesis h ∈ H with respect to a target concept f ∈ C is the probability that h and f disagree on the classification of an instance x ∈ X drawn from D. This probability of error is denoted by a risk functional:

err(h) = Pr_D{x : h(x) ≠ f(x)}

To understand the error more intuitively, see Figure 2-1. The error probability is indicated by areas I and II, where h(x) disagrees with f(x) on the instances located there; we can think of them as Type I and Type II errors. Areas III and IV contain the instances on whose classification h(x) and f(x) agree.
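Because D is unknown to the learner, err(h) cannot be computed exactly; when D can at least be sampled, however, it can be estimated by Monte Carlo simulation. A minimal Python sketch (our own illustration; true_error and draw are assumed names):

    import random

    def true_error(h, f, draw, trials=100000):
        """Estimate err(h) = Pr_D{x : h(x) != f(x)} by sampling instances
        from D via draw() and counting disagreements between h and f."""
        mistakes = 0
        for _ in range(trials):
            x = draw()
            mistakes += h(x) != f(x)
        return mistakes / trials

    # Example: D uniform over {0,1}^3; target f = first bit, hypothesis
    # h = second bit, which disagree on half of all instances.
    draw = lambda: tuple(random.randint(0, 1) for _ in range(3))
    print(true_error(lambda x: x[1], lambda x: x[0], draw))  # about 0.5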

The PAC model utilizes an accuracy parameter ε and a confidence parameter δ to measure the quality of an hypothesis h. Given a sample S of size l, and a distribution D








from which all training examples are drawn, the PAC model strives to bound by δ the probability that the chosen hypothesis h_S has large error:

Pr{S : error_D(h_S) > ε} < δ

where the subscript in h_S indicates that the training set determines the selection of the hypothesis.


Figure 2-1: Error Probability. The instance space X is partitioned into regions I and II, where h(x) ≠ f(x), and regions III and IV, where h(x) = f(x).

Definition: PAC Learnable. A concept class C of boolean functions is PAC learnable if there exists a learning algorithm A, using an hypothesis space H, such that for every f ∈ C, for every probability distribution D, for every 0 < ε < 1/2, and for every 0 < δ < 1/2:

(1) The hypothesis h ∈ H produced by algorithm A approximates the target function f, with probability at least 1 - δ, such that error_D(h) ≤ ε.

(2) The running time of the learning algorithm A is bounded by a polynomial in the size of the target concept n, 1/ε and 1/δ. The sample complexity refers to the sample size within which the algorithm A needs to output an hypothesis h.
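As a concrete instance of this definition, monotone conjunctions form a classically PAC-learnable class: start from the conjunction of all n attributes and delete every attribute that equals 0 in some positive example. The following Python sketch (our own illustration, not part of the original text) learns such a conjunction from examples and estimates its error on fresh instances:

    import random

    def learn_conjunction(sample, n):
        """Keep only the attributes that equal 1 in every positive example;
        the returned hypothesis is the conjunction of those attributes."""
        kept = set(range(n))
        for x, y in sample:
            if y == 1:
                kept = {i for i in kept if x[i] == 1}
        return lambda x: int(all(x[i] == 1 for i in kept))

    n = 5
    f = lambda x: int(x[0] == 1 and x[2] == 1)            # target: x1 AND x3
    draw = lambda: tuple(random.randint(0, 1) for _ in range(n))

    S = [(x, f(x)) for x in (draw() for _ in range(200))]  # l = 200 examples
    h = learn_conjunction(S, n)

    test = [draw() for _ in range(10000)]
    print(sum(h(x) != f(x) for x in test) / len(test))     # typically near 0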









2.3.3 Finite Hypothesis Space

An hypothesis space H can be finite or infinite. If an hypothesis h classifies all

training examples correctly, it is called a consistent hypothesis. We will derive the main

PAC result in multiple steps using well-known inequalities from probability theory.

2.3.3.1 Finite consistent hypothesis space

Assuming the hypothesis space H is finite, if we choose an hypothesis h with risk greater than ε, the probability that it is consistent on a training sample S of size l is bounded as

Pr{S : h consistent and error_D(h) > ε} ≤ (1 - ε)^l ≤ e^(-εl)

To see this, observe that the probability that such an hypothesis h classifies one input pair (x_i, f(x_i)) correctly is Pr{h(x_i) = f(x_i)} ≤ 1 - ε. Given l examples, the probability that h classifies (x_1, f(x_1)), ..., (x_l, f(x_l)) all correctly is

Pr{(h(x_1) = f(x_1)) ∧ ... ∧ (h(x_l) = f(x_l))} ≤ (1 - ε)^l

because the sampling is i.i.d. Thus, the probability of finding some hypothesis h with error greater than ε that is consistent with the training set (of size l) is bounded, in the worst case, by the union bound |H|(1 - ε)^l. To see this latter step, define E_i as the event that hypothesis h_i is consistent; then

Pr{E_1 ∨ ... ∨ E_|H|} ≤ Σ_i Pr{E_i} ≤ |H|(1 - ε)^l

Finally, (1 - ε)^l ≤ e^(-εl) is a commonly known simple algebraic inequality.









The idea behind the PAC bound is to bound this unlucky scenario (i.e., algorithm A finds a consistent hypothesis that happens to be one with error greater than ε). The following result formalizes this.

Blumer Bound (Blumer et al. 1987). Setting |H|(1 - ε)^l ≤ δ, the sample complexity, l, for a consistent hypothesis h over a finite hypothesis space H is bounded by

l ≥ (1/ε)(ln|H| + ln(1/δ))
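For a quick sense of the numbers involved, the bound can be evaluated directly; a small Python sketch (our own illustration, with a hypothetical function name):

    import math

    def blumer_sample_size(h_size, eps, delta):
        """Smallest l with l >= (ln|H| + ln(1/delta)) / eps, sufficient for
        a consistent hypothesis over a finite hypothesis space H to have
        error at most eps with probability at least 1 - delta."""
        return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / eps)

    # Example: |H| = 2**20 hypotheses, eps = 0.05, delta = 0.01 gives l = 370.
    print(blumer_sample_size(2 ** 20, 0.05, 0.01))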


2.3.3.2 Finite inconsistent hypothesis space

An hypothesis h is called inconsistent if it makes misclassification errors ε_s > 0 on the training sample. The sample complexity is therefore bounded by

l ≥ (1/(2(ε - ε_s)^2))(ln|H| + ln(1/δ))

and the error is bounded by

ε ≤ ε_s + √((ln|H| + ln(1/δ)) / (2l))

We can see from the above inequality that ε is usually larger than the training error rate ε_s. Interested readers can see Goldman (1991) for further explanations.
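The two bounds for the inconsistent case can be evaluated the same way (again a sketch of our own; the function names are hypothetical):

    import math

    def inconsistent_sample_size(h_size, eps, eps_s, delta):
        """l >= (ln|H| + ln(1/delta)) / (2 (eps - eps_s)^2): examples needed
        so that an hypothesis with training error eps_s has true error
        below eps with probability at least 1 - delta."""
        return math.ceil((math.log(h_size) + math.log(1.0 / delta))
                         / (2.0 * (eps - eps_s) ** 2))

    def error_bound(h_size, eps_s, l, delta):
        """eps <= eps_s + sqrt((ln|H| + ln(1/delta)) / (2 l))."""
        return eps_s + math.sqrt((math.log(h_size) + math.log(1.0 / delta))
                                 / (2.0 * l))

    print(inconsistent_sample_size(2 ** 20, 0.10, 0.05, 0.01))  # 3694
    print(error_bound(2 ** 20, 0.05, 10000, 0.01))              # about 0.08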

2.3.4 Infinite hypothesis space

When H is finite we can use |H| directly to bound the sample complexity. When H is infinite we need to utilize a different measure of capacity. One such measure is the VC dimension, which was first proposed by Vapnik and Chervonenkis (1971).

Definition: VC Dimension. The VC dimension of an hypothesis space is the maximum number, d, of points of the instance space that can be separated into two classes in all possible 2^d ways using functions in the hypothesis space. It measures the richness or capacity of H (i.e., the higher d is, the richer the representation). Given H with VC dimension d and a consistent hypothesis h ∈ H, the PAC error bound is (Cristianini and Shawe-Taylor 2000):

ε ≤ (2/l) ( d log₂(2el/d) + log₂(2/δ) )

provided d ≤ l and l > 2/ε.
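A similar minimal sketch (hypothetical d, l and δ) shows how the guaranteed error decays as the sample grows relative to the VC dimension:

    import math

    def vc_error_bound(d, l, delta):
        """PAC error bound for a consistent hypothesis over a space of VC
        dimension d: eps <= (2/l)*(d*log2(2*e*l/d) + log2(2/delta)),
        valid when d <= l and l > 2/eps."""
        return (2.0 / l) * (d * math.log2(2 * math.e * l / d)
                            + math.log2(2.0 / delta))

    # Hypothetical example: d = 10, delta = 0.05.
    for l in (1_000, 10_000, 100_000):
        print(l, round(vc_error_bound(10, l, 0.05), 4))  # 0.1924, 0.0259, 0.0033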

2.4 Empirical Risk Minimization and Structural Risk Minimization

2.4.1 Empirical Risk Minimization

Given a VC dimension d and an hypothesis h ∈ H with training error ε_s, the error rate ε is bounded by

ε ≤ 2ε_s + (4/l) ( d ln(2el/d) + ln(4/δ) )

Therefore, the empirical risk can be minimized directly by minimizing the number of misclassifications on the sample. This principle is called the Empirical Risk Minimization principle.

2.4.2 Structural Risk Minimization

As is well known, one disadvantage of empirical risk minimization is the over-fitting problem; that is, for small sample sizes, a small empirical risk does not guarantee a small overall risk. Statistical learning theory uses the structural risk minimization

principle (SRM) (Scholkopf and Smola 2001, Vapnik 1998) to solve this problem. The

SRM focuses on minimizing a bound on the risk functional.

Minimizing a risk functional is formally developed as a goal of learning a function

from examples by statistical learning theory (Vapnik 1998):









R(α) = ∫ L(z, g(z, α)) dF(z)

over α ∈ A, where L(·) is a loss function for misclassified points, g(·, α) is an instance of a collection of target functions parametrically defined by α ∈ A, and z is the training pair assumed to be drawn randomly and independently according to an unknown but fixed probability distribution F(z). Since F(z) is unknown, an induction principle must be invoked.

It has been shown that for any α ∈ A, with probability at least 1 − δ, the following bound on a consistent hypothesis holds:

R(α) ≤ R_emp(α) + R_struct(d, l, δ)

where the structural risk R_struct(·) depends on the sample size l, the confidence level δ, and the capacity d of the target function. The bound is tight, up to log factors, for some distributions (Cristianini and Shawe-Taylor 2000). When the loss function is the number of misclassifications, the exact form of R_struct(·) is

R_struct(d, l, δ) = (4/l) ( d(ln(2l/d) + 1) − ln(δ/4) )

It is a common learning strategy to find consistent target functions that minimize a

bound on the risk functional. This strategy provides the best "worst case" solution, but it

does not guarantee finding target functions that actually minimize the true risk functional.

2.5 Learning with Noise

2.5.1 Introduction

The basic PAC model is also called the noise-free model since it assumes that the

training set is error-free, meaning that the given training examples are correctly labeled









and not corrupted. In order to be more practical in the real world, the PAC algorithm has

been extended to account for noisy inputs (defined below). Kearns (1993) initiated

another well-studied model in the machine learning area, the Statistical Query model

(SQ), which provides a framework for a noise-tolerant learning algorithm.

2.5.2 Types of Noise

Four types of noise are summarized in Sloan's paper (Sloan 1995):

(1) Random Misclassification Noise (RMN)

Random misclassification noise occurs when the learning algorithm, with probability 1 − η, receives noiseless samples (x, y) from the oracle and, with probability η, receives noisy samples (i.e., x with an incorrect classification). Angluin and Laird (1988) first theoretically modeled PAC learning with RMN noise. Their model presented a benign form of misclassification noise. They concluded that if the rate of misclassification is less than 1/2, then the true concept can be learned by a polynomial algorithm. Within l samples, the algorithm can find an hypothesis h minimizing the number of disagreements F(h, σ), where F(h, σ) denotes the number of times that hypothesis h disagrees with the training sample σ. The sample size l is bounded by

l ≥ (2 / (ε²(1 − 2η_b)²)) ln(2|H|/δ)

provided 0 ≤ η ≤ η_b < 1/2.
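As a minimal sketch (hypothetical parameters; the helper names are ours, not Angluin and Laird's), the RMN oracle and the sample bound can be written as:

    import math
    import random

    def rmn_oracle(x, f, eta):
        """Return (x, f(x)) with probability 1 - eta; otherwise flip the label."""
        y = f(x)
        return (x, y if random.random() >= eta else 1 - y)

    def angluin_laird_samples(h_size, eps, delta, eta_b):
        """l >= 2/(eps^2 * (1 - 2*eta_b)^2) * ln(2|H|/delta), assuming a known
        bound 0 <= eta <= eta_b < 1/2 on the misclassification rate."""
        return math.ceil(2 * math.log(2 * h_size / delta)
                         / (eps ** 2 * (1 - 2 * eta_b) ** 2))

    # Hypothetical example: |H| = 2**20, eps = 0.1, delta = 0.05, eta_b = 0.2.
    print(angluin_laird_samples(2**20, 0.1, 0.05, 0.2))  # 9751

The factor (1 − 2η_b)² in the denominator shows how the required sample grows as the noise rate approaches 1/2.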

Extensive studies can be found in Aslam and Decatur (1993), Blum et al. (1994),

Bshouty et al. (2003), Decatur and Gennaro (1995), and Kearns (1993).

(2) Malicious Noise (MN)









Malicious noise occurs when the learning algorithm, with probability 1 − β, gets the correct samples, but with probability β the oracle returns noisy data, which may be

chosen by a powerful malicious adversary. No assumption is made about corrupted data,

and the nature of the noise is also unknown. Valiant (1985) first simulated this situation

of learning from MN. Kearns and Li (1993) further analyzed this worst-case model of

noise and presented some general methods that any learning algorithm can apply to

bound the error rate, and they showed that learning with noise problems are equivalent to

standard combinatorial optimization problems. Additional work can be found in Bshouty

(1998), Cesa-Bianchi et al. (1999), and Decatur (1996, 1997).

(3) Malicious Misclassification Noise (MMN)

Malicious misclassification (labeling) noise is noise where misclassification is the only possible corruption. The adversary can choose to change only the label y of the sample pair (x, y), with probability η, while no assumption is made about the altered label. Sloan (1988) extended Angluin and Laird's (1988) result to this type of noise.

(4) Random Attribute Noise (RAN)

Random attribute noise is as follows. Suppose the instance space is {0,1}^n. For every instance x in a sample pair (x, y), each attribute x_i, 1 ≤ i ≤ n, is flipped independently and randomly with a fixed probability η. This kind of noise is called uniform attribute noise. In this case, the noise affects only the input instance, not the output label. Shackelford and Volper (1988) probed RAN for the problem of k-DNF expressions. A k-DNF is a disjunction of terms, where each term is a conjunction of at most k literals. Later, Bshouty et al. (2003) defined a noisy distance measure for function classes, which they proved yields the best possible learning guarantee in the attribute noise case.









They also indicated that a concept class C is not learnable if this measure is small (compared with C and the attribute noise distribution D).

Goldman and Sloan (1995) extended the uniform attribute noise model to product random attribute noise, in which each attribute x_i is flipped with its own probability η_i, 1 ≤ i ≤ n. They demonstrated that if the algorithm focuses only on minimizing the disagreements, this type of noise is nearly as harmful as malicious noise. They also proved that no algorithm can exist if the noise rates η_i (1 ≤ i ≤ n) are unknown and the noise rate is higher than 2ε (ε is the accuracy parameter in the PAC model). Decatur and Gennaro (1995) further proved that if each noise probability η_i (or an upper bound) is known, then a PAC algorithm may exist for the simple classification problem.

2.5.3 Learning from Statistical Query

The Statistical Query (SQ) model introduced by Kearns (1993) provides a general framework for efficient PAC learning in the presence of classification noise. Kearns proved that if a function class can be learned efficiently in the SQ model, then it is also learnable in the PAC model; such algorithms are called SQ-typed. In the SQ model, the learning algorithm sends a predicate χ(x, y) to the SQ oracle and asks for the probability P_χ that the predicate holds. Instead of answering the exact probability, the oracle returns only an estimate P̂_χ within an allowed approximation error α, which here indicates a tolerance for error, i.e., P_χ − α ≤ P̂_χ ≤ P_χ + α.
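A minimal simulation of such an oracle (hypothetical names; the tolerance α is met only with high probability, via a Hoeffding-style sample size, rather than deterministically) might look as follows:

    import math
    import random

    def sq_oracle(chi, draw_example, alpha, delta=0.05):
        """Estimate P = Pr[chi(x, f(x)) holds] to within tolerance alpha
        (with probability >= 1 - delta) from labeled examples."""
        n = math.ceil(math.log(2.0 / delta) / (2 * alpha ** 2))  # Hoeffding bound
        hits = sum(chi(x, y) for x, y in (draw_example() for _ in range(n)))
        return hits / n  # P_hat, with P - alpha <= P_hat <= P + alpha w.h.p.

    # Hypothetical target concept: parity of the first two bits of a 4-bit instance.
    f = lambda x: (x[0] + x[1]) % 2
    def draw_example():
        x = [random.randint(0, 1) for _ in range(4)]
        return x, f(x)

    print(sq_oracle(lambda x, y: y == x[0], draw_example, alpha=0.05))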
The approach suggested by the SQ model for generating noise-tolerant algorithms has been successful: a large number of noise-tolerant algorithms are formulated as SQ algorithms.

Aslam and Decatur (1993) presented a general method to boost the accuracy of the weak









SQ learning algorithm. A later study by Blum et al. (1994) proved that a concept class can be weakly learned with at least Ω(d^(1/3)) queries, and that the upper bound for the number of queries is O(d). The SQ-dimension d is defined as the number of "almost uncorrelated" concepts in the concept class. Jackson (2003) further improved the lower bound to Ω(2^n) for learning the class of parity functions in an n-bit input space.

However, the SQ model has its limitations. Blumer et al. (1989) proved that there exists a class that cannot be efficiently learned by SQ but is actually efficiently learnable. Kearns (1993) showed that the SQ model cannot generate efficient algorithms for parity functions, which can be learned in the noiseless PAC model. Jackson (2003) later showed that noise-tolerant PAC algorithms derived from the SQ model cannot be guaranteed to be optimally efficient.

2.6 Learning with Queries

Angluin (1988) initiated the area of query learning. In the basic framework, the learner needs to identify an unknown concept f from some finite or countable concept space C of subsets of a universal set. The learner is allowed to ask an oracle specific queries about the unknown concept f, and the oracle responds according to the query type. Angluin studied different kinds of queries, such as membership queries, equivalence queries, subset queries, and so forth. Unlike the PAC model, which requires only an approximation to the target concept, query learning is a non-statistical framework in which the learner must identify the target concept exactly. An efficient algorithm and lower bounds are described in Angluin's research. Any efficient algorithm using equivalence queries in query learning can also be converted to satisfy the PAC criterion Pr(error(h) > ε) < δ.














CHAPTER 3
DATABASE SECURITY-CONTROL METHODS

In this chapter, we will survey important concepts and techniques in the area of

database security, such as compromise of a database, inference, disclosure risk, and

disclosure control methods among other issues. According to the way that confidential

data are released, we categorize the review of database security methods into three parts:

microdata, tabular data, and sequential queries to databases. Our main efforts will

concentrate on the security control of a special type of database the statistical database

(SDB), which accepts only limited types of queries sent by users. Basic SDB protection

techniques in the literature are reviewed.

3.1 A Survey of Database Security

For many decades, computerized databases designed to store, manage, and retrieve information have been implemented successfully and widely in many areas, such as business, government, research, and health care organizations. Statistical organizations intend to provide database users with the maximum amount of information with the least disclosure risk of sensitive and confidential data. With the rapid expansion of the Internet, both the general public and the research community have become much more attentive to issues of database security. In the following sections, we introduce

basic concepts and techniques commonly applied in a general database.

3.1.1 Introduction

A database consists of multiple tables. Each table is constructed with rows and

columns representing entities (or records) and attributes (fields), respectively. Some









attributes may store confidential information such as income, medical history, financial

status, etc. Necessary security methods have been designed and applied to protect the

privacy of specific data from outsiders or illegal users.

Database security has its own terminology for research purposes. Therefore, we first clarify certain important definitions and concepts, which are used repeatedly in this research and may carry varied implications under different circumstances.

When talking about the confidentiality, privacy or security of a database, we refer

to the disclosure risk of the confidential data. A compromise of the database occurs when

the confidential information is disclosed to illegitimate users exactly, partially or

inferentially.

Based on the amount of compromised sensitive information, the disclosure can be

classified into exact disclosure and partial disclosure (Denning et al. 1979, Beck 1980).

Exact disclosure or exact inference refers to the situation in which illegal users can infer the exact true confidential information by sending sequential queries to the database, while in the case of partial disclosure, the true confidential data can be inferred only to a certain level of accuracy.

Inferential disclosure or statistical inference is another type of disclosure, which refers to the situation in which an illegal user can infer the confidential data, by sending sequential queries to the database, with a high probability that exceeds the threshold of disclosure predetermined by the database administrator. This is known as an inference problem, which also falls within our research focus.









There are mainly two types of disclosures in terms of the disclosure objects:

identity disclosure and attribute disclosure. Identity disclosure occurs if the identity of a

subject is linked to any particular disseminated data record (Spruill 1983). Attribute

disclosure implies the users could learn the attribute value or estimated attribute value

about the record (Duncan and Lambert 1989, Lambert 1993). Currently, most of the

research focuses on identity disclosure.

3.1.2 Database Security Techniques

Database security concerns the privacy of confidential data stored in a database.

Two fundamental tools are applied to prevent compromising a database (Duncan and

Fienberg 1999): (1) restricting access and (2) restricting data. For example, a statistical office or the U.S. Census Bureau disseminating data to the public may enforce administrative policies to limit users' access to data. Normally, the common method is for the database administrator to assign IDs and passwords to different types of users to restrict access at different security levels. For example, in a medical database, doctors could have full access to all kinds of information while researchers may obtain only the non-confidential records. This security mechanism is referred to as access restriction. When all users have the same level of access to the database, usually only transformed data are allowed to be released, for the purpose of security. This protection approach, which falls in the data restriction category, reduces disclosure risk. However, for some public databases access control alone is neither feasible nor sufficient to prevent inferential disclosure. Thus the two tools are complementary and may be used together. Our research, however, prioritizes the second category, the data restriction approach.









Database privacy is also known as Statistical Disclosure Control (SDC) or Statistical Disclosure Limitation (SDL). SDC techniques, which are used to modify original confidential data before their release, try to balance the tradeoff between information loss (or data utility) and disclosure risk. Some measures evaluating the performance of SDC methods will be discussed in Chapter 4.

Based on the way that data are released publicly, all responses from queries can be classified into three types: microdata files, tabular data files, and statistical responses from sequential queries to databases (Mas 2000). Most typical databases deal with all three dissemination formats. Our research focuses on a part of the third category, sequential queries to a statistical database (SDB), which differs from a regular database due to its limited querying interface. Normally only a few types of queries, such as SUM, COUNT, and MEAN, can be issued to an SDB.

The goal of applying disclosure control methods is to prevent users from inferring

confidential data on the basis of those successive statistical queries. We briefly describe

protection mechanisms for microdata and tabular data in the next two subsections, 3.1.3

and 3.1.4. Security control techniques for the statistical database are discussed in detail in

section 3.2.

3.1.3 Microdata files

Microdata are unaggregated, unsummarized original sample data containing every anonymized individual record (person, business company, etc.) in the file. Normally, microdata come from the responses to census surveys issued by statistical organizations, such as the U.S. Census Bureau (see Figure 3-1 for an example), and include detailed information with many attributes (possibly over 40), such as income, occupation, and household composition. Those data are released in the form










of flat tables, where rows and columns represent records and attributes for each

individual respondent, respectively. Microdata can usually be read, manipulated and

analyzed by computers with statistical software. See Figure 3-1 for an example of

microdata that are read into SPSS (Statistical Package for the Social Sciences).


Figure 3-1: Microdata File That Has Been Read Into SPSS.
(Data source: Indiana University Bloomington Libraries, Data Services & Resources.
http://www.indiana.edu/ libgpd/data/microdata/what.html)

3.1.3.1 Protection techniques for microdata files

Before disseminating microdata files to the public, statistical organizations apply SDC techniques either to distort or to remove certain information from the original data files, thereby protecting the anonymity of individual records.

Two generic types of microdata protection methods are (Crises 2004a):

(1) Masking methods

The basic idea of masking is to add errors to the elements of a dataset before the

data are released. Masking methods have two categories: perturbative (see Crises 2004d

for a survey) and non-perturbative (see Crises 2004c for a survey).

The perturbative category modifies the original microdata before its release. It

includes methods such as adding noise (Sullivan 1989 and Brand 2002, Domingo-Ferrer









et al. 2004), rounding (Willenborg 1996 and 2000), microaggregation (Defays and

Nanopoulos 1993, Anwar 1993, Mateo and Domingo 1999, Domingo and Mateo 2002, Li

et al. 2002b, Hansen and Mukherjee 2003), data swapping (Dalenius and Reiss 1982,

Reiss 1984, Feinberg 2000, and Fienberg and McIntyre 2004) and others.

The non-perturbative category does not change data but it makes partial

suppressions or reductions of details in the microdata set, and applies methods such as

sampling, suppression, recoding, and others (DeWaal and Willenborg 1995, Willenborg

1996 and 2000).

The following two tables give a simple illustration of masking methods, namely data swapping, additive noise, and microaggregation (data source: Domingo-Ferrer and Torra 2003). First, the microaggregation method is used to group "Divorced" and "Widow" into one category, "Widow/er-or-divorced", in the field "Marital Status". Secondly, the values of record 3 and record 5 in the "Age" column are switched by applying the data swapping technique. Finally, the value of record 4 in the "Age" attribute is perturbed from "36" to "40" by adding noise of "4". A sketch of these three steps follows Table 3-2.

Table 3-1: Original Records
Record Illness ... Sex Marital Status Town Age
1 Heart ... M Married Barcelona 33
2 Pregnancy ... F Divorced Tarragona 40
3 Pregnancy ... F Married Barcelona 36
4 Appendicitis ... M Single Barcelona 36
5 Fracture ... M Single Barcelona 33
6 Fracture ... M Widow Barcelona 81


Table 3-2: Masked Records
Record Illness ... Sex Marital status Town Age
1 Heart ... M Married Barcelona 33
2 Pregnancy ... F Widow/er-or-divorced Tarragona 40









Table 3-2. Continued.
Record Illness ... Sex Marital status Town Age
3 Pregnancy ... F Married Barcelona 33
4 Appendicitis ... M Single Barcelona 40
5 Fracture ... M Single Barcelona 36
6 Fracture ... M Widow/er-or-divorced Barcelona 81
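As mentioned above, the three masking steps can be reproduced in a few lines. The sketch below (records abridged to the fields that change) maps Table 3-1 to Table 3-2:

    # Abridged records of Table 3-1: (record number, marital status, age).
    records = [(1, "Married", 33), (2, "Divorced", 40), (3, "Married", 36),
               (4, "Single", 36), (5, "Single", 33), (6, "Widow", 81)]

    masked = [list(r) for r in records]
    for r in masked:                      # 1) microaggregation-style recoding
        if r[1] in ("Divorced", "Widow"):
            r[1] = "Widow/er-or-divorced"
    masked[2][2], masked[4][2] = masked[4][2], masked[2][2]  # 2) swap ages of records 3 and 5
    masked[3][2] += 4                     # 3) additive noise: record 4's age 36 -> 40
    print(masked)                         # matches "Marital status"/"Age" of Table 3-2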

(2) Synthetic data generation

Liew et al. (1985) initially proposed this protection approach which first identifies

the underlying density function with associated parameters for the confidential attribute,

and then generates a protected dataset by randomly drawing from that estimated density

function. Even though data generated from this method do not derive from original data,

they preserve some statistical properties of the original distributions. However, the utility

of those simulated data for the user has always been an issue. See (Crises 2004b) for an

overview of this method.

3.1.4 Tabular data files

Another common way to release data is in the tabular data format (also called

macrodata) obtained by aggregating microdata (Willenborg 2000). It is also called

summary data, table data or compiled data. The numeric data are summarized into certain

units or groups, such as geographic area, racial group, industries, age, or occupation. In

terms of different processes of aggregation, published tables can be classified into several

types, such as magnitude tables, frequency count tables, linked tables, etc.

3.1.4.1 Protection techniques for tabular data

Tabular data files collect data at a higher level of aggregation since they summarize

individual atomic information. Therefore they provide higher security for the database than

microdata files. However, the disclosure risk has not been completely eliminated and

intruders could still infer confidential data from an aggregated table (see Tables 3-3 and 3-4 for an example). Protection techniques, such as cell suppression (Cox 1975, 1980, Malvestuto et al. 1991, Kelly et al. 1992, Chu 1997), table redesign, noise addition, rounding, or swapping, among others, have to be adopted before the release. See Sullivan (1992), Willenborg (2000), and Oganian (2002) for an overview.

See Table 3-3 for an illustration of tabular data. It shows state-level data for various types of food stores. The Economic Division published the economic data by geography and standard industrial classification (SIC) codes. The "Value of Sales" field is considered confidential data. Table 3-4 demonstrates how the cell suppression technique is applied to protect the confidential data (data source: U.S. Bureau of the Census Statistical Research Division, Sullivan 1992).

Table 3-3: Original Table
SIC                       Number of Establishments   Value of Sales ($)
54  All Food Stores  ...  347                        200,900
541 Grocery          ...  333                        196,000
542 Meat and Fish    ...  11                         1,500
543 Fruit Stores     ...  2                          2,400
544 Candy            ...  1                          1,000

Table 3-4: Published Table After Applying Cell Suppression
SIC                       Number of Establishments   Value of Sales ($)
54  All Food Stores  ...  347                        200,900
541 Grocery          ...  333                        196,000
542 Meat and Fish    ...  11                         1,500
543 Fruit Stores     ...  2                          D
544 Candy            ...  1                          D

Only one candy store reported a sales value for this state in Table 3-3. If the table is released as is, any user would learn the exact sales value for this specific store. Also, a sales value is listed for the two fruit stores in this state; therefore, by knowing its own sales figure, either of these two stores can infer the competitor's sales volume. A disclosure









occurs under either situation. Thus, SDC methods have to be incorporated into the

original table before its publication.

Table 3-4 shows that the confidential data resulting in a compromise are suppressed and replaced by a "D" in the cells. The technique applied is called cell suppression, which is currently very commonly used by the U.S. Census Bureau.

3.2 Statistical Database

3.2.1 Introduction

A statistical database (SDB) differs from a regular database due to its limited querying interface. Its users can retrieve only aggregate statistics of confidential attributes, such as SUM, COUNT, and MEAN, for a subset of records stored in the database. Those aggregate statistics are calculated from tables in databases. Tables could include microdata or tabular data. In other words, query responses in SDBs could be treated as views of microdata or tabular data tables. However, those views can be summarized only to answer limited types of queries, and the aggregate statistics are computed according to each query. An SDB is compromised if sensitive data are disclosed by answering a set of queries. Note that some of the protection methods used in SDBs overlap with those for microdata files and tabular data files. However, SDB security methods emphasize preventing a disclosure that results from responding to sequential queries.

Many government agencies, businesses, and research institutions normally collect

and analyze aggregate data for their special purposes. For instance, medical researchers

may need to know the total number of HIV-positive patients within a certain age range

and gender. The users should not be allowed to link the sensitive information to any

specific record in the SDB by asking sequential statistical queries. We illustrate how a









statistical database could possibly be compromised by the following example, and further

explain the necessity of applying statistical disclosure control methods before data are

released.

3.2.2 An Example: The Compromise of Statistical Databases

Adam and Wortmann (1989) described three basic types of authorized users for a

statistical database: the non-statistical users accessing the database, sending queries and

updating data; the researchers authorized to receive only aggregate statistics; and the

snoopers, attackers or adversaries seeking to compromise the database. The purpose of

database security is to provide researchers with useful information while preventing

disclosure risk from attackers.

For instance (example from Adam and Wortmann 1989, Garfinkel et al. 2002), a hospital's database (see Table 3-5) providing aggregate statistics to outsiders contains one confidential field, HIV status, which is denoted by "1" for positive and "0" otherwise. Suppose a snooper knows that Cooper, working for company D, is a male under the age of 30, and attempts to find out whether or not Cooper is HIV-positive. He therefore issues the following queries:

Query 1: Sum = (Sex=M) & (Company=D) & (Age<30);

Query 2: Sum = (Sex=M) & (Company=D) & (HIV=1) & (Age<30);

The response to Query 1 is 1, and the response to Query 2 is 1.

Neither query is a threat to the database privacy individually; however, when they are put together, the attacker who knows Cooper's personal information can locate Cooper from Query 1's answer and immediately infer from Query 2's answer that Cooper is HIV-positive. Thus, the confidential data are disclosed, and we refer to this case as a compromise of a database.
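A minimal sketch of this attack (with the table abridged to the fields the queries touch) shows how mechanical the inference is:

    # Sketch of the snooper's two queries against Table 3-5 (abridged).
    db = [  # (name, age, sex, company, hiv)
        ("Smith", 42, "M", "B", 0),
        ("Cooper", 21, "M", "D", 1),
        ("Remminger", 36, "M", "D", 1),
        ("Barbara", 38, "F", "D", 0),
        # ... remaining records of Table 3-5 omitted for brevity
    ]

    count = lambda cond: sum(1 for r in db if cond(r))

    q1 = count(lambda r: r[2] == "M" and r[3] == "D" and r[1] < 30)
    q2 = count(lambda r: r[2] == "M" and r[3] == "D" and r[1] < 30 and r[4] == 1)
    print(q1, q2)  # 1 1 -> the single matching record (Cooper) must be HIV-positive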









From this example, we can tell that the snooper is able to infer the true confidential data by sending sequential queries and analyzing the aggregate statistics. Therefore

security mechanisms have to be established prior to the data release.

Table 3-5: A Hospital's Database (data source: part from Garfinkel et al. 2002)
Record Name Job Age Sex Company HIV
1 Daniel Manager 27 F A 0
2 Smith Trainee 42 M B 0
3 Jane Manager 63 F C 0
4 Mary Trainee 28 F B 1
5 Selkirk Manager 57 M A 0
6 Daphne Manager 55 F B 0
7 Cooper Trainee 21 M D 1
8 Nevins Trainee 32 M C 1
9 Granville Manager 46 M C 0
10 Remminger Trainee 36 M D 1
11 Larson Manager 47 M B 1
12 Barbara Trainee 38 F D 0
13 Early Manager 64 M A 1
14 Hodge Manager 35 M B 0

3.2.3 Disclosure Control Methods for Statistical Databases

Some basic security control methods for microdata and tabular data have been

summarized in the previous sections. In this section, we will concentrate on the security

control methods for statistical databases. Some methods used for microdata and tabular

data may also be utilized here. Adam and Wortmann (1989) conducted a complete survey

about security techniques for statistical databases (SDBs). They classified all security

methods for SDBs into four categories: conceptual, query restriction, data perturbation,

and output perturbation. In addition to that, Adam and Wortmann provided five criteria to

evaluate the performance of security mechanisms. Our literature review will follow suit

and discuss major security control methods in the following sections.












Figure 3-2: Three Approaches in Statistical Database Security. A) Query Restriction, B) Data Perturbation and C) Perturbed Responses.

Figure 3-2 demonstrates the three approaches: Query Restriction, Data Perturbation and Output Perturbation (data source: Adam and Wortmann 1989). Figure 3-2A shows how the Query Restriction method works: this technique either returns exact answers to the user or refuses to respond at all. Figure 3-2B introduces the Data Perturbation method, which creates a perturbed SDB from the original SDB to respond to all queries; the user receives only perturbed responses. The Output Perturbation method is illustrated in Figure 3-2C: each query answer is modified before being sent back to the user.









3.2.3.1 Conceptual approach

The Conceptual approach includes two basic models: the Conceptual and Lattice

models. The Conceptual model, proposed by Chin and Ozsoyoglu (1981, 1982),

addressed security issues at a Conceptual data model level where the users only access

entities with common attributes and their statistics. The Lattice model, developed by Denning (1983) and Denning and Schlorer (1983), retrieves data from SDBs in tabular

form at different aggregation levels. Both methods provide a fundamental framework to

understand and analyze SDBs' security problems, but neither seems functional at the

implementation level.

3.2.3.2 Query restriction approach

Based on the users' query history, SDBs either provide the exact answer or decline

the query (see Figure 3-2A). The five major methods in this approach include:

(1) Query-set-size control (Hoffman and Miller 1970, Fellegi 1972, Schlorer 1975 and 1980, Denning et al. 1979, Schwartz et al. 1979, Denning and Schlorer 1980, Friedman and Hoffman 1980, Jonge 1983). This method allows the release of the data only if the query-set size (the number of records included in the query response) meets some specific conditions (see the sketch following this list).

(2) Query-set-overlap control (Dobkin et al. 1979). This mechanism is based on

query-set-size control and further explores the possible overlapped entities involved in

successive queries.

(3) Auditing (Schlorer 1976, Hoffman 1977, Chin and Ozsoyoglu 1982, Chin et

al. 1984, Brankovic et al. 1997, Malvestuto and Moscarini 1998, Kleinberg et al. 2000, Li

et al. 2002a, Malvestuto and Mezzini 2003). This technique intends to keep query records









for each user, and before answering new queries, it checks whether or not the response

can lead to a disclosure of the confidential data.

(4) Partitioning (Yu and Chin 1977, Chin and Ozsoyoglu 1979, 1981, Schlorer

1983). This method groups all entities into a number of disjoint subsets. Queries are

answered on the basis of those subsets instead of original data.

(5) Cell suppression (Cox 1975, 1980, Denning et al. 1982, Sande 1983,

Malvestuto and Moscarini 1990, Kelly et al. 1992, Malvestuto 1993). The basic idea of

the technique is to suppress all cells that may result in the compromise of SDBs.
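As an illustration of the first method above, the following minimal sketch (hypothetical names; k is a DBA-chosen threshold) releases a SUM only when the query set is neither too small nor too close to the whole database:

    def answer_sum(db, cond, value_of, k=3):
        """Query-set-size control (sketch): release SUM only if the query set
        contains between k and N - k records; otherwise deny the query."""
        qset = [r for r in db if cond(r)]
        if not (k <= len(qset) <= len(db) - k):
            return None  # query denied
        return sum(value_of(r) for r in qset)

The upper limit N − k matters because a query matching nearly all records could otherwise be subtracted from the database total to isolate a few individuals.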

So far, some methods in this category have been proved either inefficient or infeasible. For instance, a statistical database normally includes a large number of data records; under this situation, a traditional auditing method becomes impractical due to its requirements for large memory storage and strong computing power. Among these methods, the most promising is the cell suppression technique, which has been implemented successfully by the U.S. Census Bureau and widely adopted in the real world.

3.2.3.3 Data Perturbation Approach

In this approach, a dedicated perturbed database is constructed once and for all by

altering the original database to answer users' queries (see Figure 3-2B). According to

Adam and Wortmann (1989), all methods fall into two categories:

(1) The probability distribution approach. This category treats the SDB as a sample drawn from some distribution. The original SDB is replaced either by another sample coming from the same distribution or by the distribution itself (Lefons et al. 1983). Techniques in this category include data swapping (Reiss 1984), multidimensional transformation of attributes (Schlorer 1981), data distortion by probability distribution (Liew et al. 1985), and others.

(2) Fixed data perturbation. This category includes some of the most successful database protection mechanisms. It can be achieved by either an additive or a multiplicative technique (Muralidhar et al. 1995, 1999). An additive technique (Muralidhar et al. 1999) adds noise to the confidential data. Multiplicative data perturbation (Muralidhar et al. 1995) protects the sensitive information by multiplying the original data by a random variable with mean 1 and a prespecified variance. Our study focuses on additive data perturbation, which we classify into two types in our research: random data perturbation and variable data perturbation. We introduce these two methods separately in Chapter 5.

3.2.3.4 Output Perturbation Approach

Output Perturbation is also named query-based perturbation. The response for each

query is computed first from the original database, and then it is perturbed based on the

answer of each query (see Figure 3-2C). Three methods are included in this approach:

(1) The Random-Sample Queries technique was proposed by Denning (1980); later, Leiss (1982) suggested a variant of Denning's method. The basic rationale is that the query response is calculated from a randomly selected sampled query set, chosen from the original query set so as to satisfy some specific conditions. However, an attacker may compromise the confidential information by repeating the same query and averaging the results, as the sketch below illustrates.
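The averaging attack is easy to see in a minimal sketch (values hypothetical; fresh zero-mean noise per response stands in for the random-sample mechanism):

    import random

    TRUE_ANSWER = 112_500  # hypothetical confidential SUM

    def perturbed_response():
        """Each repetition of the query draws fresh zero-mean noise."""
        return TRUE_ANSWER + random.gauss(0, 1_000)

    # Averaging many repetitions of the same query drives the noise toward zero.
    estimate = sum(perturbed_response() for _ in range(10_000)) / 10_000
    print(round(estimate))  # close to 112,500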









(2) Varying-Output Perturbation (Beck 1980) works for SUM, COUNT and

Percentile queries. This method assigns a varying perturbation to the data that are used to

compute the response statistic.

(3) Rounding includes three types of output perturbation: systematic rounding (Achugbue and Chin 1979), random rounding (Fellegi and Phillips 1974, Haq 1975, 1977), and controlled rounding (Dalenius 1981). This technique computes each query answer from the unperturbed data and then rounds it up or down to the nearest multiple of a base number set by the Database Administrator (DBA). Query results do not change for the same query, therefore providing good protection against averaging attacks.
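A minimal sketch of the rounding idea (the base value is hypothetical) shows why repetition gains the attacker nothing here:

    def rounded_response(true_answer, base=5):
        """Deterministic rounding to the nearest multiple of a DBA-chosen base;
        repeating the query always yields the same answer."""
        return base * round(true_answer / base)

    print(rounded_response(112_503))  # 112505 on every repetition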

In this chapter we summarized different types of database security-control methods.

For a specific database, one SDC method could be more effective and efficient than

another. Therefore, how to select the most suitable security method becomes a critical issue in database privacy. We will review various performance measurements for

SDC in the next chapter.














CHAPTER 4
INFORMATION LOSS AND DISCLOSURE RISK

Chapter 3 provided an overview of important SDC methods that are applied to protect the privacy of a database. However, since SDC methods reach their goals by transforming original data, users of the database can obtain only approximate results from the modified data. Therefore, a fundamental issue that every statistical organization

has to address is how to protect confidential data maximally while providing database

users with as much useful and accurate information as possible. In this chapter, we

review the main performance measurements of SDC methods. These assessments are

used to evaluate the information loss (used interchangeably with data utility) and

disclosure risk of a database. These measures have become standard criteria for deciding

on how to choose appropriate protection techniques for SDBs.

4.1 Introduction

All SDC methods attempt to optimize two conflicting goals:

(1) Maximizing data utility or minimizing information loss that legitimate data

users can obtain.

(2) Minimizing the disclosure risk of the confidential information that data

organizations take by publishing the data.

Therefore the efforts to obtain greater protection usually result in reducing the

quality of data that are released. So the database administrators always seek to solve the

problem by optimizing tradeoffs between the information loss and disclosure risk. The

definitions for information loss and disclosure risk are as follows:









Information Loss (IL) refers to the loss of the utility of data after being released. It

measures the damage of the data quality for the legal users due to the application of SDC

methods.

Disclosure Risk (DR) refers to the risk of disclosure of confidential information in

the database. It measures how dangerous it is for statistical organizations to publish

modified data.

The problem that statistical organizations always have to confront is how to choose

an appropriate SDC method with suitable parameters from many potential protection

mechanisms, where the selected mechanism should be able to minimize disclosure risk as

well as information loss. One of the best solutions is to count on performance measures to

evaluate the suitability of different SDC techniques to the database. Good designs for

performance criteria quantifying information loss and disclosure risk are therefore

desirable and necessary.

4.2 Literature Review

Designing good performance measures is a challenging task because different users

collect data for different purposes and organizations define disclosure risk to different

extents. So far, there are many performance assessment methods existing in the literature.

Based on their properties, we divide those measurement techniques into five categories in

our research:

(1) Information loss measures for some specific protection methods.

This type of measurement assesses the difference of the masked (modified) data from the original data after applying a specific protection method; refer to Willenborg and Waal (2000) and Oganian (2002) for examples. If variances of the original microdata are critical for the user, then the information loss can be estimated by comparing

Var(θ(data_masked)) with Var(θ(data_original))

where θ(·) denotes a consistent estimator of the statistic of interest, computed on the original data and on the modified data, respectively. We can tell from the above criterion that this measurement depends on a specific purpose of data use, such as means, variances, etc.

(2) Generic information loss measures for different protection methods.

A generic information loss measure, which is not limited to any particular data use,

is designed to compare different protection methods. Two well-known general

information loss measures are as follows:

Shannon's entropy, discussed in Kooiman et al. (1998) and Willenborg and Waal

(2000), can be applied to any SDC technique to define and quantify information loss.

This measurement models the masking process as noise added to the original dataset,

which then is sent through a noisy channel. The receiver of the noisy data intends to

reconstruct the probability distribution of the original data. The entropy of this

probability distribution measures the uncertainty of the original data after masked data

are released because of the transmission process. However an entropy-based

measurement is not a very good criterion since it ignores the impact of covariances and

means. Whether or not these two statistics can be preserved properly from the original

data directly affects the validity and quality of the altered data.

Another measurement by Domingo-Ferrer et al. (2001) and Oganian (2002)

suggests that IL would be small if the original and masked data have similar analytical

structure, but the disclosure risk would be higher in this case. This method compares

statistics, such as mean square error, mean absolute error, and mean variation, which are









calculated from the difference of covariance matrix, coefficient matrix, correlation

matrix, etc., between the original data and the modified data.

(3) Disclosure risk measures for specific protection methods.

The disclosure risk also affects the quality of the SDC methods. Compared with IL

measures, DR measures are more method-specific. The idea of assessing disclosure risk

was initially proposed by Lambert (1993). Later, different DR measures were developed

for SDC methods, i.e., for sampling methods by Chen and Keller-McNulty (1998),

Samuel (1998), Skinner et al. (1994), and Truta et al. (2004), and for micro-aggregation

masking methods by Jaro (1989), and Pagliuca and Seri (1998).

(4) Generic disclosure risk measures for different protection methods.

The two main types of general DR measurements are applied to measure the quality

of different protection methods for tabular data. The first measurement is called

sensitivity rules, which are used to estimate DR prior to the publication of data tables. There are three such rules: (n,k)-dominance, the p%-rule, and the pq-rule (Felso et al. 2001, Holvast 1999, Luige and Meliskova 1999). Different from the dominance rule, which is criticized for its failure to reflect the disclosure risk properly, a new a priori measure is proposed by Oganian (2002), who also introduced a posterior DR measure that takes the modified data into account and operates after applying SDC methods.

A new method based on Canonical Correlation Analysis was introduced by Sarathy

and Muralidhar (2002) to evaluate the security level for different SDC methods. This

methodology can also be used to select the appropriate inference control method. For

more details, refer to Sarathy and Muralidhar (2002).









(5) Generic performance measures that encompass disclosure risk and information

loss for different protection methods.

A sound SDC method should be able to achieve an optimal tradeoff between

disclosure risk and information loss. Therefore a joint framework is desired to examine

the tradeoffs and compare the performance of distinct SDC methods. Two popular

performance measures in the literature are Score Construction and R-U confidentiality

map.

Score Construction, proposed by Domingo-Ferrer and Torra (2001), ranks different SDC methods based on scores obtained by averaging their information loss and disclosure risk measures. For example (Crises 2004e),

Score(V, V') = ( IL(V, V') + DR(V, V') ) / 2

where V is the original data, V' is the modified data, and IL and DR are information loss and disclosure risk measures. Refer to Crises (2004e), Domingo-Ferrer et al. (2001), Sebe et al. (2002) and Yancey et al. (2002) for more examples.

An R-U confidentiality map, first proposed by Duncan and Fienberg (1999), constructs a general analytical framework for an information organization to trace the tradeoffs between disclosure risk and data utility. It was further developed by Duncan et al. (2001, 2004) and Gomatam et al. (2004). Trottini and Fienberg (2002) later illustrated two examples of R-U maps in their paper; an application is given in Boyen et al. (2004). Database administrators can choose the most appropriate SDC method from the R-U map by observing the influence of a particular method with the corresponding parameter choice. See Figure 4-1 (data source: Trottini and Fienberg 2002) for an example.



Figure 4-1: R-U Confidentiality Map, Univariate Case (n = 10; prior variance 5; population variance σ² = 2). The y-axis shows disclosure risk, the x-axis data utility.

M0, M1 and M2 are represented by a diamond, a circle and a dashed line in the figure, and indicate three types of SDC methods: trivial microaggregation, microaggregation, and the combination of additive noise and microaggregation, respectively. The disclosure risk and data utility are functions determined by the data size n, the known variance of the prior belief, the known population variance σ², and the standard deviation τ of the noise added to the original data. The y-axis measures the disclosure risk while the x-axis estimates the data utility. For example, checking Figure 4-1, if the database administrators intend to keep the disclosure risk below 0.5, we see that the appropriate SDC method satisfying this requirement is M2, the mixed strategy of additive noise plus microaggregation. From the x-axis, the corresponding data utility is shown as 2.65. The choice of τ also affects the R-U map: if τ is large, the mixed strategy M2 comes close to not releasing any data at all, while as τ is chosen close to zero,








M2 becomes equivalent to the microaggregation method with some specific parameter. In Figure 4-1, τ = 2.081.

We do not differentiate between the measurements for microdata and tabular data in this overview since our research focuses on statistical databases. All examples and methods previously mentioned apply either to microdata or to tabular data or to both.














CHAPTER 5
DATA PERTURBATION

This chapter provides an introduction to additive data perturbation methods. Based on different ways of generating perturbative values, additive data perturbation methods are classified into three categories: random-data perturbation, fixed-data perturbation and variable-data perturbation. The first category, random-data perturbation, with five types of perturbation methods, can be found in Kim (1986), Muralidhar et al. (1999), Sullivan (1989), Tendick (1991), and Tendick and Matloff (1994). Our proposed variable-data perturbation method is a new category that includes the interval protection technique given by Gopal et al. (1998, 2002) and Garfinkel et al. (2002). In both random-data perturbation and variable-data perturbation methods, a perturbed database is constructed by adding noise to the confidential data in the original database; all query responses are computed from the perturbed database. We will review an algorithm by Dinur and Nissim (2003) that finds a bound for fixed-data perturbation, where the noise is added to each query response. This bound can be applied to both data perturbation and output perturbation methods. Their work considers the tradeoff between privacy and usability of a statistical database. We end the chapter with the proposed approach to the database security problem.

5.1 Introduction

Our study focuses on additive noise perturbation methods, which are usually employed to protect confidential numerical data. Perturbation methods can guarantee the prevention of exact disclosure by adding noise to sensitive data; however, they are still susceptible to partial disclosure and inferential disclosure (see Chapter 3 for definitions of exact disclosure, partial disclosure and inferential disclosure).

Two types of additive perturbation methods are described in the following sections

based on their different approaches of generating noise. An algorithm by Dinur and

Nissim (2003) providing a theoretical basis for our study is also reviewed. Our proposed

research approach is discussed at the end of this chapter.

5.2 Random Data Perturbation

5.2.1 Introduction

Random Data Perturbation (RDP) is one of the most popular and practical data

protection methods employed in statistical databases today. In order to effectively prevent statistical inference by a snooper, DBAs attempt to provide an appropriate level of security by distorting the sensitive data with random noise. The RDP method can ensure adequate protection of confidential information while satisfying legitimate users' needs for aggregate statistics of the database.

5.2.2 Literature Review

In the Random Data Perturbation (RDP) method, a perturbed database is created by adding random noise to the confidential numerical attributes. We discuss four types of RDP summarized by Crises (2004) and describe a general method for RDP given by Muralidhar et al. (1999).

Before walking through the different types of RDP methods, we first discuss the main disadvantage of data perturbation methods: RDP methods may introduce bias into statistical characteristics of databases, such as PERCENTILEs, conditional SUMs, and COUNTs. Matloff (1986) initially introduced the concept of bias, which occurs when the responses to certain queries computed from a perturbed database differ from the responses computed from the original database. Four types of bias, A, B, C, and D, are defined and analyzed in the literature by Muralidhar et al. (1999). Type A bias occurs when a change in variance causes a change in summary measures of some perturbed attribute. Type B bias applies when the perturbation distorts the relationships between confidential attributes. Type C bias occurs when the perturbation changes the relationships between confidential and non-confidential attributes. Type D bias occurs when the underlying distribution of the perturbed database cannot be determined because the original database or the noise term has a non-multivariate-normal distribution. Improved perturbation methods have been designed to avoid bias (Matloff 1986, Tendick 1991, Tendick and Matloff 1994, Muralidhar et al. 1995). A creative method called General Additive Data Perturbation (GADP), proposed by Muralidhar et al. (1999), eliminates all of these types of bias from additive perturbation methods. For more information about GADP, see Section 5.2.

(1) Masking by uncorrelated noise addition

This method is also called the Simple Additive Data Perturbation method (Muralidhar et al. 1999). The vector of confidential values d_m, representing the mth attribute of the original database which contains n records, is replaced by a vector y_m obtained by adding a noise term e_m:

y_m = d_m + e_m

where each element of e_m is drawn independently from a normally distributed random variable η_m ~ N(0, σ_{η_m}²). Each noise term is generated independently of the others, such that Cov(η_i, η_j) = 0 for all i ≠ j. The variance of η_m is generally assumed proportional to that of the original vector d_m; that is, if the variance of d_m is σ_m², then σ_{η_m}² = a·σ_m². The distribution of η_m and the parameter a are decided by the DBA. This perturbation method introduces Type A, B and C bias.
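A minimal sketch of this scheme (hypothetical data; NumPy assumed available) perturbs one confidential column with i.i.d. noise whose variance is the fraction a of the column's variance:

    import numpy as np

    def simple_additive_perturbation(d, a=0.1, seed=0):
        """y = d + e with e ~ N(0, a * Var(d)) i.i.d.; a is chosen by the DBA."""
        rng = np.random.default_rng(seed)
        sigma = np.sqrt(a * np.var(d))
        return d + rng.normal(0.0, sigma, size=len(d))

    incomes = np.array([32_000.0, 54_000.0, 47_500.0, 120_000.0, 61_000.0])
    print(simple_additive_perturbation(incomes))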

(2) Masking by correlated noise addition

This method, proposed by Kim (1986) and Tendick (1991), uses correlated noise to perturb the database. It is also called the Correlated-Noise Additive Data Perturbation (CADP) method. The formulation of the method is

V_y = V + V_e

where V_y is the covariance matrix of the perturbed data and V_e is the covariance matrix of the errors, that is, η ~ N(0, V_e), with V_e proportional to the covariance matrix V of the original data:

V_e = aV

The CADP method generates Type A and Type C bias.
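A corresponding sketch for CADP (again with hypothetical data) draws the noise from a multivariate normal whose covariance is a scaled copy of the sample covariance, so the correlation structure of the confidential attributes is preserved in expectation:

    import numpy as np

    def cadp_perturbation(X, a=0.1, seed=0):
        """Add e ~ N(0, a*V), V the covariance matrix of the confidential columns."""
        rng = np.random.default_rng(seed)
        V = np.cov(X, rowvar=False)
        noise = rng.multivariate_normal(np.zeros(X.shape[1]), a * V, size=X.shape[0])
        return X + noise

    X = np.array([[32_000.0, 5.0], [54_000.0, 8.5], [47_500.0, 7.0], [61_000.0, 9.0]])
    print(cadp_perturbation(X))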

(3) Masking by noise addition and linear transformations

In Kim (1986), Tendick and Matloff (1994), Crises (2004), and Muralidhar et al. (1999), masking by correlated noise addition was modified with additional linear transformations to eliminate certain types of bias, so that the sample covariance matrix of the masked data is an unbiased estimator for the covariance matrix of the original data. This method is also named the Bias-Corrected Correlated-Noise Additive Data Perturbation (BCADP) method and results only in Type C bias.

(4) Masking by noise addition and nonlinear transformation









Sullivan (1989) proposed a complex algorithm (not discussed here) combining

simple additive noise with a nonlinear transformation. This masking method is applied to

discrete attributes.

Muralidhar et al. (1999) introduced a General Method for Additive Data Perturbation (GADP), which is a further improvement on the previous RDP methods. Suppose the database U has a set C of confidential attributes and a set NC of non-confidential attributes, with n records. A perturbed database P, which alters only the attributes in set C, is constructed on the basis of the original database U. The perturbation process preserves all statistical relationships, such as the mean values for C and measures of the covariance and canonical correlation between C and NC. Each record's values in the set C are generated from a multivariate normal distribution, and this process is repeated for all records. The GADP method guarantees that the statistical properties between all attributes are the same before and after perturbation, therefore eliminating all types of bias; thus, GADP is called a bias-free RDP method. By comparing it with other perturbation methods empirically, Muralidhar et al. suggested that the GADP method provides the highest level of security and represents a general form of additive noise perturbation.

5.3 Variable Data Perturbation

5.3.1 CVC Interval Protection for Confidential Data

Gopal, Goes, and Garfinkel (1998) initiated the idea of interval protection for

confidential information in a database and introduced the concept of interval disclosure.

They developed three techniques, which they called "Technique-LP", "Technique-ELS", and "Technique-RP", for various query types. As a result, the query types that a user could ask are limited to SUM (COUNT), MEAN, MIN, and MAX for numerical data. This









method was further studied in Gopal et al. (2000). Later, Gopal et al. (2002) formally

proposed the Confidentiality via Camouflage (CVC) interval protection technique, which

is designed to answer numerical ad hoc statistical queries to an online database. Garfinkel

et al. (2002, 2004) further extended this technique.

Garfinkel et al. (2002) explored the CVC technique for privacy protection of binary confidential data, answering only ad hoc COUNT queries (the same as SUM queries here). The extended technique is called Bin-CVC. Consider a database consisting of n records. The Bin-CVC technique introduces s binary camouflage vectors, P = {P¹, P², ..., Pˢ}, which are used to camouflage or hide the true confidential vector d, where Pᵏ = d for some k ∈ {1, ..., s}. Without loss of generality, they assumed the database contained only one binary confidential field. Each camouflage vector is denoted P^j = (p_1^j, ..., p_n^j). When a user asks a query q, an interval answer I(q) = [l(q), u(q)] is returned as follows. The upper bound u(q) and lower bound l(q) of the interval are calculated as the maximum and minimum, over all camouflage vectors, of the sum over the record set of the query, that is,

u(q) = max_j Σ_{i∈q} p_i^j   and   l(q) = min_j Σ_{i∈q} p_i^j

The true answer is guaranteed to lie inside the interval response: Σ_{i∈q} d_i ∈ I(q).


Table 5-1: An Example Database (Data source: Garfinkel et al. 2002)
Record Name Job Age Company HIV
1 Jones Manager 27 A 0
2 Smith Trainee 42 B 0
3 Johnson Manager 63 C 0
4 Andres Trainee 28 B 1
5 Selkirk Manager 57 A 0
6 Clark Manager 55 B 0
7 Cooper Trainee 21 D 1
8 Nevins Trainee 32 C 1










Table 5-1. Continued
Record Name Job Age Company HIV
9 Granville Manager 46 C 0
10 Brady Trainee 36 D 1
11 Larson Manager 47 B 1
12 Remminger Trainee 28 D 0
13 Early Manager 64 A 1
14 Hodge Manager 35 B 0

The HIV status field represents a binary confidential field with 14 records (see

Table 5-1). All query responses involving this sensitive field are computed from

camouflage vectors generated by the Bin-CVC technique. Table 5-2 shows an example of camouflage vectors for this database, where vector P3 is the true vector.

Table 5-2: The Example Database with Camouflage Vectors (Data source: Garfinkel et al. 2002)
Record P1 P2 P3= d
1 1 0 0
2 0 1 0
3 1 0 0
4 0 0 1
5 0 1 0
6 1 0 0
7 0 0 1
8 0 0 1
9 0 1 0
10 0 0 1
11 0 0 1
12 1 0 0
13 0 0 1
14 0 1 0

Camouflage vectors are generated from a complex network algorithm. The design of the network, whose disjoint paths construct the different camouflage vectors, is a critical step in the success of the Bin-CVC model. The network represents all n records in the confidential field with variables (x_1, ..., x_n). All paths run from the source to the destination. The network is constructed using two parameters: parameter w gives the total number of paths, and parameter m is the number of paths consisting only of true-value edges. These determine the number of camouflage vectors, s = C(w, m) (the number of ways to choose m of the w paths). An illustration of the network construction for the example database (see Table 5-1) using three camouflage vectors (see Table 5-2) is shown in Figure 5-1.





Figure 5-1: Network with $(m, w) = (1, 3)$ (Data source: Garfinkel et al. 2002)

In the example database (Table 5-1), all 14 records in the confidential field are denoted by variables $(x_1, \ldots, x_{14})$. Parameter w = 3 indicates that 3 disjoint paths are constructed in the network, and m = 1 implies that all variables with true value 1 in the confidential field are assigned to one of the three paths. Variables representing the other records, with value zero, are assigned as evenly as possible to the remaining two paths. The total number of camouflage vectors is $s = \binom{3}{1} = 3$. Every camouflage vector is the combination of choosing m edges out of the w paths. So, in Figure 5-1, each camouflage vector selects one edge out of three paths, with its true-value records on the path. Comparing with Table 5-2, camouflage vector $P^1$ has records 1, 3, 6, and 12 containing value one. The remaining records in $P^1$ are zero. In the corresponding network, there is accordingly one path including only the variables $(x_1, x_3, x_6, x_{12})$.









The performance measure $CB = 1 - |p^* - m/w|$ is employed to assess the quality of networks for a given database with different w and m values, where CB stands for Column Balancing. The usefulness of each query answer is computed by the formula $Z = 100 \times \left(1 - (u(q) - l(q))/|q|\right)$, where $|q|$ denotes the cardinality of the query q, that is, the number of records involved in that query. The closer Z is to 100, the better the query answer is.

The ideal network, which yields the tightest interval responses, has a small s, and every camouflage vector has the same number of ones as the true confidential field. That is, $p^j = p^*$, where $p^j$ is the proportion of ones in $P^j$ and $p^*$ is the proportion of ones in $P^s = d$. This ideal structure is called "perfect column balancing". See Table 5-2 as an example; here $p^1 = p^2 = 0.4$ and $p^* = 0.6$. A good CB "increases the probability of a better query answer".

Bin-CVC is a very promising methodology for database privacy. However, instead of an exact answer, it responds to each query with an interval, which reduces the data utility. We define the information loss of the CVC technique as the width of the interval, given by $e_q = u(q) - l(q)$.
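
The interval computation itself is only a few lines. The following sketch (our illustration, not code from Garfinkel et al.) evaluates $I(q)$ and the information loss for the camouflage vectors of Table 5-2; the sample query, the employees of Company B in Table 5-1, yields the interval [1, 2].

```python
# A minimal sketch: computing Bin-CVC interval answers I(q) = [l(q), u(q)]
# from the camouflage vectors of Table 5-2 (14 records, P3 = d).
CAMOUFLAGE = {
    "P1": [1,0,1,0,0,1,0,0,0,0,0,1,0,0],
    "P2": [0,1,0,0,1,0,0,0,1,0,0,0,0,1],
    "P3": [0,0,0,1,0,0,1,1,0,1,1,0,1,0],  # P3 = d, the true vector
}

def interval_answer(query):
    """query: set of record numbers (1-based). Returns (l(q), u(q))."""
    sums = [sum(vec[i - 1] for i in query) for vec in CAMOUFLAGE.values()]
    return min(sums), max(sums)

if __name__ == "__main__":
    q = {2, 4, 6, 11, 14}            # SUM query over Company B's records
    lo, hi = interval_answer(q)
    print(f"I(q) = [{lo}, {hi}], information loss e_q = {hi - lo}")
```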

5.3.2 Variable-data Perturbation

Inspired by the CVC technique, we propose a new data perturbation method: the variable-data perturbation. Unlike random data perturbation, whose random noise is drawn from a normal distribution $N(0, \sigma^2)$, the variable-data perturbation method modifies the confidential information by adding discrete noise that is generated by a parametrically driven algorithm, such as the parameters w









and m in the CVC interval protection method. The perturbed database is created once and for all. The algorithm can choose various parameters to produce different types of noise. We can view the output of the algorithm as if it were drawing values randomly from some distribution D with known parameters, with a non-zero mean $\mu$ and variance $\sigma^2$. The mean and variance are always finite. Each query answer is computed from the perturbed data.

A discrete random data perturbation method builds a perturbed database from which all query responses are computed. An output perturbation method does not alter the database; instead, query answers are perturbed before they are returned to the user. The variable-data perturbation method is a hybrid of data perturbation and output perturbation: it generates noise for the confidential field, and perturbed answers for each query involving sensitive data are calculated only from the perturbed confidential vector. We treat variable-data perturbation as a data perturbation method with query protection.

Consider the Bin-CVC technique as an example of the variable-data perturbation method. The network algorithm creates camouflage vectors to disguise the true confidential vector once and for all. Each query answer is an interval, computed from the camouflage vectors, that is guaranteed to contain the true answer. In a worst-case scenario, the noise or perturbation can be regarded as the difference between the lower and upper bounds of the interval: $e_q = u(q) - l(q)$, where $e_q$ is a discrete random variable.

We simulated the network algorithm on the example database (see Table 5-1) in Garfinkel et al. (2002) and computed the interval answers for all queries. Since the confidential vector in the database is a 14-bit binary string, the total number of queries involving this binary vector is $2^{14}$. The following figures (Figure 5-2 A-D) show four different cases with the parameters of the network algorithm at (1) w = 5 and m = 2; (2) w = 7 and m = 3; (3) w = 8 and m = 5; (4) w = 12 and m = 6. Among those networks, w = 7 and m = 3 creates perfect column balancing, and based on its frequencies of each noise value over all $2^{14}$ queries, we obtain a noise distribution with mean $\mu = 3.302$ and variance $\sigma^2 = 1.379$, as shown in Figure 5-2B.
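
The simulation is a brute-force enumeration. The sketch below (ours) tabulates the interval widths $e_q$ over all queries; for brevity it uses the three w = 3, m = 1 camouflage vectors of Table 5-2 rather than the four networks of Figure 5-2, and so should approximately reproduce, up to the treatment of the empty query, the mean and standard deviation reported for that network in Table 6-2.

```python
# Brute-force simulation: enumerate all non-empty queries over n = 14
# records and tabulate the interval widths e_q = u(q) - l(q).
from itertools import combinations
from statistics import mean, pvariance

vectors = [
    [1,0,1,0,0,1,0,0,0,0,0,1,0,0],   # P1
    [0,1,0,0,1,0,0,0,1,0,0,0,0,1],   # P2
    [0,0,0,1,0,0,1,1,0,1,1,0,1,0],   # P3 = d
]
n = 14
widths = []
for size in range(1, n + 1):
    for q in combinations(range(n), size):
        sums = [sum(v[i] for i in q) for v in vectors]
        widths.append(max(sums) - min(sums))

print(f"mean = {mean(widths):.4f}, variance = {pvariance(widths):.4f}")
```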


Figure 5-2: Discrete Distribution of Perturbations from the Bin-CVC Network Algorithm. A) w = 5 and m = 2, B) w = 7 and m = 3, C) w = 8 and m = 5, and D) w = 12 and m = 6. (Each panel is a bar chart of noise-value frequencies over all $2^{14}$ queries.)


After the network is set up with parameters w and m, the noise distribution D is fixed, and its mean $\mu$ and variance $\sigma^2$ are finite and known. Figure 5-2 shows this property. We intend to bound the noise $e_q$ drawn from D in terms of $\mu$ and $\sigma^2$. We will discuss how to estimate the mean $\mu$ and variance $\sigma^2$ in the next chapter.

5.3.3 Discussion

For Bin-CVC, there is a conflict between the two performance measures, CB and Z. That is, a high Column Balancing value, which indicates good protection for the whole database with some specific w and m, does not guarantee good query answers (i.e., high Z values).

We say that interval disclosure, or interval inference, occurs when the maximum error of the snooper's estimate of the true confidential value is less than a tolerance threshold predetermined by the DBA. Exact inference can be treated as a special case of interval inference with an error value of 0.

Gopal et al. (2002) state that the CVC technique completely eliminates exact disclosure and interval inference. However, Muralidhar et al. (2004) showed empirically that the CVC technique is sometimes vulnerable to interval inference. Using a simple deterministic procedure, the snooper can sometimes compromise the database by shrinking the interval answers into a smaller range within the predetermined threshold. Suppose the ith query is answered by $[l_i, u_i]$. In their example, they show how a snooper can compute the midpoint of the interval, $m_i = (l_i + u_i)/2$, and the half-width of the interval, $w_i = (u_i - l_i)/2$, and then use these to build a new interval, $m_i \pm (0.5 \times w_i)$, which still includes the true value but is narrower than the original interval and, hence, narrower than the threshold. See Table 5-3 for this example.









Table 5-3: An Example of Interval Disclosure (Data source: Muralidhar et al. 2004)

Query | True Value | Answers 1 / 2 / 3    | Original Interval: Lower, Upper, Width (%) | Intruder Interval: Lower, Upper, Width (%)
1     | 276.3      | 275.2 / 302.8 / 263.5 | 263.5, 302.8, 14.2                         | 273.3, 293.0, 7.1
2     |  35.4      |  36.2 /  32.7 /  36.3 |  32.7,  36.3, 10.2                         |  33.6,  35.4, 5.1
3     |  37.4      |  37.4 /  41.1 /  35.5 |  35.5,  41.1, 14.9                         |  36.9,  39.7, 7.5


In Gopal et al. (2002), the interval protection requires that the interval length be at least 10% of the original value. In Table 5-3, the intruder's intervals computed using the method of Muralidhar et al. (2004) are narrower than the 10% threshold. Thus, the database is compromised in terms of interval disclosure.
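
The intruder's procedure is deterministic and easy to replicate. The sketch below (our rendering of the procedure described by Muralidhar et al. 2004, not their code) shrinks the original intervals of Table 5-3 and recovers the intruder intervals, up to rounding.

```python
# Sketch of the deterministic interval-shrinking attack: from a released
# interval [l, u], build the narrower interval m +/- 0.5*w, which still
# contains the true value in the reported examples.
def shrink(lower: float, upper: float) -> tuple[float, float]:
    m = (lower + upper) / 2          # midpoint m_i
    w = (upper - lower) / 2          # half-width w_i
    return m - 0.5 * w, m + 0.5 * w

for lower, upper in [(263.5, 302.8), (32.7, 36.3), (35.5, 41.1)]:
    lo, hi = shrink(lower, upper)
    print(f"[{lower}, {upper}] -> [{lo:.1f}, {hi:.1f}]")
```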

However, the test given by Muralidhar et al. (2004) only examined the CVC interval protection empirically. For networks with different w and m, this deterministic method may not apply.

5.4 A Bound for The Fixed-data Perturbation (Theoretical Basis)

Dinur and Nissim (2003) studied a theoretical tradeoff between privacy and usability of statistical databases (SDBs). They concluded that a minimum perturbation magnitude of $\Omega(\sqrt{n})$ is required for each query q in order to maintain even weak privacy of the database. Otherwise, an adversary can reconstruct the statistical database with high probability in polynomial time using $l = n(\lg n)^2$ (base 2 logarithm) queries. As expected, the SDB can be protected from disclosure if the perturbation is bounded below by $e = \Omega(\sqrt{n})$; however, the data utility may then be too low to be useful. Since

Dinur and Nissim make no assumptions beyond assuming the additive error is fixed, their









results are valid both for data perturbation and output perturbation methods using fixed

additive error. We review their results and methodology in the following sections.

Dinur and Nissim (2003) modeled the confidential field in the database as an n-bit binary string $(d_1, \ldots, d_n) \in \{0,1\}^n$. The true answer for a SUM query q, $q \subseteq \{1, \ldots, n\}$, is computed as $a_q = \sum_{i \in q} d_i$. The perturbed answer for a query q is A(q), obtained by adding a perturbation satisfying $|A(q) - a_q| \le e$, where $e = o(\sqrt{n})$ is the bound on the perturbation of each query.
compromise the database. See Table 5-4 for details of the LP algorithm.

Table 5-4: LP Algorithm (Data source: Dinur and Nissim 2003).

[Query Phase]
Let $l = n(\lg n)^2$. For $1 \le i \le l$, choose uniformly at random a query $q_i \subseteq \{1, \ldots, n\}$, and set $\tilde{a}_{q_i} \leftarrow A(q_i)$.

[Weeding Phase]
Using any linear objective, solve the following linear program with unknowns $c_1, \ldots, c_n$:
$-e \le \sum_{j \in q_i} c_j - \tilde{a}_{q_i} \le e$ for $1 \le i \le l$
$0 \le c_j \le 1$ for $1 \le j \le n$

[Rounding Phase]
Let $c_i' = 1$ if $c_i > 1/2$, and $c_i' = 0$ otherwise.
Output $c'$.









Other vectors that are far away from the true confidential vector d are weeded out

by the algorithm. The output of the LP algorithm is the candidate vector that best

estimates the confidential vector.
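
Since the weeding phase is an ordinary linear program, the whole attack is easy to prototype. The following sketch (ours) implements the three phases of Table 5-4 with scipy's linprog, using a zero objective vector since any feasible point suffices; the query count l and perturbation bound e are parameters, and the noisy answer function is a stand-in for the real database interface.

```python
# Sketch of the LP-based reconstruction attack of Table 5-4.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

def lp_attack(answer, n, l, e):
    """answer(q) returns the perturbed SUM over the index array q."""
    # Query phase: l random subset-sum queries (rows are 0/1 indicators).
    Q = rng.integers(0, 2, size=(l, n))
    a = np.array([answer(np.flatnonzero(row)) for row in Q])
    # Weeding phase: find c in [0,1]^n with |Qc - a| <= e (zero objective).
    A_ub = np.vstack([Q, -Q])
    b_ub = np.concatenate([a + e, e - a])
    res = linprog(np.zeros(n), A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1)] * n)
    # Rounding phase.
    return (res.x > 0.5).astype(int)

# Toy use: d is the hidden vector; answers are perturbed by at most e.
n, e = 14, 1
d = rng.integers(0, 2, size=n)
noisy = lambda q: d[q].sum() + rng.integers(-e, e + 1)
c_prime = lp_attack(noisy, n, l=int(n * np.log2(n) ** 2), e=e)
print("recovered fraction:", (c_prime == d).mean())
```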

The n-bit binary vector c' is obtained by rounding c, which is a vector of real numbers produced by the LP algorithm. Dinur and Nissim (2003) also introduced a vector $\bar{c}$ obtained by rounding c to the nearest integer multiple of $\frac{1}{k}$, where $k = n$ represents a precision parameter, and $K = \left\{0, \frac{1}{k}, \frac{2}{k}, \ldots, \frac{k-1}{k}, 1\right\}$. Hence $\bar{c} \in K^n$. They proved that $\left|\sum_{i \in q} \bar{c}_i - \sum_{i \in q} d_i\right| \le 2e + 1$ for every query q.


To prove that the candidate vector c' obtained from the algorithm is close to the true confidential field d, Dinur and Nissim (2003) introduced a Disqualifying Lemma, which proves that the random queries $q_1, \ldots, q_l$ weed out all vectors $x \in X$, where

$X = \left\{ x \in K^n : \left|\left\{ i : |x_i - d_i| \ge \tfrac{1}{3} \right\}\right| > \varepsilon n \right\}$  (1)

The cardinality $\left|\left\{ i : |x_i - d_i| \ge \tfrac{1}{3} \right\}\right|$ in Equation 1 counts the records that satisfy $|x_i - d_i| \ge \tfrac{1}{3}$, for $\varepsilon > 0$. Therefore, X denotes the set of all vectors which are far away from the true vector d.

The Disqualifying Lemma states that

$\Pr_{q \subseteq_R [n]}\left[\, \left|\sum_{i \in q}(x_i - d_i)\right| \ge 2e + 1 \,\right] > \zeta(\varepsilon)$  (2)









The lemma proves that there exists a probability $\zeta(\varepsilon) > 0$ such that a random query q disqualifies x if $\left|\sum_{i \in q}(x_i - d_i)\right| \ge 2e + 1$; x is not a valid LP solution if such a q exists. The lemma guarantees that if x is far away from d, at least one of the l queries $q_1, \ldots, q_l$ disqualifies x with high probability.

One missing piece is the relationship between inequalities (1) and (2), which relates $\varepsilon$ to $\zeta$. The proof of the Disqualifying Lemma establishes this link, and it is possible to think of $\zeta$ as a function of $\varepsilon$: $\zeta(\varepsilon)$. We will discuss this further in Chapter 6.

If the l queries $q_1, \ldots, q_l$ are chosen independently and randomly, then for each $x \in X$ the probability that all l queries fail to disqualify x is $(1 - \zeta)^l$.

A conclusion derived from the Disqualifying Lemma is

$\Pr_{q_1, \ldots, q_l \subseteq_R [n]}\left[\forall x \in X\ \exists i,\ q_i \text{ disqualifies } x\right] \ge 1 - (n+1)^n (1 - \zeta)^l \ge 1 - neg(n)$

or, equivalently,

$1 - \Pr_{q_1, \ldots, q_l \subseteq_R [n]}\left[\forall x \in X\ \exists i,\ q_i \text{ disqualifies } x\right] \le (n+1)^n (1 - \zeta)^l$

Thus, the probability that none of the l queries disqualifies some $x \in X$ is bounded by a very small number $neg(n) > 0$.

Therefore, the Disqualifying Lemma guarantees ruling out all disqualified vectors $x \in X$ with high probability $(1 - neg(n))$ and guarantees that the hamming distance between the final candidate vector c' and the true vector d is small, that is, $dist(c', d) < \varepsilon n$.









The number of queries required to weed out disqualified vectors is computed from the Disqualifying Lemma: $l = n(\lg n)^2$. See Figure 5-3 for an illustration of the relationships of c, c', $\bar{c}$ and d.

Figure 5-3: Relationships of c, c', $\bar{c}$ and d. (The figure shows c rounded to c' by $c_i' = 1$ if $c_i > 1/2$ and $c_i' = 0$ otherwise; c rounded to $\bar{c}$, the nearest integer multiple of $1/k$; and the resulting hamming distance $dist(c', d)$.)

5.5 Proposed Approach

Although SDC methods and machine learning have completely opposite research goals, similar methodologies are applied in both areas (Domingo-Ferrer and Torra 2003). SDC methods attempt to modify the data intentionally before public release. The data distortion should be sufficient to protect the privacy of the confidential data, yet small enough to minimize the information loss. ML seeks to learn from noisy examples and designs error-resilient algorithms to disclose true information (Angluin and Laird 1988, Goldman and Sloan 1995, Shackelford and Volper 1988, Sloan 1988, Valiant 1985). SDC methods protect the confidential data stored in a database with n records and m fields. ML learns the true function from l examples, each of them having m attributes. Therefore, a common structure can be used to express the information in SDC methods and in ML. Although the two areas have different research purposes and often use different terminologies, the underlying methodologies are often the same.

In our research, we approach the database privacy problem from a machine learning perspective by applying PAC learning theory. We consider a scenario in which a snooper uses a learning algorithm to discover the true confidential data protected by an SDC method. For example, Figure 5-4 demonstrates the connection between the methodologies employed in PAC learning theory and in the database protection approach of Dinur and Nissim (2003).

Figure 5-4: Illustration of the Connection between PAC Learning and Data Perturbation. (The figure matches the terms of the Disqualifying Lemma bound, $\Pr\left[\forall x \in X\ \exists i,\ q_i \text{ disqualifies } x\right] \ge 1 - (n+1)^n(1 - \zeta(\varepsilon))^l$, with the corresponding terms of the PAC bound, $\Pr\left[\exists h \in H:\ h \text{ consistent and } err(h) > \varepsilon\right] \le |H|(1 - \varepsilon)^l \le \delta$: random samples of size l, the error probability, the cardinality of the hypothesis space, the accuracy parameter, and the confidence level.)

Figure 5-4 indicates that both approaches determine a training sample size l necessary to accomplish the desired goal. The probability that some $x \in X$ survives every query, even though each query disqualifies it with probability greater than $\zeta(\varepsilon)$, is bounded by the union bound over X times $(1 - \zeta(\varepsilon))^l$, and further bounded by a small probability $neg(n)$. These three quantities correspond to the cardinality of the hypothesis space $|H|$, the accuracy parameter $\varepsilon$, and the confidence level $\delta$ in PAC learning theory. They are shown in Figure 5-4 as matched terms, even though different notation and terminologies are adopted. Therefore, we can conclude that both PAC learning theory and the Disqualifying Lemma address their problems by using the same methodology for different purposes. The same parameters are required to build up the models.

From the perspective of PAC learning theory, we regard the true confidential field as the target concept that an adversary seeks to discover within a limited number of queries in the presence of some noise, such as random data perturbation or variable-data perturbation. In Chapter 6, we raise our research questions and extend Dinur and Nissim's (2003) work by using PAC learning theory. We set up a model to describe how much protection is necessary to guarantee that the adversary cannot discover the database with high probability. Put in PAC learning terms, we derive bounds on the amount of error an adversary makes, given a general perturbation scheme, the number of queries, and a confidence level.

Three types of data perturbation bounds are summarized as follows in terms of

different error distributions.

(1) Perturbation with a General Bound Case: General PAC bound

The error is generated identically and independently from an unknown distribution D, so this is also called the Perturbation with a Distribution-free Bound case. A general PAC bound is derived as:

$l \ge \frac{1}{\varepsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)$

where l is the number of queries needed to discover the binary confidential data, $\varepsilon$ is the amount of error that an adversary may make in compromising the database, and $\delta$ is the confidence parameter. $|H| = 2^n$ is the number of candidate confidential vectors in the hypothesis space H. Without specific information about the distribution of the noise, the derivation of l depends wholly on $\varepsilon$ and $\delta$, so this bound is relatively loose.
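
For concreteness, this bound is easy to evaluate; the following one-off sketch (ours; the parameter values are illustrative only) computes it for the 14-record example database.

```python
# Evaluating the general PAC bound l >= (1/eps) * (ln|H| + ln(1/delta))
# with |H| = 2^n; the numbers are illustrative only.
import math

def general_pac_bound(n: int, eps: float, delta: float) -> int:
    return math.ceil((n * math.log(2) + math.log(1 / delta)) / eps)

print(general_pac_bound(n=14, eps=0.1, delta=0.05))   # ~127 queries
```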

(2) Perturbation with a Fixed-data Bound Case: Fixed data perturbation

Dinur and Nissim (2003) derived a fixed-data bound $e = o(\sqrt{n})$ for the perturbation added to query responses. A bound on the number of queries is also developed, denoted as

$l = n(\lg n)^2$

which is sufficient to discover the true confidential vector in the database with high probability and small error.

(3) Perturbation with a Random Variable Bound Case: Variable data perturbation (proposed research)

We assume that the random perturbations added to the query responses have an unknown discrete distribution. The moments of the distribution, such as the mean and standard deviation, can be estimated. Variable-data perturbation belongs to this case. In the next chapter, we derive an error bound for this case by applying PAC learning theory. This bound provides the minimum number of queries needed to discover the protected column with specified error and confidence levels.














CHAPTER 6
DISCLOSURE CONTROL BY APPLYING LEARNING THEORY

In Chapters 2 and 3 we reviewed PAC learning theory and database security methods. In this chapter, we approach the database privacy problem using ideas from Probably Approximately Correct learning theory. Our research delves into the additive noise perturbation masking method, which is classified into three categories: random data perturbation, fixed data perturbation (reviewed in Chapter 5), and variable-data perturbation. Based on the work of Garfinkel et al. (2002) and Dinur and Nissim (2003), we raise our research questions and construct a theoretical model from the perspective of PAC learning theory. We attempt to derive an error bound for perturbations with a distribution specified by its first two moments, and we also develop a heuristic method to estimate the mean and standard deviation for the variable-data perturbation method. Dinur and Nissim (2003) studied the case of data perturbation bounded by a fixed number and provided a theoretical foundation for our research.

6.1 Research Problems

Our research focuses on the category of variable-data perturbation. First, we intend to derive a bound on the level of error that an adversary may make, given the variable-data perturbation method. We extend the bound on the fixed-data perturbation proposed by Dinur and Nissim (2003) by bounding the perturbation of each query with a random variable $e_q$ which has a discrete distribution with known parameters, namely a finite mean and variance. We need to develop a new Disqualifying Lemma, analogous to Dinur and Nissim's (2003), for the variable-data perturbation by deploying PAC learning theory. Like the Disqualifying Lemma in Dinur and Nissim (2003), our result bounds the probability that a query does not eliminate hypotheses that are far away from the true confidential answer. Using this, we develop a bound on the number of queries within which the database can be compromised with high probability.

6.2 The PAC Model For the Fixed-data Perturbation

We start our model by interpreting the results of Dinur and Nissim (2003) within the methodology of PAC learning theory.

Suppose an adversary attempts to compromise the SDB by applying PAC learning theory. We define a Non-Private Database as follows: a database is non-private if a computationally-bound adversary can expose a $1 - \varepsilon$ fraction of the confidential data, for $\varepsilon > 0$, with probability $1 - \delta$, where $\delta > 0$. We call $1 - \delta$ the confidence level.

Consider a statistical database with n records. Its confidential field is a binary string denoted as $(d_1, \ldots, d_n)' \in \{0,1\}^n$. See Table 5-1 for an example database; in this table, "HIV" status is the column we represent. A hypothesis space $H_0$ contains n-bit binary vectors, each of which is a hypothesis $h \in H_0 = \{0,1\}^n$ and denotes a candidate vector for the confidential field of the database. The cardinality of the hypothesis space, or the number of hypotheses, is $|H_0| = 2^n$. The true confidential field is regarded as the target concept $d \in H_0$. The online database receives a SUM (or COUNT) query $q \subseteq \{1, \ldots, n\}$ sent by the user and responds with a perturbed answer A(q) of the true answer $a_q = \sum_{i \in q} d_i$. A perturbation is added to each query answer, instead of to every record, and is bounded by a fixed number $e \ge |a_q - A(q)|$.

PAC learning starts with random sampling. We take l samples consisting of queries and their perturbed responses,

$S = \left((q_1, A(q_1)), \ldots, (q_l, A(q_l))\right)$.

Since A(q) is a perturbed answer, we consider this learning from noisy data.

Our learning algorithm is a linear program. As such, answers can be continuous and will be rounded. Thus it is useful to define another hypothesis space $H_2 = [0,1]^n$. For analysis, a grid will prove useful. Let the hypothesis space $H_1 = K^n$, where $K = \left\{0, \frac{1}{n}, \frac{2}{n}, \ldots, \frac{n-1}{n}, 1\right\}$. Note that $H_0 \subset H_1 \subset H_2$, where all containments are strict when $n > 1$.

Let $\bar{h}: H_2 \to H_1$ round each component of a vector in $H_2$ to the nearest integer multiple of $1/n$ (midpoints rounded down). Further, let $h_0: H_i \to H_0$ ($i = 1, 2$) round each component to the nearer of 0 and 1 (0.5 rounds down). Note that $\bar{h}(c)_i = c_i + f_i$, where $|f_i| < \frac{1}{n}$, $i = 1, \ldots, n$.

Given a sample S and a fixed perturbation e, Dinur and Nissim (2003) gave a polynomial algorithm $\gamma$ that finds $c \in H_2$, from which one can output $h_0(c)$. We represent this algorithm by $c \leftarrow \gamma(S)$. As already discussed, the specific algorithm is a linear program (see Table 5-4).









See Figure 6-1 for an illustration of the relationships of $H_0$, $H_1$, $H_2$, $h_0$, $\bar{h}$ and d.

Figure 6-1: Relationships of $H_0$, $H_1$, $H_2$, $h_0$, $\bar{h}$ and d in the Fixed-Data Perturbation. (The figure shows $h_0(c): H_i \to H_0$, $\bar{h}(c): H_2 \to H_1$, and the resulting distance $dist(h_0(\gamma(S)), d) \le \varepsilon n$.)

Let $c \in H_0$; then the hamming distance between c and d is

$dist(c, d) = \left|\{i : c_i \ne d_i\}\right| = \sum_{i=1}^{n} |c_i - d_i|$.

Let $x \in H_2$. "$\Pr_i\left[|x_i - d_i| \ge \frac{1}{3}\right] > \varepsilon$" means that the probability of choosing an index $i \in \{1, \ldots, n\}$ at random such that $|x_i - d_i| \ge \frac{1}{3}$ is greater than $\varepsilon$. That is, for this x there are more than $\varepsilon n$ expected records where $|x_i - d_i| \ge \frac{1}{3}$. Denote this by $E_i\left[|x_i - d_i| \ge \frac{1}{3}\right] > \varepsilon n$, where $\varepsilon > 0$ is arbitrary. Ultimately, we wish to show how to choose a sample size l so that $dist(h_0(\gamma(S)), d) \le \varepsilon n$.

Lemma 1:

If $x \in K^n$ and $E_i\left[|x_i - d_i| \ge \frac{1}{3}\right] \le \varepsilon n$, then $dist(h_0(x), d) \le \varepsilon n$ on average.

Proof:

First note that if $|x_i - d_i| < \frac{1}{3}$, then $h_0(x)_i = d_i$: since $d_i \in \{0, 1\}$, a component within $\frac{1}{3}$ of $d_i$ rounds to $d_i$. Thus, since no more than $\varepsilon n$ indices i, on average, have $|x_i - d_i| \ge \frac{1}{3}$, no more than $\varepsilon n$ records of $h_0(x)$, on average, can have $h_0(x)_i \ne d_i$. In other words, $|x_i - d_i| < \frac{1}{3}$ guarantees that $x_i$ rounds to the same number as $d_i$.

End of Proof


Let $T = \left\{x \in K^n : E_i\left[|x_i - d_i| \ge \frac{1}{3}\right] > \varepsilon n\right\}$. From the point of view of the intruder, we want our sample to disqualify all points of T with high probability $(1 - \delta)$, where $\delta \in (0,1)$ is usually chosen so that $(1 - \delta)$ is large. For a sample of size l, generated independently and identically according to an unknown but fixed distribution D, the probability that a hypothesis c is far away from the true target d is measured by the risk functional

$err(c) = 1 - D\left\{q \subseteq \{1, \ldots, n\} : \left|\sum_{i \in q}(c_i - d_i)\right| \le 1 + 2e\right\}$

$= D\left\{q \subseteq \{1, \ldots, n\} : \left|\sum_{i \in q}(c_i - d_i)\right| > 1 + 2e\right\}$

where $c \in H_1$.

As we stated before (see Figure 6-1), the solution c from the LP can be rounded either to a binary vector $h_0(c)$ or to a vector $\bar{h}(c) \in K^n$. The probability that a component of the rounded vector $\bar{h}(c)$ lies at distance at least $\frac{1}{3}$ from the true vector d is bounded by $\varepsilon$. Based on this condition, for any random query, the difference between the answers from these two vectors is bounded by a function of the perturbation, $2e + 1$. So we can see that e and $\varepsilon$ are related; they describe the error from different perspectives. We then use a probability which is a function of $\varepsilon$, denoted $\zeta(\varepsilon)$, to bound the risk functional as

$err(c) > \zeta(\varepsilon)$.

We intend to bound

$D^l\left(S : err(\bar{h}(\gamma(S))) > \zeta(\varepsilon)\right)$

by $\delta > 0$.

Provided $e = o(\sqrt{n})$, the Disqualifying Lemma of Dinur and Nissim (2003) proves $\zeta(\varepsilon) > 0$. Then, for $\tau(\varepsilon) = 1 - \zeta(\varepsilon)$,

$D^l\left(S : err(\bar{h}(\gamma(S))) > \zeta(\varepsilon)\right) \le (n+1)^n (1 - \zeta(\varepsilon))^l = (n+1)^n \tau(\varepsilon)^l$  (6.1)

where $(n+1)^n = |K^n| \ge |T|$ is the union bound over T, and therefore the worst-case scenario is bounded.

The proof of the Disqualifying Lemma in Dinur and Nissim (2003) yields an explicit lower bound for $\zeta(\varepsilon)$ in terms of a constant T specified in the proof, which we derive below.






Recall that the Disqualifying Lemma (Dinur and Nissim 2003) proves

$\Pr_{q \subseteq_R [n]}\left[\left|\sum_{i \in q}(x_i - d_i)\right| \ge 2e + 1\right] > \zeta(\varepsilon)$.

In the proof, $z_1, \ldots, z_n$ are defined as independent random variables such that $z_i = x_i - d_i$ and $z_i = 0$, each with probability $\frac{1}{2}$, and $\tilde{z} = \sum_{i=1}^{n} z_i$. The authors approached the proof by dividing it into two cases based on the size of the expected value of $\tilde{z}$, denoted $E(\tilde{z})$. Let T be a constant to be specified later in the proof.

In the case $E(\tilde{z}) \ge T\sqrt{n}$, the probability satisfies

$\Pr_{q \subseteq_R [n]}\left[\left|\sum_{i \in q}(x_i - d_i)\right| \ge 2e + 1\right] \ge 1 - 2e^{-T^2/8}$  (1)

In the second case, $E(\tilde{z}) < T\sqrt{n}$, the probability satisfies

$\Pr_{q \subseteq_R [n]}\left[\left|\sum_{i \in q}(x_i - d_i)\right| \ge 2e + 1\right] \ge \frac{\alpha}{3\beta}$  (2)

The role of $\beta$ is discussed below. (For the proof details, see Appendix A of Dinur and Nissim 2003.)

From the result of the Disqualifying Lemma, we choose $\zeta(\varepsilon)$ to be the minimum of the probabilities from these two cases. For the appropriate choice of T, term (1), $1 - 2e^{-T^2/8}$, is close to 1, while in term (2) we have $\alpha = \frac{\varepsilon}{36}$, so $\frac{\alpha}{3\beta} = \frac{\varepsilon}{108\beta}$ is positive but much smaller than 1. Hence the minimum is term (2), and we choose

$\zeta(\varepsilon) = \frac{\alpha}{3\beta}$

for the worst case. Dinur and Nissim choose $\beta$ large enough so that

$\alpha \ge 3\beta \sum_{k=1}^{\infty} (k+1)\, e^{-k\beta/2}$

(note the right side is decreasing in $\beta$). Simple manipulations show that


$\sum_{k=1}^{\infty} e^{-k\beta/2} = \frac{e^{-\beta/2}}{1 - e^{-\beta/2}}$.

After taking the partial derivative with respect to $\beta$ of the above formula, we obtain

$\sum_{k=1}^{\infty} k\, e^{-k\beta/2} = \frac{e^{-\beta/2}}{(1 - e^{-\beta/2})^2}$.

Thus

$\sum_{k=1}^{\infty} (k+1)\, e^{-k\beta/2} = \frac{e^{-\beta/2}}{(1 - e^{-\beta/2})^2} + \frac{e^{-\beta/2}}{1 - e^{-\beta/2}} = e^{-\beta/2}\,\frac{2 - e^{-\beta/2}}{(1 - e^{-\beta/2})^2}$.


Thus we need

$\alpha \ge 3\beta \sum_{k=1}^{\infty} (k+1)\, e^{-k\beta/2} = 3\beta\, e^{-\beta/2}\,\frac{2 - e^{-\beta/2}}{(1 - e^{-\beta/2})^2}$.

Since $\alpha = \frac{\varepsilon}{36}$, we get

$\frac{\varepsilon}{36} \ge 3\beta\, e^{-\beta/2}\,\frac{2 - e^{-\beta/2}}{(1 - e^{-\beta/2})^2}$,

that is,

$\varepsilon \ge 108\,\beta\, e^{-\beta/2}\,\frac{2 - e^{-\beta/2}}{(1 - e^{-\beta/2})^2}$.

Let $x = e^{-\beta/2}$. Then

$\varepsilon \ge 108\,\beta\, x\,\frac{2 - x}{(1 - x)^2}$.

Here $\beta$ is decided by $\varepsilon$ ($\varepsilon$ is a pre-defined parameter). For $0 < \varepsilon < 1$, numerical calculations show we need $\beta \ge 17$, thus giving $x \le 0.0002$. Since

$\zeta(\varepsilon) = \frac{\alpha}{3\beta} = \frac{\varepsilon}{108\beta}$,

if we plug

$\varepsilon \ge 108\,\beta\, x\,\frac{2 - x}{(1 - x)^2}$

into

$\zeta(\varepsilon) = \frac{\varepsilon}{108\beta}$,

we get

$\zeta(\varepsilon) \ge x\,\frac{2 - x}{(1 - x)^2}$,

and we take $\zeta(\varepsilon) = x\,\frac{2 - x}{(1 - x)^2}$ in what follows.
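
A quick numerical check of this choice (our sketch; the expression follows the derivation above) shows that $\beta = 17$ brings the right-hand side down to about 0.75, so it suffices for $\varepsilon$ near 1, while smaller $\varepsilon$ requires a larger $\beta$.

```python
# Numeric check of the condition eps >= 108 * beta * x * (2-x) / (1-x)^2
# with x = exp(-beta/2): the right-hand side shrinks rapidly with beta.
import math

def rhs(beta: float) -> float:
    x = math.exp(-beta / 2)
    return 108 * beta * x * (2 - x) / (1 - x) ** 2

for beta in (17, 20, 25, 30):
    print(f"beta={beta}: bound={rhs(beta):.6f}")
```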

Now back to inequality (6.1):

$D^l\left(S : err(\bar{h}(\gamma(S))) > \zeta(\varepsilon)\right) \le (n+1)^n \tau(\varepsilon)^l = (n+1)^n (1 - \zeta(\varepsilon))^l$.

If we bound the probability with the parameter $\delta > 0$, we get

$D^l\left(S : err(\bar{h}(\gamma(S))) > \zeta(\varepsilon)\right) \le (n+1)^n (1 - \zeta(\varepsilon))^l \le \delta$

where $\delta > 0$ is the confidence parameter.

Then take the base 2 logarithm (denoted lg in all the following formulas) on both sides of the last two terms,

$(n+1)^n (1 - \zeta(\varepsilon))^l \le \delta$,

to get

$\lg\left[(n+1)^n (1 - \zeta(\varepsilon))^l\right] \le \lg \delta$.

Given the pre-defined parameters, the minimum sample size is computed as

$l \ge \frac{\lg(\delta) - n\lg(n+1)}{\lg(1 - \zeta(\varepsilon))}$  (6.2)

where $\zeta(\varepsilon) = x\,\frac{2-x}{(1-x)^2}$ and $x = e^{-\beta/2}$ with $\beta$ chosen large enough. l is bounded by three parameters: $\delta$, $\varepsilon$ and n. Since $\zeta(\varepsilon)$ is a very small number, if we apply it directly in formula (6.2), the resulting bound for the sample size l is quite large, much more than $l = n(\lg n)^2$ from Dinur and Nissim (2003), even for a small n. See Table 6-1 for examples of the two bounds on the sample size with different values of n when $\delta = 0.05$.

Table 6-1 shows that by interpreting Dinur and Nissim's (2003) Disqualifying Lemma we get a PAC bound which is looser than the one derived in Dinur and Nissim (2003), no matter what n is. However, this PAC bound is still much less than the total number of queries in a database, $2^n$, except when n is very small, such as n = 10.

Table 6-1: Bounds on the Sample Size with Different Values of n.

n     | $n(\lg n)^2$ | $l \ge \frac{\lg(\delta) - n\lg(n+1)}{\lg(1 - \zeta(\varepsilon))}$ | $2^n$
10    | 111          | 373,643       | 1024
50    | 1,593        | 2,274,447     | 1.1259E+15
100   | 4,415        | 5,191,750     | 1.2677E+30
500   | 40,193       | 34,338,167    | 3.2734E+150
1000  | 99,317       | 76,188,677    | 1.0715E+301
5000  | 754,940      | 469,076,527   | ---

2-x
In section 6.4, we will show how to replace (E) = x with a more
(1- x)

practical number by using the bound in Dinur and Nissim (2003), therefore deriving a

tighter bound for the variable-data perturbation case.
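
The looseness is easy to see numerically. The sketch below (ours) compares formula (6.2) with $n(\lg n)^2$; $\zeta$ is left as a parameter because it depends on the choice of $\beta$, and the illustrative value $\zeta = 10^{-4}$ merely reproduces the order of magnitude of Table 6-1 rather than its exact entries.

```python
# Evaluating (6.2): l >= (lg(delta) - n*lg(n+1)) / lg(1 - zeta),
# versus Dinur and Nissim's n * (lg n)^2.
import math

def bound_6_2(n: int, delta: float, zeta: float) -> int:
    num = math.log2(delta) - n * math.log2(n + 1)
    return math.ceil(num / math.log2(1 - zeta))

def dn_bound(n: int) -> int:
    return math.ceil(n * math.log2(n) ** 2)

for n in (10, 50, 100, 500, 1000):
    print(n, dn_bound(n), bound_6_2(n, delta=0.05, zeta=1e-4))
```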

6.3 The PAC Model For the Variable-data Perturbation

In this section, we move to the case in which an adversary tries to compromise a database whose confidential data is modified by adding variable-data perturbation. In this method, a perturbation created by a database protection algorithm is added to each query q. The perturbed response is A(q), while the true query answer is $a_q = \sum_{i \in q} d_i$.

6.3.1 PAC Model Setup

In the fixed-data perturbation case, a fixed number bounds the perturbation: $|a_q - A(q)| \le e$. In the variable-data perturbation case, $|a_q - A(q)| = e_q$, and we assume that the perturbation $e_q$ is a random variable with an unknown discrete distribution with known finite mean $\mu$ and variance $\sigma^2$. Based on the knowledge of these parameters, we attempt to develop a bound on the error that an adversary makes. The bound will be expressed in terms of these parameters. A threshold on the number of queries, within which the database is compromised, can be derived from this error bound.

Given S and $e_q'$ for each $q \in S$, we develop a polynomial algorithm $\gamma_2$ that obtains a hypothesis $c \in H_2$, from which we can output $h_0(c)$. The algorithm, $c \leftarrow \gamma_2(S)$, is a linear program:

$\min_{c \in [0,1]^n} \sum_{i=1}^{n} c_i$

s.t. $-e_q' \le \sum_{i \in q} c_i - A(q) \le e_q'$ for each $q \in S$

where $e_q'$ is the realization of the random variable $e_q$ in the LP algorithm and is sampled from the perturbation distribution. Then the distance between $\bar{h}(c)$ and the true vector d is bounded by

$\left|\sum_{i \in q}\left(\bar{h}(c)_i - d_i\right)\right| \le \left|\sum_{i \in q}\left(\bar{h}(c)_i - c_i\right)\right| + \left|\sum_{i \in q}\left(c_i - d_i\right)\right|$

$\le \frac{|q|}{n} + e_q$

$\le 1 + e_q$

where $\bar{h}(c)_i = c_i + f_i$ and $|f_i| < \frac{1}{n}$. Recall that $|q|$ denotes the cardinality of the query q.








In the variable-data perturbation case, we need to develop a new Disqualifying Lemma which disqualifies all $\bar{h}(c)$ that are far away from the true vector d. That is, for any $x \in H_1$, query q disqualifies x if $\left|\sum_{i \in q}(x_i - d_i)\right| > 1 + e_q$.

See Figure 6-2 for an illustration of the relationships of $H_0$, $H_1$, $H_2$, $h_0$, $\bar{h}$ and d.

Figure 6-2: Relationships of $H_0$, $H_1$, $H_2$, $h_0$, $\bar{h}$ and d in the Variable-Data Perturbation. (The figure shows $h_0(c): H_2 \to H_0$, $\bar{h}(c): H_2 \to H_1$, and the resulting distance $dist(h_0(\gamma_2(S)), d) \le \varepsilon n$.)

6.3.2 Disqualifying Lemma 2

For a sample of size l generated i.i.d. according to an unknown but fixed discrete distribution D, the probability that a hypothesis $\bar{h}(c)$ is far away from the true target d is measured by the risk functional

$err(\bar{h}(c)) = 1 - D\left\{q \subseteq \{1, \ldots, n\} : \left|\sum_{i \in q}\left(\bar{h}(c)_i - d_i\right)\right| \le 1 + e_q\right\}$

$= D\left\{q \subseteq \{1, \ldots, n\} : \left|\sum_{i \in q}\left(\bar{h}(c)_i - d_i\right)\right| > 1 + e_q\right\}$

We intend to bound this error rate. As in Section 6.2, we want

$D^l\left(S : err(\bar{h}(\gamma_2(S))) > \eta(\varepsilon)\right) \le \delta$  (6.3)

where $\delta \in (0,1)$.

We now develop our Lemma 2, a disqualifying lemma analogous to Dinur and Nissim's Disqualifying Lemma. Lemma 2 assumes that the mean and standard deviation of the distribution of $e_q$ satisfy $\mu \ge \sigma$, $\sigma + \mu \le 2\sqrt{n}$ and $\mu \le \sqrt{n}$. Practical reasons motivate these respective assumptions, as we now discuss.

(1) If $\mu < \sigma$:

Since the standard deviation measures how spread out the perturbations ($e_q$ values) can be, if $\mu < \sigma$ many perturbations will be widely dispersed, meaning that the corresponding intervals offer little information. This can take many forms. For example (see Figure 6-3), with a bimodal distribution some intervals will be tight and others very disperse. The tight ones might allow an attacker to easily disclose parts of the confidential information. The wide intervals may provide too little usable information to be meaningful for the user.

(2) If $\sigma + \mu > 2\sqrt{n}$, the following cases are possible:

a. $\mu \ge \sqrt{n} \ge \sigma$

In this case, most perturbations are clustered around a large mean. Although a large perturbation provides better protection of the database, it reduces the usability of the query answers: the user gets very little information. For a demonstration of this case, see Figure 6-4. Consequently, a database security method is meaningless if it produces perturbations with a large mean and a relatively small standard deviation.



Figure 6-3: A Bimodal Distribution of Perturbations in the CVC Network with $\mu < \sigma$.


b. $\mu \ge \sigma \ge \sqrt{n}$

A very high mean and standard deviation imply two situations: (1) all query responses are perturbed with large noise that is widely spread out around the high mean; in this case, the user cannot get any useful data from the query answers; and (2) many query answers have large perturbations while others provide users with very tight answers, which can reveal the confidential data easily. Neither of the above distributions is meaningful for our research.

c. $\sigma \ge \sqrt{n} \ge \mu$

The same reasoning as in case (1) applies here as well.











Figure 6-4: A Distribution of Perturbations in the CVC Network with $\mu > \sqrt{n} > \sigma$.


(3) If $\mu > \sqrt{n}$ held:

A database usually includes a large number of records; therefore, the mean of the perturbations is likely less than $\sqrt{n}$ in most cases. If the mean $\mu > \sqrt{n}$, then the security method likely offers little information to the users, no matter what the standard deviation is. See the discussion in (2) a, b and c for similar explanations.

Lemma 2:

Let $x \in [0,1]^n$, $d \in \{0,1\}^n$, and let $e_q$ be a random variable generated from a distribution with mean $\mu = E(e_q) < \infty$ and variance $\sigma^2 < \infty$, where $\mu \ge \sigma$, $\sigma + \mu \le 2\sqrt{n}$ and $\mu \le \sqrt{n}$. If $\Pr_i\left[\left|\bar{h}(x)_i - d_i\right| \ge \frac{1}{3}\right] > \varepsilon$, then there exists a constant $\eta(\varepsilon) > 0$ such that

$\Pr_{q \subseteq_R [n]}\left[\left|\sum_{i \in q}\left(\bar{h}(x)_i - d_i\right)\right| > 1 + e_q\right] > \eta(\varepsilon)$

where $\eta$ is a function of $\varepsilon$.


Disqualifying Lemma 2 Proof:

Let $Y_i = \bar{h}(x)_i - d_i$ be i.i.d. random variables. For any fixed $q \subseteq [n]$, let $m = |q|$, the cardinality of q. Without loss of generality, assume $q = \{1, \ldots, m\}$. Given the random variable $e_q$ and a constant $a \in [0, \sqrt{n}]$, we have

$P\left[\left|\sum_{i=1}^{m} Y_i\right| > 1 + e_q\right] = P\left[\left|\sum_{i=1}^{m} Y_i\right| > 1 + e_q,\ e_q \le 2a\right] + P\left[\left|\sum_{i=1}^{m} Y_i\right| > 1 + e_q,\ e_q > 2a\right]$

$\ge P\left[\left|\sum_{i=1}^{m} Y_i\right| > 1 + e_q \,\middle|\, e_q \le 2a\right] P(e_q \le 2a)$.

According to Chebyshev's Inequality, since $e_q$ is a random variable with $\mu = E(e_q)$ and variance $\sigma^2$,

$P(e_q \ge 2a) \le \frac{\sigma^2}{(2a - \mu)^2}$.

Then we obtain

$P\left[\left|\sum_{i=1}^{m} Y_i\right| > 1 + e_q\right] \ge P\left[\left|\sum_{i=1}^{m} Y_i\right| > 1 + e_q \,\middle|\, e_q \le 2a\right]\left(1 - \frac{\sigma^2}{(2a - \mu)^2}\right)$  (6.4)

where the first factor is term (1) and the second factor is term (2). Let the probability $\eta(\varepsilon)$ equal the product of term (1) and term (2) in formula (6.4). Next, we continue our proof by solving two problems, respectively.

(1) Prove $\eta(\varepsilon)$ is a positive number:









In all steps of Dinur and Nissim's (2003) proof of their Disqualifying Lemma, term (1),

$P\left[\left|\sum_{i=1}^{m} Y_i\right| > 1 + e_q \,\middle|\, e_q \le 2a\right]$,

can be substituted for

$P\left[\left|\sum_{i=1}^{m} Y_i\right| > 1 + 2e\right]$,

provided $a \in [0, \sqrt{n}]$. To see this, we have the following:

$P\left[\left|\sum_{i=1}^{m} Y_i\right| > 1 + e_q \,\middle|\, e_q \le 2a\right] \ge \sum_{j=0}^{2a} P\left[\left|\sum_{i=1}^{m} Y_i\right| > 1 + j\right] P(e_q = j)$

$\ge P\left[\left|\sum_{i=1}^{m} Y_i\right| > 1 + 2e\right] \sum_{j=0}^{2a} P(e_q = j)$.

Since $E(e_q) = \mu \le \sqrt{n}$, $\sum_{j=0}^{2a} P(e_q = j) > 0$. Now, Dinur and Nissim (2003) proved $P\left[\left|\sum Y_i\right| > 1 + 2e\right] \ge 1 - 2e^{-T^2/8}$ for the appropriate choice of T. Rescaling T in proportion to $\sum_{j=0}^{2a} P(e_q = j)$ proves our point. Similarly, for the second part of their proof, the parameters $\alpha$ and $\beta$ can be rescaled in proportion to $\sum_{j=0}^{2a} P(e_q = j)$. This then gives

$P\left[\left|\sum_{i=1}^{m} Y_i\right| > 1 + e_q \,\middle|\, e_q \le 2a\right] \ge \max\left\{1 - 2e^{-T^2/8},\ x\,\frac{2-x}{(1-x)^2}\right\}$

where $x = e^{-\beta/2}$ with $\beta$ chosen large enough, as seen in Section 6.2. Thus

$P\left[\left|\sum_{i=1}^{m} Y_i\right| > 1 + e_q\right] \ge x\,\frac{2-x}{(1-x)^2}\left(1 - \frac{\sigma^2}{(2a - \mu)^2}\right)$.

So the probability $\eta(\varepsilon)$ will be a positive number as long as term (2) is greater than 0. Thus we need to have

$1 - \frac{\sigma^2}{(2a - \mu)^2} > 0$,

which is true when $0 \le a < \frac{\mu - \sigma}{2}$ or $\frac{\mu + \sigma}{2} < a \le \sqrt{n}$, provided $\mu \ge \sigma$ and $\sigma + \mu \le 2\sqrt{n}$, respectively. These latter two conditions are assumed in Lemma 2. Thus,

$\Pr_{q \subseteq_R [n]}\left[\left|\sum_{i \in q}\left(\bar{h}(x)_i - d_i\right)\right| > 1 + e_q\right] \ge P\left[\left|\sum Y_i\right| > 1 + e_q \,\middle|\, e_q \le 2a\right]\left(1 - \frac{\sigma^2}{(2a - \mu)^2}\right)$  (6.5)

where the parameter $a \in \left[0, \frac{\mu - \sigma}{2}\right) \cup \left(\frac{\mu + \sigma}{2}, \sqrt{n}\right]$.

(2) We now maximize the lower bound over a.

In order to derive a tight bound, we seek the maximum value of (6.5) subject to $a \in \left[0, \frac{\mu - \sigma}{2}\right) \cup \left(\frac{\mu + \sigma}{2}, \sqrt{n}\right]$. So

$P\left[\left|\sum Y_i\right| > 1 + e_q\right] \ge \max_a P\left[\left|\sum Y_i\right| > 1 + e_q \,\middle|\, e_q \le 2a\right]\left(1 - \frac{\sigma^2}{(2a - \mu)^2}\right)$

$\ge P\left[\left|\sum Y_i\right| > 1 + e_q \,\middle|\, e_q \le 2a\right] \max_a\left(1 - \frac{\sigma^2}{(2a - \mu)^2}\right)$

where the a in the first term is any $a \in \left[0, \frac{\mu - \sigma}{2}\right) \cup \left(\frac{\mu + \sigma}{2}, \sqrt{n}\right]$, for which the first factor is bounded below as above. Note that

$1 - \frac{\sigma^2}{(2a - \mu)^2}$

is decreasing over $\left[0, \frac{\mu - \sigma}{2}\right)$ and increasing over $\left(\frac{\mu + \sigma}{2}, \sqrt{n}\right]$, so we merely need to compare

$1 - \frac{\sigma^2}{\mu^2}$ (at $a = 0$)

to

$1 - \frac{\sigma^2}{(2\sqrt{n} - \mu)^2}$ (at $a = \sqrt{n}$).

By assumption $\mu \le \sqrt{n}$, so the latter is maximal. Thus

$P\left[\left|\sum Y_i\right| > 1 + e_q\right] \ge \eta(\varepsilon) = x\,\frac{2-x}{(1-x)^2}\left(1 - \frac{\sigma^2}{(2\sqrt{n} - \mu)^2}\right) > 0$.

End of proof.

Lemma 2 is a crucial step for our model. The successful proof provides a bound on the error $\varepsilon$ in terms of the mean and variance of $e_q$. In the next section, we continue discussing these two parameters. Based on the results of Lemma 2, we are able to derive a bound on the number of queries within which the adversary would be able to compromise, with high probability $(1 - \delta)$, a database protected by the variable-data perturbation method.

6.4 The Bound of the Sample Size for the Variable-data Perturbation Case

In this section, based on the proof of Lemma 2, we develop the sampling bound for the variable-data perturbation case via two approaches. In the first approach, we use Dinur and Nissim's (2003) result directly from their Disqualifying Lemma proof in our bound; the second approach instead applies their sample bound to obtain a tighter bound.

6.4.1 The Bound Based On the Disqualifying Lemma Proof

Recall that $err(\bar{h}(c)) > \eta(\varepsilon)$ (see Section 6.3), and we intend to bound

$D^l\left(S : err(\bar{h}(\gamma_2(S))) > \eta(\varepsilon)\right)$

by the confidence parameter $\delta > 0$. As in Section 6.2,

$D^l\left(S : err(\bar{h}(\gamma_2(S))) > \eta(\varepsilon)\right) \le (n+1)^n (1 - \eta(\varepsilon))^l$

$\le (n+1)^n \left[1 - x\,\frac{2-x}{(1-x)^2}\left(1 - \frac{\sigma^2}{(2\sqrt{n} - \mu)^2}\right)\right]^l$.

Bounding this with $\delta$ gives

$(n+1)^n \left[1 - x\,\frac{2-x}{(1-x)^2}\left(1 - \frac{\sigma^2}{(2\sqrt{n} - \mu)^2}\right)\right]^l \le \delta$.

Then we take the base 2 logarithm of both of the latter two sides to obtain

$n\lg(n+1) + l\,\lg\left[1 - x\,\frac{2-x}{(1-x)^2}\left(1 - \frac{\sigma^2}{(2\sqrt{n} - \mu)^2}\right)\right] \le \lg \delta$.

The minimum sample size is thus

$l \ge \frac{\lg(\delta) - n\lg(n+1)}{\lg\left[1 - x\,\frac{2-x}{(1-x)^2}\left(1 - \frac{\sigma^2}{(2\sqrt{n} - \mu)^2}\right)\right]}$.

Since $x\,\frac{2-x}{(1-x)^2}$, where $x = e^{-\beta/2}$, is a very small number, the resulting bound is very loose (as was the similar bound under the Dinur and Nissim framework discussed earlier). If n is small, the sample size l can even be greater than $2^n$, which is the total number of all possible queries. With larger n, l becomes much smaller than $2^n$; however, l is still a very large number. In order to reduce the sample size l, we need to find a more practical value to use instead of $x\,\frac{2-x}{(1-x)^2}$.









6.4.2 The Bound based on the Sample Size

Starting from Dinur and Nissim (2003), the sample size l is bounded by $n\lg^2 n$ if the fixed perturbation is less than $\sqrt{n}$. Therefore, we have a sufficient bound for the fixed-data perturbation case (see Section 6.2 for the details):

$l \ge \frac{\lg(\delta) - n\lg(n+1)}{\lg\left(1 - x\,\frac{2-x}{(1-x)^2}\right)} \ge n\lg^2 n$.

Consider the boundary case

$n\lg^2 n = \frac{\lg(\delta) - n\lg(n+1)}{\lg(1 - \zeta(\varepsilon))}$.

Then

$\lg(1 - \zeta(\varepsilon)) = \frac{\lg(\delta) - n\lg(n+1)}{n\lg^2 n}$

$\zeta(\varepsilon) = 1 - 2^{\frac{\lg(\delta) - n\lg(n+1)}{n\lg^2 n}}$.

Based on the above result for $\zeta(\varepsilon)$, we replace

$x\,\frac{2-x}{(1-x)^2}$

with

$1 - 2^{\frac{\lg(\delta) - n\lg(n+1)}{n\lg^2 n}}$.

This formula provides a better value than $\zeta(\varepsilon)$ while developing a tighter bound for the sample size in the variable-data perturbation case.

Since the reasoning used by Dinur and Nissim (2003) to arrive at $n\lg^2 n$ remains unchanged for our case, we can use

$1 - 2^{\frac{\lg(\delta) - n\lg(n+1)}{n\lg^2 n}}$

in place of $\zeta(\varepsilon)$ inside $\eta(\varepsilon)$. This gives

$(n+1)^n \left[1 - \left(1 - 2^{\frac{\lg(\delta) - n\lg(n+1)}{n\lg^2 n}}\right)\left(1 - \frac{\sigma^2}{(2\sqrt{n} - \mu)^2}\right)\right]^l \le \delta$

from which we obtain

$l \ge \frac{\lg(\delta) - n\lg(n+1)}{\lg\left[1 - \left(1 - 2^{\frac{\lg(\delta) - n\lg(n+1)}{n\lg^2 n}}\right)\left(1 - \frac{\sigma^2}{(2\sqrt{n} - \mu)^2}\right)\right]}$  (6.7)




From formula (6.7) we can see that the sample size l decreases when $\mu$ and $\sigma$ decrease.

6.4.3 Discussion

As we know from Section 5.3, the larger the number of camouflage vectors s is, the larger the response intervals are, which leads to a larger perturbation mean and standard deviation. This implies that the sample size l increases with an increase in s.

Our experiments based on the three examples in Garfinkel et al. (2002) support these conclusions. The database has 14 records, n = 14. Three cases are considered in Table 6-2.

Table 6-2: The Relationship among $\mu$, $\sigma$, s and l.

Network variable | w=3 and m=1 | w=5 and m=2 | w=7 and m=3
s                | 3           | 10          | 35
$\mu$            | 2.0236      | 2.7760      | 3.3019
$\sigma$         | 1.1150      | 1.1114      | 1.174
l                | 213         | 217         | 223

From Table 6-2, we can see that the sample size l increases as $\mu$, $\sigma$ and s increase. These sample sizes are very close to the bound $n\lg^2 n$ from Dinur and Nissim (2003) and much less than $2^{14} = 16{,}384$.
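
Formula (6.7) is straightforward to evaluate. The sketch below (ours) computes the bound for given n, $\delta$, $\mu$ and $\sigma$; assuming $\delta = 0.05$ (the value used in Table 6-1), it reproduces the sample size l = 213 reported in Table 6-2 for the w = 3, m = 1 network.

```python
# Evaluating formula (6.7): the query bound for variable-data perturbation.
import math

def bound_6_7(n: int, delta: float, mu: float, sigma: float) -> int:
    num = math.log2(delta) - n * math.log2(n + 1)
    zeta = 1.0 - 2.0 ** (num / (n * math.log2(n) ** 2))
    eta = zeta * (1.0 - sigma**2 / (2 * math.sqrt(n) - mu) ** 2)
    return math.ceil(num / math.log2(1.0 - eta))

# Parameters from Table 6-2 (w = 3, m = 1 network; delta assumed 0.05):
print(bound_6_7(n=14, delta=0.05, mu=2.0236, sigma=1.1150))   # -> 213
```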

6.5 Estimating the Mean and Standard Deviation

In the previous section, we derived a bound on the sample size, which is the minimum number of queries required to disclose the binary confidential information in a database protected by the variable-data perturbation method. The bound (see formula 6.7) is decided by four parameters: the number of database records n, the confidence parameter $\delta$, and the mean $\mu$ and standard deviation $\sigma$ of the perturbation distribution. Among these four parameters, n and $\delta$ are known and predetermined. In this section, we develop a method to estimate the mean and standard deviation of the perturbation distribution.









The perturbations' mean $\mu$ and standard deviation $\sigma$ are fixed in Garfinkel et al. (2002) as soon as the algorithm design is finished, as with the networks for camouflage vectors in the CVC technique. However, the actual mean and standard deviation can be calculated only if the responses to all $2^n$ queries are obtained, which is not practical in most situations. Instead of computing the true mean and standard deviation from $2^n$ queries, our heuristic method estimates these two values approximately, denoted $\hat{\mu}$ and $\hat{\sigma}$, by using the following random sampling method.

Let

i: index of query i

$q_i$: the ith query

$e_i$: interval length of query $q_i$

$\hat{\mu}_i$: mean of the perturbations using queries $1, \ldots, i$

$\hat{\sigma}_i$: standard deviation of the perturbations using queries $1, \ldots, i$

$\hat{l}_i$: sample size computed from $\hat{\mu}_i$ and $\hat{\sigma}_i$ using formula (6.7)

Table 6-3 lists the heuristic steps for estimating the mean, the standard deviation, and the bound on the sample size.


We use the network example in Garfinkel et al. (2002) to illustrate our heuristic. The basic setting for the network algorithm is: there are n = 14 database records, and the parameters are w = 3 and m = 1. The true mean and standard deviation computed from all $2^{14}$ queries are $\mu = 2.023$ and $\sigma = 1.115$, which give a sample size $l = 213$ from formula (6.7). Also see Table 5-2 and Figure 5-1 for the camouflage vectors and the CVC network algorithm. Next, we show how the heuristic is applied to estimate $\hat{\mu}$ and $\hat{\sigma}$ for the CVC technique example in Garfinkel et al. (2002).

Table 6-3: Heuristic to Estimate the Mean $\hat{\mu}$, Standard Deviation $\hat{\sigma}$, and the Bound $\hat{l}$.

Heuristic:

0. For $i = 1, \ldots, 30$: generate query $q_i$ and record its perturbation $e_i$.

1. Generate query $q_i$ and record its perturbation $e_i$.

2. Compute $\hat{\mu}_i$ and $\hat{\sigma}_i$ using $e_1, \ldots, e_i$.

3. Compute $\hat{l}_i$ from formula (6.7) using the estimated $\hat{\mu}_i$ and $\hat{\sigma}_i$.

4. Increment i and repeat steps 1 to 3 until $i \ge \hat{l}_i$. This $\hat{l}_i$ is the final bound on the sample size $\hat{l}$; $\hat{\mu}_i$ and $\hat{\sigma}_i$ are the final values for the estimated $\hat{\mu}$ and $\hat{\sigma}$.

For example, the intruder sends a random query $q_1$ to the database, asking how many employees in Company B are HIV positive (see Table 5-1). The query is answered with the interval [1, 2] (see Table 5-2 for the set of camouflage vectors), from which the random perturbation is recorded as $e_1 = 2 - 1 = 1$. The intruder continues sending queries and recording perturbations. Once the number of queries i exceeds 30, the mean and standard deviation are computed using $e_1, \ldots, e_i$ as

$\hat{\mu}_i = \frac{1}{i}\sum_{j=1}^{i} e_j$ and $\hat{\sigma}_i = \sqrt{\frac{1}{i-1}\sum_{j=1}^{i}\left(e_j - \hat{\mu}_i\right)^2}$.