Defending against Internet Worms

xml version 1.0 encoding UTF-8
REPORT xmlns http://www.fcla.edu/dls/md/daitss/ xmlns:xsi http://www.w3.org/2001/XMLSchema-instance xsi:schemaLocation http://www.fcla.edu/dls/md/daitss/daitssReport.xsd
INGEST IEID E20110404_AAAABS INGEST_TIME 2011-04-04T13:32:53Z PACKAGE UFE0013742_00001
AGREEMENT_INFO ACCOUNT UF PROJECT UFDC
FILES
FILE SIZE 55643 DFID F20110404_AABIJF ORIGIN DEPOSITOR PATH tang_y_Page_051.pro GLOBAL false PRESERVATION BIT MESSAGE_DIGEST ALGORITHM MD5
97334574a965e2ed3c819b1dee448819
SHA-1
0ec4af2d52038444ae2bff9c27ea4e6fabe29d53
55955 F20110404_AABIIQ tang_y_Page_016.pro
dc00ed3b10f26930253485a6ef429316
cf51f8ddeacb97453d32bf2ff9670c04b545fa5a
54960 F20110404_AABIJG tang_y_Page_052.pro
6a0f01e0a020b9286a168365bff5a1bd
2bd536a5ec08d34152a2c38bae419db52d858222
52008 F20110404_AABIIR tang_y_Page_017.pro
ffe9e423a04ae761d2915ee7f25d9e02
056ddd8a520ce233ab7126a26d146a527c57d3b4
43054 F20110404_AABIJH tang_y_Page_054.pro
a8a63c164d65ec0a47a9e020f1d33840
93248aab5711be9bb393f48a2b08bc629a502bf1
31896 F20110404_AABIIS tang_y_Page_021.pro
ea9f4fa97e1a226a84fb6394469b5187
b7bc02d71e4dfbd06c79dd414951dbc40f690713
51145 F20110404_AABIJI tang_y_Page_061.pro
2f4bfa7d259e289b42379285e14a2af4
4f9e34d57a1134616c2169b0fd95c604edbee895
49994 F20110404_AABIIT tang_y_Page_023.pro
6d246cb2cd6c8ee817bc6950521653ce
d491965740952bac0c9d7c0e2d15a346e4329d3b
43685 F20110404_AABIJJ tang_y_Page_068.pro
9816cbeff08d75ad74c5e5e7e59ed993
ea00e84f0f652d53d170e3f540661d55dff2b4c2
42753 F20110404_AABIIU tang_y_Page_025.pro
1f2637d54a06c30481e2fc6d87b41b4d
b5b777eac4f48de59fc53f0873568e7ca60135ad
47804 F20110404_AABIJK tang_y_Page_075.pro
bffd9a070607fb1378ef131044eeefcd
6012869141361e1c535d431ae8148f4a8c79b09c
40408 F20110404_AABIIV tang_y_Page_031.pro
f09f506b6df659c84f83b402843967ca
205120456c91fc1a0352af89dc57faa051693e1a
29557 F20110404_AABIJL tang_y_Page_076.pro
ca854ebbabe11b8ab38275201d31c65f
b50f476e5b5c09d141674846c7971d64236dca6e
41511 F20110404_AABIIW tang_y_Page_033.pro
be82a4aa3275c0ec58b158a1dfee2713
fa01d55f1d8d8d0206618d20b40e02d1646ac727
44474 F20110404_AABIJM tang_y_Page_077.pro
eb09606160e86fab6eb577ce1256063d
194080059ca2c26a7380b7e6e1e04ca9650bc186
44770 F20110404_AABIIX tang_y_Page_035.pro
e55853dbdf7daefcc25ab9f38f81a5e2
f7fec233c26460dc2bcd925002476a2c446c9572
62960 F20110404_AABIKA tang_y_Page_107.pro
05d0289c68085115209869c9eca28aba
422bab6b19eb26d7a8c0a4ff02584d67d1f368f6
43342 F20110404_AABIJN tang_y_Page_078.pro
8ff26c885c197c8a5ca8456451b1eb62
cccd04f08d046109cda148c895a75d183372284c
50390 F20110404_AABIIY tang_y_Page_037.pro
6e4233d627702cec2544bbabb62f1cc4
a63067e537629316a82e3283420972591b624fd8
65836 F20110404_AABIKB tang_y_Page_108.pro
475409f53a24edb57c1070b380155031
0c88746175c8d386b024efb40797a8fe2891efd1
55497 F20110404_AABIJO tang_y_Page_081.pro
1f88c96843bb8c073bf2c5698819c1cc
50a57b3307c4a751eb9571583d00dc4a41d51e41
42792 F20110404_AABIIZ tang_y_Page_038.pro
e6175b5d4bcf410b8d872beb7b2b89aa
11e2c3697b86db93c8c21432ab9991b866b3766d
17502 F20110404_AABIKC tang_y_Page_111.pro
a79913a040f141b4c8fe1314c36caa88
bcb10f29e283711e00c12ef1e31ff4ab0552f9da
4209 F20110404_AABIKD tang_y_Page_002.jpg
b7bf6a6b11440b6e8c3920cb97b5fa92
d1edd263c4a5cba8716f8343224de506b0aa232a
22465 F20110404_AABIJP tang_y_Page_082.pro
9e2cb64362c91653471c248def97ac90
135deab97f74edee620a1a2f884789b3de6462a6
80059 F20110404_AABIKE tang_y_Page_004.jpg
86ac276c6087efdd4b16c099c77cdd1c
2ecadc1e975fdef7232f6c44eb8c80deed04cce5
21771 F20110404_AABIJQ tang_y_Page_083.pro
98f7165213c4f1900f3b9afa67188435
6a199004c75e4eefdf18c46cf17b2fa85f7910b3
75600 F20110404_AABIKF tang_y_Page_005.jpg
1e3c55eb2f12485bd471ef172c05d12a
95785eae847aef2ccb9ae7a9e029eae3ca419c9d
53466 F20110404_AABIJR tang_y_Page_084.pro
d16af9dd9280cf3320753bec8c6ea1f4
8f500174d99669455266f3543ac9a0aa2c9601bd
108945 F20110404_AABIKG tang_y_Page_006.jpg
bb9123ef2f3f2ba7a4712d586d42d8e5
bf876dc4da92f960114c1e84f61473d34cc22c6b
18935 F20110404_AABIJS tang_y_Page_085.pro
fee7e5d14fc17a8da10768cb75c55925
d70b0ecc1aa322894968cfe5ba876eb0371ff99f
77724 F20110404_AABIKH tang_y_Page_008.jpg
2bd2826c2dc6a8d63dc63f18075d8698
3d13eedf85813db17522db04a4975e775d520725
47086 F20110404_AABIJT tang_y_Page_086.pro
5641fb8667b514ff93db866b29c4eb9b
f41a65ceb0264e8672553ca08bfa473291cd8a09
34494 F20110404_AABIKI tang_y_Page_009.jpg
c837d3b875581d2cdf0206045483f0f5
f817411414b22f6b48859a54a0ab554707700ab9
51027 F20110404_AABIJU tang_y_Page_091.pro
15460141c7fb990c7ff423da3fe30efb
cf2bc29e0f4d978724c99424bb61a378207035e0
104663 F20110404_AABIKJ tang_y_Page_013.jpg
53124f4d50545e431dcd89c64cc29dab
e00c6ab845bc7aba5b18f19c609d079f17cd233c
35596 F20110404_AABIJV tang_y_Page_099.pro
08a0940b3f02599a8524384c66a9769b
87beceed20a8a84876946a1747c9b10c0936b5a8
105842 F20110404_AABIKK tang_y_Page_018.jpg
6694f9b593a75b7a9e741724ab19b4c3
3a70b02389ca792cbceee05d9a599b2a30c7a139
50803 F20110404_AABIJW tang_y_Page_101.pro
917873d2a64ba72863bb754fadd1dff6
3347cca0914666786b770f9ec27c6ea952769818
83550 F20110404_AABIKL tang_y_Page_020.jpg
7792182431190e34e515eda279398a24
6c7298bb0086da5a9659cf75d6e9e3ff2356e5e0
3167 F20110404_AABIJX tang_y_Page_102.pro
45f4ee26f1a8100b2d3cad9feedbd845
3d20676e6820bc3db19d602e0d46179c2a07a624
104951 F20110404_AABILA tang_y_Page_057.jpg
7679718dad832398ca12666cc7211bab
7ff117edb131a4ffb3fd4fdf801169dff646162c
100895 F20110404_AABIKM tang_y_Page_029.jpg
e810b0bc16d3bc8d4c7915ffa0d6339b
642d415b054b45a95709d890c770068b7e23a2d2
45884 F20110404_AABIJY tang_y_Page_103.pro
211493ca7fb44a6db94d1e5b34f96a70
fd34b7252288d8f3149bb6ad30c1a34945c6ac17
41480 F20110404_AABILB tang_y_Page_058.jpg
3b21b50113afb063e93cc932de0d96cc
7b420071f27cc5353795614c69d9091799b76527
94533 F20110404_AABIKN tang_y_Page_030.jpg
f7f3cf32249a8f977fe54395e3b0757a
227f04babc561ef79510b09362d74ccac470ed05
55717 F20110404_AABIJZ tang_y_Page_106.pro
751e5590602317df6c79a0520803b4fb
4a8fae441fa9103709e45a53fcc69e49a5c0eca6
102735 F20110404_AABILC tang_y_Page_060.jpg
4ac2713cc6ace3a7a0b0b6391a27c81b
5ca6e602149b9250b16e433fca7e9403baeb0b65
74203 F20110404_AABIKO tang_y_Page_031.jpg
d7f5490cb0bdf63edffb1a6ef73545a8
b77c6021a80fadb50b4c32fa454eeb79ec7f40f7
99105 F20110404_AABILD tang_y_Page_061.jpg
2fea6b8832378d76252a791b8f112522
291b59bb27791ed4495cd3cb2d5fd4a746ac7e5a
92620 F20110404_AABIKP tang_y_Page_034.jpg
fdf3a0d5a1005ce345dc94b4d2a0c473
9096d65c01e960ba6b25040c2bef0f183e1d3e5a
102167 F20110404_AABILE tang_y_Page_063.jpg
21819fd852d2649c1263c515084aaaf8
76612c09a026b56a89b7bae11fccb9e0694ebc87
70068 F20110404_AABILF tang_y_Page_065.jpg
b79f536408bfe9aefdeefd6747e6d2aa
e89eeb3fb3fafa252ab245b60b8fa0e5a86d905c
87846 F20110404_AABIKQ tang_y_Page_035.jpg
c99442565f7d7040c62bd7eae26a73c0
1868029ec33788269999e26af3f40090107b6a15
84094 F20110404_AABILG tang_y_Page_068.jpg
adf258837ed63ffc281e56386ca40c79
566628594bb04c7b28b0ef986f9f6deddc4c61f1
83926 F20110404_AABIKR tang_y_Page_038.jpg
e6eb592382d3fd36a61d5758d0d68adb
69f9d6c2ed43ffde5f41e91cd57e46858e742ef0
100177 F20110404_AABILH tang_y_Page_070.jpg
e7ad172dff527c12dba2c6d383080707
5d960895616dbd2255d4303d58aa90e7ac29a227
96334 F20110404_AABIKS tang_y_Page_040.jpg
e61cc4aabe04fced42e9e1627a43c5d3
80f003f13572075012d15781c86aea0397de7fce
83435 F20110404_AABILI tang_y_Page_079.jpg
22773dabaaaa380fc8438525a0b267f9
cc06e6843b1ca419ae11bdf478becf5b7f1e8560
94126 F20110404_AABIKT tang_y_Page_042.jpg
bf73ade1c82cc66ba3813453d0e18f94
865e0c73e470de4b04b197c1c1b33fcd5ec89c5f
47374 F20110404_AABILJ tang_y_Page_082.jpg
6ae3926a8cee80f20e050096263dfdda
5326444f654de35bd495baa94d30b7dece2fd09f
84184 F20110404_AABIKU tang_y_Page_047.jpg
0ce78bee3484f86ca284022ab7a76b5e
b27ef3cac28c8f6bb0e1ce247c5ee4f6144bfacb
46576 F20110404_AABILK tang_y_Page_085.jpg
6424a79c422d4a7aa989ba7ab4502057
e74c724cd9a027f4eaf8b06ded3640dd694ed6af
87825 F20110404_AABIKV tang_y_Page_050.jpg
e47a3edf79af3c5a7eb90a6b6e2351a6
80b35cbd339fc3992e572fcb4775f8b98efd61bc
87926 F20110404_AABILL tang_y_Page_086.jpg
4ce896fb6f0f56f7bc55093e198ee3e7
e6d532e31fa4c220b34373b189f3fe12c2fb6f92
106585 F20110404_AABIKW tang_y_Page_051.jpg
aeb8444429001f858e118eb8977fc1c9
8775ee1c61e7a20ce529278465c2ee40622ff9fb
1051985 F20110404_AABIMA tang_y_Page_019.jp2
8661a1b6950e0e94132f1f185e5ccb4c
d7aefc737fb4b544b3e88032b2fcf6d45aa2bbf4
53620 F20110404_AABILM tang_y_Page_087.jpg
308a0a35ef55f9730aa5f3c375c4e853
da4ec5af65f42c2be85eb8a0c9789d8af8246c1e
85937 F20110404_AABIKX tang_y_Page_053.jpg
4bc818b9ae89c0601ab1ee292ce478cc
128e3ec3fe036668971cc357c8b4d199b2eee624
720010 F20110404_AABIMB tang_y_Page_021.jp2
81caa9ecc5392b7ae774a894615c6d2b
43743b55f29995f3063d9d9308c952c36d0621ed
109081 F20110404_AABILN tang_y_Page_090.jpg
c070316a515de8ec17840c4578b934c7
141fc98b2d4fa6a5a9e3d381d7f630b57cf0fd70
86503 F20110404_AABIKY tang_y_Page_054.jpg
60df796aa1329394d02ce2df0294581a
cae8b1a3972bdd2b440e0c1e932a33b69aedb116
988744 F20110404_AABIMC tang_y_Page_026.jp2
904fe0633a79133f88f1032b63b731d9
3f6b7936ee25bf2ad7143dfa159b480b2aec9fd4
102944 F20110404_AABILO tang_y_Page_091.jpg
4cb9ea24b81a6079d742ccafc9d970aa
fa7968d82c6741befce6ba9051cbecbea62b7ff9
65208 F20110404_AABIKZ tang_y_Page_056.jpg
980b230f34561ebc59cf3f4a181ee068
420ede0527316244eceffd140b368624499050ac
931363 F20110404_AABIMD tang_y_Page_027.jp2
a6b1bab6b1ecede911d721ec972bb502
95d1d7723c975e761f35a6bf900099c6f579519f
69737 F20110404_AABILP tang_y_Page_093.jpg
48543fd20acdb12bbec41118fc136069
bd319d422a5cc5e150a575745694f2ff40a627bc
1051923 F20110404_AABIME tang_y_Page_030.jp2
435cc7a629ff9575abf063c61572a21f
5b165efb9c2506bba7f0663ab4d0af32f1321887
56269 F20110404_AABILQ tang_y_Page_096.jpg
02efa5d2bb3c0b1daa1bc060fed51449
14aa1e559af0b9398ccab53f1e279d05c7aa4078
1038653 F20110404_AABIMF tang_y_Page_034.jp2
64ff84401c418bb588ab0a6610e2f906
0ee7c615d83026dd4214c5a270a878332fd0a8d2
1051955 F20110404_AABIMG tang_y_Page_036.jp2
6be94b41b452a92b27e18f254053e51a
83231c9b026876b8160e8043609f59c2899c6354
96780 F20110404_AABILR tang_y_Page_100.jpg
6d1ca5510fe10ef8ccc152e3661c94d5
e914d8f83350846b4f6980e818dc512823e14194
914637 F20110404_AABIMH tang_y_Page_039.jp2
626ab41cbead4d32dcb745d4941a14fc
8d4b41149485a49068f4fd6d140586c66a11cd90
91229 F20110404_AABILS tang_y_Page_103.jpg
068eb3d9a382204ddb3423be7b2b5940
3d4a5a2ee6343ec7b35b0468ecf3f9f582d23fc6
F20110404_AABIMI tang_y_Page_040.jp2
62621b9f4c27f52ed4c3ea29002d6b13
521aecc1d4d52924d4e1837bd2dde0712dec5f1c
105661 F20110404_AABILT tang_y_Page_104.jpg
278c57a4a62210a3e0ccbe87760b6817
d25f99fa5f507e6a338c8667434df9c834cd511f
948079 F20110404_AABIMJ tang_y_Page_050.jp2
80c2fde9d2430385fea00d38d20c8716
569d7522168d1817a192c25e5d9cefb11e57e0ae
129032 F20110404_AABILU tang_y_Page_108.jpg
00e2492e50b8b60f483cfe49bcefa335
da59c3582d8acd77af7b62e78e3162ef76c2c36e
1051975 F20110404_AABIMK tang_y_Page_057.jp2
6229efc50400b970def841fa4a3ca654
99b3a5efd73257d455318d34bb588af0a3831eb1
38520 F20110404_AABILV tang_y_Page_111.jpg
29c5284a5ec77b92a0df8e5dae5dae2c
0f4036f9298dbffb81a32d7b4fef348a9fb9055d
1051963 F20110404_AABIML tang_y_Page_062.jp2
eba533918e4e5399b6d010b1b2a66e08
cc3ac6276d06348aade8a80b28ffb013ea2a0d61
881568 F20110404_AABILW tang_y_Page_004.jp2
854e698d0178159bd93f8a2b12213f66
dba773c07d25db4403f8c924543c85c778e0813f
750596 F20110404_AABIMM tang_y_Page_065.jp2
fc3ebdc3092e20c568f7ba31d9636b8c
1055bc1442c3cf3590450d2d873e0144b2ccd0ab
1051986 F20110404_AABILX tang_y_Page_006.jp2
b6234ccd5d73f7577ad09652608ffa3b
751037d116c8e79a5588ecf84b9a673c4d0e47a0
592717 F20110404_AABINA tang_y_Page_088.jp2
1acc29712d5ed4618b806855f4f12ec7
f3caee8c13a7165a44fd2399766b51877af580aa
1051937 F20110404_AABIMN tang_y_Page_066.jp2
cfbab4753c68cb25b26a4bcf0ca0f360
d31e5c8ea1ecf5e51b9a193c5a075c808dd340be
185791 F20110404_AABILY tang_y_Page_007.jp2
f74d4616786ebf7bef1d44d7155372cc
3b3c1a9a730ab3ac41667884d9f7521d6ddfe776
1051919 F20110404_AABINB tang_y_Page_091.jp2
e67c289e5bff4300a860f90e57c1b415
927705cb001dee59204164b0346a2d880b37ba86
1021014 F20110404_AABIMO tang_y_Page_069.jp2
cb0f553126fe28a3d7e7c1154b42996a
fab67ab444701facf7f0d9c93e671ae448614c48
F20110404_AABILZ tang_y_Page_013.jp2
c02a7d89a180c3ab259f500730ee2b31
f7de15031309a076bc058f6f0820a9e8745507fd
708727 F20110404_AABINC tang_y_Page_094.jp2
f19a774e35b2e227b2c6c3fec5e911c1
8215213031eca88162c55c28a7fccca189506037
576626 F20110404_AABIND tang_y_Page_096.jp2
7ac7f8f073808c64cf7e2b578cdc13de
51b78b3a9421fcbdf60522a47139ed383da7fc20
1051973 F20110404_AABIMP tang_y_Page_070.jp2
68867f0b15f58f9a0b166737e999989c
a9e04a88ab65dd6855eaa2cdb2fe47f0c368c9a5
872066 F20110404_AABINE tang_y_Page_097.jp2
2a82eba9377b750d105798def59d09cb
c1ce17be23373d66cb266f5dc150572c33c62db5
F20110404_AABIMQ tang_y_Page_073.jp2
9bbff19ba074b7f1e0ef35f0eab48ecf
9eed5ffac634cd318b399a621cb9eed7eeffae0d
70094 F20110404_AABINF tang_y_Page_102.jp2
26ed93251f77015c66d1dad3a1133614
8edd1f1717b6633218c47d2bf70f425273ddcc0d
1013983 F20110404_AABIMR tang_y_Page_075.jp2
cccfde5c2323338f376fed08855de3fb
8b69fd164e44a27d625549e679f0916c3f972bf4
1023832 F20110404_AABING tang_y_Page_103.jp2
2fd9da9abdef7594ffebe6146c2f6993
a9b2bdc214f0b765924d6e62e7a3ebd4f3b1479f
1051953 F20110404_AABINH tang_y_Page_104.jp2
4769806435b3473e923962cc0f433903
77990475069d9ad685979145f7c8bfebf311c1fd
721353 F20110404_AABIMS tang_y_Page_076.jp2
f0cfb41d7ca0874514f90db85823871d
165031b01cb7d048571f1b008f77adb2bf27c36e
1051965 F20110404_AABINI tang_y_Page_106.jp2
68135cda5a5ddb5f236d74f86e95c6d4
7662c333db4576f138679ef9dcb7530c4b1d5f04
926397 F20110404_AABIMT tang_y_Page_079.jp2
776955d98fac3e10be8ce679f793b466
9c1bc3330571daa9a3e6bc53a3c31112767a5dd4
481702 F20110404_AABINJ tang_y_Page_110.jp2
d4e786940e7419f5b40fcf56087cd19b
d78eec7d3b3d4657541e0d6ec98ae309a38a9c28
970507 F20110404_AABIMU tang_y_Page_081.jp2
464b70cd6b5a1ba5aca4de1d6efc2820
3b041e86794722e5a823ac425d9d271996ea6aa6
32358 F20110404_AABINK tang_y_Page_049.QC.jpg
897467691717c234dd6c5cc37d8e5d13
aaf6b8f43b278ef7eba326b15e4565c41a7fceb6
487479 F20110404_AABIMV tang_y_Page_082.jp2
e0f4d5aa3997263488d32b4d2b74b3ed
e31a8cbafc134774344cfa08e0b78b6e56129424
6165 F20110404_AABINL tang_y_Page_039thm.jpg
f297324a7e69230191aa0a207d719320
3b6ba2daae3aae000bd9dcb6a56affe0efc665d5
489348 F20110404_AABIMW tang_y_Page_083.jp2
50ce0248049d1a4466479e3cdf7e57e3
fe05d1eab12b22b662c4b7db70729c658280c365
32380 F20110404_AABIOA tang_y_Page_014.QC.jpg
c904909e1fe36152e7c2f550d6e90a0c
d20db50bce1a86bb4a349db6855ddafab47a3cee
1687 F20110404_AABINM tang_y_Page_007thm.jpg
35079cc87559133a6cd169723cb689df
2fa19e2783e07c6782c7514fb303f26ae6974adb
1051931 F20110404_AABIMX tang_y_Page_084.jp2
cd6d9b90823606adcaa752a08e2f14ad
203dac83c006b863fbde9282b7208db6143c3388
12255 F20110404_AABIOB tang_y_Page_111.QC.jpg
768a7c2e261c0ba03463ee5befee513c
da4151e207b8c9fa38b2d37c0a6826d0ad8333c8
7400 F20110404_AABINN tang_y_Page_045thm.jpg
be458b56c2875feac1dd09d232a4042c
0ea30e4c9a6830281977b337369bad10417c2e02
486320 F20110404_AABIMY tang_y_Page_085.jp2
89eb6cc2bd018098690932bb3e3a423e
109baee3a69490694cb56201ac988a0aeaa2d2e3
7578 F20110404_AABIOC tang_y_Page_104thm.jpg
0961483ce9ef38250d0038eb4a03eb70
50e232dfaa09a2c12526842f2fdc057559f55b5c
3456 F20110404_AABINO tang_y_Page_055thm.jpg
0f04fa428d1c09ee8b0bea3679e45c77
ab35b5ad9068720eb90dd692c49908cb62cb354b
517242 F20110404_AABIMZ tang_y_Page_087.jp2
fbdf1a8d83cbddae3946e9c24bb490f3
c1cbd5afcec0cd701ffe634538deb8cd878175ef
6412 F20110404_AABIOD tang_y_Page_038thm.jpg
cee212f9522da7c75c7587ae7235f94b
c96caf81c157a89aad3345cf0946dd2329ff3fa7
7644 F20110404_AABINP tang_y_Page_084thm.jpg
594ef9cb5e3c4fffecdce29a2e3cc73b
63a6d62188d13e3e1e3c8a0d531214899d75b8cd
22322 F20110404_AABIOE tang_y_Page_065.QC.jpg
e62152f3a2aeb27327d2d78f4fd878a9
4ece52abd15b085337d18141ecba8b157d998c0b
17412 F20110404_AABINQ tang_y_Page_087.QC.jpg
6f173d563eaee97117ab4ccd04084d8d
f51a7f584847fa9a3ef12a94fa3a517a2e230c32
28080 F20110404_AABIOF tang_y_Page_067.QC.jpg
259707794c12306c2234d58e7e1ca3c3
b597ec235338f36a5edb6d3121ee0046a6fec721
5900 F20110404_AABINR tang_y_Page_004thm.jpg
6feebcc2113077000c4b32b24cfd6ec2
fc40dbac519eb6dc5bd26de9dda468f51a8220f4
28652 F20110404_AABIOG tang_y_Page_034.QC.jpg
6a577c7e6663e886799dc0d3de20613c
14411dd5f9e8daf65e2f6b232e5cc86709ab0b2b
6980 F20110404_AABINS tang_y_Page_042thm.jpg
f1b748a2f06126bd19f1d947ee3f01a7
3fdf3bdfc4f6e17aaa9c93152c756575c9e0dbef
177962 F20110404_AABIOH UFE0013742_00001.xml FULL
8e5f575774a60a5c2b988cfa88d50841
3aa391f63bf5c5933bacb341f1bcc01a11ca5381
1232 F20110404_AABIOI tang_y_Page_002.QC.jpg
f3efa786c1200e3f18400023e50515ca
b2c334ec447ea1e3323ba79bd2fc52d7ccee4b54
30391 F20110404_AABINT tang_y_Page_030.QC.jpg
598f92ed422e544047dad0bbda436b6d
127e4d58bca1d36c7a0498b3018249d0a95119a0
25133 F20110404_AABIOJ tang_y_Page_004.QC.jpg
faac0e332713ab98dd6ac6eab52fdac5
75e0f47040181364c2d20b6766f5996e5159a65c
30366 F20110404_AABINU tang_y_Page_023.QC.jpg
fc6caad7ae8dbe8999f5c3c4bc485741
4481c4a43dad093ff03c00d9f3dc1c441d6ec7a5
30245 F20110404_AABIOK tang_y_Page_006.QC.jpg
f828e4533a8b8f6973ac3afc13aca381
1c3940a87b64090e5d72f8cb7d972eec9c0d1be9
529 F20110404_AABINV tang_y_Page_002thm.jpg
a7899758165d7e320f600377731b57dd
e680403cce063198b51245690e6c4e96c382234a
6819 F20110404_AABIOL tang_y_Page_007.QC.jpg
ab473622a04dd3121020e7bb3d3084d5
8690c0a25fd831e8cb7855e529db881cc84604d9
3965 F20110404_AABINW tang_y_Page_088thm.jpg
0a32558b9c660e50f4f9abaa7e687c48
654899822f121329175eb2be20e8c398727e35b3
23755 F20110404_AABIOM tang_y_Page_010.QC.jpg
bf1834f221a3cb4f1ff180e9a26ba657
324fcc15e37ccc7ce18aebd12b6a4895bd1c45ec
2699 F20110404_AABINX tang_y_Page_102.QC.jpg
8ecba33a112bc10f44833ab02044f8ab
5bbb65e61ac7585bf66b2dfdac99b49de22f97b7
31049 F20110404_AABIPA tang_y_Page_062.QC.jpg
43f5212718fbbcdc0a1accd77965a861
49ac01c680bbb25137564d2901cb86705d835904
27001 F20110404_AABION tang_y_Page_011.QC.jpg
930cf2b7ff2267fde4e9a4adc1cf70cc
0f263c12a0f19255b91f8c08a179aefc4622bcad
28917 F20110404_AABINY tang_y_Page_048.QC.jpg
447af813d74494a87857797a24988d61
99e39fbff792320c455a462a5055d57516bcd314
28196 F20110404_AABIPB tang_y_Page_069.QC.jpg
9e681cdd78d1de2c34a7b3b0ebc4a920
41e437b2b9e65aadc8536d940aaabf520ed3b017
31257 F20110404_AABIOO tang_y_Page_015.QC.jpg
03ff82c60c2969c171c90de32e496ab3
f988affc72100fa1427ceae0bfb2d44ddf195f89
27122 F20110404_AABINZ tang_y_Page_026.QC.jpg
a27a6ff4e2e4727805ade26157bffe50
8776147a9a985da4e5072ba4088ae59084f0bebc
31072 F20110404_AABIPC tang_y_Page_070.QC.jpg
b17fcaccbcf5ed41eb4b1c1f1abfd3fe
a94aa7463b0391d7724a3bb3e2054d28f5a85a7e
33293 F20110404_AABIOP tang_y_Page_018.QC.jpg
d19f71d1a99a067334cac38eaac58de1
7679a85dbb4c6ca2c9ff3ec934c7b908886b6d5f
33649 F20110404_AABIPD tang_y_Page_072.QC.jpg
769e15844b17bc7ba65c94796c2bfe1c
0c980a023423b1ba2f5a7715b11a92d6b302e528
33683 F20110404_AABIOQ tang_y_Page_019.QC.jpg
bf35fc5b0b1ebc96d00ba1573323e98c
2d17209d4b385cfd663a3b80483508fbd35b8cb5
32141 F20110404_AABIPE tang_y_Page_073.QC.jpg
a8ac03b994d8e8af7bdf1a7cf52ad221
c48c50e3f4e355d6e239d006c797b49a2dfd59c8
26197 F20110404_AABIOR tang_y_Page_027.QC.jpg
2f178321fe19e53356fb0e68d451d49c
461ccbc3b41f4becc9427de57d1691d6d1bf32d7
26295 F20110404_AABIPF tang_y_Page_079.QC.jpg
99b8015bcba4f19290aec768773a8291
0675c9661454d40d24b2fa483dabdb66e8f0a6af
31328 F20110404_AABIOS tang_y_Page_029.QC.jpg
410685a64b71890438b32c265bf5d5ff
d2af42dab669e42ec07d503743fc7010bff3b4ec
26836 F20110404_AABIPG tang_y_Page_080.QC.jpg
1d4ac7661700554be31f82fece8db61c
7aa283a03e1c263bee2736024a4d3b09d8fb431f
27289 F20110404_AABIOT tang_y_Page_035.QC.jpg
69b42ae79d94076770583edbb3cea686
5105380cc22062536a429c89265477b717c46bdc
26917 F20110404_AABIPH tang_y_Page_081.QC.jpg
b5bbd3830bd98a0878b5ca84df7d3b2a
fa484421c4c3103fe5e4bbf4095131928422d4f6
15570 F20110404_AABIPI tang_y_Page_082.QC.jpg
78ac2133f457ea058a2defdc841429c7
42cd16014cc99137dfd587a0649754a17d720787
30784 F20110404_AABIOU tang_y_Page_036.QC.jpg
2437f622ac1c8ee8367186c45b377ddf
824cf7f4512add304efe5a25bbca81b394b502dc
15594 F20110404_AABIPJ tang_y_Page_083.QC.jpg
9deb6ab830565eaec824f9d6be8433cf
769639fb2aea33adf8f72943ee55e434a773cf95
26100 F20110404_AABIOV tang_y_Page_039.QC.jpg
d23628b541807034b9bc8c35002ed231
7bdd993a6998cf85682f7b3610ce8ace2a9d4d30
32181 F20110404_AABIPK tang_y_Page_084.QC.jpg
8ae9f52bb03f24ead29522c87e594841
9796995830c3a4056de01181188e72dad5394064
32426 F20110404_AABIOW tang_y_Page_043.QC.jpg
ddcd11ab2fc446004005ad4d37231011
c2138e746cd125af31ad7d74cd08ee2f9183bc6b
14293 F20110404_AABIPL tang_y_Page_085.QC.jpg
ca75e8b2f1a735239d1a8d40ee8d82b1
088c28c00beef70f651af106d679f58c21b904a8
32357 F20110404_AABIOX tang_y_Page_045.QC.jpg
17f0170cc257297c0a0117fcd753fb52
561c80c2572eda471dd1ade45c59657e74713550
5950 F20110404_AABIQA tang_y_Page_031thm.jpg
6fc84be98458a7120a4236778321b70c
e2fe73c2f1ebc7816f21b1b48c15f8dcfca9e080
27945 F20110404_AABIPM tang_y_Page_089.QC.jpg
c73376511bcb242c6b52ba0fea8689dc
b6ef8e7951cb15bd2ffde6b5939b2baae883bf56
20385 F20110404_AABIOY tang_y_Page_056.QC.jpg
7f77b61b481c157232fa5c867ea46aaf
4df4482b758b5cb4688b956fe35639dcc266acbf
7300 F20110404_AABIQB tang_y_Page_036thm.jpg
3f288a8d9c967acef5dc4d0fa8d12451
8621a3cbd9c043dcf29495a30cccf4d777fd2bd7
34425 F20110404_AABIPN tang_y_Page_090.QC.jpg
52dbb84073d34a7f6d3808b136215904
02b3a492a8982fcbf003288bb5932113861ae1a9
12810 F20110404_AABIOZ tang_y_Page_058.QC.jpg
8165db3f11090307a747c36a25b893bb
0d28d87ee8aff15dae2597de9df59f77ce1dae38
7180 F20110404_AABHNA tang_y_Page_041thm.jpg
7b95d9309de2260813d48c1466d876f0
c1332981eeef50d549c7206e77ff398989cb87df
6647 F20110404_AABIQC tang_y_Page_047thm.jpg
c9cf8909f5438c65880d682d842063ce
be294c91e9a62f41cc07948911e592720614915b
32519 F20110404_AABIPO tang_y_Page_091.QC.jpg
58aa913d0d1366874cafc1227da3e440
f12308832b65f037defa84b274788613eb4ef701
28148 F20110404_AABHNB tang_y_Page_086.QC.jpg
a434c6c853002909c602533a6186a75a
1e5631cba6157fa3b869253bd4eb5b21f998f800
6515 F20110404_AABIQD tang_y_Page_048thm.jpg
f62d2128c9ea839bfc6776ed6491243f
0db3450bebbe5543d927243ac44d93659c409c46
21584 F20110404_AABIPP tang_y_Page_093.QC.jpg
fff1ec3904cd36264efb0e6e4fd7c86b
ab3625508ab93e661849679a8a4df3b21a280ef2
F20110404_AABHNC tang_y_Page_045.jp2
f1868153562d7258ea8e4cf0167bd3f1
2a0574cc44f950496f915bb6d711d1ca61ed9642
7477 F20110404_AABIQE tang_y_Page_049thm.jpg
603aee6a2c58c0fc47db26e77335cfc3
26b4580043e4737e1d058f8f12358f00dfce281a
17943 F20110404_AABIPQ tang_y_Page_096.QC.jpg
a84a0cc41c64ec56563d8bd354c03c0b
143deed48786c818c23afae7902fb82d31b48c06
8423998 F20110404_AABHND tang_y_Page_093.tif
64c8e742d5b8aebbb0cbf007004deb6d
f8fcb449a73ca67fd8d85ec13bfc29a2fdcc8af2
7508 F20110404_AABIQF tang_y_Page_052thm.jpg
b1ebbe4e8c5aa9d2dda3a37273f4cb68
2288566e775b1d16ef778ba1d0e624980c824d60
20752 F20110404_AABIPR tang_y_Page_098.QC.jpg
0648e64bc005edd662fdf64404a73ca7
06395b17887d1271bcf4e579b999cb33b440b33d
6356 F20110404_AABHNE tang_y_Page_012thm.jpg
01d988abfc08e5f13731f66866be28db
3a7d548b7fb11ca2a4ed09685d9b2bfd4224cb9f
6746 F20110404_AABIQG tang_y_Page_054thm.jpg
e62b8f2f4dd29dd72b34942678456d07
602f19e35e3138f272b87080ac73cb258e92b525
31066 F20110404_AABIPS tang_y_Page_101.QC.jpg
f5a2efa168dc4a00bfe5a143080502eb
805ee6ee92e324de73061b6a8b295eeb7a0e726b
99024 F20110404_AABHNF tang_y_Page_062.jpg
170fa666209297a21e8896a19d7f3766
c52b3245188c5fd54c5319569afc4fd120c3a006
5611 F20110404_AABIQH tang_y_Page_056thm.jpg
b4b7056b6cb181fe1bd3d0ce3e563eda
e4c3d3206e8abceffe45c21801637b7a40b3d83a
28091 F20110404_AABIPT tang_y_Page_103.QC.jpg
5bd07045be85a8515fc7c0cc21c867bd
c5385e2cbf07af1651376ec943ab2d88543cf39d
7343 F20110404_AABHNG tang_y_Page_014thm.jpg
e49319b83dc39e0321e5b1138dc5b30e
3c5b0974d10f34c82a290e6d5ba91c9acb2e6e7a
7276 F20110404_AABIQI tang_y_Page_060thm.jpg
7c586aacefc1ee41a4b06a8ffce4a571
693b8c5b9efd211d1670375a56b367411a399c90
34042 F20110404_AABIPU tang_y_Page_107.QC.jpg
dfb548b655415442045a8c5b5b048492
71bc433b3572e42cf47380df39dd0cb06c04e057
673883 F20110404_AABHNH tang_y_Page_098.jp2
f724a934a319407ee5ad6956340a6e7d
49816d8c816af65f7042cb5446d07dde5d9e196c
7189 F20110404_AABIQJ tang_y_Page_062thm.jpg
e990ca5214274feb910547eec641ad5a
34bd8209236dcc3c66b1d2e9f3a383156a233eef
49830 F20110404_AABHNI tang_y_Page_008.pro
e2282dd37a1ee92bce169540c88d61c2
409488bdc9b91f94e5a78e26f43b6895897a69be
F20110404_AABIQK tang_y_Page_068thm.jpg
7a18b99668e12825f28c0686de247a17
79128f372d72fc297b0a211d8323d566e4fafd7e
35121 F20110404_AABIPV tang_y_Page_108.QC.jpg
6b6560ab4fadf2efc0ac183388af772d
d4aac1e347cc847d333c622f9cafe727970f8e52
41717 F20110404_AABHNJ tang_y_Page_020.pro
58548ac6bc2cb46ff94622eee1c7e705
7c321b5a7aa454fc048be0b3bb96c24514e0d2a0
6740 F20110404_AABIQL tang_y_Page_069thm.jpg
ba780dd58f322c2b916c3741062bdf81
bc70dc143184be8cf66bd5e62544534dcf8ae81c
625 F20110404_AABIPW tang_y_Page_003thm.jpg
2541fa6af820d60948da66fdbe04cd05
7f533e3261c7f22611b584b2e706b65897281190
8096 F20110404_AABIRA tang_y_Page_109thm.jpg
1069381c38ec101919d99eba97cea854
6b349e0666980c375ea1094237a78320502d7888
F20110404_AABHNK tang_y_Page_069.tif
0bcc343242c20b3c76ccd5cf0c875dea
94da09fd193127a8f55a9fa0fb61e3b110ea54dd
6720 F20110404_AABIQM tang_y_Page_075thm.jpg
7588449e357fa9666a6a445c50556083
7bb48fb2ab8075d02d677e258b1f2780dbeb895d
7369 F20110404_AABIPX tang_y_Page_015thm.jpg
dd7293383222a12a560c8da4093ec442
ccdcec9698254b26f683576a05b96cdf6dcb9853
1051969 F20110404_AABHNL tang_y_Page_043.jp2
08cbd31285625dc9f81a3a3836f6ee72
4d3af82aa6ac9cbe079b93c66dfbd8de7cd53bb3
4618 F20110404_AABIQN tang_y_Page_083thm.jpg
6274f590db47b2bf9e8f3f15e4037ec5
d769892850eb43f99bb9d00924d893a5f08109f5
6072 F20110404_AABIPY tang_y_Page_020thm.jpg
149996b0155404c8635b466216a8d8e0
28d140f77f0dcc234b126b9678accefa45f44ea6
18491 F20110404_AABHOA tang_y_Page_095.QC.jpg
5a403d5344d86b616fa59f7a6cf573c4
1b8a00c0f91718061f3fb1cd2350f056752ea4d0
90621 F20110404_AABHNM tang_y_Page_077.jpg
853eb54e4cefca928a07638b8e9145de
3f073ace9440dc8dfdde3849242210eff66245a5
4398 F20110404_AABIQO tang_y_Page_085thm.jpg
bd867dbce6f8cf07ba8d945bec21bbf6
705d3c162e1b60d55aa0b86a1ecb22c1b3b44ca8
7459 F20110404_AABIPZ tang_y_Page_029thm.jpg
41c8c47f9cc5994ca8c5dc6bd6e17c57
18768ba0fe48c1ddd726f7648443285fd774b678
77174 F20110404_AABHNN tang_y_Page_010.jpg
7103ac18406fd03ef7a33d62a25a2fb4
d54b3d294c1d71ba6cce873c0955f48bbf5205c2
6866 F20110404_AABIQP tang_y_Page_086thm.jpg
ed850f248997d50315fb171a28bea7f2
1c913aa7ebcfaaf00fed1b8b2d579daba444708b
101872 F20110404_AABHOB tang_y_Page_049.jpg
77fe7c49454446a659d8bd4dc96f262a
bef47856c3c8b2488cc684acf9d7c49df001e4cc
5339 F20110404_AABHNO tang_y_Page_008thm.jpg
7220452fa1c2eea41680461745e4ff8d
07043e0a8e808b096c104f289e524bdbf2bec50b
5351 F20110404_AABIQQ tang_y_Page_087thm.jpg
fdad3d46d0635b1f0fbbef10cd39b718
3da749c3709df2e6da2bf1518885bc3080e932e8
2923 F20110404_AABHMZ tang_y_Page_081.txt
aa0ffbdb90c32c3f29daba9fa1abbcb5
cd269c0d51fff66509fe678894a26ec24e7ac021
1126 F20110404_AABHOC tang_y_Page_085.txt
aa1c8503d8af0ee7a0c7235b703ea6f6
ac76f2a07f9307c01e32b3fa028d5178051194c6
1051920 F20110404_AABHNP tang_y_Page_014.jp2
94380f8e4baf484032a07819ff3ff034
2b23712f9eaf794f7b87a1a8e3df1fc97ae717f5
7398 F20110404_AABIQR tang_y_Page_091thm.jpg
f2615625ac91ed50eb7244baa38a5b75
628bbbbd3a999fdfa5d34174ba4b7b26bc0b83dd
1051971 F20110404_AABHOD tang_y_Page_029.jp2
50be0253e153c25415aa55dae2e93a6a
c4a763faab2c63c65a9b875b1fb6025517c027fe
7279 F20110404_AABHNQ tang_y_Page_043thm.jpg
22c3fd5a581bcbc8a7b3c5c32fb7493b
cfcafe275d0e157edb7fe5f08db6b8d8a8e1d2f2
4782 F20110404_AABIQS tang_y_Page_095thm.jpg
aca4e24a2c28df173dc9bdad9a9cb58c
d7b5fa856825e15bf1b6ba6f82b5e4d61567afe4
100477 F20110404_AABHOE tang_y_Page_101.jpg
e53bdc507eefa4695fbdcfdc4a40201c
ef04b543d587ed1b2dfc9ce220ff9cea65d8c480
47145 F20110404_AABHNR tang_y_Page_022.pro
00d677d3f6794a894f779352a8ae5e1d
edf56ff2e64b33afae9fcfac6d2c291e55398878
5379 F20110404_AABIQT tang_y_Page_098thm.jpg
b8d041d44bdb06dc75b876987904c43c
fcaed6c86c6195bc162edb8a6e2dab1c12d5ddee
1968 F20110404_AABHOF tang_y_Page_067.txt
6fdb74e56fccf8256e80896aeb334715
3512886cd0b8f8685382701a13d62d5ce9b09bf1
30204 F20110404_AABHNS tang_y_Page_100.QC.jpg
29f08ff7f03536e5f319aeac1bc7facc
8f5a0f287f124c42f00f99d302da7c5e64b62f6e
5973 F20110404_AABIQU tang_y_Page_099thm.jpg
5ed4c21b392d2060089634e9eb61682e
dcf4646039cc12a3d559c906197ce18e07a8979b
11137 F20110404_AABHOG tang_y_Page_055.QC.jpg
6acdf4536ceac5128d497dbc8114cb39
dbfe6ed71830ef91c0788c9f8c15e403624d19f5
106040 F20110404_AABHNT tang_y_Page_072.jpg
ec261919e16dda5d4500b22b8364b4f8
eaed727afa21821bcdf9f2a33d5834f153c374d2
7119 F20110404_AABIQV tang_y_Page_101thm.jpg
196125cf7f004e30bdc8bbbdaacebc21
28f9a660721379dd164862ff862a8753df0ba5dc
26251 F20110404_AABHOH tang_y_Page_068.QC.jpg
fa1116e696430925a6b106599f2ba646
d70472bfe01e59e8d341b7e88084d2ed03a4c464
27365 F20110404_AABHOI tang_y_Page_054.QC.jpg
4a5176e14b80b64f78d968801709d134
e674519e94aa979d5f863a8a621931dca900edca
87285 F20110404_AABHNU tang_y_Page_080.jpg
281e2154b16d13f866613fdc983155b0
a5ac39df7ae06b4050a1f7cc2d15c45653744a58
846 F20110404_AABIQW tang_y_Page_102thm.jpg
f7df8611c89d811c4c7d92b1660267bb
39485a7e561d12453affd27f48d9d9d19279510e
2185 F20110404_AABHOJ tang_y_Page_059.txt
e20377477924c7c3a31802967188264a
428c5cd90d51926f88d761dba666107ad5b246d8
28022 F20110404_AABHNV tang_y_Page_044.QC.jpg
0a6497f1e21a809030a5be93d43ec923
ea63cb3296b87f9bffc5bff14c84ed693de9b994
3462 F20110404_AABIQX tang_y_Page_105thm.jpg
a8f75c2023407c99feeaa84b5f61e395
bc8b72d4ffaf7053c47342bc2cc2222dbc3a8b34
1051922 F20110404_AABHOK tang_y_Page_063.jp2
11c80fac40d721865f8102d2472ab18a
322c3c3b5c2becdede18442b3e5857645e1efed3
7019 F20110404_AABHNW tang_y_Page_023thm.jpg
7b3db2ae4fd3e800e5d9659a8f8c4b2e
c14aef3719c9691c438146599915c027f6a0d657
8076 F20110404_AABIQY tang_y_Page_107thm.jpg
e2095d620839653c56f3ef22464dd023
fd69c3429e6a35e0c31da2ec6783c0c0314a492c
26073 F20110404_AABHOL tang_y_Page_033.QC.jpg
78f623749dcd3a8b445fe75787a0249d
476c71daa8c81aecbdfc63393887f77e1701c8de
1051981 F20110404_AABHNX tang_y_Page_016.jp2
ea33aed80f1ea65e824144ee0f77daee
d54948aa59939e29c05b1bea08bc9f49bfed4dea
8156 F20110404_AABIQZ tang_y_Page_108thm.jpg
75c091ea00c08550652c2757c63041c8
11c8c13986923c6a81a2aaf9e3639fae8f414e27
F20110404_AABHPA tang_y_Page_064.jp2
faca51c567df36d07af3f5ba559820ca
4ff848da1d738a3939b17d99ac3af8020bef79ea
102333 F20110404_AABHOM tang_y_Page_084.jpg
01b6a9eaed4a7e8cb48e178f32dd61e1
8e44f0e24a474d5eac57dcac1b936f48a3d032bf
22751 F20110404_AABHNY tang_y_Page_001.jpg
786d5dc49fbdd7e92d00c76bd2755ff6
a7549f91ade62c8f374f4bc7d6dee5b460aefdf9
F20110404_AABHPB tang_y_Page_111.tif
1612e43baa559e4b43150d47bd807991
56fa8a65fd6d6b4c7aa7bcb503e24d72e1a83bbd
72948 F20110404_AABHON tang_y_Page_099.jpg
9c94670fefb10a42e343b20666947e94
df6a4299d7c05bd8f1179d33aa2458c58fffc421
1926 F20110404_AABHNZ tang_y_Page_035.txt
24495a4904389a2f5962838c5fd35695
3bc610ced15106d7aa9c9109632581bb83bdfa07
46027 F20110404_AABHPC tang_y_Page_053.pro
9ede16aa273ff401848123164ff60e3f
2ed65fa8a1c883de207052bfad31aefa61992548
21589 F20110404_AABHOO tang_y_Page_005.QC.jpg
699280de586231d6990cd49a9493bc73
c72ed2cfbfd7946b9aac1cd3ad8bca5067245444
44394 F20110404_AABHPD tang_y_Page_012.pro
c75af5eef1b7a2dd1553dc84214d873e
49c05e0cdd7ae7bd2fb697e1bf547987db9c52f7
657403 F20110404_AABHOP tang_y_Page_005.jp2
446f2c9bac604594a8ffa11a1a65b25e
781412e554f537b218c002a57d623a57e23de7a9
82269 F20110404_AABHPE tang_y_Page_097.jpg
4207a48a4d763d39e7b8689c0c9e52ad
67370114770f13621432e683dbc166cac5222fdd
38504 F20110404_AABHOQ tang_y_Page_027.pro
020e1aaf4270383efe1c44b3051d00b6
4d69124f931af98e968fe55bf58b18b06ee659b9
2537 F20110404_AABHPF tang_y_Page_109.txt
f258ec55645ccb3d3991107f598a94f9
c209c6158c4f666289c8d4cf5f6d2529b5020bc9
F20110404_AABHOR tang_y_Page_074.tif
603c28826e9487e62193bd9380055db8
1d46744bc3c221af8b64e3a0c079c5d0371f25f8
30675 F20110404_AABHPG tang_y_Page_098.pro
e4ae53b716b0ac80b0688c9f3952d4c3
116b112482c6a74f1733cedc21a728d189b14c71
23962 F20110404_AABHOS tang_y_Page_031.QC.jpg
a6ab5f7bd9ce771a4c48f6fbb015b5ba
a6ceda93ef14593fe11fbb77cd834475dc371238
34308 F20110404_AABHPH tang_y_Page_065.pro
2a196a3bc2bba649206b710f29b71924
d297619ad8bb187ea3f56cf5ec1ed535f24ef9bb
F20110404_AABHOT tang_y_Page_073.tif
977e6fad319e8fd6e2a80ff890eeb1d0
b8fd0243b7c91f239c09b9a8711cab34d73540ed
107103 F20110404_AABHPI tang_y_Page_016.jpg
88126919cf92791f49973ab85bb93556
7cf8c5b2525e72b8923badc6f497aa25bc2c9a64
973732 F20110404_AABHOU tang_y_Page_012.jp2
214046ce065f8408dd2d80590a68c2d3
a400ecf9a3ee9bd8ebbc36119111174af1c277fc
27984 F20110404_AABHPJ tang_y_Page_087.pro
7e76431ba908ef2cbdd6a8f13d9f75f1
e8ce7d86c6eb040648bd05017ed2f92a7b115825
949391 F20110404_AABHPK tang_y_Page_054.jp2
571557e5ee703ae8e030fa239e53158d
8f9d14cf6140abf690b1ca343a5662b30fd3cfb1
32832 F20110404_AABHOV tang_y_Page_057.QC.jpg
c27b9a2d2037aac2e7681dd196f60a7c
c0f0a0f2b5056b5688e285e2f971e1b9ba88c4b3
76081 F20110404_AABHPL tang_y_Page_032.jpg
83c82413c0654bf31c99a9e154fae19e
8ab365d25087737446ea2e88cc78517af6bc6f87
31084 F20110404_AABHOW tang_y_Page_061.QC.jpg
7565bbb6e31677a14f366b570ccef418
a276555cc01736e8a9cf1c13b526e906461c0189
891878 F20110404_AABHQA tang_y_Page_033.jp2
2ac14d8d94423f799a5bb00fa7b7ef52
6bbe3455a560cb272cc2a8e46a7794b6cdec66b6
4466 F20110404_AABHPM tang_y_Page_003.jpg
93452be4202175e752958c156a850d17
8dd37c087a5ca277de2c9b2df42de0a6580043e5
4300 F20110404_AABHOX tang_y_Page_005thm.jpg
42301e91eab103a9697731151e6d4245
1694dc018beaed4ad0a161b10eeac7da578bb7ce
F20110404_AABHQB tang_y_Page_097.tif
530414444ca367dfe67b5aeb012f2ca7
d3d54aa667b2f976c8f690448b14a572ad2e03ca
661895 F20110404_AABHPN tang_y_Page_056.jp2
0f7397dff42225eadd1eb3612ccdee88
06368ba4a5f61efbec8136415dd15c1f48ea6d14
1051954 F20110404_AABHOY tang_y_Page_015.jp2
3f4dd7b66cdb9b27ab94e2953877e3e5
107353ed1be711e202c8f2fb3c18e4f44fd652fe
7262 F20110404_AABHQC tang_y_Page_070thm.jpg
b6a689cbb6291a48d4de59ad240bd4ad
cd3e20098b8d119f131c5c6bc12f61c41272534d
2053 F20110404_AABHPO tang_y_Page_091.txt
70a308beed52bece3c74e4caf923be59
f318cb45d99370178f74f492e1b3c61507bc8881
81292 F20110404_AABHOZ tang_y_Page_025.jpg
771cf41b3962941c94a15e9dda907f09
30e42df0d0208ca9f3c2a463799bbc2a3eada892
16934 F20110404_AABHQD tang_y_Page_088.QC.jpg
d8a1a7545e6f67100a820898ef5b8340
2a0e86f659934bb2265ba767698abe1c7dca15c1
F20110404_AABHPP tang_y_Page_074.jp2
6e3b8bfb66f4c4166a77cdf5485a2a4d
ed2a4c0ff7c62f96f3ad63bb88fdd95a36a37dfb
2090 F20110404_AABHQE tang_y_Page_080.txt
b7d0e5abf469b04597bd020af4fdf518
18807d99fbf82e52d3e2238c941bd147da568f45
F20110404_AABHPQ tang_y_Page_080.tif
b9ef6f17b4aeec04137b5b6760b00ad7
85b0775a2a7d1ef486b616d2421b85e0099017f5
55460 F20110404_AABHQF tang_y_Page_059.pro
27106241718b53674596ea26b9cd11b2
83f3f3ea04d30ab9e127919ae92fa6777d21d598
1051968 F20110404_AABHPR tang_y_Page_107.jp2
d36bebf7202d5ebafcfb690ab1cc17ba
d8849ba759c35874fa0ddc771074e3697f09bc7e
2161 F20110404_AABHQG tang_y_Page_086.txt
efcd98a08c176d1260f8be43ad1edf31
6ee4efc90c7f58ef06151d24241d08738b7e6834
407869 F20110404_AABHPS tang_y_Page_111.jp2
e19fa996f0256c9b232e3e579fe7a957
fd6ff70a0787be4c1b16898197ee783a1d4d523d
29423 F20110404_AABHQH tang_y_Page_042.QC.jpg
163f47722cff318fb90b3d3c6b8487cb
5245fc35de5343e200d44c6341ed9d40c4f0fd07
49837 F20110404_AABHPT tang_y_Page_071.pro
d9b3db7afb867fe193fa937e5983a3a0
c8e86031448cd31e66a8337848848b070d8ed449
11613 F20110404_AABHQI tang_y_Page_007.pro
c0604e6d6f282c8a5e2514b537459066
79b43a8d19d427255de883a65f0d20782c3934b2
39067 F20110404_AABHPU tang_y_Page_080.pro
d47f28fa5d55edf1dde07df9d133a43c
fd0df99328e4b66c2a560c69f4acfd6dcc27d88c
28147 F20110404_AABHQJ tang_y_Page_022.QC.jpg
b2af8c543512883b0b5e5fa725da4cfa
9fa2875597483ebb57d83bd178493607eb586a26
528327 F20110404_AABHPV tang_y_Page_105.jp2
222474b8d37b48047dd0bc3acf3f9791
ce9cceca2c4250dc6eac4ec96f850be7e2e94c92
F20110404_AABHQK tang_y_Page_055.tif
a0aeed217359771e2fd5e34dc229de09
0f9c61f00a1ead7e334aded9caa03c2967140adc
27218 F20110404_AABHQL tang_y_Page_092.QC.jpg
9b79846d155b6083c1ddb67d5f37f425
c2cba3132d8b72ba9cd9720694b93a5ea0a39003
95841 F20110404_AABHPW tang_y_Page_023.jpg
7c8bd1f011a0a5eb275461d68c3869cf
c1e96e139a4a5c6c825b08d1b0df871b17ad4f74
7156 F20110404_AABHQM tang_y_Page_071thm.jpg
e7e2ae4fd0dcf50b0764bc75e0ecf84c
8fa456d512bed2a21976ae6072e31daebdf61a04
22018 F20110404_AABHPX tang_y_Page_007.jpg
7f878315ca6c3df33ff8b93d2d982d45
1c8d5bedb6ae644150eb26b3ae7651f925df19de
2115 F20110404_AABHRA tang_y_Page_064.txt
a3843677ab649ada179b2dddca6421a9
96c1fb47d255be6072980b94ab510a2025b738f0
7669 F20110404_AABHQN tang_y_Page_074thm.jpg
2c11fcfad72352d7655ea4a88db3de81
350baebc2cf7876497c262278964e9eeb7f7d04b
2050 F20110404_AABHPY tang_y_Page_050.txt
129fc41b90b6be6ae303577fa8a774b2
e6a972b7060f42f369b474c943ebf3e771404f7e
6838 F20110404_AABHRB tang_y_Page_081thm.jpg
628a3bbafe9fc61365cefb4e638f707a
b933cca7c3d59f37291d4f955e9ea489085a3e96
90569 F20110404_AABHQO tang_y_Page_069.jpg
6f0ab473f381e5789b42ab48c70f0a70
32049fc2852e9711e2d0be0fc8d9203dec9c4d9a
F20110404_AABHPZ tang_y_Page_109.tif
56de4525ec4008d0ed0381f24cc1a041
f1f79a3fd8755fab385a96811fc0e253ba4ce623
101742 F20110404_AABHRC tang_y_Page_073.jpg
7293cf2d7b65632136b4f5fc0a2c6699
0c42a385d71857cfd2c1d69ca88a6f1eb1fa5381
54867 F20110404_AABHQP tang_y_Page_088.jpg
e605f442635d23114de8bb1804637e3c
d718cf8d473f59fb967d6782cf44fa1d69747ed4
103090 F20110404_AABHRD tang_y_Page_043.jpg
4e554c42458512205443f7de1d36a073
6e3d02b2748674b5309605b7a8fbb962c9382c27
38637 F20110404_AABHQQ tang_y_Page_010.pro
c27177c7de2821cc87e559a44d5acdc2
219c442e7993cd0638b61cbd12c89ffd010851d9
F20110404_AABHRE tang_y_Page_037.jp2
63192b4dcda2a8bed4e4025365b3dcf5
509f305948ec731bc7f49abba6d31124dd08893b
F20110404_AABHQR tang_y_Page_085.tif
bdd0e130d1dc510b053322601041c3cb
a85abcd380c2495203269334b16eeded8b62618b
95980 F20110404_AABHRF tang_y_Page_036.jpg
2357abf39701368ee3d3063ea425560d
0f0b953f5577c58665fdc44cfb8f425a88d30eff
52664 F20110404_AABHQS tang_y_Page_015.pro
8e1512d7accade525f6acbbc77fdf8aa
38a1ccc319acfbb31935e15820f34d1ccf12c103
7551 F20110404_AABHRG tang_y_Page_102.jpg
8eb2a75459440a3abbdca7869551b0f8
34d937cb44aa5c97b89e216c1fdb0aef94e3ba9e
1971 F20110404_AABHQT tang_y_Page_053.txt
c95a79622314389d82475fc33896f3a9
2b6eb25aa2ccab2904413cba80ede14c1f7c269a
971449 F20110404_AABHRH tang_y_Page_044.jp2
e35d35302aee8804ee294323b85bcf2e
7721005e49ad34ea8c1485a73fbe40417c0c35d2
F20110404_AABHQU tang_y_Page_025.tif
96a0d7de903712baf31016149f24ab0b
a4c69d85c11009124dc9fe7711bb6566dbdac1f3
1651 F20110404_AABHRI tang_y_Page_095.txt
9aaaf56a48f79684ae858292ec23b419
6c523b57ae07f412674f9e28b45f9848772d2108
7451 F20110404_AABHQV tang_y_Page_019thm.jpg
9724663e57252b8e14e26ff2e59e503d
c0b016c77f9abcfa62059e594b7eaf1857f28c4c
26710 F20110404_AABHRJ tang_y_Page_078.QC.jpg
8bb926e18e46f166eabc4b45bed4163c
7bd42056441c0d8d708acfffa14ac3c4bb2c76b8
F20110404_AABHQW tang_y_Page_087.tif
dc0b409bf91826da9f0a0eaed49cb630
6fa3e61229f1fef951c031dfb80ebdffa71322d6
44656 F20110404_AABHRK tang_y_Page_110.jpg
416293633f23841382898545e17c43b8
ce5712476b2bf38cf9d17b2d3b3065613c1e60ec
636051 F20110404_AABHRL tang_y_Page_095.jp2
3e49d040cb7802effbe2645c7c94559b
c249a6e636cd061cfa7db1cec0a7a38e0c10b450
7141 F20110404_AABHQX tang_y_Page_063thm.jpg
84806ccb56e6b977942476d1c5e38d9a
6b1cf5024983653a6485b96b8deb64ce41263f94
67509 F20110404_AABHSA tang_y_Page_094.jpg
bb373128c6b9c831ac58a996920d5fae
87097783c3753ab583641d890959d7fb129a5e8d
1681 F20110404_AABHRM tang_y_Page_011.txt
f0c8880e875288009f9944bde5a2c037
e45fa6e4d94f6e5bb1bbeda6373fb3a14e4f24ed
2006 F20110404_AABHQY tang_y_Page_030.txt
2a19d41666b9bf6942cf7a030b113aea
9ac87dd157a2fb47524910f814a6606870d1da19
102312 F20110404_AABHSB tang_y_Page_066.jpg
5668148bf12e7aee26f9742f821e50b1
ad943ef122f83117818274da1add7879e0e650d6
28612 F20110404_AABHRN tang_y_Page_075.QC.jpg
38bda724f48b4f75dc03c43ac964eea9
3f4bae7daabe5205272abd2f3ac7588dbdf97150
1051980 F20110404_AABHSC tang_y_Page_018.jp2
8c9675d01c875696c2762825c936613e
ed2d806a773b7523adc1d5010ee423783034561a
32279 F20110404_AABHRO tang_y_Page_024.QC.jpg
abe2f0f4071e0167cf23feab7afc6da0
87910d363cce1feb068e284bbc73ba18e096f9f8
30137 F20110404_AABHQZ tang_y_Page_071.QC.jpg
21e9f778f1dd9ab221d85fb4af3e11b5
8caf7c890ca1be2597e341e713fd9156db7a7588
6911 F20110404_AABHSD tang_y_Page_001.pro
a368da40f0f01b6a8e4a1595599c3e23
4e69f5ebe787158c984ba5410add3801729e1d47
6296 F20110404_AABHRP tang_y_Page_001.QC.jpg
7a6418dbde5228a5084bedba4735c791
bab0375b490a2a0ee838fc67b2c48cad7af8eb21
6659 F20110404_AABHSE tang_y_Page_026thm.jpg
61619eed3d050bef6b61a9cc968a0419
627fc9e8c94efafd1fcfa0640c95ad62d411dc4f
978208 F20110404_AABHRQ tang_y_Page_077.jp2
2310d9aa2e1c295ab4824a5ca62c3afc
dadd75d0d84f43ee261f92e10ed244044529f29b
1831 F20110404_AABHSF tang_y_Page_026.txt
dfd9b1f09efbc3e167b3168bd0ff8d48
220b137d3167e2fb69486c90cd24b910b24e8d40
31539 F20110404_AABHRR tang_y_Page_063.QC.jpg
4d00eb333751aa014d51d9cea469f9cd
e993f880deb07f63d3e0c1fe2c9c7703de7ccc41
83814 F20110404_AABHSG tang_y_Page_027.jpg
c7c642d69e3680deea5ab01446a070fd
448b044b127629123b0699196d850f1b8fc1b3f0
F20110404_AABHRS tang_y_Page_072.jp2
dd1e632b9d25584f8170775829752cc5
45a9487c611c550b48e7d07dd7a98e9ed301735f
6044 F20110404_AABHSH tang_y_Page_032thm.jpg
f3e4757b8d87d0bcb55122240cb7b2c2
e15602839eae733831611446a48eb49fe7d6f703
10535 F20110404_AABHRT tang_y_Page_009.QC.jpg
0d64e163a9e581409e061be5b82b62b7
2058790422035669ddc036638ed53c6db58f6a40
6455 F20110404_AABHSI tang_y_Page_092thm.jpg
2e75788ec49b859c60763553f51f6bc3
31e7d8eef33f595ca5e0b1c92183628d72faee53
202427 F20110404_AABHRU tang_y_Page_001.jp2
d06c36cb377dea9eb7121aa5b1647020
3303841034dd12dbf536bd3d96c5c302ee95eebd
50228 F20110404_AABHSJ tang_y_Page_064.pro
55ec394fe65164b6d17336cd9907c49c
c59a813c29431fa0bbb19967b3f82a751404cd85
26888 F20110404_AABHRV tang_y_Page_025.QC.jpg
fdcb0ae7108522657ae9cc837a9adb25
2eebc448a6c303972384ecfdf974c41b22771e6e
7392 F20110404_AABHSK tang_y_Page_013thm.jpg
b9d22c20f020a09760df5acd7ab7e742
62d08fa58643a970e598a01ed4b62e098e3faff6
2010 F20110404_AABHRW tang_y_Page_094.txt
18361c79b7fe91c29d3ba696b53ffb8f
2f971e72c9fc48fc1bcec02a050de2eac72a63e0
1627 F20110404_AABHSL tang_y_Page_032.txt
74edce1a99d497b94cc567ebcdf5fe39
0656e2d5bf643be30eca8944948af5fb58a863fb
51153 F20110404_AABHRX tang_y_Page_070.pro
1120094698cd89b5fae0395244a5e7b4
adf9eef50836e2c96b26a866cb8f426287bd048f
26376 F20110404_AABHSM tang_y_Page_020.QC.jpg
873bcbdb58df5b3caf596ee282cec7f2
ec8832bf082e3795483e9fa11287206db04bef89
104189 F20110404_AABHTA tang_y_Page_014.jpg
fa8f2548d4ddeb58eda7b83017f73daa
71062af720f16aedfae052efbf7c36c862f670f5
34451 F20110404_AABHSN tang_y_Page_094.pro
3f5c2f5b14885a90780303a610460e85
abdc9357396dae564de6df1d33040809c282cf9f
F20110404_AABHRY tang_y_Page_037.tif
b6e2205689610fe04f25ec271c7d4add
a8831637a7ade139bb0f0b983d1b47c35be795b6
54048 F20110404_AABHTB tang_y_Page_024.pro
4ff6ef85870a8479b1d8bf3ddb0e81c7
3c8102d02c7b3184a8228bf8de6013e92ae5529c
F20110404_AABHSO tang_y_Page_012.tif
27c6bbced5092fd271602baa8a7e86d9
a38d2336ff51841ec9810f71e6158c1f181bc56e
96529 F20110404_AABHRZ tang_y_Page_041.jpg
e9cb74b8df57678b2d6195aa779ce70e
b279101f6d372279f1eb0ee176287d7823218a2e
F20110404_AABHTC tang_y_Page_083.tif
65e67384666174bd1b6af04a42ffabfc
0d0532937e470ab0c28c717f01f0843dfbd0df1d
1019619 F20110404_AABHSP tang_y_Page_042.jp2
4dc5e0d3171c39cdeb6aa6b322e1b6d5
1330ac3c9778d37818b55a2e792100acf27b54ce
31515 F20110404_AABHTD tang_y_Page_056.pro
9203485ce6e6abe460cb7fe7dec0a4a2
67af8e4319a886097041461bf5d91513e52940fa
F20110404_AABHSQ tang_y_Page_103.tif
5e666f0ccc6104438099db0c16f86f61
96486ac1a2ea427fd714ccc354c9eb95c36af549
44515 F20110404_AABHTE tang_y_Page_026.pro
59cdc2ab2b631d0f67f7ded9b859b2cb
27d92d1ab5c7bacd57e626cd9736debc6a636769
F20110404_AABHSR tang_y_Page_035.tif
fecb1b8505fa2229b8c955e29310f2c8
47390d125e46e21f2a0e67cffbb2082f4a8bb202
110564 F20110404_AABHTF tang_y_Page_106.jpg
d70c6c1fe8d2a7c778bccdb098e5fd28
01c6e674561eed5b1869548dac130f2ea9b5ee95
95895 F20110404_AABHSS tang_y_Page_037.jpg
15e49080d4e609da7d405ef1f36444e3
9ee4235a62943f7db35cd945ad4106289dc25ee9
1855 F20110404_AABHTG tang_y_Page_012.txt
abe1460e1b95b071b9aaf12050e3a642
78409dd4331b617a17ca38b70f6e6bb257613c18
260532 F20110404_AABHST tang_y_Page_055.jp2
3fe8c3e80dd4cb6a14206c10647de5d1
0d4a173d435da7bf1516d88523c289cbb299c7d3
7484 F20110404_AABHTH tang_y_Page_024thm.jpg
e12d2e3c13ef0966a0bc1d8121c25e3f
b70e956f0afa5c03fe79b4c762a3ab47f08258b6
F20110404_AABHSU tang_y_Page_081.tif
47454d3aa319e594ca4e751aaf22c51b
f3785ac5537006f994921e2b54f4942d3a28c0b1
97904 F20110404_AABHTI tang_y_Page_046.jpg
c1b61b9eb0e8f8649c1c781a9a10bc14
26ef8d06b525cb11da35ef5c56bcaf6d01af285c
1357 F20110404_AABHSV tang_y_Page_056.txt
e45bbe726cc1b937673e96ee7e43619f
2c7b97aaeb811a9b0de0b94472343e8656e69437
F20110404_AABHTJ tang_y_Page_064.tif
9c3632c65b46b08bc6f625d1150e30ec
e681f97c687c875dd6feb5114c2d896a7f796701
106666 F20110404_AABHSW tang_y_Page_028.jpg
b0cc5167e2e0a05c03026223fde204da
bfa4f1d2dd586e466c57ef5e8234baa5588b2b96
105782 F20110404_AABHTK tang_y_Page_059.jpg
6e2a49ed45711fc00b194efa352f70c7
ad545833b9ecd3577f35a661b9ba208ba9a10a4f
923105 F20110404_AABHSX tang_y_Page_092.jp2
042782a51eac1872bcc3b57cbf2de47c
09a22ffc6954998f9b9c051ca51f9e8275603ef5
1019269 F20110404_AABHTL tang_y_Page_086.jp2
70dfd97d74ff1441c4c944253afcd6f1
3aaeae9a9d8d75f6adf47c6356021db927b860f5
168 F20110404_AABHSY tang_y_Page_102.txt
fbc2582957dce1357ec3c7d8864e3204
b97f5914a8fcefc20991cbe412aefc885f128cb8
1982 F20110404_AABHUA tang_y_Page_036.txt
99dea2a201e0162edc1d13cba1d37897
e9bb4f3bc415b32af18bbfa4ee754ad96c8462ef
5734 F20110404_AABHTM tang_y_Page_093thm.jpg
c26f9af0795f27019c7865080e90a692
de47cf7139d43957e80a766e700940913d243a19
366391 F20110404_AABHUB tang_y_Page_009.jp2
5e402507eac11b9c939822f0712c84b1
3b9bd77b2c4dea788ed9fe113cbdd1c4dfeda65a
1169664 F20110404_AABHTN tang_y.pdf
1b64550700958153d3edf9bb1e066836
2b495e8e27fea52ae5adfce17726eeb70d92be9c
30925 F20110404_AABHSZ tang_y_Page_046.QC.jpg
87a60a5957235a7316a5381deb819433
d1d1a7dd61b26d7df23ef44f42e134acf2c7b9f9
34557 F20110404_AABHUC tang_y_Page_109.QC.jpg
4494b55812d946bac1f362fdbb4a00af
4a5e5772879af9b3edc3988fed1d6cc84135cfae
81562 F20110404_AABHTO tang_y_Page_033.jpg
854261071deb3fb816633beae21f3901
9bb2d5c6c8b74a19f11cbf3c44dd60b60718bc4a
1315 F20110404_AABHUD tang_y_Page_003.QC.jpg
5be24143d62a65fe9eee94630442cff2
5ae11fd7fd158f4d8b5bc82778e4776dd81ca019
1051958 F20110404_AABHTP tang_y_Page_041.jp2
17dcd9f36ef1984ec785b84f8a013c8a
d70aaf804dd5085a1625206755445ffe2dbef612
F20110404_AABHUE tang_y_Page_032.tif
729ff3f268c4c565bfb31ef6743d33a8
26a58fcf5b03f5775b68142478159d86d966ae1f
4619 F20110404_AABHTQ tang_y_Page_082thm.jpg
075d686ea45c3a0e95eedb9b10db4e83
c3139b9528ecde35822e5eb5599496d4fae4848e
7174 F20110404_AABHUF tang_y_Page_100thm.jpg
fa12fa40d10442021996494a5645e117
b44e8e03ee1d82d4535dba9b9b804f89445171a0
54346 F20110404_AABHTR tang_y_Page_013.pro
bbd938bdf251a42b4d6098e3907280d2
2c57e53f96951c356c7b3072b25b16f7ee7e18d1
27600 F20110404_AABIAA tang_y_Page_096.pro
b39ec5547c26a15e78c962562a67e978
8e7f8df16e22cf6e97dfa6b6e6dc81d12e648d33
386 F20110404_AABHUG tang_y_Page_001.txt
95ca6a1856f575f1255c37ed80a9947d
b749aa38632527d8b7b31b5f78b6ba8ef43b7f83
53904 F20110404_AABHTS tang_y_Page_060.pro
cfd33a03b50d2bbe212625ed6caa0377
fb34d2e2d618a20d93d0de0d0293c23a6137bfcf
23746 F20110404_AABIAB tang_y_Page_105.pro
f7a81b77628215c9b978cfb4c5c10cc8
6314b4d0a4c9668ed9955c984c75cf6f06d9efc4
53818 F20110404_AABHUH tang_y_Page_074.pro
f3ad13b47f7873ee0a050e00d2803504
fba2ab3cb7118e14f62400f22accf121eaebc255
6956 F20110404_AABHTT tang_y_Page_080thm.jpg
a080658431ae194261dd7e93e33be643
35347d5ff867de69f756fbd1c91f5878cb006620
42511 F20110404_AABIAC tang_y_Page_079.pro
43d17b750df3c07a1205e1ffc17ad316
36bf8c3ad947ec31debe69869cba8972e0263d86
F20110404_AABHUI tang_y_Page_071.jp2
000173b568f8afdb89869d591629b6b2
1467685a95932dc83d8976ddd77031b24e53a298
1032500 F20110404_AABHTU tang_y_Page_089.jp2
a77b2b1da75d86be4e47516f94512ae3
b2bf6ce55542a5309bf040bfa7ba2d87068beb03
2129 F20110404_AABIAD tang_y_Page_024.txt
1ec8d25b96904139fddfa579d23c55c2
59024bbcbd03e085173c06b8e69d5431084019b7
F20110404_AABHUJ tang_y_Page_061thm.jpg
9637d784a0b9f41254829eced413d82f
c97523c9e1fae5cff1a17ecfbbe29d14342bd47b
6397 F20110404_AABHTV tang_y_Page_053thm.jpg
2dbc8ff1a593cc5f0779edb3cd8c8f88
235c41208e6498bf33fa770837bf83d506d50f0c
2236 F20110404_AABIAE tang_y_Page_016.txt
7fafa75dd1458f1fe3f578b3a097b6d5
8d727ac42d76242fa01479c89836cfb2b9fd24ad
33734 F20110404_AABHUK tang_y_Page_093.pro
61e7fbfedd906776a1f2e3719bc9ea7c
e93982a6baafbf6b2a0248ad6984a166dec368d0
F20110404_AABHTW tang_y_Page_015.tif
31f9526218e3811331bd2b91f771683d
42a94b8cec48f58b25ca1ae21727d28b2e37e4c7
33366 F20110404_AABIAF tang_y_Page_051.QC.jpg
507811a2db3dcc1e81a4e1ae646e3f7f
2c0b39c720d999dbcb2b11dcf1cb1513b28941c5
33151 F20110404_AABHUL tang_y_Page_028.QC.jpg
a91950d4deac4a2534852fdc6c36f12e
a234f4f4ac4a08d77e00696df5c0b78982e16a53
32325 F20110404_AABHTX tang_y_Page_060.QC.jpg
a1d208ee5e3aec8454c4c6c135f74395
1bd8cdc0a59892d4781f727a5012c6ade7671fdf
1051972 F20110404_AABHVA tang_y_Page_101.jp2
c1059ace16839da7cb828ac702eb583b
6a92f9074d6316ab53b2c891872eaa0a52646f9f
983009 F20110404_AABHUM tang_y_Page_022.jp2
b140aa011345b26ad3b5af27eb100e30
a254e1bdcc7b0d04750d5bd19fd6af794ff3e713
29003 F20110404_AABHTY tang_y_Page_095.pro
a072fa837dc6e4743dd8e5f7316d795a
862010588f1761e54b47292deaafa2bc189de682
10598 F20110404_AABIAG tang_y_Page_055.pro
26d1045798fffffc47b14ede8314c6fc
a6692bdd89baaa19a6c4793794075b84d7b0e6e2
2085 F20110404_AABHVB tang_y_Page_063.txt
c98037647497aa81e71bf6dedcc180b2
6662a1924e8d2b6c45e6d4a9541e9aa3b4a9db37
6575 F20110404_AABHUN tang_y_Page_078thm.jpg
f2e52993d5b571f8f7e4bf0619cec6e6
c1d973acb09bd9d369ba254cf4e593ac621a4339
F20110404_AABHTZ tang_y_Page_033.tif
58d4b9a55440bc1b15381eb42d3fc028
31cf904b4f207b65c00dc9bcdd11bcba8008f59d
87066 F20110404_AABIAH tang_y_Page_092.jpg
dc43a12aec7afffdbeed848c08a88805
2d0ee8a1501bc36904a8625d39da541153da9458
F20110404_AABHVC tang_y_Page_003.tif
7b60fab6ce2939d4b2b8e40d303ea9e2
c93462996985a16ecd3103180fae2e0dc80ec4a9
F20110404_AABHUO tang_y_Page_048.tif
cff15bc182f83cf0b3edfbc998469697
c959518cce96ae7191f3624d82241c5014e866fb
6068 F20110404_AABIAI tang_y_Page_094thm.jpg
085a4199eb137afe8700211de740eb4e
e15d1d49db1b5d116df2f79ea78fbed6966a5b34
32424 F20110404_AABHVD tang_y_Page_013.QC.jpg
7f7e9970e770211e42f47b1ea8218e76
537b394440e94722c24abab9bf6b0da313fae374
2126 F20110404_AABHUP tang_y_Page_074.txt
5f9200a52afe8b6c8e8d5bc7043133d7
3dd98a1791e3de415a90159c07fa02a24b278bbe
2154 F20110404_AABIAJ tang_y_Page_023.txt
cfadcada692b872487257c715cc6b6db
ee33dc26ec757b179c31214978d9fb6432149dd4
24029 F20110404_AABHVE tang_y_Page_002.jp2
a175b1f30b30666a66ef583564a0901b
38db21552f15d429e707e094bdd0b0877c9c492c
65659 F20110404_AABHUQ tang_y_Page_098.jpg
1864d08dc858f906c133038a6f94a871
39faac67678c2a15dc850363409e8192284bc4cd
6719 F20110404_AABIAK tang_y_Page_022thm.jpg
71634777608afe493504c0226f196ae0
4d6be5cbc9fbe659ff7eea036c9e82583d16a316
1051983 F20110404_AABHVF tang_y_Page_108.jp2
c4aa6ec33056901cbde0f66e218b4638
48907d41f09b5103c7479bf9ec35ec9af7eba376
F20110404_AABHUR tang_y_Page_010.tif
e17006df68c471ff23fa116be1de253a
384f082d3745bbff183629c512844bb34ff7fe06
1116 F20110404_AABIBA tang_y_Page_082.txt
000b0048a26f611817cfa233da4d35db
801a9f605d3446bb8a700a4444c1a37ac095f3cc
50010 F20110404_AABIAL tang_y_Page_036.pro
400e7ea205b3b43802577a75cd79da8c
f0471dfaef6ddbc6b6f8696b169e6c83495c96f3
53052 F20110404_AABHVG tang_y_Page_073.pro
9a1184669f1b87726eac777b11695b50
29d8ae7c3657e2aac08278a5dd006f6df9558373
21331 F20110404_AABHUS tang_y_Page_021.QC.jpg
905fe79911531336c29a6c94a8e0639b
5f2c6cee971c393b8b8765e3f61d2218da2e983b
46878 F20110404_AABIBB tang_y_Page_069.pro
94882e56fd225410f4c2136c98de4993
32ddf2ace0edcb824cbfd7a60fc2e535ab7d209d
52166 F20110404_AABIAM tang_y_Page_046.pro
5ac4384887afd0bce6f461d956edc836
88f5678a371bffdf42caa506a7b85b1fb3971556
6209 F20110404_AABHVH tang_y_Page_011thm.jpg
8883bfcf6061b51145f9a07ebe3be1ac
e4fd4814ee8d86f83e310b72a99be3b09633d904
42805 F20110404_AABHUT tang_y_Page_092.pro
debb9e8865ed5adea0d81d0019de9ae6
821f18a81085f1ea7249e5b4e02a4af4aa087ac0
1890 F20110404_AABIBC tang_y_Page_077.txt
9223ba18895d50f6dcfe50bb03ad85ea
8ddf04e28c0c2c5d68a613cecce77dda65d8fc1b
55685 F20110404_AABIAN tang_y_Page_090.pro
71ac1618922cd5bb50d202792bb30ddb
1a1a0a42cc57126f7884e1b17591fc0a468b568c
84965 F20110404_AABHVI tang_y_Page_011.jpg
9a7e505219c9ffae21c69cfb0240b665
2c754cbccef913a74bed2a6f9568c7316bed005f
1930 F20110404_AABHUU tang_y_Page_069.txt
ba4e8a06e9ce1a1b48770c1ef8f03099
8512d2e0f40c2a3740968d5182ba49fd1a6931dd
24246 F20110404_AABIBD tang_y_Page_032.QC.jpg
242ea25a52acdc83f0248818a973aa81
5381b2b3d2887f5257955e43aca9e56eb400630d
29776 F20110404_AABIAO tang_y_Page_037.QC.jpg
c8dfbcc10db3dd43859e3952b9838e3d
a528940d5194d7f71a46a43328e3365283bcbe44
2164 F20110404_AABHVJ tang_y_Page_028.txt
f69dc21095370f5c49c6b39ea09e7000
ea8803b0767e735dfde15a85e7ce827cd2051383
2515 F20110404_AABHUV tang_y_Page_009thm.jpg
600ca0cf519df813d90c6c80344db5e9
22a69e5d5fc0c8c3bcccf6bc07880d77d662f987
1051967 F20110404_AABIBE tang_y_Page_051.jp2
52f3892de1641da08ef4b814116309ca
e67db5e4b5d25e602a7e29eb5595588892897695
1730 F20110404_AABIAP tang_y_Page_001thm.jpg
fac395b256ca134dd6ab558b7f177f48
86ef20f6dbd6a07c95afc1081e7056cb71099625
26621 F20110404_AABHVK tang_y_Page_088.pro
d59b8724ae856d1605349158e52e9550
d4bbea9f31ad815a028e02da3b2e5fb7047b9caa
1662 F20110404_AABHUW tang_y_Page_031.txt
2efcd52603c2130b736eb1a0e08dfa5f
ca734b1ddb07bc09b884868e1a3b6cdd7ba4bfe6
7402 F20110404_AABIBF tang_y_Page_073thm.jpg
464f398ddbb59a02f4bba7fa9a10b5e9
69818c13b58e173c3233c511e54e989d99f93e5f
49597 F20110404_AABIAQ tang_y_Page_100.pro
955ad47bf64377b229d664e68cf7ce80
cf9a26847255303e05b90b93d37e2cff021566c4
1051957 F20110404_AABHVL tang_y_Page_023.jp2
f404398f7975fe6e2ab2872d2b125255
f918b7d685898169dccc37346842dc95d015bf24
5885 F20110404_AABHUX tang_y_Page_076thm.jpg
10e47e69b1ff697654f9d0da918a0db3
4a8f4043a9a972f17c8790845f593dbbbc1dbd8d
1051863 F20110404_AABIBG tang_y_Page_049.jp2
5dbc1e9c5723af46a33290e34d5c3d4b
001915e87fa945fa133f31159db85a3494594491
85515 F20110404_AABIAR tang_y_Page_026.jpg
37025845354f3d81a9ac19ecc29f9ccf
2badb57e1f2aee806c5169d90ea4aa13af0ee5a8
25401 F20110404_AABHVM tang_y_Page_097.QC.jpg
7583a951d751c39c711574ae7d07b07f
9d5195db30ee8c9c1fdf403b0e26da775d9a557c
30272 F20110404_AABHUY tang_y_Page_041.QC.jpg
8e90001ea6b8abd3e8a3750f28832d3b
823486a725e1afb2df07490aea0e7bb471f0ee2f
F20110404_AABHWA tang_y_Page_030.tif
8a2e4c12dcaf4a5cd00dadd2bae9b676
0e5e7c2e418ff430211765a62b4977d94634240b
F20110404_AABIAS tang_y_Page_034.tif
465a83b71ae026e6c68c9dd950cd2247
465c4797330d7b9ea8d0a8bd6b09f74a509f30d4
785523 F20110404_AABHVN tang_y_Page_008.jp2
20e0d73fe21156187ce68acf6044e8ea
fe8d658066960e845a8c062cc593d21b4e3b4b58
735328 F20110404_AABHUZ tang_y_Page_093.jp2
d112c6533fcac0b0e95f4892cbe6a5ab
a18c00840b346ec6d7f5919f50dc3a4570b31186
32350 F20110404_AABIBH tang_y_Page_074.QC.jpg
69bbdf40b6feef775b7d65009cd051d3
28fba3f2feb08a51b12f90e66fca6ef9d468abd1
51154 F20110404_AABHWB tang_y_Page_062.pro
f50393c195fc4aace7685037a45fb920
3a9b78315f9c4cf34306f23591578b1b45a6bcdb
F20110404_AABIAT tang_y_Page_031.tif
07be14a9600c80013aa49b1183bb2870
5905ba4b7cdbe1bb91ac6756a4863697ced8f202
7561 F20110404_AABHVO tang_y_Page_018thm.jpg
518e8ca95117bc69db5846ee71741567
3b6312785b58aa443488d7cf55545761b6e7be79
901649 F20110404_AABIBI tang_y_Page_078.jp2
2f01db24ce8f4c4e3c97c3d02b27c8ec
a29c470ad4013fea24b2d78c880d9e79e49b2745
2907 F20110404_AABHWC tang_y_Page_111thm.jpg
f551b9a637be672169608e96dcb0cd7a
56e0d9445e2b4cbe64512179cb75c806e325f2f0
31752 F20110404_AABIAU tang_y_Page_017.QC.jpg
7170b514d3086b4aa6d368b451eebf7e
ed34ed963aa66a9fe6deb25a3365e51c587a65ef
953942 F20110404_AABHVP tang_y_Page_053.jp2
e453465b24a21429cfc1ec9de3deb32a
c698532e7184b0e24eb3aa9f6d02712b68f7e76c
6971 F20110404_AABIBJ tang_y_Page_034thm.jpg
bb3d33c98173cebce6e2d61ba69cf116
bbb85bd56ade5d3a65710dc029ec0518ecc8bab7
1986 F20110404_AABHWD tang_y_Page_041.txt
b30f9bdbebd0ca5254002932baa48f6e
cc6a048fb3dfe3cc7e221dba3e99f6b3e35412b9
42264 F20110404_AABIAV tang_y_Page_050.pro
95e49e28be82e3830e64ebe436f8cb38
ff9fb74227dc77c4f16a03d8ef7f5da133bf079f
1832 F20110404_AABHVQ tang_y_Page_078.txt
820660c034c0bb9fd1acdf8ae0fb730a
02294f9a4c6a0b8daaf821bf596cf347a54b83a1
46695 F20110404_AABIBK tang_y_Page_048.pro
248eac00cc1f9a6ccbd62a913b012a96
6a7a91b322d192a10e5a7112fc14c31167fb433d
F20110404_AABHWE tang_y_Page_090.jp2
41c5b96a09559e20a8eb0d70fc4f2590
295d6e276fdfdec4e1176aec32858e2f9d20e8f8
972 F20110404_AABIAW tang_y_Page_083.txt
938c81b1c2a2a024716d7157c7fe2aa6
8d616856a5157cce57406391f502bf82bd2b1b6a
1063 F20110404_AABHVR tang_y_Page_088.txt
ea5dcdf83cfc3ef57ec5e43389fd65cc
4ec74841998e22ab7aa30b4c697607295d06d967
6900 F20110404_AABICA tang_y_Page_106thm.jpg
ed801572573230d4608e7f4d7f4d98ad
1fdf4938768749b6221f276fb55b4f3875a8c6f7
30102 F20110404_AABIBL tang_y_Page_040.QC.jpg
85b6232335b3e30286a85c37378bc6c6
ee448ecfd5d1bdd94b892f6ed21ba0c813511bdc
103569 F20110404_AABHWF tang_y_Page_045.jpg
645f75298ad9b5711cf9f951dca6056c
a555561ce19b280a69c6947064cd55cc9b70d20e
32990 F20110404_AABIAX tang_y_Page_052.QC.jpg
70c2ba0c619fbaa7cbb01c7742c6347e
2c9fbacd7dbe75440d1e5645d447b42b64796547
39551 F20110404_AABHVS tang_y_Page_032.pro
2fa8940c5228a2aee06bdc0f239db443
c3122296936c3e93a96adbfaf3257752fbcadcc1
123660 F20110404_AABICB tang_y_Page_107.jpg
6162be98a99af64a685528c16d784d06
e96d351bb562e6469c99ebe715a22aaf11b448e2
F20110404_AABIBM tang_y_Page_110.tif
66eb1b81b235602fbeec183287096bf2
ef7ffb4eeefe3407c1d0dded96530dc7c6414e37
6285 F20110404_AABHWG tang_y_Page_025thm.jpg
934efe03549b771cf4b2dc73dac99804
c0a4999cb5bb623572ea78b49e332a8765b71c7d
F20110404_AABIAY tang_y_Page_030thm.jpg
a00598978122f52b345abee6d54c0c95
019779ddf6bf04022a92de44a2fd0bfb26763cd8
59711 F20110404_AABHVT tang_y_Page_095.jpg
b26e3305fabb81724f56ab592b1a8043
945154aec6356674d6d6c955a6b0d3e3e6b59ad6
2187 F20110404_AABICC tang_y_Page_052.txt
ceed9e045621bce436af08001eea1273
9ab5399efce680d94b1a2a2375e889d98bb55f10
2035 F20110404_AABIBN tang_y_Page_047.txt
e43952166fd484e9e165a63001ca2e0d
439b3c12592c9b9b770f219eaecffc79c1f6633d
41604 F20110404_AABHWH tang_y_Page_039.pro
ffefde10834c68d591db6e1816d66ba0
c8e5355c619487f2b5a47515abbb7364fe9fbcfe
6078 F20110404_AABIAZ tang_y_Page_079thm.jpg
c5f080bb9f4ab326d7d0737023f3cd3f
909b2c974660a47f6e520f176ace73e335e68a01
F20110404_AABHVU tang_y_Page_028.tif
befc01b82900e9b0d0a29efdcfd5ebfe
e858ab7fd38df0e2317778ee678a5f5609aed5c4
54910 F20110404_AABICD tang_y_Page_028.pro
9a40e624b398e0b81a723aaa81552636
43f9cd3461a8436b1f94c2416270bb9f7ae182c8
F20110404_AABIBO tang_y_Page_100.tif
b40f1888bc94cc35ac41b612dc373af2
dcf3f2a3a456317e30abe841765130050dc91d37
F20110404_AABHWI tang_y_Page_027.tif
236d71dc69fb82be68a6482e1d8be276
f90ef95a0fecf313b0a26e826fb74b8780a49641
7675 F20110404_AABHVV tang_y_Page_090thm.jpg
508383a1a4fa4ae8e4a53e83dfbb8bd3
73173aa0754d6967260f707d37808caf6d662311
6701 F20110404_AABICE tang_y_Page_067thm.jpg
1cf4cc24bdf3862dca83c10f3209f1b4
59a47f820449bcff87e88835953282f73bd529cc
743 F20110404_AABIBP tang_y_Page_111.txt
644dfe74fc6dd44064553190e0b654fb
309c93dd2ee03bf2e77300be39a1345b41e00b0d
25762 F20110404_AABHWJ tang_y_Page_038.QC.jpg
3af72694fa327aa2c3bb1e0d56a5e62a
5dffa7c9ed667cb1281aced21cee2abc6c7f225d
92481 F20110404_AABHVW tang_y_Page_048.jpg
35cc82a351e443222e1ad77b14401176
da36c1fd00d19934992024fdf895f69f52fdfd8b
108 F20110404_AABICF tang_y_Page_003.txt
a74bb21ad99b1596922a08d66efbbf27
a102ec1bd9c9789ab1c5d8c3698d13570cb6c9c2
100688 F20110404_AABIBQ tang_y_Page_017.jpg
368f3d92fbdfc47087b95933e7238dab
c50f6f47e3a4b43fd01fc89574b3034ef177fca7
F20110404_AABHWK tang_y_Page_059thm.jpg
7ff20f42e7d76256abf706bcc37301c8
fd502509d7bec981f95e03e09965ccd9cd1eb898
12377 F20110404_AABICG tang_y_Page_110.QC.jpg
8c7b51b56cb17878f0a161f4eb71de47
64e77804b14b73469e8f40246586fd05e880e7eb
3025 F20110404_AABIBR tang_y_Page_006.txt
9950262dca165874636cbc77e74d3ef3
04cb498530878b5487c0964d63bdd191f901b752
83243 F20110404_AABHWL tang_y_Page_039.jpg
a5508e88bf58907c028eac6abc44a90b
479758af320dfdab709ef396eb44a8db7a3e92da
7337 F20110404_AABHVX tang_y_Page_028thm.jpg
734f44aad53453a89d40cc395faf43a9
9bae8e539a6c0d8dab0f7937c546c8d26e4a432a
33444 F20110404_AABICH tang_y_Page_016.QC.jpg
ce43fc37f2927a720b05f61680362206
082badf7d2df53f4fe4a60e13e99fa15dcc34a84
47553 F20110404_AABHXA tang_y_Page_083.jpg
9be450736909007083e3b01c6af0fa6c
440447d14c7b0b43ea1a44e4338de00cff3adb8e
F20110404_AABIBS tang_y_Page_072.tif
009c6accdf3ace23a7418612081a7fcc
78964ed9109d3343d2f5f36469e27921ae9bdbd3
949881 F20110404_AABHWM tang_y_Page_035.jp2
3664b9021ac290f94a87d1ea4c404694
ea6ca80a99c0b4780ae69295298b3725283e9ab9
27513 F20110404_AABHVY tang_y_Page_053.QC.jpg
15b55ae18b3894696e25e27a49981c8b
be5eda5f2326567cb6f76fdec35db1e21e16f3c0
1030098 F20110404_AABHXB tang_y_Page_048.jp2
1b12e67e00f36d2e49dee059428e76f7
095dc0d9e418ebef599f98a0bf0eaa7bb98f98f0
1703 F20110404_AABIBT tang_y_Page_010.txt
6f2ad0058b1a9e97202021e9d76c11b1
12858285ed2dd78e4ed3b53d407f6e1e9b8caaad
874475 F20110404_AABHWN tang_y_Page_025.jp2
e0c890abec0b7a6a86c79b10fbb7ed56
95325d94eb4a342dd1625cc2b941be2a6b38f261
7260 F20110404_AABHVZ tang_y_Page_037thm.jpg
72f4b6d01cf729eb35936210747a6e4e
a7baff45fde338b33b6ae950557f12318e6abd4c
F20110404_AABICI tang_y_Page_108.tif
dee632feca962c4fe49148530431d4af
182fc5de6a934bfb53a9cb67406c1284bc969469
2117 F20110404_AABHXC tang_y_Page_084.txt
08e9861d75574321ae64e3d8060f6632
bddd4bb055d3a4e428f1b87dcee8bb8b6e97f23c
F20110404_AABIBU tang_y_Page_016.tif
a2597a0934f922f93a31b27500346852
d46ad9dfe65ef50faff1e0baa3284866c70da5ab
6690 F20110404_AABHWO tang_y_Page_103thm.jpg
c527ee611783c703cab24fc13cbed1a7
0b04bc3c35ab34cbec149a649f27dc877d5a3efa
F20110404_AABICJ tang_y_Page_045.tif
4e06429b9a9e8bdaaf4096874f42c0e3



ACKNOWLEDGMENTS

First of all, I would like to thank my advisor, Prof. Shigang Chen, for his guidance and support throughout my graduate studies. Without the numerous discussions and brainstorms with him, the results presented in this dissertation would never have existed. I am grateful to Prof. Sartaj Sahni, Prof. Sanjay Ranka, Prof. Yuguang Fang and Prof. Dapeng Wu for their guidance and encouragement during my years at the University of Florida (UF). I would also like to thank Prof. Ye Xia for his valuable comments and suggestions on my research.

I am thankful to all my colleagues in Prof. Chen's group, including Qingguo Song, Zhan Zhang, Wei Pan, Liang Zhang, and MyungKeun Yoon. They provide valuable feedback for my research. I would also like to thank many people in the Computer and Information Science and Engineering (CISE) Department for their help in my research work. In particular, I would like to thank Fei Wang for his help and valuable discussions in my research work. I would also like to thank Ju Wang, Zhizhou Wang, Xiaobin Wu, Mingxi Wu, Jundong Liu and Jie Zhang for their help throughout my graduate life.

I am also thankful to my longtime friends before I entered UF. In particular, I am thankful to Peng Wu for his help and encouragement. Last but not least, I am grateful to my parents and sisters for their love, encouragement, and understanding. It would be impossible for me to express my gratitude towards them in mere words.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
  1.1 Internet Worms
  1.2 Related Work
  1.3 Contribution
    1.3.1 Distributed Anti-Worm Architecture
    1.3.2 Signature-Based Worm Identification and Defense
    1.3.3 Optimization of Iterative Methods

2 SLOWING DOWN INTERNET WORMS
  2.1 Modeling Worm Propagation
  2.2 Failure Rate
  2.3 A Distributed Anti-Worm Architecture
    2.3.1 Objectives
    2.3.2 Assumptions
    2.3.3 DAW Overview
    2.3.4 Measuring Failure Rate
    2.3.5 Basic Rate-Limit Algorithm
    2.3.6 Temporal Rate-Limit Algorithm
    2.3.7 Recently Failed Address List
    2.3.8 Spatial Rate-Limit Algorithm
    2.3.9 Blocking Persistent Scanning Sources
    2.3.10 FailLog
    2.3.11 Warhol Worm and Flash Worm
    2.3.12 Forged Failure Replies
  2.4 Simulation

3 ...
  3.1 Double-Honeypot System
    3.1.1 Motivation
    3.1.2 System Architecture
  3.2 Polymorphism of Internet Worms
  3.3 Position-Aware Distribution Signature (PADS)
    3.3.1 Background and Motivation
    3.3.2 Position-Aware Distribution Signature (PADS)
  3.4 Algorithms for Signature Detection
    3.4.1 Expectation-Maximization Algorithm
    3.4.2 Gibbs Sampling Algorithm
    3.4.3 Complexities
    3.4.4 Signature with Multiple Separated Strings
    3.4.5 Complexities
  3.5 MPADS with Multiple Signatures
  3.6 Mixture of Polymorphic Worms and Clustering Algorithm
    3.6.1 Normalized Cuts
  3.7 Experiments
    3.7.1 Convergence of Signature Generation Algorithms
    3.7.2 Effectiveness of Normalized Cuts Algorithm
    3.7.3 Impact of Signature Width and Worm Length
    3.7.4 False Positives and False Negatives
    3.7.5 Comparing PADS with Existing Methods

4 MULTIPLE PADS MODEL AND CLASSIFICATION OF POLYMORPHIC WORM FAMILIES: AN OPTIMIZATION
  4.1 Introduction
  4.2 Extraction of Multiple PADS Blocks from the Mixture of Polymorphic Worms
    4.2.1 PADS Blocks and The Dataset from Byte Sequences
    4.2.2 Expectation-Maximization (EM) Algorithm
    4.2.3 Extraction of Multiple PADS Blocks
  4.3 Classification of Polymorphic Worms and Signature Generation
    4.3.1 Multiple PADS Blocks Model
    4.3.2 Classification
  4.4 Conclusion

5 SUMMARY AND CONCLUSION
  5.1 Summary
  5.2 Conclusion

REFERENCES
BIOGRAPHICAL SKETCH

Table
2-1 Failure rates of normal hosts
2-2 5% propagation time (days) for "Temporal + Spatial"
3-1 An example of a PADS signature with width W = 10
4-1 An example of a PADS block with width W = 10
4-2 An example of a segment with width W = 10

Figure
2-1 Distributed anti-worm architecture
2-2 Worm-propagation comparison
2-3 Effectiveness of the temporal rate-limit algorithm for DAW
2-4 Effectiveness of the spatial rate-limit algorithm for DAW
2-5 Stop worm propagation by blocking
2-6 Propagation time before the worm is stopped
3-1 Using double-honeypot detecting Internet worms
3-2 A decryptor example of a worm
3-3 Different variants of a polymorphic worm using the same decryptor
3-4 Different variants of a polymorphic worm using different decryptors
3-5 Different variants of a polymorphic worm with different decryptors and different entry point
3-6 Different variants of a polymorphic worm with garbage-code insertion
3-7 Different variants of a polymorphic worm with several different polymorphic techniques
3-8 Signature detection
3-9 Clusters
3-10 Variants of a polymorphic worm
3-11 Influence of initial configurations
3-12 Variants clustering using normalized cuts
3-13 Matching score influence of different signature widths and sample variants lengths
3-14 Influence of different lengths of the sample variants
3-15 Influence of different lengths of the sample variants
3-16
3-17 Influence of different lengths of the sample variants
3-18 False positives and false negatives
3-19 The performance of signature-based system using the longest common substrings method
3-20 Byte frequency distributions of normal traffic (left-hand plot) and worm traffic (right-hand plot)
3-21 Byte frequency distributions of worm variants. Left-hand plot: malicious and normal payloads carried by a worm variant have equal length. Right-hand plot: normal payload carried by a worm variant is 9 times the malicious payload.

[1, 2, 3, 4, 5, 6, 7]. Ever since the Morris worm showed the Internet community for the first time in 1988 that a worm could bring the Internet down in hours [8], new worm outbreaks have occurred periodically even though their mechanism of spreading was long well understood. On July 19, 2001, the Code Red worm (version 2) infected more than 250,000 hosts in just 9 hours [9]. Soon after, the Nimda worm raged on the Internet [10]. As recently as January 25, 2003, a new worm called SQL Slammer [11] reportedly shut down networks across Asia, Europe, and the Americas.

The most common way for a worm to propagate is to exploit a security loophole in certain version(s) of a service software to take control of the machine and copy itself over. For example, the Morris worm exploited a bug in finger and a trap door in sendmail of BSD 4.2 or 4.3, while the Code Red worm took advantage of a buffer-overflow problem in the index server of IIS 4.0 or IIS 5.0. Typically a worm-infected host scans the Internet for vulnerable systems. It chooses an IP address, attempts a connection to a service port (e.g., TCP port 80 in the case of Code Red), and if successful, attempts the attack. The above process repeats with different random addresses. As more and more machines are compromised, more and more copies of the worm are working together to reproduce themselves. An explosive epidemic therefore develops across the Internet.

Although most known worms did not cause severe damage to the compromised systems, they could have altered data, removed files, stolen information, or used the infected hosts to launch other attacks if they had chosen to do so. The worm activity often causes Denial-of-Service (DoS) as a by-product. The hosts that are vulnerable to a worm typically account for a small portion of the IP address space. Hence, worms rely on high-volume random scanning to find victims. The scan traffic from tens of thousands of compromised machines can congest networks.

There are few answers to the worm threat. One solution is to patch the software and eliminate the security defects [9, 10, 11]. That did not work because (1) software bugs seem always to increase as computer systems become more and more complicated, and (2) not all people have the habit of keeping an eye on the patch releases. The patch for the security hole that led to the SQL Slammer worm was released half a year before the worm appeared, and still tens of thousands of computers were infected. Intrusion detection systems and anti-virus software may be upgraded to detect and remove a known worm, and routers and firewalls may be configured to block the packets whose content contains worm signatures, but those happen after a worm has spread and been analyzed.

Moore et al. studied the effectiveness of worm containment technologies (address blacklisting and content filtering) and concluded that such systems must react in a matter of minutes and interdict nearly all Internet paths in order to be successful [2]. Williamson proposed to modify the network stack so that the rate of connection requests to distinct destinations is bounded [12]. The main problem is that this approach becomes effective only after the majority of all Internet hosts are upgraded with the new network stack. For an individual organization, although the local deployment may benefit the Internet community, it does not provide immediate anti-worm protection to its own hosts, whose security depends
on the rest of the Internet taking the same action. This gives little incentive for the upgrade without an Internet-wide coordinated effort.

Most known worms have very aggressive behaviors. They attempt to infect the Internet in a short period of time. These types of worms are actually easier to detect because their aggressiveness stands out from the background traffic. Future worms may be modified to circumvent the rate-based defense systems and purposely slow down the propagation rate in order to compromise a vast number of systems over the long run without being detected [2].

Intrusion detection has been intensively studied in the past decade. Anomaly-based systems [4, 13, 14] profile the statistical features of normal traffic. Any deviation from the profile will be treated as suspicious. Although these systems can detect previously unknown attacks, they have high false positives when the normal activities are diverse and unpredictable. On the other hand, misuse detection systems look for particular, explicit indications of attacks such as the pattern of malicious traffic payload. They can detect the known worms but will fail on the new types.

Most deployed worm-detection systems are signature-based, which belongs to the misuse-detection category. They look for specific byte sequences (called attack signatures) that are known to appear in the attack traffic. The signatures are manually identified by human experts through careful analysis of the byte sequence from captured attack traffic. A good signature should be one that consistently shows up in attack traffic but rarely appears in normal traffic. The signature-based systems [15, 16] have an advantage over the anomaly-based systems due to their simplicity and the ability of operating online in real time. The problem is that they can only detect known attacks with identified signatures that are produced by experts. Automated signature generation for new attacks is extremely difficult due to three reasons. First, in order to create an attack
signature, we must identify and isolate attack traffic from legitimate traffic. Automatic identification of new worms is critical, which is the foundation of other defense measures. Second, the signature generation must be general enough to capture all attack traffic of a certain type while at the same time specific enough to avoid overlapping with the content of normal traffic in order to reduce false positives. This problem has so far been handled in an ad-hoc way based on human judgement. Third, the defense system must be flexible enough to deal with the polymorphism in the attack traffic. Otherwise, worms may be programmed to deliberately modify themselves each time they replicate and thus fool the defense system.

[17]. This model was later used to analyze the propagation behavior of Code-Red-like worms by Staniford et al. [1] and Moore et al. [18]. Refinements were made to the model by Zou et al. [19] and Weaver et al. [20] in order to fit with the observed propagation data. Chen et al. proposed a sophisticated worm propagation model (called AAWP [21]) based on discrete times. In the same work, the model is applied to monitor, detect, and defend against the spread of worms under a rather simplified setup, where a range of unused addresses are monitored and a connection made to those addresses triggers a worm alert. The distributed early warning system by Zou et al. [22] also monitors unused addresses for the "trend" of illegitimate scan traffic on the Internet. There are two problems with these approaches. First, the attackers can easily overwhelm such a system with false positives by sending packets to those addresses, or some normal programs may scan the Internet for research or other purposes and hit the monitored addresses. Second, to achieve
good response time, the number of "unused addresses" to be monitored has to be large, but addresses are a scarce resource in the IPv4 world, and only a few have the privilege of establishing such a system. A monitor/detection system based on "used addresses" will be much more attractive. It allows more institutes or commercial companies to participate in the quest of defeating Internet worms.

For worms that propagate amongst a certain type of servers, a solution is to block the servers' outbound connections so that the worms cannot spread among them. This approach works only when it is implemented for all or a vast majority of the servers on the Internet. Such an Internet-wide effort has not been and may never be achieved, considering that there are so many countries in the world and home users are setting up their servers without knowing this "good practice." In addition, the approach does not apply when a machine is used both as a server and as a client.

Moore et al. studied the effectiveness of worm containment technologies (address blacklisting and content filtering) and concluded that such systems must react in a matter of minutes and interdict nearly all Internet paths in order to be successful [2]. Williamson and Twycross proposed to modify the network stack so that the rate of connection requests to distinct destinations is bounded [12, 23]. Schechter et al. [24] used the sequential hypothesis test to detect scan sources and proposed a credit-based algorithm for limiting the scan rate of a host. Weaver et al. [25] developed containment algorithms suitable for deployment with high-speed, low-cost network hardware. The main problem of the above approaches is that their effectiveness against worm propagation requires Internet-wide deployment. Gu et al. [26] proposed a simple two-phase local worm victim detection algorithm based on both infection pattern and scanning pattern. Apparently, it cannot issue a warning before some local hosts are compromised. None of the above approaches is able to handle the flash worms [27] that perform targeted scanning.

Honeypots [28] have gained a lot of attention recently. Their goal is to attract and trap the attack traffic on the Internet. Provos [29] designed a virtual honeypot framework to exhibit the TCP/IP stack behavior of different operating systems. Kreibich and Crowcroft [30] proposed Honeycomb to identify worm signatures by using longest common substrings. Dagon et al. developed HoneyStat [31] to detect worm behaviors in small networks. The above systems either assume that all incoming connections to the honeypot are from worms, or rely on experts for the manual worm analysis. These restrictions greatly undermine the effectiveness of the systems.

Kruegel and Vigna [4] discussed various ways of applying anomaly detection in web-based attacks. Several methods, such as the χ²-test and Markov models, were presented. Wang and Stolfo [14] used the byte-frequency distribution of the traffic payload to identify anomalous behavior and possibly worm attacks. These methods are ineffective against polymorphic worms.

The research in defending against polymorphic worms is still in its infancy. Christodorescu and Jha [32] discussed a variety of different polymorphism techniques that could be used to obfuscate malicious code. They also proposed a static analysis method to identify malicious patterns in executables. Kolesnikov and Lee [33] described some advanced polymorphic worms that mutate based on normal traffic. Kim and Karp [34] proposed a worm signature detection system with limited discussion on polymorphism.

worm signatures. Finally, to further improve the performance, a novel format of signature is defined and the iterative methods to compute the signature are discussed in order to deal with the polymorphism of Internet worms. The proposed method is optimized in the thesis with a Gaussian mixture model, which eliminates unnecessary computations and reduces the time complexity of our approach.

normal hosts to make successful connections at any rate. Due to the use of DNS in resolving IP addresses, the chance of attempting connections to non-existing hosts by normal users is relatively low, because a connection will never be initiated by the application if DNS does not find the destination host. This is especially true considering that a typical user has a number of favorite, frequently-accessed sites (that are known to exist). A temporal rate-limit algorithm and a spatial rate-limit algorithm are used to bound the scanning rate of the infected hosts. One important contribution of DAW is to make the speed of worm propagation configurable, no longer by the parameters of worms but by the parameters of DAW. While the actual values of the parameters should be set based on the ISP traffic statistics, we analyze the impact of those parameters on the performance of DAW and use simulations to study the suitable value ranges.

[35][36] are proposed for efficient computation of PADS from polymorphic worm samples. Experiments based on variants of
the MSBlaster worm are performed. The results show that our signature-based defense system can accurately separate new variants of the worm from the normal background traffic by using the PADS signature derived from the past samples. To deal with multiple malicious segments of the worm, a multi-segment position-aware distribution signature (MPADS)

for classification of the polymorphic worm families together with the normalized cut algorithm.

[37, 1, 2].

$$\frac{di(t)}{dt} = r\,\frac{V}{N}\,i(t)\,\bigl(1 - i(t)\bigr) \qquad (2\text{-}2)$$
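The following is a minimal sketch of the simple epidemic model, assuming the logistic form di/dt = r (V/N) i (1 - i), with r the scanning rate of one infected host, V the number of vulnerable hosts, N the size of the address space, and v the number of initially infected hosts. The function names and the numeric values in the usage section are illustrative assumptions, not figures from the dissertation. It evaluates the closed-form solution and cross-checks it against a forward-Euler integration of the differential equation.

```python
import math


def infected_fraction(t, r, V, N, v=1):
    """Closed-form i(t) for di/dt = r*(V/N)*i*(1-i) with i(0) = v/V."""
    rate = r * V / N
    # T is the integration constant fixed by the initial condition i(0) = v/V.
    T = -math.log(v / (V - v)) / rate
    return 1.0 / (1.0 + math.exp(-rate * (t - T)))


def time_to_fraction(alpha, r, V, N, v=1):
    """Time until a fraction alpha of the vulnerable hosts is infected."""
    rate = r * V / N
    return (math.log(alpha / (1.0 - alpha)) - math.log(v / (V - v))) / rate


def euler_fraction(t_end, r, V, N, v=1, steps=20000):
    """Forward-Euler integration of the same equation, used as a cross-check."""
    rate = r * V / N
    dt = t_end / steps
    i = v / V
    for _ in range(steps):
        i += rate * i * (1.0 - i) * dt
    return i


if __name__ == "__main__":
    # Hypothetical parameters chosen only for illustration.
    r, V, N = 10.0, 360_000, 2 ** 32
    t_half = time_to_fraction(0.5, r, V, N)
    print("seconds to 50% infection:", round(t_half))
    print("closed form at t_half:", infected_fraction(t_half, r, V, N))
    print("Euler at t_half:", euler_fraction(t_half, r, V, N))
```

Halving r in this sketch doubles the time returned by `time_to_fraction`, which is the inverse-proportional dependence on the scanning rate that the rate-limiting approach relies on.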

The above equation agrees perfectly with our simulations. Solving the equation, we have

$$i(t) = \frac{e^{r\frac{V}{N}(t-T)}}{1 + e^{r\frac{V}{N}(t-T)}}$$

Let the number of initially infected hosts be $v$. Then $i(0) = v/V$, and we have $T = -\frac{N}{rV}\ln\frac{v}{V-v}$. The time $t(\alpha)$ it takes for a percentage $\alpha$ ($\ge v/V$) of all vulnerable hosts to be infected is

$$t(\alpha) = \frac{N}{rV}\left(\ln\frac{\alpha}{1-\alpha} - \ln\frac{v}{V-v}\right) \qquad (2\text{-}3)$$

Suppose the worm attack starts from one infected host, $v = 1$. We have

$$t(\alpha) = \frac{N}{rV}\ln\frac{\alpha(V-1)}{1-\alpha} \qquad (2\text{-}4)$$

The time predicted by Eq. (2-4) can be achieved only under ideal conditions. In reality, worms propagate slower due to a number of factors. First, once a large number of hosts are infected, the aggressive scanning activities often cause wide-spread network congestion and consequently many scan messages are dropped. Second, when a worm outbreak is announced, many system administrators shut down vulnerable servers or remove the infected hosts from the Internet. Third, some types of worms enter a dormant state after being active for a period of time. Due to the above reasons, Code Red spread much slower than the calculation based on Eq. (2-4). A more sophisticated model that considers the first two factors can be found in [19], which fits better with the observed Code Red data. None of the existing models can describe the theoretical Warhol worm and Flash worm presented in [1]. We shall address them separately in Section 2.3.11.

Practically it is important to slow down the worm propagation in order to give the Internet community enough time to react in the face of an unknown worm. Eq. (2-4) points out two possible approaches: decreasing $r$ causes $t(\alpha)$ to increase inverse-proportionally; increasing $N$ causes $t(\alpha)$ to increase proportionally. In
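this paper, we use the first approach to slow down the worms, while relying on a different technique to halt the propagation. The idea is to block out the infected hosts and make sure that the scanning activity of an infected host does not last for more than a period of $\Delta T$. Under such a constraint, the propagation model becomes

$$\frac{di(t)}{dt} = r\,\frac{V}{N}\,\bigl(i(t) - i(t-\Delta T)\bigr)\bigl(1 - i(t)\bigr) \qquad (2\text{-}5)$$

The above equation can be derived by following the same procedure that derives Eq. (2-2), except that at time $t$ the number of infected hosts is $(i(t) - i(t-\Delta T))V$ instead of $i(t)V$.

Theorem 1. If

$$r\,\Delta T < \left(1 - \frac{v}{\alpha V}\right)\frac{N}{V},$$

the worm will be stopped before a percentage $\alpha$ of all vulnerable hosts are infected.

Proof: Each infected host sends $r\Delta T$ scan messages, and causes $r\Delta T\frac{V}{N}$ (or less due to duplicate hits) new infections. For the worm to stop, we need $r\Delta T\frac{V}{N} < 1$. The total number of infections before the worm stops is no more than

$$\sum_{i=0}^{\infty} v\left(r\Delta T\frac{V}{N}\right)^{i} = \frac{v}{1 - r\Delta T\frac{V}{N}}.$$

If $r\Delta T < \left(1 - \frac{v}{\alpha V}\right)\frac{N}{V}$, we have $\frac{v}{1 - r\Delta T\frac{V}{N}} < \alpha V$.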

failed connection requests from a host s is called the failure rate, which can be measured by monitoring the failure replies that are sent to s. To support DAW, the ISP requires its customer networks to return ICMP host-unreachable packets if the SYN packets are dropped by their routers or firewalls. It is a common practice on the Internet. It should be noted that our defense system does not require every customer network that blocks ICMP to forward the log messages, although doing so helps the performance of the system. Our defense system works well as long as a portion (e.g., 10%) of all customer networks does not block ICMP host-unreachable packets.

The failure rate measured for a normal host is likely to be low. For most Internet applications (www, telnet, ftp, etc.), a user normally types domain names instead of raw IP addresses to identify the servers. Domain names are resolved by the Domain Name System (DNS) for IP addresses. If DNS cannot find the address of a given name, the application will not issue a connection request. Hence, mistyping or stale web links do not result in failed connection requests. An ICMP host-unreachable packet is returned only when the server is off-line or the DNS record is stale, which are both uncommon for popular or regularly-maintained sites (e.g., Yahoo, Ebay, CNN, universities, governments, enterprises, etc.) that attract most of Internet traffic. Moreover, a frequent user typically has a list of favorite sites (servers) to which most connections are made. Since those sites are known to work most of the time, the failure rate for such a user is likely to be low. If a connection fails due to network congestion, it does not affect the measurement of the failure rate because no ICMP host-unreachable or RESET packet is returned.

To illustrate our argument, we measured the failure rates on three different domains of the University of Florida network. In our experiments, domain 1 consists of five Class C networks, domain 2 consists of one Class C
network, and domain 3 consists of two Class C networks.

Table 2-1. Failure rates of normal hosts

            avg. daily failure rate   worst daily failure rate   daily failure rate
            per host                  per host                   of the whole network
Domain 1    3.0                       43                         824
Domain 2    10.1                      41                         116
Domain 3    3.11                      63                         106

Table 2-1 clearly shows that failure rates for normal hosts are typically very low.

On the other hand, the failure rate measured for a worm-infected host is likely to be high. Unlike normal traffic, most connection requests initiated by a worm fail because the destination addresses are randomly picked, which are likely either not in use or not listening on the port that the worm targets. Consider the infamous Code Red worm. Our experiment shows that 99.96% of all connections made to random addresses at TCP port 80 fail. That is, the failure rate is 99.96% of the scanning rate. For worms targeting software that is less popular than web servers, this figure will be even higher. The relation between the scanning rate r_s and the failure rate r_f of a worm is r_f = (1 − V′

approach that restricts the failure rate will restrict the scanning rate, which slows down the worm propagation. A worm may be deliberately designed to have a slow propagation rate in order to evade the detection, which will be addressed in Section 2.3.9.

2.3.1 Objectives

This section presents a distributed anti-worm architecture (DAW), whose main objectives are

Figure 2-1. Distributed anti-worm architecture

a large ISP has sufficient incentive to deploy such a system in order to gain a marketing edge against its competitors. We assume that a significant portion of failure replies are not blocked within the ISP. If the ISP address space is densely populated, then it is required that a significant portion of TCP RESET packets are not blocked, which is normally the case. If the ISP address space is sparsely populated, then it is required that ICMP host-unreachable packets from a significant portion of addresses are not blocked, which can be easily satisfied. Because there are many unused addresses, the ISP routers will generate ICMP host-unreachable for those addresses. Hence, the ISP simply has to make sure its own routers do not filter ICMP host-unreachable until they are counted. If some customer networks block all incoming SYN packets except for a list of servers, their filtering routers should either generate ICMP host-unreachable for the dropped SYN packets or, in case ICMP replies are undesirable, send log messages to an ISP log station. Upon receipt of a log message, the log station sends an ICMP host-unreachable towards the sender of the SYN packet. When an ISP edge router receives an ICMP host-unreachable packet from the log station, it counts a connection failure and drops the packet.

As shown in Figure 2-1, DAW consists of two software components: a DAW agent that is deployed on all edge routers of the ISP and a management station that collects data from the agents. Each agent monitors the connection-failure replies sent to the customer network that the edge router connects to. It identifies the potential offending hosts and measures their failure rates. If the failure rate of a host exceeds a pre-configured threshold, the agent randomly drops a minimum number of connection requests from that host in order to keep its failure rate under the threshold. A temporal rate-limit algorithm and a spatial rate-limit algorithm are used to constrain any worm activity to a low level over the long term, while accommodating the temporary aggressive behavior of normal hosts. Each agent periodically reports the observed scanning activity and the potential offenders to the management station. A continuous, steady increase in the gross scanning activity raises the flag of a possible worm attack. The worm propagation is further slowed or even stopped by blocking the hosts with persistently high failure rates.

Each edge router reads a configuration file from the management station about what source addresses S and what destination ports P it should monitor and regulate. S consists of all or some addresses belonging to the customer network. It provides a means to exempt certain addresses from DAW for research or other purposes. P consists of the port numbers to be protected, such as 80/8080 for www, 23 for telnet, and 21 for ftp. It should exclude the applications that are not suitable for DAW; for example, a hypothetical application that runs with an extremely high failure rate, making normal hosts indistinguishable from worms targeting the application. While DAW is not designed for all services, it is particularly effective in protecting the services whose clients involve human interactions such as web browsing, which makes a greater distinction between normal hosts and worm-infected hosts.
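The role of S and P can be illustrated with a small sketch. The `MonitorConfig` class, its field names, and the example prefixes below are hypothetical and are not taken from the dissertation; the sketch only shows one plausible way an agent could decide whether a connection request falls under DAW, assuming S is expressed as customer prefixes minus an exemption list.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitorConfig:
    """Hypothetical per-router configuration: S (sources) and P (ports)."""
    networks: tuple       # customer prefixes that make up S
    exempt: frozenset     # addresses deliberately left out of S
    ports: frozenset      # P, the destination ports to protect

    def in_sources(self, addr: str) -> bool:
        """True if addr belongs to S (inside a prefix and not exempted)."""
        ip = ipaddress.ip_address(addr)
        if any(ip == ipaddress.ip_address(e) for e in self.exempt):
            return False
        return any(ip in ipaddress.ip_network(n) for n in self.networks)

    def monitors_request(self, src: str, dst_port: int) -> bool:
        """A connection request is regulated only if src is in S and the port is in P."""
        return self.in_sources(src) and dst_port in self.ports


if __name__ == "__main__":
    cfg = MonitorConfig(
        networks=("192.0.2.0/24",),            # documentation prefix, illustrative only
        exempt=frozenset({"192.0.2.9"}),       # e.g. a research scanner
        ports=frozenset({80, 8080, 23, 21}),   # www, telnet, ftp
    )
    print(cfg.monitors_request("192.0.2.5", 80))   # regulated
    print(cfg.monitors_request("192.0.2.9", 80))   # exempted host
    print(cfg.monitors_request("192.0.2.5", 25))   # port outside P
```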

Throughout the paper, when we say "a router receives a connection request", we refer to a connection request that enters the ISP from a customer network, with a source address in S and a destination port in P. When we say "a router receives a failure reply", we refer to a failure reply that leaves the ISP to a customer network, with a destination address in S and a source port in P if it is a TCP RESET packet.

This dissertation does not address the worm activity within a customer network. A worm-infected host is not restricted in any way from infecting other vulnerable hosts of the same customer network. DAW works only against inter-network infections. The scanning rate of an infected host s is defined as the number of connection requests sent by s per unit of time to addresses outside of the customer network where s resides.

If a customer network has m (> 1) edge routers with the same ISP, the DAW agent should be installed on all m edge routers. If some edge routers are with different ISPs that do not implement DAW, the network can be infected via those ISPs but is then restricted in spreading the worm to the customer networks of the ISPs that do implement DAW. For the purpose of simplicity, we do not consider multi-homed networks in the analysis.

Based on the data from all agents, the controller monitors the total number of potential offenders. A steady increase in the number of potential offenders is considered a possible on-going worm propagation. When this happens, the controller instructs the edge routers to block out a percentage of potential offenders (i.e., their IP addresses) that have the highest failure rates. The controller continues to double the percentage after each period (e.g., one minute) until the number of potential offenders stops increasing. The reason to block only a percentage instead of all potential offenders is as follows: the failure rates of some normal hosts may happen to exceed the threshold amidst a worm attack. With a
mix of normal hosts and infected hosts, the aggressive behavior of worms makes the infected hosts more likely to be blocked, while the normal hosts with marginally exceeding failure rates remain unblocked. On the other hand, if a normal host happens to run an automatic host-map tool in the middle of a worm attack, it may be blocked due to the high failure rate of its scanning activity. To prevent it from being blocked indefinitely, each blocked address should be unblocked after a certain period of time. An edge router keeps a log of the blocked addresses and the number of times they have been blocked recently (e.g., during the past month). When an address is repetitively blocked, the blocking time grows exponentially by $T = T_0 e^{k}$, where $T_0$ is the initial blocking time and $k$ is the number of prior blocks.

(2-4)) of any infected host.

Whenever the router receives a failure reply for s, it calls the following function, which updates f each time c is increased by 100. β is a parameter between 0 and 1.

Update_Failure_Rate_Record()
(1) c ← c + 1
(2) if (c is a multiple of 100)
(3)     f′ ← 100 / (the current system clock − t)
(4)     if (c = 100)
(5)         f ← f′
(6)     else
(7)         f ← β × f + (1 − β) × f′
(8)     t ← the current system clock

It is unnecessary to create individual failure-rate records for those hosts that occasionally make a few failed connections. Each edge router maintains a hash table H. Each table entry is a failure-rate record without the address field. When the router receives a failure reply, it hashes the destination address to a table entry and calls Update_Failure_Rate_Record() on that entry. Each entry therefore measures the combined failure rate of roughly A/|H| addresses, where A is the size of the customer network and |H| is the size of the hash table. Only when the rate of a hash-table entry exceeds a threshold (e.g., one per second) does the router create failure-rate records for individual addresses of the entry. A failure-rate record is removed if its counter c registers too few failed connections in a period of time.
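The record update and the two-level bookkeeping can be sketched as follows. This is an illustrative reading of the pseudocode, not the author's implementation: the class names, the use of CRC32 as the hash function, the symbol name `beta` for the smoothing parameter, and the policy of creating an individual record for the address whose reply pushes a hash entry over the threshold are all assumptions made for the example.

```python
import zlib


class FailureRateRecord:
    """Counter c, smoothed failure rate f, and timestamp t of the last rate update."""

    def __init__(self, now=0.0, beta=0.5):
        self.c = 0
        self.f = 0.0
        self.t = now
        self.beta = beta  # smoothing weight between 0 and 1

    def update(self, now):
        """Called once per failure reply; refreshes f every 100 replies."""
        self.c += 1
        if self.c % 100 == 0:
            elapsed = now - self.t
            f_new = 100.0 / elapsed if elapsed > 0 else float("inf")
            if self.c == 100:
                self.f = f_new
            else:
                self.f = self.beta * self.f + (1.0 - self.beta) * f_new
            self.t = now


class EdgeRouterTable:
    """Shared hash-table entries plus individual records for suspicious addresses."""

    def __init__(self, size, threshold, now=0.0, beta=0.5):
        self.entries = [FailureRateRecord(now, beta) for _ in range(size)]
        self.threshold = threshold   # failures per second that triggers tracking
        self.beta = beta
        self.individual = {}         # address -> FailureRateRecord

    def on_failure_reply(self, dst_addr, now):
        record = self.individual.get(dst_addr)
        if record is not None:
            record.update(now)
            return
        index = zlib.crc32(dst_addr.encode()) % len(self.entries)
        entry = self.entries[index]
        entry.update(now)
        if entry.f > self.threshold:
            # Assumed policy: start tracking the address seen on a hot entry.
            self.individual[dst_addr] = FailureRateRecord(now, self.beta)
```

With 100 failure replies spread over 10 seconds, a record's rate becomes 10 per second after the hundredth reply; the shared entry keeps memory proportional to |H| until an entry becomes hot.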

For each s ∈ F, the router reduces its failure rate below λ by rate-limiting the connection requests from s. A token bucket is used. Let size be the bucket size, tokens be the number of tokens, and time be a timestamp whose initial value is the system clock when the algorithm starts.

Upon receipt of a failure reply to s
(1) tokens ← tokens − 1
Upon receipt of a connection request from s
(2) Δt ← the current system clock − time
(3) tokens ← min{tokens + Δt × λ, size}
(4) time ← the current system clock
(5) if (tokens ≥ 1)
(6)     forward the request
(7) else
(8)     drop the request

It should be emphasized that the above algorithm is not a traditional token-bucket algorithm that buffers the oversized bursts and releases them at a fixed average rate. The purpose of our algorithm is not to shape the flow of incoming failure replies but to shape the "creation" of the failure replies. It ensures that the failure rate of any address in S stays below λ. This effectively restricts the scanning rate of any worm-infected host (Eq. 2-6).

This and other rate-limit algorithms are performed on individual addresses. They are not performed on the failure-rate records in the hash table; otherwise many addresses would be blocked due to one scan source mapped to the same hash-table entry.

One fundamental idea of DAW is to make the speed of worm propagation no longer determined by the worm parameters set by the attackers, but by the
DAW parameters set by the ISP administrators. In the following, we propose more advanced rate-limit algorithms to give the defenders greater control.

Upon receipt of a failure reply to s
(1) tokens ← tokens − 1
Upon receipt of a connection request from s
(2) Δt ← the current system clock − time
(3) if (c ≤ Ω/2)
(4)     tokens ← min{tokens + Δt × λ, size}
(5) else
(6)     λ′ ← (Ω − c − tokens) /

Update_Failure_Rate_Record(). Hence, (tokens + c) stays the same. Now consider the case where the router receives a connection request. The values of tokens before and after receiving the packet are denoted as tokens_before and tokens_after, respectively. Suppose tokens_before + c ≤ Ω. Based on Lines 6-7, we
have

tokens_after = min{tokens_before + Δt × λ′, size} ≤ tokens_before + Δt × λ′ ≤ tokens_before + (Ω − c − tokens_before) = Ω − c

Therefore, tokens_after + c ≤ Ω.

Next we prove that tokens ≥ −rT at the end of the day. Consider the case that tokens < 1 at the end of the day. Without losing generality, suppose tokens ≥ 1 before time t0, 0 ≤ tokens < 1 after t0 due to the execution of Line 1, and then tokens stays less than one for the rest of the day. After t0, all connection requests from s are blocked (Line 12). For all requests sent before t0 − T, the failure replies must have already arrived before t0. There are at most rT requests sent between t0 − T and t0. Therefore, there are at most rT failure replies arriving after t0. We know that tokens ≥ 0 at t0. Hence, tokens ≥ −rT at the end of the day.

Because tokens + c ≤ Ω holds at any time, c ≤ Ω + rT at the end of the day. The counter c equals the number of failure replies received during the day after the failure-rate record for s is created. Before that, there are at most Ω failure replies counted by the hash-table entry that s maps to. In the worst case all those replies are for s. Therefore, the total number of failure replies for s is no more than 2Ω + rT.

rT is normally small because the typical round trip delay across the Internet is in tens or hundreds of milliseconds. Hence, if Ω = 300, the average scanning rate of a worm is effectively limited to about 2Ω/D = 0.42/min. In comparison, Williamson's experiment showed that the scanning rate of Code Red was at least 200/second [12], which is more than 28,000 times faster. Yet, it took Code Red
hours to spread, suggesting the promising potential of using the temporal rate-limit algorithm to slow down worms. Additional system parameters that specify the maximum numbers of failed requests in longer time scales (week or month) can further increase the worm propagation time.
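The temporal rate-limit idea can be sketched in a few lines. Parts of the pseudocode are not recoverable from the text, so the sketch fills them with labeled assumptions: the reduced refill rate is taken to be the remaining daily budget divided by the time left in the day, the bucket is assumed to start full, and every failure reply is assumed to increment the daily counter. The class name, the parameter names `lam` and `omega`, and the numeric values in the simulation are illustrative only.

```python
class TemporalRateLimiter:
    """Token bucket that shapes the creation of failure replies for one host."""

    def __init__(self, lam, omega, size, now, day_end):
        self.lam = lam            # sustained failure-rate limit (tokens per second)
        self.omega = omega        # daily budget of failed requests
        self.size = size          # bucket size
        self.tokens = float(size) # assumption: the bucket starts full
        self.time = now
        self.day_end = day_end    # clock value at which the day ends
        self.c = 0                # failure replies counted so far today

    def on_failure_reply(self):
        self.tokens -= 1          # each failure consumes one token
        self.c += 1               # counter kept by the failure-rate record

    def on_connection_request(self, now):
        """Return True to forward the request, False to drop it."""
        dt = now - self.time
        if self.c <= self.omega / 2:
            rate = self.lam
        else:
            # Assumed form: spread the remaining budget over the rest of the day.
            remaining = self.day_end - self.time
            budget = self.omega - self.c - self.tokens
            rate = budget / remaining if remaining > 0 else 0.0
        self.tokens = min(self.tokens + dt * rate, self.size)
        self.time = now
        return self.tokens >= 1


def simulate_scanner(limiter, interval, day_end):
    """Scanner whose every forwarded request fails at once (worst case, zero delay)."""
    forwarded = 0
    steps = int(day_end / interval)
    for i in range(1, steps + 1):
        if limiter.on_connection_request(i * interval):
            forwarded += 1
            limiter.on_failure_reply()
    return forwarded


if __name__ == "__main__":
    day = 86400.0
    limiter = TemporalRateLimiter(lam=1.0, omega=300, size=10, now=0.0, day_end=day)
    print("requests forwarded in one day:", simulate_scanner(limiter, 0.5, day))
```

Passing `omega=float("inf")` keeps the limiter in its first branch for the whole day, which reduces it to the basic token-bucket behavior with rate `lam`.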

controls the total number of failed requests allowed for a customer network per day. It may vary for different customer networks based on their sizes. Once the number of addresses inserted into RFAL exceeds a threshold, the system starts to create failure-rate records for all addresses that receive failure replies, and activates the spatial algorithm. If there are too many records, it retains those with the largest counters. Let F′ (⊆ S) be the set of addresses whose counters exceed a small threshold (e.g., 50), which excludes the obvious normal hosts. The spatial rate-limit algorithm is the same as the temporal algorithm except that s, Ω, and c are replaced respectively by F′, Φ, and the total number of failure replies to F′ received after the spatial algorithm is activated. For any address s in F ∩ F′, the temporal rate-limit algorithm is first executed and then the spatial rate-limit algorithm is executed. The reason to apply the temporal algorithm is to prevent a few aggressive infected hosts from repeatedly reducing tokens to zero. On the other hand, if there are a large number of infected hosts, causing the spatial algorithm to drop most requests, the router should temporarily block the addresses whose failure-rate records have the largest counters.

The edge routers may be configured independently, with some running both the temporal and spatial algorithms and some running the temporal algorithm only. For example, the edge routers for the neighbor ISPs should have large Φ values or not run the spatial algorithm.

Due to space limitation, the proof is omitted; it is very similar to the proof for Theorem 2. m r′ T is likely to be small because both r′ and T are small.

The following analysis is based on a simplified model. A more general model will be used in the simulations. Suppose there are k customer networks, each with V/k vulnerable hosts. Once a vulnerable host is infected, we assume all other vulnerable hosts in the same customer network are infected immediately because DAW does not restrict the scanning activity within the customer network. Based on Theorem 3, the combined scanning rate of all vulnerable hosts in a customer network is (2Φ + m r′ T)/D ≈ 2Φ/D. Let j(t) be the percentage of customer networks that are infected by the worm. At time t, the number of infected customer networks is j(t)k, and the number of uninfected networks is (1 − j(t))k. The probability for one scan message to hit an uninfected vulnerable host and thus infect the network where the host resides is (1 − j(t))V/N. For an infinitely small period dt, j(t) changes by dj(t). During that time, there are $\frac{2\Phi}{D}\,j(t)\,k\,dt$ scan messages, each infecting a new network with the above probability. Hence

$$k\,dj(t) = \frac{2\Phi}{D}\,j(t)\,k\,\bigl(1 - j(t)\bigr)\frac{V}{N}\,dt, \qquad \frac{dj(t)}{dt} = \frac{2\Phi V}{DN}\,j(t)\,\bigl(1 - j(t)\bigr).$$

Suppose an ISP wants to ensure that the time for α percent of networks to be infected is at least γ days. For a worm that starts from one infected network, as in Eq. (2-4), the value of Φ should satisfy the following condition.

$$\Phi \le \frac{N}{2V\gamma}\ln\frac{\alpha(k-1)}{1-\alpha}$$

which is not related to how the worm behaves.

2-5). The worm propagates slowly under the temporal rate-limit algorithm and the spatial rate-limit algorithm. It gives the administrators sufficient time to study the traffic of the hosts to be blocked, perform analysis to determine whether a worm infection has occurred, and decide whether to approve or disapprove the blocking. Once the threat of a worm is confirmed, the edge routers may be instructed to reduce n, which increases the chance of fully stopping the worm.

Suppose a worm scans more than Ω addresses per day. The worm propagation can be completely stopped if each infected customer network makes less than one new infection on average before its infected hosts are blocked. The number of addresses scanned by the infected hosts from a single network during n days is about 2nΦ by Theorem 3. Each message has a maximum probability of V/N to infect a new host. Hence, the condition to stop a worm is

$$2n\Phi\,\frac{V}{N} < 1$$

The expected total number of infected networks is bounded by

$$\sum_{i=0}^{\infty}\left(2n\Phi\,\frac{V}{N}\right)^{i} = \frac{1}{1 - 2n\Phi\,\frac{V}{N}}$$

On the other hand, when $2n\Phi\frac{V}{N} \ge 1$, the worm may not be stopped by the above approach alone. However, the significance of blocking infected hosts should not be underestimated, as it makes the worm-propagation time longer and gives humans or other automatic tools (e.g., the one described below) more reaction time.

If the scanning rate of a worm is below Ω per day, the infected hosts will not be blocked. DAW relies on a different approach to address this problem. During each day, an edge router e measures the total number of connection requests, denoted as $n_c(e)$, and the total number of failure replies, denoted as $n_f(e)$. Note that only the requests and replies that match S and P (Section 2.3.3) are measured. The router sends these numbers to the management station at the end of the day. The management station measures the following ratio

$$\frac{\sum_{e \in E} n_f(e)}{\sum_{e \in E} n_c(e)}$$

where E is the set of edge routers. If the ratio increases significantly for a number of days, it signals a potential worm threat. That is because the increase in failed requests steadily outpaces the increase in issued requests, which is possibly the result of more and more hosts being infected by worms. The management station then instructs the edge routers to identify potential offenders whose counters (c) have the highest values.

Additional potential offenders are found as follows. After a vulnerable server is infected via a port that it listens to, the server normally scans the Internet on the same port to infect other servers. Based on this observation, when an edge router receives a RESET packet with a source address d and a source port p ∈ P to a destination address s ∈ S, it sends a SYN packet to check if s is also listening on port p. If it is, the router marks s as a
potential offender and creates a failure-rate record, which measures the number of failed connections from s. At the end of each day, the management station collects the potential offenders from all edge routers. Those with the largest counters are presented to the administrators for traffic analysis. The management station may instruct the edge routers to block them if the worm threat is confirmed. Although a blocked server cannot issue connection requests before it is unblocked, it can accept connection requests at any rate. Its role as a server is unchanged. An alternative to complete blocking is to apply a different, small rate-limit value (e.g., 50) on those addresses, which leaves room for false positives since the hosts can still make as many successful connections as they want, with occasional failures.
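The management station's daily check described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the report format, the function names, and the rule for what counts as a "significant" increase (a fixed growth factor over a fixed number of consecutive days) are hypothetical choices, since the text does not specify them.

```python
from typing import Dict, List, Tuple


def daily_failure_ratio(reports: Dict[str, Tuple[int, int]]) -> float:
    """Ratio of total failure replies to total connection requests.

    reports maps an edge-router id to (n_c, n_f) for one day. The sums are
    taken over all routers first, so busy routers weigh more than idle ones.
    """
    total_requests = sum(n_c for n_c, _ in reports.values())
    total_failures = sum(n_f for _, n_f in reports.values())
    if total_requests == 0:
        return 0.0
    return total_failures / total_requests


def sustained_increase(ratios: List[float], days: int = 3,
                       min_growth: float = 1.2) -> bool:
    """True if the ratio grew by at least min_growth on each of the last `days` days.

    Both `days` and `min_growth` are assumed tuning knobs.
    """
    if len(ratios) < days + 1:
        return False
    recent = ratios[-(days + 1):]
    return all(prev > 0 and cur / prev >= min_growth
               for prev, cur in zip(recent, recent[1:]))


# Hypothetical reports from two edge routers for one day.
today = {"edge-1": (100, 10), "edge-2": (300, 30)}
history = [0.10, 0.13, 0.17, 0.22]
print(daily_failure_ratio(today), sustained_increase(history))
```

A station that sees `sustained_increase` return true would then ask the edge routers for the hosts with the largest failure counters, as the text describes.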

PAGE 42

requirement for customer networks that block ICMP to generate FailLog will be relaxed in Section 2.3.10.

It has been assumed so far that every customer network that blocks ICMP will generate FailLog messages. We now relax this requirement. Consider a worm that targets one or multiple ports (e.g., web service). Let A be the IP address space that is not occupied by the hosts listening on those ports. Let $p_1$ be the percentage of A that is used by existing hosts. Let $p_2$ be the percentage of A that is not used but responds to connection requests with ICMP host-unreachable packets (generated by routers). This includes the ISP's reserved addresses for future expansion. Let $p_3$ be the percentage of A that is not used and does not respond with ICMP host-unreachable packets. Within $p_3$, let $p'_3$ be the percentage that generates FailLog. Then $p_1 + p_2 + p_3 = 1$ and $0 \leq p'_3 \leq p_3$. Eq. (2-6) was derived under the condition that $p'_3 = p_3$. If none or only some customer networks generate FailLog, the equation becomes rs1

PAGE 43

[1], which embodied a number of highly effective techniques that the future worms might use to infect the Internet in a very short period of time, leaving no room for human actions. In order to improve the chance of infection during the initial phase, the Warhol worm first scans a pre-made list of (e.g., 10000 to 50000) potentially vulnerable hosts, which is called a hit-list. After that, the worm performs permutation scanning, which divides the address space to be scanned among the infected hosts.

One way to generate a hit-list is to perform a scan of the Internet before the worm is released [1]. With DAW, it will take about $N/(2\Omega)$ days. Suppose $\Omega = 300$ and $N = 2^{32}$. That would be 19611 years. Even if the hit-list can be generated by a different means, the permutation scanning is less effective under DAW. For instance, even after 10000 vulnerable hosts are infected, they can only probe about $10000 \times 2\Omega = 6 \times 10^6$ addresses a day. Considering that the size of the address space is $2^{32} \approx 4.3 \times 10^9$, duplicate hits are not a serious problem, which means the gain by permutation scanning is small. Without DAW, it will be a different matter. If the scanning rate is 200/second, it takes less than 36 minutes for 10000 infected hosts to make $2^{32}$ probes, and duplicate hits are very frequent.

The Flash worm assumes a hit-list L including most servers that listen on the targeted port. Hence, random scanning is completely avoided; the worm scans only the addresses in L. As more and more hosts are infected, L is recursively split among the newly infected hosts, which scan only the assigned addresses from L. The Flash worm requires a prescan of the entire Internet before it is released. Such a prescan takes too long under DAW. In addition, each infected host can only scan about $2\Omega$ addresses per day, which limits the propagation speed of the worm if L is large.
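The figures quoted above follow from simple arithmetic, which the short Python check below reproduces. It assumes a 365-day year and a per-host budget of twice the daily limit of 300, as the passage states; the function and constant names are made up for the illustration.

```python
ADDRESS_SPACE = 2 ** 32      # N, the size of the IPv4 address space
DAILY_LIMIT = 300            # per-host daily limit used in the passage


def prescan_years(address_space: int, daily_limit: int,
                  days_per_year: int = 365) -> float:
    """Years one rate-limited host needs to scan the whole address space."""
    return address_space / (2 * daily_limit) / days_per_year


def probes_per_day(infected_hosts: int, daily_limit: int) -> int:
    """Total addresses that rate-limited infected hosts can probe in a day."""
    return infected_hosts * 2 * daily_limit


def minutes_to_cover(address_space: int, infected_hosts: int,
                     scans_per_second: int) -> float:
    """Minutes for unrestricted hosts to issue address_space probes in total."""
    return address_space / (infected_hosts * scans_per_second) / 60


print(int(prescan_years(ADDRESS_SPACE, DAILY_LIMIT)))        # years under DAW
print(probes_per_day(10_000, DAILY_LIMIT))                   # probes per day under DAW
print(round(minutes_to_cover(ADDRESS_SPACE, 10_000, 200), 1))  # minutes without DAW
```

The contrast between roughly six million probes per day and an address space of more than four billion is what makes duplicate hits rare under DAW.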

PAGE 44

Figure 2-2. Worm-propagation comparison

Section 2.3.3. Each table entry contains a source address, a source port, a destination address, and a destination port, identifying a connection request. Only those failure replies that match the table entries are counted. An alternative approach is to extend the failure-rate record by adding two fields: one (x) counting the number of connection requests from s and the other (y) counting the number of successful connections, i.e., TCP SYN/ACK packets sent to s, where s is the address field of the record. An invariant is maintained such that the number of failed connections plus the number of successful connections does not exceed the number of connection requests, i.e., $c + y \leq x$. A failure reply is counted ($c \leftarrow c + 1$) only when the invariant is not violated.

Figure 2-2 shows how the rate-limit algorithms slow down the worm propagation. The simulation parameters are given as follows: $\lambda = 1$/sec, $\Omega = 300$, $\Phi = 3000$, and n = 7 days. The number of customer networks is k = 10000. The average number of

PAGE 45

vulnerable hosts per customer network is z = 10. The numbers of vulnerable hosts in different customer networks follow an exponential distribution, suggesting a scenario where most customer networks have ten or fewer public servers, but some have large numbers of servers. Suppose the worm uses a Nimda-like algorithm that aggressively searches the local-address space. We assume that once a vulnerable host of a customer network is infected, all vulnerable hosts of the same network are infected shortly.

Figure 2-2 compares the percentage i(t) of vulnerable hosts that are infected over time t in five different cases: 1) no algorithm is used, 2) the basic rate-limit algorithm is implemented on the edge routers, 3) the temporal rate-limit algorithm is implemented, 4) both the temporal and spatial rate-limit algorithms are implemented, or 5) DAW (i.e., Temporal, Spatial, and blocking persistent scanning sources) is implemented. Note that all algorithms limit the failure rates, not the request rates, and the spatial rate-limit algorithm is applied only on the hosts whose failure counters exceed a threshold of 50.

Two graphs show the simulation results in different time scales. The upper graph is from 0 to 18 hours, and the lower is from 0 to 100 days. The shape of the curve "No Algorithm" depends on the worm's scanning rate, which is 10/sec in our simulation. The other four curves are independent of the worm's scanning rate; they depend only on DAW's parameters, i.e., $\lambda$, $\Omega$, $\Phi$, and n. The figure shows that the basic rate-limit algorithm slows down the worm propagation from minutes to hours, while the temporal rate-limit algorithm slows down the propagation to tens of days. The spatial rate-limit algorithm makes further improvement on top of that: it takes the worm 80 days to infect 5% of the vulnerable hosts, leaving sufficient time for human intervention. Moreover, with persistent scanning sources being blocked after 7 days, DAW is able to stop the worm propagation at i(t) = 0.000034.

PAGE 46

                 rate-limit parameter value
    k      z     1000     3000     5000     7000
 5000     10    350.3    116.8     69.6     50.2
 5000     20    237.2     79.1     47.2     33.9
10000     10    190.1     63.5     38.1     27.1
10000     20    127.9     42.5     25.5     18.3
15000     10    133.6     44.4     26.3     19.3
15000     20     89.3     29.7     17.8     12.7
20000     10    103.3     34.2     20.6     14.6
20000     20     68.9     22.9     13.8     10.0

Table 2-2. 5% propagation time (days) for "Temporal + Spatial"

Table 2-2 shows the time it takes the worm to infect 5% of vulnerable hosts (called the 5% propagation time) under various conditions with Temporal + Spatial implemented. Depending on the size (k and z) of the ISP, the propagation time ranges from 10.0 days to 350.3 days. To ensure a large propagation time, a very large ISP may partition its customers into multiple defense zones of modest sizes. DAW can be implemented on the boundary of each zone, consisting of the edge routers to the customer networks of the zone and the internal routers connecting to other zones.

Figure 2-3 shows the performance of the temporal rate-limit algorithm with respect to its rate-limit parameter. As expected, the propagation time decreases when the parameter increases. The algorithm performs very well for modest-size ISPs (or zones). When k = 10000, z = 10 and the parameter is 3000, the 5% propagation time is 63.6 days. Figure 2-4 shows the performance of the spatial rate-limit algorithm (alone) with respect to its rate-limit parameter. The algorithm works well for modest-size ISPs (or zones) even for large parameter values. When k = 10000, z = 10 and the parameter is 7000, the 5% propagation time is 27.2 days. The performance of the two algorithms is comparable when $\Phi = z\Omega$, where the total temporal rate limit of the local infected hosts is equal to the spatial rate limit. As shown in the figures, if $\Phi > z\Omega$, the temporal algorithm works better; if $\Phi < z\Omega$, the spatial algorithm works better.
PAGE 47

Figure 2-3. Effectiveness of the temporal rate-limit algorithm for DAW
Figure 2-4. Effectiveness of the spatial rate-limit algorithm for DAW
Figure 2-5. Stop worm propagation by blocking
Figure 2-6. Propagation time before the worm is stopped

Because DAW blocks persistent scanning sources, it may stop the worm propagation, depending on the value of n. Figure 2-5 shows the final infection percentage among the vulnerable hosts before all infected hosts are blocked. Even when a large n is selected and the final infection percentage is large, the blocking is still very useful because it considerably slows down the worm propagation as shown in Figure 2-6, where only the propagation times for larger-than-5% final infections are shown. For instance, when k = 20000, z = 20 and n = 10, the final infection percentage is close to 100%. However, it will take the worm 71.7 days to achieve that.

PAGE 48

3.1.1 Motivation

The spread of a malicious worm is often an Internet-wide event. A previously unknown worm is fundamentally difficult to detect for two reasons. First, the Internet consists of a large number of autonomous systems that are managed independently, which means a coordinated defense system covering the whole Internet is extremely difficult to realize. Second, it is hard to distinguish the worm activities from the normal activities, especially during the initial spreading phase. Although the worm activities become apparent after a significant number of hosts are infected, it will be too late at that time due to the exponential growth rate of a typical worm [19, 22, 21, 18, 17]. In contrast to some existing defense systems that require large-scale coordinated efforts, we describe a double-honeypot system that allows an individual autonomous system to detect the ongoing worm threat without external assistance. Most importantly, the system is able to detect new worms that have not been seen before.

Before presenting the architecture of our double-honeypot system, we give a brief introduction to honeypots. Developed in recent years, a honeypot is a monitored system on the Internet serving the purpose of attracting and trapping attackers who attempt to penetrate the protected servers on a network [28]. Honeypots fall into two categories [29]. A high-interaction honeypot operates a real operating system and one or multiple applications. A low-interaction honeypot simulates one or multiple real systems. In general, any network activities observed at honeypots are considered suspicious and it is possible to capture the latest intrusions based

PAGE 49

on the analysis of these activities. However, the information provided by honeypots is often mixed with normal activities as legitimate users may access the honeypots by mistake. Hours or even days are necessary for experts to manually scrutinize the data logged by honeypots, which is too slow against worm attacks because a worm may infect the whole Internet in such a period of time.

We propose a double-honeypot system to detect new worms automatically. A key novelty of this system is the ability to distinguish worm activities from normal activities without the involvement of experts. Furthermore, it is a purely local system. Its effectiveness does not require a wide deployment, which is a great advantage over many existing defense systems [2, 12].

The basic idea is motivated by the worm's self-replication characteristics. By its nature, a worm-infected host will try to find and infect other victims, which is how a worm spreads itself. Therefore, outbound connections initiated from the compromised hosts are a common characteristic shared by all worms. Suppose we deliberately configure a honeypot to never initiate any outbound connections. Now if the honeypot suddenly starts to make outbound connections, it can only mean that the honeypot is under foreign control. If the honeypot can be compromised, it might try to compromise the same systems on the Internet in the way it was compromised. Therefore, the situation is either a real worm attack or can be turned into a worm attack if the attacker behind the scene chooses to do so. We shall treat the two equally as a worm threat.

Figure 3-1 illustrates the double-honeypot system. It is composed of two independent honeypot arrays, the inbound array and the outbound array, together with two address translators, the gate translator and the internal translator. A honeypot array consists of one or multiple honeypots, which may run on separate physical machines or on virtual machines simulated by the same computer [29].

PAGE 50

Figure 3-1. Using double-honeypot detecting Internet worms

Each honeypot in the array runs a server identical to a local server to be protected. A honeypot in the inbound (outbound) array is called an inbound (outbound) honeypot. Our goal is to attract a worm to compromise an inbound honeypot before it compromises a local server. When the compromised inbound honeypot attempts to attack other machines by making outbound connections, its traffic is redirected to an outbound honeypot, which captures the attack traffic.

An inbound honeypot should be implemented as a high-interaction honeypot that accepts connections from the outside world in order to be compromised by worms that may pose a threat to a local server. An outbound honeypot should be implemented as a low-interaction honeypot so that it can remain uninfected when it records the worm traffic. In addition to performing the functionalities of the local system, it checks and records all network traffic in a connection initiated from an inbound honeypot. The network traffic, which is directly related to worm activities from the outside, will be analyzed to identify the signatures of the worms.

The gate translator is implemented at the edge router between the local network and the Internet. It samples the unwanted inbound connections, and redirects the sampled connections to inbound honeypots that run the server

PAGE 51

software the connections attempt to access (e.g., connections to ports 80/8080 are redirected to a honeypot running a web server). There are several ways to determine which connections are "unwanted". The gate translator may be configured with a list of unused addresses. Connections to those addresses are deemed to be unwanted. It is very common nowadays for an organization to expose only the addresses of its public servers. If that is the case, the gate translator can be configured with those publicly-accessible addresses. When a connection for a specific service (e.g., to port 80 for web access) is not made to one of the servers, it is unwanted and redirected to an inbound honeypot.

Suppose the size of the local address space is N and there are h publicly-accessible servers on a particular destination port. Typically, $N \gg h$. For a worm which randomly scans that port, the chance for it to hit an inbound honeypot first is $(N-h)/N$, and the chance for it to hit a protected server first is $h/N$. With a ratio of $(N-h)/h$, it is almost certain that the worm will compromise the inbound honeypot before it does any damage to a real server within the network.

Once an inbound honeypot is compromised, it will attempt to make outbound connections. The internal translator is implemented at a router that separates the inbound array from the rest of the network. It intercepts all outbound connections from an inbound honeypot and redirects them to an outbound honeypot of the same type, which will record and analyze the traffic.

We give the following example to illustrate how the system works. Suppose that the IP address space of our network is 128.10.10.0/128, with one public web server Y to be protected. The server's IP address is 128.10.10.1. Suppose an attacker outside the network initiates a worm attack against systems of type Y. The worm scans the IP address space for victims. It is highly probable that an unused IP address, e.g. 128.10.10.20, will be attempted before 128.10.10.1. The gate translator redirects the packets to an inbound honeypot of type Y, which is

PAGE 52

subsequently infected. As the compromised honeypot participates in spreading the worm, it will reveal itself by making outbound connections and provide the attack traffic that will be redirected to an outbound honeypot of the system.

After an outbound honeypot has captured a worm, the payload of the worm can be directly considered as a signature. Using traffic filtering with the signature at the edge of the network will protect the hosts from being attacked by the same worm. In our system, the payload of the worm will also be forwarded to a signature center. If a worm with polymorphism has been used during the attack, the signature center will generate one single signature for all the variants of one polymorphic worm by the algorithms discussed later. This signature will not only be able to match those variants whose payloads have been captured before, it can also match those variants not seen before.

We should emphasize that the proposed double-honeypot system is greatly different from a conventional honeypot. A conventional system receives traffic from all kinds of sources, including traffic from the normal users. It is a difficult and tedious task to separate attack traffic from normal traffic, especially for attacks that have not been seen before. More often than not, the experts rush to search the recorded data for traces of attack traffic only after the damage of the new attacks has surfaced. In our system, when an outbound honeypot receives packets from an inbound honeypot, it knows for sure that the packets are from a malicious source. The outbound honeypot does not have to face the potentially huge amount of normal background traffic that a conventional honeypot may receive.
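The gate translator's decision rule and the first-hit probabilities from the example can be sketched as follows. This is a simplified Python illustration under assumptions: the prefix length /24 is chosen here only so that the example network parses as a valid IPv4 network, the honeypot labels and function names are hypothetical, and the sampling of unwanted connections is left out.

```python
import ipaddress

# Assumed configuration for the illustration only.
LOCAL_NET = ipaddress.ip_network("128.10.10.0/24")
PUBLIC_SERVERS = {80: {ipaddress.ip_address("128.10.10.1")}}
INBOUND_HONEYPOTS = {80: "inbound-honeypot-web", 8080: "inbound-honeypot-web"}


def first_hit_probabilities(n_addresses: int, n_servers: int):
    """Chance that a random scan hits a honeypot first, and a real server first."""
    return (n_addresses - n_servers) / n_addresses, n_servers / n_addresses


def gate_translate(dst: str, port: int) -> str:
    """Classify an inbound connection by destination address and port."""
    addr = ipaddress.ip_address(dst)
    if addr not in LOCAL_NET:
        return "not-local"
    if addr in PUBLIC_SERVERS.get(port, set()):
        return "forward"                      # wanted: goes to the real server
    if port in INBOUND_HONEYPOTS:
        return "redirect:" + INBOUND_HONEYPOTS[port]   # unwanted: to a honeypot
    return "unhandled"                        # no honeypot runs this service


p_honeypot, p_server = first_hit_probabilities(LOCAL_NET.num_addresses, 1)
print(p_honeypot, p_server)
print(gate_translate("128.10.10.20", 80))
print(gate_translate("128.10.10.1", 80))
```

With one public server among 256 addresses, a random scan reaches a honeypot-backed address first in more than 99% of cases, which matches the intuition behind the (N-h)/h ratio.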

PAGE 53

Figure 3-2. A decryptor example of a worm.

over the current systems because the defense can be carried out automatically before new worms deal significant damage to the network.

The attackers will try every possible way to extend the lifetime of Internet worms. In order to evade the signature-based system, a polymorphic worm appears differently each time it replicates itself. This section discusses the polymorphism of Internet worms, while the next section provides a solution against some common polymorphism techniques.

There are many ways to make polymorphic worms. One technique relies on self encryption with a variable key. It encrypts the body of a worm, which erases both signatures and statistical characteristics of the worm byte string. A copy of the worm, the decryption routine, and the key are sent to a victim machine, where the encrypted text is turned into a regular worm program by the decryption routine, for example, the code presented in Figure 3-2 [38]. The program is then executed to infect other victims and possibly damage the local system. Figure 3-3 illustrates a simple polymorphic worm using the same decryptor. The worm body attached after the decryptor part appears differently based on different keys.

PAGE 54

Figure 3-3. Different variants of a polymorphic worm using the same decryptor

While different copies of a worm look different if different keys are used, the encrypted text tends to follow a uniform byte frequency distribution [39], which itself is a statistical feature that can be captured by anomaly detection based on its deviation from normal-traffic distributions [4, 14]. Moreover, if the same decryption routine is always used, the byte sequence in the decryption routine can serve as the worm signature, if we are able to identify the decryption routine region, which is invariant over different instances of the same Internet worm.

A more sophisticated method of polymorphism is to change the decryption routine each time a copy of the worm is sent to another victim host. This can be achieved by keeping several decryption routines in a worm. When the worm tries to make a copy, one routine is randomly selected and other routines are encrypted together with the worm body. Figure 3-4 is an example of this case. To further complicate the problem, the attacker can change the entry point of the program such that the decryption routine will appear at different locations of the traffic payload, as is shown in Figure 3-5. The number of different decryption routines is limited by the total length of the worm. For example, consider a buffer-overflow attack that attempts to copy malicious data to an unprotected buffer. Over-sized malicious data may cause severe memory corruption outside of the buffer, leading to system crash

PAGE 55

Figure 3-4. Different variants of a polymorphic worm using different decryptors
Figure 3-5. Different variants of a polymorphic worm with different decryptors and different entry point

PAGE 56

Figure 3-6. Different variants of a polymorphic worm with garbage-code insertion

and spoiling the compromise. Given a limited number of decryption routines, it is possible to identify all of them as attack signatures after enough samples of the worm have been obtained.

Another polymorphism technique is called garbage-code insertion. It inserts garbage instructions into the copies of a worm. For example, a number of nop (i.e., no operation) instructions can be inserted into different places of the worm body, thus making it more difficult to compare the byte sequences of two instances of the same worm. Figure 3-6 [38] is an example of this scenario.

PAGE 57

The level of polymorphism in this type of worm is decided by the ratio of the length of the garbage instruction region to the total length of the worm. For worms with a moderate ratio, there is a good chance that regions sharing the same byte sequence exist in different instances of the worm, which in turn can serve as the signature of the worm. With an increased length of garbage, the overlapped regions will be shortened and it is problematic to identify them. However, from the statistical point of view, the frequencies of the garbage instructions in a worm can differ greatly from those in normal traffic. If that is the case, anomaly-detection systems [4, 14] can be used to detect the worm. Furthermore, some garbage instructions such as nop can be easily identified and removed. For better obfuscated garbage, techniques of executable analysis [32] can be used to identify and remove those instructions that will never be executed.

The instruction-substitution technique replaces one instruction sequence with a different but equivalent sequence. Unless the substitution is done over the entire code without compromising the code integrity (which is a great challenge by itself), it is likely that shorter signatures can be identified from the stationary portion of the worm. The code-transposition technique changes the order of the instructions with the help of jumps. The excess jump instructions provide a statistical clue, and executable-analysis techniques can help to remove the unnecessary jump instructions. Finally, the register-reassignment technique swaps the usage of the registers, which causes extensive "minor" changes in the code sequence. These techniques are best illustrated in Figure 3-7 [38].

The space of polymorphism techniques is huge and still growing. With the combinations of different techniques, a cure-all solution is unlikely. The pragmatic strategy is to enrich the pool of defense tools, with each being effective against certain attacks. The current defense techniques fall in two main categories,

PAGE 58

Figure 3-7. Different variants of a polymorphic worm with several different polymorphic techniques

PAGE 59

misuse/signature matching and anomaly detection. The former matches against known patterns in the attack traffic. The latter matches against the statistical distributions of the normal traffic. We propose a hybrid approach based on a new type of signatures, consisting of position-aware byte frequency distributions. Such signatures can tolerate extensive, "local" changes as long as the "global" characteristics of the signature remain. Good examples are polymorphism caused by register reassignment and modest instruction substitution. We do not claim that such signatures are suitable for all attacks. On the other hand, they may work with executable-analysis techniques to characterize certain statistical patterns that appear after garbage instructions and excess jumps are removed.

In this paper, we focus on solving the problem of moderate polymorphism. While we admit that there might exist no single solution to all these problems, it is quite possible that polymorphism can be at least partially solved if no extreme case is involved. More importantly, our system is still very useful in dealing with even the most extreme cases. First of all, our double-honeypot system is able to automatically capture the different instances of the worm. Although a unified signature matching all instances of the worm seems unlikely in extreme cases, capturing the samples of the worm will still help in analyzing the behavior of the attack and providing an early warning of it. Second, although it might be true in some cases that human analysis can find signatures that do not conform with our model, in most cases it is laborious, empirical, and time-consuming. Our algorithm, on the other hand, can detect the most subtle signatures based on the model and is more reliable than human analysis. Finally, our system can cooperate with other defense systems, e.g., anomaly-based systems, in order to be more effective.

We use the invariant region of the worm to serve as the signature because we are dealing with Internet worms. Other malicious code such as viruses can be

PAGE 60

detected after the machine has been infected by scanning the programs, because a virus relies on the execution of the infected programs. Internet worms, however, need to be identified before the infection has been done, as the goal of the worm is to spread over the Internet as quickly as possible. While some techniques, e.g. Christodorescu, can successfully identify polymorphic malicious code by looking for semantic equivalence, they are inappropriate for worm detection as they cannot be performed in real time. In the next section, we use iterative algorithms to identify the invariant region from byte sequences of polymorphic worms.

The basic premise of our model about the signature is that the byte frequency distributions in the significant region, which in our case is the region that matches the signature approximately, should be greatly different from the rest of the worm body and from normal, legitimate traffic payloads. The reason is that they carry different functionalities. For example, in a polymorphic worm, the significant region is responsible for the true malicious operations while the rest of the sequence only serves as a camouflage to elude the defense system. As a result, the rest of the sequence will most likely have the same or a similar byte frequency distribution as the legitimate connections. Even if an attacker tries to hide the true worm body by attaching legitimate payloads, it is always difficult to design a purely malicious sequence that is indistinguishable from the normal connection. Therefore, the byte frequency distributions related to this part should be under-represented in the rest of the worm body. If we are able to extract a similar region from each of the sampled instances of the worm, where the frequency distribution is greatly different from the rest of the sequence, then this region should potentially be the significant region and its probabilistic multinomial byte frequency distribution will be the signature we are looking for.

PAGE 61

The attackers may not act as we have expected above. For example, they may only insert several nop operations into each instance of the worm randomly without attaching the camouflage part. Our argument still holds in this case. Since nop does not appear frequently in normal sessions, the sequence of the malicious connection will have a high frequency of nop operations. The probability of nop at each position will be much larger than in a normal incoming connection sequence. That is enough to constitute a signature with a width equal to the length of the instances of the worm.

3.3.1 Background and Motivation

Most deployed defense systems against Internet worms are signature-based. They rely on the exact matching of the packet payload with a database of fixed signatures. Though effective in dealing with the known attacks, they fail to detect new worms or variants of the old worms, especially the polymorphic worms whose instances can be carefully crafted to circumvent the signatures [32]. Moreover, manually identifying the signatures may take days if not longer.

To address these problems, several anomaly-based systems [4, 14] use the byte frequency distribution (BFD) to identify the existence of a worm. Their basic approach is to derive a byte frequency distribution from the normal network traffic. When a new incoming connection is established, the payload of the packets is examined. The byte frequency distribution of the connection is computed and compared with the byte frequency distribution derived from the normal traffic. A large deviation will be deemed suspicious. The problem is that an intelligent attacker could easily cheat the system by attaching the worm body to a lengthy normal, legitimate session. Since the majority of the payload is from legitimate operations, its byte frequency distribution will not vary much from the

PAGE 62

normal traffic. As the worm byte sequence is diluted in normal traffic, its statistical characteristics are smoothed out.

Both signature-based and anomaly-based systems have their pros and cons. Compared to the anomaly-based systems, signature-based systems have their own advantages. Since signature-based systems match the signature of the worm with only the corresponding segment of the whole worm body, attaching normal payloads to the end of the worm body does little to reduce the chance of being detected. However, if only exact matching is used to compare the signature with the payloads, a slight change of the malicious part in the whole worm body means a mismatch to the signature. In other words, current signature-based systems lack the flexibility of the anomaly-based systems. In addition to the drawbacks above, the signature-based system is not robust enough against the different techniques employed by intelligent attackers. For example, an attacker might increase the number of garbage instructions inserted into the worm so that each signature by definition is reduced to only several bytes, as is shown in Figure X. An automatic signature-extraction system will then dramatically increase the false positives, as any incoming connection might well contain such a short signature.

Our system inherits the positive aspects of both signature-based and anomaly-based systems. It is based on a new defense technique that is complementary to the existing ones. We define a relaxed, inexact form of signatures that have the flexibility against certain polymorphism. The new signature is called the position-aware distribution signature (PADS for short). It includes a byte frequency distribution (instead of a fixed value) for each position in the signature "string". The idea is to focus on the generic pattern of the signature while allowing some local variation.
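The dilution weakness of whole-payload byte frequency distributions noted above can be illustrated numerically. The Python sketch below is an assumption-laden toy: the "normal" traffic is a repeated HTTP request, the "unusual" payload is an arbitrary mix of non-ASCII byte values standing in for a worm body, and the L1 distance is just one possible deviation measure, not necessarily the one used by the cited systems.

```python
from collections import Counter
from typing import List


def byte_frequency_distribution(payload: bytes) -> List[float]:
    """Relative frequency of each of the 256 byte values in the payload."""
    if not payload:
        return [0.0] * 256
    counts = Counter(payload)
    return [counts.get(b, 0) / len(payload) for b in range(256)]


def l1_distance(p: List[float], q: List[float]) -> float:
    """Sum of absolute differences between two distributions (0 to 2)."""
    return sum(abs(a - b) for a, b in zip(p, q))


# Toy data, for illustration only.
normal = b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n" * 20
unusual = bytes([0x90]) * 40 + bytes(range(200, 240))

baseline = byte_frequency_distribution(normal)
alone = l1_distance(byte_frequency_distribution(unusual), baseline)
diluted = l1_distance(byte_frequency_distribution(unusual + normal * 10), baseline)
print(f"deviation alone: {alone:.3f}, after padding with normal traffic: {diluted:.3f}")
```

The deviation of the padded payload shrinks in proportion to the share of unusual bytes, which is why a per-position signature over a short region is less sensitive to padding than a whole-payload distribution.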

PAGE 63

Consider a polymorphic worm with register reassignment (Section 3.2). Because registers are used extensively in executables, swapping registers is effective against traditional signatures. However, when a signature is expressed in position-aware distributions, not only are the static elements in the executable captured, but the set of likely values for the variable elements is also captured. Hence, PADS allows a more precise measurement of "matching". A similar example is instruction substitution, where the mutually replaceable instructions (or sequences) can be represented by the position-aware distributions.

To better explain the concept, we give an example here. Suppose a worm carries the word "worm" in its byte sequence. In order to avoid detection, the variants of the worm may change it to "w0rm" or "norm". Counting the number of byte appearances at each position will give us the following table. Based on this model, when a new incoming connection is established, it is possible to check the byte sequence in the connection session and decide the similarity between the sequence and the previously captured "worm", "w0rm", "dorm", etc.

The goal of our system is to use double honeypots to capture the worm attack traffic, based on which PADS is derived and used to detect inbound worm variants. It provides a quick and automatic response that complements the existing approaches involving human experts. Based on PADS, the defense system will be able to identify a new variant of a worm at its first occurrence, even if such a variant has not been captured by the system previously. That means our system is able to raise alerts for attacks that successfully elude the existing systems, hence a significant decrease in false negatives. Besides the advantages over the traditional signature-based system, which needs the assistance of human experts, our proposed system is especially useful in special cases where an anomaly-based system may fail.
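The per-position counting in the "worm" example can be written out directly. The following Python sketch uses the four variants named in the passage as sample data; the function names are hypothetical and the samples are assumed to have equal length.

```python
from collections import Counter
from typing import Dict, List


def position_counts(samples: List[str]) -> List[Counter]:
    """For each position, count how often each character occurs there."""
    width = len(samples[0])
    if any(len(s) != width for s in samples):
        raise ValueError("all samples must have the same length")
    return [Counter(s[p] for s in samples) for p in range(width)]


def position_frequencies(samples: List[str]) -> List[Dict[str, float]]:
    """Turn the per-position counts into relative frequencies."""
    n = len(samples)
    return [{ch: c / n for ch, c in col.items()} for col in position_counts(samples)]


variants = ["worm", "w0rm", "norm", "dorm"]
for p, col in enumerate(position_counts(variants)):
    print(p, dict(col))
```

Positions 2 and 3 are fixed ("r" and "m" in every sample) while positions 0 and 1 carry a small set of likely values, which is exactly the kind of information a position-aware distribution keeps and a fixed string discards.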

PAGE 64

   b       0       1       2     ...       9      10
0x00   0.001   0.001   0.001     ...   0.500   0.100
0x01   0.001   0.001   0.001     ...   0.200   0.500
0x02   0.005   0.001   0.001     ...   0.001   0.100
 ...     ...     ...     ...     ...     ...     ...
0xfe   0.100   0.001   0.001     ...   0.001   0.001
0xff   0.001   0.700   0.700     ...   0.001   0.001

Table 3-1. An example of a PADS signature with width W = 10

Table 3-1 gives an example of a PADS signature with width W = 10.

When a new connection is established, we need to decide whether the payload of the connection is a variant of the worm or not. It is necessary to define a similarity scale between a probabilistic byte frequency distribution and a byte sequence. Consider a set of byte sequences $S = \{S_1, S_2, \ldots, S_n\}$, where $S_i$, $1 \leq i \leq n$, is the byte sequence of an incoming connection. We want to decide whether $S_i$ is a variant of the worm by matching it against a signature $\Theta$. Let $l_i$ be the length of $S_i$. Let $S_{i,1}, S_{i,2}, \ldots, S_{i,l_i}$ be the bytes of $S_i$ at positions $1, 2, \ldots, l_i$, respectively. Let seg($S_i$, $a_i$) be the W-byte segment of $S_i$ starting from position $a_i$. The matching

PAGE 65

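score of seg($S_i$, $a_i$) with the anomalous signature is defined as

$$M(\Theta, S_i, a_i) = \prod_{p=1}^{W} f_p(S_{i,a_i+p-1})$$

which is the probability for seg($S_i$, $a_i$) to occur, given the distribution $(f_1, f_2, \ldots, f_W)$ of the worm. Similarly, the matching score of seg($S_i$, $a_i$) with the normal signature is defined as

$$\overline{M}(\Theta, S_i, a_i) = \prod_{p=1}^{W} f_0(S_{i,a_i+p-1})$$

The two scores are combined into the ratio

$$\Lambda(\Theta, S_i, a_i) = \frac{M(\Theta, S_i, a_i)}{\overline{M}(\Theta, S_i, a_i)}$$

and the matching score of the whole byte sequence is

$$\Omega(\Theta, S_i) = \max_{a_i=1}^{l_i-W+1} \frac{1}{W}\log\Lambda(\Theta, S_i, a_i)$$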

PAGE 66

The W-byte segment that maximizes the matching score is called the significant region of $S_i$, which is denoted as $R_i$. The matching score of the significant region is the matching score of the whole byte sequence by definition.

For any incoming byte sequence $S_i$, if its matching score is greater than a threshold value, a warning about a (possibly variant) worm attack is issued. Additional defense actions may be carried out, e.g., rejecting the connection that carries $S_i$. The threshold is typically set at 0. From the definition of the matching score, a value above zero means that $S_i$ is closer to the anomalous signature $(f_1, f_2, \ldots, f_W)$; a value below zero means that $S_i$ is closer to the normal signature $f_0$.

Next we discuss how to calculate the signature based on the previously collected instances of a worm. Suppose we have successfully obtained a number n of variants of a worm from the double-honeypot system. Each variant is a byte sequence with a variable length. It contains one copy of the worm, possibly embedded in the background of a normal byte sequence. Now let $S = \{S_1, S_2, \ldots, S_n\}$ be the set of collected worm variants. Our goal is to find a signature with which the matching scores of the worm variants are maximized. We attempt to model it as the classical "missing data problem" in statistics and then apply the expectation-maximization (EM) algorithm to solve it.

To begin with, we know neither the signature, which is the underlying unknown parameter, nor the significant regions of the variants, which are the missing data. Knowing one would allow us to compute the other. We have just shown how to compute the significant region of a byte sequence if the signature is known. Next we describe how to compute the signature if the significant regions of the variants are known.

First we compute the byte frequency distribution for each byte position of the significant regions. At position $p \in [1 \ldots W]$, the maximum likelihood estimation of the frequency $f_p(x)$, $x \in [0 \ldots 255]$, is the number $c_{p,x}$ of times that x appears at

PAGE 67

position p of the significant regions, divided by n. With a pseudo-count added to every byte value so that no frequency is estimated as zero, the estimate becomes

$$f_p(x) = \frac{c_{p,x} + d}{n + 256d} \qquad (3\text{-}3)$$

where d is a small predefined pseudo-count number.

We mentioned in the previous section that anomaly-based systems utilize the byte frequency distribution to detect the existence of worms. Our method is a totally distinct concept. In anomaly-based systems, the byte frequency distribution of the whole incoming traffic is compared with the expected distribution of the normal traffic, and a great deviation between these two distributions is considered malicious. In our method, however, the byte frequency distribution is used to describe the signature from collected variants of the same worm only. The purpose is to have a "relaxed" format of the signature so that a malicious connection can be identified if the payload of the connection matches the signature approximately. Variants of the worm need to be obtained beforehand in our system, while anomaly-based systems only need to care about the patterns of the legitimate traffic.

We have established that the PADS signature and the significant regions can lead to each other. We do not know either of them, but we know that the

PAGE 68

significant regions are those segments that maximize the matching score with the signature. This "missing data problem" can be solved by an iterative algorithm, which first makes a guess on the starting positions of the significant regions, computes the signature, uses the signature to compute the new starting positions of the significant regions, and repeats the process until convergence.

Figure 3-8. Signature detection
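The iterative procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the experiments: the per-position score is assumed to be the average log-ratio between the signature frequency and the normal frequency $f_0$, the stopping rule is read as "relative change of the average score at most eps", and the names `estimate_signature`, `best_position`, and `em_signature` are hypothetical.

```python
import math
import random

def estimate_signature(regions, width, d=0.01):
    """Per-position byte frequencies with pseudo-count d, following (3-3)."""
    n = len(regions)
    sig = []
    for p in range(width):
        counts = [0] * 256
        for r in regions:
            counts[r[p]] += 1
        sig.append([(c + d) / (n + 256 * d) for c in counts])
    return sig

def position_score(sig, f0, seq, a):
    """Assumed score: average log-ratio of signature to normal frequency at offset a."""
    w = len(sig)
    return sum(math.log(sig[p][seq[a + p]] / f0[seq[a + p]]) for p in range(w)) / w

def best_position(sig, f0, seq):
    """Offset (0-based) of the best-matching W-byte segment, and its score."""
    w = len(sig)
    scores = [position_score(sig, f0, seq, a) for a in range(len(seq) - w + 1)]
    a = max(range(len(scores)), key=scores.__getitem__)
    return a, scores[a]

def em_signature(variants, width, f0, eps=0.01, max_iter=50, seed=0):
    """Alternate between re-estimating the signature and relocating the regions."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - width + 1) for s in variants]  # random initial guess
    prev = None
    for _ in range(max_iter):
        regions = [s[a:a + width] for s, a in zip(variants, starts)]
        sig = estimate_signature(regions, width)             # signature from current regions
        found = [best_position(sig, f0, s) for s in variants]  # regions from signature
        starts = [a for a, _ in found]
        avg = sum(score for _, score in found) / len(found)
        if prev is not None and abs(avg - prev) <= eps * abs(prev):
            break
        prev = avg
    return sig, starts, avg
```

Because the starting positions are random, different seeds may lead to different local maxima, which is the behavior the iterative scheme is known to exhibit.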

PAGE 69

The expectation-maximization (EM) algorithm [35] is an iterative procedure that obtains the maximum-likelihood parameter estimations. Given a set $S$ of byte sequences, we lack the starting positions $a_1, a_2, \ldots, a_n$ of the significant regions, which are the missing data in our problem. The underlying parameter $\Theta$ of our data set is also unknown. The EM algorithm iterates between the expectation step and the maximization step after the initialization. The description of the EM algorithm is given below.

Initialization. The starting positions $a_1, a_2, \ldots, a_n$ of the significant regions for worm variants $S_1, S_2, \ldots, S_n$ are assigned randomly. They define the initial guess of the significant regions $R_1, R_2, \ldots, R_n$. The maximum likelihood estimate of the signature $\Theta$ is calculated based on the initial significant regions.

Expectation. The new guess on the locations of the significant regions is calculated based on the estimated signature $\Theta$. In our algorithm, the new starting position $a_i$ of the significant region is the position at which the significant region has the best matching score with the signature $\Theta$. In other words, we seek

$$a_i = \arg\max_{a_i} \Lambda(\Theta, S_i, a_i), \qquad \forall i \in [1..n]$$

Maximization. By formula (3-3), the new maximum likelihood estimate of the signature $\Theta$ is calculated based on the current guess on the locations of the significant regions.

The algorithm terminates if the average matching score is within $(1+\varepsilon)$ of the previous iteration, where $\varepsilon$ is a small predefined percentage.

Starting with a large signature width $W$, we run the above algorithm to decide the signature as well as the significant regions. If the minimum matching score of all significant regions deviates greatly from the average score, we repeat the

PAGE 70

algorithm with a smaller $W$. This process continues until we reach a signature that matches well with the significant regions of all collected worm variants.

[...] the simulated annealing [40] approach attracted great attention. Simply speaking, the approach allows a certain random selection of the parameter (with a small probability of moving in a worse direction), which provides a chance to jump out of a local maximum. One example of simulated annealing is the Gibbs sampling algorithm [36], which we will use to compute the PADS signature below.

The algorithm is initialized by assigning random starting positions for the significant regions of the worm variants. Then one variant is selected randomly. This selected variant is temporarily excluded from $S$. The signature is calculated based on the remaining variants. After that, the starting position for the significant region of the selected variant is updated, according to a probability distribution based on the matching scores at different positions. The algorithm continues with many iterations until a convergence criterion is met. The description of the Gibbs sampling algorithm is given below.

Initialization. The starting positions $a_1, a_2, \ldots, a_n$ of the significant regions for worm variants $S_1, S_2, \ldots, S_n$ are assigned randomly.

Predictive Update. One of the $n$ worm variants, $S_x$, is randomly chosen. The signature $\Theta$ is calculated based on the other variants, $S - S_x$.

PAGE 71

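Sampling. Every possible position $a_x \in [1..l_x - W + 1]$ is considered as a candidate for the next starting position for the significant region of $S_x$. The matching score for each candidate position is $\Lambda(\Theta, S_x, a_x)$ as defined in (3-1). The next starting position for the significant region of $S_x$ is randomly selected. The probability that a position $a_x$ is chosen is proportional to $\Lambda(\Theta, S_x, a_x)$. That is,

$$\Pr(a_x) = \frac{\Lambda(\Theta, S_x, a_x)}{\sum_{a=1}^{l_x - W + 1} \Lambda(\Theta, S_x, a)}$$

The algorithm terminates if the average matching score is within $(1+\varepsilon)$ of the previous iteration, where $\varepsilon$ is a small predefined percentage.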
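One predictive-update and sampling round of the Gibbs procedure can be sketched as follows. The sketch assumes that the per-position score in (3-1) is the likelihood ratio between the signature and the normal distribution $f_0$ (computed here in log space for numerical stability), and the function names are hypothetical.

```python
import math
import random

def estimate_signature(regions, width, d=0.01):
    """Per-position byte frequencies with pseudo-count d, following (3-3)."""
    n = len(regions)
    sig = []
    for p in range(width):
        counts = [0] * 256
        for r in regions:
            counts[r[p]] += 1
        sig.append([(c + d) / (n + 256 * d) for c in counts])
    return sig

def log_ratios(sig, f0, seq):
    """Log of the assumed likelihood ratio for every candidate offset of seq."""
    w = len(sig)
    return [sum(math.log(sig[p][seq[a + p]] / f0[seq[a + p]]) for p in range(w))
            for a in range(len(seq) - w + 1)]

def gibbs_step(variants, starts, width, f0, rng):
    """Update the start of one randomly chosen variant in place; return its index."""
    x = rng.randrange(len(variants))
    # Predictive update: signature from all variants except S_x.
    regions = [s[a:a + width]
               for i, (s, a) in enumerate(zip(variants, starts)) if i != x]
    sig = estimate_signature(regions, width)
    # Sampling: choose the new start with probability proportional to the ratio.
    logs = log_ratios(sig, f0, variants[x])
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]  # rescaled; proportions are unchanged
    starts[x] = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    return x
```

Calling `gibbs_step` repeatedly with the same `random.Random` instance gives one possible realization of the iteration; the convergence test on the average matching score is left to the caller.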

PAGE 72

is updated. The time complexity of one iteration is $O(l_i - W + 1)$ since there are $l_i - W + 1$ possibilities. In the EM algorithm, we update all start locations at once, so the time complexity is $O(\sum_i (l_i - W + 1))$ for one iteration. That does not mean Gibbs sampling is better than EM in time complexity, however. They are generally the same if updating all start locations is counted as one iteration.

PAGE 73

in the signature can be dropped if we compare it with the byte sequences collected from the legitimate collection session. Therefore, the time and space complexity can be significantly reduced when we use the simplified signature to check the incoming traffic.

PAGE 74

captures the most significant one but discards the rest, which renders it less powerful against highly sophisticated polymorphic worms.

To address the above limitations, we propose a natural generalization, called multi-segment position-aware distribution signature (MPAD for short), which is a set of PADS signatures that are combined to identify a worm. It is denoted as $M = (\Theta_1, \ldots, \Theta_k)$, where $\Theta_i$, $1 \le i \le k$, is a PADS signature. Each PADS signature may have a different width. To calculate $M$, we first use the algorithms in Section 3.4 to compute a PADS signature, $\Theta_1$, and the significant regions for $\Theta_1$. We then remove these significant regions from the worm samples and compute the next PADS signature, $\Theta_2$, and the significant regions for $\Theta_2$. We further remove these significant regions and compute $\Theta_3$, and so on, until there is no more signature that can produce good matching scores for all worm samples.

When an incoming byte sequence is matched against $M$, it is classified as a potential worm variant only when its matching scores with all PADS signatures are above zero. To reduce the matching overhead, the PADS signature with the most diverse distribution can be used first, which attempts to separate worm variants (with some false positives) from the background traffic. The rest of the PADS signatures are then applied one after another to progressively filter out the false positives.
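The matching rule for an MPAD signature can be illustrated with a short sketch. It assumes the matching score is the maximum over all offsets of the average log-ratio between signature and normal frequencies; `toy_signature` is a hypothetical helper that only builds example data, and the ordering of the signatures (most diverse first) is left to the caller.

```python
import math

def toy_signature(pattern, peak=0.9):
    """Hypothetical example signature: each position strongly prefers one byte."""
    rest = (1.0 - peak) / 255
    return [[peak if b == byte else rest for b in range(256)] for byte in pattern]

def pads_score(sig, f0, seq):
    """Assumed matching score: best average log-ratio over all W-byte segments."""
    w = len(sig)
    if len(seq) < w:
        return float("-inf")
    return max(
        sum(math.log(sig[p][seq[a + p]] / f0[seq[a + p]]) for p in range(w)) / w
        for a in range(len(seq) - w + 1)
    )

def matches_mpad(signatures, f0, seq, threshold=0.0):
    """True only if every PADS signature scores above the threshold.

    all() stops at the first failing signature, so later signatures are
    evaluated only for sequences that passed the earlier ones.
    """
    return all(pads_score(sig, f0, seq) > threshold for sig in signatures)
```

For example, with `signatures = [toy_signature(b"AB"), toy_signature(b"XYZ")]` and a uniform `f0`, a sequence containing both patterns is flagged, while a sequence containing only the first one is not.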

PAGE 75

PADS/MPAD signature is calculated for each cluster. The signatures can then be used to identify new variants of the worms. We describe two algorithms for the cluster partitioning problem.

[...] (3-2), the similarity between $S_i$ and $S_j$ can be expressed as $\Phi_{ij} = \Omega(\Theta, S_i) + \Omega(\Theta, S_j)$ [...] (3-4). $\Phi_{ii} = 0$. Given the $n \times n$ similarity matrix $\Phi = (\Phi_{ij})$, $i, j \in [1..n]$, we want to find clusters (e.g., cliques in the graph) that have large similarity values for intra-cluster edges but small similarity values for inter-cluster edges. Figure 3-9 illustrates a simple example, where a shorter edge means a larger

PAGE 76

Figure 3-9. Clusters

similarity value. This is a well-studied problem, and a spectral clustering algorithm called normalized cuts can be used to extract the clusters [41, 42]. For completeness, we briefly describe the algorithm in our context. The normalized cuts algorithm first decomposes the graph $G$ into two clusters, $A$ and $B$, that minimize the following criterion: cut(A, B)

PAGE 77

The criterion can then be rewritten as

$$\frac{y^T (D - \Phi)\, y}{y^T D\, y}$$

where $\Phi$ denotes the similarity matrix. Minimizing the above criterion is an integer programming problem if $y$ only takes discrete elements. An approximation is to treat $y$ as a real vector [41] with positive elements for the first cluster and negative elements for the second cluster. It can be shown that any $y$ satisfying the following equation for some value $\lambda$ will minimize the criterion:

$$(D - \Phi)\, y = \lambda D\, y$$

Following certain transformations that we omit here, the generalized eigenvector $y$ corresponding to the second smallest eigenvalue is used [41]. Readers are referred to [41] for details. After the algorithm partitions the graph into two clusters, we can recursively apply the algorithm to further partition each cluster until there is no significant difference between average intra-cluster similarity and average inter-cluster similarity.

[...] [43]. The W32/Sasser worm exploits a buffer overflow vulnerability in the Windows Local Security Authority Service Server (LSASS) on TCP port 445. The vulnerability allows a remote attacker to execute arbitrary code with system privileges [44]. As for the Sapphire (also called Slammer) worm, it caused

PAGE 78

Figure 3-10. Variants of a polymorphic worm

considerable harm simply by overloading networks and taking database servers out of operation. Many individual sites lost connectivity as their access bandwidth was saturated by local copies of the worm [45]. The PUD worm tries to exploit the SSL vulnerability on i386 Linux machines [46].

In the experiments, we artificially generate the variants of these worms based on some polymorphism techniques discussed in Section 3.2. For normal traffic samples, we use traces taken from the UF CISE network. Figure 3-10 illustrates the polymorphic worm design with five variants, $S_1$, $S_2$, ..., and $S_5$. Each variant consists of three different types of regions. The black regions are segments of the malicious payload in the worm. Substitution is performed on 10% of the malicious payload. Garbage payloads, which are represented as the white regions with solid lines, are inserted at different random locations in the malicious payload. The default ratio of the malicious payload to the garbage payload is 9:1. [...] 3-10 for better illustration.

PAGE 79

Figure 3-11. Influence of initial configurations

variant is between 2KB and 20KB. In the illustration, the significant regions of these variants start at $a_1$, $a_2$, ..., and $a_5$, respectively.

Figure 3-11 shows the quality of the PADS signature obtained by EM or Gibbs after a certain number of iterative cycles. According to Section 3.4, the execution of either algorithm consists of iteration cycles (Expectation/Maximization steps for EM and Update/Sampling steps for Gibbs). During each iterative cycle, EM recalculates the significant regions of all variants, while Gibbs only modifies the significant region of one randomly selected variant. To make a fair comparison, we let the x axis be the average number of recalculations performed on the significant region for each variant. The y axis is the average matching score of the 100 variants with the signature obtained so far. The matching score is defined in (3-2). From the figure, the best matching score

PAGE 80

Figure 3-12. Variants clustering using normalized cuts

is around 7.5, which is likely to be the global maximum. EM tends to settle down at a local maximum, depending on the initial configuration. Gibbs is likely to find the global maximum, but it does not stabilize even when it reaches the global maximum, due to the randomness in its selection of starting points of significant regions.

[...] 3.6.1 and particularly (3-4), is calculated by using the Gibbs sampling algorithm. The result is shown in the left-hand plot of Figure 3-12, where the horizontal axes are variant ids, representing the rows $i$ and the columns $j$ of the

PAGE 81

Figure 3-13. Matching score influence of different signature widths and sample variant lengths

matrix, and the vertical axis is the similarity value between variants $i$ and $j$. The surface of the plot can be roughly partitioned into three regions. The first region ($i, j \in [1..50]$) shows the similarity values among the set of MSBlaster worm variants. The second region ($i, j \in [51..100]$) shows the similarity values among the set of W32/Sasser worm variants. The remaining region shows the similarity values between MSBlaster variants and W32/Sasser variants.

By using the normalized cuts algorithm, the 100 worm variants are separated into two clusters, one for MSBlaster and one for W32/Sasser. The resulting $y$ vector is shown in the right-hand plot of Figure 3-12, where each point represents one element in $y$. The variants whose values in $y$ are below zero belong to one cluster. The variants whose values in $y$ are above zero belong to the other cluster.
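The sign-based split of the $y$ vector can be reproduced on a small example with the sketch below. It is an approximation written for illustration only: it assumes a symmetric similarity matrix with non-negative entries and positive row sums, and it finds the second generalized eigenvector of the standard normalized-cuts system from [41] by power iteration with deflation rather than with a full eigensolver. The function name `ncut_bipartition` is hypothetical.

```python
import math

def ncut_bipartition(sim, iters=500):
    """Approximate two-way normalized cut; the sign of y[i] gives the cluster of item i."""
    n = len(sim)
    deg = [sum(row) for row in sim]                    # diagonal of D
    inv_sqrt = [1.0 / math.sqrt(x) for x in deg]
    # M = D^{-1/2} sim D^{-1/2}. Its largest eigenvalue is 1 with eigenvector
    # D^{1/2} 1; the next eigenvector corresponds to the second smallest
    # generalized eigenvalue of (D - sim) y = lambda D y.
    m = [[inv_sqrt[i] * sim[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]
    norm = math.sqrt(sum(deg))
    top = [math.sqrt(x) / norm for x in deg]
    z = [math.sin(i + 1.0) for i in range(n)]          # deterministic start vector
    for _ in range(iters):
        dot = sum(a * b for a, b in zip(z, top))       # deflate the trivial eigenvector
        z = [a - dot * b for a, b in zip(z, top)]
        # Iterate with (M + I) / 2 so that all eigenvalues lie in [0, 1].
        z = [0.5 * (sum(m[i][j] * z[j] for j in range(n)) + z[i]) for i in range(n)]
        length = math.sqrt(sum(a * a for a in z))
        z = [a / length for a in z]
    return [z[i] * inv_sqrt[i] for i in range(n)]      # y = D^{-1/2} z
```

Applied to a matrix with two groups of mutually similar items, the entries of the returned vector take one sign for the first group and the opposite sign for the second, mirroring the right-hand plot described above.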

PAGE 82

Figure 3-14. Influence of different lengths of the sample variants

Figure 3-15. Influence of different lengths of the sample variants

PAGE 83

Figure 3-16. Influence of different lengths of the sample variants

Figure 3-17. Influence of different lengths of the sample variants

PAGE 84

normal-traffic byte sequences to test the quality of the signature for each of the four worms.

Figure 3-13 shows the average matching score with respect to the signature width and the average length of the worm variants. Because the worm code has a fixed length, we change the length of a variant by letting it carry a variable amount of normal traffic. The two figures show the average matching scores of sample variants after the EM and Gibbs sampling algorithms converge to a final signature. Figure 3-13 also indicates that increasing the signature width will decrease the average matching score of worm variants. The reason is that a longer signature means a larger significant region, which increases the chance for the significant region to include garbage payload, which in turn decreases the matching score.

Figure 3-13 shows that increasing the length of the normal traffic carried by a worm variant, a technique widely used by some polymorphic worms to elude anomaly-based systems, does not help to avoid detection by our system. The reason is that our system identifies a significant region and only uses the significant region for signature generation. The carried normal traffic, no matter how much there is, will not be used for signature generation.

Figures 3-14 through 3-17 show the average matching scores of the testing worm and normal traffic sequences. The scores for worm traffic are always above zero and the scores for normal traffic are always below zero. Therefore, with a threshold of 0, worm variants are distinctly separated from normal traffic. In our experiments, the generated PADS signature was always able to identify new variants of the worm without false positives. The false positive rate and false negative rate of our algorithm will be discussed in the next subsection.

Figure 3-18 shows the false positive rate and false negative rate of our algorithm for each of the four worms. We only show the influence of signature

PAGE 85

Figure 3-18. False positives and false negatives

PAGE 86

Figure 3-19. The performance of a signature-based system using the longest common substring method.

width because the sample length has little influence on the matching scores. For all four worm examples, neither the false positive rate nor the false negative rate exceeds 0.5%. As we can see from the figure, the Gibbs sampling algorithm is always better than the EM algorithm for all four worms. As the signature width increases, the false positive rate decreases gradually while the false negative rate increases gradually.

Figure 3-19 shows the experimental results based on the longest common substring method [30], which first identifies the longest common substring among the sample worm variants and then uses the substring as a signature to match against the test variants. Based on the left-hand plot, as the number of sample variants increases, the length of the longest common substring decreases. A shorter signature increases the chance for it to appear in normal traffic. Consequently, the false negative ratio decreases, but the false positive ratio increases dramatically (the right-hand plot). In contrast, without the requirement of exact matching, a PADS signature is able to retain many more (particularly statistical) characteristics of a polymorphic worm.

PAGE 87

Figure 3-20. Byte frequency distributions of normal traffic (left-hand plot) and worm traffic (right-hand plot)

Figure 3-21. Byte frequency distributions of worm variants. Left-hand plot: malicious and normal payloads carried by a worm variant have equal length. Right-hand plot: normal payload carried by a worm variant is 9 times the size of the malicious payload.

PAGE 88

Now consider the position-unaware byte frequency distributions that are used in some current systems. The left-hand plot of Figure 3-20 shows the position-unaware byte frequency distribution of 100 normal traffic sequences (from 100 normal sessions) and the right-hand plot shows the byte frequency distribution of the MSBlaster payload. These two distributions are very different, which seems to provide a way to detect the worm. However, if we create a worm variant by embedding the worm payload in normal traffic, the combined byte frequency distribution can be made very similar to that of normal traffic. Figure 3-21 shows the byte frequency distributions of two worm variants whose normal traffic payloads are 1 and 9 times the size of the malicious payload, respectively. The right-hand plot is very similar to the left-hand plot of Figure 3-20. Therefore, using byte frequency distributions alone cannot handle worm variants. The proposed position-aware distribution signature works better against polymorphic worms.
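The blending effect can be checked numerically with a small sketch. The payloads below are made up for illustration (a run of 0x90 bytes stands in for a worm body and a short HTTP request line for normal traffic), and the L1 distance is just one convenient way to compare two distributions; neither is taken from the experiments.

```python
def byte_histogram(data):
    """Position-unaware byte frequency distribution of a byte string."""
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    return [c / len(data) for c in counts]

def l1_distance(p, q):
    """Sum of absolute differences: 0 for identical, 2 for disjoint distributions."""
    return sum(abs(a - b) for a, b in zip(p, q))

def padded_variant(worm, normal, times):
    """Worm payload followed by `times` copies of a normal payload."""
    return worm + normal * times

# Hypothetical payloads of equal length (26 bytes each).
NORMAL_UNIT = b"GET /index.html HTTP/1.1\r\n"
WORM = bytes([0x90]) * len(NORMAL_UNIT)
```

With these inputs the distance to the normal histogram drops from 2.0 for the bare worm to about 1.0 with an equal amount of normal payload and to about 0.2 with nine times as much, which is the trend the two plots of Figure 3-21 describe.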

PAGE 89


PAGE 90

The generation of the PADS block is a "missing data" problem because neither the malicious regions in each variant of the worm nor the signature itself is known. If the malicious regions are known, the PADS block can be calculated by counting the number of times each byte value appears at different positions. On the other hand, if the PADS block is known, the malicious region in each variant of the worm can be obtained by scanning through the whole variant and finding the region that best matches the PADS block. The "missing data" problem can be solved using iterative methods such as the Expectation-Maximization (EM) or Gibbs sampling algorithms, which have been mentioned in [47].

The model of the single PADS block in [47] suffers from several limitations. First of all, a single PADS block cannot deal with over-separated malicious regions because one PADS block is unlikely to be able to cover all malicious regions. Secondly, the single PADS block model is unable to exclude the influence of the background noise. The approach makes an assumption that normal traffic does not contain the same PADS block, which is not necessarily true and can be exploited by a skilled attacker. Finally, the model of a single PADS block assumes that each collected sample of the worm belongs to the same polymorphic worm family. There is no mechanism to classify different polymorphic worm families and exclude the influence of those "outliers", which will greatly decrease the performance of the algorithm.

In addition to these limitations, the method of extracting PADS blocks assumes that each sample variant contains exactly one PADS block of the same type. The sample variants not containing the PADS block will over-contribute to the characterization of the PADS block, and the sample variants containing repeating PADS blocks will under-contribute.

This paper tries to solve the problem by proposing a multiple PADS blocks model. In this model, a set of PADS blocks is identified for each polymorphic worm family, which is identified by a classification method similar to the extracting

PAGE 91

of PADS blocks. The signature combines those PADS blocks together, and every PADS block within the set is taken into consideration for worm detection. In order to eliminate the influence of background noise, the common regions within the normal traffic payload are first identified and excluded from the sample worm variant set. Furthermore, the method of extracting PADS blocks in this paper is able to identify multiple PADS blocks from a mixture of sample variants that belong to different polymorphic families, even if some PADS blocks do not appear in all of the sample variants and some sample variants contain repeating PADS blocks.

To accomplish our goal, we further define a new metric to describe the quality of the matching between a set of PADS blocks and a byte sequence. It can be considered as an optimization of the previously described approach.

In the following sections, the details of extracting PADS blocks, the model of multiple PADS blocks, the signature definition of the multiple PADS model, and the classification of polymorphic worm families are presented step by step.

4.2.1 PADS Blocks and The Dataset from Byte Sequences

In this subsection, we briefly introduce Position-Aware Distribution Signature (PADS) [47] blocks, which are worm signatures defined in a special format to identify the malicious regions that appear in all or most of the variants of the same polymorphic worm. A PADS block is greatly different from a traditional string signature in that multinomial byte-frequency distributions replace byte values at each position of the PADS block. For a PADS block of width $W$ in terms of the number of bytes, $(f_1, \ldots, f_W)$ is used to characterize the byte-frequency distributions inside the region of a malicious block, with $f_k = [f_{k0}, \ldots, f_{kb}, \ldots]^T$ for $k = 1 \ldots W$ specifying the position

PAGE 92

Table 4-1 is an example of a PADS block with width $W$.

b      f_1    f_2    ...  f_9    f_10
0x00   0.001  0.001  ...  0.500  0.100
0x01   0.001  0.001  ...  0.200  0.500
0x02   0.001  0.001  ...  0.001  0.100
...    ...    ...    ...  ...    ...
0xfe   0.001  0.001  ...  0.001  0.001
0xff   0.700  0.700  ...  0.001  0.001

Table 4-1. An example of a PADS block with width W = 10

Similar to the definition of PADS blocks, the multinomial byte-frequency distribution outside the malicious regions in a byte sequence can be defined as $f_0 = [f_{00}, \ldots, f_{0b}, \ldots]^T$ with respect to all possible byte values $b \in [0..255]$. We use $F$ to represent $(f_1, \ldots, f_W)$ and $f_0$ respectively in these two cases.

Our purpose is to find the PADS blocks within a mixture of polymorphic worms. In this paper, each byte sequence $S_j$ is broken up into overlapping segments of length $W$. If $a_j$ is used to represent the starting position of a W-byte segment within a sequence $S_j$, then $a_j$ can be any value between 1 and $l_j - W + 1$, where $l_j$ is the total length of the sequence $S_j$. By extracting all possible W-byte segments from the byte sequence set $S = \{S_1, S_2, \ldots\}$, a new data set that contains all possible W-byte segments is obtained. Suppose $N$ is the total number of sequences in the data set, $n_j$ is the total number of W-byte segments within a sequence $S_j$, and the total number within the sequence set is $n$. Then we have

$$n = \sum_{j=1}^{N} n_j = \sum_{j=1}^{N} (l_j - W + 1)$$

In this paper, the total set of the W-byte segments forms the observed data set. To be consistent with the PADS blocks and to facilitate the expression, the data of

PAGE 93

the W-byte segment is represented as byte-frequency distributions as well. Let $G = (g_1, \ldots, g_W)$ be the data of a W-byte segment, with $g_k = [g_{k0}, \ldots, g_{kb}, \ldots]^T$ being the multinomial byte frequency distribution at position $k$. If the byte value $b$ appears at position $k$ of the segment, then $g_{kb} = 1$ and the remaining probabilities $\{g_{k0}, \ldots, g_{k(b-1)}, g_{k(b+1)}, \ldots\}$ are all 0. Table 4-2 is an example of the data for a W-byte segment.

b      g_1    g_2    ...  g_9    g_10
0x00   0.000  0.000  ...  1.000  0.000
0x01   0.000  0.000  ...  0.000  1.000
0x02   0.000  0.000  ...  0.000  0.000
...    ...    ...    ...  ...    ...
0xfe   0.000  0.000  ...  0.000  0.000
0xff   1.000  1.000  ...  0.000  0.000

Table 4-2. An example of a segment with width W = 10
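The construction of the observed data set can be sketched as follows. The function names are hypothetical, and positions and byte values are 0-based in the code, whereas the text counts positions from 1.

```python
def segments(seq, width):
    """All overlapping W-byte segments of one byte sequence."""
    return [seq[a:a + width] for a in range(len(seq) - width + 1)]

def one_hot(segment):
    """Represent a segment as W distributions: g[k][b] is 1.0 iff byte b is at position k."""
    g = []
    for byte in segment:
        column = [0.0] * 256
        column[byte] = 1.0
        g.append(column)
    return g

def build_dataset(sequences, width):
    """Observed data set: the one-hot form of every W-byte segment of every sequence."""
    return [one_hot(seg) for s in sequences for seg in segments(s, width)]
```

For two sequences of lengths 6 and 5 and W = 3, the data set holds (6 - 3 + 1) + (5 - 3 + 1) = 7 segments, in agreement with the formula for n.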

PAGE 94

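cannot be found analytically. The EM algorithm makes use of the concept of "missing data". In our model, the missing data is a set of $n$ labels $Z = \{z^{(1)}, \ldots, z^{(n)}\}$ associated with the $n$ observed data in the data set, indicating which group each observed datum belongs to. Each label is a binary vector $z^{(i)} = [z^{(i)}_0, \ldots, z^{(i)}_M]^T$. If an observed datum $y^{(i)}$ belongs to the $m$-th group, then $z^{(i)}_m = 1$ and the datum $y^{(i)}$ has the probability $p(y^{(i)} \mid \theta_m)$. On the other hand, if the observed datum does not belong to the $m$-th group, then $z^{(i)}_m = 0$. Based on the definition of $Z$, it is straightforward that the prior probability for $z^{(i)}_m = 1$ is $\alpha_m$.

The joint density of the observed data $y^{(i)}$ and the missing data $z^{(i)}$ can be obtained from Eq. 4-4 and 4-5. [...] 4-6 we have:

$$\log p(Y, Z \mid \alpha, \theta) = \sum_{i=1}^{n} \sum_{m=1}^{M} z^{(i)}_m \log\left[\alpha_m\, p(y^{(i)} \mid \theta_m)\right] \qquad (4\text{-}8)$$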

PAGE 95

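For the EM algorithm, we iteratively maximize the expected log-likelihood over the conditional distribution of the missing data $Z$ based on the observed data $Y$ and the current estimate of the parameters $\{\alpha_1, \ldots, \alpha_M, \theta_1, \ldots, \theta_M\}$.

Initialization: In this step, the initial unknown parameters $\hat{\alpha}(0)$, $\hat{\theta}(0)$ are assigned randomly.

Expectation: In this step, the expected value $\hat{z}^{(i)}_m(t)$ can be calculated using Bayes' rule and Eq. 4-3 and 4-4, where $t$ represents the $t$-th iteration. The expected log-likelihood then expands as:

$$\begin{aligned}
E_Z\left[\log p(Y, Z \mid \alpha, \theta) \mid Y, \hat{\alpha}(t), \hat{\theta}(t)\right]
&= E_Z\left[\sum_{i=1}^{n} \sum_{m=1}^{M} z^{(i)}_m \log \alpha_m p(y^{(i)} \mid \theta_m) \,\Big|\, Y, \hat{\alpha}(t), \hat{\theta}(t)\right] \\
&= \sum_{i=1}^{n} \sum_{m=1}^{M} E\left[z^{(i)}_m \mid Y, \hat{\alpha}(t), \hat{\theta}(t)\right] \log \alpha_m p(y^{(i)} \mid \theta_m) \\
&= \sum_{i=1}^{n} \sum_{m=1}^{M} \hat{z}^{(i)}_m(t) \log \alpha_m p(y^{(i)} \mid \theta_m) \\
&= \sum_{i=1}^{n} \sum_{m=1}^{M} \hat{z}^{(i)}_m(t) \log p(y^{(i)} \mid \theta_m) + \sum_{i=1}^{n} \sum_{m=1}^{M} \hat{z}^{(i)}_m(t) \log \alpha_m \qquad (4\text{-}10)
\end{aligned}$$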
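The expectation step can be illustrated with a short sketch. It applies the standard Bayes' rule for mixture models, in which the responsibility of group m for a segment is proportional to the mixing probability times the likelihood of the segment under that group; the likelihood of a segment under a PADS block is assumed to be the product of the per-position byte frequencies. The helper `peaked_block` and the other names are hypothetical.

```python
import math

def peaked_block(pattern, peak=0.9):
    """Hypothetical example block: each position strongly prefers one byte value."""
    rest = (1.0 - peak) / 255
    return [[peak if b == byte else rest for b in range(256)] for byte in pattern]

def segment_log_likelihood(block, segment):
    """log p(y | block), assuming independent positions: sum of log f_k[y_k]."""
    return sum(math.log(block[k][byte]) for k, byte in enumerate(segment))

def responsibilities(blocks, mixing, segment):
    """Posterior probability of each group for one segment (Bayes' rule)."""
    logs = [math.log(a) + segment_log_likelihood(f, segment)
            for f, a in zip(blocks, mixing)]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]  # subtract the max to avoid underflow
    total = sum(weights)
    return [w / total for w in weights]
```

With one block peaked at the bytes "AB" and one uniform background block, the segment "AB" is attributed almost entirely to the first group, while an unrelated segment is attributed to the background.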

PAGE 96

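To obtain $\hat{\alpha}_m(t+1)$, we have [...] 4-11 and 4-12, we have:

$$\hat{f}_{kb}(t+1) = \frac{\sum_{i=1}^{n} \hat{z}^{(i)}_m(t)\, g_{kb}}{\sum_{b=0}^{255} \sum_{i=1}^{n} \hat{z}^{(i)}_m(t)\, g_{kb}} \qquad (4\text{-}14)$$

where $g_{kb}$ is taken from the $i$-th observed segment, and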

PAGE 97

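A problem with Eq. 4-14 is that $\hat{f}_{kb}(t+1)$ will be zero for those byte values $b$ that never appear at position $k$ of any PADS block. The value will never change during the iterations due to the update of the EM algorithm. However, $\hat{f}_{kb}(t+1)$, which is the maximum likelihood estimate of the parameters of a multinomial random variable, is actually subject to boundary problems. For better flexibility, we apply a "pseudo-count" to the observed byte count, and the byte frequency estimate becomes:

$$\hat{f}_{kb}(t+1) = \frac{\sum_{i=1}^{n} \hat{z}^{(i)}_m(t)\, g_{kb} + \beta_b}{\sum_{b=0}^{255} \left(\sum_{i=1}^{n} \hat{z}^{(i)}_m(t)\, g_{kb} + \beta_b\right)}$$

[...] [48]. In this paper, a constant $d$ is used to replace $\beta_1, \beta_2, \ldots$ for simplicity. Therefore, we actually have:

$$\hat{f}_{kb}(t+1) = \frac{\sum_{i=1}^{n} \hat{z}^{(i)}_m(t)\, g_{kb} + d}{\sum_{b=0}^{255} \sum_{i=1}^{n} \hat{z}^{(i)}_m(t)\, g_{kb} + 256\,d}$$

4.3.1 Multiple PADS Blocks Model

In this subsection, the multiple PADS blocks model is presented, together with the criteria for deciding whether or not a sequence is considered malicious. In order to take into consideration every PADS block within the set of a polymorphic worm family, we treat the byte sequence as a point in a feature vector space, with each feature being the similarity against one PADS block. In our model, we use the conditional log-likelihood of a sequence for each PADS block to represent each feature. The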

PAGE 98

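sequence under the feature space is defined as:

$$h = \begin{pmatrix} h^{(1)} \\ h^{(2)} \\ \vdots \\ h^{(d)} \end{pmatrix} = \begin{pmatrix} \log p(Y \mid F^{(1)}, \alpha^{(1)}) \\ \log p(Y \mid F^{(2)}, \alpha^{(2)}) \\ \vdots \\ \log p(Y \mid F^{(d)}, \alpha^{(d)}) \end{pmatrix}$$

where $F^{(1)}, F^{(2)}, \ldots, F^{(d)}$ are the PADS block signatures (corresponding to $\theta$ in the extraction step), $\alpha^{(1)}, \alpha^{(2)}, \ldots, \alpha^{(d)}$ are the mixing probabilities (corresponding to $\alpha$ in the extraction step) with respect to $F^{(1)}, F^{(2)}, \ldots, F^{(d)}$, and $Y$ is the data set $\{y^{(1)}, y^{(2)}, \ldots, y^{(l_j - W + 1)}\}$ within a sequence $S_j$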

PAGE 100

The matrix $\Sigma^{-1}$ is the inverse covariance matrix. The matrix can be pre-calculated in our EM algorithm later on. The advantage of the Mahalanobis distance is that it weights each element of the vector by its variance and takes into account the covariance of the variables measured. The computed value gives a measure of how consistent the matching score of the new sample variant is with the training data set.

Based on the Mahalanobis distance, we define a score $D$ of any sample variant against a polymorphic worm family. The meaning of the score $D$ is totally different from the matching score of a sample variant against a PADS block, because a match against a single PADS block does not necessarily mean that the sample variant will be identified as a polymorphic worm. The calculation of the score $D$ is as follows:

1. Calculate the matching scores of the sample variant against each PADS block within the set, and arrange the matching scores into a feature vector $h$.
2. Find the Mahalanobis distance between $h$ and $H$: $d^2 = (h - H)^T \Sigma^{-1} (h - H)$.
3. The score $D$ is defined as the similarity measured by $D = e^{-d^2/2}$. This score is 1 for an exact match and decreases otherwise.
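Steps 2 and 3 can be written down directly. The sketch below assumes the inverse covariance matrix has already been computed and is passed in as a list of rows; the function names are hypothetical.

```python
import math

def mahalanobis_sq(h, center, inv_cov):
    """Squared Mahalanobis distance (h - H)^T Sigma^{-1} (h - H)."""
    diff = [a - b for a, b in zip(h, center)]
    size = len(diff)
    return sum(diff[i] * inv_cov[i][j] * diff[j]
               for i in range(size) for j in range(size))

def family_score(h, center, inv_cov):
    """Score D = exp(-d^2 / 2): 1 for an exact match, smaller otherwise."""
    return math.exp(-mahalanobis_sq(h, center, inv_cov) / 2.0)
```

For instance, with an identity inverse covariance, a feature vector one unit away from the family center H in a single coordinate gets the score exp(-0.5), roughly 0.61.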

PAGE 101

Our method of classification is based on the EM algorithm, which is similar to the algorithm we described before. To save space, the classification algorithm is wrapped into the general EM algorithm discussed before. In our classification algorithm, the observed data $y$ is the $d$-dimensional vector $h$ of each byte sequence. The unknown parameters are the covariance matrix $\Sigma$ and the signature $H$. Different from the model we use in the PADS extraction, we assume $h$ has a Gaussian mixture density with $M$ families for a $d$-dimensional random variable $h$. Therefore, in the previous section, $p(y \mid \theta_m)$ is given by:

$$p(y \mid \theta_m) = \frac{1}{(2\pi)^{d/2} \det(\Sigma_m)^{1/2}}\, e^{-\frac{1}{2}(h - H_m)^T \Sigma_m^{-1} (h - H_m)} = \frac{1}{(2\pi)^{d/2} \det(\Sigma_m)^{1/2}}\, e^{-\frac{1}{2} D_m} \qquad (4\text{-}20)$$

Similar steps can be applied to the general steps of the EM algorithm.

PAGE 102

runs of the iterative methods, the newly proposed algorithm further reduces the time needed for the system.

PAGE 103


PAGE 104

of polymorphic worm, the system uses iterative algorithms to find the PADS signature of the worm, which is used to detect future worm attacks even if new variants have not been captured before. In our experiment, 100% accuracy was achieved in detecting the variants of the MSBlaster worm, which means that all malicious traffic can be detected and all legitimate traffic can pass through the system with no false positives. The system is completely automatic. It requires no involvement of human experts, which is typically the drawback of the regular signature-based system. The system also tolerates some modifications of the worm where both signature-based and anomaly-based systems may fail.

In the third part of the thesis, we further investigate the multiple PADS model and propose an optimization of our iterative approaches. The motivation for optimization is that the iterative methods discussed previously suffer from several drawbacks. Because the PADS signatures can only be obtained one by one and iterative approaches are time-consuming, it takes a long time before every PADS signature has been extracted. Because PADS signatures are extracted sequentially, the quality of the PADS signatures will differ. Since iterative methods are used, different initializations will result in totally different PADS signature sets and thus affect the clustering of the polymorphic worm family. We propose a new way of extracting multiple PADS blocks at the same time using iterative methods such as the Expectation-Maximization (EM) algorithm. To classify different polymorphic Internet worm families, we revisit the EM algorithm based on a Gaussian mixture model for each byte sequence, which is assumed to be in a feature vector space. The proposed algorithm reduces the time complexity of the iterative approaches in that the extraction steps can be done simultaneously.

PAGE 105

down or even halts the worm propagation has been developed. DAW is designed for an Internet service provider to provide the anti-worm service to its customers. Analytical and simulation results have demonstrated the effectiveness of the proposed techniques. Secondly, a new system called the "double-honeypot" system is proposed, which is able to automatically capture worm samples over the Internet. Finally, a new definition of worm signature, which utilizes the statistical properties of polymorphic worms, is proposed, together with a new method of automatic polymorphic worm signature generation. It can effectively identify polymorphic worms from the normal background traffic. Moreover, it has the capability of identifying future worm attacks even if the worm was not seen before. The paper also discusses several optimization techniques to reduce the time complexity of the iterative approaches.

PAGE 106

[1] S. Staniford, V. Paxson, and N. Weaver, "How to 0wn the Internet in Your Spare Time," in Proceedings of the 11th USENIX Security Symposium (Security'2002), San Francisco, California, USA, Aug. 2002, pp. 1-20.
[2] D. Moore, C. Shannon, G. M. Voelker, and S. Savage, "Internet Quarantine: Requirements for Containing Self-Propagating Code," in Proceedings of the 22nd Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM'2003), San Francisco, California, USA, Apr. 2003, pp. 1901-1910.
[3] S. Chen and Y. Tang, "Slowing Down Internet Worms," in Proceedings of the 24th IEEE International Conference on Distributed Computing and Systems (ICDCS'2004), Tokyo, Japan, Mar. 2004, pp. 312-319.
[4] C. Kruegel and G. Vigna, "Anomaly Detection of Web-based Attacks," in Proceedings of the 10th ACM Conference on Computer and Communication Security (CCS'2003). Washington D.C., USA: ACM Press, Oct. 2003, pp. 251-261.
[5] D. Moore, V. Paxson, S. Savage, C. Shannon, S. Staniford, and N. Weaver, "Inside the Slammer Worm," IEEE Magazine of Security and Privacy, pp. 33-39, July 2003.
[6] C. Cowan, C. Pu, D. Maier, J. Walpole, P. Bakke, S. Beattie, A. Grier, P. Wagle, Q. Zhang, and H. Hinton, "StackGuard: Automatic Adaptive Detection and Prevention of Buffer-Overflow Attacks," in Proceedings of the 7th USENIX Security Conference (Security'1998), San Antonio, Texas, USA, Jan. 1998, pp. 63-78.
[7] M. Eichin and J. Rochlis, "With Microscope and Tweezers: An Analysis of the Internet Virus of November 1988," in Proceedings of the 1989 IEEE Symposium on Security and Privacy, Oakland, California, USA, May 1989, pp. 326-344.
[8] J. A. Rochlis and M. W. Eichin, "With Microscope and Tweezers: The Worm from MIT's Perspective," Commun. ACM, vol. 32, no. 6, pp. 689-698, 1989.
[9] Computer Emergency Response Team. (2001) CERT Advisory CA-2001-23: "Code Red" Worm Exploiting Buffer Overflow In IIS Indexing Service DLL. Last accessed: March, 2006. [Online]. Available: http://www.cert.org/advisories/CA-2001-23.html

PAGE 107

[10] Computer Emergency Response Team. (2001) CERT Advisory CA-2001-26: Nimda Worm. Last accessed: March, 2006. [Online]. Available: http://www.cert.org/advisories/CA-2001-26.html
[11] Computer Emergency Response Team. (2003) CERT Advisory CA-2003-04: MS-SQL Server Worm. Last accessed: March, 2006. [Online]. Available: http://www.cert.org/advisories/CA-2003-04.html
[12] M. M. Williamson, "Throttling Viruses: Restricting Propagation to Defeat Malicious Mobile Code," in Proceedings of the 18th Annual Computer Security Applications Conference (ACSAC'2002), Las Vegas, Nevada, USA, Oct. 2003, pp. 61-68.
[13] H. Javitz and A. Valdes, "The NIDES Statistical Component Description and Justification," Computer Science Laboratory, SRI International, Menlo Park, California, USA, Tech. Rep., 1994.
[14] K. Wang and S. J. Stolfo, "Anomalous Payload-based Network Intrusion Detection," in Proceedings of the 7th International Symposium on Recent Advances in Intrusion Detection (RAID'2004), Sophia Antipolis, French Riviera, France, Sept. 2004, pp. 227-246.
[15] K. Ilgun, R. Kemmerer, and P. Porras, "State Transition Analysis: A Rule-based Intrusion Detection Approach," IEEE Trans. Software Eng., vol. 2, pp. 181-199, 1995.
[16] U. Lindqvist and P. Porras, "Detecting Computer and Network Misuse Through the Production-Based Expert System Toolset (P-BEST)," in Proceedings of the 1999 IEEE Symposium on Security and Privacy, Oakland, California, USA, May 1999, pp. 133-145.
[17] J. O. Kephart and S. R. White, "Directed-Graph Epidemiological Models of Computer Viruses," in Proceedings of the 1991 IEEE Symposium on Security and Privacy, Oakland, California, USA, May 1991, pp. 343-361.
[18] D. Moore, C. Shannon, and K. Claffy, "Code-Red: A Case Study on the Spread and Victims of an Internet Worm," in Proceedings of the 2nd Internet Measurement Workshop (IMW'2002), Marseille, France, Nov. 2002, pp. 273-284.
[19] C. C. Zou, W. Gong, and D. Towsley, "Code Red Worm Propagation Modeling and Analysis," in Proceedings of the 9th ACM Conference on Computer and Communications Security (CCS'2002). Washington, DC, USA: ACM Press, Nov. 2002, pp. 138-147.
[20] N. Weaver, I. Hamadeh, G. Kesidis, and V. Paxson, "Preliminary Results Using Scale-down to Explore Worm Dynamics," in Proceedings of the 2004 ACM Workshop on Rapid Malcode (WORM'2004). Washington DC, USA: ACM Press, 2004, pp. 65-72.

PAGE 108

[21] Z. Chen, L. Gao, and K. Kwiat, "Modeling the Spread of Active Worms," in Proceedings of the 22nd Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM'2003), San Francisco, California, USA, Mar. 2003, pp. 1890-1900.
[22] C. C. Zou, L. Gao, W. Gong, and D. Towsley, "Monitoring and Early Warning for Internet Worms," in Proceedings of the 10th ACM Conference on Computer and Communication Security (CCS'2003). Washington D.C., USA: ACM Press, Oct. 2003, pp. 190-199.
[23] J. Twycross and M. M. Williamson, "Implementing and Testing a Virus Throttle," in Proceedings of the 12th USENIX Security Symposium (Security'2003), Washington D.C., USA, Aug. 2003, pp. 285-294.
[24] S. Schechter, J. Jung, and A. W. Berger, "Fast Detection of Scanning Worm Infections," in Proceedings of the 7th International Symposium on Recent Advances in Intrusion Detection (RAID'2004), Sophia Antipolis, French Riviera, France, Sept. 2004, pp. 59-81.
[25] N. Weaver, S. Staniford, and V. Paxson, "Very Fast Containment of Scanning Worms," in Proceedings of the 13th USENIX Security Symposium (Security'2004), San Diego, California, USA, Aug. 2004, pp. 29-44.
[26] G. Gu, D. Dagon, X. Qin, M. I. Sharif, W. Lee, and G. F. Riley, "Worm Detection, Early Warning, and Response Based on Local Victim Information," in Proceedings of the 20th Annual Computer Security Applications Conference (ACSAC'2004), Tucson, Arizona, USA, Dec. 2004, pp. 136-145.
[27] S. Staniford, D. Moore, V. Paxson, and N. Weaver, "The Top Speed of Flash Worms," in Proceedings of the 2004 ACM Workshop on Rapid Malcode (WORM'2004). Washington DC, USA: ACM Press, 2004, pp. 33-42.
[28] L. Spitzner, Honeypots: Tracking Hackers. Reading, Massachusetts, USA: Addison-Wesley, 2002.
[29] N. Provos, "A virtual Honeypot Framework," in Proceedings of the 13th USENIX Security Symposium (Security'2004), San Diego, California, USA, Aug. 2004, pp. 1-14.
[30] C. Kreibich and J. Crowcroft, "Honeycomb: Creating Intrusion Detection Signatures Using Honeypots," in 2nd Workshop on Hot Topics in Networks (HotNets-II), Cambridge, Massachusetts, USA, Nov. 2003, pp. 51-56.
[31] D. Dagon, X. Qin, G. Gu, W. Lee, J. Grizzard, J. Levin, and H. Owen, "HoneyStat: Local Worm Detection Using Honeypots," in Proceedings of the 7th International Symposium on Recent Advances in Intrusion Detection (RAID'2004), Sophia Antipolis, French Riviera, France, Sept. 2004, pp. 39-58.


[32] M. Christodorescu and S. Jha, "Static Analysis of Executables to Detect Malicious Patterns," in Proceedings of the 12th USENIX Security Symposium (Security'2003), Washington D.C., USA, Aug. 2003, pp. 169-186.
[33] O. Kolesnikov and W. Lee, "Advanced Polymorphic Worms: Evading IDS by Blending in with Normal Traffic," College of Computing, Georgia Institute of Technology, Tech. Rep., 2004.
[34] H.-A. Kim and B. Karp, "Autograph: Toward Automated, Distributed Worm Signature Detection," in Proceedings of the 13th USENIX Security Symposium (Security'2004), San Diego, California, USA, Aug. 2004, pp. 271-286.
[35] C. E. Lawrence and A. A. Reilly, "An Expectation Maximization (EM) Algorithm for the Identification and Characterization of Common Sites in Unaligned Biopolymer Sequences," PROTEINS: Structure, Function and Genetics, vol. 7, pp. 41-51, 1990.
[36] C. E. Lawrence, S. F. Altschul, M. S. Boguski, J. S. Liu, A. F. Neuwald, and J. C. Wootton, "Detecting Subtle Sequence Signals: A Gibbs Sampling Strategy for Multiple Alignment," Science, vol. 262, pp. 208-214, Oct. 1993.
[37] H. W. Hethcote, "The Mathematics of Infectious Diseases," SIAM Review, vol. 42, no. 4, pp. 599-653, 2000.
[38] P. Szor, The Art of Computer Virus Research and Defense. Upper Saddle River, New Jersey, USA: Addison Wesley Professional, 2005.
[39] C. Kaufman, R. Perlman, and M. Speciner, Network Security: Private Communication in a Public World. Upper Saddle River, New Jersey, USA: Prentice Hall, Inc., 2002.
[40] S. Geman and D. Geman, "Stochastic Relaxation, Gibbs Distribution, and the Bayesian Restoration of Images," IEEE Trans. Pattern Anal. Machine Intell., vol. 6, pp. 721-741, 1984.
[41] J. Shi and J. Malik, "Normalized Cuts and Image Segmentation," IEEE Trans. Pattern Anal. Machine Intell., vol. 22, pp. 888-905, Aug. 2000.
[42] D. A. Forsyth and J. Ponce, Computer Vision: A Modern Approach. Upper Saddle River, New Jersey, USA: Prentice Hall, 2003.
[43] Computer Emergency Response Team. (2003) CERT Advisory CA-2003-20: W32/Blaster worm. Last accessed: March, 2006. [Online]. Available: http://www.cert.org/advisories/CA-2003-20.html
[44] United States Computer Emergency Readiness Team. (2004) US-CERT Cyber Security Bulletin SB04-133. Last accessed: March, 2006. [Online]. Available: http://www.us-cert.gov/cas/body/bulletins/SB04-133.pdf


[45] D. Moore, V. Paxson, S. Savage, C. Shannon, S. Staniford, and N. Weaver. (2003) The Spread of the Sapphire/Slammer Worm. Last accessed: March, 2006. [Online]. Available: http://www.caida.org/outreach/papers/2003/sapphire/
[46] C. Hoepers and K. Steding-Jessen. (2003) The Scan of the Month 25. Last accessed: March, 2006. [Online]. Available: http://www.honeynet.org/scans/scan25/writeup.html
[47] Y. Tang and S. Chen, "Defending Against Internet Worms: A Signature-Based Approach," in Proceedings of the 24th Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM'2005), Miami, Florida, USA, Mar. 2005, pp. 1384-1394.
[48] T. J. Santner and D. E. Duffy, The Statistical Analysis of Discrete Data. New York, USA: Springer Verlag, 1989.


Yong Tang was born in Kaifeng, China, in 1977. He graduated from Kaifeng High School and entered Peking University in 1995. After 4 years of study, he received his Bachelor of Science (B.S.) degree in 1999 with the highest honors. Since 2002, he has been conducting research with Dr. Shigang Chen in the Computer and Information Science and Engineering Department at the University of Florida. His research is in the networking and security areas, including Quality-of-Service (QoS) routing, defense against Distributed Denial-of-Service (DDoS) attacks, defense against Internet worms, Peer-to-Peer (P2P) networks and wireless networks.


Permanent Link: http://ufdc.ufl.edu/UFE0013742/00001

Material Information

Title: Defending against Internet Worms
Physical Description: Mixed Material
Copyright Date: 2008

Record Information

Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.
System ID: UFE0013742:00001












DEFENDING AGAINST INTERNET WORMS


By

YONG TANG

















A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA


2006

































Copyright 2006

by

Yong Tang


































I dedicate this to everyone in my family.















ACKNOWLEDGMENTS

First of all, I would like to thank my advisor, Prof. Shigang Chen, for his

guidance and support throughout my graduate studies. Without the numerous

discussions and brainstorms with him, the results presented in this dissertation

would never have existed.

I am grateful to Prof. Sartaj Sahni, Prof. Sanjay Ranka, Prof. Yuguang Fang

and Prof. Dapeng Wu for their guidance and encouragement during my years at

the University of Florida (UF). I would also like to thank Prof. Ye Xia for his

valuable comments and suggestions on my research.

I am thankful to all my colleagues in Prof. Chen's group, including Qingguo

Song, Zhan Zhang, Wei Pan, Liang Zhang and MyungKeun Yoon. They provided

valuable feedback for my research.

I would also like to thank many people in the Computer and Information

Science and Engineering (CISE) Department for their help in my research work.

In particular, I would like to thank Fei Wang for his help and valuable discussions

in my research work. I would also like to thank Ju Wang, Zhizhou Wang, Xiaobin

Wu, Mingxi Wu, Jundong Liu and Jie Zhang for their help throughout my graduate

life.

I am also thankful to my long time friends before I entered UF. In particular, I

am thankful to Peng Wu for his help and encouragement.

Last but not least, I am grateful to my parents and sisters for their love,

encouragement, and understanding. It would be impossible for me to express my

gratitude towards them in mere words.















TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
    1.1 Internet Worms
    1.2 Related Work
    1.3 Contribution
        1.3.1 Distributed Anti-Worm Architecture
        1.3.2 Signature-Based Worm Identification and Defense
        1.3.3 Optimization of Iterative Methods

2 SLOWING DOWN INTERNET WORMS
    2.1 Modeling Worm Propagation
    2.2 Failure Rate
    2.3 A Distributed Anti-Worm Architecture
        2.3.1 Objectives
        2.3.2 Assumptions
        2.3.3 DAW Overview
        2.3.4 Measuring Failure Rate
        2.3.5 Basic Rate-Limit Algorithm
        2.3.6 Temporal Rate-Limit Algorithm
        2.3.7 Recently Failed Address List
        2.3.8 Spatial Rate-Limit Algorithm
        2.3.9 Blocking Persistent Scanning Sources
        2.3.10 FailLog
        2.3.11 Warhol Worm and Flash Worm
        2.3.12 Forged Failure Replies
    2.4 Simulation

3 A SIGNATURE-BASED APPROACH
    3.1 Double-Honeypot System
        3.1.1 Motivation
        3.1.2 System Architecture
    3.2 Polymorphism of Internet Worms
    3.3 Position-Aware Distribution Signature (PADS)
        3.3.1 Background and Motivation
        3.3.2 Position-Aware Distribution Signature (PADS)
    3.4 Algorithms for Signature Detection
        3.4.1 Expectation-Maximization Algorithm
        3.4.2 Gibbs Sampling Algorithm
        3.4.3 Complexities
        3.4.4 Signature with Multiple Separated Strings
        3.4.5 Complexities
    3.5 MPADS with Multiple Signatures
    3.6 Mixture of Polymorphic Worms and Clustering Algorithm
        3.6.1 Normalized Cuts
    3.7 Experiments
        3.7.1 Convergence of Signature Generation Algorithms
        3.7.2 Effectiveness of Normalized Cuts Algorithm
        3.7.3 Impact of Signature Width and Worm Length
        3.7.4 False Positives and False Negatives
        3.7.5 Comparing PADS with Existing Methods

4 MULTIPLE PADS MODEL AND CLASSIFICATION OF POLYMORPHIC WORM FAMILIES: AN OPTIMIZATION
    4.1 Introduction
    4.2 Extraction of Multiple PADS Blocks from the Mixture of Polymorphic Worms
        4.2.1 PADS Blocks and The Dataset from Byte Sequences
        4.2.2 Expectation-Maximization (EM) Algorithm
        4.2.3 Extraction of Multiple PADS Blocks
    4.3 Classification of Polymorphic Worms and Signature Generation
        4.3.1 Multiple PADS Blocks Model
        4.3.2 Classification
    4.4 Conclusion

5 SUMMARY AND CONCLUSION
    5.1 Summary
    5.2 Conclusion

REFERENCES

BIOGRAPHICAL SKETCH















LIST OF TABLES
Table

2-1 Failure rates of normal hosts
2-2 5% propagation time (day) for "Temporal + Spatial"
3-1 An example of a PADS signature with width W = 10
4-1 An example of a PADS block with width W = 10
4-2 An example of a segment with width W = 10















LIST OF FIGURES
Figure

2-1 Distributed anti-worm architecture
2-2 Worm-propagation comparison
2-3 Effectiveness of the temporal rate-limit algorithm for DAW
2-4 Effectiveness of the spatial rate-limit algorithm for DAW
2-5 Stop worm propagation by blocking
2-6 Propagation time before the worm is stopped
3-1 Using double-honeypot detecting Internet worms
3-2 A decryptor example of a worm
3-3 Different variants of a polymorphic worm using the same decryptor
3-4 Different variants of a polymorphic worm using different decryptors
3-5 Different variants of a polymorphic worm with different decryptors and different entry point
3-6 Different variants of a polymorphic worm with garbage-code insertion
3-7 Different variants of a polymorphic worm with several different polymorphic techniques
3-8 Signature detection
3-9 C',l-, i-
3-10 Variants of a polymorphic worm
3-11 Influence of initial configurations
3-12 Variants clustering using normalized cuts
3-13 Matching score influence of different signature widths and sample variants lengths
3-14 Influence of different lengths of the sample variants
3-15 Influence of different lengths of the sample variants
3-16 Influence of different lengths of the sample variants
3-17 Influence of different lengths of the sample variants
3-18 False positives and false negatives
3-19 The performance of signature-based system using the longest common substrings method
3-20 Byte frequency distributions of normal traffic (left-hand plot) and worm traffic (right-hand plot)
3-21 Byte frequency distributions of worm variants. Left-hand plot: malicious and normal payloads carried by a worm variant have equal length. Right-hand plot: the normal payload carried by a worm variant is 9 times as long as the malicious payload















Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

DEFENDING AGAINST INTERNET WORMS

By

Yong Tang

May 2006

Chair: Dr. Shigang Chen
Major Department: Computer and Information Science and Engineering

With the capability of infecting hundreds of thousands of hosts, worms

represent a major threat to the Internet. While much recent research concentrates

on propagation models, the defense against worms is largely an open problem. This

dissertation develops two defense techniques based on the behavioral difference between

normal hosts and worm-infected hosts.

In the first part of the dissertation, we propose a distributed anti-worm

architecture (DAW) that automatically slows down or even halts the worm

propagation. The basic idea comes from the observation that a worm-infected host

has a much higher connection-failure rate when it scans the Internet with randomly

selected addresses. This property allows DAW to set the worms apart from the

normal hosts. A temporal rate-limit algorithm and a spatial rate-limit algorithm,

which make the speed of worm propagation configurable by the parameters of

the defense system, are proposed. DAW is designed for an Internet service provider

to provide the anti-worm service to its customers. The effectiveness of the new

techniques is evaluated analytically and by simulations.









In the second part of the dissertation, we propose a defense system that is able

to detect new worms that were not seen before and, moreover, capture the attack

packets. It can effectively identify polymorphic worms from the normal background

traffic. The system has two novel contributions. The first contribution is the design

of a novel double-honeypot system, which is able to automatically detect new

worms and isolate the attack traffic. The second contribution is the proposal of a

new type of position-aware distribution signatures (PADS), which fit in the gap

between the traditional signatures and the anomaly-based systems. We propose

two algorithms based on Expectation-Maximization (EM) and Gibbs Sampling for

efficient computation of PADS from polymorphic worm samples. The new signature

is capable of handling certain polymorphic worms. Our experiments show that

the algorithms accurately separate new variants of the MSBlaster worm from the

normal-traffic background.

In the third part of the dissertation, we further investigate the multiple PADS

model and propose the optimization of our iterative approach. We propose a

new way of extracting multiple PADS blocks at the same time using iterative

methods such as the Expectation-Maximization (EM) algorithm. To classify different

polymorphic Internet worm families, we revisit the EM algorithm based on a

Gaussian mixture model for each byte sequence, which is assumed to be in a

feature vector space. The proposed algorithm reduces the time complexity of the

iterative approach because the extraction steps can be performed simultaneously.















CHAPTER 1
INTRODUCTION

1.1 Internet Worms

An Internet worm is a self-propagating program that automatically replicates

itself to vulnerable systems and spreads across the Internet. It represents a huge

threat to the network community [1, 2, 3, 4, 5, 6, 7]. Ever since the Morris worm

showed the Internet community for the first time in 1988 that a worm could bring

the Internet down in hours [8], new worm outbreaks have occurred periodically

even though their mechanism of spreading was long well understood. On July 19,

2001, the code-red worm (version 2) infected more than 250,000 hosts in just 9

hours [9]. Soon after, the Nimda worm raged on the Internet [10]. As recently

as January 25, 2003, a new worm called SQLSlammer [11] reportedly shut down

networks across Asia, Europe and the Americas.

The most common way for a worm to propagate is to exploit a security

loophole in certain version(s) of service software to take control of the machine

and copy itself over. For example, the Morris worm exploited a bug in finger and a

trap door in sendmail of BSD 4.2 or 4.3, while the code-red worm took advantage

of a buffer-overflow problem in the index server of IIS 4.0 or IIS 5.0. Typically

a worm-infected host scans the Internet for vulnerable systems. It chooses an IP

address, attempts a connection to a service port (e.g., TCP port 80 in the case of

code red), and if successful, attempts the attack. The above process repeats with

different random addresses. As more and more machines are compromised, more

and more copies of the worm are working together to reproduce themselves. An

explosive epidemic therefore develops across the Internet.
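
The scan-and-infect loop described above can be illustrated with a small discrete-time simulation. This is a minimal sketch under stated assumptions, not the propagation model analyzed later in the dissertation: the function name `simulate_random_scan`, its parameters, the convention that vulnerable hosts occupy the lowest addresses, and the single initially infected host are all hypothetical choices made for illustration.

```python
import random


def simulate_random_scan(address_space, vulnerable, scans_per_step, steps, seed=0):
    """Return the number of infected hosts after each time step.

    Assumption (illustrative only): vulnerable hosts occupy addresses
    0 .. vulnerable-1, and host 0 is the single initial infection.
    """
    rng = random.Random(seed)
    infected = {0}
    history = [len(infected)]
    for _ in range(steps):
        newly_infected = set()
        # Every currently infected host probes random addresses.
        for _host in range(len(infected)):
            for _scan in range(scans_per_step):
                target = rng.randrange(address_space)
                # A probe succeeds only if it hits a vulnerable, uninfected host.
                if target < vulnerable and target not in infected:
                    newly_infected.add(target)
        infected |= newly_infected
        history.append(len(infected))
    return history


if __name__ == "__main__":
    curve = simulate_random_scan(address_space=10000, vulnerable=500,
                                 scans_per_step=20, steps=40)
    print(curve)
```

Because every newly infected host starts scanning as well, the printed curve typically grows slowly at first, then very quickly, and finally flattens as few uninfected vulnerable hosts remain.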









Although most known worms did not cause severe damage to the compromised

systems, they could have altered data, removed files, stolen information, or used the

infected hosts to launch other attacks if they had chosen to do so.

The worm activity often causes Denial-of-Service (DoS) as a by-product. The

hosts that are vulnerable to a worm typically account for a small portion of the IP

address space. Hence, worms rely on high-volume random scan to find victims. The

scan traffic from tens of thousands of compromised machines can congest networks.

There are few answers to the worm threat. One solution is to patch the

software and eliminate the security defects [9, 10, 11]. That has not worked because

(1) software bugs seem always to increase as computer systems become more and

more complicated, and (2) not all people have the habit of keeping an eye on the

patch releases. The patch for the security hole that led to the SQLSlammer worm

was released half a year before the worm appeared, and still tens of thousands of

computers were infected. Intrusion detection systems and anti-virus software may

be upgraded to detect and remove a known worm, and routers and firewalls may be

configured to block the packets whose content contains worm signatures, but those

happen after a worm has spread and been analyzed.

Moore et al. studied the effectiveness of worm containment technologies

(address blacklisting and content filtering) and concluded that such systems must

react in a matter of minutes and interdict nearly all Internet paths in order to be

successful [2]. Williamson proposed to modify the network stack so that the rate

of connection requests to distinct destinations is bounded [12]. The main problem

is that this approach becomes effective only after the majority of all Internet

hosts are upgraded with the new network stack. For an individual organization,

although the local deployment may benefit the Internet community, it does not

provide immediate anti-worm protection to its own hosts, whose security depends









on the rest of the Internet taking the same action. This gives little incentive for the

upgrade without an Internet-wide coordinated effort.

Most known worms have very aggressive behaviors. They attempt to infect

the Internet in a short period of time. These types of worms are actually easier to

detect because their aggressiveness stands out from the background traffic.

Future worms may be modified to circumvent the rate-based defense systems and

purposely slow down the propagation rate in order to compromise a vast number of

systems over the long run without being detected [2].

Intrusion detection has been intensively studied in the past decade. Anomaly-based

systems [4, 13, 14] profile the statistical features of normal traffic. Any

deviation from the profile will be treated as suspicious. Although these systems can

detect previously unknown attacks, they have high false positives when the normal

activities are diverse and unpredictable. On the other hand, misuse detection

systems look for particular, explicit indications of attacks such as the pattern of

malicious traffic payload. They can detect the known worms but will fail on the

new types.

Most deployed worm-detection systems are signature-based, which belongs to

the misuse-detection category. They look for specific byte sequences (called attack

signatures) that are known to appear in the attack traffic. The signatures are

manually identified by human experts through careful analysis of the byte sequence

from captured attack traffic. A good signature should be one that consistently

shows up in attack traffic but rarely appears in normal traffic.

The signature-based systems [15, 16] have an advantage over the anomaly-based

systems due to their simplicity and the ability of operating online in real time. The

problem is that they can only detect known attacks with identified signatures

that are produced by experts. Automated signature generation for new attacks

is extremely difficult for three reasons. First, in order to create an attack









signature, we must identify and isolate attack traffic from legitimate traffic.

Automatic identification of new worms is critical because it is the foundation of

other defense measures. Second, the signature generation must be general enough

to capture all attack traffic of a certain type while at the same time specific

enough to avoid overlapping with the content of normal traffic in order to reduce

false-positives. This problem has so far been handled in an ad-hoc way based on

human judgement. Third, the defense system must be flexible enough to deal with

the polymorphism in the attack traffic. Otherwise, worms may be programmed to

deliberately modify themselves each time they replicate and thus fool the defense

system.

1.2 Related Work

Much recent research on Internet worms concentrates on propagation

modeling. A classic epidemiological model of a computer virus was proposed

by Kephart and White [17]. This model was later used to analyze the propagation

behavior of Code-Red-like worms by Staniford et al. [1] and Moore et al. [18].

Refinements were made to the model by Zou et al. [19] and Weaver et al. [20] in

order to fit with the observed propagation data.

Chen et al. proposed a sophisticated worm propagation model (called AAWP

[21]) based on discrete times. In the same work, the model is applied to monitor,

detect, and defend against the spread of worms under a rather simplified setup,

where a range of unused addresses are monitored and a connection made to those

addresses triggers a worm alert. The distributed early warning system by Zou

et al. [22] also monitors unused addresses for the "trend" of illegitimate scan

traffic on the Internet. There are two problems with these approaches. First,

the attackers can easily overwhelm such a system with false positives by sending

packets to those addresses, or some normal programs may scan the Internet for

research or other purposes and hit the monitored addresses. Second, to achieve








good response time, the number of unused addresses to be monitored has to be

large, but addresses are a scarce resource in the IPv4 world, and only a few have the

privilege of establishing such a system. A monitor/detection system based on used

addresses will be much more attractive. It allows more institutes or commercial

companies to participate in the quest of defeating Internet worms.

For worms that propagate among certain types of servers, a solution is to

block the servers' outbound connections so that the worms cannot spread among

them. This approach works only when it is implemented for all or a vast majority

of the servers on the Internet. Such an Internet-wide effort has not been and may

never be achieved, considering that there are so many countries in the world and

home users are setting up their servers without knowing this "good practice." In

addition, the approach does not apply when a machine is used both as a server and

as a client.

Moore et al. studied the effectiveness of worm containment technologies

(address blacklisting and content filtering) and concluded that such systems must

react in a matter of minutes and interdict nearly all Internet paths in order to be

successful [2]. Williamson and Twycross proposed to modify the network stack so

that the rate of connection requests to distinct destinations is bounded [12, 23].

Schechter et al. [24] used the sequential hypothesis test to detect scan sources and

proposed a credit-based algorithm for limiting the scan rate of a host. Weaver et

al. [25] developed containment algorithms suitable for deployment with high-speed,

low-cost network hardware. The main problem of the above approaches is that

their effectiveness against worm propagation requires Internet-wide deployment.

Gu et al. [26] proposed a simple two-phase local worm victim detection algorithm

based on both infection pattern and scanning pattern. Apparently, it cannot issue a

warning before some local hosts are compromised. None of the above approaches is

able to handle the flash worms [27] that perform targeted scanning.









Honeypots [28] have gained a lot of attention recently. Their goal is to

attract and trap the attack traffic on the Internet. Provos [29] designed a virtual

honeypot framework to exhibit the TCP/IP stack behavior of different operating

systems. Kreibich and Crowcroft [30] proposed the Honeycomb to identify the

worm signatures by using longest common substrings. Dagon et al. developed

HoneyStat [31] to detect worm behaviors in small networks. The above systems

either assume that all incoming connections to the honeypot are from worms, or

rely on experts for the manual worm analysis. These restrictions greatly undermine

the effectiveness of the systems.
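
To make the longest-common-substring idea concrete, the following sketch finds the longest contiguous byte string shared by two captured payloads. It is an illustration only: Honeycomb's own implementation is not described here, a simple dynamic-programming routine is swapped in, and the function name and the sample payloads are hypothetical.

```python
def longest_common_substring(a, b):
    """Return the longest contiguous byte string that occurs in both a and b."""
    best_len, best_end = 0, 0
    # previous[j] holds the length of the common suffix of a[:i-1] and b[:j].
    previous = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                current[j] = previous[j - 1] + 1
                if current[j] > best_len:
                    best_len, best_end = current[j], i
        previous = current
    return a[best_end - best_len:best_end]


if __name__ == "__main__":
    # Made-up payloads that share one invariant byte string.
    first = b"GET /default.ida?NNNN%u9090"
    second = b"POST /x.ida?NNNN%u9090 HTTP"
    print(longest_common_substring(first, second))
```

A fixed string obtained this way works as a signature only while the shared bytes stay invariant across attack instances, which is the limitation that polymorphic worms exploit.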

Kruegel and Vigna [4] discussed various ways of applying anomaly detection

to web-based attacks. Several methods, such as the χ²-test and Markov models,

were presented. Wang and Stolfo [14] used the byte-frequency distribution of

the traffic payload to identify anomalous behavior and possibly worm attacks.

These methods are ineffective against polymorphic worms. The research in

defending against polymorphic worms is still in its infancy. Christodorescu and

Jha [32] discussed a variety of different polymorphism techniques that could be

used to obfuscate malicious code. They also proposed a static analysis method to

identify malicious patterns in executables. Kolesnikov and Lee [33] described

some advanced polymorphic worms that mutate based on normal traffic. Kim and

Karp [34] proposed a worm signature detection system with limited discussion on

polymorphism.

1.3 Contribution

There are three major contributions in this thesis. First of all, we provide a

worm containment technology that is deployed on an ISP to provide anti-worm

service. Our system is able to substantially slow down the worm propagation rate

even if the system is not deployed to the whole Internet. Second, we propose a

double-honeypot system to automatically identify the worm attacks and generate









worm signatures. Finally, to further improve the performance, a novel format of

signature is defined and the iterative methods to compute the signature are discussed

in order to deal with the polymorphism of Internet worms. The proposed method is

optimized in the thesis with a Gaussian mixture model, which eliminates unnecessary

computations and reduces the time complexity of our approach.

1.3.1 Distributed Anti-Worm Architecture

We propose a distributed anti-worm architecture (DAW), which is designed

for an Internet service provider (ISP) to provide the anti-worm service to its

customers. (Note that, from one ISP's point of view, the neighbor ISPs are also

customers.) DAW is deployed at the ISP edge routers, which are under a single

administrative control. It incorporates a number of new techniques that monitor

the scanning activity within the ISP network, identify the potential worm threats,

restrict the speed of worm propagation, and even halt the worms by blocking

out scanning sources. By tightly restricting the connection-failure rates from

worm-infected hosts while allowing the normal hosts to make successful connections

at any rate, DAW is able to significantly slow down the worm's propagation in an

ISP and minimize the negative impact on the normal users.

The proposed defense system separates the worm-infected hosts from the

normal hosts based on their behavioral differences. In particular, a worm-infected

host has a much higher connection-failure rate when it scans the Internet with

randomly selected addresses, while a normal user deals mostly with valid addresses

due to the use of DNS (Domain Name System). This and other properties allow

us to design the entire defense architecture based on the inspection of failed

connection requests, which not only reduces the system overhead but minimizes

the disturbance to normal users, who generate fewer failed connections than

worms. With a temporal rate-limit algorithm and a spatial rate-limit algorithm,

DAW is able to tightly restrict the worm's scanning activity, while allowing the









normal hosts to make successful connections at any rate. Due to the use of DNS in

resolving IP addresses, the chance of attempting connections to non-existing hosts

by normal users is relatively low, because a connection will never be initiated by

the application if DNS does not find the destination host. This is especially true

considering that a typical user has a number of favorite, frequently-accessed sites

(that are known to exist). A temporal rate-limit algorithm and a spatial rate-limit

algorithm are used to bound the scanning rate of the infected hosts. One important

contribution of DAW is to make the speed of worm propagation configurable, no

longer by the parameters of worms but by the parameters of DAW. While the

actual values of the parameters should be set based on the ISP traffic statistics,

we analyze the impact of those parameters on the performance of DAW and use

simulations to study the suitable value ranges.
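
The principle of charging a host only for its failed connections can be sketched with a per-source token bucket. This is a hedged illustration, not DAW's temporal or spatial rate-limit algorithm, which Chapter 2 specifies: the class name `FailureRateLimiter` and the parameters `failures_per_second` and `burst` are hypothetical.

```python
class FailureRateLimiter:
    """Token bucket that charges a source only for failed connection attempts."""

    def __init__(self, failures_per_second, burst):
        self.rate = failures_per_second  # hypothetical sustained failure allowance
        self.burst = burst               # hypothetical maximum stored allowance
        self.tokens = {}                 # source -> remaining failure tokens
        self.last_seen = {}              # source -> time of the last update

    def _refill(self, source, now):
        tokens = self.tokens.get(source, self.burst)
        elapsed = now - self.last_seen.get(source, now)
        self.tokens[source] = min(self.burst, tokens + elapsed * self.rate)
        self.last_seen[source] = now

    def allow_request(self, source, now):
        """A new connection request passes only while failure tokens remain."""
        self._refill(source, now)
        return self.tokens[source] >= 1.0

    def record_result(self, source, now, succeeded):
        """Successful connections cost nothing; each failure consumes one token."""
        self._refill(source, now)
        if not succeeded:
            self.tokens[source] = max(0.0, self.tokens[source] - 1.0)


if __name__ == "__main__":
    limiter = FailureRateLimiter(failures_per_second=1.0, burst=3)
    passed = 0
    for tick in range(50):               # a scanner probing every 10 ms
        now = tick * 0.01
        if limiter.allow_request("scanner", now):
            passed += 1
            limiter.record_result("scanner", now, succeeded=False)
    print(passed)
```

In this sketch a host whose connections succeed is never throttled, while a host that keeps failing is held to the configured failure allowance, so the limit is set by the defender's parameters rather than by the scanner's speed.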

1.3.2 Signature-Based Worm Identification and Defense

We design a novel double-honeypot system which is deployed in a local

network for automatic detection of worm attacks from the Internet. The system

is able to isolate the attack traffic from the potentially huge amount of normal

traffic in the background. It allows us not only to trigger warnings but also to

record the attack instances of an on-going worm epidemic. We summarize the

polymorphism techniques that a worm may use to evade the detection by the

current defense systems. We then define the position-aware distribution signature

(PADS) capable of detecting polymorphic worms of certain types. The new

signature is a collection of position-aware byte frequency distributions, which is

more flexible than the traditional signatures of fixed strings and more precise

than the position-unaware statistical signatures. We describe how to match a

byte sequence against the "non-conventional" PADS. Two algorithms based on

Expectation-Maximization (EM) [35] [36] are proposed for efficient computation

of PADS from polymorphic worm samples. Experiments based on variants of









the MSBlaster worm are performed. The results show that our signature-based

defense system can accurately separate new variants of the worm from the normal

background traffic by using the PADS signature derived from the past samples. To

deal with multiple malicious segments of the worm, a multi-segment position-aware

distribution signature (MPADS) is proposed for classification of polymorphic worm families,

together with the normalized cut algorithm.

1.3.3 Optimization of Iterative Methods

The iterative methods discussed in the last subsection suffer from several

drawbacks. First, because PADS signatures can only be obtained one

by one and each iterative run is time-consuming, it takes a long

time before every PADS signature has been extracted. Second, because PADS

signatures are extracted sequentially, their quality varies from one signature

to the next. Since iterative methods are used, different initializations result in

entirely different PADS signature sets, which affects the clustering of the polymorphic

worm family. To address these problems, a mixture model is used, which assumes

that each segment of the dataset may come from multiple PADS blocks at the

same time. It has a clear advantage over the previously proposed approach in that

multiple PADS blocks can be extracted simultaneously, which reduces the time

needed by the iterative approach. Furthermore, we define a new metric to measure the

quality of the matching between a set of PADS blocks and a byte sequence. This

chapter can be considered an optimization of the previous chapter.














CHAPTER 2
SLOWING DOWN INTERNET WORMS

2.1 Modeling Worm Propagation

Worm propagation can be roughly characterized by the classical simple

epidemic model [37, 1, 2]:

$$\frac{di(t)}{dt} = \beta\, i(t)\,(1 - i(t)) \qquad \text{(2-1)}$$

where $i(t)$ is the percentage of vulnerable hosts that are infected with respect to

time $t$, and $\beta$ is the rate at which a worm-infected host detects other vulnerable

hosts.

First we formally deduce the value of $\beta$. Some notations are defined as follows.

$r$ is the rate at which an infected host scans the address space. $N$ is the size of the

address space. $V$ is the total number of vulnerable hosts.

At time $t$, the number of infected hosts is $i(t) \cdot V$, and the number of vulnerable

but uninfected hosts is $(1 - i(t))V$. The probability for one scan message to hit an

uninfected vulnerable host is $p = (1 - i(t))V/N$. For an infinitely small period $dt$,

$i(t)$ changes by $di(t)$. During that time, there are $n = r \cdot i(t) \cdot V \cdot dt$ scan messages

and the number of newly infected hosts is

$$n \times p = r \cdot i(t) \cdot V \cdot dt \cdot (1 - i(t))\frac{V}{N} = r\, i(t)\,(1 - i(t))\,\frac{V^2}{N}\,dt.^{1}$$

Therefore,

$$V\,di(t) = r\, i(t)\,(1 - i(t))\,\frac{V^2}{N}\,dt$$

$$\frac{di(t)}{dt} = r\,\frac{V}{N}\, i(t)\,(1 - i(t)) \qquad \text{(2-2)}$$

That is, the rate parameter of the simple epidemic model is $\beta = rV/N$.



¹ When $dt \to 0$, the probability of multiple scan messages hitting the same host
becomes negligible.









The above equation agrees perfectly with our simulations. Solving the

equation, we have

$$i(t) = \frac{e^{r\frac{V}{N}(t-T)}}{1 + e^{r\frac{V}{N}(t-T)}}$$

Let the number of initially infected hosts be $v$. Then $i(0) = v/V$, and we have

$T = -\frac{N}{rV}\ln\frac{v}{V-v}$. The time $t(\alpha)$ it takes for a percentage $\alpha$ ($\geq v/V$) of all vulnerable

hosts to be infected is

$$t(\alpha) = \frac{N}{rV}\left(\ln\frac{\alpha}{1-\alpha} - \ln\frac{v}{V-v}\right) \qquad \text{(2-3)}$$

Suppose the worm attack starts from one infected host, i.e., $v = 1$. We have

$$t(\alpha) = \frac{N}{rV}\ln\frac{\alpha(V-1)}{1-\alpha} \qquad \text{(2-4)}$$
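As a rough numerical check of Eq. (2-3) and Eq. (2-4), the closed-form solution can be evaluated directly. The following Python sketch is illustrative only: the function names and the sample values of N, V, and r are assumptions chosen for the example and are not taken from the text.

```python
import math

def infected_fraction(t, r, V, N, v=1):
    """Closed-form solution of Eq. (2-2): fraction i(t) of vulnerable hosts infected."""
    beta = r * V / N                       # rate parameter, beta = rV/N
    T = -math.log(v / (V - v)) / beta      # shift chosen so that i(0) = v/V
    x = math.exp(beta * (t - T))
    return x / (1.0 + x)

def time_to_infect(alpha, r, V, N, v=1):
    """Eq. (2-3): time until a fraction alpha (>= v/V) of vulnerable hosts is infected."""
    return (N / (r * V)) * (math.log(alpha / (1 - alpha)) - math.log(v / (V - v)))

# Hypothetical parameters: IPv4 address space, 360,000 vulnerable hosts,
# 6 scans per second per infected host.
N, V, r = 2**32, 360_000, 6.0
t90 = time_to_infect(0.9, r, V, N)
print(f"time to infect 90% of vulnerable hosts: {t90 / 3600:.1f} hours")
```

Halving r in this sketch doubles the returned time, which is the inverse-proportional dependence on the scanning rate that the model predicts.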

The time predicted by Eq. (2-4) can be achieved only under ideal conditions.

In reality, worms propagate slower due to a number of factors. First, once a

large number of hosts are infected, the aggressive scanning activities often

cause wide-spread network congestion and consequently many scan messages

are dropped. Second, when a worm outbreak is announced, many system

administrators shut down vulnerable servers or remove the infected hosts from

the Internet. Third, some types of worms enter dormant state after being active for

a period of time. Due to the above reasons, the code red spread much slower than

the calculation based on Eq. (2-4). A more sophisticated model that considers the

first two factors can be found in [19], which fits better with the observed code-red

data. None of the existing models can describe the theoretical Warhol worm and Flash

worm presented in [1]. We shall address them separately in Section 2.3.11.

Practically it is important to slow down the worm propagation in order to give

the Internet community enough time to react in the face of an unknown worm.

Eq. (2-4) points out two possible approaches: decreasing $r$ causes $t(\alpha)$ to increase

inverse-proportionally; increasing $N$ causes $t(\alpha)$ to increase proportionally. In









this paper, we use the first approach to slow down the worms, while relying on a

different technique to halt the propagation. The idea is to block out the infected

hosts and make sure that the scanning activity of an infected host does not last

for more than a period of $\Delta T$. Under such a constraint, the propagation model

becomes

$$\frac{di(t)}{dt} = r\,\frac{V}{N}\,\bigl(i(t) - i(t - \Delta T)\bigr)\,(1 - i(t)) \qquad \text{(2-5)}$$

The above equation can be derived by following the same procedure that derives

Eq. (2-2), except that at time $t$ the number of infected hosts is $(i(t) - i(t - \Delta T)) \cdot V$

instead of $i(t) \cdot V$.

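Theorem 1. If $r\Delta T < \left(1 - \frac{v}{\alpha V}\right)\frac{N}{V}$, the worm will be stopped before a percentage $\alpha$

of all vulnerable hosts are infected.

Proof: Each infected host sends $r\Delta T$ scan messages, and causes $r\Delta T\frac{V}{N}$ (or

less due to duplicate hits) new infections. For the worm to stop, we need $r\Delta T\frac{V}{N} < 1$.

The total number of infections before the worm stops is no more than

$$\sum_{i=0}^{\infty} v\left(r\Delta T\frac{V}{N}\right)^{i} = \frac{v}{1 - r\Delta T\frac{V}{N}}.$$

If $r\Delta T < \left(1 - \frac{v}{\alpha V}\right)\frac{N}{V}$, we have $\frac{v}{1 - r\Delta T\frac{V}{N}} < \alpha V$. Namely, the worm stops

before a percentage $\alpha$ of the vulnerable hosts are infected.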
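The bound used in the proof of Theorem 1 can be checked numerically. The sketch below is a minimal illustration, assuming hypothetical helper names and sample parameter values; it computes the largest scan budget $r\Delta T$ permitted by the theorem and the geometric-series bound on the total number of infections.

```python
def max_scan_budget(alpha, v, V, N):
    """Exclusive upper limit on r*DeltaT from Theorem 1."""
    return (1.0 - v / (alpha * V)) * N / V

def infection_bound(r, delta_t, v, V, N):
    """Geometric-series bound v / (1 - r*DeltaT*V/N); None if the series diverges."""
    ratio = r * delta_t * V / N   # expected new infections caused by one infected host
    if ratio >= 1.0:
        return None
    return v / (1.0 - ratio)

# Hypothetical setting: 10 initially infected hosts, target of at most 1% infected.
N, V, v, alpha = 2**32, 360_000, 10, 0.01
budget = max_scan_budget(alpha, v, V, N)
print(f"each infected host may send fewer than {budget:.0f} scans before being blocked")
print(f"bound at 99% of that budget: {infection_bound(1.0, 0.99 * budget, v, V, N):.0f} hosts")
```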

2.2 Failure Rate

This paper studies the worms that spread via TCP, which accounts for the

majority of Internet traffic. We present a new approach that measures the potential

scanning activities by monitoring the failed connection requests, excluding those

due to network congestion.

When a source host makes a connection request, a SYN packet is sent to a

destination address. The connection request fails if the destination host does not

exist or does not listen on the port that the SYN is sent to. In the former case, an

ICMP host-unreachable packet is returned to the source host; in the latter case,

a TCP RESET packet is returned. We call an ICMP host-unreachable or TCP

RESET packet a connection-failure reply (or simply failure reply). The rate of









failed connection requests from a host s is called the failure rate, which can be

measured by monitoring the failure replies that are sent to s.
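To make the definition concrete, the following Python sketch classifies packets as failure replies and counts them per destination host. It is an illustration under stated assumptions: the `Packet` class is a hypothetical, already-parsed packet summary rather than a real capture API, while the numeric constants are the standard ICMP type 3 (destination unreachable), code 1 (host unreachable), and the TCP RST flag bit.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

ICMP_DEST_UNREACHABLE = 3   # ICMP type "destination unreachable"
ICMP_HOST_UNREACHABLE = 1   # ICMP code "host unreachable"
TCP_RST = 0x04              # RST bit in the TCP flags byte

@dataclass
class Packet:
    """Hypothetical parsed packet summary used only for this example."""
    dst: str                          # destination address of the reply
    proto: str                        # "icmp" or "tcp"
    icmp_type: Optional[int] = None
    icmp_code: Optional[int] = None
    tcp_flags: int = 0

def is_failure_reply(p: Packet) -> bool:
    """True for an ICMP host-unreachable or a TCP RESET packet."""
    if p.proto == "icmp":
        return (p.icmp_type, p.icmp_code) == (ICMP_DEST_UNREACHABLE, ICMP_HOST_UNREACHABLE)
    if p.proto == "tcp":
        return bool(p.tcp_flags & TCP_RST)
    return False

def count_failures(packets):
    """Count failure replies per destination, i.e., per host whose request failed."""
    failures = Counter()
    for p in packets:
        if is_failure_reply(p):
            failures[p.dst] += 1
    return failures
```

Dividing each count by the length of the observation window gives the failure rate of the corresponding host.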

To support DAW, the ISP requires its customer networks to return ICMP

host-unreachable packets if the SYN packets are dropped by their routers or

firewalls. It is a common practice on the Internet.

It should be noted that our defense system does not require every customer

network that blocks ICMP to forward the log messages, although doing so helps

the performance of the system. Our defense system works well as long as a portion

(e.g., 10%) of all customer networks does not block ICMP host-unreachable

packets.

The failure rate measured for a normal host is likely to be low. For most

Internet applications (www, telnet, ftp, etc.), a user normally types domain names

instead of raw IP addresses to identify the servers. Domain names are resolved to

IP addresses by the Domain Name System (DNS). If DNS cannot find the address

of a given name, the application will not issue a connection request. Hence,

mistyped names or stale web links do not result in failed connection requests. An ICMP

host-unreachable packet is returned only when the server is off-line or the DNS

record is stale, which are both uncommon for popular or regularly-maintained

sites (e.g., Yahoo, eBay, CNN, universities, governments, enterprises, etc.) that

attract most of Internet traffic. Moreover, a frequent user typically has a list of

favorite sites (servers) to which most connections are made. Since those sites

are known to work most of the time, the failure rate for such a user is likely to

be low. If a connection fails due to network congestion, it does not affect the

measurement of the failure rate because no ICMP host-unreachable or RESET

packet is returned. To illustrate our argument, we measured the failure rates on

three different domains of the University of Florida network. In our experiments,

domain 1 consists of five Class C networks, domain 2 consists of one Class C









           avg. daily failure rate   worst daily failure rate   daily failure rate
           per host                  per host                   of the whole network
Domain 1   3.0                       43                         824
Domain 2   10.1                      41                         116
Domain 3   3.11                      63                         106

Table 2-1. Failure rates of normal hosts



network, and domain 3 consists of two Class C networks. Table 2-1 clearly shows

that failure rates for normal hosts are typically very low.

On the other hand, the failure rate measured for a worm-infected host is likely

to be high. Unlike normal traffic, most connection requests initiated by a worm

fail because the destination addresses are randomly picked and are likely either

not in use or not listening on the port that the worm targets. Consider the

infamous code-red worm. Our experiment shows that over 99% of all connections

made to random addresses at TCP port 80 fail. That is, the failure rate is over 99%

of the scanning rate. For worms targeting software that is less popular than

web servers, this figure will be even higher. The relation between the scanning rate

$r_s$ and the failure rate $r_f$ of a worm is

$$r_f = \left(1 - \frac{V'}{N}\right) r_s,$$

where $V'$ is the number of hosts that listen on the attacked port(s).² If $V' \ll N$,

we have

$$r_f \approx r_s \qquad \text{(2-6)}$$

Hence, measuring the failure rate of a worm gives a good idea about its scanning

rate. Given the aggressive behavior of a worm-infected host, its failure rate is

likely to be high, which sets it apart from the normal hosts. More importantly, an


² $V \leq V'$ because not every host that listens on the attacked port(s) is vulnerable.









approach that restricts the failure rate will restrict the scanning rate, which slows

down the worm propagation.
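The relation between the scanning rate and the failure rate of a worm is easy to evaluate. The short Python sketch below uses a hypothetical function name and made-up sample values to show how close the two rates are when few hosts listen on the attacked port.

```python
def failure_rate(scan_rate, listening_hosts, address_space):
    """r_f = (1 - V'/N) * r_s, with V' hosts listening on the attacked port."""
    return (1.0 - listening_hosts / address_space) * scan_rate

# Hypothetical example: 200 scans per second, 1% of the IPv4 space listening.
N = 2**32
print(failure_rate(200.0, N // 100, N))
```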

A worm may be deliberately designed to have a slow propagation rate in order

to evade the detection, which will be addressed in Section 2.3.9.

2.3 A Distributed Anti-Worm Architecture

2.3.1 Objectives

This section presents a distributed anti-worm architecture (DAW), whose main

objectives are

Slowing down the worm propagation to allow human reaction time. It took

the code red just hours to achieve wide infection. Our goal is to prolong

that time to tens of days. A worm may even be stopped, especially when the

infected hosts scan at high rates, a property common to most known worms.

Detecting potential worm activities and identifying likely offending hosts,

which provides the security management team with valuable information in

analyzing and countering the worm threat.

Minimizing the performance impact on normal hosts and routers. Particularly,

a normal host should be able to make successful connections at any rate; a

server should be able to accept connections at any rate; the processing and

storage requirements on a router should be minimized.

2.3.2 Assumptions

Most businesses, institutions, and homes access the Internet via Internet

service providers (ISPs). An ISP network interconnects its customer networks,

and routes the IP traffic between them. The purpose of DAW is to provide an

ISP-based anti-worm service that prevents Internet worms from spreading among

the customer networks. DAW is practically feasible because its implementation

is within a single administrative domain. It also has strong business merit since









Figure 2-1. Distributed anti-worm architecture (diagram labels: Internet Service Provider; edge routers with DAW agent)


a large ISP has sufficient incentive to deploy such a system in order to gain

marketing edge against its competitors.

We assume that a significant portion of failure replies are not blocked within

the ISP. If the ISP address space is densely populated, then it is required that a

significant portion of TCP RESET packets are not blocked, which is normally the

case. If the ISP address space is sparsely populated, then it is required that ICMP

host-unreachable packets from a significant portion of addresses are not blocked,

which can be easily satisfied. Because there are many unused addresses, the ISP

routers will generate ICMP host-unreachable for those addresses. Hence, the ISP

simply has to make sure its own routers do not filter ICMP host-unreachable until

they are counted.

If some customer networks block all incoming SYN packets except for a list of

servers, their filtering routers should either generate ICMP host-unreachable for the

dropped SYN packets or, in case ICMP replies are undesirable, send log messages

to an ISP log station. Upon receipt of a log message, the log station sends an

ICMP host-unreachable towards the sender of the SYN packet. When an ISP edge

router receives an ICMP host-unreachable packet from the log station, it counts a

connection failure and drops the packet.









2.3.3 DAW Overview

As illustrated in Figure 2-1, DAW consists of two software components:

a DAW agent that is deployed on all edge routers of the ISP and a management

station that collects data from the agents. Each agent monitors the connection-failure

replies sent to the customer network that the edge router connects to. It identifies

the potential offending hosts and measures their failure rates. If the failure rate of

a host exceeds a pre-configured threshold, the agent randomly drops a minimum

number of connection requests from that host in order to keep its failure rate under

the threshold. A temporal rate-limit algorithm and a spatial rate-limit algorithm

are used to constrain any worm activity to a low level over the long term, while

accommodating the temporary aggressive behavior of normal hosts. Each agent

periodically reports the observed scanning activity and the potential offenders

to the management station. A continuous, steady increase in the gross scanning

activity raises the flag of a possible worm attack. The worm propagation is further

slowed or even stopped by blocking the hosts with persistently high failure rates.

Each edge router reads a configuration file from the management station about

what source addresses S and what destination ports P that it should monitor and

regulate. S consists of all or some addresses belonging to the customer network.

It provides a means to exempt certain addresses from DAW for research or other

purposes. P consists of the port numbers to be protected such as 80/8080 for

www, 23 for telnet, and 21 for ftp. It should exclude the applications that are not

suitable for DAW; for example, a hypothetical application that runs with an extremely

high failure rate, making normal hosts indistinguishable from worms targeting

the application. While DAW is not designed for all services, it is particularly

effective in protecting the services whose clients involve human interactions such

as web browsing, which makes greater distinction between normal hosts and

worm-infected hosts.









Throughout the paper, when we say "a router receives a connection request",

we refer to a connection request that enters the ISP from a customer network, with

a source address in S and a destination port in P. When we say "a router receives

a failure reply", we refer to a failure reply that leaves the ISP to a customer

network, with a destination address in S and a source port in P if it is a TCP

RESET packet.

This dissertation does not address the worm activity within a customer

network. A worm-infected host is not restricted in any way to infect other

vulnerable hosts of the same customer network. DAW works only against the

inter-network infections. The scanning rate of an infected host s is defined as the

number of connection requests sent by s per unit of time to addresses outside of the

customer network where s resides.

If a customer network has m(> 1) edge routers with the same ISP, the DAW

agent should be installed on all m edge routers. If some edge routers are with

different ISPs that do not implement DAW, the network can be infected via those

ISPs but is then restricted in spreading the worm to the customer networks of the

ISPs that do implement DAW. For the purpose of simplicity, we do not consider

multi-homed networks in the analysis.

Based on the data from all agents, the controller monitors the total number

of potential offenders. A steady increase in the number of potential offenders

is considered as possible on-going worm propagation. When this happens,

the controller instructs the edge routers to block out a percentage of potential

offenders (i.e., their IP addresses) that have the highest failure rates. The controller

continues to double the percentage after each period (e.g., one minute) until

the number of potential offenders stops increasing. The reason to block only a

percentage instead of all potential offenders is as follows: the failure rates of some

normal hosts may happen to exceed the threshold amidst a worm attack. With a









mix of normal hosts and infected hosts, the aggressive behavior of worms makes

the infected hosts more likely to be blocked, while the normal hosts whose failure rates

only marginally exceed the threshold remain unblocked.

On the other hand, if a normal host happens to run an automatic host-map

tool in the middle of a worm attack, it may be blocked due to the high failure rate

of its scanning activity. To prevent it from being blocked indefinitely, each blocked

address should be unblocked after a certain period of time. An edge router keeps

a log of the blocked addresses and the number of times they have been blocked recently

(e.g., during the past month). When an address is repeatedly blocked, the
blocking time grows exponentially as $T = T_0 e^{k}$, where $T_0$ is the initial blocking

time and $k$ is the number of prior blocks.
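The two escalation rules described above (doubling the blocked percentage each period, and lengthening the blocking time of repeat offenders) can be sketched in a few lines of Python. This is an illustration only: it assumes the blocking time takes the form $T = T_0 e^{k}$, and the function names, the initial percentage, and the initial blocking time are hypothetical.

```python
import math

def blocking_time(t0, prior_blocks):
    """Blocking duration for an address already blocked prior_blocks times (assumed T0 * e^k)."""
    return t0 * math.exp(prior_blocks)

def escalating_block_percentages(initial_pct, periods):
    """Percentage of potential offenders blocked in successive periods, doubling up to 100."""
    pct, schedule = initial_pct, []
    for _ in range(periods):
        schedule.append(min(pct, 100.0))
        pct *= 2
    return schedule

# Hypothetical values: start by blocking 5% of offenders, 10-minute initial block.
print(escalating_block_percentages(5.0, 6))
print([round(blocking_time(600.0, k)) for k in range(4)])
```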

How to monitor failed connection attempts? The answer to this question

allows DAW to separate the worm activity from most normal traffic and

consequently reduces the amount of information that DAW has to process.

How to achieve bounded failure rate? The answer to this question effectively

limits the maximum scanning rate (r in Eq. (2-4)) of any infected host.

How to reduce false positives? The answer to this question helps to reduce

the impact on normal hosts.

How to automatically generate the worm signatures? The answer to this

question allows DAW to work with intrusion-detection devices and firewalls to

identify and filter out the worm traffic.

2.3.4 Measuring Failure Rate

Each edge router measures the failure rates for the addresses belonging to the

customer network that the router connects to.

A failure-rate record consists of an address field s, a failure rate field f,

a timestamp field t, and a failure counter field c. The initial values of f and c

are zeros; the initial value of t is the system clock when the record is created.









Whenever the router receives a failure reply for s, it calls the following function,

which updates f each time c is increased by 100. $\beta$ is a parameter between 0 and

1.

Update_Failure_Rate_Record( )

(1) c <- c + 1

(2) if (c is a multiple of 100)

(3)     f' <- 100 / (the current system clock - t)

(4)     if (c = 100)

(5)         f <- f'

(6)     else

(7)         f <- $\beta$ x f + (1 - $\beta$) x f'

(8)     t <- the current system clock

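A runnable Python rendering of Update_Failure_Rate_Record() may help in reading the pseudocode. It is a sketch under assumptions: the record layout follows the fields f, t, and c named in the text, while the class name, the default smoothing weight of 0.5, and the guard against a zero time interval are additions made for this example.

```python
from dataclasses import dataclass

@dataclass
class FailureRateRecord:
    t: float          # timestamp of record creation or of the last rate update
    f: float = 0.0    # smoothed failure rate, in failure replies per second
    c: int = 0        # number of failure replies counted so far

def update_failure_rate_record(rec, now, beta=0.5, batch=100):
    """Count one failure reply; refresh the rate estimate once per batch of replies."""
    rec.c += 1
    if rec.c % batch == 0:
        elapsed = max(now - rec.t, 1e-9)   # assumed guard against a zero interval
        f_new = batch / elapsed            # rate over the most recent batch
        if rec.c == batch:
            rec.f = f_new                  # first estimate: nothing to smooth against
        else:
            rec.f = beta * rec.f + (1 - beta) * f_new
        rec.t = now
```

For instance, 100 replies arriving one per second yield f = 1.0; if the next 100 arrive twice as fast, the smoothed rate with a weight of 0.5 becomes 1.5.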
It is unnecessary to create individual failure-rate records for those hosts that

occasionally make a few failed connections. Each edge router maintains a hash

table H. Each table entry is a failure-rate record without the address field. When

the router receives a failure reply, it hashes the destination address to a table

entry and calls Update_Failure_Rate_Record() on that entry. Each entry therefore

measures the combined failure rate of roughly $A/|H|$ addresses, where $A$ is the size

of the customer network and $|H|$ is the size of the hash table.

Only when the rate of a hash-table entry exceeds a threshold $\lambda$ (e.g., one per

second) does the router create failure-rate records for individual addresses of the entry.

A failure-rate record is removed if its counter c registers too few failed connections

in a period of time.
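The two-level bookkeeping described above (shared hash-table entries first, individual records only for busy entries) can be sketched as follows. This is a simplified illustration: the class name, the use of CRC32 as the hash function, and the choice to create an individual record lazily when the next failure reply for an address arrives are assumptions, and the removal of idle records is left out.

```python
import zlib

class FailureRateTable:
    """Shared hash-table entries, with per-address records created on demand."""

    def __init__(self, size, lam, batch=100, beta=0.5, start=0.0):
        self.lam, self.batch, self.beta = lam, batch, beta
        self.entries = [{"c": 0, "f": 0.0, "t": start} for _ in range(size)]
        self.records = {}   # address -> individual failure-rate record

    def _update(self, rec, now):
        # Same update rule as Update_Failure_Rate_Record().
        rec["c"] += 1
        if rec["c"] % self.batch == 0:
            f_new = self.batch / max(now - rec["t"], 1e-9)
            first = rec["c"] == self.batch
            rec["f"] = f_new if first else self.beta * rec["f"] + (1 - self.beta) * f_new
            rec["t"] = now

    def on_failure_reply(self, dst, now):
        entry = self.entries[zlib.crc32(dst.encode()) % len(self.entries)]
        self._update(entry, now)
        if dst in self.records:
            self._update(self.records[dst], now)
        elif entry["f"] > self.lam:
            # The shared entry is too busy: start tracking this address on its own.
            self.records[dst] = {"c": 0, "f": 0.0, "t": now}
```

With this layout the router keeps per-address state only for the few entries that look suspicious, which is the memory saving the hash table is meant to provide.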

2.3.5 Basic Rate-Limit Algorithm

If the failure rate of an address $s$ is larger than $\lambda$, there must be a failure-rate

record created for $s$ because the hash-table entry that $s$ maps to must have a rate

exceeding $\lambda$. Let $F_\lambda$ be the set of addresses whose failure rates are larger than $\lambda$.









For each $s \in F_\lambda$, the router reduces its failure rate below $\lambda$ by rate-limiting the

connection requests from $s$. A token bucket is used. Let size be the bucket size,

tokens be the number of tokens, and time be a timestamp whose initial value is the

system clock when the algorithm starts.

Upon receipt of a failure reply to s

(1) tokens <- tokens - 1



Upon receipt of a connection request from s

(2) $\Delta t$ <- the current system clock - time

(3) tokens <- min{tokens + $\Delta t$ x $\lambda$, size}

(4) time <- the current system clock

(5) if (tokens >= 1)

(6)     forward the request

(7) else

(8)     drop the request
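The basic rate-limit algorithm translates almost line by line into Python. The sketch below is illustrative rather than definitive: the class name is hypothetical, and starting with an empty bucket (tokens = 0) is an assumption, since the text does not state the initial value for this algorithm.

```python
class BasicRateLimiter:
    """Token bucket charged by failure replies instead of by forwarded packets."""

    def __init__(self, lam, size, now=0.0):
        self.lam = lam        # allowed failure rate (failure replies per second)
        self.size = size      # bucket size
        self.tokens = 0.0     # assumed initial value
        self.time = now       # timestamp of the last connection request

    def on_failure_reply(self):
        self.tokens -= 1                                            # line (1)

    def on_connection_request(self, now):
        dt = now - self.time                                        # line (2)
        self.tokens = min(self.tokens + dt * self.lam, self.size)   # line (3)
        self.time = now                                             # line (4)
        return self.tokens >= 1                                     # lines (5)-(8): True means forward
```

A host whose requests all fail is held to about lam forwarded requests per second, whereas a host whose requests succeed never loses tokens and is not slowed down once the bucket holds at least one token.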

It should be emphasized that the above algorithm is not a traditional

token-bucket algorithm that buffers the oversized bursts and releases them at

a fixed average rate. The purpose of our algorithm is not to shape the flow of

incoming failure replies but to shape the "creation" of the failure replies. It ensures

that the failure rate of any address in S stays below $\lambda$. This effectively restricts the

scanning rate of any worm-infected host (Eq. 2-6).

This and other rate-limit algorithms are performed on individual addresses.

They are not performed on the failure-rate records in the hash table; that is

because otherwise many addresses would have been blocked due to one scan source

mapped to the same hash-table entry.

One fundamental idea of DAW is to make the speed of worm propagation

no longer determined by the worm parameters set by the attackers, but by the









DAW parameters set by the ISP administrators. In the following, we propose more

advanced rate-limit algorithms to give the defenders greater control.

2.3.6 Temporal Rate-Limit Algorithm

A normal user behaves differently from a worm that scans the Internet

tirelessly, day and night. A user may generate a failure rate close to $\lambda$ for a

short period of time, but that cannot last for every minute of the 24 hours of a

day. While we set $\lambda$ large enough to accommodate temporary aggressiveness in

normal behavior, the rate over a long period can be tightened. Let $\Omega$ be the system

parameter that controls the maximum number of failed connection requests allowed

for an address per day. Let $D$ be the time of a day. $\Omega$ can be set much smaller

than $\lambda D$.

At the start of each day, the counters ($c$) of all failure-rate records and

hash-table entries are reset to zeros. The value of $c$ always equals the number of

failed requests that have happened during the day. A hash-table entry creates

failure-rate records for individual addresses when either $f > \lambda$ or $c > \Omega$.

A temporal rate-limit algorithm is designed to bound the maximum number of

failed requests per day. Let $F_\Omega$ be the set of addresses with individual failure-rate

records such that $\forall s \in F_\Omega$, either the failure rate of $s$ is larger than $\lambda$ or the counter of $s$

reaches $\Omega/2$. It is obvious that $F_\lambda \subseteq F_\Omega$.

Upon receipt of a failure reply to s

(1) tokens <- tokens - 1



Upon receipt of a connection request from s

(2) $\Delta t$ <- the current system clock - time

(3) if (c < $\Omega$/2)

(4)     tokens <- min{tokens + $\Delta t$ x $\lambda$, size}

(5) else

(6)     $\lambda'$ <- ($\Omega$ - c - tokens) / (the end of the day - time)

(7)     tokens <- min{tokens + $\Delta t$ x $\lambda'$, size}

(8) time <- the current system clock

(9) if (tokens >= 1)

(10)     forward the request

(11) else

(12)     drop the request
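The temporal rate-limit algorithm can be sketched in Python in the same way as the basic one. The class and attribute names below are hypothetical, the counter c is incremented inside the limiter (in the text this is done by Update_Failure_Rate_Record()), and the initial tokens = 0 follows the assumption made in the proof of Theorem 2.

```python
class TemporalRateLimiter:
    """Token bucket whose refill rate shrinks once half of the daily quota is used."""

    def __init__(self, lam, size, omega, day_end, now=0.0):
        self.lam, self.size, self.omega = lam, size, omega
        self.day_end = day_end   # clock value at which the current day ends
        self.tokens = 0.0
        self.time = now
        self.c = 0               # failure replies counted today

    def on_failure_reply(self):
        self.tokens -= 1         # line (1)
        self.c += 1              # counter update, done by the failure-rate record in the text

    def on_connection_request(self, now):
        dt = now - self.time                                     # line (2)
        if self.c < self.omega / 2:                              # line (3)
            rate = self.lam                                      # line (4)
        else:
            remaining = self.day_end - self.time
            # line (6): spread the unused quota over the rest of the day
            rate = (self.omega - self.c - self.tokens) / remaining if remaining > 0 else 0.0
        self.tokens = min(self.tokens + dt * rate, self.size)    # lines (4) and (7)
        self.time = now                                          # line (8)
        return self.tokens >= 1                                  # lines (9)-(12): True means forward
```

Simulating a host that sends one failing request per second for a whole day with a quota of 300 shows the intended behavior: the first half of the quota is used quickly, after which forwarded requests become sparse and the daily count stays within the quota.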

The temporal rate-limit algorithm constrains both the maximum failure rate

and the maximum number of failed requests. When it is used, the basic rate-limit

algorithm is not necessary. Before $c$ reaches $\Omega/2$, the failure rate can be as high as

$\lambda$. After that, the algorithm spreads the remaining "quota" ($\Omega - c - \mathit{tokens}$) on the

rest of the day, which ensures that connections will be forwarded throughout the

day. Particularly, a host can make successful connections at any rate at any time of

the day (e.g., browsing the favorite web sites that are up) because the constraint is

on failure replies only.

Theorem 2. When the temporal rate-limit algorithm is used, the number of failure

replies for any address does not exceed $2\Omega + rT$ in a day, where $r$ is the rate at

which the host makes connection requests and $T$ is the round trip delay in the ISP.

Proof: We first prove that $\mathit{tokens} + c \leq \Omega$ holds for an arbitrary $s$ at any time

of the day. It holds initially when the algorithm is activated on $s$ with $\mathit{tokens} = 0$

and $c \leq \Omega/2$. The value of $c$ or $\mathit{tokens}$ changes only after the router receives either

a failure reply or a connection request. In the former case, $\mathit{tokens}$ is decreased by

one due to the execution of the temporal rate-limit algorithm, and $c$ is increased

by one due to the execution of Update_Failure_Rate_Record(). Hence, ($\mathit{tokens} + c$)

stays the same. Now consider the case in which the router receives a connection request. The values

of $\mathit{tokens}$ before and after receiving the packet are denoted as $\mathit{tokens}_{before}$ and

$\mathit{tokens}_{after}$, respectively. Suppose $\mathit{tokens}_{before} + c \leq \Omega$. Based on Lines 6-7, we

have

$$\begin{aligned}
\mathit{tokens}_{after} &= \min\{\mathit{tokens}_{before} + \Delta t \times \lambda',\ \mathit{size}\} \\
&\leq \mathit{tokens}_{before} + \Delta t \times \frac{\Omega - c - \mathit{tokens}_{before}}{\text{the end of the day} - \mathit{time}} \\
&\leq \mathit{tokens}_{before} + (\Omega - c - \mathit{tokens}_{before}) \\
&= \Omega - c
\end{aligned}$$

Therefore, $\mathit{tokens}_{after} + c \leq \Omega$.

Next we prove that $\mathit{tokens} \geq -rT$ at the end of the day. Consider the

case that $\mathit{tokens} < 1$ at the end of the day. Without loss of generality, suppose

$\mathit{tokens} \geq 1$ before time $t_0$, $0 \leq \mathit{tokens} < 1$ after $t_0$ due to the execution of Line 1,

and then $\mathit{tokens}$ stays less than one for the rest of the day. After $t_0$, all connection

requests from $s$ are blocked (Line 12). For all requests sent before $t_0 - T$, the failure

replies must have already arrived before $t_0$. There are at most $rT$ requests sent

between $t_0 - T$ and $t_0$. Therefore, there are at most $rT$ failure replies arriving after

$t_0$. We know that $\mathit{tokens} \geq 0$ at $t_0$. Hence, $\mathit{tokens} \geq -rT$ at the end of the day.

Because $\mathit{tokens} + c \leq \Omega$ holds at any time, $c \leq \Omega + rT$ at the end of the day.

The counter $c$ equals the number of failure replies received during the day after

the failure-rate record for $s$ is created. Before that, there are at most $\Omega$ failure

replies counted by the hash-table entry that $s$ maps to. In the worst case all those

replies are for $s$. Therefore, the total number of failure replies for $s$ is no more than

$2\Omega + rT$.

rT is normally small because the typical round trip delay across the Internet

is in tens or hundreds of milliseconds. Hence, if $\Omega = 300$, the average scanning

rate of a worm is effectively limited to about $2\Omega/D = 0.42$/min. In comparison,

Williamson's experiment showed that the scanning rate of the code red was at least

200/second [12], which is more than 28,000 times faster. Yet, it took the code red









hours to spread, suggesting the promising potential of using the temporal rate-limit

algorithm to slow down worms.
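The arithmetic behind these figures can be reproduced in a few lines of Python. The variable names are illustrative; the quota of 300 and the rate of 200 scans per second are the values quoted in the text.

```python
OMEGA = 300                        # daily quota of failed requests per address
MINUTES_PER_DAY = 24 * 60

limited_rate = 2 * OMEGA / MINUTES_PER_DAY   # scans per minute under the 2*Omega bound
code_red_rate = 200 * 60                     # scans per minute at 200 scans per second
slowdown = code_red_rate / limited_rate

print(f"limited rate: {limited_rate:.2f} scans/min, slowdown factor: {slowdown:.0f}")
```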

Additional system parameters that specify the maximum numbers of failed

requests in longer time scales (week or month) can further increase the worm

propagation time.

2.3.7 Recently Failed Address List

If a major web server such as Yahoo or CNN is down, an edge router may

observe a significant surge in failure replies even though there is no worm activity.

To solve this problem, each edge router maintains a recently failed address list

(RFAL), which is emptied at the beginning of each day. When the router receives

a failure reply from address d, it matches d against the addresses in RFAL. If d

is in the list, the router skips all DAW-related processing. Otherwise, it inserts d

into RFAL before processing the failure reply. If RFAL is full, d replaces the oldest

entry in the list.
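The RFAL behaves like a bounded first-in, first-out set. The following Python sketch shows one possible realization; the class and method names are hypothetical, and an `OrderedDict` is used only because it keeps insertion order, so the oldest entry is cheap to evict.

```python
from collections import OrderedDict

class RecentlyFailedAddressList:
    """Bounded list of addresses that recently produced failure replies."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self._addrs = OrderedDict()   # insertion order doubles as age order

    def should_process(self, d):
        """Return False if d is already listed; otherwise record d and return True."""
        if d in self._addrs:
            return False              # skip all DAW-related processing
        if len(self._addrs) >= self.capacity:
            self._addrs.popitem(last=False)   # evict the oldest entry
        self._addrs[d] = None
        return True

    def reset(self):
        """Empty the list, e.g., at the beginning of each day."""
        self._addrs.clear()
```

A router would call `should_process(d)` for each failure reply from address d and run the failure-rate update and the rate-limit algorithms only when it returns True.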

When a popular server is down, if it is frequently accessed by the hosts in

the customer network, the server's address is likely to be in RFAL and the failure

replies from the server will not be repetitively counted. Hence, the number of

failed requests allowed for a normal host per day can be much larger than $\Omega$. It

effectively places no restriction on repeatedly trying a number of favorite sites that are

temporarily down. On the other hand, given the limited size of RFAL (e.g., 1000)

and the much larger space of IPv4 ($2^{32}$), the random addresses picked by worms

have a negligibly small chance to fall in the list.

2.3.8 Spatial Rate-Limit Algorithm

While each infected host is regulated by the temporal rate-limit algorithm,

there may be many of them, whose aggregated scanning rate can be very high.

DAW uses a spatial rate-limit algorithm to constrain the combined scanning rate

of infected hosts in a customer network. Let $\Phi$ be the system parameter that









controls the total number of failed requests allowed for a customer network per

day. It may vary for different customer networks based on their sizes. Once the

number of addresses inserted to RFAL exceeds $\Phi$, the system starts to create

failure-rate records for all addresses that receive failure replies, and activates

the spatial algorithm. If there are too many records, it retains those with the

largest counters. Let $F_\Phi$ ($\subseteq S$) be the set of addresses whose counters exceed a

small threshold $\tau$ (e.g., 50), which excludes the obvious normal hosts. The spatial

rate-limit algorithm is the same as the temporal algorithm except that $s$, $\Omega$, and

$c$ are replaced respectively by $F_\Phi$, $\Phi$, and the total number of failure replies to $F_\Phi$

received after the spatial algorithm is activated.

For any address $s$ in $F_\Omega \cap F_\Phi$, the temporal rate-limit algorithm is first

executed and then the spatial rate-limit algorithm is executed. The reason to

apply the temporal algorithm is to prevent a few aggressive infected hosts from

repeatedly reducing tokens to zero. On the other hand, if there are a large number

of infected hosts, causing the spatial algorithm to drop most requests, the router

should temporarily block the addresses whose failure-rate records have the largest

counters.

The edge routers may be configured independently with some running both

the temporal and spatial algorithms but some running the temporal algorithm only.

For example, the edge routers for the neighbor ISPs should have large $\Phi$ values or

not run the spatial algorithm.

Theorem 3. When the spatial rate-limit algorithm is used, the total number of

failure replies per day for all infected hosts in a customer network is bounded by

$2\Phi + mr'T$, where $m$ is the number of addresses in $F_\Phi$, $r'$ is the scanning rate of an

infected host after the temporal rate-limit algorithm is applied, and $T$ is the round

trip delay of the ISP.









The proof is omitted due to space limitation; it is very similar to the proof

for Theorem 2. $mr'T$ is likely to be small because both $r'$ and $T$ are small.

The following analysis is based on a simplified model. A more general model

will be used in the simulations. Suppose there are k customer networks, each with

V/k vulnerable hosts. Once a vulnerable host is infected, we assume all other

vulnerable hosts in the same customer networks are infected immediately because

DAW does not restrict the scanning activity within the customer network. Based

on Theorem 3, the combined scanning rate of all vulnerable hosts in a customer

network is (2+ + mrT)/D 2 2~/D. Let j(t) be the percentage of customer

networks that are infected by the worm.

At time t, the number of infected customer networks is j(t) k, and the number

of uninfected networks is (1 j(t))k. The probability for one scan message to hit

an uninfected vulnerable host and thus infect the network where the host resides

is (1 j(t))V/N. For an infinitely small period dt, j(t) changes by dj(t). During

that time, there are D j(t) k dt scan messages and the number of newly infected

networks is j(t) k dt (1 j(t))V/N= j(t) (1 j(t)) dt. Therefore,

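$$k\,dj(t) = \frac{2\Phi V k}{DN}\, j(t)\,(1 - j(t))\,dt$$

$$\frac{dj(t)}{dt} = \frac{2\Phi V}{ND}\, j(t)\,(1 - j(t))$$

$$j(t) = \frac{e^{\frac{2\Phi V}{ND}(t - T)}}{1 + e^{\frac{2\Phi V}{ND}(t - T)}}$$

Assume there is one infection at time 0. We have $T = \frac{ND}{2\Phi V} \ln(k - 1)$. The time it

takes to infect $\alpha$ percent of all networks is

$$t(\alpha) = \frac{ND}{2\Phi V} \ln \frac{\alpha (k - 1)}{1 - \alpha}$$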
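
The closed-form solution can be checked numerically. The Python sketch below is illustrative only: the function names are chosen here, time is assumed to be measured in days (so D = 1 by default), $\alpha$ is passed as a fraction, and the sample values N = 2^32, $\Phi$ = 3000, k = 10000 and V = 100000 are assumptions made for the example.

```python
import math

def growth_rate(phi, V, N, D=1.0):
    """Exponent 2*Phi*V/(N*D) of the logistic solution."""
    return 2.0 * phi * V / (N * D)

def infected_fraction(t, phi, V, N, k, D=1.0):
    """j(t), assuming exactly one infected network at time 0."""
    c = growth_rate(phi, V, N, D)
    T = math.log(k - 1) / c          # constant fixed by j(0) = 1/k
    x = math.exp(c * (t - T))
    return x / (1.0 + x)

def time_to_infect(alpha, phi, V, N, k, D=1.0):
    """t(alpha): time until a fraction alpha of the networks is infected."""
    c = growth_rate(phi, V, N, D)
    return math.log(alpha * (k - 1) / (1.0 - alpha)) / c

if __name__ == "__main__":
    N, phi, k, V = 2**32, 3000, 10_000, 100_000   # assumed sample values
    t5 = time_to_infect(0.05, phi, V, N, k)
    print(f"t(5%) = {t5:.1f} days")
    print(f"j(t(5%)) = {infected_fraction(t5, phi, V, N, k):.4f}")
```

Substituting t(alpha) back into j(t) should return alpha, which is a quick consistency check on the two formulas.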



3 The probability of multiple external infections of the same network is negligible
when $dt \to 0$.









Suppose an ISP wants to ensure that the time for $\alpha$ percent of networks to be

infected is at least $\gamma$ days. With time measured in days (so that $D = 1$), the value of $\Phi$ should satisfy the following condition.

$$\Phi \le \frac{N}{2V\gamma} \ln \frac{\alpha (k - 1)}{1 - \alpha}$$

which is not related to how the worm behaves.
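
As a rough illustration of how an operator might use this condition, the following Python sketch computes the largest $\Phi$ that keeps the propagation time at or above a target. The function names and the sample values (N = 2^32, V = 100000, k = 10000, a target of 60 days for 5% infection) are assumptions for the example; $\alpha$ is passed as a fraction and time is in days.

```python
import math

def max_phi(gamma_days, alpha, k, V, N):
    """Largest Phi for which infecting a fraction alpha takes at least gamma_days."""
    return N / (2.0 * V * gamma_days) * math.log(alpha * (k - 1) / (1.0 - alpha))

def days_to_infect(alpha, phi, k, V, N):
    """t(alpha) in days from the simplified model, taking D = 1 day."""
    return N / (2.0 * phi * V) * math.log(alpha * (k - 1) / (1.0 - alpha))

if __name__ == "__main__":
    N, V, k = 2**32, 100_000, 10_000          # assumed sample values
    phi = max_phi(60, 0.05, k, V, N)
    print(f"Phi must not exceed about {phi:.0f}")
    print(f"check: t(5%) = {days_to_infect(0.05, phi, k, V, N):.1f} days")
```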

2.3.9 Blocking Persistent Scanning Sources

The edge routers are configured to block out the addresses whose counters

(c) reach $\Omega$ for $n$ consecutive days, where $n$ is a system parameter. If the

worm-infected hosts perform high-speed scanning, they will each be blocked

out after $n$ days of activity. Hence the worm propagation may be stopped before an

epidemic materializes, according to Eq. (2-5).

The worm propagates slowly under the temporal rate-limit algorithm and the

spatial rate-limit algorithm. It gives the administrators sufficient time to study the

traffic of the hosts to be blocked, perform analysis to determine whether a worm

infection has occurred, and decide whether to approve or disapprove the blocking.

Once the threat of a worm is confirmed, the edge routers may be instructed to

reduce n, which increases the chance of fully stopping the worm.

Suppose a worm scans more than $\Omega$ addresses per day. The worm propagation

can be completely stopped if each infected customer network makes less than one

new infection on average before its infected hosts are blocked. The number of

addresses scanned by the infected hosts from a single network during $n$ days is

about $2n\Phi$ by Theorem 3. Each message has a maximum probability of $V/N$ to

infect a new host. Hence, the condition to stop a worm is

$$2n\Phi \frac{V}{N} < 1$$










The expected total number of infected networks is bounded by

$$\sum_{i=0}^{\infty} \left(2n\Phi \frac{V}{N}\right)^i = \frac{1}{1 - 2n\Phi \frac{V}{N}}$$

On the other hand, when $2n\Phi \frac{V}{N} \ge 1$, the worm may not be stopped by the above

approach alone. However, the significance of blocking infected hosts should not be

underestimated, as it lengthens the worm-propagation time and gives humans or

other automatic tools (e.g., the one described below) more reaction time.
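
The stopping condition and the geometric-series bound can be evaluated directly. The Python sketch below is a minimal illustration; the function names and the sample values (N = 2^32, $\Phi$ = 3000, V = 100000) are assumptions for the example rather than part of the analysis.

```python
def infections_per_network(n, phi, V, N):
    """Upper bound 2*n*Phi*V/N on new infections caused by one network before blocking."""
    return 2.0 * n * phi * V / N

def max_blocking_days(phi, V, N):
    """Largest integer n with 2*n*Phi*V/N < 1 (0 if no positive n qualifies)."""
    n = int(N // (2 * phi * V))
    if 2 * n * phi * V >= N:      # exact multiple gives equality, which is not allowed
        n -= 1
    return max(n, 0)

def infected_networks_bound(n, phi, V, N):
    """Geometric-series bound 1/(1 - q); infinite when the condition fails."""
    q = infections_per_network(n, phi, V, N)
    return float("inf") if q >= 1.0 else 1.0 / (1.0 - q)

if __name__ == "__main__":
    N, phi, V = 2**32, 3000, 100_000          # assumed sample values
    n = max_blocking_days(phi, V, N)
    print(f"largest n satisfying the condition: {n}")
    print(f"bound on infected networks: {infected_networks_bound(n, phi, V, N):.1f}")
```

The bound grows quickly as $2n\Phi V/N$ approaches 1, so choosing $n$ just below the limit leaves little margin.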

If the scanning rate of a worm is below $\Omega$ per day, the infected hosts will not

be blocked. DAW relies on a different approach to address this problem. During

each day, an edge router $e$ measures the total number of connection requests,

denoted as $n_c(e)$, and the total number of failure replies, denoted as $n_f(e)$. Note

that only the requests and replies that match S and P (Section 2.3.3) are measured.

The router sends these numbers to the management station at the end of the day.

The management station measures the following ratio

$$\frac{\sum_{e \in E} n_f(e)}{\sum_{e \in E} n_c(e)}$$

where $E$ is the set of edge routers. If the ratio increases significantly for a number

of days, it signals a potential worm threat. That is because the increase in failed

requests steadily outpaces the increase in issued requests, which is possibly the

result of more and more hosts being infected by worms.
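
A minimal sketch of this monitoring step is given below in Python. The text does not specify what counts as a significant increase, so the rule used here (a relative rise of at least 10% on each of the last three day-to-day steps) and all names are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

def daily_failure_ratio(reports: Dict[str, Tuple[int, int]]) -> float:
    """reports maps an edge-router id to (n_c, n_f) for one day."""
    total_requests = sum(nc for nc, _ in reports.values())
    total_failures = sum(nf for _, nf in reports.values())
    return total_failures / total_requests if total_requests else 0.0

def signals_worm_threat(ratios: List[float], days: int = 3,
                        min_increase: float = 0.10) -> bool:
    """True if the ratio rose by at least min_increase (relative) on each of
    the last `days` day-to-day steps. The rule itself is an assumption."""
    if len(ratios) < days + 1:
        return False
    recent = ratios[-(days + 1):]
    return all(b > a and b >= a * (1.0 + min_increase)
               for a, b in zip(recent, recent[1:]))

if __name__ == "__main__":
    today = {"e1": (1000, 100), "e2": (3000, 300)}
    print(daily_failure_ratio(today))
    print(signals_worm_threat([0.10, 0.10, 0.12, 0.15, 0.20]))
```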

The management station then instructs the edge routers to identify potential

offenders whose counters (c) have the highest values. Additional potential offenders

are found as follows. After a vulnerable server is infected via a port that it listens

to, the server normally scans the Internet on the same port to infect other servers.

Based on this observation, when an edge router receives a RESET packet with a

source address $d$ and a source port $p \in P$ to a destination address $s \in S$, it sends a

SYN packet to check if s is also listening on port p. If it is, the router marks s as a









potential offender and creates a failure-rate record, which measures the number of

failed connections from $s$. At the end of each day, the management station collects

the potential offenders from all edge routers. Those with the largest counters are

presented to the administrators for traffic analysis. The management station may

instruct the edge routers to block them if the worm threat is confirmed.

Although a blocked server cannot issue connection requests before it is

unblocked, it can accept connection requests at any rate. Its role as a server is

unchanged. An alternative to complete blocking is to apply a different, small $\Omega$

value (e.g., 50) on those addresses, which leaves room for false positives since the

hosts can still make as many successful connections as they want, with occasional

failures.

2.3.10 FailLog

For a customer that blocks all ICMP traffic (see footnote 4), its routers/firewalls should be

configured to send a log message to the local management station when a packet is

dropped, which is today's common practice. If the dropped packet is a SYN packet,

the management station forwards a copy of the log message to the nearest ISP edge

router, which in turn encapsulates the log in a control message (called FailLog) and

sends the message to the source address $s$ of the SYN. The FailLog is then routed

across the ISP network to the edge router of $s$. An edge router is responsible for

measuring the failure rates for the addresses in the customer network it connects

to. Upon receipt of the FailLog, the edge router updates the failure rate of $s$ and

discards the message.

The failure rate of a source address is the combined rate of RESET, ICMP

host-unreachable, and FailLog messages that are sent to the address. The



4 It is common for an organization to block all inbound ICMP requests but not
common to block all inbound/outbound ICMP traffic.









requirement for customer networks that block ICMP to generate FailLog will

be relaxed in Section 2.3.10.

It has been assumed so far that every customer network that blocks ICMP

will generate FailLog messages. We now relax this requirement. Consider a worm

that targets one or multiple ports (e.g., web service). Let $A$ be the IP address

space that is not occupied by the hosts listening on those ports. Let $p_1$ be the

percentage of $A$ that is used by existing hosts. Let $p_2$ be the percentage of $A$ that

is not used but responds to connection requests with ICMP host-unreachable packets

(generated by routers). This includes the ISP's reserved addresses for future

expansion. Let $p_3$ be the percentage of $A$ that is not used and does not respond

with ICMP host-unreachable packets. Within $p_3$, let $p_3'$ be the percentage that

generates FailLog. We have $p_1 + p_2 + p_3 = 1$ and $0 \le p_3' \le p_3$.

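Eq. (2-6) was derived under the condition that $p_3' = p_3$. If none or only some

customer networks generate FailLog, the equation becomes

$$\frac{r_s}{r_f} = \frac{1}{p_1 + p_2 + p_3'}$$

where $r_s$ is the scanning rate and $r_f$ is the failure rate.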

For example, if $p_1 = 10\%$, $p_2 = 60\%$, and $p_3' = 0\%$ (see footnote 5), then the scanning rate of any

worm-infected host will be roughly 1.4 times the failure rate controlled by $\lambda$ and

$\Omega$. Our simulation shows that DAW works well even when $p_1 + p_2 + p_3' = 10\%$.

The actual value of $p_1 + p_2 + p_3'$ can be measured by the management

station by generating connection requests to random addresses and monitoring the

connection-failure replies. Since $r_s$ is known and $r_f$ can be measured, $p_1 + p_2 + p_3' =$

$r_f/r_s$. Note that the scanning rate of the management station is not constrained by

DAW because it is inside the ISP network.
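
The measurement described above amounts to a single division. The Python sketch below illustrates it under stated assumptions: the function names are hypothetical, and the probe counts in the usage lines are invented sample numbers, not measured data.

```python
def reply_coverage(probes_sent: int, failure_replies: int) -> float:
    """Estimate p1 + p2 + p3' as r_f / r_s from one probing run."""
    if probes_sent <= 0:
        raise ValueError("at least one probe is required")
    return failure_replies / probes_sent

def scan_rate_multiplier(coverage: float) -> float:
    """Ratio r_s / r_f: how far the scanning rate can exceed the limited failure rate."""
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    return 1.0 / coverage

if __name__ == "__main__":
    coverage = reply_coverage(probes_sent=10_000, failure_replies=7_000)
    print(f"estimated p1 + p2 + p3' = {coverage:.2f}")
    print(f"scan rate is about {scan_rate_multiplier(coverage):.1f}x the failure rate")
```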



5 This is a conservative assumption because firewalls are often configured to
block ICMP requests but not ICMP host-unreachable replies.









2.3.11 Warhol Worm and Flash Worm

The Warhol worm and the Flash worm are hypothetical worms studied in [1],

which embody a number of highly effective techniques that future worms

might use to infect the Internet in a very short period of time, leaving no room for

human actions.

In order to improve the chance of infection during the initial phase, the Warhol

worm first scans a pre-made list of (e.g., 10000 to 50000) potentially vulnerable

hosts, which is called a hit-list. After that, the worm performs permutation

scanning, which divides the address space to be scanned among the infected hosts.

One way to generate a hit-list is to perform a scan of the Internet before the worm

is released [1]. With DAW, it will take about $N/(2\Omega)$ days. Suppose $\Omega = 300$ and

$N = 2^{32}$. That would be 19611 years. Even if the hit-list can be generated by

a different means, the permutation scanning is less effective under DAW. For

instance, even after 10000 vulnerable hosts are infected, they can only probe about

$10000 \times 2\Omega = 6 \times 10^6$ addresses a day. Considering the size of the address space is

$2^{32} \approx 4.3 \times 10^9$, duplicate hits are not a serious problem, which means the gain by

permutation scanning is small. Without DAW, it will be a different matter. If the

scanning rate is 200/second, it takes less than 36 minutes for 10000 infected hosts

to make $2^{32}$ probes, and duplicate hits are very frequent.
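
The figures quoted in this paragraph follow from simple arithmetic, reproduced in the Python sketch below. The function names are chosen here for illustration, and a 365-day year is assumed.

```python
N = 2**32          # size of the IPv4 address space
OMEGA = 300        # daily limit used in the text

def prescan_years(n_addresses: int, omega: int) -> float:
    """Years needed to scan n_addresses at 2*omega addresses per day."""
    days = n_addresses / (2 * omega)
    return days / 365

def probes_per_day(infected_hosts: int, omega: int) -> int:
    """Combined daily probes of the infected hosts under DAW."""
    return infected_hosts * 2 * omega

def minutes_without_daw(n_addresses: int, infected_hosts: int, rate_per_sec: int) -> float:
    """Minutes for the hosts to make n_addresses probes at an unrestricted rate."""
    return n_addresses / (infected_hosts * rate_per_sec) / 60

if __name__ == "__main__":
    print(int(prescan_years(N, OMEGA)))                 # whole years for a prescan
    print(probes_per_day(10_000, OMEGA))                # probes per day under DAW
    print(round(minutes_without_daw(N, 10_000, 200), 1))
```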

The Flash worm assumes a hit-list L including most servers that listen on

the targeted port. Hence, random scanning is completely avoided; the worm scans

only the addresses in L. As more and more hosts are infected, L is recursively split

among the newly infected hosts, which scan only the assigned addresses from L.

The Flash worm requires a prescan of the entire Internet before it is released. Such

a prescan takes too long under DAW. In addition, each infected host can only scan

about $2\Omega$ addresses per day, which limits the propagation speed of the worm if L is

large.











Figure 2-2. Worm-propagation comparison (two panels: infection percentage versus t in hours, 0 to 18, and versus t in days, 0 to 100; curves: No Algorithm, Basic Rate-Limit, Temporal, Temporal + Spatial, DAW)


2.3.12 Forged Failure Replies

To prevent forged failure replies from being counted, one approach is to keep a

table of recent connection requests from any source address in S to any destination

port in P during the past 45 seconds (roughly the MRTT of TCP). S and P are

defined in Section 2.3.3. Each table entry contains a source address, a source port,

a destination address, and a destination port, identifying a connection request.

Only those failure replies that match the table entries are counted. An alternative

approach is to extend the failure-rate record by adding two fields: one (x) counting

the number of connection requests from s and the other (y) counting the number

of successful connections, i.e., TCP SYN/ACK packets sent to s, where s is the

address field of the record. An invariant is maintained such that the number of

failed connections plus the number of successful connections does not exceed the

number of connection requests, i.e., $c + y \le x$. A failure reply is counted ($c \leftarrow c + 1$)

only when the invariant is not violated.
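
The second approach can be sketched as a small record type. The Python code below is an illustration under assumptions: the class and method names are hypothetical, and the rule that a failure reply is counted only if $c + y \le x$ still holds afterwards is one reading of the invariant. The text does not say how an unexpected SYN/ACK should be handled, so the sketch simply counts it.

```python
from dataclasses import dataclass

@dataclass
class FailureRateRecord:
    s: str        # source address the record belongs to
    c: int = 0    # counted failure replies
    x: int = 0    # connection requests sent from s
    y: int = 0    # successful connections (SYN/ACK packets sent to s)

    def on_request(self) -> None:
        self.x += 1

    def on_syn_ack(self) -> None:
        self.y += 1   # handling of unmatched SYN/ACKs is not specified in the text

    def on_failure_reply(self) -> bool:
        """Count the reply only if c + y <= x still holds afterwards."""
        if self.c + 1 + self.y <= self.x:
            self.c += 1
            return True
        return False   # more replies than outstanding requests: treated as forged

if __name__ == "__main__":
    rec = FailureRateRecord("10.0.0.1")   # example address, chosen arbitrarily
    rec.on_request()
    print(rec.on_failure_reply())   # matches the single request
    print(rec.on_failure_reply())   # no request left to account for it
```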

2.4 Simulation

We use simulations to evaluate the performance of DAW. Figure 2-2 shows

how the rate-limit algorithms slow down the worm propagation. The simulation

parameters are given as follows: $\lambda = 1$/sec, $\Omega = 300$, $\Phi = 3000$, and $n = 7$

days. The number of customer networks is $k = 10000$. The average number of









vulnerable hosts per customer network is z = 10. The numbers of vulnerable hosts

in different customer networks follow an exponential distribution, suggesting a

scenario where most customer networks have ten or fewer public servers, but some

have large numbers of servers. Suppose the worm uses a Nimda-like algorithm that

aggressively searches the local-address space. We assume that once a vulnerable

host of a customer network is infected, all vulnerable hosts of the same network are

infected shortly.

Figure 2-2 compares the percentage i(t) of vulnerable hosts that are infected

over time t in five different cases: 1) no algorithm is used, 2) the basic rate-limit

algorithm is implemented on the edge routers, 3) the temporal rate-limit algorithm

is implemented, 4) both the temporal and spatial rate-limit algorithms are

implemented, or 5) DAW (i.e., Temporal, Spatial, and blocking persistent scanning

sources) is implemented. Note that all algorithms limit the failure rates, not

the request rates, and the spatial rate-limit algorithm is applied only on the

hosts whose failure counters exceed a threshold r = 50. Two graphs show the

simulation results in different time scales. The upper graph is from 0 to 18 hours,

and the lower is from 0 to 100 days. The shape of the curve "No Algorithm"

depends on the worm's scanning rate, which is 10/sec in our simulation. The other

four curves are independent of the worm's scanning rate; they depend only on

DAW's parameters, i.e., $\lambda$, $\Omega$, $\Phi$, and $n$. The figure shows that the basic rate-limit

algorithm slows down the worm propagation from minutes to hours, while the

temporal rate-limit algorithm slows down the propagation to tens of days. The

spatial rate-limit algorithm makes further improvement on top of that: it takes

the worm 80 days to infect 5% of the vulnerable hosts, leaving sufficient time for

human intervention. Moreover, with persistent scanning sources being blocked after

7 days, DAW is able to stop the worm propagation at i(t) = 0.000034.









k        z     $\Phi$ = 1000   $\Phi$ = 3000   $\Phi$ = 5000   $\Phi$ = 7000
5000     10    350.3           116.8           69.6            50.2
5000     20    237.2           79.1            47.2            33.9
10000    10    190.1           63.5            38.1            27.1
10000    20    127.9           42.5            25.5            18.3
15000    10    133.6           44.4            26.3            19.3
15000    20    89.3            29.7            17.8            12.7
20000    10    103.3           34.2            20.6            14.6
20000    20    68.9            22.9            13.8            10.0
Table 2-2. 5% propagation time (days) for "Temporal + Spatial"


Table 2-2 shows the time it takes the worm to infect 5% of vulnerable hosts

(called 5% propagation time) under various conditions with Temporal + Spatial

implemented. Depending on the size (k and z) of the ISP, the propagation time

ranges from 10.0 days to 350.3 days. To ensure a large propagation time, a very

large ISP may partition its customers into multiple defense zones of modest sizes.

DAW can be implemented on the boundary of each zone, consisting of the edge

routers to the customer networks of the zone and the internal routers connecting to

other zones.

Figure 2-3 shows the performance of the temporal rate-limit algorithm with

respect to the parameter $\Omega$. As expected, the propagation time decreases when $\Omega$

increases. The algorithm performs very well for modest-size ISPs (or zones). When

k = 10000, z = 10 and $\Omega$ = 3000, the 5% propagation time is 63.6 days. Figure

2-4 shows the performance of the spatial rate-limit algorithm (alone) with respect

to the parameter $\Phi$. The algorithm works well for modest-size ISPs (or zones) even

for large $\Phi$ values. When k = 10000, z = 10 and $\Phi$ = 7000, the 5% propagation

time is 27.2 days. The performance of the two algorithms is comparable when

$\Phi = z \times \Omega$, where the total temporal rate limit of the local infected hosts is equal to

the spatial rate limit. As shown in the figures, if $\Phi > z \times \Omega$, the temporal algorithm

works better; if $\Phi < z \times \Omega$, the spatial algorithm works better. Therefore, the two

algorithms are complementary to each other and they are both adopted by DAW.












Figure 2-3. Effectiveness of the temporal rate-limit algorithm for DAW (x-axis: Omega, 0 to 700; curves for k = 10,000 with z = 5, 10, 20 and k = 20,000 with z = 10, 20)

Figure 2-4. Effectiveness of the spatial rate-limit algorithm for DAW (x-axis: 0 to 7000; curves for k = 10,000 with z = 5, 10, 20 and k = 20,000 with z = 10, 20)

Figure 2-5. Stop worm propagation by blocking (x-axis: n (days), 0 to 10; curves for k = 10,000 with z = 5, 10, 20 and k = 20,000 with z = 10, 20)

Figure 2-6. Propagation time before the worm is stopped (x-axis: n (days), 0 to 10; curves for k = 10,000 with z = 5, 10, 20 and k = 20,000 with z = 10, 20)


Because DAW blocks persistent scanning sources, it may stop the worm

propagation, depending on the value of n. Figure 2-5 shows the final infection

percentage among the vulnerable hosts before all infected hosts are blocked. Even

when a large n is selected and the final infection percentage is large, the blocking is

still very useful because it considerably slows down the worm propagation as shown

in Figure 2-6, where only the propagation times for larger-than-5% final infections

are shown. For instance, when k = 20000, z = 20 and n = 10, the final infection

percentage is close to 100%. However, it will take the worm 71.7 days to achieve

that.















CHAPTER 3
A SIGNATURE-BASED APPROACH

3.1 Double-Honeypot System

3.1.1 Motivation

The spread of a malicious worm is often an Internet-wide event. The

fundamental difficulty in detecting a previously unknown worm is due to two

reasons. First, the Internet consists of a large number of autonomous systems

that are managed independently, which means a coordinated defense system

covering the whole Internet is extremely difficult to realize. Second, it is hard to

distinguish the worm activities from the normal activities, especially during the

initial spreading phase. Although the worm activities become apparent after a

significant number of hosts are infected, it will be too late at that time due to the

exponential growth rate of a typical worm [19, 22, 21, 18, 17]. In contrast to some

existing defense systems that require large-scale coordinated efforts, we describe a

double-honeypot system that allows an individual autonomous system to detect the

ongoing worm threat without external assistance. Most importantly, the system is

able to detect new worms that have not been seen before.

Before presenting the architecture of our double-honeypot system, we give a

brief introduction to honeypots. Developed in recent years, a honeypot is a monitored

system on the Internet serving the purpose of attracting and trapping attackers

who attempt to penetrate the protected servers on a network [28]. Honeypots

fall into two categories [29]. A high-interaction honeypot operates a real operating

system and one or multiple applications. A low-interaction honeypot simulates one

or multiple real systems. In general, any network activities observed at honeypots

are considered as suspicious and it is possible to capture the latest intrusions based









on the analysis of these activities. However, the information provided by honeypots

is often mixed with normal activities as legitimate users may access the honeypots

by mistake. Hours or even days are necessary for experts to manually scrutinize

the data logged by honeypots, which is too slow against worm attacks because a

worm may infect the whole Internet in such a period of time.

We propose a double-honeypot system to detect new worms automatically.

A key novelty of this system is the ability to distinguish worm activities from

normal activities without the involvement of experts. Furthermore, it is a purely

local system. Its effectiveness does not require a wide deployment, which is a great

advantage over many existing defense systems [2, 12].

The basic idea is motivated from the worm's self-replication characteristics. By

its nature, a worm-infected host will try to find and infect other victims, which

is how a worm spreads itself. Therefore, outbound connections initiated from the

compromised hosts are a common characteristic shared by all worms. Suppose we

deliberately configure a honeypot to never initiate any outbound connections. Now

if the honeypot suddenly starts to make outbound connections, it only means that

the honeypot must be under foreign control. If the honeypot can be compromised,

it might try to compromise the same systems on the Internet in the way it was

compromised. Therefore, the situation is either a real worm attack or can be

turned into a worm attack if the attacker behind the scene chooses to do so. We

shall treat the two equally as a worm threat.

3.1.2 System Architecture

Figure 3-1 illustrates the double-honeypot system. It is composed of two

independent honeypot arrays, the inbound array and the outbound array, together

with two address translators, the gate translator and the internal translator. A

honeypot array consists of one or multiple honeypots, which may run on separate

physical machines or on virtual machines simulated by the same computer [29].










Figure 3-1. Using double-honeypot detecting Internet worms (diagram components: Internet, firewall, gate translator, local network, internal translator, inbound array of honeypots, outbound array of honeypots, signature center)


Each honeypot in the array runs a server identical to a local server to be protected.

A honeypot in the inbound (outbound) array is called an inbound (outbound)

honeypot. Our goal is to attract a worm to compromise an inbound honeypot

before it compromises a local server. When the compromised inbound honeypot

attempts to attack other machines by making outbound connections, its traffic is

redirected to an outbound honeypot, which captures the attack traffic.

An inbound honeypot should be implemented as a high-interaction honeypot

that accepts connections from the outside world in order to be compromised by

worms that may pose a threat to a local server. An outbound honeypot should be

implemented as a low-interaction honeypot so that it can remain uninfected when

it records the worm traffic. In addition to performing the functionalities of the local

system, it checks and records all network traffic in a connection initiated from an

inbound honeypot. The network traffic, which is directly related to worm activities

from the outside, will be analyzed to identify the signatures of the worms.

The gate translator is implemented at the edge router between the local

network and the Internet. It samples the unwanted inbound connections, and

redirects the sampled connections to inbound honeypots that run the server









software the connections attempt to access (e.g., connections to ports 80/8080

are redirected to a honeypot running a web server). There are several ways

to determine which connections are "unwanted". The gate translator may be

configured with a list of unused addresses. Connections to those addresses are

deemed to be unwanted. It is very common nowadays for an organization to expose

only the addresses of its public servers. If that is the case, the gate translator can

be configured with those publicly-accessible addresses. When a connection for a

specific service (e.g., to port 80 for web access) is not made to one of the servers,

it is unwanted and redirected to an inbound honeypot. Suppose the size of the

local address space is $N$ and there are $h$ publicly-accessible servers on a particular

destination port. Typically, $N \gg h$. For a worm which randomly scans that port,

the chance for it to hit an inbound honeypot first is $\frac{N-h}{N}$, and the chance for it to

hit a protected server first is $\frac{h}{N}$. With a ratio of $\frac{N-h}{h}$, it is almost certain that the

worm will compromise the inbound honeypot before it does any damage to a real

server within the network.

Once an inbound honeypot is compromised, it will attempt to make outbound

connections. The internal translator is implemented at a router that separates the

inbound array from the rest of the network. It intercepts all outbound connections

from an inbound honeypot and redirects them to an outbound honeypot of the

same type, which will record and analyze the traffic.

We give the following example to illustrate how the system works. Suppose

that the IP address space of our network is 128.10.10.0/128, with one public web

server Y to be protected. The server's IP address is 128.10.10.1. Suppose an

attacker outside the network initiates a worm attack against systems of type Y.

The worm scans the IP address space for victims. It is highly probable that an

unused IP address, e.g. 128.10.10.20, will be attempted before 128.10.10.1. The

gate translator redirects the packets to an inbound honeypot of type Y, which is









subsequently infected. As the compromised honeypot participates in spreading the

worm, it will reveal itself by making outbound connections and provide the attack

traffic that will be redirected to an outbound honeypot of the system.
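
The redirection decisions in this example can be sketched as two small functions. The Python code below is an illustration under assumptions: the /24 prefix length and the honeypot addresses are hypothetical, sampling of unwanted connections is omitted, and dropping traffic for which no outbound honeypot exists is a choice made here, not something stated in the text.

```python
from ipaddress import ip_address, ip_network

LOCAL_NET = ip_network("128.10.10.0/24")             # prefix length assumed for the example
PUBLIC_SERVERS = {80: {ip_address("128.10.10.1")}}   # port -> real servers (server Y)
INBOUND_HONEYPOTS = {80: ip_address("10.0.1.80")}    # hypothetical honeypot addresses
OUTBOUND_HONEYPOTS = {80: ip_address("10.0.2.80")}

def gate_translate(dst, port):
    """Return the address that receives an inbound connection."""
    dst = ip_address(dst)
    wanted = dst in PUBLIC_SERVERS.get(port, set())
    if dst in LOCAL_NET and port in INBOUND_HONEYPOTS and not wanted:
        return INBOUND_HONEYPOTS[port]    # unwanted: redirect to an inbound honeypot
    return dst                            # wanted or unmonitored: deliver unchanged

def internal_translate(src, dst, port):
    """Return the address that receives an outbound connection, or None to drop it."""
    src, dst = ip_address(src), ip_address(dst)
    if src in INBOUND_HONEYPOTS.values():
        # Any connection initiated by an inbound honeypot is treated as worm traffic.
        return OUTBOUND_HONEYPOTS.get(port)
    return dst

if __name__ == "__main__":
    print(gate_translate("128.10.10.20", 80))              # unused address
    print(gate_translate("128.10.10.1", 80))               # real server Y
    print(internal_translate("10.0.1.80", "192.0.2.7", 80))
```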

After an outbound honeypot has captured a worm, the payload of the worm can

be directly considered as a signature. Using traffic filtering with the signature

at the edge of the network will protect the hosts from being attacked by the

same worm. In our system, the payload of the worm will also be forwarded to a

signature center. If a worm with polymorphism has been used during the attack,

the signature center will generate one single signature for all the variants of one

polymorphic worm by the algorithms discussed later. The signature will not only be

able to match those variants whose payloads have been captured before, it can also

match those variants not seen before.

We should emphasize that the proposed double-honeypot system is greatly

different from a conventional honeypot. A conventional system receives traffic from

all kinds of sources, including traffic from the normal users. It is a difficult and

tedious task to separate attack traffic from normal traffic, especially for attacks

that are not seen before. More often than not, only after the damage of

new attacks has surfaced do the experts rush to search the recorded data for traces

of attack traffic. In our system, when an outbound honeypot receives packets from

an inbound honeypot, it knows for sure that the packets are from a malicious

source. The outbound honeypot does not have to face the potentially huge amount

of normal background traffic that a conventional honeypot may receive.

3.2 Polymorphism of Internet Worms

The double-honeypot system provides a means to capture the byte sequences

of previously unknown Internet worms without manual analysis from the experts.

The captured byte sequences can be used to generate worm signatures, and future

connections carrying them will be automatically blocked. This is a great advantage










        mov  edi, 00403045h   ; set EDI to Start
        add  edi, ebp         ; adjust according to base
        mov  ecx, 0A6Bh       ; length of encrypted virus body
        mov  al, [key]        ; pick the key

Decrypt:
        xor  [edi], al        ; decrypt body
        inc  edi              ; increment counter position
        loop Decrypt          ; until all bytes are decrypted
        jmp  Start            ; jump to Start (jump over some data)

key     DB   86               ; variable one-byte key
Start:                        ; encrypted/decrypted virus body


Figure 3-2. A decryptor example of a worm.


over the current systems because the defense can be carried out automatically

before new worms deal significant damage to the network.

The attackers will try every possible way to extend the lifetime of Internet

worms. In order to evade the signature-based system, a polymorphic worm appears

differently each time it replicates itself. This section discusses the polymorphism of

Internet worms, while the next section provides a solution against some common

polymorphism techniques.

There are many ways to make polymorphic worms. One technique relies on

self-encryption with a variable key. It encrypts the body of a worm, which erases both

signatures and statistical characteristics of the worm byte string. A copy of the

worm, the decryption routine, and the key are sent to a victim machine, where the

encrypted text is turned into a regular worm program by the decryption routine,

for example, the code presented in Figure 3-2 [38]. The program is then executed

to infect other victims and possibly damage the local system. Figure 3-3 illustrates

a simple polymorphic worm using the same decryptor. The worm body attached

after the decryptor part appears differently based on different keys.
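
The effect of a variable key on the transmitted bytes can be illustrated with harmless placeholder data. In the Python sketch below, the stub and body contents, the layout (stub, then key, then encrypted body) and all names are assumptions for illustration; it only shows that the encrypted part changes with the key while the decryptor part stays identical across variants.

```python
DECRYPTOR_STUB = b"\x01\x02\x03\x04"   # placeholder for a fixed decryption routine
BODY = b"placeholder bytes standing in for the worm body"

def xor_with_key(data: bytes, key: int) -> bytes:
    """Single-byte XOR, the same operation the decryptor loop applies."""
    return bytes(b ^ key for b in data)

def make_variant(key: int) -> bytes:
    """Assumed layout: fixed stub, one-byte key, then the encrypted body."""
    return DECRYPTOR_STUB + bytes([key]) + xor_with_key(BODY, key)

def split_variant(variant: bytes):
    n = len(DECRYPTOR_STUB)
    return variant[:n], variant[n], variant[n + 1:]

if __name__ == "__main__":
    stub1, key1, enc1 = split_variant(make_variant(0x56))
    stub2, key2, enc2 = split_variant(make_variant(0xA3))
    print(stub1 == stub2)                    # decryptor part is invariant
    print(enc1 == enc2)                      # encrypted bodies differ
    print(xor_with_key(enc1, key1) == BODY)  # decryption restores the body
```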









Figure 3-3. Different variants of a polymorphic worm using the same decryptor (diagram: each variant consists of a decryptor with its key and entry point, followed by the encrypted worm body)


While different copies of a worm look different if different keys are used, the

encrypted text tends to follow a uniform byte frequency distribution [39], which

itself is a statistical feature that can be captured by anomaly detection based on its

deviation from normal-traffic distributions [4, 14]. Moreover, if the same decryption

routine is always used, the byte sequence in the decryption routine can serve as the

worm signature, if we are able to identify the decryption routine region which is

invariant over different instances of the same Internet worms.
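
One simple way to quantify how close a byte string is to a uniform byte frequency distribution is its Shannon entropy. The Python sketch below is an illustration, not the detection method of [4, 14]: seeded pseudo-random bytes stand in for an encrypted body, an HTTP-like string stands in for normal traffic, and the 7.5-bit threshold is an arbitrary assumption.

```python
import math
import random
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte; 8.0 means a perfectly uniform distribution."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

def looks_uniform(data: bytes, threshold_bits: float = 7.5) -> bool:
    """Flag payloads whose byte distribution is close to uniform (assumed threshold)."""
    return byte_entropy(data) >= threshold_bits

rng = random.Random(0)
ENCRYPTED_LIKE = bytes(rng.randrange(256) for _ in range(4096))   # stand-in data
NORMAL_LIKE = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n" * 50

if __name__ == "__main__":
    print(f"encrypted-like: {byte_entropy(ENCRYPTED_LIKE):.2f} bits/byte")
    print(f"normal-like:    {byte_entropy(NORMAL_LIKE):.2f} bits/byte")
```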

A more sophisticated method of polymorphism is to change the decryption

routine each time a copy of the worm is sent to another victim host. This can be

achieved by keeping several decryption routines in a worm. When the worm tries

to make a copy, one routine is randomly selected and other routines are encrypted

together with the worm body. Figure 3-4 is an example of this case. To further

complicate the problem, the attacker can change the entry point of the program

such that the decryption routine will appear at different locations of the traffic payload,

as is shown in Figure 3-5.

The number of different decryption routines is limited by the total length

of the worm. For example, consider a buffer-overflow attack that attempts to

copy malicious data to an unprotected buffer. Over-sized malicious data may

cause severe memory corruption outside of the buffer, leading to system crash


Figure 3-4. Different variants of a polymorphic worm using different decryptors (diagram: each variant consists of a decryptor with its key and entry point, followed by the encrypted worm body)

Figure 3-5. Different variants of a polymorphic worm with different decryptors and different entry point (diagram: the entry point and key appear at different positions in each variant)










Original code
55        push  ebp
8BEC      mov   ebp, esp
8B7608    mov   esi, dword ptr [ebp + 08]
85F6      test  esi, esi
743B      je    401045
8B7E0C    mov   edi, dword ptr [ebp + 0C]
09FF      or    edi, edi
7434      je    401045
31D2      xor   edx, edx

With garbage code
55        push  ebp
8BEC      mov   ebp, esp
8B7608    mov   esi, dword ptr [ebp + 08]
85F6      test  esi, esi
90        nop
90        nop
90        nop
743B      je    401045
8B7E0C    mov   edi, dword ptr [ebp + 0C]
09FF      or    edi, edi
7434      je    401045
31D2      xor   edx, edx


Figure 3-6. Different variants of a polymorphic worm with garbage-code insertion


and spoiling the compromise. Given a limited number of decryption routines, it

is possible to identify all of them as attack signatures after enough samples of the

worm have been obtained.

Another polymorphism technique is called garbage-code insertion. It inserts

garbage instructions into the copies of a worm. For example, a number of nop (i.e.,

no operation) instructions can be inserted into different places of the worm body,

thus making it more difficult to compare the byte sequences of two instances of the

same worm. Figure 3-6 [38] is an example of this scenario.









The level of polymorphism in this type of worms is decided by the ratio of

the length of the garbage instruction region to the total length of the worm. For

those worms with moderate ratio, it is quite conceivable that there will be a good

chance that regions sharing the same byte sequence exist in different instances

of the worms, which in turn can serve as the signature of the worm. With an

increased length, the overlapped regions will be shortened and it becomes difficult to

identify them.
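
The shared regions mentioned above can be located with an off-the-shelf sequence matcher. The Python sketch below uses the standard-library `difflib.SequenceMatcher` on the two byte strings of Figure 3-6; it is a stand-in technique for illustration, not the signature algorithm developed later, and the minimum region length of 4 bytes is an assumption.

```python
from difflib import SequenceMatcher

# Byte sequences of the two variants shown in Figure 3-6.
ORIGINAL = bytes.fromhex("558BEC8B760885F6743B8B7E0C09FF743431D2")
WITH_NOPS = bytes.fromhex("558BEC8B760885F6909090743B8B7E0C09FF743431D2")

def common_regions(a: bytes, b: bytes, min_len: int = 4):
    """Byte runs that appear in both variants, in order of appearance in a."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [a[m.a:m.a + m.size]
            for m in matcher.get_matching_blocks()
            if m.size >= min_len]

if __name__ == "__main__":
    for region in common_regions(ORIGINAL, WITH_NOPS):
        print(len(region), region.hex().upper())
```

With more garbage inserted at more positions, the runs returned by such a matcher become shorter, which is the difficulty the paragraph describes.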

However, from the statistics point of view, the frequencies of the garbage

instructions in a worm can differ greatly from those in normal traffic. If that

is the case, anomaly-detection systems [4, 14] can be used to detect the worm.

Furthermore, some garbage instructions such as nop can be easily identified and

removed. For better obfuscated garbage, techniques of executable analysis [32] can

be used to identify and remove those instructions that will never be executed.

The instruction-substitution technique replaces one instruction sequence with
a different but equivalent sequence. Unless the substitution is done over the entire
code without compromising the code integrity (which is a great challenge by itself),
it is likely that shorter signatures can be identified from the stationary portion of
the worm. The code-transposition technique changes the order of the instructions
with the help of jumps. The excess jump instructions provide a statistical clue,
and executable-analysis techniques can help to remove the unnecessary jump
instructions. Finally, the register-reassignment technique swaps the usage of the
registers, which causes extensive "minor" changes in the code sequence. These
techniques are illustrated in Figure 3-7 [38].

The space of polymorphism techniques is huge and still growing. With the

combinations of different techniques, a cure-all solution is unlikely. The pragmatic

strategy is to enrich the pool of defense tools, with each being effective against

certain attacks. The current defense techniques fall into two main categories,




















Original code
55        push    ebp
8BEC      mov     ebp, esp
8B7608    mov     esi, dword ptr [ebp + 08]
85F6      test    esi, esi
743B      je      401045
8B7E0C    mov     edi, dword ptr [ebp + 0C]
09FF      or      edi, edi
7434      je      401045
31D2      xor     edx, edx

Obfuscated code
55        push    ebp
54        push    esp
5D        pop     ebp
8B7608    mov     esi, dword ptr [ebp + 08]
09F6      or      esi, esi
743B      je      401045
8B7E0C    mov     edi, dword ptr [ebp + 0C]
85FF      test    edi, edi
7434      je      401045
28D2      sub     edx, edx

Byte sequence of the original code:    558BEC8B760885F6743B8B7E0C09FF743431D2
Byte sequence of the obfuscated code:  55545D8B760809F6743B8B7E0C85FF743428D2

Figure 3-7. Different variants of a polymorphic worm with several different
polymorphic techniques









misuse/signature matching and anomaly detection. The former matches against
known patterns in the attack traffic. The latter detects deviations from the
statistical distributions of the normal traffic. We propose a hybrid approach based
on a new type of signature, consisting of position-aware byte frequency
distributions. Such signatures can tolerate extensive "local" changes as long as
the "global" characteristics of the signature remain. Good examples are
polymorphism caused by register reassignment and modest instruction substitution.
We do not claim that such signatures are suitable for all attacks. They may,
however, work together with executable-analysis techniques to characterize certain
statistical patterns that appear after garbage instructions and excess jumps are
removed.

In this paper, we focus on the problem of moderate polymorphism. While we
admit that no single solution may handle all of these problems, polymorphism can
be at least partially addressed when no extreme case is involved. More importantly,
our system remains useful even in the most extreme cases. First, our
double-honeypot system automatically captures the different instances of the worm.
Although a unified signature matching all instances of the worm seems unlikely in
extreme cases, the captured samples still help in analyzing the behavior of the
attack and in providing an early warning of it. Second, although human analysis
may in some cases find signatures that do not conform to our model, such analysis
is in most cases laborious, empirical, and time-consuming. Our algorithm, on the
other hand, can detect the most subtle signatures based on the model and is more
reliable than human analysis. Finally, our system can cooperate with other defense
systems, e.g., anomaly-based systems, in order to be more effective.

We use the invariant region of the worm as the signature because we are
dealing with Internet worms. Other malicious code, such as viruses, can be
detected after the machine has been infected by scanning its programs, because a
virus relies on the execution of the infected programs. Internet worms, however,
need to be identified before the infection takes place, because the goal of a worm
is to spread across the Internet as quickly as possible. While some techniques,
e.g., that of Christodorescu, can successfully identify polymorphic malicious code
by looking for semantic equivalence, they are inappropriate for worm detection
because they cannot be performed in real time. In the next section, we use
iterative algorithms to identify the invariant region from the byte sequences of
polymorphic worms.

The basic premise of our model is that the byte frequency distributions in the
significant region, which in our case is the region that approximately matches the
signature, should differ greatly from those of the rest of the worm body and of
normal, legitimate traffic payloads. The reason is that they carry different
functionalities. For example, in a polymorphic worm, the significant region is
responsible for the true malicious operations, while the rest of the sequence only
serves as camouflage to elude the defense system. As a result, the rest of the
sequence will most likely have the same or a similar byte frequency distribution as
the legitimate connections. Even if an attacker tries to hide the true worm body by
attaching legitimate payloads, it is always difficult to design a purely malicious
sequence that is indistinguishable from a normal connection. Therefore, the byte
frequency distributions of this part should be under-represented in the rest of the
worm body. If we are able to extract a similar region from each of the sampled
instances of the worm, where the frequency distribution differs greatly from the
rest of the sequence, then this region is potentially the significant region, and
its probabilistic multinomial byte frequency distribution is the signature we are
looking for.









The attackers may not act as we expect above. For example, they may only
insert several nop operations randomly into each instance of the worm without
attaching the camouflage part. Our argument still holds in this case. Since nop
does not appear frequently in normal sessions, the sequence of the malicious
connection will have a high frequency of nop operations. The probability of nop at
each position will be much larger than in a normal incoming connection sequence.
That is enough to constitute a signature whose width is the same as the length of
the worm instances.

3.3 Position-Aware Distribution Signature (PADS)

3.3.1 Background and Motivation

Most deployed defense systems against Internet worms are signature-based.
They rely on the exact matching of the packet payload with a database of fixed
signatures. Though effective in dealing with known attacks, they fail to detect
new worms or variants of old worms, especially polymorphic worms whose instances
can be carefully crafted to circumvent the signatures [32]. Moreover, manually
identifying the signatures may take days if not longer.

To address these problems, several anomaly-based systems [4, 14] use the
byte frequency distribution (BFD) to identify the existence of a worm. Their basic
approach is to derive a byte frequency distribution from the normal network
traffic. When a new incoming connection is established, the payload of the packets
is examined. The byte frequency distribution of the connection is computed and
compared with the byte frequency distribution derived from the normal traffic. A
large deviation is deemed suspicious. The problem is that an intelligent attacker
could easily cheat the system by attaching the worm body to a lengthy normal,
legitimate session. Since the majority of the payload is from legitimate
operations, its byte frequency distribution will not vary much from that of the
normal traffic. As the worm byte sequence is diluted in normal traffic, its
statistical characteristics are smoothed out.

Both signature-based and anomaly-based systems have their pros and cons.
Compared to anomaly-based systems, signature-based systems have their own
advantages. Since a signature-based system matches the signature of the worm
against only the corresponding segment of the whole worm body, attaching normal
payloads to the end of the worm body does little to reduce the chance of being
detected. However, if only exact matching is used to compare the signature with
the payloads, a slight change of the malicious part in the worm body means a
mismatch with the signature. In other words, current signature-based systems lack
the flexibility of anomaly-based systems. In addition to the drawbacks above,
signature-based systems are not robust against other techniques employed by
intelligent attackers. For example, an attacker might increase the number of
garbage instructions inserted into the worm so that each signature is by definition
reduced to only several bytes, as is shown in figure.X. An automatic
signature-extraction system will then dramatically increase the false positives,
because such a short signature is so common that any incoming connection might
contain it.

Our system inherits the positive aspects of both signature-based and
anomaly-based systems. It is based on a new defense technique that is complementary
to the existing ones. We define a relaxed, inexact form of signature that is
flexible against certain polymorphism. The new signature is called the
position-aware distribution signature (PADS for short). It includes a byte
frequency distribution (instead of a fixed value) for each position in the
signature "string". The idea is to focus on the generic pattern of the signature
while allowing some local variation.









Consider a polymorphic worm with register reassignment (Section 3.2).

Because registers are used extensively in executables, swapping registers is

effective against traditional signatures. However, when a signature is expressed

in position-aware distributions, not only are the static elements in the executable

captured, but the set of likely values for the variable elements are also captured.

Hence, PADS allows a more precise measurement of "matching." A similar

example is instruction substitution, where the mutually replaceable instructions (or

sequences) can be represented by the position-aware distributions.

To better explain the concept, we give an example here. Suppose a worm
carries the word "worm" in its byte sequence. In order to avoid detection, the
variants of the worm may change it to "w0rm" or "norm". Counting the number of
appearances of each byte at each position will give us the following table. Based
on this model, when a new incoming connection is established, it is possible to
check the byte sequence in the connection session and decide the similarity between
the sequence and the previously captured "worm", "w0rm", "dorm", etc.
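The counting step in this example can be sketched in a few lines. The three spellings used below are assumed readings of the example words, and `position_counts` is a hypothetical helper name; the sketch only tallies how often each byte occurs at each position of equal-length samples.

```python
from collections import Counter


def position_counts(samples):
    """Return one Counter per byte position over equal-length samples."""
    width = len(samples[0])
    if any(len(s) != width for s in samples):
        raise ValueError("all samples must have the same length")
    return [Counter(s[p] for s in samples) for p in range(width)]


samples = [b"worm", b"w0rm", b"norm"]  # assumed example variants
counts = position_counts(samples)

# Relative frequency of each observed byte at each position.
freqs = [
    {chr(byte): count / len(samples) for byte, count in column.items()}
    for column in counts
]
for position, column in enumerate(freqs, start=1):
    print(position, column)
```

Positions 3 and 4 are fixed across the samples, while positions 1 and 2 each admit two values; a position-aware distribution keeps both facts.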

The goal of our system is to use double honeypots to capture the worm attack
traffic, from which PADS is derived and used to detect inbound worm variants. It
provides a quick and automatic response that complements the existing approaches
involving human experts. Based on PADS, the defense system will be able to identify
a new variant of a worm at its first occurrence, even if such a variant has not
been captured by the system previously. That means our system is able to raise
alerts for attacks that successfully elude the existing systems, and hence
significantly decreases the false negatives. Besides its advantages over the
traditional signature-based system, which needs the assistance of human experts,
our proposed system is especially useful in special cases where an anomaly-based
system may fail.









b       0       1       2       ...     9       10
0x00    0.001   0.001   0.001   ...     0.500   0.100
0x01    0.001   0.001   0.001   ...     0.200   0.500
0x02    0.005   0.001   0.001   ...     0.001   0.100
...     ...     ...     ...     ...     ...     ...
0xfe    0.100   0.001   0.001   ...     0.001   0.001
0xff    0.001   0.700   0.700   ...     0.001   0.001
Table 3-1. An example of a PADS signature with width W = 10


3.3.2 Position-Aware Distribution Signature (PADS)

We first describe what a PADS signature is, then explain how to match a byte

sequence against a signature, and finally motivate how to compute such a signature

based on captured worm sequences.

At each byte position $p$ of a PADS signature, the byte-frequency distribution
is a function $f_p(b)$, which is the probability for $b$ to appear at position $p$,
where $b \in [0..255]$, the set of possible values for a byte, and
$\sum_{b \in [0..255]} f_p(b) = 1$. We use $(f_1, f_2, \ldots, f_W)$ to characterize
the byte-frequency distribution of the worm, where $W$ is the width of the
signature in terms of the number of bytes. Let $f_0(b)$ be the byte frequency
distribution of the legitimate traffic. The PADS signature is defined as
$\Theta = (f_0, f_1, f_2, \ldots, f_W)$, which consists of a normal signature $f_0$
and an anomalous signature $(f_1, f_2, \ldots, f_W)$. Table 3-1 gives an example of
a PADS signature with width $W = 10$.

When a new connection is established, we need to decide if the payload of the
connection is a variant of the worm or not. It is necessary to define a similarity
scale between a probabilistic byte frequency distribution and a byte sequence.

Consider a set of byte sequences $S = \{S_1, S_2, \ldots, S_n\}$, where $S_i$,
$1 \le i \le n$, is the byte sequence of an incoming connection. We want to decide
whether $S_i$ is a variant of the worm by matching it against a signature $\Theta$.
Let $l_i$ be the length of $S_i$. Let $S_{i,1}, S_{i,2}, \ldots, S_{i,l_i}$ be the
bytes of $S_i$ at positions $1, 2, \ldots, l_i$, respectively. Let $seg(S_i, a_i)$
be the $W$-byte segment of $S_i$ starting from position $a_i$. The matching score
of $seg(S_i, a_i)$ with the anomalous signature is defined as

$$M(\Theta, S_i, a_i) = \prod_{p=1}^{W} f_p(S_{i,a_i+p-1})$$

which is the probability for $seg(S_i, a_i)$ to occur, given the distribution
$(f_1, f_2, \ldots, f_W)$ of the worm. Similarly, the matching score of
$seg(S_i, a_i)$ with the normal signature is defined as

$$\overline{M}(\Theta, S_i, a_i) = \prod_{p=1}^{W} f_0(S_{i,a_i+p-1})$$

We want to find a position $a_i$ that maximizes $M(\Theta, S_i, a_i)$ and minimizes
$\overline{M}(\Theta, S_i, a_i)$. To quantify this goal, we combine the above two
scores in order to capture both the "similarity" between $seg(S_i, a_i)$ and the
anomalous signature, and the "dissimilarity" between $seg(S_i, a_i)$ and the normal
signature. For this purpose, we define $\Lambda(\Theta, S_i, a_i)$ as the matching
score of $seg(S_i, a_i)$ with the PADS signature.

$$\Lambda(\Theta, S_i, a_i) = \frac{M(\Theta, S_i, a_i)}{\overline{M}(\Theta, S_i, a_i)} = \prod_{p=1}^{W} \frac{f_p(S_{i,a_i+p-1})}{f_0(S_{i,a_i+p-1})} \qquad (3\text{-}1)$$

The matching score of the byte sequence $S_i$ with the signature is defined as the
maximum $\Lambda(\Theta, S_i, a_i)$ among all possible positions $a_i$, that is,

$$\max_{a_i = 1}^{l_i - W + 1} \Lambda(\Theta, S_i, a_i)$$

Alternatively, we can use the logarithm of $\Lambda$ as the score, which makes it
easier to plot our experiment results. Our final matching score of $S_i$ with the
PADS signature $\Theta$ is defined as:

$$\Omega(\Theta, S_i) = \max_{a_i = 1}^{l_i - W + 1} \frac{1}{W} \log \Lambda(\Theta, S_i, a_i) = \max_{a_i = 1}^{l_i - W + 1} \frac{1}{W} \sum_{p=1}^{W} \log \frac{f_p(S_{i,a_i+p-1})}{f_0(S_{i,a_i+p-1})} \qquad (3\text{-}2)$$









The $W$-byte segment that maximizes $\Omega(\Theta, S_i)$ is called the significant
region of $S_i$, which is denoted as $R_i$. The matching score of the significant
region is the matching score of the whole byte sequence by definition.

For any incoming byte sequence $S_i$, if $\Omega(\Theta, S_i)$ is greater than a
threshold value, a warning about a (possibly variant) worm attack is issued.
Additional defense actions may be carried out, e.g., rejecting the connection that
carries $S_i$. The threshold is typically set at 0. From the definition of
$\Omega$, a score above zero means that $S_i$ is closer to the anomalous signature
$(f_1, f_2, \ldots, f_W)$; a score below zero means that $S_i$ is closer to the
normal signature $f_0$.
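The matching rule can be sketched as follows. This is a minimal illustration rather than the implementation used in the experiments: the signature is a list of W lists of 256 probabilities, offsets are 0-based, and the toy signature (probability 0.9 on one byte per position) and the uniform normal distribution are assumptions made only for the example. The names `matching_score` and `peaked` are hypothetical.

```python
import math


def matching_score(signature, normal, seq):
    """Return (Omega, best 0-based start) for seq, following (3-2)."""
    width = len(signature)
    if len(seq) < width:
        raise ValueError("sequence is shorter than the signature width")
    scores = []
    for start in range(len(seq) - width + 1):
        # (1/W) * sum over positions of log(f_p(byte) / f_0(byte))
        total = sum(
            math.log(signature[p][seq[start + p]] / normal[seq[start + p]])
            for p in range(width)
        )
        scores.append(total / width)
    best = max(range(len(scores)), key=scores.__getitem__)
    return scores[best], best


def peaked(byte, peak=0.9):
    """Toy distribution: `peak` on one byte, the rest spread evenly."""
    dist = [(1.0 - peak) / 255] * 256
    dist[byte] = peak
    return dist


NORMAL = [1.0 / 256] * 256                   # assumed uniform normal traffic
SIGNATURE = [peaked(b) for b in b"worm"]     # assumed anomalous signature, W = 4

score, start = matching_score(SIGNATURE, NORMAL, b"GET /worm HTTP/1.0")
benign_score, _ = matching_score(SIGNATURE, NORMAL, b"GET /index.html")
print(score > 0, start, benign_score > 0)
```

With the threshold at 0, the first sequence would raise a warning and its significant region would be the four bytes at the reported offset, while the second would pass.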

Next we discuss how to calculate $\Theta$ based on the previously collected
instances of a worm. Suppose we have successfully obtained a number $n$ of variants
of a worm from the double-honeypot system. Each variant is a byte sequence with a
variable length. It contains one copy of the worm, possibly embedded in the
background of a normal byte sequence. Now let $S = \{S_1, S_2, \ldots, S_n\}$ be the
set of collected worm variants. Our goal is to find a signature with which the
matching scores of the worm variants are maximized. We model it as the classical
missing data problem in statistics and then apply the expectation-maximization
(EM) algorithm to solve it.

To begin with, we know neither the signature, which is the underlying
unknown parameter, nor the significant regions of the variants, which are the
missing data. Knowing one would allow us to compute the other. We have just
shown how to compute the significant region of a byte sequence if the signature
is known. Next we describe how to compute the signature if the significant regions
of the variants are known.

First we compute the byte frequency distribution for each byte position of the
significant regions. At position $p \in [1..W]$, the maximum likelihood estimate of
the frequency $f_p(x)$, $x \in [0..255]$, is the number $c_{p,x}$ of times that $x$
appears at position $p$ of the significant regions, divided by $n$.

$$f_p(x) = \frac{c_{p,x}}{n}$$

One problem is that $f_p(x)$ will be zero for those byte values $x$ that never
appear at position $p$ of any significant region. However, considering that our
calculation is based on a limited collection of the variants and $f_p(x)$ is only
the maximum likelihood estimate of the frequency, we are not absolutely confident
that the actual frequencies are zero unless we obtain all variants of the worm. For
better flexibility, we apply a "pseudo-count" to the observed byte count $c_{p,x}$,
and the byte frequency $f_p(x)$ is estimated as

$$f_p(x) = \frac{c_{p,x} + d}{n + 256 \cdot d} \qquad (3\text{-}3)$$

where $d$ is a small predefined pseudo-count number.
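Formula (3-3) translates directly into code. The sketch below assumes the significant regions are already known and have equal width; the function name `estimate_signature` and the value d = 0.01 are illustrative choices, not values taken from the experiments.

```python
def estimate_signature(regions, d=0.01):
    """Estimate (f_1, ..., f_W) from n equal-width regions using (3-3)."""
    n = len(regions)
    width = len(regions[0])
    signature = []
    for p in range(width):
        counts = [0] * 256
        for region in regions:
            counts[region[p]] += 1          # c_{p,x}
        signature.append([(c + d) / (n + 256 * d) for c in counts])
    return signature


regions = [b"worm", b"w0rm", b"norm"]      # assumed significant regions
signature = estimate_signature(regions)

print(signature[0][ord("w")])   # frequently observed byte at position 1
print(signature[0][ord("x")])   # never observed, yet strictly positive
```

Because every count receives the same pseudo-count, each per-position distribution still sums to one, and the logarithm in (3-2) is always defined.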

We mentioned in the previous section that anomaly-based systems utilize
the byte frequency distribution to detect the existence of worms. Our method is a
distinct concept. In anomaly-based systems, the byte frequency distribution of the
whole incoming traffic is compared with the expected distribution of the normal
traffic, and a great deviation between these two distributions is considered
malicious. In our method, however, the byte frequency distribution is used to
describe the signature derived from collected variants of the same worm only. The
purpose is to have a "relaxed" format of the signature so that a malicious
connection can be identified if the payload of the connection approximately
matches the signature. Variants of the worm need to be obtained beforehand in our
system, while anomaly-based systems only need to care about the patterns of the
legitimate traffic.

We have established that the PADS signature and the significant regions
can lead to each other. We do not know either of them, but we know that the
significant regions are those segments that maximize the matching score with the
signature. This missing data problem can be solved by an iterative algorithm,
which first makes a guess on the starting positions of the significant regions,
computes the signature, uses the signature to compute the new starting positions
of the significant regions, and repeats the process until convergence.

3.4 Algorithms for Signature Detection

In this section, we show how to use the Expectation-Maximization algorithm
and the optimized Gibbs sampling algorithm to compute the PADS signature from
a collection of worm variants captured by our double-honeypot system. We want
to stress that, though comparing the signature with the payload of the incoming
connections is done online, the PADS signature itself is computed off-line. There
is no real-time requirement. The purpose of the algorithms is to obtain a PADS
signature that is able to detect the variants of the polymorphic worm even if they
are unknown. If fast response of the worm defense is required, the payloads of
the captured worm variants should be used directly before they go through the
signature center specified. The generated PADS signature can be applied later on
for unobserved variants of the worm.
[Figure 3-8: six byte sequences S1 to S6, each containing a significant region of width W that starts at position a1 to a6 and is surrounded by background bytes.]

Figure 3-8. Signature detection









3.4.1 Expectation-Maximization Algorithm

Expectation-Maximization (EM) [35] is an iterative procedure that obtains
maximum-likelihood parameter estimates. Given a set $S$ of byte sequences, we lack
the starting positions $a_1, a_2, \ldots, a_n$ of the significant regions, which are
the missing data in our problem. The underlying parameter $\Theta$ of our data set
is also unknown. The EM algorithm iterates between the expectation step and the
maximization step after the initialization.

The description of the EM algorithm is given below.

Initialization. The starting positions $a_1, a_2, \ldots, a_n$ of the significant
regions for worm variants $S_1, S_2, \ldots, S_n$ are assigned randomly. They define
the initial guess of the significant regions $R_1, R_2, \ldots, R_n$. The maximum
likelihood estimate of the signature $\Theta$ is calculated based on the initial
significant regions.

Expectation. The new guess on the locations of the significant regions is
calculated based on the estimated signature $\Theta$. In our algorithm, the new
starting position $a_i$ of the significant region is the position at which the
significant region has the best matching score with the signature $\Theta$. In
other words, we seek

$$a_i = \arg\max_{a_i} \Lambda(\Theta, S_i, a_i) \quad \forall i \in [1..n]$$

Maximization. By formula (3-3), the new maximum likelihood estimate of the
signature $\Theta$ is calculated based on the current guess on the locations of the
significant regions.

The algorithm terminates if the average matching score $\Omega$ is within
$(1 + \epsilon)$ of that of the previous iteration, where $\epsilon$ is a small
predefined percentage.

Starting with a large signature width W, we run the above algorithm to decide

the signature as well as the significant regions. If the minimum matching score

of all significant regions deviates greatly from the average score, we repeat the









algorithm with a smaller W. This process continues until we reach a signature that

matches well with the significant regions of all collected worm variants.
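The EM iteration for a fixed width W can be sketched as below. This is an illustrative reading of the steps above, not the code used in the experiments: offsets are 0-based, the normal distribution is assumed uniform in the usage example, the sample payloads are invented, and the names `estimate`, `omega_at`, and `em_signature` are hypothetical. The outer loop that shrinks W is omitted.

```python
import math
import random


def estimate(regions, d=0.01):
    """Per-position byte distributions with pseudo-count d, as in (3-3)."""
    n = len(regions)
    signature = []
    for p in range(len(regions[0])):
        counts = [0] * 256
        for region in regions:
            counts[region[p]] += 1
        signature.append([(c + d) / (n + 256 * d) for c in counts])
    return signature


def omega_at(sig, normal, seq, a):
    """(1/W) * log Lambda for the segment starting at 0-based offset a."""
    return sum(
        math.log(sig[p][seq[a + p]] / normal[seq[a + p]]) for p in range(len(sig))
    ) / len(sig)


def em_signature(seqs, width, normal, eps=0.01, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Initialization: random starting offsets for the significant regions.
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    previous = None
    for _ in range(max_iter):
        # Maximization: re-estimate the signature from the current regions.
        sig = estimate([s[a:a + width] for s, a in zip(seqs, starts)])
        # Expectation: move each region to its best-scoring offset.
        starts = [
            max(range(len(s) - width + 1), key=lambda a: omega_at(sig, normal, s, a))
            for s in seqs
        ]
        average = sum(
            omega_at(sig, normal, s, a) for s, a in zip(seqs, starts)
        ) / len(seqs)
        # Stop when the average score changes by at most a fraction eps.
        if previous is not None and abs(average - previous) <= eps * abs(previous):
            break
        previous = average
    return estimate([s[a:a + width] for s, a in zip(seqs, starts)]), starts


NORMAL = [1.0 / 256] * 256  # assumed uniform normal traffic
SAMPLES = [b"GET /worm HTTP/1.0", b"POST w0rm=1", b"xxnormxx"]
signature, starts = em_signature(SAMPLES, width=4, normal=NORMAL)
print(starts)
```

Because the result depends on the random initialization, the sketch may settle on different regions for different seeds, which is the local-maximum behavior discussed next.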

3.4.2 Gibbs Sampling Algorithm

One main drawback of the EM algorithm is that it may get stuck in a local
maximum. There is no guarantee that the global maximum can be reached. Many
strategies have been proposed to address this problem. One approach is to start
with multiple random parameter configurations and look for the best among the
different results obtained. Another is to pre-process the data with some other
method and choose a "good" initial configuration. In recent years, the simulated
annealing [40] approach has attracted great attention. Simply speaking, the
approach allows certain random selection of the parameter (with a small probability
of moving in a worse direction), which provides a chance to jump out of a local
maximum. A stochastic method in the same spirit is the Gibbs sampling algorithm
[36], which we use to compute the PADS signature below.

The algorithm is initialized by assigning random starting positions for the

significant regions of the worm variants. Then one variant is selected randomly.

This selected variant is temporarily excluded from S. The signature is calculated

based on the remaining variants. After that, the starting position for the significant

region of the selected variant is updated, according to a probability distribution

based on the matching scores at different positions. The algorithm continues with

many iterations until a convergence criterion is met.

The description of the Gibbs sampling algorithm is given below.

Initialization. The starting positions $a_1, a_2, \ldots, a_n$ of the significant
regions for worm variants $S_1, S_2, \ldots, S_n$ are assigned randomly.

Predictive Update. One of the $n$ worm variants, $S_x$, is randomly chosen.
The signature $\Theta$ is calculated based on the other variants, $S - \{S_x\}$.

The algorithm terminates if the average matching score is within $(1 + \epsilon)$
of that of the previous iteration, where $\epsilon$ is a small predefined
percentage.

Sampling. Every possible position $a_x \in [1..l_x - W + 1]$ is considered as
a candidate for the next starting position for the significant region of $S_x$.
The matching score for each candidate position is $\Lambda(\Theta, S_x, a_x)$ as
defined in (3-1). The next starting position for the significant region of $S_x$
is randomly selected. The probability that a position $a_x$ is chosen is
proportional to $\Lambda(\Theta, S_x, a_x)$. That is,

$$\Pr(a_x) = \frac{\Lambda(\Theta, S_x, a_x)}{\sum_{a'_x = 1}^{l_x - W + 1} \Lambda(\Theta, S_x, a'_x)}$$

Go back to the predictive update step.
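The predictive update and sampling steps can be sketched as follows. The sketch makes two simplifications that are assumptions of this example: it runs a fixed number of iterations instead of checking the convergence criterion, and it assumes a uniform normal distribution in the usage lines. Offsets are 0-based, the sample payloads are invented, and the names `estimate`, `log_lambda`, and `gibbs_signature` are hypothetical.

```python
import math
import random


def estimate(regions, d=0.01):
    """Per-position byte distributions with pseudo-count d, as in (3-3)."""
    n = len(regions)
    signature = []
    for p in range(len(regions[0])):
        counts = [0] * 256
        for region in regions:
            counts[region[p]] += 1
        signature.append([(c + d) / (n + 256 * d) for c in counts])
    return signature


def log_lambda(sig, normal, seq, a):
    """log Lambda(Theta, S, a) from (3-1), with a as a 0-based offset."""
    return sum(
        math.log(sig[p][seq[a + p]] / normal[seq[a + p]]) for p in range(len(sig))
    )


def gibbs_signature(seqs, width, normal, iterations=200, seed=0):
    if len(seqs) < 2:
        raise ValueError("at least two variants are needed")
    rng = random.Random(seed)
    # Initialization: random starting offsets.
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        # Predictive update: leave one variant out and fit the others.
        x = rng.randrange(len(seqs))
        others = [
            s[a:a + width]
            for i, (s, a) in enumerate(zip(seqs, starts))
            if i != x
        ]
        sig = estimate(others)
        # Sampling: draw the new offset with probability proportional to Lambda.
        logs = [
            log_lambda(sig, normal, seqs[x], a)
            for a in range(len(seqs[x]) - width + 1)
        ]
        top = max(logs)  # subtract the maximum before exp for stability
        weights = [math.exp(v - top) for v in logs]
        starts[x] = rng.choices(range(len(logs)), weights=weights)[0]
    return estimate([s[a:a + width] for s, a in zip(seqs, starts)]), starts


NORMAL = [1.0 / 256] * 256  # assumed uniform normal traffic
SAMPLES = [b"GET /worm HTTP/1.0", b"POST w0rm=1", b"xxnormxx"]
signature, starts = gibbs_signature(SAMPLES, width=4, normal=NORMAL)
print(starts)
```

Unlike the EM step, a lower-scoring offset can still be drawn here, which is what gives the sampler a chance to leave a poor configuration.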

Some similarities and differences between the EM method and the Gibbs sampling
algorithm should be noted here. Both share the same statistical model built on the
byte frequencies at each position of the predicted common signature region. EM can
be thought of as a deterministic version of Gibbs sampling, and Gibbs sampling can
be thought of as a stochastic analog of the EM algorithm. EM operates on the means
of unknown variables using expected sufficient statistics instead of sampling
unknown variables as Gibbs sampling does. Both EM and Gibbs sampling are used for
approximation with missing data.

3.4.3 Complexities

Since our algorithms are iterative, the total time complexity depends on the
number of iterations needed to converge. Instead, we discuss the time and space
complexity of each iteration here. The space complexity is the same for both the
EM and the Gibbs sampling algorithms. During the process, we only need to maintain
the byte frequency table of the signature $\Theta$ and the start locations of the
significant region in each byte sequence. Therefore, the space complexity is
$O(256W + n)$. The time complexity is quite different. In the Gibbs sampling
algorithm, only one start location is updated each time. The time complexity of
one iteration is $O(l_i - W + 1)$ since there are $l_i - W + 1$ possibilities. In
the EM algorithm, we update all start locations at once, so the time complexity is
$O(\sum_i (l_i - W + 1))$ for one iteration. That does not mean Gibbs sampling is
better than EM in time complexity, however. They are generally the same if updating
all start locations is counted as one iteration.

3.4.4 Signature with Multiple Separated Strings

Thus far the PADS signature is assumed to be a continuous string (where

each position in the string is associated not with a byte value but with a byte

frequency distribution). The definition can be easily extended for a signature to

contain k(> 1) separated strings, which may have different lengths. The significant

region of a byte sequence also consists of multiple separated segments, each having

a starting position and corresponding to a specific string in the signature. The

matching score $\Lambda(\Theta, S_i, a_{i1}, a_{i2}, \ldots)$ should now be a function of a set of starting

positions, and the significant region is defined by the set of starting positions that

maximizes the matching score. Because it remains that the signature and the

significant regions can be computed from each other, the EM algorithm and the

Gibbs Sampling algorithm can be easily modified to compute a signature with k

strings.

Incorporating models of signatures with gaps is necessary and advantageous in
our system. When polymorphism is used in the worm, an attacker may attach
different variants of the worm to the same legitimate payload. If gaps in the
signature are not allowed, the legitimate payload, which appears the same in each
sampled byte sequence, will be identified by the EM or Gibbs sampling algorithm
instead of the worm body, which varies. With multiple locations and lengths
maintained for one significant region in byte sequence $S_i$, the problem can be
solved. By expanding the total length of the signature with gaps, both the
legitimate payload and the worm body will be covered. In addition, the legitimate
payload included in the signature can be dropped if we compare it with the byte
sequences collected from legitimate sessions. Therefore, the time and space
complexity can be significantly reduced when we use the simplified signature to
check the incoming traffic.

3.5 MPADS with Multiple Signatures

Thus far the PADS signature is defined as a continuous "string" of byte

frequency distributions. It identifies a single significant region in an incoming byte

sequence. This strategy has a couple of limitations. First, a worm may include

a common segment that appears often in normal traffic. This common segment

defeats any attempt by the worm to be polymorphic because the worm is easily

identifiable by the segment. However, it can lure our system to choose the common

segment as the PADS signature and consequently produce false positives on the

normal traffic that happens to carry that segment. Second, a polymorphic worm

may have multiple characteristic segments that all carry useful information. PADS









captures the most significant one but discards the rest, which renders it less

powerful against highly sophisticated polymorphic worms.

To address the above limitations, we propose a natural generalization, called
the multi-segment position-aware distribution signature (MPADS for short), which is
a set of PADS signatures that are combined to identify a worm. It is denoted as
$\mathcal{M} = (\Theta_1, \ldots, \Theta_k)$, where $\Theta_i$, $1 \le i \le k$, is
a PADS signature. Each PADS signature may have a different width.

To calculate $\mathcal{M}$, we first use the algorithms in Section 3.4 to compute
a PADS signature, $\Theta_1$, and the significant regions for $\Theta_1$. We then
remove these significant regions from the worm samples and compute the next PADS
signature, $\Theta_2$, and the significant regions for $\Theta_2$. We further
remove these significant regions and compute $\Theta_3$, and so on, until there is
no more signature that can produce good matching scores for all worm samples. When
an incoming byte sequence is matched against $\mathcal{M}$, it is classified as a
potential worm variant only when its matching scores with all PADS signatures are
above zero. To reduce the matching overhead, the PADS signature with the most
diverse distribution can be used first, which attempts to separate worm variants
(with some false positives) from the background traffic. The rest of the PADS
signatures are then applied one after another to progressively filter out the
false positives.
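The MPADS decision rule can be sketched as a short-circuiting loop over the component signatures. The toy signatures, the uniform normal distribution, and the names `omega`, `mpads_match`, and `peaked` are assumptions made for this illustration; the caller is expected to pass the signatures in the order in which they should be applied.

```python
import math


def omega(sig, normal, seq):
    """Matching score (3-2) of seq against one PADS signature."""
    width = len(sig)
    return max(
        sum(math.log(sig[p][seq[a + p]] / normal[seq[a + p]]) for p in range(width))
        / width
        for a in range(len(seq) - width + 1)
    )


def mpads_match(signatures, normal, seq, threshold=0.0):
    """True only if seq scores above threshold for every PADS signature."""
    for sig in signatures:
        # Stop at the first signature that rejects the sequence.
        if len(seq) < len(sig) or omega(sig, normal, seq) <= threshold:
            return False
    return True


def peaked(byte, peak=0.9):
    """Toy distribution: `peak` on one byte, the rest spread evenly."""
    dist = [(1.0 - peak) / 255] * 256
    dist[byte] = peak
    return dist


NORMAL = [1.0 / 256] * 256  # assumed uniform normal traffic
MPADS = [
    [peaked(b) for b in b"worm"],   # assumed first PADS signature
    [peaked(b) for b in b"exec"],   # assumed second PADS signature
]

print(mpads_match(MPADS, NORMAL, b"GET /worm?cmd=exec"))
print(mpads_match(MPADS, NORMAL, b"GET /worm HTTP/1.0"))
```

A sequence that matches only one of the segments is rejected, which is how the later signatures filter out false positives left by the first.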

3.6 Mixture of Polymorphic Worms and Clustering Algorithm

Until now we have only discussed how to calculate a PADS/MPADS signature
from a collection of worm variants that belong to the same polymorphic worm.
In reality, multiple different polymorphic worms may rage on the Internet at
the same time, and a double-honeypot system may capture a mixed set of worm
samples that belong to different worms. We have to first partition this mixed set
into clusters, each sharing similar traffic patterns and thus likely to come from
the same worm. This is called the cluster partitioning problem. After partitioning,
a PADS/MPADS signature is calculated for each cluster. The signatures can then be
used to identify new variants of the worms. We describe two algorithms for the
cluster partitioning problem.

3.6.1 Normalized Cuts

We define a similarity metric between any two variants. A naive definition

is to first compute the byte-frequency distributions of the two variants and

then measure the difference (e.g., KL-divergence) between them. Another naive

definition is to count the length of the longest common substring or the combined

length of the k longest common substrings. A better definition is to compute

a PADS/MPADS signature from the two variants and then take the combined
matching score between the variants and the signature. Our experiments will use
this definition of similarity. Consider two worm variants, $S_i$ and $S_j$. Suppose PADS
is used. Based on (3-2), the similarity between $S_i$ and $S_j$ can be expressed as

$$
\begin{aligned}
\Omega_{ij} &= \Omega(\Theta, S_i) + \Omega(\Theta, S_j) \\
&= \max_{a_i=1}^{l_i-W+1} \frac{1}{W} \sum_{p=0}^{W-1} \log \frac{f_p(S_i, a_i+p)}{f_0(S_i, a_i+p)}
 + \max_{a_j=1}^{l_j-W+1} \frac{1}{W} \sum_{p=0}^{W-1} \log \frac{f_p(S_j, a_j+p)}{f_0(S_j, a_j+p)}
\end{aligned}
\qquad (3\text{-}4)
$$

where $\Theta$ is the PADS signature calculated from $S_i$ and $S_j$, and $W$ is the width of
the signature.

The cluster partitioning problem can be formulated in a graph-theoretic

way. We construct a complete graph with n nodes, representing the variants

$S = \{S_1, ..., S_n\}$. The edge between $S_i$ and $S_j$ is associated with a similarity value
of $\Omega_{ij}$ as defined in (3-4), with $\Omega_{ii} = 0$. Given the $n \times n$ similarity matrix $\Omega = (\Omega_{ij})$,
$i, j \in [1..n]$, we want to find clusters (e.g., cliques in the graph) that have large

similarity values for intra-cluster edges but small similarity values for inter-cluster

edges. Figure 3-9 illustrates a simple example, where a shorter edge means a larger
similarity value. This is a well-studied problem, and a spectral clustering algorithm
called normalized cuts can be used to extract the clusters [41, 42]. For the purpose
of completeness, we briefly describe the algorithm in our context.

Figure 3-9. Clusters (variants grouped into Cluster 1 and Cluster 2)

The normalized cuts algorithm first decomposes the graph G into two clusters,
A and B, that minimize the following criterion:

$$\frac{\mathrm{cut}(A, B)}{\mathrm{assoc}(A, G)} + \frac{\mathrm{cut}(A, B)}{\mathrm{assoc}(B, G)}$$

where cut(A, B) is the sum of the similarity values of all edges that have one end
in A and the other end in B, assoc(A, G) is the sum of the similarity values of all
edges that have one end in A and the other end unrestricted, and assoc(B, G) is
similarly defined.
A vector $y$ is used to define the two clusters. If the $i$th value of $y$ is 1, then $S_i$
belongs to the first cluster. If it is $-1$, then $S_i$ belongs to the second cluster. In
addition to the similarity matrix $\Omega$, we define a degree matrix $D$ as follows:

$$D_{ii} = \sum_{j} \Omega_{ij}$$

for the diagonal elements and zero for all off-diagonal elements.









The criterion can then be rewritten as

$$\frac{y^T (D - \Omega)\, y}{y^T D\, y}$$

Minimizing the above criterion is an integer programming problem if $y$ takes only
discrete values. An approximation is to treat $y$ as a real vector [41] with positive
elements for the first cluster and negative elements for the second cluster. It can be
shown that the minimizing $y$ satisfies the following generalized eigenvalue equation
for some value $\lambda$:

$$(D - \Omega)\, y = \lambda D\, y$$

Following certain transformations that we omit here, the generalized eigenvector y

corresponding to the second smallest eigenvalue is used [41]. Readers are referred to

[41] for details.

After the algorithm partitions the graph into two clusters, we can recursively

apply the algorithm to further partition each cluster until there is no significant

difference between average intra-cluster similarity and average inter-cluster

similarity.
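One bipartition step can be sketched in pure Python as follows. This is an illustrative sketch under stated assumptions, not the procedure of [41]: it assumes a symmetric, non-negative similarity matrix with positive row sums, and it finds the eigenvector for the second smallest generalized eigenvalue by power iteration on the normalized similarity matrix with the trivial eigenvector projected out. The function name `ncut_bipartition`, the fixed iteration count, and the toy matrix are hypothetical.

```python
import math

def ncut_bipartition(sim, iters=200):
    """Approximate y for (D - S) y = lambda D y, second smallest lambda."""
    n = len(sim)
    deg = [sum(row) for row in sim]              # diagonal of D
    root = [math.sqrt(d) for d in deg]
    scale = math.sqrt(sum(deg))
    z0 = [r / scale for r in root]               # trivial eigenvector (unit norm)
    z = [math.sin(i + 1.0) for i in range(n)]    # deterministic start vector
    for _ in range(iters):
        # Project out the trivial eigenvector.
        dot = sum(a * b for a, b in zip(z, z0))
        z = [a - dot * b for a, b in zip(z, z0)]
        # Multiply by (M + I) / 2, where M = D^(-1/2) S D^(-1/2).
        nxt = []
        for i in range(n):
            s = sum(sim[i][j] * z[j] / (root[i] * root[j]) for j in range(n))
            nxt.append(0.5 * (s + z[i]))
        norm = math.sqrt(sum(v * v for v in nxt))
        z = [v / norm for v in nxt]
    return [z[i] / root[i] for i in range(n)]    # y = D^(-1/2) z

# Toy similarity matrix (assumed): two groups of three variants each.
sim = [[0.0 if i == j else (5.0 if (i < 3) == (j < 3) else 0.5)
        for j in range(6)] for i in range(6)]
y = ncut_bipartition(sim)
cluster_a = [i for i, v in enumerate(y) if v > 0]
cluster_b = [i for i, v in enumerate(y) if v <= 0]
```

The sign of each element of `y` assigns the corresponding variant to one of the two clusters; applying the function again to each cluster's sub-matrix gives the recursive partitioning described above.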

3.7 Experiments

We perform experiments to demonstrate the effectiveness of the proposed

signatures in identifying polymorphic worms. The malicious payloads of the MS

Blaster worm, W32/Sasser worm, Sapphire worm, and a Peer-to-peer UDP

Distributed Denial of Service (PUD) worm are used in the experiments. The MS

Blaster worm exploits a vulnerability in Microsoft's DCOM RPC interface. Upon

successful execution, MS Blaster worm retrieves a copy of the file msblast.exe from

a previously infected host [43]. The W32/Sasser worm exploits a buffer overflow

vulnerability in the Windows Local Security Authority Service Server (LSASS) on

TCP port 445. The vulnerability allows a remote attacker to execute arbitrary code

with system privileges [44]. The Sapphire (also called Slammer) worm caused
considerable harm simply by overloading networks and taking database servers out
of operation. Many individual sites lost connectivity as their access bandwidth was
saturated by local copies of the worm [45]. The PUD worm tries to exploit the SSL
vulnerability on i386 Linux machines [46].

Figure 3-10. Variants of a polymorphic worm (variants S1 to S5; each consists of legitimate traffic payload, garbage payload, and malicious payload segments; the significant regions start at a1 to a5)

In the experiments, we artificially generate the variants of these worms based

on some polymorphism techniques discussed in Section 3.2. For normal traffic

samples, we use traces taken from the UF CISE network.

Figure 3-10 illustrates the polymorphic worm design with five variants, S1,
S2, ..., and S5. Each variant consists of three different types of regions. The
black regions are segments of the malicious payload in the worm. Substitution
is performed on 10% of the malicious payload. Garbage payloads, which are
represented as the white regions with solid lines, are inserted at different random
locations in the malicious payload. The default ratio of the malicious payload
to the garbage payload is 9:1.1 In addition to the garbage payload, each variant is
embedded in the legitimate traffic of a normal session, represented by the white
regions with dotted lines. The length of the normal traffic carried by a worm
variant is between 2KB and 20KB. In the illustration, the significant regions of these
variants start at a1, a2, ..., and a5, respectively.

1 This ratio is not shown proportionally in Figure 3-10 for better illustration.

Figure 3-11. Influence of initial configurations (x axis: avg. no. of recalculations per variant; curves: Gibbs runs 1 to 3 and EM runs 1 to 3; maximum = 7.5402)

3.7.1 Convergence of Signature Generation Algorithms

In the first experiment, 100 variants of MS Blaster worm are generated and

they are used as worm samples for signature generation. The EM algorithm

and the Gibbs Sampling algorithm each run three times with different initial

configurations. Specifically, the initial starting points of significant regions

are randomly selected each time. Figure 3-11 shows the quality of the PADS

signature obtained by EM or Gibbs after a certain number of iterative cycles.

According to Section 3.4, the execution of either algorithm consists of iteration

cycles (Expectation/Maximization steps for EM and Update/Sampling steps for

Gibbs). During each iterative cycle, EM recalculates the significant regions of all

variants, while Gibbs only modifies the significant region of one randomly selected

variant. To make a fair comparison, we let the x axis be the average number of

recalculations performed on the significant region for each variant. The y axis is

the average matching score of the 100 variants with the signature obtained so far.

The matching score $\Omega$ is defined in (3-2). From the figure, the best matching score
is around 7.5, which is likely to be the global maximum. EM tends to settle at
a local maximum, depending on the initial configuration. Gibbs is likely to find the
global maximum, but it does not stabilize even after reaching it,
because of the randomness in its selection of starting points of significant
regions.2

Figure 3-12. Variants clustering using normalized cuts (x axis: variant id; legend: variants 1-50 and variants 51-100)

3.7.2 Effectiveness of Normalized Cuts Algorithm

The purpose of the second experiment is to evaluate the effectiveness of the

normalized cuts algorithm in solving the cluster partitioning problem. In this

experiment, 50 variants of MS Blaster worm with ids [1..50] and 50 variants of

W32/Sasser worm with ids [51..100] are generated. The normalized cuts algorithm

is used to separate the mixed 100 variants into clusters. The similarity matrix, as

defined in Section 3.6.1 and particularly (3-4), is calculated by using the Gibbs

sampling algorithm. The result is shown in the left-hand plot of Figure 3-12, where

the horizontal axes are variant ids, representing the rows $i$ and the columns $j$ of the
matrix, and the vertical axis is the similarity value between variants $i$ and $j$. The
surface of the plot can be roughly partitioned into three regions. The first region
($i, j \in [1..50]$) shows the similarity values among the set of MS Blaster worm
variants. The second region ($i, j \in [51..100]$) shows the similarity values among
the set of W32/Sasser worm variants. The remaining region shows the similarity values
between MS Blaster variants and W32/Sasser variants. By using the normalized
cuts algorithm, the 100 worm variants are separated into two clusters, one for MS
Blaster and one for W32/Sasser. The resulting $y$ vector is shown in the right-hand
plot of Figure 3-12, where each point represents one element in $y$. The variants
whose values in $y$ are below zero belong to one cluster. The variants whose values
in $y$ are above zero belong to the other cluster.

2 The termination condition was not used in this experiment to show the
dynamics during the Gibbs iterations.

Figure 3-13. Matching score influence of different signature widths and sample
variants lengths (left-hand x axis: signature width W; right-hand x axis: average length of worm variants, x 10^4; curves: Gibbs-1 to Gibbs-4 and EM-1 to EM-4)

3.7.3 Impact of Signature Width and Worm Length

In the next set of experiments, we generate 2000 variants of MS Blaster worm,

Sasser worm, Sapphire worm, and PUD worm each. We use 100 samples from

each of the worms for signature generation. The remaining 2000 variants are mixed with
normal-traffic byte sequences to test the quality of the signature for each of the
four worms.

Figure 3-14. Influence of different lengths of the sample variants
Figure 3-15. Influence of different lengths of the sample variants
Figure 3-16. Influence of different lengths of the sample variants
Figure 3-17. Influence of different lengths of the sample variants
(Each figure has two panels with x axes "signature width W" and "average length of worm variants"; legend: Normal Traffic (Gibbs), Normal Traffic (EM), Worm Traffic (Gibbs), Worm Traffic (EM), Threshold.)

Figure 3-13 shows the average matching score with respect to the signature

width and the average length of the worm variants. Because the worm code has a

fixed length, we change the length of a variant by letting it carry a variable amount

of normal traffic. The two figures show the average matching scores of sample

variants after EM and Gibbs sampling algorithms converge to a final signature.

Figure 3-13 also indicates that increasing the signature width will decrease

the average matching score of worm variants. The reason is that a longer signature

means a larger significant region, which increases the chance for the significant

region to include garbage payload, which in turn decreases the matching score.

Figure 3-13 shows that increasing the length of the normal traffic carried by a

worm variant, which has been widely used by some polymorphic worms to elude

the anomaly-based systems, provides no help to avoid detection by our system. The

reason is that our system identifies a significant region and only uses the significant

region for signature generation. The carried normal traffic, no matter how much it

is, will not be used for signature generation.

Figures 3-14 through 3-17 show the average matching scores of the testing worm/normal
traffic sequences. The scores for worm traffic are always above zero and the scores
for normal traffic are always below zero. Therefore, with a threshold of 0, worm
variants are distinctly separated from normal traffic. In our experiments, the
generated PADS signature was always able to identify new variants of the worm
without false positives. The false positive rate and false negative rate of our
algorithm will be discussed in the next subsection.

3.7.4 False Positives and False Negatives

Figure 3-18 shows the false positive rate and false negative rate of our
algorithm for each of the four worms. We only show the influence of signature
width because the sample length has little influence on the matching scores. For
all four worm examples, neither the false positive rate nor the false negative rate
exceeds 0.5%. As we can see from the figure, the Gibbs sampling algorithm performs
better than the EM algorithm for all four worms. As the signature width increases, the false
positive rate decreases gradually while the false negative rate increases gradually.

Figure 3-18. False positives and false negatives (x axis: signature width W; y axis scaled by 10^-3; curves: false-positive rate and false-negative rate, each for Gibbs and EM)

Figure 3-19. The performance of signature-based system using the longest common
substrings method. (x axes: number of sample variants; left-hand plot: longest common substring method; right-hand plot: false positive ratio and false negative ratio)

3.7.5 Comparing PADS with Existing Methods

For the purpose of comparison, we also perform experiments with some

existing methods. Figure 3-19 shows the experimental results based on the

longest common substring method [30], which first identifies the longest common

substring among the sample worm variants and then uses the substring as a

signature to match against the test variants. Based on the left-hand plot, as the
number of sample variants increases, the length of the longest common substring
decreases. A shorter signature increases the chance for it to appear in normal
traffic. Consequently, the false negative ratio decreases, but the false positive
ratio increases dramatically (the right-hand plot). In contrast, without the
requirement of exact matching, a PADS signature is able to retain many more
(particularly statistical) characteristics of a polymorphic worm.
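The shrinking of the common substring as samples are added can be illustrated with a brute-force sketch. It is not the method of [30]; the function name `longest_common_substring` and the sample strings are made up for illustration, and the cubic-time search is only suitable for toy inputs.

```python
def longest_common_substring(samples):
    """Longest byte string that occurs in every sample (brute force)."""
    ref = min(samples, key=len)              # a common substring fits in the shortest sample
    for length in range(len(ref), 0, -1):    # try the longest candidates first
        for start in range(len(ref) - length + 1):
            cand = ref[start:start + length]
            if all(cand in s for s in samples):
                return cand
    return b""

# Hypothetical variants: the third one splits the shared region with inserted bytes.
variants = [b"xxHELLOWORLDyy", b"zzHELLOWORLDqq", b"HELLOaaWORLD"]
sig_two = longest_common_substring(variants[:2])
sig_three = longest_common_substring(variants)
```

With two variants the common substring is the full 10-byte region; one more variant with inserted garbage cuts it to 5 bytes, which is more likely to occur by chance in normal traffic.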


















Figure 3-20. Byte frequency distributions of normal traffic (left-hand plot) and
worm traffic (right-hand plot) (x axis: byte value)

Figure 3-21. Byte frequency distributions of worm variants. Left-hand plot:
malicious and normal payloads carried by a worm variant have equal
length. Right-hand plot: normal payload carried by a worm variant is
9 times of malicious payload. (x axis: byte value)









Now consider the position-unaware byte frequency distributions that are

used in some current systems. The left-hand plot of Figure 3-20 shows the

position-unaware byte frequency distribution of 100 normal traffic sequences (from

100 normal sessions) and the right-hand plot shows the byte frequency distribution
of the MS Blaster payload. These two distributions are very different, which seems to
provide a way to detect the worm. However, if we create a worm variant by
embedding the worm payload in normal traffic, the combined byte frequency
distribution can be made very similar to that of normal traffic. Figure 3-21
shows the byte frequency distributions of two worm variants whose normal traffic
payloads are 1 and 9 times the length of the malicious payload, respectively. The right-hand plot

is very similar to the left-hand plot of Figure 3-20. Therefore, using byte frequency

distributions alone cannot handle worm variants. The proposed position-aware

distribution signature works better against polymorphic worms.
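The dilution effect can be reproduced with a small sketch. The byte strings below are stand-ins, not real traffic or worm code, and the L1 distance is used only as a convenient assumed measure of how far apart two position-unaware distributions are; `byte_freq` and `l1_distance` are hypothetical helper names.

```python
from collections import Counter

def byte_freq(data):
    """Position-unaware byte frequency distribution over the 256 byte values."""
    counts = Counter(data)
    return [counts.get(b, 0) / len(data) for b in range(256)]

def l1_distance(p, q):
    """Sum of absolute differences between two distributions."""
    return sum(abs(a - b) for a, b in zip(p, q))

# Stand-in data (assumed): printable ASCII as "normal", binary bytes as "worm".
normal = bytes(range(32, 127)) * 30
worm = b"\x90" * 200 + b"\xff\xe4" * 50          # 300 bytes

ref = byte_freq(normal)
d_alone = l1_distance(byte_freq(worm), ref)                   # payload by itself
d_1to1 = l1_distance(byte_freq(worm + normal[:300]), ref)     # equal lengths
d_1to9 = l1_distance(byte_freq(worm + normal[:2700]), ref)    # 9 times more normal traffic
```

The distance to the normal distribution drops as more normal traffic is carried, so a detector that looks only at the overall byte frequencies sees the 1:9 variant as close to normal.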















CHAPTER 4
MULTIPLE PADS MODEL AND CLASSIFICATION OF POLYMORPHIC
WORM FAMILIES: AN OPTIMIZATION

4.1 Introduction

As described in the previous chapters, the iterative methods are
time-consuming. Because PADS signatures can only be obtained one at a time
in the previous method, it takes a long time before every PADS signature has
been extracted. Secondly, because PADS signatures are extracted sequentially, the
quality of the PADS signatures varies. Moreover, since iterative methods are used,
different initializations can result in totally different PADS signature sets, which affects
the clustering of the polymorphic worm family. To address these problems, a new
method is needed to further optimize the previously described approach.

This chapter describes a new strategy intended to help identify and detect
polymorphic Internet worms. The approach tries to improve the
discovery of significant regions for more complicated polymorphism. Compared
with the previous methods, the proposed approach has several
advantages. First, the significant regions in this approach contain multiple
blocks with no strict sequence. This flexibility allows the detection of more subtle
polymorphism such as code transposition. Secondly, significant regions that
also appear in normal traffic backgrounds can be effectively removed. Even if the
attacker purposely inserts common normal regions into the polymorphic worms,
the significant region that actually performs the malicious operations can still be
detected. Finally, different types of polymorphic worms are classified simultaneously
in order to detect the significant regions more accurately and decrease the false
rate.









The generation of the PADS block is a "missing data" problem because neither
the malicious regions in each variant of the worm nor the signature itself is
known. If the malicious regions are known, the PADS block can be calculated by
counting the number of each byte value appearing at different positions. On the
other hand, if the PADS block is known, the malicious region in each variant of
the worm can be obtained by scanning through the whole variant and finding the
region that best matches the PADS block. The "missing data" problem can be
solved using iterative methods such as the Expectation-Maximization (EM) or Gibbs
sampling algorithms, which have been described in [47].

The model of the single PADS block in [47] suffers from several limitations.
First, a single PADS block cannot deal with over-separated malicious regions
because one PADS block is unlikely to cover all malicious regions.
Secondly, the single PADS block model is unable to exclude the influence of
background noise. The approach assumes that normal traffic does not
contain the same PADS block, which is not necessarily true and can be exploited by
a clever attacker. Finally, the model of a single PADS block assumes that each
collected sample of the worm belongs to the same polymorphic worm family. There
is no mechanism to classify different polymorphic worm families and exclude the
influence of "outliers", which greatly decreases the performance of the
algorithm. In addition to these limitations, the method of extracting PADS blocks
assumes that each sample variant contains exactly one PADS block of the same
type. The sample variants not containing the PADS block will over-contribute
to the characterization of the PADS block, and the sample variants containing
repeating PADS blocks will under-contribute.

This paper tries to solve the problem by proposing a multiple PADS blocks
model. In this model, a set of PADS blocks is identified for each polymorphic worm
family, which is identified by a classification method similar to the extraction
of PADS blocks. The signature combines those PADS blocks, and every
PADS block within the set is taken into consideration for worm detection. In
order to eliminate the influence of background noise, the common regions within
the normal traffic payload are first identified and excluded from the sample
worm variant set. Furthermore, the method of extracting PADS blocks in this
paper is able to identify multiple PADS blocks from a mixture of sample variants
that belong to different polymorphic families, even if some PADS blocks do not
appear in all of the sample variants and some sample variants contain repeating
PADS blocks. To accomplish our goal, we further define a new metric to describe
the quality of the matching between a set of PADS blocks and a byte sequence. It
can be considered an optimization of the previously described approach.

In the following sections, the details of extracting PADS blocks, the model of

multiple PADS blocks, the signature definition of the multiple PADS model, and

the classification of polymorphic worm families, will be presented, step by step.

4.2 Extraction of Multiple PADS Blocks from the Mixture of
Polymorphic Worms

4.2.1 PADS Blocks and The Dataset from Byte Sequences

In this subsection, we briefly introduce Position-Aware Distribution Signature

(PADS) [47] blocks, which are worm signatures defined in a special format to

identify the malicious regions that appear in all or most of the variants for the

same polymorphic worm.

A PADS block differs greatly from a traditional string signature in that
multinomial byte-frequency distributions replace byte values at each position of
the PADS block. For a PADS block of width $W$ in terms of the number of bytes,
$(f_1, ..., f_W)$ is used to characterize the byte-frequency distributions inside the region
of a malicious block, with $f_k = [f_{k0}, ..., f_{kb}, ...]^T$, where $k = 1...W$ specifies the position
and $b \in [0...255]$ ranges over the set of all possible byte values. $f_{kb}$ satisfies:

$$\sum_{b=0}^{255} f_{kb} = 1$$

Table 4-1 is an example of a PADS block with width $W = 10$.

b      f1     f2     ...  f9     f10
0x00   0.001  0.001  ...  0.500  0.100
0x01   0.001  0.001  ...  0.200  0.500
0x02   0.001  0.001  ...  0.001  0.100
...
0xfe   0.001  0.001  ...  0.001  0.001
0xff   0.700  0.700  ...  0.001  0.001
Table 4-1. An example of a PADS block with width W = 10



Similar to the definition of PADS blocks, the multinomial byte-frequency
distribution outside the malicious regions in a byte sequence can be defined as
$f_0 = [f_{00}, ..., f_{0b}, ...]^T$ with respect to all possible byte values $b \in [0...255]$. We use $F$
to represent $(f_1, ..., f_W)$ and $f_0$ respectively in these two cases.

Our purpose is to find the PADS blocks within a mixture of polymorphic
worms. In this paper, each byte sequence $S_j$ is broken up into overlapping
segments of length $W$. If $a_j$ is used to represent the starting position of a $W$-byte
segment within a sequence $S_j$, then $a_j$ can be any value between 1 and $l_j - W + 1$,
where $l_j$ is the total length of the sequence $S_j$. By extracting all possible $W$-byte
segments from the byte sequence set $S = \{S_1, S_2, ...\}$, a new dataset that contains
all possible $W$-byte segments is obtained. Suppose $N$ is the total number of
sequences in the dataset, $n_j$ is the total number of $W$-byte segments within a
sequence $S_j$, and the total number within the sequence set is $n$. Then we have

$$n = \sum_{j=1}^{N} n_j = \sum_{j=1}^{N} (l_j - W + 1)$$
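The construction of the segment dataset can be sketched as follows. The helper name `segments` and the sample sequences are hypothetical, and positions are 0-based here while the text counts from 1.

```python
def segments(seq, width):
    """All overlapping width-byte segments of seq, one per starting position."""
    return [seq[a:a + width] for a in range(len(seq) - width + 1)]

# Made-up byte sequences standing in for captured samples.
sequences = [b"ABCDEFG", b"0123456789", b"xyz"]
W = 3

dataset = [seg for s in sequences for seg in segments(s, W)]
n = len(dataset)
expected_n = sum(len(s) - W + 1 for s in sequences)   # sum of (l_j - W + 1)
```

For these three sequences the per-sequence counts are 5, 8, and 1, so the dataset holds 14 segments.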

In this paper, the total set of the $W$-byte segments forms the observed dataset.
To be consistent with the PADS blocks and to simplify the notation, the data of
a $W$-byte segment is represented as byte-frequency distributions as well. Let
$G = (g_1, ..., g_W)$ be the data of a $W$-byte segment, with $g_k = [g_{k0}, ..., g_{kb}, ...]^T$
being the multinomial byte-frequency distribution at position $k$. If the byte value
$b$ appears at position $k$ of the segment, then $g_{kb} = 1$ and the remaining probabilities
$\{g_{k0}, ..., g_{k(b-1)}, g_{k(b+1)}, ...\}$ are all 0. Table 4-2 is an example of the data for a
$W$-byte segment.

b      g1     g2     ...  g9     g10
0x00   0.000  0.000  ...  1.000  0.000
0x01   0.000  0.000  ...  0.000  1.000
0x02   0.000  0.000  ...  0.000  0.000
...
0xfe   0.000  0.000  ...  0.000  0.000
0xff   1.000  1.000  ...  0.000  0.000
Table 4-2. An example of a segment with width W = 10
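A segment's representation as degenerate byte-frequency distributions can be sketched as below. The function name `one_hot_segment` and the example bytes are hypothetical.

```python
def one_hot_segment(segment):
    """Return G = (g_1, ..., g_W): one 256-entry column per position,
    with 1.0 at the observed byte value and 0.0 elsewhere."""
    columns = []
    for byte in segment:
        column = [0.0] * 256
        column[byte] = 1.0
        columns.append(column)
    return columns

# Example segment (assumed): two 0xff bytes followed by 0x00 and 0x01.
G = one_hot_segment(b"\xff\xff\x00\x01")
```

Each column sums to 1, so a segment is formally a PADS block whose distributions put all mass on a single byte value.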



4.2.2 Expectation-Maximization (EM) Algorithm

We first review the Expectation-Maximization (EM) algorithm before applying it
to our problem. Let $\mathcal{Y} = \{y^{(1)}, ..., y^{(n)}\}$ be the observed total dataset.
Suppose each data item has a mixture density from $M$ groups, with each group determined
by an unknown parameter; then

$$p(y \mid \theta, \pi) = \sum_{m=1}^{M} \pi_m \, p(y \mid \theta_m) \qquad (4\text{-}1)$$

where $\pi = \{\pi_1, ..., \pi_M\}$ are mixing probabilities that correspond to the
unknown parameters $\theta = \{\theta_1, ..., \theta_M\}$ and satisfy

$$\sum_{m=1}^{M} \pi_m = 1, \quad \pi_m \ge 0$$

It is known that the MLE of the parameter value

$$\theta_{ML} = \arg\max_{\theta} \log p(\mathcal{Y} \mid \theta, \pi) \qquad (4\text{-}2)$$

cannot be found analytically. The EM algorithm makes use of the concept
of "missing data". In our model, the missing data is a set of $n$ labels
$\mathcal{Z} = \{z^{(1)}, ..., z^{(n)}\}$ associated with the $n$ observed data items in the dataset, indicating
which group each observed data item belongs to. Each label is a binary vector
$z^{(i)} = [z_1^{(i)}, ..., z_M^{(i)}]$. If an observed data item $y^{(i)}$ belongs to the $m$-th group, then
$z_m^{(i)} = 1$ and the data item $y^{(i)}$ has the probability $p(y^{(i)} \mid \theta_m)$. On the other hand, if the
observed data item does not belong to the $m$-th group, then $z_m^{(i)} = 0$.

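Based on the definition of $\mathcal{Z}$, it is straightforward that the prior probability
for $z_m^{(i)} = 1$ is $\pi_m$:

$$p(z_m^{(i)} = 1 \mid \theta, \pi) = \pi_m \qquad (4\text{-}3)$$

The conditional densities of the data $y^{(i)}$ and the missing data $z^{(i)}$ can be written
as

$$p(y^{(i)} \mid z^{(i)}, \theta, \pi) = \prod_{m=1}^{M} p(y^{(i)} \mid \theta_m)^{z_m^{(i)}} \qquad (4\text{-}4)$$

$$p(z^{(i)} \mid \theta, \pi) = \prod_{m=1}^{M} \pi_m^{z_m^{(i)}} \qquad (4\text{-}5)$$

The joint density of the observed data $y^{(i)}$ and the missing data $z^{(i)}$ can be
obtained from Eq. 4-4 and 4-5:

$$p(y^{(i)}, z^{(i)} \mid \theta, \pi) = p(z^{(i)} \mid \theta, \pi)\, p(y^{(i)} \mid z^{(i)}, \theta, \pi) = \prod_{m=1}^{M} \left[\pi_m \, p(y^{(i)} \mid \theta_m)\right]^{z_m^{(i)}} \qquad (4\text{-}6)$$

Assuming the data items are independent, from Eq. 4-6 we have:

$$p(\mathcal{Y}, \mathcal{Z} \mid \theta, \pi) = \prod_{i=1}^{n} \prod_{m=1}^{M} \left[\pi_m \, p(y^{(i)} \mid \theta_m)\right]^{z_m^{(i)}} \qquad (4\text{-}7)$$

The complete log-likelihood is

$$\log p(\mathcal{Y}, \mathcal{Z} \mid \theta, \pi) = \sum_{i=1}^{n} \sum_{m=1}^{M} z_m^{(i)} \log\left[\pi_m \, p(y^{(i)} \mid \theta_m)\right] \qquad (4\text{-}8)$$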








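In the EM algorithm, we iteratively maximize the expected complete log-likelihood over
the conditional distribution of the missing data $\mathcal{Z}$, based on the observed data $\mathcal{Y}$
and the current estimate of the parameters $\{\theta_1, ..., \theta_M, \pi_1, ..., \pi_M\}$.

Initialization: In this step, the initial unknown parameters $\theta(0)$, $\pi(0)$ are
assigned randomly.

Expectation: In this step, the expected value $\hat{z}_m^{(i)}(t)$ can be calculated
using Bayes' rule and Eq. 4-3 and 4-4, where $t$ represents the $t$-th iteration:

$$
\begin{aligned}
\hat{z}_m^{(i)}(t) &= E\left[z_m^{(i)} \mid \mathcal{Y}, \theta(t), \pi(t)\right] \\
&= P(z_m^{(i)} = 1 \mid y^{(i)}, \theta(t), \pi(t)) \cdot 1 + P(z_m^{(i)} = 0 \mid y^{(i)}, \theta(t), \pi(t)) \cdot 0 \\
&= \frac{p(y^{(i)} \mid z_m^{(i)} = 1, \theta(t), \pi(t))\, P(z_m^{(i)} = 1 \mid \theta(t), \pi(t))}{p(y^{(i)} \mid \theta(t), \pi(t))} \\
&= \frac{\pi_m(t)\, p(y^{(i)} \mid \theta_m(t))}{\sum_{m'=1}^{M} \pi_{m'}(t)\, p(y^{(i)} \mid \theta_{m'}(t))}
\end{aligned}
\qquad (4\text{-}9)
$$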

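Maximization: The unknown parameter estimates can be updated in this
step. Since we have

$$
\begin{aligned}
E\left[\log p(\mathcal{Y}, \mathcal{Z} \mid \theta, \pi) \mid \mathcal{Y}, \theta(t), \pi(t)\right]
&= E\left[\sum_{i=1}^{n} \sum_{m=1}^{M} z_m^{(i)} \log\left[\pi_m \, p(y^{(i)} \mid \theta_m)\right] \,\Big|\, \mathcal{Y}, \theta(t), \pi(t)\right] \\
&= \sum_{i=1}^{n} \sum_{m=1}^{M} E\left[z_m^{(i)} \mid \mathcal{Y}, \theta(t), \pi(t)\right] \log\left[\pi_m \, p(y^{(i)} \mid \theta_m)\right] \\
&= \sum_{i=1}^{n} \sum_{m=1}^{M} \hat{z}_m^{(i)}(t) \log\left[\pi_m \, p(y^{(i)} \mid \theta_m)\right] \\
&= \sum_{i=1}^{n} \sum_{m=1}^{M} \hat{z}_m^{(i)}(t) \log p(y^{(i)} \mid \theta_m) + \sum_{i=1}^{n} \sum_{m=1}^{M} \hat{z}_m^{(i)}(t) \log \pi_m
\end{aligned}
\qquad (4\text{-}10)
$$

To obtain $\hat{\theta}_m(t+1)$, we have

$$
\begin{aligned}
\hat{\theta}_m(t+1) &= \arg\max_{\theta_m} E\left[\log p(\mathcal{Y}, \mathcal{Z} \mid \theta, \pi) \mid \mathcal{Y}, \theta(t), \pi(t)\right] \\
&= \arg\max_{\theta_m} \sum_{i=1}^{n} \hat{z}_m^{(i)}(t) \log p(y^{(i)} \mid \theta_m)
\end{aligned}
\qquad (4\text{-}11)
$$

To obtain $\hat{\pi}_m(t+1)$, we have:

$$
\begin{aligned}
\hat{\pi}_m(t+1) &= \arg\max_{\pi_m} E\left[\log p(\mathcal{Y}, \mathcal{Z} \mid \theta, \pi) \mid \mathcal{Y}, \theta(t), \pi(t)\right] \\
&= \arg\max_{\pi_m} \sum_{i=1}^{n} \sum_{m=1}^{M} \hat{z}_m^{(i)}(t) \log \pi_m
\end{aligned}
\qquad (4\text{-}12)
$$
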

4.2.3 Extraction of Multiple PADS blocks

We apply the EM algorithm to extract the multiple PADS blocks from the
mixture of polymorphic worms. In our case, the observed data $y$ is the $W$-width
byte-frequency distributions $G = (g_1, ..., g_W)$. The unknown parameter $\theta_m$ is the
$W$-width byte-frequency distributions $F = (f_1, ..., f_W)$ for $1 \le m \le M - 1$ and the
background byte-frequency distribution $F = f_0$ for $m = M$. Therefore, in the
previous subsection, $p(y \mid \theta_m)$ can be calculated by:

$$
p(y \mid \theta_m) =
\begin{cases}
\prod_{k=1}^{W} \prod_{b=0}^{255} (f_{kb})^{g_{kb}}, & \text{if } m < M,\ \theta_m = (f_1, ..., f_W)_m \\
\prod_{k=1}^{W} \prod_{b=0}^{255} (f_{0b})^{g_{kb}}, & \text{if } m = M,\ \theta_m = f_0
\end{cases}
\qquad (4\text{-}13)
$$

Applying the above to Eq. 4-11 and 4-12, we have:

$$\hat{f}_{kb}(t+1) = \frac{\sum_{i=1}^{n} \hat{z}_m^{(i)}(t)\, g_{kb}^{(i)}}{\sum_{b'=0}^{255} \sum_{i=1}^{n} \hat{z}_m^{(i)}(t)\, g_{kb'}^{(i)}} \qquad (4\text{-}14)$$

and

$$\hat{\pi}_m(t+1) = \frac{1}{n} \sum_{i=1}^{n} \hat{z}_m^{(i)}(t) \qquad (4\text{-}15)$$

One problem with Eq. 4-14 is that $\hat{f}_{kb}(t+1)$ will be zero for those byte
values $b$ that never appear at position $k$ of any segment. Once an estimate is zero, the
EM updates can never change it in later iterations. In other words, $\hat{f}_{kb}(t+1)$,
which is the maximum likelihood estimate of the parameters of a multinomial random
variable, is subject to boundary problems. For better
flexibility, we add a "pseudo-count" to the observed byte count, and the byte
frequency estimate becomes:

$$\hat{f}_{kb}(t+1) = \frac{\sum_{i=1}^{n} \hat{z}_m^{(i)}(t)\, g_{kb}^{(i)} + \beta_b}{\sum_{b'=0}^{255} \sum_{i=1}^{n} \hat{z}_m^{(i)}(t)\, g_{kb'}^{(i)} + \sum_{b'=0}^{255} \beta_{b'}} \qquad (4\text{-}16)$$

The equation above assumes that the prior distribution of $f_{kb}$ is a Dirichlet
distribution with parameters $\beta_1, \beta_2, ...$ [48]. In this paper, a constant $d$ is used to
replace $\beta_1, \beta_2, ...$ for simplicity. Therefore, we actually have:

$$\hat{f}_{kb}(t+1) = \frac{\sum_{i=1}^{n} \hat{z}_m^{(i)}(t)\, g_{kb}^{(i)} + d}{\sum_{b'=0}^{255} \sum_{i=1}^{n} \hat{z}_m^{(i)}(t)\, g_{kb'}^{(i)} + 256\, d} \qquad (4\text{-}17)$$
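The E-step of (4-9) and the pseudo-count updates can be put together in a compact sketch. It is an illustration under several assumptions rather than the author's implementation: segments are given as raw bytes (so the one-hot sums reduce to indexed counts), the background distribution is updated by pooling counts over all W positions (an update the text does not spell out), the same constant `d` is added for each of the 256 byte values, and initialization is random with a fixed seed. The name `em_pads_blocks`, its parameters, and the toy data are hypothetical.

```python
import math
import random

def em_pads_blocks(segs, n_blocks, d=0.01, iters=20, seed=0):
    """EM for a mixture of n_blocks PADS blocks plus one background group."""
    rng = random.Random(seed)
    width, groups = len(segs[0]), n_blocks + 1

    def rand_dist():
        raw = [rng.random() + 0.5 for _ in range(256)]
        total = sum(raw)
        return [v / total for v in raw]

    blocks = [[rand_dist() for _ in range(width)] for _ in range(n_blocks)]
    background = rand_dist()
    pi = [1.0 / groups] * groups
    for _ in range(iters):
        # E-step (4-9), computed in log space for numerical stability.
        resp = []
        for seg in segs:
            logs = [math.log(max(pi[m], 1e-12))
                    + sum(math.log(blocks[m][k][b]) for k, b in enumerate(seg))
                    for m in range(n_blocks)]
            logs.append(math.log(max(pi[-1], 1e-12))
                        + sum(math.log(background[b]) for b in seg))
            top = max(logs)
            weights = [math.exp(v - top) for v in logs]
            total = sum(weights)
            resp.append([w / total for w in weights])
        # M-step: byte frequencies with pseudo-count d for every byte value.
        for m in range(n_blocks):
            mass = sum(r[m] for r in resp)
            for k in range(width):
                counts = [d] * 256
                for seg, r in zip(segs, resp):
                    counts[seg[k]] += r[m]
                blocks[m][k] = [c / (mass + 256 * d) for c in counts]
        mass = sum(r[-1] for r in resp)
        counts = [d] * 256
        for seg, r in zip(segs, resp):
            for b in seg:                    # background pools all positions
                counts[b] += r[-1]
        background = [c / (width * mass + 256 * d) for c in counts]
        pi = [sum(r[m] for r in resp) / len(segs) for m in range(groups)]   # (4-15)
    return blocks, background, pi

# Toy data (assumed): lowercase noise with the 4-byte block b"EVIL" embedded.
_rng = random.Random(1)
seqs = []
for _ in range(6):
    noise = bytes(_rng.randrange(97, 123) for _ in range(16))
    pos = _rng.randrange(0, 13)
    seqs.append(noise[:pos] + b"EVIL" + noise[pos:])
segs = [s[a:a + 4] for s in seqs for a in range(len(s) - 4 + 1)]
blocks, background, pi = em_pads_blocks(segs, n_blocks=1)
```

Because of the pseudo-count, every estimated frequency stays strictly positive, so no byte value is permanently excluded from a position. Which local optimum the run reaches depends on the random initialization.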

4.3 Classification of Polymorphic Worms and Signature Generation

4.3.1 Multiple PADS Blocks Model

In this subsection, the multiple PADS blocks model is presented, together

with the cretia whether or not a sequence is considered as malicious. In order to

take into consideration every PADS blocks within the set of a polymorphic worm

family, we treat the byte sequence as a feature vector space with each feature

as the similarity against each PADS block. In our model, we use the conditional

log likelihood of a sequence for each PADS blocks to represent each feature. The









sequence under the feature space is defined as:

Sh() \ logp(Y F(1), H())

h(2) logp(Y F(2), I(2))
h-


h(d) \ logp(Y F(d), (d))

where F^(1), F^(2), ..., F^(d) are the PADS block signatures (corresponding to θ in the extraction step), Π^(1), Π^(2), ..., Π^(d) are the mixing probabilities (corresponding to π in the extraction step) with respect to F^(1), F^(2), ..., F^(d), and Y is the dataset {y^(1), y^(2), ..., y^(l-W+1)} within a sequence S_i (see footnote 1).

To fit the multiple PADS model, the signature H for multiple PADS blocks {F^(1), F^(2), ..., F^(d)} is defined as a d-dimensional vector as well:

\[ H = \begin{pmatrix} H^{(1)} \\ H^{(2)} \\ \vdots \\ H^{(d)} \end{pmatrix} \]

where H^(1), H^(2), ..., H^(d) specify the expected value of the conditional log likelihood for each PADS block.

Footnote 1: Y is different from the previously defined y in that y is the observed dataset from all byte sequences.

Once the set of PADS blocks and the signature H for the polymorphic worm family are specified, each sample variant is scored by how close it is to the signature H. In our model, the Mahalanobis distance is used to measure the similarity between the feature vector h of any sample variant and the signature H. The Mahalanobis distance is a standard metric for comparing two vectors. It is a weighted Euclidean distance defined as:


\[ d^2(h, H) = (h - H)^{T} \Sigma^{-1} (h - H) \qquad (4\text{-}18) \]

The matrix Σ^{-1} is the inverse covariance matrix. The matrix can be precalculated in our EM algorithm later on.

In other words, the signature is treated as a point in a feature vector space, with each feature being the matching score against a particular PADS block. Therefore, every PADS block within the set of a polymorphic worm family is taken into consideration for worm detection.



The advantage of the Mahalanobis distance is that it weights each element of the vector by its variance and accounts for the covariance among the variables measured. The computed value measures how consistent the matching scores of the new sample variant are with the training data set.
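
To see why the distance behaves as a weighted Euclidean distance, consider a special case introduced here only for illustration: the d features are uncorrelated, so the covariance matrix is diagonal with variances σ_1^2, ..., σ_d^2 (these symbols are assumptions of this example). The distance then reduces to a sum of squared deviations, each divided by the variance of its own feature:

```latex
\[
\Sigma = \operatorname{diag}(\sigma_1^2, \ldots, \sigma_d^2)
\quad\Longrightarrow\quad
d^2(h, H) = (h - H)^{T} \Sigma^{-1} (h - H)
          = \sum_{j=1}^{d} \frac{\bigl(h^{(j)} - H^{(j)}\bigr)^2}{\sigma_j^2}
\]
```

A deviation in a feature that varies widely across the training data therefore contributes less to the distance than the same deviation in a feature that is tightly concentrated. When every variance equals one, the expression is the ordinary squared Euclidean distance.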

Based on the Mahalanobis distance, we define a score D of any sample variant against a polymorphic worm family. The score D has a different meaning from the matching score of a sample variant against a single PADS block, because a match against one PADS block alone does not necessarily mean that the sample variant will be identified as a polymorphic worm. The score D is calculated as follows:

1. Calculate the matching scores of the sample variant against each PADS block within the set, and arrange the matching scores into a feature vector h.

2. Find the Mahalanobis distance between h and H: d^2 = (h - H)^T Σ^{-1} (h - H).

3. The score D is defined as the similarity D = e^{-d^2}. This score is 1 for an exact match and decreases otherwise.
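
The three steps above can be sketched in Python as follows. This is a minimal sketch under assumptions, not the procedure used in this work: here H and the covariance matrix are estimated as the sample mean and sample covariance of a few made-up training feature vectors, whereas the text obtains them from the EM algorithm. The score is taken as D = exp(-d^2), which equals 1 for an exact match. All function names and numbers are hypothetical.

```python
import numpy as np

def fit_signature(train_h):
    """Estimate the signature H and the inverse covariance matrix.

    train_h : shape (n_samples, d), one feature vector h per training variant.
    Assumption: sample mean and sample covariance stand in for the values
    that the text precalculates in its EM algorithm.
    """
    H = train_h.mean(axis=0)
    cov = np.atleast_2d(np.cov(train_h, rowvar=False))
    cov_inv = np.linalg.pinv(cov)   # pseudo-inverse tolerates a singular covariance
    return H, cov_inv

def mahalanobis_squared(h, H, cov_inv):
    """Step 2: d^2 = (h - H)^T Sigma^{-1} (h - H)."""
    diff = h - H
    return float(diff @ cov_inv @ diff)

def score(h, H, cov_inv):
    """Step 3: D = exp(-d^2); 1 for an exact match, smaller otherwise."""
    return float(np.exp(-mahalanobis_squared(h, H, cov_inv)))

# Step 1 is assumed done: each row holds the log-likelihood matching scores
# of one training variant against d = 2 PADS blocks (made-up numbers).
train_h = np.array([[-10.0, -20.0],
                    [-12.0, -18.0],
                    [-11.0, -22.0],
                    [ -9.0, -19.0]])
H, cov_inv = fit_signature(train_h)

score_exact = score(H, H, cov_inv)                        # h equal to H
score_near = score(H + np.array([0.1, 0.1]), H, cov_inv)  # small deviation
score_far = score(H + np.array([5.0, 5.0]), H, cov_inv)   # large deviation
print(score_exact, score_near, score_far)
```

A sample whose feature vector lies far from H in units of the training covariance receives a score near zero, even if one of its individual matching scores is high.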

4.3.2 Classification

A problem for the signature detection of polymorphic worms is the classification of the sample variant set, so that the sample variants can be grouped into different polymorphic families. The classification serves two purposes. First, it helps increase the accuracy of the generated polymorphic signatures. Applying one signature to each polymorphic worm family, instead of treating all polymorphic worms as one family, can greatly improve the performance of signature generation. Second, the classification helps discover the intrinsic mechanism of the polymorphic worm, as the malicious regions can be better identified from the background noise.