Network Centric Traffic Analysis

Permanent Link: http://ufdc.ufl.edu/UFE0019813/00001

Material Information

Title: Network Centric Traffic Analysis
Physical Description: 1 online resource (140 p.)
Language: English
Creator: Fan, Jieyan
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2007

Subjects

Subjects / Keywords: classification, network, security, traffic
Electrical and Computer Engineering -- Dissertations, Academic -- UF
Genre: Electrical and Computer Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, terriorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: Over the past few years, the Internet infrastructure has become a critical part of the global communications fabric. The emergence of new applications and protocols (such as voice over Internet Protocol, peer-to-peer, and video on demand) also increases the complexity of the Internet. All these trends increase the demand for more reliable and secure service, and they have heightened the interest of Internet service providers (ISPs) in network centric traffic analysis. Our study considers network centric traffic analysis from the two perspectives that most interest ISPs: network centric anomaly detection and network centric traffic classification. In the first part, we focus on network centric anomaly detection. Despite rapid advances in networking technologies, detection of network anomalies at high-speed switches/routers is still far from mature. To push the frontier, two major problems must be addressed. The first is the need for efficient feature-extraction algorithms/hardware that can match line rates on the order of Gb/s. The second is the need for fast and effective anomaly detection schemes. Our study addresses both issues. The novelties of our scheme are as follows. First, we design an edge-router-based framework that detects network anomalies as they first enter an ISP's network. Second, we propose the so-called two-way matching features, which are effective indicators of network anomalies. We also design data structures to extract these features efficiently. Our detection scheme exploits both temporal and spatial correlations in network traffic. Simulation results show that our scheme can detect network anomalies with high accuracy, even if the volume of abnormal traffic on each link is extremely small. In the second part, we focus on network centric traffic classification. VoIP and IPTV are becoming increasingly popular. To tap the potential profits that VoIP and IPTV offer, carrier networks must efficiently and accurately manage and track the delivery of IP services. Yet the emergence of a bloom of new zero-day voice and video applications, such as Skype, Google Talk, and MSN, poses tremendous challenges for ISPs. The traditional approach of classifying traffic by port number is infeasible because these applications use dynamic port numbers. The proliferation of proprietary protocols and the use of encryption also make application-level analysis infeasible. Our study focuses on a statistical pattern classification technique to identify multimedia traffic; in particular, we focus on detecting and classifying voice and video traffic. We propose a system (VOVClassifier) for voice and video traffic classification that uses the regularities residing in multimedia streams. Experimental results demonstrate the effectiveness and robustness of our approach.
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by Jieyan Fan.
Thesis: Thesis (Ph.D.)--University of Florida, 2007.
Local: Adviser: Wu, Dapeng.

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2007
System ID: UFE0019813:00001





Full Text
e28178ccf74090cbcc2931927d8642f8831bee32
18065 F20101207_AABLVO fan_j_Page_134.pro
9b4c28711b5114bca92d08622e0d1af1
4d8ccdcf1414177fd596c793ef3c4be86bdd9890
30863 F20101207_AABLUZ fan_j_Page_117.pro
7293782a00a97a818a9227a79c89df31
4e2d3f5b0f6990619b724918a084eff43971c4a7
2067 F20101207_AABLWC fan_j_Page_011.txt
53d1d258d6b55a51c60fc3517a8c26cf
7ed64116952532efbaa406d4d5cda05422c9aff5
5814 F20101207_AABMBJ fan_j_Page_025thm.jpg
56fb5101f6f76583f49095fce702eb85
d006f39c78b40e438db6cda3014497e6cbdff9aa
4057 F20101207_AABMAV fan_j_Page_112thm.jpg
fd6b67ab4f24f25af7e13ebd72f35818
1e320924914a93870020f8f62a94cb5a4c13a0ce
62713 F20101207_AABLVP fan_j_Page_135.pro
8d6c3381f4640c8e2329be693cd612f9
d64c1fc4726454695d4c4b555f891b92d633f134
2091 F20101207_AABLWD fan_j_Page_012.txt
47752dcffff31790abfa9ae34ced0a3f
a9f39e8b128077d03e4763e013e1eeb18e247241
24863 F20101207_AABMBK fan_j_Page_030.QC.jpg
2b079534d5a543e4ca5c966cf7695bf5
7059758ff33babf312902ed73f38b6f3c0cb6ba6
13634 F20101207_AABMAW fan_j_Page_058.QC.jpg
b1ffac592b385fb34a5baae0d5ae0ed1
577979437607724b2c593587f71e62d7ebb439ae
61069 F20101207_AABLVQ fan_j_Page_136.pro
470bda79e2183a144bc758c785c4ac6f
2422f90ad281b3bd694c50df9f45f99e1bd6b62c
1365 F20101207_AABLWE fan_j_Page_013.txt
5939aa7e4b8574bbca2d4f2470807309
4f2208304cf02c3dd48fb3a5bc4715126222d157
19589 F20101207_AABMCA fan_j_Page_062.QC.jpg
4033a54a1408ba7853f0bb43117ac264
842405d73e8c6fa694e7ffc58d08f3e8da732b7a
5570 F20101207_AABMBL fan_j_Page_106thm.jpg
0fa623928df0a1dec3166fc8694d7272
bbf64de86129f9595db7e089d1b3576d34a0f1d3
26404 F20101207_AABMAX fan_j_Page_040.QC.jpg
14c3665daa028e172b90be414273dafc
472eb690459fe49a1aced722ff0a4712204c3d03
66803 F20101207_AABLVR fan_j_Page_138.pro
3cd43b8a209edd2a2d44d3d43e7b3a3c
dbadc8608c70c3026f07b8bf502b36f7f08e6d3f
2311 F20101207_AABLWF fan_j_Page_016.txt
412cf11506691a802f447cdeba59d356
bdec641b773e21df085ee231db29d56da1d28162
4634 F20101207_AABMCB fan_j_Page_118thm.jpg
c0211e741e657aa6192bc256fed2abc4
a562c0704ee84314ffd1657834e8ec371070fde3
4776 F20101207_AABMBM fan_j_Page_005thm.jpg
010049d66035a4a42fd0eed343a2367f
80ce7dc90ba322a912585807ae5e6c7f94d70961
3093 F20101207_AABMAY fan_j_Page_007thm.jpg
afe762583d82ace01122f7db642eb03c
ddc14ee5ea2f878ff8786b4e792fe1805c6e3888
27463 F20101207_AABLVS fan_j_Page_139.pro
a3e5a38d90e4f803e48dd75474f20825
524f4c1aa66a817974083f8ffaa63de1a68bfb98
1123 F20101207_AABLWG fan_j_Page_017.txt
fd95500fa57bc59ba600e80b8e1b837c
8459f0516200a9f7395e42f830e2bad36b32c402
22336 F20101207_AABMCC fan_j_Page_108.QC.jpg
23909f316b15b98e144c57df1d320a2b
da2ead1e69ac202a70611a6e686d5fbe06689361
21227 F20101207_AABMBN fan_j_Page_109.QC.jpg
6abd3cbbeece24d416435a6b8bced54a
a81e606694f567e9fc6e310c25339e7fac31805a
6682 F20101207_AABMAZ fan_j_Page_138thm.jpg
49f12d71f9cd874140f08788d1c32f52
6aa0950571d26108792925f3bac16e36f9603206
16862 F20101207_AABLVT fan_j_Page_140.pro
4916fa976c66d08f4420d8968e14a88f
abaf7995575e03ba2c5fb6a754ff5e8015b748a1
2117 F20101207_AABLWH fan_j_Page_018.txt
10a52d674eb560f0c6b0f3e735ed25d0
8d05a24dedfe6caed12eb9f69e07bcd7a797401d
19266 F20101207_AABMBO fan_j_Page_063.QC.jpg
55ae2cbfd98a39139d3ebb1448003927
3394af15384dd4241c5bf8298c924d68ca2633fc
404 F20101207_AABLVU fan_j_Page_001.txt
06a7a25ce9f718bee6a10747650ce321
759b57f242172e40aef6b56eb89f1064ef59c2d8
1243 F20101207_AABLWI fan_j_Page_019.txt
c3ca5a6b29cc5a451f4c67663381185d
b7608ae12e568b3ac1ab368f8afb54030636d03b
13010 F20101207_AABMCD fan_j_Page_127.QC.jpg
8cf10aaad7518b5efe1060ba4ca3ec55
37927eefdacd4c9a28f3066fbc4e9b2b7775b766
2566 F20101207_AABMBP fan_j_Page_003.QC.jpg
c7aff4c25ee3c296bed33b9cf957a71f
aec2ca9775617594cc463b03abf8029bbe945458
218 F20101207_AABLVV fan_j_Page_003.txt
1caf35c294d9600487c5fc90a5e80de6
1ff4ef16db40d1e68a6225eb64781b24d4cbcd52
1259 F20101207_AABLWJ fan_j_Page_020.txt
6e621b4577991f6ae9bb0a04614d94c2
b7257067ebfb060b3f100591b4da4c96a78ee504
28170 F20101207_AABMCE fan_j_Page_006.QC.jpg
5f4b7a4ba77156af18b238f099a24d75
9388134aeb4f34a3619d33e8c56331c2f1fba5bd
5554 F20101207_AABMBQ fan_j_Page_129thm.jpg
fb87548befaec4d5c6dc6525ba2ad1b7
777e8ac7df522a1743e18e0caabdece87c3bef15
570 F20101207_AABLVW fan_j_Page_004.txt
e6b1fc813a78d3bce83423bb10f6bc15
e01baae44d60f156c2a6b5f0fe2c8aaebe6a99dd
1584 F20101207_AABLWK fan_j_Page_021.txt
b61f0bad5b14d6297d4da80e2198e70a
a9d58d27724a6aa5ed9853fda5518f62f9c8a85d
4305 F20101207_AABMCF fan_j_Page_122thm.jpg
21c39dfb37630bfe1af55b5b92962a9a
2bb1f1ccee42af7d2a895a481d699985bfdeed1b
3026 F20101207_AABMBR fan_j_Page_017thm.jpg
acde64f73d68baad87b61128fcd6bc24
447f68dcbd6ffa24bcaed3bf9a17050e44cae15f
1491 F20101207_AABLWL fan_j_Page_022.txt
5b36e2e79511f200bb0956959cdc004f
6435ad8c3a28218633079b975c46446344d799bc
5972 F20101207_AABMCG fan_j_Page_093thm.jpg
f36d4c411aa2d458408f1ec062ef677d
4e4e81a4cf9c956a45672916da46042aa7a7b856
4986 F20101207_AABMBS fan_j_Page_128thm.jpg
e40f54117339fe4e6b1a0a052eaaeb16
f7d2575068e4879af0284a462616f3849ca0a29d
3087 F20101207_AABLVX fan_j_Page_006.txt
986f9a52f4167d2341cb5430e54c2c58
2110eb908fa2dbbff5a2a6cf3e5576cb822ee4db
1926 F20101207_AABLXA fan_j_Page_045.txt
6767434f111212936dba7467f10330f7
9a24ebd103adcc0ed87aece15b6b747a5f4e3a17
2342 F20101207_AABLWM fan_j_Page_023.txt
6b7bef22fea2754c187fd09bd32742e7
c237f3b9a6a9c53a041fdeb5cfe43340b034b5d2
20852 F20101207_AABMCH fan_j_Page_061.QC.jpg
081b2175b37afa5846597ec78366147b
f73397cb027633645a29e31cbd6d67abc842dd34
16793 F20101207_AABMBT fan_j_Page_114.QC.jpg
b15df2eabe0aea52cc260ea3ee499cfd
f333baf9904716295582df3349d616cbca8a9226
1485 F20101207_AABLVY fan_j_Page_007.txt
cb39c1b30560d60eac91fe4623189c50
f9cea59467d8b0fa111a038b0c9f45b7ec152a12
1702 F20101207_AABLXB fan_j_Page_047.txt
ca063054b08264114da85f235b7d3889
a0b17adecac81953d6fe0ede6097a1c98f8ec213
2044 F20101207_AABLWN fan_j_Page_025.txt
70b6e9cdb386a52ac8cb07d6a4ca3bcc
5c19d7e95c73a1029f4921de5668dfe002eb43ea
3456 F20101207_AABMCI fan_j_Page_078thm.jpg
3a87feb78051431c1f9076d151fe1c31
116a109875c5ae8463de27bbe2b54d6ba099f737
18693 F20101207_AABMBU fan_j_Page_079.QC.jpg
1da00e93739fe4117fc5627babd81123
89b1fec7c609eb65aeb670055f9822c64b7dec46
912 F20101207_AABLVZ fan_j_Page_008.txt
1de8136f60ce0c410f9f66c39a824298
8b994282381050f4c681b73bde1c2bce455e2a5a
2053 F20101207_AABLXC fan_j_Page_048.txt
797dc2649952569433e878ace35ed6d9
beb1c979efe20679cb609cde70214e9719fe4eae
2233 F20101207_AABLWO fan_j_Page_026.txt
f7b1a53190043f6809d08551690e9a5d
510ce144c98424762eaccb21b94f9c2e131b952c
4350 F20101207_AABMCJ fan_j_Page_111thm.jpg
7abb59f6db70666828aec13240615b89
3d609d48fb1604be25599b7c986181fe384268b1
25081 F20101207_AABMBV fan_j_Page_015.QC.jpg
fd4ae9fc5fd344e384a6905e6dd855c5
58d9f81a434941049fb8f150338dd2cd684a0ebb
1325 F20101207_AABLXD fan_j_Page_050.txt
5a17ef52c2954cfde93f7ceb9bbac7f8
0acd64b1d0fd9cae973d1aa5b72894b037183c74
2171 F20101207_AABLWP fan_j_Page_027.txt
aad640c4b0cdaca9795d9740eba0d0fe
f3538c445c3f424b27db9d01fcc45a7ff8872a20
13384 F20101207_AABMCK fan_j_Page_134.QC.jpg
dfeb85a825a8d9c7336836ed11bf9a9b
a36680c833c1351ab56d5624bb632e948b3f0524
17515 F20101207_AABMBW fan_j_Page_119.QC.jpg
fc71dcb0c4b53e0cd04f2adabf5c0ca9
728d8c0566fe8f95dc52e6bdea33d20fb6615450
2279 F20101207_AABLXE fan_j_Page_051.txt
56eb322d406ff2b52eb89a297fa57282
9d5098fa3b9ee98f8729d110a1c58178c8307283
1808 F20101207_AABLWQ fan_j_Page_028.txt
955894366e4ad765f6d361018b422713
b17ae9a6f3b516fcef386948ba07b53c9d9cd1a0
15982 F20101207_AABMDA fan_j_Page_007.QC.jpg
63b4af6f13734947d5490fc990dc9a65
7d9b5e4c18d4de3096ca7e6f66bf21c780150d8a
25199 F20101207_AABMCL fan_j_Page_129.QC.jpg
15502177267dd2a06b47faec0dd55e9e
29e9833a42dfa9b3d59b1445493f26e142584a09
4316 F20101207_AABMBX fan_j_Page_071thm.jpg
7f7c9c7aa97afd9b7ba762d00a21944b
95505b23c743577210f758d9dbb9f9f9ef82588c
1890 F20101207_AABLXF fan_j_Page_053.txt
e2e188a754afb4c1900c4f3cce9e76cb
614998ce1b72712d874ed9b37c59a221f425e238
1313 F20101207_AABLWR fan_j_Page_029.txt
38d5e1aeccbe64f365d18b55c08e9de1
8a7929d84361834ea12dfcf71259a5857a675c02
10090 F20101207_AABMDB fan_j_Page_008.QC.jpg
f82d1a26574de927ea9315defa3e686c
6b97bf1206084241086342c0a4782179899a56dd
16936 F20101207_AABMCM fan_j_Page_071.QC.jpg
9c5ba4b9ad25817d9c179dcc87e3862e
24c0c1faaede4e6fd81c8a4a5f97b5cb4c740290
9160 F20101207_AABMBY fan_j_Page_140.QC.jpg
90be7851494a842b4ef831860555b7e0
350d2233ed5e2eee06c185c2fee6a4283e96635c
1640 F20101207_AABLXG fan_j_Page_055.txt
15c4bf81e2d42e1e9018af52c6cb876f
b3cbf6e0d607e278e0419c078ed825a9e2d790b7
2050 F20101207_AABLWS fan_j_Page_030.txt
6b8bf800b8174cd7a176f8f725cda918
7f0a499883c4b37e8ddcfea7c621dbf5a89beb80
5724 F20101207_AABMDC fan_j_Page_010thm.jpg
488ec64600400f92f9590fd4cc2b2733
ed49ab923884ad79d627a852a8010e28c2a845fb
16306 F20101207_AABMCN fan_j_Page_022.QC.jpg
9edab391a72466ef8e3bc45d1329224d
0be477bac938ae08d31bbce267afb9d0103979fb
4495 F20101207_AABMBZ fan_j_Page_114thm.jpg
fc87cb63e8a19d079dd8b56b795e6139
e88f907984cad40b38679daa9b0c5f296d07eb5b
2028 F20101207_AABLXH fan_j_Page_056.txt
69a49b1fa3745e75b6f37f755a81014f
95ba4c54ff5bcd748619c6ae826c1a329abbb3a4
1932 F20101207_AABLWT fan_j_Page_031.txt
0282ef326d51ec9149550e85007dbb9f
ed7c99b754029566a5aab24eb518c299c1f745f7
13844 F20101207_AABLAA fan_j_Page_017.QC.jpg
a8783f0193a2c21f14d525d2d7a37a28
47bc0d3ce7a85f9a23376a5f17e8ed7573e7ffc4
26171 F20101207_AABMDD fan_j_Page_010.QC.jpg
b52d01aaf2d30ae6ac61e64792da7dbf
5144a14ef9194594ca32d2ee5843cffb6ba9eb56
23443 F20101207_AABMCO fan_j_Page_065.QC.jpg
9c547b6b68b354998c472999ddee893d
b95e008bafd3172a4c7e7de11033e30bf2b1be25
2012 F20101207_AABLXI fan_j_Page_057.txt
36d5ff6faa985e0e9c0de2582d8dca55
94a386dd964d6c6692fb3b038d26331b53ae3e1f
1424 F20101207_AABLWU fan_j_Page_033.txt
09ed08d52e7c3945a5e92535ab4b8660
bcbaa5bd4622b656da8bc248a93d463068a7eb9b
43066 F20101207_AABLAB fan_j_Page_048.pro
b70fc9ba7ef4f181ea3126348296064d
acc89f4785a1cb3f495be152d0bdcbd1aa2b7901
5318 F20101207_AABMCP fan_j_Page_081thm.jpg
e6fe3b0d4760c2404bbae4a242b854c7
7734545fe69e0249e9fcf9ffcfe47d67aba51f95
1962 F20101207_AABLXJ fan_j_Page_060.txt
5556700bcc4733091d1ffbeb4a2160d4
72df19b5843f1b4213376a52c463116613631714
2609 F20101207_AABLWV fan_j_Page_035.txt
eb10fd176bb3ff3924101baab9f62d91
41896dc4d5c9a343eac6179547f10fb1891c81b5
22277 F20101207_AABMDE fan_j_Page_011.QC.jpg
0659a69682f841314ad366b4327972e5
1c9f92de7188169ae53768789adb6d23c78aaf6a
21216 F20101207_AABMCQ fan_j_Page_075.QC.jpg
2f160576da9cb155dd5ae9f5f0cc6ec6
391f9091ee9adc772ed4baf18d46b9c64cc2eb1b
1941 F20101207_AABLXK fan_j_Page_061.txt
37f2eef2e88b02396d97e77fd44f4b07
095a9af478fc4e6200a05192116748731e5eff17
1694 F20101207_AABLWW fan_j_Page_037.txt
5f791bf6c5347fad8baade6741105748
924574f5a722887236aff6e2a4378b10966fa03c
F20101207_AABLAC fan_j_Page_002.tif
a3bbc91b1ec2e2a6d80479541383fb2d
92b3424eee32c3383987903a54960de406af6668
5153 F20101207_AABMDF fan_j_Page_012thm.jpg
9dfb7c354e86776076701f3cbe27ea90
079f2b987a7feec9c93346052fa7283be79c4fb0
17756 F20101207_AABMCR fan_j_Page_118.QC.jpg
5720df07c1e2eac7753982046c27d6b7
54a97342af042359e3b61ab98455acfa0ddc4caa
1665 F20101207_AABLXL fan_j_Page_062.txt
158e0b2f4327b0c4be54e86741705b99
ea7f9b0a920083b3d9983f632e3b59247d8ecaa5
1815 F20101207_AABLWX fan_j_Page_038.txt
88c0fa72547f348e3bae3709d0b7d41a
18ab1c4ef419ab988fb969f0f24da5652fba62df
1839 F20101207_AABLAD fan_j_Page_110.txt
441edf6ab9e5942d799c746417ee4f62
a3029c1d4bb150b4ad6ebe4340575f18540a5ebd
3603 F20101207_AABMDG fan_j_Page_013thm.jpg
2a7722fbd12dfd5b73aaaccfa430fc29
b19a2ff971f68547f0ccc97188a11c0e0d44485a
5164 F20101207_AABMCS fan_j_Page_075thm.jpg
a67bdbd9623f7ae767653af9a87e6fb0
ba1990b416de27b661983c26e12cd80635c99eb8
1830 F20101207_AABLYA fan_j_Page_080.txt
d18785814e50cd11548c2c624ebf1011
c0a565da4b4934b8af6b705b64770230b85a40f0
1797 F20101207_AABLXM fan_j_Page_064.txt
fbf2be264bb91fea528412c975dc4e98
89b8540cb1e53a61abaa99088b43118f28bd6baa
57548 F20101207_AABLAE fan_j_Page_023.pro
b16c854270695631a906a6e479715c5d
43428787dfe9300cbb473d13419fbab6283ed054
16245 F20101207_AABMDH fan_j_Page_013.QC.jpg
bbc892955758c09a430981f5d4419cd7
1a7d19bafc84e385cb505f771a1666f26702268d
4095 F20101207_AABMCT fan_j_Page_033thm.jpg
1324968e511ff6e80cc8e511ec31e904
129dc1bb31ba0767128c94f6e415e2284ae0d657
1893 F20101207_AABLYB fan_j_Page_081.txt
a4f2d424e851d3c211a576a3553a8378
3b6d6e1fea1e41b47ad97f8eae5871395e87cb09
1811 F20101207_AABLXN fan_j_Page_065.txt
b9ce96a8039fe8730d5f2260c9831902
bd3d08733d1e9efc199eaf32dcf0832b6fcf7e97
2470 F20101207_AABLWY fan_j_Page_041.txt
564b9003f04a2a0cb1b148f65f05096c
031cd1c4876870fbbac5eb8fb6c3585d745a4036
F20101207_AABLAF fan_j_Page_104.tif
9d78529d6db8fc579c1259ffed94a4e6
704284e7ef9c69f1a7f02ce3425383d4bd79886e
6144 F20101207_AABMDI fan_j_Page_016thm.jpg
d033d91f7620c357513415c4e7f34cc7
c309bc2d850cb25958a739466602199a437a5632
5596 F20101207_AABMCU fan_j_Page_024thm.jpg
f6eab8d9d3d62a909fe9a2f8ec9d428c
33b7a86c746c0e021970e7f124bff22dc5e09ea3
1007 F20101207_AABLYC fan_j_Page_082.txt
562953c6313e7dfabfc76bebaeb6b764
ac683e8dfc739f1469da04a892e36dfa894eaeb5
1756 F20101207_AABLXO fan_j_Page_066.txt
b30f1e491a499552a9b4b02a0d0603d6
826b2e37bf499fe8ab4926474cfa560aacf34c3d
1959 F20101207_AABLWZ fan_j_Page_042.txt
3aef764a21f39889496d5d32b0b3a4cd
d60f28f733a9a5dc5085a82df6369deb11c951c4
2022 F20101207_AABLAG fan_j_Page_111.txt
a866d89a59655376182429db84ee6658
7fb3cd28cb80a5e6e6e6ca5b7c40dffb39fb39c3
27725 F20101207_AABMDJ fan_j_Page_016.QC.jpg
1a76e042652ce93195d602c7599926f4
833118d972ab709f59b28c1d886b06a2b8236d78
206565 F20101207_AABMCV UFE0019813_00001.xml FULL
b91999bc10048faa48a4d8ec93fd5a68
b973cd8ec92326587a82bdde452fe5749dd834ed
1624 F20101207_AABLYD fan_j_Page_083.txt
7d9085333d0c92471c44d8f3bac6c5be
82e097b8287f15e0f51bf31a6132f9dcab0b4143
1683 F20101207_AABLXP fan_j_Page_067.txt
63c73f1651e40b7b314135b645bf565c
a482ebd3de9d1450f88f32b602c6030e74f522de
42562 F20101207_AABLAH fan_j_Page_068.pro
b4b6c26ce03368b9635e2ce8edb4ea5b
f525e31b5a4ff6dd66615dacc2199e9ebde87feb
25453 F20101207_AABMDK fan_j_Page_018.QC.jpg
c90930ef90dfaa44ce1e39ffa048e97b
4e62ecd28e69266eb885858eaa47435e5c67af10
1116 F20101207_AABMCW fan_j_Page_002.QC.jpg
910904abbc64e71db282c27edce3bca9
61e3c9a1989c758b3723ad2c85ec293af7ee318d
2019 F20101207_AABLYE fan_j_Page_084.txt
0cd1dbdeb48eaf0a0252d86ea2d317c1
ce81ad9f07ae629d74874d48c719f41dfdd61c19
1871 F20101207_AABLXQ fan_j_Page_068.txt
bfa5f43afde8fefd16a10445b78932ff
e6a732e65b878b792fa10266b29400f54cf66b77
2270 F20101207_AABLAI fan_j_Page_015.txt
a2c0bacb84f75895475defc055e06120
bead8afdb841d459f420bf1641a6f3dc333f6c94
16419 F20101207_AABMEA fan_j_Page_033.QC.jpg
9e1b7557942e879e6d5861c5a4467e43
7e447258a647ab0a089f732c4df78983d42b934d
15637 F20101207_AABMDL fan_j_Page_019.QC.jpg
83c1e0105bc2bc86274389c4748afac2
b779db32b3ade7417cdb91c6c4eb0b9bbdb56492
7388 F20101207_AABMCX fan_j_Page_004.QC.jpg
7b29d30c97f75c28410f45df4d66f343
7761e4a8d9227414b45078cd8b7c26ab706f181a
1684 F20101207_AABLYF fan_j_Page_085.txt
67a69dbe64b8833b4ba281fe4641d4e3
b2f9245196059a71c00c2032b95117d5ab37d053
1522 F20101207_AABLXR fan_j_Page_069.txt
3f5cadfbf0ba98c1c2e78f106e49331d
40d5e762953339d7a5e0450518e31bc2be44a8b5
24965 F20101207_AABLAJ fan_j_Page_107.QC.jpg
bd074764945e7a2b95900cc23487955f
6253b58dee5580677d0665c8eed84101bacecad3
4615 F20101207_AABMEB fan_j_Page_034thm.jpg
4d9991439c59ddf0e6463069bc1963fe
cc8d460bf40d91d36252885ff4b87d5154447d15
19147 F20101207_AABMDM fan_j_Page_020.QC.jpg
dff44a0c1d3ea73a3ba7384ca125a43c
c8e55c3426902d93b4a80d4c0fecb48478d374bf
23446 F20101207_AABMCY fan_j_Page_005.QC.jpg
9e4b9fef1a4474a5eaf1d567b8fc802e
c871aa78214fa80c263e400c3921a6eb0abc2fb2
2042 F20101207_AABLYG fan_j_Page_087.txt
d2a3fff3661582bfb38c2ebc2e05b765
10803f9f3249b826bc8dcbf35fc7c6746d2ab42b
1627 F20101207_AABLXS fan_j_Page_071.txt
b824803aebd6b0c74a317480cf50cfe5
234a7ceece61f2799a7a1a07ad6f0a9f8ad7a6af
F20101207_AABLAK fan_j_Page_059.tif
7e81f6aacc6128f7969c9f0b4ddd7bdf
a0e7f41f58dc3eb8d7b8927773faf68d89f0983c
26604 F20101207_AABMEC fan_j_Page_035.QC.jpg
05cd82424b5f45373823ec6bfb15d4e8
3ae70dfc90199866738e9b7458414359ffb2f495
5453 F20101207_AABMDN fan_j_Page_021thm.jpg
2c9987957874f8baa3a7f1f1c04deba7
ec7e1f75a7aeb7e48e8a324ad5079198167bf330
5770 F20101207_AABMCZ fan_j_Page_006thm.jpg
b2f3525c87920073334d49cdd43cc8ae
5d513c738e9f1c5cb2acc00240d3993851afeb9d
1823 F20101207_AABLYH fan_j_Page_089.txt
0a1fc2caf5a088e696c6e068eacce85e
1c78093acd2288db86b65d3c27897b01701f1389
2199 F20101207_AABLXT fan_j_Page_072.txt
b4a639c60b3c742fa3e784a488cf58cb
1c67993001687b5388cb974b176010f65a239bd1
63217 F20101207_AABLBA fan_j_Page_088.jpg
9f4d32a66e6cd6a711cc272990078f1e
a84e693c53a4cfed70f88e0dfaf5d49e00899a1d
16656 F20101207_AABLAL fan_j_Page_120.QC.jpg
4f17015ea1cff51d93b1dea70aced860
7061e9dacc3c26e4dfefae31d20ab9450e6daf1e
4658 F20101207_AABMED fan_j_Page_037thm.jpg
dae9d9c8023e2a356897f5a356087e72
991449abeb60852b03ffebfb8a5b16ad2ad034c7
6077 F20101207_AABMDO fan_j_Page_023thm.jpg
a5f3d47a6dd9d0c30ed299fc6a10d5c5
8dedf2ff8226608aee6658f9a8f57fcda1ad1be4
2326 F20101207_AABLYI fan_j_Page_090.txt
e7dafa762ae7fa8171af8ae2e66ef3d1
d4ee9991c118c192b69b6744ca429c19e216bb3e
871 F20101207_AABLXU fan_j_Page_073.txt
b948e94ad8ec79e42ed74974dc005668
b733925a235ade042d068d46b8500fe362a513a6
101778 F20101207_AABLBB fan_j_Page_035.jpg
a640b79dde2785ad5a2c627efa83bdac
6129415ea1aca04cb44d19c27debb560f7fbee53
406 F20101207_AABLAM fan_j_Page_002thm.jpg
95a3f1a478a15236e728f94d99cb5fbc
b70d3220508de8a96e318aca7dc43eb48146d9c4
19455 F20101207_AABMEE fan_j_Page_037.QC.jpg
b34e628d19b721bf3facc6650fb3116c
e3568c6252dbdd21e19168fa92f618fa9722a847
22297 F20101207_AABMDP fan_j_Page_024.QC.jpg
d317147aa0a9fe6ad060d5fc4d2b3494
79e7de1c59ecf72f4b373395cb550d33bd10eee0
2396 F20101207_AABLYJ fan_j_Page_091.txt
34d6ae2d2890903c9ca0156b0a311529
37be4e720feffbc825b019e3aee49ae2c7eee428
1972 F20101207_AABLXV fan_j_Page_074.txt
a3b39738fc5d3016af2073870ff9ffbf
b38671145ec82c2dfed1de019d18dd8dda5bd0bb
6223 F20101207_AABLBC fan_j_Page_065thm.jpg
6cb690b2da3e277594c4e58864eb1d70
aa2737fb6e3c3965212b83df4305bbceb42c211b
5143 F20101207_AABLAN fan_j_Page_055thm.jpg
24d6705b69d7411e97f9bc89bcd87890
9e031ad54c5d97fb20ea7ffb0cc4abe82003fc3c
5902 F20101207_AABMDQ fan_j_Page_026thm.jpg
f298102d5d9866e342dd8668f207d938
0ac036e6de9647da9c8879d88adc6abbfbd8ec11
2375 F20101207_AABLYK fan_j_Page_092.txt
9cb647428a2392d7223b7dc35961d2b9
645cb412ab5719eed73b449156ebc17f25f0aa2a
1664 F20101207_AABLXW fan_j_Page_075.txt
f28f5a9350387199ac4cf666240dea3e
077146a76f67a6da0a2965ce32ebf09b662d73a7
791091 F20101207_AABLAO fan_j_Page_045.jp2
6823a166e7ceef74d44c2c4893393022
1d8b8a817cddd9932754a6e2dc525b126b820452
5270 F20101207_AABMEF fan_j_Page_038thm.jpg
021895d4ccaa095e5ccb7e1e9a05214a
a885faf5cfd1da7efb0aa9f542f105f2b33b1996
25489 F20101207_AABMDR fan_j_Page_026.QC.jpg
8156bf67b011abbcfacb8e2640cdba5f
df7cab1ab4d143c33be6b1dc0ecb9ce34105d206
2425 F20101207_AABLYL fan_j_Page_093.txt
dfec44d43b32262af34af88afc184019
4e42a57ddf4c360d98031b69983d86cbbb6f61de
1887 F20101207_AABLXX fan_j_Page_076.txt
a2e022d0eaca0ad440f27cd65b1e70e8
df48bc3c6f4c4e8c99468081b6843ee2f0580557
70630 F20101207_AABLBD fan_j_Page_087.jpg
7d744ae0d2f0727b9d26515dd283ea9f
042fad56f9d97f7f66d04e73b5f1a9acb34f28f0
20395 F20101207_AABLAP fan_j_Page_053.QC.jpg
e2ba075624dbdbbc292499d0be7cb792
eefeb8900fefe6ebd404de6b0b840e75df42a0e3
5494 F20101207_AABMEG fan_j_Page_039thm.jpg
d5fd0008eea14d80ca310b19fde0dbbb
e4f8c0a18af13ddff619e26bb649e7a91ad90ae7
6119 F20101207_AABMDS fan_j_Page_027thm.jpg
e9c85067037a89ff9310fdcd96bb53ab
fd6b864bdb31c8a2afb1b30ff1ddeea95aeed323
2373 F20101207_AABLYM fan_j_Page_094.txt
28f1c603e1e2f8feb16954f91c73dbbe
0bf50fe0733e07ac1fd1c443f60bbea7ab2b918f
1286 F20101207_AABLXY fan_j_Page_078.txt
510cf5644a03da680b31e4046e011d31
c76d2b40196f76e7c99f7272e5e4bee2187cbab5
2790 F20101207_AABLBE fan_j_Page_005.txt
82bd3c7ac9781db51e67adcc1ab711d8
3865545ccec4adeae47e4cd8d81e0e4660995459
F20101207_AABLAQ fan_j_Page_083.tif
43e7ee03c0f49457113b2df432604ffa
76ce3314bda43451df00dfab696cd918b16ee294
1806 F20101207_AABLZA fan_j_Page_123.txt
8521740f5898272c9f240bb828ca3731
66cef8f021d1f3824451265f523b6f5a7bb73bb8
24436 F20101207_AABMEH fan_j_Page_039.QC.jpg
55f7f7982e014e8a1ae17701bb45badf
9d0303aba5c4dcd5ab198a749094ce96b07ff5e0
26383 F20101207_AABMDT fan_j_Page_027.QC.jpg
4f5583e16752da8b77b2aa840ade8796
bb18dd19da4234fd27a30bd8c6cbdceb3d0ef99b
4737 F20101207_AABLBF fan_j_Page_079thm.jpg
b929f21c547339cd998389df69fe0000
a65734f3a2c6a93c40b530932034783bb89cdd5a
2092 F20101207_AABLAR fan_j_Page_059.txt
1740c015cf902f7460060142370e6bca
1fb4dcc491cde46ae5c31ca3e06c8908348f6d4c
1942 F20101207_AABLZB fan_j_Page_124.txt
bfd3e1157a4f3aa438b27fad8e8eea18
dc5b87fccd6cf5b3172f6dc48a6f722e775c3eab
2327 F20101207_AABLYN fan_j_Page_096.txt
8dd9c8506ac395af7d0e6b4637070cd6
26293279b463e96d7c0a670ec03dcc31e51118c4
6101 F20101207_AABMEI fan_j_Page_041thm.jpg
1e4d21de50cc132454d68d172a881ea7
37e6abccc91bd6651d9ccdc9b75e5727945830c7
5499 F20101207_AABMDU fan_j_Page_028thm.jpg
858c6e39a0f3b3153df96dc5a15ce897
b904dd3fde326a5aed97939a8ed112f9ca41a2db
1589 F20101207_AABLXZ fan_j_Page_079.txt
040aa9039c903b25d1eed215ba91ac03
d063b9aa8d23b4459825165357a4ad751fc3d652
57323 F20101207_AABLBG fan_j_Page_078.jp2
8054fae03f045ba0125eefc7c60cbd92
e8e07c7fb35db949bbd05e5992fa56a816f91ce5
60592 F20101207_AABKWA fan_j_Page_092.pro
a8d0a82c01bb64d1c34e1dcb722ffb89
ac6a89693a020e3a3e660e6bef1ad1d8d75602f9
6128 F20101207_AABLAS fan_j_Page_094thm.jpg
3058c67bb9d150a549f7ffee2897445f
8371a18fcf7f32639458b846422c14192bcd538d
2239 F20101207_AABLZC fan_j_Page_125.txt
0300b7fec126fdc4e4244d44f4f7bc03
a665b53008dfb1d63fc14ccdd60a6a4735d07619
2204 F20101207_AABLYO fan_j_Page_098.txt
587186d598be5cff4a4eee2993931a8d
1c878d247ab3bb18b5238f211f18332c0927abf5
25884 F20101207_AABMEJ fan_j_Page_041.QC.jpg
a6ad4786c22aaccaba148da4874c2075
5aefd6756168e196335efd06d0dd5a7e557ac142
4856 F20101207_AABMDV fan_j_Page_029thm.jpg
65826287ed20138e649d2e505f2afe0a
cfcfa4583f16458e61453a6589110f7d1ce4f46a
93936 F20101207_AABLBH fan_j_Page_093.jpg
8e23f7707b12b9bc080827edcd911ab9
78b786773cd35623bcf083326155cd555dab1b96
690435 F20101207_AABKWB fan_j_Page_033.jp2
f5f056a7e95bf862986d7287b553fcf4
7640518878a7e3f68f401d40808e27d417a5162c
1711 F20101207_AABLAT fan_j_Page_077.txt
5f4c17cc3c779e3b12354df1a9470224
f91ec3f651408f7ef8a18406799f92f91d297f41
2210 F20101207_AABLZD fan_j_Page_126.txt
bf8b9771f65bf0d07394cdd51fb2743c
24cc11424d71d8730c47d0945bceac7b3211c2b4
2161 F20101207_AABLYP fan_j_Page_099.txt
2f0a64b717cc67a0d5a64e7dc0f4c447
39f9486bebc3dd067936b6465cee09b4cd724bc8
4668 F20101207_AABMEK fan_j_Page_043thm.jpg
a826e928dc093d9a1a4bb413be4ef26a
2b6e7d54e626a9c5c81f5e484f8248fff70fcde2
18697 F20101207_AABMDW fan_j_Page_029.QC.jpg
89d67f5a4037b041842ae85e69958030
a2d142884c8c6b36e4b6a3be43e2c51ffc544b34
F20101207_AABLBI fan_j_Page_005.tif
2644024c3fe1100fd23e8a5719d8e4b6
10ae803844b8a1a1cad40c0ae368060acedeab87
2333 F20101207_AABKWC fan_j_Page_049.txt
5a05e21b9d263f39efebac788b269352
991ef939ea9cfdeba0b30dcaecd55cc12c7b6ff8
F20101207_AABLAU fan_j_Page_110.tif
f177bd0add8fa0de172945a030b31a23
0eaed8ca4093263412f2d99ec402f4eeee20a961
1792 F20101207_AABLZE fan_j_Page_128.txt
7e5d9a32960229e9e3afc1a3f88ede82
1ac217cf7142539ff5635f92eb7f115ba4dd57df
1304 F20101207_AABLYQ fan_j_Page_102.txt
eeb687576e520471497dcf20e2a01407
bd32cf84c54ef926625876e653582af119eb13ba
23573 F20101207_AABMFA fan_j_Page_060.QC.jpg
c603f68512536f41cc92e9fcf7e6e29e
a5813cdfacc043b6b21aeca5d4237cb5971761f7
18275 F20101207_AABMEL fan_j_Page_043.QC.jpg
30aeaa29cf21c59c122b1b48233a97f6
2111d2d8862b384432d0c656c32cbaaad8674c0a
5654 F20101207_AABMDX fan_j_Page_030thm.jpg
b6c257ebdae3d8afe14d9afc6c69e6c2
7a872eda60033d018bfcd6634fb8fa38cbc48f43
21142 F20101207_AABLBJ fan_j_Page_068.QC.jpg
56dca3c666668f05e3d15cadf5e01616
3e47d7a73e0b8ad7639b716fca6f04d068c536c2
F20101207_AABKWD fan_j_Page_010.tif
347b7b31e36103c445193dc23788ff1e
891764622c496e411ecb051ca3f0eea1264c9ebc
F20101207_AABLAV fan_j_Page_056.tif
8354b625384412793c0297ee1364d421
3d5f2d9f151608c5a85c8465cf4795a5029fa03d
2122 F20101207_AABLZF fan_j_Page_131.txt
99fc3ca50cfa214f484026741fd359b1
5061989984e5ed5aee5755cbcef765ff68f4474c
1976 F20101207_AABLYR fan_j_Page_103.txt
11f5fefce9f9db533df3d7115e9033d6
4c3bc9bc54c8b29699c819d73be9310966702a43
5229 F20101207_AABMFB fan_j_Page_061thm.jpg
45500a4493d93577fa34869391a84650
bb93b80d0e3c09e233c6c9978719983b1d0283bc
16935 F20101207_AABMEM fan_j_Page_044.QC.jpg
64cf6bcac5b6a6ce530283e85052608c
1377e72ddcd299f9570cd87de04c47b94a873496
23054 F20101207_AABMDY fan_j_Page_031.QC.jpg
7bf42f42bab08004120c927f391a73e7
cef864eb981afea09e3026ce1928d72520d76515
23042 F20101207_AABLBK fan_j_Page_084.QC.jpg
e80098e1dd80b7e1b44ab53a120e68a2
4896a163d46d833502d5265482ba737fa9520978
1816 F20101207_AABKWE fan_j_Page_036.txt
14d42ce9bb75630a653832f9f8ea9ec5
d35ba7214d0e1980cc9184e9255604dbbb755cae
1047615 F20101207_AABLAW fan_j_Page_121.jp2
66c074937cc8e455d2e5c931465444ef
a14a6d32e2ca4380763b3f4082276681b7400192
1110 F20101207_AABLZG fan_j_Page_132.txt
03367833d625f3546f0b60ef862736d5
53da0e0bb97be491cb8dbcf0ee6293eb408faa95
2111 F20101207_AABLYS fan_j_Page_105.txt
1487b977d2c7d10a5fb4c6ef382ccce6
baa251155b8d40684de6b5fe7c319901c00779f7
4984 F20101207_AABMFC fan_j_Page_062thm.jpg
260d9eb16b7157f4c990dc5e50a6551b
42e69bc9d28b12255d201faa31f2f2c92c3268ae
4962 F20101207_AABMEN fan_j_Page_046thm.jpg
38b859a002afdcdd782ef907eff106c8
338508af6f1ae1462ff5430562595cafb2b74691
5179 F20101207_AABMDZ fan_j_Page_032thm.jpg
d7434b6114be0a6a8b74cffee088f051
20f4e394948bca88a316c32e6c49b066ac3196fd
F20101207_AABLBL fan_j_Page_053.tif
77fe96bfc4640c78d3fef7b70897bd25
fe07d955516721dd9aaa6a3583a9e5eaf6fa7580
F20101207_AABKWF fan_j_Page_054.txt
53c640fb36f75f51be0e6c0b1262ddfd
e521b25e51f50e5af52f0f1524a7258db1cc88d1
2009 F20101207_AABLAX fan_j_Page_070.txt
89576a6fe2374c6806ef9c0f16d2a634
159fc3f2dc8d8c75d0c102f80183cfbaaf1ef1b1
579 F20101207_AABLZH fan_j_Page_133.txt
4f296d2f42d7176423c512bcc0195380
09e888aac11c49812dfca931527423f6512690a3
2202 F20101207_AABLYT fan_j_Page_106.txt
f3d22345d8ada95b6c1bb370aa5cc4c2
77ebf3b193e5aba299da0dea40a55b184b6aa7a5
F20101207_AABLCA fan_j_Page_061.pro
9c94ffd5ec81817efec60ad1d3e08abf
7d48522f760bc442f9cc4b3e6e41ff5cad2b78d5
4846 F20101207_AABMFD fan_j_Page_063thm.jpg
a9e0a51eb935acdbc98a63094b02c36f
b867c340ca52861c761b17e1dc6e1c314d143a6f
4477 F20101207_AABMEO fan_j_Page_047thm.jpg
4f58de84bb0a1f5afe5a709869b5bf70
20ced928d8bb547e49768a71d8acd86a0846bfd6
917920 F20101207_AABLBM fan_j_Page_126.jp2
d87a8f95e6a204f638a5dbad0c05a6c3
13c672aff3a280ed14d058f660382ce8e3f1323f
25402 F20101207_AABKWG fan_j_Page_009.QC.jpg
9d92207ee3a36e78c62de7526dd2b70d
2a7cf03a89c62a6e5684e6b11936ba5fe167b027
88404 F20101207_AABLAY fan_j_Page_129.jpg
102a9db795b745ee2bb4888bbc6e93ff
1e3d3742f337756601c22bd21f59d639c3094ab9
2445 F20101207_AABLZI fan_j_Page_136.txt
1198e3c8da53acd829d047142bad279f
d9b23337c811b979bcb3e83b2ba1b5f837a68d15
1983 F20101207_AABLYU fan_j_Page_108.txt
a5325213cda1b8d0bffc417b7f55c558
a71af15a9cbbd0fdca7cdfffa7bb16ac5731d64f
21542 F20101207_AABLCB fan_j_Page_008.pro
9ca9eeb4a2325bcfd081e76030dea592
9d9712bc1833204dd6a476960e46bf78b0af9d68
5915 F20101207_AABMFE fan_j_Page_064thm.jpg
845927be2ef0baf36b68fda0498de27d
cb6ac64a5e621c7fe0df62736c53dcc25d0fdce7
6041 F20101207_AABMEP fan_j_Page_049thm.jpg
0a84dc71773a4fc3c896cd424be082a4
35aa4a91966d57391bb73fe010a4da46833826c0
93484 F20101207_AABLBN fan_j_Page_023.jpg
d8f87ba38da6f915cbb9e909a25b97dd
6ab37e663629b7c7a91f1c0c1cb9868fcccbaa76
5121 F20101207_AABKWH fan_j_Page_020thm.jpg
a4c6e072bd1cc238f379c74065dda16f
74e8e39850cd9178d853044cb3ed993ace593ce0
88202 F20101207_AABLAZ fan_j_Page_015.jpg
27a0e1b90f59c6f91f940b26c5e8cebe
f99f23e082172245dfd079d210a90cc29b14b30a
2571 F20101207_AABLZJ fan_j_Page_137.txt
78cf08fa436d4fb88f6e637339d0768d
ef49d0cc7ad7269232b15bde5a625af6956ed22d
1004 F20101207_AABLYV fan_j_Page_113.txt
59c2770058551ebae7bd4b705106c916
20cf474e0df18da9e4aa1ea84fb38a445ab24cc3
c98a4fa41afc0d20beeca2024a18ae0b8db8f77d
59225 F20101207_AABLGI fan_j_Page_045.jpg
3bceb9a56a802794f8e661ea9702f087
188241a49f47ad0f0e118719aa5c6889338f7bfb
81694 F20101207_AABLFU fan_j_Page_030.jpg
77c38bfd286a064f3580e219e74eb49f
d1974ce9610ede0eab4212b27b010f707118a98e
68252 F20101207_AABLGJ fan_j_Page_046.jpg
0684c6a9cd25aac918f764b568285f09
821fe71b340f7aa699401b70983c52253a2b1e0e
74745 F20101207_AABLFV fan_j_Page_031.jpg
f7641f3d0a18f3d6b5b8b7336a03fb82
6eebc729a2415b05992760151560e9aefc383f70
58949 F20101207_AABLGK fan_j_Page_047.jpg
c8066a9b95f87af5c6496b3eb7b83b3b
a7a8c1be5d6666cab4668791d629f3231e610104
71424 F20101207_AABLFW fan_j_Page_032.jpg
dc24c29088fa58ec5e128459dacdf5a8
25cb5fc4bbbe90a41f2ff2f8f71d9558955afc4e
71620 F20101207_AABLGL fan_j_Page_048.jpg
fdd11fc75cd80a902c8cf18f59d7fab1
5e882cc559da33570b1902e418e537c1aa88eef8
55623 F20101207_AABLFX fan_j_Page_033.jpg
892ca122a194392e67b0fd2e3ae5f106
dcfe740e7ab1a6c9e18471e3c9de4f9ce57e36b2
71588 F20101207_AABLHA fan_j_Page_066.jpg
e065520338e3f1cc1c237ad493e905b0
37818c9bb3f8af3ce470d19e558ca6258d0bfa05
88138 F20101207_AABLGM fan_j_Page_049.jpg
0e42710604485d973c38de556c0f4225
6b8426b1d3250c1fb037a33e4e823c822e7352e4
55606 F20101207_AABLFY fan_j_Page_034.jpg
dadd348871b4035c2475a5e2bd5a9632
1b9573ec641618504a1a0067678bd4c4876fd193
74919 F20101207_AABLHB fan_j_Page_067.jpg
7fc978545ad9116e92d299a7030629ed
42d022523f049f08bb47a08be974f632a36c674d
89957 F20101207_AABLGN fan_j_Page_051.jpg
06b54e4755808c8d716748ec3c431092
eb633982a5921f1056fdf038ca5541cd99e51404
72690 F20101207_AABLFZ fan_j_Page_036.jpg
291230c3443bd7afaaebeda0c8a1b769
4910faccae9aaa11c678873ada66ec0848ec3ed0
71960 F20101207_AABLHC fan_j_Page_068.jpg
47f188629ca6881a6c1b83782770837d
351145f029139d3adaf679203298422de0b35767
67543 F20101207_AABLGO fan_j_Page_052.jpg
949df598965cc895f191a8606c010c6d
3f790037362db4107996e3be2962346d5d96a8c7
67600 F20101207_AABLHD fan_j_Page_069.jpg
df0bd33f9feea39c119159fc66ccb235
1cc91b2fb92ff2b0bb16b8f1db7c4ff13ee88459
71390 F20101207_AABLGP fan_j_Page_053.jpg
9e1086ad1fe90bb0d089ea19593fc8fc
95a4720f8374d3a342add08972c2ddef183f040a
70619 F20101207_AABLHE fan_j_Page_070.jpg
cd6420cff7f78e03f935e4d31c8a166d
a00efe6bece31b3f5686da1052ae4edaa3b99650
71011 F20101207_AABLGQ fan_j_Page_054.jpg
a0c900de246ba0bcda8c063ed611f04e
a93cc59c02686adbfd896b96b3e6b530aff9b471
53581 F20101207_AABLHF fan_j_Page_071.jpg
e6d9603c2121887ccfb33abec06c17a8
7775898a352c5f83f0c36a851acd25adf648ce36
83542 F20101207_AABLGR fan_j_Page_056.jpg
89f0c719c1c45f9fee64e70ee21f5720
a36cf95b3415a86998bf465eaf007d6fad1d14d6
85434 F20101207_AABLHG fan_j_Page_072.jpg
bc881114b35459fe92ed1b1cd1d4f747
ecf97b7ff30900585adb523c2bdb128f27581d62
80221 F20101207_AABLGS fan_j_Page_057.jpg
4968914f68b937b1589cfd232c9e8cd8
b8d1cc19c3749c76700ae252ee0618218b6f4508
63004 F20101207_AABLHH fan_j_Page_074.jpg
f4ddadce6b6b1c3edc0691dc9da9fed6
0a67c29779bdb0260a9bb3bbddbaa4d48a6018d9
F20101207_AABLGT fan_j_Page_058.jpg
3451cd20911191831bf289e531bc7178
c3119181f4ffb2bb7b81dac391a5b91b1249b8be
70956 F20101207_AABLHI fan_j_Page_075.jpg
47412b0ae188a9ee893bfa6814d3af82
3645b2a154db058a5e7937a7abd2218c06ab3c08
81742 F20101207_AABLGU fan_j_Page_059.jpg
49f12524d877e30ae1b728222b28b513
9d442dd5983a42386d05c0b964d1d39b03abceb6
81279 F20101207_AABLGV fan_j_Page_060.jpg
e93b19ea2daece3f98cfc84253e62de1
0aaa3bdf706ecb137ca6d596a9ab95dfbdfa1013
45161 F20101207_AABLHJ fan_j_Page_078.jpg
2514af776dbf75394856d2e7997aa1ec
9c7dd148e2cb2e676c0f1b061b530e874a49a8d3
70939 F20101207_AABLGW fan_j_Page_061.jpg
c1afb52a2b81afd742e976104d422f8a
28d23c27351a3c8524c564cb543b6c69ec303813
62283 F20101207_AABLHK fan_j_Page_079.jpg
c897bf209b1099a87adb9b128528036f
877be7a5bb6d39b2be60293f96e6f00db0e6e984
62905 F20101207_AABLGX fan_j_Page_063.jpg
5c2306fdb060a95014452ebfc9dd2002
3bf07b4dc52c85275ad633643ca918e746cfb763
42948 F20101207_AABLIA fan_j_Page_101.jpg
90d913e806eb5ee4b85797fa00394d1b
49114e2aaa6ab69cdd573573a052ab4ccea66910
74356 F20101207_AABLHL fan_j_Page_081.jpg
7694334b23a033739a465a24645be4bc
e8a8af818ce1b5b5ca46c7444b2af49837749326
73997 F20101207_AABLGY fan_j_Page_064.jpg
7af152c197a08938b7e31e34d5900463
eefd1652c320499426cc572fc2e7ce0f45b056c0
38463 F20101207_AABLIB fan_j_Page_102.jpg
51d69a3ad72b717e47b7a6a671f8f669
bbf44a59a79db3e7dcf988157a642e8b3bceb4f2
46605 F20101207_AABLHM fan_j_Page_082.jpg
2d7f3195b7fe7fe8a31821696130c61f
d0722f36098fe6ffafc3de695b51030df339fc36
74607 F20101207_AABLGZ fan_j_Page_065.jpg
2bd30558c346cff64ae1b6c0bd08a05e
cbce5ee40a9dc24aa2f0cb0a1991cc910f867353
76406 F20101207_AABLIC fan_j_Page_103.jpg
ef92ece60c952956695b492423f1e97e
5f68f821a60fe01f179219abdeb1413519e5ef25
61763 F20101207_AABLHN fan_j_Page_083.jpg
3cd722ccf7e0c791849e4a220406307a
98d496866b68e5d3a45e83e4da6c3530ce910452
77957 F20101207_AABLID fan_j_Page_104.jpg
70db7aab497721803a66354a559849da
1da308a51ff86f042f6a6a7e30a53bb73ced4018
75558 F20101207_AABLHO fan_j_Page_084.jpg
931ad1318b5f890dd1404aa59ad9fc50
a9bb4dc98b87706ef94c3b7f9eea8e632dfe8c6a
84527 F20101207_AABLIE fan_j_Page_105.jpg
90bf958e2d61b7c6f1df57f2b15165b0
1e3b0deb705dc460b7c8ce8637e8b799e1c26a50
71053 F20101207_AABLHP fan_j_Page_085.jpg
070db78453a4ad176c944130127f9e6c
49b60824134ecdf871509171a4f970941f6bc8ba
87210 F20101207_AABLIF fan_j_Page_106.jpg
2f64a486073d6d338718286c32e99e64
7b6e2a814377edec1c83e17a9d9d35712b53bfd8
71931 F20101207_AABLHQ fan_j_Page_086.jpg
89ed6a23cd84dfb6613e544f4748c301
5798cc2a1685ce8895b8666154e526dca451946a
80485 F20101207_AABLIG fan_j_Page_107.jpg
949e7c073be00550aa034c222e3eaacc
4e29e7fa5eee0acbe91424a4dc7034ed554ac6b7
73179 F20101207_AABLHR fan_j_Page_089.jpg
bdffe868c66117e66323759420e0ebf9
406634d658b3ef18a86a9895dfd89f8116c5dabd
73282 F20101207_AABLIH fan_j_Page_108.jpg
41d80546a6ef21ff07fd49efab794eb7
69007ec70c89c1740aee7487cecada720e9f1d42
92455 F20101207_AABLHS fan_j_Page_090.jpg
31768dc4008ba35043591e0cc613d97e
2e930532ee403ebfcd646cfa5a23e96ad0d69c6c
70098 F20101207_AABLII fan_j_Page_109.jpg
abde81ee065d63b75cf2befaae17486d
97cadf8adc3d5765bb387700685ff46ffc1fb342
96091 F20101207_AABLHT fan_j_Page_092.jpg
340d02ffce2dbe196053ba5fa56c82f6
86b8e368bbc5a8a1867301040df829156dc60583
62785 F20101207_AABLIJ fan_j_Page_110.jpg
9ad5da097160b72ed830fe054c90ae30
8c65e3b4f51539fd8fe97a2131a0b2223658f66a
91053 F20101207_AABLHU fan_j_Page_094.jpg
f59b720d40463c51667a35f96e2b0df4
8067117b916afe275a9eec9c1ad814b95614a73a
70841 F20101207_AABLHV fan_j_Page_096.jpg
6b800fa392fa381ad07c41ef80c2f6dc
4454623bcfd90e292122e1edab4009bb7c524db6
50233 F20101207_AABLIK fan_j_Page_111.jpg
9827700e601a726e3797723b6d2606ad
e1325af8b4c90bcd4fc0f9b63ecfed4e39735613
55412 F20101207_AABLHW fan_j_Page_097.jpg
7760b159a9d896d61f520bb6f5dcc41b
cd50022f0ea754d425802c24ede09bf78cdf6e54
85278 F20101207_AABLJA fan_j_Page_130.jpg
ffc82cfa0c9a47f5e718d0847730f76a
6f1db3e9175b7c613c553f48aaef98c4e225ef74
36879 F20101207_AABLIL fan_j_Page_113.jpg
050df22641c1aa6c2336603bdcf34b6f
78266194c10daf51d86534ad1908efd47fffda3c
72324 F20101207_AABLHX fan_j_Page_098.jpg
395334c3de87f5c72ca410bd50263852
62eaa426ce39e8695a8850a98c71f0b6b4ada44a
81910 F20101207_AABLJB fan_j_Page_131.jpg
f21847301f82f9b991df1c8c8f8b5da2
064810ab2416426265b3d2630e1673e6dfd80185
55469 F20101207_AABLIM fan_j_Page_114.jpg
f87f0034ed89896b1b57d0fd3567c1b4
6f44515bec280a527b6b7d362dfc9c35e4ccfc94
72328 F20101207_AABLHY fan_j_Page_099.jpg
0e771a82933495155bd4b0c1ec5ad069
fc14b354fbd1ce384d5b5fe2d1cb70592d18784e
25347 F20101207_AABLJC fan_j_Page_133.jpg
6cd172f59dd13c56057662660b891b4b
2029912754b63a1c9029724807565e0c87cd0364
73889 F20101207_AABLIN fan_j_Page_115.jpg
73c2126b8286b2bcbf45612a00a41451
46de18f2316d73d46ff68e6613c0249f0d2f2cb2
69557 F20101207_AABLHZ fan_j_Page_100.jpg
91aa6c6f539bea65f820da0c9419b468
6794f031e20607bf0a9d31d99d91976816102fc4
45200 F20101207_AABLJD fan_j_Page_134.jpg
284786577193f488821ad333bf075555
279b3e05183da40325ccd2d060875dc0eb5aa1ef
79672 F20101207_AABLIO fan_j_Page_116.jpg
0cd0ac73cd001d4bb7ced915d7bf96dc
b3470dba8d0c5164a043b6796d9d8ad3bd35f2c4
101075 F20101207_AABLJE fan_j_Page_135.jpg
4745fe4534f38e8af348eee4c2a6cc90
344c0c5c24495a92dbbd606ba6aa20526b9b604e
51477 F20101207_AABLIP fan_j_Page_117.jpg
9969b57d0cada2e2181d28e084cce99e
1e0e331fe8581effcc1b5549118b7564d138d43c
99952 F20101207_AABLJF fan_j_Page_136.jpg
0dbbc4939e58795c374a81f45974266b
e26704c5e93ccdb29e4273f7761aea55fdbd7021
56462 F20101207_AABLIQ fan_j_Page_118.jpg
d7f588980496b2f50ef519157b06f13b
5d96f484f952e062737a30f2d4ee9b0a32185ce8
109599 F20101207_AABLJG fan_j_Page_138.jpg
2d621c9eadc36875df8c2a8b8522d486
e99059fb1235040e286a5a468a40317df770a11c
56049 F20101207_AABLIR fan_j_Page_119.jpg
340194f277cb1e0732cfe373182c18ca
3cf6014f1a5d9342e2e5c6e54e2588ce9b05bc49
50497 F20101207_AABLJH fan_j_Page_139.jpg
67842530959276449f31ee3ff03ab097
c326e0bef56cb5517ce34c933b313ed80f973828
56485 F20101207_AABLIS fan_j_Page_120.jpg
56ef1c5c521daafd8cf1817568836ef5
7565d09c539a1b99df8e94a92c3fbf987468ceea
30643 F20101207_AABLJI fan_j_Page_140.jpg
7d0a46ec25b088cb5cddf5e5c01a6c9b
2220ec25b0b1f9cee8a2186fcfee2c40b9f6e2e2
74733 F20101207_AABLIT fan_j_Page_121.jpg
963cd51480d26ac35c388022f3e5da2a
f372d81b26fe522e9c123b60864848e2206f7d46
20273 F20101207_AABLJJ fan_j_Page_001.jp2
fb439a1034b1eaeebc981ba8bb18e893
36575e75ac30948d385d14b883d77439d48fe29d
57514 F20101207_AABLIU fan_j_Page_122.jpg
4d26bf25e1b5b420795400a199f39f1f
ea57a249074a37874462dd61e7a2d4b0e1508179
4723 F20101207_AABLJK fan_j_Page_002.jp2
a4e9aeecc20a1b8bec6ee052d0c94a3c
ead066363bed9dcf02becda9f37c5324a0fbc413
71162 F20101207_AABLIV fan_j_Page_123.jpg
0d0b472521f6ebd38c1d9bd1270e1a29
10f7da9dfef8613746c5ee2172c2490befee4b63
57602 F20101207_AABLIW fan_j_Page_124.jpg
b5e9a766ed132694099cfdd1293673ce
4537ddfc9da8e98411ae5e6963b90fe70cd6be60
11567 F20101207_AABLJL fan_j_Page_003.jp2
784c9d685556eeda780986d51bdb4a09
07fa414ddca39eb63dbe84e35748bdf72190fe54
46361 F20101207_AABLIX fan_j_Page_125.jpg
9938cf2404494ed5eeaa5303672de2ce
9e95e8e867000310fd3cab7adc11e864e35f1502
1051978 F20101207_AABLKA fan_j_Page_024.jp2
749fae3bdeaec5bb63545aa22c8a8386
4d935ce49626d2cdbe3ba0249607e50a5610cd14
31086 F20101207_AABLJM fan_j_Page_004.jp2
4ffce85d7de077e0f73a6be43b57826c
c826708020fb68bcc9847479696e7ebb76a33c90
68413 F20101207_AABLIY fan_j_Page_126.jpg
82dd3769cf37500afd381bcce7f8af12
75d021b9764ec46fa41d358ef893da1bba1904f9
1051953 F20101207_AABLKB fan_j_Page_025.jp2
26029bd2fc7f32fcbc67037a79b8a776
4e4bba70491be910016adc85e4f9597368036efa
F20101207_AABLJN fan_j_Page_005.jp2
20ca1ca3a7e406afd21836ffde0669cb
8384f0e0d3512521619e97a2d0daf32241b1dd72
42601 F20101207_AABLIZ fan_j_Page_127.jpg
74ff154822ee3037b7b545c5bbcd094e
350ff4d48e31d4aa341828e79b54e781645b2941
1051891 F20101207_AABLKC fan_j_Page_026.jp2
702706ca80c7d43b85647c6f44175afd
3123e932516eb83fcddea76861ff75975a9f6824
665170 F20101207_AABLJO fan_j_Page_008.jp2
4169d2092fefd4a3098f021d97f94307
5c2730fced1ee63d3d83122215da8ffa9d85e1bf
1051969 F20101207_AABLKD fan_j_Page_027.jp2
ed03ad82c2f46f4ae8698f9ea998bb54
b67251457157d9c8d01ce356f5332f03989ac7d6
F20101207_AABLJP fan_j_Page_010.jp2
92f1c590b20a28d3f96f897892a99042
0a06f3594823f99f13d3b81ab8b79d93dea80cf3
1051965 F20101207_AABLKE fan_j_Page_028.jp2
d0209220c3a65b7eed0b475a964f3a8e
1b9dba312e5f177349165ae6ab596f890bd2a3c2
101935 F20101207_AABLJQ fan_j_Page_012.jp2
37faed2fa77c28551ec7063d341111f2
cdf7681d6c3ad8c1e6202d2332b4ac354aca0908
940763 F20101207_AABLKF fan_j_Page_029.jp2
69c62c4a1ad0c08c101fd0af221624fd
62ee27e69f0515b218140f152a50e6ec346fda1c
74166 F20101207_AABLJR fan_j_Page_013.jp2
ac3b2d26c8f26f96e8c480181c705cb9
41edbd2e8b4ed6a65f8daaf380cb1029115e2f1f
F20101207_AABLKG fan_j_Page_030.jp2
6a171a166b8628bc7aa2591e98e86c4c
ea74f46314cfd16ae5b557283633ea0fa0c8055b
1051963 F20101207_AABLJS fan_j_Page_014.jp2
6244738ecd32d5fed6c495cd5c159c1f
172b66c44136a5aed2190517cee3ebf07d4f70b2
982263 F20101207_AABLKH fan_j_Page_032.jp2
f240ba7dd7691ff3897564e3f38a641f
6ce0d89f63c9860cf70566d281d812b027399d22
1051971 F20101207_AABLJT fan_j_Page_016.jp2
2fb836876acec22dc492d221140c46b1
19bd3c183988b4e4df2067c0d982aa6e0f1e5049
744060 F20101207_AABLKI fan_j_Page_034.jp2
948953725e74d7a85c90baf7c66dcce6
3053ac18df28df60c7de2ddc6b8a8a4797425413
62760 F20101207_AABLJU fan_j_Page_017.jp2
f1fe20238906d55a78592944113ea028
04857cfc913f1ab1a8f19091f5649b6c6b60128d
1051966 F20101207_AABLKJ fan_j_Page_035.jp2
6705cea415b77cb60123229eaaed1976
bc1fbca7e4e87f283eb0febe728bbd0604835602
928946 F20101207_AABLKK fan_j_Page_038.jp2
1c6a5d7359d0f8962251137483c22013
7bee00aa4f8dbc7b5cdbd89e237b285f2f2ad384
F20101207_AABLJV fan_j_Page_018.jp2
65f59f1c25e69fce19dd7a6c6fa4701f
f158bf619c6dabb938db7148fa963220c4193812
F20101207_AABLKL fan_j_Page_039.jp2
26eaed25d307266398e0ffbaa033b35c
bf705455be364fb2cde9fc0b0ae1843b63d4560a
813191 F20101207_AABLJW fan_j_Page_019.jp2
59676993c1f9d963dbcd3efd231e632f
ce830c356c0f286b73b9b1fc176dac4087935749
48958 F20101207_AABLLA fan_j_Page_058.jp2
1fc9292f1e829f13c1526d82c3e7734a
f329435ef4c52e8f500c733a3ab8dbf86677b20e
873798 F20101207_AABLJX fan_j_Page_020.jp2
9d84f362cba880cca0a8bb3bb8def691
0f472e666178f0e286570affbd116c80147c0a9e
1051964 F20101207_AABLLB fan_j_Page_059.jp2
1d2a71fa5e2ec65cda4f0c5623035851
87d4afe19320c3953ff37a94aa4db4d83844cb13
F20101207_AABLKM fan_j_Page_040.jp2
2cfea61ca19179f0ecfad4ab26600f71
670f6528e0ec0848cc4e548ba110293907defbdc
979934 F20101207_AABLJY fan_j_Page_021.jp2
27ba92391dca828db29bfaa45d0d0ee0
2376a345fadc689aae29c4165087eea0b6df4c70
1051958 F20101207_AABLLC fan_j_Page_060.jp2
ec34bf7c470084ce591bb34892d0d396
2e49341d04011e0cf8b19f5ded29f2e2aafdadd1
F20101207_AABLKN fan_j_Page_041.jp2
1e69a02fdecba1177efd9e7c6795139f
d233b6e411d6f4a402c69b8342ee3b0b3f355634
1051979 F20101207_AABLJZ fan_j_Page_023.jp2
4ac273bbee8445dcdb11a97f5228f0ed
c34a9624abb73581308148366bbd0751a92b16e8
917529 F20101207_AABLLD fan_j_Page_061.jp2
52da9ec5d96397a1d12acce62075a167
1c6cd817e40ed78fe570d3a9ae1433ba548ed8a7
808341 F20101207_AABLKO fan_j_Page_042.jp2
4d1438c3497aaac29981de75fe6b76f6
be05b83f4af58c9f246f7331d56ea7438ea515fa
873021 F20101207_AABLLE fan_j_Page_062.jp2
1965f01ea678a852cf19807e978002a5
660611d81e547a6f91610c363388b037a9fd51ea
815448 F20101207_AABLKP fan_j_Page_043.jp2
92b62a271f3802e2bbca73f70b1a5006
41d90441f49c54b1bbedb29c3ca94a8d10658148
895851 F20101207_AABLLF fan_j_Page_063.jp2
1988efe2b78f18f3f75fa61c6c8c9dbe
fef60a14669ded96e2eaaff8b6a015267f91a686
773434 F20101207_AABLKQ fan_j_Page_044.jp2
7cbbfff5c8c6f0896dbf42350bd8b8fc
5e7b816d0c81cae2287958d44666295830a9429e
1041363 F20101207_AABLLG fan_j_Page_065.jp2
d2ae1282217ca0fa98a73f79deab51e4
d66bac2f1eced805e27cee72b611320144d1b680
915937 F20101207_AABLKR fan_j_Page_046.jp2
63d56a899a0738b2b519b214dad6c537
5b9753811227489d1c9c15657480e920ae84b49f
1022739 F20101207_AABLLH fan_j_Page_066.jp2
67c0c0ccaf77b9d900bbc151456fd5d9
e7055e559f88035c1a737becd87e9f37979d1e74
844802 F20101207_AABLKS fan_j_Page_047.jp2
3be3630be3de2d2d78eaf842473afa22
625dbba3e7fc65a6511a9ec37a9734992bb25a52
1051095 F20101207_AABLLI fan_j_Page_067.jp2
8ee0cf10975ca8f7deec74178297476d
f241307a62b0169a841be765795a697fbea0d077
978613 F20101207_AABLKT fan_j_Page_048.jp2
ea384ba633db9b194d184db9be752bda
d31eeef9ef4228e3102e7c2dca5deba9dcd046ab
941975 F20101207_AABLLJ fan_j_Page_068.jp2
af78d6f3590062671a794f6649e4deac
ce46c88cc91e85499971764c0d90892782d8f4e3
777341 F20101207_AABLKU fan_j_Page_050.jp2
60ca037f48ae544586727eddf001138d
f48314c449c72599bf88c28aa3e826dcc3ef07e9
977713 F20101207_AABLLK fan_j_Page_069.jp2
59cb17493d4ba476f0f63c792d1160fe
a8c44782e5b49f2be650c94a96e52298cd289cf5
919981 F20101207_AABLKV fan_j_Page_052.jp2
53260967c0bb04a1f19716de43d63d2d
6fb0ce81b5567959d3dc3a9581034e2bad232ee1
937495 F20101207_AABLLL fan_j_Page_070.jp2
6234dd7adb3509692fc4c1e5bce1a57c
b48e2973ed45c2a35159d27771fa81e205143d8d
1051896 F20101207_AABLKW fan_j_Page_053.jp2
3156a78d3d78b19fc887d59d8e0864e8
95631437a133ee985dc9093a773d4bda35db3ac3
745407 F20101207_AABLLM fan_j_Page_071.jp2
74a295589540f1c2c6b2feaf109fb679
b7ef40a1e2a42ad9df4faf4fa82dad2df97e8bb5
1006788 F20101207_AABLKX fan_j_Page_054.jp2
82578c531b670844ec37f195a951b18d
e48f70befe0192540a3f9f491950178c5724d3d0
1012457 F20101207_AABLMA fan_j_Page_087.jp2
30ad543b1f5a9d2f76d35678b318f3ce
3ee69ffe03c0727d42bf4ff72d94fceadb6ec525
970701 F20101207_AABLKY fan_j_Page_055.jp2
1bbc03286da87e6ef17073e442670871
abb68512b938a835299ab85fbe3a479fedb68f95
855093 F20101207_AABLMB fan_j_Page_088.jp2
0b278c97b9bdf8a35ff56b31560b58ca
09632a56c57b48a66afdcfa59d0d495ed9107f5b
1051933 F20101207_AABLLN fan_j_Page_072.jp2
09a9c868ecb79d890757129e647f89f9
47254da5ff35fefb97792aaaee80e7ba88d592e6
1051915 F20101207_AABLKZ fan_j_Page_057.jp2
fe9b45964164ef6aea6ea9434fa92401
38b8d870313594348ae15518a01268748656b6e3
975500 F20101207_AABLMC fan_j_Page_089.jp2
96a10e49827bda391e9aeb29dff8b70b
b91df19b36083a834ffdb2fc9f7b2edea20ccc2e
517881 F20101207_AABLLO fan_j_Page_073.jp2
796b2ea30d53269e8a9e200896dd19af
3fce6d3ffc2449a4ea1f9008aca1dfc4feb8babe
F20101207_AABLMD fan_j_Page_090.jp2
2eeb3fe9efe6ae2438a9adc5e41870e3
a66b26d217c432c0c3cdb128a4e2c4d6a97d11ea
837020 F20101207_AABLLP fan_j_Page_074.jp2
fcf92a35a9d28bb897b7d9a2e66c3b14
36d89222948e5828c1901570b8637efe97aa0c2c
F20101207_AABLME fan_j_Page_091.jp2
8e6d90f309cef8ba7cc322ef4da696ae
6c64e553e6eba05e1affe361bdf1800301d6e8a6
996188 F20101207_AABLLQ fan_j_Page_075.jp2
5347e97ec67664a69cfefbb3016a94e5
5c130ad1d8c2421daf700e88c4cca4b110035eb7
F20101207_AABLMF fan_j_Page_092.jp2
923a236cc5606e6c9db9a084e5ab30fd
5f438219248158ff56ba20d0fd47e07bcb576a18
1027803 F20101207_AABLLR fan_j_Page_076.jp2
fdb7fd35143f3a044b8d5a0358443eea
3c21962ed505a9b9927c3f9e172896183ac43200
128732 F20101207_AABLMG fan_j_Page_093.jp2
612c473badf2b4541072077e7f32f836
0d0b150d735a1cc19b1f625ba3fd234a87602dca
785454 F20101207_AABLLS fan_j_Page_077.jp2
238530537ea2579c8a1165765683131a
629df1bb762f5a163f0bfd4634351806e58987b6
1051946 F20101207_AABLMH fan_j_Page_095.jp2
300bed856eb6ff2659ca01f3a39f42b5
e948dbb0c8a9a5a59ba6bd7e389edf1f272f1291
881671 F20101207_AABLLT fan_j_Page_079.jp2
fa10e5c890038b8bffda72278906e82a
e92575d8c1fd7bc6714f6ecd73f82faeca13af2c
1023694 F20101207_AABLMI fan_j_Page_096.jp2
aba62225b11796ac326d0a334008b98c
d3eb76fa06b08636d6b76d444cbfda84e83eaac9
900094 F20101207_AABLLU fan_j_Page_080.jp2
641cd5879951fa5698972524407fae99
2177f2086734f2eed7ac12acb6c55849b53561a8
811360 F20101207_AABLMJ fan_j_Page_097.jp2
4ce6af1935f4510354121daefe664251
3858c2c1a3957a05da112a3a064f3f7b0f2e44cf
1043199 F20101207_AABLLV fan_j_Page_081.jp2
9d31ddbbb71756a3f02da5a9923657fd
846c6fda2863e4e38996668ef573e713692a2c41
987585 F20101207_AABLMK fan_j_Page_100.jp2
92eae78b7b7081fba87f8938fd89ff01
328cbe64071f2182a9fa0ada568ecfb412b624c1
584074 F20101207_AABLLW fan_j_Page_082.jp2
18e336a12f2f1aa292d47e800f929d4c
e61c82b60c1bed42d9394c27a96063b100ffcca9
637042 F20101207_AABLML fan_j_Page_101.jp2
92a3a4d8d43baf41dad5abb88ce2fc8c
62b6b93d6f5eec20cbfd3b0c7ad2c7afa6b9b997
1040449 F20101207_AABLLX fan_j_Page_084.jp2
7b45a737fa6ec32c64779e737555a5c2
97b0bf480507549f9caf8ae80ac98cf4a218af3b
762869 F20101207_AABLNA fan_j_Page_118.jp2
85b075e2df8d15bc8f78d6f6242bd6a4
f054ed51671aad4be32db8d554d0dc2f71f09b1c
551543 F20101207_AABLMM fan_j_Page_102.jp2
c213816f02b77fd8f0cbd2fbaef4194a
72b43e6e1eca65ad30c3fd38b86135603b3f62cf
1051919 F20101207_AABLLY fan_j_Page_085.jp2
c7bb24cbca991c7f8a22de5cac2c5855
a35bd7970e6d2a20f1c5612673f61b1e52729745
734283 F20101207_AABLNB fan_j_Page_119.jp2
c02bd754f98bb32c20b9546cd564d682
fa68a678773cf47fb48230a2e1fc10203d59dde7
F20101207_AABLMN fan_j_Page_103.jp2
22f363c74268973f6a3c6445ed6be453
d47569e49a923e22b28c76b052492102360e8dae
991961 F20101207_AABLLZ fan_j_Page_086.jp2
a5a37db00a394f2bb946d3696c5ef6a3
9877ea89cc34e0391c5961e2c2e03c0c2531ce24
721176 F20101207_AABLNC fan_j_Page_120.jp2
ead9be299e058d2ed12e474b04481dad
79988f255d070b36da0f0161c6efd9ef30c83253
624491 F20101207_AABLND fan_j_Page_125.jp2
29d9e11d226b4f6d4386a88518d2b6fd
d1652b3a209c19b9899e5bba5a978625e003b5ff
1051917 F20101207_AABLMO fan_j_Page_104.jp2
ee71983f4b715b588b3fc6cbd73b5c1f
4caeefcb4e54b6f6f18fd6983ad59774f2e7df25
556182 F20101207_AABLNE fan_j_Page_127.jp2
59bb1803e12fe6fa4996557278095527
19ac3b965e3bb167395fdc47f200190fcf31f44b
F20101207_AABLMP fan_j_Page_105.jp2
6fa265c2d4dc9fb421b67efba1642fae
48e7efe93f87a57b8a26b80eb4ddd8fc2d6d0ab2
968498 F20101207_AABLNF fan_j_Page_128.jp2
2ea02f2c42feb4dee093ec70e0cf4347
da6cf36e29c21b26b314ca9d33e0386c1579ebda
117151 F20101207_AABLMQ fan_j_Page_106.jp2
3d7aa0aa968587273befb3223ab48531
5618cd8469f00b0a8fefced106289ac812e0eaa9
120378 F20101207_AABLNG fan_j_Page_130.jp2
053d539cdaf4b05845ed94a786dc4f9f
356db7cc877aef32850fe1bd585a93b19d672aa7
1000269 F20101207_AABLMR fan_j_Page_108.jp2
cfd5a8a4a076e734a11c2dcbc4cd08e4
e53d924c3e971ca929ce2d2d32bea90a0938ed62
111285 F20101207_AABLNH fan_j_Page_131.jp2
7bccbb45b64b6b94d2193fddae453538
63b24cd0cf0e06f3aae5ff8342cc05eba735c54c
985030 F20101207_AABLMS fan_j_Page_109.jp2
74e575103e44b66840b13a6eeb0d6d18
8cf71db8cb1ddc399ed5873f73ffe5e2643183e8
60150 F20101207_AABLNI fan_j_Page_132.jp2
f6aa9f44177a7dbfd271e58de7076f2b
798bedd8501c3e54d1249b5b0e05b53fb98be474
844583 F20101207_AABLMT fan_j_Page_110.jp2
5550a75cd66d4a10f8b48e156eb9197d
e4b35a5663c7875d9e0352941ee5259cbbef2fa5
298611 F20101207_AABLNJ fan_j_Page_133.jp2
a61e6678bae452cedda8c81d74d2d0e4
bb783fc7d38af09fc86c602501c3438687a4fa74
663924 F20101207_AABLMU fan_j_Page_111.jp2
4d73b148120b47ebe93d1f4ea1d31c43
59fdfae299bef39e732ff10a473550ed63511c1f
1051972 F20101207_AABLNK fan_j_Page_136.jp2
e74bd8804e2c28b8e4b6f6f142660aa2
84d305b27e65e1d4ba606e8c4fa36646f5148efb
650842 F20101207_AABLMV fan_j_Page_112.jp2
f5054966aa9b43668e36bb4cff5afcd0
17fee0a56ec12658e38046239ba5f65757eb8d87
F20101207_AABLNL fan_j_Page_137.jp2
eeec25bad148f9aa4a079c7d02733dea
ce09f2cf3f1a224df14667e2099fddeb86b0a998
451088 F20101207_AABLMW fan_j_Page_113.jp2
5d8f3171b44591d4c0dca047d901369d
6f3b02a6c385d8c7ee148438af96d7fa98897a6d
F20101207_AABLOA fan_j_Page_020.tif
79b82150ce35daf131fb3e157a6b3058
4360de8b6d28ac9f5039a78edf75812421ac72a5
1051951 F20101207_AABLNM fan_j_Page_138.jp2
81fd4bd770eadd0dff78689e9febf098
0d001988a8fb790ea9c618b92a88946ebf7bd6a2
726783 F20101207_AABLMX fan_j_Page_114.jp2
1de89d953153656c9b5a73f6e6d78573
6207243db242841ac00c81da28d6dca7750ee532
F20101207_AABLOB fan_j_Page_021.tif
45af80099fc5bc7bf1b81bbe4094e18e
da5c4d0fbd7653ac557d8581e5aac15b6ba7301c
726314 F20101207_AABLNN fan_j_Page_139.jp2
d3df205cec3b5fd2ade794db58cc802a
79fc5f59344bd1586fea42dd68cf085600207d8f
F20101207_AABLMY fan_j_Page_116.jp2
7bffbd1ddefe5dba980fb100db9d4fab
4ed626d34e4b3ef8968850d1d6633bc0b182c08e
F20101207_AABLOC fan_j_Page_022.tif
a0f83f2846a80b3a5a3f182dc05bcf86
ab0fd9e86acbb7bdc580c076d5d950c1f25be930
39089 F20101207_AABLNO fan_j_Page_140.jp2
a00723baa73199d1882c5cc8d1fe7178
a6379f88eb4bdb784c064205d714c0b57cde1506
673015 F20101207_AABLMZ fan_j_Page_117.jp2
24125c4eeaa8099e9b6428aa8ae18deb
57b6146968230199875abf2572a2168e09dc4bbb
F20101207_AABLOD fan_j_Page_023.tif
d9223a83ff9cf7459b676eb6195f726d
cd71595d09079c8ff13c4fbd73ee0bd390f47bd5
F20101207_AABLOE fan_j_Page_024.tif
0507fd4f26afac04d8aac335947c2b4f
1e765ab5f51606601d41b76d6ad9d9e30b96e781
F20101207_AABLNP fan_j_Page_003.tif
ff08cde69f108802dc10336c3442298b
ac0ba646db0fc144a1b80dab4259ffee4072a692
F20101207_AABLOF fan_j_Page_025.tif
d971e6f745d710aabc18062b7d601a76
9bf20bc7db99e2417b88996e33c69fca6917517c
F20101207_AABLNQ fan_j_Page_004.tif
3992841cd4ec04503672b8be83340925
ca53ed55d97a03f3007659c265c82a3a30d9853f
F20101207_AABLOG fan_j_Page_026.tif
e5b5b0889518e4640436ca355110596e
058edb15cbd3f9563281af14bc87954de28db53b
F20101207_AABLNR fan_j_Page_007.tif
338b0b52a28ffc7a2ec91f01c71d4d12
9a74da2f8994b5f03935bc83cd6ed12c8a313a4d
F20101207_AABLOH fan_j_Page_028.tif
6acc78a10b8dceae8a525334b98dee5f
3b7084167be839568065de2f87e5ecdef2231069
F20101207_AABLNS fan_j_Page_009.tif
5098f26e7e1446332da701e4d366233a
b4a30d52b017a1494cfdad80f3ae0a84f8bce4f3
F20101207_AABLOI fan_j_Page_029.tif
38f9ed7f0a44941a5c63eb9ddb2dfd02
2cc3c61452d84590409e9962ec94b4dc80402a59
F20101207_AABLNT fan_j_Page_011.tif
c38b0aa0bcdb75c8ad080c5c1c9c6f98
fae7bf68c92a023b8a18141616a66b4ef07d43ad
F20101207_AABLOJ fan_j_Page_030.tif
5a93322f03ef663c75f2a2a238c982e8
6a5a06d6ca8d1fa10a09f129025e5014a5acbcdd
F20101207_AABLNU fan_j_Page_012.tif
f2466634e3550cac3ce15af06ff8554a
9aa9c9b4082a424a9237b54735821672e02313a8
F20101207_AABLOK fan_j_Page_031.tif
652164ffec901fa7ccf6587cfdd1c382
809658f046eed523a8cee1bdc42c59587d4e5138
F20101207_AABLNV fan_j_Page_013.tif
40071b1cf9d22c8320fad2fd1f94326e
9ce0e9999d867c5ea9d95e1c62e0ff76014541e6
F20101207_AABLOL fan_j_Page_032.tif
d565f191af7d4d4e9f12609b7983b3ff
f382d84ba9981087748f88b2f1b0e3182015d3e4
F20101207_AABLNW fan_j_Page_014.tif
3e7fa770c9f18d8072c0f87e9aacdc34
9c95cb48a0c195adf293f8fab9cabdac91580fa0







NETWORK CENTRIC TRAFFIC ANALYSIS


By

JIEYAN FAN



















A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2007

































© 2007 Jieyan Fan

































To those who sparked my interest in science, opening for me the door to discovering nature and letting me walk through it in my own way.









ACKNOWLEDGMENTS

First of all, I thank my advisor, Professor Dapeng Wu, for his great inspiration, excellent guidance, deep thoughts, and friendship. I also thank my supervisory committee members, Professors Shigang Chen, Liuqing Yang, and Tao Li, for their interest in my work.

I also express my appreciation to all of the faculty, staff, and my fellow students in the Department of Electrical and Computer Engineering. In particular, I extend my thanks to Dr. Kejie Lu for his helpful discussions.









TABLE OF CONTENTS
page

ACKNOW LEDGMENTS ................................. 4

LIST OF TABLES ....................... ............. 8

LIST OF FIGURES .................................... 9

ABSTRACT . . . . . . . . . . 12

CHAPTER

1 INTRODUCTION ...................... .......... 14

1.1 Introduction to Network Anomaly Detection ....... ......... 14
1.2 Introduction to Network Centric Traffic Classification ..... ...... 16

2 NETWORK ANOMALY DETECTION FRAMEWORK ............ .18

2.1 Introduction . .. . . . .. . . .. 18
2.2 Edge-Router Based Network Anomaly Detection Framework ........ 18
2.2.1 Traffic Monitor ............... ......... .. 20
2.2.2 Local Analyzer ............... ......... .. 20
2.2.3 Global Analyzer ............... ......... .. 21
2.3 Summary . ............... ............ .. 22

3 FEATURES FOR NETWORK ANOMALY DETECTION . . .... 23

3.1 Introduction ............... . . .. 23
3.2 Hierarchical Feature Extraction Architecture ................ .. 24
3.2.1 Three-Level Design .................. ........ .. 24
3.2.2 Feature Extraction in a Traffic Monitor . . ..... 26
3.2.3 Feature Extraction in a Local Analyzer or a Global Analyzer . 27
3.3 Two-Way Matching Features .................. ....... .. 27
3.3.1 M otivation ...... ... .. ....... ...... 27
3.3.2 Definition of Two-Way Matching Features . . 30
3.4 Basic Algorithms .................. ............. .. 32
3.4.1 Hash Table Algorithm .................. ..... .. 32
3.4.2 Bloom Filter . . . . . . .. 33
3.5 Bloom Filter Array (BFA) .................. ........ .. 35
3.5.1 Data Structure .................. .......... .. 35
3.5.2 Algorithm .................. ............. .. 36
3.5.3 Round Robin Sliding Window .................. .. 38
3.5.4 Random-Keyed Hash Functions ............. .... .. .. 39
3.6 Complexity Analysis .................. ........... .. 40
3.6.1 Space/Time Trade-off .................. .. .... .. 41
3.6.2 Optimal Parameter Setting for Bloom Filter Array . ... 50









3.7 Sim ulation Results . .. .. .. ... .. .. .. ... .. .
3.7.1 The BFA Algorithm vs. the Hash Table Algorithm ......
3.7.2 Experiment on Feature Extraction System .. ........
3.8 Sum m ary . . . . . . . .

4 MACHINE LEARNING ALGORITHM FOR NETWORK ANOMALY
D ETECTIO N . . . . . . . . .


4.1 Introduction..... . . .
4.1.1 Receiver Operating C'!h i o :teristics Curve .
4.1.2 Threshold-Based Algorithm . .....
4.1.3 C!I i;,. -Point Algorithm . .......
4.1.4 B i -i ,i Decision Theory . ......
4.2 B li, -i ,'l Model for Network Anomaly Detection .
4.2.1 B li, -i il Model for Traffic Monitors and Local
4.2.2 B li -i i' Model for Global Analyzers . .
4.2.3 Hidden Markov Tree (HMT) Model for Global
4.3 Estimation of HMT Parameters . .......
4.3.1 Likelihood Estimation . ........
4.3.2 Transition Probability Estimation . ..
4.4 Network Anomaly Detection Using HMT . .
4.5 Simulation Results . ..............
4.5.1 Experiment Setting . ..........
4.5.2 Performance Comparison . ......
4.5.3 D discussion . . . . .
4.6 Sum m ary . . . . . .


Analyzers

Analyzer


5 NETWORK CENTRIC TRAFFIC CLASSIFICATION: AN OVERVIEW ....

5.1 Introduction ........... . . ......
5.2 Related Work . . . . .....
5.3 Intuitions Behind a Proper Detection of Voice and Video Streams . .
5.3.1 Packet Inter-Arrival Time and Packet Size in Time Domain . .
5.3.2 Packet Inter-Arrival Time in Frequency Domain . . .
5.3.3 Packet Size in Frequency Domain . . . .
5.3.4 Combining Packet Inter-Arrival Time and Packet Size in Frequency
D om ain . . . . . .. ..
5.4 Sum m ary . . . . . . . . .....

6 NETWORK CENTRIC TRAFFIC CLASSIFICATION SYSTEM . .

6.1 System Architecture...... . . .....
6.1.1 Flow Summary Generator (FSG) . . ....
6.1.2 Feature Extactor (FE) and Voice/Video Subspace Generator (SG)
6.1.3 Voice/Video CLassifer (CL) . . . . .....
6.2 Feature Extractor (FE) Module via Power Spectral Density (PSD) . .
6.2.1 Modeling the network flow as a stochastic digital process . .


59
59
60
60
62
64
64
66
68
72
72
76
81
84
84
86
88
89

90

90
94
95
97
99
99

100
102

104

104
105
105
106
107
107


............










6.2.2 Power Spectral Density (PSD) Computation
6.3 Subspace Decomposition and Bases Identification on PSD Features
6.3.1 Subspace Decomposition Based on Minimum Coding Length
6.3.2 Subspace Bases Identification
6.4 Voice/Video Classifier
6.5 Experiment Results
6.5.1 Experiment Settings
6.5.2 Skype Flow Classification
6.5.3 General Flow Classification
6.5.4 Discussion
6.6 Summary

7 CONCLUSION AND FUTURE WORK




Summary of Network Centric Anomaly Detection
Summary of Network Centric Traffic Classification


APPENDIX

A PROOFS

Equation (4-31)
Equation (4-32)
Equation (4-33)
Equation (4-34)

REFERENCES

BIOGRAPHICAL SKETCH









LIST OF TABLES


Table

3-1 Notations for two-way matching features

3-2 Notations for complexity analysis

3-3 Space/time complexity for hash table, Bloom filter, and BFA

4-1 Parameters used in CUSUM

4-2 Notations for hidden Markov tree model

4-3 Parameter setting of feature extraction for network anomaly detection

4-4 Performance of different schemes

5-1 Commonly used speech codecs and their specifications

6-1 Typical PD and PFA values









LIST OF FIGURES


Figure

2-1 An ISP network architecture.

2-2 Network anomaly detection framework.

2-3 Responsibilities of and interactions among the traffic monitor, local analyzer, and global analyzer.

2-4 Example of asymmetric traffic whose feature extraction is done by the global analyzer.

3-1 Hierarchical structure for feature extraction.

3-2 Network in normal condition.

3-3 Source-address-spoofed packets.

3-4 Reroute

3-5 Hash Table Algorithm

3-6 Bloom Filter Operations

3-7 Scenarios of the problems caused by Bloom filter. (a) Boundary problem. (b) An outbound packet arrives before its matched inbound packet with 2 tl < F.

3-8 Bloom Filter Array Algorithm

3-9 Bloom Filter Array Algorithm using sliding window

3-10 Space/time trade-off for the hash table, BFA with q = 0.1, and BFA with q = 1

3-11 Relation among space complexity, time complexity, and collision probability. (a) 3. : vs. q. (b) E [T,]* vs.

3-12 Space complexity vs. collision probability for fixed time complexity.

3-13 Memory size (in bits) vs. average processing time per query (in μs)

3-14 Average processing time per query (in μs) vs. average number of hash function calculations per query

3-15 Comparison of numerical and simulation results. (a) Hash table algorithm. (b) BFA algorithm with q = 1

3-16 Feature data: (a) Number of SYN packets (link 1), (b) Number of unmatched SYN packets (link 1), (c) Number of SYN packets (link 2), and (d) Number of unmatched SYN packets (link 2).

4-1 Generative process in graphical representation, in which the traffic state generates the stochastic process of traffic.

4-2 Extended generative model including traffic feature vectors: (a) original model and (b) simplified model.

4-3 Generative independent model that describes dependencies among traffic states and traffic feature vectors.

4-4 Generative dependent model that describes dependencies among edge routers.

4-5 Hidden Markov tree model. For a node i, p(i) denotes its parent node and v(i) denotes the set of its children nodes.

4-6 Probability density function of the univariate Gaussian distribution N(x; 0, 1).

4-7 Histogram of the two-way matching features measured at a real network during network anomalies.

4-8 The EM algorithm for estimating p(i(IQ = u), i E u E {0,1}.

4-9 Iteratively estimate transition probabilities.

4-10 Belief propagation algorithm.

4-11 Viterbi algorithm for HMT decoding.

4-12 Experiment Network

4-13 Performance of threshold-based and machine learning algorithms with different feature data

4-14 Performance of four detection algorithms

5-1 Average packet size versus inter-arrival variability metric for 5 applications: voice, video, file transfer, file transfer mixed with voice, and file transfer mixed with video.

5-2 Inter-arrival time distribution for voice and video traffic

5-3 Packet size distribution for voice and video traffic

5-4 Power spectral density of two sequences/traces of time-varying inter-arrival times for voice traffic

5-5 Power spectral density of two sequences of time-varying inter-arrival times for video traffic

5-6 Power spectral density of two sequences of discrete-time packet sizes for voice traffic

5-7 Power spectral density of two sequences of discrete-time packet sizes for video traffic

5-8 Power spectral density of two sequences of continuous-time packet sizes for voice traffic

5-9 Power spectral density of two sequences of continuous-time packet sizes for video traffic

6-1 VOVClassifier System Architecture

6-2 Power spectral density features extraction module. Cascade of processing steps.

6-3 Levinson-Durbin Algorithm.

6-4 Parametric PSD Estimate using Levinson-Durbin Algorithm.

6-5 Pairwise steepest descent method to achieve minimal coding length.

6-6 Function IdentifyBases identifies bases of subspace.

6-7 Function VoiceVideoClassify determines whether a flow with PSD feature vector b is of type voice or video or neither. θ1 and θ2 are two user-specified threshold arguments. Function VoiceVideoClassify uses Function NormalizedDistance to calculate normalized distance between a feature vector and a subspace.

6-8 The ROC curves of single-typed flows generated by Skype, (a) VOICE and (b) VIDEO.

6-9 The ROC curves of hybrid flows generated by Skype, (a) VOICE, (b) VIDEO, (c) FILE+VOICE, and (d) FILE+VIDEO.

6-10 The ROC curves of single-typed flows generated by Skype, MSN, and GTalk: (a) VOICE and (b) VIDEO.

6-11 The ROC curves of hybrid flows generated by Skype, MSN, and GTalk: (a) VOICE, (b) VIDEO, (c) FILE+VOICE, and (d) FILE+VIDEO.









Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

NETWORK CENTRIC TRAFFIC ANALYSIS

By

Jieyan Fan

December 2007

Chair: Dapeng Oliver Wu
Major: Electrical and Computer Engineering

Over the past few years, the Internet infrastructure has become a critical part of the global communications fabric. The emergence of new applications and protocols (such as voice over Internet Protocol, peer-to-peer, and video on demand) also increases the complexity of the Internet. All these trends increase the demand for more reliable and secure service, and have heightened the interest of Internet service providers (ISPs) in network centric traffic analysis.

Our study considers network centric traffic analysis from the two perspectives that most interest ISPs: network centric anomaly detection and network centric traffic classification.

In the first part of our research, we focus on network centric anomaly detection. Despite the rapid advance in networking technologies, detection of network anomalies at high-speed switches/routers is still far from mature. To push the frontier, two major technologies need to be addressed. The first is efficient feature-extraction algorithms/hardware that can match a line rate on the order of Gb/s. The second is fast and effective anomaly detection schemes. Our study addresses both issues. The novelties of our scheme are the following. First, we design an edge-router based framework that detects network anomalies as they first enter an ISP's network. Second, we propose the so-called two-way matching features, which are effective indicators of network anomalies. We also design a data structure to extract the features efficiently. Our detection scheme exploits both temporal and spatial correlations among network traffic. Simulation results show that our scheme can detect network anomalies with high accuracy, even if the volume of abnormal traffic on each link is extremely small.

In the second part, we focus on network centric traffic classification. Nowadays, VoIP and IPTV are becoming increasingly popular. To tap the potential profits that VoIP and IPTV offer, carrier networks must efficiently and accurately manage and track the delivery of IP services. Yet the emergence of a host of new zero-day voice and video applications such as Skype, Google Talk, and MSN poses tremendous challenges for ISPs. The traditional approach of using port numbers to classify traffic is infeasible because these applications use dynamic port numbers. The proliferation of proprietary protocols and the use of encryption techniques make application-level analysis infeasible. Our study focuses on a statistical pattern classification technique to identify multimedia traffic. In particular, we focus on detecting and classifying voice and video traffic. We propose a system (VOVClassifier) for voice and video traffic classification that uses the regularities residing in multimedia streams. Experimental results demonstrate the effectiveness and robustness of our approach.









CHAPTER 1
INTRODUCTION

Over the past few years, the Internet infrastructure has become a critical part of the global communications fabric. A survey by the Internet Systems Consortium (ISC) shows that the number of hosts advertised in the domain name system (DNS) [1, 2] rose from approximately 9,472,000 in January 1996 to 394,991,609 in January 2006. In addition, the emergence of new applications and protocols, such as voice over Internet Protocol (VoIP), peer-to-peer (P2P), and video on demand (VoD) [3], also increases the complexity of the Internet. Accompanying this trend is an increasing demand for more reliable and secure service. A major challenge for Internet service providers (ISPs) is to better understand the network state by analyzing network traffic in real time. Thus ISPs are very interested in the problem of network centric traffic analysis.

We consider the network centric traffic analysis problem from two perspectives: 1) network anomaly detection and 2) network centric traffic classification. We introduce the two perspectives in the next two sections.

1.1 Introduction to Network Anomaly Detection

With the rapid growth of the Internet, detection of network anomalies has become a major concern in both industry and academia, since it is critical to maintaining the availability of network services. Abnormal network behavior is usually a symptom of potential unavailability in that:

* Network anomalies are usually caused by malicious behavior, such as denial-of-service (DoS) attacks, distributed denial-of-service (DDoS) attacks, worm propagation, network scans, or email spam;

* Even when they have unintentional causes, network anomalies are often accompanied by network congestion or router failures.

However, detecting network anomalies is not an easy task, especially at high-speed routers. One of the main difficulties arises from the fact that the data rate is too high to afford complicated data processing. An anomaly detection algorithm usually works with traffic features instead of the original traffic data itself. Traffic features can be regarded as succinct summaries of the voluminous traffic (e.g., the traffic data rate is a feature of the traffic). We study two major issues in feature extraction for network anomaly detection:

* what features to extract (i.e., which features best distinguish normal from abnormal network states);

* how to extract features efficiently enough to keep up with the line rate of high-speed routers (e.g., on the order of Gb/s).

Our research addresses both issues.

In addition to traffic feature extraction, another difficulty lies in classifying the network state based on the extracted features. Given the same feature set, different classification schemes have different performance. The difficulty lies in how to make decisions on the network state both efficiently and accurately. In this dissertation, we address this problem by designing a machine learning algorithm that exploits spatial correlations among edge routers.

Specifically, our major contributions in network anomaly detection include, but are not limited to:

* designing a framework, deployed on edge routers, that detects network anomalies based on both local information and global information;

* proposing the so-called two-way matching features, which make significant distinctions between normal and abnormal network states, and designing the Bloom filter array data structure to extract the two-way matching features efficiently;

* designing a machine learning algorithm that detects network anomalies accurately, by exploiting spatial correlations among edge routers, and efficiently, by employing the hidden Markov tree data structure.

Analysis and simulation results show that our framework is capable of detecting network anomalies accompanied by low-volume traffic, which is important for catching network anomalies as soon as they enter the network. For example, for low-volume DDoS attacks, given the same false alarm probability, our scheme has a detection probability of 0.97, whereas the existing scheme has a detection probability of 0.17, which demonstrates the superior performance of our scheme.









1.2 Introduction to Network Centric Traffic Classification

Besides network anomaly detection, classification of normal network traffic is also of practical significance to both enterprise network administrators and ISPs. With the rapid emergence of new types of network applications such as VoIP, VoD, and P2P file exchange, quality of service (QoS) has become an increasingly important issue. For example, transmission of real-time voice and video has bandwidth, delay, and loss requirements. However, there is no QoS guarantee for these real-time applications over the current best-effort network. Many schemes have been proposed to address this problem. On the other hand, enterprise network administrators may want to restrict the network bandwidth used by disallowed VoIP, VoD, or P2P applications rather than block them entirely, which might be too drastic. That is, they want to limit the QoS of specific network traffic.

Wu et al. [4] summarized techniques for QoS provisioning for real-time streams from the point of view of end hosts. These techniques include coding methods, protocols, and requirements on stream servers. Another effective solution comes from the point of view of network carriers or ISPs. For example, ISPs can assign different forwarding priorities to different types of network traffic on routers. This is the motivation for differentiated services (DiffServ) [5, 6].

DiffServ is a method designed to guarantee different levels of QoS for different classes of network traffic. It is achieved by setting the "type of service" (TOS) [7] field in the IP header, whose upper six bits are redefined as the DiffServ code point (DSCP) [5], according to the class of the network data, so that better classes get higher numbers. Unfortunately, such a design depends heavily on network protocols, especially proprietary protocols, observing DiffServ regulations. In the worst case, if all protocols set TOS to the highest number, employing DiffServ is even worse than not employing it at all.
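To make the header field concrete: under RFC 2474 the former IPv4 TOS octet is the DS field, whose six most significant bits carry the DSCP (the two least significant bits are used for ECN). The minimal sketch below reads the DSCP from a raw IPv4 header, assuming the header bytes are already available; the helper name `dscp_of` and the sample header are hypothetical.

```python
def dscp_of(ipv4_header: bytes) -> int:
    """Return the 6-bit DSCP of a raw IPv4 header (hypothetical helper)."""
    if len(ipv4_header) < 20 or ipv4_header[0] >> 4 != 4:
        raise ValueError("not an IPv4 header")
    ds_field = ipv4_header[1]   # second octet: former TOS, now the DS field
    return ds_field >> 2        # upper 6 bits are the DSCP; lower 2 bits are ECN


# Hypothetical 20-byte header whose DS field is 0xB8, i.e., DSCP 46 (Expedited Forwarding).
header = bytes([0x45, 0xB8]) + bytes(18)
print(dscp_of(header))
```

Since any sender may write any value into this octet, a router that trusts it is exposed to exactly the worst case described above.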

For this reason, we believe a proper DiffServ scheme should be able to classify network traffic on the fly, instead of relying on any tags in the packet header. Thus, the difficulty lies in accurate classification of network traffic in real time.









Yet the emergence of a host of new zero-day voice and video applications such as Skype, Google Talk, and MSN poses tremendous challenges for ISPs. The traditional approach of using port numbers to classify traffic is infeasible because these applications use dynamic port numbers. In the second part of our research, we focus on a statistical pattern classification technique to identify multimedia traffic. Based on the intuition that voice and video data streams show strong regularities in the packet inter-arrival times and the associated packet sizes when combined in one single stochastic process, we propose a system, called VOVClassifier, for voice and video traffic classification. VOVClassifier is an automated self-learning system that classifies traffic data by extracting features from the frequency domain using Power Spectral Density analysis and grouping features using Subspace Decomposition. We applied VOVClassifier to real packet traces collected from different network scenarios. Results demonstrate the effectiveness and robustness of our approach.
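The regularity argument can be illustrated with a plain periodogram: a packet-size sequence that repeats every few packets concentrates its power in a few frequency bins. The sketch below is only an illustration on assumed data. The function name `periodogram` and the synthetic trace are hypothetical, and this is not the PSD estimator used by VOVClassifier, which Chapter 6 builds with a parametric estimate based on the Levinson-Durbin algorithm.

```python
import cmath
import math


def periodogram(x):
    """Periodogram |X[k]|^2 / N of the mean-removed sequence x, for k = 0..N//2."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]  # remove the DC component before the DFT
    power = []
    for k in range(n // 2 + 1):
        # Direct O(N^2) DFT; adequate for a short illustrative trace.
        coeff = sum(c * cmath.exp(-2j * math.pi * k * t / n) for t, c in enumerate(centered))
        power.append(abs(coeff) ** 2 / n)
    return power


# Hypothetical packet-size trace in bytes: a 4-packet pattern repeated 16 times (N = 64).
sizes = [160, 80, 40, 40] * 16
psd = periodogram(sizes)
peak_bin = max(range(1, len(psd)), key=lambda k: psd[k])  # skip bin 0 (DC)
print(peak_bin, len(sizes) / peak_bin)  # dominant bin and the period (in packets) it implies
```

For this trace the dominant bin is 16 of 64, i.e., a period of 4 packets, matching the repeating pattern; an irregular trace such as bulk file transfer would spread its power far more evenly.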









CHAPTER 2
NETWORK ANOMALY DETECTION FRAMEWORK

2.1 Introduction

The first issue of network anomaly detection is to design a framework. There are two types of network anomaly detection frameworks: host-based frameworks and network-based frameworks. Host-based frameworks are deployed on end hosts. These frameworks typically use firewalls and intrusion detection systems (IDS), and/or balance the load among multiple (geographically dispersed) servers to defend against network anomalies. Host-based approaches can help protect the server system, but they may not be able to protect legitimate access to the server, because high-volume abnormal traffic may congest the incoming link to the server.

On the other hand, network-based frameworks are deployed inside networks, e.g., on routers. These frameworks are responsible for detecting network anomalies and identifying abnormal packets/flows or anomaly sources. To detect network anomalies, signal processing techniques (e.g., wavelets [8], spectral analysis [9, 10], and statistical methods [11-13]) and machine learning techniques [14] can be used. To identify network anomaly sources, IP traceback [15] is typically used. IP traceback techniques can help contain the attack sources, but they require large-scale deployment of the same IP traceback technique and modification of existing IP forwarding mechanisms (e.g., IP header processing).

This chapter presents our network anomaly detection framework, which belongs to the network-based category. We present our framework design in Section 2.2 and summarize this chapter in Section 2.3.

2.2 Edge-Router Based Network Anomaly Detection Framework

To detect network anomalies in an ISP network, we designed an edge-router based network anomaly detection framework. The design is motivated by the ISP network architecture shown in Figure 2-1, which consists of two types of IP routers: core routers and edge routers. Core routers interconnect with one another to form a high-speed autonomous system (AS). In contrast, edge routers are responsible for connecting subnets (i.e., customer networks or other ISP networks) with the AS. In this dissertation, a subnet can be either a customer network or an ISP network.

Figure 2-1. An ISP network architecture.

Figure 2-2. Network anomaly detection framework. Legend: traffic monitor; directional link between autonomous system and subnet.
























Figure 2-3. Responsibilities of and interactions among the traffic monitor, local analyzer,
and global analyzer.


Given such an ISP network architecture, we design a framework to detect network anomalies. Our framework (Figure 2-2) consists of three types of components: traffic monitors, local analyzers, and a global analyzer. Figure 2-3 summarizes the functionalities of each type of component and their interactions. Next, we discuss the functionalities of traffic monitors, local analyzers, and the global analyzer in Sections 2.2.1, 2.2.2, and 2.2.3, respectively.

2.2.1 Traffic Monitor

A traffic monitor (represented by a filled oval in Figure 2-2) is responsible for:

* scanning some or all packets of a single unidirectional link;

* summarizing traffic characteristics;

* extracting simple features from the traffic characteristics;

* making decisions (e.g., declaring a network anomaly or classifying the type of normal traffic) on one single unidirectional link; and

* reporting the summary of traffic information, simple feature data, and decisions to a local analyzer.

2.2.2 Local Analyzer

A local analyzer is responsible for:

* extracting complicated features from traffic information obtained at a single edge router;

* making decisions based on local traffic information (i.e., one edge router); and

* reporting decisions, feature data, and summary of traffic information (if necessary) to a global analyzer.

The local analyzer can utilize temporal correlation of traffic to generate feature data.

2.2.3 Global Analyzer

A global analyzer is responsible for:

* extracting complicated features that require global information, such as routing
information, from traffic;

* analyzing feature data obtained from multiple local analyzers; and

* making decisions with global information obtained from multiple edge routers.

Figure 2-4. Example of asymmetric traffic whose feature extraction is done by the global analyzer.


The global analyzer has a global view of the whole network. Hence, it can exploit both temporal and spatial correlations of traffic. It is important to note that some feature data must be obtained at the global analyzer if global information is required. For example, in Figure 2-4, if the traffic from subnet A to server B passes through edge router X, and the traffic from server B to subnet A passes through edge router Y, then the so-called two-way matching features between subnet A and server B must be obtained at the global analyzer, which has the routing information of the ISP network.
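Why this feature needs the global view can be shown with a toy example. Assume edge router X reports only the SYN from A to B and edge router Y reports only the returning SYN/ACK from B to A. Neither local analyzer can match the pair on its own, but merging both reports can. The report format and the function `match_handshakes` are hypothetical, and exact per-pair counting stands in for the Bloom filter array that Chapter 3 uses for efficiency.

```python
from collections import Counter


def match_handshakes(reports):
    """Return (matched, unmatched) SYN counts over the merged per-router reports.

    Each report is a list of (kind, src, dst) tuples with kind in {"SYN", "SYNACK"};
    a SYN from a to b is matched by a SYN/ACK from b to a.
    """
    syns, synacks = Counter(), Counter()
    for report in reports:
        for kind, src, dst in report:
            if kind == "SYN":
                syns[(src, dst)] += 1
            elif kind == "SYNACK":
                synacks[(dst, src)] += 1  # key by the direction of the original SYN
    matched = sum(min(n, synacks[pair]) for pair, n in syns.items())
    return matched, sum(syns.values()) - matched


router_x = [("SYN", "A", "B")]     # direction A -> B, seen only at edge router X
router_y = [("SYNACK", "B", "A")]  # direction B -> A, seen only at edge router Y

print(match_handshakes([router_x]))            # local view at X: the SYN looks unmatched
print(match_handshakes([router_x, router_y]))  # global view: the SYN is matched
```

Under asymmetric routing, a local analyzer at X would therefore count every legitimate handshake as unmatched, which is why this feature is assigned to the global analyzer.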

The advantages of our framework design are that:

1. it is deployed on edge routers instead of end-user systems, so that it can detect network anomalies at the point where they first enter an AS;

2. it places no burden on core routers;

3. it is flexible in that detection of network anomalies can be made both locally and globally;

4. it is capable of detecting low-volume network anomalies accurately by exploiting spatial correlations among edge routers.

The framework is designed to be an add-on service provided by ISPs to protect end users from network anomalies.

2.3 Summary

This chapter is concerned with the design of network anomaly detection frameworks. There are two types of frameworks, host-based and network-based; our design is of the second type. Specifically, we designed a framework deployed on edge routers. It is composed of three components: traffic monitors, local analyzers, and global analyzers. This framework is flexible in that it can detect network anomalies from both a local and a global view of the network. By exploiting spatial correlations among edge routers, our framework is capable of detecting low-volume network anomalies.









CHAPTER 3
FEATURES FOR NETWORK ANOMALY DETECTION

3.1 Introduction

Given the network anomaly detection framework we have established, the second issue of network anomaly detection is feature extraction. Features for network anomaly detection have been studied extensively in recent years. For example, Peng et al. [12] proposed using the number of new source IP addresses to detect DDoS attacks, under the assumption that the source addresses of IP packets observed at an edge router are relatively static in normal conditions compared with during DDoS attacks. Peng further pointed out that the feature could differentiate DDoS attacks from a flash crowd, which is the situation where many legitimate users start to access one service at the same time, for example, when many people watch a live sports broadcast over the Internet simultaneously. In both cases (DDoS attacks and a flash crowd), the traffic rate is high. But during DDoS attacks, the edge routers observe many new source IP addresses, because attackers usually spoof the source IP addresses of attacking packets to hide their identities. Therefore, this feature improves on DDoS detection schemes that rely on traffic rate only. However, Peng et al. [12] focused on detection of DDoS attacks and did not address other types of network anomalies. For example, when malicious users are scanning the network, we can also observe a high traffic rate but few new source IP addresses. It is very important to differentiate network scanning from a flash crowd, because the former is malicious but the latter is not. The two-way matching features on different network layers (Section 3.3.1) can reveal not only the presence of network anomalies but also their cause.
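A simplified version of the new-source-address idea can be sketched as the fraction of distinct source addresses in a time slot that were never seen in earlier slots. This is a hypothetical illustration that keeps the history in a plain Python set; the exact definition and data structures of Peng et al. [12] may differ.

```python
def new_source_fraction(slots):
    """For each slot (an iterable of source IPs), return the fraction of its
    distinct sources that did not appear in any earlier slot."""
    seen = set()
    fractions = []
    for slot in slots:
        current = set(slot)
        new = current - seen
        fractions.append(len(new) / len(current) if current else 0.0)
        seen |= current
    return fractions


known = ["10.0.0.1", "10.0.0.2"]
slots = [
    known,                                   # warm-up slot: every source is new
    known * 50,                              # high rate from already-known sources
    [f"203.0.113.{i}" for i in range(100)],  # high rate from spoofed, never-seen sources
]
print(new_source_fraction(slots))
```

The second and third slots both carry high traffic, yet only the spoofed one scores 1.0, which is the distinction between a flash crowd and a DDoS attack that the feature targets.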

Lakhina et al. [16] summarized the characteristics of network anomalies under different causes. Their contribution is to help identify the causes of network anomalies. For example, during DDoS attacks, we can observe a high bit rate, high packet rate, and high flow rate, and the source addresses are distributed over the whole IP address space. On the other hand, during network scanning, all three rates are high, but the destination addresses, rather than the source addresses, are distributed. However, the paper did not resolve an important problem, i.e., how to extract features efficiently enough to match a high line rate on the order of Gb/s. We propose a data structure called the Bloom filter array to address this problem.
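One common way to quantify whether addresses are "distributed" is the empirical entropy of the address distribution within a time slot. The sketch below uses that measure as an illustrative assumption, not necessarily the exact statistic of [16]: high source entropy with low destination entropy suggests DDoS-like traffic, and the reverse suggests a scan. The traffic samples are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Empirical Shannon entropy, in bits, of a list of addresses."""
    counts = Counter(values)
    total = len(values)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical (src, dst) pairs observed in one time slot.
ddos = [(f"198.51.100.{i}", "192.0.2.10") for i in range(64)]  # many sources, one victim
scan = [("192.0.2.99", f"198.51.100.{i}") for i in range(64)]  # one scanner, many targets

for name, pkts in (("ddos", ddos), ("scan", scan)):
    src_h = entropy([s for s, _ in pkts])
    dst_h = entropy([d for _, d in pkts])
    print(name, src_h, dst_h)  # ddos: 6.0 0.0; scan: 0.0 6.0
```

Note that computing such per-slot statistics exactly requires per-address state, which is precisely the line-rate cost that motivates a compact structure such as the Bloom filter array.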

3.2 Hierarchical Feature Extraction Architecture

Network anomaly detection is not an easy task, especially at high-speed routers. One of the main difficulties arises from the fact that the data rate is too high to afford complicated data processing. An anomaly detection algorithm usually works with traffic features instead of the original traffic data itself. Traffic features can be regarded as succinct representations of the voluminous traffic, e.g., the traffic data rate is a feature of the traffic.

In this section, we present our feature extraction architecture for network anomaly detection. We also cover extraction schemes for some simple features, such as data rate and SYN/FIN(RST) ratio. The more advanced features, the so-called two-way matching features, are discussed in Section 3.3.

3.2.1 Three-Level Design

Figure 3-1. Hierarchical structure for feature extraction. Incoming packets pass through a level-1 filter, then level-2 filters (TCP SYN, TCP FIN, TCP RST, TCP SYN/ACK), and finally feature extraction modules (SYN rate, SYN/FIN(RST) ratio, 2D matching feature extraction).


To efficiently extract features from traffic, we design a three-level hierarchical structure (Figure 3-1), where incoming packets are processed by level-one filters, then by level-two filters, and finally by (level-three) feature extraction modules. Level-one filters and level-two filters are placed in traffic monitors. A feature extraction module can be placed in either a traffic monitor or a local analyzer, depending on the type of the feature.

Level-one filters select a packet based on its source-destination pair, which is defined by the source IP address (SA), the source network mask (SNM), the destination IP address (DA), and the destination network mask (DNM). For example, if we are interested in packets from 172.10.5.28 to 210.33.68.102, we can choose 255.255.255.255 as both the SNM and the DNM; if we are interested in packets from 172.10.x.x to 208.33.1.x, we can use 255.255.0.0 as the SNM and 255.255.255.0 as the DNM. In this way, we can selectively monitor an end host or a subnet, which gives much flexibility in framework configuration. The output of a level-one filter is the set of packets with the same source-destination pair, which are conveyed to level-two filters.
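The masked comparison behind a level-one filter can be sketched with Python's standard `ipaddress` module. The class name `LevelOneFilter` and its interface are hypothetical; a router implementation would perform the same AND-and-compare on 32-bit integers in hardware or in the forwarding path.

```python
import ipaddress


def _to_int(addr: str) -> int:
    """Convert a dotted-quad IPv4 address or mask to a 32-bit integer."""
    return int(ipaddress.IPv4Address(addr))


class LevelOneFilter:
    """Hypothetical level-one filter defined by (SA, SNM, DA, DNM)."""

    def __init__(self, sa: str, snm: str, da: str, dnm: str):
        self.snm, self.dnm = _to_int(snm), _to_int(dnm)
        self.sa = _to_int(sa) & self.snm  # keep only the prefix bits selected by the mask
        self.da = _to_int(da) & self.dnm

    def accepts(self, src: str, dst: str) -> bool:
        """True if the packet's source and destination both fall in the configured pair."""
        return ((_to_int(src) & self.snm) == self.sa
                and (_to_int(dst) & self.dnm) == self.da)


# Packets from 172.10.x.x to 208.33.1.x, as in the example above.
subnet_filter = LevelOneFilter("172.10.0.0", "255.255.0.0", "208.33.1.0", "255.255.255.0")
print(subnet_filter.accepts("172.10.5.28", "208.33.1.7"))  # True
print(subnet_filter.accepts("172.11.5.28", "208.33.1.7"))  # False
```

Setting both masks to 255.255.255.255 turns the same filter into the single-host case from the first example.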

A level-two filter classifies the packets coming from level-one filters based on upper-layer¹ data fields, e.g., TCP SYN or FIN. The packets of interest are forwarded to one or more feature extraction modules. For example, the number of TCP SYN packets can be used to generate both the TCP SYN rate feature and the TCP SYN/FIN(RST) ratio feature; hence, TCP SYN packets are conveyed to both the TCP SYN rate module and the TCP SYN/FIN(RST) ratio module (Figure 3-1). Conversely, a feature module may need packets from multiple level-two filters. For example, the SYN/FIN(RST) ratio feature extraction requires packets from three filters (Figure 3-1).
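The fan-out from level-two filters to feature modules can be sketched as a flag classifier plus a subscription table. The flag bit values are the standard TCP header bits. The module names and the wiring in `SUBSCRIBERS` are hypothetical and only loosely follow Figure 3-1. In particular, feeding SYN packets to the two-way matching module is an assumption.

```python
from collections import defaultdict

# Standard TCP header flag bits.
FIN, SYN, RST, ACK = 0x01, 0x02, 0x04, 0x10


def level_two_class(flags: int):
    """Map TCP flags to the level-two filter that accepts the packet, or None."""
    if flags & SYN:
        return "SYN/ACK" if flags & ACK else "SYN"
    if flags & RST:
        return "RST"
    if flags & FIN:
        return "FIN"
    return None


# Hypothetical wiring of level-two filters to feature extraction modules.
SUBSCRIBERS = {
    "SYN": ["syn_rate", "syn_fin_rst_ratio", "two_way_matching"],
    "FIN": ["syn_fin_rst_ratio"],
    "RST": ["syn_fin_rst_ratio"],
    "SYN/ACK": ["two_way_matching"],
}


def dispatch(packets):
    """Count, per feature module, how many packets (given as TCP flag values) it receives."""
    delivered = defaultdict(int)
    for flags in packets:
        for module in SUBSCRIBERS.get(level_two_class(flags), []):
            delivered[module] += 1
    return dict(delivered)


print(dispatch([SYN, SYN, SYN | ACK, FIN | ACK, RST, ACK]))
```

A single SYN packet reaches three modules here, while the SYN/FIN(RST) ratio module draws on three different filters, mirroring the one-to-many and many-to-one relations described above.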

Compared with the packet classification schemes developed by Wang et al. [11] and Peng et al. [12], our hierarchical structure for feature extraction is more general and efficient.

Next, we describe the most important module in the three-level hierarchical structure: the feature extraction module.



1 Here, the upper layer can be either Layer 4 or Layer 7.









Similar to previous studies [11, 12], we generate features in a discrete manner, i.e., our feature extraction module generates a (feature) value or vector at the end of each time slot. Intuitively, a shorter slot duration may reduce the detection delay, defined as the interval from the epoch when the anomaly starts to the epoch when the anomaly is detected; but a shorter duration may increase the computational complexity, since the detection algorithm needs to analyze more feature data for the same time interval. On the other hand, if a feature is represented by a ratio, the slot duration must be sufficiently large to avoid division by zero. For example, if we want to use the SYN/FIN(RST) ratio as in Ref. [11] to detect TCP SYN floods, then the slot duration cannot be too small, because the number of FIN packets in a short period can be 0, which will result in a false alarm even if the number of SYN packets is not large.
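The slot-duration trade-off can be made concrete by binning SYN and FIN timestamps into fixed slots. With a very short slot, half of the slots below contain no FIN at all and the ratio is undefined, even though the traffic is perfectly regular. The traffic pattern and function names are hypothetical.

```python
def slot_counts(timestamps, slot_len, horizon):
    """Count events per slot of length slot_len over the interval [0, horizon)."""
    n_slots = int(horizon // slot_len)
    counts = [0] * n_slots
    for t in timestamps:
        idx = int(t // slot_len)
        if 0 <= idx < n_slots:
            counts[idx] += 1
    return counts


def syn_fin_ratios(syn_times, fin_times, slot_len, horizon):
    """Per-slot SYN/FIN ratio; None marks slots where the ratio is undefined (no FIN)."""
    syns = slot_counts(syn_times, slot_len, horizon)
    fins = slot_counts(fin_times, slot_len, horizon)
    return [s / f if f else None for s, f in zip(syns, fins)]


# Hypothetical normal traffic: one connection opened every second and closed 0.5 s later.
syn_times = [float(i) for i in range(10)]
fin_times = [i + 0.5 for i in range(10)]

print(syn_fin_ratios(syn_times, fin_times, slot_len=0.5, horizon=10.0))
print(syn_fin_ratios(syn_times, fin_times, slot_len=2.0, horizon=10.0))
```

With 0.5 s slots the sequence alternates between undefined (a SYN but no FIN) and 0.0; with 2 s slots every slot yields the stable ratio 1.0.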

Feature extraction can be done in a traffic monitor, a local analyzer, or a global analyzer. Extraction in a traffic monitor is described in Section 3.2.2, and extraction in a local or global analyzer in Section 3.2.3.

3.2.2 Feature Extraction in a Traffic Monitor

As we mentioned earlier, some features are generated within a traffic monitor. These

features are typically simple and reside in traffic of a single unidirectional link.

In our framework, a traffic monitor can generate the following features:

* Packet rate: defined as the number of packet arrivals in one time slot. This feature is simple but useful for detecting high-volume DoS and DDoS attacks. However, it can hardly help detect low-volume attacks and other types of network anomalies. Furthermore, normal network behavior, e.g., a flash crowd [12], may also be accompanied by a high packet rate. The same limitations apply to the data rate.

* Data rate: defined as the total number of bits of all packets that arrive in one time slot.

* SYN/FIN(RST) ratio2: defined as the ratio of the number of TCP SYN packets in one time slot to the number of FIN (and a portion of RST) packets in the same time slot.
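To make the three monitor features concrete, here is a minimal Python sketch. It is an illustration, not the framework's actual monitor. It assumes packets are already parsed into a hypothetical `Packet` record. For simplicity, it counts every packet flagged `fin_rst` in the denominator, rather than only "a portion of RST" packets as the text specifies. It also reproduces the slot-duration issue discussed above: a slot that contains no FIN/RST packet yields an undefined ratio.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Packet:
    t: float               # arrival time in seconds
    size_bits: int         # packet length in bits
    syn: bool = False      # TCP SYN flag
    fin_rst: bool = False  # hypothetical flag: FIN, or an RST counted toward the ratio


def monitor_features(packets, slot):
    """Return {slot index: (packet rate, data rate, SYN/FIN(RST) ratio)} per time slot."""
    pkts, bits = defaultdict(int), defaultdict(int)
    syns, fins = defaultdict(int), defaultdict(int)
    for p in packets:
        j = int(p.t // slot)  # index of the slot containing this arrival
        pkts[j] += 1
        bits[j] += p.size_bits
        syns[j] += p.syn
        fins[j] += p.fin_rst
    # the ratio is undefined (None) when no FIN/RST packet falls into the slot
    return {j: (pkts[j], bits[j], syns[j] / fins[j] if fins[j] else None)
            for j in sorted(pkts)}
```

With 1 s slots, the first slot of the test trace has two SYNs and no FIN, so its ratio is undefined. With a single 2 s slot, the same packets give a ratio of 2.0.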



2 Ref. [17] describes how to obtain this ratio.









3.2.3 Feature Extraction in a Local Analyzer or a Global Analyzer

Although a traffic monitor can generate simple features efficiently, these features may not be sufficient to detect network anomalies. In particular, the packet rate and data rate features may only be useful for detecting network anomalies accompanied by high-volume traffic, and the SYN/FIN(RST) ratio varies widely even for normal traffic and hence cannot accurately distinguish normal network conditions from network anomalies. To improve detection accuracy, one can use a local analyzer to generate more sophisticated features, for example, the SYN/SYN-ACK ratio proposed in Ref. [17] and the percentage of new IP addresses proposed in Ref. [12].

However, existing features such as the SYN/SYN-ACK ratio [17] and the percentage of new IP addresses [12] either do not lead to good detection performance or incur high storage/time complexity (Section 3.1). To address these deficiencies, we propose a new type of feature, called two-way matching features, whose values differ markedly between normal and attack traffic, thereby improving the accuracy of attack detection.

Next, we discuss the two-way matching features and the extraction scheme.

3.3 Two-Way Matching Features

3.3.1 Motivation

The motivation for using two-way matching features arises from the fact that, for most Internet applications, packets are generated by both end hosts engaged in the communication. Information carried by packets in one direction should match the corresponding information carried by packets in the other direction. By monitoring the degree of mismatch between the flows in the two directions, we can detect network anomalies.

To illustrate this, let us consider the behavior of two-way traffic in three scenarios, namely, 1) normal conditions, 2) DDoS attacks, and 3) rerouting.

In the first scenario, when the network of an ISP works normally, the information carried in both directions of communication matches (Figure 3-2). Host a and host v are the two ends of the communication (assume that host v is within the autonomous system of the ISP while host a is not). Host a sends a packet to host v, and v sends a response packet back to host a. Both packets pass through edge router A. From the point of view of local analyzer 1, which is attached to edge router A, we define the first packet as an inbound packet and the second packet as an outbound packet. The source IP address (SA) and destination IP address (DA) of the inbound packet match the DA and SA of the outbound packet, respectively. If the communication is based on UDP or TCP, we can further observe that the source port (SP) and destination port (DP) of the inbound packet match the DP and SP of the outbound packet. Therefore, local analyzer 1 observes matched inbound and outbound packets under normal conditions. In the example of Figure 3-2, it is assumed that border gateway protocol (BGP) routing makes the inbound packets and the corresponding outbound packets pass through the same edge router. If BGP routing makes the inbound packets and the corresponding outbound packets go through different edge routers (Figure 2-4), the matching can still be achieved by a global analyzer (Section 2.2.3); i.e., multiple local analyzers convey the unmatched inbound packets and the corresponding outbound packets to the global analyzer, which has the routing information of the whole autonomous system.

Figure 3-2. Network in normal condition.









Figure 3-3. Source-address-spoofed packets.


In the second scenario, when attackers launch DDoS attacks with spoofed source IP addresses [18], local analyzer 1 observes many unmatched inbound packets (Figure 3-3). Since the source addresses of the inbound packets are spoofed, the corresponding outbound packets are routed to the nominal destinations, i.e., b and c in Figure 3-3, and no longer pass through edge router A.

Figure 3-4. Reroute.


In the third scenario (Figure 3-4), the number of unmatched inbound packets observed by local analyzer 1 increases because the original route fails and outbound packets are rerouted to another edge router. A global analyzer can address this problem in the same way as the asymmetric-routing case discussed in the first scenario.









All the above scenarios seem to suggest that the number of unmatched inbound packets observed by an edge router is a good feature for network anomaly detection. However, this is usually not true, because the traffic volumes in the two directions are typically asymmetric. In Figure 3-2, if host a is a client uploading a large file to host v using the File Transfer Protocol (FTP) [19], there will be many more packets from a to v than from v to a. Uploading a file to an FTP server is normal behavior, yet the number of unmatched inbound packets is very high in this case.

Therefore, it is more appropriate to use flow-level quantities (instead of packet-level quantities) as features for network anomaly detection. In the FTP case above, when a TCP connection is established, all packets in one direction constitute one flow and all packets in the reverse direction constitute another flow. No matter how many packets are sent in each direction, there is only one inbound flow and only one outbound flow, and they match in IP addresses and port numbers. Therefore, we call the number of unmatched inbound flows a two-way matching feature.
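The difference between packet-level and flow-level counting can be illustrated in Python. This is a sketch only. It assumes each packet has already been reduced to an (SA, SP, DA, DP) tuple and considers a single period at one edge router. The addresses are made-up documentation addresses. An upload-like flow with 1000 inbound packets but only 500 returning ACKs produces 500 unmatched packets yet zero unmatched flows. Three spoofed single-packet flows stay unmatched at both levels.

```python
from collections import Counter


def as_inbound(out_sig):
    """Map an outbound (SA, SP, DA, DP) tuple to the orientation of its matching inbound flow."""
    sa, sp, da, dp = out_sig
    return (da, dp, sa, sp)


def unmatched_counts(inbound, outbound):
    """Return (unmatched inbound packets, unmatched inbound flows) for one period."""
    n_in = Counter(inbound)
    n_out = Counter(as_inbound(s) for s in outbound)
    # packet level: each outbound packet can absorb at most one inbound packet
    packets = sum(max(0, c - n_out[sig]) for sig, c in n_in.items())
    # flow level: an inbound flow is matched once any reverse-direction packet is seen
    flows = sum(1 for sig in n_in if n_out[sig] == 0)
    return packets, flows
```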

Two-way matching features have been shown to be effective indicators of network anomalies [20].3 However, extracting two-way matching features at high-speed edge routers is not an easy task. We address this issue in Sections 3.4 and 3.5.

Next, we define the two-way matching features.

3.3.2 Definition of Two-Way Matching Features

We first define three terms.

Definition 1. A signature is the information of interest carried in traffic.

The exact definition of a signature depends on the specific application targeted. For example, to detect SYN flood DDoS attacks, we may use the 5-tuple signature <SA, SP, DA, DP, sequence number> for inbound packets and <DA, DP, SA, SP, acknowledgment number - 1> for outbound packets. We further define the inbound signature as the signature extracted from inbound packets, and the outbound signature as the signature extracted from outbound packets.

3 Two-way matching features are good indicators of DDoS attacks with spoofed source IP addresses but are not good indicators of DDoS attacks with non-spoofed source IP addresses.

Definition 2. A flow is a set of packets with the same signature and the same direction.

For example, a TCP connection between two ends generates two flows with different

directions.

Definition 3. An unmatched inbound flow (UIF) is an inbound flow that has no corresponding outbound packet arriving at the intended edge router within a time period $\Gamma$.

Note that we use a time constraint $\Gamma$ in the definition of UIF because it takes time for an outbound packet to arrive. If $\Gamma$ is too short, some returning outbound packets might be missed, which increases the false alarm probability of network anomaly detection. If $\Gamma$ is too large, the detection delay is long. A suitable choice of $\Gamma$ depends on the round trip time (RTT) of the connection. For example, we can choose $\Gamma$ to be a high percentile of the RTT distribution, so that nearly all corresponding outbound packets return within time $\Gamma$.
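One way to pick $\Gamma$ from measurements is to take a nearest-rank percentile of the observed RTTs. The sketch below assumes that RTT samples are available, for example from SYN/SYN-ACK timing. The default `q = 0.99` is purely illustrative. By construction, at least a fraction `q` of the samples lie at or below the returned value.

```python
import math


def choose_gamma(rtt_samples, q=0.99):
    """Nearest-rank q-quantile of observed RTTs: a fraction >= q of replies return within it."""
    if not rtt_samples or not 0 < q <= 1:
        raise ValueError("need at least one RTT sample and 0 < q <= 1")
    ordered = sorted(rtt_samples)
    rank = math.ceil(q * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]
```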

Table 3-1: Notations for two-way matching features

| Notation | Description |
|---|---|
| $t_i$ | The $i$th sampling time epoch, where $t_{i+1} = t_i + \Gamma$ and $i \in \mathbb{Z}^+$. |
| $s(p)$ | Inbound signature of an inbound packet $p$. |
| $s'(p')$ | Outbound signature of an outbound packet $p'$. |
| $D(t_i)$ | The number of UIFs during the $i$th period. |


Based on the above definitions, we define the two-way matching feature to be the number of UIFs. Table 3-1 lists the notations used in the rest of this chapter, where $\mathbb{Z}^+$ stands for the set of nonnegative integers.

In the following sections, we present algorithms to extract two-way matching features from the traffic at local analyzers. Note that two-way matching features should be extracted by global analyzers when routing in an AS is asymmetric. However, the feature extraction approaches used by local analyzers and global analyzers are the same.

3.4 Basic Algorithms

This section presents two basic algorithms to process and store the two-way matching

features, namely, the hash table algorithm and the Bloom filter algorithm.

3.4.1 Hash Table Algorithm

The general procedure to extract the two-way matching features from the traffic at a local analyzer is as follows:

1. The local analyzer maintains a buffer in memory.

2. When the traffic monitor captures an inbound packet whose inbound signature is not in the buffer, the local analyzer creates an entry for that signature and sets the state of the entry to "UNMATCHED".

3. When the traffic monitor captures an outbound packet whose outbound signature is in the buffer, the local analyzer sets the state of that entry to "MATCHED".

4. At time $t_{i+1}$, the local analyzer assigns the number of entries with state "UNMATCHED" to $D(t_i)$.

So we typically need three operations: insertion, search, and removal.4
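The four-step procedure can be sketched in Python with an exact dictionary as the buffer. This is a reference model, not a line-rate design. Its memory grows with the number of flows, and that growth is what the hash table and Bloom filter algorithms of this section aim to control. The sketch makes two assumptions. First, outbound signatures have already been rewritten into inbound orientation (swapped addresses and ports). Second, the buffer is cleared after each sample, which the steps above leave unspecified.

```python
class ExactUIFCounter:
    """Reference model of steps 1-4, using a dict as the (unbounded) buffer."""

    def __init__(self):
        self.state = {}  # step 1: signature -> "UNMATCHED" or "MATCHED"

    def inbound(self, sig) -> None:
        # step 2 (insertion): create an entry only for signatures not yet buffered
        if sig not in self.state:
            self.state[sig] = "UNMATCHED"

    def outbound(self, sig) -> None:
        # step 3 (search + removal): sig is assumed to be in inbound orientation already
        if sig in self.state:
            self.state[sig] = "MATCHED"

    def sample(self) -> int:
        # step 4: D(t_i) = number of UNMATCHED entries; clearing afterwards is an assumption
        d = sum(1 for v in self.state.values() if v == "UNMATCHED")
        self.state.clear()
        return d
```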

A basic algorithm to do this is to use a hash table. Suppose the signature extracted from a packet is $b$ bits long. We organize the buffer into a table $V$ with $\ell$ cells of $b + 1$ bits each. The extra bit is the state bit. We also have $\mathcal{K}$ hash functions $h_i: S \mapsto \mathbb{Z}_\ell$, where $i \in \mathbb{Z}_{\mathcal{K}} = \{0, 1, \ldots, \mathcal{K} - 1\}$ and $S$ is the data set of interest, e.g., the signature domain. The symbol $\mathbb{Z}_\ell$ stands for the set $\{0, \ldots, \ell - 1\}$, where $\ell$ is an integer.

The operations of the hash table algorithm are listed in Figure 3-5, where the argument $s$ is the signature extracted from a packet.



4 Setting the state to "MATCHED" is actually the removal operation.









function HashTableInsert(V, s)
  for i ← 0 to K - 1
    if V[h_i(s)] is empty
      insert s into V[h_i(s)]; set state bit of V[h_i(s)] to "UNMATCHED"
      return
    end if
  end for
  report insertion operation error
end function

function HashTableSearch(V, s)
  for i ← 0 to K - 1
    if V[h_i(s)] is empty
      return false
    end if
    if V[h_i(s)] holds s
      return true
    end if
  end for
  return false
end function

function HashTableRemove(V, s)
  for i ← 0 to K - 1
    if V[h_i(s)] is empty
      return
    end if
    if V[h_i(s)] holds s
      set state bit of V[h_i(s)] to "MATCHED"
      return
    end if
  end for
end function

Figure 3-5. Hash Table Algorithm
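Below is a Python sketch of Figure 3-5. The hash family $h_0, \ldots, h_{\mathcal{K}-1}$ is an assumption: MD5 over an index byte followed by the signature, reduced modulo the table size. Signatures are taken to be bytes. Each cell stores the full signature plus a boolean that plays the role of the state bit. `count_unmatched` is an added helper for step 4 of the procedure.

```python
import hashlib


class ProbeHashTable:
    """Hash table V with `cells` entries probed by K hash functions (after Figure 3-5)."""

    def __init__(self, cells: int, k: int):
        self.cells, self.k = cells, k
        self.sig = [None] * cells         # stored signature per cell; None marks an empty cell
        self.unmatched = [False] * cells  # the extra state bit: True means "UNMATCHED"

    def _h(self, i: int, s: bytes) -> int:
        # illustrative h_i: MD5 over (index byte || signature), reduced modulo the table size
        return int.from_bytes(hashlib.md5(bytes([i]) + s).digest()[:8], "big") % self.cells

    def insert(self, s: bytes) -> None:
        for i in range(self.k):
            c = self._h(i, s)
            if self.sig[c] is None:
                self.sig[c], self.unmatched[c] = s, True
                return
        raise RuntimeError("insertion error: all K probed cells are occupied")

    def search(self, s: bytes) -> bool:
        for i in range(self.k):
            c = self._h(i, s)
            if self.sig[c] is None:
                return False  # an empty cell ends the probe sequence
            if self.sig[c] == s:
                return True
        return False

    def remove(self, s: bytes) -> None:
        # "removal" only flips the state bit to MATCHED, so probe sequences stay intact
        for i in range(self.k):
            c = self._h(i, s)
            if self.sig[c] is None:
                return
            if self.sig[c] == s:
                self.unmatched[c] = False
                return

    def count_unmatched(self) -> int:
        return sum(1 for c in range(self.cells) if self.sig[c] is not None and self.unmatched[c])
```

Because entries are never physically deleted, a probe sequence that was valid at insertion time stays valid. This is why `search` may safely stop at the first empty cell.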

3.4.2 Bloom Filter

The hash table algorithm can be used for offline traffic analysis or for analysis of low-data-rate traffic, but it cannot keep up with the high data rates at edge routers. To address this limitation, one can use the Bloom filter algorithm [21]. Compared with the hash table algorithm, the Bloom filter algorithm reduces space/time complexity by allowing a small degree of inaccuracy in membership representation, i.e., a packet signature that has not appeared before may be falsely identified as present.










The Bloom filter stores data in a vector $V$ of $M$ elements, each of which consists of one bit. The Bloom filter also uses $\mathcal{K}$ hash functions $h_i: S \mapsto \mathbb{Z}_M$, where $i \in \mathbb{Z}_{\mathcal{K}}$. Figure 3-6 describes the insertion and search operations of the Bloom filter.

function BloomFilterInsert(V, s)
  for all i ∈ Z_K do
    V[h_i(s)] ← 1
  end for
end function

function BloomFilterSearch(V, s)
  for all i ∈ Z_K do
    if V[h_i(s)] ≠ 1 then
      return false
    end if
  end for
  return true
end function

Figure 3-6. Bloom Filter Operations
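A minimal Python rendering of Figure 3-6 follows. Two choices are simplifications for readability: the MD5-with-index hash family, and storing one byte per bit (a real filter would pack bits). Note that `search` stops at the first zero bit, so it may compute fewer than $\mathcal{K}$ hashes. `insert`, by contrast, always computes all of them.

```python
import hashlib


class BloomFilter:
    """Bit vector V of m entries with K hash functions (after Figure 3-6)."""

    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit, for readability

    def _h(self, i: int, s: bytes) -> int:
        # illustrative h_i: MD5 over (index byte || item), reduced modulo m
        return int.from_bytes(hashlib.md5(bytes([i]) + s).digest()[:8], "big") % self.m

    def insert(self, s: bytes) -> None:
        for i in range(self.k):
            self.bits[self._h(i, s)] = 1

    def search(self, s: bytes) -> bool:
        for i in range(self.k):
            if not self.bits[self._h(i, s)]:
                return False  # early exit: the remaining hash functions are never computed
        return True
```

Inserted items are always found, so there are no false negatives. An item that was never inserted is reported present only when all of its $\mathcal{K}$ bits happen to be set, which is a collision.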



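Figure 3-7. Scenarios of the problems caused by the Bloom filter. (a) Boundary problem: inbound packet $p$ arrives at $t'_1 \in [t_i, t_{i+1})$ and its outbound packet $p'$ arrives at $t'_2 \in [t_{i+1}, t_{i+2})$. (b) An outbound packet $p'$ arrives at $t'_1$, before its matched inbound packet $p$ arrives at $t'_2$, with $t'_2 - t'_1 < \Gamma$.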


Although the Bloom filter offers a better space/time trade-off, it cannot be directly applied to our application because of the following problems:

1. The Bloom filter does not provide a removal function. Since one bit in the vector may be mapped by more than one item, an item cannot be removed by resetting all bits indexed by its hash values to 0 without possibly removing other items as well.

2. The Bloom filter does not provide a counting function. Although the counting Bloom filter [22] can be used for counting, it replaces each bit with a counter, which significantly increases the space complexity.

3. Sampling two-way matching features in discrete time results in a boundary effect (Figure 3-7(a)). Suppose an inbound packet arrives at time $t'_1 \in [t_i, t_{i+1})$ whereas its matched outbound packet arrives at time $t'_2$ within the next period. The inbound packet is counted as an unmatched inbound packet even though $t'_2 - t'_1 < \Gamma$. Therefore, the boundary effect increases the false alarm rate.

4. In the previous discussion, we did not consider the scenario in which an outbound packet arrives before its matched inbound packet (Figure 3-7(b)). When the outbound packet arrives at time $t'_1$, its signature is not in the buffer, so we do nothing. At time $t'_2$, its matched inbound packet arrives, and its inbound signature is recorded. As a result, the inbound packet is regarded as an unmatched inbound packet during period $[t_i, t_{i+1})$. This early-arrival problem also increases the false alarm rate.

Next, we propose a Bloom filter array algorithm to address the above problems.

3.5 Bloom Filter Array (BFA)

The good space/time trade-off motivates us to apply the Bloom filter to two-way matching feature extraction, but we need to address the limitations of the Bloom filter mentioned in Section 3.4.2. Our idea is to design a Bloom filter array (BFA) with the following functionalities, which are not available in the original Bloom filter [21, 23]:

1. Removal functionality: We implement insertion and removal operations synergistically by using pairs of insertion and removal vectors. The trick is that, rather than removing an outbound signature from the insertion vector, we create a removal vector and insert the outbound signature into the removal vector.

2. Counting functionality: We implement this by introducing counters in the Bloom filter array. The value of a counter is changed based on the query result of an insertion/removal operation.

3. Boundary effect abatement: We use multiple time slots and a sliding window to mitigate the boundary effect.

4. Resolving the early-arrival problem: This is achieved by storing the signatures of not only inbound packets but also outbound packets. In this way, when an inbound packet arrives and the signature of its matched outbound packet is present, we do not count this inbound packet as an unmatched one.

3.5.1 Data Structure

To address the boundary effect, we partition the time constraint $\Gamma$ into $w$ time slots, where $w$ is chosen large enough to mitigate the boundary effect (see Section 3.5.3). Assume the length of a slot is $\tau$. Then $\Gamma = w \times \tau$. The data structure of BFA is as follows:

* An array of bit vectors $\{IV_j\}$ ($j \in \mathbb{Z}^+$), where $IV_j$ is the $j$th insertion vector, holding inbound signatures in slot $[\tau_j, \tau_{j+1})$, where $\tau_{j+1} = \tau_j + \tau$.

* An array of bit vectors $\{RV_j\}$ ($j \in \mathbb{Z}^+$), where $RV_j$ is the $j$th removal vector, holding outbound signatures in slot $[\tau_j, \tau_{j+1})$.

* An array of counters $\{C_j\}$ ($j \in \mathbb{Z}^+$), where $C_j$ counts the number of UIFs in slot $[\tau_j, \tau_{j+1})$.

Since the two-way flows need to be matched within a time interval of length $\Gamma$, we only need to keep information within a time window of length $\Gamma$. That is, if the current slot is $[\tau_j, \tau_{j+1})$, only $\{IV_{j-w+1}, \ldots, IV_j\}$, $\{RV_{j-w+1}, \ldots, RV_j\}$, and $\{C_{j-w+1}, \ldots, C_j\}$ are kept in memory.

3.5.2 Algorithm

Our algorithm for BFA (Figure 3-8) consists of three functions, namely, ProcInbound, ProcOutbound, and Sample, which are described below.

Function ProcInbound processes inbound packets as follows. When an inbound packet arrives during $[\tau_j, \tau_{j+1})$, we increase $C_j$ by 1 and insert its inbound signature $s$ into $IV_j$ if neither of the following conditions is satisfied:

1. $s$ is stored in at least one $RV_{j'}$, where $j - w + 1 \le j' \le j$;

2. $s$ is stored in $IV_j$.

Condition 1 being true means that the corresponding outbound flow of this inbound packet has been observed previously, so we should not count the packet as an unmatched inbound packet. Condition 2 being true means that the inbound flow to which this inbound packet belongs has already been observed during the current slot $j$, so we should not count the same inbound flow again. If both conditions are false, we increase $C_j$ by one to indicate a new potential UIF (lines 7 to 10).









1. function ProcInbound(s)
2. a ← false, b ← false
3. if ∃ j', j - w + 1 ≤ j' ≤ j, such that BloomFilterSearch(RV_{j'}, s) returns true then
4. a ← true
5. if BloomFilterSearch(IV_j, s) returns true then
6. b ← true
7. if a and b are both false then
8. C_j ← C_j + 1
9. BloomFilterInsert(IV_j, s)
10. end if
11. end function
12. function ProcOutbound(s')
13. for j' ← j down to j - w + 1
14. if BloomFilterSearch(RV_{j'}, s') returns true then
15. break
16. if BloomFilterSearch(IV_{j'}, s') returns true then
17. C_{j'} ← C_{j'} - 1
18. end for
19. BloomFilterInsert(RV_j, s')
20. end function
21. function Sample(j)
22. return C_{j-w+1}
23. end function
Figure 3-8. Bloom Filter Array Algorithm

Function ProcOutbound processes outbound packets as follows. When an outbound packet arrives during $[\tau_j, \tau_{j+1})$, we check whether we need to update counter $C_{j'}$ for each $j'$ ($j - w + 1 \le j' \le j$). Specifically, for each such $j'$, we decrease $C_{j'}$ by one if the packet's outbound signature $s'$ satisfies both of the following conditions:

1. $s'$ is not contained in $RV_{j'}$;

2. $s'$ is contained in $IV_{j'}$.

Condition 1 being true means that no packet from the outbound flow to which this outbound packet belongs has arrived during the $j'$th time slot. Condition 2 being true means that the matched inbound flow of this outbound packet has been observed in the $j'$th slot. Satisfying both conditions means that its matched inbound flow has been counted as a potential UIF; hence, upon the arrival of the outbound packet, we decrease $C_{j'}$ by one to uncount it. In Function ProcOutbound, line 13 starts a loop that iterates $j'$ from $j$ down to $j - w + 1$. Condition 1 is checked in lines 14 to 15, and Condition 2 is checked in lines 16 to 17. Note that the loop exits (line 15) if $RV_{j'}$ contains $s'$. In that case, an outbound packet of the same flow arrived in the $j'$th slot, so the buffer of every earlier $j''$th slot ($j'' < j'$) was already checked when that packet arrived.

Function Sample extracts the two-way matching features. When we execute Function Sample at the end of the $j$th slot (i.e., at time $\tau_{j+1}$), the output is $D(\tau_{j-w+1})$ instead of $D(\tau_j)$, since a time lag of $\Gamma$ ($w$ slots) is needed for two-way matching.
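A minimal Python sketch of Figure 3-8 follows. It assumes packets arrive already tagged with their slot index $j$. It also assumes outbound signatures have been normalized (addresses and ports swapped) so that a matched pair yields byte-identical signatures. The class name, the MD5-based hash positions, and the byte-per-bit vectors are illustrative choices, not part of the original design. Slots are kept in dictionaries keyed by the absolute slot index. This is the allocate-and-discard pattern that the round-robin window of Section 3.5.3 avoids.

```python
import hashlib


class BloomFilterArray:
    """Sketch of the BFA in Figure 3-8: one IV_j, RV_j and C_j per live slot j."""

    def __init__(self, w: int, m: int, k: int):
        self.w, self.m, self.k = w, m, k
        self.iv = {}  # insertion vectors IV_j (inbound signatures)
        self.rv = {}  # removal vectors RV_j (outbound signatures)
        self.c = {}   # counters C_j (potential UIFs first seen in slot j)

    def _positions(self, s: bytes) -> list:
        # K hash positions, computed once per packet and reused for every vector probed
        return [int.from_bytes(hashlib.md5(bytes([i]) + s).digest()[:8], "big") % self.m
                for i in range(self.k)]

    def _open_slot(self, j: int) -> None:
        if j not in self.c:  # allocate the buffers of slot j on first use
            self.iv[j], self.rv[j], self.c[j] = bytearray(self.m), bytearray(self.m), 0

    def _window(self, j: int) -> list:
        # live slots j, j-1, ..., j-w+1 (newest first); slots that saw no traffic are skipped
        return [jp for jp in range(j, j - self.w, -1) if jp in self.c]

    @staticmethod
    def _contains(vec: bytearray, pos: list) -> bool:
        return all(vec[p] for p in pos)

    def proc_inbound(self, s: bytes, j: int) -> None:
        self._open_slot(j)
        pos = self._positions(s)
        a = any(self._contains(self.rv[jp], pos) for jp in self._window(j))  # reply already seen
        b = self._contains(self.iv[j], pos)  # flow already counted in slot j
        if not a and not b:
            self.c[j] += 1
            for p in pos:
                self.iv[j][p] = 1

    def proc_outbound(self, s: bytes, j: int) -> None:
        self._open_slot(j)
        pos = self._positions(s)
        for jp in self._window(j):
            if self._contains(self.rv[jp], pos):  # same outbound flow seen in slot jp: stop
                break
            if self._contains(self.iv[jp], pos):  # matched inbound flow counted in jp: uncount it
                self.c[jp] -= 1
        for p in pos:
            self.rv[j][p] = 1

    def sample(self, j: int) -> int:
        """Called at the end of slot j: return C_{j-w+1}, i.e. D(tau_{j-w+1}), and free that slot."""
        old = j - self.w + 1
        d = self.c.get(old, 0)
        for store in (self.iv, self.rv, self.c):
            store.pop(old, None)
        return d
```

In the test trace, the repeated outbound packet in slot 1 hits `break` and does not decrement $C_0$ a second time. The early-arrival case (outbound first, then inbound) leaves the counter at zero.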

3.5.3 Round Robin Sliding Window

function ProcInbound(s)
  a ← false, b ← false
  if ∃ j' ∈ {(I - w + 1) mod w, (I - w + 2) mod w, ..., I mod w} such that BloomFilterSearch(RV_{j'}, s) returns true then
    a ← true
  if BloomFilterSearch(IV_I, s) returns true then
    b ← true
  if a and b are both false then
    C_I ← C_I + 1
    BloomFilterInsert(IV_I, s)
  end if
end function

function ProcOutbound(s')
  for j' ← I down to (I - w + 1) mod w (indices taken modulo w)
    if BloomFilterSearch(RV_{j'}, s') returns true then
      break
    if BloomFilterSearch(IV_{j'}, s') returns true then
      C_{j'} ← C_{j'} - 1
  end for
  BloomFilterInsert(RV_I, s')
end function

function Sample()
  I ← (I + 1) mod w
  d ← C_I
  C_I ← 0; reset all bits of IV_I and RV_I to 0
  return d
end function

Figure 3-9. Bloom Filter Array Algorithm using sliding window


The algorithm presented in Section 3.5.2 has a drawback in memory allocation. Specifically, at epoch $\tau_{j+1}$, we sample $D(\tau_{j-w+1})$, and then we need to discard the buffer of the $(j - w + 1)$th slot and allocate a new buffer for the $(j + 1)$th slot. Such repeated deallocation and allocation is inefficient on most operating systems. A better memory allocation strategy is to reuse the expired buffer of the $(j - w + 1)$th slot for the new $(j + 1)$th slot, saving the cost of memory allocation. This is the idea of our round-robin sliding window.

Our new memory allocation scheme is the following. We allocate a memory area of fixed size for $w$ insertion vectors $\{IV_j\}$, $w$ removal vectors $\{RV_j\}$, and $w$ counters $\{C_j\}$, where $j \in \mathbb{Z}_w$. The insertion vector, removal vector, and counter for the $j$th slot are $IV_{j \% w}$, $RV_{j \% w}$, and $C_{j \% w}$, respectively, where $\%$ stands for the modulo operation. We also define a pointer $I$ to the current slot. Then, rather than deleting an expired buffer and acquiring a new buffer for the new slot, we simply update the pointer by $I = (I + 1) \% w$ and clear the reused vectors and counter. Figure 3-9 shows the improved version of BFA, based on the round-robin sliding window.
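Here is a Python sketch of the round-robin variant. It makes illustrative assumptions: MD5-based positions, byte-per-bit vectors, and outbound signatures already normalized to inbound orientation. All $w$ slots are allocated once in the constructor. The reused slot must be emptied before it serves the new slot, so `sample` reads the expiring counter and then zeroes that counter and both vectors in place. During the first $w - 1$ samples, the returned counters belong to slots before monitoring started, so they are 0.

```python
import hashlib


class RoundRobinBFA:
    """Sketch of the BFA with a round-robin sliding window (after Figure 3-9)."""

    def __init__(self, w: int, m: int, k: int):
        self.w, self.m, self.k = w, m, k
        # memory for w slots is allocated once and then reused in round-robin order
        self.iv = [bytearray(m) for _ in range(w)]  # IV_{j mod w}
        self.rv = [bytearray(m) for _ in range(w)]  # RV_{j mod w}
        self.c = [0] * w                            # C_{j mod w}
        self.cur = 0                                # pointer I to the current slot

    def _positions(self, s: bytes) -> list:
        return [int.from_bytes(hashlib.md5(bytes([i]) + s).digest()[:8], "big") % self.m
                for i in range(self.k)]

    def _ring(self) -> list:
        # I, I-1, ..., I-w+1 (mod w): newest slot first, as in ProcOutbound
        return [(self.cur - d) % self.w for d in range(self.w)]

    def proc_inbound(self, s: bytes) -> None:
        pos = self._positions(s)
        a = any(all(self.rv[r][p] for p in pos) for r in self._ring())
        b = all(self.iv[self.cur][p] for p in pos)
        if not a and not b:
            self.c[self.cur] += 1
            for p in pos:
                self.iv[self.cur][p] = 1

    def proc_outbound(self, s: bytes) -> None:
        pos = self._positions(s)
        for r in self._ring():
            if all(self.rv[r][p] for p in pos):
                break
            if all(self.iv[r][p] for p in pos):
                self.c[r] -= 1
        for p in pos:
            self.rv[self.cur][p] = 1

    def sample(self) -> int:
        """End of the current slot: advance I and return the counter of the expiring slot."""
        self.cur = (self.cur + 1) % self.w
        d = self.c[self.cur]
        # reuse the expired slot's buffers for the new slot by clearing them in place
        self.c[self.cur] = 0
        self.iv[self.cur][:] = bytes(self.m)
        self.rv[self.cur][:] = bytes(self.m)
        return d
```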

3.5.4 Random-Keyed Hash Functions

In previous sections, we assumed that the $\mathcal{K}$ hash functions are given a priori. However, choosing hash functions appropriately is not trivial, for the following two reasons.

First, $\mathcal{K}$ is a user-specified parameter and is subject to change. For whatever value of $\mathcal{K}$ a user5 chooses, it is undesirable to require the user to manually select $\mathcal{K}$ hash functions from a large pool of hash functions provided by the manufacturer. Storing a large pool of hash functions also wastes memory.

Second, to improve security, the $\mathcal{K}$ hash functions need to be changed over time. Otherwise, an attacker who knows the hash functions can craft attack packets such that, for the signatures $s_1$ and $s_2$ of any two packets, $s_1 \neq s_2$ but $h_i(s_1) = h_i(s_2)$ for all $i \in \mathbb{Z}_{\mathcal{K}}$. As a consequence, even if there are many attack packets with different signatures, the BFA algorithm will regard them as belonging to the same flow, so the number of UIFs for these packets is only one. This creates a security vulnerability.

5 A user here is a network operator who wants to use our BFA and detection technique to detect network anomalies.

We address the two problems above by using keyed hash functions, i.e., we only need one kernel hash function and $\mathcal{K}$ randomly generated keys. Specifically, the $i$th hash function $h_i(x)$ is simply $h(key_i, x)$, where $h$ is a predefined kernel hash function and $\{key_i\}$ ($i \in \mathbb{Z}_{\mathcal{K}}$) are randomly generated keys. For example, we can use the MD5 Message-Digest Algorithm [24] as the kernel hash function. Since MD5 accepts an input of any number of bits, we can concatenate $key_i$ and $x$ into one bit vector and apply MD5 to it.

Using keyed hash functions, the first concern (varying $\mathcal{K}$) can be addressed straightforwardly. Specifically, when $\mathcal{K}$ is changed, we simply generate a corresponding number of random keys. Applying these $\mathcal{K}$ keys to the same kernel hash function, we obtain $\mathcal{K}$ hash functions. Hence, our method has two advantages: 1) the number of hash functions can be specified on the fly; 2) the hash functions are determined on the fly instead of being stored a priori, which saves storage.

The second concern (changing hash functions) can also be addressed if the keys are

periodically changed. Even if the kernel hash function is disclosed, it is still very difficult,

if not impossible, for an attacker to guess the changing random keys.

Note that the use of keyed hash functions does not affect the collision probability of the hash functions. In the case of random-keyed hash functions, the collision probability of $h_i(x)$ depends not only on the collision probability of $h$ but also on the correlation between $key_i$ and $x$. Random number generation techniques are mature enough that we can assume independence between $key_i$ and $x$. Hence, introducing random keys has no effect on the collision probability.
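A minimal sketch of the keyed construction follows. It assumes MD5 from Python's `hashlib` as the kernel $h$ and the `secrets` module as the key source. The key length and the modulo reduction are illustrative choices. Because MD5 accepts input of any length, each $h_i$ simply hashes the concatenation of $key_i$ and $x$, as described above. Python's `hmac` module offers a more conventional keyed construction; plain concatenation is used here only to mirror the text.

```python
import hashlib
import secrets


def make_keyed_hashes(k: int, m: int, key_len: int = 16):
    """Build k hash functions h_i(x) = MD5(key_i || x) mod m from fresh random keys."""
    keys = [secrets.token_bytes(key_len) for _ in range(k)]  # one random key per function

    def keyed(key: bytes):
        def h(x: bytes) -> int:
            return int.from_bytes(hashlib.md5(key + x).digest(), "big") % m
        return h

    return [keyed(key) for key in keys]
```

Changing $\mathcal{K}$ or rotating keys is just another call, e.g. `hashes = make_keyed_hashes(4, 1 << 20)` at each rekeying period. The kernel function never changes.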

3.6 Complexity Analysis

This section compares the hash table, Bloom filter, and our BFA. The section is

organized as follows. In Section 3.6.1, we analyze the space/time trade-off for the three

algorithms. Section 3.6.2 addresses how to optimally choose parameters of BFA.









3.6.1 Space/Time Trade-off


The space/time trade-off of both the hash table and Bloom filter algorithms was analyzed by Bloom [21]. However, that analysis is not directly applicable to our setting, for the following reasons:

1. Bloom [21] assumed a static data set, whereas our feature extraction deals with a dynamic data set, i.e., the number of elements in the data set changes over time. Hence, a new analysis for dynamic data sets is needed. In addition, because of the static assumption, Bloom [21] only considered the search operation, whereas our feature extraction requires three operations on dynamic data sets: insertion, search, and removal.

2. Bloom [21] assumed bit-comparison hardware in the time complexity analysis. However, current computers usually use word (multiple-bit) comparison, which is more efficient than bit comparison. Hence, it is necessary to analyze the complexity based on word comparison.

3. The time complexity obtained by Bloom [21] did not include hash function calculations. However, hash function calculation dominates the overall time complexity; e.g., calculating one hash function based on MD5 takes 64 clock cycles [25], while one word comparison usually takes less than 8 clock cycles [26].

For the above reasons, we develop new analyses for the hash table and the Bloom filter. In addition, we analyze the performance of BFA and use numerical results to compare the three algorithms. Table 3-2 lists the notations used in the analysis.

Table 3-2: Notations for complexity analysis

| Notation | Description |
|---|---|
| $N$ | Random variable representing the number of different flows recorded. |
| $\rho$ | Empty ratio. |
| $\eta$ | Collision probability, i.e., the probability that an item is falsely identified as being in the buffer. |
| $R$ | Flow arrival rate, which is assumed to be constant. |


Analysis for hash table. Denote by $M_h$ the size of a hash table in bits (i.e., its space complexity) and by $T_h$ the random variable representing the number of hash function calculations for an unsuccessful search (i.e., its time complexity).

Let us consider the search operation first. Upon the arrival of an inbound packet, HashTableSearch (see Figure 3-5) checks whether its inbound signature $s$ is in the table. Because an unsuccessful search continues the loop until an empty cell is found, it consumes more time than a successful one. In addition, it is very difficult to analyze the time complexity of a successful search, since the complexity depends on the distribution of flow signatures and on the data rate of each flow. For this reason, we only consider the time consumed by an unsuccessful search, which is a conservative estimate of the average time complexity of a search. Recall from Section 3.4.1 that the hash table has $\ell$ cells of $b + 1$ bits each, so that $M_h = \ell(b + 1)$. Given that $N$ flows have been recorded in the hash table, the empty ratio is

$$\rho = \frac{\ell - N}{\ell} = 1 - \frac{N(b+1)}{M_h}. \tag{3-1}$$


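In each loop iteration, HashTableSearch calculates one hash function and checks the addressed entry. If the entry is not empty, the next iteration is executed. Given $N = n$, the number of iterations $T_h$ therefore follows a geometric distribution whose success probability is $\rho_n$, the empty ratio (3-1) evaluated at $N = n$:

$$\Pr[T_h = x \mid N = n] = \rho_n (1 - \rho_n)^{x-1}, \quad x = 1, 2, \ldots, \qquad \rho_n = 1 - \frac{n(b+1)}{M_h}. \tag{3-2}$$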


Therefore, the conditional expectation of $T_h$ is

$$E[T_h \mid N = n] = \sum_{x=1}^{\infty} x \rho_n (1 - \rho_n)^{x-1} = \frac{1}{\rho_n} = \frac{M_h}{M_h - n(b+1)}. \tag{3-3}$$

Since the table records data for a duration of $\Gamma$, the maximum number of different flows that we need to store in the buffer is $R\Gamma$. Then the expectation of $T_h$ is

$$E[T_h] = \sum_{n=0}^{R\Gamma} \Pr[N = n]\, E[T_h \mid N = n]. \tag{3-4}$$

Assume $N$ has a uniform distribution:

$$\Pr[N = n] = \frac{1}{R\Gamma + 1}, \quad n = 0, 1, \ldots, R\Gamma. \tag{3-5}$$









Applying Equation (3-5) to Equation (3-4), we obtain the expectation of $T_h$:

$$E[T_h] = \frac{1}{R\Gamma + 1} \sum_{n=0}^{R\Gamma} \frac{M_h}{M_h - n(b+1)}. \tag{3-6}$$

Since the time to insert a signature into, or remove a signature from, a given entry is much shorter than the time to find the proper entry, the time complexities of the insertion and removal operations are almost the same as that of the search operation. Equation (3-6) gives the space/time trade-off (i.e., $M_h$ vs. $E[T_h]$) of the hash table method.
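Equation (3-6) is easy to evaluate numerically. The helper below is a sketch that takes $M_h$, $b$, and $R\Gamma$ directly as `m_h`, `b`, and `n_max`. It raises an error when the table could fill up completely, which is where (3-6) diverges. As an illustrative example, take $b = 128$, e.g. a 32-bit source address, 16-bit source port, 32-bit destination address, 16-bit destination port, and 32-bit sequence number. Then a 1-Mbit table holding up to 100 flows averages just over one hash computation per unsuccessful search.

```python
def expected_probes_hash_table(m_h: int, b: int, n_max: int) -> float:
    """E[T_h] of Eq. (3-6): N uniform on {0, ..., n_max} with n_max = R * Gamma."""
    if n_max * (b + 1) >= m_h:
        raise ValueError("the table must keep an empty cell even when n_max flows are stored")
    # each term is E[T_h | N = n] = M_h / (M_h - n(b+1)) from Eq. (3-3)
    return sum(m_h / (m_h - n * (b + 1)) for n in range(n_max + 1)) / (n_max + 1)
```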

Analysis for Bloom filter. First, we consider the space complexity of the Bloom filter. Denote by $M_b$ the length of the vector $V$ used by the Bloom filter (see Section 3.4.2). The choice of $M_b$ affects the accuracy of the search function, BloomFilterSearch (see Figure 3-6), for the following reason. When the signatures of $N$ flows are stored in $V$, the fraction $\phi$ of entries of $V$ with value 0 is

$$\phi = \left(1 - \frac{1}{M_b}\right)^{\mathcal{K} N}, \tag{3-7}$$


where $\mathcal{K}$ is the number of hash functions. Assuming $1/M_b \ll 1$, as is certainly the case, we can approximate $\phi$ as

$$\phi \approx \exp\left(-\frac{\mathcal{K} N}{M_b}\right). \tag{3-8}$$

Function BloomFilterSearch(V, s) falsely identifies $s$ as stored in $V$ if and only if $s$ was never inserted but the results of all $\mathcal{K}$ hash functions point to bits with value 1; this event is known as a collision. Denote by $\eta_N$ the collision probability given that $N$ flows have been recorded. Then

$$\eta_N = (1 - \phi)^{\mathcal{K}} = \left[1 - \exp\left(-\frac{\mathcal{K} N}{M_b}\right)\right]^{\mathcal{K}}. \tag{3-9}$$









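Therefore, the average collision probability is

$$\eta = \sum_{n=0}^{R\Gamma} \eta_n \Pr[N = n] = \frac{1}{R\Gamma + 1} \sum_{n=0}^{R\Gamma} \left[1 - \exp\left(-\frac{\mathcal{K} n}{M_b}\right)\right]^{\mathcal{K}}, \tag{3-10}$$

where $N$ is assumed to be uniformly distributed as in Equation (3-5). From Equation (3-10), it can be observed that $\eta$ decreases with $M_b$ if $\mathcal{K}$ is fixed. Based on Equation (3-10), we can therefore express $M_b$ as a function of $\eta$ and $\mathcal{K}$:

$$M_b = \alpha_{R\Gamma}(\eta, \mathcal{K}). \tag{3-11}$$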


Equation (3-11) gives the space complexity of Bloom filter as a function of collision

probability and the number of hash functions.
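$\alpha_{R\Gamma}$ in (3-11) has no closed form, but it can be obtained by inverting (3-10) numerically. The sketch below uses the same uniform-$N$ assumption. It evaluates $\eta$ for a given $M_b$, then finds the smallest $M_b$ that meets a target $\eta$ by bracketing and bisection, relying on $\eta$ decreasing in $M_b$. The function names are illustrative. Passing `n_max` $= R\tau$ instead of $R\Gamma$ gives the per-slot quantity that the BFA analysis needs.

```python
import math


def eta(m: float, k: int, n_max: int) -> float:
    """Average collision probability of Eq. (3-10), N uniform on {0, ..., n_max}."""
    return sum((1.0 - math.exp(-k * n / m)) ** k for n in range(n_max + 1)) / (n_max + 1)


def alpha(target: float, k: int, n_max: int) -> float:
    """Numerically invert Eq. (3-10): smallest vector length M_b with eta(M_b) <= target."""
    lo, hi = 1.0, 2.0
    while eta(hi, k, n_max) > target:  # eta decreases in m: grow hi until it is feasible
        lo, hi = hi, hi * 2
    for _ in range(60):                # bisection keeps eta(lo) > target >= eta(hi)
        mid = (lo + hi) / 2
        if eta(mid, k, n_max) > target:
            lo = mid
        else:
            hi = mid
    return hi
```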

Now, let us consider the time complexity of the Bloom filter. Denote by $T_b$ the random variable representing the number of hash function calculations.

Function BloomFilterInsert always calculates all $\mathcal{K}$ hash functions, that is,

$$T_b \mid \{\text{BloomFilterInsert is executed}\} = \mathcal{K}, \tag{3-12}$$

where "$\mid$" followed by an event denotes conditioning on that event, and "=" means equality with probability 1.

For function BloomFilterSearch, we first consider the special case in which BloomFilterSearch returns true. In this case, all $\mathcal{K}$ hash functions need to be calculated, so

$$T_b \mid \{\text{BloomFilterSearch returns true}\} = \mathcal{K}. \tag{3-13}$$


This fact will be used in the analysis for BFA (see Section 3.6.1).

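In general, BloomFilterSearch stops at the first hash function that points to a 0 bit. Given $N = n$, this happens for each hash function with probability $\phi_n = \exp(-\mathcal{K} n / M_b)$, the zero fraction (3-8) evaluated at $N = n$. Hence

$$\Pr[T_b = x \mid N = n \text{ and BloomFilterSearch is executed}] = \begin{cases} \phi_n (1 - \phi_n)^{x-1}, & 1 \le x < \mathcal{K}, \\ (1 - \phi_n)^{\mathcal{K}-1}, & x = \mathcal{K}. \end{cases} \tag{3-14}$$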









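Hence, the conditional expectation of $T_b$ is

$$\begin{aligned} E[T_b \mid N = n \text{ and BloomFilterSearch is executed}] &= \sum_{x=1}^{\mathcal{K}-1} x \phi_n (1 - \phi_n)^{x-1} + \mathcal{K} (1 - \phi_n)^{\mathcal{K}-1} \\ &= \frac{1 - \left[1 - \exp\left(-\frac{\mathcal{K} n}{\alpha_{R\Gamma}(\eta, \mathcal{K})}\right)\right]^{\mathcal{K}}}{\exp\left(-\frac{\mathcal{K} n}{\alpha_{R\Gamma}(\eta, \mathcal{K})}\right)} \\ &\triangleq \beta_n(\eta, \mathcal{K}). \end{aligned} \tag{3-15}$$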

Averaging over $N$ on both sides of Equation (3-15), we get the expectation of $T_b$ given that BloomFilterSearch is executed:

$$E[T_b \mid \text{BloomFilterSearch is executed}] = \frac{1}{R\Gamma + 1} \sum_{n=0}^{R\Gamma} \beta_n(\eta, \mathcal{K}). \tag{3-16}$$

If we know the two prior probabilities, i.e., the probability $P_s$ that BloomFilterSearch is executed and the probability $P_i$ that BloomFilterInsert is executed, then we get

$$E[T_b] = \frac{P_s}{R\Gamma + 1} \sum_{n=0}^{R\Gamma} \beta_n(\eta, \mathcal{K}) + P_i \mathcal{K}. \tag{3-17}$$

Equation (3-17) gives the time complexity of Bloom filter in terms of number of hash

function calculations.
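Equations (3-15) through (3-17) can be evaluated directly once $M_b$ is fixed, for example after inverting (3-10) numerically. The sketch below takes $M_b$, $\mathcal{K}$, $R\Gamma$ (as `n_max`), $P_s$, and $P_i$ as inputs. When $\phi_n$ underflows to zero, it falls back on the limit $\beta_n \to \mathcal{K}$. The function names are illustrative.

```python
import math


def expected_search_cost(n: int, m: float, k: int) -> float:
    """beta_n of Eq. (3-15): expected hash computations of one search with n flows stored."""
    phi = math.exp(-k * n / m)  # fraction of zero bits, Eq. (3-8)
    if phi == 0.0:              # underflow: every probe hits a 1 bit, so all k hashes run
        return float(k)
    return (1.0 - (1.0 - phi) ** k) / phi


def expected_bloom_cost(m: float, k: int, n_max: int, p_search: float, p_insert: float) -> float:
    """E[T_b] of Eqs. (3-16)-(3-17), with N uniform on {0, ..., n_max}."""
    avg_search = sum(expected_search_cost(n, m, k) for n in range(n_max + 1)) / (n_max + 1)
    return p_search * avg_search + p_insert * k
```

With an empty filter (`n_max = 0`), every search stops after one hash, so $E[T_b] = P_s + P_i \mathcal{K}$. For example, $P_s = P_i = 0.5$ and $\mathcal{K} = 4$ give 2.5.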

Analysis for Bloom filter array. Once again, we analyze the space complexity of BFA first. The techniques used above for the Bloom filter can be applied here, since BFA is derived from the standard Bloom filter. However, there are some differences between the two schemes. As described in Section 3.5, BFA has multiple buffers, $IV_j$, $RV_j$, and $C_j$, $j \in \mathbb{Z}_w$. Therefore, the storage size of BFA in bits, denoted by $M_a$, is $w(2 M_v + L)$, where $M_v$ is the size of each insertion or removal vector and $L$ is the size of each counter in bits.









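Similar to Equation (3-10), the collision probability is

$$\eta = \frac{1}{R\tau + 1} \sum_{n=0}^{R\tau} \left[1 - \exp\left(-\frac{\mathcal{K} n}{M_v}\right)\right]^{\mathcal{K}}. \tag{3-18}$$

Note that the length of each BFA time slot is $\tau$, so the upper limit of the summation is $R\tau$ rather than $R\Gamma$. Similar to Equation (3-11), $M_v$ is a function of $\eta$ and $\mathcal{K}$. We define

$$M_v = \alpha_{R\tau}(\eta, \mathcal{K}). \tag{3-19}$$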



Then


$$m_a = w\left(2\,\alpha_{R\tau}(\eta,\mathcal{K}) + L\right). \tag{3-20}$$


Equation (3-20) gives the space complexity of BFA.
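Equation (3-18) gives $\eta$ in terms of $m_v$, whereas (3-19) needs the inverse $\alpha_{R\tau}(\eta,\mathcal{K})$. Reading (3-18) as $\eta = \frac{1}{R\tau+1}\sum_{n=0}^{R\tau}[1-\exp(-\mathcal{K}n/m_v)]^{\mathcal{K}}$, $\eta$ decreases monotonically in $m_v$ for fixed $\mathcal{K}$, so one simple way to evaluate $\alpha_{R\tau}$ is bisection, sketched below with NumPy. The names `collision_prob`, `alpha`, and `bfa_space` are hypothetical, and the result is left unrounded (Section 3.7.2 additionally rounds $m_v$ up to a power of 2).

```python
import numpy as np


def collision_prob(m_v: float, K: int, n_max: int) -> float:
    """Eq. (3-18): collision probability of one vector of m_v bits,
    averaged over n = 0, ..., n_max stored signatures (n_max = R * tau)."""
    n = np.arange(n_max + 1, dtype=float)
    return float(np.mean((1.0 - np.exp(-K * n / m_v)) ** K))


def alpha(eta: float, K: int, n_max: int, iters: int = 100) -> float:
    """alpha_{R tau}(eta, K) of Eq. (3-19): vector size at which collision_prob
    reaches eta, by bisection (collision_prob decreases as m_v grows)."""
    hi = 1.0
    while collision_prob(hi, K, n_max) > eta:  # double until feasible
        hi *= 2.0
    lo = hi / 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if collision_prob(mid, K, n_max) > eta:
            lo = mid
        else:
            hi = mid
    return hi  # smallest feasible size found


def bfa_space(eta: float, K: int, n_max: int, w: int, L: int) -> float:
    """Eq. (3-20): m_a = w * (2 * m_v + L) bits."""
    return w * (2.0 * alpha(eta, K, n_max) + L)
```

For the Section 3.6.1 setting one would call, e.g., `bfa_space(0.001, 5, 250_000 * 2, w=40, L=32)`; each evaluation of (3-18) then sums $R\tau + 1 = 500{,}001$ terms. No claim is made that this sketch reproduces the exact values plotted in Figure 3-10.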

Now, let us consider the time complexity of BFA. Denote by $T_a$ the random variable representing the number of hash function calculations for BFA. Recall that BFA (Figure 3-9) defines three functions, ProcInbound, ProcOutbound, and Sample. Obviously,

$$T_a \mid \{\text{Sample is executed}\} = 0. \tag{3-21}$$


When executing Function ProcInbound, all the $\mathcal{K}$ hash functions need to be calculated, for the following reason.

1. If variables a and b are both false, Function BloomFilterInsert is executed, which calculates $\mathcal{K}$ hash functions (see Equation (3-12)).

2. Otherwise, at least one of a and b is true; then at least one of the search operations, i.e., BloomFilterSearch($RV_j$, s) for the slot indices $j$ examined by ProcInbound, and BloomFilterSearch($IV_i$, s), returns true. This also means that $\mathcal{K}$ hash functions have been calculated (see Equation (3-13)).

Therefore, in any case, ProcInbound calculates all the $\mathcal{K}$ hash functions. Further note that, although ProcInbound executes up to $w + 1$ search operations and at most one insertion operation, the total number of hash function calculations in these operations is the same as that in one search operation. This is because the hash values calculated in one search operation can be reused by all the other search operations and by the insertion operation. Therefore,

$$T_a \mid \{\text{ProcInbound is executed}\} = \mathcal{K}. \tag{3-22}$$

Similarly,


$$T_a \mid \{\text{ProcOutbound is executed}\} = \mathcal{K}. \tag{3-23}$$

In each time slot, we execute Sample once, ProcInbound $R_{pi}\tau$ times, and ProcOutbound $R_{po}\tau$ times, where $R_{pi}$ and $R_{po}$ are the inbound and outbound packet arrival rates, respectively. Combining Equations (3-21), (3-22), and (3-23) and assuming $(R_{pi} + R_{po})\tau \gg 1$, which always holds in our design of BFA, we have

$$E[T_a] = 0 \times \frac{1}{(R_{pi}+R_{po})\tau+1} + \mathcal{K}\,\frac{(R_{pi}+R_{po})\tau}{(R_{pi}+R_{po})\tau+1} \approx \mathcal{K}. \tag{3-24}$$

Combining Equations (3-24) and (3-20), we obtain the relationship between $m_a$ and $T_a$ as below:

$$m_a = w\left[2\,\alpha_{R\tau}(\eta, E[T_a]) + L\right]. \tag{3-25}$$


Table 3-3: Space/time complexity for hash table, Bloom filter, and BFA

| Algorithm | Space complexity | Time complexity |
|---|---|---|
| Hash table | $m_h$ (free variable) | Equation (3-6) |
| Bloom filter | Equations (3-10) and (3-11) | Equations (3-15), (3-16), and (3-17) |
| BFA | Equations (3-18), (3-19), and (3-20) | Equation (3-24) |

Table 3-3 lists the space complexity and time complexity for the hash table, Bloom filter, and BFA algorithms.

Numerical results. In this section, we use the formulae derived above to compare the hash table scheme with the BFA algorithm through numerical calculations. The setting of our numerical study is the following:

1. Traces captured from an ISP's edge router show that the average number of flows during one second is around 250,000. So we let $R = 250{,}000$. To reduce the probability of false alarms caused by normal packets with a long RTT, we choose $\Gamma$ large enough that more than 99% of packets have an RTT less than $\Gamma$. For the same traces, $\Gamma = 80$ seconds.

2. Suppose we want to detect TCP traffic anomalies. Then the signature captured from each packet is composed of a 32-bit SA, a 32-bit DA, a 16-bit SP, and a 16-bit DP, so $b = 96$ bits.

3. In the BFA algorithm, we use 40 time slots (i.e., $w = 40$), each of which is 2 seconds long (i.e., $\tau = 2$). We also suppose each counter is a 32-bit integer (i.e., $L = 32$).


Figure 3-10. Space/time trade-off for the hash table, BFA with $\eta = 0.1\%$, and BFA with $\eta = 1\%$ (space complexity in bits, log scale, vs. time complexity $E[T]$).




Figure 3-10 shows $m$ vs. $E[T]$ for the hash table scheme, BFA with collision probability 1%, and BFA with collision probability 0.1%. In Figure 3-10, the X axis represents the time complexity (i.e., the expected number of hash function calculations) and the Y axis represents the space complexity (i.e., the number of bits needed for storage). From Figure 3-10, we can see that the curves of BFA lie below the curve of the hash table. This means that BFA uses less space for a given time complexity; therefore, BFA achieves a better space/time trade-off than the hash table. We also see that the curve of BFA with $\eta = 1\%$ lies below the curve of BFA with $\eta = 0.1\%$. This shows the relationship among space, time, and collision probability. Specifically, to reach a lower collision probability, i.e., more accurate detection, we need to either calculate more hash functions or use more storage space.

To see the gain of using BFA, let us look at an example. Suppose $E[T] = 5$, i.e., 5 hash function calculations are needed per query on average. Then the memory required by the hash table scheme, BFA with $\eta = 0.1\%$, and BFA with $\eta = 1\%$ is 1.01 Gbits, 115.3 Mbits, and 62.9 Mbits, respectively. It can be seen that BFA with $\eta = 1\%$ saves storage by a factor of 16 compared to the hash table scheme.
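As a quick arithmetic check on the savings quoted above, using the memory figures at $E[T] = 5$:

$$\frac{1.01 \times 10^{9}}{62.9 \times 10^{6}} \approx 16.1, \qquad \frac{1.01 \times 10^{9}}{115.3 \times 10^{6}} \approx 8.8,$$

so, at this time complexity, BFA with $\eta = 1\%$ needs roughly 1/16, and BFA with $\eta = 0.1\%$ roughly 1/9, of the hash-table memory.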

Figure 3-10 shows that, for the hash table scheme, $m_h$ is a monotonically decreasing function of $E[T_h]$. This observation matches our intuition: the larger the table, the smaller the collision probability for hash functions, resulting in fewer hash function calculations. Further note that $m_h$ approaches $R\Gamma(b + 1)$ as $E[T_h]$ increases. This is the minimum space required to accommodate up to $R\Gamma$ flows.

For BFA, $m_a$ is not a monotonic function of $E[T_a]$, which approximately equals $\mathcal{K}$. We have the following observations.

* Case A: For a fixed storage size, the smaller $\mathcal{K}$, the larger the probability that all $\mathcal{K}$ hash functions of two different inputs return the same outputs, which is the collision probability. In other words, the smaller $\mathcal{K}$, the larger the storage size required to achieve a fixed collision probability. That is, $\mathcal{K}\downarrow\ \Rightarrow\ m_a\uparrow$.

* Case B: Since an input to BFA may set $\mathcal{K}$ bits to "1" in a vector $V$, the larger $\mathcal{K}$, the more bits in $V$ are set to "1" (nonempty), which translates into a larger collision probability. In other words, the larger $\mathcal{K}$, the larger the storage size required to achieve a fixed collision probability. That is, $\mathcal{K}\uparrow\ \Rightarrow\ m_a\uparrow$.

Combining Cases A and B, it can be argued that there exists a value of $\mathcal{K}$ (or $E[T_a]$) that achieves the minimum value of $m_a$ for a fixed collision probability. This minimum property can be used to guide the parameter setting for BFA, which is addressed in Section 3.6.2.










3.6.2 Optimal Parameter Setting for Bloom Filter Array

This section addresses how to determine the parameters of BFA under two criteria, namely, the minimum space criterion and the competitive optimality criterion.

Minimum space criterion. According to Equation (3-25), the three parameters $m_a$, $E[T_a]$, and $\eta$ are coupled. Since the collision probability $\eta$ critically affects the detection error rate in our network anomaly detection, a network operator may want to choose an upper bound $\bar{\eta}$ on the acceptable collision probability $\eta$ and then minimize the storage required, i.e.,


$$\min_{E[T_a]}\ m_a \quad \text{subject to} \quad \eta \le \bar{\eta}. \tag{3-26}$$

According to Equation (3-25), the solution of (3-26) is as below:

$$m_a^* = \min_{E[T_a]} m_a = \min_{E[T_a]} w\left[2\,\alpha_{R\tau}(\bar{\eta}, E[T_a]) + L\right], \tag{3-27}$$

$$E[T_a]^* = \arg\min_{E[T_a]} m_a = \arg\min_{E[T_a]} \alpha_{R\tau}(\bar{\eta}, E[T_a]). \tag{3-28}$$

Figure 3-11. Relation among space complexity, time complexity, and collision probability. (a) $m_a^*$ vs. $\eta$. (b) $E[T_a]^*$ vs. $\eta$.

Figure 3-11 shows $m_a^*$ vs. $\eta$ and $E[T_a]^*$ vs. $\eta$ under the same setting as that in Section 3.6.1. From Figure 3-11(a), it can be observed that $m_a^*$ decreases as $\eta$ increases. This is because the larger the collision probability we can tolerate, the less space is required.
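Once $\alpha_{R\tau}$ can be evaluated, the minimum space criterion in (3-26) to (3-28) reduces to a one-dimensional search. The sketch below is a hypothetical illustration: it treats integer $\mathcal{K}$ as the decision variable (justified by $E[T_a] \approx \mathcal{K}$ from Equation (3-24)), reads (3-18) as $\eta = \frac{1}{R\tau+1}\sum_{n=0}^{R\tau}[1-\exp(-\mathcal{K}n/m_v)]^{\mathcal{K}}$, inverts it by bisection, and scans an arbitrarily chosen range $\mathcal{K} = 1, \ldots, 20$. The interior minimum it finds reflects the Case A / Case B trade-off discussed in Section 3.6.1.

```python
import numpy as np


def collision_prob(m_v: float, K: int, n_max: int) -> float:
    """Eq. (3-18), averaged over n = 0..n_max stored signatures (n_max = R * tau)."""
    n = np.arange(n_max + 1, dtype=float)
    return float(np.mean((1.0 - np.exp(-K * n / m_v)) ** K))


def alpha(eta: float, K: int, n_max: int, iters: int = 100) -> float:
    """alpha_{R tau}(eta, K): vector size reaching collision probability eta (bisection)."""
    hi = 1.0
    while collision_prob(hi, K, n_max) > eta:
        hi *= 2.0
    lo = hi / 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if collision_prob(mid, K, n_max) > eta:
            lo = mid
        else:
            hi = mid
    return hi


def min_space(eta_bar: float, n_max: int, w: int, L: int, K_values=range(1, 21)):
    """Eqs. (3-27)/(3-28): return (K*, m_a*) minimizing m_a = w * (2 * alpha + L),
    using K as a proxy for E[T_a]."""
    best_K = min(K_values, key=lambda K: alpha(eta_bar, K, n_max))
    return best_K, w * (2.0 * alpha(eta_bar, best_K, n_max) + L)
```

Repeating the search for several values of $\bar{\eta}$ gives curves of the kind shown in Figure 3-11.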

From Figure 3-11(b), one observes that, generally, $E[T_a]^*$ decreases as $\eta$ increases. This may be because the smaller $E[T_a]^*$ (i.e., $\mathcal{K}$), the larger the probability that all $\mathcal{K}$ hash functions of two different inputs return the same outputs, which is the collision probability; hence a larger tolerable collision probability permits a smaller $\mathcal{K}$.

Competitive optimality criterion. From Equation (3-18), it can be observed that $\eta$ decreases as $m_v$ increases if $\mathcal{K}$ is fixed; in other words, $m_v$ decreases as $\eta$ increases if $\mathcal{K}$ is fixed. Further, from Equations (3-19) and (3-25), it can be inferred that $m_a$ decreases as $\eta$ increases if $E[T_a]$ is fixed (note that $E[T_a] \approx \mathcal{K}$). This is shown in Figure 3-12 for $E[T_a] = 4$ and $E[T_a] = 6$. From the figure, it can be observed that the two curves intersect at a value of collision probability, denoted by $\eta_c$. This value is critical for the parameter setting of BFA. If a network operator has a desirable collision probability $\eta_1$ greater than $\eta_c$, then it should choose $E[T_a] = 4$, since this parameter setting gives both smaller time complexity and smaller space complexity. We call this property 'competitive optimality' since there is no trade-off between time complexity and space complexity in this case.

On the other hand, if a network operator has a desirable collision probability $\eta_1$ smaller than $\eta_c$, then it needs to make a trade-off between space complexity and time complexity.
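The crossing point $\eta_c$ can also be located numerically. For fixed $\mathcal{K}$, $m_a = w(2m_v + L)$ depends on $\eta$ only through $m_v$, so the curves for $E[T_a] = 4$ and $E[T_a] = 6$ cross exactly where a single vector size gives both values of $\mathcal{K}$ the same collision probability. The sketch below finds that size by bisection on $\log m_v$, again reading (3-18) as $\eta = \frac{1}{R\tau+1}\sum_{n=0}^{R\tau}[1-\exp(-\mathcal{K}n/m_v)]^{\mathcal{K}}$; the bracket $[R\tau, 1000R\tau]$ and the function names are assumptions for illustration.

```python
import math

import numpy as np


def collision_prob(m_v: float, K: int, n_max: int) -> float:
    """Eq. (3-18), averaged over n = 0..n_max stored signatures (n_max = R * tau)."""
    n = np.arange(n_max + 1, dtype=float)
    return float(np.mean((1.0 - np.exp(-K * n / m_v)) ** K))


def crossover(K1: int, K2: int, n_max: int, iters: int = 100):
    """Return (m_c, eta_c) where the curves for K1 < K2 meet."""
    def diff(log_m: float) -> float:
        m = math.exp(log_m)
        return collision_prob(m, K1, n_max) - collision_prob(m, K2, n_max)

    lo, hi = math.log(n_max), math.log(1000.0 * n_max)  # assumed bracket
    if not diff(lo) < 0.0 < diff(hi):
        raise ValueError("bracket does not straddle the crossing")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if diff(mid) < 0.0:
            lo = mid  # small vectors: the larger K still collides more
        else:
            hi = mid
    m_c = math.exp(hi)
    return m_c, collision_prob(m_c, K1, n_max)
```

Below $m_c$ (i.e., for a tolerated $\eta$ above $\eta_c$), $\mathcal{K} = 4$ reaches the target with less memory than $\mathcal{K} = 6$, which is the competitive-optimality regime described above.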

3.7 Simulation Results

In this section, we conduct two sets of experiments to show the performance of BFA

for feature extraction in high-speed networks. Section 3.7.1 compares the performance of

the BFA algorithm with that of the hash table algorithm. In Section 3.7.2, we show the

performance of the complete feature extraction system, which uses the BFA algorithm.

3.7.1 The BFA Algorithm vs. the Hash Table Algorithm

Simulation settings. We apply the hash table algorithm and the BFA algorithm to the time series of signatures extracted from real traffic traces, which were collected by Auckland University [27]. To make a fair comparison with the numerical results in Section 3.6.1, we use the same 96-bit signature, i.e., SA, DA, SP, and DP, and let $R = 250{,}000$ packets/second and $\Gamma = 80$ seconds, which translates to $250{,}000 \times 80 = 20$M input signatures for each simulation. These signatures are preloaded into memory before the simulations begin, so that the I/O speed of the hard drive does not affect the execution time of the simulations.

Figure 3-12. Space complexity vs. collision probability for fixed time complexity ($E[T_a] = 4$ and $E[T_a] = 6$).

For each simulation run of the hash table algorithm, we specify the memory size $m_h$ and measure the algorithm performance in terms of the average number of hash function calculations per signature query request, denoted by $\bar{T}_h$, and the execution time. By the Law of Large Numbers, $\bar{T}_h$ approaches the expected number of hash function calculations per signature query request, i.e., $E[T_h]$ in Equation (3-6), if we run the simulation many times with the same $m_h$. In our simulations, we run the hash table algorithm ten times, each time with a different set of input signatures but with the same $m_h$.

For each simulation run of the BFA algorithm, we specify the memory size $m_a$ and the number of hash functions $\mathcal{K}$, and measure the algorithm performance in terms of the collision frequency, denoted by $\hat{\eta}$, and the execution time. The collision frequency is defined as the ratio of the number of collision occurrences in BloomFilterSearch to the total number of BloomFilterSearch executions. By the Law of Large Numbers, $\hat{\eta}$ is a good estimate of the collision probability $\eta$.
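The collision frequency $\hat{\eta}$ can be illustrated with a small Monte Carlo experiment. The minimal sketch below inserts random 96-bit signatures into a single Bloom filter vector and counts how often a search for a never-inserted signature wrongly returns true. The random signatures, the salted-BLAKE2b hash family, and the function names are hypothetical stand-ins; the experiments in this section use signatures from the Auckland traces and the full BFA.

```python
import hashlib
import random
from typing import List


def positions(sig: bytes, K: int, m: int) -> List[int]:
    """K bit positions of a signature (hypothetical salted-BLAKE2b hash family)."""
    return [int.from_bytes(hashlib.blake2b(sig, digest_size=8,
                                           salt=i.to_bytes(16, "little")).digest(),
                           "little") % m
            for i in range(K)]


def collision_frequency(m: int, K: int, n_stored: int, n_queries: int,
                        seed: int = 0) -> float:
    """Estimate eta_hat = (#searches that wrongly return True) / (#searches)."""
    rng = random.Random(seed)
    bits = bytearray(m)
    stored = set()
    for _ in range(n_stored):
        sig = rng.getrandbits(96).to_bytes(12, "big")  # 96-bit <SA, DA, SP, DP>
        stored.add(sig)
        for j in positions(sig, K, m):
            bits[j] = 1
    collisions = searches = 0
    while searches < n_queries:
        sig = rng.getrandbits(96).to_bytes(12, "big")
        if sig in stored:          # query only never-inserted signatures
            continue
        searches += 1
        if all(bits[j] for j in positions(sig, K, m)):
            collisions += 1        # the filter reports a match that is not real
    return collisions / searches
```

With more queries the estimate concentrates around the true collision probability, which is the Law of Large Numbers argument used above.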

Performance comparison between hash table and BFA. Figure 3-13 shows the average processing time per query vs. memory size for the hash table algorithm, the BFA algorithm with $\eta = 0.1\%$, and the BFA algorithm with $\eta = 1\%$.

Figure 3-13. Memory size (in bits) vs. average processing time per query (in μs).

From Figure 3-13, we observe that 1) compared to the hash table algorithm, the BFA algorithm requires less memory for the same time complexity (average processing time per query), as predicted in Section 3.6, and 2) the BFA algorithm with $\eta = 1\%$ has a better space-complexity/time-complexity trade-off than the BFA algorithm with $\eta = 0.1\%$, but at the cost of a higher collision probability, as predicted by the numerical results in Figure 3-10.

Figure 3-14 shows the average processing time per query vs. the average number of hash function calculations per query. It can be observed that the average processing time per query increases linearly with the average number of hash function calculations per query. For this reason, instead of running simulations to obtain the time complexity (i.e., the average processing time per query), in Section 3.6.1 we used the average number of hash function calculations per query to represent the time complexity of the hash table algorithm and the BFA algorithm.

Figure 3-14. Average processing time per query (in μs) vs. average number of hash function calculations per query.

Performance comparison between numerical and simulation results.

Figure 3-15 compares the simulation results with the numerical results obtained from the analysis in Section 3.6, for both the hash table algorithm and the BFA algorithm, in terms of space complexity vs. time complexity.

In Figure 3-15(a), the numerical result agrees well with the simulation result, except when the average number of hash function calculations per query is close to 1. From Equation (3-6), as the expected number of hash function calculations approaches 1, the required memory size approaches infinity; in contrast, simulations with a large $m_h$ may not give accurate results, due to the limited memory of a computer. This causes the large discrepancy between the numerical result and the simulation result when the average number of hash function calculations per query is close to 1. When the average number of hash function calculations per query is greater than or equal to two, the simulation always requires more memory than the numerical result. This is because practical hash functions are not perfect; that is, entries in the hash table are not equally likely to be accessed. Hence, Equation (3-2) does not hold perfectly, and neither does Equation (3-3). As a result, the average number of hash function calculations per query in simulation is larger than that predicted by Equation (3-6).

Figure 3-15. Comparison of numerical and simulation results (memory size vs. average number of hash function calculations per query). (a) Hash table algorithm. (b) BFA algorithm with $\eta = 1\%$.

Figure 3-15(b) shows that the numerical result agrees well with the simulation result for all the values of the average number of hash function calculations per query in our study.

3.7.2 Experiment on Feature Extraction System

In this section, we show the performance of the complete feature extraction system

implemented on traffic monitors and local analyzers, which uses the BFA algorithm.

We conduct this experiment because we would like to know the performance of the whole hierarchical feature extraction architecture presented in Section 3.2. In contrast, the experiment in Section 3.7.1 does not involve the interaction among the three levels.

Experiment settings. We use the trace data provided by Auckland University [27]

as the background traffic. This data set consists of packet header information of traffic

between the Internet and Auckland University. The connection is OC-3 (155 Mb/s) for

both directions.









In our experiment, we use two 24-hour traces as the background traffic. We simulate network anomalies caused by TCP SYN flood attacks [18] by randomly inserting TCP SYN packets with random source IP addresses into the background trace during specified time periods. Specifically, synchronized attacks are simulated during seconds 14000-16000 and 28000-32000 in both traces. In addition, asynchronous attacks are launched during seconds 50000-52000 in trace 1 and during seconds 57000-59000 in trace 2. The average attack rate is 1% of the packet rate of the background traffic during the same period.

To detect TCP SYN flood attacks, we choose $\langle SA, DA, SP, DP\rangle$ as the signature of inbound packets and $\langle DA, SA, DP, SP\rangle$ for outbound ones. Thus the two-way matching feature is the number of unmatched inbound TCP SYN packets in one time slot. The average flow rate is 2480 flows/second; therefore, we set $R = 2480$. We further set $\eta = 0.1\%$, $\mathcal{K} = 8$, $w = 8$, and $\tau = 10$ seconds. Then, by solving Equation (3-18) for $m_v$ and requiring $m_v$ to be a power of 2, we obtain $m_v = 2^{15}$ bits. The computer used for our experiments has one 2.4 GHz CPU and 1 GB of memory. For comparison, we also extract the number of inbound SYN packets in a slot.
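To make the two-way matching feature concrete, the sketch below counts, per slot, the inbound SYN packets whose signature $\langle SA, DA, SP, DP\rangle$ is not matched by any outbound packet's reversed signature $\langle DA, SA, DP, SP\rangle$. It is a deliberately simplified, exact version and not the BFA-based extractor: it uses Python sets instead of the Bloom filter array, matches only within the same slot (BFA tracks a sliding window of $w$ slots), and the packet tuple layout is a hypothetical format.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical packet record: (time_s, direction, SA, DA, SP, DP, is_syn)
Packet = Tuple[float, str, str, str, int, int, bool]


def unmatched_syn_counts(packets: Iterable[Packet], tau: float) -> Dict[int, int]:
    """Per slot, count inbound SYNs whose <SA, DA, SP, DP> is not matched by any
    outbound packet's reversed signature <DA, SA, DP, SP> in the same slot."""
    inbound = defaultdict(list)   # slot -> inbound SYN signatures
    outbound = defaultdict(set)   # slot -> reversed outbound signatures
    for t, direction, sa, da, sp, dp, is_syn in packets:
        slot = int(t // tau)
        if direction == "in" and is_syn:
            inbound[slot].append((sa, da, sp, dp))
        elif direction == "out":
            outbound[slot].add((da, sa, dp, sp))
    return {slot: sum(sig not in outbound.get(slot, set()) for sig in sigs)
            for slot, sigs in inbound.items()}
```

For example, a SYN from a client that receives a SYN-ACK in the same slot is matched, whereas a SYN with a spoofed source address, which never receives a reply, is counted as unmatched.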

Performance.

The average processing rate is measured to be 265,000 packets/second. Hence, the algorithm can handle a line rate of 1 Gb/s, since the average Internet packet size is about 500 bytes. Note that our test is offline and data is read from a hard disk, whose access speed is much lower than that of memory. In a real implementation, data is captured by a high-speed network interface and kept in memory, so the processing speed can be increased. Furthermore, in our test, the hash functions are implemented in software, which is much slower than dedicated hardware. Therefore, it is reasonable to anticipate a higher processing rate if dedicated hardware is used.
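The line-rate claim follows from simple arithmetic, taking the quoted average packet size of 500 bytes:

$$265{,}000\ \tfrac{\text{packets}}{\text{s}} \times 500\ \tfrac{\text{bytes}}{\text{packet}} \times 8\ \tfrac{\text{bits}}{\text{byte}} = 1.06 \times 10^{9}\ \text{b/s} \approx 1\ \text{Gb/s}.$$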









We show the features extracted from the two traces, specifically, the number of SYN packet arrivals and the number of unmatched inbound SYN packet arrivals during a slot (Figure 3-16).

From Figure 3-16, it can be observed that the features are rather noisy, especially the number of SYN packets. From Figs. 3-16(a) and 3-16(c), we can hardly distinguish (by visual inspection) the slots under the low-volume synchronized attacks from the slots without attacks. In comparison, it is much easier to identify the slots under the synchronized attacks (by visual inspection) when the number of unmatched SYN packets is used as the feature (see Slots 1400-1600 and Slots 2800-3200 in Figs. 3-16(b) and 3-16(d)).

3.8 Summary

This chapter is concerned with the design of data structures and algorithms for network anomaly detection, more specifically, feature extraction for network anomaly detection. Our objective is to design efficient data structures and algorithms for feature extraction that can cope with a link whose line rate is on the order of Gb/s. We proposed a novel data structure, namely the Bloom filter array, to extract the so-called two-way matching features, which are shown to be effective indicators of network anomalies. Our key technique is to use a Bloom filter array to trade off a small amount of accuracy in feature

extraction, for much less space and time complexity. Different from the existing work, our

data structure has the following properties: 1) .;,ir i.' Bloom filter, 2) combination of a

sliding window with the Bloom filter, and 3) using an insertion-removal pair to enhance

the Bloom filter with a removal operation. Our analysis and simulation demonstrate

that the proposed data structure has a better space/time trade-off than conventional

algorithms.

Next, we discuss classification algorithms based on the extracted features.
























Figure 3-16. Feature data: (a) Number of SYN packets (link 1), (b) Number of unmatched SYN packets (link 1), (c) Number of SYN packets (link 2), and (d) Number of unmatched SYN packets (link 2). (Horizontal axis: time slot.)

CHAPTER 4
MACHINE LEARNING ALGORITHM FOR NETWORK ANOMALY DETECTION

4.1 Introduction

The third issue in network anomaly detection is the classification algorithm. In this section, we introduce three basic detection algorithms: the threshold-based algorithm, the change-point algorithm, and Bayesian decision theory. Our machine learning algorithm derives from Bayesian decision theory.

This section is organized as follows. Section 4.1.1 introduces the Receiver Operating Characteristics curve, which is used in Section 4.5 as the metric to compare the performance of different classification methods. We describe the threshold-based algorithm, the change-point algorithm, and Bayesian decision theory in Sections 4.1.2, 4.1.3, and 4.1.4, respectively.

4.1.1 Receiver Operating Characteristics Curve

The Receiver Operating Characteristics (ROC) curve [28] is a typical method to quantify the performance of detection algorithms. It is a plot of detection probability vs. false alarm probability. In practice, we estimate the detection probability and the false alarm probability by the fraction of true positives and the fraction of false positives, respectively. Hence, to obtain an ROC curve, one needs to measure the following quantities:

* $A_f$: the number of false alarms, i.e., the number of slots in which the detection algorithm declares 'abnormal' given that no anomaly actually happens in these slots;

* $A_n$: the number of slots in which no anomaly happens;

* $A_d$: the number of slots in which the detection algorithm declares 'abnormal' given that network anomalies actually happen in these slots;

* $A_a$: the number of slots in which network anomalies happen.

The false alarm probability and the detection probability of the detection algorithm can be estimated by $A_f/A_n$ and $A_d/A_a$, respectively. By varying the parameters of detection algorithms, we can obtain different pairs of false alarm probability and detection probability, which give the ROC curve [28]. In this dissertation, we use the ROC curve to compare the performance of different detection algorithms.
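The estimates $A_f/A_n$ and $A_d/A_a$ can be computed from per-slot decisions and ground-truth labels, as in the minimal sketch below. The function name and the boolean-list input format are assumptions for illustration.

```python
from typing import Sequence, Tuple


def roc_point(declared: Sequence[bool], truth: Sequence[bool]) -> Tuple[float, float]:
    """Return (A_f / A_n, A_d / A_a) for one detector setting.

    declared[i] is True if slot i was declared 'abnormal';
    truth[i] is True if an anomaly actually happened in slot i.
    """
    A_f = sum(d and not t for d, t in zip(declared, truth))  # false alarms
    A_n = sum(not t for t in truth)                           # normal slots
    A_d = sum(d and t for d, t in zip(declared, truth))       # detections
    A_a = sum(truth)                                          # abnormal slots
    return A_f / A_n, A_d / A_a
```

Calling `roc_point` once per parameter setting yields the set of (false alarm, detection) pairs that make up the ROC curve.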

Next, we introduce some basic classification algorithms.

4.1.2 Threshold-Based Algorithm

The idea of the threshold-based algorithm is simple: if the feature value exceeds a preset threshold, declare 'abnormal'; otherwise, declare 'normal'. Note that the detection operation is conducted in each slot. By tuning the threshold for the feature value, we can obtain different pairs of false alarm probability and detection probability, resulting in the ROC curve. Given the ROC curve and the desired false alarm probability, one can determine the value of the threshold for the detection operation.

Although it is the simplest method, the threshold-based algorithm can only be used when the features differ significantly between normal and abnormal conditions. Therefore, the threshold-based algorithm is not suitable for detecting low-volume network anomalies.
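A threshold sweep then traces out the ROC curve, and the threshold can be read off for a target false alarm probability. The sketch below is a hypothetical helper, not code from this work: it declares a slot abnormal when its feature exceeds the threshold, records one (threshold, false alarm, detection) triple per threshold, and picks the threshold with the highest detection probability whose false alarm probability stays within the target.

```python
from typing import List, Sequence, Tuple


def threshold_roc(features: Sequence[float], truth: Sequence[bool],
                  thresholds: Sequence[float]) -> List[Tuple[float, float, float]]:
    """One (threshold, false alarm prob, detection prob) triple per threshold."""
    A_n = sum(not t for t in truth)
    A_a = sum(truth)
    points = []
    for h in thresholds:
        declared = [x > h for x in features]      # 'abnormal' iff feature > threshold
        A_f = sum(d and not t for d, t in zip(declared, truth))
        A_d = sum(d and t for d, t in zip(declared, truth))
        points.append((h, A_f / A_n, A_d / A_a))
    return points


def pick_threshold(points, max_false_alarm: float) -> float:
    """Highest-detection threshold whose false alarm probability meets the target."""
    feasible = [p for p in points if p[1] <= max_false_alarm]
    if not feasible:
        raise ValueError("no threshold meets the false alarm target")
    return max(feasible, key=lambda p: p[2])[0]
```

When normal and abnormal feature values overlap heavily, as for low-volume anomalies, no threshold achieves both a low false alarm probability and a high detection probability, which is the limitation noted above.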

4.1.3 Change-Point Algorithm

In the literature, a simple change-point algorithm, the non-parametric Cumulative Sum (CUSUM) algorithm, has been widely used [11-13, 17]. However, existing studies only consider the change from the normal state to the abnormal state, which means that the number of false alarms can be very large after network anomalies end. To facilitate the discussion, we define the parameters used in CUSUM in Table 4-1.

Table 4-1: Parameters used in CUSUM

| Parameter | Description |
|---|---|
| $\phi(t_i)$ | The observed traffic feature at the end of time slot $i$. |
| $\phi_n$ | The expectation of $\phi(t_i)$ in normal states. |
| $\phi_a$ | The expectation of $\phi(t_i)$ in abnormal states. Without loss of generality, we assume that $\phi_n < \phi_a$. |
| $\tilde{\phi}(t_i)$ | The adjusted variable, defined as $\tilde{\phi}(t_i) = \phi(t_i) - a$, where $a$ is a parameter such that $\phi_n < a < \phi_a$. |









Now define the variable $S(t_i)$ by

$$S(t_i) = \begin{cases} 0, & i = 0,\\ \max\left(0,\ S(t_{i-1}) + \tilde{\phi}(t_i)\right), & i > 0. \end{cases} \tag{4-1}$$

In the CUSUM algorithm, if $S(t_i)$ is smaller than a threshold $\mathcal{H}_{\text{CUSUM}}$, declare that the network state is normal; otherwise, declare that the state is abnormal.

From the discussion above, we note that two parameters, i.e., $a$ and $\mathcal{H}_{\text{CUSUM}}$, need to be determined. However, we cannot uniquely determine these two parameters. To overcome this problem, we introduce another parameter, the detection delay, denoted by $V$. According to change-point theory, we have

$$\frac{V}{\mathcal{H}_{\text{CUSUM}}} = \frac{1}{(\phi_a - \phi_n) - (a - \phi_n)} = \frac{1}{\phi_a - a}. \tag{4-2}$$

From Equation (4-2), we obtain

$$\mathcal{H}_{\text{CUSUM}} = V \times (\phi_a - a). \tag{4-3}$$

Hence, once $V$ and $a$ are given, we can determine $\mathcal{H}_{\text{CUSUM}}$ through Equation (4-3).¹ Given $a$ and $\mathcal{H}_{\text{CUSUM}}$, we can use the CUSUM algorithm to detect network anomalies.
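A minimal sketch of the CUSUM detector in Equation (4-1), with the threshold set from the detection delay via Equation (4-3), is shown below. The function name and argument layout are assumptions, and $\phi_a$, $a$, and $V$ are taken as known inputs.

```python
from typing import List, Sequence


def cusum_detect(phi: Sequence[float], phi_a: float, a: float, V: float) -> List[bool]:
    """Non-parametric CUSUM of Eq. (4-1) with H_CUSUM = V * (phi_a - a) (Eq. (4-3)).

    phi holds the observed features phi(t_1), phi(t_2), ...; the result holds one
    decision per slot (True = 'abnormal').
    """
    H = V * (phi_a - a)
    S = 0.0                                # S(t_0) = 0
    decisions = []
    for x in phi:
        S = max(0.0, S + (x - a))          # adjusted variable phi(t_i) - a
        decisions.append(S >= H)           # 'normal' iff S(t_i) < H_CUSUM
    return decisions
```

For example, with $\phi_a = 5$, $a = 3$, and $V = 2$ (so $\mathcal{H}_{\text{CUSUM}} = 4$), the sequence 1, 1, 1, 5, 5, 5, 1, 1, 1 is flagged at slots 4 to 6 (0-based): the alarm starts one slot after the change and persists one slot after the feature returns to normal, which is the post-attack false alarm effect discussed next.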

We notice that the existing CUSUM algorithms [11-13] only consider one change, i.e., from the normal state to the abnormal state. In practice, this approach may lead to a large number of false alarms after the end of attacks. To mitigate the high false alarm rate of the existing algorithms, which we call single-CUSUM algorithms, we develop a dual-CUSUM algorithm. In this algorithm, one CUSUM is used to detect the change from the normal to the abnormal state, while another CUSUM is responsible for detecting the change from the abnormal to the normal state. The method of setting parameters for dual-CUSUM is similar to the method described in this section. Tuning $\mathcal{H}_{\text{CUSUM}}$ yields the ROC curves of both the CUSUM and dual-CUSUM methods.

¹ In Refs. [11] and [12], $a = (\phi_a + \phi_n)/2$; thus only the detection delay is needed.
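The dual-CUSUM idea can be sketched as a two-state detector. This is one plausible reading, stated as an assumption rather than the exact algorithm of this work: in the normal state an upward CUSUM on $\phi(t_i) - a_{\text{up}}$ looks for the onset of an anomaly, and once the state is abnormal a second CUSUM on $a_{\text{down}} - \phi(t_i)$ looks for its end; each statistic is reset when the state switches. The parameter names are hypothetical.

```python
from typing import List, Sequence


def dual_cusum(phi: Sequence[float], a_up: float, H_up: float,
               a_down: float, H_down: float) -> List[bool]:
    """Two-state detector: an upward CUSUM detects normal -> abnormal, and a
    downward CUSUM detects abnormal -> normal (hypothetical formulation)."""
    abnormal = False
    S_up = S_down = 0.0
    states = []
    for x in phi:
        if not abnormal:
            S_up = max(0.0, S_up + (x - a_up))        # grows when phi exceeds a_up
            if S_up >= H_up:
                abnormal, S_down = True, 0.0          # switch state, reset the other CUSUM
        else:
            S_down = max(0.0, S_down + (a_down - x))  # grows when phi falls below a_down
            if S_down >= H_down:
                abnormal, S_up = False, 0.0
        states.append(abnormal)
    return states
```

On the sequence 1, 1, 1, 5, 5, 5, 1, 1, 1 with $a_{\text{up}} = a_{\text{down}} = 3$, $\mathcal{H}_{\text{up}} = 4$, and $\mathcal{H}_{\text{down}} = 2$, this flags slots 4 and 5 (0-based) only, whereas a single upward CUSUM with $a = 3$ and threshold 4 would still be in alarm at slot 6.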

Although CUSUM performs better than the threshold-based method, its detection accuracy is still unsatisfactory. We developed a machine learning algorithm that dramatically outperforms both the single and dual CUSUM algorithms (Section 4.5.2). Our machine learning algorithm is based on Bayesian decision theory, which is introduced in the next section.

4.1.4 Bayesian Decision Theory

Bayesian decision theory is a "fundamental statistical approach to the problem of pattern classification" [29]. It is composed of

* the feature space, $D$, which might be a multi-dimensional Euclidean space;

* $U$ states of nature, $\mathcal{H} = \{H_u;\ u \in \mathbb{Z}_U\}$;

* the prior probability distribution, $P(H)$, $H \in \mathcal{H}$;

* the likelihood probabilities, $p(\phi \mid H)$, $\phi \in D$, $H \in \mathcal{H}$;

* the loss function, $\lambda(H^*, H)$, $H^*, H \in \mathcal{H}$, which describes the loss incurred for classifying an object to be of class $H^*$ when its state of nature is class $H$.

Note that, in this dissertation, $P(\cdot)$ represents a probability mass function (PMF) [30] and $p(\cdot)$ a probability density function (pdf) [30].
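Given the observed feature, $\phi$, of an object, Bayesian decision theory classifies it to be of class $\hat{H}$ such that

$$\hat{H} = \arg\min_{H^* \in \mathcal{H}} \sum_{H \in \mathcal{H}} \lambda(H^*, H)\, P(H \mid \phi). \tag{4-4}$$

By Bayes' formula [30],

$$P(H \mid \phi) = \frac{p(\phi \mid H)\,P(H)}{\sum_{H' \in \mathcal{H}} p(\phi \mid H')\,P(H')} = \frac{p(\phi \mid H)\,P(H)}{p(\phi)}. \tag{4-5}$$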









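Since $p(\phi)$ does not depend on $H^*$, Equation (4-4) is equivalent to

$$\hat{H} = \arg\min_{H^* \in \mathcal{H}} \sum_{H \in \mathcal{H}} \lambda(H^*, H)\, p(\phi \mid H)\, P(H). \tag{4-6}$$

Equation (4-6) gives the Bayesian criterion for pattern classification.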
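Equation (4-6) can be evaluated directly once the likelihoods at the observed feature, the priors, and a loss matrix are available. The sketch below is illustrative; the list-based inputs and the function name are assumptions.

```python
from typing import Sequence


def bayes_decide(likelihoods: Sequence[float], priors: Sequence[float],
                 loss: Sequence[Sequence[float]]) -> int:
    """Eq. (4-6): return the index u* minimizing
    sum_u loss[u*][u] * p(phi | H_u) * P(H_u).

    likelihoods[u] is p(phi | H_u) at the observed phi, priors[u] is P(H_u),
    and loss[s][u] is the loss of deciding H_s when the true state is H_u.
    """
    U = len(priors)
    risks = [sum(loss[s][u] * likelihoods[u] * priors[u] for u in range(U))
             for s in range(U)]
    return min(range(U), key=risks.__getitem__)
```

With the 0-1 loss `[[0, 1], [1, 0]]` this reduces to picking the class with the largest posterior; with a diagonal loss `[[-1, 0], [0, -5]]` it weights a correct abnormal decision five times as heavily, anticipating the gain-factor form introduced next.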

A simple loss function is defined to be

$$\lambda(H_{u^*}, H_u) = -\xi(u)\,\mathcal{I}(u^* = u), \quad \forall\, u \in \mathbb{Z}_U, \tag{4-7}$$

where $\xi(u)$ is the gain factor, typically positive, representing the gain obtained by correctly detecting $H_u$, and $\mathcal{I}(\cdot)$ is the indicator function such that

$$\mathcal{I}(x) = \begin{cases} 1 & \text{if } x \text{ is true},\\ 0 & \text{if } x \text{ is false}. \end{cases} \tag{4-8}$$

Equation (4-7) specifies that misclassification induces zero loss and correct classification induces negative loss, i.e., a gain. Applying Equation (4-7) to Equation (4-6), the Bayesian criterion simplifies to

$$\hat{u} = \arg\max_{u \in \mathbb{Z}_U} \xi(u)\, p(\phi \mid H_u)\, P(H_u). \tag{4-9}$$

We call Equation (4-9) the maximum gain criterion. Further note that scaling all gain factors by the same factor does not change the criterion specified by Equation (4-9). Hence, we can always set $\xi(0) = 1$. By tuning the other gain factors, we can generate the ROC curve for Bayesian decision theory.
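The maximum gain criterion (4-9), and the way tuning $\xi(1)$ with $\xi(0) = 1$ traces an ROC curve, can be illustrated with a two-class example. The Gaussian likelihoods, their parameters, the priors, and the synthetic samples below are hypothetical choices made only for illustration; in this work the likelihoods and priors are obtained by training (Section 4.3).

```python
import math
import random
from typing import List, Sequence, Tuple


def gauss_pdf(x: float, mu: float, sigma: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def max_gain_decide(phi: float, xi: Sequence[float], priors: Sequence[float],
                    params: Sequence[Tuple[float, float]]) -> int:
    """Eq. (4-9): argmax_u xi(u) p(phi | H_u) P(H_u), with Gaussian likelihoods
    N(mu_u, sigma_u) assumed for illustration. Ties go to the lower index."""
    scores = [xi[u] * gauss_pdf(phi, *params[u]) * priors[u] for u in range(len(priors))]
    return max(range(len(scores)), key=scores.__getitem__)


def roc_by_gain(samples, labels, xi1_values, priors, params) -> List[Tuple[float, float, float]]:
    """Fix xi(0) = 1, sweep xi(1), and return (xi1, false alarm, detection) triples."""
    A_n, A_a = labels.count(0), labels.count(1)
    curve = []
    for xi1 in xi1_values:
        dec = [max_gain_decide(x, (1.0, xi1), priors, params) for x in samples]
        fa = sum(d == 1 and y == 0 for d, y in zip(dec, labels)) / A_n
        pd = sum(d == 1 and y == 1 for d, y in zip(dec, labels)) / A_a
        curve.append((xi1, fa, pd))
    return curve


# Hypothetical example: feature ~ N(10, 3) when normal, N(14, 3) when abnormal.
rng = random.Random(1)
params = [(10.0, 3.0), (14.0, 3.0)]
labels = [0] * 900 + [1] * 100
samples = [rng.gauss(*params[y]) for y in labels]
curve = roc_by_gain(samples, labels, [0.1, 1.0, 10.0, 100.0], [0.9, 0.1], params)
for xi1, fa, pd in curve:
    print(f"xi(1)={xi1:>6}: false alarm={fa:.3f}, detection={pd:.3f}")
```

Raising $\xi(1)$ can only enlarge the region declared abnormal, so both the false alarm probability and the detection probability are non-decreasing along the sweep.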

In this chapter, we extend Bayesian decision theory to network anomaly detection. The remainder of this chapter is organized as follows. In Section 4.2, we establish Bayesian models for network anomaly detection. Sections 4.3 and 4.4 solve two fundamental problems of the Bayesian model, i.e., the training problem and the classification problem, respectively. Section 4.5 shows our simulation results and Section 4.6 concludes this chapter.









4.2 Bayesian Model for Network Anomaly Detection

In this section, we model the network anomaly detection problem in terms of Bayesian decision theory. This section is organized as follows. Section 4.2.1 generalizes the Bayesian model for network anomaly detection on traffic monitors and local analyzers. In Section 4.2.2, we extend this model to the whole autonomous system. Section 4.2.3 introduces the hidden Markov tree model to decrease the computational complexity of the general model defined in Section 4.2.2.

4.2.1 Bayesian Model for Traffic Monitors and Local Analyzers

As described earlier, both the traffic monitor and the local analyzer have local information of one edge router. They are able to detect network anomalies through a single edge router. That is, a traffic monitor declares an anomaly when it observes abnormal features extracted from one link of an edge router, such as a large data rate, a large SYN/FIN(RST) ratio, and so on. Similarly, a local analyzer detects network anomalies by observing the two-way matching features on one edge router. Next, we formulate the detection problem in terms of the Bayesian decision theory introduced in Section 4.1.4.








Figure 4-1. Generative process in graphical representation, in which the traffic state
generates the stochastic process of traffic.


In the context of network anomaly detection, there are two states of nature of an edge router, i.e., $\mathcal{H} = \{H_0, H_1\}$, where

* $H_0$ represents the normal state, in which no abnormal network traffic enters the AS through that edge router;

* $H_1$ represents the abnormal state, in which abnormal network traffic enters the AS through the edge router.









To formulate the model, we define a random variable $\Omega: \mathcal{H} \to \mathbb{Z}_2$ such that

$$\Omega(H_u) = u, \quad u \in \mathbb{Z}_2. \tag{4-10}$$

Furthermore, denote by A the traffic observed by traffic monitors. Since the network state induces a stochastic process of traffic, we employ the widely used graphical model representation [31] to depict this cause-effect relationship in Figure 4-1.







Figure 4-2. Extended generative model including traffic feature vectors: (a) original model and (b) simplified model.


Denote by $\phi \in D$ the feature extracted from traffic, where $D$ is the feature space. Most importantly, in selecting the optimal features, we seek the most discriminative statistical properties of the traffic. Also note that it is possible to employ multiple features in the detection procedure, in which case $\phi$ is a vector. Since features are succinct representations of the voluminous traffic, we extend the model in Figure 4-1 to the one illustrated in Figure 4-2(a). Once $\phi$ is extracted from A, we assume that $\phi$ represents A well. This means that we may operate only on the lower-dimensional $\phi$, which reduces computational complexity. Therefore, we simplify the model in Figure 4-2(a) to that illustrated in Figure 4-2(b), where A is dismissed.

Since the feature is measurable, it is called an observable random vector and is depicted by a rectangular node in Figure 4-2(b). The network state generating the traffic feature is to be estimated; we call it a hidden random variable and depict it by a round node in Figure 4-2(b). Now, the goal becomes to estimate the hidden state $\Omega$ given the observable $\phi$. The maximum gain criterion (see Equation (4-9)) specifies the estimate, $\hat{u}$, to be

$$\hat{u} = \arg\max_{u \in \mathbb{Z}_2} \xi(u)\, p(\phi \mid \Omega = u)\, P(\Omega = u). \tag{4-11}$$

Since $p(\phi \mid \Omega = u)$ and $P(\Omega = u)$ are unknown, we need to estimate them. This is the goal of training (Section 4.3).










Figure 4-3. Generative independent model that describes dependencies among traffic
states and traffic feature vectors.


An AS has many edge routers, each of which has multiple links. Traffic monitors deployed on the links and local analyzers on the edge routers extract features and make decisions independently. Therefore, the single-link model in Figure 4-2(b) is further extended to the more general model for the AS illustrated in Figure 4-3, where $K$ is the number of edge routers.

The limitation of the detection model in Figure 4-3 is that it assumes the edge routers are mutually independent, because traffic monitors and local analyzers have only local information about the AS. This model is suitable for detecting network anomalies accompanied by a high traffic volume on a single link, but not for detecting low-volume network anomalies. We address this limitation by introducing spatial correlation in the next section.

4.2.2 Bayesian Model for Global Analyzers

The novelty of our detection approach lies in introducing spatial correlation among edge routers into network anomaly detection. This section introduces the spatial correlation and its contribution to network anomaly detection. Since only global analyzers have information about the whole AS, a detection approach employing spatial correlation can be deployed only in global analyzers.

When a network anomaly occurs, usually more than one edge router exhibits abnormal symptoms. For example, when a DDoS attack is launched toward a victim in an AS, the attack traffic enters the AS through multiple edge routers because the attack sources are distributed. At each of those edge routers, the monitored attack traffic volume may be low; that is, each traffic monitor or local analyzer observes only a small deviation of the traffic features from their normal distribution. The global analyzer, however, upon receiving reports from the local analyzers, observes small deviations of the features at multiple edge routers simultaneously. Exploiting this spatial correlation therefore helps detect low-volume network anomalies.
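The benefit of pooling reports can be illustrated with a deliberately simplified calculation that is not part of the Bayesian model in this chapter. Assume that each link's feature is Gaussian with standard deviation σ, that links are independent, and that an attack raises the mean by δ on each of K links. One link then sees a shift of δ/σ standard deviations, while the sum over the K links sees a shift of √K·δ/σ. The numbers in the example (δ = 0.5σ, K = 16, a one-sided threshold of 2.33) are hypothetical.

```python
import math

def link_zscore(delta, sigma):
    """Mean shift of one link's feature, in units of its standard deviation."""
    return delta / sigma

def pooled_zscore(delta, sigma, k):
    """Mean shift of the sum of k independent links, each shifted by delta.

    The sum's mean shifts by k*delta, while its standard deviation grows only
    as sqrt(k)*sigma, so the normalized shift grows as sqrt(k).
    """
    return (k * delta) / (math.sqrt(k) * sigma)

def detection_prob(zshift, threshold):
    """P(Z + zshift > threshold) for a standard normal Z (one-sided test)."""
    return 0.5 * math.erfc((threshold - zshift) / math.sqrt(2.0))

if __name__ == "__main__":
    delta, sigma, k = 0.5, 1.0, 16   # hypothetical values
    thr = 2.33                       # roughly a 1% one-sided false-alarm level
    z1, zk = link_zscore(delta, sigma), pooled_zscore(delta, sigma, k)
    print(f"one link : shift {z1:.2f} sigma, P_D = {detection_prob(z1, thr):.3f}")
    print(f"{k} links : shift {zk:.2f} sigma, P_D = {detection_prob(zk, thr):.3f}")
```

With these numbers, the normalized shift grows from 0.5 to 2.0. The HMT approach developed below is a far richer way of exploiting the same effect.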













Figure 4-4. Generative dependent model that describes dependencies among edge routers.


Introducing spatial correlation into the independent model in Figure 4-3 results in the dependent model illustrated in Figure 4-4. The difference between the two models is that, from the viewpoint of a global analyzer, edge routers are no longer independent. Statistical dependence among the states of edge routers is represented by undirected connections. Note that the independent model can be regarded as a special case of the dependent one. Also note that we still assume the features extracted at one edge router are independent of the states of the other edge routers.












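$$\boldsymbol{\Omega} = (\Omega_1, \Omega_2, \ldots, \Omega_K)^T, \tag{4-12}$$

$$\mathbf{u} = (u_1, u_2, \ldots, u_K)^T, \tag{4-13}$$

$$\boldsymbol{\phi} = (\phi_1, \phi_2, \ldots, \phi_K)^T, \tag{4-14}$$

where $\Omega_i$ is the random variable representing the state of edge router $i$, $i \in \{1, \ldots, K\}$, defined in the same way as in Equation (4-10), and $\phi_i$ is the feature extracted at edge router $i$. We further assume that the gain factors are independent of the node index, i.e., $\lambda(u_i) = \lambda(u_{i'})$ whenever $u_i = u_{i'}$ (0 or 1), regardless of whether $i = i'$. Then the maximum gain criterion (see Equation (4-9)) for the dependent model is

$$\begin{aligned}
\hat{\mathbf{u}} &= \arg\max_{\mathbf{u}} \lambda(\mathbf{u})\, p(\boldsymbol{\phi} \mid \boldsymbol{\Omega} = \mathbf{u})\, P(\boldsymbol{\Omega} = \mathbf{u}) \\
&= \arg\max_{\mathbf{u}} \left[ \prod_{i=1}^{K} \lambda(u_i)\, p(\phi_i \mid \Omega_i = u_i) \right] P(\boldsymbol{\Omega} = \mathbf{u}).
\end{aligned} \tag{4-15}$$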

Because the dependent model takes spatial correlation into account, it can detect anomalies more accurately, especially when the traffic volume is low. However, it is computationally intractable: solving Equation (4-15) directly requires exhaustively computing $p(\boldsymbol{\phi} \mid \boldsymbol{\Omega} = \mathbf{u})\, P(\boldsymbol{\Omega} = \mathbf{u})$ for every possible combination $\mathbf{u}$, which results in $O(2^K)$ complexity. For a large AS, this is infeasible.
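The exponential cost is easy to see in a brute-force implementation of the second line of Equation (4-15). The sketch below is illustrative only. It assumes binary states, per-router likelihood values p(φ_i | Ω_i = u) supplied as numbers, one gain factor λ(u) per state, and a caller-supplied joint prior P(Ω = u). The pairwise "agreement" prior is a hypothetical choice used only to mimic spatial correlation; it is not the prior used in this work. The loop visits all 2^K state vectors.

```python
import itertools
import math

def brute_force_decode(lik, gain, prior):
    """Exhaustive maximum-gain decoding for the dependent model (Eq. 4-15).

    lik[i][u] : p(phi_i | Omega_i = u) for router i and state u in {0, 1}
    gain[u]   : gain factor lambda(u), identical for every router
    prior(u)  : joint prior P(Omega = u) for a state tuple u
    Returns the maximizing state tuple and the number of tuples examined.
    """
    k = len(lik)
    best, best_score, visited = None, -math.inf, 0
    for u in itertools.product((0, 1), repeat=k):   # all 2**k combinations
        visited += 1
        score = prior(u)
        for i, ui in enumerate(u):
            score *= gain[ui] * lik[i][ui]
        if score > best_score:
            best, best_score = u, score
    return best, visited

def agreement_prior(u, coupling=2.0):
    """Hypothetical unnormalized prior that favours routers sharing a state.

    The normalizing constant is omitted because it does not change the argmax.
    """
    agree = sum(1 for a, b in itertools.combinations(u, 2) if a == b)
    return coupling ** agree
```

For example, `brute_force_decode([(0.45, 0.55)] * 3 + [(0.52, 0.48)], (1.0, 1.0), agreement_prior)` labels all four routers abnormal. It does so even though the fourth router's own likelihood slightly favours the normal state, and it examines 16 tuples. Each additional router doubles that count.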

We introduce a hierarchical structure to reduce the computational complexity; this is the topic of the next section.

4.2.3 Hidden Markov Tree (HMT) Model for Global Analyzer

The dependent model illustrated in Figure 4-4 becomes computationally intractable because we assume that all edge routers are mutually dependent. Roughly speaking, if we break some of the dependencies in Figure 4-4, we can reduce the computational complexity. On the other hand, we would like to account for the dependencies among as many nodes as possible to provide accurate detection. To balance these two conflicting goals, we propose a hierarchical model, the hidden Markov tree (HMT) model, depicted in Figure 4-5.

Figure 4-5. Hidden Markov tree model. For a node $i$, $\rho(i)$ denotes its parent node and $\nu(i)$ denotes the set of its child nodes.


The motivation for applying the HMT model is our assumption that edge routers are not equally correlated. Instead, edge routers that are topologically close to each other have high mutual correlation. Based on this assumption, we cluster edge routers according to the topology of the AS and form a tree structure, as depicted in Figure 4-5. Without loss of generality, Figure 4-5 plots a quad-tree structure, i.e., each node except the leaf nodes has four children. To facilitate further discussion, each node in the HMT is assigned an integer index, beginning with 0, from top to bottom. That is, node 0 is always a root node.² Table 4-2 lists the notation used for the HMT in the rest of this chapter.

In the HMT, each leaf node stands for an edge router. Zero-padded virtual edge routers are introduced when the number of edge routers is not a power of $B$. The states of these zero-padded virtual nodes are always normal, and their features are always 0.

² An HMT might have multiple roots, depending on the number of edge routers and the number of levels.

Table 4-2: Notation for the hidden Markov tree model

| Notation | Description |
|---|---|
| $\Omega_i$ | The random variable representing the state of node $i$. |
| $\phi_i$ | The random variable/vector representing the feature(s) measured at node $i$. |
| $\phi_{\mathcal{T}}$ | $\{\phi_i;\ i \in \mathcal{T}\}$, where $\mathcal{T}$ is a set of nodes (e.g., a subtree) of the HMT. |
| $L$ | The number of levels of the HMT. |
| $\mathcal{E}$ | The set of all nodes in the HMT. |
| $\mathcal{E}_l$ | The set of nodes at level $l$, $l \in \mathbb{Z}_L$. Specifically, $\mathcal{E}_0$ is the set of root nodes and $\mathcal{E}_{L-1}$ the set of leaf nodes. |
| $B$ | The number of child nodes of each non-leaf node; e.g., $B = 4$ for the quad-tree HMT illustrated in Figure 4-5. |
| $\rho(i)$ | The parent node of node $i$, where $i \notin \mathcal{E}_0$. |
| $\nu(i)$ | The set of child nodes of node $i$, where $i \notin \mathcal{E}_{L-1}$. |
| $\mathcal{A}(i)$ | The set of ancestor nodes of node $i$, including node $i$ itself, where $i \in \mathcal{E}$. |
| $R(i)$ | The root node of the subtree containing node $i$, where $i \in \mathcal{E}$. |
| $\mathcal{T}_i$ | The subtree whose root is node $i$, where $i \in \mathcal{E}$. |
| $\bar{\mathcal{T}}_i$ | $\mathcal{T}_{R(i)} \setminus \mathcal{T}_i$. |

Non-leaf nodes represent clusters of edge routers. The features of the nodes are defined in Equation (4-16):

$$\phi_i = \begin{cases} \text{features measured at the corresponding edge router } i, & i \in \mathcal{E}_{L-1} \text{ (i.e., leaf node)}, \\ \dfrac{1}{B} \displaystyle\sum_{j \in \nu(i)} \phi_j, & i \notin \mathcal{E}_{L-1} \text{ (i.e., non-leaf node)}. \end{cases} \tag{4-16}$$

Note that only the features of leaf nodes have a physical meaning, i.e., features measured at the corresponding edge routers. The features of a non-leaf node are defined as the average of the features of its child nodes.
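Equation (4-16) can be evaluated level by level from the leaves upward. The sketch below is written under our own assumptions: a single root, scalar features, and nodes numbered top-down in breadth-first order, so that node i has children B·i+1, …, B·i+B. Leaves that do not correspond to a real edge router are zero-padded, as described above. The function name is hypothetical.

```python
def hmt_features(leaf_features, b):
    """Compute phi_i for every node of a complete b-ary tree (Eq. 4-16).

    leaf_features: scalar features measured at the real edge routers.
    Leaves are zero-padded up to the next power of b, and each non-leaf node
    takes the average of its b children. Node i has children b*i+1 ... b*i+b.
    Returns (features for all nodes, number of levels L).
    """
    n_leaves, levels = 1, 1
    while n_leaves < len(leaf_features):
        n_leaves *= b
        levels += 1
    padded = list(leaf_features) + [0.0] * (n_leaves - len(leaf_features))
    n_nodes = (b ** levels - 1) // (b - 1)
    phi = [0.0] * n_nodes
    first_leaf = n_nodes - n_leaves
    phi[first_leaf:] = padded
    # Visit non-leaf nodes bottom-up so children are filled before parents.
    for i in reversed(range(first_leaf)):
        phi[i] = sum(phi[c] for c in range(b * i + 1, b * i + b + 1)) / b
    return phi, levels
```

For instance, three routers in a binary tree are padded to four leaves, and the root feature becomes the average over all four leaves, including the virtual zero.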

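We make two assumptions for the HMT:

1. The state of a node depends only on the state of its parent, if the latter is known, i.e.,

$$P(\Omega_i \mid \Omega_j, j \in \mathcal{E}, j \neq i) = P(\Omega_i \mid \Omega_{\rho(i)}), \quad i \in \mathcal{E} \setminus \mathcal{E}_0. \tag{4-17}$$

2. The features measured at a node depend only on the state of that node, if it is known, i.e.,

$$p(\phi_i \mid \Omega_j, j \in \mathcal{E}) = p(\phi_i \mid \Omega_i), \quad i \in \mathcal{E}. \tag{4-18}$$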








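As in Sections 4.2.1 and 4.2.2, we employ the maximum gain criterion (see Equation (4-9)) to estimate the node states, i.e.,

$$\begin{aligned}
\hat{u}_i &= \arg\max_{u_i} \lambda(u_i)\, P(\Omega_i = u_i \mid \boldsymbol{\phi}) \\
&= \arg\max_{u_i} \left[ \prod_{i' \in \mathcal{A}(i) \setminus \{R(i)\}} \lambda(u_{i'})\, P(\Omega_{i'} = u_{i'} \mid \Omega_{\rho(i')} = u_{\rho(i')}, \boldsymbol{\phi}) \right] \lambda(u_{R(i)})\, P(\Omega_{R(i)} = u_{R(i)} \mid \boldsymbol{\phi}),
\end{aligned} \tag{4-19}$$

for all $i \in \mathcal{E}_{L-1}$. By applying the Viterbi algorithm to solve Equation (4-19), we reduce the computational complexity from $O(2^K)$ (see Section 4.2.2) to $O(B^L)$. This is the major advantage of introducing the HMT model. The details are given in Section 4.4.

Solving Equation (4-19) requires knowledge of $P(\Omega_i \mid \Omega_{\rho(i)}, \boldsymbol{\phi})$ for all $i \in \mathcal{E} \setminus \mathcal{E}_0$, and of $P(\Omega_i = u_i \mid \boldsymbol{\phi})$ for all $i \in \mathcal{E}_0$. By Bayes' formula [30],

$$P(\Omega_i = u_i \mid \Omega_{\rho(i)} = u_{\rho(i)}, \boldsymbol{\phi}) = \frac{P(\Omega_i = u_i, \Omega_{\rho(i)} = u_{\rho(i)} \mid \boldsymbol{\phi})}{P(\Omega_{\rho(i)} = u_{\rho(i)} \mid \boldsymbol{\phi})} \tag{4-20}$$

for all $i \in \mathcal{E} \setminus \mathcal{E}_0$. Therefore, solving Equation (4-19) translates to estimating

$$P(\Omega_i, \Omega_{\rho(i)} \mid \boldsymbol{\phi}), \quad \forall i \in \mathcal{E} \setminus \mathcal{E}_0, \tag{4-21}$$

and

$$P(\Omega_i \mid \boldsymbol{\phi}), \quad \forall i \in \mathcal{E}. \tag{4-22}$$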
Estimating Equations (4-21) and (4-22) in closed form is difficult. We propose a belief propagation (BP) algorithm, described in Section 4.3, to estimate them efficiently given knowledge of

* prior probabilities: $P(\Omega_i)$, $i \in \mathcal{E}_0$;

* likelihood: $p(\phi_i \mid \Omega_i)$, $i \in \mathcal{E}$;

* transition probabilities: $P(\Omega_i \mid \Omega_{\rho(i)})$, $i \in \mathcal{E} \setminus \mathcal{E}_0$.









Prior probabilities are usually difficult to estimate. In this chapter, we simply assume $P(\Omega_i = 0) = P(\Omega_i = 1) = \frac{1}{2}$, $i \in \mathcal{E}_0$; that is, the root nodes are equally likely to be normal or abnormal. The other parameters, namely the likelihood and the transition probabilities, are estimated from training data, as covered in Section 4.3. After that, classification using the maximum gain criterion is described in Section 4.4.
4.3 Estimation of HMT Parameters

In this section, we describe the estimation of the HMT parameters. In Section 4.3.1, we describe the estimation of the likelihood $p(\phi_i \mid \Omega_i)$, $i \in \mathcal{E}$. The estimation of the transition probabilities $P(\Omega_i \mid \Omega_{\rho(i)})$, $i \in \mathcal{E} \setminus \mathcal{E}_0$, is presented in Section 4.3.2.
4.3.1 Likelihood Estimation

For likelihood estimation, we collect two sets of training data.

* The set of features sampled in normal states, $\{\boldsymbol{\phi}_0^{(k)}; k \in \{1, \ldots, K_0\}\}$, where $K_0$ is the number of normal samples and $\boldsymbol{\phi}_0^{(k)} = \{\phi_{0,i}^{(k)}; i \in \mathcal{E}\}$ denotes the $k$th feature sample, with $\phi_{0,i}^{(k)}$ measured at node $i$;

* The set of features sampled in abnormal states, $\{\boldsymbol{\phi}_1^{(k)}; k \in \{1, \ldots, K_1\}\}$, where $K_1$ is the number of abnormal samples and $\boldsymbol{\phi}_1^{(k)} = \{\phi_{1,i}^{(k)}; i \in \mathcal{E}\}$ denotes the $k$th feature sample, with $\phi_{1,i}^{(k)}$ measured at node $i$.

Gaussian mixture model. To estimate the likelihood effectively, we assume that the random variables/vectors $\phi_i \mid \Omega_i$, $i \in \mathcal{E}$, follow a parametric statistical model. Likelihood estimation then translates to model parameter estimation. We establish the statistical model in the following.

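Because of its convenient analytical properties, the Gaussian (normal) distribution [30] is widely employed in many applications. The pdf of a $d$-dimensional multivariate Gaussian distribution with mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$ is

$$\mathcal{N}(\mathbf{x}; \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right). \tag{4-23}$$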
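Below is a minimal NumPy transcription of Equation (4-23). It works in the log domain, obtains log|Σ| from a Cholesky factor, and computes the quadratic form with a linear solve instead of forming Σ⁻¹ explicitly. These are common numerical choices, not something prescribed here, and the function name is hypothetical.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Evaluate N(x; mu, Sigma) of Eq. (4-23) for a d-dimensional point x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    sigma = np.atleast_2d(np.asarray(sigma, dtype=float))
    d = x.shape[0]
    diff = x - mu
    # Cholesky factorization raises LinAlgError unless Sigma is positive definite.
    chol = np.linalg.cholesky(sigma)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))   # log |Sigma|
    maha = diff @ np.linalg.solve(sigma, diff)       # (x-mu)^T Sigma^{-1} (x-mu)
    log_pdf = -0.5 * (d * np.log(2.0 * np.pi) + log_det + maha)
    return float(np.exp(log_pdf))
```

At the mean of a standard normal, `gaussian_pdf(0.0, 0.0, 1.0)` returns 1/√(2π) ≈ 0.3989, which corresponds to the peak of the curve in Figure 4-6.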


Figure 4-6 plots the pdf of the univariate Gaussian distribution $\mathcal{N}(x; 0, 1)$. The Gaussian distribution is unimodal [30], i.e., its pdf has only one peak.

Figure 4-6. Probability density function of the univariate Gaussian distribution $\mathcal{N}(x; 0, 1)$.


Figure 4-7. Histogram of the two-way matching features measured at a real network during network anomalies.



However, the empirical distribution of $\phi_i \mid \Omega_i$ may have multiple peaks. For example, Figure 4-7 shows the histogram of the two-way matching features measured in a real network during DDoS attacks, and it has two peaks. Hence, the unimodal Gaussian distribution is not suitable, and in this chapter we adopt the Gaussian mixture model (GMM) to model the likelihood distribution.











The motivation for the GMM is the following. Suppose we first randomly pick a number $g$ from the set $\{1, 2, \ldots, G\}$ with probability $\pi(g)$, where

$$\sum_{g=1}^{G} \pi(g) = 1. \tag{4-24}$$

Next, we generate a random variable $X$ from a Gaussian distribution with pdf $\mathcal{N}(\mathbf{x}; \boldsymbol{\mu}(g), \boldsymbol{\Sigma}(g))$. Then $X$ follows the $G$-state GMM, whose pdf is

$$p_X(\mathbf{x}) = \sum_{g=1}^{G} \pi(g)\, \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}(g), \boldsymbol{\Sigma}(g)), \tag{4-25}$$

where $\pi(g)$, $\boldsymbol{\mu}(g)$, and $\boldsymbol{\Sigma}(g)$ are known as the prior probability, mean vector, and covariance matrix of the $g$th Gaussian component in the GMM, respectively. A $G$-state GMM can exhibit multiple modes (one per component when the components are well separated), so it is suitable for modeling multimodal distributions.
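The two-stage generative description above translates directly into code. For brevity, the sketch below is restricted to scalar features (d = 1) and uses only Python's standard `math` and `random` modules. The function names and the parameter values in the example are hypothetical, loosely shaped like the two peaks in Figure 4-7.

```python
import math
import random

def gmm_pdf(x, weights, means, stds):
    """Eq. (4-25) for scalar x: sum over g of pi(g) * N(x; mu(g), sigma(g)^2)."""
    total = 0.0
    for w, m, s in zip(weights, means, stds):
        total += w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
    return total

def gmm_sample(weights, means, stds, rng=random):
    """Draw one sample: pick component g with probability pi(g), then sample N(mu(g), sigma(g)^2)."""
    g = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    return rng.gauss(means[g], stds[g])
```

With `weights = (0.6, 0.4)`, `means = (100.0, 400.0)`, and `stds = (30.0, 50.0)`, `gmm_pdf` has local maxima near 100 and 400 and a deep valley around 250. A single Gaussian cannot reproduce that shape. Passing `rng=random.Random(seed)` makes `gmm_sample` reproducible.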

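In this chapter, we assume that $\phi_i \mid \Omega_i$, $i \in \mathcal{E}$, follows a $G$-state GMM. That is, the likelihood of node $i$ is

$$p(\phi_i \mid \Omega_i = u) = \sum_{g=1}^{G} \pi_{i,u}(g)\, \mathcal{N}(\phi_i; \boldsymbol{\mu}_{i,u}(g), \boldsymbol{\Sigma}_{i,u}(g)), \tag{4-26}$$

for all $i \in \mathcal{E}$ and $u \in \{0, 1\}$, where $\pi_{i,u}(g)$, $\boldsymbol{\mu}_{i,u}(g)$, and $\boldsymbol{\Sigma}_{i,u}(g)$ are the prior probability, mean vector, and covariance matrix of the $g$th Gaussian component at node $i$ with state $u$, respectively.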

Next, we present schemes to estimate the GMM parameters.

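GMM parameter estimation. Based on Equation (4-26), likelihood estimation translates to estimating the GMM parameters $\pi_{i,u}(g)$, $\boldsymbol{\mu}_{i,u}(g)$, and $\boldsymbol{\Sigma}_{i,u}(g)$ for all $i \in \mathcal{E}$, $u \in \{0, 1\}$, and $g \in \{1, \ldots, G\}$, subject to the constraint

$$\sum_{g=1}^{G} \pi_{i,u}(g) = 1. \tag{4-27}$$

Denote by $\hat{\pi}_{i,u}(g)$, $\hat{\boldsymbol{\mu}}_{i,u}(g)$, and $\hat{\boldsymbol{\Sigma}}_{i,u}(g)$ the estimates of $\pi_{i,u}(g)$, $\boldsymbol{\mu}_{i,u}(g)$, and $\boldsymbol{\Sigma}_{i,u}(g)$, respectively.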









The most commonly used approach to estimating model parameters is the maximum-likelihood (ML) method. Given the training features at node $i$ with network state $u$, $\{\phi_{u,i}^{(1)}, \ldots, \phi_{u,i}^{(K_u)}\}$, the ML method chooses the parameters that maximize

$$\prod_{k=1}^{K_u} p(\phi_{u,i}^{(k)} \mid \Omega_i = u), \tag{4-28}$$

where $p(\cdot \mid \Omega_i = u)$ is given in Equation (4-26). Unfortunately, Nechyba [32] showed that the ML problem for a $G$-state GMM with $G > 1$ has no closed-form solution. In addition, a $G$-state GMM has a $3G$-dimensional continuous parameter space, and an exhaustive numerical search for the ML estimate in such a space is computationally intractable.

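1. Input: $\phi_{u,i}^{(k)}$, $k \in \{1, \ldots, K_u\}$; $\pi^{(0)}(g)$, $\boldsymbol{\mu}^{(0)}(g)$, and $\boldsymbol{\Sigma}^{(0)}(g)$, $g \in \{1, \ldots, G\}$.
2. Output: $\hat{\pi}_{i,u}(g)$, $\hat{\boldsymbol{\mu}}_{i,u}(g)$, and $\hat{\boldsymbol{\Sigma}}_{i,u}(g)$, $g \in \{1, \ldots, G\}$.
3. $j = 0$.
4. repeat
5. E-step: for all $k$ and $g$, $w_k^{(j)}(g) = \dfrac{\pi^{(j)}(g)\, \mathcal{N}(\phi_{u,i}^{(k)}; \boldsymbol{\mu}^{(j)}(g), \boldsymbol{\Sigma}^{(j)}(g))}{\sum_{g'=1}^{G} \pi^{(j)}(g')\, \mathcal{N}(\phi_{u,i}^{(k)}; \boldsymbol{\mu}^{(j)}(g'), \boldsymbol{\Sigma}^{(j)}(g'))}$
6. $\pi^{(j+1)}(g) = \frac{1}{K_u} \sum_{k=1}^{K_u} w_k^{(j)}(g)$
7. $\boldsymbol{\mu}^{(j+1)}(g) = \dfrac{\sum_{k} w_k^{(j)}(g)\, \phi_{u,i}^{(k)}}{\sum_{k} w_k^{(j)}(g)}$
8. $\boldsymbol{\Sigma}^{(j+1)}(g) = \dfrac{\sum_{k} w_k^{(j)}(g)\, (\phi_{u,i}^{(k)} - \boldsymbol{\mu}^{(j+1)}(g))(\phi_{u,i}^{(k)} - \boldsymbol{\mu}^{(j+1)}(g))^T}{\sum_{k} w_k^{(j)}(g)}$
9. $j \leftarrow j + 1$
10. until convergence
11. $\hat{\pi}_{i,u}(g) = \pi^{(j)}(g)$, $\hat{\boldsymbol{\mu}}_{i,u}(g) = \boldsymbol{\mu}^{(j)}(g)$, $\hat{\boldsymbol{\Sigma}}_{i,u}(g) = \boldsymbol{\Sigma}^{(j)}(g)$, for all $g \in \{1, \ldots, G\}$.

Figure 4-8. The EM algorithm for estimating $p(\phi_i \mid \Omega_i = u)$, $i \in \mathcal{E}$, $u \in \{0, 1\}$.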


A practical solution to this issue is the expectation-maximization (EM) algorithm [29, 30]. Nechyba [32] derived the EM algorithm for the GMM in detail. Figure 4-8 illustrates the algorithm.

The EM algorithm requires initial values for the parameters, denoted by $\pi^{(0)}(g)$, $\boldsymbol{\mu}^{(0)}(g)$, and $\boldsymbol{\Sigma}^{(0)}(g)$ in Figure 4-8. At each iteration $j$, the EM algorithm uses the parameters estimated at iteration $j-1$ to calculate new estimates. Although both EM and the numerical ML method search the parameter space, EM does so more effectively: each EM iteration is guaranteed to produce parameter estimates that do not decrease the likelihood in Equation (4-28). As a result, the EM algorithm converges much faster than a numerical ML search.

The disadvantage of the EM algorithm, however, is that it may converge to a local maximum rather than the global one, and the initial parameter values determine which local maximum is reached. In practice, we have prior knowledge of the network features, which helps in choosing the initial values.
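For reference, here is a compact EM loop for a scalar G-state GMM. It uses the standard E-step/M-step updates of Figure 4-8 specialized to d = 1. This is a generic textbook version, not a transcription of Nechyba's derivation. It assumes scalar features and uses a fixed iteration cap with a log-likelihood tolerance as the convergence test. It also adds a small variance floor, which is our own addition, to avoid degenerate components, and it does not guard against components that lose all their responsibility. The arguments `weights`, `means`, and `variances` play the role of π^(0)(g), μ^(0)(g), and Σ^(0)(g).

```python
import math

def em_gmm_1d(data, weights, means, variances, max_iter=200, tol=1e-8, var_floor=1e-6):
    """Fit a scalar G-state GMM by expectation-maximization.

    Returns the fitted (weights, means, variances) and the log-likelihood of the
    parameters at the start of each iteration; EM keeps this trace non-decreasing
    (up to round-off).
    """
    w, m, v = list(weights), list(means), list(variances)
    n, num_comp = len(data), len(w)
    trace = []
    for _ in range(max_iter):
        # E-step: responsibility r[k][g] = P(component g | x_k) under current parameters.
        resp, loglik = [], 0.0
        for x in data:
            dens = [w[g] * math.exp(-0.5 * (x - m[g]) ** 2 / v[g]) / math.sqrt(2 * math.pi * v[g])
                    for g in range(num_comp)]
            total = sum(dens)
            loglik += math.log(total)
            resp.append([d / total for d in dens])
        trace.append(loglik)
        # M-step: responsibility-weighted prior, mean, and variance for each component.
        for g in range(num_comp):
            ng = sum(r[g] for r in resp)
            w[g] = ng / n
            m[g] = sum(r[g] * x for r, x in zip(resp, data)) / ng
            v[g] = max(sum(r[g] * (x - m[g]) ** 2 for r, x in zip(resp, data)) / ng, var_floor)
        if len(trace) > 1 and abs(trace[-1] - trace[-2]) < tol:
            break
    return w, m, v, trace
```

On two well-separated clusters around 0 and 10, initialized at means 1 and 8, the loop recovers means close to 0 and 10 within a few iterations. Starting both means on the same side of the data illustrates the sensitivity to initialization noted above.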

So far, we have presented schemes to estimate the likelihood of the HMT. In the next section, we estimate the transition probabilities.

4.3.2 Transition Probability Estimation

In this section, we estimate $P(\Omega_i \mid \Omega_{\rho(i)})$, $i \in \mathcal{E} \setminus \mathcal{E}_0$. Since no closed-form expression for the transition probabilities is available, we also estimate them iteratively. Denote by $\{\boldsymbol{\phi}^{(k)}; k \in \{1, \ldots, K\}\}$ the set of training features for transition probability estimation, where $\boldsymbol{\phi}^{(k)} = \{\phi_i^{(k)}; i \in \mathcal{E}\}$. Figures 4-9 and 4-10 show the pseudo-code for transition probability estimation, and the next two subsections explain the two figures.

Iteratively estimate transition probabilities. Figure 4-9 shows the pseudo-code to estimate the transition probabilities. The function TransProbEstimate takes two sets of arguments:

1. the likelihood estimated in Section 4.3.1, $\{p(\phi_i \mid \Omega_i); i \in \mathcal{E}\}$;

2. the training features, $\{\boldsymbol{\phi}^{(k)}; k \in \{1, \ldots, K\}\}$.

It returns the estimate of the transition probabilities, i.e.,

$$\hat{P}(\Omega_i \mid \Omega_{\rho(i)}), \quad \forall i \in \mathcal{E} \setminus \mathcal{E}_0.$$

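Before the iterations, we set the initial transition probabilities to 1/2 at line 5 of Figure 4-9, which is equivalent to assuming that the normal and abnormal states are initially equally likely. Then, at each iteration, we update the estimate of the transition probabilities until it converges. The update procedure is the following.

First, we iterate over the training feature set. For each feature, we use the BP algorithm (see Figure 4-10) to estimate the posterior probabilities given that feature. The details of the BP algorithm are discussed in the next subsection.

1. function TransProbEstimate(...)
2. Argument 1: likelihood, $\{p(\phi_i \mid \Omega_i); i \in \mathcal{E}\}$.
3. Argument 2: training data, $\{\boldsymbol{\phi}^{(k)}; k \in \{1, \ldots, K\}\}$.
4. Return: transition probability estimate, $\hat{P}(\Omega_i \mid \Omega_{\rho(i)})$, $i \in \mathcal{E} \setminus \mathcal{E}_0$.
5. $P^{(0)}(\Omega_i = u \mid \Omega_{\rho(i)} = u') = \frac{1}{2}$ for all $i \in \mathcal{E} \setminus \mathcal{E}_0$, $u, u' \in \{0, 1\}$.
6. $j = 0$.
7. repeat
8. for $k \leftarrow 1$ to $K$
9. $\{P^{(j+1)}(\Omega_{\rho(i)} \mid \boldsymbol{\phi}^{(k)}), P^{(j+1)}(\Omega_i, \Omega_{\rho(i)} \mid \boldsymbol{\phi}^{(k)}); i \in \mathcal{E} \setminus \mathcal{E}_0\} \leftarrow \mathrm{BP}\big(\{P^{(j)}(\Omega_i \mid \Omega_{\rho(i)}); i \in \mathcal{E} \setminus \mathcal{E}_0\}, \{p(\phi_i \mid \Omega_i); i \in \mathcal{E}\}, \boldsymbol{\phi}^{(k)}\big)$   (4-29)
10. end for
11. for all $i \in \mathcal{E} \setminus \mathcal{E}_0$
12. $P^{(j+1)}(\Omega_i \mid \Omega_{\rho(i)}) = \dfrac{1}{K} \sum_{k=1}^{K} \dfrac{P^{(j+1)}(\Omega_i, \Omega_{\rho(i)} \mid \boldsymbol{\phi}^{(k)})}{P^{(j+1)}(\Omega_{\rho(i)} \mid \boldsymbol{\phi}^{(k)})} = \dfrac{1}{K} \sum_{k=1}^{K} P^{(j+1)}(\Omega_i \mid \Omega_{\rho(i)}, \boldsymbol{\phi}^{(k)})$   (4-30)
13. end for
14. $j \leftarrow j + 1$
15. until convergence
16. $\hat{P}(\Omega_i \mid \Omega_{\rho(i)}) = P^{(j)}(\Omega_i \mid \Omega_{\rho(i)})$.

Figure 4-9. Iteratively estimate transition probabilities.

Three sets of arguments are passed to the BP algorithm:

1. the estimate of the transition probabilities obtained at the previous iteration, $\{P^{(j)}(\Omega_i \mid \Omega_{\rho(i)}); i \in \mathcal{E} \setminus \mathcal{E}_0\}$;

2. the likelihood, $\{p(\phi_i \mid \Omega_i); i \in \mathcal{E}\}$, which is the argument passed to function TransProbEstimate;

3. the current training feature, $\boldsymbol{\phi}^{(k)}$.

It returns estimates of the posterior probabilities

$$P^{(j+1)}(\Omega_{\rho(i)} \mid \boldsymbol{\phi}^{(k)}), \quad P^{(j+1)}(\Omega_i, \Omega_{\rho(i)} \mid \boldsymbol{\phi}^{(k)}), \quad i \in \mathcal{E} \setminus \mathcal{E}_0.$$

1. function BP(...)
2. Argument 1: transition probabilities, $\{P(\Omega_i \mid \Omega_{\rho(i)}); i \in \mathcal{E} \setminus \mathcal{E}_0\}$.
3. Argument 2: likelihood, $\{p(\phi_i \mid \Omega_i); i \in \mathcal{E}\}$.
4. Argument 3: training feature, $\boldsymbol{\phi}$.
5. Return: posterior probabilities, $P(\Omega_{\rho(i)} \mid \boldsymbol{\phi})$, $P(\Omega_i, \Omega_{\rho(i)} \mid \boldsymbol{\phi})$; $i \in \mathcal{E} \setminus \mathcal{E}_0$.
6. $\tau_i(0) = \tau_i(1) = \frac{1}{2}$ for all $i \in \mathcal{E}_0$ (i.e., roots).
7. $\upsilon_i(u) = p(\phi_i \mid \Omega_i = u)$ for all $i \in \mathcal{E}_{L-1}$ (i.e., leaves), $u \in \{0, 1\}$.
8. Top-down pass, i.e., from root to leaf:
9. for $l = 1, \ldots, L-1$
10. for all $i \in \mathcal{E}_l$, $u \in \{0, 1\}$, let
11. $\tau_i(u) = \sum_{u' \in \{0,1\}} P(\Omega_i = u \mid \Omega_{\rho(i)} = u')\, \tau_{\rho(i)}(u')\, p(\phi_{\mathcal{T}_{\rho(i)} \setminus \mathcal{T}_i} \mid \Omega_{\rho(i)} = u')$   (4-31)
12. end for
13. end for
14. Bottom-up pass, i.e., from leaf to root:
15. for $l = L-2, \ldots, 0$
16. for all $i \in \mathcal{E}_l$, $u \in \{0, 1\}$, let
17. $\upsilon_i(u) = p(\phi_i \mid \Omega_i = u) \prod_{j \in \nu(i)} \sum_{u' \in \{0,1\}} P(\Omega_j = u' \mid \Omega_i = u)\, \upsilon_j(u')$   (4-32)
18. end for
19. end for
20. $P(\Omega_i = u \mid \boldsymbol{\phi}) = \dfrac{\tau_i(u)\, \upsilon_i(u)}{\sum_{u' \in \{0,1\}} \tau_i(u')\, \upsilon_i(u')}$   (4-33)
21. $P(\Omega_i = u, \Omega_{\rho(i)} = u' \mid \boldsymbol{\phi}) = \dfrac{\upsilon_i(u)\, P(\Omega_i = u \mid \Omega_{\rho(i)} = u')\, \tau_{\rho(i)}(u')\, p(\phi_{\mathcal{T}_{\rho(i)} \setminus \mathcal{T}_i} \mid \Omega_{\rho(i)} = u')}{\sum_{u'' \in \{0,1\}} \tau_i(u'')\, \upsilon_i(u'')}$   (4-34)

Figure 4-10. Belief propagation algorithm.

With the posterior probabilities obtained through the BP algorithm, we update the estimates of the transition probabilities by Equation (4-30) for all $i \in \mathcal{E} \setminus \mathcal{E}_0$ and step to the next iteration. When the estimates converge, the iteration stops and function TransProbEstimate returns the estimates obtained at the last iteration.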

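The validity of Equation (4-30) is shown in the following. For all $i \in \mathcal{E} \setminus \mathcal{E}_0$,

$$P(\Omega_i \mid \Omega_{\rho(i)}) = \int P(\Omega_i \mid \Omega_{\rho(i)}, \boldsymbol{\phi})\, p(\boldsymbol{\phi})\, d\boldsymbol{\phi} = \mathbb{E}_{\boldsymbol{\phi}}\left[ P(\Omega_i \mid \Omega_{\rho(i)}, \boldsymbol{\phi}) \right], \tag{4-35}$$

where $\mathbb{E}[\cdot]$ represents statistical expectation and the subscript denotes the random variable over which the expectation is taken. Because the sample average is an unbiased estimate of the statistical expectation [30], we estimate $\mathbb{E}_{\boldsymbol{\phi}}[P(\Omega_i \mid \Omega_{\rho(i)}, \boldsymbol{\phi})]$ by

$$\frac{1}{K} \sum_{k=1}^{K} P(\Omega_i \mid \Omega_{\rho(i)}, \boldsymbol{\phi}^{(k)}) = \frac{1}{K} \sum_{k=1}^{K} \frac{P(\Omega_i, \Omega_{\rho(i)} \mid \boldsymbol{\phi}^{(k)})}{P(\Omega_{\rho(i)} \mid \boldsymbol{\phi}^{(k)})}. \tag{4-36}$$

Combining Equations (4-35) and (4-36), we obtain Equation (4-30).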
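Equation (4-30) amounts to averaging per-sample conditional posteriors. The sketch below assumes that the BP outputs for one non-root node i are already available as NumPy arrays. The array `pair[k, u, u']` holds P(Ω_i = u, Ω_ρ(i) = u' | φ^(k)), and `parent[k, u']` holds P(Ω_ρ(i) = u' | φ^(k)). These names, the array layout, and the small floor on the denominator are our own choices.

```python
import numpy as np

def update_transition(pair, parent, floor=1e-12):
    """One application of Eq. (4-30) for a single non-root node i.

    pair   : shape (K, 2, 2), pair[k, u, up] = P(Omega_i=u, Omega_p=up | phi^(k))
    parent : shape (K, 2),    parent[k, up]  = P(Omega_p=up | phi^(k))
    Returns T with T[u, up] = estimated P(Omega_i=u | Omega_p=up).
    """
    pair = np.asarray(pair, dtype=float)
    parent = np.asarray(parent, dtype=float)
    # Per-sample conditional P(Omega_i | Omega_p, phi^(k)), broadcast over u.
    cond = pair / np.maximum(parent[:, None, :], floor)
    # Sample average over the K training features (Eq. 4-36).
    return cond.mean(axis=0)
```

When `parent` is obtained as `pair.sum(axis=1)`, each column of the result sums to one, as a conditional distribution over Ω_i should.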

Next, we discuss the belief propagation algorithm, which is called by function TransProbEstimate.

Belief propagation algorithm. The BP algorithm [33-35], also known as the sum-product algorithm (e.g., [36-39]), is an important method for computing approximate marginal distributions, and it yields exact marginals on tree-structured models such as the HMT. In this chapter, we apply the BP algorithm to estimate posterior probabilities (Figure 4-10).








Function BP takes three sets of arguments:

1. transition probabilities, $\{P(\Omega_i \mid \Omega_{\rho(i)}); i \in \mathcal{E} \setminus \mathcal{E}_0\}$;

2. likelihood, $\{p(\phi_i \mid \Omega_i); i \in \mathcal{E}\}$;

3. the training feature, $\boldsymbol{\phi}$,

and returns estimates of the posterior probabilities

$$\{P(\Omega_{\rho(i)} \mid \boldsymbol{\phi}), P(\Omega_i, \Omega_{\rho(i)} \mid \boldsymbol{\phi}); i \in \mathcal{E} \setminus \mathcal{E}_0\}.$$

In function BP, we define two sets of transitory variables for convenience, i.e.,

$$\tau_i(u) \triangleq p(\Omega_i = u, \phi_{\bar{\mathcal{T}}_i}), \tag{4-37}$$

$$\upsilon_i(u) \triangleq p(\phi_{\mathcal{T}_i} \mid \Omega_i = u), \tag{4-38}$$

where $u \in \{0, 1\}$, $i \in \mathcal{E}$, and $\bar{\mathcal{T}}_i = \mathcal{T}_{R(i)} \setminus \mathcal{T}_i$. Function BP first initializes the variables $\tau_i(u)$ for root nodes and $\upsilon_i(u)$ for leaves.

* When $i \in \mathcal{E}_0$, i.e., for root nodes, $\bar{\mathcal{T}}_i = \emptyset$, so that $\tau_i(u) = P(\Omega_i = u)$. Hence, we let $\tau_i(0) = \tau_i(1) = \frac{1}{2}$ for all root nodes $i$ (see line 6 of Figure 4-10), because we assume root nodes are equally likely to be in the normal or abnormal state.

* When $i \in \mathcal{E}_{L-1}$, i.e., for leaf nodes, $\mathcal{T}_i = \{i\}$; therefore $\upsilon_i(u) = p(\phi_i \mid \Omega_i = u)$, $u \in \{0, 1\}$ (see line 7 of Figure 4-10).

It then propagates the beliefs at the tree roots and leaves to the other nodes in a top-down pass and a bottom-up pass, respectively.

* During the top-down pass, function BP iterates from root to leaf. At each level $l$, we update the transitory variables $\tau_i(u)$ by Equation (4-31), which is proven in Appendix A.1. Note that $\tau_{\rho(i)}(u')$ in Equation (4-31) is obtained in the previous iteration, i.e., the iteration at level $l-1$.

* During the bottom-up pass, function BP iterates from leaf to root. At each level $l$, we update the transitory variables $\upsilon_i(u)$ by Equation (4-32), which is proven in Appendix A.2. Also note that $\upsilon_j(u')$ is obtained in the previous iteration, i.e., the iteration at level $l+1$.









Finally, we obtain the posterior probabilities by Equations (4-33) and (4-34), which are proven in Appendices A.3 and A.4, respectively. The estimated posterior probabilities are used in function TransProbEstimate to update the estimates of the transition probabilities (see Figure 4-9).
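A minimal NumPy sketch of the two passes follows, for binary states. It rests on our own assumptions, not on details given in this chapter. First, nodes are numbered top-down so that every parent index is smaller than its children's, as in the HMT numbering above. Second, the likelihood values p(φ_i | Ω_i = u) have already been evaluated, for instance with the GMM of Section 4.3.1. Third, the root priors are 1/2. The function name `hmt_bp` and its argument layout are hypothetical. In the code, `tau[i][u]` stands for p(Ω_i = u, features outside the subtree rooted at i), and `ups[i][u]` stands for p(features in that subtree | Ω_i = u). The sketch runs the bottom-up pass first, because the factor p(φ on 𝒯_ρ(i) ∖ 𝒯_i | Ω_ρ(i)) used by the top-down recursion (4-31) is assembled from the subtree quantities of node i's siblings. Probabilities are not rescaled, so very deep trees may underflow.

```python
import numpy as np

def hmt_bp(parent, trans, lik, root_prior=(0.5, 0.5)):
    """Upward/downward belief propagation on a hidden Markov tree with binary states.

    parent[i] : parent index of node i, or -1 for a root; requires parent[i] < i
    trans[i]  : 2x2 array, trans[i][u, up] = P(Omega_i = u | Omega_parent = up); None for roots
    lik[i]    : length-2 array, lik[i][u] = p(phi_i | Omega_i = u)
    Returns (post, pair): post[i][u] = P(Omega_i = u | phi), Eq. (4-33), and
    pair[i][u, up] = P(Omega_i = u, Omega_parent = up | phi), Eq. (4-34).
    """
    n = len(parent)
    lik = [np.asarray(l, dtype=float) for l in lik]
    trans = [None if t is None else np.asarray(t, dtype=float) for t in trans]
    children = [[] for _ in range(n)]
    for i, p in enumerate(parent):
        if p >= 0:
            children[p].append(i)

    # Bottom-up pass, Eq. (4-32): ups[i][u] = p(phi on subtree T_i | Omega_i = u).
    # msg[j][up] = sum_u P(Omega_j = u | Omega_parent = up) * ups[j][u].
    ups, msg = [None] * n, [None] * n
    for i in reversed(range(n)):              # children have larger indices
        ups[i] = lik[i].copy()
        for j in children[i]:
            ups[i] = ups[i] * msg[j]
        if parent[i] >= 0:
            msg[i] = trans[i].T @ ups[i]

    # Top-down pass, Eq. (4-31): tau[i][u] = p(Omega_i = u, phi outside T_i).
    # rest[i][up] = p(phi on T_parent minus T_i | Omega_parent = up).
    tau, rest = [None] * n, [None] * n
    for i in range(n):                        # parents have smaller indices
        p = parent[i]
        if p < 0:
            tau[i] = np.asarray(root_prior, dtype=float)
            continue
        rest[i] = lik[p].copy()
        for j in children[p]:
            if j != i:
                rest[i] = rest[i] * msg[j]
        tau[i] = trans[i] @ (tau[p] * rest[i])

    post, pair = [], []
    for i in range(n):
        norm = float(tau[i] @ ups[i])         # p(phi) for the tree containing i
        post.append(tau[i] * ups[i] / norm)                                  # Eq. (4-33)
        p = parent[i]
        pair.append(None if p < 0 else
                    trans[i] * np.outer(ups[i], tau[p] * rest[i]) / norm)    # Eq. (4-34)
    return post, pair
```

Because BP is exact on trees, the outputs for a small tree coincide with the posteriors obtained by enumerating all state combinations. Several roots can be passed in the same call, and each tree is normalized separately.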

So far, we have established the HMT model and described approaches to estimate its parameters from training data. Next, we present network anomaly detection approaches using the fully determined HMT model.

4.4 Network Anomaly Detection Using HMT

In this section, we present network anomaly detection using the HMT model. In terms of pattern classification, this is a decoding problem. Given the observation, i.e., the extracted features $\boldsymbol{\phi} = \{\phi_i; i \in \mathcal{E}\}$, and an HMT model defined by

* prior probabilities: $P(\Omega_i)$, $i \in \mathcal{E}_0$;

* likelihood: $p(\phi_i \mid \Omega_i)$, $i \in \mathcal{E}$;

* transition probabilities: $P(\Omega_i \mid \Omega_{\rho(i)})$, $i \in \mathcal{E} \setminus \mathcal{E}_0$,

we need to compute the "best" state combination $\hat{\mathbf{u}} = \{\hat{u}_i; i \in \mathcal{E}\}$. Here, "best" is meant in terms of the maximum gain criterion in Equation (4-19), which we rewrite as

$$\hat{u}_i = \arg\max_{u_i} \left[ \prod_{i' \in \mathcal{A}(i) \setminus \{R(i)\}} \lambda(u_{i'})\, P(\Omega_{i'} = u_{i'} \mid \Omega_{\rho(i')} = u_{\rho(i')}, \boldsymbol{\phi}) \right] \lambda(u_{R(i)})\, P(\Omega_{R(i)} = u_{R(i)} \mid \boldsymbol{\phi}), \tag{4-39}$$

for all $i \in \mathcal{E}$. Function ViterbiDecodeHMT, illustrated in Figure 4-11, shows the pseudo-code for the classification algorithm.

Function ViterbiDecodeHMT takes three arguments. The first two, the transition probabilities and the likelihood, are estimated during the training phase. The last is the extracted features, based on which we perform anomaly detection. The function returns the estimated node states. Among them, we are interested only in the states of the leaf nodes, which indicate whether each edge router is in the abnormal state. Next, we explain the algorithm in detail.

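1. function ViterbiDecodeHMT(...)
2. Argument 1: transition probabilities, $\{P(\Omega_i \mid \Omega_{\rho(i)}); i \in \mathcal{E} \setminus \mathcal{E}_0\}$.
3. Argument 2: likelihood, $\{p(\phi_i \mid \Omega_i); i \in \mathcal{E}\}$.
4. Argument 3: feature, $\boldsymbol{\phi}$.
5. Return: estimated node states, $\{\hat{u}_i; i \in \mathcal{E}\}$.
6. $\{P(\Omega_{\rho(i)} \mid \boldsymbol{\phi}), P(\Omega_i, \Omega_{\rho(i)} \mid \boldsymbol{\phi}); i \in \mathcal{E} \setminus \mathcal{E}_0\} \leftarrow \mathrm{BP}\big(\{P(\Omega_i \mid \Omega_{\rho(i)}); i \in \mathcal{E} \setminus \mathcal{E}_0\}, \{p(\phi_i \mid \Omega_i); i \in \mathcal{E}\}, \boldsymbol{\phi}\big)$   (4-40)
7. $P(\Omega_i \mid \Omega_{\rho(i)}, \boldsymbol{\phi}) = \dfrac{P(\Omega_i, \Omega_{\rho(i)} \mid \boldsymbol{\phi})}{P(\Omega_{\rho(i)} \mid \boldsymbol{\phi})}$ for all $i \in \mathcal{E} \setminus \mathcal{E}_0$   (4-41)
8. for all $i \in \mathcal{E}_0$
9. $\hat{u}_i = \arg\max_{u_i} \lambda(u_i)\, P(\Omega_i = u_i \mid \boldsymbol{\phi})$
10. end for
11. for $l = 1, \ldots, L-1$
12. $\hat{u}_i = \arg\max_{u_i} \lambda(u_i)\, P(\Omega_i = u_i \mid \Omega_{\rho(i)} = \hat{u}_{\rho(i)}, \boldsymbol{\phi})$ for all $i \in \mathcal{E}_l$   (4-42)
13. end for

Figure 4-11. Viterbi algorithm for HMT decoding.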

By Bayes' formula, the terms in Equation (4-39) can be computed by

$$P(\Omega_{i'} = u_{i'} \mid \Omega_{\rho(i')} = u_{\rho(i')}, \boldsymbol{\phi}) = \frac{P(\Omega_{i'} = u_{i'}, \Omega_{\rho(i')} = u_{\rho(i')} \mid \boldsymbol{\phi})}{P(\Omega_{\rho(i')} = u_{\rho(i')} \mid \boldsymbol{\phi})}, \tag{4-43}$$

where $i' \in \mathcal{E} \setminus \mathcal{E}_0$. Therefore, the posterior probabilities must also be calculated. We employ the BP algorithm (see Equations (4-40) and (4-41) in Figure 4-11) to solve Equation (4-43), in a way similar to HMT transition probability estimation (see Figure 4-10).

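Having solved Equation (4-43) for all nodes, we can solve Equation (4-39). A brute-force solution to Equation (4-39) is to exhaustively compute

$$\left[ \prod_{i' \in \mathcal{A}(i) \setminus \{R(i)\}} \lambda(u_{i'})\, P(\Omega_{i'} = u_{i'} \mid \Omega_{\rho(i')} = u_{\rho(i')}, \boldsymbol{\phi}) \right] \lambda(u_{R(i)})\, P(\Omega_{R(i)} = u_{R(i)} \mid \boldsymbol{\phi}) \tag{4-44}$$

for all $i \in \mathcal{E} \setminus \mathcal{E}_0$ and every possible value of $\boldsymbol{\Omega}$, and then select the best one. An HMT with $L$ levels modeling an AS with $K$ edge routers has

$$\frac{1 - B^L}{1 - B}$$

nodes, each of which has two states, normal and abnormal. Then the space of $\boldsymbol{\Omega}$ has $2^{(1 - B^L)/(1 - B)}$ possible values, and the computational complexity of the brute-force method is $O\big(2^{(1 - B^L)/(1 - B)}\big)$. This is even worse than the dependent model (see Figure 4-4), whose complexity is $O(2^K)$.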

In this chapter, we apply the Viterbi algorithm [40-42] to solve Equation (4-39) iteratively, reducing the computational complexity to $O(B^L)$.

The idea of the Viterbi algorithm is the following. It iterates from the top level of the HMT to the bottom level. At each iteration, it estimates the states of the nodes at that level so that, combined with the states estimated at the upper levels, they best explain the observed features. That is, at each iteration, the Viterbi algorithm always selects the local maximum. Although it is not guaranteed to find the globally optimal solution to Equation (4-39), the Viterbi algorithm is efficient and performs well empirically.
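The level-by-level decoding of Figure 4-11 can be sketched as follows. The sketch assumes that the posteriors from Equations (4-40) and (4-41) have already been computed, for example by a BP routine as in Figure 4-10, and are passed in by node index. The names `decode_hmt`, `root_post`, `cond`, and `gain` are hypothetical. Each node's state is chosen once its parent's state is fixed, so each node is decoded exactly once.

```python
import numpy as np

def decode_hmt(parent, root_post, cond, gain=(1.0, 1.0)):
    """Top-down maximum-gain decoding of HMT node states.

    parent[i]    : parent index of node i, or -1 for a root (parent[i] < i)
    root_post[i] : P(Omega_i = u | phi), u in {0, 1}, for each root i (list or dict)
    cond[i]      : 2x2, cond[i][u, up] = P(Omega_i = u | Omega_parent = up, phi),
                   for each non-root i (list or dict)
    gain[u]      : gain factor lambda(u)
    Returns the list of decoded states u_hat.
    """
    gain = np.asarray(gain, dtype=float)
    u_hat = [None] * len(parent)
    for i, p in enumerate(parent):           # parents are decoded before children
        if p < 0:
            score = gain * np.asarray(root_post[i], dtype=float)
        else:
            # Column u_hat[p] holds P(Omega_i = u | Omega_p = u_hat[p], phi).
            score = gain * np.asarray(cond[i], dtype=float)[:, u_hat[p]]
        u_hat[i] = int(np.argmax(score))
    return u_hat
```

Raising the gain λ(0) relative to λ(1) biases every decision toward the normal state. This gives one knob for trading detection probability against false alarms.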

The computational complexity of the Viterbi algorithm is $O(B^L)$, much better than that of the dependent model illustrated in Figure 4-4, whose complexity is $O(2^K)$. The improvement results from the fact that the Viterbi algorithm does not exhaustively test all possible node state combinations. Instead, we decompose the decoding problem into multiple stages, each of which decodes the node states at one level of the HMT. At level $l$, we estimate the node states of that level, i.e., $\{\hat{u}_i; i \in \mathcal{E}_l\}$, based on the results obtained in the previous stages, i.e., $\{\hat{u}_j; j \in \mathcal{E}_0 \cup \cdots \cup \mathcal{E}_{l-1}\}$. In this procedure, each node is accessed only twice. Hence, the complexity is linear in the number of nodes, which is

$$\frac{1 - B^L}{1 - B} = \frac{B^L - 1}{B - 1} < B^L.$$
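The node count and the resulting gap between the approaches can be checked numerically. As a simplifying assumption, the sketch below takes a complete B-ary HMT with a single root and K = B^(L−1) leaves. It reports the number of state combinations examined by the dependent-model brute force (2^K), by the HMT brute force (2^N for N nodes), and the number of node visits in the level-by-level decoding (2N, since each node is accessed twice). The function name is hypothetical.

```python
def hmt_costs(b, levels):
    """Return (K, N, 2**K, 2**N, 2*N) for a complete b-ary HMT with one root (b >= 2)."""
    k = b ** (levels - 1)                  # leaves = edge routers
    n = (b ** levels - 1) // (b - 1)       # total nodes, (1 - B^L) / (1 - B)
    assert n < b ** levels                 # the bound used in the text
    return k, n, 2 ** k, 2 ** n, 2 * n
```

For B = 4 and L = 3 (16 edge routers), this gives 21 nodes, 65,536 combinations for the dependent model, 2,097,152 for the HMT brute force, and only 42 node visits for the level-by-level decoding.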

So far, we have described the HMT model, including its parameter training and classification approaches. Next, we show the simulation results of applying the HMT model to network anomaly detection.

4.5 Simulation Results

In this section, we evaluate the performance of the proposed schemes through

simulation.

4.5.1 Experiment Setting

In our study, we developed a testbed to 1) extract various features and 2) analyze the feature data with our machine learning algorithm and with CUSUM algorithms. Next, we describe the network, traffic trace, and feature extraction settings used in our experiments.

Network.

In our experiment, we assume that the ISP network consists of a core AS, a victim subnet, and 16 edge routers that connect to 16 subnets, as illustrated in Figure 4-12. At each edge router, two monitors are placed to measure the inbound and outbound traffic, respectively, between a subnet and the victim network. For convenience, we use the term link to denote the route between an edge router and the victim subnet.

Figure 4-12. Experiment Network

Traffic. A link may carry normal traffic (called background traffic) or abnormal traffic. For the background traffic, we use the same data set as in Section 3.7.2 [27]. Since we do not have real traces from 16 different links, we use the real traffic trace measured on one link (between the Internet and Auckland University) on 16 different days to create traffic traces for the 16 links.

For the abnormal traffic, we randomly insert TCP SYN flood attacks into the background trace. Specifically, we generate several attack scenarios. For each scenario, we randomly select the abnormal links and attack durations. Attack traffic on each link is generated in the same way as in Section 3.7.2; that is, we randomly insert TCP SYN packets with random source IP addresses into the background traffic of that link. The average packet rate of TCP SYN attack traffic on each selected link is 1% of the total packet rate on the link. For each attack scenario, attacks on all selected links are launched during almost the same period to simulate synchronized DDoS attacks. Since the attack traffic on each link is low (just 1%), we effectively simulate low-volume attack traffic.









Features.

To detect distributed TCP SYN attacks, we use the two-way matching features described in Chapter 3, i.e., the number of unmatched inbound SYN packets in one time slot. The parameter setting for two-way matching feature extraction is the same as in Section 3.7.2; for convenience, we summarize the parameters in Table 4-3.

Table 4-3: Parameter setting of feature extraction for network anomaly detection
Notation Description
R 2480
1l 0.1.
/C 8
w 8
7 10 seconds
1i1. 215 bits


For comparison, we also measure the number of SYN packets and the SYN/FIN ratio [11] in each slot.

4.5.2 Performance Comparison

Table 4-4: Performance of different schemes.

| Feature | Detection algorithm | Detection probability | False alarm probability |
|---|---|---|---|
| SYN/FIN ratio | CUSUM | 0.174 | 0.129 |
| SYN | CUSUM | 0.52 | 0.129 |
| SYN | Machine learning | 0.656 | 0.123 |
| Unmatched SYN | CUSUM | 0.690 | 0.130 |
| Unmatched SYN | Machine learning | 0.973 | 0.115 |


Table 4-4 compares the performance of the different schemes. The benchmark is the scheme in [11], i.e., CUSUM with the SYN/FIN ratio as the feature, for which we use the same parameter setting as in [11]. We compare the benchmark with CUSUM and with our machine learning algorithm under different features.

To make a fair comparison, we set the false alarm probabilities of all schemes to be almost the same and compare the detection probabilities. Table 4-4 shows that the benchmark scheme ('SYN/FIN ratio' + CUSUM) performs very poorly in detecting low-volume DDoS attacks. In contrast, CUSUM with the number of SYN packets or the number of unmatched SYN packets as the feature achieves a much higher detection probability. More importantly, given the same feature data, our machine learning algorithm significantly outperforms CUSUM, whether the feature is the number of SYN packets or the number of unmatched SYN packets.




[ROC plot: detection probability versus false alarm probability; curves: Threshold (SYN), Machine Learning (SYN), Threshold (UM-SYN), Machine Learning (UM-SYN)]

Figure 4-13. Performance of threshold-based and machine learning algorithms with different feature data


Figure 4-13 compares the ROC curves of the threshold-based scheme described in Section 4.1.2 and our machine learning algorithm under two different features, i.e., the number of SYN packets (denoted by 'SYN') and the number of unmatched SYN packets (denoted by 'UM-SYN'). We observe that, for the same detection algorithm, using the number of unmatched SYN packets significantly improves the ROC performance compared to using the number of SYN packets. In other words, for the same false alarm probability, the detection probability is much higher when the number of unmatched SYN packets is used as the feature.

Another important observation from Figure 4-13 is that, given the same feature data, our machine learning algorithm significantly improves the ROC compared to the threshold-based scheme. For example, at a false alarm probability of 0.05, our machine learning algorithm achieves a detection probability of 0.93, while the threshold-based scheme achieves only 0.72. This is because our machine learning algorithm exploits the spatial correlation among traffic on multiple links, whereas the threshold-based scheme uses only the traffic on one link.


[ROC plot: detection probability versus false alarm probability (logarithmic axis, 0.0001 to 1); curves: Single CUSUM, Dual CUSUM, Threshold, Machine Learning]

Figure 4-14. Performance of four detection algorithms


In Figure 4-14, we compare the ROC performance of four detection algorithms (the

threshold-based, the single-CUSUM, the dual-CUSUM described in Section 4.1.3, and our

machine learning algorithm) under the same feature, i.e., the number of unmatched SYN

packets. For the single-CUSUM and the dual-CUSUM algorithm, the detection delay D is

chosen from 1 to 10 slots and the parameter ai of link i is determined by


$$a_i = D_{\text{attack}} - D_{\text{normal}}, \qquad \forall\, 1 \le i \le 16,$$

where $D_{\text{attack}}$ and $D_{\text{normal}}$ are the average numbers of unmatched inbound SYN packets under attack and normal conditions, respectively.
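For readers less familiar with CUSUM, the sketch below shows a generic one-sided CUSUM change detector applied to a per-slot count such as the number of unmatched SYN packets on one link. It is not the dissertation's single- or dual-CUSUM formulation from Section 4.1.3: the reference value `k`, the threshold `h`, the normal-mean input `mu_normal`, and the function name are all hypothetical, and the exact mapping from $a_i$ to these parameters is left as an assumption.

```python
def cusum_first_alarm(samples, mu_normal, k, h):
    """Generic one-sided CUSUM (hypothetical parameters).

    s_n = max(0, s_{n-1} + (x_n - mu_normal) - k); an alarm is raised at the
    first slot n with s_n > h. Returns that slot index, or None if no alarm.
    """
    s = 0.0
    for n, x in enumerate(samples):
        # Normal slots pull the statistic toward 0; a sustained upward shift
        # larger than k makes it grow by roughly (shift - k) per slot.
        s = max(0.0, s + (x - mu_normal) - k)
        if s > h:
            return n
    return None


# 20 normal slots at mean 10, then a low-volume shift to 14 unmatched SYNs/slot.
slots = [10] * 20 + [14] * 10
print(cusum_first_alarm(slots, mu_normal=10, k=2, h=5))
```

A larger `k` suppresses alarms caused by normal fluctuations but slows detection, and `h` trades false alarm probability against detection delay; each (`k`, `h`) pair yields one point on a CUSUM ROC curve.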

Figure 4-14 shows that our machine learning algorithm achieves the best ROC performance of the four algorithms. The dual-CUSUM algorithm outperforms the simple threshold-based algorithm, and the single-CUSUM algorithm has the worst ROC performance.

4.5.3 Discussion

We would like to point out that, besides detecting low-volume network anomalies, our machine learning algorithm is also able to detect high-volume anomalies; those results are not shown here due to space limits. The machine learning algorithm is also robust under realistic time-varying traffic patterns such as the Auckland data traffic [27]. In addition, we tested our machine learning algorithm on a large IP address space, up to the entire IP address space of the Internet.

4.6 Summary

In this chapter, we propose a novel machine learning detection algorithm based on Bayesian decision theory and the hidden Markov tree model. The key idea of our algorithm is to exploit the spatial correlation of network anomalies. Our detection scheme has the following desirable properties:

* In addition to detecting network anomalies with a high data rate on a single link, our scheme can accurately detect attacks with a low data rate on each of multiple links. This is due to its exploitation of the spatial correlation of network anomalies.

* Our scheme is robust against time-varying traffic patterns, owing to powerful machine learning techniques.

* Our scheme can be deployed in large-scale high-speed networks, thanks to the use of a Bloom filter array to efficiently extract features.
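The feature behind these results is the number of unmatched SYN packets, and the last property credits a Bloom filter array for extracting it cheaply. The sketch below is a hypothetical, single-filter illustration of that idea, not the Bloom filter array design from the earlier chapters: each inbound SYN's endpoints are inserted into a Bloom filter, each outbound SYN-ACK is checked against the filter with its endpoints reversed, and the unmatched count is the number of SYNs minus the number of matches. The packet tuple format, filter size, and hashing scheme are all assumptions.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter; k bit positions per key come from one SHA-256 digest."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes  # at most 8, since each position uses 4 digest bytes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key):
        digest = hashlib.sha256(key.encode()).digest()
        for i in range(self.num_hashes):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all((self.bits[pos // 8] >> (pos % 8)) & 1 for pos in self._positions(key))


def count_unmatched_syn(packets):
    """packets: iterable of (direction, kind, src, sport, dst, dport) tuples (hypothetical format)."""
    inbound_syn = BloomFilter()
    syn_count = matched = 0
    for direction, kind, src, sport, dst, dport in packets:
        if direction == "in" and kind == "SYN":
            syn_count += 1
            inbound_syn.add(f"{src}:{sport}>{dst}:{dport}")
        elif direction == "out" and kind == "SYN-ACK":
            # The SYN-ACK answering an inbound SYN carries the reversed endpoints.
            if f"{dst}:{dport}>{src}:{sport}" in inbound_syn:
                matched += 1
    return syn_count - matched
```

Memory stays fixed regardless of the number of connections, at the cost of occasional false-positive matches; this simplified version would also count a retransmitted SYN-ACK twice, which a real design has to handle.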

With the proposed techniques, our scheme can effectively detect network anomalies without modifying existing IP forwarding mechanisms at routers. Our simulation results show that the proposed framework can detect DDoS attacks even if the volume of attack traffic on each link is extremely small (i.e., 1 .). In particular, for the same false alarm probability, our scheme has a detection probability of 0.97, whereas the existing scheme has a detection probability of 0.17, which demonstrates the superior performance of our scheme.









CHAPTER 5
NETWORK CENTRIC TRAFFIC CLASSIFICATION: AN OVERVIEW

In Chapters 5 and 6, we focus on the second part of our research, i.e., network centric traffic classification. This chapter motivates the problem, points out its challenges, and shows the weaknesses of existing solutions.

5.1 Introduction

The telecom business is rapidly changing. Commoditized below profitable levels, traditional circuit-switched voice service is simply no longer lucrative. Since 2000, the drop in traditional voice revenue has prompted the large telcos to explore new business opportunities. Services over IP (SoIP) have been identified as the new source of continued growth. Among all SoIP, VoIP and IPTV are the most attractive, as they are expected to represent the largest source of profits as consumer interest in online voice and video services increases and as broadband deployments proliferate. According to the research firm Point Topic, there were 209.3 million broadband users worldwide at the end of 2005, up about 56 million from 153.3 million lines on 31 December 2004. As a consequence, the VoIP and IPTV user population is expected to grow dramatically in the next few months. For example, France Telecom reported in July 2006 that the number of its VoIP users grew sI' in the last six months, to a total of 1.73 million as of June 30, 2006. Similarly, its IPTV users were expected to grow from 300,000 today to 5 million in the next two years.

But to tap the potential profits that SoIP offers, the infrastructure of carrier networks needs to evolve. Next-generation networks (NGN) feature the convergence of access technologies (wireline, wireless, cellular), information services (voice, broadband, data, content), and devices (consumer electronics, traditional telecom equipment). Such convergence promises reduced costs, greater workforce and consumer mobility, and exciting new business models. However, the trend toward convergence creates a strong need for fair methods of efficiently and accurately managing and tracking the delivery of IP services. As carriers transition to becoming service providers, they begin to sell and deliver IP services to their customers. Unfortunately, three trends lead to tremendous revenue leakage for ISPs, because ISPs cannot efficiently detect the resulting new applications and therefore cannot take proper action: the emergence of a bloom of new zero-day voice and video applications over IP, such as Skype, Google Talk (Gtalk), and MSN; the proliferation of new peer-to-peer protocols that now support voice and video among other applications; and the continuing growth in the use of encryption to protect data confidentiality. Unmanaged commercial traffic adds up to losses of hundreds of millions of dollars annually and poses a serious roadblock to the profitability of ISPs' VoIP and IPTV services.

As a consequence, it is imperative for ISPs to identify robust solutions for detecting voice and video data streams over IP. The most common approach for identifying applications on an IP network is to associate the observed traffic with an application based on TCP or UDP port numbers [43, 44]. In principle, the TCP and UDP server port numbers can be used to identify the higher-layer application, by simply identifying the server port and mapping it to an application using the Internet Assigned Numbers Authority (IANA) list of registered ports [45]. However, port-based application classification has limitations due to the emergence of new applications that no longer use fixed, predictable port numbers. For example, non-privileged users often have to use ports above 1024 to circumvent operating system access control restrictions; common applications like FTP allow the negotiation of unknown and unpredictable server ports for the data transfer; and proprietary applications may deliberately try to hide their existence or bypass port-based filters by using standard ports. In particular, server port 80 is used by a large variety of non-web applications to circumvent firewalls that do not filter port-80 traffic, while other applications (e.g., Skype) tend to use dynamic ports.
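A port-based classifier reduces to a table lookup, which makes both failure modes above easy to see. The sketch below is a hypothetical illustration: the table is a tiny, hand-picked subset of the IANA registry, and the "lower port is the server" heuristic is an assumption, not a rule from [43-45].

```python
# Small illustrative subset of IANA-registered service ports.
WELL_KNOWN_PORTS = {
    21: "ftp",
    22: "ssh",
    25: "smtp",
    53: "domain",
    80: "http",
    443: "https",
    5060: "sip",
}


def classify_by_port(src_port, dst_port):
    """Label a flow by whichever endpoint uses a registered port (hypothetical heuristic)."""
    for port in sorted((src_port, dst_port)):  # assume the lower port is the server side
        if port in WELL_KNOWN_PORTS:
            return WELL_KNOWN_PORTS[port]
    return "unknown"


print(classify_by_port(51234, 80))     # labeled "http", even if the payload is not web traffic
print(classify_by_port(40123, 33033))  # dynamic ports, as used by Skype: "unknown"
```

Any non-web application tunneled over port 80 is silently labeled "http", and applications that pick random ports fall into "unknown"; these are exactly the two weaknesses described above.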

A more reliable technique involves stateful reconstruction of session and application information from packet contents [46-48]. Although this avoids reliance on fixed port numbers, it imposes significant complexity and processing load on the classification device, which must be powerful enough to analyze a large number of flows concurrently while searching for very complex protocol signatures that might require processing a large chunk of packet payload. The proliferation of proprietary protocols, coupled with the growing use of encryption to ensure data confidentiality, makes this approach infeasible. For example, Skype does not run on any standard port; instead, it randomly selects ports for its communication and uses TCP, UDP, or both for data transfer. Furthermore, its use of a 256-bit encryption algorithm, with no visibility into either the algorithm or its keys, makes its detection even harder. All of the above makes the general problem of detecting VoIP and video data streams over IP challenging, and yet it is of huge business interest.

An emerging trend in the research community is to approach this problem with pattern classification techniques. This family of techniques formulates application detection as a statistical problem and develops discriminating criteria based on statistical observations and distributions of various flow properties in the packet traces. A few papers [49, 50] have taken this statistical approach to classify traffic into p2p, multimedia, interactive, and bulk-transfer applications. Unfortunately, although these papers addressed the problem of distinguishing multimedia traffic from other applications, they have not addressed the problem of distinguishing voice traffic from video traffic. Separating streaming traffic from other applications is one problem; detecting and correctly classifying voice and video, and clearly separating the two from each other, is a different one. In the extreme case, voice and/or video data streams might even be bundled together in the same flow with other applications. This situation is common for applications like Skype, Gtalk, and MSN, which allow users to mix voice and/or video streams with chat and/or file transfer traffic in the same 5-tuple flow, defined by source IP address, destination IP address, source port, destination port, and protocol number. In such cases, one flow may carry traffic from multiple types of applications (such as voice, video, chat, and file transfer); we refer to such a flow as a hybrid flow in the remainder of this work.

Our research focuses on detecting and classifying voice and video traffic, and further deals with its more general formulation, which considers the presence of hybrid flows. Based on the intuition that voice and video data streams show strong regularities in the inter-arrival times of packets within a flow and in the associated packet sizes, when the two are combined into a single stochastic process and analyzed in the frequency domain, we propose a system called VOVClassifier for voice and video traffic classification.

VOVClassifier is an automated self-learning system composed of four major modules operating in cascade. The system is first trained with voice and video data streams and then enters the classification phase. During the training period, all packets belonging to the same flow are extracted and used to generate a stochastic model that captures the features of interest. All flows are then processed in the frequency domain using Power Spectral Density (PSD) analysis to extract a high-dimensional space of frequencies that carry the majority of the signal's energy. All features extracted from each flow are grouped into a "feature vector". Because many different codecs are used for voice and video, a second module clusters the feature vectors into several groups using Subspace Decomposition (SD) and then identifies the subspace structure of each group, e.g., the bases of the subspace, using Principal Component Analysis (PCA). These two steps are applied to all flows during the training period and produce low-dimensional spaces, referred to as the voice subspace and the video subspace. After training, all flows are processed by the PSD module, and the associated feature vector is compared with the voice and video subspaces obtained during training. The subspace at minimum normalized distance from the feature vector is selected as the candidate, and it is chosen if and only if its distance is below a specific predetermined threshold.

We applied VOVClassifier to real packet traces collected in two different network scenarios. The results demonstrate the effectiveness and robustness of our approach, which achieves a 100% detection rate for both voice and video in the case of single-typed flows, i.e., one application per 5-tuple, and at least 98% and 94% for voice and video, respectively, in the more complex scenario of hybrid flows, e.g., voice, video, and file transfer bundled together in the same 5-tuple flow.

The rest of the chapter is organized as follows. Section 5.2 introduces related work on pattern classification methodologies. Section 5.3 describes the weaknesses of metrics used in previous work when applied in our context, and highlights the new traffic features that constitute the foundation of our approach. Section 5.4 summarizes this chapter.

5.2 Related Work

Existing work on traffic classification uses discriminating criteria such as the per-flow packet size distribution, the inter-arrival times between packets within the same flow, and other statistics captured across multiple flows. For example, in Ref. [49], the authors proposed combining the average packet size within a flow with the inter-arrival variability metric, defined as the ratio of the variance to the mean of the inter-arrival times of packets within a flow, as a powerful metric that defines fairly distinct boundaries among three groups of applications: (i) bulk data transfer like FTP, (ii) interactive like HTTP, and (iii) streaming like voice, video, etc. Several classification techniques, such as nearest-neighbor and K-nearest-neighbor, were then tested using these traffic features. Although this preliminary study showed that pattern classification has great potential for proper application classification, it also showed that much work remains, e.g., exploring alternative traffic features and classification techniques. Moreover, although the extracted features are simple and feasible to compute on the fly, the learning algorithm is complex, and the resulting boundaries among the three families of applications are heavily nonlinear and time-dependent.
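The two per-flow features of Ref. [49], as described above, are cheap to compute. The sketch below is a minimal illustration under stated assumptions: the function name is hypothetical, arrival times are given in a single consistent unit, and the population variance is used (the description above does not say which variance estimator [49] uses).

```python
from statistics import mean, pvariance


def roughan_style_features(sizes, arrival_times):
    """Return (average packet size, inter-arrival variability) for one flow.

    Variability = variance / mean of the inter-arrival times; using the
    population variance here is an assumption.
    """
    iats = [later - earlier for earlier, later in zip(arrival_times, arrival_times[1:])]
    return mean(sizes), pvariance(iats) / mean(iats)


# Four packets arriving at 0, 20, 40 and 80 ms: inter-arrival times 20, 20, 40 ms.
avg_size, variability = roughan_style_features([100, 200, 300, 400], [0, 20, 40, 80])
print(avg_size, variability)  # 250 and about 3.33 (ms)
```

Note that the ratio carries the unit of time, so its numeric value depends on whether arrival times are measured in seconds or milliseconds; features must use one unit consistently.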

Similar to Ref. [49], Karagiannis et al. [50] proposed a novel approach, called BLINC, that exploits network-related properties and characteristics. The novelty of this approach is twofold. First, the authors shift the focus from classifying individual flows to associating Internet hosts with applications, and then classifying their flows accordingly. Second, BLINC follows a different philosophy from previous methods, attempting to capture the inherent behavior of a host at three levels: (i) the social level, i.e., how each host interacts with other hosts; (ii) the functional level, i.e., the role played by each host in the network as a provider or a consumer of an application; and (iii) the application level, i.e., the ports used by each host during its communication with other hosts. Although the approach proposed in [50] is interesting from a conceptual perspective and performs reasonably well for a variety of applications, it is still prone to large estimation errors for streaming applications. Moreover, its high complexity and large memory consumption remain an open issue for high-speed application classification.

Other papers using pattern classification have appeared recently in the literature, but they focus on detecting specific applications such as peer-to-peer [46] and chat [51]. More importantly, to the best of our knowledge, none of the existing work is able to separate voice traffic from video traffic, or to indicate the presence of voice or video traffic in a hybrid flow that contains traffic from both voice/video and other applications such as file transfer.

5.3 Intuitions Behind a Proper Detection of Voice and Video Streams

Generally speaking, the problem of voice and video detection can be formulated as a complex pattern classification problem that has to deal with the curse of dimensionality, e.g., discriminating voice and video data streams in the presence of hidden traffic patterns and too many interrelated features. A critical step toward the solution is to identify traffic features that correctly represent the characteristics of the data streams of interest and uniquely isolate them from other applications. To this end, in this section we first show that simple metrics presented in the past are not applicable in our context, and we conclude with some observations that constitute the essence of our approach. Figure 5-1 shows the results obtained using the combination of average packet size and the inter-arrival variability metric proposed by Roughan et al. [49]. Although this metric performs very well in separating streaming, file transfer, transactional, and interactive applications, it performs poorly when used to further separate applications within the same family, such as voice, video, or voice and video mixed with other applications like file transfer (i.e., hybrid flows). Figure 5-1 clearly highlights the complete absence of any distinct boundary and the heavy overlap between voice and video traffic. There are two reasons why the pair (average packet size, inter-arrival variability metric) cannot separate video from voice. First, the packet size for video/voice is controlled by the packetization strategy of the application designer [52]; hence, a video application may produce an average packet size similar to that of voice (Figure 5-1). Second, random end-to-end delay in the Internet causes large variations in the inter-arrival variability metric across video/voice flows.


Figure 5-1. Average packet size versus inter-arrival variability metric for 5 applications: voice, video, file transfer, file transfer mixed with voice, and file transfer mixed with video (legend: audio, video, file, fileaudio, filevideo; y-axis: average packet size, 0 to 1500; x-axis: inter-arrival variability metric, 0 to 12)


Table 5-1: Commonly used speech codecs and their specifications

Standard        Codec method    Inter-packet delay (ms)
G.711 [53]      PCM             0.125
G.726 [54]      ADPCM           0.125
G.728 [55]      LD-CELP         0.625
G.729 [56]      CS-ACELP        10
G.729A [56]     CS-ACELP        10
G.723.1 [57]    MP-MLQ          30
G.723.1 [57]    ACELP           30











To overcome the above problem, in this section we exploit metrics that have great potential to serve our purpose: the strong regularities residing in the inter-arrival times between packets within the same flow and in the packet sizes of voice and video data streams. Specifically, we consider four types of metrics:


1. packet inter-arrival time and packet size in time domain;


2. packet inter-arrival time in frequency domain;


3. packet size in frequency domain;


4. combining packet inter-arrival time and packet size in frequency domain.

These metrics are discussed in turn in Sections 5.3.1 to 5.3.4.


Figure 5-2. Inter-arrival time distribution for voice (AUDIO) and video (VIDEO) traffic (x-axis: IAT in seconds, 0 to 0.06)



5.3.1 Packet Inter-Arrival Time and Packet Size in Time Domain

The intuition behind such metrics lies in the observation that protocols used for voice and video applications typically specify a constant time between two consecutive packets at the transmitter side, known as the Inter-Packet Delay (IPD). For example, Table 5-1 lists some speech codec standards and the associated IPDs required for a correct implementation of those protocols. Packets leaving the transmitter might traverse a large number of links in the Internet before reaching their destination. Along the way, packets might experience random delay due to congestion at routers' interfaces. As a consequence, the inter-arrival times between packets at the receiver might be severely affected by random noise, i.e., jitter, and thus this metric might not be a reliable candidate feature for a robust classification methodology. Although this problem does exist, we note that the inter-arrival times between packets within the same flow still show a strong regularity when studied in the frequency domain at the receiver side.

As an example, Figure 5-2 shows the distributions of the packet inter-arrival times at the receiver side when Skype is used to transmit voice only and video only, respectively, between two hosts, one located at University A on the east coast and the other at University B on the west coast of the USA. As we can see, the distributions for both video and voice are centered around 0.03 seconds. Figure 5-3, on the other hand, shows the packet size distributions for voice and video. Voice and video have similar distributions for packet sizes below 200 bytes. Although video traffic also generates packets larger than 200 bytes, these larger packets cannot reliably be used to separate video from voice, since other applications such as chat or file transfer might also generate them. As a consequence, packet inter-arrival time and packet size are weak features when considered in the time domain.

Figure 5-3. Packet size distribution for voice (AUDIO) and video (VIDEO) traffic (x-axis: packet size in bytes, 0 to 700)











5.3.2 Packet Inter-Arrival Time in Frequency Domain

We now examine how the same feature behaves when observed in the frequency domain. In this domain, we are interested in whether there exists any frequency component, e.g., one corresponding to a periodic inter-arrival time, that captures the majority of the energy of this stochastic process at the receiver side. We investigate this by computing the power spectral density (PSD) of the packet inter-arrival time process of two traces, each 10 seconds long, shown in Figures 5-4 and 5-5 for voice and video, respectively. Some regularity exists for both voice and video across the different traces, although it is not very strong. This result holds for all experiments conducted when transmitting Skype voice and video packets over the Internet from University A to University B.
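The PSDs in Figures 5-4 and 5-5 treat the inter-arrival times as a sequence indexed by packet number, with frequency expressed in units of π rad/sample. The sketch below computes a plain periodogram of such a sequence with NumPy. It is an illustration only: the dissertation's PSD estimator may differ (e.g., Welch averaging or windowing), and the function name and the synthetic input are hypothetical.

```python
import numpy as np


def iat_periodogram(iats):
    """Periodogram of an inter-arrival-time sequence indexed by packet number.

    Returns (normalized_freqs, psd_db) with frequency in units of pi rad/sample,
    i.e., 0 is DC and 1 is the Nyquist frequency.
    """
    x = np.asarray(iats, dtype=float)
    x = x - x.mean()                             # drop DC so the mean IAT does not dominate
    psd = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    freqs = 2.0 * np.arange(len(psd)) / len(x)   # bin k -> k/N cycles/sample -> 2k/N (x pi rad/sample)
    psd_db = 10 * np.log10(psd + 1e-20)          # small floor avoids log10(0)
    return freqs, psd_db


# Synthetic IATs around 30 ms with a pattern that repeats every 8 packets.
n = np.arange(64)
iats = 0.03 + 0.01 * np.cos(2 * np.pi * n / 8)
freqs, psd_db = iat_periodogram(iats)
print(freqs[np.argmax(psd_db)])  # 0.25, i.e., a period of 8 packets
```

A pattern repeating every P packets produces a peak at normalized frequency 2/P; network jitter spreads that energy across neighboring bins, which is consistent with the weak regularity seen in the measured traces.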


Figure 5-4. Power spectral density of two sequences/traces of time-varying inter-arrival times for voice traffic (Trace 1 and Trace 2; x-axis: normalized frequency, ×π rad/sample)


5.3.3 Packet Size in Frequency Domain

Figure 5-5. Power spectral density of two sequences of time-varying inter-arrival times for video traffic (x-axis: normalized frequency, ×π rad/sample)

Somewhat stronger regularity is visible for voice and video packet sizes. Indeed, most video coding schemes use two types of frames [58]: intra frames (I-frames) and predicted frames (P-frames). An I-frame is coded without reference to any frame except itself; it serves as the starting point for a decoder to reconstruct the video stream. A P-frame may contain image data, motion vector displacements, or combinations of the two, and its decoding requires reference to previously decoded frames. Packets containing I-frames are larger than those containing P-frames. Usually, the number of P-frames between two consecutive I-frames is constant. Hence, one can observe a strong periodic variation of packet size due to the interleaving of I-frames and P-frames in video data streams. Voice streams show a similar phenomenon if Linear Predictive Coding (LPC), e.g., a code-excited linear prediction (CELP) voice coder, is employed. As an example, Figures 5-6 and 5-7 show the power spectral density of voice and video packet sizes, respectively.
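The I-frame/P-frame argument can be checked on a toy sequence. The sketch below builds a hypothetical packet-size sequence with one large I-frame packet followed by a fixed number of small P-frame packets, then lists the FFT bins that carry spectral power. The sizes, the one-packet-per-frame simplification, and the function names are assumptions made only for illustration.

```python
import numpy as np


def gop_packet_sizes(num_gops, gop_length, i_size=1000, p_size=200):
    """Hypothetical sequence: one I-frame packet, then (gop_length - 1) P-frame packets, repeated."""
    one_gop = [i_size] + [p_size] * (gop_length - 1)
    return np.array(one_gop * num_gops, dtype=float)


def spectral_lines(sizes, rel_tol=1e-6):
    """Return the FFT bin indices whose power exceeds rel_tol times the peak (DC removed)."""
    x = sizes - sizes.mean()
    power = np.abs(np.fft.rfft(x)) ** 2
    return np.flatnonzero(power > rel_tol * power.max())


sizes = gop_packet_sizes(num_gops=8, gop_length=8)  # 64 packets
print(spectral_lines(sizes))  # [ 8 16 24 32]: the fundamental 8/64 = 1/8 cycle per packet and its harmonics
```

All of the energy sits at multiples of 1/G cycles per packet, where G is the number of packets between consecutive I-frames; real video, with variable I-frame sizes and several packets per frame, blurs these lines but keeps the periodic structure.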

Figure 5-6. Power spectral density of two sequences of discrete-time packet sizes for voice traffic (x-axis: normalized frequency, ×π rad/sample)

Figure 5-7. Power spectral density of two sequences of discrete-time packet sizes for video traffic (Trace 1 and Trace 2; x-axis: normalized frequency, ×π rad/sample)

5.3.4 Combining Packet Inter-Arrival Time and Packet Size in Frequency Domain

Figures 5-8 and 5-9 show how the regularities hidden in voice and video data streams can be amplified by combining the two features into one single stochastic process, which is described later. Note how the two important frequencies are amplified and clearly visible in the PSD plots. The PSD for voice has a peak (see Figure 5-8) because voice applications usually produce a close-to-constant packet rate, owing to the constant inter-packet delay of the widely used speech codecs listed in Table 5-1; e.g., the peak at 33 Hz in Figure 5-8 corresponds to a 30 ms inter-packet delay (1/0.03 s ≈ 33 Hz). Compared to voice, video applications have a flatter PSD, for the following reason. The number of bits in a video I-frame depends on the texture of the image (e.g., the I-frame of a blackboard image produces far fewer bits than that of a complicated flower image), resulting in a large range in the number of packets per I-frame, e.g., from 1 packet to a few hundred packets. The frame rate is usually constant (e.g., 30 frames/s is a standard rate in the USA), i.e., a frame is generated every 33 ms at 30 frames/s; hence, the inter-arrival time between two packets in an I-frame may span a large range, resulting in a flat PSD.
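The exact combined process is defined later, so the sketch below makes an explicit assumption: each packet contributes an impulse, weighted by its size, at its relative arrival time on a uniform time grid, and the PSD of that signal is taken in hertz. Under this hypothetical construction, a voice flow sending one 100-byte packet every 30 ms shows a peak near 1/0.03 ≈ 33 Hz, matching the reasoning above. The grid step `dt`, the 50 Hz band limit (taken from the axis range of Figure 5-8), and the function name are assumptions.

```python
import numpy as np


def packet_size_process_psd(arrival_times, sizes, dt=0.001, f_max=50.0):
    """Hypothetical combined feature: size-weighted impulses at arrival times.

    Places each packet's size at its relative arrival time on a grid of step dt
    seconds, removes the mean, and returns (freqs_hz, psd) for 0 < f <= f_max.
    """
    t = np.asarray(arrival_times, dtype=float)
    t = t - t[0]                                    # relative arrival times, first packet at 0
    n = int(round(t[-1] / dt)) + 1
    signal = np.zeros(n)
    np.add.at(signal, np.round(t / dt).astype(int), sizes)  # packets sharing a bin accumulate
    signal -= signal.mean()
    psd = np.abs(np.fft.rfft(signal)) ** 2 / n
    freqs = np.fft.rfftfreq(n, d=dt)
    band = (freqs > 0) & (freqs <= f_max)
    return freqs[band], psd[band]


# About 10 s of a voice-like flow: one 100-byte packet every 30 ms.
times = np.arange(333) * 0.030
freqs, psd = packet_size_process_psd(times, np.full(333, 100.0))
print(round(freqs[np.argmax(psd)], 1))  # close to 33.3 Hz
```

For video, widely varying I-frame sizes and burst lengths spread the power across many frequencies, which this construction would render as the flatter spectrum described above.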



Figure 5-8. Power spectral density of two sequences of continuous-time packet sizes for voice traffic (Trace 1 and Trace 2; x-axis: frequency in Hz, 0 to 50)

Figure 5-9. Power spectral density of two sequences of continuous-time packet sizes for video traffic (Trace 1 and Trace 2; x-axis: frequency in Hz, up to 500)



5.4 Summary


In this chapter, we motivated the importance of, and presented the challenges faced by, network traffic classification, specifically the detection and classification of voice and video traffic.

Nowadays, VoIP and IPTV are becoming increasingly popular and represent the largest source of profits as consumer interest in online voice and video services increases and as broadband deployments proliferate. To tap the potential profits that VoIP and IPTV offer, carrier networks have to efficiently and accurately manage and track the delivery of IP services. Yet the emergence of a bloom of new zero-day voice and video applications such as Skype, Gtalk, and MSN poses tremendous challenges for ISPs. The traditional approach of using port numbers to classify traffic is infeasible due to the use of dynamic port numbers. The proliferation of proprietary protocols, coupled with the growing use of encryption to ensure data confidentiality, makes application-level analysis infeasible. We also identified a novel problem in which multiple sessions reuse the same transport-layer connection. To the best of our knowledge, this problem has not been considered in the existing literature.

We showed that existing technologies (Section 5.2) are not able to accurately distinguish between voice and video flows. Based on an analysis of the properties of voice and video data streams, our intuition is to exploit the strong regularities residing in packet inter-arrival times and the associated packet sizes. In this chapter, we analyzed four types of metrics that exploit these regularities:

1. packet inter-arrival time and packet size in time domain;

2. packet inter-arrival time in frequency domain;

3. packet size in frequency domain;

4. combining packet inter-arrival time and packet size in frequency domain.

By analyzing the properties of these four types of metrics and illustrating them with figures, we showed that combining packet inter-arrival time and packet size in one single stochastic process generates a distinctive feature for classifying voice and video streams.










CHAPTER 6
NETWORK CENTRIC TRAFFIC CLASSIFICATION SYSTEM

6.1 System Architecture

Figure 6-1. VOVClassifier System Architecture
  Training phase:
    Voice training flows -> Flow Summary Generator -> Feature Extractor (PSD) -> Voice Subspace Generator
    Video training flows -> Flow Summary Generator -> Feature Extractor (PSD) -> Video Subspace Generator
  Classification phase:
    Raw packets -> Flow Summary Generator -> Feature Extractor (PSD) -> Voice/Video Classifier -> Voice/Video/Other


We first present the overall architecture of our system (VOVClassifier, Figure 6-1) and provide a high-level description of the functionality of each of its modules. Generally speaking, VOVClassifier is an automated learning system that uses packet headers from raw packets collected off the wire, organizes them into transport-layer flows, and processes them in real time to search for voice and video applications. VOVClassifier is first trained on voice and video data streams separately before being used in real time for classification. During the training phase, VOVClassifier extracts feature vectors, each of which is a summary (also known as a statistic) of the raw traffic bit stream, and maintains their statistics in memory. During the online classification phase, a classifier makes decisions by measuring the similarity between the feature vector extracted from on-the-fly network traffic and the feature vectors extracted from the training data. Flows with high similarity to the voice (or video) features are classified as voice (or video); data streams with low similarity to both voice and video are classified as other applications.

In general, VOVClassifier is composed of four major modules that operate in cascade: (i) the Flow Summary Generator (FSG), (ii) the Feature Extractor (FE) via Power Spectral Density analysis, (iii) the Voice/Video Subspace Generator (SG), and (iv) the Voice/Video Classifier (CL).

Next, we briefly summarize the functionalities of each component.

6.1.1 Flow Summary Generator (FSG)

All packets collected off the wire are processed by the Flow Summary Generator module, which reorders packets, removes duplicated packets, and organizes them into network transport flows according to their 5-tuple, i.e., source IP, destination IP, source port, destination port, and transport protocol. In Section 5.3 we showed that voice and video data streams are characterized by packets that are very small in size. As a consequence, this module filters out all packets whose size is smaller than a pre-specified threshold $\theta_P$. Each processed flow $F_s$ is then internally described in terms of the sizes and relative arrival times of its packets:

$$F_s = \{(P_i, A_i);\ i = 1, \ldots, I\}, \qquad (6\text{-}1)$$

where $P_i$ and $A_i$ denote the size and the relative arrival time of the $i$th packet in a flow with $I$ packets, respectively. Since we only consider relative arrival times, $A_1$ is always 0.
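The FSG steps just described (group by 5-tuple, drop duplicates, reorder, filter by the size threshold, and emit the pairs of Eq. (6-1)) can be sketched as follows. This is a hypothetical illustration: the `Packet` record, the use of an IP identification field to spot duplicates, and the function name are assumptions, not the dissertation's implementation.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical packet-header record; field names are illustrative."""
    timestamp: float
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str
    size: int
    ip_id: int  # used here to recognize duplicated packets (an assumption)


def flow_summaries(packets, theta_p):
    """Map each 5-tuple to its list of (P_i, A_i) pairs as in Eq. (6-1)."""
    flows = defaultdict(dict)
    for pkt in packets:
        if pkt.size < theta_p:
            continue  # drop packets below the size threshold theta_P
        key = (pkt.src_ip, pkt.dst_ip, pkt.src_port, pkt.dst_port, pkt.proto)
        flows[key].setdefault(pkt.ip_id, pkt)  # keep only the first copy of a duplicate
    summaries = {}
    for key, by_id in flows.items():
        ordered = sorted(by_id.values(), key=lambda p: p.timestamp)  # reorder by time
        t0 = ordered[0].timestamp
        summaries[key] = [(p.size, p.timestamp - t0) for p in ordered]  # A_1 = 0
    return summaries
```

The output for each flow is exactly the (size, relative arrival time) list that the Feature Extractor consumes; a production version would also bound per-flow state and expire idle flows.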

6.1.2 Feature Extractor (FE) and Voice/Video Subspace Generator (SG)

The FSG output is forwarded to the Feature Extractor module, which computes a feature vector for each flow by analyzing its power spectral density (PSD), in order to exploit the regularities residing in voice and video traffic. The Voice/Video Subspace Generator processes the high-dimensional feature vectors it receives and projects them into a low-dimensional space that embeds the fine-granularity properties of the data stream being processed. This is achieved by first partitioning the feature vector space into a few non-overlapping clusters (data sets) and then extracting the characteristics of each cluster using principal component analysis (PCA). The FSG and FE modules (Figure 6-1) are used during both the training and the classification phases.
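The PCA step for one cluster can be sketched with a singular value decomposition. The sketch assumes the clustering (Subspace Decomposition) has already produced the cluster's feature vectors and that the subspace dimension `k` is chosen by hand; both the function name and `k` are hypothetical, and the details of Section 6.3 may differ.

```python
import numpy as np


def pca_subspace(feature_vectors, k):
    """Return (mean, basis) for one cluster of feature vectors.

    feature_vectors: array of shape (num_flows, dim). basis has shape (dim, k)
    with orthonormal columns spanning the top-k principal directions.
    """
    X = np.asarray(feature_vectors, dtype=float)
    mu = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, vt[:k].T


# Ten toy 3-dimensional feature vectors lying on one line through (5, 5, 5).
t = np.arange(10.0)
X = np.outer(t, [1.0, 2.0, 2.0]) / 3 + 5
mu, basis = pca_subspace(X, k=1)
print(basis.shape)  # (3, 1); the column is +/-(1, 2, 2)/3
```

Using the SVD of the mean-centered data avoids forming the covariance matrix explicitly, which matters when the PSD feature vectors are high-dimensional.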









6.1.3 Voice/Video Classifier (CL)

During the classification phase, the data are processed by one extra module, named the Voice/Video Classifier, which compares the feature vectors extracted from the current data streams entering the system with the voice and video subspaces generated during training, in order to classify each stream as voice, video, or other. Data stream classification requires a similarity metric, and the literature offers many. For example, the Bayes classifier uses a cost function, while the nearest-neighbor (1-NN) and K-nearest-neighbor (K-NN) classifiers use Euclidean distance. In general, no similarity metric is guaranteed to be the best for all applications. For example, the Bayes classifier is applicable only when the likelihood probabilities are well estimated, which requires the number of training samples to be much larger than the number of feature dimensions. As a consequence, it is not suitable for classification based on a high-dimensional feature vector such as the PSD feature vector. Furthermore, both 1-NN and K-NN work well only under the assumption that data of the same category are clustered together, which is not always the case. We overcome these problems by employing a similarity metric based on the normalized distance from the feature vector representing the ongoing flow to the two subspaces obtained during the training phase. The subspace at minimum distance is selected as the candidate only if that distance is below a specific threshold.

We conclude this section by highlighting one minor limitation of our approach. Our system is unable to distinguish a flow containing video only from a flow containing video packets piggybacked with voice data (when video and voice applications are launched simultaneously in Skype, voice data is piggybacked on video packets). This is because the feature of video packets piggybacked with voice data is very similar to that of video only. Hence, our traffic classifier will declare a flow containing video packets piggybacked with voice data as "video".









The rest of this chapter is organized as follows. Sections 6.2, 6.3, and 6.4 describe the Feature Extractor, the Voice/Video Subspace Generator, and the Voice/Video Classifier, respectively. In Section 6.5, we conduct experiments on traffic collected between two universities using Skype, MSN, and GTalk. Section 6.6 summarizes this chapter.

6.2 Feature Extractor (FE) Module via Power Spectral Density (PSD)

As explained in Section 5.3, the extraction and processing of simple traffic features does not solve the problem of detecting voice and video data streams and separating them from other applications. In this section, we first introduce the preliminary steps that we take to transform each generic flow $\mathcal{F}_s$ obtained from the FSG into a stochastic process that combines the inter-arrival times and packet sizes. Then we describe how to use power spectral density (PSD) analysis as a powerful methodology to extract the hidden key regularities residing in real-time multimedia network traffic.

6.2.1 Modeling the network flow as a stochastic digital process

[Figure: preprocessing stage ($\mathcal{F}_s$ → stochastic model → LPF → sample) followed by PSD estimation.]

Figure 6-2. Power spectral density features extraction module. Cascade of processing steps.


Each flow $\mathcal{F}_s$ extracted by the FSG is forwarded to the FE module, which applies several processing steps in cascade (Figure 6-2). First, each $\mathcal{F}_s$ (see Equation (6-1)) is modeled as a continuous-time stochastic process, as shown in Equation (6-2):

$$P(t) = \sum_{\langle P, A \rangle \in \mathcal{F}_s} P\,\delta(t - A), \tag{6-2}$$

where $\delta(\cdot)$ denotes the Dirac delta function. As the reader can notice, our model combines packet arrival times and packet sizes into a single stochastic process. Because digital computers are better suited to discrete-time sequences than to continuous-time processes, we transform $P(t)$ into a discrete-time sequence by sampling it at frequency $F_s = 1/T_s$.

Because the signal defined in Equation (6-2) is a sum of delta functions, its spectrum spans the whole frequency domain. To reshape the spectrum of $P(t)$ and avoid aliasing when it is sampled at interval $T_s$, we apply a low-pass filter (LPF) with impulse response $h_{LPF}(t)$. The filtered signal $P_h(t)$ can then be written as

$$P_h(t) = P(t) * h_{LPF}(t) = \sum_{\langle P, A \rangle \in \mathcal{F}_s} P\,h_{LPF}(t - A). \tag{6-3}$$

After sampling at interval $T_s$, we obtain the discrete-time sequence

$$P_d(i) = P_h(iT_s) = \sum_{\langle P, A \rangle \in \mathcal{F}_s} P\,h_{LPF}(iT_s - A), \tag{6-4}$$

where $i = 1, \dots, I_d$, $I_d = \lfloor A_{max}/T_s \rfloor + 1$, and $A_{max}$ is the arrival time of the last packet in the flow.

We note that the sampling interval $T_s$ cannot be chosen arbitrarily. If $T_s$ is too large, the spectrum of the sampled flow retains only low-frequency information and loses the high-frequency part of the spectrum. On the other hand, if $T_s$ is too small, the length $I_d$ of the resulting discrete-time sequence becomes very large, which makes computing the PSD of $\mathcal{F}_s$ very expensive. After an extensive analysis of widely used voice and video applications such as Skype, MSN, and GTalk, we observed that choosing $T_s = 0.5$ milliseconds is sufficient to extract all the information useful for our purpose.
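A minimal Python sketch of Equations (6-2) to (6-4) follows. The dissertation does not specify $h_{LPF}(t)$; purely as an assumption for illustration, the sketch uses a boxcar response $h_{LPF}(t) = 1$ for $0 < t \le T_s$ (a crude low-pass filter), under which $P_d(i)$ reduces to the total size of the packets arriving in $[(i-1)T_s, iT_s)$. The function name `sample_flow` is hypothetical.

```python
import math


def sample_flow(flow, ts=0.5e-3):
    """Return [P_d(1), ..., P_d(I_d)] for flow = [(P_i, A_i), ...] with A_1 = 0.

    Assumption (not from the source): boxcar LPF h(t) = 1 for 0 < t <= ts,
    so each packet's size is added to the bin i with (i - 1) * ts <= A < i * ts.
    """
    a_max = max(a for _, a in flow)
    i_d = math.floor(a_max / ts) + 1  # I_d = floor(A_max / T_s) + 1
    pd = [0.0] * i_d
    for size, arrival in flow:
        pd[math.floor(arrival / ts)] += size  # list index is i - 1
    return pd
```

With `ts=0.5e-3`, a flow window shorter than 10 seconds yields at most 20,000 samples, matching the setting in Section 6.5.1.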

Next, we present a methodology to extract the regularities residing in the signal $P(t)$. We achieve this by studying the sampled digital signal $P_d(i)$ in the frequency domain using power spectral density analysis.

6.2.2 Power Spectral Density (PSD) Computation

Power spectral density definition.









The power spectral density of a digital signal represents the distribution of its energy over frequency. Regularities in the time domain translate into dominant periodic components in the autocorrelation function and, in turn, into peaks in the power spectral density.

For a general second-order stationary sequence $\{y(i)\}_{i \in \mathbb{Z}}$, the power spectral density (defined in [59]) can be computed as

$$\phi(\omega; y) = \sum_{k=-\infty}^{\infty} r(k; y)\, e^{-j\omega k}, \tag{6-5}$$

where $\{r(k; y);\ k \in \mathbb{Z}\}$ is the autocovariance sequence of the signal $\{y(i);\ i \in \mathbb{Z}\}$, i.e.,

$$r(k; y) = E\left[y(i)\, y^*(i - k)\right]. \tag{6-6}$$

Although $\omega$ in Equation (6-5) can take any value, we restrict its domain to $[-\pi, \pi)$ because $\phi(\omega; y) = \phi(\omega + 2\pi; y)$.
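To make Equations (6-5) and (6-6) concrete, the sketch below replaces the expectation with the biased sample average of Equation (6-16) and truncates the sum to the available lags. This is the classical correlogram, a non-parametric estimate used here only to illustrate the definition; it is not the estimator adopted in this chapter. For a pulse train with one packet every 60 samples (30 ms at $T_s = 0.5$ ms, i.e. about 33.3 Hz), the estimate peaks at $\omega = 2\pi/60$. The function names are hypothetical, and real-valued data is assumed.

```python
import math


def autocov(y, k):
    """Biased sample autocovariance r_hat(k; y) for real data and k >= 0."""
    n = len(y)
    return sum(y[i] * y[i - k] for i in range(k, n)) / n


def correlogram(y, omega):
    """Truncated Eq. (6-5): sum over |k| < N of r(k) e^{-j omega k}.

    Illustration only. For real data r(-k) = r(k), so the imaginary parts
    cancel and the sum reduces to r(0) + 2 * sum_k r(k) cos(omega k).
    """
    n = len(y)
    return autocov(y, 0) + 2 * sum(autocov(y, k) * math.cos(omega * k) for k in range(1, n))
```

Evaluating `correlogram` for a 600-sample pulse train with period 60 gives a larger value at `2 * math.pi / 60` than at an unrelated frequency such as `2 * math.pi / 47`.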

According to Equations (6-5) and (6-6), computing the PSD of a digital signal theoretically requires access to an infinitely long sequence. Since in reality we cannot assume that infinite digital sequences are at our disposal, we must choose a technique that can estimate the power spectral density with acceptable accuracy in our context. Two families of PSD estimation methods are available in the literature: parametric and non-parametric. Parametric methods have been shown to perform better under the assumption that the underlying model is correct and accurate. Furthermore, these methods are more attractive from a computational complexity perspective, as they require the estimation of fewer variables than non-parametric methods.

In our research, we employ a parametric method to estimate the PSD. The details are presented below.

PSD estimation based on parametric method.









We now briefly present parametric methods for estimating the PSD. According to the Weierstrass theorem, any continuous PSD can be approximated arbitrarily closely by a rational PSD of the form

$$\phi(\omega) = \frac{|B(\omega)|^2}{|A(\omega)|^2}\, \sigma^2, \tag{6-7}$$

where $\sigma^2$ is a positive scalar and $A(\omega)$ and $B(\omega)$ are polynomials:

$$A(\omega) = 1 + a_1 e^{-j\omega} + \cdots + a_p e^{-jp\omega}, \tag{6-8}$$

$$B(\omega) = 1 + b_1 e^{-j\omega} + \cdots + b_q e^{-jq\omega}. \tag{6-9}$$

Equation (6-7) can be interpreted as obtaining a signal by filtering white noise $e(i)$ of power $\sigma^2$ through a filter with transfer function $B(\omega)/A(\omega)$, i.e.,

$$y(i) + \sum_{t=1}^{p} a_t\, y(i - t) = e(i) + \sum_{t=1}^{q} b_t\, e(i - t). \tag{6-10}$$

Starting from Equation (6-7), three types of models are derived:

1. if $p > 0$ and $q = 0$, one models $\{y(i)\}_{i \in \mathbb{Z}}$ as an autoregressive (AR($p$)) signal;

2. if $p = 0$ and $q > 0$, one models $\{y(i)\}_{i \in \mathbb{Z}}$ as a moving average (MA($q$)) signal;

3. otherwise, it is modeled as an autoregressive moving average (ARMA($p$, $q$)) signal.

Based on the AR, MA, or ARMA assumption, one can estimate the coefficients in Equation (6-7) and hence the PSD. In general, none of these three models outperforms the other two; their performance depends strongly on the spectral shape of the signal under consideration. Because the signals we process are characterized by strong regularities in the time domain, we adopt the AR model: an AR model can represent spectra with narrow peaks by placing the zeros of $A(\omega)$ close to the unit circle.









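Yule-Walker method. We now describe how to estimate the coefficients of an AR signal. Since $q = 0$, Equation (6-10) can be written as

$$y(i) + \sum_{t=1}^{p} a_t\, y(i - t) = e(i). \tag{6-11}$$

Multiplying Equation (6-11) by $y^*(i - k)$ and taking expectations on both sides, one obtains

$$r(k; y) + \sum_{t=1}^{p} a_t\, r(k - t; y) = E\{e(i)\, y^*(i - k)\}. \tag{6-12}$$

Noting that

$$E\{e(i)\, y^*(i - k)\} = \begin{cases} \sigma^2, & k = 0, \\ 0, & k \ge 1, \end{cases} \tag{6-13}$$

one obtains the equation system

$$\begin{cases} r(0; y) + \sum_{t=1}^{p} a_t\, r(-t; y) = \sigma^2, & \\ r(k; y) + \sum_{t=1}^{p} a_t\, r(k - t; y) = 0, & k = 1, \dots, p. \end{cases} \tag{6-14}$$

Equation (6-14) can be rewritten in matrix form, i.e.,

$$\begin{bmatrix} r(0; y) & r(-1; y) & \cdots & r(-p; y) \\ r(1; y) & r(0; y) & \cdots & r(-p+1; y) \\ \vdots & \vdots & \ddots & \vdots \\ r(p; y) & r(p-1; y) & \cdots & r(0; y) \end{bmatrix} \begin{bmatrix} 1 \\ a_1 \\ \vdots \\ a_p \end{bmatrix} = \begin{bmatrix} \sigma^2 \\ 0 \\ \vdots \\ 0 \end{bmatrix}. \tag{6-15}$$

Equation (6-15) is known as the Yule-Walker method for AR spectral estimation [59].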

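Given data $\{y(i)\}_{i=1}^{I}$, one first estimates the autocovariance sequence

$$\hat{r}(k; y) = \frac{1}{I} \sum_{i=k+1}^{I} y(i)\, y^*(i - k), \qquad \hat{r}(-k; y) = \hat{r}^*(k; y), \qquad k = 0, \dots, p. \tag{6-16}$$

When $\{r(k; y);\ k = -p, \dots, p\}$ is replaced by its estimate $\{\hat{r}(k; y);\ k = -p, \dots, p\}$, Equation (6-15) becomes a system of $p + 1$ linear equations in $p + 1$ unknowns, i.e., $\sigma^2, a_1, \dots, a_p$. Its solution is

$$\begin{bmatrix} \hat{a}_1 \\ \vdots \\ \hat{a}_p \end{bmatrix} = -\begin{bmatrix} \hat{r}(0; y) & \cdots & \hat{r}(-p+1; y) \\ \vdots & \ddots & \vdots \\ \hat{r}(p-1; y) & \cdots & \hat{r}(0; y) \end{bmatrix}^{-1} \begin{bmatrix} \hat{r}(1; y) \\ \vdots \\ \hat{r}(p; y) \end{bmatrix}, \tag{6-17}$$

$$\hat{\sigma}^2 = \hat{r}(0; y) + \sum_{t=1}^{p} \hat{r}(-t; y)\, \hat{a}_t. \tag{6-18}$$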
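A direct implementation of Equations (6-16) to (6-18) is sketched below for real-valued data, so that $\hat{r}(-k; y) = \hat{r}(k; y)$ and the matrix is symmetric Toeplitz. This is an illustrative sketch, not the dissertation's code: the function names are hypothetical, and the linear system is solved with plain Gaussian elimination (partial pivoting), whose $O(p^3)$ cost is exactly what motivates the Levinson-Durbin algorithm described next.

```python
def autocov(y, k):
    """Biased estimate r_hat(k; y) of Eq. (6-16) for real data, k >= 0."""
    n = len(y)
    return sum(y[i] * y[i - k] for i in range(k, n)) / n


def solve(m, b):
    """Solve m x = b by Gaussian elimination with partial pivoting (O(p^3))."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def yule_walker(r, p):
    """Eqs. (6-17) and (6-18) given r = [r(0), ..., r(p)] (real data, r(-k) = r(k))."""
    toeplitz = [[r[abs(k - t)] for t in range(p)] for k in range(p)]
    a = solve(toeplitz, [-r[k] for k in range(1, p + 1)])            # Eq. (6-17)
    sigma2 = r[0] + sum(r[t] * a[t - 1] for t in range(1, p + 1))   # Eq. (6-18)
    return a, sigma2
```

Typical use is `yule_walker([autocov(y, k) for k in range(p + 1)], p)`.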

Levinson-Durbin algorithm (LDA). The direct solution of the Yule-Walker method, i.e., Equations (6-17) and (6-18), is computationally expensive. Equation (6-17) requires inverting a covariance matrix, whose time complexity is $O(p^3)$ [60, page 755]. In addition, in most applications there is no a priori information about the true order $p$. To cope with that, the Yule-Walker system of equations, Equation (6-15), has to be solved for $p = 1$ up to $p = p_{max}$, where $p_{max}$ is some prespecified maximum order. The resulting time complexity is $O(p_{max}^4)$.

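In this work, we use the Levinson-Durbin Algorithm (LDA) [59] to reduce the time complexity. It estimates the AR signal coefficients recursively in the order $p$. To facilitate further discussion and to emphasize the order $p$, we denote

$$R_{p+1} = \begin{bmatrix} r(0; y) & r(-1; y) & \cdots & r(-p; y) \\ r(1; y) & r(0; y) & \cdots & r(-p+1; y) \\ \vdots & \vdots & \ddots & \vdots \\ r(p; y) & r(p-1; y) & \cdots & r(0; y) \end{bmatrix}, \tag{6-19}$$

$$\mathbf{a}_p = \begin{bmatrix} a_1 \\ \vdots \\ a_p \end{bmatrix}, \tag{6-20}$$

and $\sigma_p^2$ the power of the noise of the AR($p$) signal. Thus, one can rewrite Equation (6-15) as

$$R_{p+1} \begin{bmatrix} 1 \\ \mathbf{a}_p \end{bmatrix} = \begin{bmatrix} \sigma_p^2 \\ \mathbf{0} \end{bmatrix}. \tag{6-21}$$

1. function LDA(...)
2. Argument 1: data, $\{y(i)\}_{i=1}^{I}$
3. Argument 2: order, $p$.
4. Return: parameters of the AR($p$) model, $\mathbf{a}_p$ and $\sigma_p^2$
5. $r(k; y) = \frac{1}{I} \sum_{i=k+1}^{I} y(i)\, y^*(i - k)$, for $\forall k = 0, 1, \dots, p$ (6-22)
6. $\mathbf{a}_1 = -\frac{r(1; y)}{r(0; y)}$ (6-23)
7. $\sigma_1^2 = r(0; y) - \frac{|r(1; y)|^2}{r(0; y)}$ (6-24)
8. for $t \leftarrow 1, \dots, p - 1$
9.    $\tilde{\mathbf{r}}_t = [r^*(t; y), r^*(t-1; y), \dots, r^*(1; y)]^T$ (6-25)
      $\tilde{\mathbf{a}}_t = [a_t, a_{t-1}, \dots, a_1]^T$ (6-26)
      $k_{t+1} = -\frac{r(t+1; y) + \tilde{\mathbf{r}}_t^H \mathbf{a}_t}{\sigma_t^2}$ (6-27)
      $\sigma_{t+1}^2 = \sigma_t^2 \left(1 - |k_{t+1}|^2\right)$ (6-28)
      $\mathbf{a}_{t+1} = \begin{bmatrix} \mathbf{a}_t \\ 0 \end{bmatrix} + k_{t+1} \begin{bmatrix} \tilde{\mathbf{a}}_t^* \\ 1 \end{bmatrix}$ (6-29)
10. end for

Figure 6-3. Levinson-Durbin Algorithm.

Figure 6-3 gives the LDA algorithm to estimate the coefficients of an AR($p$) model given data $\{y(i)\}_{i=1}^{I}$.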








When, as above, one needs to estimate AR models of every order from 1 up to $p_{max}$, the time complexity of LDA is $O(p_{max}^2)$, much better than the direct solution given by Equations (6-17) and (6-18).
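The recursion of Figure 6-3 can be sketched in Python for real-valued data, where conjugation has no effect and $\tilde{\mathbf{r}}_t^H \mathbf{a}_t = \sum_{m=1}^{t} a_m\, r(t+1-m; y)$. This is a hedged sketch, not the dissertation's code. It takes the autocovariances of line 5 (Equation (6-22)) as input rather than the raw data, and the function name is hypothetical.

```python
def levinson_durbin(r, p):
    """Figure 6-3 for real data: r = [r(0), ..., r(p)] -> (a_p, sigma_p^2, history).

    Input is the autocovariance sequence (assumed precomputed, Eq. (6-22)).
    history[t - 1] holds (a_t, sigma_t^2) for every order t = 1, ..., p.
    """
    a = [-r[1] / r[0]]                        # Eq. (6-23)
    sigma2 = r[0] - r[1] ** 2 / r[0]          # Eq. (6-24)
    history = [(a[:], sigma2)]
    for t in range(1, p):
        # Numerator of Eq. (6-27): r(t+1) + sum_{m=1}^{t} a_m r(t+1-m); a[m] holds a_{m+1}.
        alpha = r[t + 1] + sum(a[m] * r[t - m] for m in range(t))
        k = -alpha / sigma2                   # reflection coefficient k_{t+1}
        sigma2 = sigma2 * (1 - k * k)         # Eq. (6-28)
        # Eq. (6-29), real case: [a_t; 0] + k [reversed(a_t); 1]
        a = [a[m] + k * a[t - 1 - m] for m in range(t)] + [k]
        history.append((a[:], sigma2))
    return a, sigma2, history
```

Because `history` holds every intermediate order, the AR models of all orders up to $p_{max}$ come out of a single $O(p_{max}^2)$ pass.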

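1. function PSDEstimate(...)
2. Argument 1: data, $\{y(i)\}_{i=1}^{I}$.
3. Argument 2: order, $p$.
4. Return: PSD $\hat{\phi}(\omega; y)$.
5. $[\mathbf{a}_p, \sigma_p^2] = \text{LDA}\left(\{y(i)\}_{i=1}^{I}, p\right)$ (6-30)
6. $\hat{\phi}(\omega; y) = \frac{\sigma_p^2}{\left|1 + \sum_{t=1}^{p} a_t e^{-j\omega t}\right|^2}$ (6-31)

Figure 6-4. Parametric PSD Estimate using Levinson-Durbin Algorithm.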

Once the AR model is estimated, one can estimate the PSD of the signal $\{y(i)\}_{i=1}^{I}$. The procedure is given in Figure 6-4.
PSD feature vector.
Based on the above discussion, we now define the PSD feature vector of a flow as follows. Let us assume $\{P_d(i)\}_{i=1}^{I_d}$ (see Equation (6-4)) to be second-order stationary. Then its PSD can be estimated as

$$\hat{\phi}(\omega; P_d) = \text{PSDEstimate}\left(\{P_d(i)\}_{i=1}^{I_d}, p\right), \tag{6-32}$$

where $\omega \in [-\pi, \pi)$ and $p$ is the pre-specified order.
Recall that $\{P_d(i)\}_{i=1}^{I_d}$ is obtained by sampling the continuous-time signal $P_h(t)$ at time interval $T_s$ (see Figure 6-2). Thus, one can further express the PSD in terms of the physical frequency $f$ as

$$\hat{\phi}_f(f; P_d) = \hat{\phi}\left(\frac{2\pi f}{F_s}; P_d\right), \qquad -\frac{F_s}{2} \le f < \frac{F_s}{2}, \tag{6-33}$$

where $F_s = 1/T_s$. Equation (6-33) shows the relationship between the periodic components of a stochastic process in the continuous-time domain and the shape of its PSD in the frequency domain.

$\hat{\phi}_f(f; P_d)$ is a continuous function of frequency. To handle it on a computer, we sample it in the frequency domain. In other words, we select a series of frequencies,

$$0 < f_1 < f_2 < \cdots < f_M < \frac{F_s}{2}, \tag{6-34}$$

and define the PSD feature vector as

$$\psi = \left[\hat{\phi}_f(f_1; P_d), \hat{\phi}_f(f_2; P_d), \dots, \hat{\phi}_f(f_M; P_d)\right]^T. \tag{6-35}$$

$\psi \in \mathbb{R}^M$ is the feature vector we use to perform classification.
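Equations (6-31), (6-33), and (6-35) combine into a short routine that maps AR parameters $\mathbf{a}_p$ and $\sigma_p^2$, as returned by LDA, to the feature vector $\psi$. The sketch below is illustrative only. The frequency grid $f_1, \dots, f_M$ is left to the caller because the dissertation does not list it here, and the function name is hypothetical.

```python
import cmath
import math


def psd_feature_vector(a, sigma2, freqs_hz, ts=0.5e-3):
    """psi = [phi_f(f_1), ..., phi_f(f_M)] from AR parameters (Eqs. 6-31, 6-33, 6-35).

    a: AR coefficients [a_1, ..., a_p]; sigma2: noise power sigma_p^2.
    freqs_hz: caller-chosen increasing frequencies with 0 < f_m < F_s / 2, F_s = 1 / ts.
    """
    fs = 1.0 / ts
    if not all(0 < f < fs / 2 for f in freqs_hz):
        raise ValueError("frequencies must lie in (0, F_s/2)")
    psi = []
    for f in freqs_hz:
        omega = 2 * math.pi * f / fs  # Eq. (6-33): Hz -> rad/sample
        a_of_w = 1 + sum(a_t * cmath.exp(-1j * omega * t) for t, a_t in enumerate(a, start=1))
        psi.append(sigma2 / abs(a_of_w) ** 2)  # Eq. (6-31)
    return psi
```

With $T_s = 0.5$ ms the admissible band is $0 < f < 1000$ Hz, so a voice periodicity near 33 Hz falls well inside it.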

In the next section, we introduce a new technique that we use to translate the characteristics of these high-dimensional feature vectors into a more tractable low-dimensional space.

6.3 Subspace Decomposition and Bases Identification on PSD Features

In many scientific and engineering problems, the data of interest can be viewed as drawn from a mixture of geometric or statistical models instead of a single one. Such data are often referred to in different contexts as "mixed," "multi-modal," "multi-model," "heterogeneous," or "hybrid." Subspace decomposition is a general method for modeling and segmenting such mixed data using a collection of subspaces, also known in mathematics as a subspace arrangement. By introducing certain new algebraic models and techniques into data clustering, traditionally a statistical problem, the subspace decomposition methodology offers a new spectrum of algorithms for data modeling and clustering that are in many aspects more efficient and effective than (or complementary to) traditional methods, e.g., principal component analysis (PCA), Expectation Maximization (EM), and K-Means clustering.









As illustrated in Figure 6-1, we collect voice and video training flows during the training phase. After processing the raw packet data through the PSD feature extraction module, one obtains two sets of feature vectors,

$$\Psi^{(1)} = \{\psi^{(1)}(1), \psi^{(1)}(2), \dots, \psi^{(1)}(N_1)\}, \tag{6-36}$$

which is obtained from the voice training data, where $N_1$ is the number of voice flows; and

$$\Psi^{(2)} = \{\psi^{(2)}(1), \psi^{(2)}(2), \dots, \psi^{(2)}(N_2)\}, \tag{6-37}$$

which is obtained from the video training data, where $N_2$ is the number of video flows.

To facilitate further discussion, let us also regard $\Psi^{(i)}$ as an $M \times N_i$ matrix, for $i = 1, 2$, where each column is a feature vector. In other words, $\Psi^{(i)} \in \mathbb{R}^{M \times N_i}$.

In this section, we present techniques to identify the low-dimensional subspaces embedded in $\mathbb{R}^M$ for both $\Psi^{(1)}$ and $\Psi^{(2)}$.

There are many low-dimensional subspace identification schemes, such as Principal Components Analysis (PCA) [61] and Metric Multidimensional Scaling (MDS) [62], which identify linear structure, and ISOMAP [63] and Locally Linear Embedding (LLE) [64], which identify non-linear structure.

Unfortunately, all these methods assume that the data are embedded in one single low-dimensional subspace. This assumption does not always hold. For example, as different software uses different voice coding, it is more reasonable to assume that the PSD feature vector of voice traffic is a random vector generated from a mixture model than from a single model. In such a case, the feature vectors are likely to be embedded in several subspaces. The same holds for video feature vectors.

As a result, a better scheme is, first, to cluster the training feature vectors into several groups, known as subspace decomposition, and second, to identify the subspace structure of each group, known as subspace bases identification. We describe the two steps in the following sections.









6.3.1 Subspace Decomposition Based on Minimum Coding Length

The purpose of subspace decomposition is to partition the data set

$$\Psi = \{\psi(1), \psi(2), \dots, \psi(N)\} \tag{6-38}$$

into $K$ non-overlapping subsets such that

$$\Psi = \Psi_1 \cup \Psi_2 \cup \cdots \cup \Psi_K. \tag{6-39}$$

A method to decompose subspaces according to the minimum coding length criterion was proposed in [65]. The idea is to view the data segmentation problem from the perspective of data coding/compression.

Suppose one wants to find a coding scheme, $C$, which maps the data in $\Psi \in \mathbb{R}^{M \times N}$ to a bit sequence. As all elements are real numbers, an infinitely long bit sequence would be needed to decode without error. Hence, one has to specify a tolerable decoding error, $\epsilon$, to obtain a mapping with finite coding length, i.e.,

$$\left\| \psi(n) - C^{-1}\left(C(\psi(n))\right) \right\|^2 \le \epsilon^2, \quad \text{for } \forall n = 1, \dots, N. \tag{6-40}$$

Then the coding length of the coding scheme $C$ is a function

$$L_c: \mathbb{R}^{M \times N} \to \mathbb{Z}_+. \tag{6-41}$$

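It is proven [65] that the coding length is upper bounded by

$$L_c(\Psi) \le L(\Psi) = \frac{N + M}{2} \log_2 \det\left(I + \frac{M}{\epsilon^2 N} \Phi \Phi^T\right) + \frac{M}{2} \log_2\left(1 + \frac{\mu^T \mu}{\epsilon^2}\right), \tag{6-42}$$

where

$$\mu = \frac{1}{N} \sum_{n=1}^{N} \psi(n), \tag{6-43}$$

$$\Phi = [\psi(1) - \mu, \dots, \psi(N) - \mu]. \tag{6-44}$$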









The optimal partition (see Equation (6-39)) in terms of the minimum coding length criterion should minimize the coding length of the segmented data, i.e.,

$$\min_{\Pi} L_c^s(\Psi; \Pi) = \min_{\Pi} \sum_{k=1}^{K} \left[ L_c(\Psi_k) - |\Psi_k| \log_2 \frac{|\Psi_k|}{N} \right], \tag{6-45}$$

where $\Pi$ denotes the partition scheme. The first term in Equation (6-45) is the sum of the coding lengths of the groups, and the second is the number of bits needed to encode the membership of each item of $\Psi$ in the $K$ groups.

The optimal partition is sought in the following way. Let the segmentation scheme be represented by the membership matrices

$$\Pi_k = \text{diag}\left([\pi_{1k}, \pi_{2k}, \dots, \pi_{Nk}]\right) \in \mathbb{R}^{N \times N}, \quad k = 1, \dots, K, \tag{6-46}$$

where $\pi_{nk}$ denotes the probability that vector $\psi(n)$ belongs to subset $k$, such that

$$\sum_{k=1}^{K} \pi_{nk} = 1, \quad \text{for } \forall n = 1, \dots, N, \tag{6-47}$$

and $\text{diag}(\cdot)$ converts a vector into a diagonal matrix.

It is proved in [65, page 34] that the coding length is bounded as follows:

$$L_c^s(\Psi; \Pi) \le \sum_{k=1}^{K} \left[ \frac{\text{tr}(\Pi_k) + M}{2} \log_2 \det\left(I + \frac{M}{\text{tr}(\Pi_k)\, \epsilon^2} \Psi \Pi_k \Psi^T\right) - \text{tr}(\Pi_k) \log_2 \frac{\text{tr}(\Pi_k)}{N} \right] \triangleq \hat{L}(\Psi; \Pi), \tag{6-48}$$

where $\text{tr}(\cdot)$ denotes the trace of a matrix and $\det(\cdot)$ denotes the matrix determinant.

Combining Equations (6-45) and (6-48), one obtains a minimax criterion

$$\hat{\Pi} = \arg\min_{\Pi} \left[ \max_{C} L_c^s(\Psi; \Pi) \right] = \arg\min_{\Pi} \hat{L}(\Psi; \Pi). \tag{6-49}$$
n c n -









Then each $\psi(n) \in \Psi$ is assigned to $\Psi_{\hat{k}}$ after segmentation if and only if

$$\hat{k} = \arg\max_{k} \pi_{nk}. \tag{6-50}$$


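1. function MCLPartition(...)
2. Argument 1: set of feature vectors, $\Psi = \{\psi(1), \dots, \psi(N)\}$
3. Return: partition of $\Psi$, $\Pi = \{\pi_1, \dots, \pi_K;\ \pi_1 \cup \cdots \cup \pi_K = \Psi,\ \pi_i \cap \pi_j = \emptyset \text{ for } \forall i \ne j\}$
4. Initialization: $\Pi = \{\{\psi(1)\}, \{\psi(2)\}, \dots, \{\psi(N)\}\}$
5. while true do
6.    $(\pi_1^*, \pi_2^*) = \arg\min_{\pi_1 \in \Pi,\ \pi_2 \in \Pi,\ \pi_1 \ne \pi_2} \left[ L_c^s(\pi_1 \cup \pi_2) - L_c^s(\pi_1, \pi_2) \right]$ (6-51)
7.    if $L_c^s(\pi_1^* \cup \pi_2^*) - L_c^s(\pi_1^*, \pi_2^*) > 0$ then
8.       break
9.    else
10.      $\Pi = (\Pi \setminus \{\pi_1^*, \pi_2^*\}) \cup \{\pi_1^* \cup \pi_2^*\}$
11.   end if
12. end while
13. return $\Pi$

Figure 6-5. Pairwise steepest descent method to achieve minimal coding length.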

There is no closed-form solution for Equation (6-49). A pairwise steepest descent method was proposed in [65, page 41] to solve it (Figure 6-5). It works bottom-up. It starts with a partition scheme that assigns each element of $\Psi$ to its own subset. Then, at each iteration, the algorithm finds the two subsets of feature vectors whose merger decreases the coding length the most (Equation (6-51)). This procedure stops when no further decrease of the coding length can be achieved by merging any two subsets.
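The pairwise merging of Figure 6-5 can be sketched in Python as follows. To keep the sketch short and self-contained, the group coding length $L_c(\cdot)$ is passed in as a callable, for instance an implementation of the bound in Equation (6-42). The merge cost adds the membership bits of Equation (6-45), $|\pi| \log_2(N/|\pi|)$, for each group. The names `mcl_partition` and `group_length` are hypothetical. As in line 7 of Figure 6-5, merging stops only when the best merge would increase the total length.

```python
import math
from itertools import combinations


def mcl_partition(items, group_length):
    """Greedy bottom-up merging of Figure 6-5.

    group_length: caller-supplied callable returning L_c(group) in bits (assumption).
    Returns a list of groups (lists of items).
    """
    n = len(items)

    def cost(group):
        # Coding length of the group plus its membership bits, Eq. (6-45).
        return group_length(group) + len(group) * math.log2(n / len(group))

    parts = [[x] for x in items]  # line 4: every item starts in its own subset
    while len(parts) > 1:
        # line 6: the pair whose merger changes the total coding length the least
        delta, i, j = min(
            (cost(parts[i] + parts[j]) - cost(parts[i]) - cost(parts[j]), i, j)
            for i, j in combinations(range(len(parts)), 2)
        )
        if delta > 0:  # line 7: no merge shortens the code any more
            break
        merged = parts[i] + parts[j]
        parts = [p for k, p in enumerate(parts) if k not in (i, j)] + [merged]
    return parts
```

With a toy length such as `lambda g: 8 + 10 * (max(g) - min(g))` on the scalars `[0.0, 0.1, 10.0, 10.1]`, the two tight pairs are merged and the final merge is rejected.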

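Using the above method, we obtain a partition of the voice feature vector set $\Psi^{(1)}$,

$$\Psi^{(1)} = \Psi_1^{(1)} \cup \Psi_2^{(1)} \cup \cdots \cup \Psi_{K_1}^{(1)}, \tag{6-52}$$

and a partition of the video feature vector set $\Psi^{(2)}$,

$$\Psi^{(2)} = \Psi_1^{(2)} \cup \Psi_2^{(2)} \cup \cdots \cup \Psi_{K_2}^{(2)}. \tag{6-53}$$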


Next, we describe the method to identify subspace bases in each of the segmentations.

6.3.2 Subspace Bases Identification

In this section, we use the PCA algorithm [61] to identify the subspace bases of each segment,

$$\{\Psi_k^{(i)};\ k = 1, \dots, K_i,\ i = 1, 2\}, \tag{6-54}$$

obtained in the previous section. The basic idea is to identify uncorrelated bases and choose those with dominant energy. Figure 6-6 shows the algorithm.

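1. function [$\mu$, $\hat{U}$, $\hat{\Sigma}$, $\bar{U}$, $\bar{\Sigma}$] = IdentifyBases($\Psi \in \mathbb{R}^{M \times N}$, $\delta$)
2. $\mu = \frac{1}{N} \sum_{n=1}^{N} \psi(n)$
3. $\Phi = [\psi(1) - \mu, \dots, \psi(N) - \mu]$
4. Do eigenvalue decomposition on $\Phi \Phi^T$ such that
   $$\Phi \Phi^T = U \Sigma U^T, \tag{6-55}$$
   where $U = [u_1, \dots, u_M]$, $\Sigma = \text{diag}([\sigma_1^2, \dots, \sigma_M^2])$, and $\sigma_1^2 \ge \sigma_2^2 \ge \cdots \ge \sigma_M^2$.
5. $j = \min\left\{ j' : \sum_{m=1}^{j'-1} \sigma_m^2 \ge \delta \sum_{m=1}^{M} \sigma_m^2 \right\}$
6. $\hat{U} = [u_1, u_2, \dots, u_{j-1}]$
7. $\bar{U} = [u_j, u_{j+1}, \dots, u_M]$
8. $\hat{\Sigma} = \text{diag}([\sigma_1^2, \dots, \sigma_{j-1}^2])$
9. $\bar{\Sigma} = \text{diag}([\sigma_j^2, \dots, \sigma_M^2])$
10. end function

Figure 6-6. Function IdentifyBases identifies the bases of a subspace.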


In Figure 6-6, argument $\Psi$ represents the feature vector set of one segment, and $\delta$ is a user-defined parameter that specifies the percentage of energy retained, e.g., 90% or 95%. The algorithm returns five variables. $\mu$ is the sample mean of all feature vectors; it is the origin of the identified subspace. The columns of $\hat{U}$ are the bases with dominant energy (i.e., variance), whose corresponding variances are given by $\hat{\Sigma}$. These bases span the identified low-dimensional subspace of $\Psi$. The columns of $\bar{U}$ span the orthogonal complement of this subspace, with corresponding variances $\bar{\Sigma}$. The last two outputs are required to calculate the distance of an ongoing feature vector to the subspace, as described in Section 6.4.
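Lines 5 to 9 of Figure 6-6, the energy-based split, are sketched below. The sketch assumes the eigenvalues are already sorted in decreasing order with matching eigenvectors from any symmetric eigensolver. The eigendecomposition itself is omitted, and the function name is hypothetical. The retained bases form the shortest leading run whose energy reaches the fraction $\delta$ of the total.

```python
def split_bases(eigvals, eigvecs, delta):
    """Energy split of Figure 6-6, lines 5-9 (eigendecomposition assumed done elsewhere).

    eigvals: sigma_1^2 >= ... >= sigma_M^2; eigvecs[m] pairs with eigvals[m].
    Returns (U_hat, Sigma_hat, U_bar, Sigma_bar) as lists.
    """
    total = sum(eigvals)
    kept = 0       # number of retained bases (j - 1 in Figure 6-6)
    energy = 0.0
    while kept < len(eigvals) and energy < delta * total:
        energy += eigvals[kept]
        kept += 1
    return eigvecs[:kept], eigvals[:kept], eigvecs[kept:], eigvals[kept:]
```

For eigenvalues `[6.0, 3.0, 1.0]` and `delta=0.85`, the first two bases are kept and the third goes to $\bar{U}$, since 9 of the total 10 units of energy already exceed 85%.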

Applying the function IdentifyBases to all segments, we obtain

$$\left[\mu_k^{(i)}, \hat{U}_k^{(i)}, \hat{\Sigma}_k^{(i)}, \bar{U}_k^{(i)}, \bar{\Sigma}_k^{(i)}\right] = \text{IdentifyBases}\left(\Psi_k^{(i)}, \delta\right) \tag{6-56}$$

for $\forall k = 1, \dots, K_i$, $i = 1, 2$. These are the outputs of the subspace identification module and hence the results of the training phase in Figure 6-1.

During the classification phase, these outputs are used as system parameters, as presented in the next section.

6.4 Voice/Video Classifier

In Section 6.3, we presented an approach to identify the subspaces spanned by the PSD feature vectors of training voice and video flows. Specifically, one obtains the following parameters:

$$\left[\mu_k^{(i)}, \hat{U}_k^{(i)}, \hat{\Sigma}_k^{(i)}, \bar{U}_k^{(i)}, \bar{\Sigma}_k^{(i)}\right] \tag{6-57}$$

for $\forall k = 1, \dots, K_i$, $i = 1, 2$. In this section, we use these parameters to perform classification.

During the classification phase, for each ongoing flow $\mathcal{F}$, one composes a sub-flow, $\mathcal{F}_s$, by extracting small packets, i.e., packets smaller than $\theta_p$, and passes it through the PSD feature extraction module to generate the PSD feature vector $\psi$. This is the input to the voice/video classifier.

The voice/video classifier works as follows. It first calculates the normalized distances between $\psi$ and all subspaces of both categories. Then it takes the minimum distance for each category. The decision is made by comparing the two distance values with two thresholds, $\theta_A$ and $\theta_V$, for voice and video, respectively. Figure 6-7 shows the procedure of the voice/video classifier.

1. function type = VoiceVideoClassify($\psi$, $\theta_A$, $\theta_V$)
2. For $\forall i = 1, 2$, $\forall k = 1, \dots, K_i$,
   $d_k^{(i)} = \text{NormalizedDistance}\left(\psi, \mu_k^{(i)}, \bar{U}_k^{(i)}, \bar{\Sigma}_k^{(i)}\right)$
3. For $\forall i = 1, 2$,
   $d_i = \min_k d_k^{(i)}$
4. if $d_1 < \theta_A$ and $d_2 > \theta_V$
5.    type = VOICE.
6. else if $d_1 > \theta_A$ and $d_2 < \theta_V$
7.    type = VIDEO.
8. else
9.    type = "DON'T KNOW", i.e., neither voice nor video.
10. end if
11. end function
12. function d = NormalizedDistance($\psi$, $\mu$, $\bar{U}$, $\bar{\Sigma}$)
13.    $d = (\psi - \mu)^T \bar{U} \bar{\Sigma}^{-1} \bar{U}^T (\psi - \mu)$
14. end function

Figure 6-7. Function VoiceVideoClassify determines whether a flow with PSD feature vector $\psi$ is of type voice, video, or neither. $\theta_A$ and $\theta_V$ are two user-specified threshold arguments. Function VoiceVideoClassify uses Function NormalizedDistance to calculate the normalized distance between a feature vector and a subspace.


Note that in Function VoiceVideoClassify, line 7, when we detect the flow type to be video, the flow may also carry voice traffic. The reason is discussed in Section 6.1.3.

From lines 2 and 13 in Figure 6-7, the time complexity of function VoiceVideoClassify is

$$O\left((K_1 + K_2) M^2\right). \tag{6-58}$$
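Figure 6-7 can be sketched in Python as below. This is an illustrative sketch with hypothetical names, not the dissertation's code. It assumes $\bar{U}$ is given as a list of orthonormal basis vectors (its columns) and $\bar{\Sigma}$ as a list of strictly positive variances, so line 13 becomes a sum of squared projections, each divided by its variance. The toy 2-D subspaces in the usage note are made up for illustration.

```python
def normalized_distance(psi, mu, ubar, sbar):
    """Line 13 of Figure 6-7: (psi - mu)^T Ubar Sbar^{-1} Ubar^T (psi - mu).

    ubar: orthonormal basis vectors (columns of Ubar); sbar: their variances, assumed > 0.
    """
    diff = [p - m for p, m in zip(psi, mu)]
    total = 0.0
    for u, var in zip(ubar, sbar):
        proj = sum(ui * di for ui, di in zip(u, diff))  # component of psi - mu along u
        total += proj * proj / var
    return total


def voice_video_classify(psi, voice_subspaces, video_subspaces, theta_a, theta_v):
    """Lines 1-11 of Figure 6-7; each subspace is a (mu, ubar, sbar) tuple."""
    d1 = min(normalized_distance(psi, *s) for s in voice_subspaces)
    d2 = min(normalized_distance(psi, *s) for s in video_subspaces)
    if d1 < theta_a and d2 > theta_v:
        return "VOICE"
    if d1 > theta_a and d2 < theta_v:
        return "VIDEO"
    return "DON'T KNOW"
```

With a voice subspace `([0.0, 0.0], [[0.0, 1.0]], [1.0])`, a video subspace `([0.0, 10.0], [[0.0, 1.0]], [1.0])`, and both thresholds set to 1.0, the vector `[5.0, 0.5]` is labelled `"VOICE"` and `[0.0, 5.0]`, equally far from both, is labelled `"DON'T KNOW"`.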









6.5 Experiment Results

In this section, we present the results of applying the system shown in Figure 6-1 to network traffic classification. Before that, we first describe the experiment settings in Section 6.5.1.

6.5.1 Experiment Settings

We perform four sets of experiments. In Section 6.5.2, two sets of experiments are conducted on traffic generated by Skype. In Section 6.5.3, the other two sets of experiments are conducted on traffic generated by Skype, MSN, and GTalk.

For each set of experiments, we use the Receiver Operating Characteristics (ROC)

(< iL -[2! page 107] as the performance metric. ROC curve is a curve of detection

probability, $P_D$, versus false alarm probability, $P_{FA}$, where

$$P_{D|H} = P(\text{the estimated state of nature is } H \mid \text{the true state of nature is } H), \tag{6-59}$$

$$P_{FA|H} = P(\text{the estimated state of nature is } H \mid \text{the true state of nature is not } H), \tag{6-60}$$

and $H$ can be voice, video, file+voice, or file+video. By tuning the parameters $\theta_p$, $\theta_A$, and $\theta_V$ (see Figure 6-7), one can generate an ROC curve.
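Equations (6-59) and (6-60) can be estimated empirically from labelled test flows. The sketch below sweeps a single distance threshold for one class $H$ and declares $H$ whenever a flow's distance to the class-$H$ subspaces falls below that threshold. This is a simplification of the joint $\theta_A$/$\theta_V$ rule in Figure 6-7, and it assumes both positive and negative examples are present. The function name and inputs are hypothetical.

```python
def roc_points(distances, is_target, thresholds):
    """Empirical (P_FA, P_D) pairs, one per threshold (Eqs. 6-59 and 6-60).

    distances: per-flow distance to class H (smaller = more H-like).
    is_target: True if the flow's true state of nature is H.
    """
    pos = sum(1 for t in is_target if t)
    neg = len(is_target) - pos
    points = []
    for th in thresholds:
        declared = [d < th for d in distances]  # estimated state of nature is H
        tp = sum(1 for dec, t in zip(declared, is_target) if dec and t)
        fp = sum(1 for dec, t in zip(declared, is_target) if dec and not t)
        points.append((fp / neg, tp / pos))
    return points
```

Plotting the returned pairs with $P_{FA}$ on the horizontal axis and $P_D$ on the vertical axis gives one ROC curve.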

During the experiments, we collected network traffic from three applications, i.e., Skype, MSN, and GTalk. For each application, traffic was collected in two scenarios. In the first scenario, two lab computers located at University A and University B, respectively, communicated with each other over a direct connection between the two peers. In the second scenario, we used a firewall to block the direct connection between the two peers so that the application was forced to use relay nodes.

For classification, we used the first 10 seconds of each flow, i.e., $A_{max} < 10$ seconds. We set $T_s = 0.5$ milliseconds. Hence, $I_d = 20{,}000$.











6.5.2 Skype Flow Classification

In this section, we conduct experiments on Skype traffic. We first consider the scenario in which each Skype flow carries one type of traffic. In other words, in this set of experiments, each flow is of type VOICE, VIDEO, or none of the above.

Figure 6-8 shows the ROC curves of classifying voice and video flows.


[Figure: ROC curves, $P_D$ versus $P_{FA}$; panels (a) and (b).]

Figure 6-8. The ROC curves of single-typed flows generated by Skype, (a) VOICE and (b)
VIDEO.


We then conduct experiments on hybrid Skype flows. In other words, each flow may be of type VOICE, VIDEO, FILE+VOICE, FILE+VIDEO, or none of the above. Figure 6-9 plots the ROC curves of the VOICE, VIDEO, FILE+VOICE, and FILE+VIDEO types, respectively.

6.5.3 General Flow Classification

Now, we conduct the same experiments on network traffic generated by Skype, MSN, and GTalk, as these are very common applications at present. In other words, a voice flow can now be a VoIP flow generated by Skype, MSN, or GTalk, and likewise for video flows. Note that GTalk does not support video conferencing. Similarly, two sets of experiments are conducted, one on single-typed flows and the other on hybrid flows.















[Figure: ROC curves, $P_D$ versus $P_{FA}$; panels (a) to (d).]


Figure 6-9. The ROC curves of hybrid flows generated by Skype, (a) VOICE, (b) VIDEO,
(c) FILE+VOICE, and (d) FILE+VIDEO.



Similar to Section 6.5.2, we first consider the scenario in which each flow carries one type of traffic. By tuning the thresholds $\theta_p$, $\theta_A$, and $\theta_V$, we generate the ROC curves of classifying voice and video flows (Figure 6-10).

We then conduct experiments on hybrid flows. Figure 6-11 shows the ROC curves of classifying VOICE, VIDEO, FILE+VOICE, and FILE+VIDEO flows.


6.5.4 Discussion


To better understand Figures 6-8, 6-9, 6-10, and 6-11, we show some typical pairs of $P_D$ and $P_{FA}$ values in Table 6-1. One can observe the following phenomena from Table 6-1.













[Figure: ROC curves, $P_D$ versus $P_{FA}$; panels (a) and (b).]

Figure 6-10. The ROC curves of single-typed flows generated by Skype, MSN, and GTalk:
(a) VOICE and (b) VIDEO.

Table 6-1: Typical $P_D$ and $P_{FA}$ values, listed as $P_{FA}$ ($P_D$).

              Skype                      Skype+MSN+GTalk
              Single       Hybrid        Single       Hybrid
    VOICE     0 (1)        0 (1)         0 (.995)     .002 (.986)
    VIDEO     0 (.993)     0 (.965)      0 (.952)     0 (.948)


Voice flows vs. video flows. From Table 6-1, one notes that classification of VOICE traffic is more accurate than that of VIDEO. Specifically, we achieve 100% accurate classification of Skype voice flows. This is because voice traffic has higher regularity than video does (Figure 5-8 and Figure 5-9). In the voice flows, one can immediately identify the dominant periodic component at 33 Hz, which corresponds to the 30-millisecond IPD of the employed voice coding. On the other hand, the video PSDs peak at 0 Hz, which means that the non-periodic component dominates in video flows. One can also see that the PSDs of the two video flows are close to each other. This is why our approach achieves high classification accuracy by using PSD features.
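As a worked check linking the 33 Hz peak to Equation (6-33) and the sampling interval $T_s = 0.5$ ms of Section 6.5.1, a 30 ms packet period gives

$$f_0 = \frac{1}{30\ \text{ms}} \approx 33.3\ \text{Hz}, \qquad \omega_0 = \frac{2\pi f_0}{F_s} = 2\pi f_0 T_s = \frac{2\pi \times 0.5\ \text{ms}}{30\ \text{ms}} = \frac{2\pi}{60} \approx 0.105\ \text{rad/sample},$$

so the voice peak lies well inside the band $0 < f < F_s/2 = 1000$ Hz from which the feature frequencies of Equation (6-34) are drawn.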

Single-typed flows vs. hybrid flows. From Table 6-1, one can see that the

classification of single-typed flows is more accurate than that of hybrid flows. Mixing

multiple types of traffic together is like increasing noise. Hence, it is not surprising that

classification accuracy is reduced.















[Figure: ROC curves, $P_D$ versus $P_{FA}$; panels (a) to (d).]

Figure 6-11. The ROC curves of hybrid flows generated by Skype, MSN, and GTalk: (a) VOICE, (b) VIDEO, (c) FILE+VOICE, and (d) FILE+VIDEO.



One application vs. multiple applications. One further notes that classification of Skype flows is more accurate than that of flows generated by multiple applications, i.e., Skype, MSN, and GTalk.

Empirically, we found that Skype flows are similar to GTalk flows but quite different from MSN flows. For example, both Skype and GTalk voice flows have an inter-arrival time of about 33 milliseconds, whereas MSN voice flows have an inter-arrival time of approximately 25 milliseconds.









When these flows are mixed together, classification accuracy is reduced, but the reduction is acceptable. Specifically, for hybrid voice traffic at $P_{FA} \approx 0$, $P_D$ drops from 1 to 0.986; for hybrid video traffic, it drops from 0.965 to 0.948.

This shows that our approach is robust. The robustness results from the fact that the subspace identification module presented in Section 6.3 decomposes the original high-dimensional feature space into multiple subspaces. As a result, the PSD feature vectors of Skype and GTalk are likely to lie in different subspaces from those of MSN. Therefore, we can still classify traffic accurately.

6.6 Summary

In this chapter, we describe the VOVClassifier system for network traffic classification. VOVClassifier is composed of four components: the feature summary generator, the feature extractor, the subspace generator, and the voice/video classifier. The novelties of VOVClassifier are

* modeling a network flow by a stochastic process;

* estimating PSD feature vector to extract the regularities residing in voice and video
traffic;

* decomposing subspaces of training feature vectors followed by bases identification;

* using minimum distance to subspace as the similarity metric to perform classification.

Experiment results demonstrate the effectiveness and robustness of our approach.

Specifically, we show that classification of voice traffic is more accurate than that of video

traffic; classification of single-typed flows is more accurate than that of hybrid flows;

and classification of pure Skype flows is more accurate than that of flows generated from

multiple applications (e.g., Skype, MSN, and GTalk).









CHAPTER 7
CONCLUSION AND FUTURE WORK

7.1 Summary of Network Centric Anomaly Detection

In the first part of our study, we presented our work on network centric anomaly detection. We first proposed a novel edge-router based framework to robustly and efficiently detect network anomalies at the point where they first enter the network. The key idea is to exploit the spatial and temporal correlation of abnormal traffic among edge routers. The framework consists of three types of components (i.e., traffic monitors, local analyzers, and a global analyzer). Traffic monitors summarize traffic information on each single link between an edge router and a user subnet. Local analyzers, located on edge routers, collect the information provided by traffic monitors and report to the global analyzer. The global analyzer has a global view of the whole autonomous system and makes the final decision. The advantages of our framework design are the following.

1. It is deployed on edge routers instead of end-user systems, so that it can detect network anomalies at the point where they first enter an AS;

2. It places no burden on core routers;

3. It is flexible in that network anomalies can be detected both locally and globally;

4. It is capable of accurately detecting low-volume network anomalies by exploiting spatial correlations among edge routers.
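The fourth advantage can be illustrated with a back-of-the-envelope calculation. The Python sketch below is not the detection scheme of this study; it assumes, hypothetically, that per-link packet counts in one measurement interval are independent Poisson variables (so the standard deviation equals the square root of the mean), and it compares the z-score of a small per-link excess seen on one link with the z-score of the same excess summed over many links.

```python
import math


def z_score(excess, baseline_mean):
    """Standardized excess of a count over its mean (Poisson approximation)."""
    return excess / math.sqrt(baseline_mean)


def local_vs_global(per_link_mean, per_link_excess, num_links):
    """z-score seen on one link vs. on the aggregate of num_links links.

    Assumes independent links with identical Poisson baselines, so the
    aggregate mean (and hence variance) scales linearly with num_links.
    """
    local = z_score(per_link_excess, per_link_mean)
    aggregate = z_score(per_link_excess * num_links, per_link_mean * num_links)
    return local, aggregate


# Hypothetical numbers: 10,000 packets per interval per link, 1% excess, 25 links.
local, aggregate = local_vs_global(10_000, 100, 25)
print(local, aggregate)  # 1.0 5.0
```

Because the aggregate z-score grows with the square root of the number of links, an excess that is buried in noise on any single link can stand out at a component that observes many links, which is the intuition behind exploiting spatial correlation.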

We then presented feature extraction for network anomaly detection. Based on the framework, we designed a hierarchical feature extraction architecture in which different components extract different features. For example, traffic monitors can extract features such as packet rate, data rate, and SYN/FIN ratio. Local analyzers can extract features such as SYN/SYN-ACK ratio, round-trip time, and two-way matching features on one edge router. The global analyzer can extract two-way matching features from the whole autonomous system. Specifically, we focused on a novel type of feature that we proposed, the two-way matching features. This type of feature uses both the temporal and spatial information carried in network traffic, and it is a very effective indicator of network anomalies associated with spoofed source IP addresses. We designed a novel data structure, referred to as the Bloom filter array (BFA), to efficiently extract two-way matching features. Unlike existing approaches, our data structure has the following properties: 1) a dynamic Bloom filter, 2) combination of a sliding window with the Bloom filter, and 3) the use of insertion-removal pairs to enhance the Bloom filter with a removal operation. Our analysis and simulation demonstrate that the proposed data structure has a better space/time trade-off than conventional algorithms.
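The following Python sketch illustrates two of the ingredients named above, a Bloom filter and a sliding window, applied to two-way matching. It is not the BFA itself: it slides the window by clearing the oldest of `num_filters` Bloom filters in round-robin order rather than using insertion-removal pairs, it derives hash positions from SHA-256 instead of the hash functions of the original design, and all class, method, and parameter names are hypothetical. An outbound packet is recorded under a directional key; an inbound packet is considered matched if its reversed key is present in any filter of the window.

```python
import hashlib


class SlidingBloomWindow:
    """Round-robin ring of Bloom filters; each filter covers one time slot."""

    def __init__(self, num_filters=4, bits_per_filter=1 << 16, num_hashes=3):
        # SHA-256 yields 32 bytes, i.e. at most eight 4-byte hash values.
        assert 1 <= num_hashes <= 8 and bits_per_filter % 8 == 0
        self.m = bits_per_filter
        self.k = num_hashes
        self.filters = [bytearray(self.m // 8) for _ in range(num_filters)]
        self.current = 0  # index of the filter receiving new insertions

    def _positions(self, key):
        digest = hashlib.sha256(key.encode()).digest()
        return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.m
                for i in range(self.k)]

    def insert(self, key):
        bits = self.filters[self.current]
        for pos in self._positions(key):
            bits[pos >> 3] |= 1 << (pos & 7)

    def contains(self, key):
        """True if key may have been inserted within the window (false positives possible)."""
        positions = self._positions(key)
        return any(all((bits[p >> 3] >> (p & 7)) & 1 for p in positions)
                   for bits in self.filters)

    def advance(self):
        """Start a new time slot: the oldest filter is cleared and reused."""
        self.current = (self.current + 1) % len(self.filters)
        self.filters[self.current] = bytearray(self.m // 8)


def flow_key(src, dst):
    """Directional key for a (source, destination) endpoint pair."""
    return f"{src}>{dst}"


window = SlidingBloomWindow()
# Outbound SYN from a customer host is recorded ...
window.insert(flow_key("10.0.0.5:40000", "192.0.2.80:80"))
# ... and the inbound SYN-ACK matches it via the reversed key.
inbound_src, inbound_dst = "192.0.2.80:80", "10.0.0.5:40000"
print(window.contains(flow_key(inbound_dst, inbound_src)))  # True
```

In this sketch, the window spans `num_filters` time slots; an outbound packet whose reply never arrives within the window (as would happen for SYNs with spoofed source addresses, whose replies go elsewhere) would be counted as unmatched.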

Finally, we applied machine learning techniques to network anomaly detection. Specifically, we used a Bayesian model to determine the state of each edge router, normal or abnormal. Traditionally, edge routers are regarded as independent, which leaves the model unable to detect low-traffic network anomalies. A straightforward improvement over this independent model is to regard edge routers as dependent on each other. However, this method has exponential time complexity in determining the edge router states, i.e., $O(2^r)$, where $r$ is the number of edge routers. We proposed the hidden Markov tree (HMT) to model the correlations among edge routers. The HMT takes advantage of the dependence among edge routers while having almost linear time complexity, i.e., $O(Ba)$, where $B$ is the number of child nodes of each non-leaf node in the HMT. Our machine learning scheme has the following desirable properties:

* In addition to detecting network anomalies with a high data rate on a single link, our scheme can accurately detect attacks with a low data rate spread over multiple links. This is due to the exploitation of spatial correlations of network anomalies.

* Our scheme is robust against time-varying traffic patterns, owing to powerful machine learning techniques.

* Our scheme can be deployed in large-scale high-speed networks, thanks to the use of the Bloom filter array to efficiently extract features.
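The contrast between exponential and near-linear complexity can be illustrated with a generic sum-product (upward) pass on a tree whose nodes take binary hidden states (0 for normal, 1 for abnormal). This Python sketch is not the HMT parameter estimation or Viterbi decoding of this study; the tree, the root prior, the single transition matrix shared by all edges, and the per-node likelihoods `lik[i][u]` (standing in for the likelihood of node i's feature data given state u) are hypothetical. It checks the linear-time recursion against brute-force enumeration of all 2^N joint states.

```python
from itertools import product


def upward_likelihood(children, root, prior, trans, lik):
    """Total likelihood of all observations via one upward sum-product pass.

    beta(i)[u] = lik[i][u] * prod over children j of sum_v trans[u][v] * beta(j)[v]
    Each node is visited once, so the cost is O(N * S^2) for N nodes, S states.
    """
    def beta(i):
        b = list(lik[i])
        for j in children.get(i, []):
            bj = beta(j)
            for u in (0, 1):
                b[u] *= sum(trans[u][v] * bj[v] for v in (0, 1))
        return b

    b_root = beta(root)
    return sum(prior[u] * b_root[u] for u in (0, 1))


def brute_force_likelihood(children, root, prior, trans, lik):
    """Same quantity by enumerating all 2**N joint states (exponential cost)."""
    nodes = list(lik)
    parent = {c: p for p, cs in children.items() for c in cs}
    total = 0.0
    for states in product((0, 1), repeat=len(nodes)):
        q = dict(zip(nodes, states))
        p = prior[q[root]]
        for i in nodes:
            p *= lik[i][q[i]]
            if i in parent:
                p *= trans[q[parent[i]]][q[i]]
        total += p
    return total


# Hypothetical tree: root "g", two intermediate nodes, four edge routers.
children = {"g": ["a", "b"], "a": ["r1", "r2"], "b": ["r3", "r4"]}
prior = [0.9, 0.1]                    # P(root normal), P(root abnormal)
trans = [[0.95, 0.05], [0.3, 0.7]]    # trans[u][v] = P(child = v | parent = u)
lik = {"g": [1.0, 1.0], "a": [1.0, 1.0], "b": [1.0, 1.0],  # no own observations
       "r1": [0.8, 0.3], "r2": [0.7, 0.4], "r3": [0.2, 0.9], "r4": [0.6, 0.5]}
print(upward_likelihood(children, "g", prior, trans, lik),
      brute_force_likelihood(children, "g", prior, trans, lik))
```

Both functions compute the same quantity; only the upward pass scales to trees with many edge routers.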









Our simulation results show that the proposed framework can detect DDoS attacks even if the volume of attack traffic on each link is extremely small (i.e., 1%). In particular, for the same false alarm probability, our scheme has a detection probability of 0.97, whereas the existing scheme has a detection probability of 0.17, which demonstrates the superior performance of our scheme.
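Comparisons of this kind fix the false alarm probability and compare detection probabilities, i.e., they read off one point of each ROC curve. A minimal Python sketch of that procedure for a detector that outputs one score per test case follows; the score lists are made up, and the rule of flagging scores strictly above a threshold is an assumption.

```python
def detection_at_false_alarm(normal_scores, anomalous_scores, max_pfa):
    """Empirical detection probability under a false alarm budget.

    A case is declared anomalous when its score is strictly above the
    threshold. The threshold is the (k+1)-th largest normal score, where
    k = floor(max_pfa * number of normal cases), so at most k normal
    cases (a fraction <= max_pfa) raise false alarms.
    """
    ranked = sorted(normal_scores, reverse=True)
    k = int(max_pfa * len(ranked) + 1e-9)  # small epsilon guards float round-off
    threshold = ranked[k] if k < len(ranked) else float("-inf")
    pfa = sum(s > threshold for s in normal_scores) / len(normal_scores)
    pd = sum(s > threshold for s in anomalous_scores) / len(anomalous_scores)
    return pd, pfa, threshold


normal = list(range(100))          # hypothetical scores under normal traffic
anomalous = [95, 97, 99, 120, 50]  # hypothetical scores under attack
print(detection_at_false_alarm(normal, anomalous, 0.05))  # (0.8, 0.05, 94)
```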

7.2 Summary of Network Centric Traffic Classification

We then presented our research on network centric traffic classification, specifically, the detection and classification of voice and video data streams.

We first motivated the significance of this problem, pointed out its challenges, and showed the weaknesses of existing solutions. With the emergence of software that uses user-specified or dynamic ports, traffic classification based on TCP and UDP port numbers is no longer valid. Other methods, based on reconstructing session and application information from packet contents, impose significant complexity and processing load on the classification device; in addition, they cannot classify encrypted traffic. An emerging trend in the research community is to approach this problem with pattern classification techniques. However, existing machine learning techniques are not able to distinguish between voice and video traffic. In this research, we also identified a new problem: a single network flow may carry multiple types of sessions; for example, Skype uses one connection to carry voice, video, chat, and file transfer at the same time. This makes traffic classification more difficult. To the best of our knowledge, no existing literature has considered this problem. Our intuition is to exploit the regularities residing in multimedia traffic. We also presented four types of metrics to measure these regularities:

1. packet inter-arrival time and packet size in the time domain;

2. packet inter-arrival time in the frequency domain;

3. packet size in the frequency domain;

4. combined packet inter-arrival time and packet size in the frequency domain.

It turns out that the last one is the most distinctive feature for classifying voice and

video traffic.
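One way to compute the combined metric is to place the packet sizes of a flow on a uniform time grid, so that the resulting sequence carries both timing and size information, and then estimate its power spectral density. The Python sketch below does this with a plain periodogram (a direct DFT, kept simple rather than fast). The 1 ms bin width and the synthetic flow of 160-byte packets every 20 ms (resembling a G.711 stream with 20 ms packetization) are assumptions for illustration, not measured data, and the function names are hypothetical.

```python
import cmath
import math


def binned_size_sequence(packets, duration_ms, bin_ms=1):
    """Sum packet sizes into fixed-width time bins.

    packets: list of (arrival_time_ms, size_bytes) with integer times in
    [0, duration_ms). Returns duration_ms // bin_ms byte counts.
    """
    seq = [0.0] * (duration_ms // bin_ms)
    for t_ms, size in packets:
        seq[t_ms // bin_ms] += size
    return seq


def periodogram(x, fs):
    """Periodogram |DFT|^2 / N of the mean-removed sequence, for 0 <= f <= fs/2."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    freqs, power = [], []
    for k in range(n // 2 + 1):
        s = sum(xc[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k * fs / n)
        power.append(abs(s) ** 2 / n)
    return freqs, power


# Synthetic flow: one 160-byte packet every 20 ms for half a second.
packets = [(20 * i, 160) for i in range(25)]
x = binned_size_sequence(packets, duration_ms=500)
freqs, power = periodogram(x, fs=1000.0)  # 1 ms bins -> 1000 Hz sampling rate
print(freqs[25], round(power[25], 6))  # 50.0 32000.0
```

For such a strictly periodic flow, the power concentrates at 50 Hz and its harmonics, the kind of regularity that PSD features are meant to capture.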

We then presented the VOVClassifier system to classify voice and video traffic. VOVClassifier is composed of four modules that operate in cascade: a flow summary generator, a feature extractor, a voice/video subspace generator, and a voice/video classifier. The novelty of VOVClassifier is that

* we combine the packet inter-arrival times and packet sizes of a network flow and model them as a stochastic process;

* we estimate a PSD feature vector to extract the regularities residing in voice and video traffic;

* we use minimum coding length to decompose the training feature vectors into subspaces, and principal component analysis to identify the bases of each subspace;

* we use the minimum distance to subspaces as the similarity metric to perform classification.
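The PSD estimation step can be sketched as follows. The sketch shows one possible parametric estimator, an autoregressive (AR) model fitted with the Levinson-Durbin recursion, and not necessarily the configuration used in VOVClassifier; the model order, the number of frequency points, and the function names are hypothetical. The returned list of PSD values over normalized frequencies in [0, 0.5] plays the role of the feature vector.

```python
import cmath
import math


def autocorrelation(x, max_lag):
    """Biased sample autocorrelation r[0..max_lag] of the mean-removed sequence."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    return [sum(xc[t] * xc[t + lag] for t in range(n - lag)) / n
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for x[t] + sum_j a[j] x[t-j] = e[t].

    Returns (a, err) with a[0] = 1 and err the prediction error variance.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, m):
            a[j] = prev[j] + k * prev[m - j]
        a[m] = k
        err *= 1.0 - k * k
    return a, err


def ar_psd(a, err, num_points):
    """AR spectrum err / |A(e^{i 2 pi f})|^2 at num_points frequencies in [0, 0.5]."""
    psd = []
    for i in range(num_points):
        f = 0.5 * i / (num_points - 1)
        denom = sum(a[j] * cmath.exp(-2j * math.pi * f * j) for j in range(len(a)))
        psd.append(err / abs(denom) ** 2)
    return psd


# Autocorrelation of an AR(1) process x[t] = 0.5 x[t-1] + e[t] with unit variance.
a, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
print(a, err)             # a is approximately [1, -0.5, 0], err = 0.75
print(ar_psd(a, err, 3))  # approximately [3.0, 0.6, 0.333]
```

In practice, `autocorrelation` would be applied to a flow's binned sequence and the resulting PSD values used as features; the AR(1) example only checks that the recursion recovers known coefficients.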

The experiment results demonstrate the effectiveness and robustness of our approach.









APPENDIX A
PROOFS

A.1 Equation (4-31)


Proof:


P(Q,
UlEfo'll

>: P(Q,


Y p(Qi
U'Efo'll


p(A iu, O)

-T (U).


A.2 Equation (4-32)


Proof:


pG(&Ql = u) n E P(Q,
jEv(i) u'e{O,1}

-p(l I= u) n U P(,j
jEv(i) u'E{O,1}

-p(Ql u) n p(i,| Q= u
jEv(i)


u'I = u) v,(u')


U'II = u) p(T7 I CQ


-p(T,\i Ii = u) = vi(u)


UjIS2,() U') -fp(i)(U,)p(p(Q~i) u,(i)


U I PWz U')p(Qp(i) U', JTAp(< (I)|p(i)


U, QPW UI, ?-~i)


(A-1)


(A-2)







A.3 Equation (4-33)


Proof:
Tr(u)' (u)
E., Ti(u')v (u')
p OQ = U, pv) P (?=v I U)
K,, P (Q U u) p (^ | U)
p Qi = u,
K,, P (j U1,
p [Qi u,
p( )
P(p(i) = u) (A-3)

A.4 Equation (4-34)
Proof:
(, u)Vp(i)( ')P (,, = U, p( ) p R ') p(j(')
[ ,,, Ti(u(u")] [ ,( P (P -(=2 u" Q) = u') v (u")]
P ( i -u)P (TK\. Qp() U') P (Q UQp() U) p (ip()) U',)

p (i =u (T I Qp(i) U') P (pp(j) u', p?\^)) P KW | p(i) U')
[E i=P (i U', )I [YP (Q i I|p(i ) = u)


P Hp (fe |p(i) /)
L p (i = U ", pp() = U', U p(i) Ti, p(i) = U

p ( ui = Qp() = U',


P(Qj = u, Qp(i) = u'\^) (A-4)









REFERENCES


[1] P. Mockapetris, "Domain names concepts and facilities," RFC 1034.

[2] P. Mockapetris, "Domain names implementation and specification," RFC 1035.

[3] "Video on demand," Wikipedia. [Online]. Available: http://en.wikipedia.org/wiki/Video_on_demand

[4] D. Wu, Y. T. Hou, W. Zhu, Y.-Q. Zhang, and J. M. Peha, "Streaming video over
the Internet: Approaches and directions," IEEE Trans. Circuits Syst. Video Technol.,
vol. 11, pp. 282-300, Mar. 2001.

[5] K. Nichols, S. Blake, F. Baker, and D. Black, "Definition of the differentiated services
field (DS field) in the IPv4 and IPv6 headers," RFC 2474.

[6] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, and W. Weiss, "An architecture
for differentiated services," RFC 2475.

[7] P. Almquist, "Type of service in the Internet protocol suite," RFC 1349.

[8] S. S. Kim, A. L. N. Reddy, and M. Vannucci, "Detecting traffic anomalies using
discrete wavelet transform," in Proceedings of International Conference on Information
Networking (ICOIN), vol. III, Busan, Korea, Feb. 2004, pp. 1375-1384.

[9] C.-M. Cheng, H. T. Kung, and K.-S. Tan, "Use of spectral analysis in defense against
DoS attacks," in Proceedings of IEEE Globecom 2002, vol. 3, Taipei, Taiwan, Nov.
2002, pp. 2143-2148.

[10] A. Hussain, J. Heidemann, and C. Papadopoulos, "A framework for classifying denial
of service attacks," in Proceedings of ACM SIGCOMM, Karlsruhe, Germany, Aug.
2003.

[11] H. Wang, D. Zhang, and K. G. Shin, "Detecting SYN flooding attacks," in Proc. IEEE
INFOCOM'02, New York City, NY, June 2002, pp. 1530-1539.

[12] T. Peng, C. Leckie, and K. Ramamohanarao, "Detecting distributed denial of service
attacks using source IP address monitoring," Department of Computer Science and
Software Engineering, The University of Melbourne, Tech. Rep., 2002. [Online].
Available: http://www.cs.mu.oz.au/~tpeng

[13] R. B. Blazek, H. Kim, B. Rozovskii, and A. Tartakovsky, "A novel approach to
detection of 'denial-of-service' attacks via adaptive sequential and batch-sequential
change-point detection methods," in Proc. IEEE Workshop on Information Assurance
and Security, West Point, NY, June 2001, pp. 220-226.

[14] S. Mukkamala and A. H. Sung, "Detecting denial of service attacks using support
vector machines," in Proceedings of IEEE International Conference on Fuzzy Systems,
May 2003.









[15] S. Savage, D. Wetherall, A. Karlin, and T. Anderson, "Practical network support for
IP traceback," in Proc. of ACM SIGCOMM 2000, Aug. 2000.

[16] A. Lakhina, M. Crovella, and C. Diot, "Characterization of network-wide anomalies
in traffic flows," in Proc. ACM SIGCOMM Conference on Internet Measurement '04,
Oct. 2004.

[17] H. Wang, D. Zhang, and K. G. Shin, "Change-point monitoring for the detection
of DoS attacks," IEEE Transactions on Dependable and Secure Computing, vol. 1, no. 4, pp.
193-208, Oct. 2004.

[18] J. Mirkovic and P. Reiher, "A taxonomy of DDoS attacks and DDoS defense mechanisms,"
in Proc. ACM SIGCOMM Computer Communications Review '04, vol. 34, Apr. 2004,
pp. 39-53.

[19] J. B. Postel and J. Reynolds, "File transfer protocol," RFC 959, Oct. 1985. [Online].
Available: http://www.faqs.org/rfcs/rfc959.html

[20] K. Lu, J. Fan, J. Greco, D. Wu, S. Todorovic, and A. Nucci, "A novel anti-DDoS system
for large-scale Internet," in ACM SIGCOMM 2005, Philadelphia, PA, Aug. 2005.

[21] B. H. Bloom, "Space/time trade-offs in hash coding with allowable errors," Commun.
ACM, vol. 13, no. 7, pp. 422-426, July 1970.

[22] L. Fan, P. Cao, J. Almeida, and A. Z. Broder, "Summary cache: A scalable wide-area
web cache sharing protocol," IEEE/ACM Trans. Netw., vol. 8, no. 3, June 2000.

[23] F. Chang, W.-c. Feng, and K. Li, "Approximate caches for packet classification,"
in IEEE INFOCOM 2004, vol. 4, Mar. 2004, pp. 2196-2207.

[24] R. Rivest, "The MD5 message-digest algorithm," RFC 1321, Apr. 1992. [Online].
Available: http://www.faqs.org/rfcs/rfc1321.html

[25] MD5 CRYPTO CORE FAMILY, HDL Design House, 2002. [Online]. Available:
http://www.hdl-dh.com/pdf/hcr_7910.pdf

[26] D. A. Patterson and J. L. Hennessy, Computer Organization and Design: The
Hardware/Software Interface. San Francisco, CA: Morgan Kaufmann, 1998, ch. 5, 6.

[27] "Auckland-IV trace data," 2001. [Online]. Available: http://wand.cs.waikato.ac.nz/wand/wits/auck/4/

[28] L. L. Scharf, Statistical Signal Processing: Detection, Estimation, and Time Series
Analysis. Addison Wesley, 1991.

[29] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed.
Wiley-Interscience, Oct. 2000.

[30] G. Casella and R. L. Berger, Statistical Inference, 2nd ed. Duxbury Press, June 2001.









[31] B. J. Frey, Graphical Models for Machine Learning and Digital Communication.
Cambridge, MA: MIT Press, 1998.

[32] M. C. Nechyba, "Maximum-likelihood estimation for mixture models:
the EM algorithm," 2003, course note. [Online]. Available:
http://mil.ufl.edu/~nechyba/__eel6825.f2003/coursematerials/t4.em_theory/emnotes.pdf

[33] Y. Weiss, "Correctness of local probability propagation in graphical models with
loops," Neural Computation, vol. 12, pp. 1-41, 2000.

[34] J. S. Yedidia, W. T. Freeman, and Y. Weiss, "Generalized belief propagation,"
Advances in Neural Information Processing Systems, vol. 13, pp. 689-695, Dec. 2000.

[35] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible
Inference. Morgan Kaufmann, Sept. 1988.

[36] S. M. Aji and R. J. McEliece, "The generalized distributive law," IEEE Trans. Inform.
Theory, vol. 46, pp. 325-343, Mar. 2000.

[37] T. Richardson, "The geometry of turbo-decoding dynamics," IEEE Trans. Inform.
Theory, vol. 46, pp. 9-23, Jan. 2000.

[38] R. J. McEliece, D. J. C. MacKay, and J.-F. Cheng, "Turbo decoding as an instance
of Pearl's belief propagation algorithm," IEEE J. Select. Areas Commun., vol. 16, pp.
140-152, Feb. 1998.

[39] F. Kschischang and B. Frey, "Iterative decoding of compound codes by probability
propagation in graphical models," IEEE J. Select. Areas Commun., vol. 16, pp.
219-230, Feb. 1998.

[40] A. J. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum
decoding algorithm," IEEE Trans. Inform. Theory, vol. 13, pp. 260-269, Apr. 1967.

[41] G. D. Forney, Jr., "The Viterbi algorithm," in Proceedings of the IEEE, vol. 61,
Mar. 1973, pp. 268-278.

[42] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in
speech recognition," in Proceedings of the IEEE, vol. 77, Feb. 1989, pp. 257-286.

[43] D. Moore, K. Keys, R. Koga, E. Lagache, and kc claffy, "The CoralReef software suite
as a tool for system and network administrators," in USENIX LISA 2001, Dec. 2001.
[Online]. Available: citeseer.ist.psu.edu/moore01coralreef.html

[44] C. Logg, "Characterization of the traffic between SLAC and the Internet," July
2003. [Online]. Available: http://www.slac.stanford.edu/comp/net/slac-netflow/html/SLAC-netflow.html

[45] Internet Assigned Numbers Authority, "Port numbers," Aug. 2006. [Online]. Available:
http://www.iana.org/assignments/port-numbers









[46] T. Karagiannis, A. Broido, N. Brownlee, kc claffy, and M. Faloutsos, "Is P2P dying or
just hiding?" in IEEE Globecom 2004, 2004.

[47] S. Sen, O. Spatscheck, and D. Wang, "Accurate, scalable in-network identification of
P2P traffic using application signatures," in WWW, 2004.

[48] K. Wang, G. Cretu, and S. J. Stolfo, "Anomalous payload-based network intrusion
detection," in 7th International Symposium on Recent Advances in Intrusion
Detection, Sept. 2004, pp. 201-222.

[49] M. Roughan, S. Sen, O. Spatscheck, and N. Duffield, "Class-of-service mapping for QoS:
A statistical signature-based approach to IP traffic classification," in ACM Internet
Measurement Conference, Taormina, Italy, 2004.

[50] T. Karagiannis, K. Papagiannaki, and M. Faloutsos, "BLINC: Multilevel traffic
classification in the dark," in SIGCOMM '05: Proceedings of the 2005 conference on
Applications, technologies, architectures, and protocols for computer communications.
New York, NY, USA: ACM Press, 2005, pp. 229-240.

[51] C. Dewes, A. Wichmann, and A. Feldmann, "An analysis of Internet chat
systems," in IMC '03: Proceedings of the 3rd ACM SIGCOMM conference on Internet
measurement. New York, NY, USA: ACM Press, 2003, pp. 51-64.

[52] D. Wu, T. Hou, and Y.-Q. Zhang, "Transporting real-time video over the Internet:
Challenges and approaches," Proceedings of the IEEE, vol. 88, no. 12, pp. 1855-1875,
Dec. 2000.

[53] ITU-T, "G.711: Pulse code modulation (PCM) of voice frequencies," ITU-T
Recommendation G.711, 1989. [Online]. Available: http://www.itu.int/rec/T-REC-G.711/e

[54] ITU-T, "G.726: 40, 32, 24, 16 kbit/s adaptive differential pulse code
modulation (ADPCM)," ITU-T Recommendation G.726, 1990. [Online]. Available:
http://www.itu.int/rec/T-REC-G.726/e

[55] ITU-T, "G.728: Coding of speech at 16 kbit/s using low-delay code excited
linear prediction," ITU-T Recommendation G.728, 1992. [Online]. Available:
http://www.itu.int/rec/T-REC-G.728/e

[56] ITU-T, "G.729: Coding of speech at 8 kbit/s using conjugate-structure
algebraic-code-excited linear prediction (CS-ACELP)," ITU-T Recommendation G.729,
1996. [Online]. Available: http://www.itu.int/rec/T-REC-G.729/e

[57] ITU-T, "G.723.1: Dual rate speech coder for multimedia communications transmitting
at 5.3 and 6.3 kbit/s," ITU-T Recommendation G.723.1, 2006. [Online]. Available:
http://www.itu.int/rec/T-REC-G.723.1/en

[58] Y. Wang, J. Ostermann, and Y.-Q. Zhang, Video Processing and Communications,
1st ed. Prentice Hall, 2002.









[59] P. Stoica and R. Moses, Spectral Analysis of Signals, 1st ed. Upper Saddle River, NJ:
Prentice Hall, 2005.

[60] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms,
2nd ed. MIT Press, Sept. 2001.

[61] L. I. Smith, "A tutorial on principal components analysis," Feb. 2002.
[Online]. Available: http://www.cs.otago.ac.nz/cosc453/studenttutorials/
principal_components.pdf

[62] K. V. Deun and L. Delbeke, "Multidimensional scaling," University of Leuven.
[Online]. Available: http://www.mathpsyc.uni-bonn.de/doc/delbeke/delbeke.htm

[63] J. B. Tenenbaum, V. de Silva, and J. C. Langford, "A global geometric framework for
nonlinear dimensionality reduction," Science, vol. 290, pp. 2319-2323, Dec. 2000.

[64] S. T. Roweis and L. K. Saul, "Nonlinear dimensionality reduction by locally linear
embedding," Science, vol. 290, pp. 2323-2326, Dec. 2000.

[65] W. Hong, "Hybrid models for representation of imagery data," Ph.D. dissertation,
University of Illinois at Urbana-Champaign, Aug. 2006.









BIOGRAPHICAL SKETCH

Jieyan Fan was born on July 26, 1979 in Shanghai, China. The only child in the family, he grew up mostly in his hometown, graduating from the High School Affiliated to Fudan University in 1997. He earned his B.S. and M.S. in electrical engineering from Shanghai Jiao Tong University, Shanghai, China, in 2001 and 2004, respectively. He is currently a Ph.D. candidate in the Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL. His research interests are network security and pattern classification.

Upon completion of his Ph.D. program, Jieyan will be working at Yahoo! Inc., Sunnyvale, CA.







PAGE 14

1 2 ]hasrisenfromapproximately9,472,000inJanuary1996to394,991,609inJanuary2006.Inaddition,theemergenceofnewapplicationsandprotocols,suchasvoiceoverInternetProtocol(VoIP),pear-to-pear(P2P),andvideoondemand(VoD)[ 3 ],alsoincreasesthecomplexityoftheInternet.Accompanyingthistrendisanincreasingdemandformorereliableandsecureservice.AmajorchallengeforInternetserviceproviders(ISP)istobetterunderstandthenetworkstatebyanalyzingnetworktracinrealtime.ThusISPsareveryinterestedintheproblemofnetworkcentrictracanalysis.Weconsiderthenetworkcentrictracanalysisproblemfromtwoperspectives:1)networkanomalydetectionand2)networkcentrictracclassication.Weintroducethetwoperspectivesinthenexttwosections. 14

PAGE 15

15

PAGE 16

4 ]summarizedtechniquesforQoSprovisionforreal-timestreamsfromthepointofviewofendhosts.Thesetechniquesincludecodingmethods,protocols,andrequirementsonstreamservers.AnothereectivesolutionisfromthepointofviewofnetworkcarriersorISPs.Forexample,ISPscanassigndierentforwardingprioritytodierenttypesofnetworktraconrouters.Thisisthemotivationofdierentiatedservices(DiServ)[ 5 6 ].DiServisamethoddesignedtoguaranteedierentlevelsofQoSfordierentclassesofnetworktrac.Itisachievedbysettingthe\typeofservice"(TOS)[ 7 ]eld,whichhenceisalsocalledDiServcodepoint(DSCP)[ 5 ],intheIPheaderaccordingtotheclassofthenetworkdata,sothatthebetterclassesgethighernumbers.Unfortunately,suchdesignhighlydependsonnetworkprotocols,especiallyproprietaryprotocols,observingDiServregulations.Intheworstcase,ifallprotocolssetTOStothehighestnumber,itisevenworsetoemployDiServmethod.Forthisreason,webelieveaproperDiServschemeshouldbeabletoclassifynetworktraconthey,insteadofrelyingonanytagsinpacketheader.Thus,thedicultyliesinaccurateclassicationofnetworktracinreal-time. 16

… [8], spectral analysis [9, 10], statistical methods [11–13]), and machine learning techniques [14] can be used. To identify the sources of network anomalies, IP traceback [15] is typically used. IP traceback techniques can help contain the attack sources, but they require large-scale deployment of the same IP traceback technique and need modification of existing IP forwarding mechanisms (e.g., IP header processing). This chapter presents our network anomaly detection framework, which belongs to the network-based category. We present our framework design in Section 2.2 and summarize this chapter in Section 2.3.

… (Figure 2-1). It consists of two types of IP routers, i.e., core routers and edge routers.

Figure 2-1. An ISP network architecture.

Core routers interconnect with one another to form a high-speed autonomous system (AS). In contrast, edge routers are responsible for connecting subnets (i.e., customer networks or other ISP networks) with the AS. In this dissertation, a subnet can be either a customer network or an ISP network.

Figure 2-2. Network anomaly detection framework.

Figure 2-3. Responsibilities of and interactions among the traffic monitor, local analyzer, and global analyzer.

Given such an ISP network architecture, we design a framework to detect network anomalies. Our framework (Figure 2-2) consists of three types of components: traffic monitors, local analyzers, and a global analyzer. Figure 2-3 summarizes the functionalities of each type of component and their interactions. Next, we discuss the functionalities of traffic monitors, local analyzers, and the global analyzer in Sections 2.2.1, 2.2.2, and 2.2.3, respectively.

… A traffic monitor (Figure 2-2) is responsible for: scanning some or all packets of a single unidirectional link; summarizing traffic characteristics; extracting simple features from the traffic characteristics; making decisions (e.g., declaring a network anomaly or classifying the type of normal traffic) on that single unidirectional link; and reporting the summary of traffic information, simple feature data, and decisions to a local analyzer.

Figure 2-4. Example of asymmetric traffic whose feature extraction is done by the global analyzer.

The global analyzer has a global view of the whole network. Hence, it exploits both temporal correlation and spatial correlation of traffic. It is important to note that some feature data must be obtained at the global analyzer if global information is required. For example, in Figure 2-4, if the traffic from subnet A to server B passes through edge router X, and the traffic from server B to subnet A passes through edge …

… Peng et al. [12] proposed using the number of new source IP addresses to detect DDoS attacks, under the assumption that the source addresses of IP packets observed at an edge router are more stable under normal conditions than during DDoS attacks. Peng further pointed out that this feature could differentiate DDoS attacks from a flash crowd, i.e., the situation in which many legitimate users start to access one service at the same time, for example, when many people watch a live sports broadcast over the Internet simultaneously. In both cases (DDoS attacks and flash crowds), the traffic rate is high. But during DDoS attacks, the edge routers observe many new source IP addresses because attackers usually spoof the source IP addresses of attack packets to hide their identities. Therefore, this feature improves on DDoS detection schemes that rely on traffic rate only. However, Peng et al. [12] focused on the detection of DDoS attacks and did not address other types of network anomalies. For example, when malicious users are scanning the network, we also observe a high traffic rate but few new source IP addresses. It is very important to differentiate network scanning from a flash crowd because the former is malicious but the latter is not. The two-way matching feature on different network layers (Section 3.3.1) can reveal not only the presence of network anomalies but also their cause. Lakhina et al. [16] summarized the characteristics of network anomalies under different causes; their contribution is to help identify the causes of network anomalies. For example, during DDoS attacks, we observe a high bit rate, a high packet rate, and a high flow rate, and the source addresses are distributed over the whole IP address space. On the other hand, during network scanning, all three rates are high, but the destination addresses, …

Figure 3-1. Hierarchical structure for feature extraction.

To extract features from traffic efficiently, we design a three-level hierarchical structure (Figure 3-1), in which incoming packets are processed by level-one filters, then by level-two filters, and finally by (level-three) feature extraction modules. Level-one filters …

… (Figure 3-1). On the other hand, a feature module may need packets from multiple level-two filters. For example, the SYN/FIN(RST) ratio feature extraction requires packets from three filters (Figure 3-1). Compared to the packet classification schemes developed by Wang et al. [11] and Peng et al. [12], our hierarchical structure for feature extraction is more general and efficient. Next, we describe the most important module in the three-level hierarchical structure, the feature extraction module.

… [11, 12], we generate features in a discrete manner, i.e., our feature extraction module generates a (feature) value or vector at the end of each time slot. Intuitively, a shorter slot duration may reduce the detection delay, which is defined as the interval from the epoch when the anomaly starts to the epoch when the anomaly is detected; but a shorter duration may increase the computational complexity, since the detection algorithm needs to analyze more feature data for the same time interval. On the other hand, if a feature is represented by a ratio, the slot duration must be sufficiently large to avoid division by zero. For example, if we want to use the SYN/FIN(RST) ratio as in Ref. [11] to detect TCP SYN floods, then the slot duration cannot be too small, because the number of FIN packets in a short period can be 0, which results in a false alarm even if the number of SYN packets is not large. Feature extraction can be done in a traffic monitor, a local analyzer, or a global analyzer, which are described in Sections 3.2.2 and 3.2.3.

… [12]. So is the data rate.

Data rate: defined as the total number of bits of all packets that arrive in one time slot.

SYN/FIN(RST) ratio: … [17].
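A minimal sketch of slot-based feature generation, assuming a hypothetical `Packet` record with an arrival time, a length in bytes, and two flag bits. It emits a packet count, the data rate (bits per slot, as defined above), and the SYN/FIN(RST) ratio at the end of each slot, and reports the ratio as `None` instead of dividing by zero when a slot has no FIN or RST packet.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class Packet:
    time: float               # arrival time in seconds
    size: int                 # packet length in bytes
    syn: bool = False         # TCP SYN flag set
    fin_or_rst: bool = False  # TCP FIN or RST flag set


def slot_features(packets: Iterable[Packet], slot_duration: float
                  ) -> Dict[int, Tuple[int, int, Optional[float]]]:
    """Map slot index -> (packet count, bits in the slot, SYN/FIN(RST) ratio).

    The ratio is None when the slot contains no FIN/RST packet, so a short
    slot does not turn a zero denominator into a false alarm.
    """
    counts = defaultdict(lambda: [0, 0, 0, 0])  # packets, bits, SYN, FIN/RST
    for p in packets:
        c = counts[int(p.time // slot_duration)]
        c[0] += 1
        c[1] += 8 * p.size
        c[2] += int(p.syn)
        c[3] += int(p.fin_or_rst)
    return {
        slot: (pkts, bits, syn / fin if fin else None)
        for slot, (pkts, bits, syn, fin) in sorted(counts.items())
    }


trace = [Packet(0.1, 60, syn=True), Packet(0.5, 40, fin_or_rst=True), Packet(1.2, 60, syn=True)]
print(slot_features(trace, slot_duration=1.0))
# {0: (2, 800, 1.0), 1: (1, 480, None)}
```

Shrinking `slot_duration` makes `None` ratios more frequent, which is the trade-off between detection delay and feature stability described above.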

… [17] and the percentage of new IP addresses proposed in Ref. [12]. However, existing features such as the SYN/SYN-ACK ratio [17] and the percentage of new IP addresses [12] either do not lead to good detector performance or require high storage/time complexity (Section 3.1). To address these deficiencies, we propose a new type of feature, called two-way matching features, which differ markedly between normal and attack traffic, thereby improving the accuracy of attack detection. Next, we discuss the two-way matching features and the extraction scheme.

3.3.1 Motivation

The motivation for using two-way matching features arises from the fact that, for most Internet applications, packets are generated by both end hosts engaged in the communication. Information carried by packets in one direction should match the corresponding information carried by packets in the other direction. By monitoring the degree of mismatch between the flows of the two directions, we can detect network anomalies. To illustrate this, let us consider the behavior of two-way traffic in three scenarios, namely, 1) normal conditions, 2) DDoS attacks, and 3) re-routing. In the first scenario, when the network of an ISP works normally, the information carried in both directions of communication matches (Figure 3-2). Host a and host v are the two ends of the communication (assume that host v is within the autonomous system of the ISP while host a is not). Host a sends a packet to host v, and v responds with a packet back to host a. Both packets pass through edge router A. From the point of view of local analyzer 1, attached to edge router A, we define the first packet as an inbound packet and the second as an outbound packet. The source IP address (SA) and destination IP address (DA) of the inbound packet match the DA and SA of the outbound packet. If the communication is based on UDP or TCP, we further observe that the source port (SP) and destination port (DP) of the inbound packet match the DP and SP of the outbound packet. Therefore, local analyzer 1 observes matched inbound and outbound packets under normal conditions. In the example of Figure 3-2, it is assumed that Border Gateway Protocol (BGP) routing makes the inbound packets and the corresponding outbound packets pass through the same edge router. If BGP routing makes them go through different edge routers (Figure 2-4), the matching can still be achieved by a global analyzer (Section 2.2.3), i.e., multiple local analyzers convey the unmatched inbound packets and the corresponding outbound packets to the global analyzer, which has the routing information of the whole autonomous system.

Figure 3-2. Network in normal condition.

Figure 3-3. Source-address-spoofed packets.

In the second scenario, when attackers launch spoofed-source-IP-address DDoS attacks [18], local analyzer 1 observes many unmatched inbound packets (Figure 3-3). Since the source addresses of the inbound packets are spoofed, the outbound packets are routed to the nominal sources, i.e., b and c in Figure 3-3, and no longer pass through edge router A, so the inbound packets remain unmatched.

Figure 3-4. Reroute.

In the third scenario (Figure 3-4), the number of unmatched inbound packets observed by local analyzer 1 increases because the original route fails and the outbound packets are re-routed through another edge router. A global analyzer can address this problem in the same way as the asymmetric case in the first scenario.

… in Figure 3-2, if host a is a client uploading a large file to host v using the File Transfer Protocol (FTP) [19], there will be many more packets from a to v than from v to a. Uploading a file to an FTP server is normal behavior, but the number of unmatched inbound packets is very high in this case. Therefore, it is more appropriate to use flow-level quantities (instead of packet-level quantities) as features for network anomaly detection. As in the FTP case above, when a TCP connection is established, all packets in one direction constitute one flow and the packets in the reverse direction constitute another flow. No matter how many packets are sent in each direction, there is only one inbound flow and only one outbound flow, and they match in IP addresses and port numbers. Therefore, we use the number of unmatched inbound flows as a two-way matching feature. Two-way matching features have been shown to be effective indicators of network anomalies [20].

… Sections 3.4 and 3.5. Next, we define the two-way matching features.
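A minimal sketch of the flow-level two-way matching feature for one slot, using exact Python sets. The names and flow-key layout are illustrative assumptions; exact sets are only a reference point, since their memory grows with the number of flows, which is what compact structures such as a hash table or Bloom filter aim to avoid.

```python
from typing import Iterable, Set, Tuple

# A flow key as seen on the wire: (source IP, destination IP, source port, destination port).
FlowKey = Tuple[str, str, int, int]


def unmatched_inbound_flows(inbound: Iterable[FlowKey],
                            outbound: Iterable[FlowKey]) -> Set[FlowKey]:
    """Return the inbound flows (in one slot) that have no reverse outbound flow.

    Many packets of one flow collapse into a single set element, so a large
    upload counts once, not once per packet.
    """
    inbound_flows = set(inbound)
    # An outbound flow (SA, DA, SP, DP) matches the inbound flow (DA, SA, DP, SP).
    answered = {(da, sa, dp, sp) for (sa, da, sp, dp) in outbound}
    return inbound_flows - answered


# Host a uploads a file to host v over FTP: 100 inbound packets, one reply flow.
ftp = [("a", "v", 40000, 21)] * 100
# Three SYNs with spoofed sources: the replies go to x1..x3, not through this router.
spoofed = [(f"x{i}", "v", 1000 + i, 80) for i in range(1, 4)]
outbound = [("v", "a", 21, 40000)]

feature = len(unmatched_inbound_flows(ftp + spoofed, outbound))
print(feature)  # 3
```

The FTP upload contributes nothing to the feature, while each spoofed source adds one unmatched inbound flow.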

Table 3-1: Notations for two-way matching features (columns: Notation, Description)

Based on the above definitions, we define the two-way matching feature to be the number of unmatched inbound flows (UIFs). Table 3-1 lists the notations used in the rest of this dissertation, where $Z^+$ stands for the set of nonnegative integers. In the following sections, we present algorithms to extract two-way matching features from the traffic at local analyzers. Note that two-way matching features should be …

… (Figure 3-5), where the argument $s$ is the signature extracted from a packet.

    function HashTableInsert(V, s)
        for i ← 0 to K - 1
            if V[h_i(s)] is empty
                insert s into V[h_i(s)]; set the state bit of V[h_i(s)] to "UNMATCHED"
                return
            endif
        endfor
        report insertion operation error
    endfunction

    function HashTableSearch(V, s)
        for i ← 0 to K - 1
            if V[h_i(s)] is empty
                return false
            if V[h_i(s)] holds s
                return true
        endfor
        return false
    endfunction

    function HashTableRemove(V, s)
        for i ← 0 to K - 1
            if V[h_i(s)] is empty
                return false
            if V[h_i(s)] holds s
                set the state bit of V[h_i(s)] to "MATCHED"
                return true
            endif
        endfor
        return false
    endfunction

Figure 3-5. Hash Table Algorithm

… [21]. Compared to the hash table algorithm, the Bloom filter algorithm reduces space/time complexity by allowing a small degree of inaccuracy in membership representation, i.e., a packet signature that has not appeared before may be falsely identified as present.
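A minimal sketch contrasting the two membership structures, assuming a hypothetical hash family `h(i, s, size)` built from SHA-256 (the dissertation's own hash choice is discussed separately). `ProbeHashTable` follows Figure 3-5: up to K probes, and removal only flips a state bit so later probe sequences are not broken. `BloomFilter` follows Figure 3-6 and stores one byte per bit for readability.

```python
import hashlib
from typing import List, Optional


def h(i: int, s: bytes, size: int) -> int:
    """Hypothetical i-th hash function: SHA-256 over (i, s), reduced modulo size."""
    digest = hashlib.sha256(i.to_bytes(4, "big") + s).digest()
    return int.from_bytes(digest[:8], "big") % size


class ProbeHashTable:
    """Figure 3-5: each cell is empty (None) or holds [signature, state]."""

    def __init__(self, cells: int, k: int):
        self.v: List[Optional[list]] = [None] * cells
        self.k = k

    def insert(self, s: bytes) -> None:
        for i in range(self.k):
            idx = h(i, s, len(self.v))
            if self.v[idx] is None:
                self.v[idx] = [s, "UNMATCHED"]
                return
        raise RuntimeError("insertion error: all K probed cells are occupied")

    def search(self, s: bytes) -> bool:
        for i in range(self.k):
            cell = self.v[h(i, s, len(self.v))]
            if cell is None:
                return False          # an empty cell ends the probe sequence
            if cell[0] == s:
                return True
        return False

    def remove(self, s: bytes) -> bool:
        """Mark s as MATCHED; the cell stays occupied so later probes still work."""
        for i in range(self.k):
            cell = self.v[h(i, s, len(self.v))]
            if cell is None:
                return False
            if cell[0] == s:
                cell[1] = "MATCHED"
                return True
        return False


class BloomFilter:
    """Figure 3-6: a bit vector (one byte per bit here) and K hash functions."""

    def __init__(self, bits: int, k: int):
        self.v = bytearray(bits)
        self.k = k

    def insert(self, s: bytes) -> None:
        for i in range(self.k):
            self.v[h(i, s, len(self.v))] = 1

    def search(self, s: bytes) -> bool:
        for i in range(self.k):
            if self.v[h(i, s, len(self.v))] != 1:
                return False          # stop at the first hash that hits a 0
        return True                   # may be a false positive (a "collision")


table = ProbeHashTable(cells=64, k=4)
table.insert(b"sig-1")
print(table.search(b"sig-1"), table.remove(b"sig-1"), table.search(b"never-inserted"))
# True True False
```

The hash table never answers wrongly but stores whole signatures; the Bloom filter stores only bits and can answer true for a signature it never saw.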

… Figure 3-6 describes the insertion and search operations of the Bloom filter.

    function BloomFilterInsert(V, s)
        for ∀i ∈ Z_K do
            V[h_i(s)] ← 1
        endfor
    endfunction

    function BloomFilterSearch(V, s)
        for ∀i ∈ Z_K do
            if V[h_i(s)] ≠ 1 then
                return false
        endfor
        return true
    endfunction

Figure 3-6. Bloom Filter Operations

Figure 3-7. Scenarios of the problems caused by the Bloom filter: (a) boundary problem; (b) an outbound packet arrives before its matched inbound packet, with $t_2 - t_1 < \delta$.

Although the Bloom filter has a better space/time trade-off, it cannot be applied directly to our application because of the following problems:
1. The Bloom filter does not provide removal functionality. Since one bit in the vector may be mapped by more than one item, it is unsafe to remove an item by setting all bits indexed by its hash results to 0.
2. The Bloom filter does not have counting functionality. Although the counting Bloom filter [22] can be used for counting, it replaces each bit with a counter, which significantly increases the space complexity.

3. … (Figure 3-7(a)). An inbound packet arrives at time $t'_1 \in [t_i, t_{i+1})$, whereas its matched outbound packet arrives within the next period. The inbound packet is counted as an unmatched inbound packet even though $t'_2 - t'_1 < \delta$. Therefore, the boundary effect increases the false alarm rate.
4. In the previous discussion, we did not consider the scenario in which an outbound packet arrives before its matched inbound packet (Figure 3-7(b)). When the outbound packet arrives at time $t'_1$, its signature is not in the buffer, so we do nothing. At time $t'_2$, its matched inbound packet arrives, and its inbound signature is recorded. As a result, the latter is regarded as an unmatched inbound packet during period $[t_i, t_{i+1})$. This early-arrival problem also increases the false alarm rate.

Next, we propose a Bloom filter array algorithm to address the above problems.

… Section 3.4.2. Our idea is to design a Bloom filter array (BFA) with the following functionalities, which are not available in the original Bloom filter [21, 23]:
1. Removal functionality: We implement insertion and removal operations synergistically by using insertion-removal vector pairs. The trick is that, rather than removing an outbound signature from the insertion vector, we create a removal vector and insert the outbound signature into the removal vector.
2. Counting functionality: We implement this by introducing counters in the Bloom filter array. The value of a counter is changed based on the query result of an insertion/removal operation.
3. Boundary effect abatement: We use multiple time slots and a sliding window to mitigate the boundary effect.
4. Resolving the early-arrival problem: This is achieved by storing the signatures of not only inbound packets but also outbound packets. In this way, when an inbound packet arrives and the signature of its matched outbound packet is present, we do not count this inbound packet as an unmatched one.

… (Section 3.5.3).

The BFA algorithm (Figure 3-8) consists of three functions, namely ProcInbound, ProcOutbound, and Sample, which are described below.

Function ProcInbound processes inbound packets. It works as follows. When an inbound packet arrives during $[t_j, t_{j+1})$, we increase $C_j$ by 1 and insert its inbound signature $s$ into $IV_j$ if neither of the following conditions is satisfied:
1. $s$ is stored in at least one $RV_{j'}$, where $j-w+1 \le j' \le j$;
2. $s$ is stored in $IV_j$.
Condition 1 being true means that the corresponding outbound flow of this inbound packet has been observed previously, so we should not count it as an unmatched inbound packet. Condition 2 being true means that the inbound flow to which this inbound packet belongs has already been observed during the current slot $j$, so we should not count the same inbound flow again. If both conditions are false, we increase $C_j$ by one to indicate a new potential UIF (lines 7 to 10).

     1. function ProcInbound(s)
     2.   if ∃ j', j-w+1 ≤ j' ≤ j, such that BloomFilterSearch(RV_j', s) returns true then
     3.     a ← true
     4.   if BloomFilterSearch(IV_j, s) returns true then
     5.     b ← true
     6.   if a and b are both false
     7.     (not legible in the source)
     8.     C_j ← C_j + 1
     9.     BloomFilterInsert(IV_j, s)
    10.   endif
    11. endfunction
    12. function ProcOutbound(s')
    13.   for j' ← j to j-w+1
    14.     if BloomFilterSearch(RV_j', s') returns true
    15.       break
    16.     if BloomFilterSearch(IV_j', s') returns true
    17.       C_j' ← C_j' - 1
    18.   endfor
    19.   BloomFilterInsert(RV_j, s')
    20. endfunction
    21. function Sample(j)
    22.   return C_(j-w+1)
    23. endfunction

Figure 3-8. Bloom Filter Array Algorithm

Function ProcOutbound processes outbound packets. It works as follows. When an outbound packet arrives during $[t_j, t_{j+1})$, we check whether we need to update counter $C_{j'}$ for each $j'$ ($j-w+1 \le j' \le j$). Specifically, for each such $j'$, we decrease $C_{j'}$ by one if the packet's outbound signature $s'$ satisfies both of the following conditions:
1. $s'$ is not contained in $RV_{j'}$;
2. $s'$ is contained in $IV_{j'}$.
Condition 1 being true means that no packet from the outbound flow to which this outbound packet belongs arrived during the $j'$th time slot. Condition 2 being true means that the matched inbound flow of this outbound packet was observed in the $j'$th slot. Satisfying both conditions means that its matched inbound flow has been counted as a potential UIF; hence, upon the arrival of the outbound packet, we need to decrease $C_{j'}$ by one.

Line 13 starts a loop that iterates $j'$ from $j$ down to $j-w+1$. Condition 1 is checked in lines 14 to 15, and Condition 2 in lines 16 to 17. Note that the loop exits (line 15) if $RV_{j'}$ contains $s'$; this is because an outbound packet of the same flow arrived in that $j'$th slot, and hence the buffer of the $j$th slot (for each $j$ …

Figure 3-9 shows the improved version of BFA, based on the round-robin sliding window.
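The round-robin BFA can be sketched in a few dozen lines of Python. This is a minimal sketch of Figures 3-8 and 3-9 under stated assumptions: the SHA-256-based hash family, the signature encoding, and the method names `proc_inbound`, `proc_outbound`, and `advance_slot` (which performs the Sample step at the end of each slot and then recycles the oldest buffers) are illustrative choices, not the dissertation's implementation. Bit vectors use one byte per bit for readability.

```python
import hashlib
from typing import List, Optional


def inbound_signature(sa: str, da: str, sp: int, dp: int) -> bytes:
    """Hypothetical encoding of the (SA, DA, SP, DP) signature of an inbound packet."""
    return f"{sa}|{da}|{sp}|{dp}".encode()


def outbound_signature(sa: str, da: str, sp: int, dp: int) -> bytes:
    """Swap the fields so an outbound reply maps to the signature of its inbound flow."""
    return inbound_signature(da, sa, dp, sp)


class BloomFilterArray:
    """Sliding window of w slots, reused round-robin: slot j lives at index j % w."""

    def __init__(self, bits_per_vector: int, k: int, w: int):
        self.m, self.k, self.w = bits_per_vector, k, w
        self.iv = [bytearray(self.m) for _ in range(w)]  # insertion vectors IV
        self.rv = [bytearray(self.m) for _ in range(w)]  # removal vectors RV
        self.c = [0] * w                                 # counters C
        self.j = 0                                       # current slot

    def _positions(self, s: bytes) -> List[int]:
        # The K positions are computed once per packet and reused for every vector.
        return [int.from_bytes(hashlib.sha256(i.to_bytes(4, "big") + s).digest()[:8], "big") % self.m
                for i in range(self.k)]

    @staticmethod
    def _contains(vec: bytearray, pos: List[int]) -> bool:
        return all(vec[p] for p in pos)

    def _window(self) -> List[int]:
        """Slots j, j-1, ..., j-w+1 that have already started (newest first)."""
        return [jp for jp in range(self.j, self.j - self.w, -1) if jp >= 0]

    def proc_inbound(self, s: bytes) -> None:
        pos = self._positions(s)
        cur = self.j % self.w
        a = any(self._contains(self.rv[jp % self.w], pos) for jp in self._window())
        b = self._contains(self.iv[cur], pos)
        if not a and not b:          # new potential unmatched inbound flow (UIF)
            self.c[cur] += 1
            for p in pos:
                self.iv[cur][p] = 1

    def proc_outbound(self, s_out: bytes) -> None:
        pos = self._positions(s_out)
        for jp in self._window():
            idx = jp % self.w
            if self._contains(self.rv[idx], pos):
                break                # this flow's replies were already handled from slot jp back
            if self._contains(self.iv[idx], pos):
                self.c[idx] -= 1     # the matching inbound flow was counted in slot jp
        for p in pos:
            self.rv[self.j % self.w][p] = 1

    def advance_slot(self) -> Optional[int]:
        """End slot j: return C of slot j-w+1 (the Sample step), then recycle its buffers."""
        oldest = self.j - self.w + 1
        sample = self.c[oldest % self.w] if oldest >= 0 else None
        self.j += 1
        nxt = self.j % self.w        # equals oldest % w, so the sampled slot is reused
        self.iv[nxt] = bytearray(self.m)
        self.rv[nxt] = bytearray(self.m)
        self.c[nxt] = 0
        return sample


bfa = BloomFilterArray(bits_per_vector=4096, k=4, w=4)
bfa.proc_inbound(inbound_signature("10.0.0.1", "192.168.1.5", 40000, 80))     # normal client
for x in range(2):                                                            # spoofed SYNs
    bfa.proc_inbound(inbound_signature(f"6.6.6.{x}", "192.168.1.5", 1000 + x, 80))
bfa.proc_outbound(outbound_signature("192.168.1.5", "10.0.0.1", 80, 40000))   # server reply
samples = [bfa.advance_slot() for _ in range(4)]
print(samples)  # [None, None, None, 2]
```

A slot's counter is only reported once the slot has left the window, so a reply that arrives up to w - 1 slots late still cancels its inbound flow. Replies that arrive before their inbound packet are caught by the removal-vector check in `proc_inbound`.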

… [24] as the hash function. Since MD5 accepts an input of any length, we can concatenate $key_i$ and $x$ into one bit vector and apply MD5 to it. Using keyed hash functions, the first concern (varying $K$) can be addressed straightforwardly. Specifically, when $K$ changes, we simply generate the corresponding number of random keys. Applying these $K$ keys to the same kernel hash function, we obtain $K$ hash functions. Hence, our method has two advantages: 1) the number of hash functions can be specified on the fly; 2) the hash functions are determined on the fly instead of being stored a priori, which saves storage. The second concern (changing hash functions) can also be addressed if the keys are changed periodically. Even if the kernel hash function is disclosed, it is still very difficult, if not impossible, for an attacker to guess the changing random keys. Note that the collision probability of the hash functions is not affected by the use of keyed hash functions. In the case of random-keyed hash functions, the collision probability of $h_i(x)$ depends not only on the collision probability of $h$ but also on the correlation between $key_i$ and $x$. Since random number generation techniques are mature enough that we can assume independence between $key_i$ and $x$, the introduction of random keys has no effect on the collision probability.

In Section 3.6.1, we analyze the space/time trade-off of the three algorithms. Section 3.6.2 addresses how to choose the parameters of BFA optimally.
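A minimal sketch of the keyed hash family described above, $h_i(x) = \mathrm{MD5}(key_i \,\|\, x) \bmod m$, using Python's `hashlib` and `secrets`. The class and method names are hypothetical. Plain concatenation mirrors the text; where a standard keyed construction is preferred, Python's `hmac` module with `hashlib.md5` is the usual alternative, and MD5 itself is no longer considered collision-resistant against deliberate attacks.

```python
import hashlib
import secrets
from typing import List


class KeyedHashFamily:
    """K hash functions h_i(x) = MD5(key_i || x) mod m, derived from one kernel hash."""

    def __init__(self, k: int, m: int, key_bytes: int = 16):
        self.m = m
        self.key_bytes = key_bytes
        self.rekey(k)

    def rekey(self, k: int) -> None:
        """Draw fresh random keys; this also changes K on the fly."""
        self.keys: List[bytes] = [secrets.token_bytes(self.key_bytes) for _ in range(k)]

    def __call__(self, i: int, x: bytes) -> int:
        digest = hashlib.md5(self.keys[i] + x).digest()
        return int.from_bytes(digest, "big") % self.m

    def positions(self, x: bytes) -> List[int]:
        return [self(i, x) for i in range(len(self.keys))]


family = KeyedHashFamily(k=4, m=2**15)
before = family.positions(b"10.0.0.1|192.168.1.5|40000|80")
family.rekey(k=8)   # more hash functions, and new keys an attacker has not seen
after = family.positions(b"10.0.0.1|192.168.1.5|40000|80")
print(len(before), len(after))  # 4 8
```

Only the keys are stored; the functions themselves are recomputed from the keys at query time, which is the storage saving mentioned above.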

… [21]. However, the analysis by Bloom [21] is not directly applicable to our setting for the following reasons:
1. Bloom [21] assumed a static data set. However, our feature extraction deals with a dynamic data set, i.e., the number of elements in the data set changes over time. Hence, a new analysis for dynamic data sets is needed. In addition, Bloom [21] only considered the search operation because of the static data set assumption. Our feature extraction, on the other hand, requires three operations on dynamic data sets: insertion, search, and removal.
2. Bloom [21] assumed bit-comparison hardware in the time complexity analysis. However, current computers usually use word (multiple-bit) comparison, which is more efficient than bit comparison. Hence, the complexity must be analyzed in terms of word comparisons.
3. The time complexity obtained by Bloom [21] did not include hash function calculations. However, hash function calculation dominates the overall time complexity; e.g., calculating one MD5-based hash function takes 64 clock cycles [25], while one word comparison usually takes fewer than 8 clock cycles [26].
For these reasons, we develop a new analysis for the hash table and the Bloom filter. In addition, we analyze the performance of BFA and use numerical results to compare the three algorithms. Table 3-2 lists the notations used in the analysis.

Table 3-2: Notations for complexity analysis (columns: Notation, Description)

… (Figure 3-5) checks whether its inbound signature $s$ is in the table. Because …

… Section 3.4.1, the hash table has $\ell$ cells of $b+1$ bits each, so that $M_h = \ell(b+1)$. Given that $N$ flows have been recorded in the hash table, the empty ratio is

$$\frac{\ell - N}{\ell} = \frac{M_h - N(b+1)}{M_h}.$$

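… Substituting Equation (3-5) into Equation (3-4), we obtain the expectation of $T_h$, $E[T_h]$ … (3-6). Equation (3-6) gives the space/time trade-off (i.e., $M_h$ vs. $T_h$) of the hash table method.

… Section 3.4.2). The choice of $M_b$ affects the accuracy of the search function, BloomFilterSearch (see Figure 3-6), for the following reason. When the signatures of $N$ flows are stored in $V$, the fraction of entries of $V$ with value 0, written $p_0$ here, is

$$p_0 = \left(1 - \frac{1}{M_b}\right)^{KN} \approx e^{-KN/M_b},$$

and a signature that was never inserted finds all $K$ of its entries set to 1, i.e., collides, with probability

$$\varepsilon = (1 - p_0)^K = \left[1 - \left(1 - \frac{1}{M_b}\right)^{KN}\right]^K. \tag{3-10}$$

From Equation (3-10), it can be observed that $\varepsilon$ decreases as $M_b$ increases if $K$ is fixed. Based on Equation (3-10), we can express $M_b$ as a function of $\varepsilon$ and $K$:

$$M_b = R(\varepsilon, K). \tag{3-11}$$

Equation (3-11) gives the space complexity of the Bloom filter as a function of the collision probability and the number of hash functions.

Now, let us consider the time complexity of the Bloom filter. Denote by $T_b$ the random variable representing the number of hash function calculations. Function BloomFilterInsert always calculates all $K$ hash functions, that is,

$$T_b \mid \{\text{BloomFilterInsert is executed}\} \equiv K; \tag{3-12}$$

… (Section 3.6.1). In general, since BloomFilterSearch stops at the first entry equal to 0,

$$\Pr[T_b = x \mid N = n,\ \text{BloomFilterSearch is executed}] = \begin{cases} p_0 (1 - p_0)^{x-1}, & 1 \le x < K, \\ (1 - p_0)^{K-1}, & x = K, \end{cases}$$

… $\triangleq n(\varepsilon, K)$. … Equation (3-15), we obtain the expectation of $T_b$ under the condition that BloomFilterSearch is executed, i.e., $E[T_b \mid \text{BloomFilterSearch is executed}]$ … (3-17). Equation (3-17) gives the time complexity of the Bloom filter in terms of the number of hash function calculations.

The analysis in Section 3.6.1 can be applied here, since BFA is derived from the standard Bloom filter. However, there are some differences between the two schemes. As described in Section 3.5, BFA has multiple buffers, namely $IV_j$, $RV_j$, and $C_j$, $j \in Z_w$. Therefore, the storage size of BFA, denoted by $M_a$ (in bits), is $w(2M_v + L)$, where $M_v$ is the size of each insertion or removal vector and $L$ is the size of each counter in bits.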
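A minimal sketch that evaluates the Bloom filter quantities above numerically, assuming independent, uniformly distributed hash positions (the usual Bloom filter idealization). `n` is the number of stored signatures, as in the text; the function names are hypothetical. The expected search cost uses the distribution of $T_b$ given above: stop at the first zero entry, or after $K$ hashes.

```python
def zero_fraction(m_bits: int, k: int, n: int) -> float:
    """Fraction of entries still 0 after inserting n signatures with k hashes into m bits."""
    return (1.0 - 1.0 / m_bits) ** (k * n)


def collision_probability(m_bits: int, k: int, n: int) -> float:
    """Probability that a never-inserted signature finds all k of its bits set."""
    return (1.0 - zero_fraction(m_bits, k, n)) ** k


def expected_search_hashes(m_bits: int, k: int, n: int) -> float:
    """E[T_b | search] when each probed bit is independently 0 with probability p0.

    Pr[T_b = x] = p0 (1 - p0)^(x-1) for x < k, and (1 - p0)^(k-1) for x = k.
    """
    p0 = zero_fraction(m_bits, k, n)
    q = 1.0 - p0
    early_exit = sum(x * p0 * q ** (x - 1) for x in range(1, k))
    return early_exit + k * q ** (k - 1)


# Illustrative parameters only: K = 8 hash functions, n = 2480 stored signatures.
for m in (2**15, 2**16, 2**17):
    eps = collision_probability(m, k=8, n=2480)
    print(m, f"{eps:.2e}", round(expected_search_hashes(m, k=8, n=2480), 3))
```

Doubling the vector size lowers the collision probability and also lowers the average search cost, since more probes hit a zero early.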

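… Similarly to Equation (3-10), the collision probability $\varepsilon$ of each insertion or removal vector is given by Equation (3-10) with $M_b$ replaced by $M_v$ (Equation (3-18)). Similarly to Equation (3-11), $M_v$ is a function of $\varepsilon$ and $K$; we define

$$M_v = R(\varepsilon, K). \tag{3-19}$$

Hence,

$$M_a = w\left[2R(\varepsilon, K) + L\right]. \tag{3-20}$$

Equation (3-20) gives the space complexity of BFA.

Now, let us consider the time complexity of BFA. Denote by $T_a$ the random variable representing the number of hash function calculations for BFA. Recall that BFA (Figure 3-9) defines three functions: ProcInbound, ProcOutbound, and Sample. Obviously,

$$T_a \mid \{\text{Sample is executed}\} \equiv 0. \tag{3-21}$$

For ProcInbound, there are two cases:
1. If $a$ and $b$ are both false, BloomFilterInsert is executed, which calculates all $K$ hash functions (see Equation (3-12)).
2. Otherwise, at least one of $a$ and $b$ is true; then at least one of the search operations, i.e., BloomFilterSearch($RV_{j'}$, $s$), $j' = (I-w+1)\%w, (I-w+2)\%w, \ldots, I\%w$, and BloomFilterSearch($IV_I$, $s$), returns true. This also means that $K$ hash functions have been calculated (see Equation (3-13)).
Therefore, in either case, ProcInbound calculates all $K$ hash functions. Further note that, although ProcInbound executes up to $w+1$ search operations and at most one insertion operation, the total number of hash function calculations in these operations is still $K$, because all of them hash the same signature $s$.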

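Combining Equations (3-21), (3-22), and (3-23), and assuming $(R_{pi} + R_{po})\tau \gg 1$ (where $\tau$ denotes the slot duration), which always holds in our design of BFA, we have

$$E[T_a] = \frac{0 \cdot \tau^{-1} + K (R_{pi} + R_{po})}{\tau^{-1} + R_{pi} + R_{po}} \approx K. \tag{3-24}$$

Combining Equations (3-24) and (3-20), we obtain the relationship between $M_a$ and $T_a$:

$$M_a = w\left[2R(\varepsilon, E[T_a]) + L\right]. \tag{3-25}$$

Table 3-3: Space/time complexity for hash table, Bloom filter, and BFA

Algorithm | Space complexity | Time complexity
Hash table | $M_h$ (free variable) | Equation (3-6)
Bloom filter | Equations (3-10) and (3-11) | Equations (3-15), (3-16), and (3-17)
BFA | Equations (3-18), (3-19), and (3-20) | Equation (3-24)

Table 3-3 lists the space and time complexity of the hash table, Bloom filter, and BFA algorithms.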

Figure 3-10. Space/time trade-off for the hash table, BFA with $\varepsilon = 0.1\%$, and BFA with $\varepsilon = 1\%$.

Figure 3-10 shows $M$ vs. $E[T]$ for the hash table scheme, BFA with collision probability 1%, and BFA with collision probability 0.1%. In Figure 3-10, the X axis represents the time complexity (i.e., the expected number of hash function calculations) and the Y axis represents the space complexity (i.e., the number of bits needed for storage). From Figure 3-10, we can see that the BFA curves lie below the hash table curve, which means that BFA uses less space for a given time complexity. Therefore, BFA achieves a better space/time trade-off than the hash table.

… Figure 3-10 shows that, for the hash table scheme, $M_h$ is a monotonically decreasing function of $E[T_h]$. This observation matches our intuition that the larger the table, the smaller the collision probability of the hash functions, resulting in fewer hash function calculations. Further note that $M_h$ approaches $R(b+1)$ as $E[T_h]$ increases; this is the minimum space required to accommodate up to $R$ flows. For BFA, $M_a$ is not a monotonic function of $E[T_a]$, which approximately equals $K$. We make the following observations.

Case A: For a fixed storage size, the smaller $K$ is, the larger the probability that all $K$ hash functions of two different inputs return the same outputs, which is the collision probability. In other words, the smaller $K$ is, the larger the storage size required to achieve a fixed collision probability. That is, $K \downarrow \Rightarrow M_a \uparrow$.

Case B: Since each input to BFA may set $K$ bits of a vector $V$ to "1", the larger $K$ is, the more bits in $V$ are set to "1" (nonempty), which translates into a larger collision probability. In other words, the larger $K$ is, the larger the storage size required to achieve a fixed collision probability. That is, $K \uparrow \Rightarrow M_a \uparrow$.

Combining Cases A and B, it can be argued that, for a fixed collision probability, there exists a value of $K$ (or $E[T_a]$) that minimizes $M_a$. This property can be used to guide the parameter setting of BFA, which is addressed in Section 3.6.2.
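The interior minimum predicted by Cases A and B can be checked numerically. The sketch below computes $R(\varepsilon, K)$, the smallest vector size meeting a collision-probability target, by bisection on the standard Bloom filter collision formula, then scans $K$. The number of stored signatures `n` and the counter width `L` are illustrative assumptions, and the function names are hypothetical.

```python
def collision_probability(m_bits: int, k: int, n: int) -> float:
    return (1.0 - (1.0 - 1.0 / m_bits) ** (k * n)) ** k


def min_bits(eps: float, k: int, n: int) -> int:
    """R(eps, k): smallest vector size (bits) whose collision probability is <= eps."""
    hi = 1
    while collision_probability(hi, k, n) > eps:   # grow until feasible
        hi *= 2
    lo = max(1, hi // 2)
    while lo < hi:                                 # bisection; the probability falls as m grows
        mid = (lo + hi) // 2
        if collision_probability(mid, k, n) <= eps:
            hi = mid
        else:
            lo = mid + 1
    return lo


def bfa_storage(eps: float, k: int, n: int, w: int, counter_bits: int) -> int:
    """M_a = w * (2 * R(eps, K) + L): w slots, each with an IV, an RV, and an L-bit counter."""
    return w * (2 * min_bits(eps, k, n) + counter_bits)


eps, n, w, L = 0.001, 2480, 8, 32        # n and L = 32-bit counters are assumptions
sizes = {k: min_bits(eps, k, n) for k in range(1, 21)}
best_k = min(sizes, key=sizes.get)
print(best_k, sizes[best_k], bfa_storage(eps, best_k, n, w, L))
```

Too few hash functions need a huge vector (Case A), too many fill it up (Case B); for these illustrative numbers the minimum lies near $K = 10$.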

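From Equation (3-25), the three parameters $M_a$, $E[T_a]$, and $\varepsilon$ are coupled. Since the collision probability critically affects the detection error rate of our network anomaly detection, a network operator may want to choose an upper bound $\bar{\varepsilon}$ on the acceptable collision probability and then minimize the storage required, i.e.,

$$\min_{E[T_a]} M_a \quad \text{subject to} \quad \varepsilon \le \bar{\varepsilon}. \tag{3-26}$$

According to Equation (3-25), and since $R(\varepsilon, K)$ decreases as $\varepsilon$ grows, the solution of (3-26) is

$$M_a^{*} = \min_{E[T_a]} w\left[2R(\bar{\varepsilon}, E[T_a]) + L\right].$$

Figure 3-11. Relation among space complexity, time complexity, and collision probability: (a) $M_a$ vs. $\varepsilon$; (b) $E[T_a]$ vs. $\varepsilon$.

Figure 3-11 shows $M_a$ vs. $\varepsilon$ and $E[T_a]$ vs. $\varepsilon$ under the same setting as in Section 3.6.1. From Figure 3-11(a), it can be observed that $M_a$ decreases as $\varepsilon$ increases.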

… From Figure 3-11(b), one observes that, in general, $E[T_a]$ decreases as $\varepsilon$ increases. This may be because the smaller $E[T_a]$ (or $K$) is, the larger the probability that all $K$ hash functions of two different inputs return the same outputs, which is the collision probability.

… From Equation (3-18), it can be observed that $\varepsilon$ decreases as $M_v$ increases if $K$ is fixed; in other words, $M_v$ decreases as $\varepsilon$ increases if $K$ is fixed. Further, from Equations (3-19) and (3-25), it can be inferred that $M_a$ decreases as $\varepsilon$ increases if $E[T_a]$ is fixed (note that $E[T_a] \approx K$). This is shown in Figure 3-12. From the figure, it can be observed that the two lines intersect at a value of collision probability denoted by $\varepsilon_c$. This value is critical for the parameter setting of BFA. If a network operator's desired collision probability is greater than $\varepsilon_c$, then the operator should choose $E[T_a] = 4$, since this setting gives both smaller time complexity and smaller space complexity. We call this property "competitive optimality", since there is no trade-off between time complexity and space complexity in this case. On the other hand, if the desired collision probability is smaller than $\varepsilon_c$, the operator needs to make a trade-off between space complexity and time complexity.

Figure 3-12. Space complexity vs. collision probability for fixed time complexity.

… Section 3.7.1 compares the performance of the BFA algorithm with that of the hash table algorithm. In Section 3.7.2, we show the performance of the complete feature extraction system, which uses the BFA algorithm.

Simulation settings. We apply the hash table algorithm and the BFA algorithm to time series of signatures extracted from real traffic traces collected by Auckland University [27]. To make a fair comparison with the numerical results in Section 3.6.1, we use the same 96-bit signature, i.e., SA, DA, SP, and DP, and let $R = 250{,}000$ packets/second with a duration of 80 seconds, which translates to $250{,}000 \times 80 = 20$M input signatures for each simulation. These signatures are preloaded into memory before each simulation starts, so that the I/O speed of the hard drive does not affect the execution time. For each simulation run of the hash table algorithm, we specify the memory size $M_h$ and measure the algorithm performance in terms of the execution time and the average number of hash function calculations per signature query, denoted by $\hat{T}_h$. By the law of large numbers, $\hat{T}_h$ approaches the expected number of hash function calculations per signature query, i.e., $E[T_h]$ in Equation (3-6), if we run the simulation many times with the same $M_h$. In our simulations, we run the hash table algorithm ten times, each time with a different set of input signatures but with the same $M_h$. For each simulation run of the BFA algorithm, we specify the memory size $M_a$ and the number of hash functions $K$, and measure the algorithm performance in terms of the execution time and the collision frequency, denoted by $\hat{\varepsilon}$. The collision frequency is defined as the ratio of the number of collision occurrences in BloomFilterSearch to the …

… Figure 3-13 shows the average processing time per query vs. memory size for the hash table algorithm, the BFA algorithm with $\hat{\varepsilon} = 0.1\%$, and the BFA algorithm with $\hat{\varepsilon} = 1\%$.

Figure 3-13. Memory size (in bits) vs. average processing time per query (in μs).

From Figure 3-13, we observe that 1) compared to the hash table algorithm, the BFA algorithm requires less memory for the same time complexity (average processing time per query), as predicted in Section 3.6, and 2) the BFA algorithm with $\hat{\varepsilon} = 1\%$ has a better space/time trade-off than the BFA algorithm with $\hat{\varepsilon} = 0.1\%$, but at the cost of a higher collision probability, as predicted by the numerical results in Figure 3-10. Figure 3-14 shows the average processing time per query vs. the average number of hash function calculations per query. It can be observed that the average processing time per query increases linearly with the average number of hash function calculations per query. For this reason, instead of running simulations to obtain the time complexity (i.e., the average processing time per query), in Section 3.6.1 we used the average number of hash function calculations per query to represent the time complexity of the hash table algorithm and the BFA algorithm.

Figure 3-14. Average processing time per query (in μs) vs. average number of hash function calculations per query.

… Figure 3-15 compares the simulation results with the numerical results obtained from the analysis in Section 3.6, for both the hash table algorithm and the BFA algorithm, in terms of space complexity vs. time complexity. In Figure 3-15(a), the numerical result agrees well with the simulation result, except when the average number of hash function calculations per query is close to 1. From Equation (3-6), as the expected number of hash function calculations approaches 1, the required memory size approaches infinity; in contrast, simulations with a large $M_h$ may not give accurate results because of the limited memory of a computer. This causes the large discrepancy between the numerical and simulation results when the average number of hash function calculations per query is close to 1. When the average number of hash function calculations per query is two or more, the simulation always requires more memory than the numerical result. This is because practical hash functions are not perfect, i.e., entries in the hash table are not equally likely to be accessed. Hence, Equation (3-2) does not hold exactly, nor does Equation (3-3). As a result, the average number of hash function calculations per query in simulation is larger than that predicted by Equation (3-6). Figure 3-15(b) shows that the numerical result agrees well with the simulation result for all values of the average number of hash function calculations per query in our study.

Figure 3-15. Comparison of numerical and simulation results: (a) hash table algorithm; (b) BFA algorithm with $\varepsilon = 1\%$.

… Section 3.2. In contrast, the experiment in Section 3.7.1 does not involve the interaction among the three levels.

… [27] as the background traffic. This data set consists of packet header information of the traffic between the Internet and Auckland University. The connection is OC-3 (155 Mb/s) in both directions.

… [18] by randomly inserting TCP SYN packets with random source IP addresses into the background trace during specified time periods. Specifically, synchronized attacks are simulated during seconds 14000–16000 and 28000–32000 in both traces. In addition, asynchronous attacks are launched during seconds 50000–52000 in trace 1 and seconds 57000–59000 in trace 2. The average attack rate is 1% of the packet rate of the background traffic during the same period. To detect TCP SYN flood attacks, we choose the signatures of inbound and outbound packets such that the two-way matching feature is the number of unmatched inbound TCP SYN packets in one time slot. The average flow rate is 2480 flows/second; therefore, we set $R = 2480$. We further set $\varepsilon = 0.1\%$, $K = 8$, $w = 8$, and the slot duration to 10 seconds. Then, by solving Equation (3-18) for $M_v$ and requiring $M_v$ to be a power of 2, we obtain $M_v = 2^{15}$ bits. The computer used for our experiments has one 2.4 GHz CPU and 1 GB of memory. For comparison, we also extract the number of inbound SYN packets in each slot.

… (Figure 3-16). From Figure 3-16, it can be observed that the features are rather noisy, especially the number of SYN packets. From Figures 3-16(a) and 3-16(c), we can hardly distinguish (by visual inspection) the slots under the low-volume synchronized attacks from the slots without attacks. In comparison, it is much easier to identify the slots under the synchronized attacks (by visual inspection) when the number of unmatched SYN packets is used as the feature (see Slots 1400–1600 and Slots 2800–3200 in Figures 3-16(b) and 3-16(d)).

Figure 3-16. Feature data: (a) number of SYN packets (link 1); (b) number of unmatched SYN packets (link 1); (c) number of SYN packets (link 2); (d) number of unmatched SYN packets (link 2).

4.1.1 introducestheReceiverOperatingCharacteristicscurve.ItisusedinSection 4.5 asthemetricstocompareperformanceofdierentclassicationmethods.Wedescribethethreshold-basedalgorithm,change-pointalgorithm,andBayesiandecisiontheoryinSections 4.1.2 4.1.3 ,and 4.1.4 ,respectively. 28 ]isatypicalmethodtoquantifytheperformanceofdetectionalgorithms.Itisaplotofdetectionprobabilityvs.falsealarmprobability.Inpractice,weestimatedetectionprobabilityandfalsealarmprobabilitybythefractionoftruepositivesandthefractionoftruenegatives,respectively.Hence,toobtainanROCcurve,oneneedstomeasurethefollowingquantitiesf:thenumberoffalsealarms,i.e.,thenumberofslotsinwhichthedetectionalgorithmdeclares`abnormal'giventhatnoanomalyactuallyhappensintheseslots;n:thenumberofslotsinwhichnoanomalyhappens;d:thenumberofslotsinwhichthedetectionalgorithmdeclares`abnormal'giventhatnetworkanomaliesactuallyhappenintheseslots;a:thenumberofslotsinwhichnetworkanomalieshappen.Thefalsealarmprobabilityandthedetectionprobabilityofthedetectionalgorithmcanbeestimatedbyf=nandd=a,respectively.Byvaryingparametersofdetectionalgorithms,wecanobtaindierentpairsoffalsealarmprobabilityanddetection 59

PAGE 60

28 ].Inthispaper,wewillusetheROCcurvetocomparetheperformanceofdierentdetectionalgorithms.Next,weintroducesomebasicclassicationalgorithms. 11 { 13 17 ].However,existingstudiesonlyconsiderthechangefromnormalstatetoabnormalstate,whichmeansthatthenumberoffalsealarmscanbeverylargeafternetworkanomaliesend.Tofacilitatethediscussion,wedenethefollowingparametersusedinCUSUMinTable 4-1 : Table4-1: ParametersusedinCUSUM ParameterDescription (ti)Theobservedtracfeatureattheendoftimesloti.nTheexpectationof(ti)innormalstates.aTheexpectationof(ti)inabnormalstates.Withoutlosinggenerality,hereweassumethatn

… (4-1)

In the CUSUM algorithm, if $S(t_i)$ is smaller than a threshold $H_{\text{CUSUM}}$, we declare that the network state is normal; otherwise, we declare that the state is abnormal. From the discussion above, we note that two parameters, i.e., $\beta_a$ and $H_{\text{CUSUM}}$, need to be determined. However, these two parameters cannot be uniquely determined. To overcome this problem, we introduce another parameter, the detection delay, denoted as $D$. According to change-point theory, we have
$$D \approx \frac{H_{\text{CUSUM}}}{\mu_a - \beta_a}. \tag{4-2}$$
From Equation (4-2), we obtain
$$H_{\text{CUSUM}} = D(\mu_a - \beta_a). \tag{4-3}$$
The algorithms in [11-13] only consider one change, i.e., from the normal state to the abnormal state. In practice, this approach may lead to a large number of false alarms after the end of attacks. To mitigate the high false alarm rate of the existing algorithms, which we call single-CUSUM algorithms, we develop a dual-CUSUM algorithm. In this algorithm, one CUSUM is used to detect the change from the normal to the abnormal state, while another CUSUM is responsible for detecting the change from the abnormal to the normal state. The method of setting parameters for … [11] and [12], $\beta_a = (\mu_a - \mu_n)/2$; thus only the detection delay is needed.
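A minimal sketch of the dual-CUSUM idea follows. Because the statistic of Equation (4-1) is not reproduced here, the sketch assumes the textbook one-sided (Page) CUSUM on the centered feature $\phi(t_i) - \mu_n$ with drift $k = (\mu_a - \mu_n)/2$, and picks the threshold as $H = D(\mu_a - \mu_n - k)$, i.e., the expected increment per abnormal slot times the desired delay, in the spirit of Equation (4-3). The second CUSUM mirrors the first to detect the return to the normal state. The function and parameter names are hypothetical.

```python
from typing import List, Sequence


def dual_cusum(x: Sequence[float], mu_n: float, mu_a: float, delay: float) -> List[bool]:
    """Per-slot decisions (True = abnormal) from two mirrored one-sided CUSUMs."""
    k = (mu_a - mu_n) / 2.0          # drift: half of the assumed mean shift
    h = delay * (mu_a - mu_n - k)    # threshold from the target detection delay
    s_up = 0.0    # accumulates evidence for normal -> abnormal
    s_down = 0.0  # accumulates evidence for abnormal -> normal
    abnormal = False
    decisions = []
    for xi in x:
        if not abnormal:
            s_up = max(0.0, s_up + (xi - mu_n) - k)
            if s_up > h:
                abnormal, s_up, s_down = True, 0.0, 0.0
        else:
            s_down = max(0.0, s_down + (mu_a - xi) - k)
            if s_down > h:
                abnormal, s_up, s_down = False, 0.0, 0.0
        decisions.append(abnormal)
    return decisions


# Five normal slots, five attack slots, five normal slots.
x = [0.0] * 5 + [10.0] * 5 + [0.0] * 5
print(dual_cusum(x, mu_n=0.0, mu_a=10.0, delay=2.0))
```

In this toy run the alarm is raised a couple of slots after the attack starts and cleared a couple of slots after it ends, because the second CUSUM explicitly tests for the return to $\mu_n$ instead of waiting for a single statistic to drain below its threshold.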

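… (Section 4.5.2). Our machine learning algorithm is based on Bayesian decision theory, which is introduced in the next section.

Bayesian decision theory [29] is composed of:
- the feature space $\mathcal{D}$, which might be a multi-dimensional Euclidean space;
- $U$ states of nature, $\tilde{\mathcal{H}} = \{H_u; u \in \mathbb{Z}_U\}$;
- the prior probability distribution, $P(H)$, $H \in \tilde{\mathcal{H}}$;
- the likelihood probabilities, $p(\psi \mid H)$, $\psi \in \mathcal{D}$, $H \in \tilde{\mathcal{H}}$;
- the loss function, $\lambda(\hat{H}, H)$, $\hat{H}, H \in \tilde{\mathcal{H}}$, which describes the loss incurred for classifying an object to be of class $\hat{H}$ when its state of nature is class $H$.
Note that, in this dissertation, $P(\cdot)$ represents a probability mass function (PMF) [30] and $p(\cdot)$ a probability density function (pdf) [30]. Given the observed feature $\psi$ of an object, Bayesian decision theory classifies it to be of class $\hat{H}$ such that
$$\hat{H} = \arg\min_{\hat{H}} \sum_{H \in \tilde{\mathcal{H}}} \lambda(\hat{H}, H)\, P(H \mid \psi). \tag{4-4}$$
By the Bayesian formula [30],
$$P(H \mid \psi) = \frac{p(\psi \mid H)\, P(H)}{p(\psi)}. \tag{4-5}$$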

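Since $p(\psi)$ does not depend on $\hat{H}$, Equation (4-4) is equivalent to
$$\hat{H} = \arg\min_{\hat{H}} \sum_{H \in \tilde{\mathcal{H}}} \lambda(\hat{H}, H)\, p(\psi \mid H)\, P(H). \tag{4-6}$$
Equation (4-6) gives the Bayesian criterion for pattern classification. A simple loss function is defined to be
$$\lambda(H_{\hat{u}}, H_u) = -\tilde{\lambda}(u)\, I(\hat{u} = u), \quad \forall \hat{u}, u \in \mathbb{Z}_U, \tag{4-7}$$
where $I(\cdot)$ is the indicator function and $\tilde{\lambda}(u) > 0$ is the gain factor of class $u$. Equation (4-7) specifies that misclassification induces zero loss and correct classification induces negative loss, which actually achieves gain. Applying Equation (4-7) to Equation (4-6), the Bayesian criterion is simplified to
$$\hat{u} = \arg\max_{u \in \mathbb{Z}_U} \tilde{\lambda}(u)\, p(\psi \mid H_u)\, P(H_u). \tag{4-9}$$
We call Equation (4-9) the maximum gain criterion. Further note that scaling all gain factors by the same factor does not change the criterion specified by Equation (4-9). Hence, we can always set $\tilde{\lambda}(0) = 1$. By tuning the other gain factors, we can generate the ROC curve for Bayesian decision theory.

In this chapter, we extend Bayesian decision theory to network anomaly detection. The remainder of this chapter is organized as follows. In Section 4.2, we establish Bayesian models for network anomaly detection. Sections 4.3 and 4.4 solve two fundamental problems of the Bayesian model, i.e., the training problem and the classification problem, respectively. Section 4.5 shows our simulation results and Section 4.6 concludes this chapter.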
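The maximum gain criterion of Equation (4-9) reduces to an argmax over weighted posteriors. The sketch below assumes two classes with made-up Gaussian likelihoods (normal $H_0 \sim \mathcal{N}(0, 1)$, abnormal $H_1 \sim \mathcal{N}(3, 1)$); the helper names are hypothetical.

```python
import math
from typing import Callable, Sequence


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Univariate Gaussian density N(x; mean, std^2)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def max_gain_decide(psi: float,
                    likelihoods: Sequence[Callable[[float], float]],
                    priors: Sequence[float],
                    gains: Sequence[float]) -> int:
    """Return argmax_u gain(u) * p(psi | H_u) * P(H_u), as in Equation (4-9)."""
    scores = [g * lik(psi) * p for lik, p, g in zip(likelihoods, priors, gains)]
    return max(range(len(scores)), key=scores.__getitem__)


liks = [lambda v: gaussian_pdf(v, 0.0, 1.0), lambda v: gaussian_pdf(v, 3.0, 1.0)]
priors = [0.5, 0.5]
print(max_gain_decide(1.0, liks, priors, [1.0, 1.0]))   # 0 (closer to the normal mean)
print(max_gain_decide(1.0, liks, priors, [1.0, 10.0]))  # 1 (abnormal class up-weighted)
```

Keeping $\tilde{\lambda}(0) = 1$ and raising $\tilde{\lambda}(1)$ moves the decision boundary toward the normal class, trading more false alarms for a higher detection probability; sweeping $\tilde{\lambda}(1)$ is what traces the ROC curve.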

Section 4.2.1 generalizes the Bayesian model for network anomaly detection on traffic monitors and local analyzers. In Section 4.2.2, we extend this model to the whole autonomous system (AS). Section 4.2.3 introduces the hidden Markov tree model to decrease the computational complexity of the general model defined in Section 4.2.2.

… (Section 4.1.4)

Figure 4-1. Generative process in graphical representation, in which the traffic state generates the stochastic process of traffic.

In the context of network anomaly detection, there are two states of nature of an edge router, i.e., $\tilde{\mathcal{H}} = \{H_0, H_1\}$, where $H_0$ represents the normal state, in which case no abnormal network traffic enters the AS through that edge router, and $H_1$ represents the abnormal state, in which case abnormal network traffic enters the AS through that edge router.

… [31] to depict this cause-effect relationship in Figure 4-1.

Figure 4-2. Extended generative model including traffic feature vectors: (a) original model and (b) simplified model.

Denote by $\psi$, $\psi \in \mathcal{D}$, the feature extracted from the traffic $\xi$, where $\mathcal{D}$ is the feature space. Most importantly, in selecting the optimal features, we seek the most discriminative statistical properties of the traffic. Also note that it is possible to employ multiple features in the detection procedure, in which case $\psi$ is a vector. Since features are succinct representations of the voluminous traffic, we extend the model in Figure 4-1 to the one illustrated in Figure 4-2(a). Once $\psi$ is extracted from $\xi$, we assume that $\psi$ represents $\xi$ well. This means that we may operate only on the lower-dimensional $\psi$, which reduces computational complexity. Therefore, we simplify the model in Figure 4-2(a) to that illustrated in Figure 4-2(b), where $\xi$ is dismissed. Since the feature $\psi$ is measurable, it is called an observable random vector and is depicted by a rectangular node in Figure 4-2(b). The network state $\theta$ generating the traffic feature is to be estimated. We call it a hidden random variable and depict it by a round node in Figure 4-2(b). Now, the goal becomes to estimate the hidden state $\theta$ given the observable feature $\psi$.

The maximum gain criterion (see Equation (4-9)) specifies the estimate $\hat{u}$ to be
$$\hat{u} = \arg\max_{u \in \mathbb{Z}_2} \tilde{\lambda}(u)\, p(\psi \mid \theta = u)\, P(\theta = u).$$
… (Section 4.3).

Figure 4-3. Generative independent model that describes dependencies among traffic states and traffic feature vectors.

An AS has many edge routers, each of which has multiple links. Traffic monitors deployed on links and local analyzers on edge routers extract features and make decisions independently. Therefore, the one-link model in Figure 4-2(b) is further extended to the more general model for the AS illustrated in Figure 4-3, where $\eta$ stands for the number of edge routers. The limitation of the detection model in Figure 4-3 is that it assumes edge routers are mutually independent. This is because traffic monitors and local analyzers only have local information about the whole AS. Although this model is suitable for detecting network anomalies accompanied by high traffic volume on a single link, it is not suitable for low-volume network anomaly detection. We address this limitation by introducing spatial correlation in the next section.

Figure 4-4. Generative dependent model that describes dependencies among edge routers.

Introducing spatial correlation into the independent model in Figure 4-3 results in the dependent model illustrated in Figure 4-4. The difference between the two models is that, from the viewpoint of a global analyzer, edge routers are no longer independent. As a result, the statistical dependence among the states of edge routers is represented by undirected connections. Note that the independent model can be regarded as a special case of the dependent one. Also note that we still assume that features extracted from one edge router are independent of the states of other edge routers.

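… (4-10). We further assume that the gain factors are independent of the node index, i.e., $\tilde{\lambda}(u_i) = \tilde{\lambda}(u_{i'})$ whenever $u_i = u_{i'}$ (0 or 1), no matter whether $i$ equals $i'$ or not. Then the maximum gain criterion (see Equation (4-9)) for the dependent model is
$$\hat{\vec{u}} = \arg\max_{\vec{u}} \tilde{\lambda}(\vec{u})\, p(\vec{\psi} \mid \vec{\theta} = \vec{u})\, P(\vec{\theta} = \vec{u}) = \arg\max_{\vec{u}} \left[\prod_{i=1}^{\eta} \tilde{\lambda}(u_i)\, p(\psi_i \mid \theta_i = u_i)\right] P(\vec{\theta} = \vec{u}). \tag{4-15}$$
To solve Equation (4-15) directly, we need to exhaustively compute $p(\vec{\psi} \mid \vec{\theta})\, P(\vec{\theta})$ for each possible combination of $\vec{\theta}$, which results in $O(2^{\eta})$ complexity. For a large AS, this is intractable. We introduce a hierarchical structure to reduce the computational complexity, which is the topic of the next section.

The reason why the dependent model in Figure 4-4 becomes computationally intractable is that we assume edge routers are fully dependent. Intuitively, if we break some of the dependencies in Figure 4-4, we can reduce the computational complexity. On the other hand, we would like to account for the dependencies among as many nodes as possible to provide accurate detection. To balance these two conflicting requirements, we adopt the hidden Markov tree (HMT) model illustrated in Figure 4-5.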

Figure 4-5. Hidden Markov tree model. For a node $i$, $\rho(i)$ denotes its parent node and $\nu(i)$ denotes the set of its children nodes.

The motivation for applying the HMT model is that we assume edge routers are not equally correlated. Instead, edge routers topologically close to each other have high mutual correlations. Based on this assumption, we cluster edge routers according to the topology of the AS and form a tree structure, as depicted in Figure 4-5. Without loss of generality, Figure 4-5 plots a quad-tree structure, i.e., each node, except leaf nodes, has four children. To facilitate further discussion, each node in the HMT is assigned an integer number, beginning with 0, from top to bottom. That is, node 0 is always a root node. Table 4-2 lists the notations used for the HMT in the rest of this chapter. In the HMT, each leaf node stands for an edge router. Zero-padding virtual edge routers are introduced when the number of edge routers is not a power of $B$. The states of these zero-padding virtual nodes are always normal and their features are always 0. Non-leaf nodes represent clusters of edge routers.

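Table 4-2: Notations for the hidden Markov tree model
Notation | Description
$\theta_i$ | The random variable representing the state of node $i$.
$\psi_i$ | The random variable/vector representing the feature(s) measured at node $i$.
$\vec{\psi}_T$ | $\{\psi_i; i \in T\}$, where $T$ is a subtree of the HMT.
$L$ | The number of levels of the HMT.
$\Omega$ | The set of all nodes in the HMT.
$\Omega_l$ | The set of nodes at level $l$, $l \in \mathbb{Z}_L$, in the HMT. Specifically, $\Omega_0$ represents the set of root nodes and $\Omega_{L-1}$ the set of leaf nodes.
$B$ | The number of children of each node, except leaves. For example, $B = 4$ for the quad-HMT illustrated in Figure 4-5.
$\rho(i)$ | The parent node of node $i$, where $i \notin \Omega_0$.
$\nu(i)$ | The set of children of node $i$, where $i \notin \Omega_{L-1}$.
$\mathcal{T}_i$ | The set of ancestor nodes of node $i$, including node $i$, where $i \notin \Omega_0$.
$R(i)$ | The root node of the subtree containing node $i$, where $i \in \Omega$.
$T_i$ | The subtree whose root is node $i$, where $i \in \Omega$.
$T_{i \setminus j}$ | $T_i \setminus T_j$.
$T_{\setminus i}$ | $T_{R(i)} \setminus T_i$.

Features of nodes are defined in Equation (4-16):
$$\psi_i = \begin{cases} \text{features measured at the corresponding edge router}, & i \in \Omega_{L-1} \text{ (i.e., leaf node)}, \\ \ldots \end{cases} \tag{4-18}$$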
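To make the zero-padding rule and the size of the HMT concrete, the sketch below computes, for a single tree, the padded number of leaves (the smallest power of $B$ that is at least $\eta$), the number of levels $L$, and the total number of nodes. It assumes one root; the forest case in Table 4-2 (several roots) would change the totals. The function name is hypothetical.

```python
from typing import Tuple


def hmt_size(num_routers: int, branching: int) -> Tuple[int, int, int]:
    """(padded leaves, levels L, total nodes) of a single zero-padded B-ary HMT."""
    if num_routers < 1 or branching < 2:
        raise ValueError("need at least one router and a branching factor >= 2")
    leaves, levels = 1, 1
    while leaves < num_routers:  # pad up to the next power of B
        leaves *= branching
        levels += 1
    total_nodes = sum(branching ** level for level in range(levels))
    return leaves, levels, total_nodes


leaves, L, nodes = hmt_size(100, 4)
print(leaves, L, nodes)      # 256 5 341
print(2 ** 100 > 10 ** 30)   # True: exhaustive search over 2^eta joint states is hopeless
```

With $\eta = 100$ edge routers and a quad-tree, the HMT has only 341 nodes, whereas the exhaustive search of Section 4.2.2 would range over $2^{100}$ state combinations.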

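Similar to Sections 4.2.1 and 4.2.2, we employ the maximum gain criterion (see Equation (4-9)) to estimate node states, i.e.,
$$\hat{u}_i = \arg\max_{\vec{u}} \tilde{\lambda}(\vec{u})\, P(\vec{\theta} = \vec{u} \mid \vec{\psi}) = \arg\max_{\{u_{i'};\, i' \in \mathcal{T}_i\}} \left[\prod_{i' \in \mathcal{T}_i \setminus \{R(i)\}} \tilde{\lambda}(u_{i'})\, P(\theta_{i'} = u_{i'} \mid \theta_{\rho(i')} = u_{\rho(i')}, \vec{\psi})\right] \tilde{\lambda}(u_{R(i)})\, P(\theta_{R(i)} = u_{R(i)} \mid \vec{\psi}). \tag{4-19}$$
Using Equation (4-19), we reduce the computational complexity from $O(2^{\eta})$ (see Section 4.2.2) to $O(B\eta)$. This is the major advantage of introducing the HMT model. The details are given in Section 4.4. Solving Equation (4-19) requires knowledge of $P(\theta_i \mid \theta_{\rho(i)}, \vec{\psi})$ for $\forall i \in \Omega \setminus \Omega_0$, and $P(\theta_i = u_i \mid \vec{\psi})$ for $\forall i \in \Omega_0$. By the Bayesian formula [30],
$$P(\theta_i = u_i \mid \theta_{\rho(i)} = u_{\rho(i)}, \vec{\psi}) = \frac{P(\theta_i = u_i, \theta_{\rho(i)} = u_{\rho(i)} \mid \vec{\psi})}{P(\theta_{\rho(i)} = u_{\rho(i)} \mid \vec{\psi})}.$$
Hence, solving Equation (4-19) translates to estimating
$$P(\theta_i, \theta_{\rho(i)} \mid \vec{\psi}), \quad \forall i \in \Omega \setminus \Omega_0; \tag{4-21}$$
$$\ldots \tag{4-22}$$
Estimating (4-21) and (4-22) in closed form is difficult. We propose a belief propagation (BP) algorithm, described in Section 4.3, to estimate them efficiently given knowledge of the prior probabilities $P(\theta_i = 0)$, $i \in \Omega_0$; the likelihoods $p(\psi_i \mid \theta_i)$, $i \in \Omega$; and the transition probabilities $P(\theta_i \mid \theta_{\rho(i)})$, $i \in \Omega \setminus \Omega_0$.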

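We set $P(\theta_i = 0) = 1/2$, $i \in \Omega_0$. That is, the states of root nodes are equally likely to be normal or abnormal. Other parameters, such as the likelihoods and transition probabilities, are estimated from training data, as covered in Section 4.3. After that, classification using the maximum gain criterion is described in Section 4.4.

In Section 4.3.1, we describe the estimation of the likelihood $p(\psi_i \mid \theta_i)$, $i \in \Omega$. The estimation of the transition probabilities $P(\theta_i \mid \theta_{\rho(i)})$, $i \in \Omega \setminus \Omega_0$, is presented in Section 4.3.2.

The Gaussian distribution [30] is widely employed in many applications. The pdf of a $d$-dimensional multivariate Gaussian distribution with mean vector $\mu$ and covariance matrix $\Sigma$ is
$$\mathcal{N}(x; \mu, \Sigma) \triangleq \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^{T} \Sigma^{-1} (x - \mu)\right).$$
Figure 4-6 plots the pdf of the univariate Gaussian distribution $\mathcal{N}(x; 0, 1)$. It is observed that the Gaussian distribution is a unimodal distribution [30], i.e., its pdf has only one peak.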

Figure 4-6. Probability density function of the univariate Gaussian distribution $\mathcal{N}(x; 0, 1)$.

Figure 4-7. Histogram of the two-way matching features measured at a real network during network anomalies.

However, multiple peaks may exist in the empirical distribution of $\psi_i \mid \theta_i$. For example, Figure 4-7 shows the histogram of the two-way matching features measured in a real network during DDoS attacks. It has two peaks. Hence, the unimodal Gaussian distribution is not suitable. In this dissertation, we adopt the Gaussian mixture model (GMM) to model the likelihood distribution.
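A univariate GMM likelihood is simply a weighted sum of Gaussian densities, which is why it can reproduce the two peaks of Figure 4-7. The sketch below evaluates such a density for a scalar feature; the parameter values are made up for illustration and do not come from the measured data.

```python
import math
from typing import Sequence


def gmm_pdf(x: float, weights: Sequence[float], means: Sequence[float],
            stds: Sequence[float]) -> float:
    """Univariate G-state GMM density: sum_g w(g) * N(x; mean(g), std(g)^2)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    density = 0.0
    for w, m, s in zip(weights, means, stds):
        z = (x - m) / s
        density += w * math.exp(-0.5 * z * z) / (s * math.sqrt(2.0 * math.pi))
    return density


# Hypothetical two-component likelihood with peaks near 2 and 8.
w, mu, sd = [0.6, 0.4], [2.0, 8.0], [1.0, 1.5]
print(gmm_pdf(2.0, w, mu, sd) > gmm_pdf(5.0, w, mu, sd) < gmm_pdf(8.0, w, mu, sd))  # True
```

The density is higher at both component means than at the point between them, i.e., the mixture is bimodal, unlike a single Gaussian.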

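… By Equation (4-26), the likelihood estimation translates to estimating the GMM parameters $\pi_{i,u}(g)$, $\mu_{i,u}(g)$, and $\sigma_{i,u}(g)$ for $\forall i \in \Omega$, $\forall u \in \{0, 1\}$, and $\forall g \in \{1, \ldots, G\}$, with the constraint
$$\sum_{g=1}^{G} \pi_{i,u}(g) = 1.$$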
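A minimal 1-D sketch of fitting these parameters with the EM algorithm (the approach adopted in Figure 4-8) follows. It uses the standard E-step (responsibilities) and M-step (weights, means, then variances with the new means, as in line 7 of Figure 4-8). The convergence tolerance, the iteration cap, and the floor on $\sigma$ are assumptions added to keep the sketch numerically safe; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _normal_pdf(x: float, mu: float, sigma: float) -> float:
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_gmm_1d(data: Sequence[float], pi0: Sequence[float], mu0: Sequence[float],
              sigma0: Sequence[float], tol: float = 1e-8, max_iter: int = 500,
              min_sigma: float = 1e-6) -> Tuple[List[float], List[float], List[float]]:
    """Fit a 1-D G-state GMM by EM, starting from (pi0, mu0, sigma0)."""
    pi, mu, sigma = list(pi0), list(mu0), list(sigma0)
    G, K = len(pi), len(data)
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step: responsibility of component g for each sample.
        resp, ll = [], 0.0
        for x in data:
            w = [pi[g] * _normal_pdf(x, mu[g], sigma[g]) for g in range(G)]
            total = max(sum(w), 1e-300)  # guard against underflow
            ll += math.log(total)
            resp.append([wg / total for wg in w])
        # M-step: weights, means, then variances computed with the new means.
        for g in range(G):
            n_g = sum(r[g] for r in resp)
            pi[g] = n_g / K
            if n_g <= 0.0:
                continue  # empty component: zero weight, keep old mean/std
            mu[g] = sum(r[g] * x for r, x in zip(resp, data)) / n_g
            var = sum(r[g] * (x - mu[g]) ** 2 for r, x in zip(resp, data)) / n_g
            sigma[g] = max(math.sqrt(var), min_sigma)
        if ll - prev_ll < tol:  # log-likelihood stopped improving
            break
        prev_ll = ll
    return pi, mu, sigma


data = [0.0, 0.1, -0.1, 0.05, -0.05, 5.0, 5.1, 4.9, 5.05, 4.95]
pi, mu, sigma = em_gmm_1d(data, [0.5, 0.5], [1.0, 4.0], [1.0, 1.0])
print([round(m, 2) for m in mu], [round(p, 2) for p in pi])
```

The initial values play the role of $\pi^{(0)}$, $\mu^{(0)}$, and $\sigma^{(0)}$; as discussed below, different starting points can lead EM to different local maxima, which is why prior knowledge of the features helps.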

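… (4-26). Unfortunately, Nechyba [32] showed that the ML method for a $G$-state GMM with $G > 1$ has no closed-form solution. In addition, a $G$-state GMM has a $3G$-dimensional continuous parameter space. Exhaustively searching for a numerical ML estimate in such a parameter space is computationally intractable.

1. Input: $\psi^{(k)}_{i,u}$, $k \in \{1, \ldots, K_u\}$; $\pi^{(0)}_{i,u}(g)$, $\mu^{(0)}_{i,u}(g)$, and $\sigma^{(0)}_{i,u}(g)$, $g \in \{1, \ldots, G\}$.
2. Output: $\hat{\pi}_{i,u}(g)$, $\hat{\mu}_{i,u}(g)$, and $\hat{\sigma}_{i,u}(g)$, $g \in \{1, \ldots, G\}$.
3. …
4. repeat
5.   …
6.   …
7.   $\left(\sigma^{(j+1)}_{i,u}(g)\right)^2 = \dfrac{\sum_{k=1}^{K_u} \vartheta^{(j)}_{i,u}(g; \psi^{(k)}_{i,u})\, \| \psi^{(k)}_{i,u} - \mu^{(j+1)}_{i,u}(g) \|^2}{\sum_{k=1}^{K_u} \vartheta^{(j)}_{i,u}(g; \psi^{(k)}_{i,u})}$
8.   …
9.   …
10. until converge
11. $\hat{\pi}_{i,u}(g) = \pi^{(j)}_{i,u}(g)$, $\hat{\mu}_{i,u}(g) = \mu^{(j)}_{i,u}(g)$, $\hat{\sigma}_{i,u}(g) = \sigma^{(j)}_{i,u}(g)$, $\forall g \in \{1, \ldots, G\}$.
Figure 4-8. The EM algorithm for estimating $p(\psi_i \mid \theta_i = u)$, $i \in \Omega$, $u \in \{0, 1\}$.

A practical solution to this issue is the expectation-maximization (EM) algorithm [29, 30]. Nechyba [32] derived the EM algorithm for GMM in detail. Figure 4-8 illustrates the algorithm. The EM algorithm requires initial values for the parameters, denoted by $\pi^{(0)}_{i,u}(g)$, $\mu^{(0)}_{i,u}(g)$, and $\sigma^{(0)}_{i,u}(g)$ in Figure 4-8. At each iteration $j$, the EM algorithm uses the parameters estimated at iteration $j-1$ to calculate new estimates. Although both EM and ML …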

… (4-28). As a result, the EM algorithm converges much faster than the numerical ML method. However, the disadvantage of the EM algorithm is that it converges to a local maximum rather than the global one. Specifically, the initial values of the parameters determine the local maximum to which the EM algorithm converges. In practice, we have prior knowledge of network features, which helps in choosing the initial values of the parameters. So far, we have presented schemes to estimate the likelihood of the HMT. In the next section, we estimate the transition probabilities.

Figures 4-9 and 4-10 show the pseudo-code for transition probability estimation. In the next two sections, we explain the two figures.

Figure 4-9 shows the pseudo-code to estimate transition probabilities. The function TransProbEstimate takes two sets of arguments:
1. the likelihood estimated in Section 4.3.1, $\{p(\psi_i \mid \theta_i); i \in \Omega\}$;
2. the training features, $\{\vec{\psi}^{(k)}; k \in \{1, \ldots, K\}\}$.
It returns the estimate of the transition probabilities, i.e., $P(\theta_i \mid \theta_{\rho(i)})$, $\forall i \in \Omega \setminus \Omega_0$. Before the iterations, we set the initial transition probabilities to $1/2$ at line 5 of Figure 4-9. This is equivalent to assuming that the normal state and the abnormal state are initially equally likely.

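1. function TransProbEstimate(...)
2.   Argument 1: likelihood, $\{p(\psi_i \mid \theta_i); i \in \Omega\}$.
3.   Argument 2: training data, $\{\vec{\psi}^{(k)}; k \in \{1, \ldots, K\}\}$.
4.   Return: transition probability estimate, $P(\theta_i = u_i \mid \theta_{\rho(i)})$, $i \in \Omega \setminus \Omega_0$.
5.   $P^{(0)}(\theta_i = u \mid \theta_{\rho(i)} = u') = 1/2$, for $\forall i \in \Omega \setminus \Omega_0$, $\forall u, u' \in \{0, 1\}$.
6.   …
7.   repeat
8.     for $k \leftarrow 1$ to $K$
9.       …
10.    endfor
11.    for $\forall i \in \Omega \setminus \Omega_0$
12.      $P^{(j+1)}(\theta_i \mid \theta_{\rho(i)}) = \dfrac{1}{K} \sum_{k=1}^{K} \dfrac{P^{(j)}(\theta_i, \theta_{\rho(i)} \mid \vec{\psi}^{(k)})}{P^{(j)}(\theta_{\rho(i)} \mid \vec{\psi}^{(k)})}$   (4-30)
13.    endfor
14.    …
15.  until converge
16.  …
Figure 4-9. Iteratively estimate transition probabilities.

Then, at each iteration, we update the estimate of the transition probabilities until it converges. The update procedure is as follows. First, we iterate over the training feature set. For each feature, we use the BP algorithm (see Figure 4-10) to estimate the posterior probabilities given that feature. The details of the BP algorithm are discussed in the next section. Three sets of arguments are passed to the BP algorithm:
1. the estimate of the transition probabilities obtained at the previous iteration, $P^{(j)}(\theta_i \mid \theta_{\rho(i)})$, $i \in \Omega \setminus \Omega_0$;
2. the likelihood, $\{p(\psi_i \mid \theta_i); i \in \Omega\}$, which is the argument passed to function TransProbEstimate;
3. the training feature, $\vec{\psi}^{(k)}$.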

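1. function BP(...)
2.   Argument 1: transition probabilities, $P(\theta_i \mid \theta_{\rho(i)})$, $i \in \Omega \setminus \Omega_0$.
3.   Argument 2: likelihood, $\{p(\psi_i \mid \theta_i); i \in \Omega\}$.
4.   Argument 3: training feature, $\vec{\psi}$.
5.   Return: posterior probabilities, $\{P(\theta_{\rho(i)} \mid \vec{\psi}), P(\theta_i, \theta_{\rho(i)} \mid \vec{\psi}); i \in \Omega \setminus \Omega_0\}$.
6.   $\alpha_i(0) = \alpha_i(1) = 1/2$, for $\forall i \in \Omega_0$ (i.e., roots).
7.   $\beta_i(u) = p(\psi_i \mid \theta_i = u)$, for $\forall i \in \Omega_{L-1}$ (i.e., leaves), $\forall u \in \{0, 1\}$.
8.   Top-down pass, i.e., from root to leaf:
9.   for $l \leftarrow 1, \ldots, L-1$
10.    for $\forall i \in \Omega_l$, $\forall u \in \{0, 1\}$, let
11.      $\alpha_i(u) = \sum_{u' \in \{0, 1\}} P(\theta_i = u \mid \theta_{\rho(i)} = u')\, \alpha_{\rho(i)}(u')\, p(\psi_{\rho(i)} \mid \theta_{\rho(i)} = u')$   (4-31)
12.    endfor
13.  endfor
14.  Bottom-up pass, i.e., from leaf to root:
15.  for $l \leftarrow L-2, \ldots, 0$
16.    for $\forall i \in \Omega_l$, $\forall u \in \{0, 1\}$, let
17.      …   (4-32)
18.    endfor
19.  endfor
20.  …   (4-33)
21.  …   (4-34)
Figure 4-10. Belief propagation algorithm.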

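… (4-30) for $\forall i \in \Omega \setminus \Omega_0$ and step to the next iteration. When the estimates converge, the iteration stops and function TransProbEstimate returns the estimates obtained at the last iteration. The validity of Equation (4-30) is shown in the following. For $\forall i \in \Omega \setminus \Omega_0$,
$$P(\theta_i \mid \theta_{\rho(i)}) = \int P(\theta_i \mid \theta_{\rho(i)}, \vec{\psi})\, p(\vec{\psi})\, d\vec{\psi} = E_{\vec{\psi}}\left[P(\theta_i \mid \theta_{\rho(i)}, \vec{\psi})\right]; \tag{4-35}$$
… [30], we estimate $E_{\vec{\psi}}\left[P(\theta_i \mid \theta_{\rho(i)}, \vec{\psi})\right]$ by
$$\frac{1}{K} \sum_{k=1}^{K} P(\theta_i \mid \theta_{\rho(i)}, \vec{\psi}^{(k)}). \tag{4-36}$$
From Equations (4-35) and (4-36), we obtain Equation (4-30). Next, we discuss the belief propagation algorithm, which is called by function TransProbEstimate.

Belief propagation [33-35], also known as the sum-product algorithm, e.g., [36-39], is an important method for computing approximate marginal distributions. In this dissertation, we apply the BP algorithm to estimate posterior probabilities (Figure 4-10).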

Function BP initializes $\alpha_i(0) = 1/2$ and $\alpha_i(1) = 1/2$ for all root nodes $i$ (see line 6 of Figure 4-10), because we assume root nodes are equally likely to be in the normal or abnormal state. When $i \in \Omega_{L-1}$, i.e., for leaf nodes, $\vec{\psi}_{T_i} = \psi_i$; therefore, $\beta_i(u) = p(\psi_i \mid \theta_i = u)$, $u \in \{0, 1\}$ (see line 7 of Figure 4-10). Then it propagates the beliefs at the tree roots and leaves to the other nodes in the top-down pass and the bottom-up pass, respectively. During the top-down pass, function BP iterates from root to leaf. At each level $l$, we update the transitory variables $\alpha_i(u)$ by Equation (4-31), which is proven in Appendix A.1. Note that $\alpha_{\rho(i)}(u')$ in Equation (4-31) is obtained in the previous iteration, i.e., the iteration at level $l-1$. During the bottom-up pass, function BP iterates from leaf to root. At each level $l$, we update the transitory variables $\beta_i(u)$ by Equation (4-32), which is proven in Appendix A.2. Also note that $\beta_j(u')$ is obtained in the previous iteration, i.e., the iteration at level $l+1$.
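For illustration, the sketch below implements the textbook exact upward-downward (sum-product) recursions on a single binary-state tree and then applies one update in the style of Equation (4-30), i.e., it averages $P(\theta_i, \theta_{\rho(i)} \mid \vec{\psi}^{(k)}) / P(\theta_{\rho(i)} \mid \vec{\psi}^{(k)})$ over the training features. Unlike Figure 4-10, it runs the bottom-up pass first so that the top-down messages also include sibling subtrees; this ordering is an assumption and may differ from the exact recursions (4-31) to (4-34). Nodes are assumed numbered top to bottom (every parent has a smaller index than its children), and the data layout (`parent`, `lik[i][u]`, `trans[i][a][b]`) is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Table = List[List[float]]


def bp_tree(parent: Sequence[int], prior: Sequence[float],
            lik: Sequence[Sequence[float]],
            trans: Sequence[Optional[Table]]) -> Tuple[List[List[float]], List[Table]]:
    """Exact posteriors on one tree with binary states.

    parent[0] == -1 (root), parent[i] < i otherwise; lik[i][u] = p(psi_i | theta_i = u);
    trans[i][a][b] = P(theta_i = b | theta_parent = a) (trans[0] is unused).
    Returns post[i][u] = P(theta_i = u | psi) and
    pair[i][a][b] = P(theta_parent = a, theta_i = b | psi).
    """
    n = len(parent)
    children: List[List[int]] = [[] for _ in range(n)]
    for i in range(1, n):
        children[parent[i]].append(i)
    # Bottom-up: beta[i][u] = p(psi over subtree T_i | theta_i = u);
    # msg[i][a] is child i's message to its parent in state a.
    beta = [[0.0, 0.0] for _ in range(n)]
    msg = [[1.0, 1.0] for _ in range(n)]
    for i in reversed(range(n)):
        for u in (0, 1):
            val = lik[i][u]
            for c in children[i]:
                val *= msg[c][u]
            beta[i][u] = val
        if i > 0:
            msg[i] = [sum(trans[i][a][b] * beta[i][b] for b in (0, 1)) for a in (0, 1)]
    # Top-down: alpha[i][u] = p(theta_i = u, psi outside T_i).
    alpha = [[0.0, 0.0] for _ in range(n)]
    alpha[0] = list(prior)
    excl = [[0.0, 0.0] for _ in range(n)]  # parent's evidence excluding subtree of i
    for i in range(1, n):
        p = parent[i]
        for a in (0, 1):
            val = lik[p][a]
            for s in children[p]:
                if s != i:
                    val *= msg[s][a]
            excl[i][a] = val
        alpha[i] = [sum(alpha[p][a] * excl[i][a] * trans[i][a][b] for a in (0, 1))
                    for b in (0, 1)]
    z = sum(alpha[0][u] * beta[0][u] for u in (0, 1))  # p(psi)
    post = [[alpha[i][u] * beta[i][u] / z for u in (0, 1)] for i in range(n)]
    pair: List[Table] = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(n)]
    for i in range(1, n):
        p = parent[i]
        pair[i] = [[alpha[p][a] * excl[i][a] * trans[i][a][b] * beta[i][b] / z
                    for b in (0, 1)] for a in (0, 1)]
    return post, pair


def estimate_transitions(parent, prior, liks_per_sample, trans):
    """One update in the style of Equation (4-30), averaged over K training features."""
    n, K = len(parent), len(liks_per_sample)
    new = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(n)]
    for lik in liks_per_sample:
        post, pair = bp_tree(parent, prior, lik, trans)
        for i in range(1, n):
            for a in (0, 1):
                for b in (0, 1):
                    new[i][a][b] += pair[i][a][b] / post[parent[i]][a] / K
    return new


parent = [-1, 0]
prior = [0.5, 0.5]
lik = [[1.0, 1.0], [0.1, 0.9]]
trans = [None, [[0.9, 0.1], [0.2, 0.8]]]
post, pair = bp_tree(parent, prior, lik, trans)
print(round(post[1][1], 4))                    # 0.8804
new = estimate_transitions(parent, prior, [lik], trans)
print([round(v, 4) for v in new[1][1]])        # [0.027, 0.973]
```

On small trees the posteriors can be checked against brute-force enumeration of all $2^{|\Omega|}$ state combinations, which is exactly the cost the tree recursions avoid.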

… (4-33) and (4-34). These two equations are proven in Appendices A.3 and A.4, respectively. The estimated posterior probabilities are used in function TransProbEstimate to update the estimates of the transition probabilities (see Figure 4-9). So far, we have established the HMT model and described approaches to estimate its parameters from training data. Next, we present network anomaly detection approaches using the fully determined HMT model.

… (4-19). We rewrite the criterion in Equation (4-39),
$$\hat{u}_i = \arg\max_{\{u_{i'};\, i' \in \mathcal{T}_i\}} \left[\prod_{i' \in \mathcal{T}_i \setminus \{R(i)\}} \tilde{\lambda}(u_{i'})\, P(\theta_{i'} = u_{i'} \mid \theta_{\rho(i')} = u_{\rho(i')}, \vec{\psi})\right] \tilde{\lambda}(u_{R(i)})\, P(\theta_{R(i)} = u_{R(i)} \mid \vec{\psi}). \tag{4-39}$$
Figure 4-11 shows the pseudo-code for the classification algorithm. Function ViterbiDecodeHMT takes three arguments. The first two arguments, the transition probabilities and the likelihood, are estimated during the training phase. The last one is the extracted features, based on which we perform anomaly detection. The function returns the estimates of the node states. Among them, we are only interested in the states of leaf nodes.

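1. function ViterbiDecodeHMT(...)
2.   Argument 1: transition probabilities, $P(\theta_i \mid \theta_{\rho(i)})$, $i \in \Omega \setminus \Omega_0$.
3.   Argument 2: likelihood, $\{p(\psi_i \mid \theta_i); i \in \Omega\}$.
4.   Argument 3: feature, $\vec{\psi}$.
5.   Return: estimated node states, $\{\hat{u}_i; i \in \Omega\}$.
6.   …
7.   …
8.   for $\forall i \in \Omega_0$
9.     $\hat{u}_i = \arg\max_{u_i} P(\theta_i = u_i \mid \vec{\psi})$
10.  endfor
11.  for $l \leftarrow 1, \ldots, L-1$, for $\forall i \in \Omega_l$
12.    $\hat{u}_i = \arg\max_{u_i} \tilde{\lambda}(u_i)\, P(\theta_i = u_i \mid \theta_{\rho(i)} = \hat{u}_{\rho(i)}, \vec{\psi}) \left[\prod_{i' \in \mathcal{T}_{\rho(i)} \setminus \{R(i)\}} \tilde{\lambda}(\hat{u}_{i'})\, P(\theta_{i'} = \hat{u}_{i'} \mid \theta_{\rho(i')} = \hat{u}_{\rho(i')}, \vec{\psi})\right] \tilde{\lambda}(\hat{u}_{R(i)})\, P(\theta_{R(i)} = \hat{u}_{R(i)} \mid \vec{\psi})$
13.  endfor
Figure 4-11. Viterbi algorithm for HMT decoding.

By the Bayesian formula, the terms in Equation (4-39) can be computed by
$$P(\theta_{i'} = u_{i'} \mid \theta_{\rho(i')} = u_{\rho(i')}, \vec{\psi}) = \frac{P(\theta_{i'} = u_{i'}, \theta_{\rho(i')} = u_{\rho(i')} \mid \vec{\psi})}{P(\theta_{\rho(i')} = u_{\rho(i')} \mid \vec{\psi})}.$$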

… (4-40) and (4-41) in Figure 4-11) to solve Equation (4-43) in a way similar to HMT transition probability estimation (see Figure 4-10). Having obtained solutions to Equation (4-43) for all nodes, we can solve Equation (4-39). A brute-force solution to Equation (4-39) is to exhaustively compute
$$\left[\prod_{i' \in \mathcal{T}_i \setminus \{R(i)\}} \tilde{\lambda}(u_{i'})\, P(\theta_{i'} = u_{i'} \mid \theta_{\rho(i')} = u_{\rho(i')}, \vec{\psi})\right] \tilde{\lambda}(u_{R(i)})\, P(\theta_{R(i)} = u_{R(i)} \mid \vec{\psi})$$
… (see Figure 4-4), whose complexity is $O(2^{\eta})$. In this dissertation, we apply the Viterbi algorithm [40-42] to solve Equation (4-39) iteratively, reducing the computational complexity to $O(B\eta)$. The motivation of the Viterbi algorithm is as follows. It iterates from the top level of an HMT to the bottom level. At each iteration, it estimates the node states in that level in a way that, when combined with the states estimated in upper levels, "best" explains the observed features. That is, at each iteration, the Viterbi algorithm always selects the local maximum.
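A minimal sketch of the top-down decoding in Figure 4-11 follows. For a non-root node, the bracketed product and the root term in line 12 are already fixed by earlier levels, so the argmax reduces to $\tilde{\lambda}(u_i) P(\theta_i = u_i \mid \theta_{\rho(i)} = \hat{u}_{\rho(i)}, \vec{\psi})$, with the conditional obtained from the BP posteriors via the Bayesian formula above. The sketch also applies the gain factor at the roots, following Equation (4-39). The input layout and function names are hypothetical.

```python
from typing import List, Optional, Sequence


def conditional_from_posteriors(pair_i: Sequence[Sequence[float]],
                                post_parent: Sequence[float]) -> List[List[float]]:
    """cond[a][u] = P(theta_parent = a, theta_i = u | psi) / P(theta_parent = a | psi)."""
    return [[pair_i[a][u] / post_parent[a] for u in (0, 1)] for a in (0, 1)]


def viterbi_decode_hmt(parent: Sequence[int],
                       root_post: Sequence[Optional[Sequence[float]]],
                       cond: Sequence[Optional[Sequence[Sequence[float]]]],
                       gains: Sequence[float]) -> List[int]:
    """Greedy top-down decoding of binary node states.

    parent[i] == -1 for roots, otherwise parent[i] < i (top-to-bottom numbering).
    root_post[i][u] = P(theta_i = u | psi) for roots (None elsewhere).
    cond[i][a][u] = P(theta_i = u | theta_parent = a, psi) for non-roots (None for roots).
    gains[u] is the gain factor of state u (gains[0] = 1 by convention).
    """
    u_hat: List[int] = []
    for i, p in enumerate(parent):
        if p < 0:
            scores = [gains[u] * root_post[i][u] for u in (0, 1)]
        else:
            scores = [gains[u] * cond[i][u_hat[p]][u] for u in (0, 1)]
        u_hat.append(0 if scores[0] >= scores[1] else 1)
    return u_hat


parent = [-1, 0, 0]
root_post = [[0.3, 0.7], None, None]
cond = [None, [[0.9, 0.1], [0.4, 0.6]], [[0.8, 0.2], [0.7, 0.3]]]
print(viterbi_decode_hmt(parent, root_post, cond, [1.0, 1.0]))  # [1, 1, 0]
print(viterbi_decode_hmt(parent, root_post, cond, [1.0, 3.0]))  # [1, 1, 1]
```

Each node is visited once and only two candidate states are compared there, which is what keeps the decoding cost linear in the number of HMT nodes.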

… (4-39), the Viterbi algorithm is efficient and performs well empirically. The computational complexity of the Viterbi algorithm is $O(B\eta)$, much better than that of the dependent model illustrated in Figure 4-4, whose complexity is $O(2^{\eta})$. The improvement results from the fact that the Viterbi algorithm does not exhaustively test all possible node state combinations. Instead, we decompose the decoding problem into multiple stages, each of which decodes the node states at one level of the HMT. At level $l$, we estimate the node states of level $l$, i.e., $\{\theta_i; i \in \Omega_l\}$, based on the results obtained during the previous stages, i.e., $\{\theta_i; i \in \Omega_0 \cup \cdots \cup \Omega_{l-1}\}$. In such a procedure, each node is accessed only twice. Hence the complexity is linear in the number of nodes, which is on the order of $B^{\lceil \log_B \eta \rceil}$ …

… Figure 4-12. At each edge router, two monitors are placed to measure the inbound and outbound traffic between a subnet and the victim network, respectively.

Figure 4-12. Experiment network.

For convenience, we denote a link as the route between an edge router and the victim subnet.

… Section 3.7.2 [27]. Since we do not have real data traces obtained from 16 different links, we use the real traffic trace measured on one link (between the Internet and Auckland University) on 16 different days to create traffic traces for 16 different links. For the abnormal traffic, we randomly inject TCP SYN flood attacks into the background trace. Specifically, we generate several attack scenarios. For each scenario, we randomly select the abnormal links and attack durations. Attack traffic on each link is generated in the same way as in Section 3.7.2. That is, we randomly insert TCP SYN packets with random source IP addresses into the background traffic of that link. The average packet rate of TCP SYN attack traffic on each selected link is 1% of the total packet rate on the link. For each attack scenario, attacks on each of the selected links are launched during almost the same period to simulate synchronized DDoS attacks. Since the attack traffic on each link is low (just 1%), we effectively simulate low-volume attack traffic.

… Chapter 3, i.e., the number of unmatched inbound SYN packets in one time slot. The parameter setting of two-way matching feature extraction is the same as in Section 3.7.2. For convenience, we summarize the parameters in Table 4-3.

Table 4-3: Parameter setting of feature extraction for network anomaly detection
Notation | Description
…

For comparison purposes, we also measure the number of SYN packets and the SYN/FIN ratio [11] in a slot.

Table 4-4: Performance of different schemes
Feature | Detection algorithm | Detection probability | False alarm probability
SYN/FIN ratio | CUSUM | 0.174 | 0.129
SYN | CUSUM | 0.52 | 0.129
SYN | Machine learning | 0.656 | 0.123
Unmatched SYN | CUSUM | 0.690 | 0.130
Unmatched SYN | Machine learning | 0.973 | 0.115

Table 4-4 compares the performance of different schemes, where the benchmark is the scheme in [11], i.e., the CUSUM scheme with the SYN/FIN ratio as the feature. For the benchmark scheme, we use the same parameter setting as in [11], and we compare the benchmark with CUSUM and our machine learning algorithm under different features. To make a fair comparison, we make the false alarm probability of each scheme almost the same and compare the detection probability. From Table 4-4, it can be seen that the benchmark scheme ('SYN/FIN ratio' + CUSUM) performs very poorly in detecting low-volume DDoS attacks. In contrast, a CUSUM algorithm with the number of SYN …

Figure 4-13. Performance of threshold-based and machine learning algorithms with different feature data.

Figure 4-13 compares the ROC curve of the threshold-based scheme described in Section 4.1.2 with that of our machine learning algorithm under two different features, i.e., the number of SYN packets (denoted by 'SYN') and the number of unmatched SYN packets (denoted by 'UM-SYN'). We observe that, for the same detection algorithm, using the number of unmatched SYN packets can significantly improve the ROC performance, compared to using the number of SYN packets. In other words, given the same false alarm probability, the detection probability is much higher when using the number of unmatched SYN packets as the feature. Another important observation from Figure 4-13 is that, given the same feature data, our machine learning algorithm can (significantly) improve the ROC compared to the threshold-based scheme; e.g., for the same false alarm probability of 0.05, our machine learning algorithm achieves a detection probability of 0.93, while the threshold-based scheme only achieves a detection probability of 0.72. This is due to the fact that our …

Figure 4-14. Performance of four detection algorithms.

In Figure 4-14, we compare the ROC performance of four detection algorithms (the threshold-based algorithm, the single-CUSUM and dual-CUSUM algorithms described in Section 4.1.3, and our machine learning algorithm) under the same feature, i.e., the number of unmatched SYN packets. For the single-CUSUM and the dual-CUSUM algorithms, the detection delay $D$ is chosen from 1 to 10 slots and the parameter $\beta_{a,i}$ of link $i$ is determined by $\beta_{a,i} = (D_{\text{attack}} - D_{\text{normal}})_i$ … Figure 4-14) of our machine learning algorithm is the best among all the algorithms. We also see that the dual-CUSUM algorithm outperforms the simple threshold-based algorithm, and the single-CUSUM algorithm has the worst ROC performance.

… [27]. We tested our machine learning algorithms on a large IP address space, i.e., the IP address space can be the whole IP address space of the Internet.

In Chapters 5 and 6, we focus on the second part of our research, i.e., network centric traffic classification. This chapter motivates the significance of this issue, points out its challenges, and shows the weaknesses of existing solutions.

… [43, 44]. In principle, the TCP and UDP server port numbers can be used to identify the higher-layer application, by simply identifying the server port and mapping this port to an application using the Internet Assigned Numbers Authority (IANA) list of registered ports [45]. However, port-based application classification has limitations due to the emergence of new applications that no longer use fixed, predictable port numbers. For example, non-privileged users often have to use ports above 1024 to circumvent operating system access control restrictions; common applications like FTP allow the negotiation of unknown and unpredictable server ports to be used for the data transfer; and proprietary applications may deliberately try to hide their existence or bypass port-based filters by using standard ports. For example, server port 80 is being used by a large variety of non-web applications to circumvent firewalls that do not filter port-80 traffic; others (e.g., Skype) tend to use dynamic ports. A more reliable technique involves stateful reconstruction of session and application information from packet contents [46-48]. Although this avoids reliance on fixed port numbers, it imposes significant complexity and processing load on the classification device, …

… [49, 50] have taken this statistical approach to classify traffic into p2p, multimedia streaming, interactive, and bulk transfer applications. Unfortunately, although these papers addressed the problem of distinguishing multimedia traffic from other applications, they have not addressed the problem of distinguishing voice traffic from video traffic. Separating streaming traffic from other applications is one problem; detecting and correctly classifying voice and video, and clearly separating the two applications from each other, is a different problem. In the extreme case, voice and/or video data streams might even be bundled together in the same exact flow with other applications. These problems are common for many applications like Skype, GTalk, and MSN that allow users to mix voice and/or video streams with chat and/or file transfer traffic in the same exact 5-tuple flow, defined as ⟨source IP address, destination IP address, source port, destination port, protocol⟩. In such cases, one flow may carry …

In Section 5.2, we introduce the related work in the area of pattern classification methodologies. Section 5.3 describes the weaknesses of metrics previously used by other works when applied in our context and highlights the new traffic features that constitute the foundation of our approach. Section 5.4 summarizes this chapter.

In [49], the authors proposed the combination of the average packet size within a flow and the inter-arrival variability metric, e.g., defined as the ratio of the variance to the average of the inter-arrival times of packets within a flow, as a powerful metric to define fairly distinct boundaries for three groups of applications: (i) bulk data transfer like FTP, (ii) interactive like HTTP, and (iii) streaming like voice, video, gaming, etc. Several classification techniques, like nearest-neighbor and K-nearest-neighbor, were then tested using the above traffic features. Although this preliminary study has shown that pattern classification has great potential for proper application classification, it also shows that much work still remains, e.g., exploring alternative traffic features and classification techniques. Moreover, although the extracted features are simple and feasible to compute on the fly, the learning algorithm is complex and the resulting boundaries among the three families of applications are heavily non-linear and time-dependent. Similar to Ref. [49], Karagiannis et al. [50] proposed a novel approach, called BLINC, that exploits network-related properties and characteristics. The novelty of this approach …

Although the approach in [50] is interesting from a conceptual perspective and proved to perform reasonably well for a variety of different applications, it is still prone to large estimation errors for streaming applications. Moreover, its high complexity and large memory consumption remain an open issue for high-speed application classification. Other papers using pattern classification have appeared recently in the literature, but they focus on the detection of specific applications like peer-to-peer [46] and chat [51]. More importantly, to the best of our knowledge, none of the existing work has been able to separate voice traffic from video traffic, or to indicate the presence of voice or video traffic in a hybrid flow that contains traffic from both voice/video and other applications such as file transfer.

In Figure 5-1 we show the results obtained when using the combination of the average packet size and the inter-arrival variability metric proposed by Roughan et al. [49].

Although this metric performed very well in separating streaming, file transfer, transactional, and interactive applications, it performs poorly when used to further separate applications within the same family, such as voice, video, or voice and video mixed with other applications like file transfer (i.e., hybrid flows). Figure 5-1 clearly highlights the complete absence of any distinct boundary and the heavy overlap between voice and video traffic. The reasons why the pair (average packet size, inter-arrival variability metric) cannot separate video from voice are as follows. First, the packet size for video/voice is controlled by the packetization strategy of the video/voice application designer [52]; hence, a video application may produce an average packet size similar to that for voice (Figure 5-1). Second, random end-to-end delay in the Internet causes large variations in the inter-arrival variability metric for different video/voice flows.

Figure 5-1. Average packet size versus inter-arrival variability metric for 5 applications: voice, video, file transfer, file transfer mixed with voice, and file transfer mixed with video.

Table 5-1: Commonly used speech codecs and their specifications
Standard | Codec method | Inter-packet delay (ms)
G.711 [53] | PCM | 0.125
G.726 [54] | ADPCM | 0.125
G.728 [55] | LD-CELP | 0.625
G.729 [56] | CS-ACELP | 10
G.729A [56] | CS-ACELP | 10
G.723.1 [57] | MP-MLQ | 30
G.723.1 [57] | ACELP | 30
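The two features of [49] plotted in Figure 5-1 are cheap to compute per flow. The sketch below follows the definition given above (average packet size, and the ratio of the variance to the mean of the inter-arrival times); it is an illustration under that reading, not the code of [49], and the function name is hypothetical.

```python
from statistics import mean, pvariance
from typing import Sequence, Tuple


def roughan_features(arrival_times: Sequence[float],
                     sizes: Sequence[int]) -> Tuple[float, float]:
    """(average packet size, variance/mean of inter-arrival times) for one flow."""
    if len(arrival_times) != len(sizes) or len(sizes) < 2:
        raise ValueError("need matching sequences with at least 2 packets")
    gaps = [later - earlier for earlier, later in zip(arrival_times, arrival_times[1:])]
    avg_gap = mean(gaps)
    if avg_gap <= 0:
        raise ValueError("arrival times must be increasing")
    return mean(sizes), pvariance(gaps) / avg_gap


avg_size, variability = roughan_features([0.0, 0.25, 0.5, 1.0], [100, 120, 100, 120])
print(avg_size, round(variability, 4))  # 110 0.0417
```

Both statistics collapse a flow into two numbers, which is why codec-driven periodicity and network jitter, discussed next, can leave voice and video flows indistinguishable in this plane.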

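Figure 5-2. Inter-arrival time distribution for voice and video traffic.

… Table 5-1 lists some speech codec standards and the associated inter-packet delays (IPDs) that are required for a correct implementation of those protocols. Packets leaving the transmitter might traverse a large number of links in the Internet before reaching the proper destination. Along the way, packets might experience random delay due to congestion at routers' interfaces.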

Figure 5-3. Packet size distribution for voice and video traffic.

As a consequence, the inter-arrival times between packets at the receiver might be severely affected by random noise, e.g., jitter, and thus this metric might not be a reliable candidate feature for a robust classification methodology. Although this problem does exist, we note that the inter-arrival times between packets within the same flow still show a strong regularity when studied in the frequency domain at the receiver side. As an example, in Figure 5-2, we show the distributions of the packet inter-arrival times at the receiver side when using Skype to transmit voice only and video only, respectively, between two hosts, one located at University A on the east coast and the other at University B on the west coast of the USA. As we can see, the distributions for both video and voice are centered around 0.03 second. On the other hand, Figure 5-3 shows the distributions of the packet sizes for both voice and video. As can be seen, voice and video are characterized by similar distributions for packet sizes less than 200 bytes. Although video traffic generates packets larger than 200 bytes, these larger packets cannot be reliably used to separate video from voice, since other applications such as chat or file transfer might also generate such larger packets. As a consequence, packet inter-arrival time or packet size alone is a weak feature when considered in the time domain.

… in Figures 5-4 and 5-5 for voice and video, respectively. We can see that some regularity exists for both voice and video across different traces, although the regularity is not very strong. This result holds for all experiments conducted when transmitting Skype voice and video packets over the Internet from University A to University B.

Figure 5-4. Power spectral density of two sequences/traces of time-varying inter-arrival times for voice traffic.

… [58], i.e., intra frames (I-frames) and predicted frames (P-frames). An I-frame is a frame coded without reference to any frame except itself. It serves as the starting point for a decoder to reconstruct the video stream. A P-frame may contain image data, motion vector displacements, or combinations of the two. Its decoding needs reference to previously decoded frames.

Figure 5-5. Power spectral density of two sequences of time-varying inter-arrival times for video traffic.

Packets containing I-frames are larger than those containing P-frames. Usually, the number of P-frames between two consecutive I-frames is constant. Hence, one can observe a strong periodic variation of packet size due to the interleaving of I-frames and P-frames composing video data streams. Voice streams show a similar phenomenon if linear predictive coding (LPC), e.g., a code-excited linear prediction (CELP) voice coder, is employed. As an example, Figures 5-6 and 5-7 show the power spectral density of voice and video packet sizes, respectively.

Figures 5-8 and 5-9 show how the regularities hidden in voice and video data streams can be amplified by combining the two features into one single stochastic process, which will be described later in this dissertation. Note how the two important frequencies are amplified and clearly visible in the PSD plots. The reason why there is a peak in the PSD for voice (see Figure 5-8) is that voice applications usually produce a close-to-constant packet rate due to the constant inter-packet delay of the widely used speech codecs listed in Table 5-1; e.g., the peak at 33 Hz in Figure 5-8 corresponds to a 30 ms inter-packet delay.

Figure 5-6. Power spectral density of two sequences of discrete-time packet sizes for voice traffic.

Figure 5-7. Power spectral density of two sequences of discrete-time packet sizes for video traffic.

Compared to voice, video applications have a flatter PSD. The reason is as follows. The number of bits in an I-frame of video depends on the texture of the image (e.g., the I-frame of a blackboard image produces far fewer bits than that of a complicated flower image), resulting in a large range in the number of packets per I-frame, e.g., from 1 packet to a few hundred packets. The frame rate is usually constant (e.g., 30 frames/s is a standard rate in the USA), i.e., a frame will be generated every 1/30 second …

Figure 5-8. Power spectral density of two sequences of continuous-time packet sizes for voice traffic.

Figure 5-9. Power spectral density of two sequences of continuous-time packet sizes for video traffic.

… (Section 5.2) are not able to accurately distinguish between voice and video flows. Based on our analysis of the properties of voice and video data streams, our intuition is to exploit the strong regularities residing in packet inter-arrival times and the associated packet sizes. In this chapter, we analyzed four types of metrics that exploit these regularities:
1. packet inter-arrival time and packet size in the time domain;
2. packet inter-arrival time in the frequency domain;
3. packet size in the frequency domain;
4. packet inter-arrival time and packet size combined in the frequency domain.
By analyzing the properties and illustrative figures of the four types of metrics, we showed that combining packet inter-arrival time and packet size in one single stochastic process generates a distinctive feature to classify voice and video streams.
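To illustrate why the combined process exposes the codec period, the sketch below computes a direct (unsmoothed) periodogram of an impulse train in which each packet contributes an impulse at its arrival time weighted by its size, i.e., $X(f) = \sum_i P_i e^{-j 2\pi f A_i}$ and $|X(f)|^2 / T$. It assumes arrival times in seconds and a made-up constant-rate flow; the function name and the frequency grid are hypothetical, and the FE module of Chapter 6 may use a different estimator.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def impulse_train_psd(flow: Sequence[Tuple[float, float]],
                      freqs: Sequence[float]) -> List[float]:
    """Periodogram of sum_i P_i * delta(t - A_i) at the given frequencies (Hz).

    flow holds (P_i, A_i) pairs: packet size and arrival time in seconds.
    Returns |sum_i P_i exp(-j 2 pi f A_i)|^2 / T, with T the observed time span.
    """
    times = [a for _, a in flow]
    span = (max(times) - min(times)) or 1.0
    out = []
    for f in freqs:
        x = sum(p * cmath.exp(-2j * math.pi * f * a) for p, a in flow)
        out.append(abs(x) ** 2 / span)
    return out


# Hypothetical voice-like flow: 100 packets of 100 bytes, one every 30 ms.
flow = [(100.0, 0.03 * i) for i in range(100)]
freqs = [1.0 + 0.1 * k for k in range(591)]   # 1 Hz to 60 Hz in 0.1 Hz steps
psd = impulse_train_psd(flow, freqs)
peak = freqs[psd.index(max(psd))]
print(round(peak, 1))  # 33.3, i.e., about 1 / (30 ms)
```

The peak near 33 Hz matches the 30 ms inter-packet delay discussed for Figure 5-8; when the number and size of packets per frame vary, as for video I-frames, the energy spreads across frequencies and the PSD flattens.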

VOVClassifier System Architecture

We first present the overall architecture of our system (VOVClassifier, Figure 6-1) and provide a high-level description of the functionality of each of its modules. Generally speaking, VOVClassifier is an automated learning system that takes packet headers from raw packets collected off the wire, organizes them into transport network flows, and processes them in real time to search for voice and video applications. VOVClassifier is first trained on voice and video data streams separately before being used in real time for classification. During the training phase, VOVClassifier extracts feature vectors, each of which is a summary (also known as a statistic) of the raw traffic bit stream, and maintains their statistics in memory. During the online classification phase, a classifier makes decisions by measuring similarity metrics between the feature vector extracted from on-the-fly network traffic and the feature vectors extracted from the training data. Flows with high similarity to the voice (or video) features are classified as voice (or video); data streams with low similarity to both voice and video are classified as other applications. VOVClassifier is composed of four major modules that operate in cascade: (i) Flow Summary Generator (FSG), (ii) Feature Extractor (FE) via Power Spectral

(Section 5.3) we have shown as voice and video data streams that are characterized by packets that are very small in size. As a consequence, this module filters out all packets whose size is smaller than a pre-specified threshold $P$. The processed flow is then internally described in terms of packet sizes and inter-arrival times between packets within any generic flow FS:

$$FS = \{\langle P_i, A_i \rangle,\ i = 1, \ldots, I\} \tag{6–1}$$

(Figure 6-1) are used during both the training and the classification phases.
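
A minimal sketch of the FSG filtering step. It assumes, for illustration only, that a flow arrives as a list of (timestamp in seconds, size in bytes) pairs, that each $A_i$ is measured from the first packet of the flow (the way Equation (6–2) uses it), and that only the first $A_{max}$ seconds are kept (the experiments in Section 6.5 use 10 s). The function `flow_summary` and its parameter names are hypothetical.

```python
from typing import List, Tuple

def flow_summary(packets: List[Tuple[float, int]], min_size: int,
                 a_max: float) -> List[Tuple[int, float]]:
    """Return FS as a list of (P_i, A_i) pairs.

    packets  -- (timestamp_seconds, size_bytes) for every packet of one flow
    min_size -- the size threshold P; packets smaller than this are dropped
    a_max    -- keep only packets arriving within a_max seconds of the flow start
    """
    if not packets:
        return []
    t0 = min(t for t, _ in packets)          # assumed flow start: first packet observed
    return [(size, t - t0) for t, size in sorted(packets)
            if size >= min_size and t - t0 <= a_max]

# Hypothetical flow: two small packets are filtered out, and the last packet
# falls outside the 10 s window.
packets = [(100.000, 40), (100.010, 200), (100.040, 210), (100.070, 52), (112.0, 200)]
fs = flow_summary(packets, min_size=60, a_max=10.0)
```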

Sections 6.2, 6.3, and 6.4 describe the components Feature Extractor, Voice/Video Subspace Generator, and Voice/Video Classifier, respectively. In Section 6.5, we conduct experiments on traffic collected between two universities using Skype, MSN, and GTalk. Section 6.6 summarizes this chapter.

As shown in Section 5.3, the extraction and processing of simple traffic features does not solve the problem of detecting and separating voice and video data streams from other applications. In this section we first introduce the preliminary steps that we take to transform each generic flow FS obtained from the FSG into a stochastic process that combines the inter-arrival times and packet sizes. Then we describe how to use power spectral density (PSD) analysis as a powerful methodology to extract the key regularities hidden in real-time multimedia network traffic.

Figure 6-2. Power spectral density features extraction module. Cascade of processing steps.

Each flow FS extracted by the FSG is forwarded to the FE module, which applies several steps in cascade (Figure 6-2). First, any FS (see Equation (6–1)) is modeled as a continuous-time stochastic process, as illustrated in Equation (6–2):

$$P(t) = \sum_{\langle P, A \rangle \in FS} P\,\delta(t - A) \tag{6–2}$$
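
The delta train in Equation (6–2) cannot be sampled directly, so a practical implementation smooths it with a low-pass impulse response and samples the result every $T_s$ seconds. The sketch below evaluates $\sum P\,h_{LPF}(iT_s - A)$ on a grid of $I_d = A_{max}/T_s$ points; the unit-area Gaussian shape of $h_{LPF}$, its width `sigma`, the 0-based sample index, and the function name are all assumptions made for illustration.

```python
import numpy as np

def sampled_lpf_process(fs, ts, a_max, sigma):
    """Samples P_h(i*ts), i = 0..I_d-1, of P_h(t) = sum_{<P,A> in FS} P * h_LPF(t - A),
    where h_LPF is assumed to be a unit-area Gaussian with standard deviation sigma (s)."""
    i_d = int(round(a_max / ts))                       # number of samples I_d
    t = np.arange(i_d) * ts                            # sampling instants
    sizes = np.array([p for p, _ in fs], dtype=float)
    arrivals = np.array([a for _, a in fs], dtype=float)
    diff = t[:, None] - arrivals[None, :]              # (I_d, |FS|) time offsets
    h = np.exp(-0.5 * (diff / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return h @ sizes                                   # weight each pulse by its packet size

# Two hypothetical packets; T_s = 0.5 ms and A_max = 10 s give I_d = 20,000 samples.
p_d = sampled_lpf_process([(200, 0.010), (210, 0.040)], ts=0.0005, a_max=10.0, sigma=0.001)
```

Because the kernel has unit area, `p_d.sum() * ts` approximates the total number of bytes in the flow, which is a convenient sanity check.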

Since $P(t)$ in Equation (6–2) is represented as a summation of delta functions, its spectrum spans the whole frequency domain. To correctly reshape the spectrum of $P(t)$ and avoid aliasing when it is sampled at interval $T_s$, we apply a low-pass filter (LPF) characterized by its impulse response $h_{LPF}(t)$. The filtered process $P_h(t)$ can then be described mathematically as follows:

$$P_h(t) = P(t) * h_{LPF}(t) = \sum_{\langle P, A \rangle \in FS} P\, h_{LPF}(t - A) \tag{6–3}$$

Power spectral density definition.

[59]) can be computed as:

$$\phi(\varpi; y) = \sum_{k=-\infty}^{\infty} r(k; y)\, e^{-j\varpi k}, \quad \varpi \in [-\pi, \pi) \tag{6–5}$$

Although $\varpi$ in Equation (6–5) can take any value, we restrict its domain to $[-\pi, \pi)$ because $\phi(\varpi; y) = \phi(\varpi + 2\pi; y)$. According to Equations (6–5) and (6–6), computing the PSD of a digital signal theoretically requires access to an infinitely long time sequence. Since in reality we cannot assume that infinite digital sequences are at our disposal, we must decide which technique can estimate the power spectral density with acceptable accuracy in our context. In the literature, two families of PSD estimators are available: parametric and non-parametric. Parametric methods have been shown to perform better under the assumption that the underlying model is correct and accurate. Furthermore, these methods are more attractive from a computational complexity perspective, as they require the estimation of fewer variables than non-parametric methods. In our research, we employ a parametric method to estimate the PSD. The details are presented in the next section.

Equation (6–7) can be regarded as obtaining a signal by filtering white noise of power $\varepsilon^2$ through a filter with transfer function

$$\frac{B(\varpi)}{A(\varpi)} \tag{6–10}$$

Starting from Equation (6–7), three types of models are derived:

1. if $p > 0$ and $q = 0$, one models $\{y(i)\}_{i \in \mathbb{Z}}$ as an autoregressive (AR($p$)) signal;
2. if $p = 0$ and $q > 0$, one models $\{y(i)\}_{i \in \mathbb{Z}}$ as a moving average (MA($q$)) signal;
3. otherwise, it is modeled as an autoregressive moving average (ARMA($p, q$)) signal.

Based on the AR, MA, or ARMA assumption, one can estimate the coefficients in Equation (6–7) and hence the PSD. In general, none of these three models outperforms the other two; rather, their performance depends strongly on the specific shape of the signal under consideration. Because the signals we process are characterized by strong regularities in the time domain, we decided to adopt the AR model. The reason is that the AR equation can model spectra with narrow peaks by placing zeros of $A(\varpi)$ close to the unit circle.
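
To see why narrow spectral peaks favor the AR model, the sketch below evaluates the AR power spectral density $\varepsilon^2/|A(\varpi)|^2$, with $A(\varpi) = 1 + \sum_{t=1}^{p} a_t e^{-j\varpi t}$ as implied by the AR difference equation, for an AR(2) model whose two zeros of $A$ lie at radius $r$ and angles $\pm\theta$. The helper names and the values $\theta = 0.4$, $r \in \{0.5, 0.99\}$ are illustrative choices, not taken from the dissertation.

```python
import numpy as np

def ar_psd(a, noise_power, omega):
    """PSD of y(i) + sum_t a_t y(i-t) = eps(i): noise_power / |A(omega)|^2,
    with A(omega) = 1 + sum_t a_t exp(-j omega t)."""
    t = np.arange(1, len(a) + 1)
    A = 1.0 + np.exp(-1j * np.outer(omega, t)) @ np.asarray(a, dtype=complex)
    return noise_power / np.abs(A) ** 2

def ar2_with_zeros(radius, theta):
    """AR(2) coefficients such that A(z) = 1 + a1 z^-1 + a2 z^-2 has zeros at radius * exp(+/- j theta)."""
    return [-2.0 * radius * np.cos(theta), radius ** 2]

omega = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
phi_sharp = ar_psd(ar2_with_zeros(0.99, 0.4), 1.0, omega)   # zeros near the unit circle
phi_smooth = ar_psd(ar2_with_zeros(0.5, 0.4), 1.0, omega)   # zeros well inside it
for name, phi in (("r=0.99", phi_sharp), ("r=0.5", phi_smooth)):
    w_peak = abs(omega[np.argmax(phi)])
    print(f"{name}: peak at |w| = {w_peak:.3f}, peak/median = {phi.max() / np.median(phi):.1f}")
```

With $r = 0.99$ the spectrum shows a sharp peak at $\varpi \approx \theta$; with $r = 0.5$ it is smooth and its maximum sits at $\varpi = 0$.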

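Equation (6–10) can be written as

$$y(i) + \sum_{t=1}^{p} a_t\, y(i - t) = \varepsilon(i).$$

Multiplying Equation (6–12) by $y^*(i - k)$ and taking expectations of both sides, one obtains

$$r(k; y) + \sum_{t=1}^{p} a_t\, r(k - t; y) = E\{\varepsilon(i)\, y^*(i - k)\}.$$

Equation (6–14) can be rewritten in matrix form, i.e.,

$$\begin{bmatrix} r(0;y) & r(-1;y) & \cdots & r(-p;y) \\ r(1;y) & r(0;y) & \ddots & \vdots \\ \vdots & \ddots & \ddots & r(-1;y) \\ r(p;y) & \cdots & r(1;y) & r(0;y) \end{bmatrix} \begin{bmatrix} 1 \\ a_1 \\ \vdots \\ a_p \end{bmatrix} = \begin{bmatrix} \varepsilon^2 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \tag{6–15}$$

Equation (6–15) is known as the Yule–Walker method for AR spectral estimation [59]. Given data $\{y(i)\}_{i=1}^{I}$, one first estimates the autocovariance sequence

$$\begin{cases} \hat r(k; y) = \dfrac{1}{I} \displaystyle\sum_{i=k+1}^{I} y(i)\, y^*(i - k), & k = 0, \ldots, p, \\ \hat r(-k; y) = \hat r^*(k; y), \end{cases}$$

and then Equation (6–15) becomes a system of $p + 1$ linear equations with $p + 1$ unknown variables.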
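
The direct Yule–Walker approach fits in a few lines of NumPy. This sketch assumes real-valued data, so $r(-k; y) = r(k; y)$, and uses the biased autocovariance estimator above: rows $k = 1, \ldots, p$ of Equation (6–15) give $a_1, \ldots, a_p$, and row $k = 0$ then gives $\varepsilon^2$. The dense solve costs $O(p^3)$, which is the cost discussed next. Function names and the simulated AR(2) test signal are our own.

```python
import numpy as np

def autocov(y, p):
    """Biased estimates r_hat(k) = (1/I) * sum_{i=k+1}^{I} y(i) y(i-k), k = 0..p (real data)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    return np.array([np.dot(y[k:], y[:n - k]) / n for k in range(p + 1)])

def yule_walker_direct(y, p):
    """AR(p) coefficients a_1..a_p and noise power by a dense O(p^3) solve."""
    r = autocov(y, p)
    R = np.array([[r[abs(k - t)] for t in range(p)] for k in range(p)])  # Toeplitz, r(-k) = r(k)
    a = np.linalg.solve(R, -r[1:])          # rows k = 1..p: r(k) + sum_t a_t r(k-t) = 0
    noise_power = r[0] + np.dot(a, r[1:])   # row k = 0:    r(0) + sum_t a_t r(-t) = eps^2
    return a, noise_power

# Simulated AR(2) signal y(i) - 1.5 y(i-1) + 0.7 y(i-2) = eps(i), i.e. a = [-1.5, 0.7].
rng = np.random.default_rng(1)
eps = rng.normal(size=20000)
y = np.zeros_like(eps)
for i in range(2, len(y)):
    y[i] = 1.5 * y[i - 1] - 0.7 * y[i - 2] + eps[i]
a_hat, noise_hat = yule_walker_direct(y, 2)
```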

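Solving the Yule–Walker equations directly, as in Equations (6–17) and (6–18), is not efficient enough in terms of time complexity. Equation (6–17) computes the inverse of the covariance matrix, whose time complexity is $O(p^3)$ [60, page 755]. In addition, in most applications there is no a priori information about the true order $p$. To cope with that, the Yule–Walker system of equations, Equation (6–15), has to be solved for $p = 1$ up to $p = p_{max}$, where $p_{max}$ is some prespecified maximum order. The time complexity is then $O(p_{max}^4)$, since $\sum_{p=1}^{p_{max}} p^3 = O(p_{max}^4)$. In this work, we use the Levinson–Durbin Algorithm (LDA) [59] to reduce the time complexity. It estimates the AR signal coefficients recursively in the order $p$. To facilitate further discussion and to emphasize the order $p$, we denote

$$R_{p+1} \triangleq \begin{bmatrix} r(0;y) & r(-1;y) & \cdots & r(-p;y) \\ r(1;y) & r(0;y) & \cdots & r(-p+1;y) \\ \vdots & \vdots & \ddots & \vdots \\ r(p;y) & \cdots & r(1;y) & r(0;y) \end{bmatrix};$$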

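Equation (6–15) as

$$R_{p+1} \begin{bmatrix} 1 \\ \tilde a_p \end{bmatrix} = \begin{bmatrix} \varepsilon_p^2 \\ \mathbf{0} \end{bmatrix}$$

1. function LDA(...)
2. Argument 1: data, $\{y(i)\}_{i=1}^{I}$.
3. Argument 2: order, $p$.
4. Return: parameters of the AR($p$) model, $\tilde a_p$ and $\varepsilon_p^2$.
5.
6. (6–23)
7. (6–24)
8. for $t \leftarrow 1, \ldots, p$
9. $r_t \triangleq [r(t;y), r(t-1;y), \ldots, r(1;y)]^T$
10. end for

Figure 6-3. Levinson–Durbin Algorithm.

Figure 6-3 gives the LDA algorithm to estimate the coefficients of the AR($p$) model given data $\{y(i)\}_{i=1}^{I}$.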
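
Below is a minimal sketch of the standard Levinson–Durbin recursion for real-valued data, in the form found in spectral-analysis textbooks such as [59]; its notation may differ from Figure 6-3. Given $\hat r(0), \ldots, \hat r(p)$, it returns the AR($k$) coefficients and noise power for every order $k = 1, \ldots, p$ in $O(p^2)$ operations overall, which is what makes scanning orders up to $p_{max}$ inexpensive. The function name is our own.

```python
import numpy as np

def levinson_durbin(r, p):
    """Levinson-Durbin recursion for real data.

    r -- autocovariances r(0), ..., r(p)
    Returns a list whose (k-1)-th entry is (a_k, noise_power_k) for the AR(k) model,
    where a_k = [a_1, ..., a_k] solves the order-k Yule-Walker equations.
    """
    r = np.asarray(r, dtype=float)
    a = np.zeros(0)
    noise_power = r[0]
    models = []
    for k in range(1, p + 1):
        # reflection coefficient: -(r(k) + sum_{t=1}^{k-1} a_t r(k-t)) / noise power of order k-1
        kappa = -(r[k] + np.dot(a, r[k - 1:0:-1])) / noise_power
        # order update: a_t <- a_t + kappa * a_{k-t} for t < k, and a_k = kappa
        a = np.concatenate([a + kappa * a[::-1], [kappa]])
        noise_power *= 1.0 - kappa ** 2
        models.append((a.copy(), noise_power))
    return models

# Normalized autocovariance r(k) = 0.8^k of an AR(1) process with a_1 = -0.8
# (noise power 0.36 at this scale).
r_ar1 = 0.8 ** np.arange(4)
models = levinson_durbin(r_ar1, 3)
```

For this input, every order above 1 returns zero for the extra coefficients and the same noise power 0.36, as expected for a true AR(1) process.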

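Equations (6–17) and (6–18).

1. function PSDEstimate(...)
2. Argument 1: data, $\{y(i)\}_{i=1}^{I}$.
3. Argument 2: order, $p$.
4. Return: PSD $\phi(\varpi; y)$.
5. (6–30)
6.

Figure 6-4. Parametric PSD estimate using the Levinson–Durbin Algorithm.

Once the AR model is estimated, one can estimate the PSD of the signal $\{y(i)\}_{i=1}^{I}$. The procedure is given in Figure 6-4.

We assume $P_d$ (see Equation (6–4)) to be second-order stationary. Then its PSD can be estimated as:

$$\phi(\varpi; P_d) = \mathrm{PSDEstimate}\left(\{P_d(i)\}_{i=1}^{I_d},\, p\right)$$

(Figure 6-2). Thus, one can further formulate the PSD in terms of the real frequency $f$ as

$$\phi_f(f; P_d) = \phi\left(\frac{2\pi f}{F_s}; P_d\right), \quad f \in \left[-\frac{F_s}{2}, \frac{F_s}{2}\right)$$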

Equation (6–33) shows the relationship between the periodic components of a stochastic process in the continuous-time domain and the shape of its PSD in the frequency domain. $\phi_f(f; P_d)$ is a continuous function in the frequency domain. To handle it in a computer, we need to sample it in the frequency domain. In other words, we select a series of frequencies, $0 \le f_1 < f_2 < \cdots$.
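
Sampling the PSD on a finite frequency grid turns each flow into a fixed-length feature vector. The sketch assumes the PSD is the AR model estimate, maps each real frequency to $\varpi = 2\pi f / F_s$ with $F_s = 1/T_s$, and evaluates $\varepsilon^2/|A(\varpi)|^2$ there; the grid itself and the function name `psd_feature_vector` are illustrative assumptions, not the choices made in the dissertation.

```python
import numpy as np

def psd_feature_vector(a, noise_power, fs_hz, freqs_hz):
    """Evaluate phi_f(f) = phi(2*pi*f / F_s) for an AR model at each frequency in freqs_hz."""
    omega = 2.0 * np.pi * np.asarray(freqs_hz, dtype=float) / fs_hz   # real -> normalized frequency
    t = np.arange(1, len(a) + 1)
    A = 1.0 + np.exp(-1j * np.outer(omega, t)) @ np.asarray(a, dtype=complex)
    return noise_power / np.abs(A) ** 2

# T_s = 0.5 ms gives F_s = 2000 Hz; a hypothetical grid of 200 frequencies from 0 to 995 Hz.
grid = np.arange(0.0, 1000.0, 5.0)
feature = psd_feature_vector([-0.8], 0.36, 2000.0, grid)
```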

As shown in Figure 6-1, we collect voice and video training flows during the training phase. After processing the raw packet data through the feature extraction module via PSD, one obtains two sets of feature vectors,

$$\Psi^{(1)} \triangleq \left\{\vec\psi_1(1), \vec\psi_1(2), \ldots, \vec\psi_1(N_1)\right\};$$

[61] and Metric Multidimensional Scaling (MDS) [62], which identify linear structure, and ISOMAP [63] and Locally Linear Embedding (LLE) [64], which identify nonlinear structure. Unfortunately, all these methods assume that the data are embedded in one single low-dimensional subspace. This assumption is not always true. For example, since different software uses different voice coding, it is more reasonable to assume that the PSD feature vector of voice traffic is a random vector generated from a mixture model rather than from a single model. In such a case, the feature vectors are more likely to be embedded in several subspaces. The same holds for video feature vectors. As a result, a better scheme is, first, to cluster the trained feature vectors into several groups, known as subspace decomposition; and second, to identify the subspace structure of each group, known as subspace bases identification. We describe the two steps in the following sections.

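Hong [65] proposed a method to decompose subspaces according to the minimum coding length criterion. The idea is to view the data segmentation problem from the perspective of data coding/compression. Suppose one wants to find a coding scheme, $C$, which maps data $\Psi \in \mathbb{R}^{M \times N}$ to a bit sequence. As all elements are real numbers, an infinitely long bit sequence is needed to decode without error. Hence, one has to specify a tolerable decoding error, $\epsilon$, to obtain a mapping with finite coding length, i.e.,

$$\left\| \vec\psi_n - C^{-1}\!\left(C(\vec\psi_n)\right) \right\|^2 \le \epsilon^2, \quad \forall n = 1, \ldots, N.$$

It is shown in [65] that the coding length is upper bounded by $L_C(\Psi) \le L(\Psi) = \cdots$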

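The optimal segmentation of $\Psi$ (Equation (6–39)), in terms of the minimum coding length criterion, should minimize the coding length of the segmented data, i.e.,

$$\min_{\Pi} \hat L_C(\Psi, \Pi) = \min_{\Pi} \left( \sum_{k=1}^{K} L_C(\Psi_k) - \sum_{k=1}^{K} |\Psi_k| \log_2 \frac{|\Psi_k|}{N} \right) \tag{6–45}$$

The first term of Equation (6–45) is the sum of the coding lengths of the groups, and the second is the number of bits needed to encode the membership of each item of $\Psi$ in the $K$ groups. The optimal partition is obtained as follows. Let the segmentation scheme be represented by the membership matrices

$$\Pi_k \triangleq \mathrm{diag}\left([\pi_{1k}, \pi_{2k}, \ldots, \pi_{Nk}]\right) \in \mathbb{R}^{N \times N};$$

Hong [65, page 34] proved that the coding length is bounded as follows: $\hat L_C(\Psi, \Pi) \le \sum_{k=1}^{K} \mathrm{tr}(\Pi_k) + K \cdots$. Combining Equations (6–45) and (6–48), one obtains a minimax criterion

$$\hat\Pi = \arg\min_{\Pi} \left[ \max_{C} \hat L_C(\Psi, \Pi) \right] = \arg\min_{\Pi} \hat L(\Psi, \Pi) \tag{6–49}$$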
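
The minimization in Equation (6–49) can be approximated by greedy bottom-up merging, in the spirit of the pairwise steepest descent method of Hong [65]. The sketch below is not Hong's exact formulation: the coding-length function is assumed to take the common log-determinant form $L(W) = \frac{n+M}{2}\log_2\det\!\left(I + \frac{M}{\epsilon^2 n}\bar W\bar W^T\right) + \frac{M}{2}\log_2\!\left(1 + \frac{\|\vec\mu\|^2}{\epsilon^2}\right)$ for a group $W$ of $n$ vectors in $\mathbb{R}^M$ with mean $\vec\mu$ and centered data $\bar W$, and encoding group membership is assumed to cost $n_k \log_2(n/n_k)$ bits. Columns of `X` are feature vectors; all names are hypothetical.

```python
import numpy as np
from itertools import combinations

def coding_length(W, eps):
    """Assumed lossy coding length (bits) of the columns of W (M x n) at distortion eps."""
    M, n = W.shape
    mu = W.mean(axis=1, keepdims=True)
    Wc = W - mu
    _, logdet = np.linalg.slogdet(np.eye(M) + (M / (eps ** 2 * n)) * (Wc @ Wc.T))
    mean_bits = M / 2 * np.log2(1.0 + float(np.sum(mu ** 2)) / eps ** 2)
    return (n + M) / 2 * logdet / np.log(2.0) + mean_bits

def split_cost(W1, W2, eps):
    """Bits for coding two groups separately plus the bits identifying each vector's group."""
    n1, n2 = W1.shape[1], W2.shape[1]
    n = n1 + n2
    return (coding_length(W1, eps) + coding_length(W2, eps)
            + n1 * np.log2(n / n1) + n2 * np.log2(n / n2))

def mcl_partition(X, eps):
    """Start from singletons; repeatedly merge the pair whose union shortens the code most."""
    groups = [[j] for j in range(X.shape[1])]
    while len(groups) > 1:
        best, best_delta = None, 0.0
        for g, h in combinations(range(len(groups)), 2):
            W1, W2 = X[:, groups[g]], X[:, groups[h]]
            delta = coding_length(np.hstack([W1, W2]), eps) - split_cost(W1, W2, eps)
            if delta < best_delta:
                best, best_delta = (g, h), delta
        if best is None:            # no merge decreases the coding length: stop
            break
        g, h = best
        groups[g] = groups[g] + groups[h]
        del groups[h]               # h > g, so index g is still valid
    return groups

# Two hypothetical, well-separated clusters of 10 points each in R^2.
rng = np.random.default_rng(4)
X = np.hstack([rng.normal(0.0, 0.01, (2, 10)), 10.0 + rng.normal(0.0, 0.01, (2, 10))])
groups = mcl_partition(X, eps=0.1)
```

On two tight, well-separated clusters, merging within a cluster shortens the total code while any cross-cluster merge lengthens it, so the procedure stops with one group per cluster.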

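1. function MCLPartition(...)
2. Argument 1: set of feature vectors, $\Psi = \{\vec\psi(1), \ldots, \vec\psi(N)\}$.
3. Return: partition of $\Psi$, $\Pi = \{\Psi_1, \ldots, \Psi_K : \Psi_1 \cup \cdots \cup \Psi_K = \Psi;\ \Psi_i \cap \Psi_j = \emptyset,\ \forall i \ne j\}$.
4. Initialization: $\Pi = \{\{\vec\psi(1)\}, \{\vec\psi(2)\}, \ldots, \{\vec\psi(N)\}\}$.
5. while true do
6.     $\langle \Psi_1, \Psi_2 \rangle = \arg\min_{\Psi_1, \Psi_2 \in \Pi} \left[ L(\Psi_1 \cup \Psi_2) - L(\Psi_1, \Psi_2) \right]$ (6–51)
7.     if $L(\Psi_1 \cup \Psi_2) - L(\Psi_1, \Psi_2) \ge 0$ then
8.         break
9.     else
10.        $\Pi = (\Pi \setminus \{\Psi_1, \Psi_2\}) \cup \{\Psi_1 \cup \Psi_2\}$
11.    end if
12. end while
13. return $\Pi$

Figure 6-5. Pairwise steepest descent method to achieve minimal coding length.

There is no closed-form solution for Equation (6–49). Hong [65, page 41] proposed a pairwise steepest descent method to solve it (Figure 6-5). It works in a bottom-up way: it starts with a partition that assigns each element of $\Psi$ to its own subset. Then, at each iteration, the algorithm finds the two subsets of feature vectors whose merger decreases the coding length the most (Equation (6–51)). The procedure stops when no further decrease of the coding length can be achieved by merging any two subsets. Using the above method, we obtain a partition of the voice feature vector set $\Psi^{(1)}$,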

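We use the PCA [61] algorithm to identify the subspace bases for each segmentation, $\{\Psi_k^{(i)},\ k = 1, \ldots, K_i;\ i = 1, 2\}$. Figure 6-6 shows the algorithm.

1. function $\langle \vec\mu, \hat U, \hat\Lambda, \bar U, \bar\Lambda \rangle$ = IdentifyBases($\Psi \in \mathbb{R}^{M \times N}$, $\eta$)
2.
3. $\bar\Psi = [\vec\psi_1 - \vec\mu, \vec\psi_2 - \vec\mu, \ldots, \vec\psi_{|\Psi|} - \vec\mu]$
4. Do eigenvalue decomposition on $\bar\Psi\bar\Psi^T$ such that $\bar\Psi\bar\Psi^T = U \Lambda U^T$;
5.
6. $\hat U = [\vec u_1, \vec u_2, \ldots, \vec u_{J-1}]$
7. $\bar U = [\vec u_J, \vec u_{J+1}, \ldots, \vec u_M]$
8. $\hat\Lambda = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_{J-1}^2)$
9. $\bar\Lambda = \mathrm{diag}(\sigma_J^2, \ldots, \sigma_M^2)$
10. end function

Figure 6-6. Function IdentifyBases identifies the bases of a subspace.

In Figure 6-6, argument $\Psi$ represents the feature vector set of one segmentation and $\eta$ is a user-defined parameter that specifies the percentage of energy retained, e.g., 90%.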
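
The steps of Figure 6-6 amount to PCA with an energy threshold. In the sketch below, the rule for choosing $J$ (keep the smallest number of leading eigenvectors whose eigenvalues account for at least the fraction `energy` of the total) is our assumption about how $J$ is selected; the returned `U_hat` corresponds to $[\vec u_1, \ldots, \vec u_{J-1}]$ and `U_bar` to the remaining eigenvectors. Names are hypothetical.

```python
import numpy as np

def identify_bases(X, energy=0.9):
    """PCA of one segmentation; columns of X (M x N) are feature vectors.

    Returns (mu, U_hat, Lam_hat, U_bar, Lam_bar): the mean vector, the leading
    eigenvectors/eigenvalues that retain at least `energy` of the total variance,
    and the remaining eigenvectors/eigenvalues.
    """
    mu = X.mean(axis=1, keepdims=True)
    Xc = X - mu                                     # centered feature vectors
    evals, evecs = np.linalg.eigh(Xc @ Xc.T)        # symmetric PSD matrix, ascending eigenvalues
    order = np.argsort(evals)[::-1]                 # re-sort in descending order
    evals = np.clip(evals[order], 0.0, None)        # remove tiny negative round-off
    evecs = evecs[:, order]
    frac = np.cumsum(evals) / np.sum(evals)         # cumulative energy fraction
    j = min(int(np.searchsorted(frac, energy)) + 1, len(evals))  # number of leading vectors kept
    return mu, evecs[:, :j], np.diag(evals[:j]), evecs[:, j:], np.diag(evals[j:])

# Hypothetical 3-D feature vectors lying close to the line spanned by [1, 1, 0].
rng = np.random.default_rng(3)
s = rng.normal(size=200)
X = np.outer([1.0, 1.0, 0.0], s) + 0.01 * rng.normal(size=(3, 200)) + np.array([[5.0], [0.0], [1.0]])
mu, U_hat, Lam_hat, U_bar, Lam_bar = identify_bases(X, energy=0.9)
```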

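Section 6.4. Applying the function IdentifyBases to all segmentations, we obtain

$$\left\langle \vec\mu_k^{(i)}, \hat U_k^{(i)}, \hat\Lambda_k^{(i)}, \bar U_k^{(i)}, \bar\Lambda_k^{(i)} \right\rangle = \mathrm{IdentifyBases}\left(\Psi_k^{(i)}\right) \tag{6–56}$$

for $\forall k = 1, \ldots, K_i$, $\forall i = 1, 2$. These are the outputs of the subspace identification module, and hence the results of the training phase, in Figure 6-1. During the classification phase, these outputs are used as system parameters, as presented in the next section.

In Section 6.3, we presented an approach to identify the subspaces spanned by the PSD feature vectors of training voice and video flows. Specifically, one obtains the following parameters: $\left\langle \vec\mu_k^{(i)}, \hat U_k^{(i)}, \hat\Lambda_k^{(i)}, \bar U_k^{(i)}, \bar\Lambda_k^{(i)} \right\rangle$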

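Figure 6-7 shows the procedure of the voice/video classifier.

1. function type = VoiceVideoClassify($\vec\psi$, $\lambda_A$, $\lambda_V$)
2. For $\forall i = 1, 2$, $\forall k = 1, \ldots, K_i$: $d_k^{(i)} = \mathrm{NormalizedDistance}\left(\vec\psi, \vec\mu_k^{(i)}, \bar U_k^{(i)}, \bar\Lambda_k^{(i)}\right)$.
3. For $\forall i = 1, 2$: $d_i = \min_k d_k^{(i)}$.
4. if $d_1 < \lambda_V$ then
5.     type = VOICE.
6. elseif $d_1 > \lambda_A$ and $d_2$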

We now apply VOVClassifier (Figure 6-1) to network traffic classification. Before that, we first describe the experiment settings in Section 6.5.1. In Section 6.5.2, two sets of experiments are conducted on traffic generated by Skype. In Section 6.5.3, two further sets of experiments are conducted on traffic generated by Skype, MSN, and GTalk. For each set of experiments, we use Receiver Operating Characteristic (ROC) curves [28, page 107] as the performance metric. An ROC curve plots the detection probability, $P_D$, versus the false alarm probability, $P_{FA}$, where

$$P_{D|H} \triangleq P(\text{the estimated state of nature is } H \mid \text{the true state of nature is } H);$$

By tuning the thresholds of the classifier (Figure 6-7), one is able to generate ROC curves. During the experiments, we collected network traffic from three applications, i.e., Skype, MSN, and GTalk. For each application, traffic was collected in two scenarios. In the first scenario, two lab computers located in University A and University B, respectively, communicated with each other over a direct connection between the two peers. In the second, we used a firewall to block the direct connection between the two peers, so that the application was forced to use relay nodes. For classification, we used the first 10 seconds of each flow, i.e., $A_{max} \le 10$ seconds. We set $T_s = 0.5$ milliseconds. Hence, $I_d = 10\ \text{s} / 0.5\ \text{ms} = 20{,}000$.
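
An ROC curve of this kind can be traced by sweeping a decision threshold over per-flow scores. The sketch assumes each flow has already been reduced to one scalar score where larger means more target-like (for instance, a negated distance such as those computed in Figure 6-7), and sweeps a single threshold, whereas the classifier above has several thresholds; `roc_curve` is a hypothetical helper.

```python
import numpy as np

def roc_curve(scores, is_target):
    """Return (P_FA, P_D) arrays, one point per threshold, for the rule
    'declare target if score >= threshold' (higher score = more target-like)."""
    scores = np.asarray(scores, dtype=float)
    is_target = np.asarray(is_target, dtype=bool)
    thresholds = np.concatenate([[np.inf], np.unique(scores)[::-1]])   # strictest first
    p_d = np.array([np.mean(scores[is_target] >= th) for th in thresholds])
    p_fa = np.array([np.mean(scores[~is_target] >= th) for th in thresholds])
    return p_fa, p_d

# Four hypothetical flows: two voice flows (targets) and two others.
p_fa, p_d = roc_curve([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 0])
```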

Figure 6-8 shows the ROC curves of classifying voice and video flows.

Figure 6-8. The ROC curves of single-typed flows generated by Skype: (a) VOICE and (b) VIDEO.

We then conduct experiments on hybrid Skype flows. In other words, each flow may be of type VOICE, VIDEO, FILE+VOICE, FILE+VIDEO, or none of the above. Figure 6-9 plots the ROC curves of the first four types, respectively.

Figure 6-9. The ROC curves of hybrid flows generated by Skype: (a) VOICE, (b) VIDEO, (c) FILE+VOICE, and (d) FILE+VIDEO.

Similar to Section 6.5.2, we first consider the scenario in which each flow carries one type of traffic. By tuning the thresholds, we generate ROC curves of classifying voice and video flows (Figure 6-10). We then conduct experiments on hybrid flows. Figure 6-11 shows the ROC curves of classifying VOICE, VIDEO, FILE+VOICE, and FILE+VIDEO flows.

Summarizing Figures 6-8, 6-9, 6-10, and 6-11, we list some typical pairs of $P_D$ and $P_{FA}$ values in Table 6-1. One can see the following phenomena from Table 6-1.

Figure 6-10. The ROC curves of single-typed flows generated by Skype, MSN, and GTalk: (a) VOICE and (b) VIDEO.

Table 6-1: Typical $P_D$ and $P_{FA}$ values.

$P_{FA}$ ($P_D$)   Skype, single   Skype, hybrid   Skype+MSN+GTalk, single   Skype+MSN+GTalk, hybrid
VOICE              0 (1)           0 (1)           0 (.995)                  .002 (.986)
VIDEO              0 (.993)        0 (.965)        0 (.952)                  0 (.948)

From Table 6-1, one notes that classification of VOICE traffic is more accurate than that of VIDEO. Specifically, we achieve 100% accurate classification of Skype voice flows ($P_D = 1$ at $P_{FA} = 0$). This is because voice traffic has higher regularity than video does (Figure 5-8 and Figure 5-9). One can immediately see the dominant periodic component at 33 Hz in the voice flows; this frequency corresponds to the 30-millisecond inter-packet delay (IPD) of the employed voice coding. On the other hand, the video PSDs peak at 0 Hz, which means that non-periodic components dominate in video flows. One can see that the PSDs of the two video flows are close to each other. That is why our approach achieves high classification accuracy using PSD features.

From Table 6-1, one can also see that the classification of single-typed flows is more accurate than that of hybrid flows. Mixing multiple types of traffic together is akin to adding noise. Hence, it is not surprising that classification accuracy is reduced.

Figure 6-11. The ROC curves of hybrid flows generated by Skype, MSN, and GTalk: (a) VOICE, (b) VIDEO, (c) FILE+VOICE, and (d) FILE+VIDEO.

(Section 6.3) decomposes multiple subspaces in the original high-dimensional feature space. As a result, the PSD feature vectors of Skype and GTalk are likely to lie in different subspaces from those of MSN. Therefore, we can still classify the traffic accurately.

(4–31) Proof: Xu02f0;1gPi=uj(i)=u0(i)(u0)p((i)ju(i)=u0)=Xu02f0;1gPi=uj(i)=u0p((i)=u0;~Tn(i))p((i)j(i)=u0)=Xu02f0;1gp(i=u;(i)=u0;~Tni)=p(i=u;~Tni)=i(u):

(4–32) Proof: p(iji=u)Yj2(i)Xu02f0;1gP(j=u0ji=u)j(u0)=p(iji=u)Yj2(i)Xu02f0;1gP(j=u0ji=u)p(~Tjjj=u0)=p(iji=u)Yj2(i)p(~Tjji=u)=p(~Tiji=u)=i(u) (A–2)

(4–33) Proof: i(u)i(u) Pu0pi=u0;~Tnip~Tniji=u=pi=u;~ Pu0pi=u0;~=pi=u;~ (A–3)

(4–34) Proof: i(u)(i)(u0)Pi=uj(i)=u0(i)(u0) [Pu00i(u00)i(u00)]Pu00Pi=u00j(i)=u0i(u00)=p~Tiji=up~T(i)j(i)=u0Pi=uj(i)=u0p(i)=u0;~Tn(i) hPu00pi=u00;~Tnip~Tiji=u00ihPu00Pi=u00j(i)=u0p~Tiji=u00i=pi=u;~Tij(i)=u0p(i)=u0;~Tn(i)p~T(i)j(i)=u0 hPu00pi=u00;~ihPu00pi=u00;~Tij(i)=u0i=pi=u;(i)=u0;~Ti[Tn(i)p~T(i)j(i)=u0 (A–4)

[1] P. Mockapetris, "Domain names - concepts and facilities," RFC 1034.
[2] P. Mockapetris, "Domain names - implementation and specification," RFC 1035.
[3] "Video on demand," Wikipedia. [Online]. Available: http://en.wikipedia.org/wiki/Video_on_demand
[4] D. Wu, Y. T. Hou, W. Zhu, Y.-Q. Zhang, and J. M. Peha, "Streaming video over the Internet: Approaches and directions," IEEE Trans. Circuits Syst. Video Technol., vol. 11, pp. 282–300, Mar. 2001.
[5] K. Nichols, S. Blake, F. Baker, and D. Black, "Definition of the differentiated services field (DS field) in the IPv4 and IPv6 headers," RFC 2474.
[6] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, and W. Weiss, "An architecture for differentiated services," RFC 2475.
[7] P. Almquist, "Type of service in the Internet protocol suite," RFC 1349.
[8] S. S. Kim, A. L. N. Reddy, and M. Vannucci, "Detecting traffic anomalies using discrete wavelet transform," in Proceedings of International Conference on Information Networking (ICOIN), vol. III, Busan, Korea, Feb. 2004, pp. 1375–1384.
[9] C.-M. Cheng, H. T. Kung, and K.-S. Tan, "Use of spectral analysis in defense against DoS attacks," in Proceedings of IEEE Globecom 2002, vol. 3, Taipei, Taiwan, Nov. 2002, pp. 2143–2148.
[10] A. Hussain, J. Heidemann, and C. Papadopoulos, "A framework for classifying denial of service attacks," in Proceedings of ACM SIGCOMM, Karlsruhe, Germany, Aug. 2003.
[11] H. Wang, D. Zhang, and K. G. Shin, "Detecting SYN flooding attacks," in Proc. IEEE INFOCOM'02, New York City, NY, June 2002, pp. 1530–1539.
[12] T. Peng, C. Leckie, and K. Ramamohanarao, "Detecting distributed denial of service attacks using source IP address monitoring," Department of Computer Science and Software Engineering, The University of Melbourne, Tech. Rep., 2002. [Online]. Available: http://www.cs.mu.oz.au/tpeng
[13] R. B. Blazek, H. Kim, B. Rozovskii, and A. Tartakovsky, "A novel approach to detection of 'denial-of-service' attacks via adaptive sequential and batch-sequential change-point detection methods," in Proc. IEEE Workshop on Information Assurance and Security, West Point, NY, June 2001, pp. 220–226.
[14] S. Mukkamala and A. H. Sung, "Detecting denial of service attacks using support vector machines," in Proceedings of IEEE International Conference on Fuzzy Systems, May 2003.

[15] S. Savage, D. Wetherall, A. Karlin, and T. Anderson, "Practical network support for IP traceback," in Proc. of ACM SIGCOMM'2000, Aug. 2000.
[16] A. Lakhina, M. Crovella, and C. Diot, "Characterization of network-wide anomalies in traffic flows," in Proc. ACM SIGCOMM Conference on Internet Measurement '04, Oct. 2004.
[17] H. Wang, D. Zhang, and K. G. Shin, "Change-point monitoring for the detection of DoS attacks," IEEE Transactions on Dependable and Secure Computing, no. 4, pp. 193–208, Oct. 2004.
[18] J. Mirkovic and P. Reiher, "A taxonomy of DDoS attacks and DDoS defense mechanisms," in Proc. ACM SIGCOMM Computer Communications Review '04, vol. 34, Apr. 2004, pp. 39–53.
[19] J. B. Postel and J. Reynolds, "File transfer protocol," RFC 959, Oct. 1985. [Online]. Available: http://www.faqs.org/rfcs/rfc959.html
[20] K. Lu, J. Fan, J. Greco, D. Wu, S. Todorovic, and A. Nucci, "A novel anti-DDoS system for large-scale Internet," in ACM SIGCOMM 2005, Philadelphia, PA, Aug. 2005.
[21] B. H. Bloom, "Space/time trade-offs in hash coding with allowable errors," Commun. ACM, vol. 13, no. 7, pp. 422–426, July 1970.
[22] L. Fan, P. Cao, J. Almeida, and A. Z. Broder, "Summary cache: A scalable wide-area web cache sharing protocol," IEEE/ACM Trans. Netw., vol. 8, no. 3, June 2000.
[23] F. Chang, W. chang Feng, and K. Li, "Approximate caches for packet classification," in IEEE INFOCOM 2004, vol. 4, Mar. 2004, pp. 2196–2207.
[24] R. Rivest, "The MD5 message-digest algorithm," RFC 1321, Apr. 1992. [Online]. Available: http://www.faqs.org/rfcs/rfc1321.html
[25] http://www.hdl-dh.com/pdf/hcr_7910.pdf
[26] D. A. Patterson and J. L. Hennessy, Computer Organization and Design: The Hardware/Software Interface. San Francisco, CA: Morgan Kaufmann, 1998, ch. 5, 6.
[27] "Auckland-IV trace data," 2001. [Online]. Available: http://wand.cs.waikato.ac.nz/wand/wits/auck/4/
[28] L. L. Scharf, Statistical Signal Processing: Detection, Estimation, and Time Series Analysis. Addison Wesley, 1991.
[29] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed. Wiley-Interscience, Oct. 2000.
[30] G. Casella and R. L. Berger, Statistical Inference, 2nd ed. Duxbury Press, June 2001.

[31] B. J. Frey, Graphical Models for Machine Learning and Digital Communication. Cambridge, MA: MIT Press, 1998.
[32] M. C. Nechyba, "Maximum-likelihood estimation for mixture models: the EM algorithm," 2003, course note. [Online]. Available: http://mil.ufl.edu/nechyba/eel6825.f2003/course_materials/t4.em_theory/em_notes.pdf
[33] Y. Weiss, "Correctness of local probability propagation in graphical models with loops," Neural Computation, vol. 12, pp. 1–4, 2000.
[34] J. S. Yedidia, W. T. Freeman, and Y. Weiss, "Generalized belief propagation," Advances in Neural Information Processing Systems, vol. 13, pp. 689–695, Dec. 2000.
[35] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, Sept. 1988.
[36] S. M. Aji and R. J. McEliece, "The generalized distributive law," IEEE Trans. Inform. Theory, vol. 46, pp. 325–343, Mar. 2000.
[37] T. Richardson, "The geometry of turbo-decoding dynamics," IEEE Trans. Inform. Theory, vol. 46, pp. 9–23, Jan. 2000.
[38] R. J. McEliece, D. J. C. McKay, and J. F. Cheng, "Turbo decoding as an instance of Pearl's belief propagation algorithm," IEEE J. Select. Areas Commun., vol. 16, pp. 140–152, Feb. 1998.
[39] F. Kschischang and B. Frey, "Iterative decoding of compound codes by probability propagation in graphical models," IEEE J. Select. Areas Commun., vol. 16, pp. 219–230, Feb. 1998.
[40] A. J. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Trans. Inform. Theory, vol. 13, pp. 260–269, Apr. 1967.
[41] G. D. Forney, Jr., "The Viterbi algorithm," in Proceedings of the IEEE, vol. 61, Mar. 1973, pp. 268–278.
[42] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," in Proceedings of the IEEE, vol. 77, Feb. 1989, pp. 257–286.
[43] D. Moore, K. Keys, R. Koga, E. Lagache, and kc claffy, "The CoralReef software suite as a tool for system and network administrators," in Usenix LISA (2001), Dec. 2001. [Online]. Available: citeseer.ist.psu.edu/moore01coralreef.html
[44] C. Logg, "Characterization of the traffic between SLAC and the Internet," July 2003. [Online]. Available: http://www.slac.stanford.edu/comp/net/slac-netflow/html/SLAC-netflow.html
[45] Internet Assigned Numbers Authority, "Port numbers," Aug. 2006. [Online]. Available: http://www.iana.org/assignments/port-numbers

[46] T. Karagiannis, A. Broido, N. Brownlee, kc claffy, and M. Faloutsos, "Is P2P dying or just hiding?" in IEEE Globecom 2004, 2004.
[47] S. Sen, O. Spatscheck, and D. Wang, "Accurate, scalable in-network identification of P2P traffic using application signatures," in WWW, 2004.
[48] K. Wang, G. Cretu, and S. J. Stolfo, "Anomalous payload-based network intrusion detection," in 7th International Symposium on Recent Advances in Intrusion Detection, Sept. 2004, pp. 201–222.
[49] M. Roughan, S. Sen, O. Spatscheck, and N. Duffield, "Class-of-service mapping for QoS: A statistical signature-based approach to IP traffic classification," in ACM Internet Measurement Conference, Taormina, Italy, 2004.
[50] T. Karagiannis, K. Papagiannaki, and M. Faloutsos, "BLINC: multilevel traffic classification in the dark," in SIGCOMM '05: Proceedings of the 2005 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications. New York, NY, USA: ACM Press, 2005, pp. 229–240.
[51] C. Dewes, A. Wichmann, and A. Feldmann, "An analysis of Internet chat systems," in IMC '03: Proceedings of the 3rd ACM SIGCOMM Conference on Internet Measurement. New York, NY, USA: ACM Press, 2003, pp. 51–64.
[52] D. Wu, T. Hou, and Y.-Q. Zhang, "Transporting real-time video over the Internet: Challenges and approaches," Proceedings of the IEEE, vol. 88, no. 12, pp. 1855–1875, Dec. 2000.
[53] ITU-T, "G.711: Pulse code modulation (PCM) of voice frequencies," ITU-T Recommendation G.711, 1989. [Online]. Available: http://www.itu.int/rec/T-REC-G.711/e
[54] ITU-T, "G.726: 40, 32, 24, 16 kbit/s adaptive differential pulse code modulation (ADPCM)," ITU-T Recommendation G.726, 1990. [Online]. Available: http://www.itu.int/rec/T-REC-G.726/e
[55] ITU-T, "G.728: Coding of speech at 16 kbit/s using low-delay code excited linear prediction," ITU-T Recommendation G.728, 1992. [Online]. Available: http://www.itu.int/rec/T-REC-G.728/e
[56] ITU-T, "G.729: Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear prediction (CS-ACELP)," ITU-T Recommendation G.729, 1996. [Online]. Available: http://www.itu.int/rec/T-REC-G.729/e
[57] ITU-T, "G.723.1: Dual rate speech coder for multimedia communications transmitting at 5.3 and 6.3 kbit/s," ITU-T Recommendation G.723.1, 2006. [Online]. Available: http://www.itu.int/rec/T-REC-G.723.1/en
[58] Y. Wang, J. Ostermann, and Y.-Q. Zhang, Video Processing and Communications, 1st ed. Prentice Hall, 2002.

[59] P. Stoica and R. Moses, Spectral Analysis of Signals, 1st ed. Upper Saddle River, NJ: Prentice Hall, 2005.
[60] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, 2nd ed. MIT Press, Sept. 2001.
[61] L. I. Smith, "A tutorial on principal components analysis," Feb. 2002. [Online]. Available: http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf
[62] K. V. Deun and L. Delbeke, "Multidimensional scaling," University of Leuven. [Online]. Available: http://www.mathpsyc.uni-bonn.de/doc/delbeke/delbeke.htm
[63] J. B. Tenenbaum, V. de Silva, and J. C. Langford, "A global geometric framework for nonlinear dimensionality reduction," Science, vol. 290, pp. 2319–2323, Dec. 2000.
[64] S. T. Roweis and L. K. Saul, "Nonlinear dimensionality reduction by locally linear embedding," Science, vol. 290, pp. 2323–2326, Dec. 2000.
[65] W. Hong, "Hybrid models for representation of imagery data," Ph.D. dissertation, University of Illinois at Urbana-Champaign, Aug. 2006.

Jieyan Fan was born on July 26, 1979 in Shanghai, China. The only child in the family, he grew up mostly in his hometown, graduating from the High School Affiliated to Fudan University in 1997. He earned his B.S. and M.S. in electrical engineering from Shanghai Jiao Tong University, Shanghai, China, in 2001 and 2004, respectively. He is currently a Ph.D. candidate in electrical and computer engineering at the University of Florida, Gainesville, FL. His research interests are network security and pattern classification. Upon completion of his Ph.D. program, Jieyan will be working at Yahoo! Inc., Sunnyvale, CA.