A STUDY OF JOINT CLASSIFIER AND FEATURE OPTIMIZATION: THEORY AND ANALYSIS

By

FAN MAO

A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

UNIVERSITY OF FLORIDA

2007

2007 Fan Mao

To my Mom, for her enormous patience and unfailing love.

ACKNOWLEDGMENTS

I would like to thank my thesis adviser, Dr. Paul Gader, for his encouragement and valuable advice, both on the general research direction and on the specific experimental details. As a newcomer, I sincerely appreciate the opportunity of getting direct help from an expert in this field. I also thank Dr. Arunava Banerjee and Dr. Joseph Wilson for being my committee members and reading my thesis. Special thanks go to Xuping Zhang. We had a lot of interesting discussions in the last half year, many of which greatly inspired me and informed my experiment designs.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF SYMBOLS
ABSTRACT

CHAPTER

1 VARIABLE SELECTION VIA JCFO
1.1 Introduction
1.2 Origin
1.2.1 LASSO
1.2.2 The Prior of β
1.3 JCFO
1.4 EM Algorithm for JCFO
1.5 Complexity Analysis
1.6 Alternative Approaches
1.6.1 Sparse SVM
1.6.2 Relevance Vector Machine

2 ANALYSIS OF JCFO
2.1 Brief Introduction
2.2 Irrelevancy
2.2.1 Uniformly Distributed Features
2.2.2 Noninformative Noise Features
2.3 Redundancy
2.3.1 Oracle Features
2.3.2 Duplicate Features
2.3.3 Similar Features
2.3.4 Gradually More Informative Features
2.3.5 Indispensable Features
2.4 Nonlinearly Separable Datasets
2.5 Discussion and Modification
2.6 Conclusion

APPENDIX
A SOME IMPORTANT DERIVATIONS OF EQUATIONS
B THE PSEUDOCODES FOR EXPERIMENTS DESIGN
LIST OF REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

Table
1.1. Comparisons of computation time of θ and other parts
2.1. Feature divergence of HH
2.2. Feature divergence of Crabs
2.3.
Uniformly distributed feature in Crabs
2.4. Uniformly distributed feature in HH
2.5. Noninformative noise in HH and Crabs
2.6. Oracle feature
2.7. Duplicate feature weights on HH
2.8. Duplicate feature weights on Crabs
2.9. Percentage of either of two identical features' weights being set to zero in HH
2.10. Percentage of either of two identical features' weights being set to zero in Crabs
2.11. Five features
2.12. Ten features
2.13. Fifteen features
2.14. Comparisons of JCFO, nonkernelized ARD and kernelized ARD
2.15. Comparisons of JCFO and kernelized ARD with an added noninformative irrelevant feature
2.16. Comparisons of JCFO and kernelized ARD with an added similar redundant feature
2.17. Comparisons of the three ARD methods

LIST OF FIGURES

Figure
1.1. Gaussian (dotted) vs. Laplacian (solid) prior
1.2. Logarithm of gamma distribution. From top down: a=1e-2, b=1e-2; a=1e-3, b=1e-3; a=1e-4, b=1e-4.
2.1. Two Gaussian classes that can be classified by either the x or the y axis
2.2. Weights assigned by JCFO (from top to bottom: x, y and z axis)
2.3. Weights assigned by ARD (from top to bottom: x, y and z axis)
2.4. Two Gaussians that can only be classified by both the x and y axes
2.5. Weights assigned by JCFO (from top to bottom: x, y and z axis)
2.6. Weights assigned by ARD (from top to bottom: x, y and z axis)
2.7. Cross data
2.8. Ellipse data

LIST OF SYMBOLS

x^i        the ith input object vector
x_j^i      the jth element of the ith input object vector
β          weight vector
||.||_p    the l_p norm
I          identity matrix
0          zero vector
N(v | 0, 1)   zero-mean, unit-variance normal density evaluated at v
sgn(.)     sign function
H          design matrix
(.)_+
positive part operator
Φ(z)       Gaussian cdf, \int_{-\infty}^{z} N(x | 0, 1) dx
E[z]       z's expectation
β^(t)      the estimate of β in the tth iteration
O(.)       big-O notation of complexity
∘          elementwise Hadamard matrix multiplication

LIST OF ABBREVIATIONS

JCFO: Joint Classifier and Feature Optimization
LASSO: Least Absolute Shrinkage and Selection Operator
RBF: Radial Basis Function
SVM: Support Vector Machine
RVM: Relevance Vector Machine
ARD: Automatic Relevance Determination

Abstract of Thesis Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Master of Science

A STUDY OF JOINT CLASSIFIER AND FEATURE OPTIMIZATION: THEORY AND ANALYSIS

By Fan Mao
December 2007
Chair: Paul Gader
Major: Computer Engineering

Feature selection is a major focus in modern processing of high-dimensional datasets that contain many irrelevant and redundant features and variables, such as in text mining and gene expression array analysis. An ideal feature selection algorithm extracts the most representative features while eliminating the noninformative ones, achieving both identification of the significant features and computational efficiency. Feature selection is also important for avoiding overfitting and reducing the generalization error in regression estimation, where sparsity of the features is preferred. In the first half of this thesis, we provide a thorough analysis of an existing state-of-the-art Bayesian feature selection method, presenting its theoretical background, implementation details and computational complexity. In the second half, we analyze its performance on several experiments we design with real and synthetic datasets, point out some of its limitations in practice, and finally give a modification.

CHAPTER 1
VARIABLE SELECTION VIA JCFO

1.1 Introduction

Joint Classifier and Feature Optimization (JCFO) was first introduced by (Krishnapuram et al., 2004).
It is directly inspired by (Figueiredo, 2003) and achieves sparsity in feature selection by driving feature weights to zero. Compared to traditional ridge regression, which shrinks the range of the regression parameters (or simply the weight coefficients of a linear classifier) but rarely sets them exactly to zero, JCFO inherits the spirit of LASSO (Tibshirani, 1996) in driving some of the parameters exactly to zero, which is equivalent to removing the corresponding features. In this way, it is claimed, JCFO eliminates redundant and irrelevant features. The remainder of this chapter is arranged as follows: Section 2 takes a close look at how this idea was derived; Section 3 gives the mathematical structure of JCFO; Sections 4 and 5 further illustrate the EM algorithm used to derive the learning algorithm for JCFO and analyze its complexity. The last part briefly introduces two other approaches that have a similar functionality.

1.2 Origin

1.2.1 LASSO

Suppose we have a data set Z = {(x^i, y_i)}, i = 1, 2, ..., N, where x^i = (x_1^i, x_2^i, ..., x_k^i) is the ith input variable vector, whose elements are called features, and the y_i are the responses or class labels (this thesis only considers y_i ∈ {0, 1}). Ordinary least squares regression finds β = (β_1, β_2, ..., β_k)^T minimizing

\sum_{i=1}^{N} \Big( y_i - \sum_{j=1}^{k} \beta_j x_j^i \Big)^2.   (1.1)

The solution is the well-known least squares estimate \hat\beta^0 = (X^T X)^{-1} X^T y. As mentioned in (Hastie et al., 2001), it often has low bias but large variance when X^T X is nearly singular, which tends to incur overfitting. To improve the prediction accuracy, we usually shrink or set some elements of β to zero in order to achieve parameter sparsity. If this is done appropriately, we also obtain an implicit feature selection process on our dataset, getting rid of the insignificant features and extracting those that are more informative.
In addition to simplifying the structure of our estimation functions, according to (Herbrich, 2002), the sparsity of the weight vector β also plays a key role in controlling the generalization error, which we will discuss in the next section. Ridge regression, a revised approach that penalizes large β, is

\hat\beta = \arg\min_\beta \sum_{i=1}^{N} \Big( y_i - \sum_{j=1}^{k} \beta_j x_j^i \Big)^2 + \lambda \sum_{j=1}^{k} \beta_j^2.   (1.2)

Here λ is a shrinkage coefficient that adjusts the ratio of the squared l_2 norm of β to the residual sum of squares in the objective function. The solution of (1.2) is (1 + γ)^{-1} \hat\beta^0 (Tibshirani, 1996), where γ depends on λ and X. This shows that ridge regression reduces β to a fraction of \hat\beta^0, but rarely sets its elements exactly to zero, and hence cannot completely attain the goal of feature selection. As an alternative approach, (Tibshirani, 1996) proposed changing (1.2) to

\hat\beta = \arg\min_\beta \sum_{i=1}^{N} \Big( y_i - \sum_{j=1}^{k} \beta_j x_j^i \Big)^2 + \lambda \sum_{j=1}^{k} |\beta_j|,   (1.3)

or equivalently,

(y - X\beta)^T (y - X\beta) + \lambda \|\beta\|_1,   (1.4)

where \|\cdot\|_1 denotes the l_1 norm. This is called the Least Absolute Shrinkage and Selection Operator (LASSO). To see why the l_1 norm favors more sparsity of β, note that \|(1/\sqrt{2}, 1/\sqrt{2})\|_2 = \|(1, 0)\|_2 = 1, but \|(1/\sqrt{2}, 1/\sqrt{2})\|_1 = \sqrt{2} > \|(1, 0)\|_1 = 1. Hence the l_1 penalty tends to set more elements exactly to zero. By (Tibshirani, 1996), the solution of (1.4) for an orthogonal design X is

\hat\beta_j = \mathrm{sgn}(\hat\beta_j^0) (|\hat\beta_j^0| - \gamma)_+,   (1.5)

where sgn(.) denotes the sign function and (a)_+ is defined as (a)_+ = a if a > 0, and 0 otherwise; γ depends on λ. Intuitively, γ can be seen as a threshold that "filters" those \hat\beta_j^0 that are below a certain range and truncates them to zero. This is a very nice attribute, by which our purpose of obtaining sparsity is fulfilled.

1.2.2 The Prior of β

The ARD (Automatic Relevance Determination) approach proposed by (Figueiredo, 2003) inherits this sparsity-promoting idea from LASSO through a Bayesian approach.
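The sparsity mechanism that LASSO contributes here is the soft-thresholding rule (1.5): shrink every coefficient by γ and set the small ones exactly to zero. A minimal sketch for the orthogonal-design case; the function name and the example numbers are ours:

```python
def soft_threshold(beta0, gamma):
    """LASSO solution for an orthogonal design (eq. 1.5): each OLS
    coefficient is shrunk toward zero by gamma, and coefficients whose
    magnitude is below gamma are set exactly to zero."""
    out = []
    for b in beta0:
        mag = abs(b) - gamma                    # shrink the magnitude
        out.append((1 if b > 0 else -1) * mag if mag > 0 else 0.0)
    return out

print(soft_threshold([2.0, -0.3, 0.5, -1.2], 0.5))  # [1.5, 0.0, 0.0, -0.7]
```

Two of the four coefficients are driven exactly to zero, which is precisely the feature-removal behavior ridge regression lacks.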
It considers the regression function as linear with respect to β, so our estimate function is

f(x, \beta) = \sum_{j=1}^{d} \beta_j h_j(x),   (1.6)

where h(x) can be a vector of linear transformations of x, of nonlinear fixed basis functions, or of kernel functions; these make up the so-called design matrix H, such that H_{ij} = h_j(x^i). Further, it assumes that the error \epsilon_i = y_i - \beta^T h(x^i) is zero-mean Gaussian, N(0, \sigma^2). Hence the likelihood is

p(y | \beta) = N(y | H\beta, \sigma^2 I),   (1.7)

where I is an N × N identity matrix. Note that since we assume the samples are i.i.d. Gaussian, there is no correlation among them. More importantly, this method also assigns a Laplacian prior to β:

p(\beta | \alpha) = \prod_{j=1}^{k} \frac{\alpha}{2} e^{-\alpha |\beta_j|} = \Big(\frac{\alpha}{2}\Big)^k \exp\{-\alpha \|\beta\|_1\}.

The influence that a prior exerts on the sparsity of β was discussed in (Herbrich, 2002), where an N(0, I_k) prior was used to illustrate that β's log-density is proportional to -\|\beta\|_2^2 and attains its highest value when β = 0. To compare the Gaussian and Laplacian priors, we plot both of their density functions in Figure 1.1. The latter is much more peaked at the origin and therefore favors more of β's elements being zero.

Figure 1.1. Gaussian (dotted) vs. Laplacian (solid) prior

The MAP estimate of β is given by

\hat\beta = \arg\min_\beta \{\|y - H\beta\|_2^2 + 2\sigma^2 \alpha \|\beta\|_1\}.   (1.8)

It can easily be seen that this is essentially the same as (1.3). If H is an orthogonal matrix, (1.8) can be solved separately for each β_j (see Appendix 1.1 for the detailed derivation):

\hat\beta_j = \arg\min_{\beta_j} \{\beta_j^2 - 2\beta_j (H^T y)_j + 2\sigma^2 \alpha |\beta_j|\} = \mathrm{sgn}((H^T y)_j) (|(H^T y)_j| - \sigma^2 \alpha)_+.   (1.9)

Unfortunately, in the general case \hat\beta cannot be solved directly from (1.8) due to its nondifferentiability at the origin. As a modification, (Figueiredo, 2003) presents a
As a modification, (Figureiredo, 2003) present a U 15b 01 005 0 10 8 10 hierarchicalBayes view of the Laplacian prior, showing it's nothing but a twolevel Bayes modelzero mean with independent, exponentially distributed variance: p(fl, I ,) = V(0, r,) and p(, y) = (/ 2) exp{ ( / 2)r, }, such that: p(A, )= p(l, r,)p(r, y)dr, = exp {J f/, }, (1.10) 0 2 where r, can be considered as a hidden variable being calculated by an EM algorithm while y is the real hyperparameter we need to specify. (This integration can be found in Appendix 1.2.) 1.3 JCFO From the same Bayesian viewpoint, JCFO (Krishnapuram et al, 2004) apply a Gaussian cumulative distribution function (probit link) to (1.6) to get a probability measure of how likely one input object belongs to a classic e {0,1}. To be more precise: P(y= 1 x)= ( ,+f0 K (x,x) (1.11) where c(z) = N(0, 1)dz. (Note that ((z) = 1 (z).) K,(x, x) is a symmetric measure of the similarity of two input objects. For example: k r K,(x, x) I= 1 +f xx (1.12) 1=1 for polynomial functions and: K,(x, xJ) exp O, x )2 (1.13) for Radial Basis Functions. (The good attributes of RBF functions will be discussed in Section 2.4.) An apparent disadvantage of (1.6) is that if we don't choose h as a linear function of x (or simply the input vector x itself), the calculation of f would be like selecting the kernel functions of x rather than selecting the features of x. This is also why it's conceptually an ARD method. As a major modification of explicitly selecting each feature, in (1.12) and (1.13) JCFO assigns each xl a corresponding parameter, Since the same is applied to the ith element of each input x , it is equivalent to weighing the significance of the ith feature in the input object. This is how the feature selection is incorporated. Another point that needs to be mentioned is that all 0, should be nonnegative because K,(x, x) measures the similarity of two objects as a whole, i.e. accumulating the differences in each two corresponding elements. 
Had \theta_i = -\theta_j been allowed, in (1.13) the differences in the ith and the jth elements of two objects could have cancelled each other. Then, just as with β, each \theta_k is given a (truncated) Laplacian prior, too:

p(\theta_k | \rho_k) = \begin{cases} 2 N(\theta_k | 0, \rho_k) & \text{if } \theta_k \ge 0 \\ 0 & \text{otherwise,} \end{cases}   (1.14)

where p(\rho_k | \eta^2) = (\eta^2 / 2) \exp\{-(\eta^2 / 2)\rho_k\}, thus

p(\theta_k | \eta) = \int_0^\infty p(\theta_k | \rho_k) p(\rho_k | \eta^2) d\rho_k = \begin{cases} \eta \exp\{-\eta \theta_k\} & \text{if } \theta_k \ge 0 \\ 0 & \text{otherwise.} \end{cases}   (1.15)

1.4 EM Algorithm for JCFO

Now, from our i.i.d. assumption on the training data, the likelihood function becomes

p(D | \beta, \theta) = \prod_{i=1}^{N} \Phi\Big(\beta_0 + \sum_{j=1}^{N} \beta_j K_\theta(x^i, x^j)\Big)^{y_i} \Big[1 - \Phi\Big(\beta_0 + \sum_{j=1}^{N} \beta_j K_\theta(x^i, x^j)\Big)\Big]^{1 - y_i}.   (1.16)

Rather than maximizing this likelihood directly, JCFO provides an EM algorithm by first introducing a random function z(x, \beta, \theta) = h_\theta(x)\beta + \omega, where h_\theta(x) = [1, K_\theta(x, x^1), ..., K_\theta(x, x^N)] and \omega \sim N(0, 1). It then treats z = [z_1, z_2, ..., z_N], \tau and \rho as the missing variables whose expectations are calculated in the E-step.

E step. The complete log-posterior is

\log p(\beta, \theta | D, z, \tau, \rho) \propto -\beta^T H_\theta^T (H_\theta \beta - 2z) - \beta^T \Upsilon^{-1} \beta - \theta^T R^{-1} \theta,   (1.17)

where \Upsilon = \mathrm{diag}(\tau_1, \tau_2, ..., \tau_{N+1}) and R = \mathrm{diag}(\rho_1, \rho_2, ..., \rho_k). We want to calculate the expectation of (1.17), i.e., Q(\beta, \theta | \hat\beta^{(t)}, \hat\theta^{(t)}). The expectations with respect to the missing variables are

v_i = E[z_i | D, \hat\beta^{(t)}, \hat\theta^{(t)}] = h_\theta(x^i)\hat\beta^{(t)} + \frac{(2y_i - 1) N(h_\theta(x^i)\hat\beta^{(t)} | 0, 1)}{\Phi((2y_i - 1) h_\theta(x^i)\hat\beta^{(t)})},   (1.18)

\omega_j = E[\tau_j^{-1} | D, \hat\beta^{(t)}] = \frac{\gamma}{|\hat\beta_j^{(t)}|}, \qquad \delta_k = E[\rho_k^{-1} | D, \hat\theta^{(t)}] = \frac{\eta^2}{\hat\theta_k^{(t)}},

where \hat\beta_j^{(t)} denotes the estimated value of \beta_j in the tth iteration. (The derivation of (1.18) can be found in Appendix 1.3.)

M step. Now, with the expectations of these missing variables at hand, we can apply MAP estimation to the Q function of (1.17). After dropping all terms irrelevant to β and θ, and setting v = [v_1, v_2, ..., v_N], \Omega = \mathrm{diag}(\omega_1, \omega_2, ..., \omega_{N+1}) and \Delta = \mathrm{diag}(\delta_1, \delta_2, ..., \delta_k), we get

Q(\beta, \theta | \hat\beta^{(t)}, \hat\theta^{(t)}) = -\beta^T H_\theta^T H_\theta \beta + 2\beta^T H_\theta^T v - \beta^T \Omega \beta - \theta^T \Delta \theta.   (1.19)

The derivative of (1.19) with respect to β is

\nabla_\beta Q = -2 H_\theta^T H_\theta \beta + 2 H_\theta^T v - 2\Omega\beta,   (1.20)

while the derivative with respect to \theta_k, (1.21), involves the derivative of H_\theta and the elementwise Hadamard matrix multiplication, denoted ∘, and has no closed-form zero.
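Stepping back to the E-step, the expectation v_i in (1.18) has a simple interpretation: it is the mean of a unit-variance Gaussian truncated to the side indicated by the label (z > 0 for y = 1, z < 0 for y = 0). A self-contained sketch, with function names of our choosing, using erf for the Gaussian cdf:

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def e_step_v(mu, y):
    """Expectation of the latent probit variable z given its mean
    mu = h_theta(x^i) beta and the label y in {0,1} (eq. 1.18):
    the mean of N(mu, 1) truncated to z > 0 if y = 1, z < 0 if y = 0."""
    s = 2 * y - 1                       # +1 for class 1, -1 for class 0
    return mu + s * normal_pdf(mu) / normal_cdf(s * mu)

# At mu = 0 the truncated mean is +-sqrt(2/pi) ~ +-0.7979:
print(round(e_step_v(0.0, 1), 4))  # 0.7979
print(round(e_step_v(0.0, 0), 4))  # -0.7979
```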
Now, by jointly maximizing (1.19) with respect to both β and θ, we not only select the kernel functions but also the features of the input objects, provided both β and θ are parsimonious. This is the core motivation of JCFO. It needs to be stressed that β can be solved for directly in each iteration, while \theta_k can only be computed by an approximation method. This in fact increases the computational complexity and, even worse, influences the sparsity of θ. We will return to this issue repeatedly in Chapter 2.

1.5 Complexity Analysis

What is the complexity of the EM algorithm above in the general case? Since in each iteration the algorithm alternately calculates β and θ, we investigate the two scenarios respectively.

Complexity of estimating β. Let us first take a look at the search for β, i.e., setting (1.20) to zero:

\hat\beta = (\Omega + H_\theta^T H_\theta)^{-1} H_\theta^T v.   (1.22)

i) If kernel functions are applied, the calculation of each inner product is O(k), while the common matrix inversion and multiplication of an N × N matrix in matlab are O(N^c), 2 < c < 3, and O(N^3) respectively. Therefore the whole complexity of (1.22) is O(N^2 k + N^3 + N^c). For low-dimensional datasets with k < N, we can simplify this to O(N^3).

ii) If only linear regression is used, there is no need to calculate H_\theta, so the complexity is determined by the larger of matrix inversion and multiplication, i.e., max(N^c, N^2 k), 2 < c < 3, or O(N^c) if we assume k < N^{c-2}. Compared with the kernel version, this does not seem to reduce the computation by much. However, if the constants in the O-notation are taken into account, the difference is conspicuous. Meanwhile, without the kernel we get rid of the calculation of θ altogether. This also saves enormous computation time, as we will show in the subsequent section.
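The β update (1.22) is an ordinary regularized linear solve. The two-basis sketch below (names and numbers ours) makes the sparsity mechanism concrete: a coefficient that was tiny in the previous iteration yields a huge \omega_j = \gamma / |\hat\beta_j^{(t)}|, and the corresponding entry of \Omega then crushes that coefficient toward zero in the next solve:

```python
def beta_update(H, v, omega):
    """One beta update (eq. 1.22) for two basis functions, using the
    closed-form 2x2 inverse: beta = (Omega + H^T H)^{-1} H^T v.
    A large omega_j (from a tiny beta_j in the previous iteration)
    drives the jth coefficient toward zero."""
    a = omega[0] + sum(r[0] * r[0] for r in H)   # (Omega + H^T H)[0][0]
    b = sum(r[0] * r[1] for r in H)              # off-diagonal entry
    d = omega[1] + sum(r[1] * r[1] for r in H)   # (Omega + H^T H)[1][1]
    det = a * d - b * b
    rhs0 = sum(r[0] * vi for r, vi in zip(H, v))  # (H^T v)[0]
    rhs1 = sum(r[1] * vi for r, vi in zip(H, v))  # (H^T v)[1]
    return [(d * rhs0 - b * rhs1) / det, (-b * rhs0 + a * rhs1) / det]

H = [[1.0, 0.2], [1.0, -0.1], [1.0, 0.3]]
v = [1.0, 0.9, 1.1]
print(beta_update(H, v, [1.0, 1.0]))   # moderate shrinkage on both
print(beta_update(H, v, [1.0, 1e6]))   # huge omega_1: second weight near 0
```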
As an improvement, we can exploit the fact that after a few iterations more and more \beta_j vanish, which means there is no use in recalculating their corresponding kernel functions K_\theta(x^i, x^j), j = 1, 2, ..., N. Hence we can delete the entire row from H_\theta and shrink the size of the design matrix significantly. This desirable property can also be applied to θ, especially for high-dimensional datasets, to alleviate the computational pressure incurred by kernel methods.

Complexity of estimating θ. After getting the value of β in the current iteration, θ is solved by plugging this value into (1.21) and approximating the solution numerically. (The authors of JCFO choose a conjugate gradient method.) Depending on the dataset, the number of iterations required to estimate the minimum of the target function varies remarkably, which makes it difficult to quantify the exact time complexity of calculating θ. Here we provide a comparison of the time used to compute θ with that of all other computations, averaged over 50 and 100 runs of JCFO on two datasets. The JCFO matlab source code is from Krishnapuram, one of the inventors of JCFO. The two hyperparameters (the Laplacian scales for β and θ) are chosen to be 0.002 and 4; the \theta_l are initialized as 1/k, where k is the number of features.

Table 1.1. Comparisons of computation time of θ and other parts.
Datasets              HH (50 runs), sec    Crabs (100 runs), sec
θ        mean         200.3                16.27
         std          55.70                0.4921
Other    mean         129.7                9.524
         std          37.97                0.3577

We can see that nearly two-thirds of the computation is spent on θ. Furthermore, by choosing a conjugate gradient method, as the authors did with fmincon in matlab, we cannot assume the minimum will be attained by a parsimonious θ. In other words, since the optimization itself is not sparsity-oriented, enforcing this requirement will sometimes lead to its terminating badly. This is a major weakness of JCFO. In order to verify these hypotheses, we will look at more of θ's behavior in Chapter 2.
1.6 Alternative Approaches

1.6.1 Sparse SVM

In the following two sections we briefly discuss two other approaches derived from LASSO that have a property very similar to JCFO's. (B. Scholkopf and A. J. Smola, 2002) introduce an ε-insensitive ν-SVM with the following objective function:

\min_{w \in H, \xi^{(*)} \in R^N, b \in R} \tau(w, \xi^{(*)}, \varepsilon) = \frac{1}{2}\|w\|^2 + C\Big(\nu\varepsilon + \frac{1}{N}\sum_{i=1}^{N}(\xi_i + \xi_i^*)\Big)
s.t. ((w, x_i) + b) - y_i \le \varepsilon + \xi_i,
     y_i - ((w, x_i) + b) \le \varepsilon + \xi_i^*,
     \xi_i^{(*)} \ge 0, \quad \varepsilon \ge 0.   (1.23)

From the constraints we know that if a sample falls inside the ε tube, it gets no penalty in the objective function. The slack \xi_i^{(*)} denotes the actual distance of a point from the ε tube; C is a coefficient that controls the ratio of the total slack to the norm of the weight vector, both of which we want to minimize, or more generally, the balance between sparsity and prediction accuracy. The variable ν ∈ [0, 1] is a hyperparameter that behaves as an upper bound on the fraction of errors (the number of points outside the tube) as well as a lower bound on the fraction of SVs. Now let us take the derivatives with respect to w, b and \xi^{(*)} in the Lagrangian of (1.23) and set them to zero. We get

\max_{\alpha^{(*)} \in R^N} W(\alpha^{(*)}) = \sum_{i=1}^{N}(\alpha_i^* - \alpha_i) y_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}(\alpha_i^* - \alpha_i)(\alpha_j^* - \alpha_j) K(x_i, x_j)
s.t. \sum_{i=1}^{N}(\alpha_i - \alpha_i^*) = 0,
     \alpha_i^{(*)} \in [0, C/N], \quad \sum_{i=1}^{N}(\alpha_i + \alpha_i^*) \le C\nu.   (1.24)

This is the standard dual form of the ν-SVM. Since the points inside the ε tube cannot be used as SVs, intuitively it attains more weight vector sparsity than the ordinary SVM by setting more of the \alpha^{(*)} to zero. (J. Bi et al., 2003) propose a modification that adds the term \|\alpha\|_1 to the loss function (this revision is called SparseSVM) in order to get an even more parsimonious set of SVs. This also inherits the idea from LASSO. Of course, the optimization involves a search to find both hyperparameters C and ν.

1.6.2 Relevance Vector Machine

The Relevance Vector Machine (RVM) (M. Tipping, 2000) has a design structure very similar to JCFO's.
Indeed, it also assumes a Gaussian prior \omega \sim N(0, \Upsilon) and p(t | \omega) = N(X\omega, \sigma^2 I), where \Upsilon = \mathrm{diag}(\tau_1, \tau_2, ..., \tau_N); the \tau_i can be considered the hyperparameters of ω, and \sigma^2 is known. These assumptions parallel (1.7) and (1.14). However, instead of calculating the expectations of the missing variables and plugging them back in to maximize the posterior, as JCFO does in (1.17) and (1.19), RVM integrates out ω to get the marginal likelihood:

p(t | \Upsilon, \sigma^2) = \int p(t | \omega, \sigma^2) p(\omega | \Upsilon) d\omega = (2\pi)^{-N/2} |\sigma^2 I + X \Upsilon X^T|^{-1/2} \exp\Big\{-\frac{1}{2} t^T (\sigma^2 I + X \Upsilon X^T)^{-1} t\Big\}.   (1.25)

Henceforth we can take the derivatives with respect to \tau_i and \sigma^2 in order to maximize (1.25). Interestingly, the logarithm of the Gamma distribution \Gamma(\tau_i | a, b) assigned by (Herbrich, 2002) as the prior of \tau_i has the same sparsity-promoting effect on ω as the Laplacian prior JCFO uses. We plot the shapes of this prior for different a, b in Figure 1.2.

Figure 1.2. Logarithm of gamma distribution. From top down: a=1e-2, b=1e-2; a=1e-3, b=1e-3; a=1e-4, b=1e-4.

CHAPTER 2
ANALYSIS OF JCFO

2.1 Brief Introduction

As the inventors of JCFO claim, its major objectives are twofold:
i. To learn a function that most accurately predicts the class of a new example (classifier design), and
ii. To identify a subset of the features that is most informative about the class distinction (feature selection).

Within this chapter we investigate these two attributes of JCFO on several real and synthetic datasets. From Experiment 2.2.1 to 2.3.2.2, we train our samples from the 220-sample, 7-feature HH landmine data and the 200-sample, 5-feature Crabs data as our basic datasets, both of which are separable by an appropriate classification algorithm, and use 4-fold cross validation. For the rest of the experiments, we generate several special Gaussian distributions.
From the performance evaluation's perspective, we compare JCFO's prediction and feature selection abilities, using the RBF kernel, with those of the nonkernelized (linear) ARD method presented by (Figueiredo, 2003), which is its direct ancestor. Given our interest, we devote more effort to feature selection, primarily the elimination of irrelevant and redundant features. In each scenario we contrive a couple of small experiments that test a certain aspect. A more extensive result is then provided, together with an explanation of how that result arises. The pseudocode of each experiment can be found in Appendix B.

2.2 Irrelevancy

Intuitively, if some features do not vary much from one class to another, meaning they are not sufficiently informative in differentiating the classes, we call them irrelevant features. To be more precise, it is convenient to introduce the between-class divergence

d_k = \frac{1}{2}\Big(\frac{\sigma_1^2}{\sigma_2^2} + \frac{\sigma_2^2}{\sigma_1^2} - 2\Big) + \frac{1}{2}(\mu_1 - \mu_2)^2\Big(\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}\Big),

where k denotes the kth feature and \mu_i, \sigma_i^2 are the mean and the variance of feature k in class i. Note that d_k is inversely related to the similarity of the two classes: the more similar the means and variances are, the smaller d_k becomes. In the extreme case where the two distributions represented by the kth feature totally overlap, d_k drops to zero. This is equivalent to saying that the kth feature is completely irrelevant. We will use this concise measure for comparison with the feature weight \theta_k assigned by JCFO in the following experiments. The divergences of each feature in HH and Crabs are given in Table 2.1 and Table 2.2.

Table 2.1. Feature divergence of HH
Feature     1       2       3        4       5       6       7
Divergence  0.3300  0.5101  32.7210  0.2304  1.3924  3.0806  3.9373

Table 2.2. Feature divergence of Crabs
Feature     1       2       3       4       5
Divergence  0.0068  0.3756  0.0587  0.0439  0.0326

2.2.1 Uniformly Distributed Features

The distribution of this kind of feature is completely the same between the two classes.
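For such a feature, the between-class divergence introduced above evaluates to exactly zero. A minimal sketch (function name ours) that also shows a nonzero value once the class means separate:

```python
def divergence(mu1, var1, mu2, var2):
    """Symmetric between-class divergence of one feature (Section 2.2):
    a variance-ratio term plus a mean-separation term; it is zero
    exactly when the two class distributions coincide."""
    ratio = 0.5 * (var1 / var2 + var2 / var1 - 2.0)
    means = 0.5 * (mu1 - mu2) ** 2 * (1.0 / var1 + 1.0 / var2)
    return ratio + means

print(divergence(0.0, 1.0, 0.0, 1.0))  # 0.0: identical class distributions
print(divergence(0.0, 1.0, 2.0, 1.0))  # 4.0: well-separated means
```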
Obviously, features like this should be removed. We examine the performance of JCFO and ARD in eliminating uniformly distributed features by adding a constant column as a new feature to Crabs and HH. Test results averaged over 50 runs are shown in Table 2.3 and Table 2.4. Only ARD completely sets the interfering feature weight to zero on both datasets, while JCFO fails to get rid of the invariant feature on HH. Moreover, the time consumption of JCFO is enormous.

Table 2.3. Uniformly distributed feature in Crabs
                                             JCFO     ARD
Train error rate mean (%)                    2        4
Train error rate std (%)                     0        0
Test error rate mean (%)                     6        6
Test error rate std (%)                      0        0
Percentage of the noise weights set to zero  100%     100%
Running time                                 2 hours  1 min

Table 2.4. Uniformly distributed feature in HH
                                             JCFO     ARD
Train error rate mean (%)                    1.49     14.5
Train error rate std (%)                     1.48     2.8e-15
Test error rate mean (%)                     21.7     20
Test error rate std (%)                      4.63     8.5e-15
Percentage of the noise weights set to zero  0%       100%
Running time                                 4 hours  1 min

2.2.2 Noninformative Noise Features

While a uniformly distributed feature is rarely seen in realistic datasets, most irrelevancies lie under the cover of various kinds of random data noise. We can emulate this by adding class-independent Gaussian noise as a feature to our datasets. Although different means and variances can be chosen, such a feature is still noninformative in terms of classification. Our purpose is to see whether JCFO can remove this noise feature. By adding unit-variance Gaussian noise with means of 5, 10 and 15 as a new feature and training 50 times for each, we obtain an average divergence of 0.0001 for the noise feature. The test results are provided in Table 2.5.

Table 2.5.
Noninformative Noise in HH and Crabs
                                  JCFO             ARD
                                  HH      Crabs    HH      Crabs
Train error rate mean (%)         1.55    1.82     14.55   2.40
Train error rate std (%)          1.32    0.83     0.1     1.96
Test error rate mean (%)          27.41   5.00     20.03   3.60
Test error rate std (%)           6.1     1.83     0.47    2.95
Noise weight set to zero (%)      0       8.87     96      99.3

It can be seen that almost none of the corresponding noise weights are set to zero by JCFO, whereas ARD almost always sets them to zero. This implies that JCFO does not eliminate irrelevant Gaussian features well compared with the ARD method. As mentioned in Section 1.5.2, the likely reason is that although β in (1.10) and θ in (1.14) have the same sparsity-promoting mechanism, in the implementation θ can only be derived approximately. This leaves the values of its elements tiny but still nonzero and thus compromises the sparsity of θ. A similar situation will be encountered again in the next section. Moreover, JCFO does not beat ARD in terms of test error. This can likewise be explained by the unfulfilled sparsity-related reduction of generalization error with respect to θ, since we know the two methods share the same functionality in the β part.

2.3 Redundancy

A redundant feature is one that has a similar, but not necessarily identical, structure to another feature. Here we can employ the well-known Pearson correlation coefficient to assess the correlation between two features:

\[
\rho_{ij} = \frac{\sum_{n=1}^{N} (x_i^n - \bar{x}_i)(x_j^n - \bar{x}_j)}{\sqrt{\sum_{n=1}^{N} (x_i^n - \bar{x}_i)^2}\,\sqrt{\sum_{n=1}^{N} (x_j^n - \bar{x}_j)^2}}
\]

where i and j denote the i-th and j-th features. The larger ρᵢⱼ is, the more correlated they are; the extreme case is 1, when they are identical.

2.3.1 Oracle Features

First let us look at a peculiar example: if we put the class label in as a feature, it can obviously be considered the most informative attribute in distinguishing the classes, and all the other features are redundant compared to it.
We shall check whether JCFO can directly select this 'oracle' feature and get rid of all the others; i.e., only the θᵢ corresponding to this feature is nonzero. We test JCFO and ARD 100 times on HH and Crabs respectively with the label feature added.

Table 2.6. Oracle feature
                                           JCFO  ARD
Train error rate (%)                       0     0
Only the label feature weight is nonzero?  Yes   Yes

The purpose of this toy experiment is to determine whether, as feature selection approaches, both methods can identify the most representative feature among other, relatively subtle ones. This should be considered a basic requirement for feature selection.

2.3.2 Duplicate Features

A duplicate feature is identical to another feature and is thus completely redundant; the correlation of the two features is exactly 1. This experiment examines whether JCFO can identify a duplicate feature and remove it. We achieve this by simply replicating a feature in the original dataset and adding it back as a new feature. We repeat this process for each feature and test whether it or its replica is eliminated.

Table 2.7. Duplicate feature weights on HH
Feature:         1       2       3       4       5       6       7       Error (%) Train/Test
JCFO  Itself     0.2157  0.1438  0.3130  0.2134  0.4208  0.2502  0.2453  4.082/20.32
      Duplicate  0.2300  0.1520  0.3539  0.2142  0.4010  0.2454  0.2604
ARD   Itself     0       0       0       0       0       0       0.0094  14.63/20.25
      Duplicate  0       0       0       0       0       0.3172  0

Table 2.8. Duplicate feature weights on Crabs
Feature:         1  2       3       4  5       Error (%) Train/Test
JCFO  Itself     0  0.0981  0.1091  0  0.0037  2/6
      Duplicate  0  0.0846  0.1076  0  0.0032
ARD   Itself     0  0.9558  0.9050  0  0       4/6
      Duplicate  0  0       0       0  0

From the above tables, JCFO apparently assigns each feature approximately the same weight as its duplicate, while ARD sets either the feature or its duplicate (or both) to zero. Meanwhile, considering their test errors, we arrive at the situation of 2.2.2 again.
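The duplicate- and similar-feature constructions, together with the correlation measure ρ from Section 2.3, can be sketched as follows (illustrative random data; the choice of column 2 is arbitrary):

```python
import numpy as np

def correlation(xi, xj):
    """Pearson correlation rho between two feature columns."""
    xi = (xi - xi.mean()) / xi.std()
    xj = (xj - xj.mean()) / xj.std()
    return float(np.mean(xi * xj))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))             # stand-in for a 200-sample, 5-feature dataset

dup = X[:, 2].copy()                      # duplicate feature: rho is exactly 1
similar = X[:, 2] + rng.normal(size=200)  # similar feature: highly correlated, rho < 1

rho_dup = correlation(X[:, 2], dup)
rho_similar = correlation(X[:, 2], similar)
X_aug = np.column_stack([X, dup])         # dataset with the replica appended as feature k+1
```

The experiments then run the feature selector on `X_aug` and check whether the weight of column 2 or of the appended replica is driven to zero.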
2.3.3 Similar Features

Now we examine features that are not identical but highly correlated. This section is separated into two parts.

2.3.3.1 This experiment is similar to the duplicate-feature experiment. We coin a new feature by replicating an original feature and mixing it with unit-variance Gaussian noise. Each feature is then verified by checking whether either it or its counterpart is removed; we run the test 50 times on HH and 100 times on Crabs and average the number of times either of their weights is set to zero.

Table 2.9. Percentage of either of two similar features' weights being set to zero in HH
Feature (ρ):  1 (0.4652)  2 (0.8951)  3 (0.9971)  4 (0.9968)  5 (0.9992)  6 (0.9992)  7 (0.1767)  Error (%) Train/Test
JCFO          30          30          20          10          10          10          20          3.853/21.4
ARD           100         100         100         100         99          100         92          14.51/20.11

Table 2.10. Percentage of either of two similar features' weights being set to zero in Crabs
Feature (ρ):  1 (0.9980)  2 (0.9970)  3 (0.9995)  4 (0.9996)  5 (0.9977)  Error (%) Train/Test
JCFO          90          10          30          90          50          2.013/4.840
ARD           100         91          97          99          100         4.001/5.976

The results on the two datasets still imply that JCFO cannot perform as well as ARD in terms of redundancy elimination.

Figure 2.1. Two Gaussian classes that can be classified by either the x or y axis

2.3.3.2 In Figure 2.1 above, the two ellipsoids can be differentiated by either the x or the y variable, so either can be considered redundant with respect to the other; z, however, is completely irrelevant. We generate 200 samples each time, half in each class, and run both methods 150 times. We plot the θ corresponding to the three axis features in the following figures.

Figure 2.2. Weights assigned by JCFO (from top down: x, y and z axes)

Figure 2.3.
Weights assigned by ARD (from top down: x, y and z axes)

It can be seen that both methods discover the irrelevancy of the z axis, though ARD's performance is much better. As for the redundancy, unfortunately neither of them eliminates the x or y axis. More interestingly, the θ values assigned by ARD to x and y are almost identical each time, while those assigned by JCFO are much more chaotic. Note that the redundancy here differs from the previous experiments, since it is more implicit from a machine's perspective. We will discuss this further in 2.4.

2.3.4 Gradually More Informative Features

In this experiment we generate samples from two multivariate Gaussians. The corresponding elements of the two mean vectors have gradually larger differences as their index increases; e.g., suppose μ₁ = (0, 0, ..., 0) and μ₂ = (10, 20, ..., 10k). The variance of each dimension is fixed to one fourth of the largest element difference, so in the above case it is 10k/4. By treating each dimension as a feature, we contrive a set of gradually more informative features: the feature with the largest index has the most distant means and the least overlapping variances and is thus the most separable. Compared to this feature, the rest could be deemed redundant. We run this experiment with 5-, 10- and 15-dimensional data respectively. In each experiment we generate 200 samples and perform 4-fold cross-validation. This process is repeated 50 times for JCFO and 100 times for ARD; we then check whether the most informative feature is kept while the others are deleted. Results are listed in the following three tables.

Table 2.11. Five features
          JCFO                       ARD
Feature   mean    std     zero(%)    mean    std     zero(%)
1         0.0001  0.0008  14         0.0057  0.0072  57
2         0.0041  0.0013  0          0.0352  0.0155  8
3         0.0090  0.0020  0          0.0819  0.0168  0
4         0.0158  0.0021  0          0.1486  0.0193  0
5         0.0256  0.0027  0          0.2339  0.0224  0

Table 2.12.
Ten features
          JCFO                       ARD
Feature   mean    std     zero(%)    mean    std     zero(%)
1         0       0       100        0.0001  0.0010  95
2         0.0001  0.0003  80         0.0026  0.0045  72
3         0.0011  0.0019  50         0.0070  0.0082  52
4         0.0027  0.0028  34         0.0187  0.0125  24
5         0.0040  0.0039  16         0.0286  0.0145  14
6         0.0053  0.0049  6          0.0460  0.0146  4
7         0.0099  0.0071  0          0.0668  0.0171  0
8         0.0133  0.0075  0          0.0856  0.0156  0
9         0.0144  0.0087  2          0.1125  0.0190  0
10        0.0214  0.0106  0          0.1356  0.0197  0

Table 2.13. Fifteen features
          JCFO                       ARD
Feature   mean    std     zero(%)    mean    std     zero(%)
1         0       0       100        0.0002  0.0009  94
2         0.0001  0.0004  92         0.0009  0.0024  86
3         0.0002  0.0006  84         0.0022  0.0038  71
4         0.0009  0.0017  68         0.0046  0.0063  61
5         0.0015  0.0027  62         0.0062  0.0076  56
6         0.0021  0.0036  52         0.0118  0.0099  36
7         0.0030  0.0050  44         0.0143  0.0120  35
8         0.0038  0.0050  30         0.0242  0.0143  18
9         0.0044  0.0069  34         0.0318  0.0156  10
10        0.0068  0.0089  30         0.0413  0.0138  2
11        0.0057  0.0075  30         0.0497  0.0157  2
12        0.0109  0.0120  16         0.0588  0.0176  2
13        0.0117  0.0110  8          0.0735  0.0159  0
14        0.0118  0.0136  20         0.0845  0.0173  0
15        0.0151  0.0134  12         0.0992  0.0193  0

The above tables indicate that although the most informative features are assigned the largest weights by both methods, the remaining features are not removed but rather ranked in terms of their informativeness. Generally, JCFO eliminates more redundant features in the second and third cases. However, Table 2.13 also shows that sometimes JCFO incorrectly deletes the most informative feature (12% of runs for Feature 15). In some scenarios, a feature selection method cannot eliminate the redundancy, but it gives a ranking according to how much each feature contributes to the classification, which is also an acceptable alternative (Guyon and Elisseeff, 2003).

2.3.5 Indispensable Features

In classification, if a set of features is informative only as a whole, i.e., no proper subset of them can fulfill the classification, we call them indispensable features. Eliminating a feature from an indispensable feature set will cause a selection error.
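The data construction for this test (Appendix B, 2.3.5) can be sketched as follows. This is our reading of the pseudocode: two 3-D Gaussians whose x-y plane is rotated 45 degrees clockwise so that neither x nor y alone separates the classes, while z remains pure noise; the rotation center (the overall data mean) is an assumption.

```python
import numpy as np

def indispensable_data(n=200, seed=0):
    """Two Gaussian classes separable only by using x and y jointly."""
    rng = np.random.default_rng(seed)
    mu1, mu2 = np.array([5.0, 5.0, 0.0]), np.array([8.0, 8.0, 0.0])
    cov = np.diag([4.0, 1.0, 1.0])
    X = np.vstack([rng.multivariate_normal(mu1, cov, n // 2),
                   rng.multivariate_normal(mu2, cov, n // 2)])
    y = np.repeat([0, 1], n // 2)
    # Rotate the x-y plane 45 degrees clockwise about the data mean (assumed center).
    c, s = np.cos(np.pi / 4), np.sin(np.pi / 4)
    R = np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])
    center = X.mean(axis=0)
    return (X - center) @ R.T + center, y

X, y = indispensable_data()
```

A feature selector passes this test if it keeps both the first and the second feature nonzero while eliminating the third.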
We are going to check how JCFO and ARD deal with this kind of feature. If we treat the x and y variables as two features in Figure 2.4, then together they constitute an indispensable feature set, since neither of them alone can determine the classes.

Figure 2.4. Two Gaussians that can only be classified by both the x and y axes

As in Experiment 2.3.3.2, we generate 200 samples each time, half in each class, and run both methods 150 times. Results are plotted below.

Figure 2.5. Weights assigned by JCFO (from top down: x, y and z axes)

Figure 2.6. Weights assigned by ARD (from top down: x, y and z axes)

The fact that neither JCFO nor ARD removes the x or y axis feature is a good result. However, the weights assigned by ARD are much more stable, and for the irrelevant z axis, ARD does a clean job as it did in 2.3.3.2. Therefore we can still evaluate its performance as better than JCFO's.

2.4 Nonlinearly Separable Datasets

The experiments with synthetic data in the previous two sections concentrated on classifying linearly separable features, which is not JCFO's strong suit. In the following two tests, we examine JCFO's performance on two nonlinearly separable datasets compared with that of the ARD method. The 2-D Cross data comprises two crossing Gaussian ellipses as one class and four Gaussians surrounding this cross as the other class, each class having 100 samples (Figure 2.7). The 2-D Ellipse data has two equal-mean Gaussian ellipses as the two classes, each with 50 samples; one has a much wider variance than the other, which makes the distribution look like one class surrounded by the other (Figure 2.8). Clearly, no single hyperplane can classify either of the two datasets correctly. We run JCFO, non-kernelized ARD and kernelized ARD on both datasets 50 times each.
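The two synthetic datasets can be generated along the following lines; the means and variances here are guesses reconstructed from the verbal description and Figures 2.7 and 2.8, not the exact values used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

def cross_data(n=100):
    """Class 0: two crossing elongated Gaussians; class 1: four blobs around the cross."""
    arm1 = rng.multivariate_normal([0.0, 0.0], np.diag([1.0, 0.01]), n // 2)
    arm2 = rng.multivariate_normal([0.0, 0.0], np.diag([0.01, 1.0]), n // 2)
    blobs = np.vstack([rng.multivariate_normal(m, np.diag([0.01, 0.01]), n // 4)
                       for m in ([1, 1], [1, -1], [-1, 1], [-1, -1])])
    return np.vstack([arm1, arm2, blobs]), np.repeat([0, 1], n)

def ellipse_data(n=50):
    """Two equal-mean Gaussians; the wide one surrounds the narrow one."""
    inner = rng.multivariate_normal([0.0, 0.0], np.diag([0.05, 0.05]), n)
    outer = rng.multivariate_normal([0.0, 0.0], np.diag([2.0, 2.0]), n)
    return np.vstack([inner, outer]), np.repeat([0, 1], n)

X_cross, y_cross = cross_data()
X_ell, y_ell = ellipse_data()
```

No single hyperplane separates either dataset, which is why the linear (non-kernelized) method is expected to fail here.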
A comparison of their performances is shown in Table 2.14. In the experiment summarized in Table 2.15, a noninformative irrelevant feature was added exactly as described in Section 2.2.2. In the experiment summarized in Table 2.16, one similar redundant feature was added as described in Section 2.3.3.1.

Table 2.14. Comparisons of JCFO, non-kernelized ARD and kernelized ARD
Test error (%)         Non-kernelized ARD  Kernelized ARD  JCFO
Cross    mean          50                  8.1             4.0
         std           0                   1e-14           1e-15
Ellipse  mean          48                  8.4             12
         std           4e-14               3e-15           1e-14

Table 2.15. Comparisons of JCFO and kernelized ARD with an added noninformative irrelevant feature
Test error (%)         Kernelized ARD  JCFO
Cross    mean          17.7            7.32
         std           17.6            3.20
Ellipse  mean          15.4            13.4
         std           5.70            5.03

Table 2.16. Comparisons of JCFO and kernelized ARD with an added similar redundant feature
Test error (%)         Kernelized ARD  JCFO
Cross    mean          15.3            8.76
         std           4.01            3.77
Ellipse  mean          13.4            12.3
         std           5.21            4.41

Since non-kernelized ARD performs a linear mapping, it cannot produce a reasonable classification in this scenario. Kernelized ARD and JCFO each perform a little better on one of the original datasets. JCFO outperforms ARD on both datasets when irrelevant or redundant features are mixed in, especially on the Cross data. However, the time consumed training JCFO still dominates that of kernelized ARD, and none of the parameters corresponding to the noise feature are set to zero by JCFO in these two scenarios.

Figure 2.7. Cross data

Figure 2.8.
Ellipse data

2.5 Discussion and Modification

From the trouble JCFO suffered in the experiments we designed, we reiterate what we discussed in 2.2.2: JCFO does not reliably achieve sparsity on the feature set and therefore does not realize the advantage of low generalization error due to reduced model complexity. Hence it does not achieve both of the two advantages it claims, and a more effective approximation method for calculating θ in (1.21) would be needed in order to realize JCFO's theoretical superiority. Without such a method at hand, we suggest simply dropping the θ part and returning to a kernelized ARD model, e.g. by plugging the RBF kernel function into (1.13). According to (Herbrich, 2002), this function has the appealing property that each linear combination of kernel functions of the training objects (x¹, x², ..., xᵐ) (see (1.6)) can be viewed as a density estimator in the input space, because it effectively puts a Gaussian on each xⁱ and weights its contribution to the final density by βᵢ in (1.6). Regarding feature selection, we can apply non-kernelized (linear) ARD to weigh each feature directly (this, however, requires that the dataset not be nonlinearly separable). As we saw in 2.3.3.2, the x and y variables are each time assigned the same absolute value (with different signs), which might help us detect the very implicit redundancy in that scenario. By combining these two approaches, we first use the latter to preselect the more representative features, then feed them back to the former for more precise learning. This increases the efficiency of the whole learning process. We show comparisons of the performance of non-kernelized ARD, kernelized ARD and their combination on the Crabs and HH data, trained 100 times, below.

Table 2.17.
Comparisons of the three ARD methods
Test error (%)          Non-kernelized  Kernelized  Combo
Crabs    mean           3.60            2.2         1.30
         std            2.95            0           1.10
         iterations     100             5           2
HH       mean           20.03           12.63       9.17
         std            0.47            6.26        2.11
         iterations     100             5           3

Note that after feature selection on the Crabs data, only two features remain. Kernelized ARD makes the βᵢ vanish very fast on low-dimensional datasets (here, even after 2 iterations only two βᵢ are nonzero), which causes the training result to be unstable (see the mean and std of the combined method on the Crabs data). Therefore, we must also realize that there are certain cases in which the combination method is not appropriate, and it still needs to balance the trade-off between parameter sparsity and prediction accuracy.

2.6 Conclusion

In this thesis, we introduced the theoretical background of an existing Bayesian feature selection and classification method in the first chapter, from its origin to its implementation details and complexity. In the second chapter, we systematically analyzed its performance in a series of specially designed experiments, providing several comparisons with its direct predecessor. From these experimental results we have seen that even though JCFO is theoretically more ideal in achieving sparsity in both the features and the basis functions, the lack of an effective implementation technique seriously restricts its performance. As an alternative, we suggest returning to the original ARD method, jointly using its kernelized and non-kernelized versions to perform both feature selection and class prediction. Though our model thereby becomes less ambitious, its simplicity in practice and its time efficiency are preserved, in keeping with our original design purpose.

APPENDIX A
SOME IMPORTANT DERIVATIONS OF EQUATIONS

Equation (1.9)

Consider the general case of minimizing f(x) = x² − 2ax + 2b|x| with respect to x, where a, b are constants. This is equivalent to minimizing

\[
f(x) = x^2 - 2ax + 2bx \quad (x > 0), \qquad f(0) = 0,
\]
\[
f(x) = x^2 - 2ax - 2bx \quad (x < 0).
\]

When x > 0:
\[
\arg\min_x f(x) = \begin{cases} a - b, & a - b > 0 \quad (1) \\ 0, & a - b \le 0 \quad (2) \end{cases}
\]

When x < 0:
\[
\arg\min_x f(x) = \begin{cases} a + b, & a + b < 0 \quad (3) \\ 0, & a + b \ge 0 \quad (4) \end{cases}
\]

Combining the cases:
(1) & (3): b < 0 and x = arg min{f(a − b), f(a + b)}, giving x = a + b if a < 0 and x = a − b if a > 0; hence x = sgn(a)(|a| − b).
(1) & (4): a > 0 and x = a − b > 0; hence x = sgn(a)(|a| − b)₊.
(2) & (3): a < 0 and x = a + b < 0; hence x = sgn(a)(|a| − b)₊.
(2) & (4): b > 0 and x = 0; hence x = sgn(a)(|a| − b)₊.
Summarizing the four scenarios yields x = sgn(a)(|a| − b)₊.

Integration (1.10)

\[
\int_0^{\infty} \frac{1}{\sqrt{2\pi\tau}}\, e^{-w^2/(2\tau)} \cdot \frac{\gamma}{2}\, e^{-\gamma\tau/2}\, d\tau
= \frac{\sqrt{\gamma}}{2}\, e^{-\sqrt{\gamma}\,|w|},
\]

where the closed form follows from a standard table integral after the substitution x = τ^{1/2} (Beyer, 1979).

Expectation (1.18)

Since p(zⁱ | D, θ⁽ᵗ⁾, β⁽ᵗ⁾) is N(h(xⁱ)⁽ᵗ⁾, 1) truncated to the half-line consistent with the label, consider yⁱ = 1, so that zⁱ > 0. Then

\[
E[z^i \mid D, \theta^{(t)}, \beta^{(t)}] = h(x^i)^{(t)} + \frac{N\!\left(h(x^i)^{(t)} \mid 0, 1\right)}{\Phi\!\left(h(x^i)^{(t)}\right)},
\]

the mean of a unit-variance Gaussian truncated to the positive half-line, where Φ denotes the standard Gaussian cdf. The case yⁱ = 0 follows in the same way, with the truncation to zⁱ < 0 and the correction term subtracted.

APPENDIX B
THE PSEUDOCODES FOR THE EXPERIMENT DESIGNS

2.2 Irrelevancy

NOTES: In this testing unit, the basic datasets are randomly generated and linearly separable. Each feature is normalized to zero mean, unit variance.

2.2.1 Uniformly Distributed Features

    counter ← 0
    for run = 1 to 50
        ds ← load dataset
        (n, k) ← size of ds
        ds ← append a column of 1s to ds as feature k+1
        run JCFO(ds)
        if theta(k+1) = 0
            counter ← counter + 1
    display(counter / 50)

2.2.2 Noninformative Noise Features

    counter ← 0
    for mu = 5, 10, 15
        for run = 1 to 50
            ds ← load dataset
            (n, k) ← size of ds
            ds ← append a column of mu-mean, unit-variance noise as feature k+1
            run JCFO(ds)
            if theta(k+1) = 0
                counter ← counter + 1
    display(counter / 150)

2.3 Redundancy

NOTES: In this testing unit, the basic datasets come from randomly generated, linearly separable datasets or Gaussian distributions. Each feature is normalized to zero mean, unit variance.
2.3.1 Oracle Features

    counter ← 0
    for run = 1 to 100
        ds ← load dataset
        (n, k) ← size of ds
        ds ← append the class label as feature k+1
        run JCFO(ds)
        if theta(k+1) > 0 and theta(1..k) = 0
            counter ← counter + 1
    display(counter / 100)

2.3.2 Duplicate Features

    counter ← 0
    for run = 1 to 50
        ds ← load dataset
        (n, k) ← size of ds
        for i = 1 to k
            ds ← replicate feature i and append it as feature k+1
            run JCFO(ds)
            if theta(i) = 0 or theta(k+1) = 0
                counter ← counter + 1
    display(counter / (50 * k))

2.3.3 Similar Features

2.3.3.1

    counter ← 0
    for run = 1 to 50
        ds ← load dataset
        (n, k) ← size of ds
        for i = 1 to k
            ds ← mix feature i with unit-variance noise and append it as feature k+1
            run JCFO(ds)
            if theta(i) = 0 or theta(k+1) = 0
                counter ← counter + 1
    display(counter / (50 * k))

2.3.3.2

    mu1 ← [10 0 0]; mu2 ← [0 10 0]
    sig1 ← [7 0 0; 0 1 0; 0 0 1]; sig2 ← [1 0 0; 0 7 0; 0 0 1]
    counter ← 0
    for run = 1 to 150
        ds ← randomly generate 200 labeled samples from the two Gaussians
        run JCFO(ds)
        if either feature x or feature y has been removed
            counter ← counter + 1
    display(counter / 150)

2.3.4 Gradually More Informative Features

    counter ← 0
    for k = 5, 10, 15
        for run = 1 to 100
            mu1 ← a vector of k zeros
            mu2 ← [10 20 30 ... 10k]
            sig ← (10k/4) × (k-by-k identity matrix)
            ds ← randomly generate 200 labeled samples from the two Gaussians
            run JCFO(ds)
            if theta(k) > 0 and theta(1..k-1) = 0
                counter ← counter + 1
    display(counter / 300)

2.3.5 Indispensable Features

    mu1 ← [5 5 0]; mu2 ← [8 8 0]
    sig ← [4 0 0; 0 1 0; 0 0 1]
    counter ← 0
    for run = 1 to 150
        ds ← randomly generate 200 labeled samples from the two Gaussians,
             then rotate the data 45 degrees clockwise around the class center
        run JCFO(ds)
        if theta(1) > 0 and theta(2) > 0
            counter ← counter + 1
    display(counter / 150)

LIST OF REFERENCES

W. Beyer. CRC Standard Mathematical Tables, 25th ed. CRC Press, 1979.

J. Bi, K.P. Bennett, M. Embrechts, C.M. Breneman, and M. Song. Dimensionality reduction via sparse support vector machines. JMLR, 3: 1229-1243, 2003.

M.
Figueiredo. Adaptive sparseness for supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(9): 1150-1159, 2003.

I. Guyon and A. Elisseeff. An introduction to variable and feature selection. JMLR, 3: 1157-1182, 2003.

T. Hastie, R. Tibshirani and J. Friedman. The Elements of Statistical Learning. Springer-Verlag, 2001.

R. Herbrich. Learning Kernel Classifiers. MIT Press, 2002.

B. Krishnapuram, A.J. Hartemink and M.A.T. Figueiredo. A Bayesian approach to joint feature selection and classifier design. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(9): 1105-1111, 2004.

B. Krishnapuram, D. Williams, Y. Xue, A. Hartemink, L. Carin and M. Figueiredo. On semi-supervised classification. In L.K. Saul, Y. Weiss and L. Bottou, editors, Advances in Neural Information Processing Systems 17. MIT Press, 2005.

B. Scholkopf and A.J. Smola. Learning with Kernels: Regularization, Optimization, and Beyond. MIT Press, 2002.

R. Tibshirani. Regression shrinkage and selection via the lasso. J. Royal Statistical Soc. (B), 58: 267-288, 1996.

M. Tipping. The relevance vector machine. In S.A. Solla, T.K. Leen and K.R. Muller, editors, Advances in Neural Information Processing Systems 11, pp. 218-224. MIT Press, 2000.

BIOGRAPHICAL SKETCH

Fan Mao was born in Chengdu, China, in 1983. He received his bachelor's degree in computer science and technology from Shanghai Maritime University, Shanghai, China, in 2006. He then came to the University of Florida, Gainesville, FL. In December 2007, he received his M.S. in computer science under the supervision of Dr. Paul Gader.
3 PAGE 4 ACKNOWLEDGMENTS I would like to thank my th esis adviser, Dr. Paul Gade r, for his encouragement and valuable adviceboth on the general research direc tion and the specific experi mental details. As a starter, I sincerely appreciate the opportunity of getting th e direct help from an expert in this field. I also thank Dr. Arunava Banerjee and Dr. Jose ph Wilson for being my committee members and reading my thesis. Special thanks go to Xuping Zhang. We had a lot of interesting disc ussions in the last half year, many of which greatly inspired me a nd turned out to be on my experiment designs. 4 PAGE 5 TABLE OF CONTENTS page ACKNOWLEDGMENTS ...............................................................................................................4 LIST OF TABLES ...........................................................................................................................7 LIST OF FIGURES .........................................................................................................................8 LIST OF SYMBOLS .......................................................................................................................9 ABSTRACT ...................................................................................................................................11 CHAPTER 1 VARIABLE SELECTION VIA JCFO...................................................................................12 1.1 Introduction ..................................................................................................................12 1.2 Origin ...........................................................................................................................12 1.2.1 LASSO ...............................................................................................................12 1.2.2 The Prior of ...................................................................................................14 1.3 JCFO 
............................................................................................................................16 1.4 EM Algorithm for JCFO ..............................................................................................17 1.5 Complexity Analysis ....................................................................................................19 1.6 Alternative Approaches ...............................................................................................21 1.6.1 Sparse SVM .......................................................................................................21 1.6.2 Relevance Vector Machine ................................................................................23 2 ANALYSIS OF JCFO............................................................................................................24 2.1 Brief Introduction .........................................................................................................24 2.2 Irrelevancy ...................................................................................................................24 2.2.1 Uniformly Distributed Features .........................................................................25 2.2.2 Noninformative Noise Features ........................................................................26 2.3 Redundancy ..................................................................................................................27 2.3.1 Oracle Features ..................................................................................................27 2.3.2 Duplicate Features .............................................................................................28 2.3.3 Similar Features .................................................................................................29 2.3.4 Gradually More Informative Features ...............................................................31 2.3.5 Indispensable Features 
.......................................................................................33 2.4 Nonlinearly Separable Datasets ...................................................................................35 2.5 Discussion and Modification .......................................................................................37 2.6 Conclusion ...................................................................................................................38 APPENDIX A SOME IMPORTANT DERIVATIONS OF EQUATIONS...................................................40 5 PAGE 6 B THE PSEUDOCODES FOR EXPERIMENTS DESIGN......................................................43 LIST OF REFERENCES...............................................................................................................46 BIOGRAPHICAL SKETCH.........................................................................................................47 6 PAGE 7 LIST OF TABLES Table page 1.1. Comparisons of computation time of and other parts.......................................................21 2.1. Feature divergence of HH.................................................................................................. .....25 2.2. Feature divergence of Crabs............................................................................................... ....25 2.3. Uniformly distributed feature in Crabs...................................................................................2 6 2.4. Uniformly distributed feature in HH.......................................................................................26 2.5. Noninformative Noise in HH and Crabs...............................................................................26 2.6. Oracle feature............................................................................................................ ..............28 2.7. 
Duplicate feature weights on HH........................................................................................... .28 2.8. Duplicate feature weights on Crabs........................................................................................ 28 2.9. Percentage of either two identical f eatures weights being set to zero in HH........................29 2.10. Percentage of either two identical features weights being set to zero in Crabs...................29 2.11. Five features............................................................................................................ ..............32 2.12. Ten features............................................................................................................. ..............32 2.13. Fifteen features......................................................................................................................32 2.14. Comparisons of JCFO, nonkernelized ARD and kernelized ARD.....................................35 2.15. Comparisons of JCFO and kernelized ARD with an added noninformative irrelevant feature................................................................................................................................35 2.16. Comparisons of JCFO and kernelized ARD with an added similar redundant feature........36 2.17. Comparisons of the three ARD methods..............................................................................38 7 PAGE 8 LIST OF FIGURES Figure page 1.1. Gaussian (dotted) vs. Laplacian (solid) prior..........................................................................15 1.2. Logarithm of gamma distribution. From t op down: a=1e2 b=1e2; a=1e3 b=1e3; a=1e4 b=1e4.....................................................................................................................23 2.1. Two Gaussian classes that can be classified by either x or y axes.........................................30 2.2. 
Weights assigned by JCFO (from top down: x, y and z axes)
2.3. Weights assigned by ARD (from top down: x, y and z axes)
2.4. Two Gaussians that can only be classified by both the x and y axes
2.5. Weights assigned by JCFO (from top down: x, y and z axes)
2.6. Weights assigned by ARD (from top down: x, y and z axes)
2.7. Cross data
2.8. Ellipse data

LIST OF SYMBOLS

$\mathbf{x}_i$: the $i$th input object vector
$x_i^j$: the $j$th element of the $i$th input object vector
$\beta$: weight vector
$\|\cdot\|_p$: $p$-norm
$I$: identity matrix
$\mathbf{0}$: zero vector
$N(v|0,1)$: zero-mean, unit-variance normal distribution evaluated at $v$
$\mathrm{sgn}(\cdot)$: sign function
$H$: design matrix
$(\cdot)_+$: positive-part operator
$\Phi(z)$: Gaussian cdf, $\int_{-\infty}^{z} N(x|0,1)\,dx$
$\langle z\rangle$: expectation of $z$
$\beta^{(t)}$: the estimate of $\beta$ in the $t$th iteration
$O(\cdot)$: big-O notation of complexity
$\odot$: elementwise Hadamard matrix multiplication

LIST OF ABBREVIATIONS

JCFO: Joint Classifier and Feature Optimization
LASSO: Least Absolute Shrinkage and Selection Operator
RBF: Radial Basis Function
SVM: Support Vector Machine
RVM: Relevance Vector Machine
ARD: Automatic Relevance Determination

Abstract of Thesis Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Master of Science

A STUDY OF JOINT CLASSIFIER AND FEATURE OPTIMIZATION: THEORY AND ANALYSIS

By Fan Mao
December 2007
Chair: Paul Gader
Major: Computer Engineering

Feature selection is a major focus in modern-day processing of high-dimensional
datasets that have many irrelevant and redundant features and variables, such as in text mining and gene-expression array analysis. An ideal feature selection algorithm extracts the most representative features while eliminating the noninformative ones, achieving both identification of the significant features and computational efficiency. Furthermore, feature selection is also important for avoiding overfitting and reducing the generalization error in regression estimation, where feature sparsity is preferred. In the first half of this thesis we provide a thorough analysis of an existing state-of-the-art Bayesian feature selection method, presenting its theoretical background, implementation details and computational complexity. In the second half we analyze its performance on several experiments that we design with real and synthetic datasets, point out certain limitations it has in practice, and finally give a modification.

CHAPTER 1
VARIABLE SELECTION VIA JCFO

1.1 Introduction

Joint Classifier and Feature Optimization (JCFO) was first introduced by Krishnapuram et al. (2004). It is directly inspired by Figueiredo (2003), achieving sparsity in feature selection by driving feature weights to zero. Traditional ridge regression shrinks the range of the regression parameters (the weight coefficients of a linear classifier) but rarely sets them exactly to zero; JCFO instead inherits the spirit of the LASSO (Tibshirani, 1996) and drives some of the parameters exactly to zero, which is equivalent to removing the corresponding features. In this way, it is claimed, JCFO eliminates redundant and irrelevant features.
The remainder of this chapter is arranged as follows. Section 2 takes a close look at how this idea was derived; Section 3 gives the mathematical structure of JCFO; Sections 4 and 5 present the EM algorithm used to derive the learning procedure for JCFO and analyze its complexity. The last part briefly introduces two other approaches with similar functionality.

1.2 Origin

1.2.1 LASSO

Suppose we have a data set $Z=\{(\mathbf{x}_i,y_i)\},\ i=1,2,\ldots,N$, where $\mathbf{x}_i=(x_{i1},x_{i2},\ldots,x_{ik})$ is the $i$th input variable vector, whose elements are called features, and the $y_i$ are the responses or class labels (this thesis only considers $y_i\in\{0,1\}$). Ordinary least squares regression finds the $\beta=(\beta_1,\beta_2,\ldots,\beta_k)^T$ that minimizes

$$\sum_{i=1}^{N}\Big(y_i-\sum_{j=1}^{k}\beta_j x_{ij}\Big)^2. \qquad (1.1)$$

The solution, when $X^TX$ is invertible, is the well-known least squares estimate $\hat{\beta}^0=(X^TX)^{-1}X^Ty$. As mentioned in Hastie et al. (2001), it often has low bias but large variance, which tends to incur overfitting. To improve prediction accuracy we usually shrink some elements of $\beta$, or set them to zero, in order to achieve parameter sparsity. If this is done appropriately, we also perform an implicit feature selection on the dataset, getting rid of the insignificant features and extracting the more informative ones. In addition to simplifying the structure of the estimated function, the sparsity of the weight vector, according to Herbrich (2002), also plays a key role in controlling the generalization error, which we discuss in the next section. Ridge regression, a revised approach that penalizes large $\beta_j$, is

$$\hat{\beta}=\arg\min_{\beta}\ \sum_{i=1}^{N}\Big(y_i-\sum_{j=1}^{k}\beta_j x_{ij}\Big)^2+\lambda\sum_{j=1}^{k}\beta_j^2. \qquad (1.2)$$

Here $\lambda$ is a shrinkage coefficient that adjusts the ratio of the squared 2-norm of $\beta$ to the residual sum of squares in the objective function.
The solution of (1.2) is $\hat{\beta}=\hat{\beta}^0/(1+\gamma)$ (Tibshirani, 1996), where $\gamma$ depends on $\lambda$ and $X$. This shows that ridge regression reduces $\beta$ to a fraction of $\hat{\beta}^0$ but rarely sets its elements exactly to zero, hence it cannot completely achieve the goal of feature selection. As an alternative approach, Tibshirani (1996) proposed changing (1.2) to

$$\hat{\beta}=\arg\min_{\beta}\ \sum_{i=1}^{N}\Big(y_i-\sum_{j=1}^{k}\beta_j x_{ij}\Big)^2+\lambda\sum_{j=1}^{k}|\beta_j|, \qquad (1.3)$$

or equivalently,

$$\hat{\beta}=\arg\min_{\beta}\ (y-X\beta)^T(y-X\beta)+\lambda\|\beta\|_1, \qquad (1.4)$$

where $\|\cdot\|_1$ denotes the 1-norm. This is called the Least Absolute Shrinkage and Selection Operator (LASSO). To see why the 1-norm favors more sparsity of $\beta$, note that $\|(1/2,1/2)\|_2=\sqrt{1/2}<\|(1,0)\|_2=1$, while $\|(1/2,1/2)\|_1=\|(1,0)\|_1=1$: the 2-norm strictly prefers spreading the weight across elements, whereas the 1-norm does not penalize concentrating it, hence the 1-norm tends to set more elements exactly to zero. By Tibshirani (1996), the solution of (1.4) for an orthogonal design $X$ is

$$\hat{\beta}_j=\mathrm{sgn}(\hat{\beta}_j^0)\,\big(|\hat{\beta}_j^0|-\gamma\big)_+, \qquad (1.5)$$

where $\mathrm{sgn}(\cdot)$ denotes the sign function, $(a)_+$ is defined as $a$ if $a\ge 0$ and $0$ otherwise, and $\gamma$ depends on $\lambda$. Intuitively, $\gamma$ acts as a threshold that filters out those $\hat{\beta}_j^0$ below a certain magnitude and truncates them to zero. This is exactly the attribute by which our purpose of obtaining sparsity is fulfilled.

1.2.2 The Prior of β

The ARD (Automatic Relevance Determination) approach proposed by Figueiredo (2003) inherits this idea of promoting sparsity from the LASSO through a Bayesian approach. It considers the regression functional $h$ as linear with respect to $\beta$, so the estimated function is

$$f(\mathbf{x},\beta)=\sum_{j=1}^{d}\beta_j h_j(\mathbf{x})=\mathbf{h}(\mathbf{x})^T\beta, \qquad (1.6)$$

where $\mathbf{h}(\mathbf{x})$ could be a vector of linear transformations of $\mathbf{x}$, nonlinear fixed basis functions, or kernel functions; stacked over the samples, these form the so-called design matrix $H$ with $H_{ij}=h_j(\mathbf{x}_i)$. Further, it assumes the error is zero-mean Gaussian, $y_i=\mathbf{h}(\mathbf{x}_i)^T\beta+\varepsilon$ with $\varepsilon\sim N(0,\sigma^2)$. Hence the likelihood is

$$p(y|\beta)=N(y\,|\,H\beta,\ \sigma^2 I), \qquad (1.7)$$

where $I$ is the identity matrix. Note that since the samples are assumed i.i.d. Gaussian, there is no correlation among them.
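To make the soft-thresholding rule of (1.5) concrete, here is a minimal sketch (an illustration only; the function name and the example threshold value are our own, not part of the thesis):

```python
def soft_threshold(beta_ols, gamma):
    """LASSO solution for an orthogonal design, eq. (1.5):
    sgn(b) * max(|b| - gamma, 0), applied elementwise."""
    return [
        (1.0 if b > 0 else -1.0) * max(abs(b) - gamma, 0.0)
        for b in beta_ols
    ]

# Coefficients whose magnitude is below the threshold are driven
# exactly to zero; the rest are shrunk toward zero by gamma.
print(soft_threshold([0.3, -1.5, 0.05, 2.0], gamma=0.5))
# -> [0.0, -1.0, 0.0, 1.5]
```

This exact-zeroing behavior is what distinguishes the LASSO (and the ARD method below) from ridge regression, which only rescales coefficients.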
More importantly, this method also assigns a Laplacian prior to $\beta$:

$$p(\beta)=\prod_{i=1}^{k}\frac{\lambda}{2}\exp\{-\lambda|\beta_i|\}=\Big(\frac{\lambda}{2}\Big)^k\exp\{-\lambda\|\beta\|_1\}.$$

The influence that a prior exerts on the sparsity of $\beta$ was discussed in Herbrich (2002), where a $N(\mathbf{0},I_k)$ prior was used to illustrate that $\beta$'s log-density is proportional to $-\sum_{i=1}^{k}\beta_i^2$, whose highest value is attained at $\beta=\mathbf{0}$. To compare the Gaussian and Laplacian priors, we plot both density functions in Figure 1.1. The latter is much more peaked at the origin and therefore favors more elements of $\beta$ being zero.

Figure 1.1. Gaussian (dotted) vs. Laplacian (solid) prior

The MAP estimate of $\beta$ is given by

$$\hat{\beta}=\arg\min_{\beta}\ \|y-H\beta\|_2^2+2\sigma^2\lambda\|\beta\|_1. \qquad (1.8)$$

It is easy to see that this is essentially the same as (1.3). If $H$ is an orthogonal matrix, (1.8) can be solved separately for each $\beta_i$ (see Appendix 1.1 for the detailed derivation):

$$\hat{\beta}_i=\arg\min_{\beta_i}\big\{\beta_i^2-2(H^Ty)_i\,\beta_i+2\sigma^2\lambda|\beta_i|\big\}=\mathrm{sgn}\big((H^Ty)_i\big)\,\big(|(H^Ty)_i|-\sigma^2\lambda\big)_+. \qquad (1.9)$$

Unfortunately, in the general case $\beta_i$ cannot be solved directly from (1.8), owing to its nondifferentiability at the origin. As a modification, Figueiredo (2003) presents a hierarchical-Bayes view of the Laplacian prior, showing it is nothing but a two-level Bayes model: a zero-mean Gaussian with independent, exponentially distributed variance,

$$p(\beta_i|\tau_i)=N(\beta_i\,|\,0,\tau_i)\quad\text{and}\quad p(\tau_i)=(\gamma/2)\exp\{-\gamma\tau_i/2\},$$

such that

$$p(\beta_i)=\int_0^{\infty}p(\beta_i|\tau_i)\,p(\tau_i)\,d\tau_i=\frac{\sqrt{\gamma}}{2}\exp\{-\sqrt{\gamma}\,|\beta_i|\}, \qquad (1.10)$$

where $\tau_i$ can be considered a hidden variable computed by an EM algorithm, while $\gamma$ is the real hyperparameter we need to specify. (This integration can be found in Appendix 1.2.)

1.3 JCFO

From the same Bayesian viewpoint, JCFO (Krishnapuram et al., 2004) applies a Gaussian cumulative distribution function (the probit link) to (1.6) to obtain a probability measure of how likely an input object belongs to a class $c\in\{0,1\}$. To be more precise,

$$P(y=1\,|\,\mathbf{x})=\Phi\Big(\beta_0+\sum_{i=1}^{N}\beta_i\,K_{\theta}(\mathbf{x},\mathbf{x}_i)\Big), \qquad (1.11)$$

where $\Phi(z)=\int_{-\infty}^{z}N(x|0,1)\,dx$ (note that $\Phi(-z)=1-\Phi(z)$) and $K_{\theta}(\mathbf{x},\mathbf{x}_j)$ is a symmetric measure of the similarity of two input objects.
For example,

$$K_{\theta}(\mathbf{x}_i,\mathbf{x}_j)=\Big(1+\sum_{l=1}^{k}\theta_l\,x_{il}\,x_{jl}\Big)^r \qquad (1.12)$$

for polynomial functions, and

$$K_{\theta}(\mathbf{x}_i,\mathbf{x}_j)=\exp\Big\{-\sum_{l=1}^{k}\theta_l^2\,(x_{il}-x_{jl})^2\Big\} \qquad (1.13)$$

for Radial Basis Functions. (The good attributes of RBF functions are discussed in Section 2.4.)

An apparent disadvantage of (1.6) is that unless we choose $h$ as a linear function of $\mathbf{x}$ (or simply the input vector $\mathbf{x}$ itself), the calculation of $\beta$ amounts to selecting the kernel functions of $\mathbf{x}$ rather than selecting the features of $\mathbf{x}$. This is also why it is conceptually an ARD method. As a major modification that explicitly selects each feature, JCFO assigns each feature $x^i$ a corresponding parameter $\theta_i$ in (1.12) and (1.13). Since the same $\theta_i$ is applied to the $i$th element of every input $\mathbf{x}_j$, it weighs the significance of the $i$th feature across the input objects; this is how feature selection is incorporated. Another point that needs to be mentioned is that all $\theta_i$ should be nonnegative, because $K_{\theta}(\mathbf{x},\mathbf{x}_j)$ measures the similarity of two objects as a whole, i.e., it accumulates the differences of each pair of corresponding elements; had some $\theta_i$ been negative in (1.12), the contribution of the $i$th elements could have cancelled that of the $j$th. Then, as with $\beta_i$, each $\theta_i$ is given a (truncated) Laplacian prior, too:

$$p(\theta_k|\upsilon_k)=\begin{cases}2\,N(\theta_k\,|\,0,\upsilon_k) & \text{if }\theta_k\ge 0,\\ 0 & \text{if }\theta_k<0,\end{cases} \qquad (1.14)$$

where $p(\upsilon_k)=(\gamma_2/2)\exp\{-\gamma_2\upsilon_k/2\}$, thus

$$p(\theta_k)=\int_0^{\infty}p(\theta_k|\upsilon_k)\,p(\upsilon_k)\,d\upsilon_k=\begin{cases}\sqrt{\gamma_2}\,\exp\{-\sqrt{\gamma_2}\,\theta_k\} & \text{if }\theta_k\ge 0,\\ 0 & \text{if }\theta_k<0.\end{cases} \qquad (1.15)$$

1.4 EM Algorithm for JCFO

Now, from the i.i.d. assumption on the training data, the likelihood function becomes

$$p(D|\beta,\theta)=\prod_{i=1}^{N}\Phi\Big(\beta_0+\sum_{j=1}^{N}\beta_j K_{\theta}(\mathbf{x}_i,\mathbf{x}_j)\Big)^{y_i}\,\Phi\Big(-\beta_0-\sum_{j=1}^{N}\beta_j K_{\theta}(\mathbf{x}_i,\mathbf{x}_j)\Big)^{(1-y_i)}. \qquad (1.16)$$

Rather than maximizing this likelihood directly, JCFO provides an EM algorithm, first assuming a random function $z(\mathbf{x},\beta,\theta)=\mathbf{h}(\mathbf{x})^T\beta+\varepsilon$, where $\mathbf{h}(\mathbf{x})=[1,K_{\theta}(\mathbf{x},\mathbf{x}_1),\ldots,K_{\theta}(\mathbf{x},\mathbf{x}_N)]^T$ and $\varepsilon\sim N(0,1)$. It then treats $\mathbf{z}=[z_1,z_2,\ldots,z_N]$ as the missing variables whose expectations are calculated in the E step.
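The feature-weighting role of θ in the RBF kernel (1.13) can be sketched directly. The snippet below is a plain illustration (the function name is ours): a zero θ_l removes feature l from the similarity measure entirely.

```python
import math

def rbf_kernel(x_i, x_j, theta):
    """Feature-weighted RBF kernel of eq. (1.13):
    K_theta(x_i, x_j) = exp(-sum_l theta_l^2 * (x_il - x_jl)^2)."""
    s = sum((t * t) * (a - b) ** 2 for t, a, b in zip(theta, x_i, x_j))
    return math.exp(-s)

# With theta = (1, 0), only the first feature matters: two points that
# differ only in feature 2 are judged identical (K = 1).
print(rbf_kernel([0.0, 3.0], [0.0, -7.0], [1.0, 0.0]))  # -> 1.0
```

Driving a θ_l to zero is therefore exactly what "removing feature l" means inside the JCFO classifier.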
E step. The complete log-posterior is

$$\log p(\beta,\theta\,|\,D,\mathbf{z},\tau,\upsilon)\ \propto\ -\mathbf{z}^T\mathbf{z}+2\mathbf{z}^TH\beta-\beta^TH^TH\beta-\beta^T\Upsilon\beta-\theta^TR\,\theta,$$

where $\Upsilon=\mathrm{diag}(\tau_0^{-1},\tau_1^{-1},\ldots,\tau_N^{-1})$ and $R=\mathrm{diag}(\upsilon_1^{-1},\upsilon_2^{-1},\ldots,\upsilon_k^{-1})$. We want to calculate its expectation, i.e.,

$$Q(\beta,\theta\,|\,\beta^{(t)},\theta^{(t)})=\big\langle -\mathbf{z}^T\mathbf{z}+2\mathbf{z}^TH\beta-\beta^TH^TH\beta-\beta^T\Upsilon\beta-\theta^TR\,\theta\big\rangle. \qquad (1.17)$$

The expectations with respect to the missing variables are

$$v_i\equiv\langle z_i\,|\,\beta^{(t)},\theta^{(t)},D\rangle=\mathbf{h}(\mathbf{x}_i)^T\beta^{(t)}+\frac{(2y_i-1)\,N\big(\mathbf{h}(\mathbf{x}_i)^T\beta^{(t)}\,\big|\,0,1\big)}{\Phi\big((2y_i-1)\,\mathbf{h}(\mathbf{x}_i)^T\beta^{(t)}\big)}, \qquad (1.18)$$

$$\langle\tau_i^{-1}\,|\,\beta^{(t)},\theta^{(t)},D\rangle=\gamma_1\,|\beta_i^{(t)}|^{-1} \quad\text{and}\quad \langle\upsilon_i^{-1}\,|\,\beta^{(t)},\theta^{(t)},D\rangle=\gamma_2\,|\theta_i^{(t)}|^{-1},$$

where $\beta_i^{(t)}$ denotes the estimated value of $\beta_i$ in the $t$th iteration. (The derivation of (1.18) can be found in Appendix 1.3.)

M step. With the expectations of the missing variables at hand, we can apply MAP estimation to the $Q$ function of (1.17). Dropping all terms irrelevant to $\beta$ and $\theta$, and setting $\mathbf{v}=[v_1,v_2,\ldots,v_N]^T$, $\Upsilon=\mathrm{diag}(\langle\tau_0^{-1}\rangle,\ldots,\langle\tau_N^{-1}\rangle)$ and $\Omega=\mathrm{diag}(\langle\upsilon_1^{-1}\rangle,\ldots,\langle\upsilon_k^{-1}\rangle)$, we get

$$Q(\beta,\theta\,|\,\beta^{(t)},\theta^{(t)})=-\beta^TH^TH\beta+2\mathbf{v}^TH\beta-\beta^T\Upsilon\beta-\theta^T\Omega\,\theta. \qquad (1.19)$$

Taking the derivatives of (1.19) with respect to $\beta$ and $\theta_k$ gives

$$\frac{\partial Q}{\partial\beta}=-2H^TH\beta+2H^T\mathbf{v}-2\Upsilon\beta, \qquad (1.20)$$

$$\frac{\partial Q}{\partial\theta_k}=2\,(\mathbf{v}-H\beta)^T\,\frac{\partial H}{\partial\theta_k}\,\beta-2\,\omega_k\theta_k, \qquad (1.21)$$

where $\partial H/\partial\theta_k$ collects the elementwise derivatives $\partial H_{ij}/\partial\theta_k$; when the corresponding double sum over $i$ and $j$ is written out in matrix form, the elementwise Hadamard matrix multiplication $\odot$ appears. Now, by jointly maximizing (1.19) with respect to both $\beta$ and $\theta$, we select not only the kernel functions but also the features of the input objects, provided both $\beta$ and $\theta$ are parsimonious. This is the core motivation of JCFO. Something that needs to be stressed is that $\beta$ can be solved for directly in each iteration, while $\theta_k$ can only be computed by an approximation method. This in fact increases the computational complexity and, even worse, influences the sparsity of $\theta$. We will return to this issue repeatedly in Chapter 2.

1.5 Complexity Analysis

What is the complexity of the above EM algorithm in the general case? Since each iteration alternately computes $\beta$ and $\theta$, we investigate the two scenarios respectively.

Complexity of estimating β. Let us first look at the search for $\beta$, i.e.,
setting the derivative (1.20) of (1.19) to zero:

$$\hat{\beta}=\big(H^TH+\Upsilon\big)^{-1}H^T\mathbf{v}. \qquad (1.22)$$

i) If kernel functions are applied, computing the inner products costs $O(kN^2)$, while the common matrix inversion and multiplication of an $N\times N$ matrix in Matlab cost $O(N^c)$, $2<c\le 3$, and $O(N^3)$ respectively. The whole complexity of (1.22) is therefore $O(kN^2+N^c+N^3)$; for low-dimensional datasets with $k<N$ we can simplify this to $O(N^3)$.

ii) If only linear regression is used, there is no need to compute $H$, so the complexity is that of the larger of the matrix inversion and multiplication, i.e., $O(\max(N,k)^c)$ with $2<c\le 3$, or $O(N^c)$ if we assume $k\le N$. Compared with the kernel version this does not seem to reduce the computation by much; however, once the constants hidden in the O-notation are taken into account, the difference becomes conspicuous. Meanwhile, without the kernel we also avoid computing $\theta$, which saves enormous computation time, as we show in the subsequent section.

As an improvement, we can exploit the fact that after a few iterations more and more $\beta_j$ vanish, so there is no point in recalculating their corresponding kernel functions $K_{\theta}(\mathbf{x}_i,\mathbf{x}_j)$, $i=1,2,\ldots,N$: the corresponding entries can be deleted from $H$, shrinking the size of the design matrix significantly. This desirable trick can also be applied to $\theta$, especially for high-dimensional datasets, as an alleviation of the computational pressure incurred by kernel methods.

Complexity of estimating θ. After obtaining the value of $\beta$ in the current iteration, $\theta$ is found by plugging this value into (1.21) and maximizing approximately. (The authors of JCFO chose a conjugate gradient method.)
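In contrast to this inner iterative search for θ, the pruning trick for β described above is a single bookkeeping step. A sketch (the function name, tolerance, and column layout, with the bias weight β_0 first as in (1.11), are our assumptions, not the thesis's code):

```python
def prune_design(H, beta, tol=1e-12):
    """Drop from the design matrix the kernel columns whose weights have
    vanished, so later iterations skip recomputing them.
    H: N x (N+1) list of rows [1, K(x_i, x_1), ..., K(x_i, x_N)];
    beta: length-(N+1) weight vector with the bias beta[0] first."""
    keep = [j for j, b in enumerate(beta) if j == 0 or abs(b) > tol]
    H_small = [[row[j] for j in keep] for row in H]
    beta_small = [beta[j] for j in keep]
    return H_small, beta_small, keep

H = [[1.0, 0.5, 0.2], [1.0, 0.1, 0.9]]
beta = [0.3, 0.0, 0.7]           # the weight on the first kernel vanished
print(prune_design(H, beta))     # keeps columns 0 and 2
```

As sparsity grows over the iterations, the solve in (1.22) is then performed on a progressively smaller matrix.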
Depending on the dataset, the number of iterations required to estimate the minimum of the target function varies remarkably, which makes it difficult to quantify the exact time complexity of calculating $\theta$. Here we compare the time used to compute $\theta$ with that of all other computations, averaging 50 and 100 runs of JCFO on two datasets. The JCFO Matlab source code is from Krishnapuram, one of the inventors of JCFO. The two hyperparameters are chosen as $\gamma_1=0.002$ and $\gamma_2=4$; the $\theta_i$ are initialized to $1/k$, where $k$ is the number of features.

Table 1.1. Comparisons of computation time of θ and other parts
Variable         HH (50 runs), sec    Crabs (100 runs), sec
θ       mean     200.3                16.27
        std      55.70                0.4921
Other   mean     129.7                9.524
        std      37.97                0.3577

We can see that nearly two thirds of the computation goes into $\theta$. Furthermore, with a conjugate gradient method, as the authors use via fmincon in Matlab, we cannot assume the minimum is attained at a parsimonious $\theta$. In other words, since the optimization itself is not sparsity-oriented, enforcing this requirement sometimes leads to bad termination. This is a major weakness of JCFO. To verify these hypotheses, we will look at more behaviors of $\theta$ in Chapter 2.

1.6 Alternative Approaches

1.6.1 Sparse SVM

In the following two sections we briefly discuss two other approaches, derived from the LASSO, that have properties very similar to JCFO's. Schölkopf and Smola (2002) introduce the ε-insensitive ν-SVM regression, which has the following objective function:
The parameter (*) i denotes the real di stance of a point to the tube C is a coefficient that controls the ratio of the total (*) i distance to the norm of the weight vector both of which we want to minimize or generally, the balan ce between the sparsity and the prediction accuracy. The variable [0,1] is a hyper parameter that behaves as the upper bound of the fraction of errors (the number of points outside the tube ) as well as the lower bound of the fraction of SVs. Now lets take the derivative of ,,, b w in the dual form of (1.22) and set them to zero. We would get (*) 11 1 1 (*) 11 max()()()()(,), 2 ..()0, [0,], ().NNN N iii iiiiij ii j N ii i i N ii iWyK st C m C xxR (1.24) This is the standard form of SVM Since the points inside the tube can't be used as the SVs, intuitively it attains more weight ve ctor sparsity than ordinary SVM by assigning more (*) i to be zero. (J. Bi et al, 2003) proposes a modification by adding a term 1 N i i to the loss function (this revision is called SparseSVM) in order to get an even more parsimonious set 22 PAGE 23 of SVs. This also inherits the idea from La sso. Of course, optimization involves a searching algorithm to find both the prior C and 1.6.2 Relevance Vector Machine The Relevance Vector Machin e (RVM) (M. Tipping, 2000) ha s a very similar design structure as JCFO. Indeed, it also assumes a Gaussian prior on ~(, ) 0N and 2()~(,) pt X I N where 12(,,)kdiag i can be considered as the hyper parameter of and 2 is known. These assumptions are the same as (1.7) and (1.14). However, instead of calculating the expectation of t and and plugging them back to maximize the posterior as what JCFO did in (1.17) a nd (1.19), RVM integrates out to get the marginal likelihood: 22 1 22 2 2(,)(,)() 1 (2) exp( ) 2m TTTptppd t 1 I XXtIXXt (1.25) Henceforth we can take the derivatives with respect to i and 2 in order to maximize (1.25). 
Interestingly, the logarithm of the Gamma distribution $\Gamma(a,b)$ assigned by Herbrich (2002) as the prior of $\theta_i$ has the same effect in promoting the sparsity of $\theta$ as the Laplacian prior that JCFO uses. We plot the shapes of this prior for different $a,b$ in Figure 1.2.

Figure 1.2. Logarithm of gamma distribution. From top down: a=1e-2, b=1e-2; a=1e-3, b=1e-3; a=1e-4, b=1e-4.

CHAPTER 2
ANALYSIS OF JCFO

2.1 Brief Introduction

As the inventors of JCFO claim, its major objectives are twofold:

i. To learn a function that most accurately predicts the class of a new example (classifier design), and
ii. To identify a subset of the features that is most informative about the class distinction (feature selection).

In this chapter we investigate these two attributes of JCFO on several real and synthetic datasets. From Experiment 2.2.1 through Experiment 2.3.3.1 we train on the 220-sample, 7-feature HH landmine data and the 200-sample, 5-feature Crabs data as our basic datasets, both of which are separable by an appropriate classification algorithm, using 4-fold cross-validation. For the rest of the experiments we generate several special Gaussian distributions. From the performance-evaluation perspective, we compare both the prediction and the feature selection abilities of JCFO with an RBF kernel against the nonkernelized (linear) ARD method presented by Figueiredo (2003), which is its direct ancestor. Given our interest, we devote more effort to feature selection, primarily the elimination of irrelevant and redundant features. In each scenario we contrive a couple of small experiments that test a certain aspect; a more extensive result is then provided, together with an explanation of how it is derived. The pseudocode of each experiment can be found in Appendix B.
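The 4-fold cross-validation used throughout these experiments can be sketched as follows. This is a generic illustration: the function name, seed handling and interleaved fold assignment are our assumptions, since the thesis gives its experimental procedures only as pseudocode in Appendix B.

```python
import random

def four_fold_splits(n, seed=0):
    """Split sample indices 0..n-1 into 4 disjoint folds and return the
    four (train_indices, test_indices) pairs of a 4-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)          # reproducible shuffle
    folds = [idx[f::4] for f in range(4)]     # interleaved fold assignment
    splits = []
    for f in range(4):
        test = folds[f]
        train = [i for g in range(4) if g != f for i in folds[g]]
        splits.append((train, test))
    return splits

# Each sample appears in exactly one test fold and three training folds.
splits = four_fold_splits(200)
```

Averaging the test error over the four folds (and over repeated runs, as the thesis does) gives the error-rate means and standard deviations reported in the tables below.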
2.2 Irrelevancy

Intuitively, if some features do not vary much from one class to another, meaning they are not sufficiently informative for differentiating the classes, we call them irrelevant features. To be more precise, it is convenient to introduce the following between-class divergence:

$$d_k=\frac{1}{2}\Big(\frac{\sigma_1^2}{\sigma_2^2}+\frac{\sigma_2^2}{\sigma_1^2}-2\Big)+\frac{1}{2}\,(\mu_1-\mu_2)^2\Big(\frac{1}{\sigma_1^2}+\frac{1}{\sigma_2^2}\Big),$$

where $k$ denotes the $k$th feature and $\mu_i,\sigma_i^2$ are the mean and variance of feature $k$ in class $i$. Note that $d_k$ varies inversely with the similarity of the two classes: the more similar the means and variances are, the smaller $d_k$ becomes. In the extreme case that the two distributions represented by the $k$th feature totally overlap, $d_k$ drops to zero, which is equivalent to saying that the $k$th feature is completely irrelevant. We will compare this concise measure with the feature weight $\theta_k$ assigned by JCFO in the following experiments. The divergences of each feature in HH and Crabs are given in Tables 2.1 and 2.2.

Table 2.1. Feature divergence of HH
Feature     1       2       3        4       5       6       7
Divergence  0.3300  0.5101  32.7210  0.2304  1.3924  3.0806  3.9373

Table 2.2. Feature divergence of Crabs
Feature     1       2       3       4       5
Divergence  0.0068  0.3756  0.0587  0.0439  0.0326

2.2.1 Uniformly Distributed Features

The distribution of this kind of feature is exactly the same in both classes; obviously, such features should be removed. We examine the performance of JCFO and ARD in eliminating a uniformly distributed feature by adding a constant column to Crabs and HH as a new feature. Test results averaged over 50 runs are shown in Tables 2.3 and 2.4. Only ARD completely sets the interfering feature's weight to zero on both datasets, while JCFO does not fulfill the goal of getting rid of the invariant feature on HH. JCFO's time consumption, moreover, is enormous.

Table 2.3.
Uniformly distributed feature in Crabs
                                              JCFO     ARD
Train error rate mean (%)                     2        4
Train error rate std (%)                      0        0
Test error rate mean (%)                      6        6
Test error rate std (%)                       0        0
Percentage of the noise weights set to zero   100%     100%
Running time                                  2 hours  1 min

Table 2.4. Uniformly distributed feature in HH
                                              JCFO     ARD
Train error rate mean (%)                     1.49     14.5
Train error rate std (%)                      1.48     2.8e-15
Test error rate mean (%)                      21.7     20
Test error rate std (%)                       4.63     8.5e-15
Percentage of the noise weights set to zero   0%       100%
Running time                                  4 hours  1 min

2.2.2 Noninformative Noise Features

While a uniformly distributed feature is rarely seen in realistic datasets, most irrelevancies hide under the cover of various kinds of random data noise. We can emulate this by adding class-independent Gaussian noise to our datasets as a feature. Although different means and variances can be chosen, such a feature remains noninformative for classification; our purpose is to see whether JCFO can remove it. Adding unit-variance Gaussian noise with means of 5, 10 and 15 as a new feature, and training 50 times on each, we get an average divergence of 0.0001 for the noise feature. The test results are provided in Table 2.5.

Table 2.5. Noninformative noise in HH and Crabs
                                                    JCFO               ARD
                                                    HH      Crabs      HH      Crabs
Train error rate mean (%)                           1.55    1.82       14.55   2.40
Train error rate std (%)                            1.32    0.83       0.1     1.96
Test error rate mean (%)                            27.41   5.00       20.03   3.60
Test error rate std (%)                             6.1     1.83       0.47    2.95
Percentage of the noise weight being set to zero    0       8.87       96      99.3

It can be seen that JCFO sets almost none of the corresponding noise weights to zero, whereas ARD almost always does. This implies that JCFO does not eliminate irrelevant Gaussian features well compared to the ARD method.
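The between-class divergence defined in Section 2.2, which produced Tables 2.1 and 2.2 and the noise-feature figure of 0.0001 above, can be computed in a few lines (hypothetical helper name):

```python
def divergence(mu1, var1, mu2, var2):
    """Symmetric between-class divergence of a single feature:
    d = (var1/var2 + var2/var1 - 2)/2
      + (mu1 - mu2)^2 * (1/var1 + 1/var2) / 2.
    Identical class distributions give d = 0 (a fully irrelevant feature)."""
    return 0.5 * (var1 / var2 + var2 / var1 - 2.0) \
         + 0.5 * (mu1 - mu2) ** 2 * (1.0 / var1 + 1.0 / var2)

print(divergence(0.0, 1.0, 0.0, 1.0))  # -> 0.0: identical distributions
print(divergence(0.0, 1.0, 1.0, 1.0))  # -> 1.0: unit mean shift
```

Class-independent noise, whatever its mean, has the same distribution in both classes, so its divergence estimate sits near zero, matching the 0.0001 reported above.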
As mentioned in Section 1.5, the likely reason is that although the priors of $\beta$ in (1.10) and $\theta$ in (1.14) have the same sparsity-promoting mechanism, in the implementation $\theta$ can only be derived approximately. This leaves the values of its elements tiny but still nonzero, which in turn compromises sparsity. A similar situation will be re-encountered in the next section. Nonetheless, JCFO does not beat ARD in terms of test error either. This result can also be explained as the unfulfilled sparsity-related reduction of the generalization error with respect to $\theta$, since we know the two methods share the same functionality in the $\beta$ part.

2.3 Redundancy

A redundant feature is one with a similar, but not necessarily identical, structure to another feature. Here we employ a well-known measure of the correlation between two features:

$$\rho_{ij}=\frac{\sum_{k=1}^{N}x_{ki}\,x_{kj}}{\sqrt{\sum_{k=1}^{N}x_{ki}^{2}}\ \sqrt{\sum_{k=1}^{N}x_{kj}^{2}}},$$

where $i$ and $j$ denote the $i$th and $j$th features. The larger $\rho_{ij}$ is, the more correlated they are; the extreme case is $\rho_{ij}=1$, when they are identical.

2.3.1 Oracle Features

First let us look at a peculiar example: if we include the class label itself as a feature, it is obviously the most informative attribute for distinguishing the classes, and all other features are redundant compared with it. We shall check whether JCFO can directly select this oracle feature and get rid of all the others, i.e., whether only the weight corresponding to this feature is nonzero. We test JCFO and ARD 100 times each on HH and Crabs with the label feature added.

Table 2.6. Oracle feature
                                             JCFO   ARD
Train error rate (%)                         0      0
Only the label feature weight is nonzero?    Yes    Yes

The purpose of this toy experiment is to determine whether, as feature selection approaches, both methods can identify the most representative feature among other, relatively subtle ones. This should be considered the basic requirement for feature selection.
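The correlation measure ρ defined at the beginning of Section 2.3 can be sketched directly (the helper name is ours; each argument is one feature column of N sample values):

```python
import math

def feature_correlation(f_i, f_j):
    """Correlation measure rho from Section 2.3: the normalized inner
    product of two feature columns. A value of 1 means the two features
    are identical up to a positive scaling."""
    num = sum(a * b for a, b in zip(f_i, f_j))
    den = math.sqrt(sum(a * a for a in f_i)) * math.sqrt(sum(b * b for b in f_j))
    return num / den

# A feature and a rescaled copy of it are perfectly correlated.
print(feature_correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # ~1.0
```

This is the quantity reported in parentheses in Tables 2.9 and 2.10 for each original/noisy-copy feature pair.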
2.3.2 Duplicate Features

A duplicate feature is identical to another feature and is thus completely redundant; here, of course, the correlation of the two features is exactly 1. This experiment examines whether JCFO can identify a duplicate feature and remove it. We simply replicate a feature of the original dataset and add the copy back as a new feature, repeating this process for each feature and testing whether it or its replica can be eliminated.

Table 2.7. Duplicate feature weights on HH
     Feature weight   1       2       3       4       5       6       7        Error (%) Train/Test
JCFO Itself           0.2157  0.1438  0.3130  0.2134  0.4208  0.2502  0.2453
     Duplicate        0.2300  0.1520  0.3539  0.2142  0.4010  0.2454  0.2604   4.082/20.32
ARD  Itself           0       0       0       0       0       0       0.0094
     Duplicate        0       0       0       0       0       0.3172  0        14.63/20.25

Table 2.8. Duplicate feature weights on Crabs
     Feature weight   1       2       3       4       5        Error (%) Train/Test
JCFO Itself           0       0.0981  0.1091  0       0.0037
     Duplicate        0       0.0846  0.1076  0       0.0032   2/6
ARD  Itself           0       0.9558  0.9050  0       0
     Duplicate        0       0       0       0       0        4/6

From the above tables, JCFO apparently assigns each feature approximately the same weight as its duplicate, while ARD sets either the feature or its duplicate (or both) to zero. Meanwhile, considering their test errors, we arrive at the situation of 2.2.2 again.

2.3.3 Similar Features

Now we examine features that are not identical but highly correlated. This section has two parts.

2.3.3.1 This experiment is similar to the duplicate feature experiment. We coin a new feature by replicating an original feature and mixing it with unit-variance Gaussian noise. Each feature is again verified, checking whether either it or its counterpart is removed; we run the test 50 times on HH and 100 times on Crabs and average how often either of the pair's weights is set to zero (the correlation ρ of each pair is shown in parentheses).

Table 2.9.
Percentage of either of two identical features' weights being set to zero in HH
Feature weight (ρ)   1 (0.4652)  2 (0.8951)  3 (0.9971)  4 (0.9968)  5 (0.9992)  6 (0.9992)  7 (0.1767)   Error (%) Train/Test
JCFO                 30          30          20          10          10          10          20           3.853/21.4
ARD                  100         100         100         100         99          100         92           14.51/20.11

Table 2.10. Percentage of either of two identical features' weights being set to zero in Crabs
Feature weight (ρ)   1 (0.9980)  2 (0.9970)  3 (0.9995)  4 (0.9996)  5 (0.9977)   Error (%) Train/Test
JCFO                 90          10          30          90          50           2.013/4.840
ARD                  100         91          97          99          100          4.001/5.976

The results on the two datasets still imply that JCFO cannot perform as well as ARD at redundancy elimination.

Figure 2.1. Two Gaussian classes that can be classified by either the x or y axis

2.3.3.2 In Figure 2.1 above, the two ellipsoids can be differentiated by either the x or the y variable, so either can be considered redundant with respect to the other; z, however, is completely irrelevant. We generate 200 samples each time, half in each class, and run both methods 150 times. The weights corresponding to the three axis features are plotted in the following figures.

Figure 2.2. Weights assigned by JCFO (from top down: x, y and z axes)

Figure 2.3. Weights assigned by ARD (from top down: x, y and z axes)

It can be seen that both methods discover the irrelevancy of the z axis, though ARD's performance is much better. For the redundancy, unfortunately, neither of them eliminates the x or y axis. More interestingly, the weights assigned by ARD to x and y are almost identical in every run, while those assigned by JCFO are much more chaotic. Note that the redundancy here is different from that of the previous experiments, since it is more implicit from a machine's perspective. We will discuss this further in 2.4.
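Synthetic two-class Gaussian data of the kind used in Experiment 2.3.3.2 can be generated along the following lines. This is a sketch only: the means, standard deviations and function name are illustrative assumptions, since the thesis specifies these datasets only through its figures.

```python
import random

def gaussian_class_samples(n, means, stds, seed=0):
    """Draw n samples from an axis-aligned Gaussian class;
    means/stds give the per-axis parameters (e.g. x, y, z)."""
    rng = random.Random(seed)
    return [[rng.gauss(m, s) for m, s in zip(means, stds)]
            for _ in range(n)]

# Two classes separable along either x or y (cf. Figure 2.1), plus an
# irrelevant z axis with the same distribution in both classes.
class_a = gaussian_class_samples(100, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0), seed=1)
class_b = gaussian_class_samples(100, (5.0, 5.0, 0.0), (1.0, 1.0, 1.0), seed=2)
```

Because z is drawn identically for both classes, a good feature selector should zero its weight, while x and y remain mutually redundant, which is exactly the situation the experiment probes.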
2.3.4 Gradually More Informative Features

In this experiment we generate samples from two multivariate Gaussians whose corresponding mean elements differ by gradually larger amounts as their order increases, e.g., $\mu_1=(0,0,\ldots,0)$ and $\mu_2=(10,20,\ldots,10k)$. The variance of each dimension is fixed at one quarter of the largest element difference, so in the above case it is $10k/4$. Treating each dimension as a feature, we thereby contrive a set of gradually more informative features: the feature of the largest order has the most distant means and the least overlapping variances, and is thus the most separable; compared to it, the rest could be deemed redundant. We run this experiment with 5-, 10- and 15-dimensional data. In each case we generate 200 samples and perform 4-fold cross-validation, repeating the process 50 times for JCFO and 100 times for ARD, and then check whether the most informative feature is kept while the others are deleted. Results are listed in the following three tables.

Table 2.11. Five features
         JCFO                         ARD
Feature  mean    std     zero (%)     mean    std     zero (%)
1        0.0001  0.0008  14           0.0057  0.0072  57
2        0.0041  0.0013  0            0.0352  0.0155  8
3        0.0090  0.0020  0            0.0819  0.0168  0
4        0.0158  0.0021  0            0.1486  0.0193  0
5        0.0256  0.0027  0            0.2339  0.0224  0

Table 2.12. Ten features
         JCFO                         ARD
Feature  mean    std     zero (%)     mean    std     zero (%)
1        0       0       100          0.0001  0.0010  95
2        0.0001  0.0003  80           0.0026  0.0045  72
3        0.0011  0.0019  50           0.0070  0.0082  52
4        0.0027  0.0028  34           0.0187  0.0125  24
5        0.0040  0.0039  16           0.0286  0.0145  14
6        0.0053  0.0049  6            0.0460  0.0146  4
7        0.0099  0.0071  0            0.0668  0.0171  0
8        0.0133  0.0075  0            0.0856  0.0156  0
9        0.0144  0.0087  2            0.1125  0.0190  0
10       0.0214  0.0106  0            0.1356  0.0197  0

Table 2.13.
Fifteen features
         JCFO                         ARD
Feature  mean    std     zero (%)     mean    std     zero (%)
1        0       0       100          0.0002  0.0009  94
2        0.0001  0.0004  92           0.0009  0.0024  86
3        0.0002  0.0006  84           0.0022  0.0038  71
4        0.0009  0.0017  68           0.0046  0.0063  61
5        0.0015  0.0027  62           0.0062  0.0076  56
6        0.0021  0.0036  52           0.0118  0.0099  36
7        0.0030  0.0050  44           0.0143  0.0120  35
8        0.038   0.0050  30           0.0242  0.0143  18
9        0.0044  0.0069  34           0.0318  0.0156  10
10       0.0068  0.0089  30           0.0413  0.0138  2
11       0.0057  0.0075  30           0.0497  0.0157  2
12       0.0109  0.0120  16           0.0588  0.0176  2
13       0.0117  0.0110  8            0.0735  0.0159  0
14       0.0118  0.0136  20           0.0845  0.0173  0
15       0.0151  0.0134  12           0.0992  0.0193  0

The above tables indicate that although both methods assign the largest weights to the most significant features, the rest of the features are not removed but rather ranked according to their significance. Generally, JCFO eliminates more of the redundant features in the second and third cases. However, Table 2.13 also shows that JCFO sometimes incorrectly deletes the most informative feature (12% of runs for Feature 15). In some scenarios a feature selection method cannot totally identify the redundancy, but it gives a ranking of how much each feature contributes to the classification, which is also an acceptable alternative (Guyon, 2003).

2.3.5 Indispensable Features

In pattern classification, if a set of features operates as a whole, i.e., no proper subset of it can accomplish the classification, we call them indispensable features. Eliminating a feature from an indispensable feature set will cause a selection error. We check how JCFO and ARD deal with this kind of feature. If we treat the x and y variables of Figure 2.4 as two features, they apparently constitute an indispensable feature set, since neither alone can determine the classes.

Figure 2.4. Two Gaussians that can only be classified by both the x and y axes

As in Experiment 2.3.3.2, we generate 200 samples each time, half in each class, and run both methods 150 times.
Results are listed below.

Figure 2.5. Weights assigned by JCFO (from top to bottom: x, y, and z axes)

Figure 2.6. Weights assigned by ARD (from top to bottom: x, y, and z axes)

The fact that neither JCFO nor ARD removes the x- or y-axis feature is a good result. However, the weights assigned by ARD are much more stable. Regarding the irrelevant z axis, ARD does a clean job, as it did in 2.3.3.2. Therefore, we can still evaluate its performance as better than JCFO's.

2.4 Nonlinearly Separable Datasets

The experiments with synthetic data in the previous two sections concentrated on classifying linearly separable features, which is not JCFO's strong suit. In the following two tests, we examine JCFO's performance on two nonlinearly separable datasets, compared with that of the ARD method. The 2D Cross data comprises two crossing Gaussian ellipses as one class and four Gaussians surrounding this cross as the other class, with 100 samples in each class (Figure 2.7). The 2D Ellipse data has two Gaussian ellipses with equal means as the two classes, each with 50 samples. One of them has a much wider variance than the other, which makes the distribution look as if one class is surrounded by the other (Figure 2.8). Clearly, there is no single hyperplane that can classify either of the two datasets correctly. We run JCFO, non-kernelized ARD, and kernelized ARD on both datasets 50 times each. A comparison of their performances is shown in Table 2.14. In the experiment summarized in Table 2.15, a non-informative irrelevant feature was added exactly as described in Section 2.2.2. In the experiment summarized in Table 2.16, one redundant feature was added as described in Section 2.3.1.

Table 2.14.
Comparisons of JCFO, non-kernelized ARD, and kernelized ARD

Test error (%)           Non-kernelized ARD   Kernelized ARD   JCFO
Cross      mean          50                   8.1              4.0
           std           0                    1e-14            1e-15
Ellipse    mean          48                   8.4              12
           std           4e-14                3e-15            1e-14

Table 2.15. Comparisons of JCFO and kernelized ARD with an added non-informative irrelevant feature

Test error (%)           Kernelized ARD   JCFO
Cross      mean          17.7             7.32
           std           17.6             3.20
Ellipse    mean          15.4             13.4
           std           5.70             5.03

Table 2.16. Comparisons of JCFO and kernelized ARD with an added similar redundant feature

Test error (%)           Kernelized ARD   JCFO
Cross      mean          15.3             8.76
           std           4.01             3.77
Ellipse    mean          13.4             12.3
           std           5.21             4.41

Since non-kernelized ARD performs a linear mapping, it cannot produce a reasonable classification in this scenario. Kernelized ARD and JCFO each perform a little better on one of the original datasets. JCFO outperforms ARD on both datasets mixed with irrelevant or redundant features, especially on the Cross data. However, the time required to train JCFO still dominates that of kernelized ARD, and none of the parameters corresponding to the noise feature are set to zero by JCFO in these two scenarios.

Figure 2.7. Cross data

Figure 2.8. Ellipse data

2.5 Discussion and Modification

From the trouble JCFO suffered in the experiments we designed, we reiterate what we discussed in 2.2.2: JCFO does not reliably achieve sparsity on the feature set and therefore does not realize the advantage of low generalization error due to reduced model complexity. Hence, it does not deliver both of the advantages it claims. A more effective approximation method for computing the quantity in (1.21) therefore needs to be proposed in order to realize JCFO's theoretical superiority.
Without such a method at hand, we suggest simply dropping that part and returning to a kernelized ARD model; e.g., we can plug in the RBF kernel function in (1.13). According to Herbrich (2002), this function has the appealing property that each linear combination of kernel functions (see (1.6)) of the training objects x_1, x_2, ..., x_N can be viewed as a density estimator in the input space, because it effectively puts a Gaussian on each x_i and weights its contribution to the final density by the corresponding coefficient alpha_i in (1.6). Regarding feature selection, we can apply a non-kernelized (linear) ARD to weight each feature directly. (However, this requires that our datasets not be nonlinearly separable.) As we saw in 2.3.3.2, the x and y variables are each time assigned the same absolute value (with different signs), which might help us detect the very implicit redundancy in that scenario. By combining these two approaches, we first use the latter to preselect the more representative features, then pass them back to the former for more precise learning. This increases the efficiency of the whole learning process. We show below comparisons of the performance of non-kernelized ARD, kernelized ARD, and their combination on the Crabs and HH data, training 100 times.

Table 2.17. Comparisons of the three ARD methods

Test error (%)             Non-kernelized   Kernelized   Combo
Crabs     mean             3.60             2.2          1~30
          std              2.95             0            1~10
          iterations       100              5            2
HH        mean             20.03            12.63        9.17
          std              0.47             6.26         2.11
          iterations       100              5            3

Note that after feature selection on the Crabs data, only two features remain. Kernelized ARD makes the coefficients alpha_i vanish very fast on low-dimensional datasets (here, even after 2 iterations only two alpha_i are nonzero), which causes the training result to be unstable (see the mean and std of the combination method on the Crabs data).
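The two-stage combination just described can be outlined as a small sketch. The two trainer functions are placeholders supplied by the caller, not implementations of ARD itself, and the threshold name `tol` is our choice.

```python
def combo_ard(X, y, fit_linear_ard, fit_kernel_ard, tol=1e-6):
    """Two-stage ARD: linear ARD preselects features, kernelized ARD refits.

    fit_linear_ard(X, y) must return one weight per feature (column of X);
    fit_kernel_ard(X, y) must return a classifier trained on the reduced
    feature set. Both trainers are assumed to be supplied by the caller.
    """
    weights = fit_linear_ard(X, y)
    # Keep only the features the linear stage did not drive to (near) zero.
    keep = [j for j, w in enumerate(weights) if abs(w) > tol]
    X_reduced = [[row[j] for j in keep] for row in X]
    return keep, fit_kernel_ard(X_reduced, y)
```

As Table 2.17 suggests, this split pays off mainly on higher-dimensional data; on low-dimensional sets such as Crabs the second stage can prune too aggressively.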
Therefore, we also need to realize that there are certain cases in which the combination method is not appropriate, and that it must balance the tradeoff between parameter sparsity and prediction accuracy.

2.6 Conclusion

In this thesis, we introduced the theoretical background of an existing Bayesian feature selection and classification method in the first chapter, from its origin to its implementation details and complexity. In the second chapter, we systematically analyzed its performance in a series of specially designed experiments, providing several comparisons with its direct ancestor. From these experimental results we have seen that even though JCFO is theoretically more ideal in achieving sparsity in both the features and the basis functions, the lack of an effective implementation technique seriously restricts its performance. As an alternative, we suggest returning to the original ARD method, jointly using its kernelized and non-kernelized versions to carry out both feature selection and class prediction. Though our model thereby becomes less ambitious, its simplicity in practice and its time efficiency are preserved, in keeping with our original design purpose.

APPENDIX A
SOME IMPORTANT DERIVATIONS OF EQUATIONS

Equation (1.9)

Consider the general case of minimizing

    f(x) = (x - a)^2 / 2 + b|x|

with respect to x, where a and b are constants, b >= 0.
This is equivalent to the piecewise problem

    f(x) = (x - a)^2 / 2 + bx,   x >= 0,
    f(x) = (x - a)^2 / 2 - bx,   x < 0.

On the branch x >= 0, setting f'(x) = x - a + b = 0 gives x = a - b, which is feasible only when a >= b; otherwise the minimum lies at the boundary x = 0. On the branch x < 0, f'(x) = x - a - b = 0 gives x = a + b, which is feasible only when a <= -b; otherwise the minimum is again at x = 0. Comparing the branch minima over the possible sign combinations of a and b, every scenario is summarized by

    argmin_x f(x) = sgn(a) (|a| - b)_+ ,

where (u)_+ = max(u, 0).

Integration (1.10)

Writing the Gaussian explicitly and substituting t = sqrt(tau), so that d tau = 2t dt,

    integral_0^inf (2 pi tau)^(-1/2) exp(-x^2 / (2 tau)) (gamma/2) exp(-gamma tau / 2) d tau
        = gamma (2 pi)^(-1/2) integral_0^inf exp(-(gamma/2) t^2 - (x^2/2) t^(-2)) dt.

By the tabulated integral integral_0^inf exp(-p^2 t^2 - q^2 t^(-2)) dt = (sqrt(pi) / (2p)) exp(-2pq) (Beyer, 1979), with p = sqrt(gamma/2) and q = |x| / sqrt(2), this evaluates to

    (sqrt(gamma) / 2) exp(-sqrt(gamma) |x|),

the Laplacian density in (1.10).

Expectation (1.18)

Since z_i = h^(t)(x_i) + epsilon with epsilon ~ N(0, 1), we have p(z_i | theta^(t), D) = N(z_i | h^(t)(x_i), 1), truncated according to the label: z_i >= 0 if y_i = 1, and z_i < 0 if y_i = 0. For y_i = 1,

    p(z_i | theta^(t), y_i = 1, D) = N(z_i | h^(t)(x_i), 1) / Phi(h^(t)(x_i)),   z_i >= 0,
                                   = 0,                                          z_i < 0,

where Phi denotes the standard normal CDF. Hence

    v_i^(t) = E[z_i | theta^(t), D]
            = integral z_i p(z_i | theta^(t), D) dz_i
            = h^(t)(x_i) + N(h^(t)(x_i) | 0, 1) / Phi(h^(t)(x_i)).

The case y_i = 0 follows in the same way.

APPENDIX B
PSEUDOCODE FOR THE EXPERIMENT DESIGNS

2.2 Irrelevancy

NOTES: In this testing unit, the base datasets are randomly generated and linearly separable. Each feature is normalized to zero mean, unit variance.

2.2.1 Uniformly Distributed Feature

counter <- 0
for 1:50
    ds <- load dataset
    (n, k) <- size of ds
    ds <- append a column of 1s as the (k+1)-th feature
    run JCFO(ds)
    if theta(k+1) = 0
        counter <- counter + 1
    end
end
display(counter / 50)

2.2.2 Non-informative Noise Features

counter <- 0
for mu = 5, 10, 15
    for 1:50
        ds <- load dataset
        (n, k) <- size of ds
        ds <- append a column of mu-mean, unit-variance noise as the (k+1)-th feature
        run JCFO(ds)
        if theta(k+1) = 0
            counter <- counter + 1
        end
    end
end
display(counter / 150)

2.3 Redundancy

NOTES: In this testing unit, the base datasets come from the randomly generated, linearly separable datasets or from Gaussian distributions. Each feature is normalized to zero mean, unit variance.
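The zero-mean, unit-variance normalization assumed in these notes can be sketched as follows; the population-variance convention and the constant-column guard are our assumptions.

```python
import math

def standardize(ds):
    """Normalize each feature (column) of ds to zero mean and unit variance."""
    n, k = len(ds), len(ds[0])
    out = [row[:] for row in ds]
    for j in range(k):
        col = [row[j] for row in ds]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        std = math.sqrt(var) if var > 0 else 1.0  # constant columns are only centered
        for i in range(n):
            out[i][j] = (ds[i][j] - mean) / std
    return out
```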
2.3.1 Oracle Features

counter <- 0
for 1:100
    ds <- load dataset
    (n, k) <- size of ds
    ds <- append the class label as the (k+1)-th feature
    run JCFO(ds)
    if theta(k+1) > 0 and theta(1..k) = 0
        counter <- counter + 1
    end
end
display(counter / 100)

2.3.2 Duplicate Features

counter <- 0
for 1:50
    ds <- load dataset
    (n, k) <- size of ds
    for i = 1:k
        ds <- replicate and append the i-th feature
        run JCFO(ds)
        if theta(i) = 0 or theta(k+1) = 0
            counter <- counter + 1
        end
    end
end
display(counter / (50*k))

2.3.3 Similar Features

2.3.3.1

counter <- 0
for 1:50
    ds <- load dataset
    (n, k) <- size of ds
    for i = 1:k
        ds <- mix the i-th feature with unit noise and append it
        run JCFO(ds)
        if theta(i) = 0 or theta(k+1) = 0
            counter <- counter + 1
        end
    end
end
display(counter / (50*k))

2.3.3.2

mu1 <- [10 0 0]; mu2 <- [0 10 0]
sig1 <- [7 0 0; 0 1 0; 0 0 1]; sig2 <- [1 0 0; 0 7 0; 0 0 1]
counter <- 0
for 1:150
    ds <- randomly generate 200 labeled samples from the two Gaussians
    run JCFO(ds)
    if either Feature X or Feature Y has been removed
        counter <- counter + 1
    end
end
display(counter / 150)

2.3.4 Gradually More Informative Features

counter <- 0
for k = 5, 10, 15
    for 1:100
        mu1 <- vector of k zeros
        mu2 <- [10 20 30 ... 10*k]
        sig <- (10*k/4) * (k-by-k identity matrix)
        ds <- randomly generate 200 labeled samples from the two Gaussians
        run JCFO(ds)
        if theta(k) > 0 and theta(1..k-1) = 0
            counter <- counter + 1
        end
    end
end
display(counter / 300)

2.3.5 Indispensable Features

mu1 <- [5 5 0]; mu2 <- [8 8 0]
sig <- [4 0 0; 0 1 0; 0 0 1]
counter <- 0
for 1:150
    ds <- randomly generate 200 labeled samples from the two Gaussians,
          then rotate them 45 degrees clockwise around the class center
    run JCFO(ds)
    if theta(1) > 0 and theta(2) > 0
        counter <- counter + 1
    end
end
display(counter / 150)

LIST OF REFERENCES

W. Beyer. CRC Standard Mathematical Tables, 25th ed. CRC Press, 1979.

J. Bi, K.P. Bennett, M. Embrechts, C.M. Breneman and M. Song. Dimensionality reduction via sparse support vector machines.
JMLR, 3: 1229-1243, 2003.

M. Figueiredo. Adaptive sparseness for supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(9): 1150-1159, 2003.

I. Guyon and A. Elisseeff. An introduction to variable and feature selection. JMLR, 3: 1157-1182, 2003.

T. Hastie, R. Tibshirani and J. Friedman. The Elements of Statistical Learning. Springer-Verlag, 2001.

R. Herbrich. Learning Kernel Classifiers. MIT Press, 2002.

B. Krishnapuram, A.J. Hartemink and M.A.T. Figueiredo. A Bayesian approach to joint feature selection and classifier design. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(9): 1105-1111, 2004.

B. Krishnapuram, D. Williams, Y. Xue, A. Hartemink, L. Carin and M. Figueiredo. On semi-supervised classification. In L.K. Saul, Y. Weiss and L. Bottou, editors, Advances in Neural Information Processing Systems 17. MIT Press, 2005.

B. Schölkopf and A.J. Smola. Learning with Kernels: Regularization, Optimization, and Beyond. MIT Press, 2002.

R. Tibshirani. Regression shrinkage and selection via the lasso. J. Royal Statistical Soc. (B), 58: 267-288, 1996.

M. Tipping. The relevance vector machine. In S.A. Solla, T.K. Leen and K.R. Muller, editors, Advances in Neural Information Processing Systems 11, pp. 218-224. MIT Press, 2000.

BIOGRAPHICAL SKETCH

Fan Mao was born in Chengdu, China, in 1983. He received his bachelor's degree in computer science and technology from Shanghai Maritime University, Shanghai, China, in 2006. He then came to the University of Florida, Gainesville, FL, where he received his M.S. in computer science in December 2007 under the supervision of Dr. Paul Gader.