
Robust kernel methods in context-dependent fusion

Permanent Link: http://ufdc.ufl.edu/UFE0041144/00001

Material Information

Title: Robust kernel methods in context-dependent fusion
Physical Description: 1 online resource (144 p.)
Language: english
Creator: Heo, Gyeongyong
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2009

Subjects

Subjects / Keywords: clustering, context, fusion, fuzzy, kernel, landmine, pca, robustness, svm
Computer and Information Science and Engineering -- Dissertations, Academic -- UF
Genre: Computer Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: Combining classifiers, a common way to improve classification performance, has gained popularity in recent years. By combining classifiers, one can take advantage of the diversity of the classifiers, while some defects of a classifier can be compensated for by the other classifiers. In this dissertation, a classifier combination method, one specifically applicable to the combination of classifier outputs - commonly called fusion - is investigated. The proposed fusion method focuses on the combination of classifier outputs with the help of feature-space information that is referred to as context. The basic assumption in the proposed method is that a context corresponds to a homogeneous region in the feature space. By dividing the feature space into a given number of homogeneous regions, one can identify the same number of contexts; a different fusion process can then be developed for each context, hence the name context-dependent fusion (CDF). The context-dependent fusion algorithm is an iterative method that simultaneously clusters the feature space and learns optimal parameters for fusion. Although CDF has several advantages over previous methods, it is limited in that it is only valid for convex clusters with linearly separable classes. To mitigate the convex-cluster assumption, a modified CDF using regularization, called context-dependent fusion with regularization (CDF-R), is formulated. By adding the regularization terms, not only does CDF-R achieve noise robustness, the main purpose of regularization, but the resulting clusters, which need not be convex, yield better performance than CDF. Although CDF-R is better at classification than CDF, the linear-separability restriction remains. To remove the limitation completely, CDF is transformed into a non-linear method, termed kernel-based context-dependent fusion (K-CDF). K-CDF adopts modified kernel methods to remove the restrictions of CDF and remedies some problems in the original kernel methods. K-CDF consists of three main components: dimension reduction, feature-space clustering, and fusion. For these components, robust kernel fuzzy principal component analysis (RKF-PCA), kernel-based global fuzzy c-means (KG-FCM), and a fuzzy support vector machine for noisy data (FSVM-N) are formulated; these correspond to robust variants of kernel PCA, kernel FCM, and fuzzy SVM, respectively. Although the three modifications originated to address different shortcomings, one common purpose is to reduce the effect of noise, i.e., to make the kernel methods noise-robust. By combining the three robust kernel methods, K-CDF not only overcomes the convex-cluster assumption and the linear-separability restriction but also achieves noise robustness and better performance than previous methods.
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by Gyeongyong Heo.
Thesis: Thesis (Ph.D.)--University of Florida, 2009.
Local: Adviser: Gader, Paul D.

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2009
System ID: UFE0041144:00001





Full Text

ACKNOWLEDGMENTS

First of all, I would like to thank my advisor, Dr. Paul Gader, for all of his guidance and encouragement throughout my studies. My thanks also go to Dr. Howard Beck, Dr. Anand Rangarajan, Dr. Gerhard Ritter, Dr. Clint Slatton, and Dr. Joseph Wilson, for all of their support and valuable suggestions. Additionally, thanks to my many former and current labmates for discussions during my studies. I also want to thank my parents for their love, understanding, and the many sacrifices they had to make throughout my studies. Finally, many thanks to my wife and son, who have been there for me.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ALGORITHMS
ABSTRACT

CHAPTER
1 INTRODUCTION
2 CONTEXT-DEPENDENT FUSION
   2.1 Multiple classifier system
   2.2 Decision fusion
   2.3 Context-dependent fusion
      2.3.1 Simultaneous clustering and attribute discrimination
      2.3.2 Context-dependent fusion
   2.4 Context-dependent fusion with regularization
   2.5 Experimental results
   2.6 Discussion
3 RKF-PCA: ROBUST KERNEL FUZZY PCA
   3.1 Robust PCA (R-PCA)
   3.2 Robust fuzzy PCA (RF-PCA)
   3.3 Kernel PCA (K-PCA)
   3.4 Robust kernel fuzzy PCA (RKF-PCA)
   3.5 Experimental results
   3.6 Discussion
4 KG-FCM: KERNELIZED GLOBAL FUZZY C-MEANS
   4.1 Global k-means (GKM)
   4.2 Global fuzzy c-means (G-FCM)
   4.3 Kernel-based global fuzzy c-means (KG-FCM)
   4.4 Experimental results
      4.4.1 Experiments on artificial data sets
      4.4.2 Experiments on real-world data sets
   4.5 Discussion

5 …
   5.1 Support vector machine
   5.2 Previous approaches to membership calculation
   5.3 Fuzzy SVM for noisy data
   5.4 Experimental results
   5.5 Discussion
6 KERNEL-BASED CONTEXT-DEPENDENT FUSION
   6.1 Kernel-based context-dependent fusion
   6.2 Experimental results
   6.3 Discussion
7 CONCLUSIONS

APPENDIX
A UPDATE EQUATIONS FOR CONTEXT-DEPENDENT FUSION WITH REGULARIZATION
B RECONSTRUCTION ERROR WITH THE MEMBERSHIPS IN THE FEATURE SPACE
C UPDATE EQUATIONS FOR KERNEL-BASED CONTEXT-DEPENDENT FUSION

REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

2-1 Update equations of CDF and CDF-R
2-2 Landmine data set (N = number of data points, K = number of clusters, d = feature dimension)
2-3 Experimental results from data set I
2-4 Experimental results from data set II
3-1 Classification error on each set of character-pair data
4-1 Experimental results on D7 (N = number of clusters correctly identified, var(N) = variance of N)
4-2 Experimental results using UCI data sets (N = number of samples, d = data dimension, K0 = number of classes)
5-1 SVM and its variants
5-2 Error rates on UCI data sets (N = number of data points, d = data dimension)
6-1 Update equations of CDF-R and K-CDF
6-2 Experimental results from the subset of data set II
6-3 Confidence interval of the number of false alarms with a 90% confidence level
6-4 Experimental results from hyperspectral data set
6-5 Confidence interval of the number of false alarms with a 90% confidence level

LIST OF FIGURES

1-1 Block diagram of context-dependent fusion
1-2 Block diagram of the kernel-based context-dependent fusion
2-1 Ensemble methods
2-2 Combination methods of classifier outputs
2-3 Test phase of context-dependent fusion
2-4 ROC curves for a set of classifiers from data set I
2-5 ROC curves for a set of classifiers from data set II
3-1 Estimated principal components using reconstruction error with (a) one and (b) two principal components
3-2 The first principal component using (a) PCA and (b) RF-PCA
3-3 The first principal component from (a) K-PCA on clean data, (b) K-PCA on noisy data, and (c) RKF-PCA on noisy data
3-4 (a) K-PCA on clean data, (b) K-PCA on noisy data, and (c) RKF-PCA on noisy data
3-5 Error histogram between eigen-systems of clean and noisy data using K-PCA and RKF-PCA
3-6 Confidence intervals of the error rate
3-7 Error rate with respect to noise ratio
4-1 Sensitivity to initialization in FCM
4-2 Incremental seed selection in G-FCM
4-3 Number of clusters correctly identified with respect to noise ratio
4-4 Clustering results using (a) FCM and (b) K-FCM-C with random initialization
4-5 Error rate with respect to d_between in D_parallel
4-6 Variance of error rate with respect to d_between in D_parallel
4-7 Non-linearly separable clusters
4-8 Error rate with respect to d_between in D_nonlinear
5-1 Mapping function for H-FSVM

5-2 …
5-3 Rescaling function for FSVM-N …
LIST OF ALGORITHMS

2-1 Context-dependent fusion
2-2 Context-dependent fusion with regularization
3-1 RF-PCA
3-2 RKF-PCA
4-1 Fast global k-means
4-2 K-FCM with Cauchy kernel
4-3 KG-FCM with Cauchy kernel
4-4 Average commute time
4-5 K-FCM with random walk kernel
4-6 KG-FCM with random walk kernel
5-1 Fuzzy SVM for noisy data
6-1 Kernel-based context-dependent fusion

ABSTRACT

Combining classifiers, a common way to improve classification performance, has gained popularity in recent years. By combining classifiers, one can take advantage of the diversity of the classifiers, while some defects of a classifier can be compensated for by the other classifiers. In this dissertation, a classifier combination method, one specifically applicable to the combination of classifier outputs, commonly called fusion, is investigated. The proposed fusion method focuses on the combination of classifier outputs with the help of feature-space information that is referred to as context. The basic assumption in the proposed method is that a context corresponds to a homogeneous region in the feature space. By dividing the feature space into a given number of homogeneous regions, one can identify the same number of contexts; a different fusion process can then be developed for each context, hence the name context-dependent fusion (CDF).

The context-dependent fusion algorithm is an iterative method that simultaneously clusters the feature space and learns optimal parameters for fusion. Although CDF has several advantages over previous methods, it is limited in that it is only valid for convex clusters with linearly separable classes. To mitigate the convex-cluster assumption, a modified CDF using regularization, called context-dependent fusion with regularization (CDF-R), is formulated. By adding the regularization terms, not only does CDF-R achieve noise robustness, the main purpose of regularization, but the resulting clusters, which need not be convex, yield better performance than CDF. Although CDF-R is better at classification than CDF, the linear-separability restriction remains. To remove the limitation completely, CDF is transformed into a non-linear method, termed kernel-based context-dependent fusion (K-CDF). K-CDF adopts modified kernel methods to remove the restrictions of CDF and remedies some problems in the original kernel methods. K-CDF consists of three main components: dimension reduction, feature-space clustering, and fusion. For these components, robust kernel fuzzy principal component analysis (RKF-PCA), kernel-based global fuzzy c-means (KG-FCM), and a fuzzy support vector machine for noisy data (FSVM-N) are formulated; these correspond to robust variants of kernel PCA, kernel FCM, and fuzzy SVM, respectively. Although the three modifications originated to address different shortcomings, one common purpose is to reduce the effect of noise, i.e., to make the kernel methods noise-robust. By combining the three robust kernel methods, K-CDF not only overcomes the convex-cluster assumption and the linear-separability restriction but also achieves noise robustness and better performance than previous methods.


CHAPTER 1
INTRODUCTION

Searching for patterns in data is a fundamental problem with a long history. One of the main components of the problem, classification, assigns each data point to one of a finite number of categories. Since Fisher introduced the method of discriminant analysis in the 1930s [1], numerous algorithms for classification have been developed, ranging from simple linear discriminant analysis to the emerging kernel-based methods [2]. Although the latest classification methods perform better than previous ones and have been applied in many areas successfully, some of the methods are applicable only to specific applications due to the requirement of a large data set and the increase in time and space complexity. The fact that classifiers themselves are approaching their limits has driven researchers into other areas of classification systems. Combining classifiers appears to be a natural step forward when a critical mass of knowledge of single-classifier models has been accumulated. Although there are many unanswered questions about matching classifiers to real-world problems, combining classifiers is a rapidly growing field in the pattern recognition and machine learning communities.

In this dissertation, a new fusion method, called context-dependent fusion (CDF) [3], [4], that uses feature-space information to define contexts is described first. The concept of context-based fusion was proposed by Paul Gader of the University of Florida in collaboration with Hichem Frigui of the University of Louisville. There have been several attempts to modify and enhance the performance of CDF, of which this study is one. CDF is a generalization of the previous fusion methods and uses feature- and decision-level information together. Then two variants of CDF are proposed to improve the performance of CDF. The first one is a noise-robust method using regularization, called context-dependent fusion with regularization (CDF-R). CDF-R achieves noise-robustness by adopting regularization, which is widely used to reduce the effect of noise.

Figure 1-1. Block diagram of context-dependent fusion.

The second one is a non-linear variant adopting kernel methods, called kernel-based context-dependent fusion (K-CDF). To implement K-CDF, each component of CDF is kernelized using a robust variant of an existing kernel-based method. Specifically, three robust kernel-based methods are proposed: robust kernel fuzzy principal component analysis, kernel-based global fuzzy c-means, and a fuzzy support vector machine for noisy data.

CDF, shown in Figure 1-1, is composed of two main components: context extraction and decision fusion. To extract contexts, the input data are divided into homogeneous regions in the feature space, and in each of the regions a different fusion process arises. The optimization of each component can be represented as a minimization problem of some objective function. In CDF, the two components are integrated into one objective function, which makes it possible to optimize them simultaneously. In Chapter 2, the motivation and structure of CDF are introduced. In CDF, a modified fuzzy c-means (FCM) with feature discrimination is used to extract contexts, and a weighted average is used for fusion. During clustering, the feature discrimination function can automatically weight each feature according to its relevance. A robust variant using regularization, termed context-dependent fusion with regularization, is also described in Chapter 2.

Figure 1-2. Block diagram of the kernel-based context-dependent fusion.

Although CDF is a simple and efficient method for fusion, its basic limitation is that neither the modified fuzzy clustering for context extraction nor the weighted average for fusion can accommodate non-linear cases. Although CDF-R is better than CDF in classification accuracy, CDF-R has the same problems as CDF. While increasing the number of clusters in context extraction can mitigate the problem, the increase in the number of clusters also increases the time and space complexity. Moreover, to get a stable solution, more training points are needed, which is often impossible in real-world applications. Over-fitting also can be a non-trivial problem.

To overcome the limitations of CDF, a generalized CDF, called kernel-based context-dependent fusion, is formulated. There have been several attempts to extend or generalize linear methods into corresponding non-linear ones, and the kernel-based approach is the most promising [5]. Kernel methods approach pattern recognition problems by mapping the data into a high-dimensional feature space, where each coordinate corresponds to one data item. In that space, a variety of methods can be used to find relations in the data. Since the mapping can be quite general, the relations found in this way are not necessarily linear. Algorithms capable of operating with kernels include the support vector machine (SVM), Fisher's linear discriminant analysis, principal component analysis (PCA), ridge regression, spectral clustering, and many others [5].

The structure of K-CDF is shown in Figure 1-2. In CDF, a modified fuzzy c-means simultaneously performs the context extraction and feature weighting. Compared to the original data dimensionality, the dimensionality of the data in kernel methods is very high and is sometimes assumed to be infinite. Consequently, feature weights cannot be decided easily in kernel-based methods. In K-CDF, therefore, feature-space clustering and feature discrimination are divided into two components, and a modified kernel PCA and global fuzzy c-means (G-FCM), respectively, are used for each.

First, the input data are processed using a modified kernel PCA (K-PCA), which corresponds to the feature discrimination in CDF. PCA is widely used for dimensionality reduction and feature extraction in pattern recognition. Although PCA has been applied in many areas successfully, it suffers from sensitivity to noise and is limited to linear principal components. The noise sensitivity problem comes from the sum-of-squares measure used in PCA and can be alleviated using robust estimation. The limitation to linear components originates from the fact that PCA uses an affine transform defined by eigenvectors of the covariance matrix. This problem can be attacked by using kernels to non-linearly transform the data. In Chapter 3, a robust variant of kernel PCA, which extends the kernel PCA of Scholkopf et al. [6] and uses fuzzy memberships, is introduced to tackle the two problems simultaneously. To derive the method, first an iterative method to find a robust covariance matrix, robust fuzzy PCA (RF-PCA), is introduced. The RF-PCA method is then extended to a non-linear one, robust kernel fuzzy PCA (RKF-PCA) [7], using kernels. By introducing fuzzy memberships into K-PCA, RKF-PCA achieves better noise-resistance than K-PCA.

For context extraction, K-FCM is considered. This is a direct extension of FCM into a non-linear method. FCM is a simple but powerful clustering method using the concept of a fuzzy set. After its introduction, FCM proved to be successful in many

…applications. In Chapter 4, global fuzzy c-means (G-FCM) and kernel fuzzy c-means (K-FCM) are combined and extended to resolve the shortcomings mentioned above. FCM generally requires multiple runs with random initialization to obtain the best model, which is time-consuming; even more challenging, finding an optimal initialization is known to be an NP-hard problem. There are several groups of methods to initialize FCM to find a sub-optimal solution, and global FCM (G-FCM) is one of them. G-FCM is a variant of FCM using an incremental and deterministic seed selection method, and it is efficient in alleviating the sensitivity to initialization. There are also several approaches to relax the burden of noise and non-convex clusters, and K-FCM is one of them. K-FCM is considered in Chapter 4 because it can be easily extended using different kernels, as well as being efficient at dealing with FCM problems. The proposed method, kernel-based global fuzzy c-means (KG-FCM), is based on G-FCM to avoid the initialization problem and is then extended with the help of the kernel method. Specifically, KG-FCM with a Cauchy kernel is proposed to mitigate the effect of noise, and KG-FCM with a random-walk kernel is proposed to effectively manage non-convex clusters.

Chapter 5 develops a robust variant of the fuzzy support vector machine (FSVM), called FSVM for noisy data (FSVM-N). SVM is a theoretically well-motivated algorithm developed from statistical learning theory that has shown good performance in many fields. In spite of its success, it still suffers from a noise sensitivity problem. To relax the problem, SVM was extended by the introduction of fuzzy memberships, resulting in fuzzy SVM, which has been extended further in two ways: by adopting a different objective function with the help of domain-specific knowledge, and by employing a different membership calculation method. In Chapter 5, a new membership calculation method that belongs to the second group is proposed. It is different from previous ones in that it does not rely on circular assumptions about the data distribution and …

The common aim of Chapters 3, 4, and 5 is to develop robust non-linear methods to address the shortcomings of CDF. By adopting kernel methods, the components of CDF are transformed into non-linear ones, and fuzzy memberships are used to achieve noise-robustness. Using all of the robust methods, kernel-based context-dependent fusion is formulated in Chapter 6.

Although Chapters 2, 3, 4, and 5 lead to the development of K-CDF, each chapter addresses a separate problem or limitation. The three robust kernel methods (RKF-PCA, KG-FCM, and FSVM-N) are independent of each other and can be used in other pattern recognition problems, as demonstrated in each chapter. Therefore, in this dissertation, each chapter has its own survey of related research, experiments, and discussion, and no separate chapter is devoted to a literature survey.

CHAPTER 2
CONTEXT-DEPENDENT FUSION

Classification problems [2], [8] are a major category of data analysis, applied in pattern recognition, machine learning, statistical inference, and, recently, data mining. Classification methods represent a set of supervised learning techniques in which a set of dependent variables needs to be predicted based on a set of input variables. Classification techniques have attracted the attention of researchers from various fields, and a variety of methods, such as decision trees, rule-based methods, neural networks, Bayesian methods, and support vector machines, are used to address classification problems. Classification techniques have been successfully applied to many real-world problems, although it is also generally accepted that there is no single best way to solve these problems and it may be futile to debate which type of classification technique is best [9].

In spite of these successes, all of the classification methods have their own limitations, which has led researchers to investigate other areas of classification systems. Combining multiple classifiers to obtain improved performance, generally called a multiple classifier system (MCS) [10], was developed as a practical and effective approach to address the limitations of single-classifier systems. From the beginning, this approach has produced promising results, and research in this domain has increased significantly, partly as a result of advances in classification technology itself. Combining multiple classifiers can be considered as a generic pattern recognition problem in which the input consists of the results of the individual classifiers and the output is the combined decision. For this purpose, many current classification techniques can be applied. MCS has a surprisingly long history. For example, the Borda count for combining multiple rankings is named for the eighteenth-century French mathematician Jean-Charles de Borda, who devised the system in 1770. MCS went through parallel routes within several disciplines, for example, pattern recognition, machine learning, and information theory. Although …

In this chapter, we describe an extended fusion method, called context-dependent fusion (CDF) [3], [4], which is the starting point of this dissertation. CDF is a combination of traditional fusion and feature-space clustering. It can also be seen as a local approach that applies different fusion processes to different regions of the feature space. It can take advantage of the strengths of a few classifiers in different regions of the feature space without being affected by the weaknesses of the other classifiers. The feature-space information is referred to as the context of a data point and helps the fuser to make robust decisions.

In the next section, multiple classifier systems are briefly reviewed. In section 2.2, we focus on fusion and describe the motivation of the proposed method. In section 2.3, context-dependent fusion is described. An extension of CDF, context-dependent fusion with regularization (CDF-R), is introduced in section 2.4. Experimental results are given in section 2.5, and a discussion section ends this chapter.

…[11], speech recognition [12], and text categorization [13]. The concept of classifier combination is motivated by the observation of classifiers' complementary characteristics, which leads to better accuracy for the ensemble than for any of its individual classifiers. It is desirable to take advantage of the strengths of individual classifiers and to avoid their weaknesses. A necessary and sufficient condition for an

…[14]. An accurate classifier is one that has an error rate better than random guessing on a new sample; two classifiers are diverse if they make different errors on new data points. In most applications, this condition is assumed to be satisfied, and the superiority of MCS over single-classifier systems has been demonstrated experimentally.

Dietterich [15] also suggests three reasons why a classifier ensemble might be better than a single classifier. The first reason is statistical. A learning algorithm can be viewed as searching a hypothesis space to find the best hypothesis. The statistical problem arises when the amount of training data is too small compared to the size of the hypothesis space. Without sufficient data, the learning algorithm can find many different hypotheses that all give the same accuracy on the training data. By constructing an ensemble out of all accurate classifiers, the algorithm can average their votes and reduce the risk of making a wrong decision. The second reason is computational. Many learning algorithms work by performing a local search that may get stuck in local optima. In cases where there is enough training data so that the statistical problem is absent, it may still be computationally very difficult for the learning algorithm to find the best hypothesis. For example, optimal training of neural networks is an NP-hard problem [16], [17]. An ensemble constructed by running the local search from many different starting points may provide a better approximation of the true hypothesis than any of the individual hypotheses. The third reason is representational. In some applications, the true hypothesis cannot be represented by any of the hypotheses found by a classifier. By forming weighted sums of hypotheses, it may be possible to expand the hypothesis space. These three issues are the three most common ways in which existing learning algorithms fail. Hence, ensemble methods hold the promise of relaxing the shortcomings of single-classifier systems.

Figure 2-1. Ensemble methods.

Since MCS has been proved to be superior to single-classifier systems both theoretically and experimentally, numerous methods have been proposed to construct MCS [10]. These methods can be represented in several ways, and Figure 2-1 shows one of them. The diagram in Figure 2-1 illustrates four levels used in building ensembles of diverse classifiers. At the data level, the data set can be modified with different pre-processing methods, or different data subsets can be selected so that each classifier in the ensemble is trained on its own data set. Feature extraction may also be tailored to a specific classifier, with each extractor generating its own feature vector for the corresponding classifier. The methods at the classifier level can be divided roughly into two types: using one base classifier and using several base classifiers. Many ensemble paradigms employ the former approach, but there is no evidence that this strategy is better than using different classifiers, and the latter approach is used in this study. The decision level focuses on methods of combining classifier outputs and then generating one global decision. This is commonly called decision fusion or simply fusion. Context-dependent fusion focuses on the combination level with the help of feature-level information.

…[18], [19], Borda count [20], [21], Bayesian methods [22], [23], neural networks [24], [25], Dempster-Shafer theory [26], [27], bagging and boosting [28], [29], and fuzzy integrals [30], [31], just to name a few. These methods for fusion may take one of two approaches: classifier fusion and classifier selection.

In classifier fusion, all classifiers are supposed to be equally experienced over the whole feature space, and their outputs are combined in some manner to achieve a consensus. Although classifier fusion methods combine the results of several individual classifiers, they do not take into account the local expertise of each classifier. This can mislead the consensus of multiple classifiers by overlooking the opinion of better-skilled classifiers in the specific region to which a given input belongs.

Sometimes it is useful to decompose a complex problem into simpler sub-problems and solve each sub-problem one by one instead of learning the global relation between input variables and target variables. Classifier selection methods take this divide-and-conquer approach by dividing the feature space into homogeneous regions and assigning one or more classifiers to each region. In classifier selection methods, the method of partitioning the feature space and estimating the performance of each classifier in each partition is crucial. Woods et al. proposed a method called dynamic classifier selection by local accuracy [32]. Its basic concept is to estimate each classifier's accuracy in local regions of the feature space surrounding an unknown test sample and to use the decision of the most locally accurate classifier. This method, however, was too time-consuming due to the need for an accuracy estimation for each test sample. In the clustering-and-selection method [33], Kuncheva presented an algorithm to statistically select the best classifier. In this method, the training data are clustered to form the decision regions, and the locally best classifier is selected

…for each region. The authors of [34] proposed a modified version of the clustering-and-selection method that tried to take advantage of class labels. With each classifier, the training samples are divided into correctly and incorrectly classified samples, which are then clustered to form a partition of the feature space. Due to the differences between the classifiers' error characteristics, the partitions resulting from different classifiers generally are not the same. In the test phase, the most accurate classifier in the vicinity of the input sample is appointed to make the final decision. The main defect of this method is that each classifier must maintain its own partition, which makes the decision process memory- and time-intensive.

Figure 2-2 compares the structures of the three methods for fusion: classifier fusion, classifier selection, and context-dependent fusion.

Figure 2-2. Combination methods of classifier outputs: (a) classifier fusion, (b) classifier selection, and (c) context-dependent fusion.

…sub-functions that correspond to the two components, respectively. By integrating the two components into one objective function, CDF simultaneously divides the feature space into K contexts and learns the parameters needed for fusion in each context. In CDF, the simultaneous clustering and attribute discrimination (SCAD) algorithm [35], [36] is considered for context extraction, and a weighted average is used for fusion.

Context-dependent fusion has several advantages over the previous methods. First of all, as the feature space is divided using modified fuzzy clustering, a test point can be easily assigned to one cluster or to several clusters with different degrees. In other words,

…Since Zadeh [37] proposed fuzzy sets, which produced the idea of partial membership described by a membership function, fuzzy clustering has been widely studied and applied in various areas. In fuzzy clustering, fuzzy c-means (FCM), generalized by Bezdek [38], is the most well-known method. FCM is an algorithm derived from a constrained optimization problem in which the following objective function is optimized iteratively:

$J = \sum_{k=1}^{K}\sum_{i=1}^{N} u_{ki}^{m}\, D_{ki}^{2},$ (2-…)

where N is the number of data points, K is the number of clusters, v_k is the center of the k-th cluster, u_ki is the membership of the i-th point in the k-th cluster, D_ki is the distance between the i-th point and the k-th cluster center, and m is a fuzzifier constant. The constraint on the memberships can be written as

$\sum_{k=1}^{K} u_{ki} = 1, \qquad u_{ki} \in [0, 1], \qquad i = 1, \ldots, N.$ (2-…)

Although FCM is a simple and efficient method, it is well known that the Euclidean distance used in FCM is noise-sensitive and valid only for circular clusters. Several groups of methods have been developed to mitigate these problems, and feature weighting is one of them.
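As an illustration of the alternating updates just described, the following is a minimal FCM sketch in Python. It assumes Euclidean distance and random initialization; the array names (X, U, V), the iteration count, and the initialization scheme are illustrative assumptions, not details taken from the dissertation.

```python
import numpy as np

def fcm(X, K, m=2.0, n_iter=100, seed=0):
    """Minimal fuzzy c-means sketch: alternate membership and center updates."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    # Random memberships, normalized so each point's memberships sum to one.
    U = rng.random((K, N))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        Um = U ** m
        # Center update: fuzzy weighted mean of the data (v_k).
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Squared Euclidean distances D_ki between points and centers.
        D2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        D2 = np.fmax(D2, 1e-12)  # guard against division by zero
        # Membership update derived from the Lagrangian of J.
        inv = D2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=0, keepdims=True)
    return U, V
```

The update rules above are the standard closed-form FCM solutions under the sum-to-one constraint; they are the starting point that SCAD and CDF extend with feature and classifier weights.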

In CDF, SCAD [35] is considered as the feature-space clustering method. The objective function of SCAD can be written as

$J = \sum_{k=1}^{K}\sum_{i=1}^{N} u_{ki}^{m}\, D_{ki}^{2} + \sum_{k=1}^{K} \delta_k \sum_{l=1}^{L} r_{kl}^{2},$ (2-…)

where L is the number of features and d_kil represents the feature-wise distance. The distance between a data point x_i = [x_i1, …, x_iL]^T and a cluster center v_k = [v_k1, …, v_kL]^T is calculated as a weighted sum of feature-wise distances,

$D_{ki}^{2} = \sum_{l=1}^{L} r_{kl}^{q}\, d_{kil}^{2}.$ (2-…)

The value r_kl is the weight of the l-th feature in the k-th cluster and should satisfy the FCM-like constraint

$\sum_{l=1}^{L} r_{kl} = 1, \qquad r_{kl} > 0,$ (2-…)

and q is a fuzzifier constant of the feature weights. One important difference between R and U is that r_kl > 0, i.e., all the features are assumed to participate in the classification with different degrees. The second term in Equation (2-…) is a regularization term, and δ_k is a regularization parameter. SCAD performs clustering and feature weighting simultaneously and has several advantages over traditional clustering methods. First, its continuous feature weighting provides a much richer feature-relevance representation than binary feature selection. Second, SCAD learns a different feature-relevance representation for each cluster in an unsupervised manner.

The objective function of simplified SCAD (S-SCAD) used in the CDF algorithm can be written as

$\cdots\ \sum_{i=1}^{N} u_{ki}^{m}\ \cdots,$ (2-…)

where L is the overall feature dimension and can be represented as the sum of the numbers of features used by the t-th algorithm, L_t:

$L = \sum_{t=1}^{T} L_t.$ (2-…)

The contexts are defined as homogeneous regions of the feature space, and it is also assumed that there are K contexts or clusters. The objective function of CDF can be …

The first term of Equation (2-…) partitions the N samples into K clusters using S-SCAD, and the second term attempts to learn a cluster-dependent aggregation of the T algorithm outputs. A constant serves to balance the two terms. There are two sets of weights in Equation (2-…): feature weights r_kl and classifier weights w_kt. The former puts a weight on the l-th feature in the k-th cluster. The latter weight, w_kt, puts a weight on the output value of the t-th algorithm in the k-th cluster and affects the cluster output of the k-th cluster. In the test phase, when a data point x and its algorithm outputs y = [y_1, …, y_T]^T are given, the memberships are decided using Equation (2-…), and each cluster generates a cluster output as a weighted sum of the T algorithm outputs,

$o_k = \sum_{t=1}^{T} w_{kt}\, y_t.$ (2-…)

The final output is given as a membership-weighted sum of the K cluster outputs:

$o = \sum_{k=1}^{K} u_k\, o_k.$ (2-…)

Both sets of weights should satisfy the FCM-like sum-to-one constraint. However, the feature weight r_kl should be greater than zero, while the classifier weight w_kt is allowed to have negative values. In a specific cluster, a classifier might generate output values negatively correlated with the desired outputs, yet a negative weight for that classifier can make the classifier still effective in classification. The constraint for the feature weights can be written as Equation (2-…), and the constraint for the classifier weights can be written as

$\sum_{t=1}^{T} w_{kt} = 1.$ (2-…)
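The test-phase computation just described (cluster outputs as weighted sums of classifier outputs, then a membership-weighted sum over clusters) is small enough to sketch directly. The function below is an illustration with assumed array shapes, not code from the dissertation.

```python
import numpy as np

def cdf_output(u, W, y):
    """Test-phase CDF output for one sample (a sketch).

    u : (K,) cluster memberships of the sample, summing to one
    W : (K, T) per-cluster classifier weights, each row summing to one
    y : (T,) classifier confidence values for the sample
    """
    cluster_outputs = W @ y            # o_k = sum_t w_kt * y_t
    return float(u @ cluster_outputs)  # o = sum_k u_k * o_k
```

For example, with two clusters and memberships u = [0.7, 0.3], a sample is scored mostly by the fusion weights of the cluster it most belongs to, which is exactly the local-expertise behavior CDF is designed for.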

Figure 2-3. Test phase of context-dependent fusion.

Figure 2-3 shows the procedure generating the final output value from an input point. In Figure 2-3, the three matrices V, R, and W are the matrices optimized during the training phase.

In the training phase, four matrices, V, R, W, and U, that minimize the objective function should be found, and the update equations for them can be obtained using the method of Lagrange multipliers. The update equations can be written as

$v_{kl} = \frac{\sum_{i=1}^{N} u_{ki}^{m}\, x_{il}}{\sum_{i=1}^{N} u_{ki}^{m}},$ (2-…)

$w_{kt} = \frac{\sum_{i=1}^{N} u_{ki}^{m}\Big(o_i - \sum_{a=1,\, a \neq t}^{T} w_{ka}\, y_{ai}\Big)\, y_{ti} - \lambda_k}{\sum_{i=1}^{N} u_{ki}^{m}\, y_{ti}^{2}},$ (2-…)

where λ_k is the Lagrange-multiplier term enforcing the sum-to-one constraint on w_kt. Equations (2-…) and (2-…) are similar to the update equations of FCM, except that the distance between a data point and a center is calculated based on both the Euclidean distance between them and the aggregated result (Equation (2-…)). The feature weight also can be calculated in a way similar to the calculation of the memberships (Equation (2-…)). The numerator of Equation (2-…) is composed of two terms. The first term is the actual weight-decision term, and the second term is added due to the sum-to-one constraint on w_kt. A weight for the t-th algorithm in the k-th cluster is calculated based on the difference between the desired output and the output calculated using the (T-1) algorithms other than the t-th algorithm. The t-th algorithm is excluded because the difference is the portion of the t-th algorithm in the k-th cluster. The detailed derivation can be found in Appendix A. Appendix A concerns the derivation of the update equations of context-dependent fusion with regularization; the update equations for context-dependent fusion can be obtained by equating the regularization parameters α, β, and γ to zero.
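The exact per-cluster weight update is derived in Appendix A, but its flavor can be sketched as a membership-weighted least-squares fit with a sum-to-one equality constraint, solved in closed form through the constraint's KKT system. This is an illustrative stand-in under that assumption, not the dissertation's update equation; all names are hypothetical.

```python
import numpy as np

def cluster_weights(Y, o, um, ridge=1e-8):
    """Sketch: find w minimizing sum_i um_i * (o_i - w.y_i)^2 s.t. sum(w) = 1.

    Y  : (N, T) classifier outputs, o : (N,) desired outputs,
    um : (N,) fuzzified memberships u_ki^m for one cluster k.
    """
    T = Y.shape[1]
    A = Y.T @ (um[:, None] * Y) + ridge * np.eye(T)  # weighted normal matrix
    b = Y.T @ (um * o)
    # KKT system for the equality constraint: [A 1; 1^T 0][w; lam] = [b; 1]
    M = np.block([[A, np.ones((T, 1))],
                  [np.ones((1, T)), np.zeros((1, 1))]])
    sol = np.linalg.solve(M, np.concatenate([b, [1.0]]))
    return sol[:T]  # w may contain negative entries, as CDF allows
```

Note that, as in CDF, nothing forces the solved weights to be non-negative; a classifier that is negatively correlated with the desired output in a cluster simply receives a negative weight.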

Algorithm 2-1. Context-dependent fusion (each iteration updates U, V, R, and W using Equations (2-…)).

In Equation (2-…), as the distance is decided based on the weighted Euclidean distance and the combined algorithm outputs, the boundary of a cluster is generally not convex in the feature space. Although the cluster boundaries need not be convex, noise also can cause non-convex boundaries, i.e., over-fitting. To reduce the effect of noise and make the cluster boundaries smooth, a post-processing routine is employed in which the membership matrix is re-calculated using only the Euclidean distance (line 9 in Algorithm 2-1). In the test phase, the membership values for test data are also calculated using Equation (2-…) because the target values of test data are not available. Finally, with U, R, and V fixed, the classifier weight matrix is updated until convergence (lines 10-12 in Algorithm 2-1). The training phase of the context-dependent fusion algorithm is summarized in Algorithm 2-1.

…[39], and it has been applied to various problems to make algorithms noise-robust. Regularization also has been applied to fuzzy clustering, and objective functions such as the following have been formulated [40]-[43]:

$J = \sum_{k=1}^{K}\sum_{i=1}^{N} u_{ki}\, D_{ki}^{2} + \lambda \sum_{k=1}^{K}\sum_{i=1}^{N} u_{ki} \log u_{ki}, \qquad J = \sum_{k=1}^{K}\sum_{i=1}^{N} u_{ki}\, D_{ki}^{2} + \lambda \sum_{k=1}^{K}\sum_{i=1}^{N} u_{ki}^{2},$

with entropy and polynomial regularization terms, respectively. Although each regularization term is slightly different from the others, all of them are minimized when all the u_ki have the same value, u_ki = 1/K for all 1 ≤ i ≤ N and 1 ≤ k ≤ K, which means that the regularization terms prevent u_ki from taking the extreme values 0 and 1. In this chapter, the polynomial regularization term is used because only it makes it possible to obtain a closed-form solution for the update equations without modifying the objective function of CDF.

In CDF-R, three polynomial regularization terms, for the memberships, feature weights, and classifier weights, are added. The objective function for CDF-R is defined as

$J_{CDF\text{-}R} = J_{CDF} + \alpha \sum_{k,i} u_{ki}^{2} + \beta \sum_{k,l} r_{kl}^{2} + \gamma \sum_{k,t} w_{kt}^{2},$ (2-…)

where α, β, and γ are regularization parameters. The update equations for CDF-R obtained using the method of Lagrange multipliers are the same as those of CDF, except for the update equation for the classifier weight. The update equation for the classifier weight can

be written as

$w_{kt} = \frac{\sum_{i=1}^{N} u_{ki}^{m}\Big(o_i - \sum_{a \neq t} w_{ka}\, y_{ai}\Big)\, y_{ti} - \lambda_k}{\sum_{i=1}^{N} u_{ki}^{m}\, y_{ti}^{2} + \gamma},$ (2-…)

and the terms in Equations (2-…), (2-…), and (2-…) should be modified as … (2-…)

The detailed derivation can be found in Appendix A, and the training phase of CDF-R is summarized in Algorithm 2-2. Table 2-1 compares the update equations of CDF and CDF-R. As is shown in Table 2-1, in CDF-R each update equation except that of v_kl has a regularization parameter. The regularization parameter puts a limit on the minimum value. For example, …

Table 2-1. Update equations of CDF and CDF-R (classifier weight update).

CDF:   $w_{kt} = \dfrac{\sum_{i=1}^{N} u_{ki}^{m}\big(o_i - \sum_{a \neq t} w_{ka}\, y_{ai}\big)\, y_{ti} - \lambda_k}{\sum_{i=1}^{N} u_{ki}^{m}\, y_{ti}^{2}}$

CDF-R: $w_{kt} = \dfrac{\sum_{i=1}^{N} u_{ki}^{m}\big(o_i - \sum_{a \neq t} w_{ka}\, y_{ai}\big)\, y_{ti} - \lambda_k}{\sum_{i=1}^{N} u_{ki}^{m}\, y_{ti}^{2} + \gamma}$

where $\lambda_k = \dfrac{\sum_{a=1}^{T} \dfrac{\sum_{i=1}^{N} u_{ki}^{m}\big(o_i - \sum_{t=1}^{T} w_{kt}\, y_{ti}\big)\, y_{ai}}{\sum_{i=1}^{N} u_{ki}^{m}\, y_{ai}^{2}}}{\sum_{a=1}^{T} \dfrac{1}{\sum_{i=1}^{N} u_{ki}^{m}\, y_{ai}^{2}}}$ …

Algorithm 2-2. Context-dependent fusion with regularization (each iteration updates U, V, R, and W using Equations (2-…)).

Table 2-2. Landmine data set (N = number of data points, K = number of clusters, d = feature dimension).

  Data set  N                                K   Classifier  Sensor  d   Problem
  Set I     1000 (266 mines, 734 non-mines)  10  EHD         GPR     40  Low-metal mine vs. non-mine
                                                 HMM         GPR     20
                                                 SPECT       GPR     20
  Set II    875 (311 mines, 564 non-mines)   8   EHD         GPR     40  Mine vs. non-mine
                                                 SPECT       GPR     18
                                                 WEMI        EMI     4

…[44]. Four classification systems, called HMM [45]-[47], SPECT [48], EHD [49], [50], and WEMI [51], [52], respectively, were trained using the data. These classification systems are up-to-date landmine detection systems developed over the years independently of this work. Each system extracted its own feature set using a subset of the data and generated confidence values, both of …

Data sets used in this chapter are summarized in Table 2-2. In each data set, a sample corresponds to a landmine, a buried object that is not a landmine, or a blank piece of ground. As the EMI data were not available for data set I, three classifiers using the GPR data were used for the test, whereas two classifiers using the GPR data and one using the EMI data were used for data set II. Landmines can be roughly divided into two groups according to the amount of metallic content: high-metal and low-metal mines. High-metal mines are relatively easy to detect using EMI sensors. However, low-metal mines are not, and even when they are detectable, it is difficult to distinguish them from non-mine objects. Even worse, some mines classified as low-metal mines do not have any metallic content at all. In other words, detecting low-metal mines is a real problem in landmine detection even with up-to-date classification systems. Data set I contains only low-metal mines, whereas data set II contains all types of mines. Data set I is, therefore, made up to show performance improvement for a problem in which performance improvement using a single classifier is not easy. Data set II is composed of two data subsets from two sites whose environments are quite different from each other. As shown in Table 2-4, no single classifier can accommodate two data subsets from different environments. The second experiment is, therefore, designed to show the effectiveness of the proposed methods on a data set composed of several subsets with different contexts. In these experiments, ten-fold cross-validation is used and the numbers are averaged over 10 runs. In each run, the data set is divided into 10 folds randomly. Figures 2-4 and 2-5 show the receiver operating characteristic (ROC) curves of a set of classifiers and the fused results for data sets I and II, respectively. Tables 2-3 and 2-4 summarize the results. In the figures and tables, probability of detection (PD) is the number of mines detected divided by the total number of mines; probability of false alarm (PFA) is the number of non-mines declared to be mines divided by the total number of non-mines.

Figure 2-4. ROC curves for a set of classifiers from data set I.

In Tables 2-3 and 2-4, the original values were multiplied by 100 to clearly show the differences. As can be seen from the results, CDF showed good performance on average for both data sets, although in data set I, at 95% PD, CDF was worse than the single best classifier. On the other hand, CDF-R showed better performance than any single classifier and than CDF. The performance degradation of CDF may come from the post-processing routine. The post-processing was applied to make the cluster boundaries smooth and to make CDF noise-robust, which results in convex clusters in the feature space. However, CDF-R does not force the clusters to be convex. The fact that CDF-R is better than CDF in classification suggests that the clusters in the feature space may not be convex.
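Given the PD and PFA definitions above, the operating points reported in Tables 2-3 and 2-4 can be computed from confidence values and ground truth with a few lines of code. The sketch below is illustrative; the names and thresholding convention are assumptions, not the evaluation code used in the dissertation.

```python
import numpy as np

def pd_pfa(conf, is_mine, threshold):
    """Probability of detection and probability of false alarm at a threshold.

    conf    : (N,) classifier confidence values
    is_mine : (N,) boolean ground truth (True for mines)
    """
    declared = conf >= threshold            # samples declared to be mines
    pd = np.mean(declared[is_mine])         # detected mines / all mines
    pfa = np.mean(declared[~is_mine])       # false alarms / all non-mines
    return pd, pfa
```

Sweeping the threshold over the range of confidence values and plotting PD against PFA yields ROC curves of the kind shown in Figures 2-4 and 2-5.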

Table 2-3. Experimental results from data set I (PFA (%) at each PD (%)).

  PD (%)   EHD      SPECT    HMM      CDF      CDF-R
  85       5.858    23.297   10.218   4.632    2.861
  90       8.311    32.153   13.760   6.403    5.177
  95       11.444   48.229   23.297   13.215   9.946

Figure 2-5. ROC curves for a set of classifiers from data set II.

…traditional fusion method and feature-space clustering. The feature-space information is referred to as context in CDF and helps the fuser to select locally skilled classifiers, which results in improved performance. To extend the performance of CDF, a variant of CDF using regularization, called context-dependent fusion with regularization (CDF-R), was proposed. Regularization is a well-known method to make an algorithm noise-robust. Experimental results using landmine data sets proved the effectiveness of the two methods and, especially, the superiority of CDF-R to CDF.

Although both proposed fusion methods were efficient and effective in real-world problems, CDF sometimes showed performance degradation. This degradation is mainly

Table 2-4. Experimental results from data set II (PFA (%) at each PD (%)).

  PD (%)   EHD      SPECT    WEMI     CDF      CDF-R
  85       37.057   33.156   24.113   15.780   9.220
  90       49.113   39.362   31.738   19.858   15.071
  95       71.631   54.965   43.972   31.915   24.823

caused by the convex-cluster assumption, which is one important difference between CDF and CDF-R. As CDF finalizes the cluster structure based on Euclidean distance, the resulting clusters in the feature space become convex. CDF-R does not have that assumption and so showed better results than CDF. These results lead one to the conjecture that contexts cannot be represented as convex clusters in the feature space. CDF-R also has a problem in the linear combination method used for fusion. The linearity prevents CDF-R from handling non-linear classes in each cluster. To remove the linear restriction, a non-linear fusion method can be adopted, which is discussed in Chapter 6.

CHAPTER 3
RKF-PCA: ROBUST KERNEL FUZZY PCA

Principal component analysis (PCA) is a widely used statistical method that attempts to explain the structure of the data by means of a small number of components. PCA has been applied to statistical analysis, pattern recognition, and image processing [53]. Although PCA has been applied successfully in many areas, it has some problems, such as sensitivity to noise and a limitation to Gaussian distributions.

It is well known that the principal components are often affected by outliers and thus may not capture the true structure of the data; data reduction based on PCA therefore becomes unreliable if outliers are present in the data. To relieve this sensitivity, several robust variants of PCA have been proposed. The goal of robust PCA (R-PCA) is to obtain accurate principal components of a sample set from a distribution when the sample set has been contaminated by samples from a different distribution. Points from the different distribution are referred to here as noise. Most methods achieve robustness by replacing the classical estimate of a covariance matrix with a robust estimate. There are several groups of methods to robustly estimate covariance matrices. The first group of methods is based on subset selection [54], [55]. Several subsets are randomly selected from the whole data set, and the standard principal components are calculated for each subset. The principal components from the subsets are aggregated by a robust aggregation method. The second group is based on outlier rejection [56], [57], in which some data points far from most of the data are considered outliers and are removed. The third group consists of methods that assign a weight to each data point representing the contribution of that point to the estimation of the principal components [58]-[60]. The proposed method, robust fuzzy PCA (RF-PCA), belongs to this group.

Another shortcoming of PCA is its inability to accurately represent non-Gaussian distributions. Traditionally, principal components are determined exclusively by the second-order statistics of the data, which are sufficient for a Gaussian distribution but

…[6], [61], [62]. Among these modified PCA methods, kernel PCA (K-PCA) [6], proposed by Scholkopf et al., is the most well known and widely adopted. Although K-PCA is widely used as a non-linear feature extraction and dimensionality reduction method [63]-[65], it retains the sensitivity to noise found in PCA. To remedy this problem, the proposed RF-PCA is transformed into a kernel-based method, called robust kernel fuzzy PCA (RKF-PCA).

In the next section, previous approaches for making PCA robust and their shortcomings are described. RF-PCA, robust PCA using fuzzy memberships, is introduced in section 3.2. In section 3.3, K-PCA is presented, and RF-PCA is reformulated as a kernel-based method in section 3.4. Experimental results are given in section 3.5, and a discussion in section 3.6.

…[53]. It is also well known that least-squares techniques are not robust in the sense that outliers can arbitrarily skew an estimate from the true value [66]. Several methods have been proposed to make PCA robust by robustly estimating a covariance matrix, and they can be categorized into several groups according to the covariance estimation method used.

The first group is based on subset selection. Rousseeuw [54] proposed a minimum covariance determinant (MCD) method in which one considers subsets of size h out of a set with N (> h) elements. The covariance matrix of each subset is computed, and the robust estimate of the covariance matrix is given by the covariance matrix computed over the subset with the smallest determinant. Although the result is robust, the method is not applicable to a small data set because the estimated covariance matrix is close to singular, which causes instability in the calculation of principal components. In addition,

…[55] proposed another method based on subset selection, in which principal components from each subset are aggregated by voting and averaging. This aggregation method, however, shares the two problems described above with the MCD method, and there is no guarantee of orthogonality among the resulting principal components.

The second group is based on outlier rejection, in which the data point assumed to be most atypical is removed iteratively. The two methods in [56] and [57] basically share the same structure except for the measure of atypicality. In [56], the reconstruction error, defined as Equation (3-…), is used as an atypicality measure,

$e_i = \| x_i - W W^{T} x_i \|^{2},$ (3-…)

where W is a matrix having as columns the top k principal components of the covariance matrix of the data set X = {x_1, …, x_N}, and k is assumed to be known. In [57], the distance from the origin is used together with a one-class support vector machine [67]. These measures can take any non-negative value, which is different from fuzzy memberships, which take on values between zero and one inclusively. The basic limitation of these methods is the pre-defined noise level, which cannot be decided easily in real-world problems. The second group of methods is also subject to the singularity problem when using small data sets.

The third group uses memberships to control the effect of each data point on a covariance matrix. The membership can be applied in two ways: directly on the data [58] or indirectly in the calculation of a covariance matrix [59], [60]. In [58], the data are clustered into a pre-defined number of clusters (M), and a weight (z_i, 1 ≤ i ≤ N) is calculated as a function of the membership values in the M clusters. With these weights, a

diagonal weight matrix $Z = \operatorname{diag}(z_1, \ldots, z_N)$ is formed, (3-…) and then traditional PCA is conducted on ZX instead of X; this is called the weighted PCA (W-PCA) method. W-PCA is different from the proposed method and from the indirect weighting methods in that the data should be clustered into M clusters beforehand, whereas only one cluster is defined in the other methods. One shortcoming of W-PCA is the pre-clustering, which is not a simple problem even when the number of clusters, M, is given.

Indirect weighting methods use the weights in the calculation of fuzzy covariance matrices defined as

$C_F = \frac{\sum_{i=1}^{N} u_i^{m}\, (x_i - \bar{x})(x_i - \bar{x})^{T}}{\sum_{i=1}^{N} u_i^{m}},$ (3-…)

where x̄ is the mean of the data, u_i is the membership of x_i (the degree of belonging to the cluster), and m is a fuzzifier constant. Indirect weighting methods consider only one cluster, and that cluster corresponds to the first principal component. The key point in these methods lies in deciding the membership u_i, which represents the degree of importance in the calculation of the fuzzy covariance matrix. In indirect weighting methods, the first principal component is considered to be a line prototype [68], and an additional term corresponding to a noise cluster [69] is added to decide the membership in a robust way. The objective function can be written as

$J = \sum_{i=1}^{N} u_i\, d^{2}(x_i, w) + \eta \sum_{i=1}^{N} (1 - u_i),$ (3-…)

where w represents the line prototype and η the noise distance. In Equation (3-…), the first term represents the weighted sum of projection distances (or reconstruction errors)

…[70], in which u_i can only have discrete values, u_i ∈ {0, 1}. In this case, it is not easy to minimize Equation (3-…) with respect to U = {u_1, …, u_N} and w simultaneously because it is a mixture of discrete and continuous optimization. The constraint on U is relaxed to allow continuous values in [59] and [60], which makes it easy to optimize the objective function. After the membership is decided, the robust principal components can be found by analyzing the fuzzy covariance matrix [59] or through the Gram-Schmidt orthogonalization procedure [60]. One problem with the indirect weighting methods is that the membership value is decided based only on the first principal component, which may not reflect the true structure of the data.

Sampling and rejection methods are simple and accurate, but they may suffer from computational problems due to the small size of the data set. Another possible problem is information loss due to the removal of data points, even though the removed points might not be important in calculating the principal components. Therefore it is desirable to use all the available data, and weighting data points is better than simply discarding some of them.

The second assumption comes from the defect of the indirect weighting methods. When using indirect weighting methods, only the first PC is considered as a line prototype, and the objective function is optimized iteratively to decide the membership values. However, the line prototype may not contain enough information to represent complex data structures, which results in an inaccurate representation of the membership values, the covariance matrix, and finally the PCs. Figure 3-1 shows the PCs estimated using the reconstruction error in Equation (3-…). Figure 3-1(a) uses the first one PC to calculate

Figure 3-1. Estimated principal components using reconstruction error with (a) one and (b) two principal components.

the error, and Figure 3-1(b) uses the first two. The dimensionality of the data is three, and the data points and PCs are projected onto a two-dimensional space using PCA. As is clear from Figure 3-1, the indirect methods that use only the first PC sometimes fail to correctly estimate the PCs when the data are noisy. Another problem is the eigenvalue, to which the length of a line is proportional in Figure 3-1. Ideally the second vertical line has half the length of the first one, but the second largest eigenvalue in Figure 3-1(a) is much smaller than the first one and its value is almost zero. By using k PCs simultaneously, however, one can get a more stable result, as shown in the figure. The value of k is decided through cross-validation.

The proposed method is based on the fuzzy mean and fuzzy covariance and has a connection with M-estimators in robust statistics. In robust statistics [66], M-estimators for the mean and covariance can be defined as

$\mu_R = \frac{\sum_{i=1}^{N} f_1\big(g(x_i \mid \mu_R, C_R)\big)\, x_i}{\sum_{i=1}^{N} f_1\big(g(x_i \mid \mu_R, C_R)\big)},$ (3-…)

…(3-…) and an exponential function,

$f_1(g) = \exp\left(-\frac{g}{\sigma}\right),$ (3-…)

where σ is a constant; then f_1 is a fuzzy membership function whose range is between zero and one inclusively. The membership function also can be derived from an objective function similar to Equation (3-…). An objective function with entropy regularization [71], which is widely used to mitigate the effect of noise, can be represented as

$J = \sum_{i=1}^{N} u_i\, e_i + \sigma \sum_{i=1}^{N} \left( u_i \ln u_i - u_i \right),$ (3-…)

where the first term represents the weighted sum of reconstruction errors and the second the regularization term, which is a little different from the one in fuzzy clustering. In fuzzy clustering, there is a sum-to-one constraint over M (> 1) clusters, and the constraint prevents the objective function from having the trivial solution u_i = 0 for all i = 1, …, N. In RF-PCA, however, there is only one cluster and no such constraint, so the term -u_i is added to prevent u_i from having a trivial solution. By taking the partial derivative of J with respect to u_i, one can obtain

$u_i = \exp\left(-\frac{e_i}{\sigma}\right),$ (3-…)

which is equal to Equation (3-…).

Algorithm 3-1. RF-PCA (iterative estimation of the memberships and principal components using Equations (3-…)).

Figure 3-2. The first principal component using (a) PCA and (b) RF-PCA.

Using Equations (3-…) and (3-…), Equations (3-…) and (3-…) can be re-written as

$\mu_R = \frac{\sum_{i=1}^{N} u_i\, x_i}{\sum_{i=1}^{N} u_i}, \qquad C_R = \frac{\sum_{i=1}^{N} u_i\, (x_i - \mu_R)(x_i - \mu_R)^{T}}{\sum_{i=1}^{N} u_i},$ (3-…)

which correspond to the robust estimates of the mean and covariance. Robust k principal components can be obtained from C_R. The overall RF-PCA algorithm is given as Algorithm 3-1. The algorithm iterates until the maximum difference between the previous and current membership values is not greater than a pre-specified constant, $\max_{i=1,\ldots,N} \| u_i^{t-1} - u_i^{t} \| < \varepsilon$, or the counter reaches its maximum value, t > t_max.
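A compact sketch of the RF-PCA loop is given below, assuming the exponential membership rule u_i = exp(-e_i/σ) reconstructed above; σ, k, and the fixed iteration count (in place of the convergence test of Algorithm 3-1) are illustrative assumptions.

```python
import numpy as np

def rf_pca(X, k=2, sigma=1.0, n_iter=50):
    """Robust fuzzy PCA sketch: memberships derived from reconstruction error."""
    N = X.shape[0]
    u = np.ones(N)                    # start with full membership for all points
    for _ in range(n_iter):
        # Membership-weighted (fuzzy) mean and covariance.
        mean = (u @ X) / u.sum()
        Xc = X - mean
        C = (Xc.T * u) @ Xc / u.sum()
        # Top-k principal components of the robust covariance.
        vals, vecs = np.linalg.eigh(C)
        W = vecs[:, ::-1][:, :k]      # (d, k), eigenvalues in descending order
        # Reconstruction error e_i = ||x_i - W W^T x_i||^2 on centered data.
        proj = Xc @ W @ W.T
        e = ((Xc - proj) ** 2).sum(axis=1)
        u = np.exp(-e / sigma)        # assumed membership rule
    return W, u
```

Outliers receive large reconstruction errors and hence exponentially small memberships, so they contribute little to the next covariance estimate, which is the mechanism behind the unskewed component in Figure 3-2(b).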

Figure 3-2 shows the first principal component found by PCA and RF-PCA. The data set consists of 110 data points: 100 points from a Gaussian distribution and ten noise points. PCA found a skewed principal component due to the noise, but RF-PCA found an unskewed one because the noise points have small membership values and consequently are negligible in the calculation of the covariance matrix.

…[6], [61], [62]. Among them, Scholkopf et al.'s kernel PCA (K-PCA) [6] is considered here. In K-PCA, the input vector x is mapped into a high-dimensional feature space F via a mapping function

$\Phi: \mathbb{R}^{d} \rightarrow F, \quad x \mapsto \Phi(x),$ (3-…)

where d is the dimensionality of an input vector. Assuming the mapped data have zero mean in the feature space F, the covariance matrix in F is

$C = \frac{1}{N} \sum_{i=1}^{N} \Phi(x_i)\, \Phi(x_i)^{T},$ (3-…)

where N is the number of data points. We have to find eigenvalues λ ≥ 0 and eigenvectors w ∈ F \ {0} satisfying

$\lambda w = C w.$ (3-…)

All solutions w with λ ≠ 0 lie in the span of Φ(x_1), Φ(x_2), …, Φ(x_N). Thus there exist coefficients α_i (i = 1, …, N) such that

$w = \sum_{i=1}^{N} \alpha_i\, \Phi(x_i).$ (3-…)

$N \lambda\, \alpha = K \alpha,$ (3-…)

where α denotes the column vector with entries α_1, α_2, …, α_N. Let λ_1 ≥ λ_2 ≥ … ≥ λ_N denote the eigenvalues and α^1, α^2, …, α^N the corresponding eigenvectors, α^k = [α^k_1, …, α^k_N]^T (k = 1, …, N), of Equation (3-…). Each eigenvector should be normalized so that the corresponding principal components in F have unit length, i.e., w^k · w^k = 1. This can be done by taking α^k to have a length of $1/\sqrt{\lambda_k}$.

Using Equation (3-…), we can calculate the projections of the data in F without the explicit calculation of the covariance matrix in F, which is often impossible. When the data are not centered, the following transformation transforms the data to have zero mean in the feature space F,

$\widetilde{K} = K - 1_N K - K 1_N + 1_N K 1_N,$ (3-…)

where $\widetilde{K}$ is the zero-mean kernel matrix in F and $(1_N)_{ij} = 1/N$. A test point also should be transformed, as

$\widetilde{K}^{test} = K^{test} - 1'_N K - K^{test} 1_N + 1'_N K 1_N.$ (3-…)

Refer to [6] for a more detailed derivation.
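The K-PCA procedure just summarized (build the kernel matrix, center it with the transformation above, then eigendecompose) can be sketched as follows. The RBF kernel is an assumed, illustrative choice; the dissertation's experiments may use other kernels and parameters.

```python
import numpy as np

def kernel_pca(X, k=2, gamma=1.0):
    """Kernel PCA sketch: center the kernel matrix, then eigendecompose."""
    N = X.shape[0]
    # RBF kernel matrix (illustrative kernel choice).
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-gamma * sq)
    # Center in feature space: K~ = K - 1N K - K 1N + 1N K 1N.
    one = np.full((N, N), 1.0 / N)
    Kc = K - one @ K - K @ one + one @ K @ one
    vals, vecs = np.linalg.eigh(Kc)                 # ascending eigenvalues
    vals, vecs = vals[::-1][:k], vecs[:, ::-1][:, :k]
    # Normalize coefficients so principal components in F have unit length.
    alphas = vecs / np.sqrt(np.clip(vals, 1e-12, None))
    return Kc @ alphas                              # (N, k) projected data
```

Note that the covariance matrix in F is never formed; all computation goes through the N x N kernel matrix, which is what makes the infinite-dimensional feature space tractable.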

To remove the linearity restriction, the RF-PCA of Section 3.2 is reformulated as a kernel-based method. First of all, the robust mean and covariance are kernelized as

$$\boldsymbol{\mu}_R^{\Phi} = \frac{\sum_{i=1}^{N} u_i\, \Phi(\mathbf{x}_i)}{\sum_{i=1}^{N} u_i}, \qquad C_R^{\Phi} = \frac{\sum_{i=1}^{N} u_i\, \big(\Phi(\mathbf{x}_i) - \boldsymbol{\mu}_R^{\Phi}\big)\big(\Phi(\mathbf{x}_i) - \boldsymbol{\mu}_R^{\Phi}\big)^T}{\sum_{i=1}^{N} u_i}.$$

Assume that the mean in the feature space, $\boldsymbol{\mu}_R^{\Phi}$, is zero. Then the covariance can be simplified to a sum of outer products of weighted mapped points, where $\Psi(\mathbf{x}_i)$ is defined as $\Psi(\mathbf{x}_i) = u_i\, \Phi(\mathbf{x}_i)$. The problem then is to find eigenvalues $\lambda \ge 0$ and eigenvectors $\mathbf{w} \in F \setminus \{0\}$ satisfying $\lambda\mathbf{w} = C_R^{\Phi}\mathbf{w}$. As with K-PCA, all solutions $\mathbf{w}$ with $\lambda \ne 0$ lie in the span of $\Phi(\mathbf{x}_1), \ldots, \Phi(\mathbf{x}_N)$, so there exist coefficients $\alpha_i$ ($i = 1, \ldots, N$) such that

$$\mathbf{w} = \sum_{i=1}^{N} \alpha_i\, \Psi(\mathbf{x}_i),$$

which is a weighted combination of the mapped data in F. By defining an N x N kernel matrix K' as $K'_{ij} = \Psi(\mathbf{x}_i) \cdot \Psi(\mathbf{x}_j) = u_i u_j\, \kappa(\mathbf{x}_i, \mathbf{x}_j)$, one arrives at an eigenproblem of the same form as in K-PCA. If the data are not centered, they should be centered using a transformation similar to that of K-PCA, with the membership-weighted normalizer $N' = \sum_{i=1}^{N} u_i$ in place of N. A test point x can be transformed in a similar way to have zero mean.

The remaining problem is the calculation of the membership values. Using the eigenvector expansion, the reconstruction error in F can be represented in terms of kernel evaluations only; the detailed derivation is given in Appendix B. Algorithm 3-2 summarizes the RKF-PCA procedure: form and center the weighted kernel matrix, solve the eigenproblem for the eigenvalues ($\lambda_k$) and eigenvectors ($\boldsymbol{\alpha}_k$), normalize the $\boldsymbol{\alpha}_k$, and update the memberships from the reconstruction error until convergence.
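As a concrete illustration of the weighted eigenproblem, the fragment below forms the RKF-PCA kernel matrix and extracts its leading components. The membership-weighted centering expression is an assumption consistent with the surrounding definitions (the dissertation's exact formula is garbled in this copy), so treat this as a sketch rather than the reference implementation.

```python
import numpy as np

def rkf_pca_eigs(K, u, k):
    """Sketch of the RKF-PCA eigenproblem (Algorithm 3-2) from a
    precomputed kernel matrix K and memberships u."""
    w = u / u.sum()                        # fuzzy-mean weights, N' = sum(u)
    m = K @ w                              # <Phi(x_i), mu> for each i
    c = w @ m                              # <mu, mu>
    Kc = K - m[:, None] - m[None, :] + c   # kernel centered at the fuzzy mean
    Kp = (u[:, None] * u[None, :]) * Kc    # weighted matrix K'_ij = u_i u_j (...)
    vals, vecs = np.linalg.eigh(Kp)
    vals, vecs = vals[::-1], vecs[:, ::-1]  # descending order
    return vals[:k], vecs[:, :k] / np.sqrt(vals[:k])
```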
RKF-PCA was first tested on an artificial data set similar to the one used in [6] to show the effectiveness of K-PCA. The data set is composed of three Gaussian clusters, each of which has 30 points. Figure 3-4(a) shows the results using K-PCA. In Figures 3-4(b) and 3-4(c), ten noise points are added, and K-PCA and RKF-PCA are applied. In Figure 3-4(b), it is clear that the noise points tend to skew the original principal components toward themselves. In Figure 3-4(c), however, the principal components found are almost the same as those in Figure 3-4(a), because the noise points have small membership values and have little effect on the calculation of the covariance matrix.

Figure 3-4. The first principal component from (a) K-PCA on clean data, (b) K-PCA on noisy data, and (c) RKF-PCA on noisy data.

Figure 3-5 shows the error histograms between the eigen-system of the clean data using K-PCA and that of the noisy data using K-PCA and RKF-PCA. Two hundred experiments were conducted, and each experiment used the data in Figure 3-4(a) with ten randomly generated noise points. The error is defined as the sum of the angles between corresponding eigenvectors, weighted by the clean-data eigenvalues, where $N_1$ (= 90) is the number of points in the clean data set, $\lambda_i^C$ is the ith eigenvalue from the clean data using K-PCA, and $\mathbf{w}_i^C$ and $\mathbf{w}_i^N$ are the ith eigenvectors from the clean and noisy data, respectively. The eigenvalues are assumed to be sorted in descending order, and the eigenvectors are rearranged according to their eigenvalues. The value $\theta(\mathbf{w}_i^C, \mathbf{w}_i^N)$ represents the angle between the two eigenvectors; since the eigenvectors live in the kernel feature space, the angle is computed through their coefficient expansions over the data, where $N_2$ (= 100) is the number of points in a noisy data set. The membership values are $u_j = 1$ ($1 \le j \le N$) for K-PCA and are given by the exponential membership function for RKF-PCA.

Figure 3-5. Error histogram between eigen-systems of clean and noisy data using K-PCA and RKF-PCA.

As is clear from Figure 3-5, both the error of RKF-PCA and its variance are much smaller than those of K-PCA, which means that RKF-PCA is better than K-PCA in noisy conditions.

The proposed algorithm was also tested as a feature extraction method on the letter recognition data set available from the UCI machine learning repository [72]. The data set consists of 26 classes, each of which corresponds to a character of the English alphabet, with approximately 700 samples per class. In this experiment, each pair of classes was classified using Fisher's linear discriminant [2], and the three pairs having the largest error rate were selected for further experiments: namely, the H-O, H-R, and S-Z pairs. Ten-fold cross-validation was conducted on each character-pair data set, and Table 3-1 summarizes the error rate on each pair. The parameters of the kernel functions were set using cross-validation. As is clear from Table 3-1, none of the three character pairs is linearly separable in the original feature space, and the kernel-based non-linear feature extraction methods reduce the error significantly. One thing that should be noted is that PCA in the input space shows its best result at the original dimensionality (which is 16), while both kernel-based PCAs show their best results at a dimensionality of around 50. This is much higher than the original dimensionality and suggests that high-order features are helpful in this case. Although it appears that K-PCA and RKF-PCA achieve about the same error rate, the difference is statistically significant, except for the H-O pair.

Table 3-1. Classification error on each set of character-pair data

  Feature extraction method    Error rate (%)
                               H-O      H-R      S-Z
  PCA                          21.44    11.80    15.91
  K-PCA                         3.41     6.91     3.65
  RKF-PCA                       3.38     6.62     3.52

In Table 3-1, each error rate is the probability of an error sample (p). Assume each error sample is independent. Then the number of error samples ($N_F$) out of the total number of samples ($N_{total}$) is binomially distributed, which can be approximated with a Gaussian distribution with $\mu = N_{total}\,p$ and $\sigma^2 = N_{total}\,p\,(1-p)$. The $100(1-\alpha)\%$ confidence interval (CI) for $N_F$ can be formulated as [73]

$$N_{total}\,p \;\pm\; t_{\alpha/2}\,\sqrt{N_{total}\,p\,(1-p)},$$

where $t_{\alpha/2}$ is the cutoff value of a Student-t distribution with $(N_{total} - 1)$ degrees of freedom. Figure 3-6 shows the CIs at a 95% confidence level (CL); dotted lines represent the confidence intervals of RKF-PCA, and solid lines represent those of K-PCA. For the H-R and S-Z pairs, there is no overlap between the CIs, which means that the difference between the error rates of K-PCA and RKF-PCA is statistically significant and that RKF-PCA performed better classification than K-PCA in this experiment at a 95% CL. For the H-O pair, the difference is not significant at a 95% CL.

To investigate the robustness of RKF-PCA, the previous experiments were conducted again with some noise points. Noise points are selected randomly from the character classes other than the two real training classes, then added to the training set with randomly generated labels. The noise points can be considered mis-labeled points and play the role of outliers. The number of noise points is decided as the number of real training points multiplied by the noise ratio. Figure 3-7 summarizes the classification error using Fisher's linear discriminant with respect to the noise ratio. The values plotted in Figure 3-7 are averaged over ten runs.
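The interval just stated is a one-line computation. The sketch below implements it with SciPy's Student-t quantile; the function name is ours.

```python
import numpy as np
from scipy import stats

def error_count_ci(p, n_total, alpha=0.05):
    """Gaussian approximation to the binomial count of errors N_F with
    a Student-t cutoff at (n_total - 1) degrees of freedom, as above."""
    mu = n_total * p
    sigma = np.sqrt(n_total * p * (1.0 - p))
    t = stats.t.ppf(1.0 - alpha / 2.0, df=n_total - 1)
    return mu - t * sigma, mu + t * sigma
```

Two methods are then declared significantly different at the chosen confidence level when their intervals do not overlap, which is the test applied to Figure 3-6.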

Figure 3-6. Confidence intervals of the error rate for (a) the H-O pair, (b) the H-R pair, and (c) the S-Z pair.

Kernel-based methods, including K-PCA, are known to be less sensitive to outliers than the corresponding input-space methods. From Figure 3-7, however, we can see that RKF-PCA is less sensitive to outliers than K-PCA and that the difference grows as the noise ratio increases. In the case of the H-O pair, the difference between K-PCA and RKF-PCA is not statistically significant until the noise ratio becomes larger than 0.7 at a 95% CL. Moreover, RKF-PCA converges to K-PCA as the membership parameter $\sigma$ goes to infinity, which implies that the performance of K-PCA may be a lower bound on the performance of RKF-PCA.

Figure 3-7. Error rate with respect to noise ratio for (a) the H-O pair, (b) the H-R pair, and (c) the S-Z pair.

The proposed algorithm outperformed K-PCA, as demonstrated on the artificial and character recognition data sets. Although RKF-PCA performed better than K-PCA, it also has some drawbacks. RKF-PCA is an iterative method, which means that it requires more computation than K-PCA, and the reconstruction error requires $O(k^2 N^2)$ computation. A simplified form of the reconstruction error, or a different measure altogether, might reduce the computational complexity without sacrificing performance; this is under investigation. Another problem is determining the extra membership parameter $\sigma$. With a proper setting of $\sigma$, RKF-PCA outperforms K-PCA, but the cross-validation method used here is time-consuming. If the distribution of the membership values were known, $\sigma$ might be decided without cross-validation, which is left for further research.

Clustering has formed an important area in pattern recognition, image processing, and, most recently, data mining. Clustering, also known as cluster analysis, assigns an unlabeled data set $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$ into K (1 < K < N) clusters.

One family of seed-initialization methods is distance-based: the method of [77] and its variants, in which a data point having the largest distance to the existing set of seeds is iteratively added, are the most widely used ones. Although they are simple and efficient, some of them also require multiple runs, and the resulting set of seeds tends to be placed on the boundary of the data. Density estimation methods select a set of seeds based on local density, the estimation of which is crucial in these methods. The k-nearest neighbor or epsilon-radius ball is generally used to decide the neighboring points [78], and each requires one additional parameter. Global fuzzy c-means [79] also belongs to this class; it iteratively adds, as a seed, a data point optimizing the original clustering objective function. Due to its deterministic property, it does not require multiple runs and does not need any extra parameter.

The global clustering method was first formulated for hard clustering [80], [81] and extended to soft clustering [79]. These global methods can determine the initial seeds efficiently in computation and performance, but they are still affected by noise and cannot manage non-convex clusters. The sensitivity to noise can be alleviated in several ways, and the kernel-based method is one of them. The kernel-based method was originally proposed to convert a linear method into a non-linear one [5], but it is also well known that some kernels used in kernel-based clustering provide outlier robustness together with their non-linear property [82]. Another reason for adopting kernel-based methods is that they can be extended in several ways using different kernels.

In this chapter, global fuzzy c-means (G-FCM) is converted into a kernel-based method, called the kernel-based global fuzzy c-means (KG-FCM) algorithm, and realized using two different kernels. First, KG-FCM is implemented as a noise-resistant method using the Cauchy kernel, called KG-FCM with Cauchy kernel (KG-FCM-C). Second, KG-FCM is realized with the random walk kernel [83], [84] to manage non-convex clusters, called KG-FCM with random walk kernel (KG-FCM-RW). KG-FCM-C is more noise resistant than existing methods, and KG-FCM-RW is the most efficient one for clustering non-convex clusters.

In the next two sections, the hard and soft global clustering algorithms, global k-means and global fuzzy c-means, respectively, are briefly described. In section 4.3, global fuzzy c-means is re-formulated as kernel-based global fuzzy c-means. Experimental results are given in section 4.4, and discussion and further research are given in section 4.5.

The k-means algorithm [85] is an iterative algorithm that finds K crisp and hyper-spherical clusters in the data such that the measure

$$J = \sum_{k=1}^{K} \sum_{i=1}^{N} I(\mathbf{x}_i \in C_k)\, \lVert \mathbf{x}_i - \mathbf{v}_k \rVert^2$$

is minimized, where $I(\cdot)$ is an indicator function having a value of 1 if a data point $\mathbf{x}_i$ belongs to the kth cluster ($C_k$) and 0 otherwise. Commonly, the K cluster centers, $V = \{\mathbf{v}_1, \ldots, \mathbf{v}_K\}$, are initialized randomly, and the data set $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$ is partitioned based on the minimum squared distance criterion. Although k-means is one of the simplest learning algorithms and is still in use, one of its major problems is its sensitivity to initialization. To resolve this problem, several methods for initialization have been proposed, and global k-means is one of them. Global k-means (GKM) [80], [81] is an incremental approach to clustering that dynamically adds one cluster at a time through a deterministic search procedure. The assumption on which the method is based is that an optimal solution to the k-clustering problem can be obtained through N local searches using k-means, each starting from an initial state with the (k-1) centers placed at the optimal positions for the (k-1)-clustering problem and the remaining kth center placed at one of the data points.

The original version of GKM requires O(N) executions of k-means for each new seed, which results in O(KN) executions of k-means and is not feasible with a large N. Thus a fast GKM method was proposed, in which only the one data point maximizing an objective function is considered as a new seed. The objective function is defined as

$$b_i = \sum_{j=1}^{N} \max\!\big( d_j^{\,k-1} - \lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2,\; 0 \big),$$

where $d_j^{\,k-1}$ is the squared distance between $\mathbf{x}_j$ and the closest center among the (k-1) cluster centers obtained so far. The quantity $b_i$ measures the guaranteed reduction in the objective function obtained by inserting a new cluster center at $\mathbf{x}_i$. The modified algorithm significantly reduces the number of k-means executions from O(KN) to O(K). The fast GKM algorithm can be summarized as Algorithm 4-1.

Global fuzzy c-means (G-FCM) is the soft counterpart of GKM. The hard objective function above can be extended to the fuzzy case as

$$J = \sum_{k=1}^{K} \sum_{i=1}^{N} u_{ki}^{m}\, \lVert \mathbf{x}_i - \mathbf{v}_k \rVert^2,$$

where $u_{ki}$ is the membership of the ith point to the kth cluster, and m is a fuzzifier constant. G-FCM uses the same algorithm as GKM, except for two things. First, FCM is used instead of k-means; the FCM update equations [86] are

$$u_{ki} = \frac{\lVert \mathbf{x}_i - \mathbf{v}_k \rVert^{-2/(m-1)}}{\sum_{j=1}^{K} \lVert \mathbf{x}_i - \mathbf{v}_j \rVert^{-2/(m-1)}}, \qquad \mathbf{v}_k = \frac{\sum_{i=1}^{N} u_{ki}^{m}\, \mathbf{x}_i}{\sum_{i=1}^{N} u_{ki}^{m}}.$$

The other difference is the objective function for the selection of a new seed. GKM finds a data point minimizing its hard objective; likewise, in G-FCM, one should find a data point minimizing the fuzzy objective. Substituting the membership update into the objective, the seed-selection function can be re-formulated as in [87], and the data point x minimizing it is selected as the initial position of the new cluster center.
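Both building blocks, the FCM update step and the fast seed selection, are short enough to sketch. The following NumPy fragment is a minimal illustration of the standard formulas above (for the hard seed-selection score $b_i$; a guard constant avoids division by zero when a point coincides with a center):

```python
import numpy as np

def fcm_step(X, V, m=2.0):
    """One FCM iteration: membership update followed by center update.
    Returns memberships U (K x N) and updated centers (K x d)."""
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(-1) + 1e-12  # K x N
    inv = d2 ** (-1.0 / (m - 1.0))
    U = inv / inv.sum(0, keepdims=True)          # memberships sum to one
    Um = U ** m
    return U, (Um @ X) / Um.sum(1, keepdims=True)

def next_seed(X, d_prev):
    """Fast GKM seed selection: d_prev[j] is the squared distance from
    x_j to its closest existing center; b_i is the guaranteed reduction
    from placing a new center at x_i."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise distances
    b = np.maximum(d_prev[None, :] - sq, 0.0).sum(1)
    return int(np.argmax(b))
```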

Kernel-based methods were first introduced to clustering by Girolami [88], who proposed kernel k-means; other kernel-based fuzzy clustering algorithms were proposed after that [89]. Kernel FCM (K-FCM) minimizes the following objective function, a direct extension of the FCM objective:

$$J = \sum_{k=1}^{K} \sum_{i=1}^{N} u_{ki}^{m}\, \lVert \Phi(\mathbf{x}_i) - \Phi(\mathbf{v}_k) \rVert^2,$$

where $\Phi(\cdot)$ is a mapping function; the update equations take the same form as in FCM, with the distances evaluated through the kernel.

To make KG-FCM noise-robust, the Cauchy kernel is used instead of the commonly used Gaussian kernel. The Cauchy kernel has been shown to be outlier-robust and able to handle clusters with different densities effectively compared to the Gaussian kernel [82]. The Cauchy kernel can be defined as

$$\kappa_C(\mathbf{x}, \mathbf{y}) = \frac{1}{1 + \lVert \mathbf{x} - \mathbf{y} \rVert^2 / \sigma},$$

where $\sigma$ is a kernel parameter. With this kernel, the feature-space distance expands as $\lVert \Phi(\mathbf{x}) - \Phi(\mathbf{v}) \rVert^2 = 2\big(1 - \kappa_C(\mathbf{x}, \mathbf{v})\big)$, and the update equations become

$$u_{ki} = \frac{\big(1 - \kappa_C(\mathbf{x}_i, \mathbf{v}_k)\big)^{-1/(m-1)}}{\sum_{j=1}^{K} \big(1 - \kappa_C(\mathbf{x}_i, \mathbf{v}_j)\big)^{-1/(m-1)}}, \qquad \mathbf{v}_k = \frac{\sum_{i=1}^{N} u_{ki}^{m}\, \kappa_C(\mathbf{x}_i, \mathbf{v}_k)\, \mathbf{x}_i}{\sum_{i=1}^{N} u_{ki}^{m}\, \kappa_C(\mathbf{x}_i, \mathbf{v}_k)}.$$

Although the coordinates of the centers in the feature space cannot be evaluated due to the dimensionality, the corresponding centers in the input space can be calculated when the Cauchy kernel is used. In the center update, the kernel function $\kappa_C(\mathbf{x}_i, \mathbf{v}_k)$ works as a weighting function: it weights a data point based on the similarity between $\mathbf{x}_i$ and $\mathbf{v}_k$, which results in noise robustness. To do a global search, the seed-selection function also should be kernelized. The distance term between a data point and a center, $\lVert \Phi(\mathbf{x}_i) - \Phi(\mathbf{v}_j) \rVert$, should be divided in two according to the type of the center, one from the previous (k-1)-clustering problem and the other from a data point; both cases reduce to expressions in kernel values only.

The final thing that should be considered is the evaluation order of the update equations. At the beginning of a k-clustering problem, the membership must be calculated first, to incorporate the current problem into the prior information, that is, the optimal clustering result from the (k-1)-clustering problem and the data point minimizing the seed-selection function. Algorithm 4-2 summarizes kernel FCM with Cauchy kernel (K-FCM-C) with initialization, and kernel-based global FCM with Cauchy kernel (KG-FCM-C) is summarized as Algorithm 4-3. The proposed algorithm, KG-FCM-C, can be used to decide initial seeds efficiently and is more robust to noise than the existing methods.
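A minimal sketch of the Cauchy-weighted center update follows; the placement of $\sigma$ inside the kernel and the exact weighting follow the reconstruction above, so treat the fragment as illustrative rather than the dissertation's exact formulation.

```python
import numpy as np

def cauchy_kernel(X, V, sigma):
    """Cauchy kernel values between data points X (N x d) and cluster
    centers V (K x d); returns a K x N similarity matrix."""
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(-1)
    return 1.0 / (1.0 + d2 / sigma)

def cauchy_center_update(X, U, V, sigma, m=2.0):
    """Input-space center update for K-FCM-C: the kernel value acts as
    a similarity weight, so outliers contribute little to the center."""
    w = (U ** m) * cauchy_kernel(X, V, sigma)   # K x N weights
    return (w @ X) / w.sum(1, keepdims=True)
```

The down-weighting of distant points by the kernel value is precisely what makes K-FCM-C resistant to outliers, and also what makes it treat the tips of elongated or ring-shaped clusters as noise, a behavior visible later in the experiments.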

The random walk distance is a graph-theoretic distance in which each data point is considered a node and the distance between two points is defined based on the Markov property. The random walk distance was first proposed by Klein and Randic [83] and is also known as the resistance distance, following the electronic circuit analogy. Of the two distance measures that can be computed from the Markov chain, the average commute time is used here. The average commute time is defined as the average number of steps a random walker takes to travel between the two nodes and back [84]. The traditional shortest-path distance does not consider the number of paths connecting two nodes, but the random walk distance decreases as the number of paths increases or the length of a path decreases. The average commute time can be calculated in terms of the pseudo-inverse of the Laplacian matrix, $L^{+}$, as in Algorithm 4-4. In Algorithm 4-4, $\sigma_i$ denotes the distance to the (2d+1)th nearest neighbor of $\mathbf{x}_i$; by deciding the parameter in this way, we can efficiently reduce both the effect of noise and that of clusters with different densities [90]. For more detail, refer to [84] and the references therein. Using the average commute time $n(i, j)$, the random walk kernel can be defined as

$$\kappa_{RW}(\mathbf{x}_i, \mathbf{x}_j) = \exp\!\big( -n(i, j) / \sigma \big),$$

where $\sigma$ is a kernel parameter. The membership update for K-FCM with random walk kernel (K-FCM-RW) can be written in a way similar to that of K-FCM-C. The cluster center cannot be evaluated in the feature space or in the input space; however, by taking the dot product with $\Phi(\mathbf{x}_i)$ on both sides of the center expansion, $\kappa_{RW}(\mathbf{x}_i, \mathbf{v}_k)$ can be evaluated indirectly.
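The commute-time computation via the Laplacian pseudo-inverse is a few lines of NumPy. In the sketch below the kernel's exponential form mirrors the reconstruction above (the exact transformation in the original is garbled in this copy), and the constant volume factor of the commute time is omitted since it can be absorbed into $\sigma$.

```python
import numpy as np

def commute_times(W):
    """Average commute times (up to the graph-volume constant) from a
    symmetric affinity matrix W, via the pseudo-inverse of L = D - W:
    n(i, j) is proportional to L+_ii + L+_jj - 2 L+_ij."""
    D = np.diag(W.sum(1))
    L_plus = np.linalg.pinv(D - W)
    d = np.diag(L_plus)
    return d[:, None] + d[None, :] - 2.0 * L_plus

def random_walk_kernel(W, sigma):
    """Turn commute times into similarities with parameter sigma."""
    return np.exp(-commute_times(W) / sigma)
```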

The resulting expression for $\kappa_{RW}(\mathbf{x}_i, \mathbf{v}_k)$ can be used to update the membership values and to calculate the objective function for seed selection. Algorithm 4-2 also should be modified, because the coordinates of the centers cannot be evaluated in K-FCM-RW and the membership values describe the centers indirectly. Thus, the set of centers, V, in Algorithm 4-2 should be replaced with the membership matrix U, whose kth column represents the membership vector of the kth cluster. The initial value $U_1$ in KG-FCM-RW can be represented as an (N x 1) vector of ones. Algorithm 4-5 summarizes K-FCM-RW with initialization, and Algorithm 4-6 does the same for KG-FCM-RW.

4.4.1 Experiments on artificial data sets

Figure 4-1 shows a clustering result using FCM. The test data (D7) consist of seven well-separated circular clusters, each of which has 50 points. The large dots represent the random initial seeds used in FCM.

Figure 4-1. Sensitivity to initialization in FCM.

Although FCM finds the correct structure most of the time, it sometimes fails due to convergence to a local optimum, even in a data set with well-separated clusters; only three clusters out of seven are correctly identified in Figure 4-1. Figure 4-2 shows the initial seeds in G-FCM, which are decided in a deterministic way. Conceptually, KG-FCM works exactly as in Figure 4-2, but the process is hard to visualize due to the dimensionality.

Figure 4-2. Incremental seed selection in G-FCM over iterations 1 through 7.

Table 4-1 summarizes the clustering results for D7. If not stated otherwise, all the numbers reported are averaged over 100 runs. FCM occasionally falls into a local optimum. K-FCM-C is a little better than FCM, but sometimes it too falls into a local optimum. K-FCM-RW is the least satisfactory, because the random walk distance is based on the path connectivity between points, which is sensitive to outliers; the large variance for K-FCM-RW also corroborates this. The two global methods, G-FCM and KG-FCM-C, always find the correct structure of D7. KG-FCM-RW is much better than K-FCM-RW, but not as good as the other global methods.

Table 4-1. Experimental results on D7 (N = number of clusters correctly identified, var(N) = variance of N)

  Method       N       var(N)
  FCM          6.82    0.4925
  G-FCM        7.00    0.0000
  K-FCM-C      6.92    0.3168
  KG-FCM-C     7.00    0.0000
  K-FCM-RW     6.42    3.1164
  KG-FCM-RW    6.96    0.0788

Figure 4-3. Number of clusters correctly identified with respect to noise ratio.

To test robustness to noise, noise points are added to D7 and the same experiments are repeated. Noise points are sampled from a uniform distribution, and the noise ratio is given as the ratio of the number of noise points to the number of data points. Figure 4-3 shows the number of correctly identified clusters with respect to the noise ratio. For a clear comparison, two subsets of the six methods are plotted on different scales. From Figure 4-3(a), we can conclude that the global search methods are more robust to noise than the random initialization methods, and that KG-FCM-C is the best. As the noise ratio increases, the random initialization methods tend to fall into a local optimum, but the global search methods do not suffer from the noise and find the right structure. In Figure 4-3(b), we can see that K-FCM-RW collapses as the noise ratio increases. Although the global search technique helps, KG-FCM-RW also fails to find the right structure in noisy conditions.

Figure 4-4. Clustering results using (a) FCM and (b) K-FCM-C with random initialization.

Another merit of the kernel-based methods is that they can cluster irregularly shaped clusters. Figure 4-4 shows clustering results on a sample data set consisting of two elongated clusters (Dparallel), each of which has 50 data points generated from a Gaussian distribution. By the introduction of another distance measure, for example the Mahalanobis distance instead of the Euclidean distance, FCM can also cluster the elongated clusters correctly; the kernel-based methods, however, additionally provide outlier robustness. Figure 4-5 summarizes the error rate for Dparallel with respect to the distance between the two cluster centers (dbetween), and Figure 4-6 does the same for the variance of the error rate. The error rate is defined as the number of mis-clustered points divided by the number of data points. In this experiment, the methods using the global search technique showed almost the same results as the corresponding methods with random initialization, so the results of the global methods are not plotted. As is clear from Figure 4-5, FCM with the Mahalanobis distance achieves the best result when dbetween is small, because the measure is specialized for Gaussian distributions. However, it sometimes fails to separate the clusters even when dbetween is large enough, because the data set is not large enough to estimate the parameters, including the covariance matrix; the overall large variance also supports this. Another interesting point in Figure 4-6 is that the variance for K-FCM-C decreases as dbetween increases, whereas FCM and K-FCM-RW have a peak.

Figure 4-5. Error rate with respect to dbetween in Dparallel.

Figure 4-6. Variance of error rate with respect to dbetween in Dparallel.

Although the random walk kernel is noise-sensitive, it is suitable for non-convex clusters. Figure 4-7 shows a sample data set that is not linearly separable (Dnonlinear). FCM and K-FCM-C cannot separate the two clusters in Figure 4-7, for two different reasons. First, FCM fails to separate the clusters mainly due to its mean-squares measure, so FCM tends to split Dnonlinear into two clusters in the vertical direction. Second, K-FCM-C fails to separate the clusters due to its noise-suppressing property: it considers the points at the ends of the semi-circles as outliers and tends to split Dnonlinear into two clusters in the horizontal direction.

Figure 4-7. Non-linearly separable clusters.

Figure 4-8. Error rate with respect to dbetween in Dnonlinear.

Figure 4-8 shows the error rate with respect to the distance dbetween depicted in Figure 4-7. As the distance decreases, the error rate of FCM increases. K-FCM-C is slightly better than FCM, but K-FCM-C behaves more like a noise-robust method than a non-linear one, as described above, although the cluster boundaries of K-FCM-C are not linear. Only K-FCM-RW can separate the non-convex clusters, but it too fails when the two clusters are too close.

The proposed methods were also tested on several real data sets from the UCI machine learning repository [72]. To compare the results of different clustering methods, an information-based distance, a modified variation of information [91], is used. Suppose two different clusterings, $C = \{C_1, \ldots, C_K\}$ and $C' = \{C'_1, \ldots, C'_{K'}\}$ (the latter corresponding to the set of known labels), which cluster the same N data points. The information content of the ground truth can be represented by the entropy

$$H(C') = -\sum_{k'=1}^{K'} p(k')\, \log p(k'), \qquad p(k') = \frac{|C'_{k'}|}{N},$$

and the amount of information common to both clusterings can be represented as the mutual information of the two random variables, $p(k)$ and $p(k')$,

$$I(C, C') = \sum_{k=1}^{K} \sum_{k'=1}^{K'} p(k, k')\, \log \frac{p(k, k')}{p(k)\, p(k')},$$

where $p(k, k') = |C_k \cap C'_{k'}| / N$ is the probability that a point belongs to $C_k$ in clustering C and to $C'_{k'}$ in $C'$. The difference between the two terms then represents the amount of information that is not described by the clustering C:

$$D_I(C, C') = H(C') - I(C, C').$$

This distance is zero if and only if the two clusterings are identical when $K = K'$, and it can be considered the distance of a clustering (C) from the ground truth ($C'$). As the number of clusters in C increases, the mutual information increases and $D_I$ decreases. Experimentally, clustering with a larger number of clusters, $K > K'$, showed a smaller $D_I$ than clustering with the same number of clusters, $K = K'$, but the relative performance did not change. Therefore, the number of clusters (K) is set equal to the number of classes ($K'$) in each data set. Table 4-2 summarizes the experimental results on the UCI data sets; all values in Table 4-2 are $D_I$ values averaged over 100 runs.
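The distance $D_I$ can be computed directly from the contingency table of the two label vectors. The sketch below follows the reconstruction above ($D_I = H(C') - I(C, C')$), which was recovered from the stated properties of the measure rather than from an intact formula, so treat the exact form as an assumption.

```python
import numpy as np

def info_distance(labels_c, labels_gt):
    """Information-based clustering distance: ground-truth entropy
    minus the mutual information between clustering and ground truth."""
    N = len(labels_c)
    _, ci = np.unique(labels_c, return_inverse=True)
    _, gi = np.unique(labels_gt, return_inverse=True)
    p = np.zeros((ci.max() + 1, gi.max() + 1))
    np.add.at(p, (ci, gi), 1.0 / N)               # joint p(k, k')
    pk, pg = p.sum(1), p.sum(0)
    nz = p > 0
    mi = np.sum(p[nz] * np.log(p[nz] / np.outer(pk, pg)[nz]))
    h_gt = -np.sum(pg[pg > 0] * np.log(pg[pg > 0]))
    return h_gt - mi
```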

Table 4-2. Experimental results using UCI data sets (N = number of samples, d = data dimension, K' = number of classes)

  Dataset         N    d    K'   Method      Global     Random method
                                             method     Min.     Average   Max.
  Iris            150  4    3    FCM         0.4041     0.4041   0.4042    0.4045
                                 K-FCM-C     0.3898     0.3898   0.3898    0.3898
                                 K-FCM-RW    0.2663     0.6057   0.6164    0.6365
  Wine            178  13   3    FCM         1.2944     1.2944   1.3088    1.3386
                                 K-FCM-C     1.2284     1.2284   1.2368    1.3128
                                 K-FCM-RW    1.2683     1.1542   1.2351    1.4986
  Breast cancer   569  30   2    FCM         0.9329     0.9329   0.9339    0.9535
                                 K-FCM-C     0.8712     0.8243   0.9224    0.9535
                                 K-FCM-RW    0.9381     0.9202   0.9393    0.9535
  Yeast           180  8    5    FCM         1.6129     1.6043   1.6139    1.7376
                                 K-FCM-C     1.5855     1.8001   1.8001    1.8001
                                 K-FCM-RW    1.8021     1.6755   1.843     1.9138

From these experiments, we can conclude that:

1. The global search technique is useful for deciding initial seeds, although the global methods sometimes fail to achieve the minimum of the corresponding random methods. This failure is mainly due to the different structures of the solution space: the random methods can only have data points as their initial positions, but the global methods can reach non-preoccupied positions, which is another reason for the difference.

2. KG-FCM-C is more robust to outliers than the other methods and shows better and more stable results on the test data sets.

3. KG-FCM-RW shows the best result on the iris data, which is clean and has a relatively well-separated cluster structure. However, as the dimensionality or the number of clusters increases, it is more likely to fall into a local optimum.

Although the proposed algorithm, KG-FCM, is better than existing methods, it still has some problems. The kernel method was adopted because of the various effects obtainable with different kernels. As is clear from the experiments, the Cauchy kernel is good at suppressing outliers and the random walk kernel is efficient at separating non-convex clusters; however, each kernel has the other's strength as its weakness. Yet another kernel may combine the strengths of the two; this is under investigation. Another weakness of KG-FCM is that the global method sometimes shows poorer results than the corresponding random method, mainly due to the difference in solution space. A different seed selection function may solve this problem, which is left for further research. The last thing that should be mentioned concerns the selection of kernel parameters. The kernel parameters, $\sigma$ in the Cauchy kernel and $\sigma$ in the random walk kernel, are decided using cross-validation here. Research into determining the kernel parameters efficiently is ongoing and is an important area for further work.

The support vector machine (SVM) is a supervised machine learning method based on structural risk minimization. After its introduction by Vapnik [92], it has been successfully used in many applications, such as handwriting recognition [92], image retrieval [93], and text categorization [94]. In spite of these successes, SVM still has a noise sensitivity problem. The problem originates from the assumption that each training point has equal importance, or weight, in training. In many real-world problems, however, some training points are corrupted by noise and should be treated as less important. Although SVM generalizes well due to regularization, balancing the margin against the error, the error term based on the sum of squares can still skew the decision boundary toward outliers. To relax the noise sensitivity problem, several SVM variants have been proposed, including fuzzy SVM (FSVM) [95], new fuzzy SVM [96], prior knowledge SVM [97], posterior probability SVM [98], and soft SVM [99]. Each differs a little from the others in formulation, but they share the basic idea of assigning different weights to different data points. Although these SVM variants are useful in noisy conditions, their basic limitations are that they assume some domain-specific knowledge and that the weights are assumed to be known, or easily calculated using that knowledge. There are also several methods for estimating the weights based solely on the data [95], [100]-[102]. All of them are based on FSVM and introduce their own measures of outlier-ness. One problem with these methods is that most of the measures are based on the Euclidean distance, which amounts to assuming a circular data distribution.

In this chapter, we develop a new membership calculation method based on the reconstruction error, which relies on the statistics of the data and does not require any domain knowledge. The method assumes that the data follow a Gaussian distribution in the feature space, which can accommodate more distributions than the circular assumption. The reconstruction error measures the degree of conformity of a point to the overall data structure using principal component analysis (PCA). Using the reconstruction error, the membership values can be decided in a robust way, and the robust memberships result in more accurate classification than SVM and its variants. Experimental results with synthetic and real data sets support this.

In the next section, SVM is briefly reviewed. In section 5.2, the previous membership calculation methods and their weaknesses are described in detail. The proposed method is introduced in section 5.3. Experimental results are given in section 5.4, and the discussion section concludes this chapter.

Given training data $\{(\mathbf{x}_i, y_i)\}$ with labels $y_i \in \{-1, +1\}$, the soft-margin SVM solves [5]

$$\min_{\mathbf{w}, b, \boldsymbol{\xi}} \; \frac{1}{2}\lVert \mathbf{w} \rVert^2 + C \sum_{i=1}^{N} \xi_i \quad \text{s.t.} \quad y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0,$$

where C is a regularization constant, the only free parameter in the SVM formulation. The optimal solution can be found by constructing a Lagrange equation and transforming it into its dual problem,

$$\max_{\boldsymbol{\alpha}} \; \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j\, (\mathbf{x}_i \cdot \mathbf{x}_j) \quad \text{s.t.} \quad \sum_{i=1}^{N} \alpha_i y_i = 0, \;\; 0 \le \alpha_i \le C,$$

where $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_N)$ is the vector of non-negative Lagrange multipliers associated with the margin constraints. According to the Karush-Kuhn-Tucker (KKT) conditions, the solution $\alpha_i$ satisfies the equalities

$$\alpha_i\big( y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 + \xi_i \big) = 0, \qquad (C - \alpha_i)\,\xi_i = 0.$$

From these equalities, it follows that the only non-zero values $\alpha_i$ correspond to the points for which the margin constraint is satisfied with equality. A point $\mathbf{x}_i$ with $\alpha_i > 0$ is called a support vector, and the support vectors divide into two types: in the case $0 < \alpha_i < C$, the point lies on the margin ($\xi_i = 0$), while a point with $\alpha_i = C$ lies inside the margin or is misclassified ($\xi_i > 0$). The weight vector is given by

$$\mathbf{w} = \sum_{i=1}^{N} \alpha_i y_i\, \mathbf{x}_i,$$

and the scalar b can be determined from the KKT conditions. The decision function thus can be written as

$$f(\mathbf{x}) = \operatorname{sign}\!\Big( \sum_{i=1}^{N} \alpha_i y_i\, (\mathbf{x} \cdot \mathbf{x}_i) + b \Big).$$

This formulation can be easily extended into non-linear SVM, in which the data points are first mapped into a high-dimensional space using a mapping function and the linear SVM is applied. By replacing each data point $\mathbf{x}_i$ with the mapped one $\Phi(\mathbf{x}_i)$, and the product with a kernel function $\Phi(\mathbf{x}_i) \cdot \Phi(\mathbf{x}_j) = \kappa(\mathbf{x}_i, \mathbf{x}_j)$, one obtains the equations of the non-linear SVM.

In FSVM, each training point carries a membership $u_i$, and the primal problem becomes

$$\min_{\mathbf{w}, b, \boldsymbol{\xi}} \; \frac{1}{2}\lVert \mathbf{w} \rVert^2 + C \sum_{i=1}^{N} u_i\, \xi_i,$$

with the same constraints as above. It is noted that a small $u_i$ reduces the effect of the slack $\xi_i$, so the corresponding point $\mathbf{x}_i$ is treated as less important. The problem can be transformed into the dual

$$\max_{\boldsymbol{\alpha}} \; \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j\, \kappa(\mathbf{x}_i, \mathbf{x}_j) \quad \text{s.t.} \quad \sum_{i=1}^{N} \alpha_i y_i = 0, \;\; 0 \le \alpha_i \le u_i C,$$

with the corresponding KKT conditions. A point $\mathbf{x}_i$ with $\alpha_i > 0$ is again called a support vector, and there are also two types of support vectors, as in SVM: a point with $0 < \alpha_i < u_i C$ lies on the margin, while a point with $\alpha_i = u_i C$ is treated as an error.
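In practice, the per-sample box constraint $0 \le \alpha_i \le u_i C$ can be realized with any SVM solver that accepts per-sample weights, since scaling each sample's error weight scales its effective C. The sketch below uses scikit-learn's `sample_weight` argument as an illustrative stand-in for a dedicated FSVM solver.

```python
import numpy as np
from sklearn.svm import SVC

def fit_fsvm(X, y, u, C=1.0, gamma=1.0):
    """FSVM via per-sample weights: sample_weight multiplies C for each
    sample, which realizes the dual box constraint 0 <= a_i <= u_i * C."""
    clf = SVC(C=C, kernel="rbf", gamma=gamma)
    clf.fit(X, y, sample_weight=np.asarray(u, dtype=float))
    return clf
```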

The first data-driven membership design, referred to here as FSVM-I, assigns memberships from the distance to the class mean in the input space,

$$u_i = 1 - \frac{\lVert \mathbf{x}_i - \bar{\mathbf{x}}_{\pm} \rVert}{r_{\pm} + \delta},$$

where $I_{+} = \{i \mid y_i = +1\}$ and $I_{-} = \{i \mid y_i = -1\}$ are the index sets of positive and negative samples, $\bar{\mathbf{x}}_{\pm} = \frac{1}{|I_{\pm}|}\sum_{i \in I_{\pm}} \mathbf{x}_i$ are the class means, $r_{\pm}$ are the class radii, and $\delta > 0$ is a constant to avoid the case $u_i = 0$; $|I_{\pm}|$ represents the cardinality of the set. One problem with this method is that the membership values are calculated based on the Euclidean distance: when the distribution of the data points is not spherical, the contribution of each point to the decision boundary may not be represented properly. To mitigate this problem, Jiang et al. [100] proposed a kernelized version of FSVM-I, called FSVM in the feature space, or FSVM-F. In the feature space, the center of a class can be represented as

$$\Phi(\bar{\mathbf{x}}_{\pm}) = \frac{1}{|I_{\pm}|} \sum_{i \in I_{\pm}} \Phi(\mathbf{x}_i),$$

and the distance between a mapped point $\Phi(\mathbf{x}_i)$ and a center $\Phi(\bar{\mathbf{x}}_{\pm})$ can be represented through kernel values only,

$$\lVert \Phi(\mathbf{x}_i) - \Phi(\bar{\mathbf{x}}_{\pm}) \rVert^2 = \kappa(\mathbf{x}_i, \mathbf{x}_i) - \frac{2}{|I_{\pm}|}\sum_{j \in I_{\pm}} \kappa(\mathbf{x}_i, \mathbf{x}_j) + \frac{1}{|I_{\pm}|^2}\sum_{j \in I_{\pm}}\sum_{l \in I_{\pm}} \kappa(\mathbf{x}_j, \mathbf{x}_l),$$

where $\kappa(\cdot, \cdot)$ represents the kernel function. Using this distance, the class radii in the feature space can be written as $r_{\pm} = \max_{i \in I_{\pm}} \lVert \Phi(\mathbf{x}_i) - \Phi(\bar{\mathbf{x}}_{\pm}) \rVert$, and the membership of FSVM-F takes the same form as that of FSVM-I, with all distances evaluated in the feature space. The basic limitation of these two methods comes from the assumption of a circular data distribution: as outliers are detected based solely on the distance from the corresponding class mean, the methods may produce good results only when the distribution of the data points is spherical in the input or feature space.

Lin et al. [101] proposed another membership calculation method using kernel target alignment [103]. Kernel target alignment is a method for measuring the degree of agreement between a kernel and the target values. The inner product between two kernel matrices can be defined as

$$\langle K_1, K_2 \rangle = \sum_{i=1}^{N} \sum_{j=1}^{N} K_1(\mathbf{x}_i, \mathbf{x}_j)\, K_2(\mathbf{x}_i, \mathbf{x}_j),$$

and with the ideal target matrix $\mathbf{y}\mathbf{y}^T$, the alignment can be re-written pointwise. In this method, a heuristic function derived from the alignment is evaluated as the agreement between a data vector and the label for that vector; the method is therefore referred to as heuristic-function FSVM, or H-FSVM. The heuristic function is defined as

$$f(\mathbf{x}_i, y_i) = \sum_{j=1}^{N} y_i\, y_j\, \kappa(\mathbf{x}_i, \mathbf{x}_j).$$

This method assumes that a data point can be considered noise with high probability if it is more similar to the other class than to its own class. The value of the heuristic function is proportional to the difference between the average squared distance to points of the same class and the average squared distance to points of the other class, when the data set is balanced and $\kappa(\mathbf{x}, \mathbf{x}) = 1$.

Under these conditions, the difference reduces to the heuristic function. This method also assumes a circular distribution of the data, because the outlier-ness is based on distances in the feature space. Another problem in this method is that the heuristic value can be positive or negative, and its bound depends on the number of data points and on the kernel function; to obtain memberships, therefore, the heuristic value must be rescaled. The rescaling function was defined with four parameters, where $h_C$ and $h_T$ represent the maximum and minimum values of the heuristic function, respectively, $\sigma$ is the minimum membership value, and d is the degree of the mapping function. Figure 5-1 shows the mapping function with respect to the degree.

Figure 5-1. Mapping function for H-FSVM.

Shilton et al. [102] proposed an iterative method using the slack variable $\xi_i$, which represents the distance from a data point to the decision hyper-plane. The membership value is given as

$$u_i = h(\xi_i),$$

where h is a non-increasing function satisfying $h(0) = 1$ and $0 \le h(\xi) \le 1$, and $h(\xi) = \operatorname{sech}^2(\xi)$ was used in [102]. Using these membership values, the decision boundary can be re-constructed using FSVM. The two steps, FSVM training and membership calculation, may be iterated a fixed number of times or until the membership vector converges. This method is referred to as iterative FSVM, or I-FSVM. It assumes that the points far from the decision boundary are less important in constructing the boundary, and it does not assume any particular distribution of the data. The defect of this method also arises from this assumption: although the points adjacent to the boundary are important, the large membership values of these points may make the boundary over-fitted to them and result in poor generalization. The increase in error rate as the number of iterations increases clearly shows the overfitting problem [102].
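The I-FSVM loop is easy to sketch once an FSVM trainer is available. The fragment below uses a weighted SVC as an illustrative stand-in and assumes numeric labels in $\{-1, +1\}$; the fixed iteration count is a free parameter, as discussed above.

```python
import numpy as np
from sklearn.svm import SVC

def i_fsvm(X, y, n_iter=7, C=1.0, gamma=1.0):
    """Sketch of I-FSVM: train, measure each point's slack
    xi_i = max(0, 1 - y_i f(x_i)), re-weight with h(xi) = sech(xi)^2,
    and retrain; y must take values in {-1, +1}."""
    u = np.ones(len(y))
    clf = None
    for _ in range(n_iter):
        clf = SVC(C=C, kernel="rbf", gamma=gamma)
        clf.fit(X, y, sample_weight=u)
        xi = np.maximum(0.0, 1.0 - y * clf.decision_function(X))  # slacks
        u = 1.0 / np.cosh(xi) ** 2                                # sech^2
    return clf, u
```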

Figure 5-2. Reconstruction error e(x).

The reconstruction error has been used successfully for outlier detection [104], [105]; it is thus a promising measure of outlier-ness. The reconstruction error is the distance between a d-dimensional data point x and the principal-component reconstruction of x using k (< d) principal components, as illustrated in Figure 5-2. Refer to [106] for more detailed information about kernel PCA (KPCA).

Let $X^{\Phi} = \{\Phi(\mathbf{x}_1), \ldots, \Phi(\mathbf{x}_N)\}$ be the set of mapped samples. Using KPCA, the reconstruction error in the mapped space can be calculated as

$$e(\Phi(\mathbf{x})) = \big\lVert \big(\Phi(\mathbf{x}) - \boldsymbol{\mu}_{\Phi}\big) - WW^{T}\big(\Phi(\mathbf{x}) - \boldsymbol{\mu}_{\Phi}\big) \big\rVert^{2},$$

where N is the number of data points, $\boldsymbol{\mu}_{\Phi}$ is the mean of the data in the mapped space, and $\kappa(\cdot, \cdot)$ is the kernel function. The matrix W holds the k principal components of the covariance matrix of $X^{\Phi}$. As the dimensionality of $\Phi(\mathbf{x})$ is very high, and sometimes infinite, it is impossible to build the covariance matrix explicitly; the reconstruction error in the feature space is therefore evaluated indirectly, using the eigenvectors of the kernel matrix. The vector $\boldsymbol{\alpha}_k = [\alpha_{k1}, \ldots, \alpha_{kN}]$ ($1 \le k \le N$) is the eigenvector of the kernel matrix K having the kth largest eigenvalue, where the (i, j) element of the kernel matrix is $\kappa(\mathbf{x}_i, \mathbf{x}_j)$. The resulting expression involves auxiliary quantities $\beta_j$ and $\gamma_l$ that collect the kernel sums; the detailed derivation can be found in Appendix B. (Appendix B derives the reconstruction error with memberships; the equation without memberships is obtained by setting $u_i = 1$ for all $1 \le i \le N$.)

Using the reconstruction error, the membership value is defined as

$$u_i = \exp\!\big( -e'(\Phi(\mathbf{x}_i)) / \sigma_N \big),$$

where $\sigma_N$ is a free parameter. The function $e'(\cdot)$ performs a rescaling of $e(\cdot)$ and is defined as

$$e'(\Phi(\mathbf{x}_i)) = \max\!\left( \frac{e(\Phi(\mathbf{x}_i)) - \mu_e}{\sigma_e},\; 0 \right),$$

where $\mu_e$ and $\sigma_e$ are the mean and variance of the reconstruction error, respectively. This rescaling treats data points whose reconstruction error is smaller than the mean equally, by assigning them all a membership value of 1. The rescaling also makes it possible for the algorithm to accommodate classes with different densities, by normalizing the reconstruction error to unit variance. Figure 5-3 shows the membership calculation function in FSVM-N. The value $u_i$ represents the degree of typical-ness, and a typical point is defined as a point with small reconstruction error. A large value of $\sigma_N$ makes $u_i = 1$ for all $1 \le i \le N$, which gives all sample points equal weights and reduces the proposed method to SVM. A small value of $\sigma_N$, on the other hand, throws most of the points away in training. Thus, the parameter value of $\sigma_N$ should be decided carefully, and the grid search method [107] is used to set that value. Still remaining is the decision on the number of principal components k; in this chapter, the profile likelihood method (PLM) [108] is used to decide k.
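The membership rule just defined is a two-liner once the reconstruction errors are available. The following sketch assumes the exponential/max form reconstructed above (the exact notation in the original is garbled in this copy):

```python
import numpy as np

def fsvmn_memberships(errors, sigma_n):
    """FSVM-N memberships: errors below the mean map to membership 1;
    larger errors decay exponentially with free parameter sigma_n."""
    e = np.asarray(errors, dtype=float)
    e_rescaled = np.maximum((e - e.mean()) / e.std(), 0.0)
    return np.exp(-e_rescaled / sigma_n)
```

Because the mean and spread are estimated per class (positive and negative samples separately, as described below), the rescaling adapts to classes with different densities.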

Figure 5-3. Rescaling function for FSVM-N.

In the PLM, the eigenvalues are divided into two groups around a candidate split point, each group modeled by a Gaussian with a shared variance, and the profile log-likelihood of the split is evaluated with the group means estimated from the eigenvalues, where $|\Omega_j|$ is the cardinality of the set $\Omega_j$. The optimal number of eigenvectors is given as the split point maximizing the profile log-likelihood.

The overall procedure of FSVM-N can be summarized as Algorithm 5-1. One thing that should be noticed is that the membership values are decided in a supervised manner, i.e., positive and negative samples are treated separately, and the membership values are decided using the eigenvectors of the corresponding class, to account for the different structure of the data in each class.

Figure 5-4. Decision boundaries of SVM and FSVM-I.

Figure 5-4 compares the decision boundaries of SVM and FSVM-I. The data set in Figure 5-4 consists of two clusters, one circular cluster (+) and one elliptical cluster (*). In the overlapped region, because the distance from the center of the elliptical cluster is larger than that from the center of the circular cluster, the boundary of FSVM-I is skewed toward the center of the elliptical cluster. This shift may cause some points from the elliptical cluster to be incorrectly classified. Due to this inability to deal with non-circular clusters, FSVM-I is not included in the following experiments. Instead, FSVM-F is included, because it can deal with non-convex as well as non-circular clusters, although it too assumes that clusters are circular in the feature space. In FSVM-F, there is another free parameter ($\delta$) that controls the minimum value of the memberships.

H-FSVM also assumes a circular distribution of the data in the feature space. Another problem in H-FSVM is the number of free parameters in its rescaling function, which makes it difficult to obtain optimal parameter values. Therefore, a different rescaling function, which can be optimized easily, is used here:

$$u_i = \exp\!\big( -f''(\mathbf{x}_i, y_i) / \sigma_N \big),$$

which is similar to the rescaling used in FSVM-N. The function $f''(\mathbf{x}_i, y_i)$ is defined as

$$f''(\mathbf{x}_i, y_i) = \max\!\left( \frac{f'(\mathbf{x}_i, y_i) - \mu_{f'}}{\sigma_{f'}},\; 0 \right),$$

where $\mu_{f'}$ and $\sigma_{f'}$ are the mean and variance of $f'(\mathbf{x}_i, y_i)$. The function $f'(\mathbf{x}_i, y_i)$ is an inversion of the heuristic function: because $e(\Phi(\mathbf{x}_i))$ measures dissimilarity and $f(\mathbf{x}_i, y_i)$ measures similarity, the function $f(\mathbf{x}_i, y_i)$ must be inverted first. Using the same rescaling for both, the two measures of outlier-ness, the reconstruction error and the kernel target alignment, can be compared fairly.

Figure 5-5. Decision boundaries of SVM and I-FSVM.

Figure 5-5 shows the differences between the boundaries of SVM and I-FSVM; here, I-FSVM iterates until the membership vector converges. Although no training error remains under the decision boundary of I-FSVM, the boundary is more complex, i.e., over-fitted to the training data, and may not generalize well. As shown in [102], on average the best result was obtained after seven iterations, but we did not fix the iteration number (n) and considered it a free parameter.

Given the considerations discussed above, the four previous methods, SVM, FSVM-F, H-FSVM, and I-FSVM, and the proposed method, FSVM-N, were tested using data sets available from the UCI Machine Learning Repository [72]. The Gaussian kernel,

$$\kappa(\mathbf{x}, \mathbf{y}) = \exp\!\big( -\lVert \mathbf{x} - \mathbf{y} \rVert^2 / \sigma \big),$$

was used for all the methods. The kernel parameter $\sigma$ and the regularization parameter C for SVM were set using the grid search method [107]. The kernel parameter chosen for SVM was also used for all the other SVM variants, and the free parameter specific to each method, together with the regularization parameter C, was also optimized by grid search. Table 5-1 sums up the methods used in the experiments, with their free parameters and limitations; Table 5-2 summarizes the data sets used and the experimental results using ten-fold cross-validation. As is clear from Table 5-2, all the SVM variants outperformed the standard SVM.

Table 5-1. SVM and its variants

  Method    Free parameters    Limitations
  SVM
  FSVM-F
  H-FSVM
  I-FSVM
  FSVM-N

Table 5-2. Error rates (%) on UCI data sets (N = number of data points, d = data dimension)

  Dataset                  N    d    Error rates
  Breast cancer            569  30   2.109    2.109    1.933    1.757
                           351  34   5.413    5.128    5.128    5.413
                           197  22   5.641    4.615    4.615
  Pima Indians diabetes    768  8    22.396   22.266   22.005
  Yeast                    892  8    35.202   34.193   34.753   33.968

Although the proposed method showed better results than SVM and its variants, there is still room for improvement. The first concerns the Gaussian assumption: FSVM-N achieved better performance by relaxing the assumptions of the previous methods, so removing the Gaussian assumption may improve performance further. A non-parametric method that does not assume any data distribution is under investigation. Another improvement concerns computational complexity: when the data set is large, the kernel PCA used to calculate the reconstruction error is computationally expensive. A measure of outlier-ness that approximates the reconstruction error at lower computational cost is under investigation. A further topic left for future research is a method for dealing with imbalanced data sets, which are quite common in emerging applications. Although the SVM variant for imbalanced data is a special case of FSVM, neither FSVM-N nor the existing methods can deal efficiently with highly imbalanced data; FSVM-N, however, opens a way toward that through its rescaling of the reconstruction error and its supervised membership decision procedure.

Context-dependent fusion (CDF) is a general framework for fusion that divides the feature space into a number of homogeneous regions, i.e., contexts, and develops a different fusion process for each context. CDF consists of two components, context extraction and decision fusion, as depicted in Figure 1-1. CDF uses feature-space information in fusion, and this information enables it to achieve better performance than previous fusion methods. Although CDF demonstrated promising results, it also has some problems and can be improved further. CDF assumes that each context corresponds to a convex cluster in the feature space. A variant of CDF using regularization, called context-dependent fusion with regularization (CDF-R), mitigated this assumption and achieved better performance than CDF, which suggests that a context might not be representable as a convex cluster in the feature space. Although CDF-R used a different distance function to soften the convex cluster assumption, the distance was still based on the Euclidean distance in the feature space; a non-linear feature mapping can remove the assumption and may improve the performance of CDF. Another problem is the linear combination method used in decision fusion: in each context, multiple algorithm outputs are combined into a cluster output using a linear combination, which cannot accommodate non-linearly separable classes. This linear restriction can be resolved by using a non-linear combination method.

In this chapter, to solve these problems, CDF is generalized into a robust non-linear method, called kernel-based context-dependent fusion (K-CDF), using the robust methods discussed in the previous chapters. As shown in Figure 6-1, K-CDF consists of three components: dimension reduction, context extraction, and decision fusion.

Figure 6-1. Block diagram of kernel-based context-dependent fusion.

Different from CDF, K-CDF requires one additional component for dimension reduction. As the dimensionality of the data in the kernel feature space is generally very high, and often assumed to be infinite, the feature-wise operation used in CDF is not applicable. Therefore, a dimension reduction method is introduced in K-CDF, and the robust kernel fuzzy principal component analysis (RKF-PCA) introduced in Chapter 3 is used for it. RKF-PCA is a robust non-linear variant of PCA, efficient in reducing the feature dimension; by using RKF-PCA, K-CDF can overcome the convex cluster limitation. To extract contexts, a modified fuzzy clustering method, global fuzzy c-means (G-FCM), is adopted. As RKF-PCA maps the original features in a non-linear way, clusters are no longer convex in the feature space, although they may be convex in the kernel feature space; G-FCM is therefore adopted to overcome the sensitivity to initialization of FCM and its variants. Finally, a robust variant of the support vector machine (SVM) using fuzzy memberships, called the fuzzy support vector machine (FSVM), is used to fuse the classifier outputs in a non-linear manner. Another strength of K-CDF is its noise robustness.

In the next section, K-CDF is formulated by extending CDF-R. Experimental results using artificial and real data sets are given in section 6.2, and a discussion concludes this chapter.

The objective function of K-CDF can be written as

$$J = \sum_{k=1}^{K}\sum_{i=1}^{N} u_{ki}^{m} \sum_{l=1}^{L} r_{kl}^{q}\, d_{kil}^{2} \;+\; \eta \sum_{k=1}^{K}\Big( \frac{1}{2}\lVert \mathbf{w}_k \rVert^{2} + C\sum_{i=1}^{N} u_{ki}^{m}\, \xi_{ki} \Big) \;+\; \alpha \sum_{i=1}^{N}\sum_{k=1}^{K} u_{ki}^{m} \;+\; \beta \sum_{k=1}^{K}\sum_{l=1}^{L} r_{kl}^{q},$$

where N is the number of data points, K is the number of clusters, and L is the feature dimension in the kernel feature space. The value $u_{ki}$ is the membership of $\mathbf{x}_i$ in the kth cluster, and $r_{kl}$ is the weight of the lth feature in the kth cluster. The constants m and q represent the fuzzifier constants for membership and feature weight, respectively. The value $d_{kil} = |v_{kl} - x_{il}|$ is a feature-wise distance between the lth feature of data point $\mathbf{x}_i$ and the lth feature of the kth cluster center $\mathbf{v}_k$. The second term is the sum of K fuzzy support vector machine (FSVM) objective functions, each of which corresponds to a fusion process for one cluster; all the FSVMs share the regularization constant C. The value $\mathbf{w}_k$ represents the decision surface of the kth FSVM, and $\xi_{ki}$ is the error of $\mathbf{x}_i$ in the kth FSVM.

The objective function should be minimized under the constraints on the membership U and the feature weight R,

$$\sum_{k=1}^{K} u_{ki} = 1 \;\; (i = 1, \ldots, N), \qquad \sum_{l=1}^{L} r_{kl} = 1 \;\; (k = 1, \ldots, K),$$

which are the same as those in CDF and CDF-R. The objective function of K-CDF is somewhat different from that of CDF, which can be written as

$$J_{CDF} = \sum_{k=1}^{K}\sum_{i=1}^{N} u_{ki}^{m} \sum_{l=1}^{L_0} r_{kl}^{q}\, d_{kil}^{2} \;+\; \eta \sum_{k=1}^{K}\sum_{i=1}^{N} u_{ki}^{m} \Big( o_i - \sum_{t=1}^{T} w_{kt}\, y_{ti} \Big)^{2},$$

where $L_0$ is the feature dimension in the original feature space, $w_{kt}$ is the weight of the tth classifier in the kth cluster, $y_{ti}$ is the output value of the tth classifier for $\mathbf{x}_i$, and $o_i$ is the desired output for $\mathbf{x}_i$. The main difference lies in the second terms of the two objective functions. Different from CDF, in K-CDF there is no explicit mention of the algorithm outputs ($y_{ti}$) or target values ($o_i$): as K-CDF uses FSVM for fusion, the algorithm outputs and target values are given as inputs to the FSVMs. Instead of minimizing the difference between the fused outputs and the target values, therefore, K-CDF optimizes the margin of each cluster's FSVM (indirectly represented by $\mathbf{w}_k$).

Alternating optimization can be used to minimize the objective function, and the update equations can be written as follows.

$$u_{ki} = \frac{ (1/\tilde{D}_{ki})^{1/(m-1)} }{ \sum_{j=1}^{K} (1/\tilde{D}_{ji})^{1/(m-1)} }, \qquad v_{kl} = \frac{\sum_{i=1}^{N} u_{ki}^{m}\, x_{il}}{\sum_{i=1}^{N} u_{ki}^{m}}, \qquad r_{kl} = \frac{ (1/\tilde{D}_{kl})^{1/(q-1)} }{ \sum_{s=1}^{L} (1/\tilde{D}_{ks})^{1/(q-1)} },$$

where

$$\tilde{D}_{ki} = \sum_{l=1}^{L} r_{kl}^{q}\, d_{kil}^{2} + \eta\, C\, \xi_{ki} + \alpha, \qquad \tilde{D}_{kl} = \sum_{i=1}^{N} u_{ki}^{m}\, d_{kil}^{2} + \beta.$$

The detailed derivation can be found in Appendix C. The optimal values of the remaining parameters specific to the kth cluster, $\{\mathbf{w}_k, \xi_{ki} \mid i = 1, \ldots, N\}$, are obtained with the FSVM optimization procedure of Chapter 5, using the current memberships as the fuzzy weights. The updates above are essentially the same as those of CDF and CDF-R; the difference lies in the distance calculation. The distances look similar to those of CDF and CDF-R, but in K-CDF, $d_{kil}$ is calculated on the kernel-feature-space representation, not in the original feature space. Table 6-1 compares the update equations of CDF-R and K-CDF. (In Table 6-1, the update equation for the classifier weights in CDF-R is not shown; refer to Table 2-1 for the complete update equations of CDF and CDF-R.)

Table 6-1. Update equations of CDF-R and K-CDF (objective functions and updates for memberships, centers, and feature weights, side by side).

The last thing that should be mentioned is the mapping of the output values from FSVM. As an output value from FSVM is not bounded, it should be mapped into a specific range, typically [0, 1], to be interpreted as a probability. Although there have been several approaches for converting the output to a posterior probability [109], [110], a simple sigmoid function is used in K-CDF. The mapping function is defined as

$$o = \frac{1}{1 + e^{-\lambda y'}},$$

where $y' = \{y'_{ki} \mid 1 \le k \le K, 1 \le i \le N\}$ is the FSVM output and $\lambda$ is a constant; the value of $\lambda$ is decided experimentally. Algorithm 6-1 summarizes the training phase of the K-CDF algorithm: the memberships, centers, and feature weights are updated in turn, and the K FSVMs are retrained, until convergence.

In the test phase, when a data point x is given, it is first mapped into an L-dimensional feature vector x' using RKF-PCA, and the algorithm outputs $\mathbf{y} = [y_1, \ldots, y_T]$ are calculated. For a test point, as the target value is not known, the error $\xi_k$ cannot be calculated, so the training-time membership update cannot be evaluated; the membership of x is therefore calculated using the same update with the distance function

$$\tilde{D}_{k} = \sum_{l=1}^{L} r_{kl}^{q}\, |v_{kl} - x'_{l}|^{2}.$$

The K FSVMs take the algorithm outputs as inputs and generate SVM outputs, which are converted into cluster outputs using the sigmoid mapping. The final output is given as the membership-weighted sum of the K cluster outputs,

$$o(\mathbf{x}) = \sum_{k=1}^{K} u_{k}\, o_{k}(\mathbf{x}).$$

Figure 6-2 shows the procedure generating the final output from an input point x; the matrices V and R and the FSVM parameters in Figure 6-2 are those optimized by Algorithm 6-1.

Figure 6-2. Test phase of kernel-based context-dependent fusion.
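The test phase just described condenses into a few lines. The sketch below assumes each cluster's model exposes a `decision_function` (as scikit-learn SVMs do) and that the test memberships sum to one, as the constraint requires; the function and parameter names are ours.

```python
import numpy as np

def kcdf_predict(alg_outputs, memberships, fsvms, lam=1.0):
    """K-CDF test phase for one point: each cluster's FSVM scores the
    vector of classifier outputs, the score is squashed by the sigmoid
    with constant lam, and the cluster outputs are combined by the
    membership-weighted sum."""
    y = np.asarray(alg_outputs, dtype=float).reshape(1, -1)
    cluster_out = np.array(
        [1.0 / (1.0 + np.exp(-lam * f.decision_function(y)[0])) for f in fsvms]
    )
    u = np.asarray(memberships, dtype=float)
    return float((u * cluster_out).sum())
```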

The first experiment uses an artificial data set, shown in Figure 6-3. The data set consists of 400 data points from two classes and naturally forms two clusters; the first cluster is linearly separable, but the second one is not. The confidence value of the first classifier takes the first feature value after normalization; simple linear rescaling was performed cluster-wise so that the confidence values lie between zero and one. The confidence value of the second classifier was set using the second feature value in a similar way. Figure 6-4 shows the receiver operating characteristic (ROC) curves of the two classifiers and of the fused results using CDF with regularization (CDF-R) and K-CDF. As the two clusters are well separated, CDF-R and K-CDF clustered the data as shown in Figure 6-3, in the feature space and kernel feature space, respectively.

Figure 6-3. Artificial data set.

CDF-R, however, failed to classify the data correctly in cluster II, the cluster that is not linearly separable. K-CDF also made occasional errors due to the random data generation, but K-CDF was much better than CDF-R, sometimes achieving a 100% classification rate on the artificial data set, because SVM can effectively accommodate non-linearly separable classes.

There are several important parameters in K-CDF, including the kernel parameters for kernel PCA ($\sigma_{PCA}$) and fuzzy SVM ($\sigma_{FSVM}$) and the regularization parameter for fuzzy SVM (C). The initial value of $\sigma_{PCA}$ was set to the value generating the minimum classification error using the grid search method [107]; the classification error was calculated using the Fisher linear discriminant, as described in Chapter 3. The number of eigenvectors used in reconstruction was set to the minimum number with which at least 90% of the data variance is described at the given kernel parameter $\sigma_{PCA}$. The initial values of $\sigma_{FSVM}$ and C were likewise set to the values generating the minimum classification error with SVM. Starting from the initial values decided above, the values were adjusted experimentally to obtain the best ROC curve. Although several methods are available for comparing ROC curves [111]-[113], the probability of false alarm (PFA) at 85%, 90%, and 95% probability of detection (PD) was the primary quantity minimized in this chapter. The remaining parameters were set equal to those in CDF-R.

Figure 6-4. ROC curves from an artificial data set.

K-CDF was also applied to landmine detection problems. When K-CDF was applied to the data used in Chapter 2, it showed almost the same results as CDF-R, because all the classifier systems used for fusion were sophisticated enough to generate a linear decision boundary in each cluster; although K-CDF showed somewhat better results than CDF-R, the differences were not statistically significant. As demonstrated in the previous experiment, K-CDF is much better than CDF-R when a linear decision boundary does not work. By randomly sampling from dataset II in Chapter 2, we therefore made up another data set, which consisted of 292 samples, with 113 mine samples and 179 non-mine samples. Figure 6-5 shows the ROC curves and Table 6-2 summarizes the results.

Figure 6-5. ROC curves for a set of classifiers from the subset of dataset II.

Table 6-2. Experimental results from the subset of dataset II

  PD (%)   PFA (%)
           EHD     SPECT   WEMI    CDF-R   K-CDF
  85       34.64   34.64   30.17   12.85    7.82
  90       38.55   41.90   34.64   17.88   13.41
  95       58.10   50.84   41.34   27.93   23.46

As is clear from Figure 6-5, K-CDF showed better results than CDF-R, especially at 85%, 90%, and 95% PD. Although it looks as if K-CDF and CDF-R produced almost the same results, the differences between them are statistically significant. Table 6-3 shows the confidence intervals (CIs) at the 90% confidence level (CL), computed with the interval formula of Chapter 3. There is no overlap between the CIs of CDF-R and K-CDF, which means that the differences are statistically significant and K-CDF is better than CDF-R at each PD.

Table 6-3. Confidence interval of the number of false alarms with a 90% confidence level

  PD (%)   CDF-R              K-CDF
  85       [22.45, 23.55]     [13.56, 14.44]
  90       [31.37, 32.64]     [23.44, 24.56]
  95       [49.26, 50.73]     [41.30, 42.69]

Another data set was hyperspectral imagery collected from an airborne hyperspectral imager [114]. The data set consisted of 482 samples, with 167 mine samples and 315 non-mine samples, each of which had 70 spectral bands after trimming and binning. Three classifier systems, WDW (whiten and de-whiten) [115], SAM (spectral angle mapper) [116], and SCM (spectral correlation mapper) [117], were used to generate confidence values. Figure 6-6 shows the ROC curves and Table 6-4 summarizes the results.

Figure 6-6. ROC curves for a set of classifiers from hyperspectral data.

Table 6-4. Experimental results from the hyperspectral data set

  PD (%)   PFA (%)
           WDW     SAM     SCM       CDF-R   K-CDF
  85       56.19   37.46    29.21    24.76   15.87
  90       61.59   44.44    50.48    40.00   26.67
  95       73.97   53.97   100.00    64.13   60.63

As is clear from Figure 6-6, K-CDF is better than CDF-R and any single classifier at 85% and 90% PD. SAM is better than K-CDF at 95% PD, but K-CDF is still better than CDF-R there. Table 6-5 shows the CIs of the number of false alarms at a 90% CL, which also supports the superiority of K-CDF over CDF-R.

Table 6-5. Confidence interval of the number of false alarms with a 90% confidence level

  PD (%)   CDF-R                K-CDF
  85       [77.28, 78.70]       [49.39, 50.59]
  90       [125.19, 126.81]     [83.28, 84.74]
  95       [201.22, 202.80]     [190.18, 191.79]

From the above experiments using artificial and landmine data sets, K-CDF showed better results than CDF-R. The strength of K-CDF comes from its non-linear and robust characteristics: the non-linearity comes from the kernel methods adopted in K-CDF, and the robustness mainly originates from regularization and the fuzzy memberships. K-CDF requires considerable time to train, due to the repeated training of the FSVMs, which also makes it difficult to tune its parameter values; it is, however, a good choice for noisy real-world problems due to the strengths mentioned above.

In this chapter, the CDF framework of Chapter 2 was extended to address and solve its limitations. CDF has two limitations: the clusters are convex in the feature space, and the classes in each cluster are linearly separable. The extended framework, K-CDF, uses kernel PCA to remove the convex cluster assumption and FSVM to separate non-linearly separable classes. In addition, the regularization and the robust methods adopted for each component of K-CDF make it possible to achieve even better performance than previous methods.

Although K-CDF is an effective framework for fusion, it can be improved further. The first improvement concerns time complexity: because K-CDF uses SVM as its fusion method, it requires considerable time for optimization. A simpler non-linear classifier might significantly reduce the complexity without sacrificing the performance of K-CDF; this is left for further research. Another problem is the decision of parameter values. K-CDF requires two kernel parameters, for kernel PCA and fuzzy SVM, and another regularization parameter for fuzzy SVM. Although the parameter initialization method used in this chapter showed reasonable results, it is time-consuming and may not be optimal. It might be possible to decide the parameter values from the data, which is an open problem to be studied further.

Combining several classifiers has become an important task in recent years for overcoming the limitations of a single classifier. In this dissertation, a generalization of context-dependent fusion (CDF) using robust non-linear methods was investigated.

CDF generalizes previous fusion frameworks by dividing the feature space into a number of homogeneous regions and developing a different fusion process for each region. CDF, however, does not explicitly address the noise sensitivity problem, and it cannot accommodate non-linear cases. Regularization was adopted to enhance the noise robustness, which resulted in a robust variant of CDF, called context-dependent fusion with regularization (CDF-R). CDF-R was further extended using robust kernel methods, named kernel-based context-dependent fusion (K-CDF). K-CDF has three main components, dimension reduction, context extraction, and decision fusion, and robust kernel methods were developed for each component: robust kernel fuzzy principal component analysis (RKF-PCA), kernel-based global fuzzy c-means (KG-FCM), and the fuzzy support vector machine for noisy data (FSVM-N).

RKF-PCA was used as the dimension reduction method in K-CDF because it overcomes the noise sensitivity and linear restriction problems of PCA, through the introduction of fuzzy memberships and kernel methods, respectively; its iterative optimization enables RKF-PCA to generate more noise-resistant results than previous methods. Global fuzzy c-means was used as the context extraction method, due to the deterministic and noise-resistant characteristics of global clustering. Initialization is one of the major difficulties of clustering algorithms, and the global method can effectively mitigate the problem in both performance and speed. For the decision fusion, a variant of the support vector machine using fuzzy memberships was used; as the clustering algorithm is part of the framework, the cluster memberships it produces carry over naturally as the fuzzy weights in the fusion stage.

By combining the three fuzzy kernel methods, K-CDF can accommodate non-linear cases in a noise-resistant way, which is the main result of this dissertation. Another contribution is that the three robust kernel methods developed here can be used in other pattern recognition problems, as demonstrated in Chapters 3, 4, and 5.

The objective function for context-dependent fusion with regularization can be written as

$$J = \sum_{k=1}^{K}\sum_{i=1}^{N} u_{ki}^{m} \sum_{l=1}^{L} r_{kl}^{q}\, d_{kil}^{2} + \eta \sum_{k=1}^{K}\sum_{i=1}^{N} u_{ki}^{m} \Big( o_i - \sum_{t=1}^{T} w_{kt}\, y_{ti} \Big)^{2} + \alpha \sum_{i=1}^{N}\sum_{k=1}^{K} u_{ki}^{m} + \beta \sum_{k=1}^{K}\sum_{l=1}^{L} r_{kl}^{q} + \gamma \sum_{k=1}^{K}\sum_{t=1}^{T} w_{kt}^{2},$$

where N is the number of data points, K is the number of clusters, L is the dimension of the data, and T is the number of algorithms. The value $u_{ki}$ is the membership of $\mathbf{x}_i$ in the kth cluster, $r_{kl}$ is the weight of the lth feature in the kth cluster, and $w_{kt}$ is the weight of the tth algorithm in the kth cluster. The constants m and q represent the fuzzifier constants for membership and feature weight, respectively. For a data point $\mathbf{x}_i$, the output value $y_{ti}$ is generated by the tth algorithm, and the weighted sum of the algorithm outputs is compared to the desired output $o_i$. The value $d_{kil} = |v_{kl} - x_{il}|$ is a feature-wise distance which measures the difference between the lth feature of a data point $\mathbf{x}_i$ and the lth feature of the kth cluster center $\mathbf{v}_k$. The value $\eta$ is a constant balancing clustering against fusion, and the remaining three constants, $\alpha$, $\beta$, and $\gamma$, are regularization constants. The objective function should be minimized under the constraints on the membership U, the feature weight R, and the classifier weight W:

$$\sum_{k=1}^{K} u_{ki} = 1, \qquad \sum_{l=1}^{L} r_{kl} = 1, \qquad \sum_{t=1}^{T} w_{kt} = 1.$$

From the objective function and the constraints, the Lagrange equation can be written as

$$\mathcal{L} = J - \sum_{i=1}^{N} \lambda_{1,i}\Big( \sum_{k=1}^{K} u_{ki} - 1 \Big) - \sum_{k=1}^{K} \lambda_{2,k}\Big( \sum_{l=1}^{L} r_{kl} - 1 \Big) - \sum_{k=1}^{K} \lambda_{3,k}\Big( \sum_{t=1}^{T} w_{kt} - 1 \Big),$$

where $\lambda_{1,i}$, $\lambda_{2,k}$, and $\lambda_{3,k}$ are Lagrange multipliers.

Membership update. Collecting all the terms related to $u_{ki}$ from the Lagrange equation, taking the partial derivative with respect to $u_{ki}$, and equating it to zero gives

$$m\, u_{ki}^{m-1}\, \tilde{D}_{ki} - \lambda_{1,i} = 0,$$

where $\tilde{D}_{ki}$ is a new distance measure defined as

$$\tilde{D}_{ki} = \sum_{l=1}^{L} r_{kl}^{q}\, d_{kil}^{2} + \eta \Big( o_i - \sum_{t=1}^{T} w_{kt}\, y_{ti} \Big)^{2} + \alpha.$$

Solving for $u_{ki}$ and using the sum-to-one constraint to determine the multiplier yields the update equation

$$u_{ki} = \frac{ (1/\tilde{D}_{ki})^{1/(m-1)} }{ \sum_{j=1}^{K} (1/\tilde{D}_{ji})^{1/(m-1)} }.$$

Center update. Collecting the terms related to $v_{kl}$, taking the partial derivative with respect to $v_{kl}$, and equating it to zero gives, because the feature weight $r_{kl} > 0$ (i.e., all features are assumed to affect the distance calculation),

$$v_{kl} = \frac{\sum_{i=1}^{N} u_{ki}^{m}\, x_{il}}{\sum_{i=1}^{N} u_{ki}^{m}}.$$

Feature weight update. Collecting the terms related to $r_{kl}$, taking the partial derivative, and equating it to zero gives, with the new distance measure

$$\tilde{D}_{kl} = \sum_{i=1}^{N} u_{ki}^{m}\, d_{kil}^{2} + \beta,$$

the condition $q\, r_{kl}^{q-1}\, \tilde{D}_{kl} = \lambda_{2,k}$. Using the sum-to-one constraint to determine the multiplier, the update equation for the feature weight becomes

$$r_{kl} = \frac{ (1/\tilde{D}_{kl})^{1/(q-1)} }{ \sum_{s=1}^{L} (1/\tilde{D}_{ks})^{1/(q-1)} }.$$

Classifier weight update. Collecting the terms related to $w_{ka}$, taking the partial derivative with respect to $w_{ka}$, and equating it to zero gives

$$-2\eta \sum_{i=1}^{N} u_{ki}^{m} \Big( o_i - \sum_{t=1}^{T} w_{kt}\, y_{ti} \Big) y_{ai} + 2\gamma\, w_{ka} - \lambda_{3,k} = 0.$$

Separating the terms in $w_{ka}$, the update equation for the classifier weight can be written as

$$w_{ka} = \frac{ \eta \sum_{i=1}^{N} u_{ki}^{m} \Big( o_i - \sum_{t \ne a} w_{kt}\, y_{ti} \Big) y_{ai} + \lambda_{3,k}/2 }{ \eta \sum_{i=1}^{N} u_{ki}^{m}\, y_{ai}^{2} + \gamma },$$

and the multiplier $\lambda_{3,k}$ is determined from the constraint $\sum_{t=1}^{T} w_{kt} = 1$ by summing the expression above over $a = 1, \ldots, T$ and solving the resulting linear equation for $\lambda_{3,k}$.

The reconstruction error in the feature space can be defined as

$$e(\Phi(\mathbf{x})) = \big\lVert \big(\Phi(\mathbf{x}) - \boldsymbol{\mu}_{\Phi}\big) - WW^{T}\big(\Phi(\mathbf{x}) - \boldsymbol{\mu}_{\Phi}\big) \big\rVert^{2},$$

where W is an eigenvector matrix composed of k eigenvectors as columns,

$$W = [\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_k].$$

Each eigenvector can be represented as a weighted combination of the mapped data points $\Phi(\mathbf{x}_i)$ ($1 \le i \le N$) as

$$\mathbf{w}_j = \sum_{i=1}^{N} \alpha_{ji}\, \Phi(\mathbf{x}_i),$$

where N is the number of data points. The mean can also be represented as a combination of the mapped data points,

$$\boldsymbol{\mu}_{\Phi} = \frac{1}{N'} \sum_{i=1}^{N} u_i\, \Phi(\mathbf{x}_i),$$

where $N' = \sum_{i=1}^{N} u_i$. Expanding the squared norm produces three groups of terms. The first term, $\langle \Phi(\mathbf{x}) - \boldsymbol{\mu}_{\Phi},\, \Phi(\mathbf{x}) - \boldsymbol{\mu}_{\Phi} \rangle$, expands into kernel evaluations,

$$\kappa(\mathbf{x}, \mathbf{x}) - \frac{2}{N'}\sum_{i=1}^{N} u_i\, \kappa(\mathbf{x}, \mathbf{x}_i) + \frac{1}{N'^2}\sum_{i=1}^{N}\sum_{j=1}^{N} u_i\, u_j\, \kappa(\mathbf{x}_i, \mathbf{x}_j).$$

The cross term requires $W^{T}(\Phi(\mathbf{x}) - \boldsymbol{\mu}_{\Phi})$, whose jth component can be calculated as

$$\beta_j = \mathbf{w}_j \cdot \big(\Phi(\mathbf{x}) - \boldsymbol{\mu}_{\Phi}\big) = \sum_{i=1}^{N} \alpha_{ji} \Big( \kappa(\mathbf{x}_i, \mathbf{x}) - \frac{1}{N'}\sum_{l=1}^{N} u_l\, \kappa(\mathbf{x}_i, \mathbf{x}_l) \Big),$$

where $\beta_j$ collects the kernel sums for the jth eigenvector; the analogous quantity for the mean is denoted $\gamma_l$. The third term involves $WW^{T}(\Phi(\mathbf{x}) - \boldsymbol{\mu}_{\Phi})$ and expands, through the same substitutions, into sums of the $\beta_j$ (and $\gamma_l$) weighted by the expansion coefficients. Since the principal components are orthonormal, $W^{T}W = I$, the cross term and the third term combine, and the reconstruction error finally reduces to

$$e(\Phi(\mathbf{x})) = \big\langle \Phi(\mathbf{x}) - \boldsymbol{\mu}_{\Phi},\, \Phi(\mathbf{x}) - \boldsymbol{\mu}_{\Phi} \big\rangle - \sum_{j=1}^{k} \beta_j^{2},$$

an expression in kernel evaluations only.

PAGE 129

Theobjectivefunctionforkernel-basedcontext-dependentfusioncanbewrittenas 2jjwkjj2+CNXi=1umkiki!+NXi=1KXk=1umki+KXk=1LXl=1rqkl,(C) whereNisthenumberofdatapoints,Kisthenumberofclusters,andListhedimensionofdata.Thevalueukiisthemembershipofxitothekthclusterandrklistheweightofthelthfeatureinthekthcluster.Theconstantsmandqrepresentthefuzzierconstantsformembershipandfeatureweightrespectively.Thevaluedkil=jvklxiljisafeature-wisedistancewhichmeasuresthedifferencebetweenthelthfeatureofadatapointxiandthelthfeatureofthekthclustercentervk.ThesecondtermisasumofKobjectivefunctionsoffuzzysupportvectormachine(FSVM),eachofwhichcorrespondstoadifferentfusionprocessinaspeciccluster.AlltheFSVMssharetheregularizationconstantC.ThevaluewkrepresentsthedecisionsurfaceinthekthFSVMandkiisanerrorofxiinthekthFSVM.AstheparametersetforthekthFSVMfwk,kiji=1,,NgisindependentfromotherparametersandcanbeoptimizedusingtheoptimizationprocedureofFSVMwhenfukiji=1,,Ngisknown,theupdateequationsforU,V,andRarediscussedhere.Refertochapter 5 formoredetailedinformationaboutSVM.Thevalueisaconstantbalancingbetweenclusteringandfusionandtheremainingtwoconstantsandareregularizationconstants.TheobjectivefunctionshouldbeminimizedundertheconstraintsonmembershipUandfeatureweightR.Theconstraintscanberepresentedas 129

PAGE 130

C )andtheconstraints,theLagrangeequationcanbewrittenas 2jjwkjj2+CNXi=1umkiki!+NXi=1KXk=1umki+KXk=1LXl=1rqklNXi=11,iKXk=1uki1!KXk=12,kLXl=1rkl1!,(C) where1,iand2,kareLagrangemultipliers.BycollectingallthetermsrelatedtoukifromtheLagrangeequation,onecanobtain 2jjwkjj2+CNXi=1umkiki!+NXi=1KXk=1umkiNXi=11,iKXk=1uki1!.(C) Bytakingapartialderivativewithrespecttoukiandequatingtozero,onecanobtain LeteDkibeanewdistancemeasuredenedas thenEquation( C )canbere-writtenas 130


The membership should satisfy the sum-to-one constraint. Using the constraint, the Lagrange multiplier $\lambda'_{1,i} = \lambda_{1,i}/m$ can be calculated as
\[
\lambda'_{1,i} = \left[\,\sum_{k=1}^{K}\left(\frac{1}{\tilde{D}_{ki}}\right)^{1/(m-1)}\right]^{-(m-1)},
\]
and, combining the two previous equations, the update equation for the membership $u_{ki}$ can be written as
\[
u_{ki} = \frac{\left(1/\tilde{D}_{ki}\right)^{1/(m-1)}}{\sum_{k'=1}^{K}\left(1/\tilde{D}_{k'i}\right)^{1/(m-1)}}.
\]
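A minimal NumPy sketch of this membership update, assuming the augmented distances $\tilde{D}_{ki}$ have already been computed, might look as follows; the function name is an illustrative assumption.

\begin{verbatim}
import numpy as np

def update_memberships(D_tilde, m=2.0):
    """FCM-style update: u_ki proportional to (1/D~_ki)^(1/(m-1)).

    D_tilde: (K, N) matrix of the augmented distances D~_ki.
    Returns U with each column summing to one.
    """
    inv = (1.0 / D_tilde) ** (1.0 / (m - 1.0))
    return inv / inv.sum(axis=0, keepdims=True)
\end{verbatim}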


By collecting all the terms related to $v_{kl}$ from the Lagrange equation, one can obtain
\[
J_{v} = \sum_{i=1}^{N}\sum_{k=1}^{K} u_{ki}^{m}\sum_{l=1}^{L} r_{kl}^{q}\,(v_{kl}-x_{il})^{2}.
\]
By taking a partial derivative with respect to $v_{kl}$ and equating it to zero, one can obtain
\[
2\, r_{kl}^{q}\sum_{i=1}^{N} u_{ki}^{m}\,(v_{kl}-x_{il}) = 0.
\]
Because the feature weight $r_{kl} > 0$, i.e., all the features are assumed to affect the distance calculation, this reduces to $\sum_{i=1}^{N} u_{ki}^{m}(v_{kl}-x_{il}) = 0$, and the update equation for the cluster center becomes
\[
v_{kl} = \frac{\sum_{i=1}^{N} u_{ki}^{m}\, x_{il}}{\sum_{i=1}^{N} u_{ki}^{m}}.
\]
By collecting all the terms related to $r_{kl}$ from the Lagrange equation, one can obtain
\[
J_{r} = \sum_{i=1}^{N}\sum_{k=1}^{K} u_{ki}^{m}\sum_{l=1}^{L} r_{kl}^{q}\, d_{kil}^{2}
      + \delta\sum_{k=1}^{K}\sum_{l=1}^{L} r_{kl}^{q}
      - \sum_{k=1}^{K}\lambda_{2,k}\left(\sum_{l=1}^{L} r_{kl}-1\right).
\]
By taking a partial derivative with respect to $r_{kl}$ and equating it to zero, one can obtain
\[
q\, r_{kl}^{q-1}\left(\sum_{i=1}^{N} u_{ki}^{m}\, d_{kil}^{2} + \delta\right) - \lambda_{2,k} = 0.
\]
Let $\tilde{D}_{kl}$ be defined as
\[
\tilde{D}_{kl} = \sum_{i=1}^{N} u_{ki}^{m}\, d_{kil}^{2} + \delta,
\]
then, with $\lambda'_{2,k} = \lambda_{2,k}/q$, the previous equation can be rewritten as $r_{kl}^{q-1}\tilde{D}_{kl} = \lambda'_{2,k}$, and the feature weight $r_{kl}$ can be calculated as
\[
r_{kl} = \left(\frac{\lambda'_{2,k}}{\tilde{D}_{kl}}\right)^{1/(q-1)}.
\]


The feature weight should satisfy the sum-to-one constraint. Using the constraint, the Lagrange multiplier $\lambda'_{2,k}$ can be calculated as
\[
\lambda'_{2,k} = \left[\,\sum_{l=1}^{L}\left(\frac{1}{\tilde{D}_{kl}}\right)^{1/(q-1)}\right]^{-(q-1)},
\]
and, combining the two previous equations, the update equation for the feature weight $r_{kl}$ can be written as
\[
r_{kl} = \frac{\left(1/\tilde{D}_{kl}\right)^{1/(q-1)}}{\sum_{l'=1}^{L}\left(1/\tilde{D}_{kl'}\right)^{1/(q-1)}}.
\]
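Putting the three update equations together, one alternating iteration of the $V$, $R$, and $U$ updates could be sketched as follows; the FSVM step that produces the slacks $\xi_{ki}$ is held fixed here, as it is treated in Chapter 5. All names and default constants are illustrative assumptions.

\begin{verbatim}
import numpy as np

def kcdf_iteration(X, U, Xi, m=2.0, q=2.0,
                   alpha=1.0, C=1.0, eta=0.1, delta=0.1):
    """One alternating update of (V, R, U) with the slacks Xi fixed.

    X: (N, L) data, U: (K, N) memberships, Xi: (K, N) FSVM slacks.
    """
    Um = U ** m
    # cluster centers: v_kl = sum_i u^m x_il / sum_i u^m
    V = (Um @ X) / Um.sum(axis=1, keepdims=True)

    D2 = (V[:, None, :] - X[None, :, :]) ** 2        # d_kil^2, (K, N, L)

    # feature weights: r_kl ~ (1/D~_kl)^(1/(q-1)),
    # D~_kl = sum_i u^m d^2 + delta
    Dkl = (Um[:, :, None] * D2).sum(axis=1) + delta  # (K, L)
    inv_r = (1.0 / Dkl) ** (1.0 / (q - 1.0))
    R = inv_r / inv_r.sum(axis=1, keepdims=True)

    # memberships: u_ki ~ (1/D~_ki)^(1/(m-1)),
    # D~_ki = sum_l r^q d^2 + alpha*C*xi_ki + eta
    Dki = ((R ** q)[:, None, :] * D2).sum(axis=2) + alpha * C * Xi + eta
    inv_u = (1.0 / Dki) ** (1.0 / (m - 1.0))
    U = inv_u / inv_u.sum(axis=0, keepdims=True)
    return U, V, R
\end{verbatim}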




Gyeongyong Heo received his Bachelor of Science and Master of Engineering degrees in electronic engineering from Yonsei University in Korea in 1994 and 1996, respectively. Since 2004, he has continued his studies in the Department of Computer and Information Science and Engineering at the University of Florida. His research interests include landmine detection, kernel methods, image analysis, and pattern recognition.