<%BANNER%>

Automatic Feature Learning and Parameter Estimation for Hidden Markov Models Using MCE and Gibbs Sampling

Permanent Link: http://ufdc.ufl.edu/UFE0041157/00001

Material Information

Title: Automatic Feature Learning and Parameter Estimation for Hidden Markov Models Using MCE and Gibbs Sampling
Physical Description: 1 online resource (104 p.)
Language: english
Creator: Zhang, Xuping
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2009

Subjects

Subjects / Keywords: bayesian, classfication, convolution, feature, gaussian, gibbs, hmm, learning, mcmc, morphological, owa, sampling
Computer and Information Science and Engineering -- Dissertations, Academic -- UF
Genre: Computer Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, terriorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: Hidden Markov models (HMMs) are useful tools for landmine detection using Ground Penetrating Radar (GPR), as well as many other applications. The performance of HMMs and other feature-based methods depends not only on the design of the classifier but also on the features. Few studies have investigated both classifiers and feature sets as a whole system. Features that accurately and succinctly represent discriminating information in an image or signal are very important to any classifier. In addition, when the system that generated those original images has to change to fit different environments, the features usually have to be modified. The process of modification can be laborious and may require a great deal of application domain specific knowledge. Hence, it is worthwhile to investigate methods of automatic feature learning for purposes of automating algorithm development. Even if the discrimination capability is unchanged, there is still value to feature learning in terms of time saved. In this dissertation, two new methods are explored to simultaneously learn parameters intended to extract features and learn parameters for image-based classifiers. The notion of an image is general here. For example, a sequence of time or frequency domain features. We have developed a generalized, parameterized model of feature extraction based on morphological operations. More specifically, the model includes hit-and-miss masks to extract the shape of interests in the images. In one method, we use the minimum classification error (MCE) method with generalized probabilistic descent algorithm to learn the parameters. Since our model is based on gradient decent methods, the MCE method cannot guarantee a global optimal solution and is very sensitive to initialization. We propose a new learning method based on Gibbs sampling to learn the parameters. The new learning method samples parameters from their individual conditional probability distribution instead to maximize the probability directly. This new method is more robust to initialization, and can generally find a better solution. We also developed a new learning method based on Gibbs sampling to learn parameters for continuous hidden Markov models with multivariate Gaussian mixtures. Because hidden Markov models with multivariate Gaussian mixtures are commonly used HMM models in applications, we propose a learning method based on Gibbs sampling. The proposed method is empirically shown to be more robust than comparable expectation-maximization algorithms. We performed experiments using both synthetic and real data. The results show that both methods work better than the standard HMM methods used in landmine detection applications. Experiments with handwritten digits are also presented. The results show that the HMM-model framework with the automatic learning feature algorithm again performed better than the same framework with the man-made feature.
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by Xuping Zhang.
Thesis: Thesis (Ph.D.)--University of Florida, 2009.
Local: Adviser: Gader, Paul D.
Electronic Access: RESTRICTED TO UF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE UNTIL 2010-12-31

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2009
System ID: UFE0041157:00001

Permanent Link: http://ufdc.ufl.edu/UFE0041157/00001

Material Information

Title: Automatic Feature Learning and Parameter Estimation for Hidden Markov Models Using MCE and Gibbs Sampling
Physical Description: 1 online resource (104 p.)
Language: english
Creator: Zhang, Xuping
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2009

Subjects

Subjects / Keywords: bayesian, classfication, convolution, feature, gaussian, gibbs, hmm, learning, mcmc, morphological, owa, sampling
Computer and Information Science and Engineering -- Dissertations, Academic -- UF
Genre: Computer Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, terriorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: Hidden Markov models (HMMs) are useful tools for landmine detection using Ground Penetrating Radar (GPR), as well as many other applications. The performance of HMMs and other feature-based methods depends not only on the design of the classifier but also on the features. Few studies have investigated both classifiers and feature sets as a whole system. Features that accurately and succinctly represent discriminating information in an image or signal are very important to any classifier. In addition, when the system that generated those original images has to change to fit different environments, the features usually have to be modified. The process of modification can be laborious and may require a great deal of application domain specific knowledge. Hence, it is worthwhile to investigate methods of automatic feature learning for purposes of automating algorithm development. Even if the discrimination capability is unchanged, there is still value to feature learning in terms of time saved. In this dissertation, two new methods are explored to simultaneously learn parameters intended to extract features and learn parameters for image-based classifiers. The notion of an image is general here. For example, a sequence of time or frequency domain features. We have developed a generalized, parameterized model of feature extraction based on morphological operations. More specifically, the model includes hit-and-miss masks to extract the shape of interests in the images. In one method, we use the minimum classification error (MCE) method with generalized probabilistic descent algorithm to learn the parameters. Since our model is based on gradient decent methods, the MCE method cannot guarantee a global optimal solution and is very sensitive to initialization. We propose a new learning method based on Gibbs sampling to learn the parameters. The new learning method samples parameters from their individual conditional probability distribution instead to maximize the probability directly. This new method is more robust to initialization, and can generally find a better solution. We also developed a new learning method based on Gibbs sampling to learn parameters for continuous hidden Markov models with multivariate Gaussian mixtures. Because hidden Markov models with multivariate Gaussian mixtures are commonly used HMM models in applications, we propose a learning method based on Gibbs sampling. The proposed method is empirically shown to be more robust than comparable expectation-maximization algorithms. We performed experiments using both synthetic and real data. The results show that both methods work better than the standard HMM methods used in landmine detection applications. Experiments with handwritten digits are also presented. The results show that the HMM-model framework with the automatic learning feature algorithm again performed better than the same framework with the man-made feature.
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by Xuping Zhang.
Thesis: Thesis (Ph.D.)--University of Florida, 2009.
Local: Adviser: Gader, Paul D.
Electronic Access: RESTRICTED TO UF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE UNTIL 2010-12-31

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2009
System ID: UFE0041157:00001


This item has the following downloads:


Full Text

PAGE 2

2

PAGE 3

tomyBrotherbecausehewastherewhenIneededhelp, tomyprofessorsbecausetheypointedmeintherightdirectionwhenIwaslost, thankyou 3

PAGE 4

IwouldliketothankDr.PaulGader,Dr.JoeWilson,Dr.GerhardRitter,Dr.DavidWilson,andDr.TamerKahvecifortheirpatienceandunderstanding.Iwouldalsoliketothankmyco-workersatthelab,RaaziaMazhar,AlinaZare,JeremyBolton,SenihaEsenYuksel,GyeongyongHeo,AndresMendez-Vazquez,ArthurBarnes,RyanClose,RyanBusser,KennethWatford,Hyo-JinSuh,Wen-HsiungLee,JohnMcElroy,TaylorGlenn,SeanMatthews,andGanesanRamachandran,fortheirhelpandunderstanding. 4

PAGE 5

page ACKNOWLEDGMENTS .................................. 4 LISTOFTABLES ...................................... 7 LISTOFFIGURES ..................................... 8 LISTOFSYMBOLS .................................... 10 ABSTRACT ......................................... 11 CHAPTER 1INTRODUCTION ................................... 13 1.1StatementofProblem ............................. 13 1.2ClassicationModel .............................. 13 1.3OverviewofResearch ............................. 13 2LITERATUREREVIEW ............................... 16 2.1FeatureLearning ................................ 16 2.1.1FeatureSelection ............................ 16 2.1.2FeatureWeighting ........................... 19 2.1.3SparsityPromotion ........................... 20 2.1.4InformationTheoreticLearning .................... 22 2.1.5TransformationMethods ........................ 24 2.1.6ConvolutionalNeuralNetworkandShared-WeightNeuralNetwork 29 2.1.7MorphologicalTransform ........................ 30 2.1.8BayesianNonparametricLatentFeatureModel ........... 31 2.2HiddenMarkovModel(HMM) ......................... 32 2.2.1DenitionandBasicConcepts ..................... 32 2.2.2ApplicationsoftheHMM ........................ 33 2.2.3LearningHMMParameters ...................... 33 2.2.4MinimumClassicationError(MCE)forHMM ............ 34 2.3GibbsSampling ................................. 36 3CONCEPTUALAPPROACH ............................ 39 3.1Overview .................................... 39 3.2FeatureRepresentation ............................ 40 3.2.1OrderedWeightedAverage(OWA)-basedGeneralizedMorphologicalFeatureModel ............................. 40 3.2.2ConvolutionalFeatureModels ..................... 42 3.2.2.1FeaturemodelforlooselycoupledsamplingfeaturelearningHMM(LSampFealHMM) .................. 42 5

PAGE 6

.................. 43 3.3FeatureInitialization .............................. 45 3.4LearningMethods ............................... 46 3.4.1MCE-HMMModelforFeatureLearning ................ 46 3.4.2GibbsSamplingMethodforContinuousHMMwithMultivariateGaussianMixtures ........................... 48 3.4.3LooselyCoupledGibbsSamplingModelforFeatureLearning ... 52 3.4.3.1GibbssamplerforLSampFeaLHMM ............ 52 3.4.3.2InitializationandmodiedViterbilearning ......... 57 3.4.4TightlyCoupledGibbsSamplingModelforFeatureLearning .... 59 4EMPIRICALANALYSIS ............................... 68 4.1DataSets .................................... 68 4.2ExperimentsandResults ........................... 70 4.2.1SynData1 ................................ 71 4.2.2GPRArid ................................. 72 4.2.3GPRTwoSite ............................... 73 4.2.4SynData2 ................................ 73 4.2.5GPRTemp ................................ 75 4.2.6HandwrittenDigits ........................... 76 5CONCLUSIONSANDFUTUREWORK ...................... 93 REFERENCES ....................................... 95 BIOGRAPHICALSKETCH ................................ 104 6

PAGE 7

Table page 4-1AlgorithmsandDatasets ............................... 78 4-2Confusionmatrixfordigitpair0and1forTSampFeaLHMM ........... 78 4-3Confusionmatrixfordigits0,1,2,4forTSampFeaLHMM ............ 78 4-4Confusionmatrixfordigits0,1,2,4forHMMwithhuman-mademasks .... 78 7

PAGE 8

Figure page 1-1Generalclassicationmodelwithdiagram. ..................... 15 2-1ThedashedlineisthePCAprojection,buttheverticaldottedlinerepresentsthebestprojectiontoseparatetwoclusters .................... 38 2-2Thetopplothasclosetozerotomaximizethevariationofprojections(horizontalaxis)ofallobservations,andthebottomplothasclosetoonetominimizethevariationoftheprojections(verticalaxis)oftheobservationsinsamecluster. 38 3-1Featureextractionprocessforfeaturelearning .................. 63 3-2MCE-basedtrainingprocessforfeaturelearning ................. 64 3-3GibbssamplingHMMtrainingprocess ....................... 64 3-4FeaturemodelforLSampFealHMM ......................... 65 3-5LSampFealHMMTrainingAlgorithm ........................ 65 3-6InitialfeaturemodelforTSampFealHMM ...................... 66 3-7FinalfeaturemodelforTSampFealHMM ...................... 66 3-8TSampFealHMMTrainingAlgorithm ........................ 67 4-1TensamplesfromeachclassofdatasetSynData1. ................ 79 4-2SamplesfromeachclassofdatasetGPRArid. .................. 80 4-3Hit-misspairsforinitialmasks. ........................... 80 4-4InitialOWAweightsforhitandmiss.Thetopweightscorrespondtothehitmaskandthebottomweightscorrespondtothemissmask. ........... 81 4-5Hit-missmasksafterfeaturelearningcorrespondingtoinitialmasksinFigure 4-3 .......................................... 81 4-6OWAweightsafterfeaturelearning. ........................ 81 4-7Finalmaskslearnedforthelandminedata.Eachrowrepresentsdifferentfeature.Row1positive,Row2positive,Row3negative,Row4negative. ........ 82 4-8ReceiveroperatingcharacteristiccurvescomparingMcFeaLHMMwithtwodifferentinitializationstothestandardHMM. .................... 82 4-9Left:ascendingedge,agedge,anddescendingedgesequences.Right:sequencesfromnoisebackground. ............................... 83 8

PAGE 9

....................... 83 4-11Resultfor135-degreestateafterGibbsfeaturelearningwithhit-missmasks. 84 4-12Hit-missmasksafterGibbsfeaturelearning. .................... 84 4-13ResultwithshiftedtrainingimagesafterGibbsfeaturelearningwithhit-missmasks. ........................................ 85 4-14Hit-missmasksafterGibbsfeaturelearningwithshiftedtrainingimages. .... 85 4-1525samplesequencesextractedfrommineimagesfromdatasetGPRTemp. .. 86 4-16HitmasksafterGibbsfeaturelearningwithfour-stateHMMsetting. ....... 86 4-17HitmasksafterGibbsfeaturelearningwiththree-stateHMMsetting. ...... 87 4-18ReceiveroperatingcharacteristiccurvescomparingLSampFeaLHMMandSampHMMalgorithmswiththestandardHMMalgorithm. ............ 87 4-1918samplesforeachdigitfromMNIST. ....................... 88 4-20Twosamplesforeachdigittoshowzonesplitting. ................ 89 4-21HitmasksandtransitionmatrixafterGibbsfeaturelearningfordigits. ...... 89 4-22Tenhuman-mademasks. .............................. 89 4-23HMMSampvsDTXTHMMonGPRA1atsiteS1whileclutter==mine. ..... 90 4-24HMMSampvsDTXTHMMonGPRA1atsiteS1whileclutter=mine. .... 90 4-25HMMSampvsDTXTHMMonGPRA2atsiteS1whileclutter=mine. .... 91 4-26HMMSampvsDTXTHMMonGPRA2atsiteS2whileclutter=mine. .... 91 4-27HMMSampvsDTXTHMMonGPRA2atsiteS1whileclutter==mine. ..... 92 4-28HMMSampvsDTXTHMMonGPRA2atsiteS2whileclutter==mine. ..... 92 9

PAGE 10

10

PAGE 11

HiddenMarkovmodels(HMMs)areusefultoolsforlandminedetectionusingGroundPenetratingRadar(GPR),aswellasmanyotherapplications.TheperformanceofHMMsandotherfeature-basedmethodsdependsnotonlyonthedesignoftheclassierbutalsoonthefeatures.Fewstudieshaveinvestigatedbothclassiersandfeaturesetsasawholesystem.Featuresthataccuratelyandsuccinctlyrepresentdiscriminatinginformationinanimageorsignalareveryimportanttoanyclassier.Inaddition,whenthesystemthatgeneratedthoseoriginalimageshastochangetotdifferentenvironments,thefeaturesusuallyhavetobemodied.Theprocessofmodicationcanbelaboriousandmayrequireagreatdealofapplicationdomainspecicknowledge.Hence,itisworthwhiletoinvestigatemethodsofautomaticfeaturelearningforpurposesofautomatingalgorithmdevelopment.Evenifthediscriminationcapabilityisunchanged,thereisstillvaluetofeaturelearningintermsoftimesaved. Inthisdissertation,twonewmethodsareexploredtosimultaneouslylearnparametersintendedtoextractfeaturesandlearnparametersforimage-basedclassiers.Thenotionofanimageisgeneralhere.Forexample,asequenceoftimeorfrequencydomainfeatures.Wehavedevelopedageneralized,parameterizedmodeloffeatureextractionbasedonmorphologicaloperations.Morespecically,themodelincludeshit-and-missmaskstoextracttheshapeofinterestsintheimages.Inonemethod,weusetheminimumclassicationerror(MCE)methodwithgeneralized 11

PAGE 12

WealsodevelopedanewlearningmethodbasedonGibbssamplingtolearnparametersforcontinuoushiddenMarkovmodelswithmultivariateGaussianmixtures.BecausehiddenMarkovmodelswithmultivariateGaussianmixturesarecommonlyusedHMMmodelsinapplications,weproposealearningmethodbasedonGibbssampling.Theproposedmethodisempiricallyshowntobemorerobustthancomparableexpectation-maximizationalgorithms. Weperformedexperimentsusingbothsyntheticandrealdata.TheresultsshowthatbothmethodsworkbetterthanthestandardHMMmethodsusedinlandminedetectionapplications.Experimentswithhandwrittendigitsarealsopresented.TheresultsshowthattheHMM-modelframeworkwiththeautomaticlearningfeaturealgorithmagainperformedbetterthanthesameframeworkwiththeman-madefeature. 12

PAGE 13

1-1 )isdifculttoinvestigateinitsfullgenerality.Hence,thesub-probleminvolvingahiddenMarkovmodel(HMM)withmorphologicalfeaturesisconsideredsincethismethodologytypeisbasedonad-hocmethodsthathaveshownpromiseinthepast( Gaderetal. ; Zhaoetal. 2003 ). Thefeaturelearningapproachinvolvesparameterizingthefeatureextractionprocess.Inlandminedetection,theexistingfeatureextractionmethodologyfortheHMMsproceedsbyperformingmorphologicaloperationsonsmallwindowsofpositive 13

PAGE 14

Gaderetal. ).Inaddition,linearconvolution-basedfeatureswillalsobeconsidered. Inoneproposedalgorithm,wedeneaminimumclassicationerror(MCE)objectivefunction.ThefunctiondependsnotonlyontheHMMclassicationparameters,butalsoonthefeatureextractionparameters.Optimizingthisobjectivefunctionoverbothparametersetssimultaneouslyyieldsbothfeatureextractionandclassicationparameters.Thefeaturemodelgeneralizesthemorphologicalmodelusingorderedweightedaverage(OWA)operations.OWAoperatorsareusedtoparameterizeboththemorphologicaloperationsandtheaggregationoperationsastheyformafamilyofoperatorsthatcanrepresentmaximum,minimum,andmanyotheroperators. Foranotherproposedalgorithm,weapplyaBayesianframeworktothefeaturelearningmodels.Ratherthandeninganobjectivefunctionandmaximizingthatfunction,wetrytondthefullprobabilitiesdistributionofparametersanddata.Bysamplingtheparametersfromtheindividualconditionalprobabilitydistributions,wecanobtainbettersolutions.WeusetheGibbssamplerasthetooltosimulatetheprobabilitydistribution.GibbssamplingisacommonMarkovChainMonteCarlo(MCMC)samplingmethod.Itisastraightforward,powerfulsamplingmethod(Section 2.3 ). Next,anewlearningmethodforcontinuousHMMwithmultivariateGaussianmixturesisproposed.HMMswithmultivariateGaussianmixturesarewidelyusedmodelsinmanyapplications( Rabiner 1989 ; Zhaoetal. 2003 ).MCMCsamplingmethodshavetheadvantageofgenerallyndingbetteroptimathantraditionalmethods,suchasexpectationmaximizationalgorithm( DunmurandTitterington 1997 ; RydenandTitterington 1998 ).AlthoughtherearesomelearningmethodsproposedforHMMbasedonMCMCsampling,thesemethodsareeitherforadiscreteHMMproblem,or 14

PAGE 15

Zenetal. 2006 ),non-parametricHMMmodels( Thrunetal. 1999 ),ornonstationaryHMMmodels( Zhuetal. 2007 ).WeproposeanewlearningmethodbasedonGibbssamplingthatfocusesonthisspecicHMMproblem. Therestofthedissertationisorganizedasfollows.InChapter2,wereviewtheliteratureofvariousfeaturelearningmethods,HMMalgorithms,andGibbssampling.InChapter3,wepresentthethreenewlearningalgorithms.InChapter4,weshowtheresultsofapplyingthesealgorithms.ConclusionsandfutureworkareinChapter5. Figure1-1. Generalclassicationmodelwithdiagram. 15

PAGE 16

Belongieetal. 1998 ; GaderandKhabou 1996 ; GuyonandElisseeff 2003 ; TamburinoandRizki 1989a ; YuandBhanu 2006 ).Goodfeaturesresultinbetterclassicationaccuracy.Theclassierswithbetterfeaturesarefasttocompute,andaremorecosteffective.Inaddition,betterfeatureshelphumansunderstandtheunderlyingprocessofdatageneration.Forinstance,identifyingthegenesresponsibleforcertaindiseasescanhelphumanstobetterunderstandthecauseofsomecancers( Leeetal. 2003 ).Wewillgiveanoverviewofthemostcommonlyusedalgorithmsinfeatureextractioninthissection.Featurelearningcaninvolveselectingasubsetoffeaturesfromalargesetofcandidates,learningorestimatingparametersofparameterizedoperations,orselectingoperatorsfromasetofcandidatesandlearningparameters.Therstisreferredtoasfeatureselection.Thefocusofthisresearchisthesecondcategory,learningorestimatingparametersofparameterizedoperations.Weprovidereviewsofallthreetypes. 16

PAGE 17

BlumandLangley 1997 ; GuyonandElisseeff 2003 ; Guyonetal. 2004 ; LiuandMotoda 2007 ):thelter,thewrapper,andtheembeddedmethod. Theltermethodisindependentoflaterprocessing.Thesimplestapproachistorankthefeaturesviasomecriteria.Therankcriteriacanbethecorrelationbetweenthevariablesandthelabels( GuyonandElisseeff 2003 ),orthemutualinformationbetweenvariableandthelabels( GlobersonandTishby 2003 ).Thehighrankvariablesareselectedasthenaldatarepresentatives.Thealgorithmisfastandeasytoimplementandhasbeensuccessfullyusedinmanycases( RogatiandYang 2002 ; Stoppigliaetal. 2003 ).However,somehighrankfeaturescanberedundantbecausetheycanbehighlycorrelated.Somelowrankfeaturesmayhelptoimprovetheperformanceofclassierswhentheyareincluded( Stoppigliaetal. 2003 ). Insteadofrankingvariablesindividually,anotherclassofltermethodsinvolvesrankingsubsetsofallthevariables.Somerankcriteriaarebasedonmutualinformation,suchastheminimal-redundancy-maximal-relevance(mRMR)criterionmethod( Pengetal. 2005 ).Onevariableischosenineachsteptoincreasethesizeofthefeaturesubset,thenthemutualinformationD(S;c)ofthelabelcandthesubsetSwiththenextnewvariableiscomputedasD=1 Anotherset-basedltermethod,basedonforwardorthogonalsearch,hasbeenproposed( WeiandBillings 2007 ).Anincrementalsearchisconductedusingthesquared-correlationbetweenthetwovariablesastherankingcriterion.SupposewehaveadatasetofNdatasamples,andeachsamplehasnfeaturecandidates.The 17

PAGE 18

Sincealtermethodisindependentoflaterprocessing,itmaynotimprovethenalperformance.Ontheotherhand,wrappermethodsincorporatelaterprocessingdirectly.Wrappermethodsselectsubsetsofvariablesbasedontheireffectonperformanceoflaterprocessing( GuyonandElisseeff 2003 ). Sinceanyclassierorotherlearningalgorithmcanbeusedinlaterprocessing,thewrappermethodispowerfulwhenappliedtotheselectionoffeatures.However,altermethodisusuallyfasterthanawrappermethod,becausewrappermethodsneedtoperformlearningalgorithmsforeveryfeaturesubsetcandidate.Therefore,anefcientsearchstrategyisneeded.Greedysearchstrategiesarecommonlyusedinwrappermethods. Forsupervisedlearning,theclasslabelisgiven.Itisnaturaltousetheclassicationperformancetoevaluatetherelevanceoffeaturesets( Najjaretal. 2003 ).Unsupervisedlearning,whichusuallyinvolvesclustering,isnotasstraightforward.Insteadofclassicationerrors,othercriteriaareused,suchasmaximumlikelihood(ML),scatterseparability( DyandBrodley 2004 ),oradiscriminantcriterion( RothandLange 2004 ).MLcriterionmaximizesthelikelihoodofthedatagiventhemodel(featuresetand 18

PAGE 19

Westonetal. 2001 ). Littlestone 1987 )andtheRELIEFalgorithm( KiraandRendell 1992 ). TheWinnowalgorithmwasdevelopedtoupdatetheweightsinamultiplicativemanner.TheideaoftheWinnowalgorithmistoupdatetheweightsbypresentingthepositiveandnegativeexamplesiteratively.Givenanexampledenotedby(x1;:::;xn)andtheweightsdenotedby(w1;:::;wn),wherenisthenumberoffeatures,thealgorithmpredicts1ifw1x1+:::+wnxn>threshold,otherwiseitpredicts0.Then,ineachiteration,theweightsareupdatedifthepredictionofthealgorithmisincorrect.Ifthealgorithmpredictsanegativevalueforthepositiveexample,thevalueofwiisincreasedbythescaleofapromotionparameterforeachxiequalto1.Ifthealgorithmpredictsapositivevalueforanegativeexample,thevalueofwiisdecreasedbythescaleofademotionparameterforeachxiequalto1.Thepromotionanddemotionparametersaresetbyexperiments.TheWinnowalgorithmisnotdifculttoimplement,anditscaleswellto 19

PAGE 20

GoldingandRoth 1999 ). TheRELIEFalgorithm( Dietterich 1997 )estimatesfeatureweightsiterativelyaccordingtotheirabilitytodiscriminatebetweenneighboringpatterns.Ineachiteration,apatternxisrandomlyselected.Thenthetwonearestneighborsofxarefound.Oneisfromthesameclassasx(termedthenearesthit,denotedbyNH(x))andtheotherisfromadifferentclass(termedthenearestmiss,denotedbyNM(x)).Theweightofthei-thfeatureisthenupdatedas:wnewi=woldi+jx(i)NM(i)(x)jjx(i)NH(i)(x)j.Itisproven( Sun 2007 )thatRELIEFisanonlinealgorithmthatsolvesaconvexoptimizationproblemwitha1-NearestNeighborhoodobjectivefunction.Therefore,theRELIEFalgorithmperformsbetterasanonlinearclassiertosearchforinformativefeaturescomparedwithltermethods.Inaddition,itcanbeimplementedveryefciently,asnoexhaustivesearchisapplied.Thismakesitsuitableforlarge-scaleproblems.However,itcalculatesnearestneighborsintheoriginalfeaturespaceratherthaninweightedfeaturespace,whichhurtsitsperformance.Moreover,itisnotrobustwithrespecttooutliers.TheIRELIEFalgorithm( Sun 2007 )wasproposedtoimprovetheRELIEFalgorithm.ItcalculatestheprobabilitiesofdatapointsinNM(x)andNH(x),andrepresentstheprobabilitythatadatapointisanoutlierasabinaryrandomhiddenvariable.Itupdatestheweightsfollowingtheprincipleoftheexpectation-maximization(EM)algorithm.TheresultshowsthatIRELIEFimprovestheRELIEFalgorithmbecauseitisrobustagainstmislabelingnoise,andisabletondusefulfeatures.BecauseitfollowstheEMalgorithm,theproperchoiceoftuningparametersisimportanttoachievegoodperformance. 20

PAGE 21

Penalization-basedorregularizationmethodsarecommonsparsitypromotiontechniques.Thesemethodsrelyonminimizingapenaltytermappliedtoasetofparameters.TheL2normisaverycommonlyusedpenaltyterm.Supposewepromotethesparsityoverasetofnparametersw1;:::;wn.ThepenaltyisdenedasPiw2i,whereisadecayconstant,alsotermedaweightdecaypenalty.Inalinearmodel,thisformofweightdecaypenaltyisequivalenttoridgeregression.Itisgoodatcontrollingmodelcomplexitybyshrinkingallcoefcientstowardzero,buttheyallstayinthemodel,sinceitisrarethatparametersgotozero. TheL0normisdenedasthenumberofnonzeroparameters.TheL0normpenaltyissimpletoapply,anditpromotessparsitydirectly( WipfandRao 2005 ).However,ingeneral,solvinganoptimizationproblemwithanL0normpenaltyisanNP-hardproblem;thusconvexrelaxationregularizationbytheL1normisoftenconsidered( Mrupetal. 2008 ; WolfandShashua 2003 ).TheL1normisdenedasPijwij.Theleastabsoluteshrinkageandselectionoperator(LASSO)( Tibshirani 1996 )imposestheL1normconstraintontheparametersofaproblem.ItshowsthattheL1normconstraintisequivalenttoassumingtheparametershaveLaplacepriorsandtheL2normconstraintisequivalenttoassumingtheparametershaveGaussianpriors.SinceLaplacefunctionsquicklypeakatzero,thetailofLaplacefunctionsdropsslowly,andtheL1normconstraintwouldpushtheparameterstoeitherzeroorlargevalues. TheLaplaceprioriscommonlyusedforsparsitypromotionintheBayesianapproach( Williams 1995 ).Becausethereisacomputationaldifcultyduetonon-differentiabilityoftheLaplacefunctionattheorigin,analternativehierarchicalformulationwasproposed( Figueiredo 2003 ),whereitisshownthatazero-meanGaussianpriorisequivalenttoaLaplacepriorwhenthevarianceoftheGaussianpriorhasaregular 21

PAGE 22

Krishnapuram 2004 ; Krishnapurametal. 2004 ). Thereareotherpenaltiesusedtopromotesparsity.Insteadofselectingasubsetoffeatures,theminimummessagelength(MML)criterionisusedtoestimateasetofreal-valued(usuallyin[0;1])quantities(oneforeachfeature),whicharecalledfeaturesaliencies.AnMMLpenaltyisadoptedtoavoidallthefeaturesalienciestoachievemaximumpossiblevalue( Lawetal. 2004 ; Mackey 2003 ).TheMMLcriterionisgiven( FigueiredoandJain 2000 )bylogp()logp(Yj)+1 2logjI()+c 12),whereYisthesetofdatasamples,isthesetofparametersofthemodel,cisthedimensionof,andI()=E[D2logp(Yj)]istheFisherinformationmatrix(thenegativeexpectedvalueoftheHessianofthelog-likelihood).TheMMLcriterionencouragesthesalienciesofirrelevantfeaturestogotozero,andallowsthemethodtoprunethefeatureset.However,theFisherinformationmatrixusedintheMMLcriterionisverydifculttoobtainanalytically.Approximatemethodsareusuallyneededtondtheoptimalsolution( FigueiredoandJain 2000 ). Mackey 2003 ; Principeetal. 1998 2000 )formachinelearning.Theconceptsofentropyandmutualinformationareneededtoposeandsolveoptimizationproblemswithinformationtheoreticcriteria. Herewereviewtheexpressionsforentropyandmutualinformationaccordingto Shannon ( 1948 )and KullbackandLeibler ( 1951 ).Shanon'sentropyisdenedasH(y)=Rp(y)logp(y)dy,wherep(y)isthePDFoftherandomvariabley.AlthoughShannon'sentropyistheonlyonethatpossessesallthepostulatedpropertiesforaninformationmeasure,otherforms,suchasRenyi'sentropy(HR(y)=1 2logRp2(y)dy),areequivalentwithrespecttoentropymaximization.Theconditionalentropyofarandom 22

PAGE 23

Mutualinformationiscommonlyusedinfeatureselectionorfeatureextractionmethods( HildIIetal. 2006 ; Leiva-MurilloandArtes-Rodriguez 2007 ; Torkkola 2003 ).Thesemethodsusuallytrainfeatureextractorsbymaximizinganapproximationofmutualinformationbetweentheclasslabelsandtheoutputofthefeatureextractor.Differentmethodsvaryfordifferententropyforms,differentcomputationalformulasofmutualinformation,ordifferentmaximizationmethods. HildIIetal. ( 2006 )presentedamethodusinganonparametricestimationofRenyi'sentropytolearnfeaturesandtraintheclassier.TheyuseParzenwindowstoestimatetheprobabilitydensityp(x)ofdataX,wherethedensityofXisestimatedasasumofsphericalGaussians,eachcenteredatasamplexi.So,p(x)u1 Torkkola ( 2003 )presentedamethodforlearningdiscriminativefeaturetransforms.InsteadofacommonlyusedmutualinformationmeasurebasedonKullback-Leiblerdivergence,aquadraticdivergencemeasureisused,whichisdenedasD(f;g)=Ry(f(y)g(y))2dy.Alinearfeatureextractionmethodforclassication,basedonthemaximizationofthemutualinformationbetweenthefeaturesextractedandtheclasses,wasproposed( Leiva-MurilloandArtes-Rodriguez 2007 ).TheyuseGram-Charlierexpansiontoestimatetheprobabilitydensityofdata.Thus,theentropyh(z)iscomputedash(z)=hG(z)J(z),wherehG(z)istheentropywithGaussianassumption,andJ(z)

PAGE 24

Experimentalresults( Leiva-MurilloandArtes-Rodriguez 2007 )showthatmutualinformation-basedmethodscanoutperformexistingsupervisedfeatureextractionmethodsandrequirenopriorassumptionsaboutdataorclassdensities.However,sincethesemethodsusuallyusenonparametricestimationofdensityofdata,suchasParzendensityestimation,theyrequirealargedatasetandveryhighcomputationtime.Thesemethodsaresensitivetothechoiceofwindowsizeandtheydonotworkwellonhighdimensionaldata. Burges 2004 ; Smith 2002 ; Torkkola 2003 ; WangandPaliwal 2003 ; YangandYang 2002 ).ThePCAmethodusestheeigenvectorscorrespondingtothelargesteigenvaluesofthecovariancematrixofthedataasthetransformmatrix.Aftertransformation,itgeneratesmutuallyuncorrelatedfeatures.Thistransformationisoptimalintermsofminimalmean-squareerrorbetweentheoriginaldataandthedatareconstructedfromthefeatures.ItalsomaximizesmutualinformationbetweentheoriginaldatavectoranditsfeaturerepresentationforthedatafromtheGaussian 24

PAGE 25

YeungandRuzzo 2001 ),asshowninFigure 2-1 TherearesomevariationsofPCAtoimprovetheperformanceortotdifferentlearningframeworks.Theprobabilisticprincipalcomponentanalysis(PPCA)method( TippingandBishop 1998 )modiestheoriginalPCAtotitintoaBayesianframework.PPCAintroducesazeromeanGaussiandistributionlatentvariable~ytotheregularPCAmodel,suchthat~x=W~y+~+~,wherevector~xistheobservation,~isthevectorparameterthatrepresentsthemeanofdata,~yN(0;I),and~N(0;2I).Giventhemodel,theEMalgorithmisusedtondtheoptimaltransformmatrixtomaximizethelikelihoodoftheobservations.EM-PCA( Roweis 1998 )usessimilarideas.PPCA/EM-PCAhavethesameadvantagesasPCA,becausetheyalsondmoreinformativeuncorrelatedfeatures.Theycanalsoassignlowprobabilitiesforsomeoutliersfarawayfrommostdata.Unfortunately,theyhavesimilardisadvantagestoPCA.Theyarenotgoodforndingoptimalfeaturesforclassicationperformance.AnothershortcomingofthesetwomethodsisthatPPCAandEM-PCAarebatchalgorithms( Choi 2004 ). InformedPCA( Cohn 2003 )isanothervariationofPCAthatincorporatestheinformationoflabelsorcategoriesintothedenitionofthetransformation.PCAonlypenalizesaccordingtosquareddistanceofanobservationfromitsprojection.InformedPCAisbasedontheassumptionthatifasetofobservationsSi=fx1;x2;:::;xngareinthesameclassi,thentheyshouldshareacommonsource.ForahyperplaneHdenedbytheorthogonalmatrixC,whichconsistsoftheeigenvectorsofthecovariancematrixofSi,themaximumlikelihoodsourceisthemeanofSi'sprojectionsontoH,denotedbySi.IfwedenotexjastheprojectionofthejthobservationbythetransformmatrixC,thelikelihoodshouldbepenalizednotonlybasedonthevarianceofobservationsaroundtheirprojections(Pjjjxj^xjjj2),whichissameasPCA,butalsoonthevarianceoftheprojectionsaroundtheirsetmeans(PiPxj2Sijj^xjSijj2).Withatrade-offhyper 25

PAGE 26

2-2 Walletal. 2003 )hasthesamegoalasPCAinthatitndsprojectionsthatminimizethesquarederrorinreconstructingoriginaldata.Itcalculatestheeigenvectorsofthecovariancematrixoftheoriginaldatabysingularvaluedecomposition(X=USVT,whereUandVaretheorthogonalmatrix,Sisthediagonalmatrix,whichhasnonzerodiagonalelements.).IthasmoreefcientalgorithmsavailablethanPCAtondtheeigenvectors,andsomeimplementationsndjustthetopNeigenvectors.However,itisstillcomputationallyexpensiveinthecaseofhighdimensiondata.IthasthesamedisadvantagesasPCAinthatitisnotaccurateinndingoptimalfeaturesforclassicationperformance. ChenandYang 2004 ; PetridisandPerantonis 2004 ; YangandYang 2003 ; Zhaoetal. 2006 ).Itisoptimallydiscriminativeforcertaincases( Torkkola 2003 ).LDAndstheeigenvectorsofC=S1wSb,whereSbisthebetween-classcovariancematrix,andSwisthesumofwithin-classcovariancematrices.ThematrixS1wcapturesthecompactnessofeachclass,andSbrepresentstheseparationoftheclassmeans.EigenvectorscorrespondingtothelargesteigenvaluesofCformthecolumnsoftransformmatrixW.Newdiscriminativefeaturesyarederivedfromtheoriginaldataxbyy=Wtx.ItperformsbestondatawithGaussiandensityforeachclass,andwellseparatedmeansbetweenclasses.Inaddition,sinceaFishercriterionisnotdirectlyrelatedtotheclassicationaccuracy,itisnotoptimalintermsoftheclassicationerror. 26

PAGE 27

HyvarinenandOja 2000 ).ThefundamentalrestrictioninICAisthattheindependentcomponentsmustbenon-Gaussian,sincethekeytoestimatingtheICAmodelisnon-Gaussianity.Tousenon-GaussianityinICAestimation,thereshouldbeaquantitativemeasureofnon-Gaussianityofarandomvariable.Theclassicalmeasureofnon-Gaussianityiskurtosis,orthefourth-ordercumulant.Anotherveryimportantmeasureofnon-Gaussianityisgivenbynegentropy.Negentropyisbasedontheinformation-theoreticquantityofdifferentialentropy.BecauseafundamentalresultofinformationtheoryisthataGaussianvariablehasthelargestentropyamongallrandomvariablesofequalvariance,entropycouldbeusedasameasureofnon-Gaussianity. ICAcanalsobeconsideredavariantofprojectionpursuit( Huber 1985 ).Projectionpursuitisatechniquedevelopedinstatisticsforndingthemostinterestingprojectionsofmultidimensionaldata.Someresearchers( Huber 1985 ; JonesandSibson 1987 )arguedthatGaussiandistributionistheleastinterestingone,andthatthemostinterestingdirectionsarethosethatshowtheleastGaussiandistribution.Bycomputingthenongaussianprojectionpursuitdirections,theindependentcomponentscanbeestimated,whichistheconceptofICA,butifthenon-Gaussianitymodeldoesnotholdforthedata,ICAdoesnotwork. Wangetal. 2007 ).Givenadatasample[f(x),x=0;1;:::;N1],thediscreteFouriertransformisF(u)=1 27

PAGE 28

Saha 2000 ),usecosinefunctionsasbasisfunctions.ThediscretecosinetransformisD(u)=(u)PN1x=0f(x)cos(2x+1)u NanavatiandPanigrahi 2005 ; Saha 2000 )isanotherwidelyusedtransforminimageprocessing.Ituseswaveletsasbasisfunctions.Waveletsarefunctionsoflimitedduration,andhaveanaveragevalueofzero.Thesebasisfunctionsareobtainedfromasingleprototypefunctionbydilations,orcontractions(scaling)andtranslations(shifts).Thecoefcientsofthesebasisvectors/functionsareusedastherepresentativesoforiginalsignals,suchasF(u)andD(u).Becausethesebasisfunctions,suchascosinefunctions,havecompactenergyonlowfrequencies,andnaturalimageshavemostlylow-frequencyfeatures,imagescanberepresentedbyasmallnumberofcoefcientswithoutmuchlossofinformation.Theyhavegoodinformationpackingpropertiesforsignalcompressionandreconstruction.OnedisadvantageofDFTandDCTisthattheirbasicfunctionsareperiodiccontinuousfunctions,suchassinusoids;theymaynotbegoodatgeneratingmorelocalizedfeaturessuchasedgeinformation.OnedisadvantageoftheDWT( NanavatiandPanigrahi 2005 )istheproblemofselectingbasisfunctionsforagivenapplication,becauseaparticularwaveletissuitedonlyforaparticularpurpose. Gabor ( 1946 )formulatedanorientedbandpasslterthatrepresentsanoptimalcompromisetotheuncertaintyrelationshipbetweenpositionalandspatialfrequencylocalization.TheGaborfunctionsg(x;y)=s(x;y)wr(x;y)aredenedastheproductofaradiallysymmetricGaussianfunctionwr(x;y)andacomplexsinusoidalwavefunctions(x;y)( Movellan 2006 ; Rizkietal. 1993 ).Acomplexsinusoidalfunctionisdenedass(x;y)=exp(j(2(u0x+v0y0)+P)),where(u0;v0)andPdenethespatialfrequencyandthephaseofthesinusoidal,respectively.Adistinctadvantageof 28

PAGE 29

Kyrkietal. 2004 ).TheGaborlterfeaturesareusefulfortextualanalysis,astheyhavetunableorientation,radialfrequencybandwidths,andtunablecenterfrequencies( Greenspanetal. 1991 ; YuandBhanu 2006 ).AdisadvantagetousingaGabortransformisthattheoutputsofGaborlterbanksarenotmutuallyorthogonal.Thus,thefeaturesextractedmaybecorrelated.Inaddition,Gaborltersusuallyrequirepropertuningoflterparameters. Nebauer 1998 ).Ithasbeensuggestedthatthistopologyismoresimilartobiologicalnetworksbasedonreceptiveeldsandimprovedtolerancetolocaldistortions( Nebauer 1998 ).Inaddition,thenumberofweightsandthecomplexityofmodelsareefcientlyreducedbyweightsharing.Whenimageswithhigh-dimensionalinputvectorsarepresenteddirectlytothenetwork,thismethodcanavoidexplicitlydenedfeatureextractionanddatareductionmethodsusuallyappliedbeforeclassication( LeCunetal. 1989 1998 ).Inotherwords,thismethoddoesfeatureextractionandclassicationsimultaneously.Thedisadvantageofthismethodisthatthenetworkiseasytoover-t,anditisdifculttointerpretthemeaningofinnernodesandthestructureofthenetwork.Itisnoteasytoincorporatepriorknowledgeintothenetwork. Convolutionalnetworksusuallyhavethreearchitecturalschemes:localreceptiveelds,sharedweights,andsubsampling.Somedegreeofscaleshiftanddistortion 29

PAGE 30

LeCunetal. 1998 ).Theimagesofcharactersintheinputlayerareapproximatelysizenormalizedandcenteredrst.Eachnodeinalayerisconnectedtoasetofnodeslocatedinasmallneighborhoodinthepreviouslayer,andthenalltheweightsarelearnedwithbackpropagation.Becausethenumberoffreeparametersisreducedbytheweightsharingtechnique,thecapacityofthemachineandthegapbetweentesterrorandtrainingerrorisreduced( Chenetal. 2006 ; GarciaandDelakis 2004 ). Asharedweightnetwork( Gaderetal. 1995 ; Porteretal. 2003 )isanothernameforaCNNandemphasizestheweight-sharingpropertiesofnetworks.Thenetworkisviewedasnonlinearcombinationsoflinearlters. TamburinoandRizki 1989a b ),dilation,andhit-miss( GaderandKhabou 1996 ; ZmudaandTamburino 1996 )arecommonlyusedmorphologicaltransformsinimageprocessing.Someothermathematicalmorphologies,knownasgranulometries,arestudiedaswell( Serra 1983 ; Urbachetal. 2007 ). Themorphologicaltransformcanbeappliedtoneuralnetwork,suchasamorphologicalneuralnetwork.Itisanetworkwithamorphologicalfeatureextractionlayer.Anexampleisdescribedby Haunetal. ( 2000 ).Rawimagesintheinputlayerarerstundersampledtodecreasecomputationintensity.Thenhit/misstransformsareusedtomapthepixelswithintheimagestofeaturemaps.Eachfeaturemapisproducedbyonehit/missweightmatrixpair.Thetransformsareessentiallythetargetseroded(hit)andthebackgroundsdilated(miss).WeassumebothhitmatrixHandmissmatrixIare33matrices.Duringthetransformprocess,a33windowslidesontheinputimage.Givena33matrixIofpixelsfromtheinputimage,whereI22istheorigin,adifferencematrixDisproducedbyD=IHandasummatrixSisproducedbyS=I+M.Thenthevaluefofthepixelattheorigininthefeaturemapiscomputedby 30

PAGE 31

Toimprovethecomputationspeedofmorphologicaltransformations,somefastalgorithmsarestudiedtocomputemin,median,max,oranyotherorderstatisticltertransform( GilandWcrman 1993 ).Anefcientanddeterministicalgorithmisproposedtocomputetheone-dimensionaldilationanderosion(maxandmin)slidingwindowlters( GilandKimmel 2002 ). Gaderetal. 2000 ; KhabouandGader 2000 ; Khabouetal. 2000 ; Sahoolizadehetal. 2008 ; WonandGader 1995 ).Itisatwo-stagenetwork,withastandardfeedforwardclassicationstagefollowedbyafeatureextractionstage.Thefeatureextractionstageiscomposedofoneormorefeatureextractionlayers.Eachlayeriscomposedofoneormorefeaturemaps.Associatedwitheachfeaturemap,isapairofstructuringelementsoneforerosionandonefordilation.Thevaluesofafeaturemaparetheresultofperformingahit-missoperationwiththepairofstructuringelementsonamapinthepreviouslayer.Thevaluesofthefeaturemapsonthelastlayerarefedtothefeed-forwardclassicationstageoftheMSNNwithgray-scalehit-misstransform( GaderandKhabou 1996 ; Gaderetal. 1995 ; Haunetal. 2000 ). Ghahramanietal. 2007 ; GrifthsandGhahramani 2005 )isaexiblenonparametricapproachtolatentvariablemodelinginwhichthenumberoflatentvariablesisunbounded.Thisapproachisbasedonaprobabilitydistributionoverequivalenceclassesofbinarymatriceswithanitenumberofrows,correspondingtothedatapoints,andanunboundednumberofcolumns,correspondingtothelatentvariables.Eachdatapointcanbeassociatedwithasubset 31

PAGE 32

2.2.1DenitionandBasicConcepts Thenotationweusegenerallyfollows Rabiner ( 1989 )andisasfollows: 32

PAGE 33

AnHMMisgenerallyrepresentedasathree-tuple=(A;B;). Rabiner 1989 ),machinetranslation( InoueandUeda 2003 ),andgeneprediction( StankeandWaack 2003 ).Ofparticularinterestarethoseapplicationswithimagesasinput(imageclassication( Maetal. 2007 ),handwritingrecognition,andminedetection( GaderandPopescu 2002 ; Gaderetal. ; Zhaoetal. 2003 )). ExpectationMaximization Bilmes 1997 ).TheEMalgorithmaimstondmaximum-likelihood(ML)estimatesforsettingswherethisappearstobedifcult.TheconceptoftheEMalgorithmistomapthegivendatatocompletedatafromwhichitiswell-knownhowtogetMLestimates.TheEMalgorithmperformsiterationsoverthegivendata.Eachiterationhastwosteps.Anexpectation(E)stepisfollowedbyamaximization(M)step.IntheEstep,theEMalgorithmcalculatestheexpectationofunknown(hidden)datagivenaninstanceofaprobabilisticmodel.IntheMstep,theEMalgorithmcalculatesanMLestimateoverthecurrentcaseofcomplete-data(knowndataandtheexpectationofhiddendata).TheEMalgorithmisguaranteedtoconvergetoalocalmaximum( Prescher 2004 ).Disadvantagesincludeitmaynotndtheglobaloptimalsolutionanditissensitivetotheinitialization. Ma 2004 ; Zhaoetal. 2003 ).Ithasalossfunctionasanobjectivefunction.Thefunctionisusuallyasigmoidfunctionofamisclassicationmeasure.Gradient 33

PAGE 34

whereDisthedimensionoftheobservationx. Inthiswork,weusecontinuousHMMswithGaussianmixturemodelsrepresentingtheemittingprobabilitydensitiesthatarethereforegivenby wherebjk(xt)=(2)D=2jRjkj1=2exp(1 2DXl=1xtljkl ToestimatetheHMMparameters,weusetheMCEmethodwithgeneralizedprobabilisticdescent(GPD)( Ma 2004 ).ThegoalofMCEtrainingistobeabletodiscriminateamongsamplesofdifferentclassescorrectlyratherthantoestimatethedistributionsofeachclassaccurately.TheMCEobjectivefunctionisalossfunctionthatdependsonamisclassicationmeasure.Themisclassicationmeasureforthetwo-classproblemusedhereisdenedasfollows: 34

PAGE 35

1+ed(x)+;(2) whereisapredenedparameter. Wetrytominimizetheexpectedlosstooptimizetheperformanceoftheclassier.InstandardMCEtraining,weseektoestimatethemixtureproportions,means,andcovariancesoftheGaussianmixtures,aswellasthetransitionprobabilitiesthatwillminimizetheaverageloss.Todeveloppracticalformulas,auxiliaryvariablesareexplicitlyandimplicitlyintroduced: Notethattheserelationsyieldthefollowingdifferentialrelations: whichareusedintheupdateformulasbelow.Similardifferentialrelationsholdforthetransitionprobabilities.Letpxl(xj)(1l(xj)).Usingtheseauxiliaryvariablesandrelations,applicationofgradientdescentleadstothefollowingformulastoupdateHMMmodelparameters: 35

PAGE 36

where@bj(xt) where@bj(xt) CasellaandGeorge 1992 ; Shengetal. 2005 )isaMarkovchainMonteCarlo(MCMC)method( Neal 1993 )forjointdistributionestimationwhenthefullconditionaldistributionsofalltheconcernedrandomvariableareavailable.GibbssamplinghasbecomeacommonalternativetotheEMalgorithmforsolvinganincompletedataprobleminaBayesiancontext.Gibbssamplingprovidessamplestoestimatethejointdistributionoftherandomhiddenvariablesandparameters.Itprovidesfortheestimationofrandomvariablesfromthesesamples.Therefore,GibbssamplingmayndamoreoptimalsolutionthanEM,whichispronetondinglocalsolutions. Givenjointdensityp(x1;x2;:::;xK)forasetofrandomvariablesx1;x2;:::;xK,itisusuallydifculttoestimateorsamplethemarginaldistributionsdirectly,becausethemarginaldistributionp(xi)forxi,wherei=1:::K,iscomputedusing 36

PAGE 37

...x(t+1)ip(xijx1=x(t+1)1;:::;xi1=x(t+1)i1;x(i+1)=x(t)i+1;:::;xK=x(t)K); ...x(t+1)Kp(xKjx(t+1)1;:::;xK1=x(t+1)K1); wheretdenotestheiterationindex.HereXP(XjY)denotestheprocessofdrawingasampleXifromapopulationdenedbytheconditionaldistributionP(XjY) Neal 1993 )thatast!1,thesampledistributionofx1;x2;:::;xKconvergestop(x1;x2;:::;xK).Equivalently,ast!1,thedistributionofx(t)iconvergestop(xi)fori=1:::K.Thus,theGibbssamplertreatsthesamplesx(t)1;:::;x(t)KfortMasasamplefromp(x1;x2;:::;xK)byselectingsomelargevalueforM.Theinitialperiodwhensamplesarerstdrawnisreferredtoastheburn-inprocedure.Nowwecancalculatetheexpectationofafunctionf(x)overthedistributionp(xi).ThisisdonebytheMonteCarlointegration wheretistheiterationindexinthesamplingprocess,andNisthetotalnumberofsamplescollected. 37

PAGE 38

ThedashedlineisthePCAprojection,buttheverticaldottedlinerepresentsthebestprojectiontoseparatetwoclusters Figure2-2. Thetopplothasclosetozerotomaximizethevariationofprojections(horizontalaxis)ofallobservations,andthebottomplothasclosetoonetominimizethevariationoftheprojections(verticalaxis)oftheobservationsinsamecluster. 38

PAGE 39

Threenewapproachesareproposed:(1)simultaneousfeaturelearningandHMMtrainingusinganMCEalgorithm,(2)HMMtrainingusingGibbssampling,and(3)bothlooselyandtightlycoupledfeaturelearningandHMMtrainingusingGibbssampling.NotethatthesecondapproachfocusesonlyonestimatingtheHMMparametersandnotthefeatureparameters.EstimatingtheHMMparametersaloneallowsforafocusedanalysisoftheproposednoveltechnique.WerefertothersttwomethodsasMcFeaLHMMandSampHMM,respectively(hereFeaLisanacronymreplacingFeatureLearning).Thethirdmethodconsistsoftwosub-methods,whichwerefertoasTSampFeaLHMMandLSampFealHMM.NotethatthefeaturemodelsusedforMcFeaLHMMareorderedweightedaverage(OWA)-basedgeneralizationsofmorphologicaloperators,whereasthoseusedbyTSampFeaLHMMandLSampFeaLHMMareconvolutionalmodels. Resultsindicatethat,whileallfeaturelearningmodelscanachieveperformancesimilartoorbetterthanthatofahuman,McFeaLHMMisverysensitivetoinitializationandlearningrates.Fortunately,TSampFeaLHMMandLSampFeaLHMMaremuchmorestableandcanproducebettersolutionsthanMcFeaLHMMinlandminedetectionexperiments.ItmaybepossibletoalleviatetheseproblemsforMcFeaLHMMbysamplingfromaposteriordistributionbasedontheMCElossfunction,butinvestigationofthatconceptislefttofuturework.Thefeaturemodelsandtrainingalgorithmsarenowdescribedbelowindetail. 39

PAGE 40

AnyweightssatisfyingthesepropertieswillbereferredtoasasetofOWAweights.Letf=ff1;f2;:::;fngbeamulti-setofrealnumbers.Thei-thorderstatisticoffisf(i),wheretheparenthesizedsubscriptsdenoteapermutationoftheindicessuchthatf(1)f(2)f(n).TheOWAoperatoronfwithweightvectorwisdenedas OWAw(f)=nXi=1wif(i):(3) WealsodeneOWAwasanOWAoperatorofsizenwherenisthenumberofweights.OWAoperatorsareusedheretodenegeneralfeatureextractors.Inourcontext,afeatureisdenedtobeatupleconsistingofthefollowing: 1. Afeaturewindowsize,NK. 2. TwosetsofOWAweightshandmofsizeNKandassociatedOWAoperatorsthatactontwo-dimensionalarraysBasfollow: OWAh(B)=NKXi=1hib(i);(3) 40

PAGE 41

3. TwoNKarrayscalledthehitmaskandmissmaskanddenotedGhandGm,respectively.Themasksrepresentthegeometricshapesofthefeatures.Consistentwithstandardpracticeinmathematicalmorphology,thehitmaskisapatternthatmatchesaforegroundshapeandthemissmaskisapatternthatmatchesabackgroundshapeofthefeatures.Thevaluesofthearraysareeitherbinary,inthesetf0,1g,ornon-binary,intheinterval[0,1].Wehavehandmtorepresenthitandmissmasks,respectively. TheOWAweightsandmaskvaluesarethefeatureextractionparametersthatarelearnedinthetrainingprocess.FeaturesarecomputedoversubwindowsfromtheimagesusingneighborhoodOWAoperators. AtrainingsetTconsistsofimagesfromeachclass.Therststepinfeaturelearningisfeatureinitialization.Featureinitializationproceedsbyrstcollectingsubwindowsfromthetrainingimages.Variousprocedurescanbeusedtoselectorcomputeasmallsetoflikelyinitialfeaturesfromthesesubwindows.Wewillinvestigateseveraloftheseprocedures.ThefeatureparametersarethenupdatedtogetherwiththestandardHMMparameters.Severaltrainingalgorithms,includingMCE-basedandGibbssamplingwillbeconsidered.Neighborhoodupdatingmethodsthatattempttoencourageconnectedfeatureswillalsobestudied.Givenanimage,thefeatureextractionprocessproceedsasfollows.LetAbeanimagewithpixelvaluesintheinterval[0,1]andsizelargerthanthewindowsize.Thefeatureextractionconsistsoftwosteps:rstisapplyingamasked,neighborhoodOWAhit-missoperatortoA,andsecondisaggregatingtheresultoftherststepovertherowsofA.Moreprecisely,letAtkdenotetheNKsubimageofAwiththeupperlefthandcornerlocatedatrowtandcolumnk.Afterweapplythemaskandhit-missoperatortoA,animageDwiththesamesizeasAis 41

PAGE 42

wheresymboldenotespointwisemultiplication.Notethatiftheimageandthemasksarebinaryandifhandmcorrespondtominimumoperations,thenthisoperatorisexactlytheordinaryhit-missoperatorfrommathematicalmorphology.Thenalstepinfeatureextractionisaggregatingtheoutputsofthemasked,neighborhoodhit-missoperatorbycomputingthemaximumofeachcolumnofD: Theresultisasequenceoffeaturevaluesindexedbyk.Apictorialdescription( Zhangetal. 2007 )isshowninFigure 3-1 3-4 ,wearegivenpNKimages.WetransformeachimagetoanNK1vectorAi,wherei=1;:::;H.ForeachimageAianNKbinaryhitmaskoranNKternarywithvaluesinf-1,0,1ghit-missmaskMiisappliedusingconvolutionasfollows.LetL=NK.WedenethevectorBias wherethesymbol`'denotespointwisemultiplication,andrepresentsazeromeanGaussianperturbationwithcovariancematrix=2IL1.Notethatweconsidereach 42

PAGE 43

NowwedeneDiasthesumoverBplusazeromeanadditiveGaussianperturbationwithvariance2: WedeneonefeaturextastheaggregationofDiwiththeadditivezeromeanGaussiannoise", Nowweassignthelabelyttothefeaturext,giventhethreshold,by AsshowninFigure 3-6 ,whenanN1K1imageAisgiven,theimageissplitintoTN2K2subimagesAt;t=1;:::;T.Thesesubimagesarecalledzones.ForeachzoneAt,anNKternaryf-1,0,1ghit-missmaskMtisappliedusingconvolution,similarto 43

PAGE 44

ThevectorDticanberepresentedas WedeneonefeaturextastheaggregationofDti, Toreducecomputationandthenumberofvariablestobesampled,wechangetheorderoftheequationsabove,asshowninFigure 3-7 .WerstconvolvezoneAtwithamaskM,whereMk=1foreachk.TheresultisanNKmatrixorsizeNK1vectorZtthatis WethendeneanarrayCtby WedeneonefeaturextastheaggregationofCtk, NowweassumethefeaturextisfromaGaussiandistributionN(;)andapplytheHMMmodel(A;;;)tothesequencex1;:::;xT.NotethatthevaluesofAtkshouldbescaledtotheinterval[-1,1],withthebackgroundvaluesoftheimagetakingthevalue-1.Thisscalingprovidestheadvantagethatifthehit-missmaskhasahitontheareaofinterest,thefeaturevaluewillalwaysbeclosetothemaximumvalue. 44

PAGE 45

Ma 2004 ).Consequently,thisalgorithmisverysensitivetoinitialization.Infact,inourexperiments,randominitializationdidnotleadtousefulsolutions.Therefore,adata-basedalgorithmforinitializingmaskswasdevisedforMcFeaLHMM.Thesampling-basedalgorithmswerenotsensitivetoinitialization.ThereforerandominitializationwasusedforTSampFeaLHMMandLSampFealHMM. ThealgorithmusedtoinitializeMcFeaLHMMisbasedonclustering,soisreferredtoasMcClustInit.TheMcFeaLHMMalgorithmusesoneHMMtomodeleachclass.Foreachmodel,atrainingsetA1;A2;:::;AHofNbigKbigimagesofpatternsfromtheassociatedclassisgiven.TheOWA-basedfeatureoperatorsareintendedtodetectsub-patternsofthosepatternscontainedinNKsubimages.TheMcClustInitalgorithmthereforeclusterssubimagesofsizeNKextractedfromthetrainingdataset.Intherststep,theMcClustInitalgorithmemploystheOtsuthresholdingalgorithm( Otsu 1979 )tosemi-thresholdthetrainingpatterns.Next,allNKsubimageswithsufcientenergyareextractedfromthesemi-thresholdedtrainingimages.WeletSdenotethesetofthesesubimages.Thegoalistondshift-invariantprototypesofthepatternsinS.Forexample,allhorizontallinesshouldhavethesamerepresentation.Therefore,thealgorithmcalculatesthemagnitudeoftheFouriertransformsofallthepatternsinS,producingsetF.TheelementsofFwereclusteredusingtheFuzzyC-Means(FCM)algorithmwithapre-denedvalueofC,resultinginasetPoffrequencydomainprototypes.Theseprototypeswerethenusedtocomputespatialdomainprototypes,whichwereusedastheinitialfeaturemasks. 45

PAGE 46

3.4.1MCE-HMMModelforFeatureLearning where(i;j)isthepositionofrowiandcolumnjof2-DmatrixGh,Gm,orBtk.Followingmasking,theOWAhit-missoperatorisappliedasfollows: HereBt;k;(s)denotesthesortedvalueofBtk,where(s)istheone-dimensionalsortingindexofBtk.Thefeaturevaluesarethencalculatedaccordingtoequation( 3 ). HerethefeaturelearningalgorithmisderivedusinggradientdescentontheMCE.Objectivelossfunctionl(xj)isdenedinsection 2.2.4 .Hence,weneedtocompute@l(xj) 3 )isaminfunctionandthat Thus,itsufcestoderive@l(xj) 3 ).TomaintaintherequirementsplacedonOWAweightsandmasks,weintroduceauxiliary 46

PAGE 47

1+eh~Gh(i;j);(3) wherehisauser-denedpriorparametertodecidetheslopeofthesigmoidfunction. Notethattheserelationsyieldthefollowingdifferentialrelations: whichareusedintheupdateformulas.Weapplygradientdescenttotheauxiliaryvariablesandthenupdatethevariablesusedinthecalculationsaccordingtoequation( 3 ).Sinceweknowthat@l(xj) @upand@l(xj) @~Gh(i;j),thederivatives@x @upand@x @~Gh(i;j)arederivedasfollows: @up=@dt;k wheretmaxk=argmaxtk(D(;k)); @~Gh(i;j)=@Dtmaxk;k wherep=(i1)K+j: 47

PAGE 48

and@l(xj) where@bj(xt) 3-2 DunmurandTitterington 1997 ; RydenandTitterington 1998 ).AlthoughtherearesomelearningmethodsproposedforHMMbasedonMCMCsampling,theyareeitherforthediscreteHMMproblem(standardHMM( Bae 2005 )orinniteHMM( Bealetal. 2002 )),plussomeuncommonHMMs,suchastrajectoryHMMs( Zenetal. 2006 ),nonparametricHMMs( Thrunetal. 1999 ),ornonstationaryHMMs( Zhuetal. 2007 ).Here,aGibbsapproachisproposedfortrainingcontinuousHMMswithmultivariateGaussianmixturesrepresentingthestates. 48

The probability of a state sequence is computed from the product of transition probabilities, $\prod_{r,s} a_{rs}^{\,n_{rs}}$, where $n_{rs}$ is the number of transition pairs such that $q_{t-1} = r$ and $q_t = s$ for $t = 2, \ldots, T$. Note that the transition probabilities are stationary: the values $P(q_t \mid q_{t-1})$ for $t = 2, \ldots, T$ and fixed $s, r$ are equal.

Since the posterior distributions of the parameters, as defined in Section 2.2.1, are not available in explicit form, we use Gibbs sampling to simulate the parameters from their posterior distributions after defining likelihood and prior probability models for the parameters. First, we assume the likelihood model for state transitions is a multinomial distribution over the counts $n_{r1}, \ldots, n_{rN}$. The conjugate prior of the multinomial distribution is used as the prior of $a_{rs}$, so it is a Dirichlet distribution with prior parameter vector $\tilde{\rho}_{r0}$.

The state probability distribution is assumed to be a Gaussian mixture. We let $\theta_r = (c_r, \mu_r, \Sigma_r)$ denote the parameters of the state. We assume the probability of the mixture components is governed by a multinomial distribution, and we let $\hat{n}_{rk}$ denote the number of occurrences of component $k$ in state $r$. As before, we use the conjugate Dirichlet distribution as the prior of $c_{rk}$, with hyperprior parameter $\tilde{\varphi}_0$.

Now we can compute the posterior conditional probabilities. The posterior conditional probability of the transition probabilities $a_{rs}$, $s = 1, \ldots, N$, is a Dirichlet distribution with parameter
$$\tilde{\rho}_{rp} = \tilde{\rho}_{r0} + [n_{r1}, \ldots, n_{rN}]^{T},$$
and the posterior conditional of the mixture weights is a Dirichlet distribution with parameter
$$\tilde{\varphi}_{p} = \tilde{\varphi}_{0} + [\hat{n}_{r1}, \ldots, \hat{n}_{rK}]^{T}.$$

Next we compute the posterior conditional probabilities of the Gaussian parameters modeling the state components. The Gaussian likelihood of the $T$ observations assigned to a component is
$$p(x_1, \ldots, x_T \mid \mu, \Sigma) = \frac{1}{(2\pi)^{dT/2}\,|\Sigma|^{T/2}} \exp\!\left(-\frac{1}{2}\sum_t (x_t - \mu)^T \Sigma^{-1} (x_t - \mu)\right),$$
where $d$ is the dimension of $x_t$. We assume the prior of the mean is Gaussian, $p(\mu \mid \mu_0, \Sigma_0) = N(\mu; \mu_0, \Sigma_0)$, where $\mu_0, \Sigma_0$ are hyperprior parameters, and the prior of the covariance is an inverse-Wishart distribution, $p(\Sigma \mid \nu_s, T_s) = \mathrm{invWishart}(\Sigma \mid \nu_s, T_s)$, where $\nu_s, T_s$ are hyperprior parameters. Hence the posterior conditional distribution of the mean is Gaussian, $N(\mu; \mu_p, \Sigma_p)$, where
$$\mu_p = \left(\Sigma_0^{-1} + T\,\Sigma^{-1}\right)^{-1}\left(\Sigma_0^{-1}\mu_0 + \Sigma^{-1}\sum_t x_t\right), \qquad \Sigma_p = \left(\Sigma_0^{-1} + T\,\Sigma^{-1}\right)^{-1},$$
and the posterior conditional distribution of the covariance is inverse-Wishart, $\mathrm{invWishart}(\Sigma \mid \nu_p, T_p)$, where
$$\nu_p = \nu_s + T, \qquad T_p = T_s + \sum_t (x_t - \mu)(x_t - \mu)^{T}.$$
The Gibbs sampling HMM training process is summarized in Figure 3-3.

The training algorithm for LSampFeaLHMM consists of alternating optimization between a Gibbs sampler and a modified Viterbi learning algorithm. That is, it is of the form:

1. Run a Gibbs sampler to estimate feature masks.
2. Refine states using modified Viterbi learning.

In the next section, we first describe the Gibbs sampler and then the initialization and modified Viterbi learning; a small sketch of the posterior updates above follows below.
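A minimal sketch of the conjugate posterior updates above, assuming the observations have already been assigned to a state (and, within a state, to a Gaussian component); the hyperparameter handling, helper names, and the use of NumPy and SciPy samplers are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(0)

def sample_transition_row(counts, rho0):
    """Dirichlet posterior for one row of the transition matrix:
    rho_p = rho0 + [n_r1, ..., n_rN]."""
    return rng.dirichlet(np.asarray(rho0, dtype=float) + np.asarray(counts, dtype=float))

def sample_gaussian_params(X, mu0, Sigma0, nu_s, T_s, Sigma_current):
    """Sample (mu, Sigma) for one state component given its T assigned
    observations X (T x d), using the conjugate updates in the text."""
    T, d = X.shape
    Sigma_inv = np.linalg.inv(Sigma_current)
    Sigma0_inv = np.linalg.inv(Sigma0)
    # Posterior of the mean: N(mu_p, Sigma_p).
    Sigma_p = np.linalg.inv(Sigma0_inv + T * Sigma_inv)
    mu_p = Sigma_p @ (Sigma0_inv @ mu0 + Sigma_inv @ X.sum(axis=0))
    mu = rng.multivariate_normal(mu_p, Sigma_p)
    # Posterior of the covariance: invWishart(nu_p, T_p).
    resid = X - mu
    T_p = T_s + resid.T @ resid
    Sigma = invwishart.rvs(df=nu_s + T, scale=T_p, random_state=rng)
    return mu, np.atleast_2d(Sigma)
```

In the full sampler these draws alternate with sampling the state and component assignments themselves, following the two-step alternating scheme listed above.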

The probability of the $k$-th element of the binary hit mask $M$ is a binomial distribution with count $n_k = \sum_i \mathbb{1}(M_{ik} = 1)$. For the ternary hit-miss mask, the counts are $n_{k1} = \sum_i \mathbb{1}(M_{ik} = 1)$, $n_{k0} = \sum_i \mathbb{1}(M_{ik} = 0)$, and $n_{k2} = \sum_i \mathbb{1}(M_{ik} = -1)$. The posterior distribution is not available in explicit form, so we use the Gibbs sampling approach to sample all of the variables needed for estimation.

1. Sample $p_k$, $k = 1, \ldots, L$, given $(M, \alpha, \beta)$. The sample is drawn from a beta distribution if $M$ is the binary hit mask, and from a Dirichlet distribution if $M$ is the ternary hit-miss mask.

2. Sample $M$ given $(p, B, A)$. Since every component of $M$ is assumed to be independent, it is easy to sample component-wise (a sketch of this component-wise update is given after this list). If $M$ is a binary hit mask, we sample it from a binomial distribution. We first compute $P(M_k = 1 \mid p_k, B, A)$ and $P(M_k = 0 \mid p_k, B, A)$, where
$$P(M_k = 1 \mid p_k, B, A) \propto P(M_k = 1 \mid p_k)\, P(B \mid M_k = 1, A) \propto p_k \exp\!\left(-\frac{(B_{ik} - A_{ik})^2}{2\sigma^2}\right).$$
If $M$ is a ternary hit-miss mask, we sample it from a multinomial distribution. First we compute $P(M_k = 1 \mid p_{k1}, B, A)$, $P(M_k = 0 \mid p_{k0}, B, A)$, and $P(M_k = -1 \mid p_{k2}, B, A)$, where
$$P(M_k = 1 \mid p_{k1}, B, A) \propto P(M_k = 1 \mid p_{k1})\, p(B \mid M_k = 1, A) \propto p_{k1} \exp\!\left(-\frac{(B_{ik} - A_{ik})^2}{2\sigma^2}\right).$$
After these three values are normalized, we sample $M_k$ from the multinomial distribution
$$M_k \mid p_{k1}, p_{k0}, p_{k2}, B, A \sim \mathrm{multinomial}\big(P(M_k = 1 \mid p_{k1}, B, A),\, P(M_k = 0 \mid p_{k0}, B, A),\, P(M_k = -1 \mid p_{k2}, B, A)\big).$$

3. Sample the variable $B$ given $(A, M)$. Rather than sampling $B$ as a matrix, it is better to sample it component-wise from its conditional Gaussian distribution.

4. Sample $D$ given $(x, B)$. The conditional is proportional to $\exp\!\left(-\big(x - \sum_i D_i\big)^2 / (2\sigma_\varepsilon^2)\right)$, so, as with $B$, we sample $D$ component-wise from a Gaussian distribution.

5. Sample $x$ given $(y, D)$ from a truncated Gaussian distribution: $p(x_t \mid y_t = 1, D) \propto N\big(\sum_i D_i, \sigma_\varepsilon^2\big)$ truncated on the right by the threshold.
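A minimal sketch of the component-wise mask update in step 2 for the ternary hit-miss case, assuming a fixed Gaussian variance and that the per-element likelihood aggregates squared errors over all training subimages; the array shapes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_ternary_mask(p, B, A, sigma2=0.1):
    """Sample each element of a ternary hit-miss mask from its conditional.

    p    : (L, 3) prior probabilities (p_k1, p_k0, p_k2) for values (1, 0, -1)
    B, A : (n, L) arrays of masked responses and subimage values
    The likelihood of value v for element k uses the squared error between
    B[:, k] and v * A[:, k], matching the Gaussian form in the text."""
    L = B.shape[1]
    values = np.array([1.0, 0.0, -1.0])
    M = np.empty(L)
    for k in range(L):
        # Unnormalized log-conditional for each candidate value of M_k.
        sq_err = ((B[:, [k]] - values[None, :] * A[:, [k]]) ** 2).sum(axis=0)
        logp = np.log(p[k] + 1e-12) - sq_err / (2.0 * sigma2)
        prob = np.exp(logp - logp.max())
        prob /= prob.sum()
        M[k] = values[rng.choice(3, p=prob)]
    return M
```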

In the LSampFeaLHMM, we do not use state probabilities. Instead, we use the feature values themselves directly, in one form for binary masks and in another for ternary masks. Then, given a test observation sequence $X = \tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_T$, the output of LSampFeaLHMM is given by the Viterbi algorithm, which finds the best path and returns the confidence
$$\mathrm{output} = \sum_{t=1}^{T} \ln\big(b_{q_t}(x_{q_t})\big).$$
The training algorithm is summarized in Figure 3-5.
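A small sketch of the Viterbi best-path scoring used to produce this output, assuming log-domain scores, a uniform initial state distribution, and illustrative function names.

```python
import numpy as np

def lsamp_output(log_b, log_A, log_pi=None):
    """Find the Viterbi best path and return sum_t log b_{q_t}(x_t) along it.

    log_b : (T, Q) log feature/state scores
    log_A : (Q, Q) log transition matrix
    log_pi: (Q,) log initial distribution (uniform if None)"""
    T, Q = log_b.shape
    if log_pi is None:
        log_pi = np.full(Q, -np.log(Q))
    delta = log_pi + log_b[0]
    back = np.zeros((T, Q), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A          # (from-state, to-state)
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_b[t]
    # Backtrack the best state sequence.
    q = np.empty(T, dtype=int)
    q[-1] = int(delta.argmax())
    for t in range(T - 1, 0, -1):
        q[t - 1] = back[t, q[t]]
    return float(log_b[np.arange(T), q].sum())
```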

As in Section 3.2.2.1, initializing $B$ is performed by randomly initializing the masks $M_i$, $i = 1, \ldots, Q$, using uniform probabilities of 0 and 1 at each element of each $M_i$.

In addition to the model parameters, the observations need to be associated with each state to provide a set of training samples for each mask. The method employed is to use the first $T/Q$ of the observations for state 1, the second $T/Q$ for state 2, and so forth. Of course, an implementation detail arises if $T/Q$ is not an integer, but this is not significant for the Gibbs sampler.

Modified Viterbi learning is used to update the samples used to learn the masks for each state. Using the current parameters and observation sequences, an optimal state sequence is found for each training sample with the Viterbi algorithm. Hence, for each training sample we have a paired observation and state-label sequence. Note that the second sequence is the set of state labels associated with an optimal state sequence, but we may refer to it simply as an optimal state sequence. For each state index $r \in \{1, 2, \ldots, Q\}$, we define the set of observations associated with $r$ as those observations whose optimal state label is $r$. The transition matrix is also updated using the optimal state sequences: letting $n_{ij}$ denote the number of occurrences of the consecutive state pair $i, j$ in the set of all optimal state sequences, the transition probabilities are re-estimated from these counts.
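The sketch below illustrates the uniform initial segmentation of observations into states and the count-based transition update from the decoded state sequences; the Laplace-style smoothing constant and the function names are assumptions for illustration.

```python
import numpy as np

def initial_state_assignment(T, Q):
    """Assign the first T/Q observations to state 0, the next T/Q to state 1, ...
    (rounding handles the case where T/Q is not an integer)."""
    edges = np.linspace(0, T, Q + 1).round().astype(int)
    labels = np.empty(T, dtype=int)
    for r in range(Q):
        labels[edges[r]:edges[r + 1]] = r
    return labels

def update_transitions(state_sequences, Q, smoothing=1.0):
    """Re-estimate the transition matrix from decoded state sequences.

    n_ij counts consecutive pairs (i, j); rows are normalized to probabilities."""
    counts = np.full((Q, Q), smoothing)
    for q in state_sequences:
        for a, b in zip(q[:-1], q[1:]):
            counts[a, b] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)
```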

Given the three hyperparameters for element $k$, the prior for the probabilities $p_{k1}, p_{k0}, p_{k2}$ is a Dirichlet distribution. The probability of the $k$-th element of the hit-miss mask $M$ is a multinomial distribution with counts $n_{k1} = \sum_i \mathbb{1}(M_{ik} = 1)$, $n_{k0} = \sum_i \mathbb{1}(M_{ik} = 0)$, and $n_{k2} = \sum_i \mathbb{1}(M_{ik} = -1)$. Using the Gibbs sampler, we sample these variables in turn:

1. Sample the variable $Z$ given $(C, A, M)$. Combining the likelihood terms gives a conditional proportional to $\exp\!\left(-\sum_{k=1}^{L}(C_{tk} - Z_{tk} M_{tk})^2 / (2\sigma^2)\right)$, so we sample $Z$ component-wise from the resulting Gaussian distribution.

3. Sample $M$ given $(p, C, Z)$. Since every component of $M$ is assumed to be independent, it is easy to sample component-wise. After the three conditional values are normalized, we sample $M_{tk}$ from the multinomial distribution
$$M_{tk} \mid p_{k1}, p_{k0}, p_{k2}, C, Z \sim \mathrm{multinomial}\big(P(M_{tk} = 1 \mid p_{k1}, C, Z),\, P(M_{tk} = 0 \mid p_{k0}, C, Z),\, P(M_{tk} = -1 \mid p_{k2}, C, Z)\big).$$

4. Sample $C$ given $(Z, M, \mu_{q_t}, \sigma_{q_t})$.

Rather than sampling $C$ as a matrix, it is better to sample it component-wise from the conditional Gaussian distribution.

5. Sample the mean $\mu_r$ and variance $\sigma_r$ for state $r$, given the state sequence $Q$. We first compute $x_t$ for $t = 1, \ldots, T$; then, similar to Section 3.4.2, we sample $\mu_r$ and $\sigma_r$ from the posterior conditional probability distributions, with posterior mean
$$\hat{\mu}_r = (\kappa_0 + T)^{-1}\Big(\kappa_0 \mu_0 + \sum_{x_j \in r} x_j\Big)$$
and posterior scale proportional to $(\kappa_0 + T)^{-1/2}$, where $\kappa_0$ and $\mu_0$ are prior parameters.

6. Sample the remaining parameters using the corresponding posterior distributions in Section 3.4.2.

7. Sample the transition matrix using the corresponding posterior distribution in Section 3.4.2.

The training algorithm is summarized in Figure 3-8. Initialization in this case is random.

Figure 3-1. Feature extraction process for feature learning

Figure 3-2. MCE-based training process for feature learning

Figure 3-3. Gibbs sampling HMM training process

Figure 3-4. Feature model for LSampFeaLHMM

Figure 3-5. LSampFeaLHMM training algorithm

Figure 3-6. Initial feature model for TSampFeaLHMM

Figure 3-7. Final feature model for TSampFeaLHMM

Figure 3-8. TSampFeaLHMM training algorithm

Samples are shown in Figure 4-1. The synthetic data contained 300 images from each class in the training set and 40 images from each class in the testing set. McFeaLHMM was applied to this data set to show the performance of the algorithm. We refer to this data set as SynData1.

The second synthetic data set contained image sequences, 100 sequences for each class. Each sequence had nine images, and each image was a 5 x 5 image with intensity values in the interval [0, 1]. The feature class consisted of sequences of simulated hyperbolas; these sequences therefore had three groups of images, associated with images containing line segments oriented at 45, 180, and 135 degrees, respectively. To generate the sample images, we first used a fixed, left-right transition matrix to generate state sequences. Then, according to the state sequences, line images for each state were generated. A binary image was used as the template; additive Gaussian noise (0.1 mean and 0.1 standard deviation) and salt-and-pepper noise (with probability 0.3 of changing a pixel's state) were then used to corrupt the template. The background sequences consisted of images from corrupted blank templates.
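A sketch of the SynData2 generation procedure described above, assuming hand-drawn 5 x 5 line templates, a simple three-state left-right transition matrix with illustrative values, and independent per-pixel salt-and-pepper corruption; only the noise levels are taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Binary 5x5 templates for the three states: 45-degree, horizontal, and 135-degree lines.
TEMPLATES = [np.flipud(np.eye(5)), np.zeros((5, 5)), np.eye(5)]
TEMPLATES[1][2, :] = 1.0

# Example left-right transition matrix (illustrative values).
A = np.array([[0.6, 0.4, 0.0],
              [0.0, 0.6, 0.4],
              [0.0, 0.0, 1.0]])

def corrupt(img, noise_mean=0.1, noise_std=0.1, flip_prob=0.3):
    """Additive Gaussian noise plus salt-and-pepper corruption, clipped to [0, 1]."""
    out = img + rng.normal(noise_mean, noise_std, img.shape)
    flips = rng.random(img.shape) < flip_prob
    out[flips] = 1.0 - img[flips]
    return np.clip(out, 0.0, 1.0)

def make_sequence(length=9, feature_class=True):
    """Generate one image sequence: line templates for the feature class,
    blank templates for the background class, both corrupted by noise."""
    state = 0
    images = []
    for _ in range(length):
        template = TEMPLATES[state] if feature_class else np.zeros((5, 5))
        images.append(corrupt(template))
        state = rng.choice(3, p=A[state])
    return np.stack(images)
```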

Sample sequences are shown in Figure 4-9. In the figure, there are ten sequences from each class, from top to bottom. Each row is a sequence of ten images; two adjacent images in a sequence are separated by a blank column, and two adjacent sequence rows are separated by a blank row. LSampFeaLHMM was applied to this data set to show the performance of the algorithm. We refer to this data set as SynData2.

There are three GPR data sets. They were all acquired using NIITEK time-domain GPR systems well described in the literature (Lee et al. 2007). The focus here is on a discussion of the relative performance of the new and old algorithms. The data sets contained two classes: anti-tank (AT) mines and non-mines; the mines are both plastic-cased and metal-cased. In all cases, the various HMM algorithms were applied to alarms detected by a pre-screener (Gader et al. 2004).

The first GPR data set was acquired from an arid test site. It consisted of 120 mine encounters and 120 false alarms. Data samples were extracted from pre-screener alarms. This set contains 80 images from each class in the training set and 40 images from each class in the testing set. Each data sample is a 29 x 23 image with intensity values in the interval [-1, 1]. Ten samples are shown in Figure 4-2. This author produced all the HMM results on this data set. Comparisons of the standard HMM (Ma 2004) and McFeaLHMM algorithms were made on this set. We refer to it as the GPRArid data set.

The second GPR data set was acquired from a temperate test site. It consisted of 316 mine encounters, of which 234 were plastic-cased, and 1,025 false alarms. Similar to the GPRArid data set, data samples were extracted from pre-screener alarms. Each data sample is a 29 x 23 image with intensity values in the interval [-1, 1]. Lane-based 10-fold cross validation was applied to this data set. This author produced all the HMM results from this data set. Comparisons of the standard EM-HMM, SampHMM, and LSampFeaLHMM algorithms were made on this set. We refer to it as the GPRTemp data set.

A significant point is that the HMM experiments were not run by the author on this data set. They were run by others (P. Gader and J. Bolton, pers. comm.) and verified by P. Gader, the adviser for this dissertation study. The SampHMM and DTXTHMM algorithms were compared using this set, which is referred to as the GPRTwoSite data set. Other comparisons will be made in the future, but are limited by the ability to transfer algorithms. Furthermore, true false-alarm rates are not given. False-alarm rates are given in arbitrary units, but are proportional in the sense that if algorithm A has $x$ false alarms and algorithm B has $rx$ false alarms, then algorithm B has $r$ times as many false alarms as algorithm A. Again, the focus of this research is to evaluate the relative performance of algorithms, not GPR systems; thus, absolute false-alarm rates are not necessary.

The handwritten data consist of images acquired from the MNIST data set (LeCun and Cortes). The purpose of these experiments is to compare the performance of the feature-learning HMM with an HMM trained using handmade features. TSampFeaLHMM and SampHMM are compared using this set, which is referred to as the HWDR data set. The pairing of algorithms and data sets is summarized in Table 4-1.

The DTXTHMM represents the baseline algorithm for mine detection using GPR data. It has been developed over years, and versions of it have demonstrated excellent performance on several GPR systems (Frigui et al. 2005; Gader et al.; Wilson et al. 2007; Zhao et al. 2003). We compare against it for the landmine detection experiments. The HMM algorithm for HWDR with handmade features will be described in Section 4.2.6.

Since the algorithm is sensitive to initialization, we considered two different mask initializations. One initialization used hit-miss pairs representing horizontal and vertical line segments; the other used hit-miss pairs representing diagonal and anti-diagonal line segments. The two pairs of initial masks are shown in Figure 4-3. Each mask is a 5 x 5 array with values in the interval [0, 1].

The OWA operators associated with each mask were randomly initialized. The same OWA operators were used for both the horizontal/vertical and the diagonal/anti-diagonal initializations. The weights are shown in Figure 4-4. The vertical axis is the value of the weights, and the horizontal axis is the index of the ordered elements of the mask; the height of the bar at index $i$ indicates the value of $w_i$ in the OWA feature equation. The first fifteen weights were initially set to very small values, and the other ten weights were sampled from the uniform distribution on the interval [0, 1]. The weights were then normalized for each mask.

After McFeaLHMM feature learning, the classification rate over the test set on the synthetic data was 100%. The final masks are shown in Figure 4-5. The masks are very similar to the initial masks; this is expected since these final masks can extract the hyperbola shape information of the training images. The final OWA weights are shown in Figure 4-6. These final weights are similar to a mean OWA operator, with a preference for the high-valued elements of the masks.

The results show that McFeaLHMM can achieve excellent performance with good initialization. However, the experiments also showed that the algorithm is sensitive to initialization: it could not converge with random initialization.
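A small sketch of the OWA weight initialization just described, for a 25-element (5 x 5) mask; the constant used for the first fifteen weights is an illustrative assumption.

```python
import numpy as np

def init_owa_weights(n_elements=25, n_small=15, small_value=1e-3, rng=None):
    """Initialize OWA weights: the first n_small weights in the ordering are set
    near zero, the remaining weights are drawn uniformly from [0, 1], and the
    whole vector is normalized to sum to one."""
    rng = rng or np.random.default_rng(0)
    w = np.empty(n_elements)
    w[:n_small] = small_value
    w[n_small:] = rng.uniform(0.0, 1.0, n_elements - n_small)
    return w / w.sum()
```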

GPRArid contains 80 images from each class in the training set and 40 images from each class in the testing set. The dimensionality of the feature was chosen to be four: two dimensions were extracted from the positive part of the image and two from the negative part. The masks and OWA operators were initialized the same way as for SynData1.

The algorithms were trained on the training set and tested on the test set. The plots of Probability of Detection (PD) vs. Probability of False Alarm (PFA) on the test set for the standard HMM, McFeaLHMM trained via feature learning initialized with the horizontal and vertical masks, and McFeaLHMM trained via feature learning initialized with the diagonal and anti-diagonal masks are shown in Figure 4-8. As can be seen, the PFA of the feature learning algorithm is reduced to 80% of that of the standard HMM algorithm at a PD of 90%. The features learned by McFeaLHMM on the training set are shown in Figure 4-7.

The final OWA weights trained on GPRArid did not appear qualitatively different from the weights trained on SynData1, and the final masks are similar to the initial masks. This is not surprising, since these masks can extract information from the hyperbola shape of mine images.

The experiments have shown that this MCE feature learning method is very sensitive to initialization. The learning rates also had to be carefully tuned; otherwise, the training algorithm could not converge to a stable point. In fact, identifying learning algorithms that are not so sensitive was the reason that our sampling feature learning algorithms were proposed.

The plots of PD vs. PFA for the SampHMM and DTXTHMM algorithms on different data sites are shown in Figures 4-23, 4-24, 4-25, 4-26, 4-27, and 4-28. The plots show that the PFA of SampHMM is lower than, and in some cases half of, the PFA of the standard HMM algorithm at PDs of 90% and 85%. We conclude that the SampHMM algorithm outperforms the existing HMM algorithms for GPR mine detection.

We applied three-state HMM models in the training process. The final masks after Gibbs feature learning with the hit mask only are shown in Figure 4-10. From left to right, the three states are shown in the figure. The bottom row shows the hit mask for each state; the top row shows the feature values. The horizontal axis indicates the index of images in the specified state for all sequences: the first half is from the feature class, and the second half is from the background class. The vertical axis shows the feature values of the images after applying the mask. The final hit masks are an intuitive result, what we expected: they perfectly matched the binary templates that we used to generate the training samples. The feature values are well separated between feature images and background images; a large feature value indicates a strong `hit' in that image.

The result for the 135-degree state after Gibbs feature learning with hit-miss masks is shown in Figure 4-11. The left part of the figure shows the images associated with this state at the final iteration; the top half is from the feature class, and the bottom half is from the background class, with each row being the vector format of an image. The feature values of the images in this state are shown at the top center of the figure, and the final hit-miss mask is shown at the bottom center; the intensities of the hit-miss mask image are in the interval [-1, 1]. The right part of the figure shows the matrix format of the images: the left two columns are from the feature class, the right-most column is from the background class, and adjacent images are separated by a blank row in each column. The feature-value plot shows that the feature values with the hit-miss mask are more separated than the feature values with the hit mask only.

The individual hit mask, do-not-care mask, and miss mask are shown in Figure 4-12. The intensity values of all three images are in the interval [0, 1]. The figure shows that the hit mask has an anti-diagonal shape and the miss mask is a negative anti-diagonal mask. The do-not-care mask is almost blank, which fits this data set.

We also conducted experiments to test the performance of the algorithm when images are not aligned in the same position, which requires the algorithm to shift the image to match the feature mask in the learning process. The results are shown in Figure 4-13. From the left and right parts of the figure, we can see that some images in this group do not align, but the results show that the feature values are again well separated. The final hit-miss mask still has a good anti-diagonal shape, although it is not crisp.

The individual hit mask, do-not-care mask, and miss mask are shown in Figure 4-14. The hit mask still has good shape information. The miss mask has weak values, since it is hard to match against the off-alignment positions. The do-not-care mask gains some intensity, since the offset positions may not contribute as much.

Sample sequences extracted from mine images in the GPRTemp data set are shown in Figure 4-15. Each sequence runs along a row; two sequences are separated by a horizontal gray bar, and two adjacent images in one sequence are separated by a vertical gray bar. It can be seen that the sequences consist of ascending-edge and descending-edge images.

Two hundred image sequences from the mine class were extracted from the landmine data in the training set. The final hit masks after Gibbs feature learning with a four-state HMM setting are shown in Figure 4-16. Each 5 x 5 block is one hit mask for one state, and two masks are separated by a vertical black bar. The second state has very few samples associated with it, so the second hit mask can be ignored. It is clear that the remaining three hit masks are capturing ascending-edge, flat-edge, and descending-edge information, respectively.

The final hit masks after Gibbs feature learning with a three-state HMM setting are shown in Figure 4-17. The second state again has very few samples associated with it, so its hit mask is ignored. The first and third hit masks capture the ascending-edge and descending-edge information, respectively.

Receiver operating characteristic curves comparing LSampFeaLHMM and SampHMM with the standard HMM algorithm are shown in Figure 4-18. The figure shows that the HMM sampling algorithm has the lowest PFA at 90% PD, and the PFA of the Gibbs feature learning algorithm matches or improves on that of the standard HMM algorithm at most PDs. The results show that the HMM sampling algorithm performed best on the landmine GPR data set, and that our feature learning algorithm (LSampFeaLHMM) can match or exceed HMM algorithms that use human-made feature masks, thus saving time and labor.

The feature model for the handwritten digit experiments is the one shown in Figure 3-7. In experiments on the TSampFeaLHMM algorithm, these feature masks were estimated in the training process. The human-made masks are shown in Figure 4-22. They are nine line segments oriented at 0, 20, 40, 50, 70, 90, 110, 130, and 160 degrees, respectively. These masks were created to simulate commonly used edge detectors (Frigui et al. 2009). An all-blank mask was added to capture empty background zones.

Each raw digit image is a 28 x 28 gray-level image. The intensity values were first scaled to [0, 1]. Next, a principal-axis transform was applied to the images so that the two principal directions of each image were aligned with the horizontal and vertical axes. The background values of the images were then set to -1, and zero values were padded at the edges of the digits. Some samples are shown in Figure 4-19. Next, each image was split into sixteen overlapping 8 x 8 zones, as shown in Figure 4-20. Ordering the zones along the anti-diagonal direction, from top-right to bottom-left, formed a sequence from these sixteen zones.
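A sketch of the zone splitting and anti-diagonal ordering described above; the window stride (chosen here so that four 8 x 8 windows span the 28-pixel extent) and the exact traversal order are assumptions, since neither is stated explicitly in the text.

```python
import numpy as np

def split_into_zones(img, zone=8, grid=4):
    """Split a 28x28 digit image into grid x grid overlapping zone x zone windows
    and order them sweeping from the top-right corner toward the bottom-left."""
    H, W = img.shape
    # Window start positions; rounding makes the last window end at the border.
    starts = np.linspace(0, H - zone, grid).round().astype(int)
    zones = {(r, c): img[starts[r]:starts[r] + zone, starts[c]:starts[c] + zone]
             for r in range(grid) for c in range(grid)}
    # One plausible reading of the top-right to bottom-left anti-diagonal order.
    order = sorted(zones, key=lambda rc: (rc[0] - rc[1], rc[0]))
    return [zones[rc] for rc in order]

# Example: a sequence of 16 zones from one (dummy) digit image.
sequence = split_into_zones(np.zeros((28, 28)))
assert len(sequence) == 16 and sequence[0].shape == (8, 8)
```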

Seven-state HMM models were used in the TSampFeaLHMM experiment with digit zone sequences, and full transition matrices were used. First, in the training process, the feature masks and HMM state parameters were learned simultaneously. Sampling was performed for 10,000 iterations, a sufficient burn-in period; then the feature masks were fixed and the HMM parameters were updated for subsequent iterations. Training was stopped after 15,000 iterations. This second loop was used to fine-tune the HMM parameters. The same training process, without the feature learning step, was used for SampHMM.

Classification results of TSampFeaLHMM are shown in Table 4-2 as the confusion matrix for the digit pair 0 and 1. The row index is the true class and the column index is the algorithm's classification. A classification error of about 2% is obtained for these two digit classes.

Classification results of TSampFeaLHMM for digit classes 0, 1, 2, and 4 are shown in Table 4-3. Correct classification of digit class 2 was the most difficult. A classification error of about 8% is obtained for these four digit classes.

The final hit-miss masks and transition matrices are shown in Figure 4-21. The size of each pixel indicates the relative value of the corresponding element in the transition matrix. It is difficult to extract conclusive information from these hit masks.

Classification results of the HMM algorithm with human-made masks for digits 0, 1, 2, and 4 are shown in Table 4-4. A classification error of about 14% is obtained for these digits, which is worse than our feature learning algorithm. These results show that the feature-learning HMM outperforms the HMM algorithm using human-made features.

Table 4-1. Algorithms and data sets. An X indicates that the algorithm in the column was evaluated on the data set in the row.

Data set      McFeaLHMM  LSampFeaLHMM  TSampFeaLHMM  SampHMM  Standard HMM (EM)  DTXTHMM
SynData1          X
SynData2                       X
GPRArid           X                                                  X
GPRTemp                        X                        X            X
GPRTwoSite                                              X                            X
HWDR                                          X         X*
* Using handmade features

Table 4-2. Confusion matrix for digit pair 0 and 1 for TSampFeaLHMM

Digits      0      1
0         300      0
1           8    292

Table 4-3. Confusion matrix for digits 0, 1, 2, 4 for TSampFeaLHMM

Digits      0      1      2      4
0         277      0     10     13
1           0    292      1      6
2          12      0    256     32
4           2      1     13    284

Table 4-4. Confusion matrix for digits 0, 1, 2, 4 for HMM with human-made masks

Digits      0      1      2      4
0         256      1     37      6
1           1    291      5      3
2          13      0    274     13
4          52      1     38    209
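For reference, the error rates quoted in the text can be reproduced from these confusion matrices with a short computation; the matrix below is Table 4-3 as parsed here.

```python
import numpy as np

# Confusion matrix from Table 4-3 (rows: true digits 0, 1, 2, 4; columns: predicted).
conf = np.array([[277,   0,  10,  13],
                 [  0, 292,   1,   6],
                 [ 12,   0, 256,  32],
                 [  2,   1,  13, 284]])

correct = np.trace(conf)
total = conf.sum()
print(f"overall error: {1 - correct / total:.3f}")            # about 0.08
per_class_error = 1 - np.diag(conf) / conf.sum(axis=1)
print("per-class error:", np.round(per_class_error, 3))       # class 2 is hardest
```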

(b) Class 2: 60-degree angle images; above are 10 randomly picked images from class 2.
Figure 4-1. Ten samples from each class of data set SynData1.

(b) 10 samples from the non-mines data set
Figure 4-2. Samples from each class of data set GPRArid.

(b) Diagonal and anti-diagonal pairs
Figure 4-3. Hit-miss pairs for initial masks.

Figure 4-4. Initial OWA weights for hit and miss. The top weights correspond to the hit mask and the bottom weights correspond to the miss mask.

(b) Diagonal and anti-diagonal pairs
Figure 4-5. Hit-miss masks after feature learning corresponding to the initial masks in Figure 4-3.

(b) Hit and miss weights for diagonal and anti-diagonal initialization
Figure 4-6. OWA weights after feature learning.

(b) Diagonal and anti-diagonal pairs
Figure 4-7. Final masks learned for the landmine data. Each row represents a different feature: rows 1 and 2 positive, rows 3 and 4 negative.

Figure 4-8. Receiver operating characteristic curves comparing McFeaLHMM with two different initializations to the standard HMM.

Figure 4-9. Left: ascending-edge, flat-edge, and descending-edge sequences. Right: sequences from the noise background.

Figure 4-10. Hit masks after Gibbs feature learning.

Figure 4-11. Result for the 135-degree state after Gibbs feature learning with hit-miss masks.

Figure 4-12. Hit-miss masks after Gibbs feature learning.

Figure 4-13. Result with shifted training images after Gibbs feature learning with hit-miss masks.

Figure 4-14. Hit-miss masks after Gibbs feature learning with shifted training images.

Figure 4-15. Twenty-five sample sequences extracted from mine images from data set GPRTemp.

Figure 4-16. Hit masks after Gibbs feature learning with a four-state HMM setting.

Figure 4-17. Hit masks after Gibbs feature learning with a three-state HMM setting.

Figure 4-18. Receiver operating characteristic curves comparing the LSampFeaLHMM and SampHMM algorithms with the standard HMM algorithm.

Figure 4-19. Eighteen samples for each digit from MNIST.

(b) Digit 1 (c) Digit 2 (d) Digit 4
Figure 4-20. Two samples for each digit to show zone splitting.

(b) Hit masks and transition matrix for digit 1 (c) Hit masks and transition matrix for digit 2 (d) Hit masks and transition matrix for digit 4
Figure 4-21. Hit masks and transition matrix after Gibbs feature learning for digits.

Figure 4-22. Ten human-made masks.

Figure 4-23. HMMSamp vs DTXTHMM on GPR A1 at site S1 while clutter == mine.

Figure 4-24. HMMSamp vs DTXTHMM on GPR A1 at site S1 while clutter = mine.

Figure 4-25. HMMSamp vs DTXTHMM on GPR A2 at site S1 while clutter = mine.

Figure 4-26. HMMSamp vs DTXTHMM on GPR A2 at site S2 while clutter = mine.

Figure 4-27. HMMSamp vs DTXTHMM on GPR A2 at site S1 while clutter == mine.

Figure 4-28. HMMSamp vs DTXTHMM on GPR A2 at site S2 while clutter == mine.

The performance of feature-based learning methods such as HMMs depends not only on the design of the classifier but also on the features. Few studies have investigated both as a whole system. Features that accurately and succinctly represent the discriminating information in an image or signal are very important to any classifier.

Our approach involved developing a parameterized model of feature extraction based on morphological or linear convolution operations. To learn the parameters of the feature model and the HMM, two feature learning algorithms were developed; they simultaneously extract the features and learn the parameters of the HMM model. One algorithm is based on minimum classification error, and the other is based on Gibbs sampling. The Gibbs sampling method is used so that learning is more robust to initialization and achieves a better solution. The experiments show that this new method can outperform the other methods in the landmine detection application.

Additionally, a new method for learning the parameters of HMMs with multivariate Gaussian mixtures has been presented. This method has been shown to improve performance on both synthetic and real data sets compared to existing state-of-the-art methods and to human-made features.

Specifically, the following results were achieved:


REFERENCES

K. Bae. Bayesian Model-based Approaches with MCMC Computation to Some Bioinformatics Problems. PhD thesis, Texas A&M University, 2005.
M. Beal, Z. Ghahramani, and C. Rasmussen. The infinite hidden Markov model. In Machine Learning, pages 29, Cambridge, MA, 2002. MIT Press.
S. Belongie, C. Carson, H. Greenspan, and J. Malik. Color- and texture-based image segmentation using EM and its application to content-based image retrieval. In Proc. Int'l Conf. Computer Vision, pages 675, 1998.
J. A. Bilmes. A gentle tutorial on the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Technical report, University of California, Berkeley, 1997.
A. L. Blum and P. Langley. Selection of relevant features and examples in machine learning. Artificial Intelligence, 97:245, 1997.
C. Burges. Geometric methods for feature extraction and dimensional reduction. Technical Report 55, Microsoft Research, November 2004.
G. Casella and E. I. George. Explaining the Gibbs sampler. The American Statistician, 46(3):167, 1992.
S. Chen and X. Yang. Alternative linear discriminant classifier. Pattern Recognition, 37:1545, 2004.
Y.-N. Chen, C.-C. Han, C.-T. Wang, B.-S. Jeng, and K.-C. Fan. The application of a convolution neural network on face and license plate detection. International Conference on Pattern Recognition, 3:552, 2006.
S. Choi. Sequential EM learning for subspace analysis. Pattern Recognition Letters, 25:1559-1567, 2004.
D. Cohn. Informed projections. In Advances in Neural Information Processing Systems 15, pages 849, Cambridge, MA, 2003. MIT Press.
T. G. Dietterich. Machine learning research: Four current directions. AI Magazine, 18(4):97, 1997.
A. Dunmur and D. Titterington. Computational Bayesian analysis of hidden Markov mesh models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(11):1296, 1997.
J. G. Dy and C. E. Brodley. Feature selection for unsupervised learning. Journal of Machine Learning Research, 5:845, April 2004.
M. A. Figueiredo. Adaptive sparseness for supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(9):1150, September 2003.
H. Frigui, K. C. Ho, and P. Gader. Real-time landmine detection with ground-penetrating radar using discriminative and adaptive hidden Markov models. EURASIP J. Appl. Signal Process., 2005:1867, 2005.
H. Frigui, A. Fadeev, A. Karem, and P. Gader. Adaptive edge histogram descriptor for landmine detection using GPR. In Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, volume 7303, May 2009.
D. Gabor. Theory of communication. J. Inst. Electr. Engrs, 93(26):429-457, November 1946.
P. Gader and M. Khabou. Automatic feature generation for handwritten digit recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(12):1246, December 1996.
P. Gader and M. Popescu. Generalized hidden Markov models for landmine detection. Proc. SPIE, 4742:349, April 2002.
P. Gader, M. Mystkowski, and Y. Zhao. Landmine detection with ground penetrating radar using hidden Markov models. IEEE Transactions on Geoscience and Remote Sensing.
P. Gader, J. R. Miramonti, Y. Won, and P. Coffield. Segmentation free shared weight networks for automatic vehicle detection. Neural Networks, 8(9):1457, 1995.
P. Gader, M. Khabou, and A. Koldobsky. Morphological regularization neural networks. Pattern Recognition, 33:935, 2000.
P. Gader, W.-H. Lee, and J. Wilson. Detecting landmines with ground-penetrating radar using feature-based rules, order statistics, and adaptive whitening. IEEE Transactions on Geoscience and Remote Sensing, 42(11):2522, November 2004.
C. Garcia and M. Delakis. Convolutional face finder: A neural architecture for fast and robust face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(11):1408, 2004.
Z. Ghahramani, T. L. Griffiths, and P. Sollich. Bayesian nonparametric latent feature models. Bayesian Statistics, pages 201, 2007.
J. Gil and R. Kimmel. Efficient dilation, erosion, opening, and closing algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(12):1606, December 2002.
J. Gil and M. Werman. Computing 2-D min, median, and max filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(5):504, May 1993.
A. R. Golding and D. Roth. A Winnow-based approach to spelling correction. Machine Learning, 34:107, 1999.
H. Greenspan, R. Goodman, and R. Chellappa. Texture analysis via unsupervised and supervised learning. International Joint Conference on Neural Networks (IJCNN-91-Seattle), 1:639, July 1991.
T. L. Griffiths and Z. Ghahramani. Infinite latent feature models and the Indian buffet process. Technical Report 2005-01, Gatsby Computational Neuroscience Unit, University College London, 2005.
I. Guyon and A. Elisseeff. An introduction to variable and feature selection. Journal of Machine Learning Research, 3:1157, March 2003.
I. Guyon, A. B. Hur, S. Gunn, and G. Dror. Result analysis of the NIPS 2003 feature selection challenge. In Advances in Neural Information Processing Systems 17, pages 545, Cambridge, MA, 2004. MIT Press.
D. Haun, K. Hummel, and M. Skubic. Morphological neural network vision processing for mobile robots. Technical report, Dept. of Computer Engineering and Computer Science, University of Missouri-Columbia, September 2000.
K. Hild II, D. Erdogmus, K. Torkkola, and J. C. Principe. Feature extraction using information-theoretic learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(9):1395, September 2006.
P. Huber. Projection pursuit. The Annals of Statistics, 13(2):435-475, 1985.
A. Hyvarinen and E. Oja. Independent component analysis: Algorithms and applications. Neural Networks, 13(4-5):411, June 2000.
M. Inoue and N. Ueda. Exploitation of unlabeled sequences in hidden Markov models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(12):1570, December 2003.
M. Jones and R. Sibson. What is projection pursuit? Journal of the Royal Statistical Society, Series A, 150(1):1, 1987.
M. Khabou and P. Gader. Automatic target detection using entropy optimized shared-weight neural networks. IEEE Transactions on Neural Networks, 11(1):186, January 2000.
M. Khabou, P. Gader, and J. Keller. Ladar target detection using morphological shared-weight neural networks. Machine Vision and Applications, 11:300-305, 2000.
B. Krishnapuram. Adaptive Classifier Design Using Labeled and Unlabeled Data. PhD thesis, Duke University, 2004.
B. Krishnapuram, A. Hartemink, L. Carin, and M. Figueiredo. A Bayesian approach to joint feature selection and classifier design. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(9):1105, September 2004.
S. Kullback and R. A. Leibler. On information and sufficiency. The Annals of Mathematical Statistics, 22(1):79, 1951.
V. Kyrki, J.-K. Kamarainen, and H. Kalviainen. Simple Gabor feature space for invariant object recognition. Pattern Recognition Letters, 25:311, 2004.
M. Law, M. Figueiredo, and A. Jain. Simultaneous feature selection and clustering using mixture models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(9):1154, September 2004.
Y. LeCun and C. Cortes. MNIST handwritten digit database. Available at
Y. LeCun, B. Boser, J. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. Jackel. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4):541, 1989.
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278, November 1998.
K. E. Lee, N. Sha, E. R. Dougherty, M. Vannucci, and B. K. Mallick. Gene selection: A Bayesian variable selection approach. Bioinformatics, 19(1):90, January 2003.
W.-H. Lee, P. D. Gader, and J. N. Wilson. Optimizing the area under a receiver operating characteristic curve with application to landmine detection. IEEE Transactions on Geoscience and Remote Sensing, 45(2):389, February 2007.
J. M. Leiva-Murillo and A. Artes-Rodriguez. Maximization of mutual information for supervised linear feature extraction. IEEE Transactions on Neural Networks, 18(5):1433, September 2007.
N. Littlestone. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. In 28th Annual Symposium on Foundations of Computer Science, pages 68, October 1987.
H. Liu and H. Motoda. Computational Methods of Feature Selection. Chapman & Hall/CRC, 2007.
X. Ma, D. Schonfeld, and A. Khokhar. A general two-dimensional hidden Markov model and its application in image classification. In IEEE International Conference on Image Processing (ICIP 2007), volume 6, pages 41, October 2007.
D. MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003.
M. Morup, K. H. Madsen, and L. K. Hansen. Approximate L0 constrained non-negative matrix and tensor factorization. In ISCAS 2008 special session on Non-negative Matrix and Tensor Factorization and Related Problems, 2008.
J. R. Movellan. Tutorial on Gabor filters. Tutorial paper.
M. Najjar, C. Ambroise, and J.-P. Cocquerez. Feature selection for semi-supervised learning applied to image retrieval. In IEEE ICIP 2003, pages 559, September 2003.
S. P. Nanavati and P. K. Panigrahi. Wavelets: Applications to image compression-I. Resonance, 10(2):52, February 2005.
R. M. Neal. Probabilistic inference using Markov chain Monte Carlo methods. Technical Report CRG-TR-93-1, University of Toronto, 1993.
C. Nebauer. Evaluation of convolutional neural networks for visual recognition. IEEE Transactions on Neural Networks, 9(4):685, July 1998.
N. Otsu. A threshold selection method from gray-level histograms. IEEE Trans. Sys., Man, Cyber., (9):62-66, 1979.
H. Peng, F. Long, and C. Ding. Feature selection based on mutual information: Criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8):1226-1238, August 2005.
S. Petridis and S. Perantonis. On the relation between discriminant analysis and mutual information for supervised linear feature extraction. Pattern Recognition, 37:857, 2004.
R. Porter, N. Harvey, S. Perkins, J. Theiler, S. Brumby, J. Bloch, M. Gokhale, and J. Szymanski. Optimizing digital hardware perceptrons for multi-spectral image classification. Journal of Mathematical Imaging and Vision, 19:133, 2003.
D. Prescher. A short tutorial on the expectation-maximization algorithm. Institute for Logic, Language and Computation, University of Amsterdam, 2004.
J. C. Principe, J. Fisher III, and D. Xu. Information-theoretic learning, May 1998.
L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257, February 1989.
M. Rizki, L. Tamburino, and M. Zmuda. Multi-resolution feature extraction from Gabor filtered images. In Proceedings of the IEEE 1993 National Aerospace and Electronics Conference, pages 819, 1993.
M. Rogati and Y. Yang. High-performing feature selection for text classification. In CIKM '02: Proceedings of the Eleventh International Conference on Information and Knowledge Management, pages 659, New York, NY, USA, 2002. ACM.
V. Roth and T. Lange. Feature selection in clustering problems. In S. Thrun, L. Saul, and B. Scholkopf, editors, Advances in Neural Information Processing Systems 16. MIT Press, Cambridge, MA, 2004.
S. Roweis. EM algorithms for PCA and SPCA. In NIPS '97: Proceedings of the 1997 Conference on Advances in Neural Information Processing Systems 10, pages 626, Cambridge, MA, 1998. MIT Press.
T. Ryden and D. M. Titterington. Computational Bayesian analysis of hidden Markov models. Journal of Computational and Graphical Statistics, 7(2):194, June 1998.
S. Saha. Image compression - from DCT to wavelets: A review. Crossroads, 6(3):12, 2000.
H. Sahoolizadeh, M. Rahimi, and H. Dehghani. Face recognition using morphological shared-weight neural networks. Proceedings of World Academy of Science, Engineering and Technology, 35:556, November 2008.
J. Serra. Image Analysis and Mathematical Morphology. Academic Press, Orlando, FL, USA, 1983.
C. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27:379 and 623, July and October 1948.
Q. Sheng, G. Thijs, Y. Moreau, and B. D. Moor. Applications of Gibbs sampling in bioinformatics, 2005. Internal Report 05-65.
L. I. Smith. A tutorial on principal components analysis, February 2002. Cornell University.
M. Stanke and S. Waack. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics, 19(2):215, 2003.
Y. Sun. Iterative RELIEF for feature weighting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6):1, June 2007.
L. A. Tamburino and M. M. Rizki. Automated feature detection using evolutionary learning processes. In Proceedings of the IEEE 1989 National Aerospace and Electronics Conference, volume 3, pages 1080, May 1989a.
L. A. Tamburino and M. M. Rizki. Automatic generation of binary feature detectors. IEEE Aerospace and Electronic Systems Magazine, 4(9):20, 1989b.
S. Thrun, J. C. Langford, and D. Fox. Monte Carlo hidden Markov models: Learning non-parametric models of partially observable stochastic processes. In Proc. of the International Conference on Machine Learning (ICML), pages 415. Morgan Kaufmann, 1999.
R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Methodological), 58(1):267, 1996.
M. E. Tipping and C. M. Bishop. Mixtures of probabilistic principal component analyzers. Neural Computation, 11(2):443, July 1998.
K. Torkkola. Feature extraction by non-parametric mutual information maximization. Journal of Machine Learning Research, 3:1415, March 2003.
E. Urbach, J. Roerdink, and M. Wilkinson. Connected shape-size pattern spectra for rotation and scale-invariant classification of gray-scale images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(2):272, February 2007.
M. E. Wall, A. Rechtsteiner, and L. M. Rocha. Singular value decomposition and principal component analysis. In A Practical Approach to Microarray Data Analysis, pages 91, 2003.
S. Wang, H. Chen, S. Li, and D. Zhang. Feature extraction from tumor gene expression profiles using DCT and DFT. In EPIA Workshops, pages 485, 2007.
X. Wang and K. K. Paliwal. Feature extraction and dimensionality reduction algorithms and their applications in vowel recognition. Pattern Recognition, 36:2429, 2003.
H. Wei and S. Billings. Feature subset selection and ranking for data dimensionality reduction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(1):162, January 2007.
J. Weston, S. Mukherjee, O. Chapelle, M. Pontil, T. Poggio, and V. Vapnik. Feature selection for SVMs. In Advances in Neural Information Processing Systems 13, pages 668, Cambridge, MA, 2001. MIT Press.
J. Wilson, P. Gader, W.-H. Lee, H. Frigui, and K. Ho. A large-scale systematic evaluation of algorithms using ground-penetrating radar for landmine detection and discrimination. IEEE Transactions on Geoscience and Remote Sensing, 45(8):2560, August 2007.
D. Wipf and B. Rao. L0-norm minimization for basis selection. Advances in Neural Information Processing Systems 17, 2005.
L. Wolf and A. Shashua. Feature selection for unsupervised and supervised inference: the emergence of sparsity in a weighted-based approach. IEEE International Conference on Computer Vision, 1:378, 2003.
Y. Won and P. D. Gader. Morphological shared-weight neural network for pattern classification and automatic target detection. In IEEE International Conference on Neural Networks, 1995 Proceedings, volume 4, pages 2134, 1995.
J. Yang and J. Yang. Why can LDA be performed in PCA transformed space. Pattern Recognition, 36:563, 2003.
J. Yang and J. Yang. Generalized KL transform based combined feature extraction. Pattern Recognition, 35:295, January 2002.
K. Y. Yeung and W. L. Ruzzo. Principal component analysis for clustering gene expression data. Bioinformatics, 17(9):763, September 2001.
J. Yu and B. Bhanu. Evolutionary feature synthesis for facial expression recognition. Pattern Recognition Letters, 27:1289, 2006.
H. Zen, Y. Nankaku, K. Tokuda, and T. Kitamura. Estimating trajectory HMM parameters using Monte Carlo EM with Gibbs sampler. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2006), volume 1, pages 1173, May 2006.
X. Zhang, P. Gader, and H. Frigui. Feature learning for a hidden Markov model approach to landmine detection. In Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, volume 6553, May 2007.
D. Zhao, C. Liu, and Y. Zhang. Discriminant feature extraction using dual-objective optimization model. Pattern Recognition Letters, 27:929, 2006.
Y. Zhao, P. Gader, P. Chen, and Y. Zhang. Training DHMMs of mine and clutter to minimize landmine detection errors. IEEE Transactions on Geoscience and Remote Sensing, 41(5):1016, May 2003.
M. A. Zmuda and L. A. Tamburino. Efficient algorithms for the soft morphological operators. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(11):1142, November 1996.

BIOGRAPHICAL SKETCH

Xuping Zhang is a Ph.D. student at the University of Florida. He earned his bachelor's degree at Tsinghua University, Beijing, China. His research interests include landmine detection, artificial intelligence, machine learning, Bayesian methods, feature learning for images/signals, data mining, and pattern recognition.