
Bregman Metrics and Their Applications

Permanent Link: http://ufdc.ufl.edu/UFE0021037/00001

Material Information

Title: Bregman Metrics and Their Applications
Physical Description: 1 online resource (178 p.)
Language: english
Creator: Chen, Pengwen
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2007

Subjects

Subjects / Keywords: bregman, capacity, cardiac, classification, clustering, curve, dynamic, hellinger, kullback, metric, primal, triangle
Mathematics -- Dissertations, Academic -- UF
Genre: Mathematics thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: In this work, I provide three different but related discussions of a famous distance: the Kullback-Leibler (KL) divergence. First, I construct two classes of distances in a manner similar to the construction of the KL divergence, and I give a careful discussion to verify these mathematical results. Second, I apply these distances to data clustering and classification. I propose a novel distribution classifier, which can be used to classify cardiac contours. Third, since the KL divergence has a useful property called parametrization independence, it leads to a well-defined distance for matching curves. It provides symmetry and transitivity in curve matching by matching both curves to an average curve. Based on this framework, I provide two novel models: the location weighted model and the Jensen-Shannon-Hellinger (JSH) model. The location weighted model is suitable for matching curves under occlusion. The JSH model is a robust model based on total angle variation, and it is capable of handling multiple curves. In the JSH model, the optimal matching is the one under which the average curve has the maximal sum of feature measures, including arc length and total angle variation.
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by Pengwen Chen.
Thesis: Thesis (Ph.D.)--University of Florida, 2007.
Local: Adviser: Chen, Yunmei.
Local: Co-adviser: Rao, Murali.
Electronic Access: RESTRICTED TO UF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE UNTIL 2009-08-31

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2007
System ID: UFE0021037:00001



My first and foremost thanks go to my advisers, Dr. Yunmei Chen and Dr. Murali Rao. Without their constant encouragement and support, I would not have been able to complete this research work. They were very generous with their time and help. In particular, they gave me a great deal of helpful mathematical and non-mathematical advice and assistance throughout my research work. I am grateful to my supervisory committee members (William Hager, Gopalakrishnan Jayadeep, Jose Principe, and Rongling Wu) for their help with my study. It is a pleasure to acknowledge their suggestions on this research work. I benefited from every discussion with them. I am especially grateful to Dr. Gopalakrishnan Jayadeep and Dr. William Hager for offering numerical courses. Listening to their lectures was a pleasant experience. I would also like to thank the professors in my department who helped me on various occasions, and in particular Dr. John Klauder for our enjoyable discussions during my first year. Last but not least, I want to thank my wife April, my parents and her parents for their understanding and support.

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 MOTIVATION AND OUTLINE OF THIS PAPER

2 BREGMAN METRICS
  2.1 Introduction
  2.2 Preliminaries and Notations
    2.2.1 Bregman Divergence of Order [...], Measure Invariance
    2.2.2 Bregman Divergences vs. Exponential Families
    2.2.3 The Critical Power 1/2
  2.3 The First Type of Metrics
    2.3.1 Square Root Metric
    2.3.2 Bregman Divergence of Order [...] and L[...] Norm
  2.4 The Second Kind of Metrics with Bregman Divergences
  2.5 Mini-max Metrics: Capacity, Mini-max IS Distance
    2.5.1 Range of The Maximizer p
    2.5.2 Capacity: Basic Properties
    2.5.3 Capacity vs. L1 norm
    2.5.4 Capacity Metric
    2.5.5 Mini-max IS Distance

3 KULLBACK-LEIBLER DIVERGENCE BASED CLASSIFIER
  3.1 Introduction
  3.2 Notions and Definitions
  3.3 Primal Problem vs. Dual Problem
    3.3.1 Equivalence in Primal Problems
    3.3.2 Equivalence in Dual Problems
    3.3.3 Existence of Dual Problem and Discussions on Inactive Constraint
    3.3.4 Existence of Solutions in Primal and Dual Problems
  3.4 Effects of The Parameters C1, C2
  3.5 Location Weighted Classifier

4 APPLICATION IN CARDIAC NORMALITY SHAPE ANALYSIS
  4.1 Numerical Algorithm
    4.1.1 [...]
    4.1.2 Inner Loop of The Algorithm and Its Properties
    4.1.3 The Properties of Newton Iteration
    4.1.4 Capacity Algorithm
  4.2 Numerical Experiments
  4.3 Modeling Cardiac Shape
    4.3.1 Introduction
    4.3.2 Contour Representation: Area Distribution
    4.3.3 Clustering Endocardial Contours
    4.3.4 Classify Endocardial Contours
    4.3.5 Result and Discussion
    4.3.6 Location Weighted Classifier
  4.4 Conclusion

5 KL DIVERGENCE BASED MULTIPLE CURVES MATCHING
  5.1 Introduction
  5.2 Problem Description
    5.2.1 Model Description
    5.2.2 Definitions and Notations
    5.2.3 Existence and Uniqueness
  5.3 Variational Models with Desired Properties
  5.4 Location Weighted Matching Model
  5.5 Models of Preferring the Same Signs of Matched Curvatures
    5.5.1 Hellinger Distance on Bending Energy
    5.5.2 Another Viewpoint of This Hellinger Distance Model
    5.5.3 Comparison between Hellinger Distance and JS Divergence
    5.5.4 De-noising Ability and Stability
  5.6 Numerical Algorithms and Experiments
  5.7 Conclusion

6 MISCELLANEOUS: APPLICATION IN CLUSTERING ANALYSIS
  6.1 Clustering States in The Racial Structure
  6.2 Clustering the Endocardial Contours

7 FUTURE WORK

APPENDIX

A PROOFS OF THE NECESSARY CONDITIONS OF BREGMAN METRICS

B MORE PROOFS
  B.1 Still A Metric: Capacity to The Power r = 0.4959
  B.2 Several Proofs on The Range of p, When n = 2

C [...]
  C.1 Existence and Uniqueness of The Optimal Mapping
  C.2 Relationships between JS Divergence and Hellinger distance

LIST OF REFERENCES
BIOGRAPHICAL SKETCH

Table
2-1 Examples of the first kind of metrics.
2-2 Examples of minimax Bregman divergences.
5-1 Experiment results of JSH model.
6-1 Results of KL-l1 clustering states
6-2 Results of KL-l1 clustering states in 10 groups

Figure
2-1 Bregman divergence B_f(x, y).
2-2 The relative locations of points mentioned in Theorem 2.6
2-3 The shift [...] indicates the divergence.
2-4 The example plots of the function g_r(x).
4-1 Numerical computations of our capacity algorithm.
4-2 Comparison between our algorithm and B-A algorithm.
4-3 Clustering all 91 normal shapes into 4 groups.
4-4 The weight function.
4-5 Training and testing experiment result using C1 = C2 = 0.5.
4-6 Training and testing results using C1 = C2 = 1.0.
4-7 Histogram of classification results.
5-1 Non-Transitivity of pairwise matching.
5-2 The number of possible mappings.
5-3 Setting: Definitions of G1 and [...]1.
5-4 A zigzagged curve.
5-5 Comparison of JS divergence and Hellinger distance.
5-6 Stability of the global minimizer.
5-8 The result of the location weighted model.
5-9 The result of JSH model with different C.
6-1 Clustering result of 44 heart shapes into 8 groups
B-1 Bound Illustration of the minimizer p.

In my study, I provide three different but related discussions on Kullback-Leibler (KL) divergence. First, it is one famous member among Bregman divergences. It is known that the square root of the averaged KL divergence (the Jensen-Shannon divergence) is a metric. I provide a necessary and sufficient condition to determine which Bregman divergences become a metric through the procedure of averaging and taking the square root. In addition, I prove that capacity to the power 1/e is a metric, where capacity is the minimax sphere in the sense of KL divergence. Second, it is known that Bregman divergences provide a framework for data clustering. We use KL divergences to cluster states by racial structure and to cluster cardiac contours; we also provide a novel classifier based on KL divergences. Its mathematical properties are explored and justified. We provide an efficient numerical algorithm for this classifier, and conduct an experiment to examine the normality of cardiac contours. Since KL divergence has a nice property, called parametrization independence, it provides a well-defined distance in matching curves. It provides the symmetric and transitive properties in matching curves by matching both curves to an average curve. Based on this framework, we provide two novel models: the location weighted model and the Jensen-Shannon-Hellinger (JSH) model. The location weighted model is suitable for matching curves under occlusion. The JSH model is a model based on total angle variation, and it is capable of handling multiple curves. In this JSH model, the optimal matching [...]

In my study, three different but related topics are addressed. These discussions are all linked to the information divergence, which is also known as Kullback-Leibler divergence.

Kullback-Leibler divergence is also called the "relative entropy." It provides a distance for probability distribution functions, and many of its properties are discussed in information theory; in particular, it is closely related to the maximum likelihood principle. However, this divergence is not a metric, since it is not symmetric and does not satisfy the triangle inequality. Recently, Endres and Schindelin [22] proposed a metric based on Kullback-Leibler divergence. Their idea was to construct an average Kullback-Leibler divergence, also known as the Jensen-Shannon divergence. Their metric was simply the square root of the Jensen-Shannon divergence. One can view Kullback-Leibler divergence as one special case of Bregman divergences [10]. A natural question then follows: can we construct metrics by taking the square root of arbitrary average Bregman divergences? (Let us call these Bregman square root metrics, or square root metrics for short.)

My study discusses two classes of metrics associated with this idea. In my work, surprisingly, we can answer this question by providing an explicit criterion on the choice of Bregman divergences, which is a necessary and sufficient condition. One motivation for this metric can be illustrated by the following simple intuitive thought. Suppose we are given two points in a space with a distance. If their distance is very small, then they tend to be indistinguishable. In other words, it is very likely that they are simply one point instead of two, and their nonzero distance is simply a measurement error. This likelihood of being one point indicates another "natural" distance between two points.

In the case of Kullback-Leibler divergence, the scenario is as follows: given two empirical distributions $P_1, P_2$ with data counts $k_1, k_2$, their maximum likelihood of coming from another "central distribution" $Q$ is given by $\exp\bigl(-k_1\,KL(P_1,Q) - k_2\,KL(P_2,Q)\bigr)$ [...]
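The two ideas in this passage can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the helper names (`kl`, `js`, `central`, `cost`, `random_dist`), the counts 3 and 7, and the use of random four-point distributions are all hypothetical choices, not taken from the thesis. It tests the triangle inequality for the square root of the Jensen-Shannon divergence, and it tests that the count-weighted average of $P_1$ and $P_2$ is never beaten as the "central distribution" $Q$ by a randomly sampled candidate.

```python
import math
import random

def kl(p, q):
    """KL(p, q) = sum_i p_i log(p_i / q_i) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    """Jensen-Shannon divergence: average KL divergence to the midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * (kl(p, m) + kl(q, m))

def central(p1, p2, k1, k2):
    """Count-weighted average of two empirical distributions."""
    return [(k1 * a + k2 * b) / (k1 + k2) for a, b in zip(p1, p2)]

def cost(p1, p2, k1, k2, q):
    """Negative logarithm of exp(-k1 KL(p1, q) - k2 KL(p2, q))."""
    return k1 * kl(p1, q) + k2 * kl(p2, q)

def random_dist(n, rng):
    """Random strictly positive probability vector of length n."""
    w = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]

rng = random.Random(0)   # fixed seed so the run is reproducible
k1, k2 = 3, 7            # hypothetical data counts
triangle_ok = True
central_ok = True
for _ in range(2000):
    p, q, r = (random_dist(4, rng) for _ in range(3))
    d_pq, d_qr, d_pr = (
        math.sqrt(max(js(a, b), 0.0)) for a, b in ((p, q), (q, r), (p, r))
    )
    # Triangle inequality for the square root of the JS divergence.
    if d_pr > d_pq + d_qr + 1e-12:
        triangle_ok = False
    # The weighted average should not be beaten by the random candidate r.
    if cost(p, q, k1, k2, r) < cost(p, q, k1, k2, central(p, q, k1, k2)) - 1e-12:
        central_ok = False

print("triangle inequality held:", triangle_ok)
print("weighted average was never beaten:", central_ok)
```

A random search of this kind is only a sanity check, not a proof; the metric property itself is the result of Endres and Schindelin cited above.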

[...] [3][4], a probability distribution function of exponential families is associated with a Bregman divergence [10]; for instance, multinomial distributions are associated with Kullback-Leibler divergences, normal distributions with Euclidean distances, and exponential distributions with Itakura-Saito distances. Based on their work, our result points out that one can construct a metric based on these divergences, which provides a natural maximum likelihood interpretation as above. Another type of metric, called minimax metrics, is also provided in Chapter 2. Capacity is one of them. We will show this capacity to the power 1 [...]

Bregman divergences have been applied in clustering data into several groups [4]. This is known as an unsupervised learning problem. The Expectation-Maximization (EM) algorithm can be used in the clustering implementation. When the Bregman divergence is the Euclidean distance, this algorithm is also known as the K-means method. In Chapter 6, I will give an example of using KL divergence to cluster states in terms of their racial structures. This clustering using KL divergence is also known as information theoretic clustering [20]. Among all Bregman divergences, we will show that the Kullback-Leibler divergence is the one which [...]

Another type of problem, called classification, is known as a supervised learning problem [6]. In the recent literature [18][29], Bregmanian spheres are used in one-class classification as a generalization of Euclidean spheres. There, one looks for a small generalized sphere that encloses most of the examples (data). In Chapter 3, a KL-based model (called the location weighted classifier) is proposed to classify $L^1_+$ functions and to retrieve information from the data. Conceptually, we are looking for the smallest sphere enclosing one group of data, called normal data, and excluding the other group of data, called abnormal data, which is different from the above one-class problem. This can be formulated as a minimax problem. However, we should not treat all information equally, i.e. some information should be weighted more than the rest. To this end, we are looking for a "weighted sphere" such that the separation of normal and abnormal data is optimal. Several results are similar to the properties of Support Vector Machines [7]. In this chapter, the mathematical existence and uniqueness of the classifier are established, and many properties are explored through constructing primal and dual problems. This is not a trivial result, because there is an inactive constraint in the dual problem. A clear and complete discussion is given to illustrate this inactive constraint. At the end of this chapter, I will construct the location weighted classifier, which can perform the separation task and identify the significant region simultaneously. In Theorem 3.8, we also give a more explicit form of the cost functional and the weight function. Numerically, an algorithm based on Sequential Minimal Optimization (SMO) [31] is discussed in Chapter 4. An experiment of classifying cardiac contours based on their normality is provided to justify the utility of this model.

Another contribution of my work is the application to the curve matching problem. This Bregman framework is used to handle the multiple curves correspondence problem in Chapter 5. For most existing pairwise curve matching models, the lack of transitivity is a [...]

Based on this basic but sound toy model, we can propose a more interesting model, called the JSH model. The JSH model has full parametrization independence in both the stretching and bending energies. Moreover, this model is based on total angle variation. The model is robust against noise that has a small total angle variation. In the case of multiple curves, this model conveys the following intuition: in order to find the optimal matching, we should find the one with the maximal sum of arc length and total angle variation on the associated average curve. In this framework, the task of matching curves becomes finding a correspondence of "feature measures". Finally, an EM algorithm can be used to find the distance between groups of curves numerically.

[...] [23]. More precisely, the log-likelihood of an exponential family distribution can be represented by a sum of a Bregman divergence and a parameter-unrelated term. Hence, Bregman divergence provides a likelihood distance for exponential families in some sense. In the last decade, Bregman divergences [10] have become an important tool in the research area of machine learning. One can choose a proper divergence among Bregman divergences to meet the user's specific demand.

In the literature, a variety of Bregman divergences have been used as distortion functions (or loss functions) in data clustering and vector quantization [4][27]. For instance, the Itakura-Saito distance [12][25] and the Kullback-Leibler divergence [17][20] are used widely in the speech signal processing community and the information theory community, respectively. Besides, the close relationship with every probability distribution in the exponential family has been used in generalizing principal component analysis based on Gaussian distributions to other exponential families [15]. These divergences, however, are not metrics, because they are not symmetric and do not satisfy the triangle inequality.

For instance, the Kullback-Leibler divergence $KL(f,g)$ of two non-negative functions $f, g$ is not symmetric. This can easily be overcome by considering the sum $KL(f,g) +$ [...]

[...] $\tfrac12\bigl(KL(f,\tfrac12(f+g)) + KL(g,\tfrac12(f+g))\bigr)$. In the paper [22], it is proved that the square root of the Jensen-Shannon divergence is a metric. Besides, it is always finite for any two $L^1$ functions. In fact, the Jensen-Shannon divergence can be regarded as a special case of averaging Bregman divergences with the associated convex function $x\log x$. It is natural to ask whether the square root of other Bregman divergences is also a metric. This is the main motivation of my work. We will provide a sufficient and necessary condition on the associated convex function such that the square root of the averaging Bregman divergence is a metric. We will show that the feasible set of convex functions in fact forms a convex cone. Clearly, the justification of the triangle inequality is the only nontrivial part.

"Capacity" is another symmetric discriminant measure of probability distributions. Capacity is one of the important concepts in information theory [17]. This quantity can be regarded as the "radius" of the minimal enclosing Kullback-Leibler divergence sphere. However, it is also known that capacity itself is not a metric [19]. In this paper, we show that capacity to the power $1/e$ is a metric.

The triangle inequality provides valuable information in the pattern recognition research area. For instance, the important task of searching for the "nearest" neighbor in a multidimensional vector space or metric space is a very common procedure in the field of pattern recognition and becomes meaningful only in a metric space. In the last decade, many efficient algorithms have been proposed to find a nearest neighbor in a variety of metric spaces [36][37]. Another example: the similarity criterion is given through a user-defined distance function. One economical method of finding the nearest neighbor is through the construction of a so-called metric tree: $N$ objects are put into a metric tree with height $\log_2 N$. Then, in case the triangle inequality holds, a lot of effort is saved in finding the nearest one in the metric tree. (The total distance computation is reduced [...]

We summarize our main contributions in this paper roughly as follows. If a power of the Bregman divergence leads to a metric, this power can be at most $\tfrac12$. [...] $\tfrac12$ is a critical point. We characterize the types of Bregman divergences that lead to metrics through the process of averaging and taking a square root. [...] 2.5.5)

In this paper we consider two kinds of averaging methods to ensure the symmetry aspect of a metric: first,
$$m_f(x,y) = \tfrac12\bigl(B_f(x,z) + B_f(y,z)\bigr) = \tfrac12\bigl(f(x)+f(y)\bigr) - f\bigl(\tfrac12(x+y)\bigr), \qquad z = \tfrac12(x+y);$$
second,
$$m_f(x,y) = \max_{0\le p\le 1,\; z = px+(1-p)y} \bigl(p\,B_f(x,z) + (1-p)\,B_f(y,z)\bigr).$$

[...] Clearly $B_f(x,y)$ is convex in $x$, but in general not necessarily convex in $y$. One property of Bregman divergence is given as follows. [...]

If $f = x^2$, then we have $B_f(x,y) = (x-y)^2$, and clearly its square root is a metric. But in general, a square root of a Bregman divergence is not a metric (see an example in Remark 2.2). Our goal is to decide when the square root of $m_f(x,y)$ is a metric, i.e. what functions $f$ are eligible?

Thanks to the following lemma, the first criterion of a metric is fulfilled. [...]

(For more properties of Bregman divergences, we refer interested readers to [3][4].)

Definition 2.2. The Kullback-Leibler divergence $KL(x,y)$ is defined as $KL(x,y) := \sum_{i=1}^{n} x_i \log x_i$ [...]

y(xy)); if=0,wehaveF(k)=c1logk+c2k+c3;BF(x;y)=c1(logx y1+x y); if6=0;1,wehaveF(k)=c1 ghfh+ghd=Zflogf gf+gd^; gf+gd;Mp(1;2):=(pCKL(1;3(p))+(1p)CKL(1;3(p))): Denition2.4(Exponentialfamily). 11 ])LetbeanitemeasureontheBorelsubsetsofRk.LetthenaturalparameterspaceofN=f2Rk:Rexp(x)(dx)<1g;.2Niscalledacanonicalparameter.Let()=Rexp(x)(dx),andlet()=log()bethecumulativegeneratingfunction.

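[...] Due to the relation $f(\hat{x}) = \hat{x}\cdot\theta - \psi(\theta)$, we have $x\cdot\theta - \psi(\theta) = f(\hat{x}) + (x-\hat{x})\cdot\nabla f(\hat{x}) = -B_f(x,\hat{x}) + f(x)$. [...] [15][4][3].

The following is one interpretation of our metrics. Consider two observed events $x_1, x_2 \in \mathrm{dom}(f)$. Considering each $x_i$, $i = 1, 2$, separately, we would like to find the maximum likelihood estimator $\theta_i$. In the dual form, we have that $\hat{x} = x_i$ minimizes $-\log p_\theta(x_i)$. Now suppose these two events happen independently; then finding a single estimator $\theta = \theta_0$ to maximize $p_\theta(x_1)\,p_\theta(x_2)$ is equivalent to finding an $\hat{x}$ to minimize $B_f(x_1,\hat{x}) + B_f(x_2,\hat{x})$. Clearly, $\hat{x} = \tfrac12(x_1+x_2)$ is the minimizer. The likelihood ratio is given by $\dfrac{p_{\theta_1}(x_1)\,p_{\theta_2}(x_2)}{p_{\theta_0}(x_1)\,p_{\theta_0}(x_2)}$, and
$$\tfrac12 \log \frac{p_{\theta_1}(x_1)\,p_{\theta_2}(x_2)}{p_{\theta_0}(x_1)\,p_{\theta_0}(x_2)} = \tfrac12\Bigl(B_f\bigl(x_1,\tfrac12(x_1+x_2)\bigr) + B_f\bigl(x_2,\tfrac12(x_1+x_2)\bigr)\Bigr) = m_f(x_1,x_2).$$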
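The midpoint claim in this interpretation can be illustrated with a short Python sketch. Everything specific in it is an assumption made for illustration only: the generator $f(x) = x\log x$ on scalars, the two observations 0.5 and 2.0, the search grid, and the helper names `bregman`, `total` and `m_f`. It checks that the sum $B_f(x_1,z) + B_f(x_2,z)$ is smallest at the midpoint among the grid candidates, and that the averaged divergence agrees with the closed form $\tfrac12(f(x_1)+f(x_2)) - f(\tfrac12(x_1+x_2))$.

```python
import math

def f(x):
    """Assumed example generator f(x) = x log x, defined for x > 0."""
    return x * math.log(x)

def df(x):
    """Derivative f'(x) = log x + 1."""
    return math.log(x) + 1.0

def bregman(x, y):
    """Bregman divergence B_f(x, y) = f(x) - f(y) - f'(y) (x - y)."""
    return f(x) - f(y) - df(y) * (x - y)

def total(x1, x2, z):
    """Sum of the divergences from both observations to a candidate z."""
    return bregman(x1, z) + bregman(x2, z)

def m_f(x1, x2):
    """Average divergence of x1 and x2 to their midpoint."""
    z = 0.5 * (x1 + x2)
    return 0.5 * total(x1, x2, z)

x1, x2 = 0.5, 2.0                        # hypothetical observations
mid = 0.5 * (x1 + x2)
grid = [0.05 * k for k in range(2, 81)]  # candidates z in [0.1, 4.0]

# The midpoint should give the smallest total divergence on the grid.
midpoint_is_best = all(
    total(x1, x2, z) >= total(x1, x2, mid) - 1e-12 for z in grid
)
# Closed form of the averaged divergence.
closed_form = 0.5 * (f(x1) + f(x2)) - f(mid)

print(midpoint_is_best, abs(m_f(x1, x2) - closed_form) < 1e-12)
```

The midpoint is optimal for any differentiable strictly convex generator, because the derivative of the total with respect to $z$ is $-f''(z)\,(x_1 + x_2 - 2z)$; swapping in another `f` and `df` pair should therefore leave both checks unchanged.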

[...] [4], the Bregman divergences associated with the Gaussian, Poisson, Bernoulli, exponential and multinomial distributions include the Euclidean distance, the Kullback-Leibler divergence and the Itakura-Saito distance. In this paper, we will show that [...]

In this section, we construct a class of metrics based on Bregman divergences.

Denote $m(p;x,y) := p f(x) + q f(y) - f(px+qy)$, $p+q = 1$, $0$ [...]

[...] (2): follows from the convexity of $f$. (3): $f$ is convex if and only if $f'$ is increasing. This implies $m(p;x,y)$ is also increasing in $y$. (4): Because of (3), the only nontrivial case to consider is $x$ [...] $m(p;x,z)$, $m(p;x,y) > m(p;z,y)$. Thus, $m(p;x,z)$ [...] (5): Suppose the triangle inequality holds locally, i.e. $m(p;x-a,x)^r + m(p;x,x+a)^r \ge m(p;x-a,x+a)^r$ for some $0$ [...]

[...] $\tfrac12$ follows.

Next, we investigate whether $d(x,y) = \sqrt{m_f(x,y)}$ is a metric, where $m_f(x,y) = \tfrac12 B_f\bigl(x,\tfrac12(x+y)\bigr) + \tfrac12 B_f\bigl(y,\tfrac12(x+y)\bigr)$ with some strictly convex function $f$. This is not true for an arbitrary strictly convex function $f$. The nonnegativity and symmetry properties are clear. According to the previous lemma, $\tfrac12$ is the largest power that can satisfy the triangle inequality. In the following, we will show that the necessary and sufficient condition for this is $(\log f'')'' \ge 0$. For three numbers $a$ [...]

Indeedleta>0andconsidera;0;a.Wehavemf(a;a)=p 2(p 2+1 4a1 2a1+O(1 2a!0;a!1; Noticethatthisfunctionffailsthecondition(logf00)000. 2.5.5 Notethatifastrictlyconvexfunctionfsatises(logf00)00=0,thenwehavelogf00=c1x+c2,i.e.eitherf=1 Noticethatanyfunctiong2G,andabc,wehaveq

Inbothcases,fgforxoutsidethesegment[a1;a4]. Proof. Sincethereexistzerosfory0z0=0andyz=0intheinterval[1;3],wemusthavey=zon[1;3]. Nowconsiderthesituationoutsidetheinterval[1;3].Considerthecase[3;1)rst.Since(logy00logz00)000everywhere,and(logy00logz00)0,logy00logz00bothvanishatx=3,itmustholdthaty00z000forx3.Similarlysincey0=z0,y=zatx=3,wehavey0z00;yz0in[3;1);butbyassumptionthereexistszerosfory0z0=0andyz=0in[3;1).Wemusthavey=zintheinterval[3;a5]. Similarreasoningappliestothecase(;1].Finallywehavew:=yz=0intheinterval[a1;a5].Andwehaveestablishedtherststatement. Incasesofnomorethan4intersections:Noww000,thenwhasatmosttwovanishingpoints.Letthesetwovanishingpointsofwbe1<2.wtakesvalueswithsigns+;;+in(;1);(1;2),(2;1).Sameargumentsapplyonw0,andw.Hence,wtakessigns+;;+;;+in(;a1);(a1;a2),(a2;a3),(a3;a4),(a4;1),hereak;k=1;:::;4arevanishingpointsofw.Thus,weprovedthesecondstatement. 26

Thenthereexistsacurvegivenbyy=fc1;c2;c3;c4(x)=ec1x+c2+c3x+c4,c16=0;c2;c3;c4somescalarsory=fc2;c3;c4(x)=c2x2+c3x+c4,passingthroughtheseabove4points. Proof. Sinceyk=f(xk);k=1;:::;4,thenwehave Thus,fork=1;2wehave Forthesakeofsimplicity,letrk;k+1:=yk+1yk NotethatbyMeanValueTheorem,ec1xk+1ec1xk=c1(xk+1xk)ec1k;forsomek2[xk;xk+1]; 1ec1(12).Bytheassumptionf00(x)>0,thenr3;4r2;3>0andr2;3r1;2>0,andLHSoftheaboveequationispositive.MoreoverRHSapproaches1,and0asc1towards1,andrespectively,thentheexistenceofc1isensured. Notethatasc1!0,usingL'Hospital'srule,RHSinEquation 2{3 becomesx4x2

2{2 2{1 .(Theexistenceofc2isguaranteedbytheconvexityoffandfc1;c2;c3;c4(x).)Likelyinthecasec1=0,aftergettingc2,wecansolveforc3;c4backwards. Whenweverifythetriangleinequality,thefunctionvaluesat6pointsonthecurvey=f(x)areused,including4interiorpoints(1 2(a+b);f(1 2(a+b))),(1 2(b+c);f(1 2(b+c))),(1 2(a+c);f(1 2(a+c))),(1 2b;f(b)),andtwoendpoints(1 2a;f(a)),(1 2c;f(c)).Accordingtotheabovelemma,weknowthatthereexistsacurveintheclassGpassingthroughtherst4points,suchthattheendtwopointslieaboveoronthecurve. Next,wewillshowthatsuchapropertyindeedyieldsthedesiredtriangleinequality. 2(a+b);1 2(b+c);1 2(a+c); Proof. Thenwehavemg(a;b)=mf(a;b);mf(a;c)mg(a;c)=1 2(f(c)g(c));

2(f(c)g(c)): Proof. 2(a+b);f(1 2(a+b))),(1 2(b+c);f(1 2(b+c))),(1 2(a+c);f(1 2(a+c))),(1 2b;f(b)). Inordertoverifythetriangleinequalityp 2.6 suchthaty=g(x)passesthroughthese4districtinteriorpoints. NowbyDenition 2.6 wehavep 2.5 ,wehavef(a)g(a),f(c)g(c).Finallybythepreviouslemma 2.7 ,wecanconcludef2F. Inthecaseof(a+c)=2b,anygiven3pointsonacurvey=f(x),usingasimilarargument,wecanhaveaconstructedcurvey=fc1;c2;c3;c4(x)tangentialtothesecond 29

Proof. Sincek0,thenf0i0f0i000k(f0i00)20.f00f0000k(f000)2=2(f010f01000k(f0100)2)+(1)2(f020f02000k(f0200)2)+(1)(f010f02000+f01000f0202kf0100f0200)0+0+(1)(2p Proof. ZmF(g;k)d=s Zp Zp

Zp Zp ZmF(g;h)d+s ZmF(h;k)d=p 2.3 ,notingthatsinceBregmandivergencesarenonnegative,thentheseintegralsarewell-denedaslongastheyareboundedabove.Thus,inthecaseofBregmandivergenceoforder,1,theconditionthosefunctionsg;h;khavenitenormskgkL1;khkL1;kkkL1sucientlyensuresthewell-denedintegralinfact. 2(F(a)+F(b))F(1 2(a+b)),thenasa!b,wehavemF(a;b)=F00(1) 8(ab)2+O((ab)3). Consider1<2,thedomainis(;1).SincemF(a;b)=mF(a ;b );:=max(jaj;jbj).c1:=infx2[1;1)mF(x;1) (1x)2;c2:=supx2[1;1)mF(x;1) (1x):

a+b;2b a+b): (2x2)2;c2:=supx2[0;1)mF(x;2x) (2x2): ;b );:=max(jaj;jbj). Denotec1:=infx2[1;1)mF(x;1) (1x)2;c2:=supx2[1;1)mF(x;1) (1x)2: 2(f+g))mF(2f f+g;2g f+g)d,duetothelastlemma,wehavethesecondinequalityMF(f;g)c2R(fg)d. Ontheotherhand,usingthelastlemma,thenwehaveMF(f;g)22c1R(fg)2 32

f+g1,and0g f+g1,thentherstinequalityisveriedbyusingtheinequalityR(f+g)d(Rfd+Rgd). Claim:giventwopositivenumbersa;b,thereexistconstants1;2,suchthat1(a a+b;b1:=2b a+b,thenamong0a12,b1=2a1,andalsolet1:=inf0a12mF(a1;b1) (a (a 33

((1+r) 2(f00(1+r)+f00(1r)) Nowfor0p1,andq=1p,denotem(p;x;y):=pBf(x;px+qy)+qBf(y;px+qy)=pf(x)+qf(y)f(px+qy),andinthissubsection,wedenote(notethatthedenitionofmfisdierentfromtheprevioussubsection)mf(x;y):=max0p1;p+q=1pBf(x;px+qy)+qBf(y;px+qy)=max0p1;p+q=1pf(x)+qf(y)f(px+qy): Proof. i.e.B(x;px+qy)=B(y;px+qy),equaldivergences. Sincemf(x;y)=max0p1;p+q=1pB(x;px+qy)+qB(y;px+qy),thenmf(x;y)=B(x;px+qy)=B(y;px+qy). 34
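The "equal divergences" property of the maximizing $p$ can be checked numerically. The Python sketch below rests on assumptions made only for illustration: the generator $f(x) = x\log x$, the points $x = 0.5$ and $y = 3.0$, and the helper names `bregman` and `minimax_m`. It locates the maximizer of $p f(x) + q f(y) - f(px+qy)$ by bisection on the derivative with respect to $p$, which is decreasing because the objective is concave in $p$, and then compares $B_f(x,z)$ with $B_f(y,z)$ at $z = px + qy$.

```python
import math

def f(x):
    """Assumed example generator f(x) = x log x, defined for x > 0."""
    return x * math.log(x)

def df(x):
    """Derivative f'(x) = log x + 1."""
    return math.log(x) + 1.0

def bregman(x, y):
    """Bregman divergence B_f(x, y) = f(x) - f(y) - f'(y) (x - y)."""
    return f(x) - f(y) - df(y) * (x - y)

def minimax_m(x, y, iters=100):
    """Return (p, value) maximizing p f(x) + q f(y) - f(p x + q y), q = 1 - p.

    Assumes x != y, so the derivative changes sign inside (0, 1).
    """
    def slope(p):
        # Derivative of the objective with respect to p.
        z = p * x + (1.0 - p) * y
        return f(x) - f(y) - df(z) * (x - y)

    lo, hi = 0.0, 1.0  # slope(0) = B_f(x, y) > 0 and slope(1) = -B_f(y, x) < 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if slope(mid) > 0:
            lo = mid
        else:
            hi = mid
    p = 0.5 * (lo + hi)
    z = p * x + (1.0 - p) * y
    return p, p * f(x) + (1.0 - p) * f(y) - f(z)

x, y = 0.5, 3.0  # hypothetical points
p_star, value = minimax_m(x, y)
z_star = p_star * x + (1.0 - p_star) * y
gap = abs(bregman(x, z_star) - bregman(y, z_star))

print(p_star, value, gap)
```

Bisection on the derivative is used here in place of a direct search on function values because it resolves the maximizer to full floating-point precision; any one-dimensional concave maximizer would serve equally well.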

2.3 ,weseekforthenecessaryconditionofthemetricp A Nowconsiderf(x):=1 Sincemf(a;b)=pf(a)+(1p)f(b)f(pa+(1p)b)=1

3((logf00)0)2arelistedinTable 2.5.5 Proof. Claim:thereexistsacurvegivenbyy=fc1;c2;c3;c4(x)=c2 Supposec26=0,(Wewillprovideacriterion(Eq. 2{6 )tojudgewhetherc2=0ornot)Thequadraticcase(c2=0)issimilar,sotheproofisomitted.Ifthedomainisconcerned,letusassumethatc2>0,andthedomainis(c1;1)\fx:jf0(x)j<1gtemporarily.Infact,thefollowingproofgoeswellwithoutthisassumption.Theequation 26 isalsothecriterionforthedomain. 36

Thus,fork=1;2wehave Forthesakeofsimplicity,letrk;k+1:=yk+1yk Sincethedata(xk;yk)isgiven,theonlyunknownisc1intheaboveequation.Byassumptionf00>0,weknowLHSispositive,theneitherc1x4>x1.Suppose01,thenc1>x4.IfLHS=1,thennosolutionforc1,butweshoulduseaquadraticfunctiony=fc2;c3;c4=c2x2+c3x+c4,insteadoftheaboveconstructedfunction.SothisLHSisthecriterionofwhichconstructedfunctionsshouldbeadopted.Notethatasc1!1,LHS!1+,andasc1!x+4,LHS!1.Similarly,asc1!,LHS!1,andasc1!x1,LHS!0+.Thustheexistenceofc1isensured,andc1isunique. Oncec1isgained,wecansolveforc2;c3;c4backwardsusingpreviousequations( 2{5 ),( 2{4 ).

Denotey(x):=(logf(x)00)0;z(x):=(logg(x)00)0,theny01 3y20,(yincreasesmonotonically),z0=1 3z2,andyz=0at1;2.Clearlyz=3 Intherstcase,wehavez(x0)<0andzsatisesz0=1 3z2.Thenwehavez=3 3;(1 3,and(1 Recallyz=0at1;2,i.e.1 Now,wediscusstheset[2;x5]intwocasesy<0,andy0.Inthesub-domainwherey<0,since(1 Similarargumentsapplyto[x1;1].Finallywehavef=gin[x1;x5]. Supposethereareexactlyonly4zeros,x1;x2;x3;x4.Since(1 38

Inthesecondcase:z(x0)=0,thenwehavez(x)=0forallx2R.Sinceyincreasesmonotonically,theny=zat1;2impliesy=z=0onthewholeinterval[1;2].Afterwardsthroughthesamearguments,wehavef=gonthewholeinterval[1;2].Alsoy0=zon[2;x5],andf=gatx5,thensimilarlywehavef=gon[2;x5].Applyingthesamearguments,wehavef=gon[x1;1]. Nowsupposeonly4zeros.Sincez(x)=0forallx,thenyzmonotonicallyincreases.Usingsimilararguments,wecanshowthatfgtakesvalueswithsigns+;;+;;+on(;x1),(x1;x2),...(x4;1). 2-2 .)(Replacef(x)withf(x)ifnecessary.Replacef(x)withf(x+x1)(xa)(f(x1+c)f(x1+a))=(ca)ifnecessary.) Basedonpreviouslemma,theseexistsacurvey=g(x);g2Gpassingthroughthreepoints(x1;f(x1)),(b;f(b)),(c;f(c)),andtangentatx1(inFig. 2-4 ).(i.e.g0(x1)=0)Thenthereexistsapoint(ba;g(ba)),suchthatg(ba)=f(c);ba<0.Sincetwocurvesy=f(x);y=g(x)intersectatx=x1;b;c,andaretangentialatx=x1,thenbythepreviouslemma,fgtakesvalueswithsigns+;+;;+on(;x1);(x1;b);(b;c);(c;1).Thusbaa

a1,suchthath(a)=g(ka)=g(ba)=f(a)thenthiscurvey=h(x)istangenttoy=f(x)atx1,andalsointersectata.Noteh2G.SinceBregmandivergencesareinvariantunderhorizontallyscaling,then Sinceh(c)=g(kc)g(c)=f(c),andalsof2F;h2G,thenfh>0on(bc;1),herebc>candf(bc)=h(bc).Sincethecurvesy=f(x);y=h(x)intersectata;bcandaretangentialatx1,thenfhtakesvalueswithsigns+;;;+on(;a),(a;x1),(x1;bc),(bc;1). Nowintheinterval[a;b],fh,andthelinesegmentconnecting(a;f(a));(b;f(b))liesabovethesegmentconnecting(a;h(a));(b k;h(b k)).Besides,bothlinesegmentshavenegativeslopeandh0(x1)=f0(x1)=0,thenwehavemf(a;b)mh(a;b k). Thenusing( 2{9 ),wehave k)=mg(ba;b):(2{10) Therefore,from( 2{7 ),( 2{8 ),( 2{10 ),andp Clearly,wehavethefollowingresultaswell.ItsproofissimilartotheproofinTheorem 2.3 ,soweomitit. 40


2.7, another type of metric is more useful in practice. In this section, we propose a new type of metric, called mini-max metrics. This kind of metric is "the minimal radius of a sphere" enclosing two data; here the radius is in the sense of Bregman divergence. This is similar to the case of two points in Euclidean space, but the situation is slightly different because their mutual distance is no longer the Euclidean distance.

Consider a strictly convex function $F$ with its effective domain $\Omega := \{x : F(x) < \infty,\ F'(x) < \infty\}$. Given $f, g, l, a \in \mathbb{R}^n$, with their $i$-th components $(\cdot)_i$, and $f_i, g_i, l_i \in \Omega$, and $a_i > 0$ for $i = 1, \ldots, n$. Now consider the problem: $\min_{l \in \Omega^n} R$, subject to $B_F(f, l; a) \le R$, $B_F(g, l; a) \le R$,


Proof. LetL(p;)=Pni=1piBF(fi;;a)(Pipi1): @pi(p)=0,whichimpliesnXj=1(F(fi)fiF0())jaj=0;i:e:BF(fi;)=nXj=1(F()F0())jaj: Conversely,fortheindexi,suchthatpi=0,then@L @pi(p)<0;i:e:BF(fi;))F0())jaj: @pi(p)0alwaysholds.Thus,theseequations(2)(3)aretheoptimalconditionsforp. 42


8 ]theradiusRofthespherecontainingmvectorsv1;:::;vminRNmaybecharacterizedbyR=minv2RNmaxikvivk: Now,takethemaximuminponbothsides,andminimizeamongallpossibleh2m,wehaveMF(f1;:::;fn;a)minhmaxiBF(fi;h;a): 2-3 .)


Hence,x=a()andx=b()arethesolutionsoff(x)sx=,thenf(a())sa()=;f(b())sb()=; d=1 d=1 b()a();q():=za() 2. Usethenotionin 2.10 ,thenz=p()a()+q()b()xedforall,andp0()0. Usethisresult,if1>,wehavesupxp(x)=supxargmaxm(p;1;x);infxp(x)=1supxp(x): 2.10 ,theintersections(a();b()),b()a(),a(0)=b(0)=z. Sincebyassumptionf0000,thenf0(a)f0(z) 2(f0(b)f0(z))(bz):


2(f0(a)f0(z))(az): (f0(b)f0(z))(bz)(f0(a)f0(z))(az)0:(2{12) Sincep();q()satisesz=p()a()+q()b(),withp()+q()=1,dierentiatingwithrespectto,wehave0=p0a+pa0+q0b+qb0=p0(ab)+1 f0(a)f0(z)+za f0(b)f0(z)); 2{12 ,wehave(f0(b)f0(z))(f0(a)f0(z))(ba)2p0=(bz)(f0(b)f0(z))(az)(f0(a)f0(z))0: Nowweverifytherststatement.Denotep:=lima!bargmaxpm(p;a;b)=1 2;notethatthemaximizerp=psatisesf(b)f(a) 2(ba)f00(a)((1p)(ba)f00(a))+O(jabj2)=0; 2,asa!b. Givena>b,assumep=argmaxpm(p;a;b),letz=pa+(1p)b,thenbyp0()<0,wehavep<1 2,thenp

B.2 Basedonthislemma,wecanhaveaboundforp:supx2p(x)pinfx2p(x):


Proof. Then1p1=p2+:::+pn,let^f:=1 1p1(p2f2+:::+pnfn),thenf=p1f1+(1p1)f0,andBF(f1;f;a)=1 1p1(p2BF(f2;f;a)+:::+pnBF(fn;f;a))BF(^f;f;a) Nowconsiderp:=argmaxp(pBF(f1;pf1+(1p)^f;a)+(1p)BF(^f;pf1+(1p)^f;a)); 47


Denote:=Pipifi;andafunctionalMp(f1;:::;fn):=ZnXi=1pifilogfi I-divergence,generalizedfromKullback-Leiblerdivergence(KL(f;g):=Rflogf gdwithtwoprobabilitydistributionsf;g)isdenedfortwonon-negativefunctionsf;g, 2;1 2). Inthefollowingtheorem,wewillverifytheexistenceofthemaximizerp2:


Now,foramaximizerp=(pi)2,considerasequencefpk:k=1;2;:::gconvergestop, then!;pkifilogfi wehaveZXipifilogfi Hence,M(p)iscontinuous,andstrictlyconcaveonacompactset,thusitattainsauniquemaximum. 17 ],thisquantitycanbeusedtomeasurethecapacityofadiscretememorylesschannelofastochasticmatrixW:X!Y,withinputsetX:=fx1;:::;xng,andoutputsetY:fy1;:::;ymg,


3;i=1;2;3;j=1;2;3,thenP(yj)=1 3;j=1;2;3,nomatterwhichdistributiononXistaken.Thus,themutualinformationbetweentheinputandtheoutputis0,infact,theyareindependent.(Itisimpossibletoinferanyinformationabouttheinputfromtheoutput.)Incapacityfunctionalframework,M(f1;f2;f3)=0,orthesethreefunctionsareidentical. 2.8 ) ap+bq+bqlogb ap+bq; 2jabj(2{13) 2ZjfgjdMp(f;g)pq


4,wehave1 2ZjfgjdM(f;g)1 8Rmax(f;g)(Zjfgjd)2: 2.4 Proof. 8(Zmax(f;g)d2Zmin(f;g)+R(min(f;g))2d 8(Zmax(f;g)d2Zmin(f;g)d)1 8Zmax(f;g)d1 4kgk: pf+qg+qglogg pf+qgd;q:=1p: q+px)r1;0

2-4 1.gr(p;x)hasonlyonediscontinuityatx=1.limx!1gr(p;x)=(p 2:;ifr1 2: dxgr(p;x)ispositiveforx2R+nf1g,thusg(p;x)increasesmonotonically. Proof. q+px=pq(x1)+pq2 2sgn(x1)1;ifr<1 2 dxgr>0ifandonlyiff>0,heref(x):=(1+r)log1 q+pxq q+pxlog1 q+pxlogx q+px: Takedierentiationagain,f0(x)=q(q(1+r)+prx)log1 q+px


Therearetwocasesforr. Supposep

Sincethetriangleinequalityholdsinthecasesa

a>1,(thenx=b=c,x2(1 @c=gr(p;x)+gr(q;x): thenifr<1 2,then8><>:RHS!+1asx!1;RHS!asx!1 2,then8><>:RHS!+p 2.3 weknowthatgr(p;x)+gr(q;x)ismonotonicallyincreasingasxincreasesfrom1 Noteascdecreases,xincreases.Whenthesignchangehappens,thederivativec1r@RHS @cgoesfromnegativetopositiveasxgoesfrom1 @c<0atc=b,and@RHS @c>0atc=a,whichimpliesthatthereisonlyandonlyonesignchangeamongc2[a;b],whichisamaximizer,notaminimizer. Thus,RHShastwolocalminimaatc=aandc=b. Finally,whenc=a,orc=b,thenm(p;a;c)r+m(p;c;b)rbecomesm(p;a;b)r.Soweestablishthisinequality. ByLemma 2.3 ,clearlywehavethesecondstatement. Proof.


B.1 Amongthesetf(f;g):MF(f;l)Lf;MF(g;l)Lgg,wearelookingforthemaximalofMF(f;g). ConsideracostfunctionalMF(f;g)1MF(f;l)2MF(g;l);


(F0(fi)F0(p1fi+q1gi))3(F0(fi)F0(p3li+q3fi))=0; (F0(gi)F0(p1fi+q1gi))2(F0(gi)F0(p2gi+q2li))=0; 2{14 2{16 ,andoneofthemis(fi;gi;li)=(1;1;1). Proof. Ontheotherhand,since(p1;q1)isthemaximizer,wehaveXi(logfiloggi)ai=Xi(figi


2{17 2{19 ,wehaveXi(loglilogfi)ai+1(Xi(logliloggi)ai=0; 1+(p2+p12)x: Wecanformulateitinthefollowing,y+(p2+q31)xy+1x=0;y+x(21)+x2(2p2p12)+xy(p12+p2)=0: Inthecasex=0,wehave(x;y)=(0;0),thetrivialsolution.Inthesecondcase,usingtheresult2=1+1,weknowthatthisdescribesalinethroughtheorigin,andthisyieldsatmosttwosolutionsincluding(0;0).Thus,weknowonlytwosolutionsinthesystem 2{14 2{16 ,andoneofthemis(fi;gi;li)=(1;1;1). 58


Examplesoftherstkindofmetrics. (logf00) (logf00)00 (2)logx 2x Table2-2. ExamplesofminimaxBregmandivergences. (logf00) (logf00)0 (2)logx 2x BregmandivergenceBf(x;y). TherelativelocationsofpointsmentionedinTheorem 2.6 60


The shift indicates the divergence.
Example plots of the function $g_r(x)$. (A) $g_r(x)$, $r = p = 0.4$, $q = 0.6$. (B) $g_r(x)$, $r = p = q = 0.5$.


3.1) containing as many normal data as possible and excluding as many abnormal data as possible. We prove the existence and uniqueness of such a classifier in our formalism.

Much work exists in the literature on this classification, mostly based on the $L^2$ metric (e.g. [6][7]). Here we present a mathematical model applicable in the $L^1$ context, and we base it on the notion of Kullback-Leibler divergence because it is closely connected with the statistical concept of likelihood.

Suppose we are given two sets of nonnegative $L^1$ functions $\{f_1, \ldots, f_m\}$ and $\{g_1, \ldots, g_n\}$ defined on a finite measure space $(\Omega, \mu)$, labeled normal and abnormal respectively. Our goal is to find some common "feature" of the normals which can be used to separate them from the abnormals. Our modeling approach incorporates some of the ideas in Support Vector Clustering technology [7]. In particular, our decision method aims to find "the smallest KL-sphere" (see Definition 3.1) enclosing as many "normals" as possible and simultaneously excluding as many "abnormals" as possible. We allow for the possibility of "outliers". To this end, we minimize the following cost functional:


We present the following justification for the above formulation: $h$ and $R$ will be the center and radius, respectively, of the smallest KL-sphere containing the normals. $\xi_i$ and $\eta_j$ measure the cost of misclassifying the normal and abnormal data respectively. The fourth term, being the CKL divergence (in Definition 3.1 below) between $w$ and $1$, weighs the components of the mass most appropriate for our classifying problem. It is easy to see that the optimal $\xi_i, \eta_j$ are $\max(\mathrm{CKL}(f_i, h) - R, 0)$ and $\max(R - \mathrm{CKL}(g_j, h), 0)$ respectively, once $h, R, w$ are known. We call this $(h, R, w)$ a location weighted classifier.

We will verify the existence of the optimal solution $(h, R, w)$. Note that without the last term in the cost functional, the optimal $w$ is zero. The last term plays a role similar to that of a regularization term in inverse problems [13]. The assumption of absolute continuity simply reflects the fact that information can only be retrieved from available data.

Let us mention two important features of our model. First, in practice, the data collected contain a lot of information, but most of this might be irrelevant to the classification process. Our model looks for a weight function $w$ on $\Omega$ emphasizing the importance of the data on different parts of $\Omega$, such that the classifier can yield an optimal separation among the training data.

Second, our model focuses on the classification of distributions (for the sake of clarity, hereafter called "data-distributions"). A natural and generally accepted discriminant measuring the dissimilarity of two densities is the Kullback-Leibler divergence. By Stein's Lemma, under a true distribution $Q$, the probability of drawing an empirical distribution $P$ with sampling number $n$ is $e^{-r}$ to the first order. Here the exponent $r$ is the product of $n$ and the Kullback-Leibler divergence between $P$ and $Q$. In our model, the "sphere" of smallest radius enclosing two data-distributions in terms of Kullback-Leibler divergence


One fundamental idea underlying our model is that, whereas any two "normal data-distributions" may be far apart in the K-L paradigm, they may all be reasonably close, if we agree to discard irrelevant information, to a "center". This we call the center of the KL-sphere. We will show that optimal separation in terms of Kullback-Leibler divergence is possible with an appropriately chosen weight function.

Mathematically, we will show that given two sets of nonnegative $L^1$ functions, such a classifier (the solution of our model) exists uniquely.

Let $L^1_+(\Omega, \mu)$ denote the convex cone of non-negative functions in $L^1(\Omega, \mu)$. We assume that $\mu$ is a finite measure.

gf+g)d: g)d: yx+y0;


3-1 separating the groups. The optimal grouping is determined by achieving the above optimal overall cost. In this paper, we do not consider this general case, and we will focus on only one group of "normals".

The existence of the minimizer above is not obvious, but this will be established in the sequel. Intuitively, we are measuring the similarity of $f$ and $g$ by their "closeness" in the CKL divergence sense to an auxiliary third function $h$, rather than by $\mathrm{CKL}(f, g)$ or $\mathrm{CKL}(g, f)$ as is standard.

As mentioned in the introduction, we are given a set of functions with two labels, called "normal", $\{f_i \in L^1_+,\ i = 1, \ldots, m\}$, and "abnormal", $\{g_j \in L^1_+,\ j = 1, \ldots, n\}$, together with the prior information that normal data is "similar", while abnormal data varies a lot. Our aim is to explore the normality structure of the training data, so that new data


3-1) in the introduction.

Before we analyze this model, let us study a simpler model in which no weight function is involved. This simpler model is described in the following Definition.

Given $f_1, \ldots, f_m, g_1, \ldots, g_n \in L^1_+$, $C_1 > 0$, $C_2 \ge 0$, let on X+ Rm+n+ by

This minimization problem is called the Primal Problem.

3.3 has the very nice property, which we call "measure invariance": Assume $(h, R)$ is the optimal classifier under the base measure $\mu$. When we change the base measure $\mu$ to another measure $\mu''$ with $d\mu = u\, d\mu''$, $u \in L^1_+(\Omega, \mu'')$, and change the data $f_i, g_j$ to new data $u f_i, u g_j$, the optimal solution is $(u h, R)$. In other words, even though the data $f_i, g_j, h$ themselves change with the base measure, all divergences $\{\mathrm{CKL}(f_i, h), \mathrm{CKL}(g_j, h);\ i = 1, \ldots, m,\ j = 1, \ldots, n\}$ and the radius $R$ of the minimal CKL-sphere stay the same.
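The measure invariance property can be checked numerically on a finite set. The sketch below is only an illustration under stated assumptions: it takes the generalized divergence to be $\mathrm{CKL}(f, h) = \int (f \log(f/h) - f + h)\, d\mu$, replaces the measures by positive weight vectors, and uses randomly generated data. The names `ckl`, `mu`, `mu2` and `u` are hypothetical and do not come from the thesis.

```python
import numpy as np

def ckl(f, h, weights):
    """Discrete generalized KL divergence: sum_k w_k * (f_k*log(f_k/h_k) - f_k + h_k)."""
    f = np.asarray(f, dtype=float)
    h = np.asarray(h, dtype=float)
    return float(np.sum(weights * (f * np.log(f / h) - f + h)))

rng = np.random.default_rng(0)
n = 50
mu = rng.uniform(0.5, 2.0, n)   # base measure: positive weights on n points (assumed)
u = rng.uniform(0.5, 2.0, n)    # density relating the two measures (assumed)
mu2 = mu / u                    # second measure, chosen so that d(mu) = u d(mu2)
f = rng.uniform(0.1, 1.0, n)    # a data function (hypothetical)
h = rng.uniform(0.1, 1.0, n)    # a candidate center (hypothetical)

before = ckl(f, h, mu)          # divergence under the base measure
after = ckl(u * f, u * h, mu2)  # divergence of the rescaled data under mu2
print(before, after)            # the two values should agree up to rounding
```

The factor $u$ cancels inside the logarithm and is absorbed by the change of weights outside it, which is why the two numbers coincide for any positive `u`.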


AssumegiventwopositiverealnumbersC11 infh2L1+()sup(p;q)20L(h;p;q);(3{5) and sup(p;q)2infh2L1+()L(h;p;q):(3{6) Wewillverifytheequivalencebetweentheproblem( 3{5 )andthePrimalproblem,andtheequivalencebetweentheproblem( 3{6 )andtheDualproblem.Thenweare 67


3{5 ),( 3{6 ). NoticethatatrstglancetheformulationofthePrimalproblemappearstobeahighlynon-convexproblem,butinfactitisaconvexproblemindisguise! Proof. 68


Similarargumentsmaybeapplied,whenRisvariedfrom^Rto^R,(^R)(^R)=+C1(m1+m2)C2n1>0: Finally,whenR=0,wehaveC1(m1+m2)C2n1=mC1>1andC1m1(n1+n2)C2=mC1>1.SincebothmonotonicallydecreaseasRincreases,thereexistsauniqueoptimalradius^Rsatisfyingtheinequality:C1(m1+m2)C2n1>1>C1m1(n1+n2)C2. NowweestablishtheequivalencebetweenthePrimalproblemand( 3{5 ).Sosupposethatgivenh2L1+,therearem1normalsoutsidetheCKLsphereandm2onthesphere;n1,n2abnormaldatainsideCKLsphereandonthesphere.Clearly,max(p;q)20L(h;p;q)isattainedifpi=C1;qj=C2forthosei;jwithCKL(fi;h)>R;CKL(gj;h)R.(Forthosei;jwithCKL(fi;h)=CKL(gj;h)=R,changingthevaluesofpi;qjdoes 69


3.3 .Thereforeinfh2L1+sup(p;q)20L(h;p;q)=infh2L1+(R(1m1C1+n1C2)+C1Xi:CKL(fi;h)>RCKL(fi;h)C2Xj:CKL(gj;h)

Besides,inthecaseofab,wehave1 zblogb za+b+z);withz>0: First,ifa=b,thenmx;y(z;p;q)=Pmi=1pixilogxi+Pnj=1qjyjlogyj+zforallz>0,anditsminimumisattainedaszapproaches0.Thuswemaydenemx;y(0;p;q)=Pmi=1pixilogxi+Pnj=1qjyjlogyj. Ifa>b,thenwritec:=ab>0,mx;y(z;p;q)isconvexinz,wehavealowerboundforthesecondtermofmx;y(z;p;q),aloga zblogb za+b+z=clogc z+aloga cblogb cc+zaloga cblogb c: Combiningthecasesab=0,andab>0,weknowthisinequalityholdsforab0. 71


Lowerbound:Since(qj)yj>b=nXj=1(qj)yj;(qj)yjlogyj c+bloga b;1 alogc aab alogb a1 alogc a1 alogb a1 Inordertoexaminethedierencebetweenand0,wepartitionthewholedomaininto1[2,with1:=f!2:Pmi=1pifi(!)+Pnj=1qjgj(!)0g,and2:=f!2:Pmi=1pifi(!)+Pnj=1qjgj(!)<0g.


Thus,theintegrandinMf;g(p;q)isintegrable,andMf;g(p;q)isniteover.Besides,Mf;g(p;q)isconcavein(p;q)2. Thus,sup(p;q)20infh2L1+L(h;p;q)=sup(p;q)2infh2L1+L(h;p;q)=sup(p;q)2Mf;g(p;q): Nowgivenf;g;h2L1+(),and(p;q)2,sincetheintegrandinL(h;p;q):=ZmXi=1pifilogfi Toverifythesecondpartofthistheorem,forany(p;q)20,werewritetheexpressioninfh2L1+()L(h;p;q)asinfh2L1+(1[2)Z1[2mXi=1pifilogfi


Ontheset2,infh2L1+(2)Z2mXi=1pifilogfi Thus,forthose(p;q)20,infh2L1+L(h;p;q)=. So,sup(p;q)20infh2L1+L(h;p;q)=sup(p;q)2infh2L1+L(h;p;q)=sup(p;q)2Mf;g(p;q): Theorem3.3(ExistenceanduniquenessofsolutioninDualproblem). Proof. Moreover,sinceiscompact,thereexistsasubsequencef(p;q)k0gfromf(p;q)kgsuchthatlimk0!1M((p;q)k0)=^M; 74


SinceM(p;q)isstrictlyconcaveinpi;i=1;:::;m,andqj;j=1;:::;n,thenwehaveauniquemaximizer. Let(^p;^q)bethemaximizerofthedualproblem.Theneither(^p;^q)liesintheinterioroforontheboundary.If(^p;^q)isaboundarypoint,thenatleastonecomponentofanoptimalpoint(^p1;:::;^pm;^q1;:::;^qn)satises^pi=0orC1or^qj=0orC2,orf!:<^p;f(!)>+<^q;g(!)>=0ghasnonzeromeasure.Next,wewillexcludethelastcaseofaboundarypointinthefollowingsense.Iftheredoesexistsuchanonzeromeasureset,theneveryfunctionpifiandqjgjvanishesonthisset. Letuspartitiontheset1(p;q)further,1(p;q)=3(p;q)[4(p;q)with3(p;q):=f!2:Pmi=1pifi(!)+Pnj=1qjgj(!)>0g,and4(p;q):=f!2:Pmi=1pifi(!)+Pnj=1qjgj(!)=0g. Inotherwords,ifxsatisesthisequationPmi=1^pifi(!)+Pnj=1^qjgj(!)=0,then\nocancelation"happensinthissummation,andinsteadalltermsmustvanishatthesametime. Proof.


Nowwecomputethedirectionalderivativeofmat(^p;^q)alongthislinesegment.For>0,wehaved(p;q) isintegrableforall1>>0,denotingh:=+,thenwehave Supposethesetf!2:^h(!)=0ghasnonzeromeasurewiththemaximizer(^p;^q).WejustshowedinTheorem 3.2 thatMf;g(p;q)isnite.AlsothefunctionalMf;g(p;q)isconcavein(p;q)2,thenthisfunctionaliscontinuous.Since(^p;^q)isthemaximizer,thenthisdirectionalderivativeshouldbenon-positive. 76


3{7 )isnonnegative,andifforanyi,suchthat^pifi6=0onanynonzeromeasuresubsetof4(^p;^q),thenlim!0dM(f;g)(p;q) 3{8 ). Sincesup(p;q)20L(h;p;q)L(h;p;q);infh2L1+()sup(p;q)20L(h;p;q)infh2L1+()L(h;p;q);infh2L1+()sup(p;q)20L(h;p;q)sup(p;q)20infh2L1+()L(h;p;q): 8 ]. 77


Proof. TakingderivativeofcM(p;q)withrespecttopiandqj,wehave@cM @pi=Zfilogfi @qj=Zgjloggj When^pi=C1or^qj=0,thesederivativesarenonnegative,CKL(fi;h(^p;^q))kh(^p;^q)k+0;CKL(gj;h(^p;^q))kh(^p;^q)k+0;


Thus, there is no duality gap.

Finally, since the maximizer $(\hat p, \hat q)$ of the Dual Problem is unique, the minimizer $(h(\hat p, \hat q), \hat R)$ of the Primal Problem is also unique.


ByTheorem 3.2 ,thisbhsatisesbh=argminhmmXi=11


IfC1;C21,thenn1+n2+1 6 ].

3-1) in the introduction. Due to a variety of reasons, an appropriate measure for the data collected is usually not available, or missing. Directly measuring the similarity of the data may not be proper with this base measure. In other words, even though the support of the data is on the whole of $\Omega$, some subset of


Bythedualityformulationoftherstterm,wehavethefollowingexpression,denotingh(p;q)=mXi=1pifi+nXj=1qjgj;mf;g(p;q):=mXi=1pifilogfi (3{9)


Moreover,givenanysequencefwig2A,thereexistasubsequenceoffwig,andafunction^w2L1+(),suchthatthissubsequenceconvergesweaklyto^winL1. Proof. e+1)d=Zfwg(wlog e)dlog eZfwgwd;Zfwgwdb e: +1).WehaveRwwd

Consideraminimizingsequencefwi2Agandlet^:=minw2A(w)=limi!1(wi): 3{9 )isconvexinw,andconcavein(p;q),thentheoptimalsolution,say(^p;^q;^w)isasaddlepoint[ 21 ].Toanalyzethiscostfunctionalfurther,letusexaminethefollowingdualproblem. max(p;q)2(minw(!)2L1+()M(p;q;w));here (3{10) 3{9 )satises:8><>:(^p;^q)=argmin(p;q)2Rexp(1


3{10 ).Intuitively,itshowsthat^w(!)putsmoreweightatlocationswherem(f(!);g(!))(^p;^q)issmaller. 3{9 )canbesimpliedasC3()min(p;q)2Zexp(1 3{10 ).Notethatforeveryw0,wehavetheinequality andequalityholdsonlywhenw=exp(v):(Infact,logwwandexp(v)areconjugatefunctions[ 8 ].) Letv=1 3{11 ),anddenote^w:=exp(v),thenwehave1 3{10 )canberewrittenasC3max(p;q)2Z(1exp(1 Denoteby(^p;^q;^w)theoptimalsolutionoftheproblem( 3{10 ). 85


3{9 )isstrictlyconvexinw,andstrictlyconcavein(p;q),thereisatmostonesaddlepoint.Hencetocompletetheproof,itsucestoshowthatthispoint(^p;^q;^w)isasaddlepoint[ 21 ],i.e.weneedtoshowM(^p;^q;w)M(^p;^q;^w);forallw2L1+() andM(p;q;^w)M(^p;^q;^w);forall(p;q)2: ThusM(^p;^q;^w)M(p;q;^w);forall(p;q)2: 86


$$\max_{(p,q)\in\Delta}\ \inf_{w\in L^1_+(\Omega)} M(p, q; w) = \inf_{w\in L^1_+(\Omega)}\ \max_{(p,q)\in\Delta} M(p, q; w) \tag{3-13}$$

and $(\hat p, \hat q, \hat w)$ is their optimal solution. Thus, (3-12) is the solution of the original problem (3-9).

Numerically, for data $f, g \in \mathbb{R}^N$, the gradient descent method with the Armijo rule can be used to find this optimal solution of the problem (3-12).
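Gradient descent with the Armijo rule can be sketched generically as follows. This is not the solver used in the thesis: the objective of (3-12) is not reproduced here, so a simple smooth convex stand-in, $\sum_i (e^{x_i} - c_i x_i)$ with minimizer $x_i = \log c_i$, is used instead. The function name `armijo_descent` and the parameters `step0`, `beta` and `sigma` are assumptions made for the illustration.

```python
import numpy as np

def armijo_descent(fun, grad, x0, step0=1.0, beta=0.5, sigma=1e-4,
                   tol=1e-6, max_iter=1000):
    """Gradient descent with Armijo backtracking line search."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        fx, step = fun(x), step0
        # Shrink the step until the sufficient-decrease (Armijo) condition holds.
        while step > 1e-16 and fun(x - step * g) > fx - sigma * step * g.dot(g):
            step *= beta
        x = x - step * g
    return x

# Stand-in convex objective (hypothetical, not the functional of the thesis).
c = np.array([1.0, 2.0, 3.0])
fun = lambda x: float(np.sum(np.exp(x) - c * x))
grad = lambda x: np.exp(x) - c

x_star = armijo_descent(fun, grad, np.zeros(3))
print(x_star, np.log(c))   # the iterate should approach log(c)
```

Backtracking keeps every accepted step a descent step without requiring a Lipschitz constant of the gradient in advance, which is convenient when the objective involves exponentials.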


31] which is shown in this section, to solve the dual problem efficiently.

4.1.1 How to Compute It: Outline of Algorithm

Here we unify the notions used in this subsection.


Let^beasmallpositivenumber,e.g.106. DenotetheradiusR:=PipiCKL(fi;f) 2.Fortwodierentindicesi1;i2,fi16=fi2. 3.WhenC26=0,someqjmaybenegative,theoretically,thereexistsapositivevector,suchthatf>.InLemma 4.4 ,wewilldiscusshowtodecidesuchavector,suchthateverycomponentfj>jforj=1;:::;N. 4.1 .Forclarity,considerthefollowingoptimizationproblemrst,given(p1;:::;pm+n)2,letp():=(p1();:::;pm+n())2,max:p()2M(p());withpi1():=pi1+;pi2()=pi2;pk()=pk;forotherindicesk6=i1;i2: d=CKL(fi1;f)CKL(pi2;f)>0,whichyieldstheoptimal>0. 89


Denoteupper:=maxD0,and:=argmaxfj0g,theneitherD=[0;upper],or[0;). Hereisthevalueofupper,whentheindicesi1;i2arechosen.upper:=8>>>>>>><>>>>>>>:min(C1pi1;pi2);ifi1m;i2m:min(C1pi1;C2+pi2);ifi1m;i2>m:min(pi1;pi2);ifi1>m;i2m:min(pi1;C2+pi2);ifi1>m;i2>m: d2=Pf(fi1fi2)2 d d2M d2j=k;0:=0;ifk2D: Inpractice,usuallythesetAisasmallportionofthewholeindexsetf1;2;:::;m+ng,thuswecheckwhethertheconstraintisstillactivebeforeNewtoniterationisused. 4.2 90


2k(0);k=0;1;:::,then0<1<:::<.Were-assignsasfollows.s:=8><>:upper;ifupper:k;ifupper>andkisthesmallestintegersuchthatg(k)<0: Proof. SinceFj()=Fj(0)+(fi1;jfi2;j)=0,>0andFj(0)>0,thenfi1;jfi2;j<0.Thenwehavelim!g()=Xffi1logfi1fi1gXffi2logfi2fi2glim!Xf(fi1fi2)logF())g=: Basedontheabovedenition,wecanstatethemainideaofinnerloopofouralgorithm:either=upperorcanbeapproximatedusingNewtoniterationwithBisectionrule. 4.1.3ThePropertiesofNewtonIteration Denition4.5. F()g,whereF()=,thendenoteg():=dM d=Xffi1logfi1 d2(0)=0g,and:=fjg()=g(0)g.


4.5 .HereareseveralpropertiesofNewton'siterationinourproblem. 1.g0()<0;g000()<0.Hence,sincek+1:=kg(k) 2.If00,theng00()>0.If0,theng00()<0. 3.Assumeg0boundedonD.IfisainteriorpointofD,andg0()<0for2[0;],thenk+1=(k)2g00() 2g0(k): 4.Similarto3.,ifs0,forsomepositiveintegers,withs2D,thenfkg1k=sdecreasesmonotonicallyandconvergesto. Proof. d=d2M d2=Xf(fi1fi2)2 d2=d3M d3=Xf(fi1fi2)3 d3=d4M d4=Xf2(fi1fi2)4 Weshowthethirdstatementasfollows. Sinces0,theng0(s)<0. Sinces,ands+1:=sg(s) 2g0(s)0;s:


Then,weknowthatfkg1k=sincreasesmonotonicallyandconvergesto. Fors,sincewehaveg()=g(s)+g0()(s)g(s)+g0(s)(s),forsomebetween;s,thentheincrementM(p())M(p(s))=Zsg()dZsg(s)+g0(s)(s)d=g(s)2 2(s+g(s) Theoutputincludesthevectorp,theactivevectora,theradiusR,theITCvectorF,andthecostM. Wedenotepk;ak;Rk;Fkastheirk-thiteratedvalues. 1.[Initialization]H:=(H1;:::;Hm+n),Hi:=Pffilogfig;p0:=(p01;:::;p0m+n),p0i:=1 2.[step1:theselectionoftheindexi1;i2] BecausedeterminingAandRarerelated,severaliterationsmightbeneeded. begin


beginwhilejdk+1i1dk+1i2j>g,douseNewton'siterationtoupdate,dk+1i1dk+1i2; end 4.3 Besides,theincrementofthecapacityfunctionalMisatleast1 2max(jg0(0)j;jg0(l)j)g2(1). Proof. d=Xffi1logfi1 d<0;d3g d3=d4M d4<0:


Supposeg()>0,whichimpliesg()>0forall2D,thenweknowthattheupperboundofpisanactiveconstraintinfact.Thiscasehappensonlywhenupper2D,andg(upper)>0.AndtheincrementofMisatleastg2 Inthecaseg()=0,Newton'siterationisused.NoticethatthemaximalincrementofMisgivenbyM(p())M(p(0))=R0g()d>0.Sinceonlytherstltermsaregeneratedbeforejg(l)jmin(upper;). If0min(upper;),thenre-assigns+1byBisectionrule,andcontinueNewton'siterationfromthisnews+1,thenfkg1k=s+1monotonicallydecreasingto.So,suchanintegerlexists. ToanalyzetheincrementofM,twocasescouldhappen,eithertherstcaseg(0)>g(l)>0orthesecondcaseg(l)<0. Now,intherstcase,00thenit'seasytoseetheincrementofMatleast1 2jg0(0)jg(0)2. 95
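The inner-loop idea discussed here (take the upper bound when the derivative is still positive there, otherwise locate the zero of the derivative by Newton's iteration with a bisection safeguard) can be sketched as below. This is a generic illustration under assumptions, not the thesis implementation: `g` stands for the derivative of the concave functional along the search direction, `dg` for its derivative, and the sketch ignores the case where the admissible interval is cut short by a singularity. The test function is a hypothetical stand-in.

```python
def safeguarded_newton(g, dg, upper, eps=1e-10, max_iter=100):
    """Maximize a concave function on [0, upper] from its derivative g and second derivative dg."""
    if g(0.0) <= 0.0:
        return 0.0                 # no ascent direction: stay at the left end
    if g(upper) >= 0.0:
        return upper               # the upper bound is an active constraint
    lo, hi = 0.0, upper            # bracket with g(lo) > 0 > g(hi)
    delta = 0.0
    for _ in range(max_iter):
        val = g(delta)
        if abs(val) < eps:
            break
        if val > 0.0:
            lo = delta
        else:
            hi = delta
        newton = delta - val / dg(delta)
        # Accept the Newton step only if it stays inside the bracket; otherwise bisect.
        delta = newton if lo < newton < hi else 0.5 * (lo + hi)
    return delta

# Hypothetical stand-in: M(d) = log(1 + d) - d / 2, so g(d) = 1 / (1 + d) - 1 / 2.
g = lambda d: 1.0 / (1.0 + d) - 0.5
dg = lambda d: -1.0 / (1.0 + d) ** 2

interior = safeguarded_newton(g, dg, upper=3.0)   # zero of g lies inside (0, 3)
clipped = safeguarded_newton(g, dg, upper=0.5)    # g is still positive at the bound
print(interior, clipped)
```

The bracket guarantees progress even when a Newton step overshoots, while Newton's iteration supplies fast local convergence near the zero.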


2g(0)=1 2jg0()j(1)g2(0): max(jg0(0)j;jg0(l)j). Theconclusionis,weknowthateither=upper,ortheNewton'smethodwithBisectionrulegivesthelocationofl,suchthatjg(l)j,forallk. Inthek-thiteration,denoteITCasFk(0):=f(pk)=(Fk1;:::;FkN),andtheindicesi1;i2arechosen,withCKL(fi1;Fk(0))>CKL(fi2;Fk(0)),thenwehaveforallkFkjj;wherej:=max1imfi;je(1+K fi;j);j=1;:::;N: 96


fi;j);i=1;:::;m:Fkjmax1imfi;je(1+K fi;j): fi;j)amongallfi;j>0. InNewton'siteration,saytheindexi1;i2chosen,sinceFk+1:=Fk+(fi1fi2),thenanupperboundofisforallk,thusg() 97


d,theng0()=d2M(p())

In the case $n = 0$, $C_1 \ge 1$, this turns out to be the problem of finding the smallest sphere enclosing all the distributions, in terms of information divergence. In information theory, this maximal quantity, called capacity, is the maximal quantity through an information channel, which is characterized by the columns of the transition matrix of a discrete memoryless channel, i.e. $\{f_i\}$ ([17][19]).

Example 1:

Let f := rand(100, 100), whose values are sampled from the uniform distribution on $[0, 1]$, i.e. $\{f_i,\ i = 1, \ldots, 100\}$ are vectors with 100 components. The x-axis indicates the number of iterations (only step 1). The y-axis of the left figure indicates the value of $M(p^k)$ (Left), and the y-axis of the right figure indicates the value of $g(p^k)$ (Right). The total numbers of iterations for steps 1 and 2 are 155 and 155 respectively. ($\epsilon = 0.001$ is used.)


Consider $f_1 = [1, 1, 1]/3$, $f_2 = [1, 2, 3]/6$, $f_3 = [1, 4, 5]/10$, with the initial $p = [\frac{1}{3}, \frac{1}{3}, \frac{1}{3}]$. We use 14 iterations of step 1, and it takes 20 iterations of step 2 in total ($\epsilon = 0.001$ is used) to achieve accuracy $10^{-7}$, i.e. $g(p^k)$ less than $10^{-7}$. The optimal $p = [0.4609, 0.0462, 0.4929]$, and their CKL divergences are $[0.0432883, 0.0432884, 0.0432883]$ respectively. In Figure 4-2, we compare our method with the Blahut-Arimoto (1972) algorithm [1][9]. (In fact, it takes more than 1000 iterations to attain the difference $g$ less than $10^{-4}$. The total number of iterations in our algorithm is 100 times smaller, and the executed time is about 4 times longer.)

28 ].

Let f := rand(100, 100), and normalize each column vector such that their sums are all 1. Take the first 50 vectors as the normal data, and the remaining 50 vectors as the abnormal data. Set the parameters $C_1 = 0.7$, $C_2 = 0.3$.

Here are the results for $M$ and the error at each iteration.

The x-axis indicates the number of iterations (only step 1 counted). The y-axis of the left figure indicates the value of $M(p^k)$ (Left). The y-axis of the right figure indicates the value of $g(p^k)$ (Right). The error figure shows that the convergence is linear. The numbers of iterations for steps 1 and 2 are 842 and 944 respectively, when $\epsilon = 0.01$. (The number of iterations in step 2 increases to 1240 for $\epsilon = 0.001$.)
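For reference, the Blahut-Arimoto iteration used as the baseline in the three-distribution example above can be sketched in a few lines. This is the standard textbook form of the algorithm, written here under the assumption that all entries are strictly positive and that divergences are measured in nats; it is not the algorithm proposed in this chapter, and the names `kl_rows` and `blahut_arimoto` are hypothetical.

```python
import numpy as np

def kl_rows(F, q):
    """KL divergence (in nats) of every row of F from the distribution q."""
    return np.sum(F * np.log(F / q), axis=1)    # assumes strictly positive entries

def blahut_arimoto(F, n_iter=2000):
    """Return weights p, the row divergences at p, and the history of I(p)."""
    p = np.full(F.shape[0], 1.0 / F.shape[0])   # uniform initial weights
    info = []
    for _ in range(n_iter):
        d = kl_rows(F, p @ F)                   # divergences from the current mixture
        info.append(float(p @ d))               # mutual information at the current p
        p = p * np.exp(d)                       # multiplicative update
        p /= p.sum()
    return p, kl_rows(F, p @ F), info

F = np.array([[1.0, 1.0, 1.0], [1.0, 2.0, 3.0], [1.0, 4.0, 5.0]])
F /= F.sum(axis=1, keepdims=True)               # rows f1, f2, f3 of the example
p, d, info = blahut_arimoto(F)
print("weights:", p)
print("divergences:", d)
```

The printed weights and divergences can be compared with the values reported for this example; the slow convergence of this multiplicative update is the behavior that the comparison in the text refers to.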


4.3.1 Introduction

Unlike the large data sets available in most machine learning problems, the data size of the cardiac shape in our problem is relatively small. Since these data are epicardial borders which were traced in the ultrasound images by experts, it is expensive to produce a large set of training data; on the other hand, understanding the structure of cardiac contours is an important problem in heart shape analysis.

In the traditional clustering method, a given number of centers are computed in order to represent the whole data set with minimal distortion or dissimilarity. Our model gives an


When the parameter $C_1$ is less than 1 3.6 , the model actually performs the clustering of normal data. The grouping exercise is to cluster those normal shapes by choosing several ITC curves. In other words, these ITC curves represent their own groups in the sense of similarity. Unlike the traditional grouping, the radius in our model, used in conjunction with the ITC, gives the amount of distortion. When $C_1$ is not very small, a better grouping is achieved through minimizing the misclassification errors of both normal and abnormal data. More precisely, let $G_k$ be the set of curves in group $k$, $k = 1, \ldots, K$; then our model minimizes $\sum_{k=1}^K M_k$, where
$$M_k := R_k + C_1 \sum_{i \in G_k} \xi_i + C_2 \sum_{j=1}^n \eta_j, \quad k = 1, \ldots, K.$$

Here are our model assumptions and algorithm outline.

3.1). Each group is enclosed by a sphere with a radius measured by KL-divergence. The center of the smallest sphere is our averaging shape. Then we can decide the normality of any newly arriving datum: merely compute the KL-divergence from the center to this datum and compare it with the radius.
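Once a center and a radius are fixed, the cost of a single group can be evaluated directly. The sketch below assumes the generalized divergence $\mathrm{CKL}(f, h) = \sum (f \log(f/h) - f + h)$ on discretized shapes and takes the misclassification penalties to be the excesses $\max(\mathrm{CKL}(f_i, h) - R, 0)$ for normals and $\max(R - \mathrm{CKL}(g_j, h), 0)$ for abnormals. The function names and the toy data are hypothetical.

```python
import numpy as np

def ckl(f, h):
    """Discrete generalized KL divergence between two positive vectors."""
    f, h = np.asarray(f, dtype=float), np.asarray(h, dtype=float)
    return float(np.sum(f * np.log(f / h) - f + h))

def group_cost(center, radius, normals, abnormals, c1, c2):
    """Radius plus weighted penalties for normals outside and abnormals inside the sphere."""
    xi = [max(ckl(f, center) - radius, 0.0) for f in normals]      # normals outside
    eta = [max(radius - ckl(g, center), 0.0) for g in abnormals]   # abnormals inside
    return radius + c1 * sum(xi) + c2 * sum(eta)

# Toy data (hypothetical): one normal at the center, one clearly different abnormal.
center = [0.25, 0.25, 0.25, 0.25]
normals = [[0.25, 0.25, 0.25, 0.25]]
abnormals = [[0.7, 0.1, 0.1, 0.1]]

small_sphere = group_cost(center, 0.1, normals, abnormals, c1=0.5, c2=0.5)
large_sphere = group_cost(center, 1.0, normals, abnormals, c1=0.5, c2=0.5)
print(small_sphere, large_sphere)
```

In this toy case the small sphere excludes the abnormal datum and pays only for its radius, while the large sphere pays both for its radius and for enclosing the abnormal datum.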


3.3, consisting of three terms, one for the dissimilarity cost, the rest for the misclassification cost.

$\frac{1}{2} r^2$ with the polar coordinate of the curve $r = r(\theta)$. In order to have a unique expression of shape, we perform the following rigid transform. Translate its mass center to the origin, and also normalize the area distribution such that $\int f(\theta)\, d\theta = 1$. The rotation angle $\phi$ of the datum $f$ is determined by $\phi = \arg\min_{\phi} \mathrm{CKL}(f(\theta + \phi), \bar f(\theta))$; here $\bar f$ is the center. In detail, $\mathrm{CKL}(f_i, \bar f) := \min_{\phi} \mathrm{CKL}(f_i(\theta + \phi), \bar f(\theta))$.

The endocardial boundary, 4-chamber ED (end diastole phase)-Endo (endocardial), is used throughout the experiments. These epicardial borders were traced in the ultrasound images by experts. The normality is determined by 4 curves, ED-Endo, ED-Epi (epicardial), ES (end systole)-Epi, and ES-Endo, and the intensity of the ultrasound image. Of course, since only ED-Endo is used in our experiments, it is unlikely to get 100% accuracy; however, our result shows that it is still a nice classifier under such circumstances.
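The shape representation and the rotation alignment can be sketched on a uniform angular grid. The code below rests on several assumptions: the area density is taken as $f(\theta) = \frac{1}{2} r(\theta)^2$, rotations are restricted to circular shifts of the grid, the translation of the mass center is omitted, and the test contour is synthetic. The names `area_distribution` and `align_rotation` are hypothetical.

```python
import numpy as np

def area_distribution(r):
    """Normalized area density f = r^2 / 2 sampled on a uniform grid of angles."""
    f = 0.5 * np.asarray(r, dtype=float) ** 2
    dtheta = 2.0 * np.pi / len(f)
    return f / (f.sum() * dtheta)             # discrete integral of f equals 1

def ckl(f, h, dtheta):
    """Discrete generalized KL divergence with respect to d(theta)."""
    return float(np.sum(f * np.log(f / h) - f + h) * dtheta)

def align_rotation(f, center):
    """Circular shift of f that minimizes CKL to the center, and the minimal divergence."""
    dtheta = 2.0 * np.pi / len(f)
    costs = [ckl(np.roll(f, k), center, dtheta) for k in range(len(f))]
    k = int(np.argmin(costs))
    return k, costs[k]

# Synthetic contour (hypothetical): r(theta) = 1 + 0.3 cos(theta) on 64 grid points.
theta = np.arange(64) * 2.0 * np.pi / 64
center = area_distribution(1.0 + 0.3 * np.cos(theta))
shape = np.roll(center, 5)                    # the same contour rotated by 5 grid steps
shift, cost = align_rotation(shape, center)
print(shift, cost)
```

An exhaustive search over grid shifts is adequate for a few hundred samples per contour; a finer rotation angle would require interpolating the contour between grid points.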


2. Compute the group $G_k := \{f_i \mid k = \arg\min_{k'} \mathrm{CKL}(f_i, \bar f_{k'}),\ i = 1, \ldots, m\}$, $k = 1, \ldots, K$.
3. Compute $M_k$, $k = 1, \ldots, K$, with $C_2 = 0$, to get $R_k$ and the updated center $\bar f_k$, $k = 1, \ldots, K$.
4. The algorithm stops if no change is made in Step 3; otherwise go to Step 2.

4.4. (Left) In each sub-figure, the curves indicate the heart contours in the same group. The ITC curves are also plotted. Since only a local minimum is attained through the K-means method, we run the program several times with different random initial data and choose the run with the smallest cost.
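The alternating structure of these steps (assign each shape to its closest center, update the centers, stop when nothing changes, and keep the best of several random restarts) can be sketched as follows. One substitution is made and should be read as an assumption: the center update of Step 3, which in the thesis comes from solving the capacity problem, is replaced by the arithmetic mean of the group members, which minimizes the summed divergence $\sum_i \mathrm{CKL}(f_i, h)$ over $h$. The function names and the toy data are hypothetical.

```python
import numpy as np

def ckl_matrix(shapes, centers):
    """D[i, k] = generalized KL divergence from shapes[i] to centers[k]."""
    f = shapes[:, None, :]
    h = centers[None, :, :]
    return np.sum(f * np.log(f / h) - f + h, axis=2)

def cluster(shapes, n_groups, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = shapes[rng.choice(len(shapes), n_groups, replace=False)].copy()
    labels = None
    for _ in range(n_iter):
        new_labels = np.argmin(ckl_matrix(shapes, centers), axis=1)   # Step 2
        if labels is not None and np.array_equal(new_labels, labels):
            break                                                      # Step 4
        labels = new_labels
        for k in range(n_groups):                                      # Step 3 (stand-in)
            members = shapes[labels == k]
            if len(members) > 0:
                centers[k] = members.mean(axis=0)
    cost = ckl_matrix(shapes, centers)[np.arange(len(shapes)), labels].sum()
    return labels, centers, float(cost)

# Toy data (hypothetical): two well-separated groups of three distributions each.
shapes = np.array([[0.60, 0.30, 0.10], [0.58, 0.32, 0.10], [0.62, 0.28, 0.10],
                   [0.10, 0.30, 0.60], [0.10, 0.32, 0.58], [0.10, 0.28, 0.62]])
runs = [cluster(shapes, 2, seed=s) for s in range(5)]   # several random restarts
labels, centers, cost = min(runs, key=lambda r: r[2])   # keep the smallest cost
print(labels, cost)
```

Restarting from different random initial centers mirrors the remark that only a local minimum is attained by a K-means type iteration.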


In this experiment, we cluster all normal contours into 3 groups. Here we update their membership by comparing their CKL divergence excess outside these spheres, i.e. $g(i) := \arg\min_{k \in \{1, \ldots, K\}} (\mathrm{CKL}(f_i, \bar f_k) - R_k)$.

The following is the classification method.

2. Compute the group $G_k := \{f_i \mid k = \arg\min_{k'} (\mathrm{CKL}(f_i, \bar f_{k'}) - R_{k'}),\ i = 1, \ldots, m\}$, $k = 1, \ldots, K$.
3. Compute $M_k$, $k = 1, \ldots, K$, to get $R_k$ and the updated center $\bar f_k$, $k = 1, \ldots, K$.
4. The algorithm stops if no change is made in Step 3. Otherwise go back to Step 2.

This algorithm consists of two parts. First, cluster the normal training data into 3 groups by the previous clustering method. The result will be the initial assignment for the second step. Second, classify each group against the whole abnormal training data iteratively. The radius obtained in the first step is usually too large, i.e. the sphere may contain abnormal data. Then for each group, we have their ITCs $(\bar f_1, \bar f_2, \bar f_3)$ and radii $(R_1, R_2, R_3)$. In Figure 4-6, the solid curves are those correctly separated by the spheres, and the dotted grey curves are not. The top row displays the training data, and


Now, for each testing datum, assign it to one of these groups by seeking the smallest CKL divergence from these 3 ITCs. For instance, suppose that $f$ is a testing datum and $1 = \arg\min_{i=1,2,3} \mathrm{CKL}(f, \bar f_i)$; then assign $f$ to group 1. If this divergence is smaller than the group radius, $\mathrm{CKL}(f, \bar f_1) \le R_1$, we classify this testing shape into the normal group. If not, we classify it into the abnormal group.

Among 24 normal testing data: success rate 83.3%.

Among 22 abnormal testing data: success rate 86.4%. Failure numbers are 4 and 3 out of 24 and 22. In Figure 4-7, the value on the x-axis is $\mathrm{CKL}(f, \bar f_i) - R_i$, if $f$ is assigned to group $i$. The histogram of the testing data shows that most of the misclassified data are close to the spheres.

4-7.) In other words, this model in fact minimizes this transition region.
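The decision rule for a testing shape and the arithmetic behind the reported success rates can be sketched as below. The divergence is assumed to be the discrete generalized KL divergence, and the centers, radii and test vectors are made-up placeholders rather than the cardiac data; only the counts 24, 22, 4 and 3 are taken from the text.

```python
import numpy as np

def ckl(f, h):
    """Discrete generalized KL divergence between two positive vectors."""
    f, h = np.asarray(f, dtype=float), np.asarray(h, dtype=float)
    return float(np.sum(f * np.log(f / h) - f + h))

def classify(f, centers, radii):
    """Assign f to the closest center, then compare the divergence with that group's radius."""
    d = [ckl(f, c) for c in centers]
    k = int(np.argmin(d))
    return k, ("normal" if d[k] <= radii[k] else "abnormal")

def success_rate(total, failures):
    """Percentage of correctly classified data."""
    return 100.0 * (total - failures) / total

# Placeholder centers and radii (hypothetical values).
centers = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
radii = [0.01, 0.01]
print(classify([0.5, 0.3, 0.2], centers, radii))
print(classify([1 / 3, 1 / 3, 1 / 3], centers, radii))

# Rates implied by the failure counts quoted in the text.
print(round(success_rate(24, 4), 1), round(success_rate(22, 3), 1))
```

A testing shape is therefore reported as abnormal whenever it falls outside the sphere of its closest group, even if it happens to lie inside the sphere of a more distant group.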


4-5. Thus, these parameters are suggested to be less than 1.0. One can view these parameters as the weight between clustering and classifying. When $C_1, C_2$ are small, the model behaves more like clustering, since the radius, which controls the similarity of curves, is minimized. When they are large, it acts more like a classifier, since the classification error becomes small (which reduces the number of outliers); on the other hand, the radius could be large, which implies a large dissimilarity.


4.4. For comparison, a circle with radius 1 is plotted, too. This weight function shows that several parts of the cardiac contour have more influence on normality, especially those parts with high curvature.


Unlike the traditional clustering method, we obtain not only the clustering centers, but also the size of each group and the location of the transition region of each group. Finally, using the location weighted classifier, we obtain more understanding about normality. Overall, we discuss the mathematical properties of this model, and the normality result in the testing experiment is good, validating the information retrieved from the whole procedure.


Numerical computations of our capacity algorithm. (A)(B) Results of Example 1. (C)(D) Results of Example 3.
Comparison between our algorithm and the B-A algorithm.


Clustering all 91 normal shapes into 4 groups.
The weight function, using $C_1 = C_2 = 0.5$, $C_3 = 0.05$, plotted as a dotted line; the solid line is a unit circle for comparison.


Training and testing experiment result using $C_1 = C_2 = 0.5$. (A)(B)(C) Training and (D)(E)(F) testing heart shapes of 3 normal groups and 3 ITCs (in large dots); the grey curves are those outside the spheres.
Figure 4-6. Training and testing results using $C_1 = C_2 = 1.0$. (A)(B)(C) Training and (D)(E)(F) testing heart shapes of 3 normal groups and 3 ITCs (in large dots); the grey curves are those outside the spheres.


Histogram of classification results. Histogram of normal Training (A) and Testing (B) results, abnormal Training (C) and Testing (D) results.


35], image tracking, video indexing, image segmentation and registration [14][34].

One of the important approaches to finding optimal correspondences is based on minimizing a mixture of stretching and bending energies such that main features on curves can be matched. A variety of shape signature matching methods involving curvatures, bending angles, and orientations have been proposed depending on applications, e.g. [5][33][38]. Recently, Klassen et al. [26][30] (Chap. 12) developed an approach to shape matching by computing geodesics on a shape manifold. The numerical implementation was later improved by F. Schmidt et al. [32].

A study of these techniques compels us to posit that a curve matching model having the following properties is preferable: First, the cost functional is differentiable, and the correspondence is symmetric and robust to noise, distortion and occlusion (e.g. see [33]). Second, the distance between two corresponding curves has a metric property and leads to a nearest neighbor in a large database [37]. Third, the model is capable of finding the correspondences among a group of curves simultaneously, instead of two curves pairwise at a time.

Tagare [34] successfully solved the symmetry problem by introducing a bimorphism in the space of pairs of correspondences, whose two components determine the correspondence between the curves. This bimorphism optimizes a cost functional consisting of two parts. One of these forces the bimorphism close to a uniform mapping, and the other minimizes the local curvature difference. Tagare et al. base their energy function on the $L^2$ norm.


33] Sebastian et al. proposed a symmetric model based on minimizing a functional measuring the differential arc length and orientation in the $L^1$ metric. They justified that this distance is in fact a metric. This metric property is helpful in indexing large databases efficiently [37]. More discussion about the variety of costs can be found in [5], but most of them are limited to pairs of curves only.

Pairwise matching is less robust to noise, distortion, and occlusion, and may lead to undesirable correspondences, since the information retrieved from a group is more reliable than that from just any two. Moreover, it is clearly important to have the following transitivity property: if $f, g, h$ are the correspondence mappings from $a$ to $b$, $b$ to $c$, and $a$ to $c$, respectively, then $g \circ f = h$. It is difficult to construct symmetric pairwise mappings with this property, but as will be shown below this can be achieved by our method of matching a group of curves simultaneously. For instance, in Fig. 5-1, we use pairwise matching on three heart shapes. Three sampling points A, B, C on the right and middle heart shapes are optimal pairs, but their optimal corresponding points on the third heart shape are not consistent. In order to solve this, we need to find the correspondences among the three curves simultaneously. A simple way is to construct optimal mappings between each curve and another curve, called the average curve, such that a similarity quantity is minimized.

Mathematically, given $n$ curves $C_1, \ldots, C_n$, we search for an optimal average curve $C_0$ and optimal mappings $g_1, \ldots, g_n$ from the curve $C_0$ to the curves $C_i$ such that $\sum_{i=1}^n \mathrm{Similarity}(C_i(g_i(s)), C_0(s))$ is minimized; here a proper similarity measurement $\mathrm{Similarity}(\cdot, \cdot)$ is chosen to meet practical demands.

To proceed further, we choose the similarity measurement with the following two features. We adopt a standard approach to divide the cost into a stretching part and a bending part. The bending energy is the fitting term which measures the similarity of curves given the optimal mapping, and the stretching part measures the similarity of the optimal mapping


Second, both cost functions in the stretching and bending energies are chosen from Bregman divergences. The average curve can then be easily constructed by averaging arc-lengths and angle features among all curves. This simplifies the derivation of the average curve. Thus, we adopt a Kullback-Leibler divergence based stretching energy to analyze this multiple curve matching problem. We will discuss several related types of cost functions and reveal their mathematical properties.

In this chapter, we focus on curves characterized by two local features, arc-length and angle variation. The task of matching curves is performed through matching these two features. Our interest is then how to incorporate these two kinds of information to achieve an optimal matching. The simplest way is to use a tradeoff parameter between the stretching and bending energies. In general, this parameter should be chosen properly in order to match the desired features on noise-corrupted curves.

Under some circumstances, the intensity of noise is location dependent among similar curves. We propose a location weighted model to deal with curves contaminated by location-dependent noise, especially in cases of occlusion. This model optimally partitions the domain of each curve into a similar portion and a dissimilar portion by minimizing the angle features on the similar portion and the arc-length of the noise-corrupted region.

With one more criterion on the selection of both stretching and bending cost functions, namely parametrization independence, the feature matching process turns out to match feature measures. Kullback-Leibler divergence and Hellinger distance are two well-known distance functions for probability distribution functions. Based on these two distance functions, we propose an interesting model, the Jensen-Shannon-Hellinger (JSH) model. This model provides an interesting and meaningful interpretation of matching


The optimal mapping is the one under which the average curve preserves the most features among these curves, which here include the arc-length and the total angle variation. In other words, we search for an optimal mapping such that the average curve has the maximal arc-length and the maximal total angle variation.

33 ].)

5.2.1 Model Description


This is equivalent to computing the number of possible mappings f from $X_r$ to $Y_r$ for all r = 1, ..., p. Let $s_r, t_r$ be the sizes of the sets $X_r, Y_r$; then the number of possible mappings f from $X_r$ to $Y_r$ for r = 1, ..., p simultaneously is given by $\prod_{i=1}^{p} t_i^{s_i}$ …

Similarly, given another set Z of size l, suppose Z is partitioned into subsets $Z = \cup_{i=1}^{p} Z_i$, where each $Z_i$ has size $u_i$. Consider a mapping g from Z to Y, with each $Z_i$ mapped into $Y_i$. Then we have approximately $\exp(\sum_{i=1}^{p} u_i \log u_i$ …

Here is another explanation of our model. KL-divergence is intimately connected with the maximum likelihood estimator for multinomial distributions. For instance, Stein's lemma [17]: drawing k times independently with distribution Q from a finite set X, the probability of obtaining $x \in X^k := X \times \cdots \times X$ (k factors) is $Q^k(x) = \prod_{a \in X} Q(a)^{N(a)} = e^{\sum_{a \in X} N(a) \log Q(a)}$, where N(a) denotes the number of occurrences of a ∈ X. Then, given the empirical distribution P defined by P(a) = N(a) …

In the following, we model the curve matching as the dissimilarity of two distributions P, Q. Suppose given n points 0 …

We denote the lengths of the n subintervals by $q_i := x_i - x_{i-1}$, i = 1, ..., n; then $\sum_{i=1}^{n} q_i = 1$. Let X be a random variable taking values in {1, ..., n}, where the probability of (X = i) is $q_i$, i = 1, ..., n. Then X follows a multinomial distribution. Denote $Q := \{q_1, \ldots, q_n\}$. Denote $p_i := y_i - y_{i-1}$, i = 1, ..., n, and $P := \{p_i, i = 1, \ldots, n\}$. KL(P, Q) measures the distortion in f. Clearly, letting $y_i = x_i$, i = 1, ..., n, i.e. f = id, we obtain the minimal value KL(P, Q) = 0, which maximizes $e^{-k\,KL(P,Q)}$.

In the case of multiple curves, suppose m multinomial distributions $P_1, \ldots, P_m$ are observed from m independent random variables $X_1, \ldots, X_m$ based on the identical sampling number, which are supposed to follow the distribution Q; then $\sum_{i=1}^{m} KL(P_i, Q)$ quantifies an overall distortion. To have a nontrivial optimal matching, we consider a bending energy, measuring the likelihood of angle functions affected by Gaussian noise.

Given a group of curves $\{C_i, i = 1, \ldots, N\}$, our task is to determine an "average curve" $C_0$ and the optimal correspondence functions $g_i : I_0 \to I_i$. We propose maximizing the conditional probability $P(C_0, g_i, i = 1, \ldots, n \mid C_i, i = 1, \ldots, n)$. Assume $P(C_0, g_i)$ and $P(C_i \mid C_0, g_i)$ follow multinomial and Gaussian distributions respectively. By the Bayes formula, we have $P(C_0, g_i \mid C_i) = \frac{P(C_0, g_i)\,P(C_i \mid C_0, g_i)}{P(C_i)}$.
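The stretching term described above can be illustrated with a short numerical sketch. It assumes a hypothetical partition of [0, 1] and a hypothetical monotone map f(x) = x^2; the helper names `kl_divergence` and `subinterval_lengths` are illustrative and do not come from the thesis. The identity map gives zero distortion, while the warped map gives a strictly positive KL(P, Q).

```python
import numpy as np

def kl_divergence(p, q):
    """KL(P, Q) = sum_i p_i log(p_i / q_i) for discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # terms with p_i = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def subinterval_lengths(points):
    """Lengths of consecutive subintervals of an increasing partition of [0, 1]."""
    return np.diff(np.asarray(points, dtype=float))

# Hypothetical partition 0 = x_0 < ... < x_n = 1 (not taken from the thesis).
x = np.array([0.0, 0.2, 0.5, 0.9, 1.0])
Q = subinterval_lengths(x)              # q_i = x_i - x_{i-1}
P_identity = subinterval_lengths(x)     # y_i = x_i, i.e. f = id
P_warped = subinterval_lengths(x ** 2)  # y_i = x_i^2, an illustrative monotone map

distortion_identity = kl_divergence(P_identity, Q)
distortion_warped = kl_divergence(P_warped, Q)
print(distortion_identity, distortion_warped)
```

Any other increasing map of [0, 1] onto itself could be substituted for x^2; the distortion vanishes only when the subinterval lengths agree.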


5-3 .)SimilarlywedenotebyG2;G0thesetofallmappingsfromItoI2,andI0respectively.Wecallg(s):=S1 Deneamappingdivergencebetweentwomappingsg12G1;g02G0,byCKLI(g01;g00):=ZS0g01logg01 a+b+alog2a a+b. 5-3 )alongthecurveCi)withtheassociatedmappingfunctionsgi2Gi,i=0;1;2.GivenatradeoparameterC,denetheelasticdivergenceD(g1;g0)C1;C0:=G1G0C1(I1)C1(I0)!RbyZS0(g01logg01


2(x+y)foranyx;y2. Basedonthis,theaboveelasticenergyE(C1;C2)canbereformulatedas


zx+z+ylogy zy+zxlog2x x+y+ylog2y x+y; 2(x+y))2+(y1 2(x+y))2=1 2(xy)2: 121


Hereafter(inthischapter),denote:=f(g1;g2):G1G2;g01+g02=2gi.e=f(g1;g2):g1(s)+g2(s)=2s;g1(0)=g2(0)=0;g1(S)=S1;g2(S)=S2g: satisesPni=1gi(s)=ns,andtheoptimalaveragecurveC0isgivenbytheanglefunction0(s)describedbytheequation: Inotherwords,the\averagecurve"isconstructedbytakingthearithmeticmeanofallthecurves.ThisisacommonpropertyforallBregmandivergences[ 4 ].However,wecanshowthatthismappingdistanceistheonlyBregmandivergencewiththeparametrizationindependence.KL-divergenceistheoneamongBregmandivergence,whichhasthemeasure-invariantproperty. 5{2 )are g2;C0(givenby0(s)),andthecurvesaresmooth,thenwecanshowthese g02>0a.e.inthefollowingtheorem.Thentheoptimal 122


C.1 .) Proof. Also,weconstructasequencef^g01;k:=1 Sincebothfunctions1(s1);2(s2)aresmooth,thenlimk!1k1(^g1)2(^g2)kL2=k1(


Proof. ThendE(g1(s;);g2(s;)) Next,weevenhavetheuniquenessofthesolutionprovidedc0issmallenough. C.1 .) 124


First, suppose we want the dissimilarity measurement to have the metric property. One way to achieve this is to modify the second term of our elastic energy: E(C1;C2):=minZS0(g01log2g01 …

Using the fact that the square root of the JS divergence is a metric [22], we can prove the following result.

Proof.


f+g+glog2g f+g)ds,wehavethetriangleinequality([ 22 ])p 2.Observej1(s1)g013(s2)g02j+j2(s2)g023(s3)g03jj1(s1)g013(s3)g03j;

Thus we know that the square root of the cost E is a metric.

Second, suppose a curvature based model with a square root metric property is desired. Such a model may be given as follows,

Third, since the angle functions change as the reference coordinate system is rotated, a rotation invariance property is sometimes desired. To achieve this, we introduce another parameter to match the angle functions, i.e. we minimize this cost with respect to (g1;g2;):

In this setup we can prove the following result for the rotation angle.
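The metric property cited from [22] can be checked numerically on discrete distributions. The sketch below assumes the unnormalized JS form, sum of f log(2f/(f+g)) + g log(2g/(f+g)), and tests the triangle inequality for its square root on random samples. The function names are illustrative, and a random check is a sanity test rather than a proof.

```python
import numpy as np

def js_cost(f, g):
    """Sum of f log(2f/(f+g)) + g log(2g/(f+g)) over a discrete domain."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    m = f + g
    total = 0.0
    for a in (f, g):
        mask = a > 0  # 0 * log(0) is treated as 0
        total += np.sum(a[mask] * np.log(2.0 * a[mask] / m[mask]))
    return float(max(total, 0.0))  # clip tiny negative rounding errors

def js_distance(f, g):
    """Square root of the JS cost, the quantity claimed to be a metric."""
    return float(np.sqrt(js_cost(f, g)))

rng = np.random.default_rng(0)
worst_slack = np.inf
for _ in range(2000):
    # Three random probability vectors on 6 points (illustrative data).
    f, g, h = rng.dirichlet(np.ones(6), size=3)
    slack = js_distance(f, g) + js_distance(g, h) - js_distance(f, h)
    worst_slack = min(worst_slack, slack)
print("smallest triangle-inequality slack:", worst_slack)
```

A nonnegative smallest slack over all sampled triples is consistent with the triangle inequality.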


Theaboveresultsaysinwords:ifrotationinvarianceisdesired,onesimplyappliestheaboveoptimalrotationtothecurveC1. minw2L1+;(g1;g2)2ZS0g01logg01+g02logg02+w(12)2+1


min(g1;g2)2ZS0g01logg01+g02logg02+1

The underlying idea of the first equation is to use a variable weight w instead of a fixed parameter C. The cost in the last equation gives more weight to the similarities, instead of giving less weight to the dissimilarities. In the presence of occlusion, the results of these two approaches can be different (see the experiment). Clearly, a large term $(\theta_1 - \theta_2)^2$ has a greater effect in model (5-2) than in this model (5-8).

In equation (5-8), as $|\theta_1 - \theta_2|^2$ is large compared to 1 … 5 ].

In the following, we would like to introduce several models with interesting properties. First, let us discuss two extreme artificial examples. Suppose two curves are similar to each other and one curve is bent from the other, i.e. no stretching of the mapping is involved. Then there is no need to find the optimal matching; the distance between the two curves is simply the bending energy with respect to $g_1(s) = g_2(s) = s$. The other extreme is that one curve is stretched from the other with no bending involved. Then the optimal mapping can be uniquely determined, as in the following remark.


JTV(i), where TV(i) is the total angle variation of curve $C_i$. Then the optimal matching simply matches those sampling points $s_{i,j}$ with the same j among all i. Clearly, it is unique and transitive. It matches all main features, for instance, sharp corners on curves.

These models are total angle variation based. In this section, we assume the total angle variations are finite for all curves. To avoid noise effects, we consider smoothed curves with minimal total angle variations and minimal area difference; see Fig. 5-4. Another way to reduce the noise is to find an optimal sampling on each curve such that the angle total variation is minimal subject to a given bound on the maximal sampling intervals.

33 ],Zjds1


Here, we propose a model which preserves the above properties and moreover provides a consistent correspondence when matching a group of curves. Let us introduce a new bending energy between two curves described by two angle functions $\theta_1(s_1)$, $\theta_2(s_2)$: (the function sgn(x) is defined as x/|x|)

Here we assume that both curves have finite total variations on angle functions.

Z(sgn(g(x))p


5{9 )thend1(g1(s)) Proof. ifthereexistsanyportionofmatchedcurves,sgn(d1(g1(s)) 2(a+b)];g1(s):=g1(b+2(sb));g2(s):=g2(b);s2(1 2(a+b);b]: 131


thenconstructthefunctiong2on[a;b]asfollows,8><>:d2(g2(s)) 132


i.e.thesumofarc-lengthandthetotalanglevariationoftheaveragecurve. 2: 133


2,thenwehavethemodelintheprevioussection.Supposep=1 2;q=0,thenthemodelbecomesZsgn(d1 f+g+glog2g f+g)d;dH(f;g)2:=Z(f+g2p C.2 ) 134


2; 2Z(flog2f f+g+glog2g f+g)d: 1.Incaseofprobabilitymeasures,wehavedH(;)=(R2(1p 2,anddH(;)assumesvalues[0;p 2.Forprobabilityproductmeasures:=12;:=12on:=12,wehave11 2d2H(;)=(11 2d2H(1;1))(11 2d2H(2;2)); 2d2H(1;1)d2H(2;2)d2H(;)d2H(1;1)+d2H(2;2):

(The proof is in Appendix C.2.)

Here we show that the corners of V-shapes will be matched when C is large enough.
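The product-measure statement above can be sanity-checked numerically. The sketch below assumes the reading $d_H^2(f, g) = \sum (\sqrt{f} - \sqrt{g})^2$ for discrete probability vectors and the product identity $1 - \frac{1}{2} d_H^2(\mu, \nu) = (1 - \frac{1}{2} d_H^2(\mu_1, \nu_1))(1 - \frac{1}{2} d_H^2(\mu_2, \nu_2))$. The symbols $\mu, \nu$ and all numerical values are assumptions made for illustration, since the original symbols did not survive in the text.

```python
import numpy as np

def hellinger_sq(f, g):
    """Squared Hellinger distance: sum of (sqrt(f) - sqrt(g))^2."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    return float(np.sum((np.sqrt(f) - np.sqrt(g)) ** 2))

# Illustrative marginal distributions (not taken from the thesis).
mu1, nu1 = np.array([0.2, 0.8]), np.array([0.5, 0.5])
mu2, nu2 = np.array([0.1, 0.3, 0.6]), np.array([0.4, 0.4, 0.2])

# Product measures on the product space, flattened to vectors.
mu = np.outer(mu1, mu2).ravel()
nu = np.outer(nu1, nu2).ravel()

lhs = 1.0 - 0.5 * hellinger_sq(mu, nu)
rhs = (1.0 - 0.5 * hellinger_sq(mu1, nu1)) * (1.0 - 0.5 * hellinger_sq(mu2, nu2))
print(lhs, rhs)
```

The identity holds because the sum of $\sqrt{fg}$ factorizes over product measures, which also gives the upper bound $d_H^2(\mu, \nu) \le d_H^2(\mu_1, \nu_1) + d_H^2(\mu_2, \nu_2)$.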


Clearly either the corners are matched, or the corners are not matched and the optimal matching is the uniform mapping.

Suppose the corners are not matched; then the cost of the JS-Hellinger model under the uniform mapping is given by JS(S1;S2)C0:

Next, we will discuss the influence of noise. The first question we should ask is: what kinds of noise are permitted in the model? This is related to the choice of similarity in


Let us examine the effect of noise on the distance between two curves and on the optimal matching. Ideally, if no noise exists, then every cost function preserving the metric property should be regarded as a good similarity measurement. In reality, however, a proper cost functional should not be too sensitive to noise, and the minimizer should also be robust against noise. Otherwise the result does not make sense, because repeating the computation with slightly different noise yields a very different result. Here the noise is formally defined to cause a small perturbation in a "certain metric space". Thus, the choice of similarity cost function and associated metric space should be subjectively determined by the definition of the noise allowed.

For instance, when the noise is restricted to have a small angle total variation, the optimal distance in the JS-Hellinger model is stable. See the following lemma.

In fact, let $E_1, E_2$ be the optimal costs before and after the noise influence. Then (p

Proof.


Due to the triangle inequality, the cost difference (p

On the other hand, the stability of the minimizer is much harder to preserve than that of the optimal cost functional. When the cost function is not convex, we inevitably need to search for the global minimizer, i.e. the optimal mapping. But usually the existence of multiple local minimizers leads to instability in the choice of the global minimizer; see Fig. 5-6. In general, this causes difficulty in finding the minimizer and in giving a meaningful interpretation of these local minima. However, when the bending costs are small, i.e. the curves are similar to each other, these local minimizers simply reflect the self-similarity of the curves. (See the following remark.) For a self-similar curve, it is hard to determine which


Hence,g2g11mapscurveC1toC2,andg2g11mapscurveC1toC2also.So,themappingg2g11g1g12yieldsalocaloptimalcorrespondencefromC2toitselfwithbendingenergylessthan(p 2 ]). This method has been frequently used in the curve matching problem, and its use is illustrated in [33], for instance.

For the sake of simplicity, we explain the numerical algorithm using the simplest model (5-2). Only a minor modification of the bending term is needed to implement the other models. When matching a pair of curves, we can construct the dynamic programming


33]). To get the correspondences between a group of curves simultaneously, we adopt the following approach.

Consider a group of curves $C_1, \ldots, C_N$ represented by the angle functions $\theta_k(s_k)$ (k = 1, ..., N), parameterized by arc lengths $s_k$ ($0 \le s_k \le S_k$). Our task is to find their "average curve" $C_0$ represented by $\theta_0(s)$ (S=1 …

We do these two things alternately through a minimization algorithm.

Next, we show our experimental results to validate our model. In the first experiment, given four normal ED-Endo, 4-chambered cardiac contours, we aim to find the "average


5-6) (in the extended version to 4 curves) to the first 4 curves in Fig. 5-7. By using the algorithm described in the previous section, we obtained the correspondence mappings $g_i$ (i = 1, ..., 4), and then the "average curve" is computed using equation (5-4). The correspondences are presented through identical symbols in Fig. 5-7. Each curve's cost divergence from the "average curve" is also shown at the top of each sub-figure. It is evident that the points with high curvature are well matched. This experiment also illustrates that our proposed method is useful for finding an "average shape" for a set of training shapes.

The aim of the second experiment is to show that the location weighted model improves the matching result greatly when partial occlusion occurs. Three sets of results on matching two fish curves are shown in Fig. 5-8. The matching results in the top row for two pairs of fish contours are obtained by using the Edit distance model [33]. Given two curves $C$ and $\bar{C}$ parameterized by arc lengths $s$ and $\bar{s}$, respectively, the Edit distance model [33] seeks a mapping g from $C$ to $\bar{C}$ that minimizes the dissimilarity in their curvatures $\kappa(s)$ and $\bar{\kappa}(\bar{s})$, i.e. an optimal mapping g minimizing the energy functional: $\int_0^S \left[\, |g'(s) - 1| + C\,|\bar{\kappa}(g(s))\,g'(s) - \kappa(s)| \,\right] ds$.

To show the effectiveness of our location weighted idea, we applied the following model to the same pair of fish contours shown on the right of the top row: minw2L1+;(g1;g2)2ZS0(g01logg01+g02logg02)+wd1


5{8 ): min(g1;g2)2ZS0(g01logg01+g02logg02)+1 withtheweightfunctionw=exp(C0jd1 dsd ds, it is comparable with the bending term in this model if w = 1.

The bottom row in Fig. 5-8 shows the matching result obtained by our location weighted model in the presence of occlusion in the fish tail. Comparing the matching results for these three pairs clearly indicates that the edit distance model leads to a poor matching in the presence of occlusion, while the location weighted model does a much better job. The graph in the bottom row presents the weight function, whose value varies from 0.1 to 1.0. We can see that it is small around the occluded part (from the symbol / to the symbol M). The correspondence map then becomes more flexible in matching the similar parts, i.e. the occlusion has less influence on the whole matching.

In the third numerical experiment, we show the results of the JSH model under different tradeoff parameters, from C = 1e-5 to C = 1000. The matching results are shown in Fig. 5-9. With a small tradeoff parameter, the corners are clearly not matched at all. On the contrary, an extremely large parameter leads to a mismatching of features under occlusion. The best matching result among them is attained when C = 0.1 (second row, left). In Table 5-1, we can see the arc length and angle total variation of the averaging curves under different tradeoff parameters.


Figure 5-1. Non-transitivity of pairwise matching. The sampling points A, B, C on the heart shapes are matched in (L)(M)(N). When matching the heart shapes in (L)(N), A, B, C are mapped to A', B', C'. However, when matching the heart shapes in (M)(N), A, B, C are mapped to A'', B'', C''.

Figure 5-2. The number of possible mappings. (A) The possible mappings without any restrictions. The total number is $C^n_m$. (B) The possible mappings with the mapping restrictions $f(i_1) = j_1, \ldots, f(i_{p-1}) = j_{p-1}$.

Table 5-1. Experiment results of JSH model.

C       arc-length   angle total variation
1e-5    2.4449       12.4661
1e-3    2.4430       15.1844
1e-1    2.3735       21.6080
10      2.2457       22.1277
1000    2.2396       22.1284

Based on the model (5-10), different tradeoff parameters C are used to maximize the arc length and the angle total variation of the averaging curve. It shows that when C increases, the arc-length of the averaging curve decreases and the angle total variation increases.


Figure 5-3. Setting: Definitions of $G_1$ and $\theta_1$. (A) One example of $g_1 \in G_1$, $g_2 \in G_2$. (B) One example of curve $C_1$ by $\theta_1(s_1) = \theta_1(g_1(s))$.

Figure 5-4. A zigzagged curve. The ellipse has a smaller total angle variation than the original zigzagged curve. The shaded area is the area difference.


Figure 5-5. Comparison of JS divergence and Hellinger distance. This figure shows two functions: JS(x, 2-x) and $d_H^2(x, 2-x)$.

Figure 5-6. Stability of the global minimizer. Two local minimizers A, B exist for this cost functional. The function values at these two points are very close. Thus the global minimizer will vary even under a slight change in the function values at A, B.


Figure 5-7. Finding an average curve among several cardiac contours. The 4 curves (A-D) are given cardiac contours and the curve (E) is their "average curve" obtained using model (5-6) with C = 10. The same symbol indicates the correspondence among these 5 curves obtained by this model.

Figure 5-8. The result of the location weighted model. Matching results for a pair of fish curves (no occlusion: (A)(B), and with occlusion: (C)(D), respectively) obtained by using the Edit distance model. (E)(F): matching result for a pair of fish curves obtained by using model (5-12). Sub-figure (G) shows the graph of the weight function. The correspondences are indicated by the same symbols.


Figure 5-9. The result of the JSH model with different C. Matching results for two pairs (A)(B) and (C)(D) of fish curves obtained by using JSH model (5-10) under C = 1e-5 and C = 1e-3. Three more pairs (E)(F), (G)(H), and (I)(J) are the results under C = 0.1, C = 10, and C = 1000.


Here we would like to compute the center (ITC) of a CKL sphere numerically, using a line search based on the Armijo rule.

TomaximizethefunctionalM(p)in,weformtheLagrangianL(p)=nXi=1piZfilogfi @pi=Rfilogfi

3. Given a positive threshold p, say 1e-6, define the active set vector a(k):=((p(k)p):or:(g(k)0)).
4. Iterate p(k+1) by p(k+1)=p(k)+(k)g(k):a(k):
5. Repeat 1.-4. until norm(g)<, which is the stop condition.

In other words, we look for a mapping g: {1, ..., n} → {1, ..., m} and the group centers $f_1, \ldots, f_m$ such that $\max KL(f_i, f_{g(i)})$ is minimized, or the maximal of each group's ITR is minimized.

We always assume $\int f_i \, d\mu = 1$.
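As a rough illustration of the center computation, the sketch below maximizes $M(p) = \sum_i p_i \, KL(f_i, \sum_j p_j f_j)$ over the probability simplex for discrete densities. It deliberately swaps in the multiplicative Blahut-Arimoto update ([1], [9]) for the Armijo line search described above, because that update keeps p on the simplex without an active set. The function name `information_center`, the tolerances, and the data are assumptions for illustration only.

```python
import numpy as np

def kl(f, g):
    """KL(f, g) = sum f log(f / g) for discrete densities (0 log 0 := 0)."""
    mask = f > 0
    return float(np.sum(f[mask] * np.log(f[mask] / g[mask])))

def information_center(F, max_iter=2000, tol=1e-12):
    """Blahut-Arimoto style iteration (a substitute for the Armijo search).

    F is an (n, d) array whose rows are discrete probability densities.
    Returns the weights p, the center c = sum_i p_i f_i, and the KL radius
    max_i KL(f_i, c).
    """
    F = np.asarray(F, dtype=float)
    n = F.shape[0]
    p = np.full(n, 1.0 / n)          # start from uniform weights
    for _ in range(max_iter):
        c = p @ F                     # current center
        d = np.array([kl(f, c) for f in F])
        new_p = p * np.exp(d)         # multiplicative update
        new_p /= new_p.sum()
        converged = np.abs(new_p - p).max() < tol
        p = new_p
        if converged:
            break
    c = p @ F
    radius = max(kl(f, c) for f in F)
    return p, c, radius

# Illustrative data: two extreme densities and one lying between them.
F_demo = np.array([[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]])
p_demo, c_demo, r_demo = information_center(F_demo)
print(p_demo, c_demo, r_demo)
```

In this toy example the middle density receives essentially zero weight, which matches the remark elsewhere in the text that members with $p_i = 0$ lie strictly inside the KL sphere.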


AndDenetheorderrelationshipR0Rifthereexistsani,suchthatR0iRiandR0j=Rj;8j

race:html.

Since we consider the similarity and dissimilarity of the racial structure of each state, we adopt the ITC as the measurement. We use the percentage of different races instead of the population. So each structure vector consists of the percentages of 8 different races in each state.

1. Hispanic
2. Non-Hispanic White
3. Non-Hispanic Black
4. Non-Hispanic American Indian
5. Non-Hispanic Asian
6. Non-Hispanic Hawaiian or Pacific Islander
7. Non-Hispanic Other
8. Two or More Races

We will cluster these 51 vectors (50 states and DC) into 10 groups. The result is listed in Table 6.2. The last column p represents $p_i$. For each group, the ITC row consists of the group ITC and its KL-radius (in the last cell).

For example, the state of Arkansas consists of 3.25% Hispanic, 78.56% Non-Hispanic White, 15.58% Non-Hispanic Black, 0.62% Non-Hispanic American Indian, 0.74% Non-Hispanic Asian, 0.06% Non-Hispanic Hawaiian or Pacific Islander, 0.05% Non-Hispanic Other and 1.14% Two or More Races.

The number 0 in the last cell means that $p_i = 0$ in the ITC formula $f = \sum_i p_i f_i$.

Notice that DC and the state of Hawaii are dissimilar from the other states in their major race.

Next, we use the cost function $\sum_i KL(f_i, f_{g(i)})$ instead, which is analogous to the K-means cost function $\sum_i \| f_i - f_{g(i)} \|^2$, where the L2 norm is used.
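The K-means-like procedure with the summed KL cost can be sketched as follows. The sketch assumes each group center is updated as the arithmetic mean of its members, which minimizes $\sum_i KL(f_i, c)$ over c for Bregman divergences [4]. The function names, the initialization from the first m rows, and the toy data are hypothetical and are not the thesis's implementation.

```python
import numpy as np

def kl_to_center(F, c):
    """Row-wise KL(f_i, c) for rows f_i of F; assumes strictly positive entries."""
    return np.sum(F * np.log(F / c), axis=1)

def kl_clustering(F, m, max_iter=100):
    """Alternate between nearest-center assignment (in KL) and mean updates."""
    F = np.asarray(F, dtype=float)
    centers = F[:m].copy()             # hypothetical initialization: first m rows
    labels = np.full(len(F), -1)
    for _ in range(max_iter):
        dist = np.stack([kl_to_center(F, c) for c in centers], axis=1)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                      # memberships no longer change
        labels = new_labels
        for j in range(m):
            members = F[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)  # arithmetic-mean center
    dist = np.stack([kl_to_center(F, c) for c in centers], axis=1)
    cost = float(dist[np.arange(len(F)), labels].sum())
    return labels, centers, cost

# Toy structure vectors (illustrative, not the census data).
F_toy = np.array([
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.1, 0.2, 0.7],
])
labels_toy, centers_toy, cost_toy = kl_clustering(F_toy, m=2)
print(labels_toy, centers_toy, cost_toy)
```

Like ordinary K-means, this iteration can stop at a local minimum that depends on the initialization, so in practice several restarts would be a reasonable precaution.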


6.2. The last row of each group is the group center, and the first cell of this row is the maximal $KL(f_i, f_{g(i)})$.

In other words, we use the $l_1$ cost instead of the $l_\infty$ cost.

The j-th group center, j = 1, ..., m, is the arithmetic mean 1 …

Hence, the maximum KL-divergence (0.0805) is higher than the one (0.054) in the $l_\infty$ cost.

6-1 are 44 endocardial contours. These heart shape data are provided by Edward Geiser, M.D., and Dr. David Wilson. This is another application, to grouping heart shapes. We cluster them into 8 groups. In each sub-figure, curves with different colors (green/red) represent different groups. The ITC curves are plotted in ?.

Each curve is represented in polar coordinates: $r = r(\theta)$. First, we translate all curves such that their mass centers are located at the origin. The area distribution is fi()=r2 …

To obtain a measurement for shape, we need to ensure that the measurement of two contours with the same shape is always 0. In other words, the alignment is allowed to contain some rotation.

Denefi(;i):=fi(i)

Then we can use a similar clustering algorithm, except for the missing parameters, the rotation angles.

2. For each group, find its ITC $f_j$, j = 1, ..., m.
3'. Update all the memberships g(i) = j, if minjminiKL(fi;fj)

Results of KL $l_\infty$ clustering states

Group           Hispanic  White  Black  Indian  Asian  Hawaii  Other  Two    p


Group           Hispanic  White  Black  Indian  Asian  Hawaii  Other  Two    p

Group 5 ITC     23.36     60.24   7.94   0.49    5.59   0.17   0.18   2.03  0.05
California      32.38     46.70   6.44   0.53   10.77   0.31   0.21   2.67  0.42
Colorado        17.10     74.46   3.68   0.67    2.17   0.09   0.13   1.69  0.29
Florida         16.79     65.44  14.17   0.27    1.64   0.04   0.18   1.48  0.30
Nevada          19.72     65.21   6.58   1.07    4.43   0.39   0.14   2.46  0.00
New Jersey      13.28     66.04  13.03   0.13    5.67   0.03   0.23   1.59  0.00
New York        15.11     61.98  14.82   0.28    5.46   0.03   0.40   1.93  0.00
Texas           31.99     52.43  11.34   0.33    2.66   0.05   0.10   1.11  0.00

Group 6 ITC      3.36     82.96   3.35   6.75    0.86   0.05   0.06   2.61  0.03
Montana          2.00     89.54   0.28   6.03    0.51   0.05   0.06   1.53  0.57
Oklahoma         5.20     74.08   7.48   7.71    1.34   0.06   0.07   4.06  0.48
S. Dakota        1.44     88.04   0.60   8.08    0.57   0.03   0.04   1.19  0.00

Group 7 ITC      4.77     87.10   2.16   2.71    1.34   0.04   0.39   1.48  0.04
Idaho            7.86     88.05   0.38   1.22    0.90   0.09   0.09   1.41  0.00
N. Dakota        1.21     91.74   0.59   4.79    0.56   0.03   0.04   1.04  0.50
Oregon           8.05     83.52   1.56   1.17    2.93   0.22   0.13   2.42  0.00
Rhode Island     8.66     81.89   4.00   0.40    2.23   0.03   0.80   1.99  0.46
Utah             9.03     85.27   0.72   1.19    1.63   0.66   0.09   1.40  0.02
Wyoming          6.41     88.86   0.71   2.07    0.54   0.05   0.10   1.25  0.00

Group 8 ITC      7.86     27.83  59.45   0.22    2.63   0.05   0.29   1.68  0.00
DC               7.86     27.83  59.45   0.22    2.63   0.05   0.29   1.68  1.00

Group 9 ITC      7.24     22.87   1.72   0.21   40.79   8.95   0.17  18.05  0.00
Hawaii           7.24     22.87   1.72   0.21   40.79   8.95   0.17  18.05  1.00

Group 10 ITC     2.98     65.33  28.01   0.33    2.12   0.03   0.10   1.10  0.03
Alabama          1.71     70.29  25.86   0.49    0.70   0.02   0.06   0.88  0.00
Georgia          5.32     62.65  28.48   0.22    2.10   0.04   0.14   1.07  0.00
Louisiana        2.41     62.53  32.30   0.54    1.21   0.02   0.11   0.88  0.00
Maryland         4.30     62.05  27.65   0.25    3.96   0.04   0.18   1.57  0.00
Mississippi      1.39     60.74  36.15   0.39    0.65   0.02   0.05   0.61  0.51
N. Carolina      4.71     70.16  21.41   1.18    1.40   0.04   0.11   0.99  0.00
S. Carolina      2.37     66.11  29.37   0.32    0.89   0.03   0.08   0.83  0.00
Virginia         4.66     70.15  19.44   0.26    3.66   0.05   0.17   1.61  0.48


Figure 6-1. Clustering result of 44 heart shapes into 8 groups. Each sub-figure of (A)(B)(C)(D) contains 2 groups, in red and green colors respectively.


Results of KL $l_1$ clustering states in 10 groups

Group/KL radius  Hispanic  White  Black  Indian  Asian  Hawaii  Other  Two

Group 1
Hawaii            7.24     22.87   1.72   0.21   40.79   8.95   0.17  18.05
0.00              7.24     22.87   1.72   0.21   40.79   8.95   0.17  18.05

Group 2
Indiana           3.53     85.84   8.31   0.22    0.96   0.03   0.10   1.01
Kentucky          1.48     89.27   7.27   0.20    0.73   0.03   0.10   0.93
Missouri          2.12     83.76  11.18   0.42    1.09   0.05   0.09   1.29
Ohio              1.91     84.01  11.37   0.19    1.16   0.02   0.12   1.21
Pennsylvania      3.21     84.05   9.79   0.12    1.78   0.02   0.11   0.92
Wisconsin         3.60     87.28   5.60   0.82    1.64   0.03   0.07   0.97
0.0121            2.64     85.70   8.92   0.33    1.23   0.03   0.10   1.05

Group 3
Alabama           1.71     70.29  25.86   0.49    0.70   0.02   0.06   0.88
Georgia           5.32     62.65  28.48   0.22    2.10   0.04   0.14   1.07
Louisiana         2.41     62.53  32.30   0.54    1.21   0.02   0.11   0.88
Maryland          4.30     62.05  27.65   0.25    3.96   0.04   0.18   1.57
Mississippi       1.39     60.74  36.15   0.39    0.65   0.02   0.05   0.61
S. Carolina       2.37     66.11  29.37   0.32    0.89   0.03   0.08   0.83
0.0186            2.92     64.06  29.97   0.37    1.58   0.03   0.10   0.97

Group 4
Idaho             7.86     88.05   0.38   1.22    0.90   0.09   0.09   1.41
Kansas            7.00     83.10   5.63   0.83    1.72   0.04   0.09   1.58
Massachusetts     6.75     81.88   5.01   0.18    3.73   0.03   0.69   1.74
Minnesota         2.91     88.16   3.43   1.06    2.87   0.03   0.10   1.43
Nebraska          5.52     87.33   3.95   0.79    1.27   0.04   0.08   1.03
Oregon            8.05     83.52   1.56   1.17    2.93   0.22   0.13   2.42
Rhode Island      8.66     81.89   4.00   0.40    2.23   0.03   0.80   1.99
Utah              9.03     85.27   0.72   1.19    1.63   0.66   0.09   1.40
Washington        7.49     78.93   3.13   1.45    5.42   0.39   0.20   2.98
0.0292            7.03     84.24   3.09   0.92    2.52   0.17   0.25   1.78

Group 5
Arizona          25.25     63.82   2.92   4.55    1.74   0.11   0.12   1.49
California       32.38     46.70   6.44   0.53   10.77   0.31   0.21   2.67
New Mexico       42.08     44.72   1.69   8.88    1.00   0.05   0.17   1.42
Texas            31.99     52.43  11.34   0.33    2.66   0.05   0.10   1.11
0.0805           32.92     51.92   5.60   3.57    4.04   0.13   0.15   1.67


Group/KL radius  Hispanic  White  Black  Indian  Asian  Hawaii  Other  Two

Group 6
Iowa              2.82     92.62   2.08   0.27    1.24   0.03   0.07   0.87
Maine             0.73     96.50   0.51   0.54    0.71   0.03   0.07   0.92
Montana           2.00     89.54   0.28   6.03    0.51   0.05   0.06   1.53
New Hampshire     1.66     95.10   0.68   0.22    1.28   0.03   0.10   0.94
N. Dakota         1.21     91.74   0.59   4.79    0.56   0.03   0.04   1.04
S. Dakota         1.44     88.04   0.60   8.08    0.57   0.03   0.04   1.19
Vermont           0.90     96.16   0.48   0.38    0.85   0.02   0.09   1.12
W. Virginia       0.68     94.56   3.14   0.19    0.52   0.02   0.06   0.83
Wyoming           6.41     88.86   0.71   2.07    0.54   0.05   0.10   1.25
0.042             1.99     92.57   1.01   2.51    0.75   0.03   0.07   1.08

Group 7
Alaska            4.12     67.60   3.36  15.39    3.95   0.51   0.21   4.86
Oklahoma          5.20     74.08   7.48   7.71    1.34   0.06   0.07   4.06
0.0174            4.66     70.84   5.42  11.55    2.64   0.28   0.14   4.46

Group 8
DC                7.86     27.83  59.45   0.22    2.63   0.05   0.29   1.68
0                 7.86     27.83  59.45   0.22    2.63   0.05   0.29   1.68

Group 9
Colorado         17.10     74.46   3.68   0.67    2.17   0.09   0.13   1.69
Connecticut       9.41     77.49   8.68   0.21    2.40   0.03   0.24   1.55
Florida          16.79     65.44  14.17   0.27    1.64   0.04   0.18   1.48
Illinois         12.32     67.83  14.95   0.15    3.38   0.03   0.11   1.24
Nevada           19.72     65.21   6.58   1.07    4.43   0.39   0.14   2.46
New Jersey       13.28     66.04  13.03   0.13    5.67   0.03   0.23   1.59
New York         15.11     61.98  14.82   0.28    5.46   0.03   0.40   1.93
0.0404           14.82     68.35  10.84   0.40    3.59   0.09   0.20   1.71

Group 10
Arkansas          3.25     78.56  15.58   0.62    0.74   0.06   0.05   1.14
Delaware          4.76     72.48  18.94   0.30    2.06   0.03   0.13   1.30
Michigan          3.26     78.55  14.11   0.54    1.76   0.02   0.12   1.64
North Carolina    4.71     70.16  21.41   1.18    1.40   0.04   0.11   0.99
Tennessee         2.18     79.20  16.31   0.24    0.99   0.03   0.08   0.96
Virginia          4.66     70.15  19.44   0.26    3.66   0.05   0.17   1.61
0.0124            3.80     74.85  17.63   0.52    1.77   0.04   0.11   1.28


This thesis consists of three components: Bregman metrics, the capacity classifier, and the parametrization independent curve matching method. Each component still leaves many unsolved problems. I list several of them as future work.

1. In the Bregman metrics part, I have so far failed to show that the square root of capacity is a metric. This is one of the largest driving forces of my work. I believe that there exists some fundamental mathematical reason behind it.

2. For the capacity classifier, I still need to build a formal discussion within the standard machine learning framework, especially of the generalization error. One more important issue is why the support vector machine method fails to work on our data set. I need to find a framework to explain this interesting phenomenon.

3. The curve matching part is the most exciting part. There are many interesting problems. I have a rough idea about surface matching following the current framework. But the current result is still not enough to construct a workable model, especially regarding the numerical issue of local minima.

This is an outline of my future work.


Thisistheproofofthenecessaryconditionofthersttypemetric. Hereweconsideraspecialcase,threenumbersxa;x;x+a,withaispositiveandcloseto0.BytheTaylor'sexpansion,f(x+a)=f(x)+f0(x)a+f00(x) 2a2+f000(x) 3!a3+f0000(x) 4!a4+O(a5); 2(f(x+a)+f(xa))f(x)=f00(x) 2a2+f0000(x) 4!a4+O(a5): 2(f(x+a)+f(x))f(x+a=2)=f00(x+a 2(a 4!(a 8a2+f000(x) 16a3+f0000(x) 64a4+f0000(x) 164!a4+O(a5): 2(f(x+a)+f(x))f(x+a=2))+r 2(f(xa)+f(x))f(xa=2))r 2(f(x+a)+f(xa))f(x))=r 2a21 16(1 2(f000(x) Thisistheproofofthenecessaryconditionofthesecondtypemetric. 159


Herewewillderivearelationbetweena;b,a=k1b+k2b2+O(b3)forsomeconstantk1;k2. Sincef(za)=f(z+b)=y,andtakingTaylor'sexpansionaroundz,thenwehavef00 Supposeforza;z;z+b,wehaveq b;f0(zc)=y a,thentheabovetriangleinequalitycanberewrittenasr byf(z+d)+r ayf(zc)p bf(z+d) af(zc) 2b+f000 48(f0000


4+b2 12(f000 2(f000 4f0000 bf(z+d) 4+bf000 24)b2 12(f000 2(f000 4f0000 bf(z+d) 2+bf000 24)b2 3(f000 4f0000 af(zc) 2af000 24)a2 3(f000 4f0000 2bf000 24)b2 3(f000 4f0000 72: bf(z+d) af(zc) 3(f000 4f0000 243=b2 4f0000


ap+bq+bqlogb ap+bq,wherea;b2R>0;00impliesgp(x)monotonicincreasing,andthisisasucientconditionforthetriangleinequality. q+pxq q+pxlog1 q+pxlogx q+px Lett=1 q+px, then0s1t,f(x)couldbesimpliedasf=(1+r)logtlogsqtlogtpslogs: Letc(p):=sups1pslogs+qtlogt Next,f>0ifandonlyif1r>c(p)IfwecanndanupperboundfortheRHS,thenwehaveanupperboundforr.Usingthenumericalcomputation,underthisframework,r=0:4959ensuresthemetric.Thisisthemainideainthenextlemma.


Proof. q(1s)increases.Fromq(t1)=p(1s),wehavep=t1 LetC(p;s):=ps logs+ps(1 logt1 logs); wehave@C @p=s(1 logt1 logsp(1s) logt1 logs(t1)(ts) Then@C @p=(logt)2 t(u1)(t1) Usingtheinequality:1 loguu u11 2,(logt)2(t1)2 2(logt)2(t1) then@C @p(logt)2 2(logt)2(t1) u1((logt)2(t1)2 First,claimsuppc(p)1. Sincep=1n;q=1p;t=1+p q(1s),thenwecantreatc(p)asafunctionofp;s.Next,letD:=f(p;s)2(0;1)(0;1)j1p q(1s)1+p on 163

PAGE 164

log(p +1)+qt log(p +1)+1ps log(p +1)+1!1;as!0 Second,claimsuppc(p)1.Foranyxeds<1,asp!1,t!1,thusthersttermalwaysvanish. Forthesecondterm,forsxed,asp!1,onecanchoosesome>0smallenough,suchthatwehave1ps+logp>0for1>p>1.) Thussup(p;s)2(0;1)(0;1)ps 1+logp Thus,weconcludelimp!1c(p)=1. Conclusion:Forpxed,sup0

(K(p)f:=K(p);K(p)g:=KL(g;pf+qg)),thenK(p)isstrictlyconvexinp. Moreover,assume(p;q):=argmaxMp(f;g),then1 4M1 2,whichimpliesZflogf pf+qg(f(pf+qg))d>Zf f+g(ff+g f+g(gf+g 2Zflog2f f+g(ff+g 1xisdecreasinginxin0

f+g(ff+g 4qZflog2f f+g(ff+g 4. Thus,1 41,replacexby1 First,notef=xlogx x110,thenef(x)1. (x1)2dp dx=ef(logx x1)+1.Now,claimtheaboveexpressionisnon-positive.Tothisend,dierentiatingtheRHSagain,thenwehavethatef((11 x1)((x1)(logx+1)xlogx Second,notelogx1x x1)+1approaches0,thusdp dx0Thisconcludesthatpattainthemaximumatx=0,andmaxp=11 Similarly,forx>1xed,considery:=1 p+qy. Thenexchangingp;q,andfollowingtheabovediscussion,wehavemaxq=11 Thus,1 166


2.def Rewriteh(p)=pxlogx px+q+qlog1 px+q+q(1x))+q(log1 px+q+q(1x)=log1 px+qlog1 i.e.xlogx+(1x)log(e(px+q))=0 Let(p):=xlogx+(1x)log(e(px+q)).Then0(p)=(1x)2 Nowsupposethereexistssomep<1 2associatedwithsomeabetween0;1,thensince(p)isdecreasing,(1 2)<(p)=0,i.e.aloga+(1a)log(e(a+1) 2)<0. Let(x):=xlogx+(1x)loge(x+1) 2.Thenatx=a,wehave(a)<0. Butthisisimpossibleduetothefollowingargument. 2+1x x+1;00(x)=1x x(x+1)2>0 167


But0(1)=0,then0(x)<0for00. Thus,wemusthavep1 2. Next,wewillshowp11 Supposenot.Thenthereexistssomep>11 Orloga+1a a(log((p0a+q0)+1)>0. Let(x):=logx+1x xlog((p0x+q0)+1).Thenatx=a,wehave(a)>0. Butthisisimpossibleduetothefollowingargument.(a)>0;(1)=0,usingtheMeanValuetheorem,0(x)mustbenegativesomewherebetween0;1. Sincethenumeratorisconvex,andhavezerosat0;1,thenthenumeratorisnon-positive,whichimplies0(x)0: Thus,wemusthavep11 168


2p11 Foranypositivenumbersx;y,letzbetheITC.Thenwehavexlogx z(xz)=ylogy z(yz): z;b:=y z.Thenalogaa+1=blogbb+1,withITC(a;b)=1. Nowinthegure B-1 ,thecurveABCisplottedbyy=f(x):=xlogxx+1,A=(0;1);B=(1;0);C=(e;1).ThenanytwopointswiththesamevalueyhavetheITC1. LetcurveDBbethereected,stretchedcurvefromABwithaxisL:y=1. AnypointE=(x;y)oncurveABmapstoapointF=(1+(1x);y)oncurveBD. Claim:forall11,therearetwointersectionsB1andR. SinceE;Fhavethesameyvalue,thentheareasundercurvesB1F1andB1E2mustbethesame. ThusweknowthatRliesbetweenB1;F1bytheareaundercurveB1RonDsmallerthantheareaundercurveB1RonC.AndtheuniquenessofF1;E2areclear. 169


BoundIllustrationoftheminimizerp.(A):y=xlogxx+1and(B):itsderivativegraph Finally,supposethexcoordinateofE2isx0,aslarger,thecurveDmovesdownwards,andtheareaunderDbetweenx=1andx=x0decreases. Thus,weneedalargerxcoordinatorofE2forthelarger.Thus,theclaimisproved. Sincep=d(F;L)


5.3 Supposeg01isanoptimalsolutionwithg01

5.5 172


5.9 f+g;g1:=2g f+g;d1=f+g f+g+glog2g f+g)d=Z(f1logf1+g1logg1)d1;anddH(f;g)2=Z(f+g2p SinceF0(a)=2(a1) NextwewillshowG(a)F(a)log20.Rewritethisasfollowing:2log2r cossin: sin2=tan2,thenwehaveA00()>0for2(0;). 173


5{11


[1] S. ARIMOTO, An algorithm for computing the capacity of arbitrary discrete memoryless channels, IEEE Trans. Inf. Theory, 18 (1972), pp. 14-20.
[2] A. A. AMINI, T. E. WEYMOUTH, AND R. C. JAIN, Using dynamic programming for solving variational problems in vision, IEEE Trans. on Pattern Anal. and Mach. Intell., 12 (1990), pp. 855-867.
[3] K. S. AZOURY AND M. K. WARMUTH, Relative loss bounds for on-line density estimation with the exponential family of distributions, Machine Learning, 43 (2001), pp. 211-246.
[4] A. BANERJEE, S. MERUGU, I. S. DHILLON, AND J. GHOSH, Clustering with Bregman divergences, Journal of Machine Learning Research, 6 (2005), pp. 1705-1749.
[5] R. BASRI, L. COSTA, D. GEIGER, AND D. JACOBS, Determining the similarity of deformable shapes, IEEE Workshop on Phys. Based Model. Computer Vision, 1995, pp. 135-143.
[6] P. J. BARTLETT, B. SCHOLKOPF, D. SCHUURMANS, AND A. J. SMOLA, Advances in Large-Margin Classifiers, MIT Press, Cambridge, Mass., 2000.
[7] A. BEN-HUR, D. HORN, H. T. SIEGELMANN, AND V. VAPNIK, Support vector clustering, Journal of Machine Learning Research, 2 (2001), pp. 125-137.
[8] D. P. BERTSEKAS, Nonlinear Programming, Athena Scientific, Belmont, Mass., 2003.
[9] R. E. BLAHUT, Computation of channel capacity and rate-distortion functions, IEEE Trans. Inf. Theory, 18 (1972), pp. 460-473.
[10] L. M. BREGMAN, The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming, USSR Computational Mathematics and Physics, 7 (1967), pp. 200-217.
[11] L. D. BROWN, Fundamentals of statistical exponential families, Institute of Math. Statistics, Hayward, 1986.
[12] A. BUZO, A. H. GRAY, R. M. GRAY, AND J. D. MARKEL, Speech coding based upon vector quantization, IEEE Trans. on Acoustics, Speech and Signal Processing, 28 (1980), pp. 562-574.
[13] B. CHALMOND, Modeling and Inverse Problems in Imaging Analysis, Springer, New York, 2003.


[14] I. COHEN, N. AYACHE, AND P. SULGER, Tracking Points on Deformable Objects Using Curvature Information, Proc. European Conf. Computer Vision, 1992, pp. 458-466.
[15] M. COLLINS, S. DASGUPTA, AND R. SCHAPIRE, A generalization of principal component analysis to the exponential family, Proc. of the Annual Conf. on NIPS, 2001.
[16] C. CORTES AND V. VAPNIK, Support vector networks, Machine Learning, 20 (1995), pp. 273-297.
[17] T. M. COVER AND J. A. THOMAS, Elements of Information Theory, Wiley & Sons, New York, 1991.
[18] K. CRAMMER AND Y. SINGER, Learning algorithms for enclosing points in Bregmanian spheres, Proceedings of the Sixteenth Annual Conference on Computational Learning Theory (COLT), 2003.
[19] I. CSISZAR AND J. KORNER, Information Theory: Coding Theory for Discrete Memoryless Systems, Academic Press, New York, 1981.
[20] I. DHILLON, S. MALLELA, AND R. KUMAR, A divisive information-theoretic feature clustering algorithm for text classification, Journal of Machine Learning Research, 3 (2003), pp. 1265-1287.
[21] I. EKELAND AND R. TEMAM, Convex Analysis and Variational Problems, SIAM, Philadelphia, 1999.
[22] D. M. ENDRES AND J. E. SCHINDELIN, A new metric for probability distributions, IEEE Trans. Inform. Theory, 49 (2003), pp. 1858-1860.
[23] J. FORSTER AND M. K. WARMUTH, Relative expected instantaneous loss bounds, Proc. of the 13th Annual Conf. on Computational Learning Theory, 2000, pp. 90-99.
[24] A. L. GIBBS AND F. E. SU, On choosing and bounding probability metrics, SIAM J. Applied Math., 58 (1998), pp. 565-586.
[25] F. ITAKURA AND S. SAITO, Analysis synthesis telephony based upon maximum likelihood method, Repts. of the 6th Internat'l. Cong. Acoust., Y. Kohasi, ed., Tokyo, C-5-5, C17-20, 1968.
[26] E. KLASSEN, A. SRIVASTAVA, W. MIO, AND S. JOSHI, Analysis of planar shapes using geodesic paths on shape spaces, IEEE Trans. Pattern Anal. and Mach. Intell., 26 (2004), pp. 372-383.
[27] Y. LINDE, A. BUZO, AND R. M. GRAY, An algorithm for vector quantizer design, IEEE Trans. on Comm., 28 (1980), pp. 84-95.


[28] G. MATZ AND P. DUHAMEL, Information geometric formulation and interpretation of accelerated Blahut-Arimoto-type algorithms, Proc. 2004 IEEE Information Theory Workshop, San Antonio, TX, USA, pp. 24-29.
[29] R. NOCK AND F. NIELSEN, Fitting the smallest enclosing Bregman balls, Proceedings of the 16th European Conference on Machine Learning (ECML), vol. 3720, pp. 649-656.
[30] N. PARAGIOS, Y. CHEN, AND O. FAUGERAS, Handbook of Mathematical Models in Computer Vision, Springer, New York, 2006.
[31] J. PLATT, Fast training of support vector machines using sequential minimal optimization, Advances in Kernel Methods - Support Vector Learning, MIT Press, Mass., 1998.
[32] F. SCHMIDT, M. CLAUSEN, AND D. CREMERS, Shape matching by variational computation of geodesic on a manifold, DAGM 2006, LNCS 4174, pp. 142-151.
[33] T. B. SEBASTIAN, P. N. KLEIN, AND B. B. KIMIA, On aligning curves, IEEE Trans. Pattern Analysis and Machine Intelligence, 25 (2003), pp. 116-124.
[34] H. D. TAGARE, Shape based nonrigid correspondence with application to heart motion analysis, IEEE Trans. Medical Imaging, 18 (1999), pp. 570-578.
[35] C. C. TAPPERT, Cursive script recognition by elastic matching, IBM J. Research Development, 26 (1982), pp. 765-771.
[36] J. UHLMANN, Satisfying general proximity/similarity queries with metric trees, Information Processing Letters, 1991, pp. 175-179.
[37] P. N. YIANILOS, Data structures and algorithms for nearest neighbor search in general metric spaces, ACM-SIAM Symp. on Discrete Algorithms, 1993, pp. 311-321.
[38] L. YOUNES, Computable elastic distance between shapes, SIAM J. Applied Math., 58 (1998), pp. 565-586.


Pengwen Chen was born in Hsinchu, Taiwan. He has a bachelor of science degree in mathematics and a master of engineering degree in applied mechanics from National Taiwan University. In 2000, he worked as a firmware engineer at a digital camera company. In 2001, he worked on developing fast algorithms for the visualization of chaotic bifurcation of Stokes flow at the Institute of Mathematics, Academia Sinica, Taiwan. In August 2002, he started his graduate studies in mathematics at the University of Florida. His research interest is applied mathematics.