
Adaptive Kernel Self-Organizing Maps using Information Theoretic Learning

Permanent Link: http://ufdc.ufl.edu/UFE0041707/00001

Material Information

Title: Adaptive Kernel Self-Organizing Maps using Information Theoretic Learning
Physical Description: 1 online resource (65 p.)
Language: english
Creator: Chalasani, Rakesh
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2010

Subjects

Subjects / Keywords: correntropy, kernel, magnification, self
Electrical and Computer Engineering -- Dissertations, Academic -- UF
Genre: Electrical and Computer Engineering thesis, M.S.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: ADAPTIVE KERNEL SELF-ORGANIZING MAPS USING INFORMATION THEORETIC LEARNING. By Rakesh Chalasani. May 2010. Chair: Jose Carlos Principe. Major: Electrical and Computer Engineering. The self-organizing map (SOM) is a popular clustering and data visualization algorithm that has evolved into a useful tool in pattern recognition, data mining, etc., since it was first introduced by Kohonen. However, it is observed that the magnification factor for such mappings deviates from the information theoretically optimal value of 1 (for the SOM it is 2/3). This can be attributed to the use of the mean square error to adapt the system, which distorts the mapping by oversampling the low probability regions. In this work, the use of a kernel based similarity measure called the correntropy induced metric (CIM) for learning the SOM is proposed, and it is shown that this can enhance the magnification of the mapping without much increase in the computational complexity of the algorithm. It is also shown that adapting the SOM in the CIM sense is equivalent to reducing the localized cross information potential and hence, the adaptation process can be viewed as a density estimation procedure. A kernel bandwidth adaptation algorithm in the case of Gaussian kernels, for both homoscedastic and heteroscedastic components, is also proposed using this property.
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by Rakesh Chalasani.
Thesis: Thesis (M.S.)--University of Florida, 2010.
Local: Adviser: Principe, Jose C.
Electronic Access: RESTRICTED TO UF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE UNTIL 2011-04-30

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2010
System ID: UFE0041707:00001





ADAPTIVE KERNEL SELF-ORGANIZING MAPS USING INFORMATION THEORETIC LEARNING

By

RAKESH CHALASANI

A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

UNIVERSITY OF FLORIDA

2010


© 2010 Rakesh Chalasani


To my brother and sister-in-law, wishing them a happy married life.


ACKNOWLEDGMENTS

I would like to thank Dr. Jose C. Principe for his guidance throughout the course of this research. His constant encouragement and helpful suggestions have shaped this work into its present form. I would also like to thank him for the motivation he gave me; the things that I have learned through him will be with me forever. I would also like to thank Dr. John M. Shea and Dr. Clint K. Slatton for being on the committee and for their valuable suggestions.

I am thankful to my colleagues at the CNEL lab for their suggestions and discussions during the group meetings. I would also like to thank my friends at UF who made my stay here such a memorable one.

Last but not the least, I would like to thank my parents, brother and sister-in-law for all the support they gave me during some of the tough times I had been through.


TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER
1 INTRODUCTION
  1.1 Background
  1.2 Motivation and Goals
  1.3 Outline
2 SELF-ORGANIZING MAPS
  2.1 Kohonen's Algorithm
  2.2 Energy Function and Batch Mode
  2.3 Other Variants
3 INFORMATION THEORETIC LEARNING AND CORRENTROPY
  3.1 Information Theoretic Learning
    3.1.1 Renyi's Entropy and Information Potentials
    3.1.2 Divergence
  3.2 Correntropy and its Properties
4 SELF-ORGANIZING MAPS WITH THE CORRENTROPY INDUCED METRIC
  4.1 On-line Mode
  4.2 Energy Function and Batch Mode
  4.3 Results
    4.3.1 Magnification Factor of the SOM-CIM
    4.3.2 Maximum Entropy Mapping
5 ADAPTIVE KERNEL SELF ORGANIZING MAPS
  5.1 The Algorithm
    5.1.1 Homoscedastic Estimation
    5.1.2 Heteroscedastic Estimation
  5.2 SOM-CIM with Adaptive Kernels
6 APPLICATIONS
  6.1 Density Estimation
  6.2 Principal Surfaces and Clustering
    6.2.1 Principal Surfaces
    6.2.2 Avoiding Dead Units
  6.3 Data Visualization
7 CONCLUSION

APPENDIX
A MAGNIFICATION FACTOR OF THE SOM WITH MSE
B SOFT TOPOGRAPHIC VECTOR QUANTIZATION USING CIM
  B.1 SOM and Free Energy
  B.2 Expectation Maximization

REFERENCES
BIOGRAPHICAL SKETCH


LIST OF TABLES

4-1 The information content and the mean square quantization error for various values of σ in SOM-CIM and SOM with MSE. I_max = 3.2189
5-1 The information content and the mean square quantization error for homoscedastic and heteroscedastic cases in SOM-CIM. I_max = 3.2189
6-1 Comparison between various methods as density estimators.
6-2 Number of dead units yielded for different datasets when MSE and CIM are used for mapping.


LIST OF FIGURES

2-1 The SOM grid with the neighborhood range.
3-1 Surface plot of CIM(X, 0) in 2D sample space (kernel width is 1).
4-1 The unfolding of the mapping when the CIM is used.
4-2 The scatter plot of the weights at convergence when the neighborhood is rapidly decreased.
4-3 The plot between ln(r) and ln(w) for different values of σ.
4-4 The input scatter plot and the mapping of the weights at the final convergence.
4-5 Negative log likelihood of the input versus the kernel bandwidth.
5-1 Kernel adaptation in the homoscedastic case.
5-2 Kernel adaptation in the heteroscedastic case.
5-3 Magnification of the map in homoscedastic and heteroscedastic cases.
5-4 Mapping of the SOM-CIM with heteroscedastic kernels and β = 1.
5-5 The scatter plot of the weights at convergence in the case of homoscedastic and heteroscedastic kernels.
6-1 Results of the density estimation using SOM.
6-2 Clustering of the two-crescent dataset in the presence of outlier noise.
6-3 The scatter plot of the weights showing the dead units when CIM and MSE are used to map the SOM.
6-4 The U-matrix obtained for the artificial dataset that contains three clusters.
6-5 The U-matrix obtained for the Iris and blood transfusion datasets.
B-1 The unfolding of the SSOM mapping when the CIM is used as the similarity measure.


Abstract of Thesis Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Master of Science

ADAPTIVE KERNEL SELF-ORGANIZING MAPS USING INFORMATION THEORETIC LEARNING

By Rakesh Chalasani
May 2010
Chair: Jose Carlos Principe
Major: Electrical and Computer Engineering

The self-organizing map (SOM) is a popular clustering and data visualization algorithm that has evolved into a useful tool in pattern recognition, data mining, etc., since it was first introduced by Kohonen. However, it is observed that the magnification factor for such mappings deviates from the information theoretically optimal value of 1 (for the SOM it is 2/3). This can be attributed to the use of the mean square error to adapt the system, which distorts the mapping by oversampling the low probability regions.

In this work, the use of a kernel based similarity measure called the correntropy induced metric (CIM) for learning the SOM is proposed, and it is shown that this can enhance the magnification of the mapping without much increase in the computational complexity of the algorithm. It is also shown that adapting the SOM in the CIM sense is equivalent to reducing the localized cross information potential and hence, the adaptation process can be viewed as a density estimation procedure. A kernel bandwidth adaptation algorithm in the case of Gaussian kernels, for both homoscedastic and heteroscedastic components, is also proposed using this property.


CHAPTER 1
INTRODUCTION

1.1 Background

The topographic organization of the sensory cortex is one of the complex phenomena in the brain and is widely studied. It is observed that neighboring neurons in the cortex are stimulated by inputs that are in a neighborhood in the input space; this is called neighborhood preservation or topology preservation. The neural layer acts as a topographic feature map if the locations of the most excited neurons are correlated in a regular and continuous fashion with a restricted number of signal features of interest. In such a case, the neighboring excited locations in the layer correspond to stimuli with similar features (Ritter et al. 1992).

Inspired by such biological plausibility, but not intending to explain it, Kohonen (1982) proposed the self-organizing map (SOM), where continuous inputs are mapped into discrete vectors in the output space while maintaining the neighborhood of the vector nodes in a regular lattice. Mathematically, Kohonen's algorithm (Kohonen 1997) is a neighborhood preserving vector quantization tool working on the principle of winner-take-all, where the winner is determined as the node most similar to the input at that instant of time, also called the best matching unit (BMU). The centerpiece of the algorithm is to update the BMU and its neighborhood nodes concurrently. By performing such a mapping, the input topology is preserved on the grid of nodes in the output. The details of the algorithm are explained in Chapter 2.

The neural mapping in the brain also performs selective magnification of the regions of interest. Usually these regions of interest are the ones that are often excited. Similar to the neural mapping in the brain, the SOM also magnifies the regions that are often excited (Ritter et al. 1992). This magnification can be explicitly expressed as a power law between the input data density P(v) and the weight vector density P(w) at the time of convergence. The exponent is called the magnification factor or magnification exponent (Villmann and Claussen 2006). A faithful representation of the data by the weights can happen only when this magnification factor is 1.

It is shown that if the mean square error (MSE) is used as the similarity measure to find the BMU and also to adapt a 1D-1D SOM, or in the case of separable input, then the magnification factor is 2/3 (Ritter et al. 1992). So, such a mapping is not able to transfer optimal information from the input data to the weight vectors.

Several variants of the SOM have been proposed that can make the magnification factor equal to 1. They are discussed in the following chapters.

1.2 Motivation and Goals

The reason for the sub-optimal mapping of the traditional SOM algorithm can be attributed to the use of the euclidean distance as the similarity measure. Because of the global nature of the MSE cost function, the updating of the weight vectors is greatly influenced by the outliers or the data in the low probability regions. This leads to oversampling of the low probability regions and undersampling of the high probability regions by the weight vectors. Here, to overcome this, a localized similarity measure called the correntropy induced metric is used to train the SOM, which can enhance the magnification of the SOM.

Correntropy is defined as a generalized correlation function between two random variables in a high dimensional feature space (Santamaria et al. 2006). This definition induces a reproducing kernel Hilbert space (RKHS), and the euclidean distance between two random variables in this RKHS is called the correntropy induced metric (CIM) (Liu et al. 2007). When a Gaussian kernel induces the RKHS, it is observed that the CIM has a strong outlier rejection capability in the input space. It also includes all the even higher order moments of the difference between the two random variables in the input space.

The goal of this thesis is to show that these properties of the CIM can help to enhance the magnification of the SOM. Also, finding the relation between this and other information theoretic learning (ITL) (Principe 2010) based measures can help to understand the use of the CIM in this context. It can also help to adapt the free parameter in the correntropy, the kernel bandwidth, such that it can give an optimal solution.

1.3 Outline

This thesis comprises 7 chapters. In Chapter 2 the traditional self-organizing maps algorithm, the energy function associated with the SOM and its batch mode variant are discussed. Some other variants of the SOM are also briefly discussed. Chapter 3 deals with information theoretic learning, followed by correntropy and its properties. In Chapter 4 the use of the CIM in the SOM is introduced, and the advantage of using the CIM is shown through some examples. In Chapter 5 a kernel bandwidth adaptation algorithm is proposed and is extended to the SOM with the CIM (SOM-CIM). The advantages of using the SOM-CIM are shown in Chapter 6, where it is applied to some of the important applications of the SOM like density estimation, clustering and data visualization. The thesis is concluded in Chapter 7.


CHAPTER 2
SELF-ORGANIZING MAPS

The self-organizing map is a biologically inspired tool with primary emphasis on topological order. It is inspired by the modeling of the brain, in which information propagates through the neural layers while preserving the topology of one layer in the other. Since they were first introduced by Kohonen (1997), they have evolved into useful tools in many applications such as visualization of high dimensional data on a low dimensional grid, data mining, clustering, compression, etc. In section 2.1 Kohonen's algorithm is briefly discussed.

2.1 Kohonen's Algorithm

Kohonen's algorithm is inspired by vector quantization, in which a group of inputs is quantized by a few weight vectors called nodes. However, in addition to the quantization of the inputs, here the nodes are arranged in a regular low dimensional grid and the order of the grid is maintained throughout the learning. Hence, the distribution of the input data in the high dimensional space can be preserved on the low dimensional grid (Kohonen 1997).

Before going further into the details of the algorithm, please note that the following notation is used throughout this thesis: the input distribution V \in R^d is mapped as a function \Phi: V \to A, where A is a lattice of M neurons, with each neuron having a weight vector w_i \in R^d, where i are lattice indices.

Kohonen's algorithm is described in the following points:

At each instant t, a random sample, v, from the input distribution V is selected and the best matching unit (BMU) corresponding to it is obtained using

    r = \arg\min_s D(w_s, v)    (2-1)

where node r is considered the index of the winning node. Here D is any similarity measure that is used to compare the closeness between the two vectors.


Figure 2-1. The SOM grid with the neighborhood range. The circles indicate the number of nodes that correlate in the same direction as the winner; here it is node (4,4).

Once the winner is obtained, the weights of all the nodes should be updated in such a way that the local error given by (2-2) is minimized:

    e(v, w_r) = \sum_s h_{rs} D(v - w_s)    (2-2)

Here h_{rs} is the neighborhood function, a non-increasing function of the distance between the winner and any other node in the lattice. To avoid local minima, as shown by Ritter et al. (1992), it is considered as a convex function, like the middle region of a Gaussian function, with a large range at the start that is gradually reduced.

If the similarity measure considered is the euclidean distance, D(v - w_s) = \|v - w_s\|^2, then the on-line updating rule for the weights is obtained by taking the derivative and minimizing (2-2). It comes out to be

    w_s(n+1) = w_s(n) + \eta h_{rs} (v(n) - w_s(n))    (2-3)
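Since the whole on-line algorithm is just (2-1)-(2-3) in a loop, it can be stated compactly in code. The following NumPy sketch is illustrative only and is not code from the thesis; the grid size, learning rate and neighborhood schedule are assumed values.

    import numpy as np

    def train_som_online(data, grid=(7, 7), epochs=50, eta=0.1):
        """Minimal on-line Kohonen SOM with euclidean distance and Gaussian h_rs."""
        rows, cols = grid
        coords = np.array([(i, j) for i in range(rows) for j in range(cols)], float)
        weights = np.random.rand(rows * cols, data.shape[1])
        for epoch in range(epochs):
            sigma_h = 3.0 * np.exp(-3.0 * epoch / epochs)  # shrinking neighborhood range
            for v in np.random.permutation(data):
                # BMU, eq. (2-1), with D the squared euclidean distance
                r = np.argmin(np.sum((weights - v) ** 2, axis=1))
                # neighborhood function h_rs on the lattice
                h = np.exp(-np.sum((coords - coords[r]) ** 2, axis=1) / (2 * sigma_h ** 2))
                # on-line update, eq. (2-3)
                weights += eta * h[:, None] * (v - weights)
        return weights

For instance, train_som_online(np.random.rand(4000, 2)) reproduces the common exercise of unfolding a grid over uniform 2D data.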


As discussed earlier, one of the important properties of the SOM is topology preservation, and it is the neighborhood function that is responsible for this. The role played by the neighborhood function can be summarized as described in (Haykin 1998). The reason for a large neighborhood is to correlate the direction of the weight updates of a large number of weights around the BMU, r. As the range decreases, so does the number of neurons correlated in the same direction (Figure 2-1). This correlation ensures that similar inputs are mapped together and hence, the topology is preserved.

2.2 Energy Function and Batch Mode

Erwin et al. (1992) have shown that in the case of a finite set of training patterns the energy function of the SOM is highly discontinuous, and in the case of continuous inputs the energy function does not exist. It is clear that things go wrong at the edge of the Voronoi regions, where the input is equally close to two nodes, when the best matching unit (BMU) is selected using (2-1). To overcome this, Heskes (1999) has proposed that by slightly changing the selection of the BMU there can be a well defined energy function for the SOM.

If the BMU is selected using (2-4) instead of (2-1), then the function over which the SOM is minimized is given by (2-5) (in the discrete case). This is also called the local error.

    r = \arg\min_s \sum_t h_{ts} D(v(n) - w_t)    (2-4)

    e(W) = \sum_s^M h_{rs} D(v(n) - w_s)    (2-5)

The overall cost function when a finite number of input samples, say N, are considered is given by

    E(W) = \sum_n^N \sum_s^M h_{rs} D(v(n) - w_s)    (2-6)


To find the batch mode update rule, we can take the derivative of E(W) w.r.t. w_s and find the value of the weight vectors at the stationary point of the gradient. If the euclidean distance is used, then the batch mode update rule is

    w_s(n+1) = \sum_n^N h_{rs} x_n / \sum_n^N h_{rs}    (2-7)

2.3 Other Variants

Contrary to what is assumed in the case of the SOM-MSE, the weight density at convergence, also defined as the inverse of the magnification factor, is not proportional to the input density. It is shown by Ritter et al. (1992) that in a continuum mapping, i.e., having infinite neighborhood node density, in a 1D map (i.e., a chain) developed in a one dimensional input space or a multidimensional space which is separable, the weight density P(w) \propto P(v)^{2/3} with a non-vanishing neighborhood function. (For proof please refer to Appendix A.) When a discrete lattice is used there is a correction in the relation, given by

    p(w) = p(v)^{\alpha}    (2-8)

    with \alpha = 2/3 - 1/(3\sigma_h^2 + 3(\sigma_h + 1)^2)

where \sigma_h is the neighborhood range in the case of a rectangular function. There are several other methods (Van Hulle 2000) proposed with changes in the definition of the neighborhood function, but the mapping is unable to produce an optimal mapping, i.e., a magnification of 1. It is observed that the weights are always oversampling the low probability regions and undersampling the high probability regions.

The reason for such a mapping can be attributed to the global nature of the MSE cost function. When the euclidean distance is used, the points at the tail end of the input distribution have a greater influence on the overall distortion. This is the reason why the use of the MSE as a cost function is suitable only for thin tail distributions like the Gaussian distribution. This property of the euclidean distance pushes the weights into regions of low probability and hence, oversamples those regions.

By slightly modifying the updating rule, Bauer and Der (1996) and Claussen (2005) have proposed different methods to obtain a mapping with a magnification factor of 1. Bauer and Der (1996) have used the local input density at the weights to adaptively control the step size of the weight update. Such a mapping is able to produce the optimal mapping in the same continuum conditions as mentioned earlier, but it needs to estimate the unknown weight density at a particular point, making it unstable in higher dimensions (Van Hulle 2000). Meanwhile, the method proposed by Claussen (2005) is not able to produce a stable result in the case of high dimensional data.

A completely different approach is taken by Linsker (1989), where the mutual information between the input density and the weight density is maximized. It is shown that such learning based on an information theoretic cost function leads to an optimal solution. But the complexity of the algorithm makes it impractical and, strictly speaking, it is applicable only to a Gaussian distribution. Furthermore, Van Hulle (2000) has proposed another information theory based algorithm based on Bell and Sejnowski (1995)'s Infomax principle, where the differential entropy of the output nodes is maximized.

In the recent past, inspired by the use of kernel Hilbert spaces by Vapnik (1995), several kernel based topographic mapping algorithms have been proposed (Andras 2002; Graepel 1998; MacDonald 2000). Graepel (1998) has used the theory of deterministic annealing (Rose 1998) to develop a new self-organizing network called the soft topographic vector quantization (STVQ). A kernel based STVQ (STMK) is also proposed, where the weights are considered in the feature space rather than the input space. To overcome this, Andras (2002) proposed a kernel-Kohonen network in which the input space, both the inputs and the weights, is transformed into a high dimensional reproducing kernel Hilbert space (RKHS), using the euclidean distance in the high dimensional space as the cost function to update the weights. It is analyzed in the context of classification, and the idea comes from the theory of non-linear support vector machines (Haykin 1998), which states: if the boundary separating the two classes is not linear, then there exists a transformation of the data into another space in which the two classes are linearly separable.

But Lau et al. (2006) showed that the type of the kernel strongly influences the classification accuracy, and it is also shown that the kernel-Kohonen network does not always outperform the basic Kohonen network. Moreover, Yin (2006) has compared the KSOM with the self-organizing mixture networks (SOMN) (Yin and Allinson 2001) and showed that the KSOM is equivalent to the SOMN, and in turn the SOM itself, but failed to establish why the KSOM gives a better classification than the SOM.

Throughout this thesis, we try to use the idea of the correntropy induced metric (CIM) and show how the type of the kernel function influences the mapping. We also show that the kernel bandwidth plays an important role in the map formation. In the following chapter we briefly discuss the theory of information theoretic learning (ITL) and correntropy before going into the details of the proposed algorithm.
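Before moving on, it is worth noting that the batch rule (2-7) simply replaces each weight by a neighborhood-weighted mean of the samples. A vectorized sketch, under the same illustrative assumptions as before (not code from the thesis):

    import numpy as np

    def train_som_batch(data, coords, weights, epochs=30):
        """Batch SOM: every weight moves to the h_rs-weighted mean of the data, eq. (2-7)."""
        for epoch in range(epochs):
            sigma_h = 3.0 * np.exp(-3.0 * epoch / epochs)
            # BMU index r(n) for every sample, shape (N,)
            d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(-1)
            r = d2.argmin(axis=1)
            # h_{r(n)s} for every sample/node pair, shape (N, M)
            h = np.exp(-((coords[r][:, None, :] - coords[None, :, :]) ** 2).sum(-1)
                       / (2 * sigma_h ** 2))
            # stationary point of E(W): weighted average of the inputs
            weights = (h.T @ data) / h.sum(axis=0)[:, None]
        return weights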


CHAPTER 3
INFORMATION THEORETIC LEARNING AND CORRENTROPY

3.1 Information Theoretic Learning

The central idea of the information theoretic learning proposed by Principe et al. (2000) is to non-parametrically estimate measures like entropy, divergence, etc.

3.1.1 Renyi's Entropy and Information Potentials

Renyi's \alpha-order entropy is defined as

    H_{\alpha}(X) = \frac{1}{1-\alpha} \log \int f_x^{\alpha}(x) \, dx    (3-1)

A special case with \alpha = 2 is considered and is called Renyi's quadratic entropy. It is defined over a random variable X as

    H_2(X) = -\log \int f_x^2(x) \, dx    (3-2)

The pdf of X is estimated using the famous Parzen density estimation (Parzen 1962) and is given by

    \hat{f}_x(x) = \frac{1}{N} \sum_n \kappa_{\sigma}(x - x_n)    (3-3)

where x_n; n = 1, 2, ..., N are N samples obtained from the pdf of X. Popularly, a Gaussian kernel defined as

    G_{\sigma}(x) = \frac{1}{\sqrt{2\pi}\sigma} \exp\left(-\frac{x^2}{2\sigma^2}\right)    (3-4)

is used for the Parzen estimation. In the case of multidimensional density estimation, the use of joint kernels of the type

    G_{\sigma}(x) = \prod_d^D G_{\sigma_d}(x(d))    (3-5)

where D is the number of dimensions of the input vector x, is suggested.

Substituting (3-3) in place of the density functions in (3-2), we get

    H_2(X) = -\log \int \left[\frac{1}{N} \sum_n G_{\sigma}(x - x_n)\right]^2 dx
           = -\log \int \frac{1}{N^2} \sum_n \sum_m G_{\sigma}(x - x_n) G_{\sigma}(x - x_m) \, dx
           = -\log \frac{1}{N^2} \sum_n \sum_m \int G_{\sigma}(x - x_n) G_{\sigma}(x - x_m) \, dx
           = -\log \frac{1}{N^2} \sum_n \sum_m G_{\sigma\sqrt{2}}(x_n - x_m)    (3-6)
           = -\log IP(x)

where

    IP(x) = \frac{1}{N^2} \sum_n \sum_m G_{\sigma\sqrt{2}}(x_n - x_m)    (3-7)

Here (3-7) is called the information potential of X. It defines the state of the particles in the input space and the interactions between the information particles. It is also shown that the Parzen pdf estimate of X is equivalent to the information potential field (Deniz Erdogmus 2002).

Another measure which can estimate the interaction between the points of two random variables in the input space is the cross-information potential (CIP) (Principe et al. 2000). It is defined as an integral of the product of two probability density functions, which characterizes the similarity between two random variables. It determines the effect of the potential created by g at a particular location in the space defined by f (or vice versa), where f and g are the probability density functions.

    CIP(X, Y) = \int f(x) g(x) \, dx    (3-8)

Using the Parzen density estimation again with the Gaussian kernel, we get

    CIP(X, Y) = \int \frac{1}{MN} \sum_n \sum_m G_{\sigma_f}(u - x_n) G_{\sigma_g}(u - y_m) \, du    (3-9)

Similar to the information potential, we get

    CIP(X, Y) = \frac{1}{NM} \sum_n \sum_m G_{\sigma_a}(x_n - y_m)    (3-10)

with \sigma_a^2 = \sigma_f^2 + \sigma_g^2.

3.1.2 Divergence

The dissimilarity between two density functions can be quantified using a divergence measure. Renyi (1961) has proposed an \alpha-order divergence measure for which the popular KL-divergence is a special case. The Renyi's divergence between the two density functions f and g is given by

    D_{\alpha}(f, g) = \frac{1}{1-\alpha} \log \int f(x) \left[\frac{f(x)}{g(x)}\right]^{\alpha-1} dx    (3-11)

Again, by varying the value of \alpha the user can obtain divergences of different order. As a limiting case, when \alpha \to 1, D_{\alpha} \to D_{KL}, where D_{KL} refers to the KL-divergence and is given by

    D_{KL}(f, g) = \int f(x) \log \frac{f(x)}{g(x)} \, dx    (3-12)

Using an approach similar to the one used to calculate Renyi's entropy, to estimate the D_{KL} from the input samples the density functions can be substituted using the Parzen estimation (3-3). If f and g are probability density functions (pdf) of the two random variables X and Y, then

    D_{KL}(f, g) = \int f_x(u) \log(f_x(u)) \, du - \int f_x(u) \log(g_y(u)) \, du
                 = E_x[\log(f_x)] - E_x[\log(g_y)]
                 = E_x\left[\log \sum_i^N G_{\sigma}(x - x(i))\right] - E_x\left[\log \sum_j^M G_{\sigma}(x - y(j))\right]    (3-13)

The first term in (3-13) is equivalent to the Shannon's entropy of X and the second term is equivalent to the cross-entropy between X and Y.

Several other divergence measures have been proposed using the information potential (IP), like the Cauchy-Schwarz divergence and quadratic divergence, but they are not discussed here. Interested readers can refer to (Principe 2010).
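The estimators (3-7) and (3-10) are plain double sums over Gaussian kernels and are easy to state in code. The sketch below is a direct O(N^2) NumPy implementation for multidimensional samples; it is given for illustration and is not taken from the thesis.

    import numpy as np

    def _pairwise_sq_dists(x, y):
        return ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)

    def information_potential(x, sigma):
        """IP(x) = (1/N^2) sum_{n,m} G_{sigma*sqrt(2)}(x_n - x_m), eq. (3-7)."""
        s = np.sqrt(2.0) * sigma          # kernel widths add under the convolution
        d = x.shape[1]
        g = np.exp(-_pairwise_sq_dists(x, x) / (2 * s ** 2)) \
            / (np.sqrt(2 * np.pi) * s) ** d
        return g.mean()

    def renyi_quadratic_entropy(x, sigma):
        """H_2(X) = -log IP(x), eq. (3-6)."""
        return -np.log(information_potential(x, sigma))

    def cross_information_potential(x, y, sigma_f, sigma_g):
        """CIP(X, Y) with sigma_a^2 = sigma_f^2 + sigma_g^2, eq. (3-10)."""
        sa = np.sqrt(sigma_f ** 2 + sigma_g ** 2)
        d = x.shape[1]
        g = np.exp(-_pairwise_sq_dists(x, y) / (2 * sa ** 2)) \
            / (np.sqrt(2 * np.pi) * sa) ** d
        return g.mean()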


3.2 Correntropy and its Properties

Correntropy is a generalized similarity measure between two arbitrary random variables X and Y defined by (Liu et al. 2007):

    V_{\sigma}(X, Y) = E[\kappa(X - Y)]    (3-14)

If \kappa is a kernel that satisfies Mercer's theorem (Vapnik 1995), then it induces a reproducing kernel Hilbert space (RKHS) called H_v. Hence, it can also be defined as the dot product of the two random variables in the feature space H_v:

    V(X, Y) = E[\langle \Phi(x), \Phi(y) \rangle]    (3-15)

where \Phi is a non-linear mapping from the input space to the feature space such that \kappa(x, y) = \langle \Phi(x), \Phi(y) \rangle, where \langle \cdot, \cdot \rangle denotes the inner product.

Here we use the Gaussian kernel

    G_{\sigma}(x, y) = \frac{1}{(\sqrt{2\pi}\sigma)^d} \exp\left(-\frac{\|x - y\|^2}{2\sigma^2}\right)    (3-16)

where d is the input dimension, since it is most popularly used in the information theoretic learning (ITL) literature. Another popular kernel is the Cauchy kernel, given by

    C_{\sigma}(x, y) = \frac{\sigma}{\sigma^2 + \|x - y\|^2}    (3-17)

In practice, only a finite number of samples of the data are available and hence, correntropy can be estimated as

    \hat{V}_{N,\sigma}(X, Y) = \frac{1}{N} \sum_{i=1}^N \kappa(x_i - y_i)    (3-18)

Some of the properties of correntropy that are important in this thesis are discussed here; a detailed analysis can be found in (Liu et al. 2007; Santamaria et al. 2006).

Property 1: Correntropy involves all the even order moments of the random variable E = X - Y.

Expanding the Gaussian kernel in (3-14) using the Taylor series, we get

    V(X, Y) = \frac{1}{\sqrt{2\pi}\sigma} \sum_{n=0}^{\infty} \frac{(-1)^n}{2^n n!} \frac{E[(X - Y)^{2n}]}{\sigma^{2n}}    (3-19)

As \sigma increases, the higher order moments decay and the second order moment dominates. For a large \sigma, correntropy approaches correlation.

Property 2: Correntropy, as a sample estimator, induces a metric called the correntropy induced metric (CIM) in the sample space. Given two sample vectors X = (x_1, x_2, ..., x_N) and Y = (y_1, y_2, ..., y_N), the CIM is defined as

    CIM(X, Y) = (\kappa(0) - \hat{V}(X, Y))^{1/2}    (3-20)
              = \left[\frac{1}{N} \sum_n (\kappa(0) - \kappa(x_n - y_n))\right]^{1/2}    (3-21)

The CIM can also be viewed as the euclidean distance in the feature space if the kernel is of the form \kappa(\|x - y\|). Consider two random variables X and Y transformed into the feature space using the non-linear mapping \Phi; then

    E[\|\Phi(x) - \Phi(y)\|^2] = E[\kappa(x, x) + \kappa(y, y) - 2\kappa(x, y)]
                               = E[2\kappa(0) - 2\kappa(x, y)]
                               = \frac{2}{N} \sum_n (\kappa(0) - \kappa(x_n - y_n))    (3-22)

Except for the additional multiplying factor of 2, (3-22) is similar to (3-21).

Since \Phi is a non-linear mapping from the input to the feature space, the CIM induces a non-linear similarity measure in the input space. It has been observed that the CIM behaves like an L2 norm when the two vectors are close, like an L1 norm outside the L2 region, and as the vectors go farther apart it becomes insensitive to the distance between them (L0 norm). The extent of the space over which the CIM acts as the L2 or L0 norm is directly related to the kernel bandwidth, \sigma. This unique property of the CIM localizes the similarity measure and is very helpful in rejecting outliers. In this regard, it is very different from the MSE, which provides a global measure.

Figure 3-1. Surface plot of CIM(X, 0) in 2D sample space (kernel width is 1). Panels: A) with Gaussian kernel; B) with Cauchy kernel.

Figure 3-1 demonstrates the non-linear nature of the surface of the CIM, with N = 2. Clearly the shape of the L2 norm region depends on the kernel function. Also, the kernel bandwidth determines the extent of the L2 region. As we will see later, these parameters play an important role in determining the quality of the final output.
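A direct sample estimator of (3-20)-(3-21) for the two kernels used here can be written as follows. The function is an illustrative sketch, with the normalization of (3-16) included so that \kappa(0) is well defined; it is not code from the thesis.

    import numpy as np

    def cim(x, y, sigma=1.0, kernel="gaussian"):
        """Correntropy induced metric between sample vectors X and Y, eq. (3-21)."""
        x, y = np.atleast_2d(x), np.atleast_2d(y)
        e2 = np.sum((x - y) ** 2, axis=1)        # per-sample squared errors
        if kernel == "gaussian":                 # eq. (3-16)
            k0 = 1.0 / (np.sqrt(2 * np.pi) * sigma) ** x.shape[1]
            ke = k0 * np.exp(-e2 / (2 * sigma ** 2))
        else:                                    # Cauchy kernel, eq. (3-17)
            k0 = 1.0 / sigma
            ke = sigma / (sigma ** 2 + e2)
        return np.sqrt(np.mean(k0 - ke))

Evaluating cim against a fixed reference point while sweeping the other argument over the plane reproduces the L2/L1/L0 transition visible in Figure 3-1.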


CHAPTER 4
SELF-ORGANIZING MAPS WITH THE CORRENTROPY INDUCED METRIC

As discussed in the previous chapter, correntropy induces a non-linear metric in the input space called the correntropy induced metric (CIM). Here we show that the CIM can be used as the similarity measure in the SOM to determine the winner and also to update the weight vectors.

4.1 On-line Mode

If v(n) is considered as the input vector of the random variable V at an instant n, then the best matching unit (BMU) can be obtained using the CIM as

    r = \arg\min_s CIM(v(n), w_s)|_{N=1}    (4-1)
      = \arg\min_s (\kappa(0) - \kappa(\|v(n) - w_s\|))    (4-2)

where r is considered as the BMU at instant n and \sigma is predetermined.

According to Kohonen's algorithm, if the CIM is used as the similarity measure, then the local error that needs to be minimized to obtain an ordered mapping is given by

    e(w_r, v) = \sum_{s=1}^M h_{rs} (\kappa(0) - \kappa(\|v(n) - w_s\|))    (4-3)

To perform gradient descent over the local error for updating the weights, take the derivative of (4-3) with respect to w_s and update the weights using the gradient. The update rule comes out to be

    w_s(n+1) = w_s(n) - \eta \Delta w_s    (4-4)

If the Gaussian kernel is used, then the gradient is

    \Delta w_s = -h_{rs} G_{\sigma}(\|v(n) - w_s\|)(v(n) - w_s)    (4-5)

In the case of the Cauchy kernel, it is

    \Delta w_s = -h_{rs} \frac{1}{\sigma^2 + \|v(n) - w_s\|^2} (v(n) - w_s)    (4-6)
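One on-line SOM-CIM step with the Gaussian kernel, combining (4-2), (4-4) and (4-5), then looks as follows. This is an illustrative sketch, not the thesis implementation; the implicit kernel scaling is folded into the learning rate, as noted in the next paragraph.

    import numpy as np

    def som_cim_step(v, weights, coords, sigma, sigma_h, eta):
        """One on-line SOM-CIM update with a Gaussian kernel, eqs. (4-2), (4-4), (4-5)."""
        e2 = np.sum((weights - v) ** 2, axis=1)
        g = np.exp(-e2 / (2 * sigma ** 2))   # G_sigma of the quantization errors
        r = np.argmax(g)                     # minimizing the CIM == maximizing the kernel
        h = np.exp(-np.sum((coords - coords[r]) ** 2, axis=1) / (2 * sigma_h ** 2))
        # compared with (2-3), the extra factor g localizes the update: far-away
        # (outlier) inputs barely move the weights
        weights += eta * (h * g)[:, None] * (v - weights)
        return weights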


The implicit \sigma term in (4-5) and (4-6) is combined with the learning rate \eta, a free parameter.

When compared to the update rule obtained using the MSE (2-3), the gradient in both these cases has an additional scaling factor. This additional scaling factor, whose value is small when \|v(n) - w_s\| is large, points to the strong outlier rejection capability of the CIM. This property of the CIM is able to resolve one of the key problems associated with using the MSE, i.e., oversampling the low probability regions of the input distribution. Since the weights are less influenced by the inputs in the low probability regions, the SOM with the CIM (SOM-CIM) can give a better magnification. It should be noted that since the influence of the outliers depends on the shape and extent of the L2-norm and L0-norm regions of the CIM, the magnification factor in turn is dependent on the type of the kernel and its bandwidth, \sigma.

Another property of the CIM (with a Gaussian kernel) that influences the magnification factor is the presence of higher order moments. According to Zador (1982), the order of the error directly influences the magnification factor of the mapping. Again, since the kernel bandwidth, \sigma, determines the influence of the higher order moments (refer to section 3.2), the choice of \sigma plays an important role in the formation of the final mapping.

4.2 Energy Function and Batch Mode

If the CIM is used for determining the winner and to update the weights, then according to Heskes (1999) an energy function over which the SOM can be adapted exists only when the winner is determined using the local error rather than just a metric. Hence, if the BMU is obtained by using

    r = \arg\min_s \sum_t h_{ts} CIM(v(n) - w_t)    (4-7)

then the SOM-CIM is adapted by minimizing the following energy function:

    e(W) = \sum_s^M h_{rs} CIM(v(n) - w_s)    (4-8)


In the batch mode, when a finite number of input samples are present, the overall cost function becomes

    E_{CIM}(W) = \sum_n^N \sum_s^M h_{rs} CIM(v(n) - w_s)    (4-9)

Similar to the batch mode update rule obtained while using the MSE, we find the weight vectors at the stationary point of the gradient of the above energy function, E_{CIM}(W). In the case of the Gaussian kernel,

    \frac{\partial E_{CIM}(W)}{\partial w_s} = -\sum_n^N h_{rs} G_{\sigma}(v(n) - w_s)(v(n) - w_s) = 0    (4-10)

    \Rightarrow w_s = \frac{\sum_n^N h_{rs} G_{\sigma}(v(n) - w_s) v(n)}{\sum_n^N h_{rs} G_{\sigma}(v(n) - w_s)}    (4-11)

The update here comes out to be a so called iterative fixed point update rule, indicating that the weights are updated iteratively and cannot reach the minimum in one iteration, as they do with the MSE.

And in the case of the Cauchy kernel, the update rule is

    w_s = \frac{\sum_n^N h_{rs} C_{\sigma}(v(n) - w_s) v(n)}{\sum_n^N h_{rs} C_{\sigma}(v(n) - w_s)}    (4-12)

The scaling factor in the denominator and the numerator plays a similar role as described in the previous section 4.1.

Figure 4-1 shows how the map evolves when the CIM is used to map 4000 samples of a uniform distribution between (0,1] onto a 7x7 rectangular grid. It shows the scatter plot of the weights at different epochs, and the neighborhood relations between the nodes in the lattice are indicated by the lines. Initially, the weights are randomly chosen between (0,1] and the kernel bandwidth is kept constant at 0.1. A Gaussian neighborhood function, given by (4-13), is considered and its range is decreased with the number of epochs using (4-14):

    h_{rs}(n) = \exp\left(-\frac{(r - s)^2}{2\sigma_h(n)}\right)    (4-13)

Figure 4-1. The unfolding of the mapping when the CIM is used. The figure shows the scatter plot of the weights along with the neighborhood nodes connected with lines at various stages of the learning. Panels: A) random initial condition; B) epoch 10; C) epoch 30; D) epoch 50.

    \sigma_h(n) = \sigma_{ih} \exp\left(-\rho \frac{n}{N}\right)    (4-14)

where r and s are node indices, \sigma_h is the range of the neighborhood function, \sigma_{ih} is the initial value of \sigma_h, \rho = 0.3 and N is the number of iterations/epochs.

In the first few epochs, when the neighborhood is large, all the nodes converge around the mean of the inputs, and as the neighborhood decreases with the number of epochs, the nodes start to represent the structure of the data and minimize the error.

Another important point regarding the kernel bandwidth should be considered here. If the value of \sigma is small, then during the 'ordering phase' (Haykin 1998) of the map the movement of the nodes is restricted because of the small L2 region. So, a slow annealing of the neighborhood function is required. To overcome this, a large \sigma is considered initially, which ensures that the movement of the nodes is not restricted. It is gradually annealed during the ordering phase and kept constant at the desired value during the convergence phase. Figure 4-2 demonstrates this. When the kernel size is kept constant and the neighborhood function is decreased rapidly with \rho = 1 in (4-14), a requirement for faster convergence, the final mapping is distorted. On the other hand, when the \sigma is annealed gradually from a large value to a small value, in this case 5 to 0.1, during the ordering phase and kept constant after that, the final mapping is much less distorted. Henceforth, this is the technique used to obtain all the results.

Figure 4-2. The scatter plot of the weights at convergence when the neighborhood is rapidly decreased. Panels: A) constant kernel size; B) with kernel annealing.

In addition to this, it is interesting to look back at the cost function (or local error) when the Gaussian kernel is used. It is reproduced here and is given by

    e_{CIM}(W) = \sum_s^M h_{rs} (1 - G_{\sigma}(w_s, v(n)))    (4-15)
               = \sum_s^M h_{rs} - \sum_s^M h_{rs} G_{\sigma}(w_s, v(n))    (4-16)

The second term in the above equation can be considered as an estimate of the probability of the input using a sum of Gaussian mixtures centered at the weights, with the neighborhood function considered equivalent to the posterior probability:

    p(v(n)|W) \propto \sum_i p(v(n)|w_i) P(w_i)    (4-17)

Hence, it can be said that this way of kernelizing the SOM is equivalent to estimating the input probability distribution (Yin 2006). It can also be considered as the weighted Parzen density estimation (Babich and Camps 1996) technique, with the neighborhood function acting as the strength associated with each weight vector. This reasserts the argument that the SOM-CIM can increase the magnification factor of the mapping to 1.

4.3 Results

4.3.1 Magnification Factor of the SOM-CIM

As pointed out earlier, the use of the CIM does not allow the nodes to oversample the low probability regions of the input distribution as they do with the MSE. Also, the presence of the higher order moments affects the final mapping, but it is difficult to determine the exact effect the higher order moments have on the mapping here. Still, it can be expected that the magnification of the SOM can be improved using the CIM. To verify this experimentally, we use a setup similar to that used by Ritter et al. (1992) to demonstrate the magnification of the SOM using the MSE.

Here, a one dimensional input space is mapped onto a one dimensional map, i.e., a chain, with 50 nodes, where the input is 100,000 instances drawn from the distribution f(x) = 2x. Then the relation between the weights and the node indices is compared. A Gaussian neighborhood function, given by (4-13), is considered and its range is decreased with the number of iterations using (4-14) with \rho = 0.3.

If the magnification factor is 1, as in the case of an optimal mapping, the relation between the weights and the nodes is ln(w) = (1/2) ln(r). In the case of the SOM with the MSE (SOM-MSE) as the cost function, the magnification is proved to be 2/3 (Appendix A) and the relation comes out to be ln(w) = (3/5) ln(r).

Figure 4-3. The plot between ln(r) and ln(w) for different values of \sigma. '--' indicates the ideal plot representing the relation between w and r when the magnification is 1 and '...' indicates the ideal plot representing the magnification of 2/3. '-+-' indicates the relation obtained with SOM-CIM when \sigma = 0.03 and '-x-' indicates when \sigma = 0.5.

Figure 4-3 shows that when a smaller \sigma is considered for the SOM-CIM, a better magnification can be achieved. As the value of \sigma increases, the mapping resembles the one with the MSE as the cost function. This establishes the nature of the surface of the CIM, as discussed in Section 3.2: with the increase in the value of \sigma the surface tends to behave more like the MSE. Actually, it can be said that with the change in the value of \sigma the magnification factor of the SOM can vary between 2/3 and 1! But an analytical solution that can give a relation between the two is still elusive.
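The 1D experiment is easy to reproduce in outline. The snippet below only sketches the setup: samples from f(x) = 2x are drawn by inverse-transform sampling, and the fitted slope of ln(w) versus ln(r) is the quantity compared against the ideal 1/2 (magnification 1) and the 3/5 of the MSE map. Training itself would use som_cim_step from above; all constants are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    # f(x) = 2x on (0, 1] has CDF F(x) = x^2, so x = sqrt(u) with u ~ U(0, 1)
    samples = np.sqrt(rng.random((100_000, 1)))

    # ... train a 50-node 1D chain on `samples` (e.g. with som_cim_step) ...

    def fitted_slope(w):
        """Slope of ln(w_r) vs ln(r); ~0.5 means magnification 1, ~0.6 means 2/3."""
        r = np.arange(1, len(w) + 1)
        # an ordered (unfolded) chain is assumed, so the weights are monotone in r
        return np.polyfit(np.log(r), np.log(np.sort(w.ravel())), 1)[0]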


Table 4-1. The information content and the mean square quantization error for various values of \sigma in SOM-CIM and SOM with MSE. I_max = 3.2189.

    Method            \sigma   MSQE (x10^-3)   Info. content, I
    Gaussian kernel   0.05     15.361          3.1301
                      0.1      5.724           3.2101
                      0.2      5.082           3.1929
                      0.5      5.061           3.1833
                      0.8      5.046           3.1794
                      1.0      5.037           3.1750
                      1.5      5.045           3.1768
    Cauchy kernel     0.025    5.6635          3.1269
                      0.05     5.8907          3.2100
                      0.1      5.3305          3.2065
                      0.2      5.1399          3.1923
                      0.5      5.1278          3.1816
                      1        5.0359          3.1725
                      1.5      5.0678          3.1789
    MSE               -        5.0420          3.1767

4.3.2 Maximum Entropy Mapping

Optimal information transfer from the input distribution to the weights happens when the magnification factor is 1. Such a mapping should ensure that every node is active with equal probability; this is also called an equiprobabilistic mapping (Van Hulle 2000). Since entropy is a measure that determines the amount of information content, we use the Shannon's entropy (4-18) of the weights to quantify this property:

    I = -\sum_r^M p(r) \ln(p(r))    (4-18)

where p(r) is the probability of node r being the winner.

Table 4-1 shows the change in I with the change in the value of the kernel bandwidth, and the performance of the SOM-MSE and the SOM-CIM is compared. The mapping here is generated in batch mode, with the input consisting of 5000 2-dimensional random samples generated from a linear distribution P(x) \propto 5x and mapped onto a 5x5 rectangular grid with \sigma_h = 10 \to 0.0045.
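The information content (4-18) is just the entropy of the winner histogram, which suggests a small helper like the sketch below (illustrative; the winner is taken per (4-2), without the local-error refinement of (4-7)).

    import numpy as np

    def information_content(data, weights, sigma):
        """Shannon entropy of the winner distribution, eq. (4-18)."""
        e2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(-1)
        winners = e2.argmin(axis=1)     # for a monotone kernel, argmax G == argmin e2
        p = np.bincount(winners, minlength=len(weights)) / len(data)
        p = p[p > 0]
        return -np.sum(p * np.log(p))

For the 5x5 grid used here, the maximum is reached by 25 equiprobable nodes: I_max = ln 25 ≈ 3.2189, the reference value quoted in Table 4-1.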


Figure 4-4. The input scatter plot and the mapping of the weights at the final convergence. (In the case of SOM-CIM, the Gaussian kernel with \sigma = 0.1 is used.) The dots indicate the weight vectors and the lines indicate the connections between the nodes in the lattice space. Panels: A) input scatter plot; B) SOM-MSE; C) SOM-CIM.

From Table 4-1, it is clear that the value of \sigma influences the quality of the mapping, and Figure 4-4 shows the weight distribution at convergence. It is clear that, for large \sigma, because of the wider L2-norm region, the mapping of the SOM-CIM behaves similarly to that of the SOM-MSE. But as \sigma is decreased, the nodes try to be equiprobabilistic, indicating that they are not oversampling the low probability regions. Setting the value of \sigma too small, however, again distorts the mapping. This is because of the restricted movement of the nodes, which leads to oversampling of the high probability region. This underlines the importance of the value of \sigma in obtaining a good quality mapping.


Figure 4-5. Negative log likelihood of the input versus the kernel bandwidth. The figure shows how the negative log likelihood of the inputs given the weights changes with the kernel bandwidth. Note that a negative value in the plot indicates that the model, the weights and the kernel bandwidth, is not appropriate to estimate the input distribution.

It can also be observed how the type of the kernel influences the mapping. The Gaussian and the Cauchy kernels give their best mappings at different values of \sigma, indicating that the shapes of the L2 norm regions for these two kernels are different.

Moreover, since it is shown that the SOM-CIM is equivalent to density estimation, the negative log likelihood of the input given the weights is also observed when the Gaussian kernel is used. The negative log likelihood is given by

    LL = -\frac{1}{N} \sum_n^N \log(p(v(n)|W, \sigma))    (4-19)

where

    p(v(n)|W, \sigma) = \frac{1}{M} \sum_i^M G_{\sigma}(v(n), w_i)    (4-20)

The change in its value with the change in the kernel bandwidth is plotted in Figure 4-5. For an appropriate value of the kernel bandwidth, the likelihood of estimating the input is high, whereas for large values of \sigma it decreases considerably.
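The curve in Figure 4-5 comes from evaluating (4-19)-(4-20) on the converged map; an illustrative sketch of that computation:

    import numpy as np

    def negative_log_likelihood(data, weights, sigma):
        """LL = -(1/N) sum_n log p(v(n)|W, sigma), eqs. (4-19)-(4-20)."""
        n, d = data.shape
        e2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(-1)
        g = np.exp(-e2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma) ** d
        return -np.mean(np.log(g.mean(axis=1)))   # mixture with uniform 1/M priors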


CHAPTER 5
ADAPTIVE KERNEL SELF ORGANIZING MAPS

Now that the influence of the kernel bandwidth on the mapping has been studied, one should appreciate the difficulty involved in setting the value of \sigma. Though it is understood from the theory of correntropy (Liu et al. 2007) that the value of \sigma should usually be less than the variance of the input data, it is not clear how to set this value. Since the SOM-CIM can be considered as a density estimation procedure, certain rules of thumb, like Silverman's rule (Silverman 1986), can be used. But it is observed that these results are suboptimal in many cases because of the Gaussian distribution assumption on the underlying data. It should also be noted that for equiprobabilistic modeling each node should adapt to the density of the inputs in its own vicinity and hence, should have its own unique bandwidth. In the following section we try to address this using an information theoretic divergence measure.

5.1 The Algorithm

Clustering of the data is closely related to density estimation. An optimal clustering should ensure that the input density can be estimated using the weight vectors. Several density estimation based clustering algorithms have been proposed using information theoretic quantities like divergence, entropy, etc. There are also several clustering algorithms in the ITL literature, like Information Theoretic Vector Quantization (Lehn-Schioler et al. 2005), Vector Quantization using KL-Divergence (Hegde et al. 2004) and the Principle of Relevant Information (Principe 2010), where quantities like the KL-divergence, Cauchy-Schwarz divergence and entropy are estimated non-parametrically. As we have shown earlier, the SOM-CIM with the Gaussian kernel can also be considered as a density estimation procedure and hence, a density estimation based clustering algorithm.

In each of these cases, the value of the kernel bandwidth plays an important role in the density estimation and should be set such that the divergence between the true and the estimated pdf is as small as possible. Here we use the idea proposed by Erdogmus et al. (2006) of using the Kullback-Leibler divergence (D_KL) to estimate the kernel bandwidth for density estimation and extend it to the clustering algorithms.

As shown in section 3.1.2, the D_KL(f||g) can be estimated non-parametrically from the data using the Parzen window estimation. Here, if f is the pdf estimated using the data and g is the pdf estimated using the weight vectors, then the D_KL can be written as

    D_{KL}(f, g) = E_v\left[\log \sum_i G_{\sigma_v}(v - v(i))\right] - E_v\left[\log \sum_j G_{\sigma_w}(v - w(j))\right]    (5-1)

where \sigma_v is the kernel bandwidth for estimating the pdf using the input data and \sigma_w is the kernel bandwidth while using the weight vectors to estimate the pdf. Since we want to adapt the kernel bandwidth of the weight vectors, minimizing D_KL is equivalent to minimizing the second term in (5-1). Hence, the cost function for adapting the kernel bandwidth is

    J(\sigma) = -E_v\left[\log \sum_j G_{\sigma_w}(v - w(j))\right]    (5-2)

This is also called the Shannon's cross entropy.

Now, the estimation of the pdf using the weight vectors can be done using a single kernel bandwidth, homoscedastic, or with varying kernel bandwidths, heteroscedastic. We will consider each of these cases separately here.

5.1.1 Homoscedastic Estimation

If the pdf is estimated using the weights with a single kernel bandwidth, given by

    g = \sum_j^M G_{\sigma_w}(v - w(j))    (5-3)

then the estimation is called homoscedastic. In such a case only a single parameter needs to be adapted in (5-2). To adapt this parameter, gradient descent is performed over the cost function by taking the derivative of J(\sigma) w.r.t. \sigma_w. The gradient comes out to be

    \frac{\partial J}{\partial \sigma_w} = -E_v\left[\frac{\sum_j G_{\sigma_w}(v - w(j)) \left(\frac{\|v - w(j)\|^2}{\sigma_w^3} - \frac{d}{\sigma_w}\right)}{\sum_j G_{\sigma_w}(v - w(j))}\right]

To find an on-line version of the update rule, we use the stochastic gradient of the above rule by replacing the expected value with the instantaneous value of the input. The final update rule comes out to be

    \Delta\sigma_w = -\frac{\sum_j G_{\sigma_w}(v(n) - w(j)) \left(\frac{\|v(n) - w(j)\|^2}{\sigma_w^3} - \frac{d}{\sigma_w}\right)}{\sum_j G_{\sigma_w}(v(n) - w(j))}    (5-4)

    \sigma_w(n+1) = \sigma_w(n) - \eta \Delta\sigma_w    (5-5)

where d is the input dimension and \eta is the learning rate.

This can be easily extended to the batch mode by taking the sample estimate of the expected value and finding the stationary point of the gradient. It comes out to be a fixed point update rule, indicating that it requires a few epochs to reach a minimum. The batch update rule is

    \sigma_w^2 = \frac{1}{Nd} \sum_n^N \frac{\sum_j^M G_{\sigma_w}(v(n) - w(j)) \|v(n) - w(j)\|^2}{\sum_j^M G_{\sigma_w}(v(n) - w(j))}    (5-6)

It is clear from both the on-line update rule (5-4) and the batch update rule (5-6) that the value of \sigma_w depends on the variance of the error between the input and the weights, localized by the Gaussian function. This localized variance converges such that the divergence between the estimated pdf and the true pdf of the underlying data is minimized. Hence, these kernels can be used to approximate the pdf using a mixture of Gaussian kernels centered at the weight vectors. We will get back to this density estimation technique in the later chapters.

The performance of this algorithm is shown in the following experiment. Figure 5-1 shows the adaptation process of the \sigma, where the input is synthetic data consisting of 4 clusters centered at four points on a semicircle, distorted with Gaussian noise of 0.4 variance. The kernel bandwidth is adapted using the on-line update rule (5-4) while keeping the weight vectors constant at the centers of the clusters. It is observed that the kernel bandwidth, \sigma, converges to 0.4041. This is nearly equal to the local variance of the inputs, i.e., the variance of the Gaussian noise at each cluster center.

Figure 5-1. Kernel adaptation in the homoscedastic case. Panels: A) the track of the \sigma value during the learning process; B) the scatter plot of the input and the weight vectors, with the rings indicating the kernel bandwidth corresponding to the weight vectors. The kernel bandwidth approximates the local variance of the data.

However, the problem with the homoscedastic model is that if the window width is fixed across the entire sample, there is a tendency for spurious noise to appear in the tails of the estimates; if the estimates are smoothed sufficiently to deal with this, then essential detail in the main part of the distribution is masked. In the following section we try to address this by considering the heteroscedastic model, where each node has a unique kernel bandwidth.

5.1.2 Heteroscedastic Estimation

If the density is estimated using the weight vectors with varying kernel bandwidths, given by

    g = \sum_j^M G_{\sigma_w(j)}(v - w(j))    (5-7)

then the estimation is called heteroscedastic. In this case, the \sigma associated with each node should be adapted to the variance of the underlying input density surrounding the weight vector. We use a procedure similar to the homoscedastic case, substituting (5-7) in (5-2) and finding the gradient w.r.t. \sigma_w(j). We get

    \frac{\partial J}{\partial \sigma_w(i)} = -E_v\left[\frac{G_{\sigma_w(i)}(v - w(i)) \left(\frac{\|v - w(i)\|^2}{\sigma_w(i)^3} - \frac{d}{\sigma_w(i)}\right)}{\sum_j G_{\sigma_w(j)}(v - w(j))}\right]    (5-8)

Similar to the previous case, for the on-line update rule we take the stochastic gradient, and for the batch mode update rule we replace the expected value with the sample estimate before finding the stationary point of the gradient.

The update rules come out to be:

On-line update rule:

    \Delta\sigma_w(n, i) = -\frac{G_{\sigma_w(i)}(v(n) - w(i)) \left(\frac{\|v(n) - w(i)\|^2}{\sigma_w(i)^3} - \frac{d}{\sigma_w(i)}\right)}{\sum_j G_{\sigma_w(j)}(v(n) - w(j))}    (5-9)

    \sigma_w(n+1, i) = \sigma_w(n, i) - \eta \Delta\sigma_w(n, i)    (5-10)

Batch mode update rule:

    \sigma_w(i)^2 = \frac{1}{d} \frac{\sum_n \frac{G_{\sigma_w(i)}(v(n) - w(i)) \|v(n) - w(i)\|^2}{\sum_j G_{\sigma_w(j)}(v(n) - w(j))}}{\sum_n \frac{G_{\sigma_w(i)}(v(n) - w(i))}{\sum_j G_{\sigma_w(j)}(v(n) - w(j))}}    (5-11)

Figure 5-2. Kernel adaptation in the heteroscedastic case. Panels: A) the tracks of the \sigma values associated with their respective weights against the number of epochs, with the kernel bandwidths initially chosen randomly in (0,1] and adapted in on-line mode; B) the scatter plot of the weights and the inputs, with the rings indicating the kernel bandwidth associated with each node.

Figure 5-2 shows the kernel bandwidth adaptation for a setup similar to that described for the homoscedastic case (5.1.1). Initially, the kernel bandwidths are chosen randomly in (0,1] and are adapted using the on-line learning rule (5-9) while keeping the weight vectors constant at the centers of the clusters. The kernel bandwidth tracks associated with each node are shown, and the \sigma associated with each node again converged to the local variance of the data.

5.2 SOM-CIM with Adaptive Kernels

It is easy to incorporate the two adaptive kernel algorithms mentioned above in the SOM-CIM. Since the value of the kernel bandwidth adapts to the input density local to the position of the weights, the weights are updated first, followed by the kernel bandwidth. This ensures that the kernel bandwidth adapts to the present position of the weights and hence, gives a better density estimate.

The adaptive kernel SOM-CIM in the case of homoscedastic components can be summarized as follows. The winner is selected using the local error as

    r = \arg\min_s \sum_t^M h_{st} (G_{\sigma(n)}(0) - G_{\sigma(n)}(\|v(n) - w_t\|))    (5-12)

The weights and then the kernel bandwidth are updated as

    \Delta w_s = -h_{rs} \frac{G_{\sigma(n)}(\|v(n) - w_s\|)(v(n) - w_s)}{\sigma(n)^3}    (5-13)

    w_s(n+1) = w_s(n) - \eta \Delta w_s    (5-14)

    \Delta\sigma(n) = -\frac{\sum_j G_{\sigma(n)}(v(n) - w_j(n+1)) \left(\frac{\|v(n) - w_j(n+1)\|^2}{\sigma(n)^3} - \frac{d}{\sigma(n)}\right)}{\sum_j G_{\sigma(n)}(v(n) - w_j(n+1))}    (5-15)

    \sigma(n+1) = \sigma(n) - \eta \Delta\sigma(n)    (5-16)
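Both homoscedastic bandwidth rules reduce to a few lines of code. Before turning to the batch summary, here is an illustrative sketch of the stochastic step (5-4)-(5-5) and the fixed point (5-6); the unnormalized Gaussian suffices in (5-4) because the normalization constant cancels in the ratio, and the learning rate and iteration counts are assumptions.

    import numpy as np

    def sigma_step_online(v, weights, sigma, eta=0.01):
        """One stochastic step of the homoscedastic rule, eqs. (5-4)-(5-5)."""
        d = weights.shape[1]
        e2 = np.sum((weights - v) ** 2, axis=1)
        g = np.exp(-e2 / (2 * sigma ** 2))
        delta = -np.sum(g * (e2 / sigma ** 3 - d / sigma)) / np.sum(g)
        return sigma - eta * delta

    def sigma_fixed_point(data, weights, sigma, iters=10):
        """Batch fixed-point iteration, eq. (5-6)."""
        n, d = data.shape
        for _ in range(iters):
            e2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(-1)
            g = np.exp(-e2 / (2 * sigma ** 2))
            sigma = np.sqrt(np.sum((g * e2).sum(1) / g.sum(1)) / (n * d))
        return sigma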


Inthebatchmode,theweightsandthekernelbandwidthareupdatedas ws+=PNnhrsG(v(n))]TJ /F7 11.955 Tf 11.95 0 Td[(ws)v(n) PNnhrsG(v(n))]TJ /F7 11.955 Tf 11.95 0 Td[(ws) (5) +=1 NdNXnPMjG(v(n))]TJ /F7 11.955 Tf 11.96 0 Td[(wj)kv(n))]TJ /F7 11.955 Tf 11.96 0 Td[(wjk2 PMjG(v(n))]TJ /F7 11.955 Tf 11.96 0 Td[(wj) (5) Incaseofheteroscedastickernels,thesameupdaterulesapplyfortheweightsbutwithspeciedforeachnode;whilethekernelbandwidthsareupdatedas On-linemode: 4i(n)=)]TJ /F11 11.955 Tf 9.3 16.86 Td[(Gi(n)(v(n))]TJ /F7 11.955 Tf 11.96 0 Td[(wi(n+1))kv(n))]TJ /F10 7.97 Tf 6.59 0 Td[(wi(n+1)k2 i(n)3)]TJ /F5 7.97 Tf 19.99 4.71 Td[(d i(n) PMjGj(n)(v(n))]TJ /F7 11.955 Tf 11.96 0 Td[(wj(n+1) (5) i(n+1)=i(n))]TJ /F3 11.955 Tf 11.95 0 Td[(4i(n) (5) Batchmode: +i=1 dPNnGi(v(n))]TJ /F10 7.97 Tf 6.59 0 Td[(wi)kv(n))]TJ /F10 7.97 Tf 6.59 0 Td[(wik2 PMjG(v(n))]TJ /F10 7.97 Tf 6.59 0 Td[(wj) PNnGw(i)(v(n))]TJ /F10 7.97 Tf 6.59 0 Td[(w(i)) PMjGw(j)(v(n))]TJ /F10 7.97 Tf 6.59 0 Td[(w(j)) (5) Butitisobservedthatwhentheinputhasahighdensitypart,comparedtotherestofthedistribution,thekernelbandwidthoftheweightsrepresentingthesepartsshrinkstoosmalltogiveanysensiblesimilaritymeasureandmakesthesystemunstable.Tocounterthis,ascalingfactorcanbeintroducedas 4i(n)=)]TJ /F11 11.955 Tf 9.3 16.86 Td[(Gi(n)(v(n))]TJ /F7 11.955 Tf 11.96 0 Td[(wi(n+1))kv(n))]TJ /F10 7.97 Tf 6.59 0 Td[(wi(n+1)k2 i(n)3)]TJ /F5 7.97 Tf 19.99 4.71 Td[(d i(n) PMjGj(n)(v(n))]TJ /F7 11.955 Tf 11.96 0 Td[(wj(n+1) (5) Thisensuresthatthevalueofthedoesnotgotoolow.Butatthesametimeitinadvertentlyincreasesthekernelbandwidthoftherestofthenodesandmightresultinasuboptimalsolution.Theexperimentsimilartothatinsection 4.3.1 clearlydemonstratesthis.100,000samplesfromthedistributionf(x)=2xaremappedontoa1Dchainof50nodesinon-linemodeandtherelationbetweentheweightsandthenodeindexesisobserved.Figure 5-3 showstheresultsobtained.Clearlyincaseofthehomoscedasticcasethemappingconvergetotheidealmapping.Butincaseof 41

PAGE 42

A B C D Figure5-3. Magnicationofthemapinhomoscedasticandheteroscedasticcases.TheFigures[A]and[B]showsthetracksofthekernelbandwidthincaseofhomoscedasticandheteroscedastickernels,respectively.Figures[C]and[D]showstheplotbetweenln(r)andln(w).'--'indicatestheidealplotrepresentingtherelationbetweenwandrwhenthemagnicationis1and'...'indicatestheidealplotrepresentingthemagnicationof2/3.'-+-'indicatesrelationobtainedwithadaptivekernelsSOM-CIMinhomoscedasticandheteroscedasticcases,respectively.Refertextforexplanation. theheteroscedastickernelsitisobservedthethesystemisunstablefor=1.Heretoobtainastableresultissetto0.7.Thisstabilizesthemappingbynotallowingthekernelbandwidthtoshrinktoosmall.Butitclearlyincreasesthekernelbandwidthofthenodesinlowprobabilityregionandhence,thedistortionin 5-3D ItisalsointerestingtoseethechangesinthekernelbandwidthwhileadaptingintheFigure 5-3A andFigure 5-3B .Initially,allthekernelsconvergetothesame 42

PAGE 43

Table5-1. TheinformationcontentandthemeansquarequantizationerrorforhomoscedasticandheteroscedasticcasesinSOM-CIM.Imax=3.2189 MethodMeanSquareEntropyorQuantizationErrorInfoCont.,I Homoscedastickernels5.872510)]TJ /F10 7.97 Tf 6.59 0 Td[(33.2095Heteroscedastickernels(=0.5)6.117910)]TJ /F10 7.97 Tf 6.59 0 Td[(33.1970Heteroscedastickernels(=1)15.26810)]TJ /F10 7.97 Tf 6.59 0 Td[(32.700Constantkernel1,=0.15.724010)]TJ /F10 7.97 Tf 6.59 0 Td[(33.2101MSE5.042010)]TJ /F10 7.97 Tf 6.59 0 Td[(33.1767 Figure5-4. MappingoftheSOM-CIMwithheteroscedastickernelsand=1. bandwidthasthenodesareconcentratedatthecenterandasmapchangesfromtheorderingphasetotheconvergencethekernelbandwidthsadapttothelocalvarianceofeachnode.Thiskindofadaptationensuresthattheproblemofslowannealingoftheneighborhoodfunction,discussedinsection 4.1 ,canberesolvedbyhavingrelativelylargebandwidthatthebeginning,allowingthefreemovementofthenodesduringorderingphase. Astheadaptivekernelalgorithmisabletomapthenodesneartotheoptimalmapping,itisexpectedtotransfermoreinformationabouttheinputdensitytothe 1ThebestresultfromTable 4-1 isreproducedforcomparison. 43


Results obtained for an experiment similar to the one in section 4.3.2, where the inputs are 5000 samples from the 2 dimensional distribution with pdf f(x) = 5x mapped onto a 5x5 rectangular grid, are shown in Table 5-1. With the homoscedastic kernels, the kernel bandwidth converged to 0.0924, which is close to the value obtained as the best result in Table 4-1 (σ = 0.1), and at the same time the map is able to transfer more information to the weights. But in the heteroscedastic case the system is unstable with β = 1 and fails to unfold (Figure 5-4), concentrating more on the high probability regions. On the other hand, when β is set to 0.5, the resulting map does not provide a good output because of the large σ values. Figure 5-5 clearly demonstrates this: when β = 0.5, the map clearly oversamples the low probability regions because of the large kernel bandwidths of the nodes representing them.

Figure 5-5. The scatter plot of the weights at convergence in case of homoscedastic and heteroscedastic kernels. [A] The scatter plot of the weights in case of homoscedastic kernels. [B] The scatter plot of the weights in the heteroscedastic case. The lines indicate the neighborhood in the lattice space. Clearly, the heteroscedastic kernels with β = 0.5 oversample the low probability regions when compared with the homoscedastic case.


CHAPTER 6
APPLICATIONS

Now that we have seen how the SOM can be adapted using the CIM, in this chapter we use the SOM-CIM in a few of the applications of the SOM, such as density estimation, clustering and principal curves, and show how it can improve their performance.

6.1 Density Estimation

Clustering and density estimation are closely related, and one is often used to find the other; i.e., density estimation is used to find the clusters (Azzalini and Torelli 2007) and clustering is used to find the density estimate (Van Hulle 2000). As we have shown earlier, adapting the SOM-CIM is equivalent to a density estimation procedure, and with the adaptive kernels it should be able to effectively reproduce the input density using the Parzen non-parametric density estimation procedure (Parzen 1962).

Here we try to estimate the 2 dimensional Laplacian density function shown in Figure 6-1A. The choice of the Laplacian distribution is appropriate for comparing these methods because it is a heavy tailed distribution and hence is difficult to estimate without the proper choice of the kernel bandwidth.

Figure 6-1 and Table 6-1 show the results obtained when different methods are used to estimate the input density. Here 4000 samples of a 2 dimensional Laplacian distribution are mapped onto a 10x10 rectangular grid. The neighborhood function is the Gaussian given in Chapter 4 and is annealed using the schedule in Chapter 4, with the annealing parameter 0.3 and σ_ih = 20. When the SOM-MSE is used, the kernel bandwidth is estimated using Silverman's rule (Silverman 1986):

$$ \sigma = 1.06\, \sigma_f\, N^{-1/5} $$

where σ_f is the variance of the input and N is the number of input samples. Figure 6-1B clearly demonstrates that this procedure is not able to produce a good result, because of the large number of bumps in the low probability regions.


This is because of the oversampling of the low probability regions when the MSE is used, as discussed in section 2.3.

On the other hand, the use of the SOM-CIM, with both homoscedastic and heteroscedastic kernels, is able to reduce the oversampling of the low probability regions. But in case of the homoscedastic kernels, because of the constant bandwidth of the kernels for all the nodes, the estimate of the tail of the density is still noisy and the characteristics of the main lobe of the density are not reproduced properly.

Figure 6-1. Results of the density estimation using SOM. [A] A 2 dimensional Laplacian density function. [B] The estimated density using SOM with MSE; the kernel bandwidth is obtained using Silverman's rule. [C] The estimated density using SOM-CIM with homoscedastic kernels. [D] The estimated density using SOM-CIM with heteroscedastic kernels.
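As a small illustration of the estimation procedure described above, the sketch below reads a Parzen estimate off the converged weights, with Silverman's rule supplying the bandwidth for the SOM-MSE baseline. The function names are hypothetical, and the input variance is treated as a scalar for simplicity.

```python
import numpy as np

def silverman_bandwidth(X):
    """Silverman's rule: sigma = 1.06 * sigma_f * N**(-1/5)."""
    return 1.06 * X.std() * X.shape[0] ** (-0.2)

def parzen_estimate(x, W, sigma):
    """Parzen density estimate at x, using the M weight vectors in W as
    kernel centers; sigma may be a scalar (homoscedastic) or an (M,)
    array (heteroscedastic)."""
    M, d = W.shape
    sigma = np.broadcast_to(np.asarray(sigma, dtype=float), (M,))
    d2 = np.sum((x - W) ** 2, axis=1)
    norm = (2 * np.pi) ** (d / 2) * sigma ** d      # Gaussian normalization
    return np.mean(np.exp(-d2 / (2 * sigma ** 2)) / norm)
```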


Table 6-1. Comparison between various methods as density estimators. MSE and KLD are the mean square error and KL-divergence, respectively, between the true and the estimated densities.

Method                              | MSE (×10⁻⁴) | KLD
SOM with MSE                        | 2.0854      | 308.4163
SOM-CIM, homoscedastic              | 2.0189      | 128.6758
SOM-CIM, heteroscedastic (β = 0.8)  | 3.1432      | 12.6948

This is resolved when the heteroscedastic kernels are used. Because of the varying kernel bandwidth, the map is able to smooth out the tail of the density while still retaining the characteristics of the main lobe, and hence is able to reduce the divergence between the true and estimated densities.

6.2 Principal Surfaces and Clustering

6.2.1 Principal Surfaces

Topology preserving maps can be interpreted as an approximation procedure for the computation of principal curves, surfaces or higher-dimensional principal manifolds (Ritter et al. 1992). The approximation consists in the discretization of the function f defining the manifold. The discretization is implemented by means of a lattice A of corresponding dimension, where each weight vector indicates the position of a surface point in the embedding space V. Intuitively, these surface points, and hence the principal curve, are expected to pass right through the middle of their defining density distribution. This definition is for principal surfaces in the 1 dimensional case; it can be generalized to high dimensional principal manifolds as (Ritter et al. 1992):

Let f(s) be a surface in the vector space V, i.e., dim(f) = dim(V) − 1, and let d_f(v) be the shortest distance of a point v ∈ V to the surface f. f is a principal surface corresponding to the density distribution P(v) in V if the mean squared distance

$$ D_f = \int d_f^2(v)\, P(v)\, d^L v $$

is extremal with respect to local variations of the surface.
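The extremality condition itself is analytic, but the distortion D_f is easy to estimate once the surface is discretized by a set of weight vectors, as the SOM does. A Monte Carlo sketch under that assumption:

```python
import numpy as np

def distortion(V, W):
    """Monte Carlo estimate of D_f = E[d_f^2(v)] for a surface discretized
    by the weight vectors W, using samples V drawn from P(v); the distance
    to the surface is approximated by the distance to the nearest node."""
    d2 = np.sum((V[:, None, :] - W[None, :, :]) ** 2, axis=2)  # (N, M)
    return np.mean(d2.min(axis=1))
```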


Figure 6-2. Clustering of the two-crescent dataset in the presence of outlier noise. [A] The mapping at the convergence of SOM with MSE. [B]-[D] The scatter plot of the weights with the kernel bandwidth after 10, 20 and 50 epochs during learning.

But the use of the MSE as the criterion for the goodness of the principal surfaces makes them weakly defined, both because only the second order moments are used and because of the distortion in the mapping when outliers are present. Figure 6-2A shows this when the two-crescent data is slightly distorted by introducing some outliers. On the other hand, if the principal surface is adapted in the correntropy induced metric sense, then the effect of the outliers can be eliminated, giving a better approximation of the principal surfaces. Figure 6-2 illustrates this in case of the SOM-CIM with homoscedastic kernels, where the kernel bandwidth adapts such that the outliers do not have a significant effect on the final mapping throughout the learning.


6.2.2 Avoiding Dead Units

Another problem with the SOM is that it can yield nodes that are never active, called dead units. These units do not sufficiently contribute to the minimization of the overall distortion of the map and, hence, result in a less optimal usage of the map's resources (Van Hulle 2000). This is acute when there are clusters of data that are far apart in the input space.

The presence of the dead units can also be attributed to the MSE based cost function, which pushes the nodes into these regions. Figure 6-3A indicates this, where the input contains three clusters, each made up of 500 samples of a 2 dimensional Gaussian noise with variance equal to 0.25. On the other hand, when the CIM is used, as indicated in section 4.3.2, the mapping tries to be equiprobabilistic and hence avoids dead units (Figure 6-3B).

Figure 6-3. The scatter plot of the weights showing the dead units when CIM and MSE are used to map the SOM. [A] The boxed dots indicate the dead units mapped when MSE is used. [B] Mapping when CIM with σ = 0.5 is used in the SOM. There are no dead units in this case.


Table 6-2. Number of dead units yielded for different datasets when MSE and CIM are used for mapping. Each entry is an average over 30 runs.

Dataset           | Method       | Dead Units | MSE (×10⁻²)
Artificial        | MSE          | 4          | 1.3877
Artificial        | CIM, σ = 0.2 | 1          | 4708.4458
Iris              | MSE          | 5.6        | 5.939
Iris              | CIM, σ = 0.5 | 1          | 8.023
Blood Transfusion | MSE          | 0          | 0.7168
Blood Transfusion | CIM, σ = 0.2 | 0          | 0.8163

Table 6-2 shows the results obtained over three different datasets mapped onto a 10x5 hexagonal grid. The artificial dataset is the same as the one described above. The Iris and Blood Transfusion datasets are obtained from the UC Irvine repository.

The Iris dataset contains 3 classes of 50 instances each, where each class refers to a type of iris plant. Here, one class is linearly separable from the other 2; the latter are not linearly separable from each other, and each instance has 4 features. It is observed that the dead units appear in between the linearly separable classes. The blood transfusion dataset (normalized to be within [0,1] before mapping) contains 2 classes with 24% positive and 76% negative instances. Though there are no dead units in this case, as we will show later, this kind of mapping neglects the outliers and hence is able to give a better visualization of the data.

Another important observation is that the adaptive kernel algorithms are unable to give a good result for both the Iris and Blood Transfusion datasets. The reason for this is the inherent nature of the Parzen density estimation, which is the centerpiece of this algorithm. As the number of dimensions increases, the number of input samples required by the Parzen procedure for a proper density estimate increases exponentially. Because of the limited amount of data in these datasets, the algorithm failed to adapt the kernel bandwidth.
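A dead unit count like the one reported in Table 6-2 can be obtained by checking which nodes never win over the dataset. A minimal sketch is given below; note that with a homoscedastic Gaussian kernel the CIM winner coincides with the minimum euclidean distance winner, so the same rule covers both columns of the table.

```python
import numpy as np

def count_dead_units(V, W):
    """Number of nodes in W that never win over the dataset V.

    Maximizing G_sigma(v - w) for a fixed (homoscedastic) sigma selects
    the same winner as minimizing ||v - w||^2, so this winner rule holds
    for both the MSE and the CIM mappings compared in Table 6-2.
    """
    d2 = np.sum((V[:, None, :] - W[None, :, :]) ** 2, axis=2)  # (N, M)
    winners = np.argmin(d2, axis=1)
    return W.shape[0] - np.unique(winners).size
```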


Figure 6-4. The U-matrix obtained for the artificial dataset that contains three clusters. [A] U-matrix obtained when the SOM is trained with MSE and the euclidean distance is used to create the matrix. [B] U-matrix obtained when the CIM is used to train the SOM and to create the matrix.

6.3 Data Visualization

Visualizing high dimensional data in a low dimensional grid is one of the important applications of the SOM. Since the topography can be preserved onto the grid, visualizing the grid can give a rough idea about the structure of the data distribution and, in turn, an idea about the number of clusters without any a priori information. The popular SOM visualization tool is the U-matrix, where the distance between a node and all its neighbors is represented in a gray scale image. If the euclidean distance is used to construct the matrix, then the distance between the closely spaced clusters is overshadowed by the faraway clusters, making the closely spaced ones difficult to detect. On the other hand, because of the non-linear nature of the CIM and its saturation region beyond a certain point (determined by the value of σ), the closely spaced clusters can be detected if the U-matrix is constructed using the CIM.

Figure 6-4 shows the results obtained when the MSE and the CIM are used to train the SOM and to construct the U-matrix.


Figure 6-5. The U-matrix obtained for the Iris and blood transfusion datasets. [A] and [C] show the U-matrix for the Iris and Blood Transfusion datasets, respectively, when the SOM is trained with MSE and the euclidean distance is used to create the matrix. [B] and [D] show the corresponding U-matrix obtained when the CIM is used to train the SOM and to create the matrix.


The SOM here is a 10x5 hexagonal grid, and the input is a 2 dimensional artificial dataset containing 3 clusters; one cluster is farther away from the other two (Figure 6-3), each made up of 500 instances of the Gaussian noise with variance 0.25. Clearly, the boundaries are not crisp when the MSE is used. This is because of the dead units the SOM yields when the MSE is used. On the other hand, the U-matrix obtained when the SOM-CIM is used is able to give crisp boundaries.

The problem of viewing closely spaced clusters becomes even more difficult when there are outliers or faraway clusters. Figure 6-5 illustrates this in the case of the Iris dataset and the Blood Transfusion dataset. The Iris dataset has three clusters where only one cluster is linearly separable from the other two. When the U-matrix is created using the MSE, only the linearly separable cluster is distinguishable from the other two, whereas the U-matrix created using the CIM is able to show the three clusters (though not with crisp boundaries). The reason for this is not only the avoidance of the dead units but also the non-linear norm of the CIM: if the distance between nodes that are closely spaced (in the euclidean sense) is evaluated using the CIM, it is not overshadowed by the CIM distance between nodes that are far apart, because of the saturation of the CIM norm.

The blood transfusion dataset has two clusters and is expected to have a large number of outliers. The quality of its U-matrix is not discussed and is left to the readers.
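As a sketch of the construction described above, the U-matrix entry of a node can be taken as the average CIM distance to its lattice neighbors; the saturation of the CIM for far-apart weights is what keeps distant clusters from overshadowing nearby ones. The neighbor-list representation here is an illustrative assumption.

```python
import numpy as np

def cim(a, b, sigma):
    # CIM(a, b) = sqrt(G(0) - G(a - b)); the kernel normalization constant
    # is dropped since only relative gray levels matter in the image.
    # Note the saturation: as ||a - b|| grows, the value approaches 1.
    return np.sqrt(1.0 - np.exp(-np.sum((a - b) ** 2) / (2 * sigma ** 2)))

def u_matrix(W, neighbors, sigma=0.5):
    """Average CIM distance from each node to its lattice neighbors.

    W         : (M, d) weight vectors
    neighbors : list of index lists giving the lattice neighbors of each node
    """
    return np.array([np.mean([cim(W[i], W[j], sigma) for j in nbrs])
                     for i, nbrs in enumerate(neighbors)])
```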


CHAPTER 7
CONCLUSION

The use of the kernel Hilbert spaces by Vapnik (1995) has spurred the use of kernel methods in several fields. This idea was also used in self organizing maps previously by Lau et al. (2006), Andras (2002), MacDonald (2000), etc. In all these cases the use of the kernel trick is viewed in the context of pattern classification, and the increase in performance is attributed to the assumption that the classes tend to be linearly separable when mapped into higher dimensions.

On the other hand, the use of the correntropy induced metric gives an idea about the effect the kernelization has in the input space and hence about how the final output of the mapping is affected by the choice of the kernel and kernel bandwidth. As indicated, the choice of the kernel bandwidth dictates the magnification of the mapping. For example, a larger bandwidth adapts the SOM-CIM to minimize the quantization error because of the large L2-norm induced, whereas a smaller bandwidth might result in a map that concentrates more on the high probability parts, distorting the mapping. Also, as previously discussed, the advantage of using the CIM lies in its strong outlier rejection capability and the presence of higher order moments. Both these properties are useful when the input distribution is non-uniform; hence, the SOM-CIM can outperform the SOM only when the data is non-uniformly distributed (true for many practical cases).

The proposed adaptive kernel algorithm based on the KL-divergence is able to adapt the bandwidth nearly to the optimal solution. But it is observed that the algorithm is unstable in the heteroscedastic case and an additional free parameter β needs to be specified. In spite of that, the final mapping is still less sensitive to the value of β than to the value of σ, and the algorithm provides a heteroscedastic alternative.

Another point that needs to be discussed is the extension of the use of the CIM to other clustering algorithms like neural-gas, elastic nets, soft-topographic vector quantization¹, etc.


The similarity measure used in the ranking of the weight vectors in the neural-gas algorithm can be replaced by the CIM, and a similar procedure can be applied to adapt the network. This is also expected to improve the performance of the network in terms of the magnification factor.

Finally, though the dependence of the magnification factor on the kernel bandwidth is shown experimentally, an analytical analysis is still elusive. Future work should concentrate on finding the relation between the magnification factor and the kernel bandwidth, which in turn depends on the variance of the data. Also, the use of multivariate kernels might be important when the SOM-CIM is used for density estimation, and it also needs to be studied. The proposed adaptive kernel algorithm can be extended to multivariate kernels using the idea proposed by Erdogmus et al. (2006), but this will lead to a larger computational complexity. An effective, less computationally expensive algorithm to adapt the kernel bandwidth is necessary.

¹ This is discussed in Appendix B.


APPENDIX A
MAGNIFICATION FACTOR OF THE SOM WITH MSE

The quantitative analysis of the magnification factor of the SOM with the MSE as the cost function is shown here. A detailed analysis can be found in (Ritter et al. 1992).

As long as the neighborhood function h_rs is non-zero, w(r,t) undergoes a non-zero change at each time step. This transition of the weights from w to ŵ can be described formally by the transformation

$$ w = T(\hat{w}, v, \epsilon) \qquad (A-1) $$

where ε is the learning rate. Explicitly, the transition in A-1 can be written as

$$ w_s = \hat{w}_s + \epsilon\, h_{rs} (v - \hat{w}_s), \quad \forall s \qquad (A-2) $$

Since the input v occurs only with the probability P(v), the probability Q(w, ŵ) of the transition from the state w to ŵ can be written as

$$ Q(w, \hat{w}) = \int \delta\big(w - T(\hat{w}, v, \epsilon)\big)\, P(v)\, dv \qquad (A-3) $$

At equilibrium, the necessary and sufficient condition is that the expected transition be zero. This implies

$$ \bar{Q}(w, \hat{w}) = 0 \;\Rightarrow\; \int h_{rs}\, (v - w_s)\, P(v)\, dv = 0, \quad \forall s \qquad (A-4) $$

A maximally ordered map, defined as a map that is fully unfolded without any 'twists' or 'kinks', is assumed, which leads to the validity of the following assumptions in the continuum case:

- It is assumed that for a sufficiently large system w_r is a function of r that varies slowly from lattice point to lattice point, so that its replacement by a function w(r) on a continuum of r-values is justified.
- We assume that w(r) is one-to-one.


Also, the neighborhood function has a steep maximum at r = s and satisfies

$$ \int h_{rs}\, (r - s)\, dr = 0, \qquad \int h_{rs}\, (r_i - s_i)(r_j - s_j)\, dr = \delta_{ij}\, \sigma_h^2, \quad \forall i,j = 1,2,\dots,d \qquad (A-5) $$

where d is the dimensionality of the lattice, which is assumed to be the same as the dimensionality of the input space V, and σ_h is the neighborhood range of the Gaussian neighborhood function used.

Due to the continuum approximation, where it is assumed that there are an infinite number of nodes, at equilibrium the quantity ‖v − w_r‖ vanishes, since each sample v in the input space has a corresponding r for which v = w_r. So, we can replace v in A-4 with w_r. Also, by changing the variable of integration to s instead of v, A-4 becomes

$$ \int h_{rs}\, (w_r - w_s)\, P(w_r)\, J(w_r)\, ds = 0 \qquad (A-6) $$

Here

$$ J(w_r) = \left| \frac{\partial v}{\partial s} \right| \qquad (A-7) $$

is the absolute value of the Jacobian of the map w: v → w. With q = r − s as a new integration variable and P(r) ≡ P(w(r)), the expansion of A-6 in powers of q yields (with implicit summation over repeated indices)

$$ 0 = \int h_{q0} \left( q_i \partial_i w + \tfrac{1}{2} q_i q_j \partial_i \partial_j w + \dots \right) (P + q_k \partial_k P + \dots)(J + q_l \partial_l J + \dots)\, dq $$
$$ \phantom{0} = \int h_{q0} \left( q_i \partial_i w\, P J + q_i q_k\, \partial_i w\, \partial_k P\, J + q_i q_l\, \partial_i w\, \partial_l J\, P + \tfrac{1}{2} q_i q_j\, \partial_i \partial_j w\, P J + \dots \right) dq $$
$$ \phantom{0} = \sigma_h^2 \left[ (\partial_i w)\, \partial_i (PJ) + \tfrac{1}{2}\, P J\, \partial_i^2 w \right] + O(\sigma_h^4) \qquad (A-8) $$


wheretheconditionabouttheneighborhoodfunctionin A areused.Thenecessaryandsufcientconditionfor A toholdforsmallis Xi@iw@iP P+@iJ J=)]TJ /F6 11.955 Tf 10.49 8.09 Td[(1 2Xi@2iw(A) SubstitutingtheJacobianmatrixJij=@jwi(r)and4=Pi@2i,theconditionbecomes J.ln(P.J)=)]TJ /F6 11.955 Tf 10.49 8.09 Td[(1 24w(A) Foraone-dimensionalcaseweobtainJ=J=dw=drand4w=d2w=dr2withwandrasscalars.Usingthisthedifferentialequation A canbesolved.Rewritingtheaboveequationbysubstitutingthesevalues,weget dw dr1 PdP dr+dw dr)]TJ /F10 7.97 Tf 6.59 0 Td[(1d2w dr2=)]TJ /F6 11.955 Tf 10.5 8.09 Td[(1 2d2w dr2(A) fromwhichwecanobtain d drlnP=)]TJ /F6 11.955 Tf 10.5 8.09 Td[(3 2d drlndw dr(A) Sincew(w(r))=rholds,thelocalmagnicationfactorMofwcanbedenedbyM=1=J.Fortheone-dimensionalcaseM=(dw=dr))]TJ /F10 7.97 Tf 6.58 0 Td[(1andweobtainasarelationbetweeninputstimulusdistributionandthelatticerepresentation M(v)=J)]TJ /F10 7.97 Tf 6.58 0 Td[(1=dr dw/P(v)2=3(A) Hence,thelocalmagnicationfactorM(v)isdependentontheprobabilitydensityP(v)accordingtoapowerlawwithfactor2/3. TheconditionthatneedtobesolvedforobtainingthemagnicationfactoroftheSOM-CIM,attheequilibriumcondition,is ZhrsG(v)]TJ /F7 11.955 Tf 11.96 0 Td[(ws)(wr)]TJ /F7 11.955 Tf 11.96 0 Td[(ws)P(v)dv=0(A) Thisisleftforthefuturework. 58


APPENDIX B
SOFT TOPOGRAPHIC VECTOR QUANTIZATION USING CIM

Graepel et al. (1997) used deterministic annealing (Rose 1998) in topography preserving vector quantization. The idea is to reduce the free energy of the clustering nodes while maintaining the topology of the nodes in the lattice space. The algorithm is described in the following sections.

B.1 SOM and Free Energy

With the local error defined in Chapter 2 deciding the winner, the quantization error over the entire dataset is given by

$$ F_{quant}(P, W) = \sum_n \sum_r p_{nr} \sum_s h_{rs}\, D(v_n, w_s) \qquad (B-1) $$

where p_nr is the posterior probability that the node r is the winner when v_n is considered. The entropy term, which determines the probability of the nodes to occur, is given by

$$ F_{ent}(P) = \sum_n \sum_r p_{nr} \log \frac{p_{nr}}{q_r} \qquad (B-2) $$

where q_r is the prior probability of the nodes. Using B-1 and B-2, the free energy of the system can be written as

$$ F(P, W) = \beta\, F_{quant}(P, W) + F_{ent}(P) \qquad (B-3) $$

where β is called the inverse temperature; the larger the β, the lesser the influence of the entropy (Heskes 2001).

B.2 Expectation Maximization

Since Neal and Hinton (1998) proved that the (local) minimum of the free energy corresponds to the (local) minimum error of the self organizing maps, expectation maximization can be used to find the optimal solution (Heskes 2001; Neal and Hinton 1998) for the weights and their corresponding posteriors.

The expectation step corresponds to finding the posteriors p_nr that minimize F(P, W). By using a Lagrange multiplier with the condition that Σ_r p_nr = 1, the optimal posteriors come out to be


$$ p_{nr} = \frac{q_r \exp\left(-\beta \sum_t h_{rt}\, D(v_n, w_t)\right)}{\sum_s q_s \exp\left(-\beta \sum_t h_{st}\, D(v_n, w_t)\right)} \qquad (B-4) $$

This update rule does not depend on the distance measure and hence would be the same while using both the MSE and CIM measures.

Figure B-1. The unfolding of the SSOM mapping when the CIM is used as the similarity measure. The snapshots, [A] β = 0.01, [B] β = 100, [C] β = 250 and [D] β = 500, are taken at intermediate points during the unfolding of the mapping at different values of β. The dataset considered here is a uniform distribution of 1000 points within a unit square. The neighborhood function is fixed at σ_h = 0.5.


The maximization step, on the other hand, changes with the similarity measure.

With MSE: By finding the derivative of the free energy B-3 with respect to the weights w_s and equating it to zero, we get the update rule

$$ w_s(i) = \frac{\sum_n \sum_r p_{nr}\, h_{rs}\, v_n(i)}{\sum_n \sum_r p_{nr}\, h_{rs}} \qquad (B-5) $$

With CIM: While using the CIM, the fixed point update rule is

$$ w_s(i) = \frac{\sum_n \sum_r p_{nr}\, h_{rs}\, G_\sigma(v_n - w_s)\, v_n(i)}{\sum_n \sum_r p_{nr}\, h_{rs}\, G_\sigma(v_n - w_s)} \qquad (B-6) $$

With h_rs → δ_rs, STVQ acts as a traditional SOM with a soft membership function (Graepel et al. 1997). Figure B-1 shows the evolution of the soft SOM (SSOM) when the CIM with σ = 0.5 is used. Since the purpose of deterministic annealing is to avoid the local minima of the cost function, the discussion about the magnification of the mapping still holds here. It is expected that the use of the CIM here plays a similar role as in the case of the SOM-CIM.
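A compact sketch of one EM sweep following B-4 and the CIM fixed point rule B-6 is given below. Uniform priors q_r and a fixed inverse temperature β are assumed, and D(v, w) is taken as the squared CIM, which for a Gaussian kernel is 1 − G_σ(v − w) up to a constant factor.

```python
import numpy as np

def em_sweep(V, W, H, sigma, beta):
    """One expectation-maximization sweep of soft topographic VQ with CIM.

    V : (N, d) inputs, W : (M, d) weights, H : (M, M) neighborhood h_rt
    """
    d2 = np.sum((V[:, None, :] - W[None, :, :]) ** 2, axis=2)   # (N, M)
    G = np.exp(-d2 / (2 * sigma ** 2))
    D = 1.0 - G                        # squared CIM up to a constant
    # E-step (B-4), uniform priors: p_nr ∝ exp(-beta * sum_t h_rt D(v_n, w_t))
    logits = -beta * D @ H.T
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)
    # M-step (B-6): fixed point step, a G-weighted mean of the inputs
    C = (P @ H) * G                    # C[n, s] = sum_r p_nr h_rs * G(v_n - w_s)
    W = (C.T @ V) / C.sum(axis=0)[:, None]
    return W, P
```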


REFERENCES

Andras, P. "Kernel-Kohonen networks." International Journal of Neural Systems 12 (2002): 117.

Azzalini, Adelchi and Torelli, Nicola. "Clustering via nonparametric density estimation." Statistics and Computing 17 (2007): 71.

Babich, Gregory A. and Camps, Octavia I. "Weighted Parzen Windows for Pattern Classification." IEEE Transactions on Pattern Analysis and Machine Intelligence 18 (1996): 567.

Bauer, H. U. and Der, R. "Controlling the magnification factor of self-organizing feature maps." Neural Comput. 8 (1996), 4: 757.

Bell, Anthony J. and Sejnowski, Terrence J. "An information-maximization approach to blind separation and blind deconvolution." Neural Comput. 7 (1995), 6: 1129.

Claussen, Jens Christian. "Winner-Relaxing Self-Organizing Maps." Neural Comput. 17 (2005), 5: 996.

Erdogmus, Deniz and Principe, Jose C. "Generalized Information Potential Criterion for Adaptive System Training." Trans. on Neural Networks 12 (2002), 5: 1035.

Erdogmus, Deniz, Jenssen, Robert, Rao, Yadunandana N., and Principe, Jose C. "Gaussianization: An Efficient Multivariate Density Estimation Technique for Statistical Signal Processing." The Journal of VLSI Signal Processing 45 (2006): 67.

Erwin, E., Obermayer, K., and Schulten, K. "Self-Organizing Maps: Ordering, Convergence Properties and Energy Functions." Biological Cybernetics 67 (1992): 47.

Graepel, T. "Self-organizing maps: Generalizations and new optimization techniques." Neurocomputing 21 (1998), 1-3: 173.

Graepel, Thore, Burger, Matthias, and Obermayer, Klaus. "Deterministic Annealing for Topographic Vector Quantization and Self-Organizing Maps." Proceedings of the Workshop on Self-Organising Maps, volume 7 of Proceedings in Artificial Intelligence. Infix, 1997, 345.

Haykin, Simon. Neural Networks: A Comprehensive Foundation (2nd Edition). Prentice Hall, 1998.

Hegde, A., Erdogmus, D., Lehn-Schioler, T., Rao, Y., and Principe, J. "Vector-Quantization by density matching in the minimum Kullback-Leibler divergence sense." IEEE International Conference on Neural Networks - Conference Proceedings, vol. 1, 2004, 105.


Heskes, T. "Self-organizing maps, vector quantization, and mixture modeling." Neural Networks, IEEE Transactions on 12 (2001), 6: 1299.

Heskes, Tom. "Energy functions for self-organizing maps." Amsterdam: Elsevier, 1999.

Kohonen, Teuvo. "Self-organized formation of topologically correct feature maps." Biological Cybernetics 43 (1982): 59-69.

Kohonen, Teuvo. Self-Organizing Maps (2nd ed) (Springer Series in Information Sciences, 30). Springer, 1997.

Lau, K., Yin, H., and Hubbard, S. "Kernel self-organising maps for classification." Neurocomputing 69 (2006), 16-18: 2033.

Lehn-Schioler, Tue, Hegde, Anant, Erdogmus, Deniz, and Principe, Jose. "Vector quantization using information theoretic concepts." Natural Computing 4 (2005), 1: 39.

Linsker, Ralph. "How to generate ordered maps by maximizing the mutual information between input and output signals." Neural Comput. 1 (1989), 3: 402.

Liu, Weifeng, Pokharel, Puskal P., and Principe, Jose C. "Correntropy: Properties and Applications in Non-Gaussian Signal Processing." IEEE Transactions on Signal Processing 55 (2007), 11: 5286.

MacDonald, D. and Fyfe, C. "The kernel self-organising map." 4th Int. Conf. on Knowledge-Based Intelligence Engineering Systems and Applied Technologies, 2000, 317.

Neal, Radford and Hinton, Geoffrey E. "A View of the EM Algorithm That Justifies Incremental, Sparse, and Other Variants." Learning in Graphical Models, 1998, 355.

Parzen, Emanuel. "On Estimation of a Probability Density Function and Mode." The Annals of Mathematical Statistics 33 (1962), 3: 1065.

Principe, Jose. Information Theoretic Learning: Renyi's Entropy and Kernel Perspectives (Springer Series in Information Sciences and Statistics). Springer, 2010.

Principe, Jose C., Xu, Dongxin, Zhao, Qun, and Fisher, John W., III. "Learning from Examples with Information Theoretic Criteria." J. VLSI Signal Process. Syst. 26 (2000), 1/2: 61.

Renyi, A. "On measures of information and entropy." Proceedings of the 4th Berkeley Symposium on Mathematics, Statistics and Probability (1961).

Ritter, Helge, Martinetz, Thomas, and Schulten, Klaus. Neural Computation and Self-Organizing Maps: An Introduction. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1992.


Rose, Kenneth. "Deterministic Annealing for Clustering, Compression, Classification, Regression, and Related Optimization Problems." Proceedings of the IEEE, 1998, 2210.

Santamaria, I., Pokharel, P. P., and Principe, J. C. "Generalized correlation function: definition, properties, and application to blind equalization." Signal Processing, IEEE Transactions on 54 (2006), 6: 2187.

Silverman, B. W. Density Estimation for Statistics and Data Analysis. Chapman & Hall/CRC, 1986.

Van Hulle, M. M. Faithful Representations and Topographic Maps: From Distortion- to Information-Based Self-Organization. New York: Wiley, 2000.

Vapnik, Vladimir N. The Nature of Statistical Learning Theory. New York, NY, USA: Springer-Verlag New York, Inc., 1995.

Villmann, Thomas and Claussen, Jens Christian. "Magnification Control in Self-Organizing Maps and Neural Gas." Neural Comput. 18 (2006), 2: 446.

Yin, H. and Allinson, N. M. "Self-organizing mixture networks for probability density estimation." Neural Networks, IEEE Transactions on 12 (2001), 2: 405.

Yin, Hujun. "On the equivalence between kernel self-organising maps and self-organising mixture density networks." Neural Netw. 19 (2006), 6: 780.

Zador, P. "Asymptotic quantization error of continuous signals and the quantization dimension." Information Theory, IEEE Transactions on 28 (1982), 2: 139.


BIOGRAPHICAL SKETCH

Rakesh Chalasani was born in Mopidevi, India, in 1986. He received his Bachelor of Technology degree from the Department of Electronics and Communication Engineering at the National Institute of Technology, Nagpur, India, in 2008. He joined the Department of Electrical and Computer Engineering at the University of Florida in August 2008. He has been working with Dr. Jose C. Principe since 2009 and received his Master of Science degree in May 2010. His present research interests include machine learning, pattern recognition, unsupervised learning and adaptive signal processing.