Optimization Based Robust Methods in Data Analysis with Applications to Biomedicine and Engineering


Material Information

Title:
Optimization Based Robust Methods in Data Analysis with Applications to Biomedicine and Engineering
Physical Description:
1 online resource (166 p.)
Language:
english
Creator:
Syed, Naqeebuddin Mujahid
Publisher:
University of Florida
Place of Publication:
Gainesville, Fla.
Publication Date:

Thesis/Dissertation Information

Degree:
Doctorate (Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Industrial and Systems Engineering
Committee Chair:
Pardalos, Panagote M
Committee Members:
Geunes, Joseph Patrick
Richard, Jean-Philippe P
Principe, Jose C

Subjects

Subjects / Keywords:
blind-signal-separation -- classification -- clustering -- generalized-convexity -- robust-algorithms -- robust-measures
Industrial and Systems Engineering -- Dissertations, Academic -- UF
Genre:
Industrial and Systems Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract:
Analysis of a complex system as a whole, and the limitations of traditional statistical analysis, led to the search for robust methods in data analysis. In the current information age, data-driven modeling and analysis forms a core research element of many scientific research disciplines. One of the primary concerns in data analysis is the treatment of data points that do not reflect the true behavior of the system (outliers). The aim of this dissertation is to develop optimization based methods for data analysis that are insensitive and/or resistant to outliers. Generally, such methods are termed robust methods. In this dissertation, our approach differs from the conventional uncertainty based robust optimization approaches. The goal is to develop robust methods that include robust algorithms and/or robust measures. Specifically, the applicability of an information theoretic learning measure based on entropy, called correntropy, is highlighted. Some crucial theoretical results on the optimization properties of correntropy and related measures are proved. Optimization algorithms for correntropy are developed for both parametric and non-parametric frameworks. A second order triggered algorithm is developed, which minimizes the correntropic cost on a parametric framework. For the non-parametric framework, the usage of convolution smoothing and simulated annealing based algorithms is proposed. Furthermore, a modified Random Sample Consensus (RANSAC) based robust algorithm is also proposed. The performance of the proposed approaches is illustrated by case studies on data related to biomedical and engineering areas, with the objective of binary classification and signal separation.
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Bibliography:
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility:
by Naqeebuddin Mujahid Syed.
Thesis:
Thesis (Ph.D.)--University of Florida, 2013.
Local:
Adviser: Pardalos, Panagote M.
Electronic Access:
RESTRICTED TO UF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE UNTIL 2014-08-31

Record Information

Source Institution:
UFRGP
Rights Management:
Applicable rights reserved.
Classification:
lcc - LD1780 2013
System ID:
UFE0045841:00001




Full Text

OPTIMIZATION BASED ROBUST METHODS IN DATA ANALYSIS WITH APPLICATIONS TO BIOMEDICINE AND ENGINEERING

By

NAQEEBUDDIN MUJAHID SYED

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2013

(c) 2013 Naqeebuddin Mujahid Syed

Dedicated to my beloved mother, memories of my father, and all of my dearest siblings, who taught me to believe in myself...

ACKNOWLEDGMENTS

All praise is due to Allah (S.W.T) for His kindest blessings on me and all the members of my family. I feel privileged to glorify His name in the sincerest way through this small accomplishment. I ask for His blessings, mercy and forgiveness all the time. I sincerely ask Him to accept this meager effort as an act of worship. May the peace and blessings of Allah (S.W.T) be upon His dearest prophet, Muhammad (S.A.W). I would like to express my profound gratitude and appreciation to my advisor Prof. Panos M. Pardalos, for his consistent help, guidance and attention that he devoted throughout the course of this work. He is always kind, understanding and sympathetic to me. His valuable suggestions and useful discussions made this work interesting to me. I am also very grateful to Prof. Jose C. Principe for his immense help and insightful discussions on the topics presented in the thesis. Sincere thanks go to my thesis committee members Dr. Joseph Geunes and Dr. Jean-Philippe P. Richard for their interest, cooperation and constructive advice. I would also like to thank Dr. Pando Georgiev for hours of friendly discussion and constructive advice. Special thanks to Dr. Ilias Kotsireas and Dr. James C. Sackellares for their valuable discussions. I would like to thank the University of Florida and the Industrial and Systems Engineering Dept. for providing me an opportunity to pursue a PhD under the esteemed program. I would like to thank all the staff members at the ISE Dept., my Weil 401 friends, and the staff at the international center for their friendly guidance and warm support throughout my study at UFL. Special thanks to Br. Ammar for making my stay in Gainesville memorable. Last but not least, I humbly offer my sincere thanks to my mother for her incessant inspiration, blessings and prayers, and to my father for his indelible memories filled with love and care. I owe a lot to my brothers S. N. Jaweed and S. N. Majeed, and my sisters Nasreen, Shaheen, Tahseen, Yasmeen and Afreen for their unrequited support, encouragement, blessings and prayers.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER
1 ROBUST METHODS IN DATA ANALYSIS
  1.1 Data Analysis
  1.2 Motivation and Significance
  1.3 Robust Methods
  1.4 Scope and Objective
2 ROBUST MEASURES AND ALGORITHMS
  2.1 Traditional Robust Measures
  2.2 Proposed Entropy Based Measures
  2.3 Minimization of Correntropy Cost
  2.4 Minimization of Error Entropy
  2.5 Minimization of Error Entropy with Fiducial Points
  2.6 Traditional Robust Algorithm
  2.7 Proposed Robust Algorithm
  2.8 Discussion on the Robust Methods
3 ROBUST DATA CLASSIFICATION
  3.1 Preliminaries
  3.2 Traditional Classification Methods
  3.3 Proposed Classification Methods
4 ROBUST SIGNAL SEPARATION
  4.1 Signal Separation Problem
  4.2 Traditional Sparsity Based Methods
  4.3 Proposed Sparsity Based Methods
5 SIMULATIONS AND RESULTS
  5.1 Cauchy and Skew Normal Data
  5.2 Real World Binary Classification Data
  5.3 Comparison Among ANN Based Methods
  5.4 ANN and SVM Comparison
  5.5 Linear Mixing EEG-ECoG Data
  5.6 fMRI Data Analysis
  5.7 MRI Scans
  5.8 Finger Prints
  5.9 Zip Codes
  5.10 Ghost Effect
  5.11 Hyperplane Clustering
  5.12 Robust Source Extraction
6 SUMMARY
  6.1 Criticism
  6.2 Conclusion
APPENDIX: GENERALIZED CONVEXITY
REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

3-1 Binary classification proposed methods
5-1 Binary classification case study 1
5-2 Cauchy data
5-3 Skew data
5-4 Binary classification case study 2
5-5 Sample based performance of ANN on PID data
5-6 Block based performance of ANN on PID data
5-7 Sample based performance of ANN on BLD data
5-8 Block based performance of ANN on BLD data
5-9 Sample based performance of ANN on WBC data
5-10 Block based performance of ANN on WBC data
5-11 Performance of ACS for different values of the kernel parameter and number of PEs in hidden layer on PID data
5-12 Performance of ACS for different values of the kernel parameter and number of PEs in hidden layer on BLD data
5-13 Performance of ACS for different values of the kernel parameter and number of PEs in hidden layer on WBC data
5-14 Linear mixing assumption
5-15 Average unmixing error
5-16 Standard deviation unmixing error
5-17 Simulation-1 results for case study 2
5-18 Simulation-2 results for case study 2
5-19 Performance of correntropy minimization algorithm
A-1 Generalized convexity

LIST OF FIGURES

3-1 Correntropic, quadratic and 0-1 loss functions
3-2 Perceptron
4-1 Cocktail party problem
4-2 BSS setup for human brain
4-3 Overview of different approaches to solve the BSS problem
4-4 Original example source S in R^(3x80)
4-5 Mixed data X in R^(2x80)
4-6 Processed data X-hat in R^(2x80)
4-7 Algorithm 4.2 description
5-1 Global view of Cauchy data
5-2 Local view of Cauchy data
5-3 Skew normal data with noise
5-4 Performance of SVM on PID data
5-5 Performance of SVM on BLD data
5-6 Performance of SVM on WBC data
5-7 EEG recordings from monkey
5-8 ECoG recordings from monkey
5-9 fMRI data visualization
5-10 Convex hull PPC1 assumption
5-11 Mixing and unmixing of MRI scans
5-12 Mixing and unmixing of fingerprints
5-13 Mixing and unmixing of zip codes
5-14 Mixing and unmixing of ghost effect
5-15 Original sparse source (normalized) for case study 1
5-16 Given mixtures of sources for case study 1
5-17 Original mixing matrix for case study 1
5-18 Mixing matrices for case study 1
5-19 Recovered mixing matrix for case study 1
5-20 Recovered source (normalized) for case study 1
5-21 Data for source extraction method
5-22 Recovery of sources by quadratic and correntropy loss

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

OPTIMIZATION BASED ROBUST METHODS IN DATA ANALYSIS WITH APPLICATIONS TO BIOMEDICINE AND ENGINEERING

By

Naqeebuddin Mujahid Syed

August 2013

Chair: Panos M. Pardalos
Major: Industrial and Systems Engineering

Analysis of a complex system as a whole, and the limitations of traditional statistical analysis, led to the search for robust methods in data analysis. In the current information age, data-driven modeling and analysis forms a core research element of many scientific research disciplines. One of the primary concerns in data analysis is the treatment of data points that do not reflect the true behavior of the system (outliers). The aim of this dissertation is to develop optimization based methods for data analysis that are insensitive and/or resistant to outliers. Generally, such methods are termed robust methods. In this dissertation, our approach differs from the conventional uncertainty based robust optimization approaches. The goal is to develop robust methods that include robust algorithms and/or robust measures. Specifically, the applicability of an information theoretic learning measure based on entropy, called correntropy, is highlighted. Some crucial theoretical results on the optimization properties of correntropy and related measures are proved. Optimization algorithms for correntropy are developed for both parametric and non-parametric frameworks. A second order triggered algorithm is developed, which minimizes the correntropic cost on a parametric framework. For the non-parametric framework, the usage of convolution smoothing and simulated annealing based algorithms is proposed. Furthermore, a modified Random Sample Consensus (RANSAC) based robust algorithm is also proposed. The performance of the proposed approaches is illustrated by case studies on data related to biomedical and engineering areas, with the objective of binary classification and signal separation.

CHAPTER 1
ROBUST METHODS IN DATA ANALYSIS

Understanding the underlying mechanism of a real world system is the basic goal of many scientific research disciplines. Typical questions related to the system, like "How does it work?" or "What will happen if this or that is changed in the system?", are to be answered for successful progress of the scientific research. This particular research element has been revolutionized by the methods of experimentation and statistical analysis. In fact, prior to statistical analysis and experimentation, deductive logic was typically used in understanding the system, which had tremendous limitations. The concept of hypothesis testing can be solely attributed to statistical analysis and experimentation. Nowadays, obtaining information from data is one of the prevalent research areas of science and engineering. However, as the curiosity to study real world complex systems as a whole increased over time, traditional statistical analysis methods proved to be inefficient. Statistical methods dictated the theory of analyzing the data, until Tukey [98] revolutionized the ideology of analyzing experimental data. He differentiated the term data analysis from statistical analysis by stating that the former can be considered as science, but the latter is subjective upon the statistician's approach (i.e., either mathematics or science, but not both). Supporting Tukey's ideology, Huber [49] encouraged the usage of the term data analysis, as the other term is often interpreted in an overly narrow sense (restricted to mathematics and probability). Thus, the seminal work of Tukey [97, 98] enlarged the scope of data analysis from mere statistical inference to something more.

In simple terms, the key idea of the data analysis approach is to propose some analytical or mathematical model that represents the underlying mechanism of the system under consideration. The proposed model can be specific (parametric) to the system or can be a general (nonparametric) model. Both parametric and nonparametric models have some parameters to tune. The parameters are tuned based on the observed data collected from the system (experimentation). The process of identifying the model parameters is called parameter estimation. The basic idea involved in model parameter estimation is to estimate the model parameters by minimizing the error between the estimated output from the model and the desired response. The error by definition is merely a difference between the output and the response. The error measure (or the worth of an error value) plays a very crucial role in the estimation of the model parameters. Typically, when the error is assumed to be Gaussian, the Mean Square Error (MSE) criterion is equivalent to the minimization of error energy (or variance). It is well known that, under the Gaussianity assumption, MSE leads to the best possible parameter estimation (a maximum likelihood solution). However, parameter estimation issues related to nonlinear and non-Gaussian error call for costs other than MSE [15]. Higher order statistics can be used to deal with non-Gaussian errors. Statistically, MSE minimizes the second order moment of the error. In order to transfer as much information as possible from the system's responses to the model parameters, second and higher order moments (kurtosis, or cumulants) of error should be minimized. However, the most important drawback of using higher order statistics is that they are very sensitive to outliers.

The objective of Chapter 1 is to introduce the topic of the dissertation. Section 1.1 presents a simplistic introduction to the notion of data analysis. Section 1.2 highlights the significance of robust data analysis by presenting a motivating anecdote from the literature, and highlights its relevance to practical engineering and biomedical scenarios. In Section 1.3, the overview of the robust data analysis ideology is presented, and the specific approaches that will be implemented and developed in this work will be clearly stated. The scope and objective of this work is presented in Section 1.4.

1.1. Data Analysis

Data analysis is an interdisciplinary field, including statistics, database design, machine learning and optimization. It can be defined in simple terms as the process of extracting knowledge from a raw data set by any means. Approaches in data analysis vary depending upon the type of data, the objective of the analysis, the availability of computational time and resources, and the familiarity (or inclination) of the researcher towards a specific approach. Thus, there are a plethora of data analysis methods, including parametric and non-parametric frameworks with exact and heuristic algorithms. However, a data analysis approach can be schematically specified based on some prominent elements of data analysis. In general, the elements of data analysis can be structured into the following six sequential steps:

Objective. The first and most important step in data analysis is the objective of the analysis. It should be well defined and clear in nature. Based on the objective, the later steps are customized. Typically, the objectives may involve one or more than one (or a combination) of the following major criteria:

- Regression: Literally the term `Regression' means a return to a former or primitive state. Statistical regression involves the idea of finding an underlying primitive relationship between the causal variables and the effect variables. Moreover, the unstated basic assumption in statistical regression is that all the data belongs to a single class.

- Classification: Literally `Classification' means a process of classifying something based on shared characteristics. Statistical classification is a supervised learning method that involves classifying uncategorized data based on the knowledge of categorized data. The class label for the categorized data is known, whereas the class label for the uncategorized data is unknown. The unstated basic assumption in statistical classification is that an uncategorized data point should be assigned to exactly one of the class labels.

- Clustering: Literally `Clustering' means congregating things together based on their particular characteristics. Statistical clustering is an unsupervised learning method which aims to cluster data based on a defined nearness measure. It involves multiple classes, and for each class an underlying relationship is to be found. Ideally, there is no prior knowledge available about the data classes. However, some of the clustering methods assume that the information of the total number of data classes is known a priori.

Data Representation. Data is nothing but stored and/or known facts. Data comes in different forms and representations. It can represent a qualitative or quantitative fact (in the form of numbers, text, patterns or categories). Based on the objective of data analysis, a suitable data representation should be selected. A generalized way to represent data is in the form of an n x p matrix, also known as a `flat representation'. Typically, the rows (records, individuals, entities, cases or objects) represent data points, and for each data point a column (attribute, feature, variable or field) represents a measurement value. However, depending upon the context, the interpretation of rows and columns may interchange.

Knowledge Representation. The extracted knowledge can be represented in the form of relationships (between inputs and outputs) and/or summaries (novel ways to represent the same data). The way of representing the relationships (or summaries) depends upon the field of research, and the final audience (i.e., it should be novel, but more importantly understandable to the reader). The relationships or summaries (often referred to as models or patterns) can be represented in, but are not limited to, the following forms: linear equations, graphs, trees, rules, clusters, recurrent patterns, neural networks, et cetera. Typically, the type of representation for relationships/summaries should be selected before analyzing the data.

Loss Function. The loss function is a measure function that accounts for the error between the predicted output and the actual output. It is also known as a penalty function or cost function. The selection or design of a loss function depends upon two main criteria. Firstly, it should appropriately reflect the error between the predicted output and the actual output. Secondly, the loss function should be easily incorporable inside an optimization algorithm. In addition to that, given an instance of predicted output and actual output, the loss function should give the error value in polynomial time. The longer it takes to calculate the error, the lesser is the efficiency of the optimization algorithm. There are two main classical loss functions, namely the absolute error and the mean square error. Typically, the mean square error (commonly known as the quadratic loss function) is often used as a loss function.
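To make the contrast between the two classical loss functions concrete, the following small sketch (illustrative only, not taken from the dissertation; the error values are invented) shows how a single gross error dominates the quadratic loss while the absolute loss grows only linearly:

import numpy as np

# Residuals (errors) from a hypothetical fit: nine small errors and one outlier.
errors = np.array([0.1, -0.2, 0.15, 0.05, -0.1, 0.2, -0.05, 0.1, -0.15, 8.0])

mse = np.mean(errors ** 2)        # quadratic loss: dominated by the single outlier
mae = np.mean(np.abs(errors))     # absolute loss: grows only linearly with the outlier

print(f"mean square error  : {mse:.3f}")   # ~6.42
print(f"mean absolute error: {mae:.3f}")   # ~0.91

This sensitivity of the quadratic loss to a single gross error is precisely the issue that motivates the robust measures discussed in Chapter 2.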

Optimization Algorithm. The knowledge representation, selected a priori, is trained (using an optimization algorithm) on the data set to minimize the loss function. Thus, this assures that the represented knowledge aptly imitates the real system (the source or generator of the data set). Such training algorithms, also known as learning algorithms, are based on some optimization methods. Classically, a parametric representation is encouraged, and is accompanied by an exact optimization method. Although a parametric representation requires in-depth knowledge of the given data set, parametric methods were given superiority over non-parametric methods due to the existence of efficient exact optimization methods. Moreover, exact solution methods are suitable for a limited class of parametric representations, thus they limit the scope of knowledge representation. Recent developments in the use of non-parametric methods like artificial neural networks have widened the scope of knowledge representation. However, due to the use of exact methods, they have not been utilized to their full potential. Lately, due to the development in heuristic optimization methods, the use of non-parametric methods has become desirable and enlarged the scope of knowledge representation.

Validation. This is typically the last step in the data analysis. The key purpose of this step is to justify the output (estimated parameters) obtained from the earlier steps. Experts on the problem specific domain are consulted to verify and validate the results. However, expert opinion may not always be available. Hence, cross validation methods are developed.

There are several cross validation methods that are based on the concept of training and testing. The idea is to divide the given data set into two subgroups called training and testing sets. Data analysis is conducted on the training data set, and the model's performance is calibrated using the testing data set. Generally, the size of the training set is greater than that of the testing set. Next, the three most common methods of cross validation are described:

- k-fold Cross Validation (kCV): In this method, the data set is partitioned in k equally sized groups of samples (folds). In every cross validation iteration, k-1 folds are used for the training and 1 fold is used for the testing. In the literature, usually k takes a value from 1, ..., 10. (A small sketch of this splitting step is given after this list.)

- Leave One Out Cross Validation (LOOCV): In this method, each sample represents one fold. Particularly, this method is used when the number of samples is small, or when the goal of classification is to detect outliers (samples with particular properties that do not resemble the other samples of their class).

- Repeated Random Sub-sampling Cross Validation (RRSCV): In this method, the data set is partitioned into two random sets, namely a training set and a validation (or testing) set. In every cross validation, the training set is used to train the model, and the testing (or validation) set to test the accuracy of the model. This method is preferred if there are a large number of samples in the data. The advantage of this method (over k-fold cross validation) is that the proportion of the training set and the number of iterations are independent. However, the main drawback of this method is that if few cross validations are performed, then some observations may never be selected in the training phase (or the testing phase), whereas others may be selected more than once in the training phase (or the testing phase, respectively). To overcome this difficulty, the model is cross validated a sufficiently large number of times, so that each sample is selected at least once for training as well as for testing the model. These multiple cross validations also exhibit Monte-Carlo variation (since the training and testing sets are chosen randomly).

Among the above stated steps, knowledge representation, the loss function and the optimization algorithm form the crux of the data analysis. Traditional approaches of data analysis were based on statistical principles, and were termed statistical analysis. A typical assumption in the traditional approaches includes the availability of the knowledge of the data distribution or the ability to perfectly learn the distribution from infinite length data. Thus, either the data is assumed to be perfect, or filter methods are developed to remove the noise from the data before conducting the statistical analysis. However, filter methods are based on assumptions, and require data specific knowledge. Therefore, the statistical analysis performs well theoretically but has limitations for most of the practical scenarios.
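The fold-splitting step behind kCV can be sketched in a few lines. The snippet below is an illustrative rendering, not code from the dissertation; the function name and the 150-sample, 10-fold example are assumptions made for the illustration:

import numpy as np

def k_fold_indices(n_samples: int, k: int, seed: int = 0):
    """Yield (train_idx, test_idx) index pairs for k-fold cross validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)          # shuffle once
    folds = np.array_split(idx, k)            # k roughly equal folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

# Example: 10-fold CV over 150 samples; a model would be fit on train_idx
# and scored on test_idx in each iteration.
for fold, (train_idx, test_idx) in enumerate(k_fold_indices(150, 10)):
    print(fold, len(train_idx), len(test_idx))

LOOCV corresponds to taking k equal to the number of samples, and RRSCV replaces the deterministic folds with repeated independent random splits.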

1.2. Motivation and Significance

From traditional statistical analysis to contemporary data analysis, one of the key analysis elements that has remained unchanged is the optimization based approach in extracting knowledge from the data. The efficiency of optimization methods is in turn dependent upon the type of the objective function and the feasible space. Furthermore, the solution quality (local or global best) of data analysis methods also depends upon the objective function and the feasible space. The existence of outliers (or noise) often taints the solution space. Hence, practical data analysis calls for methods in data analysis that are insensitive or resistant to the outliers. Determining similarity between data samples using an appropriate measure has been the key issue in the analysis of experimental data. The importance of robust methods in data analysis can be traced back to the old famous dispute between Fisher and Eddington. Based on practical observations, Eddington [25] proposed the suitability of the absolute error as an appropriate measure. Fisher [30] countered the idea of Eddington by theoretically showing that under ideal circumstances (errors are normally distributed, and outlier free data) the mean square error is better than the absolute error. The dispute between Eddington and Fisher actually played a prominent role in shaping the theory of statistical analysis. After Fisher's illustration, many researchers incorporated mean square error as a default similarity measure in their analysis. Tukey [97] reasoned that the occurrence of the ideal circumstances in practical scenarios is very rare. Huber [48] further showed that noise as low as 0.2%, a level that is optimistic for many practical data sets, will favor the usage of the absolute error instead of the mean square error. Although Tukey's paper highlighted the importance of robust measures like the absolute error, the prevalence of mean square error in data analysis can be solely attributed to its convex, continuous and differentiable nature. There have been explicit studies [40, 47, 48] on the research and development of robust measures, under the preamble of robust statistics.

The traditional statistical analysis methods were strictly dependent upon theoretical assumptions like:

- ideal circumstances: Errors are normally distributed.
- distributional assumptions: The distribution of the data can be learned (or is available).
- sensitivity assumptions: Small deviations in distribution result in minor changes.
- smoothing assumptions: The effect of a few outliers gets faded out with respect to the bulk data.

Tukey [97] suggested that in practical scenarios, the assumptions are hardly true and barely verifiable. In fact, the assumptions are more or less assumed to be true for mathematical convenience. The assumptions were justified by vague stability principles that minor changes should result in a small error in the ultimate conclusion. On the contrary, Huber [47] states that the assumptions do not always hold, and traditional methods based on the distributional assumptions are very sensitive to minor changes. In fact, Geary [31] (cited by Tukey [98] and Hampel [39]) stated that "Normality is a myth; there never was, and never will be, a normal distribution." Thus, robust procedures are a crucial requirement of contemporary data analysis methods. These ideologies led to the development of robust methods in data analysis.

1.3. Robust Methods

A robust method in data analysis can be defined as a method of extracting knowledge from the bulk of the given data, while simultaneously neglecting the knowledge from the outliers present in the given data. The major approaches of robust methods in data analysis can be divided into the following categories:

Relaxing Distributional Assumptions. The approach here is to develop data analysis methods based on geometric (or structural) assumptions rather than distributional assumptions. This approach is followed in the hope of reducing the sensitivity of the methods with respect to practical scenarios. Furthermore, the geometrical assumptions on data can be easily verified, unlike the distributional assumptions.

Incorporating Distributional Assumptions. Obviously, relaxing all the distributional assumptions in a data analysis method is the most appropriate case for practical data. However, the distributional assumptions cannot be discarded in most of the scenarios, mainly due to the loss of mathematical convenience in the analysis approach. Thus, most of the research in robust methods is based on incorporating ideas into the traditional methods that will result in insensitivity to the conventional theoretical assumptions. The approaches can be categorized as the usage of:

- Robust Measure: A measure which is insensitive to outliers is used as a loss function.
- Robust Algorithm: Subsamples from the given data sample are analyzed separately, and the information from the subsample analysis is utilized to construct the model.
- Robust Optimization: An uncertainty based domain is considered around each data sample, and stochastic optimization based algorithms are used to conduct the analysis.

It is to be noted that incorporating robustness is a practical approach, and it is a current critical requirement of data analysis methods. However, robustness often results in the loss of convexity and/or smoothness in the optimization problem related to the data analysis. Furthermore, the computational efficiency of the robust methods is generally lower than that of the non-robust methods. It is out of the scope of this dissertation to discuss all the aspects of robust methods. Therefore, before proceeding further to develop the theme of robust methods, the scope and objective of this dissertation is presented in Section 1.4.

1.4. Scope and Objective

The objective of this dissertation is to develop novel optimization based robust methods for data analysis problems. As described in Section 1.3, the term robust methods has been used in different connotations based on the intention and area of the application.

In this work, robust methods mean the incorporation of robust algorithms and/or the usage of robust measures in data analysis problems. In the case of robust measures, the focus is on the applicability of entropy based robust measures, like correntropy, in data analysis. In this work, generalized convexity based results are presented for the entropy based measures. In addition to that, the performance of the robust measure in binary classification using a non-parametric framework is illustrated. On the other hand, a robust algorithm for the signal separation problem is also proposed. Specifically, a linear mixing model for the signal separation problem is considered. Robust algorithms are developed to extract the dictionary information from the given mixture data. Furthermore, an entropy based method is proposed to extract the sources from the mixture data.

Robust methods are applicable to practical data analysis scenarios, which typically involve noisy data. From the literature [39], it can now be assumed as a rule of thumb (not an exception) that the data from biomedical and engineering systems contain 5% to 10% outliers. Moreover, if it is assumed that there are no outliers present in the data, then the solution quality obtained from robust methods is typically competitive with the non-robust methods. However, the main drawback of robust methods is that they are computationally expensive. Nevertheless, our aim is to analyze the optimization properties of robust measures and propose selection strategies for robust algorithms that may be used to improve the computational and optimization efficiency. In Chapter 2, those issues related to robust methods that are relevant to this dissertation are addressed. Interested readers are directed towards references [40, 47], which present a general discussion on the robust methods.

CHAPTER 2
ROBUST MEASURES AND ALGORITHMS

(Some sections of Chapter 2 have been published in Optimization Letters.)

In Chapter 2, the theory of robust methods is presented. The proposed approaches include the concept of robust measures and robust algorithms. The ideas related to robust optimization are relatively new when compared to traditional robust measures and algorithms. However, robust optimization based methods, which are nothing but uncertainty based optimization methods, have been rigorously applied in the area of data analysis due to the efficient methods developed by the stochastic optimization community. On the other hand, the notion of robust measures and algorithms can be traced back to the times of Eddington and Fisher. However, elegant methods to incorporate the concepts of robust measures and algorithms in a practical framework have always been an open research area. The crux of this work is to show the applicability of a new robust measure, which is developed from the theory of Renyi's entropy, in problems related to data analysis.

Chapter 2 is structured as follows. Section 2.1 presents a brief summary of the traditional robust measures. The concept of entropy based robust measures is presented in Section 2.2. Sections 2.3, 2.4 and 2.5 prove the generalized convexity based optimization properties of the three entropy based robust measures. Furthermore, Section 2.6 presents a brief introduction to the traditional robust algorithms. Section 2.7 presents the proposed robust algorithm. Finally, Section 2.8 concludes Chapter 2 by presenting a brief discussion on the proposed methods.

2.1. Traditional Robust Measures

Consider a univariate data set containing N samples. One of the traditional ways to collect information from the samples is to calculate its mean and variance. Now, assume that one outlier has been appended to the existing data set. Obviously the mean and variance will change significantly. However, the median of the data will not change much. In fact, the median will give the true information of the data until there are about 50% outliers in the data. Thus, the median is considered a more robust measure than the mean, and in some sense, the median is the most robust measure of location. An improvement of the traditional mean calculation is the alpha-trimmed mean, where 0 < alpha < 1/2. The key idea in the alpha-trimmed mean is to remove up to alpha*N points from the sample before calculating the mean.
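A minimal numerical illustration of this point (the values are invented for the example, and it assumes NumPy and SciPy are available):

import numpy as np
from scipy import stats

# A clean sample plus one gross outlier appended at the end.
clean = np.array([4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 4.7, 5.3, 5.1, 4.9])
contaminated = np.append(clean, 50.0)

print(np.mean(clean), np.mean(contaminated))      # mean shifts from 5.0 to ~9.1
print(np.median(clean), np.median(contaminated))  # median stays at 5.0
# 10% trimmed mean: drops the most extreme points from each end before averaging.
print(stats.trim_mean(contaminated, proportiontocut=0.1))  # ~5.03

The mean is pulled strongly by the single appended outlier, while the median and the trimmed mean remain close to the bulk of the data.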

Using the above ideology, many robust estimates of data have been proposed. Generally, the scale estimates can be classified into three main categories: L-estimators, M-estimators, and S-estimators. Among the three estimates, this work will consider M-estimators. In simple terms, M-estimators are the minima of a measure that is constructed by the summation of functions of the data points. Huber was a pioneer in proposing a class of robust M-estimators. The Huber class of functions can be defined by a family of functions $\rho_\theta(x)$, where $\theta$ is a parameter and $x$ is an error value. For the estimates of location $\rho_\theta(x) = \rho(x - \theta)$, and the base model of the Huber measure can be represented as:

    \psi(x) = \begin{cases} x & \text{if } |x| \le k_1, \\ k_2 \, \mathrm{sign}(x) & \text{otherwise,} \end{cases}    (2-1)

where $0 \le k_2 \le k_1$. When $k_1 > 0$ and $k_2 = 0$, the function $\psi(x)$ corresponds to metric trimming. When $k_1 = k_2 = c > 0$, the function $\psi(x)$ corresponds to metric winsorizing. Tukey proposed another class of robust measure, called the biweight measure:

    \psi(x) = x \left[ 1 - \left( \frac{x}{k_1} \right)^2 \right]_+^2,    (2-2)

PAGE 24

where[a]+representsthepositivepartofa,andk1isaparameter.Furthermore,Hampelproposedapiecewiselinearfunctionsbasedrobustmeasure,whichisdenedas: (x)=8>>>>>>>>>><>>>>>>>>>>:jxjsign(x)0
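As an illustration of the traditional M-estimator measures introduced above, the following sketch evaluates the Huber and Tukey biweight psi-functions on a few error values. It is not code from the dissertation; the tuning constants 1.345 and 4.685 are conventional choices assumed only for the example:

import numpy as np

def huber_psi(x, k1=1.345, k2=1.345):
    """Huber-type psi: identity for small errors, clipped to k2*sign(x) beyond k1."""
    return np.where(np.abs(x) <= k1, x, k2 * np.sign(x))

def tukey_biweight_psi(x, k1=4.685):
    """Tukey biweight psi: redescends smoothly to zero as |x| approaches k1."""
    w = np.clip(1.0 - (x / k1) ** 2, 0.0, None)  # the [.]_+ positive part
    return x * w ** 2

errors = np.array([-6.0, -1.0, -0.2, 0.0, 0.3, 2.0, 10.0])
print(huber_psi(errors))
print(tukey_biweight_psi(errors))   # gross errors are driven to zero influence

The redescending behavior of the biweight (and, analogously, of the Hampel measure) is what caps the influence of gross outliers, at the price of the non-convexity discussed later in this chapter.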
PAGE 25

Shannon'sentropy,ameasureofuncertaintyoftheprobabilitydistribution,quantiestheexpectedvalueofinformationcontainedinasystem.Later,Renyi[ 76 ]generalizedthenotionofentropy;thatincludesShannon'sdenitionofentropy.Whencombinedwithanon-parametricestimatorlikeParzen'sestimator[ 71 ],Renyi'sentropyprovidesamechanismtoestimateentropydirectlyfromtheresponses.Usingtheconceptofnon-parametricRenyi'sentropy,thenotionofMinimizationofErrorEntropy(MEE)[ 26 ]isfounded,whichisacentralconceptintheeldofinformationtheoreticlearning[ 72 73 ].Anotherimportantpropertyofentropybasedrobustmeasuresisthattheyencompasshigherordermoments.Therefore,minimizingerrormeasurebasedonentropyindirectlytakeintoaccounthigherorderstatistics.Typically,thetraditionalhigherorderstatisticsareverysensitivemeasures.Ontheotherhand,asanadditionaladvantage,entropybasedrobustmeasuresarerobust.Thus,entropybasedmeasuresareusefulfornonlinearnongaussiansystems.Sections 2.3 2.4 and 2.5 presentnovelpropertiesofthreeentropybasedrobustmeasures. 2.3.MinimizationofCorrentropyCostCorrentropy(strictlyspeaking,shouldbecalledascrosscorrentropy)isageneralizedsimilaritymeasurebetweenanytwoarbitraryrandomvariables(y,a),denedas[ 83 ]: (y,a)=Ey,a[k(y)]TJ /F10 11.955 Tf 11.95 0 Td[(a,)],(2)wherekisanyformoftransformationkernelfunctionwithparameter(inthisworkitistakenasGaussiankernel).Forthesakeofsimplicity,considerabinaryclassicationscenario.Letx=a)]TJ /F10 11.955 Tf 12.34 0 Td[(yrepresenttheerror,wherea,x,andy2Rareactuallabel,errorandpredictedlabelrespectively.Thecorrentropiclossfunctionisdenedas: FC(x,)=(1)]TJ /F3 11.955 Tf 11.95 0 Td[((x))orFC(x,)=(1)]TJ /F10 11.955 Tf 11.96 0 Td[(Ex[k(x,)]),(2)where=h1)]TJ /F10 11.955 Tf 11.95 0 Td[(e()]TJ /F15 5.978 Tf 5.75 0 Td[(1 22)i)]TJ /F6 7.97 Tf 6.58 0 Td[(1.Typically,theprobabilitydistributionfunctionofxisunknown,andonlyfxigni=1observationsareavailable.Usingtheinformationfromnobservations, 25

PAGE 26

theempiricalcorrentropiclossfunctioncanbedenedas: FC(x,)=(1)]TJ /F11 11.955 Tf 13.25 8.09 Td[(1 nnXi=1k(xi,)),(2)wherex=[x1,...,xn]Tisanarrayofsampleerrors,andk(x,)=e)]TJ /F19 5.978 Tf 5.75 0 Td[(x2 22.ApracticalapproachtominimizethefunctiongiveninEquation 2 istoassumeasaparameter.Multipleiterationsfordifferentvaluesoftheparameterareexecutedtoobtaintheoptimalsolution.ParameterBasedCorrentropicFunction Theparameterizedcorrentropiclossfunction,isdenedas: FC(x)=(1)]TJ /F11 11.955 Tf 13.25 8.08 Td[(1 nnXi=1k(xi)),(2)where=h1)]TJ /F10 11.955 Tf 11.96 0 Td[(e()]TJ /F15 5.978 Tf 5.75 0 Td[(1 22)i)]TJ /F6 7.97 Tf 6.58 0 Td[(1,andk(x)=e)]TJ /F19 5.978 Tf 5.75 0 Td[(x2 22.LetHC(x)denotestheHessianofthefunctiondenedinEquation 2 ,givenas: HC(x)=0BBBBBBB@(x1)2)]TJ /F14 7.97 Tf 6.58 0 Td[(x21 2000(x2)2)]TJ /F14 7.97 Tf 6.58 0 Td[(x22 20............00(xn)2)]TJ /F14 7.97 Tf 6.58 0 Td[(x2n 21CCCCCCCA.(2)where(x)= 2e)]TJ /F19 5.978 Tf 5.76 0 Td[(x2 22.FromEquation 2 ,itcanbeseenthatifjj>jxij,fori=1,...,n,thenthecorrentropicfunctionisconvex.UndertheidealcircumstancesasassumedbyFisher,choosingjj>jxij,fori=1,...,nisappropriate.However,forthepracticalcase,shouldbeselectedsuchthatjjjxijwhentheithsampleisanon-outlier.Thiswinnowingofoutliersbykernelwidthistherobustnessofthecorrentropiclossfunction.However,therobustnessisachievedincorrentropiclossfunctionatthecostoflosingconvexity. 26

PAGE 27

Theaboveanalysishighlightsthesubtleyetcrucialissue,i.e.,thetrade-offamongthethreedesiredproperties:convexity,robustnessandsmoothness.Conventionally,thebeststrategyistoselectanytwoofthethreepropertiesinasimilaritymeasure.Forinstance,mostofthetraditionalpractitionersselectconvexityandrobustness(liketheabsolutelossfunction),orselectconvexityandsmoothness(likethequadraticlossfunction).Correntropyopensadoorinthedirection,whererobustnessandsmoothnessareguaranteed.Butwithoutconvexity,optimizationofageneralnonlinearfunctionwillbeachallengingtask.Fortunately,forthecorrentropiclossfunction,weshowthatthefunctionispseudoconvexforonedimension,andinvexformulti-dimensions.Whendatacannotbenormalized(i.e.,whendatashouldhaveadifferentkernelwidthfordifferentfeatures)thegeneralizedcorrentropicfunctionlossfunction,isdenedas: FC(x)=nXi=1i n(1)]TJ /F10 11.955 Tf 11.95 0 Td[(ki(xi)),(2)wherei=1)]TJ /F10 11.955 Tf 11.96 0 Td[(e()]TJ /F15 5.978 Tf 5.76 0 Td[(1 22i))]TJ /F6 7.97 Tf 6.58 0 Td[(1andki(x)=e)]TJ /F19 5.978 Tf 5.75 0 Td[(x2 22i.InthefollowingpartofChapter 2 ,thetotalcorrentropicloss,insteadoftheaveragelossisconsidered,i.e.,thecorrentropiclossfunctionisdenedas: FC(x)=nXi=1i(1)]TJ /F10 11.955 Tf 11.96 0 Td[(ki(xi)).(2)GeneralizedConvexityofCorrentropicFunction Althoughforthehighervalueofparameter(dependingupontheerrormagnitude),thecorrentropybasedmeasureisconvex.However,itisofpracticalimportancetostudythepropertiesofcorrentropyfunctionforanyvalueof>0.Specicallyinthiswork,itisclaimedthatthecorrentropyfunctionispseudoconvexandinvex,dependinguponthesampledimension.Letusconsiderthesimplestcase,wheretheerrorfromasinglesampleisconsideredoneatatime.Thiscaseiscalledassinglesamplecase. 27

PAGE 28

SingleSampleCase:Letxbethesampleerror.Thecorrentropylossfunction,withrespecttoonesample,canbedenedas: FC(x)=h1)]TJ /F10 11.955 Tf 11.96 0 Td[(e()]TJ /F15 5.978 Tf 5.76 0 Td[(1 22)i)]TJ /F6 7.97 Tf 6.59 0 Td[(11)]TJ /F10 11.955 Tf 11.96 0 Td[(e)]TJ /F19 5.978 Tf 5.75 0 Td[(x2 228x2R.(2)Thepseudoconvexityofthelossfunctionisclaimedunderthefollowingconditions. Theorem2.1. Let=h1)]TJ /F10 11.955 Tf 11.96 0 Td[(e()]TJ /F15 5.978 Tf 5.76 0 Td[(1 22)i)]TJ /F6 7.97 Tf 6.58 0 Td[(1andS=fx2R:x20.Proof:Letx1,x22R.Considerthefollowing: rFC(x1)(x2)]TJ /F10 11.955 Tf 11.96 0 Td[(x1)= 2e)]TJ /F19 5.978 Tf 5.76 0 Td[(x21 22(x1)(x2)]TJ /F10 11.955 Tf 11.96 0 Td[(x1)=(x1)x1(x2)]TJ /F10 11.955 Tf 11.95 0 Td[(x1),where(x)= 2e)]TJ /F19 5.978 Tf 5.76 0 Td[(x2 22and(x)>08,x;since>0,6=0andnite,andx2
PAGE 29

Case2:ifx1<0,thenEquation 2a reducestothefollowing: x2x1orx2x1<0)FC(x2)FC(x1). (2a) FromEquations 2a & 2a ,thefollowingstatementholds: IfrFC(x1)(x2)]TJ /F10 11.955 Tf 11.95 0 Td[(x1)0,thenFC(x2)FC(x1)foragiven(2)FromEquation 2 ,itfollowsthatFCispseudoconvexforagivenparameter. Remark2.1. Ifthereexistsx?2RsuchthatrFC(x?)=0,thenx?istheglobaloptimalsolutionofFC.n-sampleCase:Letxibetheithsampleerror.Thecorrentropiclossfunction,inn-sample(cumulativeerrorofn-samples)isgivenas: FC(x)=nXi=1i"1)]TJ /F10 11.955 Tf 11.96 0 Td[(e)]TJ /F19 5.978 Tf 5.76 0 Td[(x2i 22i#8x2S,(2)wherei="1)]TJ /F10 11.955 Tf 11.96 0 Td[(e)]TJ /F15 5.978 Tf 5.75 0 Td[(1 22i#)]TJ /F6 7.97 Tf 6.59 0 Td[(1andS=fx2Rn:x2i
PAGE 30

pseudoconvexityofcorrentropicfunctionfornsamplesdoesnotfollowdirectlyfromTheorem 2.1 Theorem2.2. Leti="1)]TJ /F10 11.955 Tf 11.95 0 Td[(e)]TJ /F15 5.978 Tf 5.76 0 Td[(1 22i#)]TJ /F6 7.97 Tf 6.58 0 Td[(1andS=fx2Rn:x2i0.Proof:LetN(x)=fyjky)]TJ /F4 11.955 Tf 12.5 0 Td[(xk<,0<<^)166(!0grepresenttheepsilonneighborhoodofx.Letx2Sandbx2N(x)\Sbeanytwopoints,suchthat: rFC(x)T(bx)]TJ /F4 11.955 Tf 11.95 0 Td[(x)0 (2a) nXi=1i(xi)xi(bxi)]TJ /F10 11.955 Tf 11.95 0 Td[(xi)0nXi=1i(xi)xidi0, (2b) whered2Rnisthedirection,suchthatbx=x+d.Thefollowingrelationisclaimedtobetrue: FC(x)FC(bx).(2)Bycontradiction,sayifFC(x)>FC(bx),then: nX8i=1[fi(bxi))]TJ /F10 11.955 Tf 11.95 0 Td[(fi(xi)]<0.(2)Now nX8i=1[fi(bxi))]TJ /F10 11.955 Tf 11.96 0 Td[(fi(xi)]=nX8i=1")]TJ /F3 11.955 Tf 9.3 0 Td[(ie)]TJ /F15 5.978 Tf 5.75 0 Td[((xi2+2xidi+2d2i) 22i+ie)]TJ /F19 5.978 Tf 5.76 0 Td[(xi2 22i#=nX8i=1ie)]TJ /F19 5.978 Tf 5.76 0 Td[(xi2 22i"1)]TJ /F10 11.955 Tf 11.95 0 Td[(e)]TJ /F15 5.978 Tf 5.75 0 Td[((2xidi+2d2i) 22i#. (2) 30

PAGE 31

Equations 2 & 2 implythefollowing: nX8i=1e)]TJ /F19 5.978 Tf 5.76 0 Td[(xi2 22i"1)]TJ /F10 11.955 Tf 11.96 0 Td[(e)]TJ /F15 5.978 Tf 5.75 0 Td[((2xidi+2d2i) 22i#<0.(2)DividingbothsidesofEquation 2 by>0,andtakingthelimits)166(!0,resultsin: 0>lim)177(!01 nX8i=1ie)]TJ /F19 5.978 Tf 5.75 0 Td[(xi2 22i"1)]TJ /F10 11.955 Tf 11.96 0 Td[(e)]TJ /F15 5.978 Tf 5.76 0 Td[((2xidi+2d2i) 22i#=nX8i=1ie)]TJ /F19 5.978 Tf 5.76 0 Td[(xi2 22ilim)177(!02641)]TJ /F10 11.955 Tf 11.96 0 Td[(e)]TJ /F15 5.978 Tf 5.76 0 Td[((2xidi+2d2i) 22i 375=nX8i=1ie)]TJ /F19 5.978 Tf 5.76 0 Td[(xi2 22ixidi 2i=nX8i=1i(xi)xidi. (2) Equation 2 isacontradictiontotheassumptionmadeinEquation 2 .ThisprovesthattheclaimstatedinEquation 2 istrue.Therefore,fromEquations 2 2 & 2 itisconcludedthat: ifrFC(x)T(bx)]TJ /F4 11.955 Tf 11.95 0 Td[(x)0,thenFC(bx)FC(x)8bx2N(x)\S.(2)Thatis,fromEquation 2 ,itcanbestatedthatFCislocallypseudoconvexforagivenparameter.Unfortunately,thelocalpseudoconvexitywillnotguaranteetheexistenceofglobalpseudoconvexity.However,agradientdescentalgorithmwithsufcientlysmallstepsizecanbedesignedsuchthatitcanguaranteeglobalconvergence.Nevertheless,thefollowingtheoremprovestheexistenceofinvexityforthecorrentropiclossfunction. Theorem2.3. Leti="1)]TJ /F10 11.955 Tf 11.95 0 Td[(e)]TJ /F15 5.978 Tf 5.76 0 Td[(1 22i#)]TJ /F6 7.97 Tf 6.58 0 Td[(1andS=fx2Rn:x2i
PAGE 32

isinvexforanynitei>0.Proof:Letx,bx2Sbeanytwopoints.Sincex2i0,2Rsuchthatx2i 2iMi8i=1,...,n.Thegradient,rFC(x)2Rnisdenedas: rFC(x)="1 21e)]TJ /F19 5.978 Tf 5.76 0 Td[(x21 221x1,...,n 2ne)]TJ /F19 5.978 Tf 5.75 0 Td[(x2n 22nxn#T,(2)whichimplies rFC(x)TrFC(x)=nXi=12i 4ie)]TJ /F19 5.978 Tf 5.76 0 Td[(x2i 2ix2i.(2)Sincex2S,itfollowsthatrFC(x)TrFC(x)=0onlywhenrFC(x)=0.Letusdene(bx,x)2Rnas: (bx,x)=8>>>>>><>>>>>>:0rFC(x)=0[FC(bx)FC(x)]rFC(x) rFC(x)TrFC(x)otherwise.(2)FromEquation 2 ,itfollowsthat: FC(bx))-222(FC(x)(x,bx)TrFC(x).(2)FromEquation 2 itfollowsthatFC(x)isinvex,whenx2i08k=1,...,nandiisnonzeroandnite.Proof:InTheorem 2.2 ,localpseudoconvexityofFCisproved.InTheorem 2.4 ,itisshownthatundercertainconditions,FCisgloballypseudoconvex.LetHC(x)denotetheHessianofthefunction,givenas: HC(x)=0BBBBBBB@1(x1)21)]TJ /F14 7.97 Tf 6.59 0 Td[(x21 210002(x2)22)]TJ /F14 7.97 Tf 6.58 0 Td[(x22 220............00n(xn)2n)]TJ /F14 7.97 Tf 6.58 0 Td[(x2n 2n1CCCCCCCA.(2)ConsidertheborderedHessianmatrixB(x)ofFCdenedas: B(x)=0B@HC(x)rFC(x)rFC(x)T01CA.(2)ThedeterminantofB(x)isdenedas: detB(x)=1(x1)21)]TJ /F14 7.97 Tf 6.58 0 Td[(x21 21001(x1)x102(x2)22)]TJ /F14 7.97 Tf 6.59 0 Td[(x22 2202(x2)x2...............00n(xn)2n)]TJ /F14 7.97 Tf 6.59 0 Td[(x2n 2nn(xn)xn1(x1)x12(x2)x2n(xn)xn0.(2)Usingtypicalrowoperations,thedeterminantcanberewrittenas: detB(x)=)]TJ /F14 7.97 Tf 16.65 14.95 Td[(nYi=1i(xi)(2i)]TJ /F10 11.955 Tf 11.95 0 Td[(x2i) 2i nXi=1i(xi)x2i2i 2i)]TJ /F10 11.955 Tf 11.95 0 Td[(x2i!.(2) 33

PAGE 34

LetdetBk(x)=)]TJ /F8 11.955 Tf 11.29 8.97 Td[(Qki=1i(xi)(2i)]TJ /F14 7.97 Tf 6.58 0 Td[(x2i) 2iPki=1i(xi)x2i2i 2i)]TJ /F14 7.97 Tf 6.59 0 Td[(x2i.Sinceck()>08k=1,...,n,8x2S,detBk(x)<08k=1,...,n,whichimpliesthefunctionisquasiconvex.Furthermore,fromTheorem 2.3 thefunctionisinvex.Thus,itcanbeconcludedthatthefunctionispseudoconvexwhenck()>08k=1,...,n.AdditionalProperties Inadditiontotheaboveprovedproperties,thefollowingpropertiesholdforthecorrentropicfunction: Leti=8i=1,...,n,=h1)]TJ /F10 11.955 Tf 11.96 0 Td[(e()]TJ /F15 5.978 Tf 5.76 0 Td[(1 22)i)]TJ /F6 7.97 Tf 6.59 0 Td[(1andS=fx2Rn:x2i
PAGE 35

Furthermore,duetotheinvexityofFC,wehave: FC(x))-222(FC(y)(y,x)TrFC(y).(2)Since, ismonotoneincreasing: 0(x)>08x2R.(2)MultiplyingEquation 2 onbothsidesby 0(FC(y))andsubstitutinginEquation 2 ,theresultsfollows. If2i>bMi8i,thenFC(x)isconvex.Somedataanalysisproblems,likemulti-classclassication,arebasedonerrorvector,i.e.,theerrorforasinglesampleisavectorinm-dimensions.Inordertoavoidconfusionontheusageofsampleanderrordimension,theerrordimensionsarecalledasdimensions.m-dimensions,Single-sampleCase:Thecorrentropylossfunctionform-dimensions,withrespecttoonesamplecanbedenedas: GC(x)=h1)]TJ /F10 11.955 Tf 11.96 0 Td[(e()]TJ /F15 5.978 Tf 5.75 0 Td[(1 22)i)]TJ /F6 7.97 Tf 6.58 0 Td[(1"1)]TJ /F10 11.955 Tf 11.95 0 Td[(ejjxjj2 22#8x2Rm,(2)wherejjxjjistheEuclideannorm.Weclaimthatthelossfunctionispseudoconvex. Theorem2.5. Ifx2RmthenthefunctionGC:Rm7!R,denedas: GC(x)=h1)]TJ /F10 11.955 Tf 11.96 0 Td[(e()]TJ /F15 5.978 Tf 5.75 0 Td[(1 22)i)]TJ /F6 7.97 Tf 6.58 0 Td[(1"1)]TJ /F10 11.955 Tf 11.95 0 Td[(ejjxjj2 22#8x2Rm,(2)ispseudoconvexfornite>0.Proof:Let=h1)]TJ /F10 11.955 Tf 11.96 0 Td[(e()]TJ /F15 5.978 Tf 5.75 0 Td[(1 22)i)]TJ /F6 7.97 Tf 6.58 0 Td[(1,>08.Thefunctioncanberewrittenas: GC(x)=)]TJ /F3 11.955 Tf 11.95 0 Td[(ejjxjj2 228x2Rm.(2)Letx1andx2betwovectorssuchthat: GC(x2)
PAGE 36

Then,)]TJ /F3 11.955 Tf 11.95 0 Td[(ejjx2jj2 22<)]TJ /F3 11.955 Tf 11.95 0 Td[(ejjx1jj2 22 (2a) ejjx2jj2 22>ejjx1jj2 22 (2b) jjx2jj2 22>jjx1jj2 22 (2c) jjx1jj>jjx2jj. (2d) Now,rGC(x)=(x)x,where(x)= 2ejjxjj2 22and(x)>08,x.Consider: rGC(x1)T(x2)]TJ /F4 11.955 Tf 11.96 0 Td[(x1)=(x1)xT1(x2)]TJ /F4 11.955 Tf 11.96 0 Td[(x1) (2a) =(x1)(xT1x2)]TJ /F4 11.955 Tf 11.96 0 Td[(xT1x1). (2b) UsingtheCauchy-Bunyakovsky-Schwarzinequality,wehave: ifjjx1jj>jjx2jjthenxT1x1>xT1x2.Therefore,usingtheaboveinequalityandfromEquations 2d & 2b ,wehave: IfGC(x2)08,x.Letgi(X)=24 n)]TJ /F3 11.955 Tf 11.96 0 Td[(e )]TJ /F24 5.978 Tf 7.03 4.48 Td[(Pjx2i,j 22!35.Thelossfunctioncanberewrittenas: GC(X)=nXi=1gi(X).(2) 36

PAGE 37

ThegradientofGC(X)canbewrittenas: rGC(X)=[(x1)x1,1,...,(x1)x1,m,...,(xn)xn,1,...,(xn)xn,m]T(2)and rGC(X)T(bX)]TJ /F10 11.955 Tf 11.96 0 Td[(X)=nXi=1mXj=1(xi)xi,j(bxi,j)]TJ /F10 11.955 Tf 11.96 0 Td[(xi,j).(2) Theorem2.6. Leti="1)]TJ /F10 11.955 Tf 11.95 0 Td[(e)]TJ /F15 5.978 Tf 5.75 0 Td[(1 22i#)]TJ /F6 7.97 Tf 6.59 0 Td[(1andS=fX2Rnm:jjxijj20.Proof:LetX,bX2Rnmbeanytwopoints.Sincejjxijj20,2Rsuchthatjjxijj2 2iMi8i.Thegradient,rGC(X)2Rnmisdenedas: rGC(X)=[1(x1)x1,1,...,1(x1)x1,m,...,n(xn)xn,1,...,n(xn)xn,m]T,(2)whichimplies rGC(x)TrGC(x)=nXi=1i(xi)jjxijj2.(2)Sincex2S,itfollowsthatrGC(X)TrGC(X)=0onlywhenrGC(X)=0.Letusdene(bX,X)2Rnmas: (bX,X)=8>>>>>><>>>>>>:0rGC(X)=0[GC(bX)GC(X)]rGC(X) rGC(X)TrGC(X)otherwise.(2)FromEquation 2 ,itfollowsthat: GC(bX))-222(GC(X)(X,bX)TrGC(X).(2) 37

PAGE 38

FromEquation 2 itfollowsthatGC(X)isinvex,whenxi2S8i=1,...,n. 2.4.MinimizationofErrorEntropyLetzbetheerrorbetweenithmeasurementandithdesiredvalue,denedaszi=xi)]TJ /F10 11.955 Tf 12.35 0 Td[(yi8i=1,...,N.TheMinimizationofErrorEntropy(MEE)problemcanbestatedasmaximizationofInformationPotential(IP)andcanbedenedas: minimize:)]TJ /F1 11.955 Tf 9.3 0 Td[(IP(z)=)]TJ /F11 11.955 Tf 14.52 8.09 Td[(1 N2NXi=1NXj=1k(zi)]TJ /F10 11.955 Tf 11.96 0 Td[(zj), (2) wherek()=e)]TJ /F17 5.978 Tf 5.76 0 Td[(2 22isthewellknownGaussiankernelandisthekernelparameter(forthesakeofsimplicity,theconstanttermintheGaussiankernelisignored).Lete2R(N)]TJ /F6 7.97 Tf 6.59 0 Td[(1)beavectorcontainingallones.Letei2R(N)]TJ /F6 7.97 Tf 6.59 0 Td[(1)beavectorcontainingallzeros,excepta1atithposition.ConstructamatrixBk2RN(N)]TJ /F6 7.97 Tf 6.59 0 Td[(1)as: Bk=[)]TJ /F4 11.955 Tf 9.3 0 Td[(e1,...)]TJ /F4 11.955 Tf 11.96 0 Td[(ek)]TJ /F6 7.97 Tf 6.58 0 Td[(1,+e,)]TJ /F4 11.955 Tf 9.3 0 Td[(ek,...)]TJ /F4 11.955 Tf 11.95 0 Td[(eN)]TJ /F6 7.97 Tf 6.58 0 Td[(1]8k=1,...,N.(2)LetA2RN(N)]TJ /F6 7.97 Tf 6.59 0 Td[(1)Nbedenedas: A=[BT1,...,BTN]T.(2)Now,theMEEproblemcanbere-statedas: minimize:)]TJ /F11 11.955 Tf 14.52 8.09 Td[(1 N2N(N)]TJ /F6 7.97 Tf 6.59 0 Td[(1)Xk=1k(aTkz), (2) whereak2RNrepresentsthekthrowofmatrixA.LetS1=fu2RN(N)]TJ /F6 7.97 Tf 6.58 0 Td[(1):u=Az,z2Sg,anddeneanafnefunctionL:SRN7!S1RN(N)]TJ /F6 7.97 Tf 6.59 0 Td[(1)as: L(z)=Az.(2) 38

PAGE 39

LetGC(z)=)]TJ /F6 7.97 Tf 13.48 4.71 Td[(1 N2PN(N)]TJ /F6 7.97 Tf 6.58 0 Td[(1)k=1k(aTkz)8z2S.GCcanberepresentedasacompositefunctionofFLandL,i.e., GC=FLL,(2)whereFL:S1RN(N)]TJ /F6 7.97 Tf 6.59 0 Td[(1)7!R,denedas: FL=)]TJ /F11 11.955 Tf 14.53 8.09 Td[(1 N2N(N)]TJ /F6 7.97 Tf 6.59 0 Td[(1)Xi=1e)]TJ /F19 5.978 Tf 5.76 0 Td[(u2i 22.(2)Now,Equation 2 representsthecorrentropylossfunctiondenedoveraprojectedspaceS1.Furthermore,Equation 2 impliesthatMEEisacompositefunctionofthecorrentropylossfunctionandanafnefunction.ThisrepresentationpavesthewaytoestablishthegeneralizedconvexityresultsforMEEsince,ingeneral,compositionwithanafnefunctionpreservesgeneralizedconvexityofthecompositefunction.Next,thepropertiesofMEEfunctionispresented. Theorem2.7. LetS=fz2RN:z2i
PAGE 40

ispseudoconvexwhenck()>08k=1,...,Nandisnonzeroandnite,whereck()=Qki=1(ui)(2)]TJ /F14 7.97 Tf 6.58 0 Td[(u2i) 2Pki=1(ui)u2i2 2)]TJ /F14 7.97 Tf 6.59 0 Td[(u2i,i(ui)=1 2e)]TJ /F19 5.978 Tf 5.76 0 Td[(u2i 22andu=Az.Proof:From[ 60 ]and[ 7 ],itisconcludedthatpseudoconvexityisinvariantunderthecompositionwithanafnefunction.Thus,usingtheresultsfrom[ 92 ]inEquation 2 ,itfollowsthatGCispseudoconvexwhenck()>08k=1,...,Nandisnonzeroandnite,whereck()>08k=1,...,Nandisnonzeroandnite,whereck()=Qki=1(ui)(2)]TJ /F14 7.97 Tf 6.58 0 Td[(u2i) 2Pki=1(ui)u2i2 2)]TJ /F14 7.97 Tf 6.59 0 Td[(u2i,i(ui)=1 2e)]TJ /F19 5.978 Tf 5.76 0 Td[(u2i 22,andu=Az. Theorem2.9. LetS=fz2RN:z2i0.Proof:Tothebestofourknowledge,thereisnogeneralproofintheliteraturethatafrmspreservationofinvexityoverafnecompositions.Therefore,anelementaryproofthatservesnotonlyasaprooffortheabovetheorem,butforinvexfunctionsingeneral,ispresented.Toprove:ifFLisaninvexfunction,thenGC=FLLisinvex,whereLisanyafnetransformation.Bycontradiction,assumeGCisnotinvex,i.e.,thefollowingistrueforanyarbitrary(z,w):SS7!S GC(z))-222(GC(w)<(z,w)TrGC(w).(2)RewritingEquation 2 : FL(Az))-221(FL(Aw)<[A(z,w)]TrFL(Aw).(2)Letb(Az,Aw)=A(z,w),u=Az,andv=Aw.Equation 2 canbewrittenas: FL(u))-222(FL(v)
PAGE 41

Since,z,w,and(z,w)arechosenarbitrarily,Equation 2 impliesthatFLisnotinvex.ThiscontradictionisaresultoftheassumptionmadeinEquation 2 .Thus,theassumptionthatGCisnotinvexisfalse. 2.5.MinimizationofErrorEntropywithFiducialPointsAnotherimportantM-estimator,usingtheconceptofducialpoint(referencepoint)isproposedin[ 55 ].Thegoalofsuchmeasureistoprovideananchortozeroerror,i.e.,makemostoftheerrorszero.ThisM-estimatorisobtainedbytheMinimizationofErrorEntropywithFiducialpoints(MEEF).TheMEEFproblemcanbedenedas: minimize:)]TJ /F11 11.955 Tf 29.85 8.09 Td[(1 (N+1)2NXi=0NXj=0k(zi)]TJ /F10 11.955 Tf 11.95 0 Td[(zj).(2)TheonlymodicationinMEEF,whencomparedtoMEEistheadditionofareferencepoint,z0=0.Simplifyingtheabovefunction,byusingthesymmetricpropertyoftheGaussiankernel,theMEEFproblemcanbewrittenas: minimize:)]TJ /F11 11.955 Tf 29.85 8.08 Td[(1 (N+1)2NXi=1NXj=1k(zi)]TJ /F10 11.955 Tf 11.95 0 Td[(zj))]TJ /F11 11.955 Tf 32.51 8.08 Td[(2 (N+1)2NXj=0k(z0)]TJ /F10 11.955 Tf 11.96 0 Td[(zj)(2)or minimize:)]TJ /F11 11.955 Tf 29.85 8.09 Td[(1 (N+1)2NXi=1NXj=1k(zi)]TJ /F10 11.955 Tf 11.96 0 Td[(zj))]TJ /F11 11.955 Tf 32.5 8.09 Td[(2 (N+1)2NXj=1k(zj))]TJ /F11 11.955 Tf 32.5 8.09 Td[(2 (N+1)2.(2)Ingeneral,byaddingmducialpoints,thefollowingMEEFfunctionisobtained: minimize:)]TJ /F11 11.955 Tf 29.85 8.09 Td[(1 (N+1)2NXi=1NXj=1k(zi)]TJ /F10 11.955 Tf 11.96 0 Td[(zj))]TJ /F11 11.955 Tf 27.52 8.09 Td[(2m (N+1)2NXj=1k(zj))]TJ /F11 11.955 Tf 27.52 8.09 Td[(2m (N+1)2.(2)Removingtheconstantterm,andnormalizingthecoefcients,wegetthefollowing: minimize:)]TJ /F3 11.955 Tf 9.3 0 Td[(NXi=1NXj=1k(zi)]TJ /F10 11.955 Tf 11.95 0 Td[(zj))]TJ /F11 11.955 Tf 11.96 0 Td[((1)]TJ /F3 11.955 Tf 11.96 0 Td[()NXj=1k(zj),(2)where2(0,1].Itcanbeseenthat,as)166(!0,theMEEFformulationconvergestoMinimizingCorrentropicCost(MCC)function.Ontheotherhand,when=1,the 41

PAGE 42

MEEFobjectivefunctionreducestoMEEobjectivefunction.Intuitively,thesecondterm,PNj=1k(zj)canbeseenasaregularizationfunction.Infact,correntropyisasimilaritynorm,andcanbeusedforsparsicationofthesolution.Thissparsicationistheunderlyingreasoningfortheusageofducialpoints.ConsiderthenormalizedlossfunctionoftheMEEFfunction,HC(x),denedas: HC(z)=)]TJ /F3 11.955 Tf 9.3 0 Td[(NXi=1NXj=1k(zi)]TJ /F10 11.955 Tf 11.95 0 Td[(zj))]TJ /F11 11.955 Tf 11.96 0 Td[((1)]TJ /F3 11.955 Tf 11.96 0 Td[()NXj=1k(zj)(2) HC(z)=GC(z)+(1)]TJ /F3 11.955 Tf 11.95 0 Td[()FC(z),(2)whereGC(z)=)]TJ /F8 11.955 Tf 11.29 8.97 Td[(PNi=1PNj=1k(zi)]TJ /F10 11.955 Tf 10.52 0 Td[(zj),andFC(z)=)]TJ /F8 11.955 Tf 11.29 8.97 Td[(PNj=1k(zj).Equation 2 statesthatfunctionHC(z)isaconvexcombinationoftworealfunctions.Unlikeconvexity,asareminder,pseudoconvexitymaynotbepreservedwithpositiveweightedsummation.However,invexitywillbepreservedoverthepositiveweightedsummation,whenallthefunctionsareinvexwithrespecttosamefunction.Next,theconditionsunderwhich,HC(x)isconvexinparticularandinvexingeneralaredeveloped. Theorem2.10. LetS=fz2RN:z2i0. 42

PAGE 43

Proof:InordertoprovetheinvexityofHC(z),itissufcienttoshowthatbothGC(z)andFC(z)areinvexwithrespecttoacommonfunction.Bycontradiction,assumethatfollowingsystem,say(system-1)isinfeasibleforanyzandw2S: rFC(w)T(z,w)FC(z))-222(FC(w) (2a) rGC(w)T(z,w)GC(z))-222(GC(w). (2b) SinceEquation 2 islinearwithrespectto(z,w),fromtheGale'stheorem[ 61 ],itcanbestatedthat:iftheabovelinearsystem(system-1)isinfeasible,thenthefollowingsystem(system-2)shouldbefeasible: [rFC(w)rGC(w)]p=0 (2a) [FC(z))-221(FC(w)GC(z))-222(GC(w)]p=)]TJ /F11 11.955 Tf 9.3 0 Td[(1 (2b) p0. (2c) Case1:eitherp1=0orp2=0.Clearly,ifp1=0,thenp2=0fromEquation 2a .Whereas,whenp1=0andp2=0thenEquation 2b isinfeasible.Thus,p16=0.Similarargumentcanbefollowedtoshowthatp26=0.Tosumup,neitherp1=0norp2=0giveafeasiblesolutionfor(system-2).Case2:p1>0andp2>0.Letusrearrangeelementsofwsuchthatthefollowingrelationholds:w1w2...wN.NowEquation 2a canbewrittenas: rGC(w)=)]TJ /F3 11.955 Tf 9.3 0 Td[(rFC(w),(2)where=p1 p2>0.Nowconsiderthefollowingtwosub-cases: 43

PAGE 44

Sub-case 1: $w_N \geq 0$. Consider the last element on both sides of Equation 2, i.e., consider
$$\frac{2}{\sigma^2}\sum_{i=1}^{N} e^{-\frac{(w_N - w_i)^2}{2\sigma^2}}(w_N - w_i) = -\gamma\,\frac{1}{\sigma^2}\, e^{-\frac{w_N^2}{2\sigma^2}}\, w_N.$$   (2)
Clearly, Equation 2 has no feasible value of $\gamma$.
Sub-case 2: $w_N < 0$. Consider the first element on both sides of Equation 2, i.e., we have
$$[\nabla G_C(w)]_1 = -\gamma\,[\nabla F_C(w)]_1$$   (2)
$$\frac{2}{\sigma^2}\sum_{i=1}^{N} e^{-\frac{(w_1 - w_i)^2}{2\sigma^2}}(w_1 - w_i) = -\gamma\,\frac{1}{\sigma^2}\, e^{-\frac{w_1^2}{2\sigma^2}}\, w_1.$$   (2)
Clearly, Equation 2 has no feasible value of $\gamma$. Thus, (system-2) is infeasible, implying that the assumption is false and (system-1) is feasible. In other words, there exists a common $\eta$ such that both $G_C(z)$ and $F_C(z)$ are invex. Therefore, $H_C(z)$ is invex for any nonzero finite value of $\lambda$.

To sum up, it can be stated that the MCC, MEE and MEEF functions are invex in nature. Furthermore, invexity, robustness and smoothness are the three main desirable properties of a robust measure. The presence of these three properties, along with suitable optimization algorithms, will improve the current computational complexity of robust methods. Next, the traditional and proposed robust algorithms are presented.

2.6. Traditional Robust Algorithm
Consider the classical data analysis methods, such as the least squares method, in order to understand the concept of robust algorithms. The idea in the classical methods is to estimate the model parameters with respect to all of the presented data. These methods give equal weight to all the data points, and they have no internal mechanism to detect and/or filter the outliers. The classical methods are based on the smoothing assumption, which states that the effects of outliers are smoothed out by the presence of a large number of good data points. However, in many practical problems, the smoothing assumption is not justifiable.
Thus, earlier robust algorithms were based on removal of outliers. The simple idea in such a robust algorithm is to estimate the parameters with respect to all of the data points, then identify those points which are farthest from (non-conforming to) the model. The identified points are assumed to be outliers and are removed from the data. The remaining points are used to construct a new model. This iterative process continues until a better model is constructed, or until there are no longer sufficient remaining points to proceed. However, these heuristic iterative methods easily fail even when there is only one outlier [28]. Fischler [29] pioneered the constructive approach of robust algorithms using the notion of random sampling, called Random Sampling Consensus (RANSAC). The basic idea in RANSAC is to simultaneously estimate the model and eliminate the outliers. The novelty that RANSAC proposed, compared to the earlier heuristics, can be summarized as:
- Initially, a small number of data points (the initial set) is selected to estimate the model parameters.
- While estimating the instantaneous model parameters, the initial set is enlarged in size by adding the consensus points.
The philosophy of selecting a small number of points for estimating the instantaneous model parameters is the source of robustness of the RANSAC algorithm. Typically, the number of outliers in a practical data set is assumed to be much smaller than the number of good data points. Thus, selecting a small sample from the given data increases the probability of selecting only good data points. Formally, RANSAC is described in Algorithm 2.1. RANSAC is the basis of many robust algorithms due to its ability to tolerate a large fraction of outliers. RANSAC can often perform well with a high amount of outliers; however, the number of samples required to do so increases exponentially with the percentage of outliers in the data. Thus, similar to robust measures, robust algorithms are computationally expensive.
If the percentage of outliers in a sample is known a priori (say $p_o$), then the number of samples required (say $k$) for a confidence level $\eta$ can be calculated as:
$$k \geq \frac{\log(1 - \eta)}{\log\left(1 - (1 - p_o)^m\right)},$$   (2)
where $m$ is the minimum number of samples required to compute a solution. RANSAC is a simple, successful robust algorithm in the data analysis literature. Nevertheless, many efforts have been directed toward improving the performance of RANSAC. For example, the optimization of the model verification process of RANSAC is addressed in [12, 16, 18]. Improvements directed towards the sampling process, in order to generate usable hypotheses, are addressed in [17, 19, 96]. Furthermore, real-time execution issues are addressed in [68, 74].

2.7. Proposed Robust Algorithm
In this work, a RANSAC based robust algorithm is developed, which proposes an improvement in the sampling strategy and the usage of mathematical modeling for hypothesis testing. Specifically, the algorithm is proposed for the hyperplane clustering problem. The key idea in the sample selection of the robust algorithm is twofold. First, an initial data sample S1 is selected based on a closeness criterion. By restricting the closeness criterion, another subgroup of the initial data sample, S2, is selected. Let the rest of the data points be denoted as S3. The model parameter is estimated as follows: the data points belonging to S2 are considered as definitely good points, and the data points belonging to S1 are considered as tentative good points. By using the information of points from S1 and S2, a hyperplane containing S2 and other points is searched. If S1 contains points that do not belong to the hyperplane, then the algorithm has a mechanism to discard those points. After the execution of one instance of the algorithm, we will end up with two possibilities. The desired possibility is that the number of consensus points is above the threshold for that hyperplane, and we then search for the next hyperplane.
The other possibility is that the number of consensus points is below the threshold. In this case, we re-sample the two sets, but archive the information of the previous unsuccessful sample S2. This archive acts as a cut for selecting the next sample, and avoids repetition.

2.8. Discussion on the Robust Methods
A practical consideration, while applying the results presented in Section 2.3, is the asymptotic behavior of the negative exponential function. Theoretically, the kernel $\kappa_\sigma(x) \to 0$ as $|x| \to \infty$. However, in practice, finite large values of $x$ result in $\kappa_\sigma(x) = 0$. This behavior may result in local minima, which can be avoided by using the following methods: constraining the absolute value of the error, replacing the Gaussian kernel with a suitable kernel function, or using the solution of a quadratic loss function as a starting point for correntropy minimization. In Sections 2.3, 2.4 and 2.5, convexity, pseudoconvexity and invexity of the entropy based loss functions (MCC, MEE and MEEF) are established. Invexity is the sole property that can be exploited in designing optimization algorithms that efficiently minimize the loss function. The generalized convexity results for the single and multiple dimensional cases are presented separately. The purpose of discussing the one-dimensional case separately was to address the traditional sample-by-sample artificial neural network approach in data analysis. Generally, the cumulative error approach is useful in both the parametric and non-parametric approaches to data analysis. Future directions for utilizing the correntropic loss function will involve designing fast algorithms that can speed up the grid search of the kernel parameter. Furthermore, designing a kernel that improves the asymptotic behavior of the current kernel function will enhance the efficiency of the algorithms. Entropic learning in the form of MCC, MEE and MEEF has been successfully applied in robust data analysis, including robust regression [56], robust classification [93], robust pattern recognition [43], robust image analysis [42], etc. In Chapter 2, it is shown that unconstrained MEE and MEEF problems are invex.
In general, the invexity property remains intact over a convex feasible space for constrained optimization problems. Therefore, a linear learning mapper (or, in general, a convex mapper) designed to minimize MEE will be an invex problem. By suitably exploiting the invexity, efficient optimization algorithms can be proposed for MCC, MEE and MEEF problems. Furthermore, stochastic gradient methods like convolution smoothing can be intelligently applied to solve the problems. In fact, by varying the kernel parameter we move from the convex to the invex domain, which is inherently the notion not only of convolution smoothing, but also of many global optimization algorithms. Sections 2.6 and 2.7 present the robust algorithmic approach in data analysis. Typically, the RANSAC philosophy is applied in computer vision and related areas of data analysis. However, the method is useful in any data analysis scenario where a sample can be used to estimate model parameters and validate the other points. In Chapter 4, the blind signal separation problem is presented and a solution methodology that involves the RANSAC philosophy is proposed.

Algorithm 2.1: RANSAC Algorithm
input: P data points.
output: Estimated model M*.
1  begin
2    Set ε, the tolerance limit;
3    Set a predefined threshold;
4    Set termination = false;
5    while termination == false do
6      Select S_i, a set containing n points, from the given P points;
7      Estimate model M_i using the knowledge of set S_i;
8      Identify S_i^c, the set of points (consensus set) from the original P data points that fall within the ε tolerance limits of M_i;
9      if |S_i^c| ≥ threshold then
10       Estimate model M* using S_i^c;
11       termination = true;
12   return (M*);
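A minimal Python sketch of the RANSAC loop in Algorithm 2.1 is given below, specialized to robust line fitting with a least-squares estimator. The model type, tolerance, consensus threshold and helper names are illustrative assumptions, not part of the original algorithm; the iteration-count helper follows Equation 2.

```python
import numpy as np

def ransac_line(points, n_sample=2, eps=0.1, threshold=50, max_iter=1000, rng=None):
    """Minimal RANSAC sketch (Algorithm 2.1): fit y = a*x + b despite outliers."""
    rng = np.random.default_rng(rng)
    x, y = points[:, 0], points[:, 1]
    for _ in range(max_iter):
        idx = rng.choice(len(points), size=n_sample, replace=False)   # step 6: sample S_i
        a, b = np.polyfit(x[idx], y[idx], deg=1)                      # step 7: model M_i
        residuals = np.abs(y - (a * x + b))
        consensus = residuals < eps                                   # step 8: consensus set S_i^c
        if consensus.sum() >= threshold:                              # step 9
            return np.polyfit(x[consensus], y[consensus], deg=1)      # step 10: refit M* on S_i^c
    return None  # no model reached the consensus threshold

def ransac_iterations(p_o, m, eta=0.99):
    """Number of samples k from Equation 2 for outlier fraction p_o and confidence eta."""
    return int(np.ceil(np.log(1 - eta) / np.log(1 - (1 - p_o) ** m)))
```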

CHAPTER 3
ROBUST DATA CLASSIFICATION
(Some sections of Chapter 3 have been published in Dynamics of Information Systems: Mathematical Foundations.)

The goal of Chapter 3 is to present the binary classification problem, and to illustrate the robust non-parametric methods for data classification. In Section 3.1, all the major preliminary topics required to understand the proposed robust methods in data classification are discussed. The purpose of reviewing these topics is to provide sufficient background information to a novice reader. However, they by no means serve as a comprehensive discussion, and interested readers are directed to the appropriate references for detailed discussions. Furthermore, Section 3.2 reviews some of the traditional approaches to binary classification, whereas Section 3.3 presents the proposed robust approaches.

3.1. Preliminaries
The following topics are reviewed in Section 3.1:
- Classification
- Correntropic Function
- Convolution Smoothing (CS)
- Simulated Annealing (SA)
- Artificial Neural Network (ANN)
- Support Vector Machine (SVM)

Classification
Classification (strictly speaking, statistical classification) is a supervised learning methodology of identifying (or assigning) class labels for an unlabeled data set (a sub-population of data whose class is unknown) from the knowledge of a pre-labeled data set (another sub-population of the same data, whose class is known). The knowledge of the pre-labeled data set can be used to generate an optimal rule, based on the theory of learning [99, 100].
More specifically, the optimal rule (the discriminant function $f$) is generated in such a way that it will minimize the risk of assigning incorrect class labels [3, 65]. The classification problem is defined in the following paragraph.

Let $D_n$ represent the data set containing the observations, defined as $D_n = \{(x_i, y_i),\ i = 1, \dots, n : x_i \in \mathbb{R}^m \wedge y_i \in \{-1, 1\}\}$, where $x_i$ is an input vector and $y_i$ is the class label for the input vector. Under the assumption that $(x_i, y_i)$ is an independent and identical realization of the random pair $(X, Y)$, the classification problem can be defined as finding a function $f$ from a class of functions $\Gamma$, such that $f$ minimizes the risk $R(f)$. Thus, the classification problem can be written as:

minimize: $R(f)$   (3a)
subject to: $(x_i, y_i) \in D_n \quad \forall i = 1, \dots, n$,   (3b)
$x_i \in \mathbb{R}^m \quad \forall i = 1, \dots, n$,   (3c)
$y_i \in \{-1, 1\} \quad \forall i = 1, \dots, n$,   (3d)
$f \in \Gamma$,   (3e)

where $R(f)$ is defined as:
$$R(f) = P(Y \neq \mathrm{sign}(f(X))) = E[l_{0\text{-}1}(f(X), Y)],$$   (3)
where sign is the signum function and $l_{0\text{-}1}$ is the 0-1 loss function; they are defined as:
$$\mathrm{sign}(f(X)) = \begin{cases} +1 & \text{if } f(X) > 0 \\ -1 & \text{if } f(X) < 0 \\ 0 & \text{otherwise} \end{cases}$$   (3)
$$l_{0\text{-}1}(f(x), y) = \left\|(-y f(x))_+\right\|_0,$$   (3)
where $(\cdot)_+$ denotes the positive part and $\|\cdot\|_0$ denotes the $L_0$ norm. When $f(x) = 0$, the above definition does not reflect the error; however, this is a rare case and can be easily avoided or adjusted (i.e., by considering $\|(f(x) - y)_+\|_0$). Moreover, it is clear from the definition of $R(f)$ that it requires knowledge of $P(X, Y)$, the joint probability distribution of the random pair $(X, Y)$. Usually, the joint distribution is unknown. This leads to the calculation of the empirical risk function $\hat{R}(f)$, which is given as:
$$\hat{R}(f) = \frac{1}{n}\sum_{i=1}^{n} l_{0\text{-}1}(y_i f(x_i)).$$   (3)
At this juncture, only Empirical Risk Minimization (ERM) is considered, and any discussion pertaining to Structural Risk Minimization (SRM) is avoided. However, SRM will be discussed when the notion of the support vector machine is presented. Generally, it is not easy to find the optimal solution $f^\star$ of the problem stated in Formulation 3, since the space of the function class $\Gamma$ is huge and there is no efficient way to search over such a space. In order to find the solution, a usual approach is to select the class of functions a priori, and then try to find the best function from the selected function class $\hat{\Gamma}$. Generally, the selected class of functions can be categorized as a parametric or a non-parametric class. Based on the category of the function class, different learning algorithms can be used to minimize the loss function. Thus, with the above stated restrictions, the classification problem can be represented as:

minimize: $\hat{R}(f)$   (3a)
subject to: $(x_i, y_i) \in D_n \quad \forall i = 1, \dots, n$,   (3b)
$x_i \in \mathbb{R}^m \quad \forall i = 1, \dots, n$,   (3c)
$y_i \in \{-1, 1\} \quad \forall i = 1, \dots, n$,   (3d)
$f \in \hat{\Gamma}$.   (3e)
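As a concrete illustration of the empirical risk $\hat{R}(f)$ above, the short NumPy sketch below evaluates the 0-1 loss of an arbitrary decision rule on a labeled sample; the linear rule and the data are hypothetical, introduced only for illustration.

```python
import numpy as np

def empirical_01_risk(f, X, y):
    """Fraction of samples where sign(f(x)) disagrees with the label y in {-1, +1}."""
    scores = np.asarray([f(x) for x in X])
    predictions = np.sign(scores)            # returns 0 when f(x) == 0, as in the text
    return float(np.mean(predictions != y))  # empirical R_hat(f)

# Example with a hypothetical linear rule f(x) = w.x + b
w, b = np.array([1.0, -2.0]), 0.5
X = np.array([[0.0, 1.0], [2.0, -1.0], [1.0, 1.0]])
y = np.array([-1, 1, -1])
print(empirical_01_risk(lambda x: w @ x + b, X, y))
```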

In summary, both $\hat{R}(f)$ and $\hat{\Gamma}$ are usually selected before finding $f^\star$. Moreover, the type of risk function and the function class selected will significantly determine the accuracy of the classification method. Next, the usage of the correntropy loss function as a risk function in data classification is presented.

Correntropic Function
Although the classification problem stated in Formulation 3 looks simple, it has an inherent difficulty, due to the non-convex and non-continuous loss function defined in Equation 3. Furthermore, the search over the $\hat{\Gamma}$ function space is another difficulty in solving Formulation 3. The key idea is to propose a loss function that can efficiently replace the loss function given in Equation 3. Conventionally, the 0-1 loss function is replaced by a quadratic loss function, i.e., the quadratic risk is given as:
$$R(f) = E[(Y - f(X))^2] = E[(\varepsilon)^2].$$   (3)
In general, knowledge of the Probability Distribution Function (PDF) of $\varepsilon$ is required to calculate the above risk function. However, the quadratic risk can be approximated by the following empirical quadratic risk function:
$$\hat{R}(f) = \frac{1}{n}\sum_{i=1}^{n}(y_i - f(x_i))^2,$$   (3)
where $n$ is the number of samples. The replacement of the 0-1 loss function with the quadratic loss function makes Formulation 3 computationally easy to solve (due to its convex nature). Moreover, if the function class $\hat{\Gamma}$ is smooth, then the problem can be solved by gradient descent methods. However, the quadratic loss function performs poorly on noisy data, i.e., the computational simplicity has its price in classification performance. Hence, the usual gradient descent based optimization methods with a quadratic loss function may not provide the global optimal solution for the selected class of functions ($\hat{\Gamma}$). In order to overcome this difficulty, the use of the correntropic loss function is proposed.
In order to define the correntropic risk function, consider a function $\ell_{\sigma,\beta}(f(x), y)$ defined as:
$$\ell_{\sigma,\beta}(f(x), y) = \beta[1 - \kappa_\sigma(1 - y f(x))] = \beta[1 - \kappa_\sigma(1 - \alpha)],$$   (3a)
where $\alpha = y f(x)$ is called the margin, $\beta = [1 - e^{-1/(2\sigma^2)}]^{-1}$ is a positive scaling factor, and $\kappa_\sigma$ is the Gaussian kernel with width parameter $\sigma$. This function has its roots in the correntropy function (see [72] for more details). Using this information, the correntropic risk function can be rewritten as:
$$R(f) = E[\ell_{\sigma,\beta}(f(X), Y)] = E[\beta(1 - \kappa_\sigma(1 - Y f(X)))] = \beta(1 - E[\kappa_\sigma(1 - Y f(X))]) = \beta(1 - v_\sigma(1 - Y f(X))) = \beta(1 - v_\sigma(Y - f(X))) = \beta(1 - v_\sigma(\varepsilon)).$$   (3a)
Due to the unavailability of the PDF, similar to the quadratic loss function, the empirical correntropic risk function can be defined as:
$$\hat{R}(f) = \beta(1 - \hat{v}_\sigma(\varepsilon)),$$   (3)
where $\hat{v}_\sigma(\varepsilon) = \frac{1}{n}\sum_{i=1}^{n}\kappa_\sigma(y_i - f(x_i))$ and $n$ is the number of samples. The characteristics of this function for different values of the width parameter $\sigma$ are shown in Figure 3-1. Clearly, from Figure 3-1, it can be seen that the function $\ell_{\sigma,\beta}(f(x), y)$ is convex for higher values of the kernel width parameter ($\sigma > 1$), and as the parameter value decreases, it becomes non-convex. For $\sigma = 1$ it approximates the hinge loss function (the hinge loss is a typical function often used in SVMs).
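The empirical correntropic risk above is straightforward to evaluate numerically. The following NumPy sketch (an illustration, not code from the dissertation) computes $\hat{R}(f)$ from margins $y_i f(x_i)$ for a given kernel width $\sigma$, assuming the unnormalized Gaussian kernel $\kappa_\sigma(e) = \exp(-e^2/2\sigma^2)$, which is consistent with the scaling factor $\beta$.

```python
import numpy as np

def correntropic_risk(scores, labels, sigma=0.5):
    """Empirical correntropic risk beta * (1 - mean(kappa_sigma(1 - y*f(x))))."""
    margin = labels * scores                                 # alpha_i = y_i * f(x_i)
    kappa = np.exp(-(1.0 - margin) ** 2 / (2.0 * sigma ** 2))
    beta = 1.0 / (1.0 - np.exp(-1.0 / (2.0 * sigma ** 2)))   # makes the loss equal 1 at margin 0
    return beta * (1.0 - kappa.mean())

# Hypothetical scores f(x_i) and labels y_i
scores = np.array([0.8, -0.3, 1.2, -1.0])
labels = np.array([1, 1, -1, -1])
for sigma in (2.0, 1.0, 0.5):                                # convex -> hinge-like -> 0-1-like
    print(sigma, correntropic_risk(scores, labels, sigma))
```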

However, for smaller values of the kernel width the function almost approximates the 0-1 loss function, which is mostly unexplored territory for typical classification problems. In fact, behavior other than that at $\sigma = 2$ or $\sigma = 1$ has not been studied for other loss functions. This peculiar property of the correntropic function can be harmoniously used with the concept of convolution smoothing for finding global optimal solutions. Moreover, with a fixed lower value of the kernel width, suitable global optimization algorithms (heuristics like simulated annealing) can be used to find the global optimal solution. Next, elementary ideas about the different optimization algorithms that can be used with the correntropic loss function are discussed.

Convolution Smoothing (CS)
A Convolution Smoothing (CS) approach [87] forms the basis for one of the proposed methods of correntropic risk minimization. The main idea of the CS approach is sequential learning, where the algorithm starts from a high kernel width correntropic loss function and smoothly moves towards a low kernel width correntropic loss function, approximating the original loss function. The suitability of this approach can be seen in [86], where the authors used a two step approach for finding the global optimal solution. The currently proposed method is a generalization of that two step approach. Before discussing the proposed method, consider the following basic framework of CS. A general unconstrained optimization problem is defined as:

minimize: $g(u)$   (3a)
subject to: $u \in \mathbb{R}^n$,   (3b)

where $g : \mathbb{R}^n \mapsto \mathbb{R}$. The complexity of solving such problems depends upon the nature of the function $g$. If $g$ is convex in nature, then a simple gradient descent method will lead to the global optimal solution. If, instead, $g$ is non-convex, then the gradient descent algorithm will behave poorly and will converge to a local optimal solution (or, in the worst case, converge to a stationary point).
CS is a heuristic based global optimization method for solving problems illustrated in Formulation 3 when $g$ is non-convex. It is a specialized stochastic approximation method introduced in 1951 [77]. The usage of convolution in solving convex optimization problems was first proposed in 1972 [4]. Later, as an extension, a generalized method for solving non-convex unconstrained problems was proposed in 1983 [82]. The main motivation behind CS is that the global optimal solution of a multi-extremal function $g$ can be obtained from the information of a local optimal solution of its smoothed function. It is assumed that the function $g$ is a convoluted function of a convex function $g_0$ and other non-convex functions $g_i$, $i = 1, \dots, n$. The other non-convex functions can be seen as noise added to the convex function $g_0$. In practice, $g_0$ is intangible, i.e., it is impractical to obtain a deconvolution of $g$ into the $g_i$'s such that $\arg\min_u\{g(u)\} = \arg\min_u\{g_0(u)\}$. In order to overcome this difficulty, a smoothed approximation function $\hat{g}$ is used. This smoothed function has the following main property:
$$\hat{g}(u, \beta) \rightarrow g(u) \quad \text{as } \beta \rightarrow 0,$$   (3)
where $\beta$ is the smoothing parameter. For higher values of $\beta$, the function is highly smooth (nearly convex), and as the value of $\beta$ tends towards zero, the function takes the shape of the original non-convex function $g$. Such smoothed functions can be defined as:
$$\hat{g}(u, \beta) = \int_{-\infty}^{\infty} \hat{h}((u - v), \beta)\, g(v)\, dv,$$   (3)
where $\hat{h}(v, \beta)$ is a kernel function with the following properties:
- $\hat{h}(v, \beta) \rightarrow \delta(v)$ as $\beta \rightarrow 0$, where $\delta(v)$ is Dirac's delta function.
- $\hat{h}(v, \beta)$ is a probability distribution function.
- $\hat{h}(v, \beta)$ is a piecewise differentiable function with respect to $u$.
Moreover, the smoothed gradient of $\hat{g}(u, \beta)$ can be expressed as:
$$\nabla\hat{g}(u, \beta) = \int_{-\infty}^{\infty} \nabla\hat{h}(v, \beta)\, g(u - v)\, dv.$$   (3)
Equation 3 highlights a very important aspect of CS: it states that information about $\nabla g(v)$ is not required for obtaining the smoothed gradient. This is one of the crucial aspects of the smoothed gradient, and it can be easily extended to non-smooth optimization problems where $\nabla g(v)$ does not usually exist. Furthermore, the objective of CS is to find the global optimal solution of the function $g$. However, depending on the level of smoothness, a local optimal solution of the smoothed function may not coincide with the global optimal solution of the original function. Therefore, a series of sequential optimizations is required with different levels of smoothness. Usually, at first, a high value of $\beta$ is set, and an optimal solution $u^\star_\beta$ is obtained. Taking $u^\star_\beta$ as the starting point, the value of $\beta$ is reduced, and a new optimal value in the neighborhood of $u^\star_\beta$ is obtained. This procedure is repeated until the value of $\beta$ is reduced to zero. The idea behind these sequential optimizations is to end up in a neighborhood of $u^\star$ as $\beta \rightarrow 0$, i.e.,
$$u^\star_\beta \rightarrow u^\star \quad \text{as } \beta \rightarrow 0,$$   (3)
where $u^\star = \arg\min_u\{g(u)\}$. The crucial part in the CS approach is the decrement of the smoothing parameter. Different algorithms can be devised to decrement the smoothing parameter. In [87] a heuristic method (similar to simulated annealing) is proposed to decrease the smoothing parameter. Apparently, the main difficulty of applying the CS method to any optimization problem is defining a smoothed function with the property given by Equation 3. However, CS can be used efficiently with the proposed correntropic loss function, as the correntropic loss function can be seen as a generalized smoothed function for the true loss function (see Figure 3-1). The kernel width of the correntropic loss function can be visualized as the smoothing parameter. Therefore, the CS method is applicable to solving the classification problem when a suitable kernel width is unknown a priori (a practical situation).
On the other hand, if an appropriate value of the kernel width is known a priori (maybe an impractical assumption, but quite possible), then other efficient methods may be developed, like simulated annealing based methods. The crux of Chapter 3 is to present a correntropy minimization method over a non-parametric framework. Generally, the correntropy loss function is invex (and convex in certain cases). However, due to the presence of the non-convex framework, global optimization methods like CS or simulated annealing based methods are proposed.

Simulated Annealing (SA)
Simulated Annealing (SA) is a meta-heuristic method which is employed to find a good solution to an optimization problem. The method stems from thermal annealing, which aims to obtain a perfect crystalline structure (the lowest energy state possible) by a slow temperature reduction. Metropolis et al. in 1953 simulated this process of material cooling [13], and Kirkpatrick et al. applied the simulation method to solving optimization problems [53, 70]. SA can be viewed as an upgraded version of greedy neighborhood search. In a neighborhood search method, a neighborhood structure is defined in the solution space, and the neighborhood of a current solution is searched for a better solution. The main disadvantage of this type of search is its tendency to converge to a local optimal solution. SA tackles this drawback by using concepts from hill-climbing methods [64]. In SA, any neighborhood solution of the current solution is evaluated and accepted with a probability. If the new solution is better than the current solution, then it will replace the current solution with probability 1. If, instead, the new solution is worse than the current solution, then the acceptance probability depends upon the control parameters (temperature and change in energy). During the early iterations of the algorithm, the temperature is kept high, and this results in a high probability of accepting worse new solutions. After a predetermined number of iterations, the temperature is reduced strategically, and thus the probability of accepting a new worse solution is reduced.
These iterations continue until any of the termination criteria is met. The use of a high temperature at the earlier iterations (low temperature at the later iterations) can be viewed as exploration (exploitation, respectively) of the feasible solution space. Since each new solution is accepted with a probability, SA is also known as a stochastic method. A complete treatment of SA and its applications is carried out in [75]. Neighborhood selection strategies are discussed in [2]. Convergence criteria of SA are presented in [57]. In this work, SA will be used to train with the correntropic loss function when the kernel width is known a priori. Although the assumption of a known kernel width seems implausible, any known information about an unknown variable will increase the efficiency of solving an optimization problem. Moreover, comprehensive knowledge of the data may provide the appropriate kernel width that can be used in the loss function. Nevertheless, when the kernel width is unknown, a grid search can be performed over the kernel width space to obtain the kernel width that maximizes the classification accuracy. This is a typical approach when using kernel based soft margin SVMs, which generally involves a grid search over a two dimensional parameter space. So far, the function class ($\hat{\Gamma}$) has not been discussed. In the current work, a non-parametric function class, namely artificial neural networks, and a parametric function class, namely support vector machines, are considered. Next, an introductory review of artificial neural networks is presented.

Artificial Neural Networks (ANN)
Curiosity about studying the human brain led to the development of Artificial Neural Networks (ANNs). ANNs are mathematical models that share some of the properties of brain functions, such as nonlinearity, adaptability and distributed computation. The first mathematical model that depicted a working ANN used the perceptron, proposed by McCulloch and Pitts [62]. The actual adaptable perceptron model is credited to Rosenblatt [80].
The perceptron is a simple single-layer neuron model, which uses a learning rule similar to gradient descent. However, the simplicity of this model (a single layer) limits its applicability for modeling complex practical problems. Thereby, it was an object of censure in [66]. However, a question which instigated the use of multilayer neural networks was also kindled in [66]. After a couple of decades of research, neural network research exploded with impressive success. Furthermore, multilayered feedforward neural networks have been rigorously established as a function class of universal approximators [46]. In addition to that, different models of ANNs were proposed to solve combinatorial optimization problems. Furthermore, the convergence conditions for the ANN optimization models have been extensively analyzed [91].

Processing Elements (PEs) are the primary elements of any ANN. The state of a PE can take any real value in the interval [0, 1] (some authors prefer to use values between [-1, 1]; however, both definitions are interchangeable and have the same convergence behavior). The main characteristic of a PE is to do function embedding. In order to understand this phenomenon, consider a single-PE ANN model (the perceptron model) with n inputs and one output, shown in Figure 3-2. The total information incident on the PE is $\sum_{i=1}^{n} w_i x_i$. The PE embeds this information into a transfer function, and sends the output to the following layer. Since there is a single layer in the example, the output from the PE is considered as the final output. Moreover, if we define the transfer function $\varphi$ as:
$$\varphi\left(\sum_{i=1}^{n} w_i x_i + b\right) = \begin{cases} 1 & \text{if } \sum_{i=1}^{n} w_i x_i + b \geq 0 \\ 0 & \text{otherwise,} \end{cases}$$   (3)
where $b$ is the threshold level of the PE, then the single-PE perceptron can be used for binary classification, given that the data is linearly separable. The difference between this simple perceptron method of classification and support vector based classification is that the perceptron finds a plane that linearly separates the data, whereas the support vector machine finds the plane with maximum margin.
This does not indicate the superiority of one method over the other, since only a single PE is considered. In fact, this shows the capability of a single PE; but a single PE is incapable of processing the complex information that is required for most practical problems. Therefore, multiple PEs in multiple layers are used as universal classifiers. The PEs interact with each other via links to share the available information. The intensity and sense of the interactions between any two connected PEs is represented by a weight, or synaptic weight, on the links. The term synaptic is related to the nervous system, and is used in ANNs to indicate the weight between any two PEs. Usually, PEs in the (r-1)th layer send information to the rth layer using the following feedforward rule:
$$y_i = \sigma_i\left(\sum_{j \in (r-1)} w_{ji}\, y_j - U_i\right),$$   (3)
where PE $i$ belongs to the rth layer and any PE $j$ belongs to the (r-1)th layer. $y_i$ represents the state of the ith PE, $w_{ji}$ represents the weight between the jth PE and the ith PE, and $U_i$ represents the threshold level of the ith PE. The function $\sigma_i(\cdot)$ is the transfer function of the ith PE. Once the PEs in the final layer are updated, the error from the actual output is calculated using a loss function (this is the part where the correntropic loss function will be injected). The error or loss calculation marks the end of the feedforward phase of ANNs. Based on the error information, the backpropagation phase of ANNs starts. In this phase, the error information is utilized to update the weights, using the following rules:
$$w_{jk} = w_{jk} + \eta\, \delta_k\, y_j,$$   (3)
where
$$\delta_k = \frac{\partial F(\varepsilon)}{\partial \varepsilon}\, \sigma'(net_k),$$   (3)
where $\eta$ is the learning step size, $net_k = \sum_{j \in (r-1)} w_{ji}\, y_j - U_k$, and $F(\varepsilon)$ is the error function (or loss function). For the output layer, the weights are computed as:
$$\delta_k = \delta_0 = \frac{\partial F(\varepsilon)}{\partial \varepsilon}\, \sigma'(net_k) = (y - y_0)\, \sigma'(net_k),$$   (3)
and the deltas of the previous layers are updated as:
$$\delta_k = \delta_h = \sigma'(net_k)\sum_{o=1}^{N_0} w_{ho}\, \delta_o.$$   (3)
In the proposed approaches, the ANN is trained in order to minimize the correntropic loss function. In total, two different approaches to train the ANN are proposed. In one approach, the ANN is trained using the CS algorithm, whereas in the other proposed approach, the ANN is trained using the SA algorithm. In order to validate the results, we will not only compare the proposed approaches with conventional ANN training methods, but also compare them with the support vector machine based classification method. Next, a review of support vector machines is presented.

Support Vector Machines (SVMs)
The Support Vector Machine (SVM) is a popular supervised learning method [9, 22]. It has been developed for binary classification problems, but can be extended to multiclass classification problems [38, 101, 102], and it has been applied in many areas of engineering and biomedicine [44, 52, 69, 95, 104]. In general, supervised classification algorithms provide a classification rule able to decide the class of an unknown sample. In particular, the goal of the SVM training phase is to find a hyperplane that 'optimally' separates the data samples that belong to a class. More precisely, SVM is a particular case of hyperplane separation. The basic idea of SVM is to separate two classes (say A and B) by a hyperplane defined as:
$$f(x) = w^t x + b,$$   (3)
such that $f(a) < 0$ when $a \in A$, and $f(b) > 0$ when $b \in B$. However, there could be infinitely many possible ways to select $w$. The goal of SVM is to choose a best $w$ according to a criterion (usually the one that maximizes the margin), so that the risk of misclassifying a new unlabeled data point is minimum. A best separating hyperplane for unknown data will be the one that is sufficiently far from both classes (this is the basic notion of SRM), i.e., a hyperplane which lies in the middle of the following two parallel hyperplanes (support hyperplanes) can be used as a separating hyperplane:
$$w^t x + b = c$$   (3)
$$w^t x + b = -c.$$   (3)
Since $w$, $b$ and $c$ are all parameters, a suitable normalization leads to:
$$w^t x + b = 1$$   (3)
$$w^t x + b = -1.$$   (3)
Moreover, the distance between the supporting hyperplanes defined in Equations 3 & 3 is given by:
$$\frac{2}{\|w\|}.$$   (3)
In order to obtain the best separating hyperplane, the following optimization problem is solved:

maximize: $\frac{2}{\|w\|}$   (3a)
subject to: $y_i(w^t x_i + b) - 1 \geq 0 \quad \forall i$.   (3b)
The objective given in Equation 3a is replaced by minimizing $\|w\|^2/2$. Usually, the solution to Formulation 3 is obtained by solving its dual. In order to obtain the dual, consider the Lagrangian of Equation 3, given as:
$$L(w, b, u) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{N} u_i\left(y_i(w^t x_i + b) - 1\right),$$   (3)
where $u_i \geq 0\ \forall i$. Now, observe that Formulation 3 is convex. Therefore, strong duality holds, and Equation 3 is valid:
$$\min_{(w,b)} \max_{u} L(w, b, u) = \max_{u} \min_{(w,b)} L(w, b, u).$$   (3)
Moreover, from saddle point theory [5], the following equations hold:
$$w = \sum_{i=1}^{N} u_i\, y_i\, x_i$$   (3)
$$\sum_{i=1}^{N} u_i\, y_i = 0.$$   (3)
Therefore, using Equations 3 & 3, the dual of Formulation 3 is given as:

maximize: $\sum_{i=1}^{N} u_i - \frac{1}{2}\sum_{i,j=1}^{N} u_i u_j y_i y_j\, x_i^t x_j$   (3a)
subject to: $\sum_{i=1}^{N} u_i y_i = 0$,   (3b)
$u_i \geq 0 \quad \forall i$.   (3c)

Thus, solving Formulation 3 results in obtaining the support vectors, which in turn lead to the optimal hyperplane. This phase of the SVM is called the training phase. The testing phase is simple and can be stated as:
$$y_{test} = \begin{cases} -1,\ test \in A & \text{if } f(x_{test}) < 0 \\ +1,\ test \in B & \text{if } f(x_{test}) > 0. \end{cases}$$   (3)
The above method works very well when the data is linearly separable. However, most practical problems are not linearly separable. In order to extend the usability of SVMs, soft margins and kernel transformations are incorporated into the basic linear formulation. When considering the soft margin, Equation 3a is modified as:
$$y_i(w^t x_i + b) - 1 + s_i \geq 0 \quad \forall i,$$   (3)
where $s_i \geq 0$ are slack variables. The primal formulation is then updated as:

minimize: $\frac{1}{2}\|w\|^2 + c\sum_{i=1}^{N} s_i$   (3a)
subject to: $y_i(w^t x_i + b) - 1 + s_i \geq 0 \quad \forall i$,   (3b)
$s_i \geq 0 \quad \forall i$.   (3c)

Similar to the linear SVM, the Lagrangian of Formulation 3 is given by:
$$L(w, b, u, v) = \frac{1}{2}\|w\|^2 + c\sum_{i=1}^{N} s_i - \sum_{i=1}^{N} u_i\left(y_i(w^t x_i + b) - 1 + s_i\right) - v^t s,$$   (3)
where $u_i, v_i \geq 0\ \forall i$. Correspondingly, using the theories of saddle points and strong duality, the soft margin SVM dual is defined as:

maximize: $\sum_{i=1}^{N} u_i - \frac{1}{2}\sum_{i,j=1}^{N} u_i u_j y_i y_j\, x_i^t x_j$   (3a)
subject to: $\sum_{i=1}^{N} u_i y_i = 0$,   (3b)
$u_i \leq c \quad \forall i$,   (3c)
$u_i \geq 0 \quad \forall i$.   (3d)
Furthermore, the dot product $x_i^t x_j$ in Equation 3a is exploited to overcome the nonlinearity, i.e., by using kernel transformations into a higher dimensional space. Thus, the soft margin kernel SVM has the following dual formulation:

maximize: $\sum_{i=1}^{N} u_i - \frac{1}{2}\sum_{i,j=1}^{N} u_i u_j y_i y_j\, K(x_i, x_j)$   (3a)
subject to: $\sum_{i=1}^{N} u_i y_i = 0$,   (3b)
$u_i \leq c \quad \forall i$,   (3c)
$u_i \geq 0 \quad \forall i$,   (3d)

where $K(x, y)$ is any symmetric kernel. In this dissertation a Gaussian kernel is used, which is defined as:
$$K(x_i, x_j) = e^{-\gamma\|x_i - x_j\|^2},$$   (3)
where $\gamma > 0$. Therefore, in order to classify the data, two parameters $(c, \gamma)$ should be given a priori. The information about the parameters is obtained from the knowledge and structure of the input data. However, this information is intangible for practical problems. Thus, an exhaustive logarithmic grid search is conducted over the parameter space to find their suitable values. It is worthwhile to mention that treating $c$ and $\gamma$ as variables of the kernel SVM, and letting the kernel SVM try to obtain the optimal values of $c$ and $\gamma$, makes the classification problem in Formulation 3 intractable. Once the parameter values are obtained from the grid search, the kernel SVM is trained to obtain the support vectors. Usually the training phase of the kernel SVM is performed in combination with a re-sampling method called cross validation. During cross validation the existing data set is partitioned into two parts (training and testing). The model is built based on the training data, and its performance is evaluated using the testing data. In [84], a general method to select data for training SVMs is discussed.
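The logarithmic grid search over $(c, \gamma)$ described above can be carried out with standard tooling. The sketch below uses scikit-learn (an assumption; the dissertation does not name its software) to cross-validate a Gaussian-kernel soft-margin SVM over a log-spaced parameter grid; the data are synthetic and purely illustrative.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, train_test_split

# Hypothetical labeled data X (n x m) and y in {-1, +1}
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sign(X[:, 0] + 0.5 * rng.normal(size=200))

# Logarithmic grid over the soft-margin penalty c and the kernel parameter gamma
param_grid = {"C": np.logspace(-2, 3, 6), "gamma": np.logspace(-3, 1, 5)}
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_, search.score(X_test, y_test))
```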

Different combinations of training and testing sets are used to calculate the average accuracy. This process is mainly followed in order to avoid manipulation of the classification accuracy results by a particular choice of the training and testing data sets. Finally, the classification accuracy reported is the average classification accuracy over all the cross validation iterations. There are several cross validation methods available to build the training and testing sets. In this work, the RRSCV method is used to train the kernel SVM. The performance accuracy of the SVM is compared with the proposed approaches.

3.2. Traditional Classification Methods
The goal of any learning algorithm is to obtain the optimal rule $f^\star$ by solving the classification problem illustrated in Formulation 3. Based on the type of loss function used in risk estimation, the type of information representation, and the type of optimization algorithm, different classification algorithms can be designed. A summary of the classification methods that are used in this work is listed in Table 3-1. Next, the conventional non-parametric and parametric approaches are presented.

Conventional Non-parametric Approaches
A classical method of classification using an ANN involves training a Multi-Layer Perceptron (MLP) with a back-propagation algorithm. Usually, a sigmoidal function is used as the activation function, and a quadratic loss function is used for error measurement. The ANN is trained using a back-propagation algorithm involving a gradient descent method [63]. Before proceeding further to present the training algorithms, let us define the notation:
- $w^n_{jk}$: the weight between the kth and jth PEs at the nth iteration.
- $y^n_j$: output of the jth PE at the nth iteration.
- $net^n_k = \sum_j w^n_{jk}\, y^n_j$: weighted sum of all outputs $y^n_j$ of the previous layer at the nth iteration.
- $\sigma(\cdot)$: sigmoidal squashing function in each PE, defined as $\sigma(z) = \frac{1 - e^{-z}}{1 + e^{-z}}$.
- $y^n_k = \sigma(net^n_k)$: output of the kth PE of the current layer, at the nth iteration.
- $y^n \in \{\pm 1\}$: the true label (actual label) for the nth sample.

Next, the training algorithms are described. These algorithms mainly differ in the type of loss function used to train the ANNs.

Training ANN with Quadratic loss function using Gradient descent (AQG). This is the simplest and most widely known method of training an ANN. A three layered ANN (input, hidden, and output layers) is trained using a back-propagation algorithm. Specifically, the generalized delta rule is used to update the weights of the ANN, and the training equations are:
$$w^{n+1}_{jk} = w^n_{jk} + \eta\, \delta^n_k\, y^n_j,$$   (3)
where
$$\delta^n_k = \frac{\partial MSE(\varepsilon)}{\partial \varepsilon^n}\, \sigma'(net^n_k),$$   (3)
where $\eta$ is the learning step size, $\varepsilon = (y^n - y^n_0)$ is the error (or loss), and $MSE(\varepsilon)$ is the mean square error. For the output layer, the weights are computed as:
$$\delta^n_k = \delta^n_0 = \frac{\partial MSE(\varepsilon)}{\partial \varepsilon^n}\, \sigma'(net^n_k) = (y^n - y^n_0)\, \sigma'(net^n_k).$$   (3)
The deltas of the previous layers are updated as:
$$\delta^n_k = \delta^n_h = \sigma'(net^n_k)\sum_{o=1}^{N_0} w^n_{ho}\, \delta^n_o.$$   (3)
Training ANN with Correntropic loss function using Gradient descent (ACG). This method is similar to the AQG method; the only difference is the use of the correntropic loss function instead of the quadratic loss function. Furthermore, the kernel width of the correntropic loss is fixed to a smaller value (in [86], a value of 0.5 is illustrated to perform well). Moreover, since the correntropic function is non-convex at that kernel width, the ANN is trained with a quadratic loss function for some initial epochs. After a sufficient number of epochs (ACG1), the loss function is changed to the correntropic loss function. Thus (ACG1) is a parameter of the algorithm. The reason for using the quadratic loss function at the initial epochs is to prevent convergence to a local minimum at the early learning stages. Similar to AQG, the delta rule is used to update the weights of the ANN, and the training equations are:
$$w^{n+1}_{jk} = w^n_{jk} + \eta\, \delta^n_k\, y^n_j,$$   (3)
where
$$\delta^n_k = \frac{\partial F(\varepsilon)}{\partial \varepsilon^n}\, \sigma'(net^n_k),$$   (3)
where $\eta$ is the step length, and $F(\varepsilon)$ is a general loss function, which can be either the quadratic or the correntropic function depending on the current number of training epochs. For the output layer, the weights are computed as:
$$\delta^n_k = \delta^n_0 = \frac{\partial F(\varepsilon)}{\partial \varepsilon^n}\, \sigma'(net^n_k) = \begin{cases} \frac{\beta}{\sigma^2}\, e^{-\frac{(y^n - y^n_0)^2}{2\sigma^2}}\, (y^n - y^n_0)\, \sigma'(net^n_k) & \text{if } F \equiv \text{C-loss function} \\ (y^n - y^n_0)\, \sigma'(net^n_k) & \text{if } F \equiv \text{MSE function,} \end{cases}$$   (3)
where C-loss is the correntropic loss. The deltas of the previous layers are updated as:
$$\delta^n_k = \delta^n_h = \sigma'(net^n_k)\sum_{o=1}^{N_0} w^n_{ho}\, \delta^n_o.$$   (3)
Based on the results of [86], the value of ACG1 is taken as 5 epochs. The purpose of comparing the proposed approaches with the ACG method is to see the improvement in classification accuracy when the kernel width changes smoothly.
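For concreteness, a small NumPy sketch of the output-layer delta used by AQG/ACG is given below; the squashing function and its derivative follow the $\sigma(z)$ defined above, and the switch between the quadratic and correntropic delta mirrors the case distinction in the last equation. Variable names are illustrative assumptions.

```python
import numpy as np

def squash(z):
    """Sigmoidal squashing function sigma(z) = (1 - e^{-z}) / (1 + e^{-z})."""
    return (1.0 - np.exp(-z)) / (1.0 + np.exp(-z))

def squash_derivative(z):
    """Derivative of the squashing function: (1 - sigma(z)^2) / 2."""
    s = squash(z)
    return 0.5 * (1.0 - s ** 2)

def output_delta(y_true, y_out, net_k, loss="correntropy", sigma=0.5):
    """Delta of the output PE for the quadratic (AQG) or correntropic (ACG) loss."""
    err = y_true - y_out
    if loss == "mse":
        return err * squash_derivative(net_k)
    beta = 1.0 / (1.0 - np.exp(-1.0 / (2.0 * sigma ** 2)))
    return (beta / sigma ** 2) * np.exp(-err ** 2 / (2.0 * sigma ** 2)) * err * squash_derivative(net_k)
```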

Conventional Parametric SVM Approach
Training soft margin SVM with Gaussian kernel (SGK). SVM is one of the most widely known parametric methods in classification. In the present work, a Gaussian kernel based soft margin SVM is used. The SVM is implemented in two steps. In the first step, optimal parameters (kernel width and cost penalty) are obtained via exhaustive search over the parameter space. Once the optimal parameters are obtained, in the second step, the kernel SVM is trained with the optimal parameters. From the grid search, appropriate values of the parameters are selected. Based on the selected values of the parameters, the SVM is trained with 100 Monte-Carlo simulations. In each simulation, the data is divided into two random subsets for training and testing (RRSCV method). The use of the kernel SVM in Chapter 3 is to compare against the results of the proposed algorithms. Next, the proposed algorithms are presented.

3.3. Proposed Classification Methods
In Section 3.3, two optimization methods that utilize the correntropic loss function are proposed. In one of the methods, the kernel width acts as a variable, whereas in the other method the kernel width is set as a parameter.

Training ANN with Correntropic Loss Function Using Convolution Smoothing (ACC)
Similar to the previous ANN based methods, a back-propagation algorithm is used to train the ANN, i.e., in this method the weights are updated using the delta rule. However, the cost function F is always the correntropic function, and the kernel width is changed over the training period. The kernel width acts as the smoothing parameter of the CS algorithm, and initially the kernel width is set to a value of 2. As the algorithm proceeds, the kernel width is smoothly reduced until it reaches 0.5. Furthermore, as the algorithm progresses, if the delta rule leads to a high error value, then the kernel width is increased to a value of 2 with probability P_accept, to escape from local minima. This probability is reduced exponentially depending on the number of epochs.
The ACC method can be seen as a stochastic CS method which minimizes the correntropic loss function. The training equations for the underlying ANN framework are as follows:
$$w^{n+1}_{jk} = w^n_{jk} + \eta\, \delta^n_k\, y^n_j,$$   (3)
where, for the output layer, the deltas and weights are computed as:
$$\delta^n_k = \frac{\partial F_C(\varepsilon)}{\partial \varepsilon^n}\, \sigma'(net^n_k)$$   (3)
$$\delta^n_k = \delta^n_0 = \frac{\partial F_C(\varepsilon)}{\partial \varepsilon^n}\, \sigma'(net^n_k)$$   (3)
$$= \frac{\beta}{\sigma^2}\, e^{-\frac{(y^n - y^n_0)^2}{2\sigma^2}}\, (y^n - y^n_0)\, \sigma'(net^n_k),$$   (3)
where $F_C$ is the correntropic loss function with kernel width $\sigma$, and $F_C(\varepsilon)$ is the error at the output layer. The deltas of the previous layers are updated as:
$$\delta^n_k = \delta^n_h = \sigma'(net^n_k)\sum_{o=1}^{N_0} w^n_{ho}\, \delta^n_o.$$   (3)
The ACC method is illustrated in Algorithm 3.1 for a given n x p data matrix with r elements in the middle layer. Algorithm 3.1 represents the ACC learning method for the block update scenario. For the sample by sample update scenario, Algorithm 3.1 is adjusted appropriately to incorporate the CS mechanism. In Algorithm 3.1, $\sigma_0$ and $\sigma_1$ are the parameters that control the flow of the ACC method, and their values are taken as 2 and 0.5e respectively (where e is a vector of ones). f_1 and f_2 are the functions that update $\sigma$, and P_accept is the probability of accepting noisy solutions. For the sake of simplicity, f_1 and P_accept are taken as exponentially decreasing functions, and f_2 updates $\sigma$ to a value of 2.

Training ANN with Correntropic Loss Function Using Simulated Annealing (ACS)
Unlike the previous gradient descent based learning methods, in this method an SA algorithm is used to train the ANN, i.e., no gradient search is involved in the ANN.
This method assumes that the correntropic loss function has a fixed kernel width. Since the kernel width determines the convexity of the loss function, a gradient descent method cannot be used as a learning method in a generalized framework. Hence, the SA algorithm is used as a learning method to avoid convergence to a local minimum. The ACS method is illustrated in Algorithm 3.2 for a given n x p data matrix with r elements in the middle layer. Furthermore, the kernel width $\sigma$ is a given (fixed) parameter of the algorithm. Moreover, the ACS algorithm is used in block update mode only, unlike the ACC algorithm (i.e., the ACC algorithm can be used in a sample based or block based update mode). In Algorithm 3.2, $T_0$ is the initial temperature, and its value is taken as 1. $f_1(T)$ and $P_{accept}(T)$ are two different functions of the temperature. $f_1(T)$ is a simple exponential cooling function, whereas the function $P_{accept}(T)$ is an exponential probability, which depends upon the values of $T$, $\Phi_a$ and $\Phi_{a-1}$. There are two termination criteria for the ACC and ACS methods: either the total error falls below minErr (taken as 0.001), or the number of epochs exceeds MaxEpochs (MaxEpochs is a parameter for the experimental runs, and is varied from 1, ..., 10). The implementation of the proposed algorithms on simulated and real data is presented in Chapter 5. In Chapter 4, another well known problem of data analysis is introduced, and robust methods to solve the problem are proposed.
Table 3-1. Notation and description of proposed (z) and existing (X) methods
| Notation | Information Representation | Loss Function | Optimization Algorithm |
| Exact Method - AQG (X) | Non-parametric (ANN) | Quadratic | Gradient Descent |
| Exact Method - ACG (X) | Non-parametric (ANN) | Initially Quadratic, shifts to Correntropy with fixed kernel width | Gradient Descent |
| Heuristic Method - ACC (z) | Non-parametric (ANN) | Correntropy with varying kernel width | Convolution Smoothing |
| Heuristic Method - ACS (z) | Non-parametric (ANN) | Correntropy with fixed kernel width | Simulated Annealing |
| Exact Method - SGQ (X) | Parametric (SVM) | Quadratic with Gaussian kernel | Quadratic Optimization |

Algorithm 3.1: ACC Method
input: Classification data, structure and transfer functions of ANN
output: Optimal weights
1  begin
2    Randomly initialize W(0);
3    Set σ = σ_0 and the remaining parameters to their initial values;
4    Set termination = false;
5    while termination == false do
6      Execute BLOCK FEEDFORWARD PHASE-ANN;
7      if random() < P_accept then
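The kernel-width schedule that drives the ACC method (anneal σ from 2 toward 0.5, with an exponentially decaying probability P_accept of resetting σ to 2 when the error increases) can be sketched as follows. The schedule constants and function names are illustrative assumptions, not the dissertation's implementation.

```python
import numpy as np

def acc_kernel_schedule(train_epoch, n_epochs=10, sigma_start=2.0, sigma_end=0.5, rng=None):
    """Sketch of the ACC kernel-width schedule: smooth decay with probabilistic resets."""
    rng = np.random.default_rng(rng)
    sigma, prev_error = sigma_start, np.inf
    decay = (sigma_end / sigma_start) ** (1.0 / max(n_epochs - 1, 1))
    for epoch in range(n_epochs):
        error = train_epoch(sigma)                 # one block-update epoch at kernel width sigma
        p_accept = np.exp(-epoch)                  # exponentially decreasing reset probability
        if error > prev_error and rng.random() < p_accept:
            sigma = sigma_start                    # escape a suspected local minimum
        else:
            sigma = max(sigma * decay, sigma_end)  # smooth reduction of the smoothing parameter
        prev_error = error
    return sigma

# Usage with a hypothetical training step that returns the correntropic error:
# final_sigma = acc_kernel_schedule(lambda s: my_block_update(weights, data, s))
```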
Figure 3-1. Correntropic, quadratic and 0-1 loss functions. A) Margin on x-axis. B) Error on x-axis.
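Figure 3-1 can be reproduced qualitatively with a few lines of plotting code. The sketch below (an illustration, not the original figure script) plots the correntropic loss for several kernel widths against the quadratic and 0-1 losses as functions of the margin $\alpha = y f(x)$.

```python
import numpy as np
import matplotlib.pyplot as plt

margin = np.linspace(-2, 2, 400)
plt.plot(margin, (1 - margin) ** 2, label="quadratic")
plt.plot(margin, (margin <= 0).astype(float), label="0-1")
for sigma in (2.0, 1.0, 0.5):
    beta = 1.0 / (1.0 - np.exp(-1.0 / (2.0 * sigma ** 2)))
    loss = beta * (1.0 - np.exp(-(1.0 - margin) ** 2 / (2.0 * sigma ** 2)))
    plt.plot(margin, loss, label=f"correntropic, sigma={sigma}")
plt.xlabel("margin y f(x)")
plt.ylabel("loss")
plt.legend()
plt.show()
```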

Algorithm 3.2: ACS Method
input: Classification data, structure and transfer functions of ANN
output: Optimal weights
1  begin
2    Randomly initialize W(0);
3    Set σ = σ_0 and the remaining parameters to their initial values;
4    Initialize a = 0 and T = T_0;
5    Φ_0 = F_C(ε_0); Set termination = false;
6    while termination == false do
7      T = f_1(T);
8      a = a + 1;
9      W_a = neighbor(W_{a-1});
10     Execute BLOCK FEEDFORWARD PHASE-ANN;
11     Φ_a = F_C(ε_a);
12     if Φ_a < Φ_{a-1} then
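A compact Python sketch of the SA weight search in Algorithm 3.2 (random neighbor moves, exponential cooling f_1(T), and Metropolis-style acceptance based on the correntropic error Φ) is given below. The neighborhood move, cooling rate, and loss callable are illustrative assumptions rather than the dissertation's implementation.

```python
import numpy as np

def acs_train(loss, w0, t0=1.0, cooling=0.95, n_iter=500, step=0.1, rng=None):
    """SA sketch for Algorithm 3.2: minimize loss(w) without gradients."""
    rng = np.random.default_rng(rng)
    w_best = w_curr = np.asarray(w0, dtype=float)
    phi_best = phi_curr = loss(w_curr)
    temperature = t0
    for _ in range(n_iter):
        temperature *= cooling                                 # f1(T): exponential cooling
        w_new = w_curr + step * rng.normal(size=w_curr.shape)  # neighbor(W_{a-1})
        phi_new = loss(w_new)                                  # Phi_a = F_C(eps_a)
        accept = phi_new < phi_curr or rng.random() < np.exp(-(phi_new - phi_curr) / temperature)
        if accept:
            w_curr, phi_curr = w_new, phi_new
            if phi_new < phi_best:
                w_best, phi_best = w_new, phi_new
    return w_best, phi_best

# Usage with a hypothetical correntropic loss of the network weights:
# w_opt, phi_opt = acs_train(lambda w: correntropic_risk(forward(w, X), y, sigma=0.5), w_init)
```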
CHAPTER 4
ROBUST SIGNAL SEPARATION
(Some sections of Chapter 4 have been published in Computers & Operations Research and Neuromethods.)

Signal separation is a specific case of signal processing which aims at identifying unknown source signals $s_i(t)$ ($i = 1, \dots, n$) from their observable mixtures $x_j(t)$ ($j = 1, \dots, m$). In this problem, a mixture is assumed to be a linear transformation of the sources, i.e., $x(t) = A s(t)$, where $A \in \mathbb{R}^{m \times n}$ is the mixing matrix (sometimes called the dictionary). Typically, $t$ is any acquisition variable over which a sample of the mixture (a column, for a discrete acquisition variable) is collected. The most common types of acquisition variables are time and frequency. However, position, wavenumber, and other indices can be used depending on the nature of the physical process under investigation. In addition to the sources being unknown, knowledge about the mixing is also assumed to be unavailable. The generative model of the problem in its standard form can be written as:
$$X = AS + N,$$   (4)
where $X \in \mathbb{R}^{m \times N}$ denotes the mixture matrix, $A \in \mathbb{R}^{m \times n}$ is the mixing matrix, $S \in \mathbb{R}^{n \times N}$ denotes the source matrix, and $N \in \mathbb{R}^{m \times N}$ denotes uncorrelated noise. Since both $A$ and $S$ are unknown, the signal separation problem is called the Blind Signal Separation (BSS) problem. The BSS problem first appeared in [45], where the authors proposed the seminal idea of BSS via an example of two source signals ($n = 2$) and two mixture signals ($m = 2$). Their objective was to recover the source signals from the mixture signals, without any further information. A classical illustrative example for the BSS model is the cocktail party problem, where a mixture of sound signals from simultaneously speaking individuals is available (see Figure 4-1 for a simple illustration).
In a nutshell, the goal in BSS is to identify and extract the sources (Figure 4-1B) from the available mixture signals (Figure 4-1A). This problem caught the attention of many researchers due to its wide applicability in different scientific research areas. A general setup of the BSS problem in computational neuroscience is depicted in Figure 4-2. Any surface (or scalp) noninvasive cognitive activity recording can be used as a specific example. Depending upon the scenario, the mixture can be EEG, MEG or fMRI data. Typically, physical substances like the skull, brain matter, muscles, and the electrode-skull interface act as mixers. The goal is to identify the internal source signals, which hopefully reduces the mixing effect during further analysis. Currently, most of the approaches to BSS in computational neuroscience are based on statistical independence assumptions. There are very few approaches that exploit the sparsity in the signals. Sparsity assumptions can be considered as flexible approaches for BSS compared to the independence assumption, since independence requires the sources to be at least uncorrelated. In addition to that, if the number of sources is larger than the number of mixtures (the underdetermined case), then the statistical independence assumption cannot reveal the sources, but it can reveal the mixing matrix. For sparsity based approaches, there are very few papers in the literature (compared to independence based approaches) that have been devoted to developing identifiability conditions and to developing methods for uniquely identifying (or learning) the mixing matrix [1, 34, 37, 54]. In Section 4.1, an overview of the BSS problem is presented. Sufficient identifiability conditions are revised, and their implications for the solution methodology are discussed in Section 4.2. Different well known approaches that are used to find the solution of the BSS problem are also briefly presented. Finally, the proposed algorithms are presented in Section 4.3.

Other Look-alike Problems. BSS is a special type of Linear Matrix Factorization (LMF) problem. There are many other methods that can be described in the form of LMF.
For instance, Nonnegative Matrix Factorization (NMF), Morphological Component Analysis (MCA), Sparse Dictionary Identification (SDI), etc. The three properties that differentiate BSS from other LMF problems are:
- The model is assumed to be generative: in BSS, the data matrix X is assumed to be a linear mixture of S.
- Completely unknown source and mixing matrices: some of the LMF methods (like MCA) assume partial knowledge about the mixing.
- Identifiable source and mixing matrices: some of the LMF methods (like NMF, SDI) focus on estimating A and S without any condition for identifiability.
NMF can be considered a dimensionality reduction method like Principal Component Analysis (PCA). Similarly, SDI estimates A such that X = AS and S is as sparse as possible. Although the NMF and SDI problems look similar to BSS, they have no precise notion about the source signals or their identifiability.

4.1. Signal Separation Problem
From this point onward, a flat representation of the mixture data is assumed, i.e., the mixture signals can be represented by a matrix containing a finite number of columns. Before presenting the formal definition of the BSS problem, consider the following notation that will be used throughout Chapter 4: a scalar is denoted by a lowercase letter, such as $y$; a column vector is denoted by a bold lowercase letter, such as $\mathbf{y}$; and a matrix is denoted by a bold uppercase letter, such as $\mathbf{Y}$. For example, in Chapter 4 the mixtures are represented by the matrix X. The ith column of matrix X is represented as $x_i$, the ith row of matrix X is represented as $x^i$, and the element in the ith row and jth column of matrix X is represented as $x_{i,j}$. Now, the BSS problem can be mathematically stated as: let $X \in \mathbb{R}^{m \times N}$ be generated by a linear mixing of sources $S \in \mathbb{R}^{n \times N}$; given X, the objective of the BSS problem is to find two matrices $A \in \mathbb{R}^{m \times n}$ and S, such that the three matrices are related as X = AS. In the theoretical development of the problem and the solution methods, the noise factor is ignored. Without noise, the problem may appear easy; however, from the very definition of the problem, it can be seen that the solution of the BSS problem suffers from uniqueness and identifiability issues.
Thus, the notion of a good solution to the BSS problem must be precisely defined. Next, the uniqueness and identifiability issues are explained.

Uniqueness: Let $\Lambda$ and $\Pi \in \mathbb{R}^{n \times n}$ be a diagonal matrix and a permutation matrix, respectively. Let A and S be such that X = AS. Consider the following:
$$X = AS = (A\Lambda\Pi)\left(\Pi^{-1}\Lambda^{-1}S\right) = A_a S_a.$$
Thus, even if A and S are known, there can be infinitely many equivalent solutions of the form $A_a$ and $S_a$. The goal of a good BSS solution algorithm should be to find at least one of the equivalent solutions. Due to the inability to find the unique solution, not only is the information regarding the order of the sources lost, but also the information about the energy contained in the sources is lost. Generally, normalization of the rows of S may be used to tackle scalability. Also, a relative or normalized form of energy can be used in the further analysis. Theoretically, any information pertaining to the order of the sources is impossible to recover. However, problem specific knowledge will be helpful in identifying the correct order for further analysis.

Identifiability: Let $\Gamma \in \mathbb{R}^{n \times n}$ be any nonsingular matrix. Let A and S be such that X = AS. Consider the following:
$$X = AS = (A\Gamma)\left(\Gamma^{-1}S\right) = A_g S_g.$$
Thus, even if A and S are known, there can be infinitely many non-identifiable solutions of the form $A_g$ and $S_g$. The goal of a BSS solution algorithm is to avoid the non-identifiable solutions. Typically, the issue of identifiability arises from the dimension and structure of A and S.
The key idea to correctly identify both matrices (of course with the unavoidable scaling and permutation ambiguity) is to impose structural properties on S while solving the BSS problem (see Figure 4-3). Some widely known BSS solution approaches [90] from the literature are summarized below.

Statistical Independence Assumptions: One of the earliest approaches to solving the BSS problem is to assume statistical independence among the source signals. These approaches are termed the Independent Component Analysis (ICA) approaches. The fundamental assumption in ICA is that the rows of matrix S are statistically independent and non-gaussian [50, 94].

Sparse Assumptions: Apart from ICA, the other type of approaches which provide sufficient identifiability conditions are based on the notion of sparsity in the S matrix. These approaches can be named Sparse Component Analysis (SCA) approaches. There are two distinct categories in the sparse assumptions:
- Partially Sparse Nonnegative Sources (PSNS): In this category, along with a certain level of sparsity, the elements of S are assumed to be nonnegative. Ideas of this type of approach can be traced back to the Nonnegative Matrix Factorization (NMF) method. The basic assumption in NMF is that the elements of S and A are nonnegative [21]. However, in the case of the BSS problem the nonnegativity assumptions on the elements of matrix A can be relaxed [67] without damaging the identifiability of A and S.
- Completely Sparse Components (CSC): In this category, no sign restrictions are placed on the elements of S, i.e., $s_{i,j} \in \mathbb{R}$. The only assumption used to define the identifiability conditions is the existence of a certain level of sparsity in every column of S [32].
At present, these are the only known BSS approaches that can provide sufficient identifiability conditions (uniqueness up to permutation and scalability). In fact, the sparsity based approaches (see [34, 67]) are relatively new in the area of BSS when compared to the traditional statistical independence approaches (see [50]). One of the novelties that sparsity based methods brought to the BSS problem is the verifiability of the sparse assumptions on finite length data. Furthermore, not only overdetermined but also underdetermined scenarios of the BSS problem can be handled by the sparsity based methods.
However, the underdetermined scenario requires a higher level of sparsity than the simple m = n scenario. In Section 4.2, a brief discussion of the important issues of the sparsity based methods is presented [90].

4.2. Traditional Sparsity Based Methods
The earliest methods that proposed the notion of sparsity and the identifiability conditions for BSS problems can be found in [33, 34, 67]. From the literature, the different approaches to solving the Sparse Component Analysis (SCA) problem can be grouped into two distinct classes. The main difference between the two classes is based on the nonnegativity assumption on the elements of the S matrix. The reason for such a division is the structure of the resulting SCA problem. Typically, when the sources are non-negative the SCA problem can be boiled down to a convex programming problem. Thus, the algorithms for the class with nonnegativity assumptions are computationally inexpensive. For the other class, the SCA problem generally results in a nonconvex optimization problem. Therefore, finding a global optimal solution, when the source elements are real, is a computationally expensive task. SCA can be considered a more flexible method for BSS than ICA. ICA requires the sources to be statistically independent, whereas SCA requires sparsity of the sources (a weaker assumption). In addition to that, ICA is not suitable if the number of sources is larger than the number of mixtures (the underdetermined case). Typical ideas of SCA can be found in [34, 37, 54]. Furthermore, the identifiability conditions on X that improve the separability of sources have been studied by a few researchers [1, 34].

Partially Sparse Nonnegative Sources (PSNS)
In many physiological data scenarios, the notion that the source signal is nonnegative seems to be valid; for example, medical imaging, NMR, ICP, HR, etc. Using this ideology, and the fact that ICA at least requires completely uncorrelated source signals, a partially correlated BSS method can be developed. A source matrix S is defined to be partially correlated when the rows of a certain set of columns of S are uncorrelated.
However, the rows of the full S matrix are correlated. For the sources on which the nonnegativity assumption holds, a partially correlated assumption is less restrictive than ICA. The primary idea on which this class of SCA methods works can be summarized as follows: any vector $x_i$, $i = 1, \dots, N$, is nothing but a nonnegative linear combination of the vectors $a_j$, $j = 1, \dots, n$. Thus, there are sparsity assumptions on S that lead to proper identification of A, and these can be exploited in order to identify both A and S. One of the earliest approaches of this type, presented by Naanaa and Nuzillard [67], is called the Positive and Partially Correlated (PPC) method. Next, the sufficient identifiability conditions for PPC are discussed.

Sufficient Identifiability Conditions on A and S for PPC [67]
The following are the two sufficient conditions required for unique identification of A and S (up to scaling and permutation ambiguity):
- PPC1: There exists a diagonal submatrix in S, i.e., for each row $S^i$ there exists a $j \in \{1, \dots, N\}$ such that $s_{i,j} > 0$ and $s_{k,j} = 0$ for $k = 1, \dots, i-1, i+1, \dots, n$.
- PPC2: The columns of A are linearly independent.

Implication of the Identifiability Conditions for PPC
Due to the restriction given in PPC1, the PPC BSS problem boils down to the following: all the columns of matrix X span a cone in $\mathbb{R}^m$, where the edges of the cone are nothing but the columns of matrix A. Using this simplification, suitable linear or convex programming problems can be solved to identify the edges of the cone spanned by the columns of X. Finding these edges results in the identification of A. The matrix S can then be obtained by using the Moore-Penrose pseudoinverse of A.
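Once A has been identified (e.g., from the cone edges), the last step above is a one-liner. The sketch below recovers S with the Moore-Penrose pseudoinverse using NumPy; the matrices are hypothetical and serve only to illustrate the noiseless case.

```python
import numpy as np

# Hypothetical mixing matrix A (m x n) and nonnegative sources S (n x N)
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))
S_true = np.abs(rng.normal(size=(3, 100)))
X = A @ S_true                       # noiseless generative model X = A S

S_est = np.linalg.pinv(A) @ X        # Moore-Penrose pseudoinverse of A applied to X
print(np.allclose(S_est, S_true))    # exact recovery when A has full column rank
```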

PPC Approaches
In [67], a least squares minimization problem is proposed to solve the PPC problem (the problem is solved for each candidate column $x_j$). The formulation is given as:

minimize: $\left\|\sum_{i=1,\, i \neq j}^{N} \lambda_i x_i - x_j\right\|^2$   (4a)
subject to: $\lambda_i \geq 0 \quad \forall i$.   (4b)

In addition to the above formulation, and based on the same edge extraction idea, many recent works are directed towards efficient edge extraction from X [14, 103]. Another recent modification of the PPC approach is called Positive everywhere Partially orthogonal Dominant intervals (PePoDi) [88]. In PePoDi, the PPC1 condition is modified by stating that the last row of S is positive dominant and does not satisfy PPC1. However, this modification comes with the price of restricting A to be nonnegative. Thus, the PePoDi method can be seen as a special case of the NMF problem.

Complete Sparse Component Sources [36]
When the sources are not non-negative, the BSS problem transforms into a nonconvex optimization problem. In fact, the only identifiability condition known for real sources is to have sparsity in each column of S. Before defining the complete sparse component (CSC) criteria, consider the following definitions:
CSC-conditioned: A matrix M is said to be CSC-conditioned if every square submatrix of M is nonsingular.
CSC-sparse: A matrix M is said to be CSC-sparse if every column of M has at most m - 1 nonzero elements.
CSC-representable: A matrix M is said to be CSC-representable if for any n - m + 1 selected rows of M, there exist m columns such that:
- all the m columns contain zeros in the selected rows, and
- any m - 1 subset of the m columns is linearly independent.

Sufficient Identifiability Conditions on A and S for CSC
The following are the three sufficient conditions required for unique identification of A and S (up to scaling and permutation ambiguity):
- CSC1: A is CSC-conditioned,
- CSC2: S is CSC-sparse,
- CSC3: S is CSC-representable.

Implication of the Identifiability Conditions for CSC
Due to the restrictions given in CSC2 and CSC3, the CSC BSS problem boils down to the following: all the columns of matrix X lie on hyperplanes (of dimension m - 1) passing through the origin, where the normal vectors of the hyperplanes are nothing but the orthonormal complement of the matrix A. Using this transformation, suitable hyperplane clustering methods can be used to identify the hyperplanes defined by X. Since hyperplane clustering is non-convex, the CSC BSS problem is relatively difficult to solve when compared to the PPC BSS problem.

CSC Approaches
Given a data matrix $X \in \mathbb{R}^{m \times N}$, the goal of CSC is to find two matrices, namely the mixing matrix ($A \in \mathbb{R}^{m \times n}$) and the source matrix ($S \in \mathbb{R}^{n \times N}$), such that X = AS. Under the CSC1, CSC2 and CSC3 assumptions, uniqueness up to permutation and scalability can be achieved. Next, the basic formulation of the CSC BSS problem is described, and the different improvements that can be made to the basic formulation are proposed. Before proceeding further, let us describe all the notation that will be used in the following formulations:
Given Data:
- p: index for a point, $p \in \{1, \dots, N\}$
- X: $(x_1, \dots, x_N)$ = data matrix of N points, $x_p \in \mathbb{R}^m$
- n: the column size of the dictionary matrix
Variables:
- h: index for a hyperplane, $h \in \{1, \dots, n\}$
- $w_h$: normal vector of the hth hyperplane, $w_h \in \mathbb{R}^m$
- $u_{hp}$: distance between the pth point and the hth hyperplane, $u_{hp} \in \mathbb{R}_+$
- $t_{hp}$: 1 if the pth point belongs to the hth hyperplane, 0 otherwise
- $v_{hp}$: ancillary variable, which reflects the product $t_{hp} u_{hp}$ in a linearised form

Mathematically, the set of hyperplanes containing the data points is a solution to Formulation 4:

minimize: $\sum_{p=1}^{N}\min_{1 \leq h \leq n}\left(w_h^t x_p - b_h\right)^2$   (4a)
subject to: $\|w_h\|_2 = 1$,   (4b)
$w_h \in \mathbb{R}^m$,   (4c)
$b_h \in \mathbb{R}$.   (4d)

Therefore, any solution of Formulation 4 will represent a w(2)-skeleton of X [10]. It consists of n hyperplanes defined as:
$$H_h = \{x_p \in \mathbb{R}^m : w_h^t x_p = b_h\} \quad \forall h = 1, \dots, n.$$   (4)
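The objective in Formulation 4 is easy to evaluate for candidate hyperplanes. The following NumPy sketch (illustrative, not from the dissertation) computes the squared-distance clustering cost of the w(2)-skeleton and the assignment of each point to its nearest hyperplane.

```python
import numpy as np

def skeleton_cost(X, W, b):
    """Cost sum_p min_h (w_h^T x_p - b_h)^2 for points X (m x N), normals W (m x n), offsets b (n,)."""
    W = W / np.linalg.norm(W, axis=0)            # enforce ||w_h||_2 = 1
    residuals = (W.T @ X - b[:, None]) ** 2      # (n x N) squared distances to each hyperplane
    assignment = residuals.argmin(axis=0)        # nearest hyperplane per point (the t_hp indicator)
    return residuals.min(axis=0).sum(), assignment
```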

Another approach to hyperplane clustering is presented in [81], which can be described via Formulation 4:

minimize: $\sum_{p=1}^{N}\min_{1 \leq h \leq n}\left|w_h^t x_p - b_h\right|$   (4a)
subject to: constraints (4b)-(4d) of Formulation 4.   (4b)

The solution to Formulation 4 defines the w(1)-skeleton of X. Formulation 4 is analogous to Formulation 4 in defining the hyperplanes. However, the main difference is that Formulation 4 minimizes the absolute distances, whereas Formulation 4 minimizes the squared distances. This does not seem to be a huge difference; however, absolute distance minimization is considered to be a robust approach. The equivalence of both formulations and the uniqueness of their solution under sparsity assumptions are discussed in [20, 32]. Moreover, Georgiev et al. [32] have reduced the hyperplane clustering problem to a bilinear formulation in the case where every data point belongs to only one skeleton hyperplane (and therefore the minimum value in Formulation 4 is zero). Then Formulation 4 is equivalent to Formulation 4. In order to obtain the bilinear formulation, the non-linear constraint given in Equation 4e is replaced with $w_h^t e = 1$ (where e is a vector of all ones). This replacement does not change the hyperplanes defined by their solutions and those defined by solutions of Formulation 4. Different optimization methods can be applied to solve the bilinear problem. In [32], an n-plane clustering algorithm via linear programming is proposed to solve the bilinear problem. Algorithm 4.1 briefly describes this n-plane clustering algorithm. The initial approaches to the CSC BSS problem are based on the bilinear hyperplane clustering approach [32].
convergencetolocalminima.Infact,mostofthehyperplaneclusteringmethodsintheliteratureareconnedto7to8dimensions. minimize:NXp=1nXh=1thpuhp (4a) subjectto:wthxpuhp8h,p, (4b) wthxp)]TJ /F10 11.955 Tf 21.92 0 Td[(uhp8h,p, (4c) Xhthp=18p, (4d) jjwhjj2=18h, (4e) thp18h,p, (4f) thp08h,p, (4g) wh2Rm8h, (4h) uhp08h,p. (4i) 4.3.ProposedSparsityBasedMethodsThegoalofSection 4.3 istopresenttheproposedapproachesfortheSCAproblem.Specically,thestandardpreprocessingmethodforBSSproblemisillustrated.Inadditiontothat,novelmethodsforbothPPCandCSCcaseofSCAproblemaredeveloped.Furthermore,arobustcorrentropyminimizationmethodforsourceextractionisalsoproposed.DataPreprocessingandRecovery Beforeusingtheproposedmethods,thegivendataXispreprocessedusingtheprewideningmethod.Thisisdoneinordertoreducetheill-conditioningeffectonXarisingfromthedictionarymatrix.Forexample,considerthesourcematrixS2R380,showninFigure 4-4 .ThedictionarymatrixisA(seematrixinEquation 4 ).Inthis 86

PAGE 87

examplem

Suppose for the moment that SS^T = I, and let XX^T = QΛQ^T be the eigendecomposition of XX^T, with X̂ = Λ^{-1/2}Q^T X and Â = Λ^{-1/2}Q^T A the prewhitened data and dictionary. Then ÂÂ^T = I, as shown below:

ÂÂ^T = Λ^{-1/2}Q^T A A^T Q Λ^{-1/2}
     = Λ^{-1/2}Q^T A S S^T A^T Q Λ^{-1/2}
     = Λ^{-1/2}Q^T X X^T Q Λ^{-1/2}
     = Λ^{-1/2}Q^T Q Λ Q^T Q Λ^{-1/2}
     = I.

However, we do not assume that SS^T = I; the above transformation still helps in finding the hyperplanes. For the case when m = n, once the optimal solution for all n optimization problems is obtained, the source matrix from non-noisy mixtures is obtained as

S̄ = Ŵ^T X̂ = W^T X,

where S̄ = PS and P is a monomial matrix (i.e., each row and each column contains only one non-zero element). The source extraction method for noisy mixtures is considered at the end of Chapter 4. Unless any other information about A is known, the correspondence between the rows of S̄ and S is hard to determine. Similarly, the matrix Â is obtained by solving

Ŵ^T Â = P.

However, P is unknown. Therefore, the system can be solved by a simple assumption on the P matrix (i.e., P = I). Moreover, the resulting dictionary matrix will be unique up to permutation and scalability of columns. For example, solving the following system of equations will be enough:

Ŵ^T Â = I.

Finally, A can be obtained as A = QΛ^{1/2}Â. To sum up, although the actual A and S matrices cannot be identified, they can be obtained in permuted and scaled forms when m = n.
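A minimal sketch of this prewhitening step and of the identity ÂÂ^T = I under the assumption SS^T = I is given below (Python/NumPy; the helper names are illustrative assumptions, not the dissertation's implementation):

import numpy as np

def prewhiten(X):
    """Eigendecompose XX^T = Q L Q^T and return Xhat = L^{-1/2} Q^T X together with
    Q and the eigenvalues, so that A can later be recovered as A = Q L^{1/2} Ahat."""
    evals, Q = np.linalg.eigh(X @ X.T)                 # symmetric eigendecomposition
    Xhat = np.diag(1.0 / np.sqrt(evals)) @ Q.T @ X
    return Xhat, Q, evals

# Sanity check of Ahat Ahat^T = I when SS^T = I.
rng = np.random.default_rng(1)
n, N = 4, 400
S = np.linalg.qr(rng.standard_normal((N, n)))[0].T     # orthonormal rows: SS^T = I
A = rng.standard_normal((n, n))
X = A @ S
Xhat, Q, evals = prewhiten(X)
Ahat = np.diag(1.0 / np.sqrt(evals)) @ Q.T @ A
print(np.allclose(Ahat @ Ahat.T, np.eye(n)))           # True
print(np.allclose(Q @ np.diag(np.sqrt(evals)) @ Ahat, A))  # A = Q L^{1/2} Ahat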


PPC Robust Method for Dictionary Identification

Given the data matrix X ∈ R^{m×N}, the goal of PPC is to find two matrices, namely mixing (A ∈ R^{m×n}) and source (S ∈ R^{n×N}_+), such that X = AS. While developing the algorithm, it is assumed that the source signals are non-negative. The proposed algorithm is as follows:

Step 1: Normalize all the columns of X.

Step 2: Solve the following LP to get the projection direction:

minimize: δ
subject to: −d^T x_i ≤ δ   for all i,
            −d^T x_i ≤ 0   for all i,
            −2 ≤ d_j ≤ 2.

The above formulation will generate a projection vector d which is inside the cone formed by the columns of X.

Step 3: Normalize the vector d.

Step 4: Project the points on an n-dimensional simplex plane orthogonal to d, i.e., update each point x_i as x_i = x_i / (d^T x_i).

Step 5: Translate the points such that the plane containing the n-dimensional simplex passes through the origin. This can be done by centering the data, i.e., for each data point use the transformation

x_i = (x_i − x̄) / std,

where x̄ and std are respectively the mean and standard deviation of all the columns of X.

Step 6: An affine transformation, like Principal Component Analysis (PCA), can be used to transform the n-simplex from n+1 dimensions to n dimensions. The PCA method: identify the eigenvalues and eigenvectors of XX^T, i.e., U D_n U^T = XX^T; rearrange the eigenvalues in the diagonal of D_n in decreasing order of their value; let D_{n−1} be the submatrix of D_n constructed by eliminating the last row and last column; let Y be created such that y_i = U^T x_i for all i; and let Z represent the submatrix of Y obtained by eliminating the last row of the matrix Y. The matrix Z is an affine transformation and dimensionality reduction of the matrix X.


Step 7: If the PPC conditions are satisfied, then find the n vertices. If not, then approximately find the best n extreme points.

The projection based idea is an extension to the method proposed in [14]. However, that approach did not address the scenario of negative elements in the mixing matrix, whereas the proposed method can incorporate negative elements in the mixing matrix. A recent approach also addresses the issue of negative elements in the mixing matrix [103]. Furthermore, the major advantage of the proposed approach over the earlier methods [14, 103] is to avoid solving a large number of LPs; the only LP that we solve is in Step 2. For Step 7, instead of solving many LPs, the following projection approach is proposed.

Projection approach: Initially, the data points are projected on the normal vectors to the edges of the standard n-dimensional simplex projected on the n-dimensional space. The maximum and minimum projections for the initial n normal vector projections are archived. Now, the standard simplex is randomly rotated, and a new set of normal vectors is used for the projection. Again, the maximum and minimum projections are archived. If the total number of minimum and maximum projection points is equal to n+1 points, then the PPC assumptions are satisfied, and the n vertices can be obtained from the archive. However, if there are more than n+1 points, then this indicates that the PPC assumptions are not satisfied. In this case, from the set of archived points (potential candidates for vertices), the one with maximum norm is picked. The maximum norm point is taken as a best extreme point. Now, the rest of the archived points are projected on a hyperplane passing through the origin with a normal vector passing through the identified extreme point. The projected archived points can now be used to reduce the problem size by one dimension. This process of projection and dimension reduction is continued n times to identify all the best extreme points. It is to be noted that the projection and dimension reduction phase of the proposed approach utilizes the archived points only.
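The preprocessing part of the proposed PPC method (Steps 2 through 6) can be sketched as follows; the LP is the Step 2 formulation as reconstructed above, solved here with scipy.optimize.linprog, and all function names are assumptions of this illustration rather than the dissertation's implementation:

import numpy as np
from scipy.optimize import linprog

def ppc_projection_direction(X):
    """Step 2 LP: jointly minimize delta subject to -d^T x_i <= delta, d^T x_i >= 0
    and -2 <= d_j <= 2, which pushes d toward the interior of the cone of columns."""
    m, N = X.shape
    c = np.zeros(m + 1); c[-1] = 1.0                        # minimize delta
    A_ub = np.vstack([np.hstack([-X.T, -np.ones((N, 1))]),  # -d^T x_i - delta <= 0
                      np.hstack([-X.T, np.zeros((N, 1))])]) # -d^T x_i        <= 0
    b_ub = np.zeros(2 * N)
    bounds = [(-2, 2)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    d = res.x[:m]
    return d / np.linalg.norm(d)                            # Step 3

def ppc_reduce(X):
    """Steps 4-6: project columns onto the simplex plane, center, PCA-reduce."""
    d = ppc_projection_direction(X)
    P = X / (d @ X)                                         # Step 4
    P = (P - P.mean(axis=1, keepdims=True)) / P.std(axis=1, keepdims=True)  # Step 5
    evals, U = np.linalg.eigh(P @ P.T)
    Y = U[:, ::-1].T @ P                                    # decreasing eigenvalue order
    return Y[:-1, :]                                        # Step 6: drop the last row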


CSC Robust Method for Dictionary Identification

Given the data matrix X ∈ R^{m×N}, the goal of CSC is to find two matrices, namely, mixing (A ∈ R^{m×n}) and source (S ∈ R^{n×N}_+), such that X = AS. An alternative approach, which is developed in this dissertation, is to solve the bilinear problem given in Formulation 4-6 via a 0-1 linear reformulation [89]. Next, the 0-1 formulation for CSC is presented:

minimize: Σ_{p=1}^{N} Σ_{h=1}^{n} v_hp
subject to: (4-6b) to (4-6d), (4-6h), (4-6i),
            w_h^T e = 1                    for all h,
            v_hp ≤ M1·t_hp                 for all h, p,
            v_hp ≤ u_hp                    for all h, p,
            v_hp ≥ u_hp − M2·(1 − t_hp)    for all h, p,
            t_hp ∈ {0, 1}                  for all h, p,
            v_hp ≥ 0                       for all h, p,

where M1 and M2 are very large positive scalars. The two formulations are equivalent. Clearly, the MIP can be solved sequentially, one hyperplane at a time.
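The role of the ancillary variable v_hp and of the big-M constants in the 0-1 reformulation can be verified directly: for binary t and bounded u the constraints force v = t·u. A small check (the values of M1, M2 and the sample inputs are arbitrary illustrations):

import numpy as np

def linearized_product_bounds(t, u, M1, M2):
    """Feasible interval for v under v <= M1*t, v <= u, v >= u - M2*(1 - t), v >= 0.
    For binary t and 0 <= u <= min(M1, M2) the interval collapses to the point t*u."""
    lo = max(0.0, u - M2 * (1 - t))
    hi = min(M1 * t, u)
    return lo, hi

for t in (0, 1):
    for u in (0.0, 0.7, 3.2):
        print(t, u, linearized_product_bounds(t, u, M1=10.0, M2=10.0))
# t = 0 gives the interval (0, 0); t = 1 gives (u, u): v equals the product t*u.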


Before defining the hierarchy based MIP formulation, let us introduce the following notation:

w*_r: optimal solution of the r-th optimization problem given by Formulation (4-24).
H*_r: a hyperplane passing through the origin whose normal vector is w*_r.
P_r: index set of points, defined as P_r = P_{r−1} \ R_{r−1} for r = 2, ..., n, where P_1 = {1, ..., N}.
R_r: index set of points which are within ε distance from the hyperplane H*_r, defined as R_r = {p : |w*_r^T x_p| ≤ ε}, where ε > 0 is a given threshold such that R_r has at least m+1 elements.

minimize: Σ_{p∈P_r} α_p v_p − Σ_{p∈P_r} β_p t_p     (4-24)
subject to: −u_p ≤ w_r^T x_p ≤ u_p       for all p ∈ P_r,
            u_p − M1·(1 − t_p) ≤ v_p     for all p ∈ P_r,
            v_p ≤ u_p                    for all p ∈ P_r,
            v_p ≤ M2·t_p                 for all p ∈ P_r,
            w_r^T e = 1,
            Σ_{p∈P_r} t_p ≥ m + 1,
            t_p ∈ {0, 1}                 for all p ∈ P_r,
            u_p ≥ 0                      for all p ∈ P_r,
            v_p ≥ 0                      for all p ∈ P_r,
            w_r ∈ R^m.


Since the formulation considers one hyperplane at a time, the second index of the double indexed variables can be dropped; for example, v_p is nothing but v_pr, and a similar argument follows for u_p and t_p. The coefficients α_p and β_p are scaling factors and are arbitrarily selected. Clearly, the non-hierarchical 0-1 formulation has N·n binary variables, whereas the r-th iteration of the hierarchical Formulation (4-24) has |P_r| binary variables (where |P_r| ≥ 1). Moreover, for any two iterations, say r_1 and r_2 with r_2 > r_1, we have |P_{r_1}| > |P_{r_2}|. Probabilistically, the complexity at each iteration is reduced. This is due to the fact that in the r-th iteration, the probability that x_p, p ∈ P_r, will lie in the remaining n − r + 1 planes is 1/(n − r + 1) (since X is BSS-skeletable). Ideally, if there is no noise in the data and if all the earlier iterations converged to a global optimal solution, then the n-th iteration is redundant. The proposed hierarchical approach for solving Formulation (4-24) is presented in Algorithm 4.2. The steps of the proposed hierarchical approach are illustrated by the flowchart shown in Figure 4-7.

Robust Method for Source Extraction

When the knowledge of the dictionary is obtained, the source extraction problem can be simplified as

S = pinv(A)·X,

where pinv(.) is the pseudoinverse function. This method works only when X is free from outliers. However, when the mixture matrix contains outliers, the above solution approach will not work. For such scenarios, the following algorithm is proposed. Consider the following optimization problem:

minimize: ||AS − X||
subject to: S ∈ R^{n×N},

where A ∈ R^{m×n} and X ∈ R^{m×N}. Typically, the above problem is solved as a quadratic error minimization problem. Such methods are not robust when the elements of the data (A and/or X) are contaminated with outliers. The goal is to present a robust method for source extraction which is insensitive to outliers. Specifically, the following problem is considered:

minimize: F_C(Y) + λ·F_C(S)
subject to: Y = AS − X,
            S ∈ R^{n×N},
            Y ∈ R^{m×N},

where F_C is the correntropic loss function and λ is a known weight (or a parameter) for regularization, which controls the sparsity in S. Let the vector z ∈ R^{N(m+n)} be defined as

z_i = y_{⌈i/N⌉, i−(⌈i/N⌉−1)N}                              if i ≤ mN,
z_i = s_{⌈(i−mN)/N⌉, (i−mN)−(⌈(i−mN)/N⌉−1)N}               otherwise.

Let C ∈ R^{mN×(m+n)N} be defined as

C = [−I_{mN}, A ⊗ I_N].
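The stacked vector z and the matrix C defined above can be assembled with a Kronecker product so that the constraint Cz = d reproduces Y = AS − X row by row. A sketch (Python/NumPy; the function name is illustrative):

import numpy as np

def stack_z_and_C(Y, S, A):
    """Build z (row-wise stacking of Y then S, matching the index formula above)
    and C = [-I_{mN}, A kron I_N], so that C z = d with d the row-wise stacking of X."""
    m, N = Y.shape
    n = S.shape[0]
    z = np.concatenate([Y.reshape(m * N), S.reshape(n * N)])
    C = np.hstack([-np.eye(m * N), np.kron(A, np.eye(N))])
    return z, C

rng = np.random.default_rng(2)
m, n, N = 2, 3, 5
A = rng.standard_normal((m, n))
S = rng.standard_normal((n, N))
X = A @ S
Y = A @ S - X                      # here Y = 0; in general Y = AS - X
z, C = stack_z_and_C(Y, S, A)
d = X.reshape(m * N)
print(np.allclose(C @ z, d))       # True: the linear constraint Cz = d holds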


The above problem can be transformed as

minimize: −Σ_{i=1}^{(m+n)N} γ_i exp(−z_i²/(2σ²))
subject to: Cz = d,
            z ∈ R^{N(m+n)},

where d ∈ R^{mN} is defined as d_i = x_{⌈i/N⌉, i−(⌈i/N⌉−1)N}, and γ_i = 1 if i ≤ mN and γ_i = λ otherwise. Based on the value of σ, the problem can move from the convex domain to the invex domain. Specifically, the problem will be a convex programming problem when σ² ≥ z_i² for all i = 1, ..., (m+n)N. Consider the Lagrangian of the problem:

L(z, v) = −Σ_{i=1}^{(m+n)N} γ_i exp(−z_i²/(2σ²)) + v^T (Cz − d),

where v ∈ R^{mN} are the dual variables. The KKT system will be

∇F_C(z) + C^T v = 0,
Cz = d,

where [∇F_C(z)]_i = (γ_i/σ²) exp(−z_i²/(2σ²)) z_i for all i = 1, ..., (m+n)N. Solving these two equations gives the solution for the minimum correntropy error with regularization.
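For reference, the correntropic objective and the gradient that enter the KKT system can be evaluated as follows (a sketch; γ carries the regularization weight λ on the S-part of z, and the finite-difference check is only an illustration):

import numpy as np

def correntropic_loss_and_grad(z, gamma, sigma):
    """Objective -sum_i gamma_i exp(-z_i^2/(2 sigma^2)) and its gradient
    (gamma_i/sigma^2) z_i exp(-z_i^2/(2 sigma^2)), as written above."""
    e = np.exp(-z**2 / (2.0 * sigma**2))
    return -np.sum(gamma * e), (gamma / sigma**2) * z * e

rng = np.random.default_rng(3)
z = rng.standard_normal(6)
gamma = np.ones(6); gamma[4:] = 0.5            # e.g. lambda = 0.5 on the S entries
f0, g = correntropic_loss_and_grad(z, gamma, sigma=1.0)
h = 1e-6
num = [(correntropic_loss_and_grad(z + h*np.eye(6)[i], gamma, 1.0)[0] - f0) / h for i in range(6)]
print(np.allclose(g, num, atol=1e-4))          # True: gradient matches finite differences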


Let z^(r) be the current feasible solution, and let d^(r+1) be an improving and feasible direction. Consider the linear approximation of the gradient of a twice differentiable function:

∇f(w + u) ≈ ∇f(w) + ∇²f(w)·u.

Using the above information, the KKT system can be rewritten as

∇F_C(z^(r)) + ∇²F_C(z^(r))·d^(r+1) + C^T v^(r+1) = 0,
C·d^(r+1) = 0,

where ∇²F_C(z^(r)) is the Hessian of the correntropic function, defined as

[∇²F_C(z^(r))]_{i,j} = (γ_i/σ²) exp(−(z_i^(r))²/(2σ²)) (1 − (z_i^(r))²/σ²)   if i = j, and 0 otherwise.

The first equation can be rewritten as

d^(r+1) = −[∇²F_C(z^(r))]^{-1} (∇F_C(z^(r)) + C^T v^(r+1)),

where

[(∇²F_C(z^(r)))^{-1}]_{i,j} = (σ²/γ_i) exp((z_i^(r))²/(2σ²)) · σ²/(σ² − (z_i^(r))²)   if i = j, and 0 otherwise.

Let Γ = [∇²F_C(z^(r))]^{-1}. Substituting the expression for d^(r+1) into the feasibility condition C·d^(r+1) = 0 gives

C Γ ∇F_C(z^(r)) + C Γ C^T v^(r+1) = 0,
C Γ C^T v^(r+1) = −C Γ ∇F_C(z^(r)),

which can be written as

v^(r+1) = −(C Γ C^T)^{-1} C Γ ∇F_C(z^(r)).


Substituting the expression for v^(r+1) into the expression for d^(r+1), we get

d^(r+1) = −Γ [∇F_C(z^(r)) − C^T (CΓC^T)^{-1} CΓ ∇F_C(z^(r))]
d^(r+1) = −Γ [I_{(m+n)N} − C^T (CΓC^T)^{-1} CΓ] ∇F_C(z^(r))
d^(r+1) = Γ [C^T (CΓC^T)^{-1} CΓ − I_{(m+n)N}] ∇F_C(z^(r)).

Partitioning Γ = diag(Γ_Y, Γ_S) into the blocks corresponding to the Y and S parts of z, and using C = [−I_{mN}, A⊗I_N], this direction can be written as

d^(r+1) = ( [−Γ_Y; Γ_S(A⊗I_N)^T] (Γ_Y + (A⊗I_N)Γ_S(A⊗I_N)^T)^{-1} [−Γ_Y, (A⊗I_N)Γ_S] − Γ ) ∇F_C(z^(r)).

The second order method is suitable when the objective function is convex. When there are outliers, the goal is to minimize the total correntropic loss while ignoring the effect of the outliers. In such a scenario, the kernel width should be selected such that it separates the true samples from the outliers. Typically, this separation mechanism leads to a transformation of the correntropy to the invex domain, and the second order Newton's method will not be able to find the optimal solution. Thus, in the following paragraphs, an iterative method is developed for the case when the correntropy is invex.
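When σ is large enough for the diagonal Hessian to be positive, the constrained Newton direction derived above can be computed directly from ∇F_C, its diagonal Hessian and C. A sketch (Python/NumPy; the sizes in the example are arbitrary):

import numpy as np

def newton_direction(z, gamma, sigma, C):
    """One constrained step for min F_C(z) s.t. Cz = d, following the derivation:
    d = -Gamma (grad + C^T v) with v = -(C Gamma C^T)^{-1} C Gamma grad,
    where Gamma is the diagonal inverse Hessian. Assumes sigma^2 > z_i^2 for all i."""
    e = np.exp(-z**2 / (2.0 * sigma**2))
    grad = (gamma / sigma**2) * z * e
    hess_diag = (gamma / sigma**2) * e * (1.0 - z**2 / sigma**2)
    Gamma = 1.0 / hess_diag
    CG = C * Gamma                                   # = C @ diag(Gamma)
    v = -np.linalg.solve(CG @ C.T, CG @ grad)        # dual variables
    return -Gamma * (grad + C.T @ v)

# The step stays in the feasible subspace: C d = 0.
rng = np.random.default_rng(4)
C = np.hstack([-np.eye(4), np.kron(rng.standard_normal((2, 3)), np.eye(2))])
z = 0.3 * rng.standard_normal(10)
step = newton_direction(z, np.ones(10), sigma=1.0, C=C)
print(np.allclose(C @ step, 0.0))                    # True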


Let z^(r) be the current feasible solution, and let f_1(S) = F_c(AS − X) and f_2(S) = F_c(S). The aim of finding the optimal kernel width is to identify a border that separates the good data points from the outliers. Generally, such a mechanism of separating data points requires problem specific knowledge. However, in this work, a correntropy based method that identifies the optimal kernel width is proposed, which in turn provides a margin between good data points and outliers. The philosophy of the proposed method is based on the simple notion that if σ_i is the optimal kernel width and if the p-th given point contains noise, then setting the corresponding solution s_p to the zero vector should give the maximum improvement in the objective function f(S) = f_1(S) + f_2(S). It is easy to see why f_2 should decrease. However, the decrement in f_1 is only possible when the given point x_p is indeed an outlier with respect to σ_i. Now, among all possible values of σ_i, the one that provides the maximum decrease with respect to the original objective function value is the optimal value of the kernel width. Let f(S\p) be the correntropy cost when s_p is set equal to the zero vector. Algorithm 4.3 presents the proposed algorithm. One of the drawbacks of this approach is the computational expensiveness of the second order method, which increases with the problem dimensions n, m and N. On the other hand, the step involving the second order method can be avoided when the proposed method is used for initial filtering, i.e., when solving the problem X = I·X_f. When solving for X_f, the variable X_f can be initialized as X_f = X, and the second order method can be skipped. After executing Algorithm 4.3, the optimal kernel width and the filtered mixture matrix are obtained. This filtered mixture matrix can then be used for dictionary identification and source extraction.

Figure 4-1. Cocktail party problem: A) Setup. B) Problem.


Figure 4-2. BSS setup for the human brain.

Figure 4-3. Overview of different approaches to solve the BSS problem (Blind Signal Separation; Partial Blind; Full Blind; Independent Component Analysis; Sparse Component Analysis; Partially Sparse Nonnegative Sources; Complete Sparse Components).


Algorithm 4.1: Bilinear Algorithm
input: X ∈ R^{m×N}
output: W ∈ R^{m×n} and T ∈ R^{n×N}
1  begin
2    Randomly initialize T;
3    Set termination = false;
4    Set ε = epsilon;
5    while not termination do
6      for p = 1 to N do
7        Calculate the distances D_hp between x_p and all the hyperplanes w_h;
8        Assign x_p to cluster C_h* iff D_h*p = min_h {D_hp};
9      error = Σ_p |D_h*p|;
10     if error < ε then termination = true;
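A compact sketch in the spirit of Algorithm 4.1 is given below. The assignment step follows the listing; the refit step shown here (smallest eigenvector of each cluster's scatter matrix) is a common stand-in and is an assumption of this illustration, since the LP-based update of [32] is not reproduced:

import numpy as np

def n_plane_clustering(X, n, iters=50, seed=0):
    """Alternate between assigning each point to its closest hyperplane through the
    origin and refitting each normal vector from the assigned cluster."""
    m, N = X.shape
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((m, n))
    W /= np.linalg.norm(W, axis=0)
    for _ in range(iters):
        D = np.abs(W.T @ X)                      # distances |w_h^T x_p|
        labels = np.argmin(D, axis=0)            # assignment step
        for h in range(n):
            pts = X[:, labels == h]
            if pts.shape[1] >= m:                # refit: smallest-eigenvector normal
                _, vecs = np.linalg.eigh(pts @ pts.T)
                W[:, h] = vecs[:, 0]
    return W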

Algorithm 4.2: CSC Hierarchical Optimization Algorithm
input: X ∈ R^{m×N}
output: A ∈ R^{m×n} and S ∈ R^{n×N}
1  begin
2    X = Preprocessing(X);
3    Set ε = epsilon;
4    Set P_1 = {1, ..., N};
5    for counter = 1 to n do
6      Set r = counter;                  // r represents the current index of the hyperplane
7      Set termination = false;
8      repeat
9        Choose initial points;
10       Solve Formulation (4-24) for the r-th hyperplane, given x_p, for all p ∈ P_r;
11       if an optimal solution is obtained then
12         termination = true;
13         Archive the vector w*_r;      // w*_r is the optimal solution vector of Formulation (4-24)
14         Obtain the index set R_r = {p : |w*_r^T x_p| ≤ ε};
15         Set P_{r+1} = P_r \ R_r;
16     until termination;
17   Arrange W = [w*_1, ..., w*_n];
18   Get S as S = W^T X;
19   Get A by solving the model W^T A = I_{n×n};
20   return (S, A);
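The peel-off structure of Algorithm 4.2 (archive one normal vector, remove the points within ε of the corresponding hyperplane, and continue on P_{r+1} = P_r \ R_r) can be sketched as follows; solve_hyperplane is an assumed callback standing in for a MIP solve of Formulation (4-24) and is not shown:

import numpy as np

def hierarchical_peel(X, n, solve_hyperplane, eps=1e-3):
    """Find one hyperplane at a time on the shrinking index set and recover A, S
    as in steps 17-19 of the listing (square case m = n for the final solve)."""
    P = np.arange(X.shape[1])
    W = []
    for _ in range(n):
        w = solve_hyperplane(X[:, P])             # optimal w*_r on the remaining points
        W.append(w)
        R = np.abs(w @ X[:, P]) <= eps            # R_r = {p : |w*_r^T x_p| <= eps}
        P = P[~R]                                 # P_{r+1} = P_r \ R_r
    W = np.column_stack(W)
    S = W.T @ X
    A = np.linalg.solve(W.T, np.eye(W.shape[0]))  # W^T A = I
    return A, S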


Algorithm 4.3: Correntropy Minimization for X = AS Type Scenarios
input: X ∈ R^{m×N} and A ∈ R^{m×n}
output: S ∈ R^{n×N} and σ*
1  begin
2    Let z^(r) be the solution obtained from the second order minimization, where r can be chosen arbitrarily based on the required accuracy;
3    Let S^(r) be the solution constructed from z^(r);
4    Let σ^(r) be the minimum value of the kernel width obtained from z^(r), such that the correntropy function is convex;
5    Select any value for α, such that 0 < α < 1;
6    Set σ = σ^(r);
7    termination = false;
8    while termination == false do
9      Calculate f(S)(σ^(r));
10     for i = 1 to N do
11       if f(S\i)(σ^(r)) < f(S)(σ^(r)) then
16       σ^(r+1) = α·σ^(r);
17       r = r + 1;
18     else
19       σ* = σ^(r);
20       termination = true;
21   Return {σ*, X};
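The quantity that drives Algorithm 4.3 is the improvement in f obtained by zeroing one column of S at a given kernel width. A sketch of that inner computation (the outer loop that shrinks σ by the factor α is omitted; all names are illustrative assumptions):

import numpy as np

def corr_cost(S, A, X, sigma, lam):
    """f(S) = F_c(AS - X) + lambda * F_c(S) with the correntropic loss used above."""
    fc = lambda M: -np.sum(np.exp(-M**2 / (2.0 * sigma**2)))
    return fc(A @ S - X) + lam * fc(S)

def outlier_improvement(S, A, X, sigma, lam):
    """For each column p, the decrease in f obtained by setting column p of S to zero;
    a large positive value flags x_p as a likely outlier at this kernel width."""
    base = corr_cost(S, A, X, sigma, lam)
    gains = np.empty(S.shape[1])
    for p in range(S.shape[1]):
        Sp = S.copy()
        Sp[:, p] = 0.0
        gains[p] = base - corr_cost(Sp, A, X, sigma, lam)
    return gains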


Figure 4-4. Original example source S ∈ R^{3×80}.

Figure 4-5. Mixed data X ∈ R^{2×80}.


Figure 4-6. Processed data X̂ ∈ R^{2×80}.


Figure 4-7. Algorithm 4.2 description.


CHAPTER5SIMULATIONSANDRESULTSInChapter 5 ,theapplicabilityoftheproposedrobustmethodsisillustratedbyexperimentationsonsimulatedandrealworlddata.Generally,itisimpracticaltoconcludeaboutthepresenceofoutliersinrealdata.Therefore,thesignicanceoftheproposedmethodsarehighlightedusingsimulateddata.Weshowthattheproposedmethodsworkverywellinnon-noisysimulateddata,aswellasinnoisysimulateddata.Afterthetwosimulateddatascenarios,theperformanceoftheproposedmethodsontherealdataisalsotested.InSections 5.1 5.4 ,casestudiesrelatedtobinaryclassicationareillustrated.Section 5.5 presentsthecasestudyrelatedtolinearmixingassumption.Applicationofnon-negativesourceseparationisshowninSection 5.5 .ThesuitabilityoftheproposedPPCmethodfortheimageunmixingproblemsisshowninSections 5.6 5.10 .Section 5.11 presentsthecasestudiesrelatedtocompletesparsesourceseparationviathehyperplaneclusteringmethod.Finally,Section 5.12 illustratestheproposedrobustsourceextractionprocedure. 5.1.CauchyandSkewNormalDataTheobjectiveofSection 5.1 istoevaluatetheperformancebehaviorofcorrentropylossfunctioninsimulatednoisydataclassication.Atwo-dimensionalnoisydataforthebinaryclassicationissimulatedforthisstudy.Altogether,twodifferenttypesofdatasetsweregenerated.TherstdatasetisgeneratedusingCauchydistribution.Thereasonforselectingthisdistributionistoevaluatetheperformanceofproposedandexistingmethodsinanon-Gaussianenvironment.Inthisdataset,thefattailbehavioroftheCauchydistributionmimicsthenoise.Theseconddatasetisgeneratedbyaskew SomesectionsofChapter5havebeenpublishedinDynamicsofInformationSys-tems:MathematicalFoundations,Computers&OperationsResearchandNeurometh-ods. 106


normaldistribution.Inthisdataset,10%ofthedatapointsfromoneclassarerandomlyassignedtoanotherclassandviceversa.BriefinformationregardingthetwodatasetsisgiveninTable 5-1 .ThedetailsofthedatasetsareshowninFigures 5-1 5-2 & 5-3 .Forthedatasets,axednumberofrecordswereselectedfortrainingtheclassier.Theremainingrecordswereusedfortestingthetrainedclassier.Inordertohaveaccurateresults,adatasetisrandomlydividedintotestingdataandtrainingdata.Foreachdataset,thetrainingdataispreprocessedbynormalizingthedatatozeromeanandunitvariancealongthefeatures(toavoidscalingeffects).Basedonthemeanandvarianceofthetrainingdata,thetestingdataisscaled.Inadditiontothat,fortheresultstobeconsistent,100Monte-Carlosimulationswereconducted(bothforANNandSVM),andtheaveragetestingaccuracyoftheclassieroverthe100simulationsisreportedintheresults.TheresultsareshowninTables 5-2 & 5-3 .Fromtheresults,itcanbeseenthatthecorrentropylossfunctiondoesperformbetterforthecaseofCauchydata.However,whenthedataisnormallydistributed,likeSkewdata,itsperformanceissimilartothequadraticlossfunction. 5.2.RealWorldBinaryClassicationDataInSection 5.2 ,simulationsarecarriedoutforthreerealworlddatasets(WisconsinBreastCancerData,PimaIndiansDiabetesDataandBUPALiverDisorderData)relatedtobiomedicaleld.ThesedatasetsaretakenfromtheUCImachinelearningrepository( http://archive.ics.uci.edu/ml/ ).AbriefinformationregardingeachofthedatasetsisgiveninTable 5-4 .TheobjectiveofSection 5.2 istoevaluatetheperformancebehaviorofcorrentropylossfunctionintherealworlddataclassication.Originally,someoftheselecteddatasetshavemissingvalues.Alltherecordscontaininganymissingdatavaluesarediscardedbeforeusingthedataforclassication.Inadditiontothat,foreachdataset,axednumberofrecordswereselectedfortrainingtheclassier.Theremainingrecordswereusedfortestingthetrainedclassier.Inorder 107


tohaveaccurateresults,adatasetisrandomlydividedintotestingdataandtrainingdata(keepingthetotalnumberoftrainingrecordsxed,asgiveninTable 5-4 ).Foreachdataset,thetrainingdataispreprocessedbynormalizingthedatatozeromeanandunitvariancealongthefeatures(toavoidscalingeffect).Basedonthemeanandvarianceofthetrainingdata,thetestingdataisscaled.Thepurposeofnormalizingthetrainingdataaloneandscalingthetestingdatalateristomimicthereallifescenario.Usually,thetestingdataisnotavailablebeforehand,anditsinformationisunknownwhilenormalizingthetrainingdata.Inadditiontothat,fortheresultstobeasconsistentaspossible,100Monte-Carlosimulationswereconducted(bothforANNandSVM),andtheaveragetestingaccuracyoftheclassieroverthe100simulationsisreportedintheresults. 5.3.ComparisonAmongANNBasedMethodsTheaimofSection 5.3 istocomparetheproposedANNbasedmethodswiththeexistingANNbasedbinaryclassicationmethods.SincethenumberofPEsinthehiddenlayerhaveaneffectontheperformanceofANNbasedclassiers,simulationshavebeenconductedfor5,10and20PEsinthehiddenlayerforeachofthedatasets.Although,theexactnumberofPEsthatwillgivemaximumclassicationaccuracyisunknown,itcanbeestimatedbyanexperimentalsearchoverthenumberofPEsinthehiddenlayer.However,suchasearchisoutofthescopeofthecurrentworkduetoitshighcomputationalrequirements.Therefore,thecomputationshavebeenconnedfor5,10and20PEsinordertoefcientlycomparealltheANNbasedclassiers.Moreover,theperformanceofANNbasedclassierwithsampleandblockbasedlearningframeworkwerealsoconsideredinthecomparison.TheresultofsampleandblockbasedlearningmethodsofANNsimulationsaregiveninTables 5-5 5-6 5-7 5-8 5-9 ,and 5-10 .InthesixTables,eachcolumnrepresentsanumberoflearningepochsforsamplebasedlearning.Whereas,eachcolumnrepresentsanumberofepochstrainingsamplesizeforblockbased 108


learning.Foragivenalgorithm,arowrepresentstheaverageresultof100Monte-Carlosimulations.Firstrowpresentstheresultswith5PEsinthehiddenlayer.Secondrowpresentstheresultswith10PEs,andthirdrowpresentstheresultswith20PEsinthehiddenlayer.FortheAQGandACGmethods,theresultsfrom[ 86 ]areusedasareferenceforfurthercomparisons(seeTables 5-5 5-7 and 5-9 ).SinceACSrequiresknowledgeofchangeinlossfunctionvalueoveranytwoconsecutiveiterations,itcannotbeimplementedinsamplebasedlearning.However,allthealgorithmshavebeenimplementedinblockbasedlearning,andtheperformanceresultsofACSat=0.5havebeenpresented.TheresultsshowsthatACCalmostalways(bothforsampleandblockbasedlearningmethods)performsbetterwhencomparedtoanyoftheANNbasedclassicationalgorithms.Therefore,thismethodcanbeusedasageneralizedrobustANNbasedclassierforpracticaldataclassicationproblems.Moreover,thepoorperformanceofACSmethodisattributedtothe=0.5criterion.ItisnotnecessarythattheassumedcriterionmayshowACS'sbestperformance.Therefore,thisinstigatedthestudyofperformancebehaviorofACSmethodoverdifferentlevelsofparameter(seeTables 5-11 5-12 and 5-13 ). 5.4.ANNandSVMComparisonTheaimofSection 5.4 istocomparetheproposedANNbasedbinaryclassicationmethodswiththeSVMbasedbinaryclassicationmethods.SinceSVMhasnoconceptofPEs,thebestoftheaverageaccuracyofSVM(averageof100Monte-Carlosimulationsforagivenpairofcand)overtheexponentialgridspaceofcandisusedtocomparewiththeaccuracyoftheproposedalgorithms.Figure 5-4A showsthetopologyofperformanceaccuracyoverthegrid,andFigure 5-4B showsthetopologyofnumberofsupportvectorsforPIDdata.Correspondingly,Figures 5-5 and 5-6 showthesameforBLDandWBCdatarespectively.Themaximumtestingaccuracythatis 109


obtainedforPIDdatafromthegridsearchis77.2%.Similarly,forBLDandWBCitis71.4%and97.07%respectively.ItwouldbeunfairtodirectlycomparethebestaccuracyofSVMwiththeaccuracyoftheproposedANNbasedalgorithms,duetofollowingreason:WhilecalculatingthebestaccuracyofSVMbasedmethod,anegridsearch(exhaustiveinnature)overtheparameterscandisconducted.Thepossibilitytoconductsuchexhaustivesearchesovertheparameterspaceiscreditedtotheexistenceoffastquadraticoptimizationalgorithmslikesequentialminimaloptimization[ 27 ].However,suchneexhaustivegridsearchfortheproposedmethodsisyetcomputationallyexpensiveinthecaseofANNs(forexample:anexhaustivegirdsearchforACSrequiretosearchoverthreeparametersnamely:numberofepochs,andnumberofPEsinthehiddenlayer).However,inordertoseethebehavioroftheACSalgorithmwithvariouslevelsof,acoarsegridsearchwithfewgridpointshavebeenconducted.TheresultofthisgridsearchisshowninTables 5-11 5-12 and 5-13 .Although,thegridisconnedtoveryfewgridpoints,itcanbeseenthattheperformanceaccuracyofACSalgorithmvarywiththechangeofparameters(andnumberofPEsinhiddenlayer).TheresultsfromthegridsearchshowsthattheperformanceaccuracyofACS(evenwithlimitedPEsandconnedlevelsof)isveryclosertothebestperformanceaccuracyofsoftmarginkernelbasedSVM.Furthermore,evenwiththelimitations(numberofPEsinthehiddenlayer,andnumberofepochs)ACCbeatsthebestperformanceaccuracyofSVMforWBCdata.Inadditiontothat,itsperformanceisveryclosetothatofbestSVMperformancefortheothertwodatasets. 5.5.LinearMixingEEG-ECoGDataTheaimofSection 5.5 istounderstandthenatureofmixingacrosstheskull.Inparticular,theobjectiveistoassertthevalidityofthelinearmixingassumptioninBSSproblem.Since,linearmixingisassumedinalmostalltheBSSmethods,itwillbeofprimaryinteresttoshowthevalidityoftheassumptionwithrespecttoneuraldata.The 110


ideaofthisexperimentistoconsideraneuraldatasetwhichcontainstheinformationregardingthesourceaswellasthemixturesignalsfrombrain.Basedontheavailableinformationfromthedataset,thegoalistoextractthemixingmatrix.However,themixingmatrixitselfmaynotprovideasignicantinformation,whencomparedtothetotalerrorfromthelinearmixingassumption.Therefore,inthefollowingexperiment,asuitablepubliclyavailabledataset(whichcontainsbothsourceandmixturedata)isconsidered.Theaimistoexaminethelinearmixingassumptionacrosstheskullbyminimizingthetotalerror.Datacontainingsimultaneouselectricalactivityoverthescalp(EEG)andovertheexposedsurfaceofthecortex(ECoG)fromamonkeyisconsideredinSection 5.5 .Theinformationregardingexperimentalsetupandpositionofelectrodesisavailableonthefollowingwebaddress,( http://wiki.neurotycho.org/EEG-ECoG recording ).Sincethedatafromthisexperimentissimultaneouslycollectedfromabovethescalpandunderthescalp,itopensthedoortounderstandthemixingmechanismacrossthebrain.Typically,themixingovertheskullisassumedtobelinear.Mathematicaladvantagesinformulatingtheproblem,developingthealgorithms,andidentifyingtheunknownsourceandmixingmatricesareobtainedthroughthelinearmixingassumption.Infact,theonlyknownsuccessfulresultsinBSSproblemisobtainedfromlinearmixingassumption.Byanalyzingthedataofthisexperiment,thegoalistoexperimentallyverifythevalidityofthelinearmixingassumption.ThedataconsistsofECoGandEEGsignals,whichweresimultaneouslyrecordedfromthesamemonkey.128channelsECoGarraythatcoveredentirelateralcorticalsurfaceoflefthemispherewithevery5millimeterspacingwasimplantedinthemonkey.TheEEGsignalwasrecordedfrom19channels.ThelocationoftheEEGelectrodeswasdeterminedby10-20systemswithouttheCzelectrode(becausethelocationoftheCzelectrodeinterferedwithaconnectorofECoG).Inthepresentsimulation,resultsonaparticulardatasetispresented,wherethemonkeyisblindfolded,seatedinaprimate 111


chair,andhandsaretiedtothechair.Figure 5-7 showsthe8EEGchannelsofthelefthemisphere,andFigure 5-8 showsthe128ECoGchannelsfromthelefthemisphere.Duringtherecording,themonkeyisinrestingcondition.Insuchscenario,itisassumedthatthethetaandalphabandsshouldbedominantinanormalhealthyprimate.Thus,thegoalistoseehowparticularfrequencybandsmixovertheskull.Basically,theformulationisofthefollowingform: minimize:jXeeg)]TJ /F4 11.955 Tf 11.95 0 Td[(AXecogj,(5)whereXeeg2R18NrepresentstheEEGdatafrom18channels(eachcolumnrepresentsachannel),Xecog2R128NrepresentstheECoGdatafrom128channels(eachcolumnrepresentsachannel),andA2R18128istheunknownmixingmatrix.BeforesolvingFormulation 5 ,thedatahasbeenlteredtoremovehigh(45Hz)andlow(0.5Hz)frequencies.Inadditiontothat,50Hzand60Hznotchltershavebeenusedtoremovethenoiseinducedbytheelectriccurrent.Furthermore,allthechannelshavebeenreferencedtotheaveragesignalbeforeconductingtheanalysis,i.e.,forexample,EEGdatavaluefromaparticularchannelatagiventimeinstanceisreferencedwithaverageEEGfromalltheEEGchannelsatthesametimeinstance.Similarly,theECoGdataisreferencedtotheaveragesignal.InsteadofsolvingFormulation 5 withrespecttothewholedata,theformulationhasbeensolvedmultipletimes,withreduceddatasets.Typically,thereduceddatasetsarenothingbutsmallerchunksofdatawiththewindowsizeofN=2000pointsforaparticularfrequencyband,takenfromtheoriginaldata.TheobjectiveofFormulation 5 istocalculatethetotalabsoluteerrorduetolinearmixingassumptionindifferentfrequencybands.Thus,thisexperimentprovidesamechanismtounderstandmixingaroundtheskull.Alowerrorshowsthatlinearmixingassumptionisvalid.Whereas,ahigherrorindicatesthatthelinearmixingassumptionisinvalid.Moreover,theultimategoalistoshowifthemixingisconstantoverthetime.However,todevelopsuchresults, 112


completeunderstandingregardingthetotalnumberofsourcesshouldbeavailable.Atthispoint,asimpleexperimentispresented,where,itisassumedthatalltheECoGelectrodesaresources,andalltheEEGelectrodesaremixtures.Thus,themodelishighlyunder-determined,butduetotheavailabilityofbothsourceandmixtureinformation,Formulation 5 transformstoaconvexprogrammingproblem.TheresultsoftheanalysisareshowninTable 5-14 .Whilecalculatingtheerror,onlythosechannelsareconsideredthatareplacedonthelefthemisphere,i.e.,8EEGchannels,and128ECoGchannels.Since,ECoGdataisavailableforthelefthemisphere,therighthemisphericalchannelsinEEGhavebeenneglectedduringthecalculationoftheerror.InTable 5-14 ,thethirdrowpresentsthemeanvalueofthetotalabsoluteerroroverallthemultiplerunsonthereduceddataset.Thefourthrowprovidesthecorrespondingvarianceofthetotalabsoluteerrorsforthemultipleruns.Thelowaverageerrorandnegligiblevarianceinalphaandthetabandssuggesttheexistenceoflinearmixingacrosstheskull.Atthisstageoftheexperiment,thelinearmixingassumptionisvalidatedintheneuraldata.However,itisfarfromtheoreticalvalidationandgeneralizationtootherneuraldatasets.Furthermore,theothercriticalquestion,whichdirectstowardstheconstancyinmixingisopenforfurtherinvestigation. 5.6.fMRIDataAnalysisInSections 5.6 5.7 5.8 5.9 and 5.10 ,thefocusisonnon-negativesources.Generally,imagesfallunderthenon-negativesourcescategory.TheaimofSection 5.6 istoexaminethevalidityofPPCsparsityassumptioninfMRIdata.Generally,sparsityinfMRIimagesisamoreplausibleassumptionthanindependence[ 24 ].However,thePPCsparsitymaynotbeapplicabletofMRIdata.Throughthisexperiment,theapplicabilityofPPCmethodonfMRIdataisexplored.AnfMRIdatasetexaminedpreviouslyintheliteratureisconsideredinSection 5.6 .ThedescriptionofexperimentalsetupanddatacollectionofthefMRIdataisavailablein[ 35 ],wheretheauthorscompareICAandSCAmethods.Here,thesamedataare 113


usedtoanalyzetheconvexhullofthefMRIdata.Thebasicideais,ifPPCassumptionsarevalid,thentheconvexhullshouldbeasimplex.Furthermore,iftheconvexhullissimplexinndimensions,thenanafnetransformationtolowerdimensions,likePCA,shouldresultinasimplexinlowerdimensions.Furthermore,theextremepoints(orvertices)ofsimplex(orconvexhull)isnothingbutthecolumnsofthemixingmatrix.Thus,ndingconvexhullleadstotheidenticationofmixingmatrix.ThefMRIdatafromasinglesubjectconsistof98imagestakenevery50millisecond.Theimagesarevectorizedbyscanningtheimageverticallyfromtoplefttobottomright.Next,thedimensionalityofdataisreducedto3principalcomponentsusingPCAforeaseofidentifyingtheconvexhull.Sincetheimagesarevectorized,therelationbetweenfMRIdataandPCAcomponentsisintangible.However,theusageofPCAhasacrucialadvantageinvisualizationofthedata,whichinturnleadstoeasyidenticationoftheconvexhullinthelowerdimensions.Figure 5-9A showsthescatterplotofthreeprincipalcomponents.Now,takingthethreeprincipalcomponents,thedataisprojectedonatwodimensionalplane.ThisprojectionofthethreeprincipalcomponentsintotwodimensionsisshowninFigure 5-10 .FirstthingtonoticeistheprojectionintwodimensionisdifferentfromFigure 5-9B ,whichshowsthescatterplotoftwoprincipalcomponents.Next,asimplexthattsallthepointsinFigure 5-10 givestheinformationpertainingtothecolumnsofthemixingmatrix.Fortheuniqueidenticationofmixingmatrix,existenceofauniquesimplexisnecessary.ForthefMRIdata,thePPC1conditionsarenotcompletelysatisedsincetheverticesofthetriangle(simplex)arenotavailable.However,approximatemethodscanbedevelopedtoidentifytheextremepointsofthetriangle.Forexample,Figure 5-10A andFigure 5-10B showdifferentwaysofextrapolatingthedatatoobtainthevertices.Obviously,thisideacanbeextendedinhighdimensionsbydeningobjectiveslikendingasimplexofminimumvolumecontainingallthedata,orndingasimplexofminimumvolumecontaininghighpercentageofthedata.Fromthisanalysis,itcanbe 114


concludedthat,ingeneralthePPCmethodmaynotbedirectlyapplicabletotheanalysisoffMRIdata.Thus,alternatemethodswhichcanovercometherestrictionsofthePPCmethodareneededtoanalyzethefMRIdata. 5.7.MRIScansInSection 5.7 ,threeMRIscanimagesareconsidered.FromtheoriginalMRIimages,theminimumpixelvalueissubtracted,andthevalidityofthePPC1assumptionistested.TheseprocessedimagesdosatisfythePPC1assumption.Letuscalltheseimagesasinitialimages.Nowtheinitialimagesarelinearlymixedtoobtainthreemixtureimages.Thegoalistoextractthepuresourceimagesfromthemixtureimages.Figure 5-11A displaystheinitialsources,andFigure 5-11B presentsthemixtureimages.ThethreemixtureimagesarevectorizedintomatrixX2R3N,whereNdependsuponthesizeoftheimages.Now,thecolumnsofXareprojectedonthetwodimensionalspaceusingthePCAtransformation.Fromtheprojecteddata,thethreeuniqueverticesofthesimplex(triangle)areidentiedusingtheproposedprojectionapproach.SincetheinitialimagessatisfythePPC1assumption,theuniqueverticesareidentied,i.e.,noapproximationisneeded.Fromtheverticesofthesimplex,themixingmatrixisconstructed.Usingtheinformationofthemixingmatrix,thesourceimagesarerecovered.Figure 5-11C showstherecoveredsourceimages.SincethePPC1assumptionsweresatisedinitially,excepttheorderingandintensity(ambiguityofpermutationandscalability)oftheimages,alltheotherinformationisrecoveredfromthemixtureimages.Furthermore,inordertoseetheperformanceofproposedapproach,theexperimentisrepeated50timeswitharandommixingmatrixineveryrepetition.Moreover,the50repetitionsoftheexperimentisconductedforeveryvalueofn=3,...,7.Inordertopresenttheaccuracyoftheproposedapproach,theerrorbetweenrecoveredand 115


originalsourcesiscalculatedas: e(S,bS)=min2nnXi=1ksi)]TJ /F8 11.955 Tf 11.15 .5 Td[(bsik2,(5)wheresiistheithrowoftheoriginalsourcematrixS,andbsiistheithrowoftherecoveredsourcematrixbS.Alltherowsoforiginalandrecoveredsourcematricesarenormalized.Thenormalizationremovesthescalingeffect.Theeffectofpermutationishandledbyvector.Let=[1,...,n]Tandn=f2Rnji2f1,2,...,ng,i6=j,8i6=jgbethesetofallpermutationsoff1,2,...,ng.TheoptimizationproblemgiveninEquation 5 istomatchtherowsofrecoveredsourcematrixtotheoriginalsourcematrix.Typically,theminimizationproblemisnothingbutstandardtheassignmentproblem,andcanbeeasilysolvedusingtheHungarianmethod.TheaverageerrorandstandarddeviationforMRIscanimagesofthesimulationarepresentedintherstcolumnofTables 5-15 and 5-16 respectively. 5.8.FingerPrintsInSection 5.8 ,threengerprintimagesareconsidered.SimilartotheMRIscansinSection 5.7 ,inthecaseofthengerprintimages,theminimumpixelineachimageisrstsubtractedfromtheimages,andthencheckedforthePPC1assumption.TheseprocessedimagesdonotsatisfythePPC1assumption.Letuscalltheseimagesasinitialimages.NowthelinearmixingoperationisrepeatedsimilartotheMRIscanimagesexperiment(seeSection 5.7 ),toobtainthreemixtureimages.SincethePPC1assumptionisnotsatised,thegoalistoapproximatelyextractthepuresourcesfromthemixtureimages.Figure 5-12A displaystheinitialsources,andFigure 5-12B presentsthemixtureimages.ThethreemixtureimagesarevectorizedintomatrixX2R3N,whereNdependsuponthesizeoftheimages.Now,thecolumnsofXareprojectedonthetwodimensionalspaceusingthePCAtransformation.Fromtheprojecteddata,thethreebestextremepointsareidentiedusingtheproposedprojectionapproach.Takingthe 116


threepointsastheverticesofthesimplex,themixingmatrixisconstructed.Sourcesarerecoveredusingtheinformationofmixingmatrix.Figure 5-12C showstheextractedsourceimages.Itcanbeseenfromtherecoveredsourcesthat,apartfromintensityandordering,therecoveryisnotperfect.Furthermore,inordertoseetheperformanceofproposedapproach,theexperimentisrepeated50timeswitharandommixingmatrixineveryrepetition.Moreover,the50repetitionsoftheexperimentisconductedforeveryvalueofn=3,...,7.Inordertopresenttheaccuracyoftheproposedapproach,theerrorbetweenrecoveredandoriginalsourcesiscalculatedbytheformulagiveninEquation 5 .TheaverageerrorandstandarddeviationforngerprintimagesofthesimulationarepresentedinthesecondcolumnofTables 5-15 and 5-16 respectively. 5.9.ZipCodesInSection 5.9 ,fourzipcodeimagesareconsidered.Letuscalltheseimagesasinitialimages.NowthelinearmixingoperationsimilartotheMRIscanimagesexperiment(seeSection 5.7 )isperformed,inordertoobtainfourmixtureimages.ThePPC1assumptionisnotsatisedforthefourimages,andthegoalistoapproximatelyextractthepuresourcesfromthemixtureimages.Figure 5-13A displaystheinitialsourceimages,andFigure 5-13B presentsthemixtureimages.ThefourmixtureimagesarevectorizedintomatrixX2R4N,whereNdependsuponthesizeoftheimages.Now,thecolumnsofXareprojectedonthethreedimensionalspaceusingthePCAtransformation.Fromtheprojecteddata,thefourbestextremepointsareidentiedusingtheproposedprojectionapproach.Takingthefourpointsastheverticesofthesimplex,themixingmatrixisconstructed.Sourcesarerecoveredusingtheinformationofmixingmatrix.Figure 5-13C showstheextractedsources.Itcanbeseenfromtherecoveredsourcesthat,apartfromintensityandordering,therecoveryisnotperfect. 117


Furthermore,inordertoseetheperformanceofproposedapproach,theexperimentisrepeated50timeswitharandommixingmatrixineveryrepetition.Moreover,the50repetitionsoftheexperimentisconductedforeveryvalueofn=3,...,7.Inordertopresenttheaccuracyoftheproposedapproach,theerrorbetweenrecoveredandoriginalsourcesiscalculatedbytheformulagiveninEquation 5 .TheaverageerrorandstandarddeviationforzipcodeimagesofthesimulationarepresentedinthethirdcolumnofTables 5-15 and 5-16 respectively. 5.10.GhostEffectFivetranslatedimagesofasameindividualareconsideredinSection 5.10 .Letuscalltheseimagesasinitialimages.NowthelinearmixingoperationsimilartotheMRIscanimagesexperiment(seeSection 5.7 ),isperformedinordertoobtainvemixtureimages.ThePPC1assumptionisnotsatisedfortheveimages,andthegoalistoapproximatelyextractthepuresourcesfromthemixtureimages.Figure 5-14A displaystheinitialsources,andFigure 5-14B presentsthemixtureimages.ThevemixtureimagesarevectorizedintomatrixX2R5N,whereNdependsuponthesizeoftheimages.Now,thecolumnsofXareprojectedonthefourdimensionalspaceusingthePCAtransformation.Fromtheprojecteddata,thevebestextremepointsareidentiedusingtheproposedprojectionapproach.Takingthevepointsastheverticesofthesimplex,themixingmatrixisconstructed.Sourcesarerecoveredusingtheinformationofmixingmatrix.Figure 5-14C showstheextractedsources.Itcanbeseenfromtherecoveredsourcesthat,apartfromintensityandordering,therecoveryisnotperfect.Furthermore,inordertoseetheperformanceofproposedapproach,theexperimentisrepeated50timeswitharandommixingmatrixineveryrepetition.Furthermore,the50repetitionsoftheexperimentisconductedforeveryvalueofn=3,...,7.Inordertopresenttheaccuracyoftheproposedapproach,theerrorbetweenrecoveredandoriginalsourcesiscalculatedbytheformulagiveninEquation 5 .Theaverageerror 118


andstandarddeviationforghosteffectimagesofthesimulationarepresentedinthefourthcolumnofTables 5-15 and 5-16 respectively. 5.11.HyperplaneClusteringInordertoshowtheperformanceofAlgorithm 4.2 ,randomtestinstanceshavebeengeneratedinSection 5.11 .Forsimplicity,thecasewhenm=nisconsidered.Toshowtheperformanceoftheproposedapproach,noisefreecorrelatedsourcesareused.Allthedatainthecasestudyisarticiallygenerated.DatapointsX2R161600withoutanynoisefromrandomlygenerateddictionaryA2R1616andsourceS2R161600matriceshavebeengenerated(sourceissparse,i.e.,ineachcolumnthereisatleastonezero).Figure 5-15 representstheoriginalsourcematrix,andFigure 5-16 representsthegivendata.TheoriginaldictionarymatrixAisrandomlygeneratedforthecasestudy.MatrixshowninFigure 5-17 representstheAmatrixnormalizedseparatelywithrespecttoeachcolumn.ThecorrelationofsourcesisgivenbythematrixshowninFigure 5-18 .Thecorrelationmatrix(seeFigure 5-18 )isfarfrombeingdiagonal,therefore,thesourcesarehighlycorrelated.Nevertheless,theproposedmethodseparatetheirmixturessuccessfully,unlikeICA,sinceICAatleastrequiresthesourcestobeuncorrelated.Afterpreprocessingthedata,thehierarchicalsequenceofMIPsissolved,asdescribedinAlgorithm 4.2 .Forfastexecution,thealgorithmisjumpstartedbygeneratinginitialpoints.Specically,thefollowingtwoneighborhoodsofapointxraredened: xp2N1(xr)ixtpxr jjxtpjj2jjxrjj21(5)and xp2N2(xr)ixtpxr jjxtpjj2jjxrjj22.(5)Foriterationr,arandompointxbpisselectedasacandidatepoint,whichhasmaximumelementsinN1neighborhood.Next,allthepointsthatbelongstotheN1(xbp)are 119


consideredasthepointsthatbelongstorthhyperplane.Moreover,allthepointsthatbelongtoN2(xbp)aretakenasastartingsolutiontotherthhierarchicalproblem.Where1,2arearbitraryselected,suchthat1<2.Inthecasestudy,wehaveset1=cos(),where2[15,20]and2=cos(25).ThesetsN1andN2arethetwosamplesoftheproposedRANSACbasedalgorithm.Afterrunningtheproposedalgorithm,fromEquation 4 Aisrecovered.InFigure 5-19 ,thecolumnnormalizeddictionarymatrixAisshown.Inadditiontothat,therecoveredsourceSisshowninFigure 5-20 .TheAandAarenormalizedandtruncatedtotwodecimalplacesforthesakeofeasycomparison.ItcanbeseenthatAandAdifferonlybypermutationofthecolumns,whichshowstheexcellentperformanceoftheproposedalgorithm.Inadditiontothat,datapointsX2RmNfromrandomlygenerateddictionaryA2RnnandsourceS2RnNmatriceshavebeengenerated,withoutanynoise,fordifferentvaluesofm,nandN.Theobjectiveistostudytheperformanceoftheproposedalgorithmw.r.tthesolutiontime.Forconsistencyinthestudy,allthesimulationsarecarriedonthesamemachine(used8processorsona64processorLinuxserver).Toaccommodatetheinfeasibilityissueforhighill-conditionedAmatrix,thebesttimeoutof5runsisreported.Table 5-17 presentsthesolutiontimesforthecaseswhenm=n=6andN=600,...,3800.Table 5-18 presentsthesolutiontimesforthecaseswhenm=n=6,8,...,16andN=100n.BasedonthesimulationresultspresentedinTables 5-17 and 5-18 ,itcanbeseenthatthecomplexityoftheproblemisinclinedtowardsn,thancomparedtoN. 5.12.RobustSourceExtractionInSection 5.12 ,theapplicationandperformanceofAlgorithm 4.3 ispresented.Inallthesimulations,onlyoneiterationofthesecondordermethodisexecuted.Dataisrandomlygeneratedforthesimulations.Figure 5-21A showsoriginal7signals.ThesignalsarelinearlymixedusingarandomAmatrixtoobtaintheXmatrix.Now, 120


2%noiseisaddedtotheXmatrix.Figure 5-21B showsthenon-contaminatedandFigure 5-21C showsthecontaminatedXmatrices.Figure 5-22A showstheresultsobtainedbysimplequadraticminimization.Figure 5-22B showsthesolutionobtainedfromtheproposedapproach.Furthermore,inordertoseetheperformanceofproposedalgorithm,theexperimentisrepeated50timeswithrandommixingandsourcematricesineveryrepetition.Moreover,the50repetitionsoftheexperimentisconductedforeveryvalueofn=3,...,6.Inordertopresenttheaccuracyoftheproposedapproach,theerrorbetweenrecoveredandoriginalsourcesiscalculatedbytheformulagiveninEquation 5 .Theaverageerror,standarddeviationandaveragetimetosolveoneinstanceforthesimulatedsignalsoftheexperimentarepresentedinTable 5-19 Table5-1. Binaryclassicationcasestudy1 NameInherentDistributionNoiseCriteria CauchyCauchyDistributionrandomglobalightsofthedistributionareconsideredasnoiseSkewSkewNormalDistributionrandom10%noiseisaddedtothedata Table5-2. Cauchydata 5PE's10PE's20PE's AQG0.71570.80650.8208ACG0.69950.77020.814ACC0.66450.7290.801ACS0.8340.84050.8403 121


Table 5-3. Skew data
        5 PEs    10 PEs   20 PEs
AQG     0.902    0.901    0.909
ACG     0.9008   0.905    0.9005
ACC     0.8998   0.8993   0.9
ACS     0.9005   0.8998   0.9025

Table 5-4. Binary classification case study 2
Dataset                         Attributes (or features)   Total records   Classes   Training size
Pima Indians Diabetes (PID)     8                          768             2         400
Wisconsin Breast Cancer (WBC)   9                          683             2         300
BUPA Liver Disorders (BLD)      6                          345             2         150


Table5-5. SamplebasedperformanceofANNonPIDdata 12345678910Best AQG0.740.7550.7570.7570.7570.7560.7560.7550.7540.7540.7580.750.7570.7580.7570.7570.7560.7560.7550.7550.7540.7380.7440.7440.7480.7480.7470.7450.7430.7440.744ACG0.7620.7630.7630.7630.7630.7660.7650.7660.7660.7650.7650.7590.7610.760.760.76ACC0.6870.7460.7540.7630.7610.7610.7630.7620.7620.7610.7680.7310.7580.760.7650.7640.7650.7680.7660.7650.7640.7470.7590.7640.7620.7640.7630.760.7650.7660.763 123


Table5-6. BlockbasedperformanceofANNonPIDdata 12345678910Best AQG0.710.7440.7580.7630.7640.7620.7670.7640.7630.7660.7690.7360.7560.7620.7630.7660.7670.7690.7670.7660.7650.7460.7610.7660.7680.7650.7680.7670.7670.7620.764ACG0.7660.7670.7650.7650.7660.7690.7670.7650.7690.7650.7650.7670.7650.7650.7650.766ACC0.670.7010.7240.7410.7520.7590.7590.7650.7650.7620.770.6980.7250.7460.7540.7620.7630.7650.7670.770.7680.720.750.7620.7630.7640.7670.7640.7650.7660.769ACS0.7550.7520.7520.7560.7540.7510.7540.7540.7530.750.7560.7520.7520.7550.7490.7530.7520.750.7510.750.7550.7470.7540.7510.7520.7480.750.7490.7480.7480.75 124


Table5-7. SamplebasedperformanceofANNonBLDdata 12345678910Best AQG0.5690.5780.5780.5770.5880.5840.5910.590.5910.5920.620.5680.570.5840.5840.5960.5990.5990.6060.6110.610.5720.5740.5910.5970.6030.6110.6030.6140.620.622ACG0.5780.5790.5790.580.5830.5960.5790.5810.5850.5850.5870.5840.5960.5940.5950.592ACC0.5750.5770.5810.5790.5830.5850.5920.590.5920.5970.6270.570.5760.5820.5840.5910.5910.5970.6030.610.6130.5710.5810.5820.5920.6010.60.6080.6120.6220.627 125


Table5-8. BlockbasedperformanceofANNonBLDdata 12345678910Best AQG0.5610.570.590.5970.5950.6020.6130.6150.6310.6380.6850.570.5950.5960.610.6250.6380.6440.6530.6570.6580.580.610.6370.6440.6520.6630.6720.6680.6750.685ACG0.6120.6140.6150.6260.6330.6850.6310.6390.6430.6550.6590.660.6670.6710.6750.685ACC0.570.5780.5810.5910.6040.6040.6280.630.6320.6410.6860.5650.5850.5980.6170.6220.6310.650.6470.6590.6680.5810.6080.6340.6390.6620.6630.6670.6750.6770.686ACS0.6120.6340.6430.6370.6360.6410.6330.6460.6430.6420.6750.6370.6550.6550.6570.6560.660.6580.6570.6560.6540.6530.6750.6680.6690.6680.6690.6640.6740.670.67 126


Table5-9. SamplebasedperformanceofANNonWBCdata 12345678910Best AQG0.9660.9680.9690.9690.9690.9690.9690.9680.9680.9680.970.9650.9690.970.970.9690.970.9690.9690.9690.9690.9650.970.970.970.970.9690.9690.9690.9680.968ACG0.970.970.970.970.970.9710.970.970.970.970.9710.970.9710.9710.9710.971ACC0.9690.970.9690.9690.9690.970.9710.970.970.970.9720.970.970.9710.970.9720.970.970.9710.9710.9710.9710.970.970.970.970.970.970.970.9690.97 127


Table5-10. BlockbasedperformanceofANNonWBCdata 12345678910Best AQG0.9610.9680.9680.9680.970.970.970.970.9690.970.970.9640.9680.9680.9680.9690.970.9690.970.9690.9680.9660.970.9690.9690.9690.9690.970.970.970.97ACG0.970.9670.9690.9670.970.9730.9710.970.9710.9730.9650.9710.9650.970.9690.971ACC0.9610.9660.9680.9680.970.9710.970.970.9720.9690.9720.9650.9680.9690.970.9680.9690.9680.970.970.9690.9660.9680.9690.9710.9690.970.970.970.970.97ACS0.9650.9650.9660.9650.9640.9650.9640.9650.9670.9660.9670.9640.9660.9670.9650.9650.9650.9640.9650.9650.9660.9650.9640.9650.9640.9640.9620.9630.9630.9650.964 128


Table 5-11. Performance of ACS for different values of the parameter and number of PEs in the hidden layer on PID data
         0.5      0.8      1        1.2      1.4      1.6
5 PE     0.7556   0.749    0.7593   0.7568   0.7633   0.7616
10 PE    0.7549   0.7461   0.7585   0.7604   0.7608   0.7603
20 PE    0.7543   0.7423   0.7614   0.758    0.7585   0.7593

Table 5-12. Performance of ACS for different values of the parameter and number of PEs in the hidden layer on BLD data
         0.5      0.8      1        1.2      1.4      1.6
5 PE     0.646    0.6806   0.681    0.6861   0.6853   0.684
10 PE    0.6596   0.6884   0.6928   0.6931   0.6941   0.6928
20 PE    0.6753   0.6992   0.6996   0.6997   0.7013   0.7007

Table 5-13. Performance of ACS for different values of the parameter and number of PEs in the hidden layer on WBC data
         0.5      0.8      1        1.2      1.4      1.6
5 PE     0.9672   0.9646   0.9633   0.9648   0.9648   0.9672
10 PE    0.9665   0.9639   0.9631   0.9647   0.9634   0.9635
20 PE    0.9654   0.9621   0.9613   0.9625   0.963    0.9634

Table 5-14. Linear mixing assumption
                  THETA            ALPHA          BETA
Frequency         3.5-7.5 Hz       8-13 Hz        14-30 Hz
Activity          falling asleep   closed eyes    concentration
Error (mean)      7.32E-04         0.001          3.3539
Error (variance)  1.04E-06         2.06E-06       97.8081

Table 5-15. Average unmixing error
n    MRI Scans    Finger Prints   Zip Codes   Ghost Effect
3    1.41E-16     0.0031          0.006       1.77E-04
4    5.42E-04     0.0046          0.0111      7.19E-04
5    0.0022       0.0064          0.0152      0.0016
6    0.0069       0.0084          0.0186      0.0033
7    0.0158       0.0104          0.0263      0.0055


Table 5-16. Standard deviation of the unmixing error
n    MRI Scans   Finger Prints   Zip Codes   Ghost Effect
3    2E-16       1E-04           4E-04       1E-15
4    2E-04       4E-04           8E-04       4E-05
5    2E-05       1E-03           0.0025      1E-05
6    7E-04       0.001           0.0039      1E-05
7    0.001       0.002           0.0047      1E-04

Table 5-17. Simulation-1 results for case study 2
m, n    N       time (sec)
6       600     9.40756
6       1000    15.34375
6       1400    30.17927
6       1800    46.28162
6       2200    62.92134
6       2600    97.71409
6       3000    112.9896
6       3400    149.6679

Table 5-18. Simulation-2 results for case study 2
m, n    N       time (sec)
6       600     9.40756
8       800     46.69568
9       900     29.97633
10      1000    58.38711
11      1100    73.56582
12      1200    132.9046
14      1400    634.7418
16      1600    4320.502

Table 5-19. Performance of the correntropy minimization algorithm
            n=3, N=300   n=4, N=400   n=5, N=500   n=6, N=600   n=7, N=700
mean        0.035        0.036        0.039        0.027        0.025
std (E-02)  4.24         5.83         9.27         1.29         0.96
time (s)    0.395        9.13         139.88       284.36       527.79
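The unmixing errors reported in Tables 5-15 and 5-16 are computed after matching the recovered rows to the original rows through an assignment problem (the Hungarian method). A sketch of that error measure (Python/SciPy; the row normalization shown is an assumption of this illustration):

import numpy as np
from scipy.optimize import linear_sum_assignment

def unmixing_error(S, S_hat):
    """Permutation-matched recovery error: normalize the rows of both matrices,
    match them by solving the assignment problem, and sum the squared distances."""
    S = S / np.linalg.norm(S, axis=1, keepdims=True)
    S_hat = S_hat / np.linalg.norm(S_hat, axis=1, keepdims=True)
    cost = ((S[:, None, :] - S_hat[None, :, :])**2).sum(axis=2)   # n x n distance matrix
    rows, cols = linear_sum_assignment(cost)                      # Hungarian matching
    return cost[rows, cols].sum()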


Figure 5-1. Global view of Cauchy data with local and global flights.


Figure 5-2. Local view of Cauchy data with local and global flights.


Figure 5-3. Skew normal data with noise.


Figure 5-4. Performance of SVM on PID data. A) Accuracy. B) Number of support vectors.


Figure 5-5. Performance of SVM on BLD data. A) Accuracy. B) Number of support vectors.


Figure 5-6. Performance of SVM on WBC data. A) Accuracy. B) Number of support vectors.


Figure 5-7. EEG recordings from the monkey.


Figure 5-8. ECoG recordings from the monkey.


Figure 5-9. fMRI data visualization. A) PCA reduction to 3 dimensions. B) PCA reduction to 2 dimensions.

Figure 5-10. Convex hull PPC1 assumption. A) Convex hull representation 1. B) Convex hull representation 2.


Figure 5-11. Mixing and unmixing of MRI scans. A) Original. B) Mixture. C) Recovered.


Figure 5-12. Mixing and unmixing of fingerprints. A) Original. B) Mixture. C) Recovered.


Figure 5-13. Mixing and unmixing of zip codes. A) Original. B) Mixture. C) Recovered.


Figure 5-14. Mixing and unmixing of the ghost effect. A) Original. B) Mixture. C) Recovered.


Figure 5-15. Original sparse source (normalized) for case study 1.


Figure 5-16. Given mixtures of sources for case study 1.


Figure 5-17. Original mixing matrix for case study 1.

Figure 5-18. Mixing matrices for case study 1.

Figure 5-19. Recovered mixing matrix for case study 1.


Figure 5-20. Recovered source (normalized) for case study 1.


Figure 5-21. Data for the source extraction method. A) Original source signal. B) Mixture before adding noise. C) Mixture after adding noise.


Figure 5-22. Recovery of sources by quadratic and correntropy loss. A) Recovered source by quadratic error minimization. B) Recovered source by the proposed method.


CHAPTER 6
SUMMARY

In Chapter 3, two novel approaches integrating the concepts of correntropy in data classification are proposed. The rationale behind proposing the correntropic loss function in data classification is its ability to deemphasize outliers during the learning phase. Thus, the outliers will not have influence while obtaining the classification rule. This is an important property of the correntropy function that can be used in real world data classification problems. In addition to that, the use of the correntropic loss function in two different forms has been illustrated. In the first form, the kernel width is allowed to vary in the learning phase. In order to incorporate a varying kernel width, a CS based ANN learning is proposed (the ACC method). The ACC method uses the simple, well known delta rule to update the weights. However, the purpose of using this back-propagation mechanism is to illustrate the use of CS based ANN learning; different sophisticated methods replacing back-propagation can be used to enhance the basic ACC algorithm. Furthermore, the second form of the correntropic loss function has a fixed kernel width. Depending upon the kernel width, the loss function may be convex or invex. However, the ANN mapper inherently contains nonconvexity. Therefore, any classical gradient descent algorithm in the ANN framework may converge to a local minimum. To avoid such local convergence, the gradient descent method has been replaced by the SA algorithm. Although a simple SA is used within the ANN framework, this method can suitably incorporate other specialized forms of SA.

Chapter 4 proposes solution methods for two major sparsity based classes of the BSS problem. The proposed solution methods are broken down into two major steps. The first step involves identification of the mixing matrix; two different approaches to identify the mixing matrix, based on the non-negativity and sparsity level of the sources, are proposed. The second step involves extraction of the source matrix.


For this step, the correntropy based method is proposed. The proposed method can be used not only to identify the source matrix, but also to identify the outliers in the mixture matrix. By applying the two steps, the BSS problem in the presence of outliers can be solved efficiently.

Experiments on binary classification show that the proposed correntropic loss function improves the classification accuracy of ANN based classifiers. Furthermore, the experiments show that the proposed approaches provide tough competition to the state of the art SVM based classifier. It can be proposed that the correntropic loss function is a substantial contender for a robust measure in risk minimization. Moreover, the development of efficient algorithms for the parameter searches in ANNs will further enhance the importance of the correntropic loss function. Experiments on signal separation show that the proposed method for hyperplane clustering can solve problems up to size 16, which is unattainable with the earlier methods. Furthermore, the correntropy based source extraction method shows that a suitable kernel width can be obtained from the contaminated data, which can separate the outliers from the good data points.


normalizationapproaches.However,identifyingappropriatesourcesingeneralisnotpossible.Makegietal.[ 58 ]discussedtheaboveissue,andstatedtheimportanceofknowingwhatthesourcesareinsteadofknowingwherethesourcesareinunderstandingcorticalactivity.Furthermore,theunderdeterminedcaseisusuallyresolvedbyexperimentaldesign,whereartifactsareintroducedintothedatawhilerecordingtoreducetheunderdeterminacy.TheothertypeofcriticismthatisreceivedontheusageofBSSapproachesisonthevalidityoftheassumptionsimposedonmixingandsourcematrices.Thesmearingofsignalbyvolumeconductionisinstantaneous,thusanodelayassumptionisnotmuchofaconcern.Thelinearmixingassumptionisthecriticaloneandishardtovalidateexperimentally.However,superpositionofsignals(atypicalnaturalphenomenon)canbeusedtosupportthenotionoflinearmixing.Inadditiontothat,theassumptionsimposedonsourcesignalsareoftenobjected.Statisticalindependenceamongtheneuronalsignalsishardtojustify.Therefore,researchersworkingwithICAdirectedtheresearchjustifyingstatisticalindependenceamongartifactsandneuronalsignals.Ontheotherhand,theassumptionsofthenovelSCAapproachesareyettobeexperimentallyvalidatedonneurologicaldata.Furthermore,sparsicationmethodstransformingthegivenproblemintoasparsesourceproblemareyettobeexplored. 6.2.ConclusionConventionally,aquadraticlossfunctionisusedasameasurethesimilarity.Rockafellaretal.[ 79 ]proposedfouraxiomsforanerrormeasure:errormeasureisstrictlypositiveforanon-zeroerror,positivehomogeneity,subadditivity,andlowersemicontinuity.Homogeneityandrobustnessarecontradicting,andcannotexistsinasinglefunction.Thusinthiswork,thefollowingpropertiesfavorableforarobusterrormeasureareproposed:(1)errormeasureisstrictlypositiveforanon-zeroerror,(2)generalizedconvexity,(3)differentiability(4)symmetry,and(5)lowersemicontinuity.Oneofthegoalsofthisworkistoproposeaspecicrobustmeasure,calledcorrentropic 152


One of the goals of this work is to propose a specific robust measure, called the correntropic loss function, that calculates the similarity between two random variables y and a, and satisfies the above five properties. Furthermore, similar to the generalization of SVMs from the basic formulation to the kernel based soft margin formulation, correntropy based ANNs can be viewed as a generalized form of ANNs (both in regression [72] and classification). From rigorous experimental results, the usability of correntropy based ANNs in real world data classification problems is shown in Chapter 5.

BSS approaches based on ICA are well known in the signal separation literature. However, sparsity based BSS methods are relatively new, and their potential is yet to be explored in the area of signal processing. Through the systematic overview presented in this dissertation, the awareness of the novel sparsity based BSS methods is increased, and the differences between ICA and SCA methods are highlighted. The primary difference is that the ICA based methods are mostly suitable for artifact filtering, whereas the SCA based methods may be suitable for separating pure sources, which are not necessarily statistically independent. Similar to EEG/MEG analysis with ICA, where artifacts are induced into the signal via strategic experiments, efficient experiments for SCA can be designed where sparsity is induced into the source signals. Furthermore, sparsification methods (like wavelet transforms) that can efficiently sparsify source signals can also be used to analyze non-sparse source signals. To sum up, SCA based methods may open a new door for understanding the mysteries of the brain.

To conclude, the computational complexity of robust methods will always be an issue when compared to the traditional methods. However, properties like invexity for robust measures, and sample selection strategies for robust algorithms, will overcome the issues related to the computational complexity to a certain extent. Nevertheless, for practical scenarios, robust methods are always preferable to the traditional methods in data analysis in terms of solution quality.

the performance of robust methods in terms of solution quality is competitive with that of the traditional methods.
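To make the contrast between the quadratic loss and a correntropy-style loss concrete, the following minimal numerical sketch (an illustration added here, not taken from the dissertation) assumes the Gaussian-kernel form of the correntropic loss commonly used in the information theoretic learning literature, l(e) = beta * (1 - exp(-e^2 / (2 sigma^2))), with beta chosen so that l(1) = 1; the constants sigma and beta below are illustrative assumptions.

    # Hedged sketch: compare the quadratic loss with a Gaussian-kernel
    # correntropic-style loss as the error magnitude grows.
    import numpy as np

    def quadratic_loss(e):
        return e ** 2

    def correntropic_loss(e, sigma=1.0):
        # Bounded, differentiable, symmetric, strictly positive for e != 0.
        beta = 1.0 / (1.0 - np.exp(-1.0 / (2.0 * sigma ** 2)))
        return beta * (1.0 - np.exp(-e ** 2 / (2.0 * sigma ** 2)))

    errors = np.array([0.0, 0.5, 1.0, 5.0, 50.0])   # the last two act as outliers
    print("quadratic   :", quadratic_loss(errors))    # 0, 0.25, 1, 25, 2500
    print("correntropic:", correntropic_loss(errors)) # 0, ~0.30, 1, ~2.54, ~2.54

The quadratic loss lets a single large error dominate the total cost, whereas the correntropic-style loss saturates near beta, which is the bounded-influence behavior that motivates its use as a robust measure.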

APPENDIX
GENERALIZED CONVEXITY

In the following discussion, the functions are assumed to be twice differentiable. Obviously, convex analysis is not confined to differentiable functions, and interested readers may refer to [5, 6, 11, 61, 78] for comprehensive details. An important building block of convex analysis is the notion of a convex set: a set is said to be convex if the line segment joining any two points of the set lies completely within the set.

Definition 1. Let $f : S \to \mathbb{R}$ be a twice differentiable function, where $S$ is a nonempty convex subset of $\mathbb{R}^n$. The function $f$ is said to be convex if and only if the Hessian matrix of $f$ is positive semidefinite at each point in $S$.

Duality and the optimality conditions are the two important theories in the field of optimization that are nurtured by convexity [8]. Convexity added the crucial brick of no duality gap to the duality theory. Furthermore, it is convexity that provided the ladder for a local optimal solution to reach the status of a global optimal solution. These two theories are the backbone of almost all optimization algorithms. However, there has always been curiosity among researchers to break the strict requirements of convexity, because most practical problems tend to be non-convex. As a first successful attempt, Mangasarian [59] generalized the notion of convexity by proposing another class of functions called pseudoconvex functions.

Definition 2. Let $f : S \to \mathbb{R}$ be a differentiable function, where $S$ is a nonempty subset of $\mathbb{R}^n$. The function $f$ is said to be pseudoconvex if, for all $x_1, x_2 \in S$, $\nabla f(x_1)^T (x_2 - x_1) \ge 0$ implies $f(x_2) \ge f(x_1)$.

Pseudoconvex functions do not require the positive semidefiniteness criterion of convex functions. Furthermore, pseudoconvex functions preserve tractability, i.e., a local minimum of a pseudoconvex function on a convex domain is a global minimum. Thus, pseudoconvexity augmented the optimality conditions to a larger class

of functions. Pseudoconvexity of the objective function, along with quasiconvexity of the constraints, was assumed to provide the weakest conditions under which the Karush-Kuhn-Tucker (KKT) conditions are sufficient (under certain constraint qualifications) [5, 61]. In general, however, pseudoconvex functions fail with respect to extendability: the non-negative weighted sum of pseudoconvex functions may not be a pseudoconvex function. Therefore, the pseudoconvex theory has its own limitations. There has been a continuous effort to relax the convexity criterion while preserving the tractability and extendability characteristics, and many other ideas extending the concept of tractability can be found in the literature [11, 51]. One practically successful extension of convexity is invexity [6]. Hanson [41] proposed the characteristics of functions whose local minima are global minima, and Craven [23] subsequently named such functions invex functions.

Definition 3. Let $f : S \to \mathbb{R}$ be a differentiable function, where $S$ is a nonempty subset of $\mathbb{R}^n$. The function $f$ is said to be invex if and only if

    $f(x_2) \ge f(x_1) + \eta(x_1, x_2)^T \nabla f(x_1) \quad \forall\, x_1, x_2 \in S$        (A)

where $\eta : S \times S \to \mathbb{R}^n$ is some arbitrary vector function.

Invex functions provide not only a criterion of tractability but also a criterion of extendability. That is, a local minimum of an invex function over a convex domain is a global minimum, and there exists a criterion under which the non-negative sum of invex functions is an invex function. It may be argued that invexity comes at a price: unlike pseudoconvex functions, a sub-level set of an invex function may not be convex. However, invex functions preserve both tractability and extendability, and it is due to invexity that a huge class of functions can now be analyzed with respect to the optimality conditions. Therefore, invexity is one of the weakest properties in convex analysis that extends the theory of optimization in concluding the global optimality of a feasible solution.
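As a concrete illustration of the gap between these function classes (a hypothetical example added here, not one analyzed in the dissertation), consider $f(x_1, x_2) = x_1^2 x_2^2$. Its Hessian is indefinite away from the axes, so it is not convex; its sub-level sets are not convex, so it is not even quasiconvex (and hence not pseudoconvex); yet every stationary point lies on an axis, where $f$ attains its global minimum of zero, which is the standard differentiable characterization of invexity. A small numerical check:

    # Hedged numerical sketch: f(x1, x2) = x1^2 * x2^2 is invex but not convex.
    import numpy as np

    def f(x):
        return x[0] ** 2 * x[1] ** 2

    def hessian(x):
        return np.array([[2 * x[1] ** 2, 4 * x[0] * x[1]],
                         [4 * x[0] * x[1], 2 * x[0] ** 2]])

    # Not convex: the Hessian has a negative eigenvalue away from the axes.
    print(np.linalg.eigvalsh(hessian(np.array([1.0, -1.0]))))   # approx [-2., 6.]

    # Not quasiconvex (hence not pseudoconvex): a sub-level set is not convex.
    a, b = np.array([2.0, 0.1]), np.array([0.1, 2.0])
    mid = 0.5 * (a + b)
    print(f(a), f(b), f(mid))   # 0.04, 0.04, ~1.22 -> {x : f(x) <= 0.04} is not convex

    # Invex: the gradient (2*x1*x2^2, 2*x1^2*x2) vanishes only where x1 = 0 or
    # x2 = 0, and there f equals 0, its global minimum; every stationary point
    # is a global minimizer, the differentiable characterization of invexity.

This matches the remark above: invexity buys tractability and extendability while giving up the convexity of sub-level sets.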

The differentiability based definitions are used here because of the differentiable nature of the correntropic loss function. There are other definitions and properties of the above stated function classes, and readers are directed to [5, 11] for a comprehensive list.

Table A-1. Generalized convexity (* under constraint qualification)

Function type    Tractability    Optimality conditions    Strong duality    Extendability
Convex           True            Sufficient*              Exists            Always
Pseudoconvex     True            Sufficient*              Exists            No known criterion
Invex            True            Sufficient*              Exists            Criterion exists
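The "Criterion exists" entry for invexity can be made explicit. A standard statement from the invexity literature (paraphrased here as an assumption about what the table refers to, along the lines of [6]) is that non-negative combinations preserve invexity whenever the summands share the same kernel function $\eta$:

    \[
      f_1, \dots, f_m \ \text{invex on } S \text{ with respect to a common } \eta
      \;\Longrightarrow\;
      \sum_{i=1}^{m} \alpha_i f_i \ \text{is invex with respect to } \eta,
      \qquad \alpha_1, \dots, \alpha_m \ge 0 .
    \]

The argument is immediate from Definition 3: multiplying each inequality (A) by $\alpha_i \ge 0$ and summing preserves both the inequality and the common $\eta$. No analogous certificate is available for pseudoconvex summands, which is consistent with the "No known criterion" entry.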

REFERENCES

[1] Aharon, M., Elad, M., & Bruckstein, A. (2006). On the uniqueness of overcomplete dictionaries, and a practical way to retrieve them. Linear Algebra and its Applications, 416(1), 48.
[2] Alizamir, S., Rebennack, S., & Pardalos, P. (2008). Improving the neighborhood selection strategy in simulated annealing using the optimal stopping problem. Simulated Annealing, C. M. Tan (Ed.), (pp. 363).
[3] Anthony, M., & Bartlett, P. (2009). Neural Network Learning: Theoretical Foundations. Cambridge University Press.
[4] Antonov, G., & Katkovnik, V. (1972). Generalization of the concept of statistical gradient. Avtomat. i Vycisl. Tehn. (Riga), 4, 25.
[5] Bazaraa, M., Sherali, H., & Shetty, C. (2006). Nonlinear Programming: Theory and Algorithms. Wiley-Interscience.
[6] Ben-Israel, A., & Mond, B. (1986). What is invexity? J. Austral. Math. Soc. Ser. B, 28(1), 1.
[7] Bereanu, B. (1972). Quasi-convexity, strictly quasi-convexity and pseudo-convexity of composite objective functions. ESAIM: Mathematical Modelling and Numerical Analysis - Modelisation Mathematique et Analyse Numerique, 6(R1), 15.
[8] Bertsekas, D. (2003). Convex Analysis and Optimization. Athena Scientific, Belmont.
[9] Boser, B., Guyon, I., & Vapnik, V. (1992). A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, (pp. 144). ACM.
[10] Bradley, P., & Mangasarian, O. (2000). k-plane clustering. Journal of Global Optimization, 16(1), 23.
[11] Cambini, A., & Martein, L. (2008). Generalized Convexity and Optimization: Theory and Applications, vol. 616. Springer.
[12] Capel, D. (2005). An effective bail-out test for RANSAC consensus scoring. In Proc. BMVC, (pp. 629).
[13] Catoni, O. (1996). Metropolis, simulated annealing, and iterated energy transformation algorithms: theory and experiments. Journal of Complexity, 12(4), 595.
[14] Chan, T.-H., Ma, W.-K., Chi, C.-Y., & Wang, Y. (2008). A convex analysis framework for blind separation of non-negative sources. Signal Processing, IEEE Transactions on, 56(10), 5120.

[15] Chen, B., & Principe, J. (2012). Maximum correntropy estimation is a smoothed MAP estimation. Signal Processing Letters, IEEE, 19(8), 491.
[16] Chum, O., & Matas, J. (2002). Randomized RANSAC with Td,d test. In Proc. British Machine Vision Conference, vol. 2, (pp. 448).
[17] Chum, O., & Matas, J. (2005). Matching with PROSAC - progressive sample consensus. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 1, (pp. 220). IEEE.
[18] Chum, O., & Matas, J. (2008). Optimal randomized RANSAC. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 30(8), 1472.
[19] Chum, O., Matas, J., & Kittler, J. (2003). Locally optimized RANSAC. In Pattern Recognition, (pp. 236). Springer.
[20] Cichocki, A., & Amari, S. (2002). Blind Signal and Image Processing. Wiley Online Library.
[21] Cichocki, A., Zdunek, R., & Amari, S. (2006). New algorithms for non-negative matrix factorization in applications to blind source separation. In Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on, vol. 5, (pp. V-V). IEEE.
[22] Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273.
[23] Craven, B. (1981). Duality for generalized convex fractional programs. Generalized Concavity in Optimization and Economics, (pp. 437).
[24] Daubechies, I., Roussos, E., Takerkart, S., Benharrosh, M., Golden, C., D'Ardenne, K., Richter, W., Cohen, J., & Haxby, J. (2009). Independent component analysis for brain fMRI does not select for independence. Proceedings of the National Academy of Sciences, 106(26), 10415.
[25] Eddington, S. (1914). Stellar Movements and the Structure of the Universe. Macmillan and Company, Limited.
[26] Erdogmus, D., Principe, J., & Hild II, K. E. (2002). Beyond second-order statistics for learning: A pairwise interaction model for entropy estimation. Natural Computing, 1(1), 85.
[27] Fan, R., Chen, P., & Lin, C. (2005). Working set selection using second order information for training support vector machines. The Journal of Machine Learning Research, 6, 1889.
[28] Fischler, M., & Bolles, R. (1980). Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Tech. rep., DTIC Document.

[29] Fischler, M., & Bolles, R. (1981). Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6), 381.
[30] Fisher, R., et al. (1920). A mathematical examination of the methods of determining the accuracy of an observation by the mean error, and by the mean square error. Monthly Notices of the Royal Astronomical Society, 80, 758.
[31] Geary, R. (1947). Testing for normality. Biometrika, 34(3/4), 209.
[32] Georgiev, P., Pardalos, P., & Theis, F. (2007). A bilinear algorithm for sparse representations. Computational Optimization and Applications, 38(2), 249.
[33] Georgiev, P., & Theis, F. (2004). Blind source separation of linear mixtures with singular matrices. Independent Component Analysis and Blind Signal Separation, (pp. 121).
[34] Georgiev, P., Theis, F., & Cichocki, A. (2005). Sparse component analysis and blind source separation of underdetermined mixtures. Neural Networks, IEEE Transactions on, 16(4), 992.
[35] Georgiev, P., Theis, F., Cichocki, A., & Bakardjian, H. (2007). Sparse component analysis: a new tool for data mining. Data Mining in Biomedicine, (pp. 91).
[36] Georgiev, P., Theis, F., & Ralescu, A. (2007). Identifiability conditions and subspace clustering in sparse BSS. Independent Component Analysis and Signal Separation, (pp. 357).
[37] Gribonval, R., & Schnass, K. (2010). Dictionary identification - sparse matrix-factorization via l1-minimization. Information Theory, IEEE Transactions on, 56(7), 3523.
[38] Gunn, S. (1998). Support vector machines for classification and regression. ISIS Technical Report, 14.
[39] Hampel, F. (1973). Robust estimation: A condensed partial survey. Probability Theory and Related Fields, 27(2), 87.
[40] Hampel, F., Ronchetti, E., Rousseeuw, P., & Stahel, W. (2011). Robust Statistics: The Approach Based on Influence Functions, vol. 114. Wiley.
[41] Hanson, M. (1981). On sufficiency of the Kuhn-Tucker conditions. Journal of Mathematical Analysis and Applications, 80(2), 545.
[42] He, R., Zheng, W.-S., & Hu, B.-G. (2011). Maximum correntropy criterion for robust face recognition. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 33(8), 1561.

[43] He, R., Zheng, W.-S., Hu, B.-G., & Kong, X.-W. (2011). A regularized correntropy framework for robust pattern recognition. Neural Computation, 23(8), 2074.
[44] Heisele, B., Ho, P., & Poggio, T. (2001). Face recognition with support vector machines: Global versus component-based approach. In Computer Vision, 2001. ICCV 2001. Proceedings. Eighth IEEE International Conference on, vol. 2, (pp. 688). IEEE.
[45] Herault, J., Jutten, C., & Ans, B. (1985). Detection de grandeurs primitives dans un message composite par une architecture de calcul neuromimetique en apprentissage non supervise. In 10 Colloque sur le traitement du signal et des images, FRA, 1985. GRETSI, Groupe d'Etudes du Traitement du Signal et des Images.
[46] Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5), 359.
[47] Huber, P. (1981). Robust Statistics.
[48] Huber, P. (1997). Robust Statistical Procedures. 27. SIAM.
[49] Huber, P. (2012). Data Analysis: What Can Be Learned From the Past 50 Years, vol. 874. Wiley.
[50] Hyvarinen, A., & Oja, E. (2000). Independent component analysis: algorithms and applications. Neural Networks, 13(4), 411.
[51] Khanh, P. (1995). Invex-convexlike functions and duality. Journal of Optimization Theory and Applications, 87(1), 141.
[52] Kim, K., Jung, K., Park, S., & Kim, H. (2002). Support vector machines for texture classification. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 24(11), 1542.
[53] Kirkpatrick, S., Gelatt, C., & Vecchi, M. (1983). Optimization by simulated annealing. Science, 220(4598), 671.
[54] Kreutz-Delgado, K., Murray, J., Rao, B., Engan, K., Lee, T., & Sejnowski, T. (2003). Dictionary learning algorithms for sparse representation. Neural Computation, 15(2), 349.
[55] Liu, W., Pokharel, P., & Principe, J. (2006). Error entropy, correntropy and m-estimation. In Machine Learning for Signal Processing, 2006. Proceedings of the 2006 16th IEEE Signal Processing Society Workshop on, (pp. 179). IEEE.
[56] Liu, W., Pokharel, P., & Principe, J. (2007). Correntropy: properties and applications in non-Gaussian signal processing. Signal Processing, IEEE Transactions on, 55(11), 5286.

[57] Lundy, M., & Mees, A. (1986). Convergence of an annealing algorithm. Mathematical Programming, 34(1), 111.
[58] Makeig, S., Jung, T.-P., Ghahremani, D., Bell, A., & Sejnowski, T. (1996). What (not where) are the sources of the EEG? In The 18th Annual Meeting of The Cognitive Science Society.
[59] Mangasarian, O. (1965). Pseudo-convex functions. Journal of the Society for Industrial & Applied Mathematics, Series A: Control, 3(2), 281.
[60] Mangasarian, O. (1968). Convexity, pseudo-convexity and quasi-convexity of composite functions.
[61] Mangasarian, O. (1994). Nonlinear Programming. Society for Industrial and Applied Mathematics, Philadelphia, PA.
[62] McCulloch, W., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biology, 5(4), 115.
[63] Mehrotra, K., Mohan, C., & Ranka, S. (1997). Elements of Artificial Neural Networks. The MIT Press.
[64] Michalewicz, Z., & Fogel, D. (2004). How to Solve It: Modern Heuristics. Springer-Verlag New York Inc.
[65] Michie, D., Spiegelhalter, D., & Taylor, C. (1994). Machine Learning, Neural and Statistical Classification. Ellis Horwood Series in Artificial Intelligence, New York, NY: Ellis Horwood, c1994, edited by Michie, Donald; Spiegelhalter, David J.; Taylor, Charles C., 1.
[66] Minsky, M., & Papert, S. (1988). Perceptrons. In Neurocomputing: Foundations of Research, (pp. 157). MIT Press.
[67] Naanaa, W., & Nuzillard, J. (2005). Blind source separation of positive and partially correlated data. Signal Processing, 85(9), 1711.
[68] Nister, D. (2005). Preemptive RANSAC for live structure and motion estimation. Machine Vision and Applications, 16(5), 321.
[69] Pardalos, P., Boginski, V., & Vazacopoulos, A. (2007). Data Mining in Biomedicine. Springer Verlag.
[70] Pardalos, P., Pitsoulis, L., Mavridou, T., & Resende, M. (1995). Parallel search for combinatorial optimization: genetic algorithms, simulated annealing, tabu search and GRASP. Parallel Algorithms for Irregularly Structured Problems, (pp. 317).
[71] Parzen, E. (1962). On estimation of a probability density function and mode. The Annals of Mathematical Statistics, 33(3), 1065.

[72] Principe, J. (2010). Information Theoretic Learning: Renyi's Entropy and Kernel Perspectives. Springer Verlag.
[73] Principe, J., Xu, D., & Fisher, J. (2000). Information theoretic learning. Unsupervised Adaptive Filtering, 1, 265.
[74] Raguram, R., Frahm, J.-M., & Pollefeys, M. (2008). A comparative analysis of RANSAC techniques leading to adaptive real-time random sample consensus. In Computer Vision - ECCV 2008, (pp. 500). Springer.
[75] Reeves, C. (1993). Modern Heuristic Techniques for Combinatorial Problems. John Wiley & Sons, Inc.
[76] Renyi, A. (1965). On the foundations of information theory. Revue de l'Institut International de Statistique, (pp. 1).
[77] Robbins, H., & Monro, S. (1951). A stochastic approximation method. The Annals of Mathematical Statistics, (pp. 400).
[78] Rockafellar, R. (1997). Convex Analysis, vol. 28. Princeton University Press.
[79] Rockafellar, R., Uryasev, S., & Zabarankin, M. (2008). Risk tuning with generalized linear regression. Mathematics of Operations Research, 33(3), 712.
[80] Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6), 386.
[81] Rubinov, A., & Ugon, J. (2003). Skeletons of finite sets of points. Submitted paper.
[82] Rubinstein, R. (1983). Smoothed functionals in stochastic optimization. Mathematics of Operations Research, (pp. 26).
[83] Santamaria, I., Pokharel, P., & Principe, J. (2006). Generalized correlation function: Definition, properties, and application to blind equalization. Signal Processing, IEEE Transactions on, 54(6), 2187.
[84] Scholkopf, B., Burges, C., & Vapnik, V. (1995). Extracting support data for a given task. In Proceedings, First International Conference on Knowledge Discovery & Data Mining. AAAI Press, Menlo Park, CA, (pp. 252).
[85] Shannon, C. (1948). A mathematical theory of communication.
[86] Singh, A., & Principe, J. (2010). A loss function for classification based on a robust similarity metric. In Neural Networks (IJCNN), The 2010 International Joint Conference on, (pp. 1). IEEE.
[87] Styblinski, M., & Tang, T. (1990). Experiments in nonconvex optimization: stochastic approximation with function smoothing and simulated annealing. Neural Networks, 3(4), 467.

[88] Sun, Y., & Xin, J. (2012). Nonnegative sparse blind source separation for NMR spectroscopy by data clustering, model reduction, and l1 minimization. SIAM Journal on Imaging Sciences, 5(3), 886.
[89] Syed, M., Georgiev, P., & Pardalos, P. (2012). A hierarchical approach for sparse source blind signal separation problem. Computers & Operations Research, available online.
[90] Syed, M., Georgiev, P., & Pardalos, P. (2013). Blind signal separation methods in computational neuroscience. In Neuromethods. Springer, to appear.
[91] Syed, M., & Pardalos, P. (2013). Neural network models in combinatorial optimization. In Handbook of Combinatorial Optimization. Springer, to appear.
[92] Syed, M., Pardalos, P., & Principe, J. (2013). On the optimization of the correntropic loss function in data analysis. Optimization Letters, available online.
[93] Syed, M., Principe, J., & Pardalos, P. (2012). Correntropy in data classification. In Dynamics of Information Systems: Mathematical Foundations, (pp. 81). Springer.
[94] Lee, T.-W. (1998). Independent Component Analysis, Theory and Applications. Boston: Kluwer Academic Publishers.
[95] Tong, S., & Koller, D. (2002). Support vector machine active learning with applications to text classification. The Journal of Machine Learning Research, 2, 45.
[96] Tordoff, B., & Murray, D. (2002). Guided sampling and consensus for motion estimation. In Computer Vision - ECCV 2002, (pp. 82). Springer.
[97] Tukey, J. (1960). A survey of sampling from contaminated distributions. Contributions to Probability and Statistics, 2, 448.
[98] Tukey, J. (1962). The future of data analysis. The Annals of Mathematical Statistics, 33(1), 1.
[99] Vapnik, V. (1999). An overview of statistical learning theory. Neural Networks, IEEE Transactions on, 10(5), 988.
[100] Vapnik, V. (2000). The Nature of Statistical Learning Theory. Springer Verlag.
[101] Vapnik, V., Golowich, S., & Smola, A. (1996). Support vector method for function approximation, regression estimation, and signal processing. In Advances in Neural Information Processing Systems 9.

[102] Weston, J., & Watkins, C. (1998). Multi-class support vector machines. Tech. rep., Technical Report CSD-TR-98-04, Department of Computer Science, Royal Holloway, University of London.
[103] Yang, Z., Xiang, Y., Rong, Y., & Xie, S. (2013). Projection-pursuit-based method for blind separation of nonnegative sources. Neural Networks and Learning Systems, IEEE Transactions on, 24(1), 47.
[104] Zhang, J., Xanthopoulos, P., Chien, J., Tomaino, V., & Pardalos, P. (2011). Minimum prediction error models and causal relations between multiple time series. Wiley Encyclopedia of Operations Research and Management Science, J. J. Cochran (ed.), 3, 1843.

BIOGRAPHICAL SKETCH

Naqeebuddin Mujahid Syed received his Bachelor of Engineering (BE) in Mechanical Engineering from Muffakham Jah College of Engineering and Technology (MJCET), Osmania University (OU), Hyderabad, India, in 2005. He was awarded two Gold Medals in BE (Mechanical Engineering), from MJCET as well as from OU. He received his Master of Science (MS) in Systems Engineering (SE) from King Fahd University of Petroleum and Minerals (KFUPM), Dhahran, Saudi Arabia, in 2007. He received the Outstanding Academic Performance award for the academic year 2006-07 from the College of Computer Science & Engineering (CCSE) at KFUPM. From 2007 to 2009, he served as a Lecturer-B in the SE Department at KFUPM. He received his Doctor of Philosophy (PhD) in Operations Research from the Industrial and Systems Engineering (ISE) Department at the University of Florida (UFL). During his PhD, he received the Outstanding International Student award at UFL for the years 2009, 2011 and 2012. In addition, he received the Graduate Student Teaching award from the ISE Department at UFL.