Robust Data Mining Techniques with Application in Biomedicine and Engineering


Material Information

Title:
Robust Data Mining Techniques with Application in Biomedicine and Engineering
Physical Description:
1 online resource (106 p.)
Language:
English
Creator:
Xanthopoulos, Petros
Publisher:
University of Florida
Place of Publication:
Gainesville, Fla.
Publication Date:

Thesis/Dissertation Information

Degree:
Doctorate (Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Industrial and Systems Engineering
Committee Chair:
Pardalos, Panagote M
Committee Members:
Geunes, Joseph P
Lan, Guanghui
Thai, My Tra

Subjects

Subjects / Keywords:
data -- robust -- supervised -- uncertainty
Industrial and Systems Engineering -- Dissertations, Academic -- UF
Genre:
Industrial and Systems Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract:
Analysis and interpretation of large datasets is a very significant problem that arises in many areas of science. The task of data analysis becomes even harder when data are uncertain or imprecise. Such uncertainties introduce bias and make massive data analysis an even more challenging task. Over the years, many mathematical methodologies for data analysis based on mathematical programming have been developed. In this field, the deterministic approach for handling uncertainty and immunizing algorithms against undesired scenarios is robust optimization. In this work we examine the application of robust optimization to some well-known data mining algorithms. We explore the optimization structure of such algorithms and then state their robust counterpart formulations.
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Bibliography:
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility:
by Petros Xanthopoulos.
Thesis:
Thesis (Ph.D.)--University of Florida, 2011.
Local:
Adviser: Pardalos, Panagote M.
Electronic Access:
RESTRICTED TO UF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE UNTIL 2012-02-29

Record Information

Source Institution:
UFRGP
Rights Management:
Applicable rights reserved.
Classification:
lcc - LD1780 2011
System ID:
UFE0043173:00001




Full Text

PAGE 1

ROBUST DATA MINING TECHNIQUES WITH APPLICATION IN BIOMEDICINE AND ENGINEERING

By

PETROS XANTHOPOULOS

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2011

PAGE 2

© 2011 Petros Xanthopoulos

PAGE 3

To my grandmother

PAGE 4

ACKNOWLEDGMENTS

During my doctoral studies in Florida, I had the luck to meet and work with many interesting people from many different countries and technical backgrounds. Apart from the technical skills that I received in the PhD program of the Industrial and Systems Engineering Department, I mostly learned how to be part of a team and how to communicate my ideas and thoughts more efficiently. So first of all, I would like to acknowledge my academic advisor Dr. Panos M. Pardalos, who guided me through all the stages of my doctoral studies. His mentoring helped my development on multiple levels.

Also I would like to acknowledge my first mentor and undergraduate thesis advisor, professor Michalis Zervakis from the Electronics and Computer Engineering department at the Technical University of Crete, the person who showed me how to conduct research and helped me significantly during the first steps of my career. I would like to thank all my colleagues and friends at the Center for Applied Optimization, University of Florida. I had the chance to develop fruitful collaborations and friendships all this time. I want to thank my parents Lazaros and Kaiti for their support of my personal decisions and for the freedom they gave me. I would also like to thank Sibel for her continuous help and support. Last but not least, I would like to thank Dr. Joseph P. Geunes, Dr. Guanghui (George) Lan and Dr. My T. Thai for serving on my PhD committee and dedicating part of their valuable time to this task.

PAGE 5

TABLE OF CONTENTS

                                                                        page
ACKNOWLEDGMENTS .......................................................... 4
LIST OF TABLES ........................................................... 7
LIST OF FIGURES .......................................................... 8
ABSTRACT ................................................................. 9

CHAPTER

1  INTRODUCTION ......................................................... 10
   1.1  A Brief Overview ................................................ 10
   1.2  A Methodological Perspective .................................... 14
   1.3  A Brief History of Robustness ................................... 18
        1.3.1  Robust Optimization vs Stochastic Programming ............ 20
        1.3.2  Thesis Goal and Structure ................................ 21

2  DATA MINING METHODS .................................................. 23
   2.1  Linear Least Squares ............................................ 23
        2.1.1  Weighted Linear Least Squares ............................ 26
        2.1.2  Computational Aspects of Linear Least Squares ............ 27
        2.1.3  Least Absolute Shrinkage and Selection Operator .......... 28
   2.2  Principal Component Analysis .................................... 29
        2.2.1  Maximum Variance Approach ................................ 29
        2.2.2  Minimum Error Approach ................................... 31
   2.3  Linear Discriminant Analysis .................................... 32
        2.3.1  Generalized Discriminant Analysis ........................ 35
   2.4  Support Vector Machines ......................................... 36
        2.4.1  Alternative Objective Function ........................... 41
   2.5  Regularized Generalized Eigenvalue Classification ............... 42
   2.6  Case Study - Cell Death Discrimination .......................... 45
        2.6.1  Materials ................................................ 47
               2.6.1.1  Dataset ......................................... 47
               2.6.1.2  Raman spectroscope .............................. 48
               2.6.1.3  Data preprocessing .............................. 49
        2.6.2  Software ................................................. 49
        2.6.3  Nearest Neighbor Classification .......................... 49
        2.6.4  Improved Iterative Scaling ............................... 50
        2.6.5  Results and Discussion ................................... 51
               2.6.5.1  Classification and model selection .............. 51
               2.6.5.2  Impact of training set dimension ................ 52

PAGE 6

3  ROBUST LEAST SQUARES ................................................. 54
   3.1  The Original Problem ............................................ 54
   3.2  Variations of the Original Problem .............................. 58

4  ROBUST PRINCIPAL COMPONENT ANALYSIS .................................. 60

5  ROBUST LINEAR DISCRIMINANT ANALYSIS .................................. 63

6  ROBUST SUPPORT VECTOR MACHINES ....................................... 67

7  ROBUST GENERALIZED EIGENVALUE CLASSIFICATION ......................... 72
   7.1  Under Ellipsoidal Uncertainty ................................... 72
        7.1.1  Robust Counterpart Under Ellipsoidal Uncertainty Set ..... 75
        7.1.2  Balancing Between Robustness and Optimality .............. 78
   7.2  Computational Results ........................................... 78
        7.2.1  A Case Study ............................................. 78
        7.2.2  Experiments .............................................. 80
   7.3  Extensions ...................................................... 84
        7.3.1  Matrix Norm Constrained Uncertainty ...................... 87
        7.3.2  Nonlinear Robust Generalized Classification .............. 91

8  CONCLUSIONS .......................................................... 94

APPENDIX

A  OPTIMALITY CONDITIONS ................................................ 95

B  DUAL NORMS ........................................................... 99

REFERENCES ............................................................. 100

BIOGRAPHICAL SKETCH .................................................... 106

PAGE 7

LIST OF TABLES

Table                                                                   page
2-1  Average classification accuracy for holdout cross validation (100 repetitions) .. 51
7-1  Dataset description ................................................ 80

PAGE 8

LIST OF FIGURES

Figure                                                                  page
1-1  Moore's "law" ...................................................... 12
1-2  Kryder's "law" ..................................................... 12
1-3  The big picture .................................................... 14
2-1  The single input single outcome case ............................... 24
2-2  LLS and regularization ............................................. 26
2-3  Two paths for PCA .................................................. 30
2-4  Directed acyclic graph approach for a four class example ........... 41
2-5  Explanation of Raman ............................................... 46
2-6  The mean Raman spectra for each class .............................. 48
2-7  Classification accuracy versus size of the training set (binary) ... 52
2-8  Classification accuracy versus size of the training set (multiclass)  53
7-1  Objective function value dependence on the ellipse parameters ...... 79
7-2  Analysis for Pima Indian dataset ................................... 81
7-3  Analysis for iris dataset .......................................... 81
7-4  Analysis for wine dataset .......................................... 82
7-5  Analysis for dataset NDC ........................................... 83
7-6  Comparison graph for (a) Pima Indian (b) iris datasets ............. 85
7-7  Comparison graph for (c) wine and (d) NDC datasets ................. 86

PAGE 9

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

ROBUST DATA MINING TECHNIQUES WITH APPLICATION IN BIOMEDICINE AND ENGINEERING

By

Petros Xanthopoulos

August 2011

Chair: Panos M. Pardalos
Major: Industrial and Systems Engineering

Analysis and interpretation of large datasets is a very significant problem that arises in many areas of science. The task of data analysis becomes even harder when data are uncertain or imprecise. Such uncertainties introduce bias and make massive data analysis an even more challenging task. Over the years, many mathematical methodologies for data analysis based on mathematical programming have been developed. In this field, the deterministic approach for handling uncertainty and immunizing algorithms against undesired scenarios is robust optimization. In this work we examine the application of robust optimization to some well-known data mining algorithms. We explore the optimization structure of such algorithms and then state their robust counterpart formulations.

PAGE 10

CHAPTER 1
INTRODUCTION

Data Mining (DM), conceptually, is a very general term that encapsulates a large number of methods, algorithms and technologies. The common denominator among all of these is their ability to extract useful patterns and associations from data usually stored in large databases. Thus DM techniques aim to extract knowledge and information out of data. This task is crucial, especially today, mainly because of the emerging needs and capabilities that technological progress creates. In the present work we investigate some of the most well known data mining algorithms from an optimization perspective and we study the application of Robust Optimization (RO) to them. This combination is essential in order to address the unavoidable problem of data uncertainty that arises in almost all realistic problems that involve data analysis.

1.1 A Brief Overview

Before we state the mathematical problems of this thesis, we provide, for the sake of completeness, a historical and methodological overview of DM. Historically, DM evolved in its current form during the last few decades from the interplay of classical statistics and Artificial Intelligence (AI). It is worth mentioning that through this evolution process DM developed strong bonds with computer science and optimization theory. In order to study modern concepts and trends of DM we first need to understand its foundations and its interconnections with the four aforementioned disciplines.

Artificial Intelligence: The perpetual need/desire of humans to create artificial machines/algorithms able to learn, decide and act as humans gave birth to Artificial Intelligence. Officially AI was born in 1956 at a conference held at Dartmouth College. The term itself was coined by J. McCarthy at the same conference. The goals of AI stated at this first conference can today be characterized as superficial from a pessimist's perspective or as open problems from an optimist's perspective. By reading again the goals of this conference we can get a rough idea of the expectations and the

PAGE 11

goals of the early AI community: "To proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can be so precisely described that a machine can be made to simulate it" [46]. Despite the fact that even today understanding cognition remains an open problem for computational and clinical scientists, this founding conference of AI stimulated the scientific community and triggered the development of algorithms and methods that became the foundations of modern machine learning. For instance, Bayesian methods were further developed and studied as part of AI research. Computer programming languages like LISP and PROLOG were developed for serving AI purposes, and algorithms and methods like the perceptron, backpropagation and artificial neural networks (ANN) were invented.

Computer Science/Engineering: In the literature DM is often classified as a branch of computer science (CS). Indeed, most DM research has been driven by the CS community. In addition to this, there were several advances in the CS area that boosted DM related research. Database modeling together with smart search algorithms made possible the indexing and processing of massive databases [1]. The advances at the software level in the area of database modeling and search algorithms were accompanied by a parallel development of semiconductor technologies and computer hardware engineering.

In fact there is a feedback relation between DM and computer engineering that drives the research in both areas. Computer engineering provides cheaper and larger storage and processing power. On the other hand, these new capabilities pose new problems for the DM community related to the processing of such amounts of data. These problems create new algorithms and new needs for processing power that are in turn addressed by the computer engineering community. The progress in this area can be best described by the so-called Moore's "law" (named after Intel's cofounder G. E. Moore), which predicted that the number of transistors on a chip will double every 24 months [47]. This simple rule was able to predict the count of transistors per chip at least until today (Fig. 1-1).

PAGE 12

[Figure 1-1. Moore's "law" drives the semiconductor market even today. This plot shows the transistor count of several processors from 1970 until today for two major processor manufacturing companies (Intel and AMD).]

Similar empirical "laws" have been stated for hard drive capacity and hard drive price. In fact, it has been observed that hard drive capacity increases ten times every five years and the cost becomes one tenth every five years. This empirical observation is known as Kryder's "law" (Fig. 1-2) [76]. Last, a similar rule related to network bandwidth per user (Nielsen's "law") indicates that it increases by 50% annually [49].

[Figure 1-2. Kryder's "law" describes the exponential decrease of computer storage cost over time. This rule is able to predict approximately the cost of storage space over the last decade.]

PAGE 13

The fact that computer progress is characterized by all these exponential empirical rules is indicative of the continuous and rapid transformation of DM's needs and capabilities.

Optimization: The mathematical theory of optimization is a branch of mathematics that was originally developed for serving the needs of operations research (OR). It is worth noting that a large number of data mining problems can be described as optimization problems, sometimes tractable, sometimes not. For example, Principal Component Analysis (PCA) and Fisher's Linear Discriminant Analysis (LDA) are formulated as minimization/maximization problems of certain statistical criteria [15]. Support Vector Machines (SVMs) can be described as a convex optimization problem [74], many times linear programming can be used for the development of supervised learning algorithms [44], and several metaheuristics have been proposed for adjusting the parameters of supervised learning models [16]. On the other side, data mining methods are often used as preprocessing before employing some optimization model (e.g., clustering). In addition, a branch of DM involves network models and optimization problems on networks for understanding the complex relationships that underlie the nodes and the edges. In this sense, optimization is a tool that can be employed in order to solve DM problems. In a recent review paper the interplay of operations research, data mining and applications was described by the scheme shown in Fig. 1-3 [51].

Statistics: Statistics set the foundation for many concepts broadly used in data mining. Historically, one of the first attempts to understand interconnections between data was Bayes' analysis in 1763 [8]. Other concepts include regression analysis, hypothesis testing, PCA and LDA. In modern DM it is very common to maximize or minimize certain statistical properties in order to achieve some grouping or to find interconnections among groups of data.

PAGE 14

[Figure 1-3. The big picture. Scheme capturing the interdependence among DM, OR and the various application fields.]

1.2 A Methodological Perspective

As can be understood, the study of DM passes essentially through the aforementioned disciplines. In addition, within the field of DM, algorithms and problems can be grouped in certain categories for a more systematic study and understanding. Here we briefly list these categories:

- Data preprocessing methods
- Machine learning
  - Unsupervised learning
  - Supervised learning
  - Semisupervised learning
- Feature selection - feature extraction
- Data visualization/representation

Now we will briefly discuss these major categories. Data preprocessing includes all algorithms responsible for data preparation. Time series filtering, outlier detection, data cleaning algorithms and data normalization algorithms fall in this category. Proper data preprocessing is essential for a more efficient performance of learning algorithms.

PAGE 15

Machine Learning (ML) is maybe the most important part of data mining. Machine learning is a set of algorithms that take a dataset as input, possibly together with some information about it. The output of ML is a set of rules that lets us make inferences about any new data point.

Unsupervised Learning (UL), sometimes also known as clustering, aims to find associations between data points (clusters). Clustering is usually performed when no information is given about the structure of the dataset. It can be used for exploratory purposes (e.g., to identify specific data structure that can be used for more efficient supervised algorithm design). For a more extensive tour of data clustering we refer the reader to [36].

Supervised Learning (SL) is one of the most well known classes of data mining algorithms. SL algorithms are given a set of data points (data samples) with known properties (features) and the classes they belong to (labels). Then, the SL algorithm trains a model which, at the end of the training phase, is capable of deciding on the identity of new data points with unknown label (test dataset). In this category one can include artificial neural networks, Bayesian classifiers, k-nearest neighbor classification, genetic algorithms and others [15]. If the samples contain qualitative feature values, then rule based classification can be employed [58, 59]. Especially for the general two class case, binary classification, one of the most commonly used approaches is SVMs. Originally proposed by Vapnik [74], SVM aims to determine a separation hyperplane from the training dataset. SVM possesses a solid mathematical foundation in optimization theory. If the two classes of data cannot be discriminated with a linear hyperplane, then the problem can be addressed as a nonlinear classification problem. Such problems can be attacked using the so-called kernel trick. In this case, the original datasets are embedded in higher dimension spaces where perfect linear separation can be achieved [66]. The combined use of supervised classification methods with kernels is the most common way to address data mining problems.

PAGE 16

Semisupervised learning lies in between supervised and unsupervised learning. In this case, class labels are known only for a portion of the available data and some partial information is given to the algorithm, usually in the form of pairwise constraints (e.g., points a and b belong or do not belong to the same class). The goal in this case is to achieve optimal utilization of this information in order to obtain the highest predictive accuracy. Biclustering tries to find associated groups of features with corresponding groups of samples. In this way, one can decide on the most important features that are responsible for a group of samples with specific characteristics. It has been extensively used in microarray data analysis for associating genes with specific phenotypes. There are numerous algorithmic approaches to biclustering, ranging from greedy algorithms and spectral biclustering to column reordering and 0-1 fractional programming. For a more mathematically rigorous review of biclustering and its applications we refer the reader to [18, 77]. It is worth mentioning that biclustering can have either a supervised or an unsupervised version.

Feature selection consists of determining the most important properties (called features) of each data point (called sample) that will be used for training. For example, given a set of people (i.e., samples) and some of their features like weight, height, eye color and hair color, we wish to distinguish the infants from the adults. A feature selection algorithm will try to select a small subset of features that have the largest combined discriminatory power (in this case, maybe weight and height). In the case of problems described by hundreds or thousands of characteristics, feature selection permits reducing the number of variables, with a great advantage in terms of the computational time needed to obtain the solution. Examples of algorithms for feature selection are RFE and Relief [32]. Here we need to point out the difference between feature selection and feature extraction.

Feature extraction consists of the construction of some features that do not necessarily belong to the set of the original features. The significant reduction of the

PAGE 17

original number of features, due to a feature selection or feature extraction algorithm, is called dimensionality reduction, which is essential in order to improve the processing time. A standard approach for feature extraction is through PCA.

Data visualization/representation: This last branch of data mining deals with methods by which the extracted information can be represented to or visualized by the end user. Massive datasets need special types of algorithms for analysis and representation [1]. One common way to represent information and potential relationships between entities is through graphs (also referred to as networks). Network representation can be very useful in understanding the dynamics that govern a system. Software packages like the Cytoscape software [65] deal with the representation of complex biological data.

Two other topics related to data mining, and more specifically to the supervised machine learning component of the process, are model selection and cross validation of the learning scheme. Model selection is related to the tuning of the model itself in terms of parameter selection. The most common method employed in this case is a uniform search over a grid spanned by the parameters. Lately, some more advanced model selection methods based on uniform design theory have been proposed [34]. On the other hand, cross validation of a model is very important, especially when one wants to statistically assure the independence of the output model accuracy from the specific properties of the training and testing datasets. For this, the given data are partitioned many times into testing and training datasets using some resampling approach. The reported accuracy of the method is the average accuracy over all of the cross validation repetitions. Cross validation methods usually vary by their strategy for partitioning the original data. Most of the well-studied cross validation techniques include k-fold, leave-one-out and holdout cross validation.
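For illustration, a minimal k-fold cross validation loop might look as follows (a sketch in Python/NumPy, not tied to any particular method in this work; the train_and_predict callback is a hypothetical placeholder for whichever classifier is being validated):

    import numpy as np

    def k_fold_accuracy(X, y, train_and_predict, k=5, seed=0):
        # Shuffle the sample indices once, then split them into k folds.
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(y))
        folds = np.array_split(idx, k)
        scores = []
        for i in range(k):
            test = folds[i]
            train = np.concatenate([folds[j] for j in range(k) if j != i])
            # The classifier is trained on k-1 folds and tested on the held-out fold.
            pred = train_and_predict(X[train], y[train], X[test])
            scores.append(np.mean(pred == y[test]))
        # The reported accuracy is the average over the k repetitions.
        return np.mean(scores)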

PAGE 18

All the described methods are implemented in many open source problem solving environments like R (http://www.r-project.org/), Weka (http://www.cs.waikato.ac.nz/ml/weka) and Octave (http://www.gnu.org/software/octave/). They provide simple graphical user interfaces that can be used also by non expert users. Some of these methods come in other popular technical computing programming languages such as Matlab and Mathematica.

1.3 A Brief History of Robustness

The term robust is used extensively in the engineering and statistics literature. In engineering it is often used in order to denote error resilience in general (e.g., robust methods are those that are not affected much by small error interferences). In statistics, robust is used to describe all those methods that are used when the model assumptions are not exactly true. For example, it can be used when variables do not follow exactly the assumed distribution (existence of outliers). In optimization (minimization or maximization), robust is used in order to describe the problem of finding the best solution given that the problem data are not fixed but obtain their values within a well defined uncertainty set. Thus if we consider the minimization problem (without loss of generality)

    \min_{x \in \mathcal{X}} f(A, x)                                        (1-1)

where A accounts for all the parameters of the problem that are considered to be fixed numbers and f(\cdot) is the objective function, the robust counterpart (RC) problem is going to be a min-max problem of the following form

    \min_{x \in \mathcal{X}} \max_{A \in \mathcal{A}} f(A, x)               (1-2)

where \mathcal{A} is the set of all admissible perturbations. The maximization problem over the parameters A accounts, usually, for a worst case scenario. The objective of robust optimization is to determine the optimal solution when such a scenario occurs. In real data analysis problems it is very likely that data might be corrupted, perturbed or subject to error related to data acquisition. In fact, most modern data acquisition methods are prone to errors. The most usual source of such errors is noise, which is usually

PAGE 19

associated with the instrumentation itself or due to human factors (when the data collection is done manually). Spectroscopy, microarray technology and electroencephalography (EEG) are some of the most commonly used data collection technologies that are subject to noise. Robust optimization is employed not only when we are dealing with data imprecisions but also when we want to provide stable solutions that can be used in case of input modification. In addition, it can be used in order to avoid selection of "useless" optimal solutions, i.e., solutions that change drastically for small changes of the data. Especially in the case where the optimal solution cannot be implemented precisely, due to technological constraints, we wish that the next best optimal solution will be feasible and very close to the one that is out of our implementation scope. For all these reasons robust methods and solutions are highly desired.

In order to outline the main goal and idea of robust optimization we will use the well studied example of linear programming (LP). In this problem we need to determine the global optimum of a linear function over the feasible region defined by a set of linear constraints:

    \min_x   c^T x                                                          (1-3a)
    s.t.     Ax = b                                                          (1-3b)
             x \geq 0                                                        (1-3c)

where A \in R^{n \times m}, b \in R^n, c \in R^m. In this formulation x is the decision variable and A, b, c are the data and they have constant values. LP for fixed data values can be solved efficiently by many algorithms (e.g., SIMPLEX) and it has been shown that LP can be solved in polynomial time [37].

In the case of uncertainty we assume that the data are not fixed but can take any values within an uncertainty set. Then the robust counterpart (RC) problem is to find a vector x that minimizes the objective of Eq. 1-3a for the "worst case" allowed perturbation. This worst case problem can be stated as a maximization problem with

PAGE 20

respect to A, b, c. The whole process can be stated as the following min-max problem

    \min_x \max_{A, b, c}   c^T x                                            (1-4a)
    s.t.   Ax = b                                                            (1-4b)
           x \geq 0                                                          (1-4c)
           A \in \mathcal{A},\; b \in \mathcal{B},\; c \in \mathcal{C}       (1-4d)

where \mathcal{A}, \mathcal{B}, \mathcal{C} are the uncertainty sets of A, b, c correspondingly. The problem of Eq. 1-4 might be tractable or intractable depending on the properties of the uncertainty sets. For example, it has been shown that if the columns of A follow ellipsoidal uncertainty constraints the problem is polynomially tractable [10]. Bertsimas and Sim showed that if the coefficients of the A matrix are between a lower and an upper bound then this problem can still be solved with linear programming [12]. Also, Bertsimas et al. have shown that an uncertain LP with general norm bounded constraints is a convex programming problem [11].
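To make the worst-case idea concrete, consider the simplest situation in which only the cost vector c is uncertain and lies in a component-wise interval [c, c + \Delta] with \Delta \geq 0. Because x \geq 0, the inner maximization is attained at c + \Delta, so the robust counterpart is again an ordinary LP solved with the worst-case costs. The following small sketch (synthetic data and the use of SciPy's linprog are illustrative assumptions) compares the nominal and the robust solutions:

    import numpy as np
    from scipy.optimize import linprog

    # Equality constraints Ax = b with x >= 0 (the default bounds in linprog).
    A = np.array([[1.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0]])
    b = np.array([1.0, 1.0])

    c_nom = np.array([1.0, 0.5, 1.0])   # nominal cost vector
    delta = np.array([0.0, 2.0, 0.0])   # interval width of the cost uncertainty

    nominal = linprog(c_nom,         A_eq=A, b_eq=b)
    robust  = linprog(c_nom + delta, A_eq=A, b_eq=b)   # worst-case costs

    print("nominal solution x*:", nominal.x)
    print("robust  solution x*:", robust.x)
    print("worst-case cost of nominal x*:", (c_nom + delta) @ nominal.x)
    print("worst-case cost of robust  x*:", (c_nom + delta) @ robust.x)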

PAGE 21

1.3.1 Robust Optimization vs Stochastic Programming

Here it is worth noting that robust optimization is not the only approach for handling uncertainty in optimization problems. In the robust framework the information about uncertainty is given in a rather deterministic form of worst case bounding constraints. In a different framework one might not require the solution to be feasible for all data realizations but instead seek the best solution given that the problem data are random variables following a specific distribution. This is of particular interest when the problem possesses some periodic properties and historic data are available. In this case the parameters of such a distribution could efficiently be estimated through some model fitting approach. Then a probabilistic description of the constraints can be obtained and the corresponding optimization problem can be classified as a stochastic programming problem. In this manner, the stochastic equivalent of linear programming will be

    \min_{x,t}   t                                                           (1-5a)
    s.t.  \Pr\{ c^T x \leq t,\; Ax \geq b \} \geq p                          (1-5b)
          x \geq 0                                                           (1-5c)

where the entries of c, A and b are random variables that follow some known distribution, p is a nonnegative number less than 1 and \Pr\{\cdot\} is some probability function. This nondeterministic description of the problem does not guarantee that the provided solution will be feasible for all dataset realizations, but it provides a less conservative optimal solution taking into consideration the distribution based uncertainties. Although the stochastic approach might be of more practical value in some cases, there are some assumptions associated with it that one should be aware of before using it [9]:

1. The problem must be of stochastic nature and there must indeed be a distribution hidden behind each variable.
2. Our solution depends on our ability to determine the correct distribution from the historic data.
3. We have to be sure that our problem accepts probabilistic solutions, i.e., a stochastic problem solution might not be immunized against a catastrophic scenario and a system might be vulnerable to rare event occurrences.

For this reason, the choice of the approach strictly depends on the nature of the problem as well as the available data. For an introduction to stochastic programming we refer the reader to [14]. For a complete overview of robust optimization we refer the reader to [9].

1.3.2 Thesis Goal and Structure

In this work our goal is to explore in particular the optimization models employed in some of the well studied DM algorithms, formulate their RC problems and provide algorithms in order to solve them. The rest of this work is organized as follows. In Chapter 2 we give an overview of some of the most important data mining algorithms in the absence of uncertainty. We discuss Linear Least Squares (LLS), PCA, Fisher's LDA,

PAGE 22

SVM and Regularized Generalized Eigenvalue Classification (ReGEC). At the end of their presentation we give a comparative study of some of them on a real classification problem from biomedicine [78]. In Chapter 3 through Chapter 6 we discuss the robust counterpart formulations of Linear Least Squares, PCA, LDA and SVM [80], whereas in Chapter 7 we give the robust formulation of Regularized Generalized Eigenvalue Classification [79]. In Chapter 8 we discuss further research directions and provide some conclusions.

PAGE 23

CHAPTER 2
DATA MINING METHODS

In this section we will give the mathematical description of several data mining methods. In Chapter 3 through Chapter 6 we will deal with their robust formulations. We will revisit the basic formulations of the following data mining algorithms:

1. Linear Least Squares (LLS)
2. Principal Component Analysis (PCA)
3. Linear Discriminant Analysis (LDA)
4. Support Vector Machines (SVM)
5. Regularized Generalized Eigenvalue Classification (ReGEC)

The first algorithm is broadly used for linear regression, but can also be used for supervised learning, whereas the last three methods are typical supervised learning algorithms. The goal is to give the necessary background before introducing their robust formulations. We conclude by providing a comparative case study on a supervised learning problem from Raman spectroscopy.

2.1 Linear Least Squares

In the original linear least squares (LS) problem one needs to determine a linear model that approximates "best" a group of samples (data points). Each sample might correspond to a group of experimental parameters or measurements, and each individual parameter to a feature or, in statistical terminology, to a predictor. In addition, each sample is characterized by an outcome which is defined by a real valued variable and might correspond to an experimental outcome. Ultimately we wish to determine a linear model able to issue outcome predictions for new samples. The quality of such a model can be determined by a minimum distance criterion between the samples and the linear model. Therefore, if n data points, of dimension m each, are represented by a matrix A \in R^{n \times m} and the outcome variable by a vector b \in R^n (one entry corresponding to a row of matrix A), we need to determine a vector x \in R^m such that the residual error,

PAGE 24

expressed by a norm, is minimized. This can be stated as

    \min_x \| Ax - b \|_2^2                                                  (2-1)

where \|\cdot\|_2 is the Euclidean norm of a vector. The objective function value is also called the residual and is denoted r(A, b, x) or just r. The geometric interpretation of this problem is to find a vector x such that the sum of the distances between the points represented by the rows of matrix A and the hyperplane defined by x^T w - b = 0 is minimized. In this sense, this problem is a first order polynomial fitting problem. Then, by determining the optimal x vector we will be able to issue predictions for new samples by just computing their inner product with x. A 2D example can be seen in Fig. 2-1. In this case the data matrix will be A = [a\; e] \in R^{n \times 2}, where a is the predictor variable and e a column vector full of ones that accounts for the constant term.

[Figure 2-1. The single input single outcome case. This is a 2D example; the predictor is represented by the a variable and the outcome by the vertical axis b.]

The problem can be solved, in its general form, analytically since we know that the global minimum will be at a KKT point (since the problem is convex and unconstrained). The Lagrangian L_{LLS}(x) will be given by the objective function itself and the Karush-Kuhn-Tucker (KKT) points can be obtained by solving the following equation

PAGE 25

    \frac{d L_{LLS}(x)}{dx} = 0 \Leftrightarrow 2 A^T A x = A^T b            (2-2)

In case A is of full column rank, that is rank(A) = m, matrix A^T A is invertible and we can write

    x_{LLS} = (A^T A)^{-1} A^T b \equiv A^\dagger b                          (2-3)

Matrix A^\dagger is also called the pseudoinverse or Moore-Penrose matrix. It is very common that this full rank assumption is not valid. In such a case the most common way to address the problem is through regularization. One of the most famous regularization techniques is the one known as Tikhonov regularization. In this case, instead of Eq. 2-1 we consider the following one

    \min_x \| Ax - b \|^2 + \delta \| x \|^2                                 (2-4)

By using the same methodology we obtain

    A^T (Ax - b) + \delta x = 0 \Leftrightarrow (A^T A + \delta I) x = A^T b (2-5)

where I is a unit matrix of appropriate dimension. Now, even in case A^T A is not invertible, A^T A + \delta I is, so we can compute x by

    x_{RLLS} = (A^T A + \delta I)^{-1} A^T b                                 (2-6)

This type of least squares solution is also known as ridge regression. The parameter \delta is some kind of penalty coefficient that controls the tradeoff between a solution with low least squares error and a solution with low Euclidean norm. Originally, regularization was proposed in order to overcome this practical difficulty that arises in real problems. The value of \delta is determined usually by trial and error and its magnitude is usually small compared to the entries of the data matrix. In Fig. 2-2 we can see how the least squares plane changes for different values of \delta.
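For illustration, the solutions of Eq. 2-3 and Eq. 2-6 can be computed in a few lines of NumPy (a sketch on synthetic data, not a prescribed implementation):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((100, 5))                   # data matrix, one sample per row
    x_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
    b = A @ x_true + 0.1 * rng.standard_normal(100)     # noisy outcomes

    # Ordinary least squares solution of Eq. 2-3; lstsq avoids explicitly
    # forming (A^T A)^{-1}.
    x_lls = np.linalg.lstsq(A, b, rcond=None)[0]

    # Tikhonov-regularized (ridge) solution of Eq. 2-6.
    delta = 0.1
    x_ridge = np.linalg.solve(A.T @ A + delta * np.eye(A.shape[1]), A.T @ b)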

PAGE 26

[Figure 2-2. LLS and regularization. Change of the linear least squares solution with respect to different \delta values. As we can observe, in this particular example, the solution does not change drastically with regularization.]

In Chapter 3 we will examine the relation between robust linear least squares and robust optimization.

2.1.1 Weighted Linear Least Squares

A slight, and more general, modification of the original least squares problem is the weighted least squares problem. In this case we have the following minimization problem

    \min_x r^T W r = \min_x (Ax - b)^T W (Ax - b) = \min_x \| W^{1/2} (Ax - b) \|^2        (2-7)

where W is the weight matrix. Note that this is a more general formulation since for W = I the problem reduces to Eq. 2-1. The minimum can again be obtained by the solution of the corresponding KKT system, which is

    2 A^T W (Ax - b) = 0                                                     (2-8)

and gives the following solution

    x_{WLLS} = (A^T W A)^{-1} A^T W b                                        (2-9)

PAGE 27

assuming that A^T W A is invertible. In a different case, regularization can again be employed, yielding the weighted regularized least squares problem

    \min_x \| W^{1/2} (Ax - b) \|^2 + \delta \| x \|^2                       (2-10)

that attains its global minimum for

    x_{RWLLS} = (A^T W A + \delta I)^{-1} A^T W b                            (2-11)

Next we will discuss some practical approaches for computing the least squares solution for all the discussed variations of the problem.

2.1.2 Computational Aspects of Linear Least Squares

Computationally, the least squares solution can be obtained by computing an inverse matrix and applying a couple of matrix multiplications. In fact, matrix inversion is avoided in practice, especially due to its high computational cost; other decomposition methods can be employed instead. For completeness we will include three of the most popular here.

Cholesky factorization: In case matrix A is of full rank, then A^T A is invertible and can be decomposed with the Cholesky decomposition as a product L L^T, where L is a lower triangular matrix. Then Eq. 2-2 can be written as

    L L^T x = A^T b                                                          (2-12)

which can be solved by a forward substitution followed by a backward substitution. In case A is not of full rank, this procedure can be applied to the regularized problem of Eq. 2-5.

QR factorization: An alternative method is the one of QR decomposition. In this case we decompose matrix A^T A into a product of two matrices, where the first matrix Q is orthogonal and the second matrix R is upper triangular. This decomposition again requires the data matrix A to be of full rank. The orthogonal matrix Q has the property

PAGE 28

QQ^T = I; thus the problem is equivalent to

    R x = Q^T A^T b                                                          (2-13)

The last can be solved by backward substitution.

Singular Value Decomposition (SVD): This last method does not require full rank of matrix A. It uses the singular value decomposition of A,

    A = U \Sigma V^T                                                         (2-14)

where U and V are orthogonal matrices and \Sigma is a diagonal matrix that has the singular values. Every matrix with real elements has an SVD and furthermore it can be proved that a matrix is of full rank if and only if all of its singular values are nonzero. Substituting matrix A with its SVD decomposition we get

    A^T A x = (V \Sigma U^T)(U \Sigma V^T) x = V \Sigma^2 V^T x = A^T b      (2-15)

and finally

    x = V (\Sigma^2)^\dagger V^T A^T b                                       (2-16)

The matrix (\Sigma^2)^\dagger can be computed easily by inverting its non-negative entries. If A is of full rank then all singular values are nonzero and (\Sigma^2)^\dagger = (\Sigma^2)^{-1}. Although the SVD can be applied to any kind of matrix, it is computationally expensive and sometimes is not preferred, especially when processing massive datasets.
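The three factorization routes can be sketched as follows (an illustrative addition on random data; note that in this sketch the QR factorization is applied directly to A, which is a common practical variant of the procedure described above):

    import numpy as np
    from scipy.linalg import cho_factor, cho_solve, solve_triangular

    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 8))
    b = rng.standard_normal(200)

    # Cholesky factorization of the normal equations A^T A x = A^T b.
    c, low = cho_factor(A.T @ A)
    x_chol = cho_solve((c, low), A.T @ b)

    # QR factorization of A itself: A = QR, so R x = Q^T b.
    Q, R = np.linalg.qr(A)                     # reduced QR
    x_qr = solve_triangular(R, Q.T @ b)

    # Singular value decomposition: x = V diag(1/sigma) U^T b.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    x_svd = Vt.T @ ((U.T @ b) / s)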

PAGE 29

2.1.3 Least Absolute Shrinkage and Selection Operator

An alternative regularization technique for the same problem is the one of the least absolute shrinkage and selection operator (LASSO) [70]. In this case the regularization term contains a first norm term \|x\|_1. Thus we have the following minimization problem

    \min_x \| Ax - b \|^2 + \delta \| x \|_1                                 (2-17)

Although this problem cannot be solved analytically like the one obtained after Tikhonov regularization, sometimes it is preferred as it provides sparse solutions. That is, the solution vector x obtained by LASSO has more zero entries. For this reason this approach has a lot of applications in compressive sensing [5, 19, 43]. As will be discussed later, this regularization possesses further robustness properties, as it can be obtained through robust optimization for a specific type of data perturbations.

2.2 Principal Component Analysis

The PCA transformation is a very common and well studied data analysis technique that aims to identify linear trends and simple patterns in a group of samples. It has applications in several areas of engineering. It is popular due to its simplicity (it requires only an eigendecomposition, or singular value decomposition).

There are two alternative optimization approaches for obtaining the principal component analysis solution, the one of variance maximization and the one of minimum error formulation. Both start with a "different" initial objective and end up providing the same solution. It is necessary to study and understand both of these alternative approaches. At this point we need to note that we assume that the mean of the data samples is equal to zero. In case this is not true we need to subtract the sample mean as part of preprocessing.

2.2.1 Maximum Variance Approach

In this case we try to find a subspace of dimensionality p < m for which the variability of the projection of the points is maximized. If we denote with \bar{x} the sample arithmetic mean

    \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i                                 (2-18)

then the variance of the projected data on a subspace defined by the direction vector u will be

    \frac{1}{n} \sum_{i=1}^{n} ( u^T x_i - u^T \bar{x} )^2
      = \frac{1}{n} \sum_{i=1}^{n} ( u^T (x_i - \bar{x}) )^2
      = u^T \frac{ \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T }{ n } u  (2-19)

PAGE 30

[Figure 2-3. Two paths for PCA. PCA has two alternative optimization formulations that result in the same outcome. One is to find a space where the projection of the original data will have maximum variance, and the second is to find the subspace such that the projection error is minimized.]

and, given that the variance-covariance matrix is defined by

    S = \frac{ \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T }{ n }        (2-20)

Eq. 2-19 can be written in matrix notation as

    u^T S u                                                                  (2-21)

If we restrict, without loss of generality, our solution space just to the vectors u with unit Euclidean norm, then the PCA problem can be expressed as the following optimization problem

    \max_u   u^T S u                                                         (2-22a)
    s.t.     u^T u = 1                                                       (2-22b)

The Lagrangian L_{PCA}(u) for this problem will be

    L_{PCA}(u) = u^T S u + \lambda ( u^T u - 1 )                             (2-23)

PAGE 31

where \lambda is the Lagrange multiplier associated with the single constraint of the problem. The optimal points will be given by the stationary points of the Lagrangian (note that S is positive semidefinite). Thus

    S u = \lambda u                                                          (2-24)

This equation is satisfied by all the eigenpairs (\lambda_i, u_i), i = 1, ..., n, where

    \lambda_1 \leq \lambda_2 \leq \ldots \leq \lambda_n                      (2-25)

are the ordered eigenvalues and the u_i's are the corresponding eigenvectors. The objective function is maximized for u = u_n, \lambda = \lambda_n, and the optimal objective function value is u_n^T S u_n = u_n^T \lambda_n u_n = \lambda_n \|u_n\|^2 = \lambda_n.

2.2.2 Minimum Error Approach

An alternative derivation of PCA can be achieved through a different path. In this second approach the objective is to rotate the original axis system such that the projection error of the dataset on the rotated system will be minimized. Thus we define a set of basis vectors \{u_i\}_{i=1}^{m}. As soon as we do this, we are able to express every point, including our dataset points, as a linear combination of the basis vectors:

    x_k = \sum_{i=1}^{m} a_{ki} u_i = \sum_{i=1}^{m} ( x_k^T u_i ) u_i       (2-26)

Our purpose is to approximate every point x_k with \tilde{x}_k using just a subset p < m of the basis. Thus the approximation will be

    \tilde{x}_k = \sum_{i=1}^{p} ( x_k^T u_i ) u_i + \sum_{i=p+1}^{m} ( \bar{x}^T u_i ) u_i     (2-27)

Thus the approximation error can be computed through a squared Euclidean norm summation over all data points

    \sum_{k=1}^{n} \| x_k - \tilde{x}_k \|^2                                 (2-28)

PAGE 32

We can obtain a more compact expression for x_k - \tilde{x}_k:

    x_k - \tilde{x}_k = \sum_{i=1}^{m} ( x_k^T u_i ) u_i - \sum_{i=1}^{p} ( x_k^T u_i ) u_i - \sum_{i=p+1}^{m} ( \bar{x}^T u_i ) u_i     (2-29a)
                      = \sum_{i=p+1}^{m} ( x_k^T u_i - \bar{x}^T u_i ) u_i                                                              (2-29b)

Then the solution can be estimated by minimizing Eq. 2-28 and by constraining the solution to vectors of unit Euclidean norm

    \min_u \sum_{i=p+1}^{m} u_i^T S u_i  =  \max_u \sum_{i=1}^{p} u_i^T S u_i     (2-30a)
    s.t.   u_i^T u_i = 1                                                          (2-30b)

This optimization problem is similar to the one obtained through the maximum variance approach (the only difference is that we are looking for the first p components instead of just one) and the solution is given by the p eigenvectors that correspond to the p highest eigenvalues (this can be proved through the analytical solution of the KKT equations).
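Either derivation leads to the same computation: an eigendecomposition of the sample covariance matrix. A short NumPy sketch of the maximum variance solution of Eq. 2-24 (illustrative only, on synthetic data) follows:

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.standard_normal((500, 3)) @ np.array([[3.0, 0.0, 0.0],
                                                  [1.0, 1.0, 0.0],
                                                  [0.0, 0.0, 0.2]])
    X = X - X.mean(axis=0)                    # center the data, as assumed in Section 2.2

    S = (X.T @ X) / X.shape[0]                # sample covariance matrix, Eq. 2-20
    eigvals, eigvecs = np.linalg.eigh(S)      # ascending eigenvalues of a symmetric matrix

    p = 2
    U = eigvecs[:, -p:][:, ::-1]              # the p leading principal directions
    Z = X @ U                                 # projections (scores) onto the subspace
    print("captured variance:", eigvals[-p:][::-1])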

PAGE 33

2.3 Linear Discriminant Analysis

The LDA approach is a fundamental data analysis method originally proposed by R. Fisher for discriminating between different types of flowers [27]. The intuition behind the method is to determine a subspace of lower dimension, compared to the original data sample dimension, in which the data points of the original problem are "separable". Separability is defined in terms of the statistical measures of mean value and variance. One of the advantages of LDA is that the solution can be obtained by solving a generalized eigenvalue system. This allows for fast and massive processing of data samples. In addition, LDA can be extended to nonlinear discriminant analysis through the kernel trick [7]. The original algorithm was proposed for binary class problems, but multiclass generalizations have been proposed [60]. Here we will discuss both, starting from the simple two class case.

Let x_1, ..., x_p \in R^m be a set of p data samples belonging to two different class sets, A and B. For each class we can define the sample means

    \bar{x}_A = \frac{1}{N_A} \sum_{x \in A} x, \qquad \bar{x}_B = \frac{1}{N_B} \sum_{x \in B} x        (2-31)

where N_A, N_B are the numbers of samples in A and B respectively. Then for each class we can define the positive semidefinite scatter matrices described by the equations

    S_A = \sum_{x \in A} (x - \bar{x}_A)(x - \bar{x}_A)^T, \qquad S_B = \sum_{x \in B} (x - \bar{x}_B)(x - \bar{x}_B)^T     (2-32)

Each of these matrices expresses the sample variability in each class. Ideally we would like to find a hyperplane \omega for which, if we project the data samples, their variance would be minimal. That can be expressed as

    \min_\omega \; \omega^T S_A \omega + \omega^T S_B \omega = \min_\omega \; \omega^T (S_A + S_B) \omega      (2-33)

On the other side, the scatter matrix between the two classes is given by

    S_{AB} = ( \bar{x}_A - \bar{x}_B )( \bar{x}_A - \bar{x}_B )^T            (2-34)

According to Fisher's intuition, we wish to find a hyperplane in order to maximize the distance between the means of the two classes and at the same time to minimize the variance in each class. Mathematically this can be described by maximization of Fisher's criterion

    \max_\omega J(\omega) = \max_\omega \frac{ \omega^T S_{AB} \omega }{ \omega^T S \omega }     (2-35)

This optimization problem can have infinitely many solutions with the same objective function value: for a solution \omega, all the vectors c\omega give exactly the same value. For this reason, without loss of generality, we replace the denominator with an equality constraint

PAGE 34

in order to choose only one solution. Then the problem becomes

    \max_\omega   \omega^T S_{AB} \omega                                     (2-36a)
    s.t.          \omega^T S \omega = 1                                      (2-36b)

The Lagrangian associated with this problem is

    L_{LDA}(\omega) = \omega^T S_{AB} \omega - \lambda ( \omega^T S \omega - 1 )     (2-37)

where \lambda is the Lagrange multiplier associated with the constraint of Eq. 2-36b. Since S_{AB} is positive semidefinite, the optimum will be at a point for which

    \frac{ \partial L_{LDA}(\omega) }{ \partial \omega } = 0 \Leftrightarrow S_{AB} \omega - \lambda S \omega = 0        (2-38)

The optimal \omega can be obtained as the eigenvector that corresponds to the largest eigenvalue of the following generalized eigensystem

    S_{AB} \omega = \lambda S \omega                                         (2-39)

Multi-class LDA is a natural extension of the previous case. Given n classes, we need to redefine the scatter matrices. The intra-class matrix becomes

    S = S_1 + S_2 + \cdots + S_n                                             (2-40)

while the inter-class scatter matrix is given by

    S_{1,...,n} = \sum_{i=1}^{n} p_i ( \bar{x}_i - \bar{x} )( \bar{x}_i - \bar{x} )^T     (2-41)

where p_i is the number of samples in the i-th class, \bar{x}_i is the mean of each class, and \bar{x} is the total mean vector calculated by

    \bar{x} = \frac{1}{p} \sum_{i=1}^{n} p_i \bar{x}_i

PAGE 35

The linear transformation \omega we wish to find can be obtained by solving the following generalized eigenvalue problem

    S_{1,...,n} \, \omega = \lambda S \omega

LDA can be used in order to identify the most significant features, together with the level of significance as expressed by the corresponding coefficient of the projection hyperplane. Also, LDA can be used for classifying unknown samples. Once the transformation \omega is given, the classification can be performed in the transformed space based on some distance measure d. The class of a new point z is determined by

    class(z) = \arg\min_n \{ d( \omega^T z, \omega^T \bar{x}_n ) \}          (2-42)

where \bar{x}_n is the centroid of the n-th class. This means that first we project the centroids of all classes and the unknown points on the subspace defined by \omega, and then we assign the points to the closest class with respect to d.
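A compact illustration of the two class case (an added sketch on synthetic data): the discriminant direction is obtained from the generalized eigenproblem of Eq. 2-39 and new points are assigned by the nearest projected centroid rule of Eq. 2-42.

    import numpy as np
    from scipy.linalg import eigh

    rng = np.random.default_rng(3)
    A = rng.standard_normal((60, 4)) + np.array([2.0, 0.0, 0.0, 0.0])   # class A samples
    B = rng.standard_normal((70, 4)) - np.array([2.0, 0.0, 0.0, 0.0])   # class B samples

    xA, xB = A.mean(axis=0), B.mean(axis=0)
    SA = (A - xA).T @ (A - xA)                 # within-class scatter, Eq. 2-32
    SB = (B - xB).T @ (B - xB)
    S_AB = np.outer(xA - xB, xA - xB)          # between-class scatter, Eq. 2-34

    # Generalized eigenproblem S_AB w = lambda (SA + SB) w, Eq. 2-39; the
    # discriminant direction is the eigenvector of the largest eigenvalue.
    vals, vecs = eigh(S_AB, SA + SB)
    w = vecs[:, -1]

    # Nearest projected centroid rule, Eq. 2-42.
    def classify(z):
        return "A" if abs(w @ z - w @ xA) < abs(w @ z - w @ xB) else "B"

    print(classify(np.array([1.5, 0.0, 0.0, 0.0])))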

PAGE 36

2.3.1 Generalized Discriminant Analysis

In the case that the linear projection model cannot interpret the data, we need to obtain a nonlinear equivalent of LDA [7]. This can be achieved by the well studied kernel trick. In this case we embed the original data points (input space) into a higher dimension space (feature space) and then we solve the linear problem. The projection of this linear discriminant in the feature space is a non-linear discriminant in the input space. This kernel embedding is performed through a function \phi: R^m \mapsto R^q, where q is the dimension of the feature space. Then the arithmetic means in the feature space for each class will be

    \bar{x}^\phi_1 = \frac{1}{N_1} \sum_{x \in A_1} \phi(x), \; \ldots, \; \bar{x}^\phi_n = \frac{1}{N_n} \sum_{x \in A_n} \phi(x)        (2-43)

Then the scatter matrices for each class in the feature space would be

    V_1 = \sum_{x \in A_1} ( \phi(x) - \bar{x}^\phi_1 )( \phi(x) - \bar{x}^\phi_1 )^T, \; \ldots, \; V_n = \sum_{x \in A_n} ( \phi(x) - \bar{x}^\phi_n )( \phi(x) - \bar{x}^\phi_n )^T      (2-44)

and the variance between classes in the feature space will be

    B_{1,...,n} = \sum_{i=1}^{n} p_i ( \bar{x}^\phi_i - \bar{x}^\phi )( \bar{x}^\phi_i - \bar{x}^\phi )^T      (2-45)

Fisher's criterion in the feature space will be

    \max_y J_k(y) = \frac{ y^T B_{1,...,n} y }{ y^T ( \sum_{i=1}^{n} V_i ) y }        (2-46)

and the solution can be obtained from the eigenvector that corresponds to the largest eigenvalue of the generalized eigensystem B_{1,...,n} y = \lambda ( \sum_{i=1}^{n} V_i ) y. There are several functions that are used as kernel functions in the data mining literature. For a more extensive study of kernel theoretical properties we refer the reader to [66].

2.4 Support Vector Machines

Support Vector Machines (SVM) is one of the most well known supervised classification algorithms. It was originally proposed by V. Vapnik [74]. The intuition behind the algorithm is that we wish to obtain a hyperplane that "optimally" separates two classes of training data. The power of SVM lies in the fact that it has minimal generalization error (at least in the case of two classes) and the solution can be obtained computationally efficiently since it can be formulated as a convex programming problem. Its dual formulation can be used in order to boost the performance even more. As for other supervised classification methods, SVM's original formulation refers to binary classification problems.

Given a set of data points x_i, i = 1, ..., n, and an indicator vector d \in \{-1, 1\}^n with the class information of the data points, we aim to find a hyperplane defined by (w, b) such that the distance between the hyperplane and the closest of the data points of each class (support vectors) is maximized. This can be expressed as the following optimization

PAGE 37

problem

    \min_{w,b} \frac{1}{2} \| w \|^2                                         (2-47a)
    s.t.  d_i ( w^T x_i + b ) \geq 1, \quad i = 1, ..., n                    (2-47b)

For this problem the Lagrangian equation will be

    L_{SVM}(w, b, \alpha) = \frac{1}{2} w^T w - \sum_{i=1}^{n} \alpha_i \left[ d_i ( w^T x_i + b ) - 1 \right]       (2-48)

where \alpha = [\alpha_1\; \alpha_2\; \ldots\; \alpha_n] are Lagrange multipliers. In order to determine them we need to take the partial derivatives with respect to each decision variable and set them equal to zero:

    \frac{\partial L_{SVM}(w, b, \alpha)}{\partial w} = 0 \Leftrightarrow w = \sum_{i=1}^{n} \alpha_i d_i x_i        (2-49a)
    \frac{\partial L_{SVM}(w, b, \alpha)}{\partial b} = 0 \Leftrightarrow \sum_{i=1}^{n} \alpha_i d_i = 0            (2-49b)

And if we substitute into Eq. 2-48 we get

    L_{SVM}(w, b, \alpha) = \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j d_i d_j \langle x_i, x_j \rangle - \sum_{i,j=1}^{n} \alpha_i \alpha_j d_i d_j \langle x_j, x_i \rangle - b \sum_{i=1}^{n} \alpha_i d_i + \sum_{i=1}^{n} \alpha_i      (2-50a)
                          = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j d_i d_j \langle x_i, x_j \rangle        (2-50b)

Then we can express the dual of the original SVM problem as follows

    \max_\alpha \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j d_i d_j \langle x_i, x_j \rangle        (2-51a)
    s.t.  \sum_{i=1}^{n} d_i \alpha_i = 0                                    (2-51b)
          \alpha_i \geq 0, \quad i = 1, ..., n                               (2-51c)

The last is also a convex quadratic problem that can be solved efficiently. Once the optimal dual variables \alpha_i, i = 1, ..., n, are found, then the optimal separation

PAGE 38

hyperplane w can be obtained from

    w = \sum_{i=1}^{n} d_i \alpha_i x_i                                      (2-52)

Note that b does not appear in the dual formulation, thus it should be estimated through the primal constraints:

    b = - \frac{ \max_{d_i = -1} \langle w, x_i \rangle + \min_{d_i = 1} \langle w, x_i \rangle }{ 2 }        (2-53)

This model can give a separation hyperplane in case the two classes are linearly separable. In case this assumption does not hold, the optimization problem will become infeasible. For this we need to slightly modify this original hard margin classification model so that it remains feasible even in case some points are misclassified. The idea is to allow misclassified points but at the same time to penalize this misclassification, making it a less favorable solution:

    \min_{w, b, \xi_i} \frac{1}{2} \| w \|^2 + C \sum_{i=1}^{n} \xi_i^2      (2-54a)
    s.t.  d_i ( w^T x_i + b ) \geq 1 - \xi_i, \quad i = 1, ..., n            (2-54b)

where C is the penalization parameter. Note that this model becomes the same as Eq. 2-47 as C \to +\infty. This modified SVM formulation is known as soft margin SVM. It can be seen as a regularized version of the formulation of Eq. 2-47. The Lagrangian of Eq. 2-54 will be

    L_{SVM_S}(w, b, \xi, \alpha) = \frac{1}{2} w^T w + \frac{C}{2} \sum_{i=1}^{n} \xi_i^2 - \sum_{i=1}^{n} \alpha_i \left[ d_i ( w^T x_i + b ) - 1 + \xi_i \right]       (2-55)

where, again, the \alpha_i are appropriate Lagrange multipliers. The dual formulation can be easily obtained in a way similar to the hard margin classifier. The only difference is that now we will have an additional equation associated with the new variables \xi. Setting the derivative of the Lagrangian equal to zero for each of the decision variables gives the

PAGE 39

following KKT system:

    \frac{\partial L_{SVM_S}(w, b, \xi, \alpha)}{\partial w} = 0 \Leftrightarrow w = \sum_{i=1}^{n} d_i \alpha_i x_i            (2-56a)
    \frac{\partial L_{SVM_S}(w, b, \xi, \alpha)}{\partial \xi} = 0 \Leftrightarrow C \xi = \alpha \Leftrightarrow \xi = \frac{\alpha}{C}     (2-56b)
    \frac{\partial L_{SVM_S}(w, b, \xi, \alpha)}{\partial b} = 0 \Leftrightarrow \sum_{i=1}^{n} d_i \alpha_i = 0                (2-56c)

Substituting these equations into the primal Lagrangian we obtain

    L_{SVM_S}(w, b, \xi, \alpha) = \frac{1}{2} w^T w + \frac{C}{2} \sum_{i=1}^{n} \xi_i^2 - \sum_{i=1}^{n} \alpha_i \left[ d_i ( w^T x_i + b ) - 1 + \xi_i \right]        (2-57a)
        = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j d_i d_j \langle x_i, x_j \rangle + \frac{1}{2C} \langle \alpha, \alpha \rangle - \frac{1}{C} \langle \alpha, \alpha \rangle        (2-57b)
        = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j d_i d_j \langle x_i, x_j \rangle - \frac{1}{2C} \langle \alpha, \alpha \rangle        (2-57c)
        = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j d_i d_j \left( \langle x_i, x_j \rangle + \frac{1}{C} \delta_{ij} \right)        (2-57d)

where \delta_{ij} is the Kronecker delta, which is equal to 1 when i = j and zero otherwise. The dual formulation of the problem is thus

    \max_\alpha \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j d_i d_j \left( \langle x_i, x_j \rangle + \frac{1}{C} \delta_{ij} \right)        (2-58a)
    s.t.  \sum_{i=1}^{n} d_i \alpha_i = 0                                    (2-58b)
          \alpha_i \geq 0, \quad i = 1, ..., n                               (2-58c)

Once the optimal dual variables have been obtained, the optimal separation hyperplane can be recovered similarly as in the hard margin classifier. Once the hyperplane has been obtained, a new point x_u can be classified in one of the two classes based on the following rule

    d_{x_u} = \mathrm{sgn}( w^T x_u + b )                                    (2-59)

where sgn(\cdot) is the sign function.
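As a small illustration of Eqs. 2-52, 2-53, 2-58 and 2-59, the soft margin dual can be solved for a synthetic two dimensional problem with a general purpose solver (here SciPy's SLSQP routine is used as a stand-in for a dedicated QP or SMO implementation; data and parameter values are illustrative assumptions):

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(4)
    X = np.vstack([rng.standard_normal((20, 2)) + 2.0,
                   rng.standard_normal((20, 2)) - 2.0])       # two point clouds
    d = np.hstack([np.ones(20), -np.ones(20)])                # labels in {+1, -1}
    C = 10.0

    K = X @ X.T                                               # linear kernel <x_i, x_j>
    Q = np.outer(d, d) * (K + np.eye(len(d)) / C)             # matrix of Eq. 2-58

    # Dual of Eq. 2-58: maximize sum(a) - 0.5 a'Qa  s.t.  d'a = 0, a >= 0.
    obj  = lambda a: 0.5 * a @ Q @ a - a.sum()
    grad = lambda a: Q @ a - np.ones_like(a)
    cons = {"type": "eq", "fun": lambda a: d @ a, "jac": lambda a: d}
    res = minimize(obj, np.zeros(len(d)), jac=grad, method="SLSQP",
                   bounds=[(0.0, None)] * len(d), constraints=[cons],
                   options={"maxiter": 500})
    alpha = res.x

    w = (alpha * d) @ X                                              # Eq. 2-52
    b = -(np.max(X[d == -1] @ w) + np.min(X[d == 1] @ w)) / 2        # Eq. 2-53

    predict = lambda x: np.sign(w @ x + b)                           # Eq. 2-59
    print(predict(np.array([1.5, 1.0])), predict(np.array([-2.5, 0.0])))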

PAGE 40

It is worth noting that the nonlinear version of SVM can be obtained if we just replace the dot product function with another kernel function \kappa(x_i, x_j). For example, the dual soft margin kernel SVM formulation will be

    \max_\alpha \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j d_i d_j \left( \kappa( x_i, x_j ) + \frac{1}{C} \delta_{ij} \right)        (2-60a)
    s.t.  \sum_{i=1}^{n} d_i \alpha_i = 0                                    (2-60b)
          \alpha_i \geq 0, \quad i = 1, ..., n                               (2-60c)

In fact, the linear case is a special kernel case, as the dot product can be seen as an admissible kernel function, i.e., \kappa(\cdot, \cdot) = \langle \cdot, \cdot \rangle. One of the fundamental limitations of the generic formulation of soft SVM is that it is proposed just for the two class case (binary classification). This might pose a problem as many real world problems involve data that belong to more than two classes.

Majority voting scheme [74]: According to this approach, given a total of N classes, we solve the SVM problem for all binary combinations (pairs) of classes. For example, for a three class problem (class A, class B and class C) we find the separation hyperplanes that correspond to the problems A vs B, A vs C and B vs C. When a new point comes, each classifier "decides" on the class of this point. Finally, the point is classified to the class with the most "votes".

Directed acyclic graph approach [54]: For the majority voting process one needs to construct and train a large number of binary classifiers in order to infer the class of an unknown sample. This can pose a computational problem for the performance. Thus, in the directed acyclic graph approach we try to minimize the number of necessary classifiers required. This can be achieved by considering a tree that eliminates one class at each level. An example with four classes is illustrated in Fig. 2-4.

PAGE 41

[Figure 2-4. Directed acyclic graph approach for a four class example (A, B, C and D). At the first level of the tree the sample goes through the A vs B classifier. Then, depending on the outcome, the sample is tested through the A vs C or B vs C classifier. The total number of binary classifiers needed equals the depth of the tree.]

A straightforward observation regarding these two multiclass generalization strategies is that they can be used for any type of binary classifier (not only SVM), with or without the use of kernels.

2.4.1 Alternative Objective Function

In a more general framework, alternative objective functions have been proposed that can be used for SVM classification, yielding computationally different problems. In the general form one can express the objective function as f_p(w), where p corresponds to the type of norm. Along these lines we can express the penalty function as g(C, \xi), where C can be either a number, meaning that all points are penalized the same way, or a diagonal matrix with every diagonal element corresponding to the penalization coefficient of each data sample. A practical application of different penalization coefficients might occur in the case where we have an unbalanced classification problem (the number of training samples of one class is much higher than that of the other). For the case of SVM that we already presented, we assumed quadratic objective and penalty functions

    f_2(w) = \| w \|_2^2 = w^T w, \qquad g(C, \xi) = \xi^T C \xi             (2-61)

PAGE 42

Another popular choice is

    f_1(w) = \| w \|_1, \qquad g(C, \xi) = \langle C, \xi \rangle            (2-62)

In this case the SVM formulation becomes

    \min_{w,b,\xi} \| w \|_1 + \langle C, \xi \rangle                        (2-63a)
    s.t.  d_i ( w^T x_i + b ) \geq 1 - \xi_i, \quad i = 1, ..., n            (2-63b)

It is easy to show that the last formulation can be solved as a linear program (LP). More specifically, if we introduce the auxiliary variable \alpha we can obtain the following equivalent formulation of Eq. 2-63

    \min_{\alpha,\xi,b} \sum_{i=1}^{n} \alpha_i + \langle C, \xi \rangle     (2-64a)
    s.t.  d_i \left( \sum_{j=1}^{n} \alpha_j \langle x_i, x_j \rangle + b \right) \geq 1 - \xi_i, \quad i = 1, ..., n        (2-64b)
          \alpha_i \geq 0, \; \xi_i \geq 0, \quad i = 1, ..., n              (2-64c)

It is worth noting that the linear programming approach was developed independently from the quadratic one.

2.5 Regularized Generalized Eigenvalue Classification

Another supervised learning algorithm that makes use of the notion of a hyperplane is the Regularized Generalized Eigenvalue Classifier (ReGEC). Given two classes of points, represented by the rows of matrices A \in R^{m \times n} and B \in R^{p \times n}, each row being a point in the feature space R^n, we wish to determine two hyperplanes, one for each class, with the following characteristics: a) the sum of the distances between each point in the class and the hyperplane is minimum, and b) the sum of the distances between the hyperplane and the points of the other class is maximum. Originally this concept was developed by Mangasarian et al. [45] and then further developed by Guarracino et al. [29, 30]. First we will give a mathematical description of the

PAGE 43

generalized eigenvalue classifiers and then we will illustrate the proposed regularization techniques. If we denote the hyperplane related to class A by w_A^T x - r_A = 0, the problem consists in determining the variables w_A \in R^n and r_A \in R such that

    \min_{w_A, r_A \neq 0} \frac{ \| A w_A - e r_A \|^2 }{ \| B w_A - e r_A \|^2 }        (2-65)

where e is a column vector of ones of proper dimension. If we let

    G = [A\; -e]^T [A\; -e], \qquad H = [B\; -e]^T [B\; -e],                 (2-66)

then the problem of Eq. 2-65 becomes

    \min_{z_A \neq 0} \frac{ z_A^T G z_A }{ z_A^T H z_A }                    (2-67)

with z_A = [w_A^T\; r_A]^T. Note that this last problem has a structure similar to the Fisher Linear Discriminant minimization: it is a minimization of a generalized Rayleigh quotient. This problem can be transformed to the following one

    \min_{z_A}   z_A^T G z_A                                                 (2-68a)
    s.t.         z_A^T H z_A = 1                                             (2-68b)

Then the Lagrangian will be

    L_{ReGEC}(z_A) = z_A^T G z_A - \lambda ( z_A^T H z_A - 1 )               (2-69)

If we take the Lagrangian's derivative and set it equal to zero we get

    \frac{ \partial L_{ReGEC}(z_A) }{ \partial z_A } = 0 \Leftrightarrow G z_A = \lambda H z_A        (2-70)

Since H and G are real and symmetric matrices by construction, if H is positive definite, the optimal solution is attained at the eigenvector z_A = z_{min} that corresponds to the minimum eigenvalue \lambda_{min}. By following a similar process, we can determine the hyperplane for class B. Given the symmetry of the problem, if z_B is the eigenvector

PAGE 44

that corresponds to the maximum eigenvalue of the problem of Eq. 2-70, then it is the eigenvector corresponding to the minimum eigenvalue of the following system

    H z = \lambda G z                                                        (2-71)

This means we can compute the hyperplanes for A and B by computing the eigenvectors related to the minimum and maximum eigenvalues of the problem of Eq. 2-70. The model is also able to predict the class label of unknown samples by assigning them to the class with minimum distance from the hyperplane. That is, for an arbitrary unknown point x_u,

    class(x_u) = \arg\min_{i \in \{A, B\}} \frac{ | w_i^T x_u - r_i | }{ \| w_i \| }        (2-72)

The generalization to multi-class problems is straightforward. For each class A_i, i = 1, ..., k, a model is built with respect to every other class A_j, j \neq i. The k - 1 models for each class are then merged using their normal vectors. The latter are averaged using singular value decomposition, which produces a vector with minimum angle with respect to each normal vector. Such a vector identifies a single hyperplane, which is used as the classification model for the class A_i. The assignment of a test point to a class is done in two steps. First, the points of each class are projected on their respective plane. Then, the test point is assigned to the class of the nearest neighbor projected point. A detailed description of the method and its performance can be found in [31].
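A compact illustration of Eqs. 2-66 to 2-72 on synthetic data follows (an added sketch; the small diagonal shift plays the role of the Tikhonov-type regularization discussed below and simply keeps the matrices well conditioned):

    import numpy as np
    from scipy.linalg import eigh

    rng = np.random.default_rng(5)
    A = rng.standard_normal((50, 3)) + np.array([2.0, 2.0, 0.0])   # class A points (rows)
    B = rng.standard_normal((60, 3)) - np.array([2.0, 2.0, 0.0])   # class B points

    e_A, e_B = np.ones((A.shape[0], 1)), np.ones((B.shape[0], 1))
    G = np.hstack([A, -e_A]).T @ np.hstack([A, -e_A])   # G = [A -e]'[A -e], Eq. 2-66
    H = np.hstack([B, -e_B]).T @ np.hstack([B, -e_B])

    delta = 1e-4                                        # small regularization term
    vals, vecs = eigh(G + delta * np.eye(G.shape[0]), H + delta * np.eye(H.shape[0]))

    z_A = vecs[:, 0]     # eigenvector of the minimum eigenvalue: hyperplane for class A
    z_B = vecs[:, -1]    # eigenvector of the maximum eigenvalue: hyperplane for class B
    planes = {"A": (z_A[:-1], z_A[-1]), "B": (z_B[:-1], z_B[-1])}

    def classify(x):
        # Assign the point to the class whose hyperplane is closest, Eq. 2-72.
        dist = {c: abs(w @ x - r) / np.linalg.norm(w) for c, (w, r) in planes.items()}
        return min(dist, key=dist.get)

    print(classify(np.array([2.5, 1.5, 0.0])), classify(np.array([-2.0, -3.0, 0.5])))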


eigenvalueproblemsthatproducethetwoclassicationmod elsfordatapointsin A and B .Thealgorithmprovestohavebetterperformancecomparedt otheonewithout regularization.In[ 30 ]Guarracinoet.al.proposedaalternativeregularization framework forthisproblem.Theregularizationframeworkproposedis thefollowing min w i r i 6 =0 k Aw i e r i k + ( i ) 1 k ~ Bw i e r i k k Bw i e r i k + ( i ) 2 k ~ Aw i + e r i k i 2f A B g (2–73) where ( i ) 1 ( i ) 2 areregularizationparametersand ~ A ~ B arediagonalmatriceswhose diagonalelementsarethediagonalelementsofmatrices A B correspondingly.The lastproblem'ssolutioncanbeobtainedfromthesolutionof thefollowinggeneralized eigenvalueproblem ( G + 1 ~ H ) z = ( H + 2 ~ G ) z (2–74) Thisalternativeregularizationapproachwasshowntoyiel dhigherclassication accuracycomparedtoTikhonovregularization[ 30 ]. 2.6CaseStudy-Celldeathdiscrimination Inthissection,beforeproccedingtothedescriptionofthe robustformulaltionof thealreadydiscusseddataminingmethods,wewillpresenta comparativecasestudy amongsomeofthedescribedsupervisedlearningalgorithms .Thisisdonemainlyin ordertogivearoughcomparisonamongthemethodsinarealpr oblemandtoshow thatthereisnodefactooptimalmethodforallrealdataanal ysisproblems.Theexample istakenfrombiomedicineandmorespecicallyfromcelldea thdiscriminationusing Ramanspectroscopy.Thiscasestudyhasappearedin[ 78 ]. Thediscriminationofcellshaswidespreaduseinbiomedica landbiological applications.Cellscanundergodifferentdeathtypes(e.g .,apoptotic,necrotic),due totheactionofatoxicsubstanceorshock.Inthecaseofcanc ercelldeath,thequantity ofcellssubjecttonecroticdeath,comparedwiththosegoin gthroughapoptoticdeath, isanindicatorofthetreatmenteffect.Anotherapplicatio nofcelldiscriminationiscell linecharacterization,thatistoconrmtheidentityandth epurityofagroupofcells 45


thatwillbeusedforanexperiment.Thestandardsolutionis eithertousemicroarray technologies,ortorelayontheknowledgeofanexpert.Inth erstcase,analysistakes alongtime,issubjecttoerrors,andrequiresspecializede quipments[ 56 ].Ontheother hand,whentheanalysisisbasedonlyonobservations,resul tscanbehighlysubjective anddifculttoreproduce. Figure2-5.ExplanationofRaman.Pictorialviewof(a)Rama nspectrometer'sbasic principleofoperationand(b)instrumentation. Recently,Ramanspectroscopyhasbeenappliedtotheanalys isofcells.This methodisbasedonadiffractionprinciple,calledtheRaman shift,thatpermitsto 46


estimatethequantityandqualityofenzymes,proteinsandD NApresentinasingle cell.Amicroscopefocusesthelaserthroughtheobjectivel ensonthesampleandthe scatteredphotonsarecollectedbythesameobjectivelensa ndtraveltotheRaman spectrometer,wheretheyareanalyzedbyagratingandaCCDd etector,asdepictedin Fig. 2-5 Sincelowenergylasersdonotdeteriorateorkillcells,iti susedinvitroanditcan beusedinvivo.Furthermore,Ramanspectraarenotaffected bychangesinwater, whichmakestheresultsrobustwithrespecttothenaturalch angesinsizeandshapeof cells.Finally,thespectrometerscanaccomplishtheexper imentinlessthanaminute, andcanevenbebroughtoutsideabiologicallaboratory,whi chcanmakeitpotentially usefulinmanyotherapplications,asinthecaseofbiologic althreatdetectioninairports orbattleelds. Ramanspectrumisusuallyanalyzedtondpeaksatspecicwa velengths,which revealsthepresenceandabundanceofaspeciccellcompone nt.Thisinturncanbe usedasabiomarkerforcelldiscriminationorcelldeathcla ssication[ 13 ].Thisstrategy ishighlydependentontheexperimentandspectroscopetuni ng,thusgivingriseto questionsregardingnormalizationofspectraandpeakdete ction. Weexploreandcomparealternativedataminingalgorithmst hatcananalyzethe wholespectrum.Thesetechniqueshavebeensuccessfullyap pliedtootherbiological andbiomedicalproblems,andandaredefactostandardmetho dsforsuperviseddata classication[ 53 63 ].Methodsaretestedthroughnumericalexperimentsonreal data andtheirefciencywithrespecttotheiroverallclassica tionaccuracyisreported. 2.6.1Materials2.6.1.1Dataset Forevaluatingthedataminingalgorithms,weusedtwodiffe rentdatasets.The rstcontainscellsfromtwodifferentcelllines.The30cel lsfromtheA549cellline and60fromMCF7cellline.Therstarebreastcancercells,w hereasthelaterare 47


cancer epithelial cells. All 90 cells of this class were not treated with any substance. The aim of this experiment is to evaluate the ability of various data mining techniques in discriminating between different cell lines.

The second dataset consists uniquely of A549 cancer epithelial cells. The first 28 cells are untreated cancer cells (control), the next 27 cells were treated with Etoposide and the last 28 cells were treated with Triton-X, so that they undergo apoptotic and necrotic death correspondingly. The detailed protocols followed for the biological experiments were standard and can be found at [57]. The mean spectrum of each class for the two datasets is shown in Fig. 2-6 (a & b).

Figure 2-6. The mean Raman spectra for each class. In a) cells from the A549 and MCF7 cell lines are shown and in b) A549 cells treated with Etoposide (apoptotic), Triton-X (necrotic) and control cells. All spectra have been normalized so that they have zero mean and unitary standard deviation and then they were shifted for clarity.

2.6.1.2 Raman spectroscope

The Raman microscope is an InVia system by Renishaw. It consists of a Leica microscope connected to a Renishaw 2000 spectrometer. The high power diode laser (250 mW) produces laser light of 785 nm. Both datasets were acquired by the Particle Engineering Research Center (P.E.R.C.) at the University of Florida.


2.6.1.3Datapreprocessing ForpeakanalysisRamanspectracanbepreprocessedinmanyw ays.Once theyhavebeenacquiredbytheinstrument,therststepcons istsinsubtractingthe backgroundnoise.Thisisusuallydonesubtractingtoeachs pectrumthevalueof aspectrumobtainedwithoutthebiologicalsample.Thenspe ctraarenormalized subtractingameanspectrumobtainedwithapolynomialappr oximationofxedorder. Othertechniquesareusedtodetectpeaksandtodeletespike s.Inthepresentcase study,weonlynormalizedthedataalongthefeaturesofthet rainingset,toobtain featureswithzeromeanandunitvariance.Thosevaluesofme anandvariancearethen usedtonormalizethetestspectra.2.6.2Software InthiscasestudywearecomparingLDA,SVM,ReGEC,thatweha vealready exploretheiroptimizationstructure,aswellasknearestn eighboralgorithmand ImprovedIterativescaling(furtherdiscussedinfollowin gsubsections).Forthe computationalexperimentsMatlabarsenaltoolboxwasused forLDA,IIS, k -NN[ 82 ], whereasforSVM,libsvmwasemployed[ 22 ].ForReGECclassication,theauthor's implementationwasused[ 29 30 ]. 2.6.3NearestNeighborClassication Thisisoneofthesimplestandmostpopularsupervisedlearn ingalgorithm.For everysampleinthetestingsetwecomputethedistances(de nedbysomedistance metric)betweentheunknownsampleandalltheknownsamples .Forthisstudywe usedEuclideandistance.Thenwekeeptheclosest k knownsamplesandtheclassof theunknownoneisdecidedwithamajorityvotingscheme.Usu ally,especiallyforbinary classicationproblem k ischosentobeanoddnumber(mostcommon1,3,5,7).By denition k -NNisnonlinearandmulticlassalgorithm.Itisverypopula rasitiseasyto implementanddoesnotmakeuseofadvancedmathematicaltec hniques.Alsoitisone 49


ofthesimplesttechniquesfornonlinearclassicationwit houtuseofkernelembedding. Ontheothersideithasbeencriticizedmajorbecause1.Testingprocessalwaysrequirecomputationofalldistan cesbetweentheunknown sampleandallthesamplesthatbelongtothetrainingset.Th isdependenceonthe trainingsetsizemightsubstantiallyslowdownthetesting process. 2.Thisalgorithmisextremelysensitivetooutliers.Asmal lnumberofoutlier(oforder k )cancausecanmanipulatetheclassicationaccuracyasthe algorithm“cares” onlyaboutthe k closestpointstotheunknownsamples. Inordertoovercometheaforementioneddrawbacksseveralp reprocessing techniquescanbeemployed.2.6.4ImprovedIterativeScaling Givenarandomprocesswhichproduces,ateachtimestep,som eoutputvalue y whichisamemberofthesetofpossibleoutputs,IIS[ 23 ]computestheprobabilityof theevent y inuencedbyaconditioninginformation x .Inthiswaywecanconsider,for example,inatextsequence,theprobability p ( y j x ) oftheeventthatgivenaword x ,the nextwordwillbe y .Thisleadstothefollowingexponentialmodel p ( y j x )= 1 Z ( x ) exp( m X i =1 i f i ( x y )), (2–75) where f i ( x y ) isabinaryvaluedfunctioncalled featurefunction i 2 R istheLagrange multipliercorrespondingto f i and j i j isameasureoftheimportanceofthefeature f i Z ( x ) isanormalizingfactorandnallyweput = f 1 ,..., m g Givenajointempiricaldistribution p ( x y ) ,thelog-likelihoodof p accordingtoa conditionalmodel p ( y j x ) ,isdenedas L ( p ) ()= X x y p ( x y )log p ( y j x ). (2–76) Thiscanberegardedasameasureofthequalityofthemodel p .Clearlywehave that L ( p ) () 0 and L ( p ) ()=0 ifandonlyif p is perfect withrespectto p ,i.e. p ( y j x )=1 p ( x y ) > 0 50


Given the set {f_1, ..., f_m}, the exponential form of Eq. 2-75 and the distribution \tilde{p}, IIS solves the maximum likelihood problem by computing \Lambda^* = \arg\max_{\Lambda \in R^m} L_{\tilde{p}}(\Lambda).

2.6.5 Results and Discussion

2.6.5.1 Classification and model selection

We applied the following supervised learning algorithms on both datasets: a) soft margin SVM, b) regularized generalized eigenvalue classification, c) k nearest neighbor classification (k-NN with k = 3), d) Linear Discriminant Analysis and e) Improved Iterative Scaling (IIS) classification. No kernel was applied in the classifiers. In particular, for the soft margin SVM classifier the parameter C was chosen to be 10 for the first dataset and 100 for the second. For ReGEC the regularization parameter was chosen equal to 0.01. The tuning was done through a grid search on the parameter space. At every repetition 90% of the samples were used for training and 10% for testing. The average cross validation accuracies are reported in Table 2-1.

Table 2-1. Average classification accuracy for holdout cross validation (100 repetitions). LDA achieves the highest accuracy for the binary problem and ReGEC for the three-class one.

                         Classification accuracy (%)
             Cell line discrimination    Cell death discrimination
                   (two class)                 (three class)
  C-SVM               95.33                       97.33
  ReGEC               96.66                       98.44
  3-NN                79.22                       95.44
  IIS                 95.67                       87.44
  LDA                100                          91.00

We can see that for both datasets only C-SVM and ReGEC achieve classification accuracy higher than 95%. Nearest neighbor classification, although it performs very well for the three-class problem, has poor results in the two-class one. This is related to the generic drawback of this method, which makes it very sensitive to outliers. Linear Discriminant


analysis also achieves high classification results (> 90% in both cases), justifying its use in the literature [50, 52].

2.6.5.2 Impact of training set dimension

Next we examined the robustness of each classifier with respect to the size of the training dataset. For this we fixed the training set size and repeated the cross validation process for different sizes of the training dataset. The results were evaluated through holdout cross validation (100 repetitions). Results are shown in Fig. 2-7 & Fig. 2-8. We notice that ReGEC is considerably robust to the size of the training dataset, maintaining classification accuracy higher than 95% in all cases. Overall, the algorithms demonstrated a smooth performance, meaning that the change of the classification accuracy was proportional to the change of the training set size.
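The repeated hold-out protocol used above is straightforward to reproduce. The following Python sketch illustrates the 90%/10% hold-out scheme with per-split z-score normalization estimated on the training part only (Section 2.6.1.3); the thesis experiments were implemented in MATLAB, so the generic train_fn/predict_fn interface below is an illustrative assumption rather than the original code.

import numpy as np

def holdout_accuracy(X, y, train_fn, predict_fn, test_frac=0.1,
                     repetitions=100, seed=0):
    # Average accuracy over random train/test splits (hold-out cross validation).
    rng = np.random.default_rng(seed)
    n, accuracies = len(y), []
    for _ in range(repetitions):
        perm = rng.permutation(n)
        n_test = int(round(test_frac * n))
        test, train = perm[:n_test], perm[n_test:]
        # z-score normalization with statistics computed on the training part only
        mu = X[train].mean(axis=0)
        sd = X[train].std(axis=0) + 1e-12
        model = train_fn((X[train] - mu) / sd, y[train])
        predictions = predict_fn(model, (X[test] - mu) / sd)
        accuracies.append(np.mean(predictions == y[test]))
    return float(np.mean(accuracies))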


Figure 2-7. Classification accuracy versus size of the training set (binary problem). Accuracy is evaluated over 100 cross validation runs.


CHAPTER3 ROBUSTLEASTSQUARES 3.1TheOriginalProblem Inthischapterwewillstudytherobustversionofproblemof Eq. 2–1 .The resultspresentedinthischapterwererstdescribedin[ 25 ]andsimilarresultswere independentlyobtainedin[ 21 ].Attheendofthechapterwedescribesomeextensions thatwererstdescribedin[ 20 ].AswediscussedinearliertheRCformulationofa probleminvolvessolutionofaworstcasescenarioproblem. Thisisexpressedbya min-max(ormax-min)typeproblemwheretheoutermin(max)p roblemreferstothe originalonewhereastheinnermax(min)totheworstadmissi blescenarioone.For theleastsquarescasethegenericRCformulationcanbedesc ribedfromthefollowing problem min x max A 2U A b 2U b k ( A + A ) x +( b + b ) k 2 (3–1) where A B areperturbationmatricesand U A U B aresetsofadmissibleperturbations. Asinmanyrobustoptimizationproblemsthestructuralprop ertiesof U A U B are importantforthecomputationaltracktabilityoftheprobl em.Herewewillstudythe casewherethetwoperturbationmatricesareunknownbutthe irnormisboundedbya knownconstant.Thuswehavethefollowingoptimizationpro blem min x max k A k A k b k b k ( A + A ) x +( b + b ) k 2 (3–2) Thistypeofuncertaintyisoftencalledcoupleduncertaint ybecausetheuncertainty informationisnotgivenintermsofeachsampleindividuall ybutintermsofthewhole datamatrix.FirstwewillreduceproblemofEq. 3–2 toaminimizationproblemthrough thefollowinglemmaLemma1. TheproblemofEq. 3–2 isequivalenttothefollowingproblem min x ( k Ax + b k + A k x k + b ) (3–3) 54


Proof. Fromtriangularinequalitywecanobtainanupperboundonth eobjective functionofEq. 3–2 k ( A + A ) x +( b + b ) kk Ax + b k + k Ax + b k (3–4) k Ax + b k + k A kk x k + k b k (3–5) k Ax + b k + A k x k + B (3–6) NowifintheoriginalproblemofEq. 3–2 weset A = Ax b k Ax b k x T k x k A b = Ax b k Ax b k B (3–7) weget k ( A + A ) x ( b + b ) k = k Ax + b Ax + b k (3–8) = k Ax b k 1+ k x k k Ax b k A + 1 k Ax b k B (3–9) = k Ax b k + A k x k + B (3–10) Thismeansthattheupperboundobtainedbythetriangularin equalitycanbe achievedbyEq. 3–7 .Sincetheproblemisconvexthiswillbetheglobaloptimumf orthe problem. WecaneasilyobservethatthepointofEq. 3–7 satisfytheoptimalityconditions. SinceproblemofEq. 3–3 isunconstraintitsLagrangianwillbethesameasthecost function.Sincethisfunctionisconvexwejustneedtoexami nethepointsforwhichthe derivativeisequaltozeroandtakeseparatecaseforthenon differentiablepoints.Atthe pointswherethecostfunctionisdifferentiablewehave @ L RLLS ( x ) @ x =0 A T ( Ax b ) k Ax b k + x k x k A =0 (3–11) 55


Fromthislastexpressionwerequire x 6 =0 and Ax 6 = b (wewilldealwiththiscases lately).Ifwesolvewithrespectto x weobtain 1 k Ax b k A T ( Ax b )+ x k Ax b k k x k A =0 (3–12) or A T A + A k Ax b k k x k I x = A T b (3–13) andnally x =( A T A + I ) 1 A T b where = k Ax b k k x k A (3–14) Incasethat Ax = b thenthesolutionisgivenby x = A y b where A y istheMoore Penroseorpseudoinversematrixof A .Thereforewecansummarizethisresultinthe followinglemmaLemma2. TheoptimalsolutiontoproblemofEq. 3–3 isgivenby x = 8><>: A y b if Ax = b ( A T A + I ) 1 A T b = A k Ax b k k x k otherwise (3–15) Sinceinthislastexpression isafunctionof x weneedtoprovidewithawayin ordertotuneit.Forthisweneedtousethesingularvaluedec ompositionofdatamatrix A A = U 264 0 375 V T (3–16) where isthediagonalmatrixthatcontainsthesingularvaluesof A indescending order.Inadditionwepartitionthevector U T b asfollows 264 b 1 b 2 375 = U T b (3–17) where b 1 containstherst n elementsand b 2 therest m n .Nowusingthisdecompositions wewillobtaintwoexpressionsfortheenumeratorandtheden ominatorof .Firstforthe 56


denominator x =( A T A I ) 1 A T b = V 2 V T I 1 V b 1 = V 2 + I 1 b 1 (3–18) thusthenormwillbegivenfrom k x k = k ( 2 + I ) 1 k (3–19) andfortheenumerator Ax b = U 264 0 375 V T V 2 + I 1 b 1 b (3–20) = U 0B@ 264 0 375 2 + I 1 b 1 U T b 1CA (3–21) = U 0B@ 264 ( 2 + I ) 1 b 1 b 1 b 2 375 1CA (3–22) = U 264 ( 2 + I ) 1 b 1 b 2 375 (3–23) andforthenorm k Ax b k = p k b 2 k 2 + 2 k ( 2 + I ) 1 b 1 k 2 (3–24) Thus willbegivenby = k Ax b k k x k = A p k b 2 k 2 + 2 k ( 2 + I ) 1 b 1 k 2 k ( 2 + I ) 1 b 1 k (3–25) Notethatthepresentinthepresentanalysisweassumethatd atamatrix A isoffull rank.Ifthisisnotthecasesimilaranalysiscanbeperforme d(fordetailssee[ 20 ]).The 57


closedformsolutioncanbeobtainedbythesolutionofEq. 3–25 .Nextwewillpresent somevariationsoftheoriginalleastsquaresproblemthata rediscussedin[ 20 ]. 3.2VariationsoftheOriginalProblem In[ 20 ]authorsintroducedleastsquareformulationforslightly differentperturbation scenarios.Forexampleinthecaseoftheweightedleastsqua resproblemwithweight uncertaintyoneisinterestedtond min x max k W k W k ( ( W + W )( Ax b ) ) k (3–26) usingthetriangularinequalitywecanobtainanupperbound k ( W + W )( Ax b ) kk W ( Ax b ) k + k W ( Ax b ) k (3–27) k W ( Ax b ) k + W k Ax b k (3–28) Thustheinnermaximizationproblemreducestothefollowin gproblem min x ( k W ( Ax b ) k + W k Ax b k ) (3–29) bytakingthecorrespondingKKTconditions,similartoprev iousanalysis,weobtain @ L WLLS ( x ) @ x = @ k W ( Ax b ) k @ x + @ k Ax b k @ x (3–30) = A T W T ( WAx Wb ) k W ( Ax b ) k + W A T ( Ax b ) k Ax b k (3–31) Bysolvingtheequation @ L WLLS ( x ) @ x =0 (3–32) wendthatthesolutionshouldsatisfy A T ( W T W + I ) Ax = A T ( W T W + I ) b where w = k W ( Ax b ) k k Ax b k (3–33) 58


Givingtheexpressionfor x x = 8>>>><>>>>: A y b if Ax = b ( WA ) y Wb if WAx = Wb A T ( W T W + w I A ) 1 A T W T W + w I b otherwise (3–34) where w isdenedinEq. 3–33 .Thesolutionforthelastonecanbeobtainedthrough similarwayasfortheoriginalleastsquaresproblem.Inano thervariationoftheproblem theuncertaintycanbegivenwithrespecttomatrix A butinmultiplicativeform.Thusthe robustoptimizationproblemforthisvariationcanbestate dasfollows min x max k A k A k ( I + A ) Ax b k (3–35) whichcanbereducedtothefollowingminimizationproblem min x ( k Ax b k + A k Ax k ) (3–36) thenbysimilaranalysisweobtain @ L MLLS ( x ) @ x = A T ( Ax b ) k A T ( Ax b ) k + A A T Ax k Ax k =0 (3–37) andnally x = 8><>: ( A T A ) y b if A T Ax = A T b A T A (1+ A ) 1 A T b A = k A T ( Ax b ) k k Ax k otherwise (3–38) 59
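The closed form of Lemma 2 can be turned into a short numerical procedure: after an SVD of the data matrix, the expressions of Eqs. 3-18, 3-19 and 3-24 reduce the computation of the regularization parameter of Eq. 3-25 to a one-dimensional fixed-point problem. The Python sketch below is an illustrative implementation for a full-rank A with more rows than columns; the fixed-point loop and its tolerance are our implementation choices and not part of the original derivation, and the additive constant rho_b does not affect the minimizer, so it is omitted.

import numpy as np

def robust_least_squares(A, b, rho_A, max_iter=200, tol=1e-10):
    # Solve min_x ( ||Ax - b|| + rho_A ||x|| ), A assumed full rank with m > n.
    m, n = A.shape
    U, s, Vt = np.linalg.svd(A, full_matrices=True)
    bt = U.T @ b
    b1, b2 = bt[:n], bt[n:]
    mu = rho_A                                   # any positive starting guess
    for _ in range(max_iter):
        w = b1 / (s**2 + mu)                     # (Sigma^2 + mu I)^{-1} b1
        norm_x = np.linalg.norm(s * w)           # Eq. 3-19
        norm_res = np.hypot(np.linalg.norm(b2), mu * np.linalg.norm(w))  # Eq. 3-24
        mu_new = rho_A * norm_res / norm_x       # Eq. 3-25
        if abs(mu_new - mu) < tol:
            mu = mu_new
            break
        mu = mu_new
    x = Vt.T @ (s * b1 / (s**2 + mu))            # Eq. 3-18
    return x, mu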


CHAPTER4 ROBUSTPRINCIPALCOMPONENTANALYSIS Inthischapterwewilldescribearobustoptimizationappro achforprincipal componentanalysis(PCA)transformation.Againweneedtoc larifythatthepurpose ofthepresentworkistoinvestigatetheapplicationofrobu stoptimizationinthePCA transformation.TherehavebeenseveralRobustPCApapersi ntheliteraturethatdeal withtheapplicationofrobuststatisticsinPCA[ 35 ]andtheyareofinterestwhenoutliers arepresentinthedata.Unlikesupervisedlearningapproac heslikeSVM,thatthe purposeistondtheoptimalsolutionfortheworstcasescen ario,thepurposeofrobust formulationofPCA,asdescribedin[ 6 ],istoprovidecomponentsthatexplainingdata variancewhileatthesametimeareassparseaspossible.Thi sisingeneralcalled sparsecomponentanalysis(SPCA)transformation.Byspars esolutionswemean thevectorswithlargenumberofzeros.Ingeneralsparsityc anbeenforcedthrough differentmethods.Sparsityisthedesiredproperty,espec iallyintelecommunications, becauseitallowsmoreefcientcompressionandfasterdata transmission.Aasparse componentanalysisformulationcanbeobtainedifweaddaca rdinalityconstraintthat strictlyenforcessparsity.Thatis max u T Su (4–1a) s.t u T u =1 (4–1b) card(x) k (4–1c) where card ( ) isthecardinalityfunctionand k isparameterdeningthemaximum allowedcomponentcardinality.Matrix S isthecovariancematrixdenedinChapter 4 and u isthedecisionvariablevector.Alternativelythisproble mcanbecastedasa 60


semideniteprogrammingproblemasfollows max Tr ( US ) (4–2a) s.tTr ( U )=1 (4–2b) card(U) k 2 (4–2c) U 0, Rank ( X )=1 (4–2d) where U isthedecisionvariablematrixand denotesthatthematrixisposistive semidenite(i.e. a T Xa 0, 8 a 2 R n ).Indeedthesolutiontotheoriginalproblemcanbe obtainedfromthesecondonesinceconditionsofEq. 4–2d guarranteethat U = u u T Insteadofstrictlyconstrainthecardinalitywewilldeman d e T abs ( U ) e k (where e is thevectorof1'sand abs ( ) returnsthematrixwhoseelementsaretheabsolutevaluesof theoriginalmatrix).Inadditionwewilldroptherankconst raintasthisisalsoatoughto handleconstraint.Thusobtainthefollowingrelaxationof theoriginalproblem max Tr ( US ) (4–3a) s.tTr ( U )=1 (4–3b) e T abs ( U ) e k (4–3c) U 0 (4–3d) thelastrelaxedproblemisasemideniteprogramwithrespe cttomatrixvariable U .We canrewriteitasfollows max Tr ( US ) (4–4a) s.tTr ( U )=1 (4–4b) e T abs ( U ) e k (4–4c) U 0 (4–4d) 61


IfwenowremovetheconstraintofEq. 4–4c andinsteadgetapenalizationcoefcient weobtainthefollowingrelaxation max Tr ( US ) e T abs ( U ) e (4–5a) s.tTr ( U )=1 (4–5b) U 0 (4–5c) where isproblemsparameter(Lagrangemultiplier)determiningt hepenalty's magnitude.Bytakingthedualofthisproblemwecanhaveabet terunderstanding forthenatureoftheproblem. min max ( S + V ) (4–6a) s.t. j V ij j i j =1, n (4–6b) Where max ( X ) isthemaximumeigenvalueofmatrix X .TheproblemofEq. 4–5 canberewrittenasthefollowing min max problem max X 0, Tr ( U )=1 min j V ij j Tr ( U ( S + V )) (4–7a) Morepreciselythegoalistodeterminethecomponentthatco rrespondtothe maximumpossiblevariance(whichistheoriginalPCAobject ive)bychosingthemost sparsesolution(accordingtothesparsityconstraints). 62
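The penalized relaxation of Eq. 4-5 is a standard semidefinite program, so it can be prototyped with an off-the-shelf conic modeling tool. The Python sketch below uses CVXPY, which is not used in the thesis and is only an assumed convenience; recovering the leading sparse component from the dominant eigenvector of the optimal matrix U is likewise an illustrative choice, since the relaxation does not enforce the rank-one constraint.

import numpy as np
import cvxpy as cp

def sparse_pca_relaxation(S, rho):
    # max Tr(S U) - rho * sum_ij |U_ij|  s.t.  Tr(U) = 1, U positive semidefinite (Eq. 4-5)
    n = S.shape[0]
    U = cp.Variable((n, n), symmetric=True)
    objective = cp.Maximize(cp.trace(S @ U) - rho * cp.sum(cp.abs(U)))
    constraints = [U >> 0, cp.trace(U) == 1]
    cp.Problem(objective, constraints).solve()
    # If the solution is (close to) rank one, its dominant eigenvector is the sparse component.
    eigenvalues, eigenvectors = np.linalg.eigh(U.value)
    return eigenvectors[:, -1]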


CHAPTER5 ROBUSTLINEARDISCRIMINANTANALYSIS TheRCformulationofrobustlineardiscriminantanalysis( LDA)wasproposed byKimetal[ 39 40 ].Asinotherapproachesthemotivationforarobustcounter part formulationofLDAcomesfromthefactthatdatamightbeimpr ecisethusthemeans andthestandarddeviationscomputedmightnotbetrustwort hyestimatesoftheirreal values.Theapproachthatwewillpresenthereconsidersthe uncertaintyonthemean andstandarddeviationratheronthedatapointsthemselves .GiventheoriginalFisher's optimizationproblem(describedinChapter 2 ) max 6 =0 J ( )=max 6 =0 T S AB T S =max 6 =0 T ( x A x B )( x A x B ) T T ( S A + S B ) (5–1) where x A x B arethemeansofeachclassand S A S B aretheclasscovariancematrices. Thesolutiontothisproblem,thatcanbecomputedanalytica llyis opt =( S A + S B ) 1 ( x A x B ) (5–2) WhichcorrespondstotheoptimalFishercriterionvalue J ( opt )=( x A x B ) T ( S A + S B ) 1 ( x A x B ) T (5–3) Computationally opt and J ( opt canbefoundfromtheeigenvectorsandtheeigenvalues ofthegeneralizedeigenvalueproblem S AB = S (5–4) Fortherobustcaseweareinterestedtodeterminetheoptima lvalueofFisher's criterionforsomeundesired,worstcasescenario.Interms ofoptimizationthiscanbe describedbythefollowing min max problem. max 6 =0 min x A x B S A S B T ( x A x B )( x A x B ) T T ( S A + S B ) =max 6 =0 min x A x B S A S B ( T ( x A x B )) 2 T ( S A + S B ) (5–5) 63


Inotherwordsweneedtoestimatetheoptimalvector ,deningtheFisher's hyperplane,giventhataworstcasescenario,withrespectt omeansandvariances, occurs.Thisproblemssolutionstronglydependsonthenatu reoftheworstcase admissibleperturbationset.Ingeneralwedenotethesetof alladmissibleperturbation U R n R n S n ++ S n ++ whereherewith S n ++ wedenotethesetofallpositive semidenitematrices.Thentheonlyconstraintoftheinner minimizationproblem wouldbe ( x A x B S A S B ) 2U .Incasethatweareabletoexchangetheorderorthe minimizationandthemaximizationproblemwithoutaffecti ngtheproblem'sstructure.In suchacasewecouldwrite max 6 =0 min ( x A x B S A S B ) 2U ( T ( x A x B )) 2 T ( S A + S B ) =min ( x A x B S A S B ) 2U ( x A x B )( S A + S B ) 1 ( x A x B ) T (5–6) Forageneral min max problemwecanwrite min x 2X max y 2Y f ( x y )=max y 2Y min x 2X f ( x y ) (5–7) if f ( x y ) isconvexfunctionwithrespecttoboth x ,concavewithrespectto y and also X Y areconvexsets.Thisresultsisknownasstrong min max property[ 68 ]. Whenconvexitydoesnotholdwehavethesocalledweak min max property min x 2X max y 2Y f ( x y ) max y 2Y min x 2X f ( x y ) (5–8) Thusin[ 39 40 ]Kimetal.providewithsuchaminimaxtheoremfortheproble m underconsiderationthatdoesnotrequirethestrictassump tionsofSion'sresult.This resultsisstatedinthefollowingtheoremTheorem5.1. Forthefollowingminimizationproblem min ( w T a ) 2 w T Bw (5–9) let ( a opt B opt ) betheoptimalsolution.Alsolet w opt =( B opt ) 1 a opt .Thenthepoint ( w opt a opt B opt ) satisesthefollowingminimaxproperty 64


(( w opt ) T a opt ) 2 ( w opt ) T B opt w opt =min ( a B ) max w ( w T a ) 2 w T Bw =max w min ( a B ) ( w T a ) 2 w T Bw (5–10) Proof. See[ 40 ] Hereitisworthnotingthatthisresulthasavariertyofappl icationincludingsignal processingandportfoliooptimization(fordetailssee[ 39 ]).Thusthesolutionforthe robustproblemcanbeobtainedbysolvingthefollowingprob lem min( x A x B )( S A + S B ) 1 ( x A x B ) T (5–11a) s.t ( x A x B S A S B ) 2U (5–11b) Assumingthat U isconvexproblemofEq. 5–11 isaconvexproblem.Thisholds becausetheobjectivefunctionisconvexasamatrixfractio nalfunction(fordetailed proofsee[ 17 ]).Nextwewillexaminetherobustlineardiscriminantsolu tionforaspecial caseofuncertaintysets.Morespecicallyletusassumetha tthefrobeniusnormof thedifferencesbetweentherealandtheestimatedvalueoft hecovariancematricesis boundedbyaconstant.Thatis U S = U A U B (5–12) U A = f S A jk S A S A k F A g (5–13) U B = f S B jk S B S B k B g (5–14) Ingeneraltheworstcaseminimizationproblemcanbeexpres sed min ( x A x B S A S B ) =min ( x A x B ) 2U x ( x A x B ) max ( S A S B ) 2U S T ( S A + S B ) (5–15) Theprobleminthedenominatorcanbefurthersimplied max ( S A S B ) 2U S T ( S A + S B ) = T ( S A + S B + A I + B I ) (5–16) 65


Thustherobustsolutionwillbegivenbythesolutiontothec onvexoptimization problem min ( x A x B ) ( x A x B ) T ( S A + S B + A I + B I ) 1 ( x A x B ) (5–17) WhichissimplerthanEq. 5–11 66
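For this special uncertainty set the robust discriminant direction therefore keeps the closed form of the nominal solution, with the pooled covariance inflated by (delta_A + delta_B) I. The following NumPy sketch implements Eq. 5-17 under the simplifying assumption that the class means are kept at their nominal values (a singleton mean uncertainty set); the midpoint threshold used for classification is a common choice and is not specified in the text.

import numpy as np

def robust_lda_direction(XA, XB, delta_A, delta_B):
    # Robust Fisher direction under Frobenius-norm bounded covariance uncertainty (Eq. 5-17).
    mA, mB = XA.mean(axis=0), XB.mean(axis=0)
    SA = np.cov(XA, rowvar=False)          # np.cov's normalization is a convention choice
    SB = np.cov(XB, rowvar=False)
    S_worst = SA + SB + (delta_A + delta_B) * np.eye(SA.shape[0])
    w = np.linalg.solve(S_worst, mA - mB)  # Eq. 5-2 with the inflated covariance
    b = w @ (mA + mB) / 2.0                # midpoint threshold (a common choice)
    return w, b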


CHAPTER6 ROBUSTSUPPORTVECTORMACHINES Oneofthemostwellstudiedapplicationofrobustoptimizat ionindataminingisthe oneofsupportvectormachines(SVMs).Thetheoreticalandp racticalissueshavebeen extensivalyexploredthroughtheworksofTrafalisetal.[ 71 72 ],Nemirovskietal.[ 9 ] andXuetal.[ 81 ].ItisofparticularinterestthatrobustSVMformulations aretracktable foravariertyofperturbationsets.Atthesametimethereis cleartheoreticalconnection betweenparticularrobusticationandregularization[ 81 ].Ontheothersideseveral robustoptimizationformulationcanbesolvedasconicprob lems.Ifwerecalltheprimal softmarginSVMformulationpresentedinChapter 2 min w b i 1 2 k w k 2 + C n X i =1 2 i (6–1a) s.t. d i w T x i + b 1 i i =1,..., n (6–1b) i 0, i =1,..., n (6–1c) Fortherobustcasewereplaceeachpoint x i with ~ x i = x i + i where x i arethe nominal(known)valuesand i isanadditiveunknownperturbationthatbelongstoa welldeneduncertaintyset.Theobjectiveistoreoptimize theproblemandobtainthe bestsolutionthatcorrespondtotheworstcaseperturbatio n.Thusthegeneralrobust optimizationproblemformulationcanbestatedasfollows min w b i 1 2 k w k 2 + C n X i =1 2 i (6–2a) s.t. min i d i w T ( x i + i )+ b 1 i i =1,..., n (6–2b) 0, i =1,..., n (6–2c) i 2U i (6–2d) NotethatsincetheexpressionofconstraintofEq. 6–2b correspondstothedistance ofthe i th pointtotheseparationhyperplanetheworstcase i wouldbetheonethat 67


minimizesthisdistance.Anequivalentformofconstrainto fEq. 6–2b is d i w T x i + b +min i d i w T i 1 i i =1,..., n (6–3) ThuspartofndingthesolutiontotherobustSVMformulatio npassesthroughthe solutionofthefollowingproblem min i 2U i d i w T i i =1,..., n (6–4) forxed w ,where U i isthesetsofadmissibleperturbationscorrespondingto i th sample.Supposethatthe l p normoftheunknownperturbationsareboundedbyknown constant. min d i w T i i =1,..., n (6–5) s.t. k i k p i (6–6) ByusingH ¨ oldersinequality(seeappendix)wecanobtain j d i ( w T i ) jk w k q k i k p i k w k q (6–7) where kk q isthedualnormof kk p .Equivalentlywecanobtain i k w k q d i ( w T i ) (6–8) Thustheminimumofthisexpressionwillbe i k w k q .Ifwesubstitutethisexpressionin theoriginalproblemweobtain min w b i 1 2 k w k 2 + C n X i =1 2 i (6–9a) s.t. d i w T ( x i + i )+ b i k w k q 1 i i =1,..., n (6–9b) i 0, i =1,..., n (6–9c) 68


Thestructureoftheobtainedoptimizationproblemdepends onthenorm p .Next wewillpresentsome“interesting”case.Itiseasytodeterm inethevalueof q from 1 = p +1 = q =1 (fordetailsseeappendix).For p = q =2 weobtainthefollowing formulation min w b i 1 2 k w k 2 + C n X i =1 2 i (6–10a) s.t. d i w T x i + b i k w k 2 1 i i =1,..., n (6–10b) i 0, i =1,..., n (6–10c) Thelastformulationcanweseenasaregularizationoftheor iginalproblem. Anotherinterestingcaseiswhentheuncertaintyisdescrib edwithrespecttotherst norm(boxconstraints).Inthiscasetherobustformulation willbe min w b i 1 2 k w k 21 + C n X i =1 2 i (6–11a) s.t. d i w T ( x i + i )+ b i k w k 1 1 i i =1,..., n (6–11b) i 0, i =1,..., n (6–11c) Sincethedualof l 1 normisthe l 1 norm.Ifwefurthermoreassumethatthenormof thelossfunctionisexpressedwithrespecttothe l 1 normthentheobtainedoptimization problemcanbesolvedasalinearprogram(LP).Morespecica lyifweintroducethe 69


auxilaryvariable .Wecanobtainthefollowingequivalentformulationofprob lem 6–11 min w b i + h C i (6–12a) s.t. d i w T ( x i + i )+ b i 1 i i =1,..., n (6–12b) i 0 i =1,..., n (6–12c) w k k =1,..., n (6–12d) w k k =1,..., n (6–12e) 0 (6–12f) Ontheothersideiftheperturbationsareexpressedwithres pecttothe l 1 normthenthe equivalentformulationofSVMis min ( k w k 1 + h C i ) (6–13a) s.t. d i w T x i + b i k w k 1 1 i i =1,..., n (6–13b) i 0 i =1,..., n (6–13c) Inthesamewayifweintroducea min i w b i n X i =1 i + h C i (6–14a) s.t. d i w T x + b i n X i =1 i 1 i i =1,..., n (6–14b) i 0 i =1,..., n (6–14c) i w i i =1,..., n (6–14d) i w i i =1,..., n (6–14e) i 0 i =1,..., n (6–14f) 70


ItisworthnotingthatforallrobustformulationsofSVMthe classicationrule remainsthesameasforthenominalcase class ( u )= sgn w T u + b (6–15) 71
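The l2 case of Eq. 6-10 is directly expressible as a convex conic program. The Python sketch below is an illustrative CVXPY prototype (CVXPY is our tooling choice here, not something used in the thesis); it uses the standard linear hinge penalty C * sum_i xi_i and a single common perturbation radius rho for all samples to keep the code short, while the robustified margin constraint d_i (w^T x_i + b) - rho ||w||_2 >= 1 - xi_i is exactly the one derived above.

import numpy as np
import cvxpy as cp

def robust_svm_l2(X, d, rho, C=1.0):
    # Robust soft-margin SVM of Eq. 6-10; d is the vector of +/-1 labels.
    n_samples, n_features = X.shape
    w = cp.Variable(n_features)
    b = cp.Variable()
    xi = cp.Variable(n_samples, nonneg=True)
    margins = cp.multiply(d, X @ w + b) - rho * cp.norm(w, 2)
    problem = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi)),
                         [margins >= 1 - xi])
    problem.solve()
    return w.value, b.value

# Prediction still uses the nominal rule of Eq. 6-15: class(u) = sign(w^T u + b).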


CHAPTER7 ROBUSTGENERALIZEDEIGENVALUECLASSIFICATION Inthischapter,wewillpresentourresultsontherobustcou nterpartformulation ofregularizedgeneralizedeigenvalueclassier(ReGEC). Wewillconsiderthe caseofellipsoidalperturbationsaswellasthecaseofnorm constrainedcorrelated perturbations.Attheendofthechapterwediscussthepoten tialextensionofthe proposedalgorithmsforthenonlinearcase.Partofthiswor khasbeenpublishedin [ 79 ]. 7.1UnderEllipsoidalUncertainty SupposethatwearegiventheproblemofEq. 2–67 withtheadditionalinformation thattheavailablevalueofdatapointsfortherstclassis ~ A = A + A insteadof A ,and ~ B = B + B insteadof B .Wefurtherimpose A 2U A B 2U B ,where U A U B arethe setsofadmissibleuncertainties.Ifweusethefollowingtr ansformation ~ G =[ ~ A e ] T [ ~ A e ], ~ H =[ ~ B e ] T [ ~ B e ], (7–1) theRCformulationisgivenbythefollowingmin-maxproblem min z 6 =0 max A 2U A B 2U B z T ~ Gz z T ~ Hz (7–2) Forgeneraluncertaintysets U A U B theproblemisintractable[ 75 ].Similar min max optimizationproblemswithaRayleighquotientobjectivef unctionarisein otherapplicationslikeFisherLinearDiscriminantAnalys is(LDA),signalprocessingand nance[ 39 40 ].Unfortunatelyitisnotpossibletoapplytherobustresul tsofLDAhere becauseofthelackothespecialstructureoftheenumerator matrix.Morepreciselythe LDA'senumeratorisadyad(amatrixofrank1thatisaresulto ftheouterproductofa vectorwithitself).IntheReGECcasethiswouldcorrespond toaclassicationproblem forwhichoneoftheclassescontainsjustonesample.Forthi swewillexaminethe problemfromscratch. 72


Incasethattheuncertaintyinformationisgivenintheform ofnormbounded inequalitiesinvolvingmatrices G and H ,thenitispossibletoassociateTikhonov regularization,asusedin[ 45 ],withthesolutiontothefollowingrobustproblem min z 6 =0 max k G k 1 z T ( G + G ) z z T Hz (7–3) ThiscanbestatedthroughthefollowingtheoremTheorem7.1. Thesolutionforthefollowing min max problem min z 6 =0 max k G k 1 k H k 2 z T ( G + G ) z z T ( H + H ) z (7–4) isgivenbytheeigenvectorrelatedtothesmallesteigenval ueofthefollowinggeneralizedeigenvaluesystem ( G + 1 I ) z = ( H 2 I ) z (7–5) Proof. :TheoriginalproblemofEq. 7–2 canbewrittenas min z 6 =0 max k G k 1 z T ( G + G ) z min k H k 2 z T ( H + H ) z (7–6) Wenowconsiderthetwoindividualproblemsfortheenumerat orandthedenominator. MorespecicallyonecanwritethecorrespondingKKTsystem fortheenumerator problem zz T + 2 G =0 (7–7) ( k G k 1 ) =0. (7–8) SolvingEq. 7–7 withrespectto G yields G = 1 2 zz T (7–9) FromEq. 7–8 wecanget = k z k 2 2 1 (7–10) 73


whichgivesthenalexpressionfor G G = 1 k z k 2 zz T (7–11) Thesetwomatricescorrespondtothemaximumandtheminimum oftheproblem, respectively.SubstitutingEq. 7–11 intheenumeratorofEq. 7–4 weobtain z T ( G + 1 I ) z (7–12) Repeatingthesameprocessforthedenominatorandsubstitu tingintothemaster problemshowsthatEq. 7–4 isequivalenttothefollowingminimizationproblem min z 6 =0 z T ( G + 1 I ) z z T ( H 2 I ) z (7–13) Thisproblemattainsitsminimumwhen z isequaltotheeigenvectorthatcorresponds tothesmallesteigenvalueofthegeneralizedeigenvaluepr oblemofthefollowingsystem ( G + 1 I ) z = ( H 2 I ) z (7–14) TheregularizationusedbyMangasarianetal.in[ 45 ]uses G = 1 I H =0 for thesolutionoftherst,and G =0, H = 2 I forthesecondproblemandthereforeitis aformofrobustication.Unfortunately,whenboundsaregi venonthenormsof G and H ,wehavenodirectwaytoderivetheperturbationsintroduce donpoints,namely A and B .Thismeansthattheregularizationisprovidingarobusti cation,butwecannot knowtowhatperturbationwithrespecttothetrainingpoint s. Itisworthnotingthatasimilartheoremhasalreadybeensta tedin[ 64 ]fora completelydifferentproblemintheeldofadaptivelterd esign.Hereitistherst timethatitsrelationwithrobustclassicationmethodsha sbeenshowntoprovidea 74


straightforwardconnectionbetweenregularizationandro busticationforgeneralized eigenvalueclassiers. Incasethattheuncertaintyinformationisnotgiveninthea boveform(e.g.,we haveinformationforeachspecicdatapoint)thesolutionp rovidedbyEq. 7–5 givesa veryconservativeestimateoftheoriginalsolution.Forth isreasonwehavetopropose alternativealgorithmicsolutionsthatcantakeintoconsi derationalladditionalinformation availablefortheproblem.7.1.1RobustCounterpartUnderEllipsoidalUncertaintySe t Nowwewillfocusonthefollowinguncertaintysetswherethe perturbation informationisexplicitlygivenforeachdatapointinthefo rmofanellipsis U A = n A 2 R m n A =[ ( A ) i ( A ) 2 ... ( A ) m ] T : ( A ) T i i ( A ) i 1, i =1,..., m o (7–15) and U B = n B 2 R p n B =[ ( B ) i ( B ) 2 ... ( B ) p ] T : ( B ) T i i ( B ) i 1, i =1,..., p o (7–16) where ( A ) i i =1,..., m and ( B ) i i =1,..., p aretheindividualperturbationsthatoccurin eachsampleand i 2 R n n isapositivedenitematrixthatdenestheellipse'ssizea nd rotation.ThiscoverstheEuclideannormcasewhen i isequaltotheunitmatrix. Sincetheobjectivefunctionsinenumeratoranddenominato rinEq. 2–65 are nothingbutthesumofdistancesofthepointsofeachclassfr omtheclasshyperplane, wecanconsidertheproblemofndingthemaximum(orminimum )distancefroman ellipse'spointtothehyperplanedenedby w T x b =0 .Sincethedistanceofpointtoa hyperplaneisgivenby j w T x b j = k w k theproblemcanbewrittenas max j w T x b j (7–17a) s.t. ( x x c ) T ( x x c ) 1 0 (7–17b) 75


where x c istheellipse'scenter.Wecanconsiderthetwocasesofthep roblem max w T x b (7–18a) s.t. ( x x c ) T ( x x c ) 1 0 (7–18b) w T x 0 (7–18c) and max w T x + b (7–19a) s.t. ( x x c ) T ( x x c ) 1 0 (7–19b) w T x 0 (7–19c) NowletusconsiderproblemofEq. 7–18 .ThecorrespondingKKTsystemwillbe w T 2 1 ( x x c ) 2 w T =0 (7–20a) ( x x c ) T ( x x c ) 1 0 (7–20b) alsothereexist 1 2 suchthat 1 ( x x c ) T ( x x c ) 1 =0 (7–21a) 2 w T x =0 (7–21b) FromEq. 7–21b wederive 2 =0 ,becauseindifferentcasethepointshould satisfytheequationoftheline.IfwesolveEq. 7–20a withrespectto x andsubstitutein Eq. 7–20b weobtainthefollowingexpressionfor 1 1 = p w T 1 w 2 (7–22) whichgivestheexpressionfor x x = 1 w p w T 1 w + x c (7–23) 76


Thetwodifferentpointscorrespondfortheminimumandmaxi mumdistancepoint ontotheellipse.IfweconsiderproblemofEq. 7–19 wewillarrivetothesameresult. Sincethesolutionisexpressedasafunctionof w ,andthus z ,wewillhaveforour originalproblem min z 6 =0 max A 2U A B 2U B z T Gz z T Hz =min z 6 =0 z T H ( z ) z z T G ( z ) z (7–24) Thelatterproblemcannotbesolvedwiththecorrespondingg eneralizedeigenvalue problem,becausethematricesdependon z .Forthisweuseaniterativealgorithm forndingthesolution.First,wesolvethenominalproblem andweusethissolution asthestartingpointoftheiterativealgorithm.Next,star tingfromtheinitialsolution hyperplanes,weestimatethe“worstcase”pointbyEq. 7–23 .Then,wesolveagain theproblembutthistimefortheupdatedpoints.Theprocess isrepeateduntilthe solutionhyperplanesdonotchangemuch,oramaximumnumber ofiterationshasbeen reached.Analgorithmicdescriptionoftheiterativeproce dureisshowninAlgorithm 7.1.1 Algorithm7.1.1 TrainingRobustIterativeReGEC z 1 =[ w T 1 r 1 ] T =argmin k Aw e r k 2 k Bw e r k 2 A (1) A B (1) B z 0 anyvaluesuchthat k z 1 z 0 k > i 1 while k z i z i 1 k or i i max do i i +1 for eachrow x ( i ) j ofmatrix A ( i ) do x ( i ) j max x (1) j + 1 j w i q w T i 1 j w i x (1) j 1 j w i q w T i 1 j w i endforfor eachrow x ( i ) ofmatrix B ( i ) do x ( i ) j min x (1) j + 1 j w i q w T i 1 j w i x (1) j 1 j w i q w T i 1 j w i endforFormupdatedmatrices A ( i ) B ( i ) z i =[ w T i +1 r i +1 ] T =argmin k A ( i ) w e r k 2 k B ( i ) w e r k 2 endwhilereturn z i =[ w T i r i ] T 77
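Algorithm 7.1.1 is easy to prototype once a generalized eigensolver is available. The Python/SciPy sketch below is an illustration under two simplifying assumptions that are ours, not part of the algorithm: a single shared ellipsoid matrix Sigma for all points, and a small ridge added to the right-hand-side matrix purely for numerical stability (in the spirit of the regularization discussed earlier). The worst-case points are the two candidates of Eq. 7-23, keeping the farther one for class A and the nearer one for class B, exactly as in the pseudocode.

import numpy as np
from scipy.linalg import eigh

def fit_hyperplane(A, B, ridge=1e-8):
    # z = [w, r] minimizing ||Aw - e r||^2 / ||Bw - e r||^2 (Eq. 2-67).
    GA = np.hstack([A, -np.ones((A.shape[0], 1))])
    GB = np.hstack([B, -np.ones((B.shape[0], 1))])
    G, H = GA.T @ GA, GB.T @ GB
    _, vecs = eigh(G, H + ridge * np.eye(H.shape[0]))
    return vecs[:, 0]                    # eigenvector of the smallest eigenvalue

def robust_regec(A, B, Sigma, max_iter=50, tol=1e-6):
    Sigma_inv = np.linalg.inv(Sigma)
    z = fit_hyperplane(A, B)
    for _ in range(max_iter):
        w, r = z[:-1], z[-1]
        step = (Sigma_inv @ w) / np.sqrt(w @ Sigma_inv @ w)   # offset of Eq. 7-23

        def worst_case(X, farther):
            cand = np.stack([X + step, X - step])             # the two candidate points
            dist = np.abs(cand @ w - r)                       # distance up to the 1/||w|| factor
            pick = dist.argmax(axis=0) if farther else dist.argmin(axis=0)
            return cand[pick, np.arange(X.shape[0])]

        z_new = fit_hyperplane(worst_case(A, True), worst_case(B, False))
        if z_new @ z < 0:                 # eigenvectors are defined up to sign
            z_new = -z_new
        if np.linalg.norm(z_new - z) <= tol * np.linalg.norm(z):
            return z_new
        z = z_new
    return z                              # hyperplane w^T x = r with w = z[:-1], r = z[-1]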


Asimilarprocessisfollowedforclass B .Incaseof k classesAlgorithm 7.1.1 is appliedforallthecombinationsofclassesandthenthenal hyperplaneisobtained throughSVDonthematrixcontainingthe k 1 normalvectors.Thisisthesame processfollowedfortheoriginalReGEC[ 31 ].Also,itisworthnotingthatthetesting partofthealgorithmisexactlythesametotheoneforthenom inalproblem.Thusthe labelsoftheunknownpointsaredecidedwiththeruledescri bedbyEq. 2–72 where w i areestimatedfromalgorithm 7.1.1 ,incaseoftwoclasses,andwiththetrainingpoint projectionmethodexplainedearlier,incaseofthreeormor eclasses. 7.1.2BalancingBetweenRobustnessandOptimality Robustapproacheshavebeencriticizedforprovidingoverc onservativesolutions inthesensethattheyareoptimalonlywhentheworstassumed scenariooccurs[ 12 ]. Inreallifeapplications,itmightbemoreinterestingtoro bustifyanalgorithmagainst anaveragecasescenario.Forthisreason,weaimatprovidin gamodelthatcanbe adjustable.InsteadofusingEq. 7–23 ,wecantaketheconvexcombinationofthe nominaldatapointsandthe“robust”ones x balanced = x c +(1 ) 1 w p w T 1 w + x c = x c +(1 ) 1 w p w T 1 w ,0 1. (7–25) Parameter determineshowclosepointswillbetotheirnominalvalueso rtotheir “worstscenario”ones.Forcomputationalpurposeswecanch ose bygenerating severalaveragebasedscenariosandselectingthevaluetha tgivesthelowestobjective functionvalueforthetrainingdatapoints.Wearegoingtoc allthismethod -Robust RegularizedGeneralizedEigenvalueClassier( -R-ReGEC). 7.2ComputationalResults 7.2.1ACaseStudy Nowwewillillustratehow -R-ReGECworksunderellipsoidaluncertainty.In thisexampleeachclassiscomposedofthreepoints.Letusas sumeforallpointsthe 78


ellipsoidal uncertainty is the same and it is described by the matrix \Sigma = I [2 2]^T (where I is the unit matrix). We first consider the simple two class example where each class is represented by

A = [5.00 4.00 4.63; 7.11 5.36 4.42]^T,   B = [2.82 2.00 1.00; 1.44 7.11 -0.68]^T   (7-26)

In order to examine the behavior of the robust solution we perform the following test. We compute the nominal solution based on the data values without any perturbation. Then we assume an ellipsoidal perturbation in two dimensions and we compute the robust solution. We create 1000 different realizations (different sets of points perturbed within the ellipsoidal uncertainty) and we compute the objective function value for the robust classifier and the nominal one. This experiment is repeated for increasing values of the ellipse parameters. The results are shown in Fig. 7-1.

Figure 7-1. Objective function value dependence on the ellipse parameters.

We note that for small values of the ellipse parameters the robust solution is very conservative. This means that it might be optimal for the assumed worst case scenario, but it does not perform well in the average case. As the perturbation increases, the robust solution tends to have a constant behavior for any realization of the system.
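The comparison in this case study amounts to a short Monte Carlo loop: fix the nominal and the robust hyperplanes, draw perturbed realizations of the points inside their ellipsoids, and evaluate the Rayleigh quotient of Eq. 2-67 for both solutions. The Python sketch below is illustrative; the uniform-in-the-ellipsoid sampling scheme and the number of realizations are our choices for the illustration.

import numpy as np

def regec_objective(A, B, z):
    # Rayleigh quotient of Eq. 2-67 for the hyperplane z = [w, r].
    GA = np.hstack([A, -np.ones((A.shape[0], 1))])
    GB = np.hstack([B, -np.ones((B.shape[0], 1))])
    return (z @ (GA.T @ GA) @ z) / (z @ (GB.T @ GB) @ z)

def perturb_within_ellipsoids(X, Sigma, rng):
    # Add to every row a perturbation delta satisfying delta^T Sigma delta <= 1.
    vals, vecs = np.linalg.eigh(Sigma)
    M = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T        # Sigma^{-1/2}
    directions = rng.normal(size=X.shape)
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = rng.uniform(size=(X.shape[0], 1)) ** (1.0 / X.shape[1])
    return X + (radii * directions) @ M

def compare(A, B, z_nominal, z_robust, Sigma, realizations=1000, seed=0):
    rng = np.random.default_rng(seed)
    values = []
    for _ in range(realizations):
        Ap = perturb_within_ellipsoids(A, Sigma, rng)
        Bp = perturb_within_ellipsoids(B, Sigma, rng)
        values.append((regec_objective(Ap, Bp, z_nominal),
                       regec_objective(Ap, Bp, z_robust)))
    return np.mean(values, axis=0)        # average objective value: (nominal, robust)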


7.2.2 Experiments

Now we will demonstrate the performance of the algorithm on datasets from the UCI repository [28]. Dataset characteristics are shown in Table 7-1.

Table 7-1. Dataset description.
  Dataset name        # of points   # of attributes   # of classes
  Pima Indian [69]        768              8                2
  Iris [27]               150              4                3
  Wine [2, 3]             178             13                3
  NDC [48]                300              7                2

For each run we used holdout cross validation with 50 repetitions. In every repetition 90% of the samples were used for training and 10% for testing. At each repetition we train the robust algorithm with the nominal data plus the uncertainty information, and we test it on a random realization of the testing dataset, that is, the nominal values of the testing dataset plus noise that satisfies the ellipsoidal uncertainty constraints. The uncertainty was kept constant and equal to 0.1 for all dimensions (features). For the non-fixed dimension the perturbation is set equal to a value that is a parameter of our experiments. All data features are initially normalized so that they have zero mean and unitary standard deviation. All code was developed in MATLAB. The robust SVM solver used for comparison is written in Python and run under Matlab.

The results are reported in Figs. 7-2, 7-3, 7-4 & 7-5 in the form of heat maps. For each considered dataset, we plot the results of the ReGEC algorithm in the left panel, and those attained by the robust variant in the right one. Results have been obtained with the above described cross validation technique, for each fixed value of the balancing and the perturbation parameters. Each tone of gray represents the average accuracy over all cross validation repetitions, as reported in the legend. We notice that, in all considered cases, right panels show higher classification values for larger values of the perturbation. This confirms that it is possible to find values of the balancing parameter for which the classification model is resilient to perturbations. We also notice that some datasets are less sensitive to the value of the balancing parameter. In the Pima Indian and Iris datasets, the classification accuracy increases as this value increases between 0.4 and 0.8. For Wine and NDC datasets, the robust classifier obtains nearly constant classification accuracy for large perturbation values and balancing values above 0.8.


Figure 7-2. Analysis for the Pima Indian dataset. The horizontal axis determines the parameter that balances between the robust and the nominal solution, whereas the vertical axis is the perturbation parameter.

Figure 7-3. Analysis for the Iris dataset. The horizontal axis determines the parameter that balances between the robust and the nominal solution, whereas the vertical axis is the perturbation parameter.


Figure 7-4. Analysis for the Wine dataset. The horizontal axis determines the parameter that balances between the robust and the nominal solution, whereas the vertical axis is the perturbation parameter.

Figure 7-5. Analysis for the NDC dataset. The horizontal axis determines the parameter that balances between the robust and the nominal solution, whereas the vertical axis is the perturbation parameter.

Next we compare our proposed algorithm against Robust SVM. For this we used the package CVXOPT [73] and in particular the customized solver for Robust SVM under ellipsoidal uncertainties also described in [4]. The original formulation of the problem is from [67]. More precisely, the R-SVM problem is solved as a quadratic program (QP) with second order cone constraints. For a two class classification problem, the separating hyperplane is given by the solution of the following


Second Order Cone Program (SOCP):

  min   (1/2) ||w||^2 + \gamma e^T \nu                          (7-27a)
  s.t.  diag(d) (Xw - be) >= 1 - \nu + Eu                       (7-27b)
        \nu >= 0                                                (7-27c)
        ||\Sigma_j w|| <= u_j,   j = 1, ..., t                  (7-27d)

where diag(x) is the diagonal matrix that has vector x on its main diagonal, d is the label vector, E is an indicator matrix that associates an ellipse with a corresponding data point (i.e., E_ij = 1 means that the i-th ellipsoid is associated with the j-th data point), and \Sigma_j is a positive semidefinite matrix that defines the shape of the ellipse. The separating hyperplane is defined by w and b; \nu and u are additional decision variables and \gamma is a parameter that penalizes the wrongly classified samples. Note that for


t =0 and E =0 theproblemreducestotheoriginalsoftmarginSVMclassie r min 1 2 k w k 2 + e T (7–28a) s.t.diag ( d )( Xw be ) 1 (7–28b) u 0 (7–28c) ThecomparisonresultsbetweenReGECandSVMforthefourdat asetsofTable 7-1 areshowninFig. 7-6 &Fig. 7-7 InFig. 7-6 &Fig. 7-7 horizontalaxescorrespondtomeanclassicationaccuraci es. Asbeforetheholdoutcrossvalidationwith50repetitionsw asused.ForReGECthe parameterwasadjustedineachrunthroughthetrainingphas eandforSVM r was chosenthroughauniformgridsearch.Itisworthmentioning thatalterationsof r didnot changedramaticallytheobtainedclassicationaccuracie s. Inallcasesandforbothalgorithms,as increases,theaccuracyofthetwonominal algorithmsdecreasesmorerapidlythanforRCs.Furthermor e,theRCsalwaysobtain higheraccuracyvalues.Forthreedatasets(PimaIndian,Wi neandIris)bothrobust algorithmsperformequallywellevenforlargeperturbatio nvalues.ForNDC,ReGECis betterespeciallyforperturbationslessthat =5 .Ingeneral,bothnonrobustalgorithms failtoachieveahighclassicationaccuracyforhighpertu rbationvalues.Wecan thereforeconcludethattheproposedrobusticationofthe algorithmiswellsuitedfor problemsinwhichuncertaintyontrainingpointscanbemode ledasbelongingtoellipsis. 7.3Extentions InthisworkarobustformulationofReGECisintroduced,tha tcanhandle perturbationintheformofellipsoidaluncertaintiesinda ta.Aniterativealgorithmis proposedtosolvetheassociatedmin-maxproblem.Resultso nUCIdatasetsshowan increasedclassicationaccuracy,whensyntheticellipso idalperturbationsareintroduced intrainingdata. 84


Figure 7-6. Comparison graph for (a) Pima Indian and (b) Iris datasets. Red corresponds to ReGEC whereas blue corresponds to SVM. Solid lines correspond to the robust and dashed lines to the nominal formulations.

Figure 7-7. Comparison graph for (a) Wine and (b) NDC datasets. Red corresponds to ReGEC whereas blue corresponds to SVM. Solid lines correspond to the robust and dashed lines to the nominal formulations.

Some problems still remain open. The first is the kernel version of the algorithm. Since data are nonlinearly projected in the feature space, a representation of the ellipsoidal perturbation in the feature space is needed. We note that this embedding is not trivial, as the convex shape of the ellipsoid in the original space is not preserved in the feature space, thus resulting in a much more difficult problem. For this reason, the direction proposed in [55] seems promising and worth investigating. In addition, further uncertainty scenarios need to be explored. Some interesting uncertainty sets include norm bounded uncertainties on the data matrix, i.e., when ||\Delta A|| is bounded by a constant, and box constraints, i.e.,


when k i k 1 i orpolyhedralconstraintsi.e.when A i i b i .Mostofthesescenarios havebeenexploredforotherclassiers(e.g.,SVM)introdu cingsignicantperformance improvementoverthenominalsolutionschemes.Finally,si nceanaturalapplicationof thismethodisintheeldofbiomedicalandbioinformaticap plications,wewillinvestigate methodstoreducethenumberoffeaturesinvolvedinthesolu tionprocess. 7.3.1MatrixNormConstrainedUncertainty Inthissectionwewillprovidesometheoreticalresultsfor thecasethatthenorm oftheperturbationofeachmatrixisrestrictedbyaknownco nstant.Thiscaseis, sometimes,refertoascorrelatedperturbationbecausethe perturbationofeachsample isdependentontheperturbationofalltheothers.Thisisno tthecaseintheellipsoid perturbationscenariowheretheuncertaintysetisindepen dentforeachdatasample. Aswediscussedearlieritispossibletoderivearobustform ulationofReGECwhen theperturbationinformationisknownforthematrices G and H .Inpracticethisisnot thecase,thuswewishtoderivearobustformulationforthec asethattheinformatiois givendirectlyforthedatamatrices A and B .Inthiscase,datamatrices A and B are replacedby ~ A = A + A and ~ B = B + B .Thenwecanconsiderthefollowing min max optimizationproblem min z 6 = max k A k 1 k B k 2 k ~ Aw e r k 2 k ~ Bw e r k 2 =min z 6 =0 max k A k 1 k ~ Aw e r k 2 min k ~ B k 2 k ~ Bw e r k 2 (7–29) Againweconsiderthemaximizationproblemontheenumerato r max k A k 1 k ~ Aw e r k 2 (7–30) Since 1 > 0 theconstraintcanbereplacedby k A k 2 21 .Thenthecorresponding KKTsystemwillbe ( Aw + Aw e r ) w T + A =0 (7–31) k A k 2 21 =0 (7–32) 87


FromEq. 7–31 wecanget A ww T + I = ( Aw e r ) w T (7–33) Ingeneralforamatrix M = K + cd T where K isnonsingularand c d arecolumn vectorsofappropriatedimensionswecanwrite[ 26 ] M 1 = K 1 1 K 1 cd T K 1 (7–34) where =1+ d T K 1 c .Applyingthisresulttomatrix ww T + I yields ( ww T + I ) 1 = 1 I 1 ( + k w k 2 ) ww T (7–35) Thusweobtainanexpressionof A asafunctionof A = ( Aw e r ) w T 1 I 1 ( + k w k 2 ) ww T = ( Aw e r ) w T 1 + k w k 2 (7–36) Inordertoobtainanexpressionfor wecantakethenormofbothsidesinEq. 7–36 k A k = k ( Aw e r ) w T k 1 + k w k 2 (7–37) andfromthis = k ( Aw e r ) w T k k A k k w k 2 (7–38) SubstitutingbackinEq. 7–36 andusingEq. 7–32 weobtainthenalexpressionfor A A = ( Aw e r ) w T 1 k ( Aw e r ) w T k (7–39) AtthispointitisworthnotingthesymmetrybetweenEq. 7–39 andEq. 7–11 Againthetwopointscorrespondtotheminimumandthemaximu mcorrespondingly. Substitutingintheenumeratoroftheoriginalproblemwege t rrrr A +( Aw e r ) w T 1 k ( Aw e r ) w T k w e r rrrr 2 = rrrr ( Aw e r ) 1+ k w k 2 1 k ( Aw e r ) w T k rrrr 2 (7–40) 88


ByusingCauchy-Schwartzinequalitywehave k ( Aw e r ) w T kk Aw e r kk w k (7–41) Ontheothersidebytriangularinequalitywecanobtain k ( A + A ) w e r kk Aw e r k + 1 k w k forevery A 2 R n m (7–42) andfor A = Aw e r k Aw e r kk w k w T 1 (7–43) weobtainequalityinEq. 7–42 .Convexityoftheproblemguaranteestheunityofthe globalmaximum.Thentherobustsolutiontotheoriginalpro blemisobtainedbysolving thefollowingproblem min w 6 =0, r ( k Aw e r k + 1 k w k ) 2 ( k Aw e r k 2 k w k ) 2 (7–44) Ifwesuppose,withoutlossofgeneralitythat k k =1 ,wecanwrite min k k =1, r 6 =0 k A e r k + 1 k B e r k 2 2 (7–45) Ifweset G =[ A e ] T [ A e ] and H =[ B e ] T [ B e ] ,wecanwrite min k k =1, r 6 =0 p z T Gz + 1 p z T Hz 2 2 (7–46) Thisisafractionalproblemthatcannotbesolveddirectly. Wecantransformthis problemintoaparametriconebymakinguseofDinkelbach'sf ollowingresult Theorem7.2. Fortwocontinuousandrealvaluedfunctionsf(x),g(x)that aredenedon acompactandconnectedsubset S ofEuclideanspacetheproblems max x 2 S f ( x ) g ( x ) = (7–47) and max x 2 S [ f ( x ) g ( x ) ] =0 (7–48) 89


areequivalent.Proof. [ 24 ] Thisisastrongandgeneralresultthatassociatesfraction alandparametric programming.Thisenablesustotransformouroriginalprob leminto min z ( p z T Gz + 1 ) 2 ( p z T Hz 2 ) 2 (7–49) BasedonDinkelbach'stheoptimalsolution ( z ) istheoneforwhichtheobjective functionisequaltozero.TheLagrangianoftheproblemis L ( z )=( p z T Gz + 1 ) 2 ( p z T Hz 2 ) 2 (7–50) ifweset 5 z L ( z )=0 weget ( p z T Gz + 1 ) Gz p z T Gz = ( p z T Hz 2 ) Hz p z T Hz (7–51) or p z T Gz + 1 p z T Gz Gz = p z T Hz 1 p z T Hz Hz (7–52) Notethatwhen 1 = 2 =0 theproblemreducestotheoriginaleigenvaluesystem Gz = Hz (7–53) wecantrysolvingtheproblemasfollows 1.SolvetheoriginalproblemofEq. 7–53 toobtain z 0 2.Iterateandobtain z i from p z T i Gz i + 1 p z T i Gz i Gz i +1 = p z T i Hz i 2 p z T i Hz i Hz i +1 (7–54) 3.stopwhen k z i +1 z i k = k z i k 90
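The three-step scheme above is straightforward to prototype once the nominal generalized eigenvalue problem can be solved. The Python/SciPy sketch below is an illustrative version of the iteration in Eq. 7-54; the normalization of the iterates, the small stabilizing ridge and the convergence tolerance are implementation choices of ours, and, as noted in the text, there is no theoretical convergence guarantee.

import numpy as np
from scipy.linalg import eigh

def matrix_norm_robust_regec(G, H, delta1, delta2, max_iter=100, tol=1e-8, ridge=1e-10):
    # Iterative scheme of Eqs. 7-53 and 7-54; assumes delta2 < sqrt(z^T H z) so that
    # the scaled right-hand-side matrix stays positive definite.
    _, vecs = eigh(G, H + ridge * np.eye(H.shape[0]))   # step 1: nominal problem
    z = vecs[:, 0] / np.linalg.norm(vecs[:, 0])
    for _ in range(max_iter):
        gz, hz = z @ G @ z, z @ H @ z
        cG = (np.sqrt(gz) + delta1) / np.sqrt(gz)       # scaling of G in Eq. 7-54
        cH = (np.sqrt(hz) - delta2) / np.sqrt(hz)       # scaling of H in Eq. 7-54
        # step 2: solve the rescaled generalized eigenvalue problem; by analogy with
        # the nominal problem we keep the eigenvector of the smallest eigenvalue.
        _, vecs = eigh(cG * G, cH * H + ridge * np.eye(H.shape[0]))
        z_new = vecs[:, 0] / np.linalg.norm(vecs[:, 0])
        if z_new @ z < 0:                               # eigenvectors defined up to sign
            z_new = -z_new
        if np.linalg.norm(z_new - z) <= tol:            # step 3: stopping criterion
            return z_new
        z = z_new
    return z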


Despitewedonothaveanytheoreticalguarranteeforthecon vergenceofthis algorithmweknowthatweareclosetotheoptimalsolutionwh en L ( z ) iscloseto zero.Thuswecanalsousethefollowingconvergencecriteri on L ( z ) (7–55) Inthenextsectionwewilldiscussthepotentialnonlinearg eneralizationofrobust ReGECthroughtheuseofkernelfunctions.7.3.2NonlinearRobustGeneralizedClassication InordertoproposethenonlinearversionofRobustReGECwen eedtointroduce theconceptofkernelfunctions.Kernelfunctionsareusede xtensivelyindatamining clusteringandclassicationinordertomakeuseofpowerfu llinearclassierstonon linearlyseparableproblems.Inthisframeworkthepointso foriginaldataspace(of dimension n ),whichisoftencalled inputspace ,areembeddedthroughakernelfunction : R n 7! R m toahigherdimensionspace( m > n )knownasthe featurespace .Insuch aspacethereishighchancethattheproblemislinearlysepa rablethusapplicationof alinearalgorithmispossible.theprojectionofalinearcl assierinthe featurespace is anonlinearoneinthe inputspace .ForReGECthekernelizedversionwasproposed in[ 30 ].Fortherobustcasethealgorithmrenementisrelatedtot heutilizationofthe uncertaintyinformation.Thusembeddingofellipsoids,de nedbyapositivesemidenite matrix i ,isdifferentascomparedtoembeddingofsinglepoints.Ino rdertoembedthe ellipsoidsinthefeaturespaceweneedtoexpresstheminthe followingform i = X j j x j x T j (7–56) where j aretheeigenvaluesand x j thecorrespondingeigenvectors.Thefactthatthe ellipsismatricesarepositivesemideniteguaranteestha tsuchadecompositionexists. Themappinginfeaturespacecanbeobtainedifwereplaceeac heigenvectorwithits 91


projectiononthefeaturespace.Thusweobtainthefollowin gmappingof i C i = X j j ( x j ) ( x j ) T (7–57) Forthecomputationoftheworstcaseperturbation C 1 i needstobecomputed. Inthiscasethecomputationisnotasstraightforwardasint heinputspace.For computationalpurposeswereplacethecomputationofthein versewiththeleast squaressolutionofthesystem min r k C i r z k 2 (7–58) whereEq. 7–57 isusedfor C i .Foravoidingnumericalinstabilitieswiththeregularize d counterpart min r k C i r z k 2 + C i k r k 2 (7–59) Thisexpressioncanbeusedinordertoobtaintheworstcases cenariopointson thefeaturespace.Thentheworstcasescenariopointsinthe featurespacewillbegiven by x wc =max x c + r p z T r x c r p z T r (7–60) ininordertobalancebetweenrobusticationandaccuracyw econsidertheconvex combinationoftherobustandthenominalpoints x balanced = x c +(1 ) x wc (7–61) where istheparameterthatdeterminesthetradeoffandshouldbet unedduringthe modelselectionphase.Thesepointsareusedfortherobusta lgorithm.Alongthesame lineswiththelinearcasethekernelversionofrobustReGEC canbeobtainedthrough thefollowingalgorithmicscheme 92


Algorithm7.3.1 TrainingRobustIterativenonlinearReGEC z 1 =[ w T 1 r 1 ] T =argmin k Aw e r k 2 k Bw e r k 2 A (1) A B (1) B z 0 anyvaluesuchthat k z 1 z 0 k > i 1 while k z i z i 1 k or i i max do i i +1 for eachrow x ( i ) j ofmatrix A ( i ) do x ( i ) j max x (1) j + 1 j w i q w T i 1 j w i x (1) j 1 j w i q w T i 1 j w i endforfor eachrow x ( i ) ofmatrix B ( i ) do x ( i ) j min x (1) j + 1 j w i q w T i 1 j w i x (1) j 1 j w i q w T i 1 j w i endforFormupdatedmatrices A ( i ) B ( i ) z i =[ w T i +1 r i +1 ] T =argmin k A ( i ) w e r k 2 k B ( i ) w e r k 2 endwhilereturn z i =[ w T i r i ] T 93


CHAPTER8 CONCLUSIONS Inthiswork,wepresentedsomeofthemajorrecentadvanceso frobustoptimization indatamining.wealsopresentedtherobustcounterpartfor mulationforReGEC.We comparedtheirperformancewithrobustsupportvectormach ines(SVM)andsuggested anonlinear(kernalized)versionofthealgorithm. Inmostthethesis,weexaminedmostofdataminingmethodsfr omthescope ofuncertaintyhandlingwithonlyexceptiontheprincipalc omponentanalysis(PCA) transformation.Neverthelesstheuncertaintycanbeseena saspecialcaseofprior knowledge.Inpriorknowledgeclassicationforexamplewe aregiventogetherwiththe trainingsetssomeadditionalinformationabouttheinputs pace.Anothertypeofprior knowledgeotherthanuncertaintyisthesocalledexpertkno wledge(e.g.,binaryrule ofthetype“iffeature a ismorethan M 1 andfeature b lessthan M 2 thenthesample belongstoclass x ”).Therehasbeensignicantamountofresearchintheareao fprior knowledgeclassication[ 42 62 ]buttherehasnotbeenasignicantstudyofrobust optimizationonthisdirection. Ontheothersidetherehavebeenseveralothermethodsablet ohandleuncertainty likestochasticprogrammingaswealreadymentionedattheb eginningofthemanuscript. Sometechniques,forexamplelikeconditionalvalueatrisk (CVAR),havebeen extensivelyusedinportfoliooptimizationandinotherris krelateddecisionsystems optimizationproblems[ 61 ]buttheirvalueformachinelearninghasnotbeenfully investigated. Alastthoughtonapplicationofrobustoptimizationinmach inelearningwouldbeas analternativemethodfordatareduction.Inthiscasewecou ldreplacegroupsofpoints byconvexshapes,suchasballs,squaresorellipsoids,that enclosethem.Thenthe supervisedlearningalgorithmcanbetrainedjustbyconsid eringtheseshapesinsteadof thefullsetsofpoints. 94


APPENDIXA OPTIMALITYCONDITIONS HerewewillbrieydiscusstheKarushKuhnTucker(KKT)Opti malityConditions andthemethodofLagrangemultipliersthatisextensivelyu sedthoughthiswork.In thissection,fortheshakeofcompletionwearegoingtodesc ribethetechnicaldetails relatedtooptimalityofconvexprogramsandtherelationwi thKKTsystemsandmethods ofLagrangemultipliers.Firstwewillstartbygivingsomee ssentialdenitionsrelatedto convexity.Firstwegivethedenitionofaconvexfunctiona ndconvexset. Denition1. Afunction f : X R m 7! R iscalledconvexwhen f ( x )+(1 ) f ( x ) f ( x +(1 ) x ) for 0 1 and 8 x 2 X Denition2. Aset X iscalledconvexwhenforanytwopoints x 1 x 2 2 X thepoint x 1 +(1 ) x 2 2 X for 0 1 NowwearereadytodeneaconvexoptimizationproblemDenition3. Anoptimizationproblem min x 2 X f ( x ) iscalledconvexwhen f ( x ) isa convexfunctionand X isaconvexset. Theclassofconvexproblemsisreallyimportantbecauseint heyareclassied asproblemsthatarecomputationallytractable.Thisallow stheimplementationoffast algorithmsfordataanalysismethodsthatarerealizedasco nvexproblems.Processing ofmassivedatasetscanberealizedbecauseofthisproperty .Oncewehavedenedthe convexoptimizationproblemintermsofthepropertiesofit sobjectivefunctionandits feasibleregionwewillstatesomebasicresultsrelatedtot heiroptimality. Corollary1. Foraconvexminimizationproblemalocalminimum x isalwaysaglobal minimumaswell.Thatisif f ( x ) ( f ( x )) for x 2 S where S X then f ( x ) f ( x ) for x 2 X Proof. Let x bealocalminimumsuchthat f ( x ) < f ( x ), x 2 S X andanotherpoint x beingtheglobalminimumsuchthat f ( x ) < f ( x ), x 2 X .Thenbyconvexityofthe 95


objective function, it holds that
$$f(\lambda \bar{x} + (1-\lambda) x^*) = f(x^* + \lambda(\bar{x} - x^*)) \leq \lambda f(\bar{x}) + (1-\lambda) f(x^*) < f(x^*). \quad \text{(A-1)}$$
On the other side, by local optimality of the point $x^*$, we have that there exists $\epsilon > 0$ such that
$$f(x^*) \leq f(x^* + \lambda(\bar{x} - x^*)), \quad 0 \leq \lambda \leq \epsilon, \quad \text{(A-2)}$$
which is a contradiction.

This is an important consequence that explains in part the computational tractability of convex problems. Next we define the critical points, which are extremely important for the characterization of global optima of convex problems. But before that, we need to introduce the notion of feasible directions.

Definition 4. A vector $d \in \mathbb{R}^n$ is called a feasible direction with respect to a set $S$ at a point $x$ if there exists $c \in \mathbb{R}$ such that $x + \lambda d \in S$ for every $0 < \lambda < c$.

Definition 5. For a convex optimization problem $\min_{x \in X} f(x)$, where $f$ is differentiable, every point $x^*$ that satisfies $d^T \nabla f(x^*) \geq 0$ for all $d \in Z(x^*)$ (where $Z(x^*)$ is the set of all feasible directions at the point $x^*$) is called a critical (or stationary) point.

Critical points are very important in optimization, as they are used in order to characterize local optimality in general optimization problems. In a general differentiable setup, stationary points characterize local minima. This is formalized through the following theorem.

Theorem A.1. If $x^*$ is a local minimum of a continuously differentiable function $f$ defined on a convex set $S$, then it satisfies $d^T \nabla f(x^*) \geq 0$ for all $d \in Z(x^*)$.

Proof. [33], p. 14.

Due to the specific properties of convexity, in convex programming critical points are used in order to characterize globally optimal solutions as well. This is stated through the following theorem.


Theorem A.2. If $f$ is a continuously differentiable function on an open set containing $S$, and $S$ is a convex set, then $x^* \in S$ is a global minimum if and only if $x^*$ is a stationary point.

Proof. [33], pp. 14-15.

The last theorem is a very strong result that connects stationary points with global optimality. Since stationary points are so important for solving convex optimization problems, it is also important to establish a methodology that would allow us to discover such points. This is exactly the goal of the Karush-Kuhn-Tucker conditions and the method of Lagrange multipliers (they are actually different sides of the same coin). This systematic methodology was first introduced by Lagrange in 1797; it was generalized through the master's thesis of W. Karush [38], and finally became more widely known through the work of H. W. Kuhn and A. W. Tucker [41]. These conditions are formally stated through the next theorem.

Theorem A.3 (KKT conditions). Given the following optimization problem
$$\min f(x) \quad \text{(A-3a)}$$
$$\text{s.t. } g_i(x) \leq 0, \quad i = 1, \ldots, n \quad \text{(A-3b)}$$
$$h_i(x) = 0, \quad i = 1, \ldots, m \quad \text{(A-3c)}$$
$$x \geq 0, \quad \text{(A-3d)}$$
the following conditions (KKT) are necessary for optimality:
$$\nabla f(x) + \sum_{i=1}^{n} \lambda_i \nabla g_i(x) + \sum_{i=1}^{m} \mu_i \nabla h_i(x) = 0 \quad \text{(A-4a)}$$
$$\lambda_i g_i(x) = 0, \quad i = 1, \ldots, n \quad \text{(A-4b)}$$
$$\lambda_i \geq 0, \quad i = 1, \ldots, n. \quad \text{(A-4c)}$$
For the special case where $f(\cdot)$, $g(\cdot)$, $h(\cdot)$ are convex functions, the KKT conditions are also sufficient for optimality.


Proof. See [33].

Equation A-4a is also known as the Lagrangian equation, and the $\lambda_i$ (and $\mu_i$) are also known as Lagrange multipliers. Thus one can determine stationary points of a problem just by finding the roots of the Lagrangian's first derivative. For the general case, this method is formalized through the Karush-Kuhn-Tucker optimality conditions. The importance of these conditions is that, under convexity assumptions, they are both necessary and sufficient.

Due to the aforementioned results that connect stationary points with optimality, we can clearly see that one can solve a convex optimization problem just by solving the corresponding KKT system. The corresponding points would then be the solution to the original problem.
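As a small worked example, added here only for illustration and not part of the original appendix, consider the convex problem $\min\, x_1^2 + x_2^2$ subject to $g(x) = 1 - x_1 - x_2 \leq 0$. The Lagrangian is
$$L(x, \lambda) = x_1^2 + x_2^2 + \lambda (1 - x_1 - x_2).$$
Setting $\nabla_x L = 0$ gives $2x_1 = 2x_2 = \lambda$. Complementarity $\lambda(1 - x_1 - x_2) = 0$ with $\lambda \geq 0$ rules out $\lambda = 0$ (which would give the infeasible point $x = 0$), so the constraint is active and $x_1 = x_2 = 1/2$ with $\lambda = 1$. Since the problem is convex, Theorem A.3 guarantees that $x^* = (1/2, 1/2)$ is the global minimum.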


APPENDIX B
DUAL NORMS

Dual norms are a mathematical tool necessary for the analysis of the robust support vector machines formulation.

Definition 6. For a norm $\|\cdot\|$ we define the dual norm $\|\cdot\|_*$ as follows:
$$\|x\|_* = \sup\{x^T y \mid \|y\| \leq 1\}. \quad \text{(B-1)}$$

There are several properties associated with the dual norm that we will briefly discuss here.

Property 1. The dual norm of a dual norm is the original norm itself. In other words,
$$\|x\|_{**} = \|x\|. \quad \text{(B-2)}$$

Property 2. The dual of an $l_a$ norm is the $l_b$ norm, where $a$ and $b$ satisfy the following equation:
$$\frac{1}{a} + \frac{1}{b} = 1 \Leftrightarrow b = \frac{a}{a-1}. \quad \text{(B-3)}$$

Immediate results of the previous property are that
- the dual norm of the Euclidean norm is the Euclidean norm ($b = 2/(2-1) = 2$), and
- the dual norm of the $l_1$ norm is the $l_\infty$ norm.

Next we state Hölder's inequality and the Cauchy-Schwarz inequality, which are two fundamental inequalities that connect the primal and the dual norm.

Theorem B.1 (Hölder's inequality). For a pair of dual norms $l_a$ and $l_b$, the following inequality holds:
$$\langle x, y \rangle \leq \|x\|_a \|y\|_b. \quad \text{(B-4)}$$
For the special case $a = b = 2$, Hölder's inequality reduces to the Cauchy-Schwarz inequality:
$$\langle x, y \rangle \leq \|x\|_2 \|y\|_2. \quad \text{(B-5)}$$
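As an added numerical illustration (not part of the original appendix; the script and variable names are ours), the following Python snippet exhibits, for the two dual pairs of Property 2, vectors $y$ in the primal unit ball that attain the supremum in (B-1), and then sanity-checks Hölder's inequality (B-4) on random vectors.

import numpy as np

rng = np.random.default_rng(0)
x = np.array([3.0, -1.0, 2.0])

# Self-duality of the Euclidean norm: y = x / ||x||_2 lies in the l2 unit ball
# and attains x^T y = ||x||_2 in (B-1).
y2 = x / np.linalg.norm(x, 2)
print(x @ y2, np.linalg.norm(x, 2))        # both ~3.7417

# Dual of l1 is l_inf: put all the "mass" of y on the largest |x_i|; then
# ||y||_1 = 1 and x^T y = ||x||_inf.
i = np.argmax(np.abs(x))
y1 = np.zeros_like(x)
y1[i] = np.sign(x[i])
print(x @ y1, np.linalg.norm(x, np.inf))   # both 3.0

# Random sanity check of Hoelder's inequality (B-4) for the dual pair (1, inf).
for _ in range(1000):
    u, v = rng.normal(size=5), rng.normal(size=5)
    assert abs(u @ v) <= np.linalg.norm(u, 1) * np.linalg.norm(v, np.inf) + 1e-12
print("Hoelder's inequality held in all random trials")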


REFERENCES

[1] Abello, J., Pardalos, P., Resende, M.: Handbook of massive data sets. Kluwer Academic Publishers, Norwell, MA, USA (2002)
[2] Aeberhard, S., Coomans, D., De Vel, O.: Comparison of classifiers in high dimensional settings. Dept. Math. Statist., James Cook Univ., North Queensland, Australia, Tech. Rep. (1992)
[3] Aeberhard, S., Coomans, D., De Vel, O.: The classification performance of RDA. Dept. of Computer Science and Dept. of Mathematics and Statistics, James Cook University of North Queensland, Tech. Rep. pp. 92-01 (1992)
[4] Andersen, M.S., Dahl, J., Liu, Z., Vandenberghe, L.: Optimization for Machine Learning, chap. Interior-point methods for large-scale cone programming. MIT Press (2011)
[5] Angelosante, D., Giannakis, G.: RLS-weighted Lasso for adaptive estimation of sparse signals. In: Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on, pp. 3245–3248. IEEE (2009)
[6] d'Aspremont, A., El Ghaoui, L., Jordan, M., Lanckriet, G.: A direct formulation for sparse PCA using semidefinite programming. SIAM Review 49(3), 434 (2007)
[7] Baudat, G., Anouar, F.: Generalized discriminant analysis using a kernel approach. Neural Computation 12(10), 2385–2404 (2000)
[8] Bayes, T.: An essay towards solving a problem in the doctrine of chances. R. Soc. Lond. Philos. Trans. 53, 370–418 (1763)
[9] Ben-Tal, A., El Ghaoui, L., Nemirovski, A.S.: Robust optimization. Princeton Univ. Pr. (2009)
[10] Ben-Tal, A., Nemirovski, A.: Robust solutions of linear programming problems contaminated with uncertain data. Mathematical Programming 88(3), 411–424 (2000)
[11] Bertsimas, D., Pachamanova, D., Sim, M.: Robust linear optimization under general norms. Operations Research Letters 32(6), 510–516 (2004)
[12] Bertsimas, D., Sim, M.: The price of robustness. Operations Research 52(1), 35–53 (2004)
[13] Bhowmick, T., Pyrgiotakis, G., Finton, K., Suresh, A., Kane, S., Moudgil, B., Bellare, J.: A study of the effect of JB particles on Saccharomyces cerevisiae (yeast) cells by Raman spectroscopy. Journal of Raman Spectroscopy 39(12), 1859–1868 (2008)


[14] Birge, J., Louveaux, F.: Introduction to stochastic programming. Springer Verlag (1997)
[15] Bishop, C.: Pattern recognition and machine learning. Springer, New York (2006)
[16] Blondin, J., Saad, A.: Metaheuristic techniques for support vector machine model selection. In: Hybrid Intelligent Systems (HIS), 2010 10th International Conference on, pp. 197–200 (2010)
[17] Boyd, S., Vandenberghe, L.: Convex optimization. Cambridge Univ. Pr. (2004)
[18] Busygin, S., Prokopyev, O., Pardalos, P.: Biclustering in data mining. Computers & Operations Research 35(9), 2964–2987 (2008)
[19] Calderbank, R., Jafarpour, S.: Reed-Muller sensing matrices and the LASSO. Sequences and Their Applications – SETA 2010, pp. 442–463 (2010)
[20] Chandrasekaran, S., Golub, G., Gu, M., Sayed, A.: Parameter estimation in the presence of bounded modeling errors. Signal Processing Letters, IEEE 4(7), 195–197 (1997)
[21] Chandrasekaran, S., Golub, G., Gu, M., Sayed, A.: Parameter estimation in the presence of bounded data uncertainties. SIAM Journal on Matrix Analysis and Applications 19(1), 235–252 (1998)
[22] Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines (2001). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
[23] Della Pietra, S., Della Pietra, V., Lafferty, J., Technol, R., Brook, S.: Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence 19(4), 380–393 (1997)
[24] Dinkelbach, W.: On nonlinear fractional programming. Management Science 13(7), 492–498 (1967)
[25] El Ghaoui, L., Lebret, H.: Robust solutions to least-squares problems with uncertain data. SIAM Journal on Matrix Analysis and Applications 18, 1035–1064 (1997)
[26] Faddeev, D., Faddeeva, V.: Computational methods of linear algebra. Journal of Mathematical Sciences 15(5), 531–650 (1981)
[27] Fisher, R.: The use of multiple measurements in taxonomic problems. Annals of Eugenics 7(7), 179–188 (1936)
[28] Frank, A., Asuncion, A.: UCI machine learning repository (2010). URL http://archive.ics.uci.edu/ml
[29] Guarracino, M., Cifarelli, C., Seref, O., Pardalos, P.: A parallel classification method for genomic and proteomic problems. In: Advanced Information Networking and Applications, 2006, vol. 2, pp. 588–592 (2006)


[30] Guarracino, M., Cifarelli, C., Seref, O., Pardalos, P.: A classification method based on generalized eigenvalue problems. Optimization Methods and Software 22(1), 73–81 (2007)
[31] Guarracino, M., Irpino, A., Verde, R.: Multiclass generalized eigenvalue proximal Support Vector Machines. In: Proceedings of International Conference on Complex, Intelligent and Software Intensive Systems – CISIS, pp. 25–32. IEEE CS (2010)
[32] Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. The Journal of Machine Learning Research 3, 1157–1182 (2003)
[33] Horst, R., Pardalos, P., Thoai, N.: Introduction to global optimization. Springer (1995)
[34] Huang, C., Lee, Y., Lin, D., Huang, S.: Model selection for support vector machines via uniform design. Computational Statistics & Data Analysis 52(1), 335–346 (2007)
[35] Hubert, M., Rousseeuw, P., Vanden Branden, K.: ROBPCA: a new approach to robust principal component analysis. Technometrics 47(1), 64–79 (2005)
[36] Jain, A., Dubes, R.: Algorithms for clustering data. Prentice Hall (1988)
[37] Karmarkar, N.: A new polynomial-time algorithm for linear programming. Combinatorica 4(4), 373–395 (1984)
[38] Karush, W.: Minima of functions of several variables with inequalities as side constraints. MSc Thesis, Department of Mathematics, University of Chicago (1939)
[39] Kim, S.J., Boyd, S.: A minimax theorem with applications to machine learning, signal processing, and finance. SIAM Journal on Optimization 19(3), 1344–1367 (2008)
[40] Kim, S.J., Magnani, A., Boyd, S.: Robust Fisher discriminant analysis. Advances in Neural Information Processing Systems 18, 659 (2006)
[41] Kuhn, H., Tucker, A.: Nonlinear programming. In: Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, vol. 481, p. 490. California (1951)
[42] Lauer, F., Bloch, G.: Incorporating prior knowledge in support vector machines for classification: A review. Neurocomputing 71(7-9), 1578–1594 (2008)
[43] Lilis, G., Angelosante, D., Giannakis, G.: Sound field reproduction using the Lasso. Audio, Speech, and Language Processing, IEEE Transactions on 18(8), 1902–1912 (2010)


[44] Mangasarian, O., Street, W., Wolberg, W.: Breast cancer diagnosis and prognosis via linear programming. Operations Research 43(4), 570–577 (1995)
[45] Mangasarian, O., Wild, E.: Multisurface proximal Support Vector Machine classification via generalized eigenvalues. IEEE Trans. Pattern Anal. Mach. Intell. 28(1), 69–74 (2006)
[46] McCarthy, J., Minsky, M., Rochester, N., Shannon, C.: A proposal for the Dartmouth summer research project on artificial intelligence. AI Magazine 27(4), 12 (2006)
[47] Moore, G.: Cramming more components onto integrated circuits. Electronics 38(8), 114–117 (1965)
[48] Musicant, D.R.: NDC: normally distributed clustered datasets (1998). URL http://www.cs.wisc.edu/dmi/svm/ndc/
[49] Nielsen, J.: Nielsen's law of Internet bandwidth. Online at http://www.useit.com/alertbox/980405.html (1998)
[50] Notingher, I., Green, C., Dyer, C., Perkins, E., Hopkins, N., Lindsay, C., Hench, L.: Discrimination between ricin and sulphur mustard toxicity in vitro using Raman spectroscopy. Journal of The Royal Society Interface 1(1), 79–90 (2004)
[51] Olafsson, S., Li, X., Wu, S.: Operations research and data mining. European Journal of Operational Research 187(3), 1429–1448 (2008)
[52] Owen, C.A., Selvakumaran, J., Notingher, I., Jell, G., Hench, L., Stevens, M.: In vitro toxicology evaluation of pharmaceuticals using Raman micro-spectroscopy. J. Cell. Biochem. 99(1), 178–186 (2006)
[53] Pardalos, P., Boginski, V., Vazacopoulos, A.: Data Mining in Biomedicine. Springer Verlag (2007)
[54] Platt, J., Cristianini, N., Shawe-Taylor, J.: Large margin DAGs for multiclass classification. Advances in Neural Information Processing Systems 12(3), 547–553 (2000)
[55] Pothin, J., Richard, C.: Incorporating prior information into support vector machines in the form of ellipsoidal knowledge sets. Citeseer (2006)
[56] Powers, K., Brown, S., Krishna, V., Wasdo, S., Moudgil, B., Roberts, S.: Research strategies for safety evaluation of nanomaterials. Part VI. Characterization of nanoscale particles for toxicological evaluation. Toxicological Sciences 90(2), 296–303 (2006)
[57] Pyrgiotakis, G., Kundakcioglu, O., Finton, K., Pardalos, P., Powers, K., Moudgil, B.: Cell death discrimination with Raman spectroscopy and Support Vector Machines. Annals of Biomedical Engineering 37(7), 1464–1473 (2009)


[58] Quinlan, J.: C4.5: programs for machine learning. Morgan Kaufmann (1993)
[59] Quinlan, J.: Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research 4, 77–90 (1996)
[60] Rao, C.: The utilization of multiple measurements in problems of biological classification. Journal of the Royal Statistical Society. Series B (Methodological) 10(2), 159–203 (1948)
[61] Rockafellar, R., Uryasev, S.: Optimization of conditional value-at-risk. Journal of Risk 2, 21–42 (2000)
[62] Schölkopf, B., Simard, P., Smola, A., Vapnik, V.: Prior knowledge in support vector kernels. Advances in Neural Information Processing Systems, pp. 640–646 (1998)
[63] Seref, O., Kundakcioglu, O., Pardalos, P.: Data Mining, Systems Analysis, and Optimization in Biomedicine. In: AIP Conference Proceedings (2007)
[64] Shahbazpanahi, S., Gershman, A., Luo, Z., Wong, K.: Robust adaptive beamforming using worst-case SINR optimization: A new diagonal loading-type solution for general-rank signal models. In: Acoustics, Speech, and Signal Processing, vol. 5, pp. 333–336 (2003)
[65] Shannon, P., Markiel, A., Ozier, O., Baliga, N., Wang, J., Ramage, D., Amin, N., Schwikowski, B., Ideker, T.: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Research 13(11), 2498 (2003)
[66] Shawe-Taylor, J., Cristianini, N.: Kernel methods for pattern analysis. Cambridge Univ. Pr. (2004)
[67] Shivaswamy, P., Bhattacharyya, C., Smola, A.: Second order cone programming approaches for handling missing and uncertain data. The Journal of Machine Learning Research 7, 1283–1314 (2006)
[68] Sion, M.: On general minimax theorems. Pacific Journal of Mathematics 8(1), 171–176 (1958)
[69] Smith, J.W., Everhart, J.E., Dickson, W.C., Knowler, W.C., Johannes, R.S.: Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. Johns Hopkins APL Technical Digest 10, 262–266 (1988)
[70] Tibshirani, R.: Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), pp. 267–288 (1996)
[71] Trafalis, T., Alwazzi, S.: Robust optimization in support vector machine training with bounded errors. In: Neural Networks, 2003. Proceedings of the International Joint Conference on, vol. 3, pp. 2039–2042


[72] Trafalis, T.B., Gilbert, R.C.: Robust classification and regression using support vector machines. European Journal of Operational Research 173(3), 893–909 (2006)
[73] Vandenberghe, L.: The CVXOPT linear and quadratic cone program solvers (2010)
[74] Vapnik, V.: The Nature of Statistical Learning Theory. Springer-Verlag, New York (1995)
[75] Verdu, S., Poor, H.: On minimax robustness: A general approach and applications. Information Theory, IEEE Transactions on 30(2), 328–340 (2002)
[76] Walter, C.: Kryder's law. Scientific American 293(2), 32 (2005)
[77] Xanthopoulos, P., Boyko, N., Fan, N., Pardalos, P.: Biclustering: Algorithms and Application in Data Mining. John Wiley & Sons, Inc. (2011)
[78] Xanthopoulos, P., De Asmundis, R., Guarracino, M., Pyrgiotakis, G., Pardalos, P.: Supervised classification methods for mining cell differences as depicted by Raman spectroscopy. In: Seventh International Meeting on Computational Intelligence Methods for Bioinformatics and Biostatistics, Palermo, September 16-18, 2010, pp. 112–122 (2011)
[79] Xanthopoulos, P., Guarracino, M., Pardalos, P.M.: Robust generalized eigenvalue classifier with ellipsoidal uncertainty. Annals of Operations Research (submitted)
[80] Xanthopoulos, P., Pardalos, P., Trafalis, T.: Robust Data Mining. Springer (2012)
[81] Xu, H., Caramanis, C., Mannor, S.: Robustness and Regularization of Support Vector Machines. Journal of Machine Learning Research 10, 1485–1510 (2009)
[82] Yan, R.: MATLAB Arsenal – A Matlab Package for Classification Algorithms. Informedia, School of Computer Science, Carnegie Mellon University (2006)


BIOGRAPHICAL SKETCH

Petros Xanthopoulos was born in Heraclion, Crete, Greece. He received the diploma of engineering degree from the Electronics and Computer Engineering Department, Technical University of Crete at Chania, Greece, in 2005. In fall 2006, he joined the graduate program of the Industrial and Systems Engineering (I.S.E.) department at the University of Florida. He received a Master of Science in 2008 and a Ph.D. in 2011. His research interests are operations research, data mining, robust optimization, and biomedical signal processing. His work has been published in journals of the Institute of Electrical and Electronics Engineers (IEEE), Elsevier, and Wiley & Sons, and he has edited a special issue of the Journal of Combinatorial Optimization entitled Data Mining in Biomedicine. He is the co-editor of two volumes published in the Springer book series Springer Optimization and its Applications (SOIA) and one published by the American Mathematical Society (Fields Institute book series). He has also co-organized many special sessions in international meetings and two conferences related to applications of data mining in biomedicine. While at the University of Florida, he taught several undergraduate classes, and in 2010 he received the I.S.E. department graduate award for excellence in research. He is a student member of the Institute for Operations Research and the Management Sciences (INFORMS) and the Society for Industrial and Applied Mathematics (SIAM). In August 2011, he joined the Industrial Engineering and Management Systems Department at the University of Central Florida as tenure-track faculty at the level of assistant professor.