Risk Management Approaches in Data Mining


Material Information

Title:
Risk Management Approaches in Data Mining
Physical Description:
1 online resource (65 p.)
Language:
english
Creator:
Tsyurmasto, Petr
Publisher:
University of Florida
Place of Publication:
Gainesville, Fla.
Publication Date:
2014
Thesis/Dissertation Information

Degree:
Doctorate ( Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Industrial and Systems Engineering
Committee Chair:
URYASEV,STANISLAV
Committee Co-Chair:
BOGINSKIY,VLADIMIR L
Committee Members:
PARDALOS,PANAGOTE M
RANGARAJAN,ANAND

Subjects

Subjects / Keywords:
svm
Industrial and Systems Engineering -- Dissertations, Academic -- UF
Genre:
Industrial and Systems Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract:
This dissertation is dedicated to the development of a risk management approach to the Support Vector Machine (SVM) for classification. The approach is based on representing classical versions of the SVM in primal form as a minimization of a tail risk measure such as the maximal loss, conditional value-at-risk, or partial moment. By revisiting the fundamental concepts of structural risk minimization and maximal-margin classification, we develop a generalized framework, which is shown to contain several classical versions of the SVM, such as hard-margin, soft-margin, $\nu$-SVM and extended $\nu$-SVM, as special cases corresponding to particular choices of the risk functional. Under the assumption that the empirical risk function is positively homogeneous, we derive conditions under which the generalized formulation is equivalent to several structural risk minimization formulations. The considered formulations specify different ways to express a tradeoff between the empirical risk and regularization. The non-linear Gaussian (RBF) kernel extension is provided for all formulations. Sufficient conditions for the existence of an optimal solution and for unboundedness of the associated optimization problems are also given; they can be checked before optimization. Within the presented framework, we propose new classification methods based on the value-at-risk and conditional value-at-risk measures, which are robust to data outliers. Computational experiments confirm that the new classifiers have superior out-of-sample performance on datasets contaminated by outliers, compared to the classical versions of the SVM.
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Bibliography:
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility:
by Petr Tsyurmasto.
Thesis:
Thesis (Ph.D.)--University of Florida, 2014.
Local:
Adviser: URYASEV,STANISLAV.
Local:
Co-adviser: BOGINSKIY,VLADIMIR L.

Record Information

Source Institution:
UFRGP
Rights Management:
Applicable rights reserved.
Classification:
lcc - LD1780 2014
System ID:
UFE0046215:00001




Full Text

RISK MANAGEMENT APPROACHES IN DATA MINING

By

PETER TSYURMASTO

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2014

© 2014 Peter Tsyurmasto

To my parents Elena and Anatoly, who constantly supported me in all my endeavors, and to Vitaly Bezmenov, the most influential teacher in my life

ACKNOWLEDGMENTS

I am very thankful to my advisor Prof. Stan Uryasev for his invaluable help in research during my doctoral study and his great personal support. I would like to express my gratitude to the other members of my doctoral committee, Prof. Panos Pardalos, Prof. Anand Rangarajan and Prof. Vladimir Boginski, for their help and guidance. In particular, I would like to express my great appreciation to Junya Gotoh (Chuo University) and Michael Zabarankin (Stevens Institute of Technology) for their significant contributions to my research. I would also like to thank my family for their tremendous support.

TABLE OF CONTENTS

ACKNOWLEDGMENTS ... 4
LIST OF TABLES ... 7
LIST OF FIGURES ... 8
ABSTRACT ... 9

CHAPTER

1 INTRODUCTION ... 10

2 OVERVIEW OF EXISTING SVM FORMULATIONS ... 15

3 RISK MANAGEMENT APPROACH TO SUPPORT VECTOR MACHINE ... 18
  3.1 Reformulation of SVMs with Risk Functionals ... 18
  3.2 Generalized Maximal Margin Formulation ... 20
  3.3 Generalized Structural Risk Minimization Formulations ... 22

4 THEORY OF SVM FORMULATIONS WITH RISK FUNCTIONALS ... 26
  4.1 Positive Homogeneous Risk Functionals ... 26
  4.2 Unboundedness of Optimal Solution ... 27
  4.3 Equivalence of Formulations with Positive Homogeneous Risk Functionals ... 28
  4.4 Existence of Optimal Solution ... 29
  4.5 Non-Linear SVM ... 32

5 NEW ROBUST CLASSIFIERS BASED ON RISK MANAGEMENT APPROACH ... 35
  5.1 Value-at-Risk Support Vector Machine ... 37
  5.2 CVaR-(αL, αU) Support Vector Machine ... 42
  5.3 Computational Results ... 43
    5.3.1 Results of VaR-SVM against ν-SVM on Artificial dataset ... 44
    5.3.2 Real-Life Data Sets ... 45

6 CONCLUSION ... 52
  6.1 Dissertation Contribution ... 52
  6.2 Future Work ... 52

APPENDIX

A PROOFS OF SELECTED PROPOSITIONS ... 53
  A.1 Proof of Proposition 4.2 ... 53
  A.2 Proof of Proposition 4.3 ... 54
  A.3 Proof of Proposition 5.1 ... 55
  A.4 Proof of Proposition 5.2 ... 55
  A.5 Proof of Proposition 4.6 ... 56

B EXAMPLES OF PSG META-CODE ... 58

REFERENCES ... 60

BIOGRAPHICAL SKETCH ... 63

LIST OF TABLES

5-1 Running time of ν-SVM and VaR-SVM for different datasets with processor Intel(R) Core(TM)2 Quad CPU @ 2.83GHz ... 50
5-2 Experimental results for Liver Disorders dataset with outliers ... 50
5-3 Experimental results for Heart Disease dataset with outliers ... 50
5-4 Experimental results for German Credit dataset with outliers ... 51
5-5 Experimental results for Indian Diabetes dataset with outliers ... 51
5-6 Experimental results for Ionosphere dataset with outliers ... 51

LIST OF FIGURES

3-1 Histogram of distances of data samples to the hyperplane for German Credit data ... 25
4-1 Equivalence among several formulations with risk functionals ... 34
5-1 Histogram for the probability density function of the loss distribution: VaR is the α-percentile of the loss distribution ... 46
5-2 Distribution of distances to the hyperplane. The CVaR measure is calculated as a normalized average of the distances exceeding the α-percentile of the distribution ... 47
5-3 Distribution of distances to the hyperplane. The difference of the CVaR-αL and CVaR-αU measures is calculated as a normalized average between the αL and αU percentiles of the distribution ... 48
5-4 Out-of-sample performance of ν-SVM and of VaR-SVM as a function of the percentage of outliers for the artificial dataset ... 49

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

RISK MANAGEMENT APPROACHES IN DATA MINING

By

Peter Tsyurmasto

May 2014

Chair: Stan Uryasev
Major: Industrial and Systems Engineering

This dissertation is dedicated to the development of a risk management approach to the Support Vector Machine (SVM) for classification. The approach is based on representing classical versions of the SVM in primal form as a minimization of a tail risk measure such as the maximal loss, conditional value-at-risk, or partial moment. By revisiting the fundamental concepts of structural risk minimization and maximal-margin classification, we develop a generalized framework, which is shown to contain several classical versions of the SVM, such as hard-margin, soft-margin, ν-SVM and extended ν-SVM, as special cases corresponding to particular choices of the risk functional. Under the assumption that the empirical risk function is positively homogeneous, we derive conditions under which the generalized formulation is equivalent to several structural risk minimization formulations. The considered formulations specify different ways to express a tradeoff between the empirical risk and regularization. The non-linear Gaussian (RBF) kernel extension is provided for all formulations. Sufficient conditions for the existence of an optimal solution and for unboundedness of the associated optimization problems are also given; they can be checked before optimization. Within the presented framework, we propose new classification methods based on the value-at-risk and conditional value-at-risk measures, which are robust to data outliers. Computational experiments confirm that the new classifiers have superior out-of-sample performance on datasets contaminated by outliers, compared to the classical versions of the SVM.

CHAPTER 1
INTRODUCTION

The success of the Support Vector Machine (SVM) is based on a wide range of concepts from statistics, functional analysis, computer science, mathematical optimization, etc. This dissertation focuses on the analysis of optimization aspects of SVM formulations in primal and develops a risk management approach, which generalizes classical versions of the SVM.

As presented in textbooks or tutorial papers (e.g., Burges (1966)), the SVM can be traced back to the so-called maximum margin criterion for binary classification. It maximizes the so-called geometric margin when the given sample dataset is (linearly) separable. Suppose we have a training dataset (x_1, y_1), ..., (x_l, y_l) of features x_i ∈ R^m with binary class labels y_i ∈ {−1, +1} for i = 1, ..., l, and a hyperplane specified by the equation w^T x + b = 0, w ∈ R^m∖{0}, b ∈ R. The following quantities gauge the degrees of misclassification on the basis of the geometric distances from data samples to the hyperplane:

d_i(w, b) = −y_i(w^T x_i + b) / ||w||, for i = 1, ..., l,   (1–1)

where ||·|| denotes the Euclidean norm, i.e., ||x|| := sqrt(x_1^2 + ... + x_m^2). The inequality d_i(w, b) ≤ 0 implies that the sample x_i is correctly classified by the hyperplane, while d_i(w, b) > 0 implies wrong classification. With the notation above, the maximum margin criterion is formulated as the minimization problem

minimize_{w,b} max{d_1(w, b), ..., d_l(w, b)}.   (1–2)

This criterion is appealing since 1) it is intuitively understandable, and 2) it is a straightforward minimization of the so-called generalization error bound (e.g., Vapnik 1999). In addition, as long as the dataset is separable, the margin maximization can be reduced to a convex quadratic programming problem (QP):

minimize_{w,b} (1/2)||w||^2
subject to y_i(w^T x_i + b) ≥ 1, i = 1, ..., l.

On the contrary, when the dataset is not separable, the resulting convex QP becomes infeasible, and the soft-margin SVM (Cortes & Vapnik (1995)) was developed by introducing slack variables. The soft-margin SVM can then be considered as the simultaneous minimization of an empirical risk (associated with the slack variables introduced) and a regularization term (usually defined with the Euclidean norm of the normal vector of the hyperplane). For example, with the aforementioned notation, the C-SVM, the most popular soft-margin formulation, is represented by the bi-objective minimization

minimize_{w,b} (1/2)||w||^2 + C (1/l) sum_{i=1}^l max{1 − y_i(w^T x_i + b), 0},   (1–3)

with some constant C > 0. Note that the constant C is a parameter balancing the two objectives, and it needs to be determined. Based on this formulation, the C-SVM can be viewed as a structural risk minimization of the following form:

minimize (regularization) + C (empirical risk).   (1–4)

However, readers may question the logic sketched above: Why is the margin maximization (1–2) not applied in the inseparable case? What is the relation between the margin maximization (1–2) and the C-SVM (1–3)? It is not hard to see that the margin maximization (1–2) results in a nonconvex optimization unless the dataset is separable. But do we have any statistical (not tractability-based) motivation for avoiding the nonconvexity? In addition, the principle of structural risk minimization does not specify how to treat the trade-off between the empirical risk and the regularization. In other words, it can also be formulated as either

minimize (empirical risk) subject to (regularization) ≤ E, with a constant E,   (1–5)

or

minimize (regularization) subject to (empirical risk) ≤ −D, with a constant D.   (1–6)

However, with the empirical risk function employed in the C-SVM (1–3), each of the formulations (1–4), (1–5) and (1–6) results in a different classifier. In fact, the C-SVM of the form (1–5) with E > 0 results in a meaningless solution with w = 0, in which no hyperplane is obtained. On the other hand, it is proved that the ν-SVM (Schölkopf et al. 2000), another popular soft-margin SVM, does not depend on the difference among the formulations (1–4), (1–5) and (1–6). The focus of this dissertation is to find what factor makes such a difference.

To explore this, we apply a risk management approach to the classification problem and view the original criterion (1–2) as a minimization problem with objective function R(d_1, ..., d_l), referred to as a risk functional and defined on the geometric margins d_1, ..., d_l:

minimize_{w,b} R(d_1(w, b), ..., d_l(w, b)),   (1–7)

where R: R^l → R and d_i is defined in (1–1). In other words, optimization problem (1–2) is the special case of (1–7) in which the risk functional R(d_1, ..., d_l) = max{d_1, ..., d_l} is applied. The area of risk management has grown extensively in recent years. Specifically, value-at-risk (VaR) and conditional value-at-risk (CVaR) (Rockafellar & Uryasev 2000) are currently widely used to monitor and control the market risk of financial instruments (Duffie & Pan 1997; Jorion 1997). A comprehensive recent study of risk measures can be found in Rockafellar & Uryasev (2013).

A merit of the generalized representation (1–7) is that it provides a unified scheme for various existing SVM formulations. Specifically, we show that the hard-margin SVM (Boser et al. 1992), the ν-SVM (Schölkopf et al. 2000), the extended ν-SVM (Pérez-Cruz et al. 2003) and the VaR-SVM (Tsyurmasto et al. 2013) in primal can be represented in compact forms with risk functionals (max-functional, conditional value-at-risk, value-at-risk) and derived from formulation (1–7). It is not surprising that the form (1–7) has a profound relation to the structural risk minimization forms (1–4), (1–5) and (1–6), but the relation among them has been explored only in a limited number of articles, each specifying its own structure. For example, Gotoh & Takeda (2005) and Pérez-Cruz et al. (2003) study the relation of the ν-SVM (Schölkopf et al. 2000) to the optimization problem of the form (1–7) with the CVaR. Recently, Gotoh et al. (2013a) developed a class of generalized SVM formulations on the basis of the so-called coherent risk measures (Artzner et al. 1999), which includes the max-functional and the CVaR as special cases.

Further, we show that the considered approach can derive a more general set of classifiers. Specifically, we consider a class of risk functionals that are positively homogeneous. We first show that, with this assumption, the structural risk minimizations of the different forms (1–4), (1–5) and (1–6) have the same set of optimal classifiers if the optimal value of (1–7) is nonpositive; on the other hand, none of (1–4), (1–5) or (1–6) provides any classifier if the optimal value of (1–7) is positive (Figure 4-1). We derive a sufficient condition for the existence of optimal solutions of the considered formulations. It is noteworthy that the obtained condition can be checked before the optimization, and it is consistent with the fact pointed out by Burges (1966); Chang & Lin (2002); Gotoh & Takeda (2005); Gotoh et al. (2013b), where the ν-SVM is shown to be applicable not for all values of the parameter ν in the range [0, 1]. In addition, if the risk measure is positively homogeneous, the set of optimal classifiers obtained by any of the structural risk minimizations (1–4), (1–5) or (1–6) does not depend on the parameters C, D or E. Namely, without bothering how to set those values, we can set C = D = E = 1 (Figure 4-1). This fact was pointed out in Schölkopf et al. (2000) for the ν-SVM by using duality. In contrast, we show without duality that such a property comes from the positive homogeneity of the corresponding risk functional. On the other hand, the aforementioned dependence on the parameters in the C-SVM can be explained by the lack of positive homogeneity.

As a practical application of the developed methodology, we propose two new classification methods. The first method, referred to as VaR-SVM, is based on value-at-risk (VaR) and requires the hard-margin constraint to hold with a probability α ∈ (0, 1]. When α = 1, VaR-SVM reduces to the hard-margin SVM. VaR with a confidence level α is the α-percentile of the loss distribution and is widely used to monitor and control the market risk of financial instruments (Duffie & Pan 1997; Jorion 1997). A similarity between SVM classification and optimization of monetary risk measures was observed in Gotoh & Takeda (2005); Sakalauskas et al. (2012); Takeda & Sugiyama (2008). The second method, referred to as CVaR-(αL, αU)-SVM, has lower and upper parameters αL and αU such that 0 ≤ αL < αU ≤ 1, and is based on the difference of two conditional value-at-risk measures.

The computational experiments with artificial and real-life datasets confirm that, in the presence of outliers, VaR-SVM and CVaR-(αL, αU)-SVM have superior out-of-sample performance versus the ν-SVM.
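
The following minimal sketch (in Python; not part of the original dissertation) illustrates the geometric margins (1–1) and the risk-functional view (1–7) with R = max on a hypothetical toy dataset; the data and the choice of (w, b) are assumptions made purely for illustration.

    import numpy as np

    # Toy dataset: features x_i and labels y_i (illustrative assumptions only).
    X = np.array([[1.0, 2.0], [2.0, 3.0], [-1.0, -1.5], [-2.0, -0.5]])
    y = np.array([1, 1, -1, -1])

    def geometric_margins(w, b):
        """d_i(w, b) = -y_i (w^T x_i + b) / ||w||; negative means correct classification."""
        return -y * (X @ w + b) / np.linalg.norm(w)

    def max_loss_risk(w, b):
        """R(d_1, ..., d_l) = max{d_1, ..., d_l}: the worst-case margin criterion (1-2)."""
        return np.max(geometric_margins(w, b))

    # A separable toy problem: a negative worst-case value means all samples are
    # classified correctly by the hyperplane w^T x + b = 0.
    w, b = np.array([1.0, 1.0]), 0.0
    print(max_loss_risk(w, b))  # < 0 here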

CHAPTER 2
OVERVIEW OF EXISTING SVM FORMULATIONS

Let (x_1, y_1), ..., (x_l, y_l) be a training dataset of features x_i ∈ R^m with binary class labels y_i ∈ {−1, +1} for i = 1, ..., l. The original features x_1, ..., x_l ∈ R^m are transformed into features φ(x_1), ..., φ(x_l) ∈ R^n, respectively, with a mapping φ: R^m → R^n. The goal of the SVM is then to construct a hyperplane specified by the equation w^T x + b = 0, w ∈ R^n, b ∈ R, which separates the transformed features φ(x_1), ..., φ(x_l) with class labels +1 and −1 from each other in the space R^n. In order to find a hyperplane on the basis of the given training set and an (implicit) mapping φ, minimizing the violation of separation is the bottom line. We say that the dataset is separable if there exists (w, b) ∈ (R^n∖{0}) × R such that y_i(w^T φ(x_i) + b) > 0 for each i = 1, ..., l. On the other hand, a data point x_i is considered to be wrongly classified if y_i(w^T φ(x_i) + b) < 0 for a given (w, b). A reasonable criterion for determining (w, b) is to place the hyperplane so that the geometric distance from the worst-classified data sample is minimized if the dataset is inseparable, or so that the distance from the nearest data sample is maximized if the dataset is separable. This rule is formulated as a fractional programming (FP) problem:

maximize_{w,b} min { y_i(w^T φ(x_i) + b) / ||w|| : i = 1, ..., l }.   (2–1)

In this dissertation, the quantity y_i(w^T φ(x_i) + b) is referred to as the margin (of a sample x_i), while y_i(w^T φ(x_i) + b)/||w|| is referred to as the geometric margin (of a sample x_i). The optimization problem (2–1) is known to be rewritable as a quadratic programming (QP) problem:

minimize_{w,b} (1/2)||w||^2
subject to y_i(w^T φ(x_i) + b) ≥ 1, i = 1, ..., l,   (2–2)

if the dataset is separable under the mapping φ. This approach is known as the hard-margin SVM (see, e.g., Boser et al. (1992)). It is, however, noteworthy that the FP (2–1) remains well-defined even when the dataset is inseparable, whereas the QP (2–2) is then infeasible, i.e., it has no feasible solution. In order to deal with non-separable datasets, formulation (2–2) was extended by introducing slack variables z_1, ..., z_l:

minimize_{w,b,z} (1/2)||w||^2 + (C/l) sum_{i=1}^l z_i
subject to y_i(w^T φ(x_i) + b) ≥ 1 − z_i, i = 1, ..., l,
z_i ≥ 0, i = 1, ..., l,   (2–3)

where C > 0 is a parameter to be tuned. Formulation (2–3) is referred to as the C-SVM. It should be noted that (2–3) can be rewritten as an unconstrained (convex) minimization of the form

minimize_{w,b} (1/2)||w||^2 + (C/l) sum_{i=1}^l [1 − y_i(w^T φ(x_i) + b)]_+,   (2–4)

where [x]_+ := max{x, 0}. The form (2–4) is referred to as a class of structural risk minimization, which is often considered a central principle for machine learning methods. Namely, (2–4) is considered as the simultaneous minimization of the empirical classification error term sum_{i=1}^l [1 − y_i(w^T φ(x_i) + b)]_+ and the regularization term ||w||, which is added to control overfitting. On the other hand, the interpretation of the empirical error term has been left ambiguous, in the sense that a data sample contributes to this term if its margin is less than 1, and no clear interpretation is given for the meaning of the value 1. At the same time, the interpretation of the parameter C is also not clear. As a remedy for the ambiguous interpretation of the involved constants, i.e., 1 and C, Schölkopf et al. (2000) developed an alternative known as the ν-SVM:

minimize_{w,b,ρ,z} (1/2)||w||^2 − νρ + (1/l) sum_{i=1}^l z_i
subject to y_i(w^T φ(x_i) + b) ≥ ρ − z_i, i = 1, ..., l,
z_i ≥ 0, i = 1, ..., l,   (2–5)

where ν ∈ (0, 1] is a parameter to be tuned. Corresponding to (2–5), Pérez-Cruz et al. (2003) posed an extended formulation:

minimize_{w,b,ρ,z} −νρ + (1/l) sum_{i=1}^l z_i
subject to y_i(w^T φ(x_i) + b) ≥ ρ − z_i, i = 1, ..., l,
z_i ≥ 0, i = 1, ..., l,
||w|| = 1,   (2–6)

where ν ∈ (0, 1] is a parameter to be tuned. The Eν-SVM (2–6) is a nonconvex minimization formulation, but it extends the lower bound ν_min of the admissible range of the ν-SVM to (0, ν_max].

We have so far overviewed several popular SVM formulations, and it is not hard to see that all of them are based on the margin, y_i(w^T φ(x_i) + b), or the geometric margin, y_i(w^T φ(x_i) + b)/||w||, and are eventually formulated as a structural risk minimization form (e.g., (2–2), (2–3), (2–5) and (2–6)) or an FP (e.g., (2–1)). All of these share a common feature in formulating a certain trade-off between the empirical risk and the regularization, but it is not clear how they are related to each other, especially in terms of the parameters therein, such as C in (2–3) or ν in (2–5) and (2–6). One of the purposes of this dissertation is to reveal the connection among those formulations and to show that the positive homogeneity of the related function plays an important role.
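
As an illustration of the unconstrained form (2–4), the following Python sketch (not from the dissertation) evaluates the C-SVM objective with the identity feature map; the toy data and parameter value are assumptions.

    import numpy as np

    def c_svm_objective(w, b, X, y, C):
        """Unconstrained C-SVM objective (2-4): 0.5 ||w||^2 + (C/l) * sum of hinge
        losses [1 - y_i (w^T phi(x_i) + b)]_+, evaluated here with phi = identity."""
        l = len(y)
        hinge = np.maximum(1.0 - y * (X @ w + b), 0.0)
        return 0.5 * np.dot(w, w) + (C / l) * hinge.sum()

    # Larger C penalizes margin violations more heavily relative to ||w||.
    X = np.array([[1.0, 2.0], [2.0, 3.0], [-1.0, -1.5], [-2.0, -0.5]])
    y = np.array([1, 1, -1, -1])
    print(c_svm_objective(np.array([0.5, 0.5]), 0.0, X, y, C=1.0))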

CHAPTER 3
RISK MANAGEMENT APPROACH TO SUPPORT VECTOR MACHINE

3.1 Reformulation of SVMs with Risk Functionals

Let {(x_1, y_1), ..., (x_l, y_l)} be a training dataset of samples x_i ∈ R^m with class labels y_i ∈ {−1, +1} for i = 1, ..., l. The original samples {x_1, ..., x_l} ⊂ R^m are transformed into {φ(x_1), ..., φ(x_l)} ⊂ R^n by a mapping φ: R^m → R^n. The goal is to construct a hyperplane w^T x + b = 0, w ∈ R^n, b ∈ R, that separates the samples {φ(x_1), ..., φ(x_l)} with class label +1 from those with class label −1 in the space R^n. Let Ω = {ω_1, ..., ω_l} be a finite sample space with equal[1] probabilities of outcomes, i.e., Pr(ω_i) = 1/l, i = 1, ..., l, and let φ: Ω → R^n and y: Ω → {−1, +1} be discrete random variables such that φ(ω_i) = φ(x_i) and y(ω_i) = y_i for i = 1, ..., l. For each outcome ω ∈ Ω and decision variables w ∈ R^n and b ∈ R, a loss function is defined by

L_ω(w, b) = −y(ω)(w^T φ(ω) + b).   (3–1)

It can be interpreted as a random variable with realizations {−y_i(w^T φ(x_i) + b)}_{i=1}^l, assuming equal probabilities 1/l. Typically, a random loss X is translated into a real-valued number through risk functionals such as:

- the worst-case loss sup_{ω∈Ω} X;
- the partial moment E[X − ρ]_+, which is the expected loss exceeding some specified threshold ρ ∈ R, where [·]_+ = max{·, 0};
- the conditional value-at-risk CVaR_α(X), defined as the average of the α-tail of the probability distribution of X for a specified confidence level α ∈ [0, 1].

All well-known SVMs admit a concise formulation with the above risk functionals. The hard-margin SVM (Boser et al. (1992)),

min_{w∈R^n, b∈R} (1/2)||w||^2   s.t.  y_i(w^T φ(x_i) + b) ≥ 1, i = 1, ..., l,

can be expressed with the worst-case loss functional as

min_{w∈R^n, b∈R} (1/2)||w||^2   s.t.  sup_{ω∈Ω} L_ω(w, b) ≤ −1.   (3–2)

The soft-margin SVM (C-SVM) (Cortes & Vapnik (1995)),

min_{w∈R^n, b∈R} (1/2)||w||^2 + C sum_{i=1}^l [−y_i(w^T φ(x_i) + b) + 1]_+,  C > 0,

can be rewritten with the partial moment functional as

min_{w∈R^n, b∈R} (1/2)||w||^2 + C′ E[L_ω(w, b) + 1]_+,  C′ = Cl.   (3–3)

The ν-SVM (Schölkopf et al. (2000)), originally formulated as

min_{w∈R^n, b∈R, ρ≥0} (1/2)||w||^2 − νρ + (1/l) sum_{i=1}^l [ρ − y_i(w^T φ(x_i) + b)]_+,   (3–4)

can be recast in the form

min_{w∈R^n, b∈R} (1/2)||w||^2 + ν CVaR_{1−ν}(L_ω(w, b)),   (3–5)

which follows from the Rockafellar–Uryasev optimization formula (Rockafellar & Uryasev (2000)) for CVaR,

ν CVaR_{1−ν}(L_ω(w, b)) = min_{ρ∈R} { −νρ + (1/l) sum_{i=1}^l [ρ − y_i(w^T φ(x_i) + b)]_+ },   (3–6)

and the fact that the condition ρ ≥ 0 is redundant, as shown in Burges (1966). The relationship between the ν-SVM and CVaR minimization was first reported in Gotoh & Takeda (2005). In fact, (3–5) is equivalent to the formulation

min_{w∈R^n, b∈R} (1/2)||w||^2   s.t.  CVaR_{1−ν}(L_ω(w, b)) ≤ −1.   (3–7)

Both (3–5) and (3–7) are convex problems, and their equivalence is established through duality theory, provided that they both have an optimal solution.

[1] The approach can be readily extended to the case of arbitrary probabilities of outcomes: Pr(ω_i) = p_i, i = 1, ..., l.
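
The Rockafellar–Uryasev representation behind (3–6) can be checked numerically. The sketch below (an illustration, not part of the dissertation) computes CVaR_α of a simulated discrete loss both by minimizing over the threshold and by directly averaging the right tail; the sample and confidence level are assumptions chosen so that (1−α)l is an integer.

    import numpy as np

    rng = np.random.default_rng(0)
    losses = rng.normal(size=1000)   # realizations L_w1, ..., L_wl, equal weights
    alpha = 0.9

    def cvar_ru(losses, alpha):
        """min_c { c + 1/((1-alpha) l) * sum [L_i - c]_+ }; for a discrete sample
        the minimum is attained at one of the sample points, so scanning suffices."""
        l = len(losses)
        vals = [c + np.maximum(losses - c, 0.0).sum() / ((1 - alpha) * l)
                for c in losses]
        return min(vals)

    def cvar_tail(losses, alpha):
        """Average of the largest (1 - alpha) * l losses."""
        k = int(round((1 - alpha) * len(losses)))
        return np.sort(losses)[-k:].mean()

    print(cvar_ru(losses, alpha), cvar_tail(losses, alpha))  # the two values agree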

Finally, Takeda and Sugiyama (Takeda & Sugiyama (2008)) showed that the Eν-SVM (Pérez-Cruz et al. (2003)) can be formulated as

min_{w∈R^n, b∈R} CVaR_{1−ν}(L_ω(w, b))   s.t.  ||w|| = 1.   (3–8)

The three risk functionals, worst-case loss, partial moment, and CVaR, have different levels of tolerance to the misclassification error. The worst-case loss functional is the most conservative among the three, and the corresponding SVM (3–2) can be interpreted as a robust optimization problem (Zhang & Wang (2008)). In the SVM formulations (3–5), (3–7), and (3–8), the greater the parameter ν is, the more tolerant to misclassifications the SVMs are: ν = 0 and ν = 1 correspond to the most and least conservative cases, respectively. In fact, for ν = 0, the SVMs (3–5), (3–7), and (3–8) are equivalent to the SVM (3–2) with the worst-case loss functional.

3.2 Generalized Maximal Margin Formulation

A random distance function is defined for each outcome ω ∈ Ω and decision variables w ∈ R^n, b ∈ R as a normalized loss function:

d_ω(w, b) = L_ω(w, b) / ||w||.   (3–9)

For each outcome ω ∈ Ω, the absolute value of (3–9) is the Euclidean distance between the vector φ(ω) and the hyperplane H = {x ∈ R^n | w^T x + b = 0}. The distance function (3–9) has a discrete distribution of realizations

{ −y_i(w^T φ(x_i) + b) / ||w|| }_{i=1}^l   (3–10)

with equal probabilities 1/l. Figure 3-1 shows a histogram of the realizations (3–10) of the distances of data samples (φ(x_1), y_1), ..., (φ(x_l), y_l) to the hyperplane for some fixed parameters (w, b) of the hyperplane[2]. Samples falling into the light area (blue in color) are classified correctly, while samples falling into the dark area (red in color) are classified incorrectly. The goal is to select the parameters (w, b) of the hyperplane in (3–9) so as to reduce the size of the right tail (marked with red color) of the histogram. This problem can be formalized by introducing a risk functional R(·), which converts the random distance d_ω(w, b) into a deterministic function R(d_ω(w, b)):

minimize_{w,b} R( L_ω(w, b) / ||w|| ).   (3–11)

The objective in (3–11) represents an aggregated loss in the right tail of (3–10), so that the lower the objective in (3–11), the smaller the part of the right tail of (3–10) exceeding zero. Optimization problem (3–11) can be viewed as a margin-based classifier, since it takes into account the distances of data samples to the hyperplane and aggregates them by means of the risk functional R(·). Each choice of risk functional R(·) defines a particular classifier. In this dissertation, we focus on risk functionals with the following properties:

Definition 1. (Insensitivity to Constant) R(C) = C for each constant C ∈ R.

Definition 2. (Positive Homogeneity) R(λL_ω) = λR(L_ω) for each random variable L_ω and constant λ > 0.

Definition 3. (Continuity) lim_{k→∞} R(L^k_ω) = R(L_ω) for each sequence of random variables L^k: Ω → R (k = 1, 2, ...) and random variable L: Ω → R such that lim_{k→∞} L^k_ω = L_ω for each ω ∈ Ω[3].

Definition 4. (Lower Semicontinuity) R(L_ω) ≤ lim_{k→∞} R(L^k_ω) for each sequence of random variables L^k: Ω → R (k = 1, 2, ...) and random variable L: Ω → R such that lim_{k→∞} L^k_ω = L_ω for each ω ∈ Ω.

Definition 5. (Convexity) R(λL^1_ω + (1−λ)L^2_ω) ≤ λR(L^1_ω) + (1−λ)R(L^2_ω) for all random variables L^1_ω, L^2_ω and each constant λ ∈ [0, 1].

Next, we provide a list of risk functionals and indicate which of the properties Def. 1–Def. 5 hold:

1. Worst-Case Loss: sup(L_ω) = max{L_{ω_1}, ..., L_{ω_l}} (Def. 1–Def. 5)
2. Expected Loss: E(L_ω) = (1/l) sum_{i=1}^l L_{ω_i} (Def. 1–Def. 5)
3. Above-Zero Loss: ATL_0(L_ω) = (1/l) sum_{i=1}^l [L_{ω_i}]_+ (Def. 1–Def. 5)
4. Above-Target Loss: ATL_t(L_ω) = (1/l) sum_{i=1}^l [L_{ω_i} + t]_+ with t > 0 (Def. 1, 3, 4, 5)
5. Mean-Absolute Semi-Deviation: MASD_t(L_ω) = E(L_ω) + t E([L_ω − E(L_ω)]_+) with t ≥ 0 (Def. 1–Def. 5)
6. Value-at-Risk: VaR_α(L_ω) = min_c { c : Pr{L_ω ≤ c} ≥ α } (Def. 1, 2, 4)
7. Conditional Value-at-Risk: CVaR_α(L_ω) = min_c { c + 1/((1−α)l) sum_{i=1}^l [L_{ω_i} − c]_+ } (Def. 1–Def. 5)
8. Coherent Risk Measures: CRM(L_ω) = max_{q_1,...,q_l} { sum_{i=1}^l q_i L_{ω_i} : (q_1, ..., q_l) ∈ Q } with Q ⊆ {(q_1, ..., q_l) : sum_{i=1}^l q_i = 1, q_i ≥ 0, i = 1, ..., l} (Def. 1–Def. 5)

3.3 Generalized Structural Risk Minimization Formulations

This section generalizes several structural risk minimization formulations by means of a risk functional R(·). The following formulations express the tradeoff between the empirical risk and regularization:

minimize_{w,b} R(L_ω(w, b)) / ||w||,   (3–12)

minimize_{w,b} R(L_ω(w, b)) subject to ||w|| = E, E > 0,   (3–13)

minimize_{w,b} R(L_ω(w, b)) subject to ||w|| ≤ E, E > 0,   (3–14)

minimize_{w,b} (1/2)||w||^2 subject to R(L_ω(w, b)) ≤ −D, D > 0,   (3–15)

minimize_{w,b} (1/2)||w||^2 + C R(L_ω(w, b)), C > 0,   (3–16)

corresponding to particular risk functionals R(·).

Example: Maximum-Margin SVM and Hard-Margin SVM. The maximum-margin formulation (2–1) and the hard-margin SVM (2–2) can be related to the worst-case loss (or maximal loss):

sup(L_ω) := max{L_{ω_1}, ..., L_{ω_l}}.   (3–17)

With the worst-case loss (3–17), formulations (2–1) and (2–2) are special cases of (3–12) and (3–15) (with D = 1), respectively.

Example: C-SVM. Based on (2–4), the C-SVM (2–3) corresponds to the above-target loss

ATL_t(L_ω) := (1/l) sum_{i=1}^l [L_{ω_i} + t]_+   (3–18)

with t = 1. Indeed, substitution of (3–18) with t = 1 into (3–16) results in (2–4).

Example: ν-SVM (Schölkopf et al. (2000)). Using the fact that the constraint ρ ≥ 0 is redundant in the formulation, as was shown by Burges (1966), the ν-SVM (2–5) can be expressed with the conditional value-at-risk (CVaR) functional, i.e., R(·) = CVaR_α(·), which is given by

CVaR_α(L_ω) := min_c { c + 1/((1−α)l) sum_{i=1}^l [L_{ω_i} − c]_+ },   (3–19)

where α ∈ [0, 1) and [x]_+ := max{x, 0}. By using the (strong) duality theorem of linear programming (see, e.g., Boyd & Vandenberghe (2004)), we can obtain a useful formula for computing the CVaR of a loss L_ω. Let L_[i] denote the i-th largest element among L_{ω_1}, ..., L_{ω_l}, i.e., L_[1] ≥ L_[2] ≥ ... ≥ L_[l]. Then the CVaR of L_ω is calculated by the formula

CVaR_α(L_ω) = (1/p) { sum_{i=1}^{⌊p⌋} L_[i] + (p − ⌊p⌋) L_[⌊p⌋+1] },   (3–20)

where p = (1−α)l. Consider the case where (1−α)l is an integer k ∈ {1, 2, ..., l}; then formula (3–20) simplifies to

CVaR_α(L_ω) = (1/k) sum_{i=1}^k L_[i].

In this sense, the CVaR can be considered as the conditional expectation of L_ω over its largest (1−α)l elements. (See, e.g., Rockafellar & Uryasev (2013) for a detailed explanation and properties of CVaR.) The relation between the ν-SVM and CVaR minimization was first noticed in Gotoh & Takeda (2005).

Example: Extended ν-SVM (Eν-SVM) (Pérez-Cruz et al. (2003)). The Eν-SVM (2–6) can be expressed with the CVaR functional (3–19):

minimize_{w,b} CVaR_{1−ν}(L_ω(w, b)) subject to ||w|| = 1,   (3–21)

as was shown by Takeda & Sugiyama (2008). Indeed, substitution of the CVaR with α = 1 − ν into (3–13) with E = 1 results in (3–21).

Example: VaR-SVM (Tsyurmasto et al. (2013)). Value-at-risk (VaR) is a quantile of the loss, i.e.,

VaR_α(L_ω) := min_c { c : Pr{L_ω ≤ c} ≥ α },   (3–22)

where α ∈ (0, 1). The VaR-SVM can be expressed in the form (3–15) with the VaR functional (3–22):

minimize_{w,b} (1/2)||w||^2 subject to VaR_α(L_ω(w, b)) ≤ −1.   (3–23)

In contrast to the aforementioned examples, VaR_α(L_ω(w, b)) is nonconvex with respect to w and b; accordingly, (3–23) is a nonconvex minimization, which may have (non-global) local minima.

[2] The dataset was taken from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/datasets.html); the decision variables are fixed.
[3] In this dissertation, we consider a finite sample space Ω = {ω_1, ..., ω_l}.
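
For concreteness, the following sketch (an illustration under assumed sample losses; not part of the dissertation) evaluates VaR_α as in (3–22) and CVaR_α by the order-statistics formula (3–20) for an equally weighted discrete loss.

    import numpy as np

    def var_alpha(losses, alpha):
        """Smallest c with Pr{L <= c} >= alpha for equally likely realizations."""
        s = np.sort(losses)
        k = int(np.ceil(alpha * len(s)))      # index of the alpha-quantile
        return s[k - 1]

    def cvar_alpha(losses, alpha):
        """Formula (3-20): tail sum of the largest losses with fractional weight."""
        s = np.sort(losses)[::-1]             # L_[1] >= L_[2] >= ... >= L_[l]
        p = (1.0 - alpha) * len(s)
        fp = int(np.floor(p))
        tail = s[:fp].sum() + (p - fp) * (s[fp] if fp < len(s) else 0.0)
        return tail / p

    losses = np.array([3.0, -1.0, 0.5, 2.0, -0.5, 1.5, -2.0, 0.0])
    print(var_alpha(losses, 0.75), cvar_alpha(losses, 0.75))  # CVaR >= VaR holds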

Figure 3-1. Histogram of distances of data samples to the hyperplane for German Credit data.

CHAPTER 4
THEORY OF SVM FORMULATIONS WITH RISK FUNCTIONALS

In this chapter, we establish relations between the optimization problems (3–12)–(3–16), summarized in Figure 4-1, and derive necessary and sufficient conditions for the existence of their optimal solutions. First of all, we find out which of the aforementioned problems have the same set of classifiers for each risk functional R(·).

Proposition 4.1. The classifiers defined in primal by optimization problems (3–12) and (3–13) provide the same separating hyperplane for each risk functional R(·) and constant E > 0.

Indeed, if (w, b) is an optimal solution of (3–12), then (Ew/||w||, Eb/||w||) is an optimal solution of (3–13). Conversely, if (w, b) is an optimal solution of (3–13), then λ(w, b) is an optimal solution of (3–12) for each λ > 0. Notice that the constant E in (3–13) and (3–14) can be set to E = 1 due to the positive homogeneity of the norm function. Thus, the optimization problems (3–13) and (3–14) can be equivalently recast with E = 1; in particular,

minimize_{w,b} R(L_ω(w, b)) subject to ||w|| = 1.   (4–1)

Next we focus on the special case of positively homogeneous risk functionals, for which the norm-constrained, risk-constrained and unconstrained formulations (3–12)–(3–16), under some additional conditions, define the same classifiers.

4.1 Positive Homogeneous Risk Functionals

Definition 1. (Positive Homogeneity) R(λX) = λR(X) for each random variable X and constant λ > 0.

Note that among the risk functionals listed in the previous section, ATL_t, which is given in (3–18), does not satisfy this property; accordingly, we exclude the C-SVM from the analysis below. Several examples of positively homogeneous risk functionals are provided below:

- Worst-Case Loss: sup(L_ω) = max{L_{ω_1}, ..., L_{ω_l}}
- Expected Loss: E(L_ω) = (1/l) sum_{i=1}^l L_{ω_i}
- Above-Zero Loss: ATL_0(L_ω) = (1/l) sum_{i=1}^l [L_{ω_i}]_+
- Mean-Absolute Semi-Deviation: MASD_t(L_ω) = E(L_ω) + t E([L_ω − E(L_ω)]_+) with t ≥ 0
- Value-at-Risk: VaR_α(L_ω) = min_c { c : Pr{L_ω ≤ c} ≥ α }
- Conditional Value-at-Risk: CVaR_α(L_ω) = min_c { c + 1/((1−α)l) sum_{i=1}^l [L_{ω_i} − c]_+ }
- Coherent Risk Measures: CRM(L_ω) = max_{q_1,...,q_l} { sum_{i=1}^l q_i L_{ω_i} : (q_1, ..., q_l) ∈ Q } with Q ⊆ {(q_1, ..., q_l) : sum_{i=1}^l q_i = 1, q_i ≥ 0, i = 1, ..., l}

Notice also that, based on a set of positively homogeneous risk functionals R_1(L_ω), ..., R_k(L_ω), new risk functionals can be obtained by applying operations that preserve positive homogeneity, e.g.,

R(L_ω) = max{R_1(L_ω), ..., R_k(L_ω)},
R(L_ω) = sum_{i=1}^k λ_i R_i(L_ω) for λ_i ∈ R, i = 1, ..., k.

4.2 Unboundedness of Optimal Solution

This section addresses the unboundedness of the optimal solution of (3–12) for a positively homogeneous continuous risk functional R(·). Notice that all risk functionals mentioned in the previous section, except for value-at-risk, are continuous. The following proposition states the conditions for unboundedness:

Proposition 4.2. Suppose that R(·) is a positively homogeneous and continuous risk functional. Then optimization problem (3–12) is unbounded if

min{R(−y(ω)), R(y(ω))} < 0.   (4–2)

The proof of Proposition 4.2 is sketched in Appendix A.1.
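
Condition (4–2), like the existence condition introduced in Section 4.4 below, can be checked before any optimization. The following sketch (an illustration, not from the dissertation) evaluates min{R(−y(ω)), R(y(ω))} for R = CVaR_α on assumed class counts; the sign flips at the bound α = 1 − 2 min{l_+, l_−}/l discussed in Section 4.4.

    import numpy as np

    def cvar(losses, alpha):
        """CVaR_alpha via the order-statistics formula with p = (1 - alpha) * l."""
        s = np.sort(losses)[::-1]
        p = (1.0 - alpha) * len(s)
        fp = int(np.floor(p))
        return (s[:fp].sum() + (p - fp) * (s[fp] if fp < len(s) else 0.0)) / p

    def condition_value(y, alpha):
        """min{CVaR(-y), CVaR(y)}: > 0 means existence, < 0 means unboundedness."""
        return min(cvar(-y, alpha), cvar(y, alpha))

    y = np.array([1.0] * 70 + [-1.0] * 30)    # l+ = 70, l- = 30, l = 100
    print(1 - 2 * min(70, 30) / 100)          # the bound: alpha must exceed 0.4
    for alpha in (0.2, 0.4, 0.6):
        print(alpha, condition_value(y, alpha))  # negative, zero, positive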

4.3 Equivalence of Formulations with Positive Homogeneous Risk Functionals

The Support Vector Machine is usually formulated using the structural risk minimization principle, which can be expressed as a tradeoff between the empirical risk and the regularization. The classification methods defined in primal by (3–12)–(3–16) provide several ways in which this tradeoff can be expressed. With the empirical risk function employed in the C-SVM (1–3), the formulations (1–4)–(1–6) result in different classifiers. On the other hand, it is proved that all formulations (3–14)–(3–16) with the risk functional CVaR_{1−ν}(·) define the ν-SVM (2–5). This fact can be explained by the positive homogeneity of the CVaR risk measure. This section aims at showing that, for positively homogeneous risk functionals, under a certain condition, the classifiers specified in primal by (3–12)–(3–16) provide the same separating hyperplane, which is also optimal for the geometric margin formulation (3–11).

Proposition 4.3. Suppose that R(·) is a positively homogeneous and continuous risk functional. Then:
1. If (w, b) is an optimal solution of (3–12) with a negative optimal objective value, then (E/||w||)(w, b) is optimal for (3–14).
2. If (w, b) is an optimal solution of (3–14) and w ≠ 0, then (w, b) is an optimal solution of (3–12).

The proof of Proposition 4.3 is sketched in Appendix A.2.

Proposition 4.4. Suppose that R(·) is a positively homogeneous and continuous risk functional. Then:
1. If (w, b) is an optimal solution of (3–12) with a negative optimal objective value, then (D/ρ)(w, b) is an optimal solution of (3–15), where ρ = |R(L_ω(w, b))|.
2. If (w, b) is an optimal solution of (3–15) and w ≠ 0, then (w, b) is an optimal solution of (3–12).

The proof of Proposition 4.4 is sketched in Appendix A.3.

Proposition 4.5. Suppose that R(·) is a positively homogeneous and continuous risk functional. Then:
1. If (w, b) is an optimal solution of (3–12) with a negative optimal objective value, then (Cr/||w||)(w, b) is an optimal solution of (3–16), where r = −R(L_ω(w, b))/||w||.
2. If (w, b) is an optimal solution of (3–16) and w ≠ 0, then (w, b) is an optimal solution of (3–12).

The proof of Proposition 4.5 is sketched in Appendix A.4. When (3–16) has non-zero optimal solutions, (3–16) determines the same hyperplane for different values of C > 0. Thus, the parameter C can be set to 1. Propositions 4.3–4.5 imply the following corollary, which summarizes the relation between formulations (3–12), (3–14), (3–15) and (3–16).

Corollary 1. Suppose that R(·) is a positively homogeneous and continuous risk functional. Then:
1. If (3–12) has a negative optimal objective value, optimization problems (3–12), (3–14), (3–15) and (3–16) determine the same separating hyperplane.
2. If (3–12) has a positive or zero optimal objective value, optimization problems (3–14) and (3–16) have a trivial solution (w = 0), and (3–15) is infeasible.

4.4 Existence of Optimal Solution

In this section, we explore under what condition there exists an optimal solution of optimization problem (3–12). We formulate the result for positively homogeneous and lower semicontinuous risk functionals R(·). The lower semicontinuity assumption is more general than continuity and holds for all risk functionals listed in this dissertation.

Proposition 4.6 (Existence of Optimal Solution). Suppose that R(·) is positively homogeneous and lower semicontinuous. Then (3–12) has an optimal solution if

min{R(−y(ω)), R(y(ω))} > 0.   (4–3)

The proof of Proposition 4.6 is sketched in Appendix A.5. We should note that continuity is not necessary for the existence of an optimal solution or for unboundedness of the minimization. In fact, although VaR, defined in (3–22), does not satisfy (upper semi-)continuity, its minimization is unbounded if (4–2) holds (see Theorem 2 of Tsyurmasto et al. (2013) for details).

Example: Expected loss. Consider the case where the expected loss is used as the risk measure, i.e.,

R(L_ω(w, b)) = E[L_ω(w, b)] = − sum_{ω∈Ω} Pr(ω) y(ω)(w^T φ(ω) + b).

Note that the expected loss is positively homogeneous and convex (i.e., continuous). In this case, we have

min{R(−y(ω)), R(y(ω))} = min{ (1/l) sum_{i=1}^l y_i, −(1/l) sum_{i=1}^l y_i } = min{ (1/l)(l_+ − l_−), (1/l)(l_− − l_+) } = { < 0 if l_+ ≠ l_−; = 0 if l_+ = l_− },

where l_+ := |{i : y_i = +1}| and l_− := |{i : y_i = −1}|. Proposition 4.6 then indicates that the expected-loss-based SVM can have an optimal solution only if the number of samples with y_i = +1 is equal to that with y_i = −1. This is consistent with the result in Gotoh et al. (2013b), where a general probability setting p_i = Pr(ω_i) is employed and the authors show that the condition sum_{i=1}^l p_i y_i = 0 is necessary and sufficient for optimality. In addition, we should note that the expected-loss-based SVM admits any b as an optimal solution even when the condition l_+ = l_− holds, i.e., when the number of samples in each class is equal, and accordingly, it has a bounded optimal value.

Example: Worst-case loss (or maximal loss). Given a set of data samples Ω = {ω_1, ..., ω_l} (or, more specifically, {(x_1, y_1), ..., (x_l, y_l)}), consider the risk measure R(·) = sup(·). The condition (4–3) is then given by

min{ sup{−y_1, ..., −y_l}, sup{y_1, ..., y_l} } = min{1, 1} = 1 > 0.

Namely, when the worst-case loss is employed, the condition (4–3) is satisfied if and only if there is at least one sample of each class.

Example: CVaR. Consider the case where the CVaR is used as the risk measure. By using formula (3–20), we can easily check whether condition (4–3) is satisfied. Indeed, we can see that R(y(ω)) > 0 holds if and only if (1−α)l < 2l_+, where l_+ := |{i : y_i = +1}|, and that R(−y(ω)) > 0 holds if and only if (1−α)l < 2l_−, where l_− := |{i : y_i = −1}|. Accordingly, condition (4–3) for the CVaR is given by

α > (1/l)(l − 2 min{l_+, l_−}).   (4–4)

It is noteworthy that this bound is consistent with the admissible range of the parameter ν for the ν-SVM (Burges 1966). Also, condition (4–4) is consistent with the admissible range for the ν-SVM shown in Chang & Lin (2002) and with the condition in Lemma 2.2 of Gotoh & Takeda (2005), where the condition for the existence of an optimal solution of the geometric-margin-based CVaR minimization formulation is given by 1 − α ≤ 2 min{ sum_{i: y_i=+1} p_i, sum_{i: y_i=−1} p_i } with probabilities p_i := Pr(ω_i). On the other hand, condition (4–2) implies that for α < (1/l)(l − 2 min{l_+, l_−}), the optimization problems (3–12)–(3–16) are unbounded.

Example: VaR. Let us find the admissible range of the parameter α for the VaR-SVM (3–23). Since y(ω) is a discretely distributed random variable with realizations {−1, ..., −1 (l_− times), +1, ..., +1 (l_+ times)},

VaR_α(y(ω)) { > 0 if α > 1 − l_+/l; < 0 if α ≤ 1 − l_+/l },

and, by the same argument, VaR_α(−y(ω)) > 0 if and only if α > 1 − l_−/l. Hence condition (4–3) for the VaR yields the admissible range α > 1 − min{l_+, l_−}/l.

4.5 Non-Linear SVM

This section provides a nonlinear extension of (3–12)–(3–16). As proved in Propositions 4.3–4.5, under positive homogeneity and continuity of the risk functional R(·), along with negativity of the optimal objective value of (3–12), the optimization problems (3–12)–(3–16) provide the same classifier. Thus, we restrict our attention to optimization problem (3–16). Given a training set {(x_1, y_1), ..., (x_l, y_l)} of features x_i with binary class labels y_i ∈ {−1, 1} for i = 1, ..., l, a non-linear SVM can be constructed by a transformation of the original features {x_1, ..., x_l} into features {φ(x_1), ..., φ(x_l)} with a mapping φ: R^m → R^n. The transformation is usually implicitly specified by a kernel function K(·,·) (Müller et al. (2001)). However, (3–16) is not always convex and cannot be solved through its dual. This section shows how to construct a non-linear version of (3–16) with a kernel (e.g., the Gaussian (RBF) kernel). There exists a linear transformation ψ of the original set of features {x_1, ..., x_l} ⊂ R^m such that the scalar products of ψ(x_i) and ψ(x_j) are equal to those produced by K(·,·), i.e., ⟨ψ(x_i), ψ(x_j)⟩ = K(x_i, x_j) ≡ ⟨φ(x_i), φ(x_j)⟩ for all i and j (see, e.g., Cristianini & Shawe-Taylor (2000)), so that the solution of the primal problem with the transformed features {ψ(x_1), ..., ψ(x_l)} ⊂ R^n coincides with that of the dual problem with the kernel K(·,·) corresponding to the original transformation φ (Chapelle (2007)). For features {x_1, ..., x_l}, the kernel K(·,·) yields a positive definite kernel matrix K = {K(x_i, x_j)}_{i,j=1,...,l}, which can be decomposed as

K = VΛV^T ≡ (VΛ^{1/2})(VΛ^{1/2})^T,   (4–5)

where Λ = diag(λ_1, ..., λ_l) is a diagonal matrix with eigenvalues λ_1 > 0, ..., λ_l > 0 and V = (v_1, ..., v_l) is an orthogonal matrix with the corresponding eigenvectors v_1, ..., v_l of K. The representation (4–5) implies that ψ: x_i → (VΛ^{1/2})_i, i = 1, ..., l, is the sought linear transformation, where (VΛ^{1/2})_i is row i of the matrix VΛ^{1/2}. Thus, the nonlinear version of (3–16) has the following explicit formulation[1]:

minimize_{θ, θ_0} (1/2)||θ||^2 + R(L_ω(θ, θ_0))   (4–6)

with new decision variables θ ∈ R^l, θ_0 ∈ R and a discretely distributed loss function L_ω(θ, θ_0) given by the observations (scenarios) L_{ω_i}(θ, θ_0) = −y_i[θ^T ψ(x_i) + θ_0], with ψ(x_i) = (VΛ^{1/2})_i for i = 1, ..., l.

[1] The constant C in this formulation can be suppressed due to Proposition 4.5.
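
The construction (4–5)–(4–6) can be sketched in a few lines. The following Python illustration (not from the dissertation; the RBF width and the random data are assumptions) factors a kernel matrix and verifies that the rows of VΛ^{1/2} reproduce the kernel inner products.

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(50, 4))                 # 50 training samples in R^4
    gamma = 0.5                                  # assumed RBF width parameter

    # RBF kernel matrix K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2).
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq)                      # positive definite for distinct points

    lam, V = np.linalg.eigh(K)                   # K = V diag(lam) V^T
    Psi = V * np.sqrt(np.clip(lam, 0.0, None))   # rows are psi(x_1), ..., psi(x_l)

    print(np.allclose(Psi @ Psi.T, K))           # True: inner products match K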

Figure 4-1. Equivalence among several formulations with risk functionals. If the risk functional R(·) is positively homogeneous (denoted PH), all the structural risk minimization formulations listed on the left-hand side of the figure have the same set of classifiers (independently of the values of the contained constants C, D, E); accordingly, all the constants C, D, E in the structural risk minimization formulations can be set equal to 1, as listed on the right-hand side of the figure. On the other hand, the upper four formulations are equivalent to each other under positive homogeneity and are more general than the lower six in the sense that the lower formulations can achieve optimal solutions only if the upper ones have nonpositive optimal values.

CHAPTER 5
NEW ROBUST CLASSIFIERS BASED ON RISK MANAGEMENT APPROACH

This chapter addresses the sensitivity of several well-known SVMs (hard-margin SVM, soft-margin SVM, ν-SVM, and Eν-SVM) to data outliers and proposes new classifiers which, in contrast to the classical SVMs, have stable performance. The SVM literature offers several SVMs to deal with data outliers and noisy data. For example, to reduce the effect of outliers, the fuzzy SVM (Lin & Wang (2002)) associates a fuzzy membership with each training sample in the C-SVM, although it does not specify how to select a proper fuzzy membership function for a particular dataset. In addition to support vectors, the robust SVM (Song et al. (2002)) and the center SVM (Zhang (1999)) use centers of classes to construct a classification boundary. However, when the sample distribution is not Gaussian and is highly skewed, the mean (center) of the class may not be representative or may fall outside of the class. For noisy and corrupt data, SVM classification relies on methods of robust optimization (Aharon et al. (2009)) and rough set theory; see Zhang & Wang (2008). For example, in Trafalis & Gilbert (2002), the magnitude of the noise is assumed to be bounded, and for relatively small bounds the separating hyperplane remains almost unaffected. However, as the bound exceeds a certain threshold, the misclassification error increases considerably. Also, the upper bound on the noise cannot be directly estimated from the data.

As a practical application of the developed methodology, we propose two new classification methods. First, we propose an SVM that requires the hard-margin constraint to hold with probability α ∈ (0, 1]. This SVM is reformulated with the value-at-risk (VaR) measure[1] and is referred to further as VaR-SVM. For α = 1, the VaR-SVM reduces to the hard-margin SVM, whereas for α < 1, it discards (1−α)·100% of the training samples, viewed as outliers[2]. A similarity between SVM classification and optimization of monetary risk measures was observed in Gotoh & Takeda (2005); Sakalauskas et al. (2012); Takeda & Sugiyama (2008). For example, the ν-SVM can be reformulated as an SVM with the conditional value-at-risk (CVaR) measure, which controls the average of the largest (1−ν)·100% of misclassification errors. However, in contrast to the VaR-SVM, the ν-SVM discards no data samples. Second, we propose a new classifier, referred to as CVaR-(αL, αU)-SVM, with lower and upper parameters αL and αU such that 0 ≤ αL < αU ≤ 1, based on the difference of two CVaR measures; it discards the largest (1−αU)·100% of losses as outliers. The performance of the new classifiers is examined in Section 5.3.

[1] VaR with a confidence level α is the α-percentile of the distribution of loss and is widely used to monitor and control the market risk of financial instruments (Duffie & Pan (1997); Jorion (1997)).
[2] An outlier is a data sample that does not fit the distribution of the data. The sensitivity of the SVM to outliers should not be confused with SVM stability (see, for instance, Bousquet & Elisseeff (2002)) and SVM robustness (Xu et al. (2009)). The former deals with classifier changes when a small percentage of the training data is changed, whereas the latter finds the separating hyperplane under the assumption that the training data are not precisely specified but rather are known to belong to some set.

5.1 Value-at-Risk Support Vector Machine

The SVMs (3–2), (3–3), (3–5), and (3–8) are sensitive to outliers, since the supremum, partial moment, and CVaR all rely on the right tail of the loss distribution, which contains the data outliers. Specifically, the partial moment is the average of the losses exceeding −1, the CVaR averages the largest (1−α)·100% of losses, and the supremum is the largest single loss. However, the SVM's sensitivity to outliers can be reduced by using risk functionals that discard the largest values in the right tail of the loss distribution. The hard-margin SVM (3–2) has the equivalent formulation

min_{w∈R^n, b∈R} (1/2)||w||^2   s.t.  Pr[L_ω(w, b) ≤ −1] = 1,   (5–1)

which suggests that the constraint L_ω(w, b) ≤ −1 can be required to hold with probability α ∈ (0, 1], i.e.,

min_{w∈R^n, b∈R} (1/2)||w||^2   s.t.  Pr[L_ω(w, b) ≤ −1] ≥ α.   (5–2)

With value-at-risk (VaR), or the percentile function, defined by

VaR_α(L_ω(w, b)) = min_{z∈R} { z | Pr[L_ω(w, b) ≤ z] ≥ α } ≡ min { z | (1/l) sum_{i=1}^l 1{−y_i[w^T φ(x_i) + b] ≤ z} ≥ α },   (5–3)

where 1{·} is the indicator function equal to 1 if the condition in curly brackets is true and equal to 0 otherwise, the problem (5–2) can be rewritten in the form

min_{w∈R^n, b∈R} (1/2)||w||^2   s.t.  VaR_α(L_ω(w, b)) ≤ −1,   (5–4)

which will be referred to as VaR-SVM. The parameter α in the VaR-SVM indicates that (1−α)·100% of the data is considered as outliers and is thus discarded; see Figure 5-1. In contrast to the ν-SVM, the VaR-SVM is unaffected by outliers in the α-tail of the loss distribution. Observe that the VaR-SVM (5–4) resembles (3–7). Establishing the equivalence of (3–7) to (3–5) relies on the convexity of CVaR, whereas VaR is not convex and neither is problem (5–4). Therefore, the similar equivalence of (5–4) to the unconstrained problem

min_{w∈R^n, b∈R} (1/2)||w||^2 + C VaR_α(L_ω(w, b)),  C > 0,   (5–5)

cannot be obtained through duality theory and will be established by other means. The VaR-SVM minimizes the α-quantile of the distance d_ω(w, b) from the vector φ(ω) to the separating hyperplane H = {x ∈ R^n | w^T x + b = 0} in R^n. For each outcome ω ∈ Ω and decision variables w ∈ R^n and b ∈ R, d_ω(w, b) is defined by

d_ω(w, b) = L_ω(w, b) / ||w||.   (5–6)

It has a negative sign when φ(ω) is classified correctly, i.e., y(ω)[w^T φ(ω) + b] > 0, and a positive sign when φ(ω) is classified incorrectly, i.e., y(ω)[w^T φ(ω) + b] < 0. The distance (5–6) assumes realizations

{ −y_i[w^T φ(x_i) + b] / ||w|| }_{i=1}^l   (5–7)

with equal probabilities 1/l. Figure 3-1 shows the histogram of d_ω(w, b) for the data samples (φ(x_1), y_1), ..., (φ(x_l), y_l) from the German Credit data[3] for some fixed w and b. The blue-colored and red-colored samples are the samples classified correctly and incorrectly, respectively. The goal is to minimize the number of red-colored samples by varying the parameters (w, b) in (5–6), so that the VaR-SVM is formulated by

min_{w∈R^n, b∈R} VaR_α( L_ω(w, b) / ||w|| ).   (5–8)

The next two theorems establish relationships between the optimization problems (5–4), (5–5), and (5–8).

Theorem 5.1. If (w, b) is an optimal solution of (5–8) and ζ = VaR_α(L_ω(w, b)) < 0, then (−1/ζ)(w, b) is optimal for (5–4). If (w, b) is an optimal solution of (5–4) with w ≠ 0, then λ(w, b) is optimal for (5–8) for each λ > 0.

Proof. If (w, b) is optimal for (5–4), then VaR_α(L_ω(w, b)) = −1. Indeed, suppose that ζ = VaR_α(L_ω(w, b)) < −1; then −1/ζ ∈ (0, 1), and (w̃, b̃) = (−1/ζ)(w, b) is feasible for (5–4) with VaR_α(L_ω(w̃, b̃)) = −1, but ||w̃|| < ||w||, which contradicts the assumption that (w, b) is optimal for (5–4). Also, the positive homogeneity of VaR implies that λ(w, b) is optimal for (5–8) for each λ > 0.

Theorem 5.2. If (w, b) is an optimal solution of (5–8) and ζ = VaR_α(L_ω(w, b)) < 0, then (Cr/||w||)(w, b) is optimal for (5–5), where r = −ζ/||w||. If (w, b) is an optimal solution of (5–5) with w ≠ 0, then λ(w, b) is optimal for (5–8) for each λ > 0.

Proof. Let (w, b) be an optimal solution of (5–8) such that ζ = VaR_α(L_ω(w, b)) < 0, and let r = −ζ/||w|| > 0. Then VaR_α(L_ω(w′, b′))/||w′|| ≥ VaR_α(L_ω(w, b))/||w|| = −r for all w′ ≠ 0 and b′. Thus, for any w′ ≠ 0 and b′, the objective function of (5–5) is bounded from below by

(1/2)||w′||^2 + C VaR_α(L_ω(w′, b′)) = (1/2)||w′||^2 + C (VaR_α(L_ω(w′, b′))/||w′||) ||w′|| ≥ (1/2)||w′||^2 − Cr||w′|| ≥ −(1/2)(Cr)^2.   (5–9)

Observe that for (w̃, b̃) = (Cr/||w||)(w, b), the inequality (5–9) reduces to the equality

(1/2)||w̃||^2 + C VaR_α(L_ω(w̃, b̃)) = (1/2)(Cr)^2 ||w||^2/||w||^2 + C(Cr) VaR_α(L_ω(w, b))/||w|| = −(1/2)(Cr)^2,   (5–10)

since VaR_α(L_ω(w, b))/||w|| = −r. If w′ = 0, then the left-hand side of (5–9) is equal to C VaR_α(−y(ω)b′) = C|b′| VaR_α(−y(ω) sign b′) ≥ 0, since the existence of an optimal solution of (5–8) implies VaR_α(−y(ω)) ≥ 0. Indeed, let VaR_α(−y(ω)) < 0 and let w′ = w_0 be fixed with ||w_0|| = 1; then

lim_{b′→+∞} VaR_α( L_ω(w_0, b′)/||w_0|| ) = lim_{b′→+∞} |b′| VaR_α( −y(ω)[ε_ω(b′) + 1] ) = −∞,

since VaR_α(−y(ω)[ε_ω(b′) + 1]) < −δ for sufficiently small ε_ω(b′) ≤ δ, where ε_ω(b′) = w_0^T φ(ω)/|b′| → 0 as b′ → ∞ and δ is a positive number; this contradicts the existence of an optimal solution of (5–8). Similarly, it can be shown that VaR_α(y(ω)) ≥ 0, so that (5–9) holds for w′ = 0, and consequently, (w̃, b̃) is optimal for (5–5).

Now suppose that (w, b) is optimal for (5–5) with w ≠ 0 but that it is not optimal for (5–8), i.e., there exists (w̃, b̃) such that VaR_α(L_ω(w̃, b̃))/||w̃|| < VaR_α(L_ω(w, b))/||w||. Then, for any w′ ≠ 0 and b′,

(1/2)||w′||^2 + C (VaR_α(L_ω(w̃, b̃))/||w̃||) ||w′|| ≥ (1/2)||w′||^2 − C r̃ ||w′|| ≥ −(1/2)(C r̃)^2,   (5–11)

where r̃ = −VaR_α(L_ω(w̃, b̃))/||w̃||. Since (w, b) is optimal for (5–5), the inequality (5–11) yields

(1/2)||w||^2 + C VaR_α(L_ω(w, b)) > −(1/2)(C r̃)^2.   (5–12)

Similarly to (5–10), it can be shown that for (ŵ, b̂) = (C r̃/||w̃||)(w̃, b̃),

(1/2)||ŵ||^2 + C VaR_α(L_ω(ŵ, b̂)) = −(1/2)(C r̃)^2,

which contradicts (5–12), so that (w, b) is optimal for (5–8), and the positive homogeneity of VaR_α(·) implies that λ(w, b) is an optimal solution of (5–8) for each λ > 0 as well.

The relation between problems (5–4) and (5–5) follows from Theorems 5.1 and 5.2.

Corollary 2. If (w, b) is optimal for (5–4) with w ≠ 0 and ζ = VaR_α(L_ω(w, b)), then (Cr/||w||)(w, b) is optimal for (5–5), where r = −ζ/||w||. Conversely, if (w, b) is optimal for (5–5) with w ≠ 0, then (−1/ζ)(w, b) is optimal for (5–4).

Corollary 2 implies that solving (5–4) reduces to solving the unconstrained optimization problem (5–5).

[3] The dataset was taken from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/datasets.html).

Theparameters(w,b)oftheseparatinghyperplanew>x+b=0aredetermineduptoapositivemultiplier>0.Therefore,when( 5 )and( 5 )havenon-zerooptimalsolutions,theydeterminethesameseparatinghyperplane.Also,( 5 )determinesthesameseparatinghyperplanefordifferentvaluesofC>0. 5.2CVaR-(L,U)SupportVectorMachineInthissection,weproposeanewclassierformulatedintheprimalas minw,b(1)]TJ /F3 11.955 Tf 11.96 0 Td[(L)CVaRL(d!(w,b)))]TJ /F5 11.955 Tf 11.96 0 Td[((1)]TJ /F3 11.955 Tf 11.96 0 Td[(U)CVaRU(d!(w,b))(5)andfurtherreferredtoasCVaR-(L,U)-SVM.Ithastwoparameters:lowercondencelevelL2[0,1]anduppercondencelevelU2[0,1]suchthatL
PAGE 44

whendatasetisfreefromoutlierstheparameterUcanbechosenequalto1and,thus,CVaR-(L,U)-SVMperformsasgoodas-SVM.However,whendatasetiscontaminatedbyoutliers,CVaR-(L,U)-SVMhasanadvantageofstabilitytooutlierscomparedto-SVM.NotethatCVaR-(L,U)-SVMispositivehomogeneous.Clearly,problem( 5 )isaspecialcaseof( 3 )withR()=(1)]TJ /F3 11.955 Tf 12.26 0 Td[(L)CVaRL())]TJ /F5 11.955 Tf 12.27 0 Td[((1)]TJ /F3 11.955 Tf 12.27 0 Td[(U)CVaRU().WithProposition 5.2 ,optimizationproblem( 5 )canberecastintheform minimizew,b1 2kwk2+(1)]TJ /F3 11.955 Tf 11.96 0 Td[(L)CVaRL(L!(w,b)))]TJ /F5 11.955 Tf 11.96 0 Td[((1)]TJ /F3 11.955 Tf 11.96 0 Td[(U)CVaRU(L!(w,b))(5)aslongastheobjectivevalueof( 5 )isnonpositive.Although( 5 )isunconstrainednon-fractionaloptimization,itisstillanonconvexoptimization.Accordingly,thedeterministicglobaloptimizationmethods(see,e.g., Horstetal. ( 2000 ); Horst&Thoai ( 1999 ); Horst&Tuy ( 2003 ))arenotpromisingexceptforverysmallinstances.However,theobjectiveof( 5 )isintheformoftheso-calledD.C.(differenceoftwoconvexfunctions)andefcientgoodheuristicalgorithmssuchasDCA(see,e.g., An&Tao ( 2005 ))areavailable.Inaddition,noticethatthenonconvexityoftheobjectivecanbeexpectedtobesmallwhenUiscloseto1,i.e.,Z1UVaR(d!(w,b))d0,asU1,and,thus,objective( 5 )isexpectedtovirtuallyremainstobeconvex.Sincetypically,datacontains<3to5%outliers,itisenoughtosetU2[0.95,1)fordiscardingthoseoutliers.InSection6,weshowthattheheuristicapproachtoCVaR-(L,U)-SVMachievesasuperiorout-of-sampleperformancecomparedto-SVMonthereal-lifedatacontaminatedbyoutliers. 44

PAGE 45

5.3ComputationalResultsInthissection,VaR-SVM( 5 ),-SVM( 3 )andCVaR-(L,U)-SVM( 5 )arecomparedonarticialandreal-lifedatasets.AlthoughVaRanddifferenceoftwoCVaRfunctionsarenotconvex(infact,optimizationofVaRisNP-completeproblem,seeforinstance Yangetal. ( 2013 )),variousheuristicalgorithmsandsmoothingtechniques(seee.g., Gaivoronski&Pug ( 2005 ); Larsenetal. ( 2002 ))areincludedinstandardoptimizationpackages.Specically,weperformedcomputationswithMATLABusingPortfolioSafeguard(PSG)solver4,whichhasspecialheuristicstohandleanon-convexoptimization,seeSection9.16in Zabarankin&Uryasev ( 2004 ).WithPSG,solvingtheproblems( 5 )and( 3 )involvesthreestages: 1. FormulatingtheoptimizationproblemwithprecodedVaRandCVaRfunctions.Atypicalmeta-codeuses5operators(seeAppendix B forthePSGmeta-codefor( 5 )). 2. DataprocessingforthePSGfunctionsinarequiredformat.Typically,VaRandCVaRfunctionsaredenedonthematrixoftransformedtrainingsamplesf((1),y1),...,((l),yl)g. 3. RunningPSGsolverwiththemeta-codeandprocesseddata. 5.3.1ResultsofVaR-SVMagainst-SVMonArticialdatasetAnarticialdatasetconsistsofl1=400samples(withlabel+1)andl2=400samples(withclasslabel)]TJ /F5 11.955 Tf 9.3 0 Td[(1)generatedfromGaussiandistributionswithmeanvectorandcovariancematrix:(+,+)=(5e,5I)and()]TJ /F5 11.955 Tf 7.08 1.79 Td[(,)]TJ /F5 11.955 Tf 7.08 1.79 Td[()=(15e,5I),whereeisavectorofonesinR10andIisanidentitymatrixR1010.TheoutliersaremodeledtofollowaGaussiandistribution(out,out)without=100e,out=20Iandhavingclasslabel+1.Thepercentageofoutliersvariesintherange0%10%oftheoriginal800samples.VaR-SVMand-SVMarethencomparedonout-of-samplewiththefollowingparameters: 4 http://www.aorda.com/aod/welcome.action/psg.action 45

PAGE 46

Fractionoftrainingsamplesis2=3; Numberofrandomsplittingoftheentiredatasetintotrainingandtestingis10; Theparametersin( 3 )andin( 5 )areselectedfromagrid0:0.0025:1tominimizeout-of-sampleerror.Figure 5-4 showsout-of-sample(OOS)performanceofVaR-SVM( 5 )andof-SVMasafunctionofthepercentageofoutliersforthearticialdataset.Forasmallnumberofoutliers(<3%),OOSisapproximatelythesameforbothclassiersandiscloseto1.However,foralargernumberofoutliers(>4%),theOOSgraphsdeviatesubstantially.TheOOSgraphfor-SVMdramaticallydropsdowntoalmost0.5duetosensitivityof-SVMtooutliers.TheOOSgraphforVaR-SVM,incontrast,stabilizesatthelevelofOOS0.9.The10%ofmisclassicationscorrespondto10%ofoutliers.ThegraphconrmsthatVaR-SVMisstabletooutliers. 5.3.2Real-LifeDataSetsTheproblems( 5 ),( 3 ),( 5 )aresolvedwithdatasetsfromUCIMachineLearningRepository5:LiverDisorders,HeartDisease,IndianDiabetes,GermanandIonosphere.Outliersaregeneratedbyarticiallymultiplyingthefractionof0%,1%,5%,and10%oftheoriginaldatasetby1000.Testingaccuracyisevaluatedwith10-foldcrossvalidation.Theproblems( 3 ),( 5 )and( 5 )aresolvedforvalues0,0.05,0.1,...,0.9of,,Landvalues0.91,0.92,...,0.99ofU.Todeterminetheoptimalvaluesofparameters,thetrainingsetissplit100timesrandomlyintoa2/3partanda1/3part,where2/3partisusedtotuneparameters(w,b)in( 3 ),( 5 )and( 5 ).Theoptimalvalueofparametersarechosentominimizeanaveragemisclassicationerroronthe1/3partoftrainingsetover100splits.Tables 5-2 to 5-6 showthatasthepercentageofoutliersincreases,theperformanceof-SVMdegradessignicantly,whereasVaR-SVM'sperformanceisalmostunaffected.Theoriginalfeatureswerenormalized 5 http://archive.ics.uci.edu/ml/datasets.html 46

PAGE 47

(zeromean,unitstandarddeviation).Table 5-1 comparesrunningtimeofVaR-SVM,-SVMandCVaR-(L,U)-SVMfordifferentdatasets.TherunningtimesforVaR-SVMandCVaR-(L,U)-SVMareslightlyhigherthanfor-SVM,thoughitisofthesameorder. 47

PAGE 48

Figure5-1. Histogramfortheprobabilitydensityfunctionoflossdistribution:VaRisthe-percentileoflossdistribution. 48

PAGE 49

Figure5-2. Distributionofdistancestothehyperplane.CVaRmeasureiscalculatedasanormalizedaverageofdistancesexceeding-percentileofdistribution. 49

PAGE 50

Figure5-3. Distributionofdistancestothehyperplane.DifferenceofCVaRLandCVaRUmeasuresiscalculatedasanormalizedaveragebetweenLandUpercentilesofdistribution. 50

PAGE 51

Figure 5-4. Out-of-sample performance of ν-SVM and of VaR-SVM as a function of the percentage of outliers for the artificial dataset.
Table 5-1. Running time of ν-SVM, VaR-SVM and CVaR-(α_L,α_U)-SVM for different datasets with processor Intel(R) Core(TM)2 Quad CPU @ 2.83GHz

                                              SOLVING TIME (SEC)
DATASET           #SAMPLES   #FEATURES   ν-SVM   VaR-SVM   CVaR-(α_L,α_U)-SVM
LIVER DISORDERS        345           6    1.12      0.93                 1.34
HEART DISEASE          294          13    0.80      1.27                 1.03
INDIAN DIABETES        345           6    1.12      0.93                 1.23
GERMAN                1000          24    2.03      3.07                 2.89
IONOSPHERE             796          14    1.76      2.49                 2.21

Table 5-2. Experimental results for Liver Disorders dataset with outliers.

                          OUT-OF-SAMPLE ACCURACY (%)
               ν-SVM            VaR-SVM          CVaR-(α_L,α_U)-SVM
OUTLIERS (%)   MEAN     STD     MEAN     STD     MEAN     STD
0              63.65    3.88    69.65    3.14    71.17    1.73
1              59.35    3.55    68.35    3.29    71.65    2.65
5              58.87    2.72    66.52    3.89    68.91    1.14
10             58.78    3.11    65.74    3.17    70.78    1.86

Table 5-3. Experimental results for Heart Disease dataset with outliers.

                          OUT-OF-SAMPLE ACCURACY (%)
               ν-SVM            VaR-SVM          CVaR-(α_L,α_U)-SVM
OUTLIERS (%)   MEAN     STD     MEAN     STD     MEAN     STD
0              82.96    1.22    81.84    1.51    83.16    1.21
1              78.29    2.94    81.61    2.67    83.27    2.07
5              76.53    2.19    81.12    1.52    83.27    1.62
10             70.71    2.65    81.53    2.14    83.57    1.74
Table 5-4. Experimental results for German Credit dataset with outliers.

                          OUT-OF-SAMPLE ACCURACY (%)
               ν-SVM            VaR-SVM          CVaR-(α_L,α_U)-SVM
OUTLIERS (%)   MEAN     STD     MEAN     STD     MEAN     STD
0              74.86    1.31    73.42    0.48    76.26    1.38
1              72.49    4.40    74.14    0.07    76.19    1.57
5              62.70    1.43    70.93    1.35    75.63    1.68
10             60.03    5.39    70.39    2.20    75.33    1.20

Table 5-5. Experimental results for Indian Diabetes dataset with outliers.

                          OUT-OF-SAMPLE ACCURACY (%)
               ν-SVM            VaR-SVM          CVaR-(α_L,α_U)-SVM
OUTLIERS (%)   MEAN     STD     MEAN     STD     MEAN     STD
0              77.54    1.84    76.56    2.31    77.99    1.53
1              74.15    1.82    76.84    1.78    76.84    1.78
5              60.86    4.13    73.95    3.22    77.79    2.98
10             57.66    7.63    73.16    2.97    77.24    2.17

Table 5-6. Experimental results for Ionosphere dataset with outliers.

                          OUT-OF-SAMPLE ACCURACY (%)
               ν-SVM            VaR-SVM          CVaR-(α_L,α_U)-SVM
OUTLIERS (%)   MEAN     STD     MEAN     STD     MEAN     STD
0              69.96    1.88    69.25    1.73    71.27    2.19
1              63.17    1.23    70.60    2.68    70.96    1.13
5              61.28    3.02    68.01    2.92    71.01    1.84
10             60.36    1.05    67.36    1.92    70.70    3.05
CHAPTER 6
CONCLUSION

6.1 Dissertation Contribution

Based on the geometric margin, which is employed in the maximum-margin SVM, a unified scheme is provided and shown to contain several well-known SVMs. With the notion of risk functionals, the relations between existing SVM classifiers are established; it is shown that positive homogeneity plays a central role in the equivalence among the different formulations. The sufficient conditions for unboundedness and for the existence of an optimal solution of the considered generalized formulations are derived. The nonlinear extension of the proposed generalized formulations is provided. As a special case of the developed generalized framework, new classifiers are proposed and empirically shown to be stable to data outliers.

6.2 Future Work

There are several directions in which this work can be extended. First, new classifiers can be obtained by applying various existing risk functionals to the proposed framework. A recent comprehensive study of risk measures containing numerous examples can be found in Rockafellar & Uryasev (2013). Second, it would be interesting to derive an upper bound on the Vapnik–Chervonenkis generalization error for the generalized SVM formulations and to investigate for which risk functionals the bound is tighter. Third, the VaR measure and the difference of CVaR measures are not convex. This calls for a search for convex risk functionals having similar stability to outliers.
APPENDIX A
PROOFS OF SELECTED PROPOSITIONS

A.1 Proof of Proposition 4.2

Due to Proposition 4.1, it is enough to show that (3) is unbounded if the condition (4) holds. Suppose, for instance, that $R(-y(\omega)) < 0$. Consider the behavior of the function $R(L_\omega(w,b))$ as $b \to +\infty$. Using the positive homogeneity of $R(\cdot)$, we obtain
\[
R(L_\omega(w,b)) = R\bigl(-y(\omega)(w^T\phi(\omega)+b)\bigr)
= b\,R\Bigl(-\frac{y(\omega)\,w^T\phi(\omega)}{b} - y(\omega)\Bigr), \quad \text{for } b > 0.
\]
Let us take an arbitrary sequence $b_k$ such that $\lim_{k\to\infty} b_k = +\infty$. Then for each $\omega \in \Omega$, taking into account that $\|w\| = E$, we have
\[
\lim_{k\to\infty}\Bigl(-\frac{y(\omega)\,w^T\phi(\omega)}{b_k} - y(\omega)\Bigr) = -y(\omega).
\]
Due to the continuity of $R(\cdot)$, we obtain
\[
\lim_{k\to\infty} R\Bigl(-\frac{y(\omega)\,w^T\phi(\omega)}{b_k} - y(\omega)\Bigr) = R(-y(\omega)) < 0.
\]
Therefore,
\[
\lim_{b\to+\infty} R(L_\omega(w,b)) = \lim_{b\to+\infty} b\,R\Bigl(-\frac{y(\omega)\,w^T\phi(\omega)}{b} - y(\omega)\Bigr) = -\infty,
\]
which implies that (3) is unbounded. Now, suppose that $R(y(\omega)) < 0$. Applying the same reasoning as for the case $R(-y(\omega)) < 0$, we obtain
\[
\lim_{b\to-\infty} R(L_\omega(w,b)) = \lim_{b\to-\infty} R\bigl(-y(\omega)(w^T\phi(\omega)+b)\bigr)
= \lim_{b\to-\infty} |b|\,R\Bigl(-\frac{y(\omega)\,w^T\phi(\omega)}{b} + y(\omega)\Bigr)
= \lim_{b\to-\infty} |b|\,R(y(\omega)) = -\infty,
\]
which implies that (3) is unbounded, and hence the condition $\min\{R(-y(\omega)), R(y(\omega))\} \geq 0$ is necessary for the existence of an optimal solution of (3). □
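The argument above relies only on the positive homogeneity and continuity of the risk functional $R(\cdot)$. As a concrete instance, CVaR is positively homogeneous (this is one of the coherence axioms of Artzner et al. (1999)), which can be verified directly from its representation as an average of quantiles:

\[
\mathrm{CVaR}_{\alpha}(\lambda L)
= \frac{1}{1-\alpha}\int_{\alpha}^{1}\mathrm{VaR}_{t}(\lambda L)\,dt
= \frac{\lambda}{1-\alpha}\int_{\alpha}^{1}\mathrm{VaR}_{t}(L)\,dt
= \lambda\,\mathrm{CVaR}_{\alpha}(L), \qquad \lambda > 0,
\]

where $\mathrm{VaR}_{t}(\lambda L) = \lambda\,\mathrm{VaR}_{t}(L)$ for $\lambda > 0$ follows directly from the quantile definition.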
A.2 Proof of Proposition 4.3

1. Suppose that $(w, b)$ is an optimal solution of (3) with negative objective value; then $\bigl(\frac{Ew}{\|w\|}, \frac{Eb}{\|w\|}\bigr)$ is an optimal solution of (3) due to Proposition 4.1. Moreover, the optimal objective value of (3) is negative, since
\[
R\Bigl(L_\omega\Bigl(\frac{Ew}{\|w\|}, \frac{Eb}{\|w\|}\Bigr)\Bigr) = \frac{E\,R(L_\omega(w,b))}{\|w\|} < 0.
\]
Since the difference between the optimization problems (3) and (3) is only in the norm constraint on $w$, it is enough to show that $\|\hat{w}\| = E$ holds for each optimal solution $(\hat{w}, \hat{b})$ of (3). To obtain a contradiction, suppose that $\|\hat{w}\| < E$; since the optimal objective value of (3) is negative, scaling $(\hat{w}, \hat{b})$ by the factor $E/\|\hat{w}\| > 1$ remains feasible and, by positive homogeneity, strictly decreases the (negative) objective value. Thus, we obtain a contradiction with the optimality of $(\hat{w}, \hat{b})$. If $\hat{w} = 0$, then the objective value of (3) can be rewritten as $R(-y(\omega)\hat{b})$. Since (3) has an optimal solution, the condition $\min\{R(-y(\omega)), R(y(\omega))\} \geq 0$ holds due to Proposition 4.2. Thus,
\[
R\Bigl(L_\omega\Bigl(\frac{Ew}{\|w\|}, \frac{Eb}{\|w\|}\Bigr)\Bigr) < 0 \leq R(L_\omega(\hat{w}, \hat{b})) = R(-y(\omega)\hat{b}),
\]
which contradicts the optimality of $(\hat{w}, \hat{b})$.

2. Suppose that $(w, b)$ is an optimal solution of (3) such that $w \neq 0$. Then it follows that $\|w\| = E$, as proved in the first part of this proposition. Thus, $(w, b)$ is an optimal solution of (3). Applying Proposition 4.1, we obtain that $(w, b)$ is an optimal solution of (3) for each $E > 0$. □
A.3 Proof of Proposition 5.1

1. For brevity, denote $\gamma = |R(L_\omega(w,b))|$. Suppose that $(w, b)$ is optimal for (3) with negative optimal objective value; then $\frac{D}{\gamma}(w, b)$ is also optimal for (3) and $\frac{D}{\gamma}R(L_\omega(w,b)) = -D$. Then $\frac{D}{\gamma}(w, b)$ is optimal to
\[
\begin{aligned}
&\underset{w,b}{\text{minimize}} && \frac{R(L_\omega(w,b))}{\|w\|} \\
&\text{subject to} && R(L_\omega(w,b)) = -D.
\end{aligned}
\tag{A--1}
\]
Optimization problem (A–1) can be equivalently recast as
\[
\begin{aligned}
&\underset{w,b}{\text{minimize}} && \tfrac{1}{2}\|w\|^2 \\
&\text{subject to} && R(L_\omega(w,b)) = -D.
\end{aligned}
\tag{A--2}
\]
Indeed, $R(-b\,y(\omega)) \geq 0$ due to Proposition 4.2, which implies that an optimal solution of (A–1) satisfies the condition $w \neq 0$. Thus, $\frac{D}{\gamma}(w, b)$ is optimal for (A–2). Let us show that $\frac{D}{\gamma}(w, b)$ is optimal for (3). To obtain a contradiction, suppose that $\frac{D}{\gamma}(w, b)$ is not optimal for (3), i.e., there exists $(\hat{w}, \hat{b})$ feasible for (3) with $R(L_\omega(\hat{w},\hat{b})) < -D$; then $\frac{D}{|R(L_\omega(\hat{w},\hat{b}))|}(\hat{w}, \hat{b})$ is feasible for (A–2) and $\bigl\|\frac{D}{|R(L_\omega(\hat{w},\hat{b}))|}\hat{w}\bigr\| < \bigl\|\frac{D}{\gamma}w\bigr\|$, which contradicts the optimality of $\frac{D}{\gamma}(w, b)$ for (A–2). □

A.4 Proof of Proposition 5.2

1. Suppose that $(w^*, b^*)$ is an optimal solution of (3) with negative optimal objective value $\frac{R(L_\omega(w^*,b^*))}{\|w^*\|} < 0$. For brevity, let us use the notation $r^* = -\frac{R(L_\omega(w^*,b^*))}{\|w^*\|}$. The objective of (3) can be rewritten as follows:
\[
\tfrac{1}{2}\|w\|^2 + C\,R(L_\omega(w,b)) =
\begin{cases}
\tfrac{1}{2}\|w\|^2 + C\,\dfrac{R(L_\omega(w,b))}{\|w\|}\,\|w\|, & \text{if } w \neq 0,\\[4pt]
R(-b\,y(\omega)), & \text{if } w = 0.
\end{cases}
\tag{A--3}
\]
Since we have $R(-b\,y(\omega)) \geq 0$ due to Proposition 4.2, it suffices to consider the case $w \neq 0$. The optimality of $(w^*, b^*)$ for (3) and the positive homogeneity of $R(\cdot)$ imply that
\[
\frac{R(L_\omega(w,b))}{\|w\|} \geq \frac{R(L_\omega(w^*,b^*))}{\|w^*\|} = -r^* \quad \text{for each } (w,b) \in \mathbb{R}^{n+1},\; w \neq 0,
\]
and we have
\[
\tfrac{1}{2}\|w\|^2 + C\,\frac{R(L_\omega(w,b))}{\|w\|}\,\|w\|
\geq \tfrac{1}{2}\|w\|^2 - C r^* \|w\|
= \tfrac{1}{2}\bigl(\|w\| - C r^*\bigr)^2 - \tfrac{1}{2}C^2 (r^*)^2
\geq -\tfrac{1}{2}(C r^*)^2,
\tag{A--4}
\]
and the lower bound $-\tfrac{1}{2}(C r^*)^2$ in (A–4) is attained at the point $\frac{C r^*}{\|w^*\|}(w^*, b^*)$; consequently, the point $\frac{C r^*}{\|w^*\|}(w^*, b^*)$ is optimal for (3).

2. Suppose that $(w^*, b^*)$ is an optimal solution of (3) and $w^* \neq 0$. To obtain a contradiction, suppose that $(w^*, b^*)$ is not an optimal solution of (3), i.e., there exists $(\tilde{w}, \tilde{b})$ such that
\[
\frac{R(L_\omega(\tilde{w},\tilde{b}))}{\|\tilde{w}\|} < -r^*.
\tag{A--5}
\]
Denoting $\tilde{r} = -\frac{R(L_\omega(\tilde{w},\tilde{b}))}{\|\tilde{w}\|} > r^*$, we obtain, as in (A–4),
\[
\tfrac{1}{2}\|w\|^2 + C\,\frac{R(L_\omega(\tilde{w},\tilde{b}))}{\|\tilde{w}\|}\,\|w\|
= \tfrac{1}{2}\bigl(\|w\| - C\tilde{r}\bigr)^2 - \tfrac{1}{2}(C\tilde{r})^2
\geq -\tfrac{1}{2}(C\tilde{r})^2.
\tag{A--6}
\]
This implies that $\frac{C\tilde{r}}{\|\tilde{w}\|}(\tilde{w}, \tilde{b})$ attains a smaller objective value of (3) than $(w^*, b^*)$, contradicting the optimality of $(w^*, b^*)$. □

A.5 Proof of Proposition 4.6

Due to Proposition 4.1, it is enough to show that (3) has an optimal solution if the condition (4) holds. Suppose that $\min\{R(-y(\omega)), R(y(\omega))\} > 0$; we prove that (3) has an optimal solution. First, observe that $R(L_\omega(w,b))$ is lower semicontinuous with respect to $(w,b)$. (Note that $R(\cdot)$ is lower semicontinuous and $L_\omega(\cdot,\cdot)$ is an affine function.) Second, we prove that if $\min\{R(-y(\omega)), R(y(\omega))\} > 0$, then
$\lim_{|b|\to\infty} R(L_\omega(w,b)) = +\infty$. Let us find, for instance, $\lim_{b\to+\infty} R(L_\omega(w,b))$. Again, applying the positive homogeneity of $R(\cdot)$, we obtain
\[
R(L_\omega(w,b)) = R\bigl(-y(\omega)[w^T\phi(\omega)+b]\bigr)
= b\,R\Bigl(-\frac{y(\omega)\,w^T\phi(\omega)}{b} - y(\omega)\Bigr), \quad \text{for } b > 0.
\]
Applying the lower semicontinuity of $R(\cdot)$, we obtain
\[
\liminf_{b\to+\infty} R\Bigl(-\frac{y(\omega)\,w^T\phi(\omega)}{b} - y(\omega)\Bigr) \geq R(-y(\omega)) > 0.
\]
Therefore, $\lim_{b\to+\infty} R(L_\omega(w,b)) = +\infty$. In a similar way, it can be shown that $R(y(\omega)) > 0$ implies $\lim_{b\to-\infty} R(L_\omega(w,b)) = +\infty$.

Now, we show that an optimal solution of (3) exists if $\min\{R(-y(\omega)), R(y(\omega))\} > 0$. For each $\delta > 0$, the following optimization problem has an optimal solution:
\[
\begin{aligned}
&\underset{w,b}{\text{minimize}} && R(L_\omega(w,b)) \\
&\text{subject to} && \|w\| = E, \; |b| \leq \delta,
\end{aligned}
\tag{A--7}
\]
since the lower semicontinuous function $R(L_\omega(w,b))$ is minimized over a compact set. Denote by $(w_\delta, b_\delta)$ an optimal solution of problem (A–7). If there exists $\delta > 0$ such that for $(w_\delta, b_\delta)$ and for each $(w,b)$, $w \in \mathbb{R}^n$, $b \in \mathbb{R}$, satisfying $\|w\| = E$, $|b| > \delta$, we have $R(L_\omega(w,b)) \geq R(L_\omega(w_\delta, b_\delta))$, then $(w_\delta, b_\delta)$ is the optimal solution of (3). To obtain a contradiction, suppose that for each $\delta > 0$ there exists $(\tilde{w}, \tilde{b})$, $\tilde{w} \in \mathbb{R}^n$, $\tilde{b} \in \mathbb{R}$, satisfying $\|\tilde{w}\| = E$, $|\tilde{b}| > \delta$, such that $R(L_\omega(\tilde{w}, \tilde{b})) < R(L_\omega(w_\delta, b_\delta))$. Letting $\delta \to \infty$ yields a sequence of such points with $|\tilde{b}| \to \infty$ along which $R(L_\omega(\tilde{w}, \tilde{b}))$ stays bounded above, which contradicts the limit $\lim_{|b|\to\infty} R(L_\omega(w,b)) = +\infty$ established above, since $\min\{R(-y(\omega)), R(y(\omega))\} > 0$. □
APPENDIX B
EXAMPLES OF PSG META-CODE

This appendix presents the PSG meta-code for solving optimization problems (5) and (5). The meta-code, data and solutions can be downloaded from http://www.ise.ufl.edu/uryasev/research/testproblems/advanced-statistics/case-study-nu-support-vector-machine-based-on-tail-risk-measures/ (see Problems 1b and 3c, accordingly).

Meta-code for optimization problem (5):

1 Problem: problem_var_svm, type = minimize
2 Objective: objective_svm
3 quadratic_matrix_quadratic(matrix_quadratic)
4 var_risk_1(0.5, matrix_prior_scenarios)
5 Box of variables: upperbounds = 1, lowerbounds = -1
6 Solver: VAN, precision = 6, stages = 6

Meta-code for optimization problem (5):

1 Problem: problem_difference_cvar, type = minimize
2 Objective: objective_svm
3 quadratic_matrix_quadratic(matrix_quadratic)
4 0.7*cvar_risk_1(0.3, matrix_prior_scenarios)
5 -0.07*cvar_risk_2(0.93, matrix_prior_scenarios)
6 Box of variables: upperbounds = 1, lowerbounds = -1
7 Solver: VAN, precision = 6, stages = 6

The command minimize informs the solver that (5) and (5) are minimization problems, whereas objective is a declaration of the objective function defined in lines 3-4 for (5) and in lines 3-5 for (5). The quadratic part of the objective in line 3 is
defined by the command quadratic, and the corresponding data matrix can be found in the file matrix_quadratic_matrix.txt. The VaR part of the objective of (5) in line 4 is defined by the command var_risk_1, and the corresponding data matrix is to be found in the file matrix_prior_scenarios.txt. The (1−α_L)CVaR_{α_L}(·) − (1−α_U)CVaR_{α_U}(·) part of the objective of (5) in lines 4-5 is defined by the keywords cvar_risk_1, cvar_risk_2 and the data matrix located in the file matrix_prior_scenarios.txt. The coefficients C and α in (5) are set to 0.5, whereas the coefficients α_L and α_U are set to 0.3 and 0.93, accordingly.
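As a cross-check of what the precoded PSG risk functions evaluate on the scenario matrix, the following minimal Python sketch computes the empirical VaR and CVaR of a finite sample of losses and assembles the weighted difference-of-CVaR term with the tail levels from the meta-code above. This is an illustration under the standard empirical-quantile convention, not PSG's exact internal implementation; all names are chosen for this sketch.

import numpy as np

def empirical_var(losses, alpha):
    """Empirical VaR_alpha: the alpha-quantile of the loss sample."""
    return np.quantile(losses, alpha)

def empirical_cvar(losses, alpha):
    """Empirical CVaR_alpha: average of the losses in the upper (1 - alpha) tail."""
    var = empirical_var(losses, alpha)
    tail = losses[losses >= var]
    return tail.mean()

rng = np.random.default_rng(0)
losses = rng.standard_normal(10_000)  # stand-in for the scenario losses

a_L, a_U = 0.3, 0.93  # tail levels from the meta-code above
# Difference-of-CVaR objective term, weighted as in lines 4-5 of the meta-code:
# 0.7 = 1 - a_L and 0.07 = 1 - a_U.
term = (1 - a_L) * empirical_cvar(losses, a_L) - (1 - a_U) * empirical_cvar(losses, a_U)
print(empirical_var(losses, 0.5), empirical_cvar(losses, 0.5), term)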
REFERENCES

Acerbi, C. (2002). Spectral measures of risk: a coherent representation of subjective risk aversion. Journal of Banking & Finance, 26, 1505.

Ben-Tal, A., El Ghaoui, L., & Nemirovski, A. (2009). Robust optimization. Princeton, NJ: Princeton University Press.

An, L., & Tao, P. D. (2005). The DC (difference of convex functions) programming and DCA revisited with DC models of real world nonconvex optimization problems. Annals of Operations Research, 5, 23.

Artzner, P., Delbaen, F., Eber, J.-M., & Heath, D. (1999). Coherent measures of risk. Mathematical Finance, 9, 203.

Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, (pp. 144).

Boyd, S., & Vandenberghe, L. (2004). Convex optimization. Cambridge: Cambridge University Press.

Crisp, D. J., & Burges, C. J. C. (2000). A geometric interpretation of ν-SVM classifiers. Advances in Neural Information Processing Systems, 12, 547.

Chang, C.-C., & Lin, C.-J. (2002). Training ν-support vector regression: theory and algorithms. Neural Computation, 14, 1959.

Chapelle, O. (2007). Training a support vector machine in the primal. Neural Computation, 19, 1155.

Collobert, R., Sinz, F., Weston, J., & Bottou, L. (2006). Trading convexity for scalability. In Proceedings of the 23rd International Conference on Machine Learning, (pp. 201).

Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20, 273.

Cristianini, N., & Shawe-Taylor, J. (2000). An introduction to support vector machines and other kernel-based learning methods. Cambridge: Cambridge University Press.

Duffie, D., & Pan, J. (1997). An overview of value at risk. Institutional Investor Journals, 28, 7.

Gaivoronski, A., & Pflug, G. (2005). Value-at-risk in portfolio optimization: properties and computational approach. Journal of Risk, 7, 1.

Gotoh, J., & Takeda, A. (2005). A linear classification model based on conditional geometric score. Pacific Journal of Optimization, 1, 277.
Gotoh, J., Takeda, A., & Yamamoto, R. (2013a). Interaction between financial risk measures and machine learning methods. Computational Management Science, 1, 1.

Gotoh, J., Takeda, A., & Yamamoto, R. (2013b). Interaction between financial risk measures and machine learning methods. Computational Management Science, 1, 1.

Horst, R., Pardalos, P. M., & Thoai, N. V. (2000). Introduction to global optimization. New York: Kluwer Academic Pub.

Horst, R., & Thoai, N. V. (1999). DC programming: overview. Journal of Optimization Theory and Applications, 101, 1.

Horst, R., & Tuy, H. (2003). Global optimization: Deterministic approaches. New York: Springer.

Jorion, P. (1997). Value at risk: the new benchmark for controlling market risk. New York: McGraw-Hill.

Larsen, N., Mausser, H., & Uryasev, S. (2002). Algorithms for optimization of value-at-risk. Applied Optimization, 70, 19.

Lin, C., & Wang, S. (2002). Fuzzy support vector machines. Neural Networks, IEEE Transactions on, 13, 464.

Müller, K.-R., Mika, S., Rätsch, G., Tsuda, K., & Schölkopf, B. (2001). An introduction to kernel-based learning algorithms. Neural Networks, IEEE Transactions on, 12, 181.

Perez-Cruz, F., Weston, J., Herrmann, D., & Schölkopf, B. (2003). Extension of the ν-SVM range for classification. Nato Science Series Sub Series III Computer and Systems Sciences, 60, 179.

Rockafellar, R. T., & Uryasev, S. (2000). Optimization of conditional value-at-risk. Journal of Risk, 2, 21.

Rockafellar, R. T., & Uryasev, S. (2013). The fundamental risk quadrangle in risk management, optimization and statistical estimation. Surveys in Operations Research and Management Science, 18, 33.

Sakalauskas, L., Tomasgard, A., & Wallace, S. (2012). Advanced risk measures in estimation and classification. Vol. 1, (pp. 114).

Schölkopf, B., Smola, A. J., Williamson, R. C., & Bartlett, P. L. (2000). New support vector algorithms. Neural Computation, 12, 1207.
Song, Q., Hu, W., & Xie, W. (2002). Robust support vector machine with bullet hole image classification. Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on, 32, 440.

Takeda, A., & Sugiyama, M. (2008). ν-support vector machine as conditional value-at-risk minimization. In Proceedings of the 25th International Conference on Machine Learning, (pp. 1056).

Trafalis, T., & Gilbert, R. (2002). Robust classification and regression using support vector machines. European Journal of Operational Research, 173, 893.

Tsyurmasto, P., Zabarankin, M., & Uryasev, S. (2013). Value-at-risk support vector machine: stability to outliers. Journal of Combinatorial Optimization.

Vapnik, V. (1999). The nature of statistical learning theory. New York: Springer.

Yang, X., Tao, S., Liu, R., & Cai, M. (2013). Complexity of scenario-based portfolio optimization problem with VaR objective. International Journal of Foundations of Computer Science, 13, 671.

Zabarankin, M., & Uryasev, S. (2004). Statistical Decision Problems: Selected Concepts and Portfolio Safeguard Case Studies. New York: Springer.

Zhang, J., & Wang, Y. (2008). A rough margin based support vector machine. Information Sciences, 178, 2204.

Zhang, X. (1999). Using class-center vectors to build support vector machines. In Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop, (pp. 3).
BIOGRAPHICAL SKETCH

Peter Tsyurmasto was born in 1987 in Samara, Russia. He studied at the Medical Technical Lyceum, with a focus on mathematics and physics, and at a summer correspondence school during 2000-2004. He received his bachelor's degree in applied mathematics and physics and his master's degree in data mining from the Moscow Institute of Physics and Technology in 2004-2010. He pursued a PhD program with a concentration in quantitative finance during 2010-2013 and received his PhD degree in industrial and systems engineering from the University of Florida in May 2014. He has worked as a quantitative research analyst at Deutsche Bank since May 2013.