Citation
Subspace Classification Methods for High Dimensional Datasets

Material Information

Title:
Subspace Classification Methods for High Dimensional Datasets
Creator:
Panagopoulos, Orestis
Publisher:
University of Florida
Publication Date:
Language:
English

Thesis/Dissertation Information

Degree:
Master's (M.S.)
Degree Grantor:
University of Florida
Degree Disciplines:
Industrial and Systems Engineering
Committee Chair:
PARDALOS, PANAGOTE M
Committee Co-Chair:
GEUNES, JOSEPH PATRICK
Graduation Date:
8/9/2014

Subjects

Subjects / Keywords:
Curse of dimensionality ( jstor )
Datasets ( jstor )
Dimensionality ( jstor )
Distance functions ( jstor )
High dimensional spaces ( jstor )
Machine learning ( jstor )
Mathematical vectors ( jstor )
Mining ( jstor )
Principal components analysis ( jstor )
Statistical discrepancies ( jstor )
binary
classification
dimensional
high
subspace

Notes

General Note:
In this work we consider high dimensional (HD) datasets, which pose well known difficulties when it comes to classification tasks. The dimensionality reduction technique called Principal Component Analysis (PCA) is investigated. PCA is utilized by the Local Subspace Classifier (LSC), a subspace classification algorithm, to perform classification. A new binary classification method called the Constrained Subspace Classifier (CSC) is then proposed. CSC improves on LSC by accounting for the relative angle between subspaces while approximating the classes with individual subspaces. CSC is solved by an efficient alternating optimization technique and is evaluated on publicly available HD datasets. The improvement in classification accuracy over LSC shows the importance of considering the relative angle between the subspaces while approximating the classes. Moreover, CSC also appears to be effective on lower dimensional datasets.

Record Information

Source Institution:
UFRGP
Rights Management:
All applicable rights reserved by the source institution and holding location.
Embargo Date:
8/31/2016

Downloads

This item has the following downloads:


Full Text

PAGE 1

SUBSPACE CLASSIFICATION METHODS FOR HIGH DIMENSIONAL DATASETS

By

ORESTIS PANOS PANAGOPOULOS

A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

UNIVERSITY OF FLORIDA

2014

PAGE 2

© 2014 Orestis Panos Panagopoulos

PAGE 3

I dedicate this to the individuals who have inspired me.

PAGE 4

ACKNOWLEDGMENTS

This thesis is the result of a fruitful cooperation with Dr. Vijay Pappu over the years 2013-2014. First and foremost, I would like to express my deepest gratitude to my thesis supervisor, Dr. Panos M. Pardalos, for his enlightening guidance and inspiring instruction as well as for supporting and trusting me throughout this study. I would like to thank Dr. Joseph Geunes for his continued support during this work. Furthermore, I would like to thank Andonis Mitidis for his valuable and constructive suggestions during the planning and development of this study. Last but not least, I would like to express my love and gratitude to my family for supporting me unconditionally.

PAGE 5

TABLE OF CONTENTS

ACKNOWLEDGMENTS 4
LIST OF TABLES 6
LIST OF FIGURES 7
ABSTRACT 8

CHAPTER
1 INTRODUCTION 9
2 PRINCIPAL COMPONENT ANALYSIS 12
  2.1 Seeking a Projection Operator 12
  2.2 A Minimum Error Formulation 15
  2.3 Solution of the Minimum Error Formulation 20
3 LOCAL SUBSPACE CLASSIFIER (LSC) 23
4 MOTIVATING EXAMPLES 25
5 CONSTRAINED SUBSPACE CLASSIFIER (CSC) 27
  5.1 Seeking a Distance Metric 28
  5.2 Formulating CSC 29
6 ALGORITHM 31
  6.1 Alternating Optimization Technique 31
  6.2 Termination Rules 31
7 NUMERICAL EXPERIMENTS 33
  7.1 Numerical Experiments 33
    7.1.1 DLBCL 33
    7.1.2 Breast 34
    7.1.3 Colon 34
    7.1.4 DBWorld 34
    7.1.5 Mushroom 34
    7.1.6 Spambase 34
  7.2 Results 34
8 CONCLUSIONS AND FUTURE WORK 38

REFERENCES 39
BIOGRAPHICAL SKETCH 41

PAGE 6

LIST OF TABLES

Table 4-1 Average classification accuracies and relative angle between subspaces generated from LSC and CSC in two examples. 25

PAGE 7

LIST OF FIGURES

Figure 2-1 Visualizing the projection operation in 3D 14
Figure 2-2 Visualizing the dimensionality reduction onto a 2D space 14
Figure 3-1 Classification of a new point x 24
Figure 4-1 Data points generated by N1 and N2 in example 1 and the subspaces generated by LSC and CSC in each of the training folds. 25
Figure 4-2 Data points generated by N1 and N2 in example 2 and the subspaces generated by LSC and CSC in each of the training folds. 26
Figure 7-1 DLBCL 35
Figure 7-2 Breast 35
Figure 7-3 Colon 36
Figure 7-4 DBWorld 36
Figure 7-5 Mushroom 37
Figure 7-6 Spambase 37

PAGE 8

Abstract of Thesis Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Master of Science

SUBSPACE CLASSIFICATION METHODS FOR HIGH DIMENSIONAL DATASETS

By

Orestis Panos Panagopoulos

August 2014

Chair: Panos M. Pardalos
Major: Industrial and Systems Engineering

In this work we consider high dimensional (HD) datasets, which pose well known difficulties when it comes to classification tasks. The dimensionality reduction technique called Principal Component Analysis (PCA) is investigated. PCA is utilized by the Local Subspace Classifier (LSC), a subspace classification algorithm, to perform classification. A new binary classification method called the Constrained Subspace Classifier (CSC) is then proposed. CSC improves on LSC by accounting for the relative angle between subspaces while approximating the classes with individual subspaces. CSC is solved by an efficient alternating optimization technique and is evaluated on publicly available HD datasets. The improvement in classification accuracy over LSC shows the importance of considering the relative angle between the subspaces while approximating the classes. Moreover, CSC also appears to be effective on lower dimensional datasets.

PAGE 9

CHAPTER 1
INTRODUCTION

High dimensional data has grown increasingly popular during the last decade. It can be found in web pages, social networks, email databases, gene arrays, radiation waves and in bio-medical applications. Every sample produced by these means tends to be characterized by thousands of features. For example, every sample in gene-expression microarray datasets consists of measurements from thousands of genes. Magnetic resonance images (MRI) and functional MRI data collected for each subject could also be considered as high dimensional datasets, where the intensity at each pixel constitutes a feature. Various kinds of spectral measurements, including Mass Spectroscopy and Raman Spectroscopy, are very common in chemometrics, where the spectra are recorded in channels that number well into the thousands [6]. In many such biomedical applications the number of samples is very small when compared to the number of features, due to the increased cost of the experiments [23].

Classification tasks on high dimensional datasets pose significant challenges to standard statistical methods and render many existing classification techniques impractical [10]. Firstly, the generalization ability of many classification algorithms is compromised by the curse of dimensionality arising from the high dimensionality of the input space [12]. The curse of dimensionality phenomenon and its negative impact on generalization performance can cause model overfitting and estimation instability [4]. Secondly, earlier studies have revealed the geometrical distortion that arises in high dimensional data spaces, where the volume increases exponentially as dimensionality increases and points tend to become equidistant for a variety of data distributions and distance functions [2]. Thirdly, several statistical methods require knowing class covariances a priori. When class covariances are unavailable, estimates from sample data are unreliable due to small sample sizes.

PAGE 10

One common approach to address the aforementioned challenges involves reducing the dimensionality of the dataset, either by using feature extraction [15] and/or feature selection [16] prior to classification. These dimensionality reduction techniques decrease the complexity of the classification model and thus improve classification performance [16]. Feature selection is effective in removing irrelevant data, increasing learning accuracy, and improving result comprehensibility. However, the increased dimensionality of data poses a severe challenge to many existing feature selection methods with respect to efficiency and effectiveness [22]. Feature extraction techniques transform the input data into a set of meta-features that extract the relevant information from the input data for classification. One popular technique, called Principal Component Analysis (PCA) [21], finds a set of linearly uncorrelated variables called principal components from a set of observations of possibly correlated variables. PCA removes redundancy by transforming the data from a higher dimensional space into an orthogonal lower dimensional space. This transformation is performed in such a way that the first principal component accounts for the maximum possible variance, and each succeeding component has decreasing values of variance [20]. The number of retained principal components is usually less than or equal to the number of original variables and is determined using several criteria, like the eigenvalue-one criterion, the scree test, the proportion of variance accounted for, etc.

The Local Subspace Classifier (LSC) [13] utilizes PCA to perform classification. During the training phase, a lower dimensional subspace is found for each class that approximates the data. In the testing phase, a new data point is classified by calculating the distance of the point from each subspace and choosing the class with minimal distance. Although LSC is simple and relatively easy to implement, it has its own limitations. LSC finds the subspaces for each class separately, without knowledge of the presence of the other class. While each subspace approximates the data

PAGE 11

well, these projections may not be ideal from a classification perspective. In this thesis, we construct another classifier called the Constrained Subspace Classifier (CSC) that accounts for the presence of the other class while finding the individual subspaces. The LSC formulation is modified to include the relative angle between the subspaces and is solved efficiently using alternating optimization techniques. The performance of CSC on publicly available datasets is evaluated and compared to that of LSC.

PAGE 12

CHAPTER 2
PRINCIPAL COMPONENT ANALYSIS

2.1 Seeking a Projection Operator

Let S be a data space of dimension equal to the number of features, L, of the selected dataset. We can always find an orthonormal basis for S (using the Gram-Schmidt process), given by

U_L = {u_1, u_2, u_3, ..., u_L},   u_i ∈ R^L.
PAGE 13

The coefficient vector a_i = U_l^T x_i corresponds to a dimensionally reduced data point in R^l.
PAGE 14

Figure 2-1. Visualizing the projection operation in 3D

The reconstruction error for a data point x_i is then

||x_i − U_l U_l^T x_i||.

For a set of N data points in a 3D space we can visualise the dimensionality reduction onto a 2D space.

Figure 2-2. Visualizing the dimensionality reduction onto a 2D space
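The projection and the error above are straightforward to compute. A minimal numpy sketch (the data and the basis are illustrative choices, not taken from the thesis):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 100))                     # columns are 100 points in a 3D data space
U_l = np.linalg.qr(rng.normal(size=(3, 2)))[0]    # orthonormal basis of an arbitrary 2D subspace

# Project every point onto the plane spanned by the columns of U_l
X_proj = U_l @ U_l.T @ X

# Summed squared reconstruction error: sum_i ||x_i - U_l U_l^T x_i||^2
error = np.sum((X - X_proj) ** 2)
print(error)
```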

PAGE 15

Initially the data points live in R^3, and may form a three dimensional volume. Projecting them onto a two dimensional subspace R^2, the dimensionally reduced data points now form a two dimensional cluster (they live on the plane T). PCA (for the 3D case) is the problem of finding the two dimensional surface such that the summation of the distances of the original data points from their projections on the 2D plane is minimized. In PCA we want to find the plane T such that Σ_{i=1}^{N} ||x_i − U_l U_l^T x_i||² is minimized, or

min_{T}  Σ_{i=1}^{N} ||x_i − U_l U_l^T x_i||²   subject to   U_l^T U_l = I.

As mentioned earlier, the plane T is defined as the 2D surface that is spanned by the (reduced) orthonormal basis. Finding such a basis is equivalent to defining the plane T. Alternatively, treating T as an embedded subspace in R^3, we can define it as the set of all points x̃_i such that x̃_i = x_0 + U_l a_i, where x̃_i, x_0 ∈ R^3 and a_i ∈ R^2.

2.2 A Minimum Error Formulation

We will now use the above approach to formulate the PCA problem. Let us consider the data matrix X ∈ R^{L×N}, whose columns are the N data points,
PAGE 16

where U_l = [u_1 u_2 u_3 ... u_l] ∈ R^{L×l} collects the first l basis vectors and a_i ∈ R^l holds the coefficients of x_i in that basis. The minimum error formulation of PCA is then

min_{U_l, x_0, a_i}  Σ_{i=1}^{N} ||x_i − x_0 − U_l a_i||²   subject to   U_l^T U_l = I,   (1/N) Σ_{i=1}^{N} a_i = 0.

Introducing Lagrange multipliers β_k for the second constraint and λ_ij for the orthonormality constraints gives the Lagrangian

L = Σ_{i=1}^{N} ||x_i − x_0 − U_l a_i||² − (1/N) Σ_{k=1}^{l} β_k Σ_{i=1}^{N} a_{ki} − Σ_{i=1}^{l} Σ_{j=1}^{l} λ_ij ( u_i^T u_j − δ_ij ).
PAGE 17

Setting the derivative of the Lagrangian with respect to x_0 to zero,

∂/∂x_0 [ Σ_{i=1}^{N} ||x_i − x_0 − U_l a_i||² − (1/N) Σ_{k=1}^{l} β_k Σ_{i=1}^{N} a_{ki} − Σ_{i=1}^{l} Σ_{j=1}^{l} λ_ij ( u_i^T u_j − δ_ij ) ] = 0,

and since the second and third terms vanish after differentiation, this becomes

∂/∂x_0 [ Σ_{i=1}^{N} ( x_i − x_0 − U_l a_i )^T ( x_i − x_0 − U_l a_i ) ] = 0,

U_l Σ_{i=1}^{N} a_i − Σ_{i=1}^{N} ( x_i − x_0 ) = 0,

U_l (1/N) Σ_{i=1}^{N} a_i − (1/N) Σ_{i=1}^{N} ( x_i − x_0 ) = 0.

Using the constraint (1/N) Σ_i a_i = 0, this results in

(1/N) Σ_{i=1}^{N} ( x_i − x_0 ) = 0,

hence

x_0 = (1/N) Σ_{i=1}^{N} x_i = x_mean.

From the first order optimality conditions, the following condition also holds:

∂L/∂a_i = 0,   for all i = 1, 2, ..., N.

Using this condition we get

∂/∂a_i [ Σ_{i=1}^{N} ||x_i − x_0 − U_l a_i||² − (1/N) Σ_{k=1}^{l} β_k Σ_{i=1}^{N} a_{ki} − Σ_{i=1}^{l} Σ_{j=1}^{l} λ_ij ( u_i^T u_j − δ_ij ) ] = 0,

PAGE 18

which gives

−2 U_l^T ( x_i − x_0 − U_l a_i ) − (1/N) β = 0,

2 U_l^T U_l a_i = 2 U_l^T ( x_i − x_0 ) + (1/N) β,

a_i = U_l^T ( x_i − x_0 ) + β/(2N),   for all i = 1, 2, ..., N.

Summing up over all values of i,

Σ_{i=1}^{N} a_i = U_l^T Σ_{i=1}^{N} ( x_i − x_0 ) + β/2.

Multiplying by U_l we get

U_l Σ_{i=1}^{N} a_i = U_l U_l^T Σ_{i=1}^{N} ( x_i − x_0 ) + U_l β/2.

Since x_0 = x_mean makes Σ_i ( x_i − x_0 ) = 0 and the constraint gives Σ_i a_i = 0, this implies that

β = 0.

Applying β = 0 above we get

a_i = U_l^T ( x_i − x_0 ),   for all i = 1, 2, ..., N.

Hence the optimization problem reduces to

min_{U_l, x_0}  Σ_{i=1}^{N} ||x_i − x_0 − U_l U_l^T ( x_i − x_0 )||²   subject to   U_l^T U_l = I.

Since x_0 = x_mean, the mean of the data can be subtracted from every point x_i prior to solving the optimization problem. From here on, X is considered to be the mean-subtracted data matrix.

PAGE 19

So, the optimization problem reduces to

min_{U_l}  Σ_{i=1}^{N} ||x_i − U_l U_l^T x_i||²   subject to   U_l^T U_l = I.

Using the Frobenius norm,

Σ_{i=1}^{N} ||x_i − U_l U_l^T x_i||²_F
= Σ_{i=1}^{N} tr[ ( x_i − U_l U_l^T x_i )( x_i − U_l U_l^T x_i )^T ]
= Σ_{i=1}^{N} tr[ x_i x_i^T − x_i x_i^T U_l U_l^T − U_l U_l^T x_i x_i^T + U_l U_l^T x_i x_i^T U_l U_l^T ]
= tr Σ_{i} x_i x_i^T − tr Σ_{i} x_i x_i^T U_l U_l^T − tr Σ_{i} x_i x_i^T U_l U_l^T + tr Σ_{i} x_i x_i^T U_l U_l^T U_l U_l^T
= tr Σ_{i} x_i x_i^T − tr Σ_{i} x_i x_i^T U_l U_l^T
= tr( X X^T ) − tr( X X^T U_l U_l^T )
= tr[ X X^T ( I − U_l U_l^T ) ].

The optimization problem above thus reduces to

min_{U_l}  tr[ X X^T ( I − U_l U_l^T ) ]   subject to   U_l^T U_l = I.

PAGE 20

Since tr( X X^T ) is a constant, the optimization problem can be re-written as

max_{U_l}  tr( U_l^T X X^T U_l )   subject to   U_l^T U_l = I.

2.3 Solution of the Minimum Error Formulation

PCA solves the optimization problem given above,

max_{U_l}  tr( U_l^T X X^T U_l )   subject to   U_l^T U_l = I,

where X ∈ R^{L×N} is the mean-subtracted data matrix. Introducing Lagrange multipliers λ_ij for the orthonormality constraints, the Lagrangian is

L = tr( U_l^T X X^T U_l ) − Σ_{i=1}^{l} Σ_{j=1}^{l} λ_ij ( u_i^T u_j − δ_ij ),

where δ_ij denotes the Kronecker delta,

δ_ij = 1 for i = j,   δ_ij = 0 for i ≠ j.

From the first order optimality conditions, the following conditions hold:

∂L/∂λ_ij = 0   and   ∂L/∂u_j = 0.

The first condition gives

∂/∂λ_ij [ tr( U_l^T X X^T U_l ) − Σ_{i=1}^{l} Σ_{j=1}^{l} λ_ij ( u_i^T u_j − δ_ij ) ] = 0,

PAGE 21

which simply recovers the constraint

u_i^T u_j = δ_ij.

Similarly, setting ∂L/∂u_j = 0,

∂/∂u_j [ tr( U_l^T X X^T U_l ) − Σ_{i=1}^{l} Σ_{j=1}^{l} λ_ij ( u_i^T u_j − δ_ij ) ] = 0,

u_j^T X X^T − Σ_{i=1}^{l} λ_ij u_i^T = 0.

Right-multiplying by u_j,

u_j^T X X^T u_j − Σ_{i=1}^{l} λ_ij u_i^T u_j = 0.

Using u_i^T u_j = δ_ij we get

u_j^T X X^T u_j − Σ_{i=1}^{l} λ_ij δ_ij = 0,

u_j^T X X^T u_j = Σ_{i=1}^{l} λ_ij δ_ij,

u_j^T X X^T u_j = λ_jj.

Hence

tr( U_l^T X X^T U_l ) = Σ_{j=1}^{l} u_j^T X X^T u_j = Σ_{j=1}^{l} λ_jj.

PAGE 22

The maximum objective function value that can be attained is therefore the sum of the l largest eigenvalues λ_jj of the symmetric matrix X X^T. The orthonormal basis for the lower dimensional subspace is given by the eigenvectors corresponding to the l largest eigenvalues of the symmetric matrix X X^T.
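This characterization translates directly into a short computation: mean-center the data and take the leading eigenvectors of X X^T. A minimal numpy sketch (the data and dimensions are illustrative):

```python
import numpy as np

def pca_basis(X, l):
    """Orthonormal basis U_l (L x l): the eigenvectors of X X^T associated with
    its l largest eigenvalues. Columns of X are the data points."""
    X = X - X.mean(axis=1, keepdims=True)    # subtract the mean, as derived above
    _, eigvecs = np.linalg.eigh(X @ X.T)     # eigenvalues in ascending order
    return eigvecs[:, -l:][:, ::-1]          # columns: top-l eigenvectors

# Example: 50 points in R^10 reduced to a 3-dimensional subspace
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 50))
U_l = pca_basis(X, 3)
print(np.allclose(U_l.T @ U_l, np.eye(3)))   # True: the basis is orthonormal
```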

PAGE 23

CHAPTER 3
LOCAL SUBSPACE CLASSIFIER (LSC)

Consider a binary classification problem. Let the matrices X_1 ∈ R^{L×N_1} and X_2 ∈ R^{L×N_2} contain the (mean-subtracted) training points of class 1 and class 2 as columns. During the training phase, LSC approximates each class by a k-dimensional subspace: the orthonormal basis U_1 of the subspace S_1 is obtained by choosing the eigenvectors corresponding to the k largest eigenvalues of the matrix X_1 X_1^T.
PAGE 24

The orthonormal basis U_2 is obtained by choosing the eigenvectors corresponding to the k largest eigenvalues of the matrix X_2 X_2^T. A new point x is classified by computing its distance from the subspaces S_1 and S_2,

dist(x, S_i) = ||x − U_i U_i^T x||² = x^T x − tr( U_i^T x x^T U_i ),

and the class of x is determined as

class(x) = argmin_{i ∈ {1,2}} dist(x, S_i).

Figure 3-1. Classification of a new point x

Though the subspaces S_1 and S_2 approximate the classes well, these projections may not be ideal for classification tasks, as each of them is obtained without knowledge of the other class/subspace. Hence, from a classification performance perspective, these subspaces may not be the best projections for the classes. In order to account for the presence of the other subspace, we consider the relative orientation of the subspaces. In the following section, we show the effect of changing the relative orientation of the subspaces on classification performance through some motivating examples.
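A minimal sketch of LSC along the lines above, taking each class basis from the eigenvectors of X_i X_i^T and using the reconstruction error as the distance (the toy data and function names are illustrative):

```python
import numpy as np

def class_subspace(X, k):
    """Orthonormal basis (columns): the k leading eigenvectors of X X^T,
    where the columns of X are the training points of one class."""
    _, eigvecs = np.linalg.eigh(X @ X.T)     # ascending eigenvalues
    return eigvecs[:, -k:]

def lsc_classify(x, U1, U2):
    """Assign x to the class whose subspace reconstructs it with the smaller error."""
    dists = [np.sum((x - U @ U.T @ x) ** 2) for U in (U1, U2)]
    return int(np.argmin(dists)) + 1         # returns 1 or 2

# Toy example: two classes concentrated near different one-dimensional subspaces
rng = np.random.default_rng(0)
X1 = np.outer([1.0, 0.0], rng.normal(size=40)) + 0.1 * rng.normal(size=(2, 40))
X2 = np.outer([0.0, 1.0], rng.normal(size=40)) + 0.1 * rng.normal(size=(2, 40))
U1, U2 = class_subspace(X1, 1), class_subspace(X2, 1)
print(lsc_classify(np.array([2.0, 0.1]), U1, U2))   # expected: 1
```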

PAGE 25

CHAPTER 4
MOTIVATING EXAMPLES

We consider two examples here showing the effect of changing the relative angle between the subspaces generated by LSC. The datasets are generated from two bivariate normal distributions N_1(μ_1, Σ_1) and N_2(μ_2, Σ_2) representing classes C_1 and C_2. Each class consists of 100 randomly generated points from N_1 and N_2 respectively. The parameters of N_1 and N_2 differ between the two examples; the average accuracies and relative angles obtained with each method are summarized in Table 4-1.

Table 4-1. Average classification accuracies and relative angle between subspaces generated from LSC and CSC in two examples.

DATASET     LSC ACC (%)   LSC ANGLE (rad)   CSC ACC (%)   CSC ANGLE (rad)
EXAMPLE 1   69            0.29              89            0.83
EXAMPLE 2   79.5          0.98              91            0.74

LSC and CSC are trained on the data with k = 1 and the classification accuracies are obtained via 10-fold cross validation [11]. The subspaces obtained for each of the training folds in example 1 and example 2 are shown in Figures 4-1 and 4-2 respectively.

A) LSC   B) CSC
Figure 4-1. Data points generated by N_1 and N_2 in example 1 and the subspaces generated by LSC and CSC in each of the training folds.

The average classification accuracies and the average relative angle (0 ≤ θ ≤ π/2) between the subspaces for LSC and CSC are reported in Table 4-1.

PAGE 26

A) LSC   B) CSC
Figure 4-2. Data points generated by N_1 and N_2 in example 2 and the subspaces generated by LSC and CSC in each of the training folds.

In example 1, increasing the relative angle between the subspaces clearly improves the classification accuracy, by 20%. However, in example 2, decreasing the relative angle between the subspaces gives better classification performance and outperforms LSC by 11%. These examples show that the relative orientation of the subspaces should be considered in addition to capturing the maximal variance in the data.

PAGE 27

CHAPTER 5
CONSTRAINED SUBSPACE CLASSIFIER (CSC)

The Constrained Subspace Classifier (CSC) finds two subspaces simultaneously, one for each class, such that each subspace accounts for maximal variance in the data in the presence of the other class/subspace. Thus, CSC allows for a tradeoff between approximating the classes well and the relative orientation among the subspaces. The relative orientation between subspaces is generally defined in terms of principal angles [8]. We briefly review principal angles between subspaces below; they are further utilized to modify the formulation of LSC to include the relative orientation among the subspaces.

Definition 1: Let U_1 ∈ R^{L×k} and U_2 ∈ R^{L×k} be orthonormal bases of two k-dimensional subspaces S_1 and S_2. The principal angles 0 ≤ θ_1 ≤ ... ≤ θ_k ≤ π/2 between the subspaces are defined through the singular value decomposition

U_1^T U_2 = Y C Z^T,

where Y^T Y = I_k, Z^T Z = I_k and C = diag( cos θ_1, cos θ_2, ..., cos θ_k ).

PAGE 28

(i.e., the columns of U_1 Y and U_2 Z form orthonormal bases for the vector spaces S_1 and S_2, the principal vectors associated with the principal angles.)

Theorem 1 [3]: The singular values σ_i of U_2^T U_1 are the cosines of the principal angles, so that ||U_2^T U_1||²_F = Σ_{i=1}^{k} σ_i² = Σ_{i=1}^{k} cos² θ_i.

5.1 Seeking a Distance Metric

A natural distance between the two subspaces is the projection Frobenius norm,

d_pF( U_1, U_2 ) = (1/√2) ||U_1 U_1^T − U_2 U_2^T||_F.

Using the F-norm definition ||X||²_F = tr( X^T X ) we have that

||U_1 U_1^T − U_2 U_2^T||²_F
= tr[ ( U_1 U_1^T − U_2 U_2^T )^T ( U_1 U_1^T − U_2 U_2^T ) ]
= tr( U_1 U_1^T U_1 U_1^T − U_1 U_1^T U_2 U_2^T − U_2 U_2^T U_1 U_1^T + U_2 U_2^T U_2 U_2^T )
= tr( U_1 U_1^T ) + tr( U_2 U_2^T ) − 2 tr( U_1 U_1^T U_2 U_2^T )
= tr( U_1^T U_1 ) + tr( U_2^T U_2 ) − 2 tr( U_2^T U_1 U_1^T U_2 )
= ||U_1||²_F + ||U_2||²_F − 2 ||U_2^T U_1||²_F.

PAGE 29

According to Theorem 1,

||U_2^T U_1||²_F = Σ_{i=1}^{k} σ_i² = Σ_{i=1}^{k} cos² θ_i.

Using this, the expansion above becomes

||U_1 U_1^T − U_2 U_2^T||²_F = k + k − 2 Σ_{i=1}^{k} cos² θ_i = 2 ( k − Σ_{i=1}^{k} cos² θ_i ) = 2 [ (1 − cos² θ_1) + (1 − cos² θ_2) + ... + (1 − cos² θ_k) ] = 2 Σ_{i=1}^{k} sin² θ_i.

Hence the projection F-norm becomes

d_pF( U_1, U_2 ) = (1/√2) ||U_1 U_1^T − U_2 U_2^T||_F = sqrt( Σ_{i=1}^{k} sin² θ_i ).

5.2 Formulating CSC

The projection metric is utilized to incorporate the relative orientation between the subspaces into LSC. The formulation of LSC is modified as shown below to obtain the Constrained Subspace Classifier (CSC):

maximize_{U_1, U_2}   tr( U_1^T X_1 X_1^T U_1 ) + tr( U_2^T X_2 X_2^T U_2 ) − (C/2) ||U_1 U_1^T − U_2 U_2^T||²_F

subject to   U_1^T U_1 = I_k,   U_2^T U_2 = I_k,

PAGE 30

where the parameter C controls the tradeoff between the relative orientation of the subspaces and the approximation of the data. From the calculations in Section 5.1,

||U_1 U_1^T − U_2 U_2^T||²_F = 2k − 2 tr( U_1^T U_2 U_2^T U_1 ).

Hence, dropping the constant term, the optimization problem becomes

maximize_{U_1, U_2}   tr( U_1^T X_1 X_1^T U_1 ) + tr( U_2^T X_2 X_2^T U_2 ) + C tr( U_1^T U_2 U_2^T U_1 )

subject to   U_1^T U_1 = I_k,   U_2^T U_2 = I_k.
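The projection metric appearing in the CSC objective can be evaluated directly from the singular values of U_1^T U_2, as in Theorem 1. A small numpy sketch (the random bases are purely illustrative):

```python
import numpy as np

def projection_metric(U1, U2):
    """d_pF(U1, U2) = sqrt(sum_i sin^2(theta_i)), where the cos(theta_i) are the
    singular values of U1^T U2 (Theorem 1)."""
    cos_theta = np.clip(np.linalg.svd(U1.T @ U2, compute_uv=False), 0.0, 1.0)
    return np.sqrt(np.sum(1.0 - cos_theta ** 2))

# Two k = 2 dimensional subspaces of R^5 with random orthonormal bases
rng = np.random.default_rng(0)
U1 = np.linalg.qr(rng.normal(size=(5, 2)))[0]
U2 = np.linalg.qr(rng.normal(size=(5, 2)))[0]

d = projection_metric(U1, U2)
# Consistency check with the Frobenius-norm form used in the CSC objective
d_frob = np.linalg.norm(U1 @ U1.T - U2 @ U2.T, "fro") / np.sqrt(2.0)
print(d, d_frob)     # the two values agree
```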
PAGE 31

CHAPTER 6
ALGORITHM

6.1 Alternating Optimization Technique

We introduce an alternating optimization algorithm to solve the CSC problem. For a fixed U_2, the problem reduces to

maximize_{U_1}   tr( U_1^T ( X_1 X_1^T + C U_2 U_2^T ) U_1 )   subject to   U_1^T U_1 = I_k,

which is solved by the eigenvectors corresponding to the k largest eigenvalues of the symmetric matrix X_1 X_1^T + C U_2 U_2^T. Similarly, for a fixed U_1, the problem reduces to the corresponding eigenvalue problem for the symmetric matrix X_2 X_2^T + C U_1 U_1^T. The algorithm alternates between these two subproblems.

6.2 Termination Rules

The algorithm terminates when a maximum number of iterations K is reached, or when the relative changes in U_1, in U_2, and in the objective function value fall below prescribed tolerances (tol_U1, tol_U2 and tol_f).
PAGE 32

The relative change in the objective function value between iterations k and k+1 is

tol_f^k = ( F(k+1) − F(k) ) / ( |F(k)| + 1 ).

The algorithm for CSC can be summarized as follows:

Algorithm 1: CSC(X_1, X_2, k, C)
1. Initialize U_1 and U_2 such that U_1^T U_1 = I_k, U_2^T U_2 = I_k.
2. Find the eigenvectors corresponding to the k largest eigenvalues of the symmetric matrix X_1 X_1^T + C U_2 U_2^T.
3. Find the eigenvectors corresponding to the k largest eigenvalues of the symmetric matrix X_2 X_2^T + C U_1 U_1^T.
4. Alternate between steps 2 and 3 until one of the termination rules is satisfied.
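A compact sketch of Algorithm 1, using the relative change in the objective as the termination rule (the random initialization is an illustrative choice; the defaults mirror the K = 2000 and 1e-6 tolerances reported in the experiments below):

```python
import numpy as np

def top_k_eigvecs(S, k):
    """Eigenvectors of the symmetric matrix S for its k largest eigenvalues."""
    _, vecs = np.linalg.eigh(S)              # eigenvalues in ascending order
    return vecs[:, -k:]

def csc(X1, X2, k, C, K=2000, tol_f=1e-6):
    """Constrained Subspace Classifier: alternate the two eigenproblems of Algorithm 1."""
    L = X1.shape[0]
    rng = np.random.default_rng(0)
    U1 = np.linalg.qr(rng.normal(size=(L, k)))[0]    # U1^T U1 = I_k
    U2 = np.linalg.qr(rng.normal(size=(L, k)))[0]    # U2^T U2 = I_k
    f_old = None
    for _ in range(K):
        U1 = top_k_eigvecs(X1 @ X1.T + C * (U2 @ U2.T), k)   # step 2
        U2 = top_k_eigvecs(X2 @ X2.T + C * (U1 @ U1.T), k)   # step 3
        f = (np.trace(U1.T @ X1 @ X1.T @ U1)
             + np.trace(U2.T @ X2 @ X2.T @ U2)
             + C * np.trace(U1.T @ U2 @ U2.T @ U1))          # CSC objective
        if f_old is not None and (f - f_old) / (abs(f_old) + 1.0) < tol_f:
            break                                            # relative change rule
        f_old = f
    return U1, U2
```

A new point is then classified exactly as in LSC, using its distances to the two returned subspaces.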

PAGE 33

CHAPTER 7
NUMERICAL EXPERIMENTS

7.1 Numerical Experiments

The performance of CSC is evaluated on four high dimensional publicly available datasets. CSC is also tested on two lower dimensional datasets for the sake of completeness. The performance of CSC is evaluated for different values of C and compared to that of LSC. The values of C are chosen in such a way that the relative angle between the subspaces varies uniformly. The relative angle between the subspaces is evaluated in terms of the projection metric d_pF. The value of d_pF varies between 0 and √k, where k is the dimensionality of the subspaces. The value of k is chosen as {1, 3, 10}. Experiments are performed with a 2.83 GHz Intel Core 2 Quad CPU running Windows 7 with 4.0 GB of main memory. The classification performance is evaluated using the leave-one-out cross validation (LOOCV) technique [11]. The classification accuracies as a function of C for different values of k are shown in Figures 7-1 to 7-6. C_0 represents the results of LSC. C_{-1}, C_{-2} correspond to C < 0 and C_{+1}, C_{+2} correspond to C > 0. As mentioned earlier, positive values of C decrease the relative angle between the subspaces while negative values of C increase the relative angle. The values of K, tol_f, tol_U1 and tol_U2 are chosen to be 2000, 1e-6, 1e-6 and 1e-6, respectively.

7.1.1 DLBCL

Diffuse large B-cell lymphoma (DLBCL) [18], the most common lymphoid malignancy in adults, is curable in less than 50% of patients. The DLBCL dataset consists of 77 samples with 5469 features. CSC was used to identify cured versus fatal or refractory disease.

PAGE 34

7.1.2 Breast

The Breast dataset [19] consists of 77 samples of breast tumors. 4869 features describe each one of those tumors. CSC classified the tumors as recurring or non-recurring.

7.1.3 Colon

40 tumor and 22 normal colon tissue samples make up the Colon dataset [1]. 2000 features describe each one of those samples. CSC classified the samples as tumorous or not.

7.1.4 DBWorld

The DBWorld dataset [9] consists of 64 e-mails (samples) divided into two classes. The first one consists of only subject lines, while the second consists of only bodies. 4702 features describe each one of those samples. CSC classified the samples as subjects or bodies.

7.1.5 Mushroom

The Mushroom dataset [17] describes characteristics of gilled mushrooms. It consists of 8124 samples with 126 features. CSC classified the samples into two categories, edible and non-edible.

7.1.6 Spambase

The Spambase dataset [14] consists of 4601 samples (emails) with 57 features. CSC was used to determine whether an email is spam or not.

7.2 Results

For the DLBCL and Colon datasets, classification accuracy is improved by reducing the relative angle between subspaces. In the case of the Breast dataset, increasing the relative angle considerably improves the classification accuracy. However, the effect of the angle change is relatively small for k = 10 dimensional subspaces for those datasets. Additionally, in the case of the Colon dataset, the change in the dimension of the subspaces does not affect the performance of CSC.

PAGE 35

Figure 7-1. DLBCL

Figure 7-2. Breast

The classification accuracy of CSC was almost identical to that of LSC for the DBWorld dataset.

PAGE 36

Figure 7-3. Colon

Figure 7-4. DBWorld

With respect to the lower dimensional datasets, CSC performed at least as well as LSC. In the case of the Spambase dataset, CSC was able to slightly increase the classification accuracy for negative values of C.

PAGE 37

Figure 7-5. Mushroom

Figure 7-6. Spambase

PAGE 38

CHAPTER 8
CONCLUSIONS AND FUTURE WORK

In this thesis, we propose a new classification method for high dimensional datasets. Our method, called the Constrained Subspace Classifier (CSC), improves on an earlier proposed subspace classifier, the Local Subspace Classifier (LSC). In addition to approximating the classes well by individual subspaces, CSC also accounts for the relative angle between the subspaces by utilizing the projection metric. An efficient alternating optimization technique is also proposed. CSC has been evaluated on publicly available datasets and compared to LSC. The improvement in classification accuracy shows the importance of considering the relative angle between subspaces while approximating the classes. Additionally, CSC appears to be effective on lower dimensional datasets as well.

Regarding future work, we plan to kernelize the procedure, which may lead to reduced time complexity. We also aim to generalize CSC to the multi-class case. The first step in this direction would be to consider the relative volumes between the subspaces.

PAGE 39

REFERENCES

[1] Uri Alon, Naama Barkai, Daniel A. Notterman, Kurt Gish, Suzanne Ybarra, Daniel Mack, and Arnold J. Levine. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences, 96(12):6745, 1999.

[2] Kevin Beyer, Jonathan Goldstein, Raghu Ramakrishnan, and Uri Shaft. When is nearest neighbor meaningful? Database Theory ICDT 99, pages 217, 1999.

[3] Åke Björck and Gene H. Golub. Numerical methods for computing angles between linear subspaces. Mathematics of Computation, 27(123):579, 1973.

[4] Robert Clarke, Habtom W. Ressom, Antai Wang, Jianhua Xuan, Minetta C. Liu, Edmund A. Gehan, and Yue Wang. The properties of high-dimensional data spaces: implications for exploring gene and protein expression data. Nature Reviews Cancer, 8(1):37, 2008.

[5] Alan Edelman, Tomás A. Arias, and Steven T. Smith. The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications, 20(2):303, 1998.

[6] Michael B. Fenn and Vijay Pappu. Data mining for cancer biomarkers with Raman spectroscopy. Data Mining for Biomarker Discovery, pages 143, 2012.

[7] Gene H. Golub and Charles F. Van Loan. Matrix Computations, volume 3. JHU Press, 2012.

[8] Jihun Hamm and Daniel D. Lee. Grassmann discriminant analysis: a unifying view on subspace-based learning. In Proceedings of the 25th International Conference on Machine Learning, pages 376. ACM, 2008.

[9] Joseph Hassell, Boanerges Aleman-Meza, and I. Budak Arpinar. Ontology-driven automatic entity disambiguation in unstructured text, volume 4273. Springer Berlin Heidelberg, 2006.

[10] Iain M. Johnstone and D. Michael Titterington. Statistical challenges of high-dimensional data. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 367(1906):4237, 2009.

[11] Ron Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. IJCAI, 14(2):1137, 1995.

[12] Mario Koppen. The curse of dimensionality. In 5th Online World Conference on Soft Computing in Industrial Applications (WSC5), pages 4, 2000.

[13] Jorma Laaksonen. Local subspace classifier. In Artificial Neural Networks ICANN'97, pages 637. Springer, 1997.

PAGE 40

[14] Sang Min Lee et al. Spam detection using feature selection and parameters optimization. Intelligent and Software Intensive Systems (CISIS), pages 883, 2010.

[15] Huan Liu and Hiroshi Motoda. Feature Extraction, Construction and Selection: A Data Mining Perspective. Springer, 1998.

[16] Yvan Saeys, Inaki Inza, and Pedro Larranaga. A review of feature selection techniques in bioinformatics. Bioinformatics, 23(19):2507, 2007.

[17] Amit Satsangi and Osmar R. Zaiane. Contrasting the contrast sets: An alternative approach. 11th International Database Engineering and Applications Symposium, 2007.

[18] Margaret A. Shipp, Ken N. Ross, Pablo Tamayo, Andrew P. Weng, Jeffery L. Kutok, Ricardo C. T. Aguiar, Michelle Gaasenbeek, Michael Angelo, Michael Reich, Geraldine S. Pinkus, et al. Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nature Medicine, 8(1):68, 2002.

[19] Laura J. van 't Veer, Hongyue Dai, Marc J. Van De Vijver, Yudong D. He, Augustinus A. M. Hart, Mao Mao, Hans L. Peterse, et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415(6871):530, 2002.

[20] Rene Vidal et al. Generalized Principal Component Analysis, volume 1. Electronics Research Laboratory, College of Engineering, University of California, 2006.

[21] Petros Xanthopoulos, Panos M. Pardalos, and Theodore B. Trafalis. Robust Data Mining, volume 1. Springer, 2010.

[22] Lei Yu and Huan Liu. Feature selection for high-dimensional data: A fast correlation-based filter solution. 3:856, 2003.

[23] Lingsong Zhang and Xihong Lin. Some considerations of classification for high dimension low-sample size data. Statistical Methods in Medical Research, 2011.

PAGE 41

BIOGRAPHICAL SKETCH

Orestis Panos Panagopoulos earned his Diploma in production engineering and management from the Technical University of Crete, Greece (2010). He has also received the Exit Certificate from the English Language Institute (ELI) at the University of Florida, U.S.A. (2011). He received his M.Sc. in industrial and systems engineering from the University of Florida in the summer of 2014. Orestis has broad experience ranging from information technology to sales and production engineering, acquired during his internships as a management assistant at the Athenian Brewery S.A., a subsidiary of Heineken (Athens, Greece), during the years 2005-2010. His research interests focus on data mining, machine learning, statistical modeling and graph theory. He is specializing in clustering, feature selection and dimensionality reduction techniques, and assignment problems in large networks.



PAGE 1

TAMING THE CURSE OF DIMENSIONALITY IN KERNELS AND NOVELTY DETECTION

Paul F. Evangelista
Department of Systems Engineering
United States Military Academy
West Point, NY 10996

Mark J. Embrechts
Department of Decision Sciences and Engineering Systems
Rensselaer Polytechnic Institute
Troy, New York 12180

Boleslaw K. Szymanski
Department of Computer Science
Rensselaer Polytechnic Institute
Troy, New York 12180

Abstract. The curse of dimensionality is a well known but not entirely well-understood phenomenon. Too much data, in terms of the number of input variables, is not always a good thing. This is especially true when the problem involves unsupervised learning or supervised learning with unbalanced data (many negative observations but minimal positive observations). This paper addresses two issues involving high dimensional data. The first issue explores the behavior of kernels in high dimensional data. It is shown that variance, especially when contributed by meaningless noisy variables, confounds learning methods. The second part of this paper illustrates methods to overcome dimensionality problems with unsupervised learning utilizing subspace models. The modeling approach involves novelty detection with the one-class SVM.

1 Introduction

High dimensional data often create problems. This problem is exacerbated if the training data is only one class, unknown classes, or significantly unbalanced classes. Consider a binary classification problem that involves computer intrusion detection. Our intention is to classify network traffic, and we are interested in classifying the traffic as either attacks (intruders) or nonattacks. Capturing network traffic is simple: hook up to a LAN cable, run tcpdump, and you can fill a hard drive within minutes. These captured network connections can be described with attributes; it is not uncommon for a network connection to be described with over 100 attributes [14]. However, the class of each connection will be unknown, or perhaps with reasonable confidence we can assume that all of the connections do not involve any attacks.

The above scenario can be generalized to other security problems as well. Given a matrix of data, X, containing N observations and m attributes, we are interested in classifying this data as either potential attackers (positive class) or nonattackers (negative class). If m is large, and our labels, y ∈ R^{N×1}, are unbalanced (usually plenty of known nonattackers and few instances of attacks), one class (all nonattackers), or unknown, increased dimensionality rapidly becomes a problem and feature selection is not feasible due to the minimal examples (if any) of the attacker class.

2 Recent Work

The primary model explored will be the one-class SVM. This is a novelty detection algorithm originally proposed in [27]. The model is relatively simple but a powerful method to detect novel events that occur after learning from a training set of normal events. Formally stated, the one-class SVM considers x_1, x_2, ..., x_N ∈ X instances of training observations and utilizes the popular "kernel trick" to introduce a nonlinear mapping of x_i → Φ(x_i). Under Mercer's theorem, it is possible to evaluate the inner product of two feature mappings, such as Φ(x_i) and Φ(x_j), without knowing the actual feature mapping. This is possible because ⟨Φ(x_i), Φ(x_j)⟩ = κ(x_i, x_j) [2]. Φ will be considered a mapping into the feature space, F, from X.

The following minimization function attempts to squeeze R, which can be thought of as the radius of a hypersphere, as small as possible in order to fit all of the training samples. If a training sample will not fit, ξ_i is a slack variable to allow for this. A free parameter, ν ∈ (0, 1), enables the modeler to adjust the impact of the slack variables.

PAGE 2

min_{R ∈ R, ξ ∈ R^N, c ∈ F}   R² + (1/(νN)) Σ_i ξ_i     (1)

subject to   ||Φ(x_i) − c||² ≤ R² + ξ_i,   ξ_i ≥ 0   for i ∈ [N].

The Lagrangian dual of the one-class SVM is shown below in equation 2.

max_α   Σ_i α_i κ(x_i, x_i) − Σ_{i,j} α_i α_j κ(x_i, x_j)     (2)

subject to   0 ≤ α_i ≤ 1/(νN)   and   Σ_i α_i = 1.

Cristianini and Shawe-Taylor provide a detailed explanation of one-class SVMs in [24]. Stolfo and Wang [25] successfully apply the one-class SVM to the SEA dataset and compare it with several of the techniques mentioned above. Chen uses the one-class SVM for image retrieval [8]. Schölkopf et al. explore the above formulation of the one-class SVM and other formulations in [23]. Fortunately there is also freely available software that implements the one-class SVM, written in C++ by Chang and Lin [7].

The dimensionality problem faced by the one-class SVM has been mentioned in several papers, however it is typically a "future works" type of discussion. Tax and Duin clearly mention that dimensionality is a problem in [27], however they offer no suggestions to overcome this. Modeling in subspaces, which is the proposed method to overcome this problem, is not an altogether novel concept. In data mining, subspace modeling to overcome dimensionality is a popular approach. Aggarwal discusses this in [1]. Parsons et al. provide a survey of subspace clustering techniques in [21]. The curse of dimensionality is largely a function of class imbalance and our a priori knowledge of the distribution of (x | y). This implies that the curse of dimensionality is a problem that impacts unsupervised problems the most severely, and it is not surprising that data mining clustering algorithms, an unsupervised method, have come to realize the value of modeling in subspaces.

3 Analytical Investigation

3.1 The Curse of Dimensionality, Kernels, and Class Imbalance

Machine learning and data mining problems typically seek to show a degree of similarity between observations, often as a distance metric. Beyer et al. discuss the problem of high dimensional data and distance metrics in [3], presenting a probabilistic approach and illustrating that the maximally distant point and minimally distant point converge in distance as dimensionality increases. A problem with distance metrics in high dimensional space is that distance is typically measured across volume. Volume increases exponentially as dimensionality increases, and points tend to become equidistant. The curse of dimensionality is explained with several artificial data problems in [15].

Kernel based pattern recognition, especially in the unsupervised domain, is not entirely robust against high dimensional input spaces. A kernel is nothing more than a similarity measure between two observations. Given two observations, x_1 and x_2, the kernel between these two points is represented as κ(x_1, x_2). A large value for κ(x_1, x_2) indicates similar points, where smaller values indicate dissimilar points. Typical kernels include the linear kernel, κ(x_1, x_2) = ⟨x_1, x_2⟩, the polynomial kernel, κ(x_1, x_2) = (⟨x_1, x_2⟩ + 1)^p, and the popular gaussian kernel, κ(x_1, x_2) = exp( −||x_1 − x_2||² / (2σ²) ). As shown, these kernels are all functions of inner products. If the variables within x_1 and x_2 are considered random variables, these kernels can be modeled as functions of random variables. The fundamental premise of pattern recognition is the following:

E( κ(x_1, x_2) | y_1 = y_2 )  >  E( κ(x_1, x_2) | y_1 ≠ y_2 ).     (3)

If this premise is consistently true, good performance occurs. By modeling these kernels as functions of random variables, it can be shown that the addition of noisy, meaningless input variables degrades performance and the likelihood of the fundamental premise shown above.

In a classification problem, the curse of dimensionality is a function of the degree of imbalance. If there are a small number of positive examples to learn from, feature selection is possible but difficult. With unbalanced data, significant evidence is required to illustrate that a feature is not meaningful. If the problem is balanced, the burden is not as great. Features are much more easily filtered and selected.

PAGE 3

A simple explanation of this is to consider a two sample Kolmogorov test [22]. This is a classical statistical test to determine whether or not two samples come from the same distribution, and this test is general regardless of the distribution. In classification models, a meaningful variable should behave differently depending on the class, implying distributions that are not equal. Stated in terms of distributions, if x is any variable taken from the space of all variables in the dataset, (F_x(x) | y = 1) should not be equivalent to (G_x(x) | y = −1). F_x(x) and G_x(x) simply represent the cumulative distribution functions of (x | y = 1) and (x | y = −1), respectively. In order to apply the two sample Kolmogorov test, the empirical distribution functions of F_x(x) and G_x(x) must be calculated from a given sample, and these distribution functions will be denoted as F̂_{N1}(x) and Ĝ_{N2}(x). N_1 will equate to the number of samples in the minority class, and N_2 equates to the number of samples in the majority class. These empirical distribution functions are easily derived from the order statistics of the given sample, which is shown in [22]. The Kolmogorov two sample test states that if the supremum of the difference of these functions exceeds a tabled critical value depending on the modeler's choice of α (sum of probabilities in two tails), then these two distributions are significantly different. Stated formally, our hypothesis is that F_x(x) = G_x(x). We reject this hypothesis with a confidence of (1 − α) if equation 4 is true.

D_{N1,N2} = sup_x | F̂_{N1}(x) − Ĝ_{N2}(x) |  >  D_{N1,N2,α}.     (4)

For larger values of N_1 and N_2 (both N_1 and N_2 greater than 20) and α = .05, we can consider equation 5 to illustrate an example. This equation is found in the tables listed in [22]:

D_{N1,N2, α=.05} = 1.36 sqrt( (N_1 + N_2) / (N_1 N_2) ).     (5)

If N_2 is fixed at 100, and N_1 is considered the minority class, it is possible to plot the relationship between N_1 and the critical value necessary to reject the hypothesis.

Figure 1. Plot of critical value for two sample Kolmogorov test with fixed N_2, α = .05 (the annotations in the original figure indicate increasing class balance and an increasing curse of dimensionality along opposite directions of the horizontal axis)

Figure 1 illustrates the effect of class imbalance on feature selection. If the classes are not balanced, as is the case when N_1 = 20 and N_2 = 100, there is a large value required for D_{N1,N2}. It is also evident that if the classes were more severely imbalanced, D_{N1,N2} would continue to grow exponentially. As the classes balance, D_{N1,N2} and the critical value begin to approach a limit. The point of this exercise was to show that the curse of dimensionality is a function of the level of imbalance between the classes, and the two sample Kolmogorov test provides a compact and statistically grounded explanation for this.
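Both the test statistic and the critical value in equation 5 are straightforward to compute; a short sketch using scipy (the sample sizes and the normal data are illustrative):

```python
import numpy as np
from scipy.stats import ks_2samp

def ks_critical_value(n1, n2, c_alpha=1.36):
    """Large-sample critical value for the two-sample Kolmogorov test;
    c_alpha = 1.36 corresponds to alpha = .05 (equation 5)."""
    return c_alpha * np.sqrt((n1 + n2) / (n1 * n2))

rng = np.random.default_rng(0)
minority = rng.normal(loc=1.0, size=20)     # N1 = 20 samples of the minority class
majority = rng.normal(loc=0.0, size=100)    # N2 = 100 samples of the majority class

res = ks_2samp(minority, majority)          # statistic = sup |F_N1(x) - G_N2(x)|
print(res.statistic, ks_critical_value(20, 100), res.pvalue)
# The feature is declared meaningful only if the statistic exceeds the critical
# value, which is a high bar when the minority class is small.
```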

PAGE 4

3.2 Kernel Behavior in High Dimensional Input Space

An example is given in this section which illustrates the impact of dimensionality on linear kernels and gaussian kernels.

Consider two random vectors that will serve as artificial data for this example:

x_1 = (z_1, z_2, ..., z_m),   z_i ~ N(0, 1) i.i.d.
x_2 = (z'_1, z'_2, ..., z'_m),   z'_i ~ N(0, 1) i.i.d.

Both vectors have dimension m. Let v_i = z_i z'_i. The expected value of v_i is zero. v_i is the product of two standard normal random variables, which follows an interesting distribution discussed in [12]. The plot of this distribution is shown in figure 2.

Figure 2. Plot of the distribution of v_i = z_i z'_i

To find the expectation of a linear kernel, it is straightforward to see that E⟨x_1, x_2⟩ = E( Σ_i v_i ) = E( z_1 z'_1 + z_2 z'_2 + ... + z_m z'_m ) = 0. The variance of the linear kernel can be found as follows: f_{z_i, z'_i}(z_i, z'_i) is bivariate normal,

f_{z_i, z'_i}(z_i, z'_i) = (1/(2π)) exp( −(z_i² + z'_i²)/2 ),

f_v(v) = ∫_{−∞}^{∞} f_{z_i, z'_i}( z_i, v/z_i ) (1/|z_i|) dz_i,

E(v) = 0,   variance = E(v²) = ∫_{−∞}^{∞} v² f_v(v) dv = 1   (verified by numerical integration).

Again considering the linear kernel as a function of random variables, κ(x_1, x_2) = ⟨x_1, x_2⟩ = Σ_{i=1}^{m} v_i is distributed with a mean of 0 and a variance of Σ_{i=1}^{m} 1 = m.

In classification problems, however, it is assumed that the distributions of the variables for one class are not the same as the distributions of the variables for the other class. Let us now consider v^− as a product of

PAGE 5

dissimilar distributions, and v^+ as a product of similar distributions. Let v^− = (z_i − 1)(z'_i + 1). v^− will be distributed with a mean of μ^− = E( z_i z'_i + z_i − z'_i − 1 ) = −1, and a variance of 3 (verified through numerical integration). The linear kernel of the dissimilar distributions can be expressed as

κ( x_1 − 1, x_2 + 1 ) = Σ_{i=1}^{m} v^−_i.

This linear kernel is distributed with the following parameters:

mean^− = m μ^− = −m,   variance = m σ² = 3m.

For the similar observations, let v^+ = (z_i + 1)(z'_i + 1) = (z_i − 1)(z'_i − 1). The parameters of the kernel for the similar observations can be found in the same manner. v^+ is distributed with a mean of μ^+ = E( z_i z'_i + z_i + z'_i + 1 ) = 1 and a variance of σ² = 3. The linear kernel of the similar distributions can be expressed as

κ( x_1 + 1, x_2 + 1 ) = Σ_{i=1}^{m} v^+_i.

This kernel is distributed with the following parameters:

mean^+ = m μ^+ = m,   variance = m σ² = 3m.

The means and variances of the distributions of the linear kernels are easily tractable, and this is all the information that we need to analyze the effect of dimensionality on these kernels. In the above example, the mean of every variable for dissimilar observations differs by 2. This is consistent for every variable. Obviously, no dataset is this clean, however there are still interesting observations that can be made. Consider that rather than each variable differing by 2, they differ by some value δ_i. If δ_i is a small value, or even zero for some instances (which would be the case for pure noise), this variable will contribute minimally in distinguishing similar from dissimilar observations, and furthermore the variance of this variable will be entirely contributed. Also notice that, at the rate of 3m, variance grows large fast.

Based on this observation, an assertion is that for the binary classification problem, bimodal variables are desirable. Each mode will correspond to either the positive or negative class. Large deviations in these modes, with minimal variation within a class, are also desired. An effective model must be able to distinguish v^− from v^+. In order for this to occur, the model needs good separation between mean^− and mean^+ and variance that is under control.

It is also interesting to explore the gaussian kernel under the same example. For the gaussian kernel, κ(x_1, x_2) = exp( −||x_1 − x_2||² / (2σ²) ). This kernel is entirely dependent upon the behavior of ||x_1 − x_2||² and the modeler's choice of the parameter σ (which has no relation to variance).

Restricting our attention to ||x_1 − x_2||², an initial observation is that this expression is nothing more than the euclidean distance squared. Also, if x_1 and x_2 contain variables that are distributed N(0, 1), then (x_1 − x_2) contains variables distributed normally with a mean of 0 and a variance of 2.

Let w = (z_i − z'_i)², implying that w/2 is a chi-squared distribution with a mean of one (which will be annotated as χ²(1)). This also indicates that w = 2 χ²(1), indicating that w has a mean of 2 and a variance of 8 (verified by numerical integration).

Therefore, ||x_1 − x_2||² = Σ_{i=1}^{m} w_i will have a distribution with a mean of 2m and a variance of 8m. Notice that the variance grows much faster under this formulation, indicating even more sensitivity to noisy variables.

The purpose of the above example is to show how every variable added will contribute to the overall behavior of the kernel. If the variable is meaningful, the pattern contributed to the −1 class is not equivalent to the pattern contributed to the +1 class. The meaningfulness of the variable can also be considered in terms of cost and benefit. The benefit of including a variable in a classification model is the contribution of the variable towards pushing mean^− away from mean^+. The cost of including a variable involves the variance. This variance will be included regardless of the significance of the benefit.
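The cost-benefit remark above can be checked with a quick Monte Carlo simulation. In the sketch below only the first two variables carry a mean shift and the remaining variables are pure noise (an illustrative choice, anticipating the artificial data of the next section); the gap between the kernel means stays fixed while the spread grows with dimensionality:

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_kernel_samples(m, similar, n_pairs=20000):
    """Linear kernel values <x1, x2> where only the first two coordinates carry
    the class pattern (shift +1 or -1) and the other m - 2 coordinates are noise."""
    z1 = rng.normal(size=(n_pairs, m))
    z2 = rng.normal(size=(n_pairs, m))
    shift = np.zeros(m)
    shift[:2] = 1.0
    x1 = z1 + shift                        # x1 always drawn from the positive pattern
    x2 = z2 + (shift if similar else -shift)
    return np.sum(x1 * x2, axis=1)

for m in (2, 10, 100, 1000):
    k_same = linear_kernel_samples(m, similar=True)
    k_diff = linear_kernel_samples(m, similar=False)
    gap = k_same.mean() - k_diff.mean()    # stays near 4: only 2 variables are informative
    spread = k_same.std()                  # grows with m: every noisy variable adds variance
    print(m, round(gap, 2), round(spread, 2), round(gap / spread, 3))
```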

PAGE 6

3.3 The Impact of Dimensionality on the One-Class SVM

In order to illustrate the impact of dimensionality on kernels and the one-class SVM specifically, an experiment with artificial data was constructed. This data models a simple pattern involving standard normal distributions where the positive class and negative class have a difference of 2 between their means. This model can be presented as follows:

x^+ = (z_1 + 1, z_2 + 1, z_3, ..., z_m),   z_i ~ N(0, 1) i.i.d.
x^− = (z_1 − 1, z_2 − 1, z_3, ..., z_m),   z_i ~ N(0, 1) i.i.d.

The true pattern only lies in the first two variables. All remaining variables are noise. Three types of kernels were examined: the linear kernel, polynomial kernel, and gaussian kernel. Only the results from the gaussian kernel are shown here, however degradation of performance occurred with all kernels. The performance metric used was the area under the ROC curve (AUC).

Table 1. One-class SVM (gaussian kernel) experiment for various dimensions on artificial data

Dimensions   AUC      R²
2            0.9201   0.5149
5            0.8978   0.4665
10           0.8234   0.4356
50           0.7154   0.3306
100          0.6409   0.5234
250          0.6189   0.4159
500          0.5523   0.6466
1000         0.5209   0.4059

The gaussian kernels in this experiment were tuned using an auto-tuning method. Typically for gaussian kernels, a validation set of positive and negative labeled data is available for tuning σ. In unsupervised learning, these examples of positive labeled data do not exist. Therefore, the best tuning possible is to achieve some variation in the values of the kernel without values concentrated on either extreme. If σ is too large, all of the values will tend towards 1. If too small, they tend to 0. The auto-tuning function ensures that the off-diagonal values for κ(x^+, x^−) average between .4 and .6, with a min value greater than .2.

4 A Framework to Overcome High Dimensionality

A novel framework for unsupervised learning, or anomaly detection, has been investigated to solve unsupervised learning problems of high dimension [10, 11]. This technique is designed for unsupervised models, however the fusion of model output applies to any type of classifier that produces a soft (real valued) output. This framework involves exploring subspaces of the data, training a separate model for each subspace, and then fusing the decision variables produced by the test data for each subspace. Intelligent subspace selection has also been introduced within this framework.

Combinations of multiple classifiers, or ensemble techniques, is a very active field of research today. However, the field remains relatively loosely structured as researchers continue to build the theory supporting the principles of classifier combinations [18]. Significant work in this field has been contributed by Kuncheva in [16-19]. Bonissone et al. investigated the effect of different fuzzy logic triangular norms based upon the correlation of decision values from multiple classifiers [4]. The majority of work in this field has been devoted to supervised learning, with less effort addressing unsupervised problems [26]. The research that does address unsupervised ensembles involves clustering almost entirely. There is a vast amount of literature that discusses subspace clustering algorithms [21]. The recent work that appears similar in motivation to our technique includes Yang et al., who develop a subspace clustering model based upon Pearson's R correlation [28], and Ma and Perkins, who utilize the one-class SVM for time series prediction and combine results from intermediate phase spaces [20]. The work in this paper has also been inspired by Ho's Random Subspace Method in [13]. Ho's method randomly selects subspaces and constructs a decision tree for each subspace; trees are then aggregated in the end by taking the mean. Breiman's work with bagging [5] and random forests [6] was also a significant contribution in motivating this work. Breiman's bagging technique involves

PAGE 7

bootstrap sampling from a training set and creating a decision tree for each sample. Breiman also uses the mean as the aggregator. The random forest technique explores decision tree ensembles from random subsets of features, similar to Ho's method.

Figure 3. A sketch of subspace modeling to seek synergistic results. (The original figure shows an overall system in which each subspace of inputs feeds a separate learning machine, the decision variables are fused, and a synergistic ROC curve, or fuzzy ROC curve if using fuzzy logic aggregation, is produced; a threshold t separates the distributions of negative (healthy) and positive (unhealthy) points.)

Figure 4. Aggregation operators. The spectrum runs from intersections (T-norms) through averages to unions (T-conorms): minimum min(x, y), algebraic product xy, and bounded product max(0, x + y − 1) are T-norms; maximum max(x, y), algebraic sum x + y − xy, and bounded sum min(1, x + y) are T-conorms.

The technique we propose illustrates that unsupervised learning in subspaces of high dimensional data will typically outperform unsupervised learning in the high dimensional data space as a whole. Furthermore, the following hypotheses show exceptional promise based on initial empirical results:

1. Intelligent subspace modeling will provide further improvement of detection beyond a random selection of subspaces.
2. Fuzzy logic aggregation techniques create the fuzzy ROC curve, illustrating improved AUC by selecting proper aggregation techniques.

Promising results from this approach have been published in [10, 11]. As previously discussed, aggregation of models with fuzzy logic aggregators is an important aspect. Given unbalanced data (minority positive class), it has been observed that fusion with T-norms behaves well and improves performance. Figure 4 illustrates the spectrum of fuzzy logic aggregators.

The results shown in table 2 and figure 5 illustrate the improvements obtained through our ensemble techniques for unsupervised learning. The plot of the ROC curves shows the results from using 26 original variables that represented the Schonlau et al. (SEA) data [9] as one group of variables with the one-class SVM and the result of creating 3 subspaces of features and fusing the results to create the fuzzy ROC curve. It is interesting to notice in the table of results that nearly every aggregation technique demonstrated improvement, especially in the SEA data, with the most significant improvement in the T-norms.

The ionosphere data is available from the UCI repository, and it consists of 34 variables that represent different radar signals received while investigating the ionosphere for either good or bad structure. For this experiment we again chose l = 3.
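A skeletal version of the fused-subspace framework described above, with one one-class SVM per feature subspace and a T-norm aggregation of the rescaled decision values (the disjoint subspace split, the min-max rescaling, and all parameter values are illustrative; the intelligent subspace selection step is not reproduced here):

```python
import numpy as np
from sklearn.svm import OneClassSVM

def fuse_subspaces(X_train, X_test, subspaces, t_norm=np.minimum):
    """Train one one-class SVM per feature subspace and fuse the rescaled
    decision values of the test points with a T-norm (default: minimum)."""
    fused = None
    for cols in subspaces:
        model = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(X_train[:, cols])
        s = model.decision_function(X_test[:, cols])
        s = (s - s.min()) / (s.max() - s.min() + 1e-12)     # rescale to [0, 1]
        fused = s if fused is None else t_norm(fused, s)
    return fused                                            # soft scores for an ROC curve

# Example: 12 features split into l = 3 disjoint subspaces
rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 12))
X_test = np.vstack([rng.normal(size=(50, 12)),              # normal test points
                    rng.normal(size=(50, 12)) + 2.0])       # novel test points
subspaces = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
scores = fuse_subspaces(X_train, X_test, subspaces)
```

Replacing np.minimum with np.multiply gives the algebraic product T-norm from Figure 4; the fused scores can then be thresholded or fed to an ROC analysis.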

PAGE 8

Table 2. Results of SEA data with diverse and non-diverse subsets

                                      SEA data   Ionosphere data
Base AUC (using all variables)        .7835      .931
T-norms:    minimum                   .90        .96
            algebraic product         .91        .61
T-conorms:  maximum                   .84        .69
            algebraic sum             .89        .69

Figure 5. ROC for SEA data using algebraic product with contention. (The plot compares the fuzzy ROC curve, AUC = 0.9318, against the one-class SVM on all 26 variables, AUC = .7835.)

5 Discussion and Conclusion

There were two components to the research presented in this paper. The first component involved exposing the impact of the curse of dimensionality with kernel methods. This involved illustrating that more is not always better in terms of variables, but more importantly that the impact of the curse of dimensionality grows as class imbalance becomes more severe. Kernel methods are not immune to problems involving high dimensional data, and these problems need to be understood and managed.

The second component of this research involved the discussion and brief illustration of a proposed framework for unsupervised modeling in subspaces. Unsupervised learning, especially novelty detection, has important applications in the security domain. This applies especially to computer and network security. Future directions for this research include exposing the theoretical foundations of unsupervised ensemble methods and exploration of other ensembles for the unbalanced classification problem.

References

1. Charu C. Aggarwal and Philip S. Yu. Outlier Detection for High Dimensional Data. Santa Barbara, California, 2001. Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data.

2. Kristin P. Bennett and Colin Campbell. Support Vector Machines: Hype or Hallelujah. 2(2), 2001.

3. Kevin Beyer, Jonathan Goldstein, Raghu Ramakrishnan, and Uri Shaft. When Is "Nearest Neighbor" Meaningful? Lecture Notes in Computer Science, 1540:217-235, 1999.

4. Piero Bonissone, Kai Goebel, and Weizhong Yan. Classifier Fusion using Triangular Norms. Cagliari, Italy, June 2004. Proceedings of Multiple Classifier Systems (MCS) 2004.

PAGE 9

Figure 6. ROC plot for ionosphere data with minimize aggregation technique. (The plot compares the fuzzy ROC curve, AUC = 0.9604, against the one-class SVM on all 34 variables, AUC = .931.)

5. Leo Breiman. Bagging predictors. Machine Learning, 24(2):123-140, 1996.

6. Leo Breiman. Random forests. Machine Learning, 45(1):5-32, 2001.

7. Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A Library for Support Vector Machines. http://www.scie.ntu.edu.tw/cjlin/libsvm, Accessed 5 September, 2004.

8. Yunqiang Chen, Xiang Zhou, and Thomas S. Huang. One-Class SVM for Learning in Image Retrieval. Thessaloniki, Greece, 2001. Proceedings of IEEE International Conference on Image Processing.

9. William DuMouchel, Wen Hua Ju, Alan F. Karr, Matthius Schonlau, Martin Theus, and Yehuda Vardi. Computer Intrusion: Detecting Masquerades. Statistical Science, 16(1):1-17, 2001.

10. Paul F. Evangelista, Piero Bonissone, Mark J. Embrechts, and Boleslaw K. Szymanski. Fuzzy ROC Curves for the One Class SVM: Application to Intrusion Detection. Montreal, Canada, August 2005. International Joint Conference on Neural Networks.

11. Paul F. Evangelista, Piero Bonissone, Mark J. Embrechts, and Boleslaw K. Szymanski. Unsupervised Fuzzy Ensembles and Their Use in Intrusion Detection. Bruges, Belgium, April 2005. European Symposium on Artificial Neural Networks.

12. Andrew G. Glen, Lawrence M. Leemis, and John H. Drew. Computing the Distribution of the Product of Two Continuous Random Variables. Computational Statistics and Data Analysis, 44(3):451-464, 2004.

13. Tin Kam Ho. The Random Subspace Method for Constructing Decision Forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8):832-844, 1998.

14. Alexander Hofmann, Timo Horeis, and Bernhard Sick. Feature Selection for Intrusion Detection: An Evolutionary Wrapper Approach. Budapest, Hungary, July 2004. International Joint Conference on Neural Networks.

15. Mario Koppen. The Curse of Dimensionality. (held on the internet), September 4-18, 2000. 5th Online World Conference on Soft Computing in Industrial Applications (WSC5).

16. Ludmila I. Kuncheva. 'Fuzzy' vs. 'Non-fuzzy' in Combining Classifiers Designed by Boosting. IEEE Transactions on Fuzzy Systems, 11(3):729-741, 2003.

17. Ludmila I. Kuncheva. That Elusive Diversity in Classifier Ensembles. Mallorca, Spain, 2003. Proceedings of 1st Iberian Conference on Pattern Recognition and Image Analysis.

18. Ludmila I. Kuncheva. Combining Pattern Classifiers: Methods and Algorithms. John Wiley and Sons, Inc., 2004.

19. Ludmila I. Kuncheva and C. J. Whitaker. Measures of Diversity in Classifier Ensembles. Machine Learning, 51:181-207, 2003.

20. Junshui Ma and Simon Perkins. Time-series Novelty Detection Using One-class Support Vector Machines. Portland, Oregon, July 2003. International Joint Conference on Neural Networks.

21. Lance Parsons, Ehtesham Haque, and Huan Liu. Subspace Clustering for High Dimensional Data: A Review. SIGKDD Explorations, Newsletter of the ACM Special Interest Group on Knowledge Discovery and Data Mining, 2004.

PAGE 10

22. Vijay K. Rohatgi and A. K. Md. Ehsanes Saleh. An Introduction to Probability and Statistics. Wiley, second edition, 2001.

23. Bernhard Schölkopf, John C. Platt, John Shawe-Taylor, Alex J. Smola, and Robert C. Williamson. Estimating the Support of a High Dimensional Distribution. Neural Computation, 13:1443-1471, 2001.

24. John Shawe-Taylor and Nello Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.

25. Salvatore Stolfo and Ke Wang. One Class Training for Masquerade Detection. Florida, 19 November 2003. 3rd IEEE Conference Data Mining Workshop on Data Mining for Computer Security.

26. Alexander Strehl and Joydeep Ghosh. Cluster Ensembles - A Knowledge Reuse Framework for Combining Multiple Partitions. Journal of Machine Learning Research, 3:583-617, December 2002.

27. David M. J. Tax and Robert P. W. Duin. Support Vector Domain Description. Pattern Recognition Letters, 20:1191-1199, 1999.

28. Jiong Yang, Wei Wang, Haixun Wang, and Philip Yu. δ-clusters: Capturing Subspace Correlation in a Large Data Set. pages 517-528. 18th International Conference on Data Engineering, 2004.