
Category Space Dimensionality Reduction for Supervised Learning


Material Information

Title:
Category Space Dimensionality Reduction for Supervised Learning
Physical Description:
1 online resource (95 p.)
Language:
english
Creator:
Smith, Anthony O
Publisher:
University of Florida
Place of Publication:
Gainesville, Fla.
Publication Date:

Thesis/Dissertation Information

Degree:
Doctorate ( Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Computer Engineering, Computer and Information Science and Engineering
Committee Chair:
Rangarajan, Anand
Committee Members:
Ho, Jeffrey Yih Chian
Vemuri, Baba C
Banerjee, Arunava
Aytug, Haldun

Subjects

Subjects / Keywords:
classification -- multiclass -- orthogonal -- reduction
Computer and Information Science and Engineering -- Dissertations, Academic -- UF
Genre:
Computer Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract:
In this research, we investigate the reasons that make the multiclass classification problem difficult and suggest that category ambiguity is at the heart of the problem. We analyze previous efforts at multiclass classification and explain how they do not account for this ambiguity when they suggest simple voting schemes or offer the combination of pairwise binary classifiers as solutions. Then we demonstrate that most other methods lack the  notion of a geometric space to represent the classes. These essential concepts have been neglected because in the past, classes were perceived to be independent. This leads to limited approaches in which techniques assume distinct classes and assign nominal labels to them. We argue that class relationships exist that must be exploited. The approach we propose gives an alternate method for dimensionality reduction so that multiclass classification techniques can overcome several of the problems that exist with pairwise classification schemes and exhibits better performance on many problems. We look to separate objects according to similar qualities and characteristics by projecting to a space---a Category Space---defined by the number of classes and the properties of the classes. After dimensionality reduction to the category space, we use a classification technique to evaluate performance on a large collection of benchmark data sets. Finally, we detail the strengths of our approach, and provide a framework for alternative objective functions for linear and kernel-based projections. Our contributions span the range of dimensionality reduction, classification, and optimization for multiclass problems.
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Bibliography:
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility:
by Anthony O Smith.
Thesis:
Thesis (Ph.D.)--University of Florida, 2013.
Local:
Adviser: Rangarajan, Anand.

Record Information

Source Institution:
UFRGP
Rights Management:
Applicable rights reserved.
Classification:
lcc - LD1780 2013
System ID:
UFE0045768:00001

Full Text

CATEGORY SPACE DIMENSIONALITY REDUCTION FOR SUPERVISED LEARNING

By

ANTHONY O'NEAL SMITH

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2013

© 2013 Anthony O'Neal Smith

This work is dedicated to my loving children, Myles and Tori Smith, and to the most amazing woman I know, Tomika Smith, my wife.

ACKNOWLEDGMENTS

"If there is no struggle, there is no progress. This struggle may be a moral one; or it may be a physical one; or it may be both moral and physical; but it must be a struggle." Frederick Douglass, West India Emancipation

First and foremost I would like to thank God for his grace and blessings that have given me the strength to endure this adventure. Many people are responsible for my productive and enjoyable learning experience at the University of Florida and for the ideas and results in this dissertation research. This work is dedicated to my loving family. Their patience and love have inspired and sustained me during the many trials. I appreciate your encouragement and sacrifice during the entire process. I offer my sincerest gratitude to my supervisor, Dr. Anand Rangarajan, who has guided me throughout my research with his knowledge and encouragement. His mentorship has given me a foundation upon which to build. I would also like to show gratitude to my committee members, Dr. Arunava Banerjee, Dr. Jeffrey Ho, Dr. Haldun Aytug, and Dr. Baba C. Vemuri, for the insightful comments and suggestions they have shared. I am indebted for the assistance and friendship of my peers and colleagues, Dr. Adrian Peter, Dr. Josef Allen, Dr. Eric Spellman, Dr. Ken Sartor, and Mr. Kareem Simmons. Finally, I offer my greatest regards and blessings to my parents for all the love and support.

TABLE OF CONTENTS

ACKNOWLEDGMENTS  4
LIST OF TABLES  7
LIST OF FIGURES  8
ABSTRACT  9

CHAPTER

1 INTRODUCTION  10
  1.1 The Existence of Discrimination  10
  1.2 Geometry of Classification  12
  1.3 Multiclass Learning  14
  1.4 Category Vector Space  16

2 EVOLUTION OF CLASSIFICATION PARADIGMS  19
  2.1 History of Discrimination and Dimensionality Reduction  19
  2.2 Nearest Neighbor Methods  20
  2.3 Probabilistic Methods for Classification  21
  2.4 Dimensionality Reduction for Classification  22
  2.5 Discriminant Analysis  24
  2.6 Discriminant Analysis for Multiclass Classification  26
  2.7 Background Summary  33

3 A CATEGORY SPACE APPROACH FOR THE MULTICLASS PROBLEM  34
  3.1 Overview of The Category Space  34
  3.2 The Mathematical Framework for Category Spaces  35

4 ALTERNATIVE APPROACHES TO SUPERVISED DIMENSIONALITY REDUCTION  43
  4.1 Kernel Based Quadratic Formulation  43
  4.2 L1 Norm Objective Formulation  48
  4.3 Bias Estimation: Origin  50

5 RESULTS AND DISCUSSION  54
  5.1 Performance Measures  54
  5.2 Accuracy Analysis  55

6 CONCLUSIONS  74
  6.1 Contributions  74
  6.2 Future Work  75

APPENDIX

A RACE FACE IMAGE DATA  76
  A.1 Race Space Face Data Set Overview  76

B THEOREMS, LEMMAS, & COROLLARY  82
  B.1 Theorem Overview (Bolla et al.)  82
  B.2 Theorem 3.1 (Bolla et al.)  82
  B.3 Lemma 3.1 (Bolla et al.)  82
  B.4 Corollary 3.1 (Rapcsák et al.)  83

C SPLINE-BASED CLASSIFICATION  84
  C.1 Spline Overview  84

REFERENCES  88

BIOGRAPHICAL SKETCH  95

LIST OF TABLES

5-1  List of algorithms  56
5-2  List of test data sets  57
5-3  Synthetic Gaussian data accuracy metrics  58
5-4  CalTech grayscale image accuracy metrics  58
5-5  Coil image accuracy metrics  59
5-6  Female grayscale image accuracy metrics  60
5-7  Male grayscale image accuracy metrics  60
5-8  English alphabet accuracy metrics  61
5-9  Movement libras accuracy metrics  62
5-10 Optical recognition accuracy metrics  63
5-11 Handwritten digits accuracy metrics  63
5-12 Four of a kind, Royal flush, Straight flush accuracy metrics  64
5-13 Satellite data accuracy metrics  65
5-14 Segmentation accuracy metrics  66
5-15 Shuttle accuracy metrics  67
5-16 Texture accuracy metrics  68
5-17 Iris accuracy metrics  69
5-18 Seeds accuracy metrics  70
5-19 Vertebral accuracy metrics  71
5-20 Wine recognition accuracy metrics  71
5-21 Vowel accuracy metrics  72

LIST OF FIGURES

1-1  Multiclass classification conversion techniques  18
5-1  Sample Caltech gray face images  59
5-2  Coil data sample  59
5-3  Letter data sample  61
5-4  Movement libras data sample  62
5-5  Optdigits sample data  64
5-6  Poker sample data  65
5-7  Satellite data sample  66
5-8  Segment data sample  67
5-9  Shuttle sample data  68
5-10 Texture sample data  69
5-11 Iris sample data  70
5-12 Wine recognition sample data  72
5-13 Vowel data sample  73
A-1  Sample of images included in Race Space data set  77
A-2  Meyers race Ethnographic Map  78
A-3  Blumenbach division of races  78
A-4  Sample Caucasian female faces  79
A-5  Sample negroid female faces  79
A-6  Sample Asian female faces  80
A-7  Sample Arier female faces  80
A-8  Sample Indian female faces  81
C-1  Spline results for benchmark data sets  87

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

CATEGORY SPACE DIMENSIONALITY REDUCTION FOR SUPERVISED LEARNING

By

Anthony O'Neal Smith

August 2013

Chair: Anand Rangarajan
Major: Computer Engineering

In this research, we investigate the reasons that make the multiclass classification problem difficult and suggest that category ambiguity is at the heart of the problem. We analyze previous efforts at multiclass classification and explain how they do not account for this ambiguity when they suggest simple voting schemes or offer the combination of pairwise binary classifiers as solutions. Then we demonstrate that most other methods lack the notion of a geometric space to represent the classes. These essential concepts have been neglected because in the past, classes were perceived to be independent. This leads to limited approaches in which techniques assume distinct classes and assign nominal labels to them. We argue that class relationships exist that must be exploited. The approach we propose gives an alternate method for dimensionality reduction so that multiclass classification techniques can overcome several of the problems that exist with pairwise classification schemes and exhibits better performance on many problems. We look to separate objects according to similar qualities and characteristics by projecting to a space---a Category Space---defined by the number of classes and the properties of the classes. After dimensionality reduction to the category space, we use a classification technique to evaluate performance on a large collection of benchmark data sets. Finally, we detail the strengths of our approach, and provide a framework for alternative objective functions for linear and kernel-based projections. Our contributions span the range of dimensionality reduction, classification, and optimization for multiclass problems.

CHAPTER 1
INTRODUCTION

1.1 The Existence of Discrimination

The brain is a highly complex information processing instrument. Different subsystems of the brain such as those responsible for vision and natural language processing provide us with information to interact with the external world. Individuals form symbolic representations of the world such as categories from these interactions. For example, we attach categorical properties to objects and processes. These range from simple properties such as color and shape to more complex properties such as texture and material. The result of human beings interacting with nature and with each other is, at the most basic level, a system of interdependent categories and labels. This organization of the world in terms of attributes and categories is highly beneficial since it connects us with the world around us. This is why classification is important, and why the human component is vital.

Ultimately, this research wants to improve the way machines learn. We believe machines can better accomplish learning with the aid of human experience. We wish to develop a framework that will improve the ability of the machine to learn, by better utilizing human experience. People have an amazing ability to classify the environment in which we live. We build a taxonomy that is a representation of the natural world. This is carried out by employing our senses and the brain to process millions of features. The overall goal of this work is to effectively utilize the interdependent system of human categories in machine learning.

One way to describe human categorization is via the concept of discrimination. Discrimination is the practice of determining a distinction between two objects [1], [2]. This type of cognition is instinctive to humans and machines should leverage this ability to improve or enhance learning. The fundamental task of discrimination by machines can prove to be difficult if not impossible when objects are not obviously separable. The simplest way to assist the machine in this situation is by presenting an association between objects and labels. The idea is for the machine to learn from a relationship; having learned this association between patterns and labels, label prediction can be made for new patterns. In essence, when a new pattern is presented, the machine has the necessary tools to make an informed judgment given the set of possible labels.

Conventional discrimination is defined between two classes, but in reality objects and processes are more complex. This brings us to the concept of three or more labels or classes. What happens to discrimination when you have more than two labels or classes? In this scenario the taxonomy cannot be easily divided into an oppositional space. An obvious approach to handle multiple labels is to extend the fundamental idea of discrimination that has worked very well for the case of two opposing classes. But, this approach introduces a new level of complexity to the problem. One idea is to take every possible pair of classes and create a pseudo oppositional space. But to truly represent a complete oppositional space, we would have to create oppositions between all pairs. Having created all possible oppositional pairs, it is still not obvious how to assign a class label when a new object or pattern is presented. This approach is also not feasible when the number of classes is large since the number of oppositional pairs grows quadratically. To avoid looking at all possible pairs, other approaches have been suggested such as creating oppositions between each class and a union of the others. A winner-take-all approach is used to assign the class label of a new, incoming pattern [3]. In our opinion, the pseudo oppositional space created in this one-vs-all approach is artificial. We will expand on these approaches in more detail in Section 2.1, along with exploring the history of dimensionality reduction and classification. Based on these observations, in this work, we take (i) the interdependent nature of classes, and (ii) the lack of binary oppositions in the multiple class case to heart and endow categories with a geometric structure. We formally describe our approach, and remain cognizant of the correlation across categories.

1.2 Geometry of Classification

Statistical pattern classification and categorization are an integral part of applied machine learning and analysis. We begin with a high level description. This field of study can be divided into the distinct sections of Learning and Classification. Learning is the general problem of identifying a function $f(\mathbf{x})$ that yields a value close to $y$, where $y$ is the response and $\mathbf{x}$ the observation. Statistical learning theory effectively provides a framework to successfully solve classification problems [4], [5]. As mentioned previously, in classification, we need to associate a class label to an input observation based on its features. Usually, this label assignment is performed from a set of predefined class labels. While statistical learning theory is an effective framework for learning and regression, all too often the output labels $y$ are conceived nominally or as a discrete set of symbols which have no inter-relationships.

Supervised learning is a well studied discipline in machine learning. By design, the various supervised learning algorithms map input patterns to designated output labels. Early work in machine learning conceived of categories mainly as nominal labels [6], [7], [8]. For this reason, most recent work in machine learning continues to represent categories as labels and consequently collapses the distinction between category and class. From a larger viewpoint, categorization is a process in which ideas and objects are identified and separated much like classification. But, in addition, the process of categorization implies a purpose-driven interactive process wherein inter-relationships between categories are actively constructed. In this view, a category expresses a relation amongst its members and forges a relationship between categories.

Here are three general documented approaches to categorization [9]:

1. Classical categorization
2. Conceptual clustering
3. Prototype theory

Classical categorization. The oldest of the practices; its roots stem from Plato who introduced categorization as the concept of grouping objects based on similar attributes. From the perspective of his theory, categories should be clearly defined, mutually exclusive and collectively exhaustive [9], such that an object belongs to one, and only one, category.

Conceptual clustering. A derivative of classical categorization that has found its way into modern machine learning approaches, primarily unsupervised learning. Here, classes or clusters are generated first, a conceptual description is then inferred and later used for classification.

Prototype theory. Psychologists consider equivalences between objects to mean that different members of the same class can be assigned generic names, such as dog, cat etc. Basic categories are those which carry the most information, implying they possess the highest category validity [10], [11]. It is at this rudimentary level where categories are most differentiated from one another. For natural categories, conditions are almost never met to be able to guarantee perfect discrimination. We tend to think that such categories are not distinct but blurred at the boundaries. This is one way to allow an object to belong to more than one category.

To summarize, categories have an underlying structure with relationships existing not only between objects and categories but between the categories themselves. This structure is not apparent in most previous machine learning approaches. Unlike these previous methods that have regarded categories as nominal and have consequently erased the distinction between categories and class labels, in this work, we wish to establish the idea of a category space. In addition to providing a relationship between objects and categories, this space provides the crucial associations between categories. The contribution of this research is in the learning of such a category space. Learning a category space from a set of training data can successfully demonstrate not only the representation of each unique category but also the correlations between categories.

1.3 Multiclass Learning

We have not spoken much on the structure of the input features (training and test sets alike). Henceforth, we restrict our focus to input features that are organized into a feature vector. This vector space representation of the input patterns is very common in present day machine learning. To a certain extent, we can describe most of machine learning as the attempt to learn a function $f$ that maps a set of input feature vectors $\mathbf{x}_i \in \mathbb{R}^D, i \in \{1, \ldots, N\}$ to a set of nominal output categories $\{1, \ldots, K\}$. Our task ahead is to first mathematically explain why nominally conceived classes create problems and then offer our structured categories approach.

Multiclass classification is an on-going problem, and there are several proposed solutions to the problem [12], [13]. The goal in multiclass classification is to develop an algorithm which will take an input feature vector $\mathbf{x}_i$ and assign it to a particular class $C_k$ where $k \in \{1, \ldots, K\}$ and $K > 2$ classes. Many authors have regarded classes as names, labels, or tags that have no internal relationships. Machine learning and pattern recognition began by formulating and solving binary classification problems. As we have repeatedly mentioned, real world problems have more than two classes. While considerable research has been done to show the advantages of designing simple two way classifiers that are then combined [14], [15], [16], these approaches have drawbacks. Multiclass learning in this view is an afterthought and not an initial consideration. Most of the methods consist of a reduction to binary classification followed by a combination scheme. These learning systems exhibit intolerance to ambiguity since the classes are conceived as discrete or nominal labels. Also, there is usually no way for the system to generate new classes by exploiting the structure or properties of the classes.

One of the earliest proposed methods to solve this problem is the one-vs-one approach. A classifier is learned for pairs of classes after which transitivity is applied [17]. For example, suppose we had only three choices for the college football national championship: Florida Gators, Florida State Seminoles, and Miami Hurricanes. Then, we could match Gators vs Seminoles and Seminoles vs Hurricanes and assume this exhausts the set of all possible match-ups. If the Gators defeat the Seminoles and the Seminoles then defeat the Hurricanes, this does not naturally imply that the Gators will be victorious over the Hurricanes even though it is implied by transitivity. If the Gators did in fact play the Hurricanes and heaven forbid the latter won, transitivity clearly does not apply. If we do not want to impose transitivity properties on the training set and instead learn classifiers for all pairs of classes, this could become quite expensive as the number of classes grows. When we have $K$ classes, this results in $\frac{K(K-1)}{2}$ binary classifiers [18].

Another classic approach is the one-vs-all approach to the multiclass problem [19]. The idea here is to first construct $K$ classifiers with each separating one class versus the rest with the remaining patterns pooled together into one class. A new pattern is assigned a class label by choosing the best classifier through a winner-take-all process. Since, in this case, binary classifiers are obtained by training on different and independent problems, it is unclear whether their real-valued outputs are comparable [20]. If different classifiers assign the same value for membership to a class, a further symmetry breaking problem arises. In addition, one-vs-all classifiers have been criticized for creating an asymmetric training problem since the number of patterns in the rest can be expected to overwhelm the number of patterns in each class as exhibited in [21].

To better understand the aforementioned schemes, in Figure 1-1 on the left is shown a one-vs-all approach whereas on the right is shown the all pairs (one-vs-one) approach. The regions indicated with question marks identify where there is ambiguity in classification. In the one-vs-all approach, several classifiers might conclude that a pattern is labeled as having membership to their class, also known as Type-I class ambiguity. All classifiers may also draw the conclusion that the pattern does not have membership to their class, alluding to Type-II ambiguity. For the all pairs (one-vs-one) method, there will be instances when all classifiers either disagree or result in voting ties. This scheme will result in a Type-III region of ambiguity where all classes agree.
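To make the two reduction schemes concrete, the following is a minimal sketch (not part of the dissertation) using the scikit-learn library, whose OneVsRestClassifier and OneVsOneClassifier wrappers implement the one-vs-all and one-vs-one constructions described above; the synthetic data set and the linear base classifier are illustrative choices.

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier

K = 4                                   # number of classes
X, y = make_blobs(n_samples=400, centers=K, n_features=5, random_state=0)

# One-vs-all: K binary classifiers, one class against the pooled rest,
# combined by a winner-take-all over their real-valued scores.
ova = OneVsRestClassifier(LinearSVC()).fit(X, y)

# One-vs-one: K(K-1)/2 pairwise classifiers combined by voting.
ovo = OneVsOneClassifier(LinearSVC()).fit(X, y)

print(len(ova.estimators_), len(ovo.estimators_))   # K and K*(K-1)//2 binary problems
print(ova.predict(X[:5]), ovo.predict(X[:5]))

Ties in the pairwise vote, or equal one-vs-all scores, correspond to the ambiguous question-mark regions of Figure 1-1.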

1.4 Category Vector Space

In summary, our view is that all the problems in multiclass learning stem from the treatment of categories as nominal labels. In this work, we aim to overcome this limitation. We conceive of categories as coordinate axes in a Euclidean vector space. When the training patterns are also represented as vectors (which is standard in present day machine learning), the learning problem becomes one of mapping the input feature vectors into a category vector space. Once the patterns have been projected into the category space, the locations of the coordinate axes yield vital membership information which can be used to perform classification. New compound categories can also be discovered in this scheme. A compound category is made when two or more categories are joined to form a new category. Consider the case of musical notes, C, C#, D etc., in the category space. These notes would be represented as primary categories (the axes of the category space). A chord, which is a harmonic set of two or more notes played simultaneously, would be viewed in this approach as a compound category.

The world is inherently ambiguous which makes classification difficult. Previous efforts at multiclass classification do not account for the complexity of classes when they suggest simple voting schemes or try and combine a number of pairwise binary classifiers. Fundamentally, they lack the idea of a geometric structure to represent the classes. Classes may seem independent and this perception stems from the fact that techniques try to separate patterns into distinct classes. But, intrinsically, a relationship exists that must be exploited. This concept is best explained by example. We define a set of race labels: Caucasian, Negroid and Mongolian. We are given a training set and the task of mapping the training set to the defined labels such that we achieve reasonable learning and low generalization error. Now, consider the case where an ambiguous test pattern is presented (an Indianer or Native American) or a person drawn from a race which is not represented by the system. Any current multiclass solution would not have the ability to classify the incoming pattern. They would perform one of two actions: (1) indicate that the incoming pattern is outside of the label space, or (2) assign the pattern to that label which received the highest vote. These systems do not have the ability to represent a mixed race person using a combination of classes. They are not able to interpret the underlying relation that the mixed race bridges the gap between two of the predefined, nominal classes. One approach to accommodate this ambiguity is by assigning a new label to the mixed race pattern. This approach could potentially lead to an exponentially large label space if patterns from all possible label combinations are presented. Another route is to truncate labels to avoid the very large label space. In this case, intolerance returns in the form of unrepresented classes. Unfortunately these problems are not trivial because ambiguity is naturally present in the world. We solve this problem by taking into consideration the underlying relationships between classes and learning a category space which exhibits the association. This is defined as the Category Space. The approach we propose gives an alternative method for multicategory classification. We look to separate objects according to shared qualities or characteristics. We evaluate the performance on a large collection of benchmark data sets and investigate its use in race classification from image data.
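As a toy illustration of this idea (not the learned model developed later in the dissertation), the following sketch represents three hypothetical categories as orthonormal axes and reads a pattern's degree of membership in each category from its inner products with those axes; a mixed pattern simply has appreciable membership along more than one axis instead of being forced onto a single nominal label.

import numpy as np

# Three category axes forming an orthonormal basis of a small category space.
axes = np.eye(3)                          # one column per category
pure = np.array([0.0, 1.0, 0.0])          # a pattern aligned with one category axis
mixed = np.array([0.7, 0.0, 0.7])         # an ambiguous pattern between two categories

for z in (pure, mixed):
    memberships = axes.T @ z              # inner product with each category axis
    print(memberships / np.linalg.norm(z))   # cosine-like degrees of membership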

Figure 1-1. Multiclass classification conversion techniques. A) One-vs-all. B) One-vs-one. Figure courtesy of PRML.

CHAPTER 2
EVOLUTION OF CLASSIFICATION PARADIGMS

2.1 History of Discrimination and Dimensionality Reduction

The goal of classification is to learn a classifier which constructs a mapping that minimizes the number of misclassifications of the input patterns while generalizing well to new and unseen patterns. Simply counting the number of misclassifications leads to a binary loss function. If an input pattern $\mathbf{x}$ is classified incorrectly, it incurs a response of $y = 1$, otherwise $y = 0$ which indicates no penalty. The result is a binary loss function [22] that can be written as

$$C(y, f(\mathbf{x})) = \begin{cases} 1, & \text{if } y \neq f(\mathbf{x}), \\ 0, & \text{otherwise.} \end{cases}$$

Minimizing the number of misclassifications directly has always required a compromise to avoid computational difficulties, since it often leads to an NP-complete optimization problem [23]. This is true even when the input vector $\mathbf{x} \in \mathbb{R}^D$ and the function $f(\mathbf{x}): \mathbb{R}^D \rightarrow \mathbb{R}$ is linear for some $D$ dimensional feature space. There is no known efficient algorithm to determine the optimal function $f(\mathbf{x})$ that works for all feature space dimensions. But, if $D$ is small, then it is possible to enumerate all mappings via exhaustive search. This brute force approach is not practical and usually limited depending on the problem. Research efforts in machine learning have therefore moved in the directions of (i) constructing discriminants while avoiding hard decisions and (ii) designing convex objective functions based on approximating the misclassification error. The construction of linear discriminants has a much longer history dating back to the work of Fisher in the 1930's with the construction of an objective function. Convex objective functions that attempt to approximate the misclassification error have gained traction since 1990, resulting in methods such as least-squares, logistic regression and support vector machines. These approaches have laid the foundation for advanced classification algorithms. They provide a feasible solution to the binary classification problem, and then generalize to the multiclass problem.

In machine learning, it is customary for classification methodologies to regard classes as nominal labels without having any internal structure. This remains true regardless of whether a discriminant or objective function approach is taken. Discriminants are designed by attempting to separate patterns into oppositional classes [24]. When generalization to a multiclass classifier is required, many oppositional discriminants are combined with the final classifier being a winner-take-all or voting-based decision. Convex objective functions based on misclassification error approximation are not that different either. Least-squares or logistic regression methods set up convex objective functions with nominal labels converted to binary outputs [25]. When extensions to multiclass are sought, the binary labels are converted to the vertices of a $K-1$ simplex where $K$ is the number of classes. Support vector machines (SVM's) were inherently designed for two class discrimination and all formulations of multiclass SVM's contradict this oppositional origin. Below, we begin by describing the different approaches to the multiclass problem. This chapter is not meant to be exhaustive, but provides an overview of some of the popular methods and approaches that have been researched in classification and dimensionality reduction.

2.2 Nearest Neighbor Methods

Nearest neighbor methods are perhaps the easiest to describe. Given a training set of patterns with two or more nominal class labels, an incoming pattern is assigned the label of its nearest neighbor [26]. The notion of nearest here is the Euclidean distance when the patterns are represented in a Euclidean vector space. Elaborations of the nearest neighbor method include taking a majority vote of a set of nearest neighbors of the incoming pattern. This method remains the same for two or more classes. No oppositional discrimination or competition strategy similar to what we have described is involved in this technique. Since class labels are decided by a majority vote or by the nearest neighbor, this method also treats classes nominally.

2.3 Probabilistic Methods for Classification

Probabilistic methods begin by constructing density functions for each class [27]. Given $K$ classes, the method builds $K$ density functions $p(\mathbf{x} \mid C_k)$, $k \in \{1, \ldots, K\}$. In the simplest version of this approach, after density functions have been created using a training set of labeled patterns, an incoming pattern is classified by selecting the class label based on the class conditional density with the highest value:

$$\arg\max_{\{C_k\}} \left\{ p(\mathbf{x} \mid C_k), \; k \in \{1, \ldots, K\} \right\}.$$

Density estimation methods are used to construct the class conditional density functions $p(\mathbf{x} \mid C_k)$ which usually leads to the requirement of very large training sets in high dimensions. When class priors $p(C_k)$ are available, the method uses posterior probabilities $p(C_k \mid \mathbf{x})$ obtained via Bayes' theorem. Classes are treated nominally in this approach with the best class label obtained using the winner-take-all in Equation 2. Since class conditional densities are separately constructed for each class, it is difficult to combine mixed classes. Unlike nearest neighbor methods, class conditional densities can be used to build discriminants between every pair of classes. However, the method does not begin with oppositional discrimination [8].

Another plausible strategy is relaxation labeling which has been applied to many applications of machine learning [28], [29], [30]. The fundamental principle behind the labeling method utilizes a set of features, such as corners or edges, and a set of labels. Typically, the labeling schemes are probabilistic assignments of weights or probabilities for each feature to each label in the set [31]. This provides an estimate of the likelihood that the particular label is correct for that feature. Probabilistic approaches iterate to maximize or minimize the probabilities associated with local neighborhood features. This type of relaxation strategy does not guarantee convergence, so may not identify a final labeling with a unique label that has a probability of one for each feature. Assume we have $S = \{s_1, \ldots, s_D\}$, the set of $D$ dimensional object features to be labeled. $L$ is the set $\{l_1, \ldots, l_M\}$ of possible labels for the features. Let $P_i(l_m)$ be the probability that the label $l_m$ is the correct label for object feature $s_d$. In this case the usual probability axioms hold true:

- A probability satisfies $0 \leq P_i(l_m) \leq 1$, where $P_i(l_m) = 0$ implies that label $l_m$ has a weak probability for feature $s_d$ and $P_i(l_m) = 1$ implies a strong probability for the labeling.
- A given set of labels is mutually exclusive and exhaustive such that for each $P_i$ over all possible labels $L$: $\sum_{L} P_i(l_m) = 1$.

Therefore, each feature is described by exactly one label from the set. Updating the probabilities for the labels is done by considering the probabilities of labels for a local neighborhood. The labeling process is initialized with an arbitrary assignment of probabilities for each label for each feature, typically indicating equally likely probability. The steps of the algorithm then assign probabilities into a new set according to some relaxation schedule. This process is repeated until the labeling stabilizes and the method converges. Generally, this occurs when little or no probability values change between successive iterations of the method.

2.4 Dimensionality Reduction for Classification

The goal of dimensionality reduction is to find a transformation function $A$ that maps the $D$ dimensional input vector $\mathbf{x} \in \mathbb{R}^D$ into a lower $M$ dimensional subspace which will contain all the relevant information needed to solve the overall discrimination problem. Some of the more popular approaches such as Karhunen-Loeve or the eigenvector orthonormal expansion [32], [33], [34] have been shown to be viable solutions. Historically, literature has been centered on solving for linear transformations utilizing eigendecomposition of an estimated covariance matrix from the data. Each eigenvector can be ranked based on the magnitude of the corresponding eigenvalue. A subset containing $M$ eigenvectors is then chosen, and the input feature vectors are transformed by multiplying by the eigenvector matrix $A = [\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_M]$. This produces a mapped feature vector $\tilde{\mathbf{x}}$ such that $\tilde{\mathbf{x}} = A^T \mathbf{x}$, where $\tilde{\mathbf{x}}$ is $M \times 1$, $A$ is $D \times M$, and $\mathbf{x}$ is $D \times 1$. For the Karhunen-Loeve approach the $M$ dimensional subspace is selected by choosing the first $M$ elements of the discrete Karhunen-Loeve expansion. The estimated transformation is optimal with respect to fitting the data, but is not necessarily a good transformation for discriminating and classifying.

Folley and Sammon [35], [36] studied the two class problem and a feature selection algorithm in which the criteria is based directly on their potential to discriminate. A good feature selection is dependent on a set of optimal discriminant vectors. To identify the optimal vectors they chose the generalized Fisher optimality criterion proposed by [37]. The properties of the Fisher criterion ensure that the selected directions maximize the criterion, and satisfy the orthogonality constraint. This approach provides a solution for the best discriminating subspace, but is only optimal with respect to the chosen criteria.

Principal components analysis (PCA) permits the reduction of dimensions of high dimensional data without losing significant information [38], [39], [40]. Principal components are a way of identifying patterns or significant features, then expressing similarities and differences across the data [41]. The patterns can be used for compression and classification by reducing the number of dimensions. Principal components are the linear combinations of the features showing the highest variation across the data.

Supervised PCA (SPCA), derived from PCA, is a method for prediction, useful when the number of predictors $p \gg N$, the sample size. This technique was first described by Bair and Tibshirani [42] in the context of a method known as supervised clustering. The idea behind SPCA is instead of performing traditional principal component analysis taking all the data samples into consideration, a subset of the data is selected and labeled to generate a model. This is used to build a prediction model as a function of the expression [43]. Then choose those samples whose correlation with the outcome is largest, and use only those samples, to extract the most significant principal components of the reduced data matrix.

2.5 Discriminant Analysis

Historically, machines have learned with the aid of humans using a very popular approach known as binary opposition [7]. Using human input, patterns are divided into two classes (e.g. male vs. female, white vs. black and liberal vs. conservative). This means that patterns are deliberately chosen from opposing classes. The machine then learns not only to associate patterns with certain labels, but systematically learns a contrast or an oppositional structure. This approach has been widely adopted because of its simplicity which lends itself to straightforward classification methods. When you learn an opposition, there is a very clear discrimination aspect that enables the machine to make predictions. The choice is black or white, so the chance of error is only 50%. This is the fundamental idea behind discriminants and discriminant-based classification.

Discriminant methods have been widely utilized in classification with two of the more popular methods being the Fisher Linear Discriminant (FLD) [2] and the Support Vector Machine (SVM) [44]. The SVM is normally not thought of as a discriminant but we argue that it is fundamentally based on discriminating between two classes with the multiclass SVM being an extension. In statistical pattern classification literature, discriminant analysis is mainly used to learn feature transformations. Linear discriminant analysis (LDA) based on the FLD [12] is a popular method that projects high dimensional vectors into a lower dimensional vector space, where the patterns achieve maximum class separability [5], [13]. Classical LDA for binary classification looks to project data from a $D$ dimensional input feature space onto a 1D feature space. The goal is to find the orientation of that line for which the projected samples best separate. LDA yields a linear transformation that maximizes the ability to separate the classes in the reduced dimensional space. Essentially, discriminant analysis looks for subspaces that are maximally efficient for separation.

FLD is the prototypical method that uses labeled training patterns to create a classifier. For binary classification, this solution corresponds to finding a one-dimensional subspace such that the projection of the labeled training patterns onto the subspace achieves two goals: (1) maximization of the distance between the projected means of the two classes and (2) minimization of the sum of the two intra-class projected variances weighted by the respective class cardinality. The subspace that accomplishes these goals is found by solving a generalized eigenvalue problem. In the two class problem, FLD determines the one dimension along which the patterns $\mathbf{x}_i \in \mathbb{R}^D, \forall i \in C_1$ and $\mathbf{x}_j \in \mathbb{R}^D, \forall j \in C_2$ have the largest value of the objective function

$$J_W(\mathbf{w}) = \frac{\mathbf{w}^T S_B \mathbf{w}}{\mathbf{w}^T S_W \mathbf{w}}$$

where $\mathbf{w} \in \mathbb{R}^D$ is the desired weight vector,

$$S_B = (\mathbf{m}_2 - \mathbf{m}_1)(\mathbf{m}_2 - \mathbf{m}_1)^T$$

and

$$S_W = \sum_{i \in C_1} (\mathbf{x}_i - \mathbf{m}_1)(\mathbf{x}_i - \mathbf{m}_1)^T + \sum_{j \in C_2} (\mathbf{x}_j - \mathbf{m}_2)(\mathbf{x}_j - \mathbf{m}_2)^T.$$

Here, $\mathbf{m}_1$ and $\mathbf{m}_2$ are the means of $C_1$ and $C_2$ respectively. The single dimension along which the patterns are projected is $y_i = \hat{\mathbf{w}}^T \mathbf{x}_i$ where $\mathbf{x}_i$ is a pattern that could be from either class. Here, $\hat{\mathbf{w}}$ is the weight vector that maximizes $J_W(\mathbf{w})$. Subsequently, for classification a threshold needs to be determined and fixed in order to define a discriminant. The threshold can be determined by minimizing the total number of misclassifications.

There has been some development on LDA based algorithms with orthogonal transformations. Some authors [45], [46] showed that an orthogonal set of vectors was more powerful than the classical discriminant vectors. When a discriminant vector is found, for instance in a two-class problem, it may be of interest to find a second vector, orthogonal to the first one, maximizing the projected variance. This corresponds to a PCA with a constraint of orthogonality. Another algorithm studied by Duchene and Leclercq [47] looked to solve for a transformation of a high dimensional pattern matrix. The new form would provide a projection of the data sets combining a generalized PCA and a discriminant analysis. The objective is to find the best set of discriminant vectors in order to separate predefined classes of objects or events. If $K$ classes were defined, it can easily be proved that at most $K-1$ optimal vectors exist. Most of the authors choose these $K-1$ vectors as the best set of discriminant vectors [48]. When a discriminant vector is found, for instance in a two-class problem, it may be of interest to find a second vector, orthogonal to the first one, maximizing the projected variance. This corresponds to a PCA with a constraint of orthogonality. Thus this method permits defining a best variance vector orthogonal to a set of previously computed vectors, without using any statistical property of this set.

In summary, even if the major goal of pattern recognition is to classify objects into their classes as well as possible, it can also be of interest to obtain in the same subspace a good representation of the corresponding points and to show the distribution of these points in two places.

2.6 Discriminant Analysis for Multiclass Classification

2.6.1 The MC-FLD Criterion

This leads us to the Multiclass Fisher Linear Discriminant (MC-FLD) [7]. MC-FLD is an accepted multiclass classification method that attempts to take classes or labels and construct a vector space. This is also a popular method for discriminant subspace analysis [13] when a sufficient number of classes are available. MC-FLD tries to identify the optimal subspace such that a measure of linear separability of the classes is maximized. While the two class criterion is intuitive and straightforward, the multiclass case is not. In the two class case, one could build a discriminant to separate the two classes. As discussed before, in the multiclass case, it is not obvious how to combine multiple two-way discriminations into an overall discriminant. MC-FLD elects to build $K$ discriminants with each trying to separate its class mean from the pooled class mean. The minimization of the sum of the intra-class projected variances is also included. The optimization of such a criterion is straightforward, and is discussed at length in [7]. Given the input patterns $\mathbf{x}_i \in \mathbb{R}^D, i \in \{1, \ldots, N\}$, and $K$ classes, the projected set of features is defined by $\mathbf{z}_i = W^T \mathbf{x}_i$ where $W$ is $D \times K'$ (with the symbol and meaning of $K'$ to be explained later). The MC-FLD maximizes

$$J(W) = \mathrm{trace}\left\{ \left(W^T S_W W\right)^{-1} \left(W^T S_B W\right) \right\}$$

where

$$S_B = \sum_{k=1}^{K} N_k (\mathbf{m}_k - \mathbf{m})(\mathbf{m}_k - \mathbf{m})^T \quad \text{and} \quad S_W = \sum_{k=1}^{K} \sum_{n \in C_k} (\mathbf{z}_n - \mathbf{m}_k)(\mathbf{z}_n - \mathbf{m}_k)^T$$

and here, $\mathbf{m}_k$ is the mean vector of class $C_k$ and $\mathbf{m}$ the global mean. Class $C_k$ has $N_k$ training samples. Every column $\mathbf{w}_k$ of $W$ is a solution to the generalized eigenvalue problem $S_B \mathbf{w}_k = \lambda_k S_W \mathbf{w}_k$. Since $S_B$ has maximum rank $K' = K - 1$, the solution is the top $K - 1$ eigenvectors of $S_W^{-1} S_B$. Given the optimal solution $\hat{W}$, the patterns are projected into the $K'$ dimensional subspace followed by classifier construction. MC-FLD is therefore a supervised feature selection approach and not a classifier. It represents an extension of the two class FLD to multiple classes with class discrimination and variance minimization replaced by discrimination between each class and the pooled mean and covariance minimization. The classifier constructed in the $K'$ dimensional subspace usually employs one-vs-one or one-vs-all strategies which we now describe.

2.6.2 One-Versus-All Classification

MC-FLD is generally used to discover a subspace with $K - 1$ dimensions for multiclass problems where $K$ is the number of classes in the training set. Projection into the subspace must be followed by the construction of an actual classifier. One-vs-all is a popular technique for reducing a multiclass problem into a subset of binary classification problems [20]. To achieve a $K$ class classifier it is common to construct a set of binary classifiers, where each is trained to separate one class from the rest. The next step involves a comparison of the $K$ output values from the classifiers. To determine class membership, the $K$ classifiers are combined and the class is chosen according to

$$\arg\max_{k=1,\ldots,K} \left(\mathbf{w}_k^T \mathbf{x}\right).$$

When comparisons are made of the $K$ binary classifiers, it is unclear whether the independent output values are comparable. This poses a problem, since situations may arise where several classifiers assign equal maximum values to a pattern. Another drawback to this approach is the asymmetric approach to training. In each of the $K$ binary classifiers, the number of patterns in the remaining classes may vastly outnumber the number of patterns in each class. This problem worsens as $K$ increases. This can cause a bias in training which ultimately affects performance [21].

2.6.3 One-Versus-One Classification

One-vs-one is also known as an all-pairs or maximum voting scheme approach [49]. This is another popular extension of binary discriminant methods intended for multiclass classification. As shown in [17], a classifier is trained for each pair of classes. For $K$ classes, this results in a total of $\frac{K(K-1)}{2}$ binary two class discriminants. One of the advantages here is that training is more symmetric as compared to the one-vs-all approach. An added benefit is that training is done on smaller individual problems. A voting scheme is employed to perform classification. Despite the name one-vs-one, at this stage each class competes against every other class. The class with the highest number of votes is assigned to the input pattern. As the number of classes increases, the number of pairwise discriminants increases quadratically.
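The following is a minimal numpy sketch (not the dissertation's implementation) of the MC-FLD pipeline of Sections 2.6.1 and 2.6.2: build the scatter matrices, take the top $K-1$ eigenvectors of $S_W^{-1} S_B$, project, and classify; the synthetic data, the small ridge term that keeps $S_W$ invertible, and the nearest-projected-mean decision rule are illustrative choices.

import numpy as np

def mcfld_fit(X, y, K):
    """X: N x D data matrix, y: labels in {0,...,K-1}. Returns the D x (K-1) projection."""
    D = X.shape[1]
    m = X.mean(axis=0)
    S_W = np.zeros((D, D))
    S_B = np.zeros((D, D))
    for k in range(K):
        Xk = X[y == k]
        mk = Xk.mean(axis=0)
        S_W += (Xk - mk).T @ (Xk - mk)                 # within-class scatter
        S_B += len(Xk) * np.outer(mk - m, mk - m)      # between-class scatter
    # top K-1 eigenvectors of S_W^{-1} S_B (S_B has rank at most K-1)
    evals, evecs = np.linalg.eig(np.linalg.solve(S_W + 1e-6 * np.eye(D), S_B))
    order = np.argsort(-evals.real)[:K - 1]
    return evecs.real[:, order]

rng = np.random.default_rng(0)
K = 3
X = np.vstack([rng.normal(loc=3 * k, scale=1.0, size=(50, 4)) for k in range(K)])
y = np.repeat(np.arange(K), 50)

W = mcfld_fit(X, y, K)                                 # D x (K-1) subspace
Z = X @ W                                              # projected training patterns
means_z = np.vstack([Z[y == k].mean(axis=0) for k in range(K)])

# Classify by the nearest projected class mean (one simple classifier in the subspace).
dists = ((Z[:, None, :] - means_z[None]) ** 2).sum(axis=-1)
print((dists.argmin(axis=1) == y).mean())              # training accuracy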

2.6.4 Error Correcting Coding for Multiclass

In this approach, each class is assigned a unique binary string of length $K$ [50]. These strings are referred to as codewords. Then, $K$ binary functions are learned, one for each bit position in these strings. In the training stage, the desired outputs of the $K$ binary functions are specified by the codeword for class $C_k$. New input patterns $\mathbf{x}$ are classified by evaluating each of the $K$ binary functions which together generate a new $K$ bit string. This string is then compared to each of the $K$ codewords, and $\mathbf{x}$ is assigned to the class whose codeword is closest, according to some similarity measure (Hamming distance for example) [51]. This approach suggests that we view supervised learning as a regression problem with multiple functions being learned followed by code comparison since the functions learned are binary.

Since error correcting codes are constructed, the system has some ability to recover from errors. A measure of the quality of an error-correcting code is the minimum Hamming distance between each pair of codewords. If the minimum Hamming distance is $h$, then the code can correct at least $\lfloor \frac{h-1}{2} \rfloor$ single bit errors. This is because each bit error moves us one unit away from the true codeword in terms of Hamming distance. If we make only $\lfloor \frac{h-1}{2} \rfloor$ errors, the requested codeword will still be the correct codeword for adequate classification. This method should be seen as a regression approach followed by a winner-take-all for class assignment since the codeword with the smallest Hamming distance is chosen for the incoming pattern.

2.6.5 Regression and Optimization-based Approaches

Least squares is a classical technique for both regression and classification [12]. Regression-based approaches are similar to error correcting codes. For $K$ classes, $K$ functions are learned, each with binary labeled outputs in the set $\{0, 1\}$ used to indicate class membership of a pattern (for class $C_2$ and a total of three classes, the output vector is $[0, 1, 0]^T$). If the training set contains patterns strictly drawn from $K$ classes with no mixed patterns, this approach is equivalent to projecting the patterns into a $K - 1$ dimensional simplex with the $K$ vertices of the simplex being the $K$ binary-valued class output vectors. Input patterns are also projected into the simplex using the learned functions with the nearest vertex chosen as the class label.

An interesting relationship, first demonstrated by [52], exists between least-squares regression and MC-FLD. It has been known, for a very long time, that least-squares regression with appropriately chosen outputs is identical to the two class FLD. The work in [52] derived a similar relationship between least-squares regression and MC-FLD. The resulting formulation is dubbed Least-Squares Linear Discriminant Analysis (LS-LDA). In multiclass linear regression, the goal is to find the weight vectors $\{\mathbf{w}_k\}_{k=1}^K \in \mathbb{R}^{D+1}$ (with an extra dimension for the bias component) of the $K$ linear models, for $k = 1, \ldots, K$, by minimizing the objective function

$$L(W) = \frac{1}{2} \left\| W^T X - Y \right\|_F^2$$

where $W = [\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_K]$ is the weight vector matrix, $X$ and $Y$ the matrices formed from the sets of input vectors and binary class vector outputs respectively, with $\|\cdot\|_F$ denoting the Frobenius norm of a matrix. The optimal $\hat{W}$ is given by

$$\hat{W} = \left(X X^T\right)^{-1} X Y^T$$

assuming the relevant matrix inverses exist. Under suitable choices of the output vectors $\mathbf{y}_i$ in the matrix $Y$, the least-squares approach can be shown to be equivalent to MC-FLD. Even without formal equivalence, note that when the input patterns are drawn from the $K$ classes, the least-squares regression approach projects them into a $K - 1$ subspace with the vertices being the class output vectors [53].

Support vector machines as developed in [44], [54], [55] are based on maximizing the margin between two sets of patterns drawn from two classes. The margin is defined as the sum of the perpendicular distances of the closest pattern in each class to the hyperplane separating the two classes. This turns out to be equivalent to minimizing the norm of the weight vector $\mathbf{w}$ subject to the constraints

$$\mathbf{w}^T \mathbf{x}_i + b \geq 1, \; \text{for } i \in C_1 \quad \text{and} \quad \mathbf{w}^T \mathbf{x}_j + b \leq -1, \; \text{for } j \in C_2$$

where $b$ is the bias. The objective function is

$$\min_{\mathbf{w}} \; \mathbf{w}^T \mathbf{w}$$

subject to the constraints in Equation 2. Using a Lagrange parameter vector $\{\alpha_i\}$ to express the constraints, the objective function can be written as

$$E(\mathbf{w}, b, \alpha) = \frac{1}{2} \mathbf{w}^T \mathbf{w} - \sum_{i=1}^{N} \alpha_i \left[ y_i \left(\mathbf{w}^T \mathbf{x}_i + b\right) - 1 \right]$$

with non-negativity constraints $\alpha_i \geq 0 \; \forall i$. If the patterns are linearly separable, the SVM objective function can be converted into its dual formulation which is a convex quadratic programming problem with linear constraints. Efficient algorithms are available to solve the dual [56], [57]. A large margin classifier leads to good generalization. If a few patterns behave like outliers, they can be rejected while maximizing the margin. A trade-off exists between margin maximization and outlier rejection.

SVM's were originally designed for binary classification problems. The effective extension of SVM to the multiclass classification problem is still an on-going research issue. Several methods have been proposed where typically we construct a multiclass classifier by combining several binary SVM classifiers. More recently, objective function methods have been proposed that consider all classes at once [44].

The approach in [58] extends the binary SVM to the multiple classes. They design a single objective function that takes all classes into consideration simultaneously. The idea is derived from the support vector machine formulation and is similar to the one-vs-all approach. In this optimization problem, $K$ binary decision rules are formed, where the $k$th function $\mathbf{w}_k^T \mathbf{x} + b$ separates training vectors of class $k$ from the others. Here $K$ is the number of classes or labels. Thus, there exist $K$ decision functions. The optimization problem is formulated as follows:

$$\min_{\{\mathbf{w}, b, \xi\}} \; \frac{1}{2} \sum_{k=1}^{K} \mathbf{w}_k^T \mathbf{w}_k + C \sum_{i=1}^{l} \sum_{k \neq y_i} \xi_i^k$$

where the set $\{\mathbf{w}, b, \xi\}$ denotes all $K$ weight vectors, biases and outlier variables. $C$ controls the trade-off between slack variable penalty and the margin in minimizing the error and the model complexity. The class label $y_i \in \{1, \ldots, K\}$ is a departure from the binary-valued $y_i$ in the two class SVM. The constraints for pattern $i$ are

$$\mathbf{w}_{y_i}^T \mathbf{x}_i + b_{y_i} \geq \mathbf{w}_k^T \mathbf{x}_i + b_k + 2 - \xi_i^k, \qquad \xi_i^k \geq 0 \quad \forall i, k, \; k \in \{1, \ldots, K\} \setminus y_i,$$

with similar constraints for the rest of the patterns. The label assigned to incoming test pattern $\mathbf{x}$ is

$$\arg\max_{k \in \{1, \ldots, K\}} \left(\mathbf{w}_k^T \mathbf{x} + b_k\right)$$

which is as per the one-vs-all method. Examination of the constraints shows that each pattern $\mathbf{x}_i$ has to separately satisfy inequality constraints with every other class.

We now describe another multiclass SVM (the M-SVM) by [59] also based on a single objective function instead of a mere combination of binary SVM's. The approach sets up an optimization problem as follows:

$$\min_{\{\mathbf{w}, \xi\}} \; \frac{1}{2} \sum_{k=1}^{K} \mathbf{w}_k^T \mathbf{w}_k + C \sum_{i=1}^{L} \xi_i$$

subject to

$$\mathbf{w}_{y_i}^T \mathbf{x}_i - \mathbf{w}_k^T \mathbf{x}_i \geq e_i^k - \xi_i \quad \text{for } i = 1, \ldots, L$$

where $e_i^k = 1 - \delta_{y_i, k}$ with $\delta$ being the Kronecker delta function. The decision function is

$$\arg\max_{k \in \{1, \ldots, K\}} \left(\mathbf{w}_k^T \mathbf{x}\right).$$

The binary SVM and the multiclass SVM's have natural kernel extensions where the inner products between weight vectors and the patterns are carried out using reproducing kernel Hilbert space (RKHS) kernels. The use of kernels does not change the fundamental limitation of the multiclass SVM: that it is an extension of the two class SVM with classes treated as opponents.

2.7 Background Summary

While we have addressed the most popular techniques in dimensionality reduction and multiclass classification (reduction to the two class case and objective function-based combination), this is not an exhaustive study of the literature. Our focus so far is primarily on discriminant and objective function-based methods that assist in better multiclass classification performance.

The closest we have seen in relation to our work on category spaces is the work in [60] and [61]. The author mentions the importance and usefulness of modeling vector spaces for document retrieval and explains how unrelated items should have an orthogonal relationship. This is to say that they should have no features in common. The structured SVM in [62] is another recent effort at going beyond nominal classes. Here, classes are allowed to have internal structure in the form of strings, trees etc. However, an explicit demonstration of classes as vector spaces is not carried out. In the subsequent chapters, we show that the category vector space avoids the need to treat classes as nominal labels and avoids pairwise opposition as a means of classification.
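As a concrete instance of the regression route in Section 2.6.5, the sketch below computes the closed-form multiclass least-squares solution $\hat{W} = (X X^T)^{-1} X Y^T$ with patterns stored as columns of $X$, as in the text, and then classifies by a winner-take-all over the $K$ regression outputs; the synthetic data and the use of a pseudo-inverse in place of the plain inverse are illustrative choices, not part of the dissertation.

import numpy as np

rng = np.random.default_rng(0)
K, D, N = 3, 4, 150
labels = rng.integers(0, K, size=N)
X = rng.normal(size=(D, N)) + labels            # D x N, columns are input patterns
X = np.vstack([X, np.ones((1, N))])             # append a 1 for the bias component
Y = np.eye(K)[labels].T                         # K x N binary (one-hot) class outputs

W_hat = np.linalg.pinv(X @ X.T) @ X @ Y.T       # (D+1) x K closed-form weight matrix

# Winner-take-all over the K regression outputs assigns the class label.
pred = (W_hat.T @ X).argmax(axis=0)
print((pred == labels).mean())                  # training accuracy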

PAGE 34

CHAPTER3 ACATEGORYSPACEAPPROACHFORTHEMULTICLASSPROBLEM 3.1OverviewofTheCategorySpace Thecoreideabehindlearningacategoryspaceisthatclassesarerepresentedbyaxes inaEuclideanvectorspace,andthisrepresentationislearnedfrompatternsinsucha waythatsimilarpatternsareneartooneanotherinthatspace.Wedescribethecategory spacebyusinglinearalgebraconceptsandformalizingintermsoforthogonality.This mathematicalframeworkhasbeenderivedfromsimilarworkthatiscoveredingreater detailin[63].Theideaisexplainedclearlythroughanexample.Let S beauniversalset withadiscourseof U ,thenbydenitionthecomplement S ? is`NOT S 'andcorresponds totheset S in U .Then,theconcept`NOT S 'shouldcorrespondtotheorthogonal complement S ? of S .Furthermore,let V beavectorspaceand S beasubspaceof V .Ifweuseasetoffeaturestodeneabasisfor V ,thissaysthat`NOT S 'refersto thesubspaceof V whichhasnofeaturesincommonwith S .Thisexampledescribes orthogonalashaving zerosimilarityorbeing completelyunrelated.Foraclassication problemthisperfectlyacceptable,whereweassumethatiftwoinputpatternshaveno featuresincommontheylieondierentaxes.Now,fordiscriminationthiscorrespondsto anoppositionbetweenaspecicaxisandthesubspaceformedbytheremainingaxes. Inourendeavortoendowmulticlasspatternswithageometryakintoavectorspace, wechoosetomodelclassesasbasesinthenon-negativeorthant R K + ,withthedimension correspondingtothe K classes.Forourpurposes,theset R K + includestheorigin.The K categoriesformanorthonormalbasisinthisspacewitheachclasshavingitsowncategory axis.Apatternwhenprojectedintothiscategoryspacehasadegreeofmembershipto eachclassproportionaltotheanglebetweenthepatternandthecategoryaxis.Thebasic ideaisthatinputpatterns,whenrepresentedinthecategoryspace,aremodeledasvectors representableasalinearcombinationofourlearnedbasis.Sinceonlythenon-negative orthantisused,thesignoftheinnerproductbetweenafeaturevectorandthecategory 34

Training set patterns are compared with only their own category basis vector and not with the other basis vectors. This avoids the oppositional (e.g. one-vs-all) problem faced by the other methods. Incoming test patterns can be compared with each category basis; if desired, a winner can be chosen and a nominal label assigned. In addition, this framework affords a novel capability unavailable in other contemporary classification methodologies. Because we have learned a (possibly complete) category basis for our input pattern space, we can design a classifier in this space that assigns new labels to patterns which may be combinations of the individual categories that constituted our training data.

With this informal description in place, we describe the payoff in using a category space. Recall from our discussion in the previous chapter the inherent oppositional dichotomy faced by the SVM and other methods in the multiclass setting. Since these methods use nominal classes, they attempt to discriminate between discrete classes. In contrast, in our approach, each projected pattern measures its deviation from a class in terms of the inner product with its category axis. This avoids the oppositional problem of the other methods. Since the category space has a geometric structure, there is no need to set up the supervised learning problem using discrimination. Instead, the supervised learning problem becomes one of maximizing, over the training set, a suitable function of the inner products of patterns with category axes. Discrimination is a natural byproduct of the learned, orthonormal category basis. The competition in our development is for each pattern to have greater inner product magnitude with respect to its labeled category axis and lesser inner product magnitude with respect to the other category axes. These notions are formalized in the following sections.

3.2 The Mathematical Framework for Category Spaces

Categorization is a process in which objects are identified and separated, much like classification. But, in addition, the process of categorization implies a purpose-driven interactive process wherein relationships between categories are actively constructed [10].

We first introduce the two class category space formulation and then follow with the multiclass generalization. This is done to show that there are no hidden oppositional aspects that have to be dropped in the shift to multiclass.

We begin with a set of $N$ labeled input vectors $x_i \in \mathbb{R}^D$ drawn from two classes $C_1$ and $C_2$. Using an unknown, global origin $x_0 \in \mathbb{R}^D$, the patterns are mapped to the category space by the transformation $z_i = W^T (x_i - x_0)$, to obtain a corresponding set of $N$ samples $z_i \in \mathbb{R}^K$ where $K = 2$. Here $W$ is $D \times K$, with $W = [w_1, w_2]$ and $w_1$, $w_2$ the columns of the matrix $W$. We bypass the issue of estimating the origin $x_0$ for the moment and address it in a later chapter. We begin by allowing for the discovery of new categories via mixtures while simultaneously avoiding pairwise discrimination. The constraints we enforce on the weight vectors (the columns of $W$) should reflect these principles. The easiest way to take these considerations into account is for the weight vectors to form an orthonormal pair. Recall that pairwise discrimination is the determination of membership based on a direct comparison of a new pattern to the nominal class labels. Pairwise discrimination is avoided since orthogonality guarantees that the greater the magnitude of the inner product of a pattern with $w_1$, the lesser the magnitude of its inner product with $w_2$. Further, any pattern projected into the subspace spanned by $W$ can be conceived as a new, possibly unique combination of categories. Finally, these principles allow for a straightforward and organic extension to multiclass.

The orthonormality constraints on $W$ are only half the story. We also require a principle for projecting the patterns into the subspace spanned by $W$. Note that with the classes no longer conceived in nominal terms, we cannot set up regression objectives with binary-valued targets. Instead, the only guiding principle we can fall back on is that the inner product magnitude of each pattern with its own category basis vector can be expected to be larger than its inner product magnitude with the other basis vector. This will not be sufficient to allow us to fix the origin $x_0$, an issue we will tackle shortly.

Based on these considerations, the first objective function we consider is
\[
\min_{w_1,w_2} E(W) = -\frac{1}{2}\sum_{i\in C_1} \left( w_1^T x_i \right)^2 \;-\; \frac{1}{2}\sum_{j\in C_2} \left( w_2^T x_j \right)^2,
\]
where the origin $x_0$ has been removed for now. This objective function is to be minimized subject to the constraints
\[
W^T W = I_2,
\]
where $I_2$ is the $K \times K$ identity matrix (here $K = 2$). While we could have used any monotonically increasing function of $|w^T x|$, we use the squared magnitude of the inner product for algorithmic reasons which are detailed below.

The main reason for the choice of squared inner products is that it leads to a quadratic objective function under orthonormality constraints. It turns out that the global minima of such objective functions and constraints have recently been characterized, making this an attractive choice.

The objective function is expanded as follows:
\[
\min_{w_1,w_2} E(w_1,w_2) = -\frac{1}{2} w_1^T R_1 w_1 - \frac{1}{2} w_2^T R_2 w_2,
\qquad \text{where} \qquad
R_1 \stackrel{\text{def}}{=} \sum_{i\in C_1} x_i x_i^T
\quad \text{and} \quad
R_2 \stackrel{\text{def}}{=} \sum_{j\in C_2} x_j x_j^T
\]
are correlation matrices. This objective function and these constraints have been analyzed in [64], [65] for the case where $R_1$ and $R_2$ are positive definite. We now turn to a description of the necessary and sufficient conditions for local and global minima.
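As an illustration of the quantities just defined, the sketch below (our own, and not part of the dissertation's experimental code) builds the class correlation matrices $R_k$ and evaluates the quadratic objective for a candidate $W$; the ridge term `eps` anticipates the regularization used later in the multiclass case.

```python
import numpy as np

def class_correlation_matrices(X, y, eps=1e-6):
    """R_k = sum_{i in C_k} x_i x_i^T, with a small ridge added to each diagonal
    so that every R_k is positive definite (as suggested for the multiclass case)."""
    D = X.shape[1]
    return [X[y == c].T @ X[y == c] + eps * np.eye(D) for c in np.unique(y)]

def quadratic_objective(W, R_list):
    """E(W) = -(1/2) sum_k w_k^T R_k w_k, minimized subject to W^T W = I."""
    return -0.5 * sum(W[:, k] @ R_k @ W[:, k] for k, R_k in enumerate(R_list))
```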

The orthonormality constraints can be expressed using a Lagrange parameter matrix to obtain the following constrained optimization problem:
\[
\min_{W} L(W, \Lambda) = -\frac{1}{2} w_1^T R_1 w_1 - \frac{1}{2} w_2^T R_2 w_2 + \operatorname{trace}\!\left( \Lambda \left( W^T W - I_2 \right) \right),
\]
where $\Lambda$ is the Lagrange parameter matrix used to express the orthonormality constraints. Setting the gradient of $L$ w.r.t. $W$ to zero, we obtain
\[
\nabla_W L(W,\Lambda) = -\left[ R_1 w_1, R_2 w_2 \right] + W\left( \Lambda + \Lambda^T \right) = 0.
\]
Invoking the orthonormality constraints allows us to solve for the Lagrange parameter matrix:
\[
W\left( \Lambda + \Lambda^T \right) = \left[ R_1 w_1, R_2 w_2 \right].
\]
Using the constraint $W^T W = I_2$, we get
\[
\Lambda + \Lambda^T = W^T \left[ R_1 w_1, R_2 w_2 \right].
\]
Since $\Lambda + \Lambda^T$ is symmetric, this immediately implies that $W^T [R_1 w_1, R_2 w_2]$ is symmetric at a local minimum. The solution can be shown [65] to be contained in the polar decomposition of $R(W) = [R_1 w_1, R_2 w_2]$. Let the polar decomposition be
\[
\left[ R_1 w_1, R_2 w_2 \right] = U P,
\]
where $U$ is orthonormal and $P$ positive definite. The polar decomposition is written in terms of the singular value decomposition (SVD) $U \Sigma V^T = [R_1 w_1, R_2 w_2]$. Using the orthonormality property $V^T V = I_2$, we can write
\[
\left[ R_1 w_1, R_2 w_2 \right] = U \Sigma V^T = \left( U V^T \right)\left( V \Sigma V^T \right).
\]

Let $W S = (U V^T)(V \Sigma V^T)$, i.e., $W = U V^T$ and $S = V \Sigma V^T$. Bolla et al. [65] showed that the necessary conditions for an extremum of the quadratic objective are
\[
\left[ R_1 w_1, R_2 w_2 \right] = W S,
\]
where $W^T W = I_2$ and $S$ is symmetric. The authors also show that a necessary condition for the above extremum to be a global minimum is that $S = W^T R(W)$ be positive semidefinite in addition to being symmetric. This characterizes a first order necessary (or weak sufficient) condition.

It would be helpful to characterize not only the conditions for extrema and local minima but also those for global minima. Local conditions that guarantee a global minimum are available for this problem. To set this up, we introduce some helpful new notation. Given the class correlation matrices $R_1$ and $R_2$, we define
\[
\mathbf{R} \stackrel{\text{def}}{=}
\begin{bmatrix} R_1 & 0 \\ 0 & R_2 \end{bmatrix},
\]
where $\mathbf{R}$ is a $KD \times KD$ block diagonal matrix, and
\[
S_1(W) \stackrel{\text{def}}{=}
\begin{bmatrix}
\left( w_1^T R_1 w_1 \right) I & \frac{1}{2}\left( w_1^T R_1 w_2 + w_2^T R_2 w_1 \right) I \\
\frac{1}{2}\left( w_1^T R_1 w_2 + w_2^T R_2 w_1 \right) I & \left( w_2^T R_2 w_2 \right) I
\end{bmatrix},
\]
with identity blocks $I$. Rapcsák [64] proved that if $\mathbf{R} - S_1(W)$ is positive semidefinite, then this characterizes a second order sufficient condition for a global minimum point, provided the other conditions are satisfied.

Having developed the category space approach for the case $K = 2$, we now develop the multiclass extension. Instead of two basis vectors, we have $K$ category basis vectors, with $W = [w_1, w_2, \ldots, w_K]$ and the orthonormality constraints $W^T W = I_K$.

The objective function is
\[
\min_{W} E(W) = -\frac{1}{2}\sum_{k=1}^{K} w_k^T R_k w_k,
\qquad \text{where} \qquad
R_k \stackrel{\text{def}}{=} \sum_{i\in C_k} x_i x_i^T.
\]
Since $R_k$ will usually not be positive definite, we add a small positive constant to each diagonal element to make it positive definite. The necessary conditions for extrema are
\[
\left[ R_1 w_1, R_2 w_2, \ldots, R_K w_K \right] = W S,
\]
where $W^T W = I_K$ and $S$ is symmetric. A necessary condition for the above extremum to be a global minimum is that $S$ be positive semidefinite in addition to being symmetric. To characterize global minima, we define the $KD \times KD$ block diagonal matrix
\[
\mathbf{R} \stackrel{\text{def}}{=} \operatorname{blockdiag}\left( R_1, R_2, \ldots, R_K \right)
\]
and the block matrix $S_1(W)$ whose $(i,j)$ block is $\frac{1}{2}\left( w_i^T R_i w_j + w_j^T R_j w_i \right) I$, so that the diagonal blocks are $\left( w_k^T R_k w_k \right) I$. If $\mathbf{R} - S_1(W)$ is positive semidefinite, then we are at a global minimum, provided the other conditions are satisfied.

In addition to providing necessary conditions for global minima, the authors in [65] developed an iterative procedure (see Algorithm 3.1 for its description). Bolla et al. [65] also analyze its convergence to a local minimum.

The iterative search begins with an arbitrary orthonormal system $W^{(0)}$ for initialization, and then progresses by generating a sequence $W^{(1)}, W^{(2)}, \ldots, W^{(m)}, \ldots$. If $W^{(m)}$ is already known, the polar decomposition of the matrix $\left[ R_1 w_1^{(m)}, \ldots, R_K w_K^{(m)} \right]$ is performed. This decomposition is the product of a $D \times K$ matrix with orthonormal columns and a $K \times K$ symmetric positive semidefinite (but not necessarily positive definite) matrix, for $m = 0, 1, 2, \ldots$. The polar decomposition is unique if the columns of the matrix $\left[ R_1 w_1^{(m)}, \ldots, R_K w_K^{(m)} \right]$ are linearly independent. In step $m+1$, $W^{(m+1)}$ is taken to be the $D \times K$ orthonormal matrix, the first factor in the polar decomposition.

Algorithm 3.1 Iterative Process for Minimization
  Choose an arbitrary initial orthonormal system $W^{(0)}$.
  Repeat for $m = 0, 1, 2, \ldots$:
    Perform the polar decomposition, via the SVD, of $\left[ R_1 w_1^{(m)}, \ldots, R_K w_K^{(m)} \right]$:
      $U \Sigma V^T = \left[ R_1 w_1^{(m)}, \ldots, R_K w_K^{(m)} \right]$, with $V^T V = I$,
      so that $\left( U V^T \right)\left( V \Sigma V^T \right) = U \Sigma V^T$ is the product of a $D \times K$ matrix with
      orthonormal columns and a $K \times K$ symmetric positive semidefinite matrix.
    Set $W^{(m+1)} = U V^T$, the first factor of this decomposition.
  Until $\mathbf{R} - S_1(W)$ is positive semidefinite OR $\left\| W^{(m+1)} - W^{(m)} \right\|_F \leq \epsilon$.
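The following NumPy sketch illustrates Algorithm 3.1 (a minimal illustration under our own variable naming, reusing the `class_correlation_matrices` helper above; it is not the implementation used for the experiments in Chapter 5). Only the Frobenius-norm stopping rule is used; the positive semidefiniteness test of Appendix B could be added as a second stopping criterion.

```python
import numpy as np

def polar_orthonormal_factor(M):
    """Return the orthonormal factor U V^T of the polar decomposition M = (U V^T)(V S V^T)."""
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ Vt

def minimize_category_space(R_list, D, K, max_iter=500, tol=1e-8, seed=0):
    """Iterative minimization of -(1/2) sum_k w_k^T R_k w_k over W^T W = I_K (Algorithm 3.1)."""
    rng = np.random.default_rng(seed)
    # Arbitrary initial orthonormal system W^(0)
    W = np.linalg.qr(rng.standard_normal((D, K)))[0]
    for _ in range(max_iter):
        # [R_1 w_1^(m), ..., R_K w_K^(m)]
        RW = np.column_stack([R_list[k] @ W[:, k] for k in range(K)])
        W_next = polar_orthonormal_factor(RW)      # first factor of the polar decomposition
        if np.linalg.norm(W_next - W, ord='fro') < tol:
            return W_next
        W = W_next
    return W
```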

Once we have determined the optimal $W$, the patterns are projected to the category space, $z_i = W^T x_i$, to obtain a corresponding set of $N$ samples $z_i \in \mathbb{R}^K$, on which a method of classification can be defined and used to determine category membership. Ideally, we want patterns to be projected towards their respective class axes within the category space. This lends itself to a classifier that exploits the intrinsic geometry of the category space: in our case, the degree of membership to each class is proportional to the angle between the projected pattern and the respective category axis.

In this chapter, we have mathematically developed our notion of a category space and showed how it can be formulated as a quadratic minimization problem with orthonormality constraints. It turns out that this optimization problem has recently been characterized [64], [65], together with the necessary and sufficient conditions needed for recognizing local and global minima. Also, an extremely simple algorithm is available which is guaranteed to converge to a local minimum. This justifies our choice of maximizing the squared inner products of patterns w.r.t. their category basis vectors. The choice of the origin $x_0$ is still an open issue, to which we return later in this work.

CHAPTER 4
ALTERNATIVE APPROACHES TO SUPERVISED DIMENSIONALITY REDUCTION

4.1 Kernel Based Quadratic Formulation

The basic idea in this chapter is to create a classifier that takes advantage of the strength of non-linear approaches and of how kernels are used to project data. A common approach to formulating this type of classifier is to map the data from the input (feature) space onto a new higher dimensional feature space such that a problem that is nonseparable in the input space becomes separable in the higher dimensional space. To achieve this type of mapping we apply a non-linear transformation using a suitable basis function and then use a linear model in the feature space for classification. The linear model in the feature space corresponds to a non-linear model in the input space; this is commonly known as the kernel trick. The geometric interpretation of the kernel as an inner product in a feature space was first introduced by Aizerman et al. [66]. The kernel defines an inner product between two data points.

We look to extend the quadratic dimensionality reduction approach defined in Chapter 3 to a kernel formulation in an effort to improve the accuracy of supervised classification. The quadratic objective function defined there is formulated under the assumption that the problem is linearly separable. That formulation has shown promise as a linear classifier and has intrinsic properties that lend themselves to extension to a non-linear classifier. Our kernel approach defines an extension to the linear multiclass solution and provides the framework for a non-linear multiclass classifier.

By extending the quadratic objective, we seek to derive an objective function that provides reliability similar to our linear solution, but with an emphasis on a different inner product structure. To achieve this and make use of the data in feature space, it is necessary to define an inner product space $\mathcal{H}$ with inner product $\langle \cdot, \cdot \rangle_{\mathcal{H}}$ between two vectors. Kernel methods operate in such a high dimensional feature space.

Inputs are mapped to the feature space by a mapping function $\Phi$:
\[
\Phi : \mathcal{X} \rightarrow \mathcal{H}, \qquad x \mapsto \Phi(x).
\]
This approach has become a very popular tool in machine learning. We define the weight matrix $W = [w_1, w_2, \ldots, w_K]$, whose columns are the vectors associated with the classes $C_1, \ldots, C_K$, and restrict each column independently to lie in the subspace spanned by the projections of the training points. They are defined such that
\[
w_k = \sum_{n=1}^{N} \alpha_n^{(k)} \, \Phi(x_n),
\]
with $\| w_k \| = 1$. Here $\alpha^{(k)}$ is the set of coefficients corresponding to the training patterns for $w_k$. Intuitively, the utility of the mapping $\Phi$ is that non-linear relationships in the input space $\mathbb{R}^D$ become linear when mapped into the higher dimensional inner product space $\mathcal{H}$.

The quadratic kernel objective function is defined by substituting this expansion into the quadratic objective of Chapter 3 in place of the vector inner products. In the two class case we begin with the quadratic approach,
\[
\min_{W} L(W, \Lambda) = -\frac{1}{2} w_1^T \Big( \sum_{i\in C_1} x_i x_i^T \Big) w_1
- \frac{1}{2} w_2^T \Big( \sum_{j\in C_2} x_j x_j^T \Big) w_2
+ \operatorname{trace}\!\left( \Lambda \left( W^T W - I_2 \right) \right),
\]
where $\Lambda$ is the Lagrange parameter matrix used to express the orthonormality constraints, and then introduce the kernel-based inner product. Replacing each pattern $x$ by its image $\Phi(x)$ and expanding $w_1$ and $w_2$ in terms of the coefficients $\alpha^{(1)}$ and $\alpha^{(2)}$, the first term becomes
\[
w_1^T \Big( \sum_{i\in C_1} \Phi(x_i)\Phi(x_i)^T \Big) w_1
= \sum_{i\in C_1} \Big( \sum_{n=1}^{N} \alpha_n^{(1)} k(x_n, x_i) \Big)^2
= \alpha^{(1)T} Q_1 \, \alpha^{(1)},
\]

where $Q_1$ is the $C_1$ class Gram (kernel) matrix. Similarly, the second term becomes
\[
w_2^T \Big( \sum_{j\in C_2} \Phi(x_j)\Phi(x_j)^T \Big) w_2
= \sum_{j\in C_2} \Big( \sum_{n=1}^{N} \alpha_n^{(2)} k(x_n, x_j) \Big)^2
= \alpha^{(2)T} Q_2 \, \alpha^{(2)},
\]
where $Q_2$ is the $C_2$ class Gram kernel matrix. In the expansion of the orthonormality constraints we see that
\[
w_1^T w_2 = \sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i^{(1)} \alpha_j^{(2)} \, \Phi(x_i)^T \Phi(x_j)
= \sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i^{(1)} \alpha_j^{(2)} \, k(x_i, x_j)
= \alpha^{(1)T} K \, \alpha^{(2)} = 0
\]
and
\[
w_1^T w_1 = \alpha^{(1)T} K \, \alpha^{(1)} = 1,
\qquad
w_2^T w_2 = \alpha^{(2)T} K \, \alpha^{(2)} = 1
\]
are also defined in terms of the kernel mapping and coefficients. Here $K$ is the global $N \times N$ Gram kernel matrix whose elements are $k(x_i, x_j)$, i.e.,
\[
K = \begin{bmatrix}
k(x_1,x_1) & k(x_2,x_1) & \cdots & k(x_N,x_1) \\
k(x_1,x_2) & k(x_2,x_2) & \cdots & k(x_N,x_2) \\
\vdots & & \ddots & \vdots \\
k(x_1,x_N) & k(x_2,x_N) & \cdots & k(x_N,x_N)
\end{bmatrix}.
\]
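A small sketch of how the kernel quantities above can be assembled is given below (our own illustration; the RBF kernel and the parameter `gamma` are an assumption, since any valid RKHS kernel can be used here).

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    """k(x, z) = exp(-gamma ||x - z||^2); any valid RKHS kernel could be substituted."""
    d2 = (X**2).sum(1)[:, None] + (Z**2).sum(1)[None, :] - 2 * X @ Z.T
    return np.exp(-gamma * d2)

def gram_matrices(X, y, gamma=1.0):
    """Global N x N Gram matrix K and the class matrices Q_k = sum_{i in C_k} k_i k_i^T,
    where k_i denotes the i-th column of K."""
    K = rbf_kernel(X, X, gamma)
    Q_list = [K[:, y == c] @ K[:, y == c].T for c in np.unique(y)]
    return K, Q_list
```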

By definition, $K$ is symmetric and positive semidefinite. Using these expansions, the quadratic kernel objective is now written as
\[
\min_{\Gamma} L(\Gamma, \Lambda) = -\frac{1}{2}\sum_{k=1}^{K} \alpha^{(k)T} Q_k \, \alpha^{(k)}
+ \frac{1}{2}\operatorname{trace}\!\left( \Lambda \left( \Gamma^T K \Gamma - I_K \right) \right),
\]
where the $\alpha^{(k)}$ are the columns of the $N \times K$ coefficient matrix $\Gamma = [\alpha^{(1)}, \alpha^{(2)}, \ldots, \alpha^{(K)}]$, $K$ is again the global Gram matrix, and
\[
Q_k = \sum_{i\in C_k} k_i \, k_i^T,
\]
with $k_i$ the $i$-th column of $K$. The entries of the $Q_k$ matrices are given by a standard positive definite kernel, $k(x_n, x_i) = \langle \Phi(x_n), \Phi(x_i) \rangle_{\mathcal{H}}$ [67], [54], [22]. These matrices are the access point through which the training patterns enter the algorithm, via the entries of the kernel matrix.

This kernel formulation of the objective function retains the same properties as the quadratic form of Chapter 3: we continue to minimize the (negated) sum of $K$ symmetric, positive semidefinite $N \times N$ matrices $Q_1, Q_2, \ldots, Q_K$, under the constraint that the coefficient vectors $\alpha^{(1)}, \ldots, \alpha^{(K)} \in \mathbb{R}^N$ form an orthonormal system with respect to the kernel inner product. This is convenient because it allows us to take advantage of the iterative framework previously described, which characterizes both the weak and the sufficient conditions for this system.

Let $K = K^{\frac{1}{2}} K^{\frac{1}{2}}$. Then we can rewrite the orthonormality constraints as
\[
\alpha^{(1)T} K \, \alpha^{(2)} = \left( K^{\frac{1}{2}}\alpha^{(1)} \right)^T \left( K^{\frac{1}{2}}\alpha^{(2)} \right) = 0, \qquad
\alpha^{(1)T} K \, \alpha^{(1)} = \left( K^{\frac{1}{2}}\alpha^{(1)} \right)^T \left( K^{\frac{1}{2}}\alpha^{(1)} \right) = 1, \qquad
\alpha^{(2)T} K \, \alpha^{(2)} = \left( K^{\frac{1}{2}}\alpha^{(2)} \right)^T \left( K^{\frac{1}{2}}\alpha^{(2)} \right) = 1.
\]

The next step is to substitute $u = K^{\frac{1}{2}} \alpha^{(1)}$ and $v = K^{\frac{1}{2}} \alpha^{(2)}$, in terms of which the orthonormality constraints become
\[
u^T v = 0, \qquad u^T u = 1, \qquad v^T v = 1.
\]
Solving for $\alpha^{(1)} = K^{-\frac{1}{2}} u$ and $\alpha^{(2)} = K^{-\frac{1}{2}} v$, the objective function can be written as
\[
\min_{u,v} F(u,v) = -\left( K^{-\frac{1}{2}} u \right)^T Q_1 \left( K^{-\frac{1}{2}} u \right)
- \left( K^{-\frac{1}{2}} v \right)^T Q_2 \left( K^{-\frac{1}{2}} v \right)
+ \operatorname{trace}\!\left( \Lambda \left( U^T U - I_2 \right) \right).
\]
Rearranging the terms, we get
\[
\min_{u,v} F(u,v) = -u^T K^{-\frac{1}{2}} Q_1 K^{-\frac{1}{2}} u
- v^T K^{-\frac{1}{2}} Q_2 K^{-\frac{1}{2}} v
+ \operatorname{trace}\!\left( \Lambda \left( U^T U - I_2 \right) \right),
\]
where $\Lambda$ is the Lagrange parameter matrix used to express the orthonormality constraints on $U = [u, v]$. This formulation fits into the same optimization framework defined in Algorithm 3.1. There may be issues due to the positive semidefiniteness of $K$; in this case, we regularize $K$ to make it positive definite.
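The whitening step above can be sketched as follows (our own illustration, reusing the `minimize_category_space` routine from Chapter 3; the eigenvalue jitter is the regularization mentioned for a semidefinite Gram matrix).

```python
import numpy as np

def inverse_sqrt(K, jitter=1e-8):
    """Symmetric K^{-1/2}; the jitter regularizes a positive semidefinite Gram matrix."""
    vals, vecs = np.linalg.eigh(K + jitter * np.eye(K.shape[0]))
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def kernel_category_space(K, Q_list, n_classes, **solver_kwargs):
    """Whiten the constraint Gamma^T K Gamma = I via u_k = K^{1/2} alpha_k and reuse the
    iterative solver on the transformed matrices K^{-1/2} Q_k K^{-1/2}."""
    Kihalf = inverse_sqrt(K)
    Q_tilde = [Kihalf @ Q @ Kihalf for Q in Q_list]
    # Note: here D equals N (the number of training patterns) in the whitened problem.
    U = minimize_category_space(Q_tilde, D=K.shape[0], K=n_classes, **solver_kwargs)
    return Kihalf @ U   # recover the coefficients alpha^(k) = K^{-1/2} u_k
```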

Compiling the matrix
\[
\left[ K^{-\frac{1}{2}} Q_1 K^{-\frac{1}{2}} u, \;\; K^{-\frac{1}{2}} Q_2 K^{-\frac{1}{2}} v \right],
\]
a feasible solution is determined by its polar decomposition, obtained via the SVD as
\[
A D B^T = \left[ K^{-\frac{1}{2}} Q_1 K^{-\frac{1}{2}} u, \;\; K^{-\frac{1}{2}} Q_2 K^{-\frac{1}{2}} v \right],
\]
where the new matrices $A$ and $B$, with $A^T A = I$ and $B^T B = I$, are used to simplify the solution,
\[
\left( A B^T \right)\left( B D B^T \right) = A D B^T,
\]
giving the compact update $W^{(m+1)} = A B^T$ for the next iteration. This follows the iterative framework and conditions analyzed in [65], [64].

4.2 $L_1$ Norm Objective Formulation

The optimization problems in the quadratic and kernel formulations above consist of the minimization of sums of heterogeneous quadratic functions. The theory and minimization framework can be generalized to any set of convex functions $f_1, \ldots, f_M$ defined on $\mathbb{R}^D$ [65]. The generalized objective function is
\[
f(W) = \sum_{k=1}^{K} f_k(w_k) \;\rightarrow\; \min
\]
under the condition that $w_1, \ldots, w_K$ form an orthonormal system. We make use of this theory by analyzing an $L_1$ norm objective function. Beginning with the quadratic objective of Chapter 3, but replacing the squared inner product with a term based on the $L_1$ norm, we get
\[
\min_{W} E(W) = -\sum_{k=1}^{K} \sum_{i\in C_k} \left| w_k^T x_i \right|.
\]
The absolute value function is continuous everywhere, convex, and non-differentiable at $x = 0$. The missing derivative at zero prevents us from using standard derivative-based optimization schemes. It is convenient to first convert the objective into a differentiable form by employing an algebraic transformation, defined in terms of the original variable $X$ and an offset variable $\epsilon$,
\[
|X| \;\rightarrow\; \sqrt{X^2 + \epsilon^2}.
\]

This gives the redefined objective
\[
\min_{W} E(W) = -\sum_{k=1}^{K} \sum_{i\in C_k} \sqrt{ \left( w_k^T x_i \right)^2 + \epsilon^2 }.
\]
The next step is to reduce the complexity of minimizing this objective. This can be achieved by making use of a Legendre transform [68]. Intuitively, the Legendre transform enables expressing the original objective function in terms of the original variable $X$ and an auxiliary variable $u$. Specifically, the Legendre transform characterizes a function $f$ through the function $f^*$ defined by $f^*(p) = \sup_x \left( p x - f(x) \right)$, where $\sup$ denotes the supremum. The Legendre form below is equivalent to the smoothed $L_1$ term, with the approximation to the $L_1$ norm becoming increasingly exact as $\epsilon \rightarrow 0$:
\[
\min_{u} \left[ -\left( u X + \epsilon \sqrt{1 - u^2} \right) \right] = -\sqrt{X^2 + \epsilon^2}.
\]
Differentiating the left hand side w.r.t. $u$ and setting the result to zero, we get the update
\[
u = \frac{X}{\sqrt{\epsilon^2 + X^2}}.
\]
With back substitution and basic algebra it can be shown that the left hand side equals the right hand side, i.e., using the Legendre transform to define our objective function is equivalent to the smoothed objective above. Having derived these properties, the new form of the $L_1$ norm objective function is written as
\[
\min_{W, U} L(W, U) = -\sum_{k=1}^{K} \sum_{i\in C_k} \left[ u_i \left( w_k^T x_i \right) + \epsilon \sqrt{1 - u_i^2} \right]
\]
in terms of the class weight vectors $W = [w_1, w_2, \ldots, w_K]$ and the auxiliary variables $U$, where we have not yet expressed the orthonormality constraints on $W$. The new form is linear in $w$, which simplifies any minimization technique that seeks a solution for $w$. This new $L_1$ norm formulation can be minimized with the same iterative process provided by Bolla et al. [65]: the orthonormal system $W$ is contained in the polar decomposition
\[
\left[ \sum_{i\in C_1} u_i x_i, \;\; \sum_{j\in C_2} u_j x_j, \;\; \ldots, \;\; \sum_{m\in C_K} u_m x_m \right] = W S,
\]
where $W^T W = I$ and $S$ is symmetric.
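A sketch of the resulting alternating minimization is given below (our own illustration; it assumes the `polar_orthonormal_factor` helper from the Algorithm 3.1 sketch and is not the code used for the reported experiments).

```python
import numpy as np

def l1_category_space(X, y, eps=1e-3, max_iter=200, tol=1e-8, seed=0):
    """Alternating minimization of the smoothed L1 objective:
    u-step: u_i = w_k^T x_i / sqrt((w_k^T x_i)^2 + eps^2) for i in C_k;
    W-step: orthonormal polar factor of [sum_{i in C_1} u_i x_i, ..., sum_{i in C_K} u_i x_i]."""
    classes = np.unique(y)
    D, K = X.shape[1], len(classes)
    rng = np.random.default_rng(seed)
    W = np.linalg.qr(rng.standard_normal((D, K)))[0]
    for _ in range(max_iter):
        cols = []
        for k, c in enumerate(classes):
            Xk = X[y == c]
            proj = Xk @ W[:, k]
            u = proj / np.sqrt(proj**2 + eps**2)   # Legendre / auxiliary-variable update
            cols.append(Xk.T @ u)                  # sum_i u_i x_i for this class
        W_next = polar_orthonormal_factor(np.column_stack(cols))
        if np.linalg.norm(W_next - W, 'fro') < tol:
            return W_next
        W = W_next
    return W
```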

4.3 Bias Estimation: Origin

Earlier, in Section 3.2, we made reference to a global bias $x_0$. We begin with a set of $N$ labeled input vectors $x_i \in \mathbb{R}^D$ and an unknown bias term $x_0 \in \mathbb{R}^D$, and suppose the patterns are mapped to the category space by the transformation $z_i = W^T (x_i - x_0)$. The focus becomes estimating the bias $x_0$ and understanding how it impacts the complexity of the solution. Traditionally, when the bias is a concern in minimizing quadratic objectives, methods have treated the bias as the class centroid. We draw from this intuitive approach and begin our formulation with the same concept. Define
\[
x_0^{(k)} = \frac{1}{N_k} \sum_{a\in C_k} w_k^T x_a
\]
to be the (projected) class bias. Starting with the quadratic objective, extended to the multiclass problem with $K > 2$ and accounting for the bias, our bias-free quadratic objective can be written as
\[
L(W) = -\frac{1}{2}\sum_{k=1}^{K} \sum_{i\in C_k} \left( w_k^T x_i - x_0^{(k)} \right)^2.
\]
Substituting the definition of $x_0^{(k)}$ into this objective, we see that an adjustment is made to each category space projection within a class $C_k$ to account for the class centroid in the category space:
\[
L(W) = -\frac{1}{2}\sum_{k=1}^{K} \sum_{i\in C_k} \left( w_k^T x_i - \frac{1}{N_k}\sum_{i'\in C_k} w_k^T x_{i'} \right)^2.
\]

The Legendre transform has been a powerful tool in this research and has played a very important role in our formulations, so we continue in that family of objective functions. Applying the Legendre transformation properties derived in Section 4.2 gives
\[
L(W) = -\sum_{k=1}^{K} \sum_{i\in C_k} u_i^{(k)} \left( w_k^T x_i \right) + \frac{1}{2}\sum_{k=1}^{K} \sum_{i\in C_k} \left( u_i^{(k)} \right)^2,
\]
to which we add the constraint that the class auxiliary variables sum to zero, $\sum_{i\in C_k} u_i^{(k)} = 0$. Our new objective function is
\[
L(W) = -\sum_{k=1}^{K} \sum_{i\in C_k} u_i^{(k)} \left( w_k^T x_i \right)
+ \frac{1}{2}\sum_{k=1}^{K} \sum_{i\in C_k} \left( u_i^{(k)} \right)^2
+ \sum_{k=1}^{K} \lambda_k \sum_{i\in C_k} u_i^{(k)}.
\]
It can be shown that this form does not change the intent of the original (centered) objective. First, differentiate w.r.t. $u_i^{(k)}$ and set the result to zero:
\[
\frac{\partial L(W)}{\partial u_i^{(k)}} = -w_k^T x_i + u_i^{(k)} + \lambda_k = 0.
\]
Next, multiply through by $u_i^{(k)}$ and sum over the members of each class to see that
\[
-\sum_{i\in C_k} u_i^{(k)} \left( w_k^T x_i \right) + \sum_{i\in C_k} \left( u_i^{(k)} \right)^2 + \lambda_k \sum_{i\in C_k} u_i^{(k)} = 0.
\]
Then expand this expression so that our result is
\[
-\sum_{i\in C_k} u_i^{(k)} \left( w_k^T x_i \right) + 2\cdot\frac{1}{2}\sum_{i\in C_k} \left( u_i^{(k)} \right)^2
+ \lambda_k \sum_{i\in C_k} u_i^{(k)} - \frac{1}{2}\sum_{i\in C_k} \left( u_i^{(k)} \right)^2.
\]

Now, using the constraint that the auxiliary variables sum to zero within each class, the objective simplifies to
\[
L(W) = -\frac{1}{2}\sum_{k=1}^{K}\sum_{i\in C_k} \left( u_i^{(k)} \right)^2.
\]
Continuing to solve for the $u_i^{(k)}$ component, we get
\[
u_i^{(k)} = w_k^T x_i - \frac{1}{N_k}\sum_{i'\in C_k} w_k^T x_{i'},
\qquad \text{with} \qquad
\lambda_k = \frac{1}{N_k}\sum_{i'\in C_k} w_k^T x_{i'},
\]
which shows that the auxiliary-variable form is equivalent to the centered quadratic objective. Nothing is lost from the original objective by rewriting it in terms of $u_i^{(k)}$ and adding the constraint $\sum_{i\in C_k} u_i^{(k)} = 0$.

A similar idea extends the $L_1$ norm objective function to a bias-free objective. The $L_1$ norm bias-free objective is
\[
L(W) = -\sum_{k=1}^{K} \sum_{i\in C_k} u_i^{(k)} \left( w_k^T x_i \right)
- \epsilon \sum_{k=1}^{K} \sum_{i\in C_k} \sqrt{ 1 - \left( u_i^{(k)} \right)^2 }
+ \sum_{k=1}^{K} \lambda_k \sum_{i\in C_k} u_i^{(k)},
\]
where the additional term expresses the constraint that the class auxiliary variables sum to zero. Solving for the Lagrange parameters results in a solution for $u_i^{(k)}$ that accounts for the bias:
\[
u_i^{(k)} = \frac{ w_k^T x_i }{ \sqrt{ \left( w_k^T x_i \right)^2 + \epsilon^2 } }
- \frac{1}{N_k}\sum_{i'\in C_k} \frac{ w_k^T x_{i'} }{ \sqrt{ \left( w_k^T x_{i'} \right)^2 + \epsilon^2 } }.
\]
The summation constraint here is not equivalent to the class centroid found for the quadratic objective, but it plays a very similar role. Since the first term in the solution tends to $\pm 1$, the second (bias) term compensates to keep one side from dominating; this is analogous to finding a median value as the solution for $u_i^{(k)}$. The bias term is important in all cases, as it helps determine the location of the origin in the category space.
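In code, the bias-aware auxiliary variables of the $L_1$ formulation amount to a per-class centering of the smoothed signs. The snippet below is a sketch under our own naming (`proj` holds the projections $w_k^T x_i$ for one class) and can replace the u-step in the earlier alternating-minimization sketch.

```python
import numpy as np

def l1_bias_corrected_u(proj, eps=1e-3):
    """Bias-aware auxiliary variables for one class: subtract the class mean so that
    sum_i u_i = 0, which plays the role of the projected class origin."""
    s = proj / np.sqrt(proj**2 + eps**2)
    return s - s.mean()
```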

We have shown how the framework and convergence conditions derived in Chapter 3 can be generalized to other concave objective functions [65]. This demonstrates the strength of our approach and the range of problems that can be solved, provided the objective function can be mathematically reformulated to take advantage of this optimization approach. We have tested these objective functions on various standard benchmark data sets, with the results covered in Chapter 5.

CHAPTER 5
RESULTS AND DISCUSSION

5.1 Performance Measures

One of the most important metrics for evaluating the performance of an algorithm is its accuracy rate. When studying and designing learning algorithms, we are interested in performance on examples not seen during training, i.e., the generalization error. The simplest and most widely used method for estimating prediction error is cross-validation. Ideally, if we had enough data, we would set aside a validation set and use it to assess the performance of our prediction model. Since data is often scarce, this is usually not possible. The approach we take is to split the data into disjoint subsets. Using a hold-out type method, we split the data into a training set and a test set. This process is similar to $k$-fold cross validation [69], where all the examples in the data set are eventually used for both training and testing. The split is done to produce variable sized segments across experiments: for each experiment, we use one subset for training and the remaining data for testing. A total of $T$ trials of training and validation take place, and a misclassification error rate $e_i$ is calculated at the $i$-th trial on a unique random subset of the data held out for validation. The true error is estimated using the average error rate:
\[
E = \frac{1}{T}\sum_{i=1}^{T} e_i.
\]
To determine $e_i$, we perform classification on the category space projected data as follows. Once we have obtained a feasible solution, a probabilistic method for classification is executed by constructing Gaussian density functions for each of the $K$ classes [27] in the category space. Given $K$ classes, the approach builds $K$ density functions $p(x \mid C_k)$, $k \in \{1,\ldots,K\}$. After estimating the density models from the training set of patterns, the models are used to classify incoming patterns by selecting the category membership with the highest class-conditional density.
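The evaluation protocol can be sketched as follows (our own illustration of the hold-out trials and the Gaussian class-conditional classifier; the number of trials, the test fraction, and the covariance ridge are assumptions, since the section does not fix them).

```python
import numpy as np

def gaussian_classifier_error(Z_train, y_train, Z_test, y_test):
    """Fit one Gaussian density per class in the category space and classify test
    patterns by the largest class-conditional (log) density."""
    classes = np.unique(y_train)
    params = []
    for c in classes:
        Zc = Z_train[y_train == c]
        mu = Zc.mean(axis=0)
        cov = np.cov(Zc, rowvar=False) + 1e-6 * np.eye(Z_train.shape[1])
        params.append((mu, np.linalg.inv(cov), np.linalg.slogdet(cov)[1]))
    scores = np.stack([
        -0.5 * np.einsum('nd,dk,nk->n', Z_test - mu, P, Z_test - mu) - 0.5 * logdet
        for (mu, P, logdet) in params], axis=1)
    y_hat = classes[np.argmax(scores, axis=1)]
    return np.mean(y_hat != y_test)

def holdout_error(Z, y, trials=10, test_frac=0.3, seed=0):
    """Average misclassification rate E = (1/T) sum_i e_i over random hold-out splits."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(trials):
        idx = rng.permutation(len(y))
        n_test = int(test_frac * len(y))
        test, train = idx[:n_test], idx[n_test:]
        errors.append(gaussian_classifier_error(Z[train], y[train], Z[test], y[test]))
    return float(np.mean(errors))
```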

5.2 Accuracy Analysis

In this section, we present experimental results on several standard data sets. We demonstrate the effectiveness of our approaches and benchmark their overall classification accuracy against several popular algorithms. See Table 5-1 for a list of the algorithms and abbreviations used throughout the remainder of this chapter. We have used various standard classification data sets from both the KEEL database [70] and the UCI Repository [71]. These data sets vary in the number of classes $K$, samples $N$, and features per sample $D$. The data was deliberately chosen to vary in size and number of attributes in order to exhibit the flexibility of our approaches. All data sets were used in the form provided by the repository without modification; for a complete list of the data sets see Table 5-2. If more details are desired, information about the individual sets can be found at the respective repository website.

The test sets were divided into three major groups: synthetic, image, and feature data. The synthetic data was generated by three independent multivariate Gaussian distributions and used as a basis for calibration (Table 5-3). This $K = 3$ class problem is linearly separable and the expected result is 100% classification accuracy; most importantly, it allowed us to check the behavior and implementation of the comparison algorithms for correctness.

The image data classification accuracy is reported in Table 5-4 through Table 5-7. In these experiments the grayscale intensity value was used as the feature. All images were standardized and registered to ensure equivalent input dimensions for each pattern. All sample images were vectorized, and a principal component analysis (PCA) feature extraction step was performed. We took this approach to remove unwanted noise, which allows the dimensionality reduction algorithms to operate only on significant features.

We report the feature data accuracy in Table 5-8 through Table 5-21. For these tests no preprocessing step was performed, and the vectorized feature data is used in its native form.

In many of the test cases, our algorithms show classification results comparable to traditional methods used for dimensionality reduction. Overall, the four algorithms developed in this research (CQS, K-CQS, CAS, and K-CAS) proved promising on a diverse test set, and in many cases exceeded the performance of the more popular reduction approaches. In addition, since the focus of this research is how dimensionality reduction impacts the performance of multiclass classification, we include the results for SVM and K-SVM as a reference: a traditional discriminant classifier that applies a standard one-against-all conversion technique to solve the multiclass problem.

NOTE: The results in the data tables are reported in the form of average accuracy ± error margin.

Table 5-1. List of algorithms
  Algorithm Name                                  Abbreviation
  Category Vector Space (Quadratic)               CQS
  Kernel Category Vector Space (Quadratic)        K-CQS
  Category Vector Space (L1 norm)                 CAS
  Kernel Category Vector Space (L1 norm)          K-CAS
  Least Squares Linear Discriminant Analysis      LS-LDA
  Principal Component Analysis                    PCA
  Multiclass Fisher Linear Discriminant           MC-FLD
  Kernel Multiclass Fisher Linear Discriminant    K-MC-FLD
  Multiclass Support Vector Machine               SVM
  Multiclass Kernel Support Vector Machine        K-SVM

Table 5-2. List of test data sets
  Data Set Name                      #Classes      #Attributes
  Gaussian                           3             3
  CalTech Face                       6-20          5400/15
  Coil                               20            16384/67
  English alphabet                   13-805        16
  Movement libras                    15            90
  Handwritten digits                 10-572        64
  Poker (Four, Royal, Straight)      3, 17, 236    10
  Satellite                          6-1533        36
  Segmentation                       7             19
  Vertebral                          3-150         6
  Shuttle                            7-45586       9
  Texture                            11            40
  Iris                               3             4
  Seeds                              3             7
  Wine recognition                   3, 58, 71     13
  Vowel                              10            10
  RaceSpace female                   5-635         10010/60
  RaceSpace male                     5-3451        10010/60

Table 5-3. Synthetic Gaussian data accuracy metrics
  Algorithm    Accuracy (%) ± std
  CQS          100.0 ± 0.0
  CAS          99.8 ± 0.1
  LS-LDA       100.0 ± 0.0
  PCA          100.0 ± 0.0
  MC-FLD       100.0 ± 0.0
  SVM          100.0 ± 0.0
  K-CQS        99.4 ± 0.6
  K-CAS        100.0 ± 0.0
  K-MC-FLD     100.0 ± 0.0
  K-SVM        100.0 ± 0.0

Table 5-4. CalTech grayscale image accuracy metrics
  Algorithm    Accuracy (%) ± std
  CQS          65.9 ± 7.8
  CAS          68.9 ± 5.3
  LS-LDA       80.0 ± 7.4
  PCA          68.9 ± 7.3
  MC-FLD       81.2 ± 5.9
  SVM          94.8 ± 2.4
  K-CQS        74.8 ± 7.8
  K-CAS        84.2 ± 3.7
  K-MC-FLD     30.9 ± 14.8
  K-SVM        94.4 ± 2.4

Figure 5-1. Sample CalTech gray face images. (Photos courtesy of the CalTech face images data set.)

Table 5-5. Coil image accuracy metrics
  Algorithm    Accuracy (%) ± std
  CQS          79.2 ± 2.0
  CAS          78.9 ± 2.8
  LS-LDA       98.2 ± 0.85
  PCA          77.8 ± 1.9
  MC-FLD       99.3 ± 0.3
  SVM          99.8 ± 0.1
  K-CQS        96.8 ± 1.8
  K-CAS        97.2 ± 1.6
  K-MC-FLD     22.2 ± 15.1
  K-SVM        99.9 ± 0.1

Figure 5-2. Coil data sample. (Photos courtesy of the Coil data set.)

Table 5-6. Female grayscale image accuracy metrics
  Algorithm    Accuracy (%) ± std
  CQS          82.1 ± 6.1
  CAS          82.5 ± 7.1
  LS-LDA       96.5 ± 3.0
  PCA          88.4 ± 4.3
  MC-FLD       96.8 ± 1.1
  SVM          98.7 ± 1.3
  K-CQS        84.1 ± 4.4
  K-CAS        88.0 ± 3.6
  K-MC-FLD     55.8 ± 12.6
  K-SVM        99.3 ± 0.8

Table 5-7. Male grayscale image accuracy metrics
  Algorithm    Accuracy (%) ± std
  CQS          84.0 ± 5.3
  CAS          83.6 ± 4.9
  LS-LDA       88.1 ± 5.4
  PCA          74.0 ± 5.8
  MC-FLD       86.3 ± 1.6
  SVM          88.1 ± 4.0
  K-CQS        71.0 ± 11.5
  K-CAS        78.6 ± 21.0
  K-MC-FLD     45.0 ± 15.9
  K-SVM        90.4 ± 4.3

Table 5-8. English alphabet accuracy metrics
  Algorithm    Accuracy (%) ± std
  CQS          87.6 ± 0.8
  CAS          87.4 ± 1.1
  LS-LDA       86.8 ± 0.3
  PCA          86.2 ± 1.2
  MC-FLD       85.9 ± 1.2
  SVM          72.2 ± 1.4
  K-CQS        86.0 ± 1.2
  K-CAS        82.0 ± 2.4
  K-MC-FLD     71.7 ± 28.8
  K-SVM        77.4 ± 1.3

Figure 5-3. Letter data sample. (Photo courtesy of KEEL-dataset repository [70].)

Table 5-9. Movement libras accuracy metrics
  Algorithm    Accuracy (%) ± std
  CQS          86.5 ± 3.5
  CAS          87.1 ± 1.6
  LS-LDA       87.5 ± 3.8
  PCA          85.4 ± 3.9
  MC-FLD       87.2 ± 4.9
  SVM          99.2 ± 0.9
  K-CQS        96.2 ± 4.2
  K-CAS        70.7 ± 29.8
  K-MC-FLD     24.3 ± 16.1
  K-SVM        99.4 ± 0.6

Figure 5-4. Movement libras data sample. (Photo courtesy of KEEL-dataset repository [70].)

Table 5-10. Optical recognition accuracy metrics
  Algorithm    Accuracy (%) ± std
  CQS          93.5 ± 1.4
  CAS          93.6 ± 1.0
  LS-LDA       96.8 ± 0.3
  PCA          89.3 ± 0.5
  MC-FLD       96.6 ± 0.6
  SVM          95.9 ± 0.6
  K-CQS        94.4 ± 0.5
  K-CAS        65.4 ± 30.5
  K-MC-FLD     77.3 ± 12.3
  K-SVM        97.9 ± 0.4

Table 5-11. Handwritten digits accuracy metrics
  Algorithm    Accuracy (%) ± std
  CQS          96.6 ± 0.5
  CAS          96.6 ± 0.8
  LS-LDA       95.5 ± 0.4
  PCA          96.1 ± 0.6
  MC-FLD       94.9 ± 0.6
  SVM          91.2 ± 0.3
  K-CQS        95.7 ± 0.5
  K-CAS        92.7 ± 0.8
  K-MC-FLD     54.0 ± 16.2
  K-SVM        90.8 ± 0.3

Figure 5-5. Optdigits sample data. (Photo courtesy of KEEL-dataset repository [70].)

Table 5-12. Four of a kind, Royal flush, Straight flush accuracy metrics
  Algorithm    Accuracy (%) ± std
  CQS          76.1 ± 10.7
  CAS          74.7 ± 9.7
  LS-LDA       73.3 ± 9.2
  PCA          47.9 ± 17.2
  MC-FLD       43.1 ± 6.3
  SVM          91.2 ± 0.0
  K-CQS        82.7 ± 4.5
  K-CAS        43.6 ± 13.3
  K-MC-FLD     91.2 ± 0.0
  K-SVM        91.2 ± 0.0

Figure 5-6. Poker sample data. (Photo courtesy of UCI Machine Learning repository [71].)
  S1  C1  S2  C2  S3  C3  S4  C4  S5  C5  Class
  1   12  1   13  1   12  1   11  1   12  one-pair

Table 5-13. Satellite data accuracy metrics
  Algorithm    Accuracy (%) ± std
  CQS          86.4 ± 1.0
  CAS          86.4 ± 1.2
  LS-LDA       86.0 ± 1.3
  PCA          82.6 ± 1.7
  MC-FLD       84.4 ± 2.0
  SVM          82.6 ± 0.8
  K-CQS        83.8 ± 1.5
  K-CAS        85.4 ± 1.5
  K-MC-FLD     67.2 ± 1.2
  K-SVM        85.8 ± 0.9

Figure 5-7. Satellite data sample. (Photo courtesy of KEEL-dataset repository [70].)

Table 5-14. Segmentation accuracy metrics
  Algorithm    Accuracy (%) ± std
  CQS          84.3 ± 4.7
  CAS          84.0 ± 4.2
  LS-LDA       84.7 ± 0.8
  PCA          88.5 ± 0.8
  MC-FLD       91.6 ± 0.7
  SVM          84.3 ± 0.3
  K-CQS        84.8 ± 4.2
  K-CAS        78.8 ± 2.2
  K-MC-FLD     93.5 ± 0.6
  K-SVM        81.9 ± 0.4

Figure 5-8. Segment data sample. (Photo courtesy of KEEL-dataset repository [70].)

Table 5-15. Shuttle accuracy metrics
  Algorithm    Accuracy (%) ± std
  CQS          85.1 ± 2.2
  CAS          85.2 ± 2.2
  LS-LDA       85.0 ± 0.8
  PCA          46.9 ± 14.2
  MC-FLD       91.7 ± 0.7
  SVM          85.5 ± 0.9
  K-CQS        97.0 ± 1.0
  K-CAS        79.4 ± 2.5
  K-MC-FLD     93.4 ± 0.6
  K-SVM        84.3 ± 2.2

Figure 5-9. Shuttle sample data. (Photo courtesy of KEEL-dataset repository [70].)

Table 5-16. Texture accuracy metrics
  Algorithm    Accuracy (%) ± std
  CQS          98.8 ± 0.3
  CAS          98.9 ± 0.2
  LS-LDA       99.6 ± 0.1
  PCA          98.0 ± 0.3
  MC-FLD       99.8 ± 0.0
  SVM          97.3 ± 0.4
  K-CQS        96.6 ± 0.7
  K-CAS        92.1 ± 2.6
  K-MC-FLD     79.5 ± 26.8
  K-SVM        96.7 ± 0.5

Figure 5-10. Texture sample data. (Photo courtesy of KEEL-dataset repository [70].)

Table 5-17. Iris accuracy metrics
  Algorithm    Accuracy (%) ± std
  CQS          97.0 ± 1.3
  CAS          96.6 ± 1.1
  LS-LDA       97.0 ± 2.4
  PCA          96.6 ± 1.6
  MC-FLD       97.3 ± 1.9
  SVM          89.0 ± 5.7
  K-CQS        95.4 ± 1.3
  K-CAS        94.4 ± 2.7
  K-MC-FLD     77.5 ± 10.5
  K-SVM        87.0 ± 3.92

Figure 5-11. Iris sample data. (Photo courtesy of UCI Machine Learning repository [71].)

Table 5-18. Seeds accuracy metrics
  Algorithm    Accuracy (%) ± std
  CQS          95.0 ± 3.0
  CAS          95.2 ± 2.7
  LS-LDA       95.4 ± 1.7
  PCA          92.6 ± 1.5
  MC-FLD       96.6 ± 0.9
  SVM          89.2 ± 3.6
  K-CQS        90.2 ± 1.8
  K-CAS        90.7 ± 2.7
  K-MC-FLD     75.4 ± 26.9
  K-SVM        90.0 ± 4.4

Table 5-19. Vertebral accuracy metrics
  Algorithm    Accuracy (%) ± std
  CQS          74.5 ± 3.7
  CAS          74.6 ± 3.7
  LS-LDA       83.0 ± 2.0
  PCA          75.3 ± 5.6
  MC-FLD       82.5 ± 3.4
  SVM          81.9 ± 1.8
  K-CQS        73.1 ± 5.7
  K-CAS        65.1 ± 3.8
  K-MC-FLD     79.4 ± 5.5
  K-SVM        81.4 ± 0.9

Table 5-20. Wine recognition accuracy metrics
  Algorithm    Accuracy (%) ± std
  CQS          84.2 ± 4.4
  CAS          84.5 ± 5.0
  LS-LDA       92.5 ± 3.0
  PCA          82.8 ± 3.7
  MC-FLD       98.2 ± 1.5
  SVM          67.7 ± 2.3
  K-CQS        75.7 ± 2.1
  K-CAS        68.0 ± 4.9
  K-MC-FLD     65.8 ± 29.2
  K-SVM        67.4 ± 2.3

Figure 5-12. Wine recognition sample data. (Photo courtesy of UCI Machine Learning repository [71].)

Table 5-21. Vowel accuracy metrics
  Algorithm    Accuracy (%) ± std
  CQS          85.2 ± 1.7
  CAS          85.1 ± 1.9
  LS-LDA       83.0 ± 2.8
  PCA          82.0 ± 1.5
  MC-FLD       77.2 ± 2.5
  SVM          52.5 ± 1.3
  K-CQS        82.9 ± 2.4
  K-CAS        81.5 ± 3.4
  K-MC-FLD     96.4 ± 1.2
  K-SVM        54.1 ± 2.1

Figure 5-13. Vowel data sample. (Photo courtesy of KEEL-dataset repository [70].)

CHAPTER 6
CONCLUSIONS

6.1 Contributions

In this research, we investigated several aspects of multiclass classification, focusing in particular on dimensionality reduction and on how our approach reduces geometric ambiguity and improves multiclass classification. Our study explains why present multiclass supervised learning methods do not solve this problem: classes are identified by nominal labels, or by related discriminant-based conversion techniques (one-versus-one or one-versus-all) that are constructed to create competition.

We proposed a category space approach that overcomes several of the problems inherent to past attempts. We defined a structured category space in which categories are basis vectors (axes) corresponding to basic categories that define a Euclidean vector space. Basic categories in the category space are orthogonal to each other, while compound categories can be defined as linear combinations of basic categories. This property enables the category space to handle ambiguity at a fundamental level.

We derived four supervised dimensionality reduction techniques: the quadratic category space discriminant (CSD), the kernel quadratic category space discriminant (KCSD), the $L_1$ norm category space discriminant (LCSD), and the kernel $L_1$ norm category space discriminant (KLCSD). We also introduced an optimization framework for obtaining feasible solutions to the objective functions, characterized by both necessary and sufficient conditions in the quadratic cases. Our experiments include the accuracy results of our techniques and comparisons with many classical algorithms on several benchmark classification data sets. Evaluation of our category space approach on real-world data resulted in comparable performance in many of the test cases.

We were also particularly interested in the natural variety that race offers and how it fits into the realm of multiclass classification.

We provided a RaceSpace face image data set that is diverse in both race and gender, and applied our approach to the race classification problem.

6.2 Future Work

Research and analysis of the category space was constrained to projected patterns that lie in the non-negative orthant $\mathbb{R}^K_+$. An interesting extension of this work would be to examine the full vector space by including negative categories; the idea is to use the negative axes for negative category representation (for example, "not A", "not B", etc.). Another valuable line of research would investigate how warping the projected patterns towards their respective category axes could improve classification accuracy. Future studies will also investigate the kernel formulation and experiment with a variety of kernels to analyze how projection into a Hilbert space alters the results of supervised dimensionality reduction.

While this research focused on multiclass classification, we feel this work has direct application to many problems, including feature extraction, compression, encoding, and big data analysis. We will continue to extend our work to show how supervised dimensionality reduction can have a positive impact in various research areas.

APPENDIX A
RACE FACE IMAGE DATA

A.1 RaceSpace Face Data Set Overview

Race plays a major role in society and in how the world is defined. Race may be thought of in different ways but typically refers to classifying humans into distinct groups based on appearance, location (Figure A-2), culture, or traits [72]. As a biological term, race denotes genetically divergent human populations that can be marked by common phenotype traits [73]. This definition is commonly used in sciences such as forensics, biomedical research, and ethnicity-based medicine [74].

Race and its effects have been researched for centuries. Ethnology can be defined as the science that studies and analyzes the social structure of the ethnic, racial, and/or national divisions of humanity [75]. One of the earliest publications on the classification of humans into distinct races was by François Bernier ("New division of Earth by the different species or races which inhabit it", published in 1684). While the desire to define the human races grew, scientists continued to debate the classes, and human groups became the focus of much scientific research. Johann Friedrich Blumenbach published his text "On The Natural Varieties of Mankind" in 1775 [76]. He used traits of the human species to define the individual groups, and he noticed a distinction in the skulls of the groups, as shown in Figure A-3. This was the definitive work that established the five major divisions of humans, i.e., the Caucasoid race, Mongoloid race, Negroid race, American Indian race, and Malayan race.

We found race classification to be an intriguing problem, closely analogous to multiclass classification. This type of classification was attempted by Ou et al. [77]. We searched many face and image databases [78], but none had a sufficient representation of the races of the world. We wanted images of individuals with natural facial expressions and in uncontrolled environments.

We began with images from the popular Labeled Faces in the Wild database [79] and proceeded to divide the individuals into various race groups, based on internet research and human opinions. Images were added and modifications were performed to define a database known as the RaceSpace Database. The original images were cropped to 250 × 250 pixels to capture the shoulder line and to encapsulate the head and facial features. We defined a face ellipse, with major and minor axes of 89/107 pixels respectively, to collect just the face and remove the background. These were then resized into a 128 × 128 pixel image with a white background. Included in the data set are the original color images, a corresponding grayscale image, a discrete cosine transform (DCT) image, and a hue image. A sample of the various images is shown in Figure A-1.

Figure A-1. Sample of images included in the RaceSpace data set: A) Original, B) Grayscale, C) Cropped face, D) DCT image, E) Hue of original image. (Photos courtesy of Labeled Faces in the Wild [79] and the world wide web.)
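A sketch of this preprocessing pipeline (crop, elliptical face mask, resize) is given below. It is our own illustration using the Pillow library; the exact crop placement, the ellipse centering, and the file handling are assumptions rather than the procedure actually used to construct the data set.

```python
import numpy as np
from PIL import Image, ImageDraw

def preprocess_face(path, out_size=128, ellipse_axes=(89, 107)):
    """Crop to a centered 250x250 region, keep only an elliptical face region on a
    white background, and resize to out_size x out_size (color and grayscale)."""
    img = Image.open(path).convert('RGB')
    w, h = img.size
    left, top = (w - 250) // 2, (h - 250) // 2
    img = img.crop((left, top, left + 250, top + 250))
    # Elliptical mask centered in the crop; everything outside is painted white
    mask = Image.new('L', img.size, 0)
    cx, cy = img.size[0] // 2, img.size[1] // 2
    ax, ay = ellipse_axes[0] // 2, ellipse_axes[1] // 2
    ImageDraw.Draw(mask).ellipse((cx - ax, cy - ay, cx + ax, cy + ay), fill=255)
    face = Image.composite(img, Image.new('RGB', img.size, 'white'), mask)
    face = face.resize((out_size, out_size))
    return face, face.convert('L')   # color image and grayscale image
```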

Figure A-2. Meyers race ethnographic map. From Meyers Konversationslexikon of 1885-90. Listing Caucasian races (Aryans, Hamites, Semites), Mongolian races (northern Mongolian, Chinese and Indo-Chinese, Japanese and Korean, Tibetan, Malayan, Polynesian, Maori, Micronesian, Eskimo, American Indian), and Negroid races (African, Hottentots, Melanesians/Papua, "Negrito", Australian Aborigine, Dravidians, Sinhalese). (Photo courtesy of https://en.wikipedia.org/wiki/Race_human_classification)

Figure A-3. Blumenbach division of races. Blumenbach divided humans into five races in 1779. (Photo courtesy of http://en.wikipedia.org/wiki/Johann_Friedrich_Blumenbach#cite_noteeb9-3)

Figure A-4. Sample Caucasian female faces. (Photos courtesy of Labeled Faces in the Wild [79] and the world wide web.)

Figure A-5. Sample Negroid female faces. (Photos courtesy of Labeled Faces in the Wild [79] and the world wide web.)

Figure A-6. Sample Asian female faces. (Photos courtesy of Labeled Faces in the Wild [79] and the world wide web.)

Figure A-7. Sample Arier female faces. (Photos courtesy of Labeled Faces in the Wild [79] and the world wide web.)

Figure A-8. Sample Indian female faces. (Photos courtesy of Labeled Faces in the Wild [79] and the world wide web.)

APPENDIX B
THEOREMS, LEMMAS, & COROLLARY

B.1 Theorem Overview (Bolla et al.)

We are given $K$ symmetric, positive definite $D \times D$ matrices $R_1, R_2, \ldots, R_K$ ($K \leq D$). The set of orthonormal $K$-tuples in $\mathbb{R}^D$ is called a Stiefel manifold and is denoted by $V_{D,K}$. $S$ is a symmetric $K \times K$ matrix, and the $D \times K$ matrices $R(W) = [R_1 w_1, \ldots, R_K w_K]$ and $W = [w_1, \ldots, w_K]$ contain the enumerated vectors as their columns. The objective function
\[
f_{\text{quad}}(W) = -\sum_{k=1}^{K} w_k^T R_k w_k
\]
is to be minimized under the constraints
\[
w_i^T w_j = \delta_{ij}, \qquad 1 \leq i, j \leq K,
\]
where $\delta_{ij}$ is the Kronecker delta function. As $V_{D,K}$ is a compact manifold and $f_{\text{quad}}$ is continuous on $V_{D,K}$, a finite global minimum exists and is attained at some point. [65]

B.2 Theorem 3.1 (Bolla et al.)

$W \in V_{D,K}$ is a critical point of $f_{\text{quad}}$ if and only if $S = R(W)^T W$ is symmetric, i.e., $R(W) = W S$ holds with a symmetric $S$. [65]

B.3 Lemma 3.1 (Bolla et al.)

If $W \in V_{D,K}$ is a global minimum of the functional $f_{\text{quad}}$, then the corresponding matrix $S = W^T R(W)$ is positive semidefinite. [65]

B.4 Corollary 3.1 (Rapcsák)

Given the symmetric $D \times D$ matrices $R_1, \ldots, R_K$, let $\mathbf{R}$ be the $KD \times KD$ block diagonal matrix with diagonal blocks $R_1, R_2, \ldots, R_K$. If the point $W \in V_{D,K}$ is a strict local minimum, then
\[
R(W) = S_1(W)\, W
\]
and the matrix $\mathbf{R} - S_1(W)$ is positive semidefinite, where $S_1(W)$ is the block matrix whose $(i,j)$ block is $\frac{1}{2}\left( w_i^T R_i w_j + w_j^T R_j w_i \right) I$, so that the diagonal blocks are $\left( w_k^T R_k w_k \right) I$.

If $C \subseteq V_{D,K}$ is an open concave set and there exists a point $W \in C$ such that $R(W) = S_1(W)\,W$ and $\mathbf{R} - S_1(W)$, with $W \in C$, is positive semidefinite, then the point $W$ is a strict global minimum of the function $f_{\text{quad}}$ on the set $C$. [64]
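The conditions collected in this appendix can be checked numerically for a candidate $W$. The sketch below is our own illustration; it uses $D \times D$ identity blocks in $S_1(W)$ so that its dimensions match the block diagonal matrix $\mathbf{R}$, which is an assumption on our part about the intended block sizes.

```python
import numpy as np

def check_optimality_conditions(W, R_list, tol=1e-8):
    """Check: S = W^T R(W) symmetric (critical point), S positive semidefinite
    (first-order condition), and R - S_1(W) positive semidefinite (second-order condition)."""
    D, K = W.shape
    RW = np.column_stack([R_list[k] @ W[:, k] for k in range(K)])
    S = W.T @ RW
    is_critical = np.allclose(S, S.T, atol=tol)
    first_order = np.min(np.linalg.eigvalsh((S + S.T) / 2)) >= -tol
    # Block matrices R (block diagonal) and S_1(W) with D x D identity blocks
    R_big = np.zeros((K * D, K * D))
    S1 = np.zeros((K * D, K * D))
    for i in range(K):
        R_big[i*D:(i+1)*D, i*D:(i+1)*D] = R_list[i]
        for j in range(K):
            s_ij = 0.5 * (W[:, i] @ R_list[i] @ W[:, j] + W[:, j] @ R_list[j] @ W[:, i])
            S1[i*D:(i+1)*D, j*D:(j+1)*D] = s_ij * np.eye(D)
    second_order = np.min(np.linalg.eigvalsh(R_big - S1)) >= -tol
    return is_critical, first_order, second_order
```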

APPENDIX C
SPLINE-BASED CLASSIFICATION

C.1 Spline Overview

The purpose of this appendix is to develop a spline-based discriminant along the lines suggested by Prof. Haldun Aytug. This came up during the proposal defense discussion. The question raised at that time was the need for orthogonality of the weight vectors mapping the features to the category space. The spline-based approach developed here relaxes the orthonormality constraint on the weight vectors. In this approach, we attempt to force each pattern toward its chosen category axis using a spline-driven warping. This is different from the original category space approach described previously.

The splines are based on standard kernels used in warping [80]. We derive the formulation for a two class problem and then show that the extension to multiple classes is straightforward. Let the functions
\[
y(x) = \sum_{n} K(x_n, x)\, w_{n1},
\qquad
\tilde{y}(x) = \sum_{n} K(x_n, x)\, w_{n2}
\]
be a pair of approximating functions that map a given input $x$ to the output values $(y, \tilde{y})$, the coordinates resulting from warping the feature vector $x$. Two sets of spline coefficients, $w_1$ and $w_2$, are required to carry out the warping, and any standard RKHS kernel $K$ can be used. For patterns $x_i$ drawn from class 1 we get
\[
y_i = \sum_{n=1}^{N} K(x_n, x_i)\, w_{n1},
\qquad
\tilde{y}_i = \sum_{n=1}^{N} K(x_n, x_i)\, w_{n2},
\]

and, for patterns $x_j$ drawn from class 2,
\[
y_j = \sum_{n=1}^{N} K(x_n, x_j)\, w_{n1},
\qquad
\tilde{y}_j = \sum_{n=1}^{N} K(x_n, x_j)\, w_{n2},
\]
respectively. We design an objective function whose minimization yields the spline coefficients. The goal is to warp the feature vectors $x_i$ as close as possible to their category axis, with a similar goal for the feature vectors $x_j$. We use the objective function
\[
\min_{\{w_1, w_2\}} \sum_{i\in C_1} \left( -y_i^2 + \tilde{y}_i^2 \right) + \sum_{j\in C_2} \left( -\tilde{y}_j^2 + y_j^2 \right).
\]
Substituting for $y$ and $\tilde{y}$ from the expansions above, we get
\[
E = -\sum_{i\in C_1} w_1^T \left( K_i K_i^T \right) w_1 + \sum_{i\in C_1} w_2^T \left( K_i K_i^T \right) w_2
- \sum_{j\in C_2} w_2^T \left( K_j K_j^T \right) w_2 + \sum_{j\in C_2} w_1^T \left( K_j K_j^T \right) w_1,
\]
where $K_i$ is the $i$-th column of the kernel Gram matrix, and similarly for $K_j$. The solution for $w_1$ is the eigenvector corresponding to the smallest eigenvalue of the matrix
\[
\sum_{j\in C_2} K_j K_j^T - \sum_{i\in C_1} K_i K_i^T.
\]
Similarly, the solution for $w_2$ is the eigenvector corresponding to the smallest eigenvalue of the matrix
\[
\sum_{i\in C_1} K_i K_i^T - \sum_{j\in C_2} K_j K_j^T.
\]

We can generalize this to three or more classes. The objective function becomes
\[
\begin{aligned}
E = {}& -\sum_{i\in C_1} w_1^T \left( K_i K_i^T \right) w_1 + \sum_{i\in C_1} w_2^T \left( K_i K_i^T \right) w_2 + \sum_{i\in C_1} w_3^T \left( K_i K_i^T \right) w_3 \\
& -\sum_{j\in C_2} w_2^T \left( K_j K_j^T \right) w_2 + \sum_{j\in C_2} w_1^T \left( K_j K_j^T \right) w_1 + \sum_{j\in C_2} w_3^T \left( K_j K_j^T \right) w_3 \\
& -\sum_{k\in C_3} w_3^T \left( K_k K_k^T \right) w_3 + \sum_{k\in C_3} w_1^T \left( K_k K_k^T \right) w_1 + \sum_{k\in C_3} w_2^T \left( K_k K_k^T \right) w_2.
\end{aligned}
\]
Since these are three decoupled objective functions w.r.t. $w_1$, $w_2$ and $w_3$, the solution is quite straightforward. The solution for $w_1$ is the eigenvector corresponding to the smallest eigenvalue of
\[
\sum_{k\in C_3} K_k K_k^T + \sum_{j\in C_2} K_j K_j^T - \sum_{i\in C_1} K_i K_i^T,
\]
with similar solutions for $w_2$ and $w_3$.

To evaluate the spline-based discriminant, we use the same hold-out procedure described in Section 5.1. The data is split into disjoint training and test subsets; a total of $T$ trials of training and validation take place, and a misclassification error rate $e_i$ is calculated at the $i$-th trial on a unique random subset of the data held out for validation. The true error is estimated using the average error rate
\[
E = \frac{1}{T}\sum_{i=1}^{T} e_i.
\]
To determine $e_i$, we perform classification on the category space projected data as follows. Once we have obtained feasible sets of spline coefficients that minimize the objective above, a probabilistic method for classification is executed by constructing Gaussian density functions for each class in the category space [27]. Given $K$ classes, the approach builds $K$ density functions $p(x \mid C_k)$, $k \in \{1,\ldots,K\}$. After estimating the density models from the training set of patterns, incoming patterns are classified by selecting the category membership with the highest class-conditional density. The performance accuracy results on various standard data sets are given below.

Figure C-1. Spline results for benchmark data sets
  Data Set                            Spline Accuracy (%) ± std
  CalTech gray                        58.7 ± 7.1
  Coil                                77.8 ± 2.2
  Letter                              87.2 ± 1.4
  Movement Libras                     85.3 ± 3.0
  Optical Recognition                 94.1 ± 0.9
  Penbased                            96.7 ± 1.7
  UCI Poker (four, royal, flush)      59.4 ± 12.0
  Landsat satellite                   85.8 ± 1.6
  Image segmentation                  81.1 ± 4.5
  Statlog Shuttle                     80.8 ± 8.1
  Texture                             98.6 ± 0.4
  Iris plant                          96.3 ± 2.1
  Seeds                               93.0 ± 4.2
  Vertebral column                    75.1 ± 2.5
  Wine recognition                    74.8 ± 2.3
  RaceSpace female                    81.2 ± 4.2
  RaceSpace male                      69.0 ± 9.9
  Vowel Recognition                   84.8 ± 1.9
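For completeness, a sketch of the per-class eigenvector computation described above is given below; this is our own illustration (not the code used to produce Figure C-1), with `K` the global Gram matrix.

```python
import numpy as np

def spline_discriminant(K, y):
    """For each class k, the spline coefficient vector w_k is the eigenvector with the
    smallest eigenvalue of  sum_{j not in C_k} K_j K_j^T - sum_{i in C_k} K_i K_i^T,
    where K_i is the i-th column of the Gram matrix K."""
    classes = np.unique(y)
    outer = K @ K.T                          # sum over all columns of K_i K_i^T
    W = []
    for c in classes:
        Kc = K[:, y == c]
        in_class = Kc @ Kc.T
        M = (outer - in_class) - in_class    # out-of-class term minus in-class term
        vals, vecs = np.linalg.eigh(M)
        W.append(vecs[:, 0])                 # eigenvector of the smallest eigenvalue
    return np.column_stack(W)
```

The warped category space coordinates of the training patterns are then the rows of $K W$ (using the symmetry of $K$), to which the Gaussian classifier of Section 5.1 can be applied.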

REFERENCES

[1] Cambridge dictionaries online, 2013. [Online]. Available: http://dictionary.cambridge.org/dictionary/american-english/discrimination_1?q=discrimination
[2] R. Fisher, "The use of multiple measurements in taxonomic problems," Annals of Eugenics, vol. 7, pp. 179, 1936.
[3] M. Riesenhuber and T. Poggio, "Hierarchical models of object recognition in cortex," Nature Neuroscience, vol. 2, no. 11, pp. 1019, 1999.
[4] V. Vapnik, The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995.
[5] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Springer, 2009.
[6] R. Duda and P. Hart, Pattern Classification and Scene Analysis. New York, NY: Wiley, 1973.
[7] K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd ed. Academic Press, 1990.
[8] C. Bishop, Neural Networks for Pattern Recognition, 1st ed. Oxford University Press, 1996.
[9] H. Cohen and C. Lefebvre, Handbook of Categorization in Cognitive Science. Elsevier Science, 2005.
[10] E. Rosch, "Natural categories," Cognitive Psychology, vol. 4, pp. 328, 1973.
[11] E. Rosch, C. Mervis, W. Gray, D. Johnson, and P. Boyes-Braem, "Basic objects in natural categories," Cognitive Psychology, 1976.
[12] C. Bishop, Pattern Recognition and Machine Learning, 1st ed. Springer, 2006.
[13] R. Duda, P. Hart, and D. Stork, Pattern Classification, 2nd ed. Wiley-Interscience, 2000.

[14] L. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms, 1st ed. Wiley-Interscience, 2004.
[15] R. Schapire, Nonlinear Estimation and Classification, 1st ed. Springer, 2003, ch. The Boosting Approach to Machine Learning: An Overview, pp. 149.
[16] S. Dzeroski and B. Zenko, "Is combining classifiers with stacking better than selecting the best one?" Machine Learning, vol. 54, pp. 255, 2004.
[17] U. Kressel, "Pairwise classification and support vector machines," in Advances in Kernel Methods. MIT Press, 1999, pp. 255.
[18] J. Milgram, M. Cheriet, and R. Sabourin, "'One Against One' or 'One Against All': Which one is better for handwriting recognition with SVMs?" in Tenth International Workshop on Frontiers in Handwriting Recognition, 2006.
[19] Y. Liu and Y. Zheng, "One-against-all multi-class SVM classification using reliability measures," in Neural Networks, 2005. IJCNN '05. Proceedings. 2005 IEEE International Joint Conference on, vol. 2. IEEE, 2005, pp. 849.
[20] C. Hsu and C. Lin, "A comparison of methods for multiclass support vector machines," IEEE Trans. Neural Networks, vol. 13, no. 2, pp. 415, 2002.
[21] P. Viola and M. Jones, "Robust real-time face detection," International Journal of Computer Vision, vol. 57, no. 2, pp. 137, January 2004.
[22] B. Schölkopf and A. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, T. Dietterich, Ed. The MIT Press, 2002.
[23] P. Marcotte and G. Savard, "Novel approaches to the discrimination problem," Mathematical Methods of Operations Research, vol. 36, no. 6, pp. 517, 1992.
[24] P. Belhumeur, J. Hespanha, and D. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711, 1997.
[25] G. Golub and C. Van Loan, Matrix Computations, 3rd ed. The Johns Hopkins University Press, 1996.
[26] M. Aly, "Survey on multiclass classification methods," California Institute of Technology, Pasadena, CA, USA, Tech. Rep., 2005.

[27] T. Wu, C. Lin, and R. Weng, "Probability estimates for multi-class classification by pairwise coupling," The Journal of Machine Learning Research, vol. 5, pp. 975, 2004.
[28] A. Rosenfeld, R. Hummel, and S. Zucker, "Scene labeling by relaxation operations," IEEE Trans. Syst. Man, Cybern., vol. 6, no. 6, pp. 420, Jun. 1976.
[29] S. Zucker, R. Hummel, and A. Rosenfeld, "An application of relaxation labeling to line and curve enhancement," Computers, IEEE Transactions on, vol. 100, no. 4, pp. 394, 1977.
[30] R. Hummel and S. Zucker, "On the foundations of relaxation labeling processes," IEEE Trans. Patt. Anal. Mach. Intell., vol. 5, no. 3, pp. 267, May 1983.
[31] S. Peleg, "A new probabilistic relaxation scheme," IEEE Trans. Patt. Anal. Mach. Intell., vol. 2, no. 4, pp. 362, Jul. 1980.
[32] S. Watanabe, "Karhunen-Loève expansion and factor analysis, theoretical remarks and applications," in Proc. 4th Prague Conf. Inform. Theory, 1965.
[33] Y. Chien and K. Fu, "On the generalized Karhunen-Loève expansion (corresp.)," Information Theory, IEEE Transactions on, vol. 13, no. 3, pp. 518, 1967.
[34] K. Fukunaga and W. Koontz, "Application of the Karhunen-Loève expansion to feature selection and ordering," Computers, IEEE Transactions on, vol. 100, no. 4, pp. 311, 1970.
[35] J. Sammon, "An optimal discriminant plane," Computers, IEEE Transactions on, vol. 100, no. 9, pp. 826, 1970.
[36] D. Foley and J. Sammon, "An optimal set of discriminant vectors," Computers, IEEE Transactions on, vol. C-24, no. 3, pp. 281, 1975.
[37] T. Anderson and R. Bahadur, "Classification into two multivariate normal distributions with different covariance matrices," Annals of Mathematical Statistics, vol. 33, no. 2, pp. 420, 1962.
[38] H. Hotelling, "Analysis of a complex of statistical variables into principal components," Journal of Educational Psychology, vol. 24, no. 6, 1933.
[39] I. Jolliffe, Principal Component Analysis. Springer-Verlag New York, 1986, vol. 487.

[40] B. Schölkopf, A. Smola, and K. Müller, "Nonlinear component analysis as a kernel eigenvalue problem," Neural Computation, vol. 10, pp. 1299, 1998.
[41] C. Rao, "The use and interpretation of principal component analysis in applied research," Sankhyā: The Indian Journal of Statistics, Series A, pp. 329, 1964.
[42] E. Bair and R. Tibshirani, "Semi-supervised methods to predict patient survival from gene expression data," PLoS Biol, vol. 2, pp. 511, 2004.
[43] E. Bair, T. Hastie, D. Paul, and R. Tibshirani, "Prediction by supervised principal components," Journal of the American Statistical Association, vol. 101, no. 473, 2006.
[44] V. Vapnik, Statistical Learning Theory. Wiley Interscience, 1998.
[45] T. Okada and S. Tomita, "An optimal orthonormal system for discriminant analysis," Pattern Recognition, vol. 18, no. 2, pp. 139, 1985.
[46] J. Ye, "Characterization of a family of algorithms for generalized discriminant analysis on undersampled problems," Journal of Machine Learning Research, vol. 6, pp. 483-502, 2005.
[47] J. Duchene and S. Leclercq, "An optimal transformation for discriminant and principal component analysis," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 10, no. 6, pp. 978, 1988.
[48] J. Romeder, M. Maronda-Duhamel, C. Garon, and F. Grémy, Méthodes et Programmes d'Analyse Discriminante. Dunod, 1973.
[49] T. Hastie and R. Tibshirani, "Classification by pairwise coupling," Advances in Neural Information Processing Systems, vol. 10, 1998.
[50] W. Peterson and E. Weldon, Error-Correcting Codes, 2nd ed. The MIT Press, 1972.
[51] R. Hamming, "Error detecting and error correcting codes," The Bell System Technical Journal, vol. 29, no. 2, pp. 147, April 1950.
[52] J. Ye, "Least squares linear discriminant analysis," in Proceedings of the 24th International Conference on Machine Learning. ACM, 2007, pp. 1087.

[53] P. Zhang, J. Peng, and N. Riedel, "Discriminant analysis: A least squares approximation view," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005, pp. 46.
[54] B. Schölkopf, C. Burges, and A. Smola, Advances in Kernel Methods: Support Vector Learning. The MIT Press, 1999.
[55] B. Schölkopf, A. Smola, R. Williamson, and P. Bartlett, "New support vector algorithms," Neural Computation, vol. 12, no. 5, pp. 1207, 2000.
[56] D. Bertsekas and J. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods. Englewood Cliffs, NJ: Prentice-Hall, 1989.
[57] L. Bottou and C. Lin, "Support vector machine solvers," Large Scale Kernel Machines, pp. 301, 2007.
[58] J. Weston and C. Watkins, "Multi-class support vector machines," Department of Computer Science, Royal Holloway, University of London, Tech. Rep., 1998.
[59] K. Crammer and Y. Singer, "On the algorithmic implementation of multiclass kernel-based vector machines," The Journal of Machine Learning Research, vol. 2, pp. 265, 2002.
[60] D. Widdows, Geometry and Meaning. Center for the Study of Language and Information, 2004.
[61] D. Widdows and S. Peters, "Word vectors and quantum logic: Experiments with negation and disjunction," in Mathematics of Language Conference, vol. 8, 2003, pp. 141.
[62] I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun, "Support vector machine learning for interdependent and structured output spaces," in International Conference on Machine Learning, 2004.
[63] D. Widdows, "Orthogonal negation in vector spaces for modelling word-meanings and document retrieval," in Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, vol. 1. Association for Computational Linguistics, 2003, pp. 136.
[64] T. Rapcsák, "On minimization on Stiefel manifolds," European Journal of Operational Research, vol. 143, no. 2, pp. 365, 2002.

[65] M. Bolla, G. Michaletzky, G. Tusnády, and M. Ziermann, "Extrema of sums of heterogeneous quadratic forms," Linear Algebra and Its Applications, vol. 269, no. 1-3, pp. 331, 1998.
[66] M. Aizerman, E. Braverman, and L. Rozonoer, "Theoretical foundations of the potential function method in pattern recognition learning," Automation and Remote Control, vol. 25, pp. 821, 1964.
[67] J. Mercer, "Functions of positive and negative type, and their connection with the theory of integral equations," Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, pp. 415, 1909.
[68] E. Mjolsness and C. Garrett, "Algebraic transformations of objective functions," Neural Networks, vol. 3, pp. 651, 1990.
[69] P. Refaeilzadeh, L. Tang, and H. Liu, "Cross-validation," Encyclopedia of Database Systems, pp. 532, 2009.
[70] J. Alcalá-Fdez, A. Fernandez, J. Luengo, J. Derrac, S. García, L. Sánchez, and F. Herrera, "KEEL data-mining software tool: Data set repository, integration of algorithms and experimental analysis framework," Journal of Multiple-Valued Logic and Soft Computing, vol. 17:2-3, pp. 255, 2011. [Online]. Available: http://keel.es/datasets.php
[71] K. Bache and M. Lichman, "UCI machine learning repository," 2013. [Online]. Available: http://archive.ics.uci.edu/ml
[72] Encyclopedia Britannica Online. [Online]. Available: http://www.britannica.com/EBchecked/topic/488030/race
[73] M. Gordon, Assimilation in American Life: The Role of Race, Religion, and National Origins, 1964.
[74] N. Risch, E. Burchard, E. Ziv, and H. Tang, "Categorization of humans in biomedical research: genes, race and disease," Genome Biology, vol. 3, no. 7, July 2002. [Online]. Available: http://dahsm.medschool.ucsf.edu/history/Suran_Racial_PDF/Risch_2002.pdf
[75] G. Newman, Echoes From the Past: World History to the 16th Century. McGraw-Hill Ryerson Ltd, 2008.

[76] J. Blumenbach, "De generis humani varietate nativa (On the natural variety of mankind)," Master's thesis, University of Göttingen, 1775.
[77] Y. Ou, X. Wu, H. Qian, and Y. Xu, "A real time race classification system," in IEEE International Conference on Information Acquisition, June 2005.
[78] S. Li and A. Jain, "Face databases," in Handbook of Face Recognition, Springer-Verlag, Ed., February 2005.
[79] G. Huang, M. Ramesh, T. Berg, and E. Learned-Miller, "Labeled Faces in the Wild: A database for studying face recognition in unconstrained environments," University of Massachusetts, Amherst, Tech. Rep. 07-49, October 2007. [Online]. Available: http://vis-www.cs.umass.edu/lfw
[80] F. Bookstein, "Principal warps: Thin-plate splines and the decomposition of deformations," IEEE Trans. Patt. Anal. Mach. Intell., vol. 11, no. 6, pp. 567, 1989.

BIOGRAPHICAL SKETCH

"If you have no confidence in self you are twice defeated in the race of life. With confidence you have won even before you have started."
Marcus Garvey, The Philosophy and Opinions of Marcus Garvey

Anthony Smith was born and raised in Ridgeland, South Carolina. In 1994, he graduated with his high school diploma from Jasper County High School. It was during his senior year that Anthony expressed his ambition to study computer science. Anthony chose to pursue his undergraduate degree at Clemson University in Clemson, South Carolina. He earned his Bachelor of Science in computer information systems in May 2000, leading him into a career as a software engineer in Melbourne, Florida.

Anthony obtained his Master of Science in computer science with a focus on parallel computing from Webster University in December 2004. In early 2005, Anthony set his sights on earning his doctorate in computer science. In August 2005, he enrolled as a graduate student in the Department of Computer Science at the University of Florida. He began his doctoral research under the direction of Dr. Anand Rangarajan, with a focus on machine learning for multiclass classification. In August 2013, Anthony Smith completed his research, earning his Doctor of Philosophy in computer engineering from the University of Florida in Gainesville, FL.