
Block, Group, and Affine Regularized Sparse Coding and Dictionary Learning


Material Information

Title:
Block, Group, and Affine Regularized Sparse Coding and Dictionary Learning
Physical Description:
1 online resource (73 p.)
Language:
english
Creator:
Chi, Yu-Tseh
Publisher:
University of Florida
Place of Publication:
Gainesville, Fla.
Publication Date:

Thesis/Dissertation Information

Degree:
Doctorate ( Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Computer Engineering, Computer and Information Science and Engineering
Committee Chair:
Ho, Jeffrey Yih Chian
Committee Members:
Peters, Jorg
Vemuri, Baba C
Banerjee, Arunava
Burks, Thomas Francis

Subjects

Subjects / Keywords:
algorithm -- coding -- dictionary -- learning -- optimization -- sparse
Computer and Information Science and Engineering -- Dissertations, Academic -- UF
Genre:
Computer Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract:
I first propose a novel approach for sparse coding that further improves upon the sparse representation-based classification (SRC) framework. This proposed framework, affine constrained group sparse coding, extends the current SRC framework to classification problems with multiple inputs. Geometrically, the constrained group sparse coding essentially searches for the vector in the convex hull spanned by the input vectors that can best be sparse coded using the given dictionary. The resulting objective function is still convex and can be efficiently optimized using an iterative block-coordinate descent scheme that is guaranteed to converge. Furthermore, I provide a form of sparse recovery result that guarantees, at least theoretically, that the classification performance of the constrained group sparse coding should be at least as good as that of group sparse coding.  While utilizing the affine combination of multiple input test samples can improve the performance of the conventional sparse representation-based classification framework, it is difficult to integrate this approach into a dictionary learning framework. Therefore, we propose to combine (1) imposing group structure on the data, (2) imposing block structure on the dictionary, and (3) using a different regularizer term to sparsely encode the data. We call this approach either block/group (BGSC) or reconstructed block/group (R-BGSC) sparse coding. Incorporating either one of them with the novel Intra-block Coherence Suppression Dictionary Learning (ICS-DL) algorithm, which, as the name suggests, suppresses the coherence of atoms within the same block, results in a novel dictionary learning framework. An important and distinguishing feature of the proposed framework is that all dictionary blocks are trained simultaneously with respect to each data group while the intra-block coherence is explicitly minimized as an important objective. We provide both empirical evidence and heuristic support for this feature, which can be considered as a direct consequence of incorporating both the group structure for the input data and the block structure for the dictionary in the learning process. The optimization problems for both the dictionary learning and sparse coding can be solved efficiently using block-coordinate descent, and the details of the optimization algorithms are presented. In both parts of this work, the proposed methods are evaluated on several classification (supervised) and clustering (unsupervised) problems using well-known datasets. Favorable comparisons with state-of-the-art methods demonstrate the viability and validity of the proposed frameworks.
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Bibliography:
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility:
by Yu-Tseh Chi.
Thesis:
Thesis (Ph.D.)--University of Florida, 2013.
Local:
Adviser: Ho, Jeffrey Yih Chian.

Record Information

Source Institution:
UFRGP
Rights Management:
Applicable rights reserved.
Classification:
lcc - LD1780 2013
System ID:
UFE0044995:00001

Full Text

BLOCK, GROUP, AND AFFINE REGULARIZED SPARSE CODING AND DICTIONARY LEARNING

By

YU-TSEH CHI

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2013

© 2013 Yu-Tseh Chi

To my father, my wife and my son

ACKNOWLEDGMENTS

I am extremely grateful to Dr. Jeffrey Ho for his guidance and support during my graduate studies. He has been a constant source of inspiration and encouragement for me, the most important ingredients necessary for research. I am also thankful to Dr. Jorg Peters, Dr. Arunava Banerjee, Dr. Baba C. Vemuri and Dr. Thomas Burks for being on my supervisory committee and providing extremely useful insights into the work presented in this dissertation.

I would like to thank the Department of Computer and Information Science and Engineering (CISE) and the University of Florida (UF) for giving me the opportunity to pursue my graduate studies in a very constructive environment. I am especially thankful to the CISE department for funding my doctoral studies and travels to various conferences. During my graduate studies, I enjoyed my job as a teaching assistant, and for that I am grateful to Dr. Manuel Bermudez and Dr. Jonathan Liu for being terrific bosses.

I also appreciate the camaraderie of my lab-mates Mohsen Ali, Shahed Nejhum, Muhammad Rushdi, Shaoyu Qi, Karthik Gurumothy, Venkatkrishinan Ramaswamy, Nicholas Fisher, Nathan VanDerKraat, Subhajit Sengupta, Ajit Rajwade, Terry Ching-Hsiang Hsu and Hang Yu. It was a fun lab to work at. I am thankful to my longtime friends in Taiwan, MC Hsiao, Ahway Chen, Yan-Fu Kao, Yan-Sue Chiang, Odie Yu, Richard Hsu and Chin-Young Hsu, for rooting for me.

Lastly and most importantly, I am thankful to my family for their unconditional and unflinching love and support. I will be eternally grateful to my father, Chi No, and especially my wife, Aparna Gazula, for all that they have done for me.

TABLE OF CONTENTS

ACKNOWLEDGMENTS 4
LIST OF TABLES 7
LIST OF FIGURES 8
ABSTRACT 9

CHAPTER
1 OVERVIEW 11
2 AFFINE CONSTRAINED GROUP SPARSE CODING FOR IMAGE CLASSIFICATIONS 13
  2.1 Introduction 13
  2.2 Theory and Method 17
    2.2.1 Theoretical Guarantee 18
    2.2.2 Part-Based ACGSC 20
  2.3 Related Works 22
  2.4 Experiments 24
    2.4.1 Face Classification 24
    2.4.2 Imposter Detection 27
    2.4.3 Face Recognition with Occlusions 31
    2.4.4 Texture Classification 33
  2.5 Future Work 36
3 BLOCK AND GROUP REGULARIZED SPARSE MODELING FOR DICTIONARY LEARNING 38
  3.1 Introduction 38
    3.1.1 Related Work 43
  3.2 Methods 44
    3.2.1 Theoretical Guarantee 44
    3.2.2 Block/Group Sparse Coding 44
    3.2.3 Reconstructed Block/Group Sparse Coding 46
    3.2.4 Intra-Block Coherence Suppression Dictionary Learning 47
  3.3 Experiments and Discussions 50
    3.3.1 Hand-Written Digit Recognition 50
    3.3.2 Group Regularized Face Classification 55
    3.3.3 Unsupervised Texture Clustering 58
4 CONCLUSION 60

APPENDIX: PROOF OF THE THEORETICAL GUARANTEE 61
REFERENCES 69
BIOGRAPHICAL SKETCH 73

LIST OF TABLES

3-1 Classification error (%) with different structure on D 52
3-2 Classification error (%) with different regularization parameters in ICS-DL 53
3-3 Comparison of classification error rates (%) between our approaches and others 54

LIST OF FIGURES

2-1 Illustration of the proposed Affine Constrained GSC framework 14
2-2 Illustration of the standard GSC and our proposed ACGSC 18
2-3 Comparison between the standard and part-based ACGSC 21
2-4 Selected training and test samples from the Yale Extended B database 24
2-5 Experimental results of our ACGSC and other methods on face recognition 27
2-6 Reconstructed images and test samples from the face recognition experiment 28
2-7 Precision vs. Recall curve of the imposter detection experiment 30
2-8 Examples of detected imposters 31
2-9 Results of the part-based ACGSC on face recognition with occlusion 33
2-10 Results of applying part-based ACGSC on face recognition with occlusion 34
2-11 Selected images from the cropped Curet database 34
2-12 Classification rates of the texture classification experiment 36
3-1 Illustration of the proposed Block/Group Sparse Coding algorithm 39
3-2 Visualization of the sparse coefficients of the training samples 51
3-3 Intra-block coherence of the dictionary and error rates 52
3-4 Error rates (%) of the USPS dataset under five different scenarios 53
3-5 Demonstration of selected training and test samples 57
3-6 Results of the texture separation experiment 59
A-1 The equivalence between X = DC and x = D'c' 62
A-2 The equivalence between D_[i], C_[i] and D'_[i], c'_[i] 67

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

BLOCK, GROUP, AND AFFINE REGULARIZED SPARSE CODING AND DICTIONARY LEARNING

By

Yu-Tseh Chi

August 2013

Chair: Jeffrey Ho
Major: Computer Engineering

In this dissertation, I first propose a novel approach for sparse coding that further improves upon the sparse representation-based classification (SRC) framework. This proposed framework, affine constrained group sparse coding, extends the current SRC framework to classification problems with multiple inputs. Geometrically, the constrained group sparse coding essentially searches for the vector in the convex hull spanned by the input vectors that can best be sparse coded using the given dictionary. The resulting objective function is still convex and can be efficiently optimized using an iterative block-coordinate descent scheme that is guaranteed to converge. Furthermore, I provide a form of sparse recovery result that guarantees, at least theoretically, that the classification performance of the constrained group sparse coding should be at least as good as that of group sparse coding.

While utilizing the affine combination of multiple input test samples can improve the performance of the conventional sparse representation-based classification framework, it is difficult to integrate this approach into a dictionary learning framework. Therefore, we propose to combine (1) imposing group structure on the data, (2) imposing block structure on the dictionary, and (3) using a different regularizer term to sparsely encode the data. We call this approach either block/group (BGSC) or reconstructed block/group (R-BGSC) sparse coding. Incorporating either one of them with the novel Intra-block Coherence Suppression Dictionary Learning (ICS-DL) algorithm, which, as the name suggests, suppresses the coherence of atoms within the same block, results in a novel dictionary learning framework.

An important and distinguishing feature of the proposed framework is that all dictionary blocks are trained simultaneously with respect to each data group while the intra-block coherence is explicitly minimized as an important objective. We provide both empirical evidence and heuristic support for this feature, which can be considered as a direct consequence of incorporating both the group structure for the input data and the block structure for the dictionary in the learning process. The optimization problems for both the dictionary learning and sparse coding can be solved efficiently using block-coordinate descent, and the details of the optimization algorithms are presented. In both parts of this work, the proposed methods are evaluated on several classification (supervised) and clustering (unsupervised) problems using well-known datasets. Favorable comparisons with state-of-the-art methods demonstrate the viability and validity of the proposed frameworks.

CHAPTER 1
OVERVIEW

In recent years, sparse signal representation has received a lot of attention. It has proven to be a powerful tool in the fields of computer vision and machine learning [8, 45]. Its success is mainly due to the fact that images or image patches have naturally sparse representations with respect to global and pre-constructed bases (DCT, wavelets) or specifically trained bases [26].

In contrast to the eigen-coefficients, which are just projections of the input feature vector onto the eigenspace, calculation of the sparse representation is not straightforward. It involves minimizing the following equation:

    ‖x − Dc‖²₂,  s.t.  |c|₀ ≤ s,    (1-1)

where D ∈ R^{n×k} is the dictionary, x is the input vector and c is the sparse representation of x. The above equation is a combinatorial problem and is known to be NP-hard. Many alternative methods convexify it by replacing the ℓ0-norm constraint with an ℓ1-norm [4, 20] or an ℓq-norm […]

[…] For example, in video surveillance, a duration of one second would provide about 30 frames. Therefore, there is a need to properly generalize SRC for classification problems that require a decision on a group of test samples. One way to approach this problem is to consider the method proposed in [1]. The test samples are considered as one group and encoded. However, in practice, the variability between the training samples and the test samples can be so large that the test samples do not lie in the subspace spanned by the training samples. The proposed algorithm takes advantage of the convex set formed by the test samples. It is designed to capture any vector in the convex set that can best be sparse coded by the dictionary D.

Recently, Sprechmann et al. [38] proposed to impose structure on both the input data x and the dictionary D. The main focus of their work, however, is on recovering and separating mixtures of signals […] optimization proposed in [21]. They did not propose nor investigate the integration of dictionary learning algorithms. In the second part of this work, two alternative block and group regularized algorithms are proposed. I will prove that our proposed algorithms can indeed produce sparse representations for a group of input data. I will also provide efficient optimization methods for these two proposed algorithms. I will discuss the advantages and disadvantages of incorporating these sparse coding algorithms with dictionary learning algorithms. A novel dictionary learning algorithm will be presented to alleviate the problems caused by integrating the proposed sparse coding algorithms with dictionary learning algorithms.
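The ℓ1-relaxed form of Eq. 1-1 mentioned above, min_c ‖x − Dc‖²₂ + λ‖c‖₁, is commonly minimized with proximal-gradient (ISTA-style) iterations. The sketch below is only a generic NumPy illustration of that relaxation, not the optimization code used in this dissertation; the dictionary, test signal, step size and λ value are made up for the example.

import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1 (element-wise soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def l1_sparse_code(x, D, lam=0.1, n_iter=500):
    # Minimize ||x - D c||_2^2 + lam * ||c||_1 with ISTA (proximal gradient).
    c = np.zeros(D.shape[1])
    # Step size from the Lipschitz constant of the smooth part, 2 * ||D||_2^2.
    L = 2.0 * np.linalg.norm(D, 2) ** 2
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ c - x)          # gradient of the data-fidelity term
        c = soft_threshold(c - grad / L, lam / L)
    return c

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D = rng.standard_normal((64, 256))
    D /= np.linalg.norm(D, axis=0)              # unit-norm atoms
    c_true = np.zeros(256)
    c_true[rng.choice(256, 5, replace=False)] = rng.standard_normal(5)
    x = D @ c_true                              # a synthetic 5-sparse signal
    c = l1_sparse_code(x, D, lam=0.05)
    print("nonzero coefficients recovered:", int(np.sum(np.abs(c) > 1e-3)))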

CHAPTER 2
AFFINE CONSTRAINED GROUP SPARSE CODING FOR IMAGE CLASSIFICATIONS

2.1 Introduction

Sparse representation-based classification (SRC) has been investigated in several notable recent works (e.g., [10, 46]), and despite the simplicity of the algorithm, the reported recognition results are quite impressive. The geometric motivation behind this approach is the assumption that data from each class reside on a low-dimensional linear subspace spanned by the training images belonging to the given class. The dictionary is obtained virtually without cost by simply stacking together the vectorized training images into a dictionary matrix D such that training images from a single class form a contiguous block in D (Fig. 2-1). During testing, a test image x is sparse coded with respect to the dictionary by optimizing a convex objective function E(c; x, D) of the sparse coefficient c that is usually a sum of an ℓ2 data-fidelity term and a sparse-inducing regularizer:

    E(c; x, D) = ‖x − Dc‖² + λψ(c).    (2-1)

A plethora of sparse-inducing regularizers have been proposed in the literature (e.g., [19], [1], [40]), and many of them are based on the sparse-promoting property of the ℓ1-norm [5]. The block structure of the dictionary D together with the sparse coding of x allows one to infer the class label of x by examining the corresponding block components of c, and SRC essentially looks for the sparsest representation of a test image with the hope that such a representation selects a few columns of D from the correct block (class). However, for many image classification problems in computer vision, the current SRC as embodied by Eq. 2-1 has several inherent shortcomings, and the constrained group sparse coding model proposed and studied in this paper aims to further improve the effectiveness of SRC by addressing two of these shortcomings: its generalizability and its extension for multiple inputs.

Figure 2-1. Illustration of the proposed framework. Left: The convex hull formed by columns of X. The image corresponding to each column of X is shown, and x* = Xa is the solution determined by the proposed framework. x̄ is the mean of the columns of X. The magnitude of each component a_i of a is shown as the white bar in the corresponding image. Right: Illustration of Dc with selected atoms from the dictionary D and the sparse coefficients c of x* shown on the right. Images of the same color are in the same block.

In this age of information, data are plentiful and in many applications, test data do not come in singles but in groups. For example, in video surveillance, a duration of a mere one second would provide about 30 frames of data with perhaps the same number of detected faces to be recognized. Therefore, there is a need to properly generalize SRC for classification problems that require a decision on a group of test data x_1, ..., x_k.

On the other hand, for most computer vision applications, it is a truism that there does not exist a classifier that can correctly anticipate every possible variation of the test images. For image classification and face recognition in particular, these include variations in illumination, pose, expression and image alignment. In particular, subspace-based classification, for which SRC is a special case, is known to be sensitive to image misalignment [46]. Even to a small degree, misalignment can be detrimental and cause temperamental behavior of the classifier with unpredictable outcomes.

Compounding the problem further, there is a desire to minimize the size of the dictionary for many reasons including computational efficiency, and the latter requirement will inevitably limit the dimension of the linear subspace for each class and therefore, reduce its generalizability. However, in many vision applications, the difference between the training images and anticipated test images can be modeled using a small number of linear transformations in the image space. For instance, lighting variation, 2D nonrigid image deformation and, to some extent, pose variation can be approximately modeled using linear transformations in the image space. Therefore, a sensible solution for increasing the generalizability of D would be to obtain these crucial transforms¹ T_1, ..., T_k : R^k → R^k during training, and applying T_i to the single test image x to obtain a group of images x_1, ..., x_k, from which a classification decision will be determined. Somewhat surprisingly, this yields another instance of classification problems with multiple inputs.

¹ We assume that the identity is among the transformations.

Multiple inputs x_1, ..., x_k provide a new and different setting for SRC, and a straightforward application of SRC would be to apply SRC individually to each x_i and generate the final classification decision by pooling or voting among the individual classification results. This approach is unsatisfactory because by treating the x_i independently, it completely ignores the hidden commonality shared by these obviously related data. Group sparse coding [1] (GSC) offers a solution that takes into account the potential relation among x_1, ..., x_k by sparse coding all data simultaneously using the objective function

    min_C E(C; X, D) = ‖X − DC‖² + λψ(C),    (2-2)

where X is a matrix formed by horizontally stacking together the x_i, and ψ(C) is an appropriate ℓ1/ℓ2-based regularizer. The matrix C of sparse coefficients can be used as in SRC to generate a classification decision by applying, e.g., voting and pooling across its rows. However, there are two undesirable features: the effect of λ on the matrix C is difficult to predict and understand, and the pooling and voting still cannot be avoided.

For classification problems in computer vision, this paper argues that a modification of the group sparse coding, constrained group sparse coding, using the following objective function, offers a more principled and flexible approach:

    E(a, c; X, D) = ‖Xa − Dc‖² + λψ(c),    (2-3)

where a = [a_1, ..., a_k]^⊤ is a k-dimensional vector with nonnegative components satisfying the affine constraint a_1 + ... + a_k = 1. Compared with GSC as in Eq. 2-2, the constrained GSC enlarges its feasible domain by including a k-dimensional vector a. However, the feature vector c used in classification is in fact a vector, not a matrix. Geometrically, constrained GSC is easy to explain as it simply searches for the vector in the convex hull S generated by the x_1, ..., x_k that can best be sparse coded using the dictionary D. Compared with GSC, the classification decision based on Eq. 2-3 does not require pooling or voting.

The argument in favor of constrained group sparse coding relies mainly on a form of sparse recovery guarantee presented in Theorem One below. The theorem effectively shows that for any sparse vector x among the columns of X, if it can be correctly classified using GSC by minimizing Eq. 2-2, then in a larger domain S, it will still be the global minimum of Eq. 2-3 with the same parameter λ. As will be made more precise later, this will allow us to argue that, at least in theory, the classification performance of constrained GSC should be at least as good as the one based on GSC or Eq. 2-1.

We conclude the introduction by summarizing the three explicit contributions made in this paper:

1. We propose a new sparse representation-based classification framework based on constrained group sparse coding, and it provides a principled extension of the current SRC framework to classification problems with multiple inputs. The resulting optimization problem can be shown to be convex and can be solved efficiently using an iterative algorithm.

2. We provide some theoretical analysis of the proposed SRC in the form of a sparse recovery result. Based on this result, we argue that theoretically, the classification performance of the proposed framework should be equal to or better than other existing SRC frameworks such as group sparse coding (GSC).

3. We evaluate the proposed framework using three face recognition-related experiments. The results suggest that the proposed framework does provide noticeable improvements over existing methods.

This paper is organized as follows. We will present the affine constrained group sparse coding (ACGSC) framework in the next section. The associated optimization problem and its solution will also be discussed. We provide a brief survey of the related work in section three. The experimental results are reported in section four, and we conclude the paper with a short summary and the plan for future work.

2.2 Theory and Method

Let x_1, ..., x_k denote a group of input test data, and D the given dictionary. We further let X denote the matrix X = [x_1 x_2 ... x_k]. Our proposed affine-constrained group sparse coding seeks to minimize the objective function:

    E_CGSC(a, c; X, D) = ‖Xa − Dc‖² + λψ(c)    (2-4)

subject to the nonnegative affine constraint on the group coefficient a: Σ_{i=1}^{k} a_i = 1 and a_1, a_2, ..., a_k ≥ 0. Note that in group sparse coding [1] (GSC), there are no group coefficients a and the sparse coefficients c are given as a matrix. A schematic illustration of the difference between the group sparse coding and our constrained version is shown in Fig. 2-2. The GSC-based classification scheme sparse codes the input feature vectors x_i simultaneously. While some group sparsity can be claimed for this approach based on the appropriate regularizer ψ, it is generally difficult to provide any guarantee on the behavior of the sparse coefficient matrix C. On the other hand, for our constrained version, the discrete set of the input vectors has been completed to form a convex set S, and our approach is designed to capture any vector in this convex set that can best be sparse coded by the dictionary D. The situation here shares some similarity with the LP-relaxation of integer programming [44] or the convexification of a non-convex program [35], in which one enlarges the feasible domain in order to achieve convexity and thereby, efficiently compute an approximate solution.

Figure 2-2. Illustration of the difference between group sparse coding and constrained group sparse coding, and its effect on classification results. The cone represents the subspace spanned by a block of the dictionary D. Shaded areas represent the convex hull spanned by columns in X. None of the x_i lie within the subspace; however some of the points on the convex hull do, and the proposed algorithm is meant to capture these points.

We remark that the affine constraint is quite necessary in Eq. 2-4, and without it, there is always the trivial solution a = 0, c = 0. It is clear that the optimization problem is indeed convex and it is completely tractable as the feasible domain and objective function are both convex. We iteratively solve for a and c using gradient descent, and this scheme is guaranteed to converge. The only complication is the projection onto the simplex defined by the group coefficient constraint a_1 + ... + a_k = 1, and this step can be efficiently managed using an iterative scheme described in the supplemental material.

2.2.1 Theoretical Guarantee

Given a dictionary D, a vector x has sparsity s if it can be written exactly as a linear combination of s columns of D. An important result that underlies all SRC frameworks is the guarantee provided by the sparse recovery result that for a feature vector x with sparsity bounded by properties of D [7, 11], x can be recovered by minimizing the ℓ1 cost-function:

    (P1):  min_c ‖c‖₁  subject to  Dc = x.    (2-5)

In actual applications, the above ℓ1-program is modified as

    (P1,λ):  min_c ‖x − Dc‖²₂ + λ‖c‖₁,    (2-6)

for a suitably chosen constant λ > 0. We remark that the two programs, while related, are in fact different, with most sparse recovery results given by minimizing P1. Let x be a noiseless test vector to be classified. A typical SRC method will determine its classification based on its sparse coefficients obtained by minimizing the program P1,λ. Compared to them, our proposed framework enlarges the optimization domain by introducing the group coefficients a, and it is possible that with a larger domain, spurious and incorrect solutions could arise. The following theorem rules out this possibility, at least when the sparse vector x can be exactly recovered by a typical SRC method and classified correctly:

Theorem 2.1. Let x be a feature vector with sparsity s such that it can be exactly recovered by minimizing P1,λ for some λ. Furthermore, we assume that x is in the convex hull S. Then, x is the global minimum of E_CGSC with the same λ and D.

Proof. The proof is straightforward and it consists of checking that the sparse vector x also corresponds to the global minimum of E_CGSC(a, c). Since x ∈ S, we have x = Xa for some feasible a. Since x is a sparse vector that can be recovered exactly by minimizing P1,λ in Eq. 2-6, we let c be its sparse coefficients, and we have x = Dc. We claim that (a, c) is a global minimum of E_CGSC(a, c) by showing that the gradient vanishes at (a, c). First, since c is the global minimum for P1,λ with x = Xa, and ∇_c P1,λ = ∇_c E_CGSC(a, c), the c-component ∇_c of the gradient ∇E_CGSC vanishes at (a, c):

    ∇_c E_CGSC(a, c) = 0.

On the other hand, by direct calculation, we have the a-component of the gradient ∇E_CGSC:

    ∇_a E_CGSC(a, c) = X^⊤(Xa) − X^⊤Dc = 0,

because Dc = x = Xa. This shows that (a, c) is the global minimum of the convex function E_CGSC(a, c), regardless of whether x is on the boundary of the convex hull S.

From the above, we can draw two important conclusions. First, compared to GSC, our constrained version, with an enlarged feasible domain, will indeed recover the right solution if the noiseless solution is indeed among the input feature vectors x_1, ..., x_k. Therefore, our method will not produce an incorrect result in this case. However, the behavior of GSC in this case is difficult to predict because other noisy input feature vectors will affect the sparse coding of the noiseless input vector, and the result of the subsequent pooling based on the matrix C is also difficult to predict. Second, if there is a sparse vector x lying inside the convex hull S spanned by x_1, ..., x_k, our method will indeed recover it when the required conditions are satisfied.

2.2.2 Part-Based ACGSC

The ACGSC framework based on Eq. 2-3 is versatile enough to allow for several variations, and here we discuss one variation, the part-based ACGSC, that is specifically suitable for detecting the presence of occlusions. One variation of Eq. 2-3 is

    E(A, c; X, D) = ‖Σ_{i}^{k} A_i x_i − Dc‖² + λψ(c),    (2-7)

where A is the set of all A_i (the A_i are diagonal matrices with nonnegative elements), k is the number of input samples, and x_i ∈ R^d is the i-th column in X. The affine constraints on the A_i are Σ_{i}^{k} A_i^j = 1 for j = 1, ..., d, where A_i^j is the j-th diagonal element of A_i. The resulting vector Σ_i A_i x_i is the element-wise affine combination² of the x_i's.

² Σ_i A_i x_i = Σ_i diag(A_i) ⊙ x_i, where ⊙ denotes the element-wise product.

Figure 2-3. Comparison between the standard (A) and the part-based ACGSC (B). (A): The a_i are the group coefficients corresponding to the sample x_i. The nonnegative affine constraint here is a_1 + a_2 = 1. (B): The same input samples are split into 4 parts in the part-based approach. There are 4 nonnegative affine constraints, i.e., a_1^p + a_2^p = 1 for p = 1, ..., 4.

Although Eq. 2-7 provides an extension of Eq. 2-3, it is severely under-constrained as there are d × k unknowns in all the A_i. To alleviate this problem, we can further reduce the number of variables in the A_i. For example, the equation below gives only n_p different variables: the a_i^j are scalar variables, and the I_p are identity matrices of certain sizes,

    A_i = diag(a_i^1 I_1, a_i^2 I_2, ..., a_i^{n_p} I_{n_p}).    (2-8)

This formulation of A_i is equivalent to splitting a sample x_i into n_p parts. Each part of x_i corresponds to a scalar variable a_i^p. The size of a part is equal to the size of the corresponding I_p. Note that each I_p does not necessarily have the same size. Let I_p denote the set of indices of the rows in A_i corresponding to a^p. Eq. 2-7 can be rewritten as:

    ‖ diag(X_{I_1}, X_{I_2}, ..., X_{I_{n_p}}) [a^1; a^2; ...; a^{n_p}] − Dc ‖² + λψ(c),
    s.t.  Σ_{i=1}^{k} a_i^p = 1 for p = 1, ..., n_p, and a_i^p ≥ 0,    (2-9)

where a^p = [a_1^p, a_2^p, ..., a_k^p]^⊤ and X_{I_p} are the rows in X that correspond to the p-th part. A comparison between the standard and the part-based ACGSC is illustrated in Fig. 2-3. Because the first part of the data fidelity term is still a d-dimensional vector, the optimization of c is the same as in Eq. 2-3. Although Eq. 2-9 and Eq. 2-3 have a similar structure, the former has a part-structure defined on the components of a, and in practice, the parts are specified by each individual application. For face recognition, we can define the parts according to the image regions where the useful features such as eyes, nose and mouth are to be found.

Since the first matrix in Eq. 2-9 is block-diagonal, Eq. 2-9 can be rewritten as:

    Σ_{p=1}^{n_p} ‖ X_{I_p} a^p − D_{I_p} c ‖² + λψ(c),    (2-10)

where D_{I_p} are the rows of D corresponding to the rows of X_{I_p}. The vector a^p can then be optimized individually under the nonnegative affine constraints given in Eq. 2-9. Note that the indices corresponding to a^p in A, as shown in Eq. 2-9, do not have to be contiguous. This provides us more flexibility for determining and specifying useful parts, depending on the intended application.
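Section 2.2 states that a and c in Eq. 2-4 are updated iteratively, with a kept on the probability simplex; the exact update rules and the simplex-projection routine are left to the supplemental material. The NumPy sketch below shows one standard way such a scheme can be realized, assuming ψ is the ℓ1-norm: an ISTA step for c, a projected gradient step for a, and the sorting-based Euclidean projection onto the simplex. The step sizes, λ value, and stopping rule are illustrative assumptions, not the author's settings.

import numpy as np

def project_simplex(v):
    # Euclidean projection of v onto {a : a >= 0, sum(a) = 1} (sorting method).
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def acgsc(X, D, lam=0.05, n_iter=300):
    # Alternating minimization of ||X a - D c||^2 + lam * ||c||_1 with a on the simplex.
    k, m = X.shape[1], D.shape[1]
    a = np.full(k, 1.0 / k)                    # start at the mean of the inputs
    c = np.zeros(m)
    Lc = 2.0 * np.linalg.norm(D, 2) ** 2       # Lipschitz constant for the c-updates
    La = 2.0 * np.linalg.norm(X, 2) ** 2       # Lipschitz constant for the a-updates
    for _ in range(n_iter):
        r = X @ a - D @ c
        c = soft_threshold(c + 2.0 * (D.T @ r) / Lc, lam / Lc)   # ISTA step on c
        r = X @ a - D @ c
        a = project_simplex(a - 2.0 * (X.T @ r) / La)            # projected step on a
    return a, c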

2.3 Related Works

To the best of our knowledge, a similar framework and algorithm to the one proposed in this paper have not been reported in the computer vision literature. We will keep our presentation succinct and to the point. In particular, we will focus primarily on the presentation of the algorithm as well as the experimental results using real image data. Due to limited space, we will only summarize the major differences between our work and some of the representative works in dictionary learning and sparse representation that have appeared in the past few years.

Sparse representations have been successfully applied to many vision tasks such as face recognition [24], [31], [42], image classification [43], [10], [14], [27], [28], [32], [48], denoising and inpainting [25], [2] and many other areas [45], [51]. The success is likely due to the fact that sparse representations of visual signals are produced in the first stage of visual processing of the human brain [29]. In many cases, simply replacing the original features with their sparse representations leads to surprisingly better results [45]. Moreover, in many of the applications, they require only unlabeled data during the training phase ("Dictionary Learning").

While many of the works focus on replacing the original dense features with the sparse coding, some proposed adding structured constraints either on the sparse representations [43], [1] or on the dictionary [19], [37]. In the work of Wang et al. [43], a feature is coded with atoms in the dictionary that are local to the feature. Although there is no sparsity-promoting regularizer term (ℓ1 norm on coefficients) in their formulation, their locality constraints promote coding of a feature by its local neighbors, which in turn promotes sparsity. This results in a state-of-the-art performance in the image classification task.

In the work of Bengio et al. [1], a sparsity-promoting matrix norm is imposed on the coefficients of data that belong to the same group. This sparsity-promoting matrix norm encourages features in the same group to share the same atoms (codewords) in the dictionary. During the sparse coding phase, this process helps to identify the atoms that are commonly used by images within the same group. In a different context, we assume sampled data is under certain perturbations (i.e. lighting variation, or occlusions). We treat these perturbations as a group and apply a constrained sparse coding scheme that would help to identify the underlying features among them.

2.4 Experiments

2.4.1 Face Classification

SRC-based face classification has been extensively studied and state-of-the-art results have been established in the past [10, 46]. However, none of the works investigated the more realistic scenario that there can exist large variability between the dictionary (training) samples and the test samples. In this experiment, we used the cropped Extended Yale Face Database B [21] to simulate such a scenario. This database contains aligned images of 38 persons under different laboratory-controlled illumination conditions. For each subject, we chose the images with the azimuth and elevation angles of the light source ≤ 35° as the training samples. The rest of the images in the database, which contain a large amount of shadows, were used as test samples.

Figure 2-4. Selected training (top row) and test (bottom row) samples. Numbers in the parenthesis are the azimuth and elevation angles.

The training samples were used to simulate well-lit images such as passport or I.D. photos. The test samples were used to simulate poorly acquired images (e.g. from a surveillance camera) that are very different from the training data. Fig. 2-4 demonstrates the large variability between the chosen training (top row) and test (bottom row) samples.

We used the training samples directly as atoms of the dictionary D, as in [10, 46]. Therefore, there are 38 blocks in D and each block contains 24 or 23 atoms³. The number of test samples for each person is around 40. The experiment was conducted as follows:

1. Reduce the dimensionality of the data to 600 using PCA. Normalize the samples to have unit ℓ2 norms.
2. Use the training samples directly as atoms of the dictionary D.
3. For each subject, randomly select n_g = 2, ..., 7 test samples X.
4. Initialize a = [1/n_g, ..., 1/n_g]^⊤ and c = 0.
5. Iteratively update a and c.
6. Determine the class label by

    label(X) = min_i ‖Xa − D_i c_i‖₂,    (2-11)

where D_i and c_i are the i-th block of D and c, respectively.
7. Repeat until all test samples have been used for evaluation.

³ Missing samples in some categories.

We ran the above experiment 10 times and the results are reported in Fig. 2-5. We compared our result with the result of simply using the mean of the columns of X as the input vector (last column of Fig. 2-6).

We also compared with two group regularized sparse coding algorithms proposed by Bengio et al. [1] and by Sprechmann et al. [38]. In [1], the energy function they minimized is of the same form as Eq. 2-2:

    ‖X − DC‖²_F + λ Σ_{i}^{|D|} ‖C^i‖₂,    (2-12)

where C^i is the i-th row of C. Their algorithm promotes the data in a group (columns of X) to share the same dictionary atoms for encoding. In [38], on top of the group structure, they added a block structure on D. The energy function they minimized is also of the same form as Eq. 2-2:

    ‖X − DC‖²_F + λ Σ_{b}^{n_b} ‖C_[b]‖₂,    (2-13)

where C_[b] are the rows in C that correspond to the b-th block of D. Similar to the previous algorithm, Sprechmann's algorithm promotes the data in X to share the same blocks of D for encoding.

We also compared with the results using algorithms from the works of Wright et al. [46] and of Elhamifar et al. [10]. We applied these two methods to every test sample since they did not utilize group structure on the data, and the results are also reported in Fig. 2-5⁴. The results show that our proposed method significantly outperformed the other algorithms. The reason is, as shown in Fig. 2-2, our proposed method provides a larger feasible domain (the simplex spanned by X) while the group regularized methods can only rely on a few atoms. The regularizer parameters λ for each method are listed in Fig. 2-5.

Fig. 2-6 demonstrates four groups of test samples, the actual computed coefficients a (white bars), the image at optimality Xa (2nd column from the right) and the mean image (last column). The images at optimality have lighting conditions that are more similar to the atoms in D (top row in Fig. 2-4) than the mean images if the simplex spanned by the columns of X lies within the subspaces spanned by the blocks in D. The first three rows demonstrate successful examples of our method. We were not able to classify the 1st and 3rd correctly by using the mean of X because the majority of the images in the group are too dark. The bottom row shows a failed case where none of the samples contains many distinguishable features.

⁴ The results in Fig. 2-5 of these two methods are worse than what were reported in the original literature [46] and [10]. This is because, in their experiments, they randomly chose half of the dataset as atoms of D and the other half as test samples. Therefore their dictionaries are 50% larger than ours and the variability between training and test samples is minimized due to the random selection.
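Step 6 of the procedure above labels a test group by comparing block-wise reconstruction residuals (Eq. 2-11). A minimal sketch of that rule follows; the block bookkeeping (a list of column-index arrays, one per class) and the inputs a and c coming from an ACGSC solver are assumptions made for illustration.

import numpy as np

def classify_block_residual(X, a, c, D, block_indices):
    # Return the class whose dictionary block best reconstructs Xa (Eq. 2-11).
    # block_indices: list of 1-D integer arrays, one per class, giving the
    # columns of D (and entries of c) that belong to that class's block.
    target = X @ a                              # the affine combination of the inputs
    residuals = [np.linalg.norm(target - D[:, idx] @ c[idx]) for idx in block_indices]
    return int(np.argmin(residuals))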

Figure 2-5. Comparison of our method with different sparse coding algorithms. The λ for each method, in the order of the legend, are 0.05, 0.2, 0.05, 0.1, 0.05, and 0.05, respectively.

The results also show that samples which are more similar to atoms in the dictionary usually have higher a values. In fact, if a sample does NOT provide any useful features that can trigger a match from the dictionary, its corresponding a values will often be driven to zeros (e.g. the 2nd and 3rd images in the 1st row) even though we do not put any sparse-inducing constraint on a.

2.4.2 Imposter Detection

From the results in the previous section, we observed that samples in X have higher corresponding values in a if they are more similar to the atoms in the dictionary. Now, let us assume we acquired a collection of samples that contain n_k class-k samples and n_j samples that are not in class k. We call these n_j samples the imposters. One straightforward way is to apply our proposed algorithm to these n_k + n_j samples. The samples with lower corresponding a_i values would imply that they are the imposters.

This approach would work well if the n_j samples are not close to other blocks in the dictionary (e.g. the n_k samples are face images of a person and the n_j samples are images of other objects). However, when the n_j samples are also very similar to some dictionary atoms, this method becomes unreliable. We therefore modify our algorithm slightly to suit this purpose. When updating the affine coefficient a, instead of using the whole dictionary, we compute a using

    a = (X^⊤X)^{-1} X^⊤ D_i c_i,    (2-14)

where D_i and c_i are the i-th block of D and c, respectively, and the i-th block is the block that has the minimum reconstruction error to Xa. The value of i can be computed using Eq. 2-11.

Figure 2-6. Left: columns of X and the values of the computed a (white bars). The last two columns are the results of Xa and the mean of the columns of X, respectively. The first three rows …

We compared with the results using the algorithm proposed by Sprechmann et al. [38] that minimizes Eq. 2-13. As mentioned in the previous section, their algorithm promotes X to use the same few blocks in D to encode. In other words, the imposters in X are forced to use the same blocks as the other inliers for encoding. This will likely result in larger reconstruction errors for the imposters as they use blocks that are irrelevant to them but relevant to the inliers for encoding. In this approach, we used the reconstruction errors to determine imposters.

To make the task more challenging, instead of following the set-up of the previous section, we used the AR-face dataset [30]. This dataset contains face images of 100 individuals. There are 14 images from each individual with different expressions and illumination variations in terms of lighting intensity (not in directions as in the Extended Yale Database B). For each person, we randomly chose n_g images as the test samples and the rest as training samples. The experiment was conducted as follows:

1. Project the data samples down to 600 dimensions using the principal components of the training samples.
2. Use the training samples as atoms of D directly. D has a 100-block structure.
3. For the p-th person, pick n_g images from its test samples and randomly pick n_i imposter images from the test samples of other people. These n_g + n_i images are the columns of X.
4. Initialize a = [1/(n_g + n_i), ..., 1/(n_g + n_i)]^⊤, c = 0 and C = 0.
5. Iteratively update c (the sparse coefficients of Xa) and a. Use Eq. 2-14 to update a instead.
6. Determine the imposters by comparing the values in a with a threshold value τ1: X_i is an imposter if a_i < τ1.
7. Compute the sparse coefficients C of X using the algorithm in [38].
8. Compute the reconstruction error e of each column of X: e_j = ‖X_j − D_i C_i^j‖₂, where X_j is the j-th column of X and C_i^j is the i-th block of the j-th column of C (i is the index of the active block for the data X). An active block is determined by

    active-block(X) = min_i ‖X − D_i C_[i]‖_F,    (2-15)

where C_[i] are the rows that correspond to the i-th block of D.
9. X_j is an imposter if e_j > τ2.

We repeat the above procedure 10 times for each class. The λ for our algorithm and the algorithm in [38] are set to 0.1 and 0.5, respectively. n_g is 5 and n_i = 3, ..., 10. The Precision vs. Recall curves and the average precisions (areas of the curves) of both methods are listed in Fig. 2-7. The results show that our detection performance is slightly better than that of Sprechmann's. The average precision for n_i = 3 does not differ much from that for n_i = 10. This is because when we chose the imposters in step 3, the imposters were chosen from across all other classes. Therefore, the inliers remain the majority class in X. This is how a standard data acquisition system should normally work.

Figure 2-7. Precision vs. Recall curve and average precision (A.P.) comparisons for various n_i. In the bottom-right graph, all the n_i imposters are from one single class.

We conducted another experiment with n_g = 5 and n_i = 4 but with all four imposters chosen from one single class. The result is shown in the bottom-right graph in Fig. 2-7. The average precisions of both methods are significantly worse.
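For the imposter-detection variant, the a-update in Eq. 2-14 is an ordinary least-squares fit of Xa to the active block's reconstruction D_i c_i, after which small entries of a flag imposters (step 6). A rough sketch under those assumptions follows; the threshold value is a placeholder, not the τ1 used in the experiments.

import numpy as np

def update_a_least_squares(X, D_i, c_i):
    # Eq. 2-14: a = (X^T X)^{-1} X^T (D_i c_i), i.e. the least-squares fit of Xa to D_i c_i.
    target = D_i @ c_i
    a, *_ = np.linalg.lstsq(X, target, rcond=None)   # stable solve of the normal equations
    return a

def flag_imposters(a, tau=0.05):
    # Columns of X whose affine weight falls below the threshold are flagged as imposters.
    return np.nonzero(a < tau)[0]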

Figure 2-8. Imposter detection results. First three rows: detection results using our affine model. Last row: detection results using reconstruction errors. The imposters are listed in the last two columns. The third row is a failed case where the 2nd sample from the left was identified as an imposter.

We show some selected results with their a (first 3 rows) and e (last row) values in Fig. 2-8. For presentation purposes, we only show two imposter data samples. The results show that imposters usually have lower a or higher e values. The third row demonstrates a failed case where the facial expressions of both imposters are very similar to one of the inliers (…th from the left). This imposter detection framework could be easily extended to other applications such as unusual event detection [50]. It is, however, beyond the scope of this paper. We will leave it for future work.

2.4.3 Face Recognition with Occlusions

We have tested our proposed approach using the AR-Face database [30]. This dataset contains face images of 100 individuals. There are 14 non-occluded and 12 occluded images from each individual with different expressions and illumination variations. The occluded images contain two types of occlusions: sun-glasses and a scarf covering the face from the nose down. Each occlusion type contains 6 images per person. To reduce the dimensionality, we down-sampled the images to 55 × 40 and vectorized them. We randomly selected 8 non-occluded images from each person to form the dictionary D with a 100-block structure. The experiment was performed as follows:

1. Randomly select n_g test samples X from the occluded images of person p. They must contain at least one image from each type of occlusion.
2. Split the test images into 6 uniformly-sized and non-overlapping parts (Fig. 2-10D).
3. Initialize A_i = I/n_g, a^p = [1/n_g, ..., 1/n_g]^⊤, c = 0.
4. Iteratively optimize c and the A_i (the a^p's) using Eq. 2-9 and Eq. 2-10, respectively.
5. Determine the class label by

    label(X) = min_i Σ_{p=1}^{n_p} ‖X_{I_p} a^p − (D_i)_{I_p} c_i‖₂,    (2-16)

where D_i and c_i are the i-th block of D and c, respectively. This equation is a suitable modification of Eq. 2-11.

We repeated the above procedure 20 times for each person. We compared the results with those of using the standard ACGSC. We also compared our results with those of GSC [1] and of Sprechmann [38]. Both algorithms treat the multiple test samples as a group and sparse code them simultaneously. Lastly, we compared with the results from directly applying regular sparse coding (Wright et al. [46]) and block sparse coding (Elhamifar et al. [10]) to the average test sample (Fig. 2-10B). The results in Fig. 2-9 show that our part-wise ACGSC outperforms the other methods by a significant margin. The standard ACGSC does not have a significant advantage over the other methods. This is due to the fact that the occlusions are present in all the test samples.

Fig. 2-10D shows the part-based group coefficients a^p of the test samples after the optimization. The values are displayed using the colors overlaid on the corresponding parts. The largest coefficient of this specific example is around 0.6. We can clearly see that the parts corresponding to the occluded regions have significantly low values, i.e., our method correctly identified the occlusions. Fig. 2-10E shows the part-based affine combination of the test images. Our part-based approach was able to select the parts that are more consistent and aligned with the dictionary (training) samples. Fig. 2-10F shows the reconstructed image using our method and Fig. 2-10C shows the reconstructed image obtained by directly applying Wright's method [46] to the average image (Fig. 2-10B). Due to the occlusion of the scarf, the test samples were incorrectly matched to training samples from a male subject with beard and mustache.

Figure 2-9. Comparison of classification results. The λ values used in the methods, in the order of the legend, are 0.005, 0.005, 0.004, 0.002, 0.01, and 0.02, respectively.

2.4.4 Texture Classification

In this experiment, we used the cropped Columbia-Utrecht (Curet) texture database [41]⁵. This database contains images of 61 materials which broadly span the range of different surfaces that we commonly see in our environment (see Fig. 2-11). It has a total of 61 × 92 = 5612 images. Each image is of size 200-by-200 pixels. For each texture, we randomly chose 20 images as the training samples and the rest as test samples. The experiment was conducted as follows:

1. For each image, compute its SIFT⁶ feature over the entire image. Each image is simply represented by one 128-dimensional SIFT vector.

⁵ Images can be downloaded at http://www.robots.ox.ac.uk/~vgg/research/texclass/setup.html
⁶ We used the vl_sift package.
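The part-based label rule in Eq. 2-16 accumulates, over the parts, the residual between each part's affine combination X_{I_p} a^p and the corresponding rows of the candidate block's reconstruction. A small sketch of that rule, with the part and block index sets represented as lists of integer arrays (an assumed data layout, not the author's code):

import numpy as np

def classify_part_based(X, a_parts, c, D, block_indices, parts):
    # Part-based block-residual rule (Eq. 2-16).
    # a_parts[p] is the affine coefficient vector for part p,
    # parts[p] the row indices of that part, block_indices[i] the columns of class i.
    scores = []
    for idx in block_indices:                       # loop over candidate classes
        total = 0.0
        for p, rows in enumerate(parts):            # accumulate per-part residuals
            recon = D[np.ix_(rows, idx)] @ c[idx]
            total += np.linalg.norm(X[rows, :] @ a_parts[p] - recon)
        scores.append(total)
    return int(np.argmin(scores))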

Figure 2-10. (A): Input test samples. (B): The average of the 3 test samples. (C): Reconstructed image using Wright's method on the average image (B). (D): Weights of the part-wise group coefficients overlaid on the corresponding test samples. The redder the shade is, the higher the affine weight is. (E): The part-wise affine combination of the test samples after the optimization. (F): Reconstructed image using our proposed approach.

Figure 2-11. Selected images from the cropped Curet database. Top row: 5 different types of textures. Bottom row: the same texture under different illumination conditions.

2. Normalize the SIFT vectors to have unit ℓ2 norm. Truncate large elements in these vectors by setting the elements with values > 0.25 to 0.25. Normalize them again.
3. Use the training samples directly as atoms of the dictionary D. D is of size 128 × 1,220 and has a 61-block structure.
4. For each class, randomly choose n_g test samples X.
5. Iteratively update c and a as in the main text. The regularizer λ used to compute c is set to 0.2.
6. Repeat the above two steps such that all test samples in a given class are used for testing.
7. Determine the class label of X by

    label(X) = min_i ‖Xa − D_i c_i‖₂,

where D_i and c_i are the i-th block of D and c, respectively.

We compared our results with the results using the framework from the work of Wright et al. [46]. Since there is no group structure defined in their framework, we computed the sparse coefficients of all the test samples individually. The λ of this algorithm is set to 0.17. The class label is determined using the following equation:

    label(x) = min_i ‖x − D_i c_i‖₂.

The classification results are listed in Fig. 2-12. The results are surprisingly good for such simple features (SIFT over the whole texture image). The result from using Wright's framework is comparable to the state-of-the-art results [41] for this dataset. Our framework further improved the result of Wright et al. by 3.3% (the classification rate is 99.67% when n_g = 5).

Figure 2-12. Classification rates of the texture classification experiment.

2.5 Future Work

For future work, we will investigate more theoretical aspects of the approach. We believe that it is possible to obtain a stronger form of the sparse recovery result under a noisy assumption, providing a better understanding of the power and limitations of the proposed algorithm. From the practical side, we will also investigate efficient methods and strategies for determining the collection of relevant transforms that can be applied online to improve the robustness and accuracy of the SRC-based approach as described in the introduction. Furthermore, we will also investigate useful and effective priors for the group coefficients a and the resulting (usually nonconvex) optimization problem. From the first experiment, we have observed that the group coefficients a tend to have only a few large elements and the rest are very close to zero. It makes sense to put a sparse-inducing prior that does not violate the original constraints on a. One immediate candidate would be the Dirichlet prior on the simplex spanned by a, with the modified objective function

    E_CGSC(a, c; X, D) = ‖Xa − Dc‖² + λψ(c) + DIR(a),

where DIR is the density function of the Dirichlet distribution on the unit simplex in R^k.

CHAPTER 3
BLOCK AND GROUP REGULARIZED SPARSE MODELING FOR DICTIONARY LEARNING

3.1 Introduction

Sparse modeling and dictionary learning have emerged recently as an effective and popular paradigm for solving many important learning problems in computer vision. Its appeal stems from its underlying simplicity: given a collection of data X = {x_1, ..., x_l} ⊂ R^n, learning can be formulated using an objective function of the form:

    Q(D, C; X) = Σ_g ‖X_g − DC_g‖²_F + ψ_D(D) + ψ_C(C_g),    (3-1)

where the X_g are vectors/matrices generated from the data X, and ψ_D, ψ_C are regularizers on the learned dictionary D and the sparse coefficients C_g, respectively. In dictionary learning, ψ_C is usually based on various sparsity-promoting norms that depend on the extra structures placed on D, and it is the regularizer ψ_D that largely determines the nature of the dictionary D. It is surprising that such an innocuous formula template has generated an active and fertile research field, a testament to the power and versatility of the notions that underlie the equation: linearity and sparsity.

If Eq. 3-1 provides the elegant theme, its variations are often composed of extra structures placed on the dictionary D [11, 18, 33, 38], and less frequently, different ways of generating sparsely-coded data X_g for training the dictionary. The former affects how the two regularizers ψ_D, ψ_C should be defined, and the latter determines how the vectors/matrices X_g should be generated from X. For classification, a block structure is often imposed on D and hierarchical structures could be further specified using these blocks [18, 39], with the aim of endowing the learned dictionary with certain predictive power. To promote sparsity, the block structure on D is often accompanied by an appropriate block-based ℓ2-norm (e.g., the ℓ1/ℓ2-norm [49]) used in ψ_C. On the other hand, for X_g, a common approach is to generate a collection of groups of data vectors {x_{g_1}, ..., x_{g_k}} from X and to simultaneously sparse code the data vectors in each data group X_g [1]. For classification problems, the idea is to generate data groups X_g with feature vectors x_{g_i} that should be similarly encoded, and such data groups X_g can be obtained using problem-specific information such as class labels, similarity values and other information sources (e.g., neighboring image patches).

Figure 3-1. Illustration of the proposed Block/Group Sparse Coding algorithm. A group of data X_g on the left is sparsely coded with respect to the dictionary D with block structure D_[1], ..., D_[b].

In a noiseless setting, our proposed problem of encoding sparse representations for a group of signals X using the minimum number of blocks from D can be cast as the optimization program:

    P_{ℓ0,p}:  min_C Σ_i I(‖C_[i]‖_p)  s.t.  X = DC,    (3-2)

where I is an indicator function, p = 1, 2 and C_[i] is the i-th block sub-matrix of C that corresponds to the i-th block of D, as shown in Fig. 3-1. This combinatorial problem is known to be NP-hard, and the ℓ1-relaxed version of the above program is:

    P_{ℓ1,p}:  min_C Σ_i ‖C_[i]‖_p  s.t.  X = DC.    (3-3)

We will call this program Block/Group Sparse Coding (BGSC) as it incorporates both the group structure in the data and the block structure in the dictionary.
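Program P_{ℓ1,p} couples the group and block structures through the sum of block-wise norms of C. As a concrete, purely illustrative reading of that penalty, the sketch below evaluates a penalized surrogate ‖X_g − DC‖²_F + λ Σ_i ‖C_[i]‖_p for p = 2, treating each block norm entry-wise; the λ-weighted unconstrained form and the block layout are assumptions made for the example, not the exact program solved in Sec. 3.2.

import numpy as np

def bgsc_objective(X_g, C, D, block_indices, lam=0.1, p=2):
    # Penalized surrogate of program P_{l1,p}: ||X_g - D C||_F^2 + lam * sum_i ||C_[i]||_p.
    # C_[i] is the sub-matrix of C whose rows multiply the i-th block of D,
    # and its norm is taken entry-wise here (Frobenius when p = 2).
    fidelity = np.linalg.norm(X_g - D @ C, "fro") ** 2
    penalty = sum(np.linalg.norm(C[idx, :].ravel(), p) for idx in block_indices)
    return fidelity + lam * penalty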

PAGE 40

Insomeapplicationsofwhichthemainconcernisidentifyingcontributingblocks ratherthanndingthesparserepresentation[11],thefollowingoptimizationprogramis considered: P 0 ` 0 ;p :min C X i I k D [ i ] C [ i ] k p s.t. X = DC : {4 Again,thisprogramisalsoNP-Hardandits ` 1 relaxationis: P 0 ` 1 ;p :min C X i k D [ i ] C [ i ] k p s.t. X = DC : {5 Wewillcalltheprograms P 0 ` 0 ;p and P 0 ` 1 ;p ReconstructedBlock/GroupSparseCoding R-BGSCastheyfocusonminimizingthenormofthereconstructionterm k D [ i ] C [ i ] k Theoptimizationalgorithmsforsolving P ` 1 ;p and P 0 ` 1 ;p willbepresentedinSec.2. Duetolimitedspace,wewillonlysummarizethemajordierencesbetweenourwork andsomeoftherepresentativeapproachesindictionarylearningandsparserepresentation thatappearinrecentyears.Thegroupsparsecodingwasrstintroducedin[1].Elhamifar andVidal[11]explicitlyimposedblockstructureonthedictionaryforclassication. However,noneoftheseapproachesinvestigatedneitherthecombinedframeworkthat incorporatedboththegroupandblockstructuresnortheeectofcombiningthesetwo structuresondictionarylearning. SparseRepresentationbasedClassicationSRCwasstudiedrecentlyin[10, 46].However,thedictionaryissimplycolumnsofthetrainingdataandthereisno emphasisonminimizingtheintra-blockcoherence.Althoughtheworkin[33]sharessome supercialsimilaritieswithourdictionarylearningalgorithm,thedierencesaremajor. First,Ramirezet.al.traineachblockusingdierentcollectionofdataandtherefore, thereisnonotionoftrainingblocksof D simultaneouslyasinourframework.And becauseofthis,themainobjectivein[33]istheminimizationofinter-blockcoherence insteadoftheintra-blockcoherence.Finally,Sprechmannet.al.[38]proposedasparse codingschemethatissimilartooursBGSCusingtheproximaloptimizationproposedin 40


Sharing of dictionary atoms for data in the same group has been shown to increase the discriminative power of the dictionary [1]. With the block structure added on the dictionary D, our proposed BGSC and R-BGSC algorithms promote a group of data to share only a few blocks of D for encoding. Therefore, incorporating these SC algorithms in a dictionary learning framework, which iteratively updates the coefficients of the data and the atoms of D, will result in training each block of D using only a few groups of data. This means that a badly written digit '9', which looks like a '7', when grouped together with other normally written '9's, will be encoded using the atoms these '9's used. The badly written '9' will, in turn, be used to train the atoms in D that represent '9's rather than those that represent '7's.

Another novelty of our framework is that we do not assign a class of signals to specific blocks of the dictionary, unlike other Sparse Representation based Classification (SRC) approaches [10, 46] and [33]. This allows some blocks to store features shared between different classes. Ramirez et al. [33] trained a single dictionary block for each group of data. This method increases the redundancy of the information encoded in the learned dictionary, as information common to two or more groups (a common scenario in many classification problems) must be stored separately within each block. Since one dictionary block is assigned to each class, the redundancy induced in the dictionary needs to be reduced for greater efficiency; in [33] this is done by removing dictionary elements whose mutual dot product has an absolute value greater than an arbitrary threshold (e.g., 0.95). Instead, we provide an objective function whose minimization naturally produces dictionaries that are less redundant. In fact, our proposal to encode data from a single class using multiple blocks obviates the need to incorporate an explicit inter-block coherence minimization term, unlike [33]. By not assigning classes to blocks and generally having more blocks in D than the number of classes, we allow mutual blocks that contain features from more than one class. Data with more variability are also allowed to use more blocks.


As proved in [9], the program P_{ℓ1,p} (Eq. 3-3) is equivalent to P_{ℓ0,p} (Eq. 3-2) when

    n_a (k − 1) μ_B < 1 − (n_a − 1) μ_S,    (3-6)

where n_a and k are the size and the rank of a block, respectively, and μ_B and μ_S are the inter- and intra-block coherence defined in Section 3.2.4. (The result in [9] covers the case when p = 2 and X is a single vector; in Section 3.2.1 we show that the condition still holds when X is a matrix.) In other words, the smaller μ_S is, the more likely the two programs are equivalent. A way to achieve the minimum μ_S is to make the atoms orthonormal within each block [6, 23]. However, such dictionaries (over-complete dictionaries formed by a union of orthonormal bases) do not perform as well as those with a more flexible structure [36]. For example, in SRC-based face recognition, each block contains atoms that describe faces of the same person, and it does not make sense to impose strict orthogonality on each block. Therefore, rather than imposing a strong orthogonality constraint on each block, we propose a dictionary learning algorithm that only minimizes the intra-block coherence. We will elaborate on the effect of our framework on intra-block coherence in Sec. 3.2.4.
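As a quick illustration, the sufficient condition of Eq. (3-6) can be checked numerically once the two coherence values are known. The following is a minimal sketch, assuming the condition exactly as stated above; the function name and inputs are illustrative, and mu_B and mu_S would come from the definitions in Section 3.2.4.

import numpy as np

def block_recovery_condition_holds(n_a, k, mu_B, mu_S):
    """Return True when n_a * (k - 1) * mu_B < 1 - (n_a - 1) * mu_S, as in Eq. (3-6)."""
    return n_a * (k - 1) * mu_B < 1.0 - (n_a - 1) * mu_S

# Example: small blocks with low intra-block coherence satisfy the condition.
print(block_recovery_condition_holds(n_a=5, k=2, mu_B=0.05, mu_S=0.1))  # True

The example makes the qualitative point of this section concrete: lowering mu_S enlarges the right-hand side and therefore the range of block structures for which the relaxation is exact.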


The proposed dictionary learning framework learns the dictionary D by minimizing the objective function given in Eq. (3-20); the third term in Eq. (3-20) measures the mutual coherence within each block of D. The corresponding sparse coding can be either BGSC or R-BGSC. Besides the novel sparse coding algorithms, BGSC and R-BGSC, three specific features distinguish our dictionary learning framework from existing methods:

1. Instead of inter-block coherence, the proposed ICS-DL algorithm presented in Sec. 3.2.4 minimizes the intra-block coherence as one of its main objectives.

2. Our framework does not require assigning a class or a group of data to blocks in the dictionary as in [33]. This allows some blocks of the dictionary to be shared by different classes.

3. The dictionary is trained simultaneously with respect to each group of training samples X_g using our proposed block/group regularized SC algorithms.

We evaluate the proposed methods on classification (supervised) and clustering (unsupervised) problems using well-known datasets. Preliminary results are encouraging, demonstrating the viability and validity of the proposed framework.

3.1.1 Related Work

In this section, I will only summarize the major differences between my work and some of the representative approaches in dictionary learning and sparse representation that have appeared in recent years. Group sparse coding was first introduced in [1]. Elhamifar and Vidal [11] explicitly imposed a block structure on the dictionary for classification. However, none of these approaches investigated the combined framework that incorporates both the group and block structures, nor the effect of combining these two structures on dictionary learning.

Sparse Representation based Classification (SRC) was studied recently in [10, 46]. However, there the dictionary is simply the columns of the training data and there is no emphasis on minimizing the intra-block coherence. Although the work in [33] shares some superficial similarities with our dictionary learning algorithm, the differences are major. First, Ramirez et al. train each block using a different collection of data and, therefore, there is no notion of training the blocks of D simultaneously as in our framework. Because of this, the main objective in [33] is the minimization of inter-block coherence instead of the intra-block coherence. Finally, Sprechmann et al. [38] proposed a sparse coding scheme that is similar to our BGSC using the proximal optimization proposed in [47]. However, they did not propose nor investigate its integration with dictionary learning algorithms, as their work is focused on signal recovery and separation.


3.2 Methods

In this section, we describe the algorithms in our proposed framework. We start with the sparse coding algorithms and work our way towards the full dictionary learning algorithm. We denote scalars with lower-case letters, matrices with upper-case letters, and the i-th block and group of a matrix or vector with Z_[i] and Z_i, respectively.

3.2.1 Theoretical Guarantee

It is important to understand the conditions on D under which our convex relaxations (Eq. 3-3 and 3-5) are equivalent to their original combinatorial programs (Eq. 3-2 and 3-4). In other words, we want to examine the conditions under which our proposed programs can indeed have exact recovery, as their corresponding combinatorial programs do. The conditions for the case when X_g is a single vector were proved in [11]. Using linear algebra, we can convert our programs, where X_g and C_g are matrices, into equivalent programs where X_g and C_g are vectors. The conversion is straightforward and is given in Appendix A. We then prove the equivalence conditions of our programs in a similar way as in [11].

3.2.2 Block/Group Sparse Coding

The program P_{ℓ1,p} in Eq. (3-3) can be cast as an optimization problem that minimizes the objective function

    Q_c(C; X, D) = Σ_g Q_c(C_g; X_g, D) = Σ_g ( ½ ‖X_g − D C_g‖²_F + λ Σ_i ‖C_{g[i]}‖_p ).    (3-7)

For clarity of presentation, we present the optimization steps only for one specific group of data X and its corresponding sparse coefficients C. Eq. (3-7) can be written as

    ½ ‖X − DC‖²_F + λ Σ_i ‖C_[i]‖_p = ½ ‖X − Σ_{i≠r} D_[i] C_[i] − D_[r] C_[r]‖²_F + λ ‖C_[r]‖_p + c,    (3-8)

where c includes the terms that do not depend on C_[r].


When p = 1, this objective function is separable, and iterates of the elements in C_[r] can be solved using a method similar to [1]. When p = 2 (the element-wise ℓ2 norm, i.e., the Frobenius norm), it is only block-wise separable. Computing the gradient of Eq. (3-8) with respect to C_[r], we obtain the following sub-gradient condition:

    −D_[r]^⊤ X + D_[r]^⊤ Σ_{i≠r} D_[i] C_[i] + D_[r]^⊤ D_[r] C_[r] + λ ∂‖C_[r]‖_F ∋ 0.    (3-9)

Let us assume for now that the optimal solution for C_[r] has a non-zero norm ‖C_[r]‖_F > 0. Denoting the first two terms by −N, substituting the positive semi-definite matrix D_[r]^⊤ D_[r] with its eigen-decomposition UΛU^⊤, multiplying both sides of the equation with U^⊤ and using the fact that ∂‖C_[r]‖_F = C_[r]/‖C_[r]‖_F, we have

    U Λ U^⊤ C_[r] + λ C_[r]/‖C_[r]‖_F = N,
    Λ U^⊤ C_[r] + λ U^⊤ C_[r]/‖C_[r]‖_F = U^⊤ N.    (3-10)

Changing variables Y = U^⊤ C_[r] and using the fact that the Frobenius norm is invariant under orthogonal transformations, we have

    Λ Y + λ Y/‖Y‖_F = N̂,    (3-11)

where N̂ = U^⊤ N. Setting σ = ‖Y‖_F and Ŷ = Y/‖Y‖_F, we have

    Ŷ = (σΛ + λI)⁻¹ N̂,   s.t. ‖Ŷ‖_F = 1.    (3-12)

Since Λ is a diagonal matrix, (σΛ + λI)⁻¹ is also a diagonal matrix, with diagonal entries 1/(σλ_i + λ), where λ_i is the i-th eigenvalue of Λ.


Therefore, the constraint ‖Ŷ‖_F = 1 implies that

    Σ_{i,j} N̂²_{i,j} / (σλ_i + λ)² = 1,    (3-13)

where N̂_{i,j} is the (i,j)-th element of the matrix N̂.

We solve for the root of the above one-variable equation w.r.t. σ using standard numerical methods such as Newton's method. Once σ is computed, we can obtain Ŷ and Y using Eqs. (3-12) and (3-11), respectively. The iterate of C_[r] is then computed by projecting Y back to the original domain, i.e., C_[r] = UY.

Now let us revisit the positivity assumption on ‖C_[r]‖_F. When the solution of σ in Eq. (3-13) is not positive, there is no solution for Eq. (3-10), as it contradicts the assumption that σ > 0. In this case, the optimum occurs at C_[r] = 0, because the derivative of ‖C_[r]‖_F does not exist when C_[r] = 0 and our objective function, Eq. (3-8), is convex and bounded from below.

The proof of this claim is straightforward. Let f(x) be a continuous convex function which is bounded from below and differentiable everywhere except at x = x_o. We solve ∂f(x) = 0 for the minimum of f(x). If the solution of ∂f(x) = 0 does not exist, the minimum of f(x) must occur at x = x_o, for otherwise we would find x' such that ∂f(x') = 0.

As we can see from Eq. (3-13), the block sparsity of C depends on the value of λ. The larger λ is, the less likely there exists a feasible solution for σ in Eq. (3-13). On the other hand, when λ = 0, the solution for σ is always positive, and hence no block C_[r] is shrunk to zero. This is analogous to the shrinkage mechanism in the standard Lasso program [7]. When X is a single vector, our BGSC is equivalent to the P_{ℓq/ℓ1} program in [10]. When there is no block structure on D, BGSC is equivalent to the group sparse coding (GSC) in [1].
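To make the block-coordinate iteration above concrete, the following is a minimal numpy sketch of the p = 2 update for one coefficient block C_[r], following Eqs. (3-9) to (3-13). The function and variable names (bgsc_update_block, D_blocks, C_blocks) are illustrative assumptions rather than a released implementation, and the root of Eq. (3-13) is found by bisection instead of Newton's method.

import numpy as np

def bgsc_update_block(X, D_blocks, C_blocks, r, lam, n_iter_bisect=60):
    D_r = D_blocks[r]
    # Residual against all other blocks, then N = D_[r]^T (X - sum_{i!=r} D_[i] C_[i]).
    R = X - sum(D_blocks[i] @ C_blocks[i] for i in range(len(D_blocks)) if i != r)
    N = D_r.T @ R
    # Eigen-decomposition of the PSD matrix D_[r]^T D_[r] = U diag(evals) U^T.
    evals, U = np.linalg.eigh(D_r.T @ D_r)
    N_hat = U.T @ N

    # f(sigma) = sum_{i,j} N_hat_{ij}^2 / (sigma * eval_i + lam)^2, Eq. (3-13).
    def f(sigma):
        denom = (sigma * evals[:, None] + lam) ** 2
        return np.sum(N_hat ** 2 / denom)

    if f(0.0) <= 1.0:
        # No positive root: the optimum is C_[r] = 0 (the shrinkage case).
        return np.zeros_like(C_blocks[r])
    # Bisection for the root of f(sigma) = 1 on (0, hi]; f is decreasing in sigma.
    lo, hi = 0.0, 1.0
    while f(hi) > 1.0:
        hi *= 2.0
        if hi > 1e12:              # numerical safeguard for a degenerate block
            return np.zeros_like(C_blocks[r])
    for _ in range(n_iter_bisect):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 1.0 else (lo, mid)
    sigma = 0.5 * (lo + hi)
    # Y_hat = (sigma*Lambda + lam*I)^{-1} N_hat, Y = sigma*Y_hat, C_[r] = U Y.
    Y_hat = N_hat / (sigma * evals[:, None] + lam)
    return U @ (sigma * Y_hat)

Looping this update over all blocks r and all data groups gives one pass of the BGSC coefficient step used later in the dictionary learning experiments.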


3.2.3 Reconstructed Block/Group Sparse Coding

For clarity of presentation, we again derive the novel R-BGSC algorithm for one group of data. P'_{ℓ1,p} in Eq. (3-5) can be cast as an optimization problem in terms of C_[r] that minimizes

    ½ ‖X − Σ_{i≠r} D_[i] C_[i] − D_[r] C_[r]‖²_F + λ Σ_i ‖D_[i] C_[i]‖_p + c,    (3-14)

where c is a constant that includes the terms that do not depend on C_[r]. The iterate of C_[r] can be derived in a similar fashion as in the previous algorithm.

We now derive the crucial steps of the optimization algorithm for p = 2, as the p = 1 case is straightforward. Similar to the derivation in the previous section, we first assume that the norm ‖D_[r] C_[r]‖_F is positive for the optimal solution of C_[r]. Taking the gradient of the objective function with respect to C_[r] and equating it to zero, we have

    −D_[r]^⊤ X + D_[r]^⊤ Σ_{i≠r} D_[i] C_[i] + D_[r]^⊤ D_[r] C_[r] + λ D_[r]^⊤ D_[r] C_[r]/‖D_[r] C_[r]‖_F = 0.    (3-15)

Now denoting the first two terms by −N and computing the singular value decomposition D_[r] = USV^⊤, we have

    V S² V^⊤ C_[r] + λ V S (S V^⊤ C_[r]) / ‖S V^⊤ C_[r]‖_F = N.    (3-16)

Multiplying both sides of the above equation with V^⊤, and letting Y = S V^⊤ C_[r]/‖S V^⊤ C_[r]‖_F, σ = ‖S V^⊤ C_[r]‖_F, and N̂ = V^⊤ N, we have

    (σS + λS) Y = N̂   ⟹   Y = (σS + λS)⁻¹ N̂,   s.t. ‖Y‖_F = 1.    (3-17)

Using the same method as in Section 3.2.2, σ can be solved first and the iterate of C_[r] can then be computed. Note that when X is a single vector, R-BGSC is equivalent to the P'_{ℓq/ℓ1} program in [10].


3.2.4 Intra-Block Coherence Suppression Dictionary Learning

The intra-block coherence is defined as

    μ_S(D) = max_i max_{p,q ∈ I_i, p≠q} |d_p^⊤ d_q| / (‖d_p‖ ‖d_q‖),    (3-18)

where I_i is the index set of the atoms in block i. The inter-block coherence μ_B is defined as

    μ_B(D) = max_{i≠j} (1/n_a) σ_1(D_[i]^⊤ D_[j]),    (3-19)

where σ_1(·) is the largest singular value and n_a is the size of a block.

As mentioned in the Introduction, it is necessary to have a dictionary updating algorithm that minimizes the intra-block coherence. We therefore propose the following objective function:

    Q_d(D; X, C) = Σ_g ½ ‖X_g − D C_g‖²_F + γ Σ_{k=1}^{|D|} ‖d_k‖_2 + η Σ_b ( Σ_{p,q ∈ I_b, p≠q} (d_p^⊤ d_q)² ) + λ Φ(C),    (3-20)

where Φ is the regularizer term on C (see Eq. 3-14 and 3-8). The first two terms are the same as in the objective function used in [1]; their formulation facilitates the removal of dictionary atoms with low predictive power (given that the data are mean-subtracted). We add the third term to minimize the intra-block coherence.

For the sake of clarity, we derive the update formula required for optimizing the objective function above for one group of data. Again, we first assume the optimal solution for d_r has a non-zero norm. Computing the gradient with respect to d_r and equating it to zero, we have

    −X c_r^⊤ + Σ_{k≠r} d_k c_k c_r^⊤ + d_r c_r c_r^⊤ + γ d_r/‖d_r‖_2 + η Σ_{j ∈ I_b, j≠r} d_j d_j^⊤ d_r = 0,    (3-21)

where c_r is the r-th row of C and d_r is in block b. Note that c_r c_r^⊤ indicates the weight of how much the atom d_r is being used to encode X.

It is clear from the first three terms of the above equation why group-regularized SC algorithms tend to generate blocks with high intra-block coherence. As we can see, the value of d_r depends not only on how much it is being used to encode X (the 1st and 3rd terms) but also on how much the other d_k's are being used to encode X. Since BGSC and R-BGSC minimize the number of blocks used for encoding X, the atoms d_k are likely in the same block as d_r. For example, if the coefficient C of X has only one non-zero block, then the atoms d_r and d_k, which correspond to the non-zero rows of coefficients c in the above equation, are all in the same block. Therefore, updating the atoms using only the first three terms in the above equation results in high intra-block coherence. This justifies adding the intra-block coherence suppressing regularizer term in Eq. (3-20).
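The two coherence measures of Eqs. (3-18) and (3-19) are simple to evaluate for a block-structured dictionary. Below is a minimal numpy sketch; the helper names are illustrative assumptions, `blocks` is a list of the n-by-n_a sub-matrices D_[i], and the 1/n_a scaling follows Eq. (3-19) as stated, using each block's own column count.

import numpy as np

def intra_block_coherence(blocks):
    """mu_S: largest normalized inner product between distinct atoms of the same block."""
    mu = 0.0
    for B in blocks:
        Bn = B / np.linalg.norm(B, axis=0, keepdims=True)   # unit-norm atoms
        G = np.abs(Bn.T @ Bn)
        np.fill_diagonal(G, 0.0)                            # exclude p == q
        mu = max(mu, float(G.max()))
    return mu

def inter_block_coherence(blocks):
    """mu_B: largest singular value of D_[i]^T D_[j] over i != j, scaled by the block size."""
    mu = 0.0
    for i, Bi in enumerate(blocks):
        for j, Bj in enumerate(blocks):
            if i == j:
                continue
            s1 = np.linalg.norm(Bi.T @ Bj, ord=2)            # spectral norm = sigma_1
            mu = max(mu, s1 / Bj.shape[1])
    return mu

These two quantities are what Fig. 3-3 tracks across dictionary learning iterations for the trained dictionaries.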


To the best of our knowledge, there is no work discussing how to group the training samples. Intuitively, one would split a class of training data into multiple similar groups using techniques such as K-means. However, this might put all the in-class outliers (e.g., badly written '9's that look like a '7') into one group, and hence allow them to act as one different class and to be used to train the dictionary blocks corresponding to the wrong classes. From our empirical observations, it is better for each group of data to have similar variability as the whole class. This forces the in-class outliers to be regularized by the inliers of the same class.

Continuing from Eq. (3-21), replacing the first two terms with −ψ_i, the scalar c_r c_r^⊤ with ψ_t, and Σ_j d_j d_j^⊤ with Ψ_r, Eq. (3-21) becomes

    ψ_t d_r + γ d_r/‖d_r‖_2 + η Ψ_r d_r = ψ_i
    ⟹ ψ_t U^⊤ d_r + γ U^⊤ d_r/‖U^⊤ d_r‖_2 + η Λ U^⊤ d_r = U^⊤ ψ_i,    (3-22)

where UΛU^⊤ is the eigen-decomposition of Ψ_r; Λ is a diagonal matrix containing only the non-zero eigenvalues of Ψ_r and U contains the corresponding eigenvectors. Denoting U^⊤ d_r/‖U^⊤ d_r‖_2 by y, ‖U^⊤ d_r‖_2 by σ, and U^⊤ ψ_i by ψ̃_i, Eq. (3-22) becomes

    σ ψ_t y + γ y + σ η Λ y = ψ̃_i   ⟹   y = (σ ψ_t I + γ I + σ η Λ)⁻¹ ψ̃_i,   s.t. ‖y‖_2 = 1.    (3-23)

We can use the same methods as in the previous sections to solve for the iterate of d_r, and if the solution σ is not positive, we set d_r = 0.

Note that it is not uncommon to add a post-processing step to make the atoms in D have unit norm, or to simply require ‖d_r‖_2 = 1. This changes the iterate of d_r to d_r = (ψ_t I + η Ψ_r)⁻¹ ψ_i, since ‖d_r‖_2 = 1, and therefore makes the whole algorithm much more efficient, as computing the eigen-decomposition of the typically large matrix Ψ_r can now be avoided.
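The following is a minimal numpy sketch of the simplified unit-norm atom update d_r = (ψ_t I + η Ψ_r)⁻¹ ψ_i just described. The names (icsdl_update_atom, block_index, the small ridge eps) are illustrative assumptions; consistent with the simplified form above, the γ term is dropped and the atom is renormalized to unit length at the end.

import numpy as np

def icsdl_update_atom(X, D, C, r, block_index, eta, eps=1e-12):
    n = D.shape[0]
    c_r = C[r, :]                                   # r-th row of C
    psi_t = float(c_r @ c_r)                        # weight of atom d_r in encoding X
    # psi_i = X c_r^T - sum_{k != r} d_k c_k c_r^T  (negated first two terms of Eq. 3-21)
    psi_i = X @ c_r - (D @ (C @ c_r) - D[:, r] * psi_t)
    # Psi_r = sum over the other atoms j in the same block of d_j d_j^T
    same_block = [j for j in range(D.shape[1])
                  if block_index[j] == block_index[r] and j != r]
    Psi_r = D[:, same_block] @ D[:, same_block].T if same_block else np.zeros((n, n))
    # Small ridge eps*I only for numerical safety when psi_t is (near) zero.
    d_new = np.linalg.solve(psi_t * np.eye(n) + eta * Psi_r + eps * np.eye(n), psi_i)
    norm = np.linalg.norm(d_new)
    return d_new / norm if norm > eps else D[:, r]  # keep the old atom if degenerate

Alternating this atom update with the BGSC (or R-BGSC) coefficient update sketched earlier gives one iteration of the ICS-DL loop used in the experiments below.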


3.3 Experiments and Discussions

3.3.1 Hand-Written Digit Recognition

We used the USPS dataset [17], which contains a total of 9,298 16-by-16 images of handwritten digits. We vectorized the images and normalized the vectors to have unit ℓ2 norm. We collected 15 groups of data from each digit, where each group contained 50 randomly chosen images from the same class.

The experiment was conducted as follows:

1. Generate a random dictionary D with n_b blocks, where each block contains n_a columns of atoms (a total of n_b × n_a columns).

2. Iteratively compute the coefficients using BGSC and update the dictionary using the ICS-DL algorithm.

3. Use the coefficients of the training data to train 10 one-vs-all linear SVMs [3].

4. Compute the sparse coefficients of the test samples using either BGSC or R-BGSC, and use the SVMs to classify the test samples from these coefficients.

Table 3-1 demonstrates the impact of the dictionary's block structure on the error rates. The parameters are η = 200, λ_train = 0.6 (λ_train varies slightly with respect to n_a), and λ_test = 0.2. For the experiment in the last column of Table 3-1, we assign two blocks to each digit. The results show that the error rates are similar when the number of blocks n_b is greater than 10, even though the number of classes of this dataset is 10.


Figure 3-2. Visualization of the absolute values of the sparse coefficients of the training samples. Each column contains 15 groups. Gray pixels correspond to non-zero coefficients; darker color represents larger values.

The reason is that there exists some variability within each class and mutual similarity between images of different classes. In fact, as shown in Fig. 3-2, the sparse coefficients of most of the training data have 3 to 6 active (non-zero) blocks when n_b = 20.

The last column of Table 3-1 shows that the hard assignment of blocks to classes results in a higher error rate, even though the size of the dictionary is twice as large as in the first three experiments of Table 3-1. As mentioned in the Introduction, we did not assign blocks to classes; the rationale is that we want to let data with larger variability use more blocks for encoding. Moreover, we allow data from different classes to share mutual blocks. Fig. 3-2 illustrates the coefficients of the training data. We can see that '7' and '9' share two blocks of the dictionary due to their similarity. However, they each have an exclusive block with large coefficients (darker in color) that allows them to encode the difference.

Next we demonstrate the effect of the value of η in ICS-DL on classification rates. The parameters are λ_train = 0.4 and (n_b, n_a) = (–, 25). When η = 0, our ICS-DL algorithm does not suppress intra-block coherence and is hence equivalent to the dictionary learning algorithm in [1]. We used BGSC to compute the coefficients during training.


Figure 3-3. Intra-block coherence (solid) and error rates (dotted) of two dictionaries (red for η = 200 and blue for η = 0). Error rates of the first 30 iterations are not shown.

Table 3-1. Classification error (%) with different structures on D. n_b: number of blocks in D; n_a: number of atoms in each block.
(n_b, n_a)   (–, 25)   (–, 12)   (–, 50)   (–, 50)   (–, 50)†
Error (%)    2.53      3.42      6.22      2.95      6.52
†: Each digit is assigned to two blocks of the dictionary.

During testing, we used either BGSC or R-BGSC to compute the coefficients of the test samples. λ_test was varied between 0.15 and 0.35 and the best result is reported in Table 3-2. We stopped the training roughly after 200 iterations, when the dictionary update did not change much. The results in Table 3-2 suggest that suppressing the intra-block coherence can indeed improve the performance. However, as η increases further, the error rate increases. In the extreme case, when imposing strict orthogonality on the blocks using UOB-DL, the error rate increased to 4.27 (see Table 3-3). These results provide empirical support for our claim of not using a strict orthogonality constraint.

Note that when η = 0, our result is very close to that of SISF-DL [33] (see Table 3-3). However, our ICS-DL algorithm does not impose any inter-block orthogonality constraint on the dictionary as SISF-DL does. This is probably because our framework does not hard-assign classes to blocks of the dictionary and because we impose a group structure on the data.


Figure 3-4. Error rates (%) on the USPS dataset under five different scenarios. The scenarios differ in how the training samples are organized to compute the coefficients and in which of the proposed SC algorithms is used. The first column in the legend (separated by |) indicates how the coefficients of the training samples are computed, in groups (G) or individually (I). The second column indicates which SC algorithm is used to compute the coefficients of the training samples. The third column indicates which SC algorithm is used to compute the coefficients of the test samples (individually).

Table 3-2. Classification error (%) with different η in ICS-DL.
η          0      100    200    300    400    600    800
Error (%)  4.02   3.47   2.58   2.43   2.26   2.42   3.12

To further demonstrate the intra-block coherence suppressing property of our ICS-DL algorithm, we plot the intra-block coherence values of the dictionaries trained with η = 0 and η = 200, respectively, in Fig. 3-3. We also provide the error rates every 4 iterations from the 30th iteration onward. Solid and dotted lines indicate the coherence and error, respectively. The red solid line demonstrates that our ICS-DL method keeps the intra-block coherence at a low value. On the contrary, without the intra-block coherence suppression term, the blue solid line shows that the coherence value becomes comparably large with an increasing number of iterations. The blue dotted line shows that its associated error rate even increases between iterations 40 and 60, which implies that over-fitting occurs within some blocks.


Once we have a trained dictionary, we use the coefficients of the training samples to train 10 linear SVMs. We can use the already available coefficients computed during the training phase; these coefficients are computed as a group. Another way to obtain coefficients of the training samples is to re-compute them individually. Fig. 3-4 shows the error rates of five different scenarios. The dictionary was trained with η = 400, λ_train = 0.4, (n_b, n_a) = (–, 25), and 150 iterations. The results in Fig. 3-4 show that R-BGSC generally performed slightly better than BGSC, especially in scenarios 1 and 2 of Fig. 3-4. However, the result from scenario 3 with λ_test = 0.25 achieves the best error rate at 2.26%, 0.02% better than that of scenario 4 with λ_test = 0.30.

Finally, we compare our results with other state-of-the-art results using dictionary learning algorithms [28, 33] in the top row of Table 3-3. We also compare with UOB-DL [23], which imposes a strict orthogonality constraint on the blocks. The results show that our algorithms outperformed the other dictionary learning methods, even the one specially tailored for hand-written digit recognition [16]. Although Table 3-2 suggests that suppressing the intra-block coherence of D improves the classification performance, imposing strict orthogonality on the blocks does not result in any improvement.

Table 3-3. Comparison of classification error rates (%) on the USPS and MNIST datasets with recently published approaches. The results of SISF-DL, SDL-DL, and TDK-SVM are taken from [33], [28], and [16], respectively.
           BGSC   R-BGSC   SISF-DL   SDL-DL   UOB-DL   TDK
USPS       2.26   2.28     3.98      3.54     4.27     2.40
MNIST      2.32   –        1.26      1.05     –        –


We also applied our framework to the MNIST dataset. However, due to the size and complexity of this dataset, we were not able to fully exploit different dictionary structures and parameters to obtain a more competitive result. The parameters used to obtain the results in Table 3-3 are (n_b, n_a) = (–, 40) (our dictionary size is 5 times smaller than what was used in SISF-DL), η = 500, λ_train = 1.20, and λ_test = 0.4. We used 300 groups, each containing 100 data samples.

3.3.2 Group Regularized Face Classification

Image classification using the SRC framework is known to be sensitive to variations such as misalignment or lighting conditions between test and training samples, and a small amount of variation can often negatively affect the performance. One straightforward solution would be to include all possible variations in the dictionary [22]. However, this would increase the computational cost drastically. Therefore, applying perturbations to the test image on-line offers a more practical solution. The perturbations can be transformations that compensate for spatial misalignment. If the normal vectors of the test face image are provided or can be computed [12], the perturbations can be different lighting conditions. We propose an SRC framework here for face recognition that can alleviate the negative effect of variations between the training and test samples.

In this experiment, we used the cropped Extended Yale Face Database B [21], which contains images of 38 persons under different illumination conditions. To simulate large variations between training and test samples, we used the images whose azimuth and elevation angles of the light source are within 35° as the training samples. As shown in the top row of Fig. 3-5(A), these images are well-lit and contain little to no shadow. We used these samples to simulate well-prepared, laboratory-grade data.

We kept the rest of the dataset, which contains a larger amount of shadow, as the test samples (bottom row of Fig. 3-5(A)). We used the test samples to simulate poorly acquired data or poor perturbations of one single test sample. For example, we may have one or a few poorly acquired test images that differ a lot from the training samples in D.


We estimate the normal of each pixel in the image and generate many illumination conditions of this image that are closer to those in D. However, due to the poor estimation of the normals, not all perturbations have good quality. In this experiment, we want to demonstrate that combining these perturbations as a group can improve the classification performance.

The experiment was conducted as follows:

1. Project the samples down to R^m using PCA.

2. Use the training samples as the atoms of D. D has n_b = 38 blocks, where each block contains n_a = 24 or 23 atoms (some categories have missing samples).

3. Randomly pick n_g test samples from one class to form a group X.

4. Compute the coefficients C of X using BGSC, R-BGSC, and GSC [1].

5. Compute the class label of each column of X individually using label(x) = argmin_i ‖x − D_[i] c_[i]‖²_2, where c is the sparse coefficient of x (a small sketch of this step is given at the end of this setup).

Besides the three methods above, we also used the framework in [46] (WSC) and two algorithms, P_{ℓ2/ℓ1} (BSC) and P'_{ℓ2/ℓ1} (B'SC), from [10]. Since these methods do not impose a group structure on the data, we computed the coefficients of the test samples individually.

The rationale behind using the group structure is that as long as parts of some images (e.g., the 1st, 3rd and 4th images in the bottom row of Fig. 3-5(A)) are similar to a few atoms/blocks in D, they will generate high responses with respect to these atoms. This enables an active set containing these atoms, and it further forces the other images in X to use these atoms for encoding. Therefore, it reduces the chance that other, severely shadowed images are encoded using other, irrelevant blocks.
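The labeling rule of step 5 compares block-wise reconstruction residuals. A minimal sketch, assuming illustrative names (residual_label, block_slices mapping each block to the rows of the coefficient vector it owns):

import numpy as np

def residual_label(x, c, blocks, block_slices):
    """Return argmin_i ||x - D_[i] c_[i]||_2^2; blocks[i] is D_[i]."""
    residuals = [np.linalg.norm(x - blocks[i] @ c[block_slices[i]]) ** 2
                 for i in range(len(blocks))]
    return int(np.argmin(residuals))

Note that even though the coefficients c are computed jointly for the whole group X, the label is still assigned per column, so the group structure only influences which blocks become active.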


The classification rates of the six methods are shown in Fig. 3-5(B). The results indicate that the group regularized methods are generally better than the others. Our novel R-BGSC significantly outperformed the others by roughly 20%. Note that the classification rates of BSC and WSC in Fig. 3-5(B) are much worse than those listed in [10] and [46], respectively. This is because, in their experiments, they randomly chose half of the dataset as training samples (used as the dictionary) and the other half as test samples. Therefore their dictionaries are almost two times larger than ours, and the variability between training and test samples is minimized by the random selection.

Figure 3-5. (A): Top row: well-lit faces in the training set. Bottom: test samples containing a large amount of shadow. (B): Classification rates of the 6 methods for different group sizes n_g, with m = 600. The λ for BGSC, R-BGSC, GSC, BSC, B'SC, and WSC are 0.2, 0.2, 0.05, 0.1, 0.1, and 0.02, respectively.


3.3.3 Unsupervised Texture Clustering

We applied our framework to unsupervised texture clustering, using images from the Brodatz dataset (Fig. 3-6(A)). The goal is to separate the two textures without any other prior information. The experiment was conducted as follows:

1. Divide the test image into 32-by-32 overlapping regions with a spacing of 16 pixels.

2. Randomly sample 50 21-by-21 overlapping patches within each region and combine them into one group.

3. Initialize a random dictionary with two blocks, each of which contains 230 atoms (i.e., D = [D_[1], D_[2]]).

4. Iteratively update the coefficients and the dictionary using BGSC and ICS-DL, respectively.

5. Once the dictionary is trained, we also have the sparse coefficient c of each patch x.

6. Create two blank score maps M_[1], M_[2] for the image.

7. For each c, compute ‖c_[i]‖_1, i = 1, 2, and cast these values onto the pixels covered by the corresponding patch x in M_[i], in a Hough-voting fashion.

8. Classify the image into two regions by comparing the values in M_[1] and M_[2] pixel by pixel (a small sketch of steps 6-8 is given after the discussion below).

We set λ_train = 4,000 (such a large value is used because we did not normalize the intensity values, while our ICS-DL algorithm normalizes the atoms of D) and η = 200. Because of the size and spacing of the regions (steps 1 and 2), some patches will contain both textures and some groups will contain patches of both textures. Due to the repetitive nature of texture, patches within each group are quite similar; hence, assigning neighboring patches to one group is justifiable. Fig. 3-6(B) shows some selected atoms from the trained dictionary. The top two and bottom two rows are selected from the 1st and 2nd blocks of D, respectively.
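The score-map voting of steps 6-8 is a simple accumulation followed by a pixel-wise argmax. A minimal sketch, with illustrative names (texture_score_maps, patch_boxes, block_slices):

import numpy as np

def texture_score_maps(image_shape, patch_boxes, patch_coeffs, block_slices):
    """patch_boxes[k] = (top, left, size); patch_coeffs[k] is the coefficient vector c."""
    maps = np.zeros((2,) + image_shape)
    for (top, left, size), c in zip(patch_boxes, patch_coeffs):
        for i in range(2):
            energy = np.abs(c[block_slices[i]]).sum()       # ||c_[i]||_1, step 7
            maps[i, top:top + size, left:left + size] += energy
    return maps

def segment(maps):
    # Pixel-wise comparison of the two score maps, step 8.
    return np.argmax(maps, axis=0)

Because every pixel receives votes from all the patches covering it, isolated patches that straddle the texture boundary are outvoted by their neighbors, which is what makes the simple pixel-wise comparison workable.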


Fig. 3-6(C) shows the energy map M_[2]. Fig. 3-6(D) shows the resulting segmentation; the blue line is the ground-truth segmentation.

The reason this unsupervised approach works is that, although some patches contain both textures, the majority of them contain only one type of texture. In the case of a more complicated mixture of textures, a more sophisticated structure on D is required (e.g., more blocks in D) to represent the patches that contain multiple textures.

Figure 3-6. (A): Texture image. (B): Selected atoms from the trained dictionary. (C): Energy map of the second block. (D): Segmentation result and the ground truth (blue line).


CHAPTER 4
CONCLUSION

In this work, I have presented two novel sparse coding frameworks. In Chapter 2, I presented the novel affine constrained group sparse coding approach for sparse representation-based classification, with the aim of extending the current SRC framework to classification problems with multiple inputs. I also presented a form of sparse recovery result. Based on this result, I argued that, theoretically, the classification performance of the proposed method should be as good as, if not better than, that of the existing SRC-based framework. I evaluated the proposed approach using three experiments involving face recognition, texture classification, and impostor (outlier) identification. The preliminary experiments demonstrate the effectiveness as well as the efficiency of the proposed approach.

In Chapter 3, I proposed a novel dictionary learning framework that includes two novel block/group regularized sparse coding algorithms and one novel dictionary learning algorithm. Theoretical equivalences between our sparse coding programs and their original combinatorial programs were proven. This indicates that the proposed sparse coding algorithms can indeed produce sparse solutions when they exist. Effective optimization procedures have been developed and tested for these algorithms. Experimental comparisons with several state-of-the-art dictionary learning methods are favorable; in particular, for the hand-written digit recognition experiment, the proposed framework outperformed these state-of-the-art dictionary learning algorithms.


APPENDIX
PROOF OF THE THEORETICAL GUARANTEE

Equivalence of Group Sparse Coding

In a noiseless setting, the Group Sparse Coding (GSC) algorithm proposed in [1] is the following optimization program:

    P_{ℓp,ℓ1}:  min Σ_r ‖C^r‖_p   such that  X = DC,    (A-1)

where p = 1, 2 and C^r is the r-th row of C. The optimization program in Eq. (A-1) is the relaxed version of Eq. (A-2):

    P_{ℓp,ℓ0}:  min Σ_r I(‖C^r‖_p > 0)   such that  X = DC.    (A-2)

We want to find the condition under which the two programs in Eq. (A-1) and (A-2) are equivalent. First, let us assume X ∈ R^{n×s}, D ∈ R^{n×m}, and C ∈ R^{m×s}. By stacking the columns of the matrix X into a vector x, X = DC can be rewritten as

    x = [X_1; X_2; …; X_s] = blockdiag(D, D, …, D) [C_1; C_2; …; C_s] = D̃ c̃,    (A-3)

where X_i and C_i are the i-th columns of X and C, respectively. We further rearrange c̃ such that the elements in the same row of C lie in the same block of the new vector c', i.e.,

    c' = (C^1_1, C^1_2, …, C^1_s, C^2_1, …, C^2_s, …, C^m_1, …, C^m_s)^⊤,    (A-4)

where C^i_j is the element in the i-th row and j-th column of C. To have x in Eq. (A-3) remain the same after this rearrangement, D̃ must be changed accordingly.


Figure A-1. The equivalence between X = DC and x = D'c'. The i-th row of C corresponds to the i-th block of c' (shaded area). A block in D' is the Kronecker product of an identity matrix with its corresponding column in D.

The new dictionary is

    D' = [ I_s ⊗ d_1, I_s ⊗ d_2, …, I_s ⊗ d_m ],    (A-5)

where d_i is the i-th column of D and ⊗ denotes the Kronecker product. The new dictionary D' is composed of m blocks, where the i-th block is the Kronecker product of an identity matrix of size s and the i-th column of D. After this rearrangement, x ∈ R^{ns}, D' ∈ R^{ns×ms}, and c' ∈ R^{ms}. The equivalence between X = DC and x = D'c' is illustrated in Figure A-1: a row in C corresponds to a block in c', and a block in D' is simply the Kronecker product of an identity matrix with a column in D.

Through this rearrangement of the equation, the program in Eq. (A-2) is identical to

    P'_{ℓp,ℓ0}:  min Σ_{i=1}^m I(‖c'_[i]‖_p > 0)   s.t.  x = D'c',    (A-6)

where c'_[i] is the i-th block in c'. Likewise, the program in Eq. (A-1) is identical to

    P'_{ℓp,ℓ1}:  min Σ_{i=1}^m ‖c'_[i]‖_p   s.t.  x = D'c',    (A-7)

which is exactly the relaxed version of Eq. (A-6).
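The rearrangement of Eqs. (A-3) to (A-5) is easy to verify numerically. A minimal numpy check, with arbitrary dimensions and no assumptions beyond the construction described above:

import numpy as np

rng = np.random.default_rng(0)
n, m, s = 6, 4, 3
D = rng.standard_normal((n, m))
C = rng.standard_normal((m, s))
X = D @ C

x = X.T.reshape(-1)                 # columns of X stacked into one vector
c_prime = C.reshape(-1)             # rows of C stacked: (C^1_1..C^1_s, C^2_1..C^2_s, ...)
D_prime = np.hstack([np.kron(np.eye(s), D[:, [i]]) for i in range(m)])

# x = D'c' holds exactly, confirming that row-sparsity of C equals block-sparsity of c'.
assert np.allclose(D_prime @ c_prime, x)
print("x = D'c' holds:", np.allclose(D_prime @ c_prime, x))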


Under the assumptions that (a) each block of D' has linearly independent columns and (b) the subspaces spanned by the blocks are disjoint, [13] shows that P'_{ℓp,ℓ1} is equivalent to P'_{ℓp,ℓ0} for p = 2 if

    (k − 1) μ_S < 1,    (A-8)

where k indicates that c' is k-block-sparse and is a unique solution, and μ_S is the mutual subspace incoherence of D', defined as

    μ_S = max_{i≠j} max_{v ∈ S_i, w ∈ S_j} |v^⊤ w| / (‖v‖_2 ‖w‖_2),    (A-9)

where S_i = span(D'_[i]). Due to the structure of D', each of its blocks has linearly independent columns. Moreover, if there are no redundant columns in D, then the subspaces spanned by the blocks of D' are disjoint.

Since the columns of each block in D' are the Kronecker product of an identity matrix with their corresponding column in D, the mutual subspace incoherence of D' is actually the mutual coherence of D, which is defined as

    μ = max_{i≠j} |d_i^⊤ d_j| / (‖d_i‖_2 ‖d_j‖_2),    (A-10)

where d_i denotes the i-th column of D. The proof is straightforward. Given two blocks D'_[i] and D'_[j] and the subspaces S_i and S_j spanned by them, their mutual incoherence is max_{v ∈ S_i, w ∈ S_j} |v^⊤ w|/(‖v‖_2 ‖w‖_2), which, due to the special structure of D', is equal to |d_i^⊤ d_j|/(‖d_i‖_2 ‖d_j‖_2).

We have also shown that the programs P_{ℓp,ℓ1} and P_{ℓp,ℓ0} are equivalent to the programs P'_{ℓp,ℓ1} and P'_{ℓp,ℓ0}, respectively. Using the proof in [13], we can conclude that when the solution C of Eq. (A-1) is k-row-sparse and unique, P_{ℓ2,ℓ1} in Eq. (A-1) is equivalent to P_{ℓ2,ℓ0} in Eq. (A-2) if

    (k − 1) μ < 1    (A-11)

and there are no redundant columns in D.


In [10], the authors further generalized the condition for p ≥ 1. Theorem 1 in [10] gives the condition under which P'_{ℓp,ℓ1} is equivalent to P'_{ℓp,ℓ0}. Before we state the condition, we first introduce two constants that were defined in [10].

Definition 1. Given a dictionary D, define δ_p as the smallest constant such that for every i there exists a submatrix D_[i] ∈ R^{n×d_i} such that for every c_[i] we have

    (1 − δ_p) ‖c_[i]‖²_p ≤ ‖D_[i] c_[i]‖²_2 ≤ (1 + δ_p) ‖c_[i]‖²_p    (A-12)

for p ≥ 1. δ_p is the best block p-restricted isometry constant. It characterizes the best p-restricted isometry property among all submatrices D_[i] of D.

Definition 2. Given a dictionary D, define ε_p as the smallest constant such that for every i and c_[i] we have

    ‖D_[i] c_[i]‖²_p ≤ ε_p ‖c_[i]‖²_p    (A-13)

for p ≥ 1. ε_p is the upper block p-restricted isometry constant of a dictionary.

Due to the structure of D' in Eq. (A-5), and since c'_[i] is simply the i-th row of C in Eq. (A-2) or (A-1), we have

    ‖D'_[i] c'_[i]‖²_2 = ‖(c'_{(i−1)s+1} + ⋯ + c'_{is}) d_i‖²_2 = ‖Σ_{j=1}^s C^i_j d_i‖²_2,    (A-14)

where C^i_j is the element in the i-th row and j-th column of C. Likewise, ‖D'_[i] c'_[i]‖²_p = ‖Σ_{j=1}^s C^i_j d_i‖²_p. Therefore we can define the following two constants:


Definition 3. Given a dictionary D = [d_1 ⋯ d_m], define the best row p-restricted isometry constant δ'_p as the smallest constant such that for every C^i we have

    (1 − δ'_p) ‖C^i‖²_p ≤ ‖Σ_{j=1}^s C^i_j d_i‖²_2 ≤ (1 + δ'_p) ‖C^i‖²_p    (A-15)

for p ≥ 1.

Definition 4. Given a dictionary D = [d_1 ⋯ d_m], define the upper row p-restricted isometry constant ε'_p as the smallest constant such that for every C^i we have

    ‖Σ_{j=1}^s C^i_j d_i‖²_p ≤ ε'_p ‖C^i‖²_p    (A-16)

for p ≥ 1.

As shown previously in this section, δ'_p and ε'_p are the same as δ_p and ε_p, respectively, when the dictionary D in Definitions 3 and 4 is [d_1 ⋯ d_m] and the dictionary in Definitions 1 and 2 is structured in the way shown in Eq. (A-5).

From Definitions 3 and 4 and Theorem 1 in [10], we can conclude that for signals that have a unique k-row-sparse representation in D, the solution of the optimization program P_{ℓp,ℓ1} is equivalent to that of P_{ℓp,ℓ0} if

    k √( ε'_p / (1 + δ'_p) ) + k − 1 < 1 − δ'_p / (1 + ε'_p).    (A-17)


Equivalence of the Proposed BGSC Algorithm

Our proposed block/group sparse coding algorithm assumes a block structure in the dictionary D, i.e., D = [D_[1], …, D_[L]], where L is the number of blocks. The coefficients corresponding to the i-th block of the dictionary are denoted C_[i]. The algorithm solves the following optimization program:

    P_{ℓ1,p}:  min Σ_{i=1}^L ‖C_[i]‖_p   s.t.  X = DC = [D_[1] ⋯ D_[L]] [C_[1]; ⋮; C_[L]],    (A-18)

where L ≤ m is the number of blocks in D, ‖C_[i]‖_p is the ℓp-norm of the vectorized matrix, and p = 1, 2. The optimization program above is a relaxed version of

    P_{ℓ0,p}:  min Σ_{i=1}^L I(‖C_[i]‖_p > 0)   s.t.  X = DC.    (A-19)

The dictionary here consists of L blocks: D_[i] ∈ R^{n×|I_i|}, where I_i is an index set containing the indices of the columns in block i, and C_[i] contains the {I_i}-th rows of C.

Again we can rearrange Eqs. (A-19) and (A-18) so that they have the same structure as Eqs. (A-6) and (A-7), respectively. The difference is that the number of blocks of D' is now L rather than m (the number of columns of D in the previous section), since we impose a block structure on D. A block in D' consists of the Kronecker products of the identity matrix with the columns in the corresponding block of D. For example, as illustrated in Figure A-2, assuming D_[i] = [d_3 d_4 d_5], then D'_[i] is equal to [I_s ⊗ d_3, I_s ⊗ d_4, I_s ⊗ d_5]. The coefficients corresponding to this D_[i] form a matrix C_[i] = [C^3; C^4; C^5] of size 3-by-s, and the corresponding c'_[i] is equal to [C^3 C^4 C^5]^⊤ of size 3s-by-1. Since the norm on the blocks of the coefficient matrices is the element-wise norm, Eqs. (A-18) and (A-19) are equivalent to Eqs. (A-7) and (A-6), respectively.


Figure A-2. The equivalence between (D_[i], C_[i]) and (D'_[i], c'_[i]).

The proof of the condition under which P_{ℓ0,p} is equivalent to P_{ℓ1,p} can be carried out in the same fashion as in the previous section. The two constants that are equivalent to those defined in Definitions 3 and 4 are defined as follows:

Definition 5. Assume a dictionary D = [D_[1] ⋯ D_[L]] = [d_1 ⋯ d_m] of L blocks, and let C_[i] be the rows in C corresponding to the i-th block of D. I_i is the index set that contains the indices of the columns of D belonging to the i-th block. Define the best block/row p-restricted isometry constant δ̃_p as the smallest constant such that for every i there exists a submatrix D_[i] = [d_{I_i}] such that for every C_[i] we have

    (1 − δ̃_p) ‖C_[i]‖²_p ≤ ‖Σ_{r ∈ I_i} Σ_{j=1}^s C^r_j d_r‖²_2 ≤ (1 + δ̃_p) ‖C_[i]‖²_p.    (A-20)

The second term in the above equation, ‖Σ_{r ∈ I_i} Σ_{j=1}^s C^r_j d_r‖²_2, is equal to the second term, ‖D_[i] c_[i]‖²_2, in Eq. (A-12) if D = [I_s ⊗ d_1, …, I_s ⊗ d_m] and c = [C^1 ⋯ C^m]^⊤. Likewise, since the norm used in the first and third terms of the above equation is the element-wise matrix norm, ‖C_[i]‖²_p is equal to ‖c_[i]‖²_p in Eq. (A-12). Therefore δ̃_p is equal to δ_p in Eq. (A-12).

Definition 6. Given a dictionary D = [D_[1] ⋯ D_[L]] = [d_1 ⋯ d_m] of L blocks, define the upper block/row p-restricted isometry constant ε̃_p as the smallest constant such that for every C_[i] we have

    ‖Σ_{r ∈ I_i} Σ_{j=1}^s C^r_j d_r‖²_p ≤ ε̃_p ‖C_[i]‖²_p.    (A-21)


Using the same method, we can show that ε̃_p is equal to ε_p in Eq. (A-13). From Definitions 5 and 6, the equivalence between (P_{ℓ0,p}, P_{ℓ1,p}) and (P'_{ℓ0,p}, P'_{ℓ1,p}), respectively, and Theorem 1 in [10], we can draw the following conclusion: for signals that have a unique k-block/row-sparse representation in D, the solution of the optimization program P_{ℓ1,p} is equivalent to that of P_{ℓ0,p} if

    k √( ε̃_p / (1 + δ̃_p) ) + k − 1 < 1 − δ̃_p / (1 + ε̃_p).    (A-22)


REFERENCES

[1] S. Bengio, F. Pereira, Y. Singer, and D. Strelow. Group sparse coding. Advances in Neural Information Processing Systems, 22:82–89, 2009.
[2] J. Cai, H. Ji, C. Liu, and Z. Shen. Blind motion deblurring from a single image using sparse approximation. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 104–111. IEEE, 2009.
[3] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol., 2:27:1–27:27, 2011.
[4] S. Chen, D. Donoho, and M. Saunders. Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20:33–61, 1999.
[5] D. Donoho. Compressed sensing. Information Theory, IEEE Transactions on, 52:1289–1306, 2006.
[6] A. Dremeau and C. Herzet. An EM-algorithm approach for the design of orthonormal bases adapted to sparse representations. In ICASSP, 2010, pages 2046–2049. IEEE, 2010.
[7] M. Elad. Sparse and Redundant Representations. Springer Verlag, 2010.
[8] M. Elad, M. Figueiredo, and Y. Ma. On the role of sparse and redundant representations in image processing. Proceedings of the IEEE, 98:972–982, 2010.
[9] Y. Eldar, P. Kuppinger, and H. Bolcskei. Block-sparse signals: Uncertainty relations and efficient recovery. Signal Process., IEEE Trans. on, 58:3042–3054, 2010.
[10] E. Elhamifar and R. Vidal. Robust classification using structured sparse representation. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 1873–1879. IEEE, 2011.
[11] E. Elhamifar and R. Vidal. Block-sparse recovery via convex optimization. IEEE Transactions on Signal Processing, 60:4094–4107, 2012.
[12] G. Fyffe, X. Yu, and P. Debevec. Single-shot photometric stereo by spectral multiplexing. pages 1–6, 2011.
[13] A. Ganesh, Z. Zhou, and Y. Ma. Separation of a subspace-sparse signal: Algorithms and conditions. pages 3141–3144, 2009.
[14] S. Gao, L. Chia, and I. Tsang. Multi-layer group sparse coding for concurrent image classification and annotation. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 2809–2816. IEEE, 2011.
[15] I. Gorodnitsky and B. Rao. Sparse signal reconstruction from limited data using FOCUSS: A re-weighted minimum norm algorithm. Signal Processing, IEEE Transactions on, 45:600–616, 1997.


[16] B. Haasdonk and D. Keysers. Tangent distance kernels for support vector machines. In ICPR, 2002, volume 2, pages 864–868. IEEE, 2002.
[17] J. Hull. A database for handwritten text recognition research. Pattern Anal. Mach. Intell., IEEE Trans. on, 16:550–554, 1994.
[18] R. Jenatton, J. Mairal, G. Obozinski, and F. Bach. Proximal methods for sparse hierarchical dictionary learning. In International Conference on Machine Learning (ICML), 2010.
[19] R. Jenatton, G. Obozinski, and F. Bach. Structured sparse principal component analysis. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2010.
[20] H. Lee, A. Battle, R. Raina, and A. Ng. Efficient sparse coding algorithms. Advances in Neural Information Processing Systems, 19:801, 2007.
[21] K. Lee, J. Ho, and D. Kriegman. Acquiring linear subspaces for face recognition under variable lighting. IEEE Trans. Pattern Anal. Mach. Intelligence, 27:684–698, 2005.
[22] K. Lee, J. Ho, and D. Kriegman. Acquiring linear subspaces for face recognition under variable lighting. Pattern Anal. Mach. Intell., IEEE Trans. on, 27:684–698, 2005.
[23] S. Lesage, R. Gribonval, F. Bimbot, and L. Benaroya. Learning unions of orthonormal bases with thresholded SVD. In ICASSP '05, volume 5, pages v–293. IEEE, 2005.
[24] X. Li, T. Jia, and H. Zhang. Expression-insensitive 3D face recognition using sparse representation. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 2575–2582. IEEE, 2009.
[25] J. Mairal. Learning multiscale sparse representations for image and video restoration (preprint). Technical report, DTIC Document, 2007.
[26] J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online dictionary learning for sparse coding. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 689–696. ACM, 2009.
[27] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman. Discriminative learned dictionaries for local image analysis. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1–8. IEEE, 2008.
[28] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman. Supervised dictionary learning. pages 1033–1040, 2008.
[29] Marc'Aurelio Ranzato, S. Chopra, and Y. LeCun. Efficient learning of sparse representations with an energy-based model. Advances in Neural Information Processing Systems, 19:1137–1144, 2006.


[30] A. Martinez and R. Benavente. The AR face database (CVC Technical Report No. 24). Barcelona, Spain: Computer Vision Center, Universitat Autonoma de Barcelona, 1998.
[31] P. Nagesh and B. Li. A compressive sensing approach for expression-invariant face recognition. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 1518–1525. IEEE, 2009.
[32] A. Quattoni, M. Collins, and T. Darrell. Transfer learning for image classification with sparse prototype representations. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1–8. IEEE, 2008.
[33] I. Ramirez, P. Sprechmann, and G. Sapiro. Classification and clustering via dictionary learning with structured incoherence and shared features. pages 3501–3508, 2010.
[34] B. D. Rao and K. Kreutz-Delgado. An affine scaling methodology for best basis selection. Signal Processing, IEEE Transactions on, 47:187–200, 1999.
[35] R. T. Rockafellar. Convex Analysis (Princeton Landmarks in Mathematics and Physics). Princeton University Press, Dec. 1996.
[36] R. Rubinstein, A. Bruckstein, and M. Elad. Dictionaries for sparse representation modeling. Proceedings of the IEEE, 98:1045–1057, 2010.
[37] R. Rubinstein, M. Zibulevsky, and M. Elad. Double sparsity: Learning sparse dictionaries for sparse signal approximation. Signal Processing, IEEE Transactions on, 58:1553–1564, 2010.
[38] P. Sprechmann, I. Ramirez, G. Sapiro, and Y. Eldar. C-HiLasso: A collaborative hierarchical sparse modeling. Signal Process., IEEE Trans. on, 59:4183–4198, 2011.
[39] Z. Szabo, B. Poczos, and A. Lorincz. Online group-structured dictionary learning. In CVPR, 2011 IEEE Conference on, pages 2865–2872, June 2011.
[40] Z. Szabo, B. Poczos, and A. Lorincz. Collaborative filtering via group-structured dictionary learning. Latent Variable Analysis and Signal Separation, pages 247–254, 2012.
[41] M. Varma and A. Zisserman. A statistical approach to texture classification from single images. International Journal of Computer Vision, 62:61–81, 2005.
[42] A. Wagner, J. Wright, A. Ganesh, Z. Zhou, and Y. Ma. Towards a practical face recognition system: Robust registration and illumination. pages 597–604, 2009.
[43] J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Gong. Locality-constrained linear coding for image classification. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 3360–3367. IEEE, 2010.
[44] L. Wolsey. Integer Programming. Wiley Series in Discrete Mathematics and Optimization. John Wiley & Sons, 1998.


[45] J. Wright, Y. Ma, J. Mairal, G. Sapiro, T. Huang, and S. Yan. Sparse representation for computer vision and pattern recognition. Proceedings of the IEEE, 98:1031–1044, 2010.
[46] J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma. Robust face recognition via sparse representation. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 31:210–227, 2009.
[47] S. Wright, R. Nowak, and M. Figueiredo. Sparse reconstruction by separable approximation. Signal Processing, IEEE Transactions on, 57:2479–2493, 2009.
[48] J. Yang, K. Yu, Y. Gong, and T. Huang. Linear spatial pyramid matching using sparse coding for image classification. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 1794–1801. IEEE, 2009.
[49] M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B, 68:49–67, 2006.
[50] B. Zhao, L. Fei-Fei, and E. Xing. Online detection of unusual events in videos via dynamic sparse coding. pages 3313–3320, 2011.
[51] M. Zibulevsky and M. Elad. L1-L2 optimization in signal and image processing. Signal Processing Magazine, IEEE, 27:76–88, 2010.


BIOGRAPHICAL SKETCH

Yu-Tseh Chi was born in Taiwan. He received his Bachelor of Science degree from the National Taiwan University, Taipei, Taiwan, and his Master of Science degree from The Ohio State University, Columbus, Ohio, U.S.A. He has been a Ph.D. student in the Department of Computer and Information Science and Engineering at the University of Florida since 2007. His research interests include computer vision and machine learning.