
Foundations towards an Integrated Theory of Intelligence

Permanent Link: http://ufdc.ufl.edu/UFE0044666/00001

Material Information

Title: Foundations towards an Integrated Theory of Intelligence
Physical Description: 1 online resource (88 p.)
Language: english
Creator: Tarifi, Mohamad H
Publisher: University of Florida
Place of Publication: Gainesville, Fla.

Subjects

Subjects / Keywords: geometric -- integrated
Computer and Information Science and Engineering -- Dissertations, Academic -- UF
Genre: Computer Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: This work outlines the beginning of a new attempt at forming an Integrated Theory of Intelligence. A complete understanding of intelligence requires a multitude of interdisciplinary ideas. Instead, many specifics are abstracted out in favor of purely computational building blocks. This enables us to focus on the algorithmic and mathematical aspects of understanding intelligence. A novel universal circuit element for bottom-up learning and inference is proposed. The circuit element is a concatenation of Dictionary Learning and Dimension Reduction. This fundamental building block is used to learn hierarchies of sparse representations. The model is applied to standard datasets, where the numerical experiments show it performing well. The Dictionary Learning problem is then examined more closely from a geometric point of view. An exhaustive solution is obtained by framing dictionary learning as the intersection structure of a subspace arrangement. The method learns a subspace arrangement using subspace clustering techniques, then applies an intersection algorithm to recover the dictionary. Notable generalizations and special cases are discussed. Geometric Dictionary Learning is investigated theoretically with the help of a surrogate problem, in which the combinatorics of the subspace supports are specified. The problem is then approached using machinery from algebraic geometry. Specifically, a rigidity-type theorem is obtained that characterizes the sample arrangements that recover a finite number of dictionaries using a purely combinatorial property on the supports. Finally, some open questions and future work are discussed.
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by Mohamad H Tarifi.
Thesis: Thesis (Ph.D.)--University of Florida, 2012.
Local: Adviser: Sitharam, Meera.
Electronic Access: RESTRICTED TO UF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE UNTIL 2013-06-30

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2012
System ID: UFE0044666:00001




FOUNDATIONS TOWARDS AN INTEGRATED THEORY OF INTELLIGENCE

By

MOHAMMAD TARIFI

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2012

© 2012 Mohammad Tarifi

This work is dedicated to future generations.

ACKNOWLEDGMENTS

First, I would like to thank my advisor, Dr. Meera Sitharam. She is a source of constant growth and inspiration. I would like to thank all my friends who supported me through the journey. Bassam Aoun has been a consistent support. Finally, I would like to thank my family: my mother Dallal, my father Hassan, and my sister Reem. They mean the world to me.

TABLE OF CONTENTS

                                                                        page

ACKNOWLEDGMENTS .................................................... 4
LIST OF TABLES ..................................................... 8
LIST OF FIGURES .................................................... 9
ABSTRACT .......................................................... 10

CHAPTER

1 INTRODUCTION ................................................... 12
  1.1 Background Models ........................................... 12
    1.1.1 Hierarchical Models ..................................... 13
  1.2 Organization ................................................ 14

2 DRDL ELEMENT ................................................... 16
  2.1 Technical Background ........................................ 16
    2.1.1 Historical Context ...................................... 16
    2.1.2 Sparse Approximation .................................... 18
    2.1.3 Illustrating the Model with Simple Examples ............. 18
    2.1.4 Pursuit Algorithms ...................................... 20
    2.1.5 Dimension Reduction ..................................... 22
    2.1.6 Dictionary Learning ..................................... 24
  2.2 Our Contribution ............................................ 26
    2.2.1 DRDL Circuit Element .................................... 26
      2.2.1.1 Relation between DR and DL .......................... 26
      2.2.1.2 Discussion of trade-offs in DRDL .................... 27
    2.2.2 Our Hierarchical Sparse Representation .................. 27
      2.2.2.1 Assumptions of the generative model ................. 28
      2.2.2.2 Learning algorithm .................................. 29
      2.2.2.3 Representation inference ............................ 29
      2.2.2.4 Mapping between HSR and current models .............. 30
      2.2.2.5 What does HSR offer beyond current models? .......... 31
      2.2.2.6 Discussion of trade-offs in HSR ..................... 32
    2.2.3 Incorporating Additional Model Prior .................... 32
    2.2.4 Experiments ............................................. 34
      2.2.4.1 MNIST results ....................................... 34
      2.2.4.2 COIL results ........................................ 35
  2.3 Discussion .................................................. 36

3 GEOMETRIC DICTIONARY LEARNING .................................. 37
  3.1 Preliminaries ............................................... 37
    3.1.1 Statistical Approaches to Dictionary Learning ........... 37
    3.1.2 Contribution and Organization ........................... 38
  3.2 The Setup ................................................... 38
    3.2.1 Assumptions of the Generative Model ..................... 38
    3.2.2 Problem Definitions ..................................... 39
    3.2.3 Problem Relationships ................................... 41
  3.3 Cluster and Intersect Algorithm ............................. 43
    3.3.1 Learning Subspace Arrangements .......................... 43
      3.3.1.1 Random sample consensus ............................. 44
      3.3.1.2 Generalized principal component analysis ............ 45
      3.3.1.3 Combinatorics of subspace clustering ................ 45
    3.3.2 s-Independent Smallest Spanning Set ..................... 46
  3.4 Learning an Orthogonal Basis ................................ 47
  3.5 Summary ..................................................... 48

4 SAMPLING COMPLEXITY FOR GEOMETRIC DICTIONARY LEARNING .......... 49
  4.1 Algebraic Representation .................................... 51
  4.2 Combinatorial Rigidity ...................................... 53
  4.3 Required Graph Properties ................................... 56
  4.4 Rigidity Theorem in d=3, s=2 ................................ 57
    4.4.1 Consequences ............................................ 64
  4.5 Summary ..................................................... 65

5 FUTURE WORK AND CONCLUSIONS .................................... 66
  5.1 Minor Results ............................................... 66
    5.1.1 Representation Scheme on the Cube, Moulton Mapping ...... 66
    5.1.2 Cautiously Greedy Pursuit ............................... 67
  5.2 Future Work from Chapter 2 .................................. 67
    5.2.1 DRDL Trade-offs ......................................... 67
    5.2.2 The Hierarchy Question .................................. 68
  5.3 Future Work from Chapter 3 .................................. 69
    5.3.1 Cluster and Intersect Extensions ........................ 69
    5.3.2 Temporal Coherence ...................................... 70
  5.4 Future Work from Chapter 4 .................................. 71
    5.4.1 Uniqueness and Number of Solutions ...................... 71
    5.4.2 Higher Dimensions ....................................... 72
    5.4.3 Genericity .............................................. 73
    5.4.4 Computing the Realization ............................... 73
  5.5 Future Extension for the Model .............................. 73
  5.6 Conclusion .................................................. 74

REFERENCES ........................................................ 75

BIOGRAPHICAL SKETCH ............................................... 88

LIST OF TABLES

Table                                                               page

2-1 A comparison between DRDL, HSR, and standard techniques on a classification task on the MNIST dataset. ........................ 35
2-2 A comparison between DRDL and standard techniques on a classification task on the COIL dataset. ........................... 35

LIST OF FIGURES

Figure                                                              page

2-1 Vectors distributed as s=1 sparse combinations of the 3 shown vectors ... 19
2-2 Normalized vectors distributed as s=2 non-negative sparse combinations of the 4 shown vectors ... 19
2-3 Dictionary of the 1st layer of MNIST (left); COIL-100 (right). ... 20
2-4 A dimension reduction for the 3 blue points from d=3 to d=2 that minimizes pairwise distance distortion. ... 24
2-5 A simple 3 layer hierarchy with no cycles ... 28
2-6 An example of factorizing HSR by incorporating additional prior. In step 1, we factor a single layer into a 3 layer hierarchy. In step 2, we factor into 3 dictionary learning elements and 2 dimension reduction elements. In step 3, the dimension reduction at level 2 is factored to fan out into two separate DRDL elements ... 33
3-1 Two possible solutions to the smallest spanning set problem of the line arrangement shown ... 41
3-2 An arrangement of points that is sufficient to determine the generators using the s=3 decomposition strategy but not s=2. ... 42
4-1 Two simple arrangements. The larger (blue) dots are the points P and the small (grey) dots are the given pins X. ... 51
4-2 A simple arrangement of 6 points. ... 52
4-3 The velocities of a pair of points p_i and p_j constrained by a pin x_k. ... 55
5-1 A k-5 configuration with two distinct solutions. ... 71
5-2 An arrangement of 6 points in d=3 that is sufficient to determine their minimum generating dictionary of size 4. ... 72

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

FOUNDATIONS TOWARDS AN INTEGRATED THEORY OF INTELLIGENCE

By

Mohammad Tarifi

December 2012

Chair: Meera Sitharam
Major: Computer Engineering

This work outlines the beginning of a new attempt at an Integrated Theory of Intelligence. A complete understanding of intelligence requires a multitude of interdisciplinary ideas. Instead, we abstract out many specifics in favor of purely computational building blocks. This enables us to focus on algorithmic and mathematical aspects of understanding intelligence.

A novel universal circuit element for bottom-up learning and inference is proposed. The circuit element is a concatenation of Dictionary Learning and Dimension Reduction. This fundamental building block is used to learn hierarchies of sparse representations. The model is applied to standard datasets, where the numerical experiments show promising performance.

The Dictionary Learning problem is then examined more closely from a geometric point of view. We identify related problems and draw formal relationships among them. This leads to a view of dictionary learning as the minimum generating set of a subspace arrangement. An exhaustive algorithm follows that applies subspace clustering techniques and then an intersection algorithm to learn the dictionary. Notable special instances are discussed.

Geometric Dictionary Learning is then investigated theoretically with the help of a surrogate problem, in which the combinatorics of the subspace supports are specified. The problem is then approached using machinery from algebraic and combinatorial geometry. Specifically, a rigidity-type theorem is obtained that characterizes the sample arrangements that recover a finite number of dictionaries, using a purely combinatorial property on the supports. Finally, we discuss some minor results, open questions, and future work.

CHAPTER 1
INTRODUCTION

Working towards a Computational Theory of Intelligence, we develop a computational framework inspired by ideas from Neuroscience. Specifically, we integrate notions of columnar organization, hierarchical structure, sparse distributed representations, and sparse coding.

1.1 Background Models

An integrated view of Intelligence has been proposed by Karl Friston based on free-energy [37, 41, 45]. In this framework, Intelligence is viewed as a surrogate minimization of the entropy of the sensorium. This work is intuitively inspired by this view, aiming to provide a computational foundation for a theory of intelligence from the perspective of theoretical computer science, thereby connecting to ideas in mathematics. By building foundations for a principled approach, the computational essence of problems can be isolated and formalized, their relationship to fundamental problems in mathematics and theoretical computer science can be illuminated, and the full power of available mathematical techniques can be brought to bear.

Speculation on a common cortical micro-circuit element dates back to Mountcastle's observation that a cortical column may serve as an algorithmic building block of the neocortex [82]. Later work by Lee and Mumford [94] and Hawkins and George [51] attempted further investigation of this process.

The bottom-up organization of the neocortex is generally assumed to be a heterarchical topology of columns. This can be modeled as a directed acyclic graph, but is usually simplified to a hierarchical tree. Work by Poggio, Serre, et al. [110, 115, 117] and Dean [25, 26] discussed a hierarchical topology. Smale et al. attempt to develop a theory accounting for the importance of the hierarchical structure [12, 136].

Work on modeling early stages of sensory processing by Olshausen [108, 135], using sparse coding, produced results that account for the observed receptive fields in early visual processing.

This is usually done by learning an overcomplete dictionary. However, it remained unclear how to extend this to higher layers. Our work can be partially viewed as progress in this direction.

Computational Learning Theory [44] is the formal study of learning algorithms. Probably Approximately Correct (PAC) learning defines a natural setting for analyzing such algorithms [43]. However, with few notable exceptions (boosting, inspiration for Support Vector Machines [42], etc.) the produced guarantees are divorced from practice. Without tight guarantees, Machine Learning is studied using experimental results on standard benchmarks, which is problematic. We aim at closing the gap between theory and practice by providing stronger assumptions on the structures and forms considered by the theory, through constraints inspired by biology and complex systems.

1.1.1 Hierarchical Models

Several hierarchical models have been introduced in the literature. H-Max is based on the Simple-Complex cell hierarchy of Hubel and Wiesel. It is basically a hierarchical succession of template matching and max operations, corresponding to simple and complex cells respectively [110].

Hierarchical Temporal Memory (HTM) is a learning model composed of a hierarchy of spatial coincidence detection and temporal pooling [50, 51, 55]. Coincidence detection involves finding a spatial clustering of the input, while temporal pooling is about finding variable order Markov chains [47] describing temporal sequences in the data.

H-Max can be mapped into HTM in a straightforward manner. In HTM, the transformations under which the data remains invariant are learned in the temporal pooling step [55]. H-Max explicitly hardcodes translational transformations through the max operation. This gives H-Max better sample complexity for specific problems where translational invariance is present.

Bouvrie et al. [11, 12] introduced a generalization of hierarchical architectures centered around a foundational element involving two steps, Filtering and Pooling. Filtering is described through a reproducing kernel K(x, y), such as the standard inner product K(x, y) = <x, y>, or a Gaussian kernel K(x, y) = e^{-||x - y||^2}. Pooling then remaps the result to a single value. Examples of pooling functions include max, mean, and the l_p norm (such as l_1 or l_infinity). H-Max, Convolutional Neural Nets [53], and Deep Feedforward Neural Networks [52] all belong to this category of hierarchical architectures, corresponding to different choices of the kernel and pooling functions. As we show in Section 2.2.2.4, our model does not fall within Bouvrie's present framework, and can be viewed as a generalization of hierarchical models in which both HTM and Bouvrie's framework are a special case.

Friston proposed Hierarchical Dynamic Models (HDM), which are similar to the above mentioned architectures but framed in a control theoretic framework operating in continuous time [37]. A computational formalism of his approach is thus prohibitively difficult.

A computational approach is focused on developing tractable algorithms and exploring the complexity limits of Intelligence, thereby improving the quality of available guarantees for evaluating performance of models, improving comparisons among models, and moving towards provable guarantees such as sample size, time complexity, and generalization error. In addition, prior assumptions about the environment are made explicit. This furnishes a solid theoretical foundation which may be used, among other things, as a basis for building Artificial Intelligence.

1.2 Organization

The next chapter introduces an elemental building block that combines Dictionary Learning and Dimension Reduction (DRDL). We show how this foundational element can be used to iteratively construct a Hierarchical Sparse Representation (HSR) of a sensory stream. We compare our approach to existing models, showing the generality of our simple prescription.

We then perform preliminary experiments using this framework, illustrating with the example of an object recognition task using standard datasets.

In Chapter 3, the Dictionary Learning problem is examined more closely from a geometric point of view. We identify related problems and draw formal relationships among them. This leads to a view of dictionary learning as the minimum generating set of a subspace arrangement. An exhaustive algorithm follows that applies subspace clustering techniques and then an intersection algorithm to learn the dictionary. Notable special instances are discussed.

In Chapter 4, Geometric Dictionary Learning is investigated theoretically with the help of a surrogate problem, in which the combinatorics of the subspace supports are specified. The problem is then approached using machinery from algebraic and combinatorial geometry. Specifically, a rigidity-type theorem is obtained that characterizes the sample arrangements that recover a finite number of dictionaries, using a purely combinatorial property on the supports.

The thesis is concluded in Chapter 5 with some minor results, open questions, and discussion of future work.

CHAPTER 2
DRDL ELEMENT

This chapter introduces an elemental building block that combines Dictionary Learning [3] (to be formally introduced and discussed in Section 2.1.6) and Dimension Reduction [49] (to be formally introduced and discussed in Section 2.1.5). The element is often abbreviated as DRDL. We show how this foundational element can be used to iteratively construct a Hierarchical Sparse Representation (HSR, to be formally introduced and discussed in Section 2.2.2) of a sensory stream. We compare our approach to existing models, showing the generality of our simple prescription. We then perform preliminary experiments using this framework, illustrating with the example of an object recognition task using standard datasets.

Next, we introduce some of the relevant mathematical and conceptual topics and techniques which we used in our work.

2.1 Technical Background

A Dictionary is a d-by-m matrix in R^{d x m}, where the columns are normalized to unit magnitude. A dictionary is called overcomplete, or redundant, if m > d. Two dictionaries D1 and D2 are equivalent if there exists a signed permutation matrix Γ such that D1 = D2 Γ.

We begin by placing our work with respect to historical context.

2.1.1 Historical Context

Nonlinear Approximation Theory, as a part of functional analysis, can be traced back to Kolmogorov, who formulated the notion of n-widths [157]. This field seeks to find the best approximation to a single input function drawn from a fixed space (according to various types of fixed norms/distributions), by choosing from a given, possibly overcomplete/dependent (i.e. many more functions than the dimension), family of functions (called a Dictionary) drawn from a vector space, and combining the chosen functions in a fixed, allowed set of non-linear ways such as compositions and superpositions.

In addition, one may restrict the number of functions used. In general, the problem of finding a sparse approximation is nonlinear due to overcompleteness. Even if the family is a basis, the usual projection type methods do not generally work, due to nonorthogonality of the basis family. Nor do duality based methods work in general.

A simpler problem is obtained when restricting the allowed set of combinations to sparse linear combinations from a dictionary D. For orthogonal and independent D, well studied instances include classical harmonic approximation (such as Fourier bases) and orthogonal polynomial bases. When D is allowed to be non-orthogonal, but remains independent, classic examples studied include polynomial bases and splines. Overcomplete and dependent families studied include approximation by rational functions or Free Knot splines and spaces, where the location of knots is free, but the number of knots is fixed. Another example is approximation from Wavelets and Multi-resolution Sobolev and Besov spaces aiming at harmonic analysis with finite support (as with splines). These are overcomplete families, but are nicely layered into independent, even orthogonal, families (the splines are orthogonalized) that moreover have certain types of orthogonality relationships between the layers. In Functional Analysis, these approximation schemes are applied to infinite dimensional vector spaces such as Hilbert and even Banach spaces [67].

In Numerical Approximation Theory, which is appropriate to seeking computational understanding, the domain is discretized, and both the domain refinement as well as other parameters related to the dimension of the vector spaces (such as degree of the approximating functions) are important measures of complexity. In approximation theory, asymptotic complexity and quality guarantees are captured by proving bounds and trade-offs between the dimension of the approximating space, degree, rates of convergence of iterative methods, and approximation order.

These problems therefore are set in finite dimensional linear algebra, with an asymptotic thrust (for complexity analysis). Not surprisingly, these are related to classical problems and methods in numerical linear algebra and statistics, such as inverse problems and least squares, and were attacked by various Pursuit Algorithms such as Basis Pursuit, and variants such as LARS/LASSO [130, 134, 139, 150, 151].

2.1.2 Sparse Approximation

Sparse approximation, also known as model or vector selection, is the problem of representing input data in a known Dictionary D. The problem can be stated as:

Definition 1. Given a dictionary D in R^{d x m} (possibly overcomplete) and an input vector y in R^d, the Sparse Representation problem asks for the x in R^m such that min ||x||_0 : y = Dx. That is, x is the sparsest vector that represents y as a linear combination of the columns of D.

We say that a dictionary s-spans a vector y if the sparse representation problem can be solved with ||x||_0 <= s.

2.1.3 Illustrating the Model with Simple Examples

Example 1. Figure 2-1 illustrates a particular distribution that is s=1-sparse in the 3 drawn vectors. The dictionary D1 is

    ( 1 0 1 )
    ( 0 1 1 )

The columns of the dictionary correspond to the drawn vectors, and the data is expressed simply as a vector with one coefficient corresponding to the inner product with the closest vector and zero otherwise. This produces an s=1 sparse vector in dimension 3. A slightly more complicated example is s=2 in the dictionary D2 shown in Figure 2-2.
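
To make Example 1 concrete, the following sketch (in Python, assuming NumPy; the encoder and the test point are illustrative, not part of the thesis) encodes a point by its single best-matching atom:

    import numpy as np

    # Columns of D1 are the three drawn vectors, normalized to unit length.
    D1 = np.array([[1.0, 0.0, 1.0],
                   [0.0, 1.0, 1.0]])
    D1 /= np.linalg.norm(D1, axis=0)

    def encode_s1(y, D):
        """s=1 sparse code: pick the atom with the largest |<y, d_j>|."""
        corr = D.T @ y
        j = np.argmax(np.abs(corr))
        x = np.zeros(D.shape[1])
        x[j] = corr[j]          # coefficient = inner product with chosen atom
        return x

    y = np.array([2.0, 0.1])    # a point close to the first atom
    x = encode_s1(y, D1)
    print(x, D1 @ x)            # sparse code and its reconstruction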

Figure 2-1. Vectors distributed as s=1 sparse combinations of the 3 shown vectors

Example 2. Figure 2-2 illustrates the distribution drawn from the dictionary below for non-negative coefficients. The data was normalized for convenience.

    ( 1 0 0 1 )
    ( 0 1 0 1 )
    ( 0 0 1 1 )

Figure 2-2. Normalized vectors distributed as s=2 non-negative sparse combinations of the 4 shown vectors

Next we give a typical example that is encountered in practice.

Example 3. The MNIST dataset is a database of handwritten digits [175]. The COIL-100 dataset consists of color images of 100 objects taken at pose intervals of 5 degrees, for a total of 72 poses per object [48]. Figure 2-3 shows dictionaries trained on the MNIST and COIL-100 datasets. Each image patch is a column of the dictionary. The dictionary was trained using the SPArse Modeling Software (SPAMS) open source library [102, 176].

Figure 2-3. Dictionary of the 1st layer of MNIST (left); COIL-100 (right).

2.1.4 Pursuit Algorithms

Several algorithms exist for vector selection, reflecting various approaches to the problem. It can be shown that the general Vector Selection problem for an arbitrary dictionary is NP-hard, via a reduction from the Exact Cover by 3-sets problem [24]. This shows that the general selection problem is difficult for an arbitrary dictionary D. However, efficient optimal algorithms may exist for special D. For instance, if D is a complete basis, then this problem can be efficiently solved by using Principal Component Analysis (PCA) [61] and then selecting the top components as needed.

In some cases, we are only interested in ||x||_0 <= s, where s in N+ is given. One approach to vector selection is a full search over all (m choose s) sub-dictionaries, taking the Moore-Penrose pseudo-inverse A+ = (A^T A)^{-1} A^T of each induced sub-dictionary A. The pseudo-inverse finds the least squares approximation. By contrast, the first principal component about the mean of a set of points is represented by the line which most closely approaches the data points (as measured by the squared distance of closest approach, i.e. perpendicular to the line). This is in contrast to linear least squares, which tries to minimize the distance in the y direction only. Thus, although the two use a similar error metric, linear least squares is a method that treats one dimension of the data preferentially, while PCA treats all dimensions equally.
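
As a concrete illustration of this exhaustive strategy, the following sketch (Python, assuming NumPy; the function name and test data are ours) enumerates all (m choose s) supports and keeps the best least-squares fit:

    import itertools
    import numpy as np

    def best_s_sparse(y, D, s):
        """Exhaustive vector selection: try every support of size s and
        fit y by least squares on the induced sub-dictionary."""
        d, m = D.shape
        best_x, best_err = None, np.inf
        for support in itertools.combinations(range(m), s):
            A = D[:, support]
            # Least-squares coefficients, i.e. the pseudo-inverse solution.
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            err = np.linalg.norm(y - A @ coef)
            if err < best_err:
                best_x = np.zeros(m)
                best_x[list(support)] = coef
                best_err = err
        return best_x

    # Example: recover an s=2 representation in a random dictionary.
    rng = np.random.default_rng(0)
    D = rng.standard_normal((5, 8)); D /= np.linalg.norm(D, axis=0)
    x_true = np.zeros(8); x_true[[1, 6]] = [1.5, -2.0]
    x_hat = best_s_sparse(D @ x_true, D, 2)
    print(np.nonzero(x_hat)[0])   # expected support: [1 6]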

Matching Pursuit algorithms, such as Orthogonal Matching Pursuit (OMP) [134], are a family of greedy approaches to sparse approximation. Similar approaches in the statistics literature are referred to as Forward Selection [139], Stepwise Forward Selection [139], and Least Angle Regression (LARS) [130].

If we take an L1 relaxation of the problem, we have min ||x||_1 : y = Dx, which is optimized with the convex Lagrangian expression

    min ||y - Dx||_2 + λ ||x||_1

where λ > 0 is the Lagrange multiplier. This problem is called the Least Absolute Shrinkage and Selection Operator (LASSO) [151] and can be solved using Basis Pursuit [150], in which it is re-expressed as a Linear Programming (LP) problem [133].

How good is LASSO at finding a solution? This has been discussed by Zhao and Yu [101] and Donoho [31]. Both approaches used conditions on the dictionary D to evaluate the efficacy of LASSO. Zhao and Yu define the Irrepresentable Sign Condition

    |C^n_{21} (C^n_{11})^{-1} sign(β^n)| <= 1 - η

where C^n_{11} is the correlation between the dictionary's active elements, and C^n_{21} is the correlation between the active and inactive elements. Active elements are the elements that correspond to nonzero entries in x. This Irrepresentable Sign Condition is shown to be equivalent to the Sign Consistency property, defined as the limit of obtaining a correct decomposition with proper sign coefficients as d → ∞. This result shows that LASSO is consistent if and only if the maximum correlation is bounded:

    max_{i,j} |C^n_{ij}| <= 1 / s_max

where s_max is the maximum sparsity allowed.
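
A minimal sketch of this relaxation in practice (Python, assuming scikit-learn; the dictionary and signal are illustrative). Note that scikit-learn's Lasso solves min (1/2n)||y - Dx||_2^2 + α||x||_1, a rescaled form of the Lagrangian above:

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(1)
    D = rng.standard_normal((20, 50)); D /= np.linalg.norm(D, axis=0)
    x_true = np.zeros(50); x_true[[3, 17, 41]] = [1.0, -0.7, 0.5]
    y = D @ x_true

    # alpha plays the role of the Lagrange multiplier, up to rescaling.
    lasso = Lasso(alpha=0.01, fit_intercept=False, max_iter=10000)
    lasso.fit(D, y)
    print(np.nonzero(np.abs(lasso.coef_) > 1e-3)[0])  # recovered support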

Although this result guarantees the success of LASSO on weakly correlated D, this is a negative result, since most dictionaries of interest are significantly correlated.

Yet another well known technique, the FOCal Underdetermined System Solver [140], minimizes the objective function min (||y - Dx||_2 + λ ||x||_p) with a p-norm, p < 1. As p gets close to zero, this term gets close to the sparsity norm l_0.

2.1.5 Dimension Reduction

Dimension Reduction involves mapping data points from dimension d1 to d2, with d2 < d1, while approximately preserving properties of interest, such as pairwise distances.
In this work, we use a particular type of dimension reduction known as Compressed Sensing [30], which is suited for data that is sparse in a fixed known basis B. We can obtain a dimension reduction by applying a linear operator satisfying the Frame Property (also known as the Restricted Isometry Property, or RIP, in Compressed Sensing theory).

Definition 2 (Frame Property). A dictionary A is a frame if for all x such that ||x||_0 <= s, there exists a δ_s for which it holds that:

    (1 - δ_s) ||x||_2^2 <= ||Ax||_2^2 <= (1 + δ_s) ||x||_2^2

where δ_s > 0 is the minimum possible value such that the two bounds apply.

For constant sparsity s, Compressed Sensing with a frame dictionary achieves exponential dimension reduction when the maintained property is approximate mutual distances. This can be seen by considering two s-sparse vectors x1 and x2; then:

    (1 - δ_2s) ||x1 - x2||_2^2 <= ||Ax1 - Ax2||_2^2 <= (1 + δ_2s) ||x1 - x2||_2^2.

Given an s-sparse vector of dimension n, a frame reduces the dimension to O(s log(n)). Furthermore, the frame property guarantees exact recoverability of x from the compressed vector y = Ax by using L1 minimization.

Observation 1. If A is a frame, then solving the convex optimization problem min ||y - Ax||_2 + λ ||x||_1 gives the sparsest solution min ||x||_0 : y = Ax, when x is sparse.

For a proof of Observation 1, the reader is referred to [30].

Example 4. We can then apply a dimension reduction to the sparse representation obtained in Example 1 that preserves distances between representations. The representations correspond to the standard basis in d=3. The best dimension reduction to d=2 is then simply the projection of the representations onto the plane perpendicular to (1,1,1), whereby points on the unit basis project to the vertices of a triangle, as illustrated in Figure 2-4.
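
A quick numerical illustration of this near-isometry (Python, NumPy assumed; the dimensions and the Gaussian construction are illustrative, and the RIP constant is only sampled, not certified):

    import numpy as np

    rng = np.random.default_rng(2)
    n, d2, s = 1000, 60, 5          # ambient dim, reduced dim, sparsity

    # Random Gaussian matrices satisfy the frame/RIP property with high
    # probability when d2 = O(s log n).
    A = rng.standard_normal((d2, n)) / np.sqrt(d2)

    def sparse_vec():
        x = np.zeros(n)
        x[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
        return x

    ratios = []
    for _ in range(500):
        x1, x2 = sparse_vec(), sparse_vec()
        ratios.append(np.linalg.norm(A @ (x1 - x2))**2 /
                      np.linalg.norm(x1 - x2)**2)
    print(min(ratios), max(ratios))   # empirically close to 1 on both sides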

Figure 2-4. A dimension reduction for the 3 blue points from d=3 to d=2 that minimizes pairwise distance distortion.

Since we are using a frame, efficient decompression is guaranteed using L1 minimization. The data can be recovered exactly using L1 minimization algorithms such as Basis Pursuit. Frames can be obtained probabilistically from matrices with random Gaussian entries. Alternatively, frames can be obtained using sparse random matrices [56]. In this thesis we follow the latter approach. The question of deterministically constructing frames with similar bounds is still open.

Next we turn our attention to the problem of finding the dictionary D, when it is unknown, from the data.

2.1.6 Dictionary Learning

Dictionary Learning obtains a sparse representation by learning vectors on which the data x_i can be written in sparse linear combinations.

Definition 3. Given an input set X = [x_1 ... x_m], where x_i in R^d, Dictionary Learning finds D = [v_1 ... v_n] and Θ = [θ_1 ... θ_m], where θ_i in R^n, such that x_i = D θ_i and ||θ_i||_0 <= s, where ||.||_0 is the L0-norm, or sparsity.

If all entries of θ_i are restricted to be non-negative, we obtain Sparse Non-negative Matrix Factorization (SNMF) [63, 64].

An optimization version of Dictionary Learning can be written as:

    min_{D in R^{d x n}} max_i min ||θ_i||_0 : x_i = D θ_i.

In practice, the Dictionary Learning problem is often relaxed to the Lagrangian

    min ||X - DΘ||_2 + λ ||Θ||_1

where X = [x_1 ... x_m] and Θ = [θ_1 ... θ_m].

Several dictionary learning algorithms work by iterating the following. Step 1: solve the vector selection problem for all vectors in X. This can be done using your favorite vector selection algorithm, such as Basis Pursuit. Step 2: given Θ, the optimization problem is now convex in D; use your favorite method to find D.

Using a maximum likelihood formalism, the Method of Optimal Directions (MOD) [60] uses the pseudo-inverse to compute D:

    D^(i+1) = X Θ^(i)T (Θ^(i) Θ^(i)T)^{-1}

where D^(i) and Θ^(i) are the i-th iteration candidates for D and Θ respectively. The MOD can be extended to a Maximum A-Posteriori probability setting with different priors to take into account preferences in the recovered dictionary.

Similarly, k-SVD uses a two step iterative process, with a Truncated Singular Value Decomposition [154] to update D. This is done by taking every vector in D and applying SVD to X and Θ restricted to only the columns that have a contribution from that vector.

When D is restricted to be of the form D = [B1, B2 ... BL], where the Bi's are orthonormal matrices, a more efficient pursuit algorithm is obtained for the sparse coding stage, using a block coordinate relaxation.
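
The following sketch shows the MOD-style iteration (Python, assuming NumPy and scikit-learn's OrthogonalMatchingPursuit for the vector selection step; sizes and data are illustrative, not the thesis's experiments):

    import numpy as np
    from sklearn.linear_model import OrthogonalMatchingPursuit

    rng = np.random.default_rng(3)
    d, n, m, s = 10, 20, 200, 3
    D_true = rng.standard_normal((d, n)); D_true /= np.linalg.norm(D_true, axis=0)
    Theta_true = np.zeros((n, m))
    for i in range(m):
        Theta_true[rng.choice(n, s, replace=False), i] = rng.standard_normal(s)
    X = D_true @ Theta_true

    D = rng.standard_normal((d, n)); D /= np.linalg.norm(D, axis=0)
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=s)
    for _ in range(30):
        # Step 1: sparse-code every column of X in the current dictionary.
        Theta = omp.fit(D, X).coef_.T          # shape (n, m)
        # Step 2: MOD update, the least-squares optimal dictionary for Theta.
        D = X @ Theta.T @ np.linalg.pinv(Theta @ Theta.T)
        D /= np.linalg.norm(D, axis=0) + 1e-12  # renormalize the atoms
    print(np.linalg.norm(X - D @ Theta))        # residual typically shrinks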

We will investigate the problem of Dictionary Learning in detail in Chapters 3 and 4. We are now ready to introduce our DRDL element.

2.2 Our Contribution

2.2.1 DRDL Circuit Element

Our circuit element is a simple concatenation of a Dictionary Learning (DL) step followed by a Dimension Reduction (DR) step; DRDL is used as shorthand. The DL step learns a representation θ_i of the input that lives in a high dimension n. Let 1 <= i_1, ..., i_s <= n be the nonzero entries of θ_i. Then it is possible to obtain a dimension reduction by embedding the corresponding entries into θ_low = (i_1, θ_{i_1}, i_2, θ_{i_2}, ..., i_s, θ_{i_s}). The number of bits needed to represent each index i_j is at most log n; therefore the embedded space has dimension O(s log(n)). The problem with this simple dimension reduction is that the metric distortion is high.

Compressed Sensing with a frame ensures approximate metric preservation while embedding the vectors into a dimension of order O(s log(n)). This additional property enables us to use metric based algorithms to distinguish between different concept classes (from the output of the DRDL element). This property is also key to applying the DRDL circuit element recursively (as in Section 2.2.2), especially in the presence of noise.

We further assume that the DL step learns a frame. This condition will be useful for learning the dictionary (although we will loosen this assumption in Chapter 3). This also enables Observation 2 below.

2.2.1.1 Relation between DR and DL

The DR and DL steps are intimately related. To show their relationship clearly, we rewrite the two problems with the same variable names. These variables are only relevant for this section. The two problems can be stated as:

1. DL asks for D and {x_1 ... x_m}, given {y_1 ... y_m}, for D x_i = y_i, such that the sparsity ||x_i||_0 is minimized for a fixed dimension of y_i.

2. DR asks for D and {y_1 ... y_m}, given {x_1 ... x_m}, for D x_i = y_i, such that the dimension of the y_i's is minimized for a fixed sparsity ||x_i||_0.

In practice, both problems use L1 approximation as a proxy for L0 optimization. This leads to the following observation.

Observation 2. The inverse of a DRDL is a DRDL.

This means that the space of mappings/functions of our model is the same as its inverse. This property is useful for incorporating feedback.

2.2.1.2 Discussion of trade-offs in DRDL

DRDL can be thought of as a memory system ('memory pocket') or a dimension reduction technique for data that can be expressed sparsely in a dictionary. One parameter trade-off is between n (the number of columns in D) and s (the sparsity of the representation). On one hand, we note that the DR step puts the data in O(s log(n)) dimensions. Therefore, if we desire to maximize the reduction in dimension, increasing n by raising it to a constant power k is comparable to multiplying s by k. This means that we would much rather increase the number of columns in the dictionary than the sparsity. On the other hand, increasing the number of columns in D forces the columns to be highly correlated, which becomes problematic for Basis Pursuit vector selection. This trade-off highlights the importance of investigating approaches to dictionary learning and vector selection that can go beyond current results into highly coherent dictionaries.

2.2.2 Our Hierarchical Sparse Representation

If we assume a hierarchical architecture modeling the topographic organization of the visual cortex, a singular DRDL element can be factorized and expressed as a tree of simpler DRDL elements. With this architecture we can learn a Hierarchical Sparse Representation by iterating DRDL elements.
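
Before factorizing into a hierarchy, here is a minimal sketch of a single DRDL element (Python, assuming NumPy and scikit-learn; MiniBatchDictionaryLearning and OMP stand in for whichever DL and pursuit algorithms one prefers, and the Gaussian DR matrix stands in for the sparse random frame used in the thesis):

    import numpy as np
    from sklearn.decomposition import MiniBatchDictionaryLearning

    rng = np.random.default_rng(4)
    X = rng.standard_normal((500, 64))       # rows are input patches

    # DL step: learn an n-atom dictionary and s-sparse codes.
    n, s = 128, 4
    dl = MiniBatchDictionaryLearning(n_components=n,
                                     transform_algorithm='omp',
                                     transform_n_nonzero_coefs=s,
                                     random_state=0)
    codes = dl.fit(X).transform(X)           # shape (500, n), s-sparse rows

    # DR step: compress the sparse codes with a random frame of
    # dimension O(s log n) that approximately preserves distances.
    d2 = int(4 * s * np.log(n))
    A = rng.standard_normal((n, d2)) / np.sqrt(d2)
    output = codes @ A                       # DRDL output, fed to the next level
    print(output.shape)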

2.2.2.1 Assumptions of the generative model

Our model assumes that the data is generated by a hierarchy of spatiotemporal invariants. At any given level i, each node in the generative model is assumed to be composed of a small number of vectors s_i. Generation proceeds by recursively decompressing the pattern from parent nodes, then producing patterns for child nodes. This input is fed to the learning algorithm below.

In this chapter, we assume that both the topology of the generative model and the spatial and temporal extent of each node are known. Discussion of algorithms for learning the topology and internal dimensions is left for future work.

Consider a simple data stream consisting of spatiotemporal sequences from a generative model defined above. Figure 2-5 shows a potential learning hierarchy. For simple vision problems, we can consider all dictionaries within a layer to be the same. In this chapter, processing proceeds bottom-up the hierarchy only.

Figure 2-5. A simple 3 layer hierarchy with no cycles

2.2.2.2 Learning algorithm

The overall picture of the learning algorithm is relatively straightforward. Recursively divide the spatiotemporal signal x_i to obtain a tree representing the known topographic hierarchy of spatiotemporal blocks. Let x^0_{i,j} be the j-th block at level 0. We denote by x^k_{i,j} the j-th block (in a given topographic order) at level k, and by D^k_j the dictionary in the j-th position (in the same topographic order) at level k. Then, starting at the bottom of the tree, do:

1. Learn a dictionary D^k_j in which the spatiotemporal data x^k_{i,j} can be represented sparsely. This produces a vector of weights θ^k_{i,j}.

2. Apply dimension reduction to the sparse representation to obtain u^k_{i,j} = A θ^k_{i,j}.

3. Generate x^{k+1}_{i,j} by concatenating the vectors u^k_{i,l} for all l that are children of j at level k in the tree. Replace k = k+1, so that j now ranges over elements of level k. If k is still less than the depth of the tree, go to Step 1.

Note that in domains such as computer vision, it is reasonable to assume that all dictionaries at level k are the same, D^k_j = D^k. This algorithm attempts to mirror the generative model. It outputs an inference algorithm that induces a hierarchy of sparse representations for a given data point. This can be used to abstract invariant feature vectors from the new data. One can then use a supervised learning algorithm on top of the invariant feature vectors to solve classification problems.

2.2.2.3 Representation inference

For new data points, the representation is obtained, in analogy to the learning algorithm, by recursively dividing the spatiotemporal signal to obtain a tree representing the known topographic hierarchy of spatiotemporal blocks. The representation is inferred naturally by iteratively applying Vector Selection and Compressed Sensing. For Vector Selection, we employ a common variational technique called Basis Pursuit De-Noising (BPDN) [54], which minimizes ||D θ_i - x_i||_2^2 + λ ||θ_i||_1. This technique produces optimal results when the sparsity

    ||θ||_0 < 1/2 + 1/(2C)

where C is the coherence of the dictionary D,

    C = max_{k≠l} |(D^T D)_{k,l}|.

This is a limitation in practice, since it is desirable to have highly coherent dictionaries. Inference proceeds by iteratively applying vector selection and dimension reduction.
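
A compact sketch of one level of the learning algorithm above (Python, NumPy assumed; the helper names, the pairwise grouping of children, and the shared DR matrix are our illustrative choices, and a full implementation would recurse over the whole tree):

    import numpy as np

    def hsr_level(blocks, learn_dictionary, sparse_code, A):
        """One HSR level: sparse-code each block, reduce, and concatenate
        sibling outputs into the blocks of the next level.
        blocks: list of (n_samples, block_dim) arrays, in topographic order."""
        D = [learn_dictionary(b) for b in blocks]         # step 1: one D per block
        theta = [sparse_code(b, Dj) for b, Dj in zip(blocks, D)]
        u = [t @ A for t in theta]                        # step 2: dimension reduction
        # Step 3: concatenate children pairwise into parent blocks.
        return [np.hstack(u[i:i + 2]) for i in range(0, len(u), 2)]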

2.2.2.4 Mapping between HSR and current models

This simple, yet powerful, toy model is used as a basis investigation tool for our future work. We abstracted this model out of a need for conceptual simplicity and generality. Several models inspired by the neocortex have been proposed, many sharing similar characteristics. We present a few here and compare them to our model.

H-Max is a hierarchy composed of a template matching step followed by a Max operation. Template Matching is simply picking the highest inner product between the input and every column of the dictionary, which can be thought of as sparse representation with s=1. H-Max is mapped to HSR by replacing Basis Pursuit for Sparse Approximation with Template Matching. H-Max uses templates (columns of the dictionary) that are sampled randomly from the input data [71, 110, 115, 117], whereas HSR uses Dictionary Learning. The 'max' operation in H-Max can be understood in terms of the operation of sparse coding, which produces local competition among feature vectors representing slight variations in the data. Alternatively, the 'max' operation can be viewed as a limited form of dimension reduction.

HTM [50, 51, 55] can be mapped to HSR by considering time as an additional dimension of space. This way, bounded variable order Markov chains [47] can be written as Sparse Approximation of spatiotemporal vectors representing sequences of spatial vectors in time. HTM constructs a set of spatial coincidences that is automatically shared across time. HSR may do the same when fed a moving window of spatial sequences.

Alternatively, HSR will simulate an HTM by alternating between time-only and space-only spatiotemporally extended blocks. In this view, a single HTM node is mapped to two layers of HSR nodes, one representing time and the other representing space. Unlike HTM, treating time and space on an equal footing has the added advantage that the same algorithms can be used for both. HTM with a winner-take-all policy can then be mapped to our model by assuming sparsity s=1. HTM with a distribution on belief states can be mapped to our model with Template Matching instead of Sparse Approximation in the inference step. Finally, HTM does not leverage the RIP dimension reduction step. HTM uses feedback connections for prediction, which is restricted to predictions forward in time. Extending HSR with feedback connections, which accounts for dependency between nodes that are not connected directly, enables feedback to affect all of space-time.

2.2.2.5 What does HSR offer beyond current models?

One advantage of our approach is in providing invertibility without a dimensionality blow-up for hierarchically sparse data. Models [52, 53] falling under Bouvrie et al.'s framework [11, 12] (introduced in Chapter 1, Section 1.1.1) lose information as one proceeds up the hierarchy, due to the Pooling operation. This becomes problematic when extending the models to incorporate feedback. Moreover, this information loss forces an algorithm designer to hardcode what invariances a particular model must select for (such as translational invariance in H-Max). On the other hand, invertible models such as HTM suffer from dimensionality blow-up when the number of vectors learned at a given level is greater than the input dimension to that level, as is usually the case. Dimensionality reduction achieves both a savings in computational resources and better noise resilience by avoiding over-fitting.

Dictionary learning represents the data by sparse combinations of dictionary columns. This can be viewed as an L0 or L1 regularization, which provides noise tolerance and better generalization. This type of regularization is intuitive and is well motivated by neuroscience [108, 135] and the organization of complex systems.

Our approach departs from current models that use simple template matching, and leverages the expressive power of Sparse Approximation. This provides a disciplined prescription for learning vectors at every level. Finally, HSR is a conceptually simple model. This elegance lends itself well to analysis.

2.2.2.6 Discussion of trade-offs in HSR

There are several design decisions when constructing an HSR. Informally, the hierarchy is useful for reducing the sample complexity and dimensionality of the problem. For instance, consider the simplified case of binary {0,1} coefficients and translation invariance (such as vision). An HSR generative model of two layers will produce (m_2 choose s_2) patterns. Learning this with a single layer HSR would involve learning a dictionary of m_2 columns and s_2 sparsity using |X| samples in dimension d. Using two layers, we have the first layer learning a dictionary of size m_1 and sparsity s_1 using k|X| samples in dimension d/k; the second layer learns a dictionary of m_2 columns and s_2 sparsity with |X| samples in dimension k·O(s_1 log m_1).

2.2.3 Incorporating Additional Model Prior
Figure 2-6. An example of factorizing HSR by incorporating additional prior. In step 1, we factor a single layer into a 3 layer hierarchy. In step 2, we factor into 3 dictionary learning elements and 2 dimension reduction elements. In step 3, the dimension reduction at level 2 is factored to fan out into two separate DRDL elements.

The Structure of the Model. The assumption that our generative model can be factored into a hierarchy is a structural prior. Further factorizations can reflect different structural organizations, such as the topology of the cortex. Other kinds of assumptions may be imposed as well. For example, in Computer Vision, a convenient assumption would be that nodes within the same level of the hierarchy share the same dictionary. We follow this assumption in our numerical experiments. This is clearly invalid when dealing with multi-modal sensory data in lower levels. Figure 2-6 shows an example of progressive factorizations of the model according to prior assumptions.

The DL Step. Adding invariance to the Dictionary Learning steps improves the sampling complexity. For instance, time and space share the property of being shift-invariant. One can model the same spatiotemporal block with a single dictionary or with a three level hierarchy of shift-invariant DRDLs reflecting two dimensions of space and one of time. Shift-invariant dictionaries have been studied in the context of learning audio sequences, yielding improved performance empirically [128].

The DR Step. Imposing invariance selectivity in the dimension reduction step lowers the embedding dimension at the expense of invertibility. A more general approach would be to trade off selectivity for invariances with invertibility, whereby the model incorporates dimension reductions that select for some property of interest. For example, in Computer Vision one can impose selectivity for small shifts, rotations, and scaling invariances through a dimension reduction matrix that pools over such transformations. This is similar to the approach taken by H-Max [71].

2.2.4 Experiments

In this section we elaborate on preliminary numerical experiments on classification tasks performed with DRDL and HSR on basic standard Machine Learning datasets. We applied our model to the MNIST and COIL datasets and subsequently used the representation as a feature extraction step for a classification algorithm such as Support Vector Machines (SVM) or k-Nearest Neighbors (kNN). In practice, additional prior assumptions can be included in our model, as discussed in Section 2.2.3.

2.2.4.1 MNIST Results

We applied our model to all pairs of the MNIST dataset. For the RIP step, we tried random matrices and sparse random matrices, and tested the efficacy of the approach by reconstructing the training data with Basis Pursuit. We used only two layers. After the feature vectors are learned, we applied a k-NN with k=3. We refrained from tweaking the initial parameters, since we expect our model to work off the shelf.

For one layer of DRDL we obtained an error rate of 1.24% with a standard deviation of 0.011. Using two layers, we obtained an error of 2.01% and standard deviation of 0.016. Table 2-1 presents a comparison between DRDL, HSR, and standard techniques [175] on the MNIST dataset.

Table 2-1. A comparison between DRDL, HSR, and standard techniques on a classification task on the MNIST dataset.

    Method                               Error %
    Reconstructive Dictionary Learning   4.33
    Supervised Dictionary Learning       1.05
    k-NN, l2                             5.0
    SVM-Gauss                            1.4
    One layer of DRDL                    1.24
    Two layers of HSR                    2.01

2.2.4.2 COIL Results

We applied our model to all pairs of COIL-30 (a subset of 30 objects out of the entire 100 objects in the COIL dataset). The dataset consists of 72 images for every class. We used only 4 labeled images per class for training. These are taken at equally spaced angles (0, 90, 180, 270). We used the same procedure as the MNIST experiment for obtaining and checking the RIP matrix. We applied a single layer of DRDL, then trained a k-NN with k=3. We also refrained from tweaking the initial parameters. We obtained a mean error of 12.2%. Table 2-2 presents a comparison between DRDL and standard techniques [48, 70] on the COIL dataset.

Table 2-2. A comparison between DRDL and standard techniques on a classification task on the COIL dataset.

    Method              Classification %
    One layer of DRDL   87.8
    SVM                 84.9
    Nearest Neighbor    81.8
    VTU                 89.9
    CNN                 84.8

2.3 Discussion

We introduced a novel formulation of an elemental building block that could serve as the bottom-up piece in the common cortical algorithm. As we shall see in the rest of the thesis, this model leads to several interesting theoretical questions. To help guide experiments, we also illustrated how additional prior assumptions on the generative model can be expressed within our integrated framework. Furthermore, as discussed in Chapter 5, this framework can also be extended to address feedback, attention, action, complementary learning, and the role of time. In the next chapter, we focus on understanding the Dictionary Learning component.

CHAPTER 3
GEOMETRIC DICTIONARY LEARNING

This chapter investigates the Dictionary Learning problem from a geometric and algebraic point of view. We identify related problems and draw formal relationships among them. This leads to a view of dictionary learning as the minimum generating set of a subspace arrangement. This introduces an exhaustive method that learns a subspace arrangement using subspace clustering techniques, then applies an intersection algorithm to recover the dictionary. We also discuss the special case of learning an orthogonal basis.

3.1 Preliminaries

Recall from Chapter 2, Section 2.1.6, our formal definition of Dictionary Learning. Before introducing our own approach, we also recall some of the known (statistical) approaches to attacking the problem.

3.1.1 Statistical Approaches to Dictionary Learning

An optimization version of Dictionary Learning can be written as:

    min_{D in R^{d x n}} max_i min ||y_i||_0 : x_i = D y_i.

In practice, the Dictionary Learning problem is often relaxed to the Lagrangian

    min Σ_i (||x_i - D y_i||_2 + λ ||y_i||_1).

Traditional approaches rely on heuristic methods such as EM. Several dictionary learning algorithms work by iterating the following two steps [108, 120, 143, 167]:

1. Solve the sparse representation problem for all vectors X. This can be done using your favorite vector selection algorithm, such as Basis Pursuit [150].

2. Given X, the optimization problem is now convex in D. Use your favorite method to find D.

Let X = [x_1 ... x_m] and Y = [y_1 ... y_m]. Using a maximum likelihood formalism, the Method of Optimal Directions (MOD) [60] uses the pseudo-inverse to compute D: D^(i+1) = X Y^(i)T (Y^(i) Y^(i)T)^{-1}. The MOD can be extended to a Maximum A-Posteriori probability setting with different priors to take into account preferences in the recovered dictionary.

Similarly, k-SVD uses a two step iterative process, with a Truncated Singular Value Decomposition to update D. This is done by taking every atom in D and applying SVD to X and Y restricted to only the columns that have a contribution from that atom. When D is restricted to be of the form D = [B1, B2 ... BL], where the Bi's are orthonormal matrices, a more efficient pursuit algorithm is obtained for the sparse coding stage, using a block coordinate relaxation.

3.1.2 Contribution and Organization

This chapter investigates the Dictionary Learning problem from a geometric point of view. In Section 3.2, we begin by introducing a generative model for dictionary datasets and introduce a variety of related problems. In Section 3.3, we observe an exhaustive solution by framing dictionary learning as the minimum generating set of a subspace arrangement. The method learns a subspace arrangement using subspace clustering techniques, then applies an intersection algorithm to recover the dictionary. Section 3.4 discusses the case of an orthogonal dictionary. Further generalizations and open questions are left for Chapter 5 of the thesis.

3.2 The Setup

We aim towards formulating and studying the dictionary learning problem from a direct geometric point of view, seeking a concrete solution whose performance can be formally understood. A formal approach relies on modeling the data from a generative model and analyzing the complexity of learning.

3.2.1 Assumptions of the Generative Model

There are a few choices of generative models that produce data readily modeled by a dictionary. In its most general form, we are asked to determine an unknown dictionary D and a set of unknown points Y = {y_1 ... y_n} picked from a distribution P_Y, given a set of sample points X = {x_1 ... x_n} such that x_i = D y_i, where ||y_i||_0 <= s. A further complication arises in the form of noise, x_i = D y_i + ε_i, where the l2 norm of ε_i is bounded.

We say that a set of vectors V s-spans a point or subspace if and only if the point or subspace can be written as a linear combination of at most s elements of V. A common property often imposed on dictionaries is s-regularity, defined below:

Definition 4 (s-regularity). A dictionary D is s-regular if for all y such that ||y||_0 <= s, it holds that Dy ≠ 0.

For an s-regular dictionary, the general vector selection problem is ill defined. For instance, D can be overcomplete, leading to multiple solutions for y_i. Overcoming this by framing the problem as a minimization problem is exceedingly difficult. Indeed, under generic assumptions, even determining the minimum l0 norm y_i when D and x_i are known is NP-hard. Under this condition, we can make the vector selection problem well defined by enforcing the 2s-regularity property on D.

Definition 5 (s-independence). A dictionary D is s-independent if for all y_1, y_2 such that ||y_1||_0 <= s and ||y_2||_0 <= s, it holds that D y_1 = D y_2 if and only if y_1 = y_2.

s-independence is a minimal requirement for unique invertibility. Notice that the definition given for s-independence is indeed equivalent to 2s-regularity.

We can further strengthen the constraints on D by assuming that D is a frame (defined in Chapter 2). This ensures that basic tasks, such as vector selection, are tractable and noise tolerant. In the following sections, further constraints are imposed on D, P_Y, and ε that further specialize the framework. Unless mentioned otherwise, we set ε = 0.

3.2.2 Problem Definitions

A number of independently interesting but intimately related problems arise from this setup. The first problem is the full-blown Dictionary Learning question under the assumptions above.

Definition 6 (Geometric Dictionary Learning). Let X be a given set of vectors in R^n that are known to be generated from a frame dictionary D, with |D| at most m, and x_i = D y_i, where ||y_i||_0 <= s.

Find any frame dictionary D*, such that m* = |D*| <= m, and for all x_i in X there exists y_i in R^{m*} where x_i = D* y_i. That is, D and D* are such that each vector x in X can be represented by an s-sparse combination of vectors in D*.

One approach to the Dictionary Learning problem is to decompose it into useful subproblems. In turn, these problems are revealed to be of independent interest. We define these problems below and then show their relationship to the original question.

We say that a set of vectors X lies on a set S of s-dimensional subspaces if and only if for all x_i in X there exists S_i in S such that x_i in S_i.

Definition 7 (Subspace Arrangement Learning). Let X be a given set of vectors that are known to lie on a set S of s-dimensional subspaces of R^n, where |S| is at most k. Further assume that the subspaces in S have bases such that their union is a frame. Find any subspace arrangement S_X such that |S_X| <= k, X lies on S_X, and the union of the bases of the S_i in S_X is a frame.

Subspace Arrangement Learning is of independent interest to the machine learning and computer vision communities. In Section 3.3.1, we summarize known results on the problem.

The second of our key problems is an optimization problem for representing a union of subspaces. We say a set of vectors D s-spans a set of vectors X if and only if for all x_i in X, there exists y_i in R^{|D|} such that x_i = D y_i and |y_i| <= s.

Definition 8 (Smallest Spanning Set for Subspace Arrangement). Find a minimum cardinality set of vectors that s-spans all the subspaces in an input subspace arrangement S, specified by giving bases for each subspace.

In general, the smallest spanning set is not necessarily unique, even for s-regular subspaces, as illustrated in the example below.

Example 5. This example, see Figure 3-1, shows two possible solutions to the smallest spanning set for the given subspace arrangement.

Figure 3-1. Two possible solutions to the smallest spanning set problem of the line arrangement shown.

The subspaces are of dimension s=2 and live in d=3. They are viewed projectively in the affine plane. Notice that the union of the minimum generating set is 2-regular, i.e. no 3 of the shown points lie on the same line. As is apparent from Figure 3-1, the intersection of the subspaces is key. This motivates the next of our problems.

Definition 9 (Intersection of Subspace Arrangement). Let S be a given set of s-dimensional subspaces of R^n (specified by giving their bases, whose union is a frame). It is promised that there is a set I of vectors, with |I| at most m, that s-spans the union of their intersections. Find any set of vectors I* that satisfies the above conditions.

3.2.3 Problem Relationships

We can now sketch intuitively how the above problems are related. The first observation relates the Geometric Dictionary Learning problem to Subspace Arrangement Learning and Smallest Spanning Set for s-regular dictionaries.

Observation 3 (Decomposition of Dictionary Learning). The following two-step procedure solves the s-regular Geometric Dictionary Learning problem:

1. Learn a subspace arrangement S for X (an instance of Definition 7).

2. Recover D by finding the Smallest Spanning Set of S (an instance of Definition 8).

Note that it is not true that the decomposition strategy should always be applied with the same sparsity s, the constant in the generative model (discussed in Section 3.2.1). The following example illustrates this.

Example 6. Consider the arrangement shown in Figure 3-2, with s=2, living in the projective hyperplane of dimension 3:

Figure 3-2. An arrangement of points that is sufficient to determine the generators using the s=3 decomposition strategy but not s=2.

There are not enough sample points to apply the decomposition strategy with s=2. Instead, if we use s=3, all the planes of the simplex are determined by 4 of the given points. Therefore, we can learn a subspace arrangement (the union of the planes) and from that recover the dictionary as the vertices of the simplex.

Unless otherwise specified, the algorithms in this chapter start out with the minimum given value of s and are reapplied with iteratively higher s if a solution has not been obtained. We deploy this strategy due to the fact that most subspace arrangement learning algorithms (discussed in the next section) suffer exponentially degraded performance in terms of s, with the notable exception of the special instances discussed in the final section of this chapter.

Furthermore, we observe that, for s-independent sets, the Smallest Spanning Set can be obtained via intersection of the Subspace Arrangements.

Observation 4 (Smallest Spanning Set via Intersection). Under the condition that the subspace arrangement comes from an s-independent dictionary, the Smallest Spanning Set is the union of:

- The smallest spanning set I of the pairwise intersections of all the subspaces in S.

- Any points outside the pairwise intersections that, together with I, completely s-span the subspaces in S.

In the following section we obtain an algorithm for solving the s-independent Smallest Spanning Set problem by applying the above observation recursively.

3.3 Cluster and Intersect Algorithm

In this section we illustrate our geometric strategy. Recall, from our definition of the generative model in Section 3.2.1, that P_Y is the distribution from which the y_i's are generated. The support of y_i is the index set I of coordinates of y_i that are non-zero. In this section P_Y is as follows:

- A set of k supports is picked uniformly from the set of 2^m possibilities.

- The values of the non-zero entries of y_i are picked uniformly from R^s.

Given m column vectors in D, the number of possible supports for y is Σ_{i=1}^{s} (m choose i). This allows us to quantify the approach in terms of the number of subspaces k, which can vary between the various settings in which an instance of the Dictionary Learning problem is encountered. For instance, D is often used to separate causes in an environment, and naturally not all possible combinations of causes are realized.

We begin by discussing the case where all subspaces are of equal dimension s, that is d(S_i) = s for all i in {1...k}, then generalize to arbitrary dimensions of support. If the supports are picked uniformly, then using Θ(k) subspaces we can ensure that every column of D is represented, with failure probability O(m e^{-ks/m}). To see this, apply the trivial union bound to the probability that no chosen subspace contains a particular column d_i, which for q subspaces comes to m (1 - s/m)^q <= m e^{-qs/m}.

We first discuss methods of determining the subspace arrangement S from X, then show how to recover D.

3.3.1 Learning Subspace Arrangements

There are several known algorithms for learning subspace arrangements. For a survey, the reader is referred to [98]. Since the subspaces are all of the same dimension s, we project onto a generic subspace of dimension s+1.
s, we project on a generic subspace of dimension s + 1. The projected subspaces preserve their dimensions and distinctness with probability 1. This enables us to work in R^{s+1}. Therefore, without loss of generality, we assume that the subspaces are hyperplanes for the remainder of this section.

3.3.1.1 Random sample consensus

Random Sample Consensus (RANSAC) [98] is an approach to learning subspace arrangements that isolates one subspace at a time via random sampling. When dealing with an arrangement of k s-dimensional subspaces, for instance, the method samples s + 1 points, which is the minimum number of points required to fit an s-dimensional subspace. The procedure then finds and discards inliers by computing the residual of each data point relative to the subspace and selecting the points whose residual is below a certain threshold. The process is iterated until we have k subspaces or all points are fitted. RANSAC is robust to models corrupted with outliers. The following algorithm illustrates random and deterministic RANSAC used for the present problem.

Algorithm FindSubspaces randomized [deterministic]
Input: X, s, k
Output: S
1. S = {}
2. while |S| < k and X is non-empty do
3.   sample [enumerate] a subset of s + 1 points of X
4.   fit an s-dimensional subspace S' through the sampled points
5.   find the inliers: the points of X whose residual relative to S' is below the threshold
6.   if enough inliers are found, add S' to S and remove the inliers from X
7. return S
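Below is a compact numpy sketch of the randomized variant, specialized, per the projection argument above, to hyperplanes through the origin in R^{s+1}. The threshold, iteration cap, and acceptance rule are illustrative choices rather than part of the algorithm as stated:

import numpy as np

def find_subspaces_ransac(X, s, k, thresh=1e-8, max_iter=1000, rng=None):
    """Randomized RANSAC for an arrangement of k s-dimensional subspaces,
    represented as hyperplanes through the origin in R^{s+1}: sample s+1
    points, fit, peel off the inliers, repeat."""
    rng = rng or np.random.default_rng()
    X = np.asarray(X, dtype=float)
    S = []
    for _ in range(max_iter):
        if len(S) == k or len(X) == 0:
            break
        idx = rng.choice(len(X), size=s + 1, replace=False)
        # The normal of the hyperplane through the sampled points is the
        # last right-singular vector of the sample matrix.
        _, _, Vt = np.linalg.svd(X[idx])
        normal = Vt[-1]
        residuals = np.abs(X @ normal)      # distance to the hyperplane
        inliers = residuals < thresh
        if inliers.sum() > s + 1:           # accept and discard inliers
            S.append(normal)
            X = X[~inliers]
    return S

The deterministic variant replaces the random sample by an enumeration of all (s+1)-subsets.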
3.3.1.2 Generalized principle component analysis

Generalized PCA (GPCA) is a method for subspace clustering using techniques from algebraic geometry [98]. GPCA fits a union of k subspaces using a set of polynomials P of order k. To understand this, observe that every hyperplane S_i can be parametrized by a vector b_i normal to it. Since x_i is drawn from the union of k subspaces in R^d, it can be represented with polynomials of the form P(x) = <b_1, x> ... <b_k, x> = 0. The procedure begins by fitting a homogeneous polynomial of degree k to the points {x_1 ... x_n}. P can be written as c^T v_k(x), where c is a vector of coefficients and v_k is the vector of all C(n + k - 1, n) monomials of degree k in x. Therefore, to fit the polynomial, we can solve c^T v_k(x_1) = ... = c^T v_k(x_n) = 0 for c. In the case of noisy data, c can be fitted using least-squares. The set of vectors {b_1 ... b_k} is obtained by taking the derivative of P evaluated at a single point in each subspace. We note that GPCA can also determine k if it is unknown.

3.3.1.3 Combinatorics of subspace clustering

We can evaluate the combinatorial performance of the subspace clustering algorithms presented by viewing the problem in a balls and bins framework, which in general studies the distribution of a probabilistic experiment where a number of balls are independently and randomly thrown into bins [68]. Here, the balls are the sample points and the unknown subspaces are the bins.

Theorem 3.1. The expected number of subspaces that can be recovered from n samples is E(n, k, s) = (1 / k^{n-1}) sum_{j=s+1}^{n} C(n, j) (k - 1)^{n-j}.

Proof. Observe that, in general position, the combinatorial structure of this problem is equivalent to the classic framework of balls and bins. Let E(n, j) denote the expected number of bins containing exactly j balls after n throws. Observe that E(n, j) satisfies the recurrence relation E(n, j) = (1 - 1/k) E(n-1, j) + (1/k) E(n-1, j-1), which solves to E(n, j) = k C(n, j) (1 - 1/k)^{n-j} (1/k)^j. Then sum over all j > s.
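The closed form in Theorem 3.1 is easy to sanity-check numerically. The following sketch (the parameters n, k, s are illustrative) compares the formula against a Monte Carlo estimate of the number of bins receiving more than s balls:

import numpy as np
from math import comb

def expected_recoverable(n, k, s):
    # E(n, k, s) = (1/k^{n-1}) * sum_{j=s+1}^{n} C(n, j) (k-1)^{n-j}
    return sum(comb(n, j) * (k - 1) ** (n - j)
               for j in range(s + 1, n + 1)) / k ** (n - 1)

def monte_carlo(n, k, s, trials=20000):
    rng = np.random.default_rng(0)
    bins = rng.integers(0, k, size=(trials, n))     # n balls into k bins
    hist = np.stack([(bins == b).sum(axis=1) for b in range(k)], axis=1)
    return (hist > s).sum(axis=1).mean()            # bins with > s balls

n, k, s = 30, 5, 2
print(expected_recoverable(n, k, s), monte_carlo(n, k, s))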
We give a tighter approach to analyzing dictionary learning, relying on the algebraic and combinatorial geometry of the problem, in the next chapter.

3.3.2 s-Independent Smallest Spanning Set

Given a subspace arrangement S coming from an s-independent dictionary, the smallest spanning set can be written recursively in terms of the union of: (a) the spanning set of the arrangement, obtained by taking the union of the pairwise intersections of all the subspaces in S, together with (b) points outside the pairwise intersections that would be necessary and sufficient to completely s-span the subspaces in S. This directly leads to a recursive algorithm for the smallest spanning set problem, as follows:

Algorithm SmallestSpanningSet
Input: S
Output: I
1. S' = PairwiseIntersection(S)
2. if S' is non-empty
3.   return SmallestSpanningSet(S union S')
4. else:
5.   SortByIncreasingDimension(S)
6.   I = {}
7.   for S_i in S:
8.     find S_i intersect s-span(I)
9.     find S_{i/I} such that S_{i/I} union (S_i intersect s-span(I)) is a basis for S_i
10.    I = I union S_{i/I}
11. return I

The dictionary is now obtained by picking m atoms from the intersections of the subspaces and the remaining spanning sets.
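The PairwiseIntersection step is plain linear algebra: if A c = B d for bases A and B, the stacked vector (c, -d) lies in the null space of [A B]. A minimal numpy sketch for exact (noiseless) data:

import numpy as np

def subspace_intersection(A, B, tol=1e-10):
    """Orthonormal basis for span(A) intersect span(B); the basis
    vectors of each subspace are the columns of A and B."""
    M = np.hstack([A, B])
    Vt = np.linalg.svd(M)[2]
    rank = np.linalg.matrix_rank(M, tol=tol)
    null = Vt[rank:]                       # rows spanning the null space of M
    inter = A @ null[:, :A.shape[1]].T     # map coefficient parts back through A
    q, _ = np.linalg.qr(inter)
    return q

For example, for the xy- and xz-coordinate planes in R^3 this returns a basis for the x-axis.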
3.4 Learning an Orthogonal Basis

We turn our attention to an interesting special case. In applications such as Compressed Sensing, discussed in Chapter 2, a signal is known to be sparse in some basis B. This is utilized to obtain significant sampling complexity gains beyond the Shannon-Nyquist bound. Here we address the problem of efficiently discovering B when it is unknown.

Definition 10 (Basis Learning). Let X be a given set of vectors in R^d that are known to be generated from an orthogonal basis B, with x_i = B y_i, where ||y_i||_0 <= s. Find any orthogonal basis B', such that for all x_i in X, there exists y_i in R^d where x_i = B' y_i.

For the special case of a basis with non-negative coefficients y_i, and s = o(sqrt(d)), we obtain a relatively simple algorithm for Basis Learning. This algorithm is based on Lemma 1 below, which asserts that, in this setting, there is a significant chance for random samples to be orthogonal.

Lemma 1. Two random sparse samples x_i and x_j from a basis B with s = o(sqrt(d)) are orthogonal with constant probability as d -> infinity.

Proof. Generically, the probability depends on whether the supports are mutually exclusive: Pr(<x_i, x_j> = 0) = prod_{i=0}^{s} (d - s - i)/(d - i) > prod_{i=0}^{s} (d - 2s)/(d - s) > e^{-s^2/(d-s)}.

Given a point x_i, and a set of points X_i^perp = {x_{i1} ... x_{il}} that are all orthogonal to x_i, such that dim span(X_i^perp) = d - s, we can partition B into B_i and B_i^perp such that B_i^perp spans X_i^perp and B_i is the null space of B_i^perp. This suggests a learning algorithm that recursively subdivides B into subspaces:

Algorithm LearnBasis
Input: X, s
Output: B
1. B = {}
2. for x_i in X
3.   let X_i^perp = {x_j | x_j in X, <x_i, x_j> = 0}
4.   if rank(X_i^perp) >= d - s
5.     then B_1 = LearnBasis(Project X onto X_i^perp, s)
6.     B_2 = LearnBasis(Project X onto null(X_i^perp), s)
7.     B = B_1 union B_2
8. repeat
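A minimal numpy sketch of this recursion, assuming exact, noiseless samples (the rows of X) and using an SVD to split R^d into span(X_i^perp) and its null space; the names and tolerances are illustrative:

import numpy as np

def learn_basis(X, s, tol=1e-10):
    """Recursive splitting sketch: returns an orthonormal d-by-d matrix
    whose column blocks are the mutually orthogonal subspaces found."""
    d = X.shape[1]
    for x in X:
        Xp = X[np.abs(X @ x) < tol]          # samples orthogonal to x
        if len(Xp) == 0:
            continue
        U, svals, _ = np.linalg.svd(Xp.T, full_matrices=True)
        r = int((svals > tol).sum())         # dim span(X_i^perp)
        if d - s <= r < d:
            B1, B2 = U[:, :r], U[:, r:]      # span(X_i^perp) and its null space
            # recurse on the coordinates of X within each block, then lift back
            return np.hstack([B1 @ learn_basis(X @ B1, s, tol),
                              B2 @ learn_basis(X @ B2, s, tol)])
    return np.eye(d)                          # no further split found: stop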
3.5 Summary

In this chapter, we illustrated how Dictionary Learning can be approached from a geometric point of view. A formal connection is established to other problems of independent interest. This disciplined way of looking at the problem leads to specific algorithms, with improvements for simplified instances that are encountered in practice, as in Section 3.4 (and as we shall see in Chapter 5). In the next chapter, we study the complexity of Dictionary Learning. This is done with the help of a surrogate problem in which the combinatorics of the subspace supports are specified. The problem is then approached using machinery from algebraic and combinatorial geometry. Specifically, a rigidity-type theorem is obtained that characterizes the sample arrangements that recover a finite number of dictionaries, using a purely combinatorial property on the supports.
CHAPTER 4
SAMPLING COMPLEXITY FOR GEOMETRIC DICTIONARY LEARNING

For motivations discussed earlier, we are interested in understanding the problem of geometric dictionary learning. From Chapter 2, recall the definition of Geometric Dictionary Learning. As discussed in Chapter 2, even the problem of recovering Y given X when D is known has been shown to be NP-hard by reduction from the Exact Cover by 3-sets problem [24]. One is then tempted to conclude as a corollary that Geometric Dictionary Learning is also NP-hard. However, this cannot be directly deduced, in general. The error with this reasoning is that, even though adding a witness D turns the problem into an NP-hard problem (vector selection), it is possible that Geometric Dictionary Learning solves to produce a different dictionary D'.

Now we introduce a surrogate problem where the combinatorics are specified. In this problem we are also given the supports of the input set in the unknown dictionary. This new problem is called the Restricted Dictionary Learning problem, properly defined in what follows.

Definition 11 (Restricted Dictionary Learning). Let X be a given set of vectors in R^d. We are also given an index set S_i of the s columns of an unknown s-regular dictionary D = [d_1, ..., d_m] such that x_i in span(d_j, j in S_i). Find any frame dictionary D', such that m' = |D'| <= m, and for all x_i in X, x_i in span(d'_j, j in S_i).

This simplification enables us to analyze the problem using machinery from algebraic and combinatorial geometry. For the time being, we restrict ourselves to s = 2, d = 3. We can project the system on the affine plane so that each d_i maps to p_i, a 2D point. For notational convenience, we call the projection of x_k simply x_k, where the meaning is clear from the context. This defines a corresponding problem in the projective plane:

Definition 12 (Pinned Line-Incidence Problem). Let X be a given set of points (pins) in P^2(R). We are also given an index set S_i of the lines passing through an unknown set of
points P = {p_1, ..., p_m}, such that no 3 points lie on the same line. Find any such set of points P that satisfies the given S_i line incidences on P union X.

Pinned Line-Incidence is related to classic problems in the literature such as Direction Networks [155, 168, 169], Direction Networks with pins on sliders [172], the Molecular conjecture in 2D [160], and pin-collinear body-pin [171]. Pinned Line-Incidence can be viewed as:
- Direction Networks, except that we are given translations (the x_i's) instead of directions.
- A Body-Pin framework where each body is a line and each pin is on at most 2 bodies. We add that each pin is constrained to a slider.
- A Point-Line coincidence with at most 2 points on a line, where each line is pinned by a globally fixed (given) pin.
- A Point-Line coincidence with at most 2 lines on a point, where each point is on a globally fixed (given) slider.

However, while these previously studied problems are linear, we shall see in Section 4.1 that our problem is further complicated by being non-linear (the constraints are quadratic). The notion of Affine Rigidity [164] is also related to our problem when viewed in a coordinate-free manner (i.e. only relative positions are relevant). Affine rigidity asks, for a given set of points in R^n, when do measurements of the relative affine positions of some subsets of the points determine the positions of all the points, up to an overall affine transformation.

Example 7. Figure 4-1 depicts two examples of the Pinned Line-Incidence Problem.

Our problem is similar in flavor to body-pin frameworks, as in the 2D molecular theorem, where each body is a line and each pin is on at most 2 bodies. We then add the constraint that each pin lies on a slider. It is also similar to point-line coincidences, with at most 2 points on a line, and each line pinned by a globally fixed (given) pin. Looking at the dual space, we may also view the problem as point-line coincidences, with at most 2 lines on a point and each point on a globally fixed (given) slider.
Figure 4-1. Two simple arrangements. The larger (blue) dots are the points P and the small (grey) dots are the given pins X.

We prove combinatorial conditions that characterize the inputs that recover a finite number of solutions for P. Note that 2-regularity, i.e. the requirement that no 3 points in P are collinear, simplifies the problem by avoiding badly behaved cases such as Pappus's Theorem [165].

4.1 Algebraic Representation

We can derive algebraic systems of equations representing our problem. Now consider a point x_k on the line connecting p_i and p_j. Working in the projective plane and using homogeneous coordinates, the point can be written in terms of p_i and p_j as follows:

(1 - lambda) p_{i,1} + lambda p_{j,1} = x_{k,1}
(1 - lambda) p_{i,2} + lambda p_{j,2} = x_{k,2}.

Solving to remove lambda, we obtain:
p_{i,1} p_{j,2} - p_{j,1} p_{i,2} - x_{k,2} (p_{i,1} - p_{j,1}) + x_{k,1} (p_{i,2} - p_{j,2}) = 0.

Note that this is in fact the same statement as (d_i - x_k) x (d_j - x_k) = 0 in homogeneous space. The problem now reduces to solving a system of equations, each of the above form. The system of equations sets a multivariate function to zero, F(P, X) = 0:

F(P, X) = [ ..., p_{i,1} p_{j,2} - p_{j,1} p_{i,2} - x_{k,2} (p_{i,1} - p_{j,1}) + x_{k,1} (p_{i,2} - p_{j,2}), ... ] = 0,

where F(P, X) is a vector-valued function from R^{|P|+|X|} to R^{|X|}. Henceforth, when we view X as a fixed parameter, we let F_X(P) = F(P, X), a function from R^{|P|} to R^{|X|} parametrized by X.

Example 8. For the example in Figure 4-2, the polynomial system F_X(P) = 0 can now be written as:

Figure 4-2. A simple arrangement of 6 points.
p_{1,1} p_{2,2} - p_{2,1} p_{1,2} - x_{1,2} (p_{1,1} - p_{2,1}) + x_{1,1} (p_{1,2} - p_{2,2}) = 0
p_{1,1} p_{2,2} - p_{2,1} p_{1,2} - x_{2,2} (p_{1,1} - p_{2,1}) + x_{2,1} (p_{1,2} - p_{2,2}) = 0
p_{1,1} p_{3,2} - p_{3,1} p_{1,2} - x_{3,2} (p_{1,1} - p_{3,1}) + x_{3,1} (p_{1,2} - p_{3,2}) = 0
p_{1,1} p_{3,2} - p_{3,1} p_{1,2} - x_{4,2} (p_{1,1} - p_{3,1}) + x_{4,1} (p_{1,2} - p_{3,2}) = 0
p_{2,1} p_{3,2} - p_{3,1} p_{2,2} - x_{5,2} (p_{2,1} - p_{3,1}) + x_{5,1} (p_{2,2} - p_{3,2}) = 0
p_{2,1} p_{3,2} - p_{3,1} p_{2,2} - x_{6,2} (p_{2,1} - p_{3,1}) + x_{6,1} (p_{2,2} - p_{3,2}) = 0

which can now be solved using your favorite systems solver, such as the Sage open source mathematics software based on Python. Notice that every point x_k is a constraint that can, potentially, remove a single degree of freedom from our problem. Without any pins, the arrangement has a total of 2m degrees of freedom.
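For instance, the system for Example 8 can be assembled symbolically and handed to such a solver. The sketch below uses sympy (Sage accepts essentially the same code); the pin values are illustrative, chosen consistent with Example 10 later in this chapter, and the final check verifies one known solution rather than enumerating all of them:

import sympy as sp

# Unknown projected dictionary points p1, p2, p3 (6 unknowns).
p11, p12, p21, p22, p31, p32 = sp.symbols('p1_1 p1_2 p2_1 p2_2 p3_1 p3_2')

def pin_eq(pi1, pi2, pj1, pj2, xk1, xk2):
    # p_{i,1} p_{j,2} - p_{j,1} p_{i,2} - x_{k,2}(p_{i,1}-p_{j,1}) + x_{k,1}(p_{i,2}-p_{j,2})
    return pi1*pj2 - pj1*pi2 - xk2*(pi1 - pj1) + xk1*(pi2 - pj2)

R = sp.Rational
eqs = [pin_eq(p11, p12, p21, p22, R(-2, 3), 1),   # pins on the line p1-p2
       pin_eq(p11, p12, p21, p22, R(-4, 3), 0),
       pin_eq(p11, p12, p31, p32, R(2, 3), 1),    # pins on the line p1-p3
       pin_eq(p11, p12, p31, p32, R(4, 3), 0),
       pin_eq(p21, p22, p31, p32, -1, -1),        # pins on the line p2-p3
       pin_eq(p21, p22, p31, p32, 1, -1)]

# One known solution of this illustrative instance:
sol = {p11: 0, p12: 2, p21: -2, p22: -1, p31: 2, p32: -1}
assert all(e.subs(sol) == 0 for e in eqs)
# sp.solve(eqs, dict=True) would enumerate the (generically finitely many) solutions.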
4.2 Combinatorial Rigidity

As shown in the previous section, the problem can be viewed as finding the common solutions of a system of polynomial equations (a real algebraic variety). The combinatorics of the problem can be viewed as a multi-graph G = (V, E), where each edge represents the pin constraints x_i. We describe the approach taken by the tradition of rigidity theory [155, 156, 159, 160, 168, 172], and give some of the definitions [173]. Generally, a solution exists if the polynomials are independent. However, checking the independence of the polynomial system is equivalent to checking whether one of the polynomials of the system is in the ideal generated by the others. In general, checking independence relative to the ideal generated by the variety is computationally hard, and the best known algorithms, such as computing Grobner bases, are exponential in time and space [58]. However, the algebraic system is usually linearized at regular (locally flat, i.e. non-singular) points, and then independence can be checked for generic configurations.

A Pinned Line-Incidence Framework is a triple (G, X, P), where G = (V, E) is a graph; X: {x_1, ..., x_m} -> E is an assignment of a given set of points x_i in R^2 to edges X(x_i) = (v_{i1}, v_{i2}) in E; and P: V -> R^2 is an embedding of each vertex v_j to a point p_j = P(v_j) in R^2, such that for each x_i the three points x_i, p_{i1}, p_{i2} are collinear. Note: when the context is clear, we use X to denote both the set of points {x_1, ..., x_m} and the above assignment of these points to edges of G.

Two frameworks (G_1, X_1, P_1) and (G_2, X_2, P_2) are equivalent if G_1 = G_2 and X_1 = X_2, i.e. they satisfy the same algebraic equations for the same labeled graph and ordered set of pins. They are congruent if they are equivalent and P_1 = P_2.

Independence of the algebraic system is defined as: none of the algebraic constraints is in the ideal generated by the others. Independence implies the existence of a solution. Rigidity is the existence of at most finitely many solutions to the algebraic system F_X(P) = 0. Minimal Rigidity is the existence of a solution and at most finitely many solutions. Global Rigidity is the existence of at most 1 solution.

Under the appropriate conditions of Lemma 3, rigidity and independence (based on nonlinear polynomials) can be captured by linear conditions in an infinitesimal setting. Consider the infinitesimal motion vector (infinitesimal flex) of a pair of points p_i and p_j constrained by a pin x_k, as in Figure 4-3; their velocities v_i and v_j satisfy

<v_i, n_l> r_{i,k} + <v_j, n_l> (d_{i,j} - r_{i,k}) = 0,

where n_l is the normal to the line, d_{i,j} the distance between p_i and p_j, and r_{i,k} the distance between the pin and p_i. Now consider n_l, the normal vector to the line l_{i,j}; n_l can be written as [cos theta_{i,j}, sin theta_{i,j}]. If v_P is a column vector whose entries are the velocities of all v_i, then the above constraint translates to a row of the form:

[0 ... 0, r_{i,k} cos theta_{i,j}, r_{i,k} sin theta_{i,j}, 0 ... 0, (d_{i,j} - r_{i,k}) cos theta_{i,j}, (d_{i,j} - r_{i,k}) sin theta_{i,j}, 0 ... 0] v_P = 0.
Figure 4-3. The velocities of a pair of points p_i and p_j constrained by a pin x_k.

If the p_i's are not coincident (i.e. d_{i,j} != 0), we can divide the row by d_{i,j}. Moreover, since the number of lines is finite, a coordinate system can be selected such that cos theta_{i,j} is not zero. Therefore, the row pattern can be simplified (with a_k = r_{i,k}/d_{i,j} and b_k = tan theta_{i,j}) to:

[0 ... 0, a_k, a_k b_k, 0 ... 0, (1 - a_k), (1 - a_k) b_k, 0 ... 0].

A Rigidity Matrix is a matrix whose kernel is the infinitesimal motions (flexes). Infinitesimal Independence is defined as independence of the rows of the rigidity matrix, i.e. the number of rows of the rigidity matrix is the same as its rank. Infinitesimal Rigidity is the full rank of the rigidity matrix. Infinitesimal Minimal Rigidity is when there are exactly enough independent rows of the rigidity matrix to determine the variables.

In algebraic geometry, saying that a property is generic intuitively means that the property holds on the open dense complement of a (real) algebraic variety. Formally,

Definition 13. A framework G(p) is generic w.r.t. a property Q if and only if there exists a neighborhood N(p) such that for all q in N(p), p satisfies the property Q if and only if q satisfies Q.

Furthermore, we can define generic properties of the graph:
Definition 14. A property Q of frameworks is generic (i.e., becomes a property of the graph alone) if for all graphs G, either all generic (w.r.t. Q) frameworks of G satisfy Q, or all generic (w.r.t. Q) frameworks of G do not satisfy Q.

A framework is generic for Q if an algebraic variety V_Q specific to Q is avoided by the given framework. Often, for convenience in relating Q to other properties, a more restrictive notion of genericity is used than stipulated by Definitions 13 or 14 above, as in Lemma 3. I.e., for convenience, another variety V'_Q is chosen so that V_Q is contained in V'_Q. Ideally, the variety V_Q corresponding to the chosen notion of genericity should be as tight as possible for the property Q (necessary and sufficient for Definitions 13 and 14), and should be explicitly defined, or at least easily testable for a given framework.

Once an appropriate notion of genericity is defined for which a property Q is generic, we can treat Q generically as a property of a graph. The primary activity of the area of combinatorial rigidity is to additionally give purely graph-theoretic characterizations of such generic properties Q. In the process of drawing such combinatorial characterizations, the notion of genericity may have to be further restricted, i.e., the variety V_Q is further expanded by so-called pure conditions that are needed for the combinatorial characterization to go through (we will see this below in Theorem 4.1). Note that the generic rank of a matrix M is at least as large as the rank of any specific realization M(p).

4.3 Required Graph Properties

The following gives a pure graph property that will be useful for our purposes.

Definition 15. The (2,0)-tightness condition suitable for our problem is defined on a graph G such that:
- |E| = 2|V|.
- For any subset V' of V, the augmented induced subgraph G' = (V', E') satisfies |E'| <= 2|V'|. Here G' is the vertex-induced subgraph augmented with a self-loop at v_i whenever there are two edges from the same vertex v_j in V - V' to the same vertex v_i in V'.

A graph is called (2,0)-sparse if it satisfies the second condition of Definition 15.

Example 9. Figure 4-4 gives an example of a configuration whose support multi-graph is (2,0)-tight.

Figure 4-4.

This is a special case of the (k,l)-sparsity condition studied in [156, 158, 168]. Borrowing a relevant concept from graph matroids, we define a circuit graph as follows.

Definition 16. A circuit is a (multi)graph G = (V, E), such that E = T union {e}, where T is a minimal spanning tree of V and e is an arbitrary edge.

The following lemma gives a useful characterization of (2,0)-tight graphs in terms of circuits.

Lemma 2. A graph G = (V, E) is composed of the union of 2 edge-disjoint circuits, G_1 = (V, E_1) and G_2 = (V, E_2), if and only if the graph G is (2,0)-tight.

Proof. A theorem by Tutte and Nash-Williams [167] shows that a graph can be covered by 2 edge-disjoint spanning trees if and only if the graph is (2,2)-sparse. Since the circuits are made of trees plus arbitrary edges, it is easy to see that any (multi)graph can be covered by 2 edge-disjoint circuits if and only if the graph is (2,0)-tight.
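Definition 15 can be tested by brute force on small support multi-graphs. A sketch (exponential in |V|; the augmentation rule is implemented exactly as read off Definition 15):

from itertools import combinations

def is_20_tight(V, E):
    """Brute-force (2,0)-tightness test for a multigraph.
    V: iterable of vertices; E: list of edges (u, v), parallel edges repeated."""
    V = list(V)
    if len(E) != 2 * len(V):
        return False
    for r in range(1, len(V) + 1):
        for Vp in combinations(V, r):
            Vp_set = set(Vp)
            # edges induced by V'
            count = sum(1 for (u, v) in E if u in Vp_set and v in Vp_set)
            # augmentation: a self-loop at v_i in V' for every outside vertex
            # v_j joined to v_i by two (or more) parallel edges
            for vi in Vp_set:
                for vj in set(V) - Vp_set:
                    mult = sum(1 for (u, v) in E if {u, v} == {vi, vj})
                    if mult >= 2:
                        count += 1
            if count > 2 * len(Vp):
                return False
    return True

For instance, the doubled triangle supporting Example 8, is_20_tight([1, 2, 3], [(1, 2), (1, 2), (1, 3), (1, 3), (2, 3), (2, 3)]), returns True.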
4.4 Rigidity Theorem in d = 3, s = 2

We state the main result here and then work our way to the proof.

Theorem 4.1. In d = 3, the restricted dictionary learning problem is generically minimally rigid (i.e. admits a finite number of solutions) if and only if the supports' multi-graph is (2,0)-tight.

The proof follows the tradition of rigidity theory [155, 156, 159, 160, 168, 172]. In particular we adopt an approach by White and Whiteley [155, 169] for proving rigidity of k-frames. The proof outline is as follows:
- We show that, for our system, infinitesimal rigidity is equivalent to rigidity at a regular point (Lemma 3).
- We obtain a simple form for the rigidity matrix (Lemma 4) and show that this matrix is equivalent to the Jacobian of the algebraic function F_X(P).
- We show that for a specific form of the rows of a matrix, defined on circuit graphs, the determinant is not identically zero (Lemma 5).
- We apply a Laplace decomposition to the (2,0)-tight graph, as a sum of two circuits, to show that the determinant of the rigidity matrix is not identically zero (Proof of Main Theorem).

The resulting polynomial is called the pure condition: the relationship that the system has to satisfy in order for the combinatorial characterization to hold.

It is shown in [153] that, at a regular point, infinitesimal rigidity is equivalent to generic rigidity. We adapt this connection for our problem as follows.

Lemma 3. If P and X are regular points, then generic infinitesimal rigidity is equivalent to generic rigidity.

Proof sketch. First we show that if a framework is regular, infinitesimal rigidity implies rigidity. Consider the polynomial system of equations F(X, P) = 0. The Implicit Function Theorem states that there exists a g(x), such that P = g(X) on some open interval, if and only if the Jacobian J_X(P) of F(X, P) w.r.t. P has full rank. Therefore, if the system is infinitesimally rigid, then the solutions to the algebraic system are isolated points (otherwise g(x) could not be explicit). Since the algebraic system contains finitely many components, there are only finitely many such solutions and each solution is a 0-
dimensional point. This implies that the total number of solutions is finite, which is the definition of rigidity.

To show that generic rigidity implies generic infinitesimal rigidity, we take the contrapositive: if the system is not infinitesimally rigid, we show that there is a finite flex. If (G, P, X) is not infinitesimally rigid, then the rank r of the Jacobian J_X(P) is less than 2m. Let E' be a set of edges in G such that |E'| = r and the corresponding rows in J_X(P) are all independent. There are r independent columns as well. Let P_E be the components of P corresponding to those r columns and P_{E^perp} be the remaining components. The r-by-r submatrix, made up of the corresponding independent rows and columns, is invertible. Then, by the Implicit Function Theorem, in a neighborhood of P there exists a continuous and differentiable function g such that P_E = g(P_{E^perp}). This identifies P, whose components are P_E and the level set of g corresponding to P_E, such that F_X(P) = 0. The level set defines the finite flexing of the system. Therefore the system is not rigid.

Next we construct a simple rigidity matrix which is the Jacobian of the function F_X(P).

Lemma 4. The rigidity matrix M whose rows are of the form

[0 ... 0, a_k, a_k b_k, 0 ... 0, (1 - a_k), (1 - a_k) b_k, 0 ... 0]

is the Jacobian of F_X(P).

Proof. The Jacobian J_X(P) can be written by taking the derivatives of F_X(P) w.r.t. the points p_i. The rows of the Jacobian are of the form:

[..., p_{j,2} - x_{k,2}, -p_{j,1} + x_{k,1}, ..., -p_{i,2} + x_{k,2}, p_{i,1} - x_{k,1}, ...].

This form can be readily seen as equivalent to the row pattern above, by noticing that the row constraints correspond to (p_i - x_k) and (p_j - x_k) projected on the coordinate system.
The infinitesimal motions are therefore given by the solutions to M v_P = 0, where M is the rigidity matrix of the constraints: columns represent the two degrees of freedom of each p_i, and rows represent constraints imposed by the pins on the velocities of the p_i's.

Example 10. Consider the arrangement given in Example 8. If the pins are given as the columns of

X = [ -2/3  -4/3  -1   1  4/3  2/3 ]
    [    1     0  -1  -1    0    1 ]

then the unknown dictionary is

D = [ 0  -2   2 ]
    [ 2  -1  -1 ]

and the resulting 6-by-6 rigidity matrix M, assembled one row per pin following the pattern above, is full rank.

Note that there are several correct ways to write the rigidity matrix of the problem, depending on what one considers as the primary components of the columns (points, lines, or both), and whether one chooses to work in primal or dual space. We pick points for columns and work in primal space for the simplicity of the row pattern.
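The full-rank claim can be checked numerically by assembling M row by row from the Jacobian pattern of Lemma 4. In the sketch below the pin-to-edge matching is inferred from collinearity with the stated solution; it is an illustration, not this work's own code:

import numpy as np
from fractions import Fraction as F

# Points (projected dictionary) and pins of Example 10.
p = {1: (0, 2), 2: (-2, -1), 3: (2, -1)}
pins = [((1, 2), (F(-2, 3), 1)), ((1, 2), (F(-4, 3), 0)),
        ((1, 3), (F(2, 3), 1)),  ((1, 3), (F(4, 3), 0)),
        ((2, 3), (-1, -1)),      ((2, 3), (1, -1))]

def jacobian_row(i, j, x):
    row = np.zeros(6)
    (pi1, pi2), (pj1, pj2), (x1, x2) = p[i], p[j], x
    row[2*i-2: 2*i] = (float(pj2 - x2), float(x1 - pj1))
    row[2*j-2: 2*j] = (float(x2 - pi2), float(pi1 - x1))
    return row

M = np.array([jacobian_row(i, j, x) for ((i, j), x) in pins])
assert np.linalg.matrix_rank(M) == 6     # full rank: infinitesimally rigid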
We develop one more lemma, on the generic rank of particular matrices defined on circuit graphs, that will be helpful in analyzing the rank of M.

Lemma 5. A matrix N defined on a directed circuit G = (V, E), such that columns are indexed by the vertices and rows by the edges, where the row for e_{i,j} in E has non-zero entries only at the indices corresponding to v_i and v_j, following the pattern

[0 ... 0, a_k, 0 ... 0, (1 - a_k), 0 ... 0],

is generically full rank.

Proof. Any circuit G can be written as C union {T_1 ... T_s}, where C is a single cycle of core vertices and the T_i are disjoint trees (a forest), such that each T_i has its root vertex v_{T_i} in C. The sub-matrix for the cycle, N[V_C, E_C], can be shown to be generically full rank by writing the columns in their order in the cycle and redirecting the edges such that they all point in the same direction on the cycle. The determinant of the redirected cycle N_R[V_C, E_C] is

det(N_R[V_C, E_C]) = prod_i a_{v_i} + prod_i (1 - a_{v_i}),

which can be simplified to the form

sum_j sign(j) a_1^{j_1} ... a_{|V_C|}^{j_{|V_C|}},

where j ranges from 0 to 2^{|V_C|} - 1, and j_i is the i-th base-2 digit of j. Redirecting amounts to a change of variable of the form a_k -> 1 - a_k. After the change of variables, the two expressions above represent the condition for which N[V_C, E_C] is full rank, i.e. the pure condition for genericity.

For a given tree T_i and a subgraph G_k containing C, we can show by induction on each level of the tree that the matrix N[V_{G_k} union V_{T_i}, E_{G_k} union E_{T_i}] has full rank:
- Base case: since the root of T_i is in C, it is in G_k.
- If a given level i is full rank, then by Gaussian elimination level i + 1 is full rank, since level i and level i + 1 are connected.

Applying the above argument inductively to each T_i, with the base case G_1 = C, completes the proof that circuits have full rank. To calculate the determinant, apply a Laplace expansion splitting the cycle C and the forest union T_i - C, then observe that the determinant of a forest has a single term (either a_k or 1 - a_k) for each vertex. Then

det(N) = det(N_C) prod_k a_k^{d(k)} (1 - a_k)^{1 - d(k)},

where the product is taken over all edges e_k in union T_i - C, and d(k) = 1 if and only if e_k is directed towards C, otherwise d(k) = 0.
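The cycle determinant in Lemma 5 can be verified symbolically for small cycles. A sympy sketch for a directed n-cycle (the sign on the second product depends on the cycle length, which the redirection argument above accounts for):

import sympy as sp

n = 4
a = sp.symbols(f'a1:{n+1}')          # a1 .. a4
N = sp.zeros(n, n)
for k in range(n):                   # edge k: v_k -> v_{k+1 mod n}
    N[k, k] = a[k]
    N[k, (k + 1) % n] = 1 - a[k]

det = sp.expand(N.det())
prod_a = sp.prod(a)
prod_1a = sp.prod([1 - ai for ai in a])
# det = prod a_i - (-1)^n prod (1 - a_i); for odd n the two products add
assert sp.simplify(det - (prod_a - (-1)**n * prod_1a)) == 0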
Now we are ready to prove the main theorem.

Proof of Main Theorem. The characterization follows from proving that the polynomial arising from the determinant of M is not identically zero. Since the number of columns is 2m, it is trivial that 2m pins are necessary. It is also trivial to see that (2,0)-tightness is necessary, since a subgraph G' = (V', E') exceeding its count, |E'| > 2|V'|, implies that its vertex complement G'' = (V - V', E - E(V')) is under-determined, i.e. |E''| < 2|V''|.

Next we show that the converse is true: 2m pins arranged generically in a (2,0)-tight pattern imply infinitesimal rigidity. By grouping and factoring out the columns to separate groups of coordinates (the a's and b's), a simpler matrix is obtained. This can be done by applying a Laplace expansion to rewrite the determinant of the rigidity matrix as a sum of products of determinants (brackets) representing each of the coordinates taken separately:

det(M) = sum_{X,Y} det(M[A, X]) det(M[B, Y]),

where the sum is taken over all complementary sets of m rows X and Y. Observe that the M[A, X]'s now have rows of the form:

[0 ... 0, a_k, 0 ... 0, (1 - a_k), 0 ... 0].

Moreover, since the determinant of a matrix is multi-linear in the rows of that matrix,

det(M[B, Y]) = (b_{c_1,l} ... b_{c_m,l}) det(M'[B, Y]),

where {c_1 ... c_m} = c(l) indexes the Laplace partitions, and the rows of M'[B, Y] are of the form:

[0 ... 0, a_k, 0 ... 0, (1 - a_k), 0 ... 0].

Since both M[A, X] and M'[B, Y] have the same form as N from Lemma 5, their determinants are generically non-zero if the induced graphs are circuits. We conclude that

det(M) = sum_{c(l)} (b_{c_1,l} ... b_{c_m,l}) det(M[A, X]) det(M'[B, Y]),

where the c(l)'s enumerate all the Laplace partitions. Observe that each element of the sum has a unique multi-linear monomial (b_{c_1,l} ... b_{c_m,l}) that generically does not cancel with any of the others, since det(M[A, X]) det(M'[B, Y]) are independent of the b's. This implies that the generic rank of M is 2m if the induced graphs are circuits. Since, from Lemma 2, (2,0)-tight graphs can be partitioned into an edge-disjoint union of two circuits, this completes the proof. Moreover, substituting the values of det(M[A, X]) and det(M'[B, Y]) from Lemma 5 gives the pure condition for genericity.

The Theorem gives a pure condition that characterizes the badly behaved cases (i.e. the conditions of non-genericity that break the combinatorial characterization of infinitesimal rigidity). The pure condition is a function of the a's and b's which can be calculated from the particular realization (framework) using Lemma 5 and the main theorem. Whether it is possible to efficiently test for genericity from the problem's input (the graph and the x_i's) is an open problem. The Theorem requires the following genericities:
- The pure condition, which is a function of a given framework.
- Generic infinitesimal rigidity, which is the generic rank of the matrix (i.e. the rank of the rigidity matrix of a realization being at least as large as the rank of all other realizations).

The relationship between the two notions of genericity is an open question; whether one implies the other is an area of future development. However, each of the above conditions is open and dense. Therefore the notion of genericity for the entire theorem, which satisfies all of the above conditions, is also an open, dense subset of R^{6m}. In other words, the theorem applies on the complement of a closed nowhere-dense set.

Example 11. The pure condition for the polynomial system presented in Example 8 can be calculated from the above. The term corresponding to partitioning at 1, 2, 3 is (b_4 b_5 b_6)(-a_1 a_3 a_4 a_5 + a_1 a_3 a_4 a_6 + a_2 a_3 a_4 a_5 - a_2 a_3 a_4 a_6 + a_1 a_4 a_5 - a_1 a_4 a_6 - a_2 a_4 a_5 + a_2 a_4 a_6), which is the only term that contains b_4 b_5 b_6 as a factor. The entire expression for the pure condition contains 20 such terms.

The next observation introduces a more succinct way of writing the pure condition via Grassmann-Plucker coordinates [174]:

Observation 5. The pure condition for M can be written as det(M) = <phi(M[A]), phi-bar(M[B])>, where:
- phi(M[.]) is the C(2m, m)-entry Grassmann-Plucker coordinate vector.
- M[A] and M[B] are the 2m-by-m submatrices of M consisting of the first and second group of coordinates, respectively.

Furthermore, phi-bar(M[B]) = phi-swap(M[A]) (.) P(b), where:
- phi-swap(M[A]) is phi(M[A]) with coordinates swapped to complementary indices.
- P(b) is the vector of combinations of {b_1 ... b_m} in reverse order.
- (.) is the element-wise product.

4.4.1 Consequences

Now we relate the restricted dictionary learning problem to the general geometric dictionary learning problem. The following is a useful corollary to the main theorem.
Corollary 1. Given a set X = x_1, ..., x_n of n points in R^3, generically there is a D of size m such that D y_i = x_i and ||y_i||_0 <= s only if m >= n/2. Conversely, if m = n/2, and the supports of the x_i (the nonzero entries of the y_i) are known to form a (2,0)-tight graph G, then generically there is at least one and at most finitely many such dictionaries.

Proof. One direction holds because for m ...
CHAPTER 5
FUTURE WORK AND CONCLUSIONS

In this chapter we discuss new questions and ideas relevant to our work. We start by discussing minor results that need further development.

5.1 Minor Results

In this section we discuss some potentially interesting minor results that need further development. We begin with a natural discretization of sparse approximation where the dictionary is picked from the hypercube {-1, 1}^n.

5.1.1 Representation Scheme on the Cube, Moulton Mapping

Suppose we want to restrict ourselves to dictionary atoms picked from the n-dimensional hypercube {-1, 1}^n. Then a natural question arises: given s, what is the minimum approximation error that can be guaranteed for all x in R^n? With a simple linear transformation, L(x) = 2x - 1, one can re-frame the question by asking for a vector x, on the unit ball in R^n, that is far away from every s-dimensional subspace spanned by vectors in {0, 1}^n. When s = n, there is no such vector x. When s ...
Therefore, one can approximate any vector x using an s-sparse subset of the cube, with the constraint that the coefficients are selected from the lattice {0, 1/2^s, 2/2^s, ..., 1}^s, with an error at most n/2^{2s}. The next step for this is to show the optimal error by considering the points that are the generalizations of the in-centers at the surfaces of the cube. This may obtain a tight lower bound on the error.

5.1.2 Cautiously Greedy Pursuit

Consider a Sparse Approximation problem with s ... 1/sqrt(s). In this setting, a greedy solution is possible. A sketch of the solution idea is as follows. Given x and D = {d_1, ..., d_m}, the algorithm finds y by iteratively following the greedy direction on D (i.e. pick the column d_i with the largest inner product with the residual of y) for a step proportional to |y|_2 / sqrt(s). The L2 error converges as O((1 - 1/s)^n) = O(e^{-n/s}) in n steps. Extending this approach may be a topic of future development.
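A runnable sketch of the idea, reading the step length as |residual|_2 / sqrt(s) (the dictionary, sizes, and iteration count are illustrative; a random unit-norm dictionary stands in for an incoherent one):

import numpy as np

def cautious_greedy(x, D, s, n_steps=200):
    """Step toward the atom most correlated with the residual, by a
    length proportional to |residual| / sqrt(s), as sketched above."""
    y = np.zeros(D.shape[1])
    for _ in range(n_steps):
        r = x - D @ y                        # current residual
        i = np.argmax(np.abs(D.T @ r))       # greedy direction
        y[i] += np.sign(D[:, i] @ r) * np.linalg.norm(r) / np.sqrt(s)
        # the error then contracts roughly like (1 - 1/s)^n = O(e^{-n/s})
    return y

d, m, s = 64, 128, 4
rng = np.random.default_rng(1)
D = rng.standard_normal((d, m))
D /= np.linalg.norm(D, axis=0)
x = D[:, :s] @ rng.standard_normal(s)
print(np.linalg.norm(x - D @ cautious_greedy(x, D, s)))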
5.2 Future Work From Chapter 2

This section proposes some directions for future development of our work in Chapter 2. We discuss DRDL trade-offs and the hierarchy question.

5.2.1 DRDL Trade-offs

One interesting direction is to investigate the effect of capping the dimension reduction in various ways (e.g. pooling functions in Smale's framework). Imposing invariance selectivity in the dimension reduction step lowers the embedding dimension at the expense of invertibility, similar to H-max. A more general approach would be to incorporate partial selectivity for invariances, whereby one learns dimension reductions that are lossy for invariances but maintain some variance for a measure of interest. For example, in vision one can attempt to impose partial selectivity for rotational invariance and scale invariance by a dimension reduction that pools over small rotations or small scale shifts. Partial-invariance selectivity can be modeled as the addition of randomness to our generative model. We will explore this direction in future work.

Adding invariance to the Dictionary Learning steps improves the sampling complexity. For instance, time and space share the property of being shift-invariant. One can model the same spatiotemporal block with a single dictionary or with a three-level hierarchy of shift-invariant DRDL reflecting two dimensions of space and one of time. Shift-invariant dictionaries have been studied in the context of learning audio sequences, yielding improved performance empirically [128].

5.2.2 The Hierarchy Question

Towards formally understanding the implications of using a hierarchy, we propose a list of questions. [148] develops a theoretical framework with a similar goal, based upon the H-Max class of models, which we have shown to be qualitatively different from ours. The question of evaluating the (in)significance of hierarchies can be divided into two parts:
1. Capacity of Model: characterize the concept classes expressible relative to the number of levels.
2. Complexity of Learning: characterize the differences in sampling complexity, computational complexity, etc., as a function of the mismatch between the generative and learning models.

When is the hierarchical generative model a good assumption? There are arguments from complex systems theory and evolutionary theory on why our environment is modular [13]. The basic premise is informal and is usually along the lines that natural selection requires stable/robust modules before it can build more complicated ones in a sustainable manner. We seek a more comprehensive answer. To obtain a formal computational understanding of this question, we ask what potential classes of functions our generative models encode. If we restrict ourselves to loss-less DR, then a single layer of DRDL in generative mode can be thought of as a generator of functions given its input as a seed, where a vector corresponds to the value of a single function over
its entire domain. The generative model of HSR then corresponds to a composition of these functions. This is directly connected to a classic goal of Approximation Theory: understanding what happens when applying compositions and superpositions. An effort to understand this question gave rise to Hilbert's 13th problem: whether the 3-variable polynomial x^7 + a x^3 + b x^2 + c x + 1 = 0 can be written as the concatenation of a finite number of two-variable functions. A general answer to Hilbert's 13th problem was given by Arnol'd, showing that every continuous function of three variables can be expressed as a composition of finitely many continuous functions of two variables [147].

Given a hierarchical generative model, what are the complexity trade-offs of using a different depth for the learning hierarchy? For instance, we might want to understand the complexity cost of learning a hierarchical generative model with a single large layer. We touched upon this issue in Section 2.2.2.6. A more comprehensive analysis involving any mismatch, such as different depth hierarchies and different complexity within each layer, is a potential area of future development.

5.3 Future Work From Chapter 3

This section touches upon future extensions and applications of our work in Chapter 3. We discuss potential generalizations and applications of the Cluster and Intersect algorithm.

5.3.1 Cluster and Intersect Extensions

The Cluster and Intersect algorithm can be extended to different dimensions of support in a relatively straightforward way. Both RANSAC and GPCA can be extended to work correctly for a mixture of subspace dimensions. The intersection algorithm can be modified as well.

The geometric approach can be extended to non-zero noise as well. Both the RANSAC and GPCA algorithms extend to noisy subspaces. However, their performance is limited by the amount of noise. For a comprehensive survey, we refer the reader to [98]. The intersection of two subspaces can now be cast as finding whether a
system of linear inequalities has a feasible solution. Given two subspaces S_1 and S_2 and a bound on the error, the problem of intersecting the subspaces can then be cast as:

min_z ..., s.t. S_1^perp z ...
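One concrete way such a feasibility program can be set up is sketched below with scipy; the l-infinity residual bound, the epsilon objective, and the linear normalization excluding z = 0 are illustrative assumptions, since the precise program is not fixed here:

import numpy as np
from scipy.optimize import linprog

def near_intersection(S1_perp, S2_perp, normal_row):
    """Sketch: find z with S1_perp @ z and S2_perp @ z small in the
    l-infinity norm, normalized by an (assumed) linear constraint
    normal_row @ z = 1 to rule out z = 0. Variables: (z, eps)."""
    d = S1_perp.shape[1]
    A = np.vstack([S1_perp, -S1_perp, S2_perp, -S2_perp])
    A_ub = np.hstack([A, -np.ones((A.shape[0], 1))])   # +/- S_i z - eps <= 0
    b_ub = np.zeros(A.shape[0])
    A_eq = np.hstack([normal_row.reshape(1, -1), np.zeros((1, 1))])
    c = np.zeros(d + 1)
    c[-1] = 1.0                                        # minimize eps
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(None, None)] * d + [(0, None)])
    return res.x[:d], res.x[-1]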
The analysis of the algorithm in the temporal coherence setting depends on the details of the random process. An area of future work would be to apply this algorithm to commonly encountered processes and to model the performance of this algorithm both empirically and theoretically (based on empirically motivated idealizations of the generative process).

5.4 Future Work From Chapter 4

This section discusses some directions for future extensions and generalizations of our main result in Chapter 4.

5.4.1 Uniqueness and Number of Solutions

The rigidity theorem of Chapter 4 shows that (2,0)-sparsity generically guarantees a finite number of solutions. In similar problems studied in the literature [155, 160, 168, 169, 171, 172], the resulting algebraic system is linear, therefore generic rigidity implies global rigidity, i.e. a generically rigid system has a single solution. However, this is not the case for our problem, due to non-linearity. Conditions that characterize global rigidity remain an open question.

Example 12. Consider the pin configuration in Figure 5-1 with two distinct solutions:

Figure 5-1. A K5 configuration with two distinct solutions.

Take any K5 (complete graph with |V| = 5) and rotate a copy of it, as illustrated above. The pins are the intersection points between the two K5's.
Using Bezout's theorem [65], we can obtain a weak upper bound on the number of solutions. This follows from the fact that the number of intersections of algebraic surfaces in projective space, counting multiplicities, is simply the product of the degrees of the equations of the surfaces.

5.4.2 Higher Dimensions

Generalizing the theorem to higher dimensions takes the problem to the domain of hypergraphs. A corresponding generalization of Direction Networks [155, 169] to higher dimensions may provide the mathematical tools and background for undertaking this task. One caveat is that the sparsity to be used in the analysis may depend on the arrangement graph. For example, consider Figure 5-2: if we use s = 2 then there are not enough points to determine the dictionary.

Figure 5-2. An arrangement of 6 points in d = 3 that is sufficient to determine their minimum generating dictionary of size 4.

However, if we work with s = 3, and notice that each point is lying on two planes (the faces of the simplex), then it is easy to see that all the faces of the simplex are fixed. This implies that the realization is rigid. Therefore, one approach is to view the problem's combinatorics as a series of hypergraphs for increasing s ...
5.4.3 Genericity

The relationship between the notions of genericity introduced is an open question. In particular, whether infinitesimal rigidity and the pure condition are related is an area of future development. Yet another open direction is to investigate whether it is possible to efficiently test for genericity from the problem's input (the graph and the x_i's).

5.4.4 Computing The Realization

Another open question is computing the realization from the input. Chapter 3 discusses algorithms for solving the Geometric Dictionary Learning problem, but these do not translate to optimal solutions of the Restricted Dictionary Learning problem. An algorithm that meets the theoretical bound of Theorem 1 is an interesting direction of research.

5.5 Future Extension For The Model

This section discusses directions for future development of our model. The proposed model can naturally be extended to incorporate feedback. In neuroscience, the role of feedback is debated. We view feedback as a Bayesian process by which a generative model predicts the incoming sensory stream. In this view, both attention [18] and action are naturally manifest. Action can be interpreted as active inference [37], i.e. sampling the sensorium to minimize free energy. For attention and action, we are interested in feedback forward in time. A dynamic model can be obtained from DRDL/HSR. The first step is to modify the vector selection algorithms for use in prediction in time by inferring only on the portion of the input representing the present and past.

We may extend the model with the ability to learn its own topology. We start with a simple hierarchical topology, reflecting the topographic mapping (in biology this is learned on the evolutionary scale) of the sensory cortex. Over time, connections are modified and new connections are created via experience. A possible on-line approach
to this can be inspired by the Complementary Learning model of the Hippocampus [19]. This views the cortex as a slow learner, with the Hippocampus as a fast learner on top. The Hippocampus remembers associations across the cortex and later, such as during sleep, syndicates its information by modifying the connections within the cortex.

5.6 Conclusion

We introduced a novel formulation of an elemental building block that could serve as the bottom-up piece in the common cortical algorithm. This model leads to several interesting theoretical questions. We illustrated how additional prior assumptions on the generative model can be expressed within our integrated framework. Furthermore, this framework can also be extended to address feedback, attention, action, complementary learning and the role of time.

Dictionary Learning has been approached from a geometric point of view. A connection was established to other problems of independent interest. This disciplined way of looking at Dictionary Learning introduced specific algorithms, with improvements for simplified instances, such as temporal coherence, which are commonly encountered in practice. We investigated Dictionary Learning theoretically using machinery from algebraic and combinatorial geometry. Specifically, a rigidity-type theorem was obtained that characterizes the sample arrangements that recover a finite number of dictionaries, using a purely combinatorial property on the supports. Finally, we discussed some minor results and questions, and outlined the next steps for this work.
REFERENCES

[1] Michal Aharon, Michael Elad, and Alfred Bruckstein. K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation. Structure, 54(11):4311, 2006.

[2] Hisham E. Atallah, Michael J. Frank, and Randall C. O'Reilly. Hippocampus, cortex, and basal ganglia: insights from computational models of complementary learning systems. Neurobiology of Learning and Memory, 82(3):253, November 2004.

[3] K. Kreutz-Delgado, Joseph F. Murray, Bhaskar D. Rao, Kjersti Engan, Te-Won Lee, and Terrence J. Sejnowski. Dictionary Learning Algorithms for Sparse Representation. Neural Computation, 2002.

[4] Francis Bach. Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning. Distribution, (2).

[5] Francis Bach, Inria Willow Project-team, and Guillermo Sapiro. Online Learning for Matrix Factorization and Sparse Coding.

[6] Lucy A. Bates, Phyllis C. Lee, Norah Njiraini, Joyce H. Poole, Katito Sayialel, Soila Sayialel, Cynthia J. Moss, and Richard W. Byrne. Do Elephants Show Empathy? (10):204, 2008.

[7] A. J. Bell. Levels and loops: the future of artificial intelligence and neuroscience. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 354(1392):2013, December 1999.

[8] A. J. Bell and T. J. Sejnowski. An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7(6):1129, November 1995.

[9] Yoshua Bengio and Yann LeCun. Scaling Learning Algorithms towards AI. Large-Scale Kernel Machines, MIT Press, 2007.

[10] Gary G. Blasdel. Orientation Selectivity, Preference, and Continuity in Monkey Striate Cortex. 12(August), 1992.

[11] Jake Bouvrie, Tomaso Poggio, Lorenzo Rosasco, Steve Smale, and Andre Wibisono. Generalization and Properties of the Neural Response. Computer Science and Artificial Intelligence Laboratory Technical Report, 2010.

[12] Jake Bouvrie, Lorenzo Rosasco, and Tomaso Poggio. On Invariance in Hierarchical Models. Advances in Neural Information, 2009.

[13] H. A. Simon. The Architecture of Complexity. Proceedings of the American Philosophical Society, 1962.
[14] Emmanuel J. Candes, Yonina C. Eldar, Deanna Needell, and Paige Randall. Compressed Sensing with Coherent and Redundant Dictionaries. Communications, 2010.

[15] E. Candes. The restricted isometry property and its implications for compressed sensing. Comptes Rendus Mathematique, 346(9-10):589, May 2008.

[16] Gunnar Carlsson, Tigran Ishkhanov, Vin de Silva, and Afra Zomorodian. On the Local Behavior of Spaces of Natural Images. International Journal of Computer Vision, 76(1):1, June 2007.

[17] Damon M. Chandler and David J. Field. Estimates of the information content and dimensionality of natural scenes from proximity distributions. Journal of the Optical Society of America A, Optics, Image Science, and Vision, 24(4):922, April 2007.

[18] Sharat S. Chikkerur, Thomas Serre, and Tomaso Poggio. A Bayesian inference theory of attention: neuroscience and algorithms. October 2009.

[19] McClelland, O'Reilly, and McNaughton. Why There Are Complementary Learning Systems in the Hippocampus and Neocortex: Insights From the Successes and Failures of Connectionist Models of Learning and Memory. Psychological Review, 1995.

[20] Sharat S. Chikkerur, Thomas Serre, Cheston Tan, and Tomaso Poggio. What and where: A Bayesian inference theory of attention. Vision Research, May 2010.

[21] Sharat S. Chikkerur, Cheston Tan, Thomas Serre, and Tomaso Poggio. An integrated model of visual attention using shape-based features. 2009.

[22] Y. Dan, J. J. Atick, and R. C. Reid. Efficient coding of natural scenes in the lateral geniculate nucleus: experimental test of a computational theory. The Journal of Neuroscience, 16(10):3351, May 1996.

[23] Sanjoy Dasgupta and Anupam Gupta. An Elementary Proof of the Johnson-Lindenstrauss Lemma, 1999.

[24] Geoff Davis. Adaptive Nonlinear Approximations. PhD Thesis, New York University, 1994.

[25] Thomas Dean. Scalable Inference in Hierarchical Generative Models. Proceedings of the Ninth International Symposium on Artificial Intelligence and Mathematics, 2006.

[26] Thomas Dean, Glenn Carroll, and Richard Washington. On the Prospects for Building a Working Model of the Visual Cortex. Science, 1999.
[27] Thomas Dean, Rich Washington, and Greg Corrado. Sparse Spatiotemporal Coding for Activity Recognition. Science, (March), 2010.

[28] R. DeVore. Deterministic constructions of compressed sensing matrices. Journal of Complexity, 23(4-6):918, August 2007.

[29] Alexander G. Dimitrov, Aurel A. Lazar, and Jonathan D. Victor. Information theory in neuroscience. Journal of Computational Neuroscience, 30(1):1, February 2011.

[30] D. L. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289, April 2006.

[31] D. L. Donoho, M. Elad, and V. N. Temlyakov. Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Transactions on Information Theory, 52(1):6, January 2006.

[32] Richard Durbin and Graeme Mitchison. A dimension reduction framework for understanding cortical maps. Group, 1990.

[33] M. Elad and Alfred Bruckstein. A generalized uncertainty principle and sparse representation in pairs of bases. IEEE Transactions on Information Theory, 48(9):2558, September 2002.

[34] Harriet Feldman and Karl J. Friston. Attention, uncertainty, and free-energy. Frontiers in Human Neuroscience, 4(December):215, January 2010.

[35] Alyson K. Fletcher and Sundeep Rangan. Orthogonal Matching Pursuit from Noisy Measurements: A New Analysis. Electrical Engineering.

[36] Karl J. Friston. Embodied Inference: or I think therefore I am, if I am what I think. Optimization, pages 89-.

[37] Karl J. Friston. Hierarchical models in the brain. PLoS Computational Biology, 4(11):e1000211, November 2008.

[38] Karl J. Friston, Jean Daunizeau, and Stefan J. Kiebel. Reinforcement learning or active inference? PLoS ONE, 4(7):e6421, January 2009.

[39] Karl J. Friston, Jean Daunizeau, James Kilner, and Stefan J. Kiebel. Action and behavior: a free-energy formulation. Biological Cybernetics, 102(3):227, March 2010.

[40] Karl J. Friston, James Kilner, and Lee Harrison. A free energy principle for the brain. Journal of Physiology, Paris, 100(1-3):70.

[41] Karl J. Friston, Jeremie Mattout, and James Kilner. Action understanding and active inference. Biological Cybernetics, pages 137-, February 2011.

[42] Nello Cristianini and John Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. 2000.
[43] David Haussler. Overview of the Probably Approximately Correct (PAC) Learning Framework, 1995.

[44] Michael J. Kearns and Umesh Vazirani. An Introduction to Computational Learning Theory. MIT Press, 1994.

[45] Karl J. Friston and Klaas E. Stephan. Free-energy and the brain. Synthese, 159(3):417, December 2007.

[46] Jiri Najemnik. Eye movement statistics in humans are consistent with an optimal search strategy. Journal of Vision, 8:1, 2008.

[47] Peter Buhlmann and Abraham Wyner. Variable Length Markov Chains. Annals of Statistics, 27(2):480-513, 1999.

[48] S. A. Nene, S. K. Nayar, and H. Murase. Columbia Object Image Library (COIL-100). Technical Report CUCS-006-96, February 1996.

[49] Imola K. Fodor. A survey of dimension reduction techniques. LLNL Technical Report, 2002.

[50] Jeff Hawkins and Sandra Blakeslee. On Intelligence. Times Books, 2005.

[51] Dileep George and Jeff Hawkins. Towards a mathematical theory of cortical micro-circuits. PLoS Computational Biology, 5(10):e1000532, October 2009.

[52] Larochelle, Bengio, Louradour, and Lamblin. Exploring Strategies for Training Deep Neural Networks. Journal of Machine Learning Research, 2009.

[53] Y. LeCun and Y. Bengio. Convolutional networks for images, speech, and time-series. The Handbook of Brain Theory and Neural Networks. MIT Press, 1995.

[54] S. S. Chen and D. L. Donoho. Atomic decomposition by basis pursuit. SIAM Review, 2001.

[55] Dileep George. How the Brain Might Work: A Hierarchical and Temporal Model for Learning and Recognition. PhD Thesis, Stanford, 2008.

[56] Anna Gilbert and Piotr Indyk. Sparse Recovery Using Sparse Matrices. Proceedings of the IEEE, 98(6):937, June 2010.

[57] Joshua Gluckman. Higher order whitening of natural images. Computer Vision and Pattern Recognition, 2005.

[58] Johannes Mittmann. Grobner Bases: Computational Algebraic Geometry and its Complexity, 2007.

[59] Ben Goertzel, Itamar Arel, and Matthias Scheutz. Toward a Roadmap for Human-Level Artificial General Intelligence: Embedding HLAI Systems in Broad,
Approachable, Physical or Virtual Contexts. Preliminary Draft. Intelligence, 2009.

[60] K. Engan, S. O. Aase, and J. H. Husoy. Method of optimal directions for frame design. Proc. ICASSP, Vol. 5, pp. 2443-2446, 1999.

[61] I. T. Jolliffe. Principal Component Analysis. Springer, 2nd edition, 2002.

[62] Jiri Matousek. Lectures on Discrete Geometry, Chapter 15, 2002.

[63] Jingu Kim and Haesun Park. Sparse Nonnegative Matrix Factorization for Clustering, 2008.

[64] Patrik O. Hoyer. Sparse Nonnegative Matrix Factorization with Sparseness Constraints. Journal of Machine Learning Research, 2004.

[65] Igor V. Dolgachev. Introduction to Algebraic Geometry. 2010.

[66] Noah D. Goodman, Tomer D. Ullman, and Joshua B. Tenenbaum. Learning a theory of causality. Psychological Review, 118(1):110, January 2011.

[67] M. J. D. Powell. Approximation Theory and Methods. Cambridge University Press, 1981.

[68] Devdatt P. Dubhashi and Alessandro Panconesi. Concentration of Measure for the Analysis of Randomised Algorithms. Cambridge University Press, 2005.

[69] I. F. Gorodnitsky and B. D. Rao. Introduction to Approximation Theory. IEEE Transactions on Signal Processing, 45(3):600, March 1997.

[70] Hossein Mobahi, Ronan Collobert, and Jason Weston. Deep Learning from Temporal Coherence in Video. International Conference on Machine Learning, 2009.

[71] T. Serre. Learning a dictionary of shape-components in visual cortex: Comparison with neurons, humans and machines. PhD Thesis, Massachusetts Institute of Technology, Cambridge, MA, April 2006.

[72] Karol Gregor and Yann LeCun. Learning Fast Approximations of Sparse Coding. International Conference on Machine Learning, 2010.

[73] Geoffrey E. Hinton. Learning multiple layers of representation. Trends in Cognitive Sciences, 11(10):428, October 2007.

[74] J. J. Hopfield. Neural Networks and Physical Systems with Emergent Collective Computational Abilities. Proceedings of the National Academy of Sciences, 79(8):2554, April 1982.

[75] J. J. Hopfield. Neurons with Graded Response Have Collective Computational Properties like Those of Two-State Neurons. Proceedings of the National Academy of Sciences, 81(10):3088, May 1984.
[76] Jonathan C. Horton and Daniel L. Adams. The cortical column: a structure without a function. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 360(1456):837, April 2005.

[77] Patrik O. Hoyer. Non-negative sparse coding. Neural Networks for Signal Processing XII (Proc. IEEE Workshop on Neural Networks for Signal Processing), 2002.

[78] Patrik O. Hoyer. Non-negative Matrix Factorization with Sparseness Constraints. Journal of Machine Learning Research, 5:1457, 2004.

[79] A. Hyvarinen, P. O. Hoyer, and M. Inki. Topographic independent component analysis. Neural Computation, 13(7):1527, July 2001.

[80] E. M. Izhikevich. Simple model of spiking neurons. IEEE Transactions on Neural Networks, 14(6):1569, January 2003.

[81] Eugene M. Izhikevich. Which model to use for cortical spiking neurons? IEEE Transactions on Neural Networks, 15(5):1063, September 2004.

[82] Vernon B. Mountcastle. Perceptual Neuroscience: The Cerebral Cortex. Harvard University Press, 1st edition, 1998.

[83] Herbert Jaeger and Harald Haas. Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication. Science, 304(5667):78, April 2004.

[84] Rodolphe Jenatton, Inria Fr, and Francis Bach. Proximal Methods for Sparse Hierarchical Dictionary Learning. Proceedings of the International Conference on Machine Learning, 2010.

[85] Rodolphe Jenatton, Guillaume Obozinski, and Francis Bach. Structured Sparse Principal Component Analysis. Journal of Machine Learning Research, 2010.

[86] Anatoli Juditsky. On Verifiable Sufficient Conditions for Sparse Signal Recovery via L1 Minimization, 2008.

[87] Yan Karklin and Michael S. Lewicki. Learning higher-order structures in natural images. Network (Bristol, England), 14(3):483, August 2003.

[88] Thomas P. Karnowski, Itamar Arel, and Derek Rose. Deep Spatiotemporal Feature Learning with Application to Image Classification. Electrical Engineering.

[89] Kenneth Kreutz-Delgado, Joseph F. Murray, Bhaskar D. Rao, Kjersti Engan, Te-Won Lee, and Terrence J. Sejnowski. Dictionary learning algorithms for sparse representation. Neural Computation, 15(2):349, February 2003.
[90] M. F. Land and R. D. Fernald. The evolution of eyes. Annual Review of Neuroscience, 15(1990):1, January 1992.

[91] Ann B. Lee. Treelets: A Tool for Dimensionality Reduction and Multi-Scale Analysis of Unstructured Data. Journal of Machine Learning Research, 2007.

[92] Ann B. Lee, Boaz Nadler, and Larry Wasserman. Treelets: An adaptive multi-scale basis for sparse unordered data. Annals of Applied Statistics, 2(2):435, June 2008.

[93] Honglak Lee and Andrew Y. Ng. Efficient sparse coding algorithms. Neural Information Processing Systems, 2006.

[94] Tai Sing Lee and David Mumford. Hierarchical Bayesian inference in the visual cortex. Journal of the Optical Society of America A, Optics, Image Science, and Vision, 20(7):1434, July 2003.

[95] Jing Lei, Nicolai Meinshausen, David Purdy, and Vince Vu. The Composite Absolute Penalties Family For Grouped Hierarchical Variable Selection. Annals of Statistics, 2009.

[96] D. A. Leopold, A. J. O'Toole, T. Vetter, and V. Blanz. Prototype-referenced shape encoding revealed by high-level aftereffects. Nature Neuroscience, 4(1):89, January 2001.

[97] M. S. Lewicki and T. J. Sejnowski. Learning overcomplete representations. Neural Computation, 12(2):337, February 2000.

[98] R. Vidal. A Tutorial on Subspace Clustering. In press, 2011.

[99] Tianyang Lv, Shaobin Huang, Xizhe Zhang, and Zheng-xuan Wang. A Robust Hierarchical Clustering Algorithm and its Application in 3D Model Retrieval. First International Multi-Symposiums on Computer and Computational Sciences (IMSCCS'06), pages 560-, June 2006.

[100] K. Zhang. Representation of spatial orientation by the intrinsic dynamics of the head-direction cell ensemble: a theory. The Journal of Neuroscience, 16(6):2112, March 1996.

[101] Peng Zhao and Bin Yu. On Model Selection Consistency of Lasso. Journal of Machine Learning Research, 7:2541, 2006.

[102] Julien Mairal, Francis Bach, Inria Willow Project-team, and Guillermo Sapiro. Online Learning for Matrix Factorization and Sparse Coding. Journal of Machine Learning Research, 11:19, 2010.

[103] E. H. McKinney. Generalized Birthday Problem. The American Mathematical Monthly, 73(4):385, April 1966.
[104] Martin P. Meyer and Stephen J. Smith. Evidence from in vivo imaging that synaptogenesis guides the growth and branching of axonal arbors by two distinct mechanisms. The Journal of Neuroscience, 26(13):3604, March 2006.

[105] Hossein Mobahi, Ronan Collobert, and Jason Weston. Deep learning from temporal coherence in video. Proceedings of the 26th Annual International Conference on Machine Learning, 2009.

[106] Cristopher M. Niell, Martin P. Meyer, and Stephen J. Smith. In vivo imaging of synapse formation on a growing dendritic arbor. Nature Neuroscience, 7(3):254, March 2004.

[107] Bruno A. Olshausen, C. H. Anderson, and D. C. Van Essen. A neurobiological model of visual attention and invariant pattern recognition based on dynamic routing of information. The Journal of Neuroscience, 13(11):4700, November 1993.

[108] Bruno A. Olshausen and D. J. Field. Sparse coding with an overcomplete basis set: a strategy employed by V1? Vision Research, 37(23):3311, December 1997.

[109] Tomaso Poggio and D. Marr. Cooperative Computation of Stereo Disparity. Advancement of Science, 194(4262):283, 2008.

[110] M. Riesenhuber and Tomaso Poggio. Hierarchical models of object recognition in cortex. Nature Neuroscience, 2(11):1019, November 1999.

[111] S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323, December 2000.

[112] Christopher J. Rozell, Don H. Johnson, Richard G. Baraniuk, and Bruno A. Olshausen. Sparse coding via thresholding and local competition in neural circuits. Neural Computation, 20(10):2526, October 2008.

[113] Sylvain Sardy, Andrew G. Bruce, and Paul Tseng. Block Coordinate Relaxation Methods for Nonparametric Wavelet Denoising. Journal of Computational and Graphical Statistics, 9(2):361, June 2000.

[114] Terrence J. Sejnowski and Zachary Mainen. Reliability of Spike Timing in Neocortical Neurons. Advancement of Science, 268(5216):1503, 2008.

[115] Thomas Serre, Gabriel Kreiman, Minjoon Kouh, Charles Cadieu, Ulf Knoblich, and Tomaso Poggio. A quantitative theory of immediate visual recognition. Brain, 165:33, 2007.

[116] Thomas Serre, Aude Oliva, and Tomaso Poggio. A feedforward architecture accounts for rapid categorization. Proceedings of the National Academy of Sciences of the United States of America, 104(15):6424, April 2007.

[117] Thomas Serre, Lior Wolf, Stanley Bileschi, Maximilian Riesenhuber, and Tomaso Poggio. Robust object recognition with cortex-like mechanisms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(3):411, March 2007.
[117] Thomas Serre, Lior Wolf, Stanley Bileschi, Maximilian Riesenhuber, and Tomaso Poggio. Robust object recognition with cortex-like mechanisms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(3):411, March 2007.
[118] M. N. Shadlen and W. T. Newsome. Noise, neural codes and cortical organization. Current Opinion in Neurobiology, 4(4):569, August 1994.
[119] W. R. Softky and C. Koch. The highly irregular firing of cortical cells is inconsistent with temporal integration of random EPSPs. The Journal of Neuroscience, 13(1):334, January 1993.
[120] Pablo Sprechmann and Guillermo Sapiro. Dictionary Learning and Sparse Coding for Unsupervised Clustering. IMA, 2009.
[121] Downing Street and United Kingdom. Biological Cybernetics. Nature, 237(5349):55, May 1972.
[122] Dmitri B. Strukov, Gregory S. Snider, Duncan R. Stewart, and R. Stanley Williams. The missing memristor found. Nature, 453(7191):80, May 2008.
[123] J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319, December 2000.
[124] Misha Tsodyks. Attractor neural networks and spatial maps in hippocampus. Neuron, 48(2):168, October 2005.
[125] Richard Turner. A theory of cortical responses, Karl Friston, 2005. Neuroscience, 2005.
[126] Michael A. Webster, Daniel Kaping, Yoko Mizokami, and Paul Duhamel. Adaptation to natural facial categories. Nature, 428:357, April 2004.
[127] Heiko Wersing and Edgar Korner. Learning optimized features for hierarchical models of invariant object recognition. Neural Computation, 15(7):1559, July 2003.
[128] Boris Mailhe, Sylvain Lesage, Remi Gribonval, Frederic Bimbot, and Pierre Vandergheynst. Shift-Invariant Dictionary Learning for Sparse Representations: Extending K-SVD. Proc. EUSIPCO, 2008.
[129] Julien Mairal, Francis Bach, Andrew Zisserman, and Guillermo Sapiro. Supervised Dictionary Learning. Neural Information Processing Systems, 2008.
[130] Bradley Efron, Trevor Hastie, Iain Johnstone, and Robert Tibshirani. Least Angle Regression. Statistics Department, Stanford University, 2003.
[131] Timothee Masquelier, Thomas Serre, and Tomaso Poggio. Learning complex cell invariance from natural videos: A plausibility proof. CBCL Paper, Massachusetts Institute of Technology, Cambridge, MA, USA, 2007.
[132] Kenneth Miller, Joseph Keller, and Michael Stryker. Ocular Dominance Column Development: Analysis and Simulation. Science, 1989.
[133] Alan Sultan. Linear Programming: An Introduction With Applications. CreateSpace, 2011.
[134] Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad. Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition. Signals, Systems and Computers, 1993.
[135] Bruno A. Olshausen and David J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 1996.
[136] Steve Smale, Tomaso Poggio, Andrea Caponnetto, and Jake Bouvrie. Derived Distance: towards a mathematical theory of visual cortex. Artificial Intelligence, 2007.
[137] Evan C. Smith and Michael S. Lewicki. Efficient auditory coding. Nature, 439(7079):978, February 2006.
[138] W. R. Softky. Simple codes versus efficient codes. Current Opinion in Neurobiology, 5(2):239, April 1995.
[139] Shelley Derksen and H. J. Keselman. Backward, forward and stepwise automated subset selection algorithms: Frequency of obtaining authentic and noise variables. British Journal of Mathematical and Statistical Psychology, 45(2):265-282, November 1992.
[140] Irina F. Gorodnitsky and Bhaskar D. Rao. Sparse Signal Reconstruction from Limited Data Using FOCUSS: A Re-weighted Minimum Norm Algorithm. IEEE Transactions on Signal Processing, 1997.
[141] Mark D. Plumbley. Dictionary Learning for L1-Exact Sparse Coding. Proceedings of the 7th International Conference on Independent Component Analysis and Signal Separation, pages 406-413, 2007.
[142] Tomaso Poggio and Steve Smale. The Mathematics of Learning: Dealing with Data. Notices of the American Mathematical Society, 2003.
[143] Ignacio Ramirez, Pablo Sprechmann, and Guillermo Sapiro. Classification and Clustering via Dictionary Learning with Structured Incoherence and Shared Features. IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2010.
[144] Bhaskar D. Rao. Signal Processing with the Sparseness Constraint. IEEE International Conference on Acoustics, Speech and Signal Processing, 1998.
[145] Herbert Simon. The Organization of Complex Systems. In Hierarchy Theory: The Challenge of Complex Systems, George Braziller, New York, pages 1-27, 1973.
[146] Eero P. Simoncelli and Bruno A. Olshausen. Natural Image Statistics and Neural Representation. Annual Review of Neuroscience, 24:1193-1216, May 2001.
[147] Ziqin Feng. Hilbert's 13th Problem. PhD thesis, University of Pittsburgh, 2010.
[148] Steve Smale and Felipe Cucker. On the Mathematical Foundations of Learning. Bulletin of the American Mathematical Society, 39:1-49, 2002.
[149] Philipp Robbel and Deb Roy. Exploiting Feature Dynamics for Active Object Recognition.
[150] Scott Shaobing Chen, David Donoho, and Michael Saunders. Atomic Decomposition by Basis Pursuit. SIAM Review, 43(1):129-159, 2001.
[151] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58(1):267-288, 1996.
[152] David Petrie Moulton. Number Theory and Groups. PhD thesis, University of California, Berkeley, 1995.
[153] L. Asimow and B. Roth. The Rigidity of Graphs. Transactions of the AMS, 245:279-289, 1978.
[154] Per Christian Hansen. The truncated SVD as a method for regularization. BIT Numerical Mathematics, 27(4):534-553, October 1987.
[155] White and Whiteley. The algebraic geometry of stresses in frameworks. SIAM Journal on Algebraic and Discrete Methods, 4(4):481-511, 1983.
[156] Whiteley. The union of matroids and the rigidity of frameworks. SIAM Journal on Discrete Mathematics, 1(2):237-255, May 1988.
[157] Ming Li and Paul Vitanyi. An Introduction to Kolmogorov Complexity and Its Applications, 2nd Edition. Springer, 1997.
[158] Lee, Streinu, and Theran. Graded Sparse Graphs and Matroids. Journal of Universal Computer Science, 13(11):1671-1679, November 2007.
[159] Streinu and Theran. Slider-Pinning Rigidity: a Maxwell-Laman-Type Theorem. Discrete and Computational Geometry, 44(4):812-837, December 2010.
[160] Servatius. Molecular conjecture in 2D. 16th Fall Workshop on Computational and Combinatorial Geometry, 2006.
[161] Laurenz Wiskott and Terrence Sejnowski. Slow Feature Analysis: Unsupervised Learning of Invariances. Neural Computation, 14(4):715-770, April 2002.
[162] David H. Wolpert and William G. Macready. No Free Lunch Theorems for Optimization. 1996.
[163] Jeremy M. Wolfe. Guided Search 4.0: Current Progress With a Model of Visual Search. Integrated Models of Cognitive Systems, 2006.
[164] Steven J. Gortler, Craig Gotsman, Ligang Liu, and Dylan P. Thurston. On Affine Rigidity. arXiv:1011.5553, 2010.
[165] H. S. M. Coxeter. Projective Geometry. Springer, 2nd Edition, 2003.
[166] Honghao Shan, Lingyun Zhang, and Garrison W. Cottrell. Recursive ICA. Advances in Neural Information Processing Systems 19, pages 1273-1280, 2007.
[167] Michal Aharon, Michael Elad, and Alfred Bruckstein. K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation. IEEE Transactions on Signal Processing, 54(11):4311, November 2006.
[168] W. Whiteley. Some matroids from discrete applied geometry. In Matroid Theory, Contemporary Mathematics, American Mathematical Society, pages 171-311, 1996.
[169] White and Whiteley. The Algebraic Geometry of Motions of Bar-and-Body Frameworks. SIAM Journal on Algebraic and Discrete Methods, 1987.
[170] K. Haller, A. Lee, M. Sitharam, I. Streinu, and N. White. Body-and-Cad Constraint Systems. ACM-SAC Geometric Constraints and Reasoning, 2009, and FwCG 2008; Computational Geometry: Theory and Applications; CoRR abs/1006.1126, 2010.
[171] B. Jackson and T. Jordan. Pin-collinear Body-and-Pin Frameworks and the Molecular Conjecture. Technical Report, 2006.
[172] Louis Theran. Problems in generic combinatorial rigidity: sparsity, sliders, and emergence of components. PhD thesis, University of Massachusetts, Amherst, 2010.
[173] Ruijin Wu. CIS6930: Geometric Complexity. Lecture notes 7-12, University of Florida, 2011.
[174] Dhruv Ranganathan. A Gentle Introduction to Grassmannians. In press, 2010.
[175] URL: http://yann.lecun.com/exdb/mnist/
[176] URL: http://spams-devel.gforge.inria.fr/
BIOGRAPHICAL SKETCH

Mohamad Tarifi completed his Bachelor of Engineering in computers and communications, together with minors in mathematics and physics, at the American University of Beirut. He went on to obtain a master's degree in computer engineering, researching quantum computing and theoretical computer science, at the University of Florida. Concurrently with his Doctor of Philosophy studies, Mohamad worked full-time for several years in industry.