Analysis of Bayesian Group-Lasso in Regression Models

Material Information

Title:
Analysis of Bayesian Group-Lasso in Regression Models
Physical Description:
1 online resource (35 p.)
Language:
english
Creator:
Chandran, Manu
Publisher:
University of Florida
Place of Publication:
Gainesville, Fla.
Publication Date:
2011
Thesis/Dissertation Information

Degree:
Master's (M.S.)
Degree Grantor:
University of Florida
Degree Disciplines:
Electrical and Computer Engineering
Committee Chair:
Khargonekar, Pramod
Committee Members:
Gader, Paul D
Lele, Tanmay

Subjects

Subjects / Keywords:
bayesian -- feature -- group -- lasso -- selection
Electrical and Computer Engineering -- Dissertations, Academic -- UF
Genre:
Electrical and Computer Engineering thesis, M.S.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract:
The Group-Lasso estimator, used in regression analysis, does not provide variance estimates for the regression coefficients. Such estimates are important, since they represent the confidence of the model in its estimation. The Bayesian version of Group-Lasso, known as Bayesian Group-Lasso, does provide variance estimates for the regression coefficients. Bayesian Group-Lasso has already been proposed and used for classification models. In this thesis, we apply the Bayesian Group-Lasso model to regression problems, implementing it with a Gibbs sampling technique. We evaluate its performance and compare it with the Lasso and Group-Lasso techniques, using datasets generated from known model parameters.
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Bibliography:
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility:
by Manu Chandran.
Thesis:
Thesis (M.S.)--University of Florida, 2011.
Local:
Adviser: Khargonekar, Pramod.

Record Information

Source Institution:
UFRGP
Rights Management:
Applicable rights reserved.
Classification:
lcc - LD1780 2011
System ID:
UFE0043490:00001


Full Text

ANALYSIS OF BAYESIAN GROUP-LASSO IN REGRESSION MODELS

By

MANU CHANDRAN

A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

UNIVERSITY OF FLORIDA

2011

© 2011 Manu Chandran

To Mom and Dad

ACKNOWLEDGMENTS

I would like to express my gratitude to my advisor, Dr. Pramod Khargonekar, and my committee members, Dr. Paul Gader and Dr. Tanmay Lele, for the guidance and support throughout this study and research. I would also like to thank Dr. John Shea, for inspiring me by his enthusiasm in teaching and research. I would also like to thank Dr. Brandi Ormerod, for giving me an opportunity to work with her team. Additionally, I would like to thank my roommates Manu Nandan, Clint George and Abbin Antony, for their support throughout my stay at the University of Florida. I would also like to thank my parents, Mohana Chandran and Padma Kumari, and my close friends, Aswin Sreenivas, Jaya Lekshmi and Biji Rose, for their constant support and encouragement.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
  1.1 Motivation
  1.2 Organization

2 BACKGROUND
  2.1 Using a Sparsity Regularization Term
    2.1.1 Lasso
    2.1.2 Group-Lasso
  2.2 Bayesian Learning with Independent Priors

3 ALGORITHM AND DISCUSSION
  3.1 Bayesian Group-Lasso Model
    3.1.1 Equivalence to Group-Lasso
    3.1.2 Hierarchical Model
  3.2 Bayesian Group-Lasso Implementation
    3.2.1 Sampling
    3.2.2 Convergence
    3.2.3 Choosing Model Hyperparameters
  3.3 Concept of Group Structure
    3.3.1 Strong Group Sparsity
    3.3.2 Weak Group Sparsity

4 RESULTS

5 CONCLUSION AND FUTURE WORK

REFERENCES

BIOGRAPHICAL SKETCH

LIST OF TABLES

4-1 Dataset 1 results

LIST OF FIGURES

2-1 Lasso and ridge regression estimation
3-1 Hierarchical structure of Bayesian Group-Lasso regression model
3-2 Example of regression coefficient structure for the case of strong group sparsity
3-3 Example of regression coefficient structure for the case of weak group sparsity
4-1 Changing number of samples for a strong group sparsity case
4-2 Changing number of groups for a strong group sparsity case
4-3 Changing number of samples for a weak group sparsity case

Abstract of Thesis Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Master of Science

ANALYSIS OF BAYESIAN GROUP-LASSO IN REGRESSION MODELS

By

Manu Chandran

August 2011

Chair: Pramod Khargonekar
Major: Electrical and Computer Engineering

The Group-Lasso estimator, used in regression analysis, does not provide variance estimates for the regression coefficients. Such estimates are important, since they represent the confidence of the model in its estimation. The Bayesian version of Group-Lasso, known as Bayesian Group-Lasso, does provide variance estimates for the regression coefficients. Bayesian Group-Lasso has already been proposed and used for classification models [15]. In this thesis, we apply the Bayesian Group-Lasso model to regression problems, implementing it with a Gibbs sampling technique. We evaluate its performance and compare it with the Lasso and Group-Lasso techniques, using datasets generated from known model parameters.

CHAPTER 1
INTRODUCTION

1.1 Motivation

In many fields such as genomics, computational biology and text categorization, the data available for processing has been growing enormously. It is becoming increasingly difficult to make sense of this large amount of data. Many techniques have been developed in the past few decades to address this challenge. Sparse data representations offer a very promising avenue to deal with such problems. A sparse representation is a linear combination of a small number of elementary signals, which retains all or most of the information contained in the original signal. The transformation of data into a sparse representation also helps in storing the data more efficiently. The process of storing data in an efficient manner, with or without loss of information, is known as data compression. A relatively new technique in data compression is compressive sensing. If a signal is known to be sparse in some domain, then compressive sensing needs fewer samples for signal reconstruction than needed by conventional Nyquist-Shannon sampling [2]. It uses a nonlinear process for signal reconstruction. In contrast, when conventional sampling techniques are used, linear interpolation is used for signal reconstruction. Compressive sensing is now generally considered a revolutionary technique in the field of information theory. Some other algorithms that transform data into sparse representations are discussed in depth in Chapter 2.

Although compression of data is a desirable objective, the main goal of many machine learning algorithms is to determine a model that best describes the available data. Supervised learning is a category of machine learning algorithms in which the input data consists of the values of observed variables, known as features, and the desired output variable. The desired output variable is also referred to as the target variable. The input data is usually split into training data and test data. The training data is used to tune certain parameters of the model, so that the output of the model is similar to the measured values of the target variable. The test data is used to verify the performance of the estimated model.

Supervised learning problems can be of two types: classification problems and regression problems. In classification problems, the target variable can take only discrete values, while in regression problems the target variable is assumed to take continuous values. In both these methods, to infer the underlying model, it is helpful to focus on the most prominent features. Once the most prominent features are identified, we need to store information on only those features. Thus we can transform data into its sparse representation.

An example of a supervised learning problem that requires sparse representation of data is micro-arrays, from the field of computational biology. Recently, micro-arrays have been increasingly used for diagnosis of a variety of diseases. As the use of micro-arrays has increased, the use of feature selection algorithms for micro-arrays has also increased [7]. A very important technique used in feature selection is the Least absolute shrinkage and selection operator (Lasso). We will discuss Lasso and its properties in Section 2.1.1.

In many computational biology applications, grouping of features can be helpful in determining the model of the system. Grouping of features and selection of a group of features is known as group selection. This is used when a group structure of the features is already available. For example, if we know that two features belong to the same group, then it might be appropriate to include or exclude them simultaneously. Grouping of features also finds application in fusion of data. Performance of a number of algorithms, such as mass spectrometry for cancer detection and hyperspectral imagery, can be improved by grouping the features [16]. An extension of Lasso, known as the Group-Lasso technique, has been proposed to improve the performance of these algorithms [21]. Although Group-Lasso can be used in problems requiring grouping of features, it gives a point estimate of the solution. The variances of estimated solutions represent the confidence of the model in the estimation, which is a desirable property. Bayesian Group-Lasso is an extension of the Group-Lasso algorithm, which finds the solution and its variance estimate. We will discuss Bayesian Group-Lasso in detail in Section 3.1.

In this thesis, we analyze the performance of the Bayesian Group-Lasso algorithm, as applied to regression problems, using simulated datasets. Bayesian Group-Lasso was used in [15] to address learning problems in classification. An in-depth performance analysis of Bayesian Group-Lasso in regression problems has never been done. A previous attempt in [10] did not bring out the effect of changes in sample size and group structure on the performance of Bayesian Group-Lasso. In this work, the performance of Bayesian Group-Lasso is compared with Lasso and Group-Lasso for datasets of varied sample size and group structure.

1.2 Organization

This thesis is organized as follows. In Chapter 2, we explain the most important algorithms used in feature selection. The concept of feature selection using a regularization parameter and the usage of priors for regularization are presented. In Chapter 3, the Bayesian Group-Lasso model is presented. The analysis, implementation, advantages and disadvantages of the Bayesian Group-Lasso are discussed. In Chapter 4, the results of this study are presented. The study mainly includes analysis of simulated datasets to bring out the properties of Bayesian Group-Lasso. Finally, Chapter 5 summarizes this study and presents a discussion on possible extensions of the current work.

CHAPTER 2
BACKGROUND

As mentioned in the introduction, many upcoming fields find use for feature selection algorithms. This has fueled an increasing interest in the development of feature selection algorithms. In some cases, identification of the prominent features is important. In others, identification of a group of prominent features is important. In this chapter, we discuss the most prominent feature selection algorithms found in the literature. The algorithms that we will discuss in this chapter are for regression problems. These algorithms perform feature selection and regression simultaneously. Feature selection algorithms can be divided into two categories: 1) techniques that use a sparsity regularization term, for example Lasso and Group-Lasso, and 2) techniques that use independent priors, for example Bayesian Lasso, Joint Classifier and Feature Optimization (JCFO) and the Relevance Vector Machine (RVM). In this chapter, we will review some of these techniques in detail, bringing out their advantages and disadvantages.

Consider a dataset of N data points. Each data point consists of a pair (x_i, y_i), where x_i is the input variable and y_i is the target variable. Each input x_i is a vector in R^D, where D denotes the number of features. We define X to be the N x D matrix whose rows consist of the input data vectors x_i, i = 1, 2, ..., N, and y to be the vector (y_1, ..., y_N)^T, y_i ∈ R. The models described in this thesis assume a linear relationship between y and X. Let w denote the regression coefficient vector, a D-dimensional vector defined by w = (w_1, w_2, ..., w_D)^T. The model for linear regression is given by Equation (2-1), where ε denotes the error in the linear relationship between y and X:

    y = Xw + ε    (2-1)

The objective of the algorithms described in this section is to find an estimate of the regression coefficient vector w. We denote the estimated regression coefficient vector as ŵ. The target variable calculated from this estimated regression coefficient vector is given by ŷ = Xŵ. One of the objectives of the feature selection algorithms is to minimize the difference between ŷ and y. The other objective is to identify the prominent features. We can define a function f(w) which gives the error between ŷ and y. A commonly used error function is the squared l2-norm of the difference between ŷ and y, expressed as f(w) = ||y - ŷ||_2^2.
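A minimal Python/NumPy sketch of this setup (synthetic data; all sizes and names here are illustrative, not from the thesis):

    import numpy as np

    rng = np.random.default_rng(0)
    N, D = 96, 128                                   # samples and features
    X = rng.standard_normal((N, D))                  # feature matrix, N(0, 1) entries
    w_true = np.zeros(D)
    w_true[:32] = rng.choice([-1.0, 1.0], size=32)   # 32 nonzero coefficients
    y = X @ w_true + 0.01 * rng.standard_normal(N)   # y = Xw + eps

    def f(w):
        """Squared l2-norm error f(w) = ||y - Xw||_2^2."""
        return np.sum((y - X @ w) ** 2)

    l0 = np.count_nonzero(w_true)   # l0-norm: number of nonzero elements
    print(f(w_true), l0)            # small error; ||w||_0 = 32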

2.1 Using a Sparsity Regularization Term

Let us say that the aim is to find a solution for w that has a small number of nonzero values, i.e., a sparse solution [5]. This can be found by minimizing the error function f(w) with the constraint that w is sparse. An indirect approach to this problem is via the following optimization problem:

    ŵ = argmin_w [ f(w) + λ g(w) ]    (2-2)

Here g(w) is a sparsity-promoting function that penalizes non-sparse solutions, and λ ≥ 0 is a regularization parameter which balances the tradeoff between the original objective function f(w) and the sparsity-promoting term g(w). The value of λ that gives the solution with the minimum error is determined using techniques such as cross validation [7]. The regularization term g(w) also helps in avoiding overfitting. Overfitting refers to the behavior of learning techniques in which the learned model becomes increasingly tuned to the random noise in the target variable [3]. In this thesis, we focus on techniques that generate sparse solutions. A very important regularization function is given by g(w) = ||w||_0, where ||.||_0 denotes the l0-norm, defined to be the number of nonzero elements in w.

2.1.1 Lasso

Although the idea of using the l0-norm of w to solve (2-2) is appealing, optimizing (2-2) with g(w) = ||w||_0 is a hard problem [11]. Hence, g(w) is usually chosen to be some relaxed form of the l0-norm. The choice is usually some lp-norm with p ≤ 2. A widely applied technique is to use the l2-norm as the regularization function g(w), which is known as Ridge regression. The Ridge regression estimate can be mathematically expressed as in Equation (2-3):

    min_w [ (y - Xw)^T (y - Xw) + λ Σ_{j=1}^D |w_j|^2 ]    (2-3)

Figure 2-1. Lasso and ridge regression estimation: the error function f(w) and the regularization function g(w) for the Lasso and Ridge regression estimates are shown in these figures. The function f(w) is represented by the elliptical contours. A) Lasso: the contour of the regularization function g(w) = ||w||_1 for Lasso is represented by the square contours. B) Ridge regression: the regularization function g(w) = ||w||_2 for Ridge regression is represented by the circular contours, which have no edges.

Although the regularization function g(w) in Ridge regression helps in avoiding overfitting, it does not encourage sparsity. To encourage sparsity, a popular technique is to use the l1-norm as the regularization function g(w). This is known as the Least absolute shrinkage and selection operator (Lasso). Lasso was suggested as a selection method for linear regression by Tibshirani [18]. It is widely regarded as a benchmark technique for sparse regression. The Lasso estimate can be mathematically expressed as in Equation (2-4), where λ ≥ 0 is a Lagrange multiplier:

    min_w [ (y - Xw)^T (y - Xw) + λ Σ_{j=1}^D |w_j| ]    (2-4)

We illustrate Lasso in Figure 2-1A. As the allowed value of g(w) increases, the size of the square also increases. The outermost points of the Lasso regularization function lie on the axes, represented by the edges of the square. Contrast this with the case for Ridge regression shown in Figure 2-1B. Here again, as the allowed value of g(w) increases, the size of the circle increases. The circular contour of the regularization function of Ridge regression does not have any edges, so the possibility of finding a sparse solution is lower. However, in Lasso the edges of g(w) are on the axes. Hence, the chance of Lasso estimating a sparse solution, at a point on the axes, is higher than in Ridge regression. These ideas have been mathematically formalized in [18].
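This contrast is easy to see numerically. A small sketch using scikit-learn's Lasso and Ridge estimators (scikit-learn's penalty scalings differ from Equations (2-3) and (2-4) by constant factors, so the alpha values below are illustrative):

    import numpy as np
    from sklearn.linear_model import Lasso, Ridge

    rng = np.random.default_rng(0)
    X = rng.standard_normal((96, 128))
    w_true = np.zeros(128)
    w_true[:32] = 1.0
    y = X @ w_true + 0.01 * rng.standard_normal(96)

    lasso = Lasso(alpha=0.05).fit(X, y)    # l1-regularized least squares
    ridge = Ridge(alpha=1.0).fit(X, y)     # l2-regularized least squares
    print(np.count_nonzero(lasso.coef_))   # sparse: many coefficients exactly zero
    print(np.count_nonzero(ridge.coef_))   # dense: all 128 shrunken but nonzero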

2.1.2 Group-Lasso

In many regression problems, some of the features are known to belong to certain groups. In such cases it might be more important to find sparsity at the group level than at the individual level. For example, if we know that two features belong to the same group, then it might be appropriate to include or exclude them simultaneously. Some areas where grouping of features is important are simultaneous sparse approximation [19], multi-task compressive sensing [9] and sensor fusion [17]. These applications have motivated the development of a variant of Lasso estimation, popularly known as Group-Lasso [21].

Suppose that there are D features, which are divided into G groups. Let P_g denote the number of coefficients in the group g ∈ {1, 2, ..., G}. Let the matrix X_g contain the feature vectors corresponding to the g-th group. The vector w_g corresponds to the regression coefficients of the g-th group. The regularization function g(w) used in Group-Lasso is designed to account for the grouping of features. It is given by g(w) = Σ_{g=1}^G √P_g ||w_g||_2. The Group-Lasso minimizes the objective function, which is mathematically represented as

    min_w [ (y - Σ_{g=1}^G X_g w_g)^T (y - Σ_{g=1}^G X_g w_g) + λ Σ_{g=1}^G √P_g ||w_g||_2 ]    (2-5)

From the equation of the regularization function g(w), we can notice that an l2-norm is used for the coefficients within a group and an l1-norm is used at the group level. It is interesting to note that Group-Lasso does not yield sparsity within a group, while it tries to attain sparsity at the group level. Sparse selection acts only at the group level, and an entire group of predictors may be dropped. To make this idea clearer, let us consider the example detailed in [22]. Consider a case where the coefficients form two groups, w = [(w_11, w_12), w_2]^T. Here, coefficients w_11 and w_12 belong to the same group and the other coefficient, w_2, belongs to the other group. In Lasso, the penalty function corresponds to the l1-norm, g(w) = |w_11| + |w_12| + |w_2|. In Group-Lasso, the l1-penalty acts at the group level and the l2-penalty within the group. Hence, the penalty function is given by g(w) = ||(w_11, w_12)||_2 + |w_2|, and coefficients in the same group, such as w_11 and w_12, are estimated to be sparse or non-sparse simultaneously.
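This penalty is straightforward to compute directly. A small Python/NumPy sketch (including the √P_g weights of Equation (2-5), which the two-group example above omits):

    import numpy as np

    def group_lasso_penalty(w, groups):
        """g(w) = sum over groups of sqrt(P_g) * ||w_g||_2."""
        return sum(np.sqrt(len(idx)) * np.linalg.norm(w[idx]) for idx in groups)

    w = np.array([3.0, 4.0, 2.0])          # w = [(w11, w12), w2]
    groups = [[0, 1], [2]]                 # group 1: {w11, w12}; group 2: {w2}
    print(group_lasso_penalty(w, groups))  # sqrt(2)*5 + 1*2 ~= 9.07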

2.2 Bayesian Learning with Independent Priors

In Section 2.1 we saw that the objective function has a regularization term to encourage sparsity and to avoid overfitting. In a Bayesian learning setting, instead of using a regularization parameter, a prior probability distribution is assigned to the coefficients of the model. By choosing suitable priors, Bayesian learning algorithms achieve objectives such as regularization and sparsity.

Bayesian Lasso

As we discussed in Section 2.1.1, Lasso is a popular technique in feature selection. However, it has some limitations. One of them is that the regularization parameter has to be determined using cross validation. Another limitation is that Lasso gives a point estimate of a coefficient and thus does not give an idea of the confidence in the estimated parameters [18]. On the other hand, Bayesian Lasso does not have these limitations [14]. Extending the idea of Lasso, Park and Casella [14] proposed Bayesian Lasso, a fully Bayesian analysis using a conditional Laplace prior on w. They proposed a Gibbs sampling technique [3] for Bayesian Lasso estimation.

In Bayesian analysis, a likelihood function and a prior function are defined for the target variable and the coefficient vector. The likelihood distribution for y is defined by (2-6):

    P(y | X, w) = Π_{i=1}^N N(y_i | x_i w, σ²)    (2-6)

It is defined as the product of N Normal distributions with mean x_n w and variance σ². Here x_n refers to the n-th sample, that is, the n-th row of X. The choice of a product of Normal distributions is attributed to the assumption that the samples are generated independently and that the noise in the system is Gaussian in nature. The prior distribution of the coefficient vector w, P(w), is assumed to be a Laplace distribution:

    P(w | σ²) = Π_{j=1}^D (λ / (2√σ²)) exp(-λ|w_j| / √σ²)    (2-7)

The posterior distribution P(w | y) is proportional to the product of the likelihood distribution and the prior [3]. From the posterior distribution, variance estimates of the coefficients can be found. Additionally, the model gives an estimate for the error variance. A uniform distribution over the interval [1, A), where A is a large number, is used as the prior distribution on σ²:

    P(σ²) = 1/σ²    (2-8)

This makes the prior noninformative. Note that the prior on w is conditioned on σ². This conditional prior has the advantage that the induced full posterior distribution of w is guaranteed to be unimodal [14]. If the full posterior is not unimodal, it could slow the convergence of the sampling process and the resulting point estimates might not be meaningful [14].

Hierarchical Model

If a Laplace distribution prior is used directly, the complexity in computation increases. Hence, an alternate representation of the Laplace distribution as a mixture of normals scaled with an exponential mixing density is commonly used [1][14]. The representation of the Laplace distribution as a mixture of normals is expressed by Equation (2-11):

    P(w | σ²) = Π_{j=1}^D (λ / (2√σ²)) exp(-λ|w_j| / √σ²)    (2-9)
              = Π_{j=1}^D ∫_0^∞ (1 / √(2πσ²τ_j²)) exp(-w_j² / (2σ²τ_j²)) (λ²/2) exp(-λ²τ_j²/2) dτ_j²    (2-10)
              = Π_{j=1}^D ∫_0^∞ N(w_j | 0, σ²τ_j²) Gamma(τ_j² | 1, λ²/2) dτ_j²    (2-11)

Here Gamma(τ_j² | 1, λ²/2) is the gamma distribution with shape parameter 1 and rate parameter λ²/2, which is equivalent to the exponential distribution Expon(λ²/2). The mode of the exponential distribution Expon(λ²/2) is zero, for any λ. This implies that most values of τ_j² are close to zero. From (2-11), we can see that most values of w_j are going to be close to zero when most τ_j² are close to zero.
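A quick Monte Carlo check of this mixture representation (a sketch; the scale conventions follow Equations (2-7) and (2-11) as reconstructed above): drawing τ_j² from the exponential mixing density and then w_j from N(0, σ²τ_j²) should reproduce Laplace-distributed draws.

    import numpy as np

    rng = np.random.default_rng(0)
    lam, sigma2, n = 2.0, 1.0, 200_000

    # Hierarchical draw: tau2 ~ Expon(rate = lam^2/2), then w | tau2 ~ N(0, sigma2*tau2)
    tau2 = rng.exponential(scale=2.0 / lam**2, size=n)
    w_mix = rng.normal(0.0, np.sqrt(sigma2 * tau2))

    # Direct Laplace draw with the matching scale sqrt(sigma2)/lam
    w_lap = rng.laplace(0.0, np.sqrt(sigma2) / lam, size=n)

    # Compare a few quantiles; they should agree closely
    print(np.percentile(w_mix, [5, 25, 50, 75, 95]))
    print(np.percentile(w_lap, [5, 25, 50, 75, 95]))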

The variance of τ_j² is heavily dependent on the value of λ. Thus, instead of presetting the value of λ, it is desirable to estimate λ along with the other parameters. This will account for the uncertainty in the selection of λ for the regression estimates. A Gamma(a, b) prior is assigned to λ to maintain conjugacy [14].

The Bayesian Lasso claims to perform on par with the ordinary Lasso [14]. Bayesian Lasso is analytically simpler and is also easy to implement. Bayesian Lasso also computes the variance estimates for all coefficients, which represent the confidence of the model in the estimation. Even though Bayesian Lasso has so many advantages over conventional Lasso, it should be mentioned that Bayesian Lasso is computationally more intensive and may not be suitable for large datasets.

CHAPTER 3
ALGORITHM AND DISCUSSION

As discussed in the introduction, our work is focused on the performance of Bayesian Group-Lasso in regression problems. Bayesian Group-Lasso can be considered an extension of the Group-Lasso procedure. In this chapter, we discuss the Bayesian Group-Lasso model and the method we used to implement it. Additionally, the similarity of Group-Lasso and Bayesian Group-Lasso will be discussed.

3.1 Bayesian Group-Lasso Model

As explained in Section 2.1.2, the Group-Lasso is used when the group structure of the coefficient vector w is known. Group-Lasso tends to drive the coefficients within a group to zero or nonzero simultaneously. It has been shown that, if the group structure is known, Group-Lasso gives better performance than Lasso [8]. The Bayesian Group-Lasso [15] is developed from Bayesian Lasso [14] and Group-Lasso [21][15]. In Bayesian Lasso, independent and identical Laplace priors are assumed over the individual regression coefficients. In Bayesian Group-Lasso, instead of independent and identical Laplace priors over each regression coefficient, each group is assumed to have an independent and identical Multi-Laplace prior, with dimension P_g, given by (3-2). Here P_g is the size of the g-th group in the group structure. The vector w_g is defined as the regression coefficients that belong to the g-th group. The Bayesian Group-Lasso model can be represented as below:

    P(y | X, w, σ²) = Π_{n=1}^N N(y_n | x_n w, σ²)    (3-1)

    P(w_g | λ) = M-Laplace(w_g | 0, (P_g λ / σ²)^{-1/2})    (3-2)

    P(σ² | v_0, s_0²) = InvGamma(σ² | v_0, s_0²)    (3-3)

    P(λ | r, s) = Gamma(λ | r, s)    (3-4)
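A generative sketch of this model may help before the discussion that follows (Python/NumPy, drawing through the Normal-Gamma hierarchy of Section 3.1.2 instead of the M-Laplace density directly; the Gamma shape/rate conventions below are assumptions, as conventions differ across papers):

    import numpy as np

    rng = np.random.default_rng(0)
    N, G, Pg = 50, 4, 3                     # samples, groups, coefficients per group
    D = G * Pg
    X = rng.standard_normal((N, D))

    lam = rng.gamma(shape=1.0, scale=1.0)   # lambda ~ Gamma(r, s), here r = s = 1
    sigma2 = 1.0 / rng.gamma(2.0, 1.0)      # sigma^2 ~ InvGamma(v0, s0^2), via 1/Gamma

    w = np.empty(D)
    for g in range(G):
        # tau_g^2 ~ Gamma((Pg + 1)/2, rate = lam*Pg/2): one scale per group (assumed rate)
        tau2 = rng.gamma((Pg + 1) / 2.0, 2.0 / (lam * Pg))
        # w_g | tau_g^2 ~ N(0, sigma2 * tau_g^2 * I): marginally M-Laplace
        w[g * Pg:(g + 1) * Pg] = rng.normal(0.0, np.sqrt(sigma2 * tau2), size=Pg)

    y = X @ w + rng.normal(0.0, np.sqrt(sigma2), size=N)   # likelihood (3-1)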

The likelihood distribution for y is given by (3-1). This is the same as in Bayesian Lasso, as explained in Section 2.2. For estimating the noise, an inverse Gamma prior is assumed over the variable σ², given by Equation (3-3). Another parameter in the model which has to be estimated is λ. As we will see in Section 3.1.2, the inverse of λ affects the variance of the regression coefficients. A Gamma distribution is assumed for the variable λ, so that the posterior distribution will maintain conjugacy.

3.1.1 Equivalence to Group-Lasso

In the Group-Lasso, we have a regularization term given by g(w) = Σ_{g=1}^G √P_g ||w_g||_2 (2-5). For ease of analysis, the Group-Lasso equation is restated here:

    min_w [ (y - Σ_{g=1}^G X_g w_g)^T (y - Σ_{g=1}^G X_g w_g) + λ Σ_{g=1}^G √P_g ||w_g||_2 ]    (3-5)

If we examine the Bayesian Group-Lasso model, a multivariate P_g-dimensional Multi-Laplace prior is assumed over each g-th regression coefficient group. In the log-space of the posterior distribution, this prior corresponds to the regularization term of the Group-Lasso [3][15].

3.1.2 Hierarchical Model

The use of the Multi-Laplace prior for the regression coefficients w_g is useful to obtain sparse solutions of w_g. However, this prior causes computational difficulties. So a two-level hierarchical model has been proposed to represent this prior [15], which is easier to implement and analyze. This is similar to Bayesian Lasso, where the Laplace prior on w is expanded as a hierarchical model in Section 2.2. The Multi-Laplace prior can be expressed as a mixture of normal distributions with a Gamma mixing distribution, Gamma(τ_g² | (P_g+1)/2, λP_g/2) [15]:

    P(w_g | σ², λ) = M-Laplace(w_g | 0, (P_g λ / σ²)^{-1/2})    (3-6)
                   ∝ (P_g λ / σ²)^{P_g/2} exp(-(P_g λ / σ²)^{1/2} ||w_g||_2)    (3-7)
                   ∝ ∫_0^∞ N(w_g | 0, σ²τ_g² I) Gamma(τ_g² | (P_g+1)/2, λP_g/2) dτ_g²    (3-8)

The dependency structure of the Bayesian Group-Lasso model is represented graphically in Figure 3-1 [15]. In Figure 3-1, the covariance of w is denoted by Σ, which is a D x D matrix.

Figure 3-1. Hierarchical structure of Bayesian Group-Lasso regression model: the most important part of the diagram is the hierarchical Normal-Gamma representation of the regression coefficients.

The matrix Σ is a diagonal matrix with τ_g² as its diagonal elements, with each τ_g² repeated P_g times. The parameter λ is assigned a Gamma prior, with hyperparameters r and s. Thus, all parameters of the model are estimated from the data. However, we still have to select appropriate hyperparameters for the model. We will discuss the selection of hyperparameters in Section 3.2.3.

3.2 Bayesian Group-Lasso Implementation

As discussed above, priors are assigned to many of the parameters in the model. Several integrations are required to estimate the parameters of the model. However, in practice these integrations are too complex to be done analytically. Hence, a Gibbs sampling approach is suggested for estimating the parameter distributions. For sampling the parameters of a model, the conditional posterior distributions of all the parameters are required. The two-level hierarchical modeling discussed in Section 3.1.2 allows derivation of the conditional posterior distributions for this model [20].
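A structural sketch of such a Gibbs sampler is given below (Python/NumPy). The update for w is the standard conjugate Normal result; the updates for τ_g², σ² and λ follow the λ²-parametrization of the Bayesian lasso family in [10] rather than the thesis's Gamma prior on λ, so the shape/rate conventions here are assumptions. The thesis itself relied on WinBUGS, which derives the conditionals automatically.

    import numpy as np

    def bgl_gibbs(X, y, groups, v0=0.01, s0=0.01, r=0.1, s=0.1,
                  iters=3000, burn_in=1000, thin=5, seed=0):
        """Gibbs sampler sketch for Bayesian Group-Lasso regression.
        groups: list of index arrays, one per coefficient group."""
        rng = np.random.default_rng(seed)
        N, D = X.shape
        G = len(groups)
        w, sigma2, lam2 = np.zeros(D), 1.0, 1.0
        tau2 = np.ones(G)
        XtX, Xty = X.T @ X, X.T @ y
        kept = []
        for it in range(iters):
            # w | rest ~ N(A^-1 X^T y, sigma2 A^-1), A = X^T X + Sigma^-1,
            # with Sigma diagonal, tau_g^2 repeated P_g times (conjugacy)
            inv_diag = np.concatenate(
                [np.full(len(g), 1.0 / tau2[i]) for i, g in enumerate(groups)])
            A = XtX + np.diag(inv_diag)
            mean = np.linalg.solve(A, Xty)
            L = np.linalg.cholesky(np.linalg.inv(A))
            w = mean + np.sqrt(sigma2) * (L @ rng.standard_normal(D))
            # 1/tau_g^2 | rest ~ InvGaussian(sqrt(lam2*sigma2)/||w_g||, lam2), per [10]
            for i, g in enumerate(groups):
                mu = np.sqrt(lam2 * sigma2) / max(np.linalg.norm(w[g]), 1e-12)
                tau2[i] = 1.0 / rng.wald(mu, lam2)
            # sigma2 | rest: inverse-Gamma update with prior InvGamma(v0, s0)
            resid = y - X @ w
            quad = sum(w[g] @ w[g] / tau2[i] for i, g in enumerate(groups))
            sigma2 = 1.0 / rng.gamma(v0 + N / 2 + D / 2,
                                     1.0 / (s0 + resid @ resid / 2 + quad / 2))
            # lam2 | rest ~ Gamma(r + (D + G)/2, rate = s + sum(tau_g^2)/2)
            lam2 = rng.gamma(r + (D + G) / 2, 1.0 / (s + tau2.sum() / 2))
            if it >= burn_in and (it - burn_in) % thin == 0:
                kept.append(w.copy())
        W = np.asarray(kept)
        return W.mean(axis=0), W.var(axis=0)  # posterior means and variances

The sampling settings used in this work (described next) correspond to a burn-in of 10000 iterations, 5000 retained samples and a thinning ratio of 5; the smaller defaults above are only for illustration.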

3.2.1 Sampling

Gibbs sampling belongs to the family of Markov Chain Monte Carlo (MCMC) methods. In Gibbs sampling, we sample from the full conditional posterior distributions of the model parameters. Each conditional posterior distribution is conditioned on the current values of all other parameters and the observed data. In each sampling iteration of the Bayesian Group-Lasso model, we sample the parameters w, σ², τ_g² and λ from their conditional distributions. We stop the sampling iterations once we reach convergence. The criteria used in testing convergence for this model are described in Section 3.2.2. The samples generated from the Markov Chain after convergence are assumed to be from the joint posterior of the model. Upon convergence, a large number of samples are drawn from the joint posterior to compute the posterior mean and variance. The posterior means are used as the estimates of the regression coefficients. In this work, the software WinBUGS14 [12] is used for Gibbs sampling.

3.2.2 Convergence

Convergence of the sampling technique is important, since only the samples after convergence can be considered to be samples from the true posterior distribution [4]. If the convergence criterion is not proper, the difference between the estimated posterior distribution and the true distribution might be too large. This will lead to errors in the estimation of parameters. In this work, to test for convergence, three parallel instances (also known as chains) of sampling were run from dispersed starting points and were monitored for their convergence. The number of iterations will determine the convergence of the estimated posterior distribution. In this case, convergence was decided based on the overlap of the density functions of the coefficients of the three chains, formed from samples after the burn-in period. The burn-in period is the number of iterations required by the sampler to reach convergence, and in our experiments we fixed the burn-in period at 10000 iterations. Once the burn-in period was over, 5000 samples were collected and treated as samples from the joint posterior distribution. Another parameter to choose in sampling is the value of thinning (T_r). Thinning is a process in which only every T_r-th sample after the burn-in period is used and the rest of the samples are discarded. Thinning the sequence helps in reducing the autocorrelation between saved samples. In our experiments, the thinning ratio was chosen as T_r = 5, which resulted in an autocorrelation of 0.1 for the stored samples. The small value of autocorrelation for stored samples is another test for convergence.
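A small sketch of the thinning and autocorrelation check (illustrative; a synthetic AR(1) chain stands in for raw sampler output):

    import numpy as np

    def lag1_autocorr(x):
        """Lag-1 sample autocorrelation of a 1-D chain."""
        x = x - x.mean()
        return (x[:-1] @ x[1:]) / (x @ x)

    rng = np.random.default_rng(0)
    # Hypothetical correlated chain standing in for raw Gibbs output
    chain = np.empty(35_000)
    chain[0] = 0.0
    for t in range(1, chain.size):
        chain[t] = 0.6 * chain[t - 1] + rng.standard_normal()

    burn_in, thin = 10_000, 5
    kept = chain[burn_in::thin]            # discard burn-in, keep every 5th sample
    print(lag1_autocorr(chain[burn_in:]))  # noticeably positive
    print(lag1_autocorr(kept))             # close to zero after thinning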

3.2.3 Choosing Model Hyperparameters

Although all the parameters are estimated from the data, we have to set the values of the hyperparameters. The hyperparameters in this model are v_0, s_0², r and s. Here v_0 and s_0² are the shape and rate parameters of the inverse Gamma distribution on σ². By choosing small values of v_0 and s_0², the inverse Gamma distribution becomes a diffuse prior, almost tending to a uniform distribution [14][20]. This hyperparameter selection thus assumes no previous knowledge of the underlying system. However, if we have knowledge of the noise level of the data, then we can incorporate such information in these prior parameters.

The variance associated with the weight parameter w_j is τ_j². Hence, the values of τ_j² determine the sparsity of the estimated coefficients. As in Bayesian Lasso, when most values of τ_j² are close to zero, most values of w_j will be estimated as zero. The Gamma distribution of τ_j² has its rate parameter proportional to 1/λ. Instead of assigning a value to λ, it is appealing to assign a prior on λ and estimate it along with the other parameters. To maintain conjugacy, λ is assigned a Gamma distribution with shape and rate parameters r and s. This is equivalent to sampling the rate parameter of τ_j² from an inverse Gamma distribution. If we have some knowledge about the sparsity of the underlying model, we could tailor the hyperparameters r and s to include the sparsity information. For simulation purposes, the values of r and s were chosen to be small (for example, r = 0.1, s = 0.1), so that the rate parameter of τ_j² is sampled from a diffuse prior. So here again, we are assuming no previous knowledge of the underlying system [15][20].

3.3 Concept of Group Structure

This section describes the concept of grouping. To study the properties of Group-Lasso, Huang and Zhang [8] introduced the concepts of strong group sparsity and weak group sparsity. We will discuss these two concepts in this section. The notations used in this section will be referred to in Chapter 4, where we discuss the results.

In typical feature selection scenarios, the indicator of sparsity is the l0-norm of the regression coefficient vector, ||w||_0. We define groups of w based on our knowledge of the group structure. A non-sparse group is a group of coefficients with at least one non-zero coefficient. Let G_ns denote the number of non-sparse groups. Let P_g^j denote the size of the j-th non-sparse group. Let k denote the sum of the group sizes of the non-sparse groups. Thus, k is given by Equation (3-9):

    k = Σ_{j=1}^{G_ns} P_g^j    (3-9)

3.3.1 Strong Group Sparsity

A coefficient vector is defined to have strong group sparsity if the value of k/||w||_0 is small. This is the case when the non-zero coefficients are effectively covered by the groups. Consider the case when k/||w||_0 = 1. This is when all the non-sparse groups have all their coefficients non-zero.

Figure 3-2. Example of regression coefficient structure for the case of strong group sparsity

Figure 3-2 corresponds to the values in a coefficient vector when it has strong group sparsity. Here, the coefficients are assumed to have values of either 0 or 1. It gives an example of how non-zero coefficients are grouped together in a strong group sparsity case. The dimension (D) of this coefficient vector is 18, the number of non-zero coefficients (||w||_0) is 6, the number of groups (G_num) is 6, the number of non-sparse groups (G_ns) is 2 and, from Equation (3-9), the total number of coefficients in non-sparse groups (k) is 6. Note here that k/||w||_0 = 1. This is an example of strong group sparsity.
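These quantities are easy to compute. The sketch below (Python/NumPy) reproduces the counts quoted for Figure 3-2, although the exact placement of the nonzero coefficients is illustrative:

    import numpy as np

    def group_sparsity_stats(w, group_size):
        """Return ||w||_0, number of non-sparse groups G_ns, and k (equal group sizes)."""
        w = np.asarray(w, dtype=float).reshape(-1, group_size)  # one row per group
        nonzero = np.count_nonzero(w)
        nonsparse = np.any(w != 0, axis=1)      # groups with at least one nonzero
        k = int(nonsparse.sum()) * group_size   # Equation (3-9)
        return nonzero, int(nonsparse.sum()), k

    # D = 18, six groups of three; two groups fully nonzero (strong sparsity)
    w_strong = [1,1,1, 1,1,1, 0,0,0, 0,0,0, 0,0,0, 0,0,0]
    print(group_sparsity_stats(w_strong, 3))    # (6, 2, 6) -> k/||w||_0 = 1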

3.3.2 Weak Group Sparsity

A coefficient vector is defined to have weak group sparsity if the value of k/||w||_0 is large. This implies that the group structure chosen does not correspond to the sparsity structure of the coefficients. The non-sparse groups might have many sparse coefficients in their group. Group-Lasso considers sparsity at the group level. Thus, when k is high compared to ||w||_0, the performance of Group-Lasso will deteriorate. We will quantitatively analyze the performance in Chapter 4.

Figure 3-3. Example of regression coefficient structure for the case of weak group sparsity

Figure 3-3 corresponds to the values in a coefficient vector when it has weak group sparsity. The dimension (D) of this coefficient vector is 18, the number of non-zero coefficients (||w||_0) is 6, the number of groups (G_num) is 6, the number of non-sparse groups (G_ns) is 4 and, from Equation (3-9), the total number of coefficients in non-sparse groups (k) is 12. Note here that k/||w||_0 = 2. This is an example of weak group sparsity. Further discussion of the effect of group structure and sample size N is given in Chapter 4.

CHAPTER 4
RESULTS

In this chapter we will analyze the performance of Bayesian Group-Lasso. The simulation datasets are designed to test certain desirable qualities of a feature selection algorithm. Most of these tests are adopted from the available literature, notably from [13], [6] and [15].

Simulation Results

In this section the performance of feature selection algorithms such as Lasso, Group-Lasso and Bayesian Group-Lasso is compared using different datasets, prepared from the literature. The results of Lasso and Group-Lasso were computed using 5-fold cross validation [7]. The algorithms estimate the coefficient vector using the available data y and X. For quantitative evaluation, we must define the error in the estimated coefficients. The recovery error is the relative difference in l2-norm between the estimated coefficient vector ŵ and the true coefficient vector w:

    Err = ||ŵ - w||_2 / ||w||_2    (4-1)

Strong Group Sparsity

In this experiment, we will assume that the coefficients have strong group sparsity. We defined and discussed strong group sparsity in Section 3.3.1. In this scenario, Group-Lasso and Bayesian Group-Lasso are expected to perform better than Lasso, by using the knowledge of the group structure. In this simulation we aim to quantitatively measure the difference in performance of these methods. This experiment is a modification of the experiment described in [8]. The error performance of Lasso, Group-Lasso and Bayesian Group-Lasso is compared. The feature vectors are samples generated from a standard Gaussian distribution, N(0, 1). The regression coefficient vector is randomly generated with values ±1. A zero-mean Gaussian noise with standard deviation σ = 0.01 is added to the samples. The dimension D = 128, the number of non-zero coefficients ||w||_0 = 32, and the group size is selected to be even with value G_size = 8. So there are 16 groups and the number of non-sparse groups is G_ns = 4. Here the number of coefficients in non-sparse groups, k, is 32. The target variable y is computed using the linear model y = Xw + ε, where ε is Gaussian noise with variance σ² = 0.01.
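A sketch of this data-generation and evaluation protocol (Python/NumPy; an illustrative reconstruction of the stated configuration rather than the original experiment code, with an ordinary least-squares fit standing in for the three estimators):

    import numpy as np

    rng = np.random.default_rng(0)
    D, Gsize, k = 128, 8, 32
    N = 3 * k                                   # N = 96 samples
    X = rng.standard_normal((N, D))             # features from N(0, 1)

    w = np.zeros(D)
    for g in range(4):                          # 4 non-sparse groups of size 8
        idx = slice(g * Gsize, (g + 1) * Gsize)
        w[idx] = rng.choice([-1.0, 1.0], size=Gsize)   # values +/- 1

    y = X @ w + rng.normal(0.0, 0.01, size=N)   # noise with std sigma = 0.01

    def recovery_error(w_hat, w_true):
        """Err = ||w_hat - w_true||_2 / ||w_true||_2, Equation (4-1)."""
        return np.linalg.norm(w_hat - w_true) / np.linalg.norm(w_true)

    w_hat = np.linalg.lstsq(X, y, rcond=None)[0]   # placeholder estimator
    print(recovery_error(w_hat, w))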

Table 4-1. Dataset 1 results

    Method                  Mean error value    Variance
    Lasso                   0.3875              0.113
    Group-Lasso             0.0017              0.003
    Bayesian Group-Lasso    0.1621              0.036

The number of samples N is chosen to be equal to 3k, that is, N = 96. The results are given in Table 4-1. The experiment was run 10 times, and the mean error value and the variance are reported. As is clear from the results, Bayesian Group-Lasso and Group-Lasso perform better than Lasso. In the next two experiments we will see how the performance of Lasso, Group-Lasso and Bayesian Group-Lasso varies with changes in the number of samples and with changes in the number of groups.

Experiment: changing number of samples

In this experiment, we generated the coefficient vector and feature set just as in the previous experiment. As before, the dimension D = 128, the number of non-zero coefficients ||w||_0 = 32 and the group size is selected to be uniform with value G_size = 8. Thus, there are 16 groups and the number of non-sparse groups is G_ns = 4. Here the number of coefficients in non-sparse groups, k, is 32. The number of samples is varied, N = 32, 96, 160. This corresponds to N/k = 1, 3, 5. In Figure 4-1, the x axis represents the value N/k and the y axis represents the Err value, as defined in Equation (4-1). Each data point in the graphs below is calculated by running the experiment 10 times and calculating the variance and mean of the Err value.

As we can see in Figure 4-1, when the number of samples N = k, Lasso, Group-Lasso and Bayesian Group-Lasso have similar performance. This could be because the number of samples is quite low compared to the number of features and the technique used does not give any advantage. As the number of samples increases to N = 3k, we can see that Group-Lasso has the lowest error. We should also note that Bayesian Group-Lasso performs better than Lasso.

Figure 4-1. Changing number of samples for a strong group sparsity case

However, in this case the error value of Bayesian Group-Lasso is in between Lasso and Group-Lasso. As the number of samples increases, the error values of Bayesian Group-Lasso and Group-Lasso converge to almost similar values. This is the case when N = 5k. We can see from Figure 4-1 that Group-Lasso and Bayesian Group-Lasso perform slightly better than Lasso.

Experiment: changing number of groups

In this experiment, we generated the coefficient vector and feature set just as in the previous experiment. As before, the dimension D = 128 and the number of non-zero coefficients ||w||_0 = 32. Here the number of coefficients in non-sparse groups is kept at k = 32. The number of samples is kept at N = 96. The group size is selected to be uniform, but varied as G_size = 8, 2, 1. Due to the change in group size, the number of non-sparse groups also changes correspondingly, G_ns = 4, 16, 32. In Figure 4-2, the x axis represents the number of non-sparse groups G_ns and the y axis represents the Err value, as defined in Equation (4-1).

Figure 4-2. Changing number of groups for a strong group sparsity case

Consider the case when the number of non-sparse groups is G_ns = 32. In this case, the number of non-sparse groups is the same as the number of non-sparse coefficients. The error values of Group-Lasso, Bayesian Group-Lasso and Lasso are almost equal, within statistical precision. Since G_size = 1, the problem formulations for Bayesian Group-Lasso and Group-Lasso become the same as Lasso. Hence, the performance of all three algorithms is similar. As the number of non-sparse groups decreases, Group-Lasso and Bayesian Group-Lasso exploit the knowledge of the group structure and perform better. Consider the case G_ns = 4. In this case, the error values for Bayesian Group-Lasso and Group-Lasso are far below the error value found by Lasso.

Weak Group Sparsity

In this experiment, we will investigate the case where the coefficients have weak group sparsity. As discussed in Section 3.3.2, weak group sparsity implies that the non-sparse groups might have many zero coefficients in their group. In this case, Lasso is expected to perform better than Group-Lasso and Bayesian Group-Lasso.

Experiment: changing number of samples

Figure 4-3. Changing number of samples for a weak group sparsity case

In this experiment, we generated the feature set just as in the previous experiment. However, the coefficient vector is generated so that it has a weak sparse grouping. As before, the dimension D = 128 and the number of non-zero coefficients ||w||_0 = 32. Here the number of coefficients in non-sparse groups, k, is kept at 128; that is, k = 4||w||_0. The group size is selected to be uniform with value G_size = 8. Thus, there are 16 groups and the number of non-sparse groups is G_ns = 16. The number of samples is varied, N = 32, 96, 160. This corresponds to N/k = 1, 3, 5. In Figure 4-3, the x axis represents the value N/k and the y axis represents the Err value, as defined in Equation (4-1).

As we can see in Figure 4-3, when the number of samples N = k, Lasso, Group-Lasso and Bayesian Group-Lasso have similar performance. This could be because the number of samples is quite low compared to the number of features and the technique used does not give any advantage. As the number of samples increases to N = 3k, we can see that Group-Lasso has the highest error. This could be because of the weak group sparsity. Since the groups have weak group sparsity, the number of coefficients considered to be non-sparse by Group-Lasso could be higher than the actual values.
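A sketch of generating such a weak-group-sparse coefficient vector (illustrative; it places two nonzero coefficients in each of the 16 groups, so that k = 128 while ||w||_0 = 32):

    import numpy as np

    rng = np.random.default_rng(0)
    D, Gsize, n_groups = 128, 8, 16

    w = np.zeros(D)
    for g in range(n_groups):
        # two nonzero coefficients in every group: all 16 groups are non-sparse
        idx = g * Gsize + rng.choice(Gsize, size=2, replace=False)
        w[idx] = rng.choice([-1.0, 1.0], size=2)

    k = sum(Gsize for g in range(n_groups)
            if np.any(w[g * Gsize:(g + 1) * Gsize] != 0))
    print(np.count_nonzero(w), k, k / np.count_nonzero(w))   # 32 128 4.0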

It is interesting to compare the performance of Bayesian Group-Lasso and Group-Lasso in the strong group sparse situation and the weak group sparse situation. Consider the case when N/k = 2. As seen in Figure 4-1, Bayesian Group-Lasso does not perform as well as Group-Lasso. On the same note, from Figure 4-3, for N/k = 2, the performance of Bayesian Group-Lasso does not deteriorate like that of Group-Lasso.

CHAPTER 5
CONCLUSION AND FUTURE WORK

In this thesis, we studied three major methods in feature selection and analyzed their performance. Lasso, a benchmark among feature selection algorithms, is analyzed and the reason for its sparsity promotion is explained. Bayesian Lasso, an extension of Lasso, gives performance similar to Lasso and also estimates the variance in the coefficient estimation. Group-Lasso, although similar to Lasso, uses a regularization function which selects coefficients in groups. Group-Lasso assumes knowledge of the group structure of the coefficients. We also implemented the Bayesian Group-Lasso model for regression and compared its performance with Lasso and Group-Lasso. We also discussed the performance of Bayesian Group-Lasso in comparison with Lasso and Group-Lasso when the number of samples changes and when the group size changes.

Many interesting areas of research came up during the analysis and testing of these algorithms. One shortcoming of almost all the techniques discussed in this thesis is that they assume a linear relation between the target variable and the features. One possible direction for future work is to implement the Bayesian Group-Lasso concept using a kernel function. Another is to implement the Bayesian Group-Lasso concept using the Expectation Maximization (EM) method. The currently proposed method uses sampling as the implementation method. One disadvantage of using sampling is the heavy computational overhead. If we can find a solution for Bayesian Group-Lasso using the EM method, the solution can be found much faster.

REFERENCES

[1] D. Andrews and C. Mallows. Scale mixtures of normal distributions. Journal of the Royal Statistical Society, Series B (Methodological), 36(1):99-102, 1974.

[2] R. Baraniuk. Compressive sensing. IEEE Signal Processing Magazine, 24(4):118-120, 2007.

[3] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2nd printing edition, 2007.

[4] S. Chib. Markov chain Monte Carlo methods: computation and inference. Handbook of Econometrics, 5:3569-3649, 2001.

[5] S. Cotter, B. Rao, K. Engan, and K. Kreutz-Delgado. Sparse solutions to linear inverse problems with multiple measurement vectors. IEEE Transactions on Signal Processing, 53(7):2477-2488, 2005.

[6] J. Friedman, T. Hastie, and R. Tibshirani. A note on the group lasso and a sparse group lasso. ArXiv e-prints, 2010.

[7] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Verlag, 2009.

[8] J. Huang and T. Zhang. The benefit of group sparsity. The Annals of Statistics, 38(4):1978-2004, 2010.

[9] S. Ji, D. Dunson, and L. Carin. Multi-task compressive sensing. IEEE Transactions on Signal Processing, 57(1):92-106, 2009.

[10] M. Kyung, J. Gill, M. Ghosh, and G. Casella. Penalized regression, standard errors, and Bayesian lassos. Bayesian Analysis, 5(2):369-412, 2010.

[11] Y. Lin. l1-norm sparse Bayesian learning: Theory and applications. PhD Dissertation, University of Pennsylvania, 2008.

[12] D. J. Lunn, A. Thomas, N. Best, and D. Spiegelhalter. WinBUGS - a Bayesian modelling framework: Concepts, structure, and extensibility. Statistics and Computing, 10:325-337, 2000.

[13] F. Mao. A study of joint classifier and feature optimization: Theory and analysis. PhD Dissertation, University of Florida, 2007.

[14] T. Park and G. Casella. The Bayesian lasso. Journal of the American Statistical Association, 103:681-686, June 2008.

[15] S. Raman, T. J. Fuchs, P. J. Wild, E. Dahl, and V. Roth. The Bayesian group-lasso for analyzing contingency tables. In Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, pages 881-888, 2009.

[16] N. Subrahmanya and Y. Shin. Sparse multiple kernel learning for signal processing applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(5):788-798, May 2010.

[17] N. Subrahmanya, Y. C. Shin, and P. H. Meckl. A Bayesian machine learning method for sensor selection and fusion with application to on-board fault diagnostics. Mechanical Systems and Signal Processing, 24(1):182-192, 2010.

[18] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58:267-288, 1994.

[19] D. Wipf and B. Rao. An empirical Bayesian strategy for solving the simultaneous sparse approximation problem. IEEE Transactions on Signal Processing, 55(7):3704-3716, 2007.

[20] N. Yi and S. Xu. Bayesian lasso for QTL mapping. Genetics, 179(2):1045-1055, 2008.

[21] Y. Lin and H. H. Zhang. Component selection and smoothing in smoothing spline analysis of variance models. The Annals of Statistics, 34:2272-2297, 2006.

[22] M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1):49-67, 2006.

BIOGRAPHICAL SKETCH

Manu Chandran received his Bachelor of Technology degree in electronics and communication engineering from Kerala University, India, in 2006. He joined the Department of Electrical and Computer Engineering at the University of Florida in August 2008. As a graduate student, he worked in the interdisciplinary microsystems group and the control systems group at the University of Florida. He conducted research in feature selection and other machine learning techniques. He received his Master of Science degree from the Department of Electrical and Computer Engineering in the summer of 2011. His other areas of interest include image processing, signal processing and embedded systems.