Two Models Involving Bayesian Nonparametric Techniques


Material Information

Title:
Two Models Involving Bayesian Nonparametric Techniques
Physical Description:
1 online resource (175 p.)
Language:
english
Creator:
Sengupta, Subhajit
Publisher:
University of Florida
Place of Publication:
Gainesville, Fla.
Publication Date:

Thesis/Dissertation Information

Degree:
Doctorate ( Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Computer Engineering, Computer and Information Science and Engineering
Committee Chair:
Banerjee, Arunava
Committee Members:
Ho, Jeffrey Yih Chian
Gader, Paul D
Entezari, Alireza
Ghosh, Malay

Subjects

Subjects / Keywords:
bayesian-nonparametrics -- categorical-ibp -- de-finetti -- mcmc-gibbs -- negative-multinomial-process -- stiefel-manifold -- variational-bayes
Computer and Information Science and Engineering -- Dissertations, Academic -- UF
Genre:
Computer Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract:
Deciphering latent structure in data is one of the fundamental challenges that the machine learning community has been grappling with in recent years. Developments in non-parametric Bayesian theory have helped researchers in Machine Learning to build successively better inference techniques to apply to real world problems. In particular, the Dirichlet Process Mixture Model (DPMM) has been exploited extensively for unsupervised clustering of data. Although inference techniques have been studied on different analytic manifolds, nonparametric modeling techniques on general analytic manifolds have not been thoroughly explored so far; there has been only limited work on particular special manifolds like the Stiefel and the Grassmann manifold.  In the first problem that we discuss in this dissertation, we present a Dirichlet process mixture model framework on the Stiefel manifold to automatically discover the number and membership of clusters. Solutions to both the standard approaches to inference -- Markov chain Monte Carlo, as well as variational inference -- are discussed. This novel approach to discovering the hidden structure in the data successfully combines directional statistics, the geometric structure of the space, and Bayesian learning. In support of the theoretical results, some real-world data sets as well as some synthetic data sets are clustered using our algorithm and satisfactory performance is demonstrated. The second problem that we discuss in this dissertation concerns the Latent feature model which has been extensively used in various machine learning applications. One of the important advances in latent feature modeling is the development of a stochastic process called the Indian Buffet Process (IBP). It defines a prior probability distribution over equivalence classes of sparse binary matrices with finite number of rows and an unbounded number of columns. 
Researchers have also shown the connection between IBP and an underlying stochastic process called the Beta process (BP).  In this part of the dissertation, we present an extension to the IBP that removes the "binary" constraint on the elements of the matrix. We show how this new process, called the categorical IBP, or cIBP, is related to another underlying stochastic process called the Beta Dirichlet process (BDP), and discuss its properties. Finally, we present the necessary conjugacy and inference techniques for a proposed application to topic discovery.
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Bibliography:
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility:
by Subhajit Sengupta.
Thesis:
Thesis (Ph.D.)--University of Florida, 2013.
Local:
Adviser: Banerjee, Arunava.
Electronic Access:
RESTRICTED TO UF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE UNTIL 2014-05-31

Record Information

Source Institution:
UFRGP
Rights Management:
Applicable rights reserved.
Classification:
lcc - LD1780 2013
System ID:
UFE0045184:00001




Full Text

TWO MODELS INVOLVING BAYESIAN NONPARAMETRIC TECHNIQUES

By
SUBHAJIT SENGUPTA

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA
2013

© 2013 Subhajit Sengupta

This dissertation is dedicated to my Maa and Baba for their endless support and love

ACKNOWLEDGMENTS

After completing a wonderful voyage of six and a half years, I would like to take this opportunity to thank all the people who accompanied me on this memorable long journey. Without their support and encouragement, it would have been impossible for me to come to this point.

First of all, I would like to thank my advisors and mentors Dr. Arunava Banerjee and Dr. Jeffrey Ho, without whom I would not be here to write my dissertation in the first place. This is probably the first time I am being so formal as to thank them. I owe them a debt of gratitude for their patience, immense support, kindness and belief in me. Arunava is the person who instilled in me the desire to do meaningful research. He has amazing motivational prowess. There is no way I can finish talking about him within a paragraph. I am especially thankful to Jeff for being so generous with his time for the last two and a half years. We had countless interesting and exciting discussions, which I will miss the most when I am no longer in Gainesville. His appetite for knowledge should be a dream for every young researcher. I feel really blessed to have Arunava and Jeff as my teachers, from whom I will never finish learning in my entire life.

I would like to thank all the other members of my committee, Dr. Paul Gader, Dr. Alireza Entezari and Dr. Malay Ghosh, for spending their invaluable time on numerous helpful discussions. I would also like to thank Dr. Arunava Banerjee, Dr. Jeffrey Ho, Dr. Anand Rangarajan, Dr. Meera Sitharam and Dr. Paul Gader for their excellent courses in the Computer Science department. Outside our department, I would like to thank Dr. Rosalsky, Dr. Robinson, Dr. Ghosh, Dr. Doss, Dr. Presnell and Dr. Hobert for their wonderful mathematics and statistics courses. I am also very thankful to Dr. Donald Richards from Penn State University for insightful communications over email related to the first problem that we will discuss in this dissertation.

I was partially supported by a grant (IIS-0902230) from the National Science Foundation to Arunava, back in 2009-2010, and by a research assistantship under Jeff in 2011, which I gratefully acknowledge. I am also very thankful to the CISE Department for supporting me through teaching assistantships (TA) over the years. It has been a great privilege to spend several years in the CISE department at the University of Florida. These moments will always remain dear to me. I am thankful to John Bowers and Joan Crisman for all their help with administrative issues. I am very grateful to the Wikimedia Foundation for their wonderful effort in creating a worldwide knowledge resource for every single human being.

I would like to thank all of my housemates over the years and all my friends in the US and other parts of the world. They were always a source of joy, laughter and support. I'd like to give special thanks to my friends Kiranmoy Das and Subhadip Pal, with whom I spent a lot of time discussing many interesting statistical problems. I would like to thank all of my lab-mates: Karthik Gurumoothy, Venkatakrishnan Ramaswami, Ajit Rajwade, John Corring, Mohsen Ali, Jason Chi, Manu Sethi, Shahed Nejhum, Nathan Vanderkraats and Neko Fisher. I had a wonderful time in my lab enjoying countless interesting conversations with all of you. I know that I have left out the names of many really good friends, but you know who you are!

Finally, I am thankful to my loving and caring maa (mother) and baba (father), who always believed in me, supported me every minute, taught me the values of being a disciplined and responsible person, and helped me to finish one of the exciting and enjoyable battles of my life.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
  1.1 Problem Statements
  1.2 Previous Related Work
  1.3 Organization of the Dissertation

2 BAYESIAN INFERENCE AND NONPARAMETRIC BAYESIAN FRAMEWORK
  2.1 Bayesian Theory
    2.1.1 MAP Estimate
    2.1.2 Conjugate Prior
  2.2 Nonparametric Bayesian
    2.2.1 Motivation and Theoretical Background
    2.2.2 Exchangeability
    2.2.3 De Finetti Theorem
    2.2.4 Dirichlet Distribution and Dirichlet Process
      2.2.4.1 Posterior Distribution for DP
      2.2.4.2 Polya's Urn Scheme or CRP
      2.2.4.3 Stick Breaking Representation of DP
      2.2.4.4 Dirichlet Process Mixture Model (DPMM)
    2.2.5 Beta Process (BP)
      2.2.5.1 Completely Random Measure (CRM)
      2.2.5.2 Another Viewpoint of CRM
      2.2.5.3 BP
      2.2.5.4 Bernoulli Process (BeP) and Indian Buffet Process (IBP)
      2.2.5.5 Connection to IBP
    2.2.6 BP in a Nutshell
  2.3 Markov Chain Monte Carlo Sampling
    2.3.1 Metropolis-Hastings (MH) and Gibbs Sampling
    2.3.2 Rejection Sampling Method
    2.3.3 Adaptive Rejection Sampling (ARS)
    2.3.4 MCMC Sampling Techniques for DPMM
      2.3.4.1 Slice Sampling
      2.3.4.2 Efficient Slice Sampling
  2.4 Variational Bayes (VB) Inference
    2.4.1 Approximate Inference Procedure
    2.4.2 KL-Divergence

3 GEOMETRIC AND STATISTICAL PROPERTIES OF STIEFEL MANIFOLD
  3.1 Geometric Properties of Stiefel Manifold
    3.1.1 Analytic Manifold
    3.1.2 Stiefel Manifold
    3.1.3 Group Action
    3.1.4 Tangent and Normal Space of Stiefel Manifold
  3.2 Statistical Properties of Stiefel Manifold
    3.2.1 Probability Distribution on Stiefel Manifold
    3.2.2 Properties of Matrix Langevin Distribution
    3.2.3 Computation of the Hypergeometric Function of a Matrix Argument
    3.2.4 Sampling Random Matrices from Matrix Langevin Distribution on $V_{n,p}$
    3.2.5 The Rejection Sampling Method
    3.2.6 Gibbs Sampling Method

4 BAYESIAN ANALYSIS OF MATRIX-LANGEVIN ON THE STIEFEL MANIFOLD
  4.1 Preliminaries
  4.2 Motivating Example: Dictionary Learning
  4.3 The Stiefel Manifold and ML Distribution
  4.4 Parametric Bayesian Inference for the ML Distribution
    4.4.1 Likelihood for the ML Distribution
    4.4.2 Prior for the Polar Part M
    4.4.3 Posterior for the Polar Part M
    4.4.4 Prior for the Elliptical or Concentration Part K
    4.4.5 Upper and Lower Bounds for the ${}_0F_1(\cdot)$ Function
      4.4.5.1 A Lower Bound
      4.4.5.2 Lower Bounds for $I_0(x)$
      4.4.5.3 Remarks
      4.4.5.4 Lower Bound for ${}_0F_1(\cdot)$ Using Lower Bound for $I_0(x)$
    4.4.6 Posterior for the Elliptical or Concentration Part D
      4.4.6.1 Rejection Sampling
      4.4.6.2 Metropolis-Hastings (MH) Sampling Scheme for D
      4.4.6.3 Hybrid Gibbs Sampling
    4.4.7 Experiments on Simulated Data
    4.4.8 Extension of the Model to a More General K
    4.4.9 Log-convexity of the Hypergeometric Function
      4.4.9.1 A Solution
      4.4.9.2 Possible ARS Sampling
  4.5 Finite Mixture Modeling
  4.6 Infinite Mixture Modeling
    4.6.1 DPM Modeling on the Stiefel Manifold
    4.6.2 MCMC Inference Scheme
    4.6.3 Variational Bayes Inference (VB) on Stiefel Manifold
      4.6.3.1 Matrix-Langevin Distributions
      4.6.3.2 Update Equation for [?]_t
      4.6.3.3 Update Equation for [?]_t
      4.6.3.4 CG for Minimizing F([?]) on the Stiefel Manifold
      4.6.3.5 Update Equation for [?]_{n,t}
      4.6.3.6 Calculated KL-Divergence
  4.7 Experiments
    4.7.1 Experiments on Synthetic Data
    4.7.2 Categorization of Objects
    4.7.3 Classification of Outdoor Scenes

5 BETA-DIRICHLET PROCESS AND CATEGORICAL INDIAN BUFFET PROCESS
  5.1 Multivariate Liouville Distributions
  5.2 Beta-Dirichlet (BD) Distribution
  5.3 Normalization Constant by Liouville Extension of Dirichlet Integral
  5.4 BD Distribution Conjugacy
    5.4.1 With Multinomial Likelihood
    5.4.2 With Negative Multinomial Likelihood
  5.5 Completely Random Measure (CRM) Representation
    5.5.1 Another Viewpoint for CRM
    5.5.2 Campbell's Theorem
  5.6 Beta Dirichlet Process
    5.6.1 BP Construction by Taking Limit from Discrete Case
    5.6.2 Construction of BDP
  5.7 Multivariate CRM (MCRM) Representation of BDP
  5.8 Beta-Dirichlet Process as a Poisson Process
  5.9 A Size-biased Construction for Lévy Representation of BDP
  5.10 BD-Categorical Process Conjugacy
  5.11 Categorical Process (CaP)
    5.11.1 Conjugacy for CaP and BDP - CRM Formulation
    5.11.2 BD-CaP Conjugacy With Standard Parametrization
    5.11.3 BD-CaP Conjugacy Using Alternative Parametrization for BDP in the Base Measure (G)
  5.12 BD-CaP Conjugacy - Proof Statement
  5.13 Extension of Indian Buffet Process
  5.14 Extension of Finite Feature Model and the Limiting Case
  5.15 BD Process (BDP) and Categorical Indian Buffet Process (cIBP)
    5.15.1 Symmetric Dirichlet
    5.15.2 Asymmetric Dirichlet
    5.15.3 Connection
  5.16 BD-NM Conjugacy
    5.16.1 Negative Multinomial Process (NMP)
    5.16.2 Conjugacy for NMP and BDP - CRM Formulation
    5.16.3 Formal Proof of Conjugacy of NMP for BDP
      5.16.3.1 Prior Part
      5.16.3.2 Induced Measure
      5.16.3.3 Marginal Distribution of Y Via Marked Poisson Process
      5.16.3.4 Checking the Integration
    5.16.4 The Case When [?] = 1
  5.17 Beta-Dirichlet-Negative Multinomial Process as a Marked Poisson Process
  5.18 Experiment with Simulated Data and Results
    5.18.1 Synthetic Data
  5.19 Inference for BD-NM Model
    5.19.1 Negative Multinomial Likelihood is Conjugate to Beta Dirichlet Prior
    5.19.2 Negative Multinomial as a Mixture of Gamma and Multivariate Independent Poisson (MIP)
    5.19.3 Posterior Inference with Finite Approximate Gibbs Sampler
      5.19.3.1 BD Draws
      5.19.3.2 Negative Multinomial Draws
      5.19.3.3 Gamma-Poisson Conjugacy
      5.19.3.4 Inference Steps
      5.19.3.5 Sampling $z_{d,n}$, $y_{d,n}$ and $c_{d,n}$
      5.19.3.6 Sampling $A_{d,k}$
      5.19.3.7 Sampling $b_{0,k}$
      5.19.3.8 Sampling $\omega_k$

6 DISCUSSION AND FUTURE WORK
  6.1 Future Work Related to the First Problem
  6.2 Future Work Related to the Second Problem

REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

Table
4-1 Results for synthetic data set.
4-2 Actual and estimated number of clusters and accuracy with real data having different numbers of clusters

LIST OF FIGURES

Figure
2-1 The space of data X and the space of all measures M(X).
2-2 Partition of X
2-3 CRP representation of DP
2-4 SB construction for DP
2-5 Graphical model for DPMM
2-6 CRM - alternative view
2-7 BP in a nutshell
2-8 Rejection sampling method
3-1 The tangent and normal spaces of an embedded manifold
4-1 Lower bound for ${}_0F_1(a; S)$ [in red] by RHS of equation 4-14 [in blue]. The x-axis represents the sum of eigenvalues and the y-axis denotes the function values.
4-2 Lower bound for $I_0(x)$ [in red] by $\exp(x - 0.77)$ [in blue]. Note that the inequality $I_0(x) > \exp(x - 0.77)$ holds only in the interval [0, 1].
4-3 An approximate profile of the posterior density function for a 2 × 2 diagonal matrix when 100 data points are given
4-4 Graphical model for variational inference of DPM
4-5 Log marginal probability of the data increases with number of iterations
4-6 Confusion matrix for all of the simulated data set
4-7 Selected 6 object categories from the ETH-80 data set
4-8 Confusion matrix for ETH-80 data set
4-9 Selected 3 scene categories from the 8-Scene data set
4-10 Confusion matrix for Outdoor Scene data set
5-1 (Left side) Poisson process on [0, [?]] [?] S2o with mean measure [?] = [?] h. The set V contains a Poisson distributed number of atoms with parameter ∫ S h(dω) [?](dp). (Right side) One draw from BD constructed from [?]. The first dimension is the location and the other dimensions constitute the weight vector.
5-2 BD-Categorical process with Q = 2
5-3 A candidate matrix with Q = 3 has 4 categories, namely 0, 1, 2 or 3.
5-4 A candidate matrix with Q = 2 has 3 categories, namely 0, 1, 2.
5-5 BD-Negative Multinomial process with Q = 2
5-6 The hierarchical BP-BD-NM process with K = 3 and Q = 2
5-7 Top 6 topics and their top 20 words

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

TWO MODELS INVOLVING BAYESIAN NONPARAMETRIC TECHNIQUES

By
Subhajit Sengupta
May 2013

Chair: Arunava Banerjee
Major: Computer Engineering

Deciphering latent structure in data is one of the fundamental challenges that the machine learning community has been grappling with in recent years. Developments in non-parametric Bayesian theory have helped researchers in Machine Learning to build successively better inference techniques to apply to real world problems. In particular, the Dirichlet Process Mixture Model (DPMM) has been exploited extensively for unsupervised clustering of data. Although inference techniques have been studied on different analytic manifolds, nonparametric modeling techniques on general analytic manifolds have not been thoroughly explored so far; there has been only limited work on particular special manifolds like the Stiefel and the Grassmann manifold.

In the first problem that we discuss in this dissertation, we present a Dirichlet process mixture model framework on the Stiefel manifold to automatically discover the number and membership of clusters. Solutions to both the standard approaches to inference - Markov chain Monte Carlo, as well as variational inference - are discussed. This novel approach to discovering the hidden structure in the data successfully combines directional statistics, the geometric structure of the space, and Bayesian learning. In support of the theoretical results, some real-world data sets as well as some synthetic data sets are clustered using our algorithm and satisfactory performance is demonstrated.

The second problem that we discuss in this dissertation concerns the Latent feature model which has been extensively used in various machine learning applications. One of the important advances in latent feature modeling is the development of a stochastic process called the Indian Buffet Process (IBP). It defines a prior probability distribution over equivalence classes of sparse binary matrices with finite number of rows and an unbounded number of columns. Researchers have also shown the connection between IBP and an underlying stochastic process called the Beta process (BP). In this part of the dissertation, we present an extension to the IBP that removes the "binary" constraint on the elements of the matrix. We show how this new process, called the categorical IBP, or cIBP, is related to another underlying stochastic process called the Beta Dirichlet process (BDP), and discuss its properties. Finally, we present the necessary conjugacy and inference techniques for a proposed application to topic discovery.

CHAPTER 1
INTRODUCTION

1.1 Problem Statements

This dissertation can, by and large, be divided into two pieces, with nonparametric Bayesian methods providing the common thread. The first concerns the application of Bayesian non-parametric inference to data lying in non-Euclidean spaces. The second concerns the application of Bayesian non-parametric inference to latent feature models of data. Each piece is described in turn in the following paragraphs.

In machine learning, collected data often do not lie in Euclidean space. For example, data corresponding to orientation lie on a unit hyper-sphere. In the general case, data may lie on some manifold with non-zero curvature. One such compact matrix manifold is known as the Stiefel manifold. It is the space of ordered sets of p orthonormal vectors, or p-frames, in $\mathbb{R}^n$, and can be seen as a generalization of orientation data. Standard probability distributions cannot be used to describe the orientation; we have to use a probability distribution that is defined on that manifold. In some situations we are required to cluster these orientation matrices. In [1], we find one such modeling requirement.

In the first segment of this dissertation we present a Bayesian nonparametric model on the Stiefel manifold. We seek to cluster data points on the manifold itself. Clustering is a well-known problem in unsupervised machine learning, where the goal is to group similar objects together. Numerous methods have been proposed to solve clustering; K-means, mixtures of Gaussians, and EM are some of the more popular ones. In the context of clustering, nonparametric modeling is particularly appropriate since we might not have prior knowledge of the number of clusters. By applying Bayesian nonparametric techniques one can potentially avoid the model selection problem; the inference techniques are able to find the right number of clusters from the data. Here we formulate and solve a novel clustering model on the Stiefel manifold by first developing theory for the case where the number of clusters is known beforehand, and then augmenting it with theory borrowed from Bayesian nonparametrics.
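As a small illustration of the definition above, the following sketch builds a p-frame in $\mathbb{R}^n$ and checks the orthonormality condition that places it on the Stiefel manifold. It is not taken from the dissertation: the helper names `gram_schmidt` and `is_p_frame`, the tolerance, and the choice of n = 5, p = 2 are assumptions made only for this example, and Gram-Schmidt is used here simply as a convenient way to produce a valid point.

```python
import math
import random


def gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors (modified Gram-Schmidt)."""
    frame = []
    for v in vectors:
        w = list(v)
        for q in frame:
            # Remove the component of w along the already accepted direction q.
            coeff = sum(wi * qi for wi, qi in zip(w, q))
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        frame.append([wi / norm for wi in w])
    return frame


def is_p_frame(frame, tol=1e-9):
    """Return True if the vectors have unit length and are pairwise orthogonal."""
    for i, u in enumerate(frame):
        for j, v in enumerate(frame):
            dot = sum(a * b for a, b in zip(u, v))
            target = 1.0 if i == j else 0.0
            if abs(dot - target) > tol:
                return False
    return True


# Hypothetical example: p = 2 ordered orthonormal vectors in R^5.
rng = random.Random(0)
n, p = 5, 2
raw = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
frame = gram_schmidt(raw)
print(is_p_frame(raw), is_p_frame(frame))
```

With p = 1 the same check reduces to requiring a single unit vector, which is the unit hyper-sphere case mentioned in the text.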

The second problem that we tackle in this dissertation is based on the Latent feature model of data. In this model, each object or data point is represented by a vector of latent feature values, and the data can be generated from a distribution on the latent feature vector. A generalization to infinitely many features is achieved by the Indian Buffet process (IBP), which defines a prior on the space of binary matrices that indicate the possession of a particular feature by an object, with the number of columns in the matrix (corresponding to features) being potentially unbounded. The objects are exchangeable, and therefore, according to De Finetti's theorem, there exists a random probability measure such that the objects are conditionally independent once this measure is given. In [2] the authors have shown that the Beta Process is the underlying De Finetti measure for the IBP. Here we show that the beta-Dirichlet (BD) process [3] is the underlying De Finetti measure for a generalization of the IBP, a stochastic process which we call the Categorical IBP (cIBP). Stated informally, this dissertation shows that the BD process plays the role for the cIBP that the Dirichlet Process (DP) and the Beta Process (BP) play for the Chinese Restaurant Process (CRP) and the traditional IBP, respectively. Ours is a natural extension or generalization of the traditional IBP in which each entry is a categorical random variable instead of a Bernoulli random variable. We have developed a hierarchical Bayesian framework with the BD process and use this connection to develop an efficient inference scheme for the cIBP. Finally, we present an application in topic modeling using this new model.
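To make the idea of a prior over binary matrices with an unbounded number of columns concrete, here is a minimal sketch of sampling from the traditional (binary) IBP. It uses the sequential "restaurant" construction that is standard in the IBP literature rather than anything stated in this section, and it does not implement the cIBP. The function names, the mass parameter `alpha`, and the simple Poisson sampler are assumptions introduced for illustration.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw from Poisson(lam) by Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, prod = 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count


def sample_ibp(num_objects, alpha, rng):
    """Sample a binary feature matrix, one row per object, from the one-parameter IBP."""
    feature_counts = []  # m_k: how many earlier objects possess feature k
    rows = []
    for i in range(1, num_objects + 1):
        # An existing feature k is taken with probability m_k / i.
        row = [1 if rng.random() < m / i else 0 for m in feature_counts]
        for k, z in enumerate(row):
            feature_counts[k] += z
        # The object then introduces Poisson(alpha / i) brand-new features.
        new = sample_poisson(alpha / i, rng)
        row.extend([1] * new)
        feature_counts.extend([1] * new)
        rows.append(row)
    width = len(feature_counts)
    # Pad earlier rows with zeros so that the result is rectangular.
    return [r + [0] * (width - len(r)) for r in rows]


Z = sample_ibp(num_objects=10, alpha=2.0, rng=random.Random(1))
for row in Z:
    print("".join(str(z) for z in row))
```

The number of columns is not fixed in advance; it is determined by the draw itself, which is the sense in which the number of features is potentially unbounded.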
1.2 Previous Related Work

Researchers in differential geometry have been studying the geometric properties of general Riemannian manifolds. A very good introduction can be found in [4], [5]. At the same time, many researchers working from a statistical standpoint have explored interesting statistical properties of those same manifolds. A comprehensive review can be found in [6]. The authors in [7] have explored optimization techniques on the Stiefel manifold, that is, techniques in the presence of orthogonality constraints. A survey of different optimization techniques on matrix manifolds can be found in [8]. Shape analysis is an area where statistics on manifolds has various applications. Recently, nonparametric Bayesian density estimation has been applied in the study of planar shapes, where the authors have identified the planar shape space with the complex projective space, which is the space of all complex lines passing through the origin in an appropriate complex plane [9]. Spatio-temporal dynamical models can also be studied on these special manifolds [10]. Image and video based recognition problems have been formulated in this framework. Recently, an intrinsic mean-shift algorithm has been developed on both the Stiefel and Grassmann manifolds, which has found application in object categorization and motion segmentation. Classification techniques on Riemannian manifolds have also been described in a variety of research in computer vision.

For the Latent feature model, a good review article on the IBP can be found in [11]. The connection between the IBP and the Beta process can be found in [2]. [12] and [13] are two other interesting works related to the IBP. Concepts from Lévy processes and Poisson processes are used in various proofs; [14], [15] and [16], [17] are excellent references for Lévy and Poisson processes, respectively. Hjort's [18] and Kim's [19] work on the development of nonparametric Bayesian techniques in the context of survival analysis is among the important advances in this field. [3] is the major motivation for our work in extending the IBP prior to the categorical setting. Within the machine learning community, [20] is an interesting recent related work.
1.3 Organization of the Dissertation

In chapter 2, we discuss the Bayesian framework, and in particular the nonparametric Bayesian modeling theory with various existing inference techniques. Chapter 3 contains the geometric and statistical properties of the Stiefel manifold. In chapter 4 we discuss the first problem: Bayesian analysis with the Matrix Langevin distribution on the Stiefel manifold. Both Markov chain Monte Carlo (MCMC) and variational Bayes inference techniques are formulated, and experimental results are presented at the end of the chapter. Chapter 5 is about a new Latent feature model related to the Beta-Dirichlet (BD) process. We show conjugacy of the negative multinomial process with the BD process. We also develop a hierarchical model, apply it to a synthetic data set in the domain of topic discovery, and show experimental results. At the end of this chapter, we explicitly show the connection between the cIBP and the BDP, where the second is the De Finetti mixing distribution of the first. Finally, in chapter 6, we conclude with our future work plan.

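CHAPTER 2
BAYESIAN INFERENCE AND NONPARAMETRIC BAYESIAN FRAMEWORK

2.1 Bayesian Theory

Bayesian inference is one of the most important techniques in mathematical statistics. In the basic Bayesian framework, a prior distribution for the parameter $\theta$ reflects one's prior knowledge regarding $\theta$. The prior is updated by observing the data $X_1, X_2, \ldots, X_n$, which are modeled as independent and identically distributed (i.i.d.) $\sim p$ given $\theta$. The updated distribution for $\theta$ based on the data is called the posterior distribution for $\theta$ and is obtained via Bayes' rule. The posterior probability is derived from the prior probability and the likelihood function. The posterior, like the prior, is a probability measure on the parameter space, which depends on $X_1, X_2, \ldots, X_n$. It is relatively easy to find the predictive distribution, which is used to calculate various statistics for future observations.

Let us write down the formal description of Bayesian theory:

- $X = [X_1, X_2, \ldots, X_n]$ are n observed data points
- $\theta$ is the parameter
- $\alpha$ is the hyper-parameter for the distribution of the prior
- $X_{new}$ is a new data point whose predictive distribution needs to be computed

The prior distribution for $\theta$ is denoted by $p(\theta \mid \alpha)$. The likelihood is denoted by $p(X \mid \theta) = \prod_{i=1}^{n} p(X_i \mid \theta)$, as the data are drawn i.i.d. from $p$. The posterior distribution of $\theta$ is given by $p(\theta \mid X, \alpha)$:

\[ p(\theta \mid X, \alpha) = \frac{p(X \mid \theta)\, p(\theta \mid \alpha)}{p(X \mid \alpha)} = \frac{p(X \mid \theta)\, p(\theta \mid \alpha)}{\int p(X \mid \theta)\, p(\theta \mid \alpha)\, d\theta} \propto p(X \mid \theta)\, p(\theta \mid \alpha) \]

where $p(X \mid \alpha)$ is called the marginal likelihood. The posterior predictive distribution is the distribution of a new data point $X_{new}$, marginalized over the posterior distribution, and is given by

\[ p(X_{new} \mid X, \alpha) = \int p(X_{new} \mid \theta)\, p(\theta \mid X, \alpha)\, d\theta \]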
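The posterior and posterior predictive formulas above can be checked numerically on a simple case. The sketch below assumes a Bernoulli likelihood with a Beta(a, b) prior, a model chosen only for illustration and not one analyzed at this point in the text. It normalizes likelihood times prior on a grid, then compares the resulting marginal likelihood and predictive probability with the known closed forms for this pair. The data, the prior parameters, and the grid size are all hypothetical.

```python
import math


def beta_pdf(theta, a, b):
    """Density of the Beta(a, b) prior at theta in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta))


def bernoulli_likelihood(data, theta):
    """p(X | theta) for i.i.d. Bernoulli observations."""
    s = sum(data)
    return theta ** s * (1 - theta) ** (len(data) - s)


def grid_posterior(data, a, b, num=20000):
    """Approximate p(theta | X) by normalizing likelihood * prior on a midpoint grid."""
    h = 1.0 / num
    grid = [(k + 0.5) * h for k in range(num)]
    unnorm = [bernoulli_likelihood(data, t) * beta_pdf(t, a, b) for t in grid]
    marginal = sum(unnorm) * h  # approximates the marginal likelihood p(X | alpha)
    post = [u / marginal for u in unnorm]
    return grid, post, marginal, h


# Hypothetical data and hyper-parameters.
data = [1, 0, 1, 1, 0, 1, 1, 1]
a, b = 2.0, 2.0
grid, post, marginal, h = grid_posterior(data, a, b)

# Posterior predictive p(X_new = 1 | X): integrate theta against the posterior.
predictive_numeric = sum(t * p for t, p in zip(grid, post)) * h

# Closed forms available for the Beta-Bernoulli pair.
s, n = sum(data), len(data)
predictive_closed = (a + s) / (a + b + n)
log_beta = lambda x, y: math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)
marginal_closed = math.exp(log_beta(a + s, b + n - s) - log_beta(a, b))

print(predictive_numeric, predictive_closed)
print(marginal, marginal_closed)
```

The grid approach works for any one-dimensional prior, while the closed forms exist only because the Beta prior is conjugate to the Bernoulli likelihood.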

2.1.1 MAP Estimate

In Bayesian statistics [21], the mode of the posterior distribution is known as the maximum a posteriori (MAP) estimate. MAP is used to obtain a point estimate of the parameter based on the observed data. This estimation procedure can be seen as a regularized maximum likelihood estimation. It can also be seen as an optimization procedure over the parameter space, where the parameters now have a prior distribution. The MAP estimate is given by

\[ \hat{\theta}_{MAP} = \arg\max_{\theta}\; p(\theta \mid X, \alpha) \]

When the mode of the posterior distribution can be written in closed analytical form, computation of the MAP estimate is easier. This brings in the notion of a conjugate prior.

2.1.2 Conjugate Prior

When the posterior distribution $p(\theta \mid X, \alpha)$ is in the same family as the prior probability distribution $p(\theta \mid \alpha)$, the prior and posterior are called conjugate to each other, and the prior is called a conjugate prior for the likelihood $p(X \mid \theta)$. A conjugate prior is an algebraic convenience that allows the posterior distribution to be expressed in closed form. It is not a good idea, however, to choose a prior distribution merely because it works analytically; a conjugate prior should be chosen such that it adequately describes the investigator's knowledge of the unknown parameter(s) before any data are obtained. For example, the Beta distribution is the conjugate prior for the Bernoulli or Binomial likelihood, and the Dirichlet distribution is the conjugate prior for the Multinomial likelihood. In fact, if the likelihood function belongs to the exponential family, then there always exists a corresponding conjugate prior distribution.

2.2 Nonparametric Bayesian

In parametric Bayesian theory, the form of the distribution for each class of data is assumed. This is in some ways very restrictive as well as inflexible. In this case, the number of parameters does not depend on the sample size. A Bayesian nonparametric model, on the other hand, is a Bayesian model on an infinite-dimensional parameter space. Model complexity is often represented by the number of parameters; in Bayesian nonparametric [22] modeling, the effective model complexity adapts to the data, and hence the model is very flexible. In machine learning research, Bayesian nonparametric models have recently been studied and applied to a variety of problems that include clustering, classification, regression, density estimation, image segmentation, document processing, and topic modeling.

2.2.1 Motivation and Theoretical Background

Most recent machine learning problems deal with uncovering the patterns or structure of the data. One important problem is unsupervised learning, where we need to find the appropriate set of parameters for a model class without any training examples. A major challenge in machine learning is to identify the model classes, with suitable parameters, that are responsible for generating the data; this is the so-called problem of model selection. To be precise, in the setting of unsupervised learning it is the problem of the unknown number of clusters, like finding the unknown number of states in the case of a hidden Markov model.

Firstly, we would like to point out that nonparametric actually means an unbounded number of parameters. We use an infinite-dimensional parameter space and use only a finite subset of the available parameters based on any given finite data set. This subset generally grows with the data set. Ferguson (1974) [23] was the first to explore the Bayesian approach to nonparametric problems. In one of his seminal papers he mentioned two desirable properties of the prior distribution for nonparametric problems:

- The support of the prior distribution should be large with respect to a suitable topology on the space of probability distributions on the sample space.
- The posterior distribution given all the samples from a true probability distribution should be manageable analytically.

The Bayesian prior is formulated as a distribution on the space of probability distributions. A distribution on the space of probability distributions or probability measures is a stochastic process.

- Parametric Bayesian: the form of the distribution for each class is assumed.
- Nonparametric Bayesian: the parameter space contains the set of all probability measures on X, denoted M(X).

The Bayesian prior is then formulated as a distribution on the space $M(X)$.

Figure 2-1. The space of data $X$ and the space of all measures $M(X)$.

2.2.2 Exchangeability

A sequence of random variables $X_1, \ldots, X_n$ over the same probability space $(X, \mathcal{B}(X), \mu)$ is called exchangeable if the joint distribution of those random variables is invariant to permutation. If $P$ is the joint distribution and $\sigma$ is any permutation of $\{1, 2, \ldots, n\}$, then invariance under permutation gives

$$P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) = P(X_1 = x_{\sigma(1)}, X_2 = x_{\sigma(2)}, \ldots, X_n = x_{\sigma(n)})$$

An infinite sequence of random variables is infinitely exchangeable if $X_1, X_2, \ldots, X_n$ are exchangeable for all $n \geq 1$. This assumption is very common in applications of machine learning and applied statistics. It is a much weaker assumption than that of i.i.d. random variables. Clearly, i.i.d. implies exchangeability, but the converse is not true.

2.2.3 De Finetti Theorem

If $(X_1, X_2, \ldots, X_n)$ are infinitely exchangeable, then the joint probability $P(X_1, X_2, \ldots, X_n)$ has a representation as a mixture distribution for some random variable $\theta$. The joint probability is given by

$$P(X_1, X_2, \ldots, X_n) = \int P(\theta) \prod_{i=1}^{n} P(X_i \mid \theta)\, d\theta$$

If we assume a prior on the underlying random latent parameter, the data are conditionally i.i.d. given the latent parameter. In the case of the Dirichlet process, $P(\theta)$ is a distribution on the space of probability measures. So the De Finetti theorem implicitly defines the stochastic process underlying the Bayesian nonparametric model. This general version of De Finetti's result was proved by Hewitt and Savage (1955) [24].

2.2.4 Dirichlet Distribution and Dirichlet Process

The Dirichlet distribution is a multivariate generalization of the Beta distribution. It is a distribution over the probability simplex of $K$-dimensional probability vectors. Let $p = (p_1, p_2, \ldots, p_K)$ be a $K$-dimensional probability vector, such that $p_i \geq 0$ for all $i$ and $\sum_{i=1}^{K} p_i = 1$. We write the Dirichlet distribution with parameter $\alpha = (\alpha_1, \ldots, \alpha_K)$ as $\mathrm{Prob}(p \mid \alpha) = \mathrm{Dir}(\alpha_1, \ldots, \alpha_K)$, where

$$\mathrm{Dir}(\alpha_1, \ldots, \alpha_K) = \frac{\Gamma\left(\sum_{i=1}^{K} \alpha_i\right)}{\prod_{i=1}^{K} \Gamma(\alpha_i)} \prod_{i=1}^{K} p_i^{\alpha_i - 1}$$

with

$$E[p_j] = \frac{\alpha_j}{\sum_{i=1}^{K} \alpha_i}, \qquad \mathrm{Var}[p_j] = \frac{\alpha_j \left(\sum_{i=1}^{K} \alpha_i - \alpha_j\right)}{\left(\sum_{i=1}^{K} \alpha_i\right)^2 \left(\sum_{i=1}^{K} \alpha_i + 1\right)}, \qquad \mathrm{Cov}(p_r, p_s) = \frac{-\alpha_r \alpha_s}{\left(\sum_{i=1}^{K} \alpha_i\right)^2 \left(\sum_{i=1}^{K} \alpha_i + 1\right)} \text{ for } r \neq s.$$

One special case is the symmetric Dirichlet distribution, where all the parameters $\alpha_i$ are equal to a common value $\alpha$. The Dirichlet distribution is the conjugate prior of the categorical and multinomial distributions. So if the data likelihood is categorical or multinomial and the prior is Dirichlet, then the posterior is also Dirichlet.

The Dirichlet process is the infinite-dimensional generalization of the Dirichlet distribution. A Dirichlet process (DP) [25] is a random probability measure $G$ over $(X, \mathcal{B}(X))$ such that for any finite measurable partition $X = A_1 \cup A_2 \cup \cdots \cup A_N$,

$$(G(A_1), G(A_2), \ldots, G(A_N)) \sim \mathrm{Dir}(\alpha(A_1), \alpha(A_2), \ldots, \alpha(A_N))$$

where $\alpha$ is the base measure. A DP is thus a distribution over probability measures whose marginals on finite partitions are Dirichlet distributed. This immediately raises the question of the existence of a Dirichlet process; in other words, how to construct the stochastic process from this family of marginal distributions. The answer is the Kolmogorov consistency (or extension) theorem. Here is the formal statement of the theorem [26]:

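Figure 2-2. Partition of $X$

Proposition 2.1. (Consistency theorem): Let $T$ denote some time interval and let $n \in \mathbb{N}$. For each $k \in \mathbb{N}$ and finite sequence of times $t_1, t_2, \ldots, t_k \in T$, let $\nu_{t_1, t_2, \ldots, t_k}$ be a probability measure on $(\mathbb{R}^n)^k$. Suppose that these measures satisfy the following two consistency conditions:

- For all permutations $\pi$ of $\{1, 2, \ldots, k\}$ and measurable sets $F_i \subseteq \mathbb{R}^n$,
$$\nu_{t_{\pi(1)}, t_{\pi(2)}, \ldots, t_{\pi(k)}}(F_{\pi(1)} \times F_{\pi(2)} \times \cdots \times F_{\pi(k)}) = \nu_{t_1, t_2, \ldots, t_k}(F_1 \times F_2 \times \cdots \times F_k)$$
- For all measurable sets $F_i \subseteq \mathbb{R}^n$ and $m \in \mathbb{N}$,
$$\nu_{t_1, t_2, \ldots, t_k}(F_1 \times F_2 \times \cdots \times F_k) = \nu_{t_1, \ldots, t_k, t_{k+1}, \ldots, t_{k+m}}(F_1 \times F_2 \times \cdots \times F_k \times \underbrace{\mathbb{R}^n \times \cdots \times \mathbb{R}^n}_{m\text{ times}})$$

Then there exists a probability space $(\Omega, \mathcal{F}, P)$ and a stochastic process $X : T \times \Omega \to \mathbb{R}^n$ such that

$$\nu_{t_1, t_2, \ldots, t_k}(F_1 \times F_2 \times \cdots \times F_k) = P(X_{t_1} \in F_1, X_{t_2} \in F_2, \ldots, X_{t_k} \in F_k)$$

for all $t_i \in T$, $k \in \mathbb{N}$ and measurable sets $F_i \subseteq \mathbb{R}^n$; i.e. $X$ has $\nu_{t_1, \ldots, t_k}$ as its finite-dimensional distributions relative to the times $t_1, \ldots, t_k$.

In other words, given the above consistency conditions, there exists a (unique) measure $\nu$ on $(\mathbb{R}^n)^T$ with marginals $\nu_{t_1, \ldots, t_k}$ for any finite collection of times $t_1, \ldots, t_k$. This theorem can be used to prove the existence of the DP.

A DP $G$ has two parameters:

- Base distribution $H$, which is like the expectation of the DP.
- Concentration parameter $\alpha$, which is like the inverse variance of the DP.

The expectation and variance of the DP are $E[G(A)] = H(A)$ and $V[G(A)] = \frac{H(A)(1 - H(A))}{\alpha + 1}$, respectively, where $A$ is any measurable subset of $X$.

2.2.4.1 Posterior Distribution for DP

Let $G \sim DP(\alpha, H)$. Since $G$ is a random distribution, let $\theta_1, \theta_2, \ldots, \theta_n$ be draws from $G$. We are interested in the posterior distribution of $G$ given the observed values of $\theta_1, \theta_2, \ldots, \theta_n$. Let $A_1, A_2, \ldots, A_N$ be a finite measurable partition of $X$, and let $n_k$ be the number of $\theta_i$ such that $\theta_i \in A_k$. By the definition of the DP and the conjugacy between the Dirichlet and multinomial distributions, it can be shown that

$$(G(A_1), G(A_2), \ldots, G(A_N) \mid \theta_1, \ldots, \theta_n) \sim \mathrm{Dir}(\alpha H(A_1) + n_1, \ldots, \alpha H(A_N) + n_N)$$

Since the above is true for all finite measurable partitions, the posterior distribution over $G$ must be a DP as well. The posterior DP has concentration parameter $(\alpha + n)$ and base distribution $\frac{\alpha H + \sum_{i=1}^{n} \delta_{\theta_i}}{\alpha + n}$, where $\delta_{\theta_i}$ is a point mass located at $\theta_i$ and $n_k = \sum_{i=1}^{n} \delta_{\theta_i}(A_k)$. Rewriting the posterior DP, we have

$$(G \mid \theta_1, \theta_2, \ldots, \theta_n) \sim DP\left(\alpha + n,\; \frac{\alpha}{\alpha + n} H + \frac{n}{\alpha + n} \cdot \frac{\sum_{i=1}^{n} \delta_{\theta_i}}{n}\right)$$

Note that the posterior base distribution is a weighted average of the prior base distribution $H$ and the empirical distribution $\frac{1}{n}\sum_{i=1}^{n} \delta_{\theta_i}$. As the number of observations grows large, i.e. $n \to \infty$, the posterior is dominated by the empirical distribution, which is in turn a close approximation of the true underlying distribution (by the Glivenko-Cantelli lemma). This gives a consistency property of the DP: the posterior DP approaches the true underlying distribution.

Apart from the abstract definition of the DP, it has two more representations:

- Polya's urn scheme or Chinese Restaurant Process (CRP), and
- Stick Breaking (SB) construction.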
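The conjugate update on a finite partition can be illustrated with a few lines of Python. This is only a minimal sketch: the function name `dp_posterior_partition` and the numbers used are hypothetical, and the code simply evaluates the posterior Dirichlet parameters $\alpha H(A_k) + n_k$ and the posterior mean of $G(A_k)$ stated above.

```python
def dp_posterior_partition(alpha, base_probs, counts):
    """Posterior Dirichlet parameters and posterior mean of (G(A_1), ..., G(A_N)).

    alpha      : DP concentration parameter
    base_probs : H(A_k) for each cell of the partition (should sum to 1)
    counts     : n_k, number of observed draws that fell in each cell
    """
    n = sum(counts)
    params = [alpha * h + c for h, c in zip(base_probs, counts)]
    # The mean of a Dirichlet is each parameter divided by the parameter total.
    mean = [a / (alpha + n) for a in params]
    return params, mean


# Hypothetical example: three cells, four observed draws.
params, mean = dp_posterior_partition(2.0, [0.5, 0.3, 0.2], [3, 0, 1])
```

The posterior mean of each cell is the weighted average $\frac{\alpha}{\alpha+n}H(A_k) + \frac{n}{\alpha+n}\frac{n_k}{n}$, so cells with no observations still keep some prior mass.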

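2.2.4.2 Polya's Urn Scheme or CRP

Let $\alpha$ be any finite positive measure on a complete separable metric space $X$. Also, let $\mu^*$ be a random draw from a DP with parameter $\alpha$, which is almost surely discrete. A sequence $\{X_n, n \geq 1\}$ of random variables with values in $X$ is a Polya sequence with parameter $\alpha$ if for all $B \subset X$

$$P(X_1 \in B) = \frac{\alpha(B)}{\alpha(X)}, \qquad P(X_{n+1} \in B \mid X_1, X_2, \ldots, X_n) = \frac{\alpha_n(B)}{\alpha_n(X)}$$

where $\alpha_n(\cdot) = \alpha(\cdot) + \sum_{i=1}^{n} \delta(X_i)(\cdot)$ and $\delta(x)$ is the unit measure concentrated at $x$. For finite $X$, the urn scheme can be described as follows: the sequence $\{X_n\}$ can be seen as the result of successive draws from an urn which initially contains $\alpha(x)$ balls of color $x$; after each draw the ball is replaced and another ball of the same color is added to the urn. $X_{n+1}$ represents the color of the ball drawn at the $(n+1)$-th draw and $P(X_{n+1})$ denotes the distribution of this event.

Proposition 2.2. (Blackwell and MacQueen's construction [27] of DP via Polya's urn scheme): Let $X_n$ be a Polya sequence with parameter $\alpha$. Then

- $m_n = \frac{\alpha_n}{\alpha_n(X)}$ converges with probability 1 as $n \to \infty$ to a limiting discrete measure $\mu^*$,
- $\mu^*$ is a draw from a DP with parameter $\alpha$, and
- given $\mu^*$, the random variables $X_1, X_2, \ldots$ are independent with distribution $\mu^*$.

In other words, we can realize a draw $G$ from a $DP(\alpha, H)$ as a random probability measure. Treating $G$ as the De Finetti measure, let $\theta_i$ be i.i.d. draws from $G$. By marginalizing out $G$, we can write down the distribution of $\theta_{n+1}$ conditioned on $\theta_1, \ldots, \theta_n$, which is called Polya's urn scheme, as

$$(\theta_{n+1} \mid \theta_1, \theta_2, \ldots, \theta_n) \sim \frac{1}{n + \alpha} \sum_{i=1}^{n} \delta(\theta_i) + \frac{\alpha}{n + \alpha} H$$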
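The predictive rule above translates directly into a simulation. The following Python sketch is an illustration only; the function name `polya_urn`, the standard normal base distribution, and the seed are assumptions made for the example. It draws each new value from $H$ with probability $\alpha/(\alpha+i)$ and otherwise repeats a uniformly chosen earlier draw, which is exactly the mixture $\frac{1}{i+\alpha}\sum_j \delta(\theta_j) + \frac{\alpha}{i+\alpha}H$.

```python
import random


def polya_urn(n, alpha, base_draw, rng):
    """Simulate theta_1..theta_n from the Polya urn (Blackwell-MacQueen) scheme."""
    draws = []
    for i in range(n):  # i previous draws are available at this step
        if rng.random() < alpha / (alpha + i):
            draws.append(base_draw(rng))     # fresh value from the base distribution H
        else:
            draws.append(rng.choice(draws))  # repeat one of the i earlier values
    return draws


rng = random.Random(0)
# Assumed base distribution H = N(0, 1); continuous, so ties only arise from repeats.
theta = polya_urn(200, 1.0, lambda r: r.gauss(0.0, 1.0), rng)
n_distinct = len(set(theta))
```

Because earlier values are repeated with positive probability, `n_distinct` is typically far smaller than the number of draws, which is the clustering behaviour exploited by the DP.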

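Next comes the description of the DP as a CRP, which clearly shows the clustering property of a DP. If we draw $\theta_1, \theta_2, \ldots, \theta_n$ from a Polya's urn scheme, then, since a DP is discrete with probability 1, there will almost surely be $m \leq n$ distinct values among them.

Figure 2-3. CRP representation of DP

It can be shown that, for large enough $n$, the expected number of clusters is

$$E(m) = \sum_{i=1}^{n} \frac{\alpha}{\alpha + i - 1} \simeq \alpha \log\left(1 + \frac{n}{\alpha}\right).$$

Note that the number of clusters grows only logarithmically in the number of observations. This growth is slow, and it also indicates the rich-gets-richer phenomenon. The parameter $\alpha$ directly controls the number of clusters, with larger $\alpha$ implying a larger number of clusters a priori.

2.2.4.3 Stick Breaking Representation of DP

In 1994, Sethuraman [28] gave a constructive definition of the DP. It is important because it describes what an actual draw from a DP looks like, and this construction helps in simulating the process. Let $G$ be a draw from a DP with base measure $H$ and concentration parameter $\alpha$, i.e. $G \sim DP(\alpha, H)$. As $G$ is discrete with probability 1, we can write

$$G = \sum_{k=1}^{\infty} \pi_k \delta_{\theta^*_k}$$

where $\sum_{k=1}^{\infty} \pi_k = 1$ and

$$\pi_k = \beta_k \prod_{j=1}^{k-1} (1 - \beta_j), \qquad \beta_k \sim \mathrm{Beta}(1, \alpha), \qquad \theta^*_k \sim H.$$

The SB construction of $\pi$ can be understood as follows. Starting with a stick of unit length, break it at $\beta_1$, assigning $\pi_1$ to be the length of the piece we just broke off. Now recursively break the remaining portion to obtain $\pi_2$, $\pi_3$ and so forth. The SB distribution over $\pi$ is sometimes written $GEM(\alpha)$, where the letters stand for Griffiths, Engen and McCloskey.

Figure 2-4. SB construction for DP

The Pitman-Yor process is another example of an SB process, where $\beta_k \sim \mathrm{Beta}(1 - a, b + ka)$. This is a generalization of the DP, because for the DP $a = 0$ and $b = \alpha$. There is also another process, the Beta two-parameter process, where $\beta_k \sim \mathrm{Beta}(a, b)$.

2.2.4.4 Dirichlet Process Mixture Model (DPMM)

A DP gives rise to an atomic distribution with probability 1. As it is not possible to generate an absolutely continuous distribution with a DP prior, researchers studied mixture distributions with a DP prior. In [29], DPMMs were first applied to a problem in bio-assay, and in [27] the predictive distributions of the DP were shown to be related to the Blackwell-MacQueen urn scheme. Later, in [28], the constructive definition of the DP via a stick breaking process was given, which explicitly gives the formula to sample from a distribution that has a DP prior. Due to the atomic nature of the DP, clustering with this prior has been widely used. If $G \sim DP(\alpha, H)$ and $\theta_1, \theta_2, \ldots, \theta_n \sim G$ is an i.i.d. sequence, then on any finite measurable partition $A_1, A_2, \ldots, A_N$ the posterior distribution of $G$ given the data and the prior $H$ takes the simple form

$$(G(A_1), \ldots, G(A_N) \mid \theta_1, \ldots, \theta_n) \sim \mathrm{Dir}(\alpha H(A_1) + n_1, \ldots, \alpha H(A_N) + n_N)$$

where $n_1, n_2, \ldots, n_N$ represent the number of observations falling in each of the cells $A_1, A_2, \ldots, A_N$ respectively and $n$ is the total number of observations. Equivalently, the posterior is a DP with concentration $\alpha + n$ and base distribution $\frac{\alpha H + \sum_{i=1}^{n} \delta_{\theta_i}}{\alpha + n}$, where $\delta_{\theta_i}$ represents the point mass at the sample point $\theta_i$. We can also write down the predictive distribution for $\theta_{n+1}$:

$$(\theta_{n+1} \mid \theta_1, \theta_2, \ldots, \theta_n) \sim \frac{1}{\alpha + n}\left(\alpha H + \sum_{i=1}^{n} \delta_{\theta_i}\right) \qquad (2\text{-}1)$$

This representation is called the Chinese restaurant process, which we have already seen. This nonparametric modeling is suitable in many situations because it can potentially model an infinite number of clusters, and it is useful where the number of clusters is not known beforehand, which includes our application as well. The graphical model for a general DPMM is given below:

$$G \sim DP(\alpha, H)$$
$$\theta_i \sim G \quad \forall i = 1, 2, \ldots, n$$
$$x_i \sim f(\cdot \mid \theta_i) \quad \forall i = 1, 2, \ldots, n$$

For example, if $f(\cdot \mid \theta)$ is a Gaussian density with parameters $\theta$, then the model is called a DPMM of Gaussians. The DPMM is the infinite generalization of a latent mixture model in which the mixing proportions follow a Dirichlet distribution and the distribution of the class indicators is multinomial.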
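To make the generative reading of the DPMM concrete, here is a small Python sketch that draws data from a DPMM of Gaussians using a truncated stick-breaking construction. It is a sketch under stated assumptions, not the sampler used in this work: the truncation level, the base distribution $H = N(0, \texttt{prior\_sd}^2)$, the fixed observation noise `noise_sd`, and all function names are hypothetical choices made for illustration.

```python
import random


def stick_breaking(alpha, truncation, rng):
    """Truncated GEM(alpha) weights; the last weight absorbs the leftover stick."""
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        beta_k = rng.betavariate(1.0, alpha)  # beta_k ~ Beta(1, alpha)
        weights.append(beta_k * remaining)    # pi_k = beta_k * prod_{j<k} (1 - beta_j)
        remaining *= 1.0 - beta_k
    weights.append(remaining)
    return weights


def sample_dpmm_gaussian(n, alpha, truncation, rng, prior_sd=5.0, noise_sd=0.5):
    """Draw n points from a (truncated) DP mixture of 1-D Gaussians."""
    weights = stick_breaking(alpha, truncation, rng)
    atoms = [rng.gauss(0.0, prior_sd) for _ in weights]        # atom locations drawn from H
    z = rng.choices(range(truncation), weights=weights, k=n)   # cluster indicators
    x = [rng.gauss(atoms[k], noise_sd) for k in z]             # x_i ~ f(. | theta_i)
    return weights, atoms, z, x


rng = random.Random(42)
weights, atoms, z, x = sample_dpmm_gaussian(n=100, alpha=2.0, truncation=50, rng=rng)
```

Truncating the infinite sum is an approximation; with a moderate `alpha` the leftover stick after a few dozen breaks is usually negligible, which is why only a handful of clusters end up occupied.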

Figure 2-5. Graphical model for DPMM

Let $z_i$ be a cluster assignment variable, which takes on value $k$ with probability $\pi_k$. Then the above equations can be equivalently expressed as

$$\pi \mid \alpha \sim GEM(\alpha)$$
$$z_i \mid \pi \sim \mathrm{Mult}(\pi)$$
$$\theta^*_k \mid H \sim H$$
$$x_i \mid z_i, \{\theta^*_k\} \sim F(\theta^*_{z_i})$$

with $G = \sum_{k=1}^{\infty} \pi_k \delta_{\theta^*_k}$ and $\theta_i = \theta^*_{z_i}$. In mixture modeling terminology, $\pi$ is the mixing proportion, $\theta^*_k$ are the cluster parameters, $F(\theta^*_k)$ is the distribution over data in cluster $k$, and $H$ is the prior over cluster parameters. In the DP mixture model, the actual number of clusters used to model the data is not fixed, and can be automatically inferred from the data using the usual Bayesian posterior inference framework, which can be done in two different ways:

- Markov chain Monte Carlo (MCMC) inference procedure.
- Variational Bayes (VB) inference procedure.

Later, in the sampling section and in Chapter 4, we will discuss each of these methods in detail.

2.2.5 Beta Process (BP)

In 1990, Hjort [18] introduced the Beta process in the context of survival models. In that setup he worked with right-censored data. Let $X_1, \ldots, X_n$ be i.i.d. with unknown cumulative distribution function (cdf) $F$, where the data may be subject to right censoring. The problem was to construct a nonparametric Bayes estimator for $F$. This involves placing a prior probability distribution on the space $\mathcal{F}$ of all cdfs, which is equivalent to treating $F$ as a stochastic process. The hazard rate $\lambda(t)$ is defined as the ratio $\frac{F'(t)}{F[t, \infty)}$, and the cumulative hazard function (CHF) can be defined as $A(t) = \int_0^t \lambda(s)\, ds$. From these two definitions one can see that

$$A(t) = \int_0^t \frac{dF(s)}{1 - F(s^-)}, \qquad F(t) = 1 - \prod_{s \in [0, t]} (1 - dA(s)).$$

Thus $F(t)$ is defined via a product integral. Hjort wanted to carry out nonparametric Bayesian estimation of $A$ in the survival model with right-censored data. He derived the Beta process (BP), and it turns out that this class gives a suitable prior distribution for $A$ with large support and tractability. Beta processes produce cumulative hazard rates with independent increments that are approximately beta distributed. The Beta process prior forms a much richer and more flexible class than the Dirichlet process prior.

Let $X_1, \ldots, X_n \sim F$ be the survival times. Let us also denote the censoring times by $c_1, \ldots, c_n \sim C$. Let $T_i = \min(X_i, c_i)$ and $\delta_i = I(X_i \leq c_i)$ for $i = 1, \ldots, n$. The problem statement of Hjort was: given $\{T_i, \delta_i\}_{i=1}^{n}$, how to estimate $F$? That is, under a nonparametric Bayesian setup, how to find a suitable prior for $F$ on $\mathcal{F}$, the set of all probability distributions on $\mathbb{R}^+$. In the previous subsection we saw the Dirichlet process (DP), which also aims to give a prior distribution on $\mathcal{F}$. The problem with the DP is that it produces a conjugate model only when the data are not censored; in particular, with right-censored data the DP is not conjugate. For this purpose the BP has been shown to produce a conjugate model.
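For a time-discrete model the product-integral relation between $F$ and $A$ reduces to a finite product, which can be checked numerically. The Python sketch below is illustrative only; the function names and the example jump sizes are hypothetical. It converts hazard increments $dA(t_j)$ into the cdf via $F(t_j) = 1 - \prod_{s \leq t_j}(1 - dA(s))$ and back via $dA(t_j) = dF(t_j)/(1 - F(t_j^-))$.

```python
def cdf_from_hazard_increments(dA):
    """F(t_j) = 1 - prod_{s <= t_j} (1 - dA(s)) for jumps dA at ordered times."""
    survival, cdf = 1.0, []
    for jump in dA:
        survival *= 1.0 - jump
        cdf.append(1.0 - survival)
    return cdf


def hazard_increments_from_cdf(cdf):
    """dA(t_j) = dF(t_j) / (1 - F(t_j-)); assumes F stays below 1 before the last point."""
    prev, dA = 0.0, []
    for value in cdf:
        dA.append((value - prev) / (1.0 - prev))
        prev = value
    return dA


# Hypothetical hazard jumps at three ordered time points, each in [0, 1].
jumps = [0.1, 0.5, 0.25]
cdf = cdf_from_hazard_increments(jumps)
recovered = hazard_increments_from_cdf(cdf)
```

The round trip recovers the original jumps, which reflects the one-to-one correspondence between a cdf and its cumulative hazard.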

A CHF can be defined as

$$A(t) = \int_0^t P(X = s \mid X \geq s)\, ds.$$

$A$ is a non-decreasing function with $A(0) = 0$ and $\lim_{t \to \infty} A(t) = \infty$. Let us denote $\Delta A(t) = A(t) - A(t^-)$. The CHF property gives $\Delta A(t) \in [0, 1]$. We also know the one-to-one relationship between $F(t)$ and $A(t)$ given by the product integral. Hjort defined his Beta process in the continuous case by taking limits of time-discrete models. We will see a similar type of discrete process when we discuss another stochastic process, namely the Indian Buffet Process (IBP). Unlike the Dirichlet process, which is defined through the distribution of the joint probabilities of any finite measurable partition of $[0, \infty)$, the Beta process is best defined through the notion of a Levy process which has the following properties:

- It is almost surely non-decreasing.
- It has non-negative independent increments.
- It is almost surely right continuous.
- Its limit goes to infinity as $t \to \infty$.
- It is zero at the origin.

Processes of this type are termed subordinators. Their paths have finite variation. For such a process there can exist at most countably many fixed points of discontinuity at times $t_1, t_2, \ldots$ with jumps $s_1, s_2, \ldots$, which are independent non-negative random variables. Then $A_c(t) = A(t) - \sum_{t_k \leq t} s_k$ is a non-decreasing process with independent increments and with no fixed points of discontinuity, and therefore can be represented by the Levy-Khintchine formula. Using this, $A_c(t)$ can be represented by its characteristic function:

$$E\left(e^{i\theta A_c(t)}\right) = \exp\left\{ i\theta a(t) + \int_0^{\infty} \left(e^{i\theta x} - 1\right) dL_t(x) \right\} = \exp\left\{ i\theta a(t) + \int_0^t \int_0^{\infty} \left(e^{i\theta x} - 1\right) \nu(ds\, dx) \right\}$$

where $a$ is non-decreasing and continuous, with $a(0) = 0$. $L_t(x)$ is called the Levy measure, which satisfies the following properties:

- For every Borel set $B$, $L_t(B)$ is continuous and non-decreasing in $t$.
- For every real $t > 0$, $L_t(\cdot)$ is a measure on the Borel sets of $(0, \infty)$.
- $A_c(t)$ is finite a.s. whenever $\int_0^{\infty} \frac{x}{1 + x}\, dL_t(x)$ is finite.
- $\int_0^{\infty} \frac{x}{1 + x}\, dL_t(x) \to 0$ as $t \to 0$.

Since $a$ represents a nonrandom component, it can be taken to be identically zero; note also that the Brownian motion part of the Levy process has to be zero for a subordinator. It can then be shown that any subordinator is a limit of compound Poisson processes. Let $N(t)$ be a Poisson process with mean $\Lambda(t) = \int_{s=0}^{t} \lambda(s)\, ds$. Let $Y(t)$ be independent random variables with distribution function $G$. A compound Poisson process is given as

$$A_c(t) = \sum_{s \leq t} Y(s)\, I(\Delta N(s) = 1)$$

Any subordinator can be approximated by letting $\lambda$ become large and $Y(t)$ become small. According to Kim's paper [19], it is known that if

$$\nu([0, t] \times D) = \int_0^t \int_D dF_s(x)\, ds = \int_0^t \int_D f_s(x)\, dx\, ds$$

where, for all $t$, $\int_0^t \int_0^{\infty} x f_s(x)\, dx\, ds < \infty$, then there exists a unique subordinator whose Levy measure is given by $\nu$. We can always add a set $U = \{u_1, \ldots, u_l\}$ with a finite number of fixed discontinuity points, so that the modified Levy measure can be represented by

$$\nu([0, t] \times D) = \int_D dL_t(x) + \sum_{u_j \leq t} \int_D dH_j(x)$$

where $H_j(x)$ is the distribution function of the jump variable $U_j$ corresponding to $u_j$. So we can write $A(t) = A_c(t) + A_d(t)$, where the fixed jumps are responsible only for the part $A_d(t)$. With this setup, if $U = \emptyset$, $A$ becomes a compound Poisson process such that $J(t) = \sum_{s \leq t} I(\Delta A(s) \neq 0)$ is a Poisson process with intensity function $\lambda(t)$, where $\lambda(t) = \int_0^{\infty} f_t(x)\, dx$, and the conditional distribution of $\Delta A(t)$ given $\Delta A(t) > 0$ is $\frac{F_t(x)}{\lambda(t)}$. This is only true when $\lambda(t) < \infty$. When it is infinite, we have to approximate $A(t)$ by taking limits of a sequence of compound Poisson processes. When $U \neq \emptyset$, $A$ becomes an extended compound Poisson process with fixed discontinuities at $\{u_1, \ldots, u_l\}$.

So the important features of the Beta process are the following:

- Beta process priors put measures on the space of CHFs rather than on the space of distribution functions.
- We can take the prior class to be a subordinator.
- It is a special form of subordinator which is conjugate to right-censored data.
- As $\Delta A(t) \in [0, 1]$, a possible prior is the beta distribution.
- For the BP, $\Delta A(t) \sim \mathrm{Beta}(0, c(t))$, or an incomplete beta distribution, where $c(t)$ is a non-negative function often taken to be a positive constant.
- So the BP is a subordinator with Levy measure
$$\nu(ds\, dx) = c(s)\, x^{-1} (1 - x)^{c(s) - 1}\, dx\, dA_0(s)$$
where $E(A(s)) = A_0(s)$.
- If $A_0$ has both continuous and discrete parts, then the BP can be decomposed into $A(t) = A^{contn}(t) + A^{discr}(t)$, where the two parts are independent. $A^{contn}(t)$ is a BP with parameters $A_0^{contn}(t)$ and $c(t)$, and $A^{discr}(t) = \sum_{s \leq t} H_s$, where the $H_s$ are independent and distributed as $\mathrm{Beta}\left(c(s)\, \Delta A_0^{discr}(s),\; c(s)\left(1 - \Delta A_0^{discr}(s)\right)\right)$.
- The variance of the BP is given by
$$\mathrm{Var}(A(t)) = \int_0^t \frac{dA_0(s)\,(1 - dA_0(s))}{1 + c(s)}$$

Hjort also gave the posterior distribution when the data are right-censored. Let $N(t) = \sum_{i=1}^{n} I(X_i \leq t, \delta_i = 1)$ and $Y(t) = \sum_{i=1}^{n} I(X_i \geq t)$. If $A$ has a BP prior with parameters $c(t)$ and $A_0(t)$, then the posterior of $A$, denoted by $A^{post}$, has the updated parameters

$$A_0^{post}(t) = \int_0^t \frac{c(s)}{c(s) + Y(s)}\, dA_0(s) + \int_0^t \frac{dN(s)}{c(s) + Y(s)}, \qquad c^{post}(t) = c(t) + Y(t)$$

More details on the BP can be found in [22].

2.2.5.1 Completely Random Measure (CRM)

By Kingman's result [30], [17], it is known that the BP is one particular instance of a general family of random measures known as Completely Random Measures, or CRMs. A CRM $\mu$ on a probability space $(\Omega, \mathcal{F})$ is a random measure such that for any disjoint measurable sets $B_1, \ldots, B_n \in \mathcal{F}$, the random variables $\mu(B_1), \ldots, \mu(B_n)$ are independent. A CRM can be decomposed into three parts [30]:

$$\mu = \mu_f + \mu_d + \mu_o,$$

where $\mu_f$ is the fixed atomic component, $\mu_d$ is the deterministic component and $\mu_o$ is the ordinary component. Note that $\mu_o$ is purely atomic, and the atoms of $\mu_o$ follow a superposition of independent Poisson processes, hence a Poisson process. $\mu_o$ is not necessarily sigma-finite. A CRM can be obtained from an underlying Poisson point process. Let us denote by $\nu(dp\, d\omega)$ a sigma-finite measure on the product space $[0, 1] \times \Omega$ and draw a collection of points $\{p_i, \omega_i\}_{i=1}^{\infty}$ from the Poisson point process with mean measure $\nu$. We can then construct a random measure as follows:

$$\mu = \sum_{i=1}^{\infty} p_i \delta_{\omega_i}$$

where $\delta_{\omega_i}$ denotes a point mass at $\omega_i$. For any measurable set $T$, the discrete random measure is given by

$$\mu(T) = \sum_{\omega_j \in T} p_j$$

2.2.5.2 Another Viewpoint of CRM

A CRM can be thought of as a functional of a Poisson random measure [31], which can be explained with the help of the figure below. Here $\mathcal{B}(\cdot)$ denotes the respective sigma field.

Figure 2-6. CRM: alternative view

Note that we could have defined more general linear functionals, like

$$\int_{S \times X} h(p)\, N(dp, dx)$$

where $S$ is a separable complete metric space and $h : S \to \mathbb{R}^+$. They are known as $h$-biased random measures. Here we are using a very simple $h$, namely $h(p) = p$, so the result is sometimes called a size-biased random measure.

2.2.5.3 BP

If $B$ is a beta process such that $B \sim BP(c, A_0)$, then $B$ is a CRM with the ordinary component $\mu_o$, and the corresponding Poisson process rate measure is

$$\nu(dp\, d\omega) = c(\omega)\, p^{-1} (1 - p)^{c(\omega) - 1}\, dp\, A_0(d\omega)$$

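We assume here that $A_0$ is absolutely continuous and $c(\cdot)$ is a non-negative function. For simplicity $c(\cdot)$ is taken to be a constant $c \in \mathbb{R}$, called the concentration parameter of the BP. This parameter actually acts as a precision parameter. $A_0$ is a non-negative base measure. The total mass $A_0(\Omega) = \gamma$ is called the mass parameter of the BP and is assumed to be finite and positive. Note that the $[0, t]$ space in Hjort has been generalized by Thibaux and Jordan [2] to an abstract space $\Omega$. Like a draw from a DP, a draw $\mu$ from the BP is almost surely discrete. The pair $(p_i, \omega_i)$ corresponds to a location $\omega_i \in \Omega$ and its weight $p_i \in [0, 1]$. Now $\nu([0, 1] \times \Omega)$ is typically $\infty$, which says that the underlying Poisson process generates infinitely many points, but $\mu(T)$ is finite if $A_0$ is finite. If $A_0$ is a discrete measure of the form $A_0 = \sum_{i=1}^{\infty} q_i \delta_{\omega_i}$ (with $q_i \in [0, 1]$), then $\mu$ has atoms at the same locations almost surely and can be written as $\sum_{i=1}^{\infty} p_i \delta_{\omega_i}$, where

$$p_i \sim \mathrm{Beta}\left(c(\omega_i)\, q_i,\; c(\omega_i)(1 - q_i)\right)$$

This formulation is used in hierarchical models.

2.2.5.4 Bernoulli Process (BeP) and Indian Buffet Process (IBP)

In [2] the authors define a Bernoulli process with hazard measure $\mu$, which can be written as $X \sim BeP(\mu)$. If $\mu$ is continuous, then $X$ is nothing but a Poisson process with mean measure $\mu$:

$$X = \sum_{i=1}^{N} \delta_{\omega_i}$$

where $N \sim \mathrm{Poisson}(\mu(\Omega))$ and the $\omega_i$ are independent draws from the probability distribution $\frac{\mu}{\mu(\Omega)}$. If $\mu$ is discrete (e.g. $\mu$ is a draw from any CRM) with $\mu = \sum_{i=1}^{\infty} p_i \delta_{\omega_i}$, then $X = \sum_{i=1}^{\infty} g_i \delta_{\omega_i}$, where the $g_i$ are independently Bernoulli distributed with parameter $p_i$. These authors also discussed the conjugacy of the BP with the BeP. Let $\mu \sim BP(c, A_0)$ and let $\{X_i\}_{i=1}^{n}$ be $n$ independent draws from $BeP(\mu)$. Note that each $X_i$ is now a Bernoulli process. Then the posterior distribution of $\mu$ given $\{X_i\}_{i=1}^{n}$ is a BP with parameters

$$\mu \mid \{X_i\}_{i=1}^{n} \sim BP\left(c + n,\; \frac{c}{c + n} A_0 + \frac{1}{c + n} \sum_{i=1}^{n} X_i\right)$$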
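For a discrete base measure with finitely many atoms, the BP-BeP conjugacy above becomes an atom-by-atom Beta-Bernoulli update, which the following Python sketch illustrates. It is a minimal sketch under assumptions: the base measure is restricted to a finite list of atom masses `q` with each $q_i$ strictly between 0 and 1, the concentration `c` is a constant, and the function names are hypothetical.

```python
import random


def draw_discrete_bp(c, q, rng):
    """Atom weights p_i ~ Beta(c*q_i, c*(1-q_i)) for a discrete base measure."""
    return [rng.betavariate(c * qi, c * (1.0 - qi)) for qi in q]


def draw_bep(p, rng):
    """One Bernoulli process draw: atom i is switched on with probability p_i."""
    return [1 if rng.random() < pi else 0 for pi in p]


def bp_posterior(c, q, draws):
    """Posterior BP(c + n, c/(c+n) * A0 + 1/(c+n) * sum_i X_i) on the same atoms."""
    n = len(draws)
    counts = [sum(column) for column in zip(*draws)]  # times each atom was on
    c_post = c + n
    q_post = [(c * qi + m) / c_post for qi, m in zip(q, counts)]
    return c_post, q_post


rng = random.Random(7)
c, q = 2.0, [0.5, 0.25, 0.1]          # hypothetical concentration and atom masses
p = draw_discrete_bp(c, q, rng)
draws = [draw_bep(p, rng) for _ in range(10)]
c_post, q_post = bp_posterior(c, q, draws)
```

In terms of the Beta parameters, atom $i$ moves from $\mathrm{Beta}(c q_i, c(1 - q_i))$ to $\mathrm{Beta}(c q_i + m_i, c(1 - q_i) + n - m_i)$, where $m_i$ is the number of draws in which the atom was on.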

2.2.5.5 Connection to IBP

The IBP was introduced in [32], [11]. It is a stochastic process which can be viewed as a factorial analog of the CRP, and it generates an exchangeable prior distribution on binary matrices with infinitely many columns. Metaphorically, the process can be described as follows. There is a sequence of customers tasting dishes in an infinite buffet. Let $Z$ be a binary matrix with infinitely many columns and let $Z_i$ be its row for the $i$-th customer. $Z_{ik} = 1$ if the $i$-th customer tastes the $k$-th dish. The $i$-th customer tastes the $k$-th dish with probability $\frac{m_k}{i}$, where $m_k$ is the number of previous customers who have already sampled the $k$-th dish. After that, the $i$-th customer tastes an additional number of dishes drawn from $\mathrm{Poisson}\left(\frac{\alpha}{i}\right)$. In these cases, the abstract space $\Omega$ is the space of features; in particular, for the IBP it is the dishes. By marginalizing out the underlying De Finetti measure, which is the BP in the case of the IBP, it was shown in [2] that

$$X_{n+1} \mid X_1, X_2, \ldots, X_n \sim BeP\left(\frac{c}{c + n} A_0 + \sum_j \frac{m_{n,j}}{c + n} \delta_{\omega_j}\right)$$

where $m_{n,j}$ is the number of customers who have tried dish $\omega_j$. Also, the number of new dishes for $X_{n+1}$ is distributed as $\mathrm{Poisson}\left(\frac{c\gamma}{c + n}\right)$. This is basically a two-parameter generalization of the original IBP, which has only one parameter; the original IBP corresponds to $c = 1$ and $\gamma = \alpha$. Summing the Poisson means over customers, it can be shown that the total number of unique dishes is

$$\mathrm{Poisson}\left(\gamma \sum_{i=1}^{n} \frac{c}{c + i - 1}\right) \approx \mathrm{Poisson}\left(\gamma + \gamma c \log\frac{c + n}{c + 1}\right)$$

This tends to $\mathrm{Poisson}(\gamma)$ and $\mathrm{Poisson}(n\gamma)$ as $c$ tends to $0$ and $\infty$, respectively.

2.2.6 BP in a Nutshell

Figure 2-7 is a pictorial description of the BP and all other connections from it.

Figure 2-7. BP in a nutshell

2.3 Markov Chain Monte Carlo Sampling

Markov chain Monte Carlo (MCMC) [33] is a strategy for generating samples $x^{(i)}$ by exploring the state space $X$ using a Markov chain. The chain is constructed so that it converges to the target distribution $f(x)$. The evolution of the chain depends only on the current state and a fixed transition matrix (discrete state space) or transition kernel (continuous state space). Three important conditions regarding Markov chains are:

- irreducibility: from any state of the Markov chain, there is a positive probability of visiting any other state.
- aperiodicity: the Markov chain should not have any cycle.
- positive recurrence: a state is said to be transient if, given that we start in that state, there is a non-zero probability that we will never return to it. The time of first return to the state is called the hitting time. A state is recurrent if it is not transient. If the hitting time of a state has finite expectation, the state is called positive recurrent.

An irreducible Markov chain has a stationary distribution if and only if all of its states are positive recurrent. If a Markov chain is irreducible, positive recurrent, and aperiodic, then for any initial probability distribution the chain eventually converges to the stationary distribution. One way to design an MCMC sampler is to make the stationary distribution be the target distribution. One way to ensure that is to satisfy the reversibility, or detailed balance, condition.

2.3.1 Metropolis-Hastings (MH) and Gibbs Sampling

In the Metropolis-Hastings (MH) algorithm with stationary distribution $f(x)$ and proposal distribution $q(x^{\#} \mid x)$, each step involves sampling a new candidate value $x^{\#}$ given the current value $x$ according to $q(x^{\#} \mid x)$. The probability of accepting the new value $x^{\#}$ is

$$A(x^{\#}, x) = \min\left(1, \frac{f(x^{\#})\, q(x \mid x^{\#})}{f(x)\, q(x^{\#} \mid x)}\right)$$

and with probability $1 - A(x^{\#}, x)$ the chain remains at the old value $x$. The Gibbs sampler is a special type of MH sampler in which the acceptance probability for each proposal is always 1. It can be designed when it is possible to sample from the full conditional distributions. Here is the basic Gibbs sampler:

Algorithm 2 Gibbs sampling
  Initialize $x_1, \ldots, x_n$ by $x_1^{(0)}, \ldots, x_n^{(0)}$
  for $i = 0$ to $N - 1$ do
    for $j = 1$ to $n$ do
      Sample $x_j^{(i+1)} \sim f(x_j \mid x_1^{(i+1)}, \ldots, x_{j-1}^{(i+1)}, x_{j+1}^{(i)}, \ldots, x_n^{(i)})$
    end for
  end for

2.3.2 Rejection Sampling Method

The rejection sampling method is one of the important techniques for sampling from a distribution $p(x)$ which is known up to a proportionality constant. Let $q(x)$ be a distribution from which it is easy to sample and for all $x$, $p(x)$
Figure 2-8. Rejection sampling method

then the acceptance probability $p(x \text{ accepted}) = p(u$
DPMM model. The Gibbs sampling method can be easily implemented for conjugate prior distributions. We choose here to implement a particular method which can also be adapted easily to non-conjugate models. Practical methods for the DPMM were first developed by Escobar and West [36] and MacEachern and Muller [37]. The basic model looks like:

$$y_i \mid \theta_i \sim F(\theta_i)$$
$$\theta_i \mid G \sim G$$
$$G \sim DP(G_0, \alpha)$$

The data $y_1, y_2, \ldots, y_n$ are exchangeable and drawn from a mixture of distributions denoted by $F(\theta_i)$. $G$ is the mixing distribution over $\theta$. The prior for $G$ is a Dirichlet process with concentration parameter $\alpha$ and base distribution $G_0$. The development of MCMC algorithms for the DPMM comes from its CRP-type description. Due to intractability, exact computation of posterior expectations cannot be done, although they can be estimated using Monte Carlo methods. We can sample from the posterior distribution of $\theta$ by simulating a Markov chain which has this posterior distribution as its stationary distribution. The expectation of the predictive distribution for a new observation can be computed in a similar manner.

The following two algorithms are adapted from Neal (2000) [38]. The first one can be used only with conjugate priors, because it involves an integration with respect to the base measure which cannot be done analytically in the case of non-conjugate priors. In the non-conjugate case, Monte Carlo integration is sometimes used to approximate the integral, but the error might be high in many situations.

Algorithm 4 Algorithm for Gibbs sampling, case of conjugate prior
  Let the state of the Markov chain consist of $c_1, c_2, \ldots, c_n$
  Let $\phi = (\phi_c : c \in \{c_1, c_2, \ldots, c_n\})$.
  Repeatedly sample as follows:
  for $i = 1$ to $n$ do
    if the present value of $c_i$ is associated with no other observations then
      remove $\phi_{c_i}$ from the state.
    end if
    Draw a new value for $c_i$ from $c_i \mid c_{-i}, y_i, \phi$:
    if $c = c_j$ for some $j \neq i$ then
      $P(c_i = c \mid c_{-i}, y_i, \phi) = b\, \frac{n_{-i,c}}{n - 1 + \alpha}\, F(y_i, \phi_c)$
    else
      $P(c_i \neq c_j\ \forall j \neq i \mid c_{-i}, y_i, \phi) = b\, \frac{\alpha}{n - 1 + \alpha} \int F(y_i, \phi)\, dG_0(\phi)$
    end if
    if the new $c_i$ is not associated with any other observation then
      draw a new value for $\phi_{c_i}$ from $H_i$ and add it to the state,
      where $H_i$ is the posterior of $\phi$ given the prior $G_0$ and the datum $y_i$.
    end if
  end for
  for all $c \in \{c_1, c_2, \ldots, c_n\}$ do
    Draw a new value from $\phi_c \mid y_c$ where $y_c = \{y_i \text{ s.t. } c_i = c\}$
  end for
Here $b$ is the appropriate normalizing constant.

Here is the basic idea of how auxiliary variables [38] are used in MCMC algorithms. We can sample from a distribution $P_x$ of $x$ by sampling from some joint distribution $P_{xy}$ for $(x, y)$ whose marginal distribution for $x$ is $P_x$, and then discarding $y$. Here $x$ is the permanent state of the Markov chain and the auxiliary variable $y$ is introduced temporarily in the update step. The following are the general steps for MCMC sampling using an auxiliary variable:

- From the joint distribution $P_{xy}$, find the conditional distribution of $y \mid x$ and draw a value for $y$.
- Do some update on $(x, y)$ leaving $P_{xy}$ invariant.
- Finally, get the updated value of $x$, discarding $y$.

Clearly, the update for $x$ will leave $P_x$ invariant and the chain will converge to $P_x$. For the DPMM with non-conjugate priors, the following algorithm (Algorithm 5) was proposed by Neal [38].

2.3.4.1 Slice Sampling

One of the popular techniques for sampling DPM models was first described by Walker in [39]. The original sampling algorithms rely on integrating out the random distribution function from this model; these are called marginal methods. The slice sampling technique is based on the idea of sampling a sufficient but finite number of variables in each iteration of a valid Markov chain with the correct stationary distribution. Algorithms of this type are called conditional methods. They are typically very simple to implement.
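The conjugate-prior Gibbs sweep of Algorithm 4 can be sketched in Python for one simple conjugate case. This is an illustrative sketch, not the implementation used in this work: it assumes a one-dimensional Gaussian likelihood $F(y, \phi) = N(y; \phi, \sigma^2)$ with known variance and a base distribution $G_0 = N(0, \tau^2)$, so that the integral $\int F(y_i, \phi)\, dG_0(\phi)$ equals $N(y_i; 0, \sigma^2 + \tau^2)$. All function names, data, and parameter values are hypothetical.

```python
import math
import random


def normal_pdf(y, mean, var):
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posterior_draw(ys, sigma2, tau2, rng):
    """Draw phi from its posterior given prior N(0, tau2) and data ys ~ N(phi, sigma2)."""
    precision = 1.0 / tau2 + len(ys) / sigma2
    mean = (sum(ys) / sigma2) / precision
    return rng.gauss(mean, math.sqrt(1.0 / precision))


def gibbs_sweep(y, c, phi, alpha, sigma2, tau2, rng):
    """One sweep over all c_i, then over all phi_c; phi maps label -> parameter."""
    n = len(y)
    for i in range(n):
        old = c[i]
        c[i] = None
        if old not in c:               # c_i was a singleton: drop phi_{c_i} from the state
            del phi[old]
        labels = list(phi)
        # Existing clusters: n_{-i,c} / (n - 1 + alpha) * F(y_i, phi_c)
        weights = [c.count(k) / (n - 1 + alpha) * normal_pdf(y[i], phi[k], sigma2)
                   for k in labels]
        # New cluster: alpha / (n - 1 + alpha) * integral of F(y_i, phi) dG0(phi)
        weights.append(alpha / (n - 1 + alpha) * normal_pdf(y[i], 0.0, sigma2 + tau2))
        pick = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        if pick < len(labels):
            c[i] = labels[pick]
        else:                          # open a new cluster, phi drawn from H_i
            new_label = max(phi, default=-1) + 1
            phi[new_label] = posterior_draw([y[i]], sigma2, tau2, rng)
            c[i] = new_label
    for k in phi:                      # refresh every phi_c given its members y_c
        members = [y[i] for i in range(n) if c[i] == k]
        phi[k] = posterior_draw(members, sigma2, tau2, rng)
    return c, phi


rng = random.Random(1)
y = [rng.gauss(-4.0, 0.5) for _ in range(15)] + [rng.gauss(4.0, 0.5) for _ in range(15)]
c, phi = [0] * len(y), {0: 0.0}        # start with every point in one cluster
for _ in range(50):
    c, phi = gibbs_sweep(y, c, phi, alpha=1.0, sigma2=0.25, tau2=25.0, rng=rng)
```

`rng.choices` normalizes the weights itself, so the constant $b$ never needs to be computed explicitly. The number of occupied clusters, `len(phi)`, changes from sweep to sweep, which is how the sampler explores the unknown number of clusters.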

Algorithm 5 Algorithm for Gibbs sampling with $m$ auxiliary parameters
  Let the state of the Markov chain consist of $c_1, c_2, \ldots, c_n$
  Let $\phi = (\phi_c : c \in \{c_1, c_2, \ldots, c_n\})$.
  Repeatedly sample as follows:
  for $i = 1$ to $n$ do
    Let $k^-$ be the number of distinct $c_j$ for $j \neq i$.
    Let $h = k^- + m$. Label these $c_j$ with values in $\{1, 2, \ldots, k^-\}$.
    if $c_i = c_j$ for some $j \neq i$ then
      draw values independently from $G_0$ for $\phi_c$ for $k^-$
2.4 Variational Bayes (VB) Inference

2.4.1 Approximate Inference Procedure

The goal of nonparametric Bayesian inference is to compute the posterior distribution $P(W \mid X, \theta)$, where the observations are $X = \{x_1, x_2, \ldots, x_n\}$, the latent variables are $W = \{w_1, w_2, \ldots, w_k\}$ and the hyper-parameters are $\theta$. Using Bayes' rule we have

$$P(W \mid X, \theta) = \exp\{\log P(X, W \mid \theta) - \log P(X \mid \theta)\}$$

The posterior distribution is intractable when a DPM prior is used. Two types of approximate inference techniques have been popular in recent times. One is to sample from the intractable posterior with a Markov chain Monte Carlo (MCMC) method. Another, faster way is to use a variational distribution to lower bound the log marginal likelihood and to maximize the bound by optimizing over the variational parameter space. The basic idea of variational inference (VI) [41], [42] is very simple. The goal is to minimize the Kullback-Leibler (KL) divergence between the variational distribution and the posterior. Let $Q_{\nu}(W)$ denote the variational distribution, where $\nu$ is the set of parameters of the distribution $Q$. The KL divergence between these two distributions can be written as

$$D_{KL}(Q_{\nu}(W) \,\|\, p(W \mid X, \theta)) = E_Q[\log Q_{\nu}(W)] - E_Q[\log p(W, X \mid \theta)] + \log p(X \mid \theta) \qquad (2\text{-}2)$$

where $E_Q$ is the expectation taken with respect to the variational distribution $Q$. For a tractable optimization procedure, usually a fully factorized variational distribution is considered, which breaks all the dependencies among the latent variables. This type of variational inference is also called mean field variational inference. VI aims to approximate the posterior distribution with a factorized distribution of known form. It is often faster than its MCMC counterpart. It is a deterministic search algorithm that tries to optimize a given objective function. In most cases, because of the non-convex nature of the objective function, it gets stuck in a local optimum, so the choice of the initial condition is crucial. VI requires the following steps:

- Approximate the posterior $P(W \mid X, \theta)$ using a family of distributions $\mathcal{Q}$ parametrized by the parameter $\nu$.
- Determine a distribution $P^* \in \mathcal{Q}$ such that
$$P^* = \arg\min_{Q(\nu) \in \mathcal{Q}} D_{KL}(Q(\nu), P(W \mid X, \theta))$$
where $D_{KL}(Q(\nu), P(W \mid X, \theta))$ is the KL-divergence between the two distributions.
- The variational parameter $\nu$ is the parameter of optimization.

2.4.2 KL-Divergence

The KL-divergence, or Kullback-Leibler divergence, is a non-symmetric measure of the discrepancy between two probability distributions $P$ and $Q$:

$$D_{KL}(P(X), Q(X)) = \int p(x) \log\frac{p(x)}{q(x)}\, dx = E_P[\log p(X)] - E_P[\log q(X)]$$

where $p(X)$ and $q(X)$ are the density functions of $P$ and $Q$, respectively. It has the following properties:

- $-E_P[\log p(X)]$ is called the entropy of $P$ and is denoted by $H(P)$.
- $D_{KL}(P, Q)$ is not a metric, as it is not symmetric.
- $D_{KL}(P, Q) \geq 0$.
- $D_{KL}(P, Q) = 0$ iff $P = Q$.

We have

$$D_{KL}(Q(\nu), P(W \mid X, \theta)) = -H(Q(\nu)) - E_{Q(\nu)}[\log P(X, W \mid \theta)] + E_{Q(\nu)}[\log P(X \mid \theta)]$$

Since $D_{KL}(Q(\nu), P(W \mid X, \theta)) \geq 0$ and $\log P(X \mid \theta)$ does not depend on $W$, we have

$$\log P(X \mid \theta) \geq E_{Q(\nu)}[\log P(X, W \mid \theta)] + H(Q(\nu)) \qquad (2\text{-}3)$$

So the goal is to find a $P^* \in \mathcal{Q}$ that maximizes the right hand side of Equation 2-3. The optimization problem becomes

$$\max_{Q(\nu) \in \mathcal{Q}} E_{Q(\nu)}[\log P(X, W \mid \theta)] + H(Q(\nu))$$

or equivalently

$$\max_{Q(\nu) \in \mathcal{Q}} E_{Q(\nu)}[\log P(W \mid X, \theta)] + H(Q(\nu)) + \log P(X \mid \theta)$$

For tractability, $Q$ is taken to be a fully factorized distribution $Q(\nu) = \prod_{i=1}^{k} Q_{\nu_i}(w_i)$, and each $Q_{\nu_i}(w_i)$ and each conditional distribution $P(w_i \mid W_{-i}, X, \theta)$ are assumed to belong to the exponential family. The objective function can be written as the sum of $S_i$ for $i = 1, 2, \ldots, k$, where $S_i$ is given by

$$S_i = E_{Q}[\log P(w_i \mid W_{-i}, X, \theta)] - E_{Q}[\log Q_{\nu_i}(w_i)]$$

Using the exponential family forms, we have

$$Q_{\nu_i}(w_i) = h(w_i) \exp\left[\nu_i^T w_i - A(\nu_i)\right]$$
$$p(w_i \mid W_{-i}, X, \theta) = h(w_i) \exp\left[g_i(W_{-i}, X, \theta)^T w_i - A(g_i(W_{-i}, X, \theta))\right]$$

where $g_i(W_{-i}, X, \theta)$ denotes the natural parameter for $w_i$ when conditioning on the remaining latent variables and the observations $X$. Taking the derivatives of $S_i$ and equating them to zero, we get the update equations

$$\nu_i = E_Q[g_i(W_{-i}, X, \theta)] \quad \forall i = 1, 2, \ldots, k \qquad (2\text{-}4)$$

$Q(\nu)$ is assumed to be a fully factorized distribution; otherwise the simplification in the above equations cannot be done. Also, without the assumption that $P(w_i \mid W_{-i}, X, \theta)$ is from an exponential family, the update equation 2-4 may not have an analytical form. We will use this inference technique in Chapter 4.
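The lower bound of Equation 2-3 and the coordinate updates can be checked on a toy model small enough to enumerate. The Python sketch below is an illustration under stated assumptions, not the inference code of Chapter 4: it uses two binary latent variables, a hypothetical joint table for $P(X, w_1, w_2 \mid \theta)$ at the observed $X$, and the generic mean-field update $Q_i(w_i) \propto \exp(E_{Q_{-i}}[\log P(X, W \mid \theta)])$, which for Bernoulli factors coincides with setting the natural parameter as in Equation 2-4.

```python
import math


def elbo(log_joint, q1, q2):
    """E_Q[log P(X, w1, w2)] + H(Q) for a fully factorized Q over two binary latents."""
    value = 0.0
    for a in range(2):
        for b in range(2):
            w = q1[a] * q2[b]
            if w > 0.0:
                value += w * (log_joint[a][b] - math.log(w))
    return value


def normalize_exp(scores):
    top = max(scores)                       # subtract the max for numerical stability
    unnorm = [math.exp(s - top) for s in scores]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def cavi(log_joint, iters=20):
    """Coordinate ascent: update each factor in turn, holding the other fixed."""
    q1, q2 = [0.6, 0.4], [0.5, 0.5]         # arbitrary initial condition
    history = [elbo(log_joint, q1, q2)]
    for _ in range(iters):
        q1 = normalize_exp([sum(q2[b] * log_joint[a][b] for b in range(2)) for a in range(2)])
        q2 = normalize_exp([sum(q1[a] * log_joint[a][b] for a in range(2)) for b in range(2)])
        history.append(elbo(log_joint, q1, q2))
    return q1, q2, history


joint = [[0.30, 0.05], [0.10, 0.15]]        # hypothetical P(X, w1, w2) at the observed X
log_joint = [[math.log(v) for v in row] for row in joint]
log_evidence = math.log(sum(sum(row) for row in joint))
q1, q2, history = cavi(log_joint)
```

The bound in `history` never decreases and never exceeds `log_evidence`; the remaining gap is the KL divergence between the factorized $Q$ and the exact posterior, which is generally non-zero because the true posterior here is not factorized.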

CHAPTER 3
GEOMETRIC AND STATISTICAL PROPERTIES OF STIEFEL MANIFOLD

3.1 Geometric Properties of Stiefel Manifold

3.1.1 Analytic Manifold

A $d$-dimensional topological manifold [4], [5] $M$ is a topological space that satisfies:

- $M$ is a Hausdorff space, that is, a topological space in which for any two distinct points $x$ and $y$ there exist neighborhoods $U_x$ around $x$ and $U_y$ around $y$ such that $U_x$ and $U_y$ are disjoint.
- $M$ is locally homeomorphic to Euclidean space. For any point $x \in M$ there exists a neighborhood $U \subset M$ around $x$ and a mapping $\phi : U \to \mathbb{R}^d$ such that $\phi(U)$ is an open set in $\mathbb{R}^d$. Together, $(U, \phi)$ is called a coordinate chart.
- $M$ is second countable, that is, there exists a countable system of open sets, known as a basis, such that every open set in $M$ is a union of some sets in the basis.

Given two coordinate charts $(U, \phi)$ and $(V, \psi)$, if $U \cap V$ is non-empty, then the map $\phi \circ \psi^{-1}$ is defined from the open set $\psi(U \cap V) \subset \mathbb{R}^d$ to the open set $\phi(U \cap V) \subset \mathbb{R}^d$. An analytic (smooth or $C^{\infty}$) manifold is a manifold such that for all coordinate charts $(U, \phi)$ and $(V, \psi)$, either $U \cap V$ is empty, or $U \cap V$ is nonempty and the map $\phi \circ \psi^{-1}$ is analytic.

3.1.2 Stiefel Manifold

The Stiefel manifold [7], [6], [8], [43], [44] $V_{n,p}$ is the space whose points are $p$-frames in $\mathbb{R}^n$, where a set of $p$ orthonormal vectors in $\mathbb{R}^n$ is called a $p$-frame in $\mathbb{R}^n$ ($p \leq n$). The Stiefel manifold $V_{n,p}$ is represented by the set of $n \times p$ matrices $X$ such that $X^T X = I_p$, where $I_p$ is the $p \times p$ identity matrix; so $V_{n,p} = \{X\ (n \times p);\ X^T X = I_p\}$. The Stiefel manifold $V_{n,p}$ thus consists of $n \times p$ "tall-skinny" orthonormal matrices. $V_{n,p}$ defines a surface that is a subset of a sphere of radius $\sqrt{p}$ in $\mathbb{R}^{np}$ with the Euclidean distance. This is a direct consequence of the fact that for $X = (x_{i,j}) \in V_{n,p}$ ($i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, p$), $\sum_i \sum_j x_{i,j}^2 = p$.
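The defining condition $X^T X = I_p$ and the radius-$\sqrt{p}$ property are easy to verify numerically. The following Python sketch is only an illustration: it builds a point on $V_{n,p}$ by Gram-Schmidt orthonormalization of random Gaussian columns (one of several possible constructions, chosen here for simplicity), and the function names, dimensions, and seed are hypothetical.

```python
import math
import random


def gram_schmidt_columns(columns):
    """Orthonormalize a list of linearly independent column vectors."""
    basis = []
    for v in columns:
        w = list(v)
        for u in basis:                       # remove components along earlier columns
            dot = sum(a * b for a, b in zip(w, u))
            w = [a - dot * b for a, b in zip(w, u)]
        norm = math.sqrt(sum(a * a for a in w))
        basis.append([a / norm for a in w])
    return basis


def gram_matrix(columns):
    """Entries of X^T X when X is stored as a list of its columns."""
    return [[sum(a * b for a, b in zip(u, v)) for v in columns] for u in columns]


rng = random.Random(0)
n, p = 5, 3
X = gram_schmidt_columns([[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)])
G = gram_matrix(X)                                        # should equal I_p
squared_frobenius = sum(a * a for col in X for a in col)  # should equal p
```

Up to floating-point error, `G` is the $p \times p$ identity and `squared_frobenius` equals $p$, so the point lies on the sphere of radius $\sqrt{p}$ in $\mathbb{R}^{np}$.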

Following are some interesting special cases [43]:

• A 1-frame is just a unit vector, so $V_{n,1} = S^{n-1}$.
• An orthonormal $n$-frame is identical to an orthogonal matrix, so $V_{n,n} = O(n)$, the orthogonal group consisting of all orthogonal $n \times n$ matrices, with the group operation being matrix multiplication.
• An orthonormal $(n-1)$-frame can be extended uniquely to an orthonormal $n$-frame with matrix of determinant 1, so $V_{n,n-1} = SO(n)$, the special orthogonal group (a normal subgroup of $O(n)$) consisting of all $n \times n$ rotation matrices.

The Stiefel manifold may be embedded in the $np$-dimensional Euclidean space of $n \times p$ matrices. Clearly $V_{n,p}$ is a subset of the set $R^{n \times p}$, which is a vector space with the standard sum and multiplication by a scalar and therefore admits a natural linear manifold structure. A chart of this manifold is given by $\phi : R^{n \times p} \to R^{np}$, $X \mapsto \mathrm{vec}(X)$, where $\mathrm{vec}(X)$ is the operation of stacking all the columns of a matrix $X$, from left to right, one below another. The dimension of the manifold $R^{n \times p}$ is $np$. The manifold $R^{n \times p}$ can be further turned into a Euclidean space with the standard inner product defined as:

\[ \langle X_1, X_2 \rangle := \mathrm{vec}(X_1)^T \mathrm{vec}(X_2) = \mathrm{trace}(X_1^T X_2) \]

This inner product induces a norm, the standard Frobenius norm, defined by $\|X\|_F^2 = \mathrm{trace}(X^T X)$. Consider the function $h : R^{n \times p} \to \mathrm{sym}(p)$, $X \mapsto X^T X - I_p$, where $\mathrm{sym}(p)$ is the set of all symmetric $p \times p$ matrices. Clearly, $\mathrm{sym}(p)$ is a vector space. Also note that $V_{n,p} = h^{-1}(0_p)$.

Proposition 3.1. (Submersion theorem) [8] Let $F : M_1 \to M_2$ be a smooth mapping between two manifolds of dimension $d_1$ and $d_2$, $d_1 > d_2$, and let $y$ be a point on $M_2$. If $y$ is a regular value of $F$ (i.e. the rank of $F$ is equal to $d_2$ at every point of $F^{-1}(y)$), then $F^{-1}(y)$ is a closed embedded submanifold of $M_1$, and $\dim(F^{-1}(y)) = d_1 - d_2$.
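The identity $\mathrm{vec}(X_1)^T \mathrm{vec}(X_2) = \mathrm{trace}(X_1^T X_2)$ used to define the inner product on $R^{n \times p}$ can be illustrated on a small example. This is only a sketch; the matrices `X1` and `X2` are arbitrary illustrative values and the helper names are assumptions.

```python
def vec(X):
    """Stack the columns of a row-major matrix, left to right, into one list."""
    n, p = len(X), len(X[0])
    return [X[i][j] for j in range(p) for i in range(n)]


def trace_xt_y(X1, X2):
    """Form the p x p product X1^T X2 explicitly and return its trace."""
    n, p = len(X1), len(X1[0])
    product = [[sum(X1[i][a] * X2[i][b] for i in range(n)) for b in range(p)]
               for a in range(p)]
    return sum(product[a][a] for a in range(p))


X1 = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
X2 = [[0.5, -1.0], [2.0, 0.0], [1.0, 1.5]]

inner_vec = sum(a * b for a, b in zip(vec(X1), vec(X2)))
inner_trace = trace_xt_y(X1, X2)
frobenius_sq = trace_xt_y(X1, X1)   # squared Frobenius norm of X1
```

Both `inner_vec` and `inner_trace` sum the same entrywise products, which is why the Frobenius norm coincides with the Euclidean norm of $\mathrm{vec}(X)$.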

In order to obtain the dimension of $V_{n,p}$, we first notice that the dimension of $\mathrm{sym}(p)$ is $\frac{1}{2}p(p+1)$, as a symmetric matrix is completely determined by its upper triangular part, including the diagonal. From the above proposition we have

\[ \dim(V_{n,p}) = np - \tfrac{1}{2}p(p+1) \]

There is another way to obtain the dimension of $V_{n,p}$: by counting the functionally independent constraints on the $np$ elements of $X$, column by column. For the first column there is 1 constraint, because each column is a unit vector. For the second column there are 2, because it not only is a unit vector but also has to be orthogonal to the first column. Similarly, for the third column there are 3, because it is a unit vector and has to be orthogonal to the first two columns. So for all $p$ columns the number of constraints is $1 + 2 + \cdots + p = \frac{1}{2}p(p+1)$, and the dimension of $V_{n,p}$, the number of remaining free parameters, is $np - \frac{1}{2}p(p+1)$.

Topologically, $V_{n,p}$ is built up from the $p$ spheres $S^{n-1}, S^{n-2}, \ldots, S^{n-p}$: the first column ranges over $S^{n-1}$, the second over a sphere $S^{n-2}$ in the orthogonal complement of the first, and so on, so that locally $V_{n,p}$ looks like the product $S^{n-1} \times S^{n-2} \times \cdots \times S^{n-p}$. We see $V_{n,p}$ as an embedded manifold of $R^{n \times p}$; its topology is the subset topology induced by $R^{n \times p}$. All the columns of an element of the Stiefel manifold have unit norm, so the Frobenius norm is $\|X\|_F = \sqrt{p}$ and $V_{n,p}$ is bounded. Also, $V_{n,p}$ is the inverse image of the closed set $\{0_p\}$ under the continuous function $h$, so it is closed. Then by the Heine-Borel theorem, $V_{n,p}$ is compact.

3.1.3 Group Action

Definition 1. Let $G$ be a group and $X$ a set. Then $G$ is said to act on $X$ (on the left) if there is a mapping $\phi : G \times X \to X$ satisfying two conditions:

• If $e$ is the identity element of $G$, then $\phi(e, x) = x \;\; \forall x \in X$.
• If $g_1, g_2 \in G$, then $\phi(g_1, \phi(g_2, x)) = \phi(g_1 g_2, x) \;\; \forall x \in X$.

We define the right action in a similar manner. When $G$ is a topological group, $X$ is a topological space, and $\phi$ is continuous, the action is called continuous.
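The two ways of counting the dimension of $V_{n,p}$ in Section 3.1.2 can be compared directly. The sketch below is illustrative only; the function names are assumptions, and the column-wise count simply assigns $n - j$ free parameters to the $j$-th column.

```python
def stiefel_dim(n, p):
    """Dimension of V_{n,p} from the submersion theorem: np - p(p+1)/2."""
    if not 1 <= p <= n:
        raise ValueError("require 1 <= p <= n")
    return n * p - p * (p + 1) // 2


def stiefel_dim_by_columns(n, p):
    """Column j (1-based) has n entries and j constraints, so n - j free parameters."""
    if not 1 <= p <= n:
        raise ValueError("require 1 <= p <= n")
    return sum(n - j for j in range(1, p + 1))


dims = {(n, p): stiefel_dim(n, p) for n in range(1, 7) for p in range(1, n + 1)}
```

For example, `stiefel_dim(5, 3)` gives 9, the unit sphere case `stiefel_dim(n, 1)` gives $n - 1$, and `stiefel_dim(n, n)` gives $n(n-1)/2$, the dimension of $O(n)$.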

One example of a group action is an invertible linear map acting on a real vector space: $\phi : GL(n) \times R^n \to R^n$, $\phi(A, x) = Ax$, where $GL(n)$ is the group of invertible $n \times n$ matrices. This example can be extended to other subgroups of $GL(n)$, such as the orthogonal group $O(n)$, the special orthogonal group $SO(n)$, etc.

Definition 2. Two points $x_1, x_2 \in X$ are said to be equivalent under $G$, written $x_1 \sim x_2$, if there exists a $g \in G$ such that $x_2 = \phi(g, x_1)$.

Definition 3. A function $f$ whose domain is $X$ is said to be invariant under $G$ if $f(\phi(g, x)) = f(x) \;\; \forall x \in X, \; \forall g \in G$.

Definition 4. If $x_1 \sim x_2 \;\; \forall x_1, x_2 \in X$, then the group $G$ is said to act transitively on $X$, and $X$ is said to be homogeneous with respect to $G$.

Definition 5. For $x_0 \in X$, we define the subgroup $G_0$ of $G$ as the isotropy group of $G$ at $x_0$, which consists of all transformations which leave $x_0$ invariant: $G_0 = \{g \in G : \phi(g, x_0) = x_0\}$.

Definition 6. Let $G_0$ be the isotropy group of $G$ at $x_0 \in X$. For each $g \in G$, the set $gG_0 = \{g g_0 : g_0 \in G_0\} \subset G$ is called a left coset of $G_0$ in $G$. We define the quotient $G/G_0 := \{gG_0 : g \in G\}$ as the set of cosets of $G_0$ in $G$.

In other words, if $x_0$ is any point of a homogeneous space $X$ (with respect to a group $G$), $G_0$ is the subgroup consisting of all elements of $G$ which leave $x_0$ invariant, and $h \in G$ transforms $x_0$ into $x$, then the set of all elements of $G$ which transform $x_0$ into $x$ is the left coset $hG_0$. Thus the points $x \in X$ are in one-to-one correspondence with the left cosets $hG_0$. Hence a space that is homogeneous with respect to a group of transformations may be regarded as a space of (left) cosets of the group.

We take the set $X = V_{n,p}$ and $G = O(n)$. The (left) action of $O(n)$ on $V_{n,p}$ is given by $\phi : O(n) \times V_{n,p} \to V_{n,p}$, $\phi(Q, A) = QA$, with the group operation being matrix multiplication. $O(n)$ acts transitively on $V_{n,p}$. The isotropy subgroup of $O(n)$ at $[I_p : 0]^T \in V_{n,p}$ is

\[ G_0 = \left\{ \begin{pmatrix} I_p & 0 \\ 0 & B_1 \end{pmatrix} \in O(n), \; B_1 \in O(n-p) \right\} \]

and the coset corresponding to $Q_1 \in V_{n,p}$ is $[Q_1 : Q_2]\,G_0$, where $Q_2$ is any $n \times (n-p)$ matrix such that $[Q_1 : Q_2] \in O(n)$. The coset consists of all orthogonal $n \times n$ matrices with $Q_1$ as the first $p$ columns. Writing the homogeneous space $V_{n,p}$ as the coset space of the isotropy group, we have $V_{n,p} = O(n)/O(n-p)$.

Definition 7. An exterior differential form [44], [6] of degree $r$ in $R^n$ is an expression of the type $\sum_{i_1}$

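For an arbitrary $n \times m$ matrix $X$, the symbol $(dX)$ will denote the exterior product of the $mn$ elements of $dX$: $(dX) \equiv \bigwedge_{j=1}^{m} \bigwedge_{i=1}^{n} dx_{ij}$. Similarly, if $X \in \mathrm{sym}(m)$, the symbol $(dX)$ will denote the exterior product of the $\frac{1}{2}m(m+1)$ distinct elements of $dX$: $(dX) \equiv \bigwedge_{1 \le i \le j \le m} dx_{ij}$, and, if $X \in \mathrm{skew\text{-}sym}(m)$, the symbol $(dX)$ will denote the exterior product of the $\frac{1}{2}m(m-1)$ distinct elements of $dX$: $(dX) \equiv \bigwedge_{i}$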

This defines an invariant measure [44] on the Stiefel manifold $V_{n,p}$. The surface area or volume of the Stiefel manifold $V_{n,p}$ is

\[ \mathrm{Vol}(V_{n,p}) = \int_{V_{n,p}} (H_1^T dH_1) \]

It can be shown that

\[ \int_{V_{n,p}} (H_1^T dH_1) = \frac{2^p \pi^{np/2}}{\Gamma_p\!\left(\frac{n}{2}\right)}, \]

where $\Gamma_p(\cdot)$ is the multivariate Gamma function, which is a generalization of the Gamma function:

\[ \Gamma_p\!\left(\frac{n}{2}\right) = \pi^{p(p-1)/4} \prod_{j=1}^{p} \Gamma\!\left(\frac{n}{2} + \frac{1-j}{2}\right) \]

Proposition 3.3. If $X$ is a topological space and $G$ is a transitive compact topological group of transformations of $X$ onto itself such that $HX$ is a continuous function of $H$ and $X$ into $X$, then there exists a finite measure $\mu$ on $X$ invariant under $G$. $\mu$ is unique in the sense that any other invariant measure on $X$ is a constant finite multiple of $\mu$.

The measure defined on $V_{n,p}$ is called the invariant unnormalized measure. It is often called the Haar measure. This measure can be normalized to a probability measure by setting

\[ [dH] = \frac{1}{\mathrm{Vol}(V_{n,p})} (H_1^T dH_1) \quad \text{so that} \quad \int_{V_{n,p}} [dH] = 1 \]

We denote this measure by $\mu$, so that

\[ \mu(A) = \int_A [dH] \quad \forall A \in \mathcal{B}(V_{n,p}) \]

where $\mathcal{B}(V_{n,p})$ is the Borel $\sigma$-algebra generated by the open sets of $V_{n,p}$. Now $\mu$ has the following two properties:

• $\mu(\cdot)$ is left-invariant under the action of $O(n)$ on $V_{n,p}$, so $\mu(QA) = \mu(A) \;\; \forall Q \in O(n)$.
• $\mu(\cdot)$ is right-invariant under the action of $O(p)$ on $V_{n,p}$, so $\mu(AQ) = \mu(A) \;\; \forall Q \in O(p)$.
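The closed form $\mathrm{Vol}(V_{n,p}) = 2^p \pi^{np/2} / \Gamma_p(n/2)$ is easy to evaluate. A minimal sketch follows, assuming the computation is done on the log scale with the standard-library `math.lgamma`; the function names are illustrative.

```python
import math


def log_multivariate_gamma(p, a):
    """log Gamma_p(a) = p(p-1)/4 * log(pi) + sum_{j=1}^p log Gamma(a + (1-j)/2)."""
    return (p * (p - 1) / 4.0) * math.log(math.pi) + sum(
        math.lgamma(a + (1 - j) / 2.0) for j in range(1, p + 1)
    )


def stiefel_volume(n, p):
    """Vol(V_{n,p}) = 2^p * pi^(n p / 2) / Gamma_p(n/2), for 1 <= p <= n."""
    if not 1 <= p <= n:
        raise ValueError("require 1 <= p <= n")
    log_vol = (p * math.log(2.0)
               + (n * p / 2.0) * math.log(math.pi)
               - log_multivariate_gamma(p, n / 2.0))
    return math.exp(log_vol)


circle = stiefel_volume(2, 1)    # circumference of the unit circle S^1
sphere = stiefel_volume(3, 1)    # surface area of the unit sphere S^2
```

For $p = 1$ the formula reduces to the familiar surface area $2\pi^{n/2}/\Gamma(n/2)$ of $S^{n-1}$, so `circle` is $2\pi$ and `sphere` is $4\pi$.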

$\mu(\cdot)$ plays the same role on $V_{n,p}$ that the Lebesgue measure plays on $R^n$, but as the manifold $V_{n,p}$ is compact, this measure is finite. Thus it is the uniform distribution on $V_{n,p}$, and $\mu(\cdot)$ is the unique probability measure which is invariant under rotations and reflections.

3.1.4 Tangent and Normal Space of Stiefel Manifold

The tangent space [7] at a point $p$ is the plane tangent to the submanifold at that point. For a $d$-dimensional manifold, the correct way to visualize the tangent space is to look at it as a $d$-dimensional vector space with the origin at the point of tangency. The normal space is the orthogonal complement of this vector space. Let us denote a point on the Stiefel manifold by $Y$, so that $Y^T Y = I$. On differentiating, we get $Y^T \Delta + \Delta^T Y = 0$, i.e. $Y^T \Delta$ is skew-symmetric. This skew-symmetry imposes $\frac{p(p+1)}{2}$ constraints on $\Delta$. So the vector space of all tangent vectors has dimension $\dim(V_{n,p})$:

\[ \dim(V_{n,p}) = np - \frac{p(p+1)}{2} = \frac{p(p-1)}{2} + p(n-p) \tag{3-1} \]

Figure 3-1. The tangent and normal spaces of an embedded manifold

As we are viewing the Stiefel manifold as one embedded in Euclidean space [7], we can use the standard inner product in $np$-dimensional Euclidean space, which is $\langle \Delta_1, \Delta_2 \rangle = \mathrm{trace}(\Delta_1^T \Delta_2)$

and this is also the Frobenius inner product for $n \times p$ matrices. The normal space at a point $Y$ consists of all the matrices $N$ with $\mathrm{trace}(\Delta^T N) = 0$ for every tangent vector $\Delta$. Using the fact that any matrix can be written as the sum of its projections onto the normal and tangent spaces, we see that the general form of tangent directions at $Y$ is

\[ \Delta = YA + Y_\perp B = YA + (I - YY^T)C \]

where $A$ is $p \times p$ skew-symmetric, $B$ is any $(n-p) \times p$ matrix, and $C$ is any $n \times p$ matrix. Clearly, $B = Y_\perp^T C$. $Y_\perp$ is any $n \times (n-p)$ matrix such that $YY^T + Y_\perp Y_\perp^T = I$. The exact closed form of the geodesic equation for a curve $Y(t)$ on $V_{n,p}$ is given in Edelman's paper by:

\[ Y(t) = \big(Y(0), \dot{Y}(0)\big) \exp\!\left( t \begin{pmatrix} A & -S(0) \\ I & A \end{pmatrix} \right) I_{2p,p} \, e^{-At} \]

where $A = Y^T \dot{Y}$ and $S$ is a symmetric matrix given by $\dot{Y}^T \dot{Y}$. On the other hand, using the quotient geometry on the Stiefel manifold, we can define a canonical metric and geodesic [7]. The canonical metric is given by $g_c(\Delta, \Delta) = \mathrm{trace}\big(\Delta^T (I - \frac{1}{2} YY^T) \Delta\big)$. Regarding the geodesic, Edelman [7] gave the functional form of the geodesic emanating from $Y$ in the direction of $H$ as

\[ Y(t) = Y M(t) + Q N(t) \]

where $Y$ and $H$ are $n \times p$ matrices such that $Y^T Y = I_p$ and $A = Y^T H$ is skew-symmetric, and $QR := K = (I - YY^T) H$

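is the compact QR-decomposition of $K$. $M(t)$ and $N(t)$ are given by the following matrix exponential:

\[ \begin{pmatrix} M(t) \\ N(t) \end{pmatrix} = \exp\!\left( t \begin{pmatrix} A & -R^T \\ R & 0 \end{pmatrix} \right) \begin{pmatrix} I_p \\ 0 \end{pmatrix} \]

3.2 Statistical Properties of Stiefel Manifold

3.2.1 Probability Distribution on Stiefel Manifold

Let $X$ be an $n \times p$ random matrix on $V_{n,p}$. The differential form $(dX)$ (discussed in Section 3.1.3) gives the unnormalized invariant measure on the Stiefel manifold $V_{n,p}$. This in turn gives a normalized invariant measure [6], [43], or normalized Haar measure, which is the uniform distribution, denoted by $[dX] = (dX)/\mathrm{Vol}(V_{n,p})$, where $\mathrm{Vol}(V_{n,p}) = 2^p \pi^{np/2} / \Gamma_p(\frac{n}{2})$.

Proposition 3.4.

• If $X$ is distributed uniformly on $V_{n,p}$, then $H_1 X H_2^T$ is also uniformly distributed for any $H_1 \in O(n)$ and $H_2 \in O(p)$, where $H_1$ and $H_2$ are independent of $X$; hence we have $E(X) = 0$.
• Let $H_1$ and $H_2$ be independently and uniformly distributed on $O(n)$ and $O(p)$, respectively, and let $X_0$ be any $n \times p$ matrix in $V_{n,p}$, constant or independent of $H_1$ and $H_2$; then the random matrix $H_1 X_0 H_2^T$ is uniformly distributed on $V_{n,p}$.
• A random matrix uniformly distributed on $V_{n,p}$ can be expressed as $X = Z (Z^T Z)^{-1/2}$, where $Z$ is an $n \times p$ matrix whose entries are independent Normal(0,1) random variables (this construction is used in Section 3.2.5).

Our Bayesian framework uses one of the non-uniform distributions on $V_{n,p}$, known as the Matrix Langevin distribution. The special case $p = 1$ of this distribution is the von Mises distribution on the hypersphere. Based on the normalized Haar measure, an exponential family of probability distributions has been defined on $V_{n,p}$ in the following way. The density function can be written as:

\[ dF(X) = \frac{1}{{}_0F_1\!\left(\frac{n}{2}, \frac{1}{4} F^T F\right)} \exp\!\big(\mathrm{trace}(F^T X)\big) \, [dX] \tag{3-2} \]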

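where $F$ is an $n \times p$ parameter matrix and ${}_0F_1$ is a function of zonal polynomials or, alternatively, a hypergeometric function with a matrix argument. We can write the normalizing constant as:

\[ {}_0F_1\!\left(\tfrac{n}{2}, \tfrac{1}{4} F^T F\right) = \int_{V_{n,p}} \exp\!\big(\mathrm{trace}(F^T H_1)\big) \, [dH_1] \]

where $[dH_1]$ is the normalized invariant measure on $V_{n,p}$. It can also be shown that:

\[ {}_0F_1\!\left(\tfrac{n}{2}, \tfrac{1}{4} F^T F\right) = \int_{O(n)} \exp\!\big(\mathrm{trace}(F^T H_1)\big) \, [dH] \]

where $H = [H_1 : H_2] \in O(n)$ and $[dH]$ is the normalized invariant measure on $O(n)$. The more general and flexible class of probability densities having linear as well as quadratic terms is given by

\[ f_{BMF}(X \mid A, B, F) \propto \exp\!\big(\mathrm{trace}(F^T X + B X^T A X)\big) \]

where $A$ and $B$ are generally taken to be symmetric matrices. For $F = 0$ we get the Matrix Bingham distribution, and for $A$ or $B$ equal to zero we get the Matrix Langevin distribution. In this dissertation we only discuss the Matrix Langevin distribution.

3.2.2 Properties of Matrix Langevin Distribution

• This density [6], [43] has a mode at $X = M$, where $M$ is the polar part or orientation of $F$. $F$ can be decomposed into the product of two matrices, $F = MK$, by the polar decomposition method. Clearly, $M \in V_{n,p}$ and $K$ is a $p \times p$ symmetric positive semi-definite matrix. $K$ is the elliptical part or concentration of $F$.
• We saw that $F = MK$ with $K$ symmetric positive definite, so $K$ has the eigendecomposition $K = UDU^T$, where $U \in O(p)$ and $D$ is the diagonal matrix $\mathrm{diag}(d_1, d_2, \ldots, d_p)$. Now we can write the hypergeometric normalizing constant as:
\[ {}_0F_1\!\left(\tfrac{n}{2}, \tfrac{1}{4} F^T F\right) = {}_0F_1\!\left(\tfrac{n}{2}, \tfrac{1}{4} K^T M^T M K\right) = {}_0F_1\!\left(\tfrac{n}{2}, \tfrac{1}{4} K^T K\right) \]
\[ {}_0F_1\!\left(\tfrac{n}{2}, \tfrac{1}{4} K^T K\right) = {}_0F_1\!\left(\tfrac{n}{2}, \tfrac{1}{4} U D U^T U D U^T\right) = {}_0F_1\!\left(\tfrac{n}{2}, \tfrac{1}{4} D^2\right) \]
where $M^T M = I_p$ as $M \in V_{n,p}$, and ${}_0F_1(\cdot)$ depends only on the eigenvalues of its matrix argument.
• Let $X_1, X_2, \ldots, X_N$ be observations generated from this density, with sample mean $\bar{X}$. As this is an exponential family, the maximum likelihood estimator (MLE) of the parameter $F$ is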

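$\hat{F} = \hat{M}\hat{K}$, determined by $E_{\hat{F}}(X) = \bar{X}$. But $\bar{X}$ may not be a point on $V_{n,p}$, so it is not itself a suitable estimator. Let $R$ be the elliptical part of $\bar{X}$; then $R = (\bar{X}^T \bar{X})^{1/2}$, so we have $\hat{M} = \bar{X} R^{-1}$. In order to get $\hat{K}$ we can use the following equations:

\[ R = U^T \mathrm{diag}(r_1, r_2, \ldots, r_p) \, U \quad \text{where } U \in O(p) \text{ and } r_1 \ge r_2 \ge \cdots \ge r_p \]
\[ \hat{K} = U^T \mathrm{diag}(\hat{d}_1, \hat{d}_2, \ldots, \hat{d}_p) \, U \quad \text{where } \hat{d}_1 \ge \hat{d}_2 \ge \cdots \ge \hat{d}_p \]
\[ r_i = \left. \frac{\partial}{\partial d_i} \log {}_0F_1\!\left(\tfrac{n}{2}, \tfrac{1}{4} \mathrm{diag}(d_1^2, d_2^2, \ldots, d_p^2)\right) \right|_{(d_1, \ldots, d_p) = (\hat{d}_1, \ldots, \hat{d}_p)} \quad \forall i = 1, \ldots, p \]

The approximate solutions are given by

\[ \hat{d}_i \simeq n r_i \quad \text{when } \hat{d}_i \text{ is small}, \;\; \forall i = 1, \ldots, p \]
\[ \left(n - p - \tfrac{1}{2}\right) \hat{d}_i^{-1} + \sum_{j=1}^{p} (\hat{d}_i + \hat{d}_j)^{-1} \simeq 2(1 - r_i) \quad \text{when } \hat{d}_i \text{ is large}, \;\; \forall i = 1, \ldots, p \]

3.2.3 Computation of the Hypergeometric Function of a Matrix Argument

We have adapted the algorithm presented in Koev (2006) [45] in order to approximate the hypergeometric function of a matrix argument, which is scalar-valued and defined as follows. Let $p \ge 0$ and $q \ge 0$ be integers and let $X$ be an $n \times n$ complex or real symmetric matrix with eigenvalues $x_1, x_2, \ldots, x_n$. Then

\[ {}_pF_q^{(\alpha)}(a_1, \ldots, a_p; b_1, \ldots, b_q; X) \equiv \sum_{k=0}^{\infty} \sum_{\kappa \vdash k} \frac{(a_1)_\kappa^{(\alpha)} \cdots (a_p)_\kappa^{(\alpha)}}{k! \, (b_1)_\kappa^{(\alpha)} \cdots (b_q)_\kappa^{(\alpha)}} \, C_\kappa^{(\alpha)}(X), \]

where $\alpha > 0$ is a parameter; $\kappa \vdash k$ means $\kappa = (\kappa_1, \kappa_2, \ldots)$ is a partition of $k$, i.e. $\kappa_1 \ge \kappa_2 \ge \cdots \ge 0$ are integers such that $|\kappa| = \kappa_1 + \kappa_2 + \cdots = k$;

\[ (a)_\kappa^{(\alpha)} \equiv \prod_{(i,j) \in \kappa} \left( a - \frac{i-1}{\alpha} + j - 1 \right) \]

is the generalized Pochhammer symbol; and $C_\kappa^{(\alpha)}(X)$ is the Jack function, which is a symmetric, homogeneous polynomial of degree $|\kappa|$ in the eigenvalues $x_1, x_2, \ldots, x_n$ of $X$.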
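The combinatorial ingredients of this series, the partitions $\kappa \vdash k$ and the generalized Pochhammer symbol, are simple to compute. The sketch below is illustrative and is not the algorithm of [45]; it omits the Jack functions entirely, and the function names are assumptions. It relies on the fact that the product over the boxes $(i, j)$ of $\kappa$ factors row by row into ordinary rising factorials.

```python
def partitions(k, max_part=None):
    """Yield every partition of k as a weakly decreasing tuple of positive parts."""
    if max_part is None or max_part > k:
        max_part = k
    if k == 0:
        yield ()
        return
    for first in range(max_part, 0, -1):
        for rest in partitions(k - first, first):
            yield (first,) + rest


def generalized_pochhammer(a, kappa, alpha):
    """Product over boxes (i, j) of kappa of (a - (i - 1)/alpha + j - 1)."""
    result = 1.0
    for i, part in enumerate(kappa, start=1):
        for j in range(1, part + 1):
            result *= a - (i - 1) / alpha + j - 1
    return result


def rising_factorial(a, k):
    """(a)_k = a (a + 1) ... (a + k - 1)."""
    result = 1.0
    for m in range(k):
        result *= a + m
    return result


parts_of_4 = list(partitions(4))
value = generalized_pochhammer(2.5, (2, 1), 2.0)
```

With $\alpha = 2$ the row-wise factorization reads $(a)_\kappa^{(2)} = \prod_i (a - \frac{i-1}{2})_{\kappa_i}$; for instance `value` equals `rising_factorial(2.5, 2) * rising_factorial(2.0, 1)`.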

The approximation of this infinite series is done by computing its truncated form, which is

\[ {}^{m}_{p}F_q^{(\alpha)}(a_1, \ldots, a_p; b_1, \ldots, b_q; X) \equiv \sum_{k=0}^{m} \sum_{\kappa \vdash k} \frac{(a_1)_\kappa^{(\alpha)} \cdots (a_p)_\kappa^{(\alpha)}}{k! \, (b_1)_\kappa^{(\alpha)} \cdots (b_q)_\kappa^{(\alpha)}} \, C_\kappa^{(\alpha)}(X). \]

This series:

• converges for any $X$ when $p \le q$;
• converges if $\max_i |x_i| < 1$ when $p = q + 1$;
• diverges when $p > q + 1$;
• has a $\kappa$-term that converges to zero as $|\kappa| \to \infty$ whenever it converges.

Koev et al. exploited recursive combinatorial relationships between the Jack functions, which allowed them to update the value of a Jack function from other Jack functions computed earlier in the series. This method is efficient: its computational complexity is only linear in the size of the matrix argument. In the very special case when $X$ is a multiple of the identity, the algorithm is even faster.

3.2.4 Sampling Random Matrices from Matrix Langevin Distribution on $V_{n,p}$

Notice that when $p = 1$, the Stiefel manifold $V_{n,1}$ is nothing but the unit hypersphere $S^{n-1}$ in $R^n$. This distribution [46] on the unit hypersphere is termed the von Mises-Fisher (vMF) distribution, which has a density w.r.t. the uniform distribution given by:

\[ p_{vMF}(x \mid \mu, c) = \frac{c^{n/2 - 1}}{(2\pi)^{n/2} I_{n/2 - 1}(c)} \exp(c \, \mu^T x) \quad \text{where } x \in S^{n-1} \]

where $c \ge 0$, $\|\mu\| = 1$, and $I_r$ denotes the modified Bessel function of the first kind and order $r$. Wood (1994) [47] provided a straightforward rejection sampling procedure to sample a vector from the vMF distribution on $S^{n-1}$. Hoff (2009) [46] discussed two sampling methods, one via rejection sampling and another using Gibbs sampling, to generate samples from the Matrix Langevin (ML) distribution, whose density is given by:

\[ f_{ML}(X \mid F) = \frac{1}{{}_0F_1\!\left(\frac{n}{2}, \frac{1}{4} D^2\right)} \exp\!\big(\mathrm{trace}(F^T X)\big) \quad \text{where } X \in V_{n,p} \]

and $D$ is the diagonal matrix of singular values of $F$. Both of these methods are based on the sampling procedure for the vMF distribution on $S^{n-1}$.

3.2.5 The Rejection Sampling Method

The parameter matrix $F$ can be decomposed via the Singular Value Decomposition (SVD) as $F = UDV^T$, where $U$ and $V$ are $n \times p$ and $p \times p$ orthonormal matrices, respectively, and $D$ is a diagonal matrix with positive entries. The probability density of $X$ can be written as

\[ \propto \exp\!\big(\mathrm{trace}(F^T X)\big) = \exp\!\big(\mathrm{trace}(V D U^T X)\big) = \exp\!\big(\mathrm{trace}(D U^T X V)\big). \]

The mode of this probability density is at $X = UV^T$, and the entries of $D$, the concentration parameters, indicate how close a random matrix is to its mode. The mode-finding problem is equivalent to finding the nearest orthogonal matrix to a given matrix $M$. To find this orthogonal matrix $R$, one uses the singular value decomposition $M = W \Sigma Y^T$ to write $R = W Y^T$. For the rejection sampling method, a uniform density envelope $f_u$ was used. As the mode of the ML distribution is at $UV^T$, the distribution has the maximum density

\[ f^0_{ML} = \frac{\exp(\mathrm{trace}(D))}{{}_0F_1\!\left(\frac{n}{2}, \frac{1}{4} D^2\right)} \]

Initially, generate $np$ random variables from Normal(0,1), arrange them in an $n \times p$ matrix $Z$, and form $X = Z (Z^T Z)^{-1/2}$. $Z$ is full rank with probability 1. As we can see from the properties of the uniform distribution on $V_{n,p}$, $X$ will be uniformly distributed. We need to generate $u \sim \mathrm{uniform}(0,1)$ independent of $X$; now if $f^0_{ML} \, u$

Algorithm 6 ML sample generation-I
  Sample $X[,1] \sim ML(H[,1])$
  for $r = 2$ to $p$ do
    Construct $N_r$, an orthonormal basis for the null space of $X[,(1, 2, \ldots, r-1)]$
    Construct $z \sim ML(N_r^T H[,r])$
    Set $X[,r] = N_r z$
  end for

He discussed a second algorithm, where $C(H)$ is given by the following quantity:

Algorithm 7 ML sample generation-II
  Do the SVD of $F = UDV^T$ and let $H = UD$
  Sample pairs $\{u, Y\}$ until $u$

This helps to write down a Markov chain for $X$. Let the current value of $X$ be $X^{(i)}$. In order to generate the next value $X^{(i+1)}$, we follow these steps:

Algorithm 8 Gibbs sampler method
  For a random $k \in \{1, 2, \ldots, p\}$ do the following steps:
    Choose $N$, an orthonormal basis for the null space of $X[,-k]$
    Sample $z \sim vMF(N^T F[,k])$
    Set $X[,k] = N z$
  return $X^{(i+1)} = X$

This is a reversible, aperiodic, irreducible Markov chain if $p$

CHAPTER 4
BAYESIAN ANALYSIS OF MATRIX-LANGEVIN ON THE STIEFEL MANIFOLD

4.1 Preliminaries

Analysis of directional data comprises one of the major sub-fields of study in Statistics. Directional statistics deals with observations that are unit vectors, or sets or ordered tuples of unit vectors, in the $n$-dimensional space $R^n$. Since the sample space is not the usual Euclidean space, standard methods developed for the statistical analysis of univariate or multivariate data do not apply immediately. Stated differently, incorporating the intrinsic structure of the sample space is essential to the proper analysis of such data. There is extensive literature on the statistics of circular and spherical data. In addition, there has been significant interest in the study of more general sample spaces such as the Stiefel and the Grassmann manifold. In particular, Downs [48], Khatri and Mardia [49], and Jupp and Mardia [50] have developed statistical methods for data that lie on the Stiefel manifold.

When the orientation of an object, or some derived feature, lies in a space of non-zero curvature, the usual probability distributions cannot be used to describe it. A space that is a natural fit for such data is the Stiefel manifold. A good statistical framework to infer the parameters of the probability distribution in this general sample space would clearly be of use to several areas of scientific enquiry. An appropriate marriage with Bayesian inference, which, with the ever growing computational power of the digital computer, has come of age in a wide variety of situations, would be even more beneficial.

In this chapter we develop a Bayesian framework for a particular distribution known as the Matrix-Langevin (henceforth denoted by ML) on the Stiefel manifold $V_{n,p}$. We begin by proposing appropriate priors and deriving the posterior estimate of the parameters of an ML on $V_{n,p}$. We then extend the framework to a finite mixture model of ML and finally to a non-parametric Dirichlet Process Mixture (DPM) model of ML, which can potentially accommodate an infinite number of clusters. We also demonstrate a faster variational inference scheme for the DPM model.

4.2 Motivating Example: Dictionary Learning

In a Bayesian dictionary learning framework [51], a signal $X_i$ (or data) is represented as an $n$-dimensional vector and the over-complete dictionary $D$ is represented as an $n \times K$ matrix, with $n$

of size $p$, $D[j]$ for $j = 1, 2, \ldots, M$,

\[ D = \Big[ \, \underbrace{d_1 \cdots d_p}_{D[1]} \; : \; \underbrace{d_{p+1} \cdots d_{2p}}_{D[2]} \; : \; \cdots \; : \; \underbrace{d_{K-p+1} \cdots d_K}_{D[M]} \, \Big]. \]

In this new setting, the sparsity prior is now placed at the block level, and the usual notions of coherence and sparsity [52] assume a more general form of block-coherence and block-sparsity. Since each block is low-dimensional and under-complete, uniqueness of the solution within each block is assured and no regularization is required within each block. In particular, one can assume that each block $D[j]$ is composed of orthonormal columns, and the natural prior for the block-structured dictionary $D$ is then a distribution on an appropriately chosen Stiefel manifold.

4.3 The Stiefel Manifold and ML Distribution

When $p$ distinguishable ordered directions in $n$ dimensions ($n \ge p$) are required to describe each orientation, [48] has given methods for summarizing and comparing orientations of samples of orientable objects. Let $x_j$ be an $n \times 1$ column vector whose elements are values, given in some fixed coordinate system, for the $j$-th direction. Then the orientation of a single object may be expressed as an $n \times p$ matrix $X$ of rank $p$ whose columns are $x_1, x_2, \ldots, x_p$. This can be formally imposed by requiring $X$ to satisfy $X^T X = C$, where $C$ is a symmetric positive-definite $p \times p$ matrix. The space of all such $X$ is known as the Stiefel $C$-manifold. When $C$ is assumed to be the identity ($I_p$) we have the usual Stiefel manifold. Henceforth, we shall denote this by $V_{n,p}$ or $O(n,p)$. Informally, $V_{n,p}$ consists of $n \times p$ "tall-skinny" orthonormal matrices.

Our Bayesian framework uses a non-uniform distribution on $V_{n,p}$ known as the ML distribution. The special case $p = 1$ of this distribution is the von Mises distribution on a hypersphere. Based on the normalized Haar measure $[dH]$, an exponential family of probability distributions [48] has been defined on $V_{n,p}$ in the following manner. The density function [6], [43] can be written as:

\[ dF(X) = \frac{\exp\!\big(\mathrm{trace}(F^T X)\big)}{{}_0F_1\!\left(\frac{n}{2}, \frac{1}{4} F^T F\right)} \, [dX] \tag{4-1} \]

where $F$ is an $n \times p$ parameter matrix and ${}_0F_1$, the normalizing constant [49], is a Hypergeometric Function with a Matrix argument [57], [58], [59].

• $F$ can be decomposed into the product of two matrices, $F = MK$, via the polar decomposition method. Clearly, $M \in V_{n,p}$ and $K$ is a $p \times p$ symmetric positive definite matrix. The density has a mode at $X = M$, where $M$ is the polar part or orientation of $F$. $K$ is the elliptical part or concentration of $F$.
• Since $F = MK$ and $K$ is a symmetric positive definite matrix, $K$ has the eigendecomposition $K = UDU^T$, where $U \in O(p)$ and $D$ is a diagonal matrix $\mathrm{diag}(d_1, d_2, \ldots, d_p)$. We can now write the hypergeometric normalizing constant as
\[ {}_0F_1\!\left(\tfrac{n}{2}, \tfrac{1}{4} F^T F\right) = {}_0F_1\!\left(\tfrac{n}{2}, \tfrac{1}{4} K^T K\right) = {}_0F_1\!\left(\tfrac{n}{2}, \tfrac{1}{4} D^2\right) \]
since $M^T M = I_p$ as $M \in V_{n,p}$. ${}_0F_1(\cdot)$ thus depends only on the eigenvalues of its matrix argument.

The more general and flexible class of probability densities having linear as well as quadratic terms is

\[ f_{BMF}(X \mid A, B, F) \propto \exp\!\big(\mathrm{trace}(F^T X + B X^T A X)\big) \]

where $A$ and $B$ are generally taken to be symmetric matrices. For $F = 0$ we get the Matrix Bingham distribution, and for $A$ or $B = 0$ we get the ML distribution. Here we restrict ourselves to the ML distribution. However, it is not difficult to see that our technique extends to the more general family of distributions.

4.4 Parametric Bayesian Inference for the ML Distribution

As described above, the parameter $F$ has two distinct components: $M$ (polar) and $K$ (elliptical or concentration). They play very different roles in giving shape to the underlying distribution. $M$ is responsible for pure rotation, whereas $K$ is responsible for concentrating the distribution around the mode $M$. We will assume that the prior distributions for $M$ and $K$ are independent. Note next that $M$, being an orthonormal matrix itself, lies on the Stiefel manifold. Also, since it does not occur in the denominator of ML, its inference is likely to be simpler. In contrast, the inference for $K$, which is a positive-definite matrix, is not likely to be straightforward, since we are confronted with the Hypergeometric Function of $K$ in the denominator of ML. The

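primary contribution of this work is that we have successfully constructed a full Bayesian framework for both parameters, which [1] could not accomplish.

Conjugate priors have been widely used to incorporate prior beliefs seamlessly into the Bayesian framework. The choice of the conjugate prior depends heavily on the form of the likelihood function. Our assumption that the prior distributions for $M$ and $K$ are independent is at odds with the ML distribution, where the likelihood is determined by a non-factorizable function of their product. As a result, the posterior is not factorizable like the prior, and hence the prior is not conjugate in general. However, if we were to fix either one of the parameters, the conditional distribution of the other parameter can still be made conjugate to the likelihood, which in the literature is known as a conditionally conjugate model. We follow this path.

4.4.1 Likelihood for the ML Distribution

Let $N$ samples of data $X_N = \{X_1, X_2, \ldots, X_N\}$ be generated i.i.d. from the ML distribution on $V_{n,p}$ with parameter matrix $F = MK$:

\[ ML(X_i; F) = \frac{\exp\!\big(\mathrm{trace}(F^T X_i)\big)}{{}_0F_1\!\left(\frac{n}{2}; \frac{1}{4} F^T F\right)} \tag{4-2} \]

Let $R = \sum_{i=1}^{N} X_i$. Using $K^T = K$, $M^T M = I_p$ and diagonalizing $K = UDU^T$, we have $F^T F = K^2 = U D^2 U^T$, which has the same eigenvalues as $D^T D = D^2$. The complete data likelihood is

\[ L(X_N) = \frac{\exp\!\big(\mathrm{trace}(K M^T R)\big)}{\prod_{j=1}^{N} {}_0F_1\!\left(\frac{n}{2}; \frac{1}{4} D^2\right)} = \frac{\exp\!\big(\mathrm{trace}(R^T M K)\big)}{\left({}_0F_1\!\left(\frac{n}{2}; \frac{1}{4} D^2\right)\right)^{N}} \tag{4-3} \]

4.4.2 Prior for the Polar Part M

Since $M$ lies on $V_{n,p}$, we assume another ML with hyper-parameter $G_0$ as the prior distribution for $M$:

\[ ML(M; G_0) = \frac{\exp\!\big(\mathrm{trace}(G_0^T M)\big)}{{}_0F_1\!\left(\frac{n}{2}; \frac{1}{4} G_0^T G_0\right)}, \qquad G_0 = M_0 Q_0 \text{ where } Q_0 = \kappa_0 I_p \text{ and } \kappa_0 \in R \]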

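4.4.3 Posterior for the Polar Part M

We now compute the full conditional distribution for $M$, which will be used in the Gibbs sampler:

\[ P(M \mid X_i, K) = \frac{P(X_i \mid M, K) \, P(M) \, P(K)}{P(X_i \mid K) \, P(K)} \]

Now we have

\[ P(M \mid X_N, K) = ML\!\left(M; \; \sum_{i=1}^{N} X_i K + G_0\right) \propto \exp\!\left(\mathrm{trace}\!\left(\Big(\sum_{i=1}^{N} X_i K + G_0\Big)^{T} M\right)\right) \tag{4-4} \]

Thus we get a conditionally conjugate prior for $M$ for the given likelihood.

4.4.4 Prior for the Elliptical or Concentration Part K

We first assume that $K$ is a diagonal matrix $D$. This assumption is motivated primarily by the simplicity it confers on the calculations. We extend our results to a more general $K$ in a later section. Observing the likelihood, we propose a prior distribution proportional to

\[ \pi(D; \nu, \Psi) \propto \frac{\exp\!\big(\mathrm{trace}(\Psi D)\big)}{\left({}_0F_1\!\left(\frac{n}{2}; \frac{1}{4} D^2\right)\right)^{\nu}} \, \mathbf{1}(D \in [0,1]^p) \]

with hyperparameters $\nu > 0$ and $\Psi \in R^{p \times p}$. $\mathbf{1}(D \in [0,1]^p)$ denotes the indicator function of the event that the diagonal entries of $D$ (its eigenvalues) are bounded above by 1 and below by 0. The framework remains unchanged when the eigenvalues are bounded above by any $t > 0$.

The ML distribution is defined for a particular $K = D$. Since the Stiefel manifold is compact, the integral of ML w.r.t. $X_i$ is bounded and hence a finite normalizing constant exists. However, $K$ lies on an unbounded cone which is not compact (the PSD cone). We are therefore unlikely to have a proper posterior if we assume the support of the prior to be unbounded, since the integration w.r.t. $K$ might not yield a finite normalization constant. Thus the need to bound $K$ in some manner. Note in contrast that the prior for $M$ does not suffer from this issue, since $M$ lies on the Stiefel manifold, which is compact.

4.4.5 Upper and Lower Bounds for the ${}_0F_1(\cdot)$ Function

A partition $\kappa$ is a vector $\kappa = (\kappa_1, \ldots, \kappa_p)$ of non-negative integers that are weakly decreasing: $\kappa_1 \ge \kappa_2 \ge \cdots \ge \kappa_p$. The entries $\kappa_1, \ldots, \kappa_p$ are called the parts of $\kappa$; the length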

of $\kappa$ is the number of non-zero $\kappa_j$; and the weight of $\kappa$ is $|\kappa| := \kappa_1 + \kappa_2 + \cdots + \kappa_p$. For $a \in C$ (or $R$) and any non-negative integer $j$, the rising factorial $(a)_j$ is defined as

\[ (a)_j = \frac{\Gamma(a + j)}{\Gamma(a)} = a(a+1)(a+2) \cdots (a + j - 1) \]

Corresponding to each partition $\kappa$, the partitional rising factorial $(a)_\kappa$ is defined as

\[ (a)_\kappa = \prod_{j=1}^{p} \left(a - \tfrac{1}{2}(j-1)\right)_{\kappa_j} \]

Let $S$ be a real symmetric $p \times p$ matrix. For each partition $\kappa$, we denote by $Z_\kappa(S)$ the zonal polynomial of the matrix $S$. Let $a \in C$ (or $R$) be such that $-a + \frac{1}{2}(j-1)$ is not a non-negative integer for all $j = 1, \ldots, p$. For any symmetric $p \times p$ matrix $S$, we define a generalized hypergeometric function of matrix argument,

\[ {}_0F_1(a; S) = \sum_{k=0}^{\infty} \frac{1}{k!} \sum_{|\kappa| = k} \frac{Z_\kappa(S)}{(a)_\kappa} = E \;\; \text{(say)} \tag{4-5} \]

where the inner summation is over all partitions $\kappa = (\kappa_1, \ldots, \kappa_p)$ of weight $k$. Also,

\[ \sum_{|\kappa| = k} Z_\kappa(S) = (\mathrm{trace}(S))^k. \tag{4-6} \]

It follows from equations 4-5 and 4-6 that ${}_0F_1(a; S) < 2 \exp(\mathrm{trace}(S))$.

4.4.5.1 A Lower Bound

From

\[ (a)_\kappa = \prod_{j=1}^{p} \left(a + \tfrac{1}{2} - \tfrac{j}{2}\right)_{\kappa_j} = a \cdots (a + \kappa_1 - 1) \cdot \left(a - \tfrac{1}{2}\right) \cdots \left(a - \tfrac{1}{2} + \kappa_2 - 1\right) \cdots \left(a - \tfrac{p-1}{2}\right) \cdots \left(a - \tfrac{p-1}{2} + \kappa_p - 1\right) \tag{4-7} \]

we note that the product is maximized when the partition of $k$ looks like $\{\kappa_1 = k, 0, \ldots, 0\}$. So $(a)_\kappa \le a(a+1) \cdots (a+k-1)$. Hence

\[ E > \sum_{k=0}^{\infty} \frac{1}{k!} \, \frac{1}{a(a+1) \cdots (a+k-1)} \sum_{|\kappa| = k} Z_\kappa(S) \tag{4-8} \]

Now using the fact that the Arithmetic Mean (AM) $\ge$ Geometric Mean (GM), we have

\[ \big(a \cdots (a+k-1)\big)^{1/k} \le \frac{a + \cdots + (a+k-1)}{k} \implies a \cdots (a+k-1) \le \left(a + \frac{k-1}{2}\right)^{k} \]

From Inequality 4-8 and using $\sum_{|\kappa| = k} Z_\kappa(S) = T^k$ (where $T = \mathrm{trace}(S)$), we have

\[ E > \sum_{k=0}^{\infty} \frac{1}{k!} \frac{T^k}{\left(a + \frac{k-1}{2}\right)^{k}} = \sum_{k=0}^{\infty} \frac{1}{k!} \frac{(T/a)^k}{\left(1 + \frac{k-1}{2a}\right)^{k}} \tag{4-9} \]

Note that in our case $a = \frac{n}{2}$ and $n \ge p \ge 1$, so $2a \ge 1$, and for $k \ge 1$

\[ \frac{1}{2a} \le 1 \implies \frac{k-1}{2a} \le (k-1) \implies 1 + \frac{k-1}{2a} \le k \tag{4-10} \]

From equations 4-9 and 4-10, and noting that the first term of the RHS is 1, we have

\[ E > 1 + \sum_{k=1}^{\infty} \frac{1}{k!} \frac{(T/a)^k}{k^k} \tag{4-11} \]

We now use Stirling's formula $\sqrt{2\pi} \, k^k \sqrt{k} \, e^{-k} \le k! \le e \, k^k \sqrt{k} \, e^{-k}$ to bound $k^k$:

\[ E > 1 + \sum_{k=1}^{\infty} \frac{1}{k!} \frac{(T/a)^k \sqrt{k} \, e^{-k}}{k^k \sqrt{k} \, e^{-k}} = 1 + \sum_{k=1}^{\infty} \frac{1}{k!} \frac{(T/ae)^k \sqrt{k}}{k^k \sqrt{k} \, e^{-k}} \ge 1 + \sqrt{2\pi} \sum_{k=1}^{\infty} \frac{1}{k!} \frac{(T/ae)^k \sqrt{k}}{k!} \]
\[ \ge 1 + \sqrt{2\pi} \sum_{k=1}^{\infty} \frac{(T/ae)^k}{(k!)^2} = 1 + \sqrt{2\pi} \sum_{k=1}^{\infty} \frac{\big(\sqrt{T}/\sqrt{ae}\big)^{2k}}{(k!)^2} \]

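\[ = 1 + \sqrt{2\pi} \sum_{k=1}^{\infty} \frac{\big(\frac{\alpha}{2} \sqrt{T}\big)^{2k}}{(k!)^2} = 1 + \sqrt{2\pi} \sum_{k=1}^{\infty} \frac{\big(\frac{\alpha}{2}\big)^{2k} T^k}{(k!)^2} \quad \text{where } \alpha = \frac{2}{\sqrt{ae}} \tag{4-12} \]

We know $T = \mathrm{trace}(S) = \mathrm{trace}(D^2)$, and as $D$ is diagonal and all its eigenvalues $(d_1, d_2, \ldots, d_p)$ are positive, we have

\[ T^k = \big(\mathrm{trace}(D^2)\big)^k = \left(\sum_{i=1}^{p} d_i^2\right)^{k} \ge \sum_{i=1}^{p} (d_i)^{2k} \]

So by 4-12, we have

\[ E > 1 + \sqrt{2\pi} \left[ \sum_{k=1}^{\infty} \frac{\big(\frac{\alpha}{2}\big)^{2k} \sum_{i=1}^{p} (d_i)^{2k}}{(k!)^2} \right] = 1 + \sqrt{2\pi} \sum_{i=1}^{p} \left[ \sum_{k=1}^{\infty} \frac{\big(\frac{\alpha d_i}{2}\big)^{2k}}{(k!)^2} \right] \tag{4-13} \]

From the definition of the modified Bessel function of order zero, $I_0(x)$, we have

\[ \sum_{k=1}^{\infty} \frac{\big(\frac{\alpha d_i}{2}\big)^{2k}}{(k!)^2} = I_0(\alpha d_i) - 1 \]

So we have:

\[ E > 1 + \sqrt{2\pi} \sum_{i=1}^{p} \big(I_0(\alpha d_i) - 1\big) = 1 + \sqrt{2\pi} \sum_{i=1}^{p} I_0(\alpha d_i) - p\sqrt{2\pi} \quad \text{where } \alpha = \frac{2}{\sqrt{ae}} \tag{4-14} \]

Figure 4-1 presents this lower bound for random choices of eigenvalues of $D$.

Figure 4-1. Lower bound for ${}_0F_1(a; S)$ [in red] by RHS of equation 4-14 [in blue]. The x-axis represents the sum of eigenvalues and the y-axis denotes the function values.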
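The bound 4-14 can be checked numerically in the simplest setting. The sketch below is an assumption-laden illustration, not the computation behind Figure 4-1: it only treats the $p = 1$ case, where the zonal-polynomial series 4-5 collapses to the scalar series $\sum_k T^k / ((a)_k \, k!)$ with $T = d^2$, and it evaluates both series by plain truncation. All function names are illustrative.

```python
import math


def hyp0f1_scalar(a, t, terms=80):
    """Scalar 0F1(a; t) = sum_k t^k / ((a)_k k!), by direct truncation."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= t / ((a + k) * (k + 1))
    return total


def bessel_i0(x, terms=80):
    """Modified Bessel function I_0(x) = sum_k (x/2)^(2k) / (k!)^2."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (x / 2.0) ** 2 / ((k + 1) ** 2)
    return total


def lower_bound_4_14(a, d):
    """RHS of (4-14): 1 + sqrt(2 pi) * sum_i (I_0(alpha d_i) - 1), alpha = 2/sqrt(a e)."""
    alpha = 2.0 / math.sqrt(a * math.e)
    return 1.0 + math.sqrt(2.0 * math.pi) * sum(
        bessel_i0(alpha * di) - 1.0 for di in d
    )


# p = 1, so S = d^2 is a 1 x 1 matrix and T = trace(S) = d^2.
exact = hyp0f1_scalar(1.0, 1.0)
bound = lower_bound_4_14(1.0, [1.0])
```

For $a = 1$ and $d = 1$ the exact value is $I_0(2) \approx 2.28$ and the bound is about $2.01$, consistent with 4-14; the two sides coincide at $d = 0$.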

4.4.5.2 Lower Bounds for $I_0(x)$

Note first that $I_0'(x) = I_1(x)$. Observe next that $\frac{\exp(x)}{I_0(x)}$ is an increasing function for all $x \ge 0$, as

\[ \left(\frac{\exp(x)}{I_0(x)}\right)' = \frac{\exp(x)\big(I_0(x) - I_1(x)\big)}{I_0^2(x)} > 0 \tag{4-15} \]

Applying $I_0(x) > I_1(x)$, from equation 4-15 we have that $f(x) = \frac{\exp(x)}{I_0(x)}$ is an increasing function, so $\forall x > b$

\[ f(x) > f(b) \implies \frac{\exp(x)}{I_0(x)} > \frac{\exp(b)}{I_0(b)} \implies I_0(x) < I_0(b) \exp(x - b) \tag{4-16} \]

We also have $I_0(x) > x$; in order to check it, let us write the expression for $I_0(x)$:

\[ I_0(x) = \sum_{k=0}^{\infty} \frac{(x/2)^{2k}}{(k!)^2} = 1 + (x/2)^2 + O(x^4) \]

Now $(x/2 - 1)^2 \ge 0 \implies 1 + (x/2)^2 \ge x$, so we have $I_0(x) > x$. Let us write an important identity of modified Bessel functions ($x > 0$), often called the backward recurrence relation:

\[ I_{\nu-1}(x) - I_{\nu+1}(x) = \frac{2\nu}{x} I_\nu(x) \tag{4-17} \]

Also, from [60] we have a Turán-type inequality for modified Bessel functions:

\[ I_{\nu-1}(x) \, I_{\nu+1}(x) < I_\nu^2(x), \quad x > 0 \text{ and } \nu \ge -\tfrac{1}{2} \tag{4-18} \]

From equations 4-18 and 4-17, and putting $\nu = 1$,

\[ I_1^2(x) > I_0(x) I_2(x) = I_0(x)\left[I_0(x) - \frac{2}{x} I_1(x)\right] \implies I_1^2(x) > I_0^2(x) - \frac{2}{x} I_0(x) I_1(x) \]
\[ \implies y^2 + \frac{2}{x} y - 1 > 0 \quad \text{where } y = \frac{I_1(x)}{I_0(x)} \tag{4-19} \]
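The recurrence 4-17, the Turán-type inequality 4-18 with $\nu = 1$, and the resulting quadratic inequality 4-19 can all be spot-checked numerically. The following is a minimal sketch, assuming integer-order modified Bessel functions are evaluated from their power series; `bessel_i` and `check_bessel_relations` are illustrative names.

```python
import math


def bessel_i(order, x, terms=60):
    """Integer-order I_n(x) = sum_k (x/2)^(2k+n) / (k! (k+n)!)."""
    total = 0.0
    for k in range(terms):
        total += (x / 2.0) ** (2 * k + order) / (
            math.factorial(k) * math.factorial(k + order)
        )
    return total


def check_bessel_relations(x):
    """Return (recurrence gap, Turan margin, quadratic value) at a point x > 0."""
    i0, i1, i2 = bessel_i(0, x), bessel_i(1, x), bessel_i(2, x)
    recurrence_gap = (i0 - i2) - (2.0 / x) * i1   # (4-17) with nu = 1, should be ~0
    turan_margin = i1 * i1 - i0 * i2              # (4-18) with nu = 1, should be > 0
    y = i1 / i0
    quadratic = y * y + (2.0 / x) * y - 1.0       # (4-19), should be > 0
    return recurrence_gap, turan_margin, quadratic


results = {x: check_bessel_relations(x) for x in (0.1, 0.5, 1.0, 2.0, 5.0, 10.0)}
```

At each sampled $x$ the recurrence gap vanishes up to rounding, while the Turán margin and the quadratic in $y = I_1(x)/I_0(x)$ are strictly positive.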

As the discriminant $\sqrt{\frac{4}{x^2} + 4} > 0$, the quadratic has two real roots. Also note that $y > 0$ by the definition of the modified Bessel function. So, in order for inequality 4-19 to hold, $y$ should be greater than the larger (positive) root $r_1$, where $r_1$ is given by

\[ r_1 = \frac{-\frac{2}{x} + \sqrt{\frac{4}{x^2} + 4}}{2} = -\frac{1}{x} + \sqrt{\frac{1}{x^2} + 1} = \frac{1}{x}\left(-1 + \sqrt{1 + x^2}\right) \tag{4-20} \]

So we have

\[ y > r_1 \implies \frac{I_1(x)}{I_0(x)} > \frac{1}{x}\left(-1 + \sqrt{1 + x^2}\right) \implies x I_1(x) > \left(\sqrt{1 + x^2} - 1\right) I_0(x) > \left(\sqrt{x^2} - 1\right) I_0(x) \]
\[ \implies x I_1(x) > (x - 1) I_0(x) \tag{4-21} \]

Let us take $g(x) = \frac{\exp(x)}{x I_0(x)}$, so

\[ g'(x) = \frac{\exp(x)\big[(x-1) I_0(x) - x I_1(x)\big]}{x^2 (I_0(x))^2} \]

Now $g'(x) < 0$, as from 4-21 we have $(x-1) I_0(x) - x I_1(x) < 0$. So $g(x) < g(b)$ for $x > b$, and using this we get

\[ \frac{\exp(x)}{x I_0(x)} < \frac{\exp(b)}{b I_0(b)} \implies I_0(x) > \frac{\exp(x)}{x} \cdot \frac{b I_0(b)}{\exp(b)} \tag{4-22} \]

Now $I_0(b) > 1$, so we have

\[ I_0(x) > \frac{\exp(x - b)}{(x/b)} = \exp\big(x - \ln(x/b) - b\big) > \exp(ax - b) \]

for some $0 < a < 1$ with $x - \ln(x/b) > ax$. So we have: when $x > b$, $I_0(x) > \exp(ax - b)$ for some $0$

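4.4.5.3 Remarks

When $x > b$ we have the bound $I_0(x) > \exp(ax - b)$. Clearly it depends on the choice of $b$. From [61] we know

\[ I_0(x) \ge \tfrac{1}{2} \exp\!\left(\tfrac{2x}{\pi}\right) + \tfrac{1}{2} \exp\!\left(-\tfrac{2x}{\pi}\right) \ge \tfrac{1}{2} \exp\!\left(\tfrac{2x}{\pi}\right) \]

We use generic constants $c_1$ and $c_2$ instead:

\[ I_0(x) > \exp(c_1 x - c_2) \tag{4-24} \]

4.4.5.4 Lower Bound for ${}_0F_1(\cdot)$ Using Lower Bound for $I_0(x)$

Now from equations 4-14 and 4-24, and noting that $S = \frac{1}{4} D^2$ and $a = \frac{n}{2}$, we have

\[ E_1 = {}_0F_1\!\left(\tfrac{n}{2}; \tfrac{1}{4} D^2\right) > 1 - p\sqrt{2\pi} + \sqrt{2\pi} \sum_{i=1}^{p} I_0\!\left(\frac{\alpha d_i}{2}\right) > 1 - p\sqrt{2\pi} + \sqrt{2\pi} \sum_{i=1}^{p} \exp\!\left(\frac{c_1 \alpha d_i}{2} - c_2\right) \tag{4-25} \]

We also know, from the AM $\ge$ GM inequality, that

\[ \frac{1}{p} \sum_{i=1}^{p} \exp(y_i) \ge \left(\exp\!\Big(\sum_{i=1}^{p} y_i\Big)\right)^{1/p} \implies \sum_{i=1}^{p} \exp(y_i) \ge p \exp\!\left(\frac{1}{p} \sum_{i=1}^{p} y_i\right) \tag{4-26} \]

Putting $y_i = \left(\frac{c_1 \alpha d_i}{2} - c_2\right)$ and $\alpha = \frac{2}{\sqrt{ae}} = \frac{2\sqrt{2}}{\sqrt{ne}}$, and using the above inequality in equation 4-25, we can write

\[ E_1 > 1 - p\sqrt{2\pi} + p\sqrt{2\pi} \exp\!\left(\frac{c_1 \sqrt{2}}{p \sqrt{n}} \frac{1}{\sqrt{e}} \sum_{i=1}^{p} d_i - c_2\right) \tag{4-27} \]

Setting $Z = \frac{c_1 \sqrt{2}}{p \sqrt{n}} \frac{1}{\sqrt{e}} \sum_{i=1}^{p} d_i - c_2$, we have

\[ {}_0F_1\!\left(\tfrac{n}{2}; \tfrac{1}{4} D^2\right) > 1 - p\sqrt{2\pi} + p\sqrt{2\pi} \exp(Z) > \exp(Z) \tag{4-28} \]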

Notice that $Z = \frac{c_1 \sqrt{2}}{p \sqrt{n}} \frac{1}{\sqrt{e}} \mathrm{trace}(D) - c_2 = c_3 \, \mathrm{trace}(D) - c_2$, where $c_3 = \frac{c_1 \sqrt{2}}{p \sqrt{n}} \frac{1}{\sqrt{e}}$. Using numerical simulations, when $0 \le x \le 1$ we see (Figure 4-2) that $I_0(x) > \exp(x - 0.77)$. So in our particular case $c_1 = 1$, $c_2 = 0.77$ and $c_3 = \frac{\sqrt{2}}{p \sqrt{n}} \frac{1}{\sqrt{e}}$.

Figure 4-2. Lower bound for $I_0(x)$ [in red] by $\exp(x - 0.77)$ [in blue]. Note that the inequality $I_0(x) > \exp(x - 0.77)$ holds only in the interval $[0, 1]$.

4.4.6 Posterior for the Elliptical or Concentration Part D

Figure 4-3. An approximate profile of the posterior density function for a $2 \times 2$ diagonal matrix when 100 data points are given.

Note that the prior for $D$ looks like

\[ \pi(D; \nu, \Psi) = \frac{1}{C_0} \frac{\exp\!\big(\mathrm{trace}(\Psi D)\big)}{\left({}_0F_1\!\left(\frac{n}{2}; \frac{1}{4} D^2\right)\right)^{\nu}} \, \mathbf{1}(D \in [0,1]^p) \tag{4-29} \]

$C_0$, the normalization constant, is finite since the support of the distribution is compact. The data likelihood term is

\[ L(X_N) = \frac{\exp\!\big(\mathrm{trace}(H^T D)\big)}{\left({}_0F_1\!\left(\frac{n}{2}; \frac{1}{4} D^2\right)\right)^{N}} \quad \text{where } H = M^T \left(\sum_{i=1}^{N} X_i\right) \tag{4-30} \]
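The numerical claim behind the constants $c_1 = 1$ and $c_2 = 0.77$, namely $I_0(x) > \exp(x - 0.77)$ on $[0, 1]$, can be reproduced with a short grid check. This is a sketch under the assumption that a truncated power series for $I_0$ and a grid of 101 points are adequate; it is not the simulation used for Figure 4-2.

```python
import math


def bessel_i0(x, terms=60):
    """Modified Bessel function I_0(x) = sum_k (x/2)^(2k) / (k!)^2."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (x / 2.0) ** 2 / ((k + 1) ** 2)
    return total


def margin(x, c1=1.0, c2=0.77):
    """I_0(x) - exp(c1 x - c2); positive where the lower bound (4-24) holds."""
    return bessel_i0(x) - math.exp(c1 * x - c2)


grid = [i / 100.0 for i in range(101)]        # x = 0.00, 0.01, ..., 1.00
smallest_margin = min(margin(x) for x in grid)
margin_outside = margin(1.1)                  # just beyond the interval [0, 1]
```

On this grid the smallest margin occurs at $x = 1$ and is positive but small, while `margin_outside` is negative, in line with the remark in the caption of Figure 4-2 that the inequality holds only on $[0, 1]$.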


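From equations 4-29 and 4-30 we can write the full conditional distribution for $D$ as
\[
\begin{aligned}
\pi(D \mid X_N, M) &= \frac{1}{C}\,\frac{\exp(\mathrm{trace}(H^T D + \beta D))}{\left({}_0F_1\!\left(\tfrac{n}{2}; \tfrac{1}{4}D^2\right)\right)^{N+\nu}}\,\mathbf{1}(D \in [0,1]^p) \\
&< \frac{1}{C}\,\frac{\exp(\mathrm{trace}(H^T D + \beta D))}{(\exp(Z))^{N+\nu}}\,\mathbf{1}(D \in [0,1]^p) \\
&= \frac{1}{C}\,\frac{\exp(\mathrm{trace}(H^T D + \beta D))}{\exp((N+\nu)Z)}\,\mathbf{1}(D \in [0,1]^p) \\
&= \frac{1}{C}\,\frac{\exp(\mathrm{trace}(H^T D + \beta D))}{\exp(\delta_1\,\mathrm{trace}(D) - \delta_2)}\,\mathbf{1}(D \in [0,1]^p) \\
&= \frac{1}{C}\,\exp\!\big(\mathrm{trace}((H^T + \beta I_p - \delta_1 I_p)D)\big)\,e^{\delta_2}\,\mathbf{1}(D \in [0,1]^p) \\
&= \frac{1}{C}\,\exp(\mathrm{trace}(\Gamma D))\,e^{\delta_2}\,\mathbf{1}(D \in [0,1]^p)
\end{aligned} \tag{4-31}
\]
where $\Gamma = H^T + \beta I_p - \delta_1 I_p$, $\delta_1 = (N+\nu)\frac{c_1\sqrt{2}}{p\sqrt{n}\sqrt{e}}$, $\delta_2 = (N+\nu)c_2$, and $C$ is the appropriate normalization constant. A numerical integration method is used to calculate $C$ for sampling from the posterior of $D$.

4.4.6.1 Rejection Sampling

As we have
\[
\begin{pmatrix} \Gamma_{11} & \cdots & \Gamma_{1p} \\ \vdots & \ddots & \vdots \\ \Gamma_{p1} & \cdots & \Gamma_{pp} \end{pmatrix}
\begin{pmatrix} d_1 & & 0 \\ & \ddots & \\ 0 & & d_p \end{pmatrix}
=
\begin{pmatrix} \Gamma_{11}d_1 & & \\ & \ddots & \\ & & \Gamma_{pp}d_p \end{pmatrix}
\]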


We are not interested in the off-diagonal entries. Now clearly,
\[
\mathrm{trace}(\Gamma D) = \sum_{i=1}^{p} \Gamma_{ii} d_i = \sum_{i=1}^{p} \gamma_i d_i \quad \text{(say)}
\]
So from equation 4-31, we have
\[
\pi(D \mid X_N, M) < \frac{1}{C}\,e^{\delta_2} \exp\!\left(\sum_{i=1}^{p} \gamma_i d_i\right)\mathbf{1}(D \in [0,1]^p)
= \frac{1}{C}\,e^{\delta_2}\,W\left[\frac{1}{W}\exp\!\left(\sum_{i=1}^{p} \gamma_i d_i\right)\mathbf{1}(D \in [0,1]^p)\right]
\]
where
\[
W = \prod_{i=1}^{p} \int_0^1 e^{\gamma_i d_i}\,d(d_i) = \prod_{i=1}^{p} \left[\frac{e^{\gamma_i d_i}}{\gamma_i}\right]_0^1 = \prod_{i=1}^{p} \frac{e^{\gamma_i} - 1}{\gamma_i}
\]
and the function $g(d_i) = \frac{\gamma_i}{e^{\gamma_i} - 1}\exp(\gamma_i d_i)\,\mathbf{1}(d_i \in [0,1])$, for all $i = 1, 2, \ldots, p$, is a proper density function, as
\[
\int_0^1 g(d_i)\,d(d_i) = 1
\]
So the envelope distribution for the rejection sampling is the product of independent distributions having, for each $i = 1, 2, \ldots, p$, the density
\[
g(d_i)\,d(d_i) = \frac{\gamma_i}{e^{\gamma_i} - 1}\,e^{\gamma_i d_i}\,d(d_i)\,\mathbf{1}(d_i \in [0,1])
\]
Hence the posterior is
\[
\pi(D \mid X_N, M) < \left(\frac{1}{C}\,e^{\delta_2}\prod_{i=1}^{p}\frac{e^{\gamma_i} - 1}{\gamma_i}\right)\left[\prod_{i=1}^{p} g(d_i)\,d(d_i)\right] = Q\left[\prod_{i=1}^{p} g(d_i)\,d(d_i)\right]
\]
where $Q = \frac{1}{C}\,e^{\delta_2}\prod_{i=1}^{p}\frac{e^{\gamma_i} - 1}{\gamma_i}$ and the other part is a valid probability density function. $Q$ is the acceptance-rejection ratio, which depends on the value of $C$. This distribution is a


truncated exponential distribution and, since we have a compact support, the integral is finite as long as each $\gamma_i < \infty$. The truncated exponential CDF would be (when each $d_i$ lies between 0 and 1)
\[
F(d_i) = \frac{\gamma_i}{e^{\gamma_i} - 1}\int_0^{d_i} e^{\gamma_i t}\,dt = \frac{e^{\gamma_i d_i} - 1}{e^{\gamma_i} - 1}
\]
We use inverse CDF transform sampling to generate i.i.d. samples from this distribution. First we sample a number $u$ from uniform$[0,1]$ and then use the following equation to generate a sample from the above truncated exponential distribution in the $[0,1]$ range, which is our envelope distribution.
\[
u = \frac{e^{\gamma_i d_i} - 1}{e^{\gamma_i} - 1} \implies d_i = \frac{1}{\gamma_i}\log\!\big(1 + u(e^{\gamma_i} - 1)\big) \tag{4-32}
\]
Thus we can sample a $D$ by sampling all of its eigenvalues $d_i$ $(i = 1, 2, \ldots, p)$.

4.4.6.2 Metropolis-Hastings (MH) Sampling Scheme for D

When we have a diagonal $D$, we can easily set up a Markov chain that converges to the correct stationary distribution. Note that the MH sampler does not give i.i.d. samples, as successive samples are correlated. To reduce the correlation, thinning is employed and a large number of burn-in samples has to be discarded. In our case, when the dimension of $D$ is small, the MH sampler is quite efficient. We have used a product of independent beta distributions as the proposal distribution. For a very general $K$, we can use a Wishart distribution as the proposal distribution.

4.4.6.3 Hybrid Gibbs Sampling

A 2-step Gibbs sampling method with a rejection sampler can be set up in the following manner:

1. Initialize $D$ to any random diagonal matrix where all entries are sampled from $[0,1]$.
2. Sample $M$ from the full conditional distribution of $M$ given in 4-4.
3. Sample $D$ from the full conditional distribution of $D$ by the above MH or rejection sampling scheme.


4. Repeat (2) and (3) until $M$ and $D$ converge to the appropriate stationary distribution.

4.4.7 Experiments on Simulated Data

We have run simulations with various values for $n$ and $p$. We report here our results for $n = 5$, $p = 3$. We used 1000 Gibbs iterations as our burn-in period and ran the chain for another 1000 iterations after the burn-in. Since the samples from the MCMC are correlated, we picked every 20th sample (thinning). For simplicity of calculations, we did not put any distribution on the hyperparameters. For each simulation we simply used different values of $\beta$ and $\nu$. The distance between the predicted $M_{pred}$ and $M_{orig}$ was calculated using the metric $d = p - \mathrm{trace}(M_{pred}^T M_{orig})$. If $M_{pred}$ were close to $M_{orig}$, the metric would return a small value. Over various experiments with $n = 5$ and $p = 3$, we obtained $d$ values in the range 0.19 to 0.46. For $D$, we give two typical examples. In one experiment we had $D_{orig} = \mathrm{diag}(0.8, 0.8, 0.8)$ and the simulation result predicted $D_{pred} = \mathrm{diag}(0.91, 0.79, 0.83)$. In another example we had $D_{orig} = \mathrm{diag}(0.8, 0.9, 0.7)$ and the predicted $D_{pred} = \mathrm{diag}(0.84, 0.91, 0.81)$. These results indicate that the inference was satisfactory. For other choices of $n$ and $p$ we had similar results. Rejection sampling algorithms are often criticized for their slow convergence due to high rejection rates. This was partially true in our case, with some examples showing high rejection rates and others not. Since the rejection sampling is done within the hybrid Gibbs sampler, the rejection rates were data dependent. A better data-adaptive scheme will likely improve our convergence rates.

4.4.8 Extension of the Model to a More General K

Using $K^T = K$, $M^T M = I_p$ and diagonalizing $K = UDU^T$, we have $F^T F = D^T D = D^2$. In the previous section, we assumed that $K$ was diagonal. We now show how $K$ can be generalized to a symmetric positive definite matrix all of whose entries lie between 0 and 1, i.e., for all $i, j = 1, 2, \ldots, p$ we have $K_{ij} \in [0,1]$. Note the following:
\[
\mathrm{trace}(K^2) = \mathrm{trace}(K^T K) = \sum_{i=1}^{p}\sum_{j=1}^{p} (K_{ij})^2
\]


andthisreducestoPpi=1(Kii)2whenKisdiagonal.Carefulinspectionshowsthatequation 4{12 canalsobewrittenforageneralK,E>1+p 21Xm=1( 2)2m)]TJ 5.48 -.72 Td[(Ppi=1Ppj=1Kij2m (m!)21+p 21Xm=1( 2)2mPpi=1Ppj=1(Kij)2m (m!)2=1+p 2pXi=1pXj=12641Xm=1Kij 22m (m!)2375SousingsameinequalitiesinvolvingmodiedBesselfunctionoforderzerowehave(usingAMGM).Hence0F1n 2;1 4D2>1+p 2pXi=1pXj=1I0Kij 2)]TJ /F6 11.955 Tf 11.95 0 Td[(1=1)]TJ /F3 11.955 Tf 11.95 0 Td[(p2p 2+p 2pXi=1pXj=1I0Kij 2=1)]TJ /F3 11.955 Tf 11.95 0 Td[(p2p 2+p 2pXi=1pXj=1expc1Kij 2)]TJ /F3 11.955 Tf 11.95 0 Td[(c2=1)]TJ /F3 11.955 Tf 11.95 0 Td[(p2p 2+p2p 2exp c1p 2 p2p n1 p epXi=1pXj=1Kij)]TJ /F3 11.955 Tf 11.95 0 Td[(c2!whereDisnowadiagonalmatrixcontainingeigenvaluesofKandweknow0F1n 2;1 4KTK=0F1n 2;1 4D2.AlsonotethattheentriesofKarepositivesoPpi=1Ppj=1Kij>Ppi=1Kii=trace(K).SettingZ=c1p 2 p2p n1 p ePpi=1Ppj=1Kij)]TJ /F3 11.955 Tf 11.96 0 Td[(c2,wehave,0F1n 2;1 4D2>1)]TJ /F3 11.955 Tf 11.96 0 Td[(p2p 2+p2p 2exp(Z)>exp(Z)Wegetaverysimilarexpressionfortheposteriordistribution(KjXN,M)byreplacingDbyK.Notethat0F1()dependsonlyontheeigenvaluesofK.Fortheenvelopdistributionwe 82


now have a total of $\frac{1}{2}p(p+1)$ independent truncated exponential distributions in the range $[0,1]$ (as $K$ is symmetric) and we can perform the rejection sampling as before by carrying out all integrations over the region $[0,1]^{p(p+1)/2}$.

4.4.9 Log-convexity of the Hypergeometric Function

In this setup we will take $D$ to be general, not just a $(p \times p)$ diagonal matrix. In the Matrix-Langevin density function we have ${}_0F_1(\tfrac{n}{2}, \tfrac{1}{4}D^2)$ as the denominator, which is the hypergeometric function with this general matrix argument $D$, and all eigenvalues of $D$ are positive because $D$ is positive definite. So let us first take $p = 2$ and denote the eigenvalues of $D$ by $r$ and $s$ $(r, s > 0)$. Also write $f(r,s) = \log\{{}_0F_1(\tfrac{n}{2}, \tfrac{1}{4}D^2)\}$. Let us write down the Hessian matrix $H$ as below,
\[
H = \begin{pmatrix} \dfrac{\partial^2 f(r,s)}{\partial r^2} & \dfrac{\partial^2 f(r,s)}{\partial r\,\partial s} \\[2ex] \dfrac{\partial^2 f(r,s)}{\partial s\,\partial r} & \dfrac{\partial^2 f(r,s)}{\partial s^2} \end{pmatrix}
\]
Now, in order to show log-convexity of the hypergeometric function, i.e., convexity of $f(r,s)$, we need to show that $H$ is positive definite.

4.4.9.1 A solution

Let $X \in \mathbb{R}^k$ be a $k$-dimensional random vector which belongs to the exponential family, and suppose that the probability density function of $X$ is of the form
\[
f(x) = \exp(v'x - c(v)), \quad x \in S,
\]
where $S$ is the sample space (i.e., the range of possible values) of $X$; the vector $v \in \mathbb{R}^k$ is the corresponding "natural parameter"; and $c(v)$ is the usual normalizing constant. Because $f$ is a density function, $\int_S f(x)\,dx = 1$, so it follows immediately that
\[
\exp(c(v)) = \int_S \exp(v'x)\,dx \quad \text{or} \quad c(v) = \log \int_S \exp(v'x)\,dx.
\]


Next, we want to calculate the moment-generating function of $X$. In the case of the matrix-Langevin, the moment-generating function exists for all $t \in \mathbb{R}^k$; and also, the covariance matrix $\Sigma$ of $X$ is strictly positive definite. So, for a $k$-dimensional vector $t$, we want to calculate $M(t) := E\exp(t'X)$. Then, it follows easily from the above formulas that, at least for all $t$ in a suitably small neighborhood of the origin,
\[
M(t) = E\exp(t'X) = \int_S \exp(t'x) f(x)\,dx = \exp(-c(v)) \int_S \exp((t+v)'x)\,dx = \exp(-c(v))\exp(c(t+v)).
\]
Therefore, we obtain
\[
\log M(t) = c(t+v) - c(v).
\]
We know that $\log M(t)$ is called the cumulant-generating function of $X$. To calculate the covariance matrix of $X$, we simply have to differentiate $\log M(t)$ twice with respect to $t$ and evaluate the derivatives at $t = 0$. To be precise, if $t = (t_1, \ldots, t_k)'$ then
\[
\Sigma = \frac{\partial^2}{\partial t\,\partial t'} \log M(t)\Big|_{t=0} \equiv \left(\frac{\partial^2}{\partial t_i\,\partial t_j} \log M(t)\Big|_{t=0}\right),
\]
where $i, j = 1, \ldots, k$. Now, for $i, j = 1, \ldots, k$, let us define the functions
\[
c_{ij}(t) = \frac{\partial^2}{\partial t_i\,\partial t_j} c(t).
\]
Then, by applying the earlier formula for $\log M(t)$ in terms of $c(v)$, we obtain
\[
\frac{\partial^2}{\partial t_i\,\partial t_j} \log M(t) = \frac{\partial^2}{\partial t_i\,\partial t_j}\big[c(t+v) - c(v)\big] = \frac{\partial^2}{\partial t_i\,\partial t_j} c(t+v);
\]


therefore,
\[
\frac{\partial^2}{\partial t_i\,\partial t_j} \log M(t)\Big|_{t=0} = c_{ij}(v);
\]
So, we have arrived at a rather neat formula: $\Sigma = (c_{ij}(v))$. And finally, because $\Sigma$ is positive definite, the matrix $(c_{ij}(v))$ is also positive definite; i.e., the matrix
\[
\left(\frac{\partial^2}{\partial v_i\,\partial v_j} c(v)\right) \equiv \left(\frac{\partial^2}{\partial v_i\,\partial v_j} \log \int_S \exp(v'x)\,dx\right)
\]
is positive definite. In the case that we want, we simply take $X$ to be the collection of entries in the matrix-Langevin random matrix (viewed as a long column vector). And since we have the explicit formula for $c(v)$ in terms of the hypergeometric function of matrix argument, we have the positive definite condition that we were looking for. This $p = 2$ case can be generalized using the same technique.

4.4.9.2 Possible ARS Sampling

The above proof of log-convexity makes the posterior distribution a log-concave one and, according to our earlier discussion, it should enable us to implement an ARS sampling scheme. However, the ARS sampling scheme was traditionally developed for the univariate case. In order to apply it in a multivariate setting we need to modify the algorithm so that it is computationally feasible. One possible algorithm is to first compute the envelope hyperplanes on a grid and not to look for the actual intersection points, because that would impose a huge computational burden. Instead we take the minimum hyperplane on each grid cell and carry out the integration. With the help of an octree (or higher-dimensional analogue) type of data structure it can be implemented easily.

4.5 Finite Mixture Modeling

We can now extend our framework seamlessly to a finite mixture model. Noting that if there exists a sampling scheme for the posterior of the ML distribution, the extension will be


standard, we simply state the generative model for lack of space. Here, we assume knowledge of the number of mixture components, $L$. The mixture weights and parameters have prior distributions, and the weights are typically viewed as an $L$-dimensional random vector drawn from a Dirichlet distribution. The full generative model is

$N$ = number of observations
$L$ = number of mixture components
$X_i$ = $i$-th observation
$M_i$ = polar part of the parameter of the distribution of observations associated with the $i$-th component
$K_i$ = elliptical part of the parameter of the distribution of observations associated with the $i$-th component
$\rho = \{\rho_i\}_{i=1}^{L}$ = prior probability of the $i$-th component, such that $\sum_{i=1}^{L} \rho_i = 1$
$z_i$ = component of observation $i$

$\rho \sim \mathrm{Dirichlet}(\alpha_1, \alpha_2, \ldots, \alpha_L)$
$z_{i=1,2,\ldots,N} \sim \mathrm{Categorical}(\rho)$
$M_i \mid (M_0, K_0) \sim ML(M_i; M_0, K_0)$
$K_i \mid (\beta, \nu) \sim \pi(K_i; \beta, \nu)$
$X_i \sim ML(X_i; M_{z_i}, K_{z_i})$

4.6 Infinite Mixture Modeling

A variety of non-parametric Bayesian methods have become standard tools for modeling infinite mixtures. One of the more important examples in reference to NP-Bayes is the Dirichlet


Process (DP) Mixture or DPM. The DPM was introduced by Antoniak [29] and has seen great popularity in recent times. Once again, due to space constraints, we only briefly describe DPM modeling as a generalization of the finite mixture model. We give details of one particular method used in the inference of the DPM model, approximate variational inference, in the supplementary material. Since the inference based on MCMC would follow along the lines of the (appropriately adapted) auxiliary variable algorithm from Neal (2000) [38], we only briefly describe it here. For the approximate variational inference we have used the Conjugate Gradient method on the Stiefel manifold, and details are presented in the supplementary material.

4.6.1 DPM Modeling on the Stiefel Manifold

From the basic DPM model, we have the following equations:
\[
\begin{aligned}
G \mid \alpha, G_0 &\sim DP(\alpha, G_0) \\
\eta_i \mid G &\sim G \quad \forall\, i = 1, 2, \ldots, N \\
X_i \mid \eta_i &\sim F(\eta_i) \quad \forall\, i = 1, 2, \ldots, N
\end{aligned}
\]
In our case, $F$ is the ML distribution on the Stiefel manifold. The full generative model can be written as follows:

- $v_i \mid \alpha \sim \mathrm{Beta}(1, \alpha)$, $i = \{1, 2, \ldots\}$
- Draw $\eta_i \mid G_0 \sim G_0$, $i = \{1, 2, \ldots\}$
- For the $n$-th data point, $n = \{1, 2, \ldots, N\}$:
  - Draw $z_n \mid v_1, v_2, \ldots \sim \mathrm{multinomial}(\pi(V))$
  - Draw $X_n \mid z_n \sim ML(X_n \mid \eta_{z_n})$

$G_0$ is the base distribution for the Dirichlet Process and is defined on the product space of the parameters of the ML distribution. $X_n$ is also drawn from the ML with different parameters. $\pi(V)$ is the vector of stick-breaking weights. In the original construction this vector had infinite length. Note that, for the variational inference implementation, we have taken this to


be of a fixed length, $T$. This is called the truncated stick-breaking process in the literature. Correspondingly, the set $\{\eta_1, \eta_2, \ldots, \eta_T\}$ are the atoms representing the components of the mixture distribution. So $G$ can be written as
\[
G = \sum_{i=1}^{\infty} \pi_i(V)\,\delta_{\eta_i} \quad \text{where } \pi_i(V) = v_i \prod_{k=1}^{i-1}(1 - v_k)
\]
The corresponding graphical model for DPM is given by Figure 4-4.

Figure 4-4. Graphical model for variational inference of DPM

4.6.2 MCMC Inference Scheme

We can apply the earlier sampling techniques for sampling $M$ and $D$ and combine them with the usual sampling techniques for the DPMM model.

4.6.3 Variational Bayes Inference (VB) on Stiefel Manifold

The basic idea of VB on the Stiefel manifold is similar to the method proposed in [42]. In the DPM context the latent variables are $W = \{V, \eta, z\}$. The hyperparameters are the scaling parameter and the parameters of the conjugate base distribution, which is Matrix-Langevin in our case, $\theta = \{\alpha, \lambda\}$. We will denote the Matrix Langevin distribution on the Stiefel manifold $V_{n,p}$ for $X$ with parameter $F$ by $L(X, F)$,
\[
L(X, F) = \frac{1}{{}_0F_1\!\left(\tfrac{n}{2}; \tfrac{1}{4}F^T F\right)} \exp(\mathrm{trace}(F^T X)) \tag{4-33}
\]
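The truncated stick-breaking weights $\pi_i(V) = v_i \prod_{k<i}(1 - v_k)$ can be illustrated with a short simulation. This is a sketch under stated assumptions rather than the implementation used in this work: the function name `stick_breaking_weights` is hypothetical, and the last stick proportion is set to $v_T = 1$ so that the $T$ weights sum to one, which is the usual convention for truncated stick-breaking but is not spelled out in the text above.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Draw truncated stick-breaking weights pi_1, ..., pi_T.

    v_i ~ Beta(1, alpha) for i < T, and v_T = 1 (assumed convention),
    with pi_i = v_i * prod_{k < i} (1 - v_k).
    """
    proportions = [rng.betavariate(1.0, alpha) for _ in range(truncation - 1)]
    proportions.append(1.0)  # closes the stick at the truncation level
    weights = []
    remaining = 1.0  # length of the stick not yet assigned
    for v in proportions:
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights


rng = random.Random(0)
weights = stick_breaking_weights(alpha=2.0, truncation=10, rng=rng)
print(weights, sum(weights))
```

A smaller scaling parameter $\alpha$ concentrates the mass on the first few atoms, while a larger one spreads it over more components.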


Here $F$ is an $n \times p$ parameter matrix which can be seen as a product of two matrices, $F = MK$, where $M$, the polar part, is the mode of the distribution and $K$ is the scaling part, which is a $p \times p$ matrix. $M$ is again a matrix of dimension $n \times p$ and also lies on $V_{n,p}$. Note that we can write down the unique singular value decomposition of $F$ as $F = \Gamma \Lambda \Omega^T$, where $\Gamma \in V_{n,p}$, $\Omega \in O(p)$, $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_k)$, $\lambda_1 \ge \cdots \ge \lambda_k \ge 0$. The expectation of $X$ on Stiefel is given by $E_{ST}(X) = FR$, where $R$ is the $p \times p$ matrix whose entries $(R_{ij})$ are given by
\[
R_{ij} = 2\,\frac{\partial \log {}_0F_1\!\left(\tfrac{n}{2}; \tfrac{1}{4}H\right)}{\partial H_{ij}} \quad \text{where } H = F^T F \tag{4-34}
\]
From now on, we will denote the normalizing constant of the distribution by $c(F)$. Note that ${}_0F_1$ is the hypergeometric function of one matrix argument. Using equation 2-2, we can obtain a lower bound on the log marginal probability of the $N$ data points,
\[
\log p(x \mid \alpha, \lambda) \ge E_Q[\log p(V \mid \alpha)] + E_Q[\log p(\eta \mid \lambda)] + \sum_{n=1}^{N}\big(E_Q[\log p(z_n \mid V)] + E_Q[\log p(x_n \mid z_n)]\big) - E_Q[\log Q(V, \eta, z)] \tag{4-35}
\]
The fully factorized distribution for $Q$ can be written as:
\[
Q(V, \eta, z) = \prod_{t=1}^{T-1} Q_{\gamma_t}(v_t)\,\prod_{t=1}^{T} Q_{\tau_t}(\eta_t)\,\prod_{n=1}^{N} Q_{\phi_n}(z_n) \tag{4-36}
\]
where $T$ is the truncation level for the variational distribution, which can be appropriately set. In our analysis, $Q_{\gamma_t}(v_t)$ are beta distributions, $Q_{\tau_t}(\eta_t)$ are Matrix-Langevin distributions and $Q_{\phi_n}(z_n)$ are multinomial distributions. The parameter set is the set $\{\gamma_1, \gamma_2, \ldots, \gamma_{T-1}, \tau_1, \tau_2, \ldots, \tau_T, \phi_1, \phi_2, \ldots, \phi_N\}$.


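Note that each $\gamma_t$ has two components, $\gamma_{t,1}$ and $\gamma_{t,2}$, and each $\phi_n$ has $T$ components $\phi_{n,t}$ where $t \in \{1, 2, \ldots, T\}$.

Coordinate Ascent Algorithm for Optimization. The basic idea is to come up with an explicit coordinate ascent algorithm which in turn maximizes the R.H.S. of equation 4-35 w.r.t. all of the variational parameters. So the function will be maximized w.r.t. the parameters one by one, and we will thus attain the minimum KL divergence between the intractable posterior and the fully factorized variational distribution.

4.6.3.1 Matrix-Langevin distributions

In this analysis we are using several Matrix-Langevin distributions: one is in the data generation part and the others are used as the prior and the variational distribution for the parameter of the data generation. They are given as:
\[
p(x_n \mid \eta) = c(G) \exp(\mathrm{trace}(G\eta^T x_n)) \tag{4-37}
\]
\[
p(\eta \mid \lambda) = c(D) \exp(\mathrm{trace}(D\lambda^T \eta)) \tag{4-38}
\]
\[
Q(\eta \mid \tau) = c(E) \exp(\mathrm{trace}(E\tau^T \eta)) \tag{4-39}
\]
where $c(G)$, $c(D)$ and $c(E)$ are the appropriate normalizing constants as mentioned in Equation 4-33. On a separate note, this framework can be easily extended by placing a gamma prior on $\alpha$. In this case, $\alpha \sim \mathrm{Gamma}(a_1, a_2)$ and the corresponding variational distribution is again a Gamma distribution with its own two variational parameters.

4.6.3.2 Update equation for $\gamma_t$

Recall that $T$ is the truncation level of the variational distribution for the DP. From equation 4-35, in order to get the update equation for $\gamma_t$, we need to collect all the terms which involve $\gamma_t$ only. Let $F(\gamma_{t,1}, \gamma_{t,2})$ denote those terms. Rearranging the terms, we can write
\[
F(\gamma_{t,1}, \gamma_{t,2}) = \underbrace{E_Q[\log p(V \mid \alpha)] - E_Q[\log Q(V, \eta, z)]}_{(i)} + \underbrace{\sum_{n=1}^{N} E_Q[\log p(z_n \mid V)]}_{(ii)} \tag{4-40}
\]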


Clearly, (i) is nothing but $-KL(Q(V)\,\|\,p(V))$, and both distributions are Beta distributions, with parameters $(\gamma_{t,1}, \gamma_{t,2})$ and $(1, \alpha)$, respectively. Using the standard formula for (i) and using $\phi_{n,t}$ for $Q(z_n = t)$ in (ii), we have:
\[
\begin{aligned}
F(\gamma_{t,1}, \gamma_{t,2}) ={}& \sum_{t=1}^{T-1}\Big[\log\frac{B(\gamma_{t,1}, \gamma_{t,2})}{B(1, \alpha)} + (1 - \gamma_{t,1})\psi(\gamma_{t,1}) + (\alpha - \gamma_{t,2})\psi(\gamma_{t,2}) - (1 + \alpha - \gamma_{t,1} - \gamma_{t,2})\psi(\gamma_{t,1} + \gamma_{t,2})\Big] \\
&+ \sum_{n=1}^{N}\sum_{t=1}^{T}\Big[\Big(\sum_{j=t+1}^{T}\phi_{n,j}\Big)E_Q[\log(1 - v_t)] + \phi_{n,t}\,E_Q[\log v_t]\Big]
\end{aligned}
\]
where $B(x, y)$ is the beta function, defined as $\frac{\Gamma(x)\Gamma(y)}{\Gamma(x+y)}$, where $\Gamma(\cdot)$ is the gamma function, and $\psi(x)$ is the digamma function, which is defined as $\frac{d}{dx}\log\Gamma(x)$. Note that $E_Q[\log(1 - v_t)] = \psi(\gamma_{t,2}) - \psi(\gamma_{t,1} + \gamma_{t,2})$ and $E_Q[\log v_t] = \psi(\gamma_{t,1}) - \psi(\gamma_{t,1} + \gamma_{t,2})$. Now differentiating $F$, first w.r.t. $\gamma_{t,1}$ and then w.r.t. $\gamma_{t,2}$, and equating to zero, we have the update equations for $\gamma_t$.
\[
\gamma_{t,1} = 1 + \sum_{n=1}^{N}\phi_{n,t}, \qquad \gamma_{t,2} = \alpha + \sum_{n=1}^{N}\sum_{j=t+1}^{T}\phi_{n,j} \tag{4-41}
\]

4.6.3.3 Update equation for $\tau_t$

First remember that $\tau_t$ lies on the Stiefel manifold. So we have to do the optimization on the Stiefel manifold itself. We will be using the Conjugate-Gradient (CG) method in order to find the maxima on the manifold. The terms involving $\tau_t$ are given below:
\[
F(\tau_t) = \underbrace{E_Q[\log p(\eta_t \mid \lambda)] - E_Q[\log Q(\eta_t \mid \tau_t)]}_{(i)} + \underbrace{\sum_{n=1}^{N} E_Q[\log p(x_n \mid z_n)]}_{(ii)} \tag{4-42}
\]
As before, (i) is nothing but $-KL(Q(\eta_t)\,\|\,p(\eta_t))$, which is
\[
= \int_{ST\text{-manifold}} \log\frac{L(\eta_t; \lambda D)}{L(\eta_t; \tau_t E)}\,L(\eta_t; \tau_t E)\,d\eta_t
\]


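\[
= C + \int_{ST\text{-manifold}} \mathrm{trace}(D\lambda^T\eta_t - E\tau_t^T\eta_t)\,L(\eta_t; \tau_t E)\,d\eta_t = C + \mathrm{trace}\big((D\lambda^T - E\tau_t^T)\,\bar{\eta}_t\big) \tag{4-43}
\]
where $C = \log\!\left(\frac{c(D)}{c(E)}\right)$ collects the two normalizing constants of equations 4-38 and 4-39. As trace is a linear operator, we can write equation 4-43 in terms of $\bar{\eta}_t$, with
\[
\bar{\eta}_t = \int_{ST\text{-manifold}} \eta_t\,L(\eta_t; \tau_t E)\,d\eta_t = E_{ST}(\eta_t) = \tau_t E R \tag{4-44}
\]
where $R$ is given by equation 4-34 with $H = E^T\tau_t^T\tau_t E = E^T E$, as $\tau_t^T\tau_t = I_k$. Equation 4-43 becomes $C + \mathrm{trace}(D\lambda^T\tau_t E R) - \mathrm{trace}(E^2 R)$ and, as neither $C$ nor $\mathrm{trace}(E^2 R)$ depends on $\tau_t$, we can safely ignore those parts while computing the gradient w.r.t. $\tau_t$. After rearranging, the first part becomes $\mathrm{trace}(A^T\tau_t)$ where $A = \lambda E R D^T$. Similarly from (ii), we have for a particular $t$,
\[
\sum_{n=1}^{N}\phi_{n,t}\big\{\log c(G) + \mathrm{trace}(G\bar{\eta}_t^T x_n)\big\}
\]
After removing the term that does not depend on $\tau_t$, we have
\[
\sum_{n=1}^{N}\phi_{n,t}\,\mathrm{trace}(S_n^T\tau_t) \quad \text{where } S_n = x_n E R G^T \tag{4-45}
\]
So from (i) and (ii), the linear function that we have to maximize on the Stiefel manifold, up to terms that do not depend on $\tau_t$, will look like
\[
F(\tau_t) = \mathrm{trace}(A^T\tau_t) + \sum_{n=1}^{N}\phi_{n,t}\,\mathrm{trace}(S_n^T\tau_t) \tag{4-46}
\]
This linear function of $\tau_t$ can be maximized on the Stiefel manifold by the CG method as described in Edelman [7].

4.6.3.4 CG for minimizing $F(\tau)$ on the Stiefel manifold

Given $\tau_0$ s.t. $\tau_0^T\tau_0 = I_k$, compute $G_0 = F_{\tau_0} - \tau_0 F_{\tau_0}^T\tau_0$ and set $H_0 = -G_0$.

For $k = 0, 1, 2, \ldots$
- Minimize $F(\tau_k(t))$ over $t$, where $\tau_k(t) = \tau_k M(t) + Q N(t)$.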


  Here $QR$ is the compact QR decomposition of $(I - \tau_k\tau_k^T)H_k$, $A = \tau_k^T H_k$, and $M(t)$ and $N(t)$ are $p$-by-$p$ matrices given by the $2p$-by-$2p$ matrix exponential
  \[
  \begin{pmatrix} M(t) \\ N(t) \end{pmatrix} = \exp\!\left(t\begin{pmatrix} A & -R^T \\ R & 0 \end{pmatrix}\right)\begin{pmatrix} I_p \\ 0 \end{pmatrix}
  \]
- Set $t_k = t_{min}$ and $\tau_{k+1} = \tau_k(t_k)$.
- Compute $G_{k+1} = F_{\tau_{k+1}} - \tau_{k+1}F_{\tau_{k+1}}^T\tau_{k+1}$.
- Parallel transport the tangent vector $H_k$ to the point $\tau_{k+1}$: $\mathcal{T}H_k = H_k M(t_k) - \tau_k R^T N(t_k)$, and set $\mathcal{T}G_k := G_k$ or $0$, which is not parallel.
- Compute the new search direction $H_{k+1} = -G_{k+1} + \gamma_k\,\mathcal{T}H_k$, where $\gamma_k = \frac{\langle G_{k+1} - \mathcal{T}G_k,\, G_{k+1}\rangle}{\langle G_k, G_k\rangle}$ and $\langle \Delta_1, \Delta_2\rangle = \mathrm{trace}\big(\Delta_1^T(I - \tfrac{1}{2}\tau\tau^T)\Delta_2\big)$.
- Reset $H_{k+1} = -G_{k+1}$ if $k + 1 \equiv 0 \bmod p(n - p) + p(p - 1)/2$.

Although we have used the CG method here, one can also use the Newton method to find the maxima on the Stiefel manifold.

4.6.3.5 Update equation for $\phi_{n,t}$

The terms from equation 4-35 which are relevant for updating $\phi_{n,t}$ are
\[
U = -E_Q[\log Q(V, \eta, z)] + \sum_{n=1}^{N}\big(E_Q[\log p(z_n \mid V)] + E_Q[\log p(x_n \mid z_n)]\big) \tag{4-47}
\]
We need to maximize $U$ w.r.t. $\phi_{n,t}$, but note that we have the constraint $\sum_{t=1}^{T}\phi_{n,t} = 1$. By using the usual Lagrange multiplier technique we have $U_{new} = U + \mu\big(\sum_{t=1}^{T}\phi_{n,t} - 1\big)$, where $\mu$ is the Lagrange multiplier. Taking the appropriate terms for $\phi_{n,t}$, rearranging, taking the derivative of $U_{new}$ w.r.t. $\phi_{n,t}$ and equating it to zero, we have
\[
\frac{\partial U_{new}}{\partial \phi_{n,t}} = -(1 + \log\phi_{n,t}) + \sum_{i=1}^{t-1}E_Q[\log(1 - v_i)] + E_Q[\log v_t] + \mathrm{trace}(S^T\tau_t) + \mu
\]


\[
\frac{\partial U_{new}}{\partial \phi_{n,t}} = -(1 + \log\phi_{n,t}) + W_t + \mu \;(= 0) \tag{4-48}
\]
where $W_t = \sum_{i=1}^{t-1}E_Q[\log(1 - v_i)] + E_Q[\log v_t] + \mathrm{trace}(S^T\tau_t)$. From equation 4-48, we get $\phi_{n,t} \propto \exp(W_t)$. Now, by using the constraint $\sum_{t=1}^{T}\phi_{n,t} = 1$, we can easily get the value of the proportionality constant. Thus we can find the update equations for all the variational parameters, and by updating them one by one we can maximize the lower bound of the marginal log-likelihood. This method is considerably faster than the MCMC method, but one of its potential drawbacks is that it might get stuck in a local maximum, as the optimization is typically non-convex.

4.6.3.6 Calculated KL-Divergence

Let us write down the R.H.S. of 4-35, which is the actual computed lower bound.
\[
\begin{aligned}
G ={}& \underbrace{E_Q[\log p(V \mid \alpha)] - E_Q[\log Q(V \mid \gamma)]}_{(i)} + \underbrace{E_Q[\log p(\eta \mid \lambda)] - E_Q[\log Q(\eta \mid \tau)]}_{(ii)} \\
&+ \underbrace{\sum_{n=1}^{N}\big(E_Q[\log p(z_n \mid V)] - E_Q[\log Q(z_n \mid \phi)]\big)}_{(iii)} + \underbrace{\sum_{n=1}^{N}E_Q[\log p(x_n \mid z_n)]}_{(iv)}
\end{aligned} \tag{4-49}
\]
Now clearly, terms (i), (ii) and (iii) are $-KL(Q\,\|\,p)$ between the distributions of the corresponding variables, and we can easily calculate them. So, using the expressions that we have computed earlier for (i), (ii), (iii) and (iv), $G$ becomes:
\[
\begin{aligned}
G ={}& \sum_{t=1}^{T-1}\Big[\log\frac{B(\gamma_{t,1}, \gamma_{t,2})}{B(1, \alpha)} + (1 - \gamma_{t,1})\psi(\gamma_{t,1}) + (\alpha - \gamma_{t,2})\psi(\gamma_{t,2}) - (1 + \alpha - \gamma_{t,1} - \gamma_{t,2})\psi(\gamma_{t,1} + \gamma_{t,2})\Big] \\
&+ \sum_{t=1}^{T}\big[C + \mathrm{trace}(D\lambda^T\tau_t E R) - \mathrm{trace}(E^2 R)\big] - \sum_{n=1}^{N}\sum_{t=1}^{T}\phi_{n,t}\log\phi_{n,t} \\
&+ \sum_{n=1}^{N}\sum_{t=1}^{T}\Big[\Big(\sum_{j=t+1}^{T}\phi_{n,j}\Big)E_Q[\log(1 - v_t)] + \phi_{n,t}\,E_Q[\log v_t]\Big] + \sum_{n=1}^{N}\sum_{t=1}^{T}\phi_{n,t}\big(\log c(G) + \mathrm{trace}(S^T\tau_t)\big)
\end{aligned} \tag{4-50}
\]
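The two closed-form coordinate updates, equation 4-41 for the beta parameters and the normalization $\phi_{n,t} \propto \exp(W_t)$, can be sketched as follows. This is an illustrative sketch only: the function names `update_gamma` and `normalize_phi` are hypothetical, the scores $W_t$ are assumed to be precomputed, and subtracting the maximum before exponentiating is a numerical-stability choice that is not part of the derivation above.

```python
import math


def update_gamma(phi, alpha):
    """Equation 4-41: beta parameters for t = 1, ..., T-1.

    phi[n][t] holds Q(z_n = t); returns a list of (gamma_t1, gamma_t2).
    """
    num_components = len(phi[0])
    gammas = []
    for t in range(num_components - 1):
        gamma_1 = 1.0 + sum(row[t] for row in phi)
        # Mass assigned to components strictly after t, summed over data points.
        gamma_2 = alpha + sum(sum(row[t + 1:]) for row in phi)
        gammas.append((gamma_1, gamma_2))
    return gammas


def normalize_phi(scores):
    """phi_t proportional to exp(W_t), normalized to sum to one."""
    shift = max(scores)  # avoids overflow for large scores (assumed safeguard)
    unnormalized = [math.exp(w - shift) for w in scores]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Two data points, three components, hard assignments for readability.
phi = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
gammas = update_gamma(phi, alpha=2.0)
print(gammas)
print(normalize_phi([0.0, 0.0]))
```

In a full coordinate ascent loop these two updates would alternate with the conjugate-gradient step for $\tau_t$ until the bound $G$ stops increasing.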


Figure 4-5. Log marginal probability of the data increases with the number of iterations

In Figure 4-5 we can see how $G$ varies with the number of iterations.

4.7 Experiments

In this study, we first validate our theory with a set of simulated experiments on synthetic data. We report our results here on both synthetic data sets and a real image data set. As this result is based on the nonparametric framework, the number of clusters was unknown a priori. In most of the cases presented here, we successfully recovered the unknown number of clusters, which is in other words a solution to the model selection problem in traditional machine learning. Moreover, especially for a large collection of images, it makes sense not to assume the number of clusters before starting the experiment. We will provide results to show that our method circumvents this issue by using a nonparametric Bayesian framework.

4.7.1 Experiments on Synthetic Data

For the experiments on simulated data, we constructed a total of eight experiments, each containing a different data set from the Stiefel manifold $V(n,p)$ for various values of $n$ and $p$ where $p \le n$. One of the main problems in this synthetic simulation is to generate data on


the Stiefel manifold. We employ an MCMC sampling algorithm to generate samples on that manifold. For further reference on this we refer the reader to the paper [46]. In that paper, the author developed an algorithm to sample from the matrix Bingham-von Mises-Fisher distribution. This is one of the most general forms of distribution on the Stiefel manifold. We generated samples from the appropriate Matrix Langevin distribution. In all the simulated experiments we have taken 4 classes, and for each class we had 200 data points. In the following table we give the overall performance of our algorithm on all of the simulated data sets. Table 4-1 reports the performance of clustering based on the Dirichlet process mixture. We would again point out that the number of clusters was totally unknown and, based on the data, the optimal number of clusters was selected dynamically. We could not recover the actual number of clusters in every case; convergence to a local minimum could be one possible reason for this. For example, among the reported 6 clusters, 2 clusters contained only 7 members each (a weight of 0.875%). In the MCMC implementation, a longer burn-in time might be useful to overcome this problem. The important observation is that as r increases from one to two, three or four, the overall run-time of the algorithm increases. We also ran these experiments with 3 clusters, and the overall performance is very similar to that reported above.

Table 4-1. Results for synthetic data sets.

  V(n,p)        Accuracy          # of clusters (est.)
  n     p       Nonparam-DPM
  5     1       71.63%            4
  3     2       87.88%            4
  5     3       91.63%            4
  10    3       98.13%            4
  11    4       80.88%            6
  12    5       99.98%            4
  21    1       98.50%            4
  50    1       95.13%            6

4.7.2 Categorization of Objects

The object categorization problem refers to the problem of grouping similar objects together in the same class. For evaluation of our technique on this problem, we ran the algorithm on a


subset of the ETH-80 data set [62], [63].

Figure 4-6. Confusion matrix for all of the simulated data sets

We have used 6 different categories, and each category contains 80 images. These categories were tomato, pear, car, horse, cup and cow. We also evaluate our algorithm on 3 classes, 4 classes, 5 classes and finally 6 classes. As one would expect, the accuracy decreases slightly as the number of categories increases. Feature extraction was one of the important parts of this experiment. We have used two different features in this context. The first one was a unit-norm feature vector constructed similarly to the method described in [64]. First, the x-gradient and y-gradient of the images were computed at three different scales, which are essentially three different standard deviation values for the Gaussian filter. Here they were 1, 2 and 4. This resulted in six images for each image in our data set. For each of these a 32-bin histogram was computed, and the histograms were individually normalized and concatenated. It gave us a total of 192 length


feature vector, which was then projected onto a low-dimensional subspace such that at least 99% of the variability could be captured. For our case it turned out to be a 21-dimensional space onto which all the feature vectors were projected [41]. Finally these vectors were normalized to generate the feature vectors, which were then clustered using the DP mixture. Another feature that we used in these experiments was the Histogram of Oriented Gradients (HOG) [65]. Both of these features were similar in performance. The following figure shows some objects from our 6-category data set. Below, we give the performance accuracy for different numbers of clusters.

Figure 4-7. Selected 6 object categories from the ETH-80 data set

Table 4-2. Actual and estimated number of clusters and accuracy with real data having different numbers of clusters

  Actual    Accuracy    Estimated
  3         93.33%      4
  4         97.41%      4
  5         90.50%      5
  6         79.58%      6

4.7.3 Classification of Outdoor Scenes

We have used a subset of 3 categories from the 8-Scene classification data set, which was used earlier by [66] in their research. In this case we have used HOG to extract features from


the data, as it outperformed the other features. The 3 categories were mountain, coast and tall-building. The following figure shows a few images from this subset.

Figure 4-8. Confusion matrix for ETH-80 data set

Figure 4-9. Selected 3 scene categories from the 8-Scene data set

Figure 4-10. Confusion matrix for Outdoor Scene data set


Using the DPM clustering method we clustered the images with 90.42% accuracy and, importantly, the algorithm found the 3 clusters correctly. All of the above experiments with synthetic and real data sets show the promise of our approach.


CHAPTER 5
BETA-DIRICHLET PROCESS AND CATEGORICAL INDIAN BUFFET PROCESS

5.1 Multivariate Liouville Distributions

A Dirichlet distribution with parameter $\gamma = \{\gamma_1, \ldots, \gamma_Q\}$ can be represented either on the simplex $S^Q_c = \{(z_1, \ldots, z_Q) \mid \sum_{i=1}^{Q} z_i = 1\}$ in $\mathbb{R}^Q_+$ or as a distribution inside the simplex $S^{Q-1}_o = \{(z_1, \ldots, z_{Q-1}) \mid \sum_{i=1}^{Q-1} z_i \le 1\}$ in $\mathbb{R}^{Q-1}_+$. We will refer to $S^Q_c$ and $S^{Q-1}_o$ as the Closed Simplex and the Open Simplex, respectively. So the following two statements are equivalent.
\[
y \sim Dir_Q(\gamma) \text{ on } S^Q_c
\]
\[
y \sim Dir_{Q-1}(\gamma_1, \ldots, \gamma_{Q-1}; \gamma_Q) \text{ on } S^{Q-1}_o
\]

Definition 8. A $Q \times 1$ vector $x$ in $\mathbb{R}^Q_+$ is said to have a Liouville distribution if $x \overset{d}{=} ry$, where $y \sim Dir_Q(\gamma)$ on $S^Q_c$ and $r$ is an independent random variable with cdf $F$ or pdf $f$ (if the density exists). It is written as $x \sim L_Q(\gamma; F)$. We will use the following terminology:

- $y$ is the Dirichlet base with Dirichlet parameter $\gamma$.
- $r$ is the generating variate.
- $F$ and $f$ are the generating cdf and generating density, respectively.

Note that $x \sim L_Q(\gamma; F)$ iff $\left(\frac{x_1}{\sum_{i=1}^{Q} x_i}, \ldots, \frac{x_Q}{\sum_{i=1}^{Q} x_i}\right) \sim Dir_Q(\gamma)$ on $S^Q_c$ and is independent of $\sum_{i=1}^{Q} x_i$, which is equivalent to $r$. Now we will state a fact and its proof from [67].

Fact 1. A Liouville distribution $L_Q(\gamma; F)$ has a generating density $f$ iff the distribution has a density of the following form:
\[
\prod_{i=1}^{Q}\frac{x_i^{\gamma_i - 1}}{\Gamma(\gamma_i)}\;\frac{\Gamma\big(\sum_{i=1}^{Q}\gamma_i\big)}{(x_.)^{\left(\sum_{i=1}^{Q}\gamma_i - 1\right)}}\;f(x_.) \quad \text{where } x_. = \sum_{i=1}^{Q} x_i
\]
This density is defined in the simplex $\{(x_1, \ldots, x_Q) \mid \sum_{i=1}^{Q} x_i \le a\}$ iff $f$ is defined in the interval $(0, a)$.


Proof. If $r$ has the density $f$ and $y \sim Dir_Q(\gamma)$, then the joint density of $(y_1, \ldots, y_{Q-1})$ and $r$ is given by
\[
\frac{\Gamma\big(\sum_{i=1}^{Q}\gamma_i\big)}{\prod_{i=1}^{Q}\Gamma(\gamma_i)}\;\prod_{i=1}^{Q-1} y_i^{\gamma_i - 1}\left(1 - \sum_{i=1}^{Q-1} y_i\right)^{\gamma_Q - 1} f(r)
\]
The proof of this fact is based on the random variable transformation from $(x_1, \ldots, x_Q)$ to $(y_1, \ldots, y_{Q-1}, r)$ where
\[
r = \sum_{i=1}^{Q} x_i = x_. \quad \text{and} \quad y_j = \frac{x_j}{x_.} \;\; \forall\, j = 1, \ldots, Q - 1
\]
As this transformation has Jacobian $\frac{1}{r^{Q-1}}$, we can easily obtain the fact. Similarly, the converse of the fact can be shown by the inverse transformation. The domain of the density is clear as well.

If the generating variate $r$ is distributed as $\mathrm{Beta}(\alpha, \beta)$, then the particular form of the Liouville distribution is called the Beta-Liouville or Beta-Dirichlet distribution. Let us write down a few important moments of the Liouville distribution. Note that each $y_j$ is marginally beta distributed and $y$ is independent of $r$. Let us take $\gamma_. = \sum_{i=1}^{Q}\gamma_i$.

- Mean is given by
\[
E(x_j) = E(r y_j) = E(r)E(y_j) = E(r)\frac{\gamma_j}{\gamma_.}
\]
- Variance is given by
\[
\begin{aligned}
Var(x_j) = Var(r y_j) &= E(r^2)E(y_j^2) - [E(r)E(y_j)]^2 \\
&= \frac{\gamma_j}{(\gamma_.)^2(\gamma_. + 1)}\Big[(\gamma_.)(\gamma_j + 1)E[r^2] - \gamma_j(\gamma_. + 1)(E[r])^2\Big] \\
&= \frac{\gamma_j}{(\gamma_.)^2(\gamma_. + 1)}\Big[(\gamma_.)(\gamma_j + 1)Var[r] + (\gamma_. - \gamma_j)(E[r])^2\Big]
\end{aligned}
\]
- Co-variance is given by (j

Foreachj

\[
= \frac{\prod_{i=1}^{Q}\Gamma(\gamma_i)}{\Gamma(\gamma_1 + \cdots + \gamma_Q)}\int_0^1 t^{\alpha - 1}(1 - t)^{\beta - 1}\,dt = \frac{\prod_{i=1}^{Q}\Gamma(\gamma_i)}{\Gamma(\gamma_1 + \cdots + \gamma_Q)}\;\frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)} \tag{5-2}
\]
and thus we derive the normalization constant for BD. There is a simpler way to derive it by using the above fact. Let us write $x_. = \sum_{i=1}^{Q} x_i$. So the BD density can also be written as the following
\[
\frac{\Gamma\big(\sum_{i=1}^{Q}\gamma_i\big)}{\prod_{i=1}^{Q}\Gamma(\gamma_i)}\;\prod_{i=1}^{Q-1} y_i^{\gamma_i - 1}\left(1 - \sum_{i=1}^{Q-1} y_i\right)^{\gamma_Q - 1} f(x_.)
\]
where $y_j$ is defined as above, and where the generating density is Beta, given by $\frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}(x_.)^{\alpha - 1}(1 - x_.)^{\beta - 1}$. Putting back the values, we have the BD density as
\[
\underbrace{\left[\frac{\Gamma\big(\sum_{j=1}^{Q}\gamma_j\big)}{\prod_{j=1}^{Q}\Gamma(\gamma_j)}\prod_{j=1}^{Q}(y_j)^{\gamma_j - 1}\right]}_{\text{Dirichlet}}\;\underbrace{\frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}(x_.)^{\alpha - 1}(1 - x_.)^{\beta - 1}}_{\text{Beta}} \tag{5-3}
\]
where $y_Q = 1 - \sum_{j=1}^{Q-1} y_j$. So this density, which is defined on $S^Q_c \times [0,1]$ or $S^{Q-1}_o \times [0,1]$, is equal to the one defined in equation 5-1. From here, we can also see that it gives rise to the same normalization constant as 5-2. So it can be generated in the following stages:

- Generate $x_.$ from a beta distribution with parameters $\alpha$ and $\beta$.
- Generate $(y_1, y_2, \ldots, y_Q)$ from a Dirichlet distribution with parameters $(\gamma_1, \ldots, \gamma_Q)$.
- Multiply the vector $(y_1, y_2, \ldots, y_Q)$ by $x_.$ to get $(x_1, x_2, \ldots, x_Q)$.

5.4 BD Distribution Conjugacy

5.4.1 With Multinomial Likelihood

Let us write down the discrete time BD density function ($Q$-dimension) with parameters $\alpha$, $\beta$ and $\gamma_1, \ldots, \gamma_Q$:
\[
\pi(p_1, p_2, \ldots, p_Q) \propto \left(\sum_{j=1}^{Q} p_j\right)^{\alpha - \sum_{j=1}^{Q}\gamma_j}\left(1 - \sum_{j=1}^{Q} p_j\right)^{\beta - 1}\prod_{j=1}^{Q} p_j^{\gamma_j - 1}\;I(p \in S^Q_o)
\]
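The three-stage generation scheme for the BD distribution described above (a Beta draw for the total, a Dirichlet draw for the proportions, then a product) can be sketched with the standard library alone. The function name `sample_beta_dirichlet` is hypothetical, and the Dirichlet draw is obtained by normalizing independent Gamma variates, which is a standard construction assumed here rather than taken from the text.

```python
import random


def sample_beta_dirichlet(alpha, beta, gammas, rng):
    """Draw (x_1, ..., x_Q) from BD(alpha, beta, gammas) in three stages."""
    # Stage 1: the total x. = sum_j x_j follows Beta(alpha, beta).
    total = rng.betavariate(alpha, beta)
    # Stage 2: Dirichlet(gammas) via normalized Gamma(gamma_j, 1) variates.
    draws = [rng.gammavariate(g, 1.0) for g in gammas]
    norm = sum(draws)
    proportions = [d / norm for d in draws]
    # Stage 3: scale the proportions by the total.
    return [total * y for y in proportions]


rng = random.Random(42)
x = sample_beta_dirichlet(alpha=2.0, beta=3.0, gammas=[1.0, 2.0, 0.5], rng=rng)
print(x, sum(x))
```

Each draw lies in the open simplex: every coordinate is positive and the coordinates sum to the Beta-distributed total, which is strictly less than one.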


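Let $X_0, X_1, \ldots, X_Q$ follow a $(Q+1)$-dimensional Multinomial distribution with parameters $n$ and $p_0, p_1, \ldots, p_Q$ such that $\sum_{j=0}^{Q} p_j = 1$. Let us denote the number of outcomes in each category by $n_j$ for all $j = 0, 1, \ldots, Q$, with $n = \sum_{j=0}^{Q} n_j$. Its pmf, which is the likelihood, is given by
\[
L(X_0 = n_0, X_1 = n_1, \ldots, X_Q = n_Q) \propto p_0^{n_0}\prod_{j=1}^{Q} p_j^{n_j} = \left(1 - \sum_{j=1}^{Q} p_j\right)^{n_0}\prod_{j=1}^{Q} p_j^{n_j}
\]
So the posterior distribution of $p_j$ for $j = 0, 1, \ldots, Q$ given the data becomes
\[
\begin{aligned}
\pi' \propto \pi \cdot L &\propto \left(\sum_{j=1}^{Q} p_j\right)^{\left(\alpha + \sum_{j=1}^{Q} n_j\right) - \left(\sum_{j=1}^{Q}(\gamma_j + n_j)\right)}\left(1 - \sum_{j=1}^{Q} p_j\right)^{(\beta + n_0) - 1}\prod_{j=1}^{Q} p_j^{(\gamma_j + n_j) - 1} \\
&= \left(\sum_{j=1}^{Q} p_j\right)^{\alpha' - \sum_{j=1}^{Q}\gamma'_j}\left(1 - \sum_{j=1}^{Q} p_j\right)^{\beta' - 1}\prod_{j=1}^{Q} p_j^{\gamma'_j - 1}
\end{aligned}
\]
where the updated parameters are
\[
\text{(for the Dirichlet part)} \quad \gamma'_j = \gamma_j + n_j \;\; \forall\, j = 1, 2, \ldots, Q
\]
\[
\text{(for the Beta part)} \quad \alpha' = \alpha + \sum_{j=1}^{Q} n_j \;\; \text{and} \;\; \beta' = \beta + n_0
\]
It is clear that in the discrete case, BD is a conjugate prior for the multinomial likelihood.

5.4.2 With Negative Multinomial Likelihood

The Negative Multinomial (NM) distribution is a generalization of the negative binomial distribution to more than two outcomes. Suppose an experiment generates $Q + 1 \ge 2$ outcomes, namely $\{n_0, \ldots, n_Q\}$, with probabilities $\{p_0, \ldots, p_Q\}$ respectively. If sampling is done until $n$ observations are made, then $\{n_0, \ldots, n_Q\}$ follows the multinomial distribution. However, if the experiment is stopped once $n_0$ reaches the predetermined value $r_0$, then the distribution of the $Q$-tuple $\{n_1, \ldots, n_Q\}$ is NM.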
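The conjugate update for the multinomial likelihood derived in Section 5.4.1 amounts to simple bookkeeping on the parameters. A minimal sketch follows; the function name `bd_posterior` and the argument layout are hypothetical choices made for illustration.

```python
def bd_posterior(alpha, beta, gammas, counts, zero_count):
    """Posterior BD parameters after multinomial counts.

    counts[j] is n_j for categories j = 1, ..., Q and zero_count is n_0,
    the count of the remaining category with probability 1 - sum_j p_j.
    Returns (alpha', beta', gammas').
    """
    new_alpha = alpha + sum(counts)      # Beta part: alpha' = alpha + sum_j n_j
    new_beta = beta + zero_count         # Beta part: beta' = beta + n_0
    new_gammas = [g + n for g, n in zip(gammas, counts)]  # Dirichlet part
    return new_alpha, new_beta, new_gammas


posterior = bd_posterior(alpha=1.0, beta=2.0, gammas=[0.5, 0.5],
                         counts=[3, 1], zero_count=4)
print(posterior)  # (5.0, 6.0, [3.5, 1.5])
```

The same function shape would serve for any likelihood whose kernel has the form $(1 - \sum_j p_j)^{m}\prod_j p_j^{n_j}$, with `zero_count` playing the role of the exponent $m$.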


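Let the negative multinomial be represented by $\mathrm{NM}(r_0, p_1, \ldots, p_Q)$, such that $\sum_{j=1}^{Q} p_j \le 1$. The distribution function is given by
\[
\frac{\Gamma\bigl(r_0+\sum_{j=1}^{Q} n_j\bigr)}{\Gamma(r_0)\prod_{j=1}^{Q} n_j!}\Bigl(1-\sum_{j=1}^{Q} p_j\Bigr)^{r_0}\prod_{j=1}^{Q} p_j^{n_j} \tag{5-4}
\]
where $n_j$ corresponds to $p_j$. Using the BD density defined above, we have the posterior as
\[
\propto \Bigl(\sum_{j=1}^{Q} p_j\Bigr)^{(\alpha+\sum_{j=1}^{Q} n_j)-\sum_{j=1}^{Q}(\gamma_j+n_j)} \Bigl(1-\sum_{j=1}^{Q} p_j\Bigr)^{(\beta+r_0)-1}\prod_{j=1}^{Q} p_j^{(\gamma_j+n_j)-1}.
\]
Similarly, this is also a BD distribution, $\mathrm{BD}(\alpha', \beta', (\gamma'_1, \ldots, \gamma'_Q))$, where the updated parameters are
\[
\text{(Dirichlet part)}\quad \gamma'_j = \gamma_j + n_j \;\; \forall j = 1, 2, \ldots, Q; \qquad
\text{(Beta part)}\quad \alpha' = \alpha + \sum_{j=1}^{Q} n_j \;\text{ and }\; \beta' = \beta + r_0.
\]
So in the discrete case, BD is also a conjugate prior for the negative multinomial likelihood.

5.5 Completely Random Measure (CRM) Representation

Consider a probability space $(\Psi, \mathcal{F}, P)$. A random measure $\mu$ is such that $\mu(A)$ is a non-negative random variable for any set $A \in \mathcal{F}$. If, for any disjoint measurable sets $A, A' \in \mathcal{F}$, $\mu(A)$ and $\mu(A')$ are independent random variables, then $\mu$ is called a CRM. Borrowing the terminology from [20], $\mu$ is composed of at most three components, $\mu = \mu_d + \mu_f + \mu_o$:

- A deterministic measure, denoted by $\mu_d$; clearly $\mu_d(A)$ and $\mu_d(A')$ are independent for disjoint $A$ and $A'$.
- A finite set of fixed atoms: let $(u_1, u_2, \ldots, u_L) \in \Psi^L$ be a collection of a fixed, finite number of locations and let $(\eta_1, \eta_2, \ldots, \eta_L) \in \mathbb{R}_+^L$ be the independent random weights for those atoms, respectively. Then $\mu_f = \sum_{l=1}^{L} \eta_l \delta_{u_l}$.
- An ordinary component: let $\nu$ be a Poisson process intensity on the space $\Psi \times \mathbb{R}_+$ and let $\{(v_1, \xi_1), (v_2, \xi_2), \ldots\}$ be a draw from this process. Then $\mu_o = \sum_{j=1}^{\infty} \xi_j \delta_{v_j}$.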


Throughout, we assume that $\mu_d$ is identically equal to 0, as it is a non-random component. From [20], we have CRM representations for the Beta process (BP), the Gamma process (GP) and the Dirichlet process (DP), and we can identify the fixed and ordinary components of each. For example, the Beta process is a CRM with a mass parameter $\gamma > 0$, a concentration parameter $\theta > 0$, an a.s. purely atomic measure $H_f = \sum_{l=1}^{L} \xi_l \delta_{u_l}$ with $\gamma\xi_l \in [0,1]$ for all $l$, and a purely absolutely continuous measure $H_o$ on $\Psi$. The CRM components are:

- $\mu_d$ is uniformly 0.
- The fixed atom locations are $(u_1, \ldots, u_L) \in \Psi^L$ and the atom weights are distributed as $\eta_l \overset{ind}{\sim} \mathrm{Beta}(\theta\gamma\xi_l,\ \theta(1-\gamma\xi_l))$.
- The ordinary component has Poisson process intensity $H_o \times \nu$, where $\nu$ is the $\sigma$-finite measure with finite mean
\[ \nu(dp) = \gamma\theta\, p^{-1}(1-p)^{\theta-1}\,dp. \]

The CRM for the BP can be written as (including both the continuous and the discrete part)
\[
B = \sum_{k=1}^{\infty} p_k \delta_{\omega_k} \triangleq \sum_{l=1}^{L} \eta_l \delta_{u_l} + \sum_{j=1}^{J} \xi_j \delta_{v_j}.
\]
The atom locations are the union of both sets of atoms, $\{\omega_k\}_k = \{u_l\}_{l=1}^{L} \cup \{v_j\}_{j=1}^{J}$. Clearly the BP is almost surely discrete.

There is an alternative way to define the Beta process, in which it is written as $B \sim \mathrm{BP}(\theta, \gamma, [u, \rho, \sigma], H_o)$. It describes the following CRM with a mass parameter $\gamma > 0$, a concentration parameter $\theta > 0$, $L$ atoms with locations $(u_1, u_2, \ldots, u_L) \in \Psi^L$, two sets of positive atom weight parameters $\{\rho_l\}_{l=1}^{L}$ and $\{\sigma_l\}_{l=1}^{L}$, and a purely absolutely continuous measure $H_o$ on $\Psi$:

- $\mu_d$ is zero.
- There are $L$ fixed atom locations $(u_1, \ldots, u_L) \in \Psi^L$ with corresponding weights $\eta_l \overset{ind}{\sim} \mathrm{Beta}(\rho_l, \sigma_l)$.
- The ordinary component has Poisson process intensity $H_o \times \nu$, where $\nu(dp) = \gamma\theta\, p^{-1}(1-p)^{\theta-1}\,dp$.


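5.5.1 Another Viewpoint for CRM

It is often convenient to work with the CRM representation of these non-parametric priors, so let us look at the CRM as a functional of a Poisson random measure [68]. Let $T$ be any topological space with $\sigma$-algebra $\mathcal{B}(T)$. Let $U = (\Omega, \mathcal{F}, P)$ be any probability space and $X$ a complete, separable metric space. Define a Poisson random measure $N$ on $S = \mathbb{R}_+ \times X$ with mean measure $\nu$:

- For any set $A$ in $\mathcal{B}(S)$ such that $\nu(A) = E(N(A)) < \infty$, the random variable $N(A)$ is distributed as $\mathrm{Poisson}(\nu(A))$.
- For any finite collection of disjoint sets $A_1, \ldots, A_n$ in $\mathcal{B}(S)$, $N(A_1), \ldots, N(A_n)$ are independent random variables.
- The following conditions need to be satisfied:
\[ \int_0^1 p\,\nu(dp, X) < \infty \quad\text{and}\quad \nu([1,\infty) \times X) < \infty. \]

Now denote the space of all bounded finite measures on $X$ by $(M_X, \mathcal{B}(M_X))$. Let $\mu$ be a random element on $U$ taking values in $(M_X, \mathcal{B}(M_X))$ which can be written as
\[ \mu(C) = \int_{\mathbb{R}_+ \times C} p\,N(dp, dx) \qquad \forall C \in \mathcal{B}(X). \]
So $\mu$ is a linear functional of the Poisson random measure, and it is Kingman's CRM on $X$. Hence for any disjoint sets $X_1, X_2, \ldots$ in $\mathcal{B}(X)$, $\mu(X_1), \mu(X_2), \ldots$ are mutually independent and $\mu(\cup_i X_i) = \sum_i \mu(X_i)$ almost surely ($P$). If $g : X \to \mathbb{R}_+$ is a measurable function, then $\mu$ can be characterized by the following Laplace functional:
\[
E\Bigl[\exp\Bigl(-\int_X g(x)\,\mu(dx)\Bigr)\Bigr] = \exp\Bigl(\int_S \bigl(e^{-p\,g(x)} - 1\bigr)\,\nu(dp, dx)\Bigr).
\]
Now, if $\nu(dp, dx) = \rho(dp)H(dx)$ for some measure $\rho(\cdot)$ on $\mathbb{R}_+$ and $H$ is a non-atomic $\sigma$-finite measure on $X$, then $N$ and $\mu$ are called homogeneous. Here we consider only the homogeneous case. Also note that we could have defined a more general linear functional such as
\[ \int_{S \times X} h(p)\,N(dp, dx) \]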


where $S$ is a separable complete metric space and $h : S \to \mathbb{R}_+$. These are known as $h$-biased random measures. We use the very simple choice $h(p) = p$, so the result is also called a size-biased random measure.

5.5.2 Campbell's Theorem

We also state Campbell's theorem from [17], which is closely related to these concepts. The proof can be found in [17].

Theorem 5.5.1. Let $\Pi$ be a Poisson process on $S$ with mean measure $\mu$ and let $f : S \to \mathbb{R}$ be measurable. Then the sum
\[ \Sigma = \sum_{X \in \Pi} f(X) \]
is absolutely convergent with probability one if and only if
\[ \int_S \min(|f(x)|, 1)\,\mu(dx) < \infty. \]
If this condition holds, then
\[ E\bigl(e^{\theta\Sigma}\bigr) = \exp\Bigl\{\int_S \bigl(e^{\theta f(x)} - 1\bigr)\,\mu(dx)\Bigr\} \]
for any complex $\theta$ for which the integral on the right converges, and in particular whenever $\theta$ is pure imaginary. Moreover
\[ E(\Sigma) = \int_S f(x)\,\mu(dx) \]
in the sense that the expectation exists if and only if the integral converges, and they are equal. If the above integral converges, then
\[ \mathrm{Var}(\Sigma) = \int_S f(x)^2\,\mu(dx), \]
finite or infinite.

5.6 Beta Dirichlet Process

Recently, in [3], the Beta-Dirichlet (BD) prior process was introduced in the context of analyzing multi-state event history data. It is a conjugate prior for cumulative intensity


functions of a Markov process, based on a multivariate non-decreasing process with independent increments, and it can be considered an extension of the Beta process introduced earlier by Hjort [18].

5.6.1 BP construction by taking limit from discrete case

Let us first review the construction of the Beta process as the limit of a time-discrete model. We have already seen the definition of the Beta process in chapter 3. The BP is a specific example of a pure-jump subordinator; [14] is an excellent reference for this type of Lévy process. We know that such processes can be equivalently defined through Kingman's CRM. In the original construction of Hjort [18], the BP was first defined in the time-discrete case; for the time-continuous case he then took the limit of the discrete model. Let $A_0$ be continuous and $c(\cdot)$ be a piecewise continuous, non-negative function. The construction is as follows:

- For each $n$, define independent variables $X_{n,i} \sim \mathrm{Beta}(a_{n,i}, b_{n,i})$ for $i = 1, 2, \ldots$, where
\[
a_{n,i} = c_{n,i}\,A_0\Bigl(\tfrac{i-1}{n}, \tfrac{i}{n}\Bigr], \qquad
b_{n,i} = c_{n,i}\Bigl(1 - A_0\Bigl(\tfrac{i-1}{n}, \tfrac{i}{n}\Bigr]\Bigr), \qquad
c_{n,i} = c\Bigl(\tfrac{i-\frac{1}{2}}{n}\Bigr).
\]
- Let $A_n(0) = 0$ and $A_n(t) = \sum_{i/n \le t} X_{n,i}$.
- Then $A_n$ has independent beta increments.
- The number of jumps increases as $n \to \infty$.
- The expected jump size becomes smaller as $n \to \infty$.
- The sequence $\{A_n\}$ converges in distribution to the correct Beta process in the Skorokhod space $D[0,1]$.

5.6.2 Construction of BDP

For the discrete-time model, the Dirichlet distribution is a natural conjugate prior for the transition probabilities. But Hjort showed that the cumulative intensity functions are independent in the limit (see theorem 5.1 in [18]) if we start with the Dirichlet model. To be precise, let them be denoted by $A_1, A_2, \ldots, A_Q$. If they are independent, then the cardinality of $\{k : \Delta A_k(t) > 0\}$ is either 0 or 1. It is now clear that independent Beta processes are


not conjugate in this case. To remove this problem the BD prior was introduced, and under this new prior the cumulative intensity functions become dependent in the limit. We also saw the effect of this independence when we tried to extend the IBP in a standard way using the Dirichlet distribution. We now briefly describe the BD process and its properties as given in [3]. Let $A_h = (A_{hj} : j = 0, 1, \ldots, Q)$ be the vector of cumulative intensity functions from state $h$. We know that for $j \ne h$, $\Delta A_{hj}(t) \ge 0$ and $\sum_{j \ne h} \Delta A_{hj}(t) \le 1$. The cumulative intensity function for transitions out of state $h$ is given by $A_{h\cdot}(t) = \sum_{j \ne h} A_{hj}(t)$. Let
\[ v_{hj}(t) = \frac{dA_{hj}(t)}{dA_{h\cdot}(t)}, \]
which denotes the instantaneous conditional transition probability from state $h$ to $j$.

Properties.

- For given $A_h$, it follows from the properties of the BD distribution that $A_{h\cdot}(t)$ and $\{v_{hj}(t), h \ne j\}$ are completely independent. Note that $A_{h\cdot}$ is a Beta process and $\{v_{hj}(t), h \ne j\}$ follows a Dirichlet distribution given $t$.
- The underlying $Q$-dimensional Lévy process representation of the BD process $A$ with parameters $A_0(\cdot)$, $c(\cdot)$, $\{\gamma_j(\cdot)\}_{j=1}^{Q}$ can be written as
\[
E\bigl(\exp(-\langle\theta, A\rangle)\bigr) = \exp\Bigl\{\int_0^t \int_{[0,1]^Q} \bigl(e^{-\langle\theta, x\rangle} - 1\bigr)\,\nu(ds, dx)\Bigr\}
\]
where
\[
\nu(dt, dx) = \kappa\, x_1^{\gamma_1-1} \cdots x_Q^{\gamma_Q-1}\,(x_\cdot)^{-\sum_{j=1}^{Q}\gamma_j}(1-x_\cdot)^{c(t)-1}\,dx_1 \cdots dx_Q\,dA_0(t),
\]
with
\[
\sum_{j=1}^{Q} x_j = x_\cdot \quad\text{and}\quad \kappa = c(t)\,\frac{\Gamma\bigl(\sum_{j=1}^{Q}\gamma_j\bigr)}{\prod_{j=1}^{Q}\Gamma(\gamma_j)}.
\]
- Let $y_j = x_j / x_\cdot$ for all $j = 1, 2, \ldots, Q$, and write the above Lévy measure in terms of these new variables. Note that $\sum_{j=1}^{Q} y_j = \sum_{j=1}^{Q} x_j / x_\cdot = 1$. Using the result from equation 5-3, we have
\[
\underbrace{\Bigl\{\frac{\Gamma\bigl(\sum_{j=1}^{Q}\gamma_j\bigr)}{\prod_{j=1}^{Q}\Gamma(\gamma_j)} \prod_{j=1}^{Q} (y_j)^{\gamma_j-1}\,dy_1 \cdots dy_Q\Bigr\}}_{1}\;
\underbrace{\Bigl\{c(t)\,(x_\cdot)^{-1}(1-x_\cdot)^{c(t)-1}\,(dx_\cdot)\,dA_0(t)\Bigr\}}_{2}
\]


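We can now see clearly from part (1) of the above equation that $(y_1, \ldots, y_Q)$ is Dirichlet distributed with parameters $(\gamma_1, \ldots, \gamma_Q)$, while from part (2) $x_\cdot$ is distributed as Beta with parameters $c(t)$ and $A_0(t)$. In [3] the authors discuss the sample path of the $Q$-dimensional BD process with parameters $c(\cdot), A_0(\cdot), \gamma_1(\cdot), \ldots, \gamma_Q(\cdot)$. It can be generated as follows. Let $A_\cdot$ be a beta process with parameters $c(\cdot)$ and $A_0(\cdot)$. Conditioned on $A_\cdot$, the BD process $A(t)$ can be constructed as a $Q$-dimensional Lévy process:
\[
A(t) = \sum_{s \le t,\; s \in T} V(s)\,\Delta A_\cdot(s),
\]
where $T = \{t : \Delta A_\cdot(t) > 0\}$ and the $V(s)$, $s \in T$, are independent Dirichlet random variables with parameters $(\gamma_1(s), \gamma_2(s), \ldots, \gamma_Q(s))$. For simplicity, we assume here that every $V(s)$ has the same Dirichlet distribution, with parameters fixed to $(\gamma_1, \gamma_2, \ldots, \gamma_Q)$ instead of being functions of $s$. To make it even simpler, we assume that $c(\cdot)$ is also a constant, equal to $c$.

- The mean of this process for the $j$-th coordinate is
\[ E(A_j(t)) = \int_0^t \frac{\gamma_j(s)}{\sum_k \gamma_k(s)}\,dA_0(s) \qquad \forall j = 1, \ldots, Q. \]
- The variance of this process for the $j$-th coordinate is
\[ \mathrm{Var}(A_j(t)) = \int_0^t \frac{\gamma_j(s)(\gamma_j(s)+1)}{\sum_k \gamma_k(s)\bigl(\sum_k \gamma_k(s)+1\bigr)}\,\frac{1}{c(s)+1}\,dA_0(s) \qquad \forall j = 1, \ldots, Q. \]
- The covariance of this process between the $j$-th and $i$-th coordinates is
\[ \mathrm{Cov}(A_j(t), A_i(t)) = \int_0^t \frac{\gamma_j(s)\gamma_i(s)}{\sum_k \gamma_k(s)\bigl(\sum_k \gamma_k(s)+1\bigr)}\,\frac{1}{c(s)+1}\,dA_0(s) \qquad \forall j, i = 1, \ldots, Q,\; j \ne i. \]

5.7 Multivariate CRM (MCRM) Representation of BDP

The definition of a CRM can be extended to a vector-valued CRM [69], or MCRM. For example, $\mu = (\mu_1, \ldots, \mu_Q)$ is a completely random measure if, for disjoint $A$ and $A'$, the vectors $(\mu_1(A), \ldots, \mu_Q(A))$ and $(\mu_1(A'), \ldots, \mu_Q(A'))$ are independent. As in the one-dimensional case, we can write down the MCRM representation of the BDP as $\mu = \mu_d + \mu_f + \mu_o$:

- The deterministic part $\mu_d$ is uniformly zero, $\mu_d = 0$.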


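- The finite fixed atoms: let $(u_1, u_2, \ldots, u_L) \in \Psi^L$ be fixed atoms and let $(\eta_1, \ldots, \eta_L) \in ((\mathbb{R}_+)^Q)^L$ be independent vectors of weights for those atoms. Then
\[ \mu_f = \sum_{l=1}^{L} \eta_l \delta_{u_l} = \sum_{l=1}^{L} (\eta_{l1}, \eta_{l2}, \ldots, \eta_{lQ})\,\delta_{u_l}. \]
- The ordinary component: let $\nu$ be the Poisson process intensity defined on $\Psi \times (\mathbb{R}_+)^Q$ and let $\{(v_1, \xi_1), (v_2, \xi_2), \ldots\}$ be a draw from this process. Then
\[ \mu_o = \sum_{i=1}^{\infty} \xi_i \delta_{v_i} = \sum_{i=1}^{\infty} (\xi_{i1}, \xi_{i2}, \ldots, \xi_{iQ})\,\delta_{v_i}. \]

The BD process is an example of a vector-valued CRM, or MCRM, with a mass parameter $\gamma > 0$, a concentration parameter $\theta > 0$, an a.s. purely atomic measure $H_f = \sum_{l=1}^{L} \xi_l \delta_{u_l}$ with $\gamma\xi_l \in [0,1]$ for all $l$, and a purely absolutely continuous measure $H_o$ on $\Psi$. For the BD process the MCRM representation is as follows:

- $\mu_d$ is uniformly 0.
- The fixed atom locations are $(u_1, \ldots, u_L) \in \Psi^L$, and the independent random weights $\eta_l = (\eta_{l1}, \eta_{l2}, \ldots, \eta_{lQ}) \in [0,1]^Q$ for those atoms are distributed as
\[ \eta_l \overset{ind}{\sim} \text{Beta-Dirichlet}\bigl(\theta\gamma\xi_l,\ \theta(1-\gamma\xi_l),\ (\beta_{f,1}, \beta_{f,2}, \ldots, \beta_{f,Q})\bigr). \]
- The ordinary component has Poisson process intensity $H_o \times \nu$, where $\nu$ is the $\sigma$-finite vector-valued ($[0,1]^Q$-valued) measure with finite mean
\[
\nu(dx) = \gamma\theta\,\frac{\Gamma\bigl(\sum_{j=1}^{Q}\beta_{o,j}\bigr)}{\prod_{j=1}^{Q}\Gamma(\beta_{o,j})}\, x_1^{\beta_{o,1}-1} \cdots x_Q^{\beta_{o,Q}-1}\, s^{-\sum_{j=1}^{Q}\beta_{o,j}}(1-s)^{\theta-1}\,dx_1 \cdots dx_Q,
\]
where $s = \sum_{j=1}^{Q} x_j \le 1$.

So the complete MCRM for the BDP can be written as
\[
D = \sum_{k=1}^{\infty} p_k \delta_{\omega_k} \triangleq \sum_{l=1}^{L} \eta_l \delta_{u_l} + \sum_{i=1}^{M} \xi_i \delta_{v_i} = \sum_{l=1}^{L} (\eta_{l1}, \ldots, \eta_{lQ})\,\delta_{u_l} + \sum_{i=1}^{M} (\xi_{i1}, \ldots, \xi_{iQ})\,\delta_{v_i}.
\]
An alternative version of the BD process can be written as
\[ D \sim \mathrm{BDP}\bigl(\theta, \gamma, \langle u, \rho, \sigma, \{\beta_{f,j}\}_{j=1}^{Q}\rangle, \{\beta_{o,j}\}_{j=1}^{Q}, H_o\bigr) \]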


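- $\mu_d$ is 0.
- The $L$ fixed atom locations are $(u_1, \ldots, u_L) \in \Psi^L$, with corresponding random weights $\eta_l = (\eta_{l1}, \eta_{l2}, \ldots, \eta_{lQ}) \overset{ind}{\sim} \text{Beta-Dirichlet}(\rho_l, \sigma_l, (\beta_{f,1}, \beta_{f,2}, \ldots, \beta_{f,Q}))$.
- The ordinary component has Poisson process intensity $H_o \times \nu$, where $\nu$ is the $\sigma$-finite vector-valued ($[0,1]^Q$-valued) measure with finite mean
\[
\nu(dx) = \gamma\theta\,\frac{\Gamma\bigl(\sum_{j=1}^{Q}\beta_{o,j}\bigr)}{\prod_{j=1}^{Q}\Gamma(\beta_{o,j})}\, x_1^{\beta_{o,1}-1} \cdots x_Q^{\beta_{o,Q}-1}\, s^{-\sum_{j=1}^{Q}\beta_{o,j}}(1-s)^{\theta-1}\,dx_1 \cdots dx_Q,
\]
where $s = \sum_{j=1}^{Q} x_j \le 1$.

5.8 Beta-Dirichlet process as a Poisson process

Figure 5-1. (Left side) Poisson process on the product of the location interval and the open simplex $S_2^o$, with mean measure $\nu = h \times \mu$. The set $V$ contains a Poisson-distributed number of atoms with parameter $\int_V h(d\omega)\,\mu(dp)$. (Right side) One draw from the BD process constructed from $\Pi$. The first dimension is the location and the other dimensions constitute the weight vector.

Note that the open simplex is denoted by $S_Q^o$. The CRM for BD can be written as $\sum_{k=1}^{\infty} p_k \delta_{\omega_k}$. We illustrate this Poisson connection in Figure 5-1 (taking $Q = 2$). As shown in the figure, $p_{1k} + p_{2k} \le 1$ for all $k$; this is depicted by the dotted line, with the projected values lying under the line $p_1 + p_2 = 1$. The underlying Poisson process is denoted by $\Pi = \{(\omega, p)\}$. A Poisson process is completely characterized by its mean measure $\nu(d\omega, dp)$. For any subset $V$ of the product of the location interval and $S_Q^o$, the random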


counting measure $N(V)$ is the number of points generated by $\Pi$ that fall inside $V$; therefore $N(V)$ is Poisson distributed with mean $\nu(V)$. Also, if $V_1$ and $V_2$ are disjoint sets, then $N(V_1)$ and $N(V_2)$ are independent. In the case of the BD process, the mean measure of the Poisson process is given by
\[
\nu_{BD}(d\omega, dp) = h(d\omega)\,\mu(dp) = \frac{\Gamma\bigl(\sum_{j=1}^{Q}\gamma_j\bigr)}{\Gamma(\gamma_1) \cdots \Gamma(\gamma_Q)}\; c\; p_1^{\gamma_1-1} \cdots p_Q^{\gamma_Q-1}\,(p_\cdot)^{-\sum_{j=1}^{Q}\gamma_j}(1-p_\cdot)^{c-1}\,dp\;h(d\omega),
\]
where $p_\cdot = \sum_{j=1}^{Q} p_j \le 1$ and $p$ is vector-valued. The measure $\nu_{BD}(d\omega, dp)$ is called the Lévy measure of the process, and $h$ is also called its base measure.

5.9 A Size-biased Construction for Lévy representation of BDP

In the construction of the CRM, Kingman [30] pointed out that the ordinary component of any CRM can be decomposed into a countable number of independent components:
\[ \mu_o = \sum_k \mu_o^k. \]
Let the measures $\nu$ and $\nu_k$ be the compensators of the Lévy processes corresponding to $\mu_o$ and $\mu_o^k$, respectively, and denote the underlying Poisson processes by $\Pi$ and $\Pi_k$, with mean measures $\nu$ and $\nu_k$, respectively. The above equation leads to
\[ \Pi = \sum_k \Pi_k \;\Longrightarrow\; \nu = \sum_k \nu_k. \]
Let us now state a very useful theorem from Poisson point process theory, the Superposition Theorem. The proof is short and can be found in [17].

Theorem 5.9.1. Let $\Pi_1, \Pi_2, \ldots$ be a countable collection of independent Poisson processes on $S$, and let $\Pi_k$ have mean measure $\nu_k$. Then the superposition $\Pi = \sum_k \Pi_k$ is again a Poisson process, with mean measure $\nu = \sum_k \nu_k$.
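The Superposition Theorem can be illustrated with a small simulation: merging independent homogeneous Poisson processes on $[0,1]$ should give point counts whose mean and variance both equal the sum of the rates. The sketch below is an illustration under that simplifying assumption (homogeneous processes on the unit interval); the function names and rates are hypothetical.

```python
import random


def poisson_process(rate, rng):
    """Points of a homogeneous Poisson process on [0, 1] via exponential gaps."""
    points = []
    t = rng.expovariate(rate)
    while t < 1.0:
        points.append(t)
        t += rng.expovariate(rate)
    return points


def superpose(rates, rng):
    """Union of independent Poisson processes, one per rate."""
    merged = []
    for rate in rates:
        merged.extend(poisson_process(rate, rng))
    return sorted(merged)


rng = random.Random(1)
rates = [0.5, 1.5, 3.0]                      # illustrative rates, total 5.0
counts = [len(superpose(rates, rng)) for _ in range(20000)]
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / len(counts)
```

Equality of the empirical mean and variance (both near the total rate) is the signature of a Poisson count; it does not by itself prove the theorem, which also asserts independence over disjoint sets.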


If we take the union of all the underlying Poisson processes $\Pi_k$, the result has Lévy measure $\nu$. This gives a very simple method to construct $\mu_o$: take the union of all the underlying Poisson processes $\Pi_k$. In the CRM framework our Poisson process is defined on $S = \Psi \times [0,1]$, and $\mu_o^k$ can be rewritten as
\[ \sum_{(f(\omega_j),\,\omega_j) \in \Pi_k} f(\omega_j)\,\delta_{\omega_j}. \]
Let the $Q$-dimensional Beta-Dirichlet process be given by
\[ A(t) \sim \text{Beta-Dirichlet}\bigl(c(t), A_0(t), \gamma_1(t), \ldots, \gamma_Q(t)\bigr). \]
We can write its Lévy-Khintchine representation as
\[
E\bigl[\exp\bigl(-u^T A(t)\bigr)\bigr] = \exp\Bigl\{-\int_{[0,1]^Q} \bigl(1 - e^{-u^T z}\bigr)\,dL_t(z)\Bigr\}.
\]
$A(t)$ has underlying Lévy measure $L_t$, and for $V \in \mathcal{B}([0,1]^Q)$ we know that $\int_V dL_t(z) = \nu([0,t], V)$. For simplicity, let us assume $c(t) \equiv c$ and $\gamma_j(t) \equiv \gamma_j$ for all $j = 1, \ldots, Q$. The Lévy measure is
\[
dL_t(z) = \int_{s=0}^{t} \nu(ds, dz) = \Bigl\{\int_{s=0}^{t} \kappa \Bigl(\prod_{j=1}^{Q} z_j^{\gamma_j-1}\Bigr)(z_\cdot)^{-\sum_{j=1}^{Q}\gamma_j}(1-z_\cdot)^{c-1}\,dA_0(s)\Bigr\}\,dz_1 \cdots dz_Q,
\]
where $t \ge 0$, $z \in S_Q^o$,
\[
\sum_{j=1}^{Q} z_j = z_\cdot \quad\text{and}\quad \kappa = c\,\frac{\Gamma\bigl(\sum_{j=1}^{Q}\gamma_j\bigr)}{\prod_{j=1}^{Q}\Gamma(\gamma_j)}.
\]
Note that $\kappa$ does not depend on $t$. Let $y_j = z_j / z_\cdot$ for $j = 1, 2, \ldots, (Q-1)$ and let $y_Q = z_\cdot$. The Laplace transform of $A(t)$ becomes
\[
\begin{aligned}
L &= \exp\Bigl\{\int_{[0,1]^Q}\int_{s=0}^{t} \bigl(e^{-u^T z} - 1\bigr)\,\kappa \Bigl(\prod_{j=1}^{Q} z_j^{\gamma_j-1}\Bigr)(z_\cdot)^{-\sum_{j=1}^{Q}\gamma_j}(1-z_\cdot)^{c-1}\,dA_0(s)\,dz_1 \cdots dz_Q\Bigr\} \\
&= \exp\Bigl\{\int_{S_{Q-1}^o}\int_0^1\int_{s=0}^{t} \bigl(e^{-u^T y} - 1\bigr)\,\kappa \Bigl(\prod_{j=1}^{Q-1} y_j^{\gamma_j-1}\Bigr)\Bigl(1 - \sum_{j=1}^{Q-1} y_j\Bigr)^{\gamma_Q-1} (y_Q)^{-1}(1-y_Q)^{c-1}\,dA_0(s)\,dy_1 \cdots dy_{Q-1}\,dy_Q\Bigr\},
\end{aligned}
\]
where $y = y_Q\bigl(y_1, \ldots, y_{Q-1}, 1 - \sum_{j=1}^{Q-1} y_j\bigr)$. Since $y_Q \in (0,1)$, we can write
\[
\frac{1}{y_Q} = \frac{1}{1-(1-y_Q)} = 1 + (1-y_Q) + (1-y_Q)^2 + \cdots = \sum_{k=0}^{\infty} (1-y_Q)^k,
\]
as $(1-y_Q) \in (0,1)$. Using this identity in the expression for $L$, and writing $\prod_{j=1}^{Q-1} y_j^{\gamma_j-1}\bigl(1 - \sum_{j=1}^{Q-1} y_j\bigr)^{\gamma_Q-1} = G(y_{Q-1})$, we have (the exponential of the sum over $k$ factorizes into a product over $k$)
\[
\begin{aligned}
L &= \exp\Bigl\{\int_{S_{Q-1}^o}\int_0^1\int_{s=0}^{t} \bigl(e^{-u^T y} - 1\bigr)\,\kappa\,G(y_{Q-1})\Bigl(\sum_{k=0}^{\infty}(1-y_Q)^k\Bigr)(1-y_Q)^{c-1}\,dA_0(s)\prod_{j=1}^{Q-1} dy_j\,dy_Q\Bigr\} \\
&= \prod_{k=0}^{\infty} \exp\Bigl\{\int_{S_{Q-1}^o}\int_0^1\int_{s=0}^{t} \bigl(e^{-u^T y} - 1\bigr)\,\kappa\,G(y_{Q-1})\,(1-y_Q)^{c+k-1}\,dA_0(s)\prod_{j=1}^{Q-1} dy_j\,dy_Q\Bigr\} \\
&= \prod_{k=0}^{\infty} \exp\Bigl\{\int_{S_{Q-1}^o}\int_0^1\int_{s=0}^{t} \bigl(e^{-u^T y} - 1\bigr)\,\frac{\kappa}{c}\,G(y_{Q-1})\;c\,(y_Q)^{1-1}(1-y_Q)^{c+k-1}\,dA_0(s)\prod_{j=1}^{Q-1} dy_j\,dy_Q\Bigr\} \\
&= \prod_{k=0}^{\infty} \exp\Bigl\{\int_{S_{Q-1}^o}\int_0^1\int_{s=0}^{t} \bigl(e^{-u^T y} - 1\bigr)\,\frac{\kappa}{c}\,G(y_{Q-1})\Bigl[(c+k)(y_Q)^{1-1}(1-y_Q)^{c+k-1}\Bigr]\frac{c}{c+k}\,dA_0(s)\prod_{j=1}^{Q-1} dy_j\,dy_Q\Bigr\} \\
&= \prod_{k=0}^{\infty} \exp\Bigl\{\int_{[0,1]^Q}\int_{s=0}^{t} \bigl(e^{-u^T z} - 1\bigr)\,\frac{\kappa}{c}(c+k)\Bigl(\prod_{j=1}^{Q} z_j^{\gamma_j-1}\Bigr)(z_\cdot)^{1-\sum_{j=1}^{Q}\gamma_j}(1-z_\cdot)^{c+k-1}\,\frac{c}{c+k}\,dA_0(s)\,dz_1 \cdots dz_Q\Bigr\}.
\end{aligned}
\]
This last equality comes from reversing the original transform.
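The key step in the derivation is the geometric expansion of $1/y_Q$, which splits the Beta part of the Lévy density, $c\,z^{-1}(1-z)^{c-1}$, into a sum of proper $\mathrm{Beta}(1, c+k)$ densities weighted by $c/(c+k)$. The following sketch checks this numerically for the one-dimensional Beta part only (the Dirichlet factor is common to both sides and is omitted); the function names are illustrative.

```python
def levy_beta_part(z, c):
    """Beta part of the Levy density: c * z^{-1} * (1 - z)^{c - 1}."""
    return c * (1.0 - z) ** (c - 1.0) / z


def round_density(z, c, k):
    """Weight c/(c+k) times the Beta(1, c+k) density (c+k) * (1 - z)^{c+k-1}."""
    beta_density = (c + k) * (1.0 - z) ** (c + k - 1.0)
    return (c / (c + k)) * beta_density


def truncated_sum(z, c, n_rounds):
    """Sum of the first n_rounds round densities, k = 0, ..., n_rounds - 1."""
    return sum(round_density(z, c, k) for k in range(n_rounds))
```

The truncation error decays geometrically at rate $(1-z)$, so many rounds are needed to match the Lévy density near $z = 0$, which is where the infinitely many small atoms live.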


Theorem 5.9.2. Consider a Beta-Dirichlet process $A \sim \mathrm{BDP}(c(s), A_0(s), \gamma_1(s), \ldots, \gamma_Q(s))$ with base measure $A_0$, concentration measure $c(s)$ and Dirichlet parameters $\{\gamma_j(s)\}_{j=1}^{Q}$. Let $\Pi$ be its underlying Poisson process and $\nu$ its Lévy measure. Then $\Pi$ and $A$ can be written as
\[ \Pi = \bigcup_k \Pi_k \quad\text{and}\quad A = \sum_k A_k, \]
where $A_k$ is the Lévy process with underlying Poisson process $\Pi_k$. The Lévy measures $\nu_k$ decompose $\nu$ such that
\[
\nu_k(ds, dz) = \text{Beta-Dirichlet}(z;\, 1,\, c+k,\, \gamma_1, \ldots, \gamma_Q)\,dz\;A_0^k(ds), \qquad A_0^k(ds) = \frac{c}{c+k}\,A_0(ds),
\]
and we have
\[ \nu(ds, dz) = \sum_{k=0}^{\infty} \nu_k(ds, dz). \]

The above theorem shows that the underlying Poisson process of the original Beta-Dirichlet process is a superposition of countably many independent Poisson processes $\Pi_k$ with corresponding mean measures $\nu_k$. So any BDP can be expressed as a countable sum of independent Lévy processes $A_k$ with Lévy measure $\nu_k$ and underlying Poisson process $\Pi_k$. Note that $A_k$ is no longer a BDP, as it violates the definition of a BDP.

5.10 BD-Categorical Process Conjugacy

We have seen the definition of the BDP. In order to use the BDP in machine learning applications, we have to couple it with either a Categorical Process (CaP) or a Negative Multinomial Process (NMP). Let us first define the CaP and the NMP with an underlying base (or hazard) measure $A$. In this context we focus on the case where the base measure is discrete and is a draw from a BDP, although these processes can be defined in a much more general setting. We keep our notation close to the standard notation found in the recent machine learning literature, for example [20].
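Theorem 5.9.2 suggests a round-by-round sampler: round $k$ contributes a Poisson number of atoms with mean $\frac{c}{c+k}$ times the total mass of $A_0$, each with an independent Beta-Dirichlet$(1, c+k, \gamma_1, \ldots, \gamma_Q)$ weight vector. The sketch below is a truncated illustration under explicit assumptions that are not in the thesis: $A_0$ is taken to be `mass` times the uniform distribution on $[0,1]$, only the first `rounds` rounds are drawn, and all function names are hypothetical.

```python
import math
import random


def sample_poisson(mean, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-mean)
    count, prod = 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def sample_beta_dirichlet(alpha, beta, gammas, rng):
    """BD draw: Beta(alpha, beta) total times a Dirichlet(gammas) direction."""
    total = rng.betavariate(alpha, beta)
    draws = [rng.gammavariate(g, 1.0) for g in gammas]
    norm = sum(draws)
    return [total * d / norm for d in draws]


def sample_bdp_rounds(c, gammas, mass, rounds, rng):
    """Atoms (location, weight vector) from rounds k = 0, ..., rounds - 1."""
    atoms = []
    for k in range(rounds):
        # Round k has mean measure (c / (c + k)) * A_0, so its atom count is Poisson.
        n_atoms = sample_poisson(mass * c / (c + k), rng)
        for _ in range(n_atoms):
            location = rng.random()  # assumption: A_0 = mass * Uniform[0, 1]
            weight = sample_beta_dirichlet(1.0, c + k, gammas, rng)
            atoms.append((location, weight))
    return atoms


rng = random.Random(7)
atoms = sample_bdp_rounds(c=1.5, gammas=[1.0, 2.0], mass=2.0, rounds=5, rng=rng)
```

Because $\sum_k c/(c+k)$ diverges, the full process has infinitely many atoms; later rounds contribute atoms whose total weight is stochastically smaller, which is what makes truncation a reasonable approximation.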


5.11 Categorical Process (CaP)

Let $A$ be a vector-valued measure defined on $\Psi$. We define a Categorical Process $X$ with base measure $A$ and write $X \sim \mathrm{CaP}(A)$. If $A$ is continuous, then $X$ is a marked Poisson process. If $A$ is discrete, then $A$ is of the form $A = \sum_{k=1}^{\infty} p_k \delta_{\omega_k}$, where $p = (p_1, \ldots, p_Q)$, and $X = \sum_{k=1}^{\infty} c_k \delta_{\omega_k}$, where $c_k = (c_{k0}, c_{k1}, \ldots, c_{kQ})$. Note that to draw a random vector from a Categorical distribution with $p$, we need to augment $p$ with $p_0 = \bigl(1 - \sum_{j=1}^{Q} p_j\bigr)$; the output is a $(Q+1)$-dimensional vector in which exactly one entry is 1. When $c_0 = 1$, i.e. $c = (1, 0, \ldots, 0)$, we treat the result as a special vector. If $A$ has two parts, continuous as well as discrete, then $X$ also has two parts. This CRM definition of the CaP makes the calculations easier. Figure 5-2 shows how $X$ is generated. As in [2], we can treat each $\omega_k \in \Psi$ as a latent feature. Previously, for example in the Bernoulli Process (BeP) and the Indian Buffet Process (IBP), we cared only about the existence or non-existence of features; now we can consider features that come with a choice or category, which gives a more general process such as the CaP or the categorical IBP (discussed in a later section of this chapter). We will also show that the BDP is the De Finetti mixing distribution for the c-IBP, just as the DP is for the CRP and the BP is for the IBP.

5.11.1 Conjugacy for CaP and BDP: CRM Formulation

Let us write down the full CRM specification for the BDP and the CaP when only the discrete part is present. The CaP is denoted by $\mathrm{CaP}(A)$. It has a single parameter, the base (or hazard) measure $A$, which is discrete and has atoms that take values in $S_Q^o$. We focus in particular on the model where $A$ is drawn from a BD process, whose draws are almost surely discrete. The entire model is the following:
\[
X \sim \mathrm{CaP}(A), \qquad A \sim \mathrm{BDP}\bigl(\theta, \gamma, (\beta_1, \ldots, \beta_Q), G\bigr)
\]


where $\theta$ is the concentration parameter, $\gamma$ is the mass parameter, $(\beta_1, \ldots, \beta_Q)$ is the Dirichlet parameter and $G$ is the base measure such that $G(\Psi) = \gamma$. Here $G$ is assumed to be absolutely continuous.

Figure 5-2. BD-Categorical process with $Q = 2$.

Note that a Categorical distribution is a special case of the multinomial distribution in which the total number of outcomes equals 1. It is also the generalization of the Bernoulli distribution to a categorical random variable. In one formulation of the distribution, the sample space is taken to be a finite sequence of integers. The exact integers used as labels are not important; we use $\{0, 1, \ldots, Q\}$ for convenience. In this case, the probability mass function $f$ is
\[ f(x = j \mid p) = p_j, \]


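where $p_j$ represents the probability of seeing element $j$ and $\sum_{j=0}^{Q} p_j = 1$. Another formulation, which appears more complex but facilitates mathematical manipulation, uses the Iverson bracket:
\[ f(x \mid p) = \prod_{j=0}^{Q} p_j^{[x=j]}, \]
where $[x = j]$ evaluates to 1 if $x = j$ and 0 otherwise.

The BDP $A$ is now coupled with a CaP, denoted by $\mathrm{CaP}(A)$. Its single parameter, the base measure $A$, is discrete and has atoms that take values in $[0,1]^Q$. As before, $A$ is drawn from a BD process, whose draws are almost surely discrete. We write
\[
A = \sum_{k=1}^{\infty} p_k \delta_{\omega_k} = \sum_{k=1}^{\infty} (p_{k1}, p_{k2}, \ldots, p_{kQ})^T \delta_{\omega_k}.
\]
Note that $\sum_{j=1}^{Q} p_{kj} \le 1$ for all $k = 1, 2, \ldots$. Let $p_{k0} = 1 - \sum_{j=1}^{Q} p_{kj}$. Then $\{p_{kj}\}_{j=0}^{Q}$ can be treated as a prior probability distribution for a categorical distribution with $(Q+1)$ components. We say that the random measure $X$ is drawn from a Categorical process, $X \sim \mathrm{CaP}(A)$, if
\[
X = \sum_{k=1}^{\infty} c_k \delta_{\omega_k} = \sum_{k=1}^{\infty} (c_{k0}, c_{k1}, \ldots, c_{kQ})^T \delta_{\omega_k},
\]
where $(c_{k0}, c_{k1}, \ldots, c_{kQ}) \overset{ind}{\sim} \mathrm{Categorical}(p_{k0}, p_{k1}, \ldots, p_{kQ})$ for $k = 1, 2, \ldots$. Note that exactly one entry of $(c_{k0}, c_{k1}, \ldots, c_{kQ})$ is 1 and all other entries are 0, for all $k = 1, 2, \ldots$. So the categorical process lies on the space $\Psi \times \{0, 1, \ldots, Q\}$, as shown in figure 5-2. Here we assume that the base measure has both a discrete and a continuous part. Now suppose the BDP $A$ has parameters $\theta > 0$, $\gamma > 0$, $\{\beta_{f,j}\}_{j=1}^{Q}$, $\{\beta_{o,j}\}_{j=1}^{Q}$ and base measure $G$. We assume $G$ is an arbitrary CRM with three parts, $G = G_d + G_f + G_o$, following [30]. In all proofs we assume $G_d = 0$, so we take $G = G_f + G_o$. In a later section, we have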


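assumed that $G$ itself is a draw from a BP, $G = \sum_{k}^{\infty} b_k \delta_{\omega_k}$, in the hierarchical model. With $A \sim \mathrm{BDP}(\theta, \gamma, \{\beta_{f,j}\}_{j=1}^{Q}, \{\beta_{o,j}\}_{j=1}^{Q}, G_f, G_o)$ and $X \sim \mathrm{CaP}(A)$, we refer to the overall process as the Beta-Dirichlet Categorical process (BDCaP).

5.11.2 BDCaP Conjugacy With Standard Parametrization

Theorem 5.11.1. Let $G$ be a measure with fixed atomic component $G_f = \sum_{l=1}^{L} \xi_l \delta_{u_l}$ and absolutely continuous component $G_o$. Let $\theta > 0$ and $\gamma > 0$. Consider $N$ conditionally independent draws from the CaP,
\[
X_n = \sum_{l=1}^{L} c_{f,n,l}\,\delta_{u_l} + \sum_{i=1}^{M} c_{o,n,i}\,\delta_{v_i} \overset{i.i.d.}{\sim} \mathrm{CaP}(A), \qquad n = 1, 2, \ldots, N,
\]
with
\[ A \sim \mathrm{BDP}\bigl(\theta, \gamma, \{\beta_{f,j}\}_{j=1}^{Q}, \{\beta_{o,j}\}_{j=1}^{Q}, G_f, G_o\bigr). \]
That is, the CaP draws have $M$ atoms that are not located at the atoms of $G_f$. Then
\[
\bigl(A \mid X_1, \ldots, X_N\bigr) \sim \mathrm{BDP}\bigl(\theta_{post}, \gamma_{post}, \{\beta^{post}_{f,j}\}_{j=1}^{Q}, \{\beta^{post}_{o,j}\}_{j=1}^{Q}, G^{post}_f, G^{post}_o\bigr)
\]
with $\theta_{post} = \theta + N$ and $\gamma_{post} = \gamma\,\frac{\theta}{\theta+N}$. Also $G^{post}_o = G_o$ and
\[
G^{post}_f = \sum_{l=1}^{L} \xi^{post}_l \delta_{u_l} + \sum_{i=1}^{M} \xi^{post}_i \delta_{v_i},
\]
where
\[
\xi^{post}_l = \xi_l + (\theta\gamma)^{-1} \sum_{n=1}^{N}\sum_{q=1}^{Q} c_{f,n,l,q} \quad \forall l = 1, \ldots, L
\qquad\text{and}\qquad
\xi^{post}_i = (\theta\gamma)^{-1} \sum_{n=1}^{N}\sum_{q=1}^{Q} c_{o,n,i,q} \quad \forall i = 1, \ldots, M.
\]
Also we have
\[
\beta^{post}_{f,l,j} = \beta_{f,j} + \sum_{n=1}^{N} c_{f,n,l,j} \quad \forall l = 1, \ldots, L \text{ and } j = 1, 2, \ldots, Q,
\]
\[
\beta^{post}_{o,i,j} = \beta_{o,j} + \sum_{n=1}^{N} c_{o,n,i,j} \quad \forall i = 1, \ldots, M \text{ and } j = 1, 2, \ldots, Q.
\]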


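Note that the second sum $\sum_{q=1}^{Q}$ is not really a sum, because there is at most one $q = q_0$ (say) for which $c_{\cdot,n,l,q_0} = 1$, and the rest are zero for all $q$ such that $(q \in \{1, 2, \ldots, Q\}) \wedge (q \ne q_0)$.

Proof. This can be easily verified from Theorem 3 of Kim in [3].

Remark. $c_{\cdot,n,l,q}$ is the $q$-th component of the vector $c_{\cdot,n,l}$. We can also find the predictive distribution by integrating out the underlying BDP. We end up with the Categorical Indian Buffet Process (cIBP), which is the direct analogue of a multidimensional version of the work in [2]. The cIBP construction is given in a later section.

5.11.3 BDCaP Conjugacy Using Alternative Parametrization for BDP in the Base Measure (G)

Theorem 5.11.2. Assume the conditions of the above theorem and consider $N$ conditionally independent draws from the CaP:
\[
X_n = \sum_{l=1}^{L} c_{f,n,l}\,\delta_{u_l} + \sum_{i=1}^{M} c_{o,n,i}\,\delta_{v_i} \overset{i.i.d.}{\sim} \mathrm{CaP}(A), \qquad n = 1, 2, \ldots, N,
\]
with
\[ A \sim \mathrm{BDP}\bigl(\theta, \gamma, \{\beta_{o,j}\}_{j=1}^{Q}, \langle u, \rho, \sigma, \{\beta_{f,j}\}_{j=1}^{Q}\rangle, G_o\bigr), \]
where the boldface symbols $u$, $\rho$, $\sigma$ collect the values for the $L$ corresponding atoms. In this alternative definition of the BDP, the base measure is characterized by the tuple $G \triangleq (G_f, G_o) \triangleq ([u, \rho, \sigma, \{\beta_{f,j}\}_{j=1}^{Q}], G_o)$. The $\{\rho_l\}_{l=1}^{L}$ and $\{\sigma_l\}_{l=1}^{L}$ are all $> 0$. The $\{\beta_{f,j}\}_{j=1}^{Q}$ are the Dirichlet parameters corresponding to the fixed part and are assumed to be the same for all fixed atoms. Similarly, the $\{\beta_{o,j}\}_{j=1}^{Q}$ are the Dirichlet parameters corresponding to the random part and are assumed to be the same for all random atoms. Then
\[
A \mid X_1, \ldots, X_N \sim \mathrm{BDP}\bigl(\theta_{post}, \gamma_{post}, \{\beta^{post}_{o,j}\}_{j=1}^{Q}, \langle u_{post}, \rho_{post}, \sigma_{post}, \{\beta^{post}_{f,j}\}_{j=1}^{Q}\rangle, G^{post}_o\bigr)
\]
with $\theta_{post} = \theta + N$, $\gamma_{post} = \gamma\,\frac{\theta}{\theta+N}$, $G^{post}_o = G_o$, and a total of $L + M$ fixed atoms, $\{u^{post}_{l'}\} = \{u_l\}_{l=1}^{L} \cup \{v_i\}_{i=1}^{M}$. The parameters $\rho_{post}$, $\sigma_{post}$ are such that, for $l = 1, 2, \ldots, L$, $\rho^{post}_l = \rho_l + \sum_{n=1}^{N}\sum_{q=1}^{Q} c_{f,n,l,q}$ and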


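$\sigma^{post}_l = \sigma_l + N - \sum_{n=1}^{N}\sum_{q=1}^{Q} c_{f,n,l,q}$, and for $i = 1, 2, \ldots, M$,
\[
\rho^{post}_{L+i} = \sum_{n=1}^{N}\sum_{q=1}^{Q} c_{o,n,i,q} \qquad\text{and}\qquad \sigma^{post}_{L+i} = \theta + N - \sum_{n=1}^{N}\sum_{q=1}^{Q} c_{o,n,i,q}.
\]
Also we have
\[
\beta^{post}_{f,l,j} = \beta_{f,j} + \sum_{n=1}^{N} c_{f,n,l,j} \quad \forall l = 1, \ldots, L \text{ and } j = 1, 2, \ldots, Q,
\]
\[
\beta^{post}_{o,i,j} = \beta_{o,j} + \sum_{n=1}^{N} c_{o,n,i,j} \quad \forall i = 1, \ldots, M \text{ and } j = 1, 2, \ldots, Q.
\]

Proof. This is immediate from the previous theorem.

5.12 BDCaP-Conjugacy-Proof Statement

Theorem 5.12.1. Let $A_{prior} = \sum_{k=1}^{\infty} p_k \delta_{\omega_k}$ be a discrete CRM on $[0,1]^Q$ with atom locations in $[0,1]$. Let $p_\cdot = \sum_{j=1}^{Q} p_j$. Suppose it has the following components.

- There is no deterministic component.
- The ordinary component is generated from a Poisson point process with intensity $\nu_c(d\omega, dp) = \nu_c(dp)\,d\omega$ such that $\nu_c$ is absolutely continuous and $\nu_c([0,1]^Q) < \infty$. In particular, the $Q$-dimensional weights lie on the "$p$" axes and the atom locations on the "$\omega$" axis.
- There are $L$ fixed atoms at locations $u_1, u_2, \ldots, u_L \in [0,1]$. The $Q$-dimensional weight of the $l$-th fixed atom is a random variable with distribution $dG_l$.

Draw one CaP $X$ with input measure $A_{prior}$. Let us assume there is only one non-zero atom of $X$, and let $\{c_1, s_1\}$ be the observed data vector together with its location. A non-zero atom means that the categorical data vector has a 1 that can appear in any of the last $Q$ positions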


only, so $c_{10}$ cannot be 1. Note that $c_1 = \{c_{10}, c_{11}, \ldots, c_{1Q}\}$ is a $(Q+1)$-dimensional vector with a single 1 in some position $j = 1, \ldots, Q$. The posterior process given the CaP $X$ is a CRM $A_{post}$ with the following components.

- There is no deterministic component.
- The ordinary component is generated from a Poisson point process with intensity
\[ (1-p_\cdot)\,\nu_c(d\omega, dp) = (1-p_\cdot)\,\nu_c(dp)\,d\omega. \]
- There are three types of fixed atoms.
  1. An old, repeated fixed atom. If $u_l = s_1$, then there is a fixed atom at $u_l$ with weight density
  \[ \frac{1}{W_{rf}} \prod_{j=1}^{Q} p_j^{I[c_{1j}=1]}\,dG_l(p), \]
  where $W_{rf}$ is the normalizing constant
  \[ W_{rf} = \int_{p \in [0,1]^Q} \prod_{j=1}^{Q} p_j^{I[c_{1j}=1]}\,dG_l(p). \]
  2. Old, unrepeated fixed atoms. If $u_l \ne s_1$, then there is a fixed atom at $u_l$ with weight density
  \[ \frac{1}{W_{uf}} (1-p_\cdot)\,dG_l(p), \]
  where $W_{uf}$ is the normalizing constant
  \[ W_{uf} = \int_{p \in [0,1]^Q} (1-p_\cdot)\,dG_l(p). \]
  3. A new fixed atom. If $s_1 \notin \{u_1, \ldots, u_L\}$, then there is a fixed atom at $s_1$ with weight density
  \[ \frac{1}{W_{new}} \prod_{j=1}^{Q} p_j^{I[c_{1j}=1]}\,\nu_c(dp), \]
  where $W_{new}$ is the normalizing constant
  \[ W_{new} = \int_{p \in [0,1]^Q} \prod_{j=1}^{Q} p_j^{I[c_{1j}=1]}\,\nu_c(dp). \]

Proof. This can also be verified from Theorem 3 of Kim in [3].
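When the prior weight distribution $G_l$ of a fixed atom is itself Beta-Dirichlet, the reweightings in the theorem stay within the BD family: multiplying by $p_j$ raises the first Beta parameter and the $j$-th Dirichlet parameter by one, while multiplying by $(1-p_\cdot)$ raises the second Beta parameter by one. The sketch below applies these updates over a sequence of observations for a single fixed atom. It assumes a BD prior with parameters named `rho`, `sigma`, `betas` (matching the alternative parametrization of Theorem 5.11.2) and encodes each observation as a category label, with 0 meaning the atom did not appear; the function name and this encoding are illustrative choices.

```python
def update_fixed_atom(rho, sigma, betas, observations):
    """Posterior BD parameters of one fixed atom after categorical observations.

    observations: iterable of labels in {0, 1, ..., Q}; label 0 means the atom
    was not selected in that draw, label j >= 1 means category j was observed.
    """
    betas = list(betas)
    for label in observations:
        if label == 0:
            sigma += 1            # density reweighted by (1 - p.)
        else:
            rho += 1              # density reweighted by p_j ...
            betas[label - 1] += 1  # ... which also raises the j-th Dirichlet count
    return rho, sigma, betas


# Four draws: category 1 twice, category 2 once, and one draw without the atom.
posterior = update_fixed_atom(1.0, 2.0, [0.5, 0.5], [1, 0, 2, 1])
```

With $N$ observations this reproduces the counts in Theorem 5.11.2: `rho` grows by the number of non-zero labels and `sigma` by $N$ minus that number.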


5.13 Extension of Indian Buffet Process

Latent feature models have been used extensively in various applications of machine learning. In the non-parametric Bayesian setup the number of features is often unknown. One of the important works in this direction is [11], where a stochastic process named the IBP is defined. It is a prior for generating infinite binary matrices. Metaphorically, it is the process of $N$ customers tasting dishes in an infinite buffet. Let $Z_i$ be the binary vector where $Z_{ik} = 1$ if customer (object) $i$ tastes dish (feature) $k$. Customer $i$ tastes dish $k$ with probability $\frac{m_k}{i}$, which reflects the popularity of the dish so far. Having gone through the previously sampled dishes, the $i$-th customer tries a number of new dishes drawn from a $\mathrm{Poisson}(\frac{\alpha}{i})$ distribution. The resulting exchangeable distribution of the IBP is invariant to permutation of the columns. By De Finetti's theorem, for any exchangeable distribution there is an underlying random measure such that the samples are conditionally independent given that measure. For the IBP, the BP was shown to be the underlying De Finetti measure. It is natural to extend this model to a categorical setting, where each entry of the matrix is not necessarily 0 or 1 but an integer in $\{0, 1, \ldots, Q\}$, where $Q$ is fixed a priori. We now define a prior on this infinite $(Q+1)$-nary random matrix. We call this stochastic process the Categorical IBP, or cIBP.

Categorical IBP. The cIBP is a direct generalization of the IBP discussed above. The main difference is that, instead of binary entries, each entry takes one of $(Q+1)$ categories denoted by the integers $0, 1, 2, \ldots, Q$. Let us describe the cIBP in metaphorical language. The setting is exactly the same, with $N$ customers in an infinite buffet restaurant. Now assume each dish comes with a choice. For example, assume for the moment that $Q = 3$, and that 1, 2 and 3 denote the spice level of a particular dish: mild (1), medium (2) and hot (3). As the $i$-th customer walks in, he or she chooses dish $k$ with spice level $j$ with probability $\bigl(\frac{m_{k\cdot}}{i}\bigr)\bigl(\frac{\beta_j + m_{kj}}{\beta + m_{k\cdot}}\bigr)$, where $m_{k\cdot} = \sum_{j=1}^{Q} m_{kj}$ and $\beta = \sum_{j=1}^{Q} \beta_j$. Here $m_{k\cdot}$ denotes the number of customers who have tasted the


dish $k$ in total, and $m_{kj}$ denotes the number of customers who have tasted dish $k$ with the specific spice level $j$. Finally, customer $i$ tastes new dishes with spice level $j$, their number determined by a draw from $\mathrm{Poisson}\bigl((\frac{\beta_j}{\beta})\frac{\alpha}{i}\bigr)$. The cIBP can also be shown to generate an exchangeable distribution over $(Q+1)$-nary random matrices, and we will explicitly find the underlying De Finetti measure of this exchangeable distribution, which happens to be the BD process in our case.

5.14 Extension of Finite Feature Model and the Limiting Case

If one looks at the finite IBP model, the natural prior for extending it to a categorical version would be the Dirichlet. Here we have a feature whose value can lie in the set $\{0, 1, 2, \ldots, Q\}$, so there are $(Q+1)$ choices in total. We treat the value 0 as possession of no feature. The remaining values $\{1, 2, \ldots, Q\}$ can be seen as the choices for any particular feature. $Q$ is fixed beforehand. A typical matrix $Z$ might look like figure 5-3. Let the number of rows be $N$ and the number of columns be $K$. Consider each row as an object and each column as a feature. We assume each object possesses category $q$ of feature $k$ with probability $\pi_{kq}$. We will later derive the expression as $K \to \infty$.

Figure 5-3. A candidate matrix with $Q = 3$ has 4 categories, namely 0, 1, 2 or 3.

We will assume that the probability of matrix $Z$ given


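$\pi = \{\tilde{\pi}_1, \tilde{\pi}_2, \ldots, \tilde{\pi}_K\}$, where each $\tilde{\pi}_k$ is a vector of the form $\{\pi_{k1}, \pi_{k2}, \ldots, \pi_{kQ}\}$ such that $\sum_{j=1}^{Q} \pi_{kj} \le 1$ for all $k = 1, 2, \ldots, K$. Also let $\pi_{k0}$ denote the probability of generating the 0-th category of the feature, i.e. no feature. Note that each $z_{ik}$ is now a Categorical/Discrete variable taking values in the set $\{0, 1, \ldots, Q\}$. Let us compute the probability of generating such a matrix:
\[
p(Z \mid \pi) = \prod_{k=1}^{K}\prod_{i=1}^{N} p(z_{ik} \mid \tilde{\pi}_k) = \prod_{k=1}^{K} \pi_{k1}^{m_{k1}} \pi_{k2}^{m_{k2}} \cdots \pi_{kQ}^{m_{kQ}} (1-\pi_k)^{m_{k0}},
\]
where $\pi_k = \sum_{j=1}^{Q} \pi_{kj}$ and $m_{kj}$ denotes the number of objects, out of the $N$ objects in total, possessing category $j$ of feature $k$, i.e. $m_{kj} = \sum_{i=1}^{N} I(z_{ik} = j)$. Let us define a $(Q+1)$-dimensional Dirichlet prior on $\tilde{\pi}_k$ with parameters $\{\frac{\alpha}{K}, \frac{\alpha}{K}, \ldots, \frac{\alpha}{K}, 1\}$ (there are $Q$ copies of $\frac{\alpha}{K}$ in total). So clearly,
\[
p(\tilde{\pi}_k) = \frac{\Gamma(Qr+1)}{\bigl[\prod_{j=1}^{Q}\Gamma(r)\bigr]\Gamma(s)}\; \pi_{k1}^{r-1} \pi_{k2}^{r-1} \cdots \pi_{kQ}^{r-1} (1-\pi_k)^{s-1},
\]
where $r = \frac{\alpha}{K}$ and $s = 1$. So the model can be specified as follows:
\[
z_{ik} \mid \tilde{\pi}_k \sim \mathrm{Discrete}(\tilde{\pi}_k), \qquad \tilde{\pi}_k \sim \mathrm{Dirichlet}\bigl(\tfrac{\alpha}{K}, \tfrac{\alpha}{K}, \ldots, \tfrac{\alpha}{K}, 1\bigr).
\]
Note that each $z_{ik}$ is independent of all other assignments, and the $\tilde{\pi}_k$ are generated independently. So
\[
\begin{aligned}
p(Z) &= \prod_{k=1}^{K} \int_{[0,1]^Q} \Bigl(\prod_{i=1}^{N} p(z_{ik} \mid \tilde{\pi}_k)\Bigr) p(\tilde{\pi}_k)\,d\tilde{\pi}_k \\
&= \prod_{k=1}^{K} \frac{\Gamma(m_{k1}+\frac{\alpha}{K}) \cdots \Gamma(m_{kQ}+\frac{\alpha}{K})\,\Gamma(N-m_k+1)}{\Gamma(N+1+Q\frac{\alpha}{K})}\cdot\frac{\Gamma(Q\frac{\alpha}{K}+1)}{\bigl(\Gamma(\frac{\alpha}{K})\bigr)^Q} \\
&= \prod_{k=1}^{K} \frac{\Gamma(Q\frac{\alpha}{K}+1)\bigl[\prod_{j=1}^{Q}\Gamma(m_{kj}+\frac{\alpha}{K})\bigr]\Gamma(N-m_k+1)}{\Gamma(N+1+Q\frac{\alpha}{K})\bigl(\Gamma(\frac{\alpha}{K})\bigr)^Q},
\end{aligned}
\]
where $m_k = \sum_{j=1}^{Q} m_{kj}$. This is an exchangeable distribution because it depends only on the counts $m_{kj}$. The expected number of non-zero entries (obtained by collapsing the $(Q+1)$-dimensional Dirichlet to a Beta variable for a single column, considering only two possibilities: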


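feature category 0 and non-zero) is
\[
K \sum_{i=1}^{N} \int_0^1 (1-\pi_{k0})\,p(\pi_{k0})\,d\pi_{k0} = NK\,\frac{Q\frac{\alpha}{K}}{1+Q\frac{\alpha}{K}} \;\to\; N\alpha Q \quad (\text{as } K \to \infty).
\]
So the expected number of non-zero entries is bounded by $N\alpha Q$ as $K \to \infty$.

Equivalence Class. The left-ordered $(Q+1)$-nary matrix, $\mathrm{lof}(Z)$, is obtained by ordering the columns of $Z$ from left to right by the magnitude of the $(Q+1)$-nary number (i.e. the number represented in base $(Q+1)$) expressed by that column, taking the first row as the most significant digit. The full history of feature $k$ is the decimal equivalent of the $(Q+1)$-nary number represented by the vector $(z_{1k}, z_{2k}, \ldots, z_{Nk})$. Define

- $K_h$ = number of features having history $h$,
- $K_0$ = number of features for which $m_k = 0$,
- $K_+ = \sum_{h=1}^{(Q+1)^N-1} K_h$ = number of features for which $m_k > 0$.

So we have $K = K_0 + K_+$. Let us denote the equivalence class by $[Z]$. By the definition of $\mathrm{lof}(\cdot)$, exactly
\[ \frac{K!}{K_0!\,K_1! \cdots K_{(Q+1)^N-1}!} \]
matrices are mapped to the same left-ordered matrix. So now we derive $p([Z])$:
\[
p([Z]) = \sum_{Z \in [Z]} p(Z) = \Biggl(\frac{K!}{\prod_{h=0}^{(Q+1)^N-1} K_h!}\Biggr) \prod_{k=1}^{K} \frac{\Gamma(Q\frac{\alpha}{K}+1)\bigl[\prod_{j=1}^{Q}\Gamma(m_{kj}+\frac{\alpha}{K})\bigr]\Gamma(N-m_k+1)}{\Gamma(N+1+Q\frac{\alpha}{K})\bigl(\Gamma(\frac{\alpha}{K})\bigr)^Q} \tag{5-5}
\]
Let us first simplify the expression for $p(Z)$. Note that if the number of objects having feature $k$ is 0, then $m_k = 0$ and all $m_{kj} = 0$; the number of such features is $K_0$.
\[
\prod_{k=1}^{K} \frac{\Gamma(Q\frac{\alpha}{K}+1)\bigl[\prod_{j=1}^{Q}\Gamma(m_{kj}+\frac{\alpha}{K})\bigr]\Gamma(N-m_k+1)}{\Gamma(N+1+Q\frac{\alpha}{K})\bigl(\Gamma(\frac{\alpha}{K})\bigr)^Q}
\]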


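\[
\begin{aligned}
&= \left( \frac{\Gamma(Q\alpha/K + 1) \, (\Gamma(\alpha/K))^{Q} \, \Gamma(N+1)}{(\Gamma(\alpha/K))^{Q} \, \Gamma(N + 1 + Q\alpha/K)} \right)^{K - K_+} \prod_{k=1}^{K_+} \frac{\Gamma(Q\alpha/K + 1) \left[\prod_{j=1}^{Q} \Gamma(m_{kj} + \alpha/K)\right] \Gamma(N - m_k + 1)}{(\Gamma(\alpha/K))^{Q} \, \Gamma(N + 1 + Q\alpha/K)} \\
&= \left( \frac{\Gamma(Q\alpha/K + 1) \, \Gamma(N+1)}{\Gamma(N + 1 + Q\alpha/K)} \right)^{K - K_+} \prod_{k=1}^{K_+} \frac{\Gamma(Q\alpha/K + 1) \left[\prod_{j=1}^{Q} \Gamma(m_{kj} + \alpha/K)\right] \Gamma(N - m_k + 1)}{(\Gamma(\alpha/K))^{Q} \, \Gamma(N + 1 + Q\alpha/K)} \\
&= \left( \frac{\Gamma(Q\alpha/K + 1) \, \Gamma(N+1)}{\Gamma(N + 1 + Q\alpha/K)} \right)^{K} \frac{(\Gamma(N + 1 + Q\alpha/K))^{K_+}}{(\Gamma(Q\alpha/K + 1))^{K_+} \, (\Gamma(N+1))^{K_+}} \prod_{k=1}^{K_+} \frac{\Gamma(Q\alpha/K + 1) \left[\prod_{j=1}^{Q} \Gamma(m_{kj} + \alpha/K)\right] \Gamma(N - m_k + 1)}{(\Gamma(\alpha/K))^{Q} \, \Gamma(N + 1 + Q\alpha/K)} \\
&= \left( \frac{\Gamma(Q\alpha/K + 1) \, \Gamma(N+1)}{\Gamma(N + 1 + Q\alpha/K)} \right)^{K} \prod_{k=1}^{K_+} \frac{\left[\prod_{j=1}^{Q} \Gamma(m_{kj} + \alpha/K)\right] \Gamma(N - m_k + 1)}{(\Gamma(\alpha/K))^{Q} \, \Gamma(N+1)} \\
&= \left( \frac{N!}{\Gamma(N + 1 + Q\alpha/K) / \Gamma(Q\alpha/K + 1)} \right)^{K} \prod_{k=1}^{K_+} \frac{(N - m_k)! \left[\prod_{j=1}^{Q} \frac{\Gamma(m_{kj} + \alpha/K)}{\Gamma(\alpha/K)}\right]}{N!} \\
&= \left( \frac{N!}{(N + Q\alpha/K)! / (Q\alpha/K)!} \right)^{K} \prod_{k=1}^{K_+} \frac{(N - m_k)! \left[\prod_{j=1}^{Q} \frac{\Gamma(m_{kj} + \alpha/K)}{\Gamma(\alpha/K)} \cdot \frac{\alpha/K}{\alpha/K}\right]}{N!} \\
&= \left( \frac{N!}{(N + Q\alpha/K)! / (Q\alpha/K)!} \right)^{K} \prod_{k=1}^{K_+} \left( \frac{(N - m_k)!}{N!} \left(\frac{\alpha}{K}\right)^{Q} \left[ \prod_{j=1}^{Q} \frac{(m_{kj} - 1 + \alpha/K)!}{(\alpha/K)!} \right] \right) \\
&= \left( \frac{N!}{\prod_{j=1}^{N} (j + Q\alpha/K)} \right)^{K} \left(\frac{\alpha}{K}\right)^{Q K_+} \prod_{k=1}^{K_+} \left( \frac{(N - m_k)!}{N!} \left[ \prod_{l=1}^{Q} \prod_{j=1}^{m_{kl} - 1} \left(j + \frac{\alpha}{K}\right) \right] \right)
\end{aligned}
\]

Now from equation 5-5 we have,

\[ p([Z]) = \left( \frac{K!}{\prod_{h=0}^{(Q+1)^{N}-1} K_h!} \right) \left( \frac{N!}{\prod_{j=1}^{N} (j + Q\alpha/K)} \right)^{K} \left(\frac{\alpha}{K}\right)^{Q K_+} \prod_{k=1}^{K_+} \left( \frac{(N - m_k)!}{N!} \left[ \prod_{l=1}^{Q} \prod_{j=1}^{m_{kl} - 1} \left(j + \frac{\alpha}{K}\right) \right] \right) \]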


\[ = \underbrace{\frac{\alpha^{Q K_+}}{\prod_{h=1}^{(Q+1)^{N}-1} K_h!}}_{(1)} \; \underbrace{\frac{K!}{K_0! \, K^{Q K_+}}}_{(2)} \; \underbrace{\left( \frac{N!}{\prod_{j=1}^{N} (j + Q\alpha/K)} \right)^{K}}_{(3)} \; \underbrace{\prod_{k=1}^{K_+} \left( \frac{(N - m_k)!}{N!} \left[ \prod_{l=1}^{Q} \prod_{j=1}^{m_{kl} - 1} \left(j + \frac{\alpha}{K}\right) \right] \right)}_{(4)} \]

(The factor $K_0!$ is carried by term (2), so the product in term (1) runs from $h = 1$.)

Now from (3) we have,

\[ \left( \frac{N!}{\prod_{j=1}^{N} (j + Q\alpha/K)} \right)^{K} = \prod_{j=1}^{N} \left( \frac{1}{1 + \frac{Q\alpha/j}{K}} \right)^{K} \]

Now, we know that $\lim_{K \to \infty} \left( \frac{1}{1 + x/K} \right)^{K} = \exp\{-x\}$. Using this we have

\[ \prod_{j=1}^{N} \exp\left( -\frac{Q\alpha}{j} \right) = \exp\{-Q \alpha H_N\} \quad \text{as } K \to \infty, \]

where $H_N = \sum_{j=1}^{N} \frac{1}{j}$. From (4) we have,

\[ \prod_{j=1}^{m_{kl} - 1} \left(j + \frac{\alpha}{K}\right) = (m_{kl} - 1)! + \frac{\alpha}{K} [\text{other terms}] \to (m_{kl} - 1)! \quad \text{as } K \to \infty \]

So (4) becomes,

\[ \prod_{k=1}^{K_+} \left( \frac{(N - m_k)! \prod_{l=1}^{Q} (m_{kl} - 1)!}{N!} \right) \]

For (2), first we note that $K_0 = K - K_+$ and

\[ K! = (K_0 + K_+)! = K_0! \, \big( (K_0 + 1)(K_0 + 2) \cdots (K_0 + K_+) \big) = K_0! \, \big( (K - K_+ + 1) \cdots (K - K_+ + K_+) \big) = K_0! \left( \prod_{j=1}^{K_+} (K - j + 1) \right) \]

Using this we have,

\[ \frac{K!}{K_0! \, K^{Q K_+}} = \frac{K_0! \left[ \prod_{j=1}^{K_+} (K - j + 1) \right]}{K_0! \, K^{Q K_+}} = \frac{\prod_{j=1}^{K_+} (K - j + 1)}{K^{Q K_+}} \to 0 \quad \text{when } Q > 1 \]
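The two limits used above can be checked numerically. The following is a minimal sketch, not part of the original derivation: it assumes the notation $\alpha$ for the mass parameter, and the function names `term2`, `term3` and `harmonic` are hypothetical helpers introduced only for this illustration. It evaluates term (3) and term (2) in log space for growing $K$ and compares term (3) with $\exp\{-Q\alpha H_N\}$.

```python
import math


def harmonic(N):
    """H_N = sum_{j=1}^{N} 1/j."""
    return sum(1.0 / j for j in range(1, N + 1))


def term3(N, Q, alpha, K):
    """Term (3): (N! / prod_{j=1}^{N} (j + Q*alpha/K))^K, computed in log space."""
    log_ratio = sum(
        math.log(j) - math.log(j + Q * alpha / K) for j in range(1, N + 1)
    )
    return math.exp(K * log_ratio)


def term2(K, K_plus, Q):
    """Term (2): K! / (K_0! K^(Q K_+)) = prod_{j=1}^{K_+} (K - j + 1) / K^(Q K_+)."""
    log_val = sum(math.log(K - j + 1) for j in range(1, K_plus + 1))
    log_val -= Q * K_plus * math.log(K)
    return math.exp(log_val)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the thesis.
    N, Q, alpha, K_plus = 5, 2, 1.5, 3
    limit3 = math.exp(-Q * alpha * harmonic(N))
    for K in (10, 1000, 100000):
        print(K, term3(N, Q, alpha, K), limit3, term2(K, K_plus, Q))
```

With $Q = 1$ the same `term2` tends to 1 instead of 0, which is the ordinary IBP case; the vanishing of term (2) for $Q > 1$ is what makes the Dirichlet prior degenerate in the limit.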


This shows that the probability of generating this matrix under the Dirichlet prior goes to 0. This signifies that these will all be independent Beta processes, as stated by Hjort in his paper. Note that:

• One of the important characteristics of this matrix is that it has more than 2 features in a column.

5.15 BD Process (BDP) and Categorical Indian Buffet Process (cIBP)

The BD process was defined by Kim et al. [ 3 ] in the context of multistate event history data from survival analysis. They recently applied this new prior in a Bayesian semi-parametric regression model for credit history data. The BD process can be considered an extension of Hjort's Beta process [ 18 ]. One important thing to note is that in a discrete time model, one can start with the natural conjugate prior for categorical data, which is the Dirichlet. But if we take the limit, the cumulative intensity functions become independent in the limit (Theorem 5.1 in Hjort). To eliminate this property, Kim et al. propose this novel prior, which has the desirable property that the cumulative intensity functions are dependent in the limit. Let us take the BD distribution with parameters $(\alpha_1, \alpha_2, (\beta_1, \beta_2, \ldots, \beta_Q))$. So instead of the traditional Dirichlet we will take the BD with those specific parameters. Our goal is to generate a matrix which would look like the following.

Figure 5-4. A candidate matrix with Q = 2 has 3 categories, namely 0, 1, 2.

First let us write down the density function of the


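Q-dimensional BD, which is given by:

\[ \frac{\Gamma(\beta)}{\prod_{i=1}^{Q} \Gamma(\beta_i)} \, \frac{\Gamma(\alpha_1 + \alpha_2)}{\Gamma(\alpha_1) \Gamma(\alpha_2)} \, (x_{\cdot})^{\alpha_1 - \beta} (1 - x_{\cdot})^{\alpha_2 - 1} \prod_{i=1}^{Q} x_i^{\beta_i - 1} \]

where $\beta = \sum_{i=1}^{Q} \beta_i$ and $x_{\cdot} = \sum_{i=1}^{Q} x_i$.

5.15.1 Symmetric Dirichlet

In this case we will choose a symmetric Dirichlet with all parameters equal to $\beta$, and $\alpha_1 = \alpha/K$ and $\alpha_2 = 1$. So $x_{\cdot}$ will follow a Beta density with parameters $(\alpha/K, 1)$ and $\{x_1/x_{\cdot}, \ldots, x_Q/x_{\cdot}\}$ will follow a Dirichlet with $\{\beta, \ldots, \beta\}$. So the full generative model of this matrix is

\[ z_{ik} \mid \tilde{\pi}_k \sim \mathrm{Discrete}(\tilde{\pi}_k), \qquad \tilde{\pi}_k \sim \text{Beta-Dirichlet}\left( \frac{\alpha}{K}, 1, (\beta, \ldots, \beta) \right) \]

Now we will derive the expression for finite $K$ first and then obtain the expression when $K \to \infty$. Let us denote $\sum_{j=1}^{Q} m_{kj} = m_{k\cdot}$, and also note that in order for $m_{k\cdot}$ to be 0, all the $m_{kj}$ have to be 0. Using the same technique as before we have

\[
\begin{aligned}
p(Z) &= \prod_{k=1}^{K} \frac{\Gamma(Q\beta) \, \Gamma(\alpha/K + 1)}{\Gamma(\alpha/K) \, (\Gamma(\beta))^{Q}} \cdot \frac{\left[\prod_{j=1}^{Q} \Gamma(\beta + m_{kj})\right] \Gamma(N - m_{k\cdot} + 1) \, \Gamma(\alpha/K + m_{k\cdot})}{\Gamma(\alpha/K + N + 1) \, \Gamma(Q\beta + m_{k\cdot})} \\
&= \left( \frac{\Gamma(N+1) \, \Gamma(\alpha/K + 1)}{\Gamma(\alpha/K + N + 1)} \right)^{K - K_+} \prod_{k=1}^{K_+} \frac{\Gamma(Q\beta) \, \frac{\alpha}{K} \left[\prod_{j=1}^{Q} \Gamma(\beta + m_{kj})\right] \Gamma(N - m_{k\cdot} + 1) \, \Gamma(\alpha/K + m_{k\cdot})}{(\Gamma(\beta))^{Q} \, \Gamma(\alpha/K + N + 1) \, \Gamma(Q\beta + m_{k\cdot})} \\
&= \left( \frac{\Gamma(N+1) \, \Gamma(\alpha/K + 1)}{\Gamma(\alpha/K + N + 1)} \right)^{K} \left( \frac{\Gamma(\alpha/K + N + 1)}{\Gamma(N+1) \, \Gamma(\alpha/K + 1)} \right)^{K_+} \prod_{k=1}^{K_+} \frac{\Gamma(Q\beta) \, \frac{\alpha}{K} \left[\prod_{j=1}^{Q} \Gamma(\beta + m_{kj})\right] \Gamma(N - m_{k\cdot} + 1) \, \Gamma(\alpha/K + m_{k\cdot})}{(\Gamma(\beta))^{Q} \, \Gamma(\alpha/K + N + 1) \, \Gamma(Q\beta + m_{k\cdot})} \\
&= \left( \frac{N! \, \Gamma(\alpha/K + 1)}{\Gamma(\alpha/K + N + 1)} \right)^{K} \times
\end{aligned}
\]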


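\[
\begin{aligned}
& \prod_{k=1}^{K_+} \frac{\Gamma(Q\beta) \, \frac{\alpha}{K} \left[\prod_{j=1}^{Q} \Gamma(\beta + m_{kj})\right] \Gamma(N - m_{k\cdot} + 1) \, \Gamma(\alpha/K + m_{k\cdot}) \, \Gamma(\alpha/K + N + 1)}{(\Gamma(\beta))^{Q} \, \Gamma(\alpha/K + N + 1) \, \Gamma(\alpha/K + 1) \, \Gamma(N+1) \, \Gamma(Q\beta + m_{k\cdot})} \\
&= \left( \frac{N! \, (\alpha/K)!}{(\alpha/K + N)!} \right)^{K} \left(\frac{\alpha}{K}\right)^{K_+} \prod_{k=1}^{K_+} \frac{\Gamma(Q\beta)}{\Gamma(Q\beta + m_{k\cdot})} \, \frac{\Gamma(\alpha/K + m_{k\cdot})}{\Gamma(\alpha/K + 1)} \, \frac{\Gamma(N - m_{k\cdot} + 1)}{\Gamma(N+1)} \left[ \prod_{j=1}^{Q} \frac{\Gamma(\beta + m_{kj})}{\Gamma(\beta)} \right] \\
&= \left( \frac{N! \, (\alpha/K)!}{(\alpha/K + N)!} \right)^{K} \left(\frac{\alpha}{K}\right)^{K_+} \prod_{k=1}^{K_+} \frac{\Gamma(N - m_{k\cdot} + 1)}{\Gamma(N+1)} \, \frac{\Gamma(\alpha/K + m_{k\cdot})}{\Gamma(\alpha/K + 1)} \, \frac{Q\beta}{Q\beta} \, \frac{\Gamma(Q\beta)}{\Gamma(Q\beta + m_{k\cdot})} \left[ \prod_{j=1}^{Q} \frac{\Gamma(\beta + m_{kj})}{\Gamma(\beta)} \right] \\
&= \left( \frac{N! \, (\alpha/K)!}{(\alpha/K + N)!} \right)^{K} \left(\frac{\alpha}{K}\right)^{K_+} \left(\frac{1}{Q}\right)^{K_+} \prod_{k=1}^{K_+} \frac{\Gamma(N - m_{k\cdot} + 1)}{\Gamma(N+1)} \, \frac{\Gamma(\alpha/K + m_{k\cdot})}{\Gamma(\alpha/K + 1)} \, \frac{(Q\beta)!}{(Q\beta + m_{k\cdot} - 1)!} \left[ \frac{1}{\beta} \prod_{j=1}^{Q} \frac{\Gamma(\beta + m_{kj})}{\Gamma(\beta)} \right]
\end{aligned}
\]

Continuing in this manner (and using $\frac{N! \, (\alpha/K)!}{(\alpha/K + N)!} = \frac{N!}{\prod_{j=1}^{N} (j + \alpha/K)}$) we have,

\[
\begin{aligned}
&= \left( \frac{N!}{\prod_{j=1}^{N} (j + \alpha/K)} \right)^{K} \left(\frac{\alpha}{K}\right)^{K_+} \left(\frac{1}{Q}\right)^{K_+} \prod_{k=1}^{K_+} \frac{(N - m_{k\cdot})!}{N!} \, \frac{(\alpha/K + m_{k\cdot} - 1)!}{(\alpha/K)!} \, \frac{(Q\beta)!}{(Q\beta + m_{k\cdot} - 1)!} \left[ \frac{1}{\beta} \prod_{j=1}^{Q} \frac{\Gamma(\beta + m_{kj})}{\Gamma(\beta)} \right] \\
&= \left( \frac{N!}{\prod_{j=1}^{N} (j + \alpha/K)} \right)^{K} \left(\frac{\alpha}{K}\right)^{K_+} \left(\frac{1}{Q}\right)^{K_+} \prod_{k=1}^{K_+} \frac{(N - m_{k\cdot})!}{N!} \left[ \prod_{j=1}^{m_{k\cdot} - 1} \left(j + \frac{\alpha}{K}\right) \right] \left[ \frac{1}{\prod_{j=1}^{m_{k\cdot} - 1} (j + Q\beta)} \, \frac{1}{\beta} \prod_{j=1}^{Q} \frac{\Gamma(\beta + m_{kj})}{\Gamma(\beta)} \right]
\end{aligned}
\]

Let us now take $K \to \infty$; then we have

\[ p([Z]) = \lim_{K \to \infty} \left( \frac{K!}{K_0! \prod_{h=1}^{(Q+1)^{N}-1} K_h!} \right) \left( \frac{N!}{\prod_{j=1}^{N} (j + \alpha/K)} \right)^{K} \left(\frac{\alpha}{K}\right)^{K_+} \left(\frac{1}{Q}\right)^{K_+} \prod_{k=1}^{K_+} \frac{(N - m_{k\cdot})!}{N!} \left[ \prod_{j=1}^{m_{k\cdot} - 1} \left(j + \frac{\alpha}{K}\right) \right] \left[ \frac{1}{\prod_{j=1}^{m_{k\cdot} - 1} (j + Q\beta)} \, \frac{1}{\beta} \prod_{j=1}^{Q} \frac{\Gamma(\beta + m_{kj})}{\Gamma(\beta)} \right] \]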


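\[ \implies \lim_{K \to \infty} \frac{K!}{K_0! \, K^{K_+}} \left( \frac{(\alpha/Q)^{K_+}}{\prod_{h=1}^{(Q+1)^{N}-1} K_h!} \right) \left( \frac{N!}{\prod_{j=1}^{N} (j + \alpha/K)} \right)^{K} \prod_{k=1}^{K_+} \frac{(N - m_{k\cdot})!}{N!} \left[ \prod_{j=1}^{m_{k\cdot} - 1} \left(j + \frac{\alpha}{K}\right) \right] \left[ \frac{1}{\prod_{j=1}^{m_{k\cdot} - 1} (j + Q\beta)} \, \frac{1}{\beta} \prod_{j=1}^{Q} \frac{\Gamma(\beta + m_{kj})}{\Gamma(\beta)} \right] \]

Now we know

\[ \frac{K!}{K_0! \, K^{K_+}} \to 1 \quad \text{as } K \to \infty \]
\[ \left( \frac{N!}{\prod_{j=1}^{N} (j + \alpha/K)} \right)^{K} \to \exp(-\alpha H_N) \quad \text{as } K \to \infty \]
\[ \left[ \prod_{j=1}^{m_{k\cdot} - 1} \left(j + \frac{\alpha}{K}\right) \right] \to (m_{k\cdot} - 1)! \quad \text{as } K \to \infty \]

So in the limit it becomes,

\[ \left( \frac{(\alpha/Q)^{K_+}}{\prod_{h=1}^{(Q+1)^{N}-1} K_h!} \right) \exp(-\alpha H_N) \prod_{k=1}^{K_+} \left\{ \frac{(N - m_{k\cdot})! \, (m_{k\cdot} - 1)!}{N!} \left[ \frac{1}{\prod_{j=1}^{m_{k\cdot} - 1} (j + Q\beta)} \, \frac{1}{\beta} \prod_{j=1}^{Q} \frac{\Gamma(\beta + m_{kj})}{\Gamma(\beta)} \right] \right\} \]

Note: The $\beta$ in the denominator compensates for the fact that the numerator corresponding to the first row of a column starts contributing from $(\beta + 1)$; so in order to make it start from $\beta$, we need this constant. We now give one example for the first column of the given matrix ($Q = 2$) shown in Figure 5-4. It will be 1 2+1 2+12 3 2+21 43 5+2 2+3 Thus we can verify the equation.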


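5.15.2 Asymmetric Dirichlet

Let the $Q$ different parameters for the Dirichlet be $\{\beta_1, \beta_2, \ldots, \beta_Q\}$. Then the only factor that will change is the following:

\[ \frac{\Gamma(Q\beta)}{\Gamma(Q\beta + m_{k\cdot})} \, \frac{1}{\beta} \prod_{j=1}^{Q} \frac{\Gamma(\beta + m_{kj})}{\Gamma(\beta)} \]

and it will change to

\[ \frac{\Gamma(\beta)}{\Gamma(\beta + m_{k\cdot})} \, \frac{1}{\beta_{f_k}} \prod_{j=1}^{Q} \frac{\Gamma(\beta_j + m_{kj})}{\Gamma(\beta_j)} \]

where $\beta = (\beta_1 + \beta_2 + \cdots + \beta_Q)$ and $\beta_{f_k}$ can be any of the $\beta_j$'s. $\beta_{f_k}$ corresponds to the Dirichlet parameter of the first symbol in $\{1, 2, \ldots, Q\}$ that got generated in the $k$-th column. Now multiply by a factor $\beta_{f_k}/\beta$ appropriately and assume that the $i$-th customer generates $K_i$ dishes, among which $K_i^{(f)}$ is the number of dishes with choice $f$. So previously we had $K_+ = \sum_{i=1}^{N} K_i$. Now each $K_i = \sum_{f=1}^{Q} K_i^{(f)}$. So $K_+ = \sum_{i=1}^{N} \sum_{f=1}^{Q} K_i^{(f)} = \sum_{f=1}^{Q} K_+^{(f)}$, where $K_+^{(f)}$ is the total number of new dishes generated with choice $f$. We have for finite $K$

\[ \left( \frac{N!}{\prod_{j=1}^{N} (j + \alpha/K)} \right)^{K} \left(\frac{\alpha}{K}\right)^{K_+} \left( \prod_{f=1}^{Q} \left(\frac{\beta_f}{\beta}\right)^{K_+^{(f)}} \right) \prod_{k=1}^{K_+} \left\{ \frac{(N - m_{k\cdot})!}{N!} \left[ \prod_{j=1}^{m_{k\cdot} - 1} \left(j + \frac{\alpha}{K}\right) \right] \left[ \frac{1}{\prod_{j=1}^{m_{k\cdot} - 1} (j + \beta)} \, \frac{1}{\beta_{f_k}} \prod_{j=1}^{Q} \frac{\Gamma(\beta_j + m_{kj})}{\Gamma(\beta_j)} \right] \right\} \]

So now the limiting distribution changes from

\[ \left( \frac{(\alpha/Q)^{K_+}}{\prod_{h=1}^{(Q+1)^{N}-1} K_h!} \right) \exp(-\alpha H_N) \prod_{k=1}^{K_+} \left\{ \frac{(N - m_{k\cdot})! \, (m_{k\cdot} - 1)!}{N!} \left[ \frac{1}{\prod_{j=1}^{m_{k\cdot} - 1} (j + Q\beta)} \, \frac{1}{\beta} \prod_{j=1}^{Q} \frac{\Gamma(\beta + m_{kj})}{\Gamma(\beta)} \right] \right\} \]

to this new distribution

\[ p([Z]) = \left( \frac{\alpha^{K_+} \prod_{f=1}^{Q} (\beta_f/\beta)^{K_+^{(f)}}}{\prod_{h=1}^{(Q+1)^{N}-1} K_h!} \right) \exp(-\alpha H_N) \; \times \]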


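\[ \prod_{k=1}^{K_+} \left\{ \frac{(N - m_{k\cdot})! \, (m_{k\cdot} - 1)!}{N!} \left[ \frac{1}{\prod_{j=1}^{m_{k\cdot} - 1} (j + \beta)} \, \frac{1}{\beta_{f_k}} \prod_{j=1}^{Q} \frac{\Gamma(\beta_j + m_{kj})}{\Gamma(\beta_j)} \right] \right\} \]

Now let us rewrite this expression in a better form. In the limit, the probability of generating a matrix is

\[ \prod_{f=1}^{Q} \left[ \left( \frac{(\alpha \beta_f/\beta)^{K_+^{(f)}}}{\prod_{i=1}^{N} K_i^{(f)}!} \right) \exp\left( -\frac{\alpha \beta_f}{\beta} H_N \right) \prod_{k=1}^{K_+^{(f)}} \left\{ \frac{(N - m_{k\cdot})! \, (m_{k\cdot} - 1)!}{N!} \left[ \frac{1}{\prod_{j=1}^{m_{k\cdot} - 1} (j + \beta)} \, \frac{1}{\beta_{f_k}} \prod_{j=1}^{Q} \frac{\Gamma(\beta_j + m_{kj})}{\Gamma(\beta_j)} \right] \right\} \right] \]

where $K_i^{(f)}$ is the number of new dishes with choice $f \in \{1, 2, \ldots, Q\}$ sampled by the $i$-th customer.

5.15.3 Connection

Now, let us write down the "metaphorical" quantities related to the cIBP.

• Let $z_{ik} = 0$ if customer $i$ does not taste dish $k$, and $z_{ik} = j$ if customer $i$ tastes dish $k$ with choice $j$.

• Customer $i$ tastes dish $k$ with choice $j$ with probability $\frac{m_{k\cdot}}{i} \cdot \frac{\beta_j + m_{kj}}{\beta + m_{k\cdot}}$, where $m_{kj}$ is the number of previous customers who tasted dish $k$ with choice $j$ and $m_{k\cdot}$ is the number of previous customers who tasted dish $k$ with any choice.

• From here we know that customer $i$ does not taste dish $k$ with probability $1 - \left( \sum_{j=1}^{Q} \frac{m_{kj}}{i} \right) = 1 - \frac{m_{k\cdot}}{i}$.

• Customer $i$ draws new dishes with choice $j$ from $\mathrm{Poi}\left( \frac{\alpha \beta_j}{\beta \, i} \right)$, totaling $\mathrm{Poi}\left( \frac{\alpha}{i} \right)$ new dishes.

The De Finetti mixing distribution behind the categorical IBP is the Beta-Dirichlet (BD) process. When we say $X_i \sim \mathrm{Categorical}(D)$, we mean that $X_i$ is an infinite vector with each entry being an integer between 0 and $Q$.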
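The buffet metaphor above translates directly into a sequential sampler. The following is a minimal sketch under stated assumptions: the symbols $\alpha$ (mass) and $\beta_1, \ldots, \beta_Q$ (Dirichlet parameters) are the notation assumed in this section, and the names `sample_poisson` and `sample_cibp` are hypothetical helpers, not from the thesis. It uses only the Python standard library, so the Poisson draw is implemented with Knuth's multiplication method, which is adequate for the small rates that occur here.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method for a Poisson(lam) draw (small lam only)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_cibp(N, alpha, beta, rng):
    """Draw an N-row matrix with entries in {0, ..., Q} via the cIBP scheme.

    beta is the list (beta_1, ..., beta_Q); entry z_ik = 0 means that
    customer i did not taste dish k, and z_ik = j means choice j.
    """
    Q = len(beta)
    beta_sum = sum(beta)
    columns = []  # columns[k][i - 1] = z_ik
    counts = []   # counts[k][j - 1] = m_kj over previous customers
    for i in range(1, N + 1):
        # Existing dishes: choice j with prob (m_k./i) (beta_j + m_kj)/(beta + m_k.),
        # and no tasting with prob 1 - m_k./i.
        for col, m_kj in zip(columns, counts):
            m_k = sum(m_kj)
            weights = [1.0 - m_k / i] + [
                (m_k / i) * (beta[j] + m_kj[j]) / (beta_sum + m_k)
                for j in range(Q)
            ]
            z = rng.choices(range(Q + 1), weights=weights)[0]
            col[i - 1] = z
            if z > 0:
                m_kj[z - 1] += 1
        # New dishes: Poisson(alpha * beta_j / (beta * i)) dishes with choice j.
        for j in range(1, Q + 1):
            rate = alpha * beta[j - 1] / (beta_sum * i)
            for _ in range(sample_poisson(rate, rng)):
                col = [0] * N
                col[i - 1] = j
                columns.append(col)
                counts.append([1 if q == j - 1 else 0 for q in range(Q)])
    return [[col[i] for col in columns] for i in range(N)]


if __name__ == "__main__":
    Z = sample_cibp(N=10, alpha=2.0, beta=[1.0, 2.0], rng=random.Random(1))
    for row in Z:
        print(row)
```

Columns are appended in order of first appearance, so the returned matrix is in "customer order" rather than left-ordered form; sorting the columns by their base-$(Q+1)$ value would give $lof(Z)$.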


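Lemma 5.15.1. Let $D \sim \mathrm{BD}(c, D_0, (\beta_1, \ldots, \beta_Q))$ and let $X_i \mid D \sim \mathrm{CaP}(D)$ for $i = 1, 2, \ldots, n$ be $n$ independent Categorical process draws from $D$. The posterior distribution of $D$ after observing $\{X_i\}_{i=1}^{n}$ is still a BD process with the following parameters:

\[ D \mid \{X_i\}_{i=1}^{n} \sim \mathrm{BD}\left( c + n, \; \frac{c}{c+n} D_0 + \frac{1}{c+n} \sum_{i=1}^{n} X_{i\cdot}, \; \left( \beta_1 + \sum_{i=1}^{n} X_{i1}, \ldots, \beta_Q + \sum_{i=1}^{n} X_{iQ} \right) \right) \]

Lemma 5.15.2. Combining the previous Lemma and using the fact that $P(X_{n+1} \mid \{X_i\}_{i=1}^{n}) = E_{D \mid \{X_i\}_{i=1}^{n}} P(X_{n+1} \mid D)$, we have the following predictive distribution formula:

\[ X_{n+1} \mid \{X_i\}_{i=1}^{n} \sim \mathrm{CaP}(a_1, a_2, \ldots, a_Q, 1 - a), \quad \text{where } a = \sum_{i=1}^{Q} a_i \]

where each

\[ a_i = \left( \frac{c}{c+n} \right) \left( \frac{\beta_i}{\beta} \right) D_0 + \sum_{j} \frac{m_{n,\cdot,j}}{c+n} \left( \frac{\beta_i + m_{n,i,j}}{\beta + m_{n,\cdot,j}} \right) \delta_{w_j} \quad \text{for all } i = 1, 2, \ldots, Q \]

Here $m_{n,i,j}$ is the number of customers among $X_1, \ldots, X_n$ having tried dish $w_j$ with category $i$, and $m_{n,\cdot,j}$ is the number of customers among $X_1, \ldots, X_n$ having tried dish $w_j$ with any category. If $D_0(\Omega) = \gamma$, then drawing new dishes with choice $j$ generates $\mathrm{Poi}\left( \left( \frac{c}{c+n} \right) \frac{\gamma \beta_j}{\beta} \right)$ new dishes of that kind with that particular choice.

5.16 BD-NM Conjugacy

5.16.1 Negative Multinomial Process (NMP)

We have seen that the BD distribution is conjugate to the Negative Multinomial (NM) distribution. We can similarly define an NM process $Y$ with an underlying base measure $A = \sum_{k=1}^{\infty} p_k \delta_{\omega_k}$ and with another parameter $r_0$. As before, $A$ can be continuous, discrete, or a mixture of both. So we will write $Y \sim \mathrm{NMP}(r_0, A)$. In the NM distribution, the 0-th position is a very special position, so $p$ can be used to draw a $Q$-dimensional process, i.e. $Y = \sum_{k=1}^{\infty} m_k \delta_{\omega_k}$ where $m_k = (m_{k0}, m_{k1}, \ldots, m_{kQ})$. In CRM notation we can write,

\[ Y \sim \mathrm{NMP}(r_0, A) \implies Y = \sum_{k=1}^{\infty} m_k \delta_{\omega_k} \quad \text{where } m_k \overset{ind}{\sim} \mathrm{NM}(r_0, p_k) \]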
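To make the NM draw at a single atom concrete, here is a minimal sketch. It assumes the usual "stop after $r_0$ occurrences of category 0" reading of the negative multinomial with an integer $r_0$, with $p_0 = 1 - \sum_j p_j$, so that the probability of the all-zero count vector is $(1 - p_\cdot)^{r_0}$. The function names `nm_log_pmf` and `sample_nm` are hypothetical and not taken from the thesis.

```python
import math
import random


def nm_log_pmf(m, r0, p):
    """Log pmf of NM(r0, p) at the count vector m = (m_1, ..., m_Q).

    pmf(m) = Gamma(r0 + sum m) / (Gamma(r0) prod m_j!) * p_0^r0 * prod p_j^m_j,
    with p_0 = 1 - sum(p); every p_j is assumed strictly positive.
    """
    p0 = 1.0 - sum(p)
    out = math.lgamma(r0 + sum(m)) - math.lgamma(r0) + r0 * math.log(p0)
    for m_j, p_j in zip(m, p):
        out += m_j * math.log(p_j) - math.lgamma(m_j + 1)
    return out


def sample_nm(r0, p, rng):
    """Draw counts by repeated categorical trials until category 0 occurs r0 times.

    This constructive sampler requires an integer r0.
    """
    Q = len(p)
    weights = [1.0 - sum(p)] + list(p)
    counts = [0] * Q
    zeros = 0
    while zeros < r0:
        c = rng.choices(range(Q + 1), weights=weights)[0]
        if c == 0:
            zeros += 1
        else:
            counts[c - 1] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_nm(3, [0.2, 0.3], rng))
    print(math.exp(nm_log_pmf([0, 0], 3, [0.2, 0.3])))  # equals (1 - 0.5)^3
```

The log pmf also accepts a non-integer $r_0$; only the trial-based sampler needs the integer restriction.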


As we can see, this process is appropriate where we need not only the existence or non-existence of the latent features but also their counts. In a later section we will explicitly show the conjugacy of the NMP and the BDP for both cases: when the underlying Poisson measure $\nu(d\omega, dp)$ is finite and when $\nu(d\omega, dp)$ goes to $\infty$. We will also give an alternative construction of the NMP via a marked Poisson process.

5.16.2 Conjugacy for NMP and BDP: CRM Formulation

Figure 5-5. BD-Negative Multinomial process with Q = 2

Let us write down the full CRM specification for the BDP and the NMP, where we will have only the discrete part. The NMP will be denoted by $\mathrm{NMP}(r_0, A)$. It has two parameters, $r_0$ and the base or hazard measure $A$, which is discrete and has atoms that take values in $S_Q^{o}$. We will focus particularly on the model where $A$ is drawn from a BD process, whose draw is almost surely discrete. The entire model is the following:

\[ Y \sim \mathrm{NMP}(r_0, A), \qquad A \sim \mathrm{BDP}(\theta, \gamma, (\beta_1, \ldots, \beta_Q), G) \]

where $\theta$ is the concentration parameter, $\gamma$ is the mass parameter, $(\beta_1, \ldots, \beta_Q)$ is the Dirichlet parameter and $G$ is the base measure such that $G(\Omega) = \gamma$.


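The base measure $A$ is discrete and has atoms that take values in $[0,1]^{Q}$. Since $A$ is drawn from a BD process, whose draw is almost surely discrete, we write

\[ A = \sum_{k=1}^{\infty} p_k \delta_{\omega_k} = \sum_{k=1}^{\infty} (p_{k1}, p_{k2}, \ldots, p_{kQ})^{T} \delta_{\omega_k} \]

Note that $\sum_{j=1}^{Q} p_{kj} \le 1$ for all $k = 1, 2, \ldots$. Now $\{p_{kj}\}_{j=1}^{Q}$ can be treated as a prior probability distribution for a negative multinomial distribution with $Q$ components. We say that the random measure $Y$ is drawn from an NM process if

\[ Y = \sum_{k=1}^{\infty} m_k \delta_{\omega_k} = \sum_{k=1}^{\infty} (m_{k1}, \ldots, m_{kQ})^{T} \delta_{\omega_k} \]

where $(m_{k1}, \ldots, m_{kQ}) \overset{ind}{\sim} \mathrm{NegativeMultinomial}(p_{k1}, \ldots, p_{kQ})$ for $k = 1, 2, \ldots$. So the NMP lies on the space $(\mathbb{Z}^{+})^{Q}$, as shown in Figure 5-5. Now suppose the BDP $A$ has the parameters $\theta > 0$, $\gamma > 0$, $(\beta_1, \ldots, \beta_Q)$ and base measure $G$. We assume $G$ is an arbitrary CRM with three parts, $G = G_d + G_f + G_o$, according to [ 30 ]. In all proofs we have assumed $G_d = 0$, so we will take $G = G_f + G_o$. In a later section, we assume that $G$ itself is a draw from a BP, $G = \sum_{k}^{\infty} b_k \delta_{\omega_k}$, in the hierarchical model.

\[ A \sim \mathrm{BDP}\left( \theta, \gamma, \{\beta_{f,j}\}_{j=1}^{Q}, \{\beta_{o,j}\}_{j=1}^{Q}, G_f, G_o \right) \quad \text{and} \quad Y \sim \mathrm{NMP}(A). \]

We refer to the overall process as the Beta-Dirichlet Negative Multinomial Process (BDNMP).

BDNMP Conjugacy Using Alternative Parametrization for BDP in the Base Measure (G).

Theorem 5.16.1. Assume the conditions for the above theorem and consider $N$ conditionally independent draws from the NMP,

\[ Y_n = \sum_{l=1}^{L} m_{f,n,l} \, \delta_{u_l} + \sum_{i=1}^{M} m_{o,n,i} \, \delta_{v_i} \overset{i.i.d.}{\sim} \mathrm{NMP}(A), \quad \text{for } n = 1, 2, \ldots, N \]

with

\[ A \sim \mathrm{BDP}\left( \theta, \gamma, \{\beta_{oj}\}_{j=1}^{Q}, \left[ u, \rho, \sigma, \{\beta_{fj}\}_{j=1}^{Q} \right], G_o \right) \]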


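As before, $G \equiv (G_f, G_o) \equiv ([u, \rho, \sigma], G_o)$. $\{\rho_l\}_{l=1}^{L}$ and $\{\sigma_l\}_{l=1}^{L}$ are all $> 0$. $\{\beta_{fj}\}_{j=1}^{Q}$ are the Dirichlet parameters corresponding to the fixed part and are assumed to be the same for all fixed atoms. Similarly, $\{\beta_{oj}\}_{j=1}^{Q}$ are the Dirichlet parameters corresponding to the random part and are assumed to be the same for all random atoms. Then

\[ A \mid Y_1, \ldots, Y_N \sim \mathrm{BDP}\left( \theta_{post}, \gamma_{post}, \{\beta^{post}_{oj}\}_{j=1}^{Q}, \left[ u_{post}, \rho_{post}, \sigma_{post}, \{\beta^{post}_{fj}\}_{j=1}^{Q} \right], G^{post}_{o} \right) \]

with $\theta_{post} = \theta + r_0 N$, $\gamma_{post} = \gamma \frac{\theta}{\theta + r_0 N}$ and $G^{post}_{o} = G_o$. There will be $L + M$ fixed atoms in total, $\{u^{post}_{l'}\} = \{u_l\}_{l=1}^{L} \cup \{v_i\}_{i=1}^{M}$. $\rho_{post}$ and $\sigma_{post}$ will be such that for $l = 1, 2, \ldots, L$

\[ \rho^{post}_{l} = \rho_l + \sum_{n=1}^{N} \sum_{q=1}^{Q} m_{f,n,l,q} \quad \text{and} \quad \sigma^{post}_{l} = \sigma_l + r_0 N \]

and for $i = 1, 2, \ldots, M$

\[ \rho^{post}_{L+i} = \sum_{n=1}^{N} \sum_{q=1}^{Q} m_{o,n,i,q} \quad \text{and} \quad \sigma^{post}_{L+i} = \theta + r_0 N \]

Also we have,

\[ \beta^{post}_{f,l,j} = \beta_{fj} + \sum_{n=1}^{N} m_{f,n,l,j} \quad \forall \, l = 1, \ldots, L \text{ and } j = 1, 2, \ldots, Q \]
\[ \beta^{post}_{o,i,j} = \beta_{oj} + \sum_{n=1}^{N} m_{o,n,i,j} \quad \forall \, i = 1, \ldots, M \text{ and } j = 1, 2, \ldots, Q \]

Proof. This will be immediate from the following Theorem.

5.16.3 Formal Proof of Conjugacy of NMP for BDP

In this setup, we are generating data from a $Q$-dimensional Negative Multinomial process (NMP). The prior is a $Q$-dimensional Beta-Dirichlet process (BDP). Now we know from the BDP construction that it can be written as $\sum_{k=1}^{\infty} p_k \delta_{\omega_k}$ such that for all $k$, $\sum_{j=1}^{Q} p_{kj} \le 1$. We will denote $p_{k\cdot} = \sum_{j=1}^{Q} p_{kj}$ and $p_{k0} = 1 - \sum_{j=1}^{Q} p_{kj}$ (analogous to the NBP process with BP prior).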
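The posterior updates of Theorem 5.16.1 are simple bookkeeping over the observed count vectors, which the following sketch spells out. It is only an illustration under stated assumptions: the symbol names $\theta$, $\gamma$, $\rho$, $\sigma$, $\beta_f$, $\beta_o$ and the reading $\gamma_{post} = \gamma\theta/(\theta + r_0 N)$ are assumed notation, the function name `bdnmp_posterior` and the nested-list data layout are hypothetical, and the atom locations $u$, $v$ and the measure $G_o$ are left out because they are unchanged by the update.

```python
def bdnmp_posterior(theta, gamma, rho, sigma, beta_f, beta_o, r0, m_fixed, m_ord):
    """Posterior BDP parameters after N NMP draws (sketch of Theorem 5.16.1).

    m_fixed[n][l][q]: count of category q+1 at fixed atom l in draw n.
    m_ord[n][i][q]:   count of category q+1 at new (ordinary) atom i in draw n.
    """
    N = len(m_fixed)
    L = len(rho)
    M = len(m_ord[0]) if N else 0
    Q = len(beta_f)

    theta_post = theta + r0 * N
    gamma_post = gamma * theta / (theta + r0 * N)

    rho_post, sigma_post, beta_post = [], [], []
    # Old fixed atoms: add the observed counts to rho and to the Dirichlet part.
    for l in range(L):
        per_cat = [sum(m_fixed[n][l][q] for n in range(N)) for q in range(Q)]
        rho_post.append(rho[l] + sum(per_cat))
        sigma_post.append(sigma[l] + r0 * N)
        beta_post.append([beta_f[q] + per_cat[q] for q in range(Q)])
    # Atoms from the ordinary component become fixed atoms L+1, ..., L+M.
    for i in range(M):
        per_cat = [sum(m_ord[n][i][q] for n in range(N)) for q in range(Q)]
        rho_post.append(sum(per_cat))
        sigma_post.append(theta + r0 * N)
        beta_post.append([beta_o[q] + per_cat[q] for q in range(Q)])

    return {
        "theta": theta_post,
        "gamma": gamma_post,
        "rho": rho_post,
        "sigma": sigma_post,
        "beta": beta_post,
    }


if __name__ == "__main__":
    # Two draws (N = 2), one fixed atom (L = 1), one new atom (M = 1), Q = 2.
    post = bdnmp_posterior(
        theta=2.0, gamma=3.0, rho=[0.5], sigma=[1.5],
        beta_f=[1.0, 1.0], beta_o=[0.5, 0.5], r0=1,
        m_fixed=[[[1, 0]], [[2, 1]]],
        m_ord=[[[0, 1]], [[1, 1]]],
    )
    print(post)
```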


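This proof can be constructed as an extension of the proofs given in [ 20 ] and [ 3 ], which were obtained mainly after Kim's famous proof [ 19 ]. Our proof will be similar to those. We give it so that readers can see the complete story. Here we are talking about a multivariate non-decreasing process with independent increments, which is a multidimensional Levy process. Let $A$ be a $Q$-dimensional Levy process on $[0,1]$. Define a random measure $\mu$ as

\[ \mu([0,t], D) = \sum_{s \le t} I[\Delta A(s) \in D \setminus \{0\}] \]

where $D \in \mathcal{B}[0,1]^{Q}$. According to [ 70 ], $\mu$ is a Poisson measure. Let $\nu(\cdot) = E[\mu(\cdot)]$, which is the compensator of the Levy process $A$. Define a set $U$ by setting $U = \{u \mid \nu(\{u\} \times [0,1]^{Q}) > 0\}$. Then $U$ is the finite set of times of fixed discontinuity of $A$. Let us denote this set by $U = \{u_1, u_2, \ldots, u_L\}$. Let the distribution function of the jump at the fixed point of discontinuity $u_j$ be denoted by $G_j(\cdot)$. Now $\nu$ can be decomposed as:

\[
\begin{aligned}
\nu([0,t] \times D) &= \int_{D} dL_t(z) + \sum_{t_j \le t} \int_{D} dG_j(z) \\
&= \int_{0}^{t} \int_{D} dF_s(z) \, ds + \sum_{u_j \le t} \int_{D} dG_j(z) \\
&= \int_{0}^{t} \int_{D} f_s(z) \, dz \, ds + \sum_{u_j \le t} \int_{D} dG_j(z)
\end{aligned}
\]

where $dL_t(z) = \int_{0}^{t} f_s(z) \, ds \, dz$ and $dF_s(z) = f_s(z) \, dz$. Note that for a compound Poisson process $f_s(z) = g_s(z) \lambda(s)$, where

\[ \lambda(s) = \int_{[0,1]^{Q}} f_s(z) \, dz \quad \text{and} \quad g_s(z) = \frac{f_s(z)}{\lambda(s)} \quad \text{when } \lambda(s) < \infty \]

• With a finite number of fixed points of discontinuity, a compound Poisson process is sometimes called an extended compound Poisson process.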


• For the general case when $\lambda(s) = \infty$, we have to find the limit of a sequence of compound Poisson processes with finite $\lambda(s)$.

Theorem 5.16.2. Let $A_{prior} = \sum_{k=1}^{\infty} p_k \delta_{\omega_k}$ be a discrete CRM on $[0,1]^{Q}$ with atom locations in $[0,1]$. Let $p_{\cdot} = \sum_{j=1}^{Q} p_j$. Suppose it has the following components.

• There is no deterministic component.

• The ordinary component is generated from a Poisson point process with intensity $\nu_c(d\omega, dp) = \nu_c(dp) \, d\omega$ such that $\nu_c$ is absolutely continuous and $\nu_c([0,1]^{Q}) < \infty$. In particular, the $Q$-dimensional weights are on the "$p$" axes and the atom locations are on the "$\omega$" axis.

• There are $L$ fixed atoms at locations $u_1, u_2, \ldots, u_L \in [0,1]$. The $Q$-dimensional weight of the $l$-th fixed atom is a random variable with distribution $dG_l$.

Draw an NMP $Y$ with input measure $A_{prior}$ and parameter $r_0$. Let us assume there is only one non-zero atom of $Y$ and let $\{m_1, s_1\}$ be the pair of observed data with the corresponding location. Note that $m_1 = \{m_{11}, \ldots, m_{1Q}\}$ is a $Q$-dimensional vector. The posterior process for the BDP given $Y$ is given by a CRM $A_{post}$ with the following components.

• There is no deterministic component.

• The ordinary component is generated from a Poisson point process with intensity $(1 - p_{\cdot})^{r_0} \nu_c(d\omega, dp) = (1 - p_{\cdot})^{r_0} \nu_c(dp) \, d\omega$.

• There are three types of fixed atoms.

1. There is an old, repeated fixed atom. If $u_l = s_1$, then there is a fixed atom at $u_l$ with weight density

\[ \frac{1}{W_{rf}} \left( \prod_{j=1}^{Q} p_j^{m_{1j}} \right) (1 - p_{\cdot})^{r_0} \, dG_l(p) \]

where $W_{rf}$ is the normalizing constant

\[ W_{rf} = \int_{p \in [0,1]^{Q}} \left( \prod_{j=1}^{Q} p_j^{m_{1j}} \right) (1 - p_{\cdot})^{r_0} \, dG_l(p) \]


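2. There are old, unrepeated fixed atoms. If $u_l \ne s_1$, then there is a fixed atom at $u_l$ with weight density

\[ \frac{1}{W_{uf}} (1 - p_{\cdot})^{r_0} \, dG_l(p) \]

where $W_{uf}$ is the normalizing constant

\[ W_{uf} = \int_{p \in [0,1]^{Q}} (1 - p_{\cdot})^{r_0} \, dG_l(p) \]

3. There is a new fixed atom. If $s_1 \notin \{u_1, \ldots, u_L\}$, then there is a fixed atom at $s_1$ with weight density

\[ \frac{1}{W_{new}} \left( \prod_{j=1}^{Q} p_j^{m_{1j}} \right) (1 - p_{\cdot})^{r_0} \, \nu_c(dp) \]

where $W_{new}$ is the normalizing constant

\[ W_{new} = \int_{p \in [0,1]^{Q}} \left( \prod_{j=1}^{Q} p_j^{m_{1j}} \right) (1 - p_{\cdot})^{r_0} \, \nu_c(dp) \]

Proof. We will assume the existence of an NMP $Y$ with cumulative intensity function $A_{prior}$ using a marked Poisson process $V$. Let $V$ be a marked Poisson process $(s_1, m_1), (s_2, m_2), \ldots$, where the process $\sum_{i=1}^{\infty} I(s_i \le t)$ is the Poisson process with the cumulative intensity function $A_{prior,\cdot} = \sum_{j=1}^{Q} A_{prior,j}$. Let $Y = (m_1, m_2, \ldots) = ((m_{11}, \ldots, m_{1Q}), (m_{21}, \ldots, m_{2Q}), \ldots)$. Without loss of generality, let us consider only the interval $[0,1]$ on the $\omega$ axis. Let $s_K = \{s_1, s_2, \ldots, s_K\}$ be the set of jump times and let $m_K = \{m_1, \ldots, m_K\}$ be the corresponding marks. Let $(B, \mathcal{B})$ be the set of completely random measures on $[0,1]$ with weights in $[0,1]^{Q}$ and its associated $\sigma$-algebra. Let $(M, \mathcal{M})$ be the set of completely random measures on $[0,1]$ with atom weights in $(\mathbb{Z}^{+})^{Q}$ and its associated $\sigma$-algebra. For any sets $B \in \mathcal{B}$ and $M \in \mathcal{M}$, let $Q(B; C)$ be the probability measure on $\mathcal{B}$ given by the proposed posterior distribution. Finally, let us denote the marginal distribution of $C$ by $P_C$. It suffices to prove that

\[ P(B \cap M) = \int_{C \in M} Q(B; C) \, dP_C \tag{5-6} \]

Let us first consider the case when $\nu_c([0,1]^{Q}) = \int_{[0,1]^{Q}} \nu_c(dp) = \lambda < \infty$ (homogeneous Poisson). Define the probability density function as $\frac{\nu_c(dp)}{\lambda}$, where $\nu_c(dp) \equiv \nu_c(dp_1 \, dp_2 \cdots dp_Q)$.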


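We can write $A_{prior}$ as

\[ A_{prior}(d\omega) = \sum_{l=1}^{L} \eta_l \, \delta_{u_l}(d\omega) + \sum_{i=1}^{K} \xi_i \, \delta_{v_i}(dv) \]

Note that $\xi_i = \{\xi_{i1}, \ldots, \xi_{iQ}\}$ and $\eta_l = \{\eta_{l1}, \ldots, \eta_{lQ}\}$. Here $K$ is the number of atoms in the ordinary component of $A_{prior}$. So the total number of atoms in $A_{prior}$ is $L + K$, and the total number of atoms in the counting measure with $A_{prior}$ should be at most $L + K$. The atom locations are $\{v_i\}_{i=1}^{K}$ for the ordinary component and $\{u_l\}_{l=1}^{L}$ for the fixed points of discontinuity (which we assumed to be finite in number), with $\{s_1\} \in \{u_l\}_{l=1}^{L} \cup \{v_i\}_{i=1}^{K}$. The $Q$-dimensional atom weights at the fixed points $\{u_l\}_{l=1}^{L}$ are $\{\eta_l\}_{l=1}^{L}$ and at the ordinary component locations $\{v_i\}_{i=1}^{K}$ are $\{\xi_i\}_{i=1}^{K}$. Let $v_K = \{v_1, v_2, \ldots, v_K\}$ and $(\xi_K; \eta_L) = \{\xi_1, \ldots, \xi_K; \eta_1, \ldots, \eta_L\}$. Let $\lambda = \nu_c([0,1]^{Q})$, which is finite by assumption. Then the number of atoms in the ordinary component is Poisson distributed,

\[ K \sim \mathrm{Poisson}(\lambda) \]

and $\{\xi_i\}_{i=1}^{K}$ are i.i.d. random variables with values in $[0,1]^{Q}$, each with density $\frac{\nu_c(dp)}{\lambda}$. According to [ 70 ], it suffices to consider only the sets $B$ and $M$ which are of the following form:

\[ B = \{K = k, \; (v_k \le \tilde{v}_k, \; \xi_k \le \tilde{\xi}_k), \; \eta_L \le \tilde{\eta}_L\} \]
\[ M = \{T = 1, \; S_1 \le s, \; m_1 = \tilde{m}\} \]

where $T$ is the number of points generated by the NMP, which is 1 in our case, for any given vectors $\tilde{v}_k \in [0,1]^{k}$, $\tilde{\xi}_k \in [0,1]^{kQ}$ and $\tilde{\eta}_L \in [0,1]^{LQ}$, $s \in [0,1]$ and $\tilde{m} \in (\mathbb{Z}^{+})^{Q}$, where $v_n \le \tilde{v}_n$ is defined as $v_i \le \tilde{v}_i$ $(i = 1, \ldots, n)$. Similarly, $\xi_n \le \tilde{\xi}_n$ and $\eta_L \le \tilde{\eta}_L$ are defined component-wise for a vector. Here we have considered the case when $Y$, the NMP, has only one observation; the extension to more than one observation is straightforward, as pointed out in [ 19 ]. So in the random measure $A_{prior}$, we consider a set with a fixed number $n$ of ordinary component atoms and with fixed upper bounds $\tilde{v}_i$, $\tilde{\xi}_i$ and $\tilde{\eta}_l$ on the locations of the ordinary component atoms, the weights of the ordinary component atoms and the weights of the fixed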


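atoms, respectively. For the counting measure $Y$, we restrict to a single atom at $s$, and the mark of that atom is $\tilde{m}$, which can belong to $(\mathbb{Z}^{+})^{Q}$.

5.16.3.1 Prior Part

First let us calculate $P(B \cap M)$. We can write,

\[ P\{K = k, (v_k \le \tilde{v}_k, \xi_k \le \tilde{\xi}_k), \eta_L \le \tilde{\eta}_L\} = P\{K = k\} \int_{v_k \le \tilde{v}_k} P\big( (\xi_k \le \tilde{\xi}_k), (\eta_L \le \tilde{\eta}_L) \mid v_k, K = k \big) \, P(dv_k \mid K = k) \tag{5-7} \]

Now we know,

\[ P\{K = k\} = \frac{\lambda^{k}}{k!} \exp(-\lambda) \tag{5-8} \]

Also, the locations $v_i$ of those atoms, given their total number $k$, are distributed as follows (note that the elements of the set $\{v_i\}_{i=1}^{k}$ are ordered in increasing order of time):

\[ P\{v_n \le \tilde{v}_n \mid N = n\} = n! \, (\lambda)^{-n} \int_{0}^{\tilde{v}_1} \int_{\tilde{v}_1}^{\tilde{v}_2} \cdots \int_{\tilde{v}_{n-1}}^{\tilde{v}_n} \prod_{i=1}^{n} \lambda \, dv_i = k! \int_{0}^{\tilde{v}_1} \int_{\tilde{v}_1}^{\tilde{v}_2} \cdots \int_{\tilde{v}_{k-1}}^{\tilde{v}_k} \prod_{i=1}^{k} dv_i \tag{5-9} \]

Given the atom locations and their total number, the weight distribution of the whole atom set (random and fixed) will be of the following form:

\[ P\{(\xi_k \le \tilde{\xi}_k)(\eta_L \le \tilde{\eta}_L) \mid K = k; v_k = \tilde{v}_k\} \tag{5-10} \]
\[ = \left[ \prod_{i=1}^{k} \int_{0}^{\tilde{\xi}_i} \frac{\nu_c(d\xi_i)}{\lambda} \right] \left[ \prod_{l=1}^{L} \int_{0}^{\tilde{\eta}_l} dG_l(\eta_l) \right] \tag{5-11} \]

Note that $\{u_l\}_{l=1}^{L}$ are unique and $\{v_i\}_{i=1}^{k}$ are almost surely unique. We also have $T = 1$, which can come either from the fixed atom part or from the Poisson process part. Suppose $K = k$ with $v_k = \tilde{v}_k$, $\xi_k = \tilde{\xi}_k$ and $\eta_L = \tilde{\eta}_L$. Direct calculation yields (after breaking into two separate cases)

\[ P\{T = 1, S_1 \le s, m_1 = \tilde{m} \mid A_{prior}\} \]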

PAGE 147

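\[ = \sum_{i=1}^{k} \psi(K, L, v_k, \xi_k, \eta_L, \tilde{m}, v_i) \, I(v_i \le s) + \sum_{l=1}^{L} \psi(K, L, v_k, \xi_k, \eta_L, \tilde{m}, u_l) \, I(u_l \le s) \]

The probability that the non-zero vector occurs at a particular atom is the probability that the non-zero vector appears at this atom and zero counts appear at all other atoms. In this context, let us now define the function $\psi(\cdot)$.

\[ \psi(K, L, v_k, \xi_k, \eta_L, \tilde{m}, s) = \left\{ \prod_{i=1}^{k} \big( H(\tilde{m}; r_0, \xi_i) \big)^{I(v_i = s)} \big( H(0; r_0, \xi_i) \big)^{I(v_i \cdots} \right. \]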

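5.16.3.2 Induced Measure

Conditioned on $T = 1$, $S_1 = s$ and $m_1 = \tilde{m}$, we have

\[ Q(B; C) = P\{K = k, (v_k \le \tilde{v}_k, \xi_k \le \tilde{\xi}_k), (\eta_L \le \tilde{\eta}_L) \mid T = 1, S_1 = s, m_1 = \tilde{m}\} \]

We have precisely two cases to consider: either the atom of the NMP is at the same location as a fixed atom of the prior random measure, say $u_l$, or it is at a different location.

Case I: Let us consider the first case, where $s = u_l$. As before, the number of atoms in the ordinary component is Poisson distributed with mean equal to the total Poisson point process mass

\[ \lambda_{post} = \int_{p=0}^{1} (1 - p_{\cdot})^{r_0} \, \nu_c(dp) \]

So we have,

\[ P\{K = k \mid T = 1, S_1 = u_l, m_1 = \tilde{m}\} = \exp\{-\lambda_{post}\} \frac{(\lambda_{post})^{k}}{k!} \tag{5-13} \]

Let us look at the distribution of the ordinary component atoms:

\[ P\{v_k \le \tilde{v}_k \mid K = k, T = 1, S_1 = u_l, m_1 = \tilde{m}\} = k! \, (\lambda_{post})^{-k} \int_{0}^{\tilde{v}_1} \int_{\tilde{v}_1}^{\tilde{v}_2} \cdots \int_{\tilde{v}_{k-1}}^{\tilde{v}_k} \prod_{i=1}^{k} \lambda_{post} \, dv_i = k! \int_{0}^{\tilde{v}_1} \int_{\tilde{v}_1}^{\tilde{v}_2} \cdots \int_{\tilde{v}_{k-1}}^{\tilde{v}_k} \prod_{i=1}^{k} dv_i \tag{5-14} \]

Also,

\[ P\{(\xi_k \le \tilde{\xi}_k, \eta_L \le \tilde{\eta}_L) \mid v_k = \tilde{v}_k, K = k, T = 1, S_1 = u_l, m_1 = \tilde{m}\} = \left[ \prod_{i=1}^{k} \int_{0}^{\tilde{\xi}_i} \frac{(1 - \xi_{i\cdot})^{r_0} \, \nu_c(d\xi_i)}{\lambda_{post}} \right] \left[ \prod_{l'=1}^{L} \frac{\int_{0}^{\tilde{\eta}_{l'}} \big( H(\tilde{m}; r_0, \eta_{l'}) \big)^{I(l' = l)} \big( H(0; r_0, \eta_{l'}) \big)^{I(l' \ne l)} \, dG_{l'}(\eta_{l'})}{\int_{0}^{1} \big( H(\tilde{m}; r_0, \eta_{l'}) \big)^{I(l' = l)} \big( H(0; r_0, \eta_{l'}) \big)^{I(l' \ne l)} \, dG_{l'}(\eta_{l'})} \right] \tag{5-15} \]

Putting together equations 5-13, 5-14 and 5-15 we have,

\[ P\{K = k, (v_k \le \tilde{v}_k, \xi_k \le \tilde{\xi}_k), (\eta_L \le \tilde{\eta}_L) \mid T = 1, S_1 = u_l, m_1 = \tilde{m}\} = \frac{1}{W_U(l)} \exp\{-\lambda_{post}\} \int_{\Omega_v} \psi(K, L, v_k, \xi_k, \eta_L, \tilde{m}, u_l) \]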


\[ \left( \prod_{i=1}^{k} dv_i \right) \left( \prod_{i=1}^{k} \nu_c(d\xi_i) \right) \left( \prod_{l=1}^{L} dG_l(\eta_l) \right) \tag{5-16} \]

where

\[ W_U(l) = \prod_{l'=1}^{L} \int_{0}^{1} \big( H(\tilde{m}; r_0, \eta_{l'}) \big)^{I(l' = l)} \big( H(0; r_0, \eta_{l'}) \big)^{I(l' \ne l)} \, dG_{l'}(\eta_{l'}) \]

Case II: For the case where $s \notin \{u_1, u_2, \ldots, u_L\}$, conditioned on $S_1 = s$ and $K = k$, there exists an $i \in \{1, 2, \ldots, k\}$ such that $v_i = s$. Let us also assume that $v_i$ is the $o$-th smallest order statistic (as $\{v_i\}_{i=1}^{k}$ are a.s. unique). So for this case, we can write

\[ P\{K = k, (v_k \le \tilde{v}_k, \xi_k \le \tilde{\xi}_k), (\eta_L \le \tilde{\eta}_L) \mid T = 1, S_1 = s, m_1 = \tilde{m}\} = \sum_{i=1}^{k} P\{K = k, v_o = s \mid T = 1, S_1 = s, m_1 = \tilde{m}\} \, P\{(v_k \le \tilde{v}_k, \xi_k \le \tilde{\xi}_k), (\eta_L \le \tilde{\eta}_L) \mid K = k, T = 1, S_1 = v_o = v_i, m_1 = \tilde{m}\} \tag{5-17} \]

Now note that the number of atoms on either side of $v_i$ is Poisson distributed. So we have,

\[ P\{K = k, v_o = s \mid T = 1, S_1 = s, m_1 = \tilde{m}\} = \exp\{-\lambda_{post} v_i\} \frac{(\lambda_{post} v_i)^{o-1}}{(o-1)!} \, \exp\{-\lambda_{post} (1 - v_i)\} \frac{(\lambda_{post} (1 - v_i))^{k-o}}{(k-o)!} \tag{5-18} \]

Let us look at the distribution of the atoms on either side of $v_i$ (normalized appropriately):

\[
\begin{aligned}
& P\{v_k \le \tilde{v}_k \mid K = k, T = 1, v_o = S_1 = s, m_1 = \tilde{m}\} \\
&= (o-1)! \, (\lambda_{post})^{-(o-1)} \int_{0}^{\tilde{v}_1} \int_{\tilde{v}_1}^{\tilde{v}_2} \cdots \int_{\tilde{v}_{o-1}}^{\tilde{v}_o} \left( \prod_{i=1}^{o-1} \frac{\lambda_{post} \, dv_i}{v_i} \right) (k-o)! \, (\lambda_{post})^{-(k-o)} \int_{\tilde{v}_o}^{\tilde{v}_{o+1}} \int_{\tilde{v}_{o+1}}^{\tilde{v}_{o+2}} \cdots \int_{\tilde{v}_{k-1}}^{\tilde{v}_k} \left( \prod_{i=o+1}^{k} \frac{\lambda_{post} \, dv_i}{1 - v_i} \right) \\
&= (o-1)! \int_{0}^{\tilde{v}_1} \int_{\tilde{v}_1}^{\tilde{v}_2} \cdots \int_{\tilde{v}_{o-1}}^{\tilde{v}_o} \left( \prod_{i=1}^{o-1} \frac{dv_i}{v_i} \right) (k-o)! \int_{\tilde{v}_o}^{\tilde{v}_{o+1}} \int_{\tilde{v}_{o+1}}^{\tilde{v}_{o+2}} \cdots \int_{\tilde{v}_{k-1}}^{\tilde{v}_k} \left( \prod_{i=o+1}^{k} \frac{dv_i}{1 - v_i} \right)
\end{aligned}
\tag{5-19}
\]


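Also the weight distributions will be

\[ P\{((\xi_k \le \tilde{\xi}_k), (\eta_L \le \tilde{\eta}_L)) \mid v_k \le \tilde{v}_k, K = k, T = 1, v_o = S_1 = v_i, m_1 = \tilde{m}\} = \left[ \prod_{i'=1}^{k} \frac{\int_{0}^{\tilde{\xi}_{i'}} \big( H(\tilde{m}; r_0, \xi_{i'}) \big)^{I(i' = i)} \big( H(0; r_0, \xi_{i'}) \big)^{I(i' \ne i)} \, \nu_c(d\xi_{i'})}{\int_{0}^{1} \big( H(\tilde{m}; r_0, \xi_{i'}) \big)^{I(i' = i)} \big( H(0; r_0, \xi_{i'}) \big)^{I(i' \ne i)} \, \nu_c(d\xi_{i'})} \right] \left[ \prod_{l=1}^{L} \frac{\int_{0}^{\tilde{\eta}_l} H(0; r_0, \eta_l) \, dG_l(\eta_l)}{\int_{0}^{1} H(0; r_0, \eta_l) \, dG_l(\eta_l)} \right] \tag{5-20} \]

Combining equations 5-17, 5-18, 5-19 and 5-20 we have,

\[ P\{K = k, (v_k \le \tilde{v}_k, \xi_k \le \tilde{\xi}_k), (\eta_L \le \tilde{\eta}_L) \mid T = 1, S_1 = v_i, m_1 = \tilde{m}\} = \frac{1}{W_V} \exp\{-\lambda_{post}\} \int_{\Omega_v^{(-o)}} \psi(K, L, v_k, \xi_k, \eta_L, \tilde{m}, v_i) \left( \prod_{i=1, i \ne o}^{k} dv_i \right) \left( \prod_{i=1}^{k} \nu_c(d\xi_i) \right) \left( \prod_{l=1}^{L} dG_l(\eta_l) \right) \tag{5-21} \]

where

\[ W_V = \left[ \int_{0}^{1} \big( H(\tilde{m}; r_0, \xi_i) \big)^{I(i = i)} \, \nu_c(d\xi_i) \right] \left[ \prod_{l=1}^{L} \int_{0}^{1} H(0; r_0, \eta_l) \, dG_l(\eta_l) \right] \]

because the other part cancels against part of equation 5-18. Also we have $\Omega_v^{(-o)} = \{v_k \in [0,1]^{k} : v_k \le \tilde{v}_k \text{ and } v_1 \le \cdots \le v_{o-1} \le v_o = s \le v_{o+1} \le \cdots \le v_k\}$. Putting together equations 5-16 and 5-21 we get,

\[
\begin{aligned}
& P\{K = k, (v_k \le \tilde{v}_k, \xi_k \le \tilde{\xi}_k), (\eta_L \le \tilde{\eta}_L) \mid T = 1, S_1 = s, m_1 = \tilde{m}\} \\
&= \sum_{l=1}^{L} \left\{ I(u_l = s) \frac{1}{W_U(l)} \exp\{-\lambda_{post}\} \int_{\Omega_v} \left[ \psi(K, L, v_k, \xi_k, \eta_L, \tilde{m}, u_l) \left( \prod_{i=1}^{k} dv_i \right) \left( \prod_{i=1}^{k} \nu_c(d\xi_i) \right) \left( \prod_{l=1}^{L} dG_l(\eta_l) \right) \right] \right\} \\
&\quad + I\{s \notin \{u_l\}_{l=1}^{L}\} \sum_{i=1}^{k} \left\{ \frac{1}{W_V} \exp\{-\lambda_{post}\} \int_{\Omega_v^{(-o)}} \Big[ \psi(K, L, v_k, \xi_k, \eta_L, \tilde{m}, v_i) \right.
\end{aligned}
\]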


\[ \left. \left( \prod_{i=1, i \ne o}^{k} dv_i \right) \left( \prod_{i=1}^{k} \nu_c(d\xi_i) \right) \left( \prod_{l=1}^{L} dG_l(\eta_l) \right) \Big] \right\} \tag{5-22} \]

5.16.3.3 Marginal Distribution of Y Via Marked Poisson Process

Let us consider the definition of a Marked Poisson Process (MPP) to find the marginal. We are starting from a BD Process (BDP) prior. So, let $(\omega, p)$ form a Poisson Process (PP) on $\Omega \times [0,1]^{Q}$ with mean measure $\nu$. Now mark each $(Q+1)$-dimensional ($Q$ from $p$ and 1 from $\omega$) point $(\omega, p)$ with a random variable $Z$ whose value lies in a space $V$, with a transition probability $P(p, \cdot)$. Then $(\omega, p, Z)$ form a Poisson process on $\Omega \times [0,1]^{Q} \times V$ with mean measure $\nu \times P(p, Z)$. In the NMP context, $Z$ takes values in $B = \{0, 1, 2, \ldots\} \times \cdots \times \{0, 1, 2, \ldots\} = \{0, 1, 2, \ldots\}^{Q}$. Now, here $\{\omega, p, Z\}$ is a PP on $\Omega \times [0,1]^{Q} \times B$ with mean measure $\nu(d\omega, dp) \, P(p, Z)$. We also know there is a counting measure associated with every PP. Let this counting measure be $N(d\omega, dp, Z)$. Note that in our case, the transition probability $P(p, \cdot)$ is the Negative Multinomial (NM) probability distribution on the space $B$. Let us denote $C = B - \{0\}$, to exclude the zero value from our coupled process. So $P(p, C)$ is the probability of the set $C$ with respect to the Negative Multinomial distribution with parameters $(r_0, p)$, so $P(p, C) = (1 - (1 - p_{\cdot})^{r_0})$ [as we are considering only one point], where $p_{\cdot} = \sum_{j=1}^{Q} p_j$. We want the counting measure $N(A, [0,1]^{Q}, C)$ to be a random variable with rate (where $A \subset \Omega$)

\[ \nu(A, [0,1]^{Q}) \, P(p, C) = \int_{[0,1]^{Q}} \nu(A, dp) \, P(p, C) = \int_{A} \int_{[0,1]^{Q}} \nu(d\omega, dp) \, P(p, C) = \mu(A) \int_{[0,1]^{Q}} \big( 1 - (1 - p_{\cdot})^{r_0} \big) \, \nu(dp) = \lambda_A \text{ (let)} \]

We want to first compute the probability distribution $P(K = 1, m_i \in C, s_1 \le \hat{s}_1)$. For that we compute the density first and then integrate it to get this distribution. The density will look as below; remember that $U_\epsilon$ is the set $(v_1 - \frac{\epsilon}{2}, v_1 + \frac{\epsilon}{2})$,


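where $v_1$ is the point on the $\Omega$ axis generated by the random PP.

\[ \lim_{\epsilon \to 0} \frac{1}{\epsilon} P\big[ N((\Omega - U_\epsilon), [0,1]^{Q}, C) = 0 \big] \, P\big[ N(U_\epsilon, [0,1]^{Q}, C) = 1 \big] \]

Now, from the Poisson distribution we know

\[ P\big[ N((\Omega - U_\epsilon), [0,1]^{Q}, C) = 0 \big] = \exp\{-\lambda_{(\Omega - U_\epsilon)}\} \quad \text{and} \quad P\big[ N(U_\epsilon, [0,1]^{Q}, C) = 1 \big] = \lambda_{U_\epsilon} \exp\{-\lambda_{U_\epsilon}\} \]

Also, we are generating the value $m_1 = \tilde{m}$ at the point $v_1$, which is one of the points from the set $C = B - \{0\} = \{1, 2, 3, \ldots\}^{Q}$. So the probability is given by $\frac{H(\tilde{m} \mid r_0, p)}{P(p, C)}$ (by the conditional probability rule). As we finally want to compute the probability distribution $P(K = 1, m_1 = \tilde{m}, s_1 \le \hat{s}_1)$, we have to calculate the following density:

\[
\begin{aligned}
& \lim_{\epsilon \to 0} \frac{1}{\epsilon} P\big[ N((\Omega - U_\epsilon), [0,1]^{Q}, C) = 0 \big] \, P\big[ N(U_\epsilon, [0,1]^{Q}, C) = 1 \big] \, \frac{H(\tilde{m} \mid r_0, p)}{P(p, C)} \\
&= \lim_{\epsilon \to 0} \frac{1}{\epsilon} \exp\{-\lambda_{(\Omega - U_\epsilon)}\} \big[ \lambda_{U_\epsilon} \exp\{-\lambda_{U_\epsilon}\} \big] \left[ \frac{H(\tilde{m} \mid r_0, p)}{P(p, C)} \right] \\
&= \lim_{\epsilon \to 0} \frac{1}{\epsilon} \left\{ \exp\left( -\mu(\Omega - U_\epsilon) \int_{[0,1]^{Q}} \nu(dp) P(p, C) \right) \mu(U_\epsilon) \int_{[0,1]^{Q}} \nu(dp) P(p, C) \, \exp\left( -\mu(U_\epsilon) \int_{[0,1]^{Q}} \nu(dp) P(p, C) \right) \frac{H(\tilde{m} \mid r_0, p)}{P(p, C)} \right\} \\
&= \exp\left( -\int_{[0,1]^{Q}} \nu(dp) P(p, C) \right) \int_{[0,1]^{Q}} \nu(dp) P(p, C) \frac{H(\tilde{m} \mid r_0, p)}{P(p, C)} \\
&= \exp\left( -\int_{[0,1]^{Q}} \nu(dp) P(p, C) \right) \int_{[0,1]^{Q}} H(\tilde{m} \mid r_0, p) \, \nu(dp) \\
&= \exp\left( -\int_{[0,1]^{Q}} \big( 1 - (1 - p_{\cdot})^{r_0} \big) \nu(dp) \right) \int_{[0,1]^{Q}} H(\tilde{m} \mid r_0, p) \, \nu(dp) \\
&= \exp\left( -\int_{[0,1]^{Q}} \nu(dp) + \int_{[0,1]^{Q}} (1 - p_{\cdot})^{r_0} \nu(dp) \right) \int_{[0,1]^{Q}} H(\tilde{m} \mid r_0, p) \, \nu(dp) \\
&= e^{(-\lambda + \lambda_{post})} \int_{[0,1]^{Q}} H(\tilde{m} \mid r_0, p) \, \nu(dp)
\end{aligned}
\]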


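Here we have used that the measure $\mu(\Omega - U_\epsilon) = 1 - \epsilon$ and $\mu(U_\epsilon) = \epsilon$, and $\epsilon \to 0$. So we have,

\[ P(K = 1, m_1 = \tilde{m}, s_1 \le \hat{s}_1) = \int_{\omega = 0}^{\hat{s}_1} e^{(-\lambda + \lambda_{post})} \int_{[0,1]^{Q}} H(\hat{m}_1 \mid r_0, p) \, \nu(dp) \, d\omega \]

Now let us substitute $\xi_i$ for $p$ and $\nu_c$ for $\nu$. Note that this atom does not come from the fixed point set. So in order to get the correct marginal we have to multiply this term by $\prod_{l=1}^{L} H(0; r_0, \eta_l)$. So finally we have

\[ P\{T = 1, S_1 \le s, m_1 = \tilde{m}\} = \left[ \prod_{l=1}^{L} \int_{0}^{1} H(0; r_0, \eta_l) \, dG_l(\eta_l) \right] e^{(-\lambda + \lambda_{post})} \int_{\omega = 0}^{s} \int_{[0,1]^{Q}} H(\hat{m}_1 \mid r_0, \xi_i) \, \nu_c(d\xi_i) \, d\omega = \exp\{-\lambda\} \exp\{\lambda_{post}\} \int_{\omega = 0}^{s} W_V \, d\omega \]

Similarly, for the other case we have,

\[ P\{T = 1, S_1 = u_l, m_1 = \tilde{m}\} = \exp\{-\lambda\} \exp\{\lambda_{post}\} \, W_U(l) \]

5.16.3.4 Checking the Integration

For simplicity, let us use the following notation:

\[ Z_1 = \left\{ \int_{\Omega_v} \psi(K, L, v_k, \xi_k, \eta_L, \tilde{m}, u_l) \left( \prod_{i=1}^{k} dv_i \right) \left( \prod_{i=1}^{k} \nu_c(d\xi_i) \right) \left( \prod_{l=1}^{L} dG_l(\eta_l) \right) \right\} \]
\[ Z_2 = \left\{ \int_{\Omega_v} \psi(K, L, v_k, \xi_k, \eta_L, \tilde{m}, v_i) \left( \prod_{i=1}^{k} dv_i \right) \left( \prod_{i=1}^{k} \nu_c(d\xi_i) \right) \left( \prod_{l=1}^{L} dG_l(\eta_l) \right) \right\} \]
\[ Z_2^{(-o)} = \left\{ \int_{\Omega_v^{(-o)}} \psi(K, L, v_k, \xi_k, \eta_L, \tilde{m}, v_i) \left( \prod_{i=1, i \ne o}^{k} dv_i \right) \left( \prod_{i=1}^{k} \nu_c(d\xi_i) \right) \left( \prod_{l=1}^{L} dG_l(\eta_l) \right) \right\} \]

The joint distribution (prior) is given by

\[ \text{L.H.S} = \exp\{-\lambda\} \sum_{l=1}^{L} I(u_l \le s) \, Z_1 + \exp\{-\lambda\} \sum_{i=1}^{k} I(v_i \le s) \, Z_2 \tag{5-23} \]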


Prior to integration, the induced posterior multiplied by the corresponding marginal (fixed with the fixed part and random with the random part) is given by

\[
\begin{aligned}
\text{R.H.S} &= \left[ \sum_{l=1}^{L} I(u_l = s) \frac{1}{W_U(l)} \exp\{-\lambda_{post}\} Z_1 \right] \exp\{-\lambda\} \exp\{\lambda_{post}\} W_U(l) + \left[ \sum_{i=1}^{k} I(v_i = s) \frac{1}{W_V} \exp\{-\lambda_{post}\} Z_2^{(-o)} \right] \exp\{-\lambda\} \exp\{\lambda_{post}\} \int_{\omega = 0}^{s} W_V \, d\omega \\
&= \exp\{-\lambda\} \sum_{l=1}^{L} I(u_l = s) Z_1 + \exp\{-\lambda\} \sum_{i=1}^{k} I(v_i = s) Z_2^{(-o)} \int_{\omega = 0}^{s} d\omega \\
&= \exp\{-\lambda\} \sum_{l=1}^{L} I(u_l = s) Z_1 + \exp\{-\lambda\} \sum_{i=1}^{k} I(v_i = s) Z_2
\end{aligned}
\tag{5-24}
\]

Now taking the integral of equation 5-24 we have,

\[ \int_{0}^{s} \left[ \exp\{-\lambda\} \sum_{l=1}^{L} I(u_l = s) Z_1 + \exp\{-\lambda\} \sum_{i=1}^{k} I(v_i = s) Z_2 \right] = \exp\{-\lambda\} \sum_{l=1}^{L} I(u_l \le s) Z_1 + \exp\{-\lambda\} \sum_{i=1}^{k} I(v_i \le s) Z_2 \tag{5-25} \]

Now we can see that equation 5-25 is exactly equal to equation 5-23. Thus we have proved equation 5-6 for the Beta-Dirichlet-Negative-Multinomial Process (BDNMP).

5.16.4 The Case When $\lambda = \infty$

Theorem 5.16.3. The previous theorem can still be applied when the Poisson intensity measure is not finite on the interval $[0,1]$, i.e. $\nu_c([0,1]^{Q}) = \infty$, but rather satisfies the weaker condition

\[ \int_{[0,1]^{Q}} (p_{\cdot}) \, \nu_c(dp) < \infty \tag{5-26} \]

The sequence of compound Poisson processes $A^{n}_{prior}$ can be obtained as follows:

\[ A^{n}_{prior,j}(t) = \sum_{s \le t} \Delta A_{prior,j}(s) \, I\left( \Delta A_{prior,\cdot} > \frac{1}{n} \right) \]


The idea of the proof is to show that the limit of the sequence of compound Poisson processes with finite intensity converges to the correct process. Here is a theorem whose proof can be constructed from Theorem 3 found in [ 3 ].

Theorem 5.16.4. Let $A_{prior,n}$ be a CRM with a finite set of fixed atoms in $[0,1]$ and with Poisson process intensity $\nu_{c,n}$. $\nu_c$ satisfies equation 5-26. Let $Y_n$ be drawn as a Negative Multinomial process with parameter $r_0$ and $A_{prior,n}$. Let $A_{prior}$ be a CRM with Poisson process intensity $\nu_c$ and let $Y$ be drawn as a Categorical process with parameter $r_0$ and $A_{prior}$. Then,

\[ \big( A_{prior,n}, Y_n \big) \overset{d}{\to} \big( A_{prior}, Y \big) \]

Using the previous Theorem and Theorem 3.2 in [ 19 ], we can show that even with this weaker condition, the conjugacy of the NMP and the BDP still holds.

5.17 Beta-Dirichlet-Negative Multinomial Process as a Marked Poisson Process

Let us state another very useful theorem, called the Marking Theorem, in the context of Poisson point process theory. The proof can be found in [ 17 ].

Theorem 5.17.1. Let $\Pi$ be a Poisson Process on $S$ with mean measure $\mu$. Suppose that with each point $X$ of the random set $\Pi$ we associate a random variable $m_X$ (the mark of $X$) taking values in some other space $M$. The distribution of $m_X$ may depend on $X$ but not on the other points of $\Pi$, and the $m_X$ for different $X$ are independent with respective distributions $p(X, \cdot)$. The pair $(X, m_X)$ can then be regarded as a random point $X^{*}$ in the product space $S \times M$. The totality of points $X^{*}$ forms a random countable subset $\Pi^{*} = \{(X, m_X) \mid X \in \Pi\}$ of $S \times M$. Now this random subset $\Pi^{*}$ is a Poisson process on $S \times M$ with mean measure $\mu^{*}$ given by

\[ \mu^{*}(A) = \iint_{(x,m) \in A} \mu(dx) \, p(x, dm) \]


The beta distribution may also be parametrized in terms of its mean $\mu$ ($0 < \mu < 1$) and sample size $\nu = \alpha + \beta$ ($\nu > 0$):
\[ \alpha = \mu\nu, \qquad \beta = (1 - \mu)\nu, \qquad \text{where } \nu = \alpha + \beta > 0. \]
A similar parametrization has also been used by Hjort in his paper.

Recall that a BP draw $B \sim \mathrm{BP}(c, A_0)$ can be considered as a draw from the Poisson process with mean measure $\nu_{BP}$, where the CRM $B$ is defined on the product space $\Omega \times [0,1]$ with the Lévy measure
\[ \nu_{BP}(dp\, d\omega) = c\, p^{-1} (1 - p)^{c-1}\, dp\, A_0(d\omega), \]
where $c > 0$ is the concentration parameter (or concentration function if $c$ depends on $\omega$). $A_0$ is the base measure, and $A_0(\Omega)$ is called the mass parameter. The BD process is defined on the product space $\Omega \times [0,1]^K$ with the Lévy measure
\[ \nu_{BD}(d\mathbf{p}\, d\omega) = \frac{\Gamma\left(\sum_{j=1}^{K} \alpha_j\right)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_K)}\, c\, p_1^{\alpha_1 - 1} \cdots p_K^{\alpha_K - 1}\, (p_{\cdot})^{-\sum_{j=1}^{K} \alpha_j} (1 - p_{\cdot})^{c-1}\, d\mathbf{p}\, A_0(d\omega), \]
where $p_{\cdot} = \sum_{j=1}^{K} p_j \le 1$ and the bold $\mathbf{p}$ denotes a vector-valued random process. In a very similar fashion, we write a BD draw as $D \sim \mathrm{BD}(c, A_0, \alpha_1, \dots, \alpha_K)$, where $\{\alpha_i\}_{i=1}^{K}$ are the Dirichlet parameters (which can be functions of $\omega$).

We can mark a random point $(\omega_k, \mathbf{p}_k)$ of $D$ with a random variable $r_{0k}$ taking values in $\mathbb{R}_+$, where $r_{0k}$ and $r_{0k'}$ are independent if $k \ne k'$. The marked Poisson process theory of Kingman then suggests that $\{(\omega_k, \mathbf{p}_k, r_{0k})\}_{k=1}^{\infty}$ can be viewed as random points drawn from a Poisson process in the product space $\Omega \times [0,1]^K \times \mathbb{R}_+$, with compensator or Lévy measure
\[ \nu^*_{BD} = \frac{\Gamma\left(\sum_{j=1}^{K} \alpha_j\right)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_K)}\, c\, p_1^{\alpha_1 - 1} \cdots p_K^{\alpha_K - 1}\, (p_{\cdot})^{-\sum_{j=1}^{K} \alpha_j} (1 - p_{\cdot})^{c-1}\, d\mathbf{p}\, A_0(d\omega)\, H_0(dr_0), \]
where $H_0$ is a continuous finite measure on $\mathbb{R}_+$ with mass parameter $H_0(\mathbb{R}_+)$. With the previous equation, using $A_0 \times H_0$ as a base measure, we can construct a marked BD


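process $D^* \sim \mathrm{BD}(c, H_0 \times A_0, (\alpha_1, \dots, \alpha_K))$ as
\[ D^* = \sum_{k=1}^{\infty} \mathbf{p}_k\, \delta_{(\omega_k, r_{0k})}, \]
where an atom $(\omega_k, r_{0k})$ comes with weight $\mathbf{p}_k \in [0,1]^K$. Now let us define the $i$-th draw from a negative multinomial process as $Y_i \sim \mathrm{NMP}(D^*)$:
\[ Y_i = \sum_{k=1}^{\infty} \mathbf{v}_{ki}\, \delta_{\omega_k}, \qquad \text{where } \mathbf{v}_{ki} \sim \mathrm{NM}(r_{0k}, \mathbf{p}_k). \]
The BD draw $D^*$ defines a set of parameters $\{\mathbf{p}_k, \omega_k, r_{0k}\}_{k=1}^{\infty}$. The pairs $(r_{0k}, \mathbf{p}_k)$ are used to draw the negative multinomial count vector $\mathbf{v}_{ki}$ for each draw $Y_i$, and the atoms $\omega_k$ are shared across all draws $Y_i$. The count vector associated with a given atom $\omega_k$ is thus a function of the index $i$, denoted by $\mathbf{v}_{ki}$.

5.18 Experiment with Simulated Data and Results

We will use this BD Process (BDP) coupled with the Negative Multinomial process (NMP) in an admixture model. From an NM draw, we get a $Q$-dimensional vector of counts for one mixing component. We can treat these counts as the numbers of data points to draw from each of the sub-categories of a particular cluster. Typically the cluster information is a latent variable, and with Bayesian nonparametric techniques we can also infer the unknown number of clusters. We can first choose the number of data points associated with each sub-category of a cluster and then generate the data according to those counts from that cluster. The generative process for the data points is as follows:

- First randomly draw $Q$ counts $m_{j1}, \dots, m_{jQ}$ for cluster $j$, where $j = 1, 2, \dots, K$.
- From each sub-cluster $q$ of cluster $j$, generate $m_{jq}$ data points.
- For each $i = 1, 2, \dots, m_{jq}$, generate data $x_{jq}$ from $F(\theta_{(j,q)})$, where $\theta_{(j,q)}$ is the appropriate parameter or set of parameters.

The total number of data points generated in this manner is $\sum_{j=1}^{K} \sum_{q=1}^{Q} m_{jq}$. We will write down the model for the actual experiment here.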


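Figure 5-6. The Hierarchical BP-BD-NM process with K=3 and Q=2

Figure 5-6 above gives the basic schematic for our generative model in topic modeling. The Hierarchical Beta-Beta Dirichlet-Negative Multinomial (HBBDNM) Process can be generated by
\[ B_0 = \sum_k b_{0,k}\, \delta_{\omega_k} \sim \mathrm{BP}(\theta_0, \gamma_0, H) \]
\[ A_d = \sum_k [b_{d,k,1} \cdots b_{d,k,Q}]^T\, \delta_{\omega_k} \sim \mathrm{BD}\left(\theta_d, \gamma_d, (\alpha_1, \dots, \alpha_Q), \frac{B_0}{B_0(\Omega)}\right) \quad \forall d = 1, 2, \dots, D \]
\[ Y_d = \sum_k [m_{d,k,1} \cdots m_{d,k,Q}]^T\, \delta_{\omega_k} \sim \mathrm{NM}(r_{0d}, A_d) \]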


where $H$ is the base measure for the Beta Process (BP). Here is the generative model for each document, which uses the bag-of-words framework. For each document $d = 1, 2, \dots, D$, there is a set of exchangeable observations $\{x_{d,n}\}_{n=1}^{N_d}$. Let us assume the $\omega_k$ are the latent topics for the corpus and $z_{d,n}$ is the topic associated with $x_{d,n}$. In this scenario with BD, we have the following generative model.

Algorithm 9 Generative model for the corpus
  for each document d, d = 1, ..., D do
    for each topic k, k = 1, ..., K do
      for each category j, j = 1, ..., Q do
        for l = 1, 2, ..., m_{d,k,j} do
          Draw a Bernoulli random variable c with success probability $\pi$
          if c is 1 then
            Generate a word from the vocabulary from a known basic topic T_j
          else
            Generate a word from the vocabulary from an unknown topic $\omega_k$
          end if
        end for
      end for
    end for
  end for

5.18.1 Synthetic Data

In order to build the synthetic data, we take a vocabulary of $W = 100$ words. We take 20 words from each of the following 5 topics: Computer, Biology, Computational Biology, Computational Neuroscience and Bioinformatics. Among these topics, Computer and Biology act like super-topics, and the other three topics combine these two, each most likely in a different way. The generative model is given in Figure 5-6 above. Here $T_1$ is the topic Computer and $T_2$ is the topic Biology, both of which are known. Based on the above generative model, we generate 300 documents with roughly 250 to 400 words each. We then run our inference techniques on this synthetic data set. We ran a Gibbs sampler whose conditional distributions are all given below. We used 1000 burn-in iterations and then collected 1000 samples, keeping one every 5 iterations, to remove a possible


correlation. We can see that we somewhat recovered the underlying latent topics of Computational Biology, Bioinformatics and Computational Neuroscience among the top 6 topics. The parameter $r_0$ for every document, denoted by $r_{0d}$, was set using the following equation:
\[ r_{0d} = \frac{N_d(\theta_0 - 1)}{\theta_0 \gamma_0}, \]
where this estimate comes from the expectation of the NM distribution. Also note that the vocabulary words are currently denoted by ⟨topic⟩ # and these are the representative words for that particular topic.

5.19 Inference for BDNM Model

We will state and show two facts here.

5.19.1 Negative Multinomial likelihood is conjugate to Beta Dirichlet prior

The negative multinomial (NM) distribution is a generalization of the negative binomial distribution to more than two outcomes. We have seen that the BD distribution is conjugate to the NM likelihood.

5.19.2 Negative Multinomial as a mixture of Gamma and multivariate independent Poisson (MIP)

$X = (n_1, \dots, n_Q) \sim \mathrm{NM}(r_0, p_1, \dots, p_Q)$ is equivalent to the following mixture:
\[ \lambda \sim \mathrm{Gamma}(r_0, 1 - p_{\cdot}), \qquad X \sim \text{Multivariate-Poisson}(\lambda p_1, \lambda p_2, \dots, \lambda p_Q), \]
where $p_{\cdot} = \sum_{i=1}^{Q} p_i$. Note that we are using the Gamma distribution for $\lambda$ with shape parameter $\alpha > 0$ and rate parameter $\beta > 0$, whose density is
\[ \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, \lambda^{\alpha - 1} e^{-\beta\lambda}. \]
The mixture is
\[ \int_0^{\infty} \left[ \prod_{j=1}^{Q} \frac{(\lambda p_j)^{n_j} \exp(-\lambda p_j)}{n_j!} \right] \frac{(1 - p_{\cdot})^{r_0}}{\Gamma(r_0)}\, \lambda^{r_0 - 1} \exp(-\lambda(1 - p_{\cdot}))\, d\lambda \]


Figure 5-7. Top 6 topics and their top 20 words


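\[ = \frac{(1 - p_{\cdot})^{r_0}}{\Gamma(r_0)} \frac{\prod_{j=1}^{Q} p_j^{n_j}}{\prod_{j=1}^{Q} n_j!} \int_0^{\infty} \lambda^{r_0 + \sum_{j=1}^{Q} n_j - 1} \exp(-\lambda)\, d\lambda \]
\[ = \frac{(1 - p_{\cdot})^{r_0}}{\Gamma(r_0)} \frac{\prod_{j=1}^{Q} p_j^{n_j}}{\prod_{j=1}^{Q} n_j!} \frac{\Gamma\left(r_0 + \sum_{j=1}^{Q} n_j\right)}{1^{\,r_0 + \sum_{j=1}^{Q} n_j}} \]
\[ = \frac{\Gamma\left(r_0 + \sum_{j=1}^{Q} n_j\right)}{\Gamma(r_0) \prod_{j=1}^{Q} n_j!}\, (1 - p_{\cdot})^{r_0} \prod_{j=1}^{Q} p_j^{n_j}. \]
Comparing this with equation 5-4, we see that they are exactly the same.

5.19.3 Posterior inference with Finite approximate Gibbs sampler

Here we present a posterior inference algorithm for HBBDNMP. Currently the Bernoulli success probability $\pi$ is fixed and assumed to be known, and $K$ is the total number of topics, which arises from the finite approximation of the underlying Beta process [71]. We could instead have implemented the slice or efficient slice MCMC samplers described in [39] and [40]. We will use $m_{d,k,j,l}$ to denote the number of data points generated in document $d$ from topic $k$ under category $j$, where $l$ denotes the mixture component, i.e. whether the word was generated from the basic known topic $T_j$ or the unknown topic $\omega_k$. $T_j$ corresponds to $l = 1$ and $\omega_k$ corresponds to $l = 0$. We will also use the dot notation to denote marginalization over a component. For example, $m_{d,\cdot,j,\cdot}$ denotes the number of words generated in document $d$ under category $j$. So the number of words in a document is
\[ N_d = \sum_k \sum_{j=1}^{Q} \sum_{l=0,1} m_{d,k,j,l}. \]
We want to infer each $\omega_k$ and all three components responsible for each word or observation: $z_{d,n}$, $y_{d,n}$ and $c_{d,n}$. Under HBBDNMP we have the following posterior
\[ p(z_{\cdot,\cdot}, y_{\cdot,\cdot}, c_{\cdot,\cdot}, \omega_{\cdot} \mid x_{\cdot,\cdot}, \Theta) \propto p(z_{\cdot,\cdot}, y_{\cdot,\cdot}, c_{\cdot,\cdot}, \omega_{\cdot}, b_{\cdot,\cdot,\cdot}, b_{0,\cdot} \mid x_{\cdot,\cdot}, \Theta) \tag{5-27} \]
where $\Theta = \{F, H, \pi, \theta_0, \gamma_0, \{\theta_d\}_{d=1}^{D}, \{\gamma_d\}_{d=1}^{D}, \{r_{0d}\}_{d=1}^{D}\}$. Note that not only are the words given as data; we also know the length $N_d$ of each individual document for all $d = 1, \dots, D$. $F$ and $H$ are taken to be Multinomial and Dirichlet, respectively, with known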


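parameters (with dimension $W$). We will employ a finite approximation version of the Gibbs sampler, where the approximation arises from approximating the first-level BP $B_0$. By the size-biased construction of $B_0$, we have a finite approximation of it with a fixed number of components $K$:
\[ B_0 = \sum_{k=1}^{K} b_{0,k}\, \delta_{\omega_k}, \qquad \text{where } b_{0,k} \sim \mathrm{Beta}\left(\frac{\theta_0 \gamma_0}{K},\, \theta_0\left(1 - \frac{\gamma_0}{K}\right)\right); \quad \omega_k \sim H \quad \forall k = 1, \dots, K \tag{5-28} \]

5.19.3.1 BD draws

The parameters for the BD come from the BP in the first level. We can write the BD as
\[ \vec{V}_{d,k} = [b_{d,k,1}, b_{d,k,2}, \dots, b_{d,k,Q}]^T \sim \mathrm{BD}\left(\theta_d \gamma_d b_{0,k},\, \theta_d(1 - \gamma_d b_{0,k}),\, (\alpha_1, \dots, \alpha_Q)\right), \]
and thus we also have
\[ b_{d,k,\cdot} \sim \mathrm{Beta}\left(\theta_d \gamma_d b_{0,k},\, \theta_d(1 - \gamma_d b_{0,k})\right). \]

5.19.3.2 Negative multinomial draws

In order to account for the number of words $N_d$ in a document, we decompose a negative multinomial draw as a mixture of Gamma and multivariate independent Poisson (shown above). Let us take an auxiliary variable $\lambda_{d,k}$ to denote the rate for the Poisson likelihood for topic $k$ in document $d$. If we draw $(i_{d,k,1}, \dots, i_{d,k,Q})^T \sim \mathrm{NM}\left(r_{0d}, (b_{d,k,1}, \dots, b_{d,k,Q})\right)$, we can write it as
\[ \lambda_{d,k} \sim \mathrm{Gamma}\left(r_{0d},\, 1 - b_{d,k,\cdot}\right), \quad \text{where } b_{d,k,\cdot} = \sum_{j=1}^{Q} b_{d,k,j}, \]
\[ (i_{d,k,1}, \dots, i_{d,k,Q})^T \sim \mathrm{MIP}(b_{d,k,1}\lambda_{d,k}, \dots, b_{d,k,Q}\lambda_{d,k}) \tag{5-29} \]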
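The two-stage draw in equation 5-29 can be sketched in a few lines of Python. This is only an illustrative sketch, not the sampler used in the experiments: the function names, the Poisson helper (Knuth's multiplication method) and the numerical values of `r0` and `probs` are assumptions made here. The check at the end compares the empirical means with the known NM mean $r_0 p_j / (1 - p_{\cdot})$.

```python
import math
import random

def sample_poisson(rate, rng):
    """Poisson draw by Knuth's multiplication method (fine for modest rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def sample_negative_multinomial(r0, probs, rng):
    """Draw one NM(r0, probs) count vector as a Gamma mixture of independent Poissons."""
    p_dot = sum(probs)
    assert 0.0 < p_dot < 1.0
    # random.gammavariate takes (shape, scale), so scale = 1 / rate = 1 / (1 - p_dot).
    lam = rng.gammavariate(r0, 1.0 / (1.0 - p_dot))
    return [sample_poisson(p * lam, rng) for p in probs]

# Hypothetical parameter values, chosen only for illustration.
rng = random.Random(0)
r0, probs = 3.0, [0.2, 0.3]
draws = [sample_negative_multinomial(r0, probs, rng) for _ in range(20000)]
empirical_means = [sum(d[j] for d in draws) / len(draws) for j in range(len(probs))]
expected_means = [r0 * p / (1.0 - sum(probs)) for p in probs]
```

With enough draws, `empirical_means` should be close to `expected_means`, which is one quick way to sanity-check the decomposition.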


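5.19.3.3 Gamma-Poisson conjugacy

If the prior and likelihood are given by
\[ \lambda \sim \mathrm{Gamma}(\alpha, \beta), \qquad x_i \sim \mathrm{Poisson}(\lambda) \quad \forall i = 1, 2, \dots, n, \]
then the posterior of $\lambda$ is
\[ \lambda^{\mathrm{post}} \sim \mathrm{Gamma}\left(\alpha + \sum_{i=1}^{n} x_i,\; \beta + n\right). \]
Given the data, we can write down the posterior for $\lambda_{d,k}$ using the above conjugacy as
\[ \lambda^{\mathrm{post}}_{d,k} \sim \mathrm{Gamma}\left(r_{0d} + m_{d,k,\cdot,\cdot},\; (1 - b_{d,k,\cdot}) + 1\right), \]
where $m_{d,k,\cdot,\cdot} = \sum_{j=1}^{Q} \sum_{l=0,1} m_{d,k,j,l} = \sum_{n=1}^{N_d} I(z_{d,n} = k)$, which can be computed as $\sum_{n=1}^{N_d} \sum_{j=1}^{Q} \sum_{l=0,1} I(z_{d,n} = k, y_{d,n} = j, c_{d,n} = l)$.

5.19.3.4 Inference steps

The cluster size $i_{d,k,\cdot,\cdot}$ can be constructed by sampling each $z_{d,n}$ independently from $\lambda_{d,k} / \sum_k \lambda_{d,k}$ and setting $i_{d,k,\cdot,\cdot} = \sum_{n=1}^{N_d} I(z_{d,n} = k)$. The $j$-th category size given the cluster size is in proportion $b_{d,k,j} / \sum_j b_{d,k,j}$. So, conditioned on the data $\{\{x_{d,n}\}_{n=1}^{N_d}\}_{d=1}^{D}$ and $N_d$, we can write down the posterior for $z_{d,n}$, $y_{d,n}$ and $c_{d,n}$. Let us write down the likelihood of the data given $z$, $y$ and $c$. The data for document $d$ is written as $X_d$ and its length is denoted by $N_d$.
\[ L(\{X_d, N_d\}_{d=1}^{D}) = \prod_{d=1}^{D} \prod_{n=1}^{N_d} \left[ \prod_{k=1}^{K} \left[ \prod_{j=1}^{Q} \left[ [P(x_{d,n} = x \mid \omega_k)]^{I(c_{d,n} = 0)} [P(x_{d,n} = x \mid T_j)]^{I(c_{d,n} = 1)} \right]^{I(y_{d,n} = j)} \right]^{I(z_{d,n} = k)} \right] \times \prod_{d=1}^{D} \left[ \prod_{k=1}^{K} \left[ \prod_{j=1}^{Q} P(N_{d,k,j} = m_{d,k,j}) \right] \right] \tag{5-30} \]
Here $P(N_{d,k,j} = m_{d,k,j})$ is a Poisson likelihood with parameter $b_{d,k,j}\lambda_{d,k}$, and $\lambda_{d,k}$ has a Gamma prior with shape and rate parameters $r_{0d}$ and $(1 - b_{d,k,\cdot})$, respectively. So, given the data (the length of the document restricted to a particular topic), $P(z_{d,n} = k \mid y_{d,n} = j, c_{d,n} = l, x_{d,n} = x)$ should be proportional to the posterior of $\lambda$ and to $b_{d,k,j}$, i.e. to $b_{d,k,j}\, \lambda^{\mathrm{post}}_{d,k}$.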
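As a small illustration of the conjugate update and of how it feeds the topic weights, the following Python sketch computes the Gamma posterior parameters, draws the posterior rates, and normalizes weights proportional to the word likelihood times $b_{d,k,j}$ times the posterior rate. The helper names and all numerical values are hypothetical and chosen only for this example; the dissertation does not prescribe this code.

```python
import random

def gamma_poisson_posterior(shape, rate, counts):
    """Posterior (shape, rate) of lambda under a Gamma(shape, rate) prior
    and i.i.d. Poisson(lambda) observations."""
    return shape + sum(counts), rate + len(counts)

def lambda_posterior_params(r0_d, b_dk_dot, m_dk):
    """Posterior parameters of lambda_{d,k}: Gamma(r0_d + m_dk, (1 - b_dk_dot) + 1)."""
    return r0_d + m_dk, (1.0 - b_dk_dot) + 1.0

def topic_weights(word_likelihoods, b_dkj, lambda_post):
    """Normalized weights proportional to likelihood * b_{d,k,j} * lambda_post, one per topic k."""
    unnormalized = [lik * b * lam
                    for lik, b, lam in zip(word_likelihoods, b_dkj, lambda_post)]
    total = sum(unnormalized)
    return [w / total for w in unnormalized]

# Hypothetical values for one document d, K = 2 topics and one fixed category j.
rng = random.Random(1)
r0_d = 2.0
b_dot = [0.5, 0.4]    # b_{d,k,.} for k = 1, 2
m_dk = [7, 3]         # words currently assigned to each topic
b_j = [0.3, 0.1]      # b_{d,k,j}
lik = [0.02, 0.05]    # P(x | omega_k) or P(x | T_j), depending on c
params = [lambda_posterior_params(r0_d, b, m) for b, m in zip(b_dot, m_dk)]
# random.gammavariate takes (shape, scale), so scale = 1 / rate.
lam_post = [rng.gammavariate(shape, 1.0 / rate) for shape, rate in params]
weights = topic_weights(lik, b_j, lam_post)
```

The resulting `weights` can be used directly as the probabilities of a categorical draw for $z_{d,n}$.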


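5.19.3.5 Sampling $z_{d,n}$, $y_{d,n}$ and $c_{d,n}$

\[ P(z_{d,n} = k \mid y_{d,n} = j, c_{d,n} = l, x_{d,n} = x) \propto [P(x_{d,n} = x \mid \omega_k)]^{I(l=0)} [P(x_{d,n} = x \mid T_j)]^{I(l=1)}\, b_{d,k,j}\, \lambda^{\mathrm{post}}_{d,k} \quad \forall k = 1, \dots, K \]
\[ P(y_{d,n} = j \mid z_{d,n} = k, c_{d,n} = l, x_{d,n} = x) \propto \frac{b_{d,k,j} \left\{ [P(x_{d,n} = x \mid \omega_k)]^{I(l=0)} [P(x_{d,n} = x \mid T_j)]^{I(l=1)} \right\}}{\sum_j b_{d,k,j} \left\{ [P(x_{d,n} = x \mid \omega_k)]^{I(l=0)} [P(x_{d,n} = x \mid T_j)]^{I(l=1)} \right\}} \quad \forall j = 1, \dots, Q \]
\[ P(c_{d,n} = l \mid z_{d,n} = k, y_{d,n} = j, x_{d,n} = x) \propto \frac{[\pi\, P(x_{d,n} = x \mid T_j)]^{I(l=1)} [(1 - \pi)\, P(x_{d,n} = x \mid \omega_k)]^{I(l=0)}}{\pi\, P(x_{d,n} = x \mid T_j) + (1 - \pi)\, P(x_{d,n} = x \mid \omega_k)} \quad \text{for } l = 0 \text{ or } 1 \]

5.19.3.6 Sampling $A_{d,k}$

For sampling $A_{d,k} = (b_{d,k,1}, \dots, b_{d,k,Q})^T$, we use the BD and NM conjugacy:
\[ A_{d,k} \sim \mathrm{BD}\left(\theta_d \gamma_d b_{0,k} + m_{d,k,\cdot,\cdot},\; \theta_d(1 - \gamma_d b_{0,k}) + r_{0d},\; (\alpha_{d,1} + m_{d,k,1,\cdot}, \dots, \alpha_{d,Q} + m_{d,k,Q,\cdot})\right) \tag{5-31} \]
Also, from the definition of BD, $b_{d,k,\cdot}$ follows a Beta distribution with parameters $\left(\theta_d \gamma_d b_{0,k} + m_{d,k,\cdot,\cdot},\; \theta_d(1 - \gamma_d b_{0,k}) + r_{0d}\right)$.

5.19.3.7 Sampling $b_{0,k}$

This is very similar. The conditional density of $b_{0,k}$ is proportional to
\[ b_{0,k}^{(\theta_0 \gamma_0 / K) - 1} (1 - b_{0,k})^{\theta_0(1 - \gamma_0 / K) - 1} \prod_{d=1}^{D} \frac{1}{\Gamma(\theta_d \gamma_d b_{0,k})\, \Gamma(\theta_d(1 - \gamma_d b_{0,k}))} \left( \frac{b_{d,k,\cdot}}{1 - b_{d,k,\cdot}} \right)^{\theta_d \gamma_d b_{0,k}}. \]
We know $b_{d,k,\cdot}$ follows a Beta distribution with the above parameters, so integrating out $b_{\cdot,k,\cdot}$ we obtain the conditional density of $b_{0,k}$, up to a normalizing constant, as
\[ b_{0,k}^{(\theta_0 \gamma_0 / K) - 1} (1 - b_{0,k})^{\theta_0(1 - \gamma_0 / K) - 1} \prod_{d=1}^{D} \frac{\Gamma(i_{d,k,\cdot,\cdot} + \theta_d \gamma_d b_{0,k})\, \Gamma(r_{0d} + \theta_d(1 - \gamma_d b_{0,k}))}{\Gamma(\theta_d \gamma_d b_{0,k})\, \Gamma(\theta_d(1 - \gamma_d b_{0,k}))} \tag{5-32} \]
We will employ a random walk MH algorithm with a Beta or Uniform proposal distribution.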
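A random-walk Metropolis-Hastings update for $b_{0,k}$ targeting the density in equation 5-32 might look like the sketch below. It is written under several assumptions made here for illustration: the two first-level hyperparameters are called `theta0` and `gamma0`, the document-level ones `theta_d` and `gamma_d` with `gamma_d <= 1`, the proposal is a symmetric Uniform step, and all numerical values are hypothetical. Proposals outside $(0,1)$ get zero target density and are therefore rejected.

```python
import math
import random

def log_target_b0k(b, theta0, gamma0, K, docs):
    """Unnormalized log conditional density of b_{0,k}, in the form of equation 5-32.

    docs holds one tuple (theta_d, gamma_d, r0_d, i_dk) per document.
    """
    if not 0.0 < b < 1.0:
        return float("-inf")  # outside the support
    value = (theta0 * gamma0 / K - 1.0) * math.log(b)
    value += (theta0 * (1.0 - gamma0 / K) - 1.0) * math.log(1.0 - b)
    for theta_d, gamma_d, r0_d, i_dk in docs:
        a = theta_d * gamma_d * b
        c = theta_d * (1.0 - gamma_d * b)
        value += math.lgamma(i_dk + a) + math.lgamma(r0_d + c)
        value -= math.lgamma(a) + math.lgamma(c)
    return value

def random_walk_mh(b_init, n_steps, step, rng, theta0, gamma0, K, docs):
    """Random-walk MH with a symmetric Uniform(-step, step) proposal."""
    b = b_init
    log_p = log_target_b0k(b, theta0, gamma0, K, docs)
    samples = []
    for _ in range(n_steps):
        proposal = b + rng.uniform(-step, step)
        log_p_new = log_target_b0k(proposal, theta0, gamma0, K, docs)
        accept_prob = math.exp(min(0.0, log_p_new - log_p))
        if rng.random() < accept_prob:
            b, log_p = proposal, log_p_new
        samples.append(b)
    return samples

# Hypothetical hyperparameters and counts for two documents.
rng = random.Random(2)
docs = [(5.0, 0.8, 2.0, 4), (5.0, 0.8, 3.0, 0)]
samples = random_walk_mh(0.5, 2000, 0.1, rng, theta0=3.0, gamma0=2.0, K=10, docs=docs)
```

Working on the log scale with `math.lgamma` avoids overflow in the Gamma-function ratios; the step size `0.1` is arbitrary and would normally be tuned for a reasonable acceptance rate.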


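5.19.3.8 Sampling $\omega_k$

We will use conjugacy for $F$ and $H$, which happen to be Multinomial and Dirichlet in our case. The posterior is proportional to
\[ H(\omega_k) \prod_{d=1}^{D} \prod_{n=1}^{N_d} P(x_{d,n} = x \mid \omega_k)^{I(z_{d,n} = k;\; c_{d,n} = 0)} \tag{5-33} \]
which is a Dirichlet distribution with parameters updated by conjugacy.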
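The Dirichlet update in equation 5-33 amounts to adding, for each vocabulary word, the number of occurrences assigned to topic $k$ with $c_{d,n} = 0$ to the prior parameter. A minimal Python sketch follows, assuming words are encoded as integer vocabulary indices and the assignments of all documents are pooled into flat lists; the function names and the toy data are hypothetical.

```python
import random

def dirichlet_posterior_params(prior, words, z, c, topic):
    """Add to the Dirichlet prior the counts of words with z = topic and c = 0."""
    posterior = list(prior)
    for word, z_n, c_n in zip(words, z, c):
        if z_n == topic and c_n == 0:
            posterior[word] += 1
    return posterior

def sample_dirichlet(params, rng):
    """Draw from a Dirichlet distribution by normalizing independent Gamma(a, 1) draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(gammas)
    return [g / total for g in gammas]

# Toy data: vocabulary of W = 4 words, five observed words pooled over documents.
rng = random.Random(3)
prior = [0.5, 0.5, 0.5, 0.5]
words = [0, 2, 2, 3, 1]
z = [1, 1, 0, 1, 1]
c = [0, 0, 0, 1, 0]
posterior = dirichlet_posterior_params(prior, words, z, c, topic=1)
omega_k = sample_dirichlet(posterior, rng)
```

Words generated from a known basic topic ($c_{d,n} = 1$) do not contribute to the update, which is why the fourth word above leaves the count for vocabulary index 3 unchanged.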


CHAPTER 6
DISCUSSION AND FUTURE WORK

In this dissertation we have presented two models featuring the Bayesian nonparametric framework. The first segment invoked the DPMM framework on the Stiefel manifold, using which we clustered matrix-variate data that does not lie in Euclidean space. The second segment extended the IBP to categorical data and demonstrated that the Beta-Dirichlet process is the underlying De Finetti mixing distribution of the cIBP.

6.1 Future work Related to the First Problem

The techniques presented in this dissertation can be systematically extended to dictionary learning and nonparametric Bayesian density estimation on other manifolds. An immediate extension would be to carry out the inference procedure for the corresponding DPMM model on the Grassmann manifold. Here we briefly review the definition and properties of the Grassmann manifold.

Grassmann Manifold: The Grassmann manifold $G_{n,p}$ is a space whose points/elements are $p$-dimensional hyperplanes in $\mathbb{R}^n$ containing the origin. Elements in $G_{n,p}$ may also be specified by orthonormal bases, which can be represented by $n$-by-$p$ matrices. The important point to note is that the choice of the matrix is not unique, unlike in the case of the Stiefel manifold. The Grassmann manifold instead consists of equivalence classes of $n$-by-$p$ orthonormal matrices, where two matrices are equivalent if their columns span the same $p$-dimensional subspace. This gives the quotient representation of the Grassmann manifold via the Stiefel manifold as $V_{n,p}/O(p)$, with respect to the group of right-orthogonal transformations $X \to XH$ for $H \in O(p)$. The orbit of $X$ on the Stiefel manifold maps to a fixed point in the Grassmann manifold (a fixed $p$-dimensional subspace). Among the non-uniform distributions on $P_{n,p}$, the Matrix-Bingham (MB) distribution is one of the more popular distributions. It is defined as


\[ \mathrm{MB}(Y Y^T; A) = \frac{1}{{}_1F_1\left(\frac{p}{2}; \frac{n}{2}; A\right)} \exp\left(\mathrm{trace}(A\, Y Y^T)\right) \tag{6-1} \]
\[ = \frac{1}{{}_1F_1\left(\frac{p}{2}; \frac{n}{2}; A\right)} \exp\left(\mathrm{trace}(Y^T A Y)\right) \tag{6-2} \]
because, by the cyclic property of the trace, $\mathrm{trace}(Y^T A Y) = \mathrm{trace}(A\, Y Y^T)$. Here $Y$ is an $n$-by-$p$ orthonormal matrix representing the subspace, $Y Y^T$ is the corresponding projection matrix, and $A$ is an $n \times n$ symmetric parameter matrix.

We plan to work out the nonparametric Bayesian inference in detail, both MCMC and variational inference, for the Grassmann manifold. As for MCMC, conjugacy will no longer be available using the Bingham family of distributions. We shall therefore incorporate a Metropolis-Hastings type update or the rejection sampling method to carry out the inference. Formal analysis of convergence (ergodic properties) for all these Markov chains would be another interesting direction for future work. Finally, for both the Stiefel and the Grassmann manifolds, we would like to implement a split-merge MCMC algorithm [72], which is a recent development in the MCMC literature. The current MCMC algorithms suffer from slow convergence and poor mixing. In fact, when two or more mixture components have similar parameters, the Gibbs sampling method may get trapped in a local mode, which might then lead to an incorrect clustering of the data points. Typically all of the current MCMC algorithms work in an incremental manner, which is not suited to moving a single data point to a new mixture component when the components have similar parameters. For the new method it has been argued that one is able to traverse the state space quickly and visit high-probability regions more often, as it splits or merges a group of observations in each update. This algorithm, or some variant of it like the sequentially allocated split-merge sampler [73], is claimed to work better with high-dimensional data.

6.2 Future work Related to the Second Problem

Since the BDP has a positive covariance structure, according to the authors of [3], it would be interesting to develop a process where the covariance structure is negative. In the entire


discussion we have assumed that, in the underlying Poisson random measure, the weights and the locations of the atoms are independent. We plan to investigate how a simple dependence structure can be imposed in such a manner that the inference does not become difficult. Finally, there are numerous machine learning problems that might require modeling new classes of priors that are yet to be introduced. We would like to devote some thought to devising new classes of priors that suit specific types of applications such as topic modeling, dynamic texture classification, shape analysis, object recognition, clustering of time-series models, etc.


REFERENCES

[1] G. Camano-Garcia, Statistics on Stiefel manifolds. ProQuest, 2006.
[2] R. Thibaux and M. I. Jordan, "Hierarchical beta processes and the Indian buffet process," in International Conference on Artificial Intelligence and Statistics, vol. 11, 2007, pp. 564-571.
[3] Y. Kim, L. James, and R. Weissbach, "Bayesian analysis of multistate event history data: beta-Dirichlet process prior," Biometrika, vol. 99, no. 1, pp. 127-140, 2012.
[4] W. Boothby, An introduction to differentiable manifolds and Riemannian geometry. Academic Press, 1986, vol. 120.
[5] J. Lee, Introduction to smooth manifolds. Springer Verlag, 2003, vol. 218.
[6] Y. Chikuse, Statistics on special manifolds. Springer Verlag, 2003, vol. 174.
[7] A. Edelman, T. Arias, and S. Smith, "The geometry of algorithms with orthonormality constraints," SIAM Journal on Matrix Analysis and Applications, vol. 20, no. 2, pp. 303-353, 1999.
[8] P. Absil, R. Mahony, and R. Sepulchre, Optimization algorithms on matrix manifolds. Princeton Univ Pr, 2008.
[9] A. Bhattacharya and D. Dunson, "Nonparametric Bayesian density estimation on manifolds with applications to planar shapes," Biometrika, vol. 97, no. 4, p. 851, 2010.
[10] P. Saisan, G. Doretto, Y. Wu, and S. Soatto, "Dynamic texture recognition," in Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, vol. 2. IEEE, 2001, pp. II-58.
[11] T. L. Griffiths and Z. Ghahramani, "The Indian buffet process: An introduction and review," Journal of Machine Learning Research, vol. 12, pp. 1185-1224, 2011.
[12] Y. Teh, D. Gorur, and Z. Ghahramani, "Stick-breaking construction for the Indian buffet process," in Proceedings of the International Conference on Artificial Intelligence and Statistics, vol. 11, 2007.
[13] Y. Teh and D. Gorur, "Indian buffet processes with power-law behavior," 2009.
[14] K.-i. Sato, Lévy processes and infinitely divisible distributions. Cambridge University Press, 1999.
[15] D. Applebaum, Lévy processes and stochastic calculus. Cambridge University Press, 2004, vol. 93.
[16] D. Daley and D. Vere-Jones, "An introduction to the theory of point processes," 1988.
[17] J. F. C. Kingman, Poisson processes. Oxford University Press, USA, 1993, vol. 3.


[18] N. L. Hjort, "Nonparametric Bayes estimators based on beta processes in models for life history data," The Annals of Statistics, vol. 18, no. 3, pp. 1259-1294, 1990.
[19] Y. Kim, "Nonparametric Bayesian estimators for counting processes," Annals of Statistics, pp. 562-588, 1999.
[20] T. Broderick, L. Mackey, J. Paisley, and M. Jordan, "Combinatorial clustering and the beta negative binomial process," arXiv preprint arXiv:1111.1802, 2011.
[21] A. Gelman, J. Carlin, H. Stern, and D. Rubin, Bayesian data analysis. CRC Press, 2004.
[22] J. Ghosh and R. Ramamoorthi, Bayesian nonparametrics. Springer Verlag, 2003.
[23] T. Ferguson, "A Bayesian analysis of some nonparametric problems," The Annals of Statistics, pp. 209-230, 1973.
[24] E. Hewitt and L. J. Savage, "Symmetric measures on Cartesian products," Transactions of the American Mathematical Society, vol. 80, no. 2, pp. 470-501, 1955.
[25] Y. Teh, "Dirichlet process," Encyclopedia of Machine Learning, pp. 280-287, 2010.
[26] B. Øksendal, Stochastic differential equations: an introduction with applications. Springer Verlag, 2003.
[27] D. Blackwell and J. MacQueen, "Ferguson distributions via Polya urn schemes," The Annals of Statistics, vol. 1, no. 2, pp. 353-355, 1973.
[28] J. Sethuraman, "A constructive definition of Dirichlet priors," Statistica Sinica, vol. 4, pp. 639-650, 1994.
[29] C. Antoniak, "Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems," The Annals of Statistics, pp. 1152-1174, 1974.
[30] J. F. Kingman, "Completely random measures," Pacific Journal of Mathematics, vol. 21, no. 1, pp. 59-78, 1967.
[31] L. F. James, A. Lijoi, and I. Prunster, "Posterior analysis for normalized random measures with independent increments," Scandinavian Journal of Statistics, vol. 36, no. 1, pp. 76-97, 2009.
[32] T. Griffiths and Z. Ghahramani, "Infinite latent feature models and the Indian buffet process," 2005.
[33] C. Andrieu, N. De Freitas, A. Doucet, and M. Jordan, "An introduction to MCMC for machine learning," Machine Learning, vol. 50, no. 1, pp. 5-43, 2003.
[34] W. R. Gilks and P. Wild, "Adaptive rejection sampling for Gibbs sampling," Applied Statistics, pp. 337-348, 1992.


[35] W. R. Gilks, N. Best, and K. Tan, "Adaptive rejection Metropolis sampling within Gibbs sampling," Applied Statistics, pp. 455-472, 1995.
[36] M. Escobar and M. West, "Bayesian density estimation and inference using mixtures," Journal of the American Statistical Association, pp. 577-588, 1995.
[37] S. MacEachern and P. Muller, "Estimating mixture of Dirichlet process models," Journal of Computational and Graphical Statistics, pp. 223-238, 1998.
[38] R. Neal, "Markov chain sampling methods for Dirichlet process mixture models," Journal of Computational and Graphical Statistics, pp. 249-265, 2000.
[39] S. G. Walker, "Sampling the Dirichlet mixture model with slices," Communications in Statistics - Simulation and Computation, vol. 36, no. 1, pp. 45-54, 2007.
[40] M. Kalli, J. E. Griffin, and S. G. Walker, "Slice sampling mixture models," Statistics and Computing, vol. 21, no. 1, pp. 93-105, 2011.
[41] C. Bishop, Pattern recognition and machine learning. Springer New York, 2006, vol. 4.
[42] D. Blei and M. Jordan, "Variational inference for Dirichlet process mixtures," Bayesian Analysis, vol. 1, no. 1, pp. 121-144, 2006.
[43] K. Mardia and P. Jupp, Directional statistics. John Wiley & Sons Inc, 2000.
[44] R. Muirhead, Aspects of multivariate statistical theory. Wiley Online Library, 1982, vol. 42.
[45] P. Koev and A. Edelman, "The efficient evaluation of the hypergeometric function of a matrix argument," Mathematics of Computation, vol. 75, no. 254, pp. 833-846, 2006.
[46] P. Hoff, "Simulation of the matrix Bingham-von Mises-Fisher distribution, with applications to multivariate and relational data," Journal of Computational and Graphical Statistics, vol. 18, no. 2, pp. 438-456, 2009.
[47] A. Wood, "Simulation of the von Mises Fisher distribution," Communications in Statistics - Simulation and Computation, vol. 23, no. 1, pp. 157-164, 1994.
[48] T. D. Downs, "Orientation statistics," Biometrika, vol. 59, no. 3, pp. 665-676, 1972.
[49] C. Khatri and K. Mardia, "The von Mises-Fisher matrix distribution in orientation statistics," Journal of the Royal Statistical Society. Series B (Methodological), pp. 95-106, 1977.
[50] P. Jupp and K. Mardia, "Maximum likelihood estimators for the matrix von Mises-Fisher and Bingham distributions," The Annals of Statistics, vol. 7, no. 3, pp. 599-606, 1979.
[51] M. Zhou, H. Chen, J. Paisley, L. Ren, G. Sapiro, and L. Carin, "Non-parametric Bayesian dictionary learning for sparse image representations," 2009.


[52] M. Elad, Sparse and redundant representations: from theory to applications in signal and image processing. Springer, 2010.
[53] Y. C. Eldar and M. Mishali, "Robust recovery of signals from a structured union of subspaces," Information Theory, IEEE Transactions on, vol. 55, no. 11, pp. 5302-5316, 2009.
[54] Y. C. Eldar, P. Kuppinger, and H. Bolcskei, "Compressed sensing of block-sparse signals: Uncertainty relations and efficient recovery," arXiv preprint arXiv:0906.3173, 2009.
[55] M. Stojnic, F. Parvaresh, and B. Hassibi, "On the reconstruction of block-sparse signals with an optimal number of measurements," Signal Processing, IEEE Transactions on, vol. 57, no. 8, pp. 3075-3085, 2009.
[56] R. G. Baraniuk, V. Cevher, M. F. Duarte, and C. Hegde, "Model-based compressive sensing," Information Theory, IEEE Transactions on, vol. 56, no. 4, pp. 1982-2001, 2010.
[57] D. S. P. Richards, "High-dimensional random matrices from the classical matrix groups, and generalized hypergeometric functions of matrix argument," Symmetry, vol. 3, no. 3, pp. 600-610, 2011.
[58] K. I. Gross and D. S. P. Richards, "Special functions of matrix argument. I: Algebraic induction, zonal polynomials, and hypergeometric functions," Trans. Am. Math. Soc., vol. 301, no. 2, pp. 781-812, 1987.
[59] A. Constantine, "Some non-central distribution problems in multivariate analysis," The Annals of Mathematical Statistics, pp. 1270-1285, 1963.
[60] H. Skovgaard, "On inequalities of Turan type," Math. Scand., vol. 2, pp. 65-73, 1954.
[61] H. Fu and P.-Y. Kam, "Exponential-type bounds on the first-order Marcum Q-function," in Global Telecommunications Conference (GLOBECOM 2011), 2011 IEEE. IEEE, 2011, pp. 1-5.
[62] B. Leibe and B. Schiele, "Analyzing appearance and contour based methods for object categorization," in Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on, vol. 2. IEEE, 2003, pp. II-409.
[63] H. Cetingul and R. Vidal, "Intrinsic mean shift for clustering on Stiefel and Grassmann manifolds," in Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 2009, pp. 1896-1902.
[64] O. Hamsici and A. Martinez, "Spherical-homoscedastic distributions: The equivalency of spherical and normal distributions in classification," Journal of Machine Learning Research, vol. 8, no. 1583-1623, pp. 1-3, 2007.
[65] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 1. IEEE, 2005, pp. 886-893.


[66] L. Fei-Fei and P. Perona, "A Bayesian hierarchical model for learning natural scene categories," in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 2. IEEE, 2005, pp. 524-531.
[67] K. Fang, S. Kotz, and K. Ng, Symmetric Multivariate and Related Distributions. Monographs on Statistics and Applied Probability. London: Chapman and Hall Ltd. MR1071174, 1990.
[68] L. F. James, A. Lijoi, and I. Prunster, "Posterior analysis for normalized random measures with independent increments," Scandinavian Journal of Statistics, vol. 36, no. 1, pp. 76-97, 2008.
[69] F. Leisen, A. Lijoi, and D. Spano, "A vector of Dirichlet processes," 2012.
[70] J. Jacod and A. N. Shiryaev, Limit theorems for stochastic processes. Springer-Verlag Berlin, 1987, vol. 288.
[71] J. Paisley, A. Zaas, C. Woods, G. Ginsburg, and L. Carin, "A stick-breaking construction of the beta process," in International Conference on Machine Learning. Haifa, Israel, 2010.
[72] S. Jain and R. Neal, "A split-merge Markov chain Monte Carlo procedure for the Dirichlet process mixture model," Journal of Computational and Graphical Statistics, vol. 13, no. 1, pp. 158-182, 2004.
[73] D. Dahl, "Sequentially-allocated merge-split sampler for conjugate and nonconjugate Dirichlet process mixture models," Journal of Computational and Graphical Statistics, 2005.


BIOGRAPHICAL SKETCH

Subhajit Sengupta was born in the eastern part of India, near Kolkata. After finishing tenth and twelfth grade at Mahesh Sri Ramakrishna Ashram Vidyalaya and Kanailal Vidyamandir, respectively, he was admitted to Jadavpur University, Kolkata, for his undergraduate studies in Computer Science and Engineering. After finishing his undergraduate education in 2004, he worked as a software developer at HCL-Technologies, Noida, India. Having a strong desire to do research, he was admitted to the Ph.D. program in the Computer and Information Science and Engineering Department at the University of Florida in 2006. His primary area of research is machine learning and applied statistics, especially Bayesian nonparametrics. His other research areas are dynamical systems in computational neuroscience and computer vision. He graduated with a Ph.D. from the University of Florida in May 2013.