Techniques of parallelization in Markov chain Monte Carlo methods


Material Information

Title:
Techniques of parallelization in Markov chain Monte Carlo methods
Physical Description:
1 online resource (93 p.)
Language:
english
Creator:
Gopal,Vikneswaran
Publisher:
University of Florida
Place of Publication:
Gainesville, Fla.
Publication Date:

Thesis/Dissertation Information

Degree:
Doctorate ( Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Statistics
Committee Chair:
Casella, George
Committee Members:
Hobert, James P
Khare, Kshitij
Davis, John M

Subjects

Subjects / Keywords:
amdahl -- clt -- mcmc -- parallel -- regeneration -- renewal -- topic
Statistics -- Dissertations, Academic -- UF
Genre:
Statistics thesis, Ph.D.
bibliography (marcgt)
theses (marcgt)
government publication (state, provincial, territorial, dependent) (marcgt)
born-digital (sobekcm)
Electronic Thesis or Dissertation

Notes

Abstract:
Parallel computing is at the forefront of statistical research today. The main reason is that it is the most scalable solution, in terms of both cost and computational ability, to the large applied problems being solved by contemporary researchers. As many of these solutions rely on Markov chain Monte Carlo (MCMC) techniques, we investigate methods of running such chains in parallel. For a geometrically ergodic Markov chain, we begin by establishing that the most theoretically sound method of running it in parallel is through the use of regeneration, which divides the chain into truly independent blocks, or tours. We then investigate some operational issues relevant to parallelizing the regenerative method. Using renewal theory, we show that it is possible to use the information from the completed tours to complete the unfinished ones. We also derive a lower bound on the speed-up attainable by our parallel algorithm, as compared with the sequential one. Finally, we document and provide a general purpose software package for running a minorized geometrically ergodic Markov chain on a cluster. We demonstrate our parallel methodology and the use of our software package on two problems of general interest - hierarchical linear mixed models and topic models. As an aside, we also investigate running adaptive Markov chains using a pipeline algorithm. We show that it has convergence properties similar to the sequential algorithm, but provides a substantial speed-up in clock-time.
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Bibliography:
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility:
by Vikneswaran Gopal.
Thesis:
Thesis (Ph.D.)--University of Florida, 2011.
Local:
Adviser: Casella, George.
Electronic Access:
RESTRICTED TO UF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE UNTIL 2012-08-31

Record Information

Source Institution:
UFRGP
Rights Management:
Applicable rights reserved.
Classification:
lcc - LD1780 2011
System ID:
UFE0043220:00001




Full Text


ACKNOWLEDGMENTS

I would like to express my deepest gratitude to my committee members for their guidance. In particular to my advisor - who inspired me to take up statistics even before I met him. It is no exaggeration to state that this journey would not have been possible without his coaching, encouragement and tremendous understanding.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION

2 RELEVANT MARKOV CHAIN THEORY
    2.1 Regenerative Markov Chains
    2.2 Proving Geometric Ergodicity of a Markov Chain
    2.3 Regeneration in Parallel
    2.4 Minorization of Common Markov Chain Monte Carlo Samplers
    2.5 Finite State Samplers with Mixture Candidates
    2.6 Summary

3 IMPLEMENTATION ASPECTS OF PARALLEL REGENERATION
    3.1 Achievable Speed-up from Parallelization
    3.2 Completing Unfinished Tours
        3.2.1 Renewal Theory
        3.2.2 Estimating Residual Life and Residual Reward
    3.3 Possible Diagnostics
    3.4 Summary

4 R PACKAGE FOR PARALLEL REGENERATIVE MARKOV CHAINS
    4.1 Code and Method Validation
        4.1.1 Minorizing the Oneway Model
        4.1.2 The Experiment
    4.2 Hierarchical Linear Mixed Model (HLMM)
    4.3 Topic Models
    4.4 Summary

5 PIPELINED ADAPTIVE MARKOV CHAIN MONTE CARLO
    5.1 Introduction
    5.2 Stochastic Approximation
    5.3 The Ordinary Differential Equation Method of Proof
    5.4 Pipeline Version of Stochastic Approximation
    5.5 Convergence Rate
    5.6 …
    5.7 Summary

6 FUTURE WORK

APPENDIX: R PACKAGE DOCUMENTATION
    A.1 Overview
    A.2 Inner Workings of the Code
    A.3 User Interface to Package Functions
    A.4 Interactive Sessions

LIST OF REFERENCES

BIOGRAPHICAL SKETCH

LIST OF TABLES

1-1 Taxonomy of parallel processing architectures
4-1 Numerically computed posterior means for test data set
4-2 Regenerative method coverage probabilities
4-3 Timings for oneway model
4-4 REML estimates for petrol data model
4-5 ML estimates for petrol data model
4-6 Petrol data regenerative estimates
4-7 Timings for petrol data
4-8 Index notation for Latent Dirichlet Allocation model
4-9 Timings for topic model simulated data
4-10 Topic model simulated data regenerative estimates

LIST OF FIGURES

3-1 Speed-up from parallel regeneration
4-1 Coefficient of variation boxplots
4-2 Scaled Regeneration Quantile Plots
4-3 Proximity to theorised Normal approximation
4-4 Multi-modality of LDA model
5-1 Pipelined adaptive Markov chain Monte Carlo
A-1 Flowchart of McParre functions


CHAPTER 1
INTRODUCTION

Suchard et al. (2010) and Zhou et al. (2010), have become more frequent in recent times. What exactly is driving this need for parallel computing? All signs point to increasingly large datasets, and to the need to solve problems of increasing computational complexity. In the field of genetics, for example, it can be argued that datasets are increasing in size faster than the processing hardware is improving to handle them. On the other hand, methodologies such as bootstrapping require more computing resources than typical statistical techniques, simply by design. Similarly, Monte Carlo simulation techniques allow more sophisticated analyses of data, but again, they demand more processing power. These would naturally point to a need to boost the speed of our individual processors, but the trend is very clearly towards linking up processors of moderate speed rather than a concerted effort to increase the speed of individual processors. The reason for this is simple: the former option can be executed at a fraction of the cost of the latter. Doubling the speed of a processor is costly because it requires memory access speeds and cooling to be concomitantly improved. Although adding another processor would require a rewrite of the programs in order to fully optimize the new parallel processing environment, it is inordinately cheap, easy to do and is a scalable solution. This explains why supercomputers are prohibitively expensive except to dedicated research institutes, whereas cluster computing possibilities are readily available and accessible. Most universities have their own cluster, while companies such as Amazon, IBM and now HP, have made cloud computing available to anyone at an extremely cheap cost. For this reason too, our desktops and notebooks already contain multiple processors.

Kontoghiorghes (2006) and Matloff (2011). In this section we shall touch on some of the topics in those references, and in Suchard et al. (2010), but our main aim is to introduce the parallel computing architecture and approach that we have chosen for our purpose. Years ago, a taxonomy of parallel processing architectures was put forward in Flynn (1972). Today, the dividing lines are not as clean as before, since many contemporary parallel set-ups are hybrids of those outlined in that seminal paper. However, the original breakdown is still a good starting point. Referring to Table 1-1, models for parallel computation can be categorised into 4 basic kinds. Single-core desktops and notebooks from a few years ago fall under the SISD category. They contain one processor, and they execute one instruction at a time. In essence, they are not parallel. The MISD model is used only in specialised applications, such as when the algorithm can be run in a pipeline. It is also used in cases where multiple processors are called on to carry out the same computation on the same data in order to detect and mask errors. This is sometimes termed fault-tolerant computing. The last two types of parallel processing architectures, SIMD and MIMD, are also the most widely used ones. GPU processing is one good example of an SIMD machine. A specific example of its working is when the hundreds of processors on the GPU work on refreshing separate portions of the viewing display. An example of an MIMD machine is a Beowulf cluster. The MIMD model can be further sub-divided into shared memory machines and distributed memory ones. Distributed memory MIMD machines typically co-ordinate their work by passing messages to one another. Which architecture is chosen depends on the algorithm we intend to parallelise, and how we intend to parallelise it. In a medium-grain parallelization, it is possible to divide

Hans et al. (2007) developed a shotgun stochastic search algorithm to look for regression models in parallel. Their method is a Markov chain, since the future configurations of variables only depend on the current state, but it is neither a Metropolis nor a Gibbs algorithm, and thus its properties are difficult to analyze. However, our immediate aim is to leverage parallel computing resources when we wish to carry out estimation using a Markov chain rather than a stochastic search. One notable attempt at running a Metropolis-Hastings chain in parallel can be found in Brockwell (2006), where the algorithm calls on a cluster to pre-compute likelihoods of the candidate distribution and thus save time. It has been extended in Strid (2010) to the case where the current state is used to make better future candidate predictions. By better, the authors mean that they can tune the candidate until an optimal acceptance rate is achieved. However, this destroys the Markovian nature of the chain, and would fall under the category of adaptive algorithms rather than a traditional MCMC algorithm. Another point to note is that this algorithm is applicable only when the computation

Ren and Orkoulas (2007). The authors here parallelize a Metropolis algorithm by decomposing the domain, and sampling within the disparate regions on separate nodes. They propose switching the regions around randomly, but they acknowledge that this would introduce another time sink via the messages being passed. Apart from these, the most direct method of transplanting an MCMC algorithm onto a cluster would be to run one chain of length N on each of n processors, then use the mean of the means as a point estimate of the desired integral, and the standard error of the observed means as an estimate of the variability in order to get a confidence interval. Although appealing in its simplicity, this method provides no savings in terms of computing time, and it will not always provide accurate results. For example, in Fishman (2001) (chapter 6), it is shown that under mild assumptions on the rate at which the asymptotic bias reduces to 0, as the number of chains n increases to infinity, the burn-in in each chain, k, has to increase faster than log n in order for the coverage probability of the confidence interval to hold. Thus it almost defeats the purpose of running the chain on a cluster, where we hoped that having more processors would have gained us time and/or precision. Regenerative simulation, on the other hand, provides a very clean way of parallelising MCMC algorithms. At every regeneration point, the random variable generated is independent of its current state, and everything before it. Hence regeneration points provide us with independent segments, or tours, that can be run on separate processors and then concatenated later!
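The naive scheme just described (one independent chain per processor, then the mean of means) is easy to sketch. The following Python fragment is an illustrative stand-in of our own, not part of the thesis software (McParre is an R package); the normal target, proposal scale and chain settings are invented for the example.

```python
import numpy as np

def metropolis_chain(n_iter, rng, scale=2.0):
    """Random-walk Metropolis chain targeting a standard normal density."""
    x = rng.normal()                 # arbitrary starting point
    out = np.empty(n_iter)
    for i in range(n_iter):
        prop = x + scale * rng.normal()
        # accept with probability min(1, pi(prop)/pi(x))
        if np.log(rng.uniform()) < 0.5 * (x * x - prop * prop):
            x = prop
        out[i] = x
    return out

def naive_parallel_estimate(n_chains, n_iter, seed=0):
    """Mean of the per-chain means, plus the standard error of those means."""
    rng = np.random.default_rng(seed)
    chain_means = np.array(
        [metropolis_chain(n_iter, rng).mean() for _ in range(n_chains)]
    )
    return chain_means.mean(), chain_means.std(ddof=1) / np.sqrt(n_chains)

est, se = naive_parallel_estimate(n_chains=8, n_iter=5000)
```

The combined estimate inherits each chain's burn-in problem: adding chains shrinks the standard error but does nothing about the per-chain bias, which is exactly the weakness discussed above.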

Suchard et al. (2010), where a fine-grain parallelization is applied to a Gibbs sampler in order to parallelize and provide a speedup within the iterations. GPUs afford an incredible speed-up. However, they require very careful implementation of an algorithm, which translates to a long development time. What we aim for is to allow a researcher to run a regenerative MCMC algorithm, as long as they can provide the minorization (see Chapter 2 for the definition and explanation of required MCMC concepts). Any increase in processing power (by adding nodes) will then provide a speed-up by decreasing the execution time to run a fixed number of tours. The rest of this report is structured as follows. In Chapter 2, we provide a review of the theoretical background for regenerating Markov chains and formalize the idea of running them in parallel. In Chapter 3, we investigate some properties of parallel regenerative Markov chains. First, we touch on the possible speed-up we can gain over the serial version of the algorithm. Second, we investigate whether it is possible to use information from incomplete tours. This is relevant as our proposal to increase the number of processors would increase the number of incomplete tours we are left with. In Chapter 4, we present the R package McParre, for Markov Chain PArallel REgeneration. We introduce the parallel programming paradigm that it uses, and demonstrate its use and the speedup it provides via three models. Lastly, in Chapter 5, we discuss an idea unrelated to regenerative chains - accelerating adaptive Markov chains using a pipeline algorithm on a cluster.

Table 1-1. Taxonomy of parallel processing architectures as introduced in Flynn (1972)

Abbreviation   Parallel Computing Architecture
SISD           Single Instruction Single Data stream
MISD           Multiple Instruction Single Data stream
SIMD           Single Instruction Multiple Data stream
MIMD           Multiple Instruction Multiple Data stream

CHAPTER 2
RELEVANT MARKOV CHAIN THEORY

Hobert et al. (2002), Jones and Hobert (2001) and Mykland et al. (1995), reviews what needs to be done before a regeneration-based Central Limit Theorem (CLT) can be applied to a particular MCMC sampler. Briefly speaking, first of all, it has to be shown that the Markov chain is geometrically ergodic. Next, it has to be ascertained that the function whose integral we wish to find has slightly more than a 2nd moment. Finally, the Markov chain has to be split into independent and identically distributed tours during its run. Now we go into some detail regarding the above steps, and in the process introduce the notation that will be used.

Let Φ = {X0, X1, ...} be a Markov chain taking values in the sample space (E, ℰ) with transition kernel P(·, ·), where E is typically R^k and ℰ is the Borel σ-algebra on E. Also assume that Φ possesses the following properties:

1. It has a stationary distribution π. In other words, π(A) = ∫ P(x, A) π(dx) for all A ∈ ℰ.
2. It is π-irreducible, aperiodic and Harris recurrent.

Suppose we wish to use an MCMC algorithm to estimate E_π g = ∫ g(x) π(dx), and we know that ∫ |g(x)| π(dx) < ∞. The assumptions above imply that we have an ergodic theorem, which yields

    ḡ_n = (1/n) Σ_{i=1}^{n} g(X_i) → E_π g almost surely.

The CLT in equation (2) was introduced in Chan and Geyer (1994). Unfortunately, there does not exist an elegant estimate of σ², even though there is a vast amount of research being done in this area. One of the more promising candidates to estimate σ² is the Batch Means estimator (see Jones et al. (2006), Flegal et al. (2008) and Flegal and Jones (2010)). However, it requires two levels of asymptotics in order to achieve consistency - both the batch size and the chain length have to tend to infinity, and the batch size is typically chosen by convention rather than by derivation. On the other hand, introducing regeneration times into a Markov chain allows us to break it up into independent and identically distributed (i.i.d.) blocks, or tours. Applying the classical CLT to these i.i.d. blocks yields a CLT with a naturally arising consistent estimate of the variance. Using the regeneration CLT removes any need for the chain to be in stationarity. We shall display the regeneration CLT later, after we have introduced the requisite notation.

In Jones and Hobert (2001), the authors describe one method of proving that Φ is geometrically ergodic. Note that convergence will be measured in total variation norm. Under the assumptions above, we have that, for all x ∈ E,

    ||P^n(x, ·) − π(·)|| → 0 as n → ∞.

The chain is geometrically ergodic if this convergence occurs at a geometric rate; that is, if there exist a function M(x) < ∞ and a constant t < 1 such that ||P^n(x, ·) − π(·)|| ≤ M(x) t^n (Rosenthal (2001)).

Nummelin (2004). The following results may be found in Nummelin (2004), page 61. The first result is that (Xn, δn) is a Markov chain. Second, the marginal sequence {Xn} is a Markov chain that follows the law described by our original kernel P. Lastly, whenever δn = 1, we have a regeneration, meaning that the subsequent portion of the chain is independent of the past.

    γ² = E[(S1 − (E_π g) N1)²] / (E N1)²

This is the asymptotic variance appearing in Hobert et al. (2002), and observe that the result depends on N1 and S1 having finite second moments. The achievement of the authors was to show that geometric ergodicity and a (2+ε)-th moment on g are sufficient for this.

Equations (2) and (2) are linked. The authors in Chan and Geyer (1994) derived the following result:

    Geometric ergodicity + (2+ε)-th moment on g  ⟹  non-regeneration CLT (equation (2))

The regeneration CLT only requires, along with a minorization condition, that E N1² and E S1² are finite:

    Minorization condition (2) + E N1² and E S1² finite  ⟹  regeneration CLT (equation (2))

What the authors showed in Hobert et al. (2002) is the following:

    Geometric ergodicity + (2+ε)-th moment on g  ⟹  E N1² and E S1² finite

and hence the regeneration CLT holds. Formally, their theorem says the following:

Theorem 2.2.1 (Hobert et al. (2002)). Suppose Φ satisfies the minorization condition of Definition 2.2.2. If Φ is geometrically ergodic and E_π|g|^{2+ε} < ∞ for some ε > 0, then E N1² and E S1² are both finite. It follows that the CLT in equation (2) is applicable.

2.2.2, and that all chains are initialized with the small measure ν, with X_{i,0} independent for i = 1, ..., d. Suppose we wish to collect a pre-determined number of tours, R, and suppose we have d processors at our disposal. It is important to think about how we should spread these R tours over the d processors. If we simply tried to obtain the first R tours completed by the d processors and threw away any tours that were still running, we would be introducing a bias, because we would be favouring shorter tours. A better idea would be to require each processor to generate ⌊R/d⌋ tours, and then require those that have finished to work on one of the remaining tours. This would mean that the total number of tours collected is fixed and pre-determined, and that each of the d processors would run at least ⌊R/d⌋ tours. Some processors might run an extra tour, but no tour would be left unfinished with this approach, and thus it would not favor shorter tours. We set Ri to be the total number of tours in chain i, and N_{i,j} to be the length of the j-th tour in chain i, where i = 1, 2, ..., d and j = 1, 2, ..., Ri. Finally, we redefine R and N̄ to be R = Σ_{i=1}^{d} Ri and N̄ = (1/R) Σ_i Σ_j N_{i,j}. The key point of this procedure is that the chains are initialized and run independently, which leads to the pairs (N_{i,j}, S_{i,j}) still being independent and identically distributed. As a result, the following Proposition holds.

Proposition 2.3.1. Suppose the minorization condition 2.2.1 holds for all i = 1, ..., d. Then if E_π|g|^{2+ε} < ∞ for some ε > 0, the following CLT holds.
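The allocation scheme above, ⌊R/d⌋ tours per processor with the leftover tours handed to whichever workers finish first, can be sketched with a priority queue of worker finish times. This is a hypothetical Python illustration of our own; `schedule_tours` and the simulated tour lengths stand in for a real master process coordinating live workers.

```python
import heapq
import random

def schedule_tours(tour_lengths, d):
    """Assign R tours to d workers: floor(R/d) each, leftovers going to
    the earliest-finishing workers (simulated via a min-heap)."""
    R = len(tour_lengths)
    base = R // d
    assignment = [list(range(i * base, (i + 1) * base)) for i in range(d)]
    # workers as (finish_time, worker_id) in a min-heap
    heap = [(sum(tour_lengths[j] for j in assignment[i]), i) for i in range(d)]
    heapq.heapify(heap)
    for j in range(d * base, R):       # the R mod d leftover tours
        t, i = heapq.heappop(heap)     # worker that frees up first
        assignment[i].append(j)
        heapq.heappush(heap, (t + tour_lengths[j], i))
    return assignment

random.seed(1)
lengths = [random.randint(1, 100) for _ in range(50)]   # fake tour lengths
plan = schedule_tours(lengths, d=4)
# every tour is run exactly once; no tour is abandoned, so short tours
# are not favoured and the total number of tours stays fixed at R = 50
```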

(2). This process splits the chain into independent chunks, yielding the (Ni, Si) pairs that we need in order to apply Theorem 2.2.1. What we propose instead is generating the bell variables alongside the actual chain. Whenever the bell variable generated is a success, it indicates the completion of a tour. Theoretically, we can then begin a new tour by restarting with a draw from ν. With this approach, we can run each tour on a separate processor, and form ḡ_R and γ̂²_g, which are defined in Proposition 2.3.1, from the resulting independent tours. Although ḡ_R is no longer the ergodic mean from a long non-regenerative chain, Proposition 2.3.1 proves that the theory is sound. In practice, we shall not throw away the currently generated state and restart from ν whenever we encounter a regeneration. Instead, we shall continue the chain with the currently generated state. This has a twofold benefit. Firstly, it does not waste random variables already generated. Secondly, it allows the final point estimator to be based on a chain that is close to stationarity, and thus would intuitively reduce the ratio point estimator effect. For more on the ratio estimator, see Section 3.3.

In Mykland et al. (1995), the authors described quite general methods of minorizing Metropolis-Hastings (M-H) and Gibbs samplers. Suppose our target distribution
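On a finite state space the bell-variable construction can be made completely explicit, since P(x, ·) ≥ ε ν(·) holds with ε = Σ_y min_x P(x, y) whenever that sum is positive. The following Python sketch is our own toy illustration, not anything from McParre: it generates the bell variable alongside a 3-state chain, continues from the current state at each regeneration as described above, and forms the tour-based estimates.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy 3-state chain with an explicit minorization P(x, .) >= eps * nu(.)
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
m = P.min(axis=0)        # m(y) = min_x P(x, y)
eps = m.sum()            # minorization constant
nu = m / eps             # small measure

g = np.array([0.0, 1.0, 2.0])   # function whose pi-expectation we want

def run_tours(R):
    """Collect R i.i.d. tours (N_i, S_i) using retrospective bell variables."""
    Ns, Ss = [], []
    x = rng.choice(3, p=nu)      # the very first tour starts with a nu-draw
    for _ in range(R):
        N, S = 0, 0.0
        while True:
            N += 1
            S += g[x]
            y = rng.choice(3, p=P[x])
            # bell success probability eps*nu(y)/P(x,y) = m(y)/P(x,y)
            bell = rng.uniform() < eps * nu[y] / P[x, y]
            x = y                # keep the generated state either way
            if bell:             # y is effectively a nu-draw: tour ends
                break
        Ns.append(N); Ss.append(S)
    return np.array(Ns), np.array(Ss)

N, S = run_tours(R=2000)
g_bar = S.sum() / N.sum()                                  # ratio estimator
gamma2 = np.mean((S - g_bar * N) ** 2) / np.mean(N) ** 2   # variance estimate
```

For this transition matrix the stationary expectation of g is 13/14 ≈ 0.929, and ḡ_R recovers it from the (Ni, Si) pairs; γ̂² is a variance estimate of the regenerative form discussed above.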

Mykland et al. (1995) show how to minorize P if the following 2 conditions hold:

1. An atom (s_q, ν_q) that minorizes Q (see Definition 2.2.2) can be found. Formally, we have that q(x, y) μ(dy) ≥ s_q(x) ν_q(dy), and
2. There exists a positive function h such that the kernel can be bounded below by a product of the form s(x) ν(dy), as in equation (2).

Condition (2) above holds, in particular, when the proposal is an independent jump, or when it is symmetric. In the case where q is symmetric, i.e., q(x, y) = q(y, x), any constant can serve as h. If we then pick x̃ ∈ E and a compact set D ∈ ℰ, we can split Q with ν_q(dy) ∝ q(x̃, y) I_D(y) μ(dy)

so that the split takes the form

    P(x, dy) ≥ s(x) · q(x̃, y) I_D(y) μ(dy)    (2)

with the second factor playing the role of ν(dy). This technique of working with a distinguished point and set is very amenable to minorizing a Gibbs sampler, as we shall see later in the section on linear models. Mykland et al. (1995) point out that we should try to find a pair (s, ν) such that s(x)ν(dy) is large on average, as that would mean the regeneration probabilities would be higher, leading to more tours.

In Section 2.4, we reviewed the main techniques introduced in Mykland et al. (1995) for splitting a large class of Metropolis-Hastings Markov chains. The methods rely on there being a positive function h satisfying the condition in (2). These approaches can be extended to mixture kernels, as long as one of the component kernels satisfies the condition. Suppose that we have a kernel P = p1 P1 + (1 − p1) P2, and P1 is a Metropolis kernel that can be minorized, say P1(x, dy) ≥ s(x) ν(dy). Then P can be minorized in the following manner:

    P(x, dy) ≥ p1 P1(x, dy) ≥ p1 s(x) ν(dy)

We now turn to Proposition 2.5.1. It is identical to Proposition 5 in Tierney (1998). Here we provide a proof for finite state space chains. We need to introduce a couple of ideas first. Recall the concept of the asymptotic variance of a Markov chain introduced in equation (2). Note that it is specific to the function g whose integral we wish to estimate using our Markov chain. The following definition can be found in Mira (2001).

(Peskun (1973)).

Proposition 2.5.1. Consider two finite-state Metropolis-Hastings chains, with the same stationary distribution. Suppose that we have a finite sequence of m candidate distributions, given by q1(x, y), ..., qm(x, y), and a set of weights w1, ..., wm such that Σ wi = 1. One of the chains uses a mixture of m kernels, with qi being the candidate for kernel i. The other uses a single kernel, with a mixture of the qi's as its candidate. The off-diagonals of the transition kernel for the mixture candidate, P1, are given by

(2), which applies when x ≠ y. For all i ∈ {1, 2, ..., m}, we have that min{…} … We can then apply Mira and Geyer (1999) to conclude that P1 is more efficient than P2. If, in addition, the inequalities in the statement of the Proposition hold, then it follows from the first one that qi(x, y) min{…} …

(2) will be smaller if we utilise a mixture candidate instead of a mixture kernel.

CHAPTER 3
IMPLEMENTATION ASPECTS OF PARALLEL REGENERATION

In Amdahl (1967), a method to analyse the speed-up of an algorithm was introduced. Suppose that an algorithm consists of A floating-point operations. Assume that a fraction f of those operations can be sped up to run at V1 floating-point operations per second (flops), and the remaining operations have to remain running at V2 flops, with V2 < V1. The time taken by the accelerated version is then

    t_fast = fA/V1 + (1 − f)A/V2

Compared with t_slow = A/V2, the speed-up afforded by the acceleration is

    s = t_slow/t_fast = 1/(f·V2/V1 + (1 − f)) < 1/(1 − f)    (3)

The above inequality is known as Amdahl's Law, and can be used to derive a bound on the maximum speedup of an algorithm. It can be extended to parallelisation of a serial algorithm in an analogous way. Suppose that a serial algorithm takes t1 seconds to execute on a particular processor. Assume that a fraction f of the algorithm can be executed in parallel on p processors ideally. In other words, this fraction f of the algorithm can be executed on p processors in exactly f·t1/p seconds. Though this is unrealistic, it greatly simplifies the analysis. The speed-up can again be bounded by the same quantity as above:

    s = t1 / (f·t1/p + (1 − f)·t1) = 1/(f/p + (1 − f)) < 1/(1 − f)

Now consider the serial algorithm that we wish to parallelise:
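The bound in (3) is easy to explore numerically; the helper below is our own small illustration, not from the thesis.

```python
# Amdahl's Law: the speed-up from parallelizing a fraction f of the work
# on p ideal processors can never exceed 1/(1 - f), however large p is.
def amdahl_speedup(f, p):
    """Speed-up of a serial algorithm when a fraction f runs ideally on p processors."""
    return 1.0 / (f / p + (1.0 - f))

# e.g. 90% parallelizable work on 16 processors:
s = amdahl_speedup(0.9, 16)        # = 6.4
bound = 1.0 / (1.0 - 0.9)          # limit of 10 as p grows without bound
```

Even with p tending to infinity, a 90% parallelizable algorithm can never run more than 10 times faster, which is why the fraction f matters more than the processor count.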

In Gustafson (1988), the author considered the inverse question to Amdahl's: given a parallel algorithm, how much faster would it be when compared to running it on a serial processor? Put in this way, our algorithm becomes easier to analyse, but it is still complicated by the fact that the number of operations in each tour is a random quantity. Consider the idealized version of our parallel algorithm to run R tours. Suppose we have R + 1 identical processors at our disposal, each of which runs at a speed of V flops. The R processors that actually run the tours are usually referred to as the slaves, while the processor controlling them is known as the master. Suppose also that there are no latencies in communicating with the slaves. The following code is run:

Code on master:

Figure 3-1 indicates that we can expect at least a 40% speedup.

1. Specify the total number of tours, R, that the subsequent analysis should be based on, and then stop the chain when R tours are completed.
2. Specify the total number of iterations that the chain should be run for, and then stop the chain when this is achieved. The subsequent analysis would then be based on the tours completed by that time point.

As pointed out earlier, the downside with option 1 is that it requires a random amount of total computing time. Although this problem does not arise with option 2, other issues surface. First of all, it is possible that no tours might be completed by the end of the run. Secondly, as investigated by Meketon and Heidelberger (1982), utilising only the completed tours to estimate the mean involves a bias of order O(1/t), where t is the length of the Markov chain that is run. The authors showed that waiting for that final tour to end would reduce the order of the bias to O(1/t²), and that for small values of t the bias reduction is appreciable. The reason for the bias reduction is that the final tour is atypical. This is the so-called inspection paradox, or waiting-time paradox. It refers to the fact that once we pre-specify a time point, the tour that contains that time point will be larger than a normal tour. The order of the bias of the estimator can be reduced simply by waiting for the tour that contains the pre-specified time to complete. Details about this issue can be found in Feller (1971). Another issue with option 2 is that the number of tours, R, is now random. It is not straightforward to see if Theorem 2.2.1 and Proposition 2.3.1 still hold, although they should do so as R → ∞. Thus option 2 may not be the preferred one. However, when running a regenerative Markov chain in parallel on a centrally administered cluster, a user typically has to specify an estimate of the wall-time that his program will take to run. This estimate is
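The inspection paradox mentioned above shows up readily in simulation. In this hypothetical Python sketch of our own, i.i.d. draws stand in for tour lengths, and we compare a typical tour with the tour that happens to cover a pre-specified time point:

```python
import random

random.seed(0)

def draw_len():
    # i.i.d. tour lengths with mean roughly 10 (stand-in for real tours)
    return 1 + int(random.expovariate(0.1))

def covering_tour_len(t):
    """Length of the tour that contains the pre-specified time point t."""
    clock = 0
    while True:
        n = draw_len()
        if clock + n > t:
            return n
        clock += n

trials = 20000
typical = sum(draw_len() for _ in range(trials)) / trials
covering = sum(covering_tour_len(500) for _ in range(trials)) / trials
# 'covering' comes out markedly larger than 'typical': tours containing a
# fixed time point are sampled with probability proportional to length.
```

The covering tour is sampled with probability proportional to its length, so its mean is roughly E[N²]/E[N] rather than E[N]; waiting for it to complete, as Meketon and Heidelberger suggest, folds that atypical tour back into the estimate.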

Feller (1971), but a

Smith (1958), while an accessible mid-level introduction can be found in Ross (1992). A number of theorems exist pertaining to the asymptotic properties of the arrival rate. Another very useful result from renewal theory is a technique for solving renewal-type equations. However, we shall not need to resort to those results. We only need the independence and stationarity of the renewal process for our results.

Hence we can find the expected length of that incomplete tour:

    E[L(t) + Z(t) | Z(t) = s] = s + E[L(t) | Z(t) = s]
                              = s + E[N1 − s | N1 > s]
                              = E[N1 | N1 > s]

Thus, knowing that there were no renewals since s iterations ago means we do not have to concern ourselves with the inspection paradox, and can estimate the length with a simple conditional distribution. We can make a very similar computation to estimate the reward from the incomplete tour. However, we need to complicate the notation a little further. Recall the specification that the reward is gained gradually. Let S′_{s,n} be the accumulated reward up until s iterations within the n-th tour. Also note that since each tour is independent and identically distributed (we are essentially re-starting the Markov chain from the same initial measure ν that is defined in equation (2)), we can see that for a fixed s, the S′_{s,i} are also identically distributed. In other words, the S′_{s,i} are the sums of s iterations of Markov chains with the same transition kernel and initial distribution. Since the Si random variable is absolutely continuous, we have to specify a set that it falls into. Let this set be denoted as D.

    Pr[S′_{s,M(t)+1} ∈ D | Z(t) = s, M(t) = n]

Having observed that the n-th renewal occurred at time (t − s), and due to the independence of tours, this is exactly equal to

    Pr[S_{n+1} ∈ A, S′_{s,n+1} ∈ D | N_{n+1} > s] / Pr[S′_{s,n+1} ∈ D | N_{n+1} > s]

Now, due to the fact that tours are identically distributed, this is in turn equal to

    Pr[S1 ∈ A, S′_{s,1} ∈ D | N1 > s] / Pr[S′_{s,1} ∈ D | N1 > s]

One way to use the results derived in Propositions 3.2.1 and 3.2.2 is to utilise the available completed tours to estimate the reward and length of the final tours. For example, suppose that after running R complete tours, the (R+1)-th tour had to be stopped after s iterations, and that out of the R completed tours, R_N of them were longer than s iterations. Then one straightforward method of estimating its length would be to use

    N̂_{R+1} = (1/R_N) Σ_{i : Ni > s} Ni
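The estimator N̂_{R+1}, together with its analogue for the reward, is a simple conditional average over the completed tours longer than s. The Python sketch below is our own simplified illustration: it conditions only on N1 > s (Proposition 3.2.2 also conditions on the observed partial reward), and the tour data are simulated placeholders.

```python
import numpy as np

def estimate_unfinished(N, S, s):
    """Estimate length and reward of a tour stopped after s iterations.

    N, S : arrays of completed tour lengths and tour rewards
    s    : iterations already spent in the unfinished tour
    """
    longer = N > s
    if not longer.any():
        raise ValueError("no completed tour exceeds s iterations")
    # empirical versions of E[N1 | N1 > s] and E[S1 | N1 > s]
    return N[longer].mean(), S[longer].mean()

rng = np.random.default_rng(7)
N = rng.geometric(0.1, size=1000)          # fake completed tour lengths
S = N * 0.5 + rng.normal(0, 1, size=1000)  # fake accumulated rewards
n_hat, s_hat = estimate_unfinished(N, S, s=15)
```

Because only tours longer than s enter the average, n_hat necessarily exceeds s itself, mirroring the memoryless-style conditioning E[N1 | N1 > s] derived above.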

(2)) can be derived using the delta method, starting with the asymptotic normality of the bivariate estimator (S̄, N̄). However, that derivation depends on the approximation of S̄/N̄ − E_π g by (S̄ − (E_π g) N̄)/E N1, which is based on a Taylor series expansion. The expected absolute value of that Taylor series error term is given by

Mykland et al. (1995). This is the reason they suggest that when running regenerative simulations, the coefficient of variation of N should be monitored. Due to this, Bratley et al. (1987) and Ripley (1987) both explain that regenerative simulation works best when the tours we observe are of almost equal length (i.e., N has little variability). In order to fix the possible bias problem highlighted earlier, Bratley et al. (1987) and Ripley (1987) suggest jackknifing the estimates of the mean and the variance. However, we should be wary when using the jackknifed estimate, because, as pointed out in Section 2.4 of Shao and Tu (1995), although asymptotic normality is unaffected, jackknifing does increase the MSE of the finite-sample estimator. However, the Normal approximation dictates that the bias will reduce to zero as we collect more and more tours. The purpose of this section is to consider the use of diagnostics for assessing if we have sufficient tours. In Mykland et al. (1995), the authors mention two ways of monitoring when the Normal approximation is good. Here, we introduce them and expound on them a little. As always, let Ni denote the individual tour lengths, let Tn = Σ_{i=1}^{n} Ni, and suppose that R tours were collected. The first statistic they suggest tracking is CV̂(N), the estimated coefficient of variation of N. If this is greater than, say, 1%, then the workings at the beginning of this section suggest that the error term in the delta method derivation of the Central Limit Theorem could be large. They suggest collecting more tours in these situations. The second diagnostic they suggested comes from viewing the regenerative method in the context of renewal theory. On page 234 of Mykland et al. (1995), the authors mention that the proportion of renewals that fall in a fraction γ of the observation period converges to γ. Hence they put forward the idea of a scaled regeneration

the proportion of renewals that fall in a fraction γ of the observation period converges to γ.

This is similar to the exact result for Poisson renewal processes, which says that if we know there are R renewals in an interval (0, t], then those R renewals are independently and uniformly distributed over the interval (0, t]. A proof of this result can be found in Theorem 2.3 of Ross (1992). Now consider plotting a quantile-quantile plot for the observed regeneration times, to see if they are similar to a uniform distribution. The empirical quantile for 1/R would be T1. The theoretical quantile, based on the uniform distribution on (0, T_R], would be (1/R)·T_R. Thus the first point on the plot would be (T_R/R, T1). The second point would be (2·T_R/R, T2), and so on. Scaling this plot by dividing both axes by T_R, we get the one that the authors suggested: T_i/T_R against i/R. If the plot of these points were to fall close to the y = x line, we would want to believe that sufficient tours have been collected. However, in our simulations, neither the coefficient of variation nor the scaled regeneration quantile plots were particularly helpful in determining whether more tours should be run (see Section 4.1 and Figures 4-1 and 4-2). Nevertheless, we have included both in our package, which we describe next.

Proposition 3.1.1 provides a lower bound for this. After an initial run of our chain to estimate tour lengths, we could use a plot similar to the one in Figure 3-1 to visualise the possible gain.
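Both diagnostics are cheap to compute from the vector of tour lengths. The Python sketch below is our own illustration; in particular, it reads the 1% rule of thumb as applying to the coefficient of variation of the mean tour length N̄, which is one plausible interpretation.

```python
import numpy as np

def regeneration_diagnostics(N):
    """Coefficient of variation of the mean tour length, and the maximum
    deviation of the scaled regeneration quantile (SRQ) points from y = x."""
    N = np.asarray(N, dtype=float)
    R = len(N)
    cv = N.std(ddof=1) / (N.mean() * np.sqrt(R))   # CV of N-bar
    T = np.cumsum(N)                               # regeneration times T_i
    srq_x = np.arange(1, R + 1) / R                # i / R
    srq_y = T / T[-1]                              # T_i / T_R
    max_dev = np.abs(srq_y - srq_x).max()
    return cv, max_dev

rng = np.random.default_rng(3)
cv, dev = regeneration_diagnostics(rng.geometric(0.05, size=5000))
# a cv above roughly 1%, or a visibly bowed SRQ plot, would suggest
# collecting more tours before trusting the Normal approximation
```

A large max_dev signals that renewals are not yet spread evenly over the run, which is what the SRQ plot checks visually.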

In Propositions 3.2.1 and 3.2.2, we have provided a method for a user to utilise the complete tours in order to estimate the missing ones, and shown that we need not be concerned with the inspection paradox when we do so. This area has scope for further research (see Chapter 6).

Figure 3-1. The figure allows us to visualise the lower bound from Proposition 3.1.1. The level-plot is created using the equation z = xy, where x, y ∈ [0, 1]. The overlayed curve is Pr(A1 ≤ αA(R)) against α, for a Poisson random variable with mean 50, taking R = 50,000 tours. 100 samples of 50,000 tours were taken in order to estimate the curve. The closer the curve is to the top right hand corner, the greater the lower bound of the speedup will be. Here, we can estimate the lower bound to be slightly more than 40% of R. In this small example, we can in fact compute the true speed-up for each of the 100 realisations of the 50,000 tours. Then we can compute the true value of E[S] and compare it to the value provided by the lower bound. When we do so, we obtain E[S] = 60% of R.


Appendix. They are the one-way model with proper priors from Hobert and Geyer (1998) and the linear mixed model from Hobert et al. (2006). The package also includes parallel regeneration code for the one-way model with improper priors (from Tan and Hobert (2009)). In this chapter we also present a minorization for a model that is used to search for structure in collections of documents. It is known as a Latent Dirichlet Allocation (LDA) model, or a Topic Model. The code to carry out regeneration of the LDA model with the minorization proposed in Section 4.3 is included with McParre. Its usage is demonstrated here with a simulated data set.

We begin with the one-way model with proper priors of Hobert and Geyer (1998):


[The display of the model and its posterior is garbled in this extraction; the surviving fragment is the sum-of-squares term (1/2) Σ_{j=1}^{K} Σ_{i=1}^{m} (y_ij − θ_j)², raised to a power involving m and a_2.]


[A page of analytical integration is garbled in this extraction; the recoverable pieces appear to complete the square in μ and integrate out the precision parameters, ending with

    f_2(θ, y) ∝ (b_2 + (1/2) Σ_{j=1}^{2} Σ_{i=1}^{m} (y_ij − θ_j)²)^{−(m + a_2)}.]

Substituting the above analytical expressions into h(θ) in equations (4) and (4), we have that


[The display of the full conditional densities of this Gibbs sampler is garbled in the extraction. The recoverable fragments indicate Gamma full conditionals for the two precision parameters, with rate parameters involving (1/2) Σ_i (θ_i − μ)² and (1/2) Σ_{i,j} (y_ij − θ_i)², together with a Normal full conditional for μ, as in Hobert and Geyer (1998).] If we use p_1, ..., p_{K+3} to represent each of the conditional densities above, we can write the entire Markov transition density


Following Mykland et al. (1995), we shall minorize this Gibbs sampler in the following way. Choosing a distinguished point x̃ and a compact set D, we have a minorization of the standard distinguished-point form

    p(x, w) ≥ s(x) ν(dw),  where s(x) = inf_{w ∈ D} p(x, w)/p(x̃, w) and ν(dw) ∝ p(x̃, w) I[w ∈ D] dw.  (4)

The infimum over D can be computed analytically, so as to avoid using a software optimization routine. Here we provide the details. Consider the ratio p(x, z)/p(x̃, z), and define x̄ = (1/K) Σ_{i=1}^{K} x_{i+3}, with the analogous average defined for the distinguished point x̃. [The expansion of the ratio that follows is garbled in the extraction; it is a product of Gamma-density ratios in z_1 and z_2, with rate parameters b_1 + (1/2) Σ_i (x_{i+3} − x̄)² and b_2 + (1/2) Σ_{i,j} (y_ij − x_{i+3})², evaluated at both x and x̃, together with the Normal factor for μ.]


[This page of the infimum computation is garbled in the extraction. The recoverable fragments define C_1 = (1/2) Σ_{i=1}^{K} [(x_{i+3} − x̄)² − (x̃_{i+3} − x̃̄)²] and C_2 = (1/2) Σ_{i,j} [(y_ij − x_{i+3})² − (y_ij − x̃_{i+3})²], and compute partial derivatives of the function g from equation (4) with respect to the remaining variables.]


[The final lines of the infimum computation are garbled in the extraction; they collect the coefficients of the powers of z in the derivative and set them to zero.]

Using the set-up of Section 4.1.1, we can numerically compute the posterior means of θ_1, θ_2 and μ. These values are displayed in Table 4-1. They are the gold standard that the regenerative procedure should pick up and form confidence intervals around. For the experiment, 50,000 tours were generated on 5 processors. This was done by making each of the processors complete 10,000 tours. This entire process was repeated 100 times. In order to assess whether sufficient tours had been collected, we could imagine stopping each of those 100 chains after 5,000, 25,000 and then 50,000 tours. In practice, we did this by considering the first 1,000, 5,000 and then 10,000 tours generated by each processor, at each of the 100 iterations. This would mimic stopping the chain prematurely. At these pseudo-breakpoints, we can form estimates of the posterior means that we are after, and also of the coefficient of variation of the tours. The 100 estimates of the posterior mean were then used to form density estimates of the theorised Normal approximation that they should be converging in distribution to. These plots can be found in Figure 4-3. The results from the simulation runs are summarised in Table 4-2 and Figure 4-3. The tables indicate that the coverage probabilities approach the nominal value as we collect more tours. Figure 4-3 shows us that as we collect more and more tours, the distribution of ḡ_R approaches what we expect. Figures 4-2 and 4-1 contain plots of the


diagnostics from Section 3.3. In this particular case, they do not appear to be useful in determining if sufficient tours have been collected. Both of them seem to suggest that 5,000 tours are sufficient, but the density plots in Figure 4-3 suggest otherwise.

The main aim of our methodology is to improve timings, while retaining a theoretically water-tight inference procedure. Hence we re-ran the data set from the previous subsection on 1, 2 and 5 processors in order to assess the speed-up that we gain. This time, we only ran 10 replications for each configuration of processors. The results are in Table 4-3. The code that was used to run this analysis in parallel is available in McParre as genNextStateOnewayPlain and regenProbsOnewayPlain.

We now turn to the linear mixed model of Hobert et al. (2006). The block Gibbs sampler for this chain has recently been proven to be geometrically ergodic. First we introduce the model and the notation used, which closely follows that of Hobert et al. (2006). The minorization introduced there will also be presented and used in the numerical example that follows. The model is the Bayesian version of the usual frequentist general linear mixed model:

    Y | β, u, λ_R, λ_D ~ N_n(Xβ + Zu, R^{-1})
    β | u, λ_R, λ_D ~ N_p(β_0, B^{-1})
    u | λ_D, λ_R ~ N_q(0, D^{-1})
    R = λ_R I_n, where λ_R ~ Gamma(r_1, r_2)
    D = λ_D I_q, where λ_D ~ Gamma(d_1, d_2)
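The hierarchy above can be simulated forward in a few lines, which is a useful check when setting up the sampler. The sketch below is illustrative Python/numpy (McParre itself is in R); the dimensions and hyperparameters are made up, β_0 is taken to be zero for simplicity, and Gamma(a, b) is read as shape a and rate b:

```python
# Forward simulation from the Bayesian linear mixed model hierarchy above.
# Illustrative sketch: dimensions and hyperparameters are invented, beta_0
# is set to zero, and Gamma(a, b) is interpreted as shape a, rate b.
import numpy as np

rng = np.random.default_rng(42)
n, p, q = 12, 2, 3
r1, r2, d1, d2 = 2.0, 1.0, 2.0, 1.0

X = rng.normal(size=(n, p))
Z = rng.normal(size=(n, q))
B = np.eye(p)                        # prior precision matrix of beta

lam_R = rng.gamma(r1, 1.0 / r2)      # lambda_R ~ Gamma(r1, r2)
lam_D = rng.gamma(d1, 1.0 / d2)      # lambda_D ~ Gamma(d1, d2)

beta = rng.multivariate_normal(np.zeros(p), np.linalg.inv(B))
u = rng.multivariate_normal(np.zeros(q), np.eye(q) / lam_D)       # D^{-1}
Y = rng.multivariate_normal(X @ beta + Z @ u, np.eye(n) / lam_R)  # R^{-1}
```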


The full conditionals needed by the block Gibbs sampler are

    (λ_D | β, u, Y) ~ Gamma(q/2 + d_1, d_2 + (1/2) uᵀu)
    (λ_R | β, u, Y) ~ Gamma(n/2 + r_1, r_2 + (1/2) (Y − Xβ − Zu)ᵀ(Y − Xβ − Zu))
    (ξ | λ_R, λ_D, Y) ~ N_{p+q}(ξ_0, Σ^{-1})

where ξ = (uᵀ, βᵀ)ᵀ, and the mean and the covariance matrix for ξ are given by

    Σ = [ ZᵀRZ + D    ZᵀRX
          XᵀRZ        XᵀRX + B ]

and

    ξ_0 = Σ^{-1} [ ZᵀRY
                   XᵀRY + Bβ_0 ].

Choosing a distinguished set for the 2 variables λ_R and λ_D, the Mykland et al. (1995) method would lead us to the following minorization:

    p((λ_D, λ_R, ξ), (λ'_D, λ'_R, ξ')) ≥ inf_{λ_D ∈ D_1} π(λ_D | ξ) [remainder of the display garbled in the extraction]

To illustrate, we use the petrol data set from Venables and Ripley (2002). This data was originally used to build an estimation equation for the yield of the refining process of gasoline. The possible predictors were the specific gravity, vapour pressure, volatility and desired volatility of the samples. In total 10 samples were taken; within each sample, varying numbers of yield readings were computed. We assume that the samples give


As documented in Venables and Ripley (2002), the above model was fit using REML and ML. The results are contained in Tables 4-4 and 4-5. McParre was used to regenerate a Markov chain for 1,000 tours in order to estimate the parameters of the model above. The estimates are contained in Table 4-6. For the priors, we used parameters of r_1 = d_1 = 100 and r_2 = d_2 = 3 in order to let the data speak for the model. We also include timings for 10 runs each, using 1, 2 and 5 processors to complete the 10,000 tours. These times are in Table 4-7. The code that was used to analyze this data, and HLMMs in general, is included in McParre as genNextStateHLMM, smallFnHLMM, smallMeasureHLMM and transDensHLMM.

The Topic Model was introduced by Blei et al. (2003). Here we describe the full model that allows us to run a Gibbs sampler in order to estimate the posterior distributions. Using terminology from text collections, suppose that we have D documents in a corpus, and that after removing the stop words, each document has n_d words. Also let V denote the vocabulary of the corpus - the total number of unique words. Then each document can be represented by a sequence of n_d binary vectors, each of length V, representing the n_d words in that document. We shall use w_{1,d}, w_{2,d}, ..., w_{n_d,d} to represent the words. The co-ordinate indicator w^t_{i,d} = 1 will tell us exactly which word of the vocabulary is being picked up, with t running from 1 to V. The words are observed, but for each word in document d, let z_{i,d} denote a latent variable, also a binary


Table 4-8 contains a summary of the index labels used in this section. Using the binary vector notation described above simplifies the notation in expressions for the joint density. However, when programming, the z_d variables will be represented with a vector of length equal to the total number of words. Each element will be an integer between 1 and k, indicating the topic to which that word has been assigned. Similarly, the observed words w can be represented by integers between 1 and V, representing the exact word in reference to the vocabulary. This secondary notation style is more concise for discussion and will be used later when describing the example. Now let Φ represent a random k × V matrix, where each row represents the probability vector over the vocabulary corresponding to one topic. Formally, the data generating hierarchy is as follows:




Given θ and Φ, and noting that n_{j,d} = Σ_{t=1}^{V} Σ_{i=1}^{n_d} z^j_{i,d} w^t_{i,d} = Σ_{t=1}^{V} m_{jt,d}, the full conditional for z_d is

    Pr(z_d | θ_d, Φ, w_d) ∝ (Π_{t=1}^{V} Π_{j=1}^{k} φ_{jt}^{m_{jt,d}}) (Π_{j=1}^{k} θ_{j,d}^{n_{j,d}})
                          = Π_{i=1}^{n_d} Π_{j=1}^{k} Π_{t=1}^{V} (φ_{jt} θ_{j,d})^{z^j_{i,d} w^t_{i,d}}.

Thus we can generate z_{i,d} according to

    z_{i,d} ~ Multinomial(1, (p_{i1,d}, p_{i2,d}, ..., p_{ik,d})) for i = 1, 2, ..., n_d,

where

    p_{ij,d} ∝ Π_{t=1}^{V} (φ_{jt} θ_{j,d})^{w^t_{i,d}} for j = 1, 2, ..., k.

Given z, generate θ and Φ via

    θ_d ~ Dirichlet(n_{1,d} + α_1, ..., n_{k,d} + α_k) for d = 1, 2, ..., D,
    φ_j ~ Dirichlet(Σ_{d=1}^{D} m_{j1,d} + b_1, ..., Σ_{d=1}^{D} m_{jV,d} + b_V) for j = 1, 2, ..., k.

As this is a 2-stage Gibbs sampler with one of the random variables over a finite state space, we can apply the Duality Principle (see Section 9.2.3 of Robert and Casella (2004) and Roberts and Rosenthal (2001) for further details) and conclude that the overall chain is uniformly ergodic. Hence Theorem 2.2.1 holds. Now we turn to minorizing this chain. First observe that there are 2 ways to run it. Let (z_x, θ_x, Φ_x) and (z_y, θ_y, Φ_y) denote two possible states from the posterior distribution. To go from point (z_x, θ_x, Φ_x) to (z_y, θ_y, Φ_y), we could do


(4) first. In this case, the Markov transition density is given by [display garbled in the extraction]. We shall minorize it using the Mykland et al. (1995) approach of distinguished points and sets. Denote the distinguished point by (z̃, θ̃, Φ̃). Recall that z represents the assignments of words to latent topics. Suppose we know that a certain group of such assignments is significantly probable. Let it be denoted by D_1. Then we can arrive at a very direct and simple-to-understand minorization:


The infimum in (4) is inf_{z ∈ D_z} p_1(z | θ, Φ) [the remainder of this page's derivation is garbled in the extraction].


the minorization in (4). We can derive a third possible minorization by running the chain using the scheme specified in (4), in which case the Markov transition density is given by [display garbled in the extraction]. Since each θ_j < 1, every exponent of the form n_j − ñ_j can be replaced by (n_j − ñ_j) I[n_j > ñ_j] to obtain a lower bound. The above inequality can be seen to be true by taking logs, and remembering that log θ_j < 0:

    Σ_{j=1}^{k} (n_j − ñ_j) log θ_j + constant ≥ Σ_{j=1}^{k} I[n_j > ñ_j] (n_j − ñ_j) log θ_j + constant.

We can now define our small function:

    s_2(z_x, θ_x, Φ_x) := inf_{(θ,Φ) ∈ D_2} p_{2a}(θ | z_x) p_{2b}(Φ | z_x) [intermediate factors garbled in the extraction],

whose recoverable factors have the form (Γ(m̃_{jt} + b_t)/Γ(m_{jt} + b_t)) φ_{jt}^{(m_{jt} − m̃_{jt}) I[m_{jt} − m̃_{jt} > 0]} and (Γ(ñ_{j,d} + α_j)/Γ(n_{j,d} + α_j)) θ_{j,d}^{(n_{j,d} − ñ_{j,d}) I[n_{j,d} > ñ_{j,d}]}.


Two questions now arise:

1. How do we specify, and choose, D_2?
2. How can we compute the infimum?

The two questions are closely inter-related, because of the form of the function we need to minimize. As finding the minimum of a function is equivalent to finding the minimum of the log of the function, our task can be reduced to finding the minimum of functions of a specific form. Note that log(s(z_x, θ_x, Φ_x)) is a sum of functions to be minimized over the corresponding set in D_2, and each function has the form

    f(θ_1, ..., θ_k) = Σ_{j=1}^{k} a_j log θ_j,

where 0 < θ_j < 1.
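Because each term a_j log θ_j is monotone in θ_j, the infimum over a product of intervals sits at a corner: the lower endpoint when a_j > 0 (log is increasing) and the upper endpoint when a_j < 0. A minimal sketch of this corner rule, assuming a box-shaped set (which need not match the actual distinguished sets used in McParre), with a brute-force check over all corners:

```python
# Corner rule for minimizing f(theta) = sum_j a_j * log(theta_j) over a
# box l_j <= theta_j <= u_j. Illustrative only: the real sets D_2 need not
# be boxes, and the a_j and endpoints here are made up.
import math
import itertools

a = [2.0, -1.5, 0.5]
lo = [0.1, 0.2, 0.1]
hi = [0.6, 0.7, 0.8]

def f(theta):
    return sum(aj * math.log(t) for aj, t in zip(a, theta))

# Take the lower endpoint when a_j > 0, the upper endpoint when a_j < 0.
corner = [l if aj > 0 else u for aj, l, u in zip(a, lo, hi)]
f_min = f(corner)

# Brute-force check: evaluate f at every corner of the box.
brute = min(f(c) for c in itertools.product(*zip(lo, hi)))
```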



The posterior distribution (4) is multi-modal in nature. To see this more explicitly, refer to the joint distribution in (4). Observe that when the prior parameters α_j are equal, any cyclic permutation of the z_d labels, together with the corresponding switching of the θ_d and φ_j values, will result in the same value for the joint posterior (4). It is clear to see this if we consider the following small case with D = 2, k = 2, V = 3, n_1 = n_2 = 3 and the 6 words being w = (1, 2, 2, 3, 3, 2). Now take the following two points from the state-space:


This can be seen in Figure 4-4. The issue stems from the lack of identifiability of the cluster labelings. Hence for the example that follows, we shall sidestep this issue by assuming that, apart from the words themselves, we also observe the true topic assignments of some words. As a small instructive example of the implementation of the LDA model in McParre, we generated data from the generating hierarchy (4) with the following corpus configuration: K = 2, D = 2, V = 6, n_1 = n_2 = 30. Instead of generating φ_j and θ_d, we used fixed values for them [the list of values is garbled in the extraction].
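For such a small configuration, one sweep of the two-stage Gibbs sampler described earlier in this section can be sketched in a few lines. The sketch below is illustrative Python/numpy (McParre's implementation is in R), and the corpus, priors and starting values are all made up:

```python
# One sweep of the two-stage LDA Gibbs sampler described above.
# Illustrative Python/numpy sketch with a made-up toy corpus.
import numpy as np

rng = np.random.default_rng(0)

k, V, D = 2, 4, 2                       # topics, vocabulary size, documents
alpha = np.ones(k)                      # Dirichlet prior on theta_d
b = np.ones(V)                          # Dirichlet prior on phi_j
words = [np.array([0, 1, 1, 3]),        # integer-coded words per document
         np.array([2, 2, 3, 0])]

theta = np.full((D, k), 1.0 / k)        # current topic mixtures
phi = np.full((k, V), 1.0 / V)          # current topic-word distributions

# Step 1: draw each z_{i,d} from its multinomial full conditional,
# with p_{ij,d} proportional to phi_{j, w_{i,d}} * theta_{j,d}.
z = []
for d in range(D):
    probs = phi[:, words[d]].T * theta[d]        # shape (n_d, k)
    probs /= probs.sum(axis=1, keepdims=True)
    z.append(np.array([rng.choice(k, p=p) for p in probs]))

# Step 2: draw theta_d and phi_j from their Dirichlet full conditionals.
n_counts = np.zeros((D, k))             # n_{j,d}: words in d given topic j
m_counts = np.zeros((k, V))             # m_{jt}: topic j paired with word t
for d in range(D):
    for i, w in enumerate(words[d]):
        n_counts[d, z[d][i]] += 1
        m_counts[z[d][i], w] += 1

theta = np.array([rng.dirichlet(n_counts[d] + alpha) for d in range(D)])
phi = np.array([rng.dirichlet(m_counts[j] + b) for j in range(k)])
```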


in Table 4-9. The computed confidence intervals from just these 30 tours are given in Table 4-10.

Table 4-1. Numerically computed posterior means for test data set. [table body not recoverable from the extraction]

Table 4-2. Coverage of 85% and 90% confidence intervals as the number of tours collected increases. [table body not recoverable; the columns give 85% and 90% CI coverage against completed tours]
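The intervals behind Tables 4-2, 4-6 and 4-10 follow the usual regenerative recipe: with per-tour sums Y_t and tour lengths N_t, the ratio estimator is ḡ_R = Σ Y_t / Σ N_t, and the CLT variance is estimated from the per-tour residuals Y_t − ḡ_R N_t. A minimal sketch with made-up tour data (illustrative Python; not the McParre implementation):

```python
# Regenerative point estimate and confidence interval from tour data.
# Illustrative sketch; the per-tour sums and lengths below are made up.
import math

Y = [5.2, 3.9, 8.1, 4.4, 6.3, 7.0, 2.8, 5.5]   # per-tour sums of g(X)
N = [10, 8, 15, 9, 12, 13, 6, 11]              # tour lengths
R = len(Y)

# Ratio estimator of the stationary mean.
g_hat = sum(Y) / sum(N)

# Estimated asymptotic variance in the regenerative CLT.
N_bar = sum(N) / R
resid_sq = sum((y - g_hat * n) ** 2 for y, n in zip(Y, N))
sigma2_hat = resid_sq / (R * N_bar ** 2)

# Approximate 95% confidence interval for the stationary mean.
half_width = 1.96 * math.sqrt(sigma2_hat / R)
ci = (g_hat - half_width, g_hat + half_width)
```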


Table 4-3. Time taken (in seconds) to run 50,000 tours on 1, 2 and 5 processors. 10 chains were run with each configuration of processors.

    Number of processors    Mean time    Standard deviation
    1 processor             119.127      1.771
    2 processors            53.299       1.421
    5 processors            32.484       0.333

Table 4-4. REML estimates for petrol data model. [table body not recoverable; columns: Parameter, Estimate, Standard deviation]

Table 4-5. ML estimates for petrol data model. [table body not recoverable; columns: Parameter, Estimate, Standard deviation]

Table 4-6. 95% confidence intervals from 10,000 tours of regenerative Markov chain estimates for petrol data model. [table body not recoverable; columns: Parameter, Lower bound, Point estimate, Upper bound]
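The timings in Table 4-3 translate directly into observed speed-ups relative to the single-processor runs; a quick arithmetic check (mean times copied from the table):

```python
# Observed speed-ups implied by Table 4-3 (mean times in seconds).
mean_time = {1: 119.127, 2: 53.299, 5: 32.484}

speedup = {p: mean_time[1] / t for p, t in mean_time.items()}
# Parallel efficiency: speed-up divided by processor count.
efficiency = {p: s / p for p, s in speedup.items()}
```

The same computation applies to the timing results in Tables 4-7 and 4-9.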


Table 4-7. Time taken to run 100,000 tours for the petrol data on 1, 2 and 5 processors. The time unit is minutes, and the standard deviations are obtained from 20 runs of each configuration of processors.

    Number of processors    Mean time    Standard deviation
    1 processor             15.474       0.605
    2 processors            7.804        0.483
    5 processors            4.608        0.145

Table 4-8. Index notation for Latent Dirichlet Allocation model. [table body not recoverable; columns: Model component, Index and range]

Table 4-9. Time taken (in minutes) to run 30 tours for the simulated data from the Topic Model on 1, 2 and 5 processors.

    Number of processors    Mean time    Standard error
    1 processor             8.094        1.392
    2 processors            4.262        0.978
    5 processors            2.397        0.977

Although the coefficient of variation is already below the threshold specified by Mykland et al. (1995) after just 5,000 tours, Figure 4-3 indicates that it is insufficiently close to the theorised Normal distribution.


Table 4-10. 95% confidence intervals from 30 tours of regenerative Markov chain estimates for simulated data from the Topic Model. [table body not recoverable; columns: Parameter, Lower bound, Point estimate, Upper bound]

The above plots are for one of the Markov chain runs. A random sample of 100 of the points to be plotted was taken after 5K, 10K and 50K tours. The above plots do not look very different, suggesting they might not be useful as a diagnostic.


All 100 × 50,000 = 5,000,000 tours were used to compute an estimate of σ², the asymptotic variance in the regenerative CLT. Then, taking into consideration only the first 25K tours, the density plot from the X̄ of the 100 chains gives the solid red lines in the top 2 panels. The solid cyan line comes from using all 50,000 tours in each of the 100 chains. The dashed lines depict the theoretical Normal distributions that the X̄ should be close to.


Data was generated from the LDA model with V = 3, D = 2, K = 2, n_1 = n_2 = 20 and the true topic mixture probabilities θ_1 = (0.05, 0.95) and θ_2 = (0.95, 0.05). Thus the two documents should separate more or less cleanly into 2 topics. The Gibbs sampler was run for 275,000 iterations. The top left-hand plot depicts the sampled values of θ_{1,1} for the first 50,000 iterations. The bottom left diagram summarises the assignments of words to topics for these initial 50,000 iterations. Configuration 1 denotes a state where most of the words in document 1 were assigned label 2, and most of the words in document 2 were assigned label 1. Configuration 3 denotes the converse, and configuration 2 refers to a state where either document 1 or 2 had an equal assignment of words to topics. The histograms on the right clearly show the multimodal nature of the posterior distribution. The ergodic means for all the parameters were estimated to be 0.50.


Adaptive MCMC methods have been surveyed in Andrieu and Thoms (2008) and Andrieu and Robert (2001). In this subsection, we introduce the idea before focusing on a technique to parallelise the running of an adaptive MCMC chain. Markov chain Monte Carlo algorithms form a large class of algorithms that allow a user to generate approximate samples from a target distribution π, when it is infeasible to sample directly from it. When deciding on the particular algorithm to run, however, the user is typically faced with the problem of deciding on a candidate distribution to generate from. In the case of a Metropolis-Hastings algorithm, the more similar this candidate is to π, the closer the final random variables will be to π. Hence there is some motivation indeed to choose a good candidate. The circular nature of the problem is that a lot of times, a user does not know what the key features of π are. In fact, he might intend to use the obtained approximate sample to estimate these features, e.g. an integral with respect to π. In such situations, when the user has little prior knowledge about π, how is he expected to choose the right candidate? Typically, the user is deciding on candidates from within a family of distributions parametrised by θ ∈ R^n. In such cases, one solution is to use an adaptive MCMC technique. With this, the user first specifies a cost function h(θ) that represents a criterion to be optimised. There are several criteria that a user can pick. For example, if the aim is to estimate an integral ∫ f dπ, and it is known that the sample mean satisfies a CLT, then one criterion could be the asymptotic variance of the CLT. Note that this criterion, however, depends on the function f, and will be hard to even estimate for each value of θ. An alternative criterion is the optimal acceptance rate, which is known in


closed form for some problems; see Andrieu and Thoms (2008). Once an optimality criterion is chosen, the chain is started with any pre-determined θ. Every few steps of the chain, the value of θ is updated according to a Stochastic Approximation (SA) algorithm. This algorithm will, under certain conditions, direct the sequence of θ values to the one that optimises h. We give more details on when, how and why the SA algorithm will converge to the right value in the subsections below. Thus eventually, the user will be running the optimal chain. Adaptive MCMC solutions provide an alternative to running a range of pilot chains, each with a different value of θ, and then picking the best one. In most of the current literature, authors warn that the sequence of θ values has to change less and less in order to preserve stationarity of the Markov chain. They refer to this as Vanishing Adaptation - see Roberts and Rosenthal (2007) and Andrieu and Thoms (2008) for more details. We feel that it is best instead to use the adaptive technique to whittle down the possible values of θ in an automated manner. When the SA algorithm has converged, the value of θ should be fixed, and the chain restarted. In this way, we will have a true Markov chain with the optimal θ from then on.


The SA method originates with Robbins and Monro (1951). It is a simple, recursive algorithm, and can be described thus: Let g: R^n → R^n be an unknown, nondecreasing function with a root at θ*, i.e. g(θ*) = 0. Assume g(·) can be observed at each point θ_k with a following noise term:

    Y_n = g(θ_n) + ε_n

Explicitly, at time n, θ_n is our estimate of θ*, ε_n is the (unobserved) noise, and Y_n is our observation or measurement of the function g(·). According to the Robbins-Monro algorithm, the estimate of θ* at iteration n + 1 should be

    θ_{n+1} = θ_n − a_n Y_n

They proved that as long as the step sizes {a_n} satisfy the following conditions: a_n > 0, a_n → 0 and Σ a_n = ∞, then θ_n → θ* in L². Since 1951, much research has gone into weakening the conditions on g and the error terms while retaining convergence of the algorithm. For example, observations of g(·) could even contain a structural error:

    Y_n = g_n(θ_n) + ε_n = g(θ_n) + ε_n + (g_n(θ_n) − g(θ_n))

Note that there is now an additional error term, arising from a measurement of g_n(·) instead of g(·) at every iteration. As long as the observation errors can be shown to average out to zero, the algorithm will still converge. Current proofs target almost sure convergence of the iterates to the root of the target function. There are 3 main methods. The first method requires the noise term to be a martingale difference. However, it also requires a linear growth condition on g(·). A more general technique was invented in the 1970s that relied on Ordinary Differential


Equations (ODEs); further details can be found in Chen (2002) and Kushner and Yin (1997). In the sections below, we shall introduce one of the simpler algorithms and outline the proof of its convergence. Then we shall introduce a parallel version of it, demonstrate that it converges if the parent one does, and show that it has the same convergence rate as the original. To tie things back to adaptive MCMC, consider a running example for the rest of this chapter, taken from Gelman et al. (1996), which consists of simulations that demonstrate applications of the more general theorems proved in Roberts et al. (1997). In the former paper, the authors consider simulating from a multivariate standard normal distribution of dimension d, using


a random walk Metropolis-Hastings algorithm. The efficiency of the resulting chain is measured through lim_{N→∞} N Var(Ȳ), the asymptotic variance of the sample mean, relative to that of an ideal sampler [the precise display is garbled in the extraction]. The authors have an asymptotic result (for more general target distributions) that leads to the following heuristic rule when utilising a normal distribution as the candidate in a RW Metropolis-Hastings algorithm: set the scale σ_d of the candidate such that the acceptance rate of the algorithm is approximately 1/4. This is an asymptotic result that we shall not use. Instead, we focus on the lower-dimensional results in Section 3.2 of their paper. There, they find that when d = 1, the optimal σ_d is 2.4. The relationship between the scale of the proposal and the acceptance rate of the RW Metropolis-Hastings algorithm is given by

    p_accept = (2/π) arctan(2/σ_d),

which at σ_d = 2.4 equals 0.442. This is in tune with the heuristic recommendation given in their paper: tune the parameter of interest until the acceptance rate is approximately 1/4. Here, our parameter of interest would be σ_d. Dropping the d from our notation, our adaptive MCMC chain would then involve starting at σ_0, running the chain for, say, 10,000 steps, and then estimating the acceptance


rate. There is a table in Gelman et al. (1996) that provides the optimal σ_d and the corresponding optimal acceptance rate for d = 1, 2, ..., 10.
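The tuning loop just described can be previewed with a toy sketch that combines the d = 1 relationship p_accept = (2/π) arctan(2/σ) with a Robbins-Monro update. This is illustrative Python, not the dissertation's code: the acceptance rate is "estimated" by adding noise to the exact formula rather than by running a real chain, and the step-size constant is made up:

```python
# Toy Robbins-Monro tuning of the RW Metropolis-Hastings scale for d = 1.
# The exact relationship p_accept = (2/pi) * arctan(2/sigma) stands in for
# a real chain; Gaussian noise stands in for estimation error.
import math
import random

random.seed(0)
TARGET = 0.442                      # optimal acceptance rate when d = 1

def p_accept(sigma):
    return (2.0 / math.pi) * math.atan(2.0 / sigma)

def estimated_rate(sigma):
    # Stand-in for estimating the rate from a finite run of the chain.
    return p_accept(sigma) + random.gauss(0.0, 0.05)

sigma = 1.0
for n in range(1, 5001):
    a_n = 5.0 / n                   # step sizes: a_n > 0, sum a_n = infinity
    sigma += a_n * (estimated_rate(sigma) - TARGET)

rate_at_optimum = p_accept(2.4)     # approximately 0.442
```

The update pushes σ up when the estimated rate is above the target (the proposal is too timid) and down otherwise, so σ settles near 2.4.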


[The enumerated description of the algorithm on this page, referencing equation (5), is garbled in the extraction.]


Details are in Kushner and Yin (1997). We emphasise, however, that the proof begins with writing θ^n(t) as in equations (5) and proceeding from there. Here is a complete statement of the convergence result. Assume the conditions referenced in (5-1). Then there is a set N of probability 0, such that for ω ∉ N, the set of functions {(θ^n(ω, ·), Z^n(ω, ·))} is equicontinuous. Let (θ(ω, ·), Z(ω, ·)) be the limit of some convergent subsequence. Then this pair satisfies the projected ODE given by:


A version of this pipelining idea appears in Kushner and Yin (1997). Here, we apply it to adaptive Markov chains and formally investigate its properties. Suppose we have d identical processors and that, in our acceptance rate example, we wish to fix n and run each chain for d − 1 steps before updating the SA algorithm. Suppose each step of an individual Markov chain takes 1 unit of computation time, and denote the time scale in terms of computation time by T. Initialise the SA algorithm with θ_0 = θ_1 = ... = θ_{d−1}. Start the first chain on processor 1 at T = 0. After 1 unit of computation, at T = 1, start the second processor with θ_1. One unit of time later, start processor 3 with θ_2, and so on and so forth. At T = d − 1, processor 1 would have finished d − 1 iterations of the chain, and would have Y_0. It would receive θ_{d−1} from processor d, and perform the following update:

    θ_d = θ_{d−1} + a_0 Y_0

At T = d, processor one would restart its chain with a new parameter - θ_d. At this very same time point, processor 2 would have obtained Y_1, and would be ready to update θ_1. It would obtain θ_d from processor 1 and perform the following update:

    θ_{d+1} = θ_d + a_1 Y_1
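The distinctive feature of this update order is that the observation used at update n was generated under the parameter that was current d updates earlier. The delayed recursion can be simulated serially to see that it still converges; the sketch below is an illustrative Python toy problem (the target root and noise level are made up), not the dissertation's experiment:

```python
# Serial simulation of the pipelined SA update order described above: the
# observation used at each update was generated under the parameter that
# was current d updates earlier. Toy problem with a made-up root at 2.0.
import random

random.seed(2)
d = 5                      # number of processors in the pipeline

def noisy_obs(theta):
    # Acceptance-rate-style observation with root at theta* = 2.
    return -(theta - 2.0) + random.gauss(0.0, 0.5)

thetas = [5.0] * d         # theta_0 = ... = theta_{d-1}
for n in range(20000):
    stale = thetas[-d]     # parameter the finishing processor started with
    a_n = 1.0 / (n + 1)
    thetas.append(thetas[-1] + a_n * noisy_obs(stale))

theta_final = thetas[-1]
```

Because the step sizes shrink, the d-step staleness washes out and the iterates settle at the same root as the sequential recursion.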


The scheme is easiest to describe through Figure 5-1. Consider the case where we have 3 processors at our disposal to estimate the optimal scale parameter in Example 5.2.1. Assume that each processor takes 2 units of time to run the chain with a fixed parameter. Assume that a further single unit of time is needed to do the following: (i) estimate the acceptance rate, (ii) receive a parameter from the predecessor in the pipeline, (iii) update the parameter to the new value, and (iv) send the new parameter value to the next processor. For each processor, steps (i)-(iv) are carried out in the red section in the above figure. The blue dashed arrows indicate a processor sending its most updated value of the parameter to the next processor in the pipeline. To begin, θ_0, θ_1 and θ_2 are initialised to the same value and the chain on processor 1 is started. After 1 unit of time, processor 2 is started. After 2 time units, processor 3 sends θ_2 to processor 1 and then starts its own chain. Processor 1 receives θ_2, and together with its estimate of the acceptance rate p̂_0, performs the update θ_3 = θ_2 + a_0 (p̂_0 − p*), where p* denotes the target acceptance rate. After sending θ_3 to processor 2, it begins estimation of the acceptance rate with θ = θ_3. The cycle then continues - processor 2 receives θ_3, updates to θ_4 using its estimate p̂_1, sends θ_4 to processor 3 and starts its next round of acceptance rate estimation with θ_4. The modified form of the SA algorithm (from equation (5)) is now given by equations (5), (5) and (5). Let t_0 = 0 and t_n = Σ_{i=0}^{n−1} a_i. Keeping the same assumptions as in the original sequential SA


equation (5) takes the same form as (5), except for a shifted starting point. Thus the exact same proof can be used to prove convergence of the pipeline algorithm, as long as the original algorithm converges. Under the conditions of Theorem 5.3.1, the pipeline version of the SA algorithm (given formally by equation (5)) has the same convergence properties as the original sequential version given by equation (5). The pipelining case outlined above is most efficient when the number of time units taken for a single processor to complete a round of computations and an update is equal to the number of processors available. For example, in Figure 5-1, each of the 3 processors takes 2 time units to estimate the acceptance rate and 1 time unit to receive and send messages, after which it is ready for the next run. Thus at every time unit, an update is taking place. In reality, it is almost impossible to achieve this perfect synchronization. Again in the framework of the algorithm in Figure 5-1, it is possible that the message passing portion of the algorithm is slow to such an extent that processor one finishes its computing


1. The sequence of random variables {Y_n I[|θ_n − θ*| ≤ ε]} is uniformly integrable for some small ε.

2. [This condition is garbled in the extraction.]

3. For θ*, an isolated stable equilibrium point of θ̇ = ḡ(θ) in H, the shifted process {θ^n(·)} converges weakly to the constant process θ*.

4. The sequence of distributions given by {(θ_n − θ*)/√a_n} is [remainder of this condition garbled in the extraction].

5. [This condition is garbled in the extraction.]


7. There is a Hurwitz matrix A such that lim_{n,m} (1/m) Σ [expression garbled in the extraction].

8. For some p > 0 and small ε > 0, sup_n E|δM_n|^{2+p} I{|θ_n − θ*| ≤ ε} < ∞. Finally, there is a non-negative definite matrix Σ_1 such that E[δM_n δM_nᵀ I{|θ_n − θ*| ≤ ε}] → Σ_1 [remainder of this page garbled in the extraction].

We ran the pipeline algorithm on Example 5.2.1 using a starting value of 2.30, and a target acceptance rate of 0.442. 5 processors were used, for 40 rounds of updates. This translates to 200 updates in total. The total time taken was 3:27 minutes and the final parameter value was 2.357. For the sequential version, 200 updates took 14:20 minutes and the final value was 2.349. This was a common theme - the pipeline version was in general 5 times faster, and was closer to the true optimal value at the end of the run.


Figure 5-1. A diagram to explain the case where we have 3 processors at our disposal to estimate the optimal scale parameter in Example 5.2.1.


In Sections 3.2.1 and 3.2.2, we provided a method for a user to utilise the complete tours in order to estimate the missing ones, and showed that we need not be concerned with the inspection paradox when we do so. This work has some scope for further investigation. In particular, it would be interesting to see if imputing those final values would reduce the order of the bias (1/t) as much as waiting for that final tour to finish. It is also of interest to see if, with typical values of t for Markov chains being so large, it is worth imputing at all. Possibly it might be, if we envision the parallel version running on hundreds of nodes, thus leading to several incomplete tours. Closely related to this area is the issue of whether the regenerative CLT still holds true if the number of tours generated is random. This is a pertinent issue in the situation where we run a chain for a pre-specified number of iterations and consider the tours completed up to that point. It appears that this should hold true, at least in the asymptotic case, but it should be formalized if possible. In the LDA model, there are a couple of problems to tackle. Firstly, the multiple modes in the posterior have to be resolved somehow before a Markov chain can be run on it. Secondly, the best minorization we have still leads to infrequent regenerations except in trivial cases. This is certainly an area for future research, since the model is of great interest in the machine learning world. It feels like it should be possible to find a better minorization, especially since the chain itself converges very quickly. Another issue is how we choose the distinguished set. Again in the LDA model, we have to choose a simplex that is small, so that the infimum taken is not too small, but that is large enough so that the chain visits it frequently enough for regular regeneration opportunities. In Section 4.3, we have chosen the simplex in a very basic way. One alternative might be to use the eigenvector decomposition in order to pick up the directions where most of the variation of the data lies. On choosing these vertices, we can transform this simplex into a set of


Finally, another model of interest is the BAMD model of Gopal et al. (2011). The current version of the R package for BAMD already carries out the variable selection in parallel, but it does not capitalise on any of the regeneration results. Related to that model is the question of proving its geometric ergodicity and minorizing it in order to carry out estimation using McParre.


To use McParre, the user has to provide the following:

1. The one-step generation function of the Markov chain. Given the current state, this function should return a draw from the Markov chain - the next state.

2. If the minorization is carried out using the distinguished point and set method, then there is almost always cancellation of normalization constants when computing the regeneration probability in equation (2). In this case it is not necessary to program separate functions for the small function, small measure and the transition density. It would be simpler to program a single function that returns the regeneration probability given x and y. On other occasions, the minorization might not allow for such a cancellation. Hence the user has two options here, and has to specify one of the following two sets of functions:

   (a) A function that returns the regeneration probability for the bell variable, given the current state and the next one.

   (b) Functions that compute the small function, small measure and the transition density for the chain.

McParre is built on the Message Passing Interface (MPI) standard of Gropp et al. (1998). This standard is the de facto standard for MIMD architectures, which many high performance computing centers use. The MPI libraries that are installed on these clusters are usually open-source implementations of the MPI standard in C or FORTRAN. The way MPI works is simple - on a cluster, the same piece of code is run on all the nodes. During the initialization phase, a communication world is set up that allows the individual nodes or processors to send and receive messages. In order to identify themselves, each node is assigned a rank during initialization. If there are p processors,


The Rmpi package, described in Mansmann et al. (2009), allows a user to call the MPI functions through R. However, a large part of its focus is on allowing a user to run interactive jobs. This requires slaves to be spawned from a master, and doing this through R puts the slaves in an infinite loop, waiting for instructions from the master. This is not reliable, and I have encountered several problems. Even after extensive Googling and consulting with the administrators at the UF HPC, I could not fix some errors/warnings when using Rmpi in this mode. My suggestion is to use R in batch mode when using the MPI routines. This would closely mimic the MPI standard. I have found this to be much more reliable. The user has to keep in mind that the same R code will now be executed on all the nodes, and has to write his instructions accordingly.
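The two user-supplied ingredients listed above - a one-step generator and a regeneration-probability function - are all a generic tour loop needs. The sketch below shows the shape of such a loop in illustrative Python (McParre itself is R code, and both functions here are toy stand-ins, not the package's):

```python
# Generic tour loop built from the two user-supplied ingredients described
# above. Illustrative Python sketch with a toy chain; the generator and
# regeneration probability are made-up stand-ins for the user's functions.
import random

random.seed(3)

def gen_next_state(x):
    # Toy AR(1)-style kernel; stands in for the user's generator.
    return 0.5 * x + random.gauss(0.0, 1.0)

def regen_prob(x, y):
    # Stands in for the user's s(x) * nu(y) / p(x, y) computation.
    return 0.25 if abs(y) < 0.5 else 0.0

def run_tours(x0, n_tours):
    """Split the chain into independent tours using the bell variable."""
    tours, current = [], []
    x = x0
    while len(tours) < n_tours:
        y = gen_next_state(x)
        current.append(y)
        if random.random() < regen_prob(x, y):
            tours.append(current)      # bell rang: tour complete
            current = []
        x = y
    return tours

tours = run_tours(0.0, 20)
```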


Figure A-1 outlines the steps needed to run and analyse a regenerative Markov chain once this is done. The inst/ directory in the package folder contains several files that are necessary for running the package and for learning how to use McParre. The Rprofile file provided there should replace the one provided by Rmpi. The latter caters to those who wish to run the parallel R session interactively, and is responsible for keeping the slaves in an infinite loop. The McParre profile simply loads the Rmpi library and provides a cleanup function for exiting. The inst/ directory also includes sample scripts for running Hierarchical Linear Models and Topic Models. Since we are running R in batch mode, and the same code is being executed on all nodes, the nodes would all be writing to the same output file and hence overwriting each other. The solution is to pipe all the R output to the null device, and get each node to write to its own specific output file. This is exactly what runMarkovChainRegenP() does, in conjunction with the following MPI command to run the parallel R session:


Figure A-1. The above flow-chart outlines the functions involved at each step of running a regenerative Markov chain using McParre. Examples for the models described in Chapter 4 are available in McParre with the suffixes OnewayPlain, HLMM and TopicModel.


Amdahl, G. (1967). Validity of the single processor approach to achieving large scale computing capabilities. In Proceedings of the April 18-20, 1967, Spring Joint Computer Conference, pp. 483. ACM.

Andrieu, C. and C. Robert (2001). Controlled MCMC for optimal sampling. Citeseer.

Andrieu, C. and J. Thoms (2008). A tutorial on adaptive MCMC. Statistics and Computing 18(4), 343.

Blei, D., A. Ng, and M. Jordan (2003). Latent Dirichlet allocation. The Journal of Machine Learning Research 3, 993.

Bratley, P., B. Fox, and L. Schrage (1987). A guide to simulation. Springer.

Brockwell, A. (2006). Parallel Markov chain Monte Carlo simulation by pre-fetching. Journal of Computational and Graphical Statistics 15(1), 246.

Chan, K. and C. Geyer (1994). Comment on "Markov chains for exploring posterior distributions". The Annals of Statistics 22(4), 1701.

Chen, H. (2002). Stochastic approximation and its applications. Kluwer Academic Publishers.

Feller, W. (1971). Probability Theory and its Applications, vol. II. John Wiley & Sons.

Fishman, G. (2001). Discrete-event simulation: modeling, programming, and analysis. Springer Verlag.

Flegal, J., M. Haran, and G. Jones (2008). Markov chain Monte Carlo: Can we trust the third significant figure? Statistical Science 23(2), 250.

Flegal, J. and G. Jones (2010). Batch means and spectral variance estimators in Markov chain Monte Carlo. The Annals of Statistics 38(2), 1034.

Flynn, M. (1972). Some computer organizations and their effectiveness. Computers, IEEE Transactions on 100(9), 948.

Gelman, A., G. Roberts, and W. Gilks (1996). Efficient Metropolis jumping rules. Bayesian Statistics 5, 599.

Gopal, V., Z. Li, and G. Casella (2011). BAMD: Bayesian Association Model for Genomic Data with Missing Covariates. R package version 3.5.

Gropp, W., S. Huss-Lederman, A. Lumsdaine, E. Lusk, B. Nitzberg, W. Saphir, and M. Snir (1998). MPI - the complete reference: Volume 2, the MPI-2 extensions.

Gustafson, J. (1988). Reevaluating Amdahl's law. Communications of the ACM 31(5), 532.


Hobert, J. and C. Geyer (1998). Geometric ergodicity of Gibbs and block Gibbs samplers for a hierarchical random effects model. Journal of Multivariate Analysis 67(2), 414.

Hobert, J., G. Jones, B. Presnell, and J. Rosenthal (2002). On the applicability of regenerative simulation in MCMC. Biometrika 89, 731.

Hobert, J., G. Jones, and C. Robert (2006). Using a Markov chain to construct a tractable approximation of an intractable probability distribution. Scandinavian Journal of Statistics 33(1), 37.

Horst, R., P. Pardalos, and N. Thoai (2000). Introduction to global optimization. Springer Netherlands.

Jones, G., M. Haran, B. Caffo, and R. Neath (2006). Fixed-width output analysis for Markov chain Monte Carlo. Journal of the American Statistical Association 101(476), 1537.

Jones, G. and J. Hobert (2001). Honest exploration of intractable probability distributions. Statistical Science 16, 312.

Kontoghiorghes, E. (2006). Handbook of parallel computing and statistics. CRC Press.

Kushner, H. and G. Yin (1997). Stochastic approximation and recursive algorithms and applications. Springer Verlag.

Mansmann, U., L. Tierney, H. Yu, D. Eddelbuettel, M. Morgan, and M. Schmidberger (2009). State of the art in parallel computing with R. Journal of Statistical Software 31(i01).

Matloff, N. (2011). Programming on parallel machines. IEEE.

Meketon, M. and P. Heidelberger (1982). A renewal theoretic approach to bias reduction in regenerative simulations. Management Science 28(2), 173.

Mira, A. (2001). Ordering and improving the performance of Monte Carlo Markov chains. Statistical Science 16(4), 340.

Mira, A. and C. Geyer (1999). Ordering Monte Carlo Markov chains. Technical report, Citeseer.

Mykland, P., L. Tierney, and B. Yu (1995). Regeneration in Markov Chain Samplers. Journal of the American Statistical Association 90, 233.

Nummelin, E. (2004). General irreducible Markov chains and non-negative operators. Cambridge Univ Press.


Ren, R. and G. Orkoulas (2007). Parallel Markov chain Monte Carlo simulations. The Journal of Chemical Physics 126, 211102.

Ripley, B. (1987). Stochastic simulation. Wiley, New York et al.

Robbins, H. and S. Monro (1951). A stochastic approximation method. The Annals of Mathematical Statistics 22(3), 400.

Robert, C. and G. Casella (2004). Monte Carlo Statistical Methods. Springer.

Roberts, G., A. Gelman, and W. Gilks (1997). Weak convergence and optimal scaling of random walk Metropolis algorithms. The Annals of Applied Probability 7(1), 110.

Roberts, G. and J. Rosenthal (2001). Markov Chains and De-initializing Processes. Scandinavian Journal of Statistics 28(3), 489.

Roberts, G. and J. Rosenthal (2007). Coupling and ergodicity of adaptive Markov chain Monte Carlo algorithms. Journal of Applied Probability 44(2), 458.

Rosenthal, J. (2001). A review of asymptotic convergence for general state space Markov chains. Far East J. Theor. Stat. 5, 37.

Ross, S. (1992). Applied probability models with optimization applications. Dover Publications.

Shao, J. and D. Tu (1995). The jackknife and bootstrap. Springer.

Smith, W. (1958). Renewal theory and its ramifications. Journal of the Royal Statistical Society, Series B (Methodological) 20(2), 243.

Strid, I. (2010). Efficient parallelisation of Metropolis-Hastings algorithms using a prefetching approach. Computational Statistics & Data Analysis 54(11), 2814.

Suchard, M., Q. Wang, C. Chan, J. Frelinger, A. Cron, and M. West (2010). Understanding GPU programming for statistical computation: Studies in massively parallel massive mixtures. Journal of Computational and Graphical Statistics 19(2), 419.

Tan, A. and J. Hobert (2009). Block Gibbs sampling for Bayesian random effects models with improper priors: Convergence and regeneration. Journal of Computational and Graphical Statistics 18(4), 861.

Tierney, L. (1998). A note on Metropolis-Hastings kernels for general state spaces. Annals of Applied Probability 8(1), 1.

Venables, W. and B. Ripley (2002). Modern Applied Statistics with S. Springer Verlag.




Vikneswaran Gopal was born and raised in Singapore. The younger of two sons, he graduated from Raffles Junior College in 1995. After serving part of his National Service, he attended the National University of Singapore (NUS) and earned a B.Sc. (Hons) degree in 2001, majoring in mathematics. After returning to complete his army stint, he worked for the Centre for Strategic Infocomm Technologies. His role was to assess the security of cryptographic algorithms, protocols and implementations. While working there, he obtained an M.Sc. in statistics from NUS. That piqued his interest in the subject and eventually led to his pursuit of a Ph.D. at the University of Florida Department of Statistics in 2006. Upon completion of his Ph.D. program, Vikneswaran will be joining the IBM Research Collaboratory in Singapore. He has been married to Cecilia Tan Chew Yin for 5 years.