Posterior Consistency of Bayesian Regression Models

MISSING IMAGE

Material Information

Title:
Posterior Consistency of Bayesian Regression Models
Physical Description:
1 online resource (88 p.)
Language:
english
Creator:
Sparks, Douglas Kyle
Publisher:
University of Florida
Place of Publication:
Gainesville, Fla.
Publication Date:

Thesis/Dissertation Information

Degree:
Doctorate ( Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Statistics
Committee Chair:
Ghosh, Malay
Committee Co-Chair:
Khare, Kshitij
Committee Members:
Rosalsky, Andrew J
Garvan, Francis G

Subjects

Subjects / Keywords:
bayes -- bayesian -- consistency -- dimensional -- empirical -- g -- hierarchical -- high -- necessary -- posterior -- prior -- regression -- sufficient -- zellner
Statistics -- Dissertations, Academic -- UF
Genre:
Statistics thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, terriorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract:
We develop conditions for posterior consistency under a variety of Bayesian regression models, many of which are both necessary and sufficient.  We allow the number of regressors to grow at the same rate as the sample size, placing our work in the "large p, large n'' regime, and we consider posterior consistency under the sup vector norm instead of the more conventional Euclidean norm.  In particular, we examine Zellner's g-prior along with its empirical Bayesian and hierarchical Bayesian extensions, as well as several models in which the regression coefficients are in some sense taken as a prior independent, whether marginally or conditionally on some hyperparameter.  We also discuss the interpretations and implications of our results.
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Bibliography:
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility:
by Douglas Kyle Sparks.
Thesis:
Thesis (Ph.D.)--University of Florida, 2012.
Local:
Adviser: Ghosh, Malay.
Local:
Co-adviser: Khare, Kshitij.
Electronic Access:
RESTRICTED TO UF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE UNTIL 2014-08-31

Record Information

Source Institution:
UFRGP
Rights Management:
Applicable rights reserved.
Classification:
lcc - LD1780 2012
System ID:
UFE0044654:00001


This item is only available as the following downloads:


Full Text

PAGE 1

POSTERIORCONSISTENCYOFBAYESIANREGRESSIONMODELS By DOUGLASKYLESPARKS ADISSERTATIONPRESENTEDTOTHEGRADUATESCHOOL OFTHEUNIVERSITYOFFLORIDAINPARTIALFULFILLMENT OFTHEREQUIREMENTSFORTHEDEGREEOF DOCTOROFPHILOSOPHY UNIVERSITYOFFLORIDA 2012

PAGE 2

c r 2012DouglasKyleSparks 2

PAGE 3

Toeveryonewhoselovehasmademylifewhatitisandmademeth epersonIam 3

PAGE 4

ACKNOWLEDGMENTS EverythingthatIhaveaccomplishedinlifehasbeentheprod uctofthesupportthat Ihavebeenfortunateenoughtoreceivefromthepeoplewhoha vealwayswantedwhat wasbestforme.Tomyfamilywhohaslovedme,myfriendswhoha vesupportedme, theprofessorsinourdepartmentwhohaveinspiredme,thest affinourdepartmentwho havehelpedme,thestudentsinmyownclasseswhohavetolera tedme,andallthe peopleIhaveeverhurtorletdownwhohaveforgivenme:Thank you. 4

PAGE 5

TABLEOFCONTENTS page ACKNOWLEDGMENTS .................................. 4 ABSTRACT ......................................... 6 CHAPTER 1INTRODUCTION ................................... 7 1.1RegressionModel ............................... 7 1.2DenitionandInterpretationofPosteriorConsistency ............ 8 2ZELLNER'S g -PRIORANDEXTENSIONS .................... 15 2.1Non-Hierarchical g -PriorModel ........................ 16 2.2EmpiricalBayesian g -PriorModel ....................... 28 2.3Hyperg -PriorModel .............................. 32 2.4Zellner-Siow g -PriorModel .......................... 43 3INDEPENDENCEPRIORMODELANDEXTENSIONS ............. 55 3.1Flat-PriorModel ................................ 56 3.2Non-HierarchicalIndependence-PriorModel ................. 58 3.3Uniform-HyperpriorIndependence-PriorModel ............... 74 4ADDITIONALDISCUSSION ............................ 78 4.1Comparisonof g -PriorandIndependence-PriorModels .......... 78 4.2PracticalImplicationsforFiniteSampleSizes ................ 79 5POSSIBLEFUTUREWORK ............................ 81 5.1Extensionstothe g -PriorModel ........................ 81 5.2ExtensionstotheIndependence-PriorModel ................ 82 5.3OtherExtensions ................................ 83 REFERENCES ....................................... 85 BIOGRAPHICALSKETCH ................................ 88 5

PAGE 6

AbstractofDissertationPresentedtotheGraduateSchool oftheUniversityofFloridainPartialFulllmentofthe RequirementsfortheDegreeofDoctorofPhilosophy POSTERIORCONSISTENCYOFBAYESIANREGRESSIONMODELS By DouglasKyleSparks August2012 Chair:MalayGhoshCochair:KshitijKhareMajor:Statistics Wedevelopconditionsforposteriorconsistencyunderavar ietyofBayesian regressionmodels,manyofwhicharebothnecessaryandsuf cient.Weallowthe numberofregressorstogrowatthesamerateasthesamplesiz e,placingourwork inthe“large p ,large n ”regime,andweconsiderposteriorconsistencyunderthesu p vectornorminsteadofthemoreconventionalEuclideannorm .Inparticular,weexamine Zellner's g -prioralongwithitsempiricalBayesianandhierarchicalB ayesianextensions, aswellasseveralmodelsinwhichtheregressioncoefcient sareinsomesensetaken asaprioriindependent,whethermarginallyorconditional lyonsomehyperparameter. Wealsodiscusstheinterpretationsandimplicationsofour results. 6

PAGE 7

CHAPTER1 INTRODUCTION Thesimpleproblemofregressingaresponsevectorof n observationson p covariatesisamongtherstencounteredbyanystudentofst atistics,andregression modelshavepracticalapplicationsinvirtuallyanyconcei vableeldofstudy.More specically,regressionundertheBayesianparadigmhasgr owninpopularityinrecent decadeswiththerapidaccelerationofcomputingpowerandt hecontinuingdevelopment ofMarkovchainMonteCarlotechniques.Thebehaviorandthe oreticalpropertiesof theseBayesianregressionmodelshavethereforebecomeimp ortanttopicsofstudy. 1.1RegressionModel Webeginbyestablishinganddiscussingthebasicregressio nmodelthatwillbe usedthroughoutourwork.Considertheusuallinearmodel Y n = X n n + e n with response Y n =( Y n ,1 ,..., Y n n ) T ,covariates X n =( x n ,1 ,..., x n n ) T ,regressioncoefcients n =( n ,1 ,..., n p n ) T anderrors e n =( e 1 ,..., e n ) T whosedistributiondependsona parameter 2 .Heretheunknowncoefcientvector n istheparameterofinterest,while theunknownerrorparameter 2 isconsideredtobeanuisanceparameter.Wenow imposethreeassumptions: Assumption1. Theerrorsaredistributedas e n N n ( 0 n 2 I n ) Assumption2. Thenumberofregressors p n isanondecreasingsequencewith p n < n and p n = n ,where 0 < 1. Assumption3. Theeigenvalues n ,1 ,..., n p n ofthematrix n ( X Tn X n ) 1 satisfythe inequalities 0 < min inf n i n i sup n i n i max < 1 forsome min and max NotethatAssumption 3 impliesthat 1 max I p n n 1 X Tn X n 1 min I p n Wenownotethatminimalsufciencyallowsthereductionof Y n to ( ^ n S n ) ,where ^ n =( X Tn X n ) 1 X Tn Y n ,themaximumlikelihoodestimatorof n ,and S n = jj Y n X n ^ n jj 22 theerrorsumofsquares.Notethatconditionalon n and 2 ^ n and S n aremutually independentwith ^ n j n 2 N p n ( n 2 ( X Tn X n ) 1 ) and S n j n 2 2 2n p n 7

PAGE 8

ABayesianregressionmodelrequiresthespecicationofpr iorsontheunknown parameters n and 2 ,possiblywiththeirownunknownhyperparameterswhichwou ld thenrequiretheirownhyperpriors.Thesepriorsandhyperp riorswillvaryandwillbe describedasneededinsubsequentchapters.Alternatively ,anempiricalBayesian approachcouldbeemployedinlieuofahierarchicalstructu re.Thedetailsofsuch techniqueswillalsobespeciedinduecourse.Wenowprovid ethefollowingdenition forclarity. Denition1.1. Weusetheterm Bayesianregressionmodel torefertoalinear modelwithAssumptions 1 2 ,and 3 ,andwithsomechoiceofpriors,possiblywith hyperparametersspeciedbyhyperpriorsorempiricalBaye siantechniques.Weuse thenotation P M todenotebothaBayesianregressionmodelandprobabilitie scomputed underit,andweuse E M andVar M todenotecorrespondingexpectationsandvariances. Itisimportanttonotethatwepermitthenumberofcovariate s p n toincreasealong withthesamplesize n ,asthecomplexityofmanyofourresultsoriginatesfromthi s fact.Wepreservetheclassicalfrequentistidentiabilit yrequirementthat p n < n ,butit shouldbenotedthatdimensionreductiontechniquesdoallo wregressionmethodsto beextendedtothecaseof p n > n ,theso-called“large p ,small n ”regime.However,we restrictourattentionheretothe p n < n case,whichmaybecalledthe“large p ,large n ” regime,andwhichhasbeenshowntobequiteinterestinginit sownright[ 2 ][ 21 ][ 22 ]. Still,ascanbeseenfromAssumption 2 ,weallowthenumberofcovariatestogrowat themaximumpossibleratesubjecttothisrestriction,i.e. ,weallow p n = n ,where 0 < 1 1.2DenitionandInterpretationofPosteriorConsistency Supposeweplacesomepriorontheunknowncoefcientvector n andvariance 2 TheapplicationofBayes'theoremandtheintegrationoutof thenuisanceparameter 2 yieldthemarginalposteriorof n whichistheprimaryobjectonwhichBayesian 8

PAGE 9

inferenceisbased.CallthisentireBayesianmodel P M Wenowintroducethenotionof atruemodelunderwhichthedataisactuallygenerated. Denition1.2. ForanyBayesianregressionmodel P M ,wedeneacorresponding truemodel comprisingthesamelikelihoodas P M butwithsomexedbutunknown parametervalues 0 and 2 0 Weusethenotation P 0 todenoteboththetruemodel andprobabilitiescomputedunderit,andweuse E 0 andVar 0 todenotecorresponding expectationsandvariances. Looselyspeaking,onewouldhopethatasthesamplesizetend stoinnity,the marginalposteriorof n convergestodegeneracyat 0 n almostsurelyunder P 0 Wecall thispropertyposteriorconsistency,andtheobviousquest ionistodeterminethevalues of 0 n and 2 0 forwhichitoccurs.Beforewemayformalizethisidea,wers tlet jjjj r denotetheusual ` r vectornorm,where 1 r 1 .Specically, jj u jj r =( P i j u i j r ) 1 = r if 1 r < 1 ,and jj u jj 1 =max i j u i j .Wenowstatetheformaldenitionofposterior consistencyunderthe ` r norm. Denition1.3. Let 0 n 2 R p n foreach n 1 and 2 0 > 0. Nowlet P 0 denotethe distributionof f ( ^ n S n ), n 1 g underthemodel Y n = X n 0 n + e n foreach n 1, where e n N n ( 0 n 2 0 I n ) foreach n 1. Alsolet 1 r 1 .Wesaythatthe Bayesianregressionmodel P M exhibits posteriorconsistency underthe ` r normat f ( 0 n 2 0 ), n 1 g if P M ( jj n 0 n jj r > j ^ n S n ) 0 a.s. ( P 0 ) forevery > 0. AlsoseeDenition 1.4 TheconventionalchoiceforthevectornorminDenition 1.3 wouldbetheEuclidean ` 2 norm.However,thefollowinglemmaandcorollaryillustrat ewhythe ` 2 normisnot sufcientlyexibleforourpurposes. Lemma1.1. Let Z n N p n ( 0 p n n 1 V n ), where p n < n ,andwheretheeigenvalues n ,1 ,..., n p n of V n satisfy 0
PAGE 10

Proof. Let > 0 .NotethatVar ( Z n i )= n 1 V n ii n 1 max ,and n 1 = 2 V 1 = 2 n ii Z n i N (0,1) Nowlet U n = n P p n i =1 V 1 n ii Z 2 n i ,sothat U n 2p n .Bythepropertiesofthechi-squared distribution, U n = n 0 almostsurelyifandonlyif p n = n 0 .Thensince r min U n n jj Z n jj 2 r max U n n itfollowsthat jj Z n jj 2 0 almostsurelyifandonlyif p n = n 0 Corollary1.2. jj ^ n 0 n jj 2 0 a.s. ( P 0 ) ifandonlyif p n = n 0 Proof. ApplyLemma 1.1 under P 0 with Z n = ^ n 0 n and V n = 2 0 n ( X Tn X n ) 1 AsisclearfromCorollary 1.2 ,noteventheMLE ^ n achievesalmostsureconsistency underthe ` 2 normwhen p n growsatthesamerateas n .Thus,anyattempttoestablish posteriorconsistencyunderthe ` 2 normofaBayesianregressionmodelunderthesame circumstanceswouldbefutile.However,thefollowinglemm aandcorollarymotivatethe choiceofthe ` 1 norminstead. Lemma1.3. Let Z n N p n ( 0 p n n 1 V n ), where p n < n ,andwheretheeigenvalues n ,1 ,..., n p n of V n satisfy sup n i n i max < 1 forsome max .Then jj Z n jj 1 0 almost surely. Proof. Let > 0. NotethatVar ( Z n i )= n 1 V n ii n 1 max ,and n 1 = 2 V 1 = 2 n ii Z n i N (0,1). Then 1 X n =1 P ( jj Z n jj 1 > ) = 1 X n =1 P max 1 i p n j Z n i j > 1 X n =1 p n X i =1 P n 1 = 2 V 1 = 2 n ii j Z n i j > n 1 V n ii 1 = 2 1 X n =1 p n X i =1 P n 1 = 2 V 1 = 2 n ii j Z n i j >! 1 = 2 max n 1 = 2 1 X n =1 p n X i =1 15 3 max 6 n 3 < 1 10

PAGE 11

byapplyingMarkov'sinequalityto n 3 V 3 n ii Z 6 n i andtheresultfollowsfromtheBorel-Cantelli lemma,notingthat p n < n Corollary1.4. jj ^ n 0 n jj 1 0 a.s. ( P 0 ) Proof. ApplyLemma 1.3 under P 0 with Z n = ^ n 0 n and V n = 2 0 n ( X Tn X n ) 1 AlthoughCorollaries 1.2 and 1.4 arenotposteriorconsistencyresultsperse,they nonethelessdemonstratetheaddedexibilitythatcanaris efromtheuseofthe ` 1 norm insteadofthe ` 2 normwhenprovingconsistencyresults.Forthisreason,wec hoose toworkwiththe ` 1 normthroughoutourwork,withthefollowingdenitionprov idedfor thoroughness. Denition1.4. Unlessotherwisespecied,theposteriorconsistencyofDe nition 1.3 is tobetakenunderthe ` 1 norm. Ofcourse,theaddedexibilityisgainedbecausethe ` 1 normisweakerthanthe ` 2 norminthesensethat jj u jj 1 jj u jj 2 forallvectors u .Thismeansthatasufcient conditionforposteriorconsistencyunderthe ` 1 normmaynotbesufcientunderthe ` 2 norm(orunderanyother ` r normfor 1 r < 1 ).However,a necessary conditionfor posteriorconsistencyunderthe ` 1 norm is necessaryunderanystrongernormaswell. Theuseofthe ` 1 normalsoleadstothefollowinghelpfulresult. Lemma1.5. InaBayesianregressionmodel, P M ( jj n 0 n jj 1 > j ^ n S n ) 0 a.s. ( P 0 ) forevery > 0 ifandonlyif P M ( jj n ^ n jj 1 > j ^ n S n ) 0 a.s. ( P 0 ) forevery > 0. Proof. Thetriangleinequalityimpliesthat P M n ^ n 1 > 2 j ^ n S n P M ^ n 0 n 1 > j ^ n S n P M jj n 0 n jj 1 > j ^ n S n P M n ^ n 1 >= 2 j ^ n S n + P M ^ n 0 n 1 >= 2 j ^ n S n 11

PAGE 12

Whenconditioningon ^ n and S n P M jj ^ n 0 n jj 1 > j ^ n S n = I jj ^ n 0 n jj 1 > where I ( ) denotestheusualindicatorfunction.Then jj ^ n 0 n jj 1 0 a.s. ( P 0 ) by Corollary 1.4 ,fromwhichitfollowsthat I ( jj ^ n 0 n jj 1 > ) 0 a.s. ( P 0 ) forall > 0. Thisandtheaboveinequalitiesimmediatelyyieldtheresul t. Thus,Lemma 1.5 essentiallystatesthat 0 n maybereplacedby ^ n inthedenition ofposteriorconsistencyunderthe ` 1 norm. Itshouldbenotedthatthetypeofposteriorconsistencycon sideredhereinis fundamentallydifferentfromwhatcouldinsteadbeconside redintheanalysisof Bayesianmethodology.Onecouldinsteadconsidertheconve rgenceoftheposterior underthesamemodel P M underwhichitisderived,inwhichcaseoneisassumingthat thepriorassociatedwiththemodel P M isinsomesense“true.”However,thisapproach isperhapstoofavorableinthatposteriorconsistencyisto oeasytoachieve.Infact,in thisapproach,aquitegeneralresultduetoDoob[ 6 ]statesthatposteriorconsistency occursonasetofparametervalueswithprobability1undert hepriorassociationwith P M Instead,thetypeofposteriorconsistencyconsideredhere inisfundamentally frequentistinnature,thatis,thevalues 0 areand 2 0 areconsideredxedbutunknown. ThefrequentistpropertiesofBayesianmethodshavebeenof interestforsometime. EvenpurefrequentistsmaybeinterestedinoriginallyBaye sianprocedures,orlimitsand approximationsthereof,duetoconsiderationssuchasadmi ssibilityandtheconvenient eliminationofnuisanceparameters.Indeed,itwasshownas earlyasLaplace[ 16 ]that insimplecases,theposteriordistributionandthedistrib utionofthemaximumlikelihood estimatorarecomparableforlargesamplesizes.Moresophi sticatedversionsofsuch resultshavebeendevelopedinmorerecenttimes[ 3 ][ 13 ][ 17 ][ 23 ]. Thepreciseinterpretationofthisfrequentistnotionofpo steriorconsistencybythe trueBayesianisdiscussedextensivelybyDiaconisandFree dman[ 5 ].Theirdiscussion 12

PAGE 13

separatesBayesiansintotwophilosophicalgroups,whicht heyterm“classical”and “subjectivist”Bayesians.TheydenetheclassicalBayesi antobeonewhobelieves thatatrueparametervalueexists,butthatitisunknownand mustbeestimatedfrom thedata.Despitethedifferenceinmethodology,suchBayes ianssharephilosophical commongroundwiththefrequentiststhroughabeliefinatru eparametervalue,sothe frequentistnotionofposteriorconsistencyisclearlyofg reatinteresttothisgroup.The othergroup,thesubjectivistBayesians,aredenedtobeth osewhoviewprobabilities asdegreesofbeliefandrejectthenotionofobjectiveproba bilitymodels.Forthisgroup, therelevanceofthefrequentistnotionofposteriorconsis tencyislessobvious.However, DiaconisandFreedmanprovideanextensionofaresultorigi nallyduetoBlackwell andDubins[ 4 ]andarguethatthefrequentistnotionofposteriorconsist encycanbe viewedasthemergingofdifferingsubjectiveopinionsunde ranappropriatesetof circumstances. ThereexistssubstantialliteratureonthebehaviorofBaye sianregressionmodels inthecontextofmodelselection.Inthiscontext,theprima ryfrequentistpropertyof interestistypicallyeithertheconsistencyoftheBayesfa ctorortheconvergenceor nonconvergencetounityoftheposteriorprobabilityofthe truemodel,whichhasbeen calledposteriormodelconsistency[ 22 ].Forxed p ,thesenotionsareequivalent,but theymaydifferwhen p increaseswiththesamplesize[ 22 ].Withxed p ,posterior consistencyofBayesfactorswasrstestablishedunderasp ecichierarchicalprior structurebyFern andezetal.[ 8 ],anditwaslaterestablishedforawidervarietyof Bayesianregressionmodels,includingsomehierarchicala ndempiricalBayesian models,byLiangetal.[ 18 ].Inthecaseofincreasing p n (where p n < n ),resultshave beenobtainedonbothposteriorconsistencyofBayesfactor s[ 2 ][ 21 ]andposterior modelconsistency[ 22 ]. AdifferentnotionofconsistencywasconsideredbyJiang[ 14 ],whoexamined convergenceratesofthepredictivedensityoftheresponse variable.Jiangaddressed 13

PAGE 14

notonlyregressionbutthebroaderclassofgeneralizedlin earmodelsandallowed p n toexceed n ,thoughhestressesthattheinclusionofavariableselecti onprocedureis essentialtoachievecertainresults. Incontrast,weconsiderposteriorconsistencyintheconte xtofparameter estimation,i.e.,theconvergenceornonconvergenceofthe posteriordistributionofthe coefcientvectortodegeneracyatthetruevalue.Thisispr eciselythetypeofposterior consistencyexaminedbyGhosal[ 12 ],whoconsideredcertaintypesofhigh-dimensional linearmodelsandprovidedavaluablecontributionbyprovi ngnotonlyposterior consistencybutalsoasymptoticnormalityoftheposterior distributionofthecoefcient vector.However,ourworkdiffersfromthatofGhosalinfour principalrespects.First, weprovide necessaryandsufcient conditionsforposteriorconsistencyinmanyofthe modelsweconsider.Whilewereadilyadmitthatstrongerres ultssuchasasymptotic normalityareperhapsmoreusefulwheneverposteriorconsi stencyoccurs,ourvarious necessary conditionsdemonstratecircumstancesinwhichposteriorc onsistencyfails tooccuratall,whichwebelievetobeinterestingintheirow nright.Second,ourwork allowstheparameterspaceforthe p n -dimensionalvectorofregressioncoefcients tobetakenas R p n ,asisnatural.ThiscontrastswithGhosal'swork,whichess entially requirestherestrictionoftheparameterspacetoasequenc eofcompactsets.Third, weallowthenumberofregressorstoincreaseatthesamerate asthesamplesize,i.e., p n = n ,0 < 1 ,whichisnotpermittedbyGhosal.Fourth,wepermitanunkno wn samplingvariance 2 withanassociatedprior,asiscommonlypracticedinregres sion models.Suchastructureisnotincludedintheclassofmodel sconsideredbyGhosal. 14

PAGE 15

CHAPTER2 ZELLNER'S g -PRIORANDEXTENSIONS The g -priorwasintroducedbyZellner[ 24 ]andhasfoundabundantapplicationsin traditionalBayesiananalysisoflinearmodels[ 13 ],variableselection[ 8 ][ 9 ][ 11 ][ 18 ], classicationproblems[ 19 ],andavarietyofothersubjects.Zellner's g -priorfor n conditionalon 2 isgivenby n j 2 N p n r n g 2 X Tn X n 1 (2–1) forsomescalar g > 0 andsomepriormean r n (often 0 p n ).Theprioron 2 iscommonly takentobeconjugate,i.e., 2 InverseGamma a 2 b 2 (2–2) wherewepermit a 2 and b 0 toallowforsuchpopularimproperpriorsas ( 2 ) / 1 = 2 ,1 = or 1 (whereweofcoursenotethatthispriorisonlyaproperinver se gammadistributionwhen a and b arebothstrictlypositive).Thefollowingdenition formalizesthenotionofa g -priormodel. Denition2.1. A g -priormodel isaBayesianregressionmodelwithpriorsspeciedas Priors 2–1 and 2–2 .Notethat g maybetakenasasinglexedvalueoranonstochastic sequenceofvalues g n ,oritmaybespeciedthroughempiricalBayesianorhierarc hical Bayesiantechniques. Onemotivationfortheuseofa g -priormodelistheconvenientformoftheBayes estimatorundersquarederrorloss.When g isxed,thisBayesestimatorisgivenby ^ B n = E M ( n j ^ n S n )= g g +1 ^ n + 1 g +1 r n aweightedaverageoftheMLEandpriormeanwithscalar-valu edweights.Thisfactwill facilitatetheproofsofmanyoftheresultsthatfollow. 15

PAGE 16

Theoutlineofthefoursectionsofthischapterisasfollows .Section 2.1 provides necessaryandsufcientconditionsforposteriorconsiste ncyforanonstochastic sequence f g n n 1 g Intheprocess,wedemonstratetheposteriorconsistencyor inconsistencyofsomepopularrecommendationsregardingt hechoiceof g n Section 2.2 providesnecessaryandsufcientconditionsforposterior consistencyinanempirical Bayesiancontextinwhich g n isestimatedfromthedata.Section 2.3 providesnecessary andsufcientconditionsforposteriorconsistencyundert hehierarchicalhyperg -prior model[ 18 ].Section 2.4 considersthecelebratedZellner-Siowprior[ 25 ]andprovidesa sufcient(thoughnotnecessary)conditionforposteriorc onsistencyunderthismodel. Attheendofeachsection,thepracticalimplicationsofthe resultsarebrieydiscussed. ItshouldbenotedthatalthoughthekeyresultsofSections 2.2 2.3 ,and 2.4 yieldthe samesufcientconditionforposteriorconsistency,thete chniquesusedtoprovethese resultsdiffersubstantiallyamongthethreemodels. 2.1Non-Hierarchical g -PriorModel Webeginourexaminationof g -priormodelsbyconsideringthesimplestapproach tothespecicationof g Denition2.2. A non-hierarchical g -priormodel isa g -priormodelinwhich g g n is anonstochasticsequence. Toestablishresultsonposteriorconsistencyorinconsist encyinthenon-hierarchical g -priormodel,werstdene T n =( ^ n r n ) T X Tn X n ( ^ n r n ), (2–3) sothat T n = 2 wouldbetheusualfrequentistlikelihoodratioteststatis ticforatestof H 0 : n = r n vs. H a : n 6 = r n ifthevariance 2 wereknown.Thenthejointposterior 16

PAGE 17

n ( n 2 j ^ n S n ) isgivenby n ( n 2 j ^ n S n ) / exp 1 2 n ^ B n T g n g n +1 2 X Tn X n 1 1 n ^ B n # 2 ( n + p n + a ) = 2 exp 1 2 2 S n + b + T n g n +1 andintegratingout n fromthisyieldsthemarginalposteriorof 2 n ( 2 j ^ n S n ) / 2 ( n + a ) = 2 exp 1 2 2 S n + b + T n g n +1 i.e., 2 j ^ n S n InverseGamma (( n + a 2) = 2, e T n = 2), under P M ,wherewedene e T n = S n + b +( g n +1) 1 T n Fornotationalconvenience,foreach n 1, dene 0 n = n jj r n 0 n jj 22 ( r n 0 n ) T X Tn X n ( r n 0 n ) 0 n = E 0 ( T n )= p n 2 0 + n 1 0 n jj r n 0 n jj 22 (2–4) e 0 n = E 0 ( e T n )=( n p n ) 2 0 + b + 1 g n +1 p n 2 0 + n 1 0 n jj r n 0 n jj 22 andnotethat min 0 n max since 1 max I p n n 1 X Tn X n 1 min I p n Thefollowinglemmasestablishthebehaviorofvariousquan titiesunder P 0 and theywillbeheavilyusedinprovingposteriorconsistencyo rinconsistencyinboththe non-hierarchical g -priormodelandthe g -priormodelsofsubsequentsections. Lemma2.1. ( n p n ) 1 S n 2 0 a.s. ( P 0 ). Proof. Firstobservethatunder P 0 theexpectationandfourthcentralmomentof S n are E 0 ( S n )=( n p n ) 2 0 and ( 4 ) 0 ( S n )=12( n p n )( n p n +4) 8 0 Let > 0. Then 1 X n =1 P 0 S n n p n 2 0 > 12 8 0 4 1 X n =1 n p n +4 ( n p n ) 3 < 1 so ( n p n ) 1 S n 2 0 a.s. ( P 0 ) bytheBorel-Cantellilemma. Lemma2.2. If > 0 or liminf n !1 jj r n 0 n jj 22 > 0, then T n = 0 n 1 a.s. ( P 0 ). 17

PAGE 18

Proof. Notethatunder P 0 T n = 2 0 hasanoncentralchi-squaredistributionwith p n degreesoffreedomandnoncentralityparameter n 1 0 n jj r n 0 n jj 22 = 2 .Thenthe fourthcentralmomentof T n under P 0 is ( 4 ) 0 ( T n ) = E 0 ( T n 0 n ) 4 = E 0 h T n p n 2 0 + n 1 0 n jj r n 0 n jj 22 i 4 whichcanbewrittenas ( 4 ) 0 ( T n ) =12 4 0 p n 2 0 +2 n 1 0 n jj r n 0 n jj 22 2 +48 6 0 p n 2 0 +4 n 1 0 n jj r n 0 n jj 22 12 4 0 2 p n 2 0 +2 n 1 0 n jj r n 0 n jj 22 2 +48 6 0 4 p n 2 0 +4 n 1 0 n jj r n 0 n jj 22 =48 4 0 2 0 n +192 6 0 0 n (2–5) Dene =liminf n !1 jj r n 0 n jj 22 .Observethatif > 0, then 0 n p n 2 0 > n 2 0 = 2 forall sufcientlylarge n andso 1 0 n = O ( n 1 ). If > 0, then 0 n n 1 0 n jj r n 0 n jj 22 > n 1 max = 2 forallsufcientlylarge n andso 1 0 n = O ( n 1 ). Eitherway, 1 0 n = O ( n 1 ), sothefourth centralmomentof T n = 0 n under P 0 is ( 4 ) 0 T n 0 n 48 4 0 2 0 n + 192 6 0 3 0 n = O ( n 2 ). Thenforany > 0, 1 X n =1 P 0 T n 0 n 1 > 1 X n =1 1 4 ( 4 ) 0 T n 0 n < 1 whichimpliesthat T n = 0 n 1 a.s. ( P 0 ) bytheBorel-Cantellilemma. Lemma2.3. e T n = e 0 n 1 a.s. ( P 0 ). Proof. ItfollowsfromInequality 2–5 thatthefourthcentralmomentof e T n under P 0 is ( 4 ) 0 e T n = E 0 e T n e 0 n 4 = E 0 8<: S n ( n p n ) 2 0 + T n g n +1 p n 2 0 + n 1 0 n jj r n 0 n jj 22 g n +1 # 4 9=; 18

PAGE 19

Wemayboundthisquantityby ( 4 ) 0 e T n 8 E 0 [ S n E 0 ( S n )] 4 +8 E 0 ( T n g n +1 E 0 T n g n +1 4 ) 96( n p n )( n p n +4) 8 0 + 12 4 0 ( g n +1) 4 p n 2 0 +2 n 1 0 n jj r n 0 n jj 22 2 + 48 6 0 ( g n +1) 4 p n 2 0 +4 n 1 0 n jj r n 0 n jj 22 whichimpliesthat ( 4 ) 0 e T n 96( n p n +4) 2 8 0 +48 4 0 e 2 0 n +192 6 0 e 0 n Since e 0 n ( n p n ) 2 0 thefourthcentralmomentof e T n = e 0 n under P 0 is ( 4 ) 0 e T n e 0 n 96( n p n +4) 2 8 0 e 4 0 n + 48 4 0 e 2 0 n + 192 6 0 e 3 0 n 96( n p n +4) 2 ( n p n ) 4 + 48 ( n p n ) 2 + 192 ( n p n ) 3 whichis O ( n 2 ). Thenforany > 0, 1 X n =1 P 0 e T n e 0 n 1 > 1 X n =1 1 4 ( 4 ) 0 e T n e 0 n < 1 whichimpliesthat e T n = e 0 n 1 a.s. ( P 0 ) bytheBorel-Cantellilemma. Thefollowinglemmasregardingthenormaldistributionwil lbeusefulinestablishing theconditionforposteriorconsistencyinthenon-hierarc hicalcase. Lemma2.4. Let Z n N p n ( n n ), where n ispositivedenite,foreach n 1. If jj n n jj 1 9 0 ,thenthereexistan > 0 andasubsequence k n of n suchthat P ( jj Z k n k n jj 1 > ) 1 = 2 forall n Proof. Assume jj n n jj 1 9 0. Thenthereexistasubsequence k n of n anda > 0 suchthat jj k n k n jj 1 > forall n Therealsoexistsan i n ,1 i n p n suchthat j k n i n k n i n j = jj k n k n jj 1 > forall n Theneither k n i n < k n i n (Case1)or k n i n > k n i n + (Case2).Nowlet 0 << andnotethat P jj Z k n k n jj 1 19

PAGE 20

P ( k n i n Z k n i n k n i n + ) Recallthat k n isassumedpositivedenite.Thenin Case1, P ( k n i n Z k n i n k n i n + ) P ( k n i n Z k n i n ) =1 = 2, whileinCase2, P ( k n i n Z k n i n k n i n + ) P ( Z k n i n k n i n ) =1 = 2. Eitherway,itfollowsthat P ( jj Z k n k n jj 1 > ) 1 = 2 forall n Lemma2.5. Let Z N ( 2 ). Then P ( j Z j ) 1 2( = ) forevery 0, where isthestandardnormalcdf. Proof. Notethatforany t > 0,( z + t ) ( z t ) ismaximizedat z =0. Hence, P ( j Z j )= P ( Z )= P Z = fromwhichitimmediatelyfollowsthat P ( j Z j ) 1 2( = ). Lemma2.6. Let Z n N p n ( 0 p n n ) foreach n 1, whereeachmatrix n haseach diagonalentryequalto1andeigenvalues n ,1 ,..., n p n If inf n i n i = min then inf n i Var ( Z i j Z i +1 ,..., Z p n ) min Proof. Foreach i =1,..., p n partition n as n = 266664 n i ,11 n i ,1 i n i ,12 Tn i ,1 i n ii n i ,2 i Tn i ,12 Tn i ,2 i n i ,22 377775 wherethesubmatrices n i ,11 and n i ,22 alongthediagonalhavedimension ( i 1) ( i 1) and ( p n i ) ( p n i ), respectively.Thendene e n i = Var ( Z i j Z i +1 ,..., Z p n ), so that e n i = n ii n i ,2 i 1 n i ,22 Tn i ,2 i Notethat e 1 n i istherstdiagonalentryof 264 n ii n i ,2 i Tn i ,2 i n i ,22 375 1 20

PAGE 21

whichhaseigenvaluesboundedaboveby 1 min sincetheeigenvaluesofaprincipal submatrixareboundedbelowbythesmallesteigenvalueofth efullmatrix.Hence e 1 n i 1 min andtheresultimmediatelyfollows. Finally,oneadditionallemmaprovidesaresultaboutthema rginalposteriorof 2 Lemma2.7. Inthenon-hierarchical g -priormodel,theposteriordistributionof 2 satises P M ( n 1 e 0 n = 2 2 2 n 1 e 0 n j ^ n S n ) 1 a.s. ( P 0 ). Proof. Recallthat e T n = e 0 n 1 a.s. ( P 0 ) byLemma 2.3 .Thenforallsufcientlylarge n P M e 0 n 2 n 2 2 e 0 n n ^ n S n P M 3 e T n 4( n + a 4) 2 5 e T n 4( n + a 4) ^ n S n a.s. ( P 0 ) = P M 2 E M 2 j ^ n S n e T n 4( n + a 4) ^ n S n 1 4( n + a 4) e T n 2 2 e T 2 n ( n + a 4) 2 ( n + a 6) =1 32 n + a 6 1, wherethelastinequalityisaconsequenceofChebyshev'sin equality,forwhichwenote thatVar M ( 2 j ^ n S n )=2( n + a 4) 2 ( n + a 6) 1 e T 2 n Notethatalthough g n doesnotappearexplicitlyintheresultinLemma 2.7 theresultneverthelessdoesdependonthechoiceof g n sinceitisinvolvedinthe quantity e 0 n Wenowstateandprovethenecessaryandsufcientcondition forposterior consistencyinthenon-hierarchical g -priormodel. Theorem2.8. Inthenon-hierarchical g -priormodel,posteriorconsistencyoccursifand onlyifboth ( g n +1) 1 jj r n 0 n jj 1 0 and g n ( g n +1) 2 (log p n ) n 1 jj r n 0 n jj 22 0. Proof. ByLemma 1.5 ,wemayreplace 0 n with ^ n inthedenitionofposterior consistency.Wewillnowconsiderfourcases. 21

PAGE 22

Case1:Suppose ( g n +1) 1 jj r n 0 n jj 1 9 0. Thensince jj r n ^ n jj 1 jj r n 0 n jj 1 jj ^ n 0 n jj 1 and jj ^ n 0 n jj 1 0 a.s. ( P 0 ) byCorollary 1.4 ,itfollows that ( g n +1) 1 jj r n ^ n jj 1 9 0 a.s. ( P 0 ). Nowobservethatunder P M n ^ n j 2 ^ n S n N p n 1 g n +1 r n ^ n g n g n +1 2 X Tn X n 1 ThenbyLemma 2.4 ,thereexistsan > 0 andasubsequence k n of n suchthat, a.s. ( P 0 ) P M ( jj k n ^ k n jj 1 > j 2 ^ k n S k n ) > 1 = 2 forevery n andevery 2 > 0. Then P M k n ^ k n 1 > ^ k n S k n = E M h P M k n ^ k n 1 > 2 ^ k n S k n ^ k n S k n i 1 = 2 forevery n a.s. ( P 0 ). Therefore P M ( jj n ^ n jj 1 > j ^ n S n ) 9 0, soposteriorconsistencydoesnotoccur. Fortheremainingcases,suppose ( g n +1) 1 jj r n 0 n jj 1 0. Thensince jj r n ^ n jj 1 jj r n 0 n jj 1 + jj ^ n 0 n jj 1 and jj ^ n 0 n jj 1 0 a.s. ( P 0 ) by Corollary 1.4 ,itfollowsthat ( g n +1) 1 jj r n ^ n jj 1 0 a.s. ( P 0 ). Then P M jj n ^ B n jj 1 > 2 j ^ n S n P M jj ^ B n ^ n jj 1 > j ^ n S n P M jj n ^ n jj 1 > j ^ n S n P M jj n ^ B n jj 1 >= 2 j ^ n S n + P M jj ^ B n ^ n jj 1 >= 2 j ^ n S n bythetriangleinequality.Notethat P M jj ^ B n ^ n jj 1 > j ^ n S n = I ( jj ^ B n ^ n jj 1 > ). However, ^ B n ^ n =( g n +1) 1 ( r n ^ n ), sothisindicatoriszeroforallsufciently large n a.s. ( P 0 ). Therefore,posteriorconsistencyoccursinCases2and3bel owifand onlyif P M ( jj n ^ B n jj 1 > j ^ n S n ) 0 a.s. ( P 0 ) forevery > 0. Wenowconsiderthe individualcases. 22

PAGE 23

Case2:Supposethat ( g n +1) 1 jj r n 0 n jj 1 0, andalsosupposethat g n ( g n +1) 2 (log p n ) n 1 jj r n 0 n jj 22 0. Observethat P M n ^ B n 1 > ^ n S n = E M h P M n ^ B n 1 > 2 ^ n S n ^ n S n i whichallowsustowrite P M n ^ B n 1 > ^ n S n E M P M n ^ B n 1 > 2 ^ n S n I 2 2 e 0 n n ^ n S n # + P M 2 > 2 e 0 n n ^ n S n Weimmediatelyhavethat P M ( 2 > 2 e 0 n = n j ^ n S n ) 0 a.s. ( P 0 ) byLemma 2.7 ,so itsufcestoworkwiththersttermtoestablishposteriorc onsistency.Let v n ij denote the ij thelementof n ( X Tn X n ) 1 andnotespecicallythatthediagonalelementsmaybe boundedby min v n ii max forall n and i Alsorecallthat n ^ B n j 2 ^ n S n N p n 0 p n g n ( g n +1) 1 2 X Tn X n 1 under P M Nowlet > 0, andboundtheaforementionedrsttermby E M P M n ^ B n 1 > j 2 ^ n S n I 2 2 e 0 n n ^ n S n # E M p n X i =1 P M n i ^ B n i > j 2 ^ n S n I 2 2 e 0 n n ^ n S n # E M p n X i =1 2 s 2 ( g n +1) n g n v n ii 2 I 2 2 e 0 n n ^ n S n # 2 p n E M s 2 ( g n +1) n 2 2 g n max e 0 n ^ n S n # =2 p n s 2 ( g n +1) n 2 2 g n max e 0 n 23

PAGE 24

where ( ) denotesthestandardnormalcdf.ThenbytheMillsratio, 2 p n s 2 ( g n +1) n 2 2 g n max e 0 n 2 p n s g n max e 0 n 2 ( g n +1) n 2 exp 2 ( g n +1) n 2 4 g n max e 0 n Thisexpressiontendstozeroif e 0 n = n isboundedabove,sowemayinsteadassumethat e 0 n = n !1 whichbyinspectionoccursifandonlyif ( g n +1) 1 jj r n 0 n jj 22 !1 Then e 0 n 2 n 1 0 n ( g n +1) 1 jj r n 0 n jj 22 forallsufcientlylarge n andhence 2 p n s 2 ( g n +1) n 2 2 g n max e 0 n 2 p n s 2 max g n jj r n 0 n jj 22 0 n 2 ( g n +1) 2 n exp 0 n 2 ( g n +1) 2 n 8 max g n jj r n 0 n jj 22 s 8 max g n jj r n 0 n jj 22 log p n 0 n 2 ( g n +1) 2 n exp 1 0 n 2 ( g n +1) 2 n 8 max g n jj r n 0 n jj 22 log p n log p n # 0 a.s. ( P 0 ) forevery > 0 bytheassumptionthat g n ( g n +1) 2 (log p n ) n 1 jj r n 0 n jj 22 0. Therefore,posterior consistencyoccurs. Case3:Again,suppose ( g n +1) 1 jj r n 0 n jj 1 0, butnowsupposethat g n ( g n +1) 2 (log p n ) n 1 jj r n 0 n jj 22 9 0. Thenthereexistasubsequence k n of n and aconstant > 0 suchthat g k n ( g k n +1) 2 (log p k n ) k 1 n jj r k n 0 k n jj 22 > forall n Note thatposteriorinconsistencyofthesubsequence P M ( k n j ^ k n S k n ) impliesposterior inconsistencyoftheoverallsequence P M ( n j ^ n S n ), sowemayassumewithoutloss ofgeneralitythat k n = n fornotationalconvenience.Also,dene n tobethe p n p n matrixwithelements n ij = v n ij = p v n ii v n jj where v n ij denotesthe ij thelementof 24

PAGE 25

n ( X Tn X n ) 1 asbefore.Then P M jj n ^ B n jj 1 > j ^ n S n E M P M n ^ B n 1 > j 2 ^ n S n I 2 e 0 n 2 n ^ n S n # E M P M max 1 i p n n i ^ B n i > r 2 v n i min 2 ^ n S n I 2 e 0 n 2 n ^ n S n # E M 24 P M 0@ max 1 i p n j Z i j > s ( g n +1) 2 n g n min 2 2 1A I 2 e 0 n 2 n ^ n S n 35 where Z n N p n ( 0 p n n ) andisindependentof 2 under P M Nownotethatthe innermostconditionalprobabilityisanondecreasingfunc tionof 2 ,whichimplies that P M jj n ^ B n jj 1 > j ^ n S n E M P M max 1 i p n j Z i j > s 2( g n +1) 2 n 2 g n min e 0 n 2 I 2 e 0 n 2 n ^ n S n # = P M max 1 i p n j Z i j > s 2( g n +1) 2 n 2 g n min e 0 n P M 2 e 0 n 2 n ^ n S n sincetheentriesof n dependonlyonthematrix X Tn X n ThenLemma 2.7 immediately impliesthat P M ( 2 e 0 n = 2 n j ^ n S n ) 1 a.s. ( P 0 ), soitsufcestoshowthatthe rsttermisboundedawayfromzeroforallsufcientlylarge n a.s. ( P 0 ). Nowdene 0 n =[2( g n +1) 2 n 2 = g n min e 0 n ] 1 = 2 and e n i = Var ( Z i j Z i +1 ,..., Z p n ). Then P M max 1 i p n j Z i j > 0 n =1 P M max 1 i p n j Z i j 0 n =1 E M P M ( j Z 1 j 0 n j Z 2 Z 3 ,..., Z p n ) p n Y i =2 I fj Z i j 0 n g # 1 E M 1 2 0 n = q e n ,1 p n Y i =2 I fj Z i j 0 n g # 25

PAGE 26

byLemma 2.5 andthefactthat e n ,1 doesnotdependon Z 2 Z 3 ,..., Z p n Hence, P M max 1 i p n j Z i j > 0 n 1 1 2 0 n = q e n ,1 P M max 2 i p n j Z i j 0 n Byrepeatedconditioningon Z i +1 Z i +2 ,..., Z p n for i =2,3,..., p n 1 andapplicationof Lemma 2.5 asabove,wendthat P M max 1 i p n j Z i j > 0 n 1 p n Y i =1 1 2 0 n = q e n i Notethat e 0 n n 1 0 n g n +1 jj r n 0 n jj 22 ( g n +1) n 2 max g n log p n whichimpliesthat 0 n s 2 max 2 log p n min Theeigenvaluesof n areboundedbelowby min = max so inf n i e n i min = max by Lemma 2.6 .Thenitfollowsthat P M max 1 i p n j Z i j > 0 n 1 1 2 s 2 2max 2 log p n 2min !# p n 1 exp 2 p n s 2 2max 2 log p n 2min !# Noticethatifanysubsequenceof p n isboundedabove,thenitisclearthatthequantity [ (2 2max 2 log p n = 2min ) 1 = 2 ] isboundedawayfromzeroalongthatsubsequence,and thusposteriorinconsistencyfollowsimmediately.Sowema yinsteadassumethat p n !1 Then 2 2max 2 log p n = 2min !1 ,inwhichcasetheinequality 1 ( t ) ( t 1 t 3 )(2 ) 1 = 2 exp( t 2 = 2) (2 t ) 1 (2 ) 1 = 2 exp( t 2 = 2) 26

PAGE 27

forlarge t maybeappliedforallsufcientlylarge n yielding P M max 1 i p n j Z i j > 0 n 2 ^ n S n 1 exp p n s 2min 4 2max 2 log p n exp 2max 2 log p n 2min # =1 exp ( s 2min 4 2max 2 log p n exp 1 2max 2 2min log p n ) 1 for < s 2min 2max Thereforeposteriorconsistencydoesnotoccur. Inthesameveinasfrequentistconsistency,posteriorcons istencyasdenedhere canbeconceptualizedastheideathatthecenter(notnecess arilythemean)ofthe posteriordistributionconvergestothetruevaluewhileth espread(notnecessarilythe variance)oftheposteriordistributionconvergestozero. Inlightofthis,itisnoteworthy thatthetwoconditionsinTheorem 2.8 arisefrompreciselysuchconsiderations. Therstconditioncontrolstheconvergencetozeroofthe ` 1 -distancebetweenthe posterior'scenterandthetruevalue 0 n whilethesecondconditioncontrolsthe convergenceoftheposterior'sspreadtozero.Bothconditi onsarenecessaryfor posteriorconsistencytohold. Inthesimplecasewhere p n doesnotincreasewith n itistypicaltoxtheprior meanas r n = r andtoassumethat 0 n = 0 alsodoesnotvarywith n .Inthiscaseit canbeimmediatelyseenthatalthoughthesecondconditiono fTheorem 2.8 issatised, therstconditionfailsexceptintheserendipitouscaseth at r = 0 Ofcourse,theresult issomewhatobviousevenwithoutappealingtoTheorem 2.8 ,sincetheposteriormean issimplyaweightedaverageoftheMLE ^ n whichisstronglyconsistentfor 0 and thepriormean r withweights g ( g +1) 1 and ( g +1) 1 respectively.Inthiscase,the situationmayberemediedbytakinganychoiceof g n thattendstoinnity.Forinstance, theunitinformationprior[ 15 ]isequivalenttotaking g n = n while g n =max f n p 2 n g has 27

PAGE 28

alsobeenrecommended[ 8 ].Eitherchoiceyieldsposteriorconsistencyinthexedp case. TheresultofTheorem 2.8 becomesmoreinterestingwhen p n !1 Forexample, itmaybethecasethat jj r n 0 n jj 1 = O (1), but jj r n 0 n jj 22 = O ( p n ). Inthiscase, therstconditionissatisedaslongas g n !1 butthesecondconditionimposesthe additionalrequirementthat g n mustgrowfasterthan p n n 1 log p n Theaforementioned choicesof g n = n or g n =max f n p 2 n g provideposteriorconsistencyinthiscaseaswell. Asanotherspecialcase,suppose p n = O ( n ) exactly,butsupposeonlyaxed number p ? > 0 ofcomponentsof r n 0 n arenonzeroandthese p ? components remainxedas n grows.Thiscircumstancecouldarisewiththelogicalchoic e r n = 0 p n ifonlytherstfewcovariatesarepresentinthetruefreque ntistmodel P 0 butcovariates continuetobeaddedasthesamplesizeincreases.Thenany g n !1 ensuresposterior consistency.Thiscaseisadmittedlyuninterestinginthen on-hierarchicalmodel,butwe willrevisititsbehaviorlaterunderempiricalandhierarc hicalBayesianmodels. 2.2EmpiricalBayesian g -PriorModel Apopularapproachistoavoidspecifying g or g n altogetherbytheuseofan empiricalBayesmethod[ 11 ]inwhichthevalueof g isestimatedfromthedata. Themostcommontechniqueistousethevalueof g thatmaximizesitsmarginal likelihood,restrictedto g 0. Byintegratingout n and 2 fromthejointdistributionof ^ n S n n 2 themarginallikelihoodof g isfoundtobe L ( g ; ^ n S n ) / ( g +1) ( n p n + a 2) = 2 ( g +1)( S n + b )+ T n ( n + a 2) = 2 forwhichthemaximizingvalueof g subjectto g 0 is ^ g EB n =max 0, n p n + a 2 S n + b T n p n 1 (2–6) Denition2.3. The empiricalBayesian g -priormodel isthenon-hierarchical g -prior modelwith g n takentobethedata-dependentsequence ^ g EB n asgiveninEquation 2–6 28

PAGE 29

WebeginouranalysisoftheempiricalBayesian g -priormodelbyprovidingalemma thataddressesthebehaviorof ^ g EB n Lemma2.9. If liminf n !1 jj r n 0 n jj 22 > 0, then liminf n !1 ^ g EB n > 0 a.s. ( P 0 ). Proof. Dene =liminf n !1 jj r n 0 n jj 22 andassume > 0. Then T n p n = T n 0 n 2 0 + n p n 0 n jj r n 0 n jj 22 > T n 0 n 2 0 + 2 max > 2 0 + 4 max forallsufcientlylarge n a.s. ( P 0 ), since T n = 0 n 1 a.s. ( P 0 ) byLemma 2.2 .Then liminf n !1 ^ g EB n > liminf n !1 n p n + a 2 S n + b 2 0 + 4 max 1 = 4 max 2 0 > 0 a.s. ( P 0 ) since ( n p n + a 2) = ( S n + b ) 1 = 2 0 a.s. ( P 0 ) byLemma 2.1 Since ^ g EB n issimplyafunctionof ( ^ n S n ), theempiricalBayesposteriorisidentical tothesimplenon-hierarchicalBayesposterior,butwithth edata-dependentquantity ^ g EB n inplaceof g n Thus,whileTheorem 2.8 wouldallowustoimmediatelystateanecessary andsufcientconditionforposteriorconsistencyinterms of ^ g EB n analternativecondition notinvolvingdata-dependentquantitieswouldbepreferab le.Thefollowingresultgives preciselysuchaconditionandestablishesitsnecessityan dsufciency. Theorem2.10. IntheempiricalBayesian g -priormodel,posteriorconsistencyoccurs ifandonlyifeither =0 ortheredoesnotexistasubsequence k n of n andaconstant A > 0 suchthat jj r k n 0 k n jj 22 A and jj r k n 0 k n jj 1 9 0. Proof. ByTheorem 2.8 ,weimmediatelyhavethatposteriorconsistencyoccursifa nd onlyifboth jj r n 0 n jj 1 ^ g EB n +1 0 and ^ g EB n log p n (^ g EB n +1) 2 n jj r n 0 n jj 22 0 a.s. ( P 0 ). (2–7) Wenowconsiderthreecases. 29

PAGE 30

Case1:Supposetheredonotexistasubsequence k n of n andaconstant A > 0 suchthat jj r k n 0 k n jj 22 A and jj r k n 0 k n jj 1 9 0. Nowlet k n beasubsequenceof n andconsidertwosub-cases. Case1.1:Suppose jj r k n 0 n jj 1 0. ThenclearlytherstpartofConditon 2–7 issatisedtrivially.Notethatforanyfurthersubsequenc e m n of k n forwhich jj r m n 0 m n jj 22 0, thesecondpartofCondition 2–7 issatisedtriviallyaswell,sowemay insteadassume liminf n !1 jj r k n 0 k n jj 22 > 0. Thenforallsufcientlylarge n a.s. ( P 0 ), ^ g EB k n log p k n (^ g EB k n +1) 2 k n jj r k n 0 k n jj 22 log k n k n S k n + b k n p k n + a 2 0 k n T k n p k n jj r k n 0 k n jj 22 0 k n log k n k n S k n + b k n p k n + a 2 0 k n T k n p k n max k n (2–8) 0 a.s. ( P 0 ) byLemmas 2.1 2.2 ,and 2.9 .Thus,bothpartsofConditions 2–7 holdalongthe subsequence k n Case1.2:NotethatCase1.1canbeappliedtoanyfurthersubs equence m n of k n forwhich jj r m n 0 m n jj 1 0, sowemaysupposeforCase1.2that liminf n !1 jj r k n 0 k n jj 1 > 0. Notealsothatinthiscase,therecannotexistanyfurthersu bsequence m n of k n forwhich jj r m n 0 m n jj 22 convergestoanonzeroconstant,sincethiswould contradicttheoriginalsuppositionofCase1.Thensince liminf n !1 jj r k n 0 k n jj 22 liminf n !1 jj r k n 0 k n jj 21 > 0, itfollowsthat jj r k n 0 k n jj 22 !1 Thenforallsufciently large n a.s. ( P 0 ), 1 ^ g EB k n +1 jj r k n 0 k n jj 1 = S k n + b k n p k n + a 2 0 k n T k n p k n jj r k n 0 k n jj 1 0 k n S k n + b k n p k n + a 2 0 k n T k n p k n max k n jj r k n 0 k n jj 2 (2–9) 0 a.s. ( P 0 ) 30

PAGE 31

byLemmas 2.1 2.2 ,and 2.9 ,whileInequality 2–8 anditsassociatedconvergence resultholdbythesamelemmas.Thus,bothpartsofCondition 2–7 holdalongthe subsequence k n SinceCases1.1and1.2togetherestablishthatbothpartsof Condition 2–7 holdalonganysubsequence k n ,theyholdforthewholesequence, andthereforeposteriorconsistencyoccurs. Case2:Nowsupposethereexistasubsequence k n of n andaconstant A > 0 suchthat jj r k n 0 k n jj 22 A > 0 and jj r k n 0 k n jj 1 9 0, andsuppose =0. NotethatCase1.1canbeappliedtoanyfurthersubsequence m n of k n forwhich jj r m n 0 m n jj 1 0, sowemaysupposeforCase2that liminf n !1 jj r k n 0 k n jj 1 > 0. ThenInequalities 2–8 and 2–9 andtheirassociatedconvergenceresultsstillhold byLemmas 2.1 2.2 ,and 2.9 since p k n = k n 0 and liminf n !1 jj r k n 0 k n jj 2 liminf n !1 jj r k n 0 k n jj 1 > 0. Hence,bothpartsofCondition 2–7 holdforevery subsequence,andconsequentlyfortheoverallsequence.Th ereforeposterior consistencyoccurs. Case3:Nowsupposethereexistasubsequence k n of n andaconstant A > 0 such that jj r k n 0 k n jj 22 A > 0 and jj r k n 0 k n jj 1 9 0, butsuppose > 0. AsinCase2, wemaysupposeforCase3that liminf n !1 jj r k n 0 k n jj 1 > 0. Thenforallsufciently large n a.s. ( P 0 ), jj r k n 0 k n jj 1 ^ g EB k n +1 = S k n + b k n p k n + a 2 0 k n T k n p k n jj r k n 0 k n jj 1 0 k n fromwhichitfollowsthat jj r k n 0 k n jj 1 ^ g EB k n +1 S k n + b k n p k n + a 2 0 k n T k n p k n min k n jj r k n 0 k n jj 22 liminf n !1 jj r k n 0 k n jj 1 2 0 min A liminf n !1 jj r k n 0 k n jj 1 > 0 a.s. ( P 0 ) byLemmas 2.1 2.2 ,and 2.9 .TherstpartofCondition 2–7 failsforthesubsequence k n andhencefortheoverallsequence.Thereforeposteriorcon sistencydoesnotoccur. 31

PAGE 32

AsalientconsequenceofTheorem 2.10 isthatif p n = o ( n ), thentheempirical Bayesmodelexhibitsposteriorconsistencyforallvalueso f r n and 0 n However,if p n = O ( n ) exactly,thenthesituationisnotassimple.Forexample,re turntothespecial casewhere p n = O ( n ) exactly,butonlyaxednumber p ? > 0 ofcomponentsof r n 0 n arenonzeroandthese p ? componentsremainxedas n grows.Thenclearlyboth jj r n 0 n jj 1 and jj r n 0 n jj 22 convergetoconstants,sotheconditionofTheorem 2.10 fails,andtheposteriorisinconsistent. Thisbehaviorisperhapssomewhatsurprising.Ifthepriorm ean r n isimaginedas aguessforthetrue 0 n thenonemightspeculatethatposteriorinconsistencywoul d onlyoccurwhentheguessisquitebad,i.e.,when jj r n 0 n jj 22 or jj r n 0 n jj 1 grows tooquickly.However,intheempiricalBayesiansetting,Th eorem 2.10 showsthatthis isnotthecase.Intuitively,thereasonisthatifweallowth edatatodeterminethevalue of g thenapriormean r n thatistoocloseto 0 n (inthesenseofthe ` 2 norm)may causethedatatochoose g valuesthattendtoaniteconstant,whichleadstoposterio r inconsistency. 2.3Hyperg -PriorModel Analternativeapproachtothespecicationof g isahierarchicalmodelinwhich g isconsideredahyperparameterandisgivenahyperprior n ( g ). Inthiscase,Prior 2–1 on n istakentobeconditionalonboth 2 and g Denition2.4. A hierarchical g -priormodel isa g -priormodelwithahyperprior ( g ) orasequenceofhyperpriors n ( g ) on g Inahierarchical g -priormodel,thejointposterior n ( n 2 g j ^ n S n ) isgivenby n ( n 2 g j ^ n S n ) / exp 1 2 h n ~ B n ( g ) i T g g +1 2 X Tn X n 1 1 h n ~ B n ( g ) i # 2 ( n + p n + a ) = 2 exp 1 2 2 S n + b + T n g +1 g p n = 2 n ( g ), 32

PAGE 33

where ~ B n ( g )= E ( n j g 2 ^ n S n )= g ( g +1) 1 ^ n +( g +1) 1 r n Integratingout n and subsequently 2 yieldsthemarginalposteriors n ( 2 g j ^ n S n ) / 2 ( n + a ) = 2 exp 1 2 2 S n + b + T n g +1 ( g +1) p n = 2 n ( g ), (2–10) n ( g j ^ n S n ) / ( g +1) p n = 2 S n + b + T n g +1 ( n + a 2) = 2 n ( g ). (2–11) Thefollowingtechnicallemmaestablishesarelationshipb etweenposteriorconsistency inthehierarchical g -priormodelandtheconvergenceofaparticularsequenceof posteriorprobabilities.Notethatthelemmamakesnoassum ptionsontheparticular formofthehyperprior n ( g ). Lemma2.11. Inahierarchical g -priormodel,suppose n 3 T 2 n E M [ g 2 ( g +1) 4 j ^ n S n ] 0 a.s. ( P 0 ) .Thenposteriorconsistencyoccursifandonlyif P M [( g +1) 1 jj r n 0 n jj 1 > j ^ n S n ] 0 a.s. ( P 0 ) forevery > 0. Proof. Assumethat n 3 T 2 n E M [ g 2 ( g +1) 4 j ^ n S n ] 0 a.s. ( P 0 ). ByLemma 1.5 ,it sufcestoconsiderwhether P M ( jj n ^ n jj 1 > j ^ n S n ) 0 a.s. ( P 0 ) forevery > 0. Byiteratedexpectationandthetriangleinequality, E M h P M ~ B n ( g ) ^ n 1 > 2 g 2 ^ n S n ^ n S n i E M h P M n ~ B n ( g ) 1 > g 2 ^ n S n ^ n S n i E M h P M n ^ n 1 > g 2 ^ n S n ^ n S n i (2–12) E M h P M ~ B n ( g ) ^ n 1 >= 2 g 2 ^ n S n ^ n S n i + E M h P M n ~ B n ( g ) 1 >= 2 g 2 ^ n S n ^ n S n i Consider P M ( jj n ~ B n ( g ) jj 1 > j g 2 ^ n S n ) forsomearbitrary > 0 and g 0. Under P M n ~ B n ( g ) g 2 ^ n S n N p n 0 g g +1 2 X Tn X n 1 33

PAGE 34

Let v n ,11 ,..., v n p n p n denotethediagonalelementsof n ( X Tn X n ) 1 andwrite P M n ~ B n ( g ) 1 > g 2 ^ n S n p n X i =1 P M n i ~ B n i ( g ) > g 2 ^ n S n p n X i =1 P M h n i ~ B n i ( g ) i 4 > 4 g 2 ^ n S n p n X i =1 3 g 2 4 v n ii ( g +1) 2 n 2 4 3 max g 2 4 ( g +1) 2 n 4 Then E M h P M n ~ B n ( g ) 1 > g 2 ^ n S n ^ n S n i 3 max n 4 E M g 2 4 ( g +1) 2 ^ n S n = 3 max n 4 E M g 2 ( g +1) 2 E M 4 g ^ n S n ^ n S n ObservefromtheformoftheposteriorinProportionality 2–10 thatunder P M 2 j g ^ n S n InverseGamma n + a 2 2 S n + b +( g +1) 1 T n 2 Therefore, E M h P M n ~ B n ( g ) 1 > g 2 ^ n S n ^ n S n i = 3 max n 4 E M g 2 S n + b +( g n +1) 1 T n 2 ( g +1) 2 ( n + a 4)( n + a 6) ^ n S n # 6 max n 4 S n + b n + a 6 2 + 6 max n 4 T 2 n ( n + a 6) 2 E M g 2 ( g +1) 4 ^ n S n 0 a.s. ( P 0 ) byLemma 2.1 andtheinitialassumption.ThenthisresultandInequality 2–12 implythat posteriorconsistencyoccursifandonlyif E M h P M ~ B n ( g ) ^ n 1 > g 2 ^ n S n ^ n S n i 0 a.s. ( P 0 ) (2–13) 34

PAGE 35

forevery > 0. Since ~ B n ( g ) ^ n =( g +1) 1 ( r n ^ n ), posteriorconsistencyoccursif andonlyif P M [( g +1) 1 jj r n ^ n jj 1 > j ^ n S n ) 0 a.s. ( P 0 ) forevery > 0. Butagain bythetriangleinequality, P M 1 g +1 jj r n 0 n jj 1 > 2 ^ n S n P M 1 g +1 0 n ^ n 1 > ^ n S n P M 1 g +1 r n ^ n 1 > ^ n S n (2–14) P M 1 g +1 jj r n 0 n jj 1 >= 2 ^ n S n + P M 1 g +1 0 n ^ n 1 >= 2 ^ n S n Foranyarbitrary > 0, P M 1 g +1 0 n ^ n 1 > ^ n S n P M 0 n ^ n 1 > ^ n S n = I 0 n ^ n 1 > 0 a.s. ( P 0 ) byCorollary 1.4 .Thenthisresult,Inequality 2–14 ,andCondition 2–13 togetherimply thatposteriorconsistencyoccursifandonlyif P M [( g +1) 1 jj r n 0 n jj 1 > j ^ n S n ] 0 a.s. ( P 0 ) forevery > 0. TheformofPosterior 2–11 suggeststhataconvenientchoiceofhyperprioris n ( g ) / ( g +1) c = 2 (2–15) forsomeconstant c ,whichcreatesapriorstructurecalledthehyperg -prior[ 18 ]. Denition2.5. The hyperg -priormodel isthehierarchical g -priormodelwiththe hyperpriorspeciedasPrior 2–15 Notethatthishyperpriorisproperfor c > 2 ,andthereexistsanargument[ 18 ]for taking 2 < c 4 ,butweinsteadpermit c totakeanyrealvalueinthepresentanalysis. 35

PAGE 36

Thehyperg -prioryieldstheposterior n ( g j ^ n S n ) / ( g +1) ( p n + c ) = 2 S n + b + T n g +1 ( n + a 2) = 2 / ( g +1) ( n p n + a c 2) = 2 ( g +1)( S n + b )+ T n ( n + a 2) = 2 (2–16) Itwillalsobeusefultodenethetransformation u = ( g +1)( S n + b ) ( g +1)( S n + b )+ T n W n = S n + b S n + b + T n (2–17) sothat g 0 ifandonlyif u W n ThenextlemmaassertsthatLemma 2.11 applies withthischoiceofhyperprior. Lemma2.12. Inthehyperg -priormodel, n 3 T 2 n E M [ g 2 ( g +1) 4 j ^ n S n ] 0 a.s. ( P 0 ). Proof. FromtheformofPosterior 2–16 andTransformation 2–17 T 2 n n 3 E M g 2 ( g +1) 4 ^ n S n T 2 n n 3 E M 1 ( g +1) 2 ^ n S n = T 2 n Z 1 0 ( g +1) ( n p n + a c 6) = 2 [ ( g +1)( S n + b )+ T n ] ( n + a 2) = 2 dg n 3 Z 1 0 ( g +1) ( n p n + a c 2) = 2 [ ( g +1)( S n + b )+ T n ] ( n + a 2) = 2 dg = ( S n + b ) 2 Z 1 W n u ( n p n + a c 6) = 2 (1 u ) ( p n + c ) = 2 du n 3 Z 1 W n u ( n p n + a c 2) = 2 (1 u ) ( p n + c 4) = 2 du Nowlet H n Beta (( n p n + a c 4) = 2,( p n + c 2) = 2) and e H n Beta (( n p n + a c ) = 2,( p n + c 2) = 2) withbothindependentof ^ n and S n under P M andobserve that H n isstochasticallysmallerthan e H n under P M Alsolet ( ) denotetheusualgamma 36

PAGE 37

function.Continuing,wehavethat T 2 n n 3 E M g 2 ( g +1) 4 ^ n S n = ( S n + b ) 2 n p n + a c 4 2 p n + c +2 2 P M H n > W n j ^ n S n n 3 n p n + a c 2 p n + c 2 2 P M e H n > W n j ^ n S n 1 n S n + b n 2 ( p n + c )( p n + c 2) ( n p n + a c 2)( n p n + a c 4) 0 a.s. ( P 0 ) byLemma 2.1 ToexaminethebehavioroftheposteriorprobabilitiesinLe mma 2.11 underthe hyperg -prior,webeginbyusingPosterior 2–16 towrite P M 1 g +1 jj r n 0 n jj 1 > ^ n S n = P M g < 1 jj r n 0 n jj 1 1 ^ n S n = Z q n ( ) 0 ( g +1) ( n p n + a c 2) = 2 ( g +1)( S n + b )+ T n ( n + a 2) = 2 dg Z 1 0 ( g +1) ( n p n + a c 2) = 2 ( g +1)( S n + b )+ T n ( n + a 2) = 2 dg wherewedene q n ( )=max f 0, 1 jj r n 0 n jj 1 1 g Nowdene e L n ( )= 1 jj r n 0 n jj 1 ( S n + b ) 1 jj r n 0 n jj 1 ( S n + b )+ T n L n ( )=max n W n eL n ( ) o (2–18) andapplyTransformation 2–17 toobtain P M 1 g +1 jj r n 0 n jj 1 > ^ n S n = Z L n ( ) W n u ( n p n + a c 2) = 2 (1 u ) ( p n + c 4) = 2 du Z 1 W n u ( n p n + a c 2) = 2 (1 u ) ( p n + c 4) = 2 du = P M [ W n < U n < L n ( ) j ^ n S n ] P M ( U n > W n j ^ n S n ) (2–19) 37

PAGE 38

where U n Beta n p n + a c 2 p n + c 2 2 (2–20) independentof ^ n and S n ,under P M .Wenowintroduceseveraltechnicalresults regardingthesequantitiesthatwillbeusefulinprovingth emaintheorem. Lemma2.13. If liminf n !1 jj r n 0 n jj 22 forsome > 0 ,then limsup n !1 W n (1 ) max 2 0 = ( + max 2 0 ) < 1 a.s. ( P 0 ). Proof. Assume liminf n !1 jj r n 0 n jj 22 forsome > 0 .Then limsup n !1 W n =limsup n !1 1+ T n S n + b 1 1+liminf n !1 p n 2 0 + n 1 0 n jj r n 0 n jj 22 n p n # liminf n !1 ( n p n ) T n ( S n + b ) 0 n 1 1+ 1 + max (1 ) 2 0 1 = (1 ) max 2 0 + max 2 0 < 1 a.s. ( P 0 ) byLemmas 2.1 and 2.2 Lemma2.14. If jj r n 0 n jj 22 !1 then(i) W n 0 a.s. ( P 0 ), andalso(ii) L n ( ) 0 a.s. ( P 0 ) forevery > 0. Proof. Assume jj r n 0 n jj 22 !1 andlet > 0. Then W n = 1+ T n S n + b 1 = 1+ p n 2 0 + n 1 0 n jj r n 0 n jj 22 n p n ( n p n ) T n ( S n + b ) 0 n 1 0 a.s. ( P 0 ) sincetheterminsquarebracketsconvergesto 1 = 2 0 a.s. ( P 0 ) byLemmas 2.1 and 2.2 Thisestablishes(i). Nowobservethat e L n ( ) maybewrittenas e L n ( )= 0@ 1+ p n 2 0 + n 1 0 n jj r n 0 n jj 22 ( n p n ) jj r n 0 n jj 1 ( n p n ) T n ( S n + b ) 0 n 1A 1 1+ jj r n 0 n jj 2 max ( n p n ) T n ( S n + b ) 0 n 1 0 a.s. ( P 0 ) 38

PAGE 39

sincetheterminsquarebracketsconvergesto 1 = 2 0 a.s. ( P 0 ) byLemmas 2.1 and 2.2 .It followsimmediatelythat L n ( )=max f W n e L n ( ) g! 0 a.s. ( P 0 ), establishing(ii). Lemma2.15. If jj r n 0 n jj 22 A ,where 0 < A < 1 ,and liminf n !1 jj r n 0 n jj 1 > 0, then(i)forevery > 0 ,thereexists L ? ( ) < 1 suchthat limsup n !1 L n ( ) L ? ( ) a.s. ( P 0 ) andalso(ii)forevery < 1, thereexists > 0 suchthat liminf n !1 L n ( ) > a.s. ( P 0 ). Proof. Assumethat jj r n 0 n jj 22 A > 0 and liminf n !1 jj r n 0 n jj 1 > 0. Let > 0 Then limsup n !1 W n (1 ) max 2 0 = ( A + max 2 0 ) < 1 a.s. ( P 0 ) byLemma 2.13 ,and limsup n !1 e L n ( )=limsup n !1 1+ T n jj r n 0 n jj 1 ( S n + b ) 1 =limsup n !1 0@ 1+ p n 2 0 + n 1 0 n jj r n 0 n jj 22 ( n p n ) jj r n 0 n jj 1 ( n p n ) T n ( S n + b ) 0 n 1A 1 limsup n !1 1+ jj r n 0 n jj 2 max ( n p n ) T n ( S n + b ) 0 n 1 = 1+ p A max 2 0 1 = max 2 0 A 1 = 2 + max 2 0 < 1 a.s. ( P 0 ) sincetheterminsquarebracketsconvergesto 1 = 2 0 a.s. ( P 0 ) byLemmas 2.1 and 2.2 Dene L ? ( )=max (1 ) max 2 0 A + max 2 0 max 2 0 A 1 = 2 + max 2 0 < 1, andobservethat limsup n !1 L n ( ) L ? ( ) a.s. ( P 0 ) .Thisestablishes(i). Nowdene e A =liminf n !1 jj r n 0 n jj 1 > 0, andnotethat liminf n !1 L n ( ) liminf n !1 1+ T n jj r n 0 n jj 1 ( S n + b ) 1 0@ 1+limsup n !1 24 p n 2 0 + n 1 0 n jj r n 0 n jj 22 ( n p n ) jj r n 0 n jj 1 35 limsup n !1 ( n p n ) T n ( S n + b ) 0 n 1A 1 39

PAGE 40

whichimpliesthat liminf n !1 L n ( ) 1+ 2 0 + A = min (1 ) e A 2 0 1 a.s. ( P 0 ) byLemmas 2.1 and 2.2 .Thenitcanbeseenthatforany < 1, thereexists > 0 such that liminf n !1 L n ( ) > a.s. ( P 0 ), establishing(ii). Toproveourmainresult,wewillalsoneedthefollowinglemm a,whichprovidesa simpleresultaboutbetarandomvariables. Lemma2.16. Let Z n Beta ( a n b n ) for n 1 ,where a n = n 1 and b n = n ,with 0 < 1 .Then P (1 Z n 1 + ) 1 forevery > 0 Proof. Let > 0 .Notethat E ( Z n )= a n = ( a n + b n ) 1 ,so j a n = ( a n + b n ) (1 ) j = 2 forallsufcientlylarge n .AlsonotethatVar ( Z n )= a n b n = [( a n + b n ) 2 ( a n + b n +1)] 1 = a n < 2 = [ n (1 )] forallsufcientlylarge n .Thenforallsufcientlylarge n P ( 1 Z n 1 + ) = P 1 a n a n + b n Z n a n a n + b n 1 a n a n + b n + P 2 Z n a n a n + b n 2 1 4 2 Var ( Z n ) 1 8 n (1 ) 2 1, wherethesecondofthethreeinequalitiesisChebyshev'sin equality. Wemaynowstateandprovethemainresult,anecessaryandsuf cientcondition forposteriorconsistencyinthehyperg -priormodel.Interestingly,thisconditionis identicaltotheonegiveninTheorem 2.10 fortheempiricalBayesianmodel. Theorem2.17. Inthehyperg -priormodel,posteriorconsistencyoccursifandonlyif either =0 ortheredoesnotexistasubsequence k n of n andaconstant A > 0 such that jj r k n 0 k n jj 22 A and jj r k n 0 k n jj 1 9 0. Proof. First,byLemmas 2.11 and 2.12 ,posteriorconsistencyoccursifandonlyif P M [( g +1) 1 jj r n 0 n jj 1 > j ^ n S n ] 0 a.s. ( P 0 ) forevery > 0. Thenby 40

PAGE 41

Equation 2–19 ,posteriorconsistencyoccursifandonlyif P M h W n < U n < L n ( ) j ^ n S n i P M ( U n > W n j ^ n S n ) 0 a.s. ( P 0 ) forevery > 0. (2–21) WenowconsiderthesamethreecasesasintheproofofTheorem 2.10 Case1:Supposetheredonotexistasubsequence k n of n andaconstant A > 0 suchthat jj r k n 0 k n jj 22 A and jj r k n 0 k n jj 1 9 0. Let k n beasubsequenceof n ,and let > 0 .Nowconsidertwosub-cases. Case1.1:Suppose jj r k n 0 k n jj 1 0. Then 1 jj r k n 0 k n jj 1 < 1 forallsufciently large n a.s. ( P 0 ) .Thisimpliesthat L k n ( )= W k n forallsufcientlylarge n a.s. ( P 0 ) ,and therefore P M [ W k n < U k n < L k n ( ) j ^ k n S k n ]=0 forallsufcientlylarge n a.s. ( P 0 ) .Also, P M ( U k n > W k n j ^ k n S k n ) > 0 forall n a.s. ( P 0 ) since W k n < 1 forall n a.s. ( P 0 ) .Thus, P M h W k n < U k n < L k n ( ) j ^ k n S k n i P M ( U k n > W k n j ^ n S n ) 0 a.s. ( P 0 ) bythecombinationofourresultsforitsnumeratoranddenom inator. Case1.2:NotethatCase1.1canbeappliedtoanyfurthersubs equence m n of k n forwhich jj r m n 0 m n jj 1 0, sowemaysupposeinsteadforCase1.2that liminf n !1 jj r k n 0 k n jj 1 > 0. Notealsothatinthiscase,therecannotexistanyfurther subsequence m n of k n forwhich jj r m n 0 m n jj 22 convergestoanonzeroconstant,since thiswouldcontradicttheoriginalsuppositionofCase1.Si nce liminf n !1 jj r k n 0 k n jj 22 liminf n !1 jj r k n 0 k n jj 21 > 0, itfollowsthat jj r k n 0 k n jj 22 !1 ThenLemma 2.14 impliesthatboth W k n 0 a.s. ( P 0 ) and L k n ( ) 0 a.s. ( P 0 ) ,whichinturnimpliesthat both W k n < (1 ) = 2 and L k n ( ) < (1 ) = 2 forallsufcientlylarge n a.s. ( P 0 ) .Thenfor allsufcientlylarge n a.s. ( P 0 ) P M h W k n < U k n < L k n ( ) j ^ k n S k n i P M U k n > W k n j ^ k n S k n P M h U k n < (1 ) = 2 j ^ k n S k n i P M h U k n > (1 ) = 2 j ^ k n S k n i 0 a.s. ( P 0 ) 41


byLemma 2.16 .Finally,sinceCases1.1and1.2togetherestablishthatCo ndition 2–21 holdsalonganysubsequence k n ,itholdsforthewholesequence,andtherefore posteriorconsistencyoccurs. Case2:Nowsupposethereexistasubsequence k n of n andaconstant A > 0 such that jj r k n 0 k n jj 22 A > 0 and jj r k n 0 k n jj 1 9 0, andalsosuppose =0. Note thatCase1canbeappliedtoanysubsequence m n of n forwhicheither jj r m n 0 m n jj 22 doesnotconvergetoanynonzeroconstantor jj r m n 0 m n jj 1 0 ,soitsufces toshowthatCondition 2–21 holdsalongthesubsequence k n .Notealsothatthis meanswemaysupposeforCase2that liminf n !1 jj r k n 0 k n jj 1 > 0 .Nowlet > 0 ByLemma 2.13 limsup n !1 W k n max 2 0 = ( A + max 2 0 ) a.s. ( P 0 ) ,whichimplies that W k n < 2 max 2 0 = ( A +2 max 2 0 ) forallsufcientlylarge n a.s. ( P 0 ) .Moreover,by Lemma 2.15 ,thereexists L ? ( ) < 1 suchthat limsup n !1 L k n ( ) L ? ( ) a.s. ( P 0 ) whichimpliesthat L k n ( ) < [1+ L ? ( )] = 2 forallsufcientlylarge n a.s. ( P 0 ) .Thenforall sufcientlylarge n a.s. ( P 0 ) P M h W k n < U k n < L k n ( ) j ^ k n S k n i P M U k n > W k n j ^ k n S k n P M U k n < 1+ L ? ( ) 2 ^ k n S k n P M U k n > 2 max 2 0 A +2 max 2 0 ^ k n S k n 0 a.s. ( P 0 ) byLemma 2.16 .Thereforeposteriorconsistencyoccurs. Case3:Nowsupposethereexistasubsequence k n of n andaconstant A > 0 suchthat jj r k n 0 k n jj 22 A > 0 and jj r k n 0 k n jj 1 9 0, butsuppose > 0. By Lemma 2.13 limsup n !1 W k n (1 ) max 2 0 = ( A + max 2 0 ) a.s. ( P 0 ) ,whichimpliesthat W k n < 2(1 ) max 2 0 = ( A +2 max 2 0 ) forallsufcientlylarge n a.s. ( P 0 ) .ByLemma 2.15 thereexists 1 = 4 > 0 suchthat liminf n !1 L k n ( 1 = 4 ) 1 = 4 a.s. ( P 0 ) ,whichimplies that L k n ( 1 = 4 ) > 1 = 2 forallsufcientlylarge n a.s. ( P 0 ) .Thenforallsufciently 42


large n a.s. ( P 0 ) P M h W k n < U k n < L k n ( 1 = 4 ) ^ k n S k n i P M U k n > W k n j ^ k n S k n P M h W k n < U k n < L k n ( 1 = 4 ) ^ k n S k n i P M 2(1 ) max 2 0 A +2 max 2 0 < U k n < 1 2 ^ k n S k n 1 a.s. ( P 0 ) byLemma 2.16 .SinceCondition 2–21 failstoholdforthesubsequence k n ,itfailsto holdfortheoverallsequence.Thereforeposteriorconsist encydoesnotoccur. ItshouldnotbeentirelysurprisingthattheempiricalBaye sian g -priorand hyperg -priormodelssharethesamenecessaryandsufcientcondit ionforposterior consistency.Indeed,thechoice c =0 yieldstheUniform (0, 1 ) hyperprioron g and inthiscasethemarginalposteriorandlikelihoodof g coincide.Moregenerally,we shouldexpectanadequatelywell-behavedhierarchicalmod eltoexhibitbroadlysimilar behaviortotheempiricalBayesianmodel,sincebothmodels essentiallypermitthedata todeterminethevalueof g 2.4Zellner-Siow g -PriorModel Anotherpopularchoiceinthehierarchical g -priormodelistotakeasequenceof hyperpriors n ( g ) givenby g InverseGamma (1 = 2, n = 2) ,i.e., n ( g )= r n 2 g 3 = 2 exp n 2 g (2–22) whichwasoriginallyproposedbyZellnerandSiow[ 25 ]. Denition2.6. The Zellner-Siow g -priormodel isthehierarchical g -priormodelwith thesequenceofhyperpriorsspeciedaccordingtoPrior 2–22 Themotivationbehindthischoiceisclearestwhen X Tn X n = n I p n inwhichcase itleadstomarginalCauchypriorsforeachcomponentof n Inthissection,wewill 43


provideasufcientconditionforposteriorconsistencywi ththeZellner-Siowhyperprior. Itstillremainsanopenproblemtodetermineiftheconditio nisalsonecessary. Forgeneral X Tn X n theZellner-Siowhyperprioryieldstheposterior n ( g j ^ n S n ) / ( g +1) ( p n ) = 2 S n + b + T n g +1 ( n + a 2) = 2 g 3 = 2 exp n 2 g / ( g +1) ( n p n + a 2) = 2 ( g +1)( S n + b )+ T n ( n + a 2) = 2 g 3 = 2 exp n 2 g (2–23) Webeginwithalemmashowingthattheconditionneededtoapp lyLemma 2.11 is satisedbythismodel. Lemma2.18. IntheZellner-Siow g -priormodel, n 3 T 2 n E M [ g 2 ( g +1) 4 j ^ n S n ] 0 a.s. ( P 0 ). Proof. Considertwocases. Case1:Suppose jj r n 0 n jj 22 0. Then 0 n = p n 2 0 + n 0 n jj r n 0 n jj 22 n 2 0 forall sufcientlylarge n ThisresultandInequality 2–5 implythat ( 4 ) 0 ( T n ) = E 0 ( T n 0 n ) 4 48 4 0 2 0 n +192 6 0 0 n 96 n 2 8 0 forallsufcientlylarge n .Thenthereexists N suchthat 1 X n = N P 0 T n > 2 n 2 0 1 X n = N P 0 j T n 0 n j > n 2 0 1 X n = N 96 n 2 8 0 n 4 8 0 =96 1 X n = N 1 n 2 < 1 bytheapplicationofMarkov'sinequalityto ( T n 0 n ) 4 whichthenimpliesbythe Borel-Cantellilemmathat limsup n !1 ( T n = n ) 2 2 0 a.s. ( P 0 ). Itthereforefollowsthat n 3 T 2 n E M [ g 2 ( g +1) 4 j ^ n S n ] n 3 T 2 n 0 a.s. ( P 0 ). Case2:NoteimmediatelythatCase1canbeappliedtoanysubs equence k n of n forwhich jj r n 0 n jj 22 0, sowemaysupposeforCase2that liminf n !1 jj r n 0 n jj 22 > 0. Then limsup n !1 W n < 1 a.s. ( P 0 ) byLemma 2.13 .Forconvenience,dene n ( u )= I ( W n < u < 1)exp[ nW n (1 u ) = 2( u W n )], andnotethatthisisa 44


nondecreasingfunctionof u ontheinterval (0,1). UsingtheformofPosterior 2–23 and Transformation 2–17 ,wemaywrite T 2 n n 3 E M g 2 ( g +1) 4 ^ n S n = T 2 n Z 1 0 ( g +1) ( n p n + a 10) = 2 ( g +1)( S n + b )+ T n ( n + a 2) = 2 g 1 = 2 exp n 2 g dg n 3 Z 1 0 ( g +1) ( n p n + a 2) = 2 ( g +1)( S n + b )+ T n ( n + a 2) = 2 g 3 = 2 exp n 2 g dg = ( S n + b ) 4 Z 1 0 u ( n p n + a c 10) = 2 (1 u ) ( p n +4) = 2 u W n W n (1 u ) 1 = 2 n ( u ) du n 3 T 2 n Z 1 0 u ( n p n + a c 2) = 2 (1 u ) ( p n 4) = 2 u W n W n (1 u ) 3 = 2 n ( u ) du ( S n + b ) 2 Z 1 0 u ( n p n + a c 9) = 2 (1 u ) ( p n +3) = 2 n ( u ) du n 3 (1 W n ) 2 Z 1 0 u ( n p n + a c 5) = 2 (1 u ) ( p n 1) = 2 n ( u ) du whichimpliesthat T 2 n n 3 E M g 2 ( g +1) 4 ^ n S n ( S n + b ) 2 n p n + a 7 2 p n +5 2 Z 1 0 h n ( u ) n ( u ) du n 3 (1 W n ) 2 n p n + a 3 2 p n +1 2 Z 1 0 e h n ( u ) n ( u ) du ( S n + b ) 2 ( p n +3)( p n +1) n 3 (1 W n ) 2 ( n p n + a 5)( n p n + a 7) 0 a.s. ( P 0 ), where h n and eh n arethedensitiesofBeta (( n p n + a 7) = 2,( p n +5) = 2) and Beta (( n p n + a 3) = 2,( p n +1) = 2) randomvariables,respectively,withrespectto Lebesguemeasure.Notethatthelastinequalityholdsbecau searandomvariablewith density h n isstochasticallysmallerthanarandomvariablewithdensi ty e h n and n is nondecreasingon (0,1), whilethealmostsureconvergencetozeroisbyLemma 2.1 andthefactthat limsup n !1 W n < 1 1 a.s. ( P 0 ) byLemma 2.13 45


NowconsidertheformoftheposteriorprobabilitiesinLemm a 2.11 underthis hyperprior.ByonceagainmakingTransformation 2–17 ,wemaywrite P M 1 g +1 jj r n 0 n jj 1 > ^ n S n = Z L n ( ) W n u ( n p n + a 2) = 2 (1 u ) ( p n 4) = 2 f [ u W n ] = [ W n (1 u ) ] g 3 = 2 exp nW n (1 u ) 2( u W n ) du Z 1 W n u ( n p n + a 2) = 2 (1 u ) ( p n 4) = 2 f [ u W n ] = [ W n (1 u ) ] g 3 = 2 exp nW n (1 u ) 2( u W n ) du = Z L n ( ) W n f n ( u ) u W n 1 u 3 = 2 exp nW n (1 u ) 2( u W n ) du Z 1 W n f n ( u ) u W n 1 u 3 = 2 exp nW n (1 u ) 2( u W n ) du (2–24) where f n isthedensityofaBeta (( n p n + a ) = 2,( p n 2) = 2) randomvariablewithrespect toLebesguemeasure.Thefollowinglemmaaddressesthelowe rtailprobabilitiesofa sequenceofsuchrandomvariables. Lemma2.19. Let Z n Beta ( a n b n ) for n 1, where a n = n 1 and b n = n with 0 < 1 ,andlet 0 .Then(i) P ( Z n ) 4 n n (1 ) forallsufcientlylarge n if > 0, and(ii) P ( Z n ) n = 2 forallsufcientlylarge n if =0. Proof. Noteimmediatelythatboth(i)and(ii)aretrivialif =0 or 1 ,soassume 0 << 1 .Next,byStirling'sapproximation,wemayboundthenormal izingconstantby log ( a n + b n ) ( a n )( b n ) log ( a n + b n ) a n + b n 1 = 2 a a n 1 = 2 n b b n 1 = 2 n forallsufcientlylarge n .Wemayrewritethisas log ( a n + b n ) ( a n )( b n ) a n log a n + b n a n + b n log a n + b n b n + 1 2 log a n b n a n + b n (2–25) 46


forallsufcientlylarge n .Then P ( Z n )= ( a n + b n ) ( a n )( b n ) Z 0 z a n 1 (1 z ) b n 1 dz ( a n + b n ) ( a n )( b n ) Z 0 z a n 1 dz = ( a n + b n ) a n ( a n )( b n ) a n whichcanbecombinedwithInequality 2–25 toyieldthatforallsufcientlylarge n 1 n log P ( Z n ) a n n log 1 n log a n + a n n log a n + b n a n + b n n log a n + b n b n + 1 2 n log a n b n a n + b n 8>><>>: (1 )log (1 )log(1 ) log if > 0, log if =0. Nowobservethatif > 0 ,then (1 )log(1 )+ log log2 ,anditfollows that limsup n !1 n 1 log P ( Z n ) (1 )log +log2 .Then n 1 log P ( Z n ) (1 )log +log4 forallsufcientlylarge n whichimplies(i).Ifinstead =0 ,then limsup n !1 n 1 log P ( Z n ) log so n 1 log P ( Z n ) 1 2 log forallsufciently large n (notingthat log < 0 ).Thisimplies(ii). NotethattheboundprovidedbyLemma 2.19 inthecasewhere 0 << 1 isonly usefulif 1 < 1 = 4. Nowlet Q n ( ) and R n denotethenumeratoranddenominator, respectively,oftheexpressiononthelastlineofEquation 2–24 .Thefollowinglemmas establishsomeresultsregardingthesequantitiesthatwil leffectivelyprovidetheproofof themaintheorem. Lemma2.20. If liminf n !1 jj r n 0 n jj 22 > 0, thenthereexistsaconstant K suchthat R n exp( nK ) forallsufcientlylarge n a.s. ( P 0 ). Proof. Let =liminf n !1 jj r n 0 n jj 22 ,andassume > 0 .ThenbyLemma 2.13 limsup n !1 W n (1 ) max 2 0 = ( + max 2 0 ) a.s. ( P 0 ) ,whichimpliesthat W n < 2(1 ) max 2 0 = ( +2 max 2 0 ) < 1 forallsufcientlylarge n a.s. ( P 0 ) .Thenforall 47


sufcientlylarge n a.s. ( P 0 ) R n Z 1 (1 + W n ) = 2 f n ( u ) u W n (1 u ) 3 = 2 exp nW n (1 u ) 2( u W n ) du exp nW n 1 W n Z 1 (1 + W n ) = 2 f n ( u ) u (1 u ) 3 = 2 du = n p n + a 3 2 p n +1 2 n p n + a 2 p n 2 2 exp nW n 1 W n P M 1 + W n 2 < e U n < 1 ^ n S n n p n + a 3 2 p n +1 2 n p n + a 2 p n 2 2 exp nW n 1 W n P M +4 max 2 0 2 +4 max 2 0 ( 1 ) < e U n < 1 (2–26) where e U n Beta (( n p n + a 3) = 2,( p n +1) = 2), independentof ^ n and S n ,under P M Forallsufcientlylarge n Stirling'sapproximationyieldsthat n p n + a 3 2 p n +1 2 n p n + a 2 p n 2 2 n p n + a 3 2 ( n p n + a 4) = 2 exp n p n + a 3 2 p n +1 2 p n = 2 exp p n +1 2 2 n p n + a 2 ( n p n + a 1) = 2 exp n p n + a 2 p n 2 2 ( p n 3) = 2 exp p n 2 2 = 1 2 n p n + a 3 n p n + a ( n p n + a 4) = 2 p n +1 p n 2 ( p n 3) = 2 p n +1 n p n + a 3 = 2 Thenforallsufcientlylarge n n p n + a 3 2 p n +1 2 n p n + a 2 p n 2 2 1 4 p n +1 n p n + a 3 = 2 2(4 n ) 3 = 2 (2–27) 48


Nowobservethat P M +4 max 2 0 2 +4 max 2 0 ( 1 ) < e U n < 1 1 byLemma 2.16 ,whichimpliesthat P M +4 max 2 0 2 +4 max 2 0 ( 1 ) < e U n < 1 > 1 2 (2–28) forallsufcientlylarge n .ThenbycombiningInequalities 2–26 2–27 ,and 2–28 ,we havethatforallsufcientlylarge n a.s. ( P 0 ) R n (4 n ) 3 = 2 exp nW n 1 W n =exp n W n 1 W n + 3 2 n log(4 n ) Finally,take K =2limsup n !1 [ W n = (1 W n )], andobservethat K < 1 a.s. ( P 0 ) because limsup n !1 W n (1 ) max 2 0 = ( + max 2 0 ) < 1 a.s. ( P 0 ). Then R n exp( nK ) forallsufcientlylarge n a.s. ( P 0 ) Lemma2.21. If jj r n 0 n jj 22 !1 thenforevery > 0 ,thereexistsasequenceof constants n ( ) !1 suchthat Q n ( ) exp [ n n ( ) ] forallsufcientlylarge n a.s. ( P 0 ). Proof. Let > 0, andassume jj r n 0 n jj 22 !1 ThenbyLemma 2.14 W n 0 a.s. ( P 0 ) and L n ( ) 0 a.s. ( P 0 ). Next,observethatthelasttwotermsoftheintegrandin Q n ( ) compriseanunnormalizedInverseGamma (1 = 2, nW n = 2) densityin ( u W n ) = (1 u ), the modeofwhichoccursat nW n = 3. Thenforallsufcientlylarge n Q n ( ) Z L n ( ) W n f n ( u ) nW n 3 3 = 2 exp 3 2 du 2 ( nW n ) 3 = 2 Z L n ( ) 0 f n ( u ) du 2 2 n +1 ( nW n ) 3 = 2 [ L n ( ) ] n (1 ) byLemma 2.19 .Nownotethatif 1 jj r n 0 n jj 1 1, then L n ( )= W n inwhichcase Q n ( )=0 andtheresultistrivial.Soinsteadassumethat 1 jj r n 0 n jj 1 > 1, whichin 49


turnimpliesthat L n ( ) 1 jj r n 0 n jj 1 W n Then Q n ( ) 2 2 n +1 n ( S n + b ) S n + b + T n 3 = 2 1 jj r n 0 n jj 1 ( S n + b ) 1 jj r n 0 n jj 1 ( S n + b )+ T n n (1 ) 2 2 n +1 n 3 = 2 0@ 1+ p n 2 0 + n 1 0 n jj r n 0 n jj 22 n p n ( n p n ) T n ( S n + b ) 0 n 1A 3 = 2 0@ 1+ p n 2 0 + n 1 0 n jj r n 0 n jj 22 ( n p n ) jj r n 0 n jj 1 ( n p n ) T n ( S n + b ) 0 n 1A n (1 ) 2 2 n +1 n 3 = 2 4 jj r n 0 n jj 22 (1 ) 0 n 2 0 3 = 2 jj r n 0 n jj 2 2(1 ) 0 n 2 0 n (1 ) forallsufcientlylarge n a.s. ( P 0 ) byLemmas 2.1 and 2.2 sincethequantityinsquare bracketsconvergesto 1 = 2 0 a.s. ( P 0 ). Thenwemaysimplifyandwritethatforall sufcientlylarge n Q n ( ) 2 n (3 )+4 ( n ) 3 = 2 1 (1 ) max 2 0 n (1 ) 3 = 2 jj r n 0 n jj n (1 )+3 2 Nowconcludebywritingthatforallsufcientlylarge n a.s. ( P 0 ), Q n ( ) exp n 1 3 n log jj r n 0 n jj 2 1 3 2 n log 1 (1 ) max 2 0 + 3 2 n log( n ) 3 + 4 n log2 =exp [ n n ( ) ] where n ( ) !1 isdenedtobethequantityinsquarebrackets. Lemma2.22. If jj r n 0 n jj 22 A > 0,liminf n !1 jj r n 0 n jj 1 > 0, and =0, then Q n ( ) = R n 0 a.s. ( P 0 ) forevery > 0. Proof. Assume jj r n 0 n jj 22 A > 0,liminf n !1 jj r n 0 n jj 1 > 0, and =0. Let > 0 .Notethat R n > 0 forall n a.s. ( P 0 ) since W n < 1 forall n a.s. ( P 0 ) .Then whenever L n ( ) W n ,weimmediatelyhavethat Q n ( ) = R n =0 exactly,sowemay 50


insteadassumethat L n ( ) > W n forall n .ByLemma 2.15 ,thereexists L ? ( ) < 1 such that limsup n !1 L n ( ) L ? ( ) a.s. ( P 0 ) ,whichimpliesthat L n ( ) < [1+ L ? ( )] = 2 forall sufcientlylarge n a.s. ( P 0 ) .Thenwemaywritethatforallsufcientlylarge n a.s. ( P 0 ) R n Z 1 [1+ L n ( )] = 2 f n ( u ) u W n (1 u ) 3 = 2 exp nW n (1 u ) 2( u W n ) du exp nW n [ 1 L n ( ) ] 2 [ 1+ L n ( ) 2 W n ] Z 1 [3+ L ? ( )] = 4 f n ( u ) u (1 u ) 3 = 2 du n p n + a 3 2 p n +1 2 n p n + a 2 p n 2 2 exp nW n [ 1 L n ( ) ] 4 [ L n ( ) W n ] P M 3+ L ? ( ) 4 < e U n < 1 ^ n S n (4 n ) 3 = 2 exp nW n [ 1 L n ( ) ] 4 [ L n ( ) W n ] (2–29) byInequalities 2–27 and 2–28 .Next,write Q n ( ) as Q n ( )= Z L n ( ) W n f n ( u ) u W n (1 u ) 3 = 2 exp W n (1 u ) 2( u W n ) exp ( n 1) W n (1 u ) 2( u W n ) du andobservethatthesecondandthirdtermsoftheintegrandc ompriseanunnormalized InverseGamma (1 = 2, W n = 2) densityin ( u W n ) = (1 u ), whichhasmode W n = 3. Then Q n ( ) W n 3 3 = 2 exp 3 2 exp ( n 1) W n [ 1 L n ( ) ] 2 [ L n ( ) W n ] Z L n ( ) 0 f n ( u ) du ( 2 W n ) 3 = 2 exp ( n 1) W n [ 1 L n ( ) ] 2 [ L n ( ) W n ] [ L n ( ) ] n = 2 byLemma 2.19 .ThenthisresultandInequality 2–29 togetheryieldthatforall sufcientlylarge n a.s. ( P 0 ), Q n ( ) R n W n 2 n 3 = 2 [ L n ( ) ] n = 2 exp nW n [ 1 L n ( ) ] 2 [ L n ( ) W n ] n 1 n 1 2 2 n W n 3 = 2 exp nW n [ 1 L ? ( ) ] 16 (2–30) 51


Nowobservethat liminf n !1 W n =liminf n !1 1+ T n S n + b 1 =liminf n !1 1+ p n 2 0 + n 1 0 n jj r n 0 n jj 22 n p n ( n p n ) T n ( S n + b ) 0 n 1 1+ A = min 2 0 1 = min 2 0 A + min 2 0 a.s. ( P 0 ), whichimpliesthat W n > min 2 0 = (2 A + min 2 0 ) forallsufcientlylarge n a.s. ( P 0 ) .Wemay combinethiswithInequality 2–30 toyieldthatforallsufcientlylarge n a.s. ( P 0 ) Q n ( ) R n 2 n (2 A + min 2 0 ) min 2 0 3 = 2 exp n min 2 0 [ 1 L ? ( ) ] 16(2 A + min 2 0 ) 0 a.s. ( P 0 ) since L ? ( ) < 1 Wemaynowstatethemaintheorem,whichestablishesthesame sufcient conditionforposteriorconsistencyundertheZellner-Sio whyperpriorasfortheconugate hyperpriorandempiricalBayesmodelsoftheprevioussecti ons.However,unlike Theorems 2.10 and 2.17 ,itdoesnotestablishthenecessityofthecondition,which remainsanopenquestion. Theorem2.23. IntheZellner-Siow g -priormodel,posteriorconsistencyoccursifeither =0 ortheredoesnotexistasubsequence k n of n andaconstant A > 0 suchthat jj r k n 0 k n jj 22 A and jj r k n 0 k n jj 1 9 0. Proof. ByLemmas 2.11 and 2.18 ,posteriorconsistencyoccursif P M [( g +1) 1 jj r n 0 n jj 1 > j ^ n S n ] 0 a.s. ( P 0 ) forevery > 0. ThenbyEquation 2–24 ,thisoccursif Q n ( ) = R n 0 a.s. ( P 0 ) forevery > 0. Wenowproceedaccordingtocasessimilarto thoseintheproofsoftheprevioustheorems. Case1:Supposetheredonotexistasubsequence k n of n andaconstant A > 0 suchthat jj r k n 0 k n jj 22 A and jj r k n 0 k n jj 1 9 0. Nowlet k n beasubsequenceof n andlet > 0 .Nowconsidertwosub-cases. 52


Case1.1:Suppose jj r k n 0 n jj 1 0. Then 1 jj r k n 0 k n jj 1 < 1 forallsufciently large n a.s. ( P 0 ) .Thisimpliesthat L k n ( )= W k n and Q k n ( )=0 forallsufciently large n a.s. ( P 0 ) .Also, R k n > 0 forall n a.s. ( P 0 ) since W k n < 1 a.s. ( P 0 ) .Therefore, Q k n ( ) = R k n 0 a.s. ( P 0 ) Case1.2:NotethatCase1.1canbeappliedtoanyfurthersubs equence m n of k n forwhich jj r m n 0 m n jj 1 0, sowemaysupposeinsteadforCase1.2that liminf n !1 jj r k n 0 k n jj 1 > 0. Notealsothatinthiscase,therecannotexistanyfurther subsequence m n of k n forwhich jj r m n 0 m n jj 22 convergestoanonzeroconstant,since thiswouldcontradicttheoriginalsuppositionofCase1.Si nce liminf n !1 jj r k n 0 k n jj 22 liminf n !1 jj r k n 0 k n jj 21 > 0, itfollowsthat jj r k n 0 k n jj 22 !1 Observethat byLemmas 2.20 and 2.21 ,thereexistaconstant K andasequenceofconstants n ( ) !1 suchthat Q k n ( ) = R k n exp f n [ n ( ) K ] g! 0 a.s. ( P 0 ) .Finally, sinceCases1.1and1.2togetherestablishthat Q k n ( ) = R k n 0 a.s. ( P 0 ) forevery subsequence k n ,itfollowsthat Q n ( ) = R n 0 a.s. ( P 0 ) ,andthereforeposterior consistencyoccurs. Case2:Nowsupposethereexistasubsequence k n of n andaconstant A > 0 suchthat jj r k n 0 k n jj 22 A and jj r k n 0 k n jj 1 9 0 ,andalsosuppose =0 .Note thatCase1canbeappliedtoanysubsequence m n of n forwhicheither jj r m n 0 m n jj 22 doesnotconvergetoanynonzeroconstantor jj r m n 0 m n jj 1 0 ,soitsufcesto showthat Q k n ( ) = R k n 0 a.s. ( P 0 ) .Notealsothatthismeanswemaysupposefor Case2that liminf n !1 jj r k n 0 k n jj 1 > 0 .Nowlet > 0 .Thenweimmediatelyhavethat Q k n ( ) = R n 0 a.s. ( P 0 ) byLemma 2.22 .Thereforeposteriorconsistencyoccurs. Sincethesameconditionissufcientforposteriorconsist encyunderboth thehyperg -priorandZellner-Siowhierarchicalmodels,onemightwon derifthis conditionissufcientforposteriorconsistencyundereve ryhierarchicalmodel. However,thefalsehoodofsuchaclaimismadeclearbytheobs ervationthatthe non-hierarchicalmodel,forwhichthesufcientcondition differs,issimplyaspecialcase 53


of the hierarchical model in which the hyperprior $\pi_n$ is specified to be degenerate at $g_n$. In actuality, the posterior consistency or inconsistency of hierarchical models with other hyperpriors on $g$ remains a topic for future consideration.
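As a purely illustrative aside (not part of the theoretical development above), the condition shared by Theorems 2.10 and 2.17 can be examined numerically for candidate sequences of prior means and true coefficient vectors. The short Python sketch below uses entirely hypothetical choices of $p_n$, $r_n$, and $\beta_{0n}$; it simply tracks the two quantities that drive the condition, $\|r_n-\beta_{0n}\|_2^2$ and $\|r_n-\beta_{0n}\|_\infty$, along a grid of sample sizes.

```python
# Purely illustrative aside (not part of the dissertation): finite-n diagnostics for the
# condition shared by Theorems 2.10 and 2.17.  All sequences below are hypothetical.
import numpy as np

for n in [100, 1000, 10000, 100000]:
    p = n // 2  # "large p, large n": the number of regressors is a fixed fraction of n

    # Example 1: three components of r_n - beta_0n stay fixed at 1 while the rest vanish.
    # Then ||.||_2^2 -> 3 (a finite nonzero constant) and ||.||_inf = 1 does not vanish,
    # which is the scenario in which these models are inconsistent (roughly speaking,
    # provided p_n remains a nonvanishing fraction of n).
    d1 = np.zeros(p)
    d1[:3] = 1.0

    # Example 2: every component equals n**(-1/4).  Then ||.||_2^2 diverges, so no
    # subsequence of it can converge to a finite nonzero constant, and consistency holds.
    d2 = np.full(p, n ** -0.25)

    for label, d in [("three fixed offsets", d1), ("vanishing offsets", d2)]:
        sq, sup = float(np.sum(d ** 2)), float(np.max(np.abs(d)))
        print(f"n={n:6d}  {label:19s}  ||.||_2^2 = {sq:10.2f}   ||.||_inf = {sup:.4f}")
```

Such diagnostics do not, of course, establish anything about limits; they are meant only to make the shape of the condition concrete.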


CHAPTER 3
INDEPENDENCE PRIOR MODEL AND EXTENSIONS

The $g$-prior model discussed in Chapter 2 specified a prior structure of the form
\[
\pi(\beta_n, \sigma^2) = \pi(\beta_n \mid \sigma^2)\,\pi(\sigma^2),
\]
where we use the notation $\pi(\cdot)$ here to denote priors generically. In contrast, an alternative approach is to specify $\beta_n$ and $\sigma^2$ to be a priori independent, i.e., a structure of the form
\[
\pi(\beta_n, \sigma^2) = \pi(\beta_n)\,\pi(\sigma^2). \tag{3--1}
\]
We then take
\[
\beta_n \sim N_{p_n}\bigl(r_n,\ \tau^2 I_{p_n}\bigr), \tag{3--2}
\]
where $0 < \tau^2 \le \infty$. Note that we permit the choice $\tau^2 = \infty$ for reasons to be explained at the start of Section 3.1. Then we again take
\[
\sigma^2 \sim \text{InverseGamma}\!\left(\frac{a-2}{2},\ \frac{b}{2}\right), \tag{3--3}
\]
where we again permit $a \ge 2$ and $b \ge 0$ in the same manner as Prior 2--2. We now introduce a definition.

Definition 3.1. An independence-prior model is a Bayesian regression model with priors specified as Priors 3--1, 3--2, and 3--3. Note that $\tau^2$ may be taken as fixed or specified through hierarchical Bayesian techniques.

It should be observed that Definition 3.1 includes two distinct senses of a priori independence: the independence of $\beta_n$ and $\sigma^2$, as well as the independence of the individual components $\beta_{n,i}$ of $\beta_n$.

The outline of the three sections of this chapter is as follows. Section 3.1 establishes posterior consistency under a special case of the independence-prior model, which we call the flat-prior model. Section 3.2 provides necessary and sufficient conditions for posterior consistency under the independence-prior model with finite prior variance $\tau^2$.


Section 3.3 provides necessary and sufficient conditions for posterior consistency under a hierarchical independence-prior model with a uniform hyperprior on the prior variance $\tau^2$. In Sections 3.2 and 3.3, we introduce an additional assumption on the covariate matrix $X_n$, which allows us to obtain in each section a condition for posterior consistency that is both necessary and sufficient.

3.1 Flat-Prior Model

A common choice in Bayesian regression models is to take the prior on $\beta_n$ as the flat prior on $\mathbb{R}^{p_n}$, i.e., $\pi(\beta_n) \propto 1$. However, such a prior may be included as a special case of Prior 3--2 where $\tau^2 = \infty$, under the usual convention that such a prior may be considered as a normal distribution with infinite variance. This leads to the following definition.

Definition 3.2. A flat-prior model is an independence-prior model with $\tau^2 = \infty$.

In the flat-prior model, the posterior $\pi_n(\beta_n, \sigma^2 \mid \hat\beta_n, S_n)$ is given by
\[
\pi_n(\beta_n, \sigma^2 \mid \hat\beta_n, S_n) \propto \exp\!\left[-\frac{n}{2\sigma^2}\,(\beta_n - \hat\beta_n)^T \Sigma_n^{-1} (\beta_n - \hat\beta_n)\right] (\sigma^2)^{-(n+a)/2} \exp\!\left[-\frac{1}{2\sigma^2}(S_n + b)\right],
\]
where we define $\Sigma_n = n\,(X_n^T X_n)^{-1}$. Integrating out $\beta_n$ from this yields the marginal posterior of $\sigma^2$,
\[
\pi_n(\sigma^2 \mid \hat\beta_n, S_n) \propto (\sigma^2)^{-(n-p_n+a)/2} \exp\!\left[-\frac{1}{2\sigma^2}(S_n + b)\right],
\]
i.e., $\sigma^2 \mid \hat\beta_n, S_n \sim \text{InverseGamma}\bigl((n - p_n + a - 2)/2,\ (S_n + b)/2\bigr)$. We now state and prove our main result for the flat-prior model.

Theorem 3.1. In the flat-prior model, posterior consistency occurs for every sequence $\{(\beta_{0n}, \sigma_0^2),\ n \ge 1\}$.


Proof. By Lemma 1.5, we may replace $\beta_{0n}$ with $\hat\beta_n$ in the definition of posterior consistency. Let $\epsilon > 0$, and observe that under $P_M$,
\[
\beta_n - \hat\beta_n \mid \sigma^2, \hat\beta_n, S_n \sim N_{p_n}\bigl(0_{p_n},\ n^{-1}\sigma^2\,\Sigma_n\bigr),
\]
where $0_{p_n}$ denotes the zero vector of length $p_n$. Let $\Sigma_{n,ii}$ denote the $i$th diagonal element of $\Sigma_n$ for $1 \le i \le p_n$, and note that $\sup_n \max_{1 \le i \le p_n} \Sigma_{n,ii} \le \lambda_{\max}$. Then we may use Bonferroni's inequality to write
\begin{align*}
P_M\bigl(\|\beta_n - \hat\beta_n\|_\infty > \epsilon \mid \hat\beta_n, S_n\bigr)
&= E_M\Bigl[P_M\bigl(\|\beta_n - \hat\beta_n\|_\infty > \epsilon \mid \sigma^2, \hat\beta_n, S_n\bigr) \Bigm| \hat\beta_n, S_n\Bigr] \\
&\le E_M\Biggl[\sum_{i=1}^{p_n} P_M\bigl(|\beta_{n,i} - \hat\beta_{n,i}| > \epsilon \mid \sigma^2, \hat\beta_n, S_n\bigr) \Biggm| \hat\beta_n, S_n\Biggr] \\
&= E_M\Biggl[\sum_{i=1}^{p_n} P_M\biggl(|Z| > \sqrt{\frac{n\epsilon^2}{\Sigma_{n,ii}\,\sigma^2}} \Bigm| \sigma^2, \hat\beta_n, S_n\biggr) \Biggm| \hat\beta_n, S_n\Biggr] \\
&\le p_n\, E_M\Biggl[P_M\biggl(|Z| > \sqrt{\frac{n\epsilon^2}{\lambda_{\max}\,\sigma^2}} \Bigm| \sigma^2, \hat\beta_n, S_n\biggr) \Biggm| \hat\beta_n, S_n\Biggr],
\end{align*}
where $Z$ is a random variable such that $Z \mid \sigma^2, \hat\beta_n, S_n \sim N(0,1)$ under $P_M$. Now apply Markov's inequality to $Z^4$ to obtain
\[
P_M\bigl(\|\beta_n - \hat\beta_n\|_\infty > \epsilon \mid \hat\beta_n, S_n\bigr)
\le p_n\, E_M\!\left[\frac{3\lambda_{\max}^2\sigma^4}{n^2\epsilon^4} \Bigm| \hat\beta_n, S_n\right]
= \frac{3\, p_n\, \lambda_{\max}^2}{n^2 \epsilon^4}\, E_M\bigl[(\sigma^2)^2 \mid \hat\beta_n, S_n\bigr].
\]
Then by the properties of the inverse gamma distribution,
\[
P_M\bigl(\|\beta_n - \hat\beta_n\|_\infty > \epsilon \mid \hat\beta_n, S_n\bigr)
\le \frac{3\, p_n\, \lambda_{\max}^2}{n^2 \epsilon^4} \cdot \frac{(S_n + b)^2}{(n - p_n + a - 4)(n - p_n + a - 6)} \to 0 \text{ a.s. } (P_0)
\]
by Lemma 2.1. Therefore, posterior consistency occurs.

The result of Theorem 3.1 is not entirely trivial. Recall that we may conceptualize posterior consistency as the idea that the center of the posterior distribution converges


to the true value while the spread of the posterior distribution converges to zero. It can be clearly seen that the first part of this concept is always true in the flat-prior model, since the posterior mean is
\[
E_M\bigl(\beta_n \mid \hat\beta_n, S_n\bigr) = E_M\Bigl[E_M\bigl(\beta_n \mid \sigma^2, \hat\beta_n, S_n\bigr) \Bigm| \hat\beta_n, S_n\Bigr] = \hat\beta_n,
\]
and $\|\hat\beta_n - \beta_{0n}\|_\infty \to 0$ a.s. $(P_0)$ by Corollary 1.4. However, it is the second part of this concept which is nontrivial, and Theorem 3.1 is needed to establish that the spread of the posterior does indeed converge to zero in the flat-prior model.

3.2 Non-Hierarchical Independence-Prior Model

It is perhaps reassuring to know that posterior consistency is obtained under the ostensibly uninformative prior of the flat-prior model. However, the result of Theorem 3.1 is also useful as a building block for the following lemma, which applies to all independence-prior models considered in this chapter.

Suppose that we instead wish to consider an independence-prior model with a proper prior, i.e., to take $\tau^2$ to have some finite positive value. We now introduce an additional simplifying assumption.

Assumption 4. For each $n \ge 1$, $\Sigma_n = I_{p_n}$.

Although Assumption 4 is fairly restrictive, we can still obtain slightly modified results even in circumstances where the assumption is not satisfied. Notice that in a general regression problem where $\Sigma_n \ne I_{p_n}$, we may transform for each $n$ according to $X_n^\star = X_n \Sigma_n^{1/2}$, $\beta_n^\star = \Sigma_n^{-1/2}\beta_n$, $\beta_{0n}^\star = \Sigma_n^{-1/2}\beta_{0n}$, and $r_n^\star = \Sigma_n^{-1/2} r_n$. Then the results of this section and Section 3.3 can at least establish posterior consistency or inconsistency of the posterior distribution of the transformed coefficient vector $\beta_n^\star$, though this is not necessarily equivalent to the posterior consistency or inconsistency of the posterior distribution of the original coefficient vector $\beta_n$.

With Assumption 4 in place, we now provide a definition.


Denition3.3. A non-hierarchicalindependence-priormodel isanindependencepriormodelwith 2 takenasxed,where 0 < 2 < 1 ,withtheadditionalAssumption 4 Notethatwerefertosuchamodelasnon-hierarchicalinorde rtodistinguishitfrom hierarchicalmodelsinwhich 2 istreatedasrandomandassignedahyperprior.Dene e B n ( 2 )= E M ( n j 2 ^ n S n )= n 2 + 1 2 1 n 2 ^ n + 1 2 r n andrecallthequantity T n fromEquation 2–3 ,whichmaybewrittenas T n = n jj r n ^ n jj 22 forthenon-hierarchicalindependence-priormodel.Thent hejointposterioris n n 2 ^ n S n / exp 2 + n 2 2 2 h n e B n ( 2 ) i 2 2 ( n + a ) = 2 exp S n + b 2 2 exp T n 2( 2 + n 2 ) andwemayintegrateout n toobtainthemarginalposterior n 2 ^ n S n / 2 ( n p n + a ) = 2 exp S n + b 2 2 2 + n 2 p n = 2 exp T n 2( 2 + n 2 ) / g n ( 2 ) h n ( 2 + n 2 ), (3–4) where g n isthedensitywithrespecttoLebesguemeasureofaninverse gammarandom variablewithshapeparameter ( n p n + a 2) = 2 andscaleparameter ( S n + b ) = 2 andwhere h n isthedensitywithrespecttoLebesguemeasureofaninverse gamma randomvariablewithshapeparameter ( p n 2) = 2 andscaleparameter T n = 2 .We nowprovidealemmathatstatesapreliminarynecessaryands ufcientconditionfor posteriorconsistency. Lemma3.2. Inthenon-hierarchicalindependence-priormodel,poster iorconsistency occursifandonlyif P M ( jj 2 ( 2 + n 2 ) 1 ( r n ^ n ) jj 1 > j ^ n S n ) 0 a.s. ( P 0 ) for every > 0 59


Proof. ByLemma 1.5 ,wemayreplace 0 n with ^ n inthedenitionofposterior consistency.Forevery > 0 ,wemaywrite P M jj n ^ n jj 1 > j ^ n S n = E M h P M jj n ^ n jj 1 > j 2 ^ n S n ^ n S n i Thenbythetriangleinequality, E M h P M jj e B n ( 2 ) ^ n jj 1 > 2 j 2 ^ n S n ^ n S n i E M h P M jj n e B n ( 2 ) jj 1 > j 2 ^ n S n ^ n S n i P M jj n ^ n jj 1 > j ^ n S n (3–5) E M h P M jj e B n ( 2 ) ^ n jj 1 >= 2 j 2 ^ n S n ^ n S n i + E M h P M jj n e B n ( 2 ) jj 1 >= 2 j 2 ^ n S n ^ n S n i Nowdene n ( 2 ,1 = 2 )= P M n e B n ( 2 ) 1 > 2 ^ n S n (3–6) (Herewewishtoexplicitlydenotethedependenceof n ( 2 ,1 = 2 ) on 1 = 2 ,which occursthroughthemodel P M itself.)Nowrewritetheinequalitiesaboveas P M 2 2 + n 2 r n ^ n 1 > 2 ^ n S n E M h n ( 2 ,1 = 2 ) ^ n S n i P M jj n ^ n jj 1 > j ^ n S n (3–7) P M 2 2 + n 2 r n ^ n 1 >= 2 ^ n S n + E M h n ( = 2, 2 ,1 = 2 ) ^ n S n i Observethatunder P M n e B n ( 2 ) j 2 ^ n S n N p n 0 p n 1 2 + 1 2 1 I p n Fromtheformofthevariancematrixabove,itfollowsthat n ( 2 ,1 = 2 ) n ( 2 ,0) forevery > 0 2 > 0 ,and 2 > 0 .Nowconsiderthat n ( 2 ,0) issimplythe conditionalprobabilityinEquation 3–6 ascalculatedundertheat-priormodelof 60


Section 3.1 .However,itcanbeseenthatTheorem 3.1 isequivalenttothestatement that E M h n ( 2 ,0) ^ n S n i 0 a.s. ( P 0 ) forevery > 0. Thereforewealsohavethat E M h n ( 2 ,1 = 2 ) ^ n S n i 0 a.s. ( P 0 ) forevery > 0. ThenthisresultandInequality 3–7 togetherestablishthatposteriorconsistencyoccurs inthenon-hierarchicalindependence-priormodelifandon lyif P M 2 2 + n 2 r n ^ n 1 > ^ n S n 0 a.s. ( P 0 ) forevery > 0. (3–8) Thenwemayonceagainusethetriangleinequalitytowrite P M 2 2 + n 2 jj r n 0 n jj 1 > 2 ^ n S n P M 2 2 + n 2 ^ n 0 n 1 > ^ n S n P M 2 2 + n 2 r n ^ n 1 > ^ n S n (3–9) P M 2 2 + n 2 jj r n 0 n jj 1 > 2 ^ n S n + P M 2 2 + n 2 ^ n 0 n 1 > 2 ^ n S n However,observethatforevery > 0 P M 2 2 + n 2 ^ n 0 n 1 > ^ n S n P M ^ n 0 n 1 > ^ n S n = I ^ n 0 n 1 > 0 a.s. ( P 0 ) byCorollary 1.4 .Thereforeposteriorconsistencyoccursifandonlyif P M 2 2 + n 2 jj r n 0 n jj 1 > ^ n S n 0 a.s. ( P 0 ) forevery > 0, followingfromCondition 3–8 andInequality 3–9 61
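The comparison that drives the proof above, namely that the conditional spread of each component of $\beta_n$ under a proper independence prior is never larger than its flat-prior counterpart, can be seen directly from the conditional posterior variance $(n/\sigma^2 + 1/\tau^2)^{-1}$ obtained under Assumption 4. The tiny Python sketch below (illustrative only, with arbitrary hypothetical values of $n$, $\sigma^2$, and $\tau^2$) simply evaluates this variance against the flat-prior value $\sigma^2/n$.

```python
# Illustrative check (hypothetical values): under Assumption 4, the conditional posterior
# variance of each component of beta_n given sigma^2 is (n/sigma^2 + 1/tau^2)^(-1),
# which can never exceed the flat-prior value sigma^2/n obtained as tau^2 -> infinity.
import numpy as np

n, sigma2 = 200, 4.0
flat_var = sigma2 / n
for tau2 in [0.1, 1.0, 10.0, 1e6]:
    proper_var = 1.0 / (n / sigma2 + 1.0 / tau2)
    print(f"tau^2 = {tau2:>9}: conditional posterior variance = {proper_var:.6f}"
          f"  (flat-prior value = {flat_var:.6f})")
```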


Lemma 3.2 and its method of proof deserve additional comment. Recall that posterior consistency can be conceptualized as the idea that the center of the posterior converges to the true value while the spread converges to zero. For the non-hierarchical $g$-prior model as considered in Section 2.1, Theorem 2.8 provided a pair of conditions that were necessary and sufficient for posterior consistency: a condition that essentially controlled the center of the posterior, and another condition that essentially controlled the spread. Both these conditions were needed for posterior consistency.

However, in the non-hierarchical independence-prior model, we see different behavior. Specifically, the proof of Lemma 3.2 shows that, loosely speaking, the spread of the posterior in the flat-prior model is greater than the spread of the posterior in any other independence-prior model. The usefulness of this relationship lies in the fact that Theorem 3.1 already demonstrated that the spread of the posterior in the flat-prior model does indeed converge to zero. Hence, the same is also true of any other independence-prior model. This means that to prove posterior consistency or inconsistency in any other independence-prior model, it suffices to prove simply that the center of the posterior does or does not converge to the true value, informally speaking. Of course, this is precisely the idea that is stated rigorously by Lemma 3.2.

The conditional probability involved in Lemma 3.2 may be written as
\[
P_M\!\left(\frac{\sigma^2}{\sigma^2 + n\tau^2}\,\|r_n - \beta_{0n}\|_\infty > \epsilon \,\Bigm|\, \hat\beta_n, S_n\right)
= \frac{\displaystyle\int_0^\infty g_n(\sigma^2)\, h_n(\sigma^2 + n\tau^2)\, I\!\left(\frac{\sigma^2}{\sigma^2 + n\tau^2}\,\|r_n - \beta_{0n}\|_\infty > \epsilon\right) d\sigma^2}{\displaystyle\int_0^\infty g_n(\sigma^2)\, h_n(\sigma^2 + n\tau^2)\, d\sigma^2}.
\]
By making the transformation $\omega = \sigma^2/(\sigma^2 + n\tau^2)$, we may write
\[
P_M\!\left(\frac{\sigma^2}{\sigma^2 + n\tau^2}\,\|r_n - \beta_{0n}\|_\infty > \epsilon \,\Bigm|\, \hat\beta_n, S_n\right)
= \frac{\displaystyle\int_{\epsilon\|r_n - \beta_{0n}\|_\infty^{-1}}^{1} g_n^\star(\omega)\, h_n^\star(1 - \omega)\, d\omega}{\displaystyle\int_0^1 g_n^\star(\omega)\, h_n^\star(1 - \omega)\, d\omega}, \tag{3--10}
\]


where g ? n isthedensitywithrespecttoLebesguemeasureofaninverse gammarandom variablewithshapeparameter ( n p n + a 4) = 2 andscaleparameter ( S n + b ) = (2 n 2 ) ,and h ? n isthedensitywithrespecttoLebesguemeasureofagamma( not inversegamma) randomvariablewithshapeparameter ( n + a 2) = 2 andrateparameter T n = (2 n 2 ) Thefollowinglemmas,whichsupplysomebasicresultsrelat edtotheinverse gammadistribution,willbeneededtoproveforthemainresu ltofthissection. Lemma3.3. Let Z InverseGamma ( q r ) and > r = q .Then P ( Z > ) ( r = ) q = ( q +1) Proof. Simplynotethat P ( Z > )= r q ( q ) Z 1 1 z q +1 exp r z dz r q ( q ) Z 1 1 z q +1 dz = ( r = ) q ( q +1) yieldingtheresult. Lemma3.4. Let Z n InverseGamma ( nq n nr n ) for n 1 ,where q n q > 0 and r n r > 0 .Then P ( r = q Z n r = q + ) 1 forevery > 0 Proof. Let > 0 .Forallsufcientlylarge n E ( Z n )= r n = ( q n n 1 ) r = q ,so j r n = ( q n n 1 ) r = q j = 2 forallsufcientlylarge n .Also,forallsufcientlylarge n Var ( Z n )= r 2 n = [ n ( q n n 1 ) 2 ( q n 2 n 1 )] 2 r 2 = ( nq 3 ) .Thenforallsufcientlylarge n P r q Z n r q + = P r q r n q n n 1 Z n r n q n n 1 r q r n q n n 1 + P 2 Z n r n q n n 1 2 1 4 2 Var ( Z n ) 1 8 r 2 nq 3 2 1, wherethesecondofthethreeinequalitiesisChebyshev'sin equality. Corollary3.5. Let Z n Gamma ( nq n nr n ) for n 1 ,where q n q > 0 and r n r > 0 Then P ( q = r Z n q = r + ) 1 forevery > 0 Proof. ApplyLemma 3.4 totherandomvariable Z 1 n InverseGamma ( nq n nr n ) 63
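Before moving to the main theorem, the concentration described by Lemma 3.4 and Corollary 3.5 can be checked numerically. The short sketch below is illustrative only; the sequences $q_n$ and $r_n$ in it are hypothetical choices satisfying the hypotheses of Lemma 3.4 (and the symbol $r$ here is unrelated to the prior mean $r_n$ of the regression model).

```python
# Quick numerical check (illustrative only) of the concentration in Lemma 3.4:
# Z_n ~ InverseGamma(n*q_n, n*r_n) with q_n -> q > 0 and r_n -> r > 0 concentrates at r/q.
import numpy as np
from scipy import stats

q, r, delta = 2.0, 3.0, 0.05
for n in [10, 100, 1000, 10000]:
    qn, rn = q + 1.0 / n, r - 1.0 / n     # hypothetical sequences with q_n -> q, r_n -> r
    # scipy's invgamma(a, scale=b) has density proportional to x**(-a-1) * exp(-b/x)
    Z = stats.invgamma(a=n * qn, scale=n * rn)
    prob = Z.cdf(r / q + delta) - Z.cdf(r / q - delta)
    print(f"n={n:6d}   P(|Z_n - r/q| <= {delta}) = {prob:.4f}")
```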


Wenowstateandprovethissection'smaintheorem,whichpro videsnecessaryand sufcientconditionsforposteriorconsistencyinthenonhierarchicalindependence-prior model. Theorem3.6. Inthenon-hierarchicalindependence-priormodel,poster iorconsistency occursifandonlyif ( n log n ) 1 jj r n 0 n jj 22 0 Proof. ByLemma 3.2 ,posteriorconsistencyoccursifandonlyif P M 2 2 + n 2 jj r n 0 n jj 1 > ^ n S n 0 a.s. ( P 0 ) forevery > 0. (3–11) ThenbyEquation 3–10 ,posteriorconsistencyoccursifandonlyif Z 1 jj r n 0 n jj 1 1 g ? n ( ) h ? n (1 ) d Z 1 0 g ? n ( ) h ? n (1 ) d 0 a.s. ( P 0 ) forevery > 0, orequivalently,ifandonlyif Z 1 jj r n 0 n jj 1 1 g ? n ( ) h ? n (1 ) d Z jj r n 0 n jj 1 1 0 g ? n ( ) h ? n (1 ) d 0 a.s. ( P 0 ) forevery > 0, (3–12) Let G ? n and H ? n denoterandomvariableswithdensities g ? n and h ? n ,respectively,with respecttoLebesguemeasuregiven ^ n and S n under P M .Wenowconsiderthetwo casesaccordingtotheconditionofthetheorem. Case1:Suppose ( n log n ) 1 jj r n 0 n jj 22 0 ,andlet > 0 .Noteimmediatelythat n jj r n 0 n jj 1 1 n jj r n 0 n jj 1 2 !1 ,whichimpliesthat n jj r n 0 n jj 1 1 > 2 2 0 = 2 forallsufcientlylarge n .Nowobservethatif jj r n 0 n jj 1 0 ,thenwecanverify Condition 3–11 directlysince P M 2 2 + n 2 jj r n 0 n jj 1 > ^ n S n P M jj r n 0 n jj 1 > j ^ n S n = I jj r n 0 n jj 1 > 0, 64


inwhichcaseposteriorconsistencyoccurs.Thenwemayinst eadassumeforthiscase that liminf n !1 jj r n 0 n jj 1 = > 0 ,meaningalsothat liminf n !1 jj r n 0 n jj 22 2 > 0 Next,notethatthemodeofthedensity h ? n is n 2 ( n + a 4) = T n .Wenowproceedby sub-dividingCase1intothreesub-casesaccordingtotheva lueofthismode.Foreach case,wewillshowthattheratioofintegralsinCondition 3–12 isboundedaboveforall sufcientlylarge n byasequencethatconvergestozeroa.s. ( P 0 ) forevery > 0 Case1.1:Suppose n 2 ( n + a 4) = T n 1 .Then h ? n (1 ) isdecreasingin onthe interval [0,1] .Wemaythenwrite Z 1 jj r n 0 n jj 1 1 g ? n ( ) h ? n (1 ) d Z jj r n 0 n jj 1 1 0 g ? n ( ) h ? n (1 ) d h ? n 1 jj r n 0 n jj 1 1 Z 1 jj r n 0 n jj 1 1 g ? n ( ) d h ? n 1 jj r n 0 n jj 1 1 Z jj r n 0 n jj 1 1 0 g ? n ( ) d P M nG ? n > n jj r n 0 n jj 1 1 ^ n S n P M nG ? n < n jj r n 0 n jj 1 1 ^ n S n P M nG ? n > 2 2 0 = 2 j ^ n S n P M nG ? n < 2 2 0 = 2 j ^ n S n 0 a.s. ( P 0 ) byLemma 3.4 ,notingthat E M ( nG ? n j ^ n S n ) 2 0 = 2 a.s. ( P 0 ) Case1.2:Suppose n 2 ( n + a 4) = T n 1 jj r n 0 n jj 1 1 .Then h ? n (1 ) is increasingin ontheinterval 0, jj r n 0 n jj 1 1 .Wemaythenwrite Z 1 jj r n 0 n jj 1 1 g ? n ( ) h ? n (1 ) d Z jj r n 0 n jj 1 1 0 g ? n ( ) h ? n (1 ) d h ? n n 2 ( n + a 4) T n Z 1 jj r n 0 n jj 1 1 g ? n ( ) d h ? n ( 1 ) Z jj r n 0 n jj 1 1 0 g ? n ( ) d n 2 ( n + a 4) T n e ( n + a 4) = 2 exp T n 2 n 2 P M nG ? n > n jj r n 0 n jj 1 1 ^ n S n P M nG ? n < n jj r n 0 n jj 1 1 ^ n S n 65


Thensince n 2 ( n + a 4) = T n 1 jj r n 0 n jj 1 1 < e ,wemaywrite Z 1 jj r n 0 n jj 1 1 g ? n ( ) h ? n (1 ) d Z jj r n 0 n jj 1 1 0 g ? n ( ) h ? n (1 ) d exp T n 2 n 2 P M nG ? n > n jj r n 0 n jj 1 1 ^ n S n P M nG ? n < n jj r n 0 n jj 1 1 ^ n S n (3–13) TheprobabilityinthenumeratorofInequality 3–13 satises P M nG ? n > n jj r n 0 n jj 1 1 ^ n S n ( S n + b ) jj r n 0 n jj 1 2 n 2 ( n p n + a 4) = 2 n p n + a 2 2 1 byLemma 3.3 .ThenbyStirling'sapproximation,wehavethatforallsuf cientlylarge n P M nG ? n > n jj r n 0 n jj 1 1 ^ n S n ( S n + b ) jj r n 0 n jj 1 2 n 2 ( n p n + a 4) = 2 2 p 2 n p n + a 2 2 ( n p n + a 3) = 2 exp n p n + a 2 2 ( S n + b ) jj r n 0 n jj 1 e n 2 ( n p n + a 2) ( n p n + a 4) = 2 2 e p ( n p n + a 2) 1 = 2 ( S n + b ) jj r n 0 n jj 1 e n 2 ( n p n + a 2) ( n p n + a 4) = 2 (3–14) NoticefromInequality 3–14 that P M nG ? n > n jj r n 0 n jj 1 1 ^ n S n 0 a.s. ( P 0 ) by Lemma 2.1 andthefactthat n jj r n 0 n jj 1 1 !1 ,fromwhichitimmediatelyfollowsthat P M nG ? n < n jj r n 0 n jj 1 1 ^ n S n 1 a.s. ( P 0 ) .Thenbythisfact,Inequality 3–13 andInequality 3–14 ,wenowhavethatforallsufcientlylarge n a.s. ( P 0 ) Z 1 jj r n 0 n jj 1 1 g ? n ( ) h ? n (1 ) d Z jj r n 0 n jj 1 1 0 g ? n ( ) h ? n (1 ) d 2exp T n 2 n 2 ( S n + b ) jj r n 0 n jj 1 e n 2 ( n p n + a 2) ( n p n + a 4) = 2 66


NowobservethatbyLemma 2.1 ( S n + b ) = ( n p n + a 2) 3 2 0 = e forallsufciently large n a.s. ( P 0 ) .Also,byLemma 2.2 ,theformof 0 n fromEquation 2–4 ,andthefact that liminf n !1 jj r n 0 n jj 22 2 > 0 T n n 2 0 n n 2 2 0 + jj r n 0 n jj 22 2 2 0 + 2 2 jj r n 0 n jj 22 forallsufcientlylarge n a.s. ( P 0 ) .Thenwehavethatforallsufcientlylarge n a.s. ( P 0 ) Z 1 jj r n 0 n jj 1 1 g ? n ( ) h ? n (1 ) d Z jj r n 0 n jj 1 1 0 g ? n ( ) h ? n (1 ) d 2exp 2 0 + 2 jj r n 0 n jj 22 2 2 3 2 0 jj r n 0 n jj 1 n 2 ( n p n + a 4) = 2 Since jj r n 0 n jj 1 jj r n 0 n jj 2 ,wenowhavethatforallsufcientlylarge n a.s. ( P 0 ) Z 1 jj r n 0 n jj 1 1 g ? n ( ) h ? n (1 ) d Z jj r n 0 n jj 1 1 0 g ? n ( ) h ? n (1 ) d 2exp 2 0 + 2 jj r n 0 n jj 22 2 2 + n p n + a 4 2 log 3 2 0 jj r n 0 n jj 2 n 2 # (3–15) Since ( n log n ) 1 jj r n 0 n jj 22 0 ,thereexistsasequence c n > 0 with c n 0 suchthat jj r n 0 n jj 22 c n n log n forallsufcientlylarge n .Thenforallsufcientlylarge n ,the quantityinsquarebracketsinInequality 3–15 satises 2 0 + 2 jj r n 0 n jj 22 2 2 + n p n + a 4 2 log 3 2 0 jj r n 0 n jj 2 n 2 2 0 + 2 c n n log n 2 2 + n p n + a 4 2 log 3 2 0 2 r c n log n n = ( n log n ) 2 0 + 2 c n 2 2 n p n + a 4 4 n # + n p n + a 4 2 log 3 2 0 2 p c n log n !1 67


sincethequantityinsquarebracketsconvergesto (1 ) = 4 < 0 .Thereforetheupper boundinInequality 3–15 convergestozero. Case1.3:Suppose 1 jj r n 0 n jj 1 1 < n 2 ( n + a 4) = T n < 1 .Then h ? n (1 ) isdecreasingin ontheinterval [ jj r n 0 n jj 1 1 ,1] .Ontheinterval [0, jj r n 0 n jj 1 1 ] h ? n (1 ) isminimizedatoneoftheendpointsoftheinterval.Ifitism inimizedat = jj r n 0 n jj 1 1 ,thenwemaywrite Z 1 jj r n 0 n jj 1 1 g ? n ( ) h ? n (1 ) d Z jj r n 0 n jj 1 1 0 g ? n ( ) h ? n (1 ) d h ? n 1 jj r n 0 n jj 1 1 Z 1 jj r n 0 n jj 1 1 g ? n ( ) d h ? n 1 jj r n 0 n jj 1 1 Z jj r n 0 n jj 1 1 0 g ? n ( ) d andproceedexactlyasinCase1.1.Ifitisminimizedat =0 ,thenwemaywrite Z 1 jj r n 0 n jj 1 1 g ? n ( ) h ? n (1 ) d Z jj r n 0 n jj 1 1 0 g ? n ( ) h ? n (1 ) d h ? n n 2 ( n + a 4) T n Z 1 jj r n 0 n jj 1 1 g ? n ( ) d h ? n ( 1 ) Z jj r n 0 n jj 1 1 0 g ? n ( ) d andproceedexactlyasinCase1.2,notingthatthelastinequ alityofInequality 3–13 still holdssince n 2 ( n + a 4) = T n < 1 < e Case2:Supposeinsteadthat ( n log n ) 1 jj r n 0 n jj 22 9 0 .Thenthereexista subsequence k n of n andaconstant > 0 suchthat ( k n log k n ) 1 jj r k n 0 k n jj 22 > forall n .Notethatposteriorinconsistencyalongthesubsequence k n impliesposterior inconsistencyoftheoverallsequence,sowemayassumewith outlossofgenerality that k n = n fornotationalconvenience.Itthenfollowsfromthisassum ptionthat jj r n 0 n jj 1 1 p 1 = 2 n jj r n 0 n jj 1 2 1 = 2 (log n ) 1 = 2 0 .Next,notethatthemodeof thedensity g ? n is ( S n + b ) = n 2 ( n p n + a 2) .Wenowproceedbysub-dividingCase2 intotwosub-casesaccordingtothevalueofthismode.Forea chcase,wewillshowthat theratioofintegralsinCondition 3–12 isboundedbelowbyasequencethatdoesnot convergetozeroa.s. ( P 0 ) 68


Case2.1:Suppose ( S n + b ) = n 2 ( n p n + a 2) jj r n 0 n jj 1 1 .Then g ? n ( ) is decreasingontheinterval jj r n 0 n jj 1 1 ,1 .Wemaythenwrite Z 1 jj r n 0 n jj 1 1 g ? n ( ) h ? n (1 ) d Z jj r n 0 n jj 1 1 0 g ? n ( ) h ? n (1 ) d g ? n (1) Z 1 jj r n 0 n jj 1 1 h ? n ( ) d g ? n S n + b n 2 ( n p n + a 4) Z jj r n 0 n jj 1 1 0 h ? n ( ) d = exp S n + b 2 n 2 P M H ? n < 1 jj r n 0 n jj 1 1 ^ n S n ( S n + b ) e n 2 ( n p n + a 2) ( n p n + a 2) = 2 P M 1 jj r n 0 n jj 1 1 < H ? n < 1 ^ n S n =exp S n + b 2 n 2 ( S n + b ) e n 2 ( n p n + a 2) ( n p n + a 2) = 2 P M H ? n T n n 2 < T n n 2 1 jj r n 0 n jj 1 1 ^ n S n P M T n n 2 1 jj r n 0 n jj 1 1 < H ? n T n n 2 < T n n 2 ^ n S n (3–16) NowobservethatbyLemma 2.2 T n n 2 = T n 0 n 0 n n 2 T n 0 n jj r n 0 n jj 22 n T n 0 n ( log n ) !1 a.s. ( P 0 ). Thisresultandthefactthat jj r n 0 n jj 1 1 0 implythat n 2 T n (1 jj r n 0 n jj 1 1 ) !1 a.s. ( P 0 ). Thenforallsufcientlylarge n a.s. ( P 0 ), P M H ? n T n n 2 < T n n 2 1 jj r n 0 n jj 1 1 ^ n S n P M H ? n T n n 2 < 2 2 ^ n S n 1 a.s. ( P 0 ) 69


byCorollary 3.5 ,notingthat E M ( n 2 H ? n T n j ^ n S n ) 2 a.s. ( P 0 ) .Wemayusethis resultandInequality 3–16 towrite Z 1 jj r n 0 n jj 1 1 g ? n ( ) h ? n (1 ) d Z jj r n 0 n jj 1 1 0 g ? n ( ) h ? n (1 ) d exp S n + b 2 n 2 ( S n + b ) e n 2 ( n p n + a 2) ( n p n + a 2) = 2 2 P M 1 jj r n 0 n jj 1 1 < H ? n < 1 ^ n S n (3–17) forallsufcientlylarge n a.s. ( P 0 ) .Lemma 2.1 impliesthat ( S n + b ) = n 2(1 ) 2 0 2 2 0 and ( S n + b ) = ( n p n + a 2) 2 2 0 = e ,bothforallsufcientlylarge n a.s. ( P 0 ) .Thenthe numeratorofthelowerboundinInequality 3–17 satises exp S n + b 2 n 2 ( S n + b ) e n 2 ( n p n + a 2) ( n p n + a 2) = 2 2 2 0 n 2 ( n p n + a 2) = 2 exp 2 0 2 (3–18) forallsufcientlylarge n a.s. ( P 0 ) .Nowrecallthat n 2 ( n + a 4) = T n isthemodeofthe density h ? n ,andnotethat n 2 ( n + a 4) T n 0 n T n 2 ( n + a 4) jj r n 0 n jj 22 0 n T n 2 ( n + a 4) n log n 0 a.s. ( P 0 ). Thensince jj r n 0 n jj 1 1 0 ,themodeof h ? n islessthan 1 jj r n 0 n jj 1 1 for allsufcientlylarge n a.s. ( P 0 ) .Thisimpliesthat h ? n ( ) isdecreasingin onthe interval 1 jj r n 0 n jj 1 jj 1 ,1 forallsufcientlylarge n a.s. ( P 0 ) .Thenwehave 70


thatforallsufcientlylarge n a.s. ( P 0 ) P M 1 jj r n 0 n jj 1 1 < H ? n < 1 ^ n S n jj r n 0 n jj 1 1 h ? n 1 jj r n 0 n jj 1 1 jj r n 0 n jj 1 1 T n 2 n 2 ( n + a 2) = 2 n + a 2 2 1 exp 24 T n 1 jj r n 0 n jj 1 1 2 n 2 35 WethenuseStirling'sapproximationtowrite P M 1 jj r n 0 n jj 1 1 < H ? n < 1 ^ n S n jj r n 0 n jj 1 1 T n 2 n 2 ( n + a 2) = 2 2 p 2 n + a 2 2 ( n + a 3) = 2 exp n + a 2 2 exp 24 T n 1 jj r n 0 n jj 1 1 2 n 2 35 jj r n 0 n jj 1 1 T n ( n + a 2) 4 n 2 e ( n + a 2) = 2 p n + a 2 exp 24 T n 1 jj r n 0 n jj 1 1 2 n 2 35 (3–19) forallsufcientlylarge n a.s. ( P 0 ) .Thensince jj r n 0 n jj 1 1 0 and T n 0 n 2 n jj r n 0 n jj 22 2 n 2 log n 2 forallsufcientlylarge n a.s. ( P 0 ) byLemma 2.2 ,wemaycontinuefromInequality 3–19 bywriting P M 1 jj r n 0 n jj 1 1 < H n < 1 ? ^ n S n n ( n + a 2)log n 8 2 e ( n + a 2) = 2 exp n log n 4 2 p n + a 2 (3–20) 71


forallsufcientlylarge n a.s. ( P 0 ) .NowInequalities 3–17 3–18 ,and 3–20 togetheryield thatforallsufcientlylarge n a.s. ( P 0 ) Z 1 jj r n 0 n jj 1 1 g ? n ( ) h ? n (1 ) d Z jj r n 0 n jj 1 1 0 g ? n ( ) h ? n (1 ) d 2 2 0 n 2 ( n p n + a 2) = 2 exp 2 0 2 2 n ( n + a 2)log n 8 2 e ( n + a 2) = 2 exp n log n 4 2 p n + a 2 =exp n + a 2 2 log n ( n + a 2)log n 8 2 e n p n + a 2 2 log n 2 2 2 0 + n log n 4 2 1 2 log( n + a 2) 2 0 2 log2 exp n + a 2 2 log n 2 log n 16 2 e n + a 2 2 log n 2 2 2 0 1 2 log( n + a 2) 2 0 2 log2 =exp n + a 2 2 log 2 0 n log n 8( 2 ) 2 e 1 2 log( n + a 2) 2 0 2 log2 !1 sincethequantityinsquarebracketstendstoinnity. Case2.2:Suppose ( S n + b ) = n 2 ( n p n + a 2) > jj r n 0 n jj 1 1 .Then g ? n ( ) is increasingin ontheinterval [0, jj r n 0 n jj 1 1 ] .Ontheinterval [ jj r n 0 n jj 1 1 ,1] g ? n ( ) isminimizedatoneoftheendpointsoftheinterval.Ifitism inimizedat =1 ,thenwe maywrite Z 1 jj r n 0 n jj 1 1 g ? n ( ) h ? n (1 ) d Z jj r n 0 n jj 1 1 0 g ? n ( ) h ? n (1 ) d g ? n (1) Z 1 jj r n 0 n jj 1 1 h ? n ( ) d g ? n S n + b n 2 ( n p n + a 4) Z jj r n 0 n jj 1 1 0 h ? n ( ) d 72


and proceed exactly as in Case 2.1. If it is minimized at $\omega = \epsilon\|r_n - \beta_{0n}\|_\infty^{-1}$, then we may write
\[
\frac{\displaystyle\int_{\epsilon\|r_n-\beta_{0n}\|_\infty^{-1}}^{1} g_n^\star(\omega)\, h_n^\star(1-\omega)\, d\omega}{\displaystyle\int_0^{\epsilon\|r_n-\beta_{0n}\|_\infty^{-1}} g_n^\star(\omega)\, h_n^\star(1-\omega)\, d\omega}
\ \ge\ \frac{g_n^\star\bigl(\epsilon\|r_n-\beta_{0n}\|_\infty^{-1}\bigr)\displaystyle\int_{\epsilon\|r_n-\beta_{0n}\|_\infty^{-1}}^{1} h_n^\star(1-\omega)\, d\omega}{g_n^\star\bigl(\epsilon\|r_n-\beta_{0n}\|_\infty^{-1}\bigr)\displaystyle\int_0^{\epsilon\|r_n-\beta_{0n}\|_\infty^{-1}} h_n^\star(1-\omega)\, d\omega}
\ =\ \frac{P_M\bigl(H_n^\star < 1 - \epsilon\|r_n-\beta_{0n}\|_\infty^{-1} \mid \hat\beta_n, S_n\bigr)}{P_M\bigl(1 - \epsilon\|r_n-\beta_{0n}\|_\infty^{-1} < H_n^\star < 1 \mid \hat\beta_n, S_n\bigr)}
\ \to\ \infty \quad \text{a.s. } (P_0),
\]
since $P_M\bigl(H_n^\star < 1 - \epsilon\|r_n-\beta_{0n}\|_\infty^{-1} \mid \hat\beta_n, S_n\bigr) \to 1$ a.s. $(P_0)$, just as in Case 2.1.

It is clear from the condition of Theorem 3.6 that in the case where $p_n$, $r_n$, and $\beta_{0n}$ are all fixed, posterior consistency occurs in the non-hierarchical independence-prior model. Posterior consistency also occurs if $\|r_n - \beta_{0n}\|_\infty = O(1)$, since this implies that $\|r_n - \beta_{0n}\|_2^2 = O(p_n)$.

We may also consider the condition of Theorem 3.6 in light of the discussion that followed Lemma 3.2 about the center and spread of the posterior distribution. Lemma 3.2 could be interpreted as stating that in the non-hierarchical independence-prior model, posterior consistency occurs if and only if the center of the posterior distribution converges to the true value, informally speaking. The condition of Theorem 3.6 is a necessary and sufficient condition for this desired behavior of the posterior's center. However, it is interesting to note that the condition of Theorem 3.6 involves $\|r_n - \beta_{0n}\|_2^2$, which contrasts with Theorem 2.8 for the non-hierarchical $g$-prior model. In Theorem 2.8, the part of the condition related to the posterior's center involved $\|r_n - \beta_{0n}\|_\infty$ instead. A partial explanation for this discrepancy can be found by viewing the posterior mean as a weighted average of the MLE $\hat\beta_n$ and the prior mean $r_n$. Conditional on $\sigma^2$, these weights are $n\tau^2/(\sigma^2 + n\tau^2)$ and $\sigma^2/(\sigma^2 + n\tau^2)$, respectively. Thus, the behavior of the posterior mean of $\beta_n$ depends closely on the behavior of the


posterior distribution of $\sigma^2$, which in turn depends closely on $\|r_n - \beta_{0n}\|_2^2$ through the quantity $T_n$. This differs from the $g$-prior model, where the weights are instead $g_n/(g_n + 1)$ and $1/(g_n + 1)$, respectively.

3.3 Uniform-Hyperprior Independence-Prior Model

An alternative approach to the specification of $\tau^2$ is a hierarchical model in which $\tau^2$ is considered a hyperparameter and is given a hyperprior $\pi(\tau^2)$. In this case, Prior 3--2 on $\beta_n$ is taken to be conditional on $\tau^2$. More specifically, suppose the hyperprior on $\tau^2$ is specified as
\[
\tau^2 \sim \text{Uniform}(0, \infty), \tag{3--21}
\]
i.e., the improper hyperprior $\pi(\tau^2) \propto 1$ for $0 \le \tau^2 < \infty$. Then we have the following definition.

Definition 3.4. A uniform-hyperprior independence-prior model is an independence-prior model where $\tau^2$ is taken as a hyperparameter with a hyperprior specified as Prior 3--21, with the additional Assumption 4.

The joint posterior in the uniform-hyperprior independence-prior model is given by
\[
\pi_n(\beta_n, \sigma^2, \tau^2 \mid \hat\beta_n, S_n) \propto
\exp\!\left[-\frac{1}{2}\left(\frac{n}{\sigma^2} + \frac{1}{\tau^2}\right)\bigl\|\beta_n - \widetilde B_n(\sigma^2, \tau^2)\bigr\|_2^2\right]
(\tau^2)^{-p_n/2} \exp\!\left[-\frac{n\|r_n - \hat\beta_n\|_2^2}{2(\sigma^2 + n\tau^2)}\right]
(\sigma^2)^{-(n+a)/2} \exp\!\left[-\frac{S_n + b}{2\sigma^2}\right],
\]
where
\[
\widetilde B_n(\sigma^2, \tau^2) = E_M\bigl(\beta_n \mid \sigma^2, \tau^2, \hat\beta_n, S_n\bigr)
= \left(\frac{n}{\sigma^2} + \frac{1}{\tau^2}\right)^{-1}\left(\frac{n}{\sigma^2}\,\hat\beta_n + \frac{1}{\tau^2}\, r_n\right).
\]
Integrating out $\beta_n$ yields the marginal posterior
\[
\pi_n(\sigma^2, \tau^2 \mid \hat\beta_n, S_n) \propto
(\sigma^2)^{-(n-p_n+a)/2} \exp\!\left[-\frac{S_n + b}{2\sigma^2}\right]
(\sigma^2 + n\tau^2)^{-p_n/2} \exp\!\left[-\frac{n\|r_n - \hat\beta_n\|_2^2}{2(\sigma^2 + n\tau^2)}\right]
\propto g_n(\sigma^2)\, h_n(\sigma^2 + n\tau^2), \tag{3--22}
\]


where $g_n$ is the density with respect to Lebesgue measure of an inverse gamma random variable with shape parameter $(n - p_n + a - 2)/2$ and scale parameter $(S_n + b)/2$, and where $h_n$ is the density with respect to Lebesgue measure of an inverse gamma random variable with shape parameter $(p_n - 2)/2$ and scale parameter $n\|r_n - \hat\beta_n\|_2^2/2$. We now provide a lemma that is essentially identical to Lemma 3.2.

Lemma 3.7. In the uniform-hyperprior independence-prior model, posterior consistency occurs if and only if $P_M\bigl(\|\sigma^2(\sigma^2 + n\tau^2)^{-1}(r_n - \hat\beta_n)\|_\infty > \epsilon \mid \hat\beta_n, S_n\bigr) \to 0$ a.s. $(P_0)$ for every $\epsilon > 0$.

Proof. The proof is identical to the proof of Lemma 3.2, except with each instance of conditioning on $\sigma^2$ replaced by conditioning on both $\sigma^2$ and $\tau^2$, and with $\widetilde B_n(\sigma^2)$ replaced by $\widetilde B_n(\sigma^2, \tau^2)$ correspondingly.

Now notice that we may again express the quantity $T_n$ as $T_n = n\|r_n - \hat\beta_n\|_2^2$. Also recall the quantities $W_n$ from Transformation 2--17. For the uniform-hyperprior independence-prior model, $W_n$ may be written as
\[
W_n = \frac{S_n + b}{S_n + b + n\|r_n - \hat\beta_n\|_2^2}.
\]
Also recall the quantities $\widetilde L_n(\epsilon)$ and $L_n(\epsilon)$ from Equation 2--18, which may be written as
\[
\widetilde L_n(\epsilon) = \frac{\epsilon\|r_n - \beta_{0n}\|_\infty^{-1}(S_n + b)}{\epsilon\|r_n - \beta_{0n}\|_\infty^{-1}(S_n + b) + n\|r_n - \hat\beta_n\|_2^2},
\qquad
L_n(\epsilon) = \max\bigl\{W_n,\ \widetilde L_n(\epsilon)\bigr\}
\]
for the uniform-hyperprior independence-prior model.

We now state and prove the main result, a necessary and sufficient condition for posterior consistency in the uniform-hyperprior independence-prior model.

Theorem 3.8. In the uniform-hyperprior independence-prior model, posterior consistency occurs if and only if either $p_n/n \to 0$ or there does not exist a subsequence $k_n$ of $n$ and a constant $A > 0$ such that $\|r_{k_n} - \beta_{0k_n}\|_2^2 \to A$ and $\|r_{k_n} - \beta_{0k_n}\|_\infty \nrightarrow 0$.


Proof. ByLemma 3.7 ,posteriorconsistencyoccursifandonlyif P M 2 2 + n 2 jj r n 0 n jj 1 > ^ n S n 0 a.s. ( P 0 ) forevery > 0. (3–23) WemaythenusePosterior 3–22 towritetheprobabilityinCondition 3–23 as P M 2 2 + n 2 jj r n 0 n jj 1 > ^ n S n = Z 1 0 Z 1 0 g n ( 2 ) h n ( 2 + n 2 ) I 2 2 + n 2 jj r n 0 n jj 1 > d 2 d 2 Z 1 0 Z 1 0 g n ( 2 ) h n ( 2 + n 2 ) d 2 d 2 Nowlet 2 = 2 + n 2 andmakethetransformation ( 2 2 ) 7! ( 2 2 ) torewritethis probabilityas P M 2 2 + n 2 jj r n 0 n jj 1 > ^ n S n = Z 1 0 Z 1 2 g n ( 2 ) h n ( 2 ) I 2 < 2 jj r n 0 n jj 1 d 2 d 2 Z 1 0 Z 1 2 g n ( 2 ) h n ( 2 ) d 2 d 2 = P M G n < H n < G n jj r n 0 n jj 1 ^ n S n P M G n < H n j ^ n S n (3–24) where G n and H n arerandomvariablesthatareindependentunder P M withdistributions G n j ^ n S n InverseGamma (( n p n + a 2) = 2,( S n + b ) = 2) and H n j ^ n S n InverseGamma (( p n 2) = 2, n jj r n ^ n jj 22 = 2). Nowdenerandomvariables e G n = ( S n + b ) 1 n jj r n ^ n jj 22 G n and e U n =( e G n + H n ) 1 H n .Theserandomvariableshave distributions e G n j ^ n S n InverseGamma 0B@ n p n + a 2 2 n r n ^ n 22 2 1CA e U n j ^ n S n Beta n p n + a 2 2 p n 2 2 (3–25) 76


under $P_M$. Then the ratio of probabilities in Equation 3--24 may be written as
\[
P_M\!\left(\frac{\sigma^2}{\sigma^2 + n\tau^2}\,\|r_n - \beta_{0n}\|_\infty > \epsilon \,\Bigm|\, \hat\beta_n, S_n\right)
= \frac{P_M\bigl[W_n < \widetilde U_n < L_n(\epsilon) \mid \hat\beta_n, S_n\bigr]}{P_M\bigl(\widetilde U_n > W_n \mid \hat\beta_n, S_n\bigr)},
\]
where $\widetilde U_n \mid \hat\beta_n, S_n$ has Distribution 3--25 under $P_M$. Thus, posterior consistency occurs in the uniform-hyperprior independence-prior model if and only if
\[
\frac{P_M\bigl[W_n < \widetilde U_n < L_n(\epsilon) \mid \hat\beta_n, S_n\bigr]}{P_M\bigl(\widetilde U_n > W_n \mid \hat\beta_n, S_n\bigr)} \to 0 \text{ a.s. } (P_0) \text{ for every } \epsilon > 0. \tag{3--26}
\]
However, this is simply a special case of Condition 2--21, which was obtained in Theorem 2.17 for the hyper-$g$-prior model of Section 2.3, noting that the random variable $\widetilde U_n$ in Condition 3--26 is equivalent to the random variable $U_n$ in Condition 2--21 with $c = 0$. (We also note that the true model $P_0$ under consideration is not related to the prior specification, and thus the true model for the uniform-hyperprior independence-prior model is simply a special case of the true model for the hyper-$g$-prior model, subject to the additional Assumption 4.) It was already established in the proof of Theorem 2.17 that Condition 2--21 holds if and only if either $p_n/n \to 0$ or there does not exist a subsequence $k_n$ of $n$ and a constant $A > 0$ such that $\|r_{k_n} - \beta_{0k_n}\|_2^2 \to A$ and $\|r_{k_n} - \beta_{0k_n}\|_\infty \nrightarrow 0$. Therefore, this is a necessary and sufficient condition for posterior consistency in the uniform-hyperprior independence-prior model as well.

Once again, Theorem 3.8 should not be misconstrued as suggesting that this same necessary and sufficient condition would be shared by other hierarchical independence-prior models. Just as in Chapter 2, we may observe that the non-hierarchical independence-prior model is simply a special case of a hierarchical independence-prior model with a degenerate hyperprior. This provides an obvious counterexample, since the necessary and sufficient conditions for posterior consistency differ in Theorems 3.6 and 3.8.
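The contrast between Theorems 3.6 and 3.8 can also be made concrete numerically. The sketch below (a purely illustrative aside, with hypothetical sequences and $p_n = \lfloor n/2 \rfloor$) evaluates the quantity $\|r_n - \beta_{0n}\|_2^2/(n\log n)$ that governs Theorem 3.6 together with $\|r_n - \beta_{0n}\|_2^2$ and $\|r_n - \beta_{0n}\|_\infty$, which govern Theorem 3.8.

```python
# Illustrative comparison (hypothetical sequences) of the quantities appearing in the
# conditions of Theorem 3.6 (non-hierarchical model) and Theorem 3.8 (uniform hyperprior).
import numpy as np

def summaries(d, n):
    sq = float(np.sum(d ** 2))
    return sq, sq / (n * np.log(n)), float(np.max(np.abs(d)))

for n in [100, 1000, 10000, 100000]:
    p = n // 2
    examples = {
        # Three components of r_n - beta_0n fixed at 1: the Theorem 3.6 ratio vanishes
        # (consistency for fixed tau^2), but ||.||_2^2 -> 3 while ||.||_inf = 1, so the
        # condition of Theorem 3.8 fails when p_n/n does not vanish.
        "three fixed offsets": np.concatenate([np.ones(3), np.zeros(p - 3)]),
        # Every component equal to n**(-1/4): both conditions hold.
        "vanishing offsets": np.full(p, n ** -0.25),
        # Every component equal to n**(1/4): ||.||_2^2 grows like n**(3/2), so the
        # Theorem 3.6 condition fails while the Theorem 3.8 condition still holds.
        "growing offsets": np.full(p, n ** 0.25),
    }
    for label, d in examples.items():
        sq, ratio, sup = summaries(d, n)
        print(f"n={n:7d}  {label:20s}  ||.||_2^2={sq:12.1f}"
              f"  /(n log n)={ratio:10.5f}  ||.||_inf={sup:8.3f}")
```

The first example is exactly the counterexample just discussed: the two models disagree on whether posterior consistency occurs for that sequence.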


CHAPTER 4
ADDITIONAL DISCUSSION

Some discussion of the results obtained in the previous chapters has already been provided in the chapters themselves. However, this chapter addresses a few additional issues that concern multiple previous chapters simultaneously. Section 4.1 compares the $g$-prior models of Chapter 2 to the independence-prior models of Chapter 3, while Section 4.2 gives a tentative discussion of the implications of the results in Chapters 2 and 3 for finite sample sizes.

4.1 Comparison of $g$-Prior and Independence-Prior Models

Inspection of the variance matrix in the prior on $\beta_n$ shows the two key differences between $g$-prior and independence-prior models. First, for the prior variance matrix of $\beta_n \mid \sigma^2$, the $g$-prior model uses the same matrix $(X_n^T X_n)^{-1}$ that appears in the variance of the MLE $\hat\beta_n$, whereas the independence-prior model uses the matrix $I_{p_n}$ instead. However, the introduction of Assumption 4 in the independence-prior model essentially removed this distinction by setting $(X_n^T X_n)^{-1} = n^{-1} I_{p_n}$. Thus, the primary difference between corresponding versions of the $g$-prior and independence-prior models as considered here is that the $g$-prior model specifies the prior variance of $\beta_n \mid \sigma^2$ to be proportional to $\sigma^2$, whereas the independence-prior model does not. More specifically, where the independence-prior model has $\tau^2$, the $g$-prior model has $g\sigma^2/n$.

It can be seen from Theorems 2.17 and 3.8 that in the case of a hierarchical model with a uniform hyperprior on $g$ or $\tau^2$ (noting that the former is a special case of the hyper-$g$-prior model with $c = 0$), the necessary and sufficient condition for posterior consistency is the same for both the $g$-prior and independence-prior models. However, it can be seen from Theorems 2.8 and 3.6 that the same agreement does not occur for non-hierarchical models. For the sake of comparison, suppose we take $g_n = n$ in the non-hierarchical $g$-prior model. Then posterior consistency occurs in the non-hierarchical $g$-prior model if and only if $\|r_n - \beta_{0n}\|_2^2 = o(n^2/\log p_n)$. (Note


that when $g_n = n$, the first part of the condition in Theorem 2.8 is implied by the second part.) However, in the non-hierarchical independence-prior model, posterior consistency occurs if and only if $\|r_n - \beta_{0n}\|_2^2 = o(n\log n)$. Observe that the condition in the non-hierarchical $g$-prior model is more easily satisfied than the condition in the non-hierarchical independence-prior model. If $\|r_n - \beta_{0n}\|_2^2$ grows more slowly than $n^2/\log p_n$ but at least as fast as $n\log n$, then the posterior will be consistent in the non-hierarchical $g$-prior model with $g_n = n$, but not in the non-hierarchical independence-prior model. Thus, purely from the standpoint of posterior consistency, the non-hierarchical $g$-prior model can be considered to behave at least as well as the non-hierarchical independence-prior model, in this specific sense.

4.2 Practical Implications for Finite Sample Sizes

Although posterior consistency is an asymptotic result defined as $n \to \infty$, it is natural to ask what implications, if any, the results of Chapters 2 and 3 may have for the analysis of real data with finite $n$. Of course, any such conclusions must be stated cautiously, since the mere occurrence of posterior consistency provides no information on the rate of the convergence in question. If this rate were to be particularly slow in a given situation, the posterior for some finite $n$ might estimate the true parameter value quite poorly despite being consistent as $n \to \infty$.

Still, some useful conclusions may be drawn from the results of Chapters 2 and 3. First, it should be noted that in many practical settings, it would be reasonable to assume that $\|r_n - \beta_{0n}\|_\infty = o(n)$ and $\|r_n - \beta_{0n}\|_2^2 = O(p_n)$ if $n$ were allowed to tend to infinity. Recall that in this case, both the non-hierarchical $g$-prior model (with a sensible choice of $g_n$) and the non-hierarchical independence-prior model exhibit posterior


tooquickly.However,eachoftheempiricalandhierarchica lmodelsconsideredin Chapters 2 and 3 canstillexhibitposteriorconsistencyevenwhen jj r n 0 n jj 22 grows extremelyquickly.(Infact,asufcientconditionforpost eriorconsistencyinthese modelsisthat jj r n 0 n jj 22 !1 .)Fornite n ,thisresultprovidesatentativesuggestion thatincircumstanceswhere p 1 n jj r n 0 n jj 22 isbelievedtobeextremelylargecompared toquantitiessuchas max 2 0 or 2 ,anempiricalorhierarchicalmodelmayyieldbetter resultsthananon-hierarchicalmodel.Ofcourse,amuchmor ethoroughinvestigationof thisideawouldbeneededbeforeanysuchclaimcouldberelia blymade. Finally,perhapsthemostimportantimplicationcomesfrom thesomewhat surprisingsituationinwhichposteriorinconsistencyocc ursintheempiricaland hierarchicalmodels.Recallthatposterior inconsistency occursforthesemodelsin thescenariowhere p n = O ( n ) exactly,butonlyaxednumber p ? > 0 ofcomponents of r n 0 n arenonzeroandthese p ? componentsremainxedas n grows.If r n = 0 p n thenthezeroandnonzerocomponentsof 0 n correspondtocovariatesthatcanbe consideredirrelevantandrelevant,respectively,formod elingtheresponse(ineach case,conditionalontheinclusionoftheothercovariatesi nthemodel).Fornite n thisresultforthesemodelsperhapsprovidesapracticalwa rningagainstincluding largenumbersofpotentiallyirrelevantcovariatesunless therearelikelytobealarge numberofrelevantcovariatesaswell,orunlessthetotalnu mberofcovariatesisnotan appreciablefractionofthesamplesize. 80


CHAPTER 5
POSSIBLE FUTURE WORK

We now suggest several directions in which the work of Chapters 2 and 3 could possibly be extended by ourselves or others in the future. These include improvements to previously presented results as well as entirely new problems that could be examined.

5.1 Extensions to the $g$-Prior Model

The most obvious extension to our work would be a proof that the sufficient condition for posterior consistency in the Zellner-Siow $g$-prior model is also necessary. Of course, it is possible that this condition is not necessary, and that some slightly weaker condition is necessary and sufficient. However, we believe that the sufficient condition from Theorem 2.23 is necessary, based on the following intuitive argument. In the non-hierarchical $g$-prior model, provided $\|r_n - \beta_{0n}\|_\infty \nrightarrow 0$, it is necessary that $g_n \to \infty$ for posterior consistency to occur. Accordingly, in hierarchical $g$-prior models, it seems logical that it should be the behavior of the tail of the posterior distribution of $g$ that controls whether or not posterior consistency occurs. Now note that for any fixed $n$, the hyperprior on $g$ in the Zellner-Siow $g$-prior model has a tail similar to that of the hyperprior on $g$ in the hyper-$g$-prior model with $c = 3$. That is,
\[
\lim_{g \to \infty} \frac{(g+1)^{-3/2}}{g^{-3/2}} = \lim_{g \to \infty} \frac{g^{-3/2}\exp\{-n/(2g)\}}{g^{-3/2}}
\]
for any fixed $n$. This suggests that the Zellner-Siow $g$-prior model and the hyper-$g$-prior model may share the same necessary and sufficient condition for posterior consistency. Still, this argument may be an oversimplification, and the issue remains unresolved.

The similarity of the tails of the hyperpriors on $g$ in the hyper-$g$-prior and Zellner-Siow $g$-prior models also raises the question of whether hyperpriors with different tail behavior could produce a different necessary and sufficient condition. More specifically, it could be of interest to consider a gamma hyperprior for $g$, which would have a lighter tail than either of the hyperpriors considered thus far. Based on the same argument as in the


previous paragraph, we might speculate that posterior consistency could be harder to achieve in such a $g$-prior model.

Finally, our $g$-prior work could also be extended to address the generalized $g$-prior proposed by Maruyama and George [20]. They specify the prior on $\beta_n \mid \sigma^2$ as
\[
\beta_n \mid \sigma^2 \sim N_{p_n}\bigl(r_n,\ \sigma^2\, W_n \Lambda_n W_n^T\bigr),
\]
where $\Lambda_n$ is a diagonal matrix subject to certain restrictions and $W_n$ is an orthogonal matrix that diagonalizes $X_n^T X_n$. Unfortunately, however, this model cannot be written in the form of the $g$-prior model like the models in Chapter 2, so it would require a different approach to establish results on posterior consistency.

5.2 Extensions to the Independence-Prior Model

For our study of independence-prior models, the most obvious extension would be to remove Assumption 4, which forced $\Sigma_n = I_{p_n}$ for the non-hierarchical independence-prior model and the uniform-hyperprior independence-prior model. However, our work thus far seems to indicate that the establishment of a condition that is both necessary and sufficient for posterior consistency in either of these models would be quite challenging if $\Sigma_n \ne I_{p_n}$. The primary difficulty when $\Sigma_n \ne I_{p_n}$ is related to our use of the $\ell_\infty$ norm instead of the $\ell_2$ norm. More specifically, if we attempt to prove Lemma 3.2 with $\Sigma_n \ne I_{p_n}$, we are unable to proceed from Inequality 3--5 to anything resembling Inequality 3--7, since
\[
\bigl\|\widetilde B_n(\sigma^2) - \hat\beta_n\bigr\|_\infty
= \left\|\left(I_{p_n} + \frac{n\tau^2}{\sigma^2}\,\Sigma_n^{-1}\right)^{-1}(r_n - \hat\beta_n)\right\|_\infty, \tag{5--1}
\]
which can be bounded by
\[
\frac{\sigma^2\lambda_{\min}}{\sqrt{p_n}\,(\sigma^2\lambda_{\min} + n\tau^2)}\,\|r_n - \hat\beta_n\|_\infty
\ \le\ \bigl\|\widetilde B_n(\sigma^2) - \hat\beta_n\bigr\|_\infty
\ \le\ \frac{\sqrt{p_n}\,\sigma^2\lambda_{\max}}{\sigma^2\lambda_{\max} + n\tau^2}\,\|r_n - \hat\beta_n\|_\infty.
\]

5.2 Extensions to the Independence-Prior Model

For our study of independence-prior models, the most obvious extension would be to remove Assumption 4, which forced $\Sigma_n = I_{p_n}$ for the non-hierarchical independence-prior model and the uniform-hyperprior independence-prior model. However, our work thus far seems to indicate that the establishment of a condition that is both necessary and sufficient for posterior consistency in either of these models would be quite challenging if $\Sigma_n \ne I_{p_n}$. The primary difficulty when $\Sigma_n \ne I_{p_n}$ is related to our use of the $\ell_\infty$ norm instead of the $\ell_2$ norm. More specifically, if we attempt to prove Lemma 3.2 with $\Sigma_n \ne I_{p_n}$, we are unable to proceed from Inequality 3-5 to anything resembling Inequality 3-7, since

\[
\bigl\| \tilde{B}_n(\sigma^2) - \hat{\beta}_n \bigr\|_\infty
\;=\;
\Bigl\| \Bigl( I_{p_n} + \frac{n\sigma^2}{\tau^2}\, \Sigma_n^{-1} \Bigr)^{-1} \bigl( \gamma_n - \hat{\beta}_n \bigr) \Bigr\|_\infty,
\]   (5-1)

which can be bounded by

\[
\frac{\tau^2 \lambda_{\min}}{\sqrt{p_n}\,\bigl( \tau^2 \lambda_{\min} + n\sigma^2 \bigr)} \bigl\| \gamma_n - \hat{\beta}_n \bigr\|_\infty
\;\le\;
\bigl\| \tilde{B}_n(\sigma^2) - \hat{\beta}_n \bigr\|_\infty
\;\le\;
\frac{\sqrt{p_n}\, \tau^2 \lambda_{\max}}{\tau^2 \lambda_{\max} + n\sigma^2} \bigl\| \gamma_n - \hat{\beta}_n \bigr\|_\infty .
\]

Unless $p_n$ is fixed, these bounds do not appear to be sharp enough to allow the establishment of a condition that is both necessary and sufficient, and it is not clear how else to proceed.
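
The looseness of these bounds comes from passing between the $\ell_\infty$ norm and the $\ell_2$ norm, which is where the $\sqrt{p_n}$ factors arise. The sketch below illustrates this numerically for a shrinkage matrix of the same general form as the one in Equation 5-1; the covariance structure and all numerical values are hypothetical, and the parametrization is a simplification rather than the exact one used in Lemma 3.2.

    import numpy as np

    rng = np.random.default_rng(1)
    sigma2, tau2 = 1.0, 1.0

    for p in [10, 100, 1000]:
        n = 2 * p                                   # "large p, large n" regime
        A = rng.normal(size=(p, p))
        Sigma = A @ A.T / p + np.eye(p)             # an arbitrary non-identity prior covariance structure
        v = rng.normal(size=p)                      # stands in for gamma_n - beta_hat_n

        # shrinkage matrix of the general form appearing in Equation 5-1
        M = np.linalg.inv(np.eye(p) + (n * sigma2 / tau2) * np.linalg.inv(Sigma))
        exact = np.linalg.norm(M @ v, ord=np.inf)   # the sup-norm quantity of interest

        # eigenvalue bounds control the l2 norm of M v; converting to the sup-norm
        # via  ||x||_inf <= ||x||_2 <= sqrt(p) ||x||_inf  costs a factor sqrt(p) each way
        eigs = np.linalg.eigvalsh(M)
        upper = np.sqrt(p) * eigs.max() * np.linalg.norm(v, ord=np.inf)
        lower = eigs.min() / np.sqrt(p) * np.linalg.norm(v, ord=np.inf)

        print(f"p={p:5d}  lower={lower:.2e}  exact={exact:.2e}  upper={upper:.2e}  "
              f"upper/exact={upper / exact:.1f}")

As $p$ grows, the gap between the eigenvalue-based bounds and the actual sup-norm widens roughly like $\sqrt{p}$, which is consistent with the observation that such bounds are only useful when $p_n$ is fixed.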

Alternatively, we could keep Assumption 4 and attempt to derive necessary and sufficient conditions for posterior consistency in other variations of the independence-prior model. For example, other hierarchical models could be considered, perhaps with proper hyperpriors on $\tau^2$. A different route for the specification of $\tau^2$ would be an empirical Bayesian solution where $\tau^2$ is estimated from the data. However, a potential complication here is the fact that the marginal likelihood of $\tau^2$ in the non-hierarchical independence-prior model cannot be maximized in closed form. Nonetheless, sensible empirical Bayesian models could still be derived using methods other than maximum likelihood for the estimation of $\tau^2$.

Another direction of work would be to eliminate the requirement of a single fixed value of $\tau^2$ in the non-hierarchical independence-prior model. One option would simply be to allow $\tau^2 = \tau_n^2$ to vary with $n$. Perhaps a more interesting scenario would be to also allow the different components $\beta_{n,i}$ of $\beta_n$ to have different prior variances $\tau_{n,i}^2$. If some of the $\tau_{n,i}^2$ were allowed to be infinity, this could accommodate a model in which the intercept receives a flat prior while the other coefficients receive informative priors. This idea could also be extended to hierarchical independence-prior models, with the $\tau_{n,i}^2$ assigned some hyperprior structure. However, with an improper hyperprior, caution must be exercised to ensure propriety of the posterior. For example, the choice $\pi_n(\tau_{n,1}^2, \ldots, \tau_{n,p_n}^2) \propto 1$ would lead to an improper posterior, in contrast to the proper posterior that results from a uniform hyperprior on a common $\tau^2$, as seen in Section 3.3.

5.3 Other Extensions

The difficulties associated with taking $\Sigma_n \ne I_{p_n}$ in the independence-prior model arise simply from the mismatch of these two matrices in various calculations, such as Equation 5-1. Therefore, if we were able to solve or circumvent this problem to find a necessary and sufficient condition for posterior consistency in this case, it is likely that we could use the same method to handle a general prior variance matrix $\Sigma_n(\sigma^2)$. This would essentially cover all cases of the basic normal-normal Bayesian regression model.

We could also consider priors on $\beta_n$ or $\beta_n \mid \sigma^2$ that are not conjugate. Note that a wide variety of symmetric distributions can be considered as scale mixtures of normals [1], which implies that any such distribution can be taken as a prior on $\beta_n$ via a hierarchical structure. Note that the Zellner-Siow g-prior model considered in Section 2.4 uses precisely such a structure.
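
As one standard example of this device (given here only as an illustration, not as one of the models analyzed in this work), a Student-$t$ prior on an individual coefficient can be represented as a normal prior whose variance is itself assigned an inverse-gamma distribution, in the spirit of Andrews and Mallows [1]:

\[
\beta_{n,i} \mid \lambda_i \;\sim\; N\bigl(0,\ \lambda_i\,\tau^2\bigr), \qquad
\lambda_i \;\sim\; \text{Inverse-Gamma}\Bigl(\tfrac{\nu}{2},\ \tfrac{\nu}{2}\Bigr)
\quad\Longrightarrow\quad
\beta_{n,i} \;\sim\; t_\nu\bigl(0,\ \tau^2\bigr).
\]

Conditionally on the mixing variables, such a model is again of normal-normal form, which is what makes the hierarchical representation attractive.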

In terms of practical applicability, another important extension to our work would be to determine rates of convergence in cases where posterior consistency does indeed occur. That is, for any particular model, we might wish to find a sequence $r_n(\epsilon) \to 0$ for every $\epsilon > 0$ such that $P^M\bigl( \|\beta_n - \beta_{0n}\|_\infty > \epsilon \mid \hat{\beta}_n, S_n \bigr)$ has exact order $r_n(\epsilon)$ a.s. $(P_0)$. This would help to provide more rigorous statements of the ideas discussed informally in Section 4.2.

Finally, it would also perhaps be interesting to investigate how to extend our ideas to the case where the number of regressors $p_n$ exceeds the sample size $n$. Implementing a Bayesian regression model in these cases would likely require some sort of dimensionality reduction method. The kernel-based dimension reduction framework of Fukumizu et al. [10] or perhaps the sure independence screening of Fan and Lv [7] could prove useful in this regard. Of course, the definition of precisely what is meant by posterior consistency might require careful consideration in settings where dimensionality reduction is carried out.
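
As a point of reference for how such a reduction might look in practice, the following sketch implements the simplest form of marginal screening in the spirit of Fan and Lv [7]: covariates are ranked by the absolute value of their marginal correlation with the response and only the top few are retained. The data, the signal, and the screening size are all hypothetical, and nothing here is part of this dissertation's own methodology.

    import numpy as np

    def sis_screen(X, y, d):
        # Rank covariates by absolute marginal correlation with y and keep the top d
        # (the basic sure independence screening idea of Fan and Lv).
        Xc = (X - X.mean(axis=0)) / X.std(axis=0)
        yc = (y - y.mean()) / y.std()
        marginal_corr = np.abs(Xc.T @ yc) / len(y)
        keep = np.argsort(marginal_corr)[::-1][:d]
        return np.sort(keep)

    # hypothetical data with p_n exceeding n
    rng = np.random.default_rng(2)
    n, p = 100, 500
    X = rng.normal(size=(n, p))
    beta0 = np.zeros(p)
    beta0[:5] = 3.0                                  # five relevant covariates
    y = X @ beta0 + rng.normal(size=n)

    kept = sis_screen(X, y, d=int(n / np.log(n)))    # one common choice of screening size
    print("kept", len(kept), "covariates; all five relevant ones retained:",
          set(range(5)) <= set(kept))

A Bayesian regression model of the kind studied in Chapters 2 and 3 could then be fit to the retained columns, although, as noted above, the definition of posterior consistency after such a data-dependent reduction would itself require careful thought.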

REFERENCES

[1] Andrews, D. F. and Mallows, C. L. (1974). Scale mixtures of normal distributions. Journal of the Royal Statistical Society, Series B 36 99-102.

[2] Berger, J. O., Ghosh, J. K. and Mukhopadhyay, N. (2003). Approximations and consistency of Bayes factors as model dimension grows. Journal of Statistical Planning and Inference 112 241-258.

[3] Bernstein, S. N. (1934). Theory of Probability. GTTI, Moscow.

[4] Blackwell, D. H. and Dubins, L. E. (1962). Merging of opinions with increasing information. Annals of Mathematical Statistics 33 882-886.

[5] Diaconis, P. W. and Freedman, D. A. (1986). On the consistency of Bayes estimates. Annals of Statistics 14 1-26.

[6] Doob, J. L. (1948). Application of the theory of martingales. Colloques Internationaux du Centre National de la Recherche Scientifique 13 23-27.

[7] Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society, Series B 70 849-911.

[8] Fernández, C., Ley, E. and Steel, M. F. (2001). Benchmark priors for Bayesian model averaging. Journal of Econometrics 100 381-427.

[9] Foster, D. P. and George, E. I. (1994). The risk inflation criterion for multiple regression. Annals of Statistics 22 1947-1975.

[10] Fukumizu, K., Bach, F. R. and Jordan, M. I. (2009). Kernel dimension reduction in regression. Annals of Statistics 37 1871-1905.

[11] George, E. I. and Foster, D. P. (2000). Calibration and empirical Bayes variable selection. Biometrika 87 731-747.

[12] Ghosal, S. (1999). Asymptotic normality of posterior distributions in high-dimensional linear models. Bernoulli 5 315-331.

[13] Ghosh, J. K., Sinha, B. K. and Joshi, S. N. (1982). Expansions for posterior probability and integrated Bayes risk. In Statistical Decision Theory and Related Topics III (S. S. Gupta and J. O. Berger, eds.), vol. 1. Academic Press, New York, 403-456.

[14] Jiang, W. (2007). Bayesian variable selection for high-dimensional generalized linear models: Convergence rates of the fitted densities. Annals of Statistics 35 1487-1511.

[15] Kass, R. E. and Wasserman, L. A. (1995). A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion. Journal of the American Statistical Association 90 928-934.

[16] Laplace, P.-S. (1774). Mémoire sur la probabilité des causes par les événements. Mémoires de mathématique et de physique présentés à l'académie royale des sciences, par divers savants, et lus dans ses assemblées.

[17] Le Cam, L. M. (1982). Expansions for posterior probability and integrated Bayes risk. In Statistical Decision Theory and Related Topics III (S. S. Gupta and J. O. Berger, eds.), vol. 1. Academic Press, New York, 403-456.

[18] Liang, F., Paulo, R., Molina, G., Clyde, M. A. and Berger, J. O. (2008). Mixtures of g priors for Bayesian variable selection. Journal of the American Statistical Association 103 410-423.

[19] Mallick, B. K., Ghosh, D. and Ghosh, M. (2005). Bayesian classification of tumors using gene expression data. Journal of the Royal Statistical Society, Series B 67 219-234.

[20] Maruyama, Y. and George, E. I. (2011). Fully Bayes factors with a generalized g-prior. Annals of Statistics 39 2740-2765.

[21] Moreno, E., Girón, F. J. and Casella, G. (2010). Consistency of objective Bayes factors as the model dimension grows. Annals of Statistics 38 1937-1952.

[22] Shang, Z. and Clayton, M. K. (2011). Consistency of Bayesian linear model selection with a growing number of parameters. Journal of Statistical Planning and Inference 141 3463-3474.

[23] von Mises, R. (1964). Mathematical Theory of Probability and Statistics. Academic Press, New York.

[24] Zellner, A. (1986). On assessing prior distributions and Bayesian regression analysis with g-prior distributions. In Bayesian Inference and Decision Techniques (P. K. Goel and A. Zellner, eds.). North-Holland, Amsterdam, 233-243.

[25] Zellner, A. and Siow, A. (1980). Posterior odds ratios for selected regression hypotheses. In Bayesian Statistics (J. M. Bernardo, M. H. DeGroot, D. V. Lindley and A. F. M. Smith, eds.). University Press, Valencia, 585-604.

BIOGRAPHICAL SKETCH

Douglas Kyle Sparks graduated from The McCallie School in 2002 and received his undergraduate education at the University of Florida. He graduated with a B.S. in physics and mathematics in 2006 and joined the graduate program in the Department of Statistics the same year. His research interests include Bayesian theory and methodology, particularly as related to topics in high-dimensional inference and variable selection, as well as the frequentist performance of Bayesian methods.