Citation
Automatic Discovery of Latent Clusters in General Regression Models

Material Information

Title:
Automatic Discovery of Latent Clusters in General Regression Models
Creator:
Sk, Minhazul Islam
Place of Publication:
[Gainesville, Fla.]
Florida
Publisher:
University of Florida
Publication Date:
2017
Language:
English
Physical Description:
1 online resource (107 p.)

Thesis/Dissertation Information

Degree:
Doctorate (Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Computer Science
Computer and Information Science and Engineering
Committee Chair:
BANERJEE, ARUNAVA
Committee Co-Chair:
ENTEZARI, ALIREZA
Committee Members:
RANGARAJAN, ANAND
GHOSH, MALAY

Subjects

Subjects / Keywords:
dirichlet-process -- regression -- statistics
Computer and Information Science and Engineering -- Dissertations, Academic -- UF
Genre:
bibliography ( marcgt )
theses ( marcgt )
government publication (state, provincial, territorial, dependent) ( marcgt )
born-digital ( sobekcm )
Electronic Thesis or Dissertation
Computer Science thesis, Ph.D.

Notes

Abstract:
We present a flexible nonparametric Bayesian framework for automatic detection of local clusters in general regression models. The models are built using techniques that are now considered standard in the statistical parameter estimation literature, namely the Dirichlet Process (DP), Hierarchical Dirichlet Process (HDP), Generalized Linear Model (GLM) and Hierarchical Generalized Linear Model (HGLM). These Bayesian nonparametric techniques have been widely applied to solve clustering problems in the real world. In the first part of this thesis, we formulate all traditional versions of the infinite mixture of GLM models under the Dirichlet Process framework. We study extensively two different inference techniques for these models, namely variational inference and Gibbs sampling. Finally, we evaluate their speed and accuracy on synthetic and real world datasets across various dimensions. In the second part, we present a flexible nonparametric generative model for multigroup regression that detects latent common clusters of groups. We name this ``Infinite MultiGroup Generalized Linear Model'' (iMG-GLM). We present two versions of the core model. First, in iMG-GLM-1, we demonstrate how the use of a DP prior on the groups, while modeling the response-covariate densities via GLM, allows the model to capture latent clusters of groups by noting similar densities. The model ensures different densities for different clusters of groups in the multigroup setting. Secondly, in iMG-GLM-2, we model the posterior density of a new group using the latent densities of the clusters inferred from previous groups as prior. This spares the model from needing to memorize the entire data of previous groups. The posterior inference for iMG-GLM-1 is done using variational inference and that for iMG-GLM-2 using a simple Metropolis Hastings algorithm. We demonstrate iMG-GLM's superior accuracy in comparison to well known competing methods like the Generalized Linear Mixed Model (GLMM), Random Forest and Linear Regression on two real world problems. In the third part, we present a flexible nonparametric generative model for multilevel regression that strikes an automatic balance between identifying common effects across groups and respecting their idiosyncrasies. We name it ``Infinite Mixtures of Hierarchical Generalized Linear Model'' (iHGLM). We demonstrate how the use of an HDP prior in local, group-wise GLM modeling of response-covariate densities allows iHGLM to capture latent similarities and differences within and across groups. We demonstrate iHGLM's superior accuracy in comparison to well known competing methods like GLMM, Regression Tree, Bayesian Linear Regression, ordinary Dirichlet Process regression, and several other regression models on several synthetic and real world datasets. For the final problem, we present a framework that shows how infinite mixtures of Linear Regression (Dirichlet Process mixtures) can be used to design a new denoising technique in the domain of time series data that presumes a model for the uncorrupted underlying signal rather than a model for the noise. Specifically, we show how the nonlinear reconstruction of the underlying dynamical system by way of time delay embedding yields a new solution for denoising where the underlying dynamics is assumed to be highly nonlinear yet low-dimensional. The model for the underlying data is recovered using the nonparametric Bayesian approach and is therefore very flexible.
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Bibliography:
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Thesis:
Thesis (Ph.D.)--University of Florida, 2017.
Local:
Adviser: BANERJEE, ARUNAVA.
Local:
Co-adviser: ENTEZARI, ALIREZA.
Statement of Responsibility:
by Minhazul Islam Sk.

Record Information

Source Institution:
UFRGP
Rights Management:
Applicable rights reserved.
Classification:
LD1780 2017 ( lcc )

Downloads

This item has the following downloads:


Full Text

PAGE 1

AUTOMATIC DISCOVERY OF LATENT CLUSTERS IN GENERAL REGRESSION MODELS

By

MINHAZUL ISLAM SK

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2017

PAGE 2

2017 Minhazul Islam Sk

PAGE 3

I dedicate this dissertation to my parents for their help and contributions in my life.

PAGE 4

ACKNOWLEDGMENTS

First of all, I would like to thank all the people who have helped me in my graduate life. I would like to thank my Ph.D. advisor Dr. Arunava Banerjee, without whom I could not have completed my dissertation. I cannot thank him enough for his help, contribution and motivation in my entire graduate life. I owe a lot of this journey to him as a graduate student. I would also like to thank my Ph.D. committee members, Dr. Anand Rangarajan, Dr. Alireza Entezari, and Dr. Malay Ghosh, for their invaluable suggestions. I would like to thank Rafael Nadal and Bernie Sanders, who have inspired me in my life with their passion, accomplishments and fight for standing up for what is right, especially in times of despair. I would also like to take this opportunity to thank my entire family for helping me to reach this stage of my life, for their financial and moral help in times of distress, for supporting and believing in me, and for raising me to prepare for every adversity in my life.

PAGE 5

TABLE OF CONTENTS

ACKNOWLEDGMENTS ... 4
LIST OF TABLES ... 8
LIST OF FIGURES ... 9
ABSTRACT ... 10

CHAPTER

1 INTRODUCTION ... 12
  Introduction to the Variational Inference of the DP Mixtures of GLM ... 13
  Automatic Detection of Latent Common Clusters in Multigroup Regression ... 16
  Automatic Discovery of Common and Idiosyncratic Effects in Multilevel Regression ... 19
  Denoising Time Series by a Flexible Model for Phase Space Reconstruction ... 22
  Organization of the Dissertation ... 26

2 MATHEMATICAL BACKGROUND ... 27
  Generalized Linear Model ... 27
    Overview ... 27
    Probability Distribution ... 28
    Linear Predictor ... 28
    Link Function ... 28
  Bayesian Statistics ... 29
    Bayes' Theorem and Inference ... 29
    MAP Estimate ... 29
    Conjugate Prior ... 29
  Nonparametric Bayesian ... 30
    Dirichlet Distribution and Dirichlet Process ... 30
    Stick Breaking Representation ... 31
    Chinese Restaurant Process ... 31
    Dirichlet Process Mixture Model ... 33
    Hierarchical Dirichlet Process ... 33
    Chinese Restaurant Franchise ... 34

3 VARIATIONAL INFERENCE FOR INFINITE MIXTURES OF GENERALIZED LINEAR MODELS ... 37
  GLM Models as Probabilistic Graphical Models ... 37
    Normal Model ... 37
    Logistic Multinomial Model ... 37
    Poisson Model ... 38

PAGE 6

    Exponential Model ... 38
    Inverse Gaussian Model ... 39
    Multinomial Probit Model ... 40
  Variational Inference ... 40
  Variational Distribution of the Models ... 41
    Normal Model ... 41
    Logistic Multinomial Model ... 41
    Poisson Model ... 42
    Exponential Model ... 42
    Inverse Gaussian Model ... 43
    Multinomial Probit Model ... 43
  Generalized Evidence Lower Bound (ELBO) ... 43
  Parameter Estimation for the Models ... 44
    Parameter Estimation for the Normal Model ... 45
    Parameter Estimation for the Multinomial Model ... 47
    Parameter Estimation for the Poisson Model ... 47
    Parameter Estimation for the Exponential Model ... 48
    Parameter Estimation for the Inverse Gaussian Model ... 49
    Parameter Estimation for the Multinomial Probit Model ... 51
  Predictive Distribution ... 51
  Experimental Results ... 52
    Datasets ... 54
    Timing Performance for the Normal Model ... 54
    Accuracy ... 55
    Tool to Understand Stock Market Dynamics ... 56

4 AUTOMATIC DETECTION OF LATENT COMMON CLUSTERS OF GROUPS IN MULTIGROUP REGRESSION ... 60
  Models Related to iMG-GLM ... 60
  iMG-GLM Model Formulation ... 60
    Normal iMG-GLM-1 Model ... 61
    Logistic Multinomial iMG-GLM-1 Model ... 62
    Poisson iMG-GLM-1 Model ... 62
  Variational Inference ... 62
    Normal iMG-GLM-1 Model ... 63
    Logistic Multinomial iMG-GLM-1 Model ... 63
    Poisson iMG-GLM-1 Model ... 63
  Parameter Estimation for Variational Distribution ... 64
    Parameter Estimation of iMG-GLM-1 Normal Model ... 64
    Parameter Estimation of iMG-GLM-1 Multinomial Model ... 64
    Parameter Estimation of Poisson iMG-GLM-1 Model ... 65
    Predictive Distribution ... 65
  iMG-GLM-2 Model ... 66
    Information Transfer from Prior Groups ... 66

PAGE 7

    Posterior Sampling ... 67
    Prediction for New Group Test Samples ... 68
  Experimental Results ... 68
    Trends in Stock Market ... 68
    Clinical Trial Problem Modeled by Poisson iMG-GLM Model ... 70

5 AUTOMATIC DISCOVERY OF COMMON AND IDIOSYNCRATIC LATENT EFFECTS IN MULTILEVEL REGRESSION ... 74
  Models Related to HGLM ... 74
  An Illustrative Example ... 74
  iHGLM Model Formulation ... 75
    Normal iHGLM Model ... 75
    Logistic Multinomial iHGLM Model ... 76
  Proof of Weak Posterior Consistency ... 77
  Gibbs Sampling ... 78
  Predictive Distribution ... 80
  Experimental Results ... 81
    Clinical Trial Problem Modeled by Poisson iHGLM ... 81
    Height Imputation Problem ... 83
    Market Dynamics Experiment ... 84

6 SECOND PROBLEM: TIME SERIES DENOISING ... 87
  Time Delay Embedding and False Neighborhood Method ... 87
  NPB-NR Model ... 88
    Step One: Clustering of Phase Space ... 88
    Step Two: Nonlinear Mapping of Phase Space Points ... 89
    Step Three: Restructuring of the Dynamics ... 90
  Experimental Results ... 91
    An Illustrative Description of the NPB-NR Process ... 91
    Prediction Accuracy ... 92
    Noise Reduction Experiment ... 95
    Power Spectrum Experiment ... 95
    Experiment with Dimensions ... 97

7 CONCLUSION AND FUTURE WORK ... 100

REFERENCES ... 102

BIOGRAPHICAL SKETCH ... 107

PAGE 8

LIST OF TABLES

Table ... page
3-1 Description of variational inference algorithms for the models ... 53
3-2 Runtime for Gibbs sampling and variational inference ... 55
3-3 Log-likelihood of the normal model of the predictive distribution ... 56
3-4 MSE and MAE of algorithms for the datasets ... 57
3-5 List of influential stocks on individual stocks ... 58
4-1 Description of variational inference algorithm for iMG-GLM-1 normal model ... 66
4-2 Clusters of stocks from various sectors ... 71
4-3 Mean absolute error for all stocks for iMG-GLM-1 ... 71
4-4 MSE and MAE for clinical trial and patients datasets ... 72
5-1 Description of Gibbs sampling algorithm for iHGLM ... 81
5-2 List of stocks with top 3 significant stocks influencing each stock ... 85
5-3 MSE and MAE of the algorithms for the height imputation dataset ... 85
5-4 MSE and MAE of the algorithms for the clinical trial and patients datasets ... 86
6-1 Step-wise description of NPB-NR process ... 91
6-2 Minimum embedding dimension of the attractors ... 98
6-3 MSE and standard deviation of datasets for all algorithms ... 98
6-4 Noise reduction percentage of the attractors ... 99

PAGE 9

LIST OF FIGURES

Figure ... page
2-1 Stick breaking for the Dirichlet Process ... 32
2-2 Chinese Restaurant Process ... 32
2-3 Plate notation for DPMM ... 33
2-4 Plate notation for HDPMM ... 35
2-5 Plate notation for HDPMM with indicator variables ... 35
2-6 Chinese Restaurant Franchise for HDP ... 36
3-1 Posterior trajectory of the normal model ... 53
3-2 Timings for synthetic datasets per dimension ... 55
4-1 Graphical representation of iMG-GLM-1 model ... 61
4-2 Average MAE for 51 stocks for 50 random runs for iMG-GLM-1 model ... 73
4-3 Average MAE for 10 new stocks for 50 random runs for iMG-GLM-2 model ... 73
5-1 Posterior trajectory of the synthetic dataset with 4 groups ... 75
5-2 Depiction of several clusters in the height imputation dataset ... 86
6-1 Plot of the noisy IBM time series data ... 92
6-2 Depiction of noisy phase space (reconstructed) ... 92
6-3 Clustered phase space and one single cluster ... 93
6-4 Regression data: Y(1) regressed with covariates X(1), X(2) and X(3) ... 93
6-5 Single noise removed cluster and whole noise removed phase space ... 93
6-6 Plot of the noise removed time series data ... 93
6-7 Power spectrum and phase space plot of attractors ... 96

PAGE 10

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

AUTOMATIC DISCOVERY OF LATENT CLUSTERS IN GENERAL REGRESSION MODELS

By

Minhazul Islam Sk

August 2017

Chair: Arunava Banerjee
Major: Computer Science

We present a flexible nonparametric Bayesian framework for automatic detection of local clusters in general regression models. The models are built using techniques that are now considered standard in the statistical parameter estimation literature, namely the Dirichlet Process (DP), Hierarchical Dirichlet Process (HDP), Generalized Linear Model (GLM) and Hierarchical Generalized Linear Model (HGLM). These Bayesian nonparametric techniques have been widely applied to solve clustering problems in the real world. In the first part of this thesis, we formulate all traditional versions of the infinite mixture of GLM models under the Dirichlet Process framework. We study extensively two different inference techniques for these models, namely variational inference and Gibbs sampling. Finally, we evaluate their speed and accuracy on synthetic and real world datasets across various dimensions. In the second part, we present a flexible nonparametric generative model for multigroup regression that detects latent common clusters of groups. We name this "Infinite MultiGroup Generalized Linear Model" (iMG-GLM). We present two versions of the core model. First, in iMG-GLM-1, we demonstrate how the use of a DP prior on the groups, while modeling the response-covariate densities via GLM, allows the model to capture latent clusters of groups by noting similar densities. The model ensures different densities for different clusters of groups in the multigroup setting. Secondly, in

PAGE 11

iMG-GLM-2, we model the posterior density of a new group using the latent densities of the clusters inferred from previous groups as prior. This spares the model from needing to memorize the entire data of previous groups. The posterior inference for iMG-GLM-1 is done using variational inference and that for iMG-GLM-2 using a simple Metropolis Hastings algorithm. We demonstrate iMG-GLM's superior accuracy in comparison to well known competing methods like the Generalized Linear Mixed Model (GLMM), Random Forest and Linear Regression on two real world problems. In the third part, we present a flexible nonparametric generative model for multilevel regression that strikes an automatic balance between identifying common effects across groups and respecting their idiosyncrasies. We name it "Infinite Mixtures of Hierarchical Generalized Linear Model" (iHGLM). We demonstrate how the use of an HDP prior in local, group-wise GLM modeling of response-covariate densities allows iHGLM to capture latent similarities and differences within and across groups. We demonstrate iHGLM's superior accuracy in comparison to well known competing methods like GLMM, Regression Tree, Bayesian Linear Regression, ordinary Dirichlet Process regression, and several other regression models on several synthetic and real world datasets. For the final problem, we present a framework that shows how infinite mixtures of Linear Regression (Dirichlet Process mixtures) can be used to design a new denoising technique in the domain of time series data that presumes a model for the uncorrupted underlying signal rather than a model for the noise. Specifically, we show how the nonlinear reconstruction of the underlying dynamical system by way of time delay embedding yields a new solution for denoising where the underlying dynamics is assumed to be highly nonlinear yet low-dimensional. The model for the underlying data is recovered using the nonparametric Bayesian approach and is therefore very flexible.

PAGE 12

CHAPTER 1
INTRODUCTION

This dissertation comprises, primarily, two parts, with nonparametric Bayesian theories providing the central theme. The first part deals with a Bayesian nonparametric approach to clustering of regression models in various hierarchical settings. This part is divided into three subtopics. In the first subtopic, we outline variational inference algorithms for already existing classes of infinite mixtures of Generalized Linear Models. In the second subtopic, we present a generative model framework for automatic detection of latent common clusters of groups in multigroup regression. In the third subtopic, we formulate a generative model for automatic discovery of common and idiosyncratic latent effects in multilevel regression. The second part deals with a problem of denoising time series by way of a flexible model for phase space reconstruction using variational inference of infinite mixtures of linear regression. Each part is outlined in the following paragraphs.

In machine learning and statistics, regression theory is a process for approximating functional relationships among different entities/variables. It comprises methods for modeling the relationship between multiple variables, where the first set is termed independent variables, predictors, or covariates, while the other set is called dependent or response variables. In general, regression theory evaluates the expectation of the conditional distribution of the response given the covariates. Another important parameter is the variance of the response conditional density given the covariates. In the first part of this dissertation, we present flexible nonparametric Bayesian frameworks for automatic detection of local clusters in general regression models in various grouped as well as non-grouped data. In the other part, we lay out a time series denoising technique using a dynamical system approach that uses phase space reconstruction of the time series under consideration and then removes the noise

PAGE 13

in the phase space and finally reconstructs the original noise-removed time series, all in the context of Bayesian nonparametrics.

Introduction to the Variational Inference of the Dirichlet Process Mixtures of Generalized Linear Models

The Generalized Linear Model (GLM) was proposed in Nelder and Wedderburn (1972) to bring erstwhile disparate techniques, such as Linear regression, Logistic regression, Poisson regression, Inverse Gaussian, Multinomial Probit, and Exponential regression, under a unified framework. Generally, regression in its canonical form assumes that the response variable follows a given probability distribution with its support determined by a linear combination of the covariates. Formally stated, $Y \mid X \sim f(X^T\beta)$. Here f, in the case of Linear regression, is the Normal distribution; in the case of Logistic and Poisson regression, it is the Multinomial and Poisson distribution, respectively. There are two pieces to the above equation that GLM generalizes. Firstly, f is generalized to the exponential family. Secondly, the function that maps the response mean (μ) to $X^T\beta$, which in the case of linear regression is the identity function ($X^T\beta = g(\mu) = \mu$), is generalized to any member of a set of link functions. Common link functions include the logit, probit and complementary log-log functions. A GLM model is formally defined as:

\[ f(y; \theta, \psi) = \exp\left( \frac{y\theta - b(\theta)}{a(\psi)} + c(y; \psi) \right) \tag{1-1} \]

Here, ψ is a dispersion parameter. The mean response is given by the following equation,

\[ E[Y \mid X] = b'(\theta) = \mu = g^{-1}(X^T\beta) \tag{1-2} \]

Here g is the link function and $X^T\beta$ is the linear predictor.

PAGE 14

Notwithstanding its generality, GLM suffers from two intrinsic weaknesses, which the authors in Hannah et al. (2011) addressed using the Gaussian model. Firstly, the covariates are associated with the model via only a linear function. Secondly, the variances of the responses are not associated with the individual covariates. We resolve these issues in line with Hannah et al. (2011) by introducing a mixture of GLMs and, furthermore, in order to allow the data to choose the number of clusters, we impose a Dirichlet Process prior as formulated in Ferguson (1973). Additionally, we extend the models from just Linear and Logistic regression to all the traditional models of GLM which we have mentioned above.

For inference, a widely applicable MCMC algorithm, namely Gibbs sampling Neal (2000a), was employed in Hannah et al. (2011) for prediction and density estimation using the Polya urn scheme of the Dirichlet Process Blackwell and MacQueen (1973). In spite of the generality and strength of these models, the inherent deficiencies of Gibbs sampling significantly reduce its practical utility. As is well known, Gibbs sampling approximates the original posterior distribution by sampling using a Markov chain. However, Gibbs sampling is prohibitively slow and, moreover, its convergence is very difficult to diagnose. In high dimensional regression problems, Gibbs sampling seldom converges to the target posterior distribution in suitable time, leading to significantly poor density estimation and prediction Robert and Casella (2001). Although there are theoretical bounds on the mixing time, in practice they are not particularly useful.

To alleviate these problems, we introduce a fast and deterministic mean field variational inference algorithm for superior prediction and density estimation of the GLM mixtures. Variational inference is deterministic and possesses an optimization criterion which can be used to assess convergence. Variational methods were introduced in the context of graphical models in M. Jordan and Saul (2001). For Bayesian applications, variational inference was employed in Ghahramani and Beal (2000). Variational inference has found wide applications in

PAGE 15

hierarchical Bayesian models such as Latent Dirichlet Allocation D. Blei and Jordan (2003), Dirichlet process mixtures Blei and Jordan (2006) and the Hierarchical Dirichlet Process Teh et al. (2006). To the best of our knowledge, this dissertation introduces variational inference for the first time to nonparametric Bayesian regression. The main contributions of this part are as follows:

- We derive the variational inference model separately for all GLM models according to the stick breaking representation of the Dirichlet Process Sethuraman (1994). These models differ significantly in terms of the type of covariate and response data, which leads to markedly different variational distributions, parameter estimation and predictive distributions. In each case, we formulate a class of decoupled and factorized variational distributions as surrogates for the original posterior distribution. We then maximize the lower bound (resulting from imposing Jensen's inequality on the log likelihood) to obtain the optimal variational parameters. Finally, we derive the predictive distribution from the posterior approximation to predict the response variable conditioned on a new covariate and the past response-covariate pairs.

- We demonstrate the accuracy of our variational approach across different metrics, such as relative mean square and absolute error, in high dimensional problems against Linear regression, Bayesian and variational Linear regression, Gaussian Process regression, and the Gibbs sampling inference in various training/testing data splits. We evaluate the log likelihood of the predictive distribution in varying dimensions to show the superiority of variational inference against Gibbs sampling in accuracy. Gibbs sampling fails to converge as the dimension progressively rises.

- We experimentally show that variational inference converges substantially faster than Gibbs sampling, thereby becoming a natural choice for practical high dimensional regression problems. We show the timing performance per dimension with

PAGE 16

the dimension varying from a low to a very large value for both variational and Gibbs sampling inference on a synthetic dataset, a compiled stock market dataset, and a disease dataset.

Introduction to Automatic Detection of Latent Common Clusters of Groups in MultiGroup Regression

Multigroup regression is the method of choice for research design whenever response-covariate data is collected across multiple groups. When a common regressor is learned on the amalgamated data, the resultant model fails to identify effects for the responses specific to individual groups, because the underlying assumption is that the response-covariate pairs are drawn from a single global distribution, when the reality might be that the groups are not statistically identical, making the joining of them inappropriate. Modeling separate groups via separate regressors results in a model that is devoid of common latent effects across the groups. Such a model does not exploit the patterns common among the groups, which would ensure the transferability of information among groups in the regression setting. This is of particular importance when the training set is very small for many of the groups. Joint learning, by sharing knowledge between the statistically similar groups, strengthens the model for each group, and the resulting generalization in the regression setting is vastly improved.

The complexities that underlie the utilization of the information transfer between the groups are best motivated through examples. In clinical trials, for example, a group of people are prescribed either a new drug or a placebo to estimate the efficacy of the drug for the treatment of a certain disease. At a population level, this efficacy may be modeled using a single Normal or Poisson mixed model distribution with mean set as a (linear or otherwise) function of the covariates of the individuals in the population. A closer inspection might however disclose potential factors that explain the efficacy results better. For example, there might be regularities at the group level: Caucasians as a whole might react differently to the drug than, say, Asians, who might, furthermore,

PAGE 17

comprise many groups. Identifying this across-group information would therefore improve the accuracy of the regressor. Similarly in the stock market, future values and trends for a group of stocks are predicted for various sectors such as energy, materials, consumer discretionary, financial, technology, etc. Within each sector, various stocks share trends, and therefore predicting them together (modeling them with the same time series via an autoregressive density) is usually much more accurate than predicting and capturing individual trends. Modeling the latent common clustering effects of cross-cutting subgroups is therefore an important problem to solve. We present a framework here that accomplishes this.

For multigroup regression, the Generalized Linear Mixed Model (GLMM) Breslow and Clayton (1993) and the Hierarchical Generalized Linear Mixed Model Lee and Nelder (1996) have been developed, where similarity between groups is captured through a fixed effect and variation across groups is captured through random effects. Statistically, these models are very rigid, since every group is forced to manifest the same fixed effect, while the random effect only represents the intercept parameter of the linear predictors. Clusters of groups may have significantly different properties from other clusters of groups, a feature that is not captured in these traditional GLM based models. Furthermore, various clusters of groups may have different uncertainties with respect to the covariates, which we denote as heteroscedasticity. In recent progress, Bakker and Heskes (2003) proposed a Bayesian hierarchical model, where a prior is used for the mixture of groups. Nevertheless, individual groups are given weights as opposed to jointly learning various groups. Also, the number of mixtures is fixed in advance.

Before presenting our algorithm, we describe our basis for identifying group-correlation. First, two groups are correlated if their responses follow the same distribution. Second, two groups that have the same response variance with respect to the covariates are deemed to be correlated. This is achieved via a Dirichlet Process prior

PAGE 18

on the groups and the covariate coefficients (β). The posterior is obtained by appropriately combining the prior and the data likelihood from the given groups. The prior helps cluster the groups, and the likelihood from the individual groups helps in the sharing of trends between groups to create the single posterior density between the many potential groups, thereby leading to group-correlation.

We now present an overview of our iMG-GLM framework. Our objective is to achieve (a) shared learning of various groups in a regression setting, where data may vary in terms of temporal, geographical or other modalities, and (b) automatic clustering of groups which display correlation. iMG-GLM-1 solves this task. In iMG-GLM-2, we model a completely new group after modeling previous groups through parameters learned in iMG-GLM-1. In the first part, the regression parameters are given a Dirichlet Process prior; that is, they are drawn from a DP with the base distribution set as the density of the regression parameters. Since a draw from a DP is an atomic density, to begin, one group will be assigned one density of the regression parameters, which signifies the response density with respect to its covariates. As the drawn probability weight from the DP increases, the cluster starts to consume more and more groups in this multigroup setting. We employ a variational Bayes algorithm for the inference procedure in iMG-GLM-1 for computational efficiency. iMG-GLM-1 is then extended to iMG-GLM-2 for modeling a completely new group. Here we transfer the information (covariate coefficients) obtained in the first part to learning a new group. In essence, the cluster parameters (covariate coefficients for the whole group) are used as a prior distribution for the model parameters of the new group's response density. This therefore leads to a mixture model where the weights are given by the number of groups that one cluster consumed in the first part and the mixture components are the regression parameters obtained for that specific cluster. The likelihood comes from the data of the new group. We use a simple accept-reject based Metropolis Hastings algorithm to generate samples from the posterior for the new group regression parameter density.
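The accept-reject sampler mentioned above is, in essence, a random-walk Metropolis Hastings update on the new group's regression coefficients. The following is a minimal sketch of that generic update, not the exact sampler used in Chapter 4; the `log_post` callable (standing in for the mixture-of-clusters prior from iMG-GLM-1 plus the new group's likelihood) and all names are illustrative assumptions.

```python
import numpy as np

def metropolis_hastings(log_post, beta0, n_iter=5000, step=0.1, rng=None):
    """Random-walk Metropolis Hastings over regression coefficients.

    log_post: callable returning the unnormalized log posterior of beta.
    beta0:    initial coefficient vector (numpy array).
    """
    rng = rng or np.random.default_rng()
    beta, lp = beta0, log_post(beta0)
    samples = []
    for _ in range(n_iter):
        prop = beta + step * rng.standard_normal(beta.shape)  # symmetric proposal
        lp_prop = log_post(prop)
        # accept-reject step: accept with probability min(1, exp(lp_prop - lp))
        if np.log(rng.uniform()) < lp_prop - lp:
            beta, lp = prop, lp_prop
        samples.append(beta.copy())
    return np.array(samples)
```

Predictive densities for new test samples can then be approximated by Monte Carlo integration over the returned samples, as described next.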

PAGE 19

For both iMG-GLM-1 and iMG-GLM-2, we use Monte Carlo integration for evaluating the predictive density of the new test samples. We evaluate both iMG-GLM-1 and iMG-GLM-2 Normal models on two real world problems. The first is the prediction and finding of trends in the stock market. We show how information transfer between groups helps our model to effectively predict future stock values by varying the number of training samples in both previous and new groups. In the second, we show the efficacy of the iMG-GLM-1 and iMG-GLM-2 Poisson models against their competitors in a very important clinical trial problem setting.

Introduction to Automatic Discovery of Common and Idiosyncratic Latent Effects in Multilevel Regression

The Hierarchical Generalized Linear Model (HGLM), proposed in Lee and Nelder (1996), extends GLM to already grouped observations. The Hierarchical Generalized Linear Model is formally defined as:

\[ f(y; \theta, \psi, u) = \exp\left( \frac{y\theta - b(\theta)}{a(\psi)} + c(y; \psi) \right) \tag{1-3} \]

Here, ψ is a dispersion parameter and u is the random effect component. The mean response is $E[Y \mid X] = b'(\theta) = \mu = g^{-1}(X^T\beta + v)$, where g is the link function, $X^T\beta$ is the linear predictor and v is a strictly monotonic function of u, {v = v(u)}. Here, v signifies over-dispersion. u has a prior distribution chosen appropriately.

Therefore, in HGLM, the separate densities are characterized by two main components. First, there is a fixed effect parameter, $(X^T\beta)$, of the density, which includes the covariates X and their coefficients (β). They are the same for all the groups. Secondly, there is a random effect part (v) which is different in different groups. Notwithstanding its generality and effectiveness, the inherent assumptions in HGLM limit its performance and need to be relaxed.

Firstly, the random effect (u) is not a function of the linear transformation of the covariates, $X^T\beta$. Therefore, this automatically assumes that the mean function and the

PAGE 20

variance of the outcomes in different groups depend neither on the covariates, X, nor on the coefficients β. This makes the model suitable only for grouped data where properties of the outcomes in different groups vary independently of covariates. Secondly, although the response-covariate pairs are grouped, two different pairs in the same group may come from different response-covariate densities. Likewise, any two pairs from two different groups may be generated from the same density. Therefore, we need a robust model that captures this hidden intra/inter clustering effect in already grouped data. Thirdly, the covariate ($X^T\beta$) is associated with the response-covariate density only through a linear function. Although we can introduce a non-linear function for the response at the output, it does not include the covariates. Finally, data may be heteroscedastic within the individual group also, i.e. the variance of the response may be a function of the predictors within each group. The response variance, however, does not depend on the predictors in ordinary HGLM. Some later versions Lee and Nelder (2001a) of HGLM pick up heteroscedasticity between the groups (different variance for different groups), but within specific groups the response variance does not vary with covariates.

Many examples of the kind of problem that motivates us can be found in clinical trials, tree height imputation and other areas. In clinical trials IBM (2011), a group of people are given either a new drug or a placebo to estimate the effect of the new drug for treatment of a certain disease. Normally, these are modeled by a Normal or Poisson mixed model, which predicts the effectiveness of the new drug. In practice, it has been found that different people react differently to new drugs. Also, persons in different groups can behave similarly to the new drug. Therefore, prediction of the usefulness of the new drug as a whole is not perfect. Also, the variability of the reaction must be different among people within groups and across different groups, and it must depend on covariates such as treatment center size, gender, age etc. In height imputation Robinson and Wykoff (2004) for forest stands, heights are generally regressed with

PAGE 21

various tree attributes like tree diameter, past increments etc., which gives a projection for forest development under various management strategies. These are modeled by a traditional Normal GLMM where the free coefficient (β₀) becomes the random effect. The underlying assumption is that trees in one stand have the same growth properties while having completely different growth properties in different stands, which is not true. We need a model robust enough to capture these shared growth properties among stands for proper projection of overall forest development. Also, the model should pick up the variance in growth measurements w.r.t. the diameters, past increments and other tree attributes across stands.

In this dissertation, we alleviate these assumptions of HGLM by developing iHGLM, a nonparametric Bayesian mixture model of the Hierarchical Generalized Linear Model. The iHGLM framework is specified for all the models of HGLM, i.e. Normal, Poisson, Logistic, Inverse Gaussian, Probit, Exponential etc. In iHGLM, we model outcomes in the same group via mixtures of local densities. This captures locally similar regression patterns, where each local regression is effectively a GLM. To force the density of the covariates, X, and their coefficients, β, to be shared among groups, we make the coefficients, β, and the covariates, X, for different groups be generated from the same prior atomic distribution. An atomic distribution places finite probabilities on a few outcomes of the sample space. When the coefficients, β, and the covariates, X, are drawn from this atomic density, it enables the X and β in different groups to share densities. In this way, in the Bayesian setting, along with the density of the random effect (u), the density of the fixed effect ($X^T\beta$) is also shared among groups. We obtain this prior atomic density for the fixed and random effects, while ensuring a large support, through a Hierarchical Dirichlet Process (HDP) prior Y.W. Teh and Blei (2006).

From the HDP prior, our main goal is to generate prior densities of the fixed effect ($X^T\beta$) and the random effect (u) for each group. We draw a density G₀ from a Dirichlet Process (DP(γ, H))

PAGE 22

Ferguson (1973). In this case, H (the base distribution) is basically the set of densities in the parameter space of the fixed ($X^T\beta$) and random (u) effects. According to Sethuraman (1994), this ensures that G₀ is atomic, yet has a broad support. Therefore, G₀ is an atomic density in the parameter space of u and ($X^T\beta$) which puts finite probabilities on several discrete points which act as its support. Then, for each group, we draw group specific densities G_j from DP(α₀, G₀). Since G₀ is already atomic, according to Sethuraman (1994), each G_j is also atomic, and hence the supports of the group specific densities G_j must share common points in their respective parameter space of the fixed ($X^T\beta$) and random (u) effects. Now, this G_j acts as the prior density for the u and $X^T\beta$ of each group. Subsequently, both u and $X^T\beta$ are modeled through mixtures of local densities which are shared among groups. For each component (clusters within groups) in the mixture of response-covariate densities in a single group, although the mean function is linear, marginalizing out the local distribution creates a non-linear mean function. In addition, the variance of the responses varies among mixture components (clusters), thereby varying among covariates. The nonparametric model ensures that the data determine the number of mixture components (clusters) in specific groups and the nature of the local GLMs.

Introduction to Denoising Time Series by Way of a Flexible Model for Phase Space Reconstruction

In this part, we outline a technique for denoising a time series by way of a flexible model for phase space reconstruction. Noise is a high dimensional dynamical system which limits the extraction of quantitative information from experimental time series data. Successful removal of noise from time series data requires a model either for the noise or for the dynamics of the uncorrupted time series. For example, in wavelet based denoising methods for time series Mallat and Hwang (1992); Site and Ramakrishnan (2000), the model for the signal assumes that the expected output of a forward/inverse

PAGE 23

wavelet transform of the uncorrupted time series is sparse in the wavelet coefficients. In other words, it is presupposed that the signal energy is concentrated on a small number of wavelet basis elements; the remaining elements with negligible coefficients are considered noise. Hard-threshold wavelet Zhang et al. (2001) and soft-threshold wavelet David and Donoho (1995) are two widely known noise reduction methods that subscribe to this model. Principal Component Analysis, on the other hand, assumes a model for the noise: the variance captured by the least important principal components. Therefore, denoising is accomplished by dropping the bottom principal components and projecting the data onto the remaining components.

In many cases, the time series is produced by a low-dimensional dynamical system. In such cases, the contamination of noise in the time series can disable measurements of the underlying embedding dimension Kostelich and Yorke (1990), introduce extra Lyapunov exponents Badii et al. (1988), obscure the fractal structure Grassberger et al. (1991) and limit prediction accuracy Elshorbagy and Panu (2002). Therefore, reduction of noise while maintaining the underlying dynamics generated from the time series is of paramount importance. A widely used method in time series denoising is low-pass filtering. Here noise is assumed to constitute all high frequency components, without reference to the characteristics of the underlying dynamics. Unfortunately, low pass filtering is not well suited to non-linear chaotic time series Wang et al. (2007). Since the power spectrum of low-dimensional chaos resembles a noisy time series, removal of the higher frequencies distorts the underlying dynamics, thereby adding fractal dimensions Mitschke et al. (1988).

In this dissertation, we present a phase space reconstruction based approach to time series denoising. The method is founded on Takens' Embedding Theorem Takens (1981), according to which a dynamical system can be reconstructed from a sequence of observations of the output of the system (considered, here, the time series). This

PAGE 24

respects all properties of the dynamical system that do not change under smooth coordinate transformations. Informally stated, the proposed technique can be described as follows. Consider a time series, x(1), x(2), x(3), ..., corrupted by noise. We first reconstruct the phase space by taking time delayed observations from the noisy time series (for example, ⟨x(i), x(i+1)⟩ forms a phase space trajectory in 2 dimensions). The minimum embedding dimension (i.e., number of lags) of the underlying phase space is determined via the False Neighborhood method Kennel et al. (1992). Next, we cluster the phase space without imposing any constraints on the number of clusters. Finally, we apply a nonlinear regression to approximate the temporally subsequent phase space points for each point in each cluster via a nonparametric Bayesian approach. Henceforth, we refer to our technique by the acronym NPB-NR, standing for nonparametric Bayesian approach to noise reduction in time series.

To elaborate, the second step clusters the reconstructed phase space of the time series through an infinite mixture of Gaussian distributions via the Dirichlet Process Ferguson (1973). We consider the entire phase space to be generated from a Dirichlet Process mixture (DP) of some underlying density Escobar and West (1995). The DP allows the phase space to choose as many clusters as fit its dynamics. The clusters pick out small neighborhoods of the phase space where the subsequent non-linear approximation is performed. As the latent underlying density of the phase space is unknown, modeling it with an infinite mixture model allows NPB-NR to correctly find the phase space density. This is because of the guarantee of posterior consistency of Dirichlet Process mixtures under a Gaussian base density S. Ghosal and Ramamoorthi (1999). Therefore, we choose the mixing density to be Gaussian. The posterior consistency acts as a frequentist justification of Bayesian methods: as more data arrives, the posterior density concentrates on the true underlying density of the data.
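As a concrete illustration of the first step, a minimal delay-embedding helper is sketched below. The function name `delay_embed` is hypothetical; in practice the embedding dimension `dim` would be chosen by the False Neighborhood method rather than fixed by hand.

```python
import numpy as np

def delay_embed(x, dim, tau=1):
    """Reconstruct a phase space from a scalar time series by time delay
    embedding: row t is <x(t), x(t+tau), ..., x(t+(dim-1)*tau)>."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])

# e.g., a 2-dimensional reconstruction pairing x(i) with x(i+1), as in the text
points = delay_embed(np.sin(0.1 * np.arange(500)), dim=2, tau=1)
```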

PAGE 25

In the third step, our goal is to non-linearly approximate the dynamics in each cluster formed above. We use a DP mixture of linear regression to non-linearly map each point in a cluster to its image (the temporally subsequent point in the phase space). In this infinite mixture of regressions, we model the data in a specific cluster via a mixture of local densities (Normal densities with a linear combination of the covariates (X) as the mean). Although the mean function is linear for each local density, marginalizing over the local distribution creates a non-linear mean function. In addition, the variance of the responses varies among mixture components in the clusters, thereby varying among covariates. The nonparametric model ensures that the data determines the number of mixture components in specific clusters and the nature of the local regressions. Again, the basis for the infinite mixture model of linear regression is the guarantee of posterior consistency Tokdar (2006).

In the final step, we restructure the dynamics by minimizing the sum of the deviations between each point in the cluster and its pre-image (previous temporal point) and post-image (next temporal point) yielded by the non-linear regression described above. To create a noise-removed time series out of the phase space, readjustment of the trajectory is done by maintaining the coordinates of the phase space points to be consistent with time delay embedding.

We demonstrate the accuracy of the NPB-NR model across several experimental settings, such as noise reduction percentage and power spectrum analysis, on several dynamical systems like the Lorenz, Van der Pol, Buckling Column, GOPY, Rayleigh and Sinusoid attractors, as compared to low pass filtering. We also show the forecasting performance of the NPB-NR method on time series datasets from various domains, like the DOW 30 index stocks, LASER dataset, Computer Generated Series, Astrophysical dataset, Currency Exchange dataset, US Industrial Production Indices dataset, Darwin Sea Level Pressure dataset and Oxygen Isotope dataset, against some of its competitors like GARCH, AR, ARMA, ARIMA, PCA, Kernel PCA and Gaussian Process regression.

PAGE 26

Organization of the Dissertation

In Chapter 2, we briefly describe Generalized Linear Models and Bayesian inference theory, with focus on the nonparametric Bayesian framework and its various representations. In Chapter 3, we outline the variational inference of Dirichlet Process mixtures of Generalized Linear Models. Chapter 4 presents the clustering models for multigroup regression. Chapter 5 outlines the automatic detection of latent common and idiosyncratic effects in multilevel regression. Finally, in Chapter 6, we present the time series denoising method in detail. Chapter 7 discusses future directions.

PAGE 27

CHAPTER 2
MATHEMATICAL BACKGROUND

Generalized Linear Model

Overview

Generalized Linear Models were proposed in Nelder and Wedderburn (1972) to generalize Linear Regression by allowing the outcome/response variables to be distributed according to distributions other than the standard Normal distribution. It brought together several regression models, such as Logistic regression, Poisson regression, Probit regression etc., under a common framework. In a Generalized Linear Model (GLM), the response variable given the covariates/independent variables follows an exponential family distribution (which therefore includes the Normal, Binomial, Poisson and Gamma distributions etc.). The expectation/mean of the distribution, μ, generally depends on the covariates/independent variables, X, via the following equation:

\[ E(Y) = \mu = g^{-1}(X\beta) \tag{2-1} \]

Here E(Y) is the mean of the response distribution or the expected value of the response, Xβ is the linear combination of the covariates with coefficients β, and g is termed the link function. The unknown parameters, β, are generally estimated with maximum quasi-likelihood, maximum likelihood or Bayesian techniques.

The GLM framework operates using three components:

- An exponential family probability distribution.
- A linear combination of the covariates, the linear predictor, Xβ.
- A link function g, which links the linear predictor to the mean of the response distribution, such that Xβ = g(μ), so that E(Y) = μ = g⁻¹(Xβ).

PAGE 28

Probability Distribution

A GLM model is therefore formally defined in terms of a probability distribution as:

\[ f(y; \theta, \psi) = \exp\left( \frac{y\theta - b(\theta)}{a(\psi)} + c(y; \psi) \right) \tag{2-2} \]

Here, ψ is a dispersion parameter. There are many common distributions that belong to this exponential family: the Normal, Gamma, Beta, Dirichlet, Multinomial etc.

Linear Predictor

The linear predictor is the linear combination of the independent variables, Xβ. This is the entity that gathers the information about the independent variables and then includes them in the model. It is also tightly related to the link function, which we describe in the next section: η = Xβ.

Link Function

The link function links the expectation/mean of the response distribution to the linear predictor. So, the linear predictor enters the model via this function, which yields the mean response of the distribution. There are many commonly used link functions in the Generalized Linear Model family. For the Normal model, Xβ = μ, the identity link function. For the Exponential and Gamma models, it is the inverse link, Xβ = μ⁻¹. For the Inverse Gaussian model, the link function is the inverse squared, Xβ = μ⁻². For the Poisson model, the link function is the log link, Xβ = ln(μ). For the Bernoulli and Categorical/Multinomial models, it is the logit function, ln(μ/(1−μ)).
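To make the list above concrete, here is a small sketch (in Python, not code from the dissertation) of these link/inverse-link pairs, where Xβ = g(μ) and μ = g⁻¹(Xβ):

```python
import numpy as np

# Canonical links g and inverses g^{-1} for the GLM families listed above.
links = {
    "normal":    (lambda mu: mu,                     lambda eta: eta),           # identity
    "gamma":     (lambda mu: 1.0 / mu,               lambda eta: 1.0 / eta),     # inverse
    "inv_gauss": (lambda mu: mu ** -2.0,             lambda eta: eta ** -0.5),   # inverse squared
    "poisson":   (lambda mu: np.log(mu),             lambda eta: np.exp(eta)),   # log
    "bernoulli": (lambda mu: np.log(mu / (1 - mu)),
                  lambda eta: 1.0 / (1.0 + np.exp(-eta))),                       # logit
}

g, g_inv = links["poisson"]
eta = np.array([0.5, 1.0])   # linear predictor X*beta
mu = g_inv(eta)              # mean response E(Y) = g^{-1}(X*beta)
```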

PAGE 29

Bayesian Statistics

Bayes' Theorem and Inference

Bayesian inference is a manner of doing statistical inference where we use Bayes' theorem to update the probability of an unknown quantity as we gather more and more information. There is a prior distribution P(θ|α) for the unknown quantity θ (here, α is the hyperparameter), and the observed data (X₁, X₂, ...) is modeled as distributed independently and identically (i.i.d.) according to a distribution P(X|θ). Now, given this data, according to Bayes' rule, the posterior distribution of θ is given by

\[ P(\theta \mid X, \alpha) = \frac{P(X \mid \theta)\, P(\theta \mid \alpha)}{P(X \mid \alpha)} = \frac{P(X \mid \theta)\, P(\theta \mid \alpha)}{\int P(X \mid \theta)\, P(\theta \mid \alpha)\, d\theta} \tag{2-3} \]

Here, P(X|α) is known as the marginal likelihood.

MAP Estimate

The MAP estimate is the mode (optimum) of the posterior distribution. This is nothing but a point estimate of the unknown parameter based on the observed data. It is obtained by optimizing the posterior with respect to the unknown parameter. This is given by,

\[ \theta_{\mathrm{MAP}} = \arg\max_{\theta} P(\theta \mid X, \alpha) \tag{2-4} \]

This is easy to evaluate when the posterior has a closed form known distribution, which brings in the idea of the conjugate prior.

Conjugate Prior

When the posterior distribution, p(θ|X, α), has the same analytical form as the prior distribution p(θ|α), they are termed conjugate to each other. In that case, the prior becomes a conjugate prior for the likelihood p(X|θ). This is an algebraic convenience where the posterior distribution can be determined in a closed form. For example,

PAGE 30

the Gaussian distribution is conjugate to another Gaussian (where only the mean is unknown), the Dirichlet distribution is conjugate for the Multinomial likelihood, and the Beta density is the conjugate for the Binomial likelihood. Every exponential family distribution has a conjugate prior.

Nonparametric Bayesian

The analytical form of the data distribution is assumed in parametric Bayesian theory. This is very limiting in the sense that the number of parameters in the model does not depend on the data; rather, it is pre-fixed. But in nonparametric Bayesian statistics, the parameter space is infinite-dimensional. As the model obtains more and more data, it automatically evaluates the status of the existing parameters or adds more parameters to suitably reflect the data. Nonparametric Bayesian statistics has been studied extensively in machine learning in the domains of classification, regression, financial markets, time series prediction, dynamical systems etc.

Dirichlet Distribution and Dirichlet Process

The Dirichlet distribution is a multivariate version of the Beta distribution. It is defined on the K-dimensional simplex. If x = (x₁, x₂, ..., x_K) represents a K-dimensional probability space, such that ∀i, xᵢ ≥ 0 and $\sum_{k=1}^{K} x_k = 1$, then the Dirichlet distribution is given by,

\[ \mathrm{Dir}(x_1, \ldots, x_K \mid \alpha_1, \alpha_2, \ldots, \alpha_K) = \frac{\Gamma\!\left(\sum_{k=1}^{K} \alpha_k\right)}{\prod_{k=1}^{K} \Gamma(\alpha_k)} \prod_{k=1}^{K} x_k^{\alpha_k - 1} \tag{2-5} \]

Here, $E[x_k] = \frac{\alpha_k}{\sum_{k=1}^{K} \alpha_k}$ and $\mathrm{Var}[x_k] = \frac{\alpha_k \left(\sum_{k=1}^{K} \alpha_k - \alpha_k\right)}{\left(\sum_{k=1}^{K} \alpha_k\right)^2 \left(\sum_{k=1}^{K} \alpha_k + 1\right)}$.

The Dirichlet distribution is the conjugate prior to the categorical and multinomial distributions. Therefore, when the data likelihood follows a categorical/multinomial distribution, the prior should be a Dirichlet distribution to get a Dirichlet distribution as the posterior.
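As a quick numeric illustration of this conjugacy (a sketch, not taken from the text): with a Dir(α₁, ..., α_K) prior and observed category counts (n₁, ..., n_K), the posterior is Dir(α₁ + n₁, ..., α_K + n_K), so the update is a single vector addition.

```python
import numpy as np

alpha = np.array([1.0, 1.0, 1.0])        # Dirichlet prior pseudo-counts
counts = np.array([8, 3, 1])             # observed multinomial counts
posterior = alpha + counts               # closed-form conjugate posterior

post_mean = posterior / posterior.sum()  # E[x_k] = alpha_k / sum_j alpha_j
```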

PAGE 31

A Dirichlet Process Ferguson (1973), DP(α₀, G₀), is defined as a probability measure over (X, B(X)) such that for any finite partition X = A₁ ∪ A₂ ∪ ... ∪ A_K,

\[ (G(A_1), G(A_2), \ldots, G(A_K)) \sim \mathrm{Dir}(\alpha_0 G_0(A_1), \alpha_0 G_0(A_2), \ldots, \alpha_0 G_0(A_K)) \tag{2-6} \]

DP(α₀, G₀) is thus a probability distribution on a sample space of probability distributions. Here, α₀ is the concentration parameter and G₀ is the base distribution. We have E[G(A)] = G₀(A) and V[G(A)] = G₀(A)(1 − G₀(A))/(α₀ + 1), where A is any subset of X belonging to its sigma algebra. There are two well known representations of the Dirichlet Process, which we describe below.

Stick Breaking Representation

According to the stick-breaking construction Sethuraman (1994) of the DP, a sample G from the DP is an atomic distribution with countably infinite atoms drawn from G₀:

\[ v_i \mid \alpha_0, G_0 \sim \mathrm{Beta}(1, \alpha_0), \quad \theta_i \mid \alpha_0, G_0 \sim G_0, \quad M_i = v_i \prod_{l=1}^{i-1} (1 - v_l), \quad G = \sum_{i=1}^{\infty} M_i \, \delta_{\theta_i} \tag{2-7} \]

Chinese Restaurant Process

A second representation of the Dirichlet process is given by the Polya urn process Blackwell and MacQueen (1973). This clearly demonstrates the clustering property of the Dirichlet Process. Let θ₁, θ₂, ... be independent and identically distributed draws from G. Then the conditional distribution of θ_n given θ₁, ..., θ_{n−1} is given by,

\[ \theta_n \mid \theta_1, \ldots, \theta_{n-1}, \alpha_0, G_0 \sim \sum_{i=1}^{n-1} \frac{1}{n - 1 + \alpha_0} \, \delta_{\theta_i} + \frac{\alpha_0}{n - 1 + \alpha_0} \, G_0 \tag{2-8} \]
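A truncated simulation makes the stick-breaking construction (2-7) concrete. The sketch below cuts the infinite sum off at a finite truncation level, as the variational treatment in Chapter 3 also does; all names are illustrative.

```python
import numpy as np

def stick_breaking(alpha0, base_draw, truncation=100, rng=None):
    """Draw a truncated sample G = sum_i M_i * delta(theta_i) from DP(alpha0, G0):
    v_i ~ Beta(1, alpha0), M_i = v_i * prod_{l<i}(1 - v_l), theta_i ~ G0."""
    rng = rng or np.random.default_rng()
    v = rng.beta(1.0, alpha0, size=truncation)
    weights = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    atoms = np.array([base_draw() for _ in range(truncation)])
    return weights, atoms

rng = np.random.default_rng(0)
w, atoms = stick_breaking(2.0, rng.normal, rng=rng)  # G0 = standard Normal
```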

PAGE 32

Figure 2-1. Stick breaking for the Dirichlet Process

Figure 2-2. Chinese Restaurant Process

Basically, an atom is drawn with higher probability if it has been drawn before. Each time, a new atom may be drawn with probability proportional to α₀.
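The clustering behavior is easy to see in simulation. The sketch below seats customers one by one according to the Polya urn probabilities in (2-8); the function name is illustrative.

```python
import numpy as np

def chinese_restaurant_process(n, alpha0, rng=None):
    """Seat n customers: a customer joins existing table i with probability
    n_i / (n - 1 + alpha0), or opens a new table with alpha0 / (n - 1 + alpha0)."""
    rng = rng or np.random.default_rng()
    tables = []                      # occupancy count per table
    assignments = []
    for _ in range(n):
        probs = np.array(tables + [alpha0], dtype=float)
        probs /= probs.sum()
        k = rng.choice(len(probs), p=probs)
        if k == len(tables):
            tables.append(1)         # a new atom, drawn with probability prop. to alpha0
        else:
            tables[k] += 1           # an existing atom is drawn again
        assignments.append(k)
    return assignments, tables
```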

PAGE 33

Figure 2-3. Plate notation for DPMM

Dirichlet Process Mixture Model

In the DP mixture model Antoniak (1974), Escobar and West (1995), the DP is used as a nonparametric prior over the parameters of an infinite mixture model Ishwaran and James (2001):

\[ z_n \mid \{v_1, v_2, \ldots\} \sim \mathrm{Categorical}\{M_1, M_2, M_3, \ldots\}, \qquad X_n \mid z_n, (\theta_i)_{i=1}^{\infty} \sim F(\theta_{z_n}) \tag{2-9} \]

Here, F is a distribution parametrized by θ_{z_n}.

Hierarchical Dirichlet Process

The Hierarchical Dirichlet Process was proposed in Y.W. Teh and Blei (2006) to model grouped data. Here, an individual group is modeled according to a mixture model. A Hierarchical Dirichlet Process is defined as a distribution over a set of random probability measures. There is a random probability measure G_j for each group and a universal

PAGE 34

random probability measure G₀. The universal measure G₀ is a draw from a Dirichlet process parametrized by concentration parameter γ and base probability measure H:

\[ G_0 \mid \gamma, H \sim \mathrm{DP}(\gamma, H) \tag{2-10} \]

Now, each G_j is a draw from a DP parametrized by α₀ and G₀:

\[ G_j \mid \alpha_0, G_0 \sim \mathrm{DP}(\alpha_0, G_0) \tag{2-11} \]

The HDP mixture model is given by,

\[ \theta_{j,i} \mid G_j \sim G_j, \qquad x_{j,i} \mid \theta_{j,i} \sim F(\theta_{j,i}) \tag{2-12} \]

Here, θ_{j,i} is the latent parameter for the ith element in the jth group and x_{j,i} is the ith element in the jth group. Now, since G₀ is a draw from a DP, it forms an atomic distribution according to the previous section. When the G_j are drawn, they invariably share some of those atoms because they are all drawn from the same G₀. Therefore, the Hierarchical Dirichlet Process has the unique capability of picking shared latent parameters in grouped data in an infinite mixture model setting.

Chinese Restaurant Franchise

In the Chinese Restaurant Franchise (CRF), we have a finite number of restaurants (groups) with an infinite number of tables (clusters) with shared dishes (parameters) among all restaurants. Let θ_{ji} denote the customers, φ_{1:K} the global dishes, ψ_{jt} the table-specific dishes, t_{ji} the table index of the ith customer (θ_{ji}) in the jth restaurant, and k_{jt} the dish (menu) index of the tth table (ψ_{jt}) in the jth restaurant. Again, n_{jt} and n_{jk} denote the number of customers at the tth table of the jth restaurant and the number in the jth restaurant eating the kth

PAGE 35

Figure 2-4. Plate notation for HDPMM

Figure 2-5. Plate notation for HDPMM with indicator variables

PAGE 36

Figure 2-6. Chinese Restaurant Franchise for HDP Y.W. Teh and Blei (2006)

dish, respectively. m_{jk}, m_{j·}, m_{·k} and m_{··} denote the number of tables in the jth restaurant serving dish k, the number of tables in the jth restaurant serving any dish, the number of tables serving dish k, and the total number of tables, respectively. Now, from the Chinese Restaurant Process, we have,

\[ \theta_{ji} \mid \theta_{j1}, \ldots, \theta_{j(i-1)}, \alpha_0, G_0 \sim \frac{\alpha_0}{\alpha_0 + i - 1} G_0 + \sum_{t=1}^{m_{j\cdot}} \frac{n_{jt}}{\alpha_0 + i - 1} \, \delta_{\psi_{jt}} \tag{2-13} \]

Integrating out G₀, we have,

\[ \psi_{jt} \mid \psi_{11}, \ldots, \psi_{j(t-1)}, \gamma, H \sim \frac{\gamma}{\gamma + m_{\cdot\cdot}} H + \sum_{k=1}^{K} \frac{m_{\cdot k}}{\gamma + m_{\cdot\cdot}} \, \delta_{\phi_k} \tag{2-14} \]
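A direct simulation of this seating scheme shows how dishes (atoms) come to be shared across restaurants (groups). The following is an illustrative sketch of the CRF equations (2-13) and (2-14), not the inference code used later in the dissertation.

```python
import numpy as np

def chinese_restaurant_franchise(group_sizes, alpha0, gamma, rng=None):
    """Seat customers in J restaurants: tables within a restaurant follow
    CRP(alpha0); each new table draws its dish from a global CRP(gamma),
    so dishes are shared across restaurants."""
    rng = rng or np.random.default_rng()
    dish_counts = []                           # m_k: number of tables serving dish k
    dish_of_table, table_counts = [], []       # per-restaurant bookkeeping
    for j, n_j in enumerate(group_sizes):
        dish_of_table.append([]); table_counts.append([])
        for _ in range(n_j):
            p = np.array(table_counts[j] + [alpha0], dtype=float)
            t = rng.choice(len(p), p=p / p.sum())
            if t == len(table_counts[j]):      # new table: choose its dish globally
                q = np.array(dish_counts + [gamma], dtype=float)
                k = rng.choice(len(q), p=q / q.sum())
                if k == len(dish_counts):
                    dish_counts.append(0)      # brand-new dish
                dish_counts[k] += 1
                table_counts[j].append(0); dish_of_table[j].append(k)
            table_counts[j][t] += 1
    return dish_of_table, table_counts, dish_counts
```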

PAGE 37

CHAPTER 3
VARIATIONAL INFERENCE FOR INFINITE MIXTURES OF GENERALIZED LINEAR MODELS

GLM Models as Probabilistic Graphical Models

We begin by casting the covariate-response pairs in the models as a probabilistic graphical model according to the stick breaking representation. The Normal and Multinomial models were presented in Hannah et al. (2011); we extend to the other models.

Normal Model

In the Normal model, the generative model of the covariate-response pair is given by the following set of equations:

\[
\begin{aligned}
v_i \mid \alpha_1, \alpha_2 &\sim \mathrm{Beta}(\alpha_1, \alpha_2) \\
\{\mu_{i,d}, \tau_{x,i,d}\} &\sim N(\mu_{i,d} \mid m_{x,d}, (\beta_{x,d}\tau_{x,i,d})^{-1}) \, \mathrm{Gamma}(\tau_{x,i,d} \mid a_{x,d}, b_{x,d}) \\
\{\beta_{i,d}, \tau_{y,i}\} &\sim N(\beta_{i,d} \mid m_{y,d}, (\beta_y \tau_{y,i})^{-1}) \, \mathrm{Gamma}(\tau_{y,i} \mid a_y, b_y) \\
z_n \mid \{v_1, v_2, \ldots\} &\sim \mathrm{Categorical}\{M_1, M_2, M_3, \ldots\} \\
X_{n,d} \mid z_n &\sim N(\mu_{z_n,d}, \tau_{x,z_n,d}^{-1}) \\
Y_n \mid X_n, z_n &\sim N\Big(\beta_{z_n,0} + \sum_{d=1}^{D} \beta_{z_n,d} X_{n,d}, \; \tau_{y,z_n}^{-1}\Big)
\end{aligned} \tag{3-1}
\]

Here, X_n and Y_n represent the continuous covariate-response pairs. {z, v, θ_x, θ_y} is the set of latent variables, and the distributions of {μ_{i,d}, τ_{x,i,d}} and {β_{i,d}, τ_{y,i}} are the base distributions of the DP.

Logistic Multinomial Model

In the logistic multinomial model, the continuous covariates are modeled by a Gaussian mixture and a multinomial logistic framework is used for the categorical

PAGE 38

response. In this model, the covariates and z_n are modeled identically to the Normal model above. Hence, we present only the response distribution:

\[
\{\beta_{i,d,k}\} \sim N(\beta_{i,d,k} \mid m_{y,d,k}, s^2_{y,d,k}), \qquad
Y_n \mid X_n, z_n \sim \frac{\exp\big(\beta_{z_n,0,k} + \sum_{d=1}^{D} \beta_{z_n,d,k} X_{n,d}\big)}{\sum_{k=1}^{K} \exp\big(\beta_{z_n,0,k} + \sum_{d=1}^{D} \beta_{z_n,d,k} X_{n,d}\big)} \tag{3-2}
\]

Here, {z, v, θ_x, θ_y} are the latent variables, and {μ_{i,d}, τ_{x,i,d}} and {β_{i,d,k}} are the DP base distributions.

Poisson Model

In the Poisson model, the categorical covariates are modeled by a mixture of Multinomials and a Poisson distribution is used for the count response data. Here, too, v_i and z_n follow the same distributions as before. The remainder of the generative model is given by,

\[
\begin{aligned}
\{p_{i,d,j}\} &\sim \mathrm{Dir}(a_{d,j}), \qquad \{\beta_{i,d,j}\} \sim N(\beta_{i,d,j} \mid m_{d,j}, s^2_{d,j}) \\
\lambda_{z_n} &= \exp\Big(\beta_{z_n,0} + \sum_{d=1}^{D} \prod_{j=1}^{K(d)} (\beta_{z_n,d,j} X_{n,d,j})^{\mathrm{norm}(X_{n,d,j})}\Big) \\
X_{n,d} \mid z_n &\sim \mathrm{Categorical}(p_{z_n,d,j}), \qquad Y_n \mid X_n, z_n \sim \mathrm{Poisson}(\lambda_{z_n})
\end{aligned} \tag{3-3}
\]

The latent variable p_{i,d,j} is parametrized by a_{d,j}, and the response comes from a Poisson distribution parametrized by exp(Xβ). Here, norm(X_{n,d,j}) = 1 if X_{n,d} belongs to the jth category and is zero otherwise. K(d) is the number of categories of the dth dimension.

Exponential Model

In the exponential model, the generative model of the covariate-response pair is given by,

PAGE 39

\[
\begin{aligned}
v_i \mid \alpha_1, \alpha_2 &\sim \mathrm{Beta}(\alpha_1, \alpha_2) \\
\{\lambda_{x,i,d}\} &\sim \mathrm{Gamma}(\lambda_{x,i,d} \mid a_x, b_x), \qquad \{\beta_{i,d}\} \sim \mathrm{Gamma}(\beta_{i,d} \mid c_{y,d}, b_{y,d}) \\
z_n \mid \{v_1, v_2, \ldots\} &\sim \mathrm{Categorical}\{M_1, M_2, M_3, \ldots\} \\
X_{n,d} \mid z_n &\sim \mathrm{Exp}(X_{n,d} \mid \lambda_{x,z_n,d}) \\
Y_n \mid X_n, z_n &\sim \mathrm{Exp}\Big(Y_n \mid \beta_{z_n,0} + \sum_{d=1}^{D} \beta_{z_n,d} X_{n,d}\Big)
\end{aligned} \tag{3-4}
\]

Here, X_n and Y_n represent the continuous covariate-response pairs. {z, v, λ_{x,i,d}, β_{i,d}} is the set of latent variables, and the distributions of {λ_{x,i,d}} and {β_{i,d}} are the base distributions of the DP.

Inverse Gaussian Model

In the Inverse Gaussian model, the covariates and the response are modeled by Inverse Gaussian distributions. Here, too, v_i and z_n follow the same distributions as before. The remainder of the generative model is given by,

\[
\begin{aligned}
\{\mu_{i,d}, \lambda_{x,i,d}\} &\sim N(\mu_{i,d} \mid a_{x,d}, (b_{x,d}\lambda_{x,i,d})^{-1}) \, \mathrm{Gamma}(\lambda_{x,i,d} \mid c_{x,d}, d_{x,d}) \\
\{\beta_{i,d}, \lambda_{y,i}\} &\sim N(\beta_{i,d} \mid a_{y,d}, (b_y \lambda_{y,i})^{-1}) \, \mathrm{Gamma}(\lambda_{y,i} \mid c_y, d_y) \\
X_{n,d} \mid z_n &\sim \mathrm{IG}(X_{n,d} \mid \mu_{z_n,d}, \lambda_{x,z_n,d}) \\
Y_n \mid X_n, z_n &\sim \mathrm{IG}\Big(Y_n \mid \beta_{z_n,0} + \sum_{d=1}^{D} \beta_{z_n,d} X_{n,d}, \; \lambda_{y,z_n}\Big)
\end{aligned} \tag{3-5}
\]

Here, X_n and Y_n represent the continuous covariate-response pairs. {z, v, μ_{i,d}, λ_{x,i,d}, β_{i,d}, λ_{y,i}} is the set of latent variables, and the distributions of {μ_{i,d}, λ_{x,i,d}} and {β_{i,d}, λ_{y,i}} are the base distributions of the DP.
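A forward sampler makes the generative story of the Normal model (3-1) concrete. The sketch below draws data under a truncated stick-breaking prior; every hyperparameter value here is an illustrative placeholder, not one used in the experiments.

```python
import numpy as np

def sample_dpglm_normal(N, D, alpha0=1.0, T=50, rng=None):
    """Forward-sample (X_n, Y_n) pairs from the Normal DP-GLM: truncated
    stick-breaking weights, per-cluster covariate means and regression
    coefficients, then cluster assignments and observations."""
    rng = rng or np.random.default_rng()
    v = rng.beta(1.0, alpha0, size=T)
    M = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    M /= M.sum()                                  # renormalize after truncation
    mu = rng.normal(0.0, 1.0, size=(T, D))        # covariate means per cluster
    beta = rng.normal(0.0, 1.0, size=(T, D + 1))  # intercept + D coefficients
    z = rng.choice(T, size=N, p=M)                # cluster assignments z_n
    X = rng.normal(mu[z], 1.0)                    # X_{n,d} | z_n
    Y = rng.normal(beta[z, 0] + np.einsum('nd,nd->n', X, beta[z, 1:]), 1.0)
    return X, Y, z
```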

Multinomial Probit Model

In the Multinomial Probit model, the continuous covariates are modeled by a Gaussian mixture and a Multinomial Probit framework is used for the categorical response. Here, too, $v_i$ and $z_n$ follow the same distributions as before. The remainder of the generative model of the covariate-response pair is given by the following set of equations:
\[
\{\mu_{i,d},\lambda_{x,i,d}\}\sim\mathcal{N}\!\big(\mu_{i,d}\mid a_{x,d},(b_{x,d}\lambda_{x,i,d})^{-1}\big)\,\mathrm{Gamma}(\lambda_{x,i,d}\mid c_{x,d},d_{x,d}),\qquad
X_{n,d}\mid z_n\sim\mathcal{N}\!\big(X_{n,d}\mid\mu_{z_n,d},\lambda^{-1}_{x,z_n,d}\big)
\]
\[
\theta_{i,d,k}\sim\mathcal{N}(\theta_{i,d,k}\mid m_{y,d,k},s^2_{y,d,k}),\qquad
\lambda_{y,i,k}\sim\mathrm{Gamma}(\lambda_{y,i,k}\mid a_{y,k},b_{y,k})
\]
\[
\tilde{Y}_{n,k,i}\mid X_n,z_n\sim\mathcal{N}\!\Big(\tilde{Y}_{n,k,i}\,\Big|\,\theta_{i,0,k}+\sum_{d=1}^{D}\theta_{i,d,k}X_{n,d},\;\lambda^{-1}_{y,i,k}\Big),\qquad
Y_n\mid \tilde{Y}_{n,k,z_n}\sim \frac{\tilde{Y}_{n,k,z_n}}{\sum_{k=1}^{K}\tilde{Y}_{n,k,z_n}}
\]
Here $\{z,v,\mu_{i,d},\lambda_{x,i,d},\theta_{i,d,k},\lambda_{y,i,k},\tilde{Y}_{n,k,i}\}$ are the latent variables, and the distributions over $\{\mu_{i,d},\lambda_{x,i,d}\}$, $\{\theta_{i,d,k}\}$, $\{\lambda_{y,i,k}\}$, and $\tilde{Y}_{n,k,i}$ are the DP base distributions.

Variational Inference

Variational methods in the Bayesian setting aim to find a joint distribution over the hidden variables that approximates their true posterior, by minimizing the KL divergence between the variational and the true distribution. A simple form of variational distribution is chosen because it can later be used as a factorized distribution and can be sampled from; it also makes the predictive distribution computationally feasible. The log likelihood of the model is the sum of a lower bound (obtained from Jensen's inequality, and a function of the variational parameters) and the KL divergence between the true and variational distributions. Therefore, maximizing the bound is equivalent to minimizing the divergence (as the likelihood is constant), leading to the optimal variational parameters. This completes the computation of the variational distribution.

Variational Distribution of the Models

The inter-coupling between $Y_n$, $X_n$, and $z_n$ in all the models described above makes computing the posterior of $Y_n$ analytically intractable. We therefore introduce the following fully factorized and decoupled variational distributions as surrogates.

Normal Model

The variational distribution for the Normal model is defined formally as
\[
q(z,v,\mu_x,\theta_y)=\prod_{i=1}^{T-1}q(v_i\mid\gamma_i)\prod_{n=1}^{N}q(z_n\mid\phi_n)
\prod_{i=1}^{T}\prod_{d=1}^{D}q\!\big(\mu_{i,d}\mid m_{x,i,d},(\kappa_{x,i,d}\lambda_{x,i,d})^{-1}\big)\,q(\lambda_{x,i,d}\mid a_{x,i,d},b_{x,i,d})
\prod_{i=1}^{T}\prod_{d=0}^{D}q\!\big(\theta_{i,d}\mid m_{y,i,d},(\kappa_{y,i}\lambda_{y,i})^{-1}\big)\,q(\lambda_{y,i}\mid a_{y,i},b_{y,i})
\]
Firstly, each $v_i$ follows a Beta distribution. As in Blei and Jordan (2006), we have truncated the infinite series of $v_i$'s into a finite one by making the assumption $q(v_T=1)=1$ and $\pi_i=0$ for all $i>T$. Note that this truncation applies to the variational surrogate distribution and not to the actual posterior distribution that we approximate. Secondly, $z_n$ follows a variational multinomial distribution. Thirdly, $\{\mu_{i,d},\lambda_{x,i,d}\}$ and $\{\theta_{i,0:D},\lambda_{y,i}\}$ both follow a variational Normal-Gamma distribution.

Logistic Multinomial Model

The variational distribution for the Logistic Multinomial model is given by
\[
q(z,v,\mu_x,\theta_y)=\prod_{i=1}^{T-1}q(v_i\mid\gamma_i)\prod_{n=1}^{N}q(z_n\mid\phi_n)
\prod_{i=1}^{T}\prod_{d=1}^{D}q\!\big(\mu_{i,d}\mid m_{x,i,d},(\kappa_{x,i,d}\lambda_{x,i,d})^{-1}\big)\,q(\lambda_{x,i,d}\mid a_{x,i,d},b_{x,i,d})
\prod_{i=1}^{T}\prod_{d=0}^{D}\prod_{k=1}^{K}q\!\big(\theta_{i,d,k}\mid m_{y,i,d,k},s^2_{y,i,d,k}\big)
\]
Here $v_i$ and $z_n$ have the same distributions as described in the Normal model; $\{\mu_{i,d},\lambda_{x,i,d}\}$ and $\{\theta_{i,0,0:D,K}\}$ follow a variational Normal-Gamma and a Normal distribution, respectively.

Poisson Model

The variational distribution for the Poisson model is
\[
q(z,v,p,\theta_y)=\prod_{i=1}^{T-1}q(v_i\mid\gamma_i)\prod_{n=1}^{N}q(z_n\mid\phi_n)
\prod_{i=1}^{T}\prod_{d=1}^{D}\mathrm{Dir}(p_{i,d,j}\mid a_{i,d,j})
\prod_{i=1}^{T}\prod_{d=0}^{D}\prod_{j=1}^{K(d)}q\!\big(\theta_{i,d,j}\mid m_{i,d,j},s^2_{i,d,j}\big)
\]
Here $\theta_{i,d,j}$ follows a Normal distribution and $p_{i,d,j}$ comes from a mixture of variational Dirichlet distributions.

Exponential Model

The variational distribution for the Exponential model is defined formally as
\[
q(z,v,\lambda_x,\theta)=\prod_{i=1}^{T-1}q(v_i\mid\gamma_i)\prod_{n=1}^{N}q(z_n\mid\phi_n)
\prod_{i=1}^{T}\prod_{d=1}^{D}q(\lambda_{x,i,d}\mid a_{x,i,d},b_{x,i,d})
\prod_{i=1}^{T}\prod_{d=0}^{D}q(\theta_{i,d}\mid c_{y,i,d},d_{y,i,d})
\]
Again, $z_n$ follows a variational multinomial distribution, and $\{\lambda_{x,i,d}\}$ and $\{\theta_{i,0:D}\}$ both follow a variational Gamma distribution.

Inverse Gaussian Model

The variational distribution for the Inverse Gaussian model is given by
\[
q(z,v,\mu,\lambda_x,\theta,\lambda_y)=\prod_{i=1}^{T-1}q(v_i\mid\gamma_i)\prod_{n=1}^{N}q(z_n\mid\phi_n)
\prod_{i=1}^{T}\prod_{d=1}^{D}q\!\big(\mu_{i,d}\mid a_{x,i,d},(b_{x,i,d}\lambda_{x,i,d})^{-1}\big)\,q(\lambda_{x,i,d}\mid c_{x,i,d},d_{x,i,d})
\prod_{i=1}^{T}\prod_{d=0}^{D}q\!\big(\theta_{i,d}\mid a_{y,i,d},(b_{y,i}\lambda_{y,i})^{-1}\big)\,q(\lambda_{y,i}\mid c_{y,i},d_{y,i})
\]
Here $\{\mu_{i,d},\lambda_{x,i,d}\}$ and $\{\theta_{i,0:D},\lambda_{y,i}\}$ both follow a variational Normal-Gamma distribution.

Multinomial Probit Model

The variational distribution for the Multinomial Probit model is
\[
q(z,v,\mu,\lambda_x,\theta,\lambda_y,\tilde{Y})=\prod_{i=1}^{T-1}q(v_i\mid\gamma_i)\prod_{n=1}^{N}q(z_n\mid\phi_n)
\prod_{i=1}^{T}\prod_{d=1}^{D}q\!\big(\mu_{i,d}\mid a_{x,i,d},(b_{x,i,d}\lambda_{x,i,d})^{-1}\big)\,q(\lambda_{x,i,d}\mid c_{x,i,d},d_{x,i,d})
\]
\[
\times\prod_{i=1}^{T}\prod_{d=1}^{D}\prod_{k=1}^{K}q\!\big(\theta_{i,d,k}\mid m_{y,i,d,k},s^2_{y,i,d,k}\big)
\prod_{k=1}^{K}\prod_{i=1}^{T}q(\lambda_{y,i,k}\mid a_{y,i,k},b_{y,i,k})
\prod_{n=1}^{N}\prod_{k=1}^{K}\prod_{i=1}^{T}q\!\Big(\tilde{Y}_{n,k,i}\,\Big|\,\theta_{i,0,k}+\sum_{d=1}^{D}\theta_{i,d,k}X_{n,d},\;\lambda^{-1}_{y,i,k}\Big)
\]
Here $\theta_{i,d,k}$ follows a Normal distribution, $\{\mu_{i,d},\lambda_{x,i,d}\}$ follows a variational Normal-Gamma distribution, $\lambda_{y,i,k}$ follows a Gamma distribution, and $\tilde{Y}_{n,k,i}$ follows a Normal distribution.

Generalized Evidence Lower Bound (ELBO)

We bound the log likelihood of the observations in the generalized form of the models (the same for all the models) using Jensen's inequality, $\psi(E[X])\ge E[\psi(X)]$, where $\psi$ is a concave function and $X$ is a random variable.

\[
\log p(X,Y\mid A)=\log\int\sum_{z}p(X,Y,z,v,\mu_x,\theta_y\mid A)\,dv\,d\mu_x\,d\theta_y
=\log\int\sum_{z}p(X,Y,z,v,\mu_x,\theta_y\mid A)\,\frac{q(z,v,\mu_x,\theta_y)}{q(z,v,\mu_x,\theta_y)}\,dv\,d\mu_x\,d\theta_y
\]
\[
\ge\int\sum_{z}q(z,v,\mu_x,\theta_y)\log p(X,Y,z,v,\mu_x,\theta_y\mid A)\,dv\,d\mu_x\,d\theta_y
-\int\sum_{z}q(z,v,\mu_x,\theta_y)\log q(z,v,\mu_x,\theta_y)\,dv\,d\mu_x\,d\theta_y
\]
\[
=E_q[\log p(X,Y,z,v,\mu_x,\theta_y\mid A)]-E_q[\log q(z,v,\mu_x,\theta_y)]
\]
\[
=E_q[\log p(v)]+E_q[\log p(z\mid v)]+E_q[\log p(\mu_x)]+E_q[\log p(\theta_y)]+E_q[\log p(X\mid z,\mu_x)]+E_q[\log p(Y\mid X,z,\theta_y)]
\]
\[
-\,E_q[\log q(\mu_x)]-E_q[\log q(\theta_y)]-E_q[\log q(z)]-E_q[\log q(v)]
\]
This generalized ELBO is the same for all the models under investigation, and it is a function of the variational parameters as well as the hyper-parameters. We maximize this bound with respect to the variational parameters, which gives the estimates of these quantities. Here $A$ is the set of hyper-parameters of the generative model.

Parameter Estimation for the Models

The generalized ELBO above is shared by all the models under investigation. We differentiate the individual ELBOs with respect to the variational parameters of the specific models to obtain their respective estimates.

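The model-specific updates derived in the following subsections all plug into one coordinate-ascent loop. The sketch below assumes a hypothetical model object exposing one update method per factor block; it illustrates the optimization schedule and the ELBO-based stopping rule, and is not the dissertation's implementation.

```python
def fit_variational(model, data, tol=1e-8, max_iters=500):
    """Generic coordinate-ascent variational inference loop.

    `model` is an assumed object bundling the update equations of one GLM
    variant; only the ELBO and the per-block update rules differ by model.
    """
    model.init_params(data)                 # initialize variational params
    elbo_old = -float("inf")
    for _ in range(max_iters):
        model.update_stick_params(data)     # gamma_{1i}, gamma_{2i}
        model.update_assignments(data)      # phi_{n,i} via softmax of M_{n,i}
        model.update_covariate_params(data) # Normal-Gamma / Gamma / Dirichlet
        model.update_response_params(data)  # coefficient posteriors
        elbo = model.elbo(data)
        if abs(elbo - elbo_old) < tol * abs(elbo_old):  # relative change
            break
        elbo_old = elbo
    return model
```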
Parameter Estimation for the Normal Model

We differentiate the derived ELBO above w.r.t. $\gamma_{1i}$ and $\gamma_{2i}$ and set the derivatives to zero to obtain
\[
\gamma_{1i}=\alpha_1+\sum_{n=1}^{N}\phi_{n,i},\qquad \gamma_{2i}=\alpha_2+\sum_{n=1}^{N}\sum_{j=i+1}^{T}\phi_{n,j}.
\]
Estimating $\phi_{n,i}$ is a constrained optimization with $\sum_i\phi_{n,i}=1$. Differentiating the Lagrangian w.r.t. $\phi_{n,i}$ gives
\[
\phi_{n,i}=\frac{\exp(M_{n,i})}{\sum_{i'=1}^{T}\exp(M_{n,i'})},
\]
where the term $M_{n,i}$ is represented as
\[
M_{n,i}=\sum_{j=1}^{i}\big\{\Psi(\gamma_{2j})-\Psi(\gamma_{1j}+\gamma_{2j})\big\}+P_{n,i}
\]
and
\[
P_{n,i}=\frac12\sum_{d=1}^{D}\Big\{\log\frac{1}{2\pi}+\Psi(a_{x,i,d})-\log(b_{x,i,d})-\kappa^{-1}_{x,i,d}-\frac{a_{x,i,d}}{b_{x,i,d}}(X_{n,d}-m_{x,i,d})^2\Big\}
\]
\[
+\frac12\Big\{\log\frac{1}{2\pi}+\Psi(a_{y,i})-\log(b_{y,i})-\kappa^{-1}_{y,i}\Big(1+\sum_{d=1}^{D}X^2_{n,d}\Big)-\frac{a_{y,i}}{b_{y,i}}\Big(Y_n-m_{y,i,0}-\sum_{d=1}^{D}m_{y,i,d}X_{n,d}\Big)^2\Big\}.
\]
The variational parameters for the covariates are found by maximizing the ELBO w.r.t. them.

\[
\kappa_{x,i,d}=\kappa_{x,d}+\sum_{n=1}^{N}\phi_{n,i},\qquad a_{x,i,d}=a_{x,d}+\sum_{n=1}^{N}\phi_{n,i}
\]
\[
b_{x,i,d}=\frac12\Big\{\kappa_{x,d}(m_{x,i,d}-m_{x,d})^2+2b_{x,d}+\sum_{n=1}^{N}\phi_{n,i}(X_{n,d}-m_{x,i,d})^2\Big\}
\]
\[
m_{x,i,d}=\frac{\sum_{n=1}^{N}\phi_{n,i}X_{n,d}+\kappa_{x,d}m_{x,d}}{\sum_{n=1}^{N}\phi_{n,i}+\kappa_{x,d}}
\]
The variational parameters of the distribution of $\theta_{i,d}$ are obtained as
\[
\kappa_{y,i}=\frac{(D+1)\kappa_{y}+\sum_{n=1}^{N}\phi_{n,i}\big(1+\sum_{d=1}^{D}X^2_{n,d}\big)}{D+1},\qquad
a_{y,i}=\sum_{d=0}^{D}a_{y}+\frac12\sum_{n=1}^{N}\phi_{n,i}
\]
\[
b_{y,i}=\frac12\Big\{\sum_{d=0}^{D}\kappa_{y}(m_{y,i,d}-m_{y,d})^2+2b_{y}+\sum_{n=1}^{N}\phi_{n,i}\Big(Y_n-m_{y,i,0}-\sum_{d=1}^{D}m_{y,i,d}X_{n,d}\Big)^2\Big\}
\]
\[
m_{y,i,0}=\frac{m_{y}\kappa_{y}+\sum_{n=1}^{N}\phi_{n,i}\big(Y_n-\sum_{d=1}^{D}m_{y,i,d}X_{n,d}\big)}{\kappa_{y}+\sum_{n=1}^{N}\phi_{n,i}}
\]

\[
m_{y,i,d}=\frac{m_{y,d}\kappa_{y}}{\kappa_{y}+\sum_{n=1}^{N}\phi_{n,i}X^2_{n,d}}
+\frac{\sum_{n=1}^{N}\phi_{n,i}\big(Y_n-m_{y,i,0}+m_{y,i,d}X_{n,d}\big)}{\kappa_{y}+\sum_{n=1}^{N}\phi_{n,i}X^2_{n,d}}
-\frac{\sum_{n=1}^{N}\phi_{n,i}\sum_{d'=1}^{D}m_{y,i,d'}X_{n,d'}}{\kappa_{y}+\sum_{n=1}^{N}\phi_{n,i}X^2_{n,d}}
\]

Parameter Estimation for the Multinomial Model

For the Logistic Multinomial model, the estimation of $\gamma_{1i}$, $\gamma_{2i}$, $\phi_{n,i}$ and of $\kappa_{x,i,d}$, $m_{x,i,d}$, $a_{x,i,d}$, $b_{x,i,d}$ is identical to the Normal model, with the only difference being that $P_{n,i}$ is given by
\[
P_{n,i}=\frac12\sum_{d=1}^{D}\Big\{\log\frac{1}{2\pi}+\Psi(a_{x,i,d})-\log(b_{x,i,d})-\kappa^{-1}_{x,i,d}-\frac{a_{x,i,d}}{b_{x,i,d}}(X_{n,d}-m_{x,i,d})^2\Big\}
+\sum_{k=1}^{K}Y_{n,k}\Big(m_{i,0,k}+\sum_{d=1}^{D}X_{n,d}m_{i,d,k}\Big),
\]
and
\[
m_{i,0,k}=m_{d,k}+s^2_{d,k}\sum_{n=1}^{N}\phi_{n,i}Y_{n,k},\qquad
m_{i,d,k}=m_{d,k}+s^2_{d,k}\sum_{n=1}^{N}\phi_{n,i}Y_{n,k}X_{n,d}.
\]

Parameter Estimation for the Poisson Model

Again, in the Poisson model, the estimation of $\gamma_{1i}$, $\gamma_{2i}$, $\phi_{n,i}$ is similar to the Normal model, with the only difference being that the term $P_{n,i}$ is given by
\[
P_{n,i}=\sum_{d=1}^{D}\sum_{j=1}^{K(d)}X_{n,d,j}\Big(\Psi(a_{i,d,j})-\Psi\Big(\sum_{j'=1}^{K(d)}a_{i,d,j'}\Big)\Big)+\{n,i\}\text{th term of }E_q[\log p(Y\mid X,z,\theta_y)].
\]

Also, $a_{i,d,j}=a_{d,j}+\sum_{n=1}^{N}\phi_{n,i}$. The equation involving $m_{i,d,j}$ is
\[
\frac{m_{i,d,j}}{s^2_{d,j}}+\exp(m_{i,d,j})\sum_{n=1}^{N}\frac{\phi_{n,i}X_{n,d,j}}{s^2_{i,d,j}}=\sum_{n=1}^{N}\phi_{n,i}Y_nX_{n,d,j}.
\]
The expression for $E_q[\log p(Y\mid X,z,\theta_y)]$ is shown in the Supplementary Materials. Here $m_{i,d,j}$ does not have a closed-form solution; however, it can be solved quickly via any iterative root-finding method.

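As a concrete illustration of such a root-finding step, the sketch below applies Newton's method to an equation of the same shape as the $m_{i,d,j}$ update above. The names c and r, which stand in for the phi-weighted covariate and response sums, are illustrative assumptions rather than quantities defined in the text.

```python
import numpy as np

def solve_poisson_mean(s2, c, r, m0=0.0, tol=1e-10, max_iter=100):
    """Newton iteration for the transcendental update
        m / s2 + c * exp(m) = r,
    with c >= 0, so the left side is strictly increasing in m and the
    root is unique whenever one exists."""
    m = m0
    for _ in range(max_iter):
        f = m / s2 + c * np.exp(m) - r
        fprime = 1.0 / s2 + c * np.exp(m)   # strictly positive derivative
        step = f / fprime
        m -= step
        if abs(step) < tol:
            break
    return m
```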
Parameter Estimation for the Exponential Model

We differentiate the ELBO w.r.t. $\gamma_{1i}$ and $\gamma_{2i}$ and set the derivatives to zero to obtain
\[
\gamma_{1i}=\alpha_1+\sum_{n=1}^{N}\phi_{n,i},\qquad \gamma_{2i}=\alpha_2+\sum_{n=1}^{N}\sum_{j=i+1}^{T}\phi_{n,j}.
\]
Estimating $\phi_{n,i}$ is again a constrained optimization with $\sum_i\phi_{n,i}=1$; differentiating the Lagrangian w.r.t. $\phi_{n,i}$ gives
\[
\phi_{n,i}=\frac{\exp(M_{n,i})}{\sum_{i'=1}^{T}\exp(M_{n,i'})},\qquad
M_{n,i}=\sum_{j=1}^{i}\big\{\Psi(\gamma_{2j})-\Psi(\gamma_{1j}+\gamma_{2j})\big\}+P_{n,i},
\]
where
\[
P_{n,i}=\sum_{d=1}^{D}\Big\{\Psi(a_{x,i,d})-\log(b_{x,i,d})-X_{n,d}\frac{a_{x,i,d}}{b_{x,i,d}}\Big\}
-\frac{c_{y,i,0}}{d_{y,i,0}}-\sum_{d=1}^{D}X_{n,d}\frac{c_{y,i,d}}{d_{y,i,d}}
-Y_n\frac{\Gamma(c_{y,i,0})}{(d_{y,i,0}+1)^{c_{y,i,0}}}+Y_n\sum_{d=1}^{D}\frac{\Gamma(c_{y,i,d})}{(d_{y,i,d}+X_{n,d})^{c_{y,i,d}}}.
\]
The variational parameters for the covariates and responses are found by maximizing the ELBO w.r.t. them:
\[
a_{x,i,d}=a_{x,d}+\sum_{n=1}^{N}\phi_{n,i},\qquad b_{x,i,d}=b_{x,d}+\sum_{n=1}^{N}\phi_{n,i}X_{n,d}
\]
\[
c_{y,i,d}=c_{y,d}+\sum_{n=1}^{N}(\phi_{n,i}+Y_n),\qquad d_{y,i,d}=d_{y,d}+\sum_{n=1}^{N}\phi_{n,i}(X_{n,d}+Y_n)
\]

Parameter Estimation for the Inverse Gaussian Model

For the Inverse Gaussian model, the estimation of $\gamma_{1i}$, $\gamma_{2i}$, $\phi_{n,i}$ is identical to the Exponential model, with the only difference being that $P_{n,i}$ is given by
\[
P_{n,i}=\frac12\sum_{d=1}^{D}\Big\{\log\frac{1}{2\pi}+\Psi(c_{x,i,d})-\log(d_{x,i,d})-b^{-1}_{x,i,d}-\frac{c_{x,i,d}}{d_{x,i,d}}(X_{n,d}-a_{x,i,d})^2\Big\}
\]
\[
+\frac12\Big\{\log\frac{1}{2\pi}+\Psi(c_{y,i})-\log(d_{y,i})-b^{-1}_{y,i}\Big(1+\sum_{d=1}^{D}X^2_{n,d}\Big)-\frac{c_{y,i}}{d_{y,i}}\Big(Y_n-a_{y,i,0}-\sum_{d=1}^{D}a_{y,i,d}X_{n,d}\Big)^2\Big\}.
\]

The variational parameters for the covariates and responses are found by maximizing the ELBO w.r.t. them:
\[
b_{x,i,d}=b_{x,d}+\sum_{n=1}^{N}\phi_{n,i},\qquad c_{x,i,d}=c_{x,d}+\sum_{n=1}^{N}\phi_{n,i}
\]
\[
d_{x,i,d}=\frac12\Big\{b_{x,d}(a_{x,i,d}-a_{x,d})^2+2d_{x,d}+\sum_{n=1}^{N}\phi_{n,i}\frac{(X_{n,d}-a_{x,i,d})^2}{a^2_{x,i,d}X_{n,d}}\Big\}
\]
\[
a_{x,i,d}=\frac{\sum_{n=1}^{N}\phi_{n,i}X_{n,d}+b_{x,d}m_{x,d}}{\sum_{n=1}^{N}\phi_{n,i}+b_{x,d}}
\]
\[
b_{y,i}=\frac{(D+1)b_{y}+\sum_{n=1}^{N}\phi_{n,i}\big(1+\sum_{d=1}^{D}X^2_{n,d}\big)}{D+1},\qquad
c_{y,i}=\sum_{d=0}^{D}c_{y}+\frac12\sum_{n=1}^{N}\phi_{n,i}
\]
\[
d_{y,i}=\frac12\Big\{\sum_{d=0}^{D}b_{y}(a_{y,i,d}-a_{y,d})^2+2d_{y}+\sum_{n=1}^{N}\phi_{n,i}\frac{\big(Y_n-a_{y,i,0}-\sum_{d=1}^{D}a_{y,i,d}X_{n,d}\big)^2}{a_{y,i,0}-\sum_{d=1}^{D}a^2_{y,i,d}X_{n,d}}\Big\}
\]
\[
a_{y,i,0}=\frac{a_{y,d}b_{y}+\sum_{n=1}^{N}\phi_{n,i}\big(Y_n-\sum_{d=1}^{D}a_{y,i,d}X_{n,d}\big)}{b_{y}+\sum_{n=1}^{N}\phi_{n,i}}
\]

\[
a_{y,i,d}=\frac{a_{y,d}b_{y}}{b_{y}+\sum_{n=1}^{N}\phi_{n,i}X^2_{n,d}}
+\frac{\sum_{n=1}^{N}\phi_{n,i}\big(Y_n-a_{y,i,0}+a_{y,i,d}X_{n,d}\big)}{b_{y}+\sum_{n=1}^{N}\phi_{n,i}X^2_{n,d}}
-\frac{\sum_{n=1}^{N}\phi_{n,i}\sum_{d'=1}^{D}a_{y,i,d'}X_{n,d'}}{b_{y}+\sum_{n=1}^{N}\phi_{n,i}X^2_{n,d}}
\]

Parameter Estimation for the Multinomial Probit Model

Once again, in the Multinomial Probit model, the estimation of $\gamma_{1i}$, $\gamma_{2i}$, $a_{x,i,d}$, $b_{x,i,d}$, $c_{x,i,d}$, $d_{x,i,d}$ is similar to the Exponential model. The variational parameters are given by
\[
a_{y,i,k}=a_{y,k}+\sum_{n=1}^{N}\phi_{n,i},\qquad b_{y,i,k}=b_{y,k}
\]
and
\[
m_{y,i,0,k}=m_{y,d,k}+s^2_{y,d,k}\sum_{n=1}^{N}\phi_{n,i}Y_{n,k},\qquad
m_{y,i,d,k}=m_{y,d,k}+s^2_{y,d,k}\sum_{n=1}^{N}\phi_{n,i}Y_{n,k}X_{n,d}.
\]

Predictive Distribution

Finally, we derive the predictive distribution for a new response given a new covariate and the set of previous covariate-response pairs:
\[
p(Y_{N+1}\mid X_{N+1},X,Y)=\sum_{z}\iint p(Y_{N+1}\mid X_{N+1},\theta_y,z)\,p(v,\theta_y\mid Y,X)\,p(z\mid v)\,dv\,d\theta_y.
\]
Since the inner integrals are analytically intractable, we approximate the predictive distribution by replacing the true posterior with its variational surrogate.

The density $q(v)$ is integrated out to give the weight factor $w_i$ for each mixture component; the remaining part is integrated out to produce a Student-t distribution for the Normal model:
\[
p(Y_{N+1}\mid X_{N+1},X,Y)=\sum_{i=1}^{T}w_i\,\mathrm{St}\!\Big(Y\,\Big|\,m_{y,i,0}+\sum_{d=1}^{D}m_{y,i,d}X_{N+1,d},\;L_i,\;B_i\Big),
\]
where $w_i$ is given by
\[
w_i=\frac{\gamma_{1i}\,\gamma_{2i}(\gamma_{2i}+1)\cdots(\gamma_{2i}+T-1-i)}{(\gamma_{1i}+\gamma_{2i})(\gamma_{1i}+\gamma_{2i}+1)\cdots(\gamma_{1i}+\gamma_{2i}+T-i)}.
\]
Here $L_i=\frac{(2a_{y,i}-D)\,\kappa_{y,i}}{2(1+\kappa_{y,i})\,b_{y,i}}$ is the precision parameter of the Student's t-distribution and $B_i=2a_{y,i}-D$ is the degrees of freedom. For the other models, the integration of the densities $q(\theta_{y,i})$ and $p(Y_{N+1})$ is not analytically tractable. Therefore, we use Monte Carlo integration to obtain
\[
E[Y_{N+1}\mid X_{N+1},X,Y]=E\big[E[Y_{N+1}\mid X_{N+1},\theta_{y,i(1:T)}]\mid X,Y\big]\approx\frac{1}{M}\sum_{m=1}^{M}E\big[Y_{N+1}\mid X_{N+1},\theta^{m}_{y,i(1:T)}\big].
\]
In all experiments presented in this dissertation, we collected 100 i.i.d. samples from the density of $\theta_{y,i}$ to evaluate the expected value of $Y_{N+1}$ from the density $p(Y_{N+1})$.

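A minimal sketch of this Monte Carlo approximation is given below. The callables sample_coeffs (drawing one coefficient vector from $q(\theta_{y,i})$ for component $i$) and inv_link (the model's inverse link, e.g. np.exp for the Poisson model) are assumed interfaces, not part of the original specification.

```python
import numpy as np

def predictive_mean(x_new, weights, sample_coeffs, inv_link, n_samples=100):
    """Monte Carlo estimate of E[Y_{N+1} | X_{N+1}, X, Y] for the
    non-conjugate models, mixing components with the weights w_i."""
    T = len(weights)
    x_aug = np.concatenate(([1.0], x_new))        # prepend intercept term
    total = 0.0
    for _ in range(n_samples):
        comp_means = np.array([inv_link(x_aug @ sample_coeffs(i))
                               for i in range(T)])
        total += weights @ comp_means             # mixture-weighted mean
    return total / n_samples
```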
Experimental Results

A broad set of experiments was conducted to evaluate the variational inference and a standard Gibbs sampling. Samples from the predictive posterior were used to evaluate the accuracy of the model against its competitor algorithms: linear regression with no feature selection (OLS), Bayesian linear regression, variational linear regression (Bishop, 2006), Gaussian Process regression (Rasmussen and Williams, 2005a), ordinary DP regression, and the Gibbs sampling inference of Hannah et al. (2011). Variational inference's speed of convergence was also recorded and compared against that of Gibbs sampling for successively growing dimensionality of the covariates. The accuracy of the Multinomial and Probit models (variational inference) was evaluated against the multiclass support vector machine (Cortes and Vapnik, 1995), the naive Bayes classifier (Lowd and Domingos, 2005), and multinomial logistic regression (Bishop, 2006). Next, to highlight the models as a practical tool, they were employed as a new GLM-based technique to model the volatility dynamics of the stock market; specifically, they were used to determine how individual stocks track predetermined baskets of stocks over time.

Table 3-1. Description of the variational inference algorithms for the models.

  Initialize the hyper-parameters of the generative model.
  Repeat
    Evaluate gamma_1i and gamma_2i.
    Evaluate phi_{n,i} of the respective model.
    Evaluate the variational parameters of the covariate distribution.
    Evaluate the variational parameters of the response distribution.
  until converged

Figure 3-1. A simple posterior predictive trajectory of variational inference of the Normal model in a 4-cluster synthetic dataset with a 1-D covariate. The blue trajectory is the smoothed response posterior trained on the 4-cluster data represented by the points.

Datasets

One artificial group of datasets and three real-world datasets were used. In the artificial set, we generated several 25- to 100-dimensional regression datasets with 10 clusters each in the covariate-response space (Y, X). The covariates were generated from independent Gaussians with means varying from 1 to 27 in steps of 3 for the 10 clusters. The shape parameter was drawn independently from the range [.1, 1] for the 10 clusters; for a fixed cluster, the shapes were set to be the same for each dimension. The second dataset was a compilation of daily stock price data (retrieved from Google Finance) for the Dow 30 companies from Nov 29, 2000 to Dec 29, 2013. It had 3268 instances and was viewed as 30 different 29-1 covariate-response datasets: the goal was to model the stock price of an individual Dow 30 company as a function of the remaining 29 companies over time, and accuracy results were averaged over all 30 regressions. The third dataset was the Parkinson's telemonitoring dataset (A. Tsanas and Ramig, 2009) from the UCI Machine Learning Repository, which has 5875 instances over 16 covariates. The final dataset was the Breast Cancer Wisconsin (Original) dataset (Wolberg and Mangasarian, 1990) from the UCI Repository, which has 699 instances over 10 covariates. This dataset was used to evaluate the Multinomial and Probit models against competitors like multiclass SVM (Cortes and Vapnik, 1995), multinomial logistic regression (Bishop, 2006), and the naive Bayes classifier (Lowd and Domingos, 2005).

Timing Performance for the Normal Model

For a fair comparison of computing time, we ran both variational inference and Gibbs sampling to convergence with 50 percent of the dataset set aside for training. For Gibbs sampling, we assessed convergence to the stationary distribution using the Gelman-Rubin diagnostic (Gelman and Rubin, 1992). For variational inference, we measured convergence using the relative change of the ELBO, stopping the algorithm when it was less than 1e-8.

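For reference, a minimal implementation of the Gelman-Rubin potential scale reduction factor for one scalar parameter could look as follows; this is the standard textbook formula, not code from the dissertation.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for one scalar parameter.

    `chains` is an (m, n) array: m independent Gibbs chains of length n.
    Values close to 1 indicate convergence to the stationary distribution.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    grand_mean = chain_means.mean()
    B = n * np.sum((chain_means - grand_mean) ** 2) / (m - 1)  # between-chain
    W = chains.var(axis=1, ddof=1).mean()                      # within-chain
    var_hat = (n - 1) / n * W + B / n
    return np.sqrt(var_hat / W)
```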
The variation of timing for both variational inference and Gibbs sampling per dimension, for all the datasets, is tabulated in Table 3-2. Gibbs sampling remains close to variational inference on the telemonitoring dataset, which has only 16 covariates. However, as shown on the synthetic data, when the dimensionality grows from 25 to 100, Gibbs sampling starts to lag behind variational inference, exposing its vulnerability to growing dimensions. In contrast, variational inference remains robust against rising dimensionality of the covariates: its time consumption per dimension for convergence decreases slightly as the dimensionality increases.

Figure 3-2. Time in seconds per dimension for both variational inference and Gibbs sampling for the synthetic dataset.

Table 3-2. Runtime per dimension for convergence of Gibbs sampling and variational inference, in seconds.

                          Stock market data    Telemonitoring data
  Variational inference        436.34                229.97
  Gibbs sampling               521.65                311.12

  Synthetic dataset
  Dimension                25     40     50     60     75     100
  Variational inference    320    290    280    275    270    240
  Gibbs sampling           495    680    875    995    1240   1475

Accuracy

We report the mean absolute error (MAE) and mean square error (MSE) for all the algorithms in Table 3-4. Note that variational inference yields the lowest error values among its competitors. To compare variational inference with Gibbs sampling, we set the truncation factor $T$ to 20, and the samples of Gibbs sampling were taken after burn-in to be every 5th sample.

We show the log-likelihood of the Normal model of the predictive distribution in Table 3-3, with dimension varied from 50 to 100 for the synthetic dataset (50 percent of the dataset as training), and also for the compiled stock market and telemonitoring datasets (30, 60, and 90 percent of the dataset as training). It is notable that Gibbs sampling deteriorates very quickly as the dimensionality of the covariates grows larger (from 16 dimensions in telemonitoring to 50-100 dimensions in the synthetic dataset). In terms of the MSE and MAE, too, Gibbs sampling shows the same trend: errors are low on the telemonitoring dataset, but with increasing dimensions, as in the synthetic and stock market data, it loses scalability, since its sample distribution strays substantially from the true posterior, leading to large errors.

Table 3-3. Log-likelihood of the Normal model of the predictive distribution for the synthetic dataset (50, 75, 100 dimensions) and the stock market and telemonitoring datasets (30, 60, and 90% of the dataset as training).

  Synthetic dataset
  Dimension     Variational inference    Gibbs sampling
  50            -2345.05                 -2789.83
  75            -3729.38                 -4589.49
  100           -4467.75                 -6052.62

  Stock market dataset
  Training %    Variational inference    Gibbs sampling
  30            -912.58                  -1254.78
  60            -834.29                  -1087.92
  90            -712.82                  -878.99

  Telemonitoring dataset
  Training %    Variational inference    Gibbs sampling
  30            -673.55                  -794.29
  60            -545.17                  -643.48
  90            -487.77                  -529.82

Tool to Understand Stock Market Dynamics

The models are presented as new tools to analyze the dynamics of stocks from the Dow 30 companies.

Table 3-4. MSE and MAE of the algorithms for the synthetic dataset (50, 75, 100 dimensions), the stock market dataset, the telemonitoring dataset, and the breast cancer dataset, with 30, 60, and 90% of the dataset as training.

  Synthetic data                          MAE (30/60/90)         MSE (30/60/90)
  Variational inference (normal)          1.04 / .82  / .67      1.72 / 1.59 / 1.31
  Gibbs sampling (normal)                 1.45 / 1.23 / 1.02     1.61 / 1.45 / 1.32
  Variational inference (inv. Gaussian)   1.21 / .89  / .79      1.78 / 1.55 / 1.39
  Variational inference (exponential)     1.32 / 1.26 / 1.16     1.85 / 1.78 / 1.44
  ODP                                     1.47 / 1.37 / 1.29     1.95 / 1.82 / 1.52
  GPR                                     1.56 / 1.42 / 1.63     2.34 / 2.17 / 1.79
  VLR                                     1.71 / 1.53 / 1.29     2.49 / 2.28 / 2.82
  BLR                                     1.92 / 1.59 / 1.41     2.71 / 2.44 / 1.92
  LR                                      1.55 / 1.47 / 1.36     2.78 / 2.57 / 2.12

  Stock market data                       MAE (30/60/90)         MSE (30/60/90)
  Variational inference (normal)          .87  / .71  / .62      1.54 / 1.41 / 1.24
  Gibbs sampling (normal)                 1.32 / .99  / .90      1.78 / 1.67 / 1.56
  Variational inference (inv. Gaussian)   .74  / .63  / .56      1.39 / 1.28 / 1.13
  Variational inference (exponential)     1.01 / .92  / .79      1.62 / 1.51 / 1.40
  ODP                                     .99  / .88  / .73      1.74 / 1.57 / 1.38
  GPR                                     .83  / .76  / .68      1.53 / 1.44 / 1.29
  VLR                                     1.07 / .99  / .90      1.82 / 1.71 / 1.50
  BLR                                     1.16 / 1.05 / .92      1.89 / 1.76 / 1.56
  LR                                      1.25 / 1.13 / 1.01     1.94 / 1.83 / 1.64

  Telemonitoring data                     MAE (30/60/90)         MSE (30/60/90)
  LR                                      1.86 / 1.55 / 1.36     2.09 / 1.66 / 1.36
  BLR                                     1.91 / 1.60 / 1.32     2.13 / 1.63 / 1.30
  VLR                                     1.88 / 1.52 / 1.28     2.07 / 1.70 / 1.33
  ODP                                     1.85 / 1.59 / 1.33     2.10 / 1.64 / 1.29
  GPR                                     1.80 / 1.56 / 1.27     2.04 / 1.57 / 1.26
  Variational inference (inv. Gaussian)   1.79 / 1.54 / 1.25     2.01 / 1.59 / 1.25
  Variational inference (exponential)     1.77 / 1.48 / 1.23     1.99 / 1.53 / 1.20
  Gibbs sampling (normal)                 1.81 / 1.59 / 1.30     1.80 / 1.67 / 1.35
  Variational inference (normal)          1.58 / 1.39 / 1.17     1.82 / 1.65 / 1.51

  Breast cancer data                      Class percentage accuracy (30/60/90)
  Variational inference (Probit)          86.4 / 92.1 / 98.3
  Variational inference (multinomial)     90.4 / 95.1 / 98.8
  Naive Bayes                             69.7 / 76.9 / 82.8
  SVM                                     74.4 / 78.7 / 86.9
  Logistic                                75.3 / 81.2 / 89.5

Table 3-5. List of five different stocks with the top 3 most significant stocks that influence each stock. Here Intel, Verizon, Cisco, IBM, and AT-T are tech stocks; MMM, CAT, DD, Boeing, and GE are machinery/chemical stocks; XOM and Chevron are energy stocks; AXP, GS, PG, TRX, JPM, and VISA are finance/retail stocks; and MCD, J-J, and Coca-Cola are food stocks.

  Time period   Cisco     Goldman Sachs   Chevron     McDonald    Boeing
  2000-07       Verizon   JPM             XOM         JandJ       DD
                IBM       VISA            Boeing      Coca-Cola   GE
                GE        AXP             MMM         NKE         GS
  2007-09       AXP       XOM             AT-T        MMM         MCD
                INTEL     NKE             PG          IBM         VISA
                DIS       DD              Coca-Cola   TRX         MMM
  2009-13       INTEL     AXP             XOM         Coca-Cola   CAT
                MSFT      PG              CAT         Merck       DD
                DD        JPM             GE          JandJ       JPM

Dow 30 stocks belong to disparate market sectors such as technology (Microsoft, Intel, etc.), finance (Goldman Sachs, American Express, etc.), food/pharmaceuticals (Coca-Cola, McDonald's, Johnson and Johnson), and energy and machinery (Chevron, GE, Boeing, ExxonMobil). We divided the dataset into 3 time segments on the two sides of the financial crisis of 2008. The first comprised the stock values from Nov-00 to Nov-07, and the third the stock values from Dec-08 to Dec-13; the middle segment, set as the remainder, was representative of the financial crisis. Using the models, we modeled each company's stock value as a function of the values of the others in the Dow 30. We recorded the stocks having the most impact on the determination of the value of each stock; the impacts are, by definition, the magnitudes of the weighted coefficients of the covariates (the stock values) in the models. Two significant trends were noteworthy. Firstly, when the market was stable (the first and third segments), stocks from any given sector had impact largely on the same sector, with few stocks being influential overall. Secondly, the sectors having the most impact on a specific stock were the same on both sides of the crisis. For example, Microsoft (tech sector) is largely modeled by Intel, IBM (tech), GE (machinery), and JPM (finance) previous to the crisis, and is modeled by Cisco, Intel (tech), Boeing (machinery), and GS (finance) (in descending order of weights) post-crisis.

However, during the crisis, the stocks showed no such trends. For example, Microsoft is impacted by GS, MMM, TRX, and Cisco, showing no sector-wise trend. We report 5 additional such results in Table 3-5. All the results are for the Inverse Gaussian model, but they are quite similar for the other models as well.

CHAPTER 4
AUTOMATIC DETECTION OF LATENT COMMON CLUSTERS OF GROUPS IN MULTIGROUP REGRESSION

Models Related to iMG-GLM

After its introduction, the Generalized Linear Model was extended to the Hierarchical Generalized Linear Model (HGLM) (Lee and Nelder, 1996). Structured dispersion was then included in Lee and Nelder (2001a), and models for spatio-temporal correlation were proposed in Lee and Nelder (2001b). Generalized Linear Mixed Models (GLMMs) were proposed in Breslow and Clayton (1993). The random effects in HGLM were specified by both mean and dispersion in Lee and Nelder (2006). Mixtures of linear regression were proposed in Viele and Tong (2002), and hierarchical mixtures of regression were presented in Jordan and Jacobs (1993). Varying coefficient models were proposed in Hastie and Tibshirani (1993). A multi-task model for classification in a nonparametric Bayesian scenario was introduced in Ya Xue and Carin (2007). Sharing hidden nodes in neural networks was introduced in Baxter (1995, 2000), and general multi-task learning was first described in Caruana (1997). A common prior in hierarchical Bayesian models was used in Yu et al. (2005) and Zhang et al. (2005), and common structure sharing in the predictor space was presented in Ando and Zhang (2005). All of these models suffer the shortcoming of not identifying the latent clustering effect across groups, as well as the varying uncertainty with respect to covariates across groups, both of which the iMG-GLMs presented here inherently model.

iMG-GLM Model Formulation

We consider $M$ groups indexed by $j=1,\ldots,M$ and the complete data $D=\{x_{j,i},y_{j,i}\}$, $i=1,\ldots,N_j$. The $\{x_{j,i},y_{j,i}\}$ are covariate-response pairs drawn i.i.d. from an underlying density which differs, along with the nature of $\{x_{j,i},y_{j,i}\}$, among the various models.

Figure 4-1. Graphical representation of the iMG-GLM-1 model.

Normal iMG-GLM-1 Model

In the Normal iMG-GLM-1 model, the generative model of the covariate-response pair is given by the following set of equations. Here $X_{ji}$ and $Y_{ji}$ represent the $i$th continuous covariate-response pair of the $j$th group. The distribution of $Y_{j,i}\mid X_{j,i}$ is normal, parametrized by $\beta_{0:D}$ and $\lambda$. The (Normal-Gamma) distribution over $\{\beta_{kd},\lambda_k\}$ is the prior distribution on the covariate coefficients; this distribution is the base distribution ($G_0$) of the Dirichlet Process. The set $\{m_0,\kappa_0,a_0,b_0\}$ constitutes the hyper-parameters of the covariate-coefficient ($\beta$) distribution. The graphical representation of the Normal model is given in Figure 4-1.
\[
v_k\sim\mathrm{Beta}(\alpha_1,\alpha_2),\qquad \pi_k=v_k\prod_{n=1}^{k-1}(1-v_n)
\]
\[
\mathcal{N}\!\big(\beta_{kd}\mid m_0,(\kappa_0\lambda_k)^{-1}\big)\,\mathrm{Gamma}(\lambda_k\mid a_0,b_0)
\]
\[
Z_j\mid\{v_k\}\sim\mathrm{Categorical}(\pi_1,\ldots,\pi_\infty),\qquad
Y_{ji}\mid X_{ji}\sim\mathcal{N}\!\Big(Y_{ji}\,\Big|\,\sum_{d=0}^{D}\beta_{Z_j d}X_{jid},\;\lambda^{-1}_{Z_j}\Big)
\]

Logistic Multinomial iMG-GLM-1 Model

In the Logistic Multinomial iMG-GLM-1 model, a multinomial logistic framework is used for a categorical response $Y_{ji}$ with a continuous covariate $X_{ji}$, for the $i$th data point of the $j$th group; $t$ is the index of the category. The distribution of $Y_{j,i}\mid X_{j,i}$ is categorical, parametrized by $\beta_{0:D,0:T}$. The (Normal) distribution over $\{\beta_{ktd}\}$ is the prior distribution on the covariate coefficients, which is the base distribution ($G_0$) of the Dirichlet Process; the set $\{m_0,s_0\}$ constitutes its hyper-parameters.
\[
v_k\sim\mathrm{Beta}(\alpha_1,\alpha_2),\qquad \pi_k=v_k\prod_{n=1}^{k-1}(1-v_n),\qquad
\beta_{ktd}\sim\mathcal{N}(\beta_{ktd}\mid m_0,s_0^2),\qquad Z_j\mid\{v_k\}\sim\mathrm{Categorical}(\pi_1,\ldots,\pi_\infty)
\]
\[
Y_{ji}=t\mid X_{ji},Z_j\sim\frac{\exp\!\big(\sum_{d=0}^{D}\beta_{Z_j t d}X_{jid}\big)}{\sum_{t'=1}^{T}\exp\!\big(\sum_{d=0}^{D}\beta_{Z_j t' d}X_{jid}\big)}
\]

Poisson iMG-GLM-1 Model

In the Poisson iMG-GLM model, a Poisson distribution is used for the count response. Here $X_{ji}$ and $Y_{ji}$ represent the $i$th continuous/ordinal covariate and count response pair of the $j$th group. The distribution of $Y_{j,i}\mid X_{j,i}$ is Poisson, parametrized by $\beta_{0:D}$. The (Normal) distribution over $\{\beta_{kd}\}$ is the prior distribution on the covariate coefficients, which is the base distribution ($G_0$) of the Dirichlet Process; the set $\{m_0,s_0\}$ constitutes its hyper-parameters.
\[
v_k\sim\mathrm{Beta}(\alpha_1,\alpha_2),\qquad \pi_k=v_k\prod_{n=1}^{k-1}(1-v_n),\qquad
\{\beta_{kd}\}\sim\mathcal{N}(\beta_{kd}\mid m_0,s_0^2)
\]
\[
Y_{ji}\mid X_{ji},Z_j\sim\mathrm{Poisson}\Big(y_{ji}\,\Big|\,\exp\Big(\sum_{d=0}^{D}\beta_{Z_j d}X_{jid}\Big)\Big)
\]

Variational Inference

The inter-coupling between $Y_{ji}$, $X_{ji}$, and $z_j$ in all three models described above makes computing the posterior of the latent parameters analytically intractable.

We therefore introduce the following fully factorized and decoupled variational distributions as surrogates.

Normal iMG-GLM-1 Model

The variational distribution for the Normal model is defined formally as
\[
q(z,v,\beta_{kd},\lambda_k)=\prod_{k=1}^{K}\mathrm{Beta}(v_k\mid\gamma_{1k},\gamma_{2k})\prod_{j=1}^{M}\mathrm{Multinomial}(z_j\mid\phi_j)\prod_{k=1}^{K}\prod_{d=0}^{D}\mathcal{N}\!\big(\beta_{kd}\mid m_{kd},(\kappa_k\lambda_k)^{-1}\big)\,\mathrm{Gamma}(\lambda_k\mid a_k,b_k)
\]
Firstly, each $v_k$ follows a Beta distribution. As in Blei and Jordan (2006), we have truncated the infinite series of $v_k$'s into a finite one by making the assumption $q(v_K=1)=1$ and $\pi_k=0$ for all $k>K$. Note that this truncation applies to the variational surrogate distribution and not to the actual posterior distribution that we approximate. Secondly, $z_j$ follows a variational multinomial distribution. Thirdly, $\{\beta_{kd},\lambda_k\}$ follows a Normal-Gamma distribution.

Logistic Multinomial iMG-GLM-1 Model

The variational distribution for the Logistic Multinomial model is given by
\[
q(z,v,\beta)=\prod_{k=1}^{K}\mathrm{Beta}(v_k\mid\gamma_{1k},\gamma_{2k})\prod_{j=1}^{M}\mathrm{Multinomial}(z_j\mid\phi_j)\prod_{k=1}^{K}\prod_{t=1}^{T}\prod_{d=0}^{D}\mathcal{N}\!\big(\beta_{ktd}\mid m_{ktd},s^2_{ktd}\big)
\]
Here $v_k$ and $z_j$ have the same distributions as described in the Normal iMG-GLM-1 model above; $\{\beta_{ktd}\}$ follows a variational Normal model.

Poisson iMG-GLM-1 Model

The variational distribution for the Poisson iMG-GLM-1 model is given by
\[
q(z,v,\beta)=\prod_{k=1}^{K}\mathrm{Beta}(v_k\mid\gamma_{1k},\gamma_{2k})\prod_{j=1}^{M}\mathrm{Multinomial}(z_j\mid\phi_j)\prod_{k=1}^{K}\prod_{d=0}^{D}\mathcal{N}\!\big(\beta_{kd}\mid m_{kd},s^2_{kd}\big)
\]
Here $v_k$ and $z_j$ again have the same distributions as in the Normal iMG-GLM-1 model; $\{\beta_{kd}\}$ follows a variational Normal model.

Parameter Estimation for the Variational Distribution

We bound the log likelihood of the observations in the generalized form of iMG-GLM-1 (the same for all the models) using Jensen's inequality, $\psi(E[X])\ge E[\psi(X)]$, where $\psi$ is a concave function and $X$ is a random variable. In this section, we differentiate the individually derived bounds with respect to the variational parameters of the specific models to obtain their respective estimates.

Parameter Estimation of the iMG-GLM-1 Normal Model

The parameter estimation for the Normal model is as follows:
\[
\gamma_{1k}=\alpha_1+\sum_{j=1}^{M}\phi_{jk},\qquad \gamma_{2k}=\alpha_2+\sum_{j=1}^{M}\sum_{p=k+1}^{K}\phi_{jp}
\]
\[
\phi_{jk}=\frac{\exp(S_{jk})}{\sum_{k'=1}^{K}\exp(S_{jk'})},\qquad
S_{jk}=\sum_{l=1}^{k}\big\{\Psi(\gamma_{1l})-\Psi(\gamma_{1l}+\gamma_{2l})\big\}+P_{jk}
\]
\[
P_{jk}=\frac12\sum_{i=1}^{N_j}\Big\{\log\frac{1}{2\pi}+\Psi(a_k)-\log(b_k)-\kappa_k^{-1}\Big(1+\sum_{d=1}^{D}X^2_{jid}\Big)-\frac{a_k}{b_k}\Big(Y_{ji}-m_{k0}-\sum_{d=1}^{D}m_{kd}X_{jid}\Big)^2\Big\}
\]
\[
\kappa_k=\frac{(D+1)\kappa_0+\sum_{j=1}^{M}\sum_{i=1}^{N_j}\phi_{jk}\big(1+\sum_{d=1}^{D}X^2_{jid}\big)}{D+1},\qquad
a_k=\sum_{d=0}^{D}a_0+\frac12\sum_{j=1}^{M}\sum_{i=1}^{N_j}\phi_{jk}
\]
\[
b_k=\frac12\Big\{\sum_{d=0}^{D}\kappa_0(m_{kd}-m_0)^2+2b_0+\sum_{j=1}^{M}\sum_{i=1}^{N_j}\phi_{jk}\Big(Y_{ji}-m_{k0}-\sum_{d=1}^{D}m_{kd}X_{jid}\Big)^2\Big\}
\]
\[
m_{k0}=\frac{m_0\kappa_0+\sum_{j=1}^{M}\sum_{i=1}^{N_j}\phi_{jk}\big(Y_{ji}-\sum_{d=1}^{D}m_{kd}X_{jid}\big)}{\kappa_0+\sum_{j=1}^{M}\sum_{i=1}^{N_j}\phi_{jk}},\qquad
m_{kd}=\frac{m_0\kappa_0+\sum_{j=1}^{M}\sum_{i=1}^{N_j}\phi_{jk}\big(Y_{ji}-m_{k0}-\sum_{d'\ne d}m_{kd'}X_{jid'}\big)X_{jid}}{\kappa_0+\sum_{j=1}^{M}\sum_{i=1}^{N_j}\phi_{jk}X^2_{jid}}
\]

Parameter Estimation of the iMG-GLM-1 Multinomial Model

For the Logistic Multinomial model, the estimation of $\gamma_{1k}$, $\gamma_{2k}$, and $\phi_{jk}$ is identical to the Normal model, with the only difference being that $P_{jk}$ is given by
\[
P_{jk}=\frac12\sum_{i=1}^{N_j}\Big\{\log\frac{1}{2\pi}+\sum_{t=1}^{T}Y_{jit}\Big(m_{k0t}+\sum_{d=1}^{D}X_{jid}m_{kdt}\Big)\Big\}
\]
\[
m_{kdt}=m_0s_0^2+s^2_{kdt}\sum_{j=1}^{M}\phi_{jk}\sum_{i=1}^{N_j}Y_{jit}X_{jid},\qquad
s^2_{kdt}=s_0^2+\sum_{j=1}^{M}\phi_{jk}\sum_{i=1}^{N_j}\sum_{d=0}^{D}X^2_{jid}\exp\Big(\sum_{d'=0}^{D}X_{jid'}m_{kd't}\Big)
\]

Parameter Estimation of the Poisson iMG-GLM-1 Model

Again, in the Poisson model, the estimation of $\gamma_{1k}$, $\gamma_{2k}$, $\phi_{jk}$ is similar to the Normal model, with the only difference being that the term $P_{jk}$ is given by
\[
P_{jk}=\frac12\sum_{i=1}^{N_j}\Big\{-\sum_{d=0}^{D}\exp\Big(\frac{s_{kd}}{2}+m_{kd}X_{jid}\Big)s_{kd}+Y_{ji}\sum_{d=0}^{D}X_{jid}m_{kd}-\log(Y_{ji}!)\Big\}
\]
\[
\frac{m_{kd}}{s^2_{kd}}+\exp(m_{kd})+\sum_{j=1}^{M}\phi_{jk}\sum_{i=1}^{N_j}\frac{X_{jid}}{s^2_{kd}}=\sum_{j=1}^{M}\sum_{i=1}^{N_j}\phi_{jk}Y_{ji}X_{jid}
\]
For $m_{kd}$ and $s_{kd}$ there is no closed-form solution; however, they can be solved quickly via any iterative root-finding method.

Predictive Distribution

Finally, we define the predictive distribution for a new response given a new covariate and the set of previous covariate-response pairs for the trained groups:
\[
p(Y_{j,new}\mid X_{j,new},Z_j,\beta_{k=1:K,d=0:D})=\sum_{k=1}^{K}\int_{Z_{jk}}p\big(Y_{j,new}\mid X_{j,new},\beta_{k,d=0:D}\big)\,q(z,v,\beta_{kd},\lambda_k)
\]
Integrating out $q(z,v,\beta_{kd},\lambda_k)$, we get the following equation for the Normal model:
\[
p(Y_{j,new}\mid X_{j,new})=\sum_{k=1}^{K}\phi_{jk}\,\mathrm{St}\Big(Y_{j,new}\,\Big|\,\sum_{d=0}^{D}m_{kd}X_{j,new,d},\;L_k,\;B_k\Big)
\]
Here $L_k=\frac{(2a_k-D)\,\kappa_k}{2(1+\kappa_k)\,b_k}$ is the precision parameter of the Student's t-distribution and $B_k=2a_k-D$ is the degrees of freedom.

For the Poisson and Multinomial models, the integration of the densities is not analytically tractable. Therefore, we use Monte Carlo integration to obtain
\[
E[Y_{j,new}\mid X_{j,new},X,Y]=E\big[E[Y_{j,new}\mid X_{j,new},q(\beta_{kd})]\mid X,Y\big]\approx\frac{1}{S}\sum_{s=1}^{S}E\big[Y_{j,new}\mid X_{j,new},\beta^{(s)}_{kd}\big].
\]
In all experiments presented in this dissertation, we collected 100 i.i.d. samples ($S=100$) from the density of $\beta$ to evaluate the expected value of $Y_{j,new}$. The complete variational inference algorithm for the iMG-GLM-1 Normal model is given in Table 4-1.

Table 4-1. Description of the variational inference algorithm for the iMG-GLM-1 Normal model.

  1. Initialize the variational parameters of q(z, v, beta_kd, lambda_k) randomly in their state space.
  Repeat
    2. Estimate gamma_1k and gamma_2k, for k = 1 to K.
    3. Estimate phi_jk, for j = 1 to M and k = 1 to K.
    4. Estimate the model density parameters {m_kd, kappa_k, a_k, b_k}, for k = 1 to K and d = 0 to D.
  until converged
  5. Evaluate E[Y_j,new] for a new covariate X_j,new.

iMG-GLM-2 Model

We can now learn a new group $M+1$ after all of the first $M$ groups have been trained. For this process, we memorize the learned latent parameters from the previously trained data.

Information Transfer from Prior Groups

First, we write down the conditional distribution of the latent parameters of the new group given all the parameters in the previous groups. We define the set of latent parameters $(Z,v,\beta,\lambda)$ as $\theta$. From the definition of the Dirichlet Process, we write down the probability of the latent parameters of the $(M+1)$th group given the previous ones:
\[
p(\theta_{M+1}\mid\theta_{1:M},\alpha,G_0)=\frac{\alpha}{M+\alpha}G_0+\frac{1}{M+\alpha}\sum_{k=1}^{K}n_k\delta_{\theta_k},
\]
where $n_k=\sum_{j=1}^{M}Z_{jk}$ counts the groups for which $\theta_j=\theta_k$.

If we substitute $\theta_k=E[\theta_k]$, which we define through $\Theta=\{\phi_{jk},\gamma_k,m_{dk},\kappa_k,s_{dk}\}$, we get
\[
p\big(\theta_{M+1}\mid\Theta,\alpha,G_0\big)=\frac{\alpha}{M+\alpha}G_0+\frac{1}{M+\alpha}\sum_{k=1}^{K}n_k\delta_{\theta_k},
\]
where now $n_k=\sum_{j=1}^{M}\mathrm{index}_{jk}$ and $\mathrm{index}_{jk}=\arg\max_k(\phi_{jk})$. This distribution represents the prior belief about the new group's latent parameters in the Bayesian setting. Our goal is now to compute the posterior distribution of the new group's latent parameters after we view the likelihood of the data in the $(M+1)$th group:
\[
p(\theta_{M+1}\mid\Theta,\alpha,D_{M+1})=\frac{p(D_{M+1}\mid\theta_{M+1})\,p(\theta_{M+1}\mid\Theta,G_0)}{p(D_{M+1}\mid\Theta,G_0)},
\]
where $p(D_{M+1}\mid\theta_{M+1})=\prod_{i=1}^{N_{M+1}}p(Y_{M+1,i}\mid\theta_{M+1},X_{M+1,i})$.

Posterior Sampling

The posterior above does not have a closed-form solution apart from the Normal model, so we apply a Metropolis-Hastings algorithm (Robert and Casella, 2005; Neal, 2000b) for the Logistic Multinomial and Poisson models. For the Normal model, $p(\theta_{M+1}\mid\Theta,\alpha,D_{M+1})$ turns out to be a mixture of Normal-Gamma densities with the following parameters:
\[
m^*_k=\big(X^T_{M+1}X_{M+1}+\kappa_kI\big)^{-1}\big(X^T_{M+1}Y_{M+1}+\kappa_kI\,m_k\big),\qquad
\kappa^*_k=X^T_{M+1}X_{M+1}+\kappa_kI
\]
\[
a^*_k=a_k+N_{M+1}/2,\qquad
b^*_k=b_k+\tfrac12\big(Y^T_{M+1}Y_{M+1}+m_k^T\kappa_km_k-m_k^{*T}\kappa^*_km^*_k\big)
\]
For the Poisson and Logistic Multinomial models, the Metropolis-Hastings algorithm has the following steps. First, we draw a sample $\bar\theta$ from the prior above. Then we draw a candidate sample $\theta^*$. Next, we compute the acceptance probability $h=\min\big[1,\,p(D_{M+1}\mid\theta^*)/p(D_{M+1}\mid\bar\theta)\big]$. We set the new $\bar\theta$ to $\theta^*$ with this acceptance probability; otherwise, it retains its old value. We repeat the above four steps until enough samples have been collected, which yields the approximation of the posterior.

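The four steps above correspond to an independence Metropolis-Hastings sampler whose proposal is the mixture prior itself, so the acceptance ratio reduces to a likelihood ratio. A minimal sketch follows, in which sample_prior and log_lik are assumed callables supplied by the model rather than functions defined in the text:

```python
import numpy as np

def metropolis_hastings(log_lik, sample_prior, n_samples, rng=None):
    """Independence MH: propose from the prior over cluster parameters,
    accept with probability min(1, p(D|theta*) / p(D|theta))."""
    rng = np.random.default_rng(rng)
    theta = sample_prior()
    ll = log_lik(theta)
    samples = []
    for _ in range(n_samples):
        theta_star = sample_prior()                 # candidate draw
        ll_star = log_lik(theta_star)
        if np.log(rng.uniform()) < ll_star - ll:    # accept w.p. min(1, ratio)
            theta, ll = theta_star, ll_star
        samples.append(theta)
    return samples
```

Working with log-likelihoods, as above, avoids numerical underflow when $N_{M+1}$ is large.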
Prediction for New Group Test Samples

We seek to predict the future $Y_{M+1,new}\mid X_{M+1,new},\Theta$ by the following equation, using the previous collection of posterior samples $\theta_{t=1:T}$, where $T$ is the number of samples:
\[
p(Y_{M+1,new}\mid X_{M+1,new},\Theta)=\frac{1}{T}\sum_{t=1}^{T}p(Y_{M+1,new}\mid X_{M+1,new},\theta_t).
\]

Experimental Results

We present empirical studies on two real-world applications: (a) a stock market accuracy and trend detection problem, and (b) a clinical trial problem on the efficacy of a new drug.

Trends in Stock Market

We propose iMG-GLM-1 and iMG-GLM-2 as trend spotters in financial markets. We have chosen daily close-out stock prices of 51 stocks from NYSE and Nasdaq in various sectors: Financials (BAC, WFC, JPM, GS, MS, Citi, BRK-B, AXP), Technology (AAPL, MSFT, FB, GOOG, CSCO, IBM, VZ), Consumer Discretionary (AMZN, DIS, HD, MCD, SBUX, NKE, LOW), Energy (XOM, CVX, SLB, KMI, EOG), Health Care (JNJ, PFE, GILD, MRK, UNH, AMGN, AGN), Industrials (GE, MMM, BA, UNP, HON, UTX, UPS), Materials (DOW, DD, MON, LYB), and Consumer Staples (PG, KO, PEP, PM, CVS, WMT). The task is to predict future stock prices given past stock values for all these stocks, and to spot general trends in the clusters of stocks, which might be helpful in finding a far more powerful model for prediction. The general setting is an autoregressive process via the Normal iMG-GLM-1 model, with the lags representing the predictor variables and the response being the current stock price. The lag length was determined to be 3 by trial and error with a 50-50 training-testing split. Data was collected from September 13, 2010 to September 13, 2015, with 1250 data points, from Google Finance.

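The autoregressive setup amounts to building a lagged design matrix per stock. A minimal sketch (the function name is illustrative):

```python
import numpy as np

def lagged_design(prices, lag=3):
    """Build the autoregressive design for one stock: each response y_t is
    the closing price at time t, and the covariates are the `lag` previous
    prices (lag = 3 was chosen by the validation split described above)."""
    prices = np.asarray(prices, dtype=float)
    X = np.column_stack([prices[i:len(prices) - lag + i] for i in range(lag)])
    y = prices[lag:]
    return X, y
```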
Some very interesting trends were noteworthy. After the clustering was accomplished for the Normal model, the stocks became grouped almost entirely by the sectors they came from. Specifically, we witnessed a total of 9 clusters of stocks, close in makeup to the 8 sectors chosen originally, consolidating all the stock sectors such as financials, health care, etc. For example, Apple, Microsoft, Verizon, Google, Cisco, and AMZN were clubbed together in one cluster. This signifies that all of these stocks share the same autoregressive density with the same variance. In comparison, single and separate modeling of the stocks resulted in a much inferior model. Joint modeling was particularly useful because we had only 625 data points per stock for training purposes over the past 5 years. As a result, the transfer of stock data points from one stock to another helped mitigate the problem of over-fitting the individual stocks while ensuring a much improved model for density estimation for a cluster of stocks. We report the clustering of the stocks in Table 4-2. We also show the accuracy of the prediction for the iMG-GLM-1 model in terms of the mean absolute error (MAE) in Table 4-3. Note that the MAE for the Normal model significantly outperformed the GLMM Normal model, stock-specific Random Forest, Linear Regression, and Gaussian Process Regression.

We now highlight the utilization of information transfer in the iMG-GLM-1 model. We trained the first 51 stocks, varying the number of training samples in each group/stock from 200 to 1200 in steps of 250. For each group, we chose the training samples randomly from the datasets, and the remainder was used for testing. The hyper-parameters were set as $\{m_0,\kappa_0,a_0,b_0\}=\{0,1,2,2\}$. We also ran our inference with different settings of the hyper-parameters but found the results not to be particularly sensitive to them. We plot the average MAE of 50 random runs in Figure 4-2. The iMG-GLM-1 Normal model generally outperformed the other competitors. A few interesting results were found in this experiment. When very few training samples were used, virtually all the algorithms performed poorly; in particular, iMG-GLM-1 clubbed all stocks into one cluster, as sufficient data was not present to identify the statistical similarities between stocks.

As the number of training samples increased, iMG-GLM-1 started to pick out clusters of groups/stocks, as it was able to blend latent common densities among different groups. As the number of training samples got closer to the total number of data points (1200), all other models started to perform close to the iMG-GLM-1 model, because they managed to learn each stock well in isolation, indicating that further data from other groups became less useful.

We now proceed to iMG-GLM-2, where we trained 10 new stocks from different sectors (CMCSA, PCLN, WBA, COST, KMI, AIG, GS, HON, LMT, T). Two features which influenced the learning were considered. First, we varied the number of training samples from 400 to 750 to 1100 for each previous group used to further train group $M+1$. Then, we changed the number of training samples for the new groups from 200 to 1200 in steps of 250. We plot the MAE results for 50 random runs in Figure 4-3. The prior belief is that the new groups are similar in response density to the previous groups, and iMG-GLM-2 efficiently transfers this information from the previous groups to the new groups. The iMG-GLM-1 model learns an informative prior for new groups even when the number of training samples for each previous group is very small (as seen in the first part of Figure 4-3). The accuracy increases very slightly as the number of training samples increases in each group. But, with the number of training samples for the new groups increasing, iMG-GLM-2 does not improve at all. This is due to the flexible information transfer from the previous groups: the model does not require more training samples for its own group to model its density, because it has already obtained sufficient information as prior from the previous groups.

Clinical Trial Problem Modeled by the Poisson iMG-GLM Model

Finally, we explored a clinical trial problem (IBM, 2011) for testing whether a new anticonvulsant drug reduces a patient's rate of epileptic seizures. Patients were assigned the new drug or the placebo, and the number of seizures was recorded over a six-week period. A measurement was made before the trial as a baseline.

Table 4-2. Clusters of stocks from various sectors. We note 9 clusters of stocks, consolidating all the pre-chosen sectors such as financials, materials, etc. Group numbers are indexed from 1 to 9.

  1: AAPL, MSFT, VZ, GOOG, CSCO, AMZN, IBM, FB
  2: BAC, WFC, JPM, AXP, CITI, GS, MS, BRK-B
  3: DIS, HD, LOW, SBUX, MCD, NKE
  4: XOM, CVX, SLB, EOG, KMI
  5: GILD, MRK, UNH, AMGN, AGN
  6: GE, MMM, BA, UNP, HON, UTX, UPS
  7: DOW, DD, MON, LYB
  8: JNJ, PFE
  9: PG, KO, PEP, PM, CVS, WMT

Table 4-3. Mean absolute error for all stocks. iMG-GLM-1 has much higher accuracy than the other competitors.

           AAPL MSFT VZ   GOOG CSCO AMZN BAC  WFC  JPM  AXP  PG   CITI GS   MS   DIS  HD   LOW
  GPR      .023 .004 .087 .078 .093 .189 .452 .265 .176 .190 .378 .018 .037 .098 .278 .038 .011
  RF       .278 .903 .370 .256 .290 .570 .159 .262 .329 .592 .746 .894 .956 .239 .934 .189 .045
  LR       .381 .865 .280 .038 .801 .706 .589 .491 .391 .467 .135 .728 .578 .891 .389 .790 .624
  GLMM     .378 .489 .389 .208 .972 .786 .289 .768 .189 .389 .590 .673 .901 .490 .209 .391 .991
  iMG-GLM  .012 .002 .009 .011 .018 .028 .047 .038 .035 .079 .069 .087 .019 .030 .139 .189 .213

           SBUX MCD  XOM  CVX  SLB  EOG  KMI  GILD MRK  UNH  AMGN AGN  GE   MMM  BA   UNP  HON
  GPR      .837 .289 .849 .583 .185 .810 .473 .362 .539 .289 .306 .438 .769 .848 .940 .829 .691
  RF       .884 .321 .895 .843 .774 .863 .973 .729 .894 .794 .695 .549 .603 .738 .481 .482 .482
  LR       .380 .391 .940 .995 .175 .398 .539 .786 .591 .320 .793 .839 .991 .839 .698 .389 .298
  GLMM     .649 .720 .364 .920 .529 .369 .837 .630 .729 .481 .289 .970 .740 .649 .375 .439 .539
  iMG-GLM  .003 .018 .128 .291 .005 .060 .052 .017 .014 .078 .009 .067 .191 .034 .098 .145 .238

           DOW  DD   LYB  JNJ  PFE  KO   PEP  PM   CVS  WMT  BRK-B IBM  FB   NKE  UTX  UPS  MON
  GPR      .689 .890 .745 .907 .678 .378 .867 .945 .361 .934 .589  .845 .901 .310 .483 .828 .748
  RF       .181 .098 .489 .237 .692 .827 .490 .295 .749 .692 .957  .295 .478 .694 .747 .806 .945
  LR       .67  .386 .984 .982 .749 .294 .256 .567 .345 .767 .893  .956 .294 .389 .694 .921 .702
  GLMM     .727 .389 .288 .592 .402 .734 .923 .900 .571 .312 .839  .956 .638 .490 .390 .372 .512
  iMG-GLM  .038 .078 .063 .019 .024 .007 .089 .192 .138 .111 .289  .390 .289 .218 .200 .149 .087

The objective was to model the number of seizures, which, being count data, is modeled using a Poisson distribution with a log link. The covariates are: treatment center size (ordinal), number of weeks of treatment (ordinal), type of treatment, i.e., new drug or placebo (nominal), and gender (nominal). A Poisson distribution with log link was used for the count of seizures. Here $X_{ji}$ and $Y_{ji}$ represent the $i$th covariate and count response pair of the $j$th group, and the (Normal) distribution over $\{\beta_{kd}\}$ is the prior distribution on the covariate coefficients. We found that the patients' seizure counts (the patients form the groups) cluster into multiple collections. This signifies that a majority of the patients across groups show the same response to the treatment. We obtained 8 clusters from 300 out of the 565 patients for the iMG-GLM-1 model (the remaining 265 were set aside for modeling through the iMG-GLM-2 model).

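Under the log link, a fitted model turns into a simple per-cluster rate computation. The sketch below, whose names, shapes, and coefficient source are assumptions for illustration, averages the cluster rates with the inferred responsibilities:

```python
import numpy as np

def expected_seizures(x, betas, weights):
    """Mixture prediction of a seizure count under the Poisson log link:
    lambda_k = exp(beta_k . x) per cluster, averaged with the inferred
    cluster weights. `betas` has shape (K, D+1), `weights` shape (K,),
    both assumed to come from the fitted iMG-GLM posterior."""
    x_aug = np.concatenate(([1.0], np.asarray(x, dtype=float)))
    rates = np.exp(betas @ x_aug)      # per-cluster Poisson rates
    return float(weights @ rates)
```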
Among them, 5 clusters showed that the new drug reduces the number of epileptic seizures with an increasing number of weeks of treatment, while the remaining 3 clusters did not show any improvement. We also report the forecast error for the number of epileptic seizures of the remaining 265 patients in Table 4-4. Our recommendation for the usage of the new drug would be a cluster-based solution: for a specific patient, if she falls in one of those clusters with a decreasing trend in the number of seizures with time, we would recommend the new drug, and otherwise not. Out of the 265 test-case patients modeled through iMG-GLM-2, 180 showed signs of improvement while 85 did not. We kept all the weeks as training for the iMG-GLM-1 model, and the first five weeks as training and the last week as testing data for the iMG-GLM-2 model. A traditional Poisson GLMM cannot infer these findings, since the densities are not shared at the patient-group level. Moreover, only the Poisson iMG-GLM-1/2 based prediction is formally equipped to recommend a patient-cluster based solution for the new drug, whereas all traditional mixed models predict a global recommendation for all patients.

Table 4-4. MSE and MAE of the algorithms for the clinical trial dataset, and number of patients in the clusters for the iMG-GLM-1 and iMG-GLM-2 models.

  Patients per cluster, iMG-GLM-1 model
    Positive (first five): 46, 30, 40, 27, 33    Negative (last three): 24, 37, 24
  Patients per cluster, iMG-GLM-2 model
    Positive (first five): 33, 24, 41, 29, 53    Negative (last three): 15, 32, 38

                                             iMG-GLM   Poisson GLMM   Poisson regression   RForest
  Root mean square error (L2), iMG-GLM-2     1.53      1.58           1.92                 1.75
  Mean absolute error (L1), iMG-GLM-2        1.14      1.34           1.51                 1.62

Figure 4-2. The average mean absolute error over the 51 stocks for 50 random runs of the iMG-GLM-1 model with varying numbers of training samples.

Figure 4-3. The average mean absolute error over the 10 new stocks for 50 random runs of the iMG-GLM-2 model with varying numbers of training samples in both the previous and the new groups.

CHAPTER 5
AUTOMATIC DISCOVERY OF COMMON AND IDIOSYNCRATIC LATENT EFFECTS IN MULTILEVEL REGRESSION

Models Related to HGLM

After its introduction, the Hierarchical Generalized Linear Model was extended to include structured dispersion (Lee and Nelder, 2001a) and models for spatio-temporal correlation (Lee and Nelder, 2001b). Generalized Linear Mixed Models (GLMMs) were proposed in Breslow and Clayton (1993). The random effects in HGLM were specified by both mean and dispersion in Lee and Nelder (2006). Mixtures of linear regression were proposed in Viele and Tong (2002), and hierarchical mixtures of regression in Jordan and Jacobs (1993). Varying coefficient models were proposed in Hastie and Tibshirani (1993). All of these models suffer the shortcoming of not picking up the latent inter/intra clustering effect, as well as the varying uncertainty with respect to covariates across groups, which the iHGLM presented next inherently models.

The difference between the iMG-GLM models and the iHGLM models lies in the level of modeling. The iMG-GLM models capture the clustering effect on the groups, not inside the groups: they do not deal with the data inside the groups. More precisely, they do not take into account the similarity/dissimilarity effects of the patterns of the data inside one single group or across the groups. The iHGLM models capture precisely this phenomenon; the inter/intra-group similarity and clustering effects are taken into account. Also, iHGLM is a mixture model within each single group. This means that the models are nonlinear with respect to the covariates in every group and also model varying variance within one group. The iMG-GLM model is incapable of doing this, as it is not a mixture model within the group.

An Illustrative Example

We show a simple posterior predictive trajectory of the iHGLM Normal model on a four-group synthetic dataset with a 1-D covariate in Figure 5-1. The yellow trajectory is the smoothed response posterior learned by the model. All the groups were created with four mixture components, equally weighted.

Figure 5-1. The posterior trajectory of the synthetic dataset with 4 groups. Different colors represent different subgroups.

For the first group, responses were generated through four response-covariate densities with mean and standard deviation set as $(1+x,\,.5)$, $(1.75+.5x,\,.8)$, $(1.15+.8x,\,.2)$, and $(2.40+.3x,\,.4)$. For the 2nd group they were $(8.5-x,\,1.2)$, $(1.75+.5x,\,.8)$, $(-18.25+4.5x,\,.1)$, and $(1+x,\,.5)$. For the 3rd, $(10.90-.5x,\,.9)$, $(1.15+.8x,\,.2)$, $(49.15-5.2x,\,1.1)$, and $(2.4+x,\,.3)$; and for the 4th, $(3.55+.2x,\,1)$, $(10.90-.5x,\,.9)$, $(-40.80+4.2x,\,.3)$, and $(1.75+.5x,\,.8)$. Observe that any two groups have at least one density in common. To capture this kind of multilevel data, a regression model is needed that captures the sharing of latent densities between the groups. Also, every group must be modeled by a mixture of densities, and the model must capture heteroscedasticity within groups, where the variance of the responses depends upon the covariates in each group. The iHGLM Normal model captures all of these hidden intra/inter clustering effects between the groups as well as the heteroscedasticity within the groups, as shown in Figure 5-1.

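The construction is easy to reproduce. A minimal Python sketch using the component parameters listed above follows; the covariate range is an assumption, as the text does not state it.

```python
import numpy as np

def sample_group(components, n=200, rng=None):
    """Draw (x, y) pairs for one synthetic group. Each component is
    (intercept, slope, sigma); components are mixed with equal weight,
    matching the four-group construction described above."""
    rng = np.random.default_rng(rng)
    comps = np.array(components, dtype=float)
    z = rng.integers(len(comps), size=n)      # equally weighted mixture
    x = rng.uniform(0.0, 10.0, size=n)        # 1-D covariate (assumed range)
    b0, b1, sigma = comps[z].T
    y = rng.normal(b0 + b1 * x, sigma)
    return x, y, z

# Group 1 from the text: densities (1 + x, .5), (1.75 + .5x, .8), ...
group1 = sample_group([(1.0, 1.0, 0.5), (1.75, 0.5, 0.8),
                       (1.15, 0.8, 0.2), (2.40, 0.3, 0.4)], rng=0)
```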
iHGLM Model Formulation

Normal iHGLM Model

In Normal iHGLM, the generative model of the covariate-response pair is given by the following set of equations. Here $X_{ji}$ and $Y_{ji}$ represent the $i$th continuous covariate-response pair of the $j$th group. The (Normal-Gamma) distribution over $\{\mu_d,\lambda_{xd}\}$ is the prior distribution on the covariates, and the (Normal-Gamma) distribution over $\{\beta_d,\lambda_y\}$ is the prior distribution on the covariate coefficients. Both distributions are base distributions ($H$) of the first DP. The sets $\{m_{xd0},\kappa_{xd0},a_{xd0},b_{xd0}\}$ and $\{m_{yd0},\kappa_{y0},a_{y0},b_{y0}\}$ constitute the hyper-parameters for the covariates and covariate coefficients ($\beta$), respectively.
\[
\{\mu_d,\lambda_{xd}\}\sim\mathcal{N}\!\big(\mu_d\mid m_{xd0},(\kappa_{xd0}\lambda_{xd})^{-1}\big)\,\mathrm{Gamma}(\lambda_{xd}\mid a_{xd0},b_{xd0})
\]
\[
\{\beta_d,\lambda_y\}\sim\mathcal{N}\!\big(\beta_d\mid m_{yd0},(\kappa_{y0}\lambda_y)^{-1}\big)\,\mathrm{Gamma}(\lambda_y\mid a_{y0},b_{y0})
\]
\[
G_0\sim\mathrm{DP}(\gamma,H),\qquad G_j\sim\mathrm{DP}(\alpha_0,G_0),\qquad
\{\mu_{kd},\lambda^{-1}_{xkd}\}\sim G_j,\qquad \{\beta_{kd},\lambda_{yk}\}\sim G_j
\]
\[
X_{jid}\mid\mu_{kd},\lambda_{xkd}\sim\mathcal{N}\!\big(X_{jid}\mid\mu_{kd},\lambda^{-1}_{xkd}\big),\qquad
Y_{ji}\mid X_{ji}\sim\mathcal{N}\!\Big(Y_{ji}\,\Big|\,\sum_{d=0}^{D}\beta_{kd}X_{jid},\;\lambda^{-1}_{yk}\Big)
\]

Logistic Multinomial iHGLM Model

In the Logistic Multinomial iHGLM model, the continuous covariates are modeled by a Gaussian mixture (identically to the Normal model above) and a multinomial logistic framework is used for the categorical response (the number of categories is $P$), with $p$ the index of the category. The (Normal-Gamma) distribution over $\{\mu_d,\lambda_{xd}\}$ is the prior distribution on the covariates, and the (Normal) distribution over $\{\beta_{pd}\}$ is the prior distribution on the covariate coefficients. Both distributions are base distributions ($H$) of the first DP. The sets $\{m_{xd0},\kappa_{xd0},a_{xd0},b_{xd0}\}$ and $\{m_{ypd0},s^2_{ypd0}\}$ constitute the hyper-parameters for the covariates and covariate coefficients ($\beta$), respectively. The complete model is as follows:
\[
\{\mu_d,\lambda_{xd}\}\sim\mathcal{N}\!\big(\mu_d\mid m_{xd0},(\kappa_{xd0}\lambda_{xd})^{-1}\big)\,\mathrm{Gamma}(\lambda_{xd}\mid a_{xd0},b_{xd0}),\qquad
\{\beta_{pd}\}\sim\mathcal{N}\!\big(\beta_{pd}\mid m_{ypd0},s^2_{ypd0}\big)
\]
\[
G_0\sim\mathrm{DP}(\gamma,H),\qquad G_j\sim\mathrm{DP}(\alpha_0,G_0),\qquad
\{\mu_{kd},\lambda^{-1}_{xkd}\}\sim G_j,\qquad \{\beta_{kpd}\}\sim G_j
\]
\[
X_{jid}\mid\mu_{kd},\lambda_{xkd}\sim\mathcal{N}\!\big(X_{jid}\mid\mu_{kd},\lambda^{-1}_{xkd}\big),\qquad
\{Y_{ji}=p\mid X_{ji}\}\sim\frac{\exp\!\big(\sum_{d=0}^{D}\beta_{kpd}X_{jid}\big)}{\sum_{p'=1}^{P}\exp\!\big(\sum_{d=0}^{D}\beta_{kp'd}X_{jid}\big)}
\]

Proof of Weak Posterior Consistency

We now prove an important asymptotic property of the iHGLM model: the weak consistency of the joint density estimate. The idea behind weak posterior consistency is that, as the number of group-specific input-output pairs approaches infinity, the posterior distribution $\Pi\big(f\mid(X_{ji},Y_{ji})_{i=1}^{n}\big)$ concentrates in a weak neighborhood of the true distribution $f_0(x,y)$. This ensures accumulation of the posterior distribution in regions of densities where the integral of every bounded and continuous function with respect to the densities in the region is arbitrarily close to its integral with respect to the true density. Posterior consistency acts as a frequentist justification of Bayesian methods: more data directs the model to the correct parameters. In spite of it being an asymptotic property, posterior consistency remains a benchmark, because its violation raises the possibility of inferring the wrong posterior distribution. Hence, posterior consistency, when proven, gives theoretical validation to the usefulness of the iHGLM model. A weak neighborhood of $f_0$ of radius $\epsilon$, $W_\epsilon(f_0)$, is defined as follows: for every bounded, continuous function $g$,
\[
W_\epsilon(f_0)=\Big\{f:\Big|\int f_0(x,y)\,g(x,y)\,dx\,dy-\int f(x,y)\,g(x,y)\,dx\,dy\Big|<\epsilon\Big\}.
\]
If the posterior assigns probability converging to one to $W_\epsilon(f_0)$ for every $\epsilon>0$, then $f$ is weakly consistent at $f_0$. The proof follows along the lines of S. Ghosal and Ramamoorthi (1999) and Tokdar (2006), with the significant difference being that the base distribution $G_0$ of the data $(X_{ji},Y_{ji})$ is atomic, because it is a draw from a $\mathrm{DP}(\gamma,H)$. Fixing $0<\delta<1$ and $\eta>0$, we can get $x_0$ and, using the properties of $f_0$, we have
\[
\int_{|x|>x_0}\int_{|y|>y_0}f_0(x,y)\log\frac{f_0(x,y)}{f(x,y)}\,dx\,dy\le\eta/2.
\]
Also, there exist $x_0$ and $y_0$ such that $f_0(x,y)=0$ for $|x|>x_0$ or $|y|>y_0$, since $f_0$ has compact support. Fixing $\eta>0$, there exist $\sigma_x>0$ and $\sigma_y>0$ such that
\[
\int\!\!\int f_0(x,y)\log\frac{f_0(x,y)}{\int\!\!\int\frac{1}{\sigma_x}K\big(\frac{x-\mu_x}{\sigma_x}\big)\frac{1}{\sigma_y}K\big(\frac{y-\beta_0-\beta_1x}{\sigma_y}\big)f_0\,dx\,dy}\,dx\,dy<\eta/4.
\]
Let $P_0$ be a measure on $\{\mu_x,\beta_0,\beta_1,\sigma_x,\sigma_y\}$. We fix $\delta,\omega,\eta>0$ such that $(1-\delta)/(2(1-\omega)^2)>\eta$. We choose a large compact set $\mathcal{K}$ with $G_0(\mathcal{K}),P_0(\mathcal{K})>1-\omega$ such that the support of $P_0$ is contained in $\mathcal{K}$. Let $B=\{P:|P(\mathcal{K})/P_0(\mathcal{K})-1|<\delta\}$; then $\Pi(B)>0$, since the support of $G_0$ equals that of $P_0$. From Tokdar (2006), there exists a set $C$ such that $\Pi(B\cap C)>0$ and, for every $P\in B\cap C$ and some $k$,
\[
\int\!\!\int f_0(x,y)\log\frac{\int_{\mathcal{K}}\frac{1}{\sigma_x}K\big(\frac{x-\mu_x}{\sigma_x}\big)\frac{1}{\sigma_y}K\big(\frac{y-\beta_0-\beta_1x}{\sigma_y}\big)\,dP_0}{\int_{\mathcal{K}}\frac{1}{\sigma_x}K\big(\frac{x-\mu_x}{\sigma_x}\big)\frac{1}{\sigma_y}K\big(\frac{y-\beta_0-\beta_1x}{\sigma_y}\big)\,dP}\,dx\,dy<\eta/4,
\qquad
\int_{|x|>x_0}\int_{|y|>y_0}f_0\log\frac{f_0}{f}\,dx\,dy<\eta.
\]
In conclusion, the positive measure assigned to weak neighborhoods of $f_0$ ensures that the Normal model is weakly consistent. Here $f$ and $f_0$ stand for $f(x,y)$ and $f_0(x,y)$.

Gibbs Sampling

We write down the Gibbs sampler for inference. For all the models, we sample the indexes $t_{ji}$ and $k_{jt}$ and the component parameters $\phi_k$ ($\{\mu_{kd},\lambda_{xkd}\}$ and $\{\beta_{kd},\lambda_{yk}\}$ for the Normal model). As the Normal model is conjugate, we have a closed-form expression for the conditional density of $\phi_k$, but for the Poisson and Logistic Multinomial models we have used the Metropolis-Hastings algorithm as presented in Neal (2000a).

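Before giving the model-specific conditionals, it may help to see the shape of one assignment draw. The sketch below implements the generic table-assignment step of the Chinese-restaurant-franchise sampler used here; the likelihood values f_lik and f_new are assumed to be computed from the conditional densities given next.

```python
import numpy as np

def sample_table(n_jt, f_lik, f_new, alpha0, rng=None):
    """One Gibbs draw of a table index t_ji: an occupied table t is chosen
    with probability proportional to n_jt[t] * f_lik[t], and a new table
    with probability proportional to alpha0 * f_new. `f_lik` holds the
    (x_ji, y_ji) likelihoods under each table's dish; `f_new` is the
    marginal likelihood under a fresh draw from the base measure."""
    rng = np.random.default_rng(rng)
    probs = np.append(np.asarray(n_jt) * np.asarray(f_lik), alpha0 * f_new)
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)  # index len(n_jt) means "new table"
```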
The Normal model's solution is given by the following:
\[
\{\mu_{kd},\lambda_{xkd}\}\sim\mathcal{N}\!\big(\mu_{kd}\mid m_{xkd},(\kappa_{xkd}\lambda_{xkd})^{-1}\big)\,\mathrm{Gamma}(\lambda_{xkd}\mid a_{xkd},b_{xkd})
\]
\[
\{\beta_{kd},\lambda_{yk}\}\sim\mathcal{N}\!\big(\beta_{kd}\mid m_{ykd},(\kappa_{yk}\lambda_{yk})^{-1}\big)\,\mathrm{Gamma}(\lambda_{yk}\mid a_{yk},b_{yk})
\]
where
\[
m_{xkd}=\frac{\kappa_{xd0}m_{xd0}+\sum_{z_{ji}=k}x_{jid}}{\kappa_{xd0}+n_{jk}},\qquad
\kappa_{xkd}=\kappa_{xd0}+n_{jk},\qquad a_{xkd}=a_{xd0}+n_{jk}/2
\]
\[
b_{xkd}=b_{xd0}+\frac12\sum_{z_{ji}=k}(x_{jid}-\bar{x}_{jid})^2+\frac{\kappa_{xd0}\,n_{jk}\,(\bar{x}_{jid}-m_{xd0})^2}{2(\kappa_{xd0}+n_{jk})}
\]
\[
m_{yk}=\big(X^TX+\kappa_{y0}I\big)^{-1}\big(X^Ty+\kappa_{y0}I\,m_{y0}\big),\qquad
\kappa_{yk}=X^TX+\kappa_{y0}I
\]
\[
a_{yk}=a_{y0}+n_{jk}/2,\qquad
b_{yk}=b_{y0}+\tfrac12\big(y^Ty+m^T_{y0}\kappa_{y0}m_{y0}-m^T_{yk}\kappa_{yk}m_{yk}\big)
\]
Again, the distributions of $t_{ji}$ and $k_{jt}$ are given below:
\[
p\big(t_{ji}=t\mid t^{-ji},k\big)\propto n^{-ji}_{jt}\,f^{-x_{ji},y_{ji}}_{k_{jt}}(x_{ji},y_{ji})\ \ \text{if $t$ is used},\qquad
p\big(t_{ji}=t\mid t^{-ji},k\big)\propto \alpha_0\,p\big(x_{ji},y_{ji}\mid t^{-ji},k\big)\ \ \text{if }t=t^{new}.
\]
If $t^{new}$ is sampled, a new sample of $k_{jt^{new}}$ is obtained from
\[
p(k_{jt^{new}}=k)\propto m^{-ji}_{\cdot k}\,f^{-x_{ji},y_{ji}}_{k}(x_{ji},y_{ji})\ \ \text{if $k$ is used},\qquad
p(k_{jt^{new}}=k)\propto \gamma\,f^{-x_{ji},y_{ji}}_{k^{new}}(x_{ji},y_{ji})\ \ \text{if }k=k^{new}.
\]
The sampling of $k_{jt}$ is given by
\[
p(k_{jt}=k)\propto m^{-jt}_{\cdot k}\,f^{-x_{jt},y_{jt}}_{k}(x_{jt},y_{jt})\ \ \text{if $k$ is used},\qquad
p(k_{jt}=k)\propto \gamma\,f^{-x_{jt},y_{jt}}_{k^{new}}(x_{jt},y_{jt})\ \ \text{if }k=k^{new}.
\]

Here, $p(x_{ji},y_{ji})$, $f^{-x_{ji},y_{ji}}_{k}(x_{ji},y_{ji})$, and $f^{-x_{ji},y_{ji}}_{k^{new}}(x_{ji},y_{ji})$ are given by the following equations. For the Normal model, the integrals have closed-form solutions leading to a Student-t distribution; we solve the other integrals by Monte Carlo integration.
\[
p(x_{ji},y_{ji})=\sum_{k=1}^{K}\frac{m_{\cdot k}}{m_{\cdot\cdot}+\gamma}\,f^{-x_{ji},y_{ji}}_{k}(x_{ji},y_{ji})+\frac{\gamma}{m_{\cdot\cdot}+\gamma}\,f^{-x_{ji},y_{ji}}_{k^{new}}(x_{ji},y_{ji})
\]
\[
f^{-x_{ji},y_{ji}}_{k^{new}}(x_{ji},y_{ji})=\int f(y_{ji}\mid x_{ji},\phi)\,f(x_{ji}\mid\phi)\,h(\phi)\,d\phi,\qquad
f^{-x_{ji},y_{ji}}_{k}(x_{ji},y_{ji})=\int f(y_{ji}\mid x_{ji},\phi_k)\,f(x_{ji}\mid\phi_k)\,h(\phi_k\mid -x_{ji},y_{ji})\,d\phi_k
\]

Predictive Distribution

Finally, we derive the predictive distribution for a new response $Y_{j(N+1)}$ given a new covariate $X_{j(N+1)}$ and the set of previous covariate-response pairs $D$. For prediction, we compute the expectation of $Y_{j(N+1)}$ given the training data and $X_{j(N+1)}$ using $M$ samples of $\psi_{j1:jT}$:
\[
E[Y_{j(N+1)}\mid X_{j(N+1)},D]=E\big[E[Y_{j(N+1)}\mid X_{j(N+1)},\psi_{j1:jT}]\mid D\big]\approx\frac{1}{M}\sum_{m=1}^{M}E\big[Y_{j(N+1)}\mid X_{j(N+1)},\psi^{m}_{j1:jT}\big].
\]
We now need to compute the likelihood of this expectation, which is given by
\[
E[Y_{j(N+1)}\mid X_{j(N+1)},\psi_{jt}=k_{jt}]\propto n_{jt\cdot}\,E[Y_{j(N+1)}\mid X_{j(N+1)},\psi_{jt}=k_{jt}]\,f_{k_{jt}}\big(x_{j(N+1)}\big)\ \ \text{if $t$ is used previously},
\]
\[
E[Y_{j(N+1)}\mid X_{j(N+1)},\psi_{jt}=k_{jt}]\propto \alpha_0\,E[Y_{j(N+1)}\mid X_{j(N+1)},\psi_{jt}=k_{jt}]\,p\big(x_{j(N+1)}\mid t^{new},k\big)\ \ \text{if }t=t^{new}.
\]
Firstly, $p(x_{j(N+1)})$ is given by the equation above with the $y$ part omitted. A new sample of $k_{jt^{new}}$ is then obtained (if $t^{new}$ is sampled), and a new sample of $\phi_k$ is obtained if $k=k^{new}$.

Table 5-1. Description of the Gibbs sampling algorithm for iHGLM.

  1. Initialize the generative model parameters in their state space.
  Repeat
    2. Sample the model parameters.
    3. Sample t_ji.
    4. Sample k_jt_new, if required.
    5. Sample k_jt.
  until converged
  6. Evaluate E[Y_j(N+1)] for a new covariate X_j(N+1).

After obtaining the specific table $\psi_{jt}$ for $X_{j(N+1)}$ and the corresponding $K$, we compute the expectation $E[Y_{j(N+1)}\mid X_{j(N+1)},\psi_{jt}]$. Averaging out successive expectations, we get the estimate of $Y_{j(N+1)}$.

Experimental Results

In all experiments, we collected samples from the predictive posterior via the Gibbs sampler and compared the accuracy of the model against its competitor algorithms, including the standard Normal GLMM and group-specific regression algorithms like Linear Regression (OLS), Random Forest, and Gaussian Process Regression (Rasmussen and Williams, 2005b).

Clinical Trial Problem Modeled by Poisson iHGLM

We explored a clinical trial problem (IBM, 2011) for testing whether a new anticonvulsant drug reduces a patient's rate of epileptic seizures. Patients were assigned the new drug or the placebo, and the number of seizures was recorded over a six-week period; a measurement was made before the trial as a baseline. The objective was to model the number of seizures, which, being count data, is modeled using a Poisson distribution with a log link. The covariates are: treatment center size (ordinal), number of weeks of treatment (ordinal), type of treatment, i.e., new drug or placebo (nominal), and gender (nominal). For ordinal covariates, we used a Normal-Gamma mixture (as in the Normal model) as the base distribution. For nominal covariates, we used a Dirichlet-prior mixture as the base distribution ($H$). A Poisson distribution with log link was used for the count of seizures. Here, $X_{ji}$ and $Y_{ji}$ represent the $i$th continuous covariate and count response pair of the $j$th group.

The (Normal-Gamma) distribution over $\{\mu_d,\lambda_{xd}\}$ is the prior distribution on the ordinal covariates, and the (Normal) distribution over $\{\beta_d\}$ is the prior distribution on the covariate coefficients. Here, $m$ is the index over the categories of a nominal covariate, $p_{dm}$ is the probability of the $m$th category of the $d$th dimension, and $a_{dm0}$ is the hyper-parameter of the Dirichlet. This therefore becomes an infinite mixture of Dirichlet densities: a draw $G_0$ is an infinite mixture over $p_{dm}$, and another draw $G_j$ leads to an infinite collection of $p_{dm}$ for each group separately, but this time the $p_{dm}$'s are shared among the groups because $G_0$ is atomic. After the draw of $G_j$, one of the mixture components $p_{kdm}$ gets picked for the $j$th group and $d$th dimension, with $k$ denoting the mixture index; the covariate $X_{jid}$ is then drawn from a categorical distribution with parameters $p_{kdm}$.

We found that most patients' seizure counts (the patients form the groups) come from a single underlying cluster. This signifies that a majority of the patients across groups show the same response to the treatment. We obtained 10 clusters from 300 out of the 565 patients (the remaining 265 were set aside for testing). Among them, 8 clusters showed that the new drug reduces the number of epileptic seizures with an increasing number of weeks of treatment, while the remaining 2 clusters did not show any improvement. We also report the forecast error of the number of epileptic seizures of the remaining 265 patients in Table 5-4. Our recommendation for the usage of the new drug would be a cluster-based solution: for a specific patient, if she falls in one of those clusters with a decreasing trend in the number of seizures with time, we would recommend the new drug, and otherwise not. Out of the 265 test-case patients, 220 showed signs of improvement while 45 did not. A traditional Poisson GLMM cannot infer these findings, since the densities are not shared at the patient-group level. Moreover, only the Poisson iHGLM based prediction is formally equipped to recommend a patient-cluster based solution for the new drug, whereas all traditional mixed models predict a global recommendation for all patients.


$$\{\mu_d, \lambda_{xd}\} \sim N\big(\mu_d \mid m_{xd0}, (\lambda_{xd0}\lambda_{xd})^{-1}\big)\,\mathrm{Gamma}(\lambda_{xd} \mid a_{xd0}, b_{xd0}), \quad \{\beta_d\} \sim N\big(\beta_d \mid m_{yd0}, s^2_{yd0}\big), \quad p_{dm} \sim \mathrm{Dir}(a_{dm0}),$$
$$G_0 \sim DP(\gamma, H), \quad G_j \sim DP(\alpha_0, G_0), \quad \{\mu_{kd}, \lambda^{-1}_{xkd}\} \sim G_j, \quad \{\beta_{kd}\} \sim G_j, \quad p_{kdm} \sim G_j,$$
$$X_{jid} \sim \mathrm{Categorical}(p_{kdm}), \quad X_{jid} \mid \mu_{kd}, \lambda_{xkd} \sim N\big(X_{jid} \mid \mu_{kd}, \lambda^{-1}_{xkd}\big), \quad \{Y_{ji} \mid X_{ji}\} \sim \mathrm{Poisson}\Big(y_{ji} \;\Big|\; \exp\Big(\sum_{d=0}^{D} \beta_{kd} X_{jid}\Big)\Big) \quad (5)$$

Height Imputation Problem

We propose a new iHGLM based method for height imputation Robinson and Wykoff (2004) based on height-diameter regression in forest stands. A forest stand is a community of trees uniform in composition, structure, age and size class distribution. Estimating volume and growth in forest stands is an important feature of forest inventory. Since there is generally a strong proportionality between diameter and other tree attributes like past increment, forecasting height using diameter can proceed with limited loss of information. We processed data for five stands. The data is incorporated in the model through the logarithmic transformation $Y_{new} = \log(Y_{old} - 4.5)$ and the inverse transformation $X_{new} = (1 + X_{old})^{-1}$. We show the tree heights with respect to the diameters for each stand, which clearly depicts the sharing of clusters among stands and different clusters within each stand. Also, different clusters within stands have different variability of growth, thereby modeling heteroscedasticity at the stand level. Roughly, there are 2 to 3 primary clusters in each stand, totaling 5 primary clusters. The remaining clusters have very few trees (at most 5) and represent outliers. We report the mean tree heights and also the variance of growth of the trees within each primary cluster in Table 5-3. We also report the forecast error on the trees of the testing set (20%) and compare against Normal GLMM, group-specific OLS, Random Forest and Gaussian Process Regression.
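The pair of data transformations above is easy to get wrong in reproduction, so a small sketch may help; the sample heights and diameters are hypothetical, and only the two stated formulas are taken from the text.

```python
import numpy as np

def transform(height, diameter):
    # Forward transformations applied before fitting the model.
    y_new = np.log(height - 4.5)
    x_new = (1.0 + diameter) ** -1
    return y_new, x_new

def inverse_transform(y_new):
    # Map a model prediction back to the original height scale.
    return np.exp(y_new) + 4.5

heights = np.array([30.0, 55.2, 47.8])    # hypothetical tree heights
diameters = np.array([8.1, 19.4, 14.0])   # hypothetical diameters
y, x = transform(heights, diameters)
assert np.allclose(inverse_transform(y), heights)
```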


Market Dynamics Experiment

In this experiment, instead of presenting a third example to demonstrate the efficacy of the model, we decided to demonstrate how the model could be used as an exploratory tool (as opposed to a classical inference tool) for analyzing the temporal dynamics of stocks from S&P 500 companies. This strength draws from the model's large support (i.e., hypothesis space). The companies belong to disparate market sectors such as Technology (Microsoft, Apple, IBM and Google), Finance (Goldman Sachs, JPMorgan, BOA and Wells-Fargo), Energy (XOM, PTR, Shell and CVX), Healthcare (JNJ, Novartis, Pfizer and MRK), Goods (GE, UTX, Boeing and MMM), and Services (WMT, AMZN, EBAY and HD). Using the iHGLM Normal model, we modeled each company's stock value at a given time point as a function of the values of the others at that time point (the remaining 23). Each stock of one particular sector (technology, finance, healthcare, etc.) formed one group (e.g., the technology sector has 4 groups/stocks: IBM, MSFT, GOOG, AAPL), and a whole sector was modeled by one HGLM. Experiments were run over all such groupings. Past stock prices were not included. We recorded the stocks having the most impact on the determination of the value of each stock. The impacts are, by definition, the magnitudes of the weighted coefficients of the covariates (the stock values) in iHGLM. All the experiments were done on daily closeout stock prices after the financial crisis (June 2009 to March 2014) and in the middle of the crisis (May 2007 to June 2009). A few trends were noteworthy. Firstly, stocks from any given sector were impacted largely by the same stock (not necessarily from the same sector), with few stocks being influential overall. Secondly, the stocks having the most impact on a specific sector were largely the same. For example, Microsoft (technology sector) is largely modeled by GOOG, IBM (technology) and GS (finance) after the crisis (in descending order of weights). However, during the crisis, the stocks showed no such trends. For example, Microsoft is


impacted by GE, GOOG and JPM, showing no sector-wise trend. We report results for all the sectors/stocks in Table 5-2.

Table 5-2. List of stocks with the top 3 most significant stocks that influence each stock, from all the sectors.

Time-Period | XOM | PTR | Shell | CVX | AAPL | MSFT | IBM | GOOG | BOA | JPM | WFC | GS
2009-14 | PTR, CVX, GS | XOM, CVX, GOOG | PTR, XOM, HD | XOM, SHELL, BOA | IBM, JPM, GOOG | GOOG, IBM, GS | AAPL, MMM, GOOG | AAPL, GS, MSFT | WFC, GS, JPM | GS, WFC, XOM | GS, JPM, PFE | JPM, BOA, WFC
2007-09 | HD, PTR, JPM | GS, PFE, CVX | MSFT, XOM, MMM | JNJ, IBM, HD | BOA, WMT, GS | GE, GOOG, JPM | JPM, Shell, NVS | WFC, MMM, UTX | EBAY, GE, MRK | MMM, AMZN, CVX | GS, HD, PFE | WMT, GE, GS

Time-Period | JNJ | NVS | PFE | MRK | GE | UTX | BA | MMM | WMT | AMZN | EBAY | HD
2009-14 | MRK, NVS, GE | JNJ, PFE, AAPL | JNJ, GS, MRK | NVS, JPM, JNJ | BA, MMM, PTR | MMM, GE, PFE | MMM, GE, UTX | BA, AAPL, GE | AMZN, EBAY, GOOG | HD, EBAY, MSFT | HD, WMT, MSFT | GOOG, WMT, GS
2007-09 | MSFT, CVX, GS | BA, PFE, WFC | PTR, AAPL, MMM | IBM, CVX, HD | AXP, P&G, GS | GS, BOA, JPM | JPM, MRK, HD | WMT, PTR, WFC | HD, GE, MMM | MMM, CVX, HD | GS, GE, IBM | WMT, GE, WFC

Table 5-3. MSE and MAE of the algorithms for the height imputation dataset, and mean and standard deviation of growth within the individual clusters across stands. For Stand-1 the main clusters were C1, C2, C3; for S-2: C4, C5, C3; for S-3: C1, C4, C3; for S-4: C2, C3; and for S-5: C1, C4, C3.

Clusters | C1 | C2 | C3 | C4 | C5
Mean | .1317 | .0692 | .014 | .0302 | .0143
STD | .0087 | .00086 | .00049 | .00038 | .00015

 | iHGLM | GLMM | OLS | RForest | GPR | CART
MAE (L1 error) | .0094 | .0114 | .01243 | .01527 | .01319 | .0252
MSE (L2 error) | 1.008e-2 | 9.8e-3 | 1.2e-2 | 4.2e-2 | 1.8e-2 | 3.4e-2


Table 5-4. MSE and MAE of the algorithms for the clinical trial dataset, and number of patients in clusters for the training and testing sets.

Patient numbers in clusters, training set (first 8 positive, last 2 negative): 26, 39, 15, 28, 22, 53, 32, 24, 37, 24
Patient numbers in clusters, testing set: 19, 33, 27, 19, 16, 38, 26, 42, 15, 30

 | iHGLM | Poisson GLMM | Poisson regression | CART | RForest
Root mean square error (L2 error) | 1.41 | 1.58 | 1.92 | 1.65 | 1.75
Mean absolute error (L1 error) | .94 | 1.34 | 1.51 | 1.23 | 1.62

Figure 5-2. Depiction of several clusters in the height imputation dataset across the different stands, with clusters shared among stands. Every stand is shown in its own single color.


CHAPTER 6
DENOISING TIME SERIES BY WAY OF A FLEXIBLE MODEL FOR PHASE SPACE RECONSTRUCTION

In this chapter, we use Dirichlet Process mixtures of linear regressions to solve the time series denoising problem.

Time Delay Embedding and False Neighborhood Method

Time delay embedding has become a common approach to reconstruct the phase space from an experimental time series. The central idea is that the dynamics is considered to be governed by a solution traveling through a phase space, and a smooth function maps points in the phase space to the measurements with some error. Given a time series of measurements $x(1), x(2), \ldots, x(N)$, the phase space is represented by vectors in $D$-dimensional Euclidean space:

$$y(n) = \langle x(n), x(n+T), \ldots, x(n+(D-1)T) \rangle \quad (6)$$

Here, $T$ is the time delay and $D$ is the embedding dimension. The temporally subsequent point to $y(n)$ in the phase space is $y(n+1)$. The purpose of the embedding is to unfold the phase space to a multivariate space which is representative of the original dynamics. Takens (1981) has shown that under suitable conditions, if the dynamical system has dimension $d_A$ and if the embedding dimension is chosen as $D > 2d_A$, then all the self-crossings in the trajectory due to the projection can be eliminated. The false neighborhood method Kennel et al. (1992) accomplishes this task, where it views the dynamics as a compact object in the phase space. If the embedding dimension is too low (the system is not correctly unfolded), many points that lie very close to each other (i.e., neighbors) are far apart in the higher dimensional, correctly unfolded space. Identification of these false neighbors allows the technique to determine that the dynamical system has not been correctly unfolded. For the time series $x(n)$, in the $d$th and $(d+1)$th dimensional embeddings, the squared Euclidean distances between an arbitrary point $y(n)$ and its closest neighbor $y^{FL}(n)$ are


$$R^2_d(n) = \sum_{k=0}^{d-1} \big[x(n+kT) - x^{FL}(n+kT)\big]^2 \qquad \text{and} \qquad R^2_{d+1}(n) = \sum_{k=0}^{d} \big[x(n+kT) - x^{FL}(n+kT)\big]^2,$$

respectively. If the ratio of these two distances exceeds a threshold $R_{tol}$ (we took this as 15 in this dissertation), the points are considered to be false neighbors in the $d$th dimension. The method starts from $d = 1$ and increases it toward $D$, until only 1-2% of the total points appear as false neighbors. Then, we deem the phase space to be completely unfolded in $\mathbb{R}^D$, a $D$-dimensional Euclidean space.

NPB-NR Model

Step One: Clustering of Phase Space

Given a time series $\{x(1), x(2), \ldots, x(N)\}$, let the minimum embedding dimension be $D$ (using the false neighborhood method). Hence, the reconstructed phase space is

$$\begin{bmatrix} x(1) & x(2) & \ldots & x(N-(D-1)T) \\ x(1+T) & x(2+T) & \ldots & x(N-(D-2)T) \\ \vdots & \vdots & \ddots & \vdots \\ x(1+(D-1)T) & x(2+(D-1)T) & \ldots & x(N) \end{bmatrix} \quad (6)$$

Here, each column represents a point in the phase space. The generative model of the points in the phase space is now assumed to be

$$\pi_i \mid \alpha_1, \alpha_2 \sim \mathrm{Beta}(\alpha_1, \alpha_2), \qquad \{\mu_{i,d}, \lambda_{i,d}\} \sim N\big(\mu_{i,d} \mid m_d, (\beta_d \lambda_{i,d})^{-1}\big)\,\mathrm{Gamma}(\lambda_{i,d} \mid a_d, b_d),$$
$$z_n \mid \{v_1, v_2, \ldots\} \sim \mathrm{Categorical}\{\pi_1, \pi_2, \pi_3, \ldots\}, \qquad X_d(n) \mid z_n \sim N(\mu_{z_n,d}, \lambda_{z_n,d}) \quad (6)$$

Here, $X_d(n)$ is the $d$th coordinate of the $n$th phase space point. $\{z, v, \mu_{i,d}, \lambda_{i,d}\}$ is the set of latent variables. The distribution over $\{\mu_{i,d}, \lambda_{i,d}\}$ is the base distribution of the DP, and $\{\pi_1, \pi_2, \pi_3, \ldots\}$ denotes the categorical distribution parameters. In this DP mixture, the stick-breaking sequence creates an infinite vector of mixing proportions, and $\{\mu_{z_n,d}, \lambda_{z_n,d}\}$ are the atoms representing the mixture components. This infinite mixture of Gaussians picks a cluster for each phase space point and lets the phase space data determine the number of clusters.
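As a concrete illustration of the two ingredients introduced so far, the sketch below builds the delay matrix of the equation above and scans the embedding dimension with the false neighbor criterion as stated (ratio of the $d$- and $(d+1)$-dimensional neighbor distances against $R_{tol} = 15$, stopping when under 2% of points are false neighbors). The test series, the delay T, and the loop cap are hypothetical, and the quadratic-time neighbor search is written for clarity, not efficiency.

```python
import numpy as np

def delay_embed(x, D, T):
    """Stack delayed copies of the series; column n is
    [x(n), x(n+T), ..., x(n+(D-1)T)]."""
    N = len(x) - (D - 1) * T
    return np.stack([x[d * T : d * T + N] for d in range(D)])

def false_neighbor_fraction(x, d, T, r_tol=15.0):
    """Fraction of points whose nearest neighbor in d dimensions
    separates by more than r_tol when the (d+1)-th coordinate is added."""
    Y = delay_embed(x, d + 1, T)       # (d+1)-dimensional embedding
    Yd = Y[:d]                         # its first d coordinates
    n_pts = Yd.shape[1]
    false = 0
    for n in range(n_pts):
        dists = np.linalg.norm(Yd - Yd[:, [n]], axis=0)
        dists[n] = np.inf              # exclude the point itself
        m = int(np.argmin(dists))      # nearest neighbor in d dimensions
        r_d, r_d1 = dists[m], np.linalg.norm(Y[:, n] - Y[:, m])
        if r_d > 0 and r_d1 / r_d > r_tol:
            false += 1
    return false / n_pts

# Increase d until only 1-2% of the points are false neighbors.
x = np.sin(0.1 * np.arange(2000)) + 0.01 * np.random.randn(2000)
d = 1
while d < 10 and false_neighbor_fraction(x, d, T=10) > 0.02:
    d += 1
print("estimated minimum embedding dimension:", d)
```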


From this perspective, we can interpret the DP mixture as a flexible mixture model in which the number of components (i.e., the number of cells in the partition) is random and grows as new data is observed.

Step Two: Nonlinear Mapping of Phase Space Points

Due to the discretization of the original continuous phase space, our assumption is that a point in the phase space is constructed by a nonlinear map $R$ whose form we wish to approximate. In this section, we approximate this nonlinear map of the subsequent phase space points via the proposed nonlinear regression. We assume that a specific cluster has $N$ points. We reorder these points according to their occurrence in the time series. We then pick the corresponding image of these points (the temporally subsequent phase space points according to the original time delay embedding). We map each phase space point in the cluster through an infinite mixture of linear regressions to its respective image. The model is formally defined as

$$y_1(n) = R_1(x(n)), \quad y_2(n) = R_2(x(n)), \quad \ldots, \quad y_D(n) = R_D(x(n)) \quad (6)$$

Here, $R_{1:D}$ are nonlinear regressors described by the following set of equations. $X_d(n)$ and $Y_1(n)$ represent the $d$th coordinate of the $n$th phase space point and the first coordinate of its post-image, respectively. $\{z, v, \mu_{i,d}, \lambda_{x,i,d}, \beta_{i,d}, \lambda_{y,i}\}$ is the set of latent variables, and the distributions over $\{\mu_{i,d}, \lambda_{x,i,d}\}$ and $\{\beta_{i,d}, \lambda_{y,i}\}$ are the base distributions of the DP. Although this set of equations is for $R_1$, the same model applies for $R_{2:D}$, representing $Y_{2:D}(n)$.

$$\pi_i \mid \alpha_1, \alpha_2 \sim \mathrm{Beta}(\alpha_1, \alpha_2), \qquad \{\beta_{i,d}, \lambda_{y,i}\} \sim N\big(\beta_{i,d} \mid m_{y,d}, (\lambda_y \lambda_{y,i})^{-1}\big)\,\mathrm{Gamma}(\lambda_{y,i} \mid a_y, b_y),$$
$$z_n \mid \{\pi_1, \pi_2, \ldots\} \sim \mathrm{Categorical}\{\pi_1, \pi_2, \pi_3, \ldots\}, \qquad Y_1(n) \mid X(n), z_n \sim N\Big(\beta_{z_n,0} + \sum_{d=1}^{D} \beta_{z_n,d} X_d(n),\; \lambda^{-1}_{y,z_n}\Big) \quad (6)$$
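A truncated finite mixture makes the predictive mean of this model easy to sketch: each component is linear in the covariates, but weighting the component predictions by covariate-dependent responsibilities produces a nonlinear map overall. All parameter values below are hypothetical, and the truncation stands in for the infinite mixture.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mixture_regression_mean(x, pis, mus, covs, betas):
    """Predictive mean of a truncated mixture of linear regressions:
    responsibilities w_k(x) weight the component-wise linear predictions
    beta_k0 + beta_k . x, yielding a nonlinear function of x."""
    w = np.array([pi * multivariate_normal.pdf(x, mu, cov)
                  for pi, mu, cov in zip(pis, mus, covs)])
    w /= w.sum()
    preds = np.array([b[0] + b[1:] @ x for b in betas])
    return float(w @ preds)

# Hypothetical two-component fit in a 3-dimensional phase space.
pis = [0.6, 0.4]
mus = [np.zeros(3), np.ones(3)]
covs = [np.eye(3), 0.5 * np.eye(3)]
betas = [np.array([0.0, 1.0, -0.5, 0.2]), np.array([1.0, 0.3, 0.1, -0.4])]
print(mixture_regression_mean(np.array([0.5, 0.2, -0.1]), pis, mus, covs, betas))
```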


The infinite mixture model approach to linear regression associates the covariates with the model via a nonlinear function, resulting from marginalizing over the other mixtures with respect to a specific mixture. Also, the variance now differs across mixtures, thereby capturing heteroscedasticity.

Step Three: Restructuring of the Dynamics

The idea here is to perturb the trajectory to make the modified phase space more consistent with the dynamics, which is equivalent to reducing both the error from perturbing the phase space points away from their original positions and the error between the perturbed position and the mapped position. We have to choose a new sequence of phase space points, $\widehat{x}(n)$, such that the following objective is minimized:

$$\sum_{n=1}^{N} \Big( \big\|\widehat{x}(n) - x(n)\big\|^2 + \big\|\widehat{x}(n) - R\big(\widehat{x}_{pre\text{-}image}\big)\big\|^2 + \big\|R\big(\widehat{x}(n)\big) - \widehat{x}_{post\text{-}image}\big\|^2 \Big) \quad (6)$$

$R$ denotes the nonlinear regressors ($R_{1:D}$) that are used to temporally approximate the phase space (described in the section above). $N$ is the number of points in the specific cluster, and this is done across all the clusters. In addition, to create the new noise-removed time series, perturbations of the $x_d(n)$'s are done consistently for all subsequent points, such that we can revert back from the phase space to a time series. For example, if the time delay is 1 and the embedding dimension is 2, then the phase space points are perturbed in such a way that when $x(n) = (t(n), t(n+1))$ is moved to $\widehat{x}(n) = (\widehat{t}(n), \widehat{t}(n+1))$, we make the first coordinate of $\widehat{x}(n+1)$ equal to $\widehat{t}(n+1)$. These form a set of equality constraints. What results is a convex program that is then solved to retrieve the denoised time series. The entire algorithm is summarized in Table 6-1.
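A minimal sketch of this restructuring step, under simplifying assumptions: a generic smooth optimizer stands in for the equality-constrained convex program, the pre-image and post-image penalties are folded into a single successor-consistency term (each successor relation then appears once), and a hypothetical linear map stands in for the fitted regressors $R_{1:D}$.

```python
import numpy as np
from scipy.optimize import minimize

def restructure(X, R):
    """Perturb phase space points X (one point per column) to balance
    fidelity to the observed points against consistency with the
    fitted one-step map R."""
    D, N = X.shape

    def objective(flat):
        Z = flat.reshape(D, N)
        fidelity = np.sum((Z - X) ** 2)
        pred = np.stack([R(Z[:, n]) for n in range(N - 1)], axis=1)
        dynamics = np.sum((Z[:, 1:] - pred) ** 2)
        return fidelity + dynamics

    res = minimize(objective, X.ravel(), method="L-BFGS-B")
    return res.x.reshape(D, N)

# Hypothetical linear surrogate for the regressors, on a 2-D toy cloud.
A = np.array([[0.9, -0.1], [0.1, 0.9]])
R = lambda p: A @ p
X_noisy = np.cumsum(np.random.randn(2, 100), axis=1) * 0.1
X_clean = restructure(X_noisy, R)
```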


Table 6-1. Step-wise description of the NPB-NR process.

1. Form the phase space dynamics from the noisy time series, with the embedding dimension determined by the false neighborhood method.
2. Cluster the points in the phase space via an infinite mixture of Gaussian densities.
3. For each cluster, map each phase space point via an infinite mixture of linear regressions ($R_{1:D}$) to its temporally subsequent point (post-image).
4. Infer the latent parameters for both the infinite mixture of Gaussian densities and the infinite mixture of linear regressions. $\{z, v, \mu_{i,d}, \lambda_{i,d}\}$ and $\{z, v, \mu_{i,d}, \lambda_{x,i,d}, \beta_{i,d}, \lambda_{y,i}\}$ were inferred through variational inference. The inference gives us the form of the regressors $R_{1:D}$.
5. Restructure the dynamics by optimizing the convex objective. The restructuring is done consistently for all the subsequent points, which leads to the reconstruction of the noise-removed time series.

Experimental Results

An Illustrative Description of the NPB-NR Process

First, we present an illustrative pictorial description of the complete NPB-NR process with a real world historical stock price dataset. Our model for the historical time series of the stock price is a low-dimensional dynamical system that was contaminated by noise and passed through a measurement function at the output. Our task was to denoise the stock price to not only recover the underlying original phase space dynamics and create the subsequent noise-removed stock price via the NPB-NR process, but also to utilize it to make better future predictions of the stock price. We picked historical daily closeout stock price data of IBM from March 1990 to September 2015 for this task. The original noisy time series is plotted in Figure 6-1. The various stages of NPB-NR are illustrated in the subsequent figures. The underlying dimension of the phase space turned out to be 3 from the false neighborhood method. The reconstructed phase space with noise is shown in Figure 6-2.


Figure 6-1. Plot of the noisy IBM time series data.

Figure 6-2. Depiction of the noisy (reconstructed) phase space.

The completely clustered phase space and one specific cluster in the phase space, obtained by the Dirichlet Process mixture of Gaussians of NPB-NR (step one), are shown in Figure 6-3. For a 3-dimensional phase space, as is the case with the IBM stock price data, consider $X$ and $Y$ to be two temporally successive points in one cluster. The nonlinear regression model (step two) in NPB-NR is then $Y(1) = R_1(X(1), X(2), X(3))$, $Y(2) = R_2(X(1), X(2), X(3))$ and $Y(3) = R_3(X(1), X(2), X(3))$. In Figure 6-4, we plot $Y(1)$ against $X(1)$, $X(2)$ and $X(3)$ (the first regression, $R_1$) to depict the nonlinearity of the regression model, which we have modeled through the Dirichlet Process mixtures of linear regressions (step two). The trajectory-adjusted (step three), and consequently noise-removed, specific cluster and the complete noise-removed phase space are shown in Figure 6-5. Finally, the denoised time series is shown in Figure 6-6. The prediction error for the IBM stock data is reported in Table 6-3.

Prediction Accuracy

NPB-NR was used for time series forecasting. The first dataset was drawn from the stock market.


Figure 6-3. Depiction of the whole clustered phase space (step one) and one single cluster.

Figure 6-4. Regression data: Y(1) regressed with covariates X(1), X(2) and X(3).

Figure 6-5. Single noise-removed cluster and whole noise-removed phase space.

Figure 6-6. Plot of the noise-removed time series data.


We chose 5 stocks (IBM, JPMorgan, MMM, Home-Depot and Walmart) from March 2000 to September 2015, with 3239 instances (time points), from the DOW 30. The next four datasets came from the Santa Fe competition compiled in Gershenfeld and Weigend (1994). The first is a laser-generated dataset, a univariate time record of a single observed quantity measured in a physics laboratory experiment. The next is a currency exchange rate dataset, a collection of tickwise bids for the exchange rate from Swiss Francs to US Dollars from August 1990 to April 1991. The next dataset is a synthetic computer-generated series governed by a long sequence of known high dimensional dynamics. The fourth dataset is a set of astrophysical measurements of the light curve of the variable white dwarf star PG1159-035 in March 1989. The next set of datasets comprises the Darwin sea level pressure dataset from 1882 to 1998, an oxygen isotope ratio dataset spanning 2.3 million years, and the US Industrial Production Indices dataset from the Federal Reserve release. NPB-NR was compared with the GARCH, AR(p), ARMA(p,q) and ARIMA(p,d,q) models, where p, d and q were chosen by cross-validation ranging from 1 to 10 fold. We also compared NPB-NR to PCA and kernel PCA Bishop (2006) with sigma set to 1, and to Gaussian Process based auto-regression with the lag order chosen by cross-validation ranging from 1 to 5 fold. We also compared against results from hard threshold wavelet denoising using the wden Matlab function. All competitor algorithms were run with a 50-50 training-testing split. We report the mean square error (MSE, L2) of the forecasts for all the competitor algorithms in Table 6-3. Each individual time series was reconstructed into a phase space with the dimension determined by the false neighborhood method, was passed through NPB-NR to find the most consistent dynamics by reducing noise, and was subsequently fed into a simple auto-regressor with lag order taken as the embedding dimension of the reconstructed time series. In most datasets, NPB-NR not only yielded better forecasts, but also a smaller standard deviation among its competitors over the 10 runs.
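The last stage of this pipeline, an auto-regressor whose order equals the embedding dimension, reduces to an ordinary least squares fit; in the sketch below a sine series stands in for an NPB-NR-denoised output, and all settings are illustrative.

```python
import numpy as np

def ar_forecast(series, p, n_ahead):
    """Fit an order-p auto-regressor by least squares and iterate it
    forward n_ahead steps."""
    # Design matrix: row t holds series[t : t + p]; target is series[t + p].
    X = np.stack([series[i : len(series) - p + i] for i in range(p)], axis=1)
    y = series[p:]
    coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(len(X)), X]), y,
                               rcond=None)
    history = list(series[-p:])
    preds = []
    for _ in range(n_ahead):
        nxt = coef[0] + coef[1:] @ np.array(history[-p:])
        preds.append(nxt)
        history.append(nxt)
    return np.array(preds)

denoised = np.sin(0.05 * np.arange(500))   # stand-in for NPB-NR output
print(ar_forecast(denoised, p=3, n_ahead=5))
```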


Noise Reduction Experiment

We evaluated the NPB-NR technique for noise reduction across several well known dynamical systems, namely, the Lorenz attractor (chaotic) Lorenz (1963), the Van-der-Pol attractor Pol (1920) and the Rossler attractor Rossler (1976) (periodic), the Buckling Column attractor (nonstrange nonchaotic, fixed point), the Rayleigh attractor (nonstrange nonchaotic, limit cycle) Abraham and Shaw (1985), and the GOPY attractor (strange nonchaotic) Grebogi et al. (1984). Although noise was added to the time series such that the SNR ranged from 15 db to 100 db, it is impossible to calculate numerically or from the power spectrum how much noise was actually removed from the noisy time series. Therefore, for both the noise-removed and the noisy time series we calculated the fluctuation error

$$f_i = \big\| x_i - x_{i-1} - (dt)\, f(x_{i-1}, y_{i-1}, z_{i-1}) \big\|$$

This measures the distance between the observed and the predicted point in the phase space. The noise reduction percentage is then given by

$$R = 1 - \frac{E_{noise\text{-}removed}}{E_{noisy}}, \qquad E = \Big(\frac{\sum f_i^2}{N}\Big)^{\frac{1}{2}}$$

We tabulated the noise reduction percentages of NPB-NR, the low pass filter, and the wavelet denoising methods in Table 6-4. For the wavelet method, we used the Matlab wden function in 'soft' and 'hard' threshold modes. NPB-NR yielded the highest noise reduction percentage for 15-100 db SNR. Since the faithful reconstruction of the underlying dynamics intrinsically removes the noise, the noise reduction performance of NPB-NR became significantly better relative to the other techniques as the noise increased.
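When the vector field $f$ of the attractor is known, the fluctuation error and the reduction percentage $R$ are direct to compute. A sketch using the Lorenz equations with a crude Euler integration as a hypothetical test case (the step size, noise scale, and trajectory length are arbitrary):

```python
import numpy as np

def fluctuation_error(points, f, dt):
    """f_i = ||x_i - x_{i-1} - dt * f(x_{i-1})||, with f applied to the
    full phase space point; points has one point per row."""
    drift = np.apply_along_axis(f, 1, points[:-1])
    steps = points[1:] - points[:-1] - dt * drift
    return np.linalg.norm(steps, axis=1)

def noise_reduction_percentage(noisy, cleaned, f, dt):
    # R = 1 - E_removed / E_noisy, with E the RMS fluctuation error.
    E = lambda pts: np.sqrt(np.mean(fluctuation_error(pts, f, dt) ** 2))
    return 100.0 * (1.0 - E(cleaned) / E(noisy))

def lorenz(p, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = p
    return np.array([sigma * (y - x), x * (rho - z), x * y - beta * z])

# Euler-integrated trajectory, then corrupted with additive noise.
dt = 0.01
traj = [np.array([1.0, 1.0, 1.0])]
for _ in range(999):
    traj.append(traj[-1] + dt * lorenz(traj[-1]))
traj = np.array(traj)
noisy = traj + 0.5 * np.random.randn(*traj.shape)
print(noise_reduction_percentage(noisy, traj, lorenz, dt))
```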


Figure 6-7. Power spectrum and phase space plot of the Van-der-Pol and sinusoid attractors.

Power Spectrum Experiment

We ran a power spectrum experiment for a noise-corrupted Van-der-Pol attractor (periodic) Pol (1920), as well as for a time series created by superimposing 6 sinusoids and subsequently corrupting it with noise. The noise was additive white Gaussian noise with the SNR (signal-to-noise ratio) set at 15 db. Van-der-Pol is a simple two dimensional attractor with $b = 0.4$, $x_0 = 1$, $y_0 = 1$, and the superimposition of sinusoids is a simple limit cycle attractor with negative Lyapunov exponents and no fractal structure. We plot the phase space and the power spectrum of the noisy time series generated from these attractors, of the noise-removed solution obtained with a 6th-order Butterworth low-pass filter (cut-off frequencies 30 Hz and 1000 Hz, respectively), and of the NPB-NR technique. The power spectrum and the phase space plots of the Van-der-Pol and sinusoid attractors are shown in Figure 6-7. Note that NPB-NR successfully made the harmonics/peaks more prominent where they were originally obscured by the noise. The filtering method was unable to restore the harmonics, although it removed some of the higher frequency components. We also


observe that NPB-NR smoothed out the phase space dynamics better than the low pass filter.

Experiment with Dimensions

We can view noise as high-dimensional dynamics added to the low-dimensional attractor. Therefore, noise kicks up the dimension of the resulting dynamics. We evaluated the NPB-NR method to check whether it brings the dimension back down to the original, desired dimension of the attractor. We first calculated the minimum embedding dimension (the dimension at which the false neighborhood percentage falls below 1%). Then we passed each time series through NPB-NR. After this, we evaluated the minimum embedding dimension again for the newly created noise-removed time series. We found that NPB-NR significantly outperforms the low-pass filtering technique in bringing the minimum embedding dimension of the underlying attractor down to the original. We also compared NPB-NR against traditional dimensionality reduction techniques like PCA and kernel PCA with sigma set to 1. For PCA and kernel PCA, we set the original dimension to 15 for all the attractors. The underlying dimension was then determined as the number of top eigenvectors whose cumulative variance rose above 98%. Numerical results show that PCA and kernel PCA cannot find the correct underlying dimension of the noisy attractors. The reason is that the goal of PCA/kernel PCA is to project the higher dimensional data onto the lower dimensional subspace with maximum spread. In the presence of noise, without being tied to a model of the dynamics, PCA/kernel PCA distorts the dynamics severely and retrieves a dimension entirely different from the original. If the underlying dimension picked is lower than the original, it is unable to unfold the attractor; if greater, there is enough residual noise to degrade the prediction accuracy. All the experiments in this section were done under 15 db SNR.
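The 98% cumulative-variance rule used for PCA here amounts to the following few lines; the synthetic 15-dimensional data is hypothetical.

```python
import numpy as np

def pca_dimension(X, threshold=0.98):
    """Number of top eigenvectors needed for the cumulative variance
    of the centered data to exceed the threshold (98% above)."""
    Xc = X - X.mean(axis=0)
    # Eigenvalues of the covariance matrix, largest first.
    eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]
    cumvar = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(cumvar, threshold) + 1)

# Hypothetical 15-dimensional embedding with decaying variances.
X = np.random.randn(1000, 15) @ np.diag(np.linspace(3, 0.1, 15))
print(pca_dimension(X))
```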


Table 6-2. Minimum embedding dimension of the attractors: the original dimension, the dimension of the noisy time series, and the dimension of the noise-removed time series under NPB-NR, the low pass filter, PCA and kernel PCA.

 | Lorenz | GOPY | Van-der-Pol | Rossler | Rayleigh
Original dimension | 3 | 2 | 2 | 3 | 2
Dimension, noisy | 8 | 7 | 7 | 10 | 5
Noise-removed dimension:
NPB-NR | 4 | 2 | 2 | 3 | 3
Low pass filter | 6 | 5 | 5 | 4 | 4
PCA | 5 | 3 | 4 | 5 | 4
Kernel PCA | 2 | 3 | 3 | 4 | 4

Table 6-3. MSE and standard deviation of all the datasets for all the competitor algorithms in a 50-50 random training-testing split over 10 runs.

MSE | NPB-NR | GARCH | Wavelet | AR | ARMA | ARIMA | PCA | Kernel PCA | GPR
IBM | 1.43 | 1.65 | 1.37 | 1.87 | 1.70 | 1.68 | 1.98 | 1.90 | 1.84
JPM | 1.38 | 1.52 | 1.49 | 1.46 | 1.42 | 1.39 | 1.67 | 1.59 | 1.73
MMM | 1.69 | 1.87 | 1.96 | 2.06 | 1.93 | 1.83 | 1.14 | 2.11 | 2.23
HD | 1.74 | 1.58 | 1.46 | 1.73 | 1.69 | 1.62 | 1.79 | 1.72 | 1.86
WMT | 1.24 | 1.47 | 1.58 | 1.39 | 1.35 | 1.29 | 1.49 | 1.57 | 1.38
LASER | .97 | 1.36 | 1.29 | 1.31 | .86 | 1.15 | 1.42 | 1.35 | 1.34
CER | .82 | .99 | .93 | .94 | .88 | .84 | 1.18 | 1.11 | 1.05
CGS | 1.79 | 2.11 | 1.88 | 2.03 | 1.96 | 1.86 | 2.38 | 2.28 | 2.17
ASTRO | 1.82 | 2.19 | 2.14 | 2.08 | 1.91 | 1.92 | 2.26 | 2.33 | 2.46
DSLP | 1.33 | 1.68 | 1.53 | 1.49 | 1.41 | 1.14 | 1.68 | 1.60 | 1.55
OxIso | 1.19 | 1.38 | 1.87 | 1.32 | 1.26 | 1.53 | 1.48 | 1.41 | 1.45
USIPI | 1.30 | 1.57 | 1.36 | 1.48 | 1.43 | 1.62 | 1.57 | 1.57 | 1.63

Std. dev. | NPB-NR | GARCH | Wavelet | AR | ARMA | ARIMA | PCA | Kernel PCA | GPR
IBM | 1.34 | 1.89 | 1.86 | 1.67 | 1.78 | 1.39 | 3.26 | 2.37 | 1.42
JPM | 1.63 | 1.98 | 1.97 | 1.78 | 1.94 | 2.01 | 2.35 | 1.69 | 2.21
MMM | 1.82 | 1.48 | 1.29 | 1.42 | 1.36 | 1.82 | 1.73 | 1.66 | 1.59
HD | 1.86 | 1.85 | 1.85 | 1.92 | 1.77 | 1.88 | 1.93 | 1.90 | 1.86
WMT | 1.79 | 1.66 | 1.82 | 1.62 | 1.73 | 2.31 | 1.67 | 1.61 | 1.98
LASER | 2.12 | 2.28 | 2.39 | 2.19 | 2.36 | 1.72 | 2.42 | 2.27 | 2.39
CER | 1.69 | 1.78 | 1.93 | 1.84 | 1.74 | 1.71 | 1.80 | 1.91 | 1.72
CGS | 2.34 | 1.92 | 1.88 | 1.97 | 2.05 | 1.87 | 1.95 | 2.17 | 2.11
ASTRO | 1.13 | 1.82 | 1.41 | 1.37 | 1.69 | 1.79 | 1.55 | 1.29 | 1.62
DSLP | 1.58 | 1.49 | 1.19 | 1.27 | 1.35 | 1.45 | 1.26 | 1.42 | 1.25
OxIso | 2.47 | 2.15 | 1.99 | 2.24 | 1.89 | 2.25 | 1.92 | 2.61 | 2.45
USIPI | 2.23 | 2.33 | 2.49 | 2.42 | 1.89 | 2.29 | 2.37 | 2.72 | 2.25


Table 6-4. Noise reduction percentage of the attractors for NPB-NR, the low pass filtering method, and the hard and soft threshold wavelet methods; each block corresponds to one noise level (SNR between 15 db and 100 db).

 | Lorenz | GOPY | Van-der-Pol | Rossler | Rayleigh
Noise level (db SNR):
NPB-NR | 40 | 45 | 54 | 29 | 34
Low pass filter | 19 | 27 | 40 | 19 | 31
Wavelet soft | 15 | 13 | 29 | 21 | 25
Wavelet hard | 17 | 7 | 21 | 18 | 22
Noise level (db SNR):
NPB-NR | 51 | 59 | 61 | 40 | 56
Low pass filter | 26 | 31 | 40 | 28 | 39
Wavelet soft | 22 | 22 | 36 | 33 | 32
Wavelet hard | 23 | 14 | 29 | 24 | 28
Noise level (db SNR):
NPB-NR | 63 | 71 | 75 | 79 | 82
Low pass filter | 31 | 35 | 40 | 37 | 41
Wavelet soft | 32 | 29 | 41 | 40 | 42
Wavelet hard | 29 | 21 | 33 | 32 | 33
Noise level (db SNR):
NPB-NR | 72 | 76 | 79 | 81 | 84
Low pass filter | 34 | 39 | 43 | 43 | 44
Wavelet soft | 35 | 35 | 46 | 44 | 47
Wavelet hard | 34 | 27 | 39 | 36 | 38
Noise level (db SNR):
NPB-NR | 80 | 79 | 85 | 85 | 89
Low pass filter | 38 | 43 | 46 | 47 | 46
Wavelet soft | 41 | 39 | 50 | 50 | 60
Wavelet hard | 36 | 30 | 45 | 40 | 51


CHAPTER 7
CONCLUSION AND FUTURE WORK

In the first part, we formulated infinite mixtures of various GLM models via a stick breaking prior as hierarchical Bayesian graphical models. We derived fast mean field variational inference algorithms for each of the models. The algorithms are particularly useful for high dimensional datasets where Gibbs sampling fails to scale and is slow to converge. The algorithms have been tested successfully on four datasets against their well known competitor algorithms across many settings of training/testing splits. In the next part, we formulated the infinite multigroup Generalized Linear Model (iMG-GLM), a flexible model for shared learning among groups in grouped regression. The model clusters groups by identifying identical response-covariate densities for different groups. We experimentally evaluated the model on a wide range of problems where traditional mixed effect models and group-specific regression models fail to capture structure in the grouped data. In the third part, we formulated the infinite mixtures of Hierarchical Generalized Linear Model (iHGLM), a flexible model for hierarchical regression. The model captures identical response-covariate densities in different groups as well as different densities in the same group. It also captures heteroscedasticity and overdispersion across groups. We experimentally evaluated it on a wide range of problems where traditional mixed effect models fail to capture structure in the grouped data. In the final part, we formulated a Bayesian nonparametric model for noise reduction in time series. The model captures the local nonlinear dynamics in the time delay embedded phase space to fit the most appropriate dynamics consistent with the data. Finally, we evaluated the NPB-NR technique on various time series generated from several dynamical systems, stock market data, LASER data, sea level pressure data, etc. The technique yields a much better noise reduction percentage, power


spectrum analysis, accurate dimension recovery, and prediction accuracy. In the experiments, we varied the scale factor, which modulates the number of clusters in the phase space. While the variational methods for the GLM models were developed in a mean field setting, it would be worth exploring other variational methods for Generalized Linear Models in the nonparametric Bayesian context. For the multigroup and multilevel regressions, although the Gibbs sampler turned out to be fairly accurate for the iMG-GLM and iHGLM models, developing a variational inference alternative would be an interesting topic for future research. Finally, the number of mixture components in each group depends on the scale factors $\gamma$ and $\alpha_0$ (scale parameters of the DP and HDP) of the model, and at times grows large in specific groups. This occurs mostly when a group has a large number of data points compared to the others. In most cases, beyond a few primary clusters, the remaining clusters represent outliers. Although careful tuning of the scale parameters can mitigate these problems, a theoretical understanding of the dependence of the model on the scale parameters could lead to better modeling and application. Although the Metropolis Hastings algorithm turned out to be fairly accurate for the iMG-GLM-2 model, developing a variational inference alternative would be an interesting topic for future research. For the final part, we plan to explore which kinds of physical systems can be analyzed using nonparametric Bayesian based noise reduction methods. Finally, considerable effort should be given to analyzing time series generated from higher dimensional systems.


REFERENCES

A. Tsanas, P. E. McSharry, M. A. Little and Ramig, L. O. Accurate Telemonitoring of Parkinson's Disease Progression by Non-invasive Speech Tests. IEEE Transactions on Biomedical Engineering 57 (2009): 884.

Abraham, R. and Shaw, C. Dynamics: The Geometry of Behavior. Ariel Press, 1985.

Ando, Rie Kubota and Zhang, Tong. A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data. Journal of Machine Learning Research 6 (2005): 1817.

Antoniak, C. E. Mixtures of Dirichlet Processes with Applications to Bayesian Nonparametric Problems. Annals of Statistics 2 (1974). 6: 1152.

Badii, R., Broggi, G., Derighetti, B., Ravani, M., Ciliberto, S., Politi, A., and Rubio, M. A. Dimension increase in filtered chaotic signals. Phys. Rev. Lett. 60 (1988): 979.

Bakker, B. and Heskes, T. Task Clustering and Gating for Bayesian Multitask Learning. Journal of Machine Learning Research 4 (2003): 83.

Baxter, Jonathan. Learning Internal Representations. International Conference on Computational Learning Theory (1995): 311.

Baxter, Jonathan. A Model of Inductive Bias Learning. Journal of Artificial Intelligence Research 12 (2000): 149.

Bishop, C. M. Pattern Recognition and Machine Learning. Springer-Verlag, 2006.

Blackwell, D. and MacQueen, J. B. Ferguson Distributions Via Polya Urn Schemes. Annals of Statistics 1 (1973). 2: 353.

Blei, D. and Jordan, M. Variational Inference for Dirichlet Process Mixtures. Bayesian Analysis 1 (2006): 121.

Breslow, N. E. and Clayton, D. G. Approximate Inference in Generalized Linear Mixed Models. Journal of the American Statistical Association 88 (1993). 421: 9.

Caruana, Rich. Multitask Learning. Machine Learning 28 (1997). 1: 41.

Cortes, C. and Vapnik, V. Support Vector Networks. Machine Learning 20 (1995): 273.

D. Blei, A. Ng and Jordan, M. Latent Dirichlet Allocation. Journal of Machine Learning Research 3 (2003): 993.

Donoho, D. L. De-noising by Soft-thresholding. IEEE Trans. Inf. Theor. 41 (1995). 3: 613.


Elshorbagy, A. and Panu, U. S. Noise reduction in chaotic hydrologic time series: facts and doubts. Journal of Hydrology 256 (2002). 3-4: 147.

Escobar, D. M. and West, M. Bayesian Density Estimation and Inference Using Mixtures. Journal of the American Statistical Association 90 (1995). 430: 577.

Ferguson, T. S. A Bayesian Analysis of Some Nonparametric Problems. Annals of Statistics 1 (1973): 209.

Gelman, A. and Rubin, D. B. Inference From Iterative Simulation using Multiple Sequences. Statistical Sciences 7 (1992): 457.

Gershenfeld, N. and Weigend, A. Time Series Prediction: Forecasting the Future and Understanding the Past. Addison-Wesley, 1994.

Ghahramani, Z. and Beal, M. Propagation Algorithms for Variational Bayesian Learning. Proceedings of 13th Advances in Neural Information Processing Systems (2000): 507.

Grassberger, P., Schreiber, T., and Schaffrath, C. Non-linear time sequence analysis. International Journal of Bifurcation and Chaos 1 (1991). 3: 521.

Grebogi, C., Ott, E., Pelikan, S., and Yorke, J. A. Strange attractors that are not chaotic. Physica D: Nonlinear Phenomena 13 (1984). 1: 261.

Hannah, L., Blei, D., and Powell, W. Dirichlet Process Mixtures of Generalized Linear Models. Journal of Machine Learning Research 12 (2011): 1923.

Hastie, T. and Tibshirani, R. Varying-Coefficient Models. Journal of the Royal Statistical Society. Series B (Methodological) 55 (1993). 4: 757.

IBM. IBM SPSS Version 20. IBM SPSS SOFTWARE (2011).

Ishwaran, H. and James, L. F. Gibbs Sampling Methods for Stick-Breaking Priors. Journal of the American Statistical Association 96 (2001). 453: 161.

Jordan, M. and Jacobs, R. Hierarchical mixtures of experts and the EM algorithm. International Joint Conference on Neural Networks (1993).

Kennel, Matthew B., Brown, Reggie, and Abarbanel, Henry D. I. Determining embedding dimension for phase-space reconstruction using a geometrical construction. Phys. Rev. A 45 (1992). 6: 3403.

Kostelich, E. J. and Yorke, J. A. Noise Reduction: Finding the Simplest Dynamical System Consistent with the Data. Phys. D 41 (1990). 2: 183.

Lee, Y. and Nelder, J. A. Hierarchical Generalized Linear Models. Journal of the Royal Statistical Society. Series B (Methodological) 58 (1996). 4: 619.


Lee, Y. and Nelder, J. A. Hierarchical Generalised Linear Models: A Synthesis of Generalised Linear Models, Random-Effect Models and Structured Dispersions. Biometrika 88 (2001a). 4: 987.

Lee, Y. and Nelder, J. A. Modelling and analysing correlated non-normal data. Statistical Modelling 1 (2001b). 1: 3.

Lee, Y. and Nelder, J. A. Double hierarchical generalized linear models (with discussion). Journal of the Royal Statistical Society: Series C (Applied Statistics) 55 (2006). 2: 139.

Lorenz, E. N. Deterministic Nonperiodic Flow. Journal of the Atmospheric Sciences 20 (1963). 2: 130.

Lowd, D. and Domingos, P. Naive Bayes models for probability estimation. Proceedings of the 22nd International Conference on Machine Learning (2005): 529.

M. Jordan, T. Jaakkola, Z. Ghahramani and Saul, L. Introduction to Variational Methods for Graphical Models. Machine Learning 37 (2001): 183.

Mallat, S. and Hwang, W. L. Singularity detection and processing with wavelets. Information Theory, IEEE Transactions on 38 (1992). 2: 617.

Mitschke, F., Moller, M., and Lange, W. Measuring filtered chaotic signals. Phys. Rev. A 37 (1988). 11: 4518.

Neal, R. M. Markov Chain Sampling Methods for Dirichlet Process Mixture Models. Journal of Computational and Graphical Statistics 9 (2000a). 2: 249.

Neal, R. M. Markov Chain Sampling Methods for Dirichlet Process Mixture Models. Journal of Computational and Graphical Statistics 9 (2000b). 2: 249.

Nelder, J. A. and Wedderburn, R. W. M. Generalized Linear Models. Journal of the Royal Statistical Society, Series A (General) 135 (1972). 3: 370.

Pol, B. V. D. A theory of the amplitude of free and forced triode vibrations. Radio Review 1 (1920): 701.

Rasmussen, C. E. and Williams, C. K. I. Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). MIT Press (2005a).

Rasmussen, C. E. and Williams, C. K. I. Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). MIT Press, 2005b.

Robert, C. and Casella, G. Monte Carlo Statistical Methods. Springer-Verlag, 2001.

Robert, C. P. and Casella, G. Monte Carlo Statistical Methods (Springer Texts in Statistics). Springer-Verlag New York, Inc., 2005.


Robinson, A. P. and Wykoff, W. R. Imputing missing height measures using a mixed-effects modeling strategy. Canadian Journal of Forest Research 34 (2004): 2492-2500.

Rossler, O. E. An equation for continuous chaos. Physics Letters A 57 (1976). 5: 397.

S. Ghosal, J. K. Ghosh and Ramamoorthi, R. V. Posterior consistency of Dirichlet mixtures in density estimation. Annals of Statistics 27 (1999): 143.

Schwartz, L. On Bayes procedures. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 4 (1965). 1: 10.

Sethuraman, J. A Constructive Definition of Dirichlet Priors. Statistica Sinica 4 (1994): 639.

Site, G. and Ramakrishnan, A. G. Wavelet domain nonlinear filtering for evoked potential signal enhancement. Computer and Biomedical Research 33 (2000). 3: 431.

Takens, Floris. Dynamical Systems and Turbulence, Warwick 1980. Detecting strange attractors in turbulence 898 (1981): 366.

Teh, Y. W., Jordan, M. I., Beal, M., and Blei, D. Hierarchical Dirichlet Processes. Journal of the American Statistical Association 101 (2006): 1566.

Tokdar, S. T. Posterior Consistency of Dirichlet Location-Scale Mixture of Normals in Density Estimation and Regression. Sankhya: The Indian Journal of Statistics 68 (2006). 1: 90.

Viele, K. and Tong, B. Modeling with Mixtures of Linear Regressions. Statistics and Computing 12 (2002). 4: 315.

Wang, Zidong, Lam, James, and Liu, Xiaohui. Filtering for a class of nonlinear discrete-time stochastic systems with state delays. Journal of Computational and Applied Mathematics 201 (2007). 1: 153.

Wolberg, W. H. and Mangasarian, O. L. Multisurface method of pattern separation for medical diagnosis applied to breast cytology. Proceedings of the National Academy of Sciences 87 (1990): 9193.

Y. W. Teh, M. J. Beal, M. I. Jordan and Blei, D. M. Hierarchical Dirichlet Processes. Journal of the American Statistical Association 101 (2006): 1566.

Ya Xue, Xuejun Liao and Carin, Lawrence. Multi-task Learning for Classification with Dirichlet Process Priors. Journal of Machine Learning Research 8 (2007): 35.

Yu, Kai, Tresp, Volker, and Schwaighofer, Anton. Learning Gaussian Processes from Multiple Tasks. International Conference on Machine Learning (2005): 1012.


Zhang, Jian, Ghahramani, Zoubin, and Yang, Yiming. Learning Multiple Related Tasks using Latent Independent Component Analysis. Advances in Neural Information Processing Systems (2005): 1585.

Zhang, L., Bao, P., and Pan, Q. Threshold analysis in wavelet-based denoising. Electronics Letters 37 (2001). 24: 1485.


BIOGRAPHICAL SKETCH

Minhazul Islam Sk was born in the small town of Burdwan in the West Bengal province of India. After finishing his high school education at Burdwan C.M.S. High School, he was admitted to Jadavpur University in Kolkata for his undergraduate studies in Electronics and Telecommunication Engineering in 2008. After finishing his undergraduate education in 2012, he was admitted to the Ph.D. program in the Computer and Information Science and Engineering Department at the University of Florida in Gainesville, Florida, USA. His primary area of research is machine learning and applied statistics, especially in the areas of regression and Bayesian nonparametrics. He graduated with a Ph.D. from the University of Florida in August 2017.