Citation
Inference of Network Topology and Dynamics: A Compressive Sensing Approach

Material Information

Title:
Inference of Network Topology and Dynamics: A Compressive Sensing Approach
Creator:
He, Yuejia
Publisher:
University of Florida
Publication Date:
Language:
English

Thesis/Dissertation Information

Degree:
Doctorate ( Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Electrical and Computer Engineering
Committee Chair:
Wu, Dapeng
Committee Members:
Li, Xiaolin
Fang, Yuguang
Thai, My Tra
She, Yiyuan
Graduation Date:
8/10/2013

Subjects

Subjects / Keywords:
Estimate reliability ( jstor )
Inference ( jstor )
Moderation ( jstor )
Quantiles ( jstor )
Regulator genes ( jstor )
Statistical estimation ( jstor )
Statistical models ( jstor )
Statistics ( jstor )
Time series ( jstor )
Topology ( jstor )
dynamics
network
sparsity
topology

Notes

General Note:
Dynamical systems and networks have drawn increasing attention of researchers from various scientific domains, thanks to their capability of modeling and describing a large variety of phenomena and behaviors. A particular interest is to identify the underlying structures and between-node connections of such networks. A variety of models and approaches including Bayesian dynamical network, statistical hypothesis testing, Wiener filtering, and Granger causality have been proposed and studied. To exploit this challenge, we are devoted to modeling dynamical systems using linear or nonlinear stochastic differential/difference equations and inferring their topologies and dynamics using compressive sensing techniques. The first part of this work introduces a linear dynamical model and the causality graph associated with it. We present the improvements we have made over the current sparse learning methods for inferring the causality graph. First, we are the first one to consider guaranteeing the stationarity property of the estimated model. Second, we employ a novel penalty ``Berhu'' to improve the conventional Lasso in addressing high collinearity in the network data. Third, an iterative screening procedure based on quantile thresholding is proposed to tackle the ultra-high dimensional challenge. Finally, a stationary bootstrap technique is applied to attain reliable topology learning with connection occurring frequency. In the second part, we consider an unknown covariance structure for the linear model, brought in by the multivariate Gaussian noise. In this model, the transition matrix represents a directed graph that describes the causality relationships between nodes, while the covariance matrix represents an undirected graph that describes the conditional dependence relationships. We introduce the joint association graph (JAG) to integrate the two graphs and provide a comprehensive picture of the network. An efficient algorithm is designed to learn the JAG structure. Particularly, a robust JAG screening and decomposition is proposed for reliable and efficient network learning of large-scale systems. In practice, there are many complex systems whose dynamics are extremely nonlinear and cannot be satisfactorily characterized by linear models. Motivated by overcoming the limitations of linear models, we study the inference of nonlinear stochastic dynamical networks in Chapter 4. Particularly, we are interested in dynamical systems whose evolutions follow the sigmoid rules and whose topologies are sparse, such as the gene regulatory network. We use the sigmoid recurrent network (SRN) to model such dynamical networks and propose the sparse sigmoid regression algorithm to do model selection and parameter estimation simultaneously. Then the progressive sigmoid network screening is proposed to enable the user to have a direct control over the cardinality of the estimated model. We show the effectiveness of the proposed frameworks through simulation studies as well as applications to real-world data analysis. The dynamical networks analyzed include the macroeconomic network, the stock markets, and the gene regulatory networks.

Record Information

Source Institution:
UFRGP
Rights Management:
Copyright He, Yuejia. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
Embargo Date:
8/31/2015

Downloads

This item has the following downloads:


Full Text

PAGE 2

2

PAGE 3

3

PAGE 4

Firstofall,IwouldliketothankmyadvisorDr.DapengWuandspecialco-advisorDr.YiyuanShe.Withhisbroadversionandwideknowledge,Dr.Wuhasprovidedmevaluableadvisesinseekingresearchtopics.Hedevotedalotoftimeandeffortinguidingmeontotheroadofscienticresearchandteachingmetobecomeaqualiedresearcher.IamalsogratefulforhissupportingmebothmateriallyandspirituallyinthefouryearsofmyPhDlife.Dr.Sheisaseriousandbrilliantscholarwithsolidknowledgeinstatistics,mathematicsandcomputerscience.Heisalwayspatienttoanswermyquestions,givemehelpfulsuggestionsandreviewmypaperscarefully.Hishard-workingandseriousattitudeforresearchsetanexcellentexampleforme.Hiskindnessandeasy-goingpersonalitymakeshimagoodfriend,too.IwouldliketothankDr.YuguangFang,Dr.XiaolinLiandDr.MyThaiforservingonmycommittee.TheyhavegivenmeimportantcommentsandsuggestionsonmyproposalsothatIknowwhereandhowtomakeimprovement.Iappreciatetheirtimeandeffortinreadingmydissertationandattendingmyoraldefense.Iwouldliketothankmycollegesandclassmates,ZongruiDing,QianChen,RuiYang,ZhengYuan,HuanghuangLi,ShijieLi,RuigangFang,SonglinZhao,etc.Theyhavehelpedmealotbothinstudyandinlife.Becauseofthem,mylifeinGainsevilleiswonderful.IthankSandraHardyforalwayscaringformeandprayingforme.AndIthankXueHuangforgivingmealotofhelpwhenIstayedinTallahassee.Finally,IwouldliketothankmyfamilyandfriendsinChina.Thoughyouarefaraway,Icanfeelyourloveandsupport. 4

PAGE 5

page ACKNOWLEDGMENTS .................................. 4 LISTOFTABLES ...................................... 8 LISTOFFIGURES ..................................... 9 ABSTRACT ......................................... 11 CHAPTER 1INTRODUCTION ................................... 13 1.1InferenceofStochasticLinearDynamicalNetwork ............. 15 1.1.1InferringtheTransitionMatrixandCausalityGraph ......... 15 1.1.2InferringJointAssociationGraphandIdentifyingSubnetworks ... 17 1.2InferenceofStochasticNonlinearDynamicalNetwork ........... 18 1.3Outline ...................................... 19 2INFERENCEOFTHECAUSALITYGRAPHOFTHELINEARMODEL ..... 20 2.1RelatedWorks ................................. 21 2.2TheStationary-Sparse(S2)LearningFramework .............. 23 2.2.1TheS2NetworkLearning ....................... 23 2.2.2TheBerhuPenalty ........................... 26 2.3ComputationofBISPS ............................. 28 2.3.1AlgorithmsBasedonConventionalTechniques ........... 28 2.3.1.1SemideniteProgramming ................. 29 2.3.1.2ProjectedSubgradientMethod ............... 29 2.3.1.3AlternatingDirectionMethodofMultipliers ......... 30 2.3.2TheBerhuThresholdingOperator ................... 31 2.3.3TheBISPSAlgorithm .......................... 32 2.4QuantileThresholdingIterativeScreening .................. 34 2.5StationaryBootstrapEnhancedNetworkLearning ............. 36 2.5.1TheSB-BISPSFramework ....................... 36 2.5.2StationaryBootstrap .......................... 37 2.6ExperimentsonSyntheticData ........................ 38 2.6.1TuningMethod ............................. 38 2.6.2PerformanceMeasuresandExperimentSettings .......... 39 2.6.3PerformanceofBISPS ......................... 40 2.6.4PerformanceofQTIS .......................... 42 2.7ApplicationtoU.S.MacroeconomicData ................... 44 2.7.1ComparisonofRollingMSE ...................... 45 2.7.2BootstrapAnalysis ........................... 46 5

PAGE 6

49 3.1TheJointGraphicalModel ........................... 52 3.1.1JointRegularizationandJointSparseLearning ........... 53 3.1.2JointAssociationGraph ........................ 54 3.1.3TheJGSELearningFramework .................... 56 3.2JAGScreeningandDecomposition ...................... 58 3.2.1Group-estimator ........................... 59 3.2.2JAGScreening ............................. 61 3.2.3TheAALineSearchandtheGISTAlgorithm ............ 62 3.2.4RobustJAGDecompositionviaSpectralClustering ......... 64 3.3FineLearning .................................. 66 3.4ExperimentsonSyntheticData ........................ 68 3.4.1IdenticationandEstimationAccuracy ................ 68 3.4.2RandIndexComparison ........................ 70 3.4.3ComputationTimeComparison .................... 71 3.5Applications ................................... 72 3.5.1S&P500 ................................ 72 3.5.2NASDAQ-100 .............................. 74 4INFERENCEOFTHESIGMOIDRECURRENTNETWORKMODEL ...... 78 4.1TheGeneRegulatoryNetworkandtheSRNModel ............. 78 4.1.1TheGeneRegulatoryNetwork .................... 78 4.1.2TheSigmoidRecurrentNetworkModel ................ 79 4.1.3RelatedWorks ............................. 80 4.2PMLEstimationoftheSRNModel ...................... 81 4.3TheSparseSigmoidRegressionAlgorithm ................. 82 4.3.1MultivariateModelLearning ...................... 84 4.3.2ProgressiveSigmoidalNetworkScreening .............. 85 4.4ExperimentsonSyntheticData ........................ 86 4.4.1SimulationoftheModelandStochasticProcesses ......... 86 4.4.2Methods ................................. 87 4.4.3PerformanceComparisons ...................... 88 4.5Applications ................................... 89 4.5.1TheSOSNetwork ........................... 89 4.5.2TheYeastNetwork ........................... 91 5CONCLUSION .................................... 94 APPENDIX APROOFOFTHEOREM2.1 ............................. 97 BPROOFOFTHEOREM2.2 ............................. 104 CPROOFOFTHEOREM4.1 ............................. 105 6

PAGE 7

............................. 108 REFERENCES ....................................... 109 BIOGRAPHICALSKETCH ................................ 118 7

PAGE 8

Table page 2-1PerformancecomparisonofBISPSwithLasso,eNetandBerhu.n=80. .... 41 2-2NormalizedRollingMSEofLassoandBISPSforeachcategory. ........ 44 2-3RollingMSEofLassoandBISPSfordifferenthorizons. ............. 45 3-1Comparisonofthetruepositiverate,falsepositiverateandmodelerror .... 69 3-2Computationtimecomparison. ........................... 72 3-3RollingMSEforNASDAQ-100. ........................... 77 4-1RegulatoryconnectionsoftheYeastcellcyclesubnetwork. ........... 92 8

PAGE 9

Figure page 2-1Exampleofanetwork( 2 )withtransitionmatrix( 2 ). ............. 21 2-2Exampleofstationaryandnonstationaryprocesses.Thenumberofnodesisp=50.Thestationaryprocesshas(B)=0.95andthenonstationaryprocesshas(B)=1.05. .............................. 24 2-3ForecastsofGDP263givenbyMLandPMLestimation.ThetimeseriesofGDP263isobtainedfromseasonalobservationsbetween1978:IVand1998:IV.Itisastationaryprocess.However,theforecastsgivenbyMLandPMLexhibitnonstationarybehaviors.(^BML)=1.171,(^BPML)=1.073. ........... 24 2-4Penaltyfunctionsandcorrespondingsolutions. .................. 27 2-5Comparisonofforecastingperformances.Top:samplefromthetruemodel;Middle:forecastfromthenonstationaryestimate^BLasso;Bottom:forecastfromthestationaryestimate^BBISPS. ........................... 42 2-6ComparingmissrateofQTISandSIS. ....................... 43 2-7PerformanceofQTIS+BISPS,comparedwithapplyingBISPStoafullmodel. 43 2-8COFsofthepre-GreatModerationperiodandthepost-GreatModerationperiod. 47 2-9Topologiesofthemacroeconomicnetworkinthepre-GreatModerationperiodandthepost-GreatModerationperiod.f=80%.Self-loopsarenotshown. .. 47 3-1ThesGTGandsCDGfortheS&P500indicesfromtheEnergycategory.Commonisolatednodeshavebeenremoved. ................... 51 3-2Exampleofthejointgraphicalmodel. ....................... 55 3-3Randindexcomparison. ............................... 71 3-4JAGdecomposition(q=0.1)forS&P500. ..................... 73 3-5RandindexforS&P500. .............................. 74 3-6sCDGlearningsforS&P500. ........................... 75 3-7TheGTGandCDGofNASDAQ-100.Thesizeofthenodeisproportionaltoitsdegree.Theedgewidthindicatestheweightoftheconnection.Solidlinerepresentspositiveweightanddottedlinerepresentsnegativeweight. ..... 77 4-1Topologyanddynamicsofthesigmoidrecurrentnetworkmodel. ........ 80 4-2Performancecomparison.p=10. ......................... 89 4-3ROCcurves. ..................................... 89 9

PAGE 10

..... 90 10

PAGE 11

11

PAGE 12

4 .Particularly,weareinterestedindynamicalsystemswhoseevolutionsfollowthesigmoidrulesandwhosetopologiesaresparse,suchasthegeneregulatorynetwork.Weusethesigmoidrecurrentnetwork(SRN)tomodelsuchdynamicalnetworksandproposethesparsesigmoidregressionalgorithmtodomodelselectionandparameterestimationsimultaneously.Thentheprogressivesigmoidnetworkscreeningisproposedtoenabletheusertohaveadirectcontroloverthecardinalityoftheestimatedmodel.Weshowtheeffectivenessoftheproposedframeworksthroughsimulationstudiesaswellasapplicationstoreal-worlddataanalysis.Thedynamicalnetworksanalyzedincludethemacroeconomicnetwork,thestockmarkets,andthegeneregulatorynetworks. 12

PAGE 13

Barabasi ( 2002 )havebeenappliedtodescribingalotofphysicalphenomenaandsystemssuchasthestockmarkets( MillsandMarkellos 2008 ),geneticnetworks( Gardneretal. 2003 ),andpowergrids( MassoudandWollenberg 2005 ).Twoofthemostimportantcharacteristicsofadynamicalnetworkareitstopologyanddynamics( Boccalettietal. 2006 ).Thetopologyindicatestheinterconnectionsandrelationshipsbetweennodes;whilethedynamicsdescribeshowthenetworkevolveswithtime.Withtopologyanddynamicsknown,wecanpredict,diagnoseandcontrolthenetwork( Carringtonetal. 2005 ).Avarietyofapproacheshavebeenproposedtoanalyzethetopologyordynamicsofadynamicalnetwork.Forexample,thedynamicBayesiannetwork( Ghahramani 1998 ; LahdesmakiandShmulevich 2008 )providesadirectedacyclicgraphthatcapturestheconditionaldependencerelationshipswithinadynamicalnetwork.Grangercausality( Granger 1969 )anditsextensionfornonlinearsystems,kernelGrangercausality( Marinazzoetal. 2008 ),areappliedtoinferthecausalitystructureofthenetwork.Pair-wisemutualinformationisusedtoevaluatenon-randomassociationsandconstructarelevancenetwork( ButteandKohane 2000 ).Thepredictionerrormethodandthemaximumlikelihoodestimationmethod( Ljung 1998 )arewellappliedtoinferthenetworkdynamicswhenthetopologyisknown,eithergivenbypriororexpertinformationorprovidedbysomeinferencemethod.Regularizationtechniques( ChiusoandPillonetto 2012 ; NapoletaniandSauer 2008 )areproposedtoinferboththedynamicsandtopologyfromdata.Referto( Krameretal. 2009 ; KossinetsandWatts 2006 ; LuandChen 2005 ; MaterassiandInnocenti 2010 ; Timme 2007 ; Xingetal. 2010 ; Yuanetal. 2011 )forotherworksondynamicalnetworkanalysis.Oneapproach,whichweareparticularlyinterestedin,startsfrommodellingthedynamicalnetworkwithasetofstochasticdifferential/differenceequations(SDEs). 13

PAGE 14

Donoho 2006 ; TsaigandDonoho 2006 ),suchaspenalizedmaximumlikelihood(PML)method,areappliedtoestimatetheparameterswithsparsityconstraint.Suchmethodsautomaticallymakeatrade-offbetweengoodnessoftandmodelcomplexity.Thus,theycannotonlyprovideasparsestructurethatismeaningfulandinterpretablebutalsoachieveaccurateestimationandpredictionofthenetworkdynamics.Ourresearchfollowsthisapproach.Thatis,wemodeladynamicalnetworkusinglinearornonlinearSDEsandinferitstopologyanddynamicsfrommultivariatetimeseriesdatathroughsparselearning.Thediscreteobservationsofalineardynamicalnetworkareoftendescribedusingthevectorautoregressivemodel( Sims 1980 ),whichisessentiallyasetofstochasticdifferenceequations( HansenandSargent 1990 ).Applicationscanbefoundinvariousscienticdomains,suchaseconomics,chemistry,andbioinfomatics( Goebeletal. 2003 ; Korhonenetal. 1996 ; Sims 1980 ; Wildetal. 2010 ).Recently,sparselearningtechniques,emergingfromstatisticalestimationandcompressivesensing,havebeenappliedtoreconstructthenetworktopologyandestimatethenetworkdynamicssimultaneously( Bolstadetal. 2011 ; Fujitaetal. 2007 ; Valdes-Sosa 2005 ).However,therearestillimportantproblemsthathavenotbeenfullytackled,suchasthestationaritypropertyoftheestimates,thecondenceleveloftheestimatedtopology,thecovariancestructureoftheGaussiannoise,andthelarge-scaleandultra-highdimentionalproblemsthatarebecomingmoreandmoreimportantinnetworkscience.Hence,inthiswork,wearedevotedtoexploringthosechallenges. 14

PAGE 15

Caellietal. 1993 ; Vohradsky 2001 ).Weproposeadata-drivenmethodtoinferthedynamicsandstructureoftheSRNmodelinasystematicway. Granger 1969 ).ThestrengthsofthesecausalrelationsaregivenbytheithrowofmatrixA.Therefore,thetransitionmatrixAdescribesthecausalrelationshipsbetweennodes( Wildetal. 2010 ).Ontheotherhand,thevaluesofthenodesattimetarealsocontaminatedbythenoisesthatcomefromamultivariateGuassiandistribution.ConsequentlythenodesarealsocorrelatedduetothecorrelationbetweenthecomponentsoftheGaussiannoise.Theconcentrationmatrix=1describessuchconditionaldependencerelationships( Lauritzen 1996 ).Thatis,nodeiandnodejareconditionallyindependentgivenothernodesifandonlyif!ij=!ji=0. Bolstadetal. 2011 ; Fujitaetal. 2007 ; Goebeletal. 2003 ),researchersfocusonlearningthe 15

PAGE 16

FanandLi 2006 2001 ; Zou 2006 ; ZouandHastie 2005 ; BlumensathandDavies 2010 ).Currentapplicationsfocusonusing`1penalty( Tibshirani 1996 )forthesakeofitscomputationalefciencyandtheoreticalelegance.Nevertheless,themajorproblemofthe`1penaltyfordynamicalnetworklearningisitsincapabilityofhandlingcollinearity,whichtypicallyexistsinnetworkdataasaresultoftheinteractionbetweennodes.Theelasticnet( ZouandHastie 2005 )usesalinearcombinationofthe`1and`2penaltiestodealwithcollinearityandlargenoise.However,its`2componentmaycounteractsparsityandbringthedoubleshrinkageissue.Toimprovethesedrawbacks,westudyanew``1+`2'variantBerhu( Owen 2007 ),whichfusesthe`1and`2penaltiesinanonlinearfashionandthuscandealwithcollinearityaswellasachievesufcientsparsity.WedesignaBerhuthresholdingoperatortoefcientlysolvetheBerhupenalizedproblem.ThePMLmethodisdesignedforstationarytimeseries.However,itmayendupwithanonstationaryestimatewithunmeaningfulpredictionresults.Hence,ourwork,distinguishedfromtheexistingones,focusesonguaranteeingthestationaritypropertyoftheestimatedmodel.WeputthestationarityconstraintintothePMLproblemanddesignanefcientalgorithm,theBerhuiterativesparsitypursuitwithstationarity(BISPS),tosolvethisoptimizationproblem.Intoday'sreal-worldproblems,wefrequentlyfacepracticaldatasetswithp2n,wherepisthenumberofnodesandnisthenumberofobservations,e.g.,microarraydatasetsconsistingofthousandsofgenesbutfewerthanahundredobservations.Toaddresstheso-calledultra-highdimensionalproblem,wedesignthequantile

PAGE 17

PolitisandRomano 1994 )tothenetworkidenticationproblemandproposethestationarybootstrapenhancedBISPS(SB-BISPS)procedure.Boostrapmethodresamplesthedataandrecalculatesthestatisticsusingtheresampleddata.Ourworkappliesthispowerfultoolwithstationarityguaranteetonetworkidenticationandprovidesacondencelevelfortheoccurrenceofeachpossibleconnectioninthenetwork. Rothmanetal. 2010 ).Sinceinastochasticdynamicalsystem,thenoiseisacrucialdrivingforceofthesystem'sevolution( HanggiandMarchesoni 2005 ),thestructureofthecovariancematrixortheconcentrationmatrixwillimpactthedynamicsofthenetwork.Therefore,Chapter 3 dealswiththecasewherebothAandin( 1 )haveunknownstructures.Thetwomatrices,combinedtogether,giveacomprehensivepictureofthenetworkdynamicsandtopology.TherearefewerstudiesintheliteraturethatconsiderthejointsparseestimationofAand( Rothmanetal. 2010 ; LeeandLiu 2012 ).Unfortunately,inourexperience,allexistingmethodsareslowandabsolutelynotapplicableforlarge-scalenetworks.Forexample,theMRCEalgorithmproposedby Rothmanetal. ( 2010 )isinfeasibleforanetworkwithsizep>120onanordinaryPC.SeeSection 3.4 fordetails.Evenforrelativelysmallnetworks,thenumberofunknownvariablesinthetwomatrices,i.e.,p2+p(p+1)=2,canbelarge,therebymakingitdifculttoreliablyidentifythesparsenetworktopologyandaccuratelyestimatethesystemparameters. 17

PAGE 18

Stein 1956 ).Anotherinterestingobservationisthatalargenetworkcanbepossiblydecomposedintosmallersubnetworks.Ifsuchanetworkdecompositioncouldbedetectedinanearlystage,complexlearningalgorithmswouldapplyinamuchmoreefcientway(eveninaparallelmanner)toreducecomputationalcomplexityandboostfeasibility.Ofcourse,thedecompositionbasedoneitherAnorisnottrustworthy.WeproposethegraphscreeninganddecompositionbasedonAandjointly.Specically,weintroducethenotionofjointassociationgraph(JAG)anddevelopajointgraphicalscreeningandestimation(JGSE)frameworkforefcientnetworklearninginlarge-scaledataapplications.Inparticular,ourmethodcanpre-determineandremoveunnecessaryedgesbasedonthejointgraphicalstructure,referredtoasJAGscreening,anddecomposealargenetworkintosmallersubnetworksinarobustmanner,referredtoasJAGdecomposition.Inpracticaldataanalysis,JAGscreeninganddecompositioncanreducetheproblemsizeandsearchspaceforneestimationatalaterstage.Experimentsonbothsyntheticdataandreal-worldapplicationshaveshowntheeffectivenessoftheproposedframeworkintopologyidenticationanddynamicsestimationoflarge-scalelineardynamicalnetworks. Anishchenkoetal. 2007 ).Suchsystemscallfornonlinearmodelstodescribetheirdynamics.Inourwork,weareparticularlyinterestedinthesparsesigmoidrecurrentnetwork(SRN)model,whichisdescribedbyasetof 18

PAGE 19

HanggiandMarchesoni 2005 )and()isasigmoidfunction: 1+exp().Thismodeliswidelyusedintheareasofneuroscienceandgenetics( Caellietal. 1993 ; Vohradsky 2001 ),mainlybecausethesaturationeffectofthesigmoidfunctioncandescribealotofphysicalphenomenaofinterest.IdentifyingtheSRNmodelisaNP-hardproblem,ifwehavetodoanexhaustivesearchofallthepossibilitiesforSi.Therefore,currentmethods Changetal. ( 2006 ); Yipetal. ( 2010 ); Vuetal. ( 2007 )focusonestimatingtheparametersgivenaspecicSi,whichischoseneitherusingpriorknowledgeorgreedymethods.Differentfromexistingworks,wedesignadata-drivenmethodthatidentiesSiandestimatestheunknownparameterssimultaneously.Tobespecic,wederivethePMLproblemfor( 1 )anddesignanovelalgorithm,thesparsesigmoidregression(SSR),tosolveit.TuningthepenaltyparameterfortheSSRalgorithmisnon-trivialduetothenonlinearandnonconvexnatureofthemodel.Tobypassthetuningissue,wecanapplytheprogressivesigmoidalnetworkscreening(PSNS)algorithminstead.ThePSNSalgorithmsolvesa`0constrainedMLproblem,whichisacounterpartofthepenalizedMLproblemsolvedbySSR.Itrenderstheusersthedirectcontrolonthecardinalityoftheestimate.Theusercansetadesiredcardinalityandthenmakefurtheranalysisusingtheresultingestimate. 2 andChapter 3 presentourworkontheinferenceofasparselinearmodel.Chapter 2 focuseson 19

PAGE 20

3 considersamoregeneralcovariancestructure.WeintroducetheJGSElearningframeworkthatconsistsoftwostages.Atstage1,theGISTalgorithmisproposedforrobustJAGscreeninganddecomposition.Atstage2,theFLOGalgorithmisproposedfornelearningofgraphs.Chapter 4 studiesasparseSRNmodelfornonlinearstochasticdynamicalnetwork.TheSSRalgorithmandPSNSalgorithmareproposedforconnectionselectionandestimation. 20

PAGE 21

Sims 1980 ).ItcanbegeneralizedtoaVARmodelwithorderm,wherethecurrentstateisalinearcombinationofthemostrecentmstates.Ontheotherhand,anymth-orderVARmodelcanbeconvertedtoarst-orderVARmodelbyappropriatelyredeningthenodevariables( Lutkepohl 2007 ),andthuswefocusontheformeronewithm=1.In( 2 ),thetransitionmatrixA=[aij]1i,jpdescribesanetworkthatrepresentsthedynamicalsystem:ifaij6=0,thereisaGrangercausalconnection( Granger 1969 )fromnodejtonodeiwithweightaij.Inotherwords,nodejGranger-causesnodei.Forexample,foranetworkwith6nodesandatransitionmatrixas 21

PAGE 22

BMultivariatetimeseries Exampleofanetwork( 2 )withtransitionmatrix( 2 ). Figure 2-1A showsitstopology(self-connectionsareremoved).ThenodesevolveandinteractwitheachotherthroughtheGrangercausalconnections,resultingintherandomprocessesplottedinFigure 2-1B .Therefore,matrixAnotonlyillustratesthedynamicrulesthatgoverntheevolutionofthesystem,butalsocapturesalinearcausalitynetworkthatdescribesthe(Granger)casualrelationsbetweennodes.

PAGE 23

2(xtAxt1)T1(xtAxt1)g.Assume=I.The(conditional)MLestimateofAcanbeobtainedbysolving 2nXt=2kxtAxt1k22.LettingY=[xT2,xT3,,xTn]T,X=[xT1,xT2,,xTn1]TandB=AT,wecanformulatetheprobleminmatrixform: 2kYXBk2F.Forconvenience,weuseBinsteadofAtorepresentthenetwork.Notethatbijdescribesthedirectedconnectionstrengthfromnodeitonodej.Theestimate^BMLhasbeeninvestigatedandappliedinmanyreal-worldproblems.Forstationaryprocess,theconsistencyandasymptoticefciencyof^BMLareanalyzedby Reinsel ( 2003 ).Thesmall-samplepropertiesarediscussedby Lutkepohl ( 2007 ).Nevertheless,theplainMLestimationisnotidealfornetworklearning.Inpractice,Xusuallydemonstrateshighcollinearity,especiallywhensomenodeshavesimilardynamicalbehaviorsandwhenthenumberofobservationsislimited.Moreover,theMLestimationdoesnotpromotesparsityandconsequentlytheresultingmodelisdifculttointerpret.Toimprovethepredictionaccuracyandobtainaninterpretablemodel,Regularizationisnecessary.Itcanbedonebyaddingapenaltyand/orconstraint.Forexample,wecanestimateBviapenalizedmaximumlikelihood(PML): 23

PAGE 24

Tibshirani 1996 )solvesthe`1penalizedproblem.Itisfastincomputation.Nevertheless,Lassosuffersfromsomedrawbackssuchasselectioninconsistency,estimationbiasandincapabilityofdealingwithcollinearity,inparticular. ZouandHastie ( 2005 )proposetheelasticnet(eNetforshort)whichaddsanadditionalridgeregularization( Tikhonov 1977 )todealwithcollinearityandlargenoise.However,thedesigncounteractssparsitytosomeextendandmaybringthedoubleshrinkageissue.Somenonconvexalternatives,includingthe``0+`1'SCAD( FanandLi 2001 )andthe``0+`2'hard-ridge( She 2009 ),areadvocatedtopromotemoresparsity.However,duetononconvexity,theconvergentsolutionmaybeonlylocallyoptimalanddependonthechoicesoftheinitialpoint.Theyarealsomorecomputationallyexpensivethanconvexapproaches.Therefore,wedonotconsidernonconvexregularizationsfornetworklearning. 2-2 .Forastationaryprocess,itsprobabilitydistributionisinvariantwithrespecttotheshiftintime.Inthenonstationaryprocess,however,wecanclearlyseedriftingandtrendingbehaviors.Inpractice,givenrawobservationssampledfromadynamicalsystem,researchersrststationarizethetimeseriesandtheninputthemtotheML/PMLestimator.Theresultingestimate 24

PAGE 25

Exampleofstationaryandnonstationaryprocesses.Thenumberofnodesisp=50.Thestationaryprocesshas(B)=0.95andthenonstationaryprocesshas(B)=1.05. 2-3 showsareal-data Figure2-3. ForecastsofGDP263givenbyMLandPMLestimation.ThetimeseriesofGDP263isobtainedfromseasonalobservationsbetween1978:IVand1998:IV.Itisastationaryprocess.However,theforecastsgivenbyMLandPMLexhibitnonstationarybehaviors.(^BML)=1.171,(^BPML)=1.073. 25

PAGE 26

StockandWatson 2012 ),whicharestationaryafterpropertransformations.TheestimatesarethenusedtoforecastanindexGDP263.AsshowninFigure 2-3 ,thoughthetimeseriesofGDP263isstationary,theMLandPMLforecastsclearlyexhibitnonstationarybehaviorsandtheyfailtocaptureallcharacteristicsoftheoriginaltimeseries.Inthissection,weproposetheframeworkofstationary-sparse(S2)networklearningtoaddressthelimitationofPMLtoguaranteethestationaritypropertyofthenetwork.Weinvokethestationarityconditionanddesignanefcientalgorithmtosolvetheoptimizationproblem.Arandomprocessasin( 2 )isstationaryifandonlyifitsspectralradius(B)satisesthestationaritycondition: Reinsel 2003 ).Thisleadstothefollowingoptimizationproblem: 2kYXBk2F+P(B;)s.t.(B)<1.(2)Nevertheless,problem( 2 )isextremelychallengingduetothefactthat(B)isanonconvexandnon-Lipschitz-continuousfunctionofB.Anoptimizationmethodproposedby CurtisandOverton ( 2012 ),whichcombinessequentialquadraticprogrammingandgradientsampling,shedssomelightonsloving( 2 ).However,ateachiteration,thegradientsamplingneedstosamplep2pointsandcalculatethegradientofthespectralradiusateachpoint.Asdiscussedby OvertonandWomersley ( 1988 ),calculatingthegradientofspectralradiusforasinglepointisalreadyachallengingandcomputationallydemandingproblem.Itisprohibitivetodosoforp2

PAGE 27

2 )asthestationarityconstraint: GoldbergandZwas 1974 ).Inallourapplications,thetransitionmatrixisnotradial,andthus(B)<1.TheS2learningproblemisgivenby 2.1 ,coherenceisoftenobservedinreal-worldnetworkdata,especiallywhensomenodeshavesimilardynamicalbehaviorsorstronginuencebetweeneachother.Insuchcases,theconventionalLassofailstohandlecollinearityandconsequentlygivesunsatisfactoryidenticationandforecastingperformance.Hence,amoreproperpenaltyisinneedforS2networklearning.WeadoptanewhybridpenaltyBerhu,whichhasacloserelationwithHuber'slossfunctionforrobustregression( Huber 1981 ).TheHuberfunctionisquadraticatsmallvaluesandlinearatlargeones,whichmakesitmorerobusttooutliersthanthesquared-errorcriterion.InspiredbytheHuberfunction, Owen ( 2007 )designedaconvexpenaltyfunctionBerhu 27

PAGE 28

Figure2-4. Penaltyfunctionsandcorrespondingsolutions. Figure 2-4 comparesBerhuwiththeridgepenaltyPR(t;)=1 2t2,LassoPL(t;)=jtjandeNetPE(t;,)=jtj+1 2t2.Theupperpanelplotsthefunctions,whilethelowerpanelshowstheircorrespondingsolutionsintheunivariateandorthogonalcase(seeSection 2.3.2 fordetails).Theridgepenaltyshrinksthecoefcientstocompensateforcollinearity.Butitcannotproduceexactzerocoefcientsintheestimate.TheLassosoft-thresholdsthecoefcientstoencouragesparsity.However,itdoesnotshrinkthelargecoefcientseffectivelyandconsequentlydoesnotworkwellforcorrelateddata.TheeNetincorporatestheridgecomponentintothe`1.However,thesingularityofthepenaltyfunctionatzeroissmoothedouttosomeextentbythe`2part,whichmayleadtoanestimatenotparsimoniousenough.Also,ittendstoover-shrinkmediumandlargecoefcients.Berhuovercomesthesedrawbacksbyusinganonlinearfusionofthe`1and`2penalties:forsmallcoefcients,`1regularizationisenforcedtoachievesparsity;forlargecoefcients,`2regularizationisenforcedtocompensatecollinearity. 28

PAGE 29

ZouandHastie 2005 ).Ontheotherhand,Berhushiftsonlymediumcoefcientsandshrinksonlylargeones.Itdoesselectionanddecorrelationseparately,whichbetterservestwoimportantobjectivesofnetworklearning:accuratepredictionandparsimoniousrepresentation.SubstitutingPB(B;,M)into( 2 ),wefocusonsolving 2kYXBk2F+PB(B;,M)s.t.kBk21.(2)WenotethatPB(B;,M)isnon-differentiableatzeroandpiecewise.Thestationarityconstraintaddsmoredifcultiestotheproblem.Moreover,inpracticewearefrequentlyconfrontedwithlarge-scalenetworks.Hence,anefcientandscalablealgorithmisdesiredforS2networklearning. 2 ).Somealgorithmsbasedonconventionaltechniqueswillbedevelopedrst.Theysufferfromhighcomputationalcomplexity,poornumericalaccuracy,and/orinsufcientsparsity.WethenproposethenovelBISPSwhichiseasytoimplementandcomputationallyefcient.Convergenceproofisprovided.WeassumethedatamatricesX,Yhasbeencenteredbeforeallthecomputation.PrecalculationsXX=XTXandXY=XTYhelpavoidrepeatedcomputation. 29

PAGE 30

2 )canbereformulatedandthensolvedbywell-knownoptimizationtechniques,suchassemideniteprogramming,projectedsubgradientmethod,andalternatingdirectionmethodofmultipliers.WebrieydiscussthesealgorithmsbeforeintroducingBISPS. 2 )canbereformulatedasasemideniteprogramming(SDP)problem Sturm 1998 )andSDPT3( Tutuncuetal. 2003 )usingMATLAB7.11.0onaPCwith4GBmemory;whenthenumberofnetworknodesislargerthan100,bothsolversranoutofmemory. Alberetal. 1998 )ofPB(),andrl(B)isthegradientofl(B):rl(B)=XXBXY.Theprojectedsubgradientmethod(PSGM)forproblem( 2 )computesasequenceoffeasiblepointsfBkgwiththeupdaterule 3 30

PAGE 31

Boydetal. 2011 ).ToapplyADMMforsolvingproblem( 2 ),wereformulateitas 2kYXBk2F+PB(A;,M)s.t.264BB375=264AC375,andkCk21.TheaugmentedLagrangiancanthenbewrittenasL1,2(B,A,C,1,2)=1 2kYXBk2F+PB(A;,M)+trfT1(BA)g+trfT2(BC)g+1 A fordetail.NotethatmatrixinversionisinvolvedinupdatingB,whichaddscomputationaldifculty.Thepenaltyparameters1and2havetobelargeenough;thechoicesofthemhavebeenshown 31

PAGE 32

He 2000 ).However,thealgorithmisstillslowwhentheproblemishighdimensional.Forexample,whenp=300,n=100,ADMMcostsupto10timesmorecomputationtimethanBISPS(tobedescribedinSection 2.3.2 )toreachcomparableaccuracy.Also,theconvergencepropertyofADMMwithvaryingisnotclear.Insummary,althoughwehaveimplementedsomealgorithmsbasedonconventionaltechniquesfortheS2networklearningproblem,amoreefcientandscalablealgorithmisingreatneed. 2.2.2 advocatestheBerhupenaltyforS2networklearning.However,intheoriginalpaper( Owen 2007 ),Berhuwassolvedthroughcvx( GrantandBoyd 2008 ).Practicalapplicationscallforthedevelopmentofmuchfasteralgorithms.WereparameterizeBerhuanddevelopitscoupledthresholdingrule,whichallowsustosolvetheBerhusparsitypursuit(problem( 2 )withBerhupenalty)inasimpleandefcientway.Thisformulationfacilitateseasyparametertuning.Also,ithelpsusunderstandtheessenceofBerhu.Let==M.ReformulatetheBerhupenalty( 2 )as 32

PAGE 33

1 ,B(;,)isthecoupledthresholdingrulefortheBerhupenaltyPB(;,):PB(t;,)=Zjtj0(supfs:B(s;,)ugu)du.Forthemultivariatecase,theBerhuthresholdingoperatorisappliedelementwise.WithB(;,),wecansolvetheBerhusparsitypursuit(withoutthestationarityconstraint)usingasimpleiterativeprocedure: She 2009 ).Itisworthpointingoutthat,basedontheconstructionrule( A ),wecanalsodenethethresholdingoperatorsforotherpenaltiesincludingLasso,eNetandtheridgepenalty,asshowninFigure 2-4 .TheBerhuthresholdingoperatorB(t;,)offersanonlinearfusionofthesoftthresholdingoperator(coupledwithLasso)S(t;)=sgn(t)(jtj)1jtjandtheridgethresholdingoperatorR(t;)=t 1+sgn(t)(jtj)1jtj,seethediscussioninSection 2.2.2 2 ),wenowproposeBISPSasgiveninAlgorithm 1 .Thisalgorithmcontainsonlysimplematrixoperationsinadditiontothespectralnormprojection.Parameterk0canbesettoanyconstantthatislargerthanthespectralnormofX.Noadhocalgorithmicparameters,suchas1,2inADMMandkinPSGM,areinvolved.TheinneriterationofStep2oftenconvergeswithin10stepsinpractice,wherematricesC,P,Qareauxiliaryvariablesthatcontributetofastconvergenceoftheprocedure.Step2.1istoenforcesparsitybyBerhuthresholdingandStep2.3istoprojecttheestimatetotheconvexsetfA:kAk21g.Theouter 33

PAGE 34

iterationhasonlyasimpleupdatestepandconvergesfast.Asaresult,thealgorithmiscomputationallyefcientaswellaseasytoimplement.TheconvergenceofBISPSistheoreticallyguaranteed.Forsimplicity,weassumetheinneriterationisrununtilconvergence.ThenTheorem 2.1 statesthatAlgorithm 1 solvestheS2learningproblem. 1 convergestoagloballyoptimalsolutiontoproblem( 2 ).SeeAppendix A forthedetailedproof.BISPShasmoreexibilityandgenerality.ThoughitisdesignedwiththeBerhupenalty,byreplacingBwithanappropriatethresholdingoperatorinStep2.1,thealgorithmallowsforanyconvexpenaltyforS2learning.Moreover,ifStep2.2to2.4areremoved,Algorithm 1 reducestotheBerhusparsitypursuit( 2 ). 34

PAGE 35

1 liesinthespectralnormprojection(Step2.3).Inpractice,wecanapplysometechniquestofurtherimprovecomputationalefciency.1)WecanrstrunAlgorithm 1 withoutStep2.2to2.4andobtainanestimate.Ifitsatisesthestationaritycondition( 2 ),weacceptandoutputthissolution.Otherwise,wererunAlgorithm 1 withStep2.2to2.4included.2)Moreover,wecantakeadvantageofthefactthatBk+1issparseandthenumberofBk+1'ssingularvaluesthatarelargerthan1ismuchsmallerthanp.Thesingularvaluethresholdingalgorithm( Caietal. 2010 ),amongsomeotherfastalgorithms,calculatesonlythesingularvaluesthatareaboveathresholdandtheircorrespondingsingularvectors,whichiscomputationallyefcientforlargesparsematrixandthustsourproblemwell.WeusethepackageMODIFIED-PROPACKprovidedby Lin ( 2011 )tocalculatethepartialSVDwiththresholdbeing1.Forp=500,n=100,thepartialSVD,comparedwiththeoriginalSVD,canacceleratethecalculationbyupto30times. Fanetal. 2009 ; FanandLv 2010 ).Forregressionproblems,undertheassumptionthatthenumberofnonzerocoefcientsisfarsmallerthann,screeningtechniquescanbeusedtocoarselyselectthevariablesbeforenerestimation.Thisideacanbeadoptedinnetworkidentication:ifoneissurethattheaveragenumberofconnectionsforeachnodeismuchlessthandne(say=0.8)orthetotalnumberofconnectionsinthenetworkismuchlessthandpne,onecanrstusefastscreeningtechniquestoselectm=dpnecandidateconnections,andthenapplyBISPSrestrictedonthecandidateconnectionsforfurtherselectionandestimation.Ifthescreeningtechniquecanincludeallthetrueconnections 35

PAGE 36

FanandLv ( 2008 )canbeappliedtopreselectvariablesinasupervisedmanner.Appliedtonetworklearning,itsortstheelementsofW=XTYbymagnitudeinadecreasingorderanddenesareducedmodel 2 )withB0=0andhardthresholdingH(t;)=t1jtjwithaproperlychosen.Thisinspiresustoapplyaniterativeprocedureforscreening:startingfromB0=0,repeat1)Bk+1Bk1 36

PAGE 37

2.2 showsthatQTISsolvesan`0constrainedproblem. 2kYXBk2F,andBksatiseskBkk0m,wherekk0denotesthenumberofnonzeroelements.SeeAppendix B forthedetailedproof.Inpractice,Musuallystopschangingafterlessthanahundrediterations.Thenumberofunknownsisreducedfromp2todpneeffectivelybyasmallamountofcomputation.Then,moreinvolvedandsophisticatedestimation,e.g.,BISPS,canbeperformedtothereducedmodel.ItismuchfasterthanapplyingBISPSdirectlyifp2n.Inaddition,QTISprovidesBISPSwithasparsepattern,whichfacilitatesthefastcomputationofpartialSVD.ToapplyBISPSonM,weuseelement-wisepenaltyparametersij'sandset 2.3 isaneffectivetechniquetoidentifystationaryandsparsenetwork.Nevertheless,aone-timeestimate,withoutanyp-valueorcondenceinterval,providesonlylimitedguidanceinidentifyingthetruenetworktopology.Infact,whateverinferencemethodisused,therewillbeuncertaintyunderlyingthevariableselectionprocedure.Itwouldbegreatlyhelpfulifonecouldprovidesomekindofuncertaintymeasureforsuchanestimate.Inourcase,wewouldliketondacertaincondencemeasurefortheestimatedtopology.Thiscanbedonebyassigningaprobabilityfortheexistenceofeachconnection.Hence,weusebootstrap 37

PAGE 38

Efron 1979 ).Inthissection,weproposethestationarybootstrapenhancedBISPS(SB-BISPS)whichprovidesacondencelevelaboutwhetheraconnectionexistsinthenetworkbymeasuringthefrequencywithwhichitischosenbytheBISPSalgorithm. 38

PAGE 39

Kunsch 1989 ).Thebasicideaisthat,despitethedependenceofindividualobservations,blocksofobservationscanbeapproximatelyindependentwitheachothergivenaproperblocksizel.Whenatimeseriesisstationary,itisnaturaltomaintainthispropertyinthebootstrapsamples.Thestationarybootstrap( PolitisandRomano 1994 )isamethodwiththisproperty.Itisbasedonresamplingblocksofrandomlengths,wherethelengthofeachblockfollowsageometricdistributionwithmean1=.Weapplyasimpleapproachtoconductsuchresampling.GiventhatxiischosentobetheJthobservationxJintheoriginaltimeseries,wechoosexi+1basedonthefollowingrule: 2 )separatestherolesof`1and`2regularizations;eachofthemisassociatedwitharegularizationparameter,namelyfor`1andfor`2.Thisprovidesimportantguidelinesforparametertuning.Basedonourexperience,theestimateisnotverysensitiveto,soafulltwo-dimensionalgridsearchisnotnecessary. 39

PAGE 40

Akaike 1974 ). She 2012a ).Thisresultsinthreeo's,onefromeachpath.Choosetheoptimalonefromthemandletitbetheoptimalthresholdingparameter.Thepair(,)isournalchoiceofthetwoparameters.TheSCVcross-validatesdifferentsparsitypatternsinsteadoftheregularizationparameters.Itismorecomputationallyefcientthantheplaincrossvalidationsinceitrunsthesparsealgorithmonlyonceandglobally(insteadofKtimeslocally).TocalculatetheSCVerrorassociatedwith,werstapplyBISPStothewholedatasetandobtainthesolution^B().Then,fork=1,,K,weapplyridgeregressionrestrictedtothevariablesthatarepickedbynz(^B())onthedatawithoutthekthdatapiece,andevaluateitsvalidationerrorusingthekthdatapiece.ThesumoftheKvalidationerrorsisdenedastheSCVerror.See( She 2012a )formoredetails. 2 ),thenthestationarityviolationpercentageisdenedasPv=Nv=N. 40

PAGE 41

StockandWatson 2012 ).SupposewehaveTobservations:x1,,xT.LettherollingwindowsizebeWandthehorizonbeh.Standingattimet,weusethemostrecentWobservationstoestimateB,denotingtheestimateas^Bt.Thenweusethisestimatetoforecastxt+h,denotingtheforecastas^xt+handtheforecastingerroraseht=kxt+h^xt+hk22.Thisprocessisrepeatedfort=W,,W+N1asweshiftthewindow.Nisthenumberofwindowshiftingthatsatises1NTWh1.ThentherollingMSEforhorizonhisdenedasMSEhrolling=1 2 ),thisshouldbedoneas:^xt+h=^BTt^xt+h1forh1,where^xt=xtwhenh=1.WegeneratethepptransitionmatrixBwithbothsparsityandstationarityproperties.First,thetopologyisgeneratedfromadirectedrandomgraphG(p,),wheretheedgefromonenodetoanothernodeoccursindependentlywithprobability.ThenthestrengthoftheedgesisgeneratedindependentlyfromaGaussiandistribution.ThisprocessisrepeateduntilweobtainamatrixBthathasadesiredspectralradius0.9<(B)<1.Weset=10=p,=2I,2=10.TheregularizationparametersarechosenbySCVasdescribedinSection 2.6.1 .Fora-path,weuseagridof100valuesfor,whichispickedfromtheinterval[0,kB0+XTYXTXB0kmax].TheinitialestimateissimplysetasB0=0.Foran-path,weuseagridof76valuesfor,whichispickedfromtheinterval[210,25].Thenumberoffolds 41

PAGE 42

2-1 showstheexperimentresultsfordifferentnetworksizes,namelyp=100,200,300.Recallthatforanetworkwithsizep,thenumberofunknownparametersisp2.Forexample,forthenetworkwith300nodes,thenumberofparameterstobeestimatedis9104,whichisextremelyhighcomparedwiththenumberofobservations80.Amongthethreepenalties,theLassosolutiongiveshighermissrates.Thisisbecausewhensomepredictorsarecorrelated,Lassotendstochooseonlyapart,orevennone,ofthem.Asaresult,Lassosometimesover-shrinkstheestimate.TheeNetandBerhu,insuchcases,tendtoincludeallthecorrelatedpredictors,thankstothe`2partinthepenalties.However,thesparsityoftheeNetsolutionisaffectedbythe`2regularization,soitgiveshighfalsealarmrates.Ontheotherhand,BerhuhasimprovedeNettosomeextendbyenforcingthe`2regularizationonlytolargecoefcients.Asaresult,Berhuachievesthesmallesttestingerrors(h=1)amongthethreepenalties.AsshownbyPv,nomatterwhatpenaltyisused,itispossibleforPMLtogiveanonstationaryestimate,whereastheproposedS2learningandBISPScanguaranteethestationaritypropertyof^B.Thisindicatesthataddingthestationarityconstraintintothesparsitypursuitdoeseffectivelypreventtheestimatefrombecomingnonstationary.Meanwhile,theS2estimatecanachieveacomparableestimationanddetectionaccuracywiththePMLestimate.Table 2-1 givestherollingMSEsfordifferenthorizonshtoillustrateboththeshorttermandlongtermforecastingperformance.ForPMLestimates,therollingMSEsgrowexplosivelywithhduetotheexistenceofnonstationaryestimates,whilethoseofBISPSaccumulatemuchmoreslowly. 42

PAGE 43

Lasso5%(13.6,22.3)23.41058.37798.9 eNet5%(12.3,25.9)23.51238.110842.8 ,24.9 )23.0 1329.412219.6 ,24.8 )23.1 195.9209.2 Lasso3%(15.8,29.3)41.8332.810998.7 eNet2%(14.5,26.9)36.5207.8405.3 ,24.6 )32.2 201.4391.5 ,24.4 )32.1 131.9200.5 Lasso2%(18.1,14.9)19.961.6111.2 eNet3%(13.5,26.1)20.465.5123.2 ,22.9 )18.8 66.5121.3 ,22.1 )19.3 63.365.5 PerformancecomparisonofBISPSwithLasso,eNetandBerhu.n=80. Tofurtherillustratethedisadvantagesofanonstationaryestimate,wendonerunwhereLassogivesanonstationaryestimate^BLasso.Startingfromatimepointt,wegenerate^xt+h(^B)forh=1,,100using^BLassoand^BBISPSrespectivelyandcomparethemwithxt+hobservedfromthetruemodel.TheresultsareplottedinFigure 2-5 .Wecaneasilyseethat^x(^BBISPS)givesareasonableimitationofthetruesystem.Thenonstationaryestimate^x(^BLasso),however,blowsupquicklyandbehavescompletelydifferentlyfromthetruemodel.Thistellsusthatensuringastationaryestimateisindeedcrucial. FanandLv 2008 )byexaminingtheirabilitytoincludeallthetrueconnections,whichcanbemeasuredbythemissrate.Simulationisdonefornetworkswithdifferentsizes,namelyp=300,400,500.Thesamplesizen=80.Figure 2-6 showstheperformancesofQTISandSISwithdifferentvaluesforthequantileparameter.WeseethatSISistoogreedyforcorrelateddata.ItispossibleforSIStomissevenmorethanhalfofthetrueconnections.Ontheotherhand,QTISgivesmuchsmallermissratesthanSIS,anditsperformanceismorerobusttothechoiceof.Wethen 43

PAGE 44

Comparisonofforecastingperformances.Top:samplefromthetruemodel;Middle:forecastfromthenonstationaryestimate^BLasso;Bottom:forecastfromthestationaryestimate^BBISPS. Figure2-6. ComparingmissrateofQTISandSIS. runBISPSwithandwithoutQTISandcheckthedifferenceoftheirperformances.WedenoteQTIS+BISPSastheprocedurethatrstappliesQTIStoscreentheconnectionsandthenappliesBISPStothereducedmodel.ThequantileparameterforQTISissetas=0.8.Figure 2-7 comparestheperformancesofQTIS+BISPS 44

PAGE 45

Bmissrate Ctestingerror PerformanceofQTIS+BISPS,comparedwithapplyingBISPStoafullmodel. andBISPS.Whenp=nratioislarge,addingQTISnotonlyimprovestheestimationandidenticationaccuracy,butalsosavesupto80%ofthecomputationtime.Asthep=nratiobecomeslarger,theimprovementbecomesmoreremarkable. 45

PAGE 46

LassoBISPS LassoBISPS 0.5890.445 7.Prices 1.9711.874 2.IP 0.8460.576 8.Wages 0.5520.207 3.Employment 0.9360.711 9.Interestrate 1.4430.738 4.Unempl.rate 0.2890.165 10.Money 0.1140.065 5.Housing 0.0710.033 11.Exchangerates 0.3700.107 6.Inventories 0.5060.217 12.Stockprices 0.2540.100 Table2-2. NormalizedRollingMSEofLassoandBISPSforeachcategory. h 12481632 Lasso 0.0170.0210.0290.365329.93.1108BISPS RollingMSEofLassoandBISPSfordifferenthorizons. to2008:IV,whichbelongto12categories1.Ithasbeenpreprocessedsothateachtimeseriesisastationaryprocess.WeusetheS2networkmodel( 2 )todetectGrangercausalrelationsbetweenthesemacroeconomicindices. 2-2 showstherollingMSEsoftheLassoandBISPS,normalizedbythatoftheAR(4)model,whichisaconventionalbenchmarkofmacroeconomicforecasting.ComparedwiththeAR(4)model,bothLassoandBISPS,basedonanetwork(multivariate)model,haveattainedmuchsmallerforecastingerrors,exceptforCategory7(prices).Therefore,byintroducingtheGrangercausalinteractionsbetweendifferentindices,wecanbuildamultivariatenetworkmodelthatismoreaccuratethanthe 46

PAGE 47

2-3 .Asthehorizonincreases,therollingMSEofLassogrowsexponentially,whichclearlyindicatesthatsomeestimatesoftheLassoarenonstationaryandthusfailtoforecastforlargehorizons.Ontheotherhand,therollingMSEofBISPSstaysstablefordifferenthorizons.ThisphenomenonissimilartowhatisshowninFigure 2-5 .TheyhaveillustratedthefundamentaldifferenceoftheS2learningfromtheplainPMLestimation. DavisandKahn 2008 )andanalyzethechangesintheirGrangercausalconnections.AstheeconomicstructureofU.S.hasgonethroughhugechangesintheGreatModerationinmid-1980,weexpecttoseesignicantlydifferentcausalitynetworksbeforeandaftermid-1980.Hence,wedividethetimeseriesintotwoperiods,thepre-GreatModerationperiodandthepost-GreatModerationperiod,andapplySB-BISPSseparatelytothetwoperiods.Forthepre-GreatModerationperiod,weusethedatafrom1960:Ito1979:IVastrainingset(80observations);forthepost-GreatModerationperiod,weusethedatafrom1985:Ito2004:IV(80observations).ThestationarybootstrapsamplesareobtainedusingtheRfunctiontsboot( Dalgaard 2008 )withdefaultparametervalues. 47

PAGE 48

BPost-GreatModeration COFsofthepre-GreatModerationperiodandthepost-GreatModerationperiod. ThenumberofstationarybootstrapsamplesissettobeB=100.Figure 2-8 showstheCOF(connectionoccurringfrequency)matricesgivenbySB-BISPSforthepre-GreatModerationandthepost-GreatModerationperiods.WenoticethattheCOFmatricesofthepre-GreatModerationperiodhasahigherenergylevelthanthatofthepost-GreatModerationperiod.Thisindicatesthatthenodesaremoreactivelyinteractingwitheachotherinthepre-GreatModerationperiodthaninthepost-GreatModerationperiod,whicheffectivelyreectsthereductioninvolatilityofthebusinesscycleuctuationssincetheGreatModeration.Toillustratetheideamoreclearly,wesetthecutoffvalueforCOFtobef=80%andidentifythemostsignicantconnections.Thetopologiesobtainedforthepre-GreatModerationperiodandthepost-GreatModerationperiodareshowninFigure 2-9 .Isolatednodesareremoved.Inthepre-GreatModerationperiod,themacrovariablesactivelyinteractandformacomplexdynamicalnetwork.Therearethreeprominentvariables,namelyGDP281(durablegoodsindex),GDP256andGDP261(grossprivatedomesticinvestmentindices),whichactlikehubvariables.Theyinteractnotonlywithmanynon-hubvariablesbutalsowitheachother.Therefore,therearenoindependentclusters.AftertheGreatModeration,ontheotherhand,theinteractions 48

PAGE 49

BPost-GreatModeration Topologiesofthemacroeconomicnetworkinthepre-GreatModerationperiodandthepost-GreatModerationperiod.f=80%.Self-loopsarenotshown. havebeenremarkablyreducedandmostofthevariablesseemonlyself-regulated.Thismakesiteasierforthenetworktostaystable.Therearetwohubvariables,GDP255(realpersonalconsumptionexpenditure)andGDP275-3(energygoodspriceindex).TheincreasingimportanceofthesetwovariablesagreeswiththeobservationthatenvironmentalregulationsandenergypolicieshavebeguntoinuencetheeconomicgrowthsincetheGreatModerationperiod( JorgensonandWilcoxen 1990 ; HalkosandTzeremes 2011 ). 49

PAGE 50

Banerjeeetal. 2008 ; Friedmanetal. 2008 ; BickelandLevina 2008 ; MeinshausenandBuhlmann 2006 ).Theconcentrationmatrix=1describestheconditionaldependencerelationshipsbetweentwonodesgivenalltheothernodes,whichcanbetranslatedtoanundirectedconditionaldependencegraph(CDG).Similarly,theGaussiangraphlearningenforcessparsityontoobtainaninterpretablegraph.Nevertheless,itisnotdirectlyapplicabletothedynamicalmodel.Ashasbeendiscussed,thetaskofestimatingAisaschallengingasthatofestimating.Inparticular,substitutingthesamplemeanforthetruemeanisinappropriatewhenAisalargematrix,whichiswellknownanddatesbacktotheStein'sphenomenon( Stein 1956 ).Toobtainacomprehensivepictureofthedynamicalnetwork,itisnecessarytoestimatebothAandbasedontheirjointlikelihoodbymeansofregularizationapproaches.TherearefewerstudiesintheliteraturethatconsiderthejointsparseestimationofAand( Rothmanetal. 2010 ; LeeandLiu 2012 ).Unfortunately,inourexperience,allexistingmethodsareslowandabsolutelynotapplicableforlarge-scalenetworks.Forexample,theMRCEalgorithmproposedby Rothmanetal. ( 2010 )is 50

PAGE 51

3.4 fordetails.Evenforrelativelysmallnetworks,thenumberofunknownvariablesinthetwomatrices,i.e.,p2+p(p+1)=2,canbelarge,therebymakingitdifculttoreliablyidentifythesparsenetworktopologyandaccuratelyestimatethesystemparameters.Asareal-dataexample,weuseasubsetoftheS&P500stockshares,whichconsistsof37nodesfromtheEnergycategory,toillustrateourmotivations.First,Figure 3-1 showsthegraphsobtainedbysoleGTGlearning(sGTGforshort)whichignoresthesecond-ordernodecorrelations,andsoleCDGlearning(sCDGforshort)whichignorestherst-ordertransitionestimation.Commonisolatednodeshavebeenremoved.Perhapsinterestingly,weobservethattheGTGandtheCDGsharesomesimilarstructures,suchasthesamehubnodeEOGandmanycommonedges,say,APACAM,ABCMBO,etc.Thisisinfacttrueinmanylarge-scalenetworks.See,e.g., RheinandStrimmer ( 2007 )forsimilarconclusionsingenenetworkstudies.Asaresult,thejointregularizationof(A,)mayhelpdetectthejointstructureofthetwographs,andthusimproveidenticationandestimationaccuracy.Evenwhenthesimilaritiesbetweenthetwographsarenotsoobvious,jointregularizationcanimprovetheoverallestimationaccuracyinsuchhighdimensions( Stein 1956 ).AnotherinterestingobservationfromFigure 3-1 isthatthenetworkcanbepossiblydecomposedintosmallersubnetworksandthereexistisolatedindices.Thishasalsobeennoticedby Finketal. ( 2009 )and StockandWatson ( 2012 )forthebrainconnectivitynetworkandtheU.S.macroeconomicnetworkrespectively.Ifsuchanetworkdecompositioncouldbedetectedinanearlystage,complexlearningalgorithms,suchasMRCEandGaussiangraphlearning,wouldapplyinamuchmoreefcientway(eveninaparallelmanner)toreducecomputationalcomplexityandboostfeasibility.Ofcourse,thedecompositionbasedoneithersGTGnorsCDGisnottrustworthy.OnlythegraphscreeninganddecompositionbasedonAandjointlyisreasonabletoreducetheproblemsizeinsuchstatisticaltasks. 51

PAGE 52

BsCDG ThesGTGandsCDGfortheS&P500indicesfromtheEnergycategory.Commonisolatednodeshavebeenremoved. Inthiswork,weproposetojointlyregularizethedirectedtransitiongraphandtheundirecteddependencegraphinnetworktopologyidenticationanddynamicsestimation.Wewillintroducethenotionofjointassociationgraph(JAG)andproposetheJAGscreeninganddecompositiontofacilitateGTGandCDGlearnings.TheJAGscreeningidentiesandremovesunnecessaryedges.Withthesearchspaceoftheoriginallarge-scaleproblemreduced,computationalandstatisticalperformancecanbeenhanced.Moreusefully,wecandecomposealarge-scalenetworkbasedontheJAGscreeningstructure.Ifsuchdecompositionsarepossible(whichistypicallytrueinreal-worldapplications),GTGandCDGcanbelearnedforeachsubnetworkindividually.Fast-computingpackagesforsparseoperationsandparallelcomputingtechniquesarewidelyavailabletofurtherspeedupthelearningprocess.SimilarideashavebeenprovedtobesuccessfulinGaussiangraphlearning( Wittenetal. 2011 ; MazumderandHastie 2012 ),referredtoastheblockdiagonalscreeningrule.Yet,ascanbeenseeninFigure 3-1 ,theCDGcapturesonlythesecond-orderstatisticalstructureofthedynamicalnetworkandthusitsdecompositionisnottrustworthyinourproblem.OurproposedJAGwillintegratethetransitiongraphandtheconditionaldependencegraph, 52

PAGE 53

Zhaoetal. 2012 ).Webelievethattheregularizationparametershouldbechosenbasedonthebeliefofhowsparsethenetworkis,notmerelybasedoncomputationalreasons.WewillproposearobustJAGdecompositionthatfollowsthisprinciple.Tothebestofourknowledge,noworkofjointgraphscreeninganddecompositionisavailableintheliterature.Theremainderofthischapterisorganizedasfollows.Section 3.1 describesthejointgraphicalmodelandproposesalearningframeworkcalledjointgraphicalscreeningandestimation(JGSE).Section 3.2 describesthegraphicaliterativescreen-ingviathresholding(GIST)algorithmforJAGscreeningandrobustdecomposition.Section 3.3 describesajointlearningalgorithmthatestimatesAandafterscreening.InSection 3.4 ,synthetic-dataexperimentsareconductedtoshowtheperformanceofJGSE.InSection 3.5 ,weapplyJGSEtoreal-worlddatasetsincludingtheS&P500andtheNASDAQ-100stockshares. 53

PAGE 54

Granger 1969 ).Theconcentrationmatrix=1canbetranslatedtoanundirectedconditionaldependencegraph(CDG):!ij=!ji=0indicatesnodeiandnodejareconditionallyindependentgivenothernodes( MeinshausenandBuhlmann 2006 ).Giventheobservationsx1,,xn+1,therst-orderstatisticAandthesecond-orderstatisticaretobeestimatedandtheirtopologicalstructurestobeinferred.WeareinterestedindynamicalnetworkswhereboththeGTGandtheCDGaresparse.First,manyreal-worldcomplexdynamicalnetworksindeedhavesparsestructures.Forexample,inaregulatorynetwork,ageneisonlyregulatedbyseveralothergenes( Fujitaetal. 2007 ).Second,whenthenumberofobservationsissmallcomparedwiththenumberofunknownvariables,thesparsityassumptionreducesthenumberofmodelparameterssothatthesystemisestimable.Third,fromaphilosophicalpointofview,asparsemodelisconsistentwiththeprincipleofOccam'srazor. 2(xt+1Axt)T(xt+1Axt)g.(3)Sothejoint(conditional)MLestimationofAandsolves 2nXt=1(xt+1Axt)T(xt+1Axt)n 3 )inmatrixform 2trf(YXB)(YXB)Tgn 54

PAGE 55

RheinandStrimmer ( 2007 )haveshownthat,whenanalyzingtheArabidopsisthalianagenedata,ifonenodehascausalinuenceonanothernode,thetwonodestendtobecorrelatedaswell.SimilarconclusionsarealsoseeninFigure 3-1 .Fact2)Large-scaledynamicalnetworksoftenconsistofsmaller-scalesubnetworkstructuresorclusters.Forexample,ahumanbrainconnectivitynetworkrevealedbytheEEGdatacanbedividedintoseveralfunctionalityregions( Finketal. 2009 ).Also,intheU.S.macroeconomicnetwork,theindicescanbedividedinto13categories( StockandWatson 2012 ).Itisdesirabletodecomposealarge-scalenetworkintosmallsubnetworks,ifpossible,forbothcomputationalandstatisticalconcerns.ThedesignofPC(B,;C)istocapturethejointstructureoftheGTGandtheCDG.Detectingthejointstructurehelpsimprovetheoverallidenticationandestimationaccuracy.Perhapsmoreusefullyinpractice,itcanhelpdecomposealarge-scalenetworkintosmallersubnetworks,aswillbeshowninSection 3.2 3 )showsboththerst-orderandthesecond-orderstatisticalrelationshipsbetweenthenodesasthenetworkevolves.Tocapturethejointstructureofthetwographs,weintroducethenotionofjointassociationgraph(JAG),anundirectedgraphwhereanytwonodesareconnectediftheyareconnectedineitherGTGorCDG.For 55

PAGE 56

BCDG CJAG Exampleofthejointgraphicalmodel. convenience,wedescribetheassociationstrengthbetweennodeiandnodejas 3-2 ,wheretheJAGinFigure 3-2C isobtainedfrom( 3 ).TheGTGandCDGaresimilarandsharemanycommonedges.Furthermore,theybothexhibitsubnetworkstructures.Forexample,inbothgraphs,node1throughnode4formacluster.Ontheotherhand,thetwographsalsodifferfromeachotherinsomesignicantways.Forexample,node9andnode10aredisconnectedinGTGbutnotinCDG.HavingintegratedtheconnectionsinGTGandCDG,JAGprovidesaneatpictureofalltheassociationsinthenetwork,giventhenatureoftheconnectionsisnotamajorconcern. 56

PAGE 57

3-2C ,node4andnode5aredisconnected,sowecansimplysetb45=b54=!45=!54=0beforewereallyestimateBand.Particularly,iftheJAG,afterpermutation,exhibitsablock-diagonalstructurelike 3-2 canbedecomposedintotwomutuallydisconnectedsubnetworksaccordingitsJAG.WecanthusinfertheGTGsandCDGsofthetwosubnetworksseparately.ExplicitlycomputingtheJAGbasedon( 3 )alsofacilitatesthecomputationandalgorithmdesign,aswillbeshowninSection 3.2.2 3 )describesageneralregularizationformforjointlyestimatingBand.However,directlyoptimizingthisobjectivefunctionisanextremelychallengingproblem.EvenwithoutPC,thejointestimationby,say,theMRCEisinefcientoreveninfeasibleforlarge-scaleproblems.ThemainobjectiveofthisworkistoutilizethejointstructureofBandtoimproveidenticationandestimation.Therefore,insteadofsolving( 3 )directly,weproposeaJointGraphicalScreeningandEstimation

PAGE 58

3 )servesforthemostgeneralcase,wherenoparticularpriorinformation,besidesFact1,isgiven.AtStage2,wenelyestimate(B,)basedonthepatternofJAG: 58

PAGE 59

2kYXBk22+PB(B;B).(3)Chapter 2 focusesonstudyingthismodel.AssumingthedatahasbeencenteredandB=0,thejointgraphicalmodeldegeneratestothesCDGmodelwhereasparsecanbesolvedbyGaussiangraphlearning Rothmanetal. 2010 ) LeeandLiu 2012 ),onlypracticallyfeasibleforsmallnetworks,say,withp<120.ThisisamajormotivationforustointroducetheJAGscreening.Inthefollowingtwosections,wepresentthealgorithmsforthetwo-stageJGSElearningframeworkindetail.Section 3.2 proposestheJAGscreeninganddecomposition.Section 3.3 describestheneestimationof(B,). 59


Problem (3) is very challenging to solve, since the objective function is nonconvex and nonsmooth with a large number of unknown variables. One possible way is to use an alternating optimization of B and Ω. However, it is too computationally expensive, similarly to the alternating optimization algorithms for joint (B, Ω) estimation, such as MRCE. Moreover, the algorithm design is quite cumbersome: one must consider different cases depending on whether the variables are zero or not. Our experiments show that such an algorithm is only feasible for network size p < 120. Much more efficient algorithms are in great need in large network learning, especially for screening purposes. Hence, we propose a novel GIST algorithm based on the group Θ-estimator (She 2009) and an asynchronous Armijo-type line search approach. A thresholding rule Θ(·; λ) is required to be an odd, shrinking and nondecreasing function. Examples include the soft-thresholding operator Θ_S(t; λ) = sgn(t)(|t| - λ)1_{|t| ≥ λ} and the hard-thresholding operator Θ_H(t; λ) = t 1_{|t| ≥ λ}. Given Θ(·; λ), its multivariate version Θ̃ is defined by applying Θ to the length of a vector while keeping its direction, Θ̃(t; λ) = Θ(‖t‖_2; λ) t/‖t‖_2 (She 2012b).
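
As an illustration of these rules, a small Python sketch of the scalar operators and the multivariate (groupwise) extension; this is the standard construction, not necessarily the exact code behind our experiments:

```python
import numpy as np

def soft_threshold(t, lam):
    # Theta_S(t; lam) = sgn(t) * (|t| - lam)_+
    return np.sign(t) * np.maximum(np.abs(t) - lam, 0.0)

def hard_threshold(t, lam):
    # Theta_H(t; lam) = t * 1_{|t| >= lam}
    return np.where(np.abs(t) >= lam, t, 0.0)

def group_threshold(t_vec, lam, rule=soft_threshold):
    """Multivariate version: apply the scalar rule to the group's l2 norm
    and keep the direction, Theta(||t||_2; lam) * t / ||t||_2."""
    norm = np.linalg.norm(t_vec)
    if norm == 0.0:
        return np.zeros_like(t_vec)
    return (rule(norm, lam) / norm) * np.asarray(t_vec)
```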


She (2012b) showed that the following iterative procedure converges to a solution of (3), provided that P_k and Θ_k are coupled through equation (3); for instance, the group hard-thresholding Θ̃_H(t; λ) corresponds to the ℓ0-type penalty, e.g., Σ_{k=1}^K P_k(‖t_k‖_2; λ_k) = Σ_{k=1}^K (λ_k^2/2) 1_{‖t_k‖_2 ≠ 0}. In (3), we divide the variables in B and Ω into K = p(p+1)/2 groups, where the variables at entry (i, j) and entry (j, i) (1 ≤ i ≤ j ≤ p) belong to the kth group with k = (i-1)(p - i/2) + j. Let Γ = [B, Ω] ∈ R^{p×2p}, where Γ_B = B and Γ_Ω = Ω. It is not difficult to show that the gradients of L(B, Ω) in B and Ω respectively are (details omitted)

G_B = -(1/n) X^T (Y - XB) Ω,    G_Ω = (1/(2n)) (Y - XB)^T (Y - XB) - (1/2) Ω^{-1}.
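
Both gradients reduce to a handful of matrix products; a sketch consistent with the reconstructed formulas above (sign conventions should be double-checked against one's own derivation):

```python
import numpy as np

def gradients(B, Omega, X, Y):
    """Gradients of L(B, Omega) = tr{(Y-XB) Omega (Y-XB)^T}/(2n) - log(det Omega)/2."""
    n = X.shape[0]
    R = Y - X @ B
    G_B = -(X.T @ R @ Omega) / n                                  # dL/dB
    G_Omega = (R.T @ R) / (2.0 * n) - 0.5 * np.linalg.inv(Omega)  # dL/dOmega
    return G_B, G_Omega
```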


The penalty corresponding to a given thresholding rule is constructed as in (A). Procedure (3) gives a solution to problem (3) with any penalty function constructed by (A). The problem now boils down to choosing a proper penalty form for JAG screening. Another issue that cannot be ignored is parameter tuning, which is a nontrivial task, especially for nonconvex and discrete penalties. Popular sparsity-promoting penalties include ℓ0, ℓ1 and SCAD, among others. Naturally, the group ℓ0 penalty is the ideal way to enforce sparsity at the group level. However, most tuning strategies, e.g., K-fold cross validation, are quite time consuming or even extremely ad hoc, especially for large network applications. Rather than using the group ℓ0 penalty, we propose a group ℓ0 constraint and solve the screening problem of Section 3.2.1 via a quantile version of (3).


The quantile thresholding can be carried out efficiently on the association matrix defined in (3), taking advantage of the specific grouping manner and the symmetry of C. It is not difficult to show that, for the ℓ0 constraint (3), the multivariate quantile thresholding on Γ is equivalent to elementwise hard thresholding on C with a dynamic threshold. That is, at each iteration, one should perform the elementwise hard thresholding on C^{l+1} with the threshold λ^{l+1} being the (2m+1)th largest element in C^{l+1}, and then set b_ij^{l+1} = ω_ij^{l+1} = 0 if c_ij^{l+1} = 0. This solves the corresponding ℓ0-constrained problem (She 2012b; She et al. 2012). Nevertheless, dynamical network learning requires jointly estimating both B and Ω, and there seems to be no simple formula for the step size α^l in (3). Moreover, the constraint cone Ω ≻ 0 increases the difficulty in deriving the step size. In the following, we propose a simple but effective asynchronous Armijo-type (denoted as AA) line search approach, which guarantees a convergent solution with Ω ≻ 0 satisfied. The basic idea, as in (3), is to select a step size along the descent direction that satisfies the Armijo rule (Armijo 1966).
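
The dynamic elementwise hard thresholding can be sketched as follows; `quantile_hard_threshold` is a hypothetical helper that keeps the m largest off-diagonal association strengths of a symmetric C, the symmetric-matrix analogue of the (2m+1)th-largest-element threshold described above:

```python
import numpy as np

def quantile_hard_threshold(C, m):
    """Keep the m largest off-diagonal entries (in magnitude) of a symmetric C.

    Entries zeroed here should have the corresponding b_ij and omega_ij
    zeroed as well before the next iteration.
    """
    p = C.shape[0]
    iu = np.triu_indices(p, k=1)                  # upper triangle; symmetry covers the rest
    vals = np.abs(C)[iu]
    thresh = np.partition(vals, -m)[-m] if m < vals.size else 0.0
    mask = np.abs(C) >= thresh
    np.fill_diagonal(mask, True)                  # diagonal entries are not screened
    return np.where(mask, C, 0.0), mask
```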


We shrink the step size as in (3) and redo the update till either the condition is satisfied or α^l becomes smaller than a threshold c_2. For example, we can set c_1 = 10^{-4} and c_2 = 10^{-6}. At each iteration we initialize α^l as 1 and let α^l ← α^l/10 if the condition (3) is not satisfied. Empirical studies show that G_B and G_Ω usually have different orders of magnitude, and using the same step size for updating B and Ω is suboptimal: in our experimentation, it is difficult to find an α^l satisfying (3) that is not too small (say, α^l ≥ 10^{-6}), and the algorithm tends to converge slowly and may be trapped into bad local minima. Therefore, we propose to use different step sizes for G_B and G_Ω. Equivalently, this can be implemented by asynchronously updating B and Ω. To be specific, we modify (3) in the spirit of Duchi et al. (2008). In this way, the negative log-likelihood function L(B, Ω) is +∞ for Ω not positive definite, and such an Ω will naturally be rejected by the condition (3). To give a final form of our algorithm, assume the data X has been centered and normalized columnwise and Y has been centered, and pre-calculate X_X = X^T X and X_Y = X^T Y.
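
A minimal sketch of the backtracking step for one block is given below; `objective` is a placeholder callback that evaluates the penalized loss and returns +inf when Ω is not positive definite, so infeasible candidates are rejected automatically. Asynchronous updating then amounts to calling this once for B (with Ω held fixed) and once for Ω, so each block gets its own step size:

```python
import numpy as np

def armijo_update(param, grad, objective, c1=1e-4, c2=1e-6):
    """One backtracking Armijo step for a single block (B or Omega).

    `objective` returns the penalized loss, and np.inf when the candidate
    Omega is not positive definite, which makes the condition fail.
    """
    f0 = objective(param)
    g2 = float((grad ** 2).sum())
    alpha = 1.0
    while alpha >= c2:
        candidate = param - alpha * grad
        if objective(candidate) <= f0 - c1 * alpha * g2:   # sufficient decrease
            return candidate
        alpha /= 10.0                                       # shrink the step size
    return param                                            # keep the old iterate
```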


At each iteration the algorithm evaluates the gradients, e.g., G_Ω^l = (1/(2n))(Y - XB^l)^T(Y - XB^l) - (1/2)(Ω^l)^{-1}, takes a thresholded gradient step on Γ, and applies the AA line search. The GIST algorithm enables the user to have direct control over the cardinality of the estimated model. It is very easy to implement and runs fast in practice. Moreover, there is no parameter that needs tuning. If the purpose is to get the convergent sparsity pattern instead of the precise estimate, one can terminate the algorithm as soon as the pattern of the iterates stabilizes, usually within 50 steps.


A related idea appears in Gaussian graph learning (Witten et al. 2011; Mazumder and Hastie 2012), where a simple one-step thresholding is applied on the sample covariance matrix to pre-determine if the coupled concentration matrix estimate is decomposable, referred to as the block diagonal screening rule. However, the Gaussian graph learning ignores the first-order statistical structure of the dynamical model (3), and thus the resulting CDG may not reliably capture the network topology, as will be shown in Section 3.5.1. We propose to decompose the network based on GIST. Specifically, we can apply the Dulmage-Mendelsohn decomposition (Dulmage and Mendelsohn 1958) to Ĉ to detect whether there exists an exact block diagonal form of Ĉ. However, in network applications, Ĉ is seldom decomposable due to noise contamination. We propose to treat Ĉ as a similarity matrix, where the association strength c_ij now indicates how close node i and node j are. Pursuing an approximate block diagonal form is then identified as a clustering problem of the nodes based on Ĉ. Specifically, we recommend applying spectral clustering (von Luxburg 2006) to Ĉ. It can account for the noise effectively and efficiently (von Luxburg et al. 2008) and obtain a robust JAG decomposition. There are many convenient-to-use methods to determine the number of clusters in such a setup (Hastie et al. 2001; Fraley and Raftery 1998; Tibshirani et al. 2001; Sugar and James 2003). It is worth pointing out that the proposed approach does not require setting an overly high sparsity level to yield perfect subnetworks. Instead, the quantile parameter q in (3) reflects the belief of the true cardinality of the network. The philosophy is seemingly different from that of Witten et al. (2011) and Mazumder and Hastie (2012). Although they proposed the block diagonal screening as a pre-check step for each λ when computing the solution path, in real-data analysis they simply used a large value of λ (Zhao et al. 2012).
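
As a sketch of the robust decomposition step, one can feed |ĉ_ij| to an off-the-shelf spectral clustering routine with a precomputed affinity; the use of scikit-learn here is our assumption, and any spectral clustering implementation would do:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def jag_decompose(C_hat, n_clusters):
    """Cluster the nodes using the JAG screening pattern C_hat as a similarity matrix."""
    affinity = np.abs(C_hat)
    affinity = 0.5 * (affinity + affinity.T)       # enforce symmetry numerically
    model = SpectralClustering(n_clusters=n_clusters,
                               affinity="precomputed",
                               assign_labels="kmeans",
                               random_state=0)
    return model.fit_predict(affinity)             # labels[i] = subnetwork of node i
```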


If the network is indeed decomposable (or approximately decomposable), the system (3) can be equivalently written as (3). Two uses of the screening pattern are then possible. i) The GIST estimate Ĉ is only used to reveal the block decomposition structure, and the screening patterns within each block are not used in Stage 2. ii) One can enforce the within-block sparsity constraint given by Ĉ_ii in each subnetwork learning. This is usually faster; when q is specified too low, nevertheless, one should caution against such a use. The improvement in computational efficiency and estimation accuracy obtained by JAG decomposition is quite remarkable for large-scale systems, as will be shown in Section 3.4. We now turn to Stage 2, where B and Ω are to be estimated accurately, based on the JAG screening pattern given by Stage 1. Recall the problem is (3), with the support restricted to the screening pattern and the decomposition of Section 3.2.4.


The after-screening optimization problem can be stated in a general way: we solve for B with Ω fixed,

min_B (1/(2n)) tr{(Y - XB) Ω̂ (Y - XB)^T} + P_B(B; λ_B),   (3)

and solve for Ω with B fixed, with the problem reduced to

min_{Ω ≻ 0} (1/(2n)) tr{(Y - XB̂) Ω (Y - XB̂)^T} - (1/2) log det Ω + P_Ω(Ω; λ_Ω),

both subject to the screening pattern of Section 3.2.1. We propose the fine learning of graphs (FLOG) algorithm based on the group Θ-estimator. For simplicity, suppose P_B and P_Ω are ℓ1 penalties. At the B-optimization step (3), with Ω fixed at its current estimate Ω̂, we use TISP (She 2009) with an Armijo-type line search to solve the problem. With the initialization l = 0, α = 1 and B^l = B̂, the algorithm iterates thresholded gradient steps with backtracking until convergence.


The convergence of this scheme follows from standard line search arguments; see Nocedal and Wright (2006). At the Ω-optimization step (3), one just needs to solve a Gaussian graph learning problem with the sample covariance matrix given by (1/n)(Y - XB̂)^T(Y - XB̂), for which the graphical lasso (Friedman et al. 2008) can be used. MRCE solves (3), where a single regularization parameter is enforced on all the entries of a matrix and there is no screening constraint. Lee and Liu (2012) generalized the algorithm to handle weighted penalties. Both algorithms use cyclical coordinate descent to solve the B-optimization step, which has a worst-case complexity of O(p^4) (Rothman et al. 2010). In contrast, the proposed B-update in FLOG has a complexity bounded by O(p^3), which comes from the p×p matrix multiplication for computing the gradient G_B. Moreover, it is extremely easy to implement: only basic matrix operations are involved. Experiments show that FLOG is more efficient than MRCE under the same settings of error tolerances and maximum iteration numbers. For example, in a problem with 80 nodes and 200 observations, FLOG saved more than 40% computation time compared with MRCE. We compare the proposed methods with the approaches introduced in Section 3.1.3, namely sGTG, sCDG and MRCE.


The sGTG estimate is computed by pathwise coordinate optimization (Friedman et al. 2007). The sCDG estimate is computed by the graphical lasso (Friedman et al. 2008), where the block diagonal screening rule (Witten et al. 2011; Mazumder and Hastie 2012) is applied first to check the decomposability of Ω̂. MRCE (Rothman et al. 2010) solves the joint estimation problem (3) and is implemented in the R package MRCE. In all the methods, the ℓ1 penalty is used for P_B and/or P_Ω. The identification accuracy is measured by the true positive rate TPR(Ĉ, C) = #{(i, j): ĉ_ij ≠ 0 and c_ij ≠ 0} / #{(i, j): c_ij ≠ 0} and the false positive rate FPR(Ĉ, C) = #{(i, j): ĉ_ij ≠ 0 and c_ij = 0} / #{(i, j): c_ij = 0}. We generate networks following (3), where the diagonal blocks B_ii and Ω_ii are sparse random matrices with around 20% nonzero elements. The multivariate time series are then generated following (3). We set the quantile parameter in GIST as q = 0.3, which is sufficiently large given the true cardinalities of the networks. All regularization parameters are chosen by minimizing the model validation error, evaluated on 1000 validating samples independently generated in addition to the training data. We repeat each experiment 50 times and collect the average statistics, as shown in Table 3-1. The sCDG graphs are not sufficiently sparse. It seems that this model tends to explain the first-order dynamics as node correlations and consequently results in a dense second-order topology.
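
These support-recovery metrics are straightforward to compute from the estimated and true association patterns; a minimal sketch:

```python
import numpy as np

def tpr_fpr(C_hat, C_true):
    """True and false positive rates of the recovered support (off-diagonal only)."""
    off_diag = ~np.eye(C_true.shape[0], dtype=bool)
    est = (C_hat != 0) & off_diag
    truth = (C_true != 0) & off_diag
    tpr = np.sum(est & truth) / max(np.sum(truth), 1)
    fpr = np.sum(est & ~truth & off_diag) / max(np.sum(~truth & off_diag), 1)
    return tpr, fpr
```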


Table 3-1. Comparison of the true positive rate, false positive rate and model error.

Method   Example 1               Example 2               Example 3
sGTG     (63.3%, 32.2%, 302.9)   (28.1%, 12.3%, 584.7)   (16.4%, 3.6%, 7428.9)
sCDG     (76.8%, 54.5%, N/A)     (74.3%, 49.3%, N/A)     (71.5%, 40.8%, N/A)
MRCE     (85.6%, 52.8%, 247.7)   (77.9%, 46.4%, 189.1)   N/A
JGSE

On the other hand, sGTG over-regularizes the model and misses many true connections. Both sGTG and sCDG suffer from over-simplified model assumptions and fail to identify network connections accurately. MRCE estimates both the first-order and second-order statistics and is able to significantly improve the estimation accuracy, as shown by the error rates. However, MRCE is quite computationally expensive and may be infeasible for large-scale problems: in Example 2, it already took MRCE around 40 minutes to run a single experiment, and in Example 3, MRCE became computationally intractable. Our proposed JGSE does not have such limitations, thanks to the robust JAG screening and decomposition. Not only does JGSE solve the joint estimation problem in a more efficient manner, but it also improves the identification and estimation accuracy remarkably. A more careful examination in the next subsection shows that the JAG screening and decomposition successfully removed unnecessary edges and thus reduced the search space for network identification. The Rand index (Rand 1971) is adopted to evaluate the decomposition accuracy. It is obtained by comparing the memberships of each pair of nodes assigned by an algorithm with the true memberships. If a pair coming from the same cluster are assigned to a single cluster, it is counted as a true positive (TP); if a pair coming from different clusters are assigned to different clusters, it is a true negative (TN); if a pair coming from the same cluster are assigned to different clusters, it is a false negative (FN); if a pair coming from different clusters are assigned to the same cluster, it is a false positive (FP).


Figure 3-3. Rand index comparison. (B: RI versus network size, q = 0.5.)

Then the Rand index (RI) is defined as RI = (TP + TN)/(TP + TN + FP + FN). We simulate networks consisting of two equal-size subnetworks. Each diagonal block of B is generated as a random sparse matrix; each diagonal block of Σ is generated so that all the diagonal elements are 1 and all the off-diagonal elements are 0.5. We set n = 30 and generate the data following (3). Obtaining the JAG screening pattern Ĉ from GIST, we apply the robust JAG decomposition proposed in Section 3.2.4 to decompose the network into two subnetworks and evaluate the Rand index. To make a fair comparison, the RIs of the sole GTG graph Ĉ_sGTG (assuming Σ̂_sGTG = I) and the sole CDG graph Ĉ_sCDG (assuming B̂_sCDG = 0 after centering the data) with the same quantile q are also evaluated. The results are shown in Figure 3-3. Each experiment is repeated 50 times. In all the settings, GIST outperforms sGTG and sCDG by a large margin and achieves a more reliable decomposition. Moreover, its performance is rather insensitive to the choice of the quantile parameter q. It is relatively easy to specify the value of q, as long as q·d^2 bounds the true cardinality.
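
The pair-counting definition of the Rand index above translates directly into code; a minimal sketch:

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Rand index between two cluster assignments, by pair counting:
    RI = (TP + TN) / (TP + TN + FP + FN)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)
```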


Table 3-2. Computation time comparison.

We use the simulation setup of Section 3.4.2. Let T_JGSE be the total computation time taken by the JGSE learning framework (GIST + FLOG) and T_FLOG be the computation time taken by applying FLOG directly to the whole network (we did not include MRCE in the comparison because it is much less efficient than FLOG for large problems). We set a grid of values for (λ_B, λ_Ω) to catch most sparsity patterns of B and Ω. The quantile parameter is set as 0.3 in GIST. We report the ratio T_JGSE/T_FLOG for different combinations of (p, p_s) in Table 3-2, where n = 100 in all the experiments. Table 3-2 shows that more than half of the running time can be saved when the network is decomposable. The larger the ratio p/p_s is, the more computational cost is saved. We conducted the experiments serially on a single PC, but if parallel computing resources are available, the computational efficiency can be further boosted. By decomposing a large-scale network, an otherwise computationally expensive or even infeasible problem becomes much easier to solve. This is very useful in modern network analysis, as will be demonstrated by two examples in Section 3.5.


Figure 3-4. JAG decomposition (q = 0.1) for S&P 500.

The S&P 500 stock data are available in the R package huge (Zhao et al. 2012). We first applied GIST (with quantile q = 0.1) and the robust JAG decomposition. Figure 3-4 shows the resulting clusters, where the nodes are placed by the Fruchterman-Reingold algorithm (Fruchterman and Reingold 1991). Interestingly, the S&P 500 data has been decomposed into 10 subnetworks, which are highly consistent with the 10 given categories in the data documentation; the corresponding RI is close to 0.9 (see Figure 3-5). We then varied q and systematically compared the spectral clustering results of the networks obtained by GIST, sGTG and sCDG, with respect to the 10 stock categories.


Figure 3-5. Rand index for S&P 500.

The RIs are shown in Figure 3-5. GIST reaches substantially higher clustering accuracy than sGTG and sCDG do. Moreover, its performance is quite robust to the choice of q. For comparison, we also applied the block diagonal screening rule (Witten et al. 2011; Mazumder and Hastie 2012) to decompose the S&P 500 into 10 subnetworks. As shown by Figure 3-6A, the network is decomposed into a giant cluster and nine isolated nodes, which is difficult to interpret and does not reflect the 10-category structure of the network. Also, such a decomposition does little to reduce the computational cost. Even worse, Figure 3-6B shows the optimal sCDG estimate (using the R package huge with default settings (Zhao et al. 2012)). The corresponding optimal regularization parameter is λ = 0.0807, whereas the smallest value to achieve a 10-subnetwork decomposition is λ_th = 0.2158. This verifies the discussion in Section 3.2.4 that the block diagonal screening rule resorts to setting an overly large value for λ to yield a graph decomposition. However, such a high thresholding level can mask many truly existing edges, and the resulting decomposition structure may not be trustworthy.


Figure 3-6. sCDG learnings for S&P 500. (A: decomposition by the block diagonal screening rule; B: optimal sCDG estimate.)


As shown in Figure 3-7, the two graphs share similar topologies. They both have three hub nodes, namely GOOG (Google Inc.), ISRG (Intuitive Surgical Inc.), and PCLN (Priceline.com, Incorporated). It is interesting that the three hubs respectively come from the three largest sectors of the NASDAQ-100, namely Technology, Healthcare, and Consumer Services. GOOG is a world-famous technology company, and it makes sense that it is related with other well-known technology companies, such as AMZN (Amazon.com, Inc.), INTC (Intel Corporation), and NVDA (NVIDIA Corporation). In fact, we see that almost all the neighbors of GOOG are technology-concentrated corporations, which form a cluster of the largest sector of the NASDAQ-100.


Figure 3-7. The GTG and CDG of NASDAQ-100. (A: GTG; B: CDG.) The size of a node is proportional to its degree. The edge width indicates the weight of the connection. A solid line represents a positive weight and a dotted line a negative weight.

Similarly, PCLN, a commercial website that helps customers obtain discounts for travel-related purchases, is related with other service companies such as DTV and WFM; and ISRG, a corporation that manufactures robotic surgical systems, is connected with PRGO, a pharmaceutical manufacturer. Of course, there are some differences between the GTG and the CDG. For example, in the CDG, FFIV is correlated with GOOG and ISRG, while in the GTG it is isolated. Perhaps interestingly, GOOG has a very strong Granger causality connection with FLEX, but they are conditionally independent in the CDG. Thanks to the joint graphical model, we can get a comprehensive view of the NASDAQ-100 network. We performed similar analysis for other segments and examined the changes of the network topology as well. Due to page limitation, such details are not shown here. We next study the forecasting capability of JGSE, measured by the rolling MSE, as defined in Section 2.6.2. For comparison, we also applied FLOG (MRCE is computationally intractable here) and sGTG. Following Yuan and Lin (2007), we chose the tuning parameters by BIC.


Table 3-3. Rolling MSE for NASDAQ-100.

Model   Segment 1   Segment 2   Segment 3   Segment 4
sGTG    24.3        30.8        20.6        32.2
FLOG    21.9        31.4        20.1        18.4
JGSE

In the BIC, the degree of freedom is defined as K = Σ_{i,j} 1{b̂_ij ≠ 0} + Σ_{i≤j} 1{ω̂_ij ≠ 0} if both B and Ω are estimated, and K = Σ_{i,j} 1{b̂_ij ≠ 0} if only B is estimated. The rolling MSEs given by the three models for window size w = 0.8n and horizon h = 1 are shown in Table 3-3. We see that the joint estimation of B and Ω outperforms the sole GTG learning in forecasting, which implies that there indeed exists some conditional dependence structure between the stock indices, and it is beneficial to take such correlations into account in estimation. The JGSE is able to further improve the forecasting performance by joint regularization, which is not surprising in view of Stein's classical works (see, e.g., Stein (1956)). The improvement in estimation accuracy is brought by the JAG screening in Stage 1, which removes many non-existing connections via screening and reduces the search space of Stage 2.
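
For reference, the rolling MSE can be sketched as follows; `fit` and `predict` are hypothetical callbacks standing in for whichever estimator (sGTG, FLOG, or JGSE) is being evaluated:

```python
import numpy as np

def rolling_mse(x, w, h, fit, predict):
    """Rolling forecast MSE over a T x p series x.

    w : window size; h : forecast horizon.
    fit(window) -> model and predict(model, window, h) -> forecast
    are hypothetical callbacks for the estimator under study.
    """
    T = x.shape[0]
    errors = []
    for t in range(w, T - h + 1):
        window = x[t - w:t]
        model = fit(window)
        forecast = predict(model, window, h)
        errors.append(np.mean((x[t + h - 1] - forecast) ** 2))
    return float(np.mean(errors))
```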


Many complex systems exhibit nonlinear and stochastic dynamics (Anishchenko et al. 2007). For such systems, linear models like the VAR process are not sufficient to model their nonlinear behaviors; nonlinear models are needed. The analysis of nonlinear systems is usually problem-specific, since different systems show different kinds of nonlinear characteristics, which require different models and analysis methods. In our work, we are particularly interested in the sigmoid recurrent network (SRN) model, which is commonly used to describe gene regulatory networks (Chen et al. 2009) and neural networks. We will first give a brief introduction to the gene regulatory network and the SRN model, and then propose our methodology.

4.1.1 The Gene Regulatory Network

A gene regulatory network is a network in which a node stands for a gene (or its expression level) and a link stands for the regulatory relationship between two genes. The development of microarray technologies has enabled researchers to obtain gene expression data and reverse engineer the regulatory relations within a gene network. However, this learning task is extremely challenging. First, the relationships between genes are complicated and usually nonlinear. Second, the microarray data are very noisy and only available at a limited number of time points. Effective strategies are needed to extract the information about the nonlinear relationships between genes from noisy and limited data. A variety of mathematical models and computational methods have been proposed for the inference of gene regulatory networks, including Boolean networks (Akutsu et al. 2000), Bayesian networks (Vignes et al. 2011), dynamic Bayesian networks (Dojer et al. 2006), ordinary differential equations (Cao et al. 2012) and stochastic differential equations (Chen et al. 2005).


The driving noise models random fluctuations in the regulation process (Hanggi and Marchesoni 2005). In the above equation, x_i is the expression level of the target gene and S_i is the set of its regulatory genes. Figure 4-1 shows a small example to illustrate the topology and dynamics of the system. A linear combination of the expression levels of the regulatory genes is applied to regulate the target gene through a sigmoid function σ(η) = 1/(1 + exp(-η)). We define a_ij as the regulatory strength of gene j on gene i, l_i as the cross-regulatory strength on gene i, and d_i as the self-regulatory strength. We call this model the sigmoid recurrent network (SRN) model.


Figure 4-1. Topology and dynamics of the sigmoid recurrent network model.

As shown in Figure 4-1, the system has a sparse topology: a gene is not regulated by every other gene in the network. As a matter of fact, gene regulatory networks tend to be very sparse, with a gene regulated by only a few other genes. It is a task of statistical inference to determine whether such regulation exists or not. Based on the time-series observations of the gene expression levels, which are available from microarray experiments, we would like to infer the parameters in (4) as well as identify the regulatory set S_i. Due to the nonlinearity and stochastic nature of the system, such an inference task is extremely difficult, especially when only a small number of noisy observations are available. Chen et al. (2004) and Chen et al. (2005) simplify the model by putting the sigmoid function on each gene separately, which leads to an essentially linear model after a sigmoid transformation of the data.


We discretize the continuous-time model (4) using Euler's scheme.


The resulting (approximate) negative log-likelihood is, up to constants, a weighted least-squares criterion

f(B, L, D, c) = (1/2) ‖W^{1/2} (Y - σ(XB)L + XD - 1c^T)‖_F^2.   (4)

To obtain a sparse model, the PML method is applied, which solves

min (1/2) ‖W^{1/2} (Y - σ(XB)L + XD - 1c^T)‖_F^2 + P(B; λ).   (4)

Note that f is additive in β_1, ..., β_p. It means that if the penalty on B is also separable, we can model each gene separately and learn each vector β_i by optimizing

min (1/2) ‖W^{1/2} (y - (l σ(Xβ) - d x + c))‖_2^2 + P(β; λ),   (4)

where y is the ith column of Y and x is the (i+1)th column of X.
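
For intuition about the data-generating process, a minimal Euler-Maruyama simulation of the SRN dynamics is sketched below; the constant diffusion level `sigma_noise` and the step size `dt` are our assumptions:

```python
import numpy as np

def sigmoid(eta):
    return 1.0 / (1.0 + np.exp(-eta))

def simulate_srn(A, l, d, u, c, x0, n_steps, dt=0.1, sigma_noise=0.05, seed=0):
    """Euler-Maruyama simulation of the sigmoid recurrent network.

    A : p x p regulation matrix (a_ij = strength of gene j on gene i);
    l, d, u, c : length-p cross-/self-regulation strengths and offsets.
    """
    rng = np.random.default_rng(seed)
    X = np.empty((n_steps + 1, len(x0)))
    X[0] = x0
    for t in range(n_steps):
        drift = l * sigmoid(A @ X[t] + u) - d * X[t] + c
        noise = sigma_noise * np.sqrt(dt) * rng.standard_normal(len(x0))
        X[t + 1] = X[t] + dt * drift + noise
    return X
```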


We design the sparse sigmoid regression (SSR) algorithm to solve (4) or (4). First define the scalar working response μ(η, y); the vector version of μ is defined in an elementwise fashion: μ(η, y) = [μ(η_1, y_1), ..., μ(η_n, y_n)]^T. The algorithm alternates between a thresholding update of β and a weighted least-squares update of (l, d, c). When learning β_i, we input the ith column of Y as y and the (i+1)th column of X as x (note that the first column of X corresponds to the intercept term).


Applicable choices include the soft-thresholding (2) and the 'ℓ0+ℓ2' (hard-ridge) penalty P_HR(t; λ, η) = (λ^2/2) 1_{t≠0} + (η/2) t^2. The definitions of these functions can be found in She (2012a). Generally, any penalty function can be used, as long as the thresholding parameter is scaled properly. Theorem 4.1. Consider problem (4). Given any initial point θ^0, the iterates (β^j, l^j, d^j, c^j) of Algorithm 3 satisfy f(β^j, l^j, d^j, c^j) ≥ f(β^j, l^{j+1}, d^{j+1}, c^{j+1}) ≥ f(β^{j+1}, l^{j+1}, d^{j+1}, c^{j+1}). See Appendix C for the detailed proof.




Consider the screening formulation

min (1/2) ‖W^{1/2} (Y - σ(XB + 1u^T)L + XD - 1c^T)‖_F^2 + (η/2) ‖B‖_F^2, subject to a cardinality constraint on B,

in the same screening spirit as Section 2.4 and Section 3.2.4. As has been discussed, screening algorithms enjoy advantages such as easy implementation and fast computation. Using such algorithms, the user can directly decide the cardinality (sparsity level) of the estimate. Therefore, they are flexible and powerful in practice. We propose the Progressive Sigmoidal Network Screening (PSNS), as given by Algorithm 4, to solve the above problem. For problem (4), the iterates (B^j, L^j, D^j, c^j) given by Algorithm 4 satisfy the function-value decreasing property F(B^j, L^j, D^j, c^j) ≥ F(B^j, L^{j+1}, D^{j+1}, c^{j+1}) ≥ F(B^{j+1}, L^{j+1}, D^{j+1}, c^{j+1}). See Appendix D for the detailed proof. The result of Algorithm 4 is usually not sensitive to the ridge parameter η, so it is easy to tune using cross validation, or we can just set it at a small value, say 10^{-4}. In practice, we can reduce the cardinality of S^j from n(n-1) to the target m in a progressive way by using a varying m^j to avoid greedy selection. Empirical study shows


that the logistic-decay cooling schedule m(j) = 2n(n-1)/(1 + e^{γj}) with γ = 0.001 works well.

4.4.1 Simulation of the Model and Stochastic Processes

We first generate a sparse regulation matrix A. For node i, the number of regulators is set as |S_i| = 1 + r, where r is drawn from a binomial distribution r ~ B(p-1, 1/(2p)). Then the 1+r regulators are chosen randomly from the (p-1) candidates (node i itself is excluded). If j ∉ S_i, set a_ij = 0; otherwise, draw u ~ U(0, 1) and let a_ij ~ N(1.5, 0.1) if u ≤ 0.5 or a_ij ~ N(-1.5, 0.1) otherwise. Other parameters are drawn identically and independently from normal distributions: l_i ~ N(1.5, 0.1), d_i ~ N(0.8, 0.1), u_i ~ N(0, 0.1), c_i ~ N(0, 0.1). The stochastic processes are then generated using Matlab.
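
The recipe above translates directly into code; a sketch in Python rather than Matlab, with the variances interpreted as the second parameter of N(mean, variance), hence the sqrt:

```python
import numpy as np

def generate_regulation_matrix(p, seed=0):
    """Sparse regulation matrix A following the recipe in the text."""
    rng = np.random.default_rng(seed)
    A = np.zeros((p, p))
    for i in range(p):
        r = rng.binomial(p - 1, 1.0 / (2 * p))            # |S_i| = 1 + r
        size = min(1 + r, p - 1)
        candidates = np.array([j for j in range(p) if j != i])
        for j in rng.choice(candidates, size=size, replace=False):
            mean = 1.5 if rng.uniform() <= 0.5 else -1.5
            A[i, j] = rng.normal(mean, np.sqrt(0.1))      # N(+/-1.5, variance 0.1)
    return A
```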


We compare the proposed method with TSNI (Bansal et al. 2006), which is based on a linear model; the results are summarized in Section 4.4.3. TSNI suffers from over-simplifying the model, which can be seen from the prediction error. The thresholded ML estimate, though it has a good prediction performance, fails to identify the network topology, giving an ROC curve no better than random guessing. This indicates that a simple post-thresholding does not fulfill the task of network identification for a nonlinear system. The PSNS algorithm, which addresses the nonlinearity of the data and produces inherently sparse solutions, is able to achieve the best identification performance.


Figure: Performance comparison (p = 10). (B: prediction error.)
Figure: ROC curves. (B: p = 40.)

The minor sacrifice of prediction accuracy (compared with the ML estimate) is acceptable, given that the main objective is to infer the network topology. The ROC curves are also shown for two models with sizes p = 20 (m = 30) and p = 40 (m = 68), respectively.


4.5.1 The SOS Network

We apply the proposed method to the nine-gene SOS network, which is a part of the DNA-damage response pathway in the bacterium E. coli. This microarray dataset consists of 6 observations of 9 genes, sampled at time points 0, 12, 24, 36, 48, and 60 min following a drug treatment. In the work of Bansal et al. (2006), a detailed description of the data is given along with an inference method, named TSNI, which is based on a linear model specified by ordinary differential equations. There are 43 directed connections between the nine genes that have been reported with supporting evidence in the literature. Following Bansal et al. (2006), we treat those connections as the true topology of the SOS network and use it to evaluate the inference methods. We apply the PSNS algorithm to the dataset as the target cardinality m varies from 20 to 70 and compute the true positive rate (TPR) and true negative rate (TNR) of the estimate, based on the true topology. The ridge parameter is tuned by leave-one-out SCV. We plot the (TPR, TNR) pairs in Figure 4-4.

Figure 4-4. True positive rate versus true negative rate for the SOS gene network.


The tuned results are also shown in Figure 4-4, denoted by PSNS-opt. The plot of PSNS-opt is above that of TSNI everywhere, and the gap between the two is quite large. As we notice in this example, inferring gene networks from microarray time series data is extremely challenging, partly due to the small number of observations as well as the high noise level. The TSNI algorithm applies spline smoothing to compensate for noise and then applies spline interpolation to increase the number of time points. Similar smoothing and interpolation techniques have been used in other algorithms that are based on linear ordinary differential equations. Data interpolation may also be beneficial for estimating the SRN model. However, the popular spline interpolation violates the stochastic nature of the SRN model and thus is not appropriate. Data interpolation/augmentation for diffusion process models specified by nonlinear stochastic differential equations has been an interesting topic for researchers (Elerian et al. 2001; Eraker 2001; Kou et al. 2012). For example, Kou et al. (2012) proposed a multiresolution method based on MCMC and Bayesian inference. Such methods are only feasible for variables of low dimensions, say, 1 or 2. As the dimensionality increases, not only does the estimation bias increase, but the computational complexity also becomes prohibitive. To be applied to network inference, a more efficient strategy is in great need. Moreover, to the best of our knowledge, no sparsity assumptions have been introduced in existing works. Therefore, as future work, we propose to study data augmentation for nonlinear stochastic dynamical networks with sparse topology.


Table 4-1. Regulatory connections of the Yeast cell cycle subnetwork (columns: the transcription factors SWI4, HCM1, NDD1, SWI5 and ASH1; rows: target genes such as SWI4, HCM1 and CDC46).

The yeast cell cycle data were collected by Spellman et al. (1998). Following Chen et al. (2004), we select 20 genes whose expression patterns fluctuate in a periodic manner with the cell cycle and apply the learning method to the α-factor arrested dataset, which records the gene expression levels at 18 time points during a cell cycle. In the 20-gene subnetwork, five genes have been identified as transcription factors.


Chen et al. (2004) identified 55 connections from the transcription factors to the target genes, of which 14 are reported in the literature (true positive rate TPR = 73.7%). For a fair comparison, we choose m = 218 for PSNS, which also identifies 55 connections from the transcription factors to the target genes, of which 17 are reported (true positive rate TPR = 89.5%). PSNS has achieved a substantial improvement over Chen et al. (2004), which used a simpler model and a greedy forward-selection strategy (see Section 4.1.3). The regulatory connections that are reported and those that are identified by the two methods are shown in Table 4-1, where distinct symbols mark the connections supported by published evidence, the ones identified by PSNS, and the ones identified by Chen et al. (2004).


In Chapter 2 and Chapter 3, we studied the identification and estimation of linear dynamical networks using the penalized maximum likelihood (PML) method. The improvements we have made over the state of the art are mainly as follows. First, we considered the stationarity property of the PML estimate. We showed by experiment that, though designed for stationary processes, the PML method may end up with nonstationary estimates. Such estimates fail to capture all the characteristics of the original system. To overcome this drawback, we proposed the framework of stationary-sparse (S2) network learning and the Berhu iterative sparsity pursuit with stationarity (BISPS) algorithm, which effectively guarantees a stationary estimate. Second, we tackled the so-called ultra-high dimensional problem for network learning. We designed a screening algorithm called quantile thresholding iterative screening (QTIS) to preselect connections for the BISPS algorithm. Then BISPS can be applied on the screened connections for finer selection and estimation. As has been shown by simulation studies, the QTIS algorithm outperforms sure independence screening.


Chapter 4 studied the sigmoid recurrent network (SRN) model for nonlinear dynamical networks, where the nonlinearity of the network dynamics is captured by the sigmoid function. We derived the PML estimate for the SRN model and designed the sparse sigmoid regression (SSR) algorithm to estimate the parameters with a sparsity constraint on the regulation matrix. We also designed the progressive sigmoid network screening algorithm, which enables the user to have direct control over the cardinality of the estimated regulation matrix.




Lemma 1. Given the Berhu penalty (2), for any given matrix Z the minimization problem

min_A (1/2) ‖A - Z‖_F^2 + P_B(A; λ, M)   (A)

has a unique optimal solution given by Â = Θ_B(Z; λ, M), where Θ_B(·; λ, M) is the Berhu thresholding rule defined in (2).

Proof. Following the construction procedure of She (2009; 2012a), Θ_B(·; λ, M) is the global minimizer of (A).


A case-by-case verification over the remaining configurations of a and b shows that in each case the difference of the thresholded values is bounded by |t - t̃|, so the Berhu thresholding rule is nonexpansive.

Proof. We use von Neumann's trace inequality (von Neumann 1937), which states that for any p×p matrices B and A with singular values σ_1 ≥ ... ≥ σ_p and τ_1 ≥ ... ≥ τ_p respectively,


Lemma 4. The mapping Π in (3) is nonexpansive; that is, ‖Π(B) - Π(Ã)‖_2 ≤ ‖B - Ã‖_2 for any B, Ã ∈ R^{p×p}.

Proof. The claim follows by applying (A) again.


Lemma 5. Consider the problem

min_A (1/2) ‖A - A_0‖_2^2 + P_B(A; λ, M), s.t. ‖A‖_2 ≤ ρ.   (A)

It can be rewritten as

min_A (1/2) ‖A - A_0‖_2^2 + f(A) + g(A),   (A)

where f(A) = P_B(A; λ, M) and g(A) is the indicator function of the set {A: ‖A‖_2 ≤ ρ} (0 on the set and +∞ outside). Both are closed proper convex functions (Rockafellar 1997) and they satisfy dom f ∩ dom g ≠ ∅. Lemma 1 and Lemma 3 imply that Θ_B(·; λ, M) and Π(·; ρ) are the proximity operators (Moreau 1962) of f(A) and g(A) respectively:

prox_f A = Θ_B(A; λ, M) and prox_g A = Π(A; ρ).


By the result of Bauschke and Combettes (2008), the composition of the two proximity operators solves (A). Now we can establish Theorem 2.1. Recall that Algorithm 1 is to solve min_B (1/2) ‖Y - XB‖_F^2 + P_B(B; λ, M), s.t. ‖B‖_2 ≤ 1. Without loss of generality, we set λ_2 = 0 and M_2 = 1 in Algorithm 1. Suppose k_0 > ‖X‖_2 throughout the remainder of the proof. We use Opial's conditions (Opial 1967) to prove the convergence of {B^k}. To be specific, we show that the iteration of steps 1 to 3 in Algorithm 1 is a nonexpansive, asymptotically regular mapping with a nonempty set of fixed points.


A )givenbyLemma 5 ,substituting1,A0B+1 2trf(k20IXTX)(AB)T(AB)g,whichleadstoB=A.Combininga)andb),weobtainf(Bk;,)=g(Bk,Bk;,)g(Bk,Bk+1;,)=f(Bk+1;,)+1 2(k20IkXk22)kBk+1Bkk2F.Theproofiscomplete. Sincefisconvex,Lemma 6 impliesthatthesequencefBkgisasymptoticallyregular( Browder 1966 ). 1 isuniformlybounded. Proof. 1 isanonexpansivemapping.


2 andLemma 4 ,Bandarenonexpansivemappings.Infact,thecompositionofnonexpansivemappingsisalsononexpansive.SotheinneriterationgivenbyStep2inAlgorithm 1 isnonexpansive.Whenk0>kXk2,Step1inAlgorithm 1 isnonexpansive.Again,thecompositionofStep1andStep2isnonexpansive.Hence,theiterationofAlgorithm 1 isnonexpansive. Lemma 7 andLemma 8 implythatthemappingofAlgorithm 1 isanonexpansivemappingintoaboundedclosedconvexsubset.ByTheorem1in Browder ( 1965 ),ithasaxedpoint.Then,withallOpial'sconditionssatised,thesequencefBkghasauniquelimitpoint,denotedasB,anditisaxedpointofAlgorithm 1 .Next,weprovethatBmustbeaglobalminimizerofproblem( 2 ).Denoteh(B)=kBk21.ByLemma 5 andLemma 6 ,BsatisestheKKTconditionsofproblem( A )with=1:8>>>>>>>>>><>>>>>>>>>>:02BA0+@PB(B;=k20,=k20)+@h(B),h(B)0,0,h(B)=0.SubstitutingA0=B+1 2 )isconvexanditsKKTconditionsaregivenby( A ).Hence,Bisaglobalminimizerofproblem( 2 ).Theproofiscomplete. 105


2kBAk2Fs.t.kAk0m. 2kBk2F1 2Pi,j2Ia2ij.Therefore,thequantilethresholding#(B;m)yieldsaglobalminimizer. Deneasurrogatefunction 2kYXAk2F+1 2trf(k20IXTX)(AB)T(B)g.BasedonLemma 9 andk20>kXk2,thefunctionvaluedecreasingpropertycanbeprovedfollowingthelinesofLemma 6 .Sowehavel(Bk)=~l(Bk,Bk)~l(Bk,Bk+1)=l(Bk+1)+1 2(k20IkXk22)kBk+1Bkk2Fl(Bk+1).Theproofiscomplete. 106


2kW1=2(y())k22,then C )and( C ),wehave 2()T(XTWD(+(1))X)()(C)forsome2(0,1).Since 2(12s+2s2ys 4)ws 4),thediagonalentriesofWDareuniformlyboundedbymaxsws 4)(=k0(y,w)).


C )into( C ),wehaveg0(,)l0()+P(;)+1 2(KkXk22kWDk22)f0()forany,.Theequalityholdsifandonlyif=.Given,minimizingg0(,)overistomin1 2k+1 She 2012a ),thesolutionis=(1 4.1 .Given,minimizingf(,l,d,c)over(l,d,c)istominl,d,c1 2kW1=2(yl+dxc)k22.Theweightedleast-squareestimateasinstep2inAlgorithm 3 givestheoptimalsolution.Therefore,therstequalityholds. 108


2kW1=2l(y+dxc l(X))k22+P(;).Itisequivalenttominimizingf0()withy(y+dxc)=l,Wl2Wandreparameterizingbasedonthechoiceofthresholdingrule.Following( C ),thesecondequalityholds.Theproofiscomplete. 109


Lemma 10. Consider min_B (1/2) ‖A - B‖_F^2 + (η/2) ‖B‖_F^2 s.t. ‖B‖_0 ≤ m. On the optimal support I, the solution is B̂ = A_I/(1 + η), and it follows that f_0(B̂) = (1/2) ‖A‖_F^2 - (1/(2(1+η))) Σ_{(i,j)∈I} a_ij^2, which is minimized by taking I to be the m largest entries of A in magnitude. Therefore, the quantile thresholding Θ^#_HR(A; m, η) yields a global minimizer.

Based on Lemma 10, the function-value decreasing property (4) can be proved following the proof of Theorem 4.1 (the extension to the multivariate form is straightforward), with the penalty function in (4) being the ridge penalty.


Hirotugu Akaike. A new look at the statistical model identification. Automatic Control, IEEE Transactions on, 19(6):716, 1974.
Tatsuya Akutsu, Satoru Miyano, and Satoru Kuhara. Inferring qualitative relations in genetic networks and metabolic pathways. Bioinformatics, 16(8):727, 2000.
Ya I. Alber, Alfredo N. Iusem, and Mikhail V. Solodov. On the projected subgradient method for nonsmooth convex optimization in a Hilbert space. Mathematical Programming, 81(1):23, 1998.
Vadim Semenovich Anishchenko, Vladimir Astakhov, and Tatjana Vadivasova. Nonlinear dynamics of chaotic and stochastic systems: tutorial and modern developments. Springer, 2007.
Larry Armijo. Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics, 16(1):1, 1966.
Onureena Banerjee, Laurent El Ghaoui, and Alexandre d'Aspremont. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. The Journal of Machine Learning Research, 9:485, 2008.
Mukesh Bansal, Giusy Della Gatta, and Diego Di Bernardo. Inference of gene regulatory networks and compound mode of action from time course gene expression profiles. Bioinformatics, 22(7):815, 2006.
Albert L. Barabasi. Linked: The New Science of Networks. Basic Books, 2002. ISBN 9780786746965.
Heinz H. Bauschke and Patrick L. Combettes. A Dykstra-like algorithm for two monotone operators. Pacific Journal of Optimization, 4(3):383, 2008.
Peter J. Bickel and Elizaveta Levina. Regularized estimation of large covariance matrices. The Annals of Statistics, pages 199, 2008.
Thomas Blumensath and Mike E. Davies. Normalized iterative hard thresholding: Guaranteed stability and performance. IEEE Journal of Selected Topics in Signal Processing, 4(2), Apr. 2010.
Stefano Boccaletti, Vito Latora, Yamir Moreno, Martin Chavez, and D.-U. Hwang. Complex networks: Structure and dynamics. Physics Reports, 424(4):175, 2006.
Andrew Bolstad, Barry D. Van Veen, and Robert Nowak. Causal network inference via group sparse regularization. Signal Processing, IEEE Transactions on, 59(6):2628, 2011.


Felix E. Browder. Nonexpansive nonlinear operators in a Banach space. Proceedings of the National Academy of Sciences of the United States of America, 54(4):1041, 1965.
Felix E. Browder. The solution by iteration of nonlinear functional equations in Banach spaces. Bulletin of the American Mathematical Society, 72(3):571, 1966.
Atul J. Butte and Isaac S. Kohane. Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements. 5:418, 2000.
Terry M. Caelli, David McG. Squire, and Tom Wild. Model-based neural networks. Neural Networks, 6(5):613, 1993.
Jian-Feng Cai, Emmanuel J. Candes, and Zuowei Shen. A singular value thresholding algorithm for matrix completion. SIAM J. on Optimization, 20(4):1956, Mar. 2010.
Jiguo Cao, Xin Qi, and Hongyu Zhao. Modeling gene regulation networks using ordinary differential equations. Methods in Molecular Biology, 802:185, 2012.
Peter Carrington, John Scott, and Stanley Wasserman. Models and methods in social network analysis. Cambridge University Press, 2005.
Yu-Hsiang Chang, Yu-Chao Wang, and Bor-Sen Chen. Identification of transcription factor cooperativity via stochastic system model. Bioinformatics, 22(18):2276, Sep. 2006.
Hong-Chu Chen, Hsiao-Ching Lee, Tsai-Yun Lin, Wen-Hsiung Li, and Bor-Sen Chen. Quantitative characterization of the transcriptional regulatory network in the yeast cell cycle. Bioinformatics, 20(12):1914, 2004.
Kuang-Chi Chen, Tse-Yi Wang, Huei-Hun Tseng, Chi-Ying F. Huang, and Cheng-Yan Kao. A stochastic differential equation model for quantifying transcriptional regulatory network in Saccharomyces cerevisiae. Bioinformatics, 21(12):2883, Jun. 2005. ISSN 1367-4803.
Luonan Chen, Rui-Sheng Wang, and Xiang-Sun Zhang. Biomolecular networks: methods and applications in systems biology, volume 10. Wiley.com, 2009.
Alessandro Chiuso and Gianluigi Pillonetto. A Bayesian approach to sparse dynamic network identification. Automatica, 48(8):1553, 2012.
Frank E. Curtis and Michael L. Overton. A sequential quadratic programming algorithm for nonconvex, nonsmooth constrained optimization. SIAM Journal on Optimization, 22(2):474, 2012.


Steven J. Davis and James A. Kahn. Interpreting the great moderation: Changes in the volatility of economic activity at the macro and micro levels. Journal of Economic Perspectives, 22(4):155, 2008.
Norbert Dojer, Anna Gambin, Andrzej Mizera, Bartek Wilczynski, and Jerzy Tiuryn. Applying dynamic Bayesian networks to perturbed gene expression data. BMC Bioinformatics, 7(1):249, 2006.
David L. Donoho. Compressed sensing. Information Theory, IEEE Transactions on, 52(4):1289, 2006.
John Duchi, Stephen Gould, and Daphne Koller. Projected subgradient methods for learning sparse Gaussians. In Proceedings of the Twenty-fourth Conference on Uncertainty in AI (UAI), 2008.
Andrew L. Dulmage and Nathan S. Mendelsohn. Coverings of bipartite graphs. Canadian Journal of Mathematics, 10(4):516, 1958.
Bradley Efron. Bootstrap methods: another look at the jackknife. The Annals of Statistics, pages 1, 1979.
Ola Elerian, Siddhartha Chib, and Neil Shephard. Likelihood inference for discretely observed nonlinear diffusions. Econometrica, 69(4):959, 2001.
Bjørn Eraker. MCMC analysis of diffusion models with application to finance. Journal of Business and Economic Statistics, 19(2):177, 2001.
Jianqing Fan and Runze Li. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96:1348, Dec. 2001.
Jianqing Fan and Runze Li. Statistical challenges with high dimensionality: feature selection in knowledge discovery. In International Congress of Mathematicians, Aug. 2006.
Jianqing Fan and Jinchi Lv. Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(5):849, 2008.
Jianqing Fan and Jinchi Lv. A selective overview of variable selection in high dimensional feature space. Statistica Sinica, 20(1):101, Jan. 2010.
Jianqing Fan, Richard Samworth, and Yichao Wu. Ultrahigh dimensional feature selection: Beyond the linear model. Journal of Machine Learning Research, 10:2013, Dec. 2009. ISSN 1532-4435.


Chris Fraley and Adrian E. Raftery. How many clusters? Which clustering method? Answers via model-based cluster analysis. The Computer Journal, 41(8):578, 1998.
Jerome Friedman, Trevor Hastie, Holger Hofling, and Robert Tibshirani. Pathwise coordinate optimization. The Annals of Applied Statistics, 1(2):302, 2007.
Jerome Friedman, Trevor Hastie, and Robert Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432, Jul. 2008.
Thomas Fruchterman and Edward Reingold. Graph drawing by force-directed placement. Software: Practice and Experience, 21(11):1129, 1991.
Andre Fujita, Joao Sato, Humberto Garay-Malpartida, Rui Yamaguchi, Satoru Miyano, Mari Sogayar, and Carlos Ferreira. Modeling gene expression regulatory networks with the sparse vector autoregressive model. BMC Systems Biology, 1(1), 2007.
Timothy S. Gardner, Diego Di Bernardo, David Lorenz, and James J. Collins. Inferring genetic networks and identifying compound mode of action via expression profiling. Science, 301(5629):102, 2003.
Zoubin Ghahramani. Learning dynamic Bayesian networks. In Adaptive Processing of Sequences and Data Structures, pages 168. Springer-Verlag, 1998.
Rainer Goebel, Alard Roebroeck, Dae-Shik Kim, and Elia Formisano. Investigating directed cortical interactions in time-resolved fMRI data using vector autoregressive modeling and Granger causality mapping. Magnetic Resonance Imaging, 21(10):1251, 2003.
Moshe Goldberg and Gideon Zwas. On matrices having equal spectral radius and spectral norm. Linear Algebra and its Applications, 8(5):427, 1974.
Clive W. J. Granger. Investigating causal relations by econometric models and cross-spectral methods. Econometrica: Journal of the Econometric Society, pages 424, 1969.
Michael C. Grant and Stephen P. Boyd. Graph implementations for nonsmooth convex programs. In Recent Advances in Learning and Control, pages 95. Springer, 2008.
George E. Halkos and Nickolaos G. Tzeremes. Oil consumption and economic efficiency: A comparative analysis of advanced, developing and emerging economies. Ecological Economics, 70(7):1354, May 2011.


Lars Peter Hansen and Thomas J. Sargent. Recursive linear models of dynamic economies. 1990.
Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning, volume 1. Springer New York, 2001.
Bingsheng He. Alternating direction method with self-adaptive penalty parameters for monotone variational inequalities. Journal of Optimization Theory and Applications, 106(2):337, 2000. ISSN 0022-3239.
Peter J. Huber. Robust Statistics. Wiley Series in Probability and Mathematical Statistics. Wiley, 1981.
Dale W. Jorgenson and Peter J. Wilcoxen. Environmental regulation and U.S. economic growth. RAND Journal of Economics, 21(2):314, Summer 1990.
Ilkka Korhonen, Luca Mainardi, Giuseppe Baselli, Anna Bianchi, Pekka Loula, and Guy Carrault. Linear multivariate models for physiological signal analysis: applications. Computer Methods and Programs in Biomedicine, 51:121, 1996.
Gueorgi Kossinets and Duncan J. Watts. Empirical analysis of an evolving social network. Science, 311(5757):88, Jan. 2006.
Samuel C. Kou, Benjamin P. Olding, Martin Lysy, and Jun S. Liu. A multiresolution method for parameter estimation of diffusion processes. Journal of the American Statistical Association, 107(500):1558, 2012.
Mark A. Kramer, Uri T. Eden, Sydney S. Cash, and Eric D. Kolaczyk. Network inference with confidence from multivariate time series. Physical Review E, 79(6), June 2009.
Hans R. Kunsch. The jackknife and the bootstrap for general stationary observations. The Annals of Statistics, 17(3):1217, 1989.
Harri Lahdesmaki and Ilya Shmulevich. Learning the structure of dynamic Bayesian networks from time series and steady state measurements. Machine Learning, 71(2):185, Jun. 2008.
Steffen L. Lauritzen. Graphical Models. Oxford Science Publications. Oxford University Press, USA, 1996. ISBN 9780198522195.
Wonyul Lee and Yufeng Liu. Simultaneous multiple response regression and inverse covariance matrix estimation via penalized Gaussian maximum likelihood. Journal of Multivariate Analysis, 2012.
Zhouchen Lin. Some software packages for partial SVD computation. CoRR, abs/1108.1548, 2011.


Jinhu Lu and Guanrong Chen. A time-varying complex dynamical network model and its controlled synchronization criteria. Automatic Control, IEEE Transactions on, 50(6):841, Jun. 2005.
Helmut Lutkepohl. New Introduction to Multiple Time Series Analysis. Springer, 2nd edition, 2007.
Daniele Marinazzo, Mario Pellicoro, and Sebastiano Stramaglia. Kernel-Granger causality and the analysis of dynamical networks. Physical Review E, 77(5):056215, 2008.
Amin S. Massoud and Bruce F. Wollenberg. Toward a smart grid: power delivery for the 21st century. Power and Energy Magazine, IEEE, 3(5):34, 2005.
Donatello Materassi and Giacomo Innocenti. Topological identification in networks of dynamical systems. Automatic Control, IEEE Transactions on, 55(8):1860, 2010.
Rahul Mazumder and Trevor Hastie. Exact covariance thresholding into connected components for large-scale graphical lasso. J. Mach. Learn. Res., 13:781, March 2012. ISSN 1532-4435.
Nicolai Meinshausen and Peter Buhlmann. High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34(3):1436, 2006.
Terence C. Mills and Raphael N. Markellos. The econometric modelling of financial time series. Cambridge University Press, 2008.
Jean-Jacques Moreau. Fonctions convexes duales et points proximaux dans un espace hilbertien. CR Acad. Sci. Paris Ser. A Math, 255:2897, 1962.
Domenico Napoletani and Timothy D. Sauer. Reconstructing the topology of sparsely connected dynamical networks. Physical Review E, 77(2), 2008.
Jorge Nocedal and Stephen J. Wright. Numerical Optimization. Springer Science+Business Media, 2006.
Zdzisław Opial. Weak convergence of the sequence of successive approximations for nonexpansive mappings. Bulletin of the American Mathematical Society, 73(4):591, 1967.
Michael L. Overton and Robert S. Womersley. On minimizing the spectral radius of a nonsymmetric matrix function: Optimality conditions and duality theory. SIAM Journal on Matrix Analysis and Applications, 9(4):473, 1988.


Dimitris N. Politis and Joseph P. Romano. The stationary bootstrap. Journal of the American Statistical Association, 89(428):1303, 1994.
William M. Rand. Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66(336):846, 1971.
Gregory C. Reinsel. Elements of Multivariate Time Series Analysis. Springer, 2003.
Rainer O. Rhein and Korbinian Strimmer. Learning causal networks from systems biology time course data: an effective model selection procedure for the vector autoregressive process. BMC Bioinformatics, 8(Suppl 2):S3+, 2007.
Tyrell R. Rockafellar. Convex Analysis, volume 28. Princeton University Press, 1997.
Adam J. Rothman, Elizaveta Levina, and Ji Zhu. Sparse multivariate regression with covariance estimation. Journal of Computational and Graphical Statistics, 19(4):947, 2010.
Yiyuan She. Thresholding-based iterative selection procedures for model selection and shrinkage. Electron. J. Statist., 3:384, 2009.
Yiyuan She. An iterative algorithm for fitting nonconvex penalized generalized linear models with grouped predictors. Computational Statistics and Data Analysis, 56(10):2976, Oct. 2012a.
Yiyuan She. An iterative algorithm for fitting nonconvex penalized generalized linear models with grouped predictors. Computational Statistics and Data Analysis, 9:2976, 2012b.
Yiyuan She, Huanghuang Li, Jiangping Wang, and Dapeng Wu. Grouped iterative spectrum thresholding for super-resolution sparse spectral estimation. Submitted to IEEE Transactions on Signal Processing, 2012.
Christopher A. Sims. Macroeconomics and reality. Econometrica, 48(1):1, Jan. 1980.
Paul T. Spellman, Gavin Sherlock, Michael Q. Zhang, Vishwanath R. Iyer, Kirk Anders, Michael B. Eisen, Patrick O. Brown, David Botstein, and Bruce Futcher. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell, 9(12):3273, 1998.
Charles Stein. Inadmissibility of the usual estimator for the mean of a multivariate distribution. Proc. Third Berkeley Symp. Math. Statist. Prob., 1:197, 1956.


Jos F. Sturm. Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones, 1998.
Catherine A. Sugar and Gareth M. James. Finding the number of clusters in a dataset: An information-theoretic approach. Journal of the American Statistical Association, 98(463):750, 2003.
Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, 58(1):267, 1996.
Robert Tibshirani, Guenther Walther, and Trevor Hastie. Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(2):411, 2001.
Andrei Nikolaevich Tikhonov. Solutions of Ill-Posed Problems. Scripta Series in Mathematics. Winston, 1977.
Marc Timme. Revealing network connectivity from response dynamics. Physical Review Letters, 98(22), May 2007.
Yaakov Tsaig and David L. Donoho. Extensions of compressed sensing. Signal Processing, 86(3):549–571, 2006.
Reha H. Tutuncu, Kim C. Toh, and Michael J. Todd. Solving semidefinite-quadratic-linear programs using SDPT3. Mathematical Programming, 95(2):189, 2003.
Pedro A. Valdes-Sosa. Estimating brain functional connectivity with sparse multivariate autoregression. Phil. Trans. R. Soc. B, pages 969, 2005.
Matthieu Vignes, Jimmy Vandel, David Allouche, Nidal Ramadan-Alban, Christine Cierco-Ayrolles, Thomas Schiex, Brigitte Mangin, and Simon de Givry. Gene regulatory network reconstruction using Bayesian networks, the Dantzig selector, the lasso and their meta-analysis. PLoS ONE, 6(12):e29165, 2011.
Jiri Vohradsky. Neural model of the genetic network. J. Biol. Chem., 276(39):36168, Sep. 2001.
Ulrike von Luxburg. A tutorial on spectral clustering. Technical Report 149, Max Planck Institute for Biological Cybernetics, August 2006.
Ulrike von Luxburg, Mikhail Belkin, and Olivier Bousquet. Consistency of spectral clustering. The Annals of Statistics, pages 555, 2008.
John von Neumann. Some matrix inequalities and metrization of matrix space. Tomsk. Univ. Rev., 1:153, 1937.


Beate Wild, Michael Eichler, Hans-Christoph Friederich, Mechthild Hartmann, Stephan Zipfel, and Wolfgang Herzog. A graphical vector autoregressive modelling approach to the analysis of electronic diary data. BMC Medical Research Methodology, 10:1, 2010.
Daniela M. Witten, Jerome H. Friedman, and Noah Simon. New insights and faster computations for the graphical lasso. Journal of Computational and Graphical Statistics, 20(4):892, 2011.
Eric P. Xing, Wenjie Fu, and Le Song. A state-space mixed membership blockmodel for dynamic network tomography. Annals of Applied Statistics, 4(2):535, 2010.
Kevin Y. Yip, Roger P. Alexander, Koon-Kiu Yan, and Mark Gerstein. Improved reconstruction of in silico gene regulatory networks by integrating knockout and perturbation data. PLoS ONE, 5(1):e8121, 2010.
Ming Yuan and Yi Lin. Model selection and estimation in the Gaussian graphical model. Biometrika, 94(1):19, 2007.
Ye Yuan, Guy-Bart Stan, Sean Warnick, and Jorge M. Goncalves. Robust dynamical network structure reconstruction. Automatica, 47(6):1230, 2011.
Tuo Zhao, Han Liu, Kathryn Roeder, John Lafferty, and Larry Wasserman. The huge package for high-dimensional undirected graph estimation in R. J. Mach. Learn. Res., 13:1059, June 2012. ISSN 1532-4435.
Hui Zou. The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101:1418, 2006.
Hui Zou and Trevor Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2):301, 2005.


Yuejia He was born in Chongqing, China in 1987. She received her bachelor's degree in control science and engineering from Huazhong University of Science and Technology in 2009. She then became a graduate student in electrical and computer engineering at the University of Florida and received her Ph.D. in the summer of 2013.