Citation
Cholesky-Based Model Selection and Estimation in Graphical Models

Material Information

Title:
Cholesky-Based Model Selection and Estimation in Graphical Models
Creator:
Rahman, Syed Hafizur
Place of Publication:
[Gainesville, Fla.]
Publisher:
University of Florida
Publication Date:
2017
Language:
English
Physical Description:
1 online resource (105 p.)

Thesis/Dissertation Information

Degree:
Doctorate (Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Statistics
Committee Chair:
KHARE,KSHITIJ
Committee Co-Chair:
MICHAILIDIS,GEORGE
Committee Members:
GHOSH,MALAY
ENTEZARI,ALIREZA

Subjects

Subjects / Keywords:
cholesky -- convergence -- covariance -- dag -- graphical -- high-dimensional
Statistics -- Dissertations, Academic -- UF
Genre:
bibliography ( marcgt )
theses ( marcgt )
government publication (state, provincial, territorial, dependent) ( marcgt )
born-digital ( sobekcm )
Electronic Thesis or Dissertation
Statistics thesis, Ph.D.

Notes

Abstract:
Covariance estimation for high-dimensional datasets is a fundamental problem in modern-day statistics with numerous applications. In these high-dimensional datasets, the number of variables p is typically larger than the sample size n. A popular way of tackling this challenge is to induce sparsity in the covariance matrix, its inverse or a relevant transformation. In particular, methods inducing sparsity in the Cholesky parameter of the inverse covariance matrix can be useful as they are guaranteed to give a positive definite estimate of the covariance matrix. In recent years, two useful penalized likelihood methods for sparse estimation of this Cholesky parameter (with no restrictions on the sparsity pattern) have been developed. However, these methods either consider a non-convex optimization problem, which can lead to serious convergence issues and singular estimates of the covariance matrix when p > n, or achieve a convex formulation by placing restrictive constraints on the conditional variance parameters, which can cause problems in obtaining accurate estimates of the covariance matrix. In this work, we propose a new penalized likelihood method for sparse estimation of the inverse covariance Cholesky parameter that aims to overcome some of the shortcomings of current methods, but retains their respective strengths. We obtain a jointly convex formulation for our objective function, which leads to rigorous convergence guarantees, even when p > n. The approach always leads to a positive definite and symmetric estimator of the covariance matrix. We establish high-dimensional estimation and graph selection consistency, and also demonstrate finite sample performance on simulated/real data. We also consider estimation in concentration and covariance graph models when the underlying sparsity pattern is known. A major drawback of many existing methods is that estimates are not guaranteed to be positive definite. We develop "non-iterative methods" based on the Cholesky decomposition for both concentration graph models and covariance graph models with arbitrary patterns of zeros that are guaranteed to be positive definite. Finally, we develop a hybrid approach for directed acyclic graph estimation which leverages partition based ordering information, leading to significant improvement in computational speed and statistical performance.
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Bibliography:
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Thesis:
Thesis (Ph.D.)--University of Florida, 2017.
Local:
Adviser: KHARE,KSHITIJ.
Local:
Co-adviser: MICHAILIDIS,GEORGE.
Statement of Responsibility:
by Syed Hafizur Rahman.

Record Information

Source Institution:
UFRGP
Rights Management:
Applicable rights reserved.
Classification:
LD1780 2017 ( lcc )

Downloads



Full Text

CHOLESKY-BASED MODEL SELECTION AND ESTIMATION IN GRAPHICAL MODELS

By

SYED HAFIZUR RAHMAN

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2017

© 2017 Syed Hafizur Rahman

To Abbu. To Ammu. And to the torpies.

ACKNOWLEDGMENTS

This dissertation wouldn't have been possible without all the people I've interacted with in Florida - they have had a profound impact on me, both personally and professionally. First and foremost, I want to thank my advisor, Dr. Kshitij Khare, for his superb guidance and thoughtful patience throughout the entire five and a half year journey. From our very first interaction, when Dr. Khare walked in and started lecturing without notes or aides (as if there was a book in his head somewhere), I knew this was a man I needed to learn a lot from. And now, writing our final paper together, where I am still learning how to shape complex ideas about graphical models to communicate them to the world at large - I will treasure all these lessons for the rest of my life.

I would also like to thank Dr. George Michailidis, with whom I had the pleasure of working on the final chapter of my dissertation. He has a wealth of knowledge about multivariate analysis and high-dimensional statistics and I'm glad he was able to impart some of it to me. And Dr. Malay Ghosh - for his brilliant teaching of Statistical Inference. Aside from this, they also served as part of my committee, for which I will be eternally grateful to them.

I have thoroughly enjoyed all my classes here at Florida, mostly due to the excellent instruction and guidance from Dr. Winner, Dr. Doss, Dr. Presnell, Dr. Rosalsky and Dr. Hobert. Other graduate students and staff in the department contributed to making life in Florida enjoyable and also provided the emotional and intellectual support to get me through many of my classes. Special shoutout goes to Abhishek Bhattacharjee, Andrey Skripnikov, Carlos Martinez, Giacomo Benincasa, Wei Xia, Runmin Shi, Pei-li Wang, Isaac Duerr, Hunter Merrill, Robert Parker, Ray Bai, Peyman Jalali, Tina Greenly and Ale Kirps. Without their friendship and help, I doubt I would have made it so far in the program and in life.

Lastly, I would like to acknowledge my friends and family for all their love and encouragement. My father and mother instilled in me a love and respect for applied mathematics and statistics and the value of dreaming big. And finally, Riks Chow.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF FIGURES
ABSTRACT

CHAPTER
1 OVERVIEW
2 A CONVEX FRAMEWORK FOR HIGH-DIMENSIONAL SPARSE CHOLESKY BASED COVARIANCE ESTIMATION
  2.1 Introduction
  2.2 A Convex Approach For Sparse Cholesky Estimation
    2.2.1 The CSCS Objective Function
    2.2.2 A Minimization Algorithm for Q_CSCS
    2.2.3 Selection of Tuning Parameter
    2.2.4 Computational Complexity of the CSCS Algorithm
    2.2.5 Comparison and Connections with Other Methods
      2.2.5.1 Penalized Sparse Partial Correlation Methods
      2.2.5.2 Methods for DAG Selection
  2.3 Experiments
    2.3.1 Sparse Cholesky Convergence when n < p […]


    […]
    3.4.1 Simulation Experiment
    3.4.2 Application to Minimum Variance Portfolio Rebalancing
  3.5 Asymptotic Properties
    3.5.1 Consistency in the Fixed p Setting
    3.5.2 Consistency in the High-dimensional Setting
4 ESTIMATION OF GAUSSIAN DAGS WITH KNOWN PARTITION BASED PARTIAL ORDERING
  4.1 Introduction
  4.2 DAG Estimation Using Partition Information
    4.2.1 Partition-DAG Algorithm with Two Partition Blocks
    4.2.2 The Case with Multiple Partition Blocks
    4.2.3 Advantages of Partition-DAG
  4.3 Data Analysis

APPENDIX
A PROOFS FOR CHAPTER 2
  A.1 Proof of Lemma 1
  A.2 Proof of Lemma 5
  A.3 Proof of Lemma 7
  A.4 Proof of Theorem 2.2.1
  A.5 Proof of Theorem 2.4.1
B CALL CENTER DATA
  B.1 Forecasting Details
C NETWORKS FOR FLOW CYTOMETRY DATA

REFERENCES
BIOGRAPHICAL SKETCH

LIST OF FIGURES

2-1 Comparison of CSCS, Sparse Cholesky, Sparse Graph
2-2 Cyclic coordinatewise algorithm for $h_{k,A,\lambda}$
2-3 CSCS algorithm: minimization algorithm for $Q_{CSCS}$
2-4 Plot of the iterates for $D_{77}$ for Sparse Cholesky in a setting with p = 8. It shows how the value jumps to 0 (and stays there).
2-5 AUC for p = 1000
2-6 AUC for p = 2000
2-7 Frobenius norm error for $\Sigma$ and $\Omega$
2-8 Frobenius norm error for p = 1000
2-9 Log-likelihood values for call center data
2-10 Log-likelihood values for HapMap data
2-11 TPR and FPR for Cell Signalling Pathway Data
3-1 Running time and estimation error with normal data
3-2 Running time and estimation error with t data
3-3 Realized Sharpe ratio with various values of $N_{est}$
3-4 Average running time for S&P 500 data
3-5 Normalized wealth growth after adjusting for transaction costs (0.5% of principal) and borrowing costs (interest rate of 8% APR) with $N_{est}$ = 375.
3-6 Normalized wealth growth after adjusting for transaction costs (0.5% of principal) and borrowing costs (interest rate of 8% APR) with $N_{est}$ = 450.
4-1 Partition-DAG algorithm with 2 blocks
4-2 Partition-DAG algorithm with R blocks
4-3 True network from flow cytometry data
4-4 Estimated networks from flow cytometry data
B-1 Minimum forecast error for call center data
B-2 Aggregated forecast error for call center data

B-3 Average absolute forecast error with 205 observations in training dataset
B-4 Average absolute forecast error with 150 observations in training dataset
B-5 Average absolute forecast error with 100 observations in training dataset
B-6 Average absolute forecast error with 75 observations in training dataset
C-1 True and estimated graphs (matching sparsity to true graph)
C-2 True and estimated graphs by setting $\lambda_i(\alpha) = 2n^{-1/2} Z^*_{\alpha/(2p(i-1))}$.

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

CHOLESKY-BASED MODEL SELECTION AND ESTIMATION IN GRAPHICAL MODELS

By

Syed Hafizur Rahman

December 2017

Chair: Kshitij Khare
Major: Statistics

Covariance estimation for high-dimensional datasets is a fundamental problem in modern-day statistics with numerous applications. In these high-dimensional datasets, the number of variables p is typically larger than the sample size n. A popular way of tackling this challenge is to induce sparsity in the covariance matrix, its inverse or a relevant transformation. In particular, methods inducing sparsity in the Cholesky parameter of the inverse covariance matrix can be useful as they are guaranteed to give a positive definite estimate of the covariance matrix. In recent years, two useful penalized likelihood methods for sparse estimation of this Cholesky parameter (with no restrictions on the sparsity pattern) have been developed. However, these methods either consider a non-convex optimization problem, which can lead to serious convergence issues and singular estimates of the covariance matrix when p > n, or achieve a convex formulation by placing restrictive constraints on the conditional variance parameters, which can cause problems in obtaining accurate estimates of the covariance matrix. We propose a new penalized likelihood method for sparse estimation of the inverse covariance Cholesky parameter that aims to overcome some of the shortcomings of current methods, but retains their respective strengths. We obtain a jointly convex formulation for our objective function, which leads to rigorous convergence guarantees, even when p > n. The approach always leads to a positive definite and symmetric estimator of the covariance matrix. We establish high-dimensional estimation and graph selection consistency, and also demonstrate finite sample performance on simulated/real data. We also consider estimation

in concentration and covariance graph models when the underlying sparsity pattern is known. A major drawback of many existing methods is that estimates are not guaranteed to be positive definite. We develop "non-iterative methods" based on the Cholesky decomposition for both concentration graph models and covariance graph models with arbitrary patterns of zeros that are guaranteed to be positive definite. Finally, we develop a hybrid approach for directed acyclic graph estimation which leverages partition based ordering information, which leads to significant improvement in computational speed and statistical performance.

CHAPTER 1
OVERVIEW

Covariance estimation for high-dimensional datasets is a fundamental problem in modern-day statistics with numerous applications in fields such as finance, genetics and climate sciences. In these high-dimensional datasets, the number of variables p is typically larger than the sample size n. This includes microarray or SNP datasets where the number of genes is in the tens (or hundreds) of thousands, but the number of samples is typically a few hundred, or paleoclimate datasets with climate data for thousands of locations around the earth, but only going back a few hundred years. A common goal in these applications is to understand the complex network of relationships between the variables in the datasets (such as genes, stocks or climate locations), and the most fundamental quantity which can help us in this endeavor is the covariance matrix of the variables. The covariance matrix has $O(p^2)$ parameters, which need to be estimated based on a comparatively much smaller sample size n. A popular way of tackling this challenge is to induce sparsity in the covariance matrix, its inverse or a relevant transformation. There are essentially two tasks involved in the statistical analysis of these models:

1. The first is model selection, or choosing the pattern of structural zeros to be imposed on $\Sigma$ or $\Omega = \Sigma^{-1}$ or the Cholesky transform of $\Omega$.
2. The second is estimation, or estimating $\Sigma$ under the above restrictions.

My dissertation, which consists of three projects, has focused on the above two tasks. Chapter 2 introduces a penalized likelihood approach through the Cholesky decomposition of the covariance matrix to achieve both estimation and model selection in one shot. We propose a new method for sparse estimation of the inverse covariance Cholesky parameter that aims to overcome some of the shortcomings of current methods, but retains their respective strengths. We obtain a jointly convex formulation for our objective function, which leads to convergence guarantees, even when p > n. The main advantage of this approach is that it always leads to a positive definite and symmetric estimator of the covariance matrix. In

addition, the sparsity pattern of the Cholesky parameter corresponds to a directed acyclic graph for which we establish graph selection consistency as $n, p \to \infty$.

Chapter 3 is concerned with estimation in concentration and covariance graph models when the underlying sparsity pattern is known. Hence, the objective is to obtain a positive definite estimate of the covariance matrix (or its inverse) which respects the given sparsity pattern. However, many existing methods do not provide guarantees for the resulting estimator to be positive definite, or are iterative in nature. We develop "non-iterative methods" based on the Cholesky decomposition with arbitrary patterns of zeros, and our estimators are guaranteed to be positive definite.

Chapter 4 develops a hybrid approach for DAG estimation which leverages partition based ordering information. We will also show that using the partition information leads to a reduction in the number of computations, and more importantly allows for parallel processing. This can lead to significant improvement in computational speed and statistical performance. The results from these chapters have been compiled into the following papers: Khare et al. (2017a), Khare et al. (2017b) and Rahman et al. (2017).

CHAPTER 2
A CONVEX FRAMEWORK FOR HIGH-DIMENSIONAL SPARSE CHOLESKY BASED COVARIANCE ESTIMATION

2.1 Introduction

In modern-day statistics, datasets where the number of variables is much higher than the number of samples are more pervasive than they have ever been. One of the major challenges in this setting is to formulate models and develop inferential procedures to understand the complex relationships and multivariate dependencies present in these datasets. The covariance matrix is the most fundamental object that quantifies relationships between the variables in multivariate datasets. Hence, estimation of the covariance matrix is crucial in high-dimensional problems and enables the detection of the most important relationships.

In particular, suppose we have i.i.d. observations $Y_1, Y_2, \ldots, Y_n$ from a p-variate normal distribution with mean vector 0 and covariance matrix $\Sigma$. Note that $\Sigma \in \mathbb{P}^+_p$, the space of positive definite matrices of dimension p. In many modern applications, the number of observations n is much less than the number of variables p. In such situations, parsimonious models which restrict $\Sigma$ to a lower dimensional subspace of $\mathbb{P}^+_p$ are required for meaningful statistical estimation.

Let $\Omega = \Sigma^{-1} = T^t D^{-1} T$ denote the modified Cholesky decomposition of $\Omega = \Sigma^{-1}$. Here T is a lower triangular matrix with diagonal entries equal to 1 (we will refer to T as the Cholesky parameter), and D is a diagonal matrix with positive diagonal entries. The entries of T and D have a very natural interpretation. In particular, the (non-redundant) entries in each row of T are precisely the regression coefficients of the corresponding variable on the preceding variables. Similarly, each diagonal entry of D is the residual variance of the corresponding variable after regression on the preceding variables. Note that the Cholesky factor of a positive definite matrix is not invariant to a reordering of the variables, and the discussion above implicitly assumes a given ordering of the variables in the dataset. However, for a variety of applications, a natural ordering, such as a time based or location based ordering, of the variables presents itself (see Sections 2.3.5 and 2.3.3, for example).

Owing to these interpretations, various authors in the literature have considered sparse estimation of T as a means of inducing parsimony in high-dimensional situations. Smith and Kohn (2002) develop a hierarchical Bayesian approach which allows for sparsity in the Cholesky parameter. Wu and Pourahmadi (2003) develop a non-parametric smoothing approach which provides a sparse estimate of the Cholesky parameter, with a banded sparsity pattern. Huang et al. (2006) introduce a penalized likelihood method to find a regularized estimate of $\Omega$ with a sparse Cholesky parameter. Levina et al. (2008) develop a penalized likelihood approach using the so-called nested lasso penalty to provide a sparse banded estimator for the Cholesky parameter. Rothman et al. (2010a) develop penalized likelihood approaches for the related but different problem of sparse estimation of $T^{-1}$. Shojaie and Michailidis (2010) motivate sparsity in the Cholesky parameter T as a way of estimating the skeleton graph for a Gaussian Directed Acyclic Graph (DAG) model. In recent parallel work, Yu and Bien (2016) develop a penalized likelihood approach to obtain a tapered/banded estimator of T (with possibly different bandwidths for each row).

To the best of our knowledge, the methods in Huang et al. (2006) and Shojaie and Michailidis (2010) are the only (non-Bayesian) methods which induce a general or unrestricted sparsity pattern in the inverse covariance Cholesky parameter T. Although both these methods are quite useful, they suffer from some drawbacks which we will discuss below. Huang et al. (2006) obtain a sparse estimate of T by minimizing the objective function

$$Q_{Chol}(T,D) = \mathrm{tr}\left(T^t D^{-1} T S\right) + \log|D| + \lambda \sum_{1\le j<i\le p} |T_{ij}| \qquad (2\text{-}1)$$

[…]
[…] $i = 1, 2, \ldots, p$. It can be established after some simplification (see Huang et al. (2006)) that

$$Q_{Chol}(T,D) = \frac{S_{11}}{D_{11}} + \log D_{11} + \sum_{i=2}^{p}\left[\frac{(\eta^i)^t S_{(i-1)}\eta^i + 2(\eta^i)^t S_{\cdot i} + S_{ii}}{D_{ii}} + \log D_{ii} + \lambda\|\eta^i\|_1\right],$$

where $\eta^i$ denotes the vector of (non-redundant) entries in the ith row of T, $S_{(i-1)}$ the leading $(i-1)\times(i-1)$ principal submatrix of S, $S_{\cdot i}$ the vector of the first $i-1$ entries of the ith column of S, and $\|x\|_1$ the sum of absolute values of the entries of a vector x. It follows that minimizing $Q_{Chol}(T,D)$ with respect to T and D is equivalent to minimizing

$$Q_{Chol,i}(\eta^i, D_{ii}) = \frac{(\eta^i)^t S_{(i-1)}\eta^i + 2(\eta^i)^t S_{\cdot i} + S_{ii}}{D_{ii}} + \log D_{ii} + \lambda\|\eta^i\|_1 \qquad (2\text{-}2)$$

with respect to $(\eta^i, D_{ii})$ for $i = 2, 3, \ldots, p$, and setting $D_{11} = S_{11}$. Huang et al. (2006) propose minimizing $Q_{Chol,i}$ using cyclic block coordinatewise minimization, where each iteration consists of minimizing $Q_{Chol,i}$ with respect to $\eta^i$ (fixing $D_{ii}$ at its current value), and then with respect to $D_{ii}$ (fixing $\eta^i$ at its current value). However, this regularization approach based on minimizing $Q_{Chol}$ encounters a problem when n < p. […]

Lemma 1. If $n < p$, then $\inf_{\eta^{n+1}\in\mathbb{R}^{n},\, D_{n+1,n+1}>0} Q_{Chol,n+1}(\eta^{n+1}, D_{n+1,n+1}) = -\infty$.

Note that the first and third terms in the expression for $Q_{Chol,i}$ are non-negative. Hence, $Q_{Chol,i}(\eta^i, D_{ii})$ takes the value $-\infty$ if and only if $(\eta^i)^t S_{(i-1)}\eta^i + 2(\eta^i)^t S_{\cdot i} + S_{ii} = 0$ and $D_{ii} = 0$. Let $\mathcal{T}_p$ denote the space of $p\times p$ lower triangular matrices with unit diagonal entries, and $\mathcal{D}_p$ denote the space of $p\times p$ diagonal matrices with positive diagonal entries. Since $\{(\eta^i, D_{ii})\}_{i=1}^p$ forms a disjoint partition of (T, D), it follows from Lemma 1 that if n < p,

$$\inf_{T\in\mathcal{T}_p,\,D\in\mathcal{D}_p} Q_{Chol}(T,D) = \inf_{D_{11}>0}Q_{Chol,1}(D_{11}) + \sum_{i=2}^p \inf_{\eta^i\in\mathbb{R}^{i-1},\,D_{ii}>0} Q_{Chol,i}(\eta^i, D_{ii}) = -\infty,$$

and the infimum can be achieved only if one of the $D_{ii}$'s takes the value zero (which is unacceptable as it corresponds to a singular $\Omega$). Another issue with the approach in Huang et al. (2006) is that since the function $Q_{Chol,n+1}$ is not jointly convex, or even bi-convex, in $(\eta^{n+1}, D_{n+1,n+1})$, existing results in the literature do not provide a theoretical guarantee that the sequence of iterates generated by the block coordinatewise minimization algorithm of Huang et al. (2006) (which alternates between minimizing with respect to $\eta^{n+1}$ and $D_{n+1,n+1}$) will converge. If the sequence of iterates does converge, it is not clear whether the limit is a global minimum or a local minimum. Of course, convergence to a local minimum is not desirable as the resulting estimate is not in general meaningful, and as described above, convergence to a global minimum will imply that the limit lies outside the range of acceptable parameter values. This phenomenon is further illustrated in Section 2.3.1.

Note that the sparsity patterns in T can be associated with a directed acyclic graph $G = (V, E)$, where $V = \{1, 2, \ldots, p\}$ and $E = \{i \to j : i$ […]
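The degeneracy described in Lemma 1 (and illustrated empirically in Section 2.3.1) is easy to reproduce numerically. A minimal sketch with simulated data (all values hypothetical, code not from the dissertation):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, lam = 5, 8, 0.1
Y = rng.normal(size=(n, p))
S = Y.T @ Y / n                      # rank <= n < p, hence singular
i = p - 1                            # work with the last row of T
w, V = np.linalg.eigh(S[: i + 1, : i + 1])
v = V[:, 0]                          # null eigenvector (eigenvalue ~ 0);
eta = v[:i] / v[i]                   # generically v[i] != 0, scale last entry to 1
num = eta @ S[:i, :i] @ eta + 2 * eta @ S[:i, i] + S[i, i]   # quadratic part ~ 0
for d in (1e-2, 1e-6, 1e-10):
    # Q_Chol,i decreases without bound as D_ii -> 0
    print(d, num / d + np.log(d) + lam * np.abs(eta).sum())
```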
[…] coefficients of $Y_i$ on $\{Y_j\}_{j=1}^{i-1}$, this can be regarded as a lasso least squares approach for sparsity selection in T. Hence, regardless of whether the true $D_{ii}$'s are all equal to one or not, this is a valid approach for sparsity selection/graph selection, which is precisely the goal in Shojaie and Michailidis (2010). However, in many applications, the end goal is to obtain an accurate estimate of the covariance matrix, which requires estimating both T and D. We now point out some issues with making the assumption $D_{ii} = 1$ for all $1 \le i \le p$ when the goal is estimation of $\Sigma = T^{-1}D(T^t)^{-1}$. Note that if $\mathrm{cov}(Y) = \Sigma$, and if we define the vector of "latent variables" $Z = TY$, then $\mathrm{cov}(Z) = D$. Hence, assuming that $D_{ii} = 1$ implies that the latent variables in Z have unit variance, NOT the variables in Y. An assumption of unit variance for Y can be dealt with by scaling the observations in the data. But scaling the data does not justify the assumption that the latent variables in Z have unit variances. This is illustrated in the simulation example in Section 2.3.2. Also, it is not clear if an assumption of unit variances for the latent variables in Z can be dealt with by preprocessing the data another way. Hence, assuming that the diagonal entries of D are 1 can be restrictive, especially for estimation purposes. One could propose an approach where estimates of T are obtained by minimizing (2-3), and estimates of D are obtained directly from the Cholesky decomposition of the sample covariance matrix S. However, this approach will not work when n < p […]
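Continuing the modified-Cholesky sketch from earlier, the point about scaling can be checked numerically (a sketch, not from the dissertation):

```python
# Standardize to a correlation matrix (unit variances for Y) and note that the
# latent-variable variances D = cov(Z) are generally still not equal to 1.
scale = 1.0 / np.sqrt(np.diag(Sigma))
R = Sigma * np.outer(scale, scale)      # correlation matrix of Y
_, D_scaled = modified_cholesky(R)      # helper from the earlier sketch
print(np.round(D_scaled, 3))            # entries typically differ from 1
```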
[…] In particular, it can be shown that the CSCS objective function is jointly convex in the (non-redundant) entries of L, is bounded away from $-\infty$ even if n < p […]
As mentioned previously, our approach as well as that of Huang et al. (2006); Shojaie and Michailidis (2010); Yu and Bien (2016) assumes the existence of a domain-specific ordering of the variables, which is available in a large variety of applications (see these papers and Sections 2.3.5 and 2.3.3 for examples). However, in other applications, such an ordering may not be apparent. In such settings, a possible solution is to consider the ordering as an additional parameter in the objective function and optimize over it. This problem has been studied in the recent literature, see van de Geer and Buhlmann (2013); Aragam and Zhou (2015); Aragam et al. (2016) and the references therein. While such an approach is necessary in situations when a natural ordering does not exist, the corresponding optimization problem also becomes significantly complicated, and in general the resulting estimate can only be shown to be a local minimum of the corresponding objective functions. The contribution of our work is to show that when a domain-specific ordering is available, one can develop faster, parallelizable algorithms, and stronger algorithmic convergence (convergence to a global minimum, even when n < p […]
Figure 2-1. Comparison of CSCS, Sparse Cholesky, Sparse Graph

  Property                                                          Sparse Cholesky   Sparse Graph   CSCS
  No constraints on sparsity pattern                                      +                +           +
  No constraints on D (for estimation)                                    +                            +
  Convergence guarantee to acceptable global minimum when n < p                            +           +

[…]


2.2 A Convex Approach For Sparse Cholesky Estimation

2.2.1 The CSCS Objective Function

We will now show that all the goals mentioned above can be achieved by reparametrizing in terms of the classical Cholesky parameter. Recall that the classical Cholesky decomposition of $\Omega$ is given by $\Omega = L^t L$, where L (which we will refer to as the classical Cholesky parameter) is a lower triangular matrix with positive diagonal entries. It is easy to see that

$$L_{ij} = T_{ij}/\sqrt{D_{ii}} \quad \text{for every } i \ge j. \qquad (2\text{-}4)$$

Hence, $L_{ij} = 0$ if and only if $T_{ij} = 0$, i.e., sparsity in T is equivalent to sparsity in L. After reparametrizing $Q_{Chol}$ in terms of L (as opposed to (T, D)) and some simple manipulations, we obtain the following objective function.

$$Q_{Chol}(L) = \mathrm{tr}\left(L S L^t\right) - 2\log|L| + \lambda\sum_{1\le j<i\le p}|L_{ij}| \qquad (2\text{-}5)$$

[…]
[…] from (2-6), the lower triangular nature of L, and the definition of $\gamma^i$ (the vector of entries in the ith row of L up to and including the diagonal) that

$$Q_{CSCS}(L) = \mathrm{tr}\left(L S L^t\right) - 2\sum_{i=1}^p\log L_{ii} + \lambda\sum_{1\le j<i\le p}|L_{ij}| = \sum_{i=1}^p Q_{CSCS,i}(\gamma^i). \qquad (2\text{-}7)$$

[…]

Lemma 3. Let $k \ge 1$, $\delta > 0$, and $\lambda$ be a positive constant, and let A be a $k\times k$ positive semi-definite matrix with $A_{kk} > 0$. Consider the function $h(x) = -\delta\log x_k + x^T A x + \lambda\sum_{i=1}^{k-1}|x_i|$ defined on $\mathbb{R}^{k-1}\times\mathbb{R}_+$. Then, there exist positive constants $a_1$ and $a_2$ (depending only on $\delta$ and A), such that $h(x) \ge a_1 x_k - a_2$ for every $x \in \mathbb{R}^{k-1}\times\mathbb{R}_+$.
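The objective in (2-7) is straightforward to evaluate directly. A short numpy sketch (the function name is ours, not the authors'):

```python
import numpy as np

def q_cscs(L, S, lam):
    """CSCS objective (2-7): tr(L S L') - 2*sum_i log L_ii + lam*sum_{j<i}|L_ij|,
    for L lower triangular with positive diagonal (a sketch)."""
    penalty = np.abs(np.tril(L, k=-1)).sum()          # l1 penalty, off-diagonal only
    return np.trace(L @ S @ L.T) - 2.0 * np.log(np.diag(L)).sum() + lam * penalty
```

Because the objective decomposes row-wise as in (2-7), each row of L can be optimized separately; Section 2.2.2 exploits exactly this structure.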

Using (2-8), (2-9) along with the facts that $S_i$ is positive semi-definite and $S_{ii} > 0$, it follows from Lemma 3 that for every $1\le i\le p$, there exist positive constants $a_i$ and $b_i$ such that

$$Q_{CSCS,i}(\gamma^i) = (\gamma^i)^T S_i \gamma^i - 2\log\gamma^i_i + \frac{\lambda}{2}\sum_{j=1}^{i-1}|\gamma^i_j| + \frac{\lambda}{2}\sum_{j=1}^{i-1}|\gamma^i_j| \ge a_i\gamma^i_i - b_i + \frac{\lambda}{2}\sum_{j=1}^{i-1}|\gamma^i_j| \qquad (2\text{-}10)$$

for every $\gamma^i \in \mathbb{R}^{i-1}\times\mathbb{R}_+$. The following lemma now follows immediately from (2-7), (2-10) and the fact that $\{\gamma^i\}_{i=1}^p$ forms a disjoint partition of L.

Lemma 4. For every n and p,

$$\inf_{L\in\mathcal{L}_p} Q_{CSCS}(L) = \sum_{i=1}^p \inf_{\gamma^i\in\mathbb{R}^{i-1}\times\mathbb{R}_+} Q_{CSCS,i}(\gamma^i) \ge -\sum_{i=1}^p b_i > -\infty,$$

and $Q_{CSCS}(L) \to \infty$ as $|\gamma^i_j| = |L_{ij}| \to \infty$ for any j […] 0, and A is a positive semi-definite matrix with positive diagonal entries. It follows from (2-8) and (2-9) that $Q_{CSCS,i}(\gamma^i) = h_{i,S_i,\lambda}(\gamma^i)$ for every $1\le i\le p$. It therefore suffices to develop an algorithm to minimize a function of the form $h_{k,A,\lambda}$ as specified in (2-11). Note that without the logarithmic term and the restriction $x_k > 0$, the optimization problem for $h_{k,A,\lambda}$ would have been equivalent to the lasso optimization problem, for which several approaches have been developed

in the literature, such as the shooting algorithm in Fu (1998), or the pathwise coordinate optimization approach in Friedman et al. (2008a), for example. However, these algorithms do not apply in the current situation due to the presence of the logarithmic term and the condition $x_k > 0$. We will now derive a cyclic coordinatewise minimization algorithm for $h_{k,A,\lambda}$. For every $1 \le j \le k$, define the function $T_j: \mathbb{R}^{k-1}\times\mathbb{R}_+ \to \mathbb{R}^{k-1}\times\mathbb{R}_+$ by

$$T_j(x) = \operatorname*{arg\,inf}_{y\in\mathbb{R}^{k-1}\times\mathbb{R}_+:\; y_l=x_l\ \forall l\ne j} h_{k,A,\lambda}(y). \qquad (2\text{-}12)$$

The following lemma (proof provided in the supplemental document) shows that the functions $\{T_j\}_{j=1}^k$ can be computed in closed form.

Lemma 5. The function $T_j(x)$ defined in (2-12) can be computed in closed form. In particular,

$$(T_j(x))_j = \frac{S_\lambda\!\left(-2\sum_{l\ne j}A_{lj}x_l\right)}{2A_{jj}} \qquad (2\text{-}13)$$

for $1\le j\le k-1$, and

$$(T_k(x))_k = \frac{-\sum_{l\ne k}A_{lk}x_l + \sqrt{\left(\sum_{l\ne k}A_{lk}x_l\right)^2 + 4A_{kk}}}{2A_{kk}}. \qquad (2\text{-}14)$$

Here $S_\lambda$ is the soft-thresholding operator given by $S_\lambda(x) = \mathrm{sign}(x)(|x|-\lambda)_+$. Lemma 5 provides the required ingredients to construct a cyclic coordinatewise minimization algorithm to minimize $h_{k,A,\lambda}$ (see Algorithm 2.1). Now, to minimize $Q_{CSCS}(L)$, we use Algorithm 2.1 to minimize $Q_{CSCS,i}(\gamma^i)$ for every $1\le i\le p$, and combine the outputs to obtain a matrix in $\mathcal{L}_p$ (see Algorithm 2.2). We refer to Algorithm 2.2 as the CSCS algorithm. Note that although the function $Q_{CSCS,i}$ is jointly convex in the entries of $\gamma^i$, it is not in general strictly convex if n < p […]
Figure 2-2. Cyclic coordinatewise algorithm for $h_{k,A,\lambda}$

Algorithm 2.1 (H).
  Input: k, A and λ
  Input: Fix maximum number of iterations: r_max
  Input: Fix initial estimate: x^(0)
  Input: Fix convergence threshold: ε
  Set r ← 1, converged = FALSE
  Set x^current ← x^(0)
  repeat
    x^old ← x^current
    for j ← 1, 2, ..., k-1 do
      x_j^current ← (T_j(x^current))_j
    end for
    x_k^current ← (T_k(x^current))_k      (2-15)
    x^(r) ← x^current
    ## Convergence checking
    if ‖x^current - x^old‖₁ < ε then converged = TRUE
    r ← r + 1
  until converged = TRUE or r > r_max
  Return final estimate: x^(r)

Theorem 2.2.1. If $S_{ii} > 0$ for every $1\le i\le p$, then Algorithm 2.2 converges to a global minimum of $Q_{CSCS}$.

The proof of the above theorem is provided in the supplemental document.

2.2.3 Selection of Tuning Parameter

The tuning parameter can be selected using a "BIC"-like measure, defined as follows:

$$\mathrm{BIC}(\lambda) = n\,\mathrm{tr}(S\hat{\Omega}) - n\log|\hat{\Omega}| + \log n \cdot E$$

Figure 2-3. CSCS algorithm: minimization algorithm for $Q_{CSCS}$

Algorithm 2.2 (H).
  Input: Data $Y_1, Y_2, \ldots, Y_n$ and λ
  Input: Fix maximum number of iterations: r_max
  Input: Fix initial estimate: L^(0)
  Input: Fix convergence threshold: ε
  for i ← 1, 2, ..., p do
    (γ^i)^(0) ← ith row of L^(0) (up to the diagonal)
    Set γ^i to be the minimizer of $Q_{CSCS,i}$ obtained by using Algorithm 2.1 with k = i, A = $S_i$, λ, r_max, x^(0) = (γ^i)^(0), ε
  end for
  Construct $\hat{L}\in\mathcal{L}_p$ by setting its ith row (up to the diagonal) as γ^i
  Return final estimate: $\hat{L}$

where E denotes the number of non-zero entries in $\hat{L}$, n is the sample size, S the sample covariance and $\hat{\Omega} = \hat{L}^t\hat{L}$. The value of λ minimizing the function BIC(λ) can be chosen. In Huang et al. (2006) and Shojaie and Michailidis (2010) the authors respectively propose tuning parameter choices based on cross-validation and scaled normal quantiles. These procedures are described briefly in Section 2.3.3 and Section 2.3.2 respectively.

2.2.4 Computational Complexity of the CSCS Algorithm

We now proceed to evaluate the computational complexity of the CSCS algorithm. Note that the CSCS algorithm (Algorithm 2.2) involves p separate minimizations, all of which can be run in parallel, especially given modern computing resources. In a parallelizable setting, we define the computational complexity as the maximum number of computations among all processes running in parallel. We will show the following.

Lemma 6. The best case computational complexity per iteration for Algorithm 2.2 is $\min(O(np), O(p^2))$ (if all the p minimizations are run in parallel), and the worst case computational complexity per iteration for Algorithm 2.2 is $\min\left(O\left(n\sum_{i=1}^p i\right), O\left(\sum_{i=1}^p i^2\right)\right) = \min(O(np^2), O(p^3))$ (if all the p minimizations are run sequentially).

To prove the above lemma, we start by establishing a result about the computational complexity per iteration for Algorithm 2.1.
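A compact end-to-end sketch of Algorithms 2.1 and 2.2, together with the BIC-like criterion of Section 2.2.3, is given below. All function names are ours; this is a sketch consistent with the pseudocode above, not the authors' implementation:

```python
import numpy as np

def soft_threshold(z, lam):
    # S_lam(z) = sign(z) * (|z| - lam)_+ , the operator in Lemma 5
    return np.sign(z) * max(abs(z) - lam, 0.0)

def minimize_h(A, lam, x0, max_iter=100, tol=1e-8):
    """Cyclic coordinatewise minimization of
    h_{k,A,lam}(x) = -2*log(x_k) + x'Ax + lam*sum_{j<k}|x_j| (Algorithm 2.1),
    using the closed-form updates (2-13) and (2-14)."""
    x = np.asarray(x0, dtype=float).copy()
    k = x.size
    for _ in range(max_iter):
        x_old = x.copy()
        for j in range(k - 1):
            c = A[:, j] @ x - A[j, j] * x[j]                 # sum_{l != j} A_lj x_l
            x[j] = soft_threshold(-2.0 * c, lam) / (2.0 * A[j, j])        # (2-13)
        c = A[:, k - 1] @ x - A[k - 1, k - 1] * x[k - 1]
        x[k - 1] = (-c + np.sqrt(c * c + 4.0 * A[k - 1, k - 1])) / (2.0 * A[k - 1, k - 1])  # (2-14)
        if np.abs(x - x_old).sum() < tol:                    # l1 convergence check
            break
    return x

def cscs(S, lam, max_iter=100):
    """Row-wise CSCS sketch (Algorithm 2.2): each row of L is obtained by one
    h-minimization with A = S_i, the leading i x i principal submatrix of S.
    The p row problems are independent and could also be run in parallel."""
    p = S.shape[0]
    L = np.zeros((p, p))
    for i in range(p):
        x0 = np.zeros(i + 1)
        x0[i] = 1.0                                          # positive diagonal start
        L[i, : i + 1] = minimize_h(S[: i + 1, : i + 1], lam, x0, max_iter)
    return L

def bic(S, L, n):
    """BIC-like criterion from Section 2.2.3: n*tr(S Omega) - n*log|Omega| + log(n)*E,
    with Omega = L'L and E the number of non-zero entries of L (a sketch)."""
    Omega = L.T @ L
    E = np.count_nonzero(L)
    _, logdet = np.linalg.slogdet(Omega)
    return n * np.trace(S @ Omega) - n * logdet + np.log(n) * E
```

Selecting λ then amounts to evaluating `bic(S, cscs(S, lam), n)` over a grid of λ values and keeping the minimizer.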

Lemma 7. Suppose $A = BB^T$, where B is a $k\times n$ matrix. Then the computational complexity for Algorithm 2.1 is $\min(O(nk), O(k^2))$.

The proof of this lemma is provided in the appendix. Since $S = \frac{1}{n}\sum_{j=1}^n Y_jY_j^T$, it follows that $S_i$ (a principal $i\times i$ submatrix of S) can be written as $B_iB_i^T$ for an appropriate $i\times n$ matrix $B_i$. Since $Q_{CSCS,i}(\gamma^i) = h_{i,S_i,\lambda}(\gamma^i)$ for every $1\le i\le p$, Lemma 6 follows immediately by invoking Lemma 7.

2.2.5 Comparison and Connections with Other Methods

2.2.5.1 Penalized Sparse Partial Correlation Methods

In this section we compare and contrast the CSCS method (which induces sparsity in the Cholesky factor of $\Omega$) with sparse partial correlation methods, i.e., penalized methods which induce sparsity in the inverse covariance matrix $\Omega$ itself. The entries in the ith row of $\Omega$ (appropriately scaled) can be interpreted as regression coefficients of the ith variable against all other variables. Recall that the (non-redundant) entries in the ith row of T, on the other hand, are the regression coefficients of the ith variable against only the preceding variables. A natural question to ask is whether there is any connection between models which introduce sparsity in the Cholesky factor of $\Omega$ and models which induce sparsity in $\Omega$ itself. In general, the sparsity pattern in the Cholesky factor T of a positive definite matrix is not the same as the sparsity pattern in the matrix itself. Note that a given pattern of zeros in the lower triangle of a $p\times p$ matrix uniquely corresponds to a graph with vertices $\{1, 2, \ldots, p\}$, where two vertices do not share an edge whenever the corresponding entry is included in the pattern of zeros. It is known that the sparsity pattern in $\Omega$ is exactly the same as in its Cholesky factor if and only if the corresponding graph is chordal (decomposable) and the vertices are ordered based on a perfect vertex elimination scheme (see Paulsen et al. (1989b)).

We now summarize the relevant details of penalized methods which induce sparsity in $\Omega$. Such methods can be divided into two categories: penalized likelihood methods such as GLASSO (Banerjee et al. (2008), Friedman et al. (2008b)), and penalized pseudo-likelihood methods such as CONCORD (Khare et al. (2015)), SPACE (Peng et al. (2009)) and

SYMLASSO (Friedman et al. (2010)). The GLASSO objective function is comprised of a log Gaussian likelihood term and an $\ell_1$-penalty term for entries of $\Omega$. Friedman et al. (2008b) present an algorithm for minimizing this objective function which has computational complexity of $O(p^3)$ per iteration.¹ Pseudo-likelihood based objective functions used in CONCORD, SPACE and SYMLASSO are comprised of a log pseudo-likelihood term which is based on the regression based interpretation of the entries of $\Omega$, and an $\ell_1$-penalty term for entries of $\Omega$. These objective functions are typically minimized using cyclic coordinatewise minimization with a computational complexity of $\min(O(np^2), O(p^3))$.² Owing to the regression based interpretation of the pseudo-likelihood, the minimization is done over all symmetric matrices with positive diagonal entries (as opposed to GLASSO, where the minimization is done over the set of positive definite matrices), and hence the minimizer is not guaranteed to be positive definite. In many applications, the main goal is selection of the sparsity pattern (network), and this does not pose a problem. In fact, getting rid of the positive definiteness constraint is helpful in improving the performance of such methods (as compared to GLASSO) in high-dimensional settings, including increased robustness to heavier tailed data (see Khare et al. (2015)). The CONCORD algorithm, unlike SPACE and SYMLASSO, provides crucial theoretical guarantees of convergence to a global minimum of the respective objective function (while preserving all the other attractive properties of SPACE and SYMLASSO).

¹ In recent years, several adaptations/alternatives to this algorithm have been proposed in order to improve its speed (see Hsieh et al. (2011); Mazumder and Hastie (2012) for instance). However, for these methods to provide substantial improvements over the graphical lasso, certain assumptions are required on the number and size of the connected components of the graph implied by the zeros in the minimizer.

² Recently, a much faster proximal gradient based optimization method for the CONCORD objective function has been developed in Oh et al. (2014).

There is, in fact, an interesting parallel between CONCORD and CSCS. The CONCORD objective function (scaled by $\frac{2}{n}$) is given by

$$Q_{con}(\Omega) = -\sum_{i=1}^p 2\log\omega_{ii} + \mathrm{tr}\left(\Omega^t S\,\Omega\right) + \lambda\sum_{1\le j<i\le p}|\omega_{ij}|$$ […]
[…] interpretation for the entries of the Cholesky factor T (or equivalently L) exactly corresponds to the log Gaussian likelihood for T.

2.2.5.2 Methods for DAG Selection

As mentioned in the introduction, while a natural ordering of variables is available in several applications, in many applications such an ordering may not be apparent. In such a setting, the ordering too needs to be selected using the data. Methods to tackle this problem have been studied in recent literature (see van de Geer and Buhlmann (2013); Aragam and Zhou (2015); Aragam et al. (2016) and the references therein, for a review). The method/algorithm most relevant in the context of CSCS is the CCDr-$\ell_1$ algorithm in Aragam and Zhou (2015). This algorithm essentially aims to minimize the objective function

$$Q_{CCDr}(R) = \mathrm{tr}\left(R^TRS\right) - 2\log|R| + \lambda\sum_{1\le i\ne j\le p}|R_{ij}| \qquad (2\text{-}16)$$

over the set of all matrices R which can be obtained by permuting the rows and columns of a lower triangular matrix with positive diagonal entries. Each iteration of the CCDr-$\ell_1$ algorithm minimizes the objective function over the pair $(R_{ij}, R_{ji})$ (fixing other entries) and cycles through all such pairs. It follows from (2-6) that, from a purely mathematical point of view, CCDr-$\ell_1$ and CSCS are both optimizing the same objective function. The difference is that CCDr-$\ell_1$ optimizes the function over the set of matrices which can be converted to a lower triangular matrix with positive diagonal entries through (a row and column) reordering, whereas CSCS optimizes the function over the set of lower triangular matrices with positive diagonal entries. Despite this very close connection between the objective functions for CCDr-$\ell_1$ and CSCS, the difference in the range of optimization leads to some qualitative differences between the respective optimization algorithms and estimators.

(a) (Convexity) While the objective function for CCDr-$\ell_1$ is convex in R, the range of minimization is not. Hence the minimization problem is not a convex problem. In such settings, for coordinatewise minimization algorithms like CCDr-$\ell_1$, general results in the

literature only guarantee convergence of the sequence of iterates to a local minimum of the objective function. On the other hand, for CSCS, the optimization problem is a convex problem (though not strictly convex when n < p […]
[…] is not a convex problem, and hence the resulting estimate can be shown only to be a local minimum of the objective function. Also, different initial values can possibly lead to different estimates. Furthermore, the CCDr-$\ell_1$ algorithm provides a permuted lower triangular matrix, which can be used to construct an ordering of the variables, as well as an estimate of the covariance matrix. However, it is not clear if this covariance matrix estimate corresponds to the global minimum of the objective function if one were to restrict to the estimated ordering and optimize only over the matrices which are lower triangular with respect to this ordering. In light of these facts, the following two-step approach can be developed. First identify multiple local minima of the CCDr-$\ell_1$ objective (by using different starting points for the algorithms), and then use CSCS to obtain covariance matrix estimates for each ordering. A global candidate can then be chosen by comparing out of sample log-likelihoods.

2.3 Experiments

2.3.1 Sparse Cholesky Convergence when n < p […]
2.3.2 Simulated Data: Graph Selection and Estimation

In this section, we perform a simulation study to compare the graph/model selection and estimation performance of CSCS, Sparse Cholesky and Sparse Graph. For model selection, we consider eight different settings with $p \in \{1000, 2000\}$ and $n \in \{p/8, p/4, p/2, 3p/2\}$. In particular, for each $p \in \{1000, 2000\}$, a $p\times p$ lower triangular matrix $T_0$ is generated as follows. We randomly choose 98% of the lower triangular entries, and set them to zero. The remaining 2% entries are chosen randomly from a uniform distribution on [0.3, 0.7] and then assigned a positive/negative sign with probability 0.5. Now, a $p\times p$ diagonal matrix $D_0$ is generated with diagonal entries chosen uniformly from [2, 5]. For each sample size $n \in \{p/8, p/4, p/2, 3p/2\}$, 100 datasets, each having i.i.d. multivariate normal distribution with mean zero and inverse covariance matrix $\Omega_0 = T_0^tD_0^{-1}T_0$, are generated. The model selection performance of the three algorithms, CSCS, Sparse Cholesky, Sparse Graph, is then compared using receiver operating characteristic (ROC) curves. These curves compare true positive rates (TPR) and false positive rates (FPR), and are obtained by varying the penalty parameter λ over roughly 40 possible values. In applications, FPR is typically controlled to be sufficiently small, and therefore we focus on comparing the portion of the ROC curves for which FPR is less than 0.15. In order to compare the ROC curves, area-under-the-curve (AUC) is used (see Fawcett (2006), Friedman et al. (2010)). Tables 2-5 and 2-6 show the mean and standard deviation (over 100 simulations) of the AUCs for CSCS, Sparse Cholesky and Sparse Graph when $p \in \{1000, 2000\}$ and $n \in \{p/8, p/4, p/2, 3p/2\}$. It is clear that CSCS has a better model selection performance as compared to Sparse Cholesky and Sparse Graph in all cases.

(a) As expected, Sparse Cholesky performs significantly worse than other methods when n < p.

(b) The figures also show that CSCS performs better model selection than Sparse Graph, although the difference in AUC is not as pronounced as with Sparse Cholesky. It should

be noted that CSCS obtains higher AUC than Sparse Graph for each of the 800 datasets (100 each for $p \in \{1000, 2000\}$ and $n \in \{p/8, p/4, p/2, 3p/2\}$). We also note that the variability is much lower for CSCS than the other methods. It is worth mentioning that for each of the 800 datasets, the data was centered to zero mean and scaled to unit variance before running each method. Firstly, this illustrates that scaling the data does not justify assuming that the latent variable conditional variances $\{D_{ii}\}_{i=1}^p$ are identically 1, evidenced by CSCS performing consistently better model selection as compared to Sparse Graph. Secondly, we observed that the three algorithms typically run much faster when the data is scaled; therefore, data standardization was performed in the interest of time given the extensive nature of our simulation study. Note that premultiplying a multivariate normal vector by a diagonal matrix does not affect the sparsity pattern of the Cholesky factor of the inverse covariance matrix.

As mentioned in Section 2.1, the assumption that $\{D_{ii}\}_{i=1}^p$ are identically 1 cannot be accounted for/justified by preprocessing the data, and can affect the estimation performance of the Sparse Graph approach. To illustrate this fact, we consider the settings p = 1000 and $n \in \{p/2, 3p/2\}$ and generate 50 datasets for a range of λ values similar to the model selection experiment above. Figure 2-7 shows the Frobenius norm difference (averaged over 50 independent repetitions) between the true inverse covariance matrix and the estimate ($\|\Omega - \hat{\Omega}\|_F$), where $\hat{\Omega}$ is the estimated inverse covariance matrix for CSCS and Sparse Graph, for a range of penalty parameter values for n = 500. For each method (CSCS and Sparse Graph), we start with a penalty parameter value near zero (0.01) and increase it till the Frobenius norm error becomes constant, i.e., the penalty parameter is large enough so that all the off-diagonal entries of the Cholesky parameter are set to zero. That is why the range of penalty parameter values for the error curves is different in the various parts of Figure 2-7. For n = 500, if the error is measured in terms of estimating the covariance matrix, CSCS achieves a minimum error value of 0.2035 at λ = 0.2, and the maximum error value of 0.9526 is achieved at λ = 60 (or higher) when the resulting estimate

of $\Sigma$ is a diagonal matrix with the ith diagonal entry given by $S_{ii}$ for $1\le i\le p$. On the other hand, Sparse Graph achieves a minimum error value of 0.6635 at λ = 0.05, and a maximum error value of 0.9996 at λ = 70 (or higher) when the resulting estimate of $\Sigma$ is the identity matrix. If the error is measured in terms of estimating the inverse covariance matrix, CSCS achieves a minimum error value of 0.363 at λ = 0.25, and the maximum error value of 2.2 is achieved at λ = 0.05. On the other hand, Sparse Graph achieves a minimum error value of 0.7725 at λ = 70 (or higher), when the resulting estimate of $\Omega$ is a diagonal matrix with the ith diagonal entry given by $1/S_{ii}$ for $1\le i\le p$, and achieves a maximum error value of 2.798 at λ = 0.05. If the penalty parameter is chosen by BIC (see Table 2-8) then CSCS has a $\Sigma$-error value of 0.2334 and an $\Omega$-error value of 0.4054 (corresponding to λ = 0.3) and Sparse Graph has a $\Sigma$-error value of 0.7642 and an $\Omega$-error value of 1.7658 (corresponding to λ = 0.35). A similar pattern is observed for the case n = 1500. It is clear that CSCS has a significantly superior overall estimation performance than Sparse Graph in this setting.

2.3.3 Application to Call Center Data

In this section we discuss the application of CSCS, Sparse Cholesky and Sparse Graph to the call center data from Huang et al. (2006). The data, coming from one call center in a major U.S. northeastern financial organization, contain the information about the time every call arrives at the service queue. For each day in 2002, except for 6 days when the data-collecting equipment was out of order, phone calls are recorded from 7:00am until midnight. The 17-hour period is divided into 102 10-minute intervals, and the number of calls arriving at the service queue during each interval is counted. Since the arrival patterns of weekdays and weekends differ, the focus is on weekdays here. In addition, after using singular value decomposition to screen out outliers that include holidays and days when the recording equipment was faulty (see Shen and Huang (2005)), we are left with observations for 239 days. The data were ordered by time period. Denote the data for day i by $N_i = (N_{i,1}, \ldots, N_{i,102})'$, $i = 1, \ldots, 239$, where $N_{i,t}$ is the number of calls arriving at the call centre for the tth 10-minute interval on day i. Let $y_{it} = \sqrt{N_{it} + 1/4}$, $i = 1, \ldots, 239$, $t = 1, \ldots, 102$. We apply the

three penalized likelihood methods (CSCS, Sparse Graph, Sparse Cholesky) to estimate the 102×102 covariance matrix based on the residuals from a fit of the saturated mean model. That is, counts of each time period are centered by the mean of that period. Following the analysis in Huang et al. (2006), the $\ell_1$ penalty parameter for all three methods was picked using 5-fold cross validation on the training dataset as follows. Randomly split the full dataset D into K subsets of about the same size, denoted by $D_v$, $v = 1, \ldots, K$. For each v, we use the data $D - D_v$ to estimate $\hat{\Omega}^{-v}$ and $D_v$ to validate. Then pick λ to minimize:

$$CV(\lambda) = \frac{1}{K}\sum_{v=1}^K\left(-d_v\log|\hat{\Omega}^{-v}| + \sum_{i\in I_v} y_i'\hat{\Omega}^{-v}y_i\right)$$

where $I_v$ is the index set of the data in $D_v$, $d_v$ is the size of $I_v$, and $\hat{\Omega}^{-v}$ is the inverse variance-covariance matrix estimated using the training dataset $D - D_v$.

To assess the performance of different methods, we split the 239 days into training and test datasets. The data from the first T days (T = 205, 150, 100, 75) form the training dataset that is used to estimate the mean vector and the covariance matrix. The mean vector is estimated by the mean of the training data vectors. Four different methods, namely CSCS, Sparse Cholesky, Sparse Graph and S (sample covariance matrix), are used to get an estimate of the covariance matrix. For each of the three penalized methods, the penalty parameter is chosen both by cross-validation and the BIC criterion. Hence, we have a total of seven estimators for the covariance matrix. The log-likelihood for the test dataset (consisting of the remaining 239 − T days) evaluated at all the above estimators is provided in Table 2-9. For all training data sizes, CSCS clearly demonstrates superior performance as compared to the other methods. Also, the comparative performance of CSCS with other methods improves significantly with decreasing training data size. Huang et al. (2006) additionally use the estimated mean and covariance matrix to forecast the number of arrivals in the later half of the day using arrival patterns in the earlier half of the day. Following their method, we compared the performance of all the four methods under consideration (details provided in Supplemental Section B). We found that

all the three penalized methods outperform the sample covariance matrix estimator. However, as far as this specific forecasting task is concerned, the differences in their performance compared to each other are marginal. We suspect that for the purposes of this forecasting task, the estimated mean (same for all three methods) has a much stronger effect than the estimated covariance matrix. Hence the difference in forecasting performance is much smaller than the difference in likelihood values. Nevertheless, Sparse Cholesky has the best performance for training data size T = 205, 150 (when the sample size is more than the number of variables) and CSCS has the best performance for training data sizes T = 100, 75 (when the sample size is less than the number of variables). See Supplemental Section B for more details.

2.3.4 Application to HapMap Data

In this section, we analyze the HapMap phase 3 data from the International HapMap project (Consortium et al. 2010). The data consist of n = 167 humans from the YRI (Yoruba in Ibadan, Nigeria) population, and we focus on p = 201 consecutive tag SNPs on chromosome 22 (after filtering out infrequent sites with minor allele frequency < 10%). To assess the performance of different methods, we split the 167 individuals into training and test datasets. The data from T randomly selected individuals (T = 100, 116, 133) form the training dataset that is used to estimate the mean vector and the covariance matrix. The mean vector is estimated by the mean of the training data vectors. CSCS, Sparse Cholesky and Sparse Graph are then used to get an estimate of the covariance matrix. For each of the penalized methods, the penalty parameter is chosen both by cross-validation and the BIC criterion. For the Sparse Cholesky algorithm, the resulting estimates have at least one zero in the diagonals of the D matrix, which results in a singular estimate of the covariance matrix with a log-likelihood of negative infinity. Hence, in Figure 2-10, we report the log-likelihood for the test dataset (consisting of the remaining 167 − T individuals) evaluated at the four estimators (CSCS-CV, CSCS-BIC, SparseGraph-CV, SparseGraph-BIC). For all training

data sizes, CSCS coupled with cross-validation clearly demonstrates the best performance as compared to the other methods.

2.3.5 Application to Flow Cytometry Data

In this section, we analyze a flow cytometry dataset on p = 11 proteins and n = 7466 cells, from Sachs et al. (2003). These authors fit a directed acyclic graph (DAG) to the data, producing the network in Figure C-1A (in Supplemental Section C). The ordering of the connections between pathway components was established based on perturbations in cells using molecular interventions, and we consider the ordering to be known a priori. This dataset is analyzed in Friedman et al. (2008b) and Shojaie and Michailidis (2010) using the GLASSO algorithm and the Sparse Graph algorithm, respectively. In Friedman et al. (2008b), the authors estimated many graphs by varying the $\ell_1$ penalty and report around 50% false positive and false negative rates between one of the estimates and the findings of Sachs et al. (2003). Figure C-1 (in Supplemental Section C) shows the true graph as well as the estimated graphs using CSCS, Sparse Cholesky and Sparse Graph. We pick the penalty parameter by matching the sparsity to the true graph (approximately 72%). Here both Sparse Graph and CSCS perform better than Sparse Cholesky. In Shojaie and Michailidis (2010), the authors recommend using the following equation for penalty parameter selection: $\lambda_i(\alpha) = 2n^{-1/2}Z^*_{\frac{\alpha}{2p(i-1)}}$, where $Z^*_q$ denotes the $(1-q)$th quantile of the standard normal distribution. This choice uses a different penalty parameter for each row, and all the three penalized methods (Sparse Cholesky, Sparse Graph, CSCS) can be easily adapted to incorporate this. Using this method for Sparse Graph gives us a false positive rate of 0.46 and a true positive rate of 0.78, while Sparse Cholesky has a false positive rate of 0.62 and a true positive rate of 0.94. Hence, while Sparse Cholesky tends to find a lot of false edges, it fails to detect only one true edge. CSCS also fails to detect only one edge and thus has a true positive rate of 0.94. However, it does better overall as indicated by the lower false positive rate at 0.51. By picking the penalty

parameter according to BIC, Sparse Cholesky results in a completely sparse graph while CSCS and Sparse Graph return very dense graphs. The Matthews correlation coefficient (MCC) is defined as

$$MCC = \frac{TP\cdot TN - FP\cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$$

where TP, TN, FP and FN correspond to true positive, true negative, false positive and false negative, respectively. The value of MCC ranges from -1 to 1, with larger values corresponding to better fits (-1 and 1 represent worst and best fits, respectively). The highest MCC value is attained by using the penalty parameter according to $\lambda_i(\alpha) = 2n^{-1/2}Z^*_{\alpha/(2p(i-1))}$ with CSCS. The true and false positives and MCC values for the 72% sparsity, normal quantile and BIC based estimates are provided in Table 2-11.

2.4 Asymptotic Properties

In this section, asymptotic properties of the CSCS algorithm will be examined in a high-dimensional setting, where the dimension $p = p_n$ and the penalty parameter $\lambda = \lambda_n$ vary with n. In particular, we will establish estimation consistency and model selection consistency (oracle properties) for the CSCS algorithm under suitable regularity assumptions. Our approach is based on the strategy outlined in Meinshausen and Buhlmann (2006) and Massam, Paul and Rajaratnam (2007). A similar approach was used by Peng et al. (2009) to establish asymptotic properties of SPACE, which is a penalized pseudo-likelihood based algorithm for sparse estimation of $\Omega$. Despite the similarity in the basic line of attack, there is an important structural difference between the asymptotic consistency arguments in Peng et al. (2009) and this section (apart from the fact that we are imposing sparsity in L, not $\Omega$). For the purpose of proving asymptotic consistency, the authors in Peng et al. (2009) assume that the diagonal entries of $\Omega$ are known, thereby reducing their objective function to the sum of a quadratic term and an $\ell_1$ penalty term in $\Omega$. The authors in Shojaie and Michailidis (2010) also establish graph selection consistency of the Sparse Graph approach under the assumption that the diagonal entries of L are 1. We do not make such

an assumption for L, which leaves us with p additional non-zero parameters, and additional logarithmic terms in the objective function to work with. Nevertheless, we are able to adapt the basic consistency argument in this challenging setting with an almost identical set of regularity assumptions as in Peng et al. (2009) (with assumptions on $\Omega$ replaced by the same assumptions on L). In particular, we only replace two assumptions in Peng et al. (2009) with a weaker and a stronger version respectively (see Assumption (A4) and Assumption (A5) below for more details).

We start by establishing the required notation. Let $\{\bar\Omega_n = \bar{L}_n^t\bar{L}_n\}_{n\ge1}$ denote the sequence of true inverse covariance matrices, and $\bar\gamma^r_n$ denote the lower triangular entries (including the diagonal) in the rth row of $\bar{L}_n$, for $1\le r\le p$. Let $A^r_n$ denote the set of indices corresponding to non-zero entries in the rth row of $\bar{L}_n$ for $1\le r\le p$, and let $q_n = \sum_{r=1}^{p_n}|A^r_n|$. Let $\bar\Sigma_n = \bar\Omega_n^{-1}$ denote the true covariance matrix for every $n\ge1$. The following standard assumptions are required.

(A1 - Bounded eigenvalues) The eigenvalues of $\bar\Omega_n$ are bounded below by $\lambda_{min} > 0$, and bounded above by $\lambda_{max} < \infty$ uniformly for all n.

(A2 - Sub-Gaussianity) The random vectors $Y_1, \ldots, Y_n$ are i.i.d. sub-Gaussian for every $n\ge1$, i.e., there exists a constant $c > 0$ such that for every $x\in\mathbb{R}^{p_n}$, $E\left[e^{x'Y_i}\right] \le e^{cx'x}$.

(A3 - Incoherence condition) There exists $\delta < 1$ such that for every $n\ge1$, $1\le r\le p_n$ and $j\notin A^r_n$,

$$\left|\bar\Sigma_{n,j,A^r_n}\left(\bar\Sigma_{n,A^r_n,A^r_n} + \frac{\lambda_n}{2\,(\bar\gamma^r_{rr})^2}\Lambda_r\right)^{-1}\operatorname{sign}\left(\bar\gamma^r_{A^r_n}\right)\right| \le \delta.$$

Here, $\Lambda_r$ is a $|A^r_n|\times|A^r_n|$ matrix with $(\Lambda_r)_{jj'} = 1$ if $j = j' = |A^r_n|$, and 0 otherwise.

(A4 - Signal size) For every $n\ge1$, let $s_n = \min_{1\le r\le p}\min_{j\in A^r_n}\left|\bar\gamma^r_{n,j}\right|$.

Then $\frac{s_n}{\sqrt{d_n}\,\lambda_n} \to \infty$, where $d_n = \max_{1\le r\le p_n}|A^r_n|$. This assumption will be useful for establishing sign consistency. The signal size condition in Peng et al. (2009) is $\frac{s_n}{\sqrt{q_n}\,\lambda_n} \to \infty$, which is stronger than the signal size condition above, as $d_n \le q_n$.

(A5 - Growth of $p_n$, $q_n$ and $\lambda_n$) The following conditions hold: $p_n = O(n^\kappa)$ for some $\kappa > 0$, $q_n = o\left(\sqrt{\frac{n}{\log n}}\right)$, $\sqrt{\frac{q_n\log n}{n}} = o(\lambda_n)$, $\lambda_n\sqrt{\frac{n}{\log n}} \to \infty$ and $q_n\lambda_n \to 0$ as $n\to\infty$. The growth conditions in Peng et al. (2009) are the same as above (with $q_n$ denoting the sparsity in the true $\Omega$ in Peng et al. (2009)), except that $q_n\lambda_n \to 0$ above is replaced by the weaker assumption $\sqrt{q_n}\,\lambda_n \to 0$.

Under these assumptions, the following consistency result can be established.

Theorem 2.4.1. Suppose that (A1)-(A5) are satisfied. Then there exists a constant C > 0, such that for any $\eta > 0$, the following events hold with probability at least $1 - O(n^{-\eta})$:

(i) A solution of the minimization problem

$$\inf_{L\in\mathcal{L}_{p_n}} Q_{CSCS}(L) \qquad (2\text{-}18)$$

exists.

(ii) (Estimation and sign consistency): any solution $\hat{L}_n$ of the minimization problem in (2-18) satisfies $\|\hat{L}_n - \bar{L}_n\| \le C\sqrt{q_n}\,\lambda_n$ and $\mathrm{sign}(\hat{L}_{n,ij}) = \mathrm{sign}(\bar{L}_{n,ij})$, for every $1\le j\le i\le p$. Here sign(x) takes the values $\{-1, 0, 1\}$ when $x < 0$, $x = 0$, and $x > 0$ respectively.

A proof of the above result is provided in the appendix.

2.5 Discussion

This work proposes a novel penalized likelihood based approach for sparse Cholesky based covariance estimation for multivariate Gaussian data, when a natural ordering of variables is available. The goal is to overcome some of the shortcomings of current methods, but at the same time retain their respective strengths. We start with the objective function for the highly

useful Sparse Cholesky approach in Huang et al. (2006). Reparametrization of this objective function in terms of the inverse of the classical Cholesky factor of the covariance matrix, along with appropriate changes to the penalty term, leads us to the formulation of the CSCS objective function. It is then shown that the CSCS objective function is jointly convex in its arguments. A coordinate-wise minimization algorithm that minimizes this objective, via closed form iterates, is proposed, and subsequently analyzed. The convergence of this coordinate-wise minimization algorithm to a global minimum is established rigorously. It is also established that the estimate produced by the CSCS algorithm always leads to a positive definite estimate of the covariance matrix - thus ensuring that CSCS leads to well defined estimates that are always computable. Such a guarantee is not available with the Sparse Cholesky approach when n < p […]
[…]

(d) For estimation purposes, CSCS can lead to significant improvements in performance over Sparse Graph (Section 2.3.2).

Figure 2-4. Plot of the iterates for $D_{77}$ for Sparse Cholesky in a setting with p = 8. It shows how the value jumps to 0 (and stays there).

Figure 2-5. Mean and standard deviation of area-under-the-curve (AUC) for 100 simulations for p = 1000. Each simulation yields a ROC curve from which the AUC is computed for FPR in the interval [0.01, 0.15]. CSCS achieves the highest AUC in each column.

                   n=125                n=250                n=500                n=1500
  Solver           Mean      Std.Dev.   Mean      Std.Dev.   Mean      Std.Dev.   Mean      Std.Dev.
  Sparse Cholesky  0.012796  0.000045   0.018461  0.000108   0.078832  0.000122   0.127916  0.000027
  Sparse Graph     0.113955  0.000200   0.129142  0.000048   0.135271  0.000066   0.138633  0.000026
  CSCS             0.118440  0.000111   0.133958  0.000036   0.138492  0.000023   0.139891  0.000001

Figure 2-6. Mean and standard deviation of area-under-the-curve (AUC) for 100 simulations for p = 2000. Each simulation yields a ROC curve from which the AUC is computed for FPR in the interval [0.001, 0.15]. CSCS achieves the highest AUC in each column.

                   n=250                n=500                n=1000               n=3000
  Solver           Mean      Std.Dev.   Mean      Std.Dev.   Mean      Std.Dev.   Mean      Std.Dev.
  Sparse Cholesky  0.015131  0.000050   0.032391  0.000105   0.124284  0.000058   0.142678  0.000012
  Sparse Graph     0.141957  0.000044   0.146362  0.000009   0.147984  0.000005   0.148742  0.000001
  CSCS             0.144686  0.000019   0.147839  0.000004   0.148722  0.000002   0.148904  0.000001
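The TPR/FPR pairs underlying these AUC numbers can be computed with a short helper. A sketch of the support-recovery rates for the strictly lower triangle of L (conventions ours, not necessarily the authors'):

```python
import numpy as np

def tpr_fpr(L_hat, L_true, tol=1e-8):
    """True/false positive rates for off-diagonal support recovery in L,
    where 'positive' means an entry declared non-zero (a sketch)."""
    p = L_true.shape[0]
    mask = np.tril(np.ones((p, p), dtype=bool), k=-1)   # strictly lower triangle
    est = (np.abs(L_hat) > tol) & mask                  # estimated edges
    tru = (np.abs(L_true) > tol) & mask                 # true edges
    tpr = (est & tru).sum() / tru.sum()
    fpr = (est & ~tru).sum() / (mask & ~tru).sum()
    return tpr, fpr
```

Sweeping λ over a grid and recording (FPR, TPR) at each value traces out the ROC curves summarized above.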

Figure 2-7. Frobenius norm error for $\Sigma$ and $\Omega$ averaged over 50 replications for n = 500 for different penalty parameter values. (A) $\Sigma$ for CSCS (B) $\Sigma$ for Sparse Graph (C) $\Omega$ for CSCS (D) $\Omega$ for Sparse Graph

Figure 2-8. Frobenius norm error for λ chosen by BIC for CSCS and Sparse Graph for p = 1000

                 n=500 ($\Omega$)    n=1500 ($\Omega$)   n=500 ($\Sigma$)    n=1500 ($\Sigma$)
  CSCS           0.4054 (0.0018)     0.4154 (0.0012)     0.2334 (0.0107)     0.2043 (0.0084)
  Sparse Graph   1.7658 (0.0036)     1.9089 (0.0023)     0.7642 (0.0040)     0.7669 (0.0021)

PAGE 46

TrainingDataSize Method 205 150 100 75 CSCS-CV -1090.447 -1369.181 -2225.907 -2841.348 CSCS-BIC -1072.75 -1364.145 -2214.729 -2849.931 SparseGraph-CV -1077.791 -2237.298 -3576.343 -4499.298 SparseGraph-BIC -1135.980 -2421.950 -3817.689 -4846.118 SparseCholesky-CV -1500.094 -2121.005 -3579.932 -496617558322 SparseCholesky-BIC -1523.409 -2178.738 -3584.160 -5444.471 S -1488.224 -7696.740 notpd notpd Figure2-9. Testdatalog-likelihoodvaluesforvariousestimationmethodswithtrainingdatasize205,150,100,75.Themaximumlikelihoodvalueineachcolumniswritteninbold. METHOD TrainingDataset CSCS-CV SparseGraph-CV CSCS-BIC SparseGraph-BIC 100 1718.216 -2784.134 582.6724 -3477.754 116 866.771 -2012.598 488.7348 -2687.4 133 860.0983 -1338.959 293.1705 -1757.237 Figure2-10. Testdatalog-likelihoodvaluesforvariousestimationmethodswithtrainingdatasizeT=100,116,133.Themaximumlikelihoodvalueineachcolumniswritteninbold. 72%Sparsity i()=2n)]TJ /F8 5.978 Tf 7.78 3.26 Td[(1 2Z 2p(i)]TJ /F8 5.978 Tf 5.75 0 Td[(1) BIC Solver FPTPMCC FPTPMCC FPTPMCC CSCS 0.24320.50000.2657 0.51350.94440.4848 0.86491.00000.2692 SparseCholesky 0.27030.44440.1817 0.62160.94440.3916 0.00000.00000.0000 SparseGraph 0.24320.50000.2657 0.45950.77780.3277 0.81081.00000.3232 Figure2-11. TPR&FPRforCellSignallingPathwayData 46

PAGE 47

CHAPTER3CHOLESKYBASEDESTIMATIONINGRAPHICALMODELS 3.1IntroductionSupposeY1,Y2,,Ynarei.i.d.observationsfromadistributiononRp.Letdenotethecovariancematrix,and=)]TJ /F8 7.97 Tf 6.59 0 Td[(1denotetheconcentrationmatrixortheinversecovariancematrixofthisdistribution.Inthisworkweconsiderconcentrationgraphmodels,wheresomeentriesoftheinversecovariancematrixarerestrictedtobezero.Thesemodelshavebeenofsignicantinterestinrecentyears,ashigh-dimensionaldatasetswithhundredsifnotthousandsofvariableswithasmallsamplesizehavebecomecommon.ThecasewhentheunderlyingdistributionisGaussianhasalsobeenofspecialinterest.Underthisassumption,zeroesintheconcentrationmatriximplyconditionalindependencies.Therearetwotasksinvolvedinthestatisticalanalysisofconcentrationgraphmodels.Therstismodelorgraphselection,orchoosingthepatternofstructuralzerostobeimposedon.Thesecondisp.d.estimation,i.e.,ndingapositivedeniteestimateofunderthediscoveredsparsitypattern.Suchestimatesareoftenrequiredfordownstreamapplications,especiallyinthecontextofnanceorclimatesciencedatasets.Themodel/graphselectionmethodsintheliteraturecanbebroadlyclassiedintotwoclasses.Therstclassofmethodsareiterativepenalizedlikelihoodmethodswhichmostlyachievemodelselectionandestimationtogether.See Friedmanetal. ( 2008b ); Pengetal. ( 2009 ); Friedmanetal. ( 2010 ); Rothmanetal. ( 2010b ); MazumderandHastie ( 2012 ); Khareetal. ( 2015 )andthereferencesthereinforconcentrationgraphmodels.Thesecondclassofmethodsarenon-iterativemethodsbasedonthresholdingi.e.,see HeroandRajaratnam ( 2012 )1.Thenon-iterativethresholdingbasedmethodsaremuchfasterandscalabletosignicantlyhigherdimensionalsettingsthanthepenalizedlikelihoodmethodswhichuseiterativeoptimizationalgorithms.Toclarify,theterm 1Thresholding-basedmethodsformodelselectioninconcentrationgraphmodelsarealsoprovidedin BickelandLevina ( 2008 ),buttheauthorsrestricttobandedsparsitypatterns. 47

PAGE 48

non-iterativemethodreferstoamethodwhichrequiresaxedandniteamountoftime,anddoesnotneedrepetitionsuntilconvergence.However,theaforementionedthresholdingbasedmethodsarenotguaranteedtoproducepositivedeniteestimatesofthecovariancematrix.Hencefordownstreamapplicationswhichrequireapositivedeniteestimateofthecovariancematrix,itiscriticalthatthethresholdingmethodsbecoupledwithmethodswhichprovideapositivedenitecovarianceestimateobeyingagivensparsitypattern.Wenowreviewexistingmethodsforpositivedeniteestimationinconcentrationgraphicalmodels,whentheunderlyingsparsitypatternisgiven.ForGaussianconcentrationgraphmodels,theMLEcannotbeobtainedinclosedformunlessthepatternofzerorestrictionscorrespondstoadecomposableorchordalgraph(see Lauritzen ( 1996 )).Aniterativealgorithmcallediterativeproportionaltting(IPF)wasproposedbySpeedandKiveri SpeedandKiveri ( 1986 )toobtaintheMLEinthiscase.Morerecently,amodicationoftheGLASSOalgorithmwhichtakestheknownsparsitypatternintoaccounthasalsobeendeveloped,see( Hastieetal. 2009 ,Algorithm17.1).ThereisasubstantialliteratureforBayesianestimationinGaussianconcentrationgraphmodels,see DawidandLauritzen ( 1993 ); Roverato ( 2002 ); LetacandMassam ( 2007 ); Rajaratnametal. ( 2008 )andthereferencestherein.Alloftheexistingapproachesforp.d.estimation(givenaspecicsparsitypattern)intheliteratureeitherarecomputationallyintensiveiterativealgorithmswhichhavetoberepeateduntilconvergence,and/orassumethatthepatternofzeroesisbandedorcorrespondstoadecomposablegraphand/orrelyonGaussianity.Hence,thereisaneedforfast,scalable,non-iterativemethodsforestimationinconcentrationgraphmodelswithageneralunderlyingsparsitypattern.Inthiswork,weaddressthiscrucialgapintheliteraturebedevelopingnon-iterativemethodsbasedontheCholeskydecompositionforp.d.estimationinconcentrationgraphmodelswithanarbitraryunderlyingpatternofzeroesintheconcentrationmatrix.Themethoddevelopedcanbesummarizedasfollows.Intherststep,estimatesareobtainedforanappropriatesmallerdecomposablepatternofzeros.TheCholeskydecompositionsoftheseestimatesarethen 48

PAGE 49

suitablymodiedtoobtainestimateswiththerequiredpatternofzeroesintheconcentrationmatrix.SincetheseestimatesarebasedontheCholeskydecomposition,achoiceoforderingofthevariablesneedstobemade.Inapplicationswhenthereisnonaturalordering,thischoiceisgovernedbycomputationalpurposes,andprincipledmethodstochoosetheappropriateorderingarediscussed.Thisworkisorganizedasfollows.Section 3.2 providesrequiredpreliminaries.TheproposedpositivedeniteestimationalgorithmisintroducedinSection 3.3 .InSection 3.4 wedemonstratethroughsimulationexperimentsandonrealdatathatasubstantialimprovement(upto50times)incomputationaleciencycanbeobtainedbyusingtheproposedmethodsascomparedtoIPF,whileroughlymaintainingthesamelevelofaccuracy.InSection 3.5 weestablishasymptoticconsistencyoftheproposedestimatesrstinaclassicalsettingwherethenumberofvariablesremainsxedasthesamplesizeincreases,andthen,moreimportantly,inahigh-dimensionalsettingwherethenumberofvariablesincreasesatanappropriatesub-exponentialratewithsamplesize. 3.2PreliminariesInthissectionweproviderequiredbackgroundmaterialfromgraphtheoryandmatrixalgebra. 3.2.1GraphsandVertexOrderingsAgraphG=(V,E)consistsofanitesetofverticesVsuchthatjVj=p,andasetofedgesEVV.Weconsiderundirectedgraphswithnoloops.Hence,(v,v)=2Eforeveryv2V,and(u,v)2E)(v,u)2E.Next,wedenethenotionsofvertexorderingandorderedgraphs.Thesewillbeusefulinsubsequentanalysis.Denition Denition1. (Vertexordering)AvertexorderingofthevertexsetVisabijectionfromVtoVp:=f1,2,,pg. Denition2. (Orderedgraph)ConsidertheundirectedgrahpG=(V,E)andanorderingofV.TheorderedgraphGhasvertexsetVpandedgesetE=f((u),(v)):(u,v)2Eg. 49

PAGE 50

3.2.2CholeskyDecompositionGivenapositivedenitematrix2P+p,thereexistsauniquelowertriangularmatrixLwithpositivediagonalentriessuchthat=LLT.ThematrixLwillbereferedtoastheCholeskyfactorof.TheCholeskydecompositionwillbeausefulingredientintheestimatorsconstructedinthiswork. 3.2.3TheSpacesPGandLGLetG=(V,E)beagraphonpverticesandletbeanorderingofV.LetP+denotethespaceofpppositivedenitematrices,andL+denotethespaceofpplowertriangularmatriceswithpositivediagonalentries.Weconsidertwomatrixspacesthatwillplayanimportantroleintheanalysis.TherstmatrixspacePG=2P+:ij=0if(i,j)=2E,isthecollectionofallpppositivedenitematriceswherethe(i,j)thentryiszeroif(i,j)arenotconnectedintheorderedgraphG.TheothermatrixspaceLG=L2L+:LLT2PG,isthecollectionofallpplowertrinagularmatriceswithpositivediagonalentries,whichcanbeobtainedasaCholeskyfactorofamatrixinPG. 3.2.4DecomposableGraphsAnundirectedgraphGissaidtobedecomposableifitdoesnotcontainacycleoflengthgreaterthanorequaltofourasaninducedsubgraphandifitisconnected.ThereaderisreferrredtoLauritzen Lauritzen ( 1996 )forallthecommonnotionsofgraphicalmodels(andinparticulardecomposablegraphs)thatwewillusehere.Onesuchimportantnotionisthatofaperfectorderofthecliques.Everydecomposableadmitsaperfectorderofitscliques.Let(C1,C2,,Ck)beonesuchperfectorderofthecliquesofthegraphG.ThehistoryforthegraphisdenedbyH1=C1andHj=C1[C2[Cj,j=2,3,,k, 50

PAGE 51

andtheminimalseparatorsforthegrapharedenedbySj=Hj)]TJ /F8 7.97 Tf 6.58 0 Td[(1\Cj,j=2,3,,k.LetRj=CjnHj)]TJ /F8 7.97 Tf 6.58 0 Td[(1forj=2,3,,k.Someoftheseseparatorscanbeidentical.Letk0k)]TJ /F10 11.955 Tf 12.76 0 Td[(1denotethenumberofdistinctseparatorsand(S)denotethemultiplicityofS,i.e.,thenumberofjsuchthatSj=S.Generally,wewilldenotebyCthesetofcliquesofagraphandbySitssetofseparators. Denition3. AnorderingofVisdenedtobeaperfectvertexeliminationschemeforGifforeveryu,v,w2Vsuchthat(u)>(v)>(w)and((u),(w))2E,((v),(w))2E,wehave((u),(v))2E.Aconstructivewaytoobtainaperfectvertexeliminationschemeisgivenasfollows.LabeltheverticesindescendingorderstartingwithverticesinC1,R2,R3,,Rk,whereverticesbelongingtoaeachparticularsetareorderedarbitrarily. 3.2.5CholeskyFactorsandFill-inEntriesLetGbeadecomposablegraph,andanorderingofV.In Paulsenetal. ( 1989a ),itisprovedthatisaperfectvertexeliminationschemeforGifandonlyifLG=L2L+:Lij=0ifi>j,(i,j)2E,i.e,thestructuralzeroesineverymatrixinPGarealsoreectedinitsCholeskyfactor.However,ifthegraphGisnotdecomposable,orifGisdecomposablebutisnotaperfectvertexeliminationscheme,thenforanymatrixinPG,thezeroesintheCholeskyfactorareingeneralasubsetofthestructuralzeroesinPG.Theextranon-zeroentriesintheCholeskyfactorarereferedtoasthell-inentriescorrespondingtoPG.Forexample,ifG=(V,E)isthe4-cycle,andissuchthatE=f(1,2),(2,3),(3,4),(4,1)g, 51

PAGE 52

then=0BBBBBBB@31011310013210231CCCCCCCA2PG,anditsCholeskyfactorLisgivenbyL=0BBBBBBB@1.7320000.5771.6330000.6121.62000.577)]TJ /F10 11.955 Tf 9.3 0 Td[(0.2041.3120.9511CCCCCCCA.Theonlyll-inentryinthiscaseisthe(4,2)thentry.Next,wedenethenotionofalledgraph. Denition4. ThelledgraphforG,denotedbyGD=(Vp,ED),isdenedasfollows. 1. Forevery2PG,theCholeskyfactorLsatisesLij=0ifi>j,(i,j)=2ED. 2. GDisanordereddecomposablegraph,i.e.,GDisadecomposablegraphandisaperfectvertexeliminationschemeforGD. 3. TheedgesetEDofGDisasubsetoftheedgesetofeverygraphsatisfyingproperties1and2.Essentially,thelledgraphforGisthe(ordered)decomposablegraphwhoseedgesetcontainsEalongwithalltheedgescorrespondingtothell-inentriesformatricesinLG.Intheexampleofthe4-cycleconsideredabove,thelledgraphGDhasedgesetgivenbyED=f(1,2),(2,3),(3,4),(4,1),(4,2)g.AsimpleproceduretoobtainGD(givenGand)usingtheeliminationtreemethod(see Davis ( 2006 ))isgivenbelow. 1. StartwithED=E.Setk=1. 2. LetN>k=fj:j>k,(j,k)2EDg.AddnecessaryedgestoEDsothatN>kisaclique. 52

PAGE 53

3. Setk=k+1.Ifk
PAGE 54

graphG.Anorderingofthevariablesisalsoassumedtobegiven.AsmentionedinSection 3.2.6 ,ifthereisnonaturalorderingamongthevariables,weusetheCMllreducingorderingforthegraphG.ThetaskconsideredinthissectionistondanestimateoflyinginPG.TherststepinourapproachistondanestimatorofinPGD.RecallthatGD,whichdenotesthelledgraphforG,isadecomposablegraphwhoseedgesetsubsumestheedgesetofG.LetCDdenotethesetofcliquesofGDandletSDdenotethesetofseparatorsofGD.LetSdenotethesamplecovariancematrix,andassumethatn>maxC2CDjCj.Considertheestimator bD=XC2CD(SC))]TJ /F8 7.97 Tf 6.59 0 Td[(10)]TJ /F16 11.955 Tf 18.59 11.36 Td[(XSep2SD(SSep))]TJ /F8 7.97 Tf 6.59 0 Td[(10,(3{1)where(SC))]TJ /F8 7.97 Tf 6.59 0 Td[(10:=8>><>>:(S)]TJ /F8 7.97 Tf 6.59 0 Td[(1C)ijifi,j2C,0otherwise,ans(SSep))]TJ /F8 7.97 Tf 6.58 0 Td[(10isdenedsimilarly.Itiswellknownthat(seeLauritzen Lauritzen ( 1996 ))bDistheMLEforifweassumethat2PGD,andtheunderlyingdistributionisGaussian.Thisestimatorisclosedformandexistsevenwhenp>n(aslongasn>maxC2CDjCj).LetbLDdenotetheCholeskyfactorofbD.Itcanbeshown(see Xiangetal. ( 2015 ))thatthenon-zeroentriesofthejthcolumnofbLDcanbeobtainedas bLD,>j=)]TJ /F10 11.955 Tf 46.33 9.32 Td[((S>j))]TJ /F8 7.97 Tf 6.58 0 Td[(1S>j q Sjj)]TJ /F10 11.955 Tf 11.96 0 Td[((S>j)T(S>j))]TJ /F8 7.97 Tf 6.59 0 Td[(1S>jandbLjj=1 q Sjj)]TJ /F10 11.955 Tf 11.96 0 Td[((S>j)T(S>j))]TJ /F8 7.97 Tf 6.59 0 Td[(1S>j(3{2)forevery1jp.HereA>j=((Akl))k,l>j,(k,j),(l,j)2EDandA>j=(Akj)k>j,(k,j)2EDforanymatrixA.ThesecondandnalstepinpurapproachistomodifytheCholeskyfactorbLDtoalowertriangularmatrixbLsuchthatb=bLbLT2PG.ThedetailsofthisapproachareprovidedinAlgorithm 3.1 ,whichwerefertoastheConstrainedCholeskyApproach(CCA).Notethatb(theresultingestimatefromAlgorithm 3.1 )isobtainedinaxedandnitenumberofsteps,andthereisnorepetitionofanidenticalsequenceofstepsuntilconvergence.Thefollowing 54

PAGE 55

lemmaestablishesthatbproducedbyAlgorithm 3.1 liesinPG,i.e.,itispositivedeniteandhastherequiredpatternofzeros. Lemma1. LetbbetheestimateoftheconcentrationmatrixproducedbyAlgorithm 3.1 .Ifn>maxC2CDjCj,thenb2PG.ProofItisclearbyconstructionthatifi>j,(i,j)=2E,thenbij=(bLbLT)ij=jXk=1bLikbLjk=bLijbLjj+j)]TJ /F8 7.97 Tf 6.58 0 Td[(1Xk=1bLikbLjk=0.Hence,wegetbL2LGandb2PG.2 Algorithm3.1(h). CCAAlgorithmforobtainingpositivedeniteestimateofobeyingsparsityencodedinagivengraph Input:SamplecovariancematrixS Input:SparsitypatternencodedinagraphG Compute:RCMllreducingordering Compute:DecomposablecoverGD Compute:TheCholeskyfactorbLDofbD(asspeciedin( 3{2 )) SetbL bLD Fori=2,,p Forj=1,2,,i)]TJ /F10 11.955 Tf 11.96 0 Td[(1 55

PAGE 56

If(i,j)=2E,setbLij=)]TJ /F24 7.97 Tf 8 5.98 Td[(Pj)]TJ /F8 5.978 Tf 5.76 0 Td[(1k=1bLikbLjk bLjj Output:ReturnnalestimatesbLandb=bLbLT 3.3.1ComputationalDetailsandComplexityfortheCCAAlgorithmInthissection,weproviderelevantcomputationaldetailsandevaluatethecomputationalcomplexityofeachstepintheCCAAlgorithm(Algorithm 3.1 ). ThecomplexityofthesteptodeterminetheRMCllreducingorderingisO(jEj)(see lineartimeimplementationofthereverseCuthill-McKeealgorithm ( 1980 )). ThecomplexityofcomputingGD(thedecomposablecover)usingtheeliminationtreemethodoutlinedinSection 3.2.5 isO)]TJ 5.48 -0.72 Td[(Ppj=1n2j,wherenj=fi:1jj,(i,j)=2ED.Also,byconstruction,bLij=bLDijif(i,j)2Eori=j.Hence,theonlyentrieswhichwillbechangedinthelaststepofAlgorithm 3.1 arethell-inentriesortheedgesinEDnE.ThecomplexityofmodifyingtheseentriesinthealgorithmisO)]TJ /F3 11.955 Tf 5.48 -9.68 Td[(pEDnE.ThefollowinglemmacombinestheaboveanalysistoprovidetheoverallcomputationalcomplexityofAlgorithm 3.1 Lemma2. ThecomputationalcomplexityofAlgorithm 3.1 ismin O p+pXj=1n3j+jEj+pEDnE!,O)]TJ /F3 11.955 Tf 5.47 -9.69 Td[(p3!.EachiterationoftheecientversionofIteratedProportionalFitting(IPF)implementedintheglassopackageinRhasroughlythesamecomputationalcomplexityasAlgorithm 3.1 (anexactcomparisonisingeneralnotpossibleastheIPFalgorithmdoesnotusethe 56

PAGE 57

decomposablecoverGD,butweexpectthecomplexitiestobeinthesameballpark).ThemaincomputationaladvantageofAlgortihm 3.1 liesinthefactthatIPFhastorepeattheseiterationstillconvergence(basedonsomecriterion)isachieved,whilenosuchrepetitionisnecessaryforAlgorithm 3.1 3.4Illustrations 3.4.1SimulationExperimentInthissection,wecomparetheperformanceoftheCCAalgorithmandtheIteratedProportionalFitting(IPF).TheIPFisalgorithmforcomputingtheMLEforinaGaussianconcentrationgraphicalmodelwhentheunderlyingsparsitypatternisknown.ThealgorithmwasoriginallyproposedbySpeedandKiveri SpeedandKiveri ( 1986 ).Forourcomparison,weuseamoreecientversiondescribedin( Hastieetal. 2009 ,Algorithm17.1)andimplementedintheglassopackage.Essentially,thisversionisamodicationofthewell-knownGLASSOalgorithmwhichaccountsforthefactthatthesparsitypatterninisknown.WewillrefertothisalgorithmastheGLASSO-IPFalgorithm.Foreachvalueofp2f500,1000,1500,2000,5000g,thefollowingprocedureisrepeated.Thetruehasarandomsparsitypatternofapproximately2pedges.Togenerate=LtLforagivenp,weinitallygenerateL,alowertriangularmatrix,inthefollowingmanner.RandomlyselectsomeentiresofLtobezero(usuallylessthan2p).Amongstthesewerandomlysethalfofthemtobepositiveandtheotherhalftobenegative.Inaddition,theo-diagonalentriesofLaredrawnfromtheuniformdistributionover[0.3,0.7].Finally,wechoosethediagonalentriesfromtheuniformdistributionover[2,5].Oncethisisdone,wexthethenumberofsamplesn=f0.5p,0.75p,p,2pgandgenerate50datasetseachfromthemultivariateGaussianandthemultivariate-tdistributionwith3degreesoffreedom.BoththeCCAalgorithmandtheGLASSO-IPFalgorithmarethenusedtoobtainestimatesofWealsopresentresultsfromahybridapproach,whichusestheCCAestimateastheinitialestimateinGLASSO-IPFasopposedtothediagonalmatrixwithdiagonalentriesf1=Siigpi=1.TheresultspresentedbelowaretheaveragerelativeFrobeniusNormestimationerrors(jj^)]TJ /F8 7.97 Tf 6.59 0 Td[(jj jjjj)overthe 57

PAGE 58

50datasetsgeneratedforeachsamplesizeandunderlyingdistributions.TheresultsofthesimulationsforthemultivariateGaussiancaseareprovidedinFigure 3-1 ,andtheresultsforthemultivariate-tdistributioncaseareprovidedinFigure 3-2 .TheresultsshowthattheCCAalgorithmisanorderofmagnitudefasterthantheGLASSO-IPFalgorithm,whileprovidingasimilarlevelofstatisticalprecision.ThescalabilityoftheCCAalgorithm(ascomparedtoIPF)isevenmorepronouncedintheheavy-tailedt-distributioncase.ThisdemonstratestheusefulnessandscalabiblityoftheCCAalgorithmformodernhigh-dimensionalapplications. CCA IPF CCA-IPF p n TimeNorm TimeNorm TimeNorm 500 250 5.22940.1005 28.60280.1014 10.10920.1009 500 375 5.30870.0817 28.74130.0822 10.15350.0819 500 500 5.32300.0700 27.90880.0703 9.39430.0701 500 1000 5.26900.0492 24.18790.0493 8.62320.0493 1000 500 34.83900.0762 536.31020.0765 340.23400.0762 1000 750 34.81450.0626 516.45520.0627 332.84060.0625 1000 1000 34.53590.0536 513.21370.0537 333.50850.0536 1000 2000 35.31030.0376 495.10100.0377 306.78690.0376 1500 750 112.70320.0648 2661.35930.0643 1884.97890.0641 1500 1125 112.88100.0529 2622.05920.0525 1841.18990.0524 1500 1500 112.97000.0457 2580.04300.0453 1808.99900.0452 1500 3000 116.03840.0322 2471.48570.0318 1737.13590.0319 2000 1000 289.77680.0555 8933.59900.0551 5179.31120.0549 2000 1500 298.91770.0454 8844.65380.0450 5112.45750.0449 2000 2000 299.47480.0391 8740.39680.0387 5016.72790.0387 2000 4000 309.34370.0276 8590.32360.0273 4870.12890.0273 5000 2500 4911.29810.0320 120252.44530.0321 33776.26930.0320 5000 3750 4945.78840.0261 113356.63400.0261 28047.79770.0261 5000 5000 5034.48400.0226 114867.70360.0226 29037.67650.0226 5000 10000 5156.17260.0160 114133.30910.0160 29248.75740.0160 Figure3-1. Runningtimeandestimationerror(inrelativeFrobeniusnorm)forCCA,IPFandCCA-IPF(useIPFwiththeCCAestimatorastheinitialvalue)withnormaldata 3.4.2ApplicationtoMinimumVariancePortfolioRebalancingTheminimumvarianceportfolioselectionproblemisdenedasfollows.Givenpriskyassets,letrikdenotethereturnofassetiovertimeperiodk;whichinturnisdenedasthechangeinitspriceovertimeperiodk,dividedbythepriceatthebeginningofthe 58

PAGE 59

CCA IPF CCA-IPF p n TimeNorm TimeNorm TimeNorm 500 250 5.53030.6172 46.29280.6157 13.38360.6163 500 375 5.53280.6247 37.89480.6237 12.44670.6240 500 500 5.58350.6429 42.30950.6422 12.31070.6424 500 1000 5.46240.6493 37.36440.6489 11.75470.6491 1000 500 33.78570.6348 1172.26070.6342 638.86890.6347 1000 750 33.87030.6368 741.77550.6363 431.21080.6366 1000 1000 34.35480.6472 811.04690.6469 459.18290.6471 1000 2000 34.50480.6463 663.10950.6462 416.00260.6463 1500 750 112.78440.6355 4501.25980.6354 2983.81320.6360 1500 1125 112.63330.6400 3860.27500.6399 2601.65070.6404 1500 1500 113.49490.6425 3945.81330.6425 2628.71580.6428 1500 3000 116.62610.6458 3813.88120.6458 2613.35250.6461 2000 1000 282.32230.6373 14832.82640.6373 8884.33610.6380 2000 1500 275.97430.6459 14390.12320.6459 8771.02630.6467 2000 2000 280.93010.6375 11145.91230.6375 6681.29320.6378 2000 4000 283.29530.6549 11734.89000.6549 7662.19580.7060 5000 2500 4929.44450.6442 152256.83700.6441 40433.04590.6442 5000 3750 5015.80210.6472 143138.25860.6471 38316.72480.6471 5000 10000 5070.00880.6529 141424.23380.6528 32711.41630.6528 Figure3-2. Runningtimeandestimationerror(inrelativeFrobeniusnorm)forCCA,IPFandCCA-IPF(useIPFwiththeCCAestimatorastheinitialvalue)withtdata period.LetrTk=(r1k,r2k,,rpk)denotethevectorofreturnsovertimeperiodk.LetwTk=(w1k,w2k,...,wpk)denotethevectorofportfolioweights,i.e.,wikdenotestheweightofassetiintheportfolioforthek-thtimeperiod.Alongpositionorashortpositionforassetiduringperiodkisgivenbythesignofwik,i.e.,wik>0forlong,andwik<0forshortpositionsrespectively.Thebudgetconstraintcanbewrittenas1Twk=1,where1denotesthevectorofallones.Letdenotethecovariancematrixofthevectorofreturns.Notethattheriskofagivenportfolioasmeasuredbythestandarddeviationofitsreturnissimply(wTkkwk)1=2.Theminimumvarianceportfolioselectionproblemforinvestmentperiodkcannowbeformallydenedasfollows:minwTkkwksubjectto1Twk=1 59

PAGE 60

Asthisisasimplequadraticprogram,ithasasimpleanalyticsolutionwk=(1T)]TJ /F8 7.97 Tf 6.58 0 Td[(11))]TJ /F8 7.97 Tf 6.59 0 Td[(1)]TJ /F8 7.97 Tf 6.59 0 Td[(1k1.Themostbasicapproachtotheportfolioselectionproblemoftenmakestheunrealisticassumptionthatreturnsarestationaryintime.Astandardapproachtodealingwiththenon-stationarityinsuchnancialtimeseriesistouseaperiodicrebalancingstrategy.Inparticular,atthebeginningofeachinvestmentperiodk=1,2,...,K,portfolioweightswTk=(w1k,w2k,...,wpk)arecomputedfromthepreviousNestdaysofobservedreturns(Nestiscalledtheestimationhorizon).Theseportfolioweightsarethenheldconstantforthedurationofeachinvestmentperiod.Theprocessisrepeatedatthestartofthenextinvestmentperiodandisoftenreferredtoasrebalancing.WenowconsidertheproblemofinvestinginthestocksthatfeatureintheS&P500index.TheS&P500isacompositebluechipindexconsistingof500stocks.(notethatanumberofstockswereremovedinouranalysisduetoitslimiteddataspan-intheendweused398stocks).Rebalancingtimepointswerechosentobeevery64weeksstartingfrom2000/01/01to2016/12/31(approximately17years).Startandenddatesofeachperiodareselectedtobecalendarweeks,andneednotcoincidewithatradingday.Thetotalnumberofinvestmentperiodsis14,andthenumberoftradingdaysineachinvestmentperiodisapproximately320days.Weshallcomparethefollowingthreemethodsforestimatingthecovariancematrix:samplecovariance,GLASSO-IPFandCCA.WeconsidervariouschoicesofNest,inparticular,Nest2375,450,525inouranalysis.NotethatonceachoiceforNestismade,itiskeptconstantforallthe14investmentperiods.NotethatforGLASSO-IPFandCCAweneedtopickagraphfortheconcentrationmatrix.WhenNestislessthan398,wethresholdthegeneralizedinverseforthesamplecovarianceandwhenNestislargerthan398,wethresholdtheinversetillweachieveasparsityofapproximately95%.WeusetheRealizedSharpeRatio,denedastherealizedexcessreturnoftheportfolioovertherisk-freerateperunitrealizedriskfortheentireinvestmenthorizontomeasuretheperformanceofvariousapproaches.InadditiontoCCAandGLASSO-IPFbased 60

PAGE 61

Sample-CovCCAIPFSP500 375NA0.4820.4710.1324500.3290.4790.4630.1325250.1180.4730.4750.132 Figure3-3. RealizedSharperatioofdierentinvestmentstrategiescorrespondingtodierentestimatorswithvariousvaluesofNest.ThemaximumannualizedSharperatiosforeachrowarehighlighted. 375450525 CCA1.921.741.67GLASSO-IPF3.774.775.31 Figure3-4. Averagerunningtime(inseconds)forestimatingthecovariancematrixofreturnswithCCAandGLASSO-IPFforvariousvaluesofNest. strategies,wealsousethesamplecovariancematrixbasedstrategy(forNest=450,525,denotebySample-Cov)alongwiththepassiveindextrackingstrategythattrackstheS&P500averageindex.BasedonFigure 3.4.2 wecanseethatCCAperformedthebestintermsofSharperatioswhenNest=375,450andalmostaswellasGLASSO-IPFwhenNest=525.BothofthesemethodsaresuperiortothesamplecovarianceandtheS&P500index.Asanothermeasureofperformance,weuseNormalizedwealthgrowthwhichisdenedastheaccumulatedwealthderivedfromtheportfoliooverthetradingperiodwhentheinitialbudgetisnormalizedtoone.Thismeasuretakesintoaccountbothtransactioncostsandborrowingcosts.Figures 3.4.2 and 3.4.2 showsthatintermsofNormalizedWealthGrowth,GLASSO-IPFandCCAperformverysimilarlyandbothperformbetterthanthesamplecovarianceandtheS&P500index.Overall,theresultsdemonstratethattheCCAandGLASSO-IPFbasedstrategiesareverycompetitivewitheachintermsofnancialperformance.However,asdemonstratedinFigure 3.4.2 ,theCCAalgorithmismuchfasterintermsofcomputationaltimecomparedtotheGLASSO-IPFapproach.ThisreinforcestheobservationsinSection 3.4.1 thattheCCAalgorithmprovidesasignicantlymorescalablealternativetotheGLASSO-IPFalgorithm,whileprovidingasimilarlevelofstatisticalprecision. 61

PAGE 62

Figure3-5. Normalizedwealthgrowthafteradjustingfortransactioncosts(0.5%ofprincipal)andborrowingcosts(interestrateof8%APR)withNest=375. Figure3-6. Normalizedwealthgrowthafteradjustingfortransactioncosts(0.5%ofprincipal)andborrowingcosts(interestrateof8%APR)withNest=450. 62

PAGE 63

3.5AsymptoticPropertiesInthissectionweexaminetheasymptoticconsistencypropertiesoftheproposedestimators.InSection 3.5.1 ,weestablishconsistencywhenthenumberofvariablespdoesnotvarywiththesamplesizen.InSection 3.5.2 ,weestablishconsistencyinahigh-dimensionalsetting,wherethenumberofvariablespisallowedtoincreasewiththesamplesizen. 3.5.1ConsistencyintheFixedpSettingConsiderthesettinginSection 3.3 ,andletbdenotetheestimateofobtainedbyAlgorithm 3.1 .Weprovethatundermildmomentassumptions,bisp n-consistentfor2PGasn!1,assumingthenumberofvariablesptobexed.LetusassumethattheunderlyingdistributionFofthei.i.d.observationsY1,Y2,,YnsatisestheassumptionZRpjxijaidF(x)<1,wherea1,a2,,aparenon-negativeintegerswhichsatisfyPpi=1ai=4.Underthisassumption,itfollowsbyadirectapplicationofthemultivariatecentrallimittheoremthatp n(S)]TJ /F10 11.955 Tf 11.97 0 Td[()convergestoamultivariateGaussianindistributionasn!1.WenowshowthattheestimatorbisadierentiablefunctionofthesamplecovariancematrixSbyestablishingthefollowingtwolemmas.Lemma Lemma8. Thefunction1,GD:P+!PGDdenedby1,GD(A)=XC2CD(A)]TJ /F8 7.97 Tf 6.59 0 Td[(1)C0)]TJ /F16 11.955 Tf 18.59 11.36 Td[(XSep2SD(A)]TJ /F8 7.97 Tf 6.59 0 Td[(1)Sep0isdierentiable.Also,ifA)]TJ /F8 7.97 Tf 6.59 0 Td[(12PGD,then1,GD(A)=A)]TJ /F8 7.97 Tf 6.59 0 Td[(1.ProofThedierentiabilityfollowsbyrepeatedapplicationofthedierentiabilityofthematrixinverseforpositivedenitematrices.Itisknownthat(see Lauritzen ( 1996 ))1,GD(A)uniquelyminimizesthefunctionf(K)=tr(KA))-222(jKAjforK2PGD.SinceA)]TJ /F8 7.97 Tf 6.58 0 Td[(1uniquelyminimizesfoverthespaceofallpositivedenitematrices,ifA)]TJ /F8 7.97 Tf 6.59 0 Td[(12PGD,itfollowsthat1,GD(A)=A)]TJ /F8 7.97 Tf 6.59 0 Td[(1.2 63

PAGE 64

Let2,G:L+!LGbedenedasthefunctionwhichtakesL2L+,andappliesthelaststepofAlgorithm 3.1 (withthetwoforloops)toconvertittoamatrixinLG. Lemma9. Thefunction3,G:P+!PGdenedby3,GD(A)=2,G(LA)2,G(LA)Tisdierentiable.HereLAdenotestheCholeskyfactorofA.Also,ifA2PG,then3,G(A)=A.ProofThetransformationofapositivedenitematrixAtoitsCholeskyfactorLAisclearlydierentiable.Itisclearfromconstructionthatfori>j,(2,G(LA))ijisapolynomialin(LA)ij,f2,G(LA)ikgkj.If(i,j)2E,thenbyconstruction2,G(LA)ij=(LA)ij.If(i,j)=2E,thenbytheinductionhypothesisandthefactthatLA2LG,itfollowsthat2,G(LA)ij==)]TJ /F16 11.955 Tf 11.29 8.97 Td[(Pj)]TJ /F8 7.97 Tf 6.58 0 Td[(1k=12,G(LA)ik2,G(LA)jk 2,G(LA)jj=)]TJ /F16 11.955 Tf 11.29 8.96 Td[(Pj)]TJ /F8 7.97 Tf 6.58 0 Td[(1k=1(LA)ik(LA)jk (LA)jj=(LA)ij.Theresultfollowsbyinduction.2 64

PAGE 65

NotethatbD=1,G(S)andb=3,G(bD).Hence,wegetthatb=3,G(1,G(S)).Since2PG,itfollowsthat=3,G(1,G()).UsingthemultivariateDeltamethodalongwiththeseobservationsestablishestheconvergenceofp n(b)]TJ /F10 11.955 Tf 12.71 0 Td[()toamutivariateGaussianlimitasn!1. 3.5.2ConsistencyintheHigh-dimensionalSettingInthissection,largesampleestimationconsistencyoftheCCAalgorithmisinvestigatedinahigh-dimensionalsetting.ConsideragainthesettinginSection 3.3 .Wewillassumethatthenumberofvariablesp=pnvarieswiththesamplesizen.Letfngn1denotethesequenceoftrueinversecovariancematrices,andfLngn1denotethecorrespondingsequenceofCholeskyfactors.ThegraphG=Gnencodesthesparsitypatterninn,andletGD=GDndenotethelledgraphforGwithrespecttoagivenorderingofthevariables.Letd=dnbethenumberofedgesinE=Enand~d=~dnbethenumberofedgesinED=EDn.Letc=~d)]TJ /F3 11.955 Tf 12.39 0 Td[(d.Wemakethefollowingmildregularityassumptions. (A1)(BoundedEigenvalues)Thereexists>0suchthat0
PAGE 66

onaneventC=CnwhereP(Cn)!1asn!1.Heren=M~dn2q logp n!0asn!1andMisaconstant.Intheargumentsthatfollow,westayrestrictedtotheeventCn.AsdiscussedinSection 3.3.1 ,IntheCCAalgorithm,weonlyneedtoupdatethell-inentriesofbLDtoobtainthenalestimatebL.Let^L(i),i=1,2,...,cbetheintermediateestimateswhere^L(i)and^L(i)]TJ /F8 7.97 Tf 6.59 0 Td[(1)dierinexactlyoneentry,whichistheithll-inentry.Inparticular,notethat^L(c)=bL.Withtherequiredassumptionsandnotationinhand,weestablishourconsistencyresultinthefollowingtheorem. Theorem3.5.1. Supposethatassumptions(A1)and(A2)aresatised.Thenthefollowingresultshold. (a) bL)]TJ /F10 11.955 Tf 12.21 2.66 Td[(L26 +1cnwithprobabilitytendingto1asn!1. (b) b)]TJ /F10 11.955 Tf 13.35 2.66 Td[(23 6 +1cnwithprobabilitytendingto1asn!1.Proof(a)Supposetherstll-inentryis(i1,j1).Then^L(1))]TJ /F10 11.955 Tf 12.21 2.66 Td[(L2^L(1))]TJ /F10 11.955 Tf 18.12 3.78 Td[(^L(0)2+^L(0))]TJ /F10 11.955 Tf 12.21 2.66 Td[(L2^L(0)i1,j1)]TJ /F16 11.955 Tf 13.15 18.44 Td[(Pk
PAGE 67

Notethat^L(0)2n+L2and2=LLT2=L221 .Thisimpliesthat^(0))]TJ /F10 11.955 Tf 13.35 2.65 Td[(2=^L(0)(^L(0))T)]TJ /F10 11.955 Tf 12.21 2.65 Td[(LLT2^L(0)2^L(0))]TJ /F10 11.955 Tf 12.21 2.65 Td[(L2+L2^L(0))]TJ /F10 11.955 Tf 12.21 2.65 Td[(L2=(^L(0)2+L2)^L(0))]TJ /F10 11.955 Tf 12.21 2.65 Td[(L2(n+2L2)^L(0))]TJ /F10 11.955 Tf 12.2 2.65 Td[(L23 p nwherethelastinequalityfollowsbecuaseforlargen,n1 p .Inaddtion,asLiip and^L(0)j1,j1Lj1,j1)]TJ /F16 11.955 Tf 11.96 13.74 Td[(Lj1,j1)]TJ /F10 11.955 Tf 12.21 2.65 Td[(^L(0)j1,j1,wehavethatmax1ipLj1,j1)]TJ /F10 11.955 Tf 12.2 2.66 Td[(^L(0)j1,j1np 2withhighprobability.Hence,forlargen,^L(0)j1,j1p )]TJ /F11 7.97 Tf 13.16 11.5 Td[(p 2p 2.Itfollowsthat,^L(1))]TJ /F10 11.955 Tf 12.2 2.66 Td[(L26 +1n,8nNSimilarly,^L(k))]TJ /F10 11.955 Tf 12.21 2.66 Td[(L26 +1cn8k2f1,...,cgTherefore,^L)]TJ /F10 11.955 Tf 12.21 2.66 Td[(L2=^L(c))]TJ /F10 11.955 Tf 12.21 2.66 Td[(L26 +1cn!0asn!1. 67

PAGE 68

(b)Notethatb)]TJ /F10 11.955 Tf 13.35 2.66 Td[(2=bLbLT)]TJ /F10 11.955 Tf 12.2 2.66 Td[(LLT2=bLbLT)]TJ /F10 11.955 Tf 12.2 2.65 Td[(LbLT+LbLT)]TJ /F10 11.955 Tf 12.2 2.65 Td[(LLT2bL)]TJ /F10 11.955 Tf 12.21 2.66 Td[(L2bL2+L@bL)]TJ /F10 11.955 Tf 12.21 2.66 Td[(L2bL)]TJ /F10 11.955 Tf 12.21 2.66 Td[(L22+2L2bL)]TJ /F10 11.955 Tf 12.21 2.66 Td[(L2.TheresultfollowsfromPart(a),Assumption(A1)andAssumption(A2).2 68

PAGE 69

CHAPTER4ESTIMATIONOFGAUSSIANDAGSWITHKNOWNPARTITIONBASEDPARTIALORDERING 4.1IntroductionInmoderndaystatistics,datasetswherethenumberofvariablesismuchhigherthanthenumberofsamplesaremorepervasivethantheyhaveeverbeen.Oneofthemajorchallengesinthissettingistoformulatemodelsanddevelopinferentialprocedurestounderstandthecomplexrelationshipsandmultivariatedependenciespresentinthesedatasets.Thecovariancematrixisthemostfundamentalobjectthatquantiesrelationshipsbetweenthevariablesinmultivariatedatasets.Hence,estimationofthecovariancematrixiscrucialinhigh-dimensionalproblemsandenablesthedetectionofthemostimportantrelationships.Inparticular,supposeY1,...,Yn2Rparei.i.d.randomvectorsformamultivariatedistributionwithmean0andcovariancematrix=)]TJ /F8 7.97 Tf 6.58 0 Td[(1.Inhigh-dimensionalsamplestarvedsettings,aneectiveandpopularmethodforestimatingimposessparsityontheentriesof(covariancegraphmodels),or(concentrationgraphmodels),orappropriateCholeskyfactorsof(directedacyclicgraphmodelsorBayesiannetworks).Thechoiceofanappropriatemodeloftendependsontheapplication.Inthiswork,wefocusonDirectedAcyclicGraph(DAG)models,whichasdiscussedbelow,imposesparsityinanappropriateCholeskyfactorof.Inparticular,considerthefactorizationoftheinversecovariancematrix=BtB,whereBcanbeconvertedtoalowertriangularmatrixwithpositivediagonalentriesthroughapermutationofrowsandcolumns.ThesparsitypatternofBgivesusadirectedacyclicgraph(DAG),whichisdenedasG=(V,E)whereV=f1,...,pgandE=fi!j:Bij6=0g.TheassumptionBij=0correspondstoassuminganappropriateconditionalindependenceunderGaussianity,andcorrespondstoassuminganapproapriateconditionalcorrelationbeingzerointhegeneralsettting.TherearetwoimportantlinesofworkthathavecurrentlydealtwithDAGestimationintheGaussianframework-onewherethepermutationthatmakesBlowertriangularis 69

PAGE 70

knownandonewhereitisunknown.Webrifelydiscussthembelow.Inmanyapplications,anaturalordering,suchastimebasedorlocationbasedordering,ofthevariablespresentsitself,andhenceanaturalchoiceforthepermutationwhichmakesBlowertriangularisavailable.Penalizedlikelihoodmethods,whichuseversionsofthe`1penalty,andminimizetherespectiveobjectivefunctionsoverthespaceoflowertriangularmatrices,havebeendevelopedin Huangetal. ( 2006 ); ShojaieandMichailidis ( 2010 ); Khareetal. ( 2017a ).Bayesianmethodsforthissettinghavebeendevelopedin Caoetal. ( 2017 ); Altamoreetal. ( 2013 ); Consonnietal. ( 2017 ).Formanyofthesemethods,highdimensionalconsistencyresultshavealsobeenestablished.Ifthepermutation/orderingthatmakesBlowertriangularisunkonwn,thentheproblembecomessignicantlymorechallenging,bothcomputationallyandtheortically.Inthissetting,severalscore-based,constraint-basedandhybridalgorithmsforestimatingtheunderlyingCPDAG1havebeendevelopedandstudiedintheliterature Spirtesetal. ( 2001 ); GeigerandHeckerman ( 2013 ); LamandBacchus ( 1994 ); Heckermanetal. ( 1995 ); Chickering ( 2003 ); EllisandWong ( 2008 ); Zhou ( 2011 ); KalischandBuhlmann ( 2007 ); Tsamardinosetal. ( 2006 ); Gamezetal. ( 2011 2012 ); vandeGeerandBuhlmann ( 2013 ).See AragamandZhou ( 2015 )foranexcellentanddetailedreview.Recently, AragamandZhou ( 2015 )havedevelopedapenalizedlikelihoodapproachcalledCCDrforsparseestimationofB,whichhasbeenshowntobesigncantlymorecomputationallyscalablethanpreviousapproaches.Inmanyapplications,forexamplethegenomicsapplicationconsideredinSection 4.3 ,specicinformationregardingthevaraiblesinavailable,whichallowsforapartitionofthevariablesintosetsV1,V2,,VksuchthatanypossibleedgefromavertexinVitoavertexinVjisdirectedfromvitovjifi
PAGE 71

subsetusnotknown,andhastobeinferredfromthedata.Thissettingisnotsubsumedinthepreviousappraochesmentionedabove,andfallssomewhereinthemiddleofthetwoextremesofhavingcompleteinformationregardingtheordering,andhavingnoinformationregardingtheordering.ThegoalofthisworkistodevelopahybridapproachforDAGestimationwhichleveragesthepartitionbasedorderinginformation.Wewillalsoshowthatusingthepartitioninformationleadstoareductioninthenumberofcomputations,andmoreimportantlyallowsforparallelprocessing,unlikeCCDr.Thiscanleadtosignicantimprovementincomputationalspeedandstatisticalperformance.Therestoftheworkisorganizedasfollows.InSection 4.2.1 ,wedevelop,anddescribeindetail,ourhybridalgortihmcalledPartition-DAG.InSection 4.3 ,weperformadetailedexperimentalstudytoevaluatetheperformanceofouralgorithmcomparedtoapproacheswhichdonotmakesuseofthepartialorderinginformation.InSection 4.3 ,weuseknownDAGsfromtheDREAM3competition( Prilletal. ( 2010 ); Marbachetal. ( 2010 2009 ))andperformasimualtionstudyandevaluatetheDAGestimationaccuracyofvariousapproaches.InSection 4.3 ,weperformasimilarsimulationstudy,thistimeusingrandomlygeneratedDAGswithmorenumberofvariables.Finally,inSection 4.3 ,weanalyzethecellsignallingnetworkdatasetfrom Sachsetal. ( 2003 ). 4.2DAGEstimationUsingPartitionInformationInordertounderstandhowonecanleveragethepartialorderinginformation,itiscrucialtounderstandtheworkings,similarities,anddierencesoftheCCDr AragamandZhou ( 2015 )andtheCSCS Khareetal. ( 2017a )algorithms,whicharestateoftheartintermsoncomputationalscalabilityandconvergencefortheboundarysettingswithcompletelyunknownandcompletelyknownvariableorderingrespectively.TheCSCSalgorithmisderivedunderthesettingwhereadomain-specicorderingofthevariableswhichmakesBlowertriangularisknown.Hence,theDAGestimationproblemboilsdowntoestimatingthesparsitypatterninL,thelowertriangularpermutedversionofB.Inotherwords,Lisalowertriangularmatrixwithpositivediagonalentriessuchthat=LtL. 71

PAGE 72

TheobjectivefunctionforCSCSisQcscs(L)=trace(S))]TJ /F10 11.955 Tf 13.15 8.09 Td[(1 2logjj+X1j
PAGE 73

function.However,therangeforQCCDr(B),whichisthesetofmatricesthatcanbeconvertedtoalowertriangularmatrixwithpositivediagonalentriesthroughapermutationoftherowsandcolumns,isnotconvexandgeneralresultsintheliteratureonlyguaranteeconvergenceofthesequenceofiteratestoalocalminimumoftheobjectivefunction.Inaddition,whileCSCScanbebrokendownintopparallelizableproblemsthesamecannotbesaidforCCDr,whichleadstoasignicantcomputationaldisadvatageforCCDr.Finally,asymptoticconsistencyforthegeneralsetting(withnorestrictionsoncondtionalvariances)forCCDrisn'tavailableasyet,whereasTherem4.1in Khareetal. ( 2017a )establishesbothmodelselectionandestimationconsistencyforCSCS.See Khareetal. ( 2017a )foramoredetailedcomparisonbetweenthesetwoalgorithms. 4.2.1Partition-DAGalgorithmwithTwoPartitionBlocksAsstatedintheintroduction,additionaldatalikegeneknock-outdataormoregeneralperturbationsdata Shojaieetal. ( 2014 )cangiveinformationaboutpartitionsofthevariableswherewehavepriorknowledgeaboutthedirectionoftheedgesbetweenpartitions,butnotwithinpartitions.WewillnowdiscusshowonecancreateahybridalgorithmfromCSCSandCCDrwhereweincorporatethisinformationforDAGestimation.Forsimplicitywewillinitiallyworkwiththecasewherethevariables,V=f1,...,pg,aredividedintotwogroupsV1=f1,...,mgandV2=fm+1,...,pgsuchthatwecannothaveanedgefromanodeinV2tooneinV1,butcanhaveonefromanodeinV1tooneinV2.Hence,8j2V1,8k2V2wehavethatBkj=0.ThisimpliesthatBhasblocktriangularformwhichresemblesB=0B@B110B21B221CA (4{1)ThediagonalblocksB11andB22areconstrainedsothateachmatrixisapermutedversionoflowertriangularmatrices,i.e.,B112BmandB222Bp)]TJ /F4 7.97 Tf 6.58 0 Td[(m,theentriesoftheo-diagonalblockB12areallzero.However,therearenoconstraintsontheo-diagonalblockB21.SimilartoCCDr,weconsideraGaussianlog-likelihoodbasedobjectivefunction,denotedbyQPDAG, 73

PAGE 74

givenbyQPDAG(B)=trace(S))]TJ /F10 11.955 Tf 13.15 8.08 Td[(1 2logjj+X1i6=jpjBijj.Here,ourgoalistominimizetheabovefunctionoverthespaceeBp,denedbyeBp=fB:B112Bm,B222Bp)]TJ /F4 7.97 Tf 6.58 0 Td[(m,B12=0g,asopposedtoCCDr,wherethegoalistominimizeoverthespaceBp.AsinCCDrandCSCS,wepursueacoordinatewiseminimizationapproach.Usingstraightforwardcalculations,wegetQPDAG(B)=trace(BtBS))]TJ /F10 11.955 Tf 13.15 8.09 Td[(1 2logBtB+X1i6=jpjBijj=pXi=1)]TJ /F4 7.97 Tf 13.78 5.53 Td[(pXh=1ShhB2ih+2p)]TJ /F8 7.97 Tf 6.59 0 Td[(1Xk=1pXl=k+1SklBikBil)]TJ /F10 11.955 Tf 11.96 0 Td[(logBii+pXj=1,j6=ijBijj (4{2)Forcoordinatewiseminimization,weneedtounderstandhowtominimizeQPDAGwithrespecttoanarbitraryelementBijofBgivenalltheotherelements.GiventhenatureoftheconstraintsoneachblockofB2eBp,weconsiderthreedierentcases. CaseI:(Diagonalentries-Bii)Itfollowsfrom( 4{2 )thatQPDAGisthesumofquadraticandlogarithmictermsinagivendiagonalentryBii(treatingotherentriesasxed).Inparticular,QPDAG(Bii)=SiiB2ii+2BiipXk=1,k6=iSikBik)]TJ /F10 11.955 Tf 11.96 0 Td[(logBii+termsindependentofBii.Simplecalculusshowsthattheuniqueminimizer(withrespecttoBii)oftheabovefunctionisgivenby^Bii=)]TJ /F10 11.955 Tf 9.29 0 Td[(2Ppk=1,k6=iSikBik+q 4(Ppk=1,k6=iSikBik)2+8Sii 2Sii (4{3) CaseII:(O-diagonalentriesinB21)ConsiderBij,wherem+1ipand1jm.SinceBji=0,itfollowsfrom( 4{2 )thatQPDAGisthesumofquadraticandabsolute 74

PAGE 75

valuetermsinBij(treatingotherentriesasxed).Inparticular,QPDAG(Bij)=SjjB2ij+2BijpXk=1,k6=jSjkBik+jBijj+termsindependentofBij. (4{4)Simplecalculusshowsthattheuniqueminimizer(withrespecttoBij)oftheabovefunctionisgivenby^Bij=S)]TJ /F16 11.955 Tf 11.29 8.97 Td[(Ppk=1,k6=jSjkBik Sjj, 2Sjj, (4{5)whereS(x,)=sign(x)maxfjxj)]TJ /F5 11.955 Tf 17.94 0 Td[(,0g. CaseIII:(O-diagonalentriesinB11andB22)ConsiderBij,where1i6=jmorm+1i6=jp.SinceB112BmandB222Bp)]TJ /F4 7.97 Tf 6.59 0 Td[(m,itfollowsthatatmostoneofBijorBjiisnon-zero.SoasinCCDr AragamandZhou ( 2015 ),wewilljointlyminimizeQPDAGasafunctionof(Bij,Bji).Thiscanbedoneasfollows.Ifaddinganon-zerovalueforBijviolatestheDAGconstraint,orequivalentlytheconstraintthatB112BmandB222Bp)]TJ /F4 7.97 Tf 6.58 0 Td[(m,thenwesetBij=0,andthenminimizeQPDAGasafunctionofBjiandupdatetheBjientryasspeciedin( 4{4 ),( 4{5 )withtherolesofiandjexchanged).Ifaddinganon-zerovalueforBjiviolatestheDAGconstraint,thenwesetBji=0,andthenminimizeQPDAGasafunctionofBijandupdatetheBijentryasspeciedin( 4{4 ),( 4{5 ).However,itispossiblethatneitherjBijj>0orjBjij>0violatestheDAGconstraint.Inthatcase,wecompute^Bijand^Bjiusingappropriateversionsof( 4{5 ),picktheonethatmakesalargercontributiontowardsminimizingQPDAG,andsettheotheronetozero.TheresultingcoordinatewiseminimizationalgorithmforQPDAG,calledPartition-DAG,whichrepeatedlyiteratesthroughalltheentriesofBbasedonthethreecasesdiscussedabove,isprovidedinAlgorithm 4-1 .WewouldliketonotethatCaseIIresemblesatypicalstepoftheCSCSalgorithm Khareetal. ( 2017a ),whileCaseIIIresemblesatypicalstepoftheCCDr-`1algorithm.ThisisthereasonweregardAlgorithm 4-1 asahybridofthesetwoalgorithms. 4.2.2TheCasewithMultiplePartitionBlocksAlgorithm 4-1 canbeeasilygeneralizedtothecasewherethevariablesarepartitionedintoRblocks,say,V1,V2,,VR,suchthatanyedgefromanodeuinVitoanodevinVjisdirectedfromutov,ifi
PAGE 76

thematrixBhasablocklowertriangularstructure,whichcanbedenotedasfollows.B=0BBBBBBB@B110...0B21B22...0............BR1BR2...BRR1CCCCCCCA (4{6)Inparticular,theparameterBliesinthespaceeBp,RgivenbyeBp,R=fB:Bii2Bmi)]TJ /F4 7.97 Tf 6.58 0 Td[(mi)]TJ /F8 5.978 Tf 5.75 0 Td[(1,Brs=0if1r
PAGE 77

whereQr(Br1,Br2,,Brr)=mrXi=mr)]TJ /F8 5.978 Tf 5.75 0 Td[(1+1)]TJ /F4 7.97 Tf 10.99 5.26 Td[(mrXh=1ShhB2ih+2mr)]TJ /F8 7.97 Tf 6.58 0 Td[(1Xk=1mrXl=k+1SklBikBil)]TJ /F10 11.955 Tf 11.96 0 Td[(logBii+mrXk=1,k6=ijBikjonlydependsonthetermsinblockrowr.Asaresult,theminimizationofeachblockrowcanbeimplemmentedinparallelasshowninAlgorithm 4-2 .Thiscanleadtohugecomputationaladvatages,asillustratedinourexperiments. 2. (Numberofcomputations)Withtheadditionalpartitioninformation,manyoftheentriesinBareautomaticallysettozero.ThisreducesthenumberofcomputationsPartition-DAGneedstodoincomparisontoCCDr.Inaddition,manyofthecomputationswecarryoutforpartition-DAGfallunderCaseIIasdiscussedinSection??,whichismuchsimplerandfasterthancomputationsunderCaseIII,whichiswhatCCDrneedstodoforeverysinglecoordinate. 3. (EstimationAccuracy)Thisisveryobvious,butleveragingmoreinformationcanleadtoaimprovedestimationaccuracyasborneoutinoursimulationsstudies. 4.3DataAnalysisInthissection,weanalyzeaowcytometrydatasetonp=11proteinsandn=7466cells,from Sachsetal. ( 2003 ).Theseauthorstadirectedacyclicgraph(DAG)tothedata,whichweassumetobethetrueDAG.Whenneeded,theorderingoftheconnectionsbetweenpathwaycomponentswereestablishedbasedonperturbationsincellsusingmolecularinterventionsandweconsiderthenecessaryorderingstobeknownapriori.ForCSCS,thismeanstheentireordering,whereasforPartition-DAGweletV1=fplcg,pip3,pip2,pka,pkcgandV2=fpraf,pjnk,p44.42,pakts473,pmek,p38g.Thisdatasetisanalyzedin Friedmanetal. ( 2008b )and ShojaieandMichailidis ( 2010 ).In Friedmanetal. ( 2008b ),theauthorsestimatedthemanygraphsbyvaryingthe`1penaltyandreportaround50%falsepositiveandfalsenegativeratesbetween 77

PAGE 78

Figure4-1. Partition-DAGalgorithmwith2blocks Algorithm4.1(H). SetBo=Ip Set>0.OptimizingB11 whilejjBn)]TJ /F3 11.955 Tf 11.95 0 Td[(Bojj1do SetBo=Bn fori=1,...,mdo ^Bnii=)]TJ /F10 11.955 Tf 9.29 0 Td[(2Pmk=1,k6=iSikBnik+q 4(Pmk=1,k6=iSikBnik)2+8Sii 2Sii endfor fori=1,...,mdo forj=1,...,mdo Checkifaddingedgej!iori!jviolatesDAG. Ifi!jviolatesDAG,setBnji=0and^Bnij=S)]TJ /F16 11.955 Tf 11.29 8.96 Td[(Pmk=1,k6=jSjkBnik Sjj, 2Sjj Ifj!iviolatesDAG,reverseindicesandrepeattheabovesteps. Otherwise,ifneitherviolatestheDAG,setq1=QPDAG(Bn)evaluatedwithBnij=^BijandBnji=0andq2=QPDAG(Bn)evaluatedwithBnji=^BjiandBnij=0 Ifq1>q2,setBnij=0andBnji=^Bji.Otherwise,setBnji=0andcalculateBnij=^Bij. endfor endfor endwhile.OptimizingB22 whilejjBn)]TJ /F3 11.955 Tf 11.95 0 Td[(Bojj1do SetBo=Bn fori=m+1,...,pdo ^Bnii=)]TJ /F10 11.955 Tf 9.3 0 Td[(2Ppk=m+1,k6=iSikBnik+q 4(Ppk=m+1,k6=iSikBnik)2+8Sii 2Sii endfor fori=m+1,...,pdo forj=m+1,...,pdo Checkifaddingedgej!iori!jviolatesDAG. Ifi!jviolatesDAG,setBnji=0and^Bnij=S)]TJ /F16 11.955 Tf 11.29 8.97 Td[(Ppk=m+1,k6=jSjkBnik Sjj, 2Sjj Ifj!iviolatesDAG,reverseindicesandrepeattheabovesteps. Otherwise,ifneitherviolatestheDAG,setq1=QPDAG(Bn)evaluatedwithBnij=^BijandBnji=0andq2=QPDAG(Bn)evaluatedwithBnji=^BjiandBnij=0 Ifq1>q2,setBnij=0andBnji=^Bji.Otherwise,setBnji=0andcalculateBnij=^Bij. endfor endfor.B21 fori=m+1,...,pdo forj=1,...,mdo SetBnji=0and^Bnij=SPpk=1,k6=jSjkBnik Sjj, 2Sjj endfor endfor endwhile 78

PAGE 79

Figure4-2. Partition-DAGalgorithmwithRblocks Algorithm4.2(H). SetBo=Ip Set>0.OptimizingB11 whilejjBn)]TJ /F3 11.955 Tf 11.95 0 Td[(Bojj1do SetBo=Bn fori=1,...,m1do ^Bnii=)]TJ /F10 11.955 Tf 9.29 0 Td[(2Pm1k=1,k6=iSikBnik+q 4(Pm1k=1,k6=iSikBnik)2+8Sii 2Sii endfor fori=1,...,m1do forj=1,...,m1do Checkifaddingedgej!iori!jviolatesDAG. Ifi!jviolatesDAG,setBnji=0and^Bnij=S)]TJ /F16 11.955 Tf 11.29 8.96 Td[(Pm1k=1,k6=jSjkBnik Sjj, 2Sjj Ifj!iviolatesDAG,reverseindicesandrepeattheabovesteps. Otherwise,ifneitherviolatestheDAG,setq1=QPDAG(Bn)evaluatedwithBnij=^BijandBnji=0andq2=QPDAG(Bn)evaluatedwithBnji=^BjiandBnij=0 Ifq1>q2,setBnij=0andBnji=^Bji.Otherwise,setBnji=0andcalculateBnij=^Bij. endfor endfor endwhile forr=2,...,Rdo.OptimizingrthBlockRow whilejjBn)]TJ /F3 11.955 Tf 11.95 0 Td[(Bojj1do SetBo=Bn fori=mr)]TJ /F8 7.97 Tf 6.59 0 Td[(1+1,...,mrdo ^Bnii=)]TJ /F10 11.955 Tf 9.3 0 Td[(2Pmrk=mr)]TJ /F8 5.978 Tf 5.76 0 Td[(1+1,k6=iSikBnik+q 4(Pmrk=mr)]TJ /F8 5.978 Tf 5.76 0 Td[(1+1,k6=iSikBnik)2+8Sii 2Sii endfor fori=mr)]TJ /F8 7.97 Tf 6.59 0 Td[(1+1,...,mrdo fori=mr)]TJ /F8 7.97 Tf 6.58 0 Td[(1+1,...,mrdo Checkifaddingedgej!iori!jviolatesDAG. Ifi!jviolatesDAG,setBnji=0and^Bnij=S)]TJ /F16 11.955 Tf 11.29 8.97 Td[(Pmrk=mr)]TJ /F8 5.978 Tf 5.76 0 Td[(1+1,k6=jSjkBnik Sjj, 2Sjj Ifj!iviolatesDAG,reverseindicesandrepeattheabovesteps. Otherwise,ifneitherviolatestheDAG,setq1=QPDAG(Bn)evaluatedwithBnij=^BijandBnji=0andq2=QPDAG(Bn)evaluatedwithBnji=^BjiandBnij=0 Ifq1>q2,setBnij=0andBnji=^Bji.Otherwise,setBnji=0andcalculateBnij=^Bij. endfor endfor fori=mr)]TJ /F8 7.97 Tf 6.59 0 Td[(1+1,...,mrdo forj=1,...,mr)]TJ /F8 7.97 Tf 6.59 0 Td[(1do SetBnji=0and^Bnij=SPmrk=1,k6=jSjkBnik Sjj, 2Sjj endfor endfor endwhile endfor 79

PAGE 80

oneoftheestimatesandthendingsof Sachsetal. ( 2003 ).WepickarangeofthepenaltyparameterandonlydisplaytheresultsforthehighestMCC,whereMatthew'scorrelationcoecient(MCC)isdenedasMCC=TPTN)]TJ /F3 11.955 Tf 11.96 0 Td[(FPFN p (TP+FP)(TP+FN)(TN+FP)(TN+FN).HereTP,TN,FPandFNcorrespondtotruepositive,truenegative,falsepositiveandfalsenegative,respectively.ThevalueofMCCrangesfrom-1to1withlargervaluescorrespondingtobetterts(-1and1representworstandbestts,respectively).Asexpected,thebestperformanceisfromCSCSintermsofallthemetricsexceptFPRs.ThelowestFPRisforCCDr,butthiscomesatahighcostintermsofalltheothermetrics.ThesecondlowestFPRisforthePCalgorithm.Again,however,itpaysthepricesintermsofTPR,FNRandMCC.Partitioned-DAGhasamuchbetterrecoveryoftrueedgesthanthePCalgorithmandhencewouldbebetterfordiscoveringtruerelationships.ItdoescomeatthecostofaslightlyhigherFPR,buttheFNRisloweragain. FPRTPRFNRMCC PC0.16130.58820.41180.3683 CCDr0.07530.17650.82350.1272 PDAG0.26880.76470.23530.3770 CSCS0.26880.94120.05880.5026 80

PAGE 81

Figure4-3. TrueNetworkfromowcytometrydata A B C DFigure4-4. EstimatedNetworksfromowcytometrydata(A)PC(B)CCDr(C)PDAG(D)CSCS 81

PAGE 82

APPENDIXAPROOFSFORCHAPTER2 A.1ProofofLemma 1 NotethatSn(annnsamplecovariancematrixfortherstnvariables)isnon-singularwithprobaiblity1,whileSn+1(an(n+1)(n+1)matrix)issingularwithprobability1.SinceSn+1=264SnS(n+1)St(n+1)Sn+1,n+1375,ispositivesemi-denite,itfollowsthatSn+1,n+1=St(n+1)S)]TJ /F8 7.97 Tf 6.59 0 Td[(1nS(n+1).Hence,ifn+1=S)]TJ /F8 7.97 Tf 6.59 0 Td[(1nS(n+1),wegetthat(n+1)tSnn+1+2(n+1)tS(n+1)+Sn+1,n+1=0.ItfollowsthatQChol,n+1(n+1,1 m)=)]TJ /F10 11.955 Tf 11.3 0 Td[(logm+kn+1k1!asm!1.2 A.2ProofofLemma 5 Notethatfor1jk)]TJ /F10 11.955 Tf 11.96 0 Td[(1,hk,A,(x)=x2jAjj+2xj Xl6=jAljxl!+jxjj+termsindependentofxj.Itfollowsthat(Tj(x))j=S)]TJ /F10 11.955 Tf 9.3 0 Td[(2Pl6=jAljxl 2Ajj.Also,hk,A,(x)=)]TJ /F10 11.955 Tf 9.3 0 Td[(2logxk+x2kAkk+2xk Xl6=kAlkxl!+termsindependentofxk. 82

PAGE 83

Itfollowsthat@ @xkhk,A,(x)=0,)]TJ /F10 11.955 Tf 34.67 8.09 Td[(2 xk+2xkAkk+2Xl6=kAlkxl=0,xk=)]TJ /F16 11.955 Tf 11.29 8.97 Td[(Pl6=kAlkxl+r Pl6=kAlkxl2+4Akk 2Akk,Notethatsincexk>0thepositiveroothasbeenretainedasthesolution.2 A.3ProofofLemma 7 Weconsidertwocases.Case1(nk):Itfollowsfrom( 2{13 )and( 2{14 )thattheupdateforeachofthekcoordinatesinaniterationofAlgorithm 3.1 canbeachievedinO(k)computations.Hence,acomputationalcomplexityofO(k2)canbeachievedinthiscase.Case2(n
PAGE 84

thecomputationalcomplexityofO(nk)canbeachievedforoneiteration(whichinvolveskcoordinatewiseupdates)oftheAlgorithm 3.1 .2 A.4ProofofTheorem 2.2.1 Fix1iparbitrarily.NotethatSi=YTiYi,whereYiisannimatrixofobservationscorrespondingtotherstivariables.SincealldiagonalentriesofSareassumedtobepositive,itfollowsthatYihasnozerocolumns.Now,let2Rbearbitrarilyxed.ifQCSCS,i(i)<,thenitfollowsthat)]TJ /F10 11.955 Tf 9.3 0 Td[(2logii<(sincetheothertwotermsintheexpressionforQCSCS,iarenon-negative).Inparticular,weobtainthatii>exp()]TJ /F5 11.955 Tf 9.3 0 Td[(=2).Also,itfollowsfrom( 2{10 )thatjijj2=forevery1ji)]TJ /F10 11.955 Tf 12.6 0 Td[(1,andii(+bi)=ai.Theabovearguments,alongwiththeexpressionforQCSCS,iin( 2{8 )and( 2{9 ),and[2,Theorem2.2]implythatthecycliccoordinatewisealgorithmforQCSCS,iwillconvergetoaglobalminimumofQCSCS,i.ItfollowsthatAlgorithm 2-3 convergestoaglobalminimumofQCSCS.2 A.5ProofofTheorem 2.4.1 Notethatby( 2{7 ),theproblemofminimizingQCSCSwithrespecttoLisequivalenttotheproblemofminimizingQCSCS,rwithrespecttorfor1rp.WewillrstestablishappropriateconsistencyresultsfortheminimizersofQCSCS,r,foreach1rp,andthencombinetheseresultstoestablishTheorem 2.4.1 .Throughoutthisproof,wewilloftensuppressthedependenceofvariousquantitiesonn,fornotationalsimplicityandeaseofexposition.Wenowestablishaseriesoflemmaswhichwillbequiteusefulinthemainproof. Lemma10. Forany>0,thereexistsaconstantC>0suchthatwithprobabilityatleast1)]TJ /F3 11.955 Tf 11.96 0 Td[(O(n)]TJ /F9 7.97 Tf 6.59 0 Td[()max1i,j,pnjSij)]TJ /F10 11.955 Tf 13.35 2.65 Td[(n,ijjCr logn n.forlargeenoughn. 84

PAGE 85

Proof:Fix1i,jpn.Let+:=En(Y1i+Y1j)2and)]TJ /F10 11.955 Tf 11.95 1.79 Td[(:=En(Y1i)]TJ /F3 11.955 Tf 11.95 0 Td[(Y1j)2.Itfollowsthat P(jSij)]TJ /F10 11.955 Tf 13.35 2.66 Td[(n,ijj>t)=P 1 nnX`=1(Y`i+Y`j)2)]TJ /F10 11.955 Tf 11.96 0 Td[((Y`i)]TJ /F3 11.955 Tf 11.96 0 Td[(Y`j)2)]TJ /F10 11.955 Tf 11.96 0 Td[((+)]TJ /F5 11.955 Tf 11.96 0 Td[()]TJ /F10 11.955 Tf 7.08 1.8 Td[()>4t!P 1 nnX`=1(Y`i+Y`j)2)]TJ /F5 11.955 Tf 11.96 0 Td[(+>2t!+P 1 nnX`=1(Y`i)]TJ /F3 11.955 Tf 11.95 0 Td[(Y`j)2)]TJ /F5 11.955 Tf 11.95 0 Td[()]TJ /F16 11.955 Tf 7.09 22.71 Td[(>2t!. (A{1) NotethatY`i+Y`jaresub-Gaussianrandomvariables(byAssumption(A2))andtheirvariancesareuniformlyboundedini,jandn(byAssumption(A1)).Foranyc1>0,itfollowsby( A{1 )and[5,Theorem1.1],thatthereexistconstantsK1andK2independentofi,jandnsuchthatP jSij)]TJ /F10 11.955 Tf 13.35 2.66 Td[(n,ijj>Cr logn n!K1e)]TJ /F4 7.97 Tf 6.59 0 Td[(K2nc1p logn n2=K1e)]TJ /F4 7.97 Tf 6.58 0 Td[(K2C2lognforlargeenoughn.Usingtheunionboundandthefactthatp=O(n)forsome0givesustherequiredresult.2 Lemma11. Forevery1rp,wenotethatrminimizesQCSCS,rifandonlyif dri(r)=)]TJ /F5 11.955 Tf 9.3 0 Td[(nsign(ri)ifri6=0,1ir)]TJ /F10 11.955 Tf 11.96 0 Td[(1, (A{2) jdri(r)jnifri=0,1ir)]TJ /F10 11.955 Tf 11.95 0 Td[(1, (A{3) drr(r)=0, (A{4) where dri(r)=2rXj=1rjSij(A{5)for1ir)]TJ /F10 11.955 Tf 11.95 0 Td[(1,and drr(r)=2rXj=1rjSrj)]TJ /F10 11.955 Tf 15.28 8.09 Td[(2 rr.(A{6)Also,ifjdri(^r)j
PAGE 86

TheproofimmediatelyfollowsfromtheKKTconditionsfortheconvexfunctionQCSCS,r. Lemma12. Forevery1irpEn[dri(rn)]=0.Proof:Letn,rdenotethesubmatrixofnformedbyusingtherstrrowsandcolumns.Sincen=LtnLn,itfollowsthatrn,rrnistherthrowof)]TJ /F10 11.955 Tf 6.88 -7.03 Td[(n,r)]TJ /F8 7.97 Tf 6.59 0 Td[(1.Itfollowsthatforevery1i0,thereexistsaconstantC1,>0suchthatwithprobabilityatleast1)]TJ /F3 11.955 Tf 11.95 0 Td[(O(n)]TJ /F9 7.97 Tf 6.58 0 Td[(),max1irpjdri(rn)jC1,r logn n.Proof:Notethatdri(rn)=2 nnX`=1Y`i(rXj=1rn,jY`j))]TJ /F10 11.955 Tf 15.29 8.08 Td[(2 rr1fi=rgisthedierencebetweenthesamplecovarianceandpopulationcovarianceofYiandPrj=1rn,jYjSincern,rrnistherthrowof)]TJ /F10 11.955 Tf 6.88 -7.03 Td[(n,r)]TJ /F8 7.97 Tf 6.59 0 Td[(1,itfollowsbyAssumption(A1)thatthevarianceofPrj=1rn,jYj,givenby(r)tn,risuniformlyboundedovernandr.TheproofnowfollowsalongthesamelinesasintheproofofLemma 10 .2Withtheabovelemmasinhand,wenowmovetowardsthemainproof.Fixrbetween2andparbitrarily.RecallthatArn(henceforthreferredtoasAr)isthesetofindicescorrespondingtothenon-zeroentriesofrn.Westartbyestablishingpropertiesforthefollowingrestricted 86

PAGE 87

minimizationproblem: MinimizeQCSCS,r(r)w.r.t.rsuchthatrj=0foreveryj=2Ar.(A{7) Lemma14. ThereexistsC>0suchthatforany>0,aglobalminimaoftherestrictedminimizationproblemin( A{7 )existswithinthediscfr:kr)]TJ /F10 11.955 Tf 13.23 0 Td[(rk0andanyu2Rrsatisfyinguj=0foreveryj=2Arandkuk=C,wegetbythetriangleinequalitythat r)]TJ /F8 7.97 Tf 6.59 0 Td[(1Xj=1jrjj)]TJ /F4 7.97 Tf 19.14 14.94 Td[(r)]TJ /F8 7.97 Tf 6.59 0 Td[(1Xj=1jrj+nujjnr)]TJ /F8 7.97 Tf 6.59 0 Td[(1Xj=1jujjCnp jArj.(A{8)Let~QCSCS,r(r):=(r)TSrr)]TJ /F10 11.955 Tf 11.95 0 Td[(2logrr.By( A{8 )andasecondorderTaylorseriesexpansionaroundr,weget QCSCS,r(r+nu))]TJ /F3 11.955 Tf 11.95 0 Td[(QCSCS,r(r)=~QCSCS,r(r+nu))]TJ /F10 11.955 Tf 13.94 2.66 Td[(~QCSCS,r(r))]TJ /F5 11.955 Tf 11.96 0 Td[(n r)]TJ /F8 7.97 Tf 6.58 0 Td[(1Xj=1jrjj)]TJ /F4 7.97 Tf 19.13 14.94 Td[(r)]TJ /F8 7.97 Tf 6.59 0 Td[(1Xj=1jrj+nujj!nXj2Arujdrj(r)+2nXj2ArXk2ArujukSjk+u2r 2(r)2)]TJ /F3 11.955 Tf 11.96 0 Td[(Cnp jArjnnXj2Arujdrj(r)+2nXj2ArXk2Arujuk(Sjk)]TJ /F10 11.955 Tf 13.35 2.65 Td[(n,jk)+Xj2ArXk2Arujukn,jk)]TJ /F3 11.955 Tf 11.95 0 Td[(C2n (A{9) wherer2[rr,rr+nur].Notethatnq n logn!1,andqnq logn n!0asn!1,byAssumption(A1).ItfollowsbyCauchy-Schwarzinequality,Lemma 10 andLemma 13 thatforany>0,thereexistconstantsCandC1,>0suchthatwithprobabilityatleast1)]TJ /F3 11.955 Tf 11.96 0 Td[(O(n)]TJ /F9 7.97 Tf 6.59 0 Td[(), nXj2Arujdrj(r)CC1,r jArjlogn nn=o(2n),(A{10) 87

PAGE 88

and 2n 2jXj2ArXk2Arujuk(Sjk)]TJ /F10 11.955 Tf 13.35 2.65 Td[(n,jk)jCC2qnr logn n=o(2n).(A{11)Also,byAssumptionA1,itfollowsthat Xj2ArXk2Arujukn,jkC22n 2max.(A{12)Combining( A{9 ),( A{10 ),( A{11 )and( A{12 ),wegetthatQCSCS,r(r+nu))]TJ /F3 11.955 Tf 11.95 0 Td[(QCSCS,r(r)>C22n 2max)]TJ /F10 11.955 Tf 11.96 0 Td[(2C2nwithprobabilityatleast1)]TJ /F3 11.955 Tf 12.15 0 Td[(O(n)]TJ /F9 7.97 Tf 6.59 0 Td[()forlargeenoughn.ChoosingC=4max+1,weobtainthatinfu:uj=0forj=2Ar,kuk=CQCSCS,r(r+nu)>QCSCS,r(r),withprobabilityatleast1)]TJ /F3 11.955 Tf 12.06 0 Td[(O(n)]TJ /F9 7.97 Tf 6.59 0 Td[()forlargeenoughn.Henceforevery>0,alocalminima(infactglobalminimaduetoconvexity)oftherestrictedminimizationproblemin( A{7 )existswithinthediscfr:kr)]TJ /F10 11.955 Tf 13.41 0 Td[(rk0,suchthatforany>0thefollowingholdswithprobability1)]TJ /F3 11.955 Tf 11.95 0 Td[(O(n)]TJ /F9 7.97 Tf 6.58 0 Td[():foranyrinthesetS=fr:kr)]TJ /F10 11.955 Tf 12.62 0 Td[(rkC1p jArjn,rj=08j=2Arg,wehavekdrAr(r)k>p jArjn,wheredrAr(r):=)]TJ /F3 11.955 Tf 5.48 -9.68 Td[(drj(r)j2Ar.Proof:Recallthatn=p jArjn.Choose2Sarbitrarily.Letu=r)]TJ /F10 11.955 Tf 12.91 0 Td[(r=n.Itfollowsthatuj=0foreveryj=2Ar,andkukC1.LetrdenotethejArjjArjmatrixwiththediagonalentrycorrespondingtotherthvariableequalto1,andallotherentriesequaltozero. 88

PAGE 89

ByarstorderTaylorseriesexpansiondrAr,itfollowsthat drAr(r)=drAr(r)+2nSArAr+1 (r)2ruAr=drAr(r)+2nArAr+1 (r)2ruAr+2n(SArAr)]TJ /F10 11.955 Tf 11.96 0 Td[(n,ArAr)uAr, (A{13) whererliesbetweenrrandrr+nur.ByLemma 10 andLemma 13 itfollowsthatforany>0,thereexistconstantsC2,andC3,suchthatkdrAr(r)k2nArAr+1 (r)2ruAr)]TJ /F3 11.955 Tf 11.96 0 Td[(C2,r qnlogn n)]TJ /F3 11.955 Tf 11.95 0 Td[(C3,kuknjArjp logn p nn maxkuk=p jArjnC1 maxwithprobabilityatleast1)]TJ /F3 11.955 Tf 12.76 0 Td[(O(n)]TJ /F9 7.97 Tf 6.59 0 Td[(),forlargeenoughn.ThelastinequalityfollowsfromAssumption(A1),thefactthatjArjqnandAssumption(A5).ChoosingC1=max+1leadstotherequiredresult.2Thenextlemmaestablishesestimationandmodelselection(sign)consistencyfortherestrictedminimizationproblemin( A{7 ). Lemma16. ThereexistsC2>0suchthatforany>0,thefollowingholdswithprobability1)]TJ /F3 11.955 Tf 12.66 0 Td[(O(n)]TJ /F9 7.97 Tf 6.59 0 Td[(),forlargeenoughn:(i)thereexistsasolutiontotheminimizationproblemin( A{7 ),(ii)(estimationconsistency)anyglobalminimumoftherestrictedminimizationproblemin( A{7 )lieswithinthediscfr:kr)]TJ /F10 11.955 Tf 12.97 0 Td[(rk
PAGE 90

rjsn>2C2p jArjnforeveryj2Ar,forsucientlylargen.Thesignconsistencynowfollowsbycombiningthisfactwithkr)]TJ /F10 11.955 Tf 12.62 0 Td[(rk0,anysolution^rof( A{7 )satisesmaxj=2Ardrj(^r)0begiven,andlet^rbeasolutionof( A{7 ).IfCn:=fsign(^r)=sign(r)g,thenP(Cn)1)]TJ /F3 11.955 Tf 11.85 0 Td[(O(n)]TJ /F9 7.97 Tf 6.58 0 Td[()]TJ /F9 7.97 Tf 6.58 0 Td[()forlargeenoughn(byLemma 16 ).Now,onCn,itfollowsbythearstorderexpansionofdrAraroundr,andtheKKTconditionsfor( A{7 )that )]TJ /F5 11.955 Tf 9.3 0 Td[(nsign(rAr)=drAr(^r)=drAr(r)+2SArAr^un+2 (r)2r^un=Hn^un+drAr(r)+2)]TJ /F3 11.955 Tf 5.48 -9.69 Td[(SArAr)]TJ /F10 11.955 Tf 13.35 2.66 Td[(n,ArAr^un+2 (r)2)]TJ /F10 11.955 Tf 22.64 8.08 Td[(2 (rr)2r^un, (A{14) where^un=^r)]TJ /F10 11.955 Tf 12.62 0 Td[(r,rliesbetweenrrand^rr,andHn=2n,ArAr+2 (rr)2r.Hence, ^un=)]TJ /F5 11.955 Tf 9.3 0 Td[(nH)]TJ /F8 7.97 Tf 6.59 0 Td[(1nsign(rAr))]TJ /F3 11.955 Tf 11.95 0 Td[(H)]TJ /F8 7.97 Tf 6.59 0 Td[(1ndrAr(r))]TJ /F10 11.955 Tf 11.95 0 Td[(2H)]TJ /F8 7.97 Tf 6.59 0 Td[(1n)]TJ /F3 11.955 Tf 5.48 -9.68 Td[(SArAr)]TJ /F10 11.955 Tf 13.35 2.66 Td[(n,ArAr^un)]TJ /F10 11.955 Tf 9.3 0 Td[(2H)]TJ /F8 7.97 Tf 6.59 0 Td[(1n1 (r)2)]TJ /F10 11.955 Tf 22.64 8.09 Td[(1 (rr)2r^un. (A{15) Now,letusxj=2Ar.ByarstorderTaylorseriesexpansionofdrj,itfollowsthatdrj(^r)=drj(r)+2Sti,Ar^un. 90


Using (A-15), we get that
$$ d_{rj}(\hat\eta_r) = d_{rj}(\bar\eta_r) + 2\big(S_{j,A_r}-\bar\Sigma_{n,j,A_r}\big)^t\hat u_n + 2\bar\Sigma_{n,j,A_r}^t\hat u_n = -2\lambda_n\bar\Sigma_{n,j,A_r}^tH_n^{-1}\mathrm{sign}(\bar\eta_{rA_r}) + d_{rj}(\bar\eta_r) - 2\bar\Sigma_{n,j,A_r}^tH_n^{-1}d_{rA_r}(\bar\eta_r) - 4\bar\Sigma_{n,j,A_r}^tH_n^{-1}\big(S_{A_rA_r}-\bar\Sigma_{n,A_rA_r}\big)\hat u_n - 4\bar\Sigma_{n,j,A_r}^tH_n^{-1}\Big(\frac{1}{(\eta_r^*)^2}-\frac{1}{(\bar\eta_{rr})^2}\Big)\Delta_r\hat u_n + 2\big(S_{j,A_r}-\bar\Sigma_{n,j,A_r}\big)^t\hat u_n. \qquad (A-16) $$
We now individually analyze all the terms in (A-16). It follows by the "incoherence" Assumption (A3) that the first term satisfies
$$ \big|2\lambda_n\bar\Sigma_{n,j,A_r}^tH_n^{-1}\mathrm{sign}(\bar\eta_{rA_r})\big| \le (1-\xi)\lambda_n \qquad (A-17) $$
for some constant $\xi\in(0,1)$ not depending on $n$, and that
$$ \max_{j\notin A_r}\big\|\bar\Sigma_{n,j,A_r}^tH_n^{-1}\big\| \ \text{is uniformly bounded in } n. \qquad (A-18) $$
By Lemma 10 and (A-18), the second and third terms satisfy
$$ \big|d_{rj}(\bar\eta_r)\big| + 2\big|\bar\Sigma_{n,j,A_r}^tH_n^{-1}d_{rA_r}(\bar\eta_r)\big| = o(\lambda_n) \qquad (A-19) $$
with probability $1-O(n^{-\delta-1})$, for large enough $n$. Also, there exists $C_{4,\delta}>0$ such that
$$ \max_{j,k\in A_r}\big|S_{jk}-\bar\Sigma_{n,jk}\big| \le C_{4,\delta}\sqrt{\frac{\log n}{n}}, \qquad (A-20) $$

with probability $1-O(n^{-\delta-1})$, for large enough $n$. By (A-18), (A-20), the estimation consistency part of Lemma 16 and Assumption (A5), the fourth term in (A-16) satisfies
$$ \big|4\bar\Sigma_{n,j,A_r}^tH_n^{-1}\big(S_{A_rA_r}-\bar\Sigma_{n,A_rA_r}\big)\hat u_n\big| \le 2\big\|S_{A_rA_r}-\bar\Sigma_{n,A_rA_r}\big\|\,\|\hat u_n\| = O\Big(\sqrt{\frac{|A_r|\log n}{n}}\,\sqrt{|A_r|}\,\lambda_n\Big) = o(\lambda_n), \qquad (A-21) $$
with probability $1-O(n^{-\delta-1})$, for large enough $n$. Since $(\bar\eta_{rr})^2$ is the $r$-th diagonal entry of $\bar\Sigma_{n,r}^{-1}$, it follows by Assumption (A1) that $\bar\eta_{rr}$ is uniformly bounded above and below in $n$ and $r$. Since $\eta_r^*$ lies between $\bar\eta_{rr}$ and $\hat\eta_{rr}$, it follows by the estimation consistency part of Lemma 16 that $\eta_r^*$ is bounded above and below uniformly, with probability at least $1-O(n^{-\delta-1})$, for large enough $n$. By (A-18), the definition of $\eta_r^*$, Lemma 16, and Assumption (A1), the fifth term in (A-16) satisfies
$$ \Big|4\bar\Sigma_{n,j,A_r}^tH_n^{-1}\Big(\frac{1}{(\eta_r^*)^2}-\frac{1}{(\bar\eta_{rr})^2}\Big)\Delta_r\hat u_n\Big| \le \frac{2\max|\bar\eta_{rr}+\eta_r^*|}{\min(\bar\eta_{rr})^2(\eta_r^*)^2}\,|\bar\eta_{rr}-\hat\eta_{rr}|\,|\hat u_{n,r}| = O\big(|A_r|\lambda_n^2\big) = o(\lambda_n), \qquad (A-22) $$
with probability $1-O(n^{-\delta-1})$, for large enough $n$. Also, by Lemma 10, the consistency part of Lemma 16, and Assumption (A1), the sixth term in (A-16) satisfies
$$ \big|2\big(S_{j,A_r}-\bar\Sigma_{n,j,A_r}\big)^t\hat u_n\big| \le 2\big\|S_{j,A_r}-\bar\Sigma_{n,j,A_r}\big\|\,\|\hat u_n\| = O\Big(\sqrt{\frac{|A_r|\log n}{n}}\,\sqrt{|A_r|}\,\lambda_n\Big) = o(\lambda_n). \qquad (A-23) $$
It follows by (A-16), (A-17), (A-19), (A-21)-(A-23) that for any $j\notin A_r$, $|d_{rj}(\hat\eta_r)|<\lambda_n$ with probability $1-O(n^{-\delta-1})$, for large enough $n$. $\Box$

Proof of Theorem 2.4.1: Let $\delta>0$ and $1\le r\le p$ be chosen arbitrarily. Let $C_{r,n}$ denote the event on which Lemma 16 and Lemma 17 hold. It follows that $P(C_{r,n})\ge1-O(n^{-\delta-1})$, for large enough $n$. Now, on $C_{r,n}$, any solution of the restricted problem (A-7) is also a global minimizer of

$Q_{CSCS,r}$ (by Lemma 11). Hence, there is at least one global minimizer of $Q_{CSCS,r}$ for which the components corresponding to $(A_r)^c$ are zero. It again follows by Lemma 11 that these components are zero for all global minimizers of $Q_{CSCS,r}$. Hence, the solution set of the restricted minimization problem in (A-7) is the same as the solution set of the unrestricted problem (i.e., the set of global minimizers of $Q_{CSCS,r}$). Hence, on $C_{r,n}$, the assertions of Lemma 16 hold for the solutions of the unrestricted minimization problem for $Q_{CSCS,r}$. Recall that $Q_{CSCS}(L)=\sum_{r=1}^pQ_{CSCS,r}(\eta_r)$, and that $\{\eta_r\}_{r=1}^p$ form a disjoint partition of $L$. Note that by the union bound and the fact that $p=O(n)$, $P\big(\cap_{r=1}^pC_{r,n}\big)\ge1-O(n^{-\delta})$, for large enough $n$. Also, by the triangle inequality, $\|L-\tilde L\|\le\sum_{r=1}^p\|\eta_r-\tilde\eta_r\|$ for any $L,\tilde L\in\mathcal L_p$. It follows that the assertions in Theorem 2.4.1 hold on $\cap_{r=1}^pC_{r,n}$. $\Box$
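The argument above leans on two computational facts: the objective decouples into $p$ independent, jointly convex row problems, and each row solution is characterized by the KKT conditions invoked in the proof of Lemma 17. The sketch below makes both points concrete. It is a minimal illustration, assuming the row objective takes the standard CSCS form $Q_{CSCS,r}(\eta)=-2\log\eta_r+\eta^tS_{1:r,1:r}\eta+\lambda\sum_{j<r}|\eta_j|$; the helper names (`q_cscs_row`, `q_cscs`, `kkt_violation`) are ours and not from the text.

```python
import numpy as np

def q_cscs_row(eta, S, lam):
    """Row objective Q_CSCS,r(eta) = -2 log(eta_r) + eta^T S eta + lam * sum_{j<r} |eta_j|,
    where eta = (L_r1, ..., L_rr) is the r-th row of the Cholesky factor."""
    r = len(eta)
    return (-2.0 * np.log(eta[-1])
            + eta @ S[:r, :r] @ eta
            + lam * np.abs(eta[:-1]).sum())

def q_cscs(L, S, lam):
    """Full objective as the sum of p decoupled row problems; each row can be
    minimized separately (and in parallel), which is what the proof exploits."""
    return sum(q_cscs_row(L[r, :r + 1], S, lam) for r in range(L.shape[0]))

def kkt_violation(eta, S, lam, tol=1e-8):
    """Largest violation of the KKT conditions for one row problem:
    d_rj(eta) = -lam * sign(eta_j) on the active set, |d_rj(eta)| <= lam off it,
    and a vanishing derivative in the unpenalized diagonal coordinate."""
    r = len(eta)
    grad = 2.0 * S[:r, :r] @ eta        # d_rj(eta) for j = 1, ..., r
    grad[-1] -= 2.0 / eta[-1]           # the -2 log(eta_r) term contributes -2/eta_r
    active = np.abs(eta[:-1]) > tol
    v_active = np.abs(grad[:-1][active] + lam * np.sign(eta[:-1][active]))
    v_inactive = np.abs(grad[:-1][~active]) - lam
    return max(v_active.max(initial=0.0),
               v_inactive.max(initial=0.0),
               abs(grad[-1]))
```

Running `kkt_violation` on every row of a candidate $\hat L$ gives a quick numerical certificate of convergence; by the convexity results above, a near-zero violation certifies a global minimizer of that row problem.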


APPENDIX B
CALL CENTER DATA

B.1 Forecasting details

Suppose $y_i=(y_{i,1},\dots,y_{i,102})^t$, and write $y_i=\big((y_i^{(1)})^t,(y_i^{(2)})^t\big)^t$, where $y_i^{(1)}$ and $y_i^{(2)}$ are 51-dimensional vectors that measure the arrival patterns in the early and later times of day $i$. The corresponding partitions of the mean and covariance matrix are denoted by $\mu^t=(\mu_1^t,\mu_2^t)$ and
$$ \Sigma=\begin{pmatrix}\Sigma_{11}&\Sigma_{12}\\ \Sigma_{21}&\Sigma_{22}\end{pmatrix}. $$
Assuming multivariate normality, the best mean squared error forecast of $y_i^{(2)}$ using $y_i^{(1)}$ is
$$ E\big(y_i^{(2)}\mid y_i^{(1)}\big)=\mu_2+\Sigma_{21}\Sigma_{11}^{-1}\big(y_i^{(1)}-\mu_1\big). \qquad (B-1) $$
To compare the forecast performance of the four different covariance matrix estimates, we split the 239 days into training and test datasets. The data from the first $T$ days ($T=205,150,100,75$) form the training dataset that is used to estimate the mean and covariance structure. The estimates are then applied for forecasting using (B-1) for the $239-T$ days in the test set. Note that we use the 51 square-root-transformed arrival counts in the early half of a day to forecast the square-root-transformed arrival counts in the later half of the day. For each time interval $t=52,\dots,102$, the authors in Huang et al. (2006) define the forecast error (FE) by
$$ \mathrm{FE}_t=\frac{1}{239-T}\sum_{i=T+1}^{239}\big|\hat y_{it}-y_{it}\big|, $$
where $y_{it}$ and $\hat y_{it}$ are the observed and forecast values, respectively. In Figure B-1, we provide the number of time intervals (out of 51) in which each of the four methods (CSCS, Sparse Cholesky, Sparse Graph, sample covariance matrix) has the minimum forecast error value. Figure B-2 provides the forecast errors aggregated over all 51 time intervals for each method. When $T=205$ and the size of the training data is larger than the number of variables, it is clear that Sparse Cholesky performs best, achieving the minimum $\mathrm{FE}_t$ 38 times, followed by CSCS with 8 and then Sparse Graph with 3. This ordering is preserved when one looks at the aggregated forecast errors. The sample covariance matrix performs the worst and achieves the minimum only twice. A similar pattern is observed for $T=150$. The picture changes quite a bit if we reduce the size of the training dataset, especially if it is smaller than the number of variables.
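For concreteness, the conditional-mean forecast (B-1) and the per-interval forecast error $\mathrm{FE}_t$ can be computed in a few lines. The sketch below is a minimal illustration, assuming `Y` is the $239\times102$ matrix of square-root-transformed arrival counts and `Sigma_hat` is any one of the four covariance estimates being compared; the function names are ours.

```python
import numpy as np

def conditional_forecast(y1, mu, Sigma):
    """Best mean-squared-error forecast E(y2 | y1) under multivariate
    normality, equation (B-1): mu2 + Sigma21 Sigma11^{-1} (y1 - mu1)."""
    mu1, mu2 = mu[:51], mu[51:]
    S11, S21 = Sigma[:51, :51], Sigma[51:, :51]
    # solve() avoids explicitly inverting Sigma11
    return mu2 + S21 @ np.linalg.solve(S11, y1 - mu1)

def forecast_errors(Y, T, Sigma_hat):
    """FE_t for t = 52, ..., 102: average absolute forecast error over the
    239 - T test days, using the first T days to estimate the mean."""
    mu_hat = Y[:T].mean(axis=0)
    preds = np.array([conditional_forecast(y[:51], mu_hat, Sigma_hat)
                      for y in Y[T:]])
    return np.abs(preds - Y[T:, 51:]).mean(axis=0)
```

Summing the returned 51-vector gives the aggregated forecast error reported in Figure B-2, and comparing the vectors across methods interval-by-interval gives the minimum-FE counts in Figure B-1.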

Figure B-3. Average Absolute Forecast Error with 205 observations in training dataset

Figure B-4. Average Absolute Forecast Error with 150 observations in training dataset


Figure B-5. Average Absolute Forecast Error with 100 observations in training dataset

Figure B-6. Average Absolute Forecast Error with 75 observations in training dataset


APPENDIX C
NETWORKS FOR FLOW CYTOMETRY DATA

Figure C-1. True and estimated graphs (matching sparsity to true graph) from cell-signaling data. A blue arrow denotes a true positive, while a red arrow denotes a false positive. (A) Sachs (B) CSCS (C) Sparse Cholesky (D) Sparse Graph


Figure C-2. True and estimated graphs from cell-signaling data obtained by setting $\lambda_i(\alpha)=2n^{-1/2}Z^*_{\alpha/(2p(i-1))}$. A blue arrow denotes a true positive, while a red arrow denotes a false positive. (A) Sachs (B) CSCS (C) Sparse Cholesky (D) Sparse Graph
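The row-specific penalty in the caption matches the choice proposed in Shojaie and Michailidis (2010), where $Z^*_q$ denotes the upper-$q$ quantile of the standard normal. A minimal sketch of computing these penalties (the function name `row_penalties` is ours, and we assume the upper-tail quantile convention for $Z^*_q$):

```python
import numpy as np
from scipy.stats import norm

def row_penalties(n, p, alpha):
    """lambda_i(alpha) = 2 * n^{-1/2} * Z*_{alpha / (2 p (i-1))} for i = 2, ..., p,
    where Z*_q = norm.ppf(1 - q) is the upper-q standard normal quantile and
    alpha is the user-chosen significance level."""
    i = np.arange(2, p + 1)          # row 1 has no regression coefficients
    q = alpha / (2.0 * p * (i - 1))
    return 2.0 / np.sqrt(n) * norm.ppf(1.0 - q)
```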


REFERENCES

Altomare, D., Consonni, G., and La Rocca, L. "Objective Bayesian search of Gaussian directed acyclic graphical models for ordered variables with non-local priors." Biometrics 69 (2013): 478–487.

Aragam, B., Amini, A. A., and Zhou, Q. "Learning Directed Acyclic Graphs with Penalized Neighbourhood Regression." arXiv preprint (2016).

Aragam, B. and Zhou, Q. "Concave Penalized Estimation of Sparse Gaussian Bayesian Networks." Journal of Machine Learning Research 16 (2015): 2273–2328.

Banerjee, O., El Ghaoui, L., and d'Aspremont, A. "Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data." The Journal of Machine Learning Research 9 (2008): 485–516.

Bickel, P. J. and Levina, E. "Covariance regularization by thresholding." Ann. Statist. 36 (2008): 199–227.

Cao, Xuan, Khare, Kshitij, and Ghosh, Malay. "Posterior Graph Selection and Estimation Consistency for High-dimensional Bayesian DAG Models." https://arxiv.org/abs/1611.01205 (2017).

Chan, W. M. and George, A. "A linear time implementation of the reverse Cuthill-McKee algorithm." BIT Numerical Mathematics 20 (1980): 8–14.

Chickering, David Maxwell. "Optimal structure identification with greedy search." The Journal of Machine Learning Research 3 (2003): 507–554.

Consonni, G., La Rocca, L., and Peluso, S. "Objective Bayes covariate-adjusted sparse graphical model selection." Scand. J. Statist. 44 (2017): 741–764.

Davis, T. A. Direct Methods for Sparse Linear Systems. SIAM, Philadelphia (2006).

Dawid, A. P. and Lauritzen, S. L. "Hyper Markov laws in the statistical analysis of decomposable graphical models." Ann. Statist. 21 (1993): 1272–1317.

Ellis, Byron and Wong, Wing Hung. "Learning causal Bayesian network structures from experimental data." Journal of the American Statistical Association 103 (2008).482.

Fawcett, T. "An introduction to ROC analysis." Pattern Recognition Letters 27 (2006).8: 861–874.

Friedman, J., Hastie, T., and Tibshirani, R. "Regularization paths for generalized linear models via coordinate descent." Journal of Statistical Software 33 (2008a): 1–22.

Friedman, J., Hastie, T., and Tibshirani, R. "Sparse inverse covariance estimation with the graphical lasso." Biostatistics 9 (2008b): 432–441.

Friedman, J., Hastie, T., and Tibshirani, R. "Applications of the lasso and grouped lasso to the estimation of sparse graphical models." Technical Report, Department of Statistics, Stanford University (2010).


Fu, W. J. "Penalized Regressions: The bridge versus the lasso." Journal of Computational and Graphical Statistics 7 (1998): 397–416.

Gamez, Jose A., Mateo, Juan L., and Puerta, Jose M. "Learning Bayesian networks by hill climbing: Efficient methods based on progressive restriction of the neighborhood." Data Mining and Knowledge Discovery 22 (2011).1-2: 106–148.

Gamez, Jose A., Mateo, Juan L., and Puerta, Jose M. "One iteration CHC algorithm for learning Bayesian networks: An effective and efficient algorithm for high dimensional problems." Progress in Artificial Intelligence 1 (2012).4: 329–346.

Geiger, Dan and Heckerman, David. "Learning Gaussian networks." arXiv:1302.6808 (2013).

Hastie, T., Tibshirani, R., and Friedman, J. Elements of Statistical Learning. Springer-Verlag, 2009.

Heckerman, David, Geiger, Dan, and Chickering, David M. "Learning Bayesian networks: The combination of knowledge and statistical data." Machine Learning 20 (1995).3: 197–243.

Hero, A. and Rajaratnam, B. "Hub discovery in partial correlation graphs." IEEE Transactions on Information Theory 58 (2012): 6064–6078.

Hsieh, C.-J., Sustik, M. A., Dhillon, I. S., and Ravikumar, P. "Sparse inverse covariance matrix estimation using quadratic approximation." Advances in Neural Information Processing Systems 24 (2011).

Huang, J., Liu, N., Pourahmadi, M., and Liu, L. "Covariance selection and estimation via penalised normal likelihood." Biometrika 93 (2006): 85–98.

Kalisch, Markus and Buhlmann, Peter. "Estimating high-dimensional directed acyclic graphs with the PC-algorithm." The Journal of Machine Learning Research 8 (2007): 613–636.

Karoui, N. El. "Operator norm consistent estimation of large dimensional sparse covariance matrices." Ann. Statist. 36 (2008): 2717–2756.

Khare, K., Oh, S., and Rajaratnam, B. "A convex pseudo-likelihood framework for high dimensional partial correlation estimation with convergence guarantees." Journal of the Royal Statistical Society B 77 (2015): 803–825.

Khare, K. and Rajaratnam, B. "Convergence of cyclic coordinatewise l1 minimization." arXiv preprint (2014).

Khare, Kshitij, Oh, Sang, Rahman, Syed, and Rajaratnam, Bala. "A convex framework for high-dimensional sparse Cholesky based covariance estimation." https://arxiv.org/abs/1610.02436 (2017a).

Khare, Kshitij, Rahman, Syed, and Rajaratnam, Bala. "Cholesky based positive definite estimation in graphical models." (2017b).


Lam, Wai and Bacchus, Fahiem. "Learning Bayesian belief networks: An approach based on the MDL principle." Computational Intelligence 10 (1994).3: 269–293.

Lauritzen, S. L. Graphical Models. Oxford University Press Inc., New York, 1996.

Letac, G. and Massam, H. "Wishart distributions for decomposable graphs." Ann. Statist. 35 (2007): 1278–1323.

Levina, E., Rothman, A., and Zhu, J. "Sparse estimation of large covariance matrices via a nested lasso penalty." Annals of Applied Statistics 2 (2008): 245–263.

Marbach, Daniel, Prill, Robert J., Schaffter, Thomas, Mattiussi, Claudio, Floreano, Dario, and Stolovitzky, Gustavo. "Revealing strengths and weaknesses of methods for gene network inference." PNAS 107 (2010).14: 6286–6291.

Marbach, Daniel, Schaffter, Thomas, Mattiussi, Claudio, and Floreano, Dario. "Generating Realistic In Silico Gene Networks for Performance Assessment of Reverse Engineering Methods." Journal of Computational Biology 16 (2009).2: 229–239.

Massam, H., Paul, D., and Rajaratnam, B. "Penalized empirical risk minimization using a convex loss function and l1 penalty." Unpublished manuscript (2007).

Mazumder, R. and Hastie, T. "Exact covariance thresholding into connected components for large-scale graphical lasso." The Journal of Machine Learning Research 13 (2012): 781–794.

Meinshausen, N. and Buhlmann, P. "High dimensional graphs and variable selection with the lasso." Annals of Statistics 34 (2006): 1436–1462.

Oh, S., Dalal, O., Khare, K., and Rajaratnam, B. "Optimization methods for sparse pseudo-likelihood graphical model selection." Proceedings of Neural Information Processing Systems (2014).

Paulsen, V. I., Power, S. C., and Smith, R. R. "Schur products and matrix completions." Journal of Functional Analysis 85 (1989a): 151–178.

Paulsen, V. I., Power, S. C., and Smith, R. R. "Schur products and matrix completions." J. Funct. Anal. 85 (1989b): 151–178.

Peng, J., Wang, P., Zhou, N., and Zhu, J. "Partial correlation estimation by joint sparse regression models." Journal of the American Statistical Association 104 (2009): 735–746.

Prill, Robert J., Marbach, Daniel, Saez-Rodriguez, Julio, Sorger, Peter K., Alexopoulos, Leonidas G., Xue, Xiaowei, Clarke, Neil D., Altan-Bonnet, Gregoire, and Stolovitzky, Gustavo. "Towards a Rigorous Assessment of Systems Biology Models: The DREAM3 Challenges." PLOS ONE 5 (2010).2: 1–18. URL https://doi.org/10.1371/journal.pone.0009202


Rahman, Syed, Khare, Kshitij, and Michailidis, George. "Estimation of Gaussian DAGs with known partition based partial ordering." (2017).

Rajaratnam, B., Massam, H., and Carvalho, C. M. "Flexible covariance estimation in graphical Gaussian models." Ann. Statist. 36 (2008): 2818–2849.

Rothman, A., Levina, E., and Zhu, J. "A new approach to Cholesky-based covariance regularization in high dimensions." Biometrika 97 (2010a): 539–550.

Rothman, A. J., Levina, E., and Zhu, J. "A new approach to Cholesky-based estimation of high-dimensional covariance matrices." Biometrika 97 (2010b): 539–550.

Roverato, A. "Hyper Inverse Wishart Distribution for Non-decomposable Graphs and its Application to Bayesian Inference for Gaussian Graphical Models." Scandinavian Journal of Statistics 29 (2002): 391–411.

Sachs, K., Perez, O., Pe'er, D., Lauffenburger, D., and Nolan, G. "Causal protein-signaling networks derived from multiparameter single-cell data." Science 308 (2005).5721: 523–529.

Shen, H. and Huang, J. Z. "Analysis of call center arrival data using singular value decomposition." Appl. Stoch. Models Bus. and Ind. 21 (2005): 251–263.

Shojaie, A., Jauhiainen, A., Kallitsis, M., and Michailidis, G. "Inferring regulatory networks by combining perturbation screens and steady state gene expression profiles." PLoS ONE e82393 (2014).

Shojaie, A. and Michailidis, G. "Penalized likelihood methods for estimation of sparse high-dimensional directed acyclic graphs." Biometrika 97 (2010): 519–538.

Smith, M. and Kohn, R. "Parsimonious covariance matrix estimation for longitudinal data." Journal of the American Statistical Association 97 (2002): 1141–1153.

Speed, T. P. and Kiiveri, H. T. "Gaussian Markov distributions over finite graphs." Annals of Statistics 14 (1986): 138–150.

Spirtes, Peter, Glymour, Clark, and Scheines, Richard. Causation, Prediction, and Search. 2001.

Tsamardinos, Ioannis, Brown, Laura E., and Aliferis, Constantin F. "The max-min hill-climbing Bayesian network structure learning algorithm." Machine Learning 65 (2006).1: 31–78.

van de Geer, S. and Buhlmann, P. "l0-penalized maximum likelihood for sparse directed acyclic graphs." Annals of Statistics 41 (2013): 536–567.

Wu, W. B. and Pourahmadi, M. "Nonparametric estimation of large covariance matrices of longitudinal data." Biometrika 90 (2003): 831–844.

Xiang, R., Khare, K., and Ghosh, M. "High dimensional posterior convergence rates for decomposable graphical models." Electron. J. Statist. 9 (2015): 2828–2854.


Yu, G. and Bien, J. "Learning Local Dependence In Ordered Data." arXiv:1604.07451 (2016).

Zhou, Qing. "Multi-domain sampling with applications to structural inference of Bayesian networks." Journal of the American Statistical Association 106 (2011).496: 1317–1330.


BIOGRAPHICAL SKETCH

Syed Rahman received his Ph.D. in statistics from the University of Florida in December 2017 under the supervision of Dr. Kshitij Khare. During his time there, he mainly focused on graphical models and covariance estimation. He has a B.A. in economics from Bard College and an M.S. in mathematics from New York University.