Citation
Efficient Sparse Optimization Algorithms: Designing Non-convex and Distributed Algorithms for Machine Learning and Engineering Applications

Material Information

Title:
Efficient Sparse Optimization Algorithms: Designing Non-convex and Distributed Algorithms for Machine Learning and Engineering Applications
Creator:
Zhu, Jiajie
Publisher:
University of Florida
Publication Date:
2015
Language:
English

Thesis/Dissertation Information

Degree:
Doctorate (Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Mathematics
Committee Chair:
Hager, William Ward
Committee Co-Chair:
Khare, Kshitij
Committee Members:
Ghosh, Malay
Keesling, James E.
Pardalos, Panagote M.

Subjects

Subjects / Keywords:
algorithm
machine-learning
non-convex
optimization
sparsity

Notes

General Note:
With the increasing computational demand from areas such as computer science and engineering, modern optimization algorithms must be adapted to applications where little theoretical guarantee previously existed. Sparse optimization techniques are of high importance in the age of big datasets. This thesis mainly considers two applications: sparse principal component analysis (PCA) and power-grid load control. For the sparse PCA problem, we derive a non-convex projection algorithm and its accompanying theoretical convergence results. In the power-grid control application, a decentralized variable-splitting algorithm with a theoretical convergence guarantee is developed to solve the separable convex optimization problem. Our mathematical proof of convergence is self-contained. Numerical experiments on both problems are presented to demonstrate the effectiveness of the developed algorithms.

Record Information

Source Institution:
UFRGP
Rights Management:
All applicable rights reserved by the source institution and holding location.
Embargo Date:
12/31/2017



Full Text

EFFICIENT SPARSE OPTIMIZATION ALGORITHMS: DESIGNING NON-CONVEX AND DISTRIBUTED ALGORITHMS FOR MACHINE LEARNING AND ENGINEERING APPLICATIONS

By

JIAJIE ZHU

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2015

© 2015 Jiajie Zhu

To every hard-working person who still lives in poverty

ACKNOWLEDGMENTS

I would like to express my gratitude to the friends and colleagues from the Department of Mathematics and the University of Florida. I want to especially thank my advisor, Dr. William Hager, for the advice and assistance in the researching and writing of this thesis. I will remember and be forever grateful for all the support I received from everyone during the difficult past five years of my life.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER
1 INTRODUCTION
2 A NON-CONVEX PROJECTION ALGORITHM FOR SPARSE OPTIMIZATION
  2.1 Gradient Projection Algorithm for Non-convex Optimization
  2.2 Approximate Newton Algorithm
  2.3 Numerical Experiments on Sparse PCA Algorithms
  2.4 Discussion on the Projection Algorithm and Sparse PCA
3 A DECENTRALIZED ALGORITHM FOR SEPARABLE CONVEX OPTIMIZATION
  3.1 The Power-grid Load Control Problem Background
  3.2 Formulating Load Control as Constrained Optimization
  3.3 Dual Decomposition and Multi-block Variable Splitting
  3.4 Decentralized Multi-block ADMM for Power Grid Load Control
  3.5 Convergence of the Algorithm
  3.6 Toy Example for DM-ADMM
  3.7 Power-grid Load Control Simulation
  3.8 Linear Convergence Rate with Separable Quadratic Programming
4 CONCLUSIONS

REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

2-1 Results on Pitprops dataset
2-2 Simple random dataset
2-3 Random dataset, m = 250, n = 500
2-4 Hollywood dataset

LIST OF FIGURES

2-1 Example that shows Proposition 2.2 may not hold for local minimizers
2-2 Explained variance versus cardinality for random dataset
2-3 A plot of the base 10 logarithm of the relative error versus iteration number for the random dataset with m = 250 and cardinality = 500
2-4 Explained variance versus iteration number for cardinality 50 in the random dataset
2-5 Explained variance versus m for cardinality 20 in the random dataset
2-6 A plot of the base 10 logarithm of the relative error versus iteration number for the random dataset with m = 250 and cardinality = 500; the boxes correspond to the monotone algorithm, which employs a line search, while the circles correspond to the nonmonotone algorithm with λ_k given by the BB formula (2.33)
3-1 DM-ADMM algorithm, toy example. Fixed ρ, different n
3-2 DM-ADMM algorithm, toy example. Different ρ, fixed n
3-3 DM-ADMM algorithm, toy example. Varying ρ_k
3-4 Power-grid simulation, without noise
3-5 Power-grid simulation, with noise, without communication
3-6 Power-grid simulation, with noise, with communication
3-7 Power-grid, with communication, with noise. Different choices of ρ
3-8 Power-grid simulation without noise, adaptive parameter

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

EFFICIENT SPARSE OPTIMIZATION ALGORITHMS: DESIGNING NON-CONVEX AND DISTRIBUTED ALGORITHMS FOR MACHINE LEARNING AND ENGINEERING APPLICATIONS

By

Jiajie Zhu

December 2015

Chair: William Hager
Major: Mathematics

With the increasing computational demand from areas such as computer science and engineering, modern optimization algorithms must be adapted to applications where little theoretical guarantee previously existed. Sparse optimization techniques are of high importance in the age of big datasets. This thesis mainly considers two applications: sparse principal component analysis (PCA) and power-grid load control. For the sparse PCA problem, we derive a non-convex projection algorithm and its accompanying theoretical convergence results. In the power-grid control application, a decentralized variable-splitting algorithm with a theoretical convergence guarantee is developed to solve the separable convex optimization problem. Our mathematical proof of convergence is self-contained. Numerical experiments on both problems are presented to demonstrate the effectiveness of the developed algorithms.

CHAPTER 1
INTRODUCTION

There are two main applications of this study: sparse principal component analysis (PCA) and power-grid load control. Chapter 2 studies the sparse PCA problem. Its optimization formulation is an $\ell_0$-constrained quadratic programming problem, which is hard to solve because of its non-convex nature. Chapter 3 studies the power-grid load control problem. It is formulated as a separable convex constrained optimization problem, and the application requires the algorithm to be decentralized. We approach both problems with sparse optimization techniques.

CHAPTER 2
A NON-CONVEX PROJECTION ALGORITHM FOR SPARSE OPTIMIZATION

Principal component analysis (PCA) is an important technique in engineering and statistical analysis. It amounts to computing the singular vectors associated with the largest singular values. In its simplest setting, the rank-one approximation amounts to solving an optimization problem of the form
$$\max\{x^{\mathsf T}\Sigma x : x\in\mathbb{R}^n,\ \|x\|=1\},\tag{2.1}$$
where $\Sigma=A^{\mathsf T}A$ is the covariance matrix associated with the data matrix $A\in\mathbb{R}^{m\times n}$ and $\|\cdot\|$ is the Euclidean norm. As pointed out in [38], there is no loss of generality in assuming that $\Sigma$ is positive definite, since $x^{\mathsf T}\Sigma x+\mu=x^{\mathsf T}(\Sigma+\mu I)x$ whenever $x$ is feasible in (2.1).

The lack of interpretability has been a major concern in PCA. Sparse PCA partly addresses this problem by constraining the number of nonzero components of the maximizing $x$ in (2.1). Given a positive integer $\kappa$, the sparse PCA problem associated with (2.1) is
$$\max\{x^{\mathsf T}\Sigma x : x\in\mathbb{R}^n,\ \|x\|=1,\ \|x\|_0\le\kappa\},\tag{2.2}$$
where $\|x\|_0$ denotes the number of nonzero components of $x$. Due to the sparsity constraint in (2.2), the feasible set is no longer convex, which makes the optimization problem more difficult. In [10], PCA loadings smaller than a certain tolerance are simply set to zero to produce sparse principal components. More recently, optimization-based approaches have been used to introduce sparsity. For example, in [37] sparsity is achieved using an $\ell_1$ relaxation. That is, the problem (2.2) is replaced by
$$\max\{x^{\mathsf T}\Sigma x : x\in\mathbb{R}^n,\ \|x\|=1,\ \|x\|_1\le\sqrt{\kappa}\},\tag{2.3}$$

where $\|x\|_1=|x_1|+|x_2|+\dots+|x_n|$. The solution of the relaxed problem (2.3) yields an upper bound for the solution of (2.2). In [32], the Rayleigh quotient problem subject to an $\ell_1$-constraint is successively maximized using the authors' SCoTLASS algorithm. In [48], the authors formulate a regression problem and propose numerical algorithms to solve it. Their approach can be applied to large-scale data, but it is computationally expensive. In [14], a new semidefinite relaxation is formulated and a greedy algorithm is developed that computes a full set of good solutions for the target number of non-zero coefficients. With total complexity $O(n^3)$, the algorithm is computationally expensive. Other references related to sparse optimization include [12, 25, 28, 30, 35, 43].

Our work is largely motivated by [33], [38], and [45]. In [33], both $\ell_1$-penalized and $\ell_0$-penalized sparse PCA problems are considered and a generalized power method is developed. The numerical experiments show that their approach outperforms earlier algorithms both in solution quality and in computational speed. Recently, [38] and [45] both considered the $\ell_0$-constrained sparse PCA problem and proposed an efficient truncated power method. Their algorithms are equivalent and originate from the classic Frank-Wolfe [19] conditional gradient algorithm.

In this chapter, we study both the gradient projection algorithm and an approximate Newton algorithm. Convergence results are established and numerical experiments are given for sparse PCA problems of the form (2.2). The algorithms have the same iteration complexity as the fastest current algorithms. The gradient projection algorithm with unit step size has nearly identical performance to that of ConGradU and Tpower. On the other hand, the approximate Newton algorithm can often converge faster to a better objective value than the other algorithms.

This chapter is organized as follows. In Section 2.1 we analyze the gradient projection algorithm when the constraint set is nonconvex. Section 2.2 introduces and analyzes the approximate Newton scheme. Section 2.3 examines the performance of

the algorithms in some numerical experiments based on classic examples found in the sparse PCA literature.

Notation in this chapter. If $f:\mathbb{R}^n\to\mathbb{R}$ is differentiable, then $\nabla f(x)$ denotes the gradient of $f$, a row vector, while $g(x)$ denotes the gradient of $f$ arranged as a column vector. The subscript $k$ denotes the iteration number. In particular, $x_k$ is the $k$-th iterate of $x$ and $g_k=g(x_k)$. The $i$-th element of the $k$-th iterate is denoted $x_{ki}$. $\|\cdot\|$ denotes the Euclidean norm and $\|\cdot\|_0$ denotes cardinality (the number of non-zero elements). If $x\in\mathbb{R}^n$, then the support of $x$ is the set of indices of its nonzero components: $\mathrm{supp}(x)=\{i : x_i\ne 0\}$. If $\Omega\subset\mathbb{R}^n$, then $\mathrm{conv}(\Omega)$ is the convex hull of $\Omega$. If $S\subset\{1,2,\dots,n\}$, then $x_S$ is the vector obtained by replacing $x_i$ for $i\in S^c$ by $0$. If $A$ is a set, then $A^c$ is its complement. If $A$ and $B\in\mathbb{R}^{n\times n}$ are symmetric matrices, then $A\preceq B$ means that $A-B$ is negative semidefinite. The standard signum function is defined by
$$\mathrm{sgn}(x)=\begin{cases}+1 & \text{if } x>0,\\ \ \ 0 & \text{if } x=0,\\ -1 & \text{if } x<0.\end{cases}$$

2.1 Gradient Projection Algorithm for Non-convex Optimization

Let us consider an optimization problem of the form
$$\min\{f(x) : x\in\Omega\},\tag{2.4}$$
where $\Omega\subset\mathbb{R}^n$ is a nonempty, closed set and $f:\Omega\to\mathbb{R}$ is differentiable on $\Omega$. Often, the gradient projection algorithm is presented in the context of an optimization problem where the feasible set is convex [3, 4, 23]. Since the feasible set for the sparse PCA problem (2.2) is nonconvex, we will study the gradient projection algorithm for a potentially nonconvex feasible set.

The projection of $x$ onto $\Omega$ is defined by $P_\Omega(x)=\arg\min_{y\in\Omega}\|x-y\|$. For the constraint set that arises in sparse PCA, the projection can be expressed as follows:

Proposition 2.1. For the set
$$\Omega=\{x\in\mathbb{R}^n : \|x\|=1,\ \|x\|_0\le\kappa\},\tag{2.5}$$
where $\kappa$ is a positive integer, we have $T(x)/\|T(x)\|\in P_\Omega(x)$, where $T(x)$ is the vector obtained from $x$ by replacing the $n-\kappa$ elements of $x$ with smallest magnitude by $0$.

Proof. If $y\in\Omega$, then $\|x-y\|^2=\|x\|^2+1-2\langle x,y\rangle$. Hence, we have
$$P_\Omega(x)=\arg\max\{\langle x,y\rangle : \|y\|=1,\ \|y\|_0\le\kappa\}.\tag{2.6}$$
In [38, Prop. 4.3] it is shown that the maximum is attained at $y=T(x)/\|T(x)\|$. We include the proof since it is short and we need to refer to it later. Given any set $S\subset\{1,2,\dots,n\}$, the solution of the problem $\max\{\langle x,y\rangle : \|y\|=1,\ \mathrm{supp}(y)\subset S\}$ is $y=x_S/\|x_S\|$ by the Schwarz inequality, and the corresponding objective value is $\|x_S\|$, where $x_S$ is the vector obtained by replacing $x_i$ for $i\in S^c$ by $0$. Clearly, the maximum is attained when $S$ is the set of indices of $x$ associated with the absolutely largest components.

In general, when $\Omega$ is closed, the projection exists, although it may not be unique when $\Omega$ is nonconvex. If $x_k\in\Omega$ is the current iterate, then in one of the standard
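The truncation operator makes the projection in Proposition 2.1 trivial to compute. Below is a minimal NumPy sketch of $T$ and $P_\Omega$; the function names are ours, chosen for illustration, and ties among equal-magnitude entries are broken arbitrarily, as the proposition permits.

```python
import numpy as np

def truncate(x, kappa):
    """T(x): keep the kappa largest-magnitude entries of x and zero the rest."""
    y = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-kappa:]   # indices of the kappa largest |x_i|
    y[keep] = x[keep]
    return y

def project_sparse_sphere(x, kappa):
    """An element of P_Omega(x) for Omega = {||x|| = 1, ||x||_0 <= kappa},
    via Proposition 2.1; assumes T(x) is nonzero."""
    t = truncate(x, kappa)
    return t / np.linalg.norm(t)
```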

implementations of the gradient projection algorithm, $x_{k+1}$ is obtained by a line search along the line segment connecting $x_k$ and $P_\Omega(x_k-s_k g_k)$, where $g_k$ is the gradient at $x_k$ and $s_k>0$ is the step size. When $\Omega$ is nonconvex, this line segment is not always contained in $\Omega$. Hence, we will focus on gradient projection algorithms of the form
$$x_{k+1}\in P_\Omega(x_k-s_k g_k).\tag{2.7}$$
Since $\Omega$ is closed, $x_{k+1}$ exists for each $k$. We first observe that $x_{k+1}-x_k$ always forms an obtuse angle with the gradient, which guarantees descent when $f$ is concave.

Lemma 1. If $x_k\in\Omega$, then
$$\nabla f(x_k)(y-x_k)\le 0\quad\text{for all } y\in P_\Omega(x_k-s_k g_k).\tag{2.8}$$
In particular, for $y=x_{k+1}$ this gives
$$\nabla f(x_k)(x_{k+1}-x_k)\le 0.\tag{2.9}$$
If $f$ is concave over $\mathrm{conv}(\Omega)$, then $f(x_{k+1})\le f(x_k)$.

Proof. If $y\in P_\Omega(x_k-s_k g_k)$, then since $P_\Omega(x_k-s_k g_k)$ is the set of elements of $\Omega$ closest to $x_k-s_k g_k$, we have
$$\|y-(x_k-s_k g_k)\|\le\|x_k-(x_k-s_k g_k)\|=s_k\|g_k\|.\tag{2.10}$$
By the Schwarz inequality and (2.10), it follows that
$$g_k^{\mathsf T}\big(y-(x_k-s_k g_k)\big)\le\|g_k\|\,\|y-(x_k-s_k g_k)\|\le s_k\|g_k\|^2.$$
We rearrange this inequality to obtain (2.8). If $f$ is concave over $\mathrm{conv}(\Omega)$, then
$$f(x_{k+1})\le f(x_k)+\nabla f(x_k)(x_{k+1}-x_k).\tag{2.11}$$
By (2.9), $f(x_{k+1})\le f(x_k)$.

The following result is well known.

Proposition 2.2. If $f:\mathbb{R}^n\to\mathbb{R}$ is concave and $\Omega\subset\mathbb{R}^n$, then
$$\inf\{f(x) : x\in\Omega\}=\inf\{f(x) : x\in\mathrm{conv}(\Omega)\},\tag{2.12}$$
where the first infimum is attained only when the second infimum is attained. If $f$ is differentiable at $x^*\in\arg\min\{f(x) : x\in\Omega\}$, then
$$\nabla f(x^*)(y-x^*)\ge 0\quad\text{for all } y\in\mathrm{conv}(\Omega).\tag{2.13}$$

Proof. The first result (2.12) is proved in [42, Thm. 32.2]. If $x^*$ minimizes $f(x)$ over $\Omega$, then by (2.12), $x^*\in\arg\min\{f(x) : x\in\mathrm{conv}(\Omega)\}$. Since $\mathrm{conv}(\Omega)$ is a convex set, the first-order optimality condition for $x^*$ is (2.13).

Note that at a local minimizer $x$ of $f$ over a nonconvex set $\Omega$, the inequality $\nabla f(x)(y-x)\ge 0$ may not hold for all $y\in\Omega$. For example, suppose that $f(x)=a^{\mathsf T}x$, where $\nabla f=a$ has the direction shown in Figure 2-1. The point A is a local minimizer of $f$ over $\Omega$, but (2.13) does not hold. Hence, Proposition 2.2 is only valid for a global minimizer, as stated.

Next, we consider the special choice $y\in P_\Omega(x^*-s\,g(x^*))$ in Proposition 2.2.

Corollary 1. If $f:\mathbb{R}^n\to\mathbb{R}$ is concave and $x^*\in\arg\min\{f(x) : x\in\Omega\}$, then
$$\nabla f(x^*)(y-x^*)=0\tag{2.14}$$
whenever $y\in P_\Omega(x^*-s\,g(x^*))$ for some $s\ge 0$.

Proof. By Proposition 2.2, we have $\nabla f(x^*)(y-x^*)\ge 0$

for all $y\in P_\Omega(x^*-s\,g(x^*))$. On the other hand, by Lemma 1 with $x_k=x^*$, we have $\nabla f(x^*)(y-x^*)\le 0$ for all $y\in P_\Omega(x^*-s\,g(x^*))$. Therefore, (2.14) holds.

The following property of the projection is needed in the main theorem:

Lemma 2. If $\Omega$ is a nonempty closed set, $x_k\in\mathbb{R}^n$ is a sequence converging to $x^*$, and $y_k\in P_\Omega(x_k)$ is a sequence converging to $y^*$, then $y^*\in P_\Omega(x^*)$.

Proof. Since $y_k\in\Omega$ for each $k$ and $\Omega$ is closed, $y^*\in\Omega$. Hence, we have $\|y^*-x^*\|\ge\min_{y\in\Omega}\|y-x^*\|$. If this inequality is an equality, then we are done; consequently, let us suppose that
$$\|y^*-x^*\|>\min_{y\in\Omega}\|y-x^*\|\ge\min_{y\in\Omega}\big\{\|y-x_k\|-\|x_k-x^*\|\big\}=\|y_k-x_k\|-\|x_k-x^*\|.$$
As $k$ tends to $\infty$, the right side approaches $\|y^*-x^*\|$, which yields a contradiction.

We now give further justification for the convergence of the gradient projection algorithm in the nonconvex setting.

Theorem 2.1. If $f:\mathbb{R}^n\to\mathbb{R}$ is concave, $\Omega$ is a compact nonempty set, and $x_k$ is generated by the gradient projection algorithm (2.7), then we have $f(x_{k+1})\le f(x_k)$ for each $k$ and
$$\lim_{k\to\infty}\nabla f(x_k)(x_{k+1}-x_k)=0.\tag{2.15}$$
If $x^*$ is the limit of any convergent subsequence of the $x_k$ and the step size $s_k$ approaches a limit $s^*$, then
$$\nabla f(x^*)(y-x^*)\le 0\quad\text{for all } y\in P_\Omega(x^*-s^*g(x^*)).\tag{2.16}$$

If $f$ is continuously differentiable around $x^*$, then
$$\nabla f(x^*)(y-x^*)=0\tag{2.17}$$
for some $y\in P_\Omega(x^*-s^*g(x^*))$.

Proof. Sum the concavity inequality (2.11) for $k=0,1,\dots,K-1$ to obtain
$$f(x_K)-f(x_0)\le\sum_{k=0}^{K-1}\nabla f(x_k)(x_{k+1}-x_k).\tag{2.18}$$
Since $f$ is continuous and $\Omega$ is compact, $f^*=\min\{f(x) : x\in\Omega\}$ is finite and
$$f^*-f(x_0)\le f(x_K)-f(x_0).\tag{2.19}$$
Together, (2.18) and (2.19) yield (2.15), since $\nabla f(x_k)(x_{k+1}-x_k)\le 0$ for each $k$ by Lemma 1. The relation (2.16) is (2.8) with $x_k$ replaced by $x^*$. For convenience, let $x_k$ also denote the subsequence of the iterates that converges to $x^*$, and let $y_k\in P_\Omega(x_k-s_k g_k)$ denote the iterate produced by $x_k$. Since $y_k$ lies in a compact set, there exists a subsequence converging to a limit $y^*$. Again, for convenience, let $x_k$ and $y_k$ denote this convergent subsequence. By (2.15) and the fact that $f$ is continuously differentiable around $x^*$, we have
$$\lim_{k\to\infty}\nabla f(x_k)(y_k-x_k)=\nabla f(x^*)(y^*-x^*)=0.$$
By Lemma 2, $y^*\in P_\Omega(x^*-s^*g(x^*))$.

Remark. The inequalities (2.9), (2.18), and (2.19) imply that
$$\min_{0\le k\le K}\nabla f(x_k)(x_k-x_{k+1})\le\frac{f(x_0)-f^*}{K+1}.$$

When $\Omega$ is convex, much stronger convergence results can be established for the gradient projection algorithm. In this case, the projection onto $\Omega$ is unique. By [23, Prop. 2.1], for any $x\in\Omega$ and $s>0$, $x=P_\Omega(x-s\,g(x))$ if and only if $x$ is a stationary point for (2.4); that is, $\nabla f(x)(y-x)\ge 0$ for all $y\in\Omega$. Moreover, when $\Omega$ is convex,
$$\nabla f(x)\big(P_\Omega(x-s\,g(x))-x\big)\le-\|P_\Omega(x-s\,g(x))-x\|^2/s\tag{2.20}$$
for any $x\in\Omega$ and $s>0$. Hence, (2.17) implies that the left side of (2.20) vanishes at $x=x^*$, which means that $x^*=P_\Omega(x^*-s\,g(x^*))$. And conversely, if $x^*=P_\Omega(x^*-s\,g(x^*))$, then (2.17) holds.

2.2 Approximate Newton Algorithm

To account for second-order information, Bertsekas [4] analyzes the following version of the gradient projection method:
$$x_{k+1}\in P_\Omega\big(x_k-s_k\nabla^2 f(x_k)^{-1}g_k\big).$$
Strong convergence results can be established when $\Omega$ is convex and $f$ is strongly convex. On the other hand, if $f$ is concave, local minimizers are extreme points of the feasible set, so the analysis is quite different. Suppose that $\nabla^2 f(x_k)$ is approximated by a multiple $\lambda_k$ of the identity matrix, as is done in the BB method [2]. This leads to the approximation
$$f(x)\approx f(x_k)+\nabla f(x_k)(x-x_k)+\frac{\lambda_k}{2}\|x-x_k\|^2.\tag{2.21}$$
Let us consider the algorithm in which the new iterate $x_{k+1}$ is obtained by optimizing the quadratic model:
$$x_{k+1}\in\arg\min\Big\{\nabla f(x_k)(x-x_k)+\frac{\lambda_k}{2}\|x-x_k\|^2 : x\in\Omega\Big\}.\tag{2.22}$$

After completing the square, the iteration is equivalent to
$$x_{k+1}\in\arg\min\big\{\lambda_k\|x-(x_k-g_k/\lambda_k)\|^2 : x\in\Omega\big\}.$$
If $\lambda_k>0$, then this reduces to $x_{k+1}\in P_\Omega(x_k-g_k/\lambda_k)$; in other words, perform the gradient projection algorithm with step size $1/\lambda_k$. If $\lambda_k<0$, then the iteration reduces to
$$x_{k+1}\in Q_\Omega(x_k-g_k/\lambda_k),\tag{2.23}$$
where
$$Q_\Omega(x)=\arg\max\{\|x-y\| : y\in\Omega\}.\tag{2.24}$$
If $\Omega$ is unbounded, then this iteration may not make sense, since the maximum could occur at infinity. But if $\Omega$ is bounded, then the iteration is justified in the sense that it is based on a quadratic model of the function, which could be better than a linear model. In the special case where $\Omega$ is the constraint set (2.5) appearing in sparse PCA and $\lambda_k<0$, the maximization in (2.24) can be evaluated as follows:

Proposition 2.3. For the set $\Omega$ in (2.5) associated with sparse PCA, we have $-T(x)/\|T(x)\|\in Q_\Omega(x)$.

Proof. As in the proof of Proposition 2.1, $\|x-y\|^2=\|x\|^2+1-2\langle x,y\rangle$ when $y$ lies in the set $\Omega$ of (2.5). Hence, we have
$$Q_\Omega(x)=\arg\min\{\langle x,y\rangle : \|y\|=1,\ \|y\|_0\le\kappa\}.$$
Given any set $S\subset\{1,2,\dots,n\}$, the solution of the problem $\min\{\langle x,y\rangle : \|y\|=1,\ \mathrm{supp}(y)\subset S\}$ is $y=-x_S/\|x_S\|$ by the Schwarz inequality, and the corresponding objective value is $-\|x_S\|$. Clearly, the minimum is attained when $S$ corresponds to a set of indices of $x$ associated with the absolutely largest components.
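By Proposition 2.3, the farthest-point map $Q_\Omega$ for this constraint set differs from the projection only by a sign. A one-line sketch, reusing the truncate helper from the previous listing (our naming):

```python
def farthest_sparse_sphere(x, kappa):
    """An element of Q_Omega(x) for the sparse PCA constraint set
    (Proposition 2.3): the negated, normalized truncation of x."""
    t = truncate(x, kappa)
    return -t / np.linalg.norm(t)
```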

Let us now study the iterates generated by the quadratic model (2.22). If $\lambda_k>0$, then the iteration reduces to $x_{k+1}\in P_\Omega(x_k-g_k/\lambda_k)$, which was studied in Section 2.1. Hence, we focus on the case $\lambda_k<0$, where the iteration reduces to $x_{k+1}\in Q_\Omega(x_k-g_k/\lambda_k)$.

Lemma 3. If $x_k\in\Omega$ and $\lambda_k<0$, then
$$\nabla f(x_k)(y-x_k)+\frac{\lambda_k}{2}\|y-x_k\|^2\le 0\quad\text{for all } y\in Q_\Omega(x_k-g_k/\lambda_k).\tag{2.25}$$
In particular, for $y=x_{k+1}$ this gives
$$\nabla f(x_k)(x_{k+1}-x_k)+\frac{\lambda_k}{2}\|x_{k+1}-x_k\|^2\le 0.\tag{2.26}$$
If $f$ satisfies
$$f(y)\le f(x)+\nabla f(x)(y-x)+\frac{\lambda_k}{2}\|y-x\|^2\quad\text{for all } x\text{ and } y\in\Omega,\tag{2.27}$$
then $f(x_{k+1})\le f(x_k)$.

Proof. If $y\in Q_\Omega(x_k-g_k/\lambda_k)$, then since $Q_\Omega(x_k-g_k/\lambda_k)$ is the set of elements of $\Omega$ farthest from $x_k-g_k/\lambda_k$, we have
$$\|y-(x_k-g_k/\lambda_k)\|^2\ge\|x_k-(x_k-g_k/\lambda_k)\|^2=\|g_k\|^2/\lambda_k^2.\tag{2.28}$$
Squaring out the left side of (2.28) and rearranging (using $\lambda_k<0$) gives (2.25). If (2.27) holds, then we combine it with (2.26) to obtain $f(x_{k+1})\le f(x_k)$.

The condition (2.27) is satisfied if $f$ is twice continuously differentiable over $\mathrm{conv}(\Omega)$ and $\lambda_k$ exceeds the largest eigenvalue of the Hessian. We now show that, under suitable assumptions, convergent subsequences of the iterates $x_{k+1}\in Q_\Omega(x_k-g_k/\lambda_k)$ approach a point where the first-order optimality condition (2.13) for a global optimizer holds.

Theorem 2.2. Suppose that $\Omega$ is compact, $f$ is continuously differentiable on $\Omega$, and $x^*$ is a limit of any convergent subsequence of the iterates generated by the scheme

$x_{k+1}\in Q_\Omega(x_k-g_k/\lambda_k)$, where $\lambda_k<0$ for each $k$. If $\|x_{k+1}-x_k\|$ tends to $0$, then
$$\nabla f(x^*)(y-x^*)\ge 0\quad\text{for all } y\in\mathrm{conv}(\Omega).\tag{2.29}$$

Proof. The quadratic model (2.22) is concave when $\lambda_k<0$. Hence, by Proposition 2.2,
$$\big[\nabla f(x_k)+\lambda_k(x_{k+1}-x_k)^{\mathsf T}\big](y-x_{k+1})\ge 0\tag{2.30}$$
for all $y\in\mathrm{conv}(\Omega)$. Let us focus on a subsequence of the $x_k$ converging to $x^*$. Since $\|x_{k+1}-x_k\|$ tends to $0$, those $x_{k+1}$ associated with the convergent $x_k$ subsequence also converge to $x^*$. In the limit, (2.30) yields (2.29).

When $f$ is strongly concave, a line search strategy can be used to ensure that $\|x_{k+1}-x_k\|$ tends to $0$, and hence the optimality condition (2.29) holds at all limit points. For illustration, consider the following algorithm:

MONOTONE APPROXIMATE NEWTON (FOR STRONGLY CONCAVE f)

Given $\eta\in(0,1)$, $[\lambda_{\min},\lambda_{\max}]\subset(-\infty,0)$, and a starting guess $x_1$, set $k=1$.
Step 1. Choose $\bar\lambda_k\in[\lambda_{\min},\lambda_{\max}]$.
Step 2. Set $\lambda_k=\eta^j\bar\lambda_k$, where $j\ge 0$ is the smallest integer such that
$$f(x_{k+1})\le f(x_k)+\frac{\lambda_k}{2}\|x_{k+1}-x_k\|^2,\quad\text{where } x_{k+1}\in Q_\Omega(x_k-g_k/\lambda_k).$$
Step 3. If a stopping criterion is satisfied, terminate.
Step 4. Set $k=k+1$ and go to Step 1.
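The following NumPy sketch implements this line search for the sparse PCA objective written as minimizing $f(x)=-x^{\mathsf T}\Sigma x$, which is strongly concave when $\Sigma$ is positive definite. It reuses farthest_sparse_sphere from above; the default parameter values and the backtracking cap are illustrative assumptions, not values from the dissertation.

```python
def monotone_approx_newton(Sigma, kappa, x, lam_bar=-1.0, eta=0.25, iters=100):
    """Monotone approximate Newton sketch for f(x) = -x'(Sigma)x over
    Omega = {||x|| = 1, ||x||_0 <= kappa}; lam_bar < 0 is the trial lambda."""
    f = lambda z: -z @ Sigma @ z
    for _ in range(iters):
        g = -2.0 * (Sigma @ x)                  # gradient of f as a column vector
        lam = lam_bar
        for _ in range(50):                     # Step 2: try j = 0, 1, 2, ...
            x_new = farthest_sparse_sphere(x - g / lam, kappa)
            if f(x_new) <= f(x) + 0.5 * lam * np.linalg.norm(x_new - x) ** 2:
                break                           # monotone decrease achieved
            lam *= eta                          # lambda_k = eta^j * lam_bar
        x = x_new
    return x
```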

The objective values generated by this algorithm are monotone nonincreasing since $\lambda_k<0$. If $\Omega$ is compact, then $f$ is bounded from below and $f(x_k)-f(x_{k+1})$ approaches $0$. It follows from Step 2 that
$$\|x_{k+1}-x_k\|^2\le\frac{2\big(f(x_k)-f(x_{k+1})\big)}{-\lambda_k}.$$
We now show that when $f$ is strongly concave, $\lambda_k$ is bounded away from $0$, uniformly in $k$. Hence, $\|x_{k+1}-x_k\|$ tends to $0$.

Lemma 4. If $f$ is differentiable on $\Omega$ and for some $\bar\lambda<0$ we have
$$f(y)\le f(x)+\nabla f(x)(y-x)+\frac{\bar\lambda}{2}\|y-x\|^2\quad\text{for all } x\text{ and } y\in\Omega,\tag{2.31}$$
then Step 2 in the monotone approximate Newton algorithm terminates with a finite $j$, and $\lambda_k$ is bounded away from $0$, uniformly in $k$.

Proof. If $x_{k+1}\in Q_\Omega(x_k-g_k/\lambda_k)$, then by (2.31) with $y=x_{k+1}$ and $x=x_k$, and by (2.26), we have
$$f(x_{k+1})\le f(x_k)+\frac{\bar\lambda-\lambda_k}{2}\|x_{k+1}-x_k\|^2.\tag{2.32}$$
If $0>\lambda_k>\bar\lambda/2$, then $\bar\lambda-\lambda_k\le\lambda_k$. Hence, by (2.32), Step 2 must terminate whenever $\lambda_k>\bar\lambda/2$. Since $\eta\in(0,1)$, it follows that Step 2 terminates for a finite $j$. If Step 2 terminates with $j>0$, then $\eta^{j-1}\bar\lambda_k\le\bar\lambda/2$, which implies that $\lambda_k=\eta^j\bar\lambda_k\le\eta\bar\lambda/2$. If Step 2 terminates with $j=0$, then $\lambda_k\le\lambda_{\max}<0$. In either case, $\lambda_k$ is uniformly bounded away from $0$.

One way to choose the Hessian approximation $\lambda_k$ in (2.22), or $\bar\lambda_k$ in the monotone approximate Newton algorithm, is with the BB approximation [2] given by
$$\lambda_k^{BB}=\frac{\big(\nabla f(x_k)-\nabla f(x_{k-1})\big)(x_k-x_{k-1})}{\|x_k-x_{k-1}\|^2}.\tag{2.33}$$

2.3 Numerical Experiments on Sparse PCA Algorithms

We will investigate the performance of the gradient projection and approximate Newton algorithms relative to previously developed algorithms in the literature. In our experiments,

we use the gradient projection algorithm with unit step size (GPU):
$$x_{k+1}\in P_\Omega(x_k-g_k).$$
And in our experiments with the approximate Newton algorithm, we employ the BB approximation (GPBB):
$$x_{k+1}\in Q_\Omega(x_k-g_k/\lambda_k^{BB}).$$
The performance of this nonmonotone scheme will be compared later to that of the monotone approximate Newton algorithm. For the set $\Omega$ associated with sparse PCA, we have $T(x)/\|T(x)\|\in P_\Omega(x)$ and $-T(x)/\|T(x)\|\in Q_\Omega(x)$ by Propositions 2.1 and 2.3, respectively.

We compare the performance of our algorithms to those of both the truncated power method (Tpower) [45] and the generalized power method (Gpower) [33]. The conditional gradient algorithm with unit step size (ConGradU) proposed in [38] is equivalent to the truncated power method. Both the truncated and generalized power methods are targeted at the sparse PCA problem (2.2). The truncated power method handles the sparsity constraint by pushing the absolutely smallest components of the iterates to $0$. The iteration can be expressed as
$$x_{k+1}=\frac{T(-g_k)}{\|T(-g_k)\|}.\tag{2.34}$$
For comparison, an iteration of the gradient projection algorithm with unit step size (GPU) is given by
$$x_{k+1}=\frac{T(x_k-g_k)}{\|T(x_k-g_k)\|},\tag{2.35}$$
while the approximate Newton algorithm is
$$x_{k+1}=\mathrm{sgn}(\lambda_k)\,\frac{T(x_k-g_k/\lambda_k^{BB})}{\|T(x_k-g_k/\lambda_k^{BB})\|}.\tag{2.36}$$
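A compact sketch of the GPU and GPBB iterations (2.35) and (2.36) for $f(x)=-x^{\mathsf T}\Sigma x$, again reusing the helpers above; as noted in the text that follows, GPBB is started with one GPU step so that the BB quotient has two gradients to work with.

```python
def gpu_sparse_pca(Sigma, kappa, x, iters=200):
    """GPU iteration (2.35): gradient projection with unit step size."""
    for _ in range(iters):
        g = -2.0 * (Sigma @ x)
        x = project_sparse_sphere(x - g, kappa)
    return x

def gpbb_sparse_pca(Sigma, kappa, x, iters=200):
    """GPBB iteration (2.36) with the BB approximation (2.33)."""
    g = -2.0 * (Sigma @ x)
    x_new = project_sparse_sphere(x - g, kappa)   # one starting GPU step
    for _ in range(iters):
        g_new = -2.0 * (Sigma @ x_new)
        d = x_new - x
        if d @ d == 0.0:
            break                                 # iterates stalled
        lam = (g_new - g) @ d / (d @ d)           # BB formula (2.33)
        x, g = x_new, g_new
        x_new = np.sign(lam) * project_sparse_sphere(x - g / lam, kappa)
    return x_new
```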

Since the computation of $\lambda_k^{BB}$ requires the gradient at two points, we start GPBB with one iteration of GPU. For the sparse PCA problem (2.2), the time for one iteration of any of these methods is basically the time to multiply a vector by the covariance matrix. Note that the monotone approximate Newton algorithm could be more costly, since the evaluation of an acceptable $j$ may require several evaluations of the objective function.

In the generalized power method, the sparsity constraint is handled using a penalty term. If $\gamma>0$ denotes the penalty, then Gpower$_{\ell_1}$ corresponds to the optimization problem
$$\max_{\|x\|=1}\sqrt{x^{\mathsf T}\Sigma x}-\gamma\|x\|_1,$$
where $\|x\|_1=|x_1|+|x_2|+\dots+|x_n|$, while Gpower$_{\ell_0}$ corresponds to
$$\max_{\|x\|=1}x^{\mathsf T}\Sigma x-\gamma\|x\|_0.$$
The parameter $\gamma$ needs to be tuned to achieve the desired cardinality; as $\gamma$ increases, the cardinality of the Gpower approximation decreases. In contrast, the cardinality is an explicit input parameter for the truncated power method and for our algorithms; in many applications, the cardinality is specified.

The experiments in this chapter were conducted using MATLAB on a GNU/Linux computer with 8 GB of RAM and an Intel Core i7-2600 processor. For the starting guess in our experiments, we follow the practice of the Tpower algorithm [45] and set $x=e_i$, the $i$-th column of the identity matrix, where $i$ is the index of the largest diagonal element of the covariance matrix.

Our numerical experiments are based on the sparse PCA problem (2.2). We measure the quality of the solution to (2.2) computed by any of the methods using the ratio $x^{\mathsf T}\Sigma x/y^{\mathsf T}\Sigma y$, where $x$ is the sparse first principal component computed by any of the algorithms for (2.2) and $y$ is the first principal component (a solution of (2.1)). This ratio is often called the proportion of the explained variance.

Pitprops dataset. This dataset [29] contains 180 observations with 13 variables, and a covariance matrix $\Sigma\in\mathbb{R}^{13\times 13}$. This is a standard benchmark dataset for sparse PCA algorithms. We consider $\kappa=6$ or $7$, and we adjust the value of $\gamma$ to achieve the same sparsity in Gpower. The last column of Table 2-1 gives the proportion of the explained variance. Observe that all the methods achieve essentially the same proportion of the explained variance. We also considered multiple sparse principal components for this dataset and got the same results as those obtained in Table 2 of [48] for the Tpower and PathSPCA algorithms. Similarly, for the lymphoma dataset [1] and the Ramaswamy dataset [41], all methods yielded the same proportion of explained variance, although the value of $\gamma$ for Gpower needed to be tuned to achieve the specified cardinality.

Randomly generated data. In the next set of experiments, we consider randomly generated data, where $\Sigma=A^{\mathsf T}A$ with each entry of $A\in\mathbb{R}^{m\times n}$ generated by a normal distribution with mean $0$ and standard deviation $1$. For randomly generated matrices, we can study the performance as either $m$ or $n$ changes. Each result that we present is based on an average over 100 randomly generated matrices.

In Figure 2-2 we plot the proportion of the explained variance versus cardinality for $m=250$ and $n=500$. Observe that GPBB yields a significantly better objective value as the cardinality decreases when compared to either GPU or ConGradU, while GPU and ConGradU have essentially identical performance.

Even though all the algorithms seem to yield similar results in Figure 2-2 as the cardinality approaches 500, the convergence of the algorithms is quite different. To illustrate this, let us consider the case where the cardinality is 500. In this case, where $\kappa=n$, the sparse PCA problem (2.2) and the original PCA problem (2.1) are identical, and the solution of (2.1) is a normalized eigenvector associated with the largest eigenvalue $\sigma_1$ of $\Sigma$. Since the optimal objective value is known, we can compute the

relative error
$$\frac{\sigma_1^{\mathrm{exact}}-\sigma_1^{\mathrm{approx}}}{\sigma_1^{\mathrm{exact}}},$$
where $\sigma_1^{\mathrm{approx}}$ is the approximation to the optimal objective value generated by any of the algorithms. In Figure 2-3 we plot the base 10 logarithm of the relative error versus the iteration number. Observe that GPBB is able to reduce the relative error to the machine precision near $10^{-16}$ in about 175 iterations, while ConGradU and GPU have relative error around $10^{-3}$ after 200 iterations. To achieve a relative error around $10^{-16}$, ConGradU and GPU require about 4500 iterations, roughly 25 times more than GPBB. In this example, the condition (2.27) of Lemma 3 is not satisfied and the objective values produced by GPBB do not improve monotonically. Nonetheless, the convergence is relatively fast. The results for the explained variance in Figure 2-2 were obtained by running either ConGradU or GPU for 6000 iterations, while GPBB was run for 200 iterations. Hence, the better objective values obtained by GPBB in Figure 2-2 were due to the algorithm converging to a better solution, rather than to premature termination of either GPU or ConGradU.

In Figure 2-4 we plot the proportion of the explained variance versus the iteration number when $m=250$, $n=500$, and the cardinality is 50 in the random dataset. When we plot the function value as in Figure 2-4, it is more difficult to see the nonmonotone nature of the convergence for GPBB. This nonmonotone nature is clearly visible in Figure 2-3, where we plot the error instead of the function value. In Figure 2-5 we show how the explained variance depends on $m$. As $m$ increases, the explained variance associated with GPBB becomes much better than that of either ConGradU or GPU.

As mentioned already, the nonmonotone convergence of GPBB is due to the fact that the BB value for $\lambda_k$ may not satisfy condition (2.27). We can achieve monotone convergence by increasing $\lambda_k$ until it is large enough that $f(x_{k+1})\le f(x_k)$, as in the

monotone approximate Newton algorithm. In Figure 2-6 we compare the relative error of the nonmonotone GPBB with that of the monotone scheme corresponding to $\eta=0.25$. The error in the monotone algorithm converged to the computing precision around $10^{-16}$ in about the same number of iterations as the nonmonotone algorithm; however, the running time for the monotone algorithm was about 4 times larger, since we may need to test several choices of $\lambda_k$ before generating a monotone iterate.

To compare with Gpower, we need to choose a value for $\gamma$. We first consider a simple case with $m=20$ and $n=20$, and we use the default seed in MATLAB to generate this matrix. The algorithms are used to extract the first principal component with $\kappa=5$, and with $\gamma$ tuned to achieve cardinality 5. The results in Table 2-2 indicate that Gpower performed similarly to Tpower, but not as well as GPBB. In the next experiment, we consider 100 randomly generated matrices with $m=250$ and $n=500$, and with the parameter $\gamma$ in Gpower chosen to achieve an average cardinality near 100 or 120. As seen in Table 2-3, Gpower$_{\ell_0}$ achieves similar values for the proportion of the explained variance as Tpower, while Gpower$_{\ell_1}$ achieves slightly better results and GPBB achieves the best results.

Hollywood-2009 dataset, densest k-subgraph (DkS). Given an undirected graph $G=(V,E)$ with vertices $V=\{1,2,\dots,n\}$ and edge set $E$, and given an integer $k\in[1,n]$, the densest $k$-subgraph (DkS) problem is to find a set of $k$ vertices whose average degree in the subgraph induced by this set is as large as possible. Algorithms for finding DkS are useful tools for analyzing networks. Many techniques have been proposed for solving this problem, including [5], [34], [44]. Mathematically, DkS is equivalent to a binary quadratic programming problem
$$\max\{\pi^{\mathsf T}A\pi : \pi\in\{0,1\}^n,\ \|\pi\|_0=k\},\tag{2.37}$$

where $A$ is the adjacency matrix of the graph; $a_{ij}=1$ if $(i,j)\in E$, while $a_{ij}=0$ otherwise. We relax the constraints $\pi\in\{0,1\}^n$ and $\|\pi\|_0=k$ to $\|\pi\|=\sqrt{k}$ and $\|\pi\|_0\le k$, and consider the following relaxed version of (2.37):
$$\max\{\pi^{\mathsf T}A\pi : \pi\in\mathbb{R}^n,\ \|\pi\|=\sqrt{k},\ \|\pi\|_0\le k\}.\tag{2.38}$$
After a suitable scaling of $\pi$, this problem reduces to the sparse PCA problem (2.2).

Let us consider the Hollywood-2009 dataset [6, 7], which is associated with a graph whose vertices are actors in movies; an edge joins two vertices whenever the associated actors appear in a movie together. The dataset can be downloaded from the following website: http://law.di.unimi.it/datasets.php

The adjacency matrix $A$ is $1139905\times 1139905$. In order to apply Gpower to the relaxed problem, we first factored $A+cI$ into a product of the form $R^{\mathsf T}R$ using a Cholesky factorization, where $c>0$ is taken large enough to make $A+cI$ positive definite. Here, $R$ plays the role of the data matrix. However, one of the steps in the Gpower code updates the data matrix by a rank-one matrix, and the rank-one matrix caused the updated data matrix to exceed the 200 GB memory on the largest computer readily available for the experiments. Hence, this problem was only solved using Tpower and GPBB. Since the adjacency matrix requires less than 2 GB of memory, it easily fit on our 8 GB computer.

In Table 2-4, we compare the density values $\pi^{\mathsf T}A\pi/k$ obtained by the algorithms. In addition, we also computed the largest eigenvalue $\sigma$ of the adjacency matrix $A$, and give the ratio of the density to $\sigma$. Observe that in 2 of the 6 cases, GPBB obtained a significantly better value for the density when compared to Tpower, while in the other 4 cases both algorithms converged to the same maximum.
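Concretely, with $x=\pi/\sqrt{k}$ the relaxation (2.38) becomes the sparse PCA problem (2.2) with $\Sigma=A$ and $\kappa=k$, since $\pi^{\mathsf T}A\pi=k\,x^{\mathsf T}Ax$. A sketch of this reduction using the GPBB routine from the earlier listing (a heuristic here, since $A$ is not positive definite; the helper name is ours):

```python
def dks_via_sparse_pca(A, k, x0, iters=200):
    """Densest-k-subgraph heuristic via the relaxation (2.38): run sparse PCA
    with Sigma = A and kappa = k, then rescale pi = sqrt(k) * x."""
    x = gpbb_sparse_pca(A, k, x0, iters=iters)
    pi = np.sqrt(k) * x
    return pi, pi @ A @ pi / k    # candidate subgraph vector and its density
```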

2.4 Discussion on the Projection Algorithm and Sparse PCA

The gradient projection algorithm was studied in the case where the constraint set may be nonconvex, as it is in sparse principal component analysis. Each iteration of the gradient projection algorithm satisfies the condition $\nabla f(x_k)(x_{k+1}-x_k)\le 0$. Moreover, if $f$ is concave over $\mathrm{conv}(\Omega)$, then $f(x_{k+1})\le f(x_k)$ for each $k$. When a subsequence of the iterates converges to $x^*$, we obtain in Theorem 2.1 the equality $\nabla f(x^*)(y-x^*)=0$ for some $y\in P_\Omega(x^*-s^*g(x^*))$, where $P_\Omega$ projects a point onto $\Omega$ and $s^*$ is the limiting step size. When $\Omega$ is convex, $y$ is unique and the condition $\nabla f(x^*)(y-x^*)=0$ is equivalent to the first-order necessary optimality condition at $x^*$ for a local minimizer.

In the approximate Newton algorithm with a positive Hessian approximation $\lambda_k$, the iteration reduces to the projected gradient algorithm with step size $1/\lambda_k$. On the other hand, when $\lambda_k<0$, as it is when the objective function is concave and the Hessian approximation is computed by the Barzilai-Borwein formula (2.33), the iteration amounts to taking a step along the positive gradient, and then moving as far away as possible while staying inside the feasible set.

In numerical experiments based on sparse principal component analysis, the gradient projection algorithm with unit step size performed similarly to both the truncated power method and the generalized power method. On the other hand, in some cases the approximate Newton algorithm with a Barzilai-Borwein step size could converge much faster to a better objective function value than the other methods.

Figure 2-1. Example that shows Proposition 2.2 may not hold for local minimizers.

Table 2-1. Results on Pitprops dataset.

Method              Parameters           Explained variance
GPBB                κ = 6                0.8939
GPBB                κ = 7                0.9473
GPU                 κ = 6                0.8939
GPU                 κ = 7                0.9473
Tpower (ConGradU)   κ = 6                0.8939
Tpower (ConGradU)   κ = 7                0.9473
Gpower_l1           γ = 0.5 (κ = 6)      0.8939
Gpower_l1           γ = 0.4 (κ = 7)      0.9473
Gpower_l0           γ = 0.2 (κ = 6)      0.8939
Gpower_l0           γ = 0.15 (κ = 7)     0.9473

Figure 2-2. Explained variance versus cardinality for random dataset.

Figure 2-3. A plot of the base 10 logarithm of the relative error versus iteration number for the random dataset with m = 250 and cardinality = 500.

Figure 2-4. Explained variance versus iteration number for cardinality 50 in the random dataset.

Figure 2-5. Explained variance versus m for cardinality 20 in the random dataset.

Figure 2-6. A plot of the base 10 logarithm of the relative error versus iteration number for the random dataset with m = 250 and cardinality = 500. The boxes correspond to the monotone algorithm, which employs a line search, while the circles correspond to the nonmonotone algorithm with λ_k given by the BB formula (2.33).

Table 2-2. Simple random dataset.

Method              Cardinality          Explained variance
GPBB                κ = 5                0.8193
Tpower (ConGradU)   κ = 5                0.7913
Gpower_l1           γ = 0.18 (κ = 5)     0.7914
Gpower_l0           γ = 0.045 (κ = 5)    0.7914

Table 2-3. Random dataset, m = 250, n = 500.

Method              Cardinality                    Explained variance
GPBB                κ = 100                        0.7396
GPBB                κ = 120                        0.7823
Tpower (ConGradU)   κ = 100                        0.7106
Tpower (ConGradU)   κ = 120                        0.7536
Gpower_l1           γ = 0.075 (average κ = 99)     0.7288
Gpower_l1           γ = 0.0684 (average κ = 120)   0.7679
Gpower_l0           γ = 0.0078 (average κ = 100)   0.7129
Gpower_l0           γ = 0.0066 (average κ = 120)   0.7557

Table 2-4. Hollywood dataset.

Method              Cardinality   Density π^T A π / k   Ratio π^T A π / (kσ)
GPBB                k = 500       379.40                0.1688
GPBB                k = 600       401.22                0.1785
GPBB                k = 700       593.24                0.2639
GPBB                k = 800       649.67                0.2891
GPBB                k = 900       700.38                0.3116
GPBB                k = 1000      745.95                0.3319
Tpower (ConGradU)   k = 500       190.11                0.0846
Tpower (ConGradU)   k = 600       401.21                0.1785
Tpower (ConGradU)   k = 700       436.53                0.1942
Tpower (ConGradU)   k = 800       649.67                0.2891
Tpower (ConGradU)   k = 900       700.44                0.3116
Tpower (ConGradU)   k = 1000      745.95                0.3319

CHAPTER 3
A DECENTRALIZED ALGORITHM FOR SEPARABLE CONVEX OPTIMIZATION

In modern computation, datasets are often of large dimensions. Developing distributed methods for computation and storage has become an important research topic. Many powerful optimization algorithms were developed decades ago. In recent years, they have regained vast popularity because of the demand in research areas such as machine learning and artificial intelligence. The advance in computational speed and storage capacity empowers researchers to utilize large-scale and efficient variants of the already-developed optimization algorithms.

One such algorithm is the Alternating Direction Method of Multipliers (ADMM). Much classic foundational work, such as [17, 18, 20], has been done to show the robustness and effectiveness of this method. In its standard form, the ADMM usually splits the feature variable into two blocks. It is an open question in optimization whether ADMM with more than two variable blocks will converge or not. Recently, the authors of [11] provided counter-examples showing that ADMM may not converge when the variable is split into more than two sets. It has been shown in some studies [11, 16, 24, 26, 27, 36] that if certain conditions are satisfied, convergence of certain multi-block variants of the algorithm can be guaranteed. In [24, 27], convergence analysis is provided for a case where ADMM is applied with multi-block splitting. The algorithms given in those studies are based on a Gauss-Seidel update and are different from ours. Notably, [27] mentioned a modification using a Jacobi update scheme in the primal variable update but did not provide the complete proof. Their Jacobi modification involves an adjustment for the variable update. This variant is similar to our algorithm, although different. [16] proposed a similar Jacobian ADMM and another variant called proximal Jacobian ADMM. However, their convergence analysis for the Jacobian ADMM imposes a near-orthogonality requirement

on the constraint and is different from ours. This requirement may not be satisfied in real applications, such as our subsequent case study of power-grid load control.

We propose a decentralized variant of the ADMM algorithm for solving the power-grid load control problem. We observe a significant performance advantage of our algorithm over the existing algorithm in restoring the power-grid load-generation balance. Another contribution of this study is that we provide a self-contained convergence proof for this specific algorithm with multiple variable blocks. The convergence of ADMM with multiple variable blocks has largely been an open problem in optimization. Our proof of convergence is different from the previous results on the Jacobian variants of ADMM and is applicable to the power-grid load control problem.

This chapter is organized as follows. Section 3.1 introduces the engineering background of the power-grid control application. In Section 3.2, the load control problem is formulated as an optimization problem. Sections 3.3 and 3.4 discuss and propose the multi-block algorithm. Section 3.5 provides a self-contained theoretical proof of convergence for our algorithm. Sections 3.6 and 3.7 present the numerical experiments. Finally, a linear convergence rate is proved in Section 3.8.

Notation in this chapter. $\partial f$ denotes the gradient of $f$ when $f$ is differentiable, or the sub-gradient when $f$ is not differentiable. The superscript often denotes the iteration number of the variable. The subscript often denotes the component of the variable. $x_i^k$ is the $i$-th component of the $k$-th iterate of $x$. $\|\cdot\|$ denotes the Euclidean norm.

3.1 The Power-grid Load Control Problem Background

In the power grid, generation must match consumption. If there is too large a mismatch, the generators may be shut down to prevent damage, and blackouts may occur as a result. Load control offers a solution to maintaining this balance. Instead of generators changing generation, some loads may change consumption. Centralized control of loads is impractical due to the scale of the problem, because there may be a large

number of loads that must be controlled. Therefore, decentralization is paramount and has been the focus of much of the literature. Each load shares information with some subset of other loads in the grid. Knowledge of the imbalance between generation and consumption may be injected into the communication network at some node, or it may be inferred by each load from local measurements. Distributed optimization in the presence of a communication network among agents is popular in the literature.

We consider the load control problem in a micro-grid with one frequency, one generator, and $n$ loads with power consumption. Suppose there is a change in power generation, denoted by $C$. The load control algorithm adjusts the power consumption of load $i$ by $x_i\in[a_i,b_i]$. This causes an end-user disutility, modelled by the function $f_i$. We wish to eliminate the load-generation mismatch so that $\sum_{i=1}^n x_i=C$ while minimizing the user disutility $\sum_{i=1}^n f_i(x_i)$. Similar models have been considered in [9, 47].

3.2 Formulating Load Control as Constrained Optimization

We can cast the aforementioned load control problem as the optimization problem
$$\text{minimize}_x\ \sum_{i=1}^n f_i(x_i)\quad\text{subject to}\ a_i\le x_i\le b_i,\ i=1,\dots,n,\quad\sum_{i=1}^n x_i=C,\tag{3.1}$$
where each function $f_i$ is assumed to be convex. We refer to $a_i\le x_i\le b_i$, $i=1,\dots,n$, as the box constraint and to $\sum_{i=1}^n x_i=C$ as the linear constraint.

One of the prominent appearances of similar problems is in solving the dual formulation of the support vector machine (SVM). Many optimization methods, such as [13, 21, 31, 40, 46], have been developed to solve SVM. If the objective function in problem (3.1) is a separable quadratic function, the problem is referred to as the separable convex quadratic knapsack problem. A few algorithms, such as [39] and [15], have been proposed to solve it. All the aforementioned methods are efficient solvers of the optimization problem. However, all

the methods mentioned are centralized algorithms. This means they rely on a centralized control unit to update and store variables. In a decentralized algorithm, each agent must be able to update and store its own variables. Existing solvers for the support vector machine require centralized control to handle the linear constraint in (3.1). A decentralized algorithm is necessary in cases, such as power-grid load control, where there is no centralized control unit. Recently, [47] proposed to use a decentralized dual ascent scheme to solve the problem. The case study shows their algorithm is able to effectively restore the load-generation balance. In this chapter, we propose a decentralized algorithm to solve the optimization problem (3.1).

3.3 Dual Decomposition and Multi-block Variable Splitting

The idea of dual decomposition offers an advantage in designing decentralized algorithms. This can be seen in the standard form of ADMM. Consider the convex optimization problem with an equality constraint
$$\text{minimize}_x\ f(x)\quad\text{subject to}\ Ax=b.\tag{3.2}$$
The augmented Lagrangian of (3.2) is
$$L_\rho(x,y)=f(x)+y^{\mathsf T}(Ax-b)+\frac{\rho}{2}\|Ax-b\|^2,\tag{3.3}$$
where $y$ is the dual variable and $\rho>0$ is a parameter. The method of multipliers solves the optimization problem with the augmented objective function
$$\text{minimize}_x\ f(x)+\frac{\rho}{2}\|Ax-b\|^2\quad\text{subject to}\ Ax=b.\tag{3.4}$$
It is obvious that this problem is equivalent to problem (3.2). This formulation enjoys improved convergence properties. However, the objective function of (3.4) is not

separable, because of the addition of the augmented term. The ADMM algorithm considers the problem of the form
$$\text{minimize}_{x,z}\ f(x)+g(z)\quad\text{subject to}\ Ax+Bz=c.\tag{3.5}$$
Problem (3.5) differs from the previous formulation in that the primal variable $x$ in (3.2) is split into two variables $x$ and $z$. The augmented Lagrangian of this problem is
$$L_\rho(x,y,z)=f(x)+g(z)+y^{\mathsf T}(Ax+Bz-c)+\frac{\rho}{2}\|Ax+Bz-c\|^2.\tag{3.6}$$
The update rule of ADMM is given by
$$\begin{aligned}
x^{k+1}&=\arg\min_x L_\rho(x,y^k,z^k),\\
z^{k+1}&=\arg\min_z L_\rho(x^{k+1},y^k,z),\\
y^{k+1}&=y^k+\rho(Ax^{k+1}+Bz^{k+1}-c).
\end{aligned}\tag{3.7}$$
In general, the convergence of the ADMM can be summarized as follows.

- Objective convergence: $f(x^k)+g(z^k)\to p^\star$.
- Dual variable convergence: $y^k\to y^\star$.
- Primal residual convergence: $Ax^k+Bz^k-c\to 0$.
- ADMM converges even if the $x$-update is inexact, provided the update is made more accurate as the iteration number increases. This was proved in [18].
- The rate of convergence is linear; see, e.g., [16, 27].

In the aforementioned standard form, the ADMM splits the primal variable into two blocks, $x$ and $z$. We have seen in the formulation (3.1) that it would be most efficient if the problem could be made completely separable, i.e., each update done in a distributed fashion on one component $x_i$ only. This requires the variable to be completely split into $n$ blocks instead of the $x$ and $z$ of the standard form. However, such direct extensions of the original algorithm often lack a theoretical convergence guarantee.
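As a concrete instance of (3.7), the sketch below applies ADMM to the tiny problem: minimize $\frac12\|x-u\|^2+\frac12\|z-v\|^2$ subject to $x-z=0$. This toy problem is our illustrative choice, not one from the dissertation, picked because both subproblems in (3.7) then have closed forms.

```python
import numpy as np

def two_block_admm(u, v, rho=1.0, iters=100):
    """Standard two-block ADMM (3.7) on:
    minimize 0.5||x - u||^2 + 0.5||z - v||^2  subject to  x - z = 0."""
    x = np.zeros_like(u); z = np.zeros_like(v); y = np.zeros_like(u)
    for _ in range(iters):
        x = (u - y + rho * z) / (1.0 + rho)   # x-update: argmin_x L_rho(x, y, z)
        z = (v + y + rho * x) / (1.0 + rho)   # z-update: argmin_z L_rho(x_new, y, z)
        y = y + rho * (x - z)                 # dual update
    return x, z                               # both converge to (u + v) / 2
```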

In applications such as the power-grid load control problem, each agent often has access only to the local variable $x_i$. Each agent only has the information of the local objective function $f_i$, but not the other objective functions $f_j$, $j\ne i$. Moreover, there is often no centralized control unit because of hardware limitations. The practical situation calls for a decentralized algorithm which uses only local variable and objective function information. It is worth noting that [27] mentioned a modification using a Jacobi update scheme in the primal variable update but did not provide the complete proof. Their algorithm modification involves an adjustment for the variable update, which is different from ours. [16] also proposed similar Jacobi variants of the algorithm. Their condition for convergence requires a property to be satisfied by the linear constraint. The algorithms mentioned in those works are different from ours; they cannot be directly applied to the power-grid load control problem.

3.4 Decentralized Multi-block ADMM for Power Grid Load Control

We first consider the following augmented formulation of problem (3.1):
$$\text{minimize}_x\ \sum_{i=1}^n f_i(x_i)+\frac{\rho}{2}\Big\|\sum_{i=1}^n x_i-C\Big\|^2\quad\text{subject to}\ a_i\le x_i\le b_i,\ i=1,\dots,n,\quad\sum_{i=1}^n x_i=C.\tag{3.8}$$
In the power-grid control case, the problem calls for a decentralized algorithm, which means there may be no centralized control unit to broadcast or store information. In addition, there may be no or limited communication between components $x_i$ and $x_j$. Furthermore, agent $i$ does not have the information of the other objective functions $f_j$, $j\ne i$. We assume that in applications such as the power-grid load control problem, agent $i$ can obtain a local estimate of the primal residual $r^k=\sum_{i=1}^n x_i^k-C$, which is the measurement of feasibility. We consider the following update rule for agent $i$:

DECENTRALIZED MULTI-BLOCK ADMM (DM-ADMM)

For $k=1,2,\dots$
Distributed task. For each block $i$,
$$\begin{aligned}
y_i^{k+1}&=y_i^k+\rho\Big(\sum_{i=1}^n x_i^k-C\Big),\\
x_i^{k+1}&=\arg\min_{a_i\le x\le b_i}\ f_i(x)+y_i^{k+1}\Big(x+\sum_{j\ne i}x_j^k-C\Big)+\frac{\rho}{2}\Big\|x+\sum_{j\ne i}x_j^k-C\Big\|^2.
\end{aligned}\tag{3.9}$$
End.

It is obvious that this algorithm is an adaptation of the ADMM algorithm. It is similar to the Jacobi update variant proposed in [27], although we do not require an adjustment as is needed in their algorithm. We favor the Jacobi update over the Gauss-Seidel update because of its suitability for decentralized algorithms. In this update rule, each agent stores its own copy of the dual variable $y_i^k$. However, it is clear that if this rule is used, all the dual variables $y_i^k$ stored on different agents will be equal.

If the objective function is given by the quadratic function $f_i(x)=\frac{1}{2}d_i x^2$, the minimization problem in the $x$-update step can be efficiently solved by
$$x_i^{k+1}=\mathrm{proj}_{[a_i,b_i]}\Big(\frac{-y_i^{k+1}-\rho(r^k-x_i^k)}{d_i+\rho}\Big),\tag{3.10}$$
where the simple projection operator is given by $\mathrm{proj}_{[l,h]}(t)=\mathrm{median}(l,t,h)$.

It is worth mentioning that, although our convergence analysis below is given in the context of a constant parameter $\rho$, in reality the choice of $\rho$ will affect the convergence rate. Such behavior can be observed in the simulations and in the results of our convergence rate study in Section 3.8. Therefore, we will discuss a practical adaptive parameter tuning scheme at the end of Section 3.7.
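A minimal NumPy sketch of DM-ADMM for the quadratic disutility case, using the closed-form update (3.10). The simulation is serial, but the per-agent structure is kept explicit: each agent $i$ touches only its own $(x_i,y_i)$ and the shared residual estimate $r^k$. Names and the example numbers are ours, chosen for illustration.

```python
import numpy as np

def dm_admm(d, a, b, C, rho, iters=500):
    """DM-ADMM (3.9) for f_i(x) = 0.5 * d[i] * x^2 with box constraints
    [a[i], b[i]] and the linear constraint sum(x) = C."""
    n = len(d)
    x = np.zeros(n)
    y = np.zeros(n)              # per-agent dual copies (they remain equal)
    for _ in range(iters):
        r = x.sum() - C          # primal residual r^k, estimated locally
        y = y + rho * r          # dual update in (3.9)
        t = (-y - rho * (r - x)) / (d + rho)
        x = np.clip(t, a, b)     # closed-form x-update (3.10); clip = median(a, t, b)
    return x, x.sum() - C

# Illustrative run: 3 loads, unit quadratic disutility, mismatch C = 10.
x, r = dm_admm(d=np.ones(3), a=np.zeros(3), b=np.full(3, 10.0), C=10.0, rho=0.2)
```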

3.5 Convergence of the Algorithm

The convergence analysis of the standard two-block ADMM has been studied in many works. Notably, [18] proved a convergence result that serves as the cornerstone for much of the later analysis of ADMM. Their results also include a proof discussing the case where an inexact minimizer is used in the sub-problem. [8] provides a review and generalized discussion of ADMM and its applications. Recently, there have been a few studies on multi-block ADMM, such as [11, 16, 24, 27]. The results generally state that variants of multi-block ADMM converge under certain conditions. Their algorithms and conditions for convergence are different from ours. Nonetheless, the convergence of the multi-block case is still a non-trivial research topic. We provide a self-contained proof in this section which does not require complex concepts from convex analysis. We use the terminology of the foundational work [42].

We make two general assumptions to prepare for our convergence analysis.

Assumption 1. The function $f_i$ is proper, lower semi-continuous, and convex.

Assumption 2. The Lagrangian function
$$L_0(x,y)=\sum_{i=1}^n f_i(x_i)+y\Big(\sum_{i=1}^n x_i-C\Big)\tag{3.11}$$
has a saddle point.

In the proof below, we simplify the problem by dropping the box constraint $a_i\le x\le b_i$. Such constraints can be incorporated into the proof by replacing the objective function $f_i(x)$ by $f_i(x)+g_i(x)$, where $g_i(x)$ is the indicator function of the interval $[a_i,b_i]$. The resulting objective function is still a proper, lower semi-continuous convex function. It is obvious that under our update rule all the $y_i^k$ are equal; we thus drop the subscript of the dual variable $y^k$ in our proof. We refer to
$$r^k=\sum_{i=1}^n x_i^k-C$$

as the primal residual. If $r^k=0$, then the linear constraint in the feasibility condition of problem (3.1) is satisfied. We use $p^\star=\sum_{i=1}^n f_i(x_i^\star)$ and $p^k=\sum_{i=1}^n f_i(x_i^k)$ to denote the global minimum value and the objective value at the $k$-th iteration. We begin by establishing a result that describes the properties of the objective function value.

Proposition 3.1. Let $x^\star$ be the global minimizer of the optimization problem (3.1) and $p^\star=\sum_{i=1}^n f_i(x_i^\star)$ be the global minimum value. Let $x^{k+1}$ and $y^{k+1}$ be the iterates generated by the update rule (3.9), and let $p^{k+1}=\sum_{i=1}^n f_i(x_i^{k+1})$ be the corresponding function value at that iteration. Then
$$p^{k+1}-p^\star\le-y^{k+2}r^{k+1}-\rho(r^k-r^{k+1})r^{k+1}-\rho\sum_{i=1}^n(x_i^k-x_i^{k+1})(x_i^\star-x_i^{k+1}).\tag{3.12}$$

Proof. Since $x_i^{k+1}$ is the minimizer in the update (3.9), we have
$$0\in\partial f_i(x_i^{k+1})+y_i^{k+1}+\rho\Big(x_i^{k+1}+\sum_{j\ne i}x_j^k-C\Big).$$
Using the update rule $y^{k+2}=y^{k+1}+\rho\big(x_i^{k+1}+\sum_{j\ne i}x_j^{k+1}-C\big)$, we obtain
$$0\in\partial f_i(x_i^{k+1})+y^{k+2}+\rho\Big(\sum_{j\ne i}x_j^k-\sum_{j\ne i}x_j^{k+1}\Big).\tag{3.13}$$
From this inclusion, we observe that $x_i^{k+1}$ is the minimizer of
$$f_i(x)+\Big(y^{k+2}+\rho\Big(\sum_{j\ne i}x_j^k-\sum_{j\ne i}x_j^{k+1}\Big)\Big)x.$$
Thus,
$$f_i(x_i^{k+1})+\Big(y^{k+2}+\rho\Big(\sum_{j\ne i}x_j^k-\sum_{j\ne i}x_j^{k+1}\Big)\Big)x_i^{k+1}\le f_i(x_i^\star)+\Big(y^{k+2}+\rho\Big(\sum_{j\ne i}x_j^k-\sum_{j\ne i}x_j^{k+1}\Big)\Big)x_i^\star,$$
so that
$$f_i(x_i^{k+1})-f_i(x_i^\star)\le\Big(y^{k+2}+\rho\Big(\sum_{j\ne i}x_j^k-\sum_{j\ne i}x_j^{k+1}\Big)\Big)(x_i^\star-x_i^{k+1}).\tag{3.14}$$

Summing (3.14) over all $i$, and using $\sum_{i=1}^n(x_i^\star-x_i^{k+1})=-r^{k+1}$ together with $\sum_{j\ne i}(x_j^k-x_j^{k+1})=(r^k-r^{k+1})-(x_i^k-x_i^{k+1})$, we obtain the result (3.12).

Lemma 5. Let $(x^\star,y^\star)$ be the global solution of the problem (3.1), and let $p^\star$, $p^{k+1}$ denote the objective function values at $x^\star$, $x^{k+1}$. If the objective function $\sum f_i$ is strongly convex with modulus $\sigma$, then
$$p^\star-p^{k+1}\le y^\star r^{k+1}-\sigma\|x^{k+1}-x^\star\|^2.\tag{3.15}$$

Proof. Since $(x^\star,y^\star)$ is a saddle point of $L_0$ and $\sum f_i$ is strongly convex with modulus $\sigma$, $x^\star$ minimizes $L_0(x,y^\star)$, and therefore
$$L_0(x^{k+1},y^\star)\ge L_0(x^\star,y^\star)+\sigma\|x^{k+1}-x^\star\|^2.$$
Since $\sum_{i=1}^n x_i^\star=C$, we have $L_0(x^\star,y^\star)=p^\star$ and $L_0(x^{k+1},y^\star)=p^{k+1}+y^\star r^{k+1}$. Rearranging gives (3.15).
Adding this inequality to (3.12) gives
$$0\le-(y^{k+2}-y^\star)r^{k+1}-\rho(r^k-r^{k+1})r^{k+1}-\rho\sum_{i=1}^n(x_i^k-x_i^{k+1})(x_i^\star-x_i^{k+1})-\sigma\|x^{k+1}-x^\star\|^2.$$
Multiplying through by $-2$,
$$0\ge 2(y^{k+2}-y^\star)r^{k+1}+2\rho(r^k-r^{k+1})r^{k+1}+2\rho\sum_{i=1}^n(x_i^k-x_i^{k+1})(x_i^\star-x_i^{k+1})+2\sigma\|x^{k+1}-x^\star\|^2.\tag{3.16}$$
We examine the first term on the right-hand side. Using the update rule $y^{k+2}=y^{k+1}+\rho r^{k+1}$, we have
$$\begin{aligned}
2(y^{k+2}-y^\star)r^{k+1}&=2(y^{k+1}-y^\star)r^{k+1}+2\rho\|r^{k+1}\|^2\\
&=\frac{2}{\rho}(y^{k+1}-y^\star)(y^{k+2}-y^{k+1})+\frac{1}{\rho}\|y^{k+2}-y^{k+1}\|^2+\rho\|r^{k+1}\|^2\\
&=\frac{1}{\rho}\big(\|y^{k+2}-y^\star\|^2-\|y^{k+1}-y^\star\|^2\big)+\rho\|r^{k+1}\|^2.
\end{aligned}\tag{3.17}$$
The last line above is obtained by using $y^{k+2}-y^{k+1}=y^{k+2}-y^\star+y^\star-y^{k+1}$. Then inequality (3.16) can be written as
$$\frac{1}{\rho}\big(\|y^{k+1}-y^\star\|^2-\|y^{k+2}-y^\star\|^2\big)\ge\rho(r^{k+1})^2+2\rho(r^k-r^{k+1})r^{k+1}+2\rho\sum_{i=1}^n(x_i^k-x_i^{k+1})(x_i^\star-x_i^{k+1})+2\sigma\|x^{k+1}-x^\star\|^2.\tag{3.18}$$
We next examine the summation term. Using $x_i^k-x_i^{k+1}=(x_i^k-x_i^\star)-(x_i^{k+1}-x_i^\star)$ and vector notation, the summation term becomes
$$2\rho(x^k-x^\star)^{\mathsf T}(x^\star-x^{k+1})-2\rho(x^{k+1}-x^\star)^{\mathsf T}(x^\star-x^{k+1}).\tag{3.19}$$
We add and subtract the quantity $\rho\|x^{k+1}-x^k\|^2=\rho\|x^{k+1}-x^\star\|^2+\rho\|x^k-x^\star\|^2-2\rho(x^{k+1}-x^\star)^{\mathsf T}(x^k-x^\star)$.

After rearranging, we rewrite (3.19) as
$$\rho\|x^{k+1}-x^k\|^2-\rho\|x^k-x^\star\|^2+\rho\|x^{k+1}-x^\star\|^2.\tag{3.20}$$
Now inequality (3.18) becomes
$$\frac{1}{\rho}\|y^{k+1}-y^\star\|^2+\rho\|x^k-x^\star\|^2-\Big(\frac{1}{\rho}\|y^{k+2}-y^\star\|^2+\rho\|x^{k+1}-x^\star\|^2\Big)\ge\rho(r^{k+1})^2+2\rho(r^k-r^{k+1})r^{k+1}+\rho\|x^{k+1}-x^k\|^2+2\sigma\|x^{k+1}-x^\star\|^2.\tag{3.21}$$
The first two terms on the right-hand side can be rewritten as $\rho(r^k)^2-\rho(r^{k+1}-r^k)^2$. Adding $\sigma\|x^k-x^\star\|^2-\sigma\|x^{k+1}-x^\star\|^2$ to both sides, we then obtain
$$\frac{1}{\rho}\|y^{k+1}-y^\star\|^2+(\rho+\sigma)\|x^k-x^\star\|^2-\Big(\frac{1}{\rho}\|y^{k+2}-y^\star\|^2+(\rho+\sigma)\|x^{k+1}-x^\star\|^2\Big)\ge\rho(r^k)^2+\rho\|x^{k+1}-x^k\|^2-\rho(r^{k+1}-r^k)^2+\sigma\|x^k-x^\star\|^2+\sigma\|x^{k+1}-x^\star\|^2.\tag{3.22}$$
We bound the last two terms using the inequality
$$\sigma\|x^k-x^\star\|^2+\sigma\|x^{k+1}-x^\star\|^2\ge\frac{\sigma}{2}\|x^{k+1}-x^k\|^2.\tag{3.23}$$
Using
$$(r^{k+1}-r^k)^2=\Big(\sum_{i=1}^n x_i^{k+1}-\sum_{i=1}^n x_i^k\Big)^2=\Big(\sum_{i=1}^n(x_i^{k+1}-x_i^k)\Big)^2\quad\text{and}\quad\|x^{k+1}-x^k\|^2=\sum_{i=1}^n(x_i^{k+1}-x_i^k)^2,$$

we can rewrite the inequality (3.22) as
$$\frac{1}{\rho}\|y^{k+1}-y^\star\|^2+(\rho+\sigma)\|x^k-x^\star\|^2-\Big(\frac{1}{\rho}\|y^{k+2}-y^\star\|^2+(\rho+\sigma)\|x^{k+1}-x^\star\|^2\Big)\ge\rho(r^k)^2+\Big(\rho+\frac{\sigma}{2}\Big)n\sum_{i=1}^n\frac{1}{n}\big(x_i^{k+1}-x_i^k\big)^2-\rho n^2\Big(\sum_{i=1}^n\frac{1}{n}\big(x_i^{k+1}-x_i^k\big)\Big)^2.\tag{3.24}$$
Finally, using the condition $\rho\le\frac{\sigma}{2(n-1)}$ and applying Jensen's inequality to the last two terms, we have
$$\Big(\rho+\frac{\sigma}{2}\Big)n\sum_{i=1}^n\frac{1}{n}\big(x_i^{k+1}-x_i^k\big)^2-\rho n^2\Big(\sum_{i=1}^n\frac{1}{n}\big(x_i^{k+1}-x_i^k\big)\Big)^2\ge 0.$$
The result follows.

Theorem 3.1. If the objective function $\sum f_i$ is strongly convex with modulus $\sigma$ and $\rho\le\frac{\sigma}{2(n-1)}$, then the primal residual $r^k\to 0$ as $k\to\infty$. Furthermore, if there exists $\epsilon>0$ such that $\rho\le\frac{\sigma-\epsilon}{2(n-1)}$, then the primal variable converges, $x^k\to x^\star$, and the objective function value converges, $p^k\to p^\star$.

Proof. Summing the inequality (3.24) from $k=1$ to $k=K$ and letting $K\to\infty$, we have
$$\frac{1}{\rho}\|y^1-y^\star\|^2+(\rho+\sigma)\|x^0-x^\star\|^2\ge\sum_{k=1}^\infty\rho(r^k)^2.\tag{3.25}$$
Therefore, $r^k\to 0$ as $k\to\infty$: the primal residual converges. If $\rho\le\frac{\sigma-\epsilon}{2(n-1)}$, inequality (3.22) can be rewritten as
$$\frac{1}{\rho}\|y^{k+1}-y^\star\|^2+(\rho+\sigma)\|x^k-x^\star\|^2-\Big(\frac{1}{\rho}\|y^{k+2}-y^\star\|^2+(\rho+\sigma)\|x^{k+1}-x^\star\|^2\Big)\ge\rho(r^k)^2+\Big(\rho+\frac{\sigma-\epsilon}{2}\Big)n\sum_{i=1}^n\frac{1}{n}\big(x_i^{k+1}-x_i^k\big)^2-\rho n^2\Big(\sum_{i=1}^n\frac{1}{n}\big(x_i^{k+1}-x_i^k\big)\Big)^2+\epsilon\|x^k-x^\star\|^2+\epsilon\|x^{k+1}-x^\star\|^2.\tag{3.26}$$
Therefore, using Jensen's inequality, we obtain

\[
\frac{1}{\rho}\|y^{k+1}-y^\star\|^2+(\rho+\mu)\|x^k-x^\star\|^2-\Big(\frac{1}{\rho}\|y^{k+2}-y^\star\|^2+(\rho+\mu)\|x^{k+1}-x^\star\|^2\Big)
\;\geq\;
\rho\big(r^k\big)^2+\epsilon\|x^k-x^\star\|^2+\epsilon\|x^{k+1}-x^\star\|^2 .
\]

Summing this inequality from $k=0$ to $k=K$ and letting $K\to\infty$, we have

\[
\frac{1}{\rho}\|y^{1}-y^\star\|^2+(\rho+\mu)\|x^{0}-x^\star\|^2\;\geq\;\sum_{k=0}^{\infty}\rho\big(r^k\big)^2+\sum_{k=0}^{\infty}\epsilon\big(\|x^k-x^\star\|^2+\|x^{k+1}-x^\star\|^2\big).
\]

Thus $x^k\to x^\star$. Lastly, we prove the convergence of the objective value. Because $x^k\to x^\star$ and $r^k\to0$, all the terms on the right-hand sides of the inequalities above go to zero; thus $p^k\to p^\star$.

3.6 Toy Example for DM-ADMM

We now consider a simple numerical experiment. The experiments in this chapter were conducted using MATLAB on a GNU/Linux computer with 8 GB of RAM and an Intel Core i7-2600 processor. We first consider the relatively simple toy example

\[
\begin{aligned}
\underset{x}{\text{minimize}}\quad&\tfrac{1}{2}\,x^{T}Ax\\
\text{subject to}\quad&0\leq x_i\leq 100,\quad i=1,\dots,n,\\
&\sum_{i=1}^{n}x_i=100,
\end{aligned}
\]

where $A=\operatorname{diag}\{\lambda_1,\dots,\lambda_n\}$ with $\lambda_1=100$ and $\lambda_i=1$ for $i>1$. In the first set of simulations, we fix $\rho=0.5$ and test different values of $n$. When $n=2$, the algorithm is the standard two-block ADMM with Jacobi update, and convergent behaviour is observed. When $n$ increases while $\rho=0.5$ is kept fixed, the algorithm diverges. Simulation results are plotted in Figure 3-1.
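To make the Jacobi-style iteration concrete, the following is a minimal Python sketch of DM-ADMM on this toy problem. It is an illustration written for this text (the function and variable names are ours), assuming the multiplier update $y^{k+1}=y^k+\rho r^k$ followed by the per-block $x$-update; for $f_i(x)=\lambda_i x^2/2$ the block subproblem has a closed-form solution.

```python
import numpy as np

def dm_admm_toy(n=10, rho=0.5, C=100.0, iters=500):
    """DM-ADMM (Jacobi update) sketch for: minimize (1/2) x^T A x
    subject to 0 <= x_i <= 100 and sum_i x_i = C, with A = diag(lam)."""
    lam = np.ones(n)
    lam[0] = 100.0                       # lambda_1 = 100, lambda_i = 1 otherwise
    x = np.zeros(n)                      # x^0 = 0
    y = 0.0                              # scalar multiplier for the sum constraint
    for _ in range(iters):
        r = x.sum() - C                  # primal residual r^k
        y = y + rho * r                  # y^{k+1} = y^k + rho * r^k
        # block i minimizes lam_i*x^2/2 + y*(x + r - x_i) + (rho/2)*(x + r - x_i)^2;
        # its unconstrained minimizer is affine in (y, x_i - r):
        x_unc = (-y + rho * (x - r)) / (lam + rho)
        x = np.clip(x_unc, 0.0, 100.0)   # project onto the box [0, 100]
    return x, abs(x.sum() - C)

_, res = dm_admm_toy(n=10, rho=0.01)
print(res)  # small residual for rho small enough; rho = 0.5 with n = 10 diverges (Figure 3-1)
```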
Next, we fix $n=100$ and test different values of $\rho$. Results are plotted in Figure 3-2. Consistent with our theoretical convergence condition $\rho\leq\frac{\mu}{2(n-1)}$, we observe convergent behaviour as $\rho$ decreases. Finally, we let $\rho$ vary throughout the iteration. We consider the diminishing strategies $\rho_k=\rho_0/k^2$, $\rho_k=\rho_0/k$, and $\rho_k=\rho_0/\sqrt{k}$, where $\rho_0=0.1$ and $k$ is the iteration number. We also use the safeguard $\rho_k>0.001$, because 0.001 is a step size for which we already observed convergent behaviour. Results are shown in Figure 3-3. With such a varying $\rho$, the algorithm converges at an accelerated speed.

3.7 Power-grid Load Control Simulation

We consider the simulation scenario of [47], where a single generator in a micro-grid serves $n$ loads. The generator has local controls (commonly used in generators) that adjust generation output to reduce the consumption-generation mismatch even without the presence of smart loads. A system-wide process disturbance and measurement noise at each load are modeled as wide-sense stationary white noise. For more details about the simulation model (such as generator dynamics and noise statistics), the reader is referred to [47]. For ease of comparison, we choose parameters as done in [47]. For the sake of completeness, we summarize the simulation parameters below.

Each load is constrained in the amount by which it may vary its consumption. We choose $a_i=0$, $i=1,2,\dots,n$, and we choose each $b_i$ from a uniform distribution; we then normalize the $b_i$'s so that $\sum_{i=1}^n b_i=60$ MW. Each load's disutility is modeled as a quadratic function of the change in consumption,

\[
f_i(x_i)=\frac{x_i^{2}}{2\alpha_i},
\]

where each $\alpha_i$ is chosen from a uniform distribution on the interval $[1,3]$. For this disutility function, the minimization problem in the $x$-update may be solved efficiently in closed form.
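The closed form follows from one-dimensional convexity: the box-constrained minimizer is the projection of the unconstrained stationary point onto $[a_i,b_i]$. A sketch consistent with the update rule stated above (the function name and signature are ours):

```python
import numpy as np

def x_update_quadratic(x_i, y_next, r, a_i, b_i, alpha_i, rho):
    """Solve min_{a_i <= x <= b_i}  x^2/(2*alpha_i) + y*(x + r - x_i)
    + (rho/2)*(x + r - x_i)^2.  The objective is 1-D strongly convex, so
    clipping the unconstrained stationary point to [a_i, b_i] is exact."""
    x_unc = (-y_next + rho * (x_i - r)) / (1.0 / alpha_i + rho)
    return np.clip(x_unc, a_i, b_i)
```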
The initial conditions of the system are $g^0=200$ MW and $x_i^0=0$, $i=1,2,\dots,n$. Two generation drops are modeled as step changes in generation:

\[
g^k=\begin{cases}
200\ \text{MW}, & 0\ \text{s}\leq kT<20\ \text{s},\\
190\ \text{MW}, & 20\ \text{s}\leq kT<50\ \text{s},\\
170\ \text{MW}, & 50\ \text{s}\leq kT,
\end{cases}
\]

where $T=0.1$ seconds is the discretization interval. Mathematically, this corresponds to

\[
C=\begin{cases}
0, & 0\ \text{s}\leq kT<20\ \text{s},\\
10, & 20\ \text{s}\leq kT<50\ \text{s},\\
30, & 50\ \text{s}\leq kT,
\end{cases}
\]

in the optimization problem. At the beginning of iteration $k$, we assume each load can obtain a (possibly noisy) estimate of the residual $r^k=\sum_{i=1}^n x_i^k-C$. Although the DM-ADMM algorithm does not require communication among loads, communication may be used to reduce the effect of the estimation noise. In this work, we report results of the simulations with and without noise. In the case when noise is present, we report results with and without communication among loads. When there is communication, we use a connected 2D-grid graph as in [9, 47]. In this graph, each load $i\in V=\{1,2,\dots,N\}$ communicates directly with the loads in its neighbourhood $i-K, i-K+1,\dots,i+K$, where $K$ is a constant. The parameter $\rho$ will be set to a small-enough constant based on our convergence analysis. The varying parameter $\rho_k$ considered in Section 3.6 is not practical in real power-grid load control, since loads may have no information about when a generation change occurs.
Simulation without noise. In our first simulation, we consider the ideal case in which there is no estimation noise in the system. We assume in this case that each load can obtain an accurate estimate of the residual $r^k=\sum_{i=1}^n x_i^k-C$; $\rho$ was set to 0.004 in this simulation. Figure 3-4 shows the system frequency and the primal residual $r^k=\sum_{i=1}^n x_i^k-C$ for the proposed DM-ADMM algorithm and for the dual algorithm proposed in [47] when no system noise is considered. We observe that the DM-ADMM algorithm restores the frequency to the nominal level quickly and stably. The dual algorithm was able to reduce the frequency deviation, but the frequency had not been fully restored by the end of the simulation.

Simulation with noise, without communication. Our proof of convergence is given for the noise-free case. In the practical problem, load $i$ can only obtain a local noisy estimate $u_i^k\approx\sum_{j=1}^n x_j^k-C$ of the residual. We therefore consider the following practical update rule:

\[
\begin{aligned}
y_i^{k+1}&=y_i^k+\rho\,u_i^k,\\
x_i^{k+1}&=\operatorname*{arg\,min}_{a_i\leq x\leq b_i}\;f_i(x)+y_i^{k+1}\big(x+u_i^k-x_i^k\big)+\frac{\rho}{2}\,\big\|x+u_i^k-x_i^k\big\|^2 .
\end{aligned}
\]

Figure 3-5 shows the system frequency and the primal residual for the proposed DM-ADMM algorithm and the dual algorithm when no communication network exists among the loads. It is clear from the figure that the DM-ADMM algorithm significantly reduces the frequency deviation from nominal during the contingency events, whereas the dual algorithm only modestly reduces the frequency deviation. Additionally, the DM-ADMM algorithm restores the system frequency to the nominal value much faster than the dual algorithm does. $\rho$ was set to 0.0015 in this simulation.
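A per-load sketch of this practical rule for the quadratic disutility follows. The additive Gaussian noise model and the parameter names are our assumptions for illustration; the text only assumes $u_i^k$ is a noisy estimate of $r^k$.

```python
import numpy as np

def noisy_dm_admm_step(x, y, C, alpha, a, b, rho, sigma=0.01, rng=np.random):
    """One DM-ADMM step where load i only sees a noisy residual estimate u_i^k.
    x, y, alpha, a, b are length-n arrays; each load keeps its own multiplier y_i."""
    r = x.sum() - C
    u = r + sigma * rng.standard_normal(x.size)  # u_i^k = r^k + noise (assumed model)
    y = y + rho * u                              # y_i^{k+1} = y_i^k + rho * u_i^k
    x_unc = (-y + rho * (x - u)) / (1.0 / alpha + rho)
    return np.clip(x_unc, a, b), y               # closed-form box-constrained x-update
```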
Simulation with noise, with communication. When communication is allowed in the power grid, load $i$ communicates with the loads in its neighbourhood $N(i)$. Load $i$ receives the measurements $u_j^k$ and the values $y_j^k$ from all its neighbours $j\in N(i)$; from those values, it then computes the averaged values $\bar y_i^k$ and $\bar u_i^k$. We consider the following update rule with communication:

\[
\begin{aligned}
y_i^{k+1}&=\bar y_i^k+\rho\,\bar u_i^k,\\
x_i^{k+1}&=\operatorname*{arg\,min}_{a_i\leq x\leq b_i}\;f_i(x)+y_i^{k+1}\big(x+\bar u_i^k-x_i^k\big)+\frac{\rho}{2}\,\big\|x+\bar u_i^k-x_i^k\big\|^2 .
\end{aligned}
\]

Figure 3-6 shows the system frequency and the feasibility of the solution when there is communication among loads. Once again, the DM-ADMM algorithm outperforms the dual algorithm in maintaining nominal system frequency. $\rho$ was set to 0.0015 in this simulation.

Adaptive parameter tuning. Figure 3-7 depicts how different choices of $\rho$ affect the convergence. It is observed that a larger choice of $\rho$ can force the primal residual to decrease faster, resulting in faster restoration of frequency; however, its asymptotic convergence may not be guaranteed according to our convergence proof. We present an adaptive parameter tuning scheme that uses an aggressive high $\rho$ to quickly decrease the primal residual $r^k$ and a conservative low $\rho$ to reinforce convergence: the user supplies a tolerance $\epsilon$ together with two levels $\rho_{\mathrm{high}}$ and $\rho_{\mathrm{low}}$, and the iteration runs with $\rho_k=\rho_{\mathrm{high}}$ while $r^k>\epsilon$, falling back to $\rho_k=\rho_{\mathrm{low}}$ otherwise.
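A minimal sketch of such a switching rule; the exact switching test and the example constants below are our assumptions rather than a definitive specification of the scheme.

```python
def adaptive_rho(r_k, eps, rho_high, rho_low):
    """Aggressive rho while the primal residual is large; conservative otherwise.
    rho_low should satisfy the convergence condition rho <= mu / (2*(n - 1))."""
    return rho_high if abs(r_k) > eps else rho_low

# Usage inside the DM-ADMM loop (cf. the toy-example sketch), with
# illustrative values in the range explored in Figure 3-7:
#   rho = adaptive_rho(x.sum() - C, eps=0.5, rho_high=0.003, rho_low=0.0005)
```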
Figure 3-8 shows the system frequency under this adaptive scheme in the noise-free setting; the benefit is most visible when the second load-generation mismatch event occurs. Such an adaptive scheme is a promising direction of research because it utilizes the faster initial convergence offered by a large $\rho_k$, while the convergence analysis restricts $\rho$ to be a small constant.

3.8 Linear Convergence Rate with Separable Quadratic Programming

We now show that, in the case of quadratic objective functions $f_i(x)=\frac12 d_i x^2$, the convergence rate of the distributed multi-block ADMM is linear. Let $\beta_i=\frac{1}{d_i+\rho}$. We first consider the case where the box constraint $a_i\leq x_i\leq b_i$, $i=1,\dots,n$, is not present. From the $x$- and $y$-updates of the algorithm, we can write the update rule as

\[
\begin{aligned}
x_i^{k+1}&=-\beta_i\,y^{k+1}-\rho\beta_i\sum_{j\neq i}x_j^k+\rho\beta_i C,\\
y^{k+2}&=y^{k+1}+\rho\Big(\sum_{i=1}^n x_i^{k+1}-C\Big)
=\Big(1-\rho\sum_{i=1}^n\beta_i\Big)y^{k+1}-\rho^2\sum_{i=1}^n\Big(\sum_{j\neq i}\beta_j\Big)x_i^k+\Big(\rho^2\sum_{i=1}^n\beta_i-\rho\Big)C .
\end{aligned}
\]

Using matrix notation, this can be written as

\[
\begin{bmatrix}y^{k+2}\\ x_1^{k+1}\\ x_2^{k+1}\\ \vdots\\ x_n^{k+1}\end{bmatrix}
=
\begin{bmatrix}
1-\rho\sum_{i=1}^n\beta_i & -\rho^2\sum_{j\neq1}\beta_j & -\rho^2\sum_{j\neq2}\beta_j & \cdots & -\rho^2\sum_{j\neq n}\beta_j\\
-\beta_1 & 0 & -\rho\beta_1 & \cdots & -\rho\beta_1\\
-\beta_2 & -\rho\beta_2 & 0 & \cdots & -\rho\beta_2\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
-\beta_n & -\rho\beta_n & -\rho\beta_n & \cdots & 0
\end{bmatrix}
\begin{bmatrix}y^{k+1}\\ x_1^{k}\\ x_2^{k}\\ \vdots\\ x_n^{k}\end{bmatrix}
+
\begin{bmatrix}\rho^2\sum_{i=1}^n\beta_i-\rho\\ \rho\beta_1\\ \rho\beta_2\\ \vdots\\ \rho\beta_n\end{bmatrix}C .
\]
Let

\[
w^k=\begin{bmatrix}y^{k+1}\\ x_1^k\\ x_2^k\\ \vdots\\ x_n^k\end{bmatrix},
\qquad
A(\rho)=\begin{bmatrix}
1-\rho\sum_{i=1}^n\beta_i & -\rho^2\sum_{j\neq1}\beta_j & -\rho^2\sum_{j\neq2}\beta_j & \cdots & -\rho^2\sum_{j\neq n}\beta_j\\
-\beta_1 & 0 & -\rho\beta_1 & \cdots & -\rho\beta_1\\
-\beta_2 & -\rho\beta_2 & 0 & \cdots & -\rho\beta_2\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
-\beta_n & -\rho\beta_n & -\rho\beta_n & \cdots & 0
\end{bmatrix},
\qquad
C(\rho)=\begin{bmatrix}\rho^2\sum_{i=1}^n\beta_i-\rho\\ \rho\beta_1\\ \rho\beta_2\\ \vdots\\ \rho\beta_n\end{bmatrix}C .
\]

We write the update equation as

\[
w^{k+1}=A(\rho)\,w^k+C(\rho).
\]

We study the eigenvalues of the matrix $A(\rho)$.

Lemma 6. Let the matrix $A(\rho)$ be given as above. If $\rho>0$ is small enough, then every eigenvalue of $A(\rho)$ satisfies $|\lambda(A(\rho))|<1$.
Proof. The eigenvalues of a matrix remain unchanged under a similarity transform. Multiplying the first row of $A(\rho)$ by $\frac{1}{\rho}$ and the first column by $\rho$, we obtain

\[
A'(\rho)=\begin{bmatrix}
1-\rho\sum_{i=1}^n\beta_i & -\rho\sum_{j\neq1}\beta_j & -\rho\sum_{j\neq2}\beta_j & \cdots & -\rho\sum_{j\neq n}\beta_j\\
-\rho\beta_1 & 0 & -\rho\beta_1 & \cdots & -\rho\beta_1\\
-\rho\beta_2 & -\rho\beta_2 & 0 & \cdots & -\rho\beta_2\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
-\rho\beta_n & -\rho\beta_n & -\rho\beta_n & \cdots & 0
\end{bmatrix}
=e_1e_1^{T}-\rho Z,
\]

where

\[
Z=\begin{bmatrix}
\sum_{i=1}^n\beta_i & \sum_{j\neq1}\beta_j & \sum_{j\neq2}\beta_j & \cdots & \sum_{j\neq n}\beta_j\\
\beta_1 & 0 & \beta_1 & \cdots & \beta_1\\
\beta_2 & \beta_2 & 0 & \cdots & \beta_2\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
\beta_n & \beta_n & \beta_n & \cdots & 0
\end{bmatrix}.
\]

$A'(\rho)$ and $A(\rho)$ have the same eigenvalues. It is obvious that the eigenvalues of $A'(0)=e_1e_1^{T}$ are $\lambda(0)=(1,0,\dots,0)$, associated with the identity eigenvector matrix

\[
V=\begin{bmatrix}v_1 & v_2 & \cdots & v_{n+1}\end{bmatrix}=I .
\]

Then we have the following standard result (see, e.g., [22]) for the first component of the derivative of $\lambda(\rho)$ at $\rho=0$:

\[
\lambda'(0)_1=-\frac{v_1^{T}Zv_1}{v_1^{T}v_1}=-\sum_{i=1}^n\beta_i .
\]

In addition, using Gershgorin's theorem, the rest of the eigenvalues of $A'(\rho)$ lie in the union of the disks

\[
D_i=\{\lambda:\ |\lambda|\leq n\rho\beta_i\},\qquad i=1,\dots,n,
\]

corresponding to rows $2,\dots,n+1$.
Thus, these eigenvalues are at most of order $O(\rho)$. Using a Taylor expansion, we obtain that, for $\rho>0$ small enough,

\[
\lambda(\rho)_i=\begin{cases}
1-\rho\displaystyle\sum_{j=1}^n\beta_j+O(\rho^2), & i=1,\\[4pt]
O(\rho), & i=2,\dots,n+1 .
\end{cases}
\]

We thus obtain the conclusion of the lemma.

Let $w^\star$ denote the limit of $w^k$. We have

\[
w^{k+1}-w^\star=A(\rho)\big(w^k-w^\star\big).
\]

For $\rho$ small enough, every eigenvalue of $A(\rho)$ has modulus less than one; thus $w^k$ converges linearly.

We now consider the presence of the box constraint $a_i\leq x_i\leq b_i$. The update rule becomes

\[
\begin{aligned}
x_i^{k+1}&=\operatorname{proj}_{[a_i,b_i]}\Big(-\beta_i\,y^{k+1}-\rho\beta_i\sum_{j\neq i}x_j^k+\rho\beta_i C\Big),\\
y^{k+2}&=y^{k+1}+\rho\Big(\sum_{i=1}^n x_i^{k+1}-C\Big).
\end{aligned}
\]

Because we have already proved $x^k\to x^\star$, the projection operator only affects those $x_i^k$ near the lower or upper bounds. Without loss of generality, we consider only the case of the lower bound $a_i$ and assume that $x_i^\star=a_i$ only when $i=1$. Formally, let $\delta_0=\frac12\min_{j\neq1}|x_j^\star-a_j|$. There exists $N>0$ such that for all $k>N$, $|x^k-x^\star|<\delta_0$. Then $|x_j^k-a_j|>\delta_0$ for $j>1$ and $|x_1^k-a_1|<\delta_0$. Therefore, the projection operator is not active for $i>1$; that is,

\[
x_i^{k+1}=-\beta_i\,y^{k+1}-\rho\beta_i\sum_{j\neq i}x_j^k+\rho\beta_i C,\qquad i>1 .
\]
That means those $x_i^k$'s still follow the update rule without the box constraint. The projection operator may then only affect $x_1^{k+1}$ and $y^{k+2}$. We assume $x_1^{k+1}=a_1$; otherwise the box constraint is not active. The update equation then becomes

\[
\begin{bmatrix}y^{k+2}\\ x_1^{k+1}\\ x_2^{k+1}\\ \vdots\\ x_n^{k+1}\end{bmatrix}
=
\begin{bmatrix}
1-\rho\sum_{i\neq1}\beta_i & -\rho^2\sum_{j\neq1}\beta_j & -\rho^2\sum_{j\neq1,2}\beta_j & \cdots & -\rho^2\sum_{j\neq1,n}\beta_j\\
0 & 0 & 0 & \cdots & 0\\
-\beta_2 & -\rho\beta_2 & 0 & \cdots & -\rho\beta_2\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
-\beta_n & -\rho\beta_n & -\rho\beta_n & \cdots & 0
\end{bmatrix}
\begin{bmatrix}y^{k+1}\\ x_1^{k}\\ x_2^{k}\\ \vdots\\ x_n^{k}\end{bmatrix}
+
\begin{bmatrix}\rho^2\sum_{i\neq1}\beta_i-\rho\\ 0\\ \rho\beta_2\\ \vdots\\ \rho\beta_n\end{bmatrix}C
+
\begin{bmatrix}\rho a_1\\ a_1\\ 0\\ \vdots\\ 0\end{bmatrix}.
\]

We write this in brief notation as

\[
w^{k+1}=B(\rho)\,w^k+\tilde C(\rho)+D(\rho),
\]

and we also have

\[
w^\star=B(\rho)\,w^\star+\tilde C(\rho)+D(\rho).
\]

The argument of Lemma 6 still applies to $B(\rho)$; thus we obtain the linear convergence rate under the box constraint as well.
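Lemma 6 can also be checked numerically. The sketch below (a numerical illustration written for this text, not part of the proof) builds $A(\rho)$ for random $d_i>0$ and verifies that its spectral radius drops below one as $\rho$ shrinks:

```python
import numpy as np

def build_A(d, rho):
    """Iteration matrix A(rho) of w^{k+1} = A(rho) w^k + C(rho), beta_i = 1/(d_i + rho)."""
    n = d.size
    beta = 1.0 / (d + rho)
    A = np.zeros((n + 1, n + 1))
    A[0, 0] = 1.0 - rho * beta.sum()
    for i in range(n):
        A[0, 1 + i] = -rho**2 * (beta.sum() - beta[i])  # -rho^2 * sum_{j != i} beta_j
        A[1 + i, 0] = -beta[i]                          # coefficient of y^{k+1}
        A[1 + i, 1:] = -rho * beta[i]                   # coefficients of x_j^k, j != i
        A[1 + i, 1 + i] = 0.0                           # no dependence on x_i^k
    return A

d = np.random.uniform(1.0, 3.0, size=20)
for rho in (1.0, 0.1, 0.01):
    print(rho, np.abs(np.linalg.eigvals(build_A(d, rho))).max())
# the spectral radius falls below 1 once rho is small enough, as Lemma 6 predicts
```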
Figure 3-1. DM-ADMM algorithm, toy example: fixed $\rho$, different $n$. A) $\rho=0.5$, $n=2$. B) $\rho=0.5$, $n=3$. C) $\rho=0.5$, $n=10$.

Figure 3-2. DM-ADMM algorithm, toy example: different $\rho$, fixed $n$. A) $\rho=1$, $n=100$. B) $\rho=0.1$, $n=100$. C) $\rho=0.01$, $n=100$. D) $\rho=0.001$, $n=100$.

Figure 3-3. DM-ADMM algorithm, toy example: varying $\rho_k$. A) $\rho_k=\rho_0/k^2$, $n=100$. B) $\rho_k=\rho_0/k$, $n=100$. C) $\rho_k=\rho_0/\sqrt{k}$, $n=100$. D) $\rho=0.001$, $n=100$.

Figure 3-4. Power-grid simulation, without noise. A) System frequency. B) Primal residual $r^k=\sum_{i=1}^n x_i^k-C$.

Figure 3-5. Power-grid simulation, with noise, without communication. A) System frequency. B) Primal residual $r^k=\sum_{i=1}^n x_i^k-C$.

Figure 3-6. Power-grid simulation, with noise, with communication. A) System frequency. B) Primal residual $r^k=\sum_{i=1}^n x_i^k-C$.

Figure 3-7. Power grid, with communication, with noise: different choices of $\rho$. A) $\rho=0.003$. B) $\rho=0.002$. C) $\rho=0.001$. D) $\rho=0.0005$.

Figure 3-8. Power-grid simulation without noise, adaptive parameter $\rho$.
CHAPTER 4
CONCLUSIONS

The optimization problems addressed in this thesis arise from real applications in machine learning and mechanical engineering. With algorithmic insight, we developed numerical methods, and their accompanying theory, for non-convex and decentralized problems.

Chapter 2 established the theory for projection algorithms in the non-convex setting. Previously, such algorithms had often been studied only in convex settings. Our analysis serves not only as the basis for the case study of sparse PCA but also as a theoretical framework for future applications of projection in non-convex settings.

Chapter 3 proposed a decentralized algorithm, with theoretical convergence guarantees, for the power-grid load control problem. The simulations showed the performance advantage of our approach. In addition, our adaptive parameter tuning scheme offers a different perspective on speeding up convergence.

This thesis provides insight into how to design numerical algorithms when existing algorithms and theory do not directly apply to the application at hand. Furthermore, our mathematical derivations for the non-convex projection and the decentralized multi-block algorithm are independent and self-contained. Numerical studies have shown the effectiveness of the tailored algorithms.
REFERENCES

[1] A. A. Alizadeh et al., Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling, Nature, 403 (2000), pp. 503.

[2] J. Barzilai and J. M. Borwein, Two point step size gradient methods, IMA J. Numer. Anal., 8 (1988), pp. 141.

[3] Amir Beck and Marc Teboulle, Mirror descent and nonlinear projected subgradient methods for convex optimization, Operations Research Letters, 31 (2003), pp. 167.

[4] D. P. Bertsekas, Projected Newton methods for optimization problems with simple constraints, SIAM J. Control Optim., 20 (1982), pp. 221.

[5] A. Bhaskara, M. Charikar, E. Chlamtac, U. Feige, and A. Vijayaraghavan, Detecting high log-densities: an O(n^{1/4}) approximation for densest k-subgraph, in Proceedings of the Forty-second ACM Symposium on Theory of Computing, ACM, 2010, pp. 201.

[6] Paolo Boldi, Marco Rosa, Massimo Santini, and Sebastiano Vigna, Layered label propagation: A multiresolution coordinate-free ordering for compressing social networks, in Proceedings of the 20th International Conference on World Wide Web, ACM Press, 2011.

[7] Paolo Boldi and Sebastiano Vigna, The WebGraph framework I: Compression techniques, in Proc. of the Thirteenth International World Wide Web Conference (WWW 2004), Manhattan, USA, 2004, ACM Press, pp. 595.

[8] Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Foundations and Trends in Machine Learning, 3 (2011), pp. 1.

[9] Jonathan Brooks and Prabir Barooah, A distributed gradient projection method for intelligent load control with communication constraints.

[10] Jorge Cadima and Ian T. Jolliffe, Loading and correlations in the interpretation of principle components, Journal of Applied Statistics, 22 (1995), pp. 203.

[11] Caihua Chen, Bingsheng He, Yinyu Ye, and Xiaoming Yuan, The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent, Mathematical Programming, (2014), pp. 1.

[12] Kenneth L. Clarkson, Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm, ACM Transactions on Algorithms (TALG), 6 (2010), p. 63.
[13] Y. H. Dai and R. Fletcher, New algorithms for singly linearly constrained quadratic programs subject to lower and upper bounds, Math. Program., 106 (2006), pp. 403.

[14] Alexandre d'Aspremont, Francis Bach, and Laurent El Ghaoui, Optimal solutions for sparse principal component analysis, The Journal of Machine Learning Research, 9 (2008), pp. 1269.

[15] Timothy A. Davis, William W. Hager, and James T. Hungerford, An efficient hybrid algorithm for the separable convex quadratic knapsack problem.

[16] Wei Deng, Ming-Jun Lai, Zhimin Peng, and Wotao Yin, Parallel multi-block ADMM with o(1/k) convergence, arXiv preprint arXiv:1312.3040, (2013).

[17] Jim Douglas and Henry H. Rachford, On the numerical solution of heat conduction problems in two and three space variables, Transactions of the American Mathematical Society, (1956), pp. 421.

[18] J. Eckstein and D. Bertsekas, On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators, Mathematical Programming, 55 (1992), pp. 293.

[19] M. Frank and P. Wolfe, An algorithm for quadratic programming, Naval Research Logistics Quarterly, 3 (1956), pp. 95.

[20] Daniel Gabay and Bertrand Mercier, A dual algorithm for the solution of nonlinear variational problems via finite element approximation, Computers & Mathematics with Applications, 2 (1976), pp. 17.

[21] M. D. Gonzalez-Lima, W. W. Hager, and H. Zhang, An affine-scaling interior-point method for continuous knapsack constraints, SIAM J. Optim., 21 (2011), pp. 361.

[22] W. W. Hager, Applied Numerical Linear Algebra, Prentice-Hall, Englewood Cliffs, NJ, 1988.

[23] W. W. Hager and H. Zhang, A new active set algorithm for box constrained optimization, SIAM J. Optim., 17 (2006), pp. 526.

[24] Deren Han and Xiaoming Yuan, A note on the alternating direction method of multipliers, Journal of Optimization Theory and Applications, 155 (2012), pp. 227.

[25] E. Hazan and S. Kale, Projection-free online learning, in Proceedings of the 29th International Conference on Machine Learning, J. Langford and J. Pineau, eds., Omnipress, 2012, pp. 521.
[26] Bingsheng He, Liusheng Hou, and Xiaoming Yuan, On full Jacobian decomposition of the augmented Lagrangian method for separable convex programming, preprint, (2013).

[27] Mingyi Hong and Zhi-Quan Luo, On the linear convergence of the alternating direction method of multipliers, arXiv preprint arXiv:1208.3922, (2012).

[28] Martin Jaggi, Revisiting Frank-Wolfe: Projection-free sparse convex optimization, in Proceedings of the 30th International Conference on Machine Learning, S. Dasgupta and D. McAllester, eds., vol. 28, 2013, pp. 427.

[29] J. Jeffers, Two case studies in the application of principal components, Applied Statistics, 16 (1967), pp. 225.

[30] R. Jenatton, G. Obozinski, and F. Bach, Structured sparse principal component analysis, in International Conference on Artificial Intelligence and Statistics (AISTATS), 2010.

[31] T. Joachims, Making large-scale support vector machine learning practical, in Advances in Kernel Methods - Support Vector Learning, B. Schölkopf, C. J. C. Burges, and A. Smola, eds., Cambridge, MA, 1998, MIT Press, pp. 169.

[32] Ian T. Jolliffe, Nickolay T. Trendafilov, and Mudassir Uddin, A modified principal component technique based on the LASSO, Journal of Computational and Graphical Statistics, 12 (2003), pp. 531.

[33] Michel Journée, Yurii Nesterov, Peter Richtárik, and Rodolphe Sepulchre, Generalized power method for sparse principal component analysis, The Journal of Machine Learning Research, 11 (2010), pp. 517.

[34] S. Khuller and B. Saha, On finding dense subgraphs, in Automata, Languages and Programming, Springer, 2009, pp. 597.

[35] Simon Lacoste-Julien, Martin Jaggi, Mark Schmidt, and Patrick Pletscher, Block-coordinate Frank-Wolfe optimization for structural SVMs, in Proceedings of the 30th International Conference on Machine Learning, S. Dasgupta and D. McAllester, eds., vol. 28, 2013, pp. 53.

[36] Tianyi Lin, Shiqian Ma, and Shuzhong Zhang, On the convergence rate of multi-block ADMM, arXiv preprint arXiv:1408.4265, (2014).

[37] Ronny Luss and Marc Teboulle, Convex approximations to sparse PCA via Lagrangian duality, Operations Research Letters, 39 (2011), pp. 57.

[38] Ronny Luss and Marc Teboulle, Conditional gradient algorithms for rank-one matrix approximations with a sparsity constraint, SIAM Review, 55 (2013), pp. 65.
[39] P. M. Pardalos and N. Kovoor, An algorithm for a singly constrained class of quadratic programs subject to upper and lower bounds, Math. Program., 46 (1990), pp. 321.

[40] J. Platt, Fast training of support vector machines using sequential minimal optimization, in Advances in Kernel Methods - Support Vector Learning, B. Schölkopf, C. Burges, and A. Smola, eds., Cambridge, MA, 1998, MIT Press, pp. 41.

[41] Sridhar Ramaswamy, Pablo Tamayo, Ryan Rifkin, Sayan Mukherjee, Chen-Hsiang Yeang, Michael Angelo, Christine Ladd, Michael Reich, Eva Latulippe, Jill P. Mesirov, et al., Multiclass cancer diagnosis using tumor gene expression signatures, Proceedings of the National Academy of Sciences, 98 (2001), pp. 15149.

[42] R. T. Rockafellar, Convex Analysis, Princeton Univ. Press, 1970.

[43] Bharath K. Sriperumbudur, David A. Torres, and Gert R. G. Lanckriet, A majorization-minimization approach to the sparse generalized eigenvalue problem, Machine Learning, 85 (2011), pp. 3.

[44] Y. Ye and J. Zhang, Approximation of dense-n/2-subgraph and the complement of min-bisection, Journal of Global Optimization, 25 (2003), pp. 55.

[45] Xiao-Tong Yuan and Tong Zhang, Truncated power method for sparse eigenvalue problems, The Journal of Machine Learning Research, 14 (2013), pp. 899.

[46] L. Zanni, T. Serafini, and G. Zanghirati, Parallel software for training large scale support vector machines on multiprocessor systems, Journal of Machine Learning Research, 7 (2006), pp. 1467.

[47] Changhong Zhao, Ufuk Topcu, and Steven H. Low, Optimal load control via frequency measurement and neighborhood area communication, IEEE Transactions on Power Systems, 28 (2013), pp. 3576.

[48] Hui Zou, Trevor Hastie, and Robert Tibshirani, Sparse principal component analysis, Journal of Computational and Graphical Statistics, 15 (2006), pp. 265.
BIOGRAPHICAL SKETCH

Jiajie studied with Professor William W. Hager at the University of Florida. His research interests were optimization and machine learning. He received his Ph.D. degree in mathematics in the fall of 2015.