Differential Game-Based Control Methods for Uncertain Continuous-Time Nonlinear Systems


Material Information

Title:
Differential Game-Based Control Methods for Uncertain Continuous-Time Nonlinear Systems
Physical Description:
1 online resource (143 p.)
Language:
English
Creator:
Johnson, Marcus A
Publisher:
University of Florida
Place of Publication:
Gainesville, Fla.
Publication Date:

Thesis/Dissertation Information

Degree:
Doctorate ( Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Aerospace Engineering, Mechanical and Aerospace Engineering
Committee Chair:
Dixon, Warren E
Committee Members:
Fitz-Coy, Norman G
Barooah, Prabir
Khargonekar, Pramod

Subjects

Subjects / Keywords:
euler -- games -- lagrange -- lyapunov -- noncooperative -- nonlinear -- optimal
Mechanical and Aerospace Engineering -- Dissertations, Academic -- UF
Genre:
Aerospace Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract:
Game theory methods have been instrumental in the advancement of various disciplines such as social science, economics, biology, and engineering. The focus of this dissertation is to develop techniques for approximating solutions to zero-sum and nonzero-sum noncooperative differential games and to use these solutions to stabilize some classes of parametrically uncertain and disturbed nonlinear systems. One contribution of this work is the development of a robust (sub)optimal controller that stabilizes an uncertain Euler-Lagrange system with additive disturbances and yields a solution to the feedback Nash differential game. The control formulation utilizes the Robust Integral Sign of the Error (RISE) control technique to asymptotically identify nonlinearities in the dynamics and converge to a residual linearized system; the solution to the Nash game is then used to derive the stabilizing feedback control laws. Furthermore, the Nash optimal control technique is improved when one player has additional information about the other player, as in the case of the Stackelberg differential game. Another contribution of this work is a (sub)optimal open-loop Stackelberg-based controller with a leader-follower structure, in which both players act as inputs to a parametrically uncertain and disturbed nonlinear system. Another contribution of this work is a technique for solving a two-player zero-sum infinite-horizon game subject to continuous-time unknown nonlinear dynamics. The technique involves a generalization of an actor-critic-identifier (ACI) structure, which is used to implement a Hamilton-Jacobi-Isaacs (HJI) approximation algorithm. The HJI approximation uses two neural network (NN) actor structures and one NN critic structure to approximate the optimal control laws and value function, respectively. Using the ACI technique, another contribution of this work is deriving an approximate solution to an N-player nonzero-sum game. The technique expands the ACI structure to solve a multi-player differential game problem, wherein N-actor and N-critic neural network structures are used to approximate the optimal control laws and the optimal value functions, respectively. Simulations and Lyapunov stability analysis are provided in each section to demonstrate the performance of the control designs.
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Bibliography:
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility:
by Marcus A Johnson.
Thesis:
Thesis (Ph.D.)--University of Florida, 2011.
Local:
Adviser: Dixon, Warren E.

Record Information

Source Institution:
UFRGP
Rights Management:
Applicable rights reserved.
Classification:
lcc - LD1780 2011
System ID:
UFE0043258:00001




Full Text

DIFFERENTIAL GAME-BASED CONTROL METHODS FOR UNCERTAIN CONTINUOUS-TIME NONLINEAR SYSTEMS

By
MARCUS A. JOHNSON

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA
2011

© 2011 Marcus A. Johnson

To my wife, Mirna, my parents, Carolyn and Keith Johnson, and my sisters, Christine and Michele, for their unwavering support and constant encouragement

ACKNOWLEDGMENTS

I would like to sincerely thank my advisor, Warren E. Dixon, whose experience and motivation have been instrumental in the successful completion of my PhD. The guidance and the patience he has shown over the years have helped me mature in my research and as a professional. I would also like to extend my gratitude to my committee members Norman Fitz-Coy, Prabir Barooah, and Pramod Khargonekar for the time and help they provided. I would like to thank my wife for her love and patience. Also, I would like to thank my family for believing in me.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
  1.1 Motivation
  1.2 Background
  1.3 Problem Statement and Contributions

2 ASYMPTOTIC NASH OPTIMAL CONTROL DESIGN FOR AN UNCERTAIN EULER-LAGRANGE SYSTEM
  2.1 Dynamic Model and Properties
  2.2 Error System Development
  2.3 Two Player Feedback Nash Nonzero-Sum Differential Game
  2.4 RISE Feedback Control Development
  2.5 Stability Analysis
  2.6 Simulation
  2.7 Summary

3 ASYMPTOTIC STACKELBERG OPTIMAL CONTROL DESIGN FOR AN UNCERTAIN EULER-LAGRANGE SYSTEM
  3.1 Dynamic Model and Properties
  3.2 Error System Development
  3.3 Two Player Open-Loop Stackelberg Nonzero-Sum Differential Game
  3.4 RISE Feedback Control Development
  3.5 Stability Analysis
  3.6 Simulation
  3.7 Summary

4 APPROXIMATE TWO PLAYER ZERO-SUM GAME SOLUTION FOR AN UNCERTAIN CONTINUOUS NONLINEAR SYSTEM
  4.1 Two Player Zero-Sum Differential Game
  4.2 HJI Approximation Algorithm
  4.3 System Identification
  4.4 Actor-Critic Design
  4.5 Stability Analysis
  4.6 Convergence to Nash Solution
  4.7 Simulation
  4.8 Summary

5 APPROXIMATE N-PLAYER NONZERO-SUM GAME SOLUTION FOR AN UNCERTAIN CONTINUOUS NONLINEAR SYSTEM
  5.1 N-player Nonzero-Sum Differential Game
  5.2 HJB Approximation via ACI
  5.3 System Identifier
  5.4 Actor-Critic Design
  5.5 Stability Analysis
  5.6 Convergence to Nash Solution
  5.7 Simulation
  5.8 Summary

6 CONCLUSION AND FUTURE WORK
  6.1 Conclusion
  6.2 Future Work

REFERENCES
BIOGRAPHICAL SKETCH

LIST OF FIGURES

2-1 The simulated tracking errors for the RISE and Nash optimal controller.
2-2 The simulated torques for the RISE and Nash optimal controller.
2-3 The difference between the RISE feedback and the nonlinear effect and bounded disturbances.
2-4 Cost functionals for u_1 and u_2.
3-1 The simulated tracking errors for the RISE and Stackelberg optimal controller.
3-2 The simulated torques for the RISE and Stackelberg optimal controller.
3-3 The difference between the RISE feedback and the nonlinear effect and bounded disturbances.
3-4 Cost functionals for the leader and follower.
4-1 The evolution of the system states for the zero-sum game, with persistently excited input for the first 10 seconds.
4-2 Error in estimating the state derivatives, with the identifier for the zero-sum game.
4-3 Convergence of critic weights for the zero-sum game.
4-4 Convergence of actor weights for player 1 and player 2 in a zero-sum game.
4-5 Optimal value function approximation V(x), for a zero-sum game.
4-6 Optimal control approximations u_1 and u_2, in a zero-sum game.
4-7 The evolution of the system states for the zero-sum game, with a continuous persistently excited input.
4-8 Convergence of critic weights for the zero-sum game, with a continuous persistently excited input.
4-9 Convergence of actor weights for player 1 and player 2 in a zero-sum game, with a continuous persistently excited input.
4-10 Optimal value function approximation V(x) for a zero-sum game, with a continuous persistently excited input.
4-11 Optimal control approximations u_1 and u_2 in a zero-sum game, with a continuous persistently excited input.
5-1 The evolution of the system states for the nonzero-sum game, with persistently excited input for the first 10 seconds.
5-2 Error in estimating the state derivatives, with the identifier for the nonzero-sum game.
5-3 Convergence of critic weights for the nonzero-sum game.
5-4 Convergence of actor weights for player 1 and player 2 in a nonzero-sum game.
5-5 Value function approximation V(x), for a nonzero-sum game.
5-6 Optimal control approximation u, for a nonzero-sum game.
5-7 The evolution of the system states for the nonzero-sum game, with a continuous persistently excited input.
5-8 Convergence of critic weights for the nonzero-sum game, with a continuous persistently excited input.
5-9 Convergence of actor weights for player 1 and player 2 in a nonzero-sum game, with a continuous persistently excited input.
5-10 Value function approximation V(x) for a nonzero-sum game, with a continuous persistently excited input.
5-11 Optimal control approximation u for a nonzero-sum game, with a continuous persistently excited input.

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

DIFFERENTIAL GAME-BASED CONTROL METHODS FOR UNCERTAIN CONTINUOUS-TIME NONLINEAR SYSTEMS

By
Marcus A. Johnson
August 2011

Chair: Warren E. Dixon
Major: Aerospace Engineering

Game theory methods have been instrumental in the advancement of various disciplines such as social science, economics, biology, and engineering. The focus of this dissertation is to develop techniques for approximating solutions to zero-sum and nonzero-sum noncooperative differential games and to use these solutions to stabilize some classes of parametrically uncertain and disturbed nonlinear systems. One contribution of this work is the development of a robust (sub)optimal controller that stabilizes an uncertain Euler-Lagrange system with additive disturbances and yields a solution to the feedback Nash differential game. The control formulation utilizes the Robust Integral Sign of the Error (RISE) control technique to asymptotically identify nonlinearities in the dynamics and converge to a residual linearized system; the solution to the Nash game is then used to derive the stabilizing feedback control laws.

Furthermore, the Nash optimal control technique is improved when one player has additional information about the other player, as in the case of the Stackelberg differential game. Another contribution of this work is a (sub)optimal open-loop Stackelberg-based controller with a leader-follower structure, in which both players act as inputs to a parametrically uncertain and disturbed nonlinear system.

Another contribution of this work is a technique for solving a two-player zero-sum infinite-horizon game subject to continuous-time unknown nonlinear dynamics. The technique involves a generalization of an actor-critic-identifier (ACI) structure, which is used to implement a Hamilton-Jacobi-Isaacs (HJI) approximation algorithm. The HJI approximation uses two neural network (NN) actor structures and one NN critic structure to approximate the optimal control laws and value function, respectively.

Using the ACI technique, another contribution of this work is deriving an approximate solution to an N-player nonzero-sum game. The technique expands the ACI structure to solve a multi-player differential game problem, wherein N-actor and N-critic neural network structures are used to approximate the optimal control laws and the optimal value functions, respectively. Simulations and Lyapunov stability analysis are provided in each section to demonstrate the performance of the control designs.


CHAPTER 1
INTRODUCTION

1.1 Motivation

Numerous technical, economical or biological processes are often governed by ordinary differential equations, where the state of the system is a function of time and can be influenced with input parameters and exogenous environmental disturbances. The field of control theory studies methods of defining the input parameters such that the state of the system behaves in an acceptable manner, where acceptable performance is generally defined in terms of time history and frequency response criteria. The field of optimal control theory was developed as an approach to analytically determine input parameters that will satisfy the physical constraints of the system, while also minimizing a performance criterion. Optimal control has been extensively investigated as a means to derive analytic proofs for classes of systems where a single input parameter influences the system, particularly for linear dynamics. However, some interesting questions arise when multiple input parameters are considered in a dynamic system, for example:

- How can systems be described when more than one input provides influence?
- How do the input parameters influence the system, or each other?
- What criteria demonstrate that the behavior of the system is acceptable?
- Given optimality constraints, how can optimal controllers be determined?
- How can the criteria for optimality be determined?

Game theory is one approach to address concerns raised from the synthesis of controllers for complex dynamic systems. Game theory deals with strategic interactions among multiple input parameters, called players (and in some contexts agents), with each player's objectives captured in a value (or objective) function which the player either tries to maximize or minimize. For a non-trivial game, the value function of a player depends on the choices (actions, or equivalently decision variables) of at least one other player, and generally of all the players; hence, players cannot simply optimize their own objective
functions independent of the choices of the other players. This incorporates coupling between the actions of the players and binds them together in decision making even in a non-cooperative environment. This dissertation explores the utility of game theory (particularly differential game theory) in deriving controllers that stabilize nonlinear dynamic systems.

1.2 Background

The basic problem in optimization theory is to find the minimum value of a function:
\[ \min_{x \in X} V(x). \]
Typically $V(x)$ is a continuous function and the minimum is sought over a closed, possibly unbounded domain $X \subseteq \mathbb{R}^n$. Extensive research has investigated the existence of the minimum, necessary and sufficient conditions for optimality, and computational methods for approximating a solution. Optimization theory launched the study of optimal control theory, where the state $x(t) \in \mathbb{R}^n$ evolves over time. For the standard optimal control problem, $x(t)$ evolves based on an ordinary differential equation (ODE) as
\[ \dot{x} = f(t, x(t), u(t)), \quad x(0) = x_0, \quad t \in [0, T], \]
where $t \mapsto u(t) \in U$ is the control strategy, ranging within the set of admissible control strategies $U$. Given an initial condition $x_0$, the optimal control problem is to determine a control strategy $u(\cdot)$ which minimizes a value (or cost) function $J(t, x(t), u(t)) \in \mathbb{R}$
\[ J = \Phi(x(T)) + \int_0^T L(\tau, x(\tau), u(\tau)) \, d\tau, \tag{1-1} \]
where $\Phi \in \mathbb{R}$ is the terminal cost and $L \in \mathbb{R}$ is the local (or running) cost. Techniques to determine solutions to the optimal control problem have largely been based on two different fundamental ideas: Bellman's optimality principle and Pontryagin's maximum principle. Dynamic programming and the associated optimality principle, which is a sufficient condition for optimality, was introduced by Bellman in the United States,
whereas the maximum principle, which is a necessary condition for optimality, was introduced by Pontryagin in the Soviet Union. Bellman's approach to optimality considers that if a given state-action sequence is optimal, and the initial state and action are removed, the remaining sequence is also optimal. In contrast, Pontryagin's maximum principle involves finding an admissible control input that minimizes a Hamiltonian function. Pontryagin's maximum principle can only be applied to deterministic problems, but yields the same solutions as the dynamic programming approach. However, in contrast to the dynamic programming approach, the maximum principle avoids the curse of dimensionality. Under certain conditions, the maximum principle and the Bellman principle can be reduced to the Hamilton-Jacobi-Bellman (HJB) equation. For nonlinear systems, the solution to the HJB equation can often be intractable or may not exist; thus various numeric and analytic techniques have been developed to approximate the solution [1 10] or indirectly determine a solution using inverse methods [11 18].

For systems with multiple players, game theory offers a natural extension to the optimal control problem. The work by Von Neumann and Morgenstern [19], widely regarded as the preliminary work on game theory, focused primarily on two-player, zero-sum games. Nash [20] provided a solution approach for a class of general N-player non-cooperative games. Motivated by the analysis of market economy, the monograph by Stackelberg [21] on hierarchical relationships among players provided further contribution to the theory of games. In the early 1950s Rufus Isaacs [22] pioneered differential game theory, which enabled a natural multi-player extension of the dynamic programming solution to the optimal control problem. In the case of two players, a differential game considers a system whose state $x(t) \in \mathbb{R}^n$ evolves according to the ODE
\[ \dot{x} = f(t, x(t), u_1(t), u_2(t)), \quad x(0) = x_0, \quad t \in [0, T], \]
where $t \mapsto u_i(t) \in U_i$, $i = 1, 2$, is the control strategy, ranging within the set of admissible control strategies $U_i$. Given the initial condition $x_0$, the objective of the $i$-th player is to
minimize the value function $J_i(t, x(t), u_1(t), u_2(t)) \in \mathbb{R}$
\[ J_i = \Phi(x(T)) + \int_0^T L(\tau, x(\tau), u_1(\tau), u_2(\tau)) \, d\tau. \]
While the optimal control problem has a clear definition of optimality that is tied to the minimization of the value function, the concept of optimality in game theory is not as well-defined. Specific forms of optimality for differential games can be defined in terms of equilibrium solutions (e.g. Nash, Pareto, Bayesian, Stackelberg). In this construct, the structure and information sets of differential games can vastly change the game's objective. The amount of cooperation that occurs between players in a game is one of the key differences between different branches of game theory literature. If the players act in unison, but each player has a different objective (cost function), then multi-objective optimization is obtained [23 25]. In a situation in which there is a common value function and all players act cooperatively, team theory is obtained [26 28], and if some subset of the players can make their decisions in unison, such that a mutually beneficial outcome can be obtained, then a cooperative game is achieved [29 30]. Cooperative game theory implies players are able to form binding commitments such that the best result occurs for the game at large, whereas in noncooperative games each player pursues individual interests that are partly conflicting with others. The most extreme case of conflicting interest is a zero-sum game, in which players are diametrically opposed. Previous research has exploited noncooperative game theory [31 46] to provide numerous solution techniques to a wide range of control engineering applications. Solutions to noncooperative games are referred to as an equilibrium due to the fact that the solution represents control strategies that provide balance between independent interests of each player. A zero-sum game formulation that has been thoroughly explored in control theory is the two-player min-max $H_\infty$ control optimization problem [41], where the controller is a minimizing player and the disturbance is a maximizing player, yielding a Nash equilibrium. In a zero-sum game with linear dynamics and an infinite horizon quadratic cost function, the Nash equilibrium
solution is equivalent to solving the generalized game algebraic Riccati equation (GARE). However, for nonlinear dynamics developing an analytical solution is complicated by solving a Hamilton-Jacobi-Isaacs (HJI) partial differential equation, where a solution may be non-smooth or intractable.

In a differential game formulation, the controlled system is influenced by a number of different inputs, computed by different players that individually try to optimize their respective performance functions. The control objective is to determine a set of policies that are admissible [47], i.e. control policies that guarantee the stability of the dynamic system and minimize individual performance functions to yield an equilibrium. The subsequent sections will introduce common definitions of equilibrium in differential games.

The Pareto Equilibrium. Pareto optimality is a genre of cooperative game theory. The so-called Pareto efficient solutions are based on the premise that the cost any one specific player incurs is not uniquely determined; rather, the solution is determined when the cost incurred by all players simultaneously cannot be improved. Formally, the cost function $J_i(t, x, u_1, \ldots, u_N)$ is defined as
\[ J_i = \int_0^T L(\tau, x(\tau), u_1(\tau), \ldots, u_N(\tau)) \, d\tau, \quad i = 1, \ldots, N, \tag{1-2} \]
where $x(t)$ is the solution to
\[ \dot{x} = f(t, x(t), u_1(t), \ldots, u_N(t)), \quad x(0) = x_0. \tag{1-3} \]
A set of control actions $\hat{u}$ is called Pareto efficient if the set of inequalities
\[ J_i(u) \le J_i(\hat{u}), \quad i = 1, \ldots, N, \]
does not permit a solution $u \in U$, where at least one of the inequalities is strict. The corresponding point $(J_1(\hat{u}), \ldots, J_N(\hat{u})) \in \mathbb{R}^N$ is called a Pareto solution. The set of all Pareto solutions is called the Pareto frontier. A well-known way to determine Pareto solutions is to solve a parameterized optimal control problem [48 50], however,
in general, it is unclear whether this approach yields all Pareto solutions. The Pareto efficient solution is included in this introduction for completeness; however, the techniques developed in this work will not utilize optimality defined in this structure.

The Nash Equilibrium. A Nash differential game consists of multiple players making simultaneous decisions where each player has an outcome that cannot be unilaterally improved from a change in strategy. Players are committed to following a predetermined strategy based on knowledge of the initial state, the system model and the cost functional to be minimized. Formally, for the cost function in Eq. 1-2, subject to the state dynamics in Eq. 1-3, a set of control actions $(u_1^*, \ldots, u_N^*)$ is a Nash equilibrium solution for the N-player game if the following N inequalities are satisfied for all $u_i \in U_i$, $i \in N$:
\[
\begin{aligned}
J_1^* &\triangleq J_1(x(t), u_1^*, u_2^*, \ldots, u_N^*) \le J_1(x(t), u_1, u_2^*, \ldots, u_N^*) \\
J_2^* &\triangleq J_2(x(t), u_1^*, u_2^*, \ldots, u_N^*) \le J_2(x(t), u_1^*, u_2, \ldots, u_N^*) \\
&\;\;\vdots \\
J_N^* &\triangleq J_N(x(t), u_1^*, u_2^*, \ldots, u_N^*) \le J_N(x(t), u_1^*, u_2^*, \ldots, u_N)
\end{aligned}
\]
Solution techniques to the Nash equilibrium can be classified in various ways depending on the amount of information available to the players (open-loop, closed-loop, feedback, etc.), the objectives of each player (zero-sum and nonzero-sum), the planning horizon (infinite horizon and finite horizon), and the nature of the dynamic constraints (continuous, discrete, linear, nonlinear, etc.). A large body of research has focused on linear quadratic games on a finite time horizon. One issue that Nash equilibria pose is that, in general, a unique Nash equilibrium is not expected. Non-uniqueness issues with Nash equilibria were discussed for a nonzero-sum differential game in [51]. In the case of the open-loop nonzero-sum game, where every player knows at time $t \in [0, T]$ the initial state $x_0$, conditions for the existence of a unique Nash equilibrium are given in [52]. In the case of closed-loop perfect state information, where every player knows at time $t \in [0, T]$ the complete history of the state, it has been shown that infinitely many Nash equilibria may
exist. In this case, it is possible to restrict the Nash equilibrium to a subset of feedback solutions, which is known as the (sub)game perfect Nash equilibria (or feedback Nash equilibria). The work of Case [53] and Friedman [54] showed that (sub)game perfect Nash equilibria are (at least heuristically) given by feedback strategies and that their corresponding value functions are the solution to a system of Hamilton-Jacobi equations. These concepts have been successfully applied to linear-quadratic (LQ) differential games [48 53]. A special case of the Nash game is the min-max saddle point equilibrium, which is widely used in control theory to minimize control effort under a worst-case level of uncertainty. The saddle point equilibrium has been heavily exploited in $H_\infty$ control theory [55], which considers finding the smallest gain $\gamma \ge 0$ under which the upper value of the cost function
\[ J(u, v) = \int_0^\infty \left( Q(x) + \|u(x)\|^2 - \gamma^2 \|v(x)\|^2 \right) d\tau, \tag{1-4} \]
is bounded and finding the corresponding controller that achieves this upper bound. $H_\infty$ control theory relates to LQ dynamic games in the sense that the worst-case $H_\infty$ design problems have equal upper and lower bounds of the objective function in Eq. 1-4, which results in the saddle-point solution to the LQ game problem. In both the $H_\infty$ control problem and the LQ game problem, the underlying dynamic optimization is for a two player zero-sum game with the controller being the minimizing player and the disturbance being the maximizing player.

The Stackelberg Equilibrium. A hierarchical nonzero-sum technique was derived by Von Stackelberg [21], where an equilibrium solution can be determined when one player's strategy has influence over another player's strategy. The Stackelberg technique has been accepted as the solution to a broad class of hierarchical decision making problems where one decision maker (called the leader) announces a strategy prior to the announcement of the second decision maker's (called the follower) strategy. For the two
player game, consider the cost function defined in Eq. 1-2, where $N = 2$, with the dynamic constraints in Eq. 1-3. The optimal reaction set for player 1 (the follower $u_1 \in U_1$) to a control $u_2 \in U_2$ is
\[ R_1(u_2) = \{ \nu \in U_1 \mid J_1(\nu, u_2) \le J_1(u_1, u_2) \;\; \forall u_1 \in U_1 \}. \]
If player 2 is leading, then $u_2^* \in U_2$ is called a Stackelberg equilibrium strategy for player 2 if, for all $u_2 \in U_2$,
\[ \sup_{\nu \in R_1(u_2^*)} J_2(\nu, u_2^*) \le \sup_{\nu \in R_1(u_2)} J_2(\nu, u_2), \]
and $u_1^* \in R_1(u_2^*)$ is then an optimal Stackelberg strategy for the follower. An important motivation for the use of the Stackelberg strategy by the leader lies in the reduced value of the cost function as compared to the Nash strategy; thus it can be shown that a Stackelberg strategy is at least as good as any Nash strategy for the leader [56]. The Stackelberg strategy can be divided into three essential types: 1) open-loop strategies [37 57 58], 2) closed-loop strategies [35 36 45], and 3) feedback strategies [38 58 61]. In [37], an N-player nonzero-sum Stackelberg differential game is generalized for a multi-input linear system, where the players are divided into a group of leaders that use a Stackelberg policy and a group of followers that use a Nash policy. Furthermore, hierarchical control problems for closed-loop Stackelberg solutions are presented in [35 36 45]. In [36], necessary conditions are developed for a closed-loop two-player Stackelberg game with a linear quadratic cost constrained by linear dynamics, whereas in [35], sufficient conditions are derived for a closed-loop two player solution subject to a discrete linear system. The novelty of [35] is that the requirement for an a priori restriction on the structure of the players' strategies is removed. The technique in [35] is extended in [45] to derive the team optimal solution for the two-person continuous-time linear differential game problems under quadratic cost functions for both players. Feedback strategies are presented in [59 61], which focus on the solution to the dynamic Stackelberg problem and are characterized by memoryless (pure feedback) features in the control strategies.
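
The reaction set and the leader's advantage described above can be illustrated on a much simpler problem than the differential games treated in this work. The following is a minimal sketch, assuming a static (one-shot) two-player game with hypothetical quadratic costs chosen only for illustration; the functions `follower_cost`, `leader_cost`, and the search interval are assumptions and do not come from the dissertation. It computes a Nash equilibrium by iterating best responses and a Stackelberg solution by letting the leader minimize its cost along the follower's reaction curve.

```python
def follower_cost(u1, u2):
    """Hypothetical cost J1 for the follower (player 1)."""
    return (u1 - 1.0) ** 2 + u1 * u2


def leader_cost(u1, u2):
    """Hypothetical cost J2 for the leader (player 2)."""
    return (u2 - 1.0) ** 2 + u1 * u2


def follower_reaction(u2):
    """Reaction set R1(u2): the unique minimizer of J1 over u1 (dJ1/du1 = 0)."""
    return 1.0 - 0.5 * u2


def leader_best_response(u1):
    """Leader's best response when it treats u1 as fixed (used for Nash only)."""
    return 1.0 - 0.5 * u1


def nash_equilibrium(iters=100):
    """Iterate simultaneous best responses; the map is a contraction here."""
    u1, u2 = 0.0, 0.0
    for _ in range(iters):
        u1, u2 = follower_reaction(u2), leader_best_response(u1)
    return u1, u2


def stackelberg_equilibrium(lo=-10.0, hi=10.0, iters=200):
    """Leader minimizes J2 along the follower's reaction curve (ternary search).

    The anticipated cost is convex in u2 for these costs, so ternary search
    on the assumed interval [lo, hi] is sufficient.
    """
    def anticipated(u2):
        return leader_cost(follower_reaction(u2), u2)

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if anticipated(m1) < anticipated(m2):
            hi = m2
        else:
            lo = m1
    u2 = 0.5 * (lo + hi)
    return follower_reaction(u2), u2


u1_nash, u2_nash = nash_equilibrium()
u1_stack, u2_stack = stackelberg_equilibrium()
print("Nash:       ", u1_nash, u2_nash, leader_cost(u1_nash, u2_nash))
print("Stackelberg:", u1_stack, u2_stack, leader_cost(u1_stack, u2_stack))
```

For these particular costs the Nash point is $u_1 = u_2 = 2/3$ with leader cost $5/9$, while the Stackelberg solution is $u_2 = 1$, $u_1 = 1/2$ with leader cost $1/2$, so the leader does no worse by announcing its strategy first. The follower's cost is higher under the Stackelberg solution in this example, which is specific to the assumed costs.
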


A solution technique for a class of nonclassical dynamics is presented in [38]. Previous research on the open-loop Stackelberg game in control theory has mainly focused on deriving an analytic solution and providing gain constraints; however, these results are limited to linear systems with known plants and do not demonstrate stabilization properties of the control law.

1.3 Problem Statement and Contributions

It is widely known that minimizing the cost function in Eq. 1-1 is equivalent to solving the HJB equation given by
\[ 0 = \min_{u_i} \left[ \nabla V_i^* \, f(t, x(t), u_1(t), \ldots, u_N(t)) + L(x, u_1, \ldots, u_N) \right], \quad V_i^*(0) = 0, \quad i \in N, \]
which is a partial differential equation. For linear systems, solving this equation reduces to finding the solution of a generalized game algebraic Riccati equation; however, for a nonlinear system, finding analytic solutions to the HJB may be burdensome. Furthermore, for certain classes of multi-player games (particularly nonzero-sum games), minimization of a cost function results in a set of coupled HJB equations. In linear systems, these coupled HJB equations reduce to a coupled set of Differential Riccati Equations (DRE), and various techniques have been proposed for establishing necessary and sufficient conditions for the existence of a solution to DREs. A body of research (e.g. [41 52 62 64]) has also been dedicated to determining the conditions for uniqueness of differential games with linear dynamics, particularly in the area of games with linear quadratic cost functions. For nonlinear dynamics, the coupled HJB equations are nonlinear and derivations of existence and uniqueness are often sparse. While it is known that analytic control laws can be derived for these systems, often the control laws are dependent on the solution to the coupled HJB equations, which can be impractical for real-time implementation on engineering systems. Two approaches are investigated in this dissertation to address these
limitations for uncertain nonlinear continuous-time systems: robust feedback linearization and approximate dynamic programming.

Analytic Optimal Control Solutions: A common technique used in control applications to derive an analytic optimal control solution to a nonlinear system is the nonlinear $H_\infty$ control solution [55 65 68]. However, the infinite horizon formulation of the nonlinear $H_\infty$ control problem requires a significant computational effort for nonlinear systems, thereby making its application to real systems often nearly impossible. In particular, the challenge is that the nonlinear $H_\infty$ control problem requires the solution to the HJB. Inverse optimal control (IOC) [11 12 15 17 69 74] is an approach to develop optimal controllers for systems without solving the HJB equation. The objective of IOC is to develop a controller that is optimal with respect to a stability analysis-derived cost functional. Previous research in IOCs has focused on finding a control Lyapunov function (CLF) and a controller that stabilizes the system, then determining if the controller is optimal for a meaningful cost. IOC differs from other analytic optimal control techniques by requiring the local cost to be determined a posteriori by the stabilizing feedback, rather than a priori by the designer. When parametric uncertainty exists in the system, several inverse optimal adaptive controllers (IOACs) [74 79] have been developed to compensate for uncertainties that are linear in the parameters (LP).

The use of neural networks (NNs) is another approach to approximate unknown dynamics as a means of developing approximate analytic optimal controllers. Specifically, results such as [47 80 84] find a one-player optimal control law for a given cost function constrained by a partial feedback linearized system, and then modify the optimal control law with a NN to approximate the unknown dynamics. The tracking errors for the NN methods in [47 80 84] are proven to be uniformly ultimately bounded (UUB) and the resulting state space system, for which the analytic optimal controller is developed, is approximated. In [85], an optimal controller is derived for an Euler-Lagrange system using a RISE feedback technique combined with an optimal controller that minimizes an
objective function. The RISE controller partially feedback linearizes the system, leaving a residual nonlinear system that can be manipulated to form a linear quadratic optimal control problem. The work using a partial feedback linearized system to determine an optimal controller is the basis for Chapters 2 and 3.

Approximation of Optimal Control Solutions: Due to the difficulty involved in determining a solution to the HJB, a branch of research is dedicated to approximating a solution to the optimal control problem via dynamic programming [86 90]. Reinforcement learning (RL) is a method wherein appropriate actions are learned based on evaluative feedback from the environment. A widely used RL method is based on the actor-critic (AC) architecture, where an actor performs certain actions by interacting with its environment, and the critic evaluates the actions and gives feedback to the actor, leading to an improvement in the performance of subsequent actions. AC algorithms are pervasive in machine learning and are used to learn the optimal policy online for finite-space discrete-time Markov decision problems [1 2 91]. Previous research on RL using adaptive critics in the machine learning community [1 5] provides an approach to determining the solution of an optimal control problem using Approximate Dynamic Programming (ADP) [86 90]. The discrete/iterative nature of the ADP formulation lends itself naturally to the design of discrete-time optimal controllers [89 92 96]. Baird [97] proposed Advantage Updating, an extension of the Q-learning algorithm which could be implemented in continuous-time and provided fast convergence. An HJB-based framework is used in [98] and [99], and Galerkin's spectral method is used to approximate the generalized HJB solution in [7].
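
The evaluate-then-improve loop of the actor-critic architecture described above can be sketched on the simplest possible case. The following is a minimal illustration, assuming a scalar linear system $\dot{x} = a x + b u$ with cost $\int_0^\infty (q x^2 + r u^2) \, d\tau$, for which the HJB equation reduces to an algebraic Riccati equation. It uses Kleinman-style policy iteration, a model-based offline scheme that needs full knowledge of $a$ and $b$; it is not the online ACI method developed in this dissertation, and all names and numerical values are hypothetical.

```python
import math


def policy_iteration_scalar_lqr(a, b, q, r, k0, iters=20):
    """Policy iteration for dx/dt = a*x + b*u with cost int(q*x^2 + r*u^2) dt.

    The policy is u = -k*x, and k0 must be stabilizing (a - b*k0 < 0).
    Returns a list of (k, p) pairs, where V(x) = p*x^2 is the cost of gain k.
    """
    if a - b * k0 >= 0:
        raise ValueError("initial gain k0 must stabilize the system")
    k = k0
    history = []
    for _ in range(iters):
        # Critic step (policy evaluation): with V = p*x^2 and u = -k*x, the
        # HJB residual vanishes when 2*(a - b*k)*p + q + r*k^2 = 0.
        p = (q + r * k * k) / (2.0 * (b * k - a))
        history.append((k, p))
        # Actor step (policy improvement): minimizing r*u^2 + 2*p*x*b*u
        # over u gives u = -(b*p/r)*x.
        k = b * p / r
    return history


def riccati_solution(a, b, q, r):
    """Positive root of the scalar Riccati equation 2*a*p - (b^2/r)*p^2 + q = 0."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


# Hypothetical unstable plant with unit weights and a stabilizing initial gain.
a, b, q, r = 1.0, 1.0, 1.0, 1.0
history = policy_iteration_scalar_lqr(a, b, q, r, k0=3.0)
p_final = history[-1][1]
p_star = riccati_solution(a, b, q, r)
print("policy iteration:", p_final, " Riccati:", p_star)
```

In this sketch the critic values start at $p = 2.5$ for the initial gain and decrease toward the Riccati solution $1 + \sqrt{2}$. For nonlinear or unknown dynamics no such closed-form evaluation step is available, which is the gap that the approximation methods surveyed in this section aim to fill.
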
All of the aforementioned approaches for continuous-time nonlinear systems are computed offline and/or require complete knowledge of system dynamics. A contribution in [100] is the requirement of only partial knowledge of the system; a hybrid continuous-time/discrete-time sampled data controller is developed based on policy iteration (PI), where the feedback control operation of the actor occurs at a faster time scale than the learning process of the critic. Vamvoudakis and Lewis [101] extended the idea by designing
a hybrid model-based online algorithm called synchronous PI which involved synchronous continuous-time adaptation of both actor and critic neural networks. Bhasin et al. [102] developed a continuous actor-critic-identifier (ACI) technique to solve the infinite horizon single player optimal control problem by using a robust dynamic neural network (DNN) to identify the dynamics and a critic NN to approximate the value function. This technique removes the requirement of complete knowledge of the system drift dynamics and incorporates an indirect adaptive control technique for a RL problem. Most of the previous research on continuous-time reinforcement learning algorithms that provide an online approach to the solution of optimal control problems assumed that the dynamical system is affected by a single control strategy. Previous research has also investigated the generalization of RL controllers to differential game problems [101 103 109]. Techniques utilizing Q-learning algorithms have been developed for a zero-sum game in [110]. An ADP procedure that provides a solution to the HJI equation associated with the two-player zero-sum nonlinear differential game is introduced in [103]. The ADP algorithm involves two iterative cost functions finding the upper and lower performance indices as sequences that converge to the saddle point solution of the game. The AC structure required for learning the saddle point solution is composed of four action networks and two critic networks. An iterative ADP solution was presented in [104], which considers solving zero-sum differential games under the condition that the saddle point does not exist; in that case a mixed optimal performance index function is obtained under a deterministic mixed optimal control scheme. Another ADP iteration technique is presented in [105], in which the non-affine nonlinear quadratic zero-sum game is transformed into an equivalent sequence of linear quadratic zero-sum games to approximate an optimal saddle point solution. In [106], an integral RL method is used to determine an online solution to the two player nonzero-sum game for a linear system without complete knowledge of the dynamics. The synchronous PI method in [101] was then further generalized to solve the two-player zero-sum game problem in [108] and a
multi-player nonzero-sum game in [109] for nonlinear continuous-time systems with known dynamics. The work from [101 102 108 109] provides the foundation for Chapters 4 and 5, where the two player zero-sum game and multi-player nonzero-sum game are solved using an ACI technique where the controllers are implemented online and without the requirement of complete knowledge of the system drift dynamics.

This dissertation focuses on developing differential game-based controllers for specific classes of uncertain continuous-time nonlinear systems. The contributions of Chapters 2-5 are as follows:

Asymptotic Nash Optimal Control Design for an Uncertain Euler-Lagrange System: The main contribution of Chapter 2 is the development of robust (sub)optimal Nash-based feedback control laws. This chapter combines the Robust Integral Sign of the Error (RISE) [111] controller with an optimal Nash strategy to stabilize an uncertain Euler-Lagrange system with additive disturbances. One advantage of this method over previous techniques is that the controller accounts for uncertainty in a state-varying mass inertia matrix. This chapter illustrates the development of the RISE controller which is used to asymptotically identify the nonlinearities in the dynamics. By applying the RISE controller, the nonlinear dynamics converge to a residual partially linearized system, and the solution to the feedback Nash game for the residual system is used to derive the stabilizing control laws. The (sub)optimal feedback controllers minimize a cost functional in the presence of unknown bounded disturbances. A Lyapunov-based analysis is used to prove asymptotic tracking for the combined RISE and Nash-based strategy. Existence of the feedback Nash solution is discussed and simulation results demonstrate the control performance.

Asymptotic Stackelberg Optimal Control Design for an Uncertain Euler-Lagrange System: Chapter 3 derives a stabilizing set of controllers for a system in which one control input has additional information about the other control input. This scenario is representative of many engineering applications, where interactions among a
leader and a follower are observed (e.g., formation control, autonomous docking, aerial refueling, etc.). In comparison, the Nash game in Chapter 2 considers both players to have no a priori knowledge about each other, which can lead to an overly conservative control design. The main contribution of this work is the development of robust (sub)optimal open-loop Stackelberg-based controllers for the leader and follower, which both act as inputs to an uncertain nonlinear system. A RISE controller is used in conjunction with the derived Stackelberg strategy. The RISE controller enables the dynamics to be written in a residual form, which allows for the Stackelberg differential game formulation. The controller accounts for a state-varying mass inertia matrix, as well as additive exogenous disturbances and parametric uncertainties in the dynamics. One novelty of the techniques in Chapters 2 and 3 is the use of the skew-symmetric property to reduce the coupled differential Riccati equations to algebraic Riccati equations, which allows for conditions to be established for the solution to the Nash and Stackelberg nonzero-sum games. The control formulation utilizes the solution to the hierarchical open-loop Stackelberg nonzero-sum game to derive the feedback control laws. A Lyapunov-based stability analysis is used to prove asymptotic tracking, and a brief discussion on the existence of a solution is provided. Simulation results are included to illustrate the performance of the developed controller.

Nonlinear two-player zero-sum game approximate solution using an HJI approximation algorithm: In contrast to the approaches in Chapters 2 and 3, which are largely based on Pontryagin's maximum principle, the techniques in Chapters 4 and 5 seek to approximate the solution to the HJI and HJB equations. This approximation is based on Bellman's optimality principle and dynamic programming. The main contribution of Chapter 4 is solving a two player zero-sum infinite horizon game subject to continuous-time unknown nonlinear dynamics that are affine in the input. In the developed method, two actor NNs and one critic NN use gradient and least squares-based update laws, respectively, to minimize the Bellman error, which is the difference between the exact and the approximate HJI equations. The identifier DNN is a combination of a
Hopfield-type [112] component, in parallel configuration with the system [113], and a RISE component. The Hopfield component of the DNN learns the system dynamics based on online gradient-based weight tuning laws, while the RISE term robustly accounts for the function reconstruction errors, guaranteeing asymptotic estimation of the state and the state derivative. The online estimation of the state derivative allows the ACI architecture to be implemented without knowledge of system drift dynamics; however, knowledge of the input gain matrix is required to implement the control policy. While the designs of the actor and critic are coupled through an HJI equation, the design of the identifier is decoupled from the actor-critic and can be considered as a modular component in the ACI architecture. Convergence of the ACI-based algorithm and stability of the closed-loop system are analyzed using Lyapunov-based adaptive control methods, and a persistence of excitation (PE) condition is used to guarantee convergence to within a bounded region of the optimal control and UUB stability of the closed-loop system. The main advantage of this ACI approach is that neither of the two participants in the game makes use of explicit knowledge of the model of the drift dynamics of the system that they influence through their control strategies. This means that the two players will learn online the most effective control strategies that correspond to the Nash equilibrium while using no explicit knowledge of the drift dynamics of the differential game. In addition, this technique converges to the approximate solution of the Nash equilibrium without the need for iterative techniques or offline training, and it incorporates theory from adaptive control, making it an approximate indirect adaptive solution to a two player zero-sum differential game.

Nonlinear $N$-player nonzero-sum game approximate solution using an HJB approximation algorithm: Nonzero-sum games present different challenges when compared to zero-sum games. For nonlinear dynamics, the counterpart of the HJI equation for zero-sum games is a coupled set of nonlinear HJB equations for nonzero-sum games. Research in nonzero-sum games for nonlinear systems is sparse and there are many open research
challenges. Chapter 5 considers an $N$-player nonzero-sum infinite horizon game subject to continuous-time uncertain nonlinear dynamics. Using the ACI technique, the main contribution of this work is deriving an approximate solution to an $N$-player nonzero-sum game with a technique that is continuous, online and based on adaptive control theory. Previous research in the area has focused on scalar nonlinear systems or implemented iterative/hybrid techniques that required complete knowledge of the drift dynamics. The technique developed in Chapter 5 expands the ACI structure to solve a differential game problem, wherein $N$-actor and $N$-critic neural network structures are used to approximate the optimal control laws and the optimal value function set, respectively. The main traits of this online algorithm involve the use of ADP techniques and adaptive theory to determine the Nash equilibrium solution of the game in a manner that does not require full knowledge of the system dynamics and that approximately solves the underlying set of coupled HJB equations of the game problem. For an equivalent nonlinear system, previous research makes use of offline procedures or requires full knowledge of the system dynamics to determine the Nash equilibrium. A Lyapunov-based stability analysis shows that UUB tracking for the closed-loop system is guaranteed and a convergence analysis demonstrates that the approximate control policies converge to a neighborhood of the optimal solutions.

CHAPTER 2
ASYMPTOTIC NASH OPTIMAL CONTROL DESIGN FOR AN UNCERTAIN EULER-LAGRANGE SYSTEM

A zero-sum game formulation that has received notable interest in control theory is the two-player min-max $H_\infty$ control problem [41], where the controller is a minimizing player and the disturbance is a maximizing player in yielding a Nash equilibrium. The $H_\infty$ formulation is well suited for disturbance rejection problems where the controller and worst-case disturbance are derived for a Nash equilibrium. A Nash strategy is one such that the outcome of each player's input cannot be improved by unilaterally changing that player's strategy. Previous Nash games focus on zero-sum solution techniques for linear systems with infinite horizon quadratic cost functionals. In this chapter, a general framework is developed for feedback control of an Euler-Lagrange system using a feedback nonzero-sum Nash differential game. A RISE controller is used to compensate for some uncertain nonlinearities so that Nash optimal controllers can be derived for the general tracking problem. A Lyapunov-based stability analysis and numerical simulations are provided to examine the stability and performance of the developed controllers.

2.1 Dynamic Model and Properties

The class of nonlinear dynamic systems considered in this chapter is assumed to be modeled by the following Euler-Lagrange formulation:
\[ M(q)\ddot{q} + V_m(q,\dot{q})\dot{q} + G(q) + F(\dot{q}) + d(t) = \tau_1(t) + \tau_2(t). \tag{2-1} \]
In Eq. 2-1, $M(q) \in \mathbb{R}^{n \times n}$ denotes the generalized inertia matrix, $V_m(q,\dot{q}) \in \mathbb{R}^{n \times n}$ denotes the generalized centripetal-Coriolis matrix, $G(q) \in \mathbb{R}^{n}$ denotes the generalized gravity vector, $F(\dot{q}) \in \mathbb{R}^{n}$ denotes the generalized friction vector, $d(t) \in \mathbb{R}^{n}$ is a general bounded disturbance, $\tau_1(t), \tau_2(t) \in \mathbb{R}^{n}$ represent input control vectors, and $q(t), \dot{q}(t), \ddot{q}(t) \in \mathbb{R}^{n}$ denote the generalized position, velocity, and acceleration vectors, respectively. The subsequent development is based on the assumption that $q(t)$ and $\dot{q}(t)$ are measurable, and
$M(q)$, $V_m(q,\dot{q})$, $G(q)$, $F(\dot{q})$, and $d(t)$ are unknown. Moreover, the following properties and assumptions are exploited in the subsequent development.

Assumption 2.1. The inertia matrix $M(q)$ is symmetric, positive definite, and satisfies the following inequality $\forall \xi \in \mathbb{R}^{n}$:
\[ m_1 \|\xi\|^2 \le \xi^T M(q)\, \xi \le \bar{m}(q) \|\xi\|^2 \tag{2-2} \]
where $m_1 \in \mathbb{R}$ is a known positive constant, $\bar{m}(q) \in \mathbb{R}$ is a known positive function, and $\|\cdot\|$ denotes the standard Euclidean norm.

Assumption 2.2. The following skew-symmetric relationships are satisfied:
\[ \xi^T \left( \dot{M}(q) - 2V_m(q,\dot{q}) \right) \xi = 0 \quad \forall \xi \in \mathbb{R}^{n} \tag{2-3} \]
\[ \left( \dot{M}(q) - 2V_m(q,\dot{q}) \right)^T = -\left( \dot{M}(q) - 2V_m(q,\dot{q}) \right) \tag{2-4} \]
\[ \left( \dot{M}(q) - \left( V_m(q,\dot{q}) + V_m^T(q,\dot{q}) \right) \right)^T = -\left( \dot{M}(q) - \left( V_m(q,\dot{q}) + V_m^T(q,\dot{q}) \right) \right) \tag{2-5} \]

Assumption 2.3. If $q(t), \dot{q}(t) \in \mathcal{L}_\infty$, then $V_m(q,\dot{q})$, $F(\dot{q})$ and $G(q)$ are bounded. Moreover, if $q(t), \dot{q}(t) \in \mathcal{L}_\infty$, then the first and second partial derivatives of the elements of $M(q)$, $V_m(q,\dot{q})$, $G(q)$ with respect to $q(t)$ exist and are bounded, and the first and second partial derivatives of the elements of $V_m(q,\dot{q})$, $F(\dot{q})$ with respect to $\dot{q}(t)$ exist and are bounded.

Assumption 2.4. The desired trajectory is assumed to be designed such that $q_d(t), \dot{q}_d(t), \ddot{q}_d(t), \dddot{q}_d(t), \ddddot{q}_d(t) \in \mathbb{R}^{n}$ exist and are bounded.

Assumption 2.5. The disturbance term and its first two time derivatives, i.e. $d(t), \dot{d}(t), \ddot{d}(t)$, are bounded by known constants.

2.2 Error System Development

The control objective is to ensure that the generalized coordinates track a desired time-varying trajectory, denoted by $q_d(t) \in \mathbb{R}^{n}$, despite uncertainties in the dynamic model, while minimizing a given performance index. To quantify the tracking objective, a position
tracking error, denoted by $e_1(q,t) \in \mathbb{R}^{n}$, is defined as
\[ e_1 \triangleq q_d - q. \tag{2-6} \]
To facilitate the subsequent analysis, filtered tracking errors, denoted by $e_2(q,\dot{q},t)$ and $r(q,\dot{q},\ddot{q},t) \in \mathbb{R}^{n}$, are also defined as
\[ e_2 \triangleq \dot{e}_1 + \alpha_1 e_1 \tag{2-7} \]
\[ r \triangleq \dot{e}_2 + \alpha_2 e_2 \tag{2-8} \]
where $\alpha_1, \alpha_2 \in \mathbb{R}^{n \times n}$ are positive definite, constant gain matrices. The filtered tracking error $r(q,\dot{q},\ddot{q},t)$ is not measurable due to the functional dependence on $\ddot{q}(t)$. The error systems are based on the assumption that the generalized coordinates of the Euler-Lagrange dynamics allow additive and not multiplicative errors. A state-space model can be developed based on the tracking errors in Eqs. 2-6 and 2-7. Based on this model, a controller is derived that minimizes a quadratic performance index under the (temporary) assumption that the dynamics in Eq. 2-1 are known. A control term is developed as the solution to a nonzero-sum Nash differential game. The Nash-derived control term is then combined with a robust controller to identify the unknown dynamics and additive disturbance, thereby relaxing the temporary assumption that these dynamics are known.

To develop a state-space model for the tracking errors in Eqs. 2-6 and 2-7, the time derivative of Eq. 2-7 is pre-multiplied by the inertia matrix, and substitutions are made from Eqs. 2-1 and 2-6 to obtain
\[ M\dot{e}_2 = -V_m e_2 + h + d - (\tau_1 + \tau_2) \tag{2-9} \]
where the nonlinear function $h(q,\dot{q},t) \in \mathbb{R}^{n}$ is defined as
\[ h \triangleq M(\ddot{q}_d + \alpha_1 \dot{e}_1) + V_m(\dot{q}_d + \alpha_1 e_1) + G + F. \tag{2-10} \]
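The error definitions in Eqs. 2-6 and 2-7 can be sketched numerically as follows. This is a minimal illustration, assuming NumPy arrays for the measured signals; the function name `tracking_errors` and all numerical values are hypothetical and are not taken from the dissertation. Only $e_1$ and $e_2$ are computed, since $r$ depends on the unmeasurable acceleration.

```python
import numpy as np

def tracking_errors(q, q_dot, q_d, q_d_dot, alpha1):
    """Return (e1, e2) as defined in Eqs. 2-6 and 2-7."""
    e1 = q_d - q                  # Eq. 2-6: position tracking error
    e1_dot = q_d_dot - q_dot      # time derivative of Eq. 2-6
    e2 = e1_dot + alpha1 @ e1     # Eq. 2-7: filtered tracking error
    return e1, e2

# Illustrative values only (assumed, not from the source).
alpha1 = np.diag([2.0, 2.0])
e1, e2 = tracking_errors(
    q=np.array([0.1, -0.2]),
    q_dot=np.array([0.0, 0.5]),
    q_d=np.array([0.3, 0.0]),
    q_d_dot=np.array([1.0, 0.5]),
    alpha1=alpha1,
)
```

Because both errors use only position and velocity measurements, they are available to the controller under the measurability assumption stated in Section 2.1.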

Under the (temporary) assumption that the dynamics in Eq. 2-1 are known, the control inputs can be designed as
\[ \tau_1 + \tau_2 \triangleq h + d - (u_1 + u_2) \tag{2-11} \]
where $u_1(t), u_2(t) \in \mathbb{R}^{n}$ are auxiliary control inputs that will be designed to minimize a desired performance index. By substituting Eq. 2-11 into Eq. 2-9, the closed-loop error system for $e_2(t)$ can be obtained as
\[ M\dot{e}_2 = -V_m e_2 + u_1 + u_2. \tag{2-12} \]
A state-space model for Eqs. 2-7 and 2-12 can now be developed as
\[ \dot{z} = A(q,\dot{q})z + B_1(q)u_1 + B_2(q)u_2 \tag{2-13} \]
where $A(q,\dot{q}) \in \mathbb{R}^{2n \times 2n}$, $B_1(q), B_2(q) \in \mathbb{R}^{2n \times n}$, and $z(t) \in \mathbb{R}^{2n}$ are defined as
\[ A \triangleq \begin{bmatrix} -\alpha_1 & I_{n \times n} \\ 0_{n \times n} & -M^{-1}V_m \end{bmatrix}, \quad B_1 = B_2 \triangleq \begin{bmatrix} 0_{n \times n} & M^{-1} \end{bmatrix}^T, \quad z \triangleq \begin{bmatrix} e_1^T & e_2^T \end{bmatrix}^T \]
where $I_{n \times n}$ and $0_{n \times n}$ denote an $n \times n$ identity matrix and matrix of zeros, respectively.

2.3 Two Player Feedback Nash Nonzero-Sum Differential Game

The Nash solution is characterized by an equilibrium in which each player has an outcome that cannot be improved by a unilateral change of strategy. The Nash strategy safeguards against a single player deviating from the equilibrium strategy and is well suited for problems where cooperation between players cannot be guaranteed. To formulate the feedback Nash solution, consider the system in Eq. 2-13 in terms of the players of a Nash equilibrium game $(u_{N1}, u_{N2})$, given as
\[ \dot{z} = A(q,\dot{q})z + B_1(q)u_{N1} + B_2(q)u_{N2}. \tag{2-14} \]
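As a hedged illustration of the state-space model in Eq. 2-13, the sketch below assembles $A$, $B_1$ and $B_2$ from given $M$, $V_m$ and $\alpha_1$ and evaluates $\dot{z}$ once. The helper name `state_space_matrices` and the numerical matrices are assumptions made for the example; in the chapter itself $M$ and $V_m$ are state dependent and unknown to the controller.

```python
import numpy as np

def state_space_matrices(M, Vm, alpha1):
    """Build A, B1, B2 of Eq. 2-13 for the current M(q) and Vm(q, q_dot)."""
    n = M.shape[0]
    M_inv = np.linalg.inv(M)
    A = np.block([[-alpha1, np.eye(n)],
                  [np.zeros((n, n)), -M_inv @ Vm]])
    B1 = np.vstack([np.zeros((n, n)), M_inv])
    B2 = B1.copy()  # both players enter through the same input channel
    return A, B1, B2

# Illustrative matrices (assumed values, chosen only to exercise the code).
M = np.diag([2.0, 4.0])
Vm = np.array([[0.0, 1.0], [-1.0, 0.0]])
alpha1 = np.eye(2)
A, B1, B2 = state_space_matrices(M, Vm, alpha1)

e1 = np.array([0.2, -0.1])
e2 = np.array([0.5, 0.3])
u1 = np.array([1.0, 0.0])
u2 = np.array([0.0, -1.0])
z = np.concatenate([e1, e2])
z_dot = A @ z + B1 @ u1 + B2 @ u2
```

The top block of `z_dot` reproduces $\dot{e}_1 = e_2 - \alpha_1 e_1$ from Eq. 2-7 and the bottom block reproduces Eq. 2-12 after multiplying by $M^{-1}$.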

Each player has a cost functional $J_{N1}(z,u_{N1},u_{N2}), J_{N2}(z,u_{N1},u_{N2}) \in \mathbb{R}$ defined as
\[ J_{N1} = \frac{1}{2}\int_{t_0}^{\infty} \left( z^T Q_N z + u_{N1}^T R_{N11} u_{N1} + u_{N2}^T R_{N12} u_{N2} \right) dt \tag{2-15} \]
\[ J_{N2} = \frac{1}{2}\int_{t_0}^{\infty} \left( z^T L_N z + u_{N2}^T R_{N22} u_{N2} + u_{N1}^T R_{N21} u_{N1} \right) dt, \tag{2-16} \]
where $t_0 \in \mathbb{R}$ is the initial time, and $Q_N, L_N \in \mathbb{R}^{2n \times 2n}$ are symmetric constant matrices defined as
\[ Q_N = \begin{bmatrix} Q_{N11} & Q_{N12} \\ Q_{N12}^T & Q_{N22} \end{bmatrix}, \quad L_N = \begin{bmatrix} L_{N11} & L_{N12} \\ L_{N12}^T & L_{N22} \end{bmatrix}, \]
where $Q_{Nij}$ and $L_{Nij} \in \mathbb{R}^{n \times n}$ are symmetric semi-definite constant matrices, and $R_{Nij} \in \mathbb{R}^{n \times n}$ is positive definite for $i,j = 1,2$. This chapter focuses on a game with memoryless perfect state information; therefore, the controller information set contains only the initial conditions $z_0$ and the current state estimates $z(t)$ at time $t$. In this context, the actions of the players are completely determined by the relations $(u_{N1}, u_{N2}) = (\varphi_1(z_0,z), \varphi_2(z_0,z))$. A pair of strategies $(\varphi_1^*, \varphi_2^*)$ is called a Nash equilibrium set for the differential game if for all strategies $(\varphi_1, \varphi_2)$ the following inequalities hold:
\[ J_{N1}(\varphi_1, \varphi_2^*) \ge J_{N1}(\varphi_1^*, \varphi_2^*), \quad J_{N2}(\varphi_1^*, \varphi_2) \ge J_{N2}(\varphi_1^*, \varphi_2^*). \]
Based on the minimum principle [114], the Hamiltonians $H_{N1}(z,u_{N1},u_{N2}), H_{N2}(z,u_{N1},u_{N2}) \in \mathbb{R}$ of the control inputs $u_{N1}$ and $u_{N2}$ are defined as
\[ H_{N1} = \frac{1}{2}\left( z^T Q_N z + u_{N1}^T R_{N11} u_{N1} + u_{N2}^T R_{N12} u_{N2} \right) + \lambda_{N1}^T \left( Az + B_1 u_{N1} + B_2 u_{N2} \right) \tag{2-17} \]
\[ H_{N2} = \frac{1}{2}\left( z^T L_N z + u_{N2}^T R_{N22} u_{N2} + u_{N1}^T R_{N21} u_{N1} \right) + \lambda_{N2}^T \left( Az + B_1 u_{N1} + B_2 u_{N2} \right) \tag{2-18} \]
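respectively. Two previous results from [33] and [62] utilizing the memoryless perfect information structure are given in the following theorems.

Theorem 2.1. Let the strategies $(\varphi_1^*, \varphi_2^*)$ be such that there exist solutions $(\lambda_{N1}, \lambda_{N2})$ to the differential equations
\[ \dot{\lambda}_{N1} = -\frac{\partial}{\partial z} H_{N1}(z, \varphi_1^*, \varphi_2^*, \lambda_{N1}) - \frac{\partial}{\partial u_2} H_{N1}(z, \varphi_1^*, \varphi_2^*, \lambda_{N1}) \frac{\partial}{\partial z} \varphi_2^*(z_0, z) \tag{2-19} \]
\[ \dot{\lambda}_{N2} = -\frac{\partial}{\partial z} H_{N2}(z, \varphi_1^*, \varphi_2^*, \lambda_{N2}) - \frac{\partial}{\partial u_1} H_{N2}(z, \varphi_1^*, \varphi_2^*, \lambda_{N2}) \frac{\partial}{\partial z} \varphi_1^*(z_0, z) \tag{2-20} \]
where $H_{N1}$ and $H_{N2}$ are defined in Eqs. 2-17 and 2-18 and such that $\frac{\partial}{\partial u_1} H_{N1} = 0$, $\frac{\partial}{\partial u_2} H_{N2} = 0$, and $z$ satisfies
\[ \dot{z} = Az + B_1 \varphi_1^* + B_2 \varphi_2^*, \quad z(0) = z_0. \]
Then $(\varphi_1^*, \varphi_2^*)$ is a Nash equilibrium with respect to the memoryless perfect state information structure and the following equalities hold:
\[ u_{1N}^* = \varphi_1^* = -R_{N11}^{-1} B_1^T \lambda_{N1} \tag{2-21} \]
\[ u_{2N}^* = \varphi_2^* = -R_{N22}^{-1} B_2^T \lambda_{N2} \tag{2-22} \]

Proof. Refer to [33].

Remark 2.1. From Theorem 2-1, it can easily be shown that the open-loop Nash equilibrium is also a Nash equilibrium with respect to the memoryless perfect state information structure.

In [33] it was shown that if the admissible strategies are restricted to a class of (possibly time-varying) linear feedback strategies, then there exists an analytic linear feedback for the Nash equilibrium. The following theorem summarizes that result for the current system.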

Theorem 2.2. Suppose $(K_N, P_N)$ satisfy the coupled differential Riccati equations (DREs), given by
\[ 0 = \dot{K}_N + K_N A + A^T K_N - K_N B_1 R_{N11}^{-1} B_1^T K_N - K_N B_2 R_{N22}^{-1} B_2^T P_N + Q_N^T - P_N B_2 R_{N22}^{-1} B_2^T K_N + P_N B_2 R_{N22}^{-T} R_{N12} R_{N22}^{-1} B_2^T P_N \tag{2-23} \]
\[ 0 = \dot{P}_N + P_N A + A^T P_N - P_N B_1 R_{N11}^{-1} B_1^T K_N - P_N B_2 R_{N22}^{-1} B_2^T P_N + L_N^T - K_N B_1 R_{N11}^{-1} B_1^T P_N + K_N B_1 R_{N11}^{-T} R_{N21} R_{N11}^{-1} B_1^T K_N. \tag{2-24} \]
Then the pair of strategies $(\varphi_1^*, \varphi_2^*) \triangleq \left( -R_{N11}^{-1} B_1^T K_N z, \; -R_{N22}^{-1} B_2^T P_N z \right)$ is a linear feedback Nash equilibrium and the solutions to the costate equations defined in Eqs. 2-19 and 2-20 are linear feedbacks given as
\[ \lambda_{N1} = K_N z \tag{2-25} \]
\[ \lambda_{N2} = P_N z \tag{2-26} \]

Proof. Refer to [62].

Remark 2.2. It was shown in [115] that for more general (i.e. nonlinear feedback) strategies there may exist infinitely many feedback Nash equilibria for the memoryless perfect state information structure.

The subsequent analysis utilizes the feedback structure from Theorem 2-2 and the skew-symmetric property for Euler-Lagrange systems to reduce the DREs defined in Eqs. 2-23 and 2-24 to algebraic Riccati equations, thereby deriving an analytic solution for $(K_N, P_N)$ based on the control gains. Assume that $K_N(t), P_N(t) \in \mathbb{R}^{2n \times 2n}$ are time-varying positive definite block-diagonal matrices defined as
\[ K_N = \begin{bmatrix} K_{N11} & 0_{n \times n} \\ 0_{n \times n} & K_{N22} \end{bmatrix}, \quad P_N = \begin{bmatrix} P_{N11} & 0_{n \times n} \\ 0_{n \times n} & P_{N22} \end{bmatrix}. \]

The two DREs in Eqs. 2-23 and 2-24 must be solved simultaneously to yield a control strategy for both players. Substituting Eqs. 2-14, 2-25, and 2-26 into Eq. 2-23 yields four simultaneous equations as
\[ 0 = \dot{K}_{N11} - K_{N11}\alpha_1 - \alpha_1^T K_{N11} + Q_{N11} \tag{2-27} \]
\[ 0 = K_{N11} + Q_{N12} \tag{2-28} \]
\[ 0 = K_{N11} + Q_{N12}^T \tag{2-29} \]
\[ 0 = \dot{K}_{N22} - K_{N22} M^{-1} V_m - V_m^T M^{-1} K_{N22} + Q_{N22} - K_{N22} M^{-1} R_{N11}^{-1} M^{-1} K_{N22} - K_{N22} M^{-1} R_{N22}^{-1} M^{-1} P_{N22} - P_{N22} M^{-1} R_{N22}^{-1} M^{-1} K_{N22} + P_{N22} M^{-1} R_{N22}^{-T} R_{N12} R_{N22}^{-1} M^{-1} P_{N22}. \tag{2-30} \]
Likewise, from Eq. 2-24, four similar simultaneous equations are generated as
\[ 0 = \dot{P}_{N11} - P_{N11}\alpha_1 - \alpha_1^T P_{N11} + L_{N11} \tag{2-31} \]
\[ 0 = P_{N11} + L_{N12} \tag{2-32} \]
\[ 0 = P_{N11} + L_{N12}^T \tag{2-33} \]
\[ 0 = \dot{P}_{N22} - P_{N22} M^{-1} V_m - V_m^T M^{-1} P_{N22} + L_{N22} - P_{N22} M^{-1} R_{N11}^{-1} M^{-1} K_{N22} - P_{N22} M^{-1} R_{N22}^{-1} M^{-1} P_{N22} - K_{N22} M^{-1} R_{N11}^{-1} M^{-1} P_{N22} + K_{N22} M^{-1} R_{N11}^{-T} R_{N21} R_{N11}^{-1} M^{-1} K_{N22}. \tag{2-34} \]
If $P_{N22}(t)$ and $K_{N22}(t)$ are selected as
\[ P_{N22} = K_{N22} = M(q) \tag{2-35} \]
then the skew-symmetry properties in Assumption 2-2 can be applied to Eqs. 2-30 and 2-34 to determine two constraints on the control gains:
\[ -R_{N11}^{-1} - 2R_{N22}^{-1} + R_{N22}^{-T} R_{N12} R_{N22}^{-1} + Q_{N22} = 0 \tag{2-36} \]
\[ -2R_{N11}^{-1} - R_{N22}^{-1} + R_{N11}^{-T} R_{N21} R_{N11}^{-1} + L_{N22} = 0. \tag{2-37} \]
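The reduction from Eq. 2-30 to Eq. 2-36 can be written out as a short worked step. The derivation below is a sketch that uses only Eq. 2-35, the symmetry of $M(q)$ from Assumption 2-1, and Eq. 2-5 of Assumption 2-2; the analogous substitution into Eq. 2-34 gives Eq. 2-37.

```latex
% Substitute K_{N22} = P_{N22} = M(q) into Eq. 2-30 and use M M^{-1} = I:
\begin{align*}
0 &= \dot{M} - V_m - V_m^T + Q_{N22}
     - R_{N11}^{-1} - R_{N22}^{-1} - R_{N22}^{-1}
     + R_{N22}^{-T} R_{N12} R_{N22}^{-1} \\
% Eq. 2-5 states that the symmetric matrix \dot{M} - (V_m + V_m^T)
% equals its own negative, hence \dot{M} - V_m - V_m^T = 0:
  &= -R_{N11}^{-1} - 2 R_{N22}^{-1}
     + R_{N22}^{-T} R_{N12} R_{N22}^{-1} + Q_{N22}.
\end{align*}
```

The state-dependent terms cancel exactly, which is why the remaining condition involves only the constant weighting matrices.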

Since $Q_N$ and $L_N$ are constant matrices, $K_{N11}$ and $P_{N11}$ from Eqs. 2-28 and 2-32, respectively, must also be constant matrices (i.e. $\dot{P}_{N11} = \dot{K}_{N11} = 0$). It is evident from Eqs. 2-28, 2-29, 2-32, and 2-33 that the following relationships can be established:
\[ K_{N11} = -\frac{1}{2}\left( Q_{N12} + Q_{N12}^T \right) \tag{2-38} \]
\[ P_{N11} = -\frac{1}{2}\left( L_{N12} + L_{N12}^T \right). \tag{2-39} \]
Two more constraints can be established by substituting Eqs. 2-38 and 2-39 into Eqs. 2-27 and 2-31, respectively, then reducing the equations as
\[ 0 = Q_{N11} + \frac{1}{2}\left[ \left( Q_{N12} + Q_{N12}^T \right)\alpha_1 + \alpha_1^T \left( Q_{N12} + Q_{N12}^T \right) \right] \]
\[ 0 = L_{N11} + \frac{1}{2}\left[ \left( L_{N12} + L_{N12}^T \right)\alpha_1 + \alpha_1^T \left( L_{N12} + L_{N12}^T \right) \right]. \]
Substituting Eqs. 2-14, 2-25, and 2-35 into Eq. 2-21 yields the Nash-derived controller
\[ u_{N1} = -R_{N11}^{-1} B_1^T K_N z = -R_{N11}^{-1} e_2. \tag{2-40} \]
The controller in Eq. 2-40 is subject to the other player's input, derived by substituting Eqs. 2-14, 2-26, and 2-35 into Eq. 2-22, as
\[ u_{N2} = -R_{N22}^{-1} B_2^T P_N z = -R_{N22}^{-1} e_2. \tag{2-41} \]
The weights $(Q_N, L_N)$ impose a penalty on the state vectors in the cost functionals in Eqs. 2-15 and 2-16, and the gain matrices $R_{N11}^{-1}$ and $R_{N22}^{-1}$ are subject to the following constraints:
\[ 0 = -R_{N11}^{-1} - 2R_{N22}^{-1} + R_{N22}^{-T} R_{N12} R_{N22}^{-1} + Q_{N22} \tag{2-42} \]
\[ 0 = -2R_{N11}^{-1} - R_{N22}^{-1} + R_{N11}^{-T} R_{N21} R_{N11}^{-1} + L_{N22} \tag{2-43} \]
\[ 0 = \frac{1}{2}\left[ \left( Q_{N12} + Q_{N12}^T \right)\alpha_1 + \alpha_1^T \left( Q_{N12} + Q_{N12}^T \right) \right] + Q_{N11} \tag{2-44} \]
\[ 0 = \frac{1}{2}\left[ \left( L_{N12} + L_{N12}^T \right)\alpha_1 + \alpha_1^T \left( L_{N12} + L_{N12}^T \right) \right] + L_{N11}. \tag{2-45} \]
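A candidate set of weights can be checked against Eqs. 2-42 and 2-43 by evaluating their residuals. The sketch below is illustrative only: the function name `nash_gain_residuals` and the identity-matrix gains are assumptions chosen so the arithmetic is easy to follow, and they are not the gains used in the dissertation's simulation.

```python
import numpy as np

def nash_gain_residuals(R11, R22, R12, R21, Q22, L22):
    """Return the right-hand sides of Eqs. 2-42 and 2-43 (zero when satisfied)."""
    R11_inv = np.linalg.inv(R11)
    R22_inv = np.linalg.inv(R22)
    res_242 = -R11_inv - 2.0 * R22_inv + R22_inv.T @ R12 @ R22_inv + Q22
    res_243 = -2.0 * R11_inv - R22_inv + R11_inv.T @ R21 @ R11_inv + L22
    return res_242, res_243

# Assumed example: all control weights equal to the identity.
I2 = np.eye(2)
R11, R22, R12, R21 = I2, I2, I2, I2
# Solve each constraint for the state weight that makes it hold.
Q22 = np.linalg.inv(R11) + 2.0 * np.linalg.inv(R22) - np.linalg.inv(R22).T @ R12 @ np.linalg.inv(R22)
L22 = 2.0 * np.linalg.inv(R11) + np.linalg.inv(R22) - np.linalg.inv(R11).T @ R21 @ np.linalg.inv(R11)

res_242, res_243 = nash_gain_residuals(R11, R22, R12, R21, Q22, L22)
```

With these assumed gains both state weights come out as $2I$, and both residuals vanish. The constraints in Eqs. 2-44 and 2-45 additionally tie $Q_{N11}$, $Q_{N12}$, $L_{N11}$ and $L_{N12}$ to $\alpha_1$ and are not checked here.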

Based on the feedback Nash strategy, the derived controller in Eq. 2-40 minimizes the cost functional given by Eq. 2-15 and is subject to the second player's control input in Eq. 2-41 that minimizes the cost functional given by Eq. 2-16. To demonstrate optimality of the proposed controller, Hamiltonians were constructed in Eqs. 2-17 and 2-18 and the optimal control problem was formulated. The costate variables in Eqs. 2-25 and 2-26 were assumed to be solutions of Eqs. 2-19 and 2-20 and gain constraints were developed. If all constraints in Eqs. 2-42 through 2-45 are satisfied, then the assumed solutions in Eqs. 2-25 and 2-26 satisfy Eqs. 2-18 through 2-20, and hence, are (sub)optimal.

Existence and Uniqueness of Nash Equilibrium Solution. Theorem 2-2 exploits an existence and uniqueness proof for the feedback Nash equilibrium solution that is well known in the literature; however, this proof demonstrates existence and uniqueness for a non-stationary Nash feedback. If the mass inertia matrix $M(q)$ is constant, then the prior analysis demonstrates one possible analytic solution for the DREs that would yield a stationary feedback strategy for the players $u_{N1}$ and $u_{N2}$; therefore, existence and uniqueness of the Nash solution need to be further investigated. For the game with an infinite-planning horizon, Proposition 3.6 in [62] gives sufficient conditions for the existence of a Nash equilibrium. The following definitions and theorems summarize the results.

Definition 2.1. Consider the system
\[ \dot{z} = Az + Bu, \quad y = Cz + Du. \]
The system is called output stabilizable if there exists a state feedback $u = Fz$ such that the corresponding output $y_F = (C + DF)z$ converges to zero as $t \to \infty$.

Theorem 2.3. Suppose that $(C_i, D_i)$ are such that $\left( C_1^T C_1, C_2^T C_2 \right) = (Q_N, L_N)$, $C_1^T D_1 = C_2^T D_2 = 0$, and $\left( D_1^T D_1, D_2^T D_2 \right) = (R_{N11}, R_{N22})$. If there exists a pair of solutions $(K_N, P_N)$
that satisfy the coupled algebraic Riccati equations
\[ 0 = K_N A + A^T K_N - K_N B_1 R_{N11}^{-1} B_1^T K_N - K_N B_2 R_{N22}^{-1} B_2^T P_N + Q_N^T - P_N B_2 R_{N22}^{-1} B_2^T K_N \tag{2-46} \]
\[ 0 = P_N A + A^T P_N - P_N B_1 R_{N11}^{-1} B_1^T K_N - P_N B_2 R_{N22}^{-1} B_2^T P_N + L_N^T - K_N B_1 R_{N11}^{-1} B_1^T P_N \tag{2-47} \]
such that $K_N$ is the smallest real positive semi-definite solution of Eq. 2-46 for a given $P_N$ and $P_N$ is the smallest real positive semi-definite solution of Eq. 2-47 for a given $K_N$, and if $(K_N, P_N)$ are such that the systems $\left( A - B_2 R_{N22}^{-1} B_2^T P_N, B_1, C_1, D_1 \right)$ and $\left( A - B_1 R_{N11}^{-1} B_1^T K_N, B_2, C_2, D_2 \right)$ are both output stabilizable, then the strategies $(\varphi_1^*, \varphi_2^*)$ given by
\[ u_{1N}^* = -R_{N11}^{-1} B_1^T K_N z, \quad u_{2N}^* = -R_{N22}^{-1} B_2^T P_N z \]
constitute a feedback Nash equilibrium in a linear stationary strategy.

Proof. Similar to the proof of Proposition 3.6 in [62].

Remark 2.3. Theorem 2-3 requires the smallest real positive semi-definite solutions of the coupled Riccati equations; however, the theorem does not imply uniqueness of the equilibrium. Furthermore, [62] uses a scalar example to illustrate the possible non-uniqueness of the linear stationary feedback Nash equilibria. A proof for the uniqueness of a linear stationary feedback Nash equilibrium remains an open problem.

2.4 RISE Feedback Control Development

In general, the bounded disturbance $d(t)$ and the nonlinear dynamics given in Eq. 2-10 are unknown, so the controller given in Eq. 2-11 cannot be implemented. However, if the control input contains a method to identify and cancel these effects, then $z(t)$ will
converge to the state space model in Eq. 2-13 so that $u_1(t)$ and $u_2(t)$ each minimize a performance index. In this section, a control input is developed that exploits RISE feedback to identify the nonlinear effects and bounded disturbances to enable the system to asymptotically converge to the state space model for $z(t)$.

To develop the control input, the error system in Eq. 2-8 is pre-multiplied by $M(q)$ and the expressions in Eqs. 2-1, 2-6, and 2-7 are used to obtain
\[ Mr = -V_m e_2 + h + d + \alpha_2 M e_2 - (\tau_1 + \tau_2). \tag{2-48} \]
Based on the open-loop error system in Eq. 2-48, the control inputs are composed of the optimal controllers developed in Eqs. 2-40 and 2-41, plus a subsequently designed auxiliary control term $\mu(t) \in \mathbb{R}^{n}$ as
\[ \tau_1 + \tau_2 \triangleq \mu - (u_{N1} + u_{N2}). \tag{2-49} \]
The closed-loop tracking error system can be developed by substituting Eq. 2-49 into Eq. 2-48 as
\[ Mr = -V_m e_2 + h + d + \alpha_2 M e_2 + (u_{N1} + u_{N2}) - \mu. \tag{2-50} \]
To facilitate the subsequent stability analysis, the auxiliary function $f_d(q_d, \dot{q}_d, \ddot{q}_d) \in \mathbb{R}^{n}$, which is defined as
\[ f_d \triangleq M(q_d)\ddot{q}_d + V_m(q_d, \dot{q}_d)\dot{q}_d + G(q_d) + F(\dot{q}_d), \tag{2-51} \]
is added to and subtracted from Eq. 2-50 to yield
\[ Mr = -V_m e_2 + \tilde{h} + f_d + d + u_{N1} + u_{N2} + \alpha_2 M e_2 - \mu \tag{2-52} \]
where $\tilde{h}(t) \in \mathbb{R}^{n}$ is defined as
\[ \tilde{h} \triangleq h - f_d. \tag{2-53} \]

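Substituting Eqs. 2-40 and 2-41 into Eq. 2-52, taking a time derivative, and using the relationship in Eq. 2-8 yields
\[ M\dot{r} = -\frac{1}{2}\dot{M}r + \tilde{N} + N_D - e_2 - \left( R_{N11}^{-1} + R_{N22}^{-1} \right) r - \dot{\mu} \tag{2-54} \]
after strategically grouping specific terms. In Eq. 2-54, the unmeasurable auxiliary terms $\tilde{N}(q, \dot{q}, \ddot{q}, e_1, e_2, r), N_D(q_d, \dot{q}_d, \ddot{q}_d, \dddot{q}_d) \in \mathbb{R}^{n}$ are defined as
\[ \tilde{N} \triangleq -\dot{V}_m e_2 - V_m \dot{e}_2 - \frac{1}{2}\dot{M}r + \dot{\tilde{h}} + \alpha_2 \dot{M} e_2 + \alpha_2 M \dot{e}_2 + e_2 + \left( R_{N11}^{-1} + R_{N22}^{-1} \right)\alpha_2 e_2 \]
\[ N_D \triangleq \dot{f}_d + \dot{d}. \]
Motivation for grouping terms into $\tilde{N}(\cdot)$ and $N_D(\cdot)$ comes from the subsequent stability analysis and the fact that the Mean Value Theorem, Assumption 2-3, Assumption 2-4, and Assumption 2-5 can be used to upper bound the auxiliary terms as
\[ \left\| \tilde{N}(t) \right\| \le \rho(\|y\|)\|y\| \tag{2-55} \]
\[ \|N_D\| \le \zeta_1, \quad \left\| \dot{N}_D \right\| \le \zeta_2 \tag{2-56} \]
where $y(e_1, e_2, r) \in \mathbb{R}^{3n}$ is defined as
\[ y \triangleq \begin{bmatrix} e_1^T & e_2^T & r^T \end{bmatrix}^T, \tag{2-57} \]
the bounding function $\rho(\|y\|) \in \mathbb{R}$ is a positive globally invertible nondecreasing function, and $\zeta_i \in \mathbb{R}$ $(i = 1, 2)$ denote known positive constants. Based on Eq. 2-54, the control term $\mu(t)$ is designed as the generalized solution to
\[ \dot{\mu}(t) \triangleq k_s r(t) + \beta_1 \mathrm{sgn}(e_2) \tag{2-58} \]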
where $k_s, \beta_1 \in \mathbb{R}$ are positive constant control gains. The closed-loop error system for $r(q, \dot{q}, \ddot{q}, t)$ can now be obtained by substituting Eq. 2-58 into Eq. 2-54 as
\[ M\dot{r} = -\frac{1}{2}\dot{M}r + \tilde{N} + N_D - e_2 - \left( R_{N11}^{-1} + R_{N22}^{-1} \right) r - k_s r - \beta_1 \mathrm{sgn}(e_2). \tag{2-59} \]

2.5 Stability Analysis

Lemma 2.1. Let $O(e_2, r, t) \in \mathbb{R}$ denote the generalized solution to
\[ \dot{O}(t) \triangleq -r^T \left( N_D(t) - \beta_1 \mathrm{sgn}(e_2) \right), \quad O(0) = \beta_1 \sum_{i=1}^{n} |e_{2i}(0)| - e_2(0)^T N_D(0) \tag{2-60} \]
where $e_{2i}(0)$ denotes the $i$th element of the vector $e_2(0)$. Provided that $\beta_1$ is selected according to the sufficient condition
\[ \beta_1 > \zeta_1 + \frac{1}{\lambda_{\min}(\alpha_2)} \zeta_2 \tag{2-61} \]
where $\zeta_1$ and $\zeta_2$ are known positive constants defined in Eq. 2-56, then $O(e_2, r, t) \ge 0$.

Proof. Refer to [111].

Theorem 2.4. The controller given by Eqs. 2-40, 2-41, and 2-49 ensures that all system signals are bounded under closed-loop operation, and the tracking errors are semi-globally asymptotically regulated in the sense that
\[ \|e_1(t)\|, \|e_2(t)\|, \|r(t)\| \to 0 \quad \text{as} \quad t \to \infty, \tag{2-62} \]
provided the control gain $k_s$ in Eq. 2-58 is selected sufficiently large based on the initial conditions of the system, $\beta_1$ in Eq. 2-58 is selected sufficiently large, and $\alpha_1, \alpha_2$ are selected according to the sufficient conditions
\[ \lambda_{\min}(\alpha_1) > \frac{1}{2}, \quad \lambda_{\min}(\alpha_2) > 1. \tag{2-63} \]
Furthermore, $u_{N1}(t)$ and $u_{N2}(t)$ minimize Eqs. 2-15 and 2-16 subject to Eq. 2-13, provided the gain constraints given in Eqs. 2-42 through 2-45 are satisfied.
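Although Eq. 2-58 is written in terms of the unmeasurable signal $r(t)$, integrating it with $r = \dot{e}_2 + \alpha_2 e_2$ from Eq. 2-8 gives a form that uses only $e_2$: $\mu(t) = k_s\left(e_2(t) - e_2(0)\right) + \int_0^t \left[ k_s \alpha_2 e_2(\sigma) + \beta_1 \mathrm{sgn}(e_2(\sigma)) \right] d\sigma$, assuming $\mu(0) = 0$. The sketch below implements that form with a forward-Euler integral. The class name `RiseTerm`, the zero initial condition, the integration scheme and all gain values are assumptions made for illustration, not details taken from the dissertation.

```python
import numpy as np

class RiseTerm:
    """Discrete-time sketch of the RISE term mu(t) implied by Eqs. 2-8 and 2-58."""

    def __init__(self, ks, beta1, alpha2, e2_0):
        self.ks = ks
        self.beta1 = beta1
        self.alpha2 = np.asarray(alpha2, dtype=float)
        self.e2_0 = np.asarray(e2_0, dtype=float)
        self.integral = np.zeros_like(self.e2_0)  # running integral, mu(0) = 0 assumed

    def update(self, e2, dt):
        e2 = np.asarray(e2, dtype=float)
        # Forward-Euler step of the integrand ks*alpha2*e2 + beta1*sgn(e2).
        # np.sign(0) = 0, which lies inside the set-valued SGN(0) = [-1, 1].
        self.integral += dt * (self.ks * self.alpha2 @ e2 + self.beta1 * np.sign(e2))
        return self.ks * (e2 - self.e2_0) + self.integral

# Illustrative gains and error samples (assumed values).
rise = RiseTerm(ks=2.0, beta1=1.0, alpha2=np.eye(2), e2_0=[1.0, -1.0])
mu_first = rise.update([1.0, -1.0], dt=0.1)
mu_second = rise.update([0.5, 0.0], dt=0.1)
```

In an implementation the step size would need to be small relative to the closed-loop dynamics, because the signum term switches whenever a component of $e_2$ crosses zero.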

Remark 2.4. The control gain $\alpha_1$ cannot be arbitrarily selected; rather, it is calculated using a Lyapunov equation solver. Its value is determined based on the values of $Q_N$, $L_N$, $R_{N11}$, $R_{N21}$, and $R_{N22}$. Therefore $Q_N$, $L_N$, $R_{N11}$, $R_{N21}$, and $R_{N22}$ must be chosen such that Eq. 2-63 is satisfied.

Proof. Let $\mathcal{D} \subset \mathbb{R}^{3n+1}$ be a domain containing $\psi(t) = 0$, where $\psi(t) \in \mathbb{R}^{3n+1}$ is defined as
\[ \psi(t) \triangleq \begin{bmatrix} y^T(t) & \sqrt{O(t)} \end{bmatrix}^T. \tag{2-64} \]
Let $V_L(\psi, t): \mathcal{D} \times (0, \infty) \to \mathbb{R}$ be a Lipschitz continuous regular positive definite function defined as
\[ V_L(\psi, t) \triangleq e_1^T e_1 + \frac{1}{2} e_2^T e_2 + \frac{1}{2} r^T M(q) r + O, \tag{2-65} \]
which satisfies the following inequalities:
\[ U_1(\psi) \le V_L(\psi, t) \le U_2(\psi) \tag{2-66} \]
provided the sufficient conditions introduced in Eqs. 2-61 and 2-63 are satisfied. In Eq. 2-66, the continuous positive definite functions $U_1(\psi)$ and $U_2(\psi) \in \mathbb{R}$ are defined as $U_1(\psi) \triangleq \lambda_1 \|\psi\|^2$ and $U_2(\psi) \triangleq \lambda_2(q) \|\psi\|^2$, where $\lambda_1, \lambda_2(q) \in \mathbb{R}$ are defined as
\[ \lambda_1 \triangleq \frac{1}{2}\min\{1, m_1\}, \quad \lambda_2(q) \triangleq \max\left\{ \frac{1}{2}\bar{m}(q), 1 \right\} \]
where $m_1$, $\bar{m}(q)$ are introduced in Eq. 2-2. After taking the time derivative of Eq. 2-65, $\dot{V}_L(\psi, t)$ can be expressed as
\[ \dot{V}_L(\psi, t) = 2e_1^T \dot{e}_1 + e_2^T \dot{e}_2 + \frac{1}{2} r^T \dot{M}(q) r + r^T M(q) \dot{r} + \dot{O}. \]
From Eqs. 2-7, 2-59, 2-60, and 2-65, some of the differential equations describing the closed-loop system for which the stability analysis is being performed have discontinuous
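right-hand sides:
\[ \dot{e}_1 = e_2 - \alpha_1 e_1, \quad \dot{e}_2 = r - \alpha_2 e_2 \tag{2-67} \]
\[ M\dot{r} = -\frac{1}{2}\dot{M}(q) r + \tilde{N} + N_D - \left( R_{N11}^{-1} + R_{N22}^{-1} \right) r - k_s r - \beta_1 \mathrm{sgn}(e_2) - e_2 \tag{2-68} \]
\[ \dot{O}(t) = -r^T \left( N_D - \beta_1 \mathrm{sgn}(e_2) \right). \tag{2-69} \]
Let $f(\psi, t) \in \mathbb{R}^{3n+1}$ denote the right-hand side of Eqs. 2-67 through 2-69. As described in [116 118], the existence of Filippov's generalized solution can be established for Eqs. 2-67 through 2-69. Note that $f(\psi, t)$ is continuous except in the set $\{(\psi, t) \mid e_2 = 0\}$. From [116 118], an absolutely continuous Filippov solution $\psi(t)$ exists almost everywhere (a.e.) so that
\[ \dot{\psi} \in K[f](\psi, t) \quad \text{a.e.} \]
Except for the points on the discontinuous surface $\{(\psi, t) \mid e_2 = 0\}$, the Filippov set-valued map includes unique solutions. Under Filippov's framework, a generalized Lyapunov stability theory can be used (see [119 121] for further details) to establish strong stability of the closed-loop system. The generalized time derivative of Eq. 2-65 exists a.e., and $\dot{V}_L(\psi, t) \overset{a.e.}{\in} \dot{\tilde{V}}_L(\psi, t)$, where
\[ \dot{\tilde{V}}_L = \bigcap_{\xi \in \partial V_L(\psi, t)} \xi^T K \begin{bmatrix} \dot{e}_1^T & \dot{e}_2^T & \dot{r}^T & \frac{1}{2} O^{-\frac{1}{2}} \dot{O} & 1 \end{bmatrix}^T \]
where $\partial V_L(\psi, t)$ is the generalized gradient of $V_L(\psi, t)$ [119], and $K[\cdot]$ is defined as [120 121]
\[ K[f](\psi) \triangleq \bigcap_{\delta > 0} \; \bigcap_{\mu N = 0} \overline{\mathrm{co}}\, f\left( B(\psi, \delta) - N \right) \]
where $\bigcap_{\mu N = 0}$ denotes the intersection over all sets $N$ of Lebesgue measure zero, $\overline{\mathrm{co}}$ denotes convex closure, and $B(\psi, \delta)$ represents a ball of radius $\delta$ around $\psi$. Since $V_L(\psi, t)$ is a Lipschitz continuous regular function,
\[ \dot{\tilde{V}}_L = \nabla V_L^T K \begin{bmatrix} \dot{e}_1^T & \dot{e}_2^T & \dot{r}^T & \frac{1}{2} O^{-\frac{1}{2}} \dot{O} & 1 \end{bmatrix}^T = \begin{bmatrix} 2e_1^T & e_2^T & r^T M & 2O^{\frac{1}{2}} & \frac{1}{2} r^T \dot{M} r \end{bmatrix} K \begin{bmatrix} \dot{e}_1^T & \dot{e}_2^T & \dot{r}^T & \frac{1}{2} O^{-\frac{1}{2}} \dot{O} & 1 \end{bmatrix}^T. \]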

Usingcalculusfor K [ ] from[ 121 ]andsubstitutingthedynamicsfromEqs. 27 258 259 ,and 269 yields V L % r T N ( k s + % min R % 1 11 + R % 1 22 ( ) ( r ( 2 $ 2 ( e 2 ( 2 2 $ 1 ( e 1 ( 2 +2 e T 2 e 1 wherethefactthat ( r T ( t ) r T ( t )) i SGN ( e 2 i )=0 isused(thesubscript i denotesthe i th element),and K [ sgn ( e 2 )]= SGN ( e 2 ) [ 121 ]suchthat SGN ( e 2 i )=1 if e 2 i ( t ) > 0 SGN ( e 2 i )=[1 1] if e 2 i ( t )=0 SGN ( e 2 i )= 1 if e 2 i ( t ) < 0 Basedonthefactthat 2 e T 2 ( t ) e 1 ( t ) %( e 1 ( t ) ( 2 + ( e 2 ( t ) ( 2 theexpressioninEq. 255 canbeusedtoupperbound V L ( t ) usingthesquaresofthecomponentsof z ( t ) as V L %' % 3 ( y ( 2 1 k s ( r ( 2 ' ( ( y ( ) ( r (( y ( 2 (270) where % 3 min { 2 $ 1 1 $ 2 1 % min R % 1 N 11 + R % 1 N 22 ( } ; hence, $ 1 ,and $ 2 mustbechosenaccordingtothesu"cientconditioninEq. 263 AftercompletingthesquaresforthetermsinsidethebracketsinEq. 270 ,thefollowing expressionisobtained: V L %' % 3 ( y ( 2 + 2 ( ( y ( ) ( y ( 2 4 k s (271) TheexpressioninEq. 271 canbefurtherupperboundedas V L ( ,t ) %' U ( ")= c ( y ( 2 ) " D (272) forsomepositiveconstant c ,where D < " R 3 n +1 | ( (% % 1 % 2 6 % 3 k s &= where k s isselectedas k s > 0 Largervaluesof k s willexpandthesizeofthedomain D TheresultinEq. 272 indicatesthat V L ( ,t ) %' U ( ") ) V L ( ,t ) V L ( ,t ) .The inequalityinEq. 272 canbeusedtoshowthat V L ( ,t ) L in D ;hence, e 1 ( t ) e 2 ( t ) and r ( t ) L in D .Giventhat e 1 ( t ) e 2 ( t ) ,and r ( t ) L in D ,standardanalysis 43


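methods can be used to prove that the control input and all closed-loop signals are bounded, and that $U(y)$ is uniformly continuous in $\mathcal{D}$. Let $\mathcal{S} \subset \mathcal{D}$ denote the set defined as

$$\mathcal{S} \triangleq \left\{ y(t) \in \mathcal{D} \;\middle|\; U_2(y(t)) < \lambda_1 \left(\rho^{-1}\left(2\sqrt{\lambda_3 k_s}\right)\right)^2 \right\}.$$

The region of attraction can be made arbitrarily large to include any initial conditions by increasing the control gain $k_s$ (i.e., a semi-global type of stability result), and hence

$$c\|y(t)\|^2 \to 0 \quad \text{and} \quad \|e_1(t)\| \to 0 \quad \text{as } t \to \infty \quad \forall y(0) \in \mathcal{S}.$$

Since $u_{N1}(t), u_{N2}(t) \to 0$ as $e_2(t) \to 0$ (by Eq. 2-40), Eq. 2-52 can be used to conclude that

$$\mu \to \tilde h + f_d + \tau_d \quad \text{as } r(t), e_2(t) \to 0. \tag{2-73}$$

The result in Eq. 2-73 indicates that the dynamics in Eq. 2-1 converge to the state-space system in Eq. 2-13. Hence, $u_{N1}(t), u_{N2}(t)$ converge to optimal controllers that minimize Eqs. 2-15 and 2-16, respectively, subject to Eq. 2-13 in the presence of structured disturbances, provided the gain constraints given in Eqs. 2-42 through 2-45 are satisfied.

2.6 Simulation

To examine the performance of the Nash-derived controller developed in Eqs. 2-40, 2-41, and 2-49, a numerical simulation was performed. To illustrate the utility of the technique, a two-link model described by the following Euler-Lagrange dynamics is considered:

$$\begin{bmatrix} \tau_1 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ \tau_2 \end{bmatrix} = \begin{bmatrix} p_1 + 2 p_3 c_2 & p_2 + p_3 c_2 \\ p_2 + p_3 c_2 & p_2 \end{bmatrix} \begin{bmatrix} \ddot q_1 \\ \ddot q_2 \end{bmatrix} + \begin{bmatrix} -p_3 s_2 \dot q_2 & -p_3 s_2 (\dot q_1 + \dot q_2) \\ p_3 s_2 \dot q_1 & 0 \end{bmatrix} \begin{bmatrix} \dot q_1 \\ \dot q_2 \end{bmatrix} + \begin{bmatrix} f_{d1} & 0 \\ 0 & f_{d2} \end{bmatrix} \begin{bmatrix} \dot q_1 \\ \dot q_2 \end{bmatrix} + \begin{bmatrix} \tau_{d1} \\ \tau_{d2} \end{bmatrix}, \tag{2-74}$$

where $p_1 = 3.473$ kg m$^2$, $p_2 = 0.196$ kg m$^2$, $p_3 = 0.242$ kg m$^2$, $f_{d1} = 5.3$ Nm sec, $f_{d2} = 1.1$ Nm sec, $c_2$ denotes $\cos(q_2)$, $s_2$ denotes $\sin(q_2)$, and $\tau_{d1}, \tau_{d2}$ denote bounded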


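disturbances defined as

$$\tau_{d1} = 10 \sin(t) + 3.5 \cos(3t), \qquad \tau_{d2} = 2.5 \sin(2t) + 1.5 \cos(t).$$

In the Nash strategy, player 1 is defined by $\tau_{N1} = \tau_1$ and player 2 is defined as $\tau_{N2} = \tau_2$. The objective of both players is to track a desired trajectory given as

$$q_{d1} = q_{d2} = 60 \sin(2t)\left(1 - \exp\left(-0.01 t^3\right)\right)$$

and the initial conditions were selected as

$$q_1(0) = q_2(0) = 14.3 \text{ deg}, \qquad \dot q_1(0) = \dot q_2(0) = 28.6 \text{ deg/sec}.$$

The weighting matrices for both controllers were chosen as

$$Q_{N11} = \mathrm{diag}\{5, 5\}, \quad Q_{N12} = \mathrm{diag}\{5, 5\}, \quad Q_{N22} = \mathrm{diag}\{5, 5\},$$
$$L_{N12} = \mathrm{diag}\{5, 5\}, \quad L_{N22} = \mathrm{diag}\{10, 10\}, \quad L_{N11} = \mathrm{diag}\{5, 5\},$$

which, using the Nash constraints given in Eqs. 2-42 through 2-45, yield the Nash gains $R_{N22}$, $R_{N11}$, $R_{N21}$, and $R_{N12}$ as

$$R_{N22} = \mathrm{diag}\left\{\tfrac{1}{5}, \tfrac{1}{5}\right\}, \quad R_{N11} = \mathrm{diag}\left\{\tfrac{1}{10}, \tfrac{1}{10}\right\}, \quad R_{N12} = \mathrm{diag}\left\{\tfrac{1}{25}, \tfrac{1}{25}\right\}, \quad R_{N21} = \mathrm{diag}\left\{\tfrac{1}{100}, \tfrac{1}{100}\right\}.$$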
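The simulation model above can be written down in a few lines. The following is a minimal sketch, not the code used to produce the reported results: the function names are hypothetical, and the minus signs in the centripetal-Coriolis matrix are an assumption (the usual two-link form, chosen so that $\dot M - 2V_m$ is skew-symmetric). The desired trajectory is returned in degrees, as quoted in the text; a simulation would need consistent angular units.

```python
import numpy as np

# Parameters quoted in the text (kg m^2 and N m sec).
P1, P2, P3 = 3.473, 0.196, 0.242
FD1, FD2 = 5.3, 1.1


def inertia(q):
    """Inertia matrix M(q) of the two-link model."""
    c2 = np.cos(q[1])
    return np.array([[P1 + 2 * P3 * c2, P2 + P3 * c2],
                     [P2 + P3 * c2, P2]])


def coriolis(q, qdot):
    """Centripetal-Coriolis matrix V_m(q, qdot); signs are an assumption."""
    s2 = np.sin(q[1])
    return np.array([[-P3 * s2 * qdot[1], -P3 * s2 * (qdot[0] + qdot[1])],
                     [P3 * s2 * qdot[0], 0.0]])


def friction(qdot):
    """Viscous friction diag(f_d1, f_d2) * qdot."""
    return np.array([FD1 * qdot[0], FD2 * qdot[1]])


def disturbance(t):
    """Bounded disturbances tau_d1(t), tau_d2(t)."""
    return np.array([10 * np.sin(t) + 3.5 * np.cos(3 * t),
                     2.5 * np.sin(2 * t) + 1.5 * np.cos(t)])


def desired_trajectory(t):
    """Desired joint angles in degrees (identical for both joints)."""
    qd = 60 * np.sin(2 * t) * (1 - np.exp(-0.01 * t ** 3))
    return np.array([qd, qd])


def acceleration(t, q, qdot, tau):
    """Solve M(q) qddot = tau - V_m qdot - F qdot - tau_d for qddot."""
    rhs = tau - coriolis(q, qdot) @ qdot - friction(qdot) - disturbance(t)
    return np.linalg.solve(inertia(q), rhs)
```

Here `tau` is the stacked input $[\tau_1, \tau_2]^T$, i.e., the sum of the two players' torque vectors on the left-hand side of the model.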


[Figure 2-1. The simulated tracking errors for the RISE and Nash optimal controller. Axes: time [sec] vs. tracking error [deg]; curves: Player 1, Player 2.]

The control gains for the RISE control element were selected as

$$\alpha_1 = \mathrm{diag}\{5, 5\}, \quad \alpha_2 = \mathrm{diag}\{15, 3.5\}, \quad \beta_1 = \mathrm{diag}\{15, 10\}, \quad k_s = \mathrm{diag}\{65, 25\}.$$

The tracking errors and the control inputs for the RISE and optimal controller are shown in Figures 2-1 and 2-2, respectively. To show that the RISE feedback identifies the nonlinear effects and bounded disturbances, a plot of the difference is shown in Figure 2-3. As this difference goes to zero, the dynamics in Eq. 2-1 converge to the state-space system in Eq. 2-13, and the controller solves the two player differential game. In addition, Figure 2-4 shows the convergence of the cost functionals for each player.

[Figure 2-2. The simulated torques for the RISE and Nash optimal controller. Axes: time [sec] vs. torque [N m]; curves: Player 1, Player 2.]

[Figure 2-3. The difference between the RISE feedback and the nonlinear effect and bounded disturbances. Two panels, Player 1 [N m] and Player 2 [N m], vs. time [sec].]

[Figure 2-4. Cost functionals for $u_1$ and $u_2$. Axes: time [sec] vs. cost functional; curves: Player 1, Player 2.]

2.7 Summary

In this chapter, a novel approach for the design of a differential game-based feedback controller is developed for an Euler-Lagrange system subject to uncertainties and bounded disturbances. An optimal game-derived feedback component was used in conjunction with a RISE feedback component, which enables the generalized coordinates of the system to semi-globally asymptotically track a desired time-varying trajectory despite uncertainty in the dynamics. Using a Lyapunov stability analysis and a feedback Nash game development, sufficient gain conditions were derived to ensure asymptotic stability while minimizing a cost function for the developed controllers.


CHAPTER 3
ASYMPTOTIC STACKELBERG OPTIMAL CONTROL DESIGN FOR AN UNCERTAIN EULER-LAGRANGE SYSTEM

Game theory establishes an optimal strategy for multiple players in either a cooperative or noncooperative manner, where the objective is to reach an equilibrium state among the players. A Stackelberg game strategy involves a leader and a follower in a hierarchical relationship in which the leader enforces its strategy on the follower. In this chapter, a general framework is developed for feedback control of an Euler-Lagrange system using an open-loop nonzero-sum Stackelberg differential game. A Robust Integral Sign of the Error (RISE) controller is used to cancel uncertain nonlinearities in the system, and an open-loop Stackelberg game method is then applied to the residual uncertain nonlinear system to minimize cost functionals for each player. A Lyapunov analysis and simulation are provided to examine the stability and performance of the developed controllers.

3.1 Dynamic Model and Properties

This chapter considers the same dynamics presented in Chapter 2. In this formulation, player 1 is denoted as the follower, $\tau_1 = \tau_F$, and player 2 is denoted as the leader, $\tau_2 = \tau_L$, given as

$$M(q)\ddot q + V_m(q, \dot q)\dot q + G(q) + F(\dot q) + \tau_d(t) = \tau_F(t) + \tau_L(t) \tag{3-1}$$

where the same assumptions from Chapter 2 hold.

3.2 Error System Development

The control objective is to ensure that the generalized coordinates track a desired time-varying trajectory, denoted by $q_d(t) \in \mathbb{R}^n$, despite uncertainties in the dynamic model, while minimizing a given performance index. To quantify the tracking objective, a position tracking error, denoted by $e_1(q,t) \in \mathbb{R}^n$, is defined as

$$e_1 \triangleq q_d - q. \tag{3-2}$$


To facilitate the subsequent analysis, filtered tracking errors, denoted by $e_2(q, \dot q, t),\, r(q, \dot q, \ddot q, t) \in \mathbb{R}^n$, are also defined as

$$e_2 \triangleq \dot e_1 + \alpha_1 e_1 \tag{3-3}$$

$$r \triangleq \dot e_2 + \alpha_2 e_2 \tag{3-4}$$

where $\alpha_1, \alpha_2 \in \mathbb{R}^{n \times n}$ are positive definite, constant, diagonal gain matrices. The filtered tracking error $r(q, \dot q, \ddot q, t)$ is not measurable since the expression in Eq. 3-4 depends on $\ddot q(t)$. The error systems are based on the assumption that the generalized coordinates of the Euler-Lagrange dynamics allow additive and not multiplicative errors. To develop a state-space model for the tracking errors in Eqs. 3-2 and 3-3, the time derivative of Eq. 3-3 is premultiplied by the inertia matrix, and substitutions are made from Eqs. 3-1 and 3-2 to obtain

$$M \dot e_2 = -V_m e_2 + h + \tau_d - (\tau_L + \tau_F) \tag{3-5}$$

where the nonlinear function $h(q, \dot q, t) \in \mathbb{R}^n$ is defined as

$$h \triangleq M(\ddot q_d + \alpha_1 \dot e_1) + V_m(\dot q_d + \alpha_1 e_1) + G + F. \tag{3-6}$$

Under the (temporary) assumption that the dynamics in Eq. 3-1 are known, the control inputs can be designed as

$$\tau_L + \tau_F \triangleq h + \tau_d - (u_L + u_F) \tag{3-7}$$

where $u_F(t), u_L(t) \in \mathbb{R}^n$ are auxiliary control inputs for the follower and leader, respectively, that will be designed to minimize desired performance indices. Substituting Eq. 3-7 into Eq. 3-5 yields

$$M \dot e_2 = -V_m e_2 + u_L + u_F. \tag{3-8}$$

A state-space model for Eqs. 3-3 and 3-8 can now be developed as

$$\dot z = A(q, \dot q) z + B_1(q) u_F + B_2(q) u_L \tag{3-9}$$


where $A(q, \dot q) \in \mathbb{R}^{2n \times 2n}$, $B_1(q), B_2(q) \in \mathbb{R}^{2n \times n}$, and $z(e_1, e_2) \in \mathbb{R}^{2n}$ are defined in Chapter 2. The state-space model in Eq. 3-9 is developed under the (temporary) assumption of exact knowledge of the dynamics. In the next section, cost functionals and controllers are developed for the residual uncertain nonlinear system in Eq. 3-9. The Stackelberg game-based controllers are then incorporated with the RISE control method, which asymptotically reduces the original uncertain dynamics to the dynamics in Eq. 3-9.

3.3 Two Player Open-Loop Stackelberg Nonzero-Sum Differential Game

Stackelberg differential games provide a framework for systems that operate on different levels with a prescribed hierarchy of decisions. The two-player game is cast in two solution spaces: the leader and the follower. The follower tries to minimize its cost functional based on the decision from the leader, while the leader, who has insight into the follower's rationale, will define an input such that the leader's and the follower's inputs yield minimal cost functionals. The Stackelberg differential game for the system given in Eq. 3-9 can be formulated in an optimal control framework where the leader's input is $u_L(z)$ and the follower's input is $u_F(z)$. Each player in Eq. 3-9 has a cost functional $J_F(z, u_F, u_L), J_L(z, u_F, u_L) \in \mathbb{R}$ defined as

$$J_F = \frac{1}{2} \int_{t_0}^{\infty} \left( z^T Q z + u_F^T R_{11} u_F + u_L^T R_{12} u_L \right) dt \tag{3-10}$$

$$J_L = \frac{1}{2} \int_{t_0}^{\infty} \left( z^T N z + u_F^T R_{21} u_F + u_L^T R_{22} u_L \right) dt, \tag{3-11}$$

where $t_0 \in \mathbb{R}$ is the initial time, and $Q, N \in \mathbb{R}^{2n \times 2n}$ are symmetric constant matrices defined as

$$Q = \begin{bmatrix} Q_{11} & Q_{12} \\ Q_{12}^T & Q_{22} \end{bmatrix}, \qquad N = \begin{bmatrix} N_{11} & N_{12} \\ N_{12}^T & N_{22} \end{bmatrix},$$

where $Q_{ij}, N_{ij}, R_{ij} \in \mathbb{R}^{n \times n}$ are symmetric constant matrices for $i, j = 1, 2$. Based on the minimum principle [114], the Hamiltonian $H_F(z, u_F, u_L) \in \mathbb{R}$ of the follower is defined as

$$H_F = \frac{1}{2}\left( z^T Q z + u_F^T R_{11} u_F + u_L^T R_{12} u_L \right) + \lambda_F^T \left( A z + B_1 u_F + B_2 u_L \right) \tag{3-12}$$


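where the optimal controller and costate equation of the follower are [114]

$$0 = \left( \frac{\partial H_F}{\partial u_F} \right)^T \;\Rightarrow\; u_F = -R_{11}^{-1} B_1^T \lambda_F \tag{3-13}$$

$$\dot\lambda_F = -\left( \frac{\partial H_F}{\partial z} \right)^T = -A^T \lambda_F - Q^T z. \tag{3-14}$$

Using Eq. 3-13, the leader's Hamiltonian $H_L(z, u_F, u_L) \in \mathbb{R}$ is defined as

$$H_L = \frac{1}{2}\left( z^T N z + \lambda_F^T B_1 R_{11}^{-T} R_{21} R_{11}^{-1} B_1^T \lambda_F + u_L^T R_{22} u_L \right) + \lambda_L^T \left( A z - B_1 R_{11}^{-1} B_1^T \lambda_F + B_2 u_L \right) + \psi^T \dot\lambda_F \tag{3-15}$$

where the optimal controller and costate equations are defined as

$$0 = \left( \frac{\partial H_L}{\partial u_L} \right)^T \;\Rightarrow\; u_L = -R_{22}^{-1} B_2^T \lambda_L \tag{3-16}$$

$$\dot\lambda_L = -\left( \frac{\partial H_L}{\partial z} \right)^T = -N^T z - A^T \lambda_L + Q \psi \tag{3-17}$$

$$\dot\psi = -\left( \frac{\partial H_L}{\partial \lambda_F} \right)^T = -B_1 R_{11}^{-T} R_{21} R_{11}^{-1} B_1^T \lambda_F + B_1 R_{11}^{-T} B_1^T \lambda_L + A \psi. \tag{3-18}$$

The expressions derived in Eqs. 3-10 through 3-18 define the optimal control problem. The subsequent analysis aims at developing an expression for the costate variables $(\lambda_F(t), \lambda_L(t), \psi(t))$ which satisfy the costate equations for $(\dot\lambda_F(t), \dot\lambda_L(t), \dot\psi(t))$ and can be implemented by the controllers $u_F(t)$ and $u_L(t)$. To this end, the subsequent development is based on the following assumed solutions for the costate variables

$$\lambda_F = K z \tag{3-19}$$

$$\lambda_L = P z \tag{3-20}$$

$$\psi = S z, \tag{3-21}$$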


where $K(t), P(t), S(t) \in \mathbb{R}^{2n \times 2n}$ are time-varying positive definite block diagonal matrices defined as

$$K = \begin{bmatrix} K_{11} & 0_{n \times n} \\ 0_{n \times n} & K_{22} \end{bmatrix}, \qquad P = \begin{bmatrix} P_{11} & 0_{n \times n} \\ 0_{n \times n} & P_{22} \end{bmatrix}, \qquad S = \begin{bmatrix} S_{11} & 0_{n \times n} \\ 0_{n \times n} & S_{22} \end{bmatrix}.$$

Given these assumed solutions, conditions/constraints are then developed to ensure these solutions satisfy Eqs. 3-12 through 3-18. Substituting Eqs. 3-9, 3-13, 3-14, and 3-16 through 3-21 into the derivative of Eqs. 3-19 through 3-21 yields the three differential equations

$$0 = \dot K + K A + A^T K - K B_1 R_{11}^{-1} B_1^T K - K B_2 R_{22}^{-1} B_2^T P + Q^T \tag{3-22}$$

$$0 = \dot P + P A + A^T P - P B_1 R_{11}^{-1} B_1^T K - P B_2 R_{22}^{-1} B_2^T P + N^T - Q S \tag{3-23}$$

$$0 = \dot S + S A - A S - S B_1 R_{11}^{-1} B_1^T K - S B_2 R_{22}^{-1} B_2^T P + B_1 \left(R_{11}^{-1}\right)^T R_{21} R_{11}^{-1} B_1^T K - B_1 \left(R_{11}^{-1}\right)^T B_1^T P, \tag{3-24}$$

where Eqs. 3-22 and 3-23 are differential Riccati equations (DRE). Eqs. 3-22 through 3-24 must be solved simultaneously to yield a control strategy for the leader and follower. The solutions to the DRE gains $K(t)$ and $P(t)$ correspond to $u_F(t)$ and $u_L(t)$, respectively, while $S(t)$ constrains the trajectory of $K(t)$ and $P(t)$. From the DRE in Eq. 3-23, four simultaneous equations are generated as

$$0 = \dot P_{11} - P_{11} \alpha_1 - \alpha_1^T P_{11} + N_{11} - Q_{11} S_{11} \tag{3-25}$$

$$0 = P_{11} + N_{12} - Q_{12} S_{22} \tag{3-26}$$

$$0 = P_{11} + N_{12}^T - Q_{12}^T S_{11} \tag{3-27}$$

$$0 = \dot P_{22} - P_{22} M^{-1} V_m - V_m^T M^{-1} P_{22} + N_{22} - Q_{22} S_{22} - P_{22} M^{-1} R_{11}^{-1} M^{-1} K_{22} - P_{22} M^{-1} R_{22}^{-1} M^{-1} P_{22}. \tag{3-28}$$


If $P_{22}(t)$ and $K_{22}(t)$ are selected as

$$P_{22} = K_{22} = M(q) \tag{3-29}$$

then the skew symmetry properties in Assumption 2 can be applied to Eq. 3-28 to determine that

$$-R_{11}^{-1} - R_{22}^{-1} + N_{22} - Q_{22} S_{22} = 0 \tag{3-30}$$

which implies that $S_{22}$ is a constant matrix; therefore, $P_{11}$ must also be a constant matrix from Eq. 3-26. Four simultaneous equations are generated from the DRE in Eq. 3-22:

$$0 = \dot K_{11} - K_{11} \alpha_1 - \alpha_1^T K_{11} + Q_{11} \tag{3-31}$$

$$0 = K_{11} + Q_{12} \tag{3-32}$$

$$0 = K_{11} + Q_{12}^T \tag{3-33}$$

$$0 = \dot K_{22} - K_{22} M^{-1} V_m - V_m^T M^{-1} K_{22} + Q_{22} - K_{22} M^{-1} R_{11}^{-1} M^{-1} K_{22} - K_{22} M^{-1} R_{22}^{-1} M^{-1} P_{22}. \tag{3-34}$$

Substituting Eq. 3-29 into Eq. 3-34 yields

$$-R_{11}^{-1} - R_{22}^{-1} + Q_{22} = 0 \tag{3-35}$$

which, when combined with Eq. 3-30, yields

$$Q_{22}\left(I_{n \times n} + S_{22}\right) - N_{22} = 0.$$

If $N$ is chosen such that $N_{22} = -Q_{22}$, then $S_{22}$ is constrained to be

$$S_{22} = -2 I_{n \times n}. \tag{3-36}$$
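The reduction of the (2,2) block to a constant constraint can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses the two-link model from the Chapter 2 simulation with the standard sign pattern for $V_m$ (so that $\dot M = V_m + V_m^T$), assumes the sign pattern $\dot K_{22} - K_{22}M^{-1}V_m - V_m^T M^{-1}K_{22} + Q_{22} - K_{22}M^{-1}R_{11}^{-1}M^{-1}K_{22} - K_{22}M^{-1}R_{22}^{-1}M^{-1}P_{22}$ for the block equation, and uses hypothetical weights. All function names are illustrative.

```python
import numpy as np

P1, P2, P3 = 3.473, 0.196, 0.242  # two-link parameters from Chapter 2


def inertia(q):
    c2 = np.cos(q[1])
    return np.array([[P1 + 2 * P3 * c2, P2 + P3 * c2],
                     [P2 + P3 * c2, P2]])


def inertia_rate(q, qdot):
    """Time derivative of M(q) along the motion."""
    s2 = np.sin(q[1])
    return np.array([[-2 * P3 * s2 * qdot[1], -P3 * s2 * qdot[1]],
                     [-P3 * s2 * qdot[1], 0.0]])


def coriolis(q, qdot):
    """V_m(q, qdot); signs assumed so that Mdot - 2 V_m is skew-symmetric."""
    s2 = np.sin(q[1])
    return np.array([[-P3 * s2 * qdot[1], -P3 * s2 * (qdot[0] + qdot[1])],
                     [P3 * s2 * qdot[0], 0.0]])


def block22_residual(q, qdot, Q22, R11, R22):
    """Right-hand side of the (2,2) block equation with K22 = P22 = M(q)."""
    M, Md, Vm = inertia(q), inertia_rate(q, qdot), coriolis(q, qdot)
    Mi = np.linalg.inv(M)
    R11i, R22i = np.linalg.inv(R11), np.linalg.inv(R22)
    K22 = P22 = M
    return (Md - K22 @ Mi @ Vm - Vm.T @ Mi @ K22 + Q22
            - K22 @ Mi @ R11i @ Mi @ K22
            - K22 @ Mi @ R22i @ Mi @ P22)
```

For any state, the state-dependent terms cancel and the residual equals $Q_{22} - R_{11}^{-1} - R_{22}^{-1}$, so it vanishes exactly when the weights satisfy that constant relation.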


From the Riccati equation given in Eq. 3-24, the following three simultaneous equations are generated

$$0 = \dot S_{11} - S_{11} \alpha_1 + \alpha_1 S_{11} \tag{3-37}$$

$$0 = S_{11} - S_{22} \tag{3-38}$$

$$0 = \dot S_{22} + M^{-1} V_m S_{22} - S_{22} M^{-1} R_{11}^{-1} M^{-1} K_{22} - S_{22} M^{-1} R_{22}^{-1} M^{-1} P_{22} + M^{-1} R_{11}^{-1} R_{21} R_{11}^{-1} M^{-1} K_{22} - M^{-1} R_{11}^{-1} M^{-1} P_{22} - S_{22} M^{-1} V_m. \tag{3-39}$$

Substituting Eqs. 3-29 and 3-36 into Eq. 3-39 results in the constraint

$$R_{11}^{-1} + 2 R_{22}^{-1} + R_{11}^{-1} R_{21} R_{11}^{-1} = 0. \tag{3-40}$$

In addition, substituting Eq. 3-36 into Eq. 3-38 yields

$$S_{11} = -2 I_{n \times n}. \tag{3-41}$$

It is evident from Eqs. 3-26, 3-27, 3-32, 3-33, 3-36, and 3-41 that the following relationships can be established

$$K_{11} = -\frac{1}{2}\left(Q_{12} + Q_{12}^T\right) \tag{3-42}$$

$$P_{11} = -\frac{1}{2}\left(N_{12} + N_{12}^T\right) + 2 K_{11}. \tag{3-43}$$

Another constraint can be established by substituting Eqs. 3-31, 3-41, and 3-43 into Eq. 3-25 and reducing the equation as

$$0 = N_{11} + \frac{1}{2}\left[\left(N_{12} + N_{12}^T\right)\alpha_1 + \alpha_1^T\left(N_{12} + N_{12}^T\right)\right].$$

Substituting Eqs. 3-9, 3-19, and 3-29 into Eq. 3-13 yields a Stackelberg-derived controller given as

$$u_F = -R_{11}^{-1} B_1^T K z = -R_{11}^{-1} e_2. \tag{3-44}$$
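The collapse of the follower's costate feedback to a simple gain on $e_2$ can be illustrated with a short sketch. It assumes the input matrix has the form $B_1 = [0_{n \times n};\, M^{-1}]$ (consistent with Eq. 3-8, but defined in Chapter 2 and therefore an assumption here), takes $K_{22} = M$, and uses hypothetical numerical values and function names.

```python
import numpy as np


def follower_input(e2, R11):
    """Reduced follower controller u_F = -R11^{-1} e2."""
    return -np.linalg.solve(R11, e2)


def follower_input_from_costate(z, M, K11, R11):
    """Unreduced form u_F = -R11^{-1} B1^T K z with K = blockdiag(K11, M)."""
    n = M.shape[0]
    zero = np.zeros((n, n))
    B1 = np.vstack([zero, np.linalg.inv(M)])  # assumed structure of B1
    K = np.block([[K11, zero], [zero, M]])
    return -np.linalg.solve(R11, B1.T @ K @ z)
```

Because $B_1^T K z = M^{-1} M e_2 = e_2$, the two functions return the same vector for any symmetric positive definite `M`, and the result does not depend on `K11` or on $e_1$.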


The controller in Eq. 3-44 is subject to the leader's input, derived by substituting Eqs. 3-9, 3-20, and 3-29 into Eq. 3-16, given as

$$u_L = -R_{22}^{-1} B_2^T P z = -R_{22}^{-1} e_2. \tag{3-45}$$

It is evident from Eqs. 3-35 and 3-40 that the gain matrices $R_{11}^{-1}$ and $R_{22}^{-1}$ are constrained by

$$Q_{22} + R_{22}^{-1} + R_{11}^{-1} R_{21} R_{11}^{-1} = 0. \tag{3-46}$$

In addition, the weights $(Q, N)$ impose a penalty on the state vectors given in the cost functions in Eqs. 3-10 and 3-11 and are subject to the following constraints

$$0 = N_{22} + Q_{22} \tag{3-47}$$

$$0 = \frac{1}{2}\left[\left(Q_{12} + Q_{12}^T\right)\alpha_1 + \alpha_1^T\left(Q_{12} + Q_{12}^T\right)\right] + Q_{11} \tag{3-48}$$

$$0 = \frac{1}{2}\left[\left(N_{12} + N_{12}^T\right)\alpha_1 + \alpha_1^T\left(N_{12} + N_{12}^T\right)\right] + N_{11}. \tag{3-49}$$

Based on the open-loop Stackelberg strategy, the derived controller in Eq. 3-44 minimizes the cost functional given by Eq. 3-10 and is subject to the leader's input in Eq. 3-45, which minimizes the cost functional given by Eq. 3-11. To demonstrate optimality of the proposed controller, Hamiltonians were constructed in Eqs. 3-12 and 3-15, and an optimal control problem was formulated. The costate variables in Eqs. 3-19 through 3-21 were assumed to be solutions to the costate equations in Eqs. 3-14, 3-17, and 3-18, and gain constraints were developed. If all constraints in Eqs. 3-46 through 3-49 are satisfied, then the assumed solutions in Eqs. 3-19 through 3-21 satisfy Eqs. 3-12 through 3-18, and hence are optimal with respect to the residual dynamics in Eq. 3-9.

Remark 3.1. Since the open-loop strategy for the leader is declared in advance for the entire game, if the follower minimizes its cost function, then it obtains the follower Stackelberg strategy, which is the optimal reaction to the declared leader strategy. A drawback of any open-loop differential game approach is that, due to dynamic-inconsistency (also


called time-inconsistency [122]), the open-loop strategy does not satisfy the principle of optimality; i.e., if $u_1^{t_0}(x,t)$ has been found to be the open-loop strategy for the leader at $t = t_0$, and if after an interval of time $[t_0, t_1]$ a re-evaluation of the Stackelberg strategy is attempted, it will, in general, be found that the resulting optimal Stackelberg strategy satisfies $u_1^{t_0}(x,t) \ne u_1^{t_1}(x,t)$. The open-loop Stackelberg strategy concept assumes a commitment by the leader to implement its announced strategy. This commitment is for the entire game, and if the actual interval of the game were different, the committed strategy generally would not coincide with the Stackelberg strategy for the new interval, but the leader would be obliged to use the non-optimal strategy (i.e., the game is subgame imperfect).

Existence and Uniqueness of Stackelberg Equilibrium Solution. The existence of unique Stackelberg equilibria was shown to be tied to the existence of solutions to certain non-symmetric Riccati equations, which are difficult to solve. In [63] a connection between solutions of a standard algebraic Riccati equation and a non-symmetric algebraic Riccati equation was given. The subsequent theorem, given as Theorem 3 in [63], utilizes the connection between the standard and non-symmetric Riccati equations to define existence and uniqueness.

Theorem 3.1. If the convexity conditions, given by

$$R_{11} \ge 0, \quad R_{22} > 0, \quad Q \ge 0, \quad R_{21} \ge 0, \quad N \ge 0,$$

are satisfied and if there exists a stabilizing solution $X$ to the non-symmetric algebraic Riccati equation

$$0 = X \begin{bmatrix} A & 0 \\ 0 & A \end{bmatrix} + \begin{bmatrix} A^T & 0 \\ 0 & A^T \end{bmatrix} X + \begin{bmatrix} Q & 0 \\ N & -Q \end{bmatrix} - X G X \tag{3-50}$$


where

$$G \triangleq \begin{bmatrix} B_1 & B_2 & 0 \\ 0 & 0 & -B_1 \end{bmatrix} \begin{bmatrix} R_{11} & 0 & 0 \\ 0 & R_{22} & 0 \\ R_{21} & 0 & R_{11} \end{bmatrix}^{-1} \begin{bmatrix} B_1^T & 0 \\ 0 & B_2^T \\ 0 & B_1^T \end{bmatrix}$$

then the unique Stackelberg equilibrium exists and is given by

$$\begin{bmatrix} u_1^* \\ u_2^* \end{bmatrix} = -\begin{bmatrix} R_{11}^{-1} & 0 & 0 \\ 0 & R_{22}^{-1} & 0 \end{bmatrix} \begin{bmatrix} B_1^T & 0 \\ 0 & B_2^T \\ 0 & B_1^T \end{bmatrix} X z, \tag{3-51}$$

where $z$ is the solution of the closed-loop equation

$$\dot z = \left( \begin{bmatrix} A & 0 \\ 0 & A \end{bmatrix} - G X \right) z, \qquad z(0) = \begin{bmatrix} z_0 \\ 0 \end{bmatrix}.$$

Proof. See Theorem 3 in [63].

It is interesting to note that Theorem 3-1 does not depend on the solution to the Riccati equations; however, given the existence of the stabilizing solution $X$ to Eq. 3-50, at least one stabilizing solution $(K, P, S)$ of the algebraic Riccati equations given by

$$0 = K A + A^T K - K B_1 R_{11}^{-1} B_1^T K - K B_2 R_{22}^{-1} B_2^T P + Q^T \tag{3-52}$$

$$0 = P A + A^T P - P B_1 R_{11}^{-1} B_1^T K - P B_2 R_{22}^{-1} B_2^T P + N^T - Q S \tag{3-53}$$

$$0 = S A - A S - S B_1 R_{11}^{-1} B_1^T K - S B_2 R_{22}^{-1} B_2^T P + B_1 \left(R_{11}^{-1}\right)^T R_{21} R_{11}^{-1} B_1^T K - B_1 \left(R_{11}^{-1}\right)^T B_1^T P, \tag{3-54}$$


exists. This follows from the fact that $\mathrm{Im}\begin{bmatrix} I_{n \times n} \\ X \end{bmatrix}$ is $H_{st}$-invariant, where $\mathrm{Im}(\cdot)$ denotes the image operator, and $H_{st}$ is the extended Hamiltonian defined as

$$H_{st} = \begin{bmatrix} A & -B_1 R_{11}^{-1} B_1^T & -B_2 R_{22}^{-1} B_2^T & 0 \\ -Q & -A^T & 0 & 0 \\ -N & 0 & -A^T & Q \\ 0 & -B_1 R_{11}^{-T} R_{21} R_{11}^{-1} B_1^T & B_1 R_{11}^{-1} B_1^T & A \end{bmatrix}$$

and contains an $n$-dimensional $H_{st}$-invariant subspace of the form $\mathrm{Im}\left[ I_{n \times n},\, S^T,\, K^T,\, P^T \right]^T$, which defines the desired solution of Eqs. 3-52 through 3-54 [63]. If a stabilizing solution exists for the Riccati equations, then imposing additional constraints, such as $A$ being stable and every eigenvalue of $A$ being $(Q, A)$ unobservable, makes the solution unique. However, according to Chapter 2 of [123], satisfying the two constraints does not admit a stable solution to the algebraic Riccati equation for the leader and the non-symmetric coupled Riccati equation defined in Eq. 3-50.

3.4 RISE Feedback Control Development

In general, the bounded disturbance $\tau_d(t)$ and the nonlinear dynamics given in Eq. 3-6 are unknown, so the controller given in Eq. 3-7 cannot be implemented. However, if the control input contains some method to identify and cancel these effects, then $z(t)$ will converge to the state-space model in Eq. 3-9 so that $u_L(t)$ and $u_F(t)$ minimize their respective performance indices. In this section, a control input is developed that exploits RISE feedback to identify the nonlinear effects and bounded disturbances to enable $z(t)$ to asymptotically converge to the state-space model.

The control input is defined the same as Eq. 2-49 in Chapter 2; however, for this derivation player 1 is the follower, $\tau_1 = \tau_F$, and player 2 is the leader, $\tau_2 = \tau_L$. Using the control inputs, the closed-loop error system can be derived as

$$M \dot r = -\frac{1}{2} \dot M r + \tilde N + N_D - e_2 - \left( R_{11}^{-1} + R_{22}^{-1} \right) r - \dot\mu. \tag{3-55}$$


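In Eq. 3-55, the unmeasurable auxiliary terms $\tilde N(q, \dot q, \ddot q, e_1, e_2, r),\; N_D(q_d, \dot q_d, \ddot q_d, \dddot q_d) \in \mathbb{R}^n$ are defined as

$$\tilde N \triangleq -\dot V_m e_2 - V_m \dot e_2 - \frac{1}{2} \dot M r + \dot{\tilde h} + \alpha_2 \dot M e_2 + \alpha_2 M \dot e_2 + e_2 + \left( R_{11}^{-1} + R_{22}^{-1} \right) \alpha_2 e_2$$

$$N_D \triangleq \dot f_d + \dot\tau_d.$$

Motivation for grouping terms into $\tilde N$ and $N_D$ comes from the subsequent stability analysis and the fact that the Mean Value Theorem, Assumption 3-3, Assumption 3-4, and Assumption 3-5 can be used to upper bound the auxiliary terms as

$$\left\| \tilde N(t) \right\| \le \rho(\|y\|)\, \|y\| \tag{3-56}$$

$$\| N_D \| \le \zeta_1, \qquad \left\| \dot N_D \right\| \le \zeta_2 \tag{3-57}$$

where $y(e_1, e_2, r) \in \mathbb{R}^{3n}$ is defined as

$$y(t) \triangleq \left[ e_1^T \;\; e_2^T \;\; r^T \right]^T, \tag{3-58}$$

the bounding function $\rho(\|y\|) \in \mathbb{R}$ is a positive, globally invertible, nondecreasing function, and $\zeta_i \in \mathbb{R}$ $(i = 1, 2)$ denote known positive constants. Based on Eq. 3-55, the control term $\mu(t)$ is designed as the generalized solution to

$$\dot\mu(t) \triangleq k_s r(t) + \beta_1\, \mathrm{sgn}(e_2) \tag{3-59}$$

where $k_s, \beta_1 \in \mathbb{R}$ are positive constant control gains. The closed-loop error system for $r(t)$ can now be obtained by substituting Eq. 3-59 into Eq. 3-55 as

$$M \dot r = -\frac{1}{2} \dot M r + \tilde N + N_D - e_2 - \left( R_{11}^{-1} + R_{22}^{-1} \right) r - k_s r - \beta_1\, \mathrm{sgn}(e_2). \tag{3-60}$$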
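Although $r(t)$ is not measurable, the RISE term can be computed from $e_2$ alone. Since $r = \dot e_2 + \alpha_2 e_2$, integrating the update law $\dot\mu = k_s r + \beta_1\,\mathrm{sgn}(e_2)$ gives $\mu(t) = k_s\left(e_2(t) - e_2(0)\right) + \int_0^t \left[k_s \alpha_2 e_2(\sigma) + \beta_1\,\mathrm{sgn}(e_2(\sigma))\right] d\sigma$, assuming $\mu(0) = 0$. The sketch below is one possible discrete-time approximation of that expression using a forward-Euler (rectangle rule) integral; the class name, the scalar gains, and the fixed step `dt` are illustrative assumptions rather than the implementation used for the reported results.

```python
import numpy as np


class RiseTerm:
    """Discrete-time approximation of the RISE feedback term mu(t)."""

    def __init__(self, ks, beta1, alpha2, e2_0):
        self.ks = ks                          # scalar gain k_s > 0
        self.beta1 = beta1                    # scalar gain beta_1 > 0
        self.alpha2 = np.asarray(alpha2, dtype=float)
        self.e2_0 = np.asarray(e2_0, dtype=float)
        self.integral = np.zeros_like(self.e2_0)

    def update(self, e2, dt):
        """Advance the integral by one step and return mu (mu(0) = 0 assumed)."""
        e2 = np.asarray(e2, dtype=float)
        integrand = self.ks * (self.alpha2 @ e2) + self.beta1 * np.sign(e2)
        self.integral += integrand * dt
        return self.ks * (e2 - self.e2_0) + self.integral
```

In use, `update` would be called once per control step with the current $e_2$. The discontinuous `np.sign` term is what makes $\mu$ a generalized (Filippov) solution in continuous time, so a small step size is advisable in simulation.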


3.5 Stability Analysis

Lemma 3.1. Let $O(e_2, r, t) \in \mathbb{R}$ denote the generalized solution to

$$\dot O \triangleq -r^T\left( N_D - \beta_1\, \mathrm{sgn}(e_2) \right), \qquad O(0) = \beta_1 \sum_{i=1}^{n} |e_{2i}(0)| - e_2(0)^T N_D(0) \tag{3-61}$$

where $e_{2i}(0)$ denotes the $i$th element of the vector $e_2(0)$. Provided $\beta_1$ is selected according to the following sufficient condition:

$$\beta_1 > \zeta_1 + \frac{1}{\lambda_{\min}(\alpha_2)} \zeta_2 \tag{3-62}$$

where $\zeta_1$ and $\zeta_2$ are known positive constants defined in Eq. 3-57, then $O(e_2, r, t) \ge 0$.

Proof. See [111].

Theorem 3.2. The controller given by Eqs. 3-44, 3-45, and 3-59 ensures that all system signals are bounded under closed-loop operation, and the tracking errors are semi-globally asymptotically regulated in the sense that

$$\|e_1(t)\|,\; \|e_2(t)\|,\; \|r(t)\| \to 0 \quad \text{as } t \to \infty \tag{3-63}$$

provided the control gain $k_s$ in Eq. 3-59 is selected sufficiently large based on the initial conditions of the system, $\beta_1$ in Eq. 3-59 is selected according to Eq. 3-62, and $\alpha_1, \alpha_2$ are selected according to the sufficient conditions

$$\lambda_{\min}(\alpha_1) > \frac{1}{2}, \qquad \lambda_{\min}(\alpha_2) > 1. \tag{3-64}$$

Furthermore, $u_F(t)$ and $u_L(t)$ minimize Eqs. 3-10 and 3-11 subject to Eq. 3-9, provided the gain constraints given in Eqs. 3-46 through 3-49 are satisfied.

Remark 3.2. The control gain $\alpha_1$ cannot be arbitrarily selected; rather, it is calculated using a Lyapunov equation solver. Its value is determined based on the values of $Q$, $N$, $R_{11}$, $R_{21}$, and $R_{22}$. Therefore $Q$, $N$, $R_{11}$, $R_{21}$, and $R_{22}$ must be chosen such that Eq. 3-64 is satisfied.
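The sufficient gain conditions in Lemma 3.1 and Theorem 3.2 are simple to check before running a simulation. The following is a minimal sketch with hypothetical names; it assumes symmetric (here diagonal) gain matrices, a scalar $\beta_1$, and that the bounds $\zeta_1, \zeta_2$ on $N_D$ and $\dot N_D$ are available, which in practice requires bounds on the desired trajectory and the disturbance.

```python
import numpy as np


def rise_gain_conditions(alpha1, alpha2, beta1, zeta1, zeta2):
    """Evaluate the sufficient conditions on beta_1, alpha_1, and alpha_2."""
    lam1 = np.linalg.eigvalsh(alpha1).min()   # lambda_min(alpha_1)
    lam2 = np.linalg.eigvalsh(alpha2).min()   # lambda_min(alpha_2)
    return {
        "beta1": bool(beta1 > zeta1 + zeta2 / lam2),
        "alpha1": bool(lam1 > 0.5),
        "alpha2": bool(lam2 > 1.0),
    }
```

The condition on $k_s$ is not included because it depends on the initial conditions through the size of the region of attraction rather than on a closed-form threshold.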


Proof. Refer to Theorem 2-4 from Chapter 2.

Remark 3.3. Similar to LQR design, the state weights $(Q, N)$ and input weight $R$ are designed to regulate the states and inputs to a desired behavior, respectively. The gain constraints in Eqs. 3-46 through 3-49 provide a general framework for implementing the controller. The weights $Q$ and $N$ penalize the state $z(t)$ and can be chosen sufficiently large to yield desirable tracking error, while the leader's control input weight $R_{22}$ can be chosen sufficiently large to yield desirable controller bandwidth. The follower's control input weight $R_{11}$ is then generated using the chosen gains $Q$, $N$, and $R_{22}$ and the constraints in Eqs. 3-46 through 3-49.

3.6 Simulation

To examine the performance of the Stackelberg-derived controller proposed in Eqs. 3-44, 3-45, and 3-59, a numerical simulation was performed. To illustrate the utility of the technique, the Euler-Lagrange dynamics defined in Chapter 2 are considered. For the Stackelberg strategy, the follower input is $\tau_F = \tau_1$ and the leader input is $\tau_L = \tau_2$. In this framework the inertial and Coriolis effects of the leader are seen as a disturbance to the tracking objective of the follower. In both strategies, the objective of both players is to track a desired trajectory given as

$$q_{d1} = q_{d2} = 60 \sin(2t)\left(1 - \exp\left(-0.01 t^3\right)\right)$$

and the initial conditions were selected as

$$q_1(0) = q_2(0) = 14.3 \text{ deg}, \qquad \dot q_1(0) = \dot q_2(0) = 28.6 \text{ deg/sec}.$$


The weighting matrices for both controllers were chosen as

$$Q_{11} = \mathrm{diag}\{5, 5\}, \quad Q_{12} = \mathrm{diag}\{5, 5\}, \quad Q_{22} = \mathrm{diag}\{5, 5\},$$
$$L_{12} = \mathrm{diag}\{5, 5\}, \quad L_{22} = \mathrm{diag}\{5, 5\}, \quad L_{11} = \mathrm{diag}\{5, 5\},$$

which, using the Stackelberg constraints given in Eqs. 3-46 through 3-48, yield the Stackelberg gains $R_{22}$, $R_{11}$, and $R_{21}$ as

$$R_{22} = \mathrm{diag}\left\{\tfrac{4}{11}, \tfrac{4}{11}\right\}, \quad R_{11} = \mathrm{diag}\left\{\tfrac{1}{15}, \tfrac{1}{15}\right\}, \quad R_{21} = \mathrm{diag}\left\{\tfrac{1}{100}, \tfrac{1}{100}\right\}.$$

The control gains for the RISE control element were selected as

$$\alpha_1 = \mathrm{diag}\{5, 5\}, \quad \alpha_2 = \mathrm{diag}\{15, 3.5\}, \quad \beta_1 = \mathrm{diag}\{15, 10\}, \quad k_s = \mathrm{diag}\{65, 25\}.$$

The tracking errors and the control inputs for the RISE and optimal controller are shown in Figures 3-1 and 3-2, respectively. To show that the RISE feedback identifies the nonlinear effects and bounded disturbances, a plot of the difference is shown in Figure 3-3. As this difference goes to zero, the dynamics in Eq. 3-1 converge to the state-space system in Eq. 3-9, and the controller solves the two player differential game. In addition, Figure 3-4 shows the convergence of the cost functionals for each player.

[Figure 3-1. The simulated tracking errors for the RISE and Stackelberg optimal controller. Axes: time [sec] vs. tracking error [deg]; curves: Follower, Leader.]

[Figure 3-2. The simulated torques for the RISE and Stackelberg optimal controller. Axes: time [sec] vs. torque [N m]; curves: Follower, Leader.]

[Figure 3-3. The difference between the RISE feedback and the nonlinear effect and bounded disturbances. Two panels, Follower [N m] and Leader [N m], vs. time [sec].]

[Figure 3-4. Cost functionals for the leader and follower. Axes: time [sec] vs. cost functional; curves: Follower, Leader.]

3.7 Summary

In this chapter, a novel approach for the design of a Stackelberg-based controller was proposed for a nonlinear Euler-Lagrange system subject to parametric uncertainty and bounded disturbances. Stackelberg game methods are used to develop tracking controllers that minimize cost functionals constrained by a residual uncertain nonlinear system with multiple inputs. A Lyapunov-based stability analysis is used to prove semi-global asymptotic tracking of the resulting controller. The inclusion of the RISE structure is an enabling method to allow the analytical development of a controller that asymptotically minimizes cost functionals in a Stackelberg game for the uncertain nonlinear continuous-time Euler-Lagrange system. However, the contribution of the implicit learning RISE structure is not included in the cost functional, yielding a (sub)optimal result.


CHAPTER 4
APPROXIMATE TWO PLAYER ZERO-SUM GAME SOLUTION FOR AN UNCERTAIN CONTINUOUS NONLINEAR SYSTEM

In recent work [102], an online approximate solution method is developed based on an approximation of the HJB for the (one player) infinite horizon optimal control problem of continuous-time nonlinear systems with partially known dynamics. This approximate optimal adaptive controller uses two adaptive structures: a critic to approximate the value (cost) function and an actor to approximate the control policy. In addition, a DNN is used to robustly identify the system parameters. The two adaptive structures are tuned simultaneously online to learn the solution of the HJB equation and the optimal policy. This chapter generalizes the method given in [102] and solves the two player zero-sum game problem for nonlinear continuous-time systems with partially known dynamics. This chapter presents an optimal adaptive control method that converges online to the solution to the two player differential game. The HJI approximation algorithm considers a two-actor, one-critic NN architecture, where the actors and critic use gradient and least squares-based update laws, respectively, to minimize the Bellman error, which is the difference between the exact and the approximate HJI equations. A DNN identifier learns the system dynamics based on online gradient-based weight tuning laws, while a RISE term robustly accounts for the function reconstruction errors, guaranteeing asymptotic estimation of the state and the state derivative. The online estimation of the state derivative allows the ACI architecture to be implemented without knowledge of the system drift dynamics. The parameter update laws tune the critic and actor neural networks online and simultaneously to converge to the solution to the HJI equation and the saddle point policies, while also guaranteeing closed-loop stability. These policies guarantee UUB tracking error for the closed-loop system.


4.1 Two Player Zero-Sum Differential Game

Consider the nonlinear time-invariant affine-in-the-input dynamic system given by

$$\dot x = f(x) + g_1(x) u_1(x) + g_2(x) u_2(x) \tag{4-1}$$

$$y = h(x), \qquad z = \left[ y^T \;\; u_1^T \right]^T$$

where $x(t) \in \mathcal{X} \subseteq \mathbb{R}^n$ is the state vector, $u_1(x), u_2(x) \in \mathcal{U} \subseteq \mathbb{R}^m$ are the control inputs, and $f(x) \in \mathbb{R}^n$ and $g_1(x), g_2(x) \in \mathbb{R}^{n \times m}$ are the drift and input matrices, respectively. Assume that $f(x)$, $g_1(x)$, and $g_2(x)$ are Lipschitz continuous and that $f(0) = 0$ so that $x = 0$ is an equilibrium point for Eq. 4-1.

Bounded $\mathcal{L}_2$ Gain Problem. The objective of the bounded $\mathcal{L}_2$ gain control problem is to design a control input policy $u_1(x)$ such that

$$\int_0^\infty \|z\|^2\, d\tau \equiv \int_0^\infty \left( h^T h + u_1^T R u_1 \right) d\tau \le \gamma^2 \int_0^\infty \|u_2\|^2\, d\tau,$$

for a given $\gamma > 0$, where $R = R^T > 0$, for all $u_2 \in \mathcal{L}_2[0, \infty)$ when $x(0) = 0$. The $H_\infty$ control problem is interested in determining the smallest $\gamma > 0$, known as $\gamma^*$, such that the bounded $\mathcal{L}_2$ gain control problem has a solution [65]. This chapter is not concerned with the $H_\infty$ control objectives; rather, it is assumed that $\gamma$ is prescribed a priori such that $\gamma \ge \gamma^* \ge 0$. The value function [124] is given as

$$V(x, u_1, u_2) = \int_t^\infty \left( h^T h + u_1^T R u_1 - \gamma^2 \|u_2\|^2 \right) d\tau,$$

where $R = R^T \in \mathbb{R}^{m \times m}$ is positive definite. A differential equivalent to the value function is the nonlinear Lyapunov-like equation

$$0 = h^T h + u_1^T R u_1 - \gamma^2 \|u_2\|^2 + \nabla V \left( f(x) + g_1(x) u_1(x) + g_2(x) u_2(x) \right) \tag{4-2}$$


where $\nabla V \triangleq \frac{\partial V}{\partial x} \in \mathbb{R}^{1 \times n}$ is the gradient of the value function. For admissible policies $u_1$, a solution $V(x) \ge 0$ to Eq. 4-2 is the value for a given $u_2 \in \mathcal{L}_2[0, \infty)$.

Two Player Zero-Sum Game. For the two player zero-sum differential game, the infinite-horizon scalar value or cost functional $V(x(t), u_1, u_2)$ associated with the control policies $\{u_1 = u_1(x(s));\, s \ge t\}$ and $\{u_2 = u_2(x(s));\, s \ge t\}$ can be defined as

$$V(x) = \min_{u_1} \max_{u_2} \int_t^\infty r\left( x(s), u_1(s), u_2(s) \right) ds, \tag{4-3}$$

where $t$ is the initial time, and $r(x, u_1, u_2) \in \mathbb{R}$ is the local cost for the state and controls, defined as

$$r = Q(x) + u_1^T R u_1 - \gamma^2 u_2^T u_2. \tag{4-4}$$

In this differential game, $u_1(x)$ is the minimizing player, and $u_2(x)$ is the maximizing player. This two player optimal control problem has a unique solution if the Nash condition holds:

$$\min_{u_1} \max_{u_2} V(x(0), u_1, u_2) = \max_{u_2} \min_{u_1} V(x(0), u_1, u_2).$$

The objective of the optimal control problem is to find feedback policies [41] ($u_1^* = u_1(x)$ and $u_2^* = u_2(x)$) such that the cost in Eq. 4-3 associated with the system in Eq. 4-1 is minimized [114]. Assuming the value functional is continuously differentiable, Bellman's principle of optimality can be used to derive the following optimality condition

$$0 = \min_{u_1} \max_{u_2} \left[ \nabla V \left( f(x) + g_1(x) u_1(x) + g_2(x) u_2(x) \right) + r(x, u_1, u_2) \right] \tag{4-5}$$

which is a nonlinear PDE, also called the HJI equation. Given a solution $V^*(x) \ge 0$ to the HJI, the local cost given in Eq. 4-4 can be used to form the algebraic expressions for the optimal control and disturbance inputs from Eq. 4-5 as

$$u_1^* = -\frac{1}{2} R^{-1} g_1^T(x) \left(\nabla V^*\right)^T \tag{4-6}$$

$$u_2^* = \frac{1}{2\gamma^2} g_2^T(x) \left(\nabla V^*\right)^T. \tag{4-7}$$


The closed-form expressions for the optimal control and disturbance in Eqs. 4-6 and 4-7, respectively, obviate the need to search for a feedback policy that minimizes the value function; however, the solution $V^*(x)$ to the HJI equation given in Eq. 4-5 is required. The HJI equation in Eq. 4-5 can be rewritten by substituting for the local cost in Eq. 4-4 and the optimal control policies in Eqs. 4-6 and 4-7, as

$$0 = Q(x) + \nabla V^* f(x) - \frac{1}{4} \nabla V^* g_1(x) R^{-1} g_1^T(x) \left(\nabla V^*\right)^T + \frac{1}{4\gamma^2} \nabla V^* g_2(x) g_2^T(x) \left(\nabla V^*\right)^T, \qquad V^*(0) = 0. \tag{4-8}$$

Since the HJI equation is troublesome to solve in general, this chapter considers an approximate solution. The HJI in Eq. 4-8 may have more than one nonnegative definite solution. A minimal nonnegative definite solution $V_a$ is such that there exists no other nonnegative definite solution $V$ such that $V_a(x) \ge V(x) \ge 0$. In [41], the system is in Nash equilibrium with a value given as $V_a(x(0))$ and a saddle point equilibrium solution $(u_1^*, u_2^*)$ among strategies in $\mathcal{L}_2[0, \infty)$ if $V_a$ is smooth and a minimal solution to the HJI and the system is zero state observable. Moreover, the closed-loop systems $f(x) + g_1 u_1^* + g_2 u_2^*$ and $f(x) + g_1 u_1^*$ are locally asymptotically stable. It is proven in [41] that the minimum nonnegative definite solution to the HJI is the unique solution for which the closed-loop system $f(x) + g_1 u_1^* + g_2 u_2^*$ is asymptotically stable. In [65] it was shown that the HJI equation has a local smooth solution $V(x)$ if the system $f(x) + g_1 u_1$ is locally asymptotically stable and $u_1(x)$ yields an $\mathcal{L}_2$ gain of Eq. 4-1 $\le \gamma$. From this it can be shown that $V_a(x)$ is also the minimal nonnegative solution to the HJI. The work in [65] also shows that, for a given $\gamma$, if $V(x) \ge 0$ is smooth and is the solution to Eq. 4-8 and the system in Eq. 4-1 is zero state observable, then the system in Eq. 4-1 has an $\mathcal{L}_2$ gain $\le \gamma$, and the optimal control $u_1^*$ in Eq. 4-6 solves the $\mathcal{L}_2$ gain problem and renders the equilibrium point locally asymptotically stable. Moreover, the optimal control satisfies $u_1^*(t) \in \mathcal{L}_2[0, \infty)$. It is evident that both the $\mathcal{L}_2$ gain problem and the zero-sum game problem are dependent on the solution to the HJI.
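The relationship between the value gradient, the saddle-point policies, and the HJI residual can be illustrated with a small numerical sketch. The functions below are hypothetical helpers, not part of the approximation algorithm developed in this chapter; the gradient is passed as a one-dimensional array, so the row/column distinction does not arise. The scalar example is an assumption chosen so that the HJI can be solved by hand: with $f = 0$, $g_1 = g_2 = 1$, $R = 1$, $\gamma = 2$, and $Q(x) = 0.75 x^2$, the function $V^*(x) = x^2$ satisfies the HJI.

```python
import numpy as np


def saddle_point_policies(grad_V, g1, g2, R, gamma):
    """Optimal control u1* and worst-case disturbance u2* from the value gradient."""
    u1 = -0.5 * np.linalg.solve(R, g1.T @ grad_V)
    u2 = (g2.T @ grad_V) / (2.0 * gamma ** 2)
    return u1, u2


def hji_residual(Qx, f, grad_V, g1, g2, R, gamma):
    """HJI right-hand side with the saddle-point policies substituted."""
    a = g1.T @ grad_V
    b = g2.T @ grad_V
    return (Qx + grad_V @ f
            - 0.25 * a @ np.linalg.solve(R, a)
            + (b @ b) / (4.0 * gamma ** 2))


# Scalar example (assumed for illustration): V*(x) = x^2.
x = 1.3
grad_V = np.array([2.0 * x])
g1 = g2 = np.array([[1.0]])
R = np.array([[1.0]])
u1_star, u2_star = saddle_point_policies(grad_V, g1, g2, R, gamma=2.0)
residual = hji_residual(0.75 * x ** 2, np.array([0.0]), grad_V, g1, g2, R, gamma=2.0)
```

In this example $u_1^* = -x$ and $u_2^* = x/4$, and the residual is zero up to rounding, which confirms that the chosen $V^*$ solves the HJI for these data.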


Existence of solution to the HJI. While in general global solutions to the HJI in Eq. 4-8 may not exist, a local existence proof was given in [65]. The work in [65] proposes that for a given $\gamma$, if the system is zero state observable, and there exists a control policy $u_1(x)$ such that locally the system has an $\mathcal{L}_2$ gain $\le \gamma$ and the system is asymptotically stable, then there is a neighborhood $\Omega_x \subseteq \mathbb{R}^n$ of the origin on which there exists a smooth solution $V(x) \ge 0$ to the HJI equation in Eq. 4-8. Furthermore, the control yields the $\mathcal{L}_2$ gain $\le \gamma$ for all trajectories originating at the origin and remaining inside $\Omega_x$. Moreover, if global solutions do exist, they may not be smooth. For a discussion on viscosity solutions to the HJI, see [55, 125].

4.2 HJI Approximation Algorithm

This chapter generalizes the ACI approximation architecture to solve the two player zero-sum game for Eq. 4-8. The ACI architecture eliminates the need for exact model knowledge and utilizes a DNN to robustly identify the system, a critic NN to approximate the value function, and an actor NN to find a control policy which minimizes the value function. This section introduces the ACI architecture for the two player game, and subsequent sections give details of the design for the two player zero-sum game solution.

The Hamiltonian $H(x, \nabla V, u_1, u_2)$ of the system in Eq. 4-1 can be defined as

$$H = r + \nabla V F_u \tag{4-9}$$

where $\nabla V$ is the Jacobian of the value function $V(x)$, $F_u(x, u_1, u_2) \triangleq f(x) + g_1 u_1 + g_2 u_2 \in \mathbb{R}^n$ denotes the system dynamics, and $r(x, u_1, u_2) \triangleq Q(x) + u_1^T R u_1 - \gamma^2 u_2^T u_2$ denotes the local cost. The optimal policies in Eqs. 4-6 and 4-7 and the associated value function $V^*(x)$ satisfy the HJI equation

$$H(x, \nabla V^*, u_1^*, u_2^*) = r(x, u_1^*, u_2^*) + \nabla V^* F_{u^*} = 0. \tag{4-10}$$


Replacing the optimal Jacobian $\nabla V^*$, the optimal control policy $u_1^*$, and the disturbance input $u_2^*$ by estimates $\nabla \hat V$, $\hat u_1$, and $\hat u_2$, respectively, yields the approximate HJI equation

$$\hat H(x, \nabla \hat V, \hat u_1, \hat u_2) = r(x, \hat u_1, \hat u_2) + \nabla \hat V F_{\hat u}. \tag{4-11}$$

It is evident that the approximate HJI in Eq. 4-11 depends on complete knowledge of the system. To overcome this limitation, an online system identifier replaces the system dynamics, which modifies the approximate HJI in Eq. 4-11 as

$$\hat H(x, \hat x, \nabla \hat V, \hat u_1, \hat u_2) = r(x, \hat u_1, \hat u_2) + \nabla \hat V \hat F_{\hat u} \tag{4-12}$$

where $\hat F_{\hat u}$ is an approximation of the system dynamics $F_{\hat u}$. The error between the optimal and approximate HJI equations in Eqs. 4-10 and 4-12, respectively, yields the Bellman residual error $\delta_{hjb}(x, \hat x, \hat u_1, \hat u_2, \nabla \hat V)$ defined as

$$\delta_{hjb} \triangleq \hat H(x, \hat x, \nabla \hat V, \hat u_1, \hat u_2) - H(x, \nabla V^*, u_1^*, u_2^*). \tag{4-13}$$

However, since $H(x, \nabla V^*, u_1^*, u_2^*) = 0$, the Bellman residual error can be defined in a measurable form as $\delta_{hjb} = \hat H(x, \hat x, \nabla \hat V, \hat u_1, \hat u_2)$. The objective is to update both $\hat u_1$ and $\hat u_2$ (actors) and $\hat V$ (critic) simultaneously, based on the minimization of the Bellman residual error $\delta_{hjb}$. Altogether, the actors $\hat u_1$ and $\hat u_2$, the critic $\hat V$, and the identifier $\hat F_{\hat u}$ constitute the ACI architecture. To facilitate the subsequent analysis the following assumptions are given.

Assumption 4.1. Given a continuous function $h : S \to R^n$, where $S$ is a compact simply connected set, there exist ideal weights $W$ and $V$ such that the function can be represented by a NN as

$$h(x) = W^T \sigma(V^T x) + \varepsilon(x)$$

where $\sigma(\cdot)$ is the nonlinear activation function and $\varepsilon(x)$ is the function reconstruction error.


Assumption 4.2. The NN activation function $\sigma(\cdot)$ and its derivative $\sigma'(\cdot)$ with respect to its argument are bounded, i.e. $\|\sigma\| \le \bar\sigma$ and $\|\sigma'\| \le \bar\sigma'$.

Assumption 4.3. The ideal NN weights are bounded by known positive constants [126], i.e. $\|W\| \le \bar W$ and $\|V\| \le \bar V$.

Assumption 4.4. The NN function reconstruction error and its derivative are bounded [126], i.e. $\|\varepsilon\| \le \bar\varepsilon$ and $\|\varepsilon'\| \le \bar\varepsilon'$.

4.3 System Identification

For the dynamics given in Eq. 4-1, the following assumptions about the system will be utilized in the subsequent development.

Assumption 4.5. The input matrices $g_1(x)$ and $g_2(x)$ are known and bounded, i.e. $\|g_1\| \le \bar g_1$ and $\|g_2\| \le \bar g_2$, where $\bar g_1$ and $\bar g_2$ are known constants.

Assumption 4.6. The inputs $u_1$ and $u_2$ are bounded, i.e. $u_1, u_2 \in L_\infty$.

Using Assumption 4-1, the nonlinear system in Eq. 4-1 can be represented using a multi-layer NN as

$$\dot x = F_u(x, u_1, u_2) = W_f^T \sigma_f(V_f^T x) + \varepsilon_f(x) + g_1(x) u_1 + g_2(x) u_2 \tag{4-14}$$

where $W_f \in R^{(N_f+1) \times n}$ and $V_f \in R^{n \times N_f}$ are unknown ideal NN weight matrices, with $N_f$ representing the number of neurons in the hidden layer. The activation function is given by $\sigma_f = \sigma(V_f^T x) \in R^{N_f}$, and $\varepsilon_f(x) \in R^n$ is the function reconstruction error in approximating the function $f(x)$. The proposed multi-layer dynamic neural network (MLDNN) used to identify the system in Eq. 4-1 is

$$\dot{\hat x} = \hat F_u(x, \hat x, u_1, u_2) = \hat W_f^T \hat\sigma_f + g_1(x) u_1 + g_2(x) u_2 + \mu, \tag{4-15}$$

where $\hat x(t) \in R^n$ is the state of the MLDNN, $\hat W_f \in R^{(N_f+1) \times n}$ and $\hat V_f \in R^{n \times N_f}$ are the estimates of the ideal weights of the NNs, $\hat\sigma_f \triangleq \sigma(\hat V_f^T \hat x)$, and $\mu(t) \in R^n$ denotes the RISE feedback term defined as

$$\mu \triangleq k(\tilde x(t) - \tilde x(0)) + \nu \tag{4-16}$$


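where the measurable identification error $\tilde x(t) \in R^n$ is defined as

$$\tilde x \triangleq x - \hat x, \tag{4-17}$$

and $\nu(t) \in R^n$ is the generalized solution to

$$\dot\nu = (k\alpha + \gamma_f)\tilde x + \beta_1 \mathrm{sgn}(\tilde x), \qquad \nu(0) = 0,$$

where $k, \alpha, \gamma_f, \beta_1 \in R$ are positive constant gains, and $\mathrm{sgn}(\cdot)$ denotes a vector signum function. The identification error dynamics are developed by taking the time derivative of Eq. 4-17 and substituting for Eqs. 4-14 and 4-15 as

$$\dot{\tilde x} = \tilde F_u(x, \hat x, u_1, u_2) = W_f^T \sigma_f - \hat W_f^T \hat\sigma_f + \varepsilon_f(x) - \mu, \tag{4-18}$$

where $\tilde F_u(x, \hat x, u_1, u_2) = F_u(x, u_1, u_2) - \hat F_u(x, \hat x, u_1, u_2) \in R^n$. A filtered identification error is defined as

$$r \triangleq \dot{\tilde x} + \alpha \tilde x. \tag{4-19}$$

Taking the time derivative of Eq. 4-19 and using Eq. 4-18 yields

$$\dot r = W_f^T \sigma_f' V_f^T \dot x - \dot{\hat W}_f^T \hat\sigma_f - \hat W_f^T \hat\sigma_f' \dot{\hat V}_f^T \hat x - \hat W_f^T \hat\sigma_f' \hat V_f^T \dot{\hat x} + \dot\varepsilon_f(x) - k r - \gamma_f \tilde x - \beta_1 \mathrm{sgn}(\tilde x) + \alpha \dot{\tilde x}. \tag{4-20}$$

The weight update laws for the DNN in Eq. 4-15 are developed based on the subsequent stability analysis as

$$\dot{\hat W}_f = \mathrm{proj}(\Gamma_{wf} \hat\sigma_f' \hat V_f^T \dot{\hat x} \tilde x^T), \qquad \dot{\hat V}_f = \mathrm{proj}(\Gamma_{vf} \dot{\hat x} \tilde x^T \hat W_f^T \hat\sigma_f') \tag{4-21}$$

where $\mathrm{proj}(\cdot)$ is a smooth projection operator [127], [128], and $\Gamma_{wf} \in R^{(L_f+1) \times (L_f+1)}$, $\Gamma_{vf} \in R^{n \times n}$ are positive constant adaptation gain matrices. Adding and subtracting $\frac{1}{2} W_f^T \hat\sigma_f' \hat V_f^T \dot{\hat x} + \frac{1}{2} \hat W_f^T \hat\sigma_f' V_f^T \dot{\hat x}$, and grouping similar terms, the expression in Eq. 4-20 can be rewritten as

$$\dot r = \tilde N + N_{B1} + \hat N_{B2} - k r - \gamma_f \tilde x - \beta_1 \mathrm{sgn}(\tilde x) \tag{4-22}$$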


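where the auxiliary signals $\tilde N(x, \tilde x, r, \hat W_f, \hat V_f, t)$, $N_{B1}(x, \hat x, \hat W_f, \hat V_f, t)$, and $\hat N_{B2}(\hat x, \dot{\hat x}, \hat W_f, \hat V_f, t) \in R^n$ in Eq. 4-22 are defined as

$$\tilde N \triangleq \alpha \dot{\tilde x} - \dot{\hat W}_f^T \hat\sigma_f - \hat W_f^T \hat\sigma_f' \dot{\hat V}_f^T \hat x + \frac{1}{2} W_f^T \hat\sigma_f' \hat V_f^T \dot{\tilde x} + \frac{1}{2} \hat W_f^T \hat\sigma_f' V_f^T \dot{\tilde x}, \tag{4-23}$$

$$N_{B1} \triangleq W_f^T \sigma_f' V_f^T \dot x - \frac{1}{2} W_f^T \hat\sigma_f' \hat V_f^T \dot x - \frac{1}{2} \hat W_f^T \hat\sigma_f' V_f^T \dot x + \dot\varepsilon_f(x), \tag{4-24}$$

$$\hat N_{B2} \triangleq \frac{1}{2} \tilde W_f^T \hat\sigma_f' \hat V_f^T \dot{\hat x} + \frac{1}{2} \hat W_f^T \hat\sigma_f' \tilde V_f^T \dot{\hat x}. \tag{4-25}$$

To facilitate the subsequent stability analysis, an auxiliary term $N_{B2}(\hat x, \dot x, \hat W_f, \hat V_f, t) \in R^n$ is defined by replacing $\dot{\hat x}(t)$ in $\hat N_{B2}(\cdot)$ by $\dot x(t)$, and $\tilde N_{B2}(\hat x, \dot{\tilde x}, \hat W_f, \hat V_f, t) \triangleq \hat N_{B2}(\cdot) - N_{B2}(\cdot)$. The terms $N_{B1}(\cdot)$ and $N_{B2}(\cdot)$ are grouped as $N_B \triangleq N_{B1} + N_{B2}$. Using Assumptions 4-2, 4-3, 4-4, and 4-6, and Eqs. 4-19, 4-21, 4-24, and 4-25, the following bounds can be obtained

$$\|\tilde N\| \le \rho_1(\|z\|)\|z\|, \tag{4-26}$$

$$\|N_{B1}\| \le \zeta_1, \qquad \|N_{B2}\| \le \zeta_2, \qquad \|\dot N_B\| \le \zeta_3 + \zeta_4 \rho_2(\|z\|)\|z\|, \tag{4-27}$$

$$\|\dot{\tilde x}^T \tilde N_{B2}\| \le \zeta_5 \|\tilde x\|^2 + \zeta_6 \|r\|^2, \tag{4-28}$$

where $z \triangleq [\tilde x^T \ r^T]^T \in R^{2n}$, $\rho_1(\cdot), \rho_2(\cdot) \in R$ are positive, globally invertible, non-decreasing functions, and $\zeta_i \in R$, $i = 1, \ldots, 6$ are computable positive constants. To facilitate the subsequent stability analysis, let $D \subset R^{2n+2}$ be a domain containing $y(t) = 0$, where $y(t) \in R^{2n+2}$ is defined as

$$y \triangleq [\tilde x^T \ r^T \ \sqrt{P} \ \sqrt{Q_f}]^T \tag{4-29}$$

where the auxiliary function $P(t) \in R$ is the generalized solution to the differential equation [129]

$$\dot P = -L, \qquad P(0) = \beta_1 \sum_{i=1}^{n} |\tilde x_i(0)| - \tilde x^T(0) N_B(0) \tag{4-30}$$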


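where the auxiliary function $L(t) \in R$ is defined as

$$L \triangleq r^T (N_{B1} - \beta_1 \mathrm{sgn}(\tilde x)) + \dot{\tilde x}^T N_{B2} - \beta_2 \rho_2(\|z\|)\|z\|\|\tilde x\| \tag{4-31}$$

where $\beta_1, \beta_2 \in R$ are chosen according to the following sufficient conditions, such that $P(t) \ge 0$:

$$\beta_1 > \max\left(\zeta_1 + \zeta_2,\ \zeta_1 + \frac{\zeta_3}{\alpha}\right), \qquad \beta_2 > \zeta_4. \tag{4-32}$$

The auxiliary function $Q_f(\tilde W_f, \tilde V_f) \in R$ in Eq. 4-29 is defined as

$$Q_f \triangleq \frac{1}{4}\alpha \left[ \mathrm{tr}(\tilde W_f^T \Gamma_{wf}^{-1} \tilde W_f) + \mathrm{tr}(\tilde V_f^T \Gamma_{vf}^{-1} \tilde V_f) \right]$$

where $\mathrm{tr}(\cdot)$ denotes the trace of a matrix.

Theorem 4.1. For the system in Eq. 4-1, the identifier developed in Eq. 4-15 along with its weight update laws in Eq. 4-21 ensures asymptotic identification of the state and its derivative, in the sense that

$$\lim_{t \to \infty} \|\tilde x(t)\| = 0 \quad \text{and} \quad \lim_{t \to \infty} \|\dot{\tilde x}(t)\| = 0$$

provided Assumptions 4-4 through 4-6 hold, and the control gains $k$ and $\gamma_f$ are chosen sufficiently large based on the initial conditions of the states (see the subsequent stability analysis), and satisfy the following sufficient conditions

$$\alpha \gamma_f > \zeta_5, \qquad k > \zeta_6 \tag{4-33}$$

where $\zeta_5$ and $\zeta_6$ are introduced in Eq. 4-28, and $\beta_1, \beta_2$, introduced in Eq. 4-31, are chosen according to the sufficient conditions in Eq. 4-32.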


Proof. Let $V_I(y) : D \to R$ be a Lipschitz continuous regular positive definite function defined as

$$V_I \triangleq \frac{1}{2} r^T r + \frac{1}{2}\gamma_f \tilde x^T \tilde x + P + Q_f \tag{4-34}$$

which satisfies the following inequalities:

$$U_1(y) \le V_I(y) \le U_2(y) \tag{4-35}$$

where $U_1(y), U_2(y) \in R$ are continuous positive definite functions defined as

$$U_1 \triangleq \frac{1}{2}\min(1, \gamma_f)\|y\|^2, \qquad U_2 \triangleq \max(1, \gamma_f)\|y\|^2.$$

From Eqs. 4-18, 4-21, 4-22, and 4-30, the differential equations of the closed-loop system are continuous except in the set $\{y \mid \tilde x = 0\}$. Using Filippov's differential inclusion [116-118], the existence of solutions can be established for $\dot y = f(y)$, where $f(y) \in R^{2n+2}$ denotes the right-hand side of the closed-loop error signals. Under Filippov's framework, a generalized Lyapunov stability theory can be used (see [119-121] for further details) to establish strong stability of the closed-loop system. The generalized time derivative of Eq. 4-34 exists almost everywhere (a.e.), and $\dot V_I(y) \overset{a.e.}{\in} \dot{\tilde V}_I(y)$ where

$$\dot{\tilde V}_I = \bigcap_{\xi \in \partial V_I(y)} \xi^T K\left[ \dot r^T \quad \dot{\tilde x}^T \quad \tfrac{1}{2} P^{-\frac{1}{2}} \dot P \quad \tfrac{1}{2} Q_f^{-\frac{1}{2}} \dot Q_f \right]^T$$

where $\partial V_I$ is the generalized gradient of $V_I$ [119], and $K[\cdot]$ is defined as [120, 121]

$$K[f](y) \triangleq \bigcap_{\delta > 0} \bigcap_{\mu M = 0} \overline{co}\, f(B(y, \delta) - M)$$

where $\bigcap_{\mu M = 0}$ denotes the intersection of all sets $M$ of Lebesgue measure zero, $\overline{co}$ denotes convex closure, and $B(y, \delta) = \{x \in R^{2n+2} \mid \|y - x\| < \delta\}$. Since $V_I(y)$ is a Lipschitz


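continuous regular function,

$$\dot{\tilde V}_I = \nabla V_I^T K\left[ \dot r^T \quad \dot{\tilde x}^T \quad \tfrac{1}{2} P^{-\frac{1}{2}} \dot P \quad \tfrac{1}{2} Q_f^{-\frac{1}{2}} \dot Q_f \right]^T = \left[ r^T \quad \gamma_f \tilde x^T \quad 2P^{\frac{1}{2}} \quad 2Q_f^{\frac{1}{2}} \right] K\left[ \dot r^T \quad \dot{\tilde x}^T \quad \tfrac{1}{2} P^{-\frac{1}{2}} \dot P \quad \tfrac{1}{2} Q_f^{-\frac{1}{2}} \dot Q_f \right]^T.$$

Using the calculus for $K[\cdot]$ from [121], and substituting the dynamics from Eqs. 4-22 and 4-30, yields

$$\begin{aligned}
\dot{\tilde V}_I \subset{} & r^T (\tilde N + N_{B1} + \hat N_{B2} - k r - \beta_1 K[\mathrm{sgn}(\tilde x)] - \gamma_f \tilde x) + \gamma_f \tilde x^T (r - \alpha \tilde x) - r^T (N_{B1} - \beta_1 K[\mathrm{sgn}(\tilde x)]) \\
& - \dot{\tilde x}^T N_{B2} + \beta_2 \rho_2(\|z\|)\|z\|\|\tilde x\| - \frac{1}{2}\alpha \left[ \mathrm{tr}(\tilde W_f^T \Gamma_{wf}^{-1} \dot{\hat W}_f) + \mathrm{tr}(\tilde V_f^T \Gamma_{vf}^{-1} \dot{\hat V}_f) \right] \\
={} & -\alpha \gamma_f \tilde x^T \tilde x - k r^T r + r^T \tilde N + \frac{1}{2}\alpha \tilde x^T \tilde W_f^T \hat\sigma_f' \hat V_f^T \dot{\hat x} + \frac{1}{2}\alpha \tilde x^T \hat W_f^T \hat\sigma_f' \tilde V_f^T \dot{\hat x} + \dot{\tilde x}^T (\hat N_{B2} - N_{B2}) \\
& + \beta_2 \rho_2(\|z\|)\|z\|\|\tilde x\| - \frac{1}{2}\alpha\, \mathrm{tr}(\tilde W_f^T \hat\sigma_f' \hat V_f^T \dot{\hat x} \tilde x^T) - \frac{1}{2}\alpha\, \mathrm{tr}(\tilde V_f^T \dot{\hat x} \tilde x^T \hat W_f^T \hat\sigma_f')
\end{aligned} \tag{4-36}$$

where Eq. 4-21 and the fact that $(r^T - r^T)_i\, \mathrm{SGN}(\tilde x_i) = 0$ are used (the subscript $i$ denotes the $i$th element), where $K[\mathrm{sgn}(\tilde x)] = \mathrm{SGN}(\tilde x)$ [121], such that $\mathrm{SGN}(\tilde x_i) = 1$ if $\tilde x_i > 0$, $[-1, 1]$ if $\tilde x_i = 0$, and $-1$ if $\tilde x_i < 0$. Canceling common terms, substituting for $k \triangleq k_1 + k_2$ and $\gamma_f \triangleq \gamma_1 + \gamma_2$, using Eqs. 4-26 and 4-28, and completing the squares, the expression in Eq. 4-36 can be upper bounded as

$$\dot{\tilde V}_I \le -(\alpha \gamma_1 - \zeta_5)\|\tilde x\|^2 - (k_1 - \zeta_6)\|r\|^2 + \frac{\rho_1(\|z\|)^2}{4 k_2}\|z\|^2 + \frac{\beta_2^2 \rho_2(\|z\|)^2}{4 \alpha \gamma_2}\|z\|^2. \tag{4-37}$$

Provided the sufficient conditions in Eq. 4-33 are satisfied, the expression in Eq. 4-37 can be rewritten as

$$\dot{\tilde V}_I \le -\lambda \|z\|^2 + \frac{\rho(\|z\|)^2}{4\eta}\|z\|^2 \le -U(y) \quad \forall y \in D \tag{4-38}$$

where $\lambda \triangleq \min\{\alpha \gamma_1 - \zeta_5,\ k_1 - \zeta_6\}$, $\rho(\|z\|)^2 \triangleq \rho_1(\|z\|)^2 + \rho_2(\|z\|)^2$, $\eta \triangleq \min\{k_2,\ \alpha \gamma_2 / \beta_2^2\}$, and $U(y) = c\|z\|^2$, for some positive constant $c$, is a continuous, positive semi-definite function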


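defined on the domain

$$D \triangleq \left\{ y(t) \in R^{2n+2} \mid \|y\| \le \rho^{-1}\left(2\sqrt{\lambda \eta}\right) \right\}.$$

The size of the domain $D$ can be increased by increasing the gains $k$ and $\gamma_f$. The result in Eq. 4-38 indicates that $\dot V_I(y) \le -U(y)$ $\forall\, \dot V_I(y) \overset{a.e.}{\in} \dot{\tilde V}_I(y)$, $\forall y \in D$. The inequalities in Eqs. 4-35 and 4-38 can be used to show that $V_I(y) \in L_\infty$ in $D$; hence, $\tilde x(t), r(t) \in L_\infty$ in $D$. Using Eq. 4-19, standard linear analysis can be used to show that $\dot{\tilde x}(t) \in L_\infty$ in $D$, and since $\dot x(t) \in L_\infty$, $\dot{\hat x}(t) \in L_\infty$ in $D$. Since $\hat W_f(t) \in L_\infty$ from the use of projection in Eq. 4-21, $\hat\sigma_f(t) \in L_\infty$ from Assumption 4-2, and $u(t) \in L_\infty$ from Assumption 4-6, $\mu(t) \in L_\infty$ in $D$ from Eq. 4-15. Using the above bounds and the fact that $\hat\sigma_f'(t), \dot\varepsilon_f(t) \in L_\infty$, it can be shown from Eq. 4-20 that $\dot r(t) \in L_\infty$ in $D$. Since $\dot{\tilde x}(t), \dot r(t) \in L_\infty$, the definition of $U(y)$ can be used to show that it is uniformly continuous in $D$. Let $S \subset D$ denote a set defined as

$$S \triangleq \left\{ y(t) \in D \mid U_2(y(t)) < \frac{1}{2}\left( \rho^{-1}\left(2\sqrt{\lambda \eta}\right) \right)^2 \right\}. \tag{4-39}$$

The region of attraction in Eq. 4-39 can be made arbitrarily large to include any initial conditions by increasing the control gain $\eta$ (i.e. a semi-global type of stability result), and hence

$$c\|z\|^2 \to 0 \ \text{as} \ t \to \infty \quad \forall y(0) \in S,$$

and using the definition of $z(t)$ the following result can be shown

$$\|\tilde x(t)\|, \ \|\dot{\tilde x}(t)\|, \ \|r\| \to 0 \ \text{as} \ t \to \infty \quad \forall y(0) \in S.$$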
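The identifier of Eqs. 4-15 through 4-21 can be exercised on a toy problem to see the state estimation error decay. The following is a rough forward-Euler sketch under several assumptions that are not in the original text: the plant (a stable linear drift with a sinusoidal input through a known input vector), the initial conditions, the weight bound, and the use of `np.clip` as a crude stand-in for the smooth projection operator are all hypothetical. The gains default to the values listed in the simulation section.

```python
import numpy as np


def simulate_identifier(t_final=5.0, dt=1e-3, k=100.0, alpha=30.0, gamma_f=5.0,
                        beta1=0.2, n_hidden=5, gain_w=0.2, gain_v=0.2,
                        weight_bound=2.0, seed=0):
    """Forward-Euler sketch of the identifier in Eqs. 4-15 to 4-21.

    Returns the history of ||x - x_hat||. The plant below is a hypothetical
    stand-in, and np.clip replaces the smooth projection operator.
    """
    rng = np.random.default_rng(seed)
    n = 2
    a_mat = np.array([[-1.0, 1.0], [-0.5, -0.5]])   # assumed drift f(x) = A x
    g = np.array([0.0, 1.0])                        # assumed known input matrix

    x = np.array([1.0, -1.0])
    x_hat = np.zeros(n)
    w_hat = rng.uniform(-1.0, 1.0, (n_hidden + 1, n))
    v_hat = rng.uniform(-1.0, 1.0, (n, n_hidden))
    nu = np.zeros(n)
    e0 = x - x_hat
    history = []

    for step in range(int(round(t_final / dt))):
        u = np.sin(step * dt)                       # assumed bounded input
        e = x - x_hat                               # identification error, Eq. 4-17
        history.append(np.linalg.norm(e))

        hidden = np.tanh(v_hat.T @ x_hat)           # symmetric sigmoid
        sigma = np.concatenate(([1.0], hidden))     # bias plus hidden units
        d_sigma = np.vstack((np.zeros(n_hidden), np.diag(1.0 - hidden**2)))

        mu = k * (e - e0) + nu                      # RISE term, Eq. 4-16
        x_hat_dot = w_hat.T @ sigma + g * u + mu    # identifier, Eq. 4-15

        # Weight update laws of Eq. 4-21 with scalar adaptation gains.
        w_dot = gain_w * np.outer(d_sigma @ (v_hat.T @ x_hat_dot), e)
        v_dot = gain_v * np.outer(x_hat_dot, e @ w_hat.T @ d_sigma)
        nu_dot = (k * alpha + gamma_f) * e + beta1 * np.sign(e)

        x = x + dt * (a_mat @ x + g * u)
        x_hat = x_hat + dt * x_hat_dot
        nu = nu + dt * nu_dot
        w_hat = np.clip(w_hat + dt * w_dot, -weight_bound, weight_bound)
        v_hat = np.clip(v_hat + dt * v_dot, -weight_bound, weight_bound)

    return np.array(history)


if __name__ == "__main__":
    errors = simulate_identifier()
    print("initial error:", errors[0], "final error:", errors[-1])
```

With these gains the linear part of the error dynamics behaves like a well-damped second-order system, so the step size `dt` must stay small relative to `1/k`; a fixed-step Euler scheme is used here only to keep the sketch short.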


4.4 Actor-Critic Design

Using Assumption 4-1 and Eq. 4-6, the optimal value function and the optimal controls can be represented by NNs as

$$V^*(x) = W^T \phi(x) + \varepsilon(x), \qquad u_1^*(x) = -\frac{1}{2} R^{-1} g_1^T(x) \left( \phi'(x)^T W + \varepsilon'(x)^T \right), \qquad u_2^*(x) = \frac{1}{2\gamma^2} g_2^T(x) \left( \phi'(x)^T W + \varepsilon'(x)^T \right) \tag{4-40}$$

where $W \in R^N$ is the unknown ideal NN weight, $N$ is the number of neurons, $\phi(x) = [\phi_1(x)\ \phi_2(x)\ \ldots\ \phi_N(x)]^T \in R^N$ are smooth NN activation functions such that $\phi_i(0) = 0$ and $\phi_i'(0) = 0$, $i = 1 \ldots N$, where $\phi'(\cdot)$ denotes the first derivative of the activation functions with respect to their argument, and $\varepsilon(\cdot) \in R$ is the function reconstruction error.

Assumption 4.7. The NN activation functions $\{\phi_j(x) : j = 1 \ldots N\}$ are chosen such that as $N \to \infty$, $\phi(x)$ provides a complete independent basis for $V^*(x)$.

Using Assumption 4-7 and the Weierstrass higher-order approximation theorem, both $V^*(x)$ and $\nabla V^*$ can be uniformly approximated by the NNs in Eq. 4-40, i.e. as $N \to \infty$, the approximation errors $\varepsilon(x), \varepsilon'(x) \to 0$, respectively. The critic $\hat V(x)$ and the actors $\hat u_1(x)$, $\hat u_2(x)$ approximate the optimal value function and the optimal controls in Eq. 4-40, and are given as

$$\hat V(x) = \hat W_c^T \phi(x), \qquad \hat u_1(x) = -\frac{1}{2} R^{-1} g_1^T(x) \phi'(x)^T \hat W_{1a}, \qquad \hat u_2(x) = \frac{1}{2\gamma^2} g_2^T(x) \phi'(x)^T \hat W_{2a} \tag{4-41}$$

where $\hat W_c(t) \in R^N$ and $\hat W_{1a}(t), \hat W_{2a}(t) \in R^N$ are estimates of the ideal weights of the critic and actor NNs, respectively. The weight estimation errors for the critic and actors are defined as $\tilde W_c(t) \triangleq W - \hat W_c(t)$ and $\tilde W_{ia}(t) \triangleq W - \hat W_{ia}(t)$ for $i = 1, 2$, respectively. The actor and critic NN weights are both updated based on the minimization of the Bellman error $\delta_{hjb}(\cdot)$ in Eq. 4-12, which can be rewritten by substituting $\hat V$ from Eq. 4-41 as

$$\delta_{hjb} = \hat W_c^T \phi' \hat F_{\hat u} + r(x, \hat u_1, \hat u_2) = \hat W_c^T \omega + r(x, \hat u_1, \hat u_2) \tag{4-42}$$


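where $\omega(x, \hat u_1, \hat u_2, t) \triangleq \phi' \hat F_{\hat u} \in R^N$ is the critic NN regressor vector.

Least Squares Update for the Critic. Consider the integral squared Bellman error $E_c(\hat W_c(t), t)$

$$E_c = \int_0^t \delta_{hjb}^2(\tau)\, d\tau. \tag{4-43}$$

The LS update law for the critic $\hat W_c(t)$ is generated by minimizing the total prediction error in Eq. 4-43:

$$\frac{\partial E_c}{\partial \hat W_c} = 2\int_0^t \delta_{hjb}(\tau) \frac{\partial \delta_{hjb}(\tau)}{\partial \hat W_c(\tau)}\, d\tau = 0$$

$$\hat W_c^T(t) \int_0^t \omega(\tau)\omega(\tau)^T d\tau + \int_0^t \omega(\tau)^T r(\tau)\, d\tau = 0$$

$$\hat W_c(t) = -\left( \int_0^t \omega(\tau)\omega(\tau)^T d\tau \right)^{-1} \int_0^t \omega(\tau) r(\tau)\, d\tau,$$

which gives the LS estimate of the critic weights, provided the inverse $\left( \int_0^t \omega(\tau)\omega(\tau)^T d\tau \right)^{-1}$ exists. The recursive formulation of the normalized LS algorithm [130] gives the update law for the critic weight as

$$\dot{\hat W}_c = -\eta_c \Gamma_c \frac{\omega}{1 + \nu \omega^T \Gamma_c \omega} \delta_{hjb} \tag{4-44}$$

where $\nu, \eta_c \in R$ are constant positive gains and $\Gamma_c(t) \triangleq \left( \int_0^t \omega(\tau)\omega(\tau)^T d\tau \right)^{-1} \in R^{N \times N}$ is a symmetric estimation gain matrix generated by

$$\dot\Gamma_c = -\eta_c \Gamma_c \frac{\omega \omega^T}{1 + \nu \omega^T \Gamma_c \omega} \Gamma_c; \qquad \Gamma_c(t_r^+) = \Gamma_c(0) = \varphi_0 I, \tag{4-45}$$

where $t_r^+$ is the resetting time at which $\lambda_{\min}\{\Gamma_c(t)\} \le \varphi_1$, and $\varphi_0 > \varphi_1 > 0$. The covariance resetting ensures that $\Gamma_c(t)$ is positive-definite for all time and prevents it from becoming arbitrarily small in some directions, which would make adaptation in those directions very slow (also called the covariance wind-up problem) [130]. From Eq. 4-45 it is clear that $\dot\Gamma_c \le 0$,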


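which means that the covariance matrix $\Gamma_c(t)$ can be bounded as follows

$$\varphi_1 I \le \Gamma_c \le \varphi_0 I. \tag{4-46}$$

Gradient Update for the Actor. The actor update, like the critic update in Section 4.4, is based on the minimization of the Bellman error $\delta_{hjb}(\cdot)$. However, unlike the critic weights, the actor weights appear nonlinearly in $\delta_{hjb}(\cdot)$, making it problematic to develop an LS update law. Hence, a gradient update law is developed for the actor which minimizes the squared Bellman error $E_a(t) \triangleq \delta_{hjb}^2$, whose gradients are given as

$$\frac{\partial E_a}{\partial \hat W_{1a}} = (\hat W_{1a} - \hat W_c)^T \phi' G_1 (\phi')^T \delta_{hjb} \tag{4-47}$$

$$\frac{\partial E_a}{\partial \hat W_{2a}} = -(\hat W_{2a} + \hat W_c)^T \phi' G_2 (\phi')^T \delta_{hjb} \tag{4-48}$$

where $G_1 \triangleq g_1 R^{-1} g_1^T \in R^{n \times n}$ and $G_2 \triangleq \gamma^{-2} g_2 g_2^T \in R^{n \times n}$ are symmetric matrices. Using Eq. 4-47, the actor NNs are updated as

$$\dot{\hat W}_{1a} = \mathrm{proj}\left\{ -\frac{\Gamma_{11a}}{\sqrt{1 + \omega^T \omega}} \phi' G_1 (\phi')^T (\hat W_{1a} - \hat W_c) \delta_{hjb} - \Gamma_{12a} (\hat W_{1a} - \hat W_c) \right\} \tag{4-49}$$

$$\dot{\hat W}_{2a} = \mathrm{proj}\left\{ -\frac{\Gamma_{21a}}{\sqrt{1 + \omega^T \omega}} \phi' G_2 (\phi')^T (\hat W_{2a} + \hat W_c) \delta_{hjb} - \Gamma_{22a} (\hat W_{2a} - \hat W_c) \right\} \tag{4-50}$$

where $\Gamma_{i1a}, \Gamma_{i2a} \in R$ for $i = 1, 2$ are positive adaptation gains, and $\mathrm{proj}\{\cdot\}$ is a projection operator used to bound the weight estimates [127], [128]. Using Assumption 4-3 and the projection algorithm in Eq. 4-49, the actor NN weight estimation errors can be bounded as

$$\|\tilde W_{1a}\| \le \kappa_1, \qquad \|\tilde W_{2a}\| \le \kappa_2 \tag{4-51}$$

where $\kappa_1, \kappa_2 \in R$ are some positive constants. The first term in Eq. 4-49 is normalized and the last term is added as feedback for stability (based on the subsequent stability analysis).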
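The critic part of the design (Eqs. 4-42, 4-44, and 4-45, with covariance resetting) can be sketched as a single discrete step. This is a minimal illustration under stated assumptions rather than the implementation used for the reported results: it uses forward Euler, takes $R = 1$, $\gamma = 8$, $Q = I$ and the polynomial basis from the simulation section, assumes the input gains depend on $x_1$, and treats the identifier's drift estimate `f_hat` as a given vector. The names `critic_step`, `regressor_and_cost`, `cov_reset`, and `cov_floor` are hypothetical.

```python
import numpy as np

GAMMA = 8.0  # attenuation level taken from the simulation section


def grad_phi(x):
    """Jacobian of the basis phi(x) = [x1^2, x2^2, x1^4, x2^4]; shape (4, 2)."""
    x1, x2 = x
    return np.array([[2 * x1, 0.0], [0.0, 2 * x2],
                     [4 * x1**3, 0.0], [0.0, 4 * x2**3]])


def g1(x):
    return np.array([0.0, np.cos(2.0 * x[0]) + 2.0])   # assumed to depend on x1


def g2(x):
    return np.array([0.0, np.sin(4.0 * x[0]) + 2.0])


def regressor_and_cost(x, f_hat, w_1a, w_2a):
    """Actor outputs (Eq. 4-41 with R = 1), regressor omega, and local cost."""
    d_phi = grad_phi(x)
    u1 = -0.5 * g1(x) @ d_phi.T @ w_1a
    u2 = 0.5 / GAMMA**2 * g2(x) @ d_phi.T @ w_2a
    f_u_hat = f_hat + g1(x) * u1 + g2(x) * u2           # identifier-based dynamics
    omega = d_phi @ f_u_hat
    cost = x @ x + u1**2 - GAMMA**2 * u2**2             # Q = I, R = 1
    return omega, cost


def critic_step(omega, cost, w_c, cov, dt, eta_c=1.0, nu=0.005,
                cov_reset=1.0, cov_floor=1e-3):
    """One Euler step of the normalized least-squares critic update."""
    delta = w_c @ omega + cost                          # Bellman error, Eq. 4-42
    norm = 1.0 + nu * omega @ cov @ omega
    w_c_new = w_c - dt * eta_c * (cov @ omega) / norm * delta                    # Eq. 4-44
    cov_new = cov - dt * eta_c * (cov @ np.outer(omega, omega) @ cov) / norm     # Eq. 4-45
    cov_new = 0.5 * (cov_new + cov_new.T)               # keep symmetry numerically
    if np.linalg.eigvalsh(cov_new).min() <= cov_floor:  # covariance resetting
        cov_new = cov_reset * np.eye(len(w_c))
    return w_c_new, cov_new, delta


if __name__ == "__main__":
    x = np.array([3.0, -1.0])
    f_hat = np.array([-4.0, -28.0])                     # placeholder drift estimate
    omega, cost = regressor_and_cost(x, f_hat, np.zeros(4), np.zeros(4))
    w_c, cov, delta = critic_step(omega, cost, np.zeros(4), np.eye(4), dt=1e-3)
    print("Bellman error before:", delta, "after:", w_c @ omega + cost)
```

Because of the normalization by $1 + \nu \omega^T \Gamma_c \omega$, the per-step reduction of the Bellman error is bounded by `dt * eta_c / nu` regardless of how large the regressor is, which is what keeps the explicit step well behaved here.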


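4.5 Stability Analysis

The dynamics of the critic weight estimation error $\tilde W_c(t)$ can be developed using Eqs. 4-9, 4-12, 4-42, and 4-44, as

$$\begin{aligned}
\dot{\tilde W}_c = \eta_c \Gamma_c \frac{\omega}{1 + \nu \omega^T \Gamma_c \omega} \Big[ & -\tilde W_c^T \omega - W^T \phi' \tilde F_{\hat u} - \varepsilon' F_{u^*} + \hat u_1^T R \hat u_1 - u_1^{*T} R u_1^* \\
& + W^T \phi' \left( g_1(\hat u_1 - u_1^*) + g_2(\hat u_2 - u_2^*) \right) - \gamma^2 \hat u_2^T \hat u_2 + \gamma^2 u_2^{*T} u_2^* \Big].
\end{aligned} \tag{4-52}$$

Substituting for $(u_1^*(x), u_2^*(x))$ and $(\hat u_1(x), \hat u_2(x))$ from Eqs. 4-40 and 4-41, respectively, in Eq. 4-52 yields

$$\begin{aligned}
\dot{\tilde W}_c = -\eta_c \Gamma_c \psi \psi^T \tilde W_c + \eta_c \Gamma_c \frac{\omega}{1 + \nu \omega^T \Gamma_c \omega} \Big[ & -W^T \phi' \tilde F_{\hat u} + \frac{1}{4} \tilde W_{1a}^T \phi' G_1 \phi'^T \tilde W_{1a} \\
& + \frac{1}{4} \tilde W_{2a}^T \phi' G_2 \phi'^T \tilde W_{2a} - \frac{1}{4} \varepsilon' (G_1 - G_2) \varepsilon'^T - \varepsilon' F_{u^*} \Big]
\end{aligned} \tag{4-53}$$

where $\psi(t) \triangleq \dfrac{\omega(t)}{\sqrt{1 + \nu \omega(t)^T \Gamma_c(t) \omega(t)}} \in R^N$ is the normalized critic regressor vector, bounded as

$$\|\psi\| \le \frac{1}{\sqrt{\nu \varphi_1}} \tag{4-54}$$

where $\varphi_1$ is introduced in Eq. 4-46. The error system in Eq. 4-53 can be represented as the following perturbed system

$$\dot{\tilde W}_c = \Omega + \Delta \tag{4-55}$$

where $\Omega(\tilde W_c, t) \triangleq -\eta_c \Gamma_c \psi \psi^T \tilde W_c \in R^N$ denotes the nominal system, and

$$\Delta(t) \triangleq \eta_c \Gamma_c \frac{\omega}{1 + \nu \omega^T \Gamma_c \omega} \left[ -W^T \phi' \tilde F_{\hat u} + \frac{1}{4} \tilde W_{1a}^T \phi' G_1 \phi'^T \tilde W_{1a} + \frac{1}{4} \tilde W_{2a}^T \phi' G_2 \phi'^T \tilde W_{2a} - \frac{1}{4} \varepsilon' (G_1 - G_2) \varepsilon'^T - \varepsilon' F_{u^*} \right]$$

denotes the perturbations. Using Theorem 2.5.1 in [130], it can be shown that the nominal system

$$\dot{\tilde W}_c = -\eta_c \Gamma_c \psi \psi^T \tilde W_c \tag{4-56}$$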


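is exponentially stable, if the bounded signal $\psi$ is PE, i.e.

$$\mu_2 I \ge \int_{t_0}^{t_0 + \delta} \psi(\tau) \psi(\tau)^T d\tau \ge \mu_1 I \quad \forall t_0 \ge 0$$

where $\mu_1, \mu_2, \delta \in R$ are some positive constants. Since $\Omega(\tilde W_c, t)$ is continuously differentiable and the Jacobian $\frac{\partial \Omega}{\partial \tilde W_c} = -\eta_c \Gamma_c \psi \psi^T$ is bounded for the exponentially stable system in Eq. 4-56, the converse Lyapunov Theorem 4.14 in [131] can be used to show that there exists a function $V_c : R^N \times [0, \infty) \to R$, which satisfies the following inequalities

$$\begin{aligned}
& c_1 \|\tilde W_c\|^2 \le V_c(\tilde W_c, t) \le c_2 \|\tilde W_c\|^2 \\
& \frac{\partial V_c}{\partial t} + \frac{\partial V_c}{\partial \tilde W_c} \Omega(\tilde W_c, t) \le -c_3 \|\tilde W_c\|^2 \\
& \left\| \frac{\partial V_c}{\partial \tilde W_c} \right\| \le c_4 \|\tilde W_c\|
\end{aligned} \tag{4-57}$$

for some positive constants $c_1, c_2, c_3, c_4 \in R$. Using Assumptions 4-1 through 4-3, and 4-5 through 4-7, the projection bounds in Eq. 4-49, the fact that $F_{u^*} \in L_\infty$ (since $(u_1^*(x), u_2^*(x))$ is stabilizing), and provided the conditions of Theorem 4-1 hold (required to prove that $\tilde F_{\hat u} \in L_\infty$), the following bounds are developed to facilitate the subsequent stability proof

$$\begin{aligned}
& \|\tilde W_{1a}\| \le \kappa_1, \qquad \|\tilde W_{2a}\| \le \kappa_2 \\
& \left\| \frac{1}{4} \tilde W_{1a}^T \phi' G_1 \phi'^T \tilde W_{1a} + \frac{1}{4} \tilde W_{2a}^T \phi' G_2 \phi'^T \tilde W_{2a} \right\| \le \kappa_{3a} \\
& \left\| -W^T \phi' \tilde F_{\hat u} - \frac{1}{4} \varepsilon' (G_1 - G_2) \varepsilon'^T - \varepsilon' F_{u^*} \right\| \le \kappa_{3b} \\
& \left\| \frac{1}{2} (W^T \phi' + \varepsilon') \left( (G_1 + G_2) \varepsilon'^T + G_1 \phi'^T \tilde W_{1a} + G_2 \phi'^T \tilde W_{2a} \right) \right\| \le \kappa_4 \\
& \|\phi' G_1 \phi'^T\| \le \kappa_5, \qquad \|\phi' G_2 \phi'^T\| \le \kappa_6
\end{aligned} \tag{4-58}$$

where $\kappa_3 \triangleq \kappa_{3a} + \kappa_{3b}$ and $\kappa_j \in R$ for $j = 1, \ldots, 6$ are computable positive constants.

Theorem 4.2. If Assumptions 4-1 through 4-7 hold, the regressor $\psi(t) \triangleq \dfrac{\omega}{\sqrt{1 + \nu \omega^T \Gamma_c \omega}}$ is PE, and provided Eq. 4-32, Eq. 4-33 and the following sufficient gain conditions are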


satisfied

$$c_3 > \Gamma_{11a} \kappa_1 \kappa_5 + \Gamma_{21a} \kappa_2 \kappa_6, \qquad \Gamma_{22a} > \frac{\kappa_6}{4}$$

where $\Gamma_{11a}, \Gamma_{21a}, \Gamma_{22a}, c_3, \kappa_1, \kappa_2, \kappa_5$ and $\kappa_6$ are introduced in Eqs. 4-49, 4-57, and 4-58, then the controller in Eq. 4-41, the actor-critic weight update laws in Eqs. 4-44, 4-45, and 4-49, and the identifier in Eqs. 4-15 and 4-21 guarantee that the state of the system $x(t)$ and the actor-critic weight estimation errors $\tilde W_{1a}(t)$, $\tilde W_{2a}(t)$ and $\tilde W_c(t)$ are UUB.

Proof. To investigate the stability of the system in Eq. 4-1 with control $(\hat u_1, \hat u_2)$, and the perturbed system in Eq. 4-55, consider $V_L : X \times R^N \times R^N \times R^N \times [0, \infty) \to R$ as the continuously differentiable, positive-definite Lyapunov function candidate, given as

$$V_L(x, \tilde W_c, \tilde W_{1a}, \tilde W_{2a}, t) \triangleq V^*(x) + V_c(\tilde W_c, t) + \frac{1}{2} \tilde W_{1a}^T \tilde W_{1a} + \frac{1}{2} \tilde W_{2a}^T \tilde W_{2a}$$

where $V^*(x)$ (the optimal value function for Eq. 4-1) is the Lyapunov function for Eq. 4-1, and $V_c(\tilde W_c, t)$ is the Lyapunov function for the exponentially stable system in Eq. 4-56. Since $V^*(x)$ is continuously differentiable and positive-definite from Eq. 4-3, from Lemma 4.3 in [131], there exist class $K$ functions $\alpha_1$ and $\alpha_2$ defined on $[0, r]$, where $B_r \subset X$, such that

$$\alpha_1(\|x\|) \le V^*(x) \le \alpha_2(\|x\|) \quad \forall x \in B_r. \tag{4-59}$$

Using Eqs. 4-57 and 4-59, $V_L(x, \tilde W_c, \tilde W_{1a}, \tilde W_{2a}, t)$ can be bounded as

$$\alpha_1(\|x\|) + c_1 \|\tilde W_c\|^2 + \frac{1}{2}\left( \|\tilde W_{1a}\|^2 + \|\tilde W_{2a}\|^2 \right) \le V_L \le \alpha_2(\|x\|) + c_2 \|\tilde W_c\|^2 + \frac{1}{2}\left( \|\tilde W_{1a}\|^2 + \|\tilde W_{2a}\|^2 \right)$$

which can be written as

$$\alpha_3(\|w\|) \le V_L(x, \tilde W_c, \tilde W_{1a}, \tilde W_{2a}, t) \le \alpha_4(\|w\|) \quad \forall w \in B_s$$

where $w(t) \triangleq [x(t)^T \ \tilde W_c(t)^T \ \tilde W_{1a}(t)^T \ \tilde W_{2a}(t)^T]^T \in R^{n+3N}$, and $\alpha_3$ and $\alpha_4$ are class $K$ functions defined on $[0, s]$, where $B_s \subset X \times R^N \times R^N \times R^N$. Taking the time derivative of $V_L(\cdot)$


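yields

$$\dot V_L = \nabla V^* (f + g_1 \hat u_1 + g_2 \hat u_2) + \frac{\partial V_c}{\partial t} + \frac{\partial V_c}{\partial \tilde W_c} \Omega + \frac{\partial V_c}{\partial \tilde W_c} \Delta - \tilde W_{1a}^T \dot{\hat W}_{1a} - \tilde W_{2a}^T \dot{\hat W}_{2a} \tag{4-60}$$

where the time derivative of $V^*(\cdot)$ is taken along the trajectories of the system in Eq. 4-1 with control inputs $(\hat u_1(\cdot), \hat u_2(\cdot))$ and the time derivative of $V_c(\cdot)$ is taken along the trajectories of the perturbed system in Eq. 4-55. Using the HJI equation in Eq. 4-10,

$$\nabla V^* f = -\nabla V^* (g_1 u_1^* + g_2 u_2^*) - Q(x) - u_1^{*T} R u_1^* + \gamma^2 u_2^{*T} u_2^*.$$

Substituting for the $\nabla V^* f$ terms in Eq. 4-60, using the fact that $\nabla V^* g_1 = -2 u_1^{*T} R$ and $\nabla V^* g_2 = 2\gamma^2 u_2^{*T}$ from Eqs. 4-6 and 4-7, and using Eqs. 4-49 and 4-57, Eq. 4-60 can be upper bounded as

$$\begin{aligned}
\dot V_L \le{} & -Q - u_1^{*T} R u_1^* + \gamma^2 u_2^{*T} u_2^* - c_3 \|\tilde W_c\|^2 + c_4 \|\tilde W_c\| \|\Delta\| + 2 u_1^{*T} R (u_1^* - \hat u_1) - 2\gamma^2 u_2^{*T} (u_2^* - \hat u_2) \\
& + \tilde W_{1a}^T \left[ \frac{\Gamma_{11a}}{\sqrt{1 + \omega^T \omega}} \phi' G_1 (\phi')^T (\hat W_{1a} - \hat W_c) \delta_{hjb} + \Gamma_{12a} (\hat W_{1a} - \hat W_c) \right] \\
& + \tilde W_{2a}^T \left[ \frac{\Gamma_{21a}}{\sqrt{1 + \omega^T \omega}} \phi' G_2 (\phi')^T (\hat W_{2a} + \hat W_c) \delta_{hjb} + \Gamma_{22a} (\hat W_{2a} - \hat W_c) \right].
\end{aligned} \tag{4-61}$$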


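Substituting for $u^*$, $\hat u$, $\delta_{hjb}$, and $\Delta$ using Eqs. 4-6, 4-7, 4-41, 4-52, and 4-55, respectively, and using Eqs. 4-46 and 4-54 in Eq. 4-61, yields

$$\begin{aligned}
\dot V_L \le{} & -Q - c_3 \|\tilde W_c\|^2 - \left( \Gamma_{22a} - \frac{1}{4} \|\phi' G_2 \phi'^T\| \right) \|\tilde W_{2a}\|^2 - \Gamma_{12a} \|\tilde W_{1a}\|^2 \\
& + \frac{1}{2} (W^T \phi' + \varepsilon') \left( (G_1 + G_2) \varepsilon'^T + G_1 \phi'^T \tilde W_{1a} + G_2 \phi'^T \tilde W_{2a} \right) \\
& + c_4 \frac{\eta_c \varphi_0}{2\sqrt{\nu \varphi_1}} \left\| \Theta \right\| \|\tilde W_c\| \\
& + \frac{\Gamma_{11a}}{\sqrt{1 + \omega^T \omega}} \tilde W_{1a}^T \phi' G_1 \phi'^T (\tilde W_c - \tilde W_{1a}) \left( -\tilde W_c^T \omega + \Theta \right) \\
& + \frac{\Gamma_{21a}}{\sqrt{1 + \omega^T \omega}} \tilde W_{2a}^T \phi' G_2 \phi'^T (-\tilde W_c - \tilde W_{2a} + 2W) \left( -\tilde W_c^T \omega + \Theta \right) \\
& + \left( \Gamma_{12a} \|\tilde W_{1a}\| + \Gamma_{22a} \|\tilde W_{2a}\| \right) \|\tilde W_c\|
\end{aligned} \tag{4-62}$$

where, to shorten the display, $\Theta$ abbreviates the bracketed perturbation term of Eq. 4-53,

$$\Theta \triangleq -W^T \phi' \tilde F_{\hat u} + \frac{1}{4} \tilde W_{1a}^T \phi' G_1 \phi'^T \tilde W_{1a} + \frac{1}{4} \tilde W_{2a}^T \phi' G_2 \phi'^T \tilde W_{2a} - \frac{1}{4} \varepsilon' (G_1 - G_2) \varepsilon'^T - \varepsilon' F_{u^*}.$$

Using the bounds developed in Eq. 4-58 and Assumption 4-3, Eq. 4-62 can be further upper bounded as

$$\begin{aligned}
\dot V_L \le{} & -Q - (c_3 - \Gamma_{11a} \kappa_1 \kappa_5 - \Gamma_{21a} \kappa_2 \kappa_6) \|\tilde W_c\|^2 - \Gamma_{12a} \|\tilde W_{1a}\|^2 - \left( \Gamma_{22a} - \frac{1}{4} \kappa_6 \right) \|\tilde W_{2a}\|^2 \\
& + \varsigma \|\tilde W_c\| + \Gamma_{11a} \kappa_1^2 \kappa_5 \kappa_3 + \Gamma_{21a} (\kappa_1^2 + 2\bar W) \kappa_6 \kappa_3 + \kappa_4
\end{aligned}$$

where

$$\varsigma \triangleq \frac{c_4 \eta_c \varphi_0}{2\sqrt{\nu \varphi_1}} \kappa_3 + (\Gamma_{11a} \kappa_1 \kappa_5 + \Gamma_{21a} \kappa_2 \kappa_6) \kappa_3 + \Gamma_{11a} \kappa_1^2 \kappa_5 + \Gamma_{21a} (\kappa_1 + 2\bar W) \kappa_1 \kappa_5 + \Gamma_{12a} \kappa_1 + \Gamma_{22a} \kappa_2.$$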


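Provided $c_3 > \Gamma_{11a} \kappa_1 \kappa_5 + \Gamma_{21a} \kappa_2 \kappa_6$ and $4\Gamma_{22a} > \kappa_6$, completing the square yields

$$\begin{aligned}
\dot V_L \le{} & -Q - (1 - \theta_1)(c_3 - \Gamma_{11a} \kappa_1 \kappa_5 - \Gamma_{21a} \kappa_2 \kappa_6) \|\tilde W_c\|^2 - \Gamma_{12a} \|\tilde W_{1a}\|^2 - \left( \Gamma_{22a} - \frac{1}{4} \kappa_6 \right) \|\tilde W_{2a}\|^2 \\
& + \frac{\varsigma^2}{4\theta_1 (c_3 - \Gamma_{11a} \kappa_1 \kappa_5 - \Gamma_{21a} \kappa_2 \kappa_6)} + \Gamma_{11a} \kappa_1^2 \kappa_5 \kappa_3 + \Gamma_{21a} (\kappa_1^2 + 2\bar W) \kappa_6 \kappa_3 + \kappa_4
\end{aligned} \tag{4-63}$$

where $\theta_1 \in (0, 1)$ and $\varsigma$ denotes the coefficient of $\|\tilde W_c\|$ in the preceding bound. Since $Q(x)$ is positive definite, according to Lemma 4.3 in [131], there exist class $K$ functions $\alpha_5$ and $\alpha_6$ such that

$$\alpha_5(\|w\|) \le F(\|w\|) \le \alpha_6(\|w\|) \quad \forall w \in B_s \tag{4-64}$$

where

$$F(\|w\|) = Q + (1 - \theta_1)(c_3 - \Gamma_{11a} \kappa_1 \kappa_5 - \Gamma_{21a} \kappa_2 \kappa_6) \|\tilde W_c\|^2 + \left( \Gamma_{22a} - \frac{1}{4} \kappa_6 \right) \|\tilde W_{2a}\|^2 + \Gamma_{12a} \|\tilde W_{1a}\|^2.$$

Using Eq. 4-64, the expression in Eq. 4-63 can be further upper bounded as $\dot V_L \le -\alpha_5(\|w\|) + \Upsilon$, where

$$\Upsilon = \frac{\varsigma^2}{4\theta_1 (c_3 - \Gamma_{11a} \kappa_1 \kappa_5 - \Gamma_{21a} \kappa_2 \kappa_6)} + \Gamma_{11a} \kappa_1^2 \kappa_5 \kappa_3 + \Gamma_{21a} (\kappa_1^2 + 2\bar W) \kappa_6 \kappa_3 + \kappa_4,$$

which proves that $\dot V_L(\cdot)$ is negative whenever $w(t)$ lies outside the compact set $\Omega_w \triangleq \{ w : \|w\| \le \alpha_5^{-1}(\Upsilon) \}$, and hence, $\|w(t)\|$ is UUB, according to Theorem 4.18 in [131].

Remark 4.1. Since the actor, critic and identifier are continuously updated, the developed RL algorithm can be compared to fully optimistic policy iteration (PI) in the machine learning literature [87], where policy evaluation and policy improvement are done after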


every state transition. This differs from traditional PI, where policy improvement is done only after the convergence of each policy evaluation step. The convergence behavior of optimistic PI is not fully understood, and by considering an adaptive control framework, this result investigates the convergence and stability behavior of fully optimistic PI in continuous time. The PE condition required in Theorem 4-2 is analogous to the exploration paradigm in RL, which ensures sufficient sampling of the state space and convergence to the optimal policy [2]. The theorem shows that the PE condition is needed for proper identification of the value function. The theorem makes no mention of finding the minimum non-negative definite solution to the HJI. However, it does guarantee convergence to a solution $(u_1, u_2)$ such that the dynamics in Eq. 4-1 are stable. This is only accomplished by the minimal non-negative definite HJI solution.

4.6 Convergence to Nash Solution

In addition to establishing convergence of the actor and critic weights, it is prudent to also consider the convergence of the control strategies to the saddle point Nash equilibrium. The subsequent analysis demonstrates that the actor and critic NN approximations converge to the approximate HJI equation in Eq. 4-11. It can also be shown that the approximate controllers in Eq. 4-41 approximate the optimal solutions to the two player Nash game for the dynamic system given in Eq. 4-1. To facilitate the subsequent analysis the following assumption is made.

Assumption 4.8. For each set of admissible control policies, the HJI equation in Eq. 4-8 has a locally smooth solution $V(x) \ge 0$ $\forall x \in \Omega_x$, where $\Omega_x$ is the set described in Section 4.1. On $\Omega_x \subset R^n$, $f(\cdot)$ is Lipschitz and bounded by $\|f\| \le c_f \|x\|$, where $c_f \in R$ is a positive constant.

Theorem 4.3. Given that the Assumptions and sufficient gain constraints in Theorem 4-2 hold, the actor and critic NNs converge to the approximate HJI solution, in the sense that the approximate HJI in Eq. 4-11 is UUB.


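Proof. Substituting the approximate control laws in Eq. 4-41 into the approximate HJI in Eq. 4-11 yields

$$\hat H(x, \nabla \hat V, \hat u_1, \hat u_2) = r_{\hat u} + \nabla \hat V F_{\hat u} = Q(x) + \frac{1}{4} \hat W_{1a}^T \phi' G_1 \phi'^T \hat W_{1a} - \frac{1}{4} \hat W_{2a}^T \phi' G_2 \phi'^T \hat W_{2a} + \hat W_c^T \phi' f(x) - \frac{1}{2} \hat W_c^T \phi' \left( G_1 \phi'^T \hat W_{1a} - G_2 \phi'^T \hat W_{2a} \right).$$

Adding and subtracting

$$(W^T \phi' + \varepsilon') f = -(W^T \phi' + \varepsilon')(g_1 u_1^* + g_2 u_2^*) - Q(x) - u_1^{*T} R u_1^* + \gamma^2 u_2^{*T} u_2^*$$

and substituting for the optimal control laws in Eq. 4-40 gives

$$\begin{aligned}
\hat H ={} & -\tilde W_c^T \phi' f - \varepsilon' f + \frac{1}{4} \hat W_{1a}^T \phi' G_1 \phi'^T \hat W_{1a} - \frac{1}{4} W^T \phi' G_1 \phi'^T W \\
& - \frac{1}{4} \hat W_{2a}^T \phi' G_2 \phi'^T \hat W_{2a} + \frac{1}{4} W^T \phi' G_2 \phi'^T W - \frac{1}{2} \hat W_c^T \phi' \left( G_1 \phi'^T \hat W_{1a} - G_2 \phi'^T \hat W_{2a} \right) \\
& + \frac{1}{2} (W^T \phi' + \varepsilon')(G_1 - G_2) \phi'^T W + \frac{1}{4} \varepsilon' (G_1 - G_2) \varepsilon'^T.
\end{aligned} \tag{4-65, 4-66}$$

Substituting the NN mismatch errors $\tilde W_c(t) \triangleq W - \hat W_c(t)$ and $\tilde W_{ia}(t) \triangleq W - \hat W_{ia}(t)$ for $i = 1, 2$, respectively, into Eq. 4-65 yields

$$\begin{aligned}
\hat H ={} & -\tilde W_c^T \phi' f(x) - \varepsilon' f + \frac{1}{4} \tilde W_{1a}^T \phi' G_1 \phi'^T \tilde W_{1a} + \frac{1}{4} \varepsilon' (G_1 - G_2) \varepsilon'^T \\
& - \frac{1}{4} \tilde W_{2a}^T \phi' G_2 \phi'^T \tilde W_{2a} + \frac{1}{2} \tilde W_c^T \phi' (G_1 - G_2) \phi'^T W \\
& - \frac{1}{2} \tilde W_c^T \phi' \left( G_1 \phi'^T \tilde W_{1a} - G_2 \phi'^T \tilde W_{2a} \right) + \frac{1}{2} \varepsilon' (G_1 - G_2) \phi'^T W.
\end{aligned} \tag{4-67}$$

Using the bounds developed in Eq. 4-58, Assumptions 4-2 through 4-4, and Assumption 4-8, Eq. 4-67 can be upper bounded as

$$\begin{aligned}
\|\hat H\| \le{} & \left( c_f \bar\phi' \|x\| + \kappa_5 (\bar W + \kappa_1) + \kappa_6 (\bar W + \kappa_2) \right) \|\tilde W_c\| + \bar\varepsilon' c_f \|x\| \\
& + \kappa_5 \|\tilde W_{1a}\|^2 + \kappa_6 \|\tilde W_{2a}\|^2 + \bar\varepsilon' \|G_1 - G_2\| \left( \bar\phi' \bar W + \bar\varepsilon' \right).
\end{aligned} \tag{4-68}$$

Using the Assumptions and Theorem 4-2, all terms on the right-hand side of the inequality are UUB; therefore the approximate HJI is also UUB.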


Theorem 4.4. Given that the Assumptions and sufficient gain constraints in Theorem 4-2 hold, the approximate control laws in Eq. 4-41 converge to the approximate Nash equilibrium solution of the zero-sum game.

Proof. Consider the control errors $(\tilde u_1, \tilde u_2)$ between the optimal control laws in Eqs. 4-6 and 4-7, and the approximate control laws in Eq. 4-41, given as

$$\tilde u_1 \triangleq u_1^* - \hat u_1, \qquad \tilde u_2 \triangleq u_2^* - \hat u_2.$$

Substituting for the optimal control laws in Eqs. 4-6 and 4-7 and the approximate control laws in Eq. 4-41, and using $\tilde W_{ia}(t) \triangleq W - \hat W_{ia}(t)$ for $i = 1, 2$, yields

$$\tilde u_1 = -\frac{1}{2} R^{-1} g_1^T \left( \phi'^T \tilde W_{1a} + \varepsilon'^T \right), \qquad \tilde u_2 = \frac{1}{2\gamma^2} g_2^T \left( \phi'^T \tilde W_{2a} + \varepsilon'^T \right). \tag{4-69}$$

Using Assumptions 4-1 through 4-4, Eq. 4-69 can be upper bounded as

$$\|\tilde u_1\| \le \frac{1}{2} \lambda_{\min}^{-1}(R)\, \bar g_1 \left( \bar\phi' \|\tilde W_{1a}\| + \bar\varepsilon' \right), \qquad \|\tilde u_2\| \le \frac{1}{2\gamma^2} \bar g_2 \left( \bar\phi' \|\tilde W_{2a}\| + \bar\varepsilon' \right).$$

Given that the Assumptions and sufficient gain constraints in Theorem 4-2 hold, all terms on the right-hand side of the inequalities are UUB; therefore the control errors $(\tilde u_1, \tilde u_2)$ are UUB and the approximate control laws $(\hat u_1, \hat u_2)$ give the approximate Nash equilibrium solution.

4.7 Simulation

The following nonlinear dynamics are considered in [107, 108, 132]

$$\dot x = f(x) + g_1(x) u_1(x) + g_2(x) u_2(x)$$


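where

$$f(x) = \begin{bmatrix} -x_1 + x_2 \\ -x_1^3 - x_2^3 + \frac{1}{4} x_2 (\cos(2x_1) + 2)^2 - \frac{1}{4\gamma^2} x_2 (\sin(4x_1) + 2)^2 \end{bmatrix}$$

$$g_1(x) = \begin{bmatrix} 0 & \cos(2x_1) + 2 \end{bmatrix}^T, \qquad g_2(x) = \begin{bmatrix} 0 & \sin(4x_1) + 2 \end{bmatrix}^T.$$

The initial state is given as $x(0) = [3\ \ {-1}]^T$ and the local cost function is defined as

$$r = x^T Q x + u_1^T R u_1 - \gamma^2 u_2^T u_2$$

where $R = 1$, $\gamma = 8$, $Q = I_{2 \times 2}$. The optimal value function is

$$V^*(x) = \frac{1}{4} x_1^4 + \frac{1}{2} x_2^2$$

and the optimal control inputs are given as

$$u_1^* = -(\cos(2x_1) + 2) x_2, \qquad u_2^* = \frac{1}{\gamma^2} (\sin(4x_1) + 2) x_2.$$

The activation function for the critic NN is chosen as

$$\phi = \begin{bmatrix} x_1^2 & x_2^2 & x_1^4 & x_2^4 \end{bmatrix}$$

while the activation function for the identifier DNN is chosen as a symmetric sigmoid with 5 neurons in the hidden layer. The identifier gains are chosen as

$$k = 100, \quad \alpha = 30, \quad \gamma_f = 5, \quad \beta_1 = 0.2, \quad \Gamma_{wf} = 0.2 I_{6 \times 6}, \quad \Gamma_{vf} = 0.2 I_{2 \times 2}$$

and the gains of the actor-critic learning laws are chosen as

$$\Gamma_{11a} = \Gamma_{21a} = 1, \quad \Gamma_{12a} = \Gamma_{22a} = 0.5, \quad \eta_c = 1, \quad \nu = 0.005.$$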
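As a baseline for the learning results that follow, the benchmark can be simulated directly under the stated optimal policies. The sketch below is illustrative only and rests on assumptions: the input gains are taken to depend on $x_1$ (matching the stated $u_1^*$), the disturbance policy is read as $u_2^* = \gamma^{-2}(\sin(4x_1) + 2)x_2$, and a fixed-step forward-Euler scheme with hypothetical step size and horizon is used.

```python
import numpy as np

GAMMA = 8.0


def drift(x):
    """Drift f(x) of the benchmark, with gains assumed to depend on x1."""
    x1, x2 = x
    c = np.cos(2.0 * x1) + 2.0
    s = np.sin(4.0 * x1) + 2.0
    return np.array([-x1 + x2,
                     -x1**3 - x2**3 + 0.25 * x2 * c**2
                     - 0.25 / GAMMA**2 * x2 * s**2])


def u1_star(x):
    return -(np.cos(2.0 * x[0]) + 2.0) * x[1]


def u2_star(x):
    return (np.sin(4.0 * x[0]) + 2.0) * x[1] / GAMMA**2


def value(x):
    """Stated optimal value function V*(x) = x1^4/4 + x2^2/2."""
    return 0.25 * x[0]**4 + 0.5 * x[1]**2


def simulate(x0=(3.0, -1.0), t_final=20.0, dt=1e-3):
    """Forward-Euler closed-loop trajectory under (u1*, u2*)."""
    x = np.array(x0, dtype=float)
    trajectory = [x.copy()]
    for _ in range(int(round(t_final / dt))):
        g1 = np.array([0.0, np.cos(2.0 * x[0]) + 2.0])
        g2 = np.array([0.0, np.sin(4.0 * x[0]) + 2.0])
        x = x + dt * (drift(x) + g1 * u1_star(x) + g2 * u2_star(x))
        trajectory.append(x.copy())
    return np.array(trajectory)


if __name__ == "__main__":
    traj = simulate()
    print("V* at start:", value(traj[0]), "V* at end:", value(traj[-1]))
    print("final state norm:", np.linalg.norm(traj[-1]))
```

Along these closed-loop trajectories the stated value function decreases, since its derivative works out to $-x_1^4 - x_2^4$ minus a nonnegative multiple of $x_2^2$, so the states settle at the origin from the initial condition used in the chapter.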


The covariance matrix is initialized to $\Gamma_c(0) = 0.001$, all the NN weights are randomly initialized with values in $[-1, 1]$, and the states are initialized to $x(0) = [3\ \ {-1}]$. A small amplitude exploratory signal (noise) is added to the control to excite the states for the first 10 seconds of the simulation, as seen from the evolution of states in Figure 4-1. The identifier approximates the system dynamics, and the state derivative estimation error is shown in Figure 4-2. The time histories of the critic NN weights and the actor NN weights are given in Figures 4-3 and 4-4. Persistence of excitation ensures that the weights converge. Figure 4-5 shows the difference between the optimal value function and the approximate one. Figure 4-6 demonstrates the approximation error between the optimal controller and the approximated controller for players 1 and 2, respectively. Figures 4-7, 4-8, 4-9, and 4-10 demonstrate that when the PE signal is not removed the weights still converge; however, the PE signal degrades the performance of the states.

Remark 4.2. An implementation issue in using the developed algorithm is to ensure PE of the critic regressor vector. Unlike linear systems, where PE of the regressor translates to sufficient richness of the external input, no verifiable method exists to ensure PE in nonlinear systems. In this simulation, a small exploratory signal consisting of sinusoids of varying frequencies was added to the control to ensure PE qualitatively, and convergence of the critic weights to their optimal values is achieved. The exploratory signal $n(t)$ is present in the first 3 seconds of the simulation and is given by

n ( t )=(1 2 exp( 01 t )) cos 2 (0 2 t )+ sin 2 (2 0 t ) cos (0 1 t )+ sin 2 ( 1 2 t ) cos ( 5 t )+ sin 5 ( t ) (

4.8 Summary

A generalized solution for a two player zero-sum differential game is sought utilizing the ACI architecture for a nonlinear HJI equation. The ACI architecture implements the actor and critic approximations simultaneously and in real time. The use of a robust DNN-based identifier circumvents the need for complete model knowledge, yielding an identifier which is proven to be asymptotically convergent. Least squares approximation


and adaptive control theory techniques are utilized to update the weights for the critic and actor NNs to approximate the value function and the control policies. Using the identifier and the critic, an approximation to the optimal control law (actor) is developed which stabilizes the closed-loop system and approaches the optimal solutions to the two player zero-sum game.

Figure 4-1. The evolution of the system states for the zero-sum game, with persistently excited input for the first 10 seconds. (Two panels: $x_1$ and $x_2$ versus time in seconds, 0 to 30 s.)


Figure 4-2. Error in estimating the state derivatives, with the identifier for the zero-sum game. (Two panels, one per state, versus time in seconds, 0 to 10 s.)

Figure 4-3. Convergence of critic weights for the zero-sum game. (Traces $W_{c1}$ to $W_{c4}$ versus time in seconds, 0 to 30 s.)


Figure 4-4. Convergence of actor weights for player 1 and player 2 in a zero-sum game. (Two panels: $W_{a11}$ to $W_{a14}$ and $W_{a21}$ to $W_{a24}$ versus time in seconds, 0 to 30 s.)

Figure 4-5. Optimal value function approximation $\hat V(x)$, for a zero-sum game. (Traces: approximation and optimal, versus time in seconds, 0 to 30 s.)


Figure 4-6. Optimal control approximations $\hat u_1$ and $\hat u_2$, in a zero-sum game. (Two panels: $u_1$ and $u_2$, approximation and optimal, versus time in seconds, 0 to 30 s.)

Figure 4-7. The evolution of the system states for the zero-sum game, with a continuous persistently excited input. (Two panels: $x_1$ and $x_2$ versus time in seconds, 0 to 30 s.)


Figure 4-8. Convergence of critic weights for the zero-sum game, with a continuous persistently excited input. (Traces $W_{c1}$ to $W_{c4}$ versus time in seconds, 0 to 30 s.)

Figure 4-9. Convergence of actor weights for player 1 and player 2 in a zero-sum game, with a continuous persistently excited input. (Two panels: $W_{a11}$ to $W_{a14}$ and $W_{a21}$ to $W_{a24}$ versus time in seconds, 0 to 30 s.)


Figure 4-10. Optimal value function approximation $\hat V(x)$ for a zero-sum game, with a continuous persistently excited input. (Traces: approximation and optimal, versus time in seconds, 0 to 30 s.)

Figure 4-11. Optimal control approximations $\hat u_1$ and $\hat u_2$ in a zero-sum game, with a continuous persistently excited input. (Two panels: $u_1$ and $u_2$, approximation and optimal, versus time in seconds, 0 to 30 s.)

PAGE 100

CHAPTER 5
APPROXIMATE N-PLAYER NONZERO-SUM GAME SOLUTION FOR AN UNCERTAIN CONTINUOUS NONLINEAR SYSTEM

Chapter 4 focused on solving a two-player infinite-horizon zero-sum game subject to nonlinear, time-invariant, affine-in-the-input dynamics. This chapter expands the technique from Chapter 4 to a more general class of problems. The focus of this chapter is the derivation of a solution to an N-player infinite-horizon nonzero-sum game subject to nonlinear, time-invariant, affine-in-the-input dynamics. This problem is inherently more complex than the zero-sum game because a set of strategies must minimize a set of coupled cost functions, which leads to a set of coupled nonlinear HJB partial differential equations. Nonzero-sum differential games resemble classical optimal control problems in some respects, but due to the multiple cost criteria the game formulation must further specify what is demanded of an optimal solution. To approach a feasible solution for this problem, an online solution method based on an approximation of the set of HJBs is presented. This technique utilizes an approximate optimal adaptive controller that has two sets of adaptive structures: a critic set to approximate the value (cost) functions and an actor set to approximate the control policies. In addition, a DNN is used to robustly identify the system parameters. The two adaptive structures are tuned simultaneously online to learn the solution to the set of coupled HJB equations and the set of optimal policies. This chapter presents an adaptive control method that converges online to an approximate solution set of the N-player differential game. Parameter update laws are given to tune the weights of the online critic and actor neural networks simultaneously so that they converge to the solution set of the coupled HJB equations and the set of Nash equilibrium policies, while also guaranteeing closed-loop stability. The set of policies guarantees uniformly ultimately bounded (UUB) tracking error for the closed-loop system.

PAGE 101

5.1 N-player Nonzero-Sum Differential Game

Consider the N-player nonlinear, time-invariant, affine-in-the-input dynamic system given by

$$\dot{x} = f(x) + \sum_{j=1}^{N} g_j(x)\,u_j \tag{5-1}$$

where $x(t) \in \mathcal{X} \subseteq \mathbb{R}^n$ is the state vector, $u_j(x) \in \mathcal{U} \subseteq \mathbb{R}^{m_j}$ are the control inputs, and $f(x) \in \mathbb{R}^n$ and $g_j(x) \in \mathbb{R}^{n \times m_j}$ are the drift and input matrices, respectively. Assume that $g_1(x), \dots, g_N(x)$ and $f(x)$ are second-order differentiable and Lipschitz continuous, and that $f(0) = 0$ such that $x = 0$ is an equilibrium point for Eq. 5-1. The infinite-horizon scalar cost functional $J_i(x(t), u_1, u_2, \dots, u_N)$ associated with each player can be defined as

$$J_i = \int_t^{\infty} r_i\left(x(s), u_1, u_2, \dots, u_N\right) ds, \quad i \in N, \tag{5-2}$$

where $t$ is the initial time, and $r_i(x, u_1, u_2, \dots, u_N) \in \mathbb{R}$ is the local cost for the state and control, defined as

$$r_i = Q_i(x) + \sum_{j=1}^{N} u_j^T R_{ij} u_j, \quad i \in N, \tag{5-3}$$

where $Q_i(x) \in \mathbb{R}$ and $R_{ij} = R_{ij}^T \in \mathbb{R}^{m_j \times m_j}$ are continuously differentiable and positive definite, and $R_{ii} \in \mathbb{R}^{m_i \times m_i}$ are positive definite symmetric matrices. The cost functional may also be written as [109]

$$J_i = \frac{1}{N}\sum_{j=1}^{N} J_j + \frac{1}{N}\sum_{j=1}^{N}\left(J_i - J_j\right) \triangleq \bar{J} + \tilde{J}_i, \quad i \in N, \tag{5-4}$$

where $\bar{J}$ is an overall cooperative team cost and $\tilde{J}_i$ a conflict cost for player $i$. The cost function in Eq. 5-4 can be cast as a zero-sum game by setting $\bar{J} = 0$, and can be further reduced to a two-player zero-sum game when $J_1 = -J_2$. Such games have been extensively studied in control systems and result in the saddle-point Nash equilibrium solution. However, general team games may have both cooperative objectives and selfish objectives, which is captured in nonzero-sum games, as detailed in Eq. 5-4. The objective of the N-player game is to find a set of admissible feedback policies $(u_1^*, u_2^*, \dots, u_N^*)$ such that the

PAGE 102

value function $V_i(x(t), u_1, u_2, \dots, u_N)$ given in Eq. 5-2,

$$V_i = \min_{u_i} \int_t^{\infty} \left( Q_i(x) + \sum_{j=1}^{N} u_j^T R_{ij} u_j \right) ds, \quad i \in N, \tag{5-5}$$

is minimized. This chapter focuses on the Nash equilibrium solution for the N-player game, in which the following N inequalities are satisfied for all $u_i^* \in U_i$, $i \in N$:

$$\left.\begin{aligned}
V_1^* \triangleq V_1\left(x(t), u_1^*, u_2^*, \dots, u_N^*\right) &\le V_1\left(x(t), u_1, u_2^*, \dots, u_N^*\right) \\
V_2^* \triangleq V_2\left(x(t), u_1^*, u_2^*, \dots, u_N^*\right) &\le V_2\left(x(t), u_1^*, u_2, \dots, u_N^*\right) \\
&\;\;\vdots \\
V_N^* \triangleq V_N\left(x(t), u_1^*, u_2^*, \dots, u_N^*\right) &\le V_N\left(x(t), u_1^*, u_2^*, \dots, u_N\right)
\end{aligned}\right\} \tag{5-6}$$

The Nash equilibrium outcome of the N-player game is given by the N-tuple of quantities $\{V_1^*, V_2^*, \dots, V_N^*\}$. The value functions can alternately be presented by a differential equivalent given by the following nonlinear Lyapunov equation [109]

$$0 = r_i(x, u_1, \dots, u_N) + \nabla V_i^* \left( f(x) + \sum_{j=1}^{N} g_j(x) u_j \right), \quad V_i^*(0) = 0, \quad i \in N, \tag{5-7}$$

where $\nabla V_i^* \triangleq \partial V_i^*(x) / \partial x \in \mathbb{R}^{n \times 1}$. Assuming the value functional is continuously differentiable, Bellman's principle of optimality can be used to derive the following optimality condition

$$0 = \min_{u_i} \left\{ \nabla V_i^* \left( f(x) + \sum_{j=1}^{N} g_j(x) u_j \right) + r_i(x, u_1, \dots, u_N) \right\}, \quad V_i^*(0) = 0, \quad i \in N, \tag{5-8}$$

which is an N-coupled set of nonlinear PDEs, also called the HJB equation. Suitable nonnegative definite solutions to Eq. 5-7 can be used to evaluate the infinite integral in Eq. 5-5 along the system trajectories. A closed-form expression of the optimal feedback control policies is given by

$$u_i^*(x) = -\frac{1}{2} R_{ii}^{-1} g_i^T(x) \nabla V_i^*, \quad i \in N. \tag{5-9}$$
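The closed-form policy in Eq. 5-9 is a single linear solve once a value-function gradient is available. The following is a minimal sketch of that evaluation, not the implementation used in this work. The function name `nash_feedback_policy`, the evaluation point, and the weight $R_{11} = 1$ are assumptions chosen for illustration; the input matrix and value function have the form used later in the simulation section.

```python
import numpy as np


def nash_feedback_policy(R_ii, g_i, grad_V_i):
    """Evaluate u_i = -1/2 * R_ii^{-1} * g_i(x)^T * grad V_i(x), as in Eq. 5-9."""
    grad = np.asarray(grad_V_i, dtype=float).reshape(-1)      # gradient, length n
    g = np.asarray(g_i, dtype=float).reshape(grad.size, -1)   # input matrix, n x m_i
    R = np.atleast_2d(np.asarray(R_ii, dtype=float))          # weight, m_i x m_i
    # Solve R u = -1/2 g^T grad instead of forming the inverse explicitly.
    return -0.5 * np.linalg.solve(R, g.T @ grad)


# Illustrative data (assumptions): a two-state system with one scalar input.
x = np.array([3.0, -1.0])                         # arbitrary evaluation point
g1 = np.array([0.0, np.cos(2.0 * x[0]) + 2.0])    # g_1(x) = [0, cos(2 x1) + 2]^T
grad_V1 = np.array([x[0], 2.0 * x[1]])            # gradient of 0.5*x1^2 + x2^2
u1 = nash_feedback_policy(1.0, g1, grad_V1)       # R_11 = 1 is an assumed weight
```

With these illustrative values the result reduces to $-(\cos(2x_1) + 2)\,x_2$ evaluated at the chosen point. In the online setting the exact gradient is unavailable, which is why the later sections replace it with an actor NN estimate.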

PAGE 103

The closed-form expression for the optimal control policies in Eq. 5-9 obviates the need to search for a set of feedback policies that minimize the value function; however, the solutions $V_i^*(x)$ to the HJB equations given in Eq. 5-8 are required. The HJB equations in Eq. 5-8 can be rewritten by substituting for the local cost in Eq. 5-3 and the optimal control policy in Eq. 5-9, respectively, as

$$\begin{aligned}
0 = {}& Q_i(x) + \nabla V_i^* f(x) - \frac{1}{2} \nabla V_i^* \sum_{j=1}^{N} g_j(x) R_{jj}^{-1} g_j^T(x) \nabla V_j^* \\
& + \frac{1}{4} \sum_{j=1}^{N} \nabla V_j^* g_j(x) R_{jj}^{-T} R_{ij} R_{jj}^{-1} g_j^T(x) \nabla V_j^*, \quad V_i^*(0) = 0.
\end{aligned} \tag{5-10}$$

Although nonzero-sum games contain non-cooperative components, the solution to each player's coupled HJB equation in Eq. 5-10 requires knowledge of all the other players' strategies in Eq. 5-9. The underlying assumption of rational opponents [41] is characteristic of differential game theory problems, and it implies that the players share information, yet they agree to adhere to the equilibrium policy determined from the Nash game.

5.2 HJB Approximation via ACI

This chapter generalizes the ACI approximation architecture to solve the N-player nonzero-sum game for Eq. 5-10. The ACI architecture eliminates the need for exact model knowledge and utilizes a DNN to robustly identify the system, a critic NN to approximate the value function, and an actor NN to find a set of control policies which minimizes the value functions. This section introduces the ACI architecture for the N-player game, and subsequent sections give details of the design for the N-player nonzero-sum game solution.

The Hamiltonian $H_i(x, \nabla V_i, u_1, \dots, u_N)$ of the system in Eq. 5-1 can be defined as

$$H_i = r_{u_i} + \nabla V_i F_u, \quad i \in N, \tag{5-11}$$

PAGE 104

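where $\nabla V_i$ denotes the Jacobian of the value function $V_i$, $F_u(x, u_1, \dots, u_N) \triangleq f(x) + \sum_{j=1}^{N} g_j(x) u_j \in \mathbb{R}^n$ denotes the system dynamics, and $r_{u_i}(x, u_1, \dots, u_N) \triangleq Q_i(x) + \sum_{j=1}^{N} u_j^T R_{ij} u_j$ denotes the local cost. The optimal policies in Eq. 5-9 and the associated value functions $V_i^*(x)$ satisfy the HJB equation

$$H_i\left(x, \nabla V_i^*, u_1^*, \dots, u_N^*\right) = r_{u_i^*} + \nabla V_i^* F_{u^*} = 0, \quad i \in N. \tag{5-12}$$

Replacing the optimal Jacobian $\nabla V_i^*$ and optimal control policies $u_i^*$ by estimates $\nabla \hat{V}_i$ and $\hat{u}_i$, respectively, yields the approximate HJB equation

$$H_i\left(x, \nabla \hat{V}_i, \hat{u}_1, \dots, \hat{u}_N\right) = r_{\hat{u}_i} + \nabla \hat{V}_i F_{\hat{u}}, \quad i \in N. \tag{5-13}$$

It is evident that the approximate HJB in Eq. 5-13 depends on complete knowledge of the system. To overcome this limitation, an online system identifier replaces the system dynamics, which modifies the approximate HJB in Eq. 5-13 to

$$\hat{H}_i\left(x, \dot{\hat{x}}, \nabla \hat{V}_i, \hat{u}_1, \dots, \hat{u}_N\right) = r_{\hat{u}_i} + \nabla \hat{V}_i \hat{F}_{\hat{u}}, \quad i \in N, \tag{5-14}$$

where $\hat{F}_{\hat{u}}$ is an approximation of the system dynamics $F_{\hat{u}}$. Taking the error between the optimal and approximate HJB equations in Eqs. 5-12 and 5-14, respectively, yields the Bellman residual errors $\delta_{hjb_i}\left(x, \dot{\hat{x}}, \hat{u}_i, \nabla \hat{V}_i\right)$ defined as

$$\delta_{hjb_i} \triangleq \hat{H}_i\left(x, \dot{\hat{x}}, \nabla \hat{V}_i, \hat{u}_1, \dots, \hat{u}_N\right) - H_i\left(x, \nabla V_i^*, u_1^*, \dots, u_N^*\right), \quad i \in N. \tag{5-15}$$

However, since $H_i\left(x, \nabla V_i^*, u_1^*, \dots, u_N^*\right) = 0 \;\; \forall i \in N$, the Bellman residual error can be defined in a measurable form as

$$\delta_{hjb_i} = \hat{H}_i\left(x, \dot{\hat{x}}, \nabla \hat{V}_i, \hat{u}_1, \dots, \hat{u}_N\right), \quad i \in N.$$

The objective is to update both $\hat{u}_i$ (actors) and $\hat{V}_i$ (critics) simultaneously, based on the minimization of the Bellman residual errors $\delta_{hjb_i}$. Altogether, the actors $\hat{u}_i$, the critics $\hat{V}_i$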

PAGE 105

and the identifier $\hat{F}_{\hat{u}}$ constitute the ACI architecture. Assumptions 4-1 through 4-6 from Chapter 4 are used in this derivation to facilitate the subsequent analysis.

5.3 System Identifier

Consider the two-player case for the dynamics given in Eq. 5-1 as

$$\dot{x} = f(x) + g_1(x) u_1(x) + g_2(x) u_2(x), \quad x(0) = x_0, \tag{5-16}$$

where $u_1(x), u_2(x) \in \mathbb{R}^n$ are the control inputs, and the state $x(t) \in \mathbb{R}^n$ is assumed to be measurable. The system identifier is identical to the identifier presented in Chapter 4. For brevity, this chapter presents the main theorem and refers to Chapter 4 for further details.

Theorem 5.1. For the system in Eq. 5-16, the identifier developed in Eq. 4-15 along with its weight update laws in Eq. 4-21 ensures asymptotic identification of the state and its derivative, in the sense that

$$\lim_{t \to \infty} \left\| \tilde{x}(t) \right\| = 0 \quad \text{and} \quad \lim_{t \to \infty} \left\| \dot{\tilde{x}}(t) \right\| = 0,$$

provided Assumptions 4-4 through 4-6 hold, and the control gains $k$ and $\gamma_f$ are chosen sufficiently large based on the initial conditions of the states, and satisfy the following sufficient conditions

$$\alpha \gamma_f > \zeta_5, \quad k > \zeta_6, \tag{5-17}$$

where $\zeta_5$ and $\zeta_6$ are introduced in Eq. 4-28, and $\beta_1$, $\beta_2$, introduced in Eq. 4-31, are chosen according to the sufficient conditions in Eq. 4-32.

Proof. Refer to Theorem 4-1 in Chapter 4.

PAGE 106

5.4 Actor-Critic Design

Using Assumption 5-1 and Eq. 5-9, the optimal value function and the optimal controls can be represented by NNs as

$$\begin{aligned}
V_1^*(x) &= W_1^T \sigma_1(x) + \varepsilon_1(x), & u_1^*(x) &= -\frac{1}{2} R_{11}^{-1} g_1^T(x) \underbrace{\left( \sigma_1'(x)^T W_1 + \varepsilon_1'(x)^T \right)}_{\nabla V_1^*}, \\
V_2^*(x) &= W_2^T \sigma_2(x) + \varepsilon_2(x), & u_2^*(x) &= -\frac{1}{2} R_{22}^{-1} g_2^T(x) \underbrace{\left( \sigma_2'(x)^T W_2 + \varepsilon_2'(x)^T \right)}_{\nabla V_2^*},
\end{aligned} \tag{5-18}$$

where $W_1, W_2 \in \mathbb{R}^N$ are unknown ideal NN weights, $N$ is the number of neurons, $\sigma_i(x) = [\sigma_{i1}(x) \;\; \sigma_{i2}(x) \;\; \dots \;\; \sigma_{iN}(x)]^T \in \mathbb{R}^N$ are smooth NN activation functions such that $\sigma_{ij}(0) = 0$ and $\sigma_{ij}'(0) = 0$ $\forall j = 1, \dots, N$ and $i = 1, 2$, and $\varepsilon_1(\cdot), \varepsilon_2(\cdot) \in \mathbb{R}$ are the function reconstruction errors.

Assumption 5.1. The NN activation functions $\{\sigma_{ij}(x) : j = 1, \dots, N, \; i = 1, 2\}$ are chosen such that as $N \to \infty$, $\sigma_i(x)$ provides a complete independent basis for $V_i^*(x)$.

Using Assumption 5-1 and the Weierstrass higher-order approximation theorem, both $V_i^*(x)$ and $\nabla V_i^*$ can be uniformly approximated by the NNs in Eq. 5-18, i.e., as $N \to \infty$, the approximation errors $\varepsilon_i(x), \varepsilon_i'(x) \to 0$ for $i = 1, 2$, respectively. The critic $\hat{V}_i(x)$ and the actor $\hat{u}_i(x)$ approximate the optimal value function $V_i^*(x)$ and the optimal controls $u_i^*(x)$ in Eq. 5-18, and are given as

$$\begin{aligned}
\hat{V}_1(x) &= \hat{W}_{1c}^T \sigma_1(x), & \hat{u}_1(x) &= -\frac{1}{2} R_{11}^{-1} g_1^T(x) \sigma_1'(x)^T \hat{W}_{1a}, \\
\hat{V}_2(x) &= \hat{W}_{2c}^T \sigma_2(x), & \hat{u}_2(x) &= -\frac{1}{2} R_{22}^{-1} g_2^T(x) \sigma_2'(x)^T \hat{W}_{2a},
\end{aligned} \tag{5-19}$$

where $\hat{W}_{1c}(t), \hat{W}_{2c}(t) \in \mathbb{R}^N$ and $\hat{W}_{1a}(t), \hat{W}_{2a}(t) \in \mathbb{R}^N$ are estimates of the ideal weights of the critic and actor NNs, respectively. The weight estimation errors for the critic and actor are defined as $\tilde{W}_{ic}(t) \triangleq W_i - \hat{W}_{ic}(t)$ and $\tilde{W}_{ia}(t) \triangleq W_i - \hat{W}_{ia}(t)$ for $i = 1, 2$, respectively. The actor and critic NN weights are both updated based on the minimization of the Bellman error $\delta_{hjb}(\cdot)$ in Eq. 5-14, which can be rewritten by substituting $\hat{V}_1$ and $\hat{V}_2$

PAGE 107

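from Eq. 5-19 as

$$\begin{aligned}
\delta_{hjb_1} &= \hat{W}_{1c}^T \sigma_1' \hat{F}_{\hat{u}} + r_1(x, \hat{u}_1, \hat{u}_2) = \hat{W}_{1c}^T \omega_1 + r_1(x, \hat{u}_1, \hat{u}_2), \\
\delta_{hjb_2} &= \hat{W}_{2c}^T \sigma_2' \hat{F}_{\hat{u}} + r_2(x, \hat{u}_1, \hat{u}_2) = \hat{W}_{2c}^T \omega_2 + r_2(x, \hat{u}_1, \hat{u}_2),
\end{aligned} \tag{5-20}$$

where $\omega_i(x, \hat{u}, t) \triangleq \sigma_i' \hat{F}_{\hat{u}} \in \mathbb{R}^N$ for $i = 1, 2$ is the critic NN regressor vector.

Least Squares Update for the Critic. Consider the integral of the sum of the squared Bellman errors $E_c(\hat{W}_{1c}(t), \hat{W}_{2c}(t), t)$:

$$E_c = \int_0^t \left( \delta_{hjb_1}^2(\tau) + \delta_{hjb_2}^2(\tau) \right) d\tau. \tag{5-21}$$

The LS update law for the critic $\hat{W}_{1c}(t)$ is generated by minimizing the total prediction error in Eq. 5-21:

$$\begin{aligned}
\frac{\partial E_c}{\partial \hat{W}_{1c}} &= 2 \int_0^t \delta_{hjb_1}(\tau) \frac{\partial \delta_{hjb_1}(\tau)}{\partial \hat{W}_{1c}(\tau)} d\tau = 0 \\
&\Rightarrow \; \hat{W}_{1c}^T(t) \int_0^t \omega_1(\tau) \omega_1(\tau)^T d\tau + \int_0^t \omega_1(\tau)^T r_1(\tau) \, d\tau = 0 \\
&\Rightarrow \; \hat{W}_{1c}(t) = -\left( \int_0^t \omega_1(\tau) \omega_1(\tau)^T d\tau \right)^{-1} \int_0^t \omega_1(\tau) r_1(\tau) \, d\tau,
\end{aligned}$$

which gives the LS estimate of the critic weights, provided the inverse $\left( \int_0^t \omega_1(\tau) \omega_1(\tau)^T d\tau \right)^{-1}$ exists. Likewise, the LS update law for the critic $\hat{W}_{2c}(t)$ is generated by

$$\hat{W}_{2c}(t) = -\left( \int_0^t \omega_2(\tau) \omega_2(\tau)^T d\tau \right)^{-1} \int_0^t \omega_2(\tau) r_2(\tau) \, d\tau.$$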
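The batch LS estimate above can be illustrated numerically by replacing the integrals with sums over sampled regressors. This is only a sketch under stated assumptions: the function name `batch_ls_critic`, the synthetic regressor samples, and the "true" weight vector are hypothetical, and a uniform sampling step is assumed so that the step size cancels between the two sums.

```python
import numpy as np


def batch_ls_critic(omegas, costs):
    """Batch LS critic estimate W = -(sum w w^T)^{-1} (sum w r).

    omegas: (K, N) array of sampled regressor vectors omega(tau_k)
    costs:  (K,)   array of sampled local costs r(tau_k)
    """
    Omega = np.asarray(omegas, dtype=float)
    r = np.asarray(costs, dtype=float)
    gram = Omega.T @ Omega                      # approximates the integral of w w^T
    if np.linalg.matrix_rank(gram) < gram.shape[0]:
        # The inverse in the LS estimate exists only for sufficiently rich data.
        raise ValueError("regressor samples do not span the weight space")
    return -np.linalg.solve(gram, Omega.T @ r)


# Synthetic check (assumption): costs generated so the Bellman error
# W^T omega + r vanishes for a known weight vector.
rng = np.random.default_rng(0)
W_true = np.array([0.5, 0.0, 1.0])
omega_samples = rng.normal(size=(50, 3))
cost_samples = -omega_samples @ W_true
W_hat = batch_ls_critic(omega_samples, cost_samples)
```

The rank check mirrors the proviso that the inverse must exist; in the online algorithm this role is played by the excitation condition on the regressor rather than by an explicit test.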

PAGE 108

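The recursive formulation of the normalized LS algorithm [130] gives the update laws for the two critic weights as

$$\begin{aligned}
\dot{\hat{W}}_{1c} &= -\eta_{1c} \Gamma_{1c} \frac{\omega_1}{1 + \nu_1 \omega_1^T \Gamma_{1c} \omega_1} \delta_{hjb_1}, \\
\dot{\hat{W}}_{2c} &= -\eta_{2c} \Gamma_{2c} \frac{\omega_2}{1 + \nu_2 \omega_2^T \Gamma_{2c} \omega_2} \delta_{hjb_2},
\end{aligned} \tag{5-22}$$

where $\nu_1, \nu_2, \eta_{1c}, \eta_{2c} \in \mathbb{R}$ are constant positive gains and $\Gamma_{ic}(t) \triangleq \left( \int_0^t \omega_i(\tau) \omega_i(\tau)^T d\tau \right)^{-1} \in \mathbb{R}^{N \times N}$ for $i = 1, 2$ are symmetric estimation gain matrices generated by

$$\begin{aligned}
\dot{\Gamma}_{1c} &= -\eta_{1c} \Gamma_{1c} \frac{\omega_1 \omega_1^T}{1 + \nu_1 \omega_1^T \Gamma_{1c} \omega_1} \Gamma_{1c}, & \Gamma_{1c}(t_{r_1}^+) &= \Gamma_{1c}(0) = \varphi_{01} I, \\
\dot{\Gamma}_{2c} &= -\eta_{2c} \Gamma_{2c} \frac{\omega_2 \omega_2^T}{1 + \nu_2 \omega_2^T \Gamma_{2c} \omega_2} \Gamma_{2c}, & \Gamma_{2c}(t_{r_2}^+) &= \Gamma_{2c}(0) = \varphi_{02} I,
\end{aligned} \tag{5-23}$$

where $t_{r_1}^+$ and $t_{r_2}^+$ are the resetting times at which $\lambda_{\min}\{\Gamma_{1c}(t)\} \le \varphi_1$ and $\lambda_{\min}\{\Gamma_{2c}(t)\} \le \varphi_2$, with $\varphi_{01} > \varphi_1 > 0$ and $\varphi_{02} > \varphi_2 > 0$. The covariance resetting ensures that $\Gamma_{1c}(t)$ and $\Gamma_{2c}(t)$ are positive definite for all time and prevents arbitrarily small values in some directions, which would make adaptation in those directions very slow (also called the covariance wind-up problem) [130]. From Eq. 5-23 it is clear that $\dot{\Gamma}_{1c} \le 0$ and $\dot{\Gamma}_{2c} \le 0$, which means that the covariance matrices $(\Gamma_{1c}(t), \Gamma_{2c}(t))$ can be bounded as follows

$$\varphi_1 I \le \Gamma_{1c} \le \varphi_{01} I, \quad \varphi_2 I \le \Gamma_{2c} \le \varphi_{02} I. \tag{5-24}$$

Gradient Update for the Actor. The actor update is also based on the minimization of the Bellman errors $\delta_{hjb_i}(\cdot)$. However, unlike the critic weights, the actor weights appear nonlinearly in $\delta_{hjb_i}(\cdot)$, making it problematic to develop an LS update law. Hence, a gradient update law is developed for the actor which minimizes the squared Bellman error $E_a(t) \triangleq \delta_{hjb_1}^2 + \delta_{hjb_2}^2$, whose gradients are given as

$$\begin{aligned}
\frac{\partial E_a}{\partial \hat{W}_{1a}} &= \left( \hat{W}_{1a} - \hat{W}_{1c} \right)^T \sigma_1' G_1 \sigma_1'^T \delta_{hjb_1} + \left( \hat{W}_{1a} \sigma_1' G_{21} - \hat{W}_{2c} \sigma_2' G_2 \right)^T \sigma_1'^T \delta_{hjb_2}, \\
\frac{\partial E_a}{\partial \hat{W}_{2a}} &= \left( \hat{W}_{2a} \sigma_2' G_{12} - \hat{W}_{1c} \sigma_1' G_1 \right)^T \sigma_2'^T \delta_{hjb_1} + \left( \hat{W}_{2a} - \hat{W}_{2c} \right)^T \sigma_2' G_2 \sigma_2'^T \delta_{hjb_2},
\end{aligned} \tag{5-25}$$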

PAGE 109

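where $G_i \triangleq g_i R_{ii}^{-1} g_i^T \in \mathbb{R}^{n \times n}$ and $G_{ji} \triangleq g_i R_{ii}^{-1} R_{ji} R_{ii}^{-1} g_i^T \in \mathbb{R}^{n \times n}$, for $i = 1, 2$ and $j = 1, 2$, are symmetric matrices. Using Eq. 5-25, the actor NNs are updated as

$$\begin{aligned}
\dot{\hat{W}}_{1a} &= \operatorname{proj} \left\{ -\frac{\Gamma_{11a}}{\sqrt{1 + \omega_1^T \omega_1}} \frac{\partial E_a}{\partial \hat{W}_{1a}} - \Gamma_{12a} \left( \hat{W}_{1a} - \hat{W}_{1c} \right) \right\}, \\
\dot{\hat{W}}_{2a} &= \operatorname{proj} \left\{ -\frac{\Gamma_{21a}}{\sqrt{1 + \omega_2^T \omega_2}} \frac{\partial E_a}{\partial \hat{W}_{2a}} - \Gamma_{22a} \left( \hat{W}_{2a} - \hat{W}_{2c} \right) \right\},
\end{aligned} \tag{5-26}$$

where $\Gamma_{11a}, \Gamma_{12a}, \Gamma_{21a}, \Gamma_{22a} \in \mathbb{R}$ are positive adaptation gains, and $\operatorname{proj}\{\cdot\}$ is a projection operator used to bound the weight estimates [127], [128]. Using Assumption 4-2 and the projection algorithm in Eq. 5-26, the actor NN weight estimation errors can be bounded as

$$\left\| \tilde{W}_{1a} \right\| \le \kappa_1; \quad \left\| \tilde{W}_{2a} \right\| \le \kappa_2, \tag{5-27}$$

where $\kappa_1, \kappa_2 \in \mathbb{R}$ are some positive constants. The first term in Eq. 5-26 is normalized, and the last term is added as feedback for stability (based on the subsequent stability analysis).

5.5 Stability Analysis

The dynamics of the critic weight estimation errors $\tilde{W}_{1c}(t)$ and $\tilde{W}_{2c}(t)$ can be developed using Eqs. 5-11, 5-14, 5-20, and 5-22 as

$$\begin{aligned}
\dot{\tilde{W}}_{1c} = \eta_{1c} \Gamma_{1c} \frac{\omega_1}{1 + \nu_1 \omega_1^T \Gamma_{1c} \omega_1} \Big[ & -\tilde{W}_{1c}^T \omega_1 - W_1^T \sigma_1' \tilde{F}_{\hat{u}} - \varepsilon_1' F_{u^*} + \hat{u}_1^T R_{11} \hat{u}_1 - u_1^{*T} R_{11} u_1^* - u_2^{*T} R_{12} u_2^* \\
& + W_1^T \sigma_1' \left( g_1 (\hat{u}_1 - u_1^*) + g_2 (\hat{u}_2 - u_2^*) \right) + \hat{u}_2^T R_{12} \hat{u}_2 \Big], \\
\dot{\tilde{W}}_{2c} = \eta_{2c} \Gamma_{2c} \frac{\omega_2}{1 + \nu_2 \omega_2^T \Gamma_{2c} \omega_2} \Big[ & -\tilde{W}_{2c}^T \omega_2 - W_2^T \sigma_2' \tilde{F}_{\hat{u}} - \varepsilon_2' F_{u^*} + \hat{u}_2^T R_{22} \hat{u}_2 - u_1^{*T} R_{21} u_1^* - u_2^{*T} R_{22} u_2^* \\
& + W_2^T \sigma_2' \left( g_1 (\hat{u}_1 - u_1^*) + g_2 (\hat{u}_2 - u_2^*) \right) + \hat{u}_1^T R_{21} \hat{u}_1 \Big].
\end{aligned} \tag{5-28}$$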

PAGE 110

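Substituting for $(u_1^*(x), u_2^*(x))$ and $(\hat{u}_1(x), \hat{u}_2(x))$ from Eqs. 5-18 and 5-19, respectively, in Eq. 5-28 yields

$$\begin{aligned}
\dot{\tilde{W}}_{1c} = {}& -\eta_{1c} \Gamma_{1c} \psi_1 \psi_1^T \tilde{W}_{1c} + \eta_{1c} \Gamma_{1c} \frac{\omega_1}{1 + \nu_1 \omega_1^T \Gamma_{1c} \omega_1} \Big[ -W_1^T \sigma_1' \tilde{F}_{\hat{u}} - \varepsilon_1' F_{u^*} \\
& + \tfrac{1}{4} \tilde{W}_{1a}^T \sigma_1' G_1 \sigma_1'^T \tilde{W}_{1a} + \tfrac{1}{4} \tilde{W}_{2a}^T \sigma_2' G_{12} \sigma_2'^T \tilde{W}_{2a} - \tfrac{1}{4} \varepsilon_1' G_1 \varepsilon_1'^T - \tfrac{1}{4} \varepsilon_2' G_{12} \varepsilon_2'^T \\
& + \tfrac{1}{2} \left( \tilde{W}_{2a}^T \sigma_2' + \varepsilon_2' \right) \left( G_2 \sigma_1'^T W_1 - G_{12} \sigma_2'^T W_2 \right) \Big], \\
\dot{\tilde{W}}_{2c} = {}& -\eta_{2c} \Gamma_{2c} \psi_2 \psi_2^T \tilde{W}_{2c} + \eta_{2c} \Gamma_{2c} \frac{\omega_2}{1 + \nu_2 \omega_2^T \Gamma_{2c} \omega_2} \Big[ -W_2^T \sigma_2' \tilde{F}_{\hat{u}} - \varepsilon_2' F_{u^*} \\
& + \tfrac{1}{4} \tilde{W}_{1a}^T \sigma_1' G_{21} \sigma_1'^T \tilde{W}_{1a} + \tfrac{1}{4} \tilde{W}_{2a}^T \sigma_2' G_2 \sigma_2'^T \tilde{W}_{2a} - \tfrac{1}{4} \varepsilon_2' G_2 \varepsilon_2'^T - \tfrac{1}{4} \varepsilon_1' G_{21} \varepsilon_1'^T \\
& + \tfrac{1}{2} \left( \tilde{W}_{1a}^T \sigma_1' + \varepsilon_1' \right) \left( G_1 \sigma_2'^T W_2 - G_{21} \sigma_1'^T W_1 \right) \Big],
\end{aligned} \tag{5-29}$$

where $\psi_i(t) \triangleq \dfrac{\omega_i(t)}{\sqrt{1 + \nu_i \omega_i(t)^T \Gamma_{ic}(t) \omega_i(t)}} \in \mathbb{R}^N$ are the normalized critic regressor vectors for $i = 1, 2$, respectively, bounded as

$$\left\| \psi_1 \right\| \le \frac{1}{\sqrt{\nu_1 \varphi_{11}}}, \quad \left\| \psi_2 \right\| \le \frac{1}{\sqrt{\nu_2 \varphi_{12}}}, \tag{5-30}$$

where $\varphi_{11}$ and $\varphi_{12}$ are introduced in Eq. 5-24. The error systems in Eq. 5-29 can be represented as the following perturbed systems

$$\dot{\tilde{W}}_{1c} = \Omega_1 + \Lambda_{01} \Delta_1, \quad \dot{\tilde{W}}_{2c} = \Omega_2 + \Lambda_{02} \Delta_2, \tag{5-31}$$

where $\Omega_i(\tilde{W}_{ic}, t) \triangleq -\eta_{ic} \Gamma_{ic} \psi_i \psi_i^T \tilde{W}_{ic} \in \mathbb{R}^N$, $i = 1, 2$, denotes the nominal system, $\Lambda_{0i} \triangleq \dfrac{\eta_{ic} \Gamma_{ic} \omega_i}{1 + \nu_i \omega_i^T \Gamma_{ic} \omega_i}$ denotes the perturbation gain, and

$$\begin{aligned}
\Delta_i(t) \triangleq \Big[ & -W_i^T \sigma_i' \tilde{F}_{\hat{u}} + \tfrac{1}{4} \tilde{W}_{ia}^T \sigma_i' G_i \sigma_i'^T \tilde{W}_{ia} - \varepsilon_i' F_{u^*} + \tfrac{1}{4} \tilde{W}_{ka}^T \sigma_k' G_{ik} \sigma_k'^T \tilde{W}_{ka} - \tfrac{1}{4} \varepsilon_k' G_{ik} \varepsilon_k'^T \\
& - \tfrac{1}{4} \varepsilon_i' G_i \varepsilon_i'^T + \tfrac{1}{2} \left( \tilde{W}_{ka}^T \sigma_k' + \varepsilon_k' \right) \left( G_k \sigma_i'^T W_i - G_{ik} \sigma_k'^T W_k \right) \Big] \in \mathbb{R},
\end{aligned}$$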

PAGE 111

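where $i = 1, 2$ and $k = 3 - i$, denotes the perturbations. Using Theorem 2.5.1 in [130], it can be shown that the nominal systems

$$\dot{\tilde{W}}_{1c} = -\eta_{1c} \Gamma_{1c} \psi_1 \psi_1^T \tilde{W}_{1c}, \quad \dot{\tilde{W}}_{2c} = -\eta_{2c} \Gamma_{2c} \psi_2 \psi_2^T \tilde{W}_{2c} \tag{5-32}$$

are exponentially stable if the bounded signals $(\psi_1(t), \psi_2(t))$ are PE, i.e.,

$$\mu_{i2} I \ge \int_{t_0}^{t_0 + \delta_i} \psi_i(\tau) \psi_i(\tau)^T d\tau \ge \mu_{i1} I \quad \forall t_0 \ge 0, \; i = 1, 2,$$

where $\mu_{i1}, \mu_{i2}, \delta_i \in \mathbb{R}$ are some positive constants. Since $\Omega_i(\tilde{W}_{ic}, t)$ is continuously differentiable and the Jacobian $\partial \Omega_i / \partial \tilde{W}_{ic} = -\eta_{ic} \Gamma_{ic} \psi_i \psi_i^T$ is bounded for the exponentially stable system in Eq. 5-32 for $i = 1, 2$, the converse Lyapunov Theorem 4.14 in [131] can be used to show that there exists a function $V_c : \mathbb{R}^N \times \mathbb{R}^N \times [0, \infty) \to \mathbb{R}$ which satisfies the following inequalities

$$\begin{aligned}
c_{11} \left\| \tilde{W}_{1c} \right\|^2 + c_{12} \left\| \tilde{W}_{2c} \right\|^2 \le V_c(\tilde{W}_{1c}, \tilde{W}_{2c}, t) &\le c_{21} \left\| \tilde{W}_{1c} \right\|^2 + c_{22} \left\| \tilde{W}_{2c} \right\|^2, \\
\frac{\partial V_c}{\partial t} + \frac{\partial V_c}{\partial \tilde{W}_{1c}} \Omega_1(\tilde{W}_{1c}, t) + \frac{\partial V_c}{\partial \tilde{W}_{2c}} \Omega_2(\tilde{W}_{2c}, t) &\le -c_{31} \left\| \tilde{W}_{1c} \right\|^2 - c_{32} \left\| \tilde{W}_{2c} \right\|^2, \\
\left\| \frac{\partial V_c}{\partial \tilde{W}_{1c}} \right\| \le c_{41} \left\| \tilde{W}_{1c} \right\|, \quad \left\| \frac{\partial V_c}{\partial \tilde{W}_{2c}} \right\| &\le c_{42} \left\| \tilde{W}_{2c} \right\|,
\end{aligned} \tag{5-33}$$

for some positive constants $c_{1i}, c_{2i}, c_{3i}, c_{4i} \in \mathbb{R}$ for $i = 1, 2$. Using Assumptions 4-1 through 4-6 and 5-1, the projection bounds in Eq. 5-26, the fact that $F_{u^*} \in \mathcal{L}_\infty$ (since $(u_1^*(x), u_2^*(x))$ is stabilizing), and provided the conditions of Theorem 5-1 hold (required to prove that $\tilde{F}_{\hat{u}} \in \mathcal{L}_\infty$), the following bounds are developed to facilitate the subsequent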

PAGE 112

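stability proof

$$\begin{aligned}
& \left\| \tilde{W}_{1a} \right\| \le \kappa_1; \quad \left\| \tilde{W}_{2a} \right\| \le \kappa_2; \quad \left\| \sigma_1' G_1 \sigma_1'^T \right\| \le \kappa_3; \quad \left\| \sigma_2' G_2 \sigma_2'^T \right\| \le \kappa_4; \quad \left\| \Delta_1 \right\| \le \kappa_5; \quad \left\| \Delta_2 \right\| \le \kappa_6, \\
& \left\| \tfrac{1}{2} \left( W_1^T \sigma_1' + W_2^T \sigma_2' + \varepsilon_1' + \varepsilon_2' \right) \left( G_1 \varepsilon_1'^T + G_2 \varepsilon_2'^T \right) \right\| \le \kappa_7, \\
& \left\| \tfrac{1}{2} \left( W_1^T \sigma_1' + W_2^T \sigma_2' + \varepsilon_1' + \varepsilon_2' \right) \left( G_1 \sigma_1'^T \tilde{W}_{1a} + G_2 \sigma_2'^T \tilde{W}_{2a} \right) \right\| \le \kappa_8, \\
& \left\| \sigma_1' G_{21} \sigma_1'^T \right\| \le \kappa_9; \quad \left\| \sigma_2' G_2 \sigma_1'^T \right\| \le \kappa_{10}; \quad \left\| \sigma_1' G_1 \sigma_2'^T \right\| \le \kappa_{11}; \quad \left\| \sigma_2' G_{12} \sigma_2'^T \right\| \le \kappa_{12},
\end{aligned} \tag{5-34}$$

where $\kappa_j \in \mathbb{R}$ for $j = 1, \dots, 12$ are computable positive constants.

Theorem 5.2. If Assumptions 4-1 through 4-6 and 5-1 hold, the regressors $\psi_i(t) \triangleq \dfrac{\omega_i}{\sqrt{1 + \nu_i \omega_i^T \Gamma_{ic} \omega_i}}$ for $i = 1, 2$ are PE, and provided Eq. 4-32, Eq. 5-17, and the following sufficient gain conditions are satisfied

$$\frac{c_{31}}{\Gamma_{11a}} > \kappa_1 \kappa_3; \quad \frac{c_{32}}{\Gamma_{21a}} > \kappa_2 \kappa_4,$$

where $\Gamma_{11a}, \Gamma_{21a}, c_{31}, c_{32}, \kappa_1, \kappa_2, \kappa_3$, and $\kappa_4$ are introduced in Eqs. 5-26, 5-33, and 5-34, then the controller in Eq. 5-19, the actor-critic weight update laws in Eqs. 5-22, 5-23, and 5-26, and the identifier in Eqs. 4-15 and 4-21 guarantee that the state of the system $x(t)$ and the actor-critic weight estimation errors $(\tilde{W}_{1a}(t), \tilde{W}_{2a}(t))$ and $(\tilde{W}_{1c}(t), \tilde{W}_{2c}(t))$ are UUB.

Proof. To investigate the stability of the system in Eq. 5-16 with control $(\hat{u}_1, \hat{u}_2)$, and the perturbed system in Eq. 5-31, consider $V_L : \mathcal{X} \times \mathbb{R}^N \times \mathbb{R}^N \times \mathbb{R}^N \times \mathbb{R}^N \times [0, \infty) \to \mathbb{R}$ as the continuously differentiable, positive-definite Lyapunov function candidate, given as

$$V_L(x, \tilde{W}_{1c}, \tilde{W}_{2c}, \tilde{W}_{1a}, \tilde{W}_{2a}, t) \triangleq V_1^*(x) + V_2^*(x) + V_c(\tilde{W}_{1c}, \tilde{W}_{2c}, t) + \tfrac{1}{2} \tilde{W}_{1a}^T \tilde{W}_{1a} + \tfrac{1}{2} \tilde{W}_{2a}^T \tilde{W}_{2a},$$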

PAGE 113

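where $V_i^*(x)$ for $i = 1, 2$ (the optimal value functions for Eq. 5-16) are the Lyapunov functions for Eq. 5-16, and $V_c(\tilde{W}_c, t)$ is the Lyapunov function for the exponentially stable system in Eq. 5-32. Since $(V_1^*(x), V_2^*(x))$ are continuously differentiable and positive definite from Eq. 5-5, from Lemma 4.3 in [131], there exist class $\mathcal{K}$ functions $\alpha_1$ and $\alpha_2$ defined on $[0, r]$, where $B_r \subset \mathcal{X}$, such that

$$\alpha_1(\|x\|) \le V_1^*(x) + V_2^*(x) \le \alpha_2(\|x\|) \quad \forall x \in B_r. \tag{5-35}$$

Using Eqs. 5-33 and 5-35, $V_L(x, \tilde{W}_{1c}, \tilde{W}_{2c}, \tilde{W}_{1a}, \tilde{W}_{2a}, t)$ can be bounded as

$$\begin{aligned}
& \alpha_1(\|x\|) + c_{11} \left\| \tilde{W}_{1c} \right\|^2 + c_{12} \left\| \tilde{W}_{2c} \right\|^2 + \tfrac{1}{2} \left( \left\| \tilde{W}_{1a} \right\|^2 + \left\| \tilde{W}_{2a} \right\|^2 \right) \le V_L \\
& \qquad \le \alpha_2(\|x\|) + c_{21} \left\| \tilde{W}_{1c} \right\|^2 + c_{22} \left\| \tilde{W}_{2c} \right\|^2 + \tfrac{1}{2} \left( \left\| \tilde{W}_{1a} \right\|^2 + \left\| \tilde{W}_{2a} \right\|^2 \right),
\end{aligned}$$

which can be written as

$$\alpha_3(\|w\|) \le V_L(x, \tilde{W}_{1c}, \tilde{W}_{2c}, \tilde{W}_{1a}, \tilde{W}_{2a}, t) \le \alpha_4(\|w\|) \quad \forall w \in B_s,$$

where $w(t) \triangleq [x(t)^T \;\; \tilde{W}_{1c}(t)^T \;\; \tilde{W}_{2c}(t)^T \;\; \tilde{W}_{1a}(t)^T \;\; \tilde{W}_{2a}(t)^T]^T \in \mathbb{R}^{n + 4N}$, and $\alpha_3$ and $\alpha_4$ are class $\mathcal{K}$ functions defined on $[0, s]$, where $B_s \subset \mathcal{X} \times \mathbb{R}^N \times \mathbb{R}^N \times \mathbb{R}^N \times \mathbb{R}^N$. Taking the time derivative of $V_L(\cdot)$ yields

$$\begin{aligned}
\dot{V}_L = {}& \left( \nabla V_1^* + \nabla V_2^* \right) \left( f + g_1 \hat{u}_1 + g_2 \hat{u}_2 \right) + \frac{\partial V_c}{\partial t} + \frac{\partial V_c}{\partial \tilde{W}_{1c}} \Omega_1 + \frac{\partial V_c}{\partial \tilde{W}_{1c}} \Lambda_{01} \Delta_1 \\
& + \frac{\partial V_c}{\partial \tilde{W}_{2c}} \Omega_2 + \frac{\partial V_c}{\partial \tilde{W}_{2c}} \Lambda_{02} \Delta_2 - \tilde{W}_{1a}^T \dot{\hat{W}}_{1a} - \tilde{W}_{2a}^T \dot{\hat{W}}_{2a},
\end{aligned} \tag{5-36}$$

where the time derivatives of $V_i^*(\cdot)$ for $i = 1, 2$ are taken along the trajectories of the system in Eq. 5-16 with control inputs $(\hat{u}_1(\cdot), \hat{u}_2(\cdot))$, and the time derivative of $V_c(\cdot)$ is taken along the trajectories of the perturbed system in Eq. 5-31. Using the HJB equation in Eq. 5-12, $\nabla V_i^* f = -\nabla V_i^* (g_1 u_1^* + g_2 u_2^*) - Q_i(x) - \sum_{j=1}^{2} u_j^{*T} R_{ij} u_j^*$ for $i = 1, 2$. Substituting for the $\nabla V_i^* f$ terms in Eq. 5-36, using the fact that $\nabla V_i^* g_i = -2 u_i^{*T} R_{ii}$ from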

PAGE 114

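Eq. 5-9, and using Eqs. 5-26 and 5-33, Eq. 5-36 can be upper bounded as

$$\begin{aligned}
\dot{V}_L \le {}& -Q - u_1^{*T} (R_{11} + R_{21}) u_1^* - u_2^{*T} (R_{22} + R_{12}) u_2^* - c_{31} \left\| \tilde{W}_{1c} \right\|^2 - c_{32} \left\| \tilde{W}_{2c} \right\|^2 \\
& + c_{41} \Lambda_{01} \left\| \tilde{W}_{1c} \right\| \left\| \Delta_1 \right\| + c_{42} \Lambda_{02} \left\| \tilde{W}_{2c} \right\| \left\| \Delta_2 \right\| + 2 u_1^{*T} R_{11} (u_1^* - \hat{u}_1) + 2 u_2^{*T} R_{22} (u_2^* - \hat{u}_2) \\
& + \tilde{W}_{1a}^T \left[ \frac{\Gamma_{11a}}{\sqrt{1 + \omega_1^T \omega_1}} \frac{\partial E_a}{\partial \hat{W}_{1a}} + \Gamma_{12a} \left( \hat{W}_{1a} - \hat{W}_{1c} \right) \right] + \nabla V_1^* g_2 (\hat{u}_2 - u_2^*) \\
& + \tilde{W}_{2a}^T \left[ \frac{\Gamma_{21a}}{\sqrt{1 + \omega_2^T \omega_2}} \frac{\partial E_a}{\partial \hat{W}_{2a}} + \Gamma_{22a} \left( \hat{W}_{2a} - \hat{W}_{2c} \right) \right] + \nabla V_2^* g_1 (\hat{u}_1 - u_1^*),
\end{aligned} \tag{5-37}$$

where $Q(x) \triangleq Q_1(x) + Q_2(x)$. Substituting for $u_i^*$, $\hat{u}_i$, $\delta_{hjb_i}$, and $\Delta_i$ for $i = 1, 2$ using Eqs. 5-9, 5-19, 5-28, and 5-31, respectively, and using Eqs. 5-24 and 5-30 in Eq. 5-37, yields

$$\begin{aligned}
\dot{V}_L \le {}& -Q - c_{31} \left\| \tilde{W}_{1c} \right\|^2 - c_{32} \left\| \tilde{W}_{2c} \right\|^2 - \Gamma_{12a} \left\| \tilde{W}_{1a} \right\|^2 - \Gamma_{22a} \left\| \tilde{W}_{2a} \right\|^2 \\
& + \tfrac{1}{2} \left( W_1^T \sigma_1' + W_2^T \sigma_2' + \varepsilon_1' + \varepsilon_2' \right) \left( G_1 \varepsilon_1'^T + G_2 \varepsilon_2'^T \right) \\
& + \tfrac{1}{2} \left( W_1^T \sigma_1' + W_2^T \sigma_2' + \varepsilon_1' + \varepsilon_2' \right) \left( G_1 \sigma_1'^T \tilde{W}_{1a} + G_2 \sigma_2'^T \tilde{W}_{2a} \right) \\
& + c_{41} \frac{\eta_{1c} \varphi_{01}}{2 \sqrt{\nu_1 \varphi_{11}}} \left\| \Delta_1 \right\| \left\| \tilde{W}_{1c} \right\| + c_{42} \frac{\eta_{2c} \varphi_{02}}{2 \sqrt{\nu_2 \varphi_{12}}} \left\| \Delta_2 \right\| \left\| \tilde{W}_{2c} \right\| \\
& + \frac{\Gamma_{11a}}{\sqrt{1 + \omega_1^T \omega_1}} \tilde{W}_{1a}^T \Big( ( \tilde{W}_{1c} - \tilde{W}_{1a} )^T \sigma_1' G_1 \sigma_1'^T ( -\tilde{W}_{1c}^T \omega_1 + \Delta_1 ) \\
& \qquad + ( \tilde{W}_{1c} \sigma_1' G_{21} - \tilde{W}_{2a} \sigma_2' G_2 )^T \sigma_1'^T ( -\tilde{W}_{2c}^T \omega_2 + \Delta_2 ) \Big) + \Gamma_{12a} \left\| \tilde{W}_{1a} \right\| \left\| \tilde{W}_{1c} \right\| \\
& + \frac{\Gamma_{21a}}{\sqrt{1 + \omega_2^T \omega_2}} \tilde{W}_{2a}^T \Big( ( \tilde{W}_{2c} \sigma_2' G_{12} - \tilde{W}_{1a} \sigma_1' G_1 )^T \sigma_2'^T ( -\tilde{W}_{1c}^T \omega_1 + \Delta_1 ) \\
& \qquad + ( \tilde{W}_{2c} - \tilde{W}_{2a} )^T \sigma_2' G_2 \sigma_2'^T ( -\tilde{W}_{2c}^T \omega_2 + \Delta_2 ) \Big) + \Gamma_{22a} \left\| \tilde{W}_{2a} \right\| \left\| \tilde{W}_{2c} \right\| \\
& + \frac{\Gamma_{11a}}{\sqrt{1 + \omega_1^T \omega_1}} \tilde{W}_{1a}^T \Big( ( W_1 \sigma_1' G_{21} - W_2 \sigma_2' G_2 )^T \sigma_1'^T ( -\tilde{W}_{2c}^T \omega_2 + \Delta_2 ) \Big) \\
& + \frac{\Gamma_{21a}}{\sqrt{1 + \omega_2^T \omega_2}} \tilde{W}_{2a}^T \Big( ( W_2 \sigma_2' G_{12} - W_1 \sigma_1' G_1 )^T \sigma_2'^T ( -\tilde{W}_{1c}^T \omega_1 + \Delta_1 ) \Big).
\end{aligned} \tag{5-38}$$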

PAGE 115

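Using the bounds developed in Eq. 5-34, Eq. 5-38 can be further upper bounded as

$$\begin{aligned}
\dot{V}_L \le {}& -Q - (c_{31} - \Gamma_{11a} \kappa_1 \kappa_3) \left\| \tilde{W}_{1c} \right\|^2 - (c_{32} - \Gamma_{21a} \kappa_2 \kappa_4) \left\| \tilde{W}_{2c} \right\|^2 - \Gamma_{12a} \left\| \tilde{W}_{1a} \right\|^2 - \Gamma_{22a} \left\| \tilde{W}_{2a} \right\|^2 \\
& + \iota_1 \left\| \tilde{W}_{1c} \right\| + \iota_2 \left\| \tilde{W}_{2c} \right\| + \Gamma_{11a} \left( \kappa_1^2 \kappa_3 \kappa_5 + \kappa_1 \left( \kappa_2 \kappa_{10} + \bar{W}_1 \kappa_9 + \bar{W}_2 \kappa_{10} \right) \kappa_6 \right) \\
& + \Gamma_{21a} \left( \kappa_2^2 \kappa_4 \kappa_6 + \kappa_2 \left( \kappa_1 \kappa_{11} + \bar{W}_1 \kappa_{11} + \bar{W}_2 \kappa_{12} \right) \kappa_5 \right) + \iota_3 \left\| \tilde{W}_{1c} \right\| \left\| \tilde{W}_{2c} \right\| + \kappa_7 + \kappa_8,
\end{aligned}$$

where

$$\begin{aligned}
\iota_1 &\triangleq \frac{c_{41} \eta_{1c} \varphi_{01}}{2 \sqrt{\nu_1 \varphi_{11}}} \kappa_5 + \Gamma_{11a} \left( \kappa_1 \kappa_3 \kappa_5 + \kappa_1^2 \kappa_3 + \kappa_1 \kappa_9 \kappa_6 \right) + \Gamma_{12a} \kappa_1 + \Gamma_{21a} \kappa_2 \left( \kappa_1 \kappa_{11} + \bar{W}_1 \kappa_{11} + \bar{W}_2 \kappa_{12} \right), \\
\iota_2 &\triangleq \frac{c_{42} \eta_{2c} \varphi_{02}}{2 \sqrt{\nu_2 \varphi_{12}}} \kappa_6 + \Gamma_{21a} \left( \kappa_2 \kappa_3 \kappa_6 + \kappa_2 \kappa_{12} \kappa_5 + \kappa_2^2 \kappa_4 \right) + \Gamma_{22a} \kappa_2 + \Gamma_{11a} \kappa_1 \left( \kappa_2 \kappa_{10} + \bar{W}_1 \kappa_9 + \bar{W}_2 \kappa_{10} \right), \\
\iota_3 &\triangleq \Gamma_{11a} \kappa_1 \kappa_9 + \Gamma_{21a} \kappa_2 \kappa_{12}.
\end{aligned}$$

Provided $c_{31} > \Gamma_{11a} \kappa_1 \kappa_3$ and $c_{32} > \Gamma_{21a} \kappa_2 \kappa_4$, using Young's inequality $\left\| \tilde{W}_{1c} \right\| \left\| \tilde{W}_{2c} \right\| \le \tfrac{1}{2} \left\| \tilde{W}_{1c} \right\|^2 + \tfrac{1}{2} \left\| \tilde{W}_{2c} \right\|^2$ and completing the square yields

$$\begin{aligned}
\dot{V}_L \le {}& -Q - (1 - \theta_1) \left( c_{31} - \Gamma_{11a} \kappa_1 \kappa_3 - \tfrac{1}{2} \iota_3 \right) \left\| \tilde{W}_{1c} \right\|^2 - \Gamma_{12a} \left\| \tilde{W}_{1a} \right\|^2 \\
& - (1 - \theta_2) \left( c_{32} - \Gamma_{21a} \kappa_2 \kappa_4 - \tfrac{1}{2} \iota_3 \right) \left\| \tilde{W}_{2c} \right\|^2 - \Gamma_{22a} \left\| \tilde{W}_{2a} \right\|^2 \\
& + \Gamma_{11a} \left( \kappa_1^2 \kappa_3 \kappa_5 + \kappa_1 \left( \kappa_2 \kappa_{10} + \bar{W}_1 \kappa_9 + \bar{W}_2 \kappa_{10} \right) \kappa_6 \right) \\
& + \Gamma_{21a} \left( \kappa_2^2 \kappa_4 \kappa_6 + \kappa_2 \left( \kappa_1 \kappa_{11} + \bar{W}_1 \kappa_{11} + \bar{W}_2 \kappa_{12} \right) \kappa_5 \right) \\
& + \frac{\iota_1^2}{4 \theta_1 \left( c_{31} - \Gamma_{11a} \kappa_1 \kappa_3 - \tfrac{1}{2} \iota_3 \right)} + \frac{\iota_2^2}{4 \theta_2 \left( c_{32} - \Gamma_{21a} \kappa_2 \kappa_4 - \tfrac{1}{2} \iota_3 \right)},
\end{aligned} \tag{5-39}$$

where $\theta_1, \theta_2 \in (0, 1)$. Since $Q(x)$ is positive definite, according to Lemma 4.3 in [131], there exist class $\mathcal{K}$ functions $\alpha_5$ and $\alpha_6$ such that

$$\alpha_5(\|w\|) \le F(\|w\|) \le \alpha_6(\|w\|) \quad \forall w \in B_s, \tag{5-40}$$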

PAGE 116

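where

$$\begin{aligned}
F(\|w\|) = {}& Q + (1 - \theta_1) \left( c_{31} - \Gamma_{11a} \kappa_1 \kappa_3 - \tfrac{1}{2} \iota_3 \right) \left\| \tilde{W}_{1c} \right\|^2 + \Gamma_{12a} \left\| \tilde{W}_{1a} \right\|^2 \\
& + (1 - \theta_2) \left( c_{32} - \Gamma_{21a} \kappa_2 \kappa_4 - \tfrac{1}{2} \iota_3 \right) \left\| \tilde{W}_{2c} \right\|^2 + \Gamma_{22a} \left\| \tilde{W}_{2a} \right\|^2.
\end{aligned}$$

Using Eq. 5-40, the expression in Eq. 5-39 can be further upper bounded as $\dot{V}_L \le -\alpha_5(\|w\|) + \Upsilon$, where

$$\begin{aligned}
\Upsilon = {}& \Gamma_{11a} \left( \kappa_1^2 \kappa_3 \kappa_5 + \kappa_1 \left( \kappa_2 \kappa_{10} + \bar{W}_1 \kappa_9 + \bar{W}_2 \kappa_{10} \right) \kappa_6 \right) + \Gamma_{21a} \left( \kappa_2^2 \kappa_4 \kappa_6 + \kappa_2 \left( \kappa_1 \kappa_{11} + \bar{W}_1 \kappa_{11} + \bar{W}_2 \kappa_{12} \right) \kappa_5 \right) \\
& + \frac{\iota_1^2}{4 \theta_1 \left( c_{31} - \Gamma_{11a} \kappa_1 \kappa_3 - \tfrac{1}{2} \iota_3 \right)} + \frac{\iota_2^2}{4 \theta_2 \left( c_{32} - \Gamma_{21a} \kappa_2 \kappa_4 - \tfrac{1}{2} \iota_3 \right)},
\end{aligned}$$

which proves that $\dot{V}_L(\cdot)$ is negative whenever $w(t)$ lies outside the compact set $\Omega_w \triangleq \left\{ w : \|w\| \le \alpha_5^{-1}(\Upsilon) \right\}$, and hence $\|w(t)\|$ is UUB, according to Theorem 4.18 in [131].

5.6 Convergence to Nash Solution

The subsequent theorem demonstrates that the actor NN approximations converge to the approximate coupled HJB in Eq. 5-10. It can also be shown that the approximate controllers in Eq. 5-19 approximate the optimal solutions to the two-player Nash game for the dynamic system given in Eq. 5-16.

Assumption 5.2. For each set of admissible control policies, the HJB equations in Eq. 5-10 have a locally smooth solution $V_i(x) \ge 0$ for $i = 1, 2$, and $f(\cdot)$ is Lipschitz and bounded by $\|f\| \le c_f \|x\|$, where $c_f \in \mathbb{R}$ is a positive constant.

Theorem 5.3. Given that the assumptions and sufficient gain constraints in Theorem 5-2 hold, the actor and critic NNs converge to the approximate coupled HJB solution, in the sense that the HJBs in Eq. 5-13 are UUB.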

PAGE 117

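Proof. Consider the approximate HJB in Eq. 5-13. Substituting the approximate control laws in Eq. 5-19 yields

$$\begin{aligned}
H_1\left(x, \nabla \hat{V}_1, \hat{u}_1, \hat{u}_2\right) = r_{\hat{u}_1} + \nabla \hat{V}_1 F_{\hat{u}} = {}& Q_1(x) + \tfrac{1}{4} \hat{W}_{1a}^T \sigma_1' G_1 \sigma_1'^T \hat{W}_{1a} + \hat{W}_{1c}^T \sigma_1' f(x) \\
& + \tfrac{1}{4} \hat{W}_{2a}^T \sigma_2' G_{12} \sigma_2'^T \hat{W}_{2a} - \tfrac{1}{2} \hat{W}_{1c}^T \sigma_1' \left( G_1 \sigma_1'^T \hat{W}_{1a} + G_2 \sigma_2'^T \hat{W}_{2a} \right)
\end{aligned}$$

and

$$\begin{aligned}
H_2\left(x, \nabla \hat{V}_2, \hat{u}_1, \hat{u}_2\right) = r_{\hat{u}_2} + \nabla \hat{V}_2 F_{\hat{u}} = {}& Q_2(x) + \tfrac{1}{4} \hat{W}_{2a}^T \sigma_2' G_2 \sigma_2'^T \hat{W}_{2a} + \hat{W}_{2c}^T \sigma_2' f(x) \\
& + \tfrac{1}{4} \hat{W}_{1a}^T \sigma_1' G_{21} \sigma_1'^T \hat{W}_{1a} - \tfrac{1}{2} \hat{W}_{2c}^T \sigma_2' \left( G_1 \sigma_1'^T \hat{W}_{1a} + G_2 \sigma_2'^T \hat{W}_{2a} \right).
\end{aligned}$$

Adding and subtracting $\left( W_i^T \sigma_i' + \varepsilon_i' \right) f = -\left( W_i^T \sigma_i' + \varepsilon_i' \right) (g_1 u_1^* + g_2 u_2^*) - Q_i(x) - \sum_{j=1}^{2} u_j^{*T} R_{ij} u_j^*$ for $i = 1, 2$, and substituting for the optimal control law in Eq. 5-18, gives

$$\begin{aligned}
H_1 = {}& -\tilde{W}_{1c}^T \sigma_1' f(x) - \varepsilon_1' f + \tfrac{1}{4} \hat{W}_{1a}^T \sigma_1' G_1 \sigma_1'^T \hat{W}_{1a} - \tfrac{1}{4} W_1^T \sigma_1' G_1 \sigma_1'^T W_1 \\
& + \tfrac{1}{4} \hat{W}_{2a}^T \sigma_2' G_{12} \sigma_2'^T \hat{W}_{2a} - \tfrac{1}{4} W_2^T \sigma_2' G_{12} \sigma_2'^T W_2 \\
& - \tfrac{1}{2} \left( \varepsilon_2' G_{12} - \varepsilon_1' G_2 \right) \sigma_2'^T W_2 + \tfrac{1}{2} \varepsilon_1' \left( G_1 \varepsilon_1'^T + G_2 \varepsilon_2'^T \right) - \tfrac{1}{4} \varepsilon_2' G_{12} \varepsilon_2'^T \\
& + \tfrac{1}{2} W_1^T \sigma_1' \left( G_1 \sigma_1'^T W_1 + G_2 \sigma_2'^T W_2 \right) + \tfrac{1}{2} W_1^T \sigma_1' \left( G_1 \varepsilon_1'^T + G_2 \varepsilon_2'^T \right) \\
& - \tfrac{1}{2} \hat{W}_{1c}^T \sigma_1' \left( G_1 \sigma_1'^T \hat{W}_{1a} + G_2 \sigma_2'^T \hat{W}_{2a} \right)
\end{aligned} \tag{5-41}$$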

PAGE 118

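and

$$\begin{aligned}
H_2 = {}& -\tilde{W}_{2c}^T \sigma_2' f(x) - \varepsilon_2' f + \tfrac{1}{4} \hat{W}_{2a}^T \sigma_2' G_2 \sigma_2'^T \hat{W}_{2a} - \tfrac{1}{4} W_2^T \sigma_2' G_2 \sigma_2'^T W_2 \\
& + \tfrac{1}{4} \hat{W}_{1a}^T \sigma_1' G_{21} \sigma_1'^T \hat{W}_{1a} - \tfrac{1}{4} W_1^T \sigma_1' G_{21} \sigma_1'^T W_1 \\
& - \tfrac{1}{2} \left( \varepsilon_1' G_{21} - \varepsilon_2' G_1 \right) \sigma_1'^T W_1 + \tfrac{1}{2} \varepsilon_2' \left( G_2 \varepsilon_2'^T + G_1 \varepsilon_1'^T \right) - \tfrac{1}{4} \varepsilon_1' G_{21} \varepsilon_1'^T \\
& + \tfrac{1}{2} W_2^T \sigma_2' \left( G_2 \sigma_2'^T W_2 + G_1 \sigma_1'^T W_1 \right) + \tfrac{1}{2} W_2^T \sigma_2' \left( G_2 \varepsilon_2'^T + G_1 \varepsilon_1'^T \right) \\
& - \tfrac{1}{2} \hat{W}_{2c}^T \sigma_2' \left( G_2 \sigma_2'^T \hat{W}_{2a} + G_1 \sigma_1'^T \hat{W}_{1a} \right).
\end{aligned} \tag{5-42}$$

Substituting the NN mismatch errors $\tilde{W}_{ic}(t) \triangleq W_i - \hat{W}_{ic}(t)$ and $\tilde{W}_{ia}(t) \triangleq W_i - \hat{W}_{ia}(t)$ for $i = 1, 2$ into Eqs. 5-41 and 5-42, respectively, yields

$$\begin{aligned}
H_1 = {}& -\tilde{W}_{1c}^T \sigma_1' f(x) - \varepsilon_1' f - \tfrac{1}{4} W_1^T \sigma_1' G_1 \sigma_1'^T W_1 \\
& - \tfrac{1}{2} \tilde{W}_{2a}^T \sigma_2' G_{12} \sigma_2'^T \tilde{W}_{2a} + \tfrac{1}{2} W_2^T \sigma_2' G_{12} \sigma_2'^T W_2 \\
& - \tfrac{1}{2} \left( \varepsilon_2' G_{12} - \varepsilon_1' G_2 \right) \sigma_2'^T W_2 + \tfrac{1}{2} \varepsilon_1' \left( G_1 \varepsilon_1'^T + G_2 \varepsilon_2'^T \right) - \tfrac{1}{4} \varepsilon_2' G_{12} \varepsilon_2'^T \\
& + \tfrac{1}{2} W_1^T \sigma_1' \left( G_1 \sigma_1'^T \left( 2 \tilde{W}_{1a} - W_1 \right) + G_2 \sigma_2'^T W_2 \right) + \tfrac{1}{2} W_1^T \sigma_1' \left( G_1 \varepsilon_1'^T + G_2 \varepsilon_2'^T \right) \\
& + \tfrac{1}{2} W_1^T \sigma_1' \left( G_1 \sigma_1'^T \left( -\tilde{W}_{1a} + W_1 \right) + G_2 \sigma_2'^T \left( -\tilde{W}_{2a} + W_2 \right) \right)
\end{aligned} \tag{5-43}$$

and

$$\begin{aligned}
H_2 = {}& -\tilde{W}_{2c}^T \sigma_2' f(x) - \varepsilon_2' f - \tfrac{1}{4} W_2^T \sigma_2' G_2 \sigma_2'^T W_2 \\
& - \tfrac{1}{2} \tilde{W}_{2a}^T \sigma_1' G_{21} \sigma_1'^T \tilde{W}_{1a} + \tfrac{1}{2} W_1^T \sigma_1' G_{21} \sigma_1'^T W_1 \\
& - \tfrac{1}{2} \left( \varepsilon_1' G_{21} - \varepsilon_2' G_1 \right) \sigma_1'^T W_1 + \tfrac{1}{2} \varepsilon_2' \left( G_2 \varepsilon_2'^T + G_1 \varepsilon_1'^T \right) - \tfrac{1}{4} \varepsilon_1' G_{21} \varepsilon_1'^T \\
& + \tfrac{1}{2} W_2^T \sigma_2' \left( G_2 \sigma_2'^T \left( 2 \tilde{W}_{2a} - W_2 \right) + G_1 \sigma_1'^T W_1 \right) + \tfrac{1}{2} W_2^T \sigma_2' \left( G_2 \varepsilon_2'^T + G_1 \varepsilon_1'^T \right) \\
& + \tfrac{1}{2} W_2^T \sigma_2' \left( G_2 \sigma_2'^T \left( -\tilde{W}_{2a} + W_2 \right) + G_1 \sigma_1'^T \left( -\tilde{W}_{1a} + W_1 \right) \right).
\end{aligned} \tag{5-44}$$

If the assumptions and sufficient gain constraints in Theorem 5-2 hold, then the right sides of Eqs. 5-43 and 5-44 can be upper bounded by a function that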

PAGE 119

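is UUB, $\left\| H_i \right\| \le \zeta_i\left( \tilde{W}_{ic}, \tilde{W}_{1a}, \tilde{W}_{2a}, t \right)$ for $i = 1, 2$; therefore the approximate HJBs are also UUB.

Theorem 5.4. Given that the assumptions and sufficient gain constraints in Theorem 5-2 hold, the approximate control laws in Eq. 5-19 converge to the approximate Nash solution of the game.

Proof. Consider the control errors $(\tilde{u}_1, \tilde{u}_2)$ between the optimal control laws in Eq. 5-9 and the approximate control laws in Eq. 5-19, given as

$$\tilde{u}_1 \triangleq u_1^* - \hat{u}_1, \quad \tilde{u}_2 \triangleq u_2^* - \hat{u}_2.$$

Substituting for the optimal control laws in Eq. 5-9 and the approximate control laws in Eq. 5-19, and using $\tilde{W}_{ia}(t) \triangleq W_i - \hat{W}_{ia}(t)$ for $i = 1, 2$, yields

$$\begin{aligned}
\tilde{u}_1 &= -\frac{1}{2} R_{11}^{-1} g_1^T(x) \left( \sigma_1'(x)^T \tilde{W}_{1a} + \varepsilon_1'(x)^T \right), \\
\tilde{u}_2 &= -\frac{1}{2} R_{22}^{-1} g_2^T(x) \left( \sigma_2'(x)^T \tilde{W}_{2a} + \varepsilon_2'(x)^T \right).
\end{aligned} \tag{5-45}$$

Using Assumptions 2-5, Eq. 5-45 can be upper bounded as

$$\begin{aligned}
\left\| \tilde{u}_1 \right\| &\le \frac{1}{2} \lambda_{\min}\left( R_{11}^{-1} \right) \bar{g}_1 \left( \bar{\sigma}_1' \left\| \tilde{W}_{1a} \right\| + \bar{\varepsilon}_1' \right), \\
\left\| \tilde{u}_2 \right\| &\le \frac{1}{2} \lambda_{\min}\left( R_{22}^{-1} \right) \bar{g}_2 \left( \bar{\sigma}_2' \left\| \tilde{W}_{2a} \right\| + \bar{\varepsilon}_2' \right).
\end{aligned}$$

Given that the assumptions and sufficient gain constraints in Theorem 5-2 hold, all terms on the right of the inequalities are UUB; therefore the control errors $(\tilde{u}_1, \tilde{u}_2)$ are UUB and the approximate control laws $(\hat{u}_1, \hat{u}_2)$ give the approximate Nash equilibrium solution.

5.7 Simulation

The following nonlinear dynamics are considered in [101, 108, 109, 132]:

$$\dot{x} = f(x) + g_1(x) u_1(x) + g_2(x) u_2(x),$$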

PAGE 120

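where

$$f(x) = \begin{bmatrix} x_2 \\ -\tfrac{1}{2} x_1 - x_2 + \tfrac{1}{4} x_2 \left( \cos(2 x_1) + 2 \right)^2 + \tfrac{1}{4} x_2 \left( \sin(4 x_1^2) + 2 \right)^2 \end{bmatrix},$$

$$g_1(x) = \begin{bmatrix} 0 & \cos(2 x_1) + 2 \end{bmatrix}^T, \quad g_2(x) = \begin{bmatrix} 0 & \sin(4 x_1^2) + 2 \end{bmatrix}^T.$$

The initial state is given as $x(0) = [3 \;\; 1]^T$ and the local cost function is defined as

$$r_i = x^T Q_i x + u_i^T R_{ii} u_i + u_j^T R_{ij} u_j, \quad i = 1, 2, \; j = 3 - i,$$

where

$$R_{11} = 2 R_{22} = 1, \quad R_{12} = 2 R_{21} = 2, \quad Q_1 = 2 Q_2 = I_{2 \times 2}.$$

The optimal value functions for the critics of player 1 and player 2 are given as

$$V_1^*(x) = \tfrac{1}{2} x_1^2 + x_2^2, \quad V_2^*(x) = \tfrac{1}{4} x_1^2 + \tfrac{1}{2} x_2^2,$$

and the optimal control inputs are given as

$$u_1^* = -\left( \cos(2 x_1) + 2 \right) x_2, \quad u_2^* = -\left( \sin(4 x_1^2) + 2 \right) x_2.$$

The activation function for the critic NN is chosen as

$$\sigma = \begin{bmatrix} x_1^2 & x_1 x_2 & x_2^2 \end{bmatrix}^T,$$

while the activation function for the identifier DNN is chosen as a symmetric sigmoid with 5 neurons in the hidden layer. The identifier gains are chosen as

$$k = 800, \quad \alpha = 300, \quad \gamma_f = 5, \quad \beta_1 = 0.2, \quad \Gamma_{wf} = 0.1 I_{6 \times 6}, \quad \Gamma_{vf} = 0.1 I_{2 \times 2},$$

and the gains of the actor-critic learning laws are chosen as

$$\Gamma_{11a} = \Gamma_{22a} = 10, \quad \Gamma_{12a} = \Gamma_{21a} = 50, \quad \eta_{1c} = \eta_{2c} = 50, \quad \nu_1 = \nu_2 = 0.001.$$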
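Because the stated optimal value functions are quadratic, they lie exactly in the span of the chosen critic basis, so the critic weights have well-defined target values. The sketch below makes this concrete; it is an illustration rather than part of the reported simulation. The names `sigma`, `sigma_grad`, `W1_ideal`, and `W2_ideal` are hypothetical, the ideal weights are read off from the stated $V_1^*$ and $V_2^*$, and the evaluation point is arbitrary.

```python
import numpy as np


def sigma(x):
    """Critic basis sigma(x) = [x1^2, x1*x2, x2^2]^T."""
    x1, x2 = x
    return np.array([x1 ** 2, x1 * x2, x2 ** 2])


def sigma_grad(x):
    """Jacobian of sigma with respect to x, shape (3, 2)."""
    x1, x2 = x
    return np.array([[2.0 * x1, 0.0],
                     [x2, x1],
                     [0.0, 2.0 * x2]])


# Weights that reproduce the stated optimal value functions exactly:
# V1* = 0.5*x1^2 + x2^2 and V2* = 0.25*x1^2 + 0.5*x2^2.
W1_ideal = np.array([0.5, 0.0, 1.0])
W2_ideal = np.array([0.25, 0.0, 0.5])

x_test = np.array([3.0, -1.0])            # arbitrary evaluation point (assumption)
V1 = W1_ideal @ sigma(x_test)             # value of player 1 at x_test
V2 = W2_ideal @ sigma(x_test)             # value of player 2 at x_test
grad_V1 = W1_ideal @ sigma_grad(x_test)   # equals [x1, 2*x2]
grad_V2 = W2_ideal @ sigma_grad(x_test)   # equals [0.5*x1, x2]
```

Under this reading, the function reconstruction errors in Eq. 5-18 are zero for this example, and the weight trajectories in the figures can be compared against these two vectors.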

PAGE 121

The covariance matrix is initialized to $\Gamma(0) = 5000$, all the NN weights are randomly initialized with values in $[-1, 1]$, and the states are initialized to $x(0) = [3 \;\; 1]$. A small-amplitude exploratory signal (noise) is added to the control to excite the states for the first 3 seconds of the simulation, as seen from the evolution of the states in Figure 5-1. The identifier approximates the system dynamics, and the state derivative estimation error is shown in Figure 5-2. The time histories of the critic NN weights and the actor NN weights are given in Figures 5-3 and 5-4. Persistence of excitation ensures that the weights converge. Figure 5-5 shows the optimal value functions and the approximate ones. Figure 5-6 shows the optimal controller and the approximated controller for player 1 and player 2, respectively. Figures 5-7, 5-8, 5-9, and 5-10 demonstrate that the weights also converge when the PE signal is not removed; however, the PE signal degrades the performance of the states.

Remark 5.1. An implementation issue in using the developed algorithm is ensuring PE of the critic regressor vector. Unlike linear systems, where PE of the regressor translates to sufficient richness of the external input, no verifiable method exists to ensure PE in nonlinear systems. In this simulation, a small-amplitude exploratory signal consisting of a sum of sines and cosines of varying frequencies is added to the control to ensure PE qualitatively, and convergence of the critic weights to their optimal values is achieved. The exploratory signal $n(t)$ is present in the first 3 seconds of the simulation and is given by

$$n(t) = \left( 1.2 \exp(-0.01 t) \right) \left[ \cos^2(0.2 t) + \sin^2(2.0 t) \cos(0.1 t) + \sin^2(-1.2 t) \cos(0.5 t) + \sin^5(t) \right].$$

5.8 Summary

A generalized solution for an N-player nonzero-sum differential game is sought by utilizing a Hamilton-Jacobi-Bellman approximation with an actor-critic-identifier architecture. The ACI architecture implements the actor and critic approximations simultaneously and in real time. The use of a robust DNN-based identifier circumvents the need for complete model knowledge, yielding an identifier which is proven to be asymptotically convergent.

PAGE 122

Agradient-basedweightupdatelawisusedforthecriticNNtoapproximatethevalue function.Usingtheidentierandthecritic,anapproximationtotheoptimalcontrollaw (actor)isdevelopedwhichstabilizestheclosedloopsystemandapproachestheoptimal solutionstothe N -playernonzero-sumgame. 0 5 10 15 20 25 30 1 0 1 2 3 x 1 0 5 10 15 20 25 30 2 1 0 1 2 3 x 2 [sec] Figure5-1.Theevolutionofthesystemstatesforthenonzero-sumgame,withpersistently excitedinputfortherst10seconds. 122

Figure 5-2. Error in estimating the state derivatives, with the identifier for the nonzero-sum game. (Panels: $x_1$ and $x_2$ components versus time [sec].)

Figure 5-3. Convergence of critic weights for the nonzero-sum game. (Panels: $W_{c1}$ and $W_{c2}$ versus time [sec], three weights per panel.)

Figure 5-4. Convergence of actor weights for player 1 and player 2 in a nonzero-sum game. (Panels: $W_{a1}$ and $W_{a2}$ versus time [sec]; legend: $W_{a11}$, $W_{a12}$, $W_{a13}$, $W_{a21}$, $W_{a22}$, $W_{a23}$.)

Figure 5-5. Value function approximation $V(x)$, for a nonzero-sum game. (Panels: $V_1$ and $V_2$ versus time [sec]; legend: Optimal, Approximation.)

Figure 5-6. Optimal control approximation $u$, for a nonzero-sum game. (Panels: $u_1$ and $u_2$ versus time [sec]; legend: Approximation, Optimal.)

Figure 5-7. The evolution of the system states for the nonzero-sum game, with a continuous persistently excited input. (Panels: $x_1$ and $x_2$ versus time [sec].)

Figure 5-8. Convergence of critic weights for the nonzero-sum game, with a continuous persistently excited input. (Panels: $W_{c1}$ and $W_{c2}$ versus time [sec], three weights per panel.)

Figure 5-9. Convergence of actor weights for player 1 and player 2 in a nonzero-sum game, with a continuous persistently excited input. (Panels: $W_{a1}$ and $W_{a2}$ versus time [sec]; legend: $W_{a11}$, $W_{a12}$, $W_{a13}$, $W_{a21}$, $W_{a22}$, $W_{a23}$.)

Figure 5-10. Value function approximation $V(x)$ for a nonzero-sum game, with a continuous persistently excited input. (Panels: $V_1$ and $V_2$ versus time [sec]; legend: Optimal, Approximation.)

Figure 5-11. Optimal control approximation $u$ for a nonzero-sum game, with a continuous persistently excited input. (Panels: $u_1$ and $u_2$ versus time [sec]; legend: Approximation, Optimal.)
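
The exploratory signal discussed in Remark 5.1 can be illustrated with a short sketch. The version below is only an approximation under stated assumptions: it uses a sum of squared and odd-power sinusoids with the frequencies 0.2, 2.0, 0.1, 1.2, 0.5 and 1 rad/s, assumes unit amplitude in place of the leading amplitude factor, and switches the signal off after 3 seconds. The function name `exploratory_signal` and the parameter `t_off` are hypothetical.

```python
import numpy as np

def exploratory_signal(t, t_off=3.0):
    """Sum-of-sinusoids probing signal, set to zero for t >= t_off.

    Unit amplitude is an assumption made for this sketch; the
    frequencies follow the reading of Remark 5.1 given in the lead-in.
    """
    t = np.asarray(t, dtype=float)
    n = (np.cos(0.2 * t) ** 2
         + np.sin(2.0 * t) ** 2 * np.cos(0.1 * t)
         + np.sin(-1.2 * t) ** 2 * np.cos(0.5 * t)
         + np.sin(t) ** 5)
    # Remove the excitation once the learning phase is over.
    return np.where(t < t_off, n, 0.0)

t = np.arange(0.0, 10.0, 0.01)
n = exploratory_signal(t)
```

In a simulation, `n` would be added to each control input during the learning phase. Leaving it on for the whole run (a very large `t_off`) corresponds to the continuously excited case of Figures 5-7 through 5-11, where the weights still converge but the state response is degraded.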

CHAPTER 6
CONCLUSION AND FUTURE WORK

6.1 Conclusion

The focus of this work is to develop techniques for approximating solutions to zero-sum and nonzero-sum noncooperative differential games and using these solutions to stabilize some classes of uncertain nonlinear systems. In the spirit of optimal control, two approaches, based on Bellman's optimality principle and Pontryagin's maximum principle, were used to approximate the solution to coupled nonlinear HJB equations. The first approach, using the maximum principle, involves partial feedback linearization of a particular class of nonlinear systems and synthesizing a differential game. The differential game yields a coupled set of DREs, which are reduced to AREs, and conditions are given for the solution to the AREs. The second approach uses the optimality principle, particularly the dynamic programming approach, to approximate the solution to the HJB. These approaches are shown to approximately solve a differential game and stabilize the dynamics. The specific contributions of each result are mentioned below.

Chapter 2 focuses on the development of robust (sub)optimal Nash-based feedback control laws for an uncertain nonlinear system. This chapter incorporates the RISE controller with an optimal Nash strategy to stabilize an uncertain Euler-Lagrange system with additive disturbances. This chapter also illustrates the development of the RISE controller, which is used to asymptotically identify the nonlinearities in the dynamics. By applying the RISE controller, the nonlinear dynamics converge to a residual system; the solution to the feedback Nash game for the residual system is then used to derive the stabilizing feedback control laws. The (sub)optimal feedback controllers are shown to minimize a cost functional in the presence of unknown bounded disturbances, and a Lyapunov-based stability analysis demonstrates asymptotic tracking for the combination of the RISE and Nash-based controllers.
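
To make the role of the RISE term in Chapter 2 more concrete, the sketch below evaluates a RISE feedback signal on a sampled error trajectory. It assumes the form commonly written in the RISE literature, $\mu(t) = (k_s+1)\,e_2(t) - (k_s+1)\,e_2(0) + \int_0^t \left[(k_s+1)\,\alpha\, e_2(\tau) + \beta\,\mathrm{sgn}(e_2(\tau))\right] d\tau$, with a forward-Euler approximation of the integral. The symbols $k_s$, $\alpha$, $\beta$, the filtered error $e_2$, the gain values and the function name `rise_feedback` are illustrative assumptions and are not taken from the dissertation.

```python
import numpy as np

def rise_feedback(e2, dt, ks=10.0, alpha=2.0, beta=1.0):
    """Discrete-time sketch of a RISE feedback term.

    e2 : samples of the filtered tracking error at t = 0, dt, 2*dt, ...
    dt : sample time in seconds
    ks, alpha, beta : positive gains (illustrative values)
    """
    e2 = np.asarray(e2, dtype=float)
    integrand = (ks + 1.0) * alpha * e2 + beta * np.sign(e2)
    # Running forward-Euler integral that starts from zero at t = 0.
    integral = np.concatenate(([0.0], np.cumsum(integrand[:-1]) * dt))
    # The -(ks + 1) * e2(0) offset makes the signal start at zero.
    return (ks + 1.0) * (e2 - e2[0]) + integral

# Constant unit error: only the integral term grows, at rate (ks+1)*alpha + beta.
mu = rise_feedback(np.ones(5), dt=0.1, ks=1.0, alpha=2.0, beta=1.0)
print(mu)
```

The integral of the signum term is what lets the feedback remain continuous while still compensating for bounded disturbances, which is the property the stability analysis relies on.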

The result from Chapter 2 is further refined in Chapter 3 for a class of systems in which additional information is provided to one of the players. The main contribution of this chapter is the development of robust (sub)optimal open-loop Stackelberg-based controllers for the leader and follower, which both act as inputs to an uncertain nonlinear system. Similar to Chapter 2, this chapter utilizes the RISE controller and combines it with a differential game-based control strategy. The control formulation utilizes the solution to the hierarchical open-loop Stackelberg game to derive the feedback control laws. A Lyapunov-based asymptotic tracking derivation and a simulation are presented to validate the utility of the technique.

In contrast to the approaches in Chapters 2 and 3, which are largely based on Pontryagin's maximum principle, the techniques in Chapters 4 and 5 attempt to approximate the solution to the HJI. The main contribution of Chapter 4 is solving a two-player zero-sum infinite-horizon game subject to continuous-time unknown nonlinear dynamics that are affine in the input. In the developed method, two actor NNs and one critic NN, using gradient and least-squares-based update laws, respectively, are designed to minimize the Bellman error, which is the difference between the exact and the approximate HJI equation. The identifier DNN is a combination of a Hopfield-type [112] component, in parallel configuration with the system [113], and a RISE component. The Hopfield component of the DNN learns the system dynamics based on online gradient-based weight tuning laws, while the RISE term robustly accounts for the function reconstruction errors, guaranteeing asymptotic estimation of the state and the state derivative. The online estimation of the state derivative allows the ACI architecture to be implemented without knowledge of the system drift dynamics; however, knowledge of the input gain matrix is required to implement the control policy. While the designs of the actor and critic are coupled through the HJI equation, the design of the identifier is decoupled from the actor-critic and can be considered a modular component in the actor-critic-identifier architecture. Convergence of the actor-critic-identifier-based algorithm and stability of the closed-loop system are analyzed using Lyapunov-based adaptive control methods, and a PE condition is used to guarantee convergence to within a bounded region of the optimal control and UUB stability of the closed-loop system.

Nonzero-sum games pose different challenges compared to zero-sum games. For nonlinear dynamics, the counterpart of the HJI equation of the zero-sum game is a coupled set of nonlinear HJB equations for the nonzero-sum game. Chapter 5 builds on Chapter 4 by considering an $N$-player nonzero-sum infinite-horizon game subject to continuous-time uncertain nonlinear dynamics. The main contribution of this work is deriving an approximate solution to an $N$-player nonzero-sum game with a technique that is continuous, online, and based on adaptive control theory. Previous research in the area focused on simplistic scalar nonlinear systems or implemented iterative/hybrid techniques that required complete knowledge of the drift dynamics. The technique expands the ACI structure to solve a differential game problem, wherein two actor and two critic neural network structures are used to approximate the optimal control laws and the optimal value function set, respectively. The main traits of this online algorithm are the use of ADP techniques and adaptive theory to determine the Nash equilibrium solution of the game in an online simultaneous procedure that does not require full knowledge of the system dynamics, and an online implementation of an algorithm that solves the underlying set of coupled HJB equations of the game problem. For an equivalent nonlinear system, previous research makes use of offline procedures or requires full knowledge of the system dynamics to determine the Nash equilibrium. A Lyapunov proof shows that UUB tracking for the closed-loop system is guaranteed, and a convergence analysis demonstrates that the approximate control policies converge to the optimal solutions in the sense of UUB.

6.2 Future Work

The work in this dissertation opens new doors for research in the domain of nonlinear optimal control design. In this section, open problems related to the work in this dissertation are discussed. The open problems are listed below.

From Chapters 2 and 3:

1. The investigation of optimal output feedback solutions for nonzero-sum games subject to nonlinear dynamics, where full state feedback is not available. In game theory this scenario is considered an imperfect state game. Many engineering problems that can be defined by Euler-Lagrange dynamics do not have full state feedback, and thus output feedback designs are desirable for implementation. Nonlinear $H_\infty$ control has examined an output feedback solution for a zero-sum game; however, these controllers require the solution to the HJI equation. Furthermore, the nonzero-sum output feedback design has been relatively unexplored.

2. The determination of a differential game-derived optimal control law that is subject to saturated inputs and time delays. Hardware implementations of control designs are often plagued by time delays and small actuator bandwidth. Preliminary investigations with Heuristic Dynamic Programming have looked at ADP approaches to incorporating these phenomena into the control design; however, little research has gone into the development of existence and uniqueness conditions for these types of games and the implementation of a game-derived control design for a nonlinear system that incorporates these effects.

3. In regard to Euler-Lagrange dynamics, the derivation of a greedy optimal control that has a moving horizon, thereby allowing for online realization. A greedy strategy only looks at the optimal choice for the next iteration, whereas Chapters 2 and 3 focused on the infinite-horizon optimal strategy. Defining a controller that looks at a smaller interval that changes with time allows for a computationally feasible online calculation of the game solution.

From Chapters 4 and 5:

1. Online adaptive optimal controllers for systems with periodic dynamics. Periodic dynamics can be seen as systems that yield a periodic output (e.g., at various levels of modeling, automotive engine dynamics can be considered a linear periodic system mechanically coordinated through the revolution of the crankshaft). Periodic systems are present in a wide variety of engineering applications, particularly robotic systems used in manufacturing, yet there has not been much investigation of controlling these systems using ADP techniques.

2. Online adaptive optimal controllers using output feedback. Nonlinear $H_\infty$ control has examined an output feedback solution for a zero-sum game; however, these controllers require the solution to the HJI equation. Furthermore, the nonzero-sum output feedback design has been relatively unexplored. Using ADP techniques, the approximation of the HJI solution could yield more implementable solutions.

3. Existence and uniqueness proofs for multi-player nonzero-sum infinite-horizon games with nonlinear dynamic constraints. Due to the complexity of $N$-coupled HJB equations, investigations of uniqueness properties in the $N$-player nonzero-sum game are sparse. For nonlinear dynamics this is still largely seen as an open problem.

4. An online continuous ADP solution using a mixed strategy for a zero-sum infinite-horizon game with nonlinear dynamic constraints. Zero-sum games are widely used in engineering problems; however, the solution of the HJI equation is typically assumed to exist or has local existence with conditions that are difficult to satisfy. For online continuous ADP games, the scenario in which the saddle point solution does not exist is an open problem. The use of a mixed strategy incorporated in an online continuous ADP technique could be a feasible solution.
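
The idea behind the last open problem, that a mixed strategy can restore a saddle point when no pure one exists, is easiest to see in a static game. The sketch below is a deliberately simplified illustration on a 2x2 zero-sum matrix game, not a differential game and not a method from this dissertation. It uses the standard indifference conditions, and the function names `pure_saddle_exists` and `mixed_saddle_2x2` are hypothetical.

```python
import numpy as np

def pure_saddle_exists(A):
    """True if the max-min over rows equals the min-max over columns."""
    A = np.asarray(A, dtype=float)
    return bool(np.isclose(A.min(axis=1).max(), A.max(axis=0).min()))

def mixed_saddle_2x2(A):
    """Mixed saddle point of a 2x2 zero-sum game with no pure saddle point.

    The row player maximizes and the column player minimizes the payoff A.
    Returns (p, q, value): p is the probability of row 1, q the
    probability of column 1, each chosen to make the opponent indifferent.
    """
    (a11, a12), (a21, a22) = np.asarray(A, dtype=float)
    den = a11 - a12 - a21 + a22
    p = (a22 - a21) / den
    q = (a22 - a12) / den
    value = (a11 * a22 - a12 * a21) / den
    return p, q, value

# Matching pennies: no pure saddle point, but a mixed one exists.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])
has_pure = pure_saddle_exists(A)
p, q, value = mixed_saddle_2x2(A)
print(has_pure, p, q, value)
```

Carrying this idea over to the continuous-time setting would require randomized policies inside the ADP update laws, which is exactly what the open problem leaves to future work.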

REFERENCES

[1] A. Barto, R. Sutton, and C. Anderson, "Neuron-like adaptive elements that can solve difficult learning control problems," IEEE Trans. Syst. Man Cybern., vol. 13, no. 5, pp. 834–846, 1983.
[2] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 1998.
[3] R. Sutton, A. Barto, and R. Williams, "Reinforcement learning is direct adaptive optimal control," IEEE Contr. Syst. Mag., vol. 12, no. 2, pp. 19–22, 1992.
[4] J. Campos and F. Lewis, "Adaptive critic neural network for feedforward compensation," in Proc. Am. Control Conf., vol. 4, 1999.
[5] A. Al-Tamimi, F. L. Lewis, and M. Abu-Khalaf, "Adaptive critic designs for discrete-time zero-sum games with application to H-infinity control," IEEE Trans. Syst. Man Cybern. Part B Cybern., vol. 37, pp. 240–247, 2007.
[6] Y. Tassa and T. Erez, "Least squares solutions of the HJB equation with neural network value-function approximators," IEEE Trans. Neural Networks, vol. 18, pp. 1031–1041, 2007.
[7] R. Beard, G. Saridis, and J. Wen, "Galerkin approximations of the generalized Hamilton-Jacobi-Bellman equation," Automatica, vol. 33, pp. 2159–2178, 1997.
[8] J. A. Primbs and V. Nevistic, "Optimality of nonlinear design techniques: A converse HJB approach," California Institute of Technology, Pasadena, CA 91125, Tech. Rep. CIT-CDS 96-022, 1996.
[9] T. Cheng, F. Lewis, and M. Abu-Khalaf, "Fixed-final-time-constrained optimal control of nonlinear systems using neural network HJB approach," IEEE Trans. Neural Networks, vol. 18, no. 6, pp. 1725–1737, 2007.
[10] M. Abu-Khalaf and F. Lewis, "Nearly optimal HJB solution for constrained input systems using a neural network least-squares approach," in Proc. IEEE Conf. Decis. Control, Las Vegas, NV, 2002, pp. 943–948.
[11] M. Krstic and Z.-H. Li, "Inverse optimal design of input-to-state stabilizing nonlinear controllers," in Proc. IEEE Conf. Decis. Control, vol. 4, Dec. 10–12, 1997, pp. 3479–3484.
[12] M. Krstic and P. Tsiotras, "Inverse optimality results for the attitude motion of a rigid spacecraft," in Proc. Am. Control Conf., 4-6 June 1997, pp. 1884–1888.
[13] M. Krstic and H. Deng, Stabilization of Nonlinear Uncertain Systems. Springer, 1998.
[14] M. Krstic and Z.-H. Li, "Inverse optimal design of input-to-state stabilizing nonlinear controllers," IEEE Trans. Autom. Control, vol. 43, no. 3, pp. 336–350, March 1998.
[15] N. Kidane, Y. Yamashita, H. Nakamura, and H. Nishitani, "Inverse optimization for a nonlinear system with an input constraint," in Proc. SICE Annu. Conf., vol. 2, 4-6 Aug. 2004, pp. 1210–1213.
[16] T. Fukao, "Inverse optimal tracking control of a nonholonomic mobile robot," in Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst., vol. 2, 28 Sept.-2 Oct. 2004, pp. 1475–1480.
[17] R. Freeman and J. Primbs, "Control Lyapunov functions: new ideas from an old source," in Proc. IEEE Conf. Decis. Control, 11-13 Dec. 1996, pp. 3926–3931.
[18] P. Gurfil, "Non-linear missile guidance synthesis using control Lyapunov functions," Proc. IME G J. Aero. Eng., vol. 219, pp. 77–88, 2005.
[19] J. von Neumann and O. Morgenstern, Theory of Games and Economic Behavior. Princeton University Press, 1980.
[20] J. Nash, "Non-cooperative games," Annals of Math., vol. 2, pp. 286–295, 1951.
[21] H. von Stackelberg, The Theory of the Market Economy. Oxford Univ. Press, 1952.
[22] R. Isaacs, Differential Games. John Wiley, 1967.
[23] K. Hipel, K. Radford, and L. Fang, "Multiple participant-multiple criteria decision making," IEEE Trans. Syst. Man Cybern., vol. 23, pp. 1184–1189, 1993.
[24] S. Zionts, "Multiple criteria mathematical programming: An overview and several approaches," Mathematics of Multi-Objective Optimization, pp. 227–273, 1985.
[25] Y. Sawaragi, H. Nakayama, and T. Tanino, Theory of Multi-objective Optimization. Academic Press, 1985.
[26] K.-C. Chu, "Team decision theory and information structures in optimal control problems-part II," IEEE Trans. Autom. Contr., vol. 17, pp. 22–28, 1972.
[27] C.-Y. Ho and K.-C. Chu, "Team decision theory and information structures in optimal control problems-part I," IEEE Trans. Autom. Contr., vol. 17, pp. 15–22, 1972.
[28] K. Kim and F. W. Roush, Team Theory. Ellis Horwood Limited, 1987.
[29] W. Bialas, "Cooperative n-person Stackelberg games," IEEE Conf. Decision and Control, pp. 2439–2444, 1989.
[30] G. Owen, Game Theory. Academic Press, 1982.
[31] R. Isaacs, Differential games: a mathematical theory with applications to warfare and pursuit, control and optimization. Dover Pubns, 1999.
[32] S. Tijs, Introduction to Game Theory. Hindustan Book Agency, 2003.
[33] T. Basar and G. Olsder, Dynamic Noncooperative Game Theory. SIAM, PA, 1999.
[34] M. Bloem, T. Alpcan, and T. Başar, "A Stackelberg game for power control and channel allocation in cognitive radio networks," in Proc. Int. Conf. Perform. Eval. Methodol. Tools. ICST, Brussels, Belgium: ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering), 2007, pp. 1–9.
[35] T. Basar and H. Selbuz, "Closed-loop Stackelberg strategies with applications in the optimal control of multilevel systems," IEEE Trans. Autom. Control, vol. 24, no. 2, pp. 166–179, 1979.
[36] J. Medanic, "Closed-loop Stackelberg strategies in linear-quadratic problems," IEEE Trans. Autom. Control, vol. 23, no. 4, pp. 632–637, 1978.
[37] M. Simaan and J. Cruz, Jr., "A Stackelberg solution for games with many players," IEEE Trans. Autom. Control, vol. 18, pp. 322–324, 1973.
[38] G. Papavassilopoulos and J. Cruz, "Nonclassical control problems and Stackelberg games," IEEE Trans. Autom. Control, vol. 24, no. 2, pp. 155–166, 1979.
[39] A. Gambier, A. Wellenreuther, and E. Badreddin, "A new approach to design multi-loop control systems with multiple controllers," in Proc. IEEE Conf. Decis. Control, 13-15, 2006, pp. 1828–1833.
[40] J. Hongbin and C. Y. Huang, "Non-cooperative uplink power control in cellular radio systems," Wireless Networks, vol. 4, no. 3, pp. 233–240, 1998.
[41] T. Basar and P. Bernhard, H-infinity Optimal Control and Related Minimax Design Problems. Boston: Birkhäuser, 2008.
[42] A. Isidori and A. Astolfi, "Disturbance attenuation and H-infinity-control via measurement feedback in nonlinear systems," IEEE Trans. Autom. Control, vol. 37, no. 9, pp. 1283–1293, Sept. 1992.
[43] L. Pavel, "A noncooperative game approach to OSNR optimization in optical networks," IEEE Trans. Autom. Control, vol. 51, no. 5, pp. 848–852, 2006.
[44] C. J. Tomlin, J. Lygeros, and S. Shankar Sastry, "A game theoretic approach to controller design for hybrid systems," Proc. of the IEEE, vol. 88, no. 7, pp. 949–970, 2000.
[45] T. Basar and G. J. Olsder, "Team-optimal closed loop Stackelberg strategies in hierarchical control problems," Automatica, vol. 16, no. 4, pp. 409–414, 1980.
[46] M. Jungers, E. Trelat, and H. Abou-Kandil, "Min-max and min-min Stackelberg strategy with closed-loop information," HAL Hyper Articles en Ligne, vol. 3, 2010.
[47] M. Abu-Khalaf and F. Lewis, "Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach," Automatica, vol. 41, no. 5, pp. 779–791, 2005.
[48] A. Starr and C.-Y. Ho, "Nonzero-sum differential games," J. Optimiz. Theory App., vol. 3, pp. 184–206, 1972.
[49] G. Leitmann, Cooperative and Non-cooperative Many Players Differential Games. Springer, 1974.
[50] D. Yeung and L. Petrosyan, "Subgame consistent solutions of a cooperative stochastic differential game with nontransferable payoffs," J. Optimiz. Theory App., vol. 124, pp. 701–724, 2005.
[51] A. Starr and Ho, "Further properties of nonzero-sum differential games," J. Optimiz. Theory App., vol. 4, pp. 207–219, 1969.
[52] J. Engwerda and A. Weeren, "The open-loop Nash equilibrium in LQ-games revisited," Center for Economic Research, 1995.
[53] J. Case, "Toward a theory of many player differential games," SIAM, vol. 7, pp. 179–197, 1969.
[54] A. Friedman, Differential games. Wiley, 1971.
[55] T. Basar and P. Bernhard, H-infinity-optimal control and related minimax design problems: A dynamic game approach. Birkhäuser, 1995.
[56] J. Cruz, "Leader-follower strategies for multilevel systems," IEEE Trans. Autom. Control, vol. 23, no. 2, pp. 244–255, 1978.
[57] C. I. Chen and J. B. Cruz, "Stackelberg solution for two-person games with biased information patterns," IEEE Trans. Autom. Control, vol. 17, no. 6, pp. 791–798, 1972.
[58] J. B. Cruz, "Survey of Nash and Stackelberg equilibrium strategies in dynamic games," Ann. Econ. Soc. Meas., vol. 4, no. 2, pp. 339–344, 1975.
[59] M. Simaan and J. Cruz, "On the Stackelberg strategy in nonzero-sum games," J. Optimiz. Theory App., vol. 11, no. 5, pp. 533–555, 1973.
[60] M. Simaan and J. Cruz, "Additional aspects of the Stackelberg strategy in nonzero-sum games," J. Optimiz. Theory App., vol. 11, no. 1, pp. 613–626, 1973.
[61] B. Gardner and J. Cruz, "Feedback Stackelberg strategy for a two-player game," IEEE Trans. Autom. Control, vol. 22, no. 2, pp. 244–255, 1977.
[62] A. Weeren, J. Schumacher, and J. Engwerda, "Asymptotic analysis of linear feedback Nash equilibria in nonzero-sum linear-quadratic differential games," J. Optimiz. Theory App., vol. 101, pp. 69372, 1999.
[63] G. Freiling, G. Jank, and D. Kremer, "Solvability condition for a nonsymmetric Riccati equation appearing in Stackelberg games," Proc. of the European Control Conference, 2003.
[64] T. Basar and G. J. Olsder, Dynamic Noncooperative Game Theory. SIAM, 1999.
[65] A. Van der Schaft, "L2-gain analysis of nonlinear systems and nonlinear H-infinity control," IEEE Trans. Autom. Control, vol. 37, no. 6, pp. 770–784, 1992.
[66] A. van der Schaft, "L2 gain analysis of nonlinear systems and nonlinear state feedback H-infinity control," IEEE Trans. Autom. Control, vol. 37, no. 6, pp. 770–784, June 1992.
[67] A. van der Schaft, "On a state space approach to H-infinity nonlinear control," Syst. Contr. Lett., vol. 16, pp. 1–8, 1991.
[68] A. Isidori and W. Lin, "Global inverse L2-gain state feedback design for a class of nonlinear systems," IEEE Conf. Decision and Control, pp. 2831–2836, 1997.
[69] X. Cai, R. Lin, and S. Su, "Robust stabilization for a class of nonlinear systems," in Proc. Chin. Control Decis. Conf., 2008, pp. 4840–4844.
[70] H. Kim, J. Back, H. Shim, and J. H. Seo, "Locally optimal and globally inverse optimal controller for multi-input nonlinear systems," in Proc. Am. Control Conf., 2008, pp. 4486–4491.
[71] Y. Nakamura and H. Hanafusa, "Inverse kinematic solutions with singularity robustness for robot manipulator control," J. Dyn. Syst. Meas. Contr., vol. 108, no. 3, pp. 163–171, 1986.
[72] J. Guojun, "Inverse optimal stabilization of a class of nonlinear systems," in Proc. Chin. Control Conf., 2007, pp. 226–230.
[73] M. Krstic and P. Tsiotras, "Inverse optimal stabilization of a rigid spacecraft," IEEE Trans. Autom. Control, vol. 44, no. 5, pp. 1042–1049, May 1999.
[74] M. Krstic, "Inverse optimal adaptive control - the interplay between update laws, control laws, and Lyapunov functions," in Proc. Am. Control Conf., 2009, pp. 1250–1255.
[75] Z.-H. Li and M. Krstic, "Optimal design of adaptive tracking controllers for nonlinear systems," in Proc. Am. Control Conf., Albuquerque, New Mexico, 1997, pp. 1191–1197.
[76] J. Fausz, V.-S. Chellaboina, and W. Haddad, "Inverse optimal adaptive control for nonlinear uncertain systems with exogenous disturbances," in Proc. IEEE Conf. Decis. Control, Dec. 1997, pp. 2654–2659.
[77] W. Luo, Y.-C. Chu, and K.-V. Ling, "Inverse optimal adaptive control for attitude tracking of spacecraft," IEEE Trans. Autom. Control, vol. 50, no. 11, pp. 1639–1654, Nov. 2005.
[78] L. Sonneveldt, E. Van Oort, Q. P. Chu, and J. A. Mulder, "Comparison of inverse optimal and tuning functions designs for adaptive missile control," J. Guid. Contr. Dynam., vol. 31, no. 4, pp. 1176–1182, 2008.
[79] X.-S. Cai and Z.-Z. Han, "Inverse optimal control of nonlinear systems with structural uncertainty," IEE Proc. Contr. Theor. Appl., vol. 152, no. 1, pp. 79–83, Jan. 2005.
[80] J. Cheng, H. Li, and Y. Zhang, "Robust low-cost sliding mode overload control for uncertain agile missile model," in Proc. World Congr. Intell. Control Autom., Dalian, China, June 2006, pp. 2185–2188.
[81] T. Cheng and F. Lewis, "Neural network solution for finite-horizon H-infinity constrained optimal control of nonlinear systems," vol. 5, no. 1, 2007, pp. 1–11.
[82] T. Cheng, F. Lewis, and M. Abu-Khalaf, "A neural network solution for fixed-final time optimal control of nonlinear systems," Automatica, vol. 43, no. 3, pp. 482–490, 2007.
[83] Y. Kim, F. Lewis, and D. Dawson, "Intelligent optimal control of robotic manipulator using neural networks," Automatica, vol. 36, no. 9, pp. 1355–1364, 2000.
[84] Y. Kim and F. Lewis, "Optimal design of CMAC neural-network controller for robot manipulators," IEEE Trans. Syst. Man Cybern. Part C Appl. Rev., vol. 30, no. 1, pp. 22–31, Feb. 2000.
[85] K. Dupree, P. Patre, Z. Wilcox, and W. Dixon, "Asymptotic optimal control of uncertain nonlinear Euler-Lagrange systems," Automatica, 2010.
[86] P. Werbos, "Approximate dynamic programming for real-time control and neural modeling," in Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, D. A. White and D. A. Sofge, Eds. New York: Van Nostrand Reinhold, 1992.
[87] D. Bertsekas and J. Tsitsiklis, Neuro-Dynamic Programming. Athena Scientific, 1996.
[88] D. V. Prokhorov and D. C. Wunsch, II, "Adaptive critic designs," IEEE Trans. Neural Networks, vol. 8, pp. 997–1007, 1997.
[89] A. Al-Tamimi, F. L. Lewis, and M. Abu-Khalaf, "Discrete-time nonlinear HJB solution using approximate dynamic programming: Convergence proof," IEEE Trans. Syst. Man Cybern. Part B Cybern., vol. 38, pp. 943–949, 2008.
[90] A. Al-Tamimi, F. L. Lewis, and M. Abu-Khalaf, "Model-free Q-learning designs for linear discrete-time zero-sum games with application to H-infinity control," Automatica, vol. 43, pp. 473–481, 2007.
[91] B. Widrow, N. Gupta, and S. Maitra, "Punish/reward: Learning with a critic in adaptive threshold systems," IEEE Trans. Syst. Man Cybern., vol. 3, no. 5, pp. 455–465, 1973.
[92] S. Balakrishnan, "Adaptive-critic-based neural networks for aircraft optimal control," J. Guid. Contr. Dynam., vol. 19, no. 4, pp. 893–898, 1996.
[93] G. Lendaris, L. Schultz, and T. Shannon, "Adaptive critic design for intelligent steering and speed control of a 2-axle vehicle," in Int. Joint Conf. Neural Netw., 2000, pp. 73–78.
[94] S. Ferrari and R. Stengel, "An adaptive critic global controller," in Proc. Am. Control Conf., vol. 4, 2002.
[95] D. Han and S. Balakrishnan, "State-constrained agile missile control with adaptive-critic-based neural networks," IEEE Trans. Control Syst. Technol., vol. 10, no. 4, pp. 481–489, 2002.
[96] P. He and S. Jagannathan, "Reinforcement learning neural-network-based controller for nonlinear discrete-time systems with input constraints," IEEE Trans. Syst. Man Cybern. Part B Cybern., vol. 37, no. 2, pp. 425–436, 2007.
[97] L. Baird, "Advantage updating," Wright Lab, Wright-Patterson Air Force Base, OH, Tech. Rep., 1993.
[98] K. Doya, "Reinforcement learning in continuous time and space," Neural Comput., vol. 12, no. 1, pp. 219–245, 2000.
[99] J. Murray, C. Cox, G. Lendaris, and R. Saeks, "Adaptive dynamic programming," IEEE Trans. Syst. Man Cybern. Part C Appl. Rev., vol. 32, no. 2, pp. 140–153, 2002.
[100] D. Vrabie and F. Lewis, "Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems," Neural Networks, vol. 22, no. 3, pp. 237–246, 2009.
[101] K. Vamvoudakis and F. Lewis, "Online synchronous policy iteration method for optimal control," in Recent Advances in Intelligent Control Systems. Springer, 2009, pp. 357–374.
[102] S. Bhasin, R. Kamalapurkar, M. Johnson, K. G. Vamvoudakis, F. L. Lewis, and W. E. Dixon, "A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems," Automatica, vol. (submitted), 2011.
[103] Q. Wei and H. Zhang, "A new approach to solve a class of continuous-time nonlinear quadratic zero-sum game using ADP," in Networking, Sensing and Control, 2008. ICNSC 2008. IEEE International Conference on. IEEE, 2008, pp. 507–512.
[104] H. Zhang, Q. Wei, and D. Liu, "An iterative adaptive dynamic programming method for solving a class of nonlinear zero-sum differential games," Automatica, vol. 47, pp. 207–214, 2010.
[105] X. Zhang, H. Zhang, Y. Luo, and M. Dong, "Iteration algorithm for solving the optimal strategies of a class of nonaffine nonlinear quadratic zero-sum games," in Control and Decision Conference, 2010.
[106] A. Mellouk, Ed., Advances in Reinforcement Learning. InTech, 2011.
[107] K. Vamvoudakis and F. Lewis, "Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem," Automatica, vol. 46, pp. 878–888, 2010.
[108] K. Vamvoudakis and F. Lewis, "Online neural network solution of nonlinear two-player zero-sum games using synchronous policy iteration," in Proc. IEEE Conf. Decis. Control, 2010.
[109] K. Vamvoudakis and F. Lewis, "Multi-player non-zero-sum games: Online adaptive learning solution of coupled Hamilton-Jacobi equations," Automatica, 2011.
[110] M. Littman, "Value-function reinforcement learning in Markov games," Cognitive Systems Research, vol. 2, no. 1, pp. 55–66, 2001.
[111] P. M. Patre, "Lyapunov-based robust and adaptive control of nonlinear systems using a novel feedback structure," Ph.D. dissertation, University of Florida, Gainesville, FL, 2009. [Online]. Available: http://ncr.mae.ufl.edu/dissertations/patre.pdf
[112] J. Hopfield, "Neurons with graded response have collective computational properties like those of two-state neurons," Proc. Nat. Acad. Sci. U.S.A., vol. 81, no. 10, p. 3088, 1984.
[113] A. Poznyak, E. Sanchez, and W. Yu, Differential neural networks for robust nonlinear control: identification, state estimation and trajectory tracking. World Scientific Pub Co Inc, 2001.
[114] D. Kirk, Optimal Control Theory: An Introduction. Dover Pubns, 2004.
[115] T. Basar, "A counterexample in linear-quadratic games: Existence of non-linear Nash strategies," J. Optimiz. Theory App., vol. 14, pp. 425–430, 1974.
[116] A. Filippov, "Differential equations with discontinuous right-hand side," Am. Math. Soc. Transl., vol. 42, no. 2, pp. 199–231, 1964.
[117] A. Filippov, Differential equations with discontinuous right-hand side. Netherlands: Kluwer Academic Publishers, 1988.
[118] G. V. Smirnov, Introduction to the theory of differential inclusions. American Mathematical Society, 2002.
[119] F. H. Clarke, Optimization and nonsmooth analysis. SIAM, 1990.
[120] D. Shevitz and B. Paden, "Lyapunov stability theory of nonsmooth systems," IEEE Trans. Autom. Control, vol. 39, no. 9, pp. 1910–1914, 1994.
[121] B. Paden and S. Sastry, "A calculus for computing Filippov's differential inclusion with application to the variable structure control of robot manipulators," IEEE Trans. Circuits Syst., vol. 34, no. 1, pp. 73–82, 1987.
[122] F. Kydland and E. Prescott, "Rules rather than discretion: The inconsistency of optimal plans," Journal of Political Economy, vol. 85, no. 3, pp. 473–492, 1977.
[123] H. Abou-Kandil, G. Freiling, V. Ionescu, and G. Jank, Matrix Riccati Equations in Control and Systems Theory. Birkhäuser, 2003.
[124] F. L. Lewis, Optimal Control. John Wiley & Sons, 1986.
[125] M. Bardi and I. Dolcetta, Optimal control and viscosity solutions of Hamilton-Jacobi-Bellman equations. Springer, 1997.
[126] F. L. Lewis, R. Selmic, and J. Campos, Neuro-Fuzzy Control of Industrial Systems with Actuator Nonlinearities. Philadelphia, PA, USA: Society for Industrial and Applied Mathematics, 2002.
[127] W. E. Dixon, A. Behal, D. M. Dawson, and S. Nagarkatti, Nonlinear Control of Engineering Systems: A Lyapunov-Based Approach. Birkhäuser Boston, 2003.
[128] M. Krstic, P. V. Kokotovic, and I. Kanellakopoulos, Nonlinear and Adaptive Control Design. John Wiley & Sons, 1995.
[129] P. M. Patre, W. MacKunis, K. Kaiser, and W. E. Dixon, "Asymptotic tracking for uncertain dynamic systems via a multilayer neural network feedforward and RISE feedback control structure," IEEE Trans. Autom. Control, vol. 53, no. 9, pp. 2180–2185, 2008.
[130] S. Sastry and M. Bodson, Adaptive Control: Stability, Convergence, and Robustness. Upper Saddle River, NJ: Prentice-Hall, 1989.
[131] H. K. Khalil, Nonlinear Systems, 3rd ed. Prentice Hall, 2002.
[132] V. Nevistic and J. A. Primbs, "Constrained nonlinear optimal control: a converse HJB approach," California Institute of Technology, Pasadena, CA 91125, Tech. Rep. CIT-CDS 96-021, 1996.

BIOGRAPHICAL SKETCH

Marcus Johnson was born in Seattle, Washington. He received his Bachelor of Engineering degree in aerospace engineering from the University of Florida, USA. He then joined the Nonlinear Controls and Robotics (NCR) research group to pursue his doctoral research under the advisement of Warren E. Dixon, and he completed his Doctorate of Philosophy at the University of Florida in August 2011. He worked as a flight controls engineer for the Tybrin Corporation at NASA Dryden in Edwards, CA, from May 2009 through November 2010. Since November 2010 he has been working as a guidance, navigation, and control engineer with the Boeing Co. in Huntington Beach, CA.