Citation
Optimization and Dynamical Approaches in Nonlinear Time Series Analysis with Applications in Bioengineering

Material Information

Title:
Optimization and Dynamical Approaches in Nonlinear Time Series Analysis with Applications in Bioengineering
Creator:
CHAOVALITWONGSE, WANPRACHA ( Author, Primary )
Copyright Date:
2008

Subjects

Subjects / Keywords:
Algorithms ( jstor )
Epilepsy ( jstor )
Ions ( jstor )
Linear programming ( jstor )
Maxims ( jstor )
Mining ( jstor )
Seizures ( jstor )
Time series ( jstor )
Time series analysis ( jstor )
Tin ( jstor )

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
Copyright Wanpracha Chaovalitwongse. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
Embargo Date:
8/1/2008
Resource Identifier:
53207417 ( OCLC )

Full Text

OPTIMIZATION AND DYNAMICAL APPROACHES IN NONLINEAR TIME SERIES ANALYSIS WITH APPLICATIONS IN BIOENGINEERING

By

WANPRACHA CHAOVALITWONGSE

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2003

Copyright 2003 by Wanpracha Chaovalitwongse

This work is dedicated to my family.

ACKNOWLEDGMENTS

First and foremost, I wish to express my deep sense of gratitude towards my advisor, Professor Panos M. Pardalos, for his help, generous advice, fruitful guidance, and kind encouragement at all times. His insights, flexibility, and innovative ideas have allowed me to successfully pursue the research project which is the basis of this thesis, while his sense of humor has made the experience enjoyable and fulfilling. I had excellent working conditions at his institute all the time.

I am very fortunate to have the other members of my advisory committee, Professors Joseph P. Geunes, Stanislav Uryasev, and J. Chris Sackellares. I am very thankful for their insightful comments, valuable suggestions, and constant encouragement. Throughout my student career I have been very fortunate to work under the guidance of other faculty to whom I am indebted for their encouragement and support. Among these, I am particularly thankful to Professors Leonidas D. Iasemidis, Vitaliy Yatsenko, and Paul R. Carney.

The Brain Dynamics Laboratory (BDL) and Center for Applied Optimization (CAO) have both provided a great working environment, and I particularly appreciate the friendship and many thought-provoking discussions. I would like to thank BDL members (i.e., Dr. D.-S. Shiau, Linda K. Dance, Wichai Suharitdamrong, and Sandeep Nair) and Center for Applied Optimization members (i.e., Dr. Sergiy Butenko, Bruno Chiarini, Carlos Oliviera, Vladimir Boginski, Olga Perdikaki, Lezhou Zhan, and Oleg Prokopyev). They have made the research process pleasant, and the exposure to their research topics has greatly enhanced my appreciation of other areas of operations research. In particular, Linda helped me correct errors in English grammar and style as well as improve the efficiency of my computer programs in this dissertation despite her busy schedule. There are many more friends I have made at the University of Florida who gave me help and support but were not mentioned here. I would like to take this opportunity to thank all of them.

I wish to thank my family, especially my father, Vinij, my mother, Boonya, and my sisters, Paveena and Chamonporn, for their selfless sacrifice and their eternal spring of inspiration in all my endeavors. Their love and constant encouragement have played a great role in sustaining me through my graduate studies. Despite the fact that I am thousands of miles away from them, my welfare has always been uppermost in their minds. I proudly dedicate this thesis to all of them.

I also gratefully acknowledge the research assistantships that supported my research at the University of Florida. In particular, the research on applications in bioengineering was partially supported by the NIH, NSF, and VA research grants.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTERS

1 INTRODUCTION
1.1 Data Mining
1.1.1 Classification and Prediction
1.1.2 Clustering
1.2 Time Series Analysis
1.3 Chaos Theory and Nonlinear Dynamics
1.4 Global Optimization
1.5 Contributions of the Thesis
1.6 Organization of Chapters

2 TIME SERIES DATA MINING
2.1 Time Series Analysis and Forecasting Techniques
2.1.1 Basics
2.1.2 Statistical Analysis
2.1.3 Nonlinear Analysis
2.1.4 Spectral Analysis
2.1.5 Matching Similar Time Series Patterns
2.2 Data Mining Concepts
2.3 Concepts of Time Series Data Mining
2.3.1 Temporal Pattern and Temporal Pattern Cluster
2.3.2 Phase Space and Time-Delay Embedding
2.3.3 Event Characterization Function

3 DYNAMICAL APPROACHES AND CHAOS THEORY
3.1 Chaos Glossary
3.1.1 Phase Space
3.1.2 Trajectory
3.1.3 Bifurcation
3.1.4 Degree of Freedom
3.1.5 Attractor
3.1.6 Strange Attractor
3.2 Chaos Theory for Time Series Analysis
3.3 Chaotic Systems
3.3.1 Hénon Mapping
3.3.2 Lorenz System
3.3.3 Rössler System
3.4 Fractal Dimension
3.4.1 Box-Counting Dimension (D0)
3.4.2 Information Dimension (D1)
3.4.3 Correlation Dimension (D2)
3.5 Lyapunov Exponents

4 GLOBAL OPTIMIZATION
4.1 Fundamental Results on Global Optimization
4.1.1 Karush-Kuhn-Tucker Conditions
4.1.2 KKT Conditions and the Linear Complementarity Problem
4.1.3 Optimality Conditions
4.2 Discrete Optimization
4.2.1 Linear Discrete Optimization Problems
4.2.2 Nonlinear Discrete Optimization Problems
4.3 Nonlinear and Integer Programming Problems
4.3.1 Equivalence Between Discrete and Continuous Programs
4.3.2 Integer Programs and Complementarity Problems
4.4 Quadratic Programming
4.4.1 Complexity of Quadratic Optimization
4.4.2 KKT Conditions for Quadratic Programming
4.4.3 Complexity of KKT Points in Quadratic Programming
4.4.4 Linear and Quadratic Zero-One Problems
4.4.5 Various Equivalent Forms of Quadratic Zero-One Problem
4.4.6 Complexity of Quadratic Zero-One Programming
4.5 Multi-Quadratic Programming
4.5.1 Applications
4.6 Reformulation Linearization Techniques
4.6.1 Quadratic Integer Programming
4.6.2 Multi-Quadratic Integer Programming
4.7 Applications of the Developed Linearization Technique
4.7.1 Maximum Clique Problems
4.7.2 Maximum Independent Set Problems

5 APPLICATIONS IN BIOENGINEERING: BRAIN DISORDERS
5.1 Introduction to Epilepsy
5.1.1 Classification of Seizures
5.1.2 Mechanisms of Epileptogenesis
5.2 Directions in Epilepsy Research: Seizure Prediction
5.3 Motivation and Goals of Research
5.4 Data Mining in EEG Time Series
5.4.1 The Method of Delays
5.4.2 Estimation of Short-Term Maximum Lyapunov Exponents
5.4.3 Estimation of Dynamical Phase (Angular Frequency)
5.4.4 Entropy and Information
5.4.5 Approximate Entropy (ApEn)
5.4.6 Kolmogorov-Sinai Entropy
5.4.7 Kullback-Leibler Distance
5.4.8 Modeling of EEG Time Series
5.5 Statistical Tests for Spatiotemporal Analysis
5.6 Optimization Techniques for Identifying the Temporal Patterns
5.6.1 Quadratic Zero-One Programming
5.6.2 Conventional Linearization Approach
5.6.3 KKT Conditions Linearization Approach
5.6.4 Multi-Quadratic Zero-One Programming
5.7 Materials and Methods
5.7.1 Datasets
5.7.2 Seizure Warning Algorithm
5.7.3 Evaluation of the Seizure Warning Algorithm
5.8 Results
5.9 Conclusions and Discussion

6 CONCLUDING REMARKS AND FUTURE RESEARCH
6.1 Summary
6.2 Future Research
6.2.1 Dynamical Approaches
6.2.2 Global Optimization
6.3 Applications in Finance

REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

Table
5-1 Performance characteristics (computational time, measured in seconds) of two proposed approaches compared with complete enumerations
5-2 Characteristics of analyzed EEG dataset
5-3 Performance characteristics of automated seizure warning algorithm with optimal parameter settings of training data
5-4 Performance characteristics of automated seizure warning algorithm testing on optimal training parameter settings

LIST OF FIGURES

Figure
2-1 Flow chart of the forecasting system: The model-building and forecasting phases.
2-2 Ten minute EEG time series data.
2-3 Monthly accidental deaths in U.S.A. (from January 1973 to December 1978) time series data, which shows a very strong seasonality.
2-4 U.S.A. population with ten-year intervals (from 1790 to 1990) time series data, which shows a very strong trend.
2-5 Dow-Jones index closing prices (251 consecutive trading days ending 08/26/94) time series data.
3-1 Hénon map created by Runge-Kutta integration with a = 1.4, b = 0.3.
3-2 Fourier transform of Hénon map with a = 1.4, b = 0.3.
3-3 3-D plot of Hénon attractor from Hénon map with a = 1.4, b = 0.3.
3-4 Lorenz System created by Runge-Kutta integration of the Lorenz equations, with σ = 10.0, r = 28.0, b = 8/3.
3-5 Fourier transform of Lorenz system with σ = 10.0, r = 28.0, b = 8/3.
3-6 3-D plots of Lorenz attractor, which is a strange attractor, from Lorenz system with σ = 10.0, r = 28.0, b = 8/3.
3-7 Rössler System created by Runge-Kutta integration with a = 0.15, b = 0.20, c = 10.0.
3-8 Fourier transform of Rössler system with a = 0.15, b = 0.20, c = 10.0.
3-9 3-D plots of Rössler attractor from Rössler system with a = 0.15, b = 0.20, c = 10.0.
5-1 Inferior transverse view of the brain, illustrating approximate depth and subdural electrode placement for EEG recordings. Subdural electrode strips are placed over the left orbitofrontal (LOF), right orbitofrontal (ROF), left subtemporal (LST), and right subtemporal (RST) cortex. Depth electrodes are placed in the left temporal depth (LTD) and right temporal depth (RTD) to record hippocampal activity.
5-2 Lateral views of the brain, illustrating approximate depth and subdural electrode placement for EEG recordings.
5-3 A 10 sec segment of raw EEG data during an epileptic seizure recorded from electrode RTD2 in a patient with right temporal lobe epilepsy.
5-4 The epileptic attractor EEG data corresponding to Figure 5-3 in the reconstructed phase space with embedding dimension p = 3 and time delay τ = 20 msec, in two different orientations.
5-5 The variation of STLmax with the length T of the data segment for data in the preictal and ictal state of an epileptic seizure (patient 1). The rest of the parameters for the STLmax algorithm were: p = 7, τ = 14 msec, Δt = 42 msec, IDIST2 = 84 msec, IDIST3 = 84, b = 0.05, c = 0.1, and V_i,j (initial) = 0.1 rad.
5-6 The variation of STLmax with the IDIST2 parameter for data in the preictal and ictal state of an epileptic seizure. The rest of the parameters for the STLmax algorithm were: p = 7, τ = 14 msec, Δt = 42 msec, IDIST1 = 14 msec, IDIST3 = 84, b = 0.05, c = 0.1, and V_i,j (initial) = 0.1 rad.
5-7 Smoothed STLmax profiles over 2 hours derived from an EEG signal recorded at RTD2 (patient 1). Seizure 10 started and ended between the two vertical dashed lines. The estimation of the Lmax values was made by dividing the signal into non-overlapping segments of 10.24 sec each, using p = 7 and τ = 20 msec for the phase space reconstruction. The smoothing was performed by a 10-point (1.6 minutes) moving average window over the generated STLmax profiles.
5-8 A typical profile before, during and after an epileptic seizure, estimated from the EEG recorded from a site in the epileptogenic hippocampus; the seizure occurred between the vertical lines.
5-9 Plot of the Lmax over time derived from an EEG signal recorded at BL1, an electrode site overlying the seizure focus.
5-10 Plot of the Entropy over time derived from an EEG signal (corresponding to Figure 5-9) recorded at BL1, an electrode site overlying the seizure focus.
5-11 Kullback-Leibler distance (KLD) analysis of the Hénon map. The left column illustrates two data samples (N=100 and N=500) generated by the Hénon map without noise. On the right, two samples (N=100 and N=500), generated by the Hénon map with noise (noise parameter = 0.003).
5-12 Kullback-Leibler distance analysis on the EEG signals recorded from a subdural electrode overlying orbitofrontal cortex contralateral to the seizure onset zone. The upper plot shows an EEG signal over a 13-minute time period. The trace is divided into 5 preictal windows (A1 to A5), a window which includes a seizure (A0) and 5 postictal windows (B1 to B5). Each window is 75 seconds in duration. The histogram in the second row is taken from the window containing the seizure (A0). Histograms for example preictal windows A2 and A5 are shown on the left. Histograms from postictal windows B2 and B5 are shown on the right. In the center, Kullback-Leibler distances between the A5 and other windows (plot C) and between the window A0 and each of the other windows (plot D) are depicted.
5-13 Kullback-Leibler distance analysis on the EEG signals recorded from a subdural electrode overlying the epileptic focus (seizure onset zone).
5-14 Lyapunov dimension of epileptic patient as function of time. This figure corresponds to nine Lyapunov exponents. Note that the Lyapunov dimension decreases over the 15 to 20 minute period preceding seizure onset.
5-15 Three-dimension plots of entropy, angular frequency, and STLmax in different physiological states (interictal, preictal, ictal and postictal) of an epileptic patient.
5-16 Combined 3-dimension plots of entropy, angular frequency, and STLmax in different physiological states (interictal, preictal, ictal and postictal) for an epileptic patient.
5-17 Plots of the STLmax profiles and the T-index profile between the normal site ROF4 and the epileptogenic site RTD2 about 35 minutes into the recording show the dynamical entrainment of a pair of brain sites between seizures 9 and 10 (patient 1).
5-18 Angular frequency profiles from two left orbitofrontal electrode sites over 3.5 hours between seizures 13 and 14 (patient 1). The ictal periods of the two seizures are denoted by vertical lines.
5-19 The T-index profile between two electrode sites whose profiles are depicted in Figure 5-18. The two sites are dynamically entrained 1.75 to 1.5 hours, as well as 1.2 hour, prior to the onset of seizure 14. The T1 and T2 statistical thresholds are represented by the two horizontal lines.
5-20 Histogram of probability of detecting preictal transition of randomly selected entrained electrode sites 5,000 times compared with the most entrained electrode sites.
5-21 Performance characteristics of two proposed approaches compared with complete enumerations
5-22 Smoothed STLmax profiles of 5 optimal electrode sites over 150 minutes including a seizure. The preictal period shows gradual convergence of the STLmax values calculated for these critical electrode sites. During the seizure, STLmax values are completely entrained. Postictally, the values are disentrained, indicating resetting which reverses the preictal entrainment.
5-23 Smoothed STLmax profiles of 5 non-optimal electrode sites over 150 minutes including a seizure. Postictal resetting is not observed for these sites.
5-24 Smoothed STLmax and Ωmax profiles of the 5 optimally selected electrodes over time (including seizures 14, 15, and 16). The optimal electrodes were selected 10 minutes before seizure 15.
5-25 Average T-index curve over time from the STLmax profiles and the Ωmax profiles in Figure 5-24.
5-26 Convergence of 5 STLmax profiles from critical cortical sites over 2 hours between seizures 9 and 10 (patient 1). The ictal periods of the two seizures are denoted by vertical lines.
5-27 The T-index profile among 5 critical cortical sites whose STLmax profiles are depicted in Figure 5-26. The cortical sites are dynamically entrained approximately 60 minutes prior to the onset of seizure 10. The statistical thresholds are represented by the two horizontal lines.
5-28 Angular frequency Ωmax profiles between seizures 13 and 14 (patient 1) of 5 electrode sites selected by the optimization program during the 10 minute interval prior to the onset of seizure 13.
5-29 The average T-index profile of the 5 optimal electrode sites whose Ωmax profiles are depicted in Figure 5-28. The 5 sites become and remain dynamically entrained approximately 0.5 hour prior to the onset of seizure 14.
5-30 Angular frequency Ωmax profiles between seizures 13 and 14 (patient 1) of 5 non-optimally selected electrode sites.
5-31 The average T-index profile of the 5 non-optimal electrode sites whose Ωmax profiles are depicted in Figure 5-30. The 5 sites do not become dynamically entrained between seizures 13 and 14.
5-32 Flow diagram of the Seizure Warning Algorithm. This diagram illustrates the steps employed in the automated algorithm (see text for an explanation of each step).
5-33 A plot of STLmax values calculated from a 250-minute sample of intracranial EEG recording which contains 3 of the complex partial seizures recorded from patient 1. After seizures 8 and 9, 5 critical electrode sites (AR4, AL4, BR2, BR3 and BL2 after seizure 8 and BR1, BR2, BR4, CR2 and CR8 after seizure 9) were selected by the global optimization algorithm. At this point in time, STLmax values for these selected sites are significantly different (disentrained). Prior to seizures 9 and 10, STLmax values from these same sites converge to a common value (entrained) and these sites become disentrained after seizures 9 and 10.
5-34 This average T-index profile was calculated from the STLmax profiles shown in Figure 5-33. When the average T-index drops from a value of 5 or above to a critical value of 2.662, the average T-index for these sites is not significantly different from 0. At that point, the sites are considered to be dynamically entrained and a seizure warning is generated by the system. Seizure warnings are generated approximately 50 minutes before seizure 9 and approximately 70 minutes before seizure 10.
5-35 ROC curve for the optimal parameter setting of 5 patients
5-36 Schematic diagram of the system approach to the modeling of EEG in epileptic patients for purposes of seizure prediction and analysis of dynamical mechanisms.
6-1 The AA stock index over 4000 operating days (approximately 16 years) from 01/04/1984 to 01/04/2000.
6-2 The AA STLmax profiles over 4000 operating days (approximately 16 years) from 01/04/1984 to 01/04/2000.
6-3 The CAT stock index over 4000 operating days (approximately 16 years) from 01/04/1984 to 01/04/2000.
6-4 The CAT STLmax profiles over 4000 operating days (approximately 16 years) from 01/04/1984 to 01/04/2000.
6-5 The DD stock index over 4000 operating days (approximately 16 years) from 01/04/1984 to 01/04/2000.
6-6 The DD STLmax profiles over 4000 operating days (approximately 16 years) from 01/04/1984 to 01/04/2000.

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

OPTIMIZATION AND DYNAMICAL APPROACHES IN NONLINEAR TIME SERIES ANALYSIS WITH APPLICATIONS IN BIOENGINEERING

By

Wanpracha Chaovalitwongse

August 2003

Chair: Panagote M. Pardalos
Major Department: Industrial and Systems Engineering

Traditional linear analysis of time series has been routinely used but has not provided much insight into the characteristics and mechanisms of time series, because these methods are limited by the requirement of stationarity of the time series and of normality and independence of the residuals. A new data processing technique, known as data mining, can overcome these problems and effectively draw off the hidden information in these large time series data sets. A new framework for analyzing time series data called Time Series Data Mining (TSDM) motivated us to adapt and innovate data mining concepts, dynamical approaches in chaos theory, and optimization techniques to the areas of time series analysis. The main objective of this research is to provide new data mining concepts and a new set of methods, based on time series analysis, chaos theory and optimization techniques, that are able to characterize, draw inferences, and reveal hidden temporal patterns that are predictive characteristics of time series events. These tools can be used to develop a new technique for the prediction of the time series arising in real-world problems, as well as to conduct advanced studies on the subject.

In this dissertation, analysis based on chaos theory and the theory of nonlinear dynamics has been applied to time series data to identify complex (nonperiodic, nonlinear, irregular, and chaotic) characteristics. Several alternative optimization methods for reconstructing parameter spaces of the dynamical systems and identifying predictive temporal patterns in the system are employed. Specifically, these problems can be formulated as multi-quadratic programming (MQP) problems. To solve MQP problems, a new linearization technique based on Karush-Kuhn-Tucker optimality conditions, proven to guarantee global optimality, is developed. A novel combination of methods for determining the optimal parameters and temporal patterns is applied to real-world electroencephalogram (EEG) time series for prediction of epileptic seizures. The results show that the combination of these techniques provides accurate results while dramatically reducing the time required to quantify chaos in time series and eliminating a range of parameters that have, thus far, been fixed empirically. Sensitivity analysis is employed to justify the use of this combination of methods, and comparisons are made with more conventional quantifying techniques and trivial measures, showing the advantage of the results generated by this work.

CHAPTER 1 INTR ODUCTION During the past decade, the dev elopmen ts in information tec hnologies ha v e allo w ed industries to automate and computerize their op erations and pro cesses. F or this reason, a large n um b er of time series data sets from distinct subsystems of an en terprise ha v e b egun to b e dramatically collected and accum ulated. A time series is essen tially dened as a set of serial observ ations in the system, where eac h one is recorded at a sp ecic time. In practice, there are man y time series, noticeably in the natural sciences, medicine, and nance. F or this reason, time series analysis is crucial to a v ariet y of real w orld time dep enden t systems b ecause it pro vides a basis for economics and business planning, pro duction planning, in v en tory and pro duction con trol, or con trol and optimization of industrial pro cesses. The time series data sets seem to b e unmanageable and unpredictable and the traditional linear analysis in the time series has b een routinely used but did not seem to successfully giv e insigh t in to the c haracteristic and mec hanism of time series. Ho w ev er, a new data pro cessing tec hnique, kno wn as data mining, is sho wn to b e able to eectiv ely dra w o the hidden information in these large time series data sets. The time series data mining (TSDM) framew ork w as rst in tro duced b y P o vinelli (2000), and it has b een sho wn to b e able to successfully c haracterize and predict complex, nonp erio dic, irregular, and c haotic time series [129 ]. In addition, the TSDM framew ork has man y adv an tages o v er other traditional time series analysis tec hniques (e.g., stationarit y and linearit y assumptions). In the last decade, time series analysis based on c haos theory and theory of nonlinear dynamics, whic h are among the most in teresting and gro wing researc h topics, has b een applied to time series data with some degree of success. The concepts 1

PAGE 19

of chaos theory and the theory of nonlinear dynamics have not only been useful for analyzing specific systems of ordinary differential equations or iterated maps, but have also offered new techniques for time series analysis. Moreover, a variety of experiments have shown that a recorded time series may be driven by a deterministic dynamical system with a low-dimensional chaotic attractor. An attractor is defined as the phase space point or set of points representing the various possible steady-state conditions of a system, that is, an equilibrium state or group of states to which a dynamical system converges. Thus, the theories of chaos and nonlinear dynamics have provided new theoretical and conceptual tools that allow us to capture, understand, and link together the complex behaviors of simple systems. Characterization and quantification of the dynamics of nonlinear time series are also important steps toward understanding the nature of random behavior, and may enable us to predict the occurrences of specific events that follow temporal dynamical patterns in the time series.

The TSDM framework provides new data mining concepts, and the dynamical approaches in chaos theory enable us to find consistent patterns and apply them to make better predictions for such time series; the combination of the two techniques motivated this dissertation. To maximize the ability to characterize and draw inferences from the time series, we employ optimization techniques to find optimal temporal patterns that are characteristic and predictive of episodes of events, which are crucial to real-world time-dependent systems. In this thesis, we aim to develop a set of tools to discover and reveal hidden predictive temporal patterns of time series events.

In particular, we adapt and extend data mining concepts, dynamical approaches in chaos theory, and optimization techniques to the area of time series analysis; these are discussed in the next few sections.

1.1 Data Mining

Knowledge discovery in databases (KDD) is the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data [42].

PAGE 20

KDD comprises many steps, which involve data preparation, the search for patterns, knowledge evaluation, and refinement, all repeated in multiple iterations. Data mining is a very important step in this process, in which specific algorithms are employed to extract patterns from data. The term data mining can be understood as the set of processes that extract knowledge from the data contained in a database. Data mining can be applied to various time series data sets. In large-scale distribution, for example, it is used to analyze consumer behavior, find similarities among consumers according to geographical criteria, support cross-selling and selective promotion with discount cards, and optimize restocking. Data mining is also used in pharmaceutical laboratories to identify (choose) the best therapies, and in banks to search for fraud or to authorize credit. Moreover, it can be used in insurance, aeronautics, the automobile industry, transport, telecommunications, energy, and other domains.

Data mining techniques include a variety of methods: predictive modeling, clustering, association mining, and change and deviation detection [35]. Predictive modeling includes classification for categorical predictions and regression analysis for numerical predictions. Clustering separates the data into subsets whose members are similar to each other; the desired number of clusters can also be determined. The objective of association mining is to determine rules that indicate relationships among attributes, that is, to estimate how much the value of an attribute depends on the values of other attributes. Change and deviation detection is concerned with sequence information, such as time series, where the ordering of the observations is important.

By combining a variety of these techniques with existing databases and decision support models, significant improvement can be made in the knowledge learned, and ultimately in the decisions made, for many applications. Many business and scientific problems involve data that are associated with time. Recently, there has been a significant

PAGE 21

amount of effort to apply data mining techniques to time series data, but a vast amount of work remains to be done in this area. In this thesis, we apply data mining concepts to time series prediction and characterization. Ultimately, based on data mining concepts, we integrate dynamical approaches in chaos theory and optimization techniques with time series analysis to characterize, draw inferences from, and find predictive temporal patterns in the time series. Next, some principal concepts in data mining are discussed.

1.1.1 Classification and Prediction

Given an existing data set, classification is the process of determining the category of unknown data. Normally the data set is divided into a training data set and a testing data set. The training data set is used to train a machine learning algorithm iteratively, until the error bound decreases below a threshold. In order to find the best process design for a given problem, the training process is repeated many times with various parameter values and randomized orders of the input. Once an optimal process design is determined for a given set of parameters, the testing data sets (unknown to the algorithm) are run through the process design iteratively until the average error bound is obtained. The typical criterion used to determine which data are in the testing data set is a hold-out ratio for the test data or cross-fold validation. In chronological data sets, the algorithm is repeated by using a sliding window over the time series, where the next m time stamp values are chosen as the testing data set and the previous n time stamp values are taken as the training data set. After the training and testing steps, the production-ready classifier is built from the entire data set by using the best process design for the given set of parameters.

There are several well-known machine learning algorithms for classification, including decision trees, neural networks [60], genetic algorithms [54], and Bayesian-based methods [99]. In addition to these base algorithms, several approaches
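The sliding-window scheme described above for chronological data can be sketched as follows. This is only a minimal illustration: the function name `sliding_window_splits` and the `step` parameter are hypothetical and not part of the original text, and the window is assumed to advance by m observations at a time unless stated otherwise.

```python
def sliding_window_splits(series, n_train, m_test, step=None):
    """Yield (train, test) pairs from a chronologically ordered series.

    Each training window holds the previous n_train values and each
    testing window holds the next m_test values. The `step` argument
    (an assumption of this sketch) sets how far the window slides;
    by default it slides by m_test so that test windows do not overlap.
    """
    if n_train <= 0 or m_test <= 0:
        raise ValueError("n_train and m_test must be positive")
    step = m_test if step is None else step
    start = 0
    while start + n_train + m_test <= len(series):
        train = series[start:start + n_train]
        test = series[start + n_train:start + n_train + m_test]
        yield train, test
        start += step


# Toy example: ten time stamps, n = 4 training values, m = 2 testing values.
splits = list(sliding_window_splits(list(range(10)), n_train=4, m_test=2))
for train, test in splits:
    print("train:", train, "test:", test)
```

Because the test window always lies strictly after its training window, no future observation leaks into training, which is the point of using a sliding window rather than a random hold-out on chronological data.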

PAGE 22

for developing committees, or combinations of learners, have been used to improve learning [60, 99]. Each of these algorithms has its advantages and disadvantages, depending on the given problem. An overview of current machine learning classification algorithms has been published [99].

1.1.2 Clustering

Clustering is an unsupervised learning method that divides the data into naturally occurring groups. The most common clustering algorithm is referred to as c-means [142]. Classification labels are either unavailable or ignored in the clustering technique. Although there may be classes identified in the data set, clustering does not adjust for overlap between the classes. There is no training stage in clustering algorithms; class labels, in contrast, are used to train pattern recognition algorithms. Cluster validation techniques are then applied to make sure that the clustering algorithm finds the best number of clusters. Dunn's measure and the Davies measure [142] are the two most common cluster validation techniques for the c-means algorithm. These measures provide a method for determining the number of clusters. Dunn's measure yields local maximum values, but not necessarily the global maximum value. Clustering does not separate data from different classes that overlap; it simply finds the best clustering of the provided data.

1.2 Time Series Analysis

Time series analysis is concerned with data that are not independent but serially correlated, with the aim of predicting future behavior. The goal of time series analysis is twofold. The first is to estimate future values or events based on an analysis of past data that are believed to influence future values or behavior over time. The second is to draw inferences from such series based on the nature of the phenomenon represented by the sequence of observations.

Therefore, a modest aim of a time series analysis is to derive a good description from a learning period. This description may simply consist of some statistical summaries or graphical representations, but

PAGE 23

it can also be used to forecast future values of the series. The use of the observations available at time t to forecast the value of the series at a later time provides a basis for economic and business planning, production planning, inventory and production control, and the control and optimization of industrial processes. Many methods in time series analysis have been developed for this purpose.

Stationarity concerns the probabilistic structure of a time series: the idea is that this probabilistic structure is not affected by a shift in the time origin. Although most statistical forecasting methods are developed for stationary time series, many time series in industry, business, and economics are nonstationary and, in particular, have no natural mean. However, a nonstationary time series may still contain portions with stationary properties. Most methods for time series analysis try to identify the optimal model to fit the data in a learning period and then apply this model to predict the future. However, many time series may contain only a few meaningful or predictable patterns, especially nonstationary time series such as earthquake and EEG time series, where the interest may lie in the occurrences of specific events. For example, when similar patterns appear repeatedly in the observations, those patterns may appear again in the future. In these cases, the traditional time series models (e.g., the autoregressive (AR) model, the moving average (MA) model, and the autoregressive moving average (ARMA) model) usually give poor predictions, since the model is constructed to fit the entire learning period whereas a stable pattern useful for prediction may occur only in a small portion of it.

1.3 Chaos Theory and Nonlinear Dynamics

Chaos is said to have been discovered in 1963 by Edward Lorenz [92], who was working on a model for long-term weather prediction. Although his Lorenz system did not turn out to aid in forecasting weather, it piqued interest and paved the
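To make the "fit a model to the whole learning period, then extrapolate" idea concrete, here is a minimal sketch of the simplest traditional model mentioned above, a first-order autoregressive (AR(1)) model without intercept. The helper names, the simulated data, and the conditional least-squares estimator are assumptions chosen for illustration; they are not taken from the original text.

```python
import random


def simulate_ar1(phi, n, seed=0):
    """Simulate x[t] = phi * x[t-1] + e[t] with standard normal noise e[t]."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def fit_ar1(x):
    """Least-squares estimate of phi computed over the entire learning period."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den


def forecast_ar1(phi_hat, last_value, steps):
    """Point forecasts for the next `steps` values of a zero-mean AR(1) model."""
    return [last_value * phi_hat ** k for k in range(1, steps + 1)]


series = simulate_ar1(phi=0.7, n=5000)
phi_hat = fit_ar1(series)
print("estimated phi:", round(phi_hat, 3))
print("3-step forecast:", forecast_ar1(phi_hat, series[-1], 3))
```

Note that a single coefficient is estimated from all observations at once. This works for a stationary series such as the simulated one, but it also illustrates why such a model can miss a pattern that is present only in a short portion of a nonstationary record.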

PAGE 24

way for research in the area of chaos theory. Chaos has provided an alternative interpretation of the erratic or random behavior of a system. In general usage, chaos is defined as a name for any order that produces confusion in our minds. In dynamical systems theory, however, chaos means irregular fluctuations in a deterministic system: the system behaves irregularly because of its own internal structure, and not because of random forces acting from outside. More precisely, chaos is defined as unpredictable behavior arising in a deterministic system because of great sensitivity to initial conditions. Chaos arises in a dynamical system if two arbitrarily close starting points diverge exponentially, so that their future behavior is eventually unpredictable.

Dynamical systems are "deterministic" if there is a unique consequent to every state, and "stochastic" or "random" if there is more than one consequent, chosen from some probability distribution. Because of its unique consequent, a deterministic dynamical system is perfectly predictable given perfect knowledge of the initial condition, and is in practice always predictable in the short term. On the other hand, the cause of long-term unpredictability is sensitivity to the initial conditions. No matter how precisely the initial condition in these systems is measured, the prediction of the subsequent motion goes radically wrong after a period of time. Weather is considered chaotic, since arbitrarily small variations in initial conditions can result in radically different weather later. This may limit the possibilities of long-term weather forecasting; consider, for example, the possibility of a butterfly's sneeze affecting the weather enough to cause a hurricane weeks later.

Chaos can also be defined in terms of a trajectory that is exponentially unstable and neither periodic nor asymptotically periodic; that is, it oscillates irregularly without settling down. Chaos is the term used to describe the complex behavior of systems that we consider to be simple. Chaotic behavior looks erratic and almost random. It resembles the behavior of a system strongly influenced by outside random noise, or the complicated behavior of a system with many degrees of freedom. Yet chaotic behavior arises in very
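Sensitivity to initial conditions is easy to observe numerically. The sketch below uses the logistic map x -> r x (1 - x) with r = 4, a standard textbook example of a chaotic iterated map; the map, the starting point 0.2, and the perturbation 1e-10 are choices made for this illustration and do not come from the original text.

```python
def logistic_orbit(x0, n_steps, r=4.0):
    """Iterate the logistic map x -> r * x * (1 - x), starting from x0."""
    orbit = [x0]
    for _ in range(n_steps):
        x = orbit[-1]
        orbit.append(r * x * (1.0 - x))
    return orbit


# Two deterministic orbits whose starting points differ by only 1e-10.
a = logistic_orbit(0.2, 60)
b = logistic_orbit(0.2 + 1e-10, 60)
separation = [abs(u - v) for u, v in zip(a, b)]

for step in (0, 5, 10, 20, 30, 40, 50, 60):
    print(step, separation[step])
```

The separation stays tiny for the first few iterations, which corresponds to short-term predictability, and then grows until it is comparable to the size of the attractor itself, after which the two orbits are effectively unrelated even though the rule generating them is fully deterministic.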

PAGE 25

simple systems (with only a few active degrees of freedom) that are almost free of noise. Chaos is really only one type of behavior exhibited by nonlinear systems, where a nonlinear system is defined as a system whose time evolution equations are nonlinear; that is, the dynamical variables describing the properties of the system (e.g., position, velocity, acceleration, pressure) appear in the equations in a nonlinear form. The field of study is more properly called nonlinear dynamics, which is the study of the dynamical behavior (behavior in time) of a nonlinear system. The main consequence of chaotic motion is that, given imperfect knowledge, the predictability horizon in a deterministic system is much shorter than one might expect, due to the exponential growth of errors. The belief that small errors should have small consequences was perhaps engendered by the success of Newton's mechanics applied to planetary motions. However, chaos is definitely not complete disorder; it is disorder in a deterministic dynamical system, which is always predictable for short times.

Chaos theory and the theory of nonlinear dynamics have been applied to many different areas, including physics (e.g., the study of turbulence, Eckmann and Ruelle 1985 [29]), meteorology (e.g., the study of nonperiodic flow in the atmosphere, Lorenz 1963 [92]), biology (e.g., the study of the epileptic brain, Iasemidis and Sackellares 1991 [69]), epidemiology (e.g., the study of disease occurrence, Sugihara and May 1990 [151]), economics (e.g., the modeling of economic dynamics, Scheinkman 1990 [143]), finance (e.g., the exploration of chaos in financial markets, Hsieh 1991 [63]), geology (e.g., the study of predictability in ocean water levels, Frison et al. 1999 [43]), and communication (e.g., the study of the optical ring laser, Abarbanel and Kennel 1998 [3]).

1.4 Global Optimization

Operations research (OR) is concerned with optimal decision making under circumstances characterized by conflicting goals, changing conditions, limited

PAGE 26

resources, complex interpersonal dynamics, uncertainty, and strict deadlines, and with the modeling of deterministic and probabilistic systems that originate from real life. The goal of operations research is to provide a framework for constructing models of decision-making problems, finding the best solutions with respect to a given measure of merit, and implementing the solutions in an attempt to solve the problems. Many problems of both practical and theoretical importance are concerned with the choice of a "best" configuration or set of parameters to achieve some goal. Over the past few decades, mathematical programming has emerged, together with a corresponding collection of solution techniques, with the objective of studying properties of and algorithms for the solution of optimization problems.

One of the central points of this dissertation is to find optimal temporal patterns that can be used to characterize and predict events. Thus, we are required to solve optimization problems to find hidden temporal patterns that maximize the ability to characterize and predict events in the time series. Next, we give a review of the optimization techniques.

The optimization problem has the general form

\[
\text{Minimize (or Maximize)} \quad f(x) \qquad \text{subject to} \quad x \in S,
\]

where $S$ (the feasible domain) is a set in $\mathbb{R}^n$ and $f(x)$ (the objective function) is a real-valued function defined on $S$. Let $\|\cdot\|$ denote the Euclidean norm. A point $x^* \in S$ is said to be a relative or local minimum point if $f(x^*) \le f(x)$ for all $x \in S$ satisfying $\|x - x^*\| \le \epsilon$ for some $\epsilon > 0$. We say that $x^*$ is a global minimum point if $f(x^*) \le f(x)$ for all $x \in S$. Likewise, a point $x^* \in S$ is a local maximum point if $f(x^*) \ge f(x)$ for all $x \in S \cap \{x : \|x - x^*\| \le \epsilon\}$ for some $\epsilon > 0$. We say that $x^*$ is a global maximum point if $f(x^*) \ge f(x)$

PAGE 27

for all $x \in S$. In the sequel we consider optimization problems of the form

\[
\min_{x \in S} f(x), \tag{1.1}
\]

where $S = \{x \in \mathbb{R}^n : g_i(x) \le 0,\ i = 1, \ldots, p\}$ with $g_i : A \to \mathbb{R}$ on a suitable set $A \supseteq S$ (often $A = \mathbb{R}^n$). If $S = \mathbb{R}^n$, then we have an unconstrained optimization problem. When $S$ is a polyhedron, we call the problem linearly constrained. When, in addition, the objective function is linear, the problem is called a linear programming problem. It is called a nonlinear programming problem when at least one of the functions involved is nonlinear. When $f$ and each constraint function $g_i$ are convex, Problem 1.1 is referred to as a convex programming problem. As is well known, when $f$ is a convex function and $S$ is a convex set, every local minimum is a global minimum. This property holds for some classes of generalized convex objective functions; however, it is not true for quasiconvex objective functions. Note that a function $f : C \to \mathbb{R}$, with $C$ convex, is quasiconvex if its (lower) level sets $L(f; \alpha) = \{x \in C : f(x) \le \alpha\}$ are convex for all $\alpha \in \mathbb{R}$. In nonconvex nonlinear programming problems, we have to expect (many) local minimum points with function values different from the global minimum.

In general, optimization problems can be divided into two categories: those with continuous variables, and those with discrete variables, which are called combinatorial. In continuous problems, we are essentially looking for a set of real numbers or even a function. In combinatorial problems, we are looking for an object from a finite or possibly countably infinite set, typically an integer, set, permutation, or graph. The concept and importance of having tight linear programming relaxations to enhance the effectiveness of any algorithm for solving integer programming problems have been widely studied.

Most of the attention has focused on pure and mixed zero-one programming problems because of the wide variety of applications which these

PAGE 28

types of problems model. The most general form of the problem of interest can be stated as

\[
\begin{aligned}
\text{Minimize} \quad & f(x, y) \\
\text{subject to} \quad & g(x, y) = 0 \\
& h(x, y) \le 0 \\
& x \in D \subseteq \mathbb{Z}^p,\ y \in \mathbb{R}^q,
\end{aligned}
\tag{1.2}
\]

where $f : \mathbb{R}^{p+q} \to \mathbb{R}$, $g : \mathbb{R}^{p+q} \to \mathbb{R}^m$, and $h : \mathbb{R}^{p+q} \to \mathbb{R}^l$ are assumed to be functions with continuous second-order derivatives, and $D$ is a bounded subset of $\mathbb{Z}^p$. The decision variables are represented by $x$ and $y$, which are discrete and continuous variables, respectively. If the variables $x$ do not arise in problem 1.2, so that it becomes a nonlinear optimization problem with continuous variables only, then there are a large number of algorithms to solve the problem. When discrete variables $x$ are present, problem 1.2 becomes a discrete optimization problem, which is much harder to solve even if the discrete problem is relatively small compared to the continuous problem. In fact, there are only a few algorithms dealing with such problems because of their intrinsic difficulty.

Discrete variables arise in many optimization problems, and they occasionally occur in conjunction with continuous variables. A common reason for discrete variables arising in optimization problems is that the resources of interest have to be measured in integer quantities (e.g., parameter settings, permutations, and the number of people to be assigned to certain jobs). Discrete variables can also be introduced to facilitate the problem modeling process (e.g., introducing binary or 0-1 variables to represent "yes-no" decisions). One of the classical optimization problems that employ binary variables is the knapsack problem. In this problem, there are $n$ items that can be placed into a knapsack. Each item $i$ is associated with a weight $w_i$ and a value $c_i$. The objective of this problem is to maximize the total value of items

PAGE 29

placed in the knapsack such that the total weight of the items does not exceed $b$, where $x_i$ is a binary variable representing item $i$ such that

\[
x_i =
\begin{cases}
1 & \text{if item $i$ is placed in the knapsack,} \\
0 & \text{otherwise.}
\end{cases}
\]

Then the knapsack problem is given by

\[
\begin{aligned}
\text{Maximize} \quad & c^T x \\
\text{subject to} \quad & w^T x \le b \\
& x \in \{0, 1\}^n.
\end{aligned}
\tag{1.3}
\]

Discrete variables can also be used to model logical constraints. For instance, suppose we want to introduce a nonlinear constraint $x_1 x_2 \le 0$ into a linear optimization problem; we are required to transform the nonlinear constraint into linear constraints in order to preserve the linearity of the optimization problem. This can be done by introducing two linear constraints, $-M(1 - y) \le x_1 \le M y$ and $-M y \le x_2 \le M(1 - y)$, where $y$ is a binary variable and $M$ is a sufficiently large positive number that does not affect the feasibility of the problem. It is clear that if $y = 1$, then $x_1 \ge 0$, $x_2 \le 0$, and if $y = 0$, then $x_1 \le 0$, $x_2 \ge 0$. In both cases, $x_1 x_2 \le 0$.

Although discrete optimization problems have finitely or countably many feasible points, they are not necessarily easier to solve than continuous problems. In fact, they are generally harder to solve. For a better understanding of the algorithms developed to solve discrete optimization problems, it is useful to employ the terminology of complexity theory. An algorithm is said to be polynomially bounded if there exists a polynomial function $p$ such that for each input of size $n$, the algorithm terminates after at most $p(n)$ steps. A decision problem is a problem that returns an answer of "yes" or "no"

PAGE 30

for its solution. A decision problem is said to be in the class $\mathcal{P}$ if it can be solved by a polynomially bounded algorithm. The problem class $\mathcal{NP}$ is the class of decision problems whose solutions can be verified in time that is polynomial in the input size. A problem $P_1$ is said to be polynomial-time reducible to a problem $P_2$ if there is a mapping function $f$ from the inputs of $P_1$ to the inputs of $P_2$ such that $f$ can be computed in polynomial time, and the solution to $P_1$ on input $x$ is yes if and only if the solution to $P_2$ on input $f(x)$ is yes. A problem $P$ is $\mathcal{NP}$-hard if for any problem $P' \in \mathcal{NP}$, $P'$ is polynomial-time reducible to $P$. If problem $P$ is $\mathcal{NP}$-hard and $P \in \mathcal{NP}$, then $P$ is $\mathcal{NP}$-complete. Although $\mathcal{P} \subseteq \mathcal{NP}$, it is still an open question whether $\mathcal{P} = \mathcal{NP}$, which amounts to asking whether some $\mathcal{NP}$-complete problem can be solved by a polynomially bounded algorithm.

Most discrete optimization problems are $\mathcal{NP}$-hard or $\mathcal{NP}$-complete, even if they are linear problems. Solving such discrete optimization problems can be difficult because of their complexity. Alternatively, since there is a finite set of feasible solutions to a discrete optimization problem, one can simply examine all such solutions (i.e., perform an exhaustive enumeration of all possible solutions). However, the number of steps needed to enumerate all possible solutions is exponential. For example, if there are $m$ binary decision variables in the problem, then up to $2^m$ function evaluations have to be performed to determine the optimal solution. For this reason, it is not practical to perform exhaustive enumeration for any discrete optimization problem of size more than 1000, because it would require at least millions of years to complete this operation.

Although some enumeration possibilities can be eliminated by clever observations (e.g., branch-and-bound), there still is an enormous number of alternatives to be considered. In this framework, we divide discrete optimization problems into two categories: linear discrete optimization problems and nonlinear discrete optimization problems. These two categories are discussed in detail in Chapter 4.
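The exponential cost of exhaustive enumeration can be seen on the knapsack problem (1.3). The following sketch enumerates all 0-1 vectors for a small instance. The function name and the instance data (four items with a capacity of 7) are hypothetical and chosen only for illustration, and the sketch assumes nonnegative weights and capacity.

```python
from itertools import product


def knapsack_bruteforce(values, weights, capacity):
    """Solve max c^T x s.t. w^T x <= b, x in {0,1}^n by checking all 2^n vectors."""
    best_value = 0
    best_x = tuple(0 for _ in values)  # the empty knapsack is always feasible
    evaluations = 0
    for x in product((0, 1), repeat=len(values)):
        evaluations += 1
        weight = sum(w * xi for w, xi in zip(weights, x))
        if weight > capacity:
            continue  # infeasible: total weight exceeds b
        value = sum(c * xi for c, xi in zip(values, x))
        if value > best_value:
            best_value, best_x = value, x
    return best_value, best_x, evaluations


# Hypothetical instance: values c, weights w, capacity b.
best_value, best_x, evaluations = knapsack_bruteforce(
    values=[10, 13, 7, 8], weights=[3, 4, 2, 3], capacity=7
)
print(best_value, best_x, evaluations)
```

With four items the loop examines 2^4 = 16 candidate vectors. Every additional binary variable doubles that count, which is why this approach stops being practical long before problems reach realistic sizes.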
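The big-M reformulation of the logical constraint x1 x2 <= 0 described earlier in this section can be checked numerically. The sketch below assumes the constraints have the form -M(1 - y) <= x1 <= M y and -M y <= x2 <= M(1 - y), which is consistent with the implications stated in the text (y = 1 gives x1 >= 0 and x2 <= 0, while y = 0 gives x1 <= 0 and x2 >= 0). The function names and the integer test grid are hypothetical.

```python
def satisfies_big_m(x1, x2, y, M):
    """Check the two linear big-M constraints for a given binary y."""
    return (-M * (1 - y) <= x1 <= M * y) and (-M * y <= x2 <= M * (1 - y))


def linearization_is_exact(M):
    """On the integer grid [-M, M]^2, compare the linear and nonlinear feasible sets.

    A point (x1, x2) is feasible for the linear model if some y in {0, 1}
    satisfies both constraints; it is feasible for the original model if
    x1 * x2 <= 0. The reformulation is exact if the two sets coincide.
    """
    for x1 in range(-M, M + 1):
        for x2 in range(-M, M + 1):
            feasible_linear = any(satisfies_big_m(x1, x2, y, M) for y in (0, 1))
            feasible_nonlinear = x1 * x2 <= 0
            if feasible_linear != feasible_nonlinear:
                return False
    return True


print(linearization_is_exact(5))
```

The check only covers points with |x1|, |x2| <= M, which reflects the requirement that M be large enough not to cut off any point that is feasible for the rest of the problem.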

PAGE 31

1.5 Contributions of the Thesis

In this thesis, we are concerned with the problem of predicting target episodes of events, which encompasses discovering the temporal patterns in multiple time series that govern the related target episodes of events. Traditional linear and nonlinear time series analyses have been routinely used but have not given much insight into the characteristics and mechanisms of time series, because these methods are limited by the stationarity requirement on the time series and the normality and independence requirements on the residuals. These limitations, and the resulting lack of insight needed for making better event predictions, are addressed by the development of new time series data mining concepts, which generalize data mining concepts, dynamical approaches in chaos theory, and optimization techniques to the area of time series analysis. These concepts are used to develop new techniques for the prediction of time series arising in real-world problems (e.g., electroencephalogram (EEG) time series) as well as to conduct advanced studies on the subject.

The developed techniques use a combination of data mining techniques, dynamical approaches, and optimization techniques applied to time series data, with the objective of discovering temporal patterns in time series and then predicting events of interest. Specifically, this thesis integrates methods based on chaos theory, statistical analysis, and optimization techniques to identify complex (nonperiodic, nonlinear, irregular, and chaotic) characteristics and predict the onset of a target event from complex real-world time series.

In this research, we also focus on the statistical and optimization problems that enable us to detect statistically significant temporal patterns that can be used to characterize and predict the onset of target events in the time series. Identifying temporal patterns in multiple time series is combinatorial in nature, as it involves selecting the critical components of the system of interest. Therefore, the

PAGE 32

optimization techniques are developed to improve the performance of prediction in the time series by identifying critical temporal patterns related to the target events (e.g., dynamical parameter settings and the selection of critical components). For instance, several alternative optimization methods for selecting the critical components in the systems are employed, and a novel combination of methods for determining the optimal parameters can be applied to systems with one or more hidden variables, which can be used to reconstruct maps or differential equations of the dynamics of the system. Motivated by the spin glass model, the problem of characterizing and identifying temporal patterns is ideally suited to 0-1 (two-state) problems. In this thesis, we are specifically interested in the multi-quadratic 0-1 programming problem, which is one of the most practical optimization problems. Herein, we propose a new computational approach to solve the multi-quadratic 0-1 programming problem. In this approach, we develop a novel linearization technique based on the Karush-Kuhn-Tucker (KKT) optimality conditions. It is well known that the KKT optimality conditions guarantee global optimality only in the convex case. Although the developed technique seems to be heuristic in nature, we have proven that it guarantees global optimality under the assumption that the elements of the quadratic matrices are positive. To generalize this technique and make it applicable to other real-world problems, we propose new ideas to solve general multi-quadratic 0-1 programming problems and guarantee global optimality without any assumptions (e.g., the positivity assumption on the elements of the quadratic matrices).

While this developed technique solves multi-quadratic 0-1 programming problems to global optimality, it linearizes the problem using the same number of 0-1 variables ($n$) and an additional $O(n)$ continuous variables. In contrast, the conventional linearization techniques found in the literature linearize the problem with an additional $O(n^2)$ 0-1 variables, which makes the problem much larger and harder to solve. The comparison of computational

PAGE 33

times between the conventional linearization approach (found in the literature) and the KKT-conditions linearization approach has shown that the developed technique greatly outperforms the conventional approach; that is, the new technique solves problems much faster than the conventional one and consumes considerably fewer computational resources.

In this thesis, we direct our applications to bioengineering problems, particularly epilepsy and brain disorders. We are specifically interested in the prediction of epileptic seizures, whose occurrence seems to be random and unpredictable. In essence, we integrate the techniques developed from data mining concepts, dynamical approaches in chaos theory, and optimization techniques into a set of tools used to extract the dynamical changes in the EEG time series that precede a seizure. Specifically, in this framework, studies based on chaos theory of the spatiotemporal dynamics in EEGs from patients with temporal lobe epilepsy demonstrate a pre-ictal transition (temporal patterns of dynamical changes in multiple EEG recordings), characterized by a progressive convergence (entrainment) of dynamical measures (e.g., short-term maximum Lyapunov exponents, $STL_{max}$) at specific anatomical areas in the neocortex and hippocampus before the seizure onset. The problem of identifying those critical areas is formulated as a multi-quadratic 0-1 programming problem, which is solved by the developed computational approach. Taken together, these techniques form the basis for the automated seizure warning algorithm. An intensive evaluation of the performance of the seizure prediction algorithm, tested on continuous intracranial EEG recordings of 0.76 to 5.84 days from a group of 5 patients with refractory temporal lobe epilepsy, is reported in this thesis.

For each individual patient, we use the first half of the seizures to train the parameter settings, which are evaluated by ROC (receiver operating characteristic) curve analysis. With the best parameter setting, the algorithm applied to all cases predicted an average of 91.7% of seizures with an average false prediction rate of 0.196 per hour. These results indicate

PAGE 34

that it may be possible to develop automated seizure warning devices for diagnostic and therapeutic purposes.

1.6 Organization of Chapters

This dissertation, which brings together concepts from time series analysis, chaos and nonlinear dynamics, and optimization, is divided into six chapters. The succeeding chapters are organized as follows. The basic time series data mining concepts and the literature on conventional time series analysis are elaborated in Chapter 2. The background and literature of dynamical approaches, chaos theory, and nonlinear dynamics, including the calculation of Lyapunov exponents, are reviewed in Chapter 3. Chapter 4 reviews several optimization techniques used to optimally identify temporal patterns for the prediction of time series. Specifically, in this research we apply multi-quadratic integer programming to solve these problems. Additionally, that chapter presents the theoretical proof of the equivalence of linear mixed integer programming and multi-quadratic integer programming, as well as its applications. In Chapter 5, we focus on the application in bioengineering. The background of epilepsy and brain disorders is addressed, as well as the methods for estimating $STL_{max}$, its parameter settings, and the spatiotemporal dynamical analysis. Later in the chapter, the framework of the algorithm developed for the prediction of epileptic seizures is discussed. The conclusions and future research are discussed in the final Chapter 6.

PAGE 35

CHAPTER 2
TIME SERIES DATA MINING

A time series is a set of numbers that measures the status of some activity over time. It is the historical record of some activity, with measurements taken at equally spaced intervals and with consistency in the activity and the method of measurement. There are two main goals of time series analysis:

1. Drawing inferences from such series and identifying the nature of the phenomenon represented by the sequence of observations.
2. Predicting future values of the time series variable (forecasting).

In both cases, the pattern of the observed time series data must be identified so that we can interpret it and integrate it with other data. Regardless of the depth of our understanding and the validity of our interpretation (theory) of the phenomenon, we can extrapolate the identified pattern to predict future events. Based on time series data mining research, this chapter reviews the fundamental concepts and well-known methods in time series analysis. The succeeding sections of this chapter are organized as follows. In section 2.1, we give some concepts of time series analysis and forecasting techniques. The data mining concept is introduced in section 2.2. In the final section 2.3, the basic concepts of time series data mining are reviewed.

2.1 Time Series Analysis and Forecasting Techniques

2.1.1 Basics

The selection and implementation of the proper forecast methodology has always been the most important issue in forecasting techniques. Modeling for forecasting is a necessary input to planning, whether in business or research. The flow chart in Fig. 2-1 highlights the systematic development of the modeling and forecasting phases.

PAGE 36

Figure 2-1: Flow chart of the forecasting system: the model-building and forecasting phases.

The modeling process in Fig. 2-1 is useful for understanding the underlying mechanism generating the time series and for describing and explaining any variations, seasonality, trend, etc. In addition, it enables us to predict the future under the "business as usual" condition and to control the system; that is, to perform "what-if" scenarios [11]. Examples of well-known real-world time series in business and research are illustrated in Figures 2-2, 2-3, 2-4, and 2-5. There are two main methods in forecasting systems:

1. The explanatory method. This method estimates the future values based on an analysis of component factors and parameters that are believed to influence future values.
2. The extrapolation method. This method makes a prediction of target events of interest based on an inferred study of past data behavior and profiles over time.

Both approaches may lead to accurate and useful forecasts, but at a comparable level of accuracy, the former approach seems to be more difficult to implement and validate than the latter.

PAGE 37

Figure 2-2: Ten-minute EEG time series data (horizontal axis: time in minutes; vertical axis: EEG data).

Figure 2-3: Monthly accidental deaths in the U.S.A. (from January 1973 to December 1978) time series data, which shows a very strong seasonality.

PAGE 38

Figure 2-4: U.S.A. population at ten-year intervals (from 1790 to 1990) time series data, which shows a very strong trend.

Figure 2-5: Dow-Jones index closing prices (251 consecutive trading days ending 08/26/94) time series data.

PAGE 39

2.1.2 Statistical Analysis

In this section, we focus on the recently developed methods used to deal with nonstationary time series.

2.1.2.1 Multiple regression analysis

Regression is the study of how the distribution of $Y$ changes for varying combinations of $X$ values (i.e., the study of the conditional distribution $y \mid x$ of the response $y$ given the vector of nontrivial predictors $x$). Regression analysis is essentially the study of relationships among variables, a principal purpose of which is to predict, or estimate, the value of one variable from known or assumed values of other related variables. Multiple regression analysis is generally used when two or more independent factors (predictors) are involved, and it is widely used for intermediate-term forecasting. This method can also be used to develop alternate models with different factors. To make predictions or estimations, we must identify the effective predictors of the variables of interest.

2.1.2.2 Trend analysis

A structural time series model is a linear model that is formulated directly in terms of the components of interest in the time series:

Observed series = trend + seasonal + irregular,

where the "irregular" component reflects nonsystematic movements in the series. This method forecasts the current time stamp by understanding the trend, seasonal, and irregular components of previous time stamps in the observed series. Structural models are estimated by converting the time series into a state space form. Trend analysis applies linear and nonlinear regression with time as the explanatory variable to estimate and remove trends. It is usually used where there exists a pattern or long-term trend over time. Unlike most other forecasting techniques, trend analysis does not assume that the time series is equally spaced.
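As a minimal sketch of trend analysis, assuming a polynomial trend fitted by ordinary least squares, the following Python code regresses the observations on time and removes the fitted trend. The function name `detrend`, the polynomial degree, and the sample data are illustrative assumptions rather than part of the methods cited here. Because the observation times enter the regression explicitly, they need not be equally spaced.

```python
import numpy as np

def detrend(times, values, degree=1):
    """Fit a polynomial trend in time by least squares and remove it.

    Returns the fitted trend and the residual (detrended) series.
    """
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    coeffs = np.polyfit(times, values, degree)   # regression with time as the explanatory variable
    trend = np.polyval(coeffs, times)            # fitted trend at the observation times
    return trend, values - trend

# Hypothetical, unequally spaced observation times with a purely linear trend.
t = np.array([0.0, 0.5, 1.7, 2.0, 3.6, 5.1, 5.2, 7.9])
y = 2.0 + 3.0 * t
trend, residual = detrend(t, y)
```

For data that follow the assumed trend exactly, as in this toy input, the residual series is zero up to rounding; for real data the residual is the series that would be passed on to further modeling.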
This method can also be used to transform a nonstationary time series to a stationary time series [44]. In many cases, removing trends is better than differencing

PAGE 40

in many aspects (e.g., for time series with seasonal patterns). In addition, if there is evidence that a deterministic trend exists in the time series, then it is generally better to first estimate and remove the trend from the given series and then apply the Box and Jenkins (1976) [21] approach to the residual series than to difference the series before applying the trend analysis.

2.1.2.3 Moving average

The moving average (MA) is the best-known forecasting method. The method simply takes the observations from a certain number of past periods, adds them together, and divides the sum by the number of periods. Given that the time series is stationary in both mean and variance, the moving average is very effective and efficient. The following formulation is used to find the moving average of order $n$, MA($n$), for a period $t+1$:

$$MA_{t+1} = [D_t + D_{t-1} + \cdots + D_{t-n+1}]/n, \qquad (2.1)$$

where $n$ is the number of observations used in the calculation. The moving average has been widely extended as a basis for new forecasting methodologies. One of its extensions is the weighted moving average, which is widely used where repeated forecasts are required. The following formulation is used to find the weighted moving average of order $n$, Weighted MA($n$), for a period $t+1$:

$$\text{Weighted MA}(t+1) = w_1 D_t + w_2 D_{t-1} + \cdots + w_n D_{t-n+1}, \qquad (2.2)$$

where the weights are any positive numbers such that $w_1 + w_2 + \cdots + w_n = 1$.

2.1.2.4 Autoregressive integrated moving average models

Many methods have been used to analyze and forecast nonstationary time series data. One of the well-known methods, proposed by Box and Jenkins (1976) [21], is called the autoregressive integrated moving average (ARIMA) process. It is an extension of the stationary autoregressive moving average (ARMA) process. ARIMA

PAGE 41

deals with nonstationary time series by using the $d$th difference to turn the nonstationary time series into a stationary ARMA process. The ARIMA method assumes that a series can be reduced to a stationary time series by differencing or detrending. ARIMA is limited by the requirements of stationarity of the time series and of normality and independence of the residuals. In other words, the statistical characteristics of a stationary time series remain constant through time, and the residuals, which are the errors between the observed time series and the model generated by the ARIMA method, are assumed to be caused by noise. Autocorrelation for time series analysis is a methodology for studying the sequential progression of events. The dependence or regression on the variable's past values is used to make predictions of the variable's future values. A model that is derived from autocorrelated data is referred to as an autoregression. We say that $\{y_t\}$ is an ARIMA process of order $p, d, q$ (i.e., $y_t \sim \text{ARIMA}(p, d, q)$) if the $d$th difference of $y_t$ is a stationary, invertible ARMA process of order $p$ and $q$. The model can be written as

$$\phi(B)(1 - B)^d y_t = \theta(B) z_t, \qquad (2.3)$$

where $B$ is the backward shift operator, $\{z_t\}$ is white noise, and $\phi(\cdot)$ and $\theta(\cdot)$ are polynomials of degree $p$ and $q$, respectively, with all roots of the polynomial equations $\phi(z) = 0$ and $\theta(z) = 0$ outside the unit circle. Autoregressive integrated moving average (ARIMA) processes provide a justification for differencing a nonstationary time series in order to achieve stationarity, while the approach by Box and Jenkins gives a systematic way to model a given nonstationary time series. However, this procedure can only be used when the nonstationarity is homogeneous; that is, when the same finite order of differencing applies everywhere. Although differencing produces a stationary process, the forecasts obtained from an

PAGE 42

estimated ARIMA model are not guaranteed to be optimal; other transformations may give better forecasts.

2.1.2.5 Autoregressive-autoregressive moving average models

The autoregressive-autoregressive moving average (ARARMA) model for nonstationary time series was proposed by Parzen (1982) [120]. The model consists of a nonstationary AR model followed by a stationary ARMA model. The Box and Jenkins ARIMA approach is a special case of the ARARMA process in which the transformation is constrained to be a pure differencing operator. The model can be described as follows:

$$\tilde{y}_t = y_t - \psi_1 y_{t-1} - \cdots - \psi_r y_{t-r}, \qquad (2.4)$$

$$\sum_{j=1}^{p} \phi_j (\tilde{y}_{t-j} - \tilde{\mu}) = \sum_{k=0}^{q} \theta_k z_{t-k}, \qquad (2.5)$$

where $z_t$ is white noise. An ARARMA model was compared with the Box and Jenkins approach on the well-known international airline data with respect to the forecasting mean square error (MSE) and mean absolute percentage error (MAPE). The results showed that the ARARMA model is better than the ARIMA model of Box and Jenkins (1976) [21], especially for forecasts further ahead.

2.1.2.6 Kalman filter

In state space, the value of a time series at time $t$ is plotted against the value of the time series at time $t-1$, as in a first-order Markov process. Another well-known procedure uses the state space filtering algorithm [79], which was discussed from a statistical viewpoint by Meinhold and Singpurwalla (1983) [96]. The Kalman filter can be applied after putting a model into a state space form. In general, the Kalman filter is a recursive procedure used to compute the optimal estimator of the state vector at time $t$, based on the information available at time $t$. The Kalman filter is an optimal estimator if the disturbances (random shocks) and the initial state vector are normally distributed. The basic idea of this

PAGE 43

approach was motivated by the updating feature, and its derivation follows least squares estimation theory, where an observation equation and a system equation are employed. Let $(y_1, y_2, \ldots, y_t)$ be the observed time series, and assume that $y_t$ depends on an unobservable quantity $x_t$, known as the state of nature. The relationship between $y_t$ and $x_t$ is linear and is specified by the observation equation as

$$y_t = H_t x_t + b_t, \quad t = 1, 2, \ldots,$$

where $H_t$ is a known quantity, and $b_t$ is the observation error, which is assumed to be normally distributed with mean zero and known finite variance. The main difference between Kalman's state space filtering method and the traditional linear model is that the state of nature ($x_t$) in Kalman's method, which corresponds to the regression coefficients of the linear models, is not assumed to be constant but may change in time. This dynamical feature of $x_t$ is represented by a system equation as

$$x_t = F_t x_{t-1} + a_t, \quad t = 1, 2, \ldots, \qquad (2.6)$$

where $F_t$ is a known quantity, and $a_t$ is the system equation error, which is assumed to be normally distributed with mean zero and known finite variance. In this algorithm, there are two major assumptions:

1. $H_t$ and $F_t$ have to be specified.
2. Assumptions about the statistical characteristics of the dynamical error terms, $\{b_t\}$ and $\{a_t\}$, have to be made (e.g., a normality assumption on the series and an independence assumption on the residuals).

However, in many applications, the validity of these assumptions is not checked.

2.1.3 Nonlinear Analysis

Neural networks are one of the most studied methods of nonlinear analysis used in time series analysis. Several authors have given an overview of different types of neural networks in time series processing. For example, neural networks can be categorized

PAGE 44

by the type of mechanism used to deal with temporal information. The use of neural networks in time series in the context of function approximation and classification was described by Dorffner (1994) [26]. Neural networks have been used to extend both the ARIMA model and the state-space linear models (e.g., Kalman's filter). Nonlinear autoregressive models such as neural networks are potentially more powerful than linear models because much more complex underlying characteristics of the series can be modeled without the stationarity assumption. A variety of techniques can be used to select the best network architecture and to evaluate the performance of the network and of the prediction algorithms for almost random-walk series.

2.1.3.1 Neural network

For time series forecasting, the prediction model of order $p$ has the general form:

$$D_t = f(D_{t-1}, D_{t-2}, \ldots, D_{t-p}) + e_t. \qquad (2.7)$$

Neural network architectures can be trained to predict the future values of the dependent variables. The design of the network paradigm and its parameters is required for training. The multi-layer feed-forward neural network approach consists of an input layer, one or several hidden layers, and an output layer. The partially recurrent neural network is an extension of the neural network technique that can learn sequences as time evolves. Moreover, it responds to the same input pattern differently at different times, depending on the previous input patterns. The performance of both approaches can be improved by adding a damped feedback that possesses the characteristics of a dynamic memory.

2.1.4 Spectral Analysis

2.1.4.1 Discrete Fourier transform

The importance of the discrete Fourier transform (DFT) lies in the existence of a fast algorithm, the fast Fourier transform (FFT), that can calculate DFT coefficients

PAGE 45

in $O(n \log n)$ time. The DFT analysis decomposes a series into component parts. It involves two concepts:

1. Different waveforms can be combined together.
2. Adding together enough simple sine and cosine waves of different frequencies, phases, and amplitudes is sufficient to create any shape of time series.

Fourier spectral analysis produces information about the waveforms, frequencies, and amplitudes of the components in the series. The DFT analysis has been used to perform similarity matching, discretizing, and clustering in time series. It can also be used to map time sequences to the frequency domain. Sequences are mapped to a lower-dimensional space by using only the first few Fourier coefficients, and then R-trees are used to index the sequences and to efficiently answer similarity queries. This method works on global sequence matching [104].

2.1.4.2 Wavelet transforms

Wavelet transforms have been used to locally and globally match sequences, and to extract features that describe properties of the sequence in various locations and at varying time regularities. Features are extracted from the time series based on the discrete wavelet transform, while locally and globally similar sequences are identified based on these feature vectors. Sequences that are locally stationary in time are well suited to spectral representations (e.g., the direct use of Fourier coefficients). However, many sequences that contain transient behavior are nonstationary and may possess very weak spectral signatures locally and globally [150].

2.1.5 Matching Similar Time Series Patterns

Finding similar patterns in time series is used in several applications, including indexing like patterns, finding similar subsequences, clustering, and finding rules associating time series [20]. Time series similarity has been approached in many different ways. Many approaches assume a template pattern (either global or local),

PAGE 46

and then try to find similar patterns in a reference sequence. Most of the approaches decompose the time series into windows, from which features are extracted. Then efficient matching is performed by using an R-tree structure in feature space [82]. In order to find similar patterns in time series, various distance measures have been employed. The most common is the Euclidean distance measure. Dynamic time warping, introduced as a temporal similarity measure by Berndt and Clifford (1996) [20], is one of the most studied techniques for matching similar time series patterns. The dynamic time warping technique was taken from speech recognition; it yields elasticity in the temporal axis when matching a template to a reference sequence. In this approach, a dynamic programming method is used to align the time series with a predefined set of templates. Another well-known approach is the longest common subsequence measure introduced by Selman et al. (1997) [144]. This approach considers $X$ and $Y$ to be similar if they exhibit similar behavior for a large part of their length. Templates using piecewise linear segmentations were introduced by Keogh and Smyth (1997) [82], who used a probabilistic method to match the known templates to the time series data. Local features such as peaks, troughs, and plateaus are defined using a prior distribution on expected deformations from a basic template. A time series generated from robot sensors can also be matched against a predefined set of templates by employing the time-delay embedding process.

2.2 Data Mining Concepts

Nowadays, data mining is fast emerging as a core component of many businesses and industries, especially those related to Finance and Banking, Manufacturing, Biosystems and Biotechnology, and Information Systems and Services.
Data mining refers to a multipurpose abstraction of information from large data sources. Data mining is defined as "the search for valuable information in large volumes of data. Predictive data mining is a search for very strong patterns in big data that

PAGE 47

can generalize to accurate future decisions" [160]. Alternatively, data mining can also be defined as "the process of extracting previously unknown, valid, and actionable information from large databases and then using the information to make crucial business decisions" [22]. Data mining has evolved from several fields, including machine learning, statistics, and database design [160]. It uses techniques such as clustering, association rules, visualization, decision trees, nonlinear regression, and probabilistic graphical dependency models to identify novel, hidden, and useful structures in large databases [160].

2.3 Concepts of Time Series Data Mining

In short, some key concepts of time series data mining (TSDM) are as follows. An event is defined as an important occurrence in time. The associated event characterization function $f(t)$, which is defined a priori, represents the value of future eventness for the current time stamp. A phase space is an $N$-dimensional real metric space into which the time series is embedded. The augmented phase space is defined as an $(N+1)$-dimensional space formed by extending the phase space with the additional dimension of $f(\cdot)$. Defined as a vector of length $N$, or equivalently as a point in an $N$-dimensional phase space, a temporal pattern is a hidden structure in a time series that is characteristic and predictive of events. The objective function gives the value or fitness of a temporal pattern cluster or a collection of temporal pattern clusters. In other words, finding optimal temporal pattern clusters that characterize and predict events is the key to the TSDM framework.

2.3.1 Temporal Pattern and Temporal Pattern Cluster

The central concept within the TSDM framework is the temporal pattern. A temporal pattern is defined as a hidden structure in a time series that is characteristic and predictive of events. The temporal pattern $p$ is a real vector of length $N$.
The temporal pattern will be represented as a point in an $N$-dimensional real metric space, $p \in \mathbb{R}^N$. The observations $\{x_{t-(N-1)\tau}, \ldots, x_{t-\tau}, x_t\}$ form a sequence that can

PAGE 48

be compared to a temporal pattern, where $x_t$ represents the current observation, and $x_{t-(N-1)\tau}, \ldots, x_{t-2\tau}, x_{t-\tau}$ the past observations. Let $\tau > 0$ be a positive integer. If $t$ represents the present time stamp, then $t - \tau$ is a time stamp in the past, and $t + \tau$ is a time stamp in the future. With this notation, time is partitioned into three categories: past, present, and future. Temporal patterns and events are placed into different time categories; that is, temporal patterns occur in the past and complete in the present, while events occur in the future.

2.3.2 Phase Space and Time-Delay Embedding

A reconstructed phase space is defined as an $N$-dimensional metric space into which a time series is embedded [1]. From Takens' theorem, if $N$ is large enough, then the phase space is homeomorphic to the state space that generated the time series [153]. The time-delayed embedding of a time series maps a set of $N$ time series observations taken from $X$ onto $x_t$, where $x_t$ is a vector or point in the phase space. Determining how well a temporal pattern or a phase space point characterizes an event requires the concept of an event characterization function, as introduced in the next section.

2.3.3 Event Characterization Function

The event characterization function $f(t)$ is introduced to connect a temporal pattern in the past and present with the future event to be predicted. The event characterization function represents the value of future "eventness" for the current time stamp. The event characterization function is defined a priori and is created to address the specific TSDM goal. It is defined such that its value at $t$ correlates highly with the occurrence of an event at some specified time in the future. In other words, the event characterization function is causal when applying the TSDM method to prediction problems.
Non-causal event characterization functions are useful when applying the TSDM method to system identification problems.
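The time-delay embedding and the augmented phase space described in this section can be sketched in Python as follows. This is only an illustrative sketch: the function names `delay_embed` and `augmented_phase_space` are hypothetical, and the choice of a one-step-ahead value as the event characterization function is an assumption made for the example, not a definition taken from the framework.

```python
import numpy as np

def delay_embed(x, N, tau=1):
    """Return the phase space points (x_{t-(N-1)tau}, ..., x_{t-tau}, x_t).

    One row per time stamp t for which all N delayed observations exist.
    """
    x = np.asarray(x, dtype=float)
    start = (N - 1) * tau
    if start >= len(x):
        raise ValueError("time series too short for this N and tau")
    t = np.arange(start, len(x))[:, None]            # present time stamps
    lags = tau * np.arange(N - 1, -1, -1)[None, :]   # (N-1)tau, ..., tau, 0
    return x[t - lags]

def augmented_phase_space(x, N, tau=1, horizon=1):
    """Pair each phase space point with an assumed eventness value x_{t+horizon}."""
    x = np.asarray(x, dtype=float)
    points = delay_embed(x, N, tau)
    t = np.arange((N - 1) * tau, len(x) - horizon)   # stamps with a known future value
    return points[: len(t)], x[t + horizon]

series = np.arange(10.0)                  # toy series 0, 1, ..., 9
points = delay_embed(series, N=3, tau=2)
patterns, eventness = augmented_phase_space(series, N=3, tau=2, horizon=1)
```

Each row of `patterns` together with its entry in `eventness` is one point of the $(N+1)$-dimensional augmented phase space; a temporal pattern cluster would then be sought among the rows whose eventness is high.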

PAGE 49

CHAPTER 3
DYNAMICAL APPROACHES AND CHAOS THEORY

Dynamics is considered to be an important conceptual scheme between mathematics and the sciences, unifying the physical, biological, and social sciences in a common geometric model. Dynamics has evolved into three disciplines: mathematical, applied, and experimental. Newton is the pioneer of mathematical dynamics, which has become a large and active branch of pure mathematics. This includes the theory of ordinary differential equations, now a classical subject. Later, Poincaré introduced the methods of topology and geometry, which have dominated the field. Applied dynamics, founded by Galileo, is an increasingly important branch of the subject. However, applied dynamics became well known through the work of Rayleigh, Duffing, and Van der Pol. Experimental techniques have been revolutionized by new developments in technology. Nowadays, new technology is accelerating the advance of the research frontier and increasing the efficiency of experimental work. As stated earlier in Chapter 1, a dynamical system is defined as anything that moves, changes, or evolves over time. Dynamical systems are considered to be "deterministic" if there is a unique consequent to every state, or "stochastic" or "random" if there is more than one consequent, chosen from some probability distribution. Unpredictable behavior of deterministic systems has been called "chaos". The term chaos was first introduced by Li and Yorke (1975) [89]. In general, chaos is used as a name for any order that produces confusion in our minds. Mathematically, chaos can be defined as effectively unpredictable long-term behavior arising in a deterministic dynamical system because of sensitivity to the initial conditions. For example, the weather is governed by the atmosphere, which obeys deterministic

PAGE 50

physical laws. The reason for the unpredictability of the weather is that the weather exhibits extreme sensitivity to initial conditions. A tiny change in today's weather (the initial conditions) may cause a larger change in tomorrow's weather and an even larger change in the next day's weather. This sensitivity to initial conditions has been hailed as the butterfly effect, because it is possible for a butterfly flapping its wings today in China to set off tornadoes in Kansas a week later. Since it is impossible to obtain the initial conditions with perfect precision, long-term prediction is impossible, even when the physical laws are deterministic and exactly known. It has been shown that the predictability horizon in weather forecasting cannot be more than two or three weeks [148]. The threads of chaos theory and nonlinear dynamics are mostly based on the fundamental laws of physics, chemistry, and biology. Accordingly, chaos theory holds promise for explaining many natural processes. For example, a stream of water exhibits regular (laminar) flow when moving slowly and irregular (turbulent) flow when moving more rapidly. The transition between the two can be very abrupt. If two sticks are dropped side by side into a stream with laminar flow, then they stay close together, but if they are dropped into a turbulent stream, then they quickly separate [148]. As another example, consider a boulder precariously perched on the top of an ideal hill. The slightest push will cause the boulder to roll down one side of the hill or the other; the subsequent behavior is sensitive to the direction of the push, and the push can be arbitrarily small. Chaotic processes are not random; they follow rules, but even simple rules can produce extreme complexity. The succeeding sections of this chapter are organized as follows.
Short explanations of common terms used in dynamical systems and chaos are given in the next section. In section 3.2, examples of well-known chaotic systems are illustrated and discussed. The literature review and motivation are addressed in section 3.3. Later, in section 3.4, we discuss the dimensionality of the system, which is called the fractal dimension. In section 3.5, the Lyapunov stability theorem and Lyapunov exponents are

PAGE 51

discussed. This includes proposed methods for calculating Lyapunov exponents, which can be used to analyze real-world time series.

3.1 Chaos Glossary

3.1.1 Phase Space

Phase space is the collection of possible states of a dynamical system. A phase space can be finite (e.g., for the ideal coin toss, we have two states, heads and tails), countably infinite (e.g., state variables are integers), or uncountably infinite (e.g., state variables are real numbers).

3.1.2 Trajectory

The trajectory or orbit of a dynamical system is the path connecting points in chronological order in phase space traced out by a solution of an initial value problem. If the state variables take real values in a continuum, the orbit of a continuous-time system is a curve, while the orbit of a discrete-time system is a sequence of points.

3.1.3 Bifurcation

A bifurcation is defined as a qualitative change in dynamics upon a small variation in the parameters of a system. A gradual variation of a parameter in the system usually corresponds to a gradual variation of the solutions to the problem. A bifurcation is the phenomenon in which the number of solutions changes abruptly and the structure of the solution manifolds varies dramatically when a parameter passes through some critical value. Bifurcation theory is a method for studying the onset of chaos and how the solutions of a nonlinear problem and their stability change as the parameters of the system vary.

3.1.4 Degree of Freedom

A degree of freedom is defined as one canonical conjugate pair: a configuration and its conjugate momentum. In the study of dissipative systems, the term degree of freedom means a single coordinate dimension of the phase space. In

PAGE 52

this context, degree of freedom implies order, which is equal to the dimension of the phase space.

3.1.5 Attractor

An attractor is defined as a phase space point or a set of points representing the various possible steady-state conditions of a system; that is, an equilibrium state or a group of states to which a dynamical system converges and which cannot be decomposed into two or more attractors with distinct basins of attraction. This restriction is necessary since a dynamical system may have multiple attractors, each with its own basin of attraction. Informally, an attractor is a region of a dynamical system's state space that the system can enter but not leave, and which contains no smaller such region. Thus, in the long term, a dissipative dynamical system may settle into an attractor. In short, an attractor is just a set in the phase space that has a neighborhood in which every point stays nearby and approaches the attractor as time goes to infinity. The boundary of a basin of attraction is very interesting since it differentiates between different types of motion. Typically, a basin boundary is a saddle orbit, or such an orbit and its stable manifold. An alternative definition of an attractor is sometimes used because there are systems for which most, but not all, of the initial conditions in a neighborhood are attracted. Thus, an attractor can alternatively be defined as a set for which a positive measure of initial conditions in a neighborhood are asymptotic to the set.

3.1.6 Strange Attractor

A strange attractor is defined as an attractor that shows sensitivity to initial conditions (exponential divergence of neighboring trajectories) and that, therefore, occurs only in the chaotic domain. The term strange attractor was introduced by Ruelle and Takens in 1971 [133] in their discussion of a scenario for the onset of turbulence in fluid flow.
They noted that when p erio dic motion go es unstable (with three or more mo des), the t ypical result will b e a geometrically strange ob ject.

Unfortunately, the term strange attractor is often used for any chaotic attractor. However, the term should be reserved for attractors that are "geometrically" strange (e.g., fractal). While all chaotic attractors are strange, not all strange attractors are chaotic [53]. In other words, the chaoticity condition is sufficient, but not necessary, for the strangeness condition of attractors.

3.2 Chaos Theory for Time Series Analysis

Many investigations have shown the presence of chaos in real-world time series data [3, 29, 43, 63, 69, 92, 143, 151]. Because such series are sensitive to initial conditions, the methods of nonlinear dynamical systems have had a strong influence on the development of new concepts in the field of time series analysis. One of the major goals of time series analysis is prediction (i.e., forecasting the future development of the series from observations in the past). The first approach to predicting deterministic dynamical systems on the basis of the time delay method relies on nearest neighbors in the embedding space and a local linear model derived by fitting a hyperplane through the data. This embedding technique consists of forming a mapping between the observed time series and vectors in some simple multidimensional space. These vectors then form points on a trajectory, and the multidimensional space becomes a representation of the actual physical phase space of the system. The main theorems connecting the two realms, measurement and physical phase space, are Takens' theorem [153], its precursors [102], and its sequels [140, 141].

Most of the dynamical approaches to time series analysis are limited by practical difficulties and by the assumptions they require. For instance, if the series is generated by a deterministic dynamical system, the methods do not allow for a geometrical description of the dynamics that yields information about the structure of the chaotic attractor. In particular, the dimensionality of the attractor and its embedding in a smooth, nonlinear manifold are not addressed. In general, embedding in Euclidean spaces may be viewed as encapsulating the nonlinear manifold. The
intrinsic dimensionality of the dynamics is in general smaller than the dimension of the space needed to correctly embed the dynamics. The problem of reducing the dimensionality of dynamical systems is extremely important and has become an active area of research. In addition, the assumption of a deterministic dynamical system is itself a concern because experimenters usually work with noisy data. Many time series (e.g., foreign currency exchange rates, the heartbeat) do not seem to be in the class of deterministic dynamical systems, but rather appear to be governed by a stochastic process. It is therefore desirable to have an algorithm which, on the one hand, can be applied to deterministic as well as to stochastic time series and, on the other hand, allows a smooth manifold to be fitted through the data, regardless of whether the time series is driven deterministically or stochastically.

3.3 Chaotic Systems

In this section, some well-known chaotic systems are discussed. These systems can be considered benchmark examples for testing statistical methods used to draw inferences about systems with chaos.

3.3.1 Hénon Mapping

The Hénon map is a discrete-time system; its difference equations are given by

\[ x_{t+1} = a + b\,y_t - x_t^2 \tag{3.1} \]
\[ y_{t+1} = x_t, \tag{3.2} \]

where $a$ and $b$ are constants. Values $a = 1.4$ and $b = 0.3$ are normally chosen for study, since chaos is exhibited for these values [57, 158]. The evidence of chaos for the Hénon mapping with the above parameters can be seen through numerical study. The Hénon attractor is the scatter plot of the orbit $\{(x_t, y_t)\}_{t=N_0}^{N}$ for some typical initial value $(x_0, y_0)$, where $N_0$ is the number of discarded transient steps. The Hénon attractor is locally a product of a line segment and a Cantor set. Figures (3-1)-(3-3) illustrate the Hénon map with the parameter setting

Figure 3-1: Hénon map created by Runge-Kutta integration with a = 1.4, b = 0.3.
Figure 3-2: Fourier transform of the Hénon map with a = 1.4, b = 0.3.
Figure 3-3: 3-D plot of the Hénon attractor from the Hénon map with a = 1.4, b = 0.3.

where the system is shown to be chaotic, the Fourier transform, and the 3-D plot of the Hénon map.

3.3.2 Lorenz System

The differential equations of the Lorenz system are given by

\[ \frac{dx}{dt} = \sigma\,(y - x) \tag{3.3} \]
\[ \frac{dy}{dt} = (r \cdot x) - y - (x \cdot z) \tag{3.4} \]
\[ \frac{dz}{dt} = (x \cdot y) - (b \cdot z) \tag{3.5} \]

where $\sigma$, $r$, and $b$ are constants. Values $\sigma = 10.0$, $r = 28.0$, $b = 8/3$ are normally chosen for study, since chaos is exhibited for these values [92, 158]. In a famous paper in 1963 [92], Lorenz discovered that simple systems of three differential equations can have complicated attractors. The Lorenz attractor (with its butterfly wings reminding us of sensitive dependence) is the "icon" of chaos. Lorenz showed that his attractor was chaotic, since it exhibited sensitive dependence. Moreover, his attractor is also "strange", which means that it is a fractal.

The Lorenz system is one of the most-studied systems. The Lorenz equations are obtained by truncation of the Navier-Stokes equations and give an approximate description of a horizontal fluid layer heated from below, a situation similar to conditions in the earth's atmosphere. For sufficiently intense heating ($r$), the time evolution has sensitive dependence on initial conditions, thus representing a very irregular and chaotic behavior. The Lorenz system is also used to justify the so-called "butterfly effect", a metaphor for the imprecision of weather forecasting. The Lorenz system is the first example that is derived from an actual physical process and gives rise to a type of attractor which is neither periodic nor quasiperiodic.
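
As a minimal illustration of the sensitive dependence described above, the following Python sketch integrates equations (3.3)-(3.5) with a classical fourth-order Runge-Kutta scheme from two initial conditions that differ by $10^{-8}$. The step size, the initial point (1, 1, 1), the integration length, and all function names are assumptions made for this example; they are not settings taken from the text.

```python
import numpy as np


def lorenz(state, sigma=10.0, r=28.0, b=8.0 / 3.0):
    """Right-hand side of the Lorenz equations (3.3)-(3.5)."""
    x, y, z = state
    return np.array([sigma * (y - x), r * x - y - x * z, x * y - b * z])


def rk4_step(f, state, dt):
    """One classical fourth-order Runge-Kutta step for state' = f(state)."""
    k1 = f(state)
    k2 = f(state + 0.5 * dt * k1)
    k3 = f(state + 0.5 * dt * k2)
    k4 = f(state + dt * k3)
    return state + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def integrate(f, state0, dt, n_steps):
    """Return the trajectory as an array of shape (n_steps + 1, dim)."""
    traj = np.empty((n_steps + 1, len(state0)))
    traj[0] = state0
    for i in range(n_steps):
        traj[i + 1] = rk4_step(f, traj[i], dt)
    return traj


# Assumed illustration settings: step 0.01, 3000 steps (30 time units).
dt, n_steps = 0.01, 3000
traj_a = integrate(lorenz, np.array([1.0, 1.0, 1.0]), dt, n_steps)
traj_b = integrate(lorenz, np.array([1.0 + 1e-8, 1.0, 1.0]), dt, n_steps)

# Euclidean distance between the two trajectories at every time step.
separation = np.linalg.norm(traj_a - traj_b, axis=1)
print(f"initial separation: {separation[0]:.1e}")
print(f"final separation:   {separation[-1]:.1e}")
```

Plotting `separation` on a logarithmic axis would be expected to show a roughly linear growth phase before the distance saturates at the size of the attractor.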

Figure 3-4: Lorenz system created by Runge-Kutta integration of the Lorenz equations, with $\sigma = 10.0$, $r = 28.0$, $b = 8/3$.
Figure 3-5: Fourier transform of the Lorenz system with $\sigma = 10.0$, $r = 28.0$, $b = 8/3$.
Figure 3-6: 3-D plots of the Lorenz attractor, which is a strange attractor, from the Lorenz system with $\sigma = 10.0$, $r = 28.0$, $b = 8/3$.

Figures (3-4)-(3-6) illustrate the Lorenz system with the parameter setting where the system is shown to be chaotic, the Fourier transform, and the 3-D plot of the system.

3.3.3 Rössler System

The differential equations of the Rössler system are given by

\[ \frac{dx}{dt} = -z - y \tag{3.6} \]
\[ \frac{dy}{dt} = x + a\,y \tag{3.7} \]
\[ \frac{dz}{dt} = b + z\,(x - c) \tag{3.8} \]

where $a$, $b$, and $c$ are constants. Values $a = 0.15$, $b = 0.20$, $c = 10.0$ are normally chosen for study, since chaos is exhibited for these values [95, 158]. Figures (3-7)-(3-9) illustrate the Rössler system with the parameter setting where the system is shown to be chaotic, the Fourier transform, and the 3-D plot of the system.

3.4 Fractal Dimension

The first step in characterizing the properties of a system is to determine the dimension of its attractor. The dimension indicates the amount of information necessary to specify the position of a point on the attractor to within a given accuracy. In addition, the dimension is a lower bound on the number of essential variables needed to model the dynamics. Strange attractors often have a structure that is not simple; they are often not manifolds and actually have a highly fractured character. The dimension that is most useful takes on values that are typically not integers. These noninteger dimensions are called fractal dimensions. For any attractor, the dimension can be estimated by looking at the way in which the number of points within a sphere of radius $r$ scales as the radius shrinks to zero. The geometric relevance of this observation is that the volume occupied by a sphere of radius $r$ in dimension $d$ behaves as $r^d$.

Figure 3-7: Rössler system created by Runge-Kutta integration with a = 0.15, b = 0.20, c = 10.0.
Figure 3-8: Fourier transform of the Rössler system with a = 0.15, b = 0.20, c = 10.0.
Figure 3-9: 3-D plots of the Rössler attractor from the Rössler system with a = 0.15, b = 0.20, c = 10.0.

For regular attractors, irrespective of the origin of the sphere, the dimension obtained would be the dimension of the attractor. For a chaotic attractor, however, the dimension varies depending on the point at which the estimation is performed. If the dimension is to be invariant under the dynamics of the process, we have to average the point densities of the attractor around it. For the purpose of identifying the dimension in this fashion, we find the number of points $y(k)$ within a sphere around some phase space location $x$. This is defined by

\[ n(x, r) = \frac{1}{N} \sum_{k=1}^{N} \theta\big(r - |y(k) - x|\big) \tag{3.9} \]

where $\theta$ is the Heaviside function. This counts all the points on the orbit $y(k)$ within a radius $r$ from the point $x$ and normalizes this quantity by the total number of points $N$ in the data. Also, we know that the point density, $\rho(x)$, on an attractor does not need to be uniform (for a strange attractor) on the figure of the attractor. Choosing the function as $n(x, r)^{q-1}$ and defining the function $C(q, r)$ of two variables $q$ and $r$ by the mean of $n(x, r)^{q-1}$ over the attractor weighted with the natural density $\rho(x)$ yields

\[ C(q, r) = \int d^d x \, \rho(x)\, n(x, r)^{q-1} = \frac{1}{M} \sum_{k=1}^{M} \Big[ \frac{1}{K} \sum_{n=1,\, n \neq k}^{M} \theta\big(r - |y(n) - y(k)|\big) \Big]^{q-1} \tag{3.10} \]

The quantity $C(q, r)$ is called the "correlation function" on the attractor. This function is a measure of the probability that two points $y(n)$ and $y(k)$ on the attractor are separated by a distance less than $r$. $M$ and $K$ are some large, but finite, values. This function of two variables is an invariant on the attractor, but it has become conventional to look only at the variation of this quantity when $r$ is small. In that limit, it is assumed that

\[ C(q, r) \approx r^{(q-1) D_q} \tag{3.11} \]
defining the generalized fractal dimension $D_q$ when it exists. From the above equation, $D_q$ can be estimated in the limiting case as

\[ D_q = \lim_{r\ \mathrm{small}} \frac{\log[C(q, r)]}{(q-1)\log[r]} \tag{3.12} \]

In practice, we need to compute $C(q, r)$ for a range of small $r$ over which we can argue that $\log[C(q, r)]$ is linear in $\log[r]$, and then take the slope over that range.

3.4.1 Box-Counting Dimension ($D_0$)

The box-counting dimension ($D_0$) is estimated from the number of spheres of radius $r$, namely the number of boxes, that we need to cover all the points in the data set. If we evaluate this number $N(r)$ as a function of $r$ as $r$ becomes small, then, with $q = 0$,

\[ D_0 = \lim_{r\ \mathrm{small}} \frac{\log[N(r)]}{(q-1)\log[r]} \tag{3.13} \]

This can also be defined as

\[ D_0 = \lim_{q \to 0} D_q \tag{3.14} \]

3.4.2 Information Dimension ($D_1$)

The information dimension ($D_1$) is a generalization of the capacity that takes into account the relative probability of the cubes used to cover the set. From the generalized dimensions, the information dimension can be defined as

\[ D_1 = \lim_{q \to 1} D_q \tag{3.15} \]

3.4.3 Correlation Dimension ($D_2$)

For $q = 2$, the definition of the fractal dimension, $D_q$, assumes a simple form that lends itself to reliable computation. The resulting dimension, $D_2$, is called the correlation dimension of the attractor [51, 50, 153] and is estimated as the slope of the log-log
curve given by

\[ D_2 = \lim_{r\ \mathrm{small}} \frac{\log[C(2, r)]}{\log[r]} \tag{3.16} \]

The correlation dimension is easy to quantify from experimental data but very hard to quantify from time series data because there may not exist a linear-like slope range.

3.5 Lyapunov Exponents

The increasing interest in the computation of Lyapunov exponents is motivated by their fundamental role in the theory of dynamical systems. Lyapunov exponents allow for the generalization of linear stability analysis from perturbations of steady-state solutions to perturbations of time-dependent solutions, and also provide a meaningful way to characterize the asymptotic behavior of nonlinear dynamics. In particular, in the last two decades Lyapunov exponents have been widely used as a major tool to identify chaotic behaviors.

Lyapunov exponents were first introduced in the 19th century by A. M. Lyapunov, who was greatly influenced by Chebyshev and was a student with Markov. From his dissertation emerged the notion of Lyapunov exponents, as well as a collection of papers on the equilibrium shape of rotating liquids, on probability, and on the stability of low-dimensional dynamical systems. The usual methods of studying stability (e.g., linear stability) were not good enough, because in the long term the small errors due to linearization would accumulate and make the approximation invalid. Lyapunov developed the Lyapunov stability concepts to overcome these difficulties.

The Lyapunov exponents of a system are a set of invariant geometric measures which describe the dynamical content of the system. In particular, they serve as a measure of how easy it is to perform prediction on the system. Lyapunov exponents quantify the rate of divergence or convergence of two nearby initial points of a dynamical system, in a global sense. A positive Lyapunov exponent measures the
average exponential divergence of two nearby trajectories, whereas a negative Lyapunov exponent measures the exponential convergence of two nearby trajectories. A zero Lyapunov exponent indicates the temporal continuous nature of a flow. If a discrete nonlinear system is dissipative, a positive Lyapunov exponent quantifies a measure of chaos. Consequently, a system with positive exponents has positive entropy, in that trajectories that are initially close together move apart over time. The more positive the Lyapunov exponents are, the faster the trajectories move apart. Similarly, for negative exponents, the trajectories move together in time. A system with both positive and negative Lyapunov exponents is said to be chaotic. In other words, Lyapunov exponents quantify the amount of linear stability or instability of an attractor or an asymptotically long orbit of a dynamical system.

Given two initial conditions for a chaotic system, $a$ and $b$, which are close together, the values obtained in successive iterations for $a$ and $b$ will, on average, differ by an exponentially increasing amount. In other words, the two sets of numbers drift apart exponentially. If this growth is written $e^{\lambda n}$ for $n$ iterations, then $e^{\lambda}$ is the factor by which the distance between closely related points becomes stretched or contracted in one iteration, and $\lambda$ is the Lyapunov exponent. At least one Lyapunov exponent must be positive in a chaotic system. To compute $\lambda$, the derivative of the iterated equation is evaluated at each point in the sequence; the Lyapunov exponent is the average value of the log of the magnitude of the derivative. If the value is negative, the iteration is stable. Note that summing the logs corresponds to multiplying the derivatives; if the product of the derivatives has magnitude less than 1, points will attract together as they go through the iteration.
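
The statement that the Lyapunov exponent is the average of the logarithm of the derivative along the orbit can be sketched for a one-dimensional map. The logistic map used below is an assumption chosen purely for illustration (it is not one of the systems discussed in this chapter), as are the function names, the initial value, and the iteration counts.

```python
import math


def logistic(x, r):
    """Logistic map x -> r x (1 - x), an assumed example map."""
    return r * x * (1.0 - x)


def logistic_derivative(x, r):
    """Derivative of the logistic map with respect to x."""
    return r * (1.0 - 2.0 * x)


def lyapunov_exponent(r, x0=0.3, n_transient=100, n_iter=10000):
    """Average of log|f'(x_n)| along an orbit, after discarding transients."""
    x = x0
    for _ in range(n_transient):
        x = logistic(x, r)
    total = 0.0
    for _ in range(n_iter):
        total += math.log(abs(logistic_derivative(x, r)))
        x = logistic(x, r)
    return total / n_iter


lam_chaotic = lyapunov_exponent(4.0)  # expected to be close to ln 2
lam_stable = lyapunov_exponent(2.5)   # stable fixed point, negative exponent
print(f"r = 4.0: lambda = {lam_chaotic:.3f}")
print(f"r = 2.5: lambda = {lam_stable:.3f}")
```

For r = 2.5 the orbit settles on the fixed point x = 0.6, where the derivative equals -0.5, so the estimate approaches ln 0.5; the sign of the result separates the stable case from the chaotic one.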
For an $n$-dimensional system, there are $n$ Lyapunov exponents in the state space of the system, but the maximum exponent is usually the most important. The maximum Lyapunov exponent is the time constant, $\lambda$, in the expression for the distance between two nearby orbits, $e^{\lambda t}$. If $\lambda$ is negative, then the orbits converge in time, and the
dynamical system is insensitive to initial conditions. However, if $\lambda$ is positive, then the distance between nearby orbits grows exponentially in time, and the system exhibits sensitive dependence on initial conditions.

The Lyapunov spectrum can be derived analytically when the equations of the process are known [146, 18]. There are several published algorithms for performing this measurement on experimental data. The oldest and most tested algorithm is that of Wolf [162]; there are also estimates based on the generation of local Jacobian matrices from Eckmann et al. [28] and Ellner et al. [31]. The Wolf algorithm [162] is said to be sensitive to the number of observations as well as to the degree of measurement or system noise in the observations. This discovery motivated a search for new algorithmic designs with improved finite-sample properties. This search for an algorithm to calculate Lyapunov exponents with desirable finite-sample properties has gained momentum in the last few years. Abarbanel et al. [2], Ellner et al. [31], Iasemidis [65], Iasemidis and Sackellares [69], and McCaffrey et al. [95] came up with improved algorithms for calculating the Lyapunov exponents from observed data.

The main algorithmic design in all of the above papers involves embedding the observations in an $m$-dimensional space, then employing the theorems of Mañé [102] and Takens [153] to use the observations in reconstructing the dynamics on the attractor. The Jacobian of the reconstructed dynamics, as demonstrated in Eckmann and Ruelle [29] and Eckmann et al. [28], is then used to calculate the Lyapunov exponents of the unknown dynamics. The method of reconstructing an $n$-dimensional system from observations includes forming vectors of $m$ consecutive observations, which for $m > 2n$ is generically an embedding process.
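
The step of forming vectors of $m$ consecutive observations can be sketched as follows. The function name `delay_embed` and the optional lag parameter `tau` are hypothetical names introduced for this example; with the default `tau = 1` the rows are exactly the vectors of $m$ consecutive observations described above.

```python
import numpy as np


def delay_embed(series, m, tau=1):
    """Stack delay vectors (s[k], s[k + tau], ..., s[k + (m - 1) tau]) as rows."""
    series = np.asarray(series, dtype=float)
    n_vectors = len(series) - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for the requested embedding")
    # Column i holds the series shifted by i * tau samples.
    return np.column_stack(
        [series[i * tau : i * tau + n_vectors] for i in range(m)]
    )


# Ten scalar observations embedded in a 3-dimensional space.
observations = np.arange(10)
vectors = delay_embed(observations, m=3)
print(vectors.shape)  # (8, 3)
print(vectors[0])     # [0. 1. 2.]
```

Each row of `vectors` is one point of the reconstructed trajectory, so nearest-neighbor searches and local Jacobian fits operate directly on this array.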
The Jacobian methods for Lyapunov exponents utilize a function of $m$ variables to model the data, and a Jacobian matrix is constructed at each point in the orbit of the data. When embedding occurs at dimension $m = n$, then the Lyapunov exponents of the reconstructed dynamics
are the Lyapunov exponents of the original dynamics. However, if embedding only occurs when $m > n$, then the Jacobian method yields $m$ Lyapunov exponents, only $n$ of which are the Lyapunov exponents of the original system. The problem is that, as it is currently used, the Jacobian method is applied to the full $m$-dimensional space of the reconstruction, and not just to the $n$-dimensional manifold that is the image of the embedding map. Our examples show that it is possible to get spurious Lyapunov exponents that are even larger than the largest Lyapunov exponent of the original system.

Parlitz [119] focused on the identification of spurious Lyapunov exponents by presenting a method for experimental data. This method is based on the observation that the true Lyapunov exponents change their signs upon time reversal, whereas the spurious exponents do not. Parlitz's method [119] can be a useful tool for identification purposes, especially for continuous-time systems. For discrete chaotic systems, in general it is not possible to run time backward, since the dynamics are not one-to-one.

Local Lyapunov exponents measure the growth rate of tangent vectors to a given orbit. More precisely, consider a map $f$ in an $m$-dimensional phase space and its derivative matrix $Df(x)$. Let $v$ be a tangent vector at the point $x$. Then the Lyapunov exponent at $x$ is defined as follows:

\[ L(x, v) = \lim_{n \to \infty} \frac{1}{n} \ln \big| Df^{n}(x)\, v \big| \tag{3.17} \]

The Multiplicative Ergodic Theorem of Oseledec states that this limit exists for almost all points $x$ and all tangent vectors $v$. There are at most $m$ distinct values of $L$ as $v$ ranges over the tangent space. In other words, the growth rate of tangent vectors to a given orbit is quantified by calculating the Jacobian matrix.
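
Definition (3.17) can be evaluated numerically when the map and its derivative matrix are known. The sketch below assumes that the Hénon system of Section 3.3.1 is iterated as the discrete map $x_{t+1} = a + b\,y_t - x_t^2$, $y_{t+1} = x_t$. It accumulates the action of the Jacobian matrices along an orbit and re-orthonormalizes the tangent vectors with a QR decomposition at every step. The QR re-orthonormalization is a standard numerical device used here for illustration, not a method described in this text; the function names, starting point, and iteration counts are likewise assumptions.

```python
import numpy as np


def henon(state, a=1.4, b=0.3):
    """One iteration of the Henon map (assumed discrete-time form)."""
    x, y = state
    return np.array([a + b * y - x * x, x])


def henon_jacobian(state, a=1.4, b=0.3):
    """Derivative matrix Df of the Henon map at the given state."""
    x, _ = state
    return np.array([[-2.0 * x, b], [1.0, 0.0]])


def lyapunov_spectrum(n_iter=20000, n_transient=1000):
    """Estimate both Lyapunov exponents by QR re-orthonormalization."""
    state = np.array([0.0, 0.0])
    for _ in range(n_transient):  # discard transient steps
        state = henon(state)
    q = np.eye(2)
    log_sums = np.zeros(2)
    for _ in range(n_iter):
        # Push the orthonormal tangent frame through the Jacobian, then
        # re-orthonormalize; |diag(r)| holds the one-step growth factors.
        q, r = np.linalg.qr(henon_jacobian(state) @ q)
        log_sums += np.log(np.abs(np.diag(r)))
        state = henon(state)
    return log_sums / n_iter


spectrum = lyapunov_spectrum()
print("estimated exponents:", spectrum)
print("sum of exponents:   ", spectrum.sum())
```

Because every Jacobian of this map has determinant $-b$, the two estimated exponents should sum to $\ln b \approx -1.204$, which gives a convenient check on the computation.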
For differential equations, the eigenvalues of the Jacobian matrix averaged over $n$ steps can be calculated to obtain the Lyapunov exponents. In most real-world situations we do not know the differential equations;
therefore, we must calculate the exponents from a time series of experimental data. Extracting exponents from a time series is a complex problem and requires care in its application and in the interpretation of its results.

CHAPTER 4
GLOBAL OPTIMIZATION

Research in optimization started to attract a great deal of attention when significant advances in linear programming were introduced in the late 1940s. A general optimization problem contains two parts: an objective function to be minimized (or maximized), and a set of constraints which limit the domain of the system controls and other variables. If both the objective function and the constraints are linear, then the problem is a linear programming problem. If the objective function or the constraints contain nonlinear components, the problem becomes a nonlinear programming problem. In this thesis, we are interested in global optimization techniques for nonlinear programming problems.

There are many specialized areas of research and practical applications within nonlinear programming. For example, problems in engineering design, logistics, manufacturing, and the chemical and biological sciences often demand modeling via nonconvex formulations that exhibit multiple local optima. The potential gains to be obtained through global optimization of these problems have motivated a stream of recent efforts, including the development of deterministic and stochastic global optimization algorithms. For an overview of various optimization techniques for nonlinear systems and constraints, see [39, 62, 103].

In Section 4.1, we summarize the most often used properties and fundamental results on functions in the area of deterministic global optimization for solving a general optimization problem, which are used in the solution approaches in this dissertation. In Section 4.2, the basic concept of discrete optimization problems (linear and nonlinear) is discussed, as well as techniques used to solve those problems.
In Section 4.3, we discuss some important results in nonlinear and integer programming problems, which are the motivation of this research. In Section 4.4, quadratic programming is discussed. Multi-quadratic programming is introduced in Section 4.5. In Section 4.6, the proposed reformulation-linearization techniques for quadratic integer programming problems and multi-quadratic integer programming problems are discussed in detail, and the optimality proofs for the proposed techniques are also provided. Section 4.7 gives applications of the proposed techniques and concludes this chapter.

4.1 Fundamental Results on Global Optimization

From elementary analysis, we know that a closed set $S \subseteq \mathbb{R}^n$ contains the limits of all convergent sequences of points $x_i \in S$. Moreover, a continuous function $f$ has the property that $f(x_i) \to f(x)$ (as $i \to \infty$) whenever $x_i \to x$ (as $i \to \infty$). We then have the following fundamental result of Weierstrass [61].

Theorem 4.1 (Weierstrass). If $S$ is a nonempty compact set in $\mathbb{R}^n$, and $f(x)$ is a continuous function on $S$, then $f(x)$ has at least one global minimum (maximum) point in $S$.

Next, we discuss some classical results about characterizations of local and global minima, which will be used throughout this chapter. Here $Z(x)$ denotes the set of feasible directions at $x$.

Theorem 4.2. Suppose that the function $f(x)$ is continuously differentiable on an open set containing $S \subseteq \mathbb{R}^n$. If $x^*$ is a local minimum of $f$ (with respect to $S$), then $d^T \nabla f(x^*) \ge 0$ for every $d \in Z(x^*)$.

We call a point $x^* \in S$ that satisfies $d^T \nabla f(x^*) \ge 0$ for each $d \in Z(x^*)$ a critical (stationary) point. In the case of convex problems, critical points are always global minima. When $x^*$ is an interior point of $S$, every direction is feasible. In this case, taking $d_1 = \nabla f(x^*)$ and $d_2 = -\nabla f(x^*)$ shows that the condition is equivalent to $\nabla f(x^*) = 0$. A critical point may not be a local minimum. However, from this property, we can claim a necessary condition for local optimality as in the following theorem [61].

In the case of convex problems, we can show that critical points are always global minima by using the properties of convexity [61].

Theorem 4.3. In the problem of minimizing a convex function on a convex set, every local minimum of a convex function $f : S \to \mathbb{R}$, $S \subseteq \mathbb{R}^n$ convex, is also a global minimum.

Proof. We prove this theorem by contradiction. Assume that $x^*$ is a local minimum point and that there exists a point $x \in S$ such that $f(x) < f(x^*)$. From the convexity property, $f(x^* + \lambda(x - x^*)) \le \lambda f(x) + (1 - \lambda) f(x^*) < f(x^*)$ for $0 < \lambda < 1$. This contradicts our initial assumption that $x^*$ is a local minimum, because there must exist $\epsilon > 0$ such that $f(x^* + \lambda(x - x^*)) \ge f(x^*)$ for $0 < \lambda < \epsilon$.

Theorem 4.4. In the problem of minimizing a concave function on a compact convex set, a global minimum of the concave function $f : S \to \mathbb{R}$, $S \subseteq \mathbb{R}^n$ compact convex, is attained at an extreme point of $S$.

Proof. We can represent any point $x \in S$ as a convex combination $x = \sum_{i=1}^{N} \lambda_i v_i$, where $\sum_{i=1}^{N} \lambda_i = 1$, $\lambda_i \ge 0$ ($i = 1, \ldots, N$), of extreme points $v_i$ of $S$. From the concavity of the function $f$, we have

\[ f(x) \ge \sum_{i=1}^{N} \lambda_i f(v_i) \ge \sum_{i=1}^{N} \lambda_i \min\{ f(v_i) : i = 1, \ldots, N \} = \min\{ f(v_i) : i = 1, \ldots, N \} \quad \Big(\text{since } \sum_{i=1}^{N} \lambda_i = 1\Big). \]

We note that a global minimum can occur at points which are not extreme points in the case that $f$ is not strictly concave. In the case that $f$ is strictly concave, global and local minima can only occur at extreme points. If $f$ is a concave function over a polytope $S$, then to find the global minimum we have to consider only the finite number of vertices of $S$.
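
The consequence of Theorem 4.4 for polytopes can be illustrated with a small sketch: a concave function is minimized over the unit box by evaluating it only at the four vertices. The objective (a negative squared distance to an arbitrarily chosen center), the box, and all names below are assumptions made for this example.

```python
import itertools

import numpy as np

CENTER = np.array([0.3, 0.6])  # assumed parameter of the example objective


def f(x):
    """Concave objective: negative squared distance to CENTER."""
    return -float(np.sum((np.asarray(x, dtype=float) - CENTER) ** 2))


# Vertices of the unit box [0, 1]^2, a simple polytope chosen for illustration.
vertices = [np.array(v) for v in itertools.product([0.0, 1.0], repeat=2)]
best_vertex = min(vertices, key=f)
best_value = f(best_vertex)

# Sanity check: no randomly sampled feasible point beats the best vertex.
rng = np.random.default_rng(0)
samples = rng.random((10000, 2))
sample_min = min(f(s) for s in samples)

print("best vertex:", best_vertex, "value:", best_value)
print("smallest sampled value:", sample_min)
```

Vertex enumeration is only practical for very small problems, since the number of vertices of a polytope can grow exponentially with its dimension.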

4.1.1 Karush-Kuhn-Tucker Conditions

In this section, we consider first-order necessary conditions for optimality in terms of a system of equations and inequalities. These conditions are known as the Karush-Kuhn-Tucker (KKT) conditions. Consider the nonlinear programming problem with inequality constraints,

\[ \min f(x) \quad \text{s.t.} \quad g(x) \le 0, \tag{4.1} \]

where $f : \mathbb{R}^n \to \mathbb{R}$ and $g : \mathbb{R}^n \to \mathbb{R}^p$. The KKT conditions of the above problem can be written as

\[ \nabla f(x) + \nabla g(x)^T u = 0, \qquad g(x) \le 0, \qquad u \ge 0, \qquad g(x)^T u = 0, \tag{4.2} \]

where $u \in \mathbb{R}^p$. Define $S = \{ x : g_i(x) \le 0,\ i = 1, \ldots, p \} \subseteq \mathbb{R}^n$. The constraints of the optimization problem $\min_{x \in S} f(x)$ are considered to be regular at $x \in S$ when $L(x) = \operatorname{cl} Z(x)$. In this case, every condition that ensures regularity is called a constraint qualification. The full statement of the theorem involving the KKT optimality conditions requires a certain constraint qualification to be satisfied. Two well-known constraint qualifications for Problem (4.1) are defined as follows.

Definition 4.1. The linear independence constraint qualification is said to be satisfied at a solution $x^*$ of Problem (4.1), for any $g$, if the vectors $\nabla g_i(x^*)$ for $i \in M$ ($M = \{ i : g_i(x^*) = 0 \}$) are linearly independent.

Definition 4.2. The Arrow-Hurwicz-Uzawa constraint qualification is said to be satisfied at a solution $x^*$ of Problem (4.1), for any $g$, if $J_W(x^*)\, z > 0$ and $J_V(x^*)\, z > 0$
have a solution $z \in \mathbb{R}^n$, where $W = \{ i : g_i(x^*) = 0,\ g_i \text{ is concave at } x^* \}$, $V = \{ i : g_i(x^*) = 0,\ g_i \text{ is not concave at } x^* \}$, and $J_W(x^*)$, $J_V(x^*)$ are matrices denoting the rows of the Jacobian of $g$ at $x^*$ with respect to $W$ and $V$, respectively.

The KKT Theorem for the first-order necessary conditions of optimality can now be stated as follows [61].

Theorem 4.5. Let $X$ be an open set in $\mathbb{R}^n$ and let $F = \{ x \in X : g(x) \le 0 \}$ be nonempty. Let $x^*$ be a local minimizer of $\min_{x \in F} f(x)$ and suppose either one of the above constraint qualifications is satisfied at $x^*$. Then there exists a $u \in \mathbb{R}^p$ such that $(x^*, u)$ solves Problem (4.2).

Correspondingly, three of the most well-known constraint qualifications are given in the next theorem [61].

Theorem 4.6. Each of the following conditions is a constraint qualification:
(i) $g_i(x) = a_i^T x - b_i$, $a_i \in \mathbb{R}^n \setminus \{0\}$, $b_i \in \mathbb{R}$ ($i = 1, \ldots, p$) (linear constraints).
(ii) $g_i(x)$ is convex ($i = 1, \ldots, p$) and there exists $x$ such that $g_i(x) < 0$ ($i = 1, \ldots, p$) (Slater condition).
(iii) The vectors $\nabla g_i(x)$, $i \in I(x)$, are linearly independent.

Under certain convexity assumptions, the necessary conditions for optimality are also sufficient conditions for optimality. There are also similar second-order necessary and sufficient conditions for optimality. See [93, 17] for more details and proofs of the conditions for optimality.

4.1.2 KKT Conditions and the Linear Complementarity Problem

Consider the linear programming problem given by $\min \{ cx : Ax \ge b,\ x \ge 0 \}$. We then have the corresponding Lagrangian function $L(x, \lambda, \mu) = cx - \lambda^T (Ax - b) - \mu^T x$, where $\lambda$ and $\mu$ are the $m$- and $n$-dimensional Lagrange multipliers for the inequality and nonnegativity constraints, respectively. The KKT conditions from the
LP are given by

\[
\begin{aligned}
\nabla_x L(x, \lambda, \mu) &= c^T - A^T \lambda - \mu = 0 \\
\nabla_\lambda L(x, \lambda, \mu) &= Ax - b \ge 0 \\
\nabla_\mu L(x, \lambda, \mu) &= x \ge 0 \\
\lambda^T (Ax - b) &= 0 \\
\mu^T x &= 0 \\
\lambda \ge 0, \quad & \mu \ge 0.
\end{aligned}
\]

Since the linear program is convex, the above KKT conditions are necessary and sufficient conditions. These conditions can be rewritten as $v = Ax - b$, $u = -A^T \lambda + c^T$, and note that $u^T x + v^T \lambda = 0$. Define

\[
\omega = \begin{pmatrix} u \\ v \end{pmatrix}, \qquad
\zeta = \begin{pmatrix} x \\ \lambda \end{pmatrix}, \qquad
M = \begin{pmatrix} 0 & -A^T \\ A & 0 \end{pmatrix}, \qquad
\eta = \begin{pmatrix} c^T \\ -b \end{pmatrix}.
\]

Solving a linear program with inequality constraints is equivalent to solving the system

\[ \omega - M \zeta = \eta, \qquad \omega \ge 0, \qquad \zeta \ge 0, \qquad \omega^T \zeta = 0. \tag{4.3} \]

The last two constraints imply that $\omega_i \zeta_i = 0$ for all $i$. Problems of the form (4.3) are known as linear complementarity problems.

4.1.3 Optimality Conditions

Most mathematical programs are solved by applying an algorithm that searches for a point in the feasible region which satisfies a set of optimality conditions. Under certain convexity assumptions, the necessary conditions for optimality are also sufficient conditions for optimality. There are also similar second-order necessary and
sufficient conditions for optimality. For more details and proofs of the conditions for optimality, see [17, 93, 101]. In this section we outline the second-order optimality conditions, where $F$ and $H$ denote the matrices of second derivatives of $f$ and $h$.

Theorem 4.7 (Second-Order Necessary Conditions). Suppose that $x^*$ is a local minimum of $f(x)$ subject to $h(x) = 0$ as well as a regular point of these constraints. Then there exists a vector $\lambda \in \mathbb{R}^m$ such that $\nabla f(x^*) - \lambda^T \nabla h(x^*) = 0$. If we denote the tangent plane $T = \{ y : \nabla h(x^*)^T y = 0 \}$, then the matrix $L(x^*) \triangleq F(x^*) - \lambda^T H(x^*)$ is positive semidefinite on $T$; that is, $y^T L(x^*)\, y \ge 0$ for all $y \in T$.

Proof. See [101].

Theorem 4.8 (Second-Order Sufficiency Conditions). Suppose there is a point $x^*$ satisfying $h(x^*) = 0$ and a vector $\lambda \in \mathbb{R}^m$ such that $\nabla f(x^*) - \lambda^T \nabla h(x^*) = 0$. Assume that the matrix $L(x^*) = F(x^*) - \lambda^T H(x^*)$ is positive definite on $T = \{ y : \nabla h(x^*)^T y = 0 \}$; that is, for all $y \in T$, $y \ne 0$, we have $y^T L(x^*)\, y > 0$. Then $x^*$ is a strict (unique) local minimum of $f$ subject to $h(x) = 0$.

Proof. See [101].

4.2 Discrete Optimization

Next, we describe the problems of interest and their special cases (i.e., classes of problems that are easy to solve). Then, general techniques used to solve linear and nonlinear discrete optimization problems are discussed. The most general form of the problem of interest can be stated as

\[
\begin{aligned}
\text{Minimize} \quad & f(x, y) \\
\text{Subject to} \quad & g(x, y) = 0 \\
& h(x, y) \le 0 \\
& x \in D \subseteq \mathbb{Z}^p, \quad y \in \mathbb{R}^q,
\end{aligned}
\tag{4.4}
\]
PAGE 74

where $f : R^{p+q} \to R$, $g : R^{p+q} \to R^m$, and $h : R^{p+q} \to R^l$ are assumed to be functions with continuous second-order derivatives, and $D$ is a bounded subset of $Z^p$. The decision variables are represented by $x$ and $y$, which are discrete and continuous variables, respectively.

4.2.1 Linear Discrete Optimization Problems

In general, linear discrete optimization problems can be described by "integer programming". There are many applications involving linear discrete optimization problems (e.g., the set partitioning problem, the generalized linear assignment problem, the integer network flow problem, and the shortest path problem). Linear discrete optimization problems can be expressed as Problem (4.4) with $f$, $g$, and $h$ being linear. Although this problem is linear, it is still very difficult to solve. However, for certain classes of linear discrete optimization problems, a relaxation of Problem (4.4) on the integrality constraint ($x \in D$) may have an optimal solution vector $x^* \in D$; that is, the optimal solution to the relaxed Problem (4.4) may be the optimal solution to the original Problem (4.4). The term "relaxation" can be expressed by the following definition [61].

Definition 4.3. Given an optimization problem $P : \min\{f(x) : x \in X\}$ and an optimization problem $\bar{P} : \min\{\bar{f}(x) : x \in Y\}$, $\bar{P}$ is said to be a relaxation of $P$ if and only if $X \subseteq Y$ and $\bar{f}(x) \le f(x)$ for all $x \in X$.

Problems with totally unimodular constraint matrices form one of the most important classes of linear discrete optimization problems in which the optimal solution to the relaxation of Problem (4.4) yields the optimal solution to the original problem. In this case, the problem can be formulated as
$$\text{Maximize } c^T x + d \quad \text{subject to} \quad Ax = b, \quad x \ge 0, \qquad (4.5)$$
$$x \in D \subseteq Z^n,$$

PAGE 75

where the matrix $A$ is totally unimodular.

Definition 4.4. A square integer matrix $B$ is called unimodular (UM) if its determinant $\det(B) = \pm 1$. An integer matrix $A$ is called totally unimodular (TUM) if every square, nonsingular submatrix of $A$ is UM.

If $B$ is formed from $m$ linearly independent columns of $A$, it determines the basic solution
$$x = B^{-1} b = \frac{B^{\mathrm{adj}}\, b}{\det(B)},$$
where $B^{\mathrm{adj}}$ is the adjoint of $B$, and so if $B$ is UM and $b$ is integer (which we always assume), $x$ is integer. If we define the polytope $R_1(A) = \{x : Ax = b, \ x \ge 0\}$ to be the usual feasible set for the standard form linear programming problem, we have the following proposition [61].

Proposition 4.1. If $A$ is TUM, then all the vertices of $R_1(A)$ are integer for any integer vector $b$.

Thus a standard form linear programming problem with a TUM matrix will always lead to an integer optimum. In addition, this result also holds for inequality constraints. To be more rigorous, we have the following proposition [61].

Proposition 4.2. Consider a linear discrete optimization problem $P : \min c^T x + d$, s.t. $Ax = b$, $x \in D \subseteq Z^n$, and its relaxation $\bar{P} : \min c^T x + d$, s.t. $Ax = b$, $x \ge 0$, $x \in R^n$. Let the matrix $A$ be totally unimodular (TUM), $b \in Z^n$, and $Z^n \cap \{x : Ax = b, \ x \ge 0\} \subseteq D$. If $x^*$ is an optimal solution to $\bar{P}$, then $x^*$ is also an optimal solution to $P$.

Proof. If $x^*$ is an optimal solution to $\bar{P}$, then $x^* = B^{-1} b$ and $c_N - c_B B^{-1} N \ge 0$. Consider the adjoint matrix of $B$, $\mathrm{adj}(B)$, which is the transposed matrix of cofactors of $B$. Each entry of $\mathrm{adj}(B)$ is formed from the determinants of square submatrices

PAGE 76

of $B$. Since any square submatrix of $B$ is also a square submatrix of $A$, by the TUM property of $A$, we have $\mathrm{adj}(B) \in Z^{n \times n}$ and $\det(B) = \pm 1$. This implies that $x^* = B^{-1} b = \frac{B^{\mathrm{adj}}\, b}{\det(B)} \in Z^n$. Since $x^*$ is feasible, we have $Ax^* = b$, $x^* \ge 0$. Hence $x^* \in Z^n \cap \{x : Ax = b, \ x \ge 0\} \subseteq D$. Thus, $x^*$ is also an optimal solution to $P$ [61].

In this section, we consider Problem (4.5), without loss of generality, and assume that $D \subseteq \{0,1\}^n$. Next, we discuss the general techniques used to solve this problem.

4.2.1.1 Branch-and-bound techniques

The branch-and-bound methods start analogously to the outer approximation algorithms with a relaxation of the feasible region of a linear zero-one problem given by
$$\text{Maximize } c^T x \quad \text{subject to} \quad w^T x \le b, \qquad (4.6)$$
$$x \in \{0,1\}^n.$$
This relaxation is chosen such that a lower as well as an upper bound for the optimal value of Problem (4.6) can be determined. The feasible region is partitioned into subdomains, and such a partitioning process can be represented by a tree in which each node represents a subproblem. The simplest way to partition the feasible region is to consider the two subproblems obtained by fixing a variable $x_i = 0$ and $x_i = 1$. The subproblems generated by the partition are used to determine bounds on the objective function and to update the best objective value obtained so far. If a subproblem considered in the branch-and-bound tree has a lower bound that exceeds the current best known value for Problem (4.6), then this set is eliminated from further consideration (pruning). Such sets cannot contain feasible points of Problem (4.6) with a smaller objective function value than the best value known so far. Using these strategies, one hopes that the algorithm concentrates the search for a global minimum of Problem (4.6) on a small portion of the feasible region. One expects that a large part of the feasible region,

PAGE 77

which does not contain a global minimum of Problem (4.6), is pruned from further consideration at an early stage by the branch-and-bound algorithm. To be more specific, upper and lower bounds are generated at different levels and nodes of the tree throughout the branch-and-bound process, until the upper and lower bounds differ by an acceptable tolerance. The optimal objective value of a subproblem will be a lower bound on the solution to Problem (4.6) if a subset of the variables is allowed to be continuous. We note that if no feasible solution for the relaxation of a subproblem exists, then no feasible solution exists for the subproblem itself. When a feasible solution exists, it is an upper bound to Problem (4.6). In the case of an infeasible subproblem, the leaves of this subproblem are discarded. Likewise, if any subproblems are shown to have objective values or bounds that are not as good as the best known objective value, they are also discarded. The whole process is repeated until all the possible partitions have been carried out and an optimal solution is obtained, or until the upper and lower bounds of all partitions considered fall within a predetermined tolerance.

4.2.1.2 Cutting-plane method (outer approximation)

One of the central ideas of cutting-plane methods is to add constraints to the problem so that a discrete solution is obtained while solving a continuous problem. The cutting-plane method uses the following basic concept. Determine a superset $S_0$ of $S$ that has a simple structure, for example a polyhedron, and try to minimize the function $f$ with respect to this bigger set.
If the minimization of $f$ with respect to the simpler set $S_0$ is still too complicated, determine a simpler function $\bar{f}$, which underestimates $f$ on the set $S$, and solve the problem $\min \bar{f}(x) : x \in S$. This problem yields a lower bound for the optimal value of the original problem. Such problems are usually called relaxations of the original problem.
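The interplay between relaxation bounds and pruning described in Section 4.2.1.1 can be sketched for the zero-one problem (4.6). The following is only an illustrative sketch, not the implementation used in this work: the function names `fractional_bound` and `branch_and_bound` are hypothetical, the data are assumed to be strictly positive, and the continuous relaxation is solved by the greedy fractional rule, which is known to be optimal for a single knapsack constraint. Because (4.6) is stated as a maximization, the relaxation supplies an upper bound at each node and the incumbent supplies a lower bound.

```python
def fractional_bound(c, w, b, fixed):
    """Bound from the continuous relaxation with some variables fixed to 0/1.

    Assumes all c[i] > 0 and w[i] > 0. Returns None if the fixed
    assignment already violates the capacity b.
    """
    value = sum(c[i] for i, v in fixed.items() if v == 1)
    cap = b - sum(w[i] for i, v in fixed.items() if v == 1)
    if cap < 0:
        return None  # infeasible node
    free = [i for i in range(len(c)) if i not in fixed]
    free.sort(key=lambda i: c[i] / w[i], reverse=True)
    for i in free:
        if w[i] <= cap:
            cap -= w[i]
            value += c[i]
        else:
            value += c[i] * cap / w[i]  # fractional item closes the relaxation
            break
    return value


def branch_and_bound(c, w, b):
    """Depth-first branch-and-bound for max c^T x s.t. w^T x <= b, x binary."""
    n = len(c)
    best_value, best_x = 0, [0] * n  # x = 0 is feasible when b >= 0
    stack = [dict()]
    while stack:
        fixed = stack.pop()
        bound = fractional_bound(c, w, b, fixed)
        if bound is None or bound <= best_value:
            continue  # prune: infeasible, or cannot beat the incumbent
        if len(fixed) == n:
            best_value = bound
            best_x = [fixed[i] for i in range(n)]
            continue
        i = len(fixed)  # variables are fixed in index order
        stack.append({**fixed, i: 0})
        stack.append({**fixed, i: 1})
    return best_value, best_x
```

For instance, `branch_and_bound([60, 100, 120], [10, 20, 30], 50)` explores both branches of $x_1$ and discards the nodes whose relaxation bound does not exceed the incumbent.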

PAGE 78

For example, in the case of the zero-one problem, the continuous (relaxed) problem $\min\{f(x) : Ax = b, \ 0 \le x \le e\}$ is solved by using the simplex method. If a solution $x^0 \in \{0,1\}^n$ is obtained, then we have obtained an integer solution by solving a relaxed problem. Otherwise an additional linear constraint (a hyperplane) is introduced that cuts away $x^0$, so that it cannot be an optimal solution of the new problem, without eliminating any feasible point in $\{0,1\}^n$. Again, the relaxed problem with the additional constraint is solved by the simplex method. This process is repeated iteratively until an $x^0 \in \{0,1\}^n$ is found or there are no more feasible solutions; that is, the original problem is infeasible.

Next, we give an example of a cut. Suppose the simplex method is applied to the relaxed problem $\min\{f(x) : Ax = b, \ 0 \le x \le e\}$ and the optimal tableau
$$x_i = g_{i0} + \sum_{j \in N} g_{ij}(-x_j), \quad i \in B,$$
is obtained, where $B$ and $N$ are the index sets of the basic and nonbasic variables in the optimal tableau, respectively. For some $k \in B$, assume that $x_k$ is fractional. If we define $N_1 = \{j \in N : f_{kj} < f_{k0}\}$, where $f_{kj}$ represents the fractional part of $g_{kj}$, then one possible cut can be defined as
$$\sum_{j \in N_1} \min\left\{\frac{f_{kj}}{f_{k0}}, \ \frac{1 - f_{kj}}{1 - f_{k0}}\right\} x_j \ge 1.$$
The first cutting-plane method was developed by Gomory [48]. However, pure cutting-plane algorithms are impractical because of their slow convergence to integer solutions. Typically, branch-and-bound algorithms are combined with the cutting-plane approach, in which a small number of efficient cuts are added to the problems at the nodes of the branch-and-bound tree. Such methods are known as branch-and-cut methods, and a recent survey can be found in [98].

4.2.1.3 Convex envelope method

In branch-and-bound and cutting-plane methods, we need a simpler function $\bar{f}$, which underestimates the examined function $f$ with respect to a given set $S$.
Since convex functions lead to easily solvable problems, the so-called convex envelope of an arbitrary function $f$ is a concept frequently used for determining the desired function $\bar{f}$.
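As a small illustration of a convex underestimating function, consider the one-dimensional case: for a concave function on an interval $[a, b]$, the chord through the two endpoints is the best convex underestimator. The sketch below is only an assumption-laden example; the helper name `chord_envelope` and the test function $f(x) = -x^2$ are hypothetical choices, not taken from this work.

```python
def chord_envelope(f, a, b):
    """Affine function through (a, f(a)) and (b, f(b)).

    For a concave f on [a, b] this chord lies below f everywhere on
    the interval and agrees with f at both endpoints.
    """
    fa, fb = f(a), f(b)

    def F(x):
        t = (x - a) / (b - a)  # position of x in [a, b], scaled to [0, 1]
        return (1 - t) * fa + t * fb

    return F


def f(x):
    return -x * x  # a concave test function (illustrative choice)


F = chord_envelope(f, -1.0, 2.0)
grid = [-1.0 + 3.0 * i / 30 for i in range(31)]
gaps = [f(x) - F(x) for x in grid]  # nonnegative on the whole interval
```

Here the chord is $F(x) = -x - 2$, and the gap $f(x) - F(x) = -(x - 2)(x + 1)$ is nonnegative on $[-1, 2]$ and vanishes only at the endpoints.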

PAGE 79

Definition 4.5. Let $f : S \to R$ be a lower-semicontinuous function defined on a nonempty convex set $S \subseteq R^n$. The convex envelope of $f(x)$ on the set $S$ is a function $F(x)$ with the following properties: (i) $F(x)$ is convex on the set $S$; (ii) $F(x) \le f(x)$ for all $x \in S$; (iii) if $h(x) : S \to R$ is a convex function such that $h(x) \le f(x)$ for all $x \in S$, then $h(x) \le F(x)$ for all $x \in S$ [61].

Consequently, the convex envelope $F(x)$ of a function $f$ on a set $S$ is the uniformly best convex underestimating function for $f$ on the given set. In general, however, the construction of a convex envelope $F(x)$ is a problem that might be harder to solve than the considered optimization problem itself. For some instances, the explicit form of the convex envelope is known. For example, if $f$ is a concave function and $S$ is a polytope with given vertex set $V(S) = \{v_1, \dots, v_k\}$, the convex envelope $F(x)$ of $f$ with respect to $S$ is given by
$$F(x) = \min\left\{\sum_{i=1}^{k} \alpha_i f(v_i) \ : \ x = \sum_{i=1}^{k} \alpha_i v_i, \ \alpha \in R^k_+, \ \sum_{i=1}^{k} \alpha_i = 1\right\} \ [61].$$
This implies that the convex envelope of a concave function $f$ with respect to an $n$-simplex $S = [v_0, \dots, v_n]$ is the uniquely determined affine function that coincides with $f$ at the $n + 1$ vertices of $S$ [61].

In some cases an overestimating function for a given function $f$ with respect to a set $S$ is additionally needed. In this situation the analogous concept of the so-called concave envelope $F(x)$ can be applied.

Definition 4.6. Let $f : S \to R$ be an upper-semicontinuous function defined on a nonempty convex set $S \subseteq R^n$. The concave envelope of $f(x)$ on the set $S$ is a function $F(x)$ such that $-F(x)$ is the convex envelope of $-f(x)$ on the set $S$.

Thus, the concave envelope $F(x)$ of a function $f$ is the best concave overestimating function of $f$ on the set $S$. Obviously, the concave envelope of a convex function $f$

PAGE 80

with respect to an $n$-simplex $S$ is also the uniquely determined affine function that coincides with $f$ at the vertices of $S$.

4.2.2 Nonlinear Discrete Optimization Problems

Although many discrete optimization problems are linear, there is also a vast number of practical problems that are nonlinear. For example, in a linear assignment problem it may turn out that one needs to factor nonlinear costs into the objective function, in which case the problem becomes a nonlinear assignment problem. A well-known nonlinear version of the linear assignment problem is the quadratic assignment problem, which will be discussed later in this chapter.

While problems with only binary variables are the simplest form of discrete problems, they are very important because any discrete problem with bounded variables can always be transformed into a binary problem. More specifically, problems with constraints $x_i \in I$, where $I$ is a finite set of integers, can always be transformed to an equivalent problem with binary variables. Without loss of generality, consider the example of an integer variable $x$ bounded by $0$ and $u$. The variable $x$ can then be substituted by
$$x = \sum_{i=0}^{k} 2^i v_i, \quad \text{where } k = \left\lfloor \frac{\ln u}{\ln 2} \right\rfloor$$
and $v \in B^{k+1}$ is a vector of new binary variables. The integer variable $x$ is thus replaced with $k + 1$ binary variables. This is perhaps the best way to introduce the minimal possible number of binary variables in place of integer variables with an upper and lower bound [87].

Although we can convert all the bounded integer variables into binary variables, it is very disadvantageous to introduce such a large number of additional variables. However, the basic data structure has not increased.
For example, the size of the Hessian of the objective function may increase by a factor of ten (hence the number of elements increases by a factor of one hundred), but the number of nonzero elements is likely to remain constant.

An obvious approach to solving nonlinear discrete problems is to generalize the two methods discussed for solving the linear discrete problem [56]. Note that both of

PAGE 81

these approaches capitalize on the existence of fast algorithms to solve the continuous problem. There is an inherent difficulty in generalizing the branch-and-bound (and hence the branch-and-cut) method because it critically depends on the uniqueness of the solution. In the convex case, this would not be an issue; however, developing algorithms for nonconvex problems is extremely difficult. Obtaining a discrete solution when solving a continuous problem with a nonlinear objective function is very difficult. Therefore, generalizing the techniques for linear problems to nonlinear problems is of limited use. Consider the following nonlinear discrete problem:
$$\min f(x) \quad \text{s.t.} \quad 0 \le x \le e, \quad x \in \{0,1\}^n.$$
If $f(x)$ is linear, then the cutting-plane algorithm is unnecessary because all possible integer solutions are bounded. In the case that the integrality constraints are dropped, we need to ensure that an integral solution is obtained. Indeed, the appropriate vertex of the feasible region may be found by examining the coefficients of the objective function. When $f(x)$ is nonlinear, the problem is nontrivial because solving the continuous problem no longer guarantees an integer solution. The very rationale behind a pure cutting-plane method is therefore no longer valid; however, the idea of using cutting planes within other algorithms is still valid.

Next, we discuss some of the methods that have been proposed to handle more general nonlinear discrete problems. However, it is hard or impossible to generalize such algorithms to deal with certain types of nonlinear discrete problems.

4.2.2.1 Decomposition methods

Decomposition methods can be applied to solve problems with a mixture of discrete and continuous variables. For instance, the generalized Benders decomposition method decomposes a mixed-integer nonlinear programming problem into two

PAGE 82

problems that are solved iteratively: a pure integer linear master problem and a nonlinear continuous subproblem [45]. The nonlinear subproblem is obtained by fixing the integer values, and the continuous variables are optimized to give an upper bound to the original problem. On the other hand, the master problem optimizes for the new integer variables by imposing new constraints, such as the Lagrangian dual formulation of the nonlinear problem. The master problem yields additional combinations of the integer variables for the subsequent nonlinear subproblems, as well as estimated lower bounds to the original problem. Under convexity assumptions, the master problems generate a sequence of lower bounds that is monotonically increasing. The algorithm terminates when the difference between the upper bound and lower bound is smaller than a prespecified tolerance.

The outer approximation method is another example of the decomposition approach, which has been implemented as the DICOPT solver in GAMS [27]. Similar to the generalized Benders decomposition method, this method involves solving a master problem and a continuous nonlinear subproblem alternately. The main difference is how the master problem is set up. The method generates the master problems by linearizations of the nonlinear constraints (using Taylor series) at those points that are the optimal solutions of the nonlinear subproblems. Then, the master problems become mixed-integer programming problems. In the convex case, global optimality or finite termination is guaranteed. In the nonconvex case, a decomposition method is not guaranteed to obtain a reasonable solution. In addition, the number of constraints in the master problems increases at every iteration; therefore, the cost of solving the master problems may become very expensive.
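The Taylor-series linearization used to build the outer approximation master problem can be illustrated on a single convex constraint. The sketch below is a hypothetical example rather than the DICOPT procedure itself: the constraint function `g`, its gradient `grad_g`, and the helper `linearize` are assumptions chosen for illustration. For a convex $g$, the first-order expansion at a point $x^k$ never exceeds $g$, so the linearized constraint $g(x^k) + \nabla g(x^k)^T (x - x^k) \le 0$ contains the original feasible region.

```python
def g(x):
    """Convex constraint function g(x) <= 0 (a disc of radius 2)."""
    return x[0] ** 2 + x[1] ** 2 - 4.0


def grad_g(x):
    """Gradient of g."""
    return (2.0 * x[0], 2.0 * x[1])


def linearize(g, grad_g, xk):
    """First-order Taylor expansion of g around the point xk."""
    gk, dk = g(xk), grad_g(xk)

    def g_lin(x):
        return gk + sum(d * (xi - xki) for d, xi, xki in zip(dk, x, xk))

    return g_lin


xk = (1.0, 1.0)  # e.g. the solution of a continuous subproblem
g_lin = linearize(g, grad_g, xk)
points = [(i * 0.5, j * 0.5) for i in range(-6, 7) for j in range(-6, 7)]
# For convex g the linearization underestimates g at every point.
underestimates = all(g_lin(p) <= g(p) + 1e-12 for p in points)
```

In this example $g(x) - g_{\mathrm{lin}}(x) = (x_1 - 1)^2 + (x_2 - 1)^2$, which makes the underestimation explicit. Each new subproblem solution would contribute one more such linear constraint, which is why the master problem grows at every iteration.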
4.2.2.2 Branch-and-reduce methods

The branch-and-reduce method is in the class of branch-and-bound algorithms. To produce a lower bound for the original problem, the method needs to construct a relaxation of the original problem that can be solved to optimality. Generally, the

PAGE 83

relaxation is constructed by enlarging the feasible region or using an underestimation of the objective function. The method also involves range contraction techniques (i.e., interval analysis and duality theory that systematically reduce the feasible region to be considered), and incorporates branching schemes that guarantee finite termination with the global optimal solution for certain types of problems. At each iteration, the search domain is partitioned, and upper and lower bounds are computed for each partition. Similar to branch-and-bound algorithms, this method removes partitions that produce infeasible regions or regions with objective values worse than the current solution. The partitioning process continues until the difference between the upper and lower bounds over all partitions is less than a prespecified tolerance. The difficulty of this method is that it may go through an unpredictably large number of iterations even though it has good branching schemes. Since the method needs to solve a correspondingly large number of nonlinear relaxation problems, it carries a heavy computational burden. In addition, the construction of the relaxation problem may involve a convex underestimation of the objective function, which generates inefficient bounds.

4.3 Nonlinear and Integer Programming Problems

Integer programming problems have a wide range of applications. Although integer programs and their solution approaches have been widely studied, they are still far from complete in comparison with the open problems. In this section we aim to clarify the logical connections between integer programming problems and other fields of mathematics. A nonlinear integer programming problem and its connections with special nonconvex problems and complementarity problems are discussed. We mainly consider a quadratic programming problem as a particular case, which is the problem of our interest.
These connections are useful to reduce the resolution of a quadratic program to a linear integer program. In addition, they motivated us to develop a new reformulation-linearization technique to transform a multi-quadratic program to a linear integer program, which makes the problem a lot easier to solve.

PAGE 84

4.3.1 Equivalence Between Discrete and Continuous Programs

Before we show the equivalence between discrete and continuous programs, it is important to discuss an equivalence property between two extremum problems [46]. Therefore, we refer to the following theorem (see [46] for a proof).

Theorem 4.9. Let $Z$ and $X$ be compact sets in $R^n$, let $R$ be a closed set in $R^n$, and let the following hypotheses hold.
H1) $f : R^n \to R$ is a bounded function on $X$, and there exist an open set $A \supset Z$ and real numbers $\alpha, L > 0$ such that, for any $x, y \in A$, $f$ satisfies the following Hölder condition: $|f(x) - f(y)| \le L \|x - y\|^{\alpha}$.
H2) It is possible to find $\varphi : R^n \to R$ such that (i) $\varphi$ is continuous on $X$; (ii) $\varphi(x) = 0$ for $x \in Z$ and $\varphi(x) > 0$ for $x \in X - Z$; (iii) for every $z \in Z$, there exist a neighborhood $S(z)$ and a real $\varepsilon > 0$ such that, for any $x \in S(z) \cap (X - Z)$, $\varphi(x) \ge \varepsilon \|x - z\|^{\alpha}$.
Then a real $\mu_0$ exists such that, for any real $\mu \ge \mu_0$,
$$\min f(x), \ x \in Z \cap R$$
is equivalent to
$$\min [f(x) + \mu\varphi(x)], \ x \in X \cap R.$$

Now we can show an equivalence between discrete and continuous programs from the following theorem [46].

Theorem 4.10. Let $e^T = (1, 1, \dots, 1)$, $Z = B^n$, $X = \{x \in R^n : 0 \le x \le e\}$, and $R = \{x \in R^n : g(x) \ge 0\}$. Consider the problem
$$\min f(x) \quad \text{s.t.} \quad g(x) \ge 0, \qquad (4.7)$$
$$x \in B^n,$$

PAGE 85

and the problem
$$\min [f(x) + \mu x^T(e - x)] \quad \text{s.t.} \quad g(x) \ge 0, \qquad (4.8)$$
$$0 \le x \le e.$$
Suppose that $f$ satisfies assumption H1 from Theorem 4.9 with $\alpha = 1$; that is, it is bounded on $X$ and Lipschitz continuous on an open set $A \supset Z$. Then there exists some $\mu_0 \in R$ such that, for all $\mu > \mu_0$, Problems (4.7) and (4.8) are equivalent.

4.3.2 Integer Programs and Complementarity Problems

The connections between integer programs and complementarity problems can be exhibited by applying KKT conditions. The results can be generalized to the quadratic programming case [61].

Theorem 4.11. Let us first assume
4.11a) $f : R^n \to R$ and $g : R^n \to R^m$ are continuously differentiable functions.
4.11b) $g(x)$ satisfies a constraint qualification condition at $x^0$ to ensure that the KKT conditions are valid.
Then the nonlinear programming problem
$$\min f(x) \quad \text{s.t.} \quad g(x) \ge 0, \qquad (4.9)$$
$$x \ge 0,$$

PAGE 86

has an optimal solution $x^0$ if and only if there exist $u^0 \in R^n$ and $y^0, v^0 \in R^m$ such that $(x^0, y^0, u^0, v^0)$ is an optimal solution to the following problem:
$$\min f(x)$$
$$\text{s.t.} \quad f'(x) - y^T g'(x) - u = 0,$$
$$g(x) - v = 0, \qquad (4.10)$$
$$y^T v = 0,$$
$$x^T u = 0,$$
$$x, y, u, v \ge 0.$$

Proof. Necessity. If $x^0$ is an optimal solution to Problem (4.9), from the KKT conditions we obtain $(y^0, u^0)$ such that
$$f'(x^0) - (y^0)^T g'(x^0) - u^0 = 0, \quad g(x^0) \ge 0, \quad (x^0)^T u^0 = 0, \quad x^0, y^0, u^0 \ge 0.$$
Let $v^0 = g(x^0)$; then $(x^0, y^0, u^0, v^0)$ is an optimal solution to Problem (4.10). Sufficiency. The proof is trivial.

We now generalize the results of Theorem 4.11 to the quadratic programming case. Consider the following problem
$$\min \tfrac{1}{2} x^T Q x + c^T x \quad \text{s.t.} \quad Ax \le b, \qquad (4.11)$$
$$x \in B^n,$$

PAGE 87

where $Q$ is a symmetric matrix. Using Theorem 4.10, Problem (4.11) is equivalent to
$$\min \left[\tfrac{1}{2} x^T (Q - 2\mu I) x + (c^T + \mu e^T) x\right] \quad \text{s.t.} \quad Ax \le b, \qquad (4.12)$$
$$x \le e, \quad x \ge 0.$$
Applying Theorem 4.11 to Problem (4.12), we then obtain
$$\min \left[\tfrac{1}{2} x^T (Q - 2\mu I) x + (c^T + \mu e^T) x\right] \qquad (4.13)$$
$$\text{s.t.} \quad c + Qx + \mu(e - 2x) - A^T y + t = u, \qquad (4.14)$$
$$b - Ax = v, \qquad (4.15)$$
$$e - x = w, \qquad (4.16)$$
$$x^T u = 0, \qquad (4.17)$$
$$y^T v = 0, \qquad (4.18)$$
$$t^T w = 0, \qquad (4.19)$$
$$x, y, t, u, v, w \ge 0. \qquad (4.20)$$
Rearranging the terms in Eq. (4.14), we have
$$Qx - 2\mu x = -(c + \mu e) + A^T y - t + u.$$
Consequently, (4.13) becomes $\min \left[\tfrac{1}{2}(c^T + \mu e^T) x + \tfrac{1}{2}(b^T y - e^T t)\right]$. From Eqs. (4.17), (4.18), and (4.19), we have
$$x^T u = 0, \quad 0 = y^T v = y^T b - y^T A x, \quad 0 = t^T w = t^T e - t^T x;$$

PAGE 88

therefore, $y^T b = y^T A x$ and $t^T e = t^T x$. Taken all together, Problem (4.11) is equivalent to the following problem:
$$\min \hat{c}^T \hat{x} \quad \text{s.t.} \quad \hat{A}\hat{x} + \hat{u} = \hat{b}, \quad \hat{x}^T \hat{u} = 0, \quad \hat{x}, \hat{u} \ge 0,$$
where
$$\hat{x}^T = (x^T, y^T, t^T), \quad \hat{u}^T = (u^T, v^T, w^T), \quad \hat{A} = \begin{pmatrix} -Q + 2\mu I & A^T & -I \\ A & 0 & 0 \\ I & 0 & 0 \end{pmatrix},$$
$$\hat{c}^T = \tfrac{1}{2}(c^T + \mu e^T, \ b^T, \ -e^T), \quad \hat{b}^T = (c^T + \mu e^T, \ b^T, \ e^T).$$
Note that no restrictive assumptions are made on $Q$; this transformation is applicable to the convex case as well as the nonconvex case.

4.4 Quadratic Programming

In this section we consider a quadratic programming (QP) problem of the following form:
$$\min f(x) = \tfrac{1}{2} x^T Q x + c^T x \quad \text{s.t.} \quad x \in D, \qquad (4.21)$$
where $D$ is a polyhedron in $R^n$ and $c \in R^n$. Without any loss of generality, we can assume that $Q$ is a real symmetric $(n \times n)$-matrix. If this is not the case, then the matrix $Q$

PAGE 89

can be converted to symmetric form by replacing $Q$ by $(Q + Q^T)/2$, which does not change the value of the objective function $f(x)$. Note that if $Q$ is positive semidefinite, then Problem (4.21) is a convex minimization problem. When $Q$ is negative semidefinite, Problem (4.21) is a concave minimization problem. When $Q$ has at least one positive and one negative eigenvalue (i.e., $Q$ is indefinite), Problem (4.21) is an indefinite quadratic programming problem. We know that in the case of a convex minimization problem, every Kuhn-Tucker point is a local minimum, which is also a global minimum. In this case, there are a number of classical optimization methods that can obtain the globally optimal solutions of convex quadratic programming problems. These methods can be found in many places in the literature. In the case of concave minimization over polytopes, it is well known that if the problem has an optimal solution, then an optimal solution is attained at a vertex of $D$. On the other hand, the global minimum is not necessarily attained at a vertex of $D$ for indefinite quadratic programming problems. In this case, from the second-order optimality conditions, the global minimum is attained at the boundary of the feasible domain. In this research, without loss of generality, we are interested in developing solution techniques to solve general (convex, concave, and indefinite) quadratic programming problems.

The main basis for classification of quadratic problems of the form in Problem (4.21) comes from the properties of the quadratic matrix $Q$ [40]. In this framework, quadratic problems can be categorized as follows:
1. Bilinear Problems: The matrix $Q$ is such that there exist two subvectors of distinct variables $y$ and $z$ of $x$ such that the problem is linear when one of these vectors is fixed.
2. Concave Quadratic Problems: When the matrix $Q$ is negative semidefinite (i.e., all its eigenvalues are nonpositive), Problem (4.21) reduces to a concave minimization problem.
3. Indefinite Quadratic Problems: This class of problems is the most intractable of the three and arises when the matrix $Q$ has both positive and negative

PAGE 90

eigenvalues. Moreover, there are not many solution approaches for this class of problems.

4.4.1 Complexity of Quadratic Optimization

In this section we discuss the complexity of quadratic programming problems. The complexity analysis can give an idea of the possibility of developing efficient algorithms for solving the problem. In [139], the QP was shown to be $\mathcal{NP}$-hard in the case of a negative definite matrix $Q$. The QP was also proven to be $\mathcal{NP}$-hard by reduction from the satisfiability problem [155] and from the knapsack feasibility problem [110]. Moreover, it has also been shown that checking local optimality for the QP is itself an $\mathcal{NP}$-hard problem [155]. In addition, checking for strict convexity (checking local optimality as part of the second-order necessary conditions) in the QP was proven to be $\mathcal{NP}$-hard [116]. In fact, finding a local minimum and proving local optimality of such a solution to the QP may take exponential time. This is true even in the case of a small number of concave variables. For instance, even when the matrix $Q$ is of rank one with exactly one negative eigenvalue, the QP is still $\mathcal{NP}$-hard [117]. However, a large number of negative eigenvalues does not necessarily make the problem harder to solve. For example, consider the following problem:
$$\min \tfrac{1}{2} x^T Q x + c^T x \quad \text{s.t.} \quad x \ge 0.$$
If the matrix $Q$ has $(n - 1)$ negative eigenvalues, then there must be at least $(n - 1)$ active constraints at the optimal solution [58]. Correspondingly, it is sufficient to solve $(n - 1)$ different problems, in each case setting $(n - 1)$ of the constraints to equalities, to find the optimal solution. In general, if the matrix $Q$ has $(n - k)$ negative eigenvalues, then we are required to solve $\frac{n!}{k!(n-k)!}$ independent problems. In addition, the total computational time required to solve this problem is proportional to $k^3 c^k \frac{n!}{k!(n-k)!}$.
Thus, if $k$ is a constant independent of $n$, then the computational

PAGE 91

time is bounded by a polynomial in $n$. On the other hand, if $k$ grows with $n$, then the computational time can grow exponentially with $n$ [58].

4.4.2 KKT conditions for Quadratic Programming

In this section, we consider a quadratic program, which is a linearly constrained optimization problem with a quadratic objective function. We then examine the Karush-Kuhn-Tucker conditions for the QP and show that they can be transformed into a set of linear inequalities and complementarity constraints. The general quadratic program can be written as
$$\min f(x) = \tfrac{1}{2} x^T Q x + c^T x \quad \text{s.t.} \quad Ax \le b \ \text{and} \ x \ge 0, \qquad (4.22)$$
where $c$ is an $n$-dimensional column vector containing the coefficients of the linear terms, and $Q$ is an $(n \times n)$ symmetric matrix containing the coefficients of the quadratic terms in the objective function. The $n$-dimensional column vector $x$ contains the decision variables. The constraints are defined by an $(m \times n)$ matrix $A$ and an $m$-dimensional column vector $b$ of right-hand side coefficients. From general duality theory for nonlinear programming, the Lagrangian function for the quadratic program is given by
$$L(x, \mu) = \tfrac{1}{2} x^T Q x + c^T x + \mu(Ax - b), \quad x \ge 0,$$

PAGE 92

where $\mu$ is an $m$-dimensional row vector. The KKT conditions for a local minimum are given as follows:
$$\frac{\partial L}{\partial x_j} \ge 0, \ j = 1, \dots, n \ \Rightarrow \ x^T Q + c^T + \mu A \ge 0 \qquad (4.23)$$
$$\frac{\partial L}{\partial \mu_i} \le 0, \ i = 1, \dots, m \ \Rightarrow \ Ax - b \le 0 \qquad (4.24)$$
$$x_j \frac{\partial L}{\partial x_j} = 0, \ j = 1, \dots, n \ \Rightarrow \ x^T (Qx + c + A^T \mu^T) = 0 \qquad (4.25)$$
$$\mu_i g_i(x) = 0, \ i = 1, \dots, m \ \Rightarrow \ \mu(Ax - b) = 0 \qquad (4.26)$$
$$x_j \ge 0, \ j = 1, \dots, n \ \Rightarrow \ x \ge 0 \qquad (4.27)$$
$$\mu_i \ge 0, \ i = 1, \dots, m \ \Rightarrow \ \mu \ge 0 \qquad (4.28)$$
Note that Eq. (4.23) is the stationarity condition, Eqs. (4.25) and (4.26) are the complementary slackness conditions, and Eqs. (4.24), (4.27), and (4.28) ensure feasibility of the solution. Any point $x$ that satisfies Eqs. (4.23)-(4.28) is called a KKT stationary point of Problem (4.22). We introduce nonnegative surplus variables $y \in R^n$ to Eq. (4.23) and nonnegative slack variables $v \in R^m$ to Eq. (4.24). Then we obtain the equations for a local minimum as follows:
$$Qx + c + A^T \mu^T - y = 0,$$
$$Ax - b + v = 0.$$
To get the KKT conditions in a more manageable form, we move the constants to the right-hand side (RHS). We then obtain
$$Qx + A^T \mu^T - y = -c \qquad (4.29)$$
$$Ax + v = b \qquad (4.30)$$
$$x \ge 0, \ \mu \ge 0, \ y \ge 0, \ v \ge 0 \qquad (4.31)$$
$$y^T x = 0, \ \mu v = 0 \qquad (4.32)$$

PAGE 93

Note that Eqs. (4.29)-(4.30) are linear constraints and Eq. (4.31) gives the nonnegativity constraints, but Eq. (4.32) is the complementary slackness constraint, which is nonlinear. However, the simplex algorithm can be used to solve Eqs. (4.29)-(4.32) by using a restricted basis entry rule. From the KKT conditions, we can set up the linear programming model for the QP as follows:
1. Define the KKT conditions to be the structural constraints in Eqs. (4.29) and (4.30).
2. Multiply any equations whose RHS values are negative by -1.
3. Add an artificial variable to each equation.
4. Let the objective function be the sum of the artificial variables.
The objective of this linear program is to find the solution that minimizes the sum of the artificial variables subject to the complementarity constraints. If the optimal objective value is zero, the KKT conditions are satisfied. It is noteworthy that a restricted basis entry rule in the simplex algorithm needs to be applied; that is, the entering variable will be the one whose reduced cost is most negative such that its complementary variable is not in the basis or would leave the basis on the same iteration [78]. This technique requires computational effort comparable to a linear programming problem with $m + n$ constraints, where $m$ is the number of constraints and $n$ is the number of variables in the QP. It has been shown to work well when the objective function is positive definite, but there are computational difficulties with positive semidefinite forms of the objective function. The simplest practical approach to transform a positive semidefinite $Q$ matrix into a positive definite matrix is to add a small constant to each of the diagonal elements of $Q$. However, there is an intensive discussion of the conditions that yield a global optimum when $f(x)$ is not positive definite [108].
It is w ell kno wn that the KKT conditions are necessary for all quadratic problems (con v ex and noncon v ex) but sucien t only for con v ex problems.
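
The diagonal perturbation mentioned above can be illustrated with a short sketch. It assumes a symmetric matrix stored as a list of lists and uses an attempted Cholesky factorization as the positive definiteness test; the helper names and the value of the shift are hypothetical choices made for illustration, and in floating-point arithmetic the test can be unreliable for matrices that are very close to singular.

```python
import math


def is_positive_definite(Q):
    """Attempt a Cholesky factorization of the symmetric matrix Q.

    The factorization succeeds with strictly positive pivots only when Q is
    positive definite, so a nonpositive pivot is reported as failure.
    """
    n = len(Q)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                pivot = Q[i][i] - s
                if pivot <= 0.0:
                    return False
                L[i][i] = math.sqrt(pivot)
            else:
                L[i][j] = (Q[i][j] - s) / L[j][j]
    return True


def shift_diagonal(Q, eps):
    """Return a copy of Q with eps added to every diagonal element."""
    n = len(Q)
    return [[Q[i][j] + (eps if i == j else 0.0) for j in range(n)]
            for i in range(n)]


# A positive semidefinite but singular matrix (eigenvalues 0 and 2).
Q = [[1.0, 1.0], [1.0, 1.0]]
Q_shifted = shift_diagonal(Q, 0.01)
print(is_positive_definite(Q), is_positive_definite(Q_shifted))
```

Adding eps to the diagonal raises every eigenvalue by eps, so the shifted matrix is positive definite while the objective changes by only $\frac{1}{2}\,\text{eps}\,x^T x$.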

4.4.2.1 Feasible descent directions and optimality conditions

It is possible to formulate the necessary and sufficient conditions for local optimality for Problem (4.22) by using the concept of a feasible descent direction at a given point. For any feasible point $x$, a vector $d \in R^n$ is a feasible direction if there exists an $\epsilon > 0$ such that $x + td$ is feasible for all $t \in [0, \epsilon]$. In addition, if there exists an $\epsilon > 0$ such that $f(x + td) < f(x)$ for all $t \in (0, \epsilon]$, the vector $d$ is said to be a descent direction. If the vector $d$ is a feasible direction as well as a descent direction, then $d$ is called a feasible descent direction. The following theorem is equivalent to the KKT conditions for Problem (4.22) [40].

Theorem 4.12. Let $g = Qx + c$ denote the gradient of the objective function at $x$, and let $A_a$ denote the subset of constraints that are active at $x$. The necessary and sufficient conditions for $x$ to be a local minimizer for Problem (4.22) are given by
(i) For all $d$ such that $A_a d \le 0$, $g^T d \ge 0$.
(ii) For all $d$ such that $A_a d \le 0$ and $g^T d = 0$, $d^T Q d \ge 0$.

Proof. See [155].

It is clear that when we combine condition (i) with the feasibility requirement, we obtain the KKT conditions. Condition (ii) is an additional second-order optimality condition.

4.4.2.2 Active constraints and optimality

When we consider the active set of constraints at the optimal solution, we obtain the following theorem by adapting the original theorem for the general nonlinear programming problem from [58].

Theorem 4.13. If the matrix $Q$ has $s$ negative eigenvalues at some point $x$, then there must be at least $s$ active constraints at $x$.

This theorem gives a direct correlation between the geometric structure of the feasible region and the algebraic structure of the matrix $Q$. Now we can conclude that if there are $n$ negative eigenvalues at $x$, then $x$ is an

extreme point of the feasible region. For any problem where the objective function is bounded over the feasible set and there is at least one optimal solution, the following results hold.
(i) For indefinite quadratic problems, the optimal solution exists at a boundary point of the feasible region.
(ii) For concave quadratic problems, the optimal solution exists at an extreme point of the feasible region.
(iii) If the optimal solution exists at an interior point of a facet of the polytope defined by the feasible set, then the matrix $Q$ must have exactly one negative eigenvalue.

4.4.2.3 Global optimality criteria

The following theorem specifically gives necessary and sufficient conditions of global optimality.

Theorem 4.14. There exist $\bar{N} \in \{1, 2, \dots\}$ and $\bar{c} > 0$ such that for every choice of real numbers $N \ge \bar{N}$ and $c \ge \bar{c}$, $x$ is the optimal solution to Problem (4.22) if and only if $x$ also minimizes $c^T x + \frac{1}{2} x^T Q x + c\,|Ax - b|^{1/N}$ over the nonnegative orthant of $R^n$. In addition, if $\gamma > 0$, $Ax - b = 0$, and $x$ minimizes $c^T x + \frac{1}{2} x^T Q x + \gamma\,|Ax - b|^{1/\gamma}$, then $x$ is the optimal solution to Problem (4.22) [157].

This theorem is hard to implement in practice. When we consider the copositivity of the matrix $Q$, we have the following theorem, which gives a more useful criterion for checking global optimality when the matrix $Q$ is negative definite.

Theorem 4.15. Let $Q$ be a negative definite matrix, and let $x$ be a feasible point of Problem (4.22). Let $I(x)$ be the set of active constraints at $x$, and let $u_i = b_i - (Ax)_i > 0$ be the slacks for the inactive constraints [24]. Define

$$B_i = -a^i (Qx + c)^T - (Qx + c)(a^i)^T, \qquad Q_0 = Q, \qquad Q_i = u_i Q - B_i, \quad i = 1, \dots, m,$$

$$\Gamma = \{ v \in R^n : (Av)_i \le 0 \ \text{if} \ (Ax)_i = b_i \},$$

$$\Gamma_0 = \{ v \in \Gamma : (Av)_i \le 0 \ \ \forall i \in \{1, \dots, m\} \setminus I(x) \},$$

and

$$\Gamma_i = \{ v \in \Gamma : (Av)_i \ge 0 \ \text{and} \ u_j (Av)_i \ge u_i (Av)_j \ \ \forall j \in \{1, \dots, m\} \setminus I(x) \},$$

where $(a^i)^T$ denotes the $i$th row of $A$. Then, $x$ is a global solution to Problem (4.22) if and only if:
(i) $x$ is a KKT point of Problem (4.22) and $Q_i$ is $\Gamma_i$-copositive for all $i \in \{0, \dots, m\} \setminus I(x)$.
(ii) $x$ satisfies $v^T (Qx + c) \ge 0$ for all $v \in \Gamma_0$ and $Q_i$ is $\Gamma_i$-copositive for all $i \in \{0, \dots, m\} \setminus I(x)$.

In practice, this condition involves $m - 1$ problems of checking $\Gamma$-copositivity. In the worst case, this still has the exponential complexity of checking local optimality.

4.4.3 Complexity of KKT points in Quadratic Programming

In the previous section, we showed that the KKT conditions can be transformed into a set of linear inequalities and complementarity constraints. However, checking the existence of a KKT point in an unbounded feasible domain is not trivial. Let us consider the following quadratic problem:

$$\min f(x) = \tfrac{1}{2} x^T Q x + c^T x \quad \text{s.t.} \quad x \ge 0, \tag{4.33}$$

where $Q$ is an $(n \times n)$ symmetric matrix and $c \in R^n$. From the previous section, we can obtain the KKT optimality conditions, which can be transformed into a linear complementarity problem (LCP). Therefore, the complexity of finding KKT points

for the above quadratic problem is reduced to the complexity of solving the following LCP:

$$Qx + c \ge 0, \quad x \ge 0, \quad x^T (Qx + c) = 0. \tag{4.34}$$

From [61], we then have the result that deciding whether KKT points exist is NP-hard. Note that the LCP can be proven to be NP-hard by showing that the LCP is solvable if and only if the associated knapsack problem, which is well known to be NP-complete, is solvable.

4.4.4 Linear and Quadratic Zero-One Problems

Integer programming is used to model a variety of important practical problems in operations research, engineering, and computer science. Consider the following linear zero-one programming problem:

$$\min c^T x \quad \text{s.t.} \quad Ax \le b, \quad x_i \in \{0, 1\} \ (i = 1, \dots, n),$$

where $A$ is a real $(m \times n)$ matrix, $c \in R^n$, and $b \in R^m$. Let $e^T = (1, \dots, 1) \in R^n$ denote the vector whose components are all equal to 1. Then the zero-one integer linear programming problem is equivalent to the following concave minimization problem:

$$\min f(x) = c^T x + \mu\, x^T (e - x) \quad \text{s.t.} \quad Ax \le b, \quad 0 \le x \le e,$$

where $\mu$ is a sufficiently large positive number. We know that the function $f(x)$ is concave because $-x^T x$ is concave. The equivalence of the two problems is based on the facts that a concave function attains its minimum at a vertex and that $x^T (e - x) = 0$, $0 \le x \le e$, implies $x_i = 0$ or $1$ for $i = 1, \dots, n$. We note that a vertex of the feasible domain is not necessarily

a vertex of the unit hypercube $0 \le x \le e$, but the global minimum is attained only when $x^T (e - x) = 0$, provided that $\mu$ is a sufficiently large number. These transformation techniques can be applied to reduce nonlinear zero-one problems to equivalent concave minimization problems. For instance, consider the quadratic zero-one problem of the following form:

$$\min f(x) = c^T x + x^T Q x \quad \text{s.t.} \quad x \in \{0, 1\}^n,$$

where $Q$ is a real symmetric $(n \times n)$ matrix. Given any real number $\mu$, let $\bar{Q} = Q + \mu I$, where $I$ is the $(n \times n)$ unit matrix, and $\bar{c} = c - \mu e$. Because $\bar{f}(x) = f(x)$ for every zero-one vector $x$ (since $x_i^2 = x_i$), the above quadratic zero-one problem is equivalent to the problem:

$$\min \bar{f}(x) = \bar{c}^T x + x^T \bar{Q} x \quad \text{s.t.} \quad x_i \in \{0, 1\} \ (i = 1, \dots, n).$$

In this case, if we choose $\mu$ such that $\bar{Q} = Q + \mu I$ becomes a negative semidefinite matrix (e.g., $\mu = -\lambda$, where $\lambda$ is the largest eigenvalue of $Q$), then the objective function $\bar{f}(x)$ becomes concave and the constraints can be replaced by $0 \le x \le e$. Thus, this problem is equivalent to the minimization of a quadratic concave function over the unit hypercube [61].

In this section, we give examples of some applications of the QP. Next, the nonlinear assignment problem, the maximum clique problem, and the maximum independent set problem are discussed.

4.4.4.1 Nonlinear assignment problems

The quadratic assignment problem (QAP), which is known to be NP-hard, belongs to a class of combinatorial optimization problems that have many practical applications but are computationally difficult to solve [61]. The QAP can be stated as follows:

Given a positive integer $n$ and two $n \times n$ matrices $A = (a_{ij})$ and $B = (b_{ij})$ with nonnegative entries, find a permutation $p = (p(1), \dots, p(n))$ of the set $\{1, 2, \dots, n\}$ that minimizes

$$C(p) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}\, b_{p(i)p(j)}.$$

The QAP can be formulated in several equivalent forms. One of the most common formulations is the following quadratic zero-one programming problem:

\begin{align}
\min \ & \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{n} \sum_{l=1}^{n} a_{ij}\, b_{kl}\, x_{ik}\, x_{jl} \notag \\
\text{s.t.} \ & \sum_{i=1}^{n} x_{ij} = 1 \quad (j = 1, \dots, n), \notag \\
& \sum_{j=1}^{n} x_{ij} = 1 \quad (i = 1, \dots, n), \tag{4.35} \\
& x_{ij} \in \{0, 1\} \quad (i, j = 1, \dots, n). \notag
\end{align}

If we denote the feasible domain of the above problem by $D$, then the problem can be written as

$$\min x^T S x \quad \text{s.t.} \quad x \in D, \tag{4.36}$$

where the $(n^2 \times n^2)$ matrix $S$ has nonnegative entries. The entries of $S$ are the products $a_{ij} b_{kl}$, and it is natural to define a row of $S$ by fixing $i$ and $k$ and a column of $S$ by fixing $j$ and $l$ (or vice versa), so that the row index matches the variable $x_{ik}$ and the column index matches $x_{jl}$. The quadratic zero-one problem (4.35) can be transformed into an equivalent quadratic concave minimization problem. Let $m = n^2$ and consider the row norm of the $(m \times m)$ matrix $S$ defined by

$$\| S \|_\infty = \max \Big\{ \sum_{j=1}^{m} |s_{1j}|, \dots, \sum_{j=1}^{m} |s_{mj}| \Big\}.$$
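
To make the two statements of the QAP concrete, the following sketch evaluates the permutation cost $C(p)$ and the zero-one objective of (4.35) and confirms that they agree when $x$ is the permutation matrix of $p$. It is a brute-force illustration only, suitable for very small $n$; the function names, the 0-based indexing, and the $2 \times 2$ instance are assumptions introduced here.

```python
from itertools import permutations


def qap_cost(A, B, p):
    """Permutation form: C(p) = sum_i sum_j a_ij * b_{p(i) p(j)}."""
    n = len(A)
    return sum(A[i][j] * B[p[i]][p[j]] for i in range(n) for j in range(n))


def permutation_matrix(p):
    """Zero-one matrix X with x_ik = 1 exactly when p(i) = k."""
    n = len(p)
    return [[1 if p[i] == k else 0 for k in range(n)] for i in range(n)]


def zero_one_cost(A, B, X):
    """Zero-one form of (4.35): sum over i, j, k, l of a_ij b_kl x_ik x_jl."""
    n = len(A)
    return sum(A[i][j] * B[k][l] * X[i][k] * X[j][l]
               for i in range(n) for j in range(n)
               for k in range(n) for l in range(n))


def solve_qap_brute_force(A, B):
    """Enumerate all n! permutations and return a cheapest one with its cost."""
    best = min(permutations(range(len(A))), key=lambda p: qap_cost(A, B, p))
    return best, qap_cost(A, B, best)


# Hypothetical 2 x 2 instance with nonnegative entries.
A = [[0, 1], [2, 0]]
B = [[0, 3], [4, 0]]
best_p, best_cost = solve_qap_brute_force(A, B)
print(best_p, best_cost)
```

For this instance the identity permutation costs 11 and the swap costs 10, so the enumeration returns the swap. The enumeration grows as $n!$, which is why the concave reformulation discussed in the text is of interest.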

Let $Q = S - \alpha I$, where $I$ is the $m \times m$ unit matrix and $\alpha > \| S \|_\infty$. Then, let $x = (x_{11}, x_{12}, \dots, x_{nn})^T$, and consider the quadratic form $x^T Q x$. Without loss of generality, we assume that $Q$ is symmetric; otherwise, replace $Q$ by $Q' = \frac{1}{2}(Q + Q^T)$, which is symmetric and satisfies $x^T Q' x = x^T Q x$ because $x^T Q^T x = (x^T Q^T x)^T = x^T Q x$. We then obtain

\begin{align*}
x^T Q x &= \sum_{i=1}^{m} q_{ii} x_i^2 + 2 \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} q_{ij} x_i x_j \\
&= \sum_{i=1}^{m} \Big( q_{ii} + \sum_{j=1,\, j \ne i}^{m} q_{ij} \Big) x_i^2 - \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} q_{ij} (x_i - x_j)^2 \\
&= \sum_{i=1}^{m} \Big( -\alpha + \sum_{j=1}^{m} s_{ij} \Big) x_i^2 - \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} s_{ij} (x_i - x_j)^2 \\
&\le \sum_{i=1}^{m} \Big( -\alpha + \sum_{j=1}^{m} s_{ij} \Big) x_i^2 .
\end{align*}

Since $\alpha > \| S \|_\infty$, every coefficient in the last sum is negative; hence $x^T Q x < 0$ for any $x \ne 0$, and the matrix $Q$ is negative definite. We can then transform the QAP into the equivalent quadratic concave programming problem:

$$\min x^T Q x \quad \text{s.t.} \quad x \in \Omega, \tag{4.37}$$

where $Q = S - \alpha I$, $\alpha > \| S \|_\infty$, and $\Omega$ is the set of all $x = (x_{11}, x_{12}, \dots, x_{nn})^T \in R^{n^2}$ satisfying

$$\sum_{i=1}^{n} x_{ij} = 1 \ (j = 1, \dots, n), \qquad \sum_{j=1}^{n} x_{ij} = 1 \ (i = 1, \dots, n), \qquad x_{ij} \ge 0 \ (i, j = 1, \dots, n).$$

This is true because, for any vertex $x$ of $\Omega$, $x^T I x = \sum_{i=1}^{n} \sum_{k=1}^{n} x_{ik}^2 = n$, which is a constant on the vertices of $\Omega$. It is well known that the matrix of the assignment problem constraints is totally unimodular (TUM) and all vertices of $\Omega$ are integer with components in

$\{0, 1\}$ [103, 109]. Then the concave function $x^T Q x$ achieves its minimum at some vertex [61].

4.4.4.2 Maximum clique problems

The maximum clique problem can be defined as follows. Let $G = G(V, E)$ be an undirected graph where $V = \{1, \dots, n\}$ is the set of vertices (nodes) and $E$ denotes the set of edges. Assume that there are no parallel edges (and no self-loops joining the same vertex) in $G$. Denote an edge joining vertices $i$ and $j$ by $(i, j)$.

Definition 4.7. A clique of $G$ is a subset $C$ of vertices with the property that every pair of vertices in $C$ is connected by an edge; that is, $C$ is a clique if the subgraph $G(C)$ induced by $C$ is complete.

Definition 4.8. The maximum clique problem is the problem of finding a clique $C$ of maximal cardinality (size) $|C|$.

The maximum clique problem can be represented in many equivalent formulations (e.g., an integer programming problem, a continuous global optimization problem, and an indefinite quadratic programming problem). In this thesis, we are only interested in the indefinite quadratic programming formulation. From [100], let $A_G = (a_{ij})_{n \times n}$ be the adjacency matrix of $G$ defined by

$$a_{ij} = \begin{cases} 1 & \text{if } (i, j) \in E \\ 0 & \text{if } (i, j) \notin E. \end{cases}$$

The matrix $A_G$ is symmetric and all its eigenvalues are real numbers. In addition, the sum of the eigenvalues is zero because the main diagonal entries $a_{ii}$ are zero. In general, $A_G$ has positive and negative (and possibly zero) eigenvalues [61]. The continuous formulation of the indefinite quadratic programming problem for the maximum clique

is given by

$$\max f_G(x) = \sum_{(i,j) \in E} x_i x_j = \tfrac{1}{2} x^T A_G x \quad \text{s.t.} \quad x \in S = \Big\{ x = (x_1, \dots, x_n)^T : \sum_{i=1}^{n} x_i = 1,\ x_i \ge 0 \Big\}. \tag{4.38}$$

We can also formulate the maximum clique problem as a minimization problem. The indefinite quadratic integer programming formulation for the maximum clique is given by

$$\min f_G(x) = -\sum_{i=1}^{n} x_i + 2 \sum_{(i,j) \in \bar{E},\ i > j} x_i x_j = x^T (A_{\bar{G}} - I) x = x^T A x \quad \text{s.t.} \quad x \in \{0, 1\}^n, \tag{4.39}$$

where $A = A_{\bar{G}} - I$ and $A_{\bar{G}}$ is the adjacency matrix of the complement graph $\bar{G} = (V, \bar{E})$ of $G$. If $x^*$ solves Problem (4.39), then the set $C$ defined by $C = t(x^*) = \{ i \in V : x_i^* = 1 \}$ is a maximum clique of the graph $G$ with $|C| = -f_G(x^*)$.

4.4.4.3 Maximum independent set problems

The maximum independent set problem has many equivalent formulations as an integer programming problem and as a continuous nonconvex optimization problem [118]. In this section we give a brief review of some well-known formulations (e.g., integer programming formulations and quadratic programming formulations). Given a vector $w \in R^n$ of positive weights $w_i$ (associated with each vertex $i$, $i = 1, \dots, n$), the objective of the maximum weight independent set problem is to find an independent set of maximum weight. It is a generalization of the maximum independent set problem. The integer programming formulation (edge formulation)

of the maximum weight independent set problem is given by

$$\max f(x) = \sum_{i=1}^{n} w_i x_i \quad \text{s.t.} \quad x_i + x_j \le 1 \ \ \forall (i, j) \in E, \quad x_i \in \{0, 1\},\ i = 1, \dots, n. \tag{4.40}$$

An alternative formulation of this problem is the following clique formulation [55]:

$$\max f(x) = \sum_{i=1}^{n} w_i x_i \quad \text{s.t.} \quad \sum_{i \in S} x_i \le 1 \ \ \forall S \in \mathcal{C} \equiv \{ \text{maximal cliques of } G \}, \quad x_i \in \{0, 1\},\ i = 1, \dots, n. \tag{4.41}$$

The advantage of formulation (4.41) over (4.40) is a smaller gap between the optimal value of (4.41) and that of its linear relaxation. On the other hand, finding an efficient solution to (4.41) is difficult because there is an exponential number of constraints [4]. We consider another formulation of the maximum independent set problem. Let $A_G$ be the adjacency matrix of a graph $G$, and let $I$ be the $(n \times n)$ identity matrix. The global quadratic zero-one problem for the maximum independent set problem is given by

$$\min f(x) = x^T A x \quad \text{s.t.} \quad x_i \in \{0, 1\},\ i = 1, \dots, n, \tag{4.42}$$

where $A = A_G - I$. If $x^*$ is a solution to (4.42), then the set $J$ defined by $J = \{ j \in V : x_j^* = 1 \}$ is a maximum independent set of $G$ with $|J| = -f(x^*)$ [114]. A closely related quadratic function, which is to be maximized rather than minimized, is given by

$$H(x) = \sum_{i=1}^{n} x_i - \sum_{(i,j) \in E} x_i x_j, \quad \text{for } x \in \{0, 1\}^n. \tag{4.43}$$
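
The formulations above can be compared on a small graph by enumeration. The sketch below computes the independence number directly from the edge constraints of (4.40) with unit weights, the minimum of the quadratic form $x^T (A_G - I) x$ of (4.42), and the maximum over zero-one vectors of the edge-penalized count $\sum_i x_i - \sum_{(i,j) \in E} x_i x_j$. The function names, the 0-based vertex labels, and the path graph used as the instance are illustrative assumptions; the enumeration is exponential in $n$ and is meant only as a check.

```python
from itertools import product


def independence_number(n, edges):
    """Largest |J| over 0-1 vectors satisfying x_i + x_j <= 1 on every edge."""
    best = 0
    for x in product((0, 1), repeat=n):
        if all(x[i] + x[j] <= 1 for i, j in edges):
            best = max(best, sum(x))
    return best


def quadratic_form(x, edges):
    """x^T (A_G - I) x, where each undirected edge is listed once in `edges`."""
    return 2 * sum(x[i] * x[j] for i, j in edges) - sum(xi * xi for xi in x)


def penalized_count(x, edges):
    """Number of selected vertices minus number of selected edges."""
    return sum(x) - sum(x[i] * x[j] for i, j in edges)


# Hypothetical instance: a path on four vertices, 0 - 1 - 2 - 3.
n = 4
edges = [(0, 1), (1, 2), (2, 3)]
binary_vectors = list(product((0, 1), repeat=n))
alpha = independence_number(n, edges)
min_quadratic = min(quadratic_form(x, edges) for x in binary_vectors)
max_penalized = max(penalized_count(x, edges) for x in binary_vectors)
print(alpha, min_quadratic, max_penalized)
```

On this path the independence number is 2, the minimum of the quadratic form is $-2$, and the maximum of the penalized count is 2, in agreement with the sign convention $|J| = -f(x^*)$.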

From [4], the independence number of the graph $G$ is characterized by the maximization of $H(x)$ over the $n$-dimensional hypercube.

Theorem 4.16. Let $G = (V, E)$ be a simple graph on $n$ nodes $V = \{1, \dots, n\}$ with set of edges $E$, and let $\alpha(G)$ be the independence number of $G$. Then

$$\alpha(G) = \max_{0 \le x_i \le 1,\ i = 1, \dots, n} H(x) = \max_{0 \le x_i \le 1,\ i = 1, \dots, n} \Big( \sum_{i=1}^{n} x_i - \sum_{(i,j) \in E} x_i x_j \Big),$$

where each variable $x_i$ corresponds to node $i \in V$.

Proof. See [4].

4.4.5 Various Equivalent Forms of Quadratic Zero-One Problem

The problem considered here is a quadratic zero-one program, which has the form

$$\min f(x) = x^T A x \quad \text{s.t.} \quad x_i \in \{0, 1\},\ i = 1, \dots, n, \tag{4.44}$$

where $A$ is an $n \times n$ matrix [112, 113]. Throughout this section the following notation will be used.
$\{0, 1\}^n$: set of $n$-dimensional 0-1 vectors.
$R^{n \times n}$: set of $n \times n$ real matrices.
$R^n$: set of $n$-dimensional real vectors.

In order to formalize the notion of equivalence we need some definitions.

Definition 4.9. We say that problem $P$ is "polynomially reducible" to problem $P'$ if, given an instance $I(P)$ of problem $P$, we can in polynomial time obtain an instance $I(P')$ of problem $P'$ such that solving $I(P')$ will solve $I(P)$.

Definition 4.10. Two problems $P_1$ and $P_2$ are called "equivalent" if $P_1$ is "polynomially reducible" to $P_2$ and $P_2$ is "polynomially reducible" to $P_1$.

Consider the following three problems:

$P$: $\min f(x) = x^T A x$, $x \in \{0, 1\}^n$, $A \in R^{n \times n}$.
$P_1$: $\min f(x) = x^T A x + c^T x$, $x \in \{0, 1\}^n$, $A \in R^{n \times n}$, $c \in R^n$.
$P_2$: $\min f(x) = x^T A x$, $x \in \{0, 1\}^n$, $A \in R^{n \times n}$, $\sum_{i=1}^{n} x_i = k$ for some $k$ such that $0 \le k \le n$, where $x = (x_1, x_2, \dots, x_n)$.

Next we show that problems $P$, $P_1$, and $P_2$ are all "equivalent". Then, formulation $P_2$ will be used in the rest of the sections.

Lemma 4.1. $P$ is "polynomially reducible" to $P_1$.

Proof. $P$ is the special case of $P_1$ with $c = 0$.

Lemma 4.2. $P_1$ is "polynomially reducible" to $P$.

Proof. Problem $P_1$ is defined as follows: $\min f(x) = x^T A x + c^T x$, $x \in \{0, 1\}^n$, $A \in R^{n \times n}$, $c \in R^n$. If $A = (a_{ij})$, then let $B = (b_{ij})$ where

$$b_{ij} = \begin{cases} a_{ij} & \text{if } i \ne j \\ a_{ij} + c_i & \text{if } i = j. \end{cases}$$

Since $x_i^2 = x_i$ (because $x_i \in \{0, 1\}$), we have $g(x) = x^T B x = x^T A x + c^T x$. So the following problem is equivalent to problem $P_1$: $\min g(x) = x^T B x$, $x \in \{0, 1\}^n$, $B \in R^{n \times n}$.

Using Lemma 4.1 and Lemma 4.2, it is evident that $P$ and $P_1$ are "equivalent".

Lemma 4.3. $P_2$ is "polynomially reducible" to $P$.

Proof. Problem $P_2$ is as follows: $\min f(x) = x^T A x$, $x \in \{0, 1\}^n$, $A \in R^{n \times n}$, $\sum_{i=1}^{n} x_i = k$ for some $k$ such that $0 \le k \le n$. If $A = (a_{ij})$, then let $M = 2 \big[ \sum_{j=1}^{n} \sum_{i=1}^{n} |a_{ij}| \big] + 1$. Now, define the following problem $\bar{P}$:

$$\min g(x) = x^T A x + M \Big( \sum_{i=1}^{n} x_i - k \Big)^2 \quad \text{s.t.} \quad x \in \{0, 1\}^n,\ A \in R^{n \times n}.$$

Let $x^b = (x_1^b, \dots, x_n^b)$ and $x^0 = (x_1^0, \dots, x_n^0)$ be such that $\sum_{i=1}^{n} x_i^b \ne k$ and $\sum_{i=1}^{n} x_i^0 = k$; then $g(x^0) \le \frac{M - 1}{2}$ as $\sum_{i=1}^{n} x_i^0 = k$, and $g(x^b) \ge -\frac{M - 1}{2} + M$, or $g(x^b) \ge \frac{M + 1}{2}$, as $\big| \sum_{i=1}^{n} x_i^b - k \big| \ge 1$. Therefore, $g(x^0) < g(x^b)$ if $\sum_{i=1}^{n} x_i^b \ne k$ and $\sum_{i=1}^{n} x_i^0 = k$. Hence, if $\min g(x) = g(x^0)$ where $x^0 = (x_1^0, \dots, x_n^0)$, then $\sum_{i=1}^{n} x_i^0 = k$.

So $\min f(x) = \min g(x)$. From the above discussion, it can be seen that $P_2$ is "polynomially reducible" to $P$.

The proof of Lemma 4.3 also illustrates how equality constraints in a quadratic zero-one program can be eliminated.

Lemma 4.4. $P$ is "polynomially reducible" to $P_2$.

Proof. Let problem $P$ be defined as follows: $\min f(x) = x^T A x$, $x \in \{0, 1\}^n$, $A \in R^{n \times n}$. Define a series of $(n + 1)$ problems $P_2(0), P_2(1), P_2(2), \dots, P_2(n)$, where $P_2(j)$ is the following problem: $\min f(x) = x^T A x$, $x \in \{0, 1\}^n$, $A \in R^{n \times n}$, $\sum_{i=1}^{n} x_i = j$. Let the minimum of problem $P_2(j)$ be $y_j$; then the minimum of problem $P$ is $\min \{ y_0, y_1, \dots, y_n \}$.

Lemma 4.3 and Lemma 4.4 imply that $P$ and $P_2$ are "equivalent". Since "equivalent" is a transitive relation, $P$, $P_1$, and $P_2$ are all "equivalent".

4.4.6 Complexity of Quadratic Zero-One Programming

Quadratic zero-one programming is a difficult problem. We show next that the model ($P_2$) we are using in this work is at least as hard as the $k$-clique problem. A $k$-clique is a complete graph with $k$ vertices.

$k$-clique problem: Given a graph $G = (V, E)$ ($V$ is the set of vertices and $E$ is the set of edges), does the graph $G$ have a $k$-clique as one of its subgraphs?

The $k$-clique problem is known to be NP-complete. We will show that the $k$-clique problem is "polynomially reducible" to problem $P_2$ defined in the previous subsection.

Theorem 4.17. The $k$-clique problem is "polynomially reducible" to $P_2$.

Proof. Problem $P_2$ was defined as $\min f(x) = x^T A x$, s.t. $x_i \in \{0, 1\}$, $i = 1, \dots, n$, $\sum_{i=1}^{n} x_i = m$ for some $0 \le m \le n$. Given the graph $G = (V, E)$, define $A = (a_{ij})$ such that

$$a_{ij} = \begin{cases} 0 & \text{if } (v_i, v_j) \in E \\ 1 & \text{if } (v_i, v_j) \notin E, \end{cases}$$

where $n = |V|$ and $m = k$ (we are trying to find a $k$-clique). The meaning attached to the vector $x \in \{0, 1\}^n$ in problem $P_2$ is as follows:

$$x_i = \begin{cases} 1 & \text{means that } v_i \text{ is in the clique,} \\ 0 & \text{means that } v_i \text{ is not in the clique.} \end{cases}$$

With the diagonal entries $a_{ii}$ taken to be 0, $f(x)$ counts the ordered pairs of selected vertices that are not joined by an edge, so the graph $G$ has a $k$-clique if and only if $\min f(x) = 0$. So the $k$-clique problem is "polynomially reducible" to $P_2$.

Problem $P_2$ is "equivalent" to $P$, so problem $P$ is also NP-hard. Therefore, as the dimension of the problem increases, the CPU time required by known exact methods increases exponentially in the worst case.

4.5 Multi-Quadratic Programming

In this research we examine optimization problems where the objective function is a quadratic function and the feasible region is defined by a finite set of quadratic and linear constraints. These problems are called Multi-Quadratic Programming (MQP) problems. They can be formulated as follows:

$$\min x^T Q x + c^T x \quad \text{s.t.} \quad x^T A_j x + B_j x \le b_j,\ j = 1, \dots, m, \quad x \ge 0, \tag{4.45}$$

where $A_j$ is an $(n \times n)$ matrix corresponding to the $j$th quadratic constraint, and $B_j$ is the $j$th row of the $(m \times n)$ matrix $B$. MQP plays an important modeling role for many diverse problems. Moreover, the MQP is a structured global optimization problem, which encompasses many others. It provides a much improved model compared to the simpler linear relaxation of a problem. Indeed, linear mixed 0-1, fractional, bilinear, bilevel, generalized linear complementarity, and many more programming problems are or can easily be reformulated as special cases of MQP. However, there are theoretical and practical difficulties

in the process of solving such problems. Very large linear models can be solved efficiently, whereas MQP problems are in general NP-hard and numerically intractable. The problem of finding a feasible solution is NP-hard as it generalizes the linear complementarity problem [61]; the nonlinear constraints define a feasible region which is in general neither convex nor connected. Moreover, even if the feasible region is a polyhedron, optimizing the quadratic objective function is strongly NP-hard as the resulting problem subsumes the disjoint bilinear programming problem. Therefore, finding a finite and exact algorithm that solves large MQP problems is impractical. Even for the convex case (when $Q$ and $A_j$ are positive semidefinite), there are very few algorithms for solving MQP problems. Nevertheless, the MQP constitutes an important part of mathematical programming problems, arising in various practical applications including facility location, production planning, VLSI chip design, optimal design of water distribution networks, and most problems in chemical engineering design.

The MQP was first introduced in the seminal paper of Kuhn and Tucker [85]. Later on, the case of MQP with a single quadratic constraint in the problem was discussed in [152, 107]. The first general approach for solving MQP problems was proposed in [16], where the following two Lagrange functions for MQP are considered:

$$L_1(x, \mu) = x^T Q x + c^T x + \sum_{j=1}^{m} \mu_j \big( x^T A_j x + B_j x - b_j \big),$$

$$L_2(x, \mu, \nu) = L_1(x, \mu) - \sum_{i=1}^{n} \nu_i x_i,$$

where $\mu$ and $\nu$ are the multipliers for the quadratic and bound constraints, respectively.
A cutting plane algorithm was applied to solve this problem; that is, the algorithm solves a sequence of linear master problems that minimize a piecewise linear function constructed from the Lagrange functions for constant $x$, and a primal problem with either an unconstrained quadratic function (using $L_2(x, \mu, \nu)$) or a quadratic function over the nonnegative orthant (using $L_1(x, \mu)$) [40].

A branch-and-bound algorithm for solving MQP problems (and other more general problems) when the objective function is separable and the constraint set is linear was introduced in [34]. The method involves solving bounding convex envelope approximating problems over successive partitions of the feasible region. This method was later extended to deal with nonconvex constraints, but it generates a number of infeasible solutions and does not, in general, converge in a finite number of iterations [147]. An algorithm for the solution of linear problems with an additional reverse convex constraint was proposed in [19]. The algorithm involves partitioning the feasible region into subsets contained in cones originating at an infeasible vertex of the polytope formed by the linear constraints, while ensuring that an interior point of the feasible region is contained in each partition. Later on, an algorithm for the solution of problems with concave objective functions and separable quadratic constraints was proposed in [6]. The algorithm uses a piecewise linear approximation for the quadratic constraints and solves an MQP problem as a mixed 0-1 linear problem. This algorithm is similar to the solution approaches for concave quadratic problems [115] and for indefinite quadratic problems [111].

During the last decade, several authors have been interested in some special cases of MQP. Also, many extensions of MQP have been discussed in the literature. The problem of minimizing an indefinite quadratic objective subject to two-sided indefinite quadratic constraints was discussed in [149]. Under suitable assumptions, the authors derived necessary and sufficient optimality conditions and gave some conditions for the existence of solutions for this nonconvex program.
While several methods have been suggested for solving the MQP problem, numerical solutions of the general problem are still rarely available in the literature. By using a double duality argument, under suitable assumptions, the MQP is proved to be equivalent to a convex program [154]. In addition, a problem with a concave quadratic function is proved to be equivalent

to a minimax convex problem, and thus can be solved in polynomial time via interior-point methods. The property is no longer true when $Q$ is an indefinite quadratic function [154].

4.5.1 Applications

Each $n$-dimensional MQP problem can be easily transformed to a $2n$-dimensional bilinear problem. A strategy for reducing the necessary dimension of the resulting bilinear program is also proposed [5, 59]. Conversely, bilinear optimization problems are nothing but a special instance of MQP. Pooling problems in petrochemistry, the modular design problem introduced in [32], in particular the multiple modular design problem [5, 33] or the more general modularization of product sub-assemblies [134], and special classes of structured stochastic games [37] are only some examples of the wide range of applications of bilinear programming problems. Another large class of optimization problems consists of problems with linear or quadratic functions additionally involving Boolean variables (i.e., variables $x_i \in R$ with the constraint $x_i \in \{0, 1\}$). Another widely explored problem is the problem of packing $n \in \mathbb{N}$ equal circles in a square, which can be transformed to an MQP problem. One looks for the maximum radius $r$ of $n$ non-overlapping circles contained in the unit square. This problem is equivalent to an MQP problem with a linear objective function and concave quadratic constraints. A related class of global optimization problems is that of minimax location problems [64], which also lead to quadratic constraints. Production planning and portfolio optimization are examples where so-called chance-constrained linear programs occur. These problems look similar to linear programs; however, the matrix describing the linear constraints of such problems is stochastic rather than deterministic.
Under certain restrictive assumptions it is possible to transform these stochastic constraints to deterministic quadratic constraints [64], such that in general a problem of type MQP is obtained. In [6] it is shown that nonconvex MQP problems can

be used for the examination of special instances of nonlinear bilevel programming problems. Other applications of MQP include the fuel mixture problem encountered in the oil industry [121] and placement and layout problems in integrated circuit design [7, 8]. Hence there are many applications of MQP. Whether the MQP is applicable in practice for solving, for example, problems resulting from integer programming problems depends on the numerical efficiency of the solution method that is used. Up to now, only a few methods for solving the general case of MQP have been proposed in the literature. Most of them result from methods developed for other, more general problem classes. In the next section we discuss some of the solution techniques that are used in the solution approaches in this dissertation.

4.6 Reformulation-Linearization Techniques

Definition 4.11. Let $P_A$ and $P_B$ be two optimization problems. A reformulation $B(\cdot)$ of $P_A$ as $P_B$ is a mapping from $P_A$ to $P_B$ such that, given any instance $A$ of $P_A$ and an optimal solution to $B(A)$, an optimal solution to $A$ can be obtained within a polynomial amount of time.

The reformulation involved in a polynomial-time Turing reduction is a polynomial-time mapping. Although the optimal solution to $A$ can be obtained within polynomial time from that of $B(A)$, there may be cases in which the reformulation is not a polynomial mapping. A reformulation is completely characterized by $P_A$ and $B(P_A)$. If no large finite constant appears in the instance $B(P_A)$, we denote it as reformulation $P_A \to P_B$, and otherwise as reformulation $P_A \Rightarrow P_B$. Links between quadratic integer programming and mixed integer programming are studied in the next section. Reformulations from one problem to another are not always straightforward; for example, they may require the introduction of KKT optimality conditions.

4.6.1 Quadratic Integer Programming

In this dissertation, we are motivated by reformulation-linearization techniques (RLT) to develop a novel linearization technique based on KKT conditions; we later prove that this technique yields an optimal solution to the original MQP. Consider the quadratic zero-one problem

$$\min f(x) = x^T Q x \quad \text{s.t.} \quad x_i \in \{0, 1\},\ i = 1, \dots, n, \tag{4.46}$$

where $Q$ is an $n \times n$ matrix [112, 113]. Throughout this section the following notation will be used.
$\{0, 1\}^n$: set of $n$-dimensional 0-1 vectors.
$R^{n \times n}$: set of $n \times n$ real matrices.
$R^n$: set of $n$-dimensional real vectors.

Next, we add a linear constraint, $\sum_{i=1}^{n} x_i = b$, where $b$ is a constant. We now consider the following Quadratic Integer Programming (QIP) problem:

$$P: \ \min f(x) = x^T Q x \quad \text{s.t.} \quad \sum_{i=1}^{n} x_i = b,\ x \in \{0, 1\}^n,\ Q \in R^{n \times n}. \tag{4.47}$$

Problem $P$ can be formulated as a quadratic 0-1 problem of the form (4.46) by using an exact penalty. If $Q = (q_{ij})$, then let $M = 2 \big[ \sum_{j=1}^{n} \sum_{i=1}^{n} |q_{ij}| \big] + 1$. Then, we have the following equivalent problem $\bar{P}$:

$$\bar{P}: \ \min g(x) = x^T Q x + M \Big( \sum_{i=1}^{n} x_i - b \Big)^2 \quad \text{s.t.} \quad x \in \{0, 1\}^n,\ Q \in R^{n \times n}. \tag{4.48}$$

4.6.1.1 Conventional linearization technique for quadratic zero-one problems

In this section, we describe a conventional linearization technique, which can be found in the literature. Consider Problem (4.47); for each product $x_i x_j$, we introduce a new 0-1 variable, $x_{ij} = x_i x_j$ ($i \ne j$). Note that $x_{ii} = x_i^2 = x_i$ for $x_i \in \{0, 1\}$. After


linearization, the equivalent Integer Programming (IP) formulation is given by

$$
\begin{aligned}
\min\ & \sum_{i} \sum_{j} q_{ij} x_{ij} && (4.49)\\
\text{s.t. } & \sum_{i=1}^{n} x_i = b, && (4.50)\\
& x_{ij} \le x_i, \quad \text{for } i,j = 1,\dots,n\ (i \neq j) && (4.51)\\
& x_{ij} \le x_j, \quad \text{for } i,j = 1,\dots,n\ (i \neq j) && (4.52)\\
& x_i + x_j - 1 \le x_{ij}, \quad \text{for } i,j = 1,\dots,n\ (i \neq j) && (4.53)
\end{aligned}
$$

where $x_i \in \{0,1\}$ and $x_{ij} \in \{0,1\}$, $i,j = 1,\dots,n$. Note that this technique increases the number of 0-1 variables to $O(n^2)$. Although we can apply CPLEX 7.0 to solve problems with $n = 30$, this approach becomes computationally inefficient as $n$ increases. Algorithms for efficiently solving larger QIP problems are still desirable.

4.6.1.2 KKT conditions linearization technique for quadratic zero-one problems

In this section we propose a novel linearization technique based on KKT optimality conditions. Consider a linearly constrained quadratic problem given by

$$\min f(x) = x^T Q x, \quad \text{s.t. } \sum_{i=1}^{n} x_i = b,\ x_i \ge 0,\ i = 1,\dots,n. \qquad (4.54)$$

The Lagrange function for Problem (4.54) is given by

$$\min L(x,u,y) = \tfrac{1}{2} x^T Q x + u^T \left( \sum_{i=1}^{n} x_i - b \right) + y^T (0 - x), \qquad (4.55)$$

where $u$ and $y$ are the Lagrange multipliers for the equality and nonnegativity constraints, respectively. Let $e^T = (1,1,\dots,1)$; we then have the following first order


Karush-Kuhn-Tucker conditions for Problem (4.54):

$$
\begin{aligned}
Qx + ue + y &= 0 && (4.56)\\
u^T (e^T x - b) &= 0 && (4.57)\\
y^T x &= 0 && (4.58)\\
e^T x - b &= 0 && (4.59)\\
x, u, y &\ge 0. && (4.60)
\end{aligned}
$$

Note that $u$ is a scalar and $y$ is a column vector. Since Eq. (4.59) is always satisfied, the variable $u$ can take any value. We add artificial variables $w$, a column vector, to Eq. (4.56) and then denote a column vector $s = ue + w$. We then have the KKT conditions given by

$$
\begin{aligned}
Qx + y + s &= 0 && (4.61)\\
e^T x &= b && (4.62)\\
y^T x &= 0 && (4.63)\\
x, s, y &\ge 0. && (4.64)
\end{aligned}
$$

We can formulate the above KKT conditions as a mixed-integer linear programming (MILP) problem. The objective function is to minimize the sum of the artificial variables $s_i$. Because the $x_i$ are 0-1 variables, we can replace the complementarity constraint (4.63) with $y_i \le M(1 - x_i)$, for $i = 1,\dots,n$, where $M = \max_i \sum_{j=1}^{n} q_{ij} = \|Q\|_\infty$. We then


have the MILP formulation given by

$$
\begin{aligned}
\min\ & \sum_{i=1}^{n} s_i \\
\text{s.t. } & \sum_{j=1}^{n} q_{ij} x_j + s_i + y_i = 0, \quad \text{for } i = 1,\dots,n \\
& \sum_{i=1}^{n} x_i - b = 0 && (4.65)\\
& y_i - M(1 - x_i) \le 0, \quad \text{for } i = 1,\dots,n
\end{aligned}
$$

where $x_i \in \{0,1\}$ and $s_i, y_i \ge 0$, for $i = 1,\dots,n$. Applying CPLEX 7.0, this problem can be easily solved with $n = 30$. In addition, this formulation remains computationally efficient as $n$ increases because the number of 0-1 variables is $O(n)$. We note that solving the QIP problem in (4.54) by solving the MILP in (4.65) is heuristic in nature. Every element of the matrix $Q$ must be positive to make sure that there exists $x$ that satisfies Eq. (4.61). Next, we prove that by applying this technique we can obtain an optimal solution to Problem (4.54) by solving Problem (4.65). Without loss of generality, we consider a QIP problem of the form

$$\min f(x) = x^T Q x \quad \text{s.t. } Ax \le b, \qquad (4.66)$$

where $Q$ is an $n \times n$ matrix whose elements satisfy $q_{ij} \ge 0$, $i,j = 1,\dots,n$, $x \in \{0,1\}^n$, $A$ is an $m \times n$ matrix, $b$ is a constant vector, and $m$ and $n$ are integers. Consider the following two problems:

$P_1$: $\min f(x) = x^T Q x$, $Ax \le b$, $x \in \{0,1\}^n$.

$\bar{P}_1$: $\min g(x) = e^T s$, $Qx - y - s = 0$, $Ax \le b$, $y^T x = 0$, $x \in \{0,1\}^n$, $y_i \ge 0$, $s_i \ge 0$.

Lemma 4.5. Consider $\bar{P}_1$. For any $i$, if $x^0_i = 0$ then $s^0_i = 0$.


Proof. By contradiction, assume that for some $i$, $x^0_i = 0$ and $s^0_i > 0$, where $(y^0, s^0)$ were chosen to minimize $e^T s^0$. Define vectors $\tilde{y}$ and $\tilde{s}$ as $\tilde{y}_i = y^0_i + s^0_i$, $\tilde{s}_i = 0$, and, for $j \neq i$, $\tilde{y}_j = y^0_j$ and $\tilde{s}_j = s^0_j$. It is easy to check that $(x^0, \tilde{y}, \tilde{s})$ also satisfies all constraints in $\bar{P}_1$, and $e^T \tilde{s} < e^T s^0$. This contradicts the initial assumption that $s^0$ and $y^0$ were chosen to minimize $e^T s^0$.

We use the result from Lemma 4.5 to prove the following theorem.

Theorem 4.18. $P_1$ has an optimal solution $x^0$ iff there exist $y^0, s^0$ such that $(x^0, y^0, s^0)$ is an optimal solution to $\bar{P}_1$.

Proof. Necessity. If $x^0$ is an optimal solution to $P_1$, it is obvious that there exist $y, s$ with $y \ge 0$, $s \ge 0$ such that

$$Qx^0 - y - s = 0, \quad Ax^0 \le b, \quad y^T x^0 = 0.$$

Choose $y^0, s^0$ from the above defined set of $y$ and $s$ such that $e^T s^0$ is minimized. Then, we prove that $(x^0, y^0, s^0)$ is an optimal solution to $\bar{P}_1$. Multiplying the constraint $Qx^0 - y^0 - s^0 = 0$ by $(x^0)^T$, we obtain $(x^0)^T Q x^0 - (x^0)^T y^0 - (x^0)^T s^0 = 0$. Note that $(x^0)^T y^0 = 0$ by the complementarity constraint. We then have $(x^0)^T Q x^0 = (x^0)^T s^0$. From Lemma 4.5, $(x^0)^T s^0 = e^T s^0$. Thus, $(x^0, y^0, s^0)$ is an optimal solution to $\bar{P}_1$.

Sufficiency. The proof is similar to the necessity proof.

We can formulate $\bar{P}_1$ as a MIP by replacing the nonlinear constraint $y^T x = 0$ with the linear constraint $y \le M(1 - x)$, where $M = \max_i \sum_{j=1}^{n} q_{ij} = \|Q\|_\infty$.

$\tilde{P}_1$: $\min g(x) = e^T s$, $Qx - y - s = 0$, $Ax \le b$, $y \le M(1 - x)$, $s_i \ge 0$, $y_i \ge 0$, $x \in \{0,1\}^n$.

From Theorem 4.18 and the above transformation, we have shown that $P_1$, $\bar{P}_1$, and $\tilde{P}_1$ are "equivalent".
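The equivalence claimed by Theorem 4.18 can be checked numerically on a tiny instance. The following is only a minimal sketch, not the dissertation's implementation: it assumes the sign convention of $\tilde{P}_1$ ($Qx - y - s = 0$ with $0 \le y \le M(1 - x)$ and $s \ge 0$), ignores the side constraint $Ax \le b$, and uses a made-up nonnegative matrix `Q`. The helper names `min_artificial_sum` and `quad` are hypothetical. For a fixed 0-1 vector $x$, the smallest feasible $s_i$ is available in closed form, so no solver is needed.

```python
from itertools import product

def quad(Q, x):
    """Quadratic form x^T Q x for a list-of-lists matrix Q."""
    n = len(Q)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def min_artificial_sum(Q, x):
    """Minimum of e^T s over (y, s) for a fixed 0-1 vector x, subject to
    Q x - y - s = 0, 0 <= y_i <= M (1 - x_i), s_i >= 0 (assumed convention)."""
    n = len(Q)
    M = max(sum(row) for row in Q)          # M = max_i sum_j q_ij
    total = 0
    for i in range(n):
        qx_i = sum(Q[i][j] * x[j] for j in range(n))
        y_cap = M * (1 - x[i])              # upper bound on y_i
        # y_i absorbs as much of (Qx)_i as allowed; s_i takes the remainder
        total += max(0, qx_i - y_cap)
    return total

# Illustrative nonnegative matrix (an assumption, not data from the text)
Q = [[1, 2, 0],
     [2, 3, 1],
     [0, 1, 2]]

all_match = all(
    quad(Q, x) == min_artificial_sum(Q, x)
    for x in product((0, 1), repeat=len(Q))
)
print(all_match)
```

Because the two objective values agree for every 0-1 vector, minimizing $e^T s$ over any common feasible set selects the same $x$ as minimizing $x^T Q x$, which is the content of the theorem under the nonnegativity assumption on $Q$.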


4.6.2 Multi-Quadratic Integer Programming

In this section, we propose reformulation-linearization techniques for MQIP, which is a more general case of QIP. Consider the MQIP problem given by

$$
\begin{aligned}
\min\ & x^T Q x \\
\text{s.t. } & \sum_{i=1}^{n} x_i = b && (4.67)\\
& x^T B x \ge k
\end{aligned}
$$

where $x_i \in \{0,1\}\ \forall i \in \{1,\dots,n\}$, $B$ is an $n \times n$ matrix, and $k$ is a constant.

4.6.2.1 Conventional linearization technique for multi-quadratic integer programming problems

With one more quadratic constraint, Problem (4.67) becomes much harder to solve. However, starting from the equivalent IP formulation of QIP problems, we can reformulate the MQIP problem by adding one more linearized constraint, which ensures that the solution to Problem (4.68) satisfies the additional quadratic constraint. The equivalent IP formulation is given by

$$
\begin{aligned}
\min\ & \sum_{i} \sum_{j} q_{ij} x_{ij} \\
\text{s.t. } & \sum_{i=1}^{n} x_i = b, \\
& x_{ij} \le x_i, \quad \text{for } i,j = 1,\dots,n\ (i \neq j) && (4.68)\\
& x_{ij} \le x_j, \quad \text{for } i,j = 1,\dots,n\ (i \neq j) \\
& x_i + x_j - 1 \le x_{ij}, \quad \text{for } i,j = 1,\dots,n\ (i \neq j) \\
& \sum_{i} \sum_{j} b_{ij} x_{ij} \ge k && (4.69)
\end{aligned}
$$

where $x_i \in \{0,1\}$ and $x_{ij} \in \{0,1\}$, $i,j = 1,\dots,n$. As mentioned in the previous section, the above formulation is not computationally efficient as $n$ increases.
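The three linking inequalities used in the conventional linearization are what force each auxiliary variable to equal the product it replaces. The short check below is an illustrative sketch (the function names `feasible_products` and `num_binary_vars` are hypothetical, not from the text): it enumerates all 0-1 values of $x_i$ and $x_j$, lists the values of $x_{ij}$ that satisfy $x_{ij} \le x_i$, $x_{ij} \le x_j$, and $x_i + x_j - 1 \le x_{ij}$, and counts the 0-1 variables when one $x_{ij}$ is introduced for every ordered pair $i \neq j$.

```python
from itertools import product

def feasible_products(xi, xj):
    """Values of the 0-1 variable x_ij allowed by the linking constraints."""
    return [
        xij for xij in (0, 1)
        if xij <= xi and xij <= xj and xi + xj - 1 <= xij
    ]

def num_binary_vars(n):
    """n original variables plus one x_ij per ordered pair (i, j), i != j."""
    return n + n * (n - 1)

forced = {
    (xi, xj): feasible_products(xi, xj)
    for xi, xj in product((0, 1), repeat=2)
}
for (xi, xj), values in forced.items():
    print(xi, xj, values)

print(num_binary_vars(30))
```

In every case exactly one value of $x_{ij}$ remains, namely $x_i x_j$, so the linear constraints reproduce the products exactly; the price is the quadratic growth in the number of 0-1 variables.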


4.6.2.2 KKT conditions linearization technique for multi-quadratic programming problems

We can solve the MQIP problem in (4.67) by adding the first-order derivative of the quadratic constraint and some additional constraints to the MILP problem in (4.65). The equivalent MILP formulation is given by

$$
\begin{aligned}
\min\ & \sum_{i=1}^{n} s_i && (4.70)\\
\text{s.t. } & \sum_{i=1}^{n} x_i - b = 0 && (4.71)\\
& \sum_{j=1}^{n} q_{ij} x_j + s_i + y_i = 0, \quad \text{for } i = 1,\dots,n && (4.72)\\
& y_i - M(1 - x_i) \le 0, \quad \text{for } i = 1,\dots,n && (4.73)\\
& h_i - M' x_i \le 0, \quad \text{for } i = 1,\dots,n && (4.74)\\
& \sum_{j=1}^{n} b_{ij} x_j + h_i \ge 0, \quad \text{for } i = 1,\dots,n && (4.75)\\
& \sum_{i=1}^{n} h_i \ge k && (4.76)
\end{aligned}
$$

where $x_i \in \{0,1\}$, $M = \max_i \sum_{j=1}^{n} q_{ij} = \|Q\|_\infty$, $M' = \max_i \sum_{j=1}^{n} b_{ij} = \|B\|_\infty$, and $s_i, y_i, h_i \ge 0$, for $i = 1,\dots,n$. Applying CPLEX 7.0, this problem can be easily solved with $n = 30$. This formulation is computationally efficient because the number of 0-1 variables is $O(n)$. Next, we prove that by applying this technique we can obtain an optimal solution to Problem (4.67) by solving the MILP (4.70)-(4.76). Let $B$ be an $n \times n$ matrix whose elements satisfy $b_{ij} \ge 0$, $i,j = 1,\dots,n$. Consider the following two problems:

$P_2$: $\min f(x) = x^T Q x$, $Ax \le b$, $x^T B x \ge k$, $x \in \{0,1\}^n$, where $k$ is a positive constant.

$\bar{P}_2$: $\min g(x) = e^T s$, $Qx - y - s = 0$, $Ax \le b$, $y \le M(1 - x)$, $Bx - z \ge 0$, $e^T z \ge k$, $z \le M' x$, $x \in \{0,1\}^n$, $y_i, s_i, z_i \ge 0$, where $M' = \|B\|_\infty$ and $M = \|Q\|_\infty$.

Let us prove the following theorem.


Theorem 4.19. $P_2$ has an optimal solution $x^0$ iff there exist $y^0, s^0, z^0$ such that $(x^0, y^0, s^0, z^0)$ is an optimal solution to $\bar{P}_2$.

Proof. Necessity. From the proof of Theorem 4.18, it is obvious that we only need to show that if $x^0$ is an optimal solution to $P_2$ then there exists $z^0$ such that the following constraints are satisfied:

$$
\begin{aligned}
Bx^0 - z^0 &\ge 0, && (4.77)\\
e^T z^0 &\ge k, && (4.78)\\
z^0 &\le M' x^0. && (4.79)
\end{aligned}
$$

Note from (4.79) that if $x^0_i = 0$ then we must have $z^0_i = 0$. From Lemma 4.5, we have $e^T z^0 = (x^0)^T z^0$. Since $z^0_i$ is a real number, for all $i$ with $x^0_i = 1$ we can choose $z^0_i$ such that $(Bx^0)_i = z^0_i$. Therefore, Eqs. (4.77) and (4.79) are satisfied. Multiplying Eq. (4.77) by $(x^0)^T$, we obtain $(x^0)^T B x^0 = (x^0)^T z^0 = e^T z^0$, and since $x^0$ is an optimal solution to $P_2$, Eq. (4.78) is satisfied.

Sufficiency. The proof is similar to the necessity proof.

Next consider the case when the elements of $Q$ and $B$ can be negative (i.e., $q_{ij}$ and $b_{ij}$ can be less than 0). If we have a knapsack constraint $w^T x = b$, where $w_i$ and $b$ are constants and $w_i \neq 0$ for $i = 1,\dots,n$, we can still reduce the problem to an equivalent one with matrices $\tilde{Q}$ and $\tilde{B}$, where $\tilde{q}_{ij} \ge 0$ and $\tilde{b}_{ij} \ge 0$. Thus we can apply the technique described above to linearize the problem. Let us show this reduction. Assume without loss of generality that $w_i \ge 1$ for $i = 1,\dots,n$. Let $C$ be the $n \times n$ matrix $C = w w^T$. It is clear that $c_{ij} \ge 1$. Define $\tilde{Q}$ as $\tilde{Q} = Q + \max_{i,j} |q_{ij}|\, C$. Since $c_{ij} \ge 1$, we then have the following relations:

$$\tilde{q}_{ij} = q_{ij} + \max_{i,j} |q_{ij}|\, c_{ij} \ge 0,$$

$$x^T Q x = x^T \left( \tilde{Q} - \max_{i,j} |q_{ij}|\, C \right) x = x^T \tilde{Q} x - b^2 \max_{i,j} |q_{ij}|.$$
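The shift to a nonnegative matrix can be verified directly. The sketch below is illustrative only: the matrix `Q`, the weights `w`, and the helper names are made up for the example. It builds $\tilde{Q} = Q + \max_{i,j}|q_{ij}|\, w w^T$ and checks, for every 0-1 vector $x$, that $x^T Q x = x^T \tilde{Q} x - (w^T x)^2 \max_{i,j}|q_{ij}|$; on the feasible set $w^T x = b$ the subtracted term is the constant $b^2 \max_{i,j}|q_{ij}|$.

```python
from itertools import product

def quad(Q, x):
    """Quadratic form x^T Q x for a list-of-lists matrix Q."""
    n = len(Q)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def shift_to_nonnegative(Q, w):
    """Return (Q_tilde, q_max) with Q_tilde = Q + q_max * w w^T."""
    n = len(Q)
    q_max = max(abs(Q[i][j]) for i in range(n) for j in range(n))
    Q_tilde = [
        [Q[i][j] + q_max * w[i] * w[j] for j in range(n)]
        for i in range(n)
    ]
    return Q_tilde, q_max

# Illustrative data (assumptions): Q has negative entries, all w_i >= 1
Q = [[1, -3, 2],
     [-3, 0, -1],
     [2, -1, 4]]
w = [1, 2, 1]

Q_tilde, q_max = shift_to_nonnegative(Q, w)

identity_holds = all(
    quad(Q, x)
    == quad(Q_tilde, x) - sum(wi * xi for wi, xi in zip(w, x)) ** 2 * q_max
    for x in product((0, 1), repeat=len(Q))
)
print(Q_tilde, identity_holds)
```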


Since the term $b^2 \max_{i,j} |q_{ij}|$ is a constant, we can solve the initial problem using the matrix $\tilde{Q}$. As we showed above, the $\tilde{q}_{ij}$ are nonnegative, and hence we can still use the same technique to linearize the problem.

4.7 Applications of the Developed Linearization Technique

The RLT developed in the previous section can be applied to any special case of the MQIP problem. Herein, we give two examples of well-known combinatorial optimization problems (the maximum independent set problem and the maximum clique problem). Although these two problems have IP formulations, the developed RLT gives alternative MILP formulations, which appear to be more efficient.

4.7.1 Maximum Clique Problems

The integer programming formulation (edge formulation) of the maximum clique problem is given by

$$
\begin{aligned}
\max\ & f(x) = \sum_{i=1}^{n} w_i x_i \\
\text{s.t. } & x_i + x_j \le 1, \quad \forall (i,j) \in \bar{E}, && (4.80)\\
& x_i \in \{0,1\}, \quad i = 1,\dots,n.
\end{aligned}
$$

From section 4.4.4.2, the indefinite quadratic programming problem for the maximum clique is given by

$$\min f_G(x) = -\sum_{i=1}^{n} x_i + 2 \sum_{(i,j) \in \bar{E},\ i > j} x_i x_j = x^T (A_{\bar{G}} - I) x = x^T A x \quad \text{s.t. } x \in \{0,1\}^n, \qquad (4.81)$$

where $A = A_{\bar{G}} - I$ and $A_{\bar{G}}$ is the adjacency matrix of the graph $\bar{G}$. If $x^*$ solves Problem (4.81), then the set $C$ defined by $C = t(x^*)$ is a maximum clique of graph $G$ with $|C| = -f_G(x^*)$.


We then apply the KKT conditions linearization technique to the maximum clique problem. Let $M = \max_i \sum_{j=1}^{n} a_{ij} = \|A\|_\infty$. We then have the MILP formulation for the maximum clique problem given by

$$
\begin{aligned}
\min\ & \sum_{i=1}^{n} s_i \\
\text{s.t. } & \sum_{j=1}^{n} a_{ij} x_j + s_i + y_i = 0, \quad \text{for } i = 1,\dots,n && (4.82)\\
& y_i - M(1 - x_i) \le 0, \quad \text{for } i = 1,\dots,n
\end{aligned}
$$

where $x_i \in \{0,1\}$ and $s_i, y_i \ge 0$, for $i = 1,\dots,n$. The advantage of formulation (4.82) over the edge formulation (4.80) is its smaller number of constraints, especially for dense graphs. In other words, the number of constraints in the edge formulation is the number of edges in the graph, which can be $O(n^2)$ for a dense graph. On the other hand, the number of constraints in formulation (4.82) is $O(n)$.

4.7.2 Maximum Independent Set Problems

Recall from section 4.4.4.3 that the integer programming formulation (edge formulation) of the maximum independent set problem is given by

$$
\begin{aligned}
\max\ & f(x) = \sum_{i=1}^{n} w_i x_i \\
\text{s.t. } & x_i + x_j \le 1, \quad \forall (i,j) \in E, && (4.83)\\
& x_i \in \{0,1\}, \quad i = 1,\dots,n.
\end{aligned}
$$

Let $A = A_G - I$, where $A_G$ is the adjacency matrix of the graph $G$. We then have the quadratic formulation of the maximum independent set problem given by

$$\min f(x) = x^T A x \quad \text{s.t. } x_i \in \{0,1\},\ i = 1,\dots,n. \qquad (4.84)$$


We then apply the KKT conditions linearization technique to the maximum independent set problem. Let $M = \max_i \sum_{j=1}^{n} a_{ij} = \|A\|_\infty$. We then have the MILP formulation for the maximum independent set problem given by

$$
\begin{aligned}
\min\ & \sum_{i=1}^{n} s_i \\
\text{s.t. } & \sum_{j=1}^{n} a_{ij} x_j + s_i + y_i = 0, \quad \text{for } i = 1,\dots,n && (4.85)\\
& y_i - M(1 - x_i) \le 0, \quad \text{for } i = 1,\dots,n
\end{aligned}
$$

where $x_i \in \{0,1\}$ and $s_i, y_i \ge 0$, for $i = 1,\dots,n$. The advantage of formulation (4.85) over (4.83) is its smaller number of constraints, especially for dense graphs. In other words, the number of constraints in formulation (4.83) is the number of edges in the graph, which can be $O(n^2)$ for a dense graph. On the other hand, the number of constraints in formulation (4.85) is $O(n)$.
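The quadratic formulation (4.84) of Section 4.7.2 can be explored by brute force on a small graph. The sketch below is illustrative and rests on assumptions not stated in this section: the graph is unweighted, the example graph (a 5-cycle) is made up, and the function name `mis_quadratic_min` is hypothetical. With $A = A_G - I$, the value $x^T A x$ equals twice the number of edges inside the chosen vertex set minus the size of the set, so the minimum over 0-1 vectors is reached on an independent set.

```python
from itertools import product

def mis_quadratic_min(n, edges):
    """Minimize x^T (A_G - I) x over 0-1 vectors by enumeration.
    Returns (best_value, best_x); intended only for very small graphs."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = A[j][i] = 1          # adjacency matrix A_G
    for i in range(n):
        A[i][i] = -1                   # subtract the identity
    best_val, best_x = 0, (0,) * n     # x = 0 gives objective value 0
    for x in product((0, 1), repeat=n):
        val = sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
        if val < best_val:
            best_val, best_x = val, x
    return best_val, best_x

# Illustrative graph (assumption): a cycle on 5 vertices
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
best_val, best_x = mis_quadratic_min(5, edges)
chosen = [i for i, xi in enumerate(best_x) if xi == 1]
is_independent = all(not (i in chosen and j in chosen) for i, j in edges)
print(best_val, chosen, is_independent)
```

For the 5-cycle the largest independent sets have two vertices, and the minimum objective value is the negative of that size, mirroring the relation $|C| = -f_G(x^*)$ quoted for the clique formulation.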


CHAPTER 5
APPLICATIONS IN BIOENGINEERING: BRAIN DISORDERS

During the past few decades, neuroscientists believed that epileptic seizures began abruptly, just a few seconds before clinical onset. However, there is now growing evidence that seizures develop minutes to hours before clinical onset. This evidence is based on quantitative studies of long-term intracranial electroencephalographic (EEG) recordings from patients with epilepsy. The methods in those studies include frequency-based methods, statistical analysis of EEG signals, nonlinear dynamics (chaos theory), and intelligent engineered systems. Advances in seizure prediction give us the opportunity to develop implantable devices that are able to warn of impending seizures and to trigger therapy to prevent clinical epileptic seizures.

The organization of the succeeding sections of this chapter is as follows. An introduction to epilepsy is given in the next section. In section 5.2, the directions in epilepsy research, especially seizure prediction, are addressed, including the methods used to study the problem of seizure prediction. The motivation and goals of this research are given in section 5.3. Later, in section 5.4, we propose various quantitative methods for analyzing EEG time series. Statistical tests for spatiotemporal analysis are then proposed in section 5.5. The optimization techniques developed to identify the temporal pattern in EEG time series, based on dynamical system approaches, are discussed in section 5.6. In section 5.7, the datasets and the framework of the seizure warning algorithm are presented. The performance of the algorithm (sensitivity and false warning rate) applied to 5 patients is presented in section 5.8. The conclusions and performance, limitation, and possibility to develop


devices for diagnostic and therapeutic purposes of this algorithm are discussed in the final section 5.9.

5.1 Introduction to Epilepsy

Epilepsy is among the most common disorders of the nervous system and consists of more than 40 clinical syndromes affecting 50 million people worldwide (approximately 1% of the population). Epilepsy is characterized by intermittent seizures, that is, intermittent paroxysmal rhythmic electrical discharges within the cerebrum that disrupt normal brain function. Approximately 25 to 30% of patients receiving medication have inadequate seizure control. In other words, about 25% of patients with epilepsy have seizures that are resistant (refractory) to medical therapy. In some types of epilepsy (i.e., focal or partial epilepsy), a localized structural change in neuronal circuitry within the cerebrum produces organized quasi-rhythmic discharges. These discharges then spread from the region of origin (epileptogenic zone) to activate other areas of the cerebral hemisphere. Although the macroscopic and microscopic features of the epileptogenic zone have been characterized, the mechanism by which these fixed disturbances in local circuitry produce intermittent disturbances of brain function is not yet understood. While epilepsy occurs in all age groups, the highest incidences occur in infants and in the elderly. The most common type of epilepsy in adults is temporal lobe epilepsy. In this type of epilepsy, the temporal cortex, limbic structures and orbitofrontal cortex appear to play a critical role in the onset and spread of seizures. Temporal lobe seizures usually begin as paroxysmal electrical discharges in the hippocampus and often spread first to ipsilateral, then to contralateral cerebral cortex.
These abnormal discharges result in a variety of intermittent clinical phenomena, including motor, sensory, affective, cognitive, autonomic and psychic symptomatology. There is no single cause of epilepsy. In approximately 65% of cases, the causes are unknown. However, many factors can injure the nerve cells in the brain or the


way the nerve cells communicate with each other. The most frequently identified causes are genetic abnormalities, developmental anomalies, febrile convulsions, as well as brain insults such as craniofacial trauma, central nervous system infections, hypoxia, ischemia, and tumors. The hallmark of epilepsy is recurrent seizures, which can be characterized by the sudden development of synchronous neuronal firing (potentials) in the cerebral cortex that may begin locally in a portion of one cerebral hemisphere or begin simultaneously in both cerebral hemispheres. When neuronal networks are activated, they produce a change in voltage potential, which can be captured by an EEG. These changes are reflected by wriggling lines along the time axis in a typical EEG recording. Drawing on the EEG time series and on advances from the epilepsy field and other rapidly changing fields of biomedicine, we focus our research on the prediction of epileptic seizures in patients at risk, to enable effective and safe treatment for patients with epilepsy.

5.1.1 Classification of Seizures

There are many varieties of epileptic seizures, and seizure frequency and the form of attacks vary greatly from person to person. The most common classification scheme describes two major types of seizures:

1. "partial" seizure: a seizure that causes excessive electrical discharges in the brain limited to one area.
2. "generalized" seizure: a seizure in which the whole brain is involved with excessive electrical discharges.

Each of these categories can be divided into subcategories: simple partial, complex partial, tonic-clonic, and other types. With the most common types of seizures there is some loss of consciousness, but some seizures may only involve some movements of the body or strange feelings. Different people's seizures can be very different.
Common feelings include uncertainty, fear, physical and mental exhaustion, confusion, and memory loss. Sometimes if a person is unconscious, there may be no feeling at


all. Seizures can last anywhere from a few seconds to several minutes, depending on the type of seizure. In particular, a tonic-clonic seizure typically lasts 1–7 minutes. Absence seizures may only last a few seconds. Complex partial seizures range from 30 seconds to 2–3 minutes.

5.1.2 Mechanisms of Epileptogenesis

Epileptogenesis is considered to be a cascade of dynamic biological events altering the balance between excitation and inhibition in neural networks. It can apply to any of the progressive biochemical, anatomic, and physiologic changes leading up to recurrent seizures. Progressive changes are suggested by the existence of a so-called silent interval (years in duration) between CNS infection, head trauma or febrile seizures and the later appearance of epilepsy. Understanding these changes is key to preventing the onset of epilepsy [76]. Mechanisms of epileptogenesis are believed to incorporate information from levels of organization that range from molecular (e.g., altered gene expression) to macrostructural (e.g., altered neural networks). Since the possibilities are so diverse, a primary research goal is to sort out which mechanisms are causal, correlative, or consequential. The complexity can be intractable when, for example, a single seizure activates changes in the expression of many genes ranging from transcription factors to structural proteins. Moreover, mechanisms of plasticity may mask the initiating event. No animal model completely mimics the features of human epilepsy. Hypotheses for epilepsy prevention must incorporate observations about the intermittent nature of epilepsy, its age-specific features, variability in expression, delayed temporal onset ranging up to 15 years after an insult, and selective vulnerability of brain regions.
The potential role of protective factors is worth exploring because about 50% of patients fail to develop epilepsy even after severe penetrating brain injuries [76].


5.2 Directions in Epilepsy Research: Seizure Prediction

Ideas of predicting epileptic seizures began in the 1970s [156]; however, the technology then available did not give much insight until the 1990s. In 1988, the existence of a pre-ictal state before temporal-lobe seizures was discovered [74]. Subsequently, there has been a lot of interest in seizure prediction, along with several enabling developments: the wide acceptance of digital electroencephalographic (EEG) technology; maturation of methods for recording from intracranial electrodes to localize seizures; and the tremendous efficacy, acceptability, and commercial success of implantable medical devices, such as pacemakers, implantable cardiac defibrillators, and brain stimulators for Parkinson's disease, tremor, and pain [90]. It is very difficult to develop prospective analyses of epileptic seizure prediction because substantive studies are lacking; such studies need long-duration, high-quality datasets from a large number of patients implanted with intracranial electrodes; adequate storage and powerful computers for processing digital EEG datasets many gigabytes in length; and environments facilitating a smooth flow of clinical EEG data to powerful experimental computing facilities [90]. Next, we discuss some of the seizure-prediction studies in the literature. There have been many studies in time-domain analysis, including statistical analysis of particular EEG events and characterization of the EEG data. For example, the relationship between the number of interictal epileptiform discharges on EEG and oncoming seizures was investigated [49, 81, 161]. Later on, a long-term (several days) energy analysis showed that the number of energy bursts in the EEG appeared to increase over hours as seizures approached [91].
Frequency domain analysis is one of the seizure prediction techniques; it decomposes the EEG signal into components of different frequencies. In [91], bursts of activity in the range 15–25 Hz appeared to build from about 2 hours before seizure


onset in some patients with temporal lobe epilepsy. These burst activities seemed to change their frequency steadily (faster and slower) over time. One of the most well-known techniques in seizure prediction is based on nonlinear dynamics and chaos theory. These techniques show changes in the characteristics (dynamics) of the EEG waveform in the minutes leading up to seizures [75, 88]. They apply Takens' theorem, which states that the complete dynamics of a system can be reconstructed from a single measurement sequence (such as its trajectory over time), along with certain invariant properties [153]. This scheme allows us to embed signals into a phase space and observe some of the hidden characteristics of the signals. Nonlinear techniques showed that the trajectory of the EEG signals appeared to be more regular and organized before the clinical onset of the seizure than in the interictal state [70].

5.3 Motivation and Goals of Research

Patients with epilepsy may report impending seizures hours or days in advance, suggesting that techniques may eventually be developed to predict seizures before their occurrence. One approach is through quantitative analysis, based on chaos theory, of multichannel intracranial continuous EEG recordings acquired from patients with medically intractable temporal lobe epilepsy. Each record included a total of 28 to 32 intracranial electrodes (8 subdural and 6 hippocampal depth electrodes for each cerebral hemisphere). A diagram of standard electrode locations is provided in Figures 5–1 and 5–2.
The application of chaos theory and nonlinear analysis to the physical and chemical sciences has resolved some long-standing problems; for example, how to determine nonperiodic flow in the atmosphere [92], calculate a turbulent event in fluid dynamics [29], or quantify the pathway of a molecule during Brownian motion [47]. Since biology and medicine have unresolved problems, for example how to predict the occurrence of lethal arrhythmias or epilepsy, it may be suitable to consider the application of chaos


theory and nonlinear analysis in this area. Studies using nonlinear analysis of EEG recordings from depth electrodes, implanted in patients during evaluation for epilepsy surgery, may allow physicians to anticipate seizures with lead times of minutes (rather than the seconds previously thought to encompass seizure generation). Epileptic seizure occurrences seem to be random and unpredictable; however, recent studies in epileptic patients suggest that seizures are deterministic rather than random. Subsequently, studies of the spatiotemporal dynamics in EEGs from patients with temporal lobe epilepsy demonstrated a pre-ictal transition, characterized by a progressive convergence (entrainment) of dynamical measures (e.g., short-term maximum Lyapunov exponents, $STL_{max}$) at specific anatomical areas in the neocortex and hippocampus, of approximately 1/2 to 1 hour duration before the ictal onset [75, 69, 72, 66, 138, 136]. The existence of the pre-ictal transition is supported by subsequent work of other investigators [25, 30, 88, 131, 91]. There is also physiological support for the idea that seizures are predictable. Rajna and colleagues interviewed 562 patients and found that clinical prodromes or auras occurred in more than 50% of patients [132]. A significant increase in blood flow in the epileptic temporal lobe that started 10 minutes before seizure onset, and an increase in both temporal lobes 2 minutes before seizure onset, were shown in [159]. Although the existence of the pre-ictal transition period has recently been confirmed and further defined by other investigators, the characterization of this spatiotemporal transition is still far from complete. Therefore, the development of a model for the mechanism of generation of epileptic seizures remains a difficult task.
For example, even in the same patient, different sets of cortical sites may exhibit a pre-ictal transition from one seizure to the next. In addition, resetting of the entrainment of the normal sites with the epileptogenic focus (critical cortical sites) occurs after each seizure [73]. Therefore, it is postulated that complete or partial post-ictal resetting of pre-ictal entrainment of the epileptic brain affects the


Figure 5–1: Inferior transverse view of the brain, illustrating approximate depth and subdural electrode placement for EEG recordings. Subdural electrode strips are placed over the left orbitofrontal (LOF), right orbitofrontal (ROF), left subtemporal (LST), and right subtemporal (RST) cortex. Depth electrodes are placed in the left temporal depth (LTD) and right temporal depth (RTD) to record hippocampal activity.

Figure 5–2: Lateral views of the brain, illustrating approximate depth and subdural electrode placement for EEG recordings.


route to the subsequent seizure, contributing to the apparent non-stationary nature of the entrainment process. Even though a complete model of the process remains elusive, the dynamical measures we have used have resulted in the development of effective seizure prediction schemes. Because the EEG is a nonstationary signal, selection of the appropriate time scale is crucial. In addition, dynamical measures must properly weigh transients in the signal. In a spontaneously bursting neuronal network in the brain, chaos can be demonstrated by the presence of unstable fixed-point behavior. While Lyapunov exponents are among the global dynamical invariants studied for detecting chaos and nonlinear structure in time series analysis, local Lyapunov exponents, defined as the local divergence within a finite-time horizon, are a more useful measure of the predictability of nonlinear systems and a more powerful tool for testing the nonlinearity of time series. In a retrospective analysis (i.e., after a seizure occurrence) utilizing $STL_{max}$ as a dynamical measure and a global optimization technique to identify critical electrode sites, Iasemidis and group found that the pre-ictal transition preceded more than 91% of the seizures analyzed [75, 72]. This finding indicated that, if one knows which critical electrode sites will participate in the next pre-ictal transition, it may be possible to detect the transition in time to warn of an impending seizure. The results of these studies confirmed the predictability of seizures. Further studies have shown the existence of resetting of the brain after seizure onset [145, 73, 135], that is, divergence of $STL_{max}$ profiles after seizures. Therefore, to incorporate these findings, we have to ensure that the optimal group of critical sites shows this divergence.
The problem of identifying those critical areas is formulated as a multi-quadratic 0-1 programming problem. However, the method proposed in [75, 72] could not be applied to solve this problem. In this dissertation, we propose new computational approaches to solve the multi-quadratic 0-1 programming problem. To solve this practical optimization problem, we develop a new linearization


technique based on KKT optimality conditions. The details of this technique and the proof of global optimality were discussed in Chapter 4.

5.4 Data Mining in EEG Time Series

A challenging question in epilepsy is whether nonlinear methods, along with classical linear methods, can detect changes in the brain dynamics, with a focus on the prediction of the onset of epileptic seizures. In the past decade, nonlinear methods based on chaos theory were applied with some success. In this thesis, we compile a large set of nonlinear methods and assess the capability of each method in predicting epileptic seizures when applied to EEG time series. This research is motivated by the key time series data mining concepts of event characterization function, temporal pattern, temporal pattern cluster, time-delay embedding, phase space, augmented phase space, objective function, and optimization. The objective of this work is to investigate the underlying dynamics of the EEG signal and search for a reliable computational tool to detect changes in the EEG that suggest a possible forthcoming epileptic seizure.

5.4.1 The Method of Delays

A fundamental concept in nonlinear dynamics and chaos theory is the "method of delays" introduced by Takens [153]. In this method, each point in a time series of events or measurements is considered in the context of other events or measurements in the same series that are close in time. However, the problem of extracting geometric information from time series was first introduced by Packard et al. [105]. The basic idea of this approach is that the state of an $n$-dimensional dynamical system can be uniquely characterized by independent quantities.
One suc h set of indep enden t quan tities are the phase space ( n -dimensional spanned basis) co ordinates y ( t ) = ( y 1 ( t ) ; : : : ; y n ( t )) T but these are not a v ailable, since the only data is the one-dimensional time series. Based on the conjecture that an y p -tuple of n um b ers should giv e equiv alen t results (in the sense that, if one reconstructs sev eral phase p ortraits in accordance with this


idea, then for any two of these phase portraits there should be a diffeomorphism which maps one onto the other). Similar to the approach by Packard, the method of delays can be used to construct $p$-vectors which contain the same information as the original state vectors. This method simply takes consecutive elements of the time series directly as coordinates in phase space. For instance, a dynamical system can be analyzed by taking one coordinate of phase space as the observable:

\[ X_i = \left( x(t_i),\ x(t_i + \tau),\ \ldots,\ x(t_i + (p-1)\tau) \right)^T \tag{5.1} \]

where $\tau$ is the selected time lag between the components of each vector in the phase space, $p$ is the selected dimension of the embedding phase space, and $t_i \in [1,\ T - (p-1)\tau]$.

5.4.2 Estimation of Short-Term Maximum Lyapunov Exponents

The method we developed for estimation of Short-Term Maximum Lyapunov Exponents ($STL_{\max}$), an estimate of $L_{\max}$ for nonstationary data, is explained in detail elsewhere [69, 68, 162]. Herein we present only a short description of our method. Construction of the embedding phase space from a data segment $x(t)$ of duration $T$ is made by the method of delays described above. The geometrical properties of the phase portrait of a system can be expressed quantitatively using measures that ultimately reflect the dynamics of the system. For example, the complexity of an attractor is reflected in its dimension [50]. The larger the dimension of an attractor, the more complicated it appears in the phase space. The embedding dimension $p$ is the dimension of the phase space that contains the attractor and it is always a positive integer. On the other hand, the attractor's dimension $D$ may be a positive non-integer (fractal). $D$ is directly related to the number of variables of the system and is inversely related to the existing coupling among them.
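As an illustration of Eq. (5.1), the following minimal Python sketch builds the delay vectors from a sampled series. It assumes that the lag $\tau$ is expressed as an integer number of samples rather than in time units. The function name `delay_embed` and the toy sine series are hypothetical and are not part of the original analysis.

```python
import math


def delay_embed(x, p, tau):
    """Return the delay vectors X_i = (x[i], x[i+tau], ..., x[i+(p-1)*tau]).

    p   : embedding dimension (assumed to be a positive integer)
    tau : time lag in samples (assumption: Eq. (5.1) states it in time units)
    """
    if p < 1 or tau < 1:
        raise ValueError("p and tau must be positive integers")
    span = (p - 1) * tau              # samples between first and last component
    n_vectors = len(x) - span         # number of start indices that fit
    if n_vectors <= 0:
        raise ValueError("series too short for this p and tau")
    return [tuple(x[i + k * tau] for k in range(p)) for i in range(n_vectors)]


# Toy data (hypothetical): a 10 Hz sine sampled at 200 Hz for one second.
fs = 200
series = [math.sin(2 * math.pi * 10 * n / fs) for n in range(fs)]
vectors = delay_embed(series, p=3, tau=4)
print(len(vectors), vectors[0])
```

With $p = 3$ and a lag of 4 samples, each vector spans $(p-1)\tau = 8$ samples, so the 200-sample series yields 192 delay vectors.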


From Takens' theorem, the embedding dimension $p$ should be at least equal to $(2D + 1)$ in order to correctly embed an attractor in the phase space. Of the many different methods used to estimate the dimension $D$ of an object in the phase space, each has its own practical problems [50]. The measure most often used to estimate $D$ is the phase space correlation dimension. Methods for calculating the correlation dimension from experimental data have been described by Kostelich [84] and were employed in our work to approximate $D$ of the epileptic attractor. In the EEG data we have analyzed to date, $D$ is found to be between 2 and 3 during an epileptic seizure. Therefore, for the reconstruction of the phase space we have used an embedding dimension $p$ of 7.

An attractor is chaotic if, on the average, orbits originating from similar initial conditions (nearby points in the phase space) diverge exponentially fast (expansion process). If these orbits belong to an attractor of finite size, they will fold back into it as time evolves (folding process). The result of these two processes may be a stable topologically layered attractor. When the expansion process overcomes the folding process in some eigen-directions of the attractor, the attractor is called chaotic. The measures that quantify the chaoticity of an attractor are the Kolmogorov entropy ($K$) and the Lyapunov exponents, typically measured in bits/sec. For an attractor to be chaotic, the Kolmogorov entropy or at least the maximum Lyapunov exponent ($L_{\max}$) must be positive. The Kolmogorov (Sinai or metric) entropy ($K$), in bits/sec units, measures the uncertainty about the future state of the system given information about its previous states in the state space. The Lyapunov exponents measure this average uncertainty along the local eigenvectors of an attractor in the state space. If the phase space is of $p$ dimensions, we can estimate theoretically up to $p$ Lyapunov exponents. However, as expected, only $(D + 1)$ of them will be real. The rest will be spurious [1]. Methods for calculating these dynamical measures from experimental data have been published [1, 162, 65]. The estimation of $L_{\max}$ in a chaotic system has


been shown to be more reliable and reproducible than the estimation of the remaining exponents [52], especially when $D$ is unknown and changes over time, as is the case with high-dimensional and nonstationary data. If we denote by $L$ the estimate of the short-term largest Lyapunov exponent $STL_{\max}$, then:

\[ L = \frac{1}{N_a \Delta t} \sum_{i=1}^{N_a} \log_2 \frac{|\delta X_{i,j}(\Delta t)|}{|\delta X_{i,j}(0)|} \tag{5.2} \]

with

\[ \delta X_{i,j}(0) = X(t_i) - X(t_j) \tag{5.3} \]

\[ \delta X_{i,j}(\Delta t) = X(t_i + \Delta t) - X(t_j + \Delta t) \tag{5.4} \]

where

(i) $X(t_i)$ is the point of the fiducial trajectory $\phi_t(X(t_0))$ with $t = t_i$, $X(t_0) = (x(t_0), \ldots, x(t_0 + (p-1)\tau))^T$, $T$ denotes the transpose, and $X(t_j)$ is a properly chosen vector adjacent to $X(t_i)$ in the phase space (see Figures 5-3 and 5-4).

(ii) $\delta X_{i,j}(0) = X(t_i) - X(t_j)$ is the displacement vector at $t_i$, that is, a perturbation of the fiducial orbit at $t_i$, and $\delta X_{i,j}(\Delta t) = X(t_i + \Delta t) - X(t_j + \Delta t)$ is the evolution of this perturbation after time $\Delta t$.

(iii) $t_i = t_0 + (i-1)\Delta t$ and $t_j = t_0 + (j-1)\Delta t$, where $i \in [1, N_a]$ and $j \in [1, N]$ with $j \neq i$.

(iv) $\Delta t$ is the evolution time for $\delta X_{i,j}$, that is, the time one allows $\delta X_{i,j}$ to evolve in the phase space. If the evolution time $\Delta t$ is given in sec, then $L$ is in bits per second.

(v) $t_0$ is the initial time point of the fiducial trajectory and coincides with the time point of the first data in the data segment of analysis. In the estimation of $L$, for a complete scan of the attractor, $t_0$ should move within $[0, \Delta t]$.

(vi) $N_a$ is the number of local $L_{\max}$'s that will be estimated within a data segment of duration $T$. Therefore, if $Dt$ is the sampling period of the time domain data, $T = (N-1)Dt = N_a \Delta t + (p-1)\tau$.

We computed the short-term largest Lyapunov exponent $STL_{\max}$ using the method proposed by Iasemidis et al. [65], which is a modification of the method by


Figure 5-3: A 10 sec segment of raw EEG data during an epileptic seizure recorded from electrode RTD2 in a patient with right temporal lobe epilepsy.

Figure 5-4: The epileptic attractor of the EEG data corresponding to Figure 5-3 in the reconstructed phase space with embedding dimension $p = 3$ and time delay $\tau = 20$ msec, in two different orientations.


Wolf et al. [162]. We call the measure short term to distinguish it from those used to study autonomous dynamical systems. Modification of Wolf's algorithm is necessary to better estimate $STL_{\max}$ in small data segments that include transients, such as interictal spikes. The modification is primarily in the searching procedure for a replacement vector at each point of a fiducial trajectory. For example, in our analysis of the EEG, we found that the crucial parameter of the $L_{\max}$ estimation procedure, in order to distinguish between the pre-ictal, the ictal and the post-ictal stages, was neither the evolution time $\Delta t$ nor the angular separation $V_{i,j}$ between the evolved displacement vector $\delta X_{i-1,j}(\Delta t)$ and the candidate displacement vector $\delta X_{i,j}(0)$ (as was claimed in Frank et al. [41]). The crucial parameter is the adaptive estimation in time and phase space of the magnitude bounds of the candidate displacement vector to avoid catastrophic replacements. Results from simulation data of known attractors have shown the improvement in the estimates of $L$ achieved by using the proposed modifications [65]. Our rules can be stated as follows:

1. For $L$ to be a reliable estimate of $STL_{\max}$, the candidate vector $X(t_j)$ should be chosen such that the previously evolved displacement vector $\delta X_{(i-1),j}(\Delta t)$ is almost parallel to the candidate displacement vector $\delta X_{i,j}(0)$, that is,

\[ |V_{i,j}| = \left| \langle \delta X_{i,j}(0),\ \delta X_{(i-1),j}(\Delta t) \rangle \right| \le V_{\max} \tag{5.5} \]

where $V_{\max}$ should be small and $|\langle \xi, \phi \rangle|$ denotes the absolute value of the angular separation between two vectors $\xi$ and $\phi$ in the phase space.

2. For $L$ to be a reliable estimate of $STL_{\max}$, $\delta X_{i,j}(0)$ should also be small in magnitude in order to avoid computer overflow in the future evolution within very chaotic regions and to reduce the probability of starting up with points on separatrices [163]. This means,

\[ |\delta X_{i,j}(0)| = |X(t_i) - X(t_j)| \le \Delta_{\max} \tag{5.6} \]

with $\Delta_{\max}$ assuming small values.

Therefore, the parameters to be selected for the estimation procedure of $L$ are:


(i) The embedding dimension $p$ and the time lag $\tau$ for the reconstruction of the phase space

(ii) The evolution time $\Delta t$ (number of iterations $N_a$)

(iii) The parameters for the selection of $X(t_j)$, that is, $V_{i,j}$ and $\Delta_{\max}$

(iv) The duration of the data segment $T$

Note that since only vector differences are involved in the estimation of $L$, any direct current (DC) component present in the data segment of interest does not influence the value of $L$. In addition, only vector difference ratios participate in the estimation of $L$. This means that $L$ is also not influenced by the scaling of the data (as long as the parameters involved in the estimation procedure, i.e., $\Delta_{\max}$, do not assume absolute values but values relative to the scale of every analyzed data segment). Both points above make sense when one recalls that $L$ relates to the entropy rate of the data [106].

5.4.2.1 Selection of $p$ and $\tau$: We select the embedding dimension $p$ such that the dimension $\nu$ of the attractor in phase space is clearly defined. In the case of the epileptic attractor [74, 75, 65], $\nu$ = 2 to 3, and according to Takens' theorem a value of $p \ge (2 \times 3 + 1) = 7$ is adequate for the embedding of the epileptic attractor in the phase space. This value of $p$ may be too small for the construction of a phase space that can embed all states of the brain interictally, but it should be adequate for detection of the transition of the brain toward the ictal stage if the epileptic attractor is active in its space prior to the occurrence of the epileptic seizure. The parameter $\tau$ should be small enough to capture the shortest change (i.e., highest frequency component) present in the data. Also, $\tau$ should be large enough to generate (with the method of delays) the maximum possible independence between the components of the vectors in the phase space. These two conditions are usually addressed by selecting $\tau$ as the first minimum of the mutual information between the components of the vectors in the phase space or as the first zero of the time domain autocorrelation function of the


data [1]. Theoretically, since the time span $(p-1)\tau$ of each vector in the phase space represents the duration of a state of the system, $(p-1)\tau$ should be at most equal to the period of the maximum (or dominant) frequency component in the data. For example, a sine wave (or a limit cycle) has $\nu = 1$, so $p = 2 \times 1 + 1 = 3$ is needed for the embedding and $(p-1)\tau = 2\tau$ should be equal to the period of the sine wave. Such a value of $\tau$ would then correspond to the Nyquist sampling of the sine wave in the time domain. In the case of the epileptic attractor, the highest frequency present is 70 Hz (the EEG data are low-pass filtered at 70 Hz), which means that if $p = 3$, the maximum $\tau$ to be selected is about 7 ms (half of the roughly 14 ms period of a 70 Hz component). However, since the dominant frequency of the epileptic attractor (i.e., during the ictal period) was never more than 12 Hz, whose period is about 83 ms, the above reasoning gives $(7-1)\tau = 83$ ms for the reconstruction of the phase space of the epileptic attractor, that is, $\tau$ should be about 14 ms (for more details see [69]).

5.4.2.2 Selection of $\Delta t$: The evolution time $\Delta t$ should not be too large, otherwise the folding process within the attractor adversely influences $L$. On the other hand, it should not be too small, in order for $\delta X_{i,j}(\Delta t)$ to follow the direction of the maximum rate of information change. If there is a dominant frequency component $f_0$ in the data, $\Delta t$ is usually chosen as $\Delta t = \frac{1}{2 f_0}$. Then, according to the previous arguments for the selection of $p$ and $\tau$, $\Delta t \approx ((p-1)\tau)/2$, which for EEG results in $\Delta t \approx 42$ msec. In Ref. [69], it is shown that such a value is within the range of values for $\Delta t$ that can very well distinguish the ictal from the pre-ictal state.

5.4.2.3 Selection of $V_{\max}$: We start with an initial $V_{\max}(\text{initial}) = 0.1$ rad. In the case that a replacement vector $X(t_j)$ is not found with $0 \le |V_{i,j}| < V_{\max}(\text{initial})$ and $|\delta X_{i,j}(0)| < 0.1\,\Delta_{\max}$, we relax the bound for $|\delta X_{i,j}(0)|$ and repeat the process with bounds up to $0.5\,\Delta_{\max}$. If not successful, we relax the bounds for $|V_{i,j}|$ by doubling the value of $V_{\max}$ and


repeat the process with bounds for $V_{\max}$ up to 1 rad. Values of $V_{\max}$ larger than 0.8 rad never occurred in the procedure. If they do, the replacement procedure stops, a local $L(t_i)$ is not estimated at $t_i$, and we start the whole procedure at the next point in the fiducial trajectory.

5.4.2.4 Selection of $\Delta_{\max}$: In Wolf's algorithm, $\Delta_{\max}$ is selected as

\[ \Delta_{\max} = \max_{j,i} |\delta X_{i,j}(0)| \tag{5.7} \]

where $j = 1, \ldots, N$ and $i = 1, \ldots, N_a$. Thus, $\Delta_{\max}$ is the global maximum distance between any two vectors in the phase space of a segment of data. This works fine as long as the data are stationary and relatively uniformly distributed in the phase space. With real data this is hardly the case, especially with the brain electrical activity, which is strongly nonstationary and nonuniform [15, 36, 77]. Therefore, a modification in the searching procedure for the appropriate $X(t_j)$ is essential. First, an adaptive estimation of $\Delta_{\max}$ is made at each point $X(t_i)$, and the estimated variable is

\[ \Delta_{i,\max} = \max_{j} |\delta X_{i,j}(0)| \tag{5.8} \]

where $j = 1, \ldots, N$. By estimating $\Delta_{\max}$ as above, we take care of the nonuniformity of the phase space ($\Delta_{\max}$ is now a spatially local quantity of the phase space at a point $X(t_i)$) but not of the effect of existing nonstationarities in the data. We have attempted to solve the problem of nonstationarities by estimating $\Delta_{\max}$ also as a temporally local quantity. Then, a more appropriate definition for $\Delta_{\max}$ is:

\[ \Delta_{i,\max} = \max_{IDIST_1 < |t_i - t_j| < IDIST_2} |\delta X_{i,j}(0)| \tag{5.9} \]

and

\[ IDIST_1 = \tau \tag{5.10} \]

\[ IDIST_2 = (p-1)\tau \tag{5.11} \]

where $IDIST_1$ and $IDIST_2$ are the lower and upper bounds for $|t_i - t_j|$, that is, for the temporal window of local search. Thus, the search for $\Delta_{i,\max}$ is always made temporally about the state $X(t_i)$ and its changes within a period of the time span $(p-1)\tau$ of a state. According to the previous formulae, the values for the parameters involved in the adaptive estimation of $\Delta_{i,\max}$ in our EEG data are: $IDIST_1 = \tau = 14$ msec and $IDIST_2 = (p-1)\tau = 84$ msec.

5.4.2.5 Selection of $X(t_j)$: The replacement vector $X(t_j)$ should be spatially close to $X(t_i)$ in phase space (with respect to magnitude and angle deviation), as well as temporally not very close to $X(t_i)$, to allow selecting $X(t_j)$ from a nearby (but not the same) trajectory (otherwise, replacing one state with one that shares common components would lead to a false underestimation of $L$). The above two arguments are implemented in the following relations:

\[ 0 \le |V_{i,j}| < V_{i,j}(\text{initial}) = 0.1 \text{ rad} \tag{5.12} \]

\[ b \cdot \Delta_{i,\max} \le |\delta X_{i,j}(0)| \le c \cdot \Delta_{i,\max} \tag{5.13} \]

\[ |t_i - t_j| > IDIST_3 \approx (p-1)\tau \tag{5.14} \]

The parameter $c$ starts with a value of 0.1 and increases, with a step of 0.1, up to 0.5, in order to find a replacement vector $X(t_j)$ satisfying Eq. (5.12) through Eq. (5.14). The parameter $b$ must be smaller than $c$ and is used to account for the possible noise contamination of the data, denoting the distance below which the estimation of $L$ may be inaccurate (we have used $b = 0.05$ for our data [69, 162]). The temporal bound


$IDIST_2$ should not be confused with the temporal bound $IDIST_3$, since $IDIST_2$ is used to find the appropriate $\Delta_{i,\max}$ at each point $X(t_i)$ (searching over a limited time interval), whereas $IDIST_3$ is used to find the appropriate $X(t_j)$ within a $\Delta_{i,\max}$ distance from $X(t_i)$ (searching over all possible times $t_j$).

Figure 5-5: The variation of $STL_{\max}$ with the length $T$ of the data segment for data in the pre-ictal and ictal state of an epileptic seizure (patient 1). [Plot: $L$ in bit/sec versus segment duration $T$ in sec; curves: Preictal, Ictal.] The rest of the parameters for the $STL_{\max}$ algorithm were: $p = 7$, $\tau = 14$ msec, $\Delta t = 42$ msec, $IDIST_2 = 84$ msec, $IDIST_3 = 84$, $b = 0.05$, $c = 0.1$, and $V_{i,j}(\text{initial}) = 0.1$ rad.

5.4.2.6 Selection of $T$: For data obtained from a stationary state of the system, the time duration $T$ of the analyzed segment of data may be large for the estimate of $L$ to converge to a final value. For nonstationary data we have two competing requirements: on one hand we want $T$ to be as small as possible to provide local dynamic information, but on the other hand the algorithm requires a minimum length of the data segment to stabilize the estimate of $STL_{\max}$. Figure 5-5 shows a typical plot for the change of $STL_{\max}$ with the size of the window for segments in the pre-ictal and ictal stage of a seizure. From this figure, it is clear that values of 10 to 12 seconds for $T$ are adequate


to distinguish between the two extreme cases (pre-ictal and ictal) in our data and for the algorithm to converge.

Figure 5-6: The variation of $STL_{\max}$ with the $IDIST_2$ parameter for data in the pre-ictal and ictal state of an epileptic seizure. [Plot: $L$ in bit/sec versus $IDIST_2$ in msec; curves: Preictal, Ictal.] The rest of the parameters for the $STL_{\max}$ algorithm were: $p = 7$, $\tau = 14$ msec, $\Delta t = 42$ msec, $IDIST_1 = 14$ msec, $IDIST_3 = 84$, $b = 0.05$, $c = 0.1$, and $V_{i,j}(\text{initial}) = 0.1$ rad.

After extensive sensitivity studies with EEG data in epilepsy [65, 69] we have concluded that the critical parameter of the algorithm just presented is $IDIST_2$, that is, the parameter that establishes a neighborhood in time at each point in the fiducial trajectory for the estimation of the parameter $\Delta_{i,\max}$, which then establishes a spatial neighborhood for this point in the phase space. This is very clearly illustrated in Figure 5-6. It is obvious that, with values of $IDIST_2$ greater than 160 msec, one is not able to distinguish between the pre-ictal and the ictal state of a seizure based on the thus generated values of $L$.

By dividing the recorded EEG data at an electrode site into sequential non-overlapping segments, each 10.24 sec in duration, and estimating $STL_{\max}$ for each of these segments, profiles of $STL_{\max}$ over time are generated. A typical plot of $STL_{\max}$ over time obtained from an electrode site (RTD2) within the epileptogenic


focus is shown in Figure 5-7.

Figure 5-7: Smoothed $STL_{\max}$ profiles over 2 hours derived from an EEG signal recorded at RTD2 (patient 1). [Plot: $L_{\max}$ in bit/sec versus time in minutes; seizure marked SZ#10.] Seizure 10 started and ended between the two vertical dashed lines. The estimation of the $L_{\max}$ values was made by dividing the signal into non-overlapping segments of 10.24 sec each, using $p = 7$ and $\tau = 20$ msec for the phase space reconstruction. The smoothing was performed by a 10 point (1.6 minutes) moving average window over the generated $STL_{\max}$ profiles.

The exponent $STL_{\max}$ is positive during the whole period of recording, that is, 94 minutes pre-ictally (prior to the onset of the tenth recorded seizure), 2 minutes ictally (during the seizure) and 24 minutes post-ictally (after the seizure ends). The seizure onset corresponds to the maximum drop in the values of $STL_{\max}$, thus the seizure can be detected from the lowest values of $STL_{\max}$. However, ictal $STL_{\max}$ is still positive, implying a chaotic state even during the seizure. This is consistent with an interpretation of the seizure being a chaotic state with fewer degrees of freedom than before. Comparing the mean value of $STL_{\max}$ in the pre-ictal state with the one in the post-ictal state, we can say that the pre-ictal state is less chaotic than the immediate post-ictal one. In the pre-ictal state depicted in Figure 5-7, one can notice a pre-ictal trend of $STL_{\max}$ toward lower values over 1.7 hours prior to seizure 1 with one prominent drop in the values of $STL_{\max}$ at about 0.45 hours prior to the seizure. This pre-ictal drop in $STL_{\max}$ can be explained as


an attempt of the system toward a phase transition long before the actual transition to the seizure.

5.4.3 Estimation of Dynamical Phase (Angular Frequency)

Motivated by the representation of a state as a vector in the state space, we have defined the difference in phase between two evolved states $X(t_i)$ and $X(t_i + \Delta t)$ as $\Delta\Phi_i$ [71]. Then, denoting with $\Delta\Phi$ the average of the local phase differences $\Delta\Phi_i$ between the vectors in the state space, we have:

\[ \Delta\Phi = \frac{1}{N_\alpha} \sum_{i=1}^{N_\alpha} \Delta\Phi_i \tag{5.15} \]

where $N_\alpha$ is the total number of phase differences estimated from the evolution of $X(t_i)$ to $X(t_i + \Delta t)$ in the state space, according to:

\[ \Delta\Phi_i = \left| \arccos \frac{X(t_i) \cdot X(t_i + \Delta t)}{\| X(t_i) \| \cdot \| X(t_i + \Delta t) \|} \right| . \tag{5.16} \]

Then, the average angular frequency $\Omega$ is:

\[ \Omega = \frac{1}{\Delta t} \cdot \Delta\Phi . \tag{5.17} \]

If $\Delta t$ is given in sec, then $\Omega$ is given in rad/sec. Thus, while $STL_{\max}$ measures the local stability of the state of the system on average, $\Omega$ measures how fast a local state of the system changes on average (e.g., dividing $\Omega$ by $2\pi$, the rate of change of the state of the system is expressed in $\text{sec}^{-1} = \text{Hz}$). An example of a typical $\Omega$ profile over time is given in Figure 5-8. The values are estimated from a 60-minute-long EEG sample recorded from an electrode located in the epileptogenic hippocampus. The EEG sample includes a 2-minute seizure that occurs in the middle of the recording. The state space was reconstructed from sequential, non-overlapping EEG data segments of 2048 points (sampling frequency 200 Hz, hence each segment is 10.24 sec in duration) with $p = 7$ and $\tau = 4$, as for the estimation of the $STL_{\max}$ profiles [71]. The pre-ictal, ictal and post-ictal states


correspond to medium, high and lower values of $\Omega$ respectively. The highest $\Omega$ values were observed during the ictal period, and higher $\Omega$ values were observed during the pre-ictal period than during the post-ictal period. This pattern roughly corresponds to the typical observation of higher frequencies in the original EEG signal ictally, and lower EEG frequencies post-ictally. However, these observations can hardly denote a long-term warning of an impending seizure.

Figure 5-8: A typical $\Omega$ profile before, during and after an epileptic seizure, estimated from the EEG recorded from a site in the epileptogenic hippocampus; the seizure occurred between the vertical lines.

5.4.4 Entropy and Information

The entropy $H(p)$ of a probability density $p$ is

\[ H(p) = -\int p(x) \log p(x)\, dx . \tag{5.18} \]

The entropy of a distribution over a discrete domain is

\[ H(p) = -\sum_i p_i \log p_i . \tag{5.19} \]

The entropy of EEG data describes the extent to which the distribution is concentrated on small sets. If the entropy is low, then the distribution is concentrated


at a few values of $x$. It may, however, be concentrated on several sets far from each other. Thus, the variance can be large while the entropy is small. The average entropy depends explicitly on $p$. The entropy can be written as

\[ H(p) = -\langle \log p(x) \rangle . \tag{5.20} \]

The entropy can thus be viewed as a self-moment of the probability, in contrast to the ordinary moments (for example, $\langle x^2 \rangle$), which are averages over quantities of $x$ alone; it therefore provides a different kind of information than the ordinary moments. The entropy measures the degree of surprise one should feel upon learning the results of a measurement. It counts the number of possible states, weighting each by its likelihood. Joint and conditional entropies can be calculated by

\[ H(x, y) = -\int p(x, y) \log p(x, y)\, dx\, dy , \tag{5.21} \]

\[ H(x \mid y) = -\int p(x \mid y) \log p(x \mid y)\, dx . \tag{5.22} \]

The negative of entropy is sometimes called information:

\[ I(p) = -H . \tag{5.23} \]

For our purposes the distinction between the two is semantic. Entropy is most often used to refer to measures, whereas information typically refers to probabilities. For two random variables $x$ and $y$, the mutual information states how much information $y$ gives about $x$ that is not present in $x$ alone:

\[ I(x, y) = H(x) - H(x \mid y) . \tag{5.24} \]

Here $H(x)$ is the uncertainty in $x$ and $H(x \mid y)$ is the uncertainty in $x$ given $y$. Since knowing $y$ cannot make $x$ more uncertain, $H(x) \ge H(x \mid y)$. The mutual information is therefore nonnegative. The mutual information tells how much the uncertainty of $x$ is decreased by knowing $y$.
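The discrete entropy of Eq. (5.19) and the mutual information of Eq. (5.24) can be illustrated with a minimal Python sketch on paired discrete samples. It assumes empirical (relative-frequency) probabilities and natural logarithms, and it evaluates the conditional entropy through the standard identity $H(x \mid y) = H(x, y) - H(y)$, which is not stated in the text above. The function names and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(samples):
    """Shannon entropy (in nats) of the empirical distribution, Eq. (5.19)."""
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in Counter(samples).values())


def conditional_entropy(xs, ys):
    """H(x|y) via the identity H(x|y) = H(x, y) - H(y) for paired samples."""
    return entropy(list(zip(xs, ys))) - entropy(ys)


def mutual_information(xs, ys):
    """I(x, y) = H(x) - H(x|y), Eq. (5.24)."""
    return entropy(xs) - conditional_entropy(xs, ys)


# Toy paired observations (hypothetical).
xs = [0, 0, 1, 1]
mi_dependent = mutual_information(xs, [0, 0, 1, 1])    # y determines x
mi_independent = mutual_information(xs, [0, 1, 0, 1])  # y carries no information on x
print(mi_dependent, mi_independent)
```

In the first case knowing $y$ removes all uncertainty about $x$, so the mutual information equals $H(x) = \ln 2$ up to rounding; in the second case it vanishes up to rounding.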


Figure 5-9: Plot of the $L_{\max}$ over time derived from an EEG signal recorded at BL1, an electrode site overlying the seizure focus.

Figure 5-10: Plot of the entropy over time derived from an EEG signal (corresponding to Figure 5-9) recorded at BL1, an electrode site overlying the seizure focus.
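The text does not specify how the entropy profile of Figure 5-10 was computed. As one plausible reading, the sketch below evaluates Eq. (5.19) on an amplitude histogram of consecutive non-overlapping windows. The bin count, the fixed histogram range, the window length of 2048 samples, and the synthetic two-regime signal are all assumptions made for illustration only.

```python
import math
import random


def histogram_entropy(window, n_bins, lo, hi):
    """Entropy (nats) of a fixed-range histogram estimate of the amplitudes."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in window:
        k = min(int((v - lo) / width), n_bins - 1)  # put v == hi in the last bin
        counts[k] += 1
    n = len(window)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def entropy_profile(signal, window_len, n_bins):
    """Entropy of each consecutive non-overlapping window of the signal."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        raise ValueError("constant signal: histogram range is empty")
    starts = range(0, len(signal) - window_len + 1, window_len)
    return [histogram_entropy(signal[s:s + window_len], n_bins, lo, hi)
            for s in starts]


# Synthetic signal (hypothetical): a broad-amplitude regime followed by a
# regime whose amplitudes are concentrated in a narrow band.
random.seed(0)
signal = ([random.uniform(-1.0, 1.0) for _ in range(2048)]
          + [random.uniform(-0.1, 0.1) for _ in range(2048)])
profile = entropy_profile(signal, window_len=2048, n_bins=20)
print(profile)
```

The second window, whose amplitudes occupy only a few bins, receives a lower entropy than the first, in line with the remark that low entropy indicates a distribution concentrated on small sets.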


In Figures 5-9 and 5-10, examples of the $STL_{\max}$ and entropy profiles from EEG recordings of electrode BL1 over a 2-hour interval including one seizure are illustrated.

5.4.5 Approximate Entropy (ApEn)

Recently, approximate entropy (ApEn), a statistical measure for quantifying the regularity/complexity of a time series, has been widely applied in medical data analysis [122, 126]. It can be used to differentiate between normal and abnormal data in instances where moment statistics (e.g., mean and variance) fail to show a meaningful difference, such as in heart rate analysis in the human neonate [128, 124], and in epileptic activity analysis in electrocardiographic recordings [25]. Mathematically, as part of a general theoretical development, ApEn has been shown to be the rate of entropy for an approximating Markov chain to a process [123]. Most importantly, compared with the Kolmogorov-Sinai (K-S) entropy [83], ApEn is generally finite and has been shown to classify the complexity of a system from as few as 1000 data points, based on calculations that included theoretical analyses of both stochastic and deterministic chaotic processes [122, 127] and clinical applications [125, 80]. The main steps of ApEn analysis are described as follows:

1. Given a time series $U = \{u_1, u_2, \ldots, u_n\}$, measured equally spaced in time.

2. Fix $l$, an integer, and $r$, a positive real number. The value of $l$ is the length of compared subsequences in $U$, and $r$ specifies a tolerance level. The choices of $l$ and $r$ may vary for different applications of ApEn.

3. Define $x_i = \{u_i, u_{i+1}, \ldots, u_{i+l-1}\}$. Form a sequence of vectors $x_1, x_2, \ldots, x_{n-l+1}$ in $R^l$. These vectors are subsequences with length $l$ in $U$.

4. Use the sequences $x_1, x_2, \ldots, x_{n-l+1}$ to construct, for each $i$, $1 \le i \le n-l+1$,

\[ C_i^l(r) = \frac{\text{number of } x_j\text{'s such that } d(x_i, x_j) \le r}{n-l+1} , \]

where $d(x_i, x_j) = \max_{0 \le k \le l-1} |u_{i+k} - u_{j+k}|$, i.e., $d(x_i, x_j)$ represents the maximum distance between vectors $x_i$ and $x_j$ in their respective scalar components.

5. Define

\[ \Phi^l(r) = \frac{\sum_{i=1}^{n-l+1} \ln C_i^l(r)}{n-l+1} . \]

6. The approximate entropy is then defined by

\[ \mathrm{ApEn} = \Phi^l(r) - \Phi^{l+1}(r) . \]

7. Note that

\[
\begin{aligned}
-\mathrm{ApEn} &= \Phi^{l+1}(r) - \Phi^{l}(r) \\
&= \frac{\sum_{i=1}^{n-l} \ln C_i^{l+1}(r)}{n-l} - \frac{\sum_{i=1}^{n-l+1} \ln C_i^{l}(r)}{n-l+1} \\
&\approx \frac{1}{n-l} \sum_{i=1}^{n-l} \left( \ln C_i^{l+1}(r) - \ln C_i^{l}(r) \right) \\
&= \frac{1}{n-l} \sum_{i=1}^{n-l} \ln \frac{C_i^{l+1}(r)}{C_i^{l}(r)} ,
\end{aligned}
\]

and

\[
\frac{C_i^{l+1}(r)}{C_i^{l}(r)} = \frac{\Pr(|u_{j+k} - u_{i+k}| \le r,\ k = 0, 1, \ldots, l)}{\Pr(|u_{j+k} - u_{i+k}| \le r,\ k = 0, 1, \ldots, l-1)} = \Pr\left(|u_{j+l} - u_{i+l}| \le r \ \middle|\ |u_{j+k} - u_{i+k}| \le r,\ k = 0, 1, \ldots, l-1\right) .
\]

Heuristically, ApEn quantifies the (logarithmic) likelihood that subsequences of patterns in $U$ that are close will remain close on the next increment. A lower ApEn value indicates that the given time series is more regular and correlated, and a larger ApEn value means that it is more complex and independent. However, ApEn can only be used to quantify the overall regularity in a time series. It cannot be applied to detect particularly meaningful signals or patterns in a time series or to


make predictions. The next section reviews several methodologies for the analysis and forecasting in nonstationary time series data.

5.4.6 Kolmogorov-Sinai Entropy

For chaotic systems, the Kolmogorov-Sinai (K-S) entropy is one of the most commonly used nonlinear measures. Let $r$ denote the current time interval and $p_r$ denote the probability of a given vector being significantly close to another vector; then the K-S entropy, $s$, is defined as follows:

\[ s = -\sum_r p_r \ln p_r . \]

In order to calculate $p_r$, we need to define a distance matrix, $D$, in the $n$-dimensional phase space for vectors $X$ and $Y$:

\[ D(X, Y) = \frac{|x_1 - y_1| + |x_2 - y_2| + \ldots + |x_n - y_n|}{n} , \]

where $X = \{x_1, \ldots, x_n\}$ and $Y = \{y_1, \ldots, y_n\}$. The distance matrix $D$ is calculated for each vector with respect to every other vector in every epoch. Consequently, we calculate the standard deviation $\sigma$ of distances between vectors. If $D(X, Y) \le 0.2\sigma$, vectors $X$ and $Y$ are considered to be close. Let $M_r$ be the total number of vectors that are close to a particular vector in that time window, and $M$ denote the total number of vectors within that time interval. Then, the probability $p_r$ can be calculated as follows: $p_r = \frac{M_r}{M}$.

5.4.7 Kullback-Leibler Distance

The Kullback-Leibler distance (the divergence) was introduced in [86]. It is called the Kullback divergence, the Kullback-Leibler information, the relative entropy, divergence, or information for discrimination. It has been studied in detail by many authors, including [10]. This section lists some of the remarkable properties of the Kullback-Leibler distance.
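Returning to the ApEn steps listed in Section 5.4.5, the following minimal Python sketch implements steps 3 through 6 directly. It is a quadratic-time illustration, not an optimized routine. The function names, the tolerance $r = 0.2$, the subsequence length $l = 2$, and the two short toy series are assumptions chosen for illustration, and the series are far shorter than the 1000 points mentioned in the text.

```python
import math
import random


def phi(u, l, r):
    """Phi^l(r): average of ln C_i^l(r) over all length-l subsequences of u."""
    m = len(u) - l + 1                       # number of subsequences x_1..x_m
    total = 0.0
    for i in range(m):
        # Count subsequences x_j whose maximum componentwise distance to x_i
        # is at most r; j == i always qualifies, so the count is at least 1.
        close = sum(
            1 for j in range(m)
            if max(abs(u[i + k] - u[j + k]) for k in range(l)) <= r
        )
        total += math.log(close / m)         # ln C_i^l(r)
    return total / m


def approximate_entropy(u, l, r):
    """ApEn = Phi^l(r) - Phi^(l+1)(r)."""
    return phi(u, l, r) - phi(u, l + 1, r)


# Toy series (hypothetical): a strictly periodic one and an irregular one.
random.seed(1)
regular = [0.0, 1.0] * 100
irregular = [random.random() for _ in range(200)]
apen_regular = approximate_entropy(regular, l=2, r=0.2)
apen_irregular = approximate_entropy(irregular, l=2, r=0.2)
print(apen_regular, apen_irregular)
```

The periodic series gives an ApEn value close to zero, while the irregular series gives a clearly larger value, consistent with the interpretation that lower ApEn indicates a more regular series.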


The Kullback-Leibler distance is particularly important and has many applications in fields related to probability and information. Following the convention, we refer to $D^{(-1)}$ as the Kullback divergence and $D^{(1)}$ as its dual. Unlike other $f$-divergences, the Kullback divergence satisfies the following chain rule

\[ D^{(-1)}(p \,\|\, q) = D^{(-1)}(p_k \,\|\, q_k) + \int D^{(-1)}\left(p_n(\cdot \mid y) \,\|\, q_n(\cdot \mid y)\right) p_k(y)\, dy , \tag{5.25} \]

which gives another proof of the monotonicity. In particular, it satisfies the additivity:

\[ D^{(-1)}(p_{12} \,\|\, q_{12}) = D^{(-1)}(p_1 \,\|\, q_1) + D^{(-1)}(p_2 \,\|\, q_2) \tag{5.26} \]

for product distributions $p_{12}(x_1, x_2) = p_1(x_1)\, p_2(x_2)$ and $q_{12}(x_1, x_2) = q_1(x_1)\, q_2(x_2)$. Figure 5-11 illustrates the comparison of Kullback-Leibler distance (KLD) analysis between the simulated data generated by the Hénon map with and without white noise.

In order to demonstrate the use of the Kullback-Leibler distance in the forecasting problem, we present here an example of EEG data analysis. Other tests of this methodology have been reported by [9] and [23]. A straightforward way to test for independence of the pre-ictal region and epileptic seizures is to see whether the densities $p(x)$ and $q(x)$ are the same. One natural way to do this is to introduce the idea of distance on probability space and measure the distance between $p$ and $q$. One measure of the distance between densities $p$ and $q$ is the $\chi^2$ statistic:

\[ \chi^2 = \int \frac{(p(x) - q(x))^2}{q(x)}\, dx . \tag{5.27} \]

The $\chi^2$ statistic is usually defined on EEG data sets using a histogram estimate for $p$ and $q$. This introduces an extra factor of $N$, the number of cells, into the equations. Kernel density estimation is less time-efficient but usually more data-efficient for smooth probability densities [164]. The basic idea is to let each point "influence" surrounding points through a kernel function. Kernel density estimates are of the

PAGE 153

[Figure 5-11 panels: Henon map (d = 0) and Henon map (d = 0.005); each column shows Hist (N = 100), Hist (N = 500), and KLD versus number of iterations.]

Figure 5-11: Kullback-Leibler distance (KLD) analysis of the Henon map. The left column illustrates two data samples (N=100 and N=500) generated by the Henon map without noise. On the right, two samples (N=100 and N=500) generated by the Henon map with noise (0.003).

PAGE 154

[Figure 5-12 panels: EEG trace over time (minutes) divided into windows A5 to A1, Seizure (A0), and B1 to B5; histograms for windows A2, A5, A0, B2, and B5; KLD plots; panel labels C, D, E.]

Figure 5-12: Kullback-Leibler distance analysis of the EEG signals recorded from a subdural electrode overlying orbitofrontal cortex contralateral to the seizure onset zone. The upper plot shows an EEG signal over a 13-minute time period. The trace is divided into 5 pre-ictal windows ($A_1$ to $A_5$), a window which includes a seizure ($A_0$), and 5 post-ictal windows ($B_1$ to $B_5$). Each window is 75 seconds in duration. The histogram in the second row is taken from the window containing the seizure ($A_0$). Histograms for example pre-ictal windows $A_2$ and $A_5$ are shown on the left. Histograms from post-ictal windows $B_2$ and $B_5$ are shown on the right. In the center, Kullback-Leibler distances between window $A_5$ and the other windows (plot C) and between window $A_0$ and each of the other windows (plot D) are depicted.

PAGE 155

[Figure 5-13 panels: EEG trace over time (minutes) divided into windows A5 to A1, Seizure (A0), and B1 to B5; histograms for windows A2, A5, A0, B2, and B5; KLD plots; panel labels C, D, E.]

Figure 5-13: Kullback-Leibler distance analysis of the EEG signals recorded from a subdural electrode overlying the epileptic focus (seizure onset zone).
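The histogram-based comparison illustrated in Figures 5-12 and 5-13 can be sketched as follows. This is a minimal sketch under stated assumptions, not the code used to produce the figures: the function names are hypothetical, densities are represented as lists of cell probabilities over equal-width cells, the distance follows the convention $K(p, q) = \sum q \log(q/p)$, and the quadratic approximation is $\sum (p - q)^2 / (2q)$.

```python
import math


def histogram_density(samples, low, high, cells):
    """Histogram estimate: fraction of samples falling in each equal-width cell."""
    width = (high - low) / cells
    counts = [0] * cells
    for value in samples:
        if not low <= value <= high:
            raise ValueError("sample outside histogram range")
        # A sample equal to `high` is placed in the last cell.
        index = min(int((value - low) / width), cells - 1)
        counts[index] += 1
    return [count / len(samples) for count in counts]


def kl_distance(p, q):
    """K(p, q) = sum over cells of q * log(q / p); cells with q = 0 add nothing."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if q_i > 0:
            if p_i <= 0:
                return math.inf  # q has mass where p has none
            total += q_i * math.log(q_i / p_i)
    return total


def chi2_half(p, q):
    """Quadratic approximation sum (p - q)^2 / (2 q), valid when p is close to q."""
    return sum((p_i - q_i) ** 2 / (2 * q_i) for p_i, q_i in zip(p, q) if q_i > 0)
```

For two nearly identical densities, such as p = [0.25, 0.25, 0.25, 0.25] and q = [0.26, 0.24, 0.25, 0.25], the two quantities agree to within about 1e-6, which is why the quadratic form is a reasonable substitute when windows differ only slightly.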

PAGE 156

form

$$ p(x) = \frac{1}{N} \sum_{i=1}^{n} k\left(\frac{\|x - x_i\|}{w}\right), \qquad (5.28) $$

where the sum is over the entire data set, $k$ is the kernel (a normalized probability density), $\|\cdot\|$ represents an approximation norm, and $w$ is a window width. To make a good estimate with the kernel function, the width must be properly matched to the density function and the number of data points. Another way to estimate the probability density of a given local window is to utilize Parzen windowing [38, 130]. In Parzen windowing, the probability density is approximated by a sum of even, symmetric kernels whose centers are translated to the sample points. A suitable kernel function for this purpose is the Gaussian. Another possible distance measure is the Kullback-Leibler information distance:

$$ K(p, q) = \int q(x) \log\left(\frac{q(x)}{p(x)}\right) dx. \qquad (5.29) $$

Assuming that the two distributions $p$ and $q$ are nearly the same, the Kullback-Leibler distance can be expanded in a Taylor series:

$$ K(p, q) = -\int q \left\{ \left(\frac{p}{q} - 1\right) - \frac{1}{2}\left(\frac{p}{q} - 1\right)^2 + \ldots \right\} \approx \int \frac{q}{2} \left(\frac{p}{q} - 1\right)^2 = \int \frac{(p - q)^2}{2q}. \qquad (5.30) $$

In Figures 5-12 and 5-13, the values of the Kullback-Leibler distance between the two reference windows (signal $A_5$ and the epileptic seizure) and the other windows are computed using Eq. (5.30).

5.4.8 Modeling of EEG Time Series

The epileptic seizure is often referred to as an ictus. The period immediately after a seizure, often associated with symptoms such as confusion and drowsiness, is called the post-ictal state. The long periods between seizures, usually hours to days or even weeks in duration, are called the interictal state. Recent investigations into

PAGE 157

the spatiotemporal characteristics of the EEG in epileptic patients have resulted in the discovery of a pre-ictal state [30, 75, 70, 67, 88, 91, 94, 131]. The pre-ictal state appears to be a gradual transition, lasting approximately 30 minutes to 1 hour, from the interictal state to the seizure (ictal state) [67]. For the present study, we have estimated sequential short-term Lyapunov exponents of a pre-ictal (approximately 30 minutes before seizure onset) and ictal EEG previously recorded in a patient with medically intractable seizures of left temporal lobe origin. Figure 5-14 presents the Lyapunov dimension as a function of time for the temporal lobes of a patient with seizures originating in the left mesial temporal lobe region. These data were obtained by the analysis of EEG signals recorded from the region of the brain from which the seizures first begin (often referred to as the seizure onset zone, or seizure focus).

[Figure 5-14 plot: Lyapunov dimension ($d_L$) versus time (minutes), with a 4th-order polynomial approximation.]

Figure 5-14: Lyapunov dimension of an epileptic patient as a function of time. This figure corresponds to nine Lyapunov exponents. Note that the Lyapunov dimension decreases over the 15 to 20 minute period preceding seizure onset.

By applying the nonlinear measures discussed in the previous section, we compare and show changes in brain dynamics among different states of an epileptic patient. In this experiment, 10 minutes of EEG data sampled from the interictal state (seizure-free period with no seizure within a range of 8 hours), the pre-ictal state (period before

PAGE 158

[Figure 5-15 panels: Interictal, Preictal, Ictal, Postictal; axes: Entropy, STLmax, Angular Frequency.]

Figure 5-15: Three-dimensional plots of entropy, angular frequency, and STLmax in different physiological states (interictal, pre-ictal, ictal and post-ictal) of an epileptic patient.

the seizure onset), and the post-ictal state (period immediately after the seizure ends) are obtained from EEG recordings of an epileptic patient. However, only 2 minutes of data from EEG recordings obtained during the ictal state (seizure) of the patient are available because the seizure usually lasts about 1-2 minutes. From the EEG data obtained from each of the states described above, the dynamical measures entropy, STLmax, and angular frequency were calculated over time. Each measure was calculated continuously for each non-overlapping 10.24-second segment of EEG data. Figure 5-15 shows the three measures in the interictal, pre-ictal, ictal, and post-ictal states individually. Figure 5-16 shows the 3-dimensional combined plot. The most interesting finding is that the values of these measures in the interictal and pre-ictal states are close together in a cluster, whereas those in the ictal and post-ictal periods are widespread. However, from Figure 5-16, the gradual transitions from one physiologic state to another can be identified. This observation suggests the possibility of predicting impending seizures by detecting the state changes in the EEG signal preceding seizures.
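The segmentation into non-overlapping 10.24-second epochs can be sketched as below. This is only an illustration under stated assumptions: the function names are hypothetical, the default 200 Hz sampling rate is the one reported in Section 5.7.1 (giving 2048 samples per epoch), a trailing partial epoch is discarded, and `measure` stands for any per-epoch dynamical measure supplied by the caller.

```python
def segment_epochs(signal, sampling_rate_hz=200.0, epoch_seconds=10.24):
    """Split a sampled signal into consecutive non-overlapping epochs."""
    epoch_len = int(round(sampling_rate_hz * epoch_seconds))
    n_epochs = len(signal) // epoch_len  # a trailing partial epoch is dropped
    return [signal[i * epoch_len:(i + 1) * epoch_len] for i in range(n_epochs)]


def measure_profile(signal, measure, sampling_rate_hz=200.0, epoch_seconds=10.24):
    """Apply a per-epoch measure to every epoch, giving one value per epoch."""
    epochs = segment_epochs(signal, sampling_rate_hz, epoch_seconds)
    return [measure(epoch) for epoch in epochs]
```

With these defaults, 60 consecutive epochs span 614.4 seconds, which is consistent with the moving windows of approximately 10 minutes used for the statistical tests in Section 5.5.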

PAGE 159

[Figure 5-16 plot: axes STLmax, Angular Frequency, Entropy; legend: Interictal, Preictal, Ictal, Postictal.]

Figure 5-16: Combined 3-dimensional plots of entropy, angular frequency, and STLmax in different physiological states (interictal, pre-ictal, ictal and post-ictal) for an epileptic patient.

5.5 Statistical Tests for Spatiotemporal Analysis

Although a great deal is now known about low dimensional chaos, the erratic motion of dynamical systems described by a few variables, much less is understood in systems where the number of chaotic degrees of freedom becomes very large. Typically such systems show disorder in both space and time and are said to exhibit spatiotemporal chaos. Spatiotemporal chaos occurs when a system of coupled dynamical systems gives rise to dynamical behavior that exhibits both spatial disorder (as in rapid decay of spatial correlations) and temporal disorder (as in nonzero Lyapunov exponents). This is an extremely active and rather unsettled area of research. The system under consideration (the brain) has a spatial extent and, as such, information about the transition of the system towards the ictal state should also be included in the interactions of its spatial components. The pre-ictal transition, a progressive convergence of $STL_{max}$ profiles, is another piece of evidence of spatiotemporal chaos in the brain (shown in Figure 5-17). Having estimated the $STL_{max}$ temporal profiles at individual cortical sites, and as the brain proceeds towards the ictal state, the temporal evolution of the stability of each cortical site is quantified. The spatial dynamics of

PAGE 160

this transition are captured by considering the relationship of the $STL_{max}$ between different cortical sites. For example, if a similar transition occurs at different cortical sites, the $STL_{max}$ of the involved sites are expected to converge to similar values prior to the transition. We have called such participating sites "critical sites", and such a convergence "dynamical entrainment". More specifically, in order for the dynamical entrainment to have a statistical content, we have allowed a period over which the mean of the differences of the $STL_{max}$ values at two sites is estimated. We have used 60 $STL_{max}$ values (i.e., moving windows of approximately 10 minutes at each electrode site) to test the dynamical entrainment at the 0.01 statistical significance level. We employ the T-index as a measure of dynamical entrainment of $STL_{max}$ profiles over time. The T-index at time $t$ between electrode sites $i$ and $j$ is defined as:

$$ T_{i,j}(t) = \sqrt{N} \times \left| E\{ STL_{max,i} - STL_{max,j} \} \right| / \sigma_{i,j}(t), \qquad (5.31) $$

where $E\{\cdot\}$ is the sample average difference for the $STL_{max,i} - STL_{max,j}$ estimated over a moving window $w_t(\lambda)$ defined as:

$$ w_t(\lambda) = \begin{cases} 1 & \text{if } \lambda \in [t - N - 1,\, t] \\ 0 & \text{if } \lambda \notin [t - N - 1,\, t], \end{cases} $$

where $N$ is the length of the moving window. Then, $\sigma_{i,j}(t)$ is the sample standard deviation of the $STL_{max}$ differences between electrode sites $i$ and $j$ within the moving window $w_t(\lambda)$. The thus defined T-index follows a $t$-distribution with $N - 1$ degrees of freedom. For the estimation of the $T_{i,j}(t)$ indices in our data we used $N = 60$ (i.e., an average of 60 differences of $STL_{max}$ exponents between sites $i$ and $j$ per moving window of approximately 10 minutes duration). Therefore, a two-sided $t$-test with $N - 1$ (= 59) degrees of freedom, at a statistical significance level $\alpha$, should be used to test the null hypothesis, $H_o$: "brain sites $i$ and $j$ acquire identical $STL_{max}$ values

PAGE 161

[Figure 5-17 panels: (A) Lmax (bit/sec) versus time (minutes) for sites RTD2 and ROF4, with seizures SZ#9 and SZ#10 marked; (B) T-index of RTD2 vs ROF4 versus time (minutes).]

Figure 5-17: Plots of the $STL_{max}$ profiles and the T-index profile between the normal site ROF4 and the epileptogenic site RTD2 about 35 minutes into the recording show the dynamical entrainment of a pair of brain sites between seizures 9 and 10 (patient 1).

at time $t$". In this experiment, we set $\alpha = 0.01$, the probability of a type I error. In other words, the probability of falsely rejecting $H_o$ if $H_o$ is true is 1%. For the T-index to pass this test, the $T_{i,j}(t)$ value should be within the interval [0, 2.662]. In Figure 5-17, pre-ictal entrainment (long before the occurrence of a seizure) and post-ictal disentrainment (after the occurrence of the seizure) of the $STL_{max}$ profiles at two brain sites are shown. This behavior is quantified by the T-index profile between these sites. From this figure, it is clear that attempts at spatiotemporal entrainment between brain sites occur long before an epileptic seizure (first attempt about 70 minutes prior to seizure). Post-ictally, this entrainment is reset. In Figure 5-18, the angular frequency profiles at two cortical sites are shown for the interval between seizures 13 and 14 in patient 1. For these cortical sites, a remarkable feature

PAGE 162

Figure 5-18: Angular frequency profiles from two left orbitofrontal electrode sites over 3.5 hours between seizures 13 and 14 (patient 1). The ictal periods of the two seizures are denoted by vertical lines.

Figure 5-19: The T-index profile between the two electrode sites whose profiles are depicted in Figure 5-18. The two sites are dynamically entrained 1.75 to 1.5 hours, as well as 1.2 hours, prior to the onset of seizure 14. The $T_1$ and $T_2$ statistical thresholds are represented by the two horizontal lines.
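A T-index profile of the kind plotted in Figure 5-19 follows directly from Eq. (5.31). The sketch below is illustrative only: the function names are hypothetical, each window is taken as the last `window` values ending at time t (an assumption about how the moving window is indexed), and the critical value 2.662 is the one quoted in the text for a two-sided test with 59 degrees of freedom at the 0.01 level.

```python
import math
import statistics

T_CRITICAL = 2.662  # from the text: alpha = 0.01, N - 1 = 59 degrees of freedom


def t_index(profile_i, profile_j):
    """sqrt(N) * |mean difference| / sample std of the paired differences."""
    diffs = [a - b for a, b in zip(profile_i, profile_j)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    std_diff = statistics.stdev(diffs)  # sample standard deviation (N - 1)
    if std_diff == 0:
        # Degenerate window: identical differences everywhere.
        return 0.0 if mean_diff == 0 else math.inf
    return math.sqrt(n) * abs(mean_diff) / std_diff


def t_index_profile(profile_i, profile_j, window=60):
    """T-index over a moving window of `window` values ending at each time t."""
    return [
        t_index(profile_i[t - window + 1:t + 1], profile_j[t - window + 1:t + 1])
        for t in range(window - 1, len(profile_i))
    ]


def is_entrained(t_value, critical=T_CRITICAL):
    """Two sites count as entrained when the T-index lies in [0, critical]."""
    return t_value <= critical
```

For the differences [1, 0, 1, 0], the mean is 0.5 and the sample standard deviation is $\sqrt{1/3}$, so the T-index is $\sqrt{3} \approx 1.73$, below the critical value.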

PAGE 163

is observed: a long-term convergence of their profiles prior to seizure 14. We have called this convergence "dynamical entrainment" and we have quantified it by a T-statistic that provides a comparison between the two electrode sites (shown in Figure 5-19).

5.6 Optimization Techniques for Identifying the Temporal Patterns

Having defined that electrode sites which participate in the pre-ictal transition must be entrained prior to seizures, we hypothesize that "the electrode sites that are most entrained during the current seizure and disentrained after the seizure onset should be most likely to be entrained prior to the next seizure". Critical electrode sites are those sites which are most entrained prior to seizures and disentrained after the seizure onset. As a result, it is possible to predict a seizure if one can identify critical electrode sites in advance. To test this hypothesis, we designed an experiment which compares the probability of detecting the pre-ictal transition from any entrained cortical sites with the probability of detecting the pre-ictal transition from the critical cortical sites. In this experiment, testing on 3 patients with 20 seizures, we randomly selected 5,000 groups of entrained sites and employed the computational approach, which will be discussed in detail in the next section, to solve the multi-quadratic 0-1 programming problem (select the critical sites). The results show that the probability of detecting the pre-ictal transition from the critical cortical sites is approximately 83%. When we compare this probability with the probability of detecting the pre-ictal transition from any entrained sites, we obtain a P-value < 0.07, which is significant and validates our hypothesis.
The histogram of the probability of detecting the pre-ictal transition from randomly selected entrained cortical sites, compared with the probability of detecting the pre-ictal transition from the critical cortical sites, is illustrated in Figure 5-20. The results of this study confirm our hypothesis that the set of most converged cortical sites during the current seizure and reset after the seizure onset is more likely to be converged again during the next seizure

PAGE 164

Figure 5-20: Histogram of the probability of detecting the pre-ictal transition from randomly selected entrained electrode sites (5,000 selections) compared with the most entrained electrode sites.

PAGE 165

than other converged cortical sites. Thus, it is possible to predict an impending seizure based on optimization and nonlinear dynamics of multichannel intracranial EEG recordings. Prediction is possible because, for the vast majority of seizures, the spatiotemporal dynamical features of the pre-ictal transition are sufficiently similar to those of the preceding seizure. This similarity makes it possible to identify electrode sites that will participate in the next pre-ictal transition by solving a multi-quadratic 0-1 programming problem.

5.6.1 Quadratic Zero-One Programming

In this paper we refer to the Sherrington-Kirkpatrick Hamiltonian that describes the mean-field theory of the spin glasses where elements are placed on the vertices of a regular lattice, the magnetic interactions hold only for nearest neighbors, and every element has only two states (Ising spin glasses [12, 13, 14, 61, 97]). One of the most interesting problems about this model is the determination of the minimal energy states (GROUND STATE problem). For many years the Ising model has been a powerful tool in studying phase transitions in statistical physics. Such an Ising model can be described by a graph $G(V, E)$ having $n$ vertices $\{v_1, \ldots, v_n\}$ and each edge $(i, j) \in E$ having a weight (interaction energy) $J_{ij}$. Each vertex $v_i$ has a magnetic spin variable $\sigma_i \in \{-1, +1\}$ associated with it. An optimal spin configuration of minimum energy is obtained by minimizing the Hamiltonian

$$ H(\sigma) = -\sum_{1 \le i \le j \le n} J_{ij}\, \sigma_i \sigma_j \quad \text{over all } \sigma \in \{-1, +1\}^n. $$

This problem is equivalent to the combinatorial problem of quadratic bivalent programming [61]. Quadratic zero-one programming has been extensively used to study Ising spin glass models. This has motivated us to use quadratic 0-1 programming to select the critical cortical sites, where each electrode has only two states, and to determine

PAGE 166

the minimal-average T-index state. We formulated this problem as a quadratic 0-1 knapsack problem with the objective function to minimize the average T-index (a measure of statistical distance between the mean values of $STL_{max}$) among electrode sites and the knapsack constraint to identify the number of critical cortical sites. Let $A$ be an $n \times n$ matrix, each element $a_{i,j}$ of which represents the T-index between electrodes $i$ and $j$ within the 10-minute window before the onset of a seizure. Define $x = (x_1, \ldots, x_n)$, where each $x_i$ represents the cortical electrode site $i$. If the cortical site $i$ is selected to be one of the critical electrode sites, then $x_i = 1$; otherwise, $x_i = 0$. A quadratic function is defined on $R^n$ by

$$ \min f(x) = x^T A x, \quad \text{s.t. } x_i \in \{0, 1\},\ i = 1, \ldots, n, \qquad (5.32) $$

where $A$ is an $n \times n$ matrix [112, 113]. Throughout this section the following notation will be used.
$\{0, 1\}^n$: set of $n$-dimensional 0-1 vectors.
$R^{n \times n}$: set of $n \times n$ dimensional real matrices.
$R^n$: set of $n$-dimensional real vectors.

Next, we add a linear constraint, $\sum_{i=1}^{n} x_i = k$, where $k$ is the number of critical electrode sites that we want to select. We now consider the following linearly constrained quadratic 0-1 problem:

$$ P: \ \min f(x) = x^T A x, \quad \text{s.t. } \sum_{i=1}^{n} x_i = k \text{ for some } k,\ x \in \{0, 1\}^n,\ A \in R^{n \times n}. \qquad (5.33) $$

Problem $P$ can be formulated as a quadratic 0-1 problem of the form in Problem (5.32) by using an exact penalty. If $A = (a_{ij})$, then let $M = 2\left[\sum_{j=1}^{n} \sum_{i=1}^{n} |a_{ij}|\right] + 1$. Then we have the following equivalent problem $\bar{P}$:

$$ \bar{P}: \ \min g(x) = x^T A x + M \left( \sum_{i=1}^{n} x_i - k \right)^2, \quad \text{s.t. } x \in \{0, 1\}^n,\ A \in R^{n \times n}. \qquad (5.34) $$
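The equivalence between the constrained problem (5.33) and the penalized problem (5.34) can be checked by brute force on a small instance. The sketch below is for illustration only and is not one of the solution approaches used in this study: the function names and the 4 x 4 matrix are hypothetical, and the zero diagonal is an assumption (a site compared with itself).

```python
from itertools import combinations, product


def quadratic_form(a, x):
    """Return x^T A x for a square matrix given as a list of rows."""
    n = len(x)
    return sum(a[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def solve_constrained(a, k):
    """Problem (5.33): enumerate every 0-1 vector with exactly k ones."""
    n = len(a)
    best_value, best_x = None, None
    for chosen in combinations(range(n), k):
        x = [1 if i in chosen else 0 for i in range(n)]
        value = quadratic_form(a, x)
        if best_value is None or value < best_value:
            best_value, best_x = value, x
    return best_value, best_x


def solve_penalized(a, k):
    """Problem (5.34): enumerate all 0-1 vectors with the exact penalty term."""
    n = len(a)
    big_m = 2 * sum(abs(entry) for row in a for entry in row) + 1
    best_value, best_x = None, None
    for bits in product((0, 1), repeat=n):
        value = quadratic_form(a, bits) + big_m * (sum(bits) - k) ** 2
        if best_value is None or value < best_value:
            best_value, best_x = value, list(bits)
    return best_value, best_x


# Hypothetical symmetric T-index matrix for 4 sites (illustrative values only).
T_INDEX = [
    [0, 3, 1, 4],
    [3, 0, 2, 5],
    [1, 2, 0, 6],
    [4, 5, 6, 0],
]
```

For this matrix and k = 2, both functions select sites 1 and 3 with objective value 2. Because M exceeds the largest possible value of $x^T A x$, any vector violating the cardinality constraint is always worse than a feasible one. Enumeration itself grows quickly: selecting 5 out of 30 sites already requires 142,506 subsets.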

PAGE 167

To solve this problem, we considered 3 computational approaches. In the first approach, we solved (5.34) by applying a branch and bound algorithm with a dynamic rule for fixing variables [112, 113]. In the second approach, we used a linearization technique to formulate the quadratic integer programming (QIP) problem in (5.33) as an integer programming (IP) problem by introducing a new variable for each product of two variables and adding some additional constraints, and then formulated this problem as a linear 0-1 problem. In the third approach, we employed the Karush-Kuhn-Tucker optimality conditions of the linearly constrained quadratic 0-1 problem in (5.33) to formulate this problem as a mixed-integer linear programming (MILP) problem. Details of the first approach can be found in [112, 113]; next, we discuss the second and the third approaches.

5.6.2 Conventional Linearization Approach

For each product $x_i x_j$, we introduce a new 0-1 variable, $x_{ij} = x_i x_j$ ($i \ne j$). Note that $x_{ii} = x_i^2 = x_i$ for $x_i \in \{0, 1\}$. After linearization, the equivalent IP formulation is given by:

$$ \min \sum_i \sum_j a_{ij} x_{ij} \qquad (5.35) $$
$$ \text{s.t. } \sum_{i=1}^{n} x_i = k, \qquad (5.36) $$
$$ x_{ij} \le x_i, \quad \text{for } i, j = 1, \ldots, n\ (i \ne j) \qquad (5.37) $$
$$ x_{ij} \le x_j, \quad \text{for } i, j = 1, \ldots, n\ (i \ne j) \qquad (5.38) $$
$$ x_i + x_j - 1 \le x_{ij}, \quad \text{for } i, j = 1, \ldots, n\ (i \ne j) \qquad (5.39) $$

where $x_i \in \{0, 1\}$ and $x_{ij} \in \{0, 1\}$, $i, j = 1, \ldots, n$. The number of 0-1 variables has been increased to $O(n^2)$. Although we can apply CPLEX 7.0 to solve problems with $n = 30$, this approach becomes computationally inefficient as $n$ increases. Future technology will require the ability to efficiently

PAGE 168

solve problems with much larger values for $n$. For instance, micro-electrodes will be implanted in the future ($n > 1000$).

5.6.3 KKT Conditions Linearization Approach

Consider a quadratic problem with a linear constraint given by:

$$ \min z(x) = x^T A x, \quad \text{s.t. } \sum_{i=1}^{n} x_i = k,\ x_i \ge 0,\ i = 1, \ldots, n. \qquad (5.40) $$

We then have the following Karush-Kuhn-Tucker conditions:

$$ -2Ax + u \cdot e + y = 0 \qquad (5.41) $$
$$ \sum_{i=1}^{n} x_i = k \qquad (5.42) $$
$$ y^T x = 0, \qquad (5.43) $$

where $u$ and $y$ are Lagrangian multipliers. Note that $u$ is a scalar and $y$ is a column vector. We add the slack variables $w$, which is a column vector, to (5.41) and then denote a column vector $s = u \cdot e + w$. We then have the KKT conditions given by:

$$ -2Ax + y + s = 0 $$
$$ \sum_{i=1}^{n} x_i = k $$
$$ y^T x = 0. $$

We can formulate the above KKT conditions as a MILP formulation. The objective function is to minimize the summation of variables, $s_i$. Because $x_i$ are 0-1 variables, we can replace the last constraint with $y_i \le \mu (1 - x_i)$, for $i = 1, \ldots, n$,

PAGE 169

where $\mu = \max_i \sum_{j=1}^{n} a_{ij} = \|A\|_\infty$. We then have the MILP formulation given by:

$$ \min \sum_{i=1}^{n} s_i $$
$$ \text{s.t. } -\sum_{j=1}^{n} a_{ij} x_j + s_i + y_i = 0, \quad \text{for } i = 1, \ldots, n $$
$$ \sum_{i=1}^{n} x_i - k = 0 \qquad (5.44) $$
$$ y_i - \mu (1 - x_i) \le 0, \quad \text{for } i = 1, \ldots, n $$

where $x_i \in \{0, 1\}$ and $s_i, y_i \ge 0$, for $i = 1, \ldots, n$. In Chapter 4, we proved that the QIP formulation in (5.33) is equivalent to the MILP formulation in (5.44). From (5.31), $T_{i,j}(t) = \sqrt{N} \times |E\{STL_{max,i} - STL_{max,j}\}| / \sigma_{i,j}(t)$, we note that every element in the T-index matrix $A$ is positive. For this reason, in every instance, by solving the MILP problem in (5.44) we can find the global solution to the original QIP problem in (5.33). Applying CPLEX 7.0, this problem can be easily solved with $n = 30$. In addition, this formulation is computationally efficient as $n$ increases because the number of 0-1 variables is $O(n)$. From computational experiments, the above linear mixed integer 0-1 problem is the most efficient approach in our application; see Table 5-1 and Figure 5-21.

Table 5-1: Performance characteristics (computational time, measured in seconds) of the two proposed approaches compared with complete enumerations

Number of selected   KKT Conditions    Linearization     Complete
electrodes           Approach (secs)   Approach (secs)   Enumerations (secs)
5 (out of 30)        297               656               15
6 (out of 30)        406               735               78
7 (out of 30)        609               968               313
8 (out of 30)        1797              2610              1141
9 (out of 30)        2562              5235              3578

5.6.4 Multi-Quadratic Zero-One Programming

Our group has shown dynamical resetting of the brain following seizures [145, 73, 135], that is, divergence of $STL_{max}$ profiles after seizures. Therefore, we want to

PAGE 170

[Figure 5-21 plot: CPU seconds (0 to 6000) versus selection (5 to 9 out of 30) for KKT conditions, Linearization, and Complete Enumerations.]

Figure 5-21: Performance characteristics of the two proposed approaches compared with complete enumerations

incorporate this finding with our existing critical electrode selection problem (the QIP problem in (5.33)). Thus, we have to ensure that the optimal group of critical sites shows this divergence by adding one more quadratic constraint to the QIP problem in (5.33). The multi-quadratic integer programming (MQIP) problem is given by:

$$ \min x^T A x $$
$$ \text{s.t. } \sum_{i=1}^{n} x_i = k \qquad (5.45) $$
$$ x^T B x \ge T_\alpha\, k (k - 1) $$

where $x_i \in \{0, 1\}\ \forall i \in \{1, \ldots, n\}$. Let $B$ be an $n \times n$ matrix, each element $b_{i,j}$ of which represents the T-index between electrodes $i$ and $j$ within the 10-minute window after the onset of a seizure. Note that the matrix $A = (a_{ij})$ is the T-index matrix of brain sites $i$ and $j$ within the 10-minute window before the onset of a seizure. $T_\alpha$ is the critical value of the T-index, as previously defined, to reject $H_o$: "two brain sites acquire identical $STL_{max}$ values within time window $w_t(\lambda)$".
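On a small instance, the MQIP problem (5.45) can be illustrated by complete enumeration, filtering out groups that fail the post-ictal divergence constraint. This is a sketch under stated assumptions, not the solution method of this study: the function names and both 4 x 4 matrices are hypothetical, and the diagonals are assumed to be zero. With a zero diagonal, $x^T B x$ sums each selected pair twice and $k(k-1)$ counts the ordered pairs, so the constraint requires the average post-ictal T-index of the selected group to be at least $T_\alpha$.

```python
from itertools import combinations


def quadratic_form(m, x):
    """Return x^T M x for a square matrix given as a list of rows."""
    n = len(x)
    return sum(m[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def solve_mqip(a, b, k, t_alpha):
    """Minimize x^T A x over k-subsets satisfying x^T B x >= t_alpha * k * (k - 1).

    Returns (value, x), or None when no k-subset satisfies the constraint.
    """
    n = len(a)
    best = None
    for chosen in combinations(range(n), k):
        x = [1 if i in chosen else 0 for i in range(n)]
        if quadratic_form(b, x) < t_alpha * k * (k - 1):
            continue  # group is not sufficiently disentrained post-ictally
        value = quadratic_form(a, x)
        if best is None or value < best[0]:
            best = (value, x)
    return best


# Hypothetical pre-ictal (A) and post-ictal (B) T-index matrices for 4 sites.
PRE_ICTAL = [
    [0, 3, 1, 4],
    [3, 0, 2, 5],
    [1, 2, 0, 6],
    [4, 5, 6, 0],
]
POST_ICTAL = [
    [0, 1, 1, 1],
    [1, 0, 3, 1],
    [1, 3, 0, 9],
    [1, 1, 9, 0],
]
```

With k = 2 and the critical value 2.662 quoted earlier, the most entrained pair (sites 1 and 3) is rejected because its post-ictal T-index is only 1, and the pair of sites 2 and 3 is selected instead with objective value 4.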

PAGE 171

With one more quadratic constraint, the problem in (5.45) becomes much harder to solve. Note that the first approach, a branch and bound algorithm with a dynamic rule for fixing variables, cannot be applied to solve this problem because of the additional quadratic constraint. However, we can modify the MIP formulation in (5.44) from the previous section and reformulate this problem by adding one more linearized constraint. The equivalent IP formulation is given by:

$$ \min \sum_i \sum_j a_{ij} x_{ij} $$
$$ \text{s.t. } \sum_{i=1}^{n} x_i = k, $$
$$ x_{ij} \le x_i, \quad \text{for } i, j = 1, \ldots, n\ (i \ne j) $$
$$ x_{ij} \le x_j, \quad \text{for } i, j = 1, \ldots, n\ (i \ne j) $$
$$ x_i + x_j - 1 \le x_{ij}, \quad \text{for } i, j = 1, \ldots, n\ (i \ne j) $$
$$ \sum_i \sum_j b_{ij} x_{ij} \ge T_\alpha\, k (k - 1) $$

where $x_i \in \{0, 1\}$ and $x_{ij} \in \{0, 1\}$, $i, j = 1, \ldots, n$. As we mentioned in the previous section, the above formulation is not computationally efficient as $n$ increases. From the proof in Chapter 4, we have shown that solving the MILP in (5.44) gives us the optimal solution, which is the global solution to the QIP problem in (5.33). In that proof, we show that we can also solve the MQIP problem in (5.45) by

PAGE 172

solving the following MILP formulation.

$$ \min \sum_{i=1}^{n} s_i \qquad (5.46) $$
$$ \text{s.t. } \sum_{i=1}^{n} x_i - k = 0 \qquad (5.47) $$
$$ -\sum_{j=1}^{n} a_{ij} x_j + s_i + y_i = 0, \quad \text{for } i = 1, \ldots, n \qquad (5.48) $$
$$ y_i - M (1 - x_i) \le 0, \quad \text{for } i = 1, \ldots, n \qquad (5.49) $$
$$ h_i - M x_i \le 0, \quad \text{for } i = 1, \ldots, n \qquad (5.50) $$
$$ -\sum_{j=1}^{n} b_{ij} x_j + h_i \le 0, \quad \text{for } i = 1, \ldots, n \qquad (5.51) $$
$$ \sum_{i=1}^{n} h_i \ge T_\alpha\, k (k - 1) \qquad (5.52) $$

where $x_i \in \{0, 1\}$ and $s_i, y_i, h_i \ge 0$, for $i = 1, \ldots, n$. Applying CPLEX 7.0, this problem can be easily solved with $n = 30$. This formulation is very computationally efficient and is used to solve this multi-quadratic zero-one programming problem iteratively for the selection after every subsequent seizure. In the future, it may be useful for diagnostic purposes to implant more electrodes. Although this will increase $n$, this formulation is still applicable because it is computationally efficient. Note that, in the future, more seizure characteristics may be discovered. Although this would require additional quadratic and linear constraints, the problem formulation technique is still applicable for solving MQIP problems.

For illustration purposes, the smoothed (10-minute moving average) $STL_{max}$ and $\Omega_{max}$ profiles of the five optimally selected electrodes of seizures 14 and 15 are shown in Figure 5-24. The optimal electrodes were selected in a 10-minute interval prior to the second seizure of each set. For each set of seizures, the $STL_{max}$ and $\Omega_{max}$ profiles clearly converge (entrain) before the second seizure and either both or one of them diverge (disentrain) in this seizure's post-ictal period. The average T-index curves that quantify this pre-ictal entrainment and post-ictal disentrainment

PAGE 173

[Figure 5-22 plot: STLmax versus time (minutes).]

Figure 5-22: Smoothed $STL_{max}$ profiles of 5 optimal electrode sites over 150 minutes including a seizure. The pre-ictal period shows gradual convergence of the $STL_{max}$ values calculated for these critical electrode sites. During the seizure, $STL_{max}$ values are completely entrained. Post-ictally, the values are disentrained, indicating resetting which reverses the pre-ictal entrainment.

[Figure 5-23 plot: STLmax versus time (minutes).]

Figure 5-23: Smoothed $STL_{max}$ profiles of 5 non-optimal electrode sites over 150 minutes including a seizure. Post-ictal resetting is not observed for these sites.

PAGE 174

[Figure 5-24 panels: (a) STLmax (bit/sec) versus time (minutes), legend LTD9, RTD6, RST1, RST4, LOF2; (b) $\Omega_{max}$ (rad/sec) versus time (minutes), legend LTD7, LST2, ROF2, RL03, RL04; seizures SZ #14, SZ #15, and SZ #16 marked in both panels.]

Figure 5-24: Smoothed $STL_{max}$ and $\Omega_{max}$ profiles of the 5 optimally selected electrodes over time (including seizures 14, 15, and 16). The optimal electrodes were selected 10 minutes before seizure 15.

PAGE 175

[Figure 5-25 panels: (a) and (b) T-index versus time (minutes), with seizures SZ #14, SZ #15, and SZ #16 marked.]

Figure 5-25: Average T-index curve over time from the $STL_{max}$ profiles and the $\Omega_{max}$ profiles in Figure 5-24.

PAGE 176

among the selected electrodes for each of the corresponding 3 sets of seizures are respectively shown in Figure 5-25. The second and third sets of seizures were included herein to show that the $STL_{max}$ and $\Omega_{max}$ measures are not identical in the detection of the entrainment and disentrainment transition across epileptic seizures in the same patient.

Figures 5-26 and 5-27 illustrate the application of the optimization techniques to the detection of the pre-ictal transition preceding seizure 10 in patient 1 from $STL_{max}$ profiles. The profiles from 5 electrode sites (k=5), selected with the optimization program applied during the 10-minute interval immediately preceding the onset of seizure 9, are shown. Figures 5-28 and 5-29 illustrate the detection of the pre-ictal transition preceding seizure 14 in patient 1 from $\Omega_{max}$ profiles. The profiles from 5 electrode sites (k=5), selected with the optimization program applied during the 10-minute interval immediately preceding the onset of seizure 13, are shown. In Figure 5-26, the estimated values forward in time from the 5 selected sites are shown for the entire interval between seizures 9 and 10. The average T-index over all possible $T_{ij}$ indices among the optimal 5 sites is plotted over time in Figure 5-27. In Figure 5-28, the estimated values forward in time from the 5 selected sites are shown for the entire interval between seizures 13 and 14. The average T-index over all possible $T_{ij}$ indices among the optimal 5 sites is plotted over time in Figure 5-29.

Several points are noteworthy. First, the use of optimal sites for the estimation and the average T-index profiles helps to detect the pre-ictal transition (about 1 to 0.5 hour before the seizure onset).
Second, the detection of the pre-ictal transition is more robust than in the previous illustration, when only 2 sites were used (compare Figures 5-18 and 5-19), in the sense that false warnings preceding the seizure have been eliminated. The need for optimization in selecting cortical sites that are likely to show the pre-ictal transition prior to an impending seizure is clear when we compare the results derived with optimization (e.g., in Figures 5-28 and 5-29) to sites selected

PAGE 177

Figure 5-26: Convergence of 5 $STL_{max}$ profiles from critical cortical sites over 2 hours between seizures 9 and 10 (patient 1). The ictal periods of the two seizures are denoted by vertical lines.

Figure 5-27: The T-index profile among the 5 critical cortical sites whose $STL_{max}$ profiles are depicted in Figure 5-26. The cortical sites are dynamically entrained approximately 60 minutes prior to the onset of seizure 10. The statistical thresholds are represented by the two horizontal lines.

PAGE 178

Figure 5-28: Angular frequency (Ωmax) profiles between seizures 13 and 14 (patient 1) of 5 electrode sites selected by the optimization program during the 10-minute interval prior to the onset of seizure 13.

Figure 5-29: The average T-index profile of the 5 optimal electrode sites whose Ωmax profiles are depicted in Figure 5-28. The 5 sites become and remain dynamically entrained approximately 0.5 hour prior to the onset of seizure 14.

PAGE 179

Figure 5-30: Angular frequency (Ωmax) profiles between seizures 13 and 14 (patient 1) of 5 non-optimally selected electrode sites.

Figure 5-31: The average T-index profile of the 5 non-optimal electrode sites whose Ωmax profiles are depicted in Figure 5-30. The 5 sites do not become dynamically entrained between seizures 13 and 14.

PAGE 180

without the use of optimization (e.g., in Figures 5-30 and 5-31). In Figure 5-30, the profiles from 5 randomly selected electrode sites (k=5) from the same 10-minute interval prior to seizure 13 are estimated for the same interval between seizures 13 and 14. The corresponding average T-index profile is shown in Figure 5-31. It is clear that the pre-ictal transition of seizure 14 cannot be reliably detected by analyzing the EEG from sites that were not selected using the optimization program.

5.7 Materials and Methods

5.7.1 Datasets

The datasets consisted of continuous long-term (3 to 12 days) multichannel intracranial EEG recordings that were acquired from 5 patients with medically intractable temporal lobe epilepsy. The recordings were obtained as part of a pre-surgical clinical evaluation. They were obtained using the Nicolet BMSI 4000 and 5000 recording systems, using a 0.1 Hz high-pass and a 70 Hz low-pass filter. Each record included a total of 28 to 32 intracranial electrodes (8 subdural and 6 hippocampal depth electrodes for each cerebral hemisphere). A diagram of electrode locations is provided in Figures 5-1 and 5-2. The characteristics of the recordings are outlined in Table 5-2. The recorded EEG signals were digitized, using a sampling rate of 200 Hz, and stored on magnetic media for subsequent off-line analysis. In this study, all the EEG recordings have been viewed by two independent board-certified electroencephalographers to determine the number and type of recorded seizures, seizure onset and end times, and seizure onset zones.

5.7.2 Seizure Warning Algorithm

The Seizure Warning Algorithm is outlined in Figure 5-32. This algorithm involves the following steps:
1.
Continuous STLmax calculation: STLmax values were iteratively calculated from sequential non-overlapping 10.24-second EEG epochs obtained from each electrode site, utilizing the method described in Section 5.4.2. The STLmax method accomplishes a large data reduction (each 10.24-second EEG epoch, that is, 2048 samples at the 200 Hz sampling rate, becomes a single sample in the STLmax profiles), and it is applied sequentially

PAGE 181

Table 5-2: Characteristics of analyzed EEG dataset
(Number of seizures by type: CP = complex partial, SG = partial secondarily generalized, SC = subclinical)

Patient  Gender  Age  Number of    CP   SG   SC   Total length of  Range of seizure
                      electrodes                  data (hours)     interarrival time (hours)
1        Female  41   32           17   3    0    83.30            0.3-14.5
2        Male    29   28           8    0    7    140.15           0.3-70.8
3        Female  38   32           6    0    0    18.24            1.1-4.8
4        Male    60   28           0    7    0    121.92           2.7-78.7
5        Female  45   28           3    0    6    69.53            0.5-47.9

PAGE 182

to each EEG channel, creating a new multichannel time series that is utilized for subsequent analysis.
2. Optimization (Selection of critical electrode sites): Critical electrode sites are identified automatically by the optimization algorithm described in Section 5.5, after the first seizure occurs. These critical sites are updated after each subsequent electrographic seizure. Selection of critical electrode sites is accomplished in 2 steps. The first step is to generate two T-matrices, one from the 10-minute epoch prior to the seizure onset, the other from the 10-minute epoch after the seizure onset. This is accomplished by calculating the T-indices (described in Section 5.5) of STLmax for all possible pairs of electrode sites. In the second step, the optimization algorithm is applied with the two T-matrices as its objective functions to identify the most critical group of electrode sites. In this case, the algorithm identifies the group of sites which are most entrained prior to the seizure, conditional on their disentrainment after the seizure onset. When selecting the critical sites, there are two parameters to be trained in this algorithm: the number of sites (k) per group and the number of groups (m) to be selected. In this study, for each patient, we utilize the first half of the seizures to train k (3 to 6) and m (1 to 5). The optimal parameter setting is then identified by observing the ROC curves of the seizure warning performance, and is applied to the testing seizures. The evaluation of the warning performance is discussed in the next subsection.
3. Monitoring the average T-index curve of selected electrode groups: Once groups of critical electrode sites are chosen, the average T-index value for each of these groups is continuously calculated from STLmax profiles, using sequential 10-minute overlapping windows.
The average T-index values are continuously compared to a preset threshold T-index value, defined as the value below which the average difference of the values of STLmax in the corresponding time window is not significantly different from 0 (p > 0.01). When the average T-index of a group of selected sites becomes less than this threshold, the group is considered to be entrained.
4. Warning of an impending seizure: The objective of the automated seizure warning system is to detect pre-ictal transitions in order to prospectively warn of an impending seizure. A seizure warning is generated when the pre-ictal transition is detected. The onset of the pre-ictal transition is defined as that point in time when at least one of the monitored groups of critical electrode sites is entrained. That is, the average T-index for that group of sites, initially above 5 (disentrained), drops to a value of 2.662 or less (entrained). These critical values were chosen based on the following statistical considerations: when the T-index is greater than 5, the average STLmax values for electrode pairs are highly significantly different (p-value < 0.00001); when the T-index is less than 2.662,

PAGE 183

the average STLmax values for electrode pairs are not significantly different (p-value > 0.01). Following each successive seizure, new groups of critical electrode sites are reselected and the algorithm is repeated.

5.7.3 Evaluation of the Seizure Warning Algorithm

To test this algorithm, a warning was considered to be true if a seizure occurred within 3 hours after an entrainment transition was detected. A 3-hour period was chosen for purposes of this analysis, based upon the seizure warning intervals observed in preliminary studies of seizure predictability [66]. If no seizure occurred in that period, the warning was considered to be false. If a seizure occurred without a warning during the preceding 3 hours, the algorithm was judged to have failed to warn of that seizure. Thus, the sensitivity was defined as the total number of seizures accurately predicted divided by the total number of seizures recorded. The false prediction rate was defined as the average number of false warnings per hour.

5.8 Results

Figure 5-33 shows an example of STLmax profiles versus time, derived from EEG signals recorded from 5 critical electrode sites. These sites, selected from the first seizure in the series, diverge with respect to the values of STLmax after that seizure and converge to a common value prior to the next seizure (pre-ictal transition). After the occurrence of the second seizure, reselection of critical sites is made. Pre-ictal transition and post-ictal divergence are reflected in the corresponding average T-index curves, with a gradual reduction pre-ictally and a more rapid rise post-ictally, as shown in Figure 5-34. This sequence of dynamical state transitions is repeated after each seizure. We first applied the algorithm to the training seizure set of each patient to determine the optimal parameter (k and m) settings. Figure 5-35 shows the ROC curve of each patient.
Table 5-3 summarizes the optimal parameter settings and their seizure warning performance. The criterion for determining the optimal parameter

PAGE 184

[Flow diagram; box labels: EEG signals; Continuous STLmax calculation; Optimization (selection of critical electrode sites); Monitoring T-index curves among selected electrode sites; Observe preictal entrainment transition (Yes/No); Warn of an impending seizure; Observe seizure (Yes/No)]

Figure 5-32: Flow diagram of the Seizure Warning Algorithm. This diagram illustrates the steps employed in the automated algorithm (see text for an explanation of each step).

PAGE 185

settings is that, for patients 1 and 2, who had larger numbers of training seizures (10 and 7, respectively), the sensitivity must be larger than 80% with the minimum false positive rate per hour. For patients 3, 4 and 5, due to the small number of training seizures (3 for each), the sensitivity must be at least 66.66% with the minimum false positive rate per hour. In these 5 training sets, the percentage of seizures that were correctly predicted ranged from 66.66% (patients 4 and 5) to 100% (patient 3), with an overall sensitivity of 80.77% (21/26). The false warnings occurred at a rate ranging from 0.00 to 0.234 (overall 0.159) false warnings per hour. This corresponds, on average, to a false warning every 6.3 hours. Table 5-4 summarizes the performance of the algorithm in the 5 testing seizure sets when using the selection parameters from the training seizure sets. The prediction sensitivity ranged from 85.71% (patient 2) to 100% (patients 3, 4 and 5), with an overall sensitivity of 91.67% (22/24). The false warning rates ranged from 0.049 to 0.366 (overall 0.196) false warnings per hour. This corresponds, on average, to a false warning every 5.1 hours.

5.9 Conclusions and Discussion

We have proposed a system approach to the study of the physiological disturbances that occur in human epilepsy. This approach is summarized in Figure 5-36. From the results reported herein, and compared with other reported findings, this approach illustrates the value of combining optimization methods with measures of the dynamical characteristics of the EEG generated by the epileptic brain. Our analysis of continuous intracranial EEG recordings over a long period revealed that seizure generation in temporal lobe epilepsy takes place over a period of hours.
We have shown the state changes in the EEG signals using the Lyapunov dimension and the combination of STLmax, entropy and angular frequency. These changes unfold as a cascade of electrophysiologic "precursor" events, beginning in the epilepsy region

PAGE 186

[Plot: Lmax (bit/sec) versus time (minutes), 0 to 250 minutes, with seizures SZ#8, SZ#9 and SZ#10 and the two points of site reselection marked]

Figure 5-33: A plot of STLmax values calculated from a 250-minute sample of intracranial EEG recording which contains 3 of the complex partial seizures recorded from patient 1. After seizures 8 and 9, 5 critical electrode sites (AR4, AL4, BR2, BR3 and BL2 after seizure 8; BR1, BR2, BR4, CR2 and CR8 after seizure 9) were selected by the global optimization algorithm. At this point in time, STLmax values for these selected sites are significantly different (disentrained). Prior to seizures 9 and 10, STLmax values from these same sites converge to a common value (entrained), and these sites become disentrained after seizures 9 and 10.

[Plot: T-index versus time (minutes), 0 to 250 minutes, with seizures SZ#8, SZ#9 and SZ#10, the two warning times and the two points of site reselection marked]

Figure 5-34: This average T-index profile was calculated from the STLmax profiles shown in Figure 5-33. When the average T-index drops from a value of 5 or above to a critical value of 2.662, the average T-index for these sites is not significantly different from 0. At that point, the sites are considered to be dynamically entrained and a seizure warning is generated by the system. Seizure warnings are generated approximately 50 minutes before seizure 9 and approximately 70 minutes before seizure 10.
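The monitoring and warning rule described above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used in the study. The T-index is defined in Section 5.5, which is not reproduced here, so the paired t-statistic form used below (square root of the window length, times the absolute mean of the pairwise STLmax differences, divided by their sample standard deviation) is an assumption. The function names `t_index`, `average_t_index` and `warning_indices` are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, stdev


def t_index(x, y):
    """Paired t-statistic between two STLmax windows of equal length.

    Assumed form: sqrt(N) * |mean(d)| / std(d), with d = x - y.
    """
    d = [a - b for a, b in zip(x, y)]
    s = stdev(d)
    if s == 0:
        # Degenerate window: identical profiles give 0, a constant offset gives inf.
        return 0.0 if mean(d) == 0 else math.inf
    return math.sqrt(len(d)) * abs(mean(d)) / s


def average_t_index(profiles):
    """Average T-index over all pairs of sites in one time window."""
    return mean(t_index(x, y) for x, y in combinations(profiles, 2))


def warning_indices(avg_t, upper=5.0, lower=2.662):
    """Indices where the average T-index, after exceeding `upper`
    (disentrained), first drops to `lower` or less (entrained)."""
    warnings = []
    armed = False  # True once the group has been seen disentrained
    for i, t in enumerate(avg_t):
        if t > upper:
            armed = True
        elif armed and t <= lower:
            warnings.append(i)
            armed = False  # wait for the next disentrainment before warning again
    return warnings


if __name__ == "__main__":
    print(warning_indices([6.0, 4.0, 3.0, 2.5, 2.0, 6.1, 2.6]))
```

The two-threshold rule means that a group which simply stays below 2.662 does not keep issuing warnings; it has to become disentrained again first, which matches the description of the transition as a drop from above 5 to 2.662 or less.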

PAGE 187

Table 5-3: Performance characteristics of automated seizure warning algorithm with optimal parameter settings of training data

Patient       Number of  # of critical  # of selected  Sensitivity      False prediction   Average warning
              analyzed   electrodes in  critical                        rate (false per    time (minutes)
              seizures   each group     groups                          hour)
1             11         5              3              80.00% (8/10)    0.095 (4/46.248)   67.8 ± 20.9
2             8          3              2              85.71% (6/7)     0.234 (20/85.468)  66.2 ± 15.7
3             4          3              4              100.00% (3/3)    0.000 (0/7.140)    71.5 ± 12.3
4             4          5              2              66.67% (2/3)     0.065 (1/15.328)   39.0 ± 20.4
5             4          4              1              66.67% (2/3)     0.151 (9/59.565)   72.7 ± 47.8
All patients  31                                       80.77% (21/26)   0.159 (34/213.749) 63.44 ± 6.22

PAGE 188

[ROC curves for the training datasets of 5 patients, one panel per patient (Patient 1 to Patient 5): sensitivity versus false positive rate (false/hour), with separate curves for selecting 3, 4, 5 and 6 cortical sites and the optimal parameter setting marked in each panel]

Figure 5-35: ROC curve for the optimal parameter setting of 5 patients

Table 5-4: Performance characteristics of automated seizure warning algorithm testing on optimal training parameter settings

Patient       Number of  Sensitivity      False prediction    Average warning
              analyzed                    rate (false per     time (minutes)
              seizures                    hour)
1             10         88.89% (8/9)     0.049 (2/41.134)    79.3 ± 13.2
2             8          85.71% (6/7)     0.366 (20/54.685)   90.2 ± 19.2
3             3          100.00% (2/2)    0.137 (1/7.278)     79.9 ± 6.2
4             4          100.00% (3/3)    0.178 (19/106.59)   108.8 ± 7.4
5             4          100.00% (3/3)    0.100 (1/9.967)     104.9 ± 21.1
All patients  31         91.67% (22/24)   0.196 (43/219.654)  92.6 ± 6.2
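The evaluation criteria of Section 5.7.3 (a 3-hour prediction horizon, sensitivity, and false warnings per hour) can be written down as a short sketch. The function `evaluate_warnings` and its arguments are hypothetical names introduced for illustration, and `monitored_hours` is assumed to be whatever number of hours the false prediction rate is normalized by; the tables do not state how that denominator was obtained.

```python
def evaluate_warnings(warning_times, seizure_times, monitored_hours, horizon=3.0):
    """Score warnings against seizure onsets (all times in hours).

    A seizure counts as predicted if some warning precedes it by at most
    `horizon` hours; a warning is false if no seizure follows it within
    `horizon` hours.
    """
    def linked(w, s):
        return 0 < s - w <= horizon

    predicted = sum(
        any(linked(w, s) for w in warning_times) for s in seizure_times
    )
    false_warnings = sum(
        not any(linked(w, s) for s in seizure_times) for w in warning_times
    )
    sensitivity = predicted / len(seizure_times)
    false_rate = false_warnings / monitored_hours
    return sensitivity, false_rate


if __name__ == "__main__":
    # Toy example: the seizure at 13.0 h is missed because its nearest
    # warning (9.5 h) is 3.5 hours earlier, outside the 3-hour horizon.
    print(evaluate_warnings([1.0, 5.0, 9.5], [2.0, 13.0], monitored_hours=20.0))
    # Reported overall rates expressed as hours between false warnings.
    print(round(1 / 0.159, 1), round(1 / 0.196, 1))
```

The last line reproduces the conversion quoted in the text: 0.159 false warnings per hour is one false warning roughly every 6.3 hours, and 0.196 per hour is one roughly every 5.1 hours.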

PAGE 189

and spreading to adjacent regions before clinical seizures occur, which may be used to predict an impending epileptic seizure. Seizure prediction eventually should be coupled with treatment strategies that interrupt the process before seizures begin, or immediately after their onset is detected with more advanced methods of imaging or electrophysiology.

The spatio-temporal dynamics of the EEG represent quantitative characteristics of the epileptic brain that may be useful in the development of a physiological interpretation. One approach to this objective is to model the evolution of a multidimensional state vector over time. A first step in this direction is to identify those dynamical measures that provide the best characterization of the system as it evolves from state to state (interictal → pre-ictal → ictal → post-ictal → interictal). Based on this study, entropy, the Lyapunov dimension, STLmax, and angular frequency are candidates for inclusion in a multidimensional state vector.

Further understanding of the dynamics underlying state transitions that occur in human epilepsy could serve as the basis for developing new approaches to the diagnosis and treatment of epilepsy. In most instances, the diagnosis is based upon the description of the clinical manifestations of the seizure, provided by the patient and family members, and the interictal EEG. However, the clinical phenomena of epilepsy are not unique to epilepsy and can be caused by other disorders such as migraine, transient ischemic attacks, syncope, and various psychiatric disorders. The diagnostic problem is further complicated by the fact that an interictal EEG may be normal or show only nonspecific abnormalities (based on the current state of the art, which is visual inspection of the EEG tracings by an electroencephalographer).
It is possible that a better understanding of EEG dynamics will lead to the development of more powerful ways to differentiate the interictal EEG of epileptic patients from those of patients with other disorders and from neurologically normal individuals.

PAGE 190

Figure 5-36: Schematic diagram of the system approach to the modeling of EEG in epileptic patients for purposes of seizure prediction and analysis of dynamical mechanisms.

PAGE 191

Currently, apart from surgical removal of the seizure onset zone, the only established way to control seizures is through chronic use of medications or with vagal nerve stimulation. Surgical removal of the seizure onset zone is not always feasible or successful and is associated with the risk of surgical and anesthetic morbidity. Chronic medication therapy is not always effective and depends on patient compliance. Further, all medications can produce side-effects. In medically intractable patients who are not candidates for resective surgery (removal of the seizure onset zone), the vagal nerve stimulator may be used in an attempt to reduce the frequency and severity of seizures. The vagal nerve stimulator is an implantable device that gives an electrical stimulus to the vagus nerve in the neck at fixed intervals. The success of this device has led to research on the application of electrical stimulation for seizure control, including the study of the effects of electrical stimulation of the brain itself.

The gradual evolution of the pre-ictal state provides a unique opportunity for the development of therapeutic interventions directed toward aborting the development of an impending seizure [69, 137, 66]. Our previous studies in human temporal lobe epilepsy have shown that the pre-ictal dynamical transition evolves on a time scale on the order of 30 minutes to an hour [73]. This long time interval provides sufficient time to intervene therapeutically. Potential means of intervention include electrical or magnetic stimulation of the vagus nerve, the trigeminal nerve, or the brain itself (e.g., thalamus or cortex). A pre-ictal intervention can be considered as the application of an external control in order to modify the future evolution of brain dynamics.
The development of the appropriate model of the epileptic brain will provide opportunities to investigate the effects of various approaches to dynamical control. This would be an essential first step in the development of novel treatments, based upon the theory of nonlinear dynamics. Clearly, further work is needed to carefully probe experimentally observed dynamics of the epileptic brain, and to clarify the

PAGE 192

bifurcation and self-organization structures that display complex probability distribution functions.

The results of this study confirm our hypothesis that it is possible to predict an impending seizure based on optimization and nonlinear dynamics of multichannel intracranial EEG recordings. Prediction is possible because, for the vast majority of seizures, the spatiotemporal dynamical features of the pre-ictal transition are sufficiently similar to those of the preceding seizure. This similarity makes it possible to identify electrode sites that will participate in the next pre-ictal transition, based on their behavior during the previous pre-ictal transition. Although evidence for the characteristic pre-ictal transition utilized by the seizure prediction algorithm employed in this study was first reported in 1991 [69], further studies were required before a practical seizure prediction algorithm was feasible. Development of a seizure prediction algorithm was complicated by three factors: (1) the cortical sites participating in the pre-ictal transition varied from seizure to seizure, (2) the length of the pre-ictal transition varied from seizure to seizure, and (3) it was not known whether or not this type of spatiotemporal transition was unique to the preictal period. These problems were overcome by the use of our proposed linearization methods to solve multi-quadratic integer programming (MQIP) problems. Because the algorithm selects candidate electrode sites by analyzing continuous EEG recordings several days in duration, the computational approach to solving the optimization problem has to be very efficient. At present, we can efficiently solve the multi-quadratic integer programming problem. However, future technology may allow physicians to implant thousands of electrode sites, n > 1000, in the brain. This procedure will extract more information and allow us to gain more understanding of the brain.
Therefore, to solve this optimization problem with n > 1000, we may need computationally fast heuristic approaches in the future.
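To make the scaling concern concrete, the site-selection problem can be stated as a brute-force search over all groups of k sites. The sketch below is plain exhaustive enumeration, not the linearization-based MQIP method proposed in this work, and it is only practical for small n. The names `select_sites`, `t_pre`, `t_post` and `min_post_avg` are hypothetical, and the exact form of the post-seizure disentrainment condition (here, a lower bound on the average post-seizure T-index of the group) is an assumption.

```python
import math
from itertools import combinations


def select_sites(t_pre, t_post, k, min_post_avg=2.662):
    """Exhaustively pick the k sites with the lowest average pre-seizure
    T-index, subject to the group's average post-seizure T-index being at
    least `min_post_avg` (assumed disentrainment condition).

    t_pre and t_post are symmetric n-by-n T-index matrices (lists of lists).
    Returns (best_group, best_average), or (None, inf) if no group qualifies.
    """
    if k < 2:
        raise ValueError("k must be at least 2 so that site pairs exist")
    n = len(t_pre)
    best_group, best_avg = None, math.inf
    for group in combinations(range(n), k):
        pairs = list(combinations(group, 2))
        pre_avg = sum(t_pre[i][j] for i, j in pairs) / len(pairs)
        post_avg = sum(t_post[i][j] for i, j in pairs) / len(pairs)
        if post_avg >= min_post_avg and pre_avg < best_avg:
            best_group, best_avg = group, pre_avg
    return best_group, best_avg


if __name__ == "__main__":
    t_pre = [[0, 1, 2, 9], [1, 0, 1.5, 8], [2, 1.5, 0, 7], [9, 8, 7, 0]]
    t_post = [[0, 6, 6, 1], [6, 0, 6, 1], [6, 6, 0, 1], [1, 1, 1, 0]]
    print(select_sites(t_pre, t_post, k=3))
    # Number of candidate groups grows combinatorially with n.
    print(math.comb(32, 5), math.comb(1000, 5))
```

With 32 electrodes and k = 5 there are 201,376 candidate groups, which enumeration can handle; with n = 1000 the count is on the order of 8 × 10^12, which is why an exact integer programming formulation or a fast heuristic becomes necessary.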

PAGE 193

CHAPTER 6
CONCLUDING REMARKS AND FUTURE RESEARCH

6.1 Summary

This thesis is concerned with the problem of predicting target events and discovering temporal patterns in multiple time series governing the related target episodes of events. The new time series data mining concepts, which generalize data mining concepts, dynamical approaches in chaos theory, and optimization techniques, are applied to the areas of time series analysis. The optimization techniques are developed to improve the performance of prediction of time series by identifying critical temporal patterns related to the target events (e.g., dynamical parameter settings and selection of critical components). For instance, several alternative optimization methods for selecting the critical components in the systems are employed, and a novel combination of methods for determining the optimal parameters can be applied to systems with one or more hidden variables, which can be used to reconstruct maps or differential equations of the dynamics of the system. Specifically in this research, studies based on chaos theory of the spatiotemporal dynamics in EEGs from patients with temporal lobe epilepsy demonstrated a preictal transition (temporal patterns in multiple EEG recordings), characterized by a progressive convergence (entrainment) of dynamical measures (e.g., short-term maximum Lyapunov exponents, STLmax) at specific anatomical areas in the neocortex and hippocampus, of approximately 1/2 to 1 hour duration before the seizure onset. The problem of identifying those critical specific areas is overcome by the proposed optimization technique. The developed technique can guarantee global optimality and can be generalized to solve other well-known nonlinear programming problems (e.g., maximum clique problems, nonlinear assignment problems, maximum

PAGE 194

independent set problems). While this developed technique solves the multi-quadratic integer programming problems with global optimality, it linearizes the problem with the same number of 0-1 variables (n) and an additional O(n) number of continuous variables (compared with the additional O(n^2) number of 0-1 variables in conventional linearization techniques in the literature).

6.2 Future Research

Theoretical research will be conducted to develop optimization techniques used to optimally determine the required dimension of the reconstructed phase space given an arbitrary number of observable states. As the time series data sets grow larger, the computational effort required to quantify the dynamics of the data and find hidden temporal patterns grows, requiring higher performance implementations of the dynamical measures and optimization techniques. This includes how to determine the phase space dimension for an arbitrary number of observable states so that the phase space is topologically equivalent to the original state space. In the future, we may need to investigate the relationship between the number of observable states n and the required phase space dimensionality when 1 < n < Q. Improving computational performance is also desirable. One direction is to investigate alternative global optimization methods such as interval branch and bound. A second, parallel direction is to investigate distributed and parallel implementations of the dynamical approaches.

6.2.1 Dynamical Approaches

In this research, new techniques to control a chaotic system based on adaptive control need to be studied. By observing the states of the system to be controlled, the controller will try to adjust its parameters to achieve a desired control objective.
To better understand the dynamics and the spatiotemporal patterns of time series, we are required to develop methods which enable us to estimate the local Lyapunov

PAGE 195

exponents with adaptively optimized parameter settings, along with estimates of the variabilities of the estimators or confidence intervals.

6.2.2 Global Optimization

For a large scale quadratic integer programming problem, heuristic approaches to solve this large problem (note that the solution may not be a global minimum) are desirable. In the future, we may need to develop mathematical models of the epileptic brain, which enable us to understand the theoretical basis for mechanisms of pre-ictal transitions.

6.3 Applications in Finance

Time series analysis has been applied to test financial data, and the results often indicate that nonlinear structure is present. In the finance literature, the TSDM framework was shown to be able to generate a trading edge by characterizing and predicting stock price events. It is interesting to apply the developed dynamical approaches and optimization techniques in the financial domain. Subsequently, we may be able to answer the open question of whether or not there is chaos in the stock market. In dynamical systems theory, a chaotic system behaves irregularly because of its own internal logic, not because of random forces acting from outside. If we define our dynamical system to be the socio-economic behavior of the entire planet, nothing acts randomly from outside. However, the system's dimension (number of state variables) is vast, and it is impossible to exploit the determinism. This is high-dimensional chaos, which might just as well be truly random behavior. In this sense, the stock market is chaotic. To be useful, economic chaos would have to involve some kind of collective behavior which can be fully described by a small number of variables. In practice, the system has to be self-organizing, resulting in low-dimensional chaos.
If we can prove this result, then we can exploit the low-dimensional chaos in the stock market to make short-term predictions. The problem of interest is to identify the state

PAGE 196

variables which characterize the collective forms. In addition, having limited the number of state variables, many events become external to the system; that is, the system is operating in a changing environment, which makes the problem of system identification very difficult. In fact, if there were such collective forms of fluctuation, market players would probably recognize these patterns, act to exploit them, and quickly nullify the patterns. Market participants would probably not need to know chaos theory for this to happen. Therefore, if these patterns exist, they must be pretty hard to recognize because they do not emerge clearly from the noise caused by individual actions or very short patterns.

Figures 6-1, 6-3, and 6-5 represent the AA, CAT, and DD stock indices, respectively, over 16 years (from 01/04/1984 to 01/04/2000). By dividing the recorded index data into sequential 7-day overlapping segments, each 1024 days in duration, and estimating STLmax for each of these segments, profiles of STLmax over time are generated. Typical plots of STLmax over time obtained from the AA, CAT, and DD indices for 16 years (from 01/04/1984 to 01/04/2000) are shown in Figures 6-2, 6-4, and 6-6, respectively. In Figure 6-2, the drop in the values of STLmax between time t = 3600 and time t = 3700 precedes the progressive increment of the AA index starting at time t = 3900 (see Figure 6-1); thus the trend of the index increment can be detected from the drop in STLmax values. Similarly, in Figure 6-4, the drop in the values of STLmax between time t = 2900 and time t = 3000 precedes the progressive increment of the CAT index starting at time t = 3000 (see Figure 6-3); thus the trend of the index increment can be detected from the drop in STLmax values.
Correspondingly, in Figure 6-6, the drop in the values of STLmax between time t = 1200 and time t = 1600 precedes the progressive increment of the DD index starting at time t = 1700 (see Figure 6-5); thus the trend of the index increment can be detected from the drop in STLmax values.
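The segmentation used to build these profiles can be sketched as a sliding window. The sketch assumes that "sequential 7-day overlapping segments, each 1024 days in duration" means windows of 1024 trading days advanced 7 days at a time; that reading is an interpretation, not something the text states outright. The STLmax estimator itself is not reproduced here: `measure` is a hypothetical placeholder for any function that maps one window to a number, and `sliding_windows` and `profile` are illustrative names.

```python
def sliding_windows(series, length=1024, step=7):
    """Yield (start_index, window) for overlapping windows of `length`
    samples, advancing `step` samples at a time."""
    for start in range(0, len(series) - length + 1, step):
        yield start, series[start:start + length]


def profile(series, measure, length=1024, step=7):
    """Apply `measure` to each window; each value is stamped with the
    index of the last sample in its window."""
    return [
        (start + length - 1, measure(window))
        for start, window in sliding_windows(series, length, step)
    ]


if __name__ == "__main__":
    prices = list(range(4000))  # stand-in for about 4000 daily index values
    # Placeholder measure (window range); an STLmax estimator would go here.
    points = profile(prices, measure=lambda w: max(w) - min(w))
    print(len(points), points[0], points[-1])
```

Under these assumptions a 4000-day record yields 426 profile points, the first of which only becomes available on day 1024, so any drop in the profile summarizes roughly four years of preceding trading days.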

PAGE 197

[Plot: AA index, price ($) versus days from 01/04/1984]

Figure 6-1: The AA stock index over 4000 operating days (approximately 16 years) from 01/04/1984 to 01/04/2000.

[Plot: STLmax profiles of AA index versus days from 01/04/1984]

Figure 6-2: The AA STLmax profiles over 4000 operating days (approximately 16 years) from 01/04/1984 to 01/04/2000.

PAGE 198

[Plot: CAT index, price ($) versus days from 01/04/1984]

Figure 6-3: The CAT stock index over 4000 operating days (approximately 16 years) from 01/04/1984 to 01/04/2000.

[Plot: STLmax profiles of CAT index versus days from 01/04/1984]

Figure 6-4: The CAT STLmax profiles over 4000 operating days (approximately 16 years) from 01/04/1984 to 01/04/2000.

PAGE 199

[Plot: DD index, price ($) versus days from 01/04/1984]

Figure 6-5: The DD stock index over 4000 operating days (approximately 16 years) from 01/04/1984 to 01/04/2000.

[Plot: STLmax profiles of DD index versus days from 01/04/1984]

Figure 6-6: The DD STLmax profiles over 4000 operating days (approximately 16 years) from 01/04/1984 to 01/04/2000.

PAGE 200

The values of the STLmax profiles of the AA, CAT and DD indices are positive during the whole period of recordings, implying that there exists chaos in the stock market (positive maximum Lyapunov exponents). This confirms the hypothesis that the stock market is a chaotic system. These patterns in dynamical measures (STLmax) of the stock indices may allow us to predict the trend (behavior) of the indices. In the future, it is desirable to develop methods that are able to find these patterns. We can postulate that when the market becomes more and more ordered (drop in STLmax), the system is very organized. This might lead to the increase of the stock indices. However, because it seems unlikely that markets remain stationary long enough to identify a chaotic attractor, we may need dynamic optimization techniques to adaptively find the optimal dimension and temporal pattern of a given system. Essentially, we expect that these techniques will be able to identify rhythms of market trading to make short-term predictions.
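As a reminder of what the sign of a maximum Lyapunov exponent conveys, the sketch below computes the exponent of the logistic map, a textbook system that is not part of this study. It averages the logarithm of the absolute derivative along an orbit, which is possible only because the map is known in closed form; this is not the STLmax algorithm, which estimates the exponent from measured data. The function name and its default arguments are illustrative choices.

```python
import math


def logistic_lyapunov(r, x0=0.2, n_transient=1000, n_iter=100_000):
    """Lyapunov exponent of x -> r*x*(1-x), estimated as the orbit
    average of log|f'(x)| with f'(x) = r*(1 - 2x)."""
    x = x0
    for _ in range(n_transient):  # discard the initial transient
        x = r * x * (1 - x)
    total = 0.0
    for _ in range(n_iter):
        total += math.log(abs(r * (1 - 2 * x)))
        x = r * x * (1 - x)
    return total / n_iter


if __name__ == "__main__":
    print(logistic_lyapunov(3.2))  # periodic regime: negative exponent
    print(logistic_lyapunov(4.0))  # chaotic regime: close to ln 2
```

For r = 3.2 the orbit settles onto a period-2 cycle and the exponent is negative; for r = 4 the exponent is known analytically to be ln 2, so nearby orbits separate exponentially. A drop in an exponent estimate over time, as in the STLmax profiles above, therefore corresponds to a move toward more ordered dynamics.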

BIOGRAPHICAL SKETCH

Wanpracha Chaovalitwongse was born on April 22, 1979, in Bangkok, Thailand. He received his Bachelor of Engineering degree in telecommunication engineering from King Mongkut Institute of Technology, Ladkrabang, Thailand, in April 1999 and Master of Science degree in industrial and systems engineering from University of Florida, Gainesville, USA, in December 2000. He then began doctoral studies in the department of Industrial and Systems Engineering at University of Florida and received his Ph.D. in August 2003. His research interests include global optimization, combinatorial optimization, chaos theory, and nonlinear dynamics in financial and biomedical engineering applications.