From Fixed to Adaptive Budget Robust Kernel Adaptive Filtering


Material Information

Title:
From Fixed to Adaptive Budget Robust Kernel Adaptive Filtering
Physical Description:
1 online resource (122 p.)
Language:
english
Creator:
Zhao, Songlin
Publisher:
University of Florida
Place of Publication:
Gainesville, Fla.
Publication Date:
2012
Thesis/Dissertation Information

Degree:
Doctorate (Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Electrical and Computer Engineering
Committee Chair:
Principe, Jose C
Committee Members:
Rangarajan, Anand
Shea, John M
Chen, Yunmei

Subjects

Subjects / Keywords:
kernel -- songlin
Electrical and Computer Engineering -- Dissertations, Academic -- UF
Genre:
Electrical and Computer Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract:
Recently, owing to their universal modeling capacity, convexity of the performance surface, and modest computational complexity, kernel adaptive filters have attracted increasing attention. Even though these methods achieve powerful classification and regression performance on complicated nonlinear problems, they have drawbacks. This work focuses on improving kernel adaptive filters with respect to both accuracy and computational complexity. After reviewing the cost functions of existing adaptive filters, we introduce an information-theoretic objective function, the Maximum Correntropy Criterion (MCC), which contains higher-order statistical information. We propose to adopt this objective function for kernel adaptive filters to improve accuracy in nonlinear and non-Gaussian scenarios. To determine its free parameter, the kernel width of correntropy, an adaptive method based on the statistical properties of the prediction error is proposed. After that, we propose a growing and pruning method that realizes a fixed-budget kernel least mean square (KLMS) algorithm, based on improvements to the quantized kernel least mean square algorithm and a new significance measure. The end result controls the computational complexity and memory requirement of kernel adaptive filters while preserving accuracy as much as possible. This balance between accuracy and filter model order is then explored from the perspective of information learning. The issue is how to deal with the trade-off between system complexity and accuracy, and an information-learning criterion called Minimum Description Length (MDL) is introduced to kernel adaptive filtering. Two formulations of MDL, batch and online, are developed and illustrated by approximation-level selection in KRLS-ALD and center-dictionary selection in KLMS, respectively. The end result is a methodology that controls the kernel adaptive filter dictionary (model order) according to the complexity of the true system and the input signal for online learning, even in nonstationary environments.
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Bibliography:
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility:
by Songlin Zhao.
Thesis:
Thesis (Ph.D.)--University of Florida, 2012.
Local:
Adviser: Principe, Jose C.
Electronic Access:
RESTRICTED TO UF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE UNTIL 2013-06-30

Record Information

Source Institution:
UFRGP
Rights Management:
Applicable rights reserved.
Classification:
lcc - LD1780 2012
System ID:
UFE0044951:00001




Full Text

FROM FIXED TO ADAPTIVE BUDGET ROBUST KERNEL ADAPTIVE FILTERING

By

SONGLIN ZHAO

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2012

© 2012 Songlin Zhao

ACKNOWLEDGMENTS

I would like to sincerely thank my advisor, Dr. J. C. Principe, for his support, encouragement and patience in guiding the research. His openness and support of my investigative ideas have helped me become a better researcher. I would like to thank Dr. Anand Rangarajan, Dr. John M. Shea, and Dr. Yunmei Chen for serving on my committee and for their helpful advice. Their valuable comments and constructive criticism helped improve the quality of this work greatly. Many thanks are due to a former lab member, Dr. Badong Chen, for his advice and the many discussions that made my research much easier and happier. Thanks are due to all of the current lab members for their support. Finally, I would like to thank my parents for their encouragement and support.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1  INTRODUCTION
   1.1  Linear And Nonlinear Adaptive Filter
      1.1.1  Linear Adaptive Filter
      1.1.2  Nonlinear Adaptive Filter
   1.2  Reproducing Kernel Hilbert Space
   1.3  Kernel Adaptive Filtering
      1.3.1  Kernel Recursive Least Square Algorithm
         1.3.1.1  Approximate linear dependency for sparsification
         1.3.1.2  Regularized KRLS-ALD
      1.3.2  Kernel Least Mean Square Algorithm
      1.3.3  Kernel Affine Projection Algorithm
   1.4  Work In This Thesis

2  KERNEL ADAPTIVE FILTERING WITH MAXIMUM CORRENTROPY CRITERION
   2.1  Cost Function And Information Theoretic Learning
      2.1.1  The Cost Function Development
      2.1.2  Definition And Properties Of Correntropy
      2.1.3  Correntropy In Linear Adaptive Filters
   2.2  Formulation Of Kernel Maximum-Correntropy Algorithm
   2.3  Implementation Of Kernel Maximum-Correntropy Algorithm
      2.3.1  Steady-State MSE Analysis
      2.3.2  Self Regularization
   2.4  Kernel Width Selection For Correntropy
   2.5  Simulation
      2.5.1  Frequency Doubling
   2.6  Conclusion

3  FIXED BUDGET QUANTIZED KERNEL LEAST-MEAN-SQUARE ALGORITHM
   3.1  Budget Controlling Strategy
      3.1.1  Growing And Pruning Techniques Overview
      3.1.2  Quantized Kernel Adaptive Filtering
   3.2  Significance Measure
      3.2.1  Definition Of Significance
      3.2.2  Estimation Of Significance
         3.2.2.1  Influence factor
         3.2.2.2  Integration approximation
         3.2.2.3  Recursive technique
   3.3  Fixed Budget Quantized Kernel Least-Mean-Square Algorithm
   3.4  Simulation
      3.4.1  Time Series Prediction
      3.4.2  Channel Equalization
         3.4.2.1  Comparison in different change rate conditions
         3.4.2.2  Outlier influence
         3.4.2.3  How the three free parameters, including K, affect the performance of QKLMS-FB
   3.5  Conclusion And Discussion

4  MDL-BASED SPARSIFICATION ALGORITHM FOR KERNEL ADAPTIVE FILTERING
   4.1  Information Theory For Sparsification
   4.2  KRLS-ALD Approximation Level Selection
      4.2.1  Formulation of KRLS-ALD Approximation Level Selection Technique Based on MDL
         4.2.1.1  Objective function
         4.2.1.2  Optimization selection
      4.2.2  Simulation
         4.2.2.1  System identification
         4.2.2.2  Sunspot prediction
   4.3  MDL-Based Quantized Kernel Least-Mean-Square Algorithm
      4.3.1  Objective Function
      4.3.2  Formulation Of MDL-Based Quantized Kernel Least-Mean-Square Algorithm
      4.3.3  Simulation
         4.3.3.1  System identification
         4.3.3.2  Santa Fe time-series prediction
   4.4  Conclusion

5  CONCLUSION AND FUTURE WORK
   5.1  Conclusion And Discussion
   5.2  Future Work

REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

1-1  Computational costs per iteration
2-1  The final estimated IEP comparison
2-2  Theoretical and simulated EMSE (Gaussian-mixture noise)
2-3  System comparison between fixed kernel width and different update methods (Gaussian-mixture noise)
2-4  System comparison between fixed kernel width and different update methods (Laplace-mixture noise)
2-5  Effect of the free parameter in KMC
3-1  Parameter settings for different algorithms in Lorenz time series prediction
3-2  The final MSE (dB) comparison of different algorithms in channel equalization. The parameters of the algorithms are chosen so as to produce almost the same final network size.
3-3  The final network size comparison of different algorithms in channel equalization. The parameters of the algorithms are chosen so as to produce almost the same final MSE in each condition.
3-4  Parameter settings in channel equalization to achieve almost the same final network size in each condition
3-5  Parameter settings in channel equalization to achieve almost the same MSE in each condition
3-6  Parameter settings in channel equalization for different algorithms (change rate equal to 1/500)
4-1  The second-term expression in each particular criterion
4-2  Mean free-run prediction error for the annual sunspot count time series
4-3  Computational costs per iteration
4-4  The final network size comparison of QKLMS and QKLMS-MDL in the system identification problem
4-5  Parameter settings in the channel equalization problem to achieve almost the same final network size in each condition
4-6  Parameter settings for different algorithms in time series prediction

LIST OF FIGURES

1-1  Basic structure of linear adaptive filtering
2-1  A typical kernel adaptive filtering
2-2  Contours of Correntropy(X, 0) in 2D sample space (kernel size is set to 1)
2-3  Cost function values of MSE, MEE and MCC: the first row shows the cost function values of these criteria and the second row is the distribution of the error
2-4  Overall training system structure
2-5  Examples of kernel width selection in a non-Gaussian environment
2-6  Simulation data
2-7  Average learning curves comparison along with standard deviation
2-8  Visual result of a representative simulation
2-9  Convergence curve in terms of the EMSE
2-10 The kernel size curves with respect to the series
3-1  Lorenz prediction time series that has a gradual change at iteration 2500
3-2  Bias component added to the Lorenz prediction time series
3-3  Performance of QKLMS and QKLMS-FB in Lorenz time series prediction under stationary conditions
3-4  Performance comparison for QKLMS, QKLMS-FB and QKLMS-GGAP in Lorenz time series prediction. The parameters of the algorithms are chosen such that they produce almost the same final network size.
3-5  Performance comparison for QKLMS, QKLMS-FB and KLMS in Lorenz time series prediction. The parameters of the algorithms are chosen in a way to produce almost the same final training MSE.
3-6  The influence of the fixed budget size on the final MSE in Lorenz time series prediction under nonstationary conditions
3-7  Basic structure of a nonlinear channel
3-8  The input of the channel equalization system
3-9  Performance comparison for QKLMS, QKLMS-FB and QKLMS-GGAP for the channel equalization problem under the condition of abrupt change
3-10 The curve of the number of centers allocated during H1 for the channel equalization problem
3-11 Effect of the quantization factor and forget factor on final MSE performance for the channel equalization problem with respect to the fixed network size
4-1  General rate-distortion curve of KRLS-ALD
4-2  A nonlinear Wiener system
4-3  The influence comparison of different regularization terms on the system identification problem
4-4  The point location decided by the proposed MDL-based method for the system identification problem
4-5  The normalized annual sunspot data
4-6  The comparison between KRLS-ALD with and without regularization for annual sunspot prediction
4-7  The point location decided by the proposed MDL-based method for annual sunspot prediction
4-8  The evolution curves of system parameters
4-9  Performance comparison for QKLMS and QKLMS-MDL in the system identification problem with transition area length 1000
4-10 Santa Fe time series data
4-11 Performance comparison in Santa Fe time series prediction. The parameters of the algorithms are chosen such that they produce almost the same maximum network size.
4-12 Performance comparison in Santa Fe time series prediction. The parameters of the algorithms are chosen such that they produce almost the same training MSE at the final stage.
4-13 Effect of the window length on final testing MSE in Santa Fe time series data
4-14 Effect of the window length on final center number in Santa Fe time series data

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

FROM FIXED TO ADAPTIVE BUDGET ROBUST KERNEL ADAPTIVE FILTERING

By

Songlin Zhao

December 2012

Chair: Jose C. Principe
Major: Electrical and Computer Engineering

Recently, owing to their universal modeling capacity, convexity of the performance surface, and modest computational complexity, kernel adaptive filters have attracted increasing attention. Even though these methods achieve powerful classification and regression performance on complicated nonlinear problems, they have drawbacks. This work focuses on improving kernel adaptive filters with respect to both accuracy and computational complexity.

After reviewing the cost functions of existing adaptive filters, we introduce an information-theoretic objective function, the Maximum Correntropy Criterion (MCC), which contains higher-order statistical information. We propose to adopt this objective function for kernel adaptive filters to improve accuracy in nonlinear and non-Gaussian scenarios. To determine its free parameter, the kernel width of correntropy, an adaptive method based on the statistical properties of the prediction error is proposed.

After that, we propose a growing and pruning method that realizes a fixed-budget kernel least mean square (KLMS) algorithm, based on improvements to the quantized kernel least mean square algorithm and a new significance measure. The end result controls the computational complexity and memory requirement of kernel adaptive filters while preserving accuracy as much as possible. This balance between accuracy and filter model order is explored from the perspective of information learning. Indeed, the issue is how to deal with the trade-off between system complexity and accuracy, and an information-learning criterion called Minimum Description Length (MDL) is introduced to kernel adaptive filtering. Two formulations of MDL, batch and online, are developed and illustrated by approximation-level selection in KRLS-ALD and center-dictionary selection in KLMS, respectively. The end result is a methodology that controls the kernel adaptive filter dictionary (model order) according to the complexity of the true system and the input signal for online learning, even in nonstationary environments.

CHAPTER 1
INTRODUCTION

1.1 Linear And Nonlinear Adaptive Filter

The term filter usually refers to a system that is designed to extract information about a prescribed quantity of interest from noisy data. An adaptive filter is a filter that self-adjusts its input-output mapping according to an optimization algorithm, mostly driven by an error signal. Because of the complexity of the optimization algorithms, most adaptive filters are digital filters. With the processing capabilities of current digital signal processors, adaptive filters have become much more popular and are now widely used in various fields such as communication devices, camcorders and digital cameras, and medical monitoring equipment.

1.1.1 Linear Adaptive Filter

In most adaptive signal processing applications, system linearity is assumed and adaptive linear filters are thus used. The traditional class of supervised adaptive filters relies on error-correction learning for its adaptive capability. Therefore, the error is the necessary element of a cost function, which is a criterion for optimum performance of the filter. To illustrate this form of learning, consider the filtering structure depicted in Fig. 1-1.

Figure 1-1. Basic structure of linear adaptive filtering

The linear filter includes a set of adaptively adjustable parameters (also known as weights), denoted ω(n−1), where n denotes discrete time. The input signal u(n) applied to the filter at time n produces the actual response y(n) via

    y(n) = ω(n−1)^T u(n)

Then this actual response is compared with the corresponding desired response d(n) to produce the error signal e(n). The error signal, in turn, acts as a guide to adjust the weights ω(n−1) by an incremental value denoted Δω(n). On the next iteration, ω(n) becomes the latest value of the weights to be updated. The adaptive filtering process is repeated continuously until the filter reaches a stopping condition, which normally is that the weight adjustment is small enough.

An important issue in adaptive design, no matter whether the filter is linear or nonlinear, is to ensure that the learning curve is convergent with an increasing number of iterations. Under this condition, we say the system is in a steady state.
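As a concrete illustration of the error-correction learning loop described above, the following minimal Python sketch implements a plain LMS update: y(n) = ω^T u(n), e(n) = d(n) − y(n), ω ← ω + η e(n) u(n). The step size value and the array shapes are illustrative assumptions, not values prescribed by the text.

    import numpy as np

    def lms(u, d, eta=0.05):
        """Plain LMS: one error-correction weight update per sample (illustrative)."""
        n_samples, order = u.shape
        w = np.zeros(order)                 # adaptive weights w(n-1)
        e = np.zeros(n_samples)             # prediction errors
        for n in range(n_samples):
            y = w @ u[n]                    # actual response y(n) = w^T u(n)
            e[n] = d[n] - y                 # error signal e(n)
            w = w + eta * e[n] * u[n]       # incremental weight adjustment
        return w, e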

1.1.2 Nonlinear Adaptive Filter

Even though linear adaptive filtering can approximate nonlinearity, the performance of adaptive linear filters is not satisfactory in applications where nonlinearities are significant. Hence, more advanced nonlinear models are required. At present, neural networks and kernel adaptive filters are popular nonlinear models. By providing linearity in a high-dimensional feature space, the Reproducing Kernel Hilbert Space (RKHS), and universal approximation in Euclidean space with universal kernels [1], kernel adaptive filters are attracting more attention. Through a reproducing kernel, kernel adaptive filters map data from an input space to the RKHS, where appropriate linear methods are applied to the transformed data. This procedure implements a nonlinear treatment of the data in the input space. Compared with other nonlinear techniques, kernel adaptive filters have the following features:

- They can be universal approximators whenever the kernel is universal.
- They have no local minima with the squared error cost function.
- They have moderate complexity in terms of computation and memory.
- They belong to online learning methods and have good tracking ability to handle nonstationary conditions.

The details about kernel adaptive filters are introduced next.

1.2 Reproducing Kernel Hilbert Space

A Reproducing Kernel Hilbert Space, RKHS for short, is a complete inner product space associated with a Mercer kernel. A Mercer kernel is a continuous, symmetric and positive definite function κ: U × U → R, where U is the input domain in the Euclidean space R^L (L is the input order). The commonly used kernels include the Gaussian kernel and the polynomial kernel [1]:

    κ(u, u') = exp( −‖u − u'‖² / (2σ²) )
    κ(u, u') = (u^T u' + 1)^p

If only one free parameter is inserted into the kernel function, κ(u, ·) can be expressed as a transformed feature vector φ(u) through a mapping φ: U → H, where H is the RKHS; therefore φ(u) = κ(u, ·). One of the most important properties of the RKHS for practical applications is the so-called kernel trick,

    φ(u)^T φ(u') = κ(u, u'),

which allows computing inner products between two RKHS functions as a scalar evaluation in the input space by the kernel.

Besides the kernel trick, some other properties of the RKHS related to this work are as follows. Let H be the RKHS of all real-valued functions of u generated by κ(u, ·). Suppose two functions h(·) and g(·) are picked from the space H that are respectively represented by

    h(·) = Σ_{i=1}^{l} a_i κ(c_i, ·) = Σ_{i=1}^{l} a_i φ(c_i)
    g(·) = Σ_{j=1}^{m} b_j κ(c̃_j, ·) = Σ_{j=1}^{m} b_j φ(c̃_j)

where a_i and b_j are the expansion coefficients and both c_i and c̃_j ∈ U for all i and j.

1. Symmetry: <h, g> = <g, h>
2. Scaling and distributive property: <(c f + d g), h> = c<f, h> + d<g, h>
3. Squared norm: ‖h‖² = <h, h> ≥ 0
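A small, self-contained sketch of the Gaussian kernel and of the kernel trick discussed above: the explicit feature map is never formed, and inner products between two RKHS functions are evaluated entirely through κ. The kernel width value is an illustrative assumption.

    import numpy as np

    def gaussian_kernel(u, v, sigma=1.0):
        """Mercer (Gaussian) kernel: kappa(u, v) = exp(-||u - v||^2 / (2 sigma^2))."""
        diff = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
        return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))

    def rkhs_inner_product(a, centers_h, b, centers_g, sigma=1.0):
        """Kernel trick: for h = sum_i a_i kappa(c_i, .) and g = sum_j b_j kappa(c'_j, .),
        <h, g> = sum_i sum_j a_i b_j kappa(c_i, c'_j)."""
        return sum(ai * bj * gaussian_kernel(ci, cj, sigma)
                   for ai, ci in zip(a, centers_h)
                   for bj, cj in zip(b, centers_g))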

1.3 Kernel Adaptive Filtering

The kernel method is a powerful nonparametric modeling tool for pattern analysis and statistical signal processing. Through a nonlinear mapping, kernel methods transform the data into a set of points in an RKHS. Then various methods are utilized to find relationships between the data. There are many successful examples of this methodology, including support vector machines (SVM) [2], the kernel regularization network [3], kernel principal component analysis (kernel PCA) [4] and kernel Fisher discriminant analysis [5]. Kernel adaptive filters are a class of kernel methods. Within this family, the kernel affine projection algorithms (KAPA) [6] include the kernel least mean square (KLMS) [7] as the simplest element and kernel recursive least squares (KRLS) [8] as the most computationally demanding.

The main idea of kernel adaptive filtering can be summarized as follows: transform the input data into a high-dimensional feature space F via a Mercer kernel, then apply appropriate linear methods to the transformed data. As long as a linear method in the feature space can be formulated in terms of inner products, there is no need to do computation in the high-dimensional space, thanks to the kernel trick. It has been proved in [9] that kernel adaptive filters with a universal kernel have the universal approximation property; i.e., for any continuous input-output mapping f: U → R and any ε > 0, there exist {u(i)}_{i∈N} ⊂ U and real numbers {c_i}_{i∈N} such that ‖f − Σ_i c_i κ(u(i), ·)‖₂ < ε. This universal approximation property guarantees that the kernel method is capable of superior performance in nonlinear tasks. If we express a vector in F as

    Ω = Σ_i c_i φ(u(i)),

we obtain

    ‖f − Ω^T φ‖₂ < ε.

Kernel adaptive filters provide a generalization of linear adaptive filters, because the latter are a special case of the former when expressed in the dual space. Kernel adaptive filters exhibit a growing radial basis function network, learning the network topology and adapting the free parameters directly from the data at the same time [1]. In the rest of this section, KRLS and KLMS are introduced respectively.

Consider the learning of a nonlinear function f: U → R based on a known sequence (u(1), d(1)), (u(2), d(2)), ..., (u(n), d(n)), where U ⊂ R^L is the input space, u(i), i = 1, ..., n, is the system input at sample time i, and d(i) is the corresponding desired response.

1.3.1 Kernel Recursive Least Square Algorithm

The KRLS is actually the recursive least squares (RLS) algorithm in the RKHS. At each iteration, one needs to solve the regularized least squares regression to obtain f:

    min_{f∈H} Σ_{i=1}^{n} ‖f(u(i)) − d(i)‖² + λ‖f‖²_H

where λ is the regularization term and ‖·‖²_H denotes the norm in H.

This problem can alternatively be solved in a feature space, which results in KRLS. The learning problem of KRLS with regularization in the feature space F is to find a high-dimensional weight vector Ω ∈ F that minimizes

    min_{Ω∈F} Σ_{i=1}^{n} ‖Ω^T φ(u(i)) − d(i)‖² + λ‖Ω‖²_F

where ‖·‖²_F denotes the norm in F.

1.3.1.1 Approximate linear dependency for sparsification

Similar to RLS, KRLS achieves high accuracy and has a fast convergence rate in stationary scenarios. However, this good performance is obtained at the cost of high computational complexity, O(n²), where n is the number of processed samples.

In order to decrease the computational complexity of KRLS, sparsification techniques are adopted. Specifically, sparsification in kernel methods lowers computational complexity and memory consumption. Furthermore, the system's generalization ability is also influenced by sparsification in machine-learning algorithms and signal processing [1]. The Novelty Criterion [10], Approximate Linear Dependency (ALD) [8], Prediction Variance [11], Surprise [12] and Quantization techniques [13] are common strategies for sparsification. Among them, ALD is an effective sparsification technique for KRLS because it is solved in the feature space, unlike most of the other techniques. Before introducing ALD, several concepts should be interpreted:

- Network size: the number of data utilized to describe the system model.
- Center c: a datum utilized to build the system. Therefore, the number of centers equals the network size.
- Center dictionary C: the set of all centers c.

Suppose that, after having observed n−1 training samples, we have established a center dictionary C(n−1) = {c_i}_{i=1}^{K(n−1)}, where K(n−1) is the cardinality of C(n−1). When a new sample {u(n), d(n)} is presented, ALD tests whether there exists a coefficient vector a(n) = (a_1, ..., a_{K(n−1)}) satisfying

    d² = min_a ‖ Σ_{i=1}^{K(n−1)} a_i φ(c_i) − φ(u(n)) ‖² ≤ ν²

where ν, called the approximation level, is the threshold determining the sparsification level as well as the system accuracy. If this condition is satisfied, the new sample in the feature space can be linearly approximated by the centers in C(n−1). Therefore, the effect of this sample on the mapping can be expressed through the existing centers in the dictionary, and there is no need to augment the center dictionary. Otherwise, the new sample, whose feature vector is not approximately dependent on the existing centers, should be added to the dictionary and, consequently, a new coefficient corresponding to this center will be included. Through a straightforward calculation it is easy to obtain

    a(n) = K̃(n−1)^{−1} h(n)
    d² = κ(u(n), u(n)) − h(n)^T a(n)

where h(n) = [κ(c_1, u(n)), ..., κ(c_{K(n−1)}, u(n))]^T and the matrix [K̃(n−1)]_{ij} = κ(c_i, c_j). ALD not only is an effective approach to sparsification but also improves the overall stability of the algorithm because of its relation with the eigenvalues of K̃ [1].
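A minimal sketch of the ALD test above, assuming a Gaussian kernel and a dictionary stored as a list of centers. For clarity the Gram matrix is solved directly rather than updated recursively, and the threshold and kernel width values are illustrative assumptions.

    import numpy as np

    def ald_test(dictionary, u_new, sigma=1.0, nu=0.1):
        """Approximate Linear Dependency test for a candidate center u_new.

        dictionary: list of existing centers c_i (1-D numpy arrays).
        Returns (is_approx_dependent, a, d2) with a = K~^{-1} h and
        d2 = kappa(u, u) - h^T a, compared against nu^2.
        """
        kern = lambda x, y: np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))
        K = np.array([[kern(ci, cj) for cj in dictionary] for ci in dictionary])
        h = np.array([kern(ci, u_new) for ci in dictionary])
        a = np.linalg.solve(K, h)                    # a(n) = K~(n-1)^{-1} h(n)
        d2 = kern(u_new, u_new) - h @ a              # squared ALD distance
        return d2 <= nu ** 2, a, d2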

1.3.1.2 Regularized KRLS-ALD

There are two simple ways to deal with KRLS-ALD: 1) setting the regularization term equal to 0 [8]; 2) discarding the samples outside of the center dictionary and ignoring the influence of these samples [1]. However, both of these strategies have disadvantages. With sparsification, the probability of overfitting decreases, and overfitting may not happen at all when the amount of data is small enough; this is the motivation for method 1 to set the regularization term equal to 0. Unfortunately, the success of this method depends on the extent of sparsification: if the approximation level is not large enough, it does not mitigate overfitting. Owing to discarding useful information, the convergence speed and accuracy of the second method may not be satisfactory. In order to overcome these drawbacks, I propose a general structure of regularized KRLS-ALD.

Define the matrices Φ(n) = [φ(u(1)), ..., φ(u(n))] and Φ̃(n) = [φ(c_1), ..., φ(c_{K(n)})]. According to [8],

    Φ(n) = Φ̃(n) A(n)^T + Φ_res(n)
    Ω(n+1) = Φ̃(n) α̃(n+1)

where A(n) = [a(1), ..., a(n)]^T. Then the cost function becomes

    L(α̃(n+1)) = Σ_{i=1}^{n} ‖Ω(n)^T φ(u(i)) − d(i)‖² + λ‖Ω(n)‖²_F
               = ‖Φ(n)^T Φ̃(n) α̃(n+1) − d(n)‖² + λ‖Ω(n)‖²_F
               ≈ ‖A(n) K̃(n) α̃(n+1) − d(n)‖² + λ‖Φ̃(n) α̃(n+1)‖²_F

where d(n) = [d(1), ..., d(n)]^T. In order to minimize L(α̃(n+1)), we take the derivative with respect to α̃(n+1) and obtain

    ∂L/∂α̃(n+1) = 2 (A(n) K̃(n))^T (A(n) K̃(n) α̃(n+1) − d(n)) + 2λ K̃(n) α̃(n+1)

At the extremum the system solution is

    α̃(n+1) = [A(n)^T A(n) K̃(n) + λI]^{−1} A(n)^T d(n)

In the online scenario, at each time step we are faced with one of the following two cases:

1. φ(u(n)) can be approximated by C(n−1) according to ALD, that is, d² ≤ ν². In this case, C(n) = C(n−1).
2. d² > ν² and the new data φ(u(n)) is not ALD on C(n−1). Therefore, C(n) = C(n−1) ∪ {u(n)}.

The key issue is how to design an iterative solution to obtain α̃(n). In the following, we denote P(n) = [A(n)^T A(n) K̃(n) + λI]^{−1} and derive the KRLS with regularization for each of these two cases.
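For reference, the closed-form solution above can also be computed in a batch; the recursive updates derived next should reproduce it. The sketch below is only an illustrative check under the assumption that the ALD coefficient matrix A(n) has already been accumulated row by row and that a Gaussian kernel was used to build the dictionary Gram matrix; the regularization value is arbitrary.

    import numpy as np

    def batch_regularized_krls_ald(A, K_tilde, d, lam=1e-2):
        """Batch solution alpha = [A^T A K~ + lam I]^{-1} A^T d.

        A       : n x K matrix whose i-th row holds the ALD coefficients a(i)
        K_tilde : K x K Gram matrix of the dictionary centers
        d       : length-n vector of desired responses
        """
        K = K_tilde.shape[0]
        lhs = A.T @ A @ K_tilde + lam * np.eye(K)
        return np.linalg.solve(lhs, A.T @ d)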

Dictionary does not change: In this case, Φ̃(n) = Φ̃(n−1) and hence K̃(n) = K̃(n−1). Only A changes between time steps: A(n) = [A(n−1)^T, a(n)]^T. Therefore

    A(n)^T A(n) = A(n−1)^T A(n−1) + a(n) a(n)^T

so the matrix P(n) can be expressed as P(n) = [P(n−1)^{−1} + a(n) a(n)^T K̃(n)]^{−1}. Applying the matrix inversion lemma with P(n−1) = A, a(n) = B, a(n)^T K̃(n) = C and I = D yields

    P(n) = P(n−1) − P(n−1) a(n) a(n)^T K̃(n) P(n−1) / (1 + a(n)^T K̃(n) P(n−1) a(n))

Defining q(n) = P(n−1) a(n) / (1 + a(n)^T K̃(n) P(n−1) a(n)) and s(n) = a(n)^T K̃(n), the coefficient vector α̃(n+1) can then be expressed as

    α̃(n+1) = P(n) A(n)^T d(n)
            = [P(n−1) − P(n−1) a(n) a(n)^T K̃(n) P(n−1) / (1 + a(n)^T K̃(n) P(n−1) a(n))] [A(n−1)^T d(n−1) + a(n) d(n)]
            = α̃(n) + P(n−1) a(n) [d(n) − a(n)^T K̃(n) α̃(n)] / (1 + a(n)^T K̃(n) P(n−1) a(n))
            = α̃(n) + q(n) (d(n) − s(n) α̃(n))

The size of the center dictionary increases: In this condition, Φ̃(n) = [Φ̃(n−1), φ(u(n))]. The matrix A changes to

    A(n) = [ A(n−1)  0
             0       1 ]

Therefore

    A(n)^T A(n) = [ A(n−1)^T A(n−1)  0
                    0                1 ]

    P(n) = [ P(n−1)^{−1}   A(n−1)^T A(n−1) h(n)
             h(n)^T        κ_nn + λ            ]^{−1}

where h(n) = [κ(c_1, u(n)), ..., κ(c_{K(n−1)}, u(n))]^T and κ_nn is shorthand for κ(u(n), u(n)). Using the block matrix inversion identity, we obtain

    P(n) = ρ(n)^{−1} [ P(n−1) ρ(n) + z_A(n) z(n)^T   −z_A(n)
                       −z(n)^T                       1      ]

where ρ(n) = λ + κ_nn − h(n)^T z_A(n), z_A(n) = P(n−1) A(n−1)^T A(n−1) h(n), and z(n) = P(n−1)^T h(n). The coefficient vector is then updated as

    α̃(n+1) = P(n) A(n)^T d(n)
            = ρ(n)^{−1} [ P(n−1) ρ(n) + z_A(n) z(n)^T   −z_A(n) ;  −z(n)^T   1 ] [ A(n−1)^T d(n−1) ;  d(n) ]
            = [ α̃(n) − z_A(n) ρ(n)^{−1} e(n) ;  ρ(n)^{−1} e(n) ]

where e(n) = d(n) − h(n)^T α̃(n).

Table 1-1. Computational costs per iteration
    ALD test:      O(K(n)²)
    Update P(n):   O(K(n)²)
    Update α̃(n):  O(K(n))

We have now obtained a recursive algorithm to solve the KRLS with regularization, which is described in pseudocode in Algorithm 1. Table 1-1 summarizes the computational costs per iteration for KRLS-ALD with and without regularization.

Algorithm 1: Kernel RLS with regularization
Initialization: select the threshold ν > 0 and the regularization parameter λ > 0;
    C(1) = {u(1)}, P(1) = [κ_11 + λ]^{−1}, K̃(1) = κ_11, K̃(1)^{−1} = κ_11^{−1}, A(1) = 1, α̃(1) = P(1) d(1)
for n = 2, 3, ... do
    Get the new sample (u(n), d(n));
    Compute h(n);
    ALD test: a(n) = K̃(n−1)^{−1} h(n), d²(n) = κ_nn − h(n)^T a(n);
    if d²(n) ≤ ν² then
        C(n) = C(n−1), K̃(n) = K̃(n−1), K̃(n)^{−1} = K̃(n−1)^{−1};
        Compute α̃(n+1) (dictionary-unchanged update);
        Update P(n) (dictionary-unchanged update);
        Update A(n) = [A(n−1)^T, a(n)]^T;
    else
        C(n) = C(n−1) ∪ {u(n)};
        Compute α̃(n+1) (dictionary-growing update);
        Update P(n) (dictionary-growing update);
        Update K̃(n)^{−1} and K̃(n);
        Update A(n) (block-diagonal extension);
    end if
end for
return α̃(n+1), C(n)

1.3.2 Kernel Least Mean Square Algorithm

The KLMS utilizes the gradient descent technique to search for the optimal solution of KRLS and is the least mean square (LMS) algorithm in the RKHS. KLMS is obtained by minimizing the instantaneous cost function

    J(n) = (1/2) e(n)² = (1/2) ‖Ω^T φ(u(n)) − d(n)‖²

Assuming the initial condition of the weight is Ω(0) = 0, using the LMS algorithm in the RKHS yields

    Ω(n+1) = Ω(n) + η e(n) φ(u(n))

However, the dimensionality of φ(·) is very high, so an alternative way is needed:

    Ω(n+1) = Ω(n) + η e(n) φ(u(n))
           = Ω(n−1) + η e(n−1) φ(u(n−1)) + η e(n) φ(u(n))
           ...
           = η Σ_{i=1}^{n} e(i) φ(u(i))

According to the kernel trick, the system output for the new input u(n) can be expressed as

    Ω(n)^T φ(u(n)) = [η Σ_{i=1}^{n−1} e(i) φ(u(i))]^T φ(u(n)) = η Σ_{i=1}^{n−1} e(i) κ(u(i), u(n))

In conclusion, the learning rule of KLMS in the original space is as shown in Algorithm 2.

Algorithm 2: Kernel least mean square algorithm
Initialization: select the step size η;
    e(1) = d(1);
    y(1) = η e(1);
Computation:
    while {u(n), d(n)} available do
        y(n) = η Σ_{i=1}^{n−1} [e(i) κ(u(i), u(n))];
        e(n) = d(n) − y(n);
    end while

Among all of the kernel adaptive filters, KLMS is unique. It provides a well-posed solution with finite data [7] and naturally creates a growing radial-basis-function (RBF) network [1]. Moreover, as an online learning algorithm, KLMS is much simpler to implement, with respect to computational complexity and memory storage, than other batch-mode kernel methods.
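A compact Python sketch of Algorithm 2 with a Gaussian kernel, keeping every input as a center as the text describes. The step size, kernel width and the decision to store the scaled errors per center are illustrative assumptions, not prescriptions from the text.

    import numpy as np

    def klms(U, d, eta=0.2, sigma=1.0):
        """Kernel LMS (Algorithm 2 sketch): the output is a growing RBF expansion
        y(n) = eta * sum_i e(i) * kappa(u(i), u(n)) over previously seen inputs."""
        kern = lambda x, y: np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))
        centers, errors, y_hist = [], [], []
        for n in range(len(d)):
            y = eta * sum(e * kern(c, U[n]) for c, e in zip(centers, errors))
            e = d[n] - y                      # prediction error e(n)
            centers.append(U[n])              # every sample becomes a center
            errors.append(e)
            y_hist.append(y)
        return centers, errors, np.array(y_hist)

For a new test input u, the trained filter is evaluated the same way: eta * sum_i errors[i] * kern(centers[i], u).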

1.3.3 Kernel Affine Projection Algorithm

The KLMS algorithm simply uses the current sample to search for the optimal solution, while the affine projection algorithm (APA) adopts a better approximation by using the K most recent samples and provides a trade-off between computational complexity and system performance. More interestingly, KAPA provides a general framework for several existing techniques, including KLMS, sliding-window KRLS and kernel regularization networks [1].

1.4 Work In This Thesis

Even though kernel adaptive filters have the advantages mentioned above and have been proved useful in complicated nonlinear regression and classification problems, they still have some drawbacks. For example, the conventional cost function, the mean-square error (MSE) criterion, cannot obtain the best performance in non-Gaussian situations. Moreover, the linearly growing structure with each new sample leads to a high computational burden and memory requirement. Therefore, computational complexity and memory increase linearly with time as more samples are processed, particularly in continuous scenarios, which so far hinders online application, such as in DSP and FPGA implementations. Therefore, to apply these powerful kernel adaptive filters in practice, we need to address two main issues first:

- How to obtain robust performance in non-Gaussian situations?
- How to decrease the growing computational burden and memory requirement?

These two issues are important aspects of judging system performance and are highly related. Improving robustness normally results in an increase in computational complexity. Therefore, appropriate procedures to decrease computational complexity, such as reducing the network size, are necessary. On the other hand, approximation techniques that decrease computational complexity may degrade system performance, which could be compensated by improving system robustness. Throughout the literature, similar problems have been studied from different perspectives, leading to a myriad of techniques. Yet few techniques have been adapted to kernel adaptive filters. Our goal is to provide appropriate methods that take into account the intrinsic properties of kernel adaptive filters to solve these two problems.

There are various factors influencing kernel adaptive filter performance, such as the cost function and the kernel function. First, let us consider the cost function. The MSE criterion is equivalent to the maximum likelihood technique in linear and Gaussian conditions and obtains good performance, while it is not enough in other scenarios. The least mean p-power error (MPE) criterion, which builds a linear weighted combination of various powers of the prediction error, has been investigated to solve this problem [14]. However, many free parameters and prior knowledge requirements limit its wide application in practice. Recently, information theoretic learning (ITL) has been proved more efficient to train adaptive systems. Different from conventional error cost functions, ITL optimizes the information content of the prediction error to achieve the best performance in terms of information filtering [15]. By taking into account the whole signal distribution, adaptive systems trained through information theoretic criteria have better performance in various applications, especially those for which the Gaussianity assumption is not valid. Furthermore, the nonparametric property of the cost function and its clear physical meaning also motivate us to introduce ITL into kernel adaptive filters to achieve a robust system and to propose the kernel maximum correntropy algorithm (KMC).

Even though the accuracy of a system is improved, the redundant computational and memory burden is what we have to face. Therefore, the other important problem for the practical application of kernel adaptive filters is compacting the network structure, which helps us decrease the total computational complexity. By including the important information and discarding relatively less useful data, growing and pruning techniques maintain the system network size within an acceptable range. Different from previous sparsification techniques, the quantized kernel least mean square utilizes redundant data to update the network, rather than purely discarding them, to obtain a compact system with higher accuracy. As a complement to sparsification, pruning strategies make the system structure more compact. Sensitivity-analysis-based pruning strategies, which discard the units that make insignificant contributions to the overall network, are robust and widely utilized in various areas. Generally, omitting any information brings the accuracy down. Hence, growing and pruning techniques constitute a tradeoff between system performance and compact structure. Therefore, a measure of significance is adopted in the kernel adaptive filter to estimate the influence of the processed data and decide what information will be discarded.

The significance measure guides the system to fix the network size at a predefined threshold. However, this is not true nonstationary learning. What we expect is that the network size should optimally be dictated by the complexity of the true system and the signal, while the system accuracy remains acceptable. If we solve this problem, we have a truly online algorithm to handle real-world nonstationary problems. This problem translates into how to obtain a trade-off between system complexity and accuracy. A variety of information criteria have been proposed so far to deal with this compromise. Among them, Minimum Description Length (MDL) has the great advantages of small computational cost and robustness against noise, which have been proved in many applications, particularly in the information learning area. The MDL criterion has two formulations: a batch model and an online model. Taking the approximation level selection in KRLS-ALD as an example, the batch-model MDL in kernel adaptive filters is illustrated first. Then we propose a KLMS sparsification algorithm to explain the online-model MDL. Since this proposed algorithm separates the input (feature) space with quantization techniques, it is called QKLMS-MDL.

CHAPTER 2
KERNEL ADAPTIVE FILTERING WITH MAXIMUM CORRENTROPY CRITERION

The goal of our system is to construct a function f: U → R based on a known sequence (u(1), d(1)), (u(2), d(2)), ..., (u(n), d(n)) ∈ Z^n, where u(i) is the system input at sample time i and d(i) is the corresponding desired response. Notice that the desired data may be noisy in practice, that is, d(i) = d̃(i) + v(i), in which d̃(i) is the real clean data and v(i) is noise at time i. Actually, what we want to solve is the following empirical risk minimization (ERM) problem:

    R_emp[f ∈ H, Z^n] = Σ_{i=1}^{n} (d̃(i) − f(u(i)))²

How to obtain a robust system with good nonlinear approximation is the problem solved in this chapter.

In this chapter, we present a kernel-based maximum correntropy learning algorithm, called KMC (Kernel Maximum Correntropy). KMC makes use of the universal approximation property of positive definite kernels and the second- and higher-order statistics of correntropy to achieve a robust nonlinear system. This method maps the input data into the RKHS to approximate the input-output mapping f, then utilizes the maximum correntropy criterion (MCC) as a cost function to minimize the difference between the desired data and the filter output. This algorithm not only approximates any nonlinear system more accurately than a linear model, but also is robust in different noisy environments. The computational complexity of our algorithm is similar to that of KLMS, while the robustness is superior. Furthermore, KMC is well posed when finite data are used in training and therefore does not need explicit regularization, which not only simplifies the implementation but also has the potential to provide better performance, because regularization biases the optimal solution, as is well known.

2.1 Cost Function And Information Theoretic Learning

2.1.1 The Cost Function Development

Fig. 2-1 shows the general structure of kernel adaptive filter processing. The cost function is an essential part of the process and the goal of the optimization algorithm in adaptive filters, and the learning algorithm is the technique by which the optimum is obtained according to the specific cost function.

Figure 2-1. A typical kernel adaptive filtering

Error cost functions (or error criteria) play significant roles in statistical estimation problems. The minimum mean-square error (MMSE) is the most popular criterion. It has many advantages, such as mathematical tractability, low computational complexity and robustness, and guaranteed optimality for Gaussian data. As an application, the least-mean-squares (LMS) algorithm, which adopts a gradient descent technique to search for the optimal solution on the mean-square error surface, is a common adaptive algorithm because of its simplicity and robustness in linear and Gaussian conditions. Similar to most conventional adaptive filtering algorithms, KLMS also utilizes the MMSE criterion as a cost function. Owing to the second-order statistics of the prediction error, the MMSE and LMS both belong to the class of second-order statistics (SOS) algorithms.

Moving beyond mean squared error techniques, higher-order statistics (HOS) methods that exploit higher-order moments of the error (and not just second-order statistics as in LMS) have been proposed, such as the Least Mean Fourth adaptive filter (LMF) by Walach and Widrow [16] and its generalization, the noise-constrained least mean fourth (NCLMF) adaptive algorithm [17]. Besides, Barros et al. used weighted sums of second and fourth moments of the error as a cost function [18]. This idea utilizes the good behavior of the second-order moment in the steady state and the fast convergence of the higher-order moment. Even though these algorithms explore the higher-order moments for adaptation, they still work with the same optimal solution as MSE. These methods are sufficient when the processed signals are Gaussian distributed and the applications are linear. However, their performance is not the best in nonlinear and non-Gaussian situations. Therefore, the least mean p-power error (MPE) criterion has been studied [14], and the previous criteria based on error order can be considered as specific realizations of MPE. MPE has three advantages: 1) the MPE function is a convex function, which guarantees no local minima; 2) when the input process and desired process are both Gaussian, the MPE function has the same optimum solution as the conventional Wiener solution for any p; 3) when the input process and desired process are non-Gaussian, the MPE function may have a better optimum solution. Furthermore, [14][19] supplied general rules about the selection of p: 1) when the signal is corrupted by a noise with heavy tails, such as an impulsive noise, an adaptive algorithm with p < 2 is preferred to obtain a stable system; 2) when the noise is light-tailed, the proper choice of p is larger than 2 to achieve a fast convergence speed.

Recently, information theoretic learning (ITL) has been proved more efficient to train adaptive systems, especially for nonlinear signal processing. Different from conventional error cost functions, ITL optimizes the information content of the prediction error to seek the best performance in terms of information filtering [15]. By taking into account the whole signal distribution, adaptive systems trained through information theoretic criteria have better performance in various applications where the signals are non-Gaussian. Correntropy, developed by Principe et al., is a localized measure that estimates how similar two random variables are: when two random variables are very close, correntropy behaves like the 2-norm distance, which evolves to the 1-norm distance if the two random variables get further apart, and even falls to the zero-norm as they become far apart [20], as shown in Fig. 2-2. Correntropy has already been employed successfully in many applications. Kernel PCA can project the transformed data onto principal directions with the correntropy function [21] and efficiently compute the principal components in the feature space. [22] proposed a power spectral measure for a Fourier-based surrogate nonlinearity test using correntropy as a discriminant measure. Extending the Minimum Average Correlation Energy (MACE) filter to a nonlinear filter via correntropy improves MACE performance when applied to face recognition [23]. Moreover, a similar extension of Granger causality by correntropy can detect causality in a nonlinear dynamical system where linear Granger causality fails [24]. [25] induced a smooth loss function, the C-loss function, from correntropy to approximate the ideal 0-1 loss in classification problems. When it comes to the cost function of adaptive filtering algorithms, the Maximum Correntropy Criterion (MCC), which maximizes the similarity between the desired signal and the prediction output in the sense of correntropy, is a robust adaptation principle in the presence of non-Gaussian outliers [26]. According to the p-selection rules mentioned above, the correntropy criterion should have better performance than MSE in heavy-tailed noise conditions and a faster convergence speed than least absolute difference (LAD) (p = 1). Besides MCC, the Minimum Error Entropy criterion (MEE) is also an effective criterion. It relates a single number to the shape of the error distribution instead of being just a fixed function (the power) of the error. Compared with MSE, MEE extracts the information in the samples efficiently to obtain a better solution with few samples. On the other hand, since it considers all of the errors including outliers, MEE is less robust than MCC to heavy-tailed noise. Besides, MEE has high computational complexity (O(NL), where L is the length of a batch of error samples), which makes it quite unsuitable for practical implementation in applications.

Figure 2-2. Contours of Correntropy(X, 0) in 2D sample space (kernel size is set to 1).

2.1.2 Definition And Properties Of Correntropy

As developed in [20] and [27], correntropy is a method to estimate probabilistically the similarity between two arbitrary random variables. The kernel bandwidth controls the scale at which similarity is assessed. Correntropy is expressed as

    V_σ(X, Y) = E[κ_σ(X − Y)]

in which κ_σ(·) is a symmetric positive definite kernel with kernel width σ. Correntropy is the expected value of the kernel function between realizations of the two random variables. Generally, any translation-invariant kernel can be the kernel function of correntropy. For simplicity, the Gaussian kernel is the only one considered in this work.

In practice, we use a set of finite data to approximate the expectation,

    V̂_{N,σ}(X, Y) = (1/N) Σ_{i=1}^{N} κ_σ(X_i − Y_i)

where N is the number of data used to estimate correntropy.

For completeness, we present below some of the most important properties of the correntropy function.

Property 1: Correntropy is positive definite and bounded, that is, 0 < V_σ(X, Y) ≤ 1/(√(2π) σ), and it reaches its maximum if and only if X = Y.
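A minimal numerical illustration of the empirical estimator V̂_{N,σ} above, using the normalized Gaussian kernel; the sample data and kernel width are purely illustrative assumptions.

    import numpy as np

    def correntropy(x, y, sigma=1.0):
        """Empirical correntropy V_hat = (1/N) * sum_i kappa_sigma(x_i - y_i),
        with the normalized Gaussian kernel (1/(sqrt(2 pi) sigma)) exp(-e^2/(2 sigma^2))."""
        e = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
        return np.mean(np.exp(-e ** 2 / (2.0 * sigma ** 2))) / (np.sqrt(2.0 * np.pi) * sigma)

    # Correntropy is dominated by samples near x = y, so a handful of gross
    # outliers barely move it, while the mean squared error is dominated by them.
    rng = np.random.default_rng(0)
    x = rng.normal(size=1000)
    y = x + 0.1 * rng.normal(size=1000)
    y[:10] += 50.0
    print(correntropy(x, y), np.mean((x - y) ** 2))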

h(E)isstrictlyconcaveintherangeofE2[)]TJ /F3 11.955 Tf 9.29 0 Td[(,].Strictlyconcavityguaranteestheexistenceanduniquenessfortheoptimalsolutionofadaptivelter.Butcorrentropyisnotconcaveinthefullspace,i.e.ltersadaptedwiththiscriterionmaybesub-optimalbecauseadaptationcanbecaughtinlocalminima.BecausetheconcavepropertyissatisedintherangeE2[)]TJ /F3 11.955 Tf 9.3 0 Td[(,],theinitialconditionforthekernelsizeshouldbechosencarefullythroughannealing,orwecanuseothercriteriatotrainadaptivelterrstlytomakesurecurrentsolutionisneartheglobaloptimalsolution. Asaglobalmeasure,MSEincludesallthesamplesintheinputspacetoestimatethesimilarityoftworandomvariableswhilecorrentropyemphasizessimilarityalongx=yline.ThispropertyintuitivelyexplainswhythecorrentropyissuperiorthanMSEiftheresidualofX)]TJ /F5 11.955 Tf 11.95 0 Td[(Yisnon-symmetricorwithnonzeromean. 2.1.3CorrentropyInLinearAdaptiveFilters Whencorrentropycomestoadaptivelters,thegoalistomaximizecorrentropybetweenthedesiredsignald(i)andthelteroutputy(i).Such,criterionisJ(n)=1 NnXi=n)]TJ /F11 7.97 Tf 6.59 0 Td[(N+1(d(i),y(i)) (2) inwhich,(.)isapositivedenitesymmetrickernelwiththekernelwidthbeing,andNisthenumberofsamplesinParzenestimatewindow. Fig. 2-3 representscostfunctionvaluesofMSE,MEEandMCCcriteriaunderaparticularcondition.ItisnoticeablethatMSEemphasizeserrorsawayfromzero,andMEE,whichsetsmoreimportancewherethereishighconcentrationoferrors,emphasizesthemodesoftheerror.Comparewiththem,MCCemphasizestheerrorsaroundzeroandessentiallycomputesp(e=0). SimilartoMSEcriterion,wecanuseaniterativegradientascentapproachtosearchtheoptimalsolution,thatisthenextsetoflterweightsarecorrectedbytakingavalue 32

PAGE 33

Figure2-3. CostfunctionvaluesofMSE,MEEandMCC:therstlineshowsthecostfunctionvaluesofthesecriteriaandthesecondlineisthedistributionoferror. propertothepositivegradientofthecostfunctionintheweightspace.Therefore, !(n+1)=!(n)+rJ(n)(2) SubstitutingJ(n)intoEq. 2 ,wecanobtain,!(n+1)=!(n)+ NnXi=n)]TJ /F11 7.97 Tf 6.58 0 Td[(N+1@(d(i),y(i)) @!(n) (2) Foronlinemode,thecurrentvalue(N=1)approximatesthestochasticgradient,!(n+1)=!(n)+@(d(n),y(n)) @!(n)=!(n)+g(e(n))u(n)=!(n)+exp()]TJ /F5 11.955 Tf 9.3 0 Td[(e(n)2 22)e(n)u(n) (2) inwhiche(n)=d(n))]TJ /F9 11.955 Tf 12.84 0 Td[(!(n)Tu(n)isthepredictionerror,andg(e(n))isafunctionofe(n)intermsofthekernelchoiceofCorrentropy.g(e(n))=exp()]TJ /F11 7.97 Tf 6.59 0 Td[(e(n)2 22)e(n)fortheNormalizedGaussianKernel.Inthefollowing,weutilizetheNormalizedGaussianKernelasanexampletoillustrateandanalysisKMCalgorithm.Ifonehavegoodreasontomakesurethatotherkernelfunctionsresultinbetterttingwithhigheraccuracy, 33

PAGE 34

theformulationofg(e(n))shouldbemodiedaccordingly.Besides,inthispaper,weassumeN=1isenoughtoapproximatethegradient. ThissectionshowedthatMCCsharesthecomputationalsimplicityoftheLMSalgorithm.ItscomputationalcomplexityisO(n),wherenisthenumberoftrainingdata.Withthesmoothdependenceofcorrentropyonkernelbandwith,thiscriterionisarobuststatisticalmethod.Moreover,theerrornonlinearityofMCCismemoryless,signpreservingandodd.Allofthesethreepropertiesofcorrentropyleadtotherequirementofagoodcostcriterion.Thesignpreservationpropertyiscrucialbecauseitallowstheschemetodescent,ratherthanclimb,theerrorsurface,andtheoddpropertyweightspositiveandnegativedataequally.In[ 26 ],experimentsintheoreticandpracticalapplicationsdemonstratetheadvantageofMCCinlinearadaptiveltersthoroughcomparingwithothercriteria. 2.2FormulationOfKernelMaximum-CorrentropyAlgorithm Asmentioned,ifthemappingbetweendanduisnonlinear,linearadaptivelterscannotobtaingoodperformance.Becauseofuniversalapproximationcapabilitiesandconvexoptimization[ 1 ],kernelmethodsaregoodchoiceforthistask.Inouralgorithm,theinputdatau(n)istransformedtoahigh-dimensionalfeaturespaceFas'(u(n))viathekernek-inducedmapping.Furthermore,linearadaptivelterisutilizedinthefeaturespace.AsdiscussedinRepresenterTheorem[ 28 ],theadaptivelterweighthastherepresentation, =nXi=1a(i)(u(i),.)=nXi=1a(i)'(i)(2) wherea(i)areweightedcoefcientsobtainedfromthetrainingdata,and'(i)isasimplicationof'(u(i)).Inthefollowingofthischapter,tosimplifythenotation,'(i)isusedtodenote'(u(i)).Then,usingtheMCCcriterionandthestochasticgradient 34

PAGE 35

approximationtothenewpairwisesamplef'(n),d(n)g,yields (0)=0(n+1)=(n)+@(d(n),(n)T'(u(n))) @(n)=(n)+[exp()]TJ /F5 11.955 Tf 9.3 0 Td[(e(n)2 22)e(n)'(n)]=(n)]TJ /F4 11.955 Tf 11.95 0 Td[(1)+nXi=n)]TJ /F6 7.97 Tf 6.59 0 Td[(1[exp()]TJ /F5 11.955 Tf 9.3 0 Td[(e(i)2 22)e(i)'(i)]...=nXi=1[exp()]TJ /F5 11.955 Tf 9.3 0 Td[(e(i)2 22)e(i)'(i)](2) Now,thekerneltrickisusedtoobtainthesystemoutput,whichcanbesolelyexpressedintermsofinnerproductsbetweenthenewinputandpreviousinputsweightedbypredictionerrors. y(n)=(n)T'(n)=n)]TJ /F6 7.97 Tf 6.59 0 Td[(1Xi=1[exp()]TJ /F5 11.955 Tf 9.3 0 Td[(e(i)2 22)e(i)'(i)T'(n)]=n)]TJ /F6 7.97 Tf 6.59 0 Td[(1Xi=1[exp()]TJ /F5 11.955 Tf 9.3 0 Td[(e(i)2 22)e(i)(u(i),u(n))](2) AsshowninEq. 2 ,thecomputationalcomplexityofKMCisO(N),whereNisthenumberoftrainingdata.Inconclusion,thelearningalgorithmisasfollows: Algorithm3KernelMaximumCorrentropyalgorithm Initialization :stepsize :universalkernel e(1)=d(1); y(1)=e(1)exp()]TJ /F11 7.97 Tf 6.59 0 Td[(e(1)2 22); Computation whilefu(n),d(n)gavailabledo y(n)=Pn)]TJ /F6 7.97 Tf 6.59 0 Td[(1i=1[exp()]TJ /F11 7.97 Tf 6.59 0 Td[(e(i)2 22)e(i)(u(i),u(n))]; e(n)=d(n))]TJ /F5 11.955 Tf 11.95 0 Td[(y(n); endwhile 35

PAGE 36

2.3ImplementationOfKernelMaximum-CorrentropyAlgorithm Stabilityisanextremelyimportantaspectinadaptiveltering.Inthispart,weusetheenergyconservationrelationtoanalyzesteady-stateMSEandwell-posednessofouralgorithm. 2.3.1Steady-StateMSEAnalysis AnidealadaptivelterinRKHSattemptstondaweightvector, d(n)=T'(n)+v(n)(2) wherev(n)isameasurementnoiseandmodelingerrors.Theadaptivealgorithmyieldsthepredictionerror e(n)=d(n))]TJ /F10 11.955 Tf 11.96 0 Td[((n)T'(n)=T'(n))]TJ /F10 11.955 Tf 11.96 0 Td[((n)T'(n)+v(n)=~(n)T'(n)+v(n)(2) inwhich~(n)=)]TJ /F10 11.955 Tf 12.16 0 Td[((n)istheweight-errorvectorinRKHSatiterationn.Atthenextiterationtheweight-errorvectorcanbewrittenas, ~(n+1)=)]TJ /F10 11.955 Tf 11.96 0 Td[((n+1)=~(n)+(n))]TJ /F10 11.955 Tf 11.95 0 Td[((n+1)=~(n))]TJ /F26 11.955 Tf 11.96 0 Td[(4(n)(2) Inordertostudythelterlearningprocess,thepriorandposteriorierrorsaredenedas ea(n)=~(n)T'(n)ep(n)=~(n+1)T'(n)(2) 36

PAGE 37

Suchthat, ep(n)=~(n+1)T'(n)=()]TJ /F10 11.955 Tf 11.96 0 Td[((n+1))T'(n)=()]TJ /F4 11.955 Tf 11.96 0 Td[(((n)+4(n)))T'(n)=(~(n))-222(4(n))T'(n)=ea(n))-222(4(n)T'(n)=ea(n))]TJ /F3 11.955 Tf 11.95 0 Td[(g(e(n))(u(n),u(n))(2) ForNormalizedGaussianKernel,(u(n),u(n))=1.Therefore,Eq. 2 canbesimpliedto ep(n)=ea(n))]TJ /F3 11.955 Tf 11.96 0 Td[(g(e(n))(2) Thesteady-stateMSEcanbewrittenasMSE=limn)176(!1E(e(n)2),TogetMSE,oneassumptionisutilized:theaprioriestimationerrorea(n)withzeromeanisindependentofv(n).Thisassumptionispopular,whichiscommonlyusedinthesteady-stateanalysisformostadaptivealgorithm.Basedon[ 29 ],thesteady-stateMSEcanbewrittenas MSE=2v+EMSE(2) where2visthevarianceofv(n)andEMSEistheexcessMSE(EMSE)denedas EMSE=limn)176(!1Ejea(n)j2 Because2visunknownandnotcontrolledbyadaptivelters,wemainlyfocusonEMSE. CombiningEq. 2 andEq. 2 ,weget, ~(n+1)T=~(n)T+(ep(n))]TJ /F5 11.955 Tf 11.96 0 Td[(ea(n))='(n)T=~(n)T+(ep(n))]TJ /F5 11.955 Tf 11.96 0 Td[(ea(n))'(n)=(u(n),u(n))=~(n)T+(ep(n))]TJ /F5 11.955 Tf 11.96 0 Td[(ea(n))'(n)(2) 37

PAGE 38

Basedontheenergyconservationrelation,bothsidesofEq. 2 shouldhavethesameenergyinsteadystate, k~(n+1)k2F=k~(n)+(ep(n))]TJ /F5 11.955 Tf 11.96 0 Td[(ea(n))'(n)k2F(2) Expandingthisequation, ~(n+1)T~(n+1)=[~(n)+(ep(n))]TJ /F5 11.955 Tf 11.96 0 Td[(ea(n))'(n)]T[~(n)+(ep(n))]TJ /F5 11.955 Tf 11.96 0 Td[(ea(n))'(n)]=~(n)T~(n)+2~(n)T(ep(n))]TJ /F5 11.955 Tf 11.95 0 Td[(ea(n))'(n)+(ep(n))]TJ /F5 11.955 Tf 11.95 0 Td[(ea(n))2=~(n)T~(n))]TJ /F4 11.955 Tf 11.96 0 Td[(2ea(n)g(e(n))+2g(e(n))2(2) Suchthat,weobtain, E[k~(n+1)k2F]=E[k~(n)k2F])]TJ /F4 11.955 Tf 11.95 0 Td[(2E[ea(n)g(e(n))]+2E[g(e(n))2](2) Insteadystate,theadaptiveltersworkwiththesamemeanweight E[k~(n+1)k2F]=E[k~(n)k2F] andthetimeindexncouldbeomittedforeasyvisualization.ThenEq. 2 iswrittenas, 2E[eag(e)]=E[g(e)2](2) AccordingtoTaylorExpansiontheory, g(e)=g(ea+v)=g(v)+g(1)e(v)ea+1 2g(2)e(v)e2a+O(ea)(2) whereg(1)e(v)andg(2)e(v)denotetherst-orderandsecond-orderpartialderivativeofg(v)withrespecttoeatthevaluevandO(ea)denotesthirdandhigher-powertermof 38

PAGE 39

ea.SubstitutingEq. 2 intotheleft-handsideofEq. 2 andignoringE[O(ea)]yields 2E[eag(e)]=2E[eag(v)+g(1)e(v)e2a]+O(ea)=2E[g(1)e(v)e2a]=2E[g(1)e(v)]EMSE(2) Withthesamehandlingprocedure,theright-handsideofEq. 2 is E[g(e)2]=[E[g2(v)]+E[B]EMSE](2) whereB=g(v)g(2)(v)+jg(1)(v)j2.Itisobvioustoobtain g(1)(v)=exp()]TJ /F5 11.955 Tf 9.3 0 Td[(v2 22)(1)]TJ /F5 11.955 Tf 13.27 8.09 Td[(v2 2)g(2)(v)=exp()]TJ /F5 11.955 Tf 9.3 0 Td[(v2 22)(v3 3)]TJ /F4 11.955 Tf 13.15 8.09 Td[(3v 2)(2) SubstitutingEq. 2 intoEq. 2 andEq. 2 respectively,andusingEq. 2 ,wehave 2E[exp()]TJ /F5 11.955 Tf 9.3 0 Td[(v2 22)(1)]TJ /F5 11.955 Tf 13.27 8.09 Td[(v2 2)])]TJ /F3 11.955 Tf 11.96 0 Td[(E[exp()]TJ /F5 11.955 Tf 9.3 0 Td[(v2 22)(1+2v4 4)]TJ /F4 11.955 Tf 13.15 8.09 Td[(5v2 2)]EMSE=E[exp()]TJ /F5 11.955 Tf 9.3 0 Td[(v2 22)v2](2) Owningto>0,EMSE0andE[exp()]TJ /F11 7.97 Tf 6.59 0 Td[(v2 22)v2]0,ifthestepsizesatises 0<

2.3.2 Self Regularization

Does our method face the ill-posed problem due to small data size or severe noise, like Least Squares (LS)? In the LS problem, Tikhonov regularization [7] is widely used to deal with this issue. If the same method were applied to our method, a regularization term would be introduced into the MCC cost function:

    max R_emp[Ω ∈ F, Z^n] = Σ_{i=1}^{n} κ_σ(d(i), Ω^T φ(i)) − λ‖Ω‖²_F

This optimization problem is equivalent to the following problem:

    max R_emp[Ω ∈ F, Z^n] = Σ_{i=1}^{n} κ_σ(d(i), Ω^T φ(i))   subject to   ‖Ω‖²_F ≤ C

which was already proven in [7]. Therefore, the following conclusion can be obtained: constraining the norm of the solution has the same effect as adding a regularization term in the KMC.

As mentioned in the previous section, ‖Ω‖²_F remains controlled as the iterations increase as long as the step size satisfies the constraint derived above. Hence, for any positive value C, we can find an Ω with an appropriate step size and initial condition such that ‖Ω‖²_F ≤ C. To conclude, KMC learning is a self-regularizing method under an appropriate step size and initial condition.

2.4 Kernel Width Selection For Correntropy

Even though the KMC has good nonlinear approximation and is robust in nonlinear and non-Gaussian situations, the advantages of KMC have been achieved at a cost: the requirement of setting a kernel width parameter for correntropy. Notice that there are two kernel sizes in KMC: the kernel used for the RKHS mapping and the kernel for the cost function; here we discuss the latter. As shown in [20] and [30], the kernel width value of correntropy is very important and even has more influence than the kernel type itself. The kernel width controls the local range over which the similarity of two random variables is

PAGE 41

compared. When correntropy is used as the criterion in adaptive system training, its kernel width influences the nature of the performance surface, the presence of local optima, the rate of convergence, and the robustness to impulsive noise during adaptation [31]. If an unsuitable kernel width is chosen, the superior performance of correntropy is lost. Therefore, kernel width selection is an important aspect of any correntropy application.

At present there are various methodologies for kernel width selection in density estimation, and the topic has been discussed extensively in the statistical literature [32]. Among plug-in methods, the rule of thumb popularized by Silverman [33] is the most common because of its simplicity. To improve performance in non-Gaussian environments, Park and Marron [34] proposed using a nonparametric estimate of the true probability density of the random variable and taking the second derivative from this estimate. With regard to cross-validation, Least Squares Cross-validation [35] and Biased Cross-validation [36] are the best known. All of these algorithms are statistical methodologies and, unfortunately, none of them is appropriate for our problem: as shown later, correntropy is not only a function of the density f but also of the term expressing the similarity of two realizations of the random variables, and it is the kernel width of that latter term that we need to select. Besides statistical methodologies, [31] proposed an iterative update algorithm that minimizes the Kullback-Leibler (KL) divergence between the estimated and true signal distributions. Although this method chooses the kernel width adaptively, selecting its step size is not trivial [37], and its computational complexity exceeds that of updating the weights that approximate the nonlinear mapping. As a remedy, [37] proposed a fixed-point update rule with a gradient descent search. Because it approximates the expectation by evaluating its argument only at the current sample, this fixed-point rule is easily disturbed by outliers.
In this section, an adaptive algorithm for selecting the correntropy kernel width is proposed. The method updates the kernel width at every iteration based on the prediction error distribution. In signal processing, Middleton's non-Gaussian interference models have a strong theoretical background and broad applicability [38]; the Middleton class B density model is a linear combination of a normal distribution and a series of component distributions. Here we use a mixed-Gaussian model to approximate the Middleton density model. Kurtosis, which summarizes the shape of the error probability density function in a single number, helps us adjust the update term relative to the standard deviation of the error. The computational complexity of this algorithm is linear in the number of training data, and approximation techniques can be adopted to reduce this complexity and enhance the applicability of KMC in real systems.

Sometimes the signal is nonstationary, for example as a result of learning or of a changing environment. The correntropy kernel width should therefore be adapted so that the criterion keeps performing well in nonstationary scenarios. We propose the adaptive kernel width update rule

σ(n+1) = λ σ(n) + (1 − λ) sqrt(κ_G / κ_e) σ_e

where σ(n), n = 1, 2, ..., is the correntropy kernel width at sample index n, 0 < λ < 1 is a memory factor, κ_G is the kurtosis of the Gaussian distribution (whose value is 3), and κ_e and σ_e² are the kurtosis and variance of the prediction error, respectively. We explain this update rule below.
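The update rule itself is straightforward to implement. The following is a minimal sketch (not the exact code used in the experiments), assuming the kurtosis and variance of the prediction error are estimated from a buffer of recent errors.

```python
import numpy as np
from scipy.stats import kurtosis

def update_kernel_width(sigma_prev, errors, lam=0.95):
    """One step of the adaptive correntropy kernel-width rule (a sketch).

    sigma_prev : current kernel width sigma(n)
    errors     : recent prediction errors used to estimate kurtosis and variance
    lam        : memory factor, typically in [0.9, 1.0)
    """
    kappa_G = 3.0                                # kurtosis of a Gaussian distribution
    kappa_e = kurtosis(errors, fisher=False)     # Pearson kurtosis of the error
    sigma_e = np.std(errors)                     # error standard deviation
    return lam * sigma_prev + (1 - lam) * np.sqrt(kappa_G / kappa_e) * sigma_e
```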
When this kernel width update algorithm is employed in adaptive system training, the overall system design is depicted in Fig. 2-4. Note that there are two adaptation loops in the system: one for the weight update and one for the cost function parameter update. For KMC, the maximum correntropy weight update strategy relies on searching for an appropriate kernel width using the kurtosis. Therefore, at each iteration the kernel width update rule changes the cost function before the system weights are updated. That is, the kernel width update should be slower than the weight update, and the weight update is performed after incorporating the new correntropy kernel width.

Figure 2-4. Overall training system structure

Based on the update rule above, the kernel width update at every iteration is proportional to the standard deviation of the prediction error, as in conventional methods such as Silverman's rule. If the prediction error follows a Gaussian distribution, the correntropy kernel width decays to the standard deviation of the prediction error. However, if the density of the prediction error deviates substantially from the Gaussian shape (being multimodal, for instance), the standard deviation alone is not enough. In general, when the prediction error has longer, fatter tails, as shown in Fig. 2-5A, the kernel width should be somewhat smaller than the standard deviation, so that unsuitable error information, such as impulsive prediction errors that would cause large fluctuations of the weight update, is rejected. On the other hand, when the tails are light the correntropy kernel width should be relatively larger, to exploit more valid information for the weight update and improve the convergence speed, as shown in Fig. 2-5B. For this reason, an additional coefficient is employed to adjust the update term with
Figure 2-5. Examples of kernel width selection in a non-Gaussian environment: A) heavy tail, B) light tail

respect to the standard deviation of the prediction error. As a measure of the peakedness and tail weight of a probability distribution, kurtosis summarizes the shape of the distribution in a single number: a distribution with higher kurtosis has a sharper peak and heavier tails, while a distribution with lower kurtosis has a shorter peak and thinner tails. Gaussian distributions are the building blocks used to represent non-Gaussian noise in the Middleton model, and this fact motivates using the kurtosis of the Gaussian distribution, κ_G, as the benchmark to judge whether the error distribution has heavy or light tails. In conclusion, the update term is chosen as sqrt(κ_G/κ_e) σ_e ≥ 0, and the selection of λ controls the trade-off between robustness and convergence speed.

Computational complexity is an important consideration for the application of any algorithm. In our method, the kurtosis κ_e is the computationally intensive part, scaling with O(n), where n is the
number of training data. In practice there is no need to update the kernel size at every iteration, because the prediction error is locally stable in most situations. For the same reason, another option is to estimate the kurtosis only from the most recent prediction errors, over a window of length M.

Even though the proposed method still requires one free parameter to be chosen, this is not a difficult problem. λ is a memory factor and controls the contribution of prediction errors at different times to the current kernel width; its value is usually in the range [0.9, 1.0). As shown in the simulation section, the performance of the proposed adaptive method is not sensitive to the value of λ, that is, systems with different λ have similar performance.

2.5 Simulation

2.5.1 Frequency Doubling

Frequency doubling is a clearly nonlinear problem. In this simulation, the input and desired data are sine waves with frequencies f0 and 2f0, respectively, as shown in Fig. 2-6. From the two sequences, 1500 samples are segmented as training data and 200 samples as test data. We use an impulsive Gaussian mixture model to simulate the influence of non-Gaussian noise, whose probability density function is

p_noise(i) = 0.9 N(0, 0.01) + 0.1 N(2, 0.01)
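For reproducibility, samples from this impulsive mixture can be drawn as in the sketch below (an assumed implementation; note that a variance of 0.01 corresponds to a standard deviation of 0.1).

```python
import numpy as np

def impulsive_noise(n, rng=None):
    """Draw n samples from 0.9*N(0, 0.01) + 0.1*N(2, 0.01)."""
    rng = np.random.default_rng() if rng is None else rng
    outlier = rng.random(n) < 0.1
    return np.where(outlier, rng.normal(2.0, 0.1, n), rng.normal(0.0, 0.1, n))
```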
Figure 2-6. Simulation data

PART 1: KMC performance compared with other systems. In this part we compare the performance of KMC with MCC on a linear filter and with kernel filters trained with popular error p-norm criteria. The input vector has dimension 2 (the current and one past sample). For the kernel learners the learning rates are 0.9, 0.1, 0.25 and 0.9 for KLMS, KLMF, KLDA and KMC, respectively, and 0.2 for MCC with a linear filter. The kernel size for the kernel-induced mapping in the kernel adaptive filters is 0.5, and the kernel size for the correntropy criterion in KMC and in MCC with the linear filter is set to 0.4, which performs best on the test data. Meanwhile, to guarantee that KMC and MCC with the linear filter reach the globally optimal solution, we first train these filters with the MSE criterion during the first 300 samples. 100 Monte Carlo simulations are run on the same data with 100 different initializations and noise realizations. All results are reported as the intrinsic error power (IEP) on clean test data, where clean means the desired signal without noise. That is,

IEP = E[d − f(u)]²

in which d and u are the desired signal and input of the clean test data, respectively, and f(u) is the system output for the corresponding test data.
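A small helper computing this metric on clean test data might look as follows (a trivial sketch of the definition above).

```python
import numpy as np

def iep(d_clean, y_pred):
    """Intrinsic error power on clean test data: IEP = E[(d - f(u))^2]."""
    return float(np.mean((np.asarray(d_clean) - np.asarray(y_pred)) ** 2))
```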
Figure 2-7. Average learning curves with standard deviation

The average learning curves with their standard deviations are shown in Fig. 2-7, and the final estimated IEP for these systems is given in Table 2-1. All of these results show that the performance of KMC is much better than the others: not only is the mean IEP of KMC the smallest, but its output range is also the narrowest. Compared with the others, KMC and KLDA control the influence of the outliers, which shows up as 1) relatively higher accuracy and 2) inconspicuous fluctuation of the system performance. Moreover, because the correntropy criterion is equivalent to the 2-norm of the prediction error when the error is small, the convergence of KMC is faster than that of KLDA. Fig. 2-8 is a representative visual result. MCC with a linear filter is incapable of approximating the nonlinear transfer function required for frequency doubling. Even though KLMS follows the frequency doubling, its result is influenced by the noise. Note, however, that the kernel width in the correntropy criterion must be selected by the user to attenuate the outliers, and this choice depends on the application.

Table 2-1. Final estimated IEP comparison
adaptive filter           IEP
KLMS                      0.3308 ± 0.2721
KLMF                      0.3849 ± 0.1054
KLDA                      0.0500 ± 0.0310
KMC                       0.0109 ± 0.0033
MCC with linear system    1.1518 ± 0.3685
Note: all results are summarized as average ± standard deviation.
Figure 2-8. Visual result of a representative simulation

PART 2: EMSE analysis of KMC. In this part we investigate the EMSE performance of KMC. The system performance under a series of noises is shown in Table 2-2 (the kernel size for the correntropy criterion in KMC is set to 0.4). The noises are a series of mixed-Gaussian models,

p_noise(i) = 0.9 N(0, 0.01) + 0.1 N(a, 0.01)

where a is a coefficient characterizing the different noise models. Fig. 2-9 visualizes the result for a = 1.0. From these results, the converged testing MSE is indeed close to the theoretical steady-state EMSE.

Table 2-2. Theoretical and simulated EMSE (Gaussian-mixture noise)
                   a=0.5    a=1.0    a=1.5    a=2.0
Theoretical EMSE   0.0011   0.0021   0.0023   0.0026
Simulated EMSE     0.0007   0.0024   0.0028   0.0032

PART 3: Effect of the kernel width on MCC in KMC. In this part the effect of the kernel width on MCC in KMC is demonstrated and the performance of the adaptive
Figure 2-9. Convergence curve in terms of the EMSE

kernel width selection is validated. First, we choose eight kernel widths: 0.1, 0.3, 0.4, 0.7, 1.0, 1.5, 4 and the value obtained by Silverman's rule. As before, 100 Monte Carlo simulations with different noise realizations are run to study the effect of the kernel width, while all other filter parameters are kept the same as in the previous simulation. Table 2-3 presents the results. From this table, KMC performs at the same level over a large range of kernel sizes, i.e., when the kernel width is in the range [0.3, 1.0]. The set of kernel sizes obtained by applying Silverman's rule to the prediction error varies between [0.0602, 0.15], which is in the neighborhood of the best values mentioned above. A large kernel width is initially beneficial to avoid locally optimal solutions, and the global optimum is approached by starting with a large kernel width. At the same time, a large kernel width decreases the windowing effect of correntropy, and adaptive systems with a too-large kernel width degenerate to those trained with the MSE criterion. An appropriate kernel width is therefore a compromise between reaching the global optimum and canceling noise outliers. However, KMC with a large kernel width will not perform worse than KLMS.
Then the performance of the adaptive kernel width selection is discussed. We compare the new method with fixed kernel widths, Silverman's rule, and the fixed-point update method proposed by [37]. To show the superiority of the proposed adaptive selection method, another simulation with a different noise is reported in Table 2-4. The noises are a series of mixed-Laplace models,

p_noise(i) = 0.9 L(μ=0, b=0.1826) + 0.1 L(μ=a, b=0.1826)

where a is a coefficient characterizing the different noise models, and μ and b are the parameters of the Laplace distribution p_L(x | μ, b) = (1/(2b)) exp(−|x − μ|/b), x being a random variable.

Based on these two tables, systems trained with the proposed update rule perform better than the others. Taking the first simulation as an example: the system performs relatively well when the kernel width is in the range [0.3, 1]. The kernel width selected by Silverman's rule is not in this range, while the ones obtained by the other two adaptive rules are in the neighborhood of the best values. Furthermore, the mean and standard deviation of the IEP of our method are the smallest among the three adaptive methods and almost optimal compared with the fixed kernel width conditions.

Table 2-3. System comparison between fixed kernel widths and different update methods (Gaussian-mixture noise)
kernel width σ                       IEP
0.1                                  0.3241 ± 0.3279
0.3                                  0.0119 ± 0.0060
0.4                                  0.0109 ± 0.0033
0.7                                  0.0113 ± 0.0038
1.0                                  0.0209 ± 0.0113
1.5                                  0.1203 ± 0.1257
4.0                                  0.2386 ± 0.1966
Silverman's (0.0602)*                0.1724 ± 0.1339
Fixed-point update (0.7894)          0.0151 ± 0.0297
Proposed adaptive method (0.3865)    0.0077 ± 0.0062
* The numbers in parentheses are the final kernel width values obtained by the mentioned algorithms.
Next, the effect of the memory factor λ is demonstrated, again taking the Gaussian-mixture noise as an example. We choose four memory factors: 0.91, 0.93, 0.95 and 0.98. For each memory factor, 50 Monte Carlo simulations with different noise realizations are run to estimate the mean kernel width and the IEP of the system. The results are presented in Fig. 2-10 and Table 2-5. As shown in Fig. 2-10, the curve for λ = 0.98 is the smoothest, which means the best tracking stability; however, its fast-tracking behavior is worse than the others because the curve for λ = 0.98 does not move much with changes in the prediction error distribution. Nevertheless, the influence of λ on the adaptive system is small: as learning proceeds, the kernel width gradually converges to around 0.4 (Fig. 2-10), and the fluctuation of the final IEP across the different λ values is small, in both mean and standard deviation.

2.6 Conclusion

Owing to their universal nonlinear approximation, linearity in the RKHS, and convexity in the hypothesis space, kernel adaptive filters are widely used in many applications. However, the MSE criterion popular in most conventional adaptive filters is not appropriate

Table 2-4. System comparison between fixed kernel widths and different update methods (Laplace-mixture noise)
                      a=0.5                a=1.0                a=1.5                a=2.0
Proposed method       0.0017±0.0005        0.0020±0.0007        0.0021±0.0007        0.0021±0.0009
                      (0.0224)*            (0.2060)             (0.2910)             (0.3571)
Silverman's           0.0086±0.0041        0.0344±0.0204        0.1102±0.1132        0.1720±0.1159
                      (0.0224)             (0.0256)             (0.0432)             (0.0410)
Fixed-point update    0.0159±0.0097        0.0286±0.0203        0.0130±0.0145        0.0059±0.0057
                      (0.7162)             (0.782)              (0.8183)             (0.8388)
* The numbers in parentheses are the final kernel width values obtained by the mentioned algorithms.

Table 2-5. Effect of λ in KMC
λ        IEP
0.50     0.0102 ± 0.0193
0.80     0.0098 ± 0.0164
0.91     0.0093 ± 0.0150
0.98     0.0075 ± 0.0028
Figure 2-10. Kernel size curves for the series of λ values

for non-Gaussian and nonlinear cases. As an alternative, MCC has been shown to be more efficient and robust than the MSE criterion in these situations. The work in this chapter combines the advantages of these two approaches into a new algorithm called Kernel Maximum Correntropy. As shown theoretically and experimentally, the performance of this algorithm is superior. Its convergence and regularization properties are mainly controlled by two simple parameters, the step size and the kernel width of the correntropy. Moreover, its computational complexity is similar to that of KLMS, being O(N).

Although the KMC filter achieves good performance, the kernel width in the correntropy criterion must be selected according to the application to attenuate the outliers. A fixed kernel width leads to a trade-off between noise/outlier cancellation and convergence rate. Without any prior knowledge about the statistical distribution of the signals involved, how to select the kernel width is a crucial problem in correntropy applications. We therefore formulated an algorithm to adapt this parameter. Using the
kurtosis of the signal as the ratio that adjusts the prediction error standard deviation helps the filter reach an appropriate kernel width. The proposed method indeed updates the kernel width adaptively, yielding better performance. More importantly, the system performance is not sensitive to the only free parameter λ, whose value is easy to select because of its clear meaning in the implementation. Besides the kernel width in the correntropy, practical approaches are available to select the RKHS kernel and to choose the step-size parameter, as done in KLMS. An issue that requires future investigation is pruning methods to further reduce the network size.
CHAPTER 3
FIXED BUDGET QUANTIZED KERNEL LEAST-MEAN-SQUARE ALGORITHM

Kernel adaptive filters exhibit a growing memory structure embedded in the filter weights, and the network size increases linearly with the number of processed samples. The growing network size results in a high computational burden and memory requirement and hinders online applications, particularly in continuous scenarios. This problem motivates us to explore a fixed-budget kernel system that decreases the computational complexity. Combined with a sparsification technique, the quantized kernel least mean square algorithm (QKLMS), we propose a growing and pruning system that realizes a fixed-budget KLMS algorithm, called QKLMS-FB. First, a new center is added only if its distance to the current center dictionary exceeds a predefined threshold [13]. When the network size reaches its predefined budget, the center with the smallest significance, a measure that estimates the influence of a center on the adaptive system, is pruned each time a new center is added to the network according to the quantization strategy. As the sparsification technique, QKLMS curbs the growth of the network size, while the significance-based pruning strategy makes the network more compact and improves the system's tracking capability. This chapter introduces the concept of significance based on a weighted average over all input data, even though some of the inputs are quantized to existing centers. An important requirement for an online algorithm is moderate computational complexity at every iteration; therefore, a recursive method to estimate the significance is also developed, followed by the summary of QKLMS-FB. With the significance measure updated recursively at each step, the proposed methodology remains online.

To simplify the notation, we always refer to time index n and therefore write C_i(n−1), α_i(n−1) and K(n−1) as c_i, α_i and K.
3.1 Budget Controlling Strategy

3.1.1 Growing And Pruning Techniques Overview

Compared with KLMS, the KMC algorithm increases the computational burden of the system. Additionally, kernel adaptive filters exhibit a growing memory structure embedded in the filter weights [1]. This property aggravates the high computational complexity and memory requirement and hinders online applications, particularly in continuous scenarios. Computation and memory problems are confronted in other kernel methods as well. For example, the computational burden of SVM [2] and kernel regression networks [3] scales with O(N³), and the memory with up to O(N²), N being the number of processed data. As a remedy, online kernel learning (OKL) provides more efficient alternatives that approximate the nonlinear function sequentially by gradient descent techniques [39]. As a subfield of OKL, kernel adaptive filters have recently been making rapid progress. Even though OKL and kernel adaptive filters require less computation, the system structure still grows linearly with the number of processed data. To curb this growth, a variety of sparsification techniques have been proposed: the novelty criterion [10], the approximate linear dependency (ALD) criterion [8], prediction variance [11], coherence [40], the surprise criterion [12], etc. Recently, quantization techniques were introduced to KLMS, yielding quantized KLMS (QKLMS) [13]. Unlike conventional sparsification approaches, this method does not simply discard the "redundant" data; it instead uses the information conveyed by these data to update the coefficients of the existing centers, and hence achieves a more compact network with better accuracy. Therefore, QKLMS is the sparsification strategy used in this work.

Although these sparsification methods curb the growth of the network size, they do not allow fixing the exact network size M in advance, i.e., establishing the model order a priori as is conventionally done in linear models. Fixed-budget kernel methods are crucial in practical applications, since DSP processors have limited physical memory and
computational capability. To obtain a fixed-budget model, budget maintenance strategies such as pruning, projection and merging are needed. In this work we focus on pruning techniques, a family of strategies that discard redundant units from an existing large network. When adopting a pruning strategy, three key problems must be addressed: 1) deciding which center to prune; 2) handling nonstationarity; 3) decreasing the influence of outliers. To solve these problems, various pruning techniques have been proposed and used not only in kernel methods but also in pattern recognition and neural networks. However, most of them were developed for batch processing and are inappropriate for online learning, particularly in nonstationary environments. These pruning methods can be categorized into two groups, intuitive strategies and sensitivity-analysis-based methods, which we explain next through representative techniques.

Intuitive pruning methods are based on a direct relationship between coefficients or unit activation values.

Removing the oldest pattern: [41] developed two simple removal strategies, called forgetron 1 and 2, which discard the oldest sample in the dictionary, and proved an error upper bound. Even though this strategy is intuitive and easy to implement, its accuracy is not good enough for our purpose.

Magnitude-based pruning strategies: eliminate the weights with the smallest magnitudes, as in [42] and [43]. Unfortunately, this simple idea often leads to the wrong elimination of weights, because a small weight does not imply a small influence on system performance.

Discarding the less relevant node with orthogonal decomposition: in [44] and [45], the orthogonal techniques Gram-Schmidt/QR factorization and singular value decomposition (SVD) are used, respectively, so that the impact of each regressor on reducing the error can be expressed term by term. This helps select a subset of regressor matrices that contains the essential information of the time series, and the least relevant column is removed from the RBF network. However, these methods have a high computational burden and require a large number of free parameters to be preset.

Sensitivity-analysis-based pruning methods build a sensitivity measure to quantify the system perturbation caused by pruning.
Throwing away the information whose removal causes the smallest error increase on the current training sample: in a classification scenario, removing the perceptron with the largest margin [46] has been proposed. For regression, this idea was first introduced in Optimal Brain Damage [47] and was improved by [48] to remove the restrictive assumption about the form of the network's Hessian. Kruif et al. [49] applied this idea to support vector machines, accounting only for the absolute introduced error; the procedure determines which samples can be discarded when least squares support vector machines (LSSVM) are used.

Omitting the units that make an insignificant contribution to the overall network output over a number of training data: because of noise, considering the pruning effect only on the current training sample can result in inappropriate eliminations. Therefore [50] and [51] identify units with insignificant contribution to the overall network output within a sliding data window. The deficiency is that a proper window size can only be selected through extensive trial and simulation study. To overcome this shortcoming, and similarly to [52][53], [54] extends the sliding window to the whole data set. However, the pruning process in [54] does not start until all training observations have been learned; in most practical applications the learning system cannot determine when and whether all observations have been presented, and the temporarily memorized data are limited by the memory space. Through an a priori input sampling density, [55] links the required learning accuracy to the contribution to the network output averaged over all received input data, and data with little contribution are simply removed from the network. This algorithm is called the Generalized Growing and Pruning RBF neural network (GGAP-RBF). Even though the pruning strategy in GGAP-RBF is nonparametric, the prior over the input data distribution is not easy to obtain in practical cases.

Reviewing the pruning techniques above, we see that, although some of them (the intuitive methods, the strategy considering only the effect on the current sample, and the GGAP-RBF algorithm) are online, their accuracy is not good enough, while the other methods are batch criteria unsuitable for KLMS.

The aforementioned pruning methods can be brought to kernel adaptive filters. Up to now, three different fixed-budget KRLS algorithms have been proposed. One uses a sliding window to discard the older training data and obtains a compact solution that depends only on the latest K observed data [56][57]. The other two fixed-budget KRLS algorithms are obtained by omitting the least significant centers with the
same pruning criterion mentioned in [49], [58][59]. These two fixed-budget KRLS algorithms show good performance not only in stationary conditions but also in nonstationary scenarios. However, because they need the intermediate result (λI + K)⁻¹, where K is the Gram matrix of the input data and λ is the regularization term, they are not suitable for KLMS, which does not need the Gram matrix; in effect, obtaining the inverse Gram matrix costs much more than KLMS itself. For KLMS we therefore seek a pruning technique that is O(N), otherwise pruning would be more expensive than creating the nonlinear mapping itself.

3.1.2 Quantized Kernel Adaptive Filtering

Quantization techniques were first introduced to KLMS to formulate a powerful sparsification kernel filter, called QKLMS. The algorithm uses a simple vector quantization (VQ) scheme to quantize the input space and thereby curb the network size of the kernel adaptive filter. Every time a new sample arrives, QKLMS checks whether its distance to the existing centers is less than a predefined minimal distance. If there is an existing center sufficiently close to the new sample, the center dictionary and network size remain unchanged, but the coefficient of the closest center is updated. Otherwise, the new sample is included in the dictionary. For online kernel learning most existing VQ algorithms are unsuitable because the center dictionary is usually trained offline and the computational burden is heavy. Therefore a very simple online VQ method is used here, based on the input space distance between the sample and the existing centers.¹ The difference between this procedure and other methods of curbing dictionary growth is that the coefficients are updated with each

¹ A translation-invariant kernel establishes a continuous mapping from the input space to the feature space, so the distance in feature space increases monotonically with the distance in the original input space. For a translation-invariant kernel, the VQ in the original input space therefore also induces a VQ in the feature space.
new input sample. The QKLMS algorithm with online VQ is summarized in Algorithm 4.

Algorithm 4 QKLMS Algorithm
Initialization: step size η, kernel width σ_M, quantization threshold ε_U > 0, center dictionary C(1) = {u(1)} and coefficient vector a(1) = [η d(1)]
while {u(n), d(n)} is available do
  e(n) = d(n) − Σ_{i=1}^{K(n−1)} a_i(n−1) κ_{σM}(u(n), C_i(n−1))
  dis(u(n), C(n−1)) = min_{1≤i≤K(n−1)} ‖u(n) − C_i(n−1)‖
  i* = argmin_{1≤i≤K(n−1)} ‖u(n) − C_i(n−1)‖
  if dis(u(n), C(n−1)) ≤ ε_U then
    C(n) = C(n−1), a_{i*}(n) = a_{i*}(n−1) + η e(n)
  else
    C(n) = {C(n−1), u(n)}, a(n) = [a(n−1), η e(n)]
  end if
end while
(Here κ_{σM} is the Mercer kernel with kernel size σ_M, C_i(n−1) and K(n−1) are the i-th element and the size of C(n−1), respectively, and ‖·‖ denotes the Euclidean norm in the input space.)

The sufficient condition for mean-square convergence, and lower and upper bounds on the theoretical value of the steady-state excess mean square error (EMSE), are studied in [13].

After QKLMS, the quantization technique was applied to KRLS, yielding a recursive algorithm named quantized KRLS (QKRLS) [60]. Its cost function in feature space can be expressed as

min_{Ω∈F} Σ_{i=1}^n (d(i) − Ω^T φ(Q[u(i)]))² + λ‖Ω‖²_F

where Q[·] denotes a vector quantization operator. As shown in [60], QKRLS is rather effective in yielding a compact model (smaller network size) while preserving a desirable performance.
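A minimal sketch of the QKLMS loop summarized in Algorithm 4 is given below for a Gaussian kernel. It is a sketch rather than the experimental code; scaling the stored coefficients by the step size η follows the standard QKLMS formulation and is an assumption where the extracted listing is ambiguous.

```python
import numpy as np

def gauss(u, c, sigma):
    """Gaussian kernel between an input vector and a center."""
    return np.exp(-np.sum((u - c) ** 2) / (2 * sigma ** 2))

def qklms(U, d, eta=0.9, sigma=0.5, eps_u=0.1):
    """Minimal QKLMS sketch: merge to the nearest center or grow the dictionary."""
    centers, alphas = [np.asarray(U[0], dtype=float)], [eta * d[0]]
    for n in range(1, len(d)):
        u = np.asarray(U[n], dtype=float)
        y = sum(a * gauss(u, c, sigma) for a, c in zip(alphas, centers))
        e = d[n] - y
        dists = [np.linalg.norm(u - c) for c in centers]
        j = int(np.argmin(dists))
        if dists[j] <= eps_u:          # quantize: update the nearest center's coefficient
            alphas[j] += eta * e
        else:                          # grow: add a new center
            centers.append(u)
            alphas.append(eta * e)
    return centers, alphas
```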
3.2 Significance Measure

3.2.1 Definition Of Significance

The QKLMS output with K centers for a new input vector u(n) is

y(n) = Σ_{i=1}^{K} α_i κ_{σM}(u(n), c_i)

where α_i is the coefficient corresponding to the i-th center. If the center c_k is removed, the system output of the remaining centers for u(n) is

ŷ(n) = Σ_{i=1}^{k−1} α_i κ_{σM}(u(n), c_i) + Σ_{i=k+1}^{K} α_i κ_{σM}(u(n), c_i)

Thus the error resulting from removing c_k is

E0_k(n) = ‖y(n) − ŷ(n)‖_q = ‖α_k‖_q κ_{σM}(u(n), c_k)

where ‖·‖_q is the q-norm of vectors, with q = 1 in our work. Therefore the average error over all sequentially learned observations caused by removing c_k is

E_k(n) = (1/n) ‖(E0_k(1), E0_k(2), ..., E0_k(n))^T‖_1 = (1/n) ‖α_k‖ Σ_{i=1}^{n} κ_{σM}(u(i), c_k)

However, the computational complexity of E_k(n) is very high because it involves all learned observations, so a simpler approximation is needed. As mentioned in [55], the probability density function is an option. Suppose the inputs of the observation sequence are drawn from a domain U with sampling density function p(u). According to [55],

lim_{n→+∞} E_k(n) = ‖α_k‖ ∫_U κ_{σM}(u, c_k) p(u) du

This is the statistical contribution of a center c_k. In practice the distribution of u is unknown, so we apply the popular kernel density estimation technique (Parzen
window [61]) to estimate the density of the input data nonparametrically, which yields

p̂(u) = (1/n) Σ_{i=1}^{n} κ_P(u, u(i))

where κ_P is a kernel function with bandwidth σ_P. A range of kernel functions is commonly used: uniform, triangular, normal, Epanechnikov, and others. Therefore the contribution can be further written as

E_k(n) = (‖α_k‖/n) Σ_{i=1}^{n} ∫_U κ_{σM}(u, c_k) κ_P(u, u(i)) du

This expression needs information from all the input data, and further work is required to compute the center contribution only from the available centers, i.e., in an online mode. Inspired by QKLMS, the contribution of u(n) to the input density can be approximated by its nearest center (the details are explained in Section 3.2.2). Moreover, in nonstationary scenarios the influence of c_k on the system performance is time-varying, which means the significance representation should track the input changes. Since the factor 1/n does not influence the ranking of E_k(n), it is dropped from the expression. To incorporate these properties, the expression is modified to

E_k(n) = ‖α_k‖ Σ_{i=1}^{K} β_i(n) ∫_U κ_{σM}(u, c_k) κ_P(u, c_i) du

where β_k(n) is the influence factor of center c_k at time n, accounting for the number and recency of the data merged into c_k. Intuitively, the more data merged into c_k, the more important its contribution and the higher its significance should be; meanwhile, a center that has appeared more recently has relatively higher significance than older ones. In conclusion, E_k(n) above is the statistical contribution of center c_k to the overall output. We define it as the significance of c_k and denote it E^sig_k(n): the significance of a center is its statistical contribution to the network output, weighted-averaged over the input data received so far.
The expression above gives an abstract picture of which factors affect the significance measure of a center and how. Some interpretations follow.

The significance measure is proportional to the magnitude of the coefficient ‖α_k‖. Recall that in KLMS the coefficients are associated with instantaneous errors. In QKLMS the weight perturbation caused by removing center c_k is ‖ΔΩ‖² = ‖α_k‖² κ_{σM}(c_k, c_k) = C‖α_k‖² if κ_{σM} is a translation-invariant kernel, where C is a constant. When ‖α_k‖ is large, removing this center produces a large effect and impacts the system performance negatively.

A center with a large coefficient but a small influence factor has a relatively small significance measure, because few data have been merged into it. Since an outlier center results in a large coefficient, incorporating the influence factor decreases the significance of such an outlier until it is eventually discarded. In this sense, our pruning criterion is less sensitive to outliers and adaptively omits them from the center dictionary.

The influence factor of a center decreases with time if no new data are merged into it. This property is useful in nonstationary environments: the significance of older centers is reduced gradually until it is small enough for them to be omitted. It also guarantees that the newest center is not dropped improperly merely because less data has been merged into it compared with older ones.

Remark: if the input data distribution is uniform and the input statistics are stationary, the significance value is exactly proportional to the magnitude of the coefficient, and our pruning rule degenerates to eliminating the centers with the smallest magnitudes.

3.2.2 Estimation Of Significance

An important aspect of any online algorithm is moderate computational complexity at every iteration. We therefore develop a recursive method to estimate the significance and compute the influence factor, thus avoiding the integration operation in the significance expression.

3.2.2.1 Influence factor

First, consider the problem of approximating the input density function with limited samples. Here we exploit the quantization idea: use the nearest center of an
input datum to approximate its contribution to the input density distribution. That is,

p̂_u(n) = (1/n) Σ_{j=1}^{n} κ_P(u, u(j)) ≈ (1/n) Σ_{j=1}^{n} κ_P(u, Q(u(j)))

where Q(·) is a quantization operator. For simplicity, the center dictionary and quantization rule for density estimation are the same as those of the quantization procedure in QKLMS. The input density estimate is therefore updated according to

p̂_u(n) = (1/n) [(n−1) p̂_u(n−1) + κ_P(u, c_{j*})]   if u(n) merges into c_{j*}
p̂_u(n) = (1/n) [(n−1) p̂_u(n−1) + κ_P(u, u(n))]    otherwise

or, equivalently,

p̂_u(n) = (1/n) [Σ_{j≠j*} β_j(n−1) κ_P(u, c_j) + (β_{j*}(n−1) + 1) κ_P(u, c_{j*})]   if u(n) merges into c_{j*}
p̂_u(n) = (1/n) [Σ_{j=1}^{K} β_j(n−1) κ_P(u, c_j) + β_{K+1}(n) κ_P(u, u(n))]         otherwise

where β_j(n−1) is the influence factor and represents how many samples have been quantized to c_j. As mentioned before, it is the ranking of the significance, not its exact value, that matters, so the constant 1/n is ignored.

Second, a general approach to forgetting past samples is to include a forgetting factor λ, a positive scalar usually close to one:

0 ≤ λ ≤ 1

Based on the above analysis, the influence factor is updated as

β_k(n) = λ β_k(n−1)        if no sample merges into c_k
β_k(n) = λ β_k(n−1) + 1    otherwise

where λ is the forgetting factor that helps the system track changes in the input statistics; the smaller the value of λ, the more influence the current data have. When a new center is added to the center dictionary, its initial influence factor is 1.
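The influence-factor bookkeeping is a one-line recursion per step. A minimal sketch follows; the forgetting factor value is an assumed example.

```python
def update_influence(betas, merged_index, lam=0.98):
    """Influence-factor recursion (a sketch): decay every beta_k by lam and
    add 1 to the center that absorbed the new sample; a newly created center
    would be appended with beta = 1."""
    betas = [lam * b for b in betas]
    if merged_index is not None:
        betas[merged_index] += 1.0
    return betas
```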
3.2.2.2 Integration approximation

The significance expression involves integrating the Parzen window density kernel against the Mercer kernel over the sampling range U. This can be done analytically for some simple but popular kernels, such as the Gaussian kernel. Our algorithm involves two kernels: the Mercer kernel for the mapping to the RKHS and the Parzen density kernel. For both, the Gaussian kernel is a good choice because it is universal, differentiable and continuous. Moreover, the convolution of two Gaussian functions is again a Gaussian, with different mean and variance, a property that simplifies the integration dramatically, as shown below. For all these reasons the Gaussian kernel is a good choice for estimating the significance measure. Taking the normalized Gaussian kernel,

∫_U κ_{σM}(u, c_k) κ_P(u, c_L) du = ∫_U exp(−(u − c_k)²/σ_M² − (u − c_L)²/σ_P²) du
 = exp(−(c_k − c_L)²/(σ_M² + σ_P²)) ∫_U exp(−((σ_M² + σ_P²)/(σ_M² σ_P²)) (u − (c_k + (σ_M²/σ_P²) c_L)/(1 + σ_M²/σ_P²))²) du

In practice, if σ_M = σ_P = σ, the integral can be written as

∫_U κ_{σM}(u, c_k) κ_P(u, c_L) du = exp(−(c_k − c_L)²/(2σ²)) ∫_U exp(−2(u − (c_k + c_L)/2)²/σ²) du = sqrt(πσ²/2) exp(−(c_k − c_L)²/(2σ²))
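The closed form above can be checked numerically. The sketch below compares it against simple quadrature for scalar centers, under the kernel convention κ(u, c) = exp(−(u − c)²/σ²) used in this derivation (the specific values of σ, c_k and c_L are arbitrary examples).

```python
import numpy as np

# Sketch: verify the overlap integral of two equal-width Gaussian bumps against
# the closed form sqrt(pi*sigma^2/2) * exp(-(ck - cl)^2 / (2*sigma^2)).
sigma, ck, cl = 0.7, 0.3, 1.1
u = np.linspace(-20, 20, 200001)
numeric = np.trapz(np.exp(-(u - ck)**2 / sigma**2) * np.exp(-(u - cl)**2 / sigma**2), u)
closed = np.sqrt(np.pi * sigma**2 / 2) * np.exp(-(ck - cl)**2 / (2 * sigma**2))
print(numeric, closed)   # the two values should agree closely
```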
The assumption σ_M = σ_P is reasonable. Silverman's rule [33] is a popular and widely accepted method for kernel size selection that has been used practically over a wide range of data sets, and many studies have compared other selection techniques with it [32][34][31]. The selection of the RKHS kernel size is still an open question; however, experiments have shown that the system performance is acceptable and not very sensitive to the kernel size within a range centered at the value given by Silverman's rule [1]. Furthermore, the Gaussian kernel is only an example used to illustrate our method. If one has good reason to believe that another kernel function, such as a polynomial kernel, gives a better fit with higher accuracy, the integration above should be modified accordingly.

3.2.2.3 Recursive technique

The direct calculation of the significance is time consuming, scaling as O(K²), so a recursive method is used to speed it up. According to the pruning and growing criterion, there are three cases:

1) A new center c_{K+1} is added and there is still room in the center dictionary. In this case we simply append the effect of the new sample to all current centers:

E_k(n) = E_k(n−1) + ‖α_{K+1}‖ ∫_U κ_{σM}(u, c_k) κ_P(u, c_{K+1}) du

2) If u(n) is quantized to c_{i*}, the update is

E_k(n) = E_k(n−1) + ‖α_k‖ ∫_U κ_{σM}(u, c_k) κ_P(u, c_{i*}) du                               for k ≠ i*
E_k(n) = (‖α_k + η e(n)‖ / ‖α_k‖) E_k(n−1) + ‖α_k + η e(n)‖ ∫_U κ_{σM}(u, c_k) κ_P(u, c_{i*}) du   for k = i*

3) When a center c_L is pruned, its influence must be eliminated, yielding

E_k(n) = E_k(n) − ‖α_k‖ β_L(n−1) ∫_U κ_{σM}(u, c_k) κ_P(u, c_L) du

The exploitation of recursivity reduces the computational complexity of the significance from O(K²) to O(K), which is acceptable for the majority of kernel methods. Furthermore, although this method has to memorize the influence factors of all existing centers and hence
increases the memory requirement, the cost is limited compared with the conventional KLMS algorithm.

With this recursive implementation, our significance criterion is an online cost function. Unlike batch criteria, for example those in [44]-[54], the significance measure is updated from its previous value at each step rather than recalculated by scanning all the data.

3.3 Fixed Budget Quantized Kernel Least-Mean-Square Algorithm

The proposed fixed-budget QKLMS algorithm is summarized in Algorithm 5, applied to each observation (u(1), d(1)), (u(2), d(2)), ... presented to the system.

Algorithm 5 Proposed Fixed-budget Algorithm
Initialization: forgetting factor λ, fixed budget size K, quantization threshold ε_U > 0, step size η, center dictionary C(1) = {u(1)} and coefficient vector α(1) = [η d(1)]
while {u(n), d(n)} is available do
  e(n) = d(n) − Σ_{i=1}^{K(n−1)} α_i(n−1) κ_{σM}(u(n), C_i(n−1))
  dis(u(n), C(n−1)) = min_{1≤i≤K(n−1)} ‖u(n) − C_i(n−1)‖
  i* = argmin_{1≤i≤K(n−1)} ‖u(n) − C_i(n−1)‖
  if dis(u(n), C(n−1)) ≤ ε_U then
    C(n) = C(n−1), α_{i*}(n) = α_{i*}(n−1) + η e(n)
    update the significance E^sig_k(n) according to case 2 of the recursion
  else
    if Size(C(n−1)) < K then
      C(n) = {C(n−1), u(n)}, α(n) = [α(n−1), η e(n)]; update the significance according to case 1
    else
      prune the center with the smallest significance and update the significance according to case 3, then add u(n) as above
    end if
  end if
  update the influence factors β
end while

PAGE 68

Figure3-1. Lorenzpredictiontimeseriesthathasagradualchangeatiteration2500 at=0.707,andthestepsizeissetato=0.92.Furthermore,thetrainingMSEiscalculatedbasedonthemeansquareofthepredictionerrorinarunningwindowof50samples,forvisualclarity. Inordertoverifythattheproposedmethodologykeepsarelativelygoodlearningperformanceinstationaryconditioneventhoughcentersarepruned,werstgeneratethetrainingdatausingthesecondsetofparameters.TheconvergencecurvesofthetrainingMSE(averagedover300Monte-Carlosimulations)forQKLMS(=0.653)andQKLMS-FB(=0.7,=0.96)areplottedinFig. 3-3A .AlthoughtheaccuracyofQKLMS-FBisslightlyworsethanthefullQKLMS(QKLMS-FBnalMSE: 2Forsimplicity,thesefreeparametersarexedandnottweakedduringthenon-stationaryprocedure.Therefore,wechoosetheparametersthatreachcompetitiveperformanceinallthreezones.Thisstrategyisalsoimplementedinthenextsimulationexample.3="u=isthequantizationfactor[ 13 ] 68
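For reference, the nonstationary training series can be produced along the following lines. The sketch uses simple Euler integration with the 0.01 s sampling period and an assumed initial state; this is a crude stand-in rather than the integrator actually used in the experiments.

```python
import numpy as np

def lorenz_series(n, dt=0.01, sigma_p=10.0, rho=28.0, beta=8.0/3.0, x0=(1.0, 1.0, 1.0)):
    """Euler-integrate the Lorenz equations and return the x component."""
    x, y, z = x0
    out = np.empty(n)
    for i in range(n):
        dx = sigma_p * (y - x)
        dy = -x * z + rho * x - y
        dz = x * y - beta * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        out[i] = x
    return out

# Concatenate two regimes to mimic the nonstationary experiment (lengths assumed).
series = np.concatenate([lorenz_series(2500),
                         lorenz_series(2500, sigma_p=16.0, rho=45.62, beta=4.0)])
```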

PAGE 69

Figure3-2. BiascomponentaddedtotheLorenzpredictiontimeseries -16.41dB,QKLMSnalMSE:-17.27dB),thedictionarysizedecreasesfrom36.81to26(reductionrateis29.3%).Fig. 3-3B illustratestheinuenceofthexedbudgetsizeonthenalMSE.Asexpected,performancesufferswhenthebudgetsizeissmaller,butitapproachessmoothlythatofQKLMSwhenthedictionarysizegrows. Next,wedemonstratetheperformanceofQKLMS-FBundernonstationaryconditionandcompareitwithQKLMSandQKLMS-GGAP(pruningstrategywithsignicancedenedinGeneralizedGrowingandPruningRBF).Thesimulationresultsover200Monte-CarlorunsareshowninFig. 3-4 .ThebudgetofQKLMS-FBandQKLMS-GGAPissetat20,andtheparametersofQKLMSaresettoachievealmostthesamenaldictionarysize.TheparametersarelistinTable 3-1 .Severalaprioridistributions(Gaussian,RayleighandExponential)aretestedwithQKLMS-GGAPandtheyhavesimilarperformance.InthefollowingonlytheresultsofGaussiandistributionareshown.ThedifferencebetweentheactualinputdistributionandapriorbaseddistributionleadstotheimpropereliminationofcentersinQKLMS-GGAP,andmakestheperformance 69

PAGE 70

AConvergencecurvesintermsofthetrainingMSE BTheinuenceofthexedbudgetsizeonnalMSE(QKLMS-FB) Figure3-3. PerformanceofQKLMSandQKLMS-FBinLorenztimeseriespredictionunderstationarycondition 70

PAGE 71

worseasclearlyindicatedinFig. 3-4A .BecauseQKLMS-FBdiscardstheoldcentersinH1afterthechangeatiteration2500,itobtainsbettertrackingabilityandfasterconvergencespeedinH2.Furthermore,owningtoalargerquantizedfactortoachievesimilarnaldictionarysize,theperformanceofQKLMSisworsethanQKLMS-FB.InFig. 3-5 ,theperformanceofQKLMS-FB,QKLMS-GGAP,QKLMSandKLMSarecompared.ThistimetheparametersareselectedtoyieldalmostthesamenalMSE(exceptQKLMS-GGAP,sinceitdoesn'tworkwellinnonstationaryconditions).InordertoachievethesamenalMSEofQKLMS-FB,QKLMSrequiresasmallerquantizationfactor,whichspeedsupthedictionarygrowth(seeFig. 3-5 ).Fig. 3-6 showstheinuenceofthexedbudgetsizeonthenalMSEinnonstationarydata.Inthisexample,oursimulationresultsconrmthattheQKLMS-FBbehavesmuchbetterthantheothers,achievingeitherhigheraccuracyorsmallerdictionarysize. Table3-1. ParameterssettingfordifferentalgorithmsinLorenztimeseriesprediction SameMSESamenetworksize QKLMS=0.9=1.35 QKLMS-FB=1.0=1.0=0.9=0.9 QKLMS-GGAP=1.0=1.0 3.4.2ChannelEqualization Considerthenonlinearchannelconsistingofalinearlterandamemorylessnon-linearity,asshowninFig. 3-7 .ThisnonlinearchannelmodelisnamedasWienermodelandhasbeenusedtomodeldigitalcommunicationssystemanddigitalmagneticrecordingchannels.Aseriesofbinarysignals,si2f+1,)]TJ /F4 11.955 Tf 9.3 0 Td[(1g,arethechannelinput.Atthereceiver,thesignaliscorruptedbyadditivenoisen(i).Thechannelequalizationproblemcanbeformulatedasaregressionproblem,bysettingthepairwiseinput-outputasf([r(i),r(i+1),...,r(i+l)],s(i)]TJ /F5 11.955 Tf 12.21 0 Td[(D))g,wherelisthetime-embeddinglengthandDistheequalizationtimelag.l=5andD=2aresetinthissimulation.Thenonlinearchannelmodelisdenedbyx(i)=s(i)+0.5s(i)]TJ /F4 11.955 Tf 12.32 0 Td[(1),r(i)=a(x(i))]TJ /F4 11.955 Tf 12.32 0 Td[(0.9x(i)2+n(i)),wheren(i)isa-20dBwhiteGaussiannoise.a=3.0intherst1000samples(area 71

PAGE 72

AConvergencecurvesintermsofthetrainingMSE BNetworksizeevolutioncurves Figure3-4. PerformancecomparisonforQKLMS,QKLMS-FBandQKLMS-GGAPinLorenztimeseriesprediction.Theparametersofthealgorithmsarechosensuchthattheyproducealmostthesamenalnetworksize. 72
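A sketch of how such channel data could be generated is shown below; the noise standard deviation used for "roughly −20 dB" and the handling of the very first sample are assumptions made only for illustration.

```python
import numpy as np

def wiener_channel(n, a=3.0, noise_std=0.1, rng=None):
    """Sketch of the Wiener channel: binary input, linear filter
    x(i) = s(i) + 0.5 s(i-1), memoryless nonlinearity, additive noise."""
    rng = np.random.default_rng() if rng is None else rng
    s = rng.choice([-1.0, 1.0], size=n)
    x = s + 0.5 * np.roll(s, 1)
    x[0] = s[0]                       # no s(-1) available for the first sample
    r = a * (x - 0.9 * x**2 + rng.normal(0.0, noise_std, n))
    return s, r
```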

PAGE 73

AConvergencecurvesintermsofthetrainingMSE BNetworksizeevolutioncurves Figure3-5. PerformancecomparisonforQKLMS,QKLMS-FBandKLMSinLorenztimeseriesprediction.TheparametersofthealgorithmsarechoseninawaytoproducealmostthesamenaltrainingMSE. 73

PAGE 74

Figure3-6. TheinuenceofthexedbudgetsizeonthenalMSEinLorenztimeseriesperditionundernonstationarycondition. H1)anddecreasesto2.0withdifferentchangerates(areaH12,inthispaperchangerateisdenedas1=length(H12))thenxedtoa=2.0(areaH2).Fig. 3-8 representsonerealizationofr(i)withthechangerateofbeing1/500.Inthisexample,thekernelsizeissetto=0.707,andthestepsizeto=1.0. Figure3-7. Basicstructureofanonlinearchannel 3.4.2.1Comparisonindifferentchangerateconditions WerstcomparetheperformancesofQKLMS,QKLMS-FB,QKLMS-GGAPunderdifferentchangerateconditions:abruptchange,1 500,1 2500and1 5000respectively.InTable 3-2 andTable 3-3 ,theparametersareselectedsuchthatallofthealgorithmsyieldalmostthesamenalnetworksizeornalMSEforeachcondition.Fig. 3-9 visually 74

PAGE 75

Figure3-8. Theinputofthechannelequalizationsystem showstheresultunderabruptchangecondition.ThespecicparameterssettingarelistedinTable 3-4 andTable 3-5 TheseresultsindicateagainthatQKLMS-FBissuperiorinallconditionswithbetterttingperformanceorfewercenter.InTable 3-2 ,QKLMS-FB'sperformanceisstable,whichimpliesthatithasgoodtrackingabilityunderdifferentchangerateconditions.Table 3-3 showsthatthenetworksizeofQKLMSincreaseswiththechangetimegrows,whichresultedbythedictionaryspaceoccupationofthedatainthetransitionarea.Ontheotherhand,QKLMS-FBdoesnotstrugglewiththisproblembecauseitdiscardstheoldercentersinlearning-varyingsystem.Forexample,inthecasewheretheratechangeequalsto1 500,QKLMS-FBabandonsthecentersallocatedinH1duringH2(seeFig. 3-10 )andkeepsmorerecentdata.Therefore,QKLMS-FBcouldemployarelativelysmallquantizationfactorandobtainbetteraccuracythanQKLMSinnonstationarycondition. 75

PAGE 76

AConvergencecurvesintermsofthetrainingMSE BNetworksizeevolutioncurves Figure3-9. PerformancecomparisonforQKLMS,QKLMS-FBandQKLMS-GGAPforthechannelequalizationproblemundertheconditionofabruptchange 76

PAGE 77

3.4.2.2Outlierinuence WenowverifytheperformanceofQKLMS-FBinanon-Gaussiannoisesituationanduseamix-Gaussiandensitytosimulatetheinuenceofoutliers.Thenoisedistributionis: pnoise(i)=0.95N(0,0.03)+0.05N(0.4,0.03)(3) Table 3-6 showsparameterssettingandnalMSEfordifferentalgorithmswithalmostthesamenalnetworksize(QKLMS:85.2,QKLMS-FBandQKLMS-GGAP:84).This Table3-2. ThenalMSE(dB)comparisonofdifferentalgorithmsinchannelequalization.Theparametersofthealgorithmsarechosenastoproducealmostthesamenalnetworksize. abruptchange1 5001 25001 5000 QKLMS-3.236-7.254-5.499-6.556 QKLMS-FB-20.76-18.55-17.37-18.365 QKLMS-GGAP-17.5-5.573-4.658-6.126 Table3-3. Thenalnetworksizecomparisonofdifferentalgorithmsinchannelequalization.TheparametersofthealgorithmsarechosenastoproducealmostthesamenalMSEineachcondition. abruptchange1 5001 25001 5000 QKLMS101166.5188.3190.8 QKLMS-FB84848484 Table3-4. ParameterssettinginchannelequalizationtoachievealmostthesamenalNetworksizeineachcondition abruptchange1 5001 25001 5000 QKLMS=2.7=3.2=3.4=3.4 =2.0=2.0=2.0=2.0QKLMS-FB=0.98=0.95=0.92=0.9K=84K=84K=84K=84 QKLMS-=2.0=2.0=2.0=2.0GGAPK=84K=84K=84K=84 Table3-5. ParameterssettinginchannelequalizationtoachievealmostthesameMSEineachcondition abruptchange1 5001 25001 5000 QKLMS=1.5=1.7=1.6=1.6 QKLMS-FB=2.0=2.0=2.0=2.0=0.98=0.95=0.92=0.9 77

PAGE 78

Figure3-10. ThecurveofthenumberofcentersallocatedduringH1forthechannelequalizationproblem resultindicatesthatQKLMS-FBstillobtainsgoodperformanceundertheinuenceofoutliernoise. Table3-6. Parameterssettinginchannelequalizationfordifferentalgorithms(changerateequalsto1=500) parameterssettingnalMSE(dB) QKLMS=3.3)]TJ /F18 10.909 Tf 8.48 0 Td[(5.066 QKLMS-FB=2)]TJ /F18 10.909 Tf 8.49 0 Td[(17.954 QKLMS-GGAP=0.9)]TJ /F18 10.909 Tf 8.48 0 Td[(5.046 3.4.2.3Howthethreefreeparameters,,andK,affecttheperformanceofQKLMS-FB Settingtheratechangeto1 500forexampleandxingthenetworksizebudgetat84,weplotthenaltrainingMSEversusthequantizationfactorandforgettingfactor(Fig. 3-11 ).Fromtheplotweobservethattheseparametersaffectthesystemperformanceheavily.Westilldonothaveasystematicapproachtoselecttheseparametersbasedonthedata,butcanadvancesomeintuitiveobservations:a)Underaxed-budgetlimitation,smallandlargebothresultinlowaccuracy.Atoosmall 78

PAGE 79

Figure3-11. EffectofthequantizationfactorandforgetfactoronnalMSEperformanceforthechannelequalizationproblemwithrespecttothexednetworksize. quantizationfactordoesnotresultindesirablenetwork.Thusthelimiteddictionarycannotprovideenoughdetailofthesignalstructuretoprovidegoodidentication.Ontheotherside,alargequantizationfactordecreasesthesteady-stateEMSEofthequantizationprocedure,whichalsodecreasestheperformanceofQKLMS-FB.b)Selectingtheforgettingfactordependsonhowthedatastatisticschange.Ifthestatisticalpropertychangesfrequently,asmallerforgettingfactorispreferred.Otherwise,alargeforgettingfactorbringsgoodperformance(=1instationarycondition).Asmallforgettingfactorforcesthecurrentdatatohavemoreinuenceonthesignicancemeasure.However,atoosmallforgettingfactormakethesystemsensitivetooutliers.Inconclusion,settingtheforgettingfactorisatradeoffbetweenoutlierrejectionandtrackingability. 79

PAGE 80

3.5ConclusionAndDiscussion Sincepracticalcomputingdeviceshaveonlylimitedphysicalmemoryandcomputationalcapacity,xed-budgetkernelmethodsarecrucialinrealworldapplications.Inthischapter,wepresentedanewefcientpruningandgrowingstrategyindesigningaxed-budgetQKLMSsystem.Theproposedsignicancemeasuresuccessfullyquantiestheimportanceofacenterwithrespecttotheoveralllearningsystem,andkeepsthesystemstructureunderapredenedsizeparticularlyinnonstationarycondition.ComparingwithGGAP-RBF,QKLMS-FBismorepracticalbecauseofnorequirementofaprioriknowledgeabouttheinputdistribution.Meanwhile,onlineandrecursivecalculationsresultinareasonableandefcientupdatingstrategytocarryonthecomputationinrealtime. Therearemanymoreapplicationsthatcanbenetfromthisdevelopment.Theinterestingnextquestionishowtosetthenetworksizepredenedthresholdautomatically.Anycriterionrequiressomethresholdpicking.Cross-validationisusuallyusedasadefaultmethod.Sincethenetworksizeisamoremeaningfulquantity,itmightbeeasierthanotherstochooseaccordingtopracticaltimerequirementandmemorycapability.Thesecondinterestingquestionishowtoextendsurprisemeasureintootherkernelmethods,likeKRLS.ThecoefcientsofKRLSarereachedbytheGrammatrix.ThereforetheerrorresultingfromremovingacenterisnotsostraightforwardasKLMS.TherelatedtechniquecouldrefertotheGrammatrixdimensiondecreasingmethodmentionedin[ 58 ]. 80

PAGE 81

CHAPTER4MDL-BASEDSPARCIFICATIONALGORITHMFORKERNELADAPTIVEFILTERING Toestablishagoodkerneladaptivelter,wehavetodealwiththreeimportantproblems.1)Howtondthekernelfunctionwhichreectsthesystempeculiarityasmuchaspossible.2)Howtoprovidegoodestimationaboutthefreeparameters,likestepsizeinKLMSandregurgitationterminKRLS.3)Howtoselectthestructuresize(modelorder)toobtainagoodtradeoffbetweencomputationalcomplexityandsystemperformance.Thersttwoproblemshavebeendiscussedinmanyliteratures.Thersttopiciscloselyrelatedtothetechniquesofmulti-kernellearningorlearningthekernelknowninthemachinelearningliteratures[ 62 ][ 63 ][ 64 ].However,inkerneladaptiveltering,tosimplifythecomputation,ingeneral,asinglekernelisselected,ratherthanusingacombinedorhybridkernel[ 1 ].Forthesecondtopicsomeintuitiveobservationsareutilizedtoguideparametersselection[ 1 ].Thereforemyworkfocusesonthethirdarea.Inthelastchapter,thesignicancemeasureguidesthesystemtoxthenetworksizeonapredenedthreshold.However,howtodecidethepredenedthresholdisnotclear.Furthermore,theadd-one-discard-onestrategyisnotarealnonstationarylearning.Whatweexpectedisthatthenetworksizeshouldoptimallybedictatedbytheinputdatacomplexity.Thatis,iftheinputdatacomplexityishigh,thenetworksizeshouldincreasecorrespondingly.Otherwisethenetworksizeshoulddecrease.Asapowerfultooltondthecompromisebetweensystemcomplexityandaccuracyperformance,MinimalDescriptionLength(MDL)hasbeenutilizedinseveralareas.Here,weadoptedMDLasasparsicationcriteriontoadaptivelydecidethenetworksizeofkerneladaptivelters.Generally,MDLcriterionhastwoformulations:batchmodelandonlinemodel.Mostofthetime,MDLisutilizedforbatchmodelproblems.Therefore,thischapterrstlytakestheapproximationlevelselectioninKRLS-ALDasanexampletoillustratethebatchmodelinkerneladaptivelters.ThenweproposedforthersttimeKLMSsparsicationalgorithmbasedontheonlineMDL.Owningtothisproposed 81

PAGE 82

algorithmseparatingtheinput(feature)spacewithquantizationtechniquesandutilizingaslidingwindowtochecktheMDLperformance,itiscalledQKLMS-MDL. 4.1InformationTheoryForSparsication Besidesthekerneladaptivelter,themodelselectionproblemcanbefoundinvariousareassuchasvibrationanalysis,underwateracoustics,andimagingprocessing.Severalcriteriahavebeenproposedtodealwiththisproblem.ThepioneeringworkisAkaikeInformationCriterion(AIC)[ 65 ].Afterthiswell-knownwork,BayesianInformationCriterion(BIC,whichisalsoknownastheSchwarzInformationCriterion)[ 66 ][ 67 ],MinimumDescriptionLength(MDL)[ 67 ],PredictiveMinimumDescriptionLength(PDL)[ 68 ]andtheNormalizedMaximumLikelihood[ 69 ]wereproposed.Recently,theBayesianYingYang(BYY)informationcriterion,amodicationtotheMDLforimprovingperformance,wasdeveloped[ 70 ].TheutilizationoftheMDLinourwork,inparticular,isbasedonourfamiliaritywithitandthefactthatitisrobustagainstnoiseorstochasticdistributionandrelativelyeasytoestimate.Besides,comparedwithothers,MDLhasthegreatadvantageofrelativelysmallcomputationalcosts[ 71 ][ 72 ]. MDLprinciple,rstproposedbyRissanenin[ 73 ],isrelatedtothetheoryofalgorithmiccomplexity[ 74 ].Rissanensuggestedtheissueofmodelselectionwastakenasaproblemofdatacompression.TheMDLutilizesthedescriptionlengthasameasureofthemodelsimplicityandimpliesthemodelwiththeleastdescriptionlengthisbest.ThesuperiorityofMDLhasbeenindicatedinvariousapplications.InNeuralNetworks,MDLwasadoptedtodeterminethenumberofneuronsthatmimictheunderlyingdynamicpropertyofthesystemwithabsenceofthelocalminima[ 71 ][ 75 ].Afterthat,asanimprovement,MDLcriterionwasutilizedtodirectlydeterminetheoptimalNeuralNetworkmodelandsuccessfullyappliedtopredictionproblems[ 76 ]andcontrolsystem[ 77 ].Furthermore,theembeddingdimensionofarticialneuralnetworkisdecidedbasedonconstructingaglobalmodelwithleastdescriptionlength[ 78 ].Startingfromanoverlycomplexmodelandthenpruningunneededbasisfunction 82

PAGE 83

accordingtoMDL,LenoardisandBischof[ 79 ]proposedaradicalbasisfunction(RBF)networkformulationtobalanceaccuracyperformance,trainingtimeandcomplexityofnetwork.BesidesNeuralNetworksandRBF,MDLalsohavebeensuccessfullyusedinvectorquantization[ 80 ],clustering[ 81 ][ 82 ][ 83 ],minegraphs[ 84 ][ 85 ]andsoon.Inmostcases,theMDLisusedforsupervisedlearningasapenaltytermontheerrorfunctionorasacriterionfornetworkselection[ 80 ].OneexceptionistheworkofZemelandHinton[ 86 ]whoappliedMDLtodetermineasuitableencodingofdata. TheMDLprincipleaddressesmodeldescriptionascodingdataandsuggestschoosingthemodelwhichprovidestheshortestdescriptionlength.Fromtheviewofclassicalparametricstatistics,forexample,weneedtoestimatethemodelparametersfromafamilyparameteroptions M=ff(un)j:2Rkg(4) basedontheobservationun=(u(1),...u(n)),wherekisthedimensionof.MDLisatwo-stagecodingscheme:thedescriptionlengthL(^),inbits1,fortheestimatedmember^andthedescriptionlengthL(unj^),inbits,ofdatabasedon^.Ifweassumethemachineprecisionis1=p nforeachmemberof^andtransmit^withauniformencoder,therststageisexpressedas: L(^)=2 klog2n Thischoicehasbeenshownisoptimalinregularparametricfamilies[ 87 ].Analternativechoicefortherststageisdecidingthemachineprecisionaccordingtothedistribution 1Theunitofdescriptionlengthalsocouldbenat,whichiscalculatedbyln,ratherthanlog2aswedointhispaper. 83

PAGE 84

of,whichisexplainedin[ 88 ], L(^)=kXj=1log2 j whereisaconstantrelatedtothenumberofbitsintheexponentoftheoatingpointrepresentationofjandjistheoptimalprecisionofj.AccordingtoShannonEntropyprinciple,thesecondstageofMDLareoftentakenas: L(unj^)=)]TJ /F4 11.955 Tf 11.29 0 Td[(log2f(unj^) Combiningthedescriptionlengthsfromthetwostages,weestablishthetwo-stageMDLformulation, SimpleMDL:Lmodel(n)=)]TJ /F4 11.955 Tf 11.29 0 Td[(log2f(unj^)+2 klog2n(4) MixtureMDL:Lmodel(n)=)]TJ /F4 11.955 Tf 11.29 0 Td[(log2f(unj^)+kXj=1log2 j(4) ComparingwiththesimpleMDL,themixtureMDLappliesanadaptivelydeterminedpenaltyonthesecondstage.OwningtotheapplicationcomplexityofmixtureMDL,weutilizethesimpleMDLinourwork. MDLhasrichconnectionwithothertraditionalframeworks.Obviously,minimizing)]TJ /F4 11.955 Tf 11.29 0 Td[(log2f(unj^)isequivalenttomaximizingf(unj^).Therefore,inthissense,MDLcoincideswithMaximumLikelihood(ML)[ 89 ]inparametricestimationproblem.SimilarasMDL,AICandBICalsohavetwo-stagestructure.Theonlydifferenceinformulationisthesecondterm,asshowninTab. 4-1 .Wendoutthatthesimpliedtwo-stageMDLtakesthesameformofBIC.Furthermore,MDLhasclosetietoBayesiananalysis.UnderBayesianassumption,themixtureMDLapproximatesBICasymptotically.ThereforethemixtureMDLparadigmservesasanobjectiveplatformfromwhichwecancompareBayesianandnon-Bayesianproceduresalike.ComparedwithMDLandBIC,AICcriteriontendstoselectamodelthataretoocomplexandisnotappropriateforsmalldataset.Notice,eventhoughthesecriteriasharesimilarform,thephilosophiesbehindthemaremuchdifferent[ 90 ]. 84

PAGE 85

Figure4-1. GeneralRate-distortioncurveofKRLS-ALD 4.2KRLS-ALDApproximationLevelSelection InKRLS-ALDtheapproximationleveldramaticallyinuencesthecenterdictionarysizeandsystemaccuracy.Usually,asmallerresultsinhigheraccuracybutlargercenterdictionary,whilealargerreachesloweraccuracybutsmallercenterdictionary.ByanalogywiththeRate-distortiontheoryininformationtheory,acurveabouttherelationshipbetweencenternumberofKRLS-ALDandMSEisplottedinFig. 4-1 .Generally,thiscurvecouldbesplitintothreezones:highsensitivityarea(zone1),lowsensitivityarea(zone3)andtransitionarea(zone2).Inzone1asmallincrementincenternumbergainsanoticeablyimprovementinsystemaccuracy,whereasthesystemimprovementisminimaleventhoughthecenternumberrisesinzone3.Comparedwiththese,systemsinzone2achieveagoodtradeoffbetweensystemcomplexityandaccuracyperformance.Inthissection,weproposeamethodbasedonMDLforapproximationlevelselectionthatwillputtheoperatingpointinzone2.ThistaskinvolvestheMDLobjectivefunctiondesignandanoptimizationproceduretoselectareasonablesolution. Table4-1. Thesecondtermexpressioninparticularcriterion MDLPkj=1log2 j AICk BIC2 klog2n 85
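For concreteness, the simple two-stage MDL of a candidate model can be computed as in the following sketch, which assumes zero-mean Gaussian errors with the maximum-likelihood variance plugged in (the setting used later for KRLS-ALD); k counts the free parameters of the candidate model.

```python
import numpy as np

def simple_mdl(errors, k):
    """Two-stage (simple) MDL in bits: -log2 P(e | model) + (k/2) log2(n),
    a sketch under the zero-mean Gaussian error model."""
    e = np.asarray(errors, dtype=float)
    n = len(e)
    var = np.mean(e ** 2)                                   # ML estimate of the error variance
    nll_nats = 0.5 * n * np.log(2 * np.pi * var) + 0.5 * n  # -ln P(e | model)
    return nll_nats / np.log(2) + 0.5 * k * np.log2(n)      # convert nats to bits, add penalty
```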


4.2.1 Formulation of the KRLS-ALD Approximation Level Selection Technique Based on MDL

4.2.1.1 Objective function

In KRLS-ALD, the system model is described by the centers in the center dictionary $C(n) = \{c_i\}_{i=1}^{K(n)}$ and the corresponding coefficients $\tilde{\alpha}(n)$. According to the simplified two-stage MDL, the description length of the KRLS system is given by two parts: the cost of specifying the model prediction errors and the cost of describing the free parameters $C(n)$, $\tilde{\alpha}(n)$ and the kernel size. Casting the two-stage formulation into our particular case of a KRLS-ALD network, it is expressed as

$$L_{model}(n) = -\log_2 P(e(n)\mid C(n), \tilde{\alpha}(n)) + \frac{M(n)}{2}\log_2 n$$

where $M(n)$ is the number of free parameters, including the kernel size, $\tilde{\alpha}(n)$ and $C(n)$. If the number of centers in the dictionary is much larger than the number of kernel-width parameters (one), we can approximate $M(n)/2$ by $K(n)$, yielding

$$L_{model}(n) \approx -\log_2 P(e(n)\mid C(n), \tilde{\alpha}(n)) + K(n)\log_2 n$$

Assuming the prediction errors are i.i.d. and follow a Gaussian distribution with zero mean and standard deviation $\sigma$,

$$-\log_2 P(e(n)\mid C(n), \tilde{\alpha}(n)) = \sum_{i=1}^{n} -\log_2 P(e(i)\mid C(n), \tilde{\alpha}(n)) = \frac{n}{2}\log_2 e + \log_2\left(\frac{2\pi}{n}\right)^{n/2} + \log_2\left(\sum_{i=1}^{n} e(i)^2\right)^{n/2}$$

The Gaussian assumption is reasonable in many situations and expedient in all cases. If one has good reason to believe that the distribution of the prediction errors has another form, such as a uniform distribution, the formulation above should be modified accordingly, or a density estimation method, such as the Parzen window [61], could be adopted.
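Under the Gaussian assumption, the description length of a KRLS-ALD network reduces to a simple function of the prediction errors and the dictionary size. The sketch below is one way to evaluate it; the function and argument names are placeholders, not part of the original text.

```python
import numpy as np

def krls_ald_mdl_bits(errors, dict_size):
    """Description length (bits) of a KRLS-ALD network with K = dict_size centers,
    assuming zero-mean Gaussian prediction errors over n samples."""
    e = np.asarray(errors, dtype=float)
    n = e.size
    data_bits = (0.5 * n * np.log2(np.e)
                 + 0.5 * n * np.log2(2.0 * np.pi / n)
                 + 0.5 * n * np.log2(np.sum(e ** 2)))
    model_bits = dict_size * np.log2(n)          # M(n)/2 approximated by K(n) free parameters
    return data_bits + model_bits

# A larger dictionary is only accepted if it shrinks the errors enough to pay for itself.
rng = np.random.default_rng(1)
print(krls_ald_mdl_bits(rng.normal(scale=0.20, size=500), dict_size=20))
print(krls_ald_mdl_bits(rng.normal(scale=0.05, size=500), dict_size=80))
```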


In the current work, we focus our attention on the situation where the errors follow a Gaussian distribution.

4.2.1.2 Optimization selection

In order to search for the minimum of the description length above, we use an Estimation of Distribution Algorithm (EDA) to locate an appropriate approximation level $\nu$. EDA, which belongs to the class of evolutionary algorithms, is a stochastic optimization method that builds and samples explicit probabilistic models of promising candidate solutions [91]. Fast convergence, avoidance of local optima and the capability of solving different types of problems in a robust manner are the main reasons for choosing EDA here. The general procedure of an EDA is outlined in Algorithm 6.

Algorithm 6 EDA algorithm outline
  $D(0)$: generate $N$ individuals (the initial population) randomly
  $l = 1$: iteration index
  while the stopping criterion is not met do
    for $i = 1, \ldots, N$ do
      calculate the result (fitness) of $D_i(l-1)$
    end for
    $D^{Se}(l-1)$: select $Se \le N$ individuals from $D(l-1)$ according to their fitness
    estimate the probability distribution of the selected individuals and sample $N$ new individuals $D(l)$ from it
    $l = l + 1$
  end while
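A minimal one-dimensional Gaussian EDA in the spirit of Algorithm 6 is sketched below. The fitness function is only a stand-in for the MDL description length of a KRLS-ALD network trained at a candidate approximation level, and the default parameter values are illustrative.

```python
import numpy as np

def eda_minimize(fitness, mean0=0.5, std0=0.3, N=100, Se=30, eps_eda=1e-4, max_iter=100):
    """1-D Gaussian EDA: sample N candidates, keep the Se best,
    refit the Gaussian search model, and stop when the selected set collapses."""
    rng = np.random.default_rng(0)
    mean, std = mean0, std0
    for _ in range(max_iter):
        population = rng.normal(mean, std, size=N)          # scanning population D(l)
        scores = np.array([fitness(v) for v in population])
        selected = population[np.argsort(scores)[:Se]]      # D^Se(l): the Se fittest individuals
        mean, std = selected.mean(), selected.std()         # re-estimate the search distribution
        if selected.var() < eps_eda:                        # termination criterion
            break
    return mean

# Toy fitness with a single minimum, standing in for the description length curve.
print(eda_minimize(lambda v: (v - 0.2) ** 2))
```

In the selection procedure of this chapter, the fitness of each candidate is the description length of the KRLS-ALD network obtained with that approximation level.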

To reduce the computational cost of evaluating many candidate approximation levels, the training data are first reordered in the feature space; denote the reordered data as $\Phi'(n)$. Let $\varphi(u'(1))$ equal $\varphi(u(1))$, let $\varphi(u'(2))$ be the sample with the longest distance in the feature space to $\varphi(u'(1))$ among $\Phi(n)\setminus\{\varphi(u'(1))\}$, and let $\varphi(u'(3))$ be the sample with the longest distance to the space spanned by $\varphi(u'(1))$ and $\varphi(u'(2))$, and so on. That is, $\varphi(u'(i))$, $i = 1, 2, \ldots, n$, satisfies

$$\varphi'(i) = \arg\max_{\varphi(j)\in\Phi(n)\setminus\Phi'(i-1)} \left\| \sum_{k=1}^{i-1} a_{kj}\,\varphi'(k) - \varphi(j) \right\|^{2}$$

where $\varphi'(k)$ and $\varphi(j)$ are shorthand for $\varphi(u'(k))$ and $\varphi(u(j))$ respectively, and $\Phi'(i-1) = [\varphi'(1), \ldots, \varphi'(i-1)]$. Obviously, the order of the desired data should change according to the order of the input data. Meanwhile, a vector $s$ is used to store the corresponding distance values. Next, we define a function

$$f(s_i) = \mathrm{sign}(s_i - \nu)$$

where

$$\mathrm{sign}(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{otherwise} \end{cases}$$

such that the $\tilde{\Phi}(n)$, $\tilde{K}^{-1}(n)$ and $A(n)$ corresponding to a particular $\nu$ can be expressed as

$$\tilde{\Phi}(n) = [\varphi'(1), \ldots, \varphi'(k)], \qquad k:\ f(s_k) = 1,\ f(s_{k+1}) = 0;$$

$$\tilde{K}(n)^{-1} = \left(\tilde{\Phi}(n)^{T}\tilde{\Phi}(n)\right)^{-1}$$

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1,n-1} & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2,n-1} & a_{2n} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ a_{k1} & a_{k2} & \cdots & a_{k,n-1} & a_{kn} \end{bmatrix}^{T}$$

where

$$\begin{cases} a_{ii} = 1,\ a_{ik} = 0, & \text{if } f(s_i) = 1, \\ a_i = \tilde{K}(i-1)^{-1} h(i), & \text{if } f(s_i) = 0. \end{cases}$$


Because the input data have been reordered, the form of $A(n)$ is $[I, a(k+1), \ldots, a(n)]^{T}$, where $I$ is an identity matrix. As long as $\Phi'(n)$ and $s$ are obtained, the $\tilde{\Phi}(n)$ corresponding to a series of $\nu$ values can be obtained directly. Furthermore, by sampling the $\nu$ values in ascending order within each iteration, the $\tilde{\Phi}(n)$ of a particular round is the $\tilde{\Phi}(n)$ of the previous round in the same iteration appended with some more samples. Therefore, the values of $\tilde{K}(n)^{-1}$, $A(n)$ and $\tilde{\Phi}(n)$ from the previous round serve as the initial condition of the current round. That is, the KRLS-ALD solution can be obtained recursively in each iteration, so the computational complexity decreases dramatically.

In EDA, there are three free parameters: the termination criterion, the sample number ($Se$) used to approximate the probability distribution, and the sample number ($N$) used to generate the next scanning population. In our problem, we assume the probability distribution of the candidate samples is a one-dimensional Gaussian distribution, and the termination criterion is set as follows: if the variance of $D^{Se}(l)$ is smaller than a threshold $\epsilon_{EDA}$, the search ends. Larger values of $N$ and $Se$ increase the accuracy of the probability estimation and thereby improve the system performance; meanwhile, the computational complexity of EDA is linear in $N$ and $Se$. Therefore, the selection of these two parameters is a tradeoff between computational complexity and system accuracy. In conclusion, the KRLS-ALD approximation level selection algorithm based on MDL is summarized in Algorithm 7.

This method is a batch method. In an online system, part of the training data can be chosen first to estimate an approximation level, and the result is then taken as a reference to guide the subsequent training. In practice, the particular requirements of different systems differ. For example, medical diagnosis emphasizes a low false positive rate even though the processing is not prompt; on the contrary, in most online systems timely feedback is the most important even though the results have a certain but acceptable error. Therefore, tweaking the approximation level based on the particular requirement is necessary.


Algorithm 7 The KRLS-ALD approximation level selection algorithm based on MDL
  Initialization: select the regularization parameter $\lambda > 0$, the sample numbers $N$ and $Se$ ($Se \le N$), and the EDA termination threshold $\epsilon_{EDA}$; the iterations then follow the EDA outline of Algorithm 6, using the description length of the KRLS-ALD network obtained at each candidate approximation level as the fitness.

4.2.2 Simulation

4.2.2.1 System identification

Figure 4-2. A nonlinear Wiener system

System identification is primarily responsible for determining a discrete estimate of the transfer function of an unknown digital or analog system, for example the Wiener system.

Here a signal $x \sim N(0,1)$ is fed into a nonlinear channel that consists of a linear finite impulse response (FIR) channel followed by the nonlinearity $y = \tanh(z)$, where $z$ is the linear channel output. The impulse response of the linear channel is chosen as $h = [1, 0.1050, -0.3760, -0.4284]$. Finally, white Gaussian noise with 20 dB power is added to the nonlinear output. In this online identification simulation, the input datum is a 4-tap time embedding of the signal and the desired datum is one-dimensional. At each step, the MSE performance is measured on a test set of 100 data points.

Fig. 4-3 explores the influence of the regularization term $\lambda$ in KRLS-ALD. If $\lambda$ is too small, such as 0.001, the overfitting is relieved but still exists. When $\lambda = 0.01$, the overfitting is suppressed and the system obtains a much closer representation of the underlying function. As $\lambda$ increases further, the system again has a poor fit, for example at $\lambda = 0.1$. We see that $\lambda$ in effect controls the effective complexity of the model and hence determines the degree of overfitting; generally the performance is best for some intermediate value, neither too large nor too small. Fig. 4-4 shows the network size-MSE result obtained by the proposed method. This point lies in the transition area of the rate-distortion curve. Therefore, the approximation level decided by the proposed algorithm achieves a good compromise between system complexity and accuracy.
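A sketch of the data-generating model used in this experiment is given below (FIR channel, tanh nonlinearity, additive Gaussian noise, 4-tap input embedding). The noise scaling assumes the stated 20 dB figure is a signal-to-noise ratio, and the helper name is illustrative.

```python
import numpy as np

def wiener_system_data(n_samples=1500, seed=0):
    """Nonlinear Wiener channel: x -> FIR filter h -> tanh -> additive noise."""
    rng = np.random.default_rng(seed)
    h = np.array([1.0, 0.1050, -0.3760, -0.4284])      # linear channel impulse response
    x = rng.standard_normal(n_samples)
    z = np.convolve(x, h)[:n_samples]                  # linear FIR channel output
    clean = np.tanh(z)                                 # static nonlinearity
    noise_std = np.sqrt(np.mean(clean ** 2) / 10 ** (20 / 10))   # assumed 20 dB SNR
    y = clean + noise_std * rng.standard_normal(n_samples)
    # 4-tap time embedding of the input, with the noisy output as the desired signal
    U = np.array([x[i - 3:i + 1][::-1] for i in range(3, n_samples)])
    d = y[3:]
    return U, d

U, d = wiener_system_data()
print(U.shape, d.shape)       # (1497, 4) (1497,)
```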


Figure 4-3. The influence of different regularization terms on the system identification problem

Figure 4-4. The operating point selected by the proposed MDL-based method for the system identification problem


Figure 4-5. The normalized annual sunspot data

4.2.2.2 Sunspot prediction

The Wolf annual sunspot count time series has long been a subject of interest in the physics and statistics communities and is also a benchmark data set for time series prediction. The normalized data are shown in Fig. 4-5. Tong first applied both the auto-regression model AR(9) and the self-exciting threshold auto-regression (SETAR) model to describe this time series [94]. After that, the nonlinear radial basis function (RBF) model [88] and neural networks [75] were also shown to achieve competitive predictive performance.

We first take one-step prediction as an example to show the superiority of the regularized KRLS-ALD. We build a kernel system using the annual sunspot data over the period 1700-1979, and this system is then used to predict the period 1980-2012. The problem setting for short-term prediction is as follows: the previous nine points $u(i) = [x(i-9); \ldots; x(i-1)]^{T}$ are used as the input vector to predict the current value $x(i)$, which is the desired response.
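The sketch below builds the nine-tap input vectors and one-step targets described above from a normalized series; the synthetic series in the example is only a stand-in for the Wolf sunspot numbers.

```python
import numpy as np

def one_step_dataset(series, order=9):
    """Return inputs u(i) = [x(i-order), ..., x(i-1)] and targets d(i) = x(i)
    after normalizing the series to [0, 1]."""
    x = np.asarray(series, dtype=float)
    x = (x - x.min()) / (x.max() - x.min())
    U = np.array([x[i - order:i] for i in range(order, x.size)])
    d = x[order:]
    return U, d

# Stand-in data; in the experiment the annual sunspot counts (1700-2012) are used,
# with 1700-1979 for training and 1980-2012 for testing.
U, d = one_step_dataset(np.sin(0.3 * np.arange(313)) + 1.0)
print(U.shape, d.shape)       # (304, 9) (304,)
```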


Figure 4-6. Comparison between KRLS-ALD with and without regularization for annual sunspot prediction

Three KRLS-ALD algorithms are compared: the proposed regularized KRLS-ALD, KRLS-ALD with the regularization term set equal to 0 [8] (KRLS-ALD without regularization), and KRLS-ALD that discards the samples outside of the center dictionary and ignores their influence [1] (KRLS-ALD with discarding). In the simulations below, the kernel size is set at $\sigma = 0.9$ and the regularization parameter at $\lambda = 0.001$. As shown in Fig. 4-6, the proposed KRLS-ALD with regularization is better than the others: it successfully avoids the overfitting and has higher accuracy.

Another interest is the approximation level selection in KRLS-ALD. In this simulation, we set $\epsilon_{EDA} = 0.5$, the sample number $Se$ used to approximate the distribution to 30, and the sample number $N$ used to generate the scanning population to 100. 30 Monte-Carlo simulations are run to eliminate the randomness caused by the distribution approximation. Fig. 4-7 highlights the network size-MSE point obtained by the proposed MDL-based selection strategy on the rate-distortion curve of the sunspot data. The location of this point is in the transition area of the rate-distortion curve, which is what we expected. Furthermore, this result shows that when the approximation level is relatively high and the corresponding network size is


Figure 4-7. The operating point selected by the proposed MDL-based method for annual sunspot prediction

small, the influence of the regularization term is tiny, because the linear dependency of each sample in the feature space gradually diminishes as the approximation level increases. Therefore, the performance of KRLS-ALD with and without regularization is similar. With ample simulations, we notice that the optimal solution of the MDL criterion generally falls in this range. Therefore, in practice, KRLS-ALD without regularization could be used to calculate the optimal solution, because of its simplicity and smaller computational complexity.

Besides the one-step prediction mentioned above, we test the obtained model on free-run prediction. For fairness of comparison we normalize the results in [94][88][75], and all of the results, including KRLS-ALD, are listed in Table 4-2. Table 4-2 clearly shows that the KRLS-ALD model behaves better than the others in system accuracy.

Table 4-2. Mean free-run prediction error for the annual sunspot count time series
  Model             MSE (1980-2002)
  AR(9)             0.0466*
  SETAR             0.1936*
  RBF               0.5480*
  Neural Network    0.0399*
  KRLS              0.0265
  Values marked with * are calculated from results reported in [75].

4.3 MDL-Based Quantized Kernel Least-Mean-Square Algorithm

After illustrating the batch-mode MDL in kernel adaptive filters, this section proposes an online-mode MDL algorithm for sparsification. Through combining a


simple quantization method and a significance-measure-based pruning strategy, the fixed-budget QKLMS fixes the network size at a predefined threshold. However, both of these two methods have drawbacks.

The simple quantization method: As mentioned before, even though it has been proven superior to other existing sparsification methods [13][60] in theoretical analysis and simulations, the adopted vector quantization method is not efficient. This simple online method only considers the distance measure in the input (feature) space without using the information of the prediction error [13]; it is noticeable that exploring both of them is more reasonable. Furthermore, there is a free parameter, the quantization factor, which influences the system accuracy and the network size dramatically. How to decide the value of this parameter is still an open question.

The significance-measure-based pruning strategy: The growing and pruning strategy of the last chapter can be summarized as add-one-discard-one. If a new center should be added into the center dictionary according to the quantization, an existing center with the smallest significance is discarded. Therefore, the network size and the computational complexity of the adaptive filter are fixed, not adaptively adjusted by the complexity of the true system and the input data.

As a measure of the trade-off between system accuracy (the prediction error) and complexity (network size), MDL is a good option to guide the generation of the center dictionary in an online environment. In this section, an online sparsification kernel adaptive filter, called QKLMS-MDL, is proposed. The basic idea of the proposed


algorithm is as follows. Whenever a new datum is received, the proposed algorithm calculates the description length of adding this datum as a new center and of merging it into its nearest center; the strategy with the smaller description length is adopted. Furthermore, the proposed algorithm compares the description length of discarding an existing center with that of keeping it; similarly, the strategy with the smaller description length is taken. This process is repeated until all existing centers are scanned, such that the network size can be adjusted adaptively, particularly in nonstationary conditions, while the accuracy performance remains acceptable. Besides, because QKLMS-MDL quantizes the input (feature) space according to not only the distance in the input (feature) space but also the prediction error information, it has better performance than QKLMS, which only considers the distance in the input (feature) space.

4.3.1 Objective Function

In the online MDL, a sliding window is adopted to estimate the description length. As in KRLS, the model of KLMS is described by the centers $C(n) = \{c_i\}_{i=1}^{K(n)}$ and the corresponding coefficients $a(n)$. Therefore, the objective function here is expressed as

$$L_{model}(n) = -\log_2 P(e_{L_{win}}(n)\mid C(n), a(n)) + \frac{K(n)}{2}\log_2 L_{win}$$

where $L_{win}$ is the window length and $e_{L_{win}}(n)$ denotes the prediction errors $\{e(n-L_{win}+1), \ldots, e(n)\}$. With the sliding window, QKLMS-MDL adjusts the system structure according to the latest information. KLMS belongs to the class of stochastic gradient descent algorithms and its learning accuracy varies continuously before reaching steady state. Therefore, utilizing the latest information is more reasonable than taking all history records, and gives a better estimate of the current system performance. QKLMS-MDL is a sparsification algorithm driven by the latest system performance. Assuming the system to be learned is stationary within the window, the prediction errors


follow an i.i.d. Gaussian distribution, and the objective becomes

$$L_{model}(n) = \sum_{i=n-L_{win}+1}^{n} -\log_2 P(e(i)\mid C(n), a(n)) + \frac{K(n)}{2}\log_2 L_{win} = \frac{L_{win}}{2}\log_2 e + \log_2\left(\frac{2\pi}{L_{win}}\right)^{L_{win}/2} + \log_2\left(\sum_{i=n-L_{win}+1}^{n} e(i)^2\right)^{L_{win}/2} + \frac{K(n)}{2}\log_2 L_{win}$$

4.3.2 Formulation of the MDL-Based Quantized Kernel Least-Mean-Square Algorithm

If a new datum is received, the description length costs of adding this datum into the center dictionary and of merging it into the nearest center are compared. That is, we calculate

$$\Delta L_{model,1}(n) = L_{add}(n) - L_{merge}(n) = \log_2\left(\sum_{i=n-L_{win}+1}^{n} \hat{e}(i)^2\right)^{L_{win}/2} + (K(n)+1)\log_2 L_{win} - \log_2\left(\sum_{i=n-L_{win}+1}^{n} \bar{e}(i)^2\right)^{L_{win}/2} - K(n)\log_2 L_{win} = \log_2\left(\sum_{i=n-L_{win}+1}^{n} \hat{e}(i)^2\right)^{L_{win}/2} - \log_2\left(\sum_{i=n-L_{win}+1}^{n} \bar{e}(i)^2\right)^{L_{win}/2} + \log_2 L_{win}$$

where $\hat{e}(i)$ denotes the estimated prediction error after adding a new center and $\bar{e}(i)$ is the approximated prediction error after merging. If $\Delta L_{model,1}(n) > 0$, the cost of adding a new center is larger than the cost of merging, so the new datum should be merged into its nearest center. Otherwise, a new center is added into the center dictionary.

As mentioned in Sec. 3.2.1, the output of QKLMS with $K(n-1)$ centers for a new input datum $u(n)$ is

$$y(n) = \sum_{i=1}^{K(n-1)} a_i(n-1)\,\kappa(u(n), c_i)$$


where $a_i(n-1)$ denotes the coefficient of the $i$th center at iteration $n-1$. The beauty of QKLMS is that its coefficients are linear combinations of the prediction errors. This fact leads to an easy estimate of $\hat{e}(i)$ and $\bar{e}(i)$. Assume $c_k$ is the nearest center to $u(n)$; then

$$\hat{e}(i) = e(i) - \eta\, e(n)\,\kappa(u(i), u(n))$$

$$\bar{e}(i) = e(i) - \eta\, e(n)\,\kappa(u(i), c_k)$$

Substituting these estimates into the expression for $\Delta L_{model,1}(n)$, it is straightforward to evaluate it, and the appropriate operation is then selected: adding a new center or merging the datum into its nearest center.

Similar to QKLMS, QKLMS-MDL merges a sample into its nearest center in the input (feature) space. The difference is that QKLMS-MDL quantizes the input (feature) space according to not only the input data distance but also the prediction error, which results in higher accuracy. Furthermore, the MDL criterion adaptively seeks a compromise between system accuracy and complexity, while QKLMS needs a predefined quantization factor.

Now we have solved the problem of how to grow the network efficiently. This is obviously not enough in nonstationary conditions, where older centers should be discarded. General methods utilize a nonstationarity detector, such as the likelihood ratio test [95], M-estimators [96] or spectral analysis, to check whether the true system or the input data have different statistical properties than before. If a change is detected, the learning system drops the previous information and starts learning the new model. However, the computational complexity of these detectors is high and their accuracy is not good enough. The ability of MDL to modulate the system complexity according to the data complexity inspires us to apply MDL as the criterion for adaptively discarding existing centers. After checking whether a new datum is added to the center dictionary or not, the proposed algorithm compares the description length costs of discarding an existing center and of keeping it. Then the strategy with the smaller description length is


taken. This procedure is repeated until all existing centers are scanned, such that a nonstationarity detector is not required in the proposed algorithm, because the MDL criterion adaptively adjusts the network size according to the latest information (the prediction errors and the current center dictionary).

Similarly to $\Delta L_{model,1}(n)$, the description length difference between discarding and keeping a center is

$$\Delta L_{model,2}(n) = L_{discard}(n) - L_{keep}(n) = \log_2\left(\sum_{i=n-L_{win}+1}^{n} \hat{e}(i)^2\right)^{L_{win}/2} + (K(n)-1)\log_2 L_{win} - \log_2\left(\sum_{i=n-L_{win}+1}^{n} e(i)^2\right)^{L_{win}/2} - K(n)\log_2 L_{win} = \log_2\left(\sum_{i=n-L_{win}+1}^{n} \hat{e}(i)^2\right)^{L_{win}/2} - \log_2\left(\sum_{i=n-L_{win}+1}^{n} e(i)^2\right)^{L_{win}/2} - \log_2 L_{win}$$

where $\hat{e}(i)$ now denotes the estimated prediction error after discarding an existing center, while $e(i)$ is the prediction error when keeping this center. If $\Delta L_{model,2}(n) > 0$, the cost of discarding an existing center is larger than the cost of keeping it, and the center dictionary does not change; otherwise, the existing center is discarded. Assume $c_k$ is the center being examined; then

$$\hat{e}(i) = e(i) + a_k\,\kappa(u(i), c_k)$$

Notice that as soon as an existing center is discarded, the values of $e(i)$ for the next iteration should be changed correspondingly. Let $c_k$ be the discarded center; then

$$e(i) \leftarrow e(i) + a_k\,\kappa(u(i), c_k)$$

When the system model is relatively simple, in practice, the accuracy performance plays a more important role than model simplicity; after all, system accuracy is essential. Therefore, we adopt a trick to guarantee system


accuracy. If discarding a center would drive the estimated prediction error $\hat{e}(i)$ beyond a predefined threshold $\epsilon_e$, the corresponding center is kept in the center dictionary no matter what the value of $\Delta L_{model,2}(n)$ is. This handling prevents the system from decreasing the network size while the accuracy is unacceptable, even though the description length of the system would get smaller.

Algorithm 8 and Table 4-3 give a summary of the proposed QKLMS-MDL algorithm and its computational complexity.

Algorithm 8 QKLMS-MDL algorithm
  Initialization: window length $L_{win}$, MSE threshold $\epsilon_e$, center dictionary $C(1) = \{u(1)\}$ and coefficient vector $a(1) = [\eta d(1)]$;
  Computing:
  while $\{u(n), d(n)\}$ is available do
    $e(n) = d(n) - \sum_{i=1}^{K(n-1)} a_i(n-1)\,\kappa(u(n), C_i(n-1))$;
    calculate $\Delta L_{model,1}(n)$;
    if $\Delta L_{model,1}(n) < 0$ then
      % add a new center
      $C(n) = \{C(n-1), u(n)\}$, $a(n) = [a(n-1), \eta e(n)]$;
    else
      % merge into the nearest center
      $i^{*} = \arg\min_{1\le i\le K(n-1)} \| u(n) - C_i(n-1) \|$;
      $C(n) = C(n-1)$, $a_{i^*}(n) = a_{i^*}(n-1) + \eta e(n)$;
    end if
    % check whether existing centers should be discarded
    while $c_k \in C(n)$ do
      calculate $\Delta L_{model,2}(n)$;
      if $\Delta L_{model,2}(n) > 0$ or the estimated errors $\hat{e}$ exceed $\epsilon_e$ then
        $C(n)$ does not change;
      else
        $C(n) = \{C(n)\setminus c_k\}$; update the prediction errors $e(n-L_{win}+1), \ldots, e(n)$;
      end if
    end while
  end while

Table 4-3. Computational costs per iteration
  Deciding whether a new center should be added: $O(L_{win})$
  Scanning all existing centers and deciding whether an existing center should be discarded: $O(K L_{win})$

Even though the procedure of discarding existing centers is time-consuming, there is no need to perform it at every iteration. Assuming the system is locally stationary in most situations, the discarding procedure can be implemented periodically at a desired time interval.
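The following compact Python sketch puts the pieces of Algorithm 8 together (sliding error window, add-vs-merge test, discard-vs-keep scan). It is a simplified illustration under the Gaussian-window assumptions above: the Gaussian kernel, the default parameter values and the accuracy guard are illustrative choices, not the exact implementation used in the experiments.

```python
import numpy as np

def gauss_kernel(a, b, sigma=1.0):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.exp(-np.sum((a - b) ** 2) / (2.0 * sigma ** 2))

class QKLMS_MDL:
    """Sketch of an MDL-guided quantized KLMS update with a sliding error window."""

    def __init__(self, eta=0.5, sigma=1.0, Lwin=100, err_thresh=0.1):
        self.eta, self.sigma, self.Lwin, self.err_thresh = eta, sigma, Lwin, err_thresh
        self.centers, self.coeffs = [], []
        self.win_u, self.win_e = [], []          # recent inputs and their prediction errors

    def predict(self, u):
        return sum(a * gauss_kernel(u, c, self.sigma)
                   for a, c in zip(self.coeffs, self.centers))

    @staticmethod
    def _bits(errors):                           # data-coding term: log2 (sum e^2)^(Lwin/2)
        return 0.5 * len(errors) * np.log2(np.sum(np.square(errors)) + 1e-12)

    def update(self, u, d):
        u = np.asarray(u, dtype=float)
        e = d - self.predict(u)
        self.win_u.append(u); self.win_e.append(e)
        if len(self.win_e) > self.Lwin:
            self.win_u.pop(0); self.win_e.pop(0)

        if not self.centers:                     # first sample: always add
            self.centers.append(u); self.coeffs.append(self.eta * d)
            return e

        # --- add a new center vs. merge into the nearest center ---------------
        k = int(np.argmin([np.linalg.norm(u - c) for c in self.centers]))
        e_add = [ei - self.eta * e * gauss_kernel(ui, u, self.sigma)
                 for ui, ei in zip(self.win_u, self.win_e)]
        e_merge = [ei - self.eta * e * gauss_kernel(ui, self.centers[k], self.sigma)
                   for ui, ei in zip(self.win_u, self.win_e)]
        dL1 = self._bits(e_add) - self._bits(e_merge) + np.log2(self.Lwin)
        if dL1 < 0:                              # adding pays for the extra center
            self.centers.append(u); self.coeffs.append(self.eta * e)
            self.win_e = e_add
        else:                                    # merging is cheaper
            self.coeffs[k] += self.eta * e
            self.win_e = e_merge

        # --- discard-vs-keep scan over the existing centers -------------------
        j = 0
        while j < len(self.centers):
            e_drop = [ei + self.coeffs[j] * gauss_kernel(ui, self.centers[j], self.sigma)
                      for ui, ei in zip(self.win_u, self.win_e)]
            dL2 = self._bits(e_drop) - self._bits(self.win_e) - np.log2(self.Lwin)
            keep = dL2 > 0 or max(abs(x) for x in e_drop) > self.err_thresh
            if keep:
                j += 1
            else:                                # discard and update the window errors
                self.centers.pop(j); self.coeffs.pop(j)
                self.win_e = e_drop
        return e
```

In a run over streaming pairs {u(n), d(n)}, each call to update returns the a-priori error while the center dictionary grows, merges or shrinks as the description length dictates.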


In this work, there is one free parameter, $L_{win}$, which adjusts the compromise between system accuracy and network size. The larger the value of $L_{win}$, the more emphasis is placed on system accuracy relative to model complexity. Therefore, a larger $L_{win}$ results in a relatively more complex model and a smaller $L_{win}$ yields a simpler model.

4.3.3 Simulation

In this section, we test QKLMS-MDL in two signal processing applications. We begin by exploring the behavior of QKLMS-MDL in a simple nonstationary system identification problem. Next, we move to a real-data time-series prediction problem.

4.3.3.1 System identification

In this section, the underlying dynamic system is governed by

$$z(n) = \frac{a(n)\,z(n-1)\,z(n-2)\,z(n-3)\,x(n-2)\,(z(n-3) - 1) + x(n-1)}{1 + z(n-2)^{2} + z(n-3)^{2}}$$

where the system input $x(n)$ is a random signal distributed as $N(\mu, 1)$ and $a(n)$ is a time-varying system coefficient. The noisy system output is given by $y(n) = z(n) + v(n)$, where the noise is Gaussian distributed with zero mean and standard deviation 0.1. We change the value of $a(n)$ and the mean of $x(n)$ over time to simulate a nonstationary scenario: $a = 1.0$ in the first 1000 samples (area H1), then decreases to 0.5 with different change rates (area H12), and is then fixed at $a = 0.5$ (area H2). Similarly, the mean of $x(n)$ is set to 0 in area H1 and then linearly increases to 0.8, as shown in Fig. 4-8. The problem setting for system identification is as follows: the input vector of the kernel filter is

$$u(n) = [y(n-1), y(n-2), y(n-3), x(n-1), x(n-2)]^{T}$$

and the corresponding desired signal is $y(n)$.
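A sketch of this data generator is shown below; the transition profiles are taken as the linear ramps of Fig. 4-8, and the helper name and the default lengths outside the transition region are illustrative.

```python
import numpy as np

def nonstationary_system(n=3000, trans_len=1000, seed=0):
    """Generate the nonstationary benchmark: a(n) ramps from 1.0 to 0.5 and the
    input mean ramps from 0 to 0.8 across a transition region of length trans_len."""
    rng = np.random.default_rng(seed)
    tail = n - 1000 - trans_len
    a = np.concatenate([np.ones(1000), np.linspace(1.0, 0.5, trans_len), np.full(tail, 0.5)])
    mu = np.concatenate([np.zeros(1000), np.linspace(0.0, 0.8, trans_len), np.full(tail, 0.8)])
    x = rng.normal(mu, 1.0)
    z = np.zeros(n)
    for i in range(3, n):
        num = a[i] * z[i-1] * z[i-2] * z[i-3] * x[i-2] * (z[i-3] - 1.0) + x[i-1]
        z[i] = num / (1.0 + z[i-2] ** 2 + z[i-3] ** 2)
    y = z + 0.1 * rng.standard_normal(n)          # noisy observed output
    U = np.array([[y[i-1], y[i-2], y[i-3], x[i-1], x[i-2]] for i in range(3, n)])
    return U, y[3:]

U, d = nonstationary_system()
print(U.shape, d.shape)        # (2997, 5) (2997,)
```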


In this section, the kernel size and the step size are set to $\sigma = 1.0$ and $\eta = 1.0$ respectively. The training MSE is calculated as the mean of the prediction error over a running window of 100 samples, for visual clarity.

Figure 4-8. The evolution curves of the system parameters: A) the system coefficient a; B) the mean of x

First, we compare the performance of QKLMS and QKLMS-MDL when the true system or the input data change slowly, for example when the length of the transition area H12 is 1000. The quantization factor of QKLMS is set to 0.75 and the window length of QKLMS-MDL is 100. The simulation results over 200 Monte-Carlo runs are shown in Fig. 4-9. As shown in Fig. 4-9, the network size of QKLMS-MDL is much smaller than that of QKLMS while their training MSEs are almost the same. Furthermore, QKLMS-MDL adjusts the network size adaptively according to the true system


complexity. For example, after iteration 2000 the network size decreases gradually along with the reduction in the true system complexity: useless older centers are discarded and new important centers are added to the center dictionary. On the contrary, QKLMS keeps all existing centers, which results in the network size growing ever larger.

Then, to verify the superiority of QKLMS-MDL, Table 4-4 lists the final network size of QKLMS and QKLMS-MDL for different transition-length conditions. The parameter settings are shown in Table 4-5, where all parameters are selected such that both algorithms yield similar MSE. The final network size of QKLMS-MDL is noticeably smaller than that of QKLMS. Furthermore, in the different conditions the final network sizes of QKLMS-MDL are similar to each other. This fact also testifies that the network size chosen by QKLMS-MDL is decided by the complexity of the current input data, no matter how complex the data were before.

Table 4-4. Final network size comparison of QKLMS and QKLMS-MDL in the system identification problem
  Change rate      abrupt change     1/200             1/500             1/5000
  QKLMS            118.4 ± 4.22      119.04 ± 4.00     95.08 ± 3.47      124.37 ± 3.52
  QKLMS-MDL        12.77 ± 7.55      14.59 ± 9.07      14.89 ± 9.19      14.23 ± 10.85
  All of these results are reported in the form average ± standard deviation.

Table 4-5. Parameter settings used in the system identification problem for each condition
  Change rate      abrupt change     1/200             1/500             1/5000
  QKLMS            ε = 0.65          ε = 0.75          ε = 0.7           ε = 0.65
  QKLMS-MDL        Lwin = 100        Lwin = 100        Lwin = 100        Lwin = 100

4.3.3.2 Santa Fe time-series prediction

In this experiment, the chaotic laser time series is from the Santa Fe time series competition (data set A²) [97]. This time series is particularly difficult to predict, due

²http://www-psych.stanford.edu/andreas/Time-Series/SantaFe.html


Figure 4-9. Performance comparison of QKLMS and QKLMS-MDL in the system identification problem with transition area length 1000: A) convergence curves in terms of training MSE; B) network size evolution curves


Figure 4-10. Santa Fe time series data

both to its chaotic dynamics and to the fact that only three intensity collapses occur in the training set [8], as shown in Fig. 4-10. After normalizing all data to lie in the range [0, 1], we consider the simple one-step prediction problem. With a cross-validation test, the problem setting for short-term prediction is as follows: the previous 40 points $x(i) = [u(i-40), \ldots, u(i-1)]^{T}$ are used as the input vector to predict the current value $u(i)$, which is the desired response.

First we verify the superiority of QKLMS-MDL. Besides quantization techniques, the novelty criterion and the surprise criterion (which includes the ALD and variance criteria as special cases) are two typical sparsification criteria. In the following, we use KLMS-NC and KLMS-SC to denote, respectively, the novelty and surprise criterion based sparsified KLMS. In the simulations below the free parameters related to KLMS are set as follows: the Gaussian kernel with kernel width equal to 0.6 is selected


and the step size is set to 0.9. The performance comparisons between these algorithms are presented in Fig. 4-11 and Fig. 4-12. In Fig. 4-11, the parameters of the four sparse algorithms are chosen such that the algorithms yield almost the same maximum network size, while in Fig. 4-12 the parameters are chosen such that they produce almost the same training MSE at the final stage. Table 4-6 lists the specific parameter settings. The simulation results clearly indicate that QKLMS-MDL exhibits much better performance, i.e. it achieves either a much smaller network size or a much smaller training MSE than QKLMS, KLMS-NC and KLMS-SC. Owing to the large distance threshold, most of the useful information is discarded by KLMS-NC, which results in its learning curve not decreasing after 190 iterations, as shown in Fig. 4-11A. Compared with the other methods, the surprise criterion has 3 free parameters and the system performance is sensitive to their values. Different from QKLMS, QKLMS-MDL takes the prediction error into account in the quantization and, consequently, has better performance. More important is that the network size of QKLMS-MDL varies. For example, as shown in Fig. 4-11, the network size of QKLMS-MDL between time index 300 and 400 is smaller than before because the input data complexity in this range is simpler.

Table 4-6. Parameter settings for the different algorithms in time series prediction
                        QKLMS        MDL-QKLMS     NC-KLMS                   SC-KLMS
  Same network size     ε = 1.97     Lwin = 150    δ1 = 1.38, δ2 = 0.001     λ = 0.005, T1 = 90, T2 = -0.085
  Same final MSE        ε = 1.54     Lwin = 150    δ1 = 0.85, δ2 = 0.001     λ = 0.005, T1 = 300, T2 = -1.6
  ε: quantization factor; δ1: distance threshold; δ2: error threshold; λ: regularization parameter; T1: upper threshold of the surprise measure; T2: lower threshold of the surprise measure.

Then we investigate how the window length affects the system performance. The effects of the window length on the final training MSE and the final network size are shown in Fig. 4-13 and Fig. 4-14 respectively. As shown in Fig. 4-13, as the window length increases, the final training MSE of KLMS-MDL gets closer to that of KLMS (for


Figure 4-11. Performance comparison in Santa Fe time series prediction, with the parameters of the algorithms chosen such that they produce almost the same maximum network size: A) convergence curves in terms of the training MSE; B) network size evolution curves


Figure 4-12. Performance comparison in Santa Fe time series prediction, with the parameters of the algorithms chosen such that they produce almost the same training MSE at the final stage: A) convergence curves in terms of the testing MSE; B) network size evolution curves


Figure 4-13. Effect of the window length on the final testing MSE for the Santa Fe time series data

comparison, the final training MSE of KLMS is also plotted in the figure). Fig. 4-14 indicates that increasing the window length causes the final network size to rise; fortunately, this growth trend gradually weakens, so that the network size stays within a range no matter how long the window is. In this sense the window length is a compromise between system accuracy and network complexity.

4.4 Conclusion

How to choose an efficient kernel model that trades off computational complexity against system accuracy is always an interesting topic. Based on the popular and powerful MDL criterion, two sparsification algorithms for kernel adaptive filters are proposed. One is a guide for approximation level selection in KRLS-ALD; the other establishes an adaptively adjusted QKLMS model. Experiments show that the first algorithm points out a good compromise between system complexity and accuracy performance and its result provides a good reference for approximation level selection. Furthermore, QKLMS-MDL successfully adjusts the network size according to the


Figure 4-14. Effect of the window length on the final center number for the Santa Fe time series data

input data complexity while keeping the accuracy in an acceptable range. This property is very useful in nonstationary conditions, where other existing sparsification methods keep growing the network size. Fewer free parameters and the online mode make QKLMS-MDL practical for real-time applications.


CHAPTER 5
CONCLUSION AND FUTURE WORK

5.1 Conclusion and Discussion

In this work, we have presented three algorithms, namely KMC, KLMS-FB and MDL-based sparsification, to improve the accuracy and computational complexity of kernel adaptive filters.

First, the MCC criterion is introduced to KLMS to improve the system accuracy in nonlinear and non-Gaussian conditions. The proposed KMC algorithm has higher accuracy than kernel adaptive filters with the traditional MSE cost function. Using the current value to approximate the expectation of correntropy, KMC has a small computational complexity of O(N), where N is the number of processed data. This work also presents a well-posedness analysis of KMC from the steady-state analysis perspective. Since kernel methods are usually formulated in a very high dimensional space, the well-posedness analysis in the case of finite training data is crucial. Furthermore, the kernel width in correntropy has a strong influence on system performance. To deal with this problem, we formulate an intuitive selection method using the kurtosis of the prediction error.

Second, we proposed two algorithms to establish a compact structure for kernel adaptive filters. As mentioned, the model complexity of existing kernel adaptive filters grows linearly with the number of processed data, hindering online applications, particularly in continuous scenarios. Even though there are several existing sparsification algorithms to curb the network size, the network size still keeps growing. Therefore, the KLMS-FB algorithm is proposed to fix the network size at a predefined threshold. When a new center is added, KLMS-FB discards the existing center with the smallest significance measure. The significance measure can be updated recursively at each step, which makes the KLMS-FB methodology online. However, the fixed-budget approach is not sufficient in nonstationary conditions. How to select the network size


threshold is still an open question. Besides, the network size does not vary with the input data complexity. Therefore, the MDL-based sparsification algorithms are proposed to adaptively update the model structure according to the input data. The approximation level selection in KRLS-ALD and QKLMS-MDL are examples illustrating the batch-mode and online-mode MDL respectively. With MDL, kernel adaptive filters can search for a balance between system complexity and accuracy. Besides, the network size of QKLMS-MDL is adaptively adjusted by the input data complexity, especially in nonstationary conditions. However, if the input data complexity changes abruptly, QKLMS-MDL needs time to adjust the model complexity. Therefore, the adjustment ability of QKLMS-MDL has hysteresis, making QKLMS-MDL not suitable for systems with nonstop abrupt changes. In most cases the system is locally stationary, so QKLMS-MDL can efficiently update the model structure.

5.2 Future Work

We have presented elegant theories to improve kernel adaptive filtering where further work is possible, either in practical applications or in improving their practical implementation. Hence the future work will address these main issues.

We believe that many applications will benefit from QKLMS-MDL. For example, speech signal processing is an extremely interesting area. Nonstationarity is in the nature of speech signals and, owing to the high sampling frequency, the number of signal samples is always relatively large. Therefore, speech processing systems should be good applications for QKLMS-MDL. Besides speech processing, point-process data analysis is also a suitable application. Point processes are frequently used as models for random events in time, such as the arrival of impulses in a neuron. Kernel methods are especially useful for this kind of data, which does not live in a Euclidean space. Because neural signals are large in scale and most of the applications are online, computational complexity is an important consideration. With QKLMS-MDL, systems such as neural decoders could have better online tracking ability.


As described in Sec. 4.3, in nonstationary conditions the QKLMS-MDL algorithm forgets the previous learning results and re-learns the input-output mapping when the system switches to a new state. This strategy keeps the system model in a relatively simple structure. But when a previous state appears again, QKLMS-MDL relearns everything from scratch and the former learning information is wasted. Is it possible to store the learning results of all previous states and take advantage of this information to improve system performance? It will be an interesting topic.


REFERENCES

[1] W. Liu, J. Príncipe, and S. Haykin, Kernel Adaptive Filtering: A Comprehensive Introduction, Wiley, 2010.
[2] V. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, 1995.
[3] F. Girosi, M. Jones, and T. Poggio, "Regularization theory and neural networks architectures," Neural Computation, vol. 7, pp. 219, 1995.
[4] B. Schölkopf, A. Smola, and K. Müller, "Nonlinear component analysis as a kernel eigenvalue problem," Neural Computation, vol. 10, no. 5, pp. 1299, 1998.
[5] S. Mika, G. Rätsch, J. Weston, B. Schölkopf, and K. Müllers, "Fisher discriminant analysis with kernels," Neural Networks for Signal Processing IX, 1999. Proceedings of the 1999 IEEE Signal Processing Society Workshop, pp. 41, 1999.
[6] W. Liu and J. Príncipe, "Kernel affine projection algorithms," EURASIP Journal on Advances in Signal Processing, vol. 2008, pp. 1, 2008.
[7] W. Liu, P. Pokharel, and J. Príncipe, "The kernel least mean square algorithm," IEEE Transactions on Signal Processing, vol. 56, pp. 543, 2008.
[8] Y. Engel, S. Mannor, and R. Meir, "The kernel recursive least-squares algorithm," IEEE Transactions on Signal Processing, vol. 52, no. 8, pp. 2275, 2004.
[9] I. Steinwart, "On the influence of the kernel on the consistency of support vector machines," Journal of Machine Learning Research, vol. 2, pp. 67, 2001.
[10] J. Platt, "A resource-allocating network for function interpolation," Neural Computation, vol. 3, pp. 213, 1991.
[11] L. Csató and M. Opper, "Sparse on-line Gaussian processes," Neural Computation, vol. 14, no. 3, pp. 641, 2002.
[12] W. Liu, I. Park, and J. Príncipe, "An information theoretic approach of designing sparse kernel adaptive filters," Neural Networks, IEEE Transactions on, vol. 20, pp. 1950, 2009.
[13] B. Chen, S. Zhao, P. Zhu, and J. Príncipe, "Quantized kernel least mean square algorithm," Neural Networks and Learning Systems, IEEE Transactions on, vol. 23, no. 1, pp. 22, 2012.
[14] S. Pei and C. Tseng, "Least mean p-power error criterion for adaptive FIR filter," Selected Areas in Communications, IEEE Journal on, vol. 12, no. 9, pp. 1540, 1994.
[15] J. Príncipe, Information Theoretic Learning, Springer, 2010.


[16] E. Walach and B. Widrow, "The least mean fourth (LMF) adaptive algorithm and its family," Information Theory, IEEE Transactions on, vol. 30, no. 2, pp. 275, 1984.
[17] A. Zerguine, M. Moinuddin, and S. Imam, "A noise constrained least mean fourth (NCLMF) adaptive algorithm," Signal Processing, vol. 91, no. 1, pp. 136, 2011.
[18] A. Barros, J. Príncipe, Y. Takeuchi, and N. Ohnishi, "Using non-linear even functions for error minimization in adaptive filters," Neurocomputing, vol. 70, no. 1, pp. 9, 2006.
[19] B. Chen, J. Hu, Y. Zhu, and Z. Sun, "Information theoretic interpretation of error criteria," Acta Automatica Sinica, vol. 35, no. 10, pp. 1302, 2009.
[20] W. Liu, P. Pokharel, and J. Príncipe, "Correntropy: Properties and applications in non-Gaussian signal processing," Signal Processing, IEEE Transactions on, vol. 55, no. 11, pp. 5286, 2007.
[21] J. Xu, P. Pokharel, A. Paiva, and J. Príncipe, "Nonlinear component analysis based on correntropy," in Neural Networks, 2006. International Joint Conference on, 2006, pp. 1851.
[22] A. Gunduz and J. Príncipe, "Correntropy as a novel measure for nonlinearity tests," Signal Processing, vol. 89, no. 1, pp. 14, 2009.
[23] K. Jeong and J. Príncipe, "The correntropy MACE filter for image recognition," in Machine Learning for Signal Processing, 2006. Proceedings of the 2006 16th IEEE Signal Processing Society Workshop on, 2006, pp. 9.
[24] I. Park and J. Príncipe, "Correntropy based Granger causality," in Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on, 2008, pp. 3605.
[25] A. Singh and J. Príncipe, "A loss function for classification based on a robust similarity metric," in Neural Networks (IJCNN), The 2010 International Joint Conference on, 2010, pp. 1.
[26] A. Singh and J. Príncipe, "Using correntropy as a cost function in linear adaptive filters," in Neural Networks, 2009. IJCNN 2009. International Joint Conference on, 2009, pp. 2950.
[27] I. Santamaría, P. Pokharel, and J. Príncipe, "Generalized correlation function: Definition, properties and application to blind equalization," Signal Processing, IEEE Transactions on, vol. 54, no. 6, pp. 2187, 2006.
[28] C. Micchelli, Y. Xu, and H. Zhang, "Universal kernels," The Journal of Machine Learning Research, vol. 7, pp. 2651-2667, 2006.


[29] B. Lin, R. He, X. Wang, and B. Wang, "The steady-state mean-square error analysis for least mean p-order algorithm," Signal Processing Letters, IEEE, vol. 16, no. 3, pp. 176, 2009.
[30] S. Zhao, B. Chen, and J. Príncipe, "Kernel adaptive filtering with maximum correntropy criterion," in Neural Networks (IJCNN), The 2011 International Joint Conference on, 2011, pp. 2012-2017.
[31] A. Singh and J. Príncipe, "Information theoretic learning with adaptive kernels," Signal Processing, vol. 91, no. 2, pp. 203, 2011.
[32] M. Jones, J. Marron, and S. Sheather, "A brief survey of bandwidth selection for density estimation," Journal of the American Statistical Association, vol. 91, pp. 401, 1996.
[33] B. Silverman, Density Estimation for Statistics and Data Analysis, Chapman and Hall, London, 1986.
[34] B. Kim and J. Marron, "Asymptotically best bandwidth selectors in kernel density estimation," Statistics and Probability Letters, vol. 19, pp. 119, 1994.
[35] A. Bowman, "An alternative method of cross-validation for the smoothing of density estimates," Biometrika, vol. 71, pp. 353, 1984.
[36] D. Scott and G. Terrell, "Biased and unbiased cross-validation in density estimation," Journal of the American Statistical Association, vol. 82, pp. 1131, 1987.
[37] A. Paiva and J. Príncipe, "A fixed point update for kernel width adaptation in information theoretic criteria," in 2010 IEEE International Workshop on Machine Learning for Signal Processing, 2010, pp. 262.
[38] D. Middleton, "Statistical-physical models of electromagnetic interference," Electromagnetic Compatibility, IEEE Transactions on, no. 3, pp. 106, 1977.
[39] J. Kivinen, A. Smola, and R. Williamson, "Online learning with kernels," IEEE Transactions on Signal Processing, vol. 52, pp. 2165, 2004.
[40] C. Richard, J. C. M. Bermudez, and P. Honeine, "Online prediction of time series data with kernels," IEEE Transactions on Signal Processing, vol. 57, no. 3, pp. 1058, 2009.
[41] O. Dekel, S. S. Shwartz, and Y. Singer, "The forgetron: A kernel-based perceptron on a fixed budget," in Advances in Neural Information Processing Systems 18, pp. 1342, 2006.
[42] J. Hertz, A. Krogh, and R. G. Palmer, Introduction to the Theory of Neural Computation, Addison-Wesley, 1991.


[43] Y. Cheng and C. Lin, "A learning algorithm for radial basis function networks: with the capability of adding and pruning neurons," in Proceedings of IEEE International Conference on Neural Networks, 1994, vol. 2, pp. 797.
[44] S. Chen, C. F. N. Cowan, and P. M. Grant, "Orthogonal least squares learning algorithm for radial basis function networks," IEEE Transactions on Neural Networks, vol. 2, no. 2, pp. 302, 1991.
[45] M. Salmerón, J. Ortega, C. Puntonet, and A. Prieto, "Improved RAN sequential prediction using orthogonal techniques," Neurocomputing, vol. 41, pp. 152, 2001.
[46] K. Singer, "Online classification on a budget," in Advances in Neural Information Processing Systems 16: Proceedings of the 2003 Conference, MIT Press, 2004, vol. 16, p. 225.
[47] Y. Le Cun, J. S. Denker, and S. A. Solla, "Optimal brain damage," Advances in Neural Information Processing Systems, 1990.
[48] B. Hassibi, D. Stork, and G. Wolff, "Optimal brain surgeon and general network pruning," in IEEE International Conference on Neural Networks, CA, USA, Mar 1993, pp. 293.
[49] B. De Kruif and T. De Vries, "Pruning error minimization in least squares support vector machines," Neural Networks, IEEE Transactions on, vol. 14, pp. 696, May 2003.
[50] Y. Lu, N. Sundararajan, and P. Saratchandran, "A sequential learning scheme for function approximation using minimal radial basis function neural networks," Neural Computation, vol. 2, no. 2, pp. 461, 1997.
[51] L. Yingwei, N. Sundararajan, and P. Saratchandran, "Performance evaluation of a sequential minimal radial basis function (RBF) neural network learning algorithm," Neural Networks, IEEE Transactions on, vol. 9, no. 2, pp. 308, 1998.
[52] J. Weston, A. Bordes, and L. Bottou, "Online (and offline) on an even tighter budget," in Proceedings of International Workshop on Artificial Intelligence and Statistics, 2005, pp. 413.
[53] Z. Wang and S. Vucetic, "Tighter perceptron with improved dual use of cached data for model representation and validation," in Neural Networks, 2009. IJCNN 2009. International Joint Conference on, IEEE, 2009, pp. 3297.
[54] I. Rojas, H. Pomares, J. L. Bernier, J. Ortega, B. Pino, F. J. Pelayo, and A. Prieto, "Time series analysis using normalized PG-RBF network with regression weights," Neurocomputing, vol. 42, pp. 267, 2002.
[55] G. Huang, P. Saratchandran, and N. Sundararajan, "A generalized growing and pruning RBF (GGAP-RBF) neural network for function approximation," Neural Networks, IEEE Transactions on, vol. 16, no. 1, pp. 57, 2005.


[55] G.huang,P.Saratchandran,andN.Sundararajan,Ageneralizedgrowingandpruningrbf(ggap-rbf)neuralnetworkforfunctionapproximation,NeuralNetworks,IEEETransactionson,vol.16,no.1,pp.57,2005. [56] S.V.Vaerenbergh,J.Via,andI.Santamana,Asliding-windowkernelrlsalgorithmanditsapplicationtononlinearchannelidentication,inAcoustics,SpeechandSignalProcessing,2006.ICASSP2006Proceedings.2006IEEEInternationalConferenceon,2006,pp.789. [57] S.V.Vaerenbergh,J.Via,andI.Santamaria,Nonlinearsystemidenticationusinganewsliding-windowkernelrlsalgorithm,inJournalofComm.,2007,vol.2,pp.1. [58] S.V.Vaerenbergh,I.Santamaria,W.Liu,andJ.C.Prncipe,Fixed-budgetkernelrecursiveleast-squares,inAcousticsSpeechandSignalProcessing(ICASSP),2010IEEEInternationalConferenceon,2010,pp.1882. [59] M.Lazaro-Gredilla,S.VanVaerenbergh,andI.Santamara,Abayesianapproachtotrackingwithkernelrecursiveleast-squares,inMachineLearningforSignalProcessing(MLSP),2011IEEEInternationalWorkshopon,2011,pp.1. [60] B.Chen,S.Zhao,P.Zhu,andJ.Prncipe,Quantizedkernelrecursiveleastsquaresalgorithm,inAcceptedbyIEEETransactionsonNeuralNetworksandLearningSystems.IEEE,2012. [61] E.Parzen,Onestimationofaprobabilitydensityfunctionandmode,Theannalsofmathematicalstatistics,vol.33,no.3,pp.1065,1962. [62] M.GonenandE.Alpaydn,Multiplekernellearningalgorithms,JournalofMachineLearningResearch,vol.12,pp.2211,2011. [63] M.Herbster,Relativelossboundsandpolynomial-timepredictionsforthek-lms-netalgorithm,inAlgorithmicLearningTheory.Springer,2004,pp.309. [64] C.Ong,A.Smola,andB.Williamson,Learningthekernelwithhyperkernels,JournalofMachineLearningResearch,vol.6,pp.1045,2005. [65] H.Akaike,Anewlookatthestatisticalmodelidentication,AutomaticControl,IEEETransactionson,vol.19,no.6,pp.716,1974. [66] G.Schwarz,Estimatingthedimensionofamodel,Theannalsofstatistics,vol.6,no.2,pp.461,1978. [67] A.Barron,J.Rissanen,andB.Yu,Theminimumdescriptionlengthprincipleincodingandmodeling,InformationTheory,IEEETransactionson,vol.44,no.6,pp.2743,1998. [68] J.Rissanen,Universalcoding,information,prediction,andestimation,InformationTheory,IEEETransactionson,vol.30,no.4,pp.629,1984. 119


[69] J. Rissanen, "MDL denoising," Information Theory, IEEE Transactions on, vol. 46, no. 7, pp. 2537, 2000.
[70] L. Xu, "Bayesian Ying-Yang learning (II): A new mechanism for model selection and regularization," 2004.
[71] Z. Yi and M. Small, "Minimum description length criterion for modeling of chaotic attractors with multilayer perceptron networks," Circuits and Systems I: Regular Papers, IEEE Transactions on, vol. 53, no. 3, pp. 722, 2006.
[72] T. Nakamura, K. Judd, A. Mees, and M. Small, "A comparative study of information criteria for model selection," International Journal of Bifurcation and Chaos, vol. 16, no. 8, pp. 2153, 2006.
[73] J. Rissanen, "Modeling by shortest data description," Automatica, vol. 14, no. 5, pp. 465, 1978.
[74] M. Li and P. Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, Springer-Verlag New York Inc, 2008.
[75] M. Small and C. Tse, "Minimum description length neural networks for time series prediction," Physical Review E, vol. 66, no. 6, pp. 066701, 2002.
[76] A. Ning, H. Lau, Y. Zhao, and T. Wong, "Fulfillment of retailer demand by using the MDL-optimal neural network prediction and decision policy," Industrial Informatics, IEEE Transactions on, vol. 5, no. 4, pp. 495, 2009.
[77] J. Wang and Y. Hsu, "An MDL-based Hammerstein recurrent neural network for control applications," Neurocomputing, vol. 74, no. 1, pp. 315, 2010.
[78] Y. Molkov, D. Mukhin, E. Loskutov, A. Feigin, G. Fidelin, et al., "Using the minimum description length principle for global reconstruction of dynamic systems from noisy time series," Phys. Rev. E, vol. 80, pp. 046207, 2009.
[79] A. Leonardis and H. Bischof, "An efficient MDL-based construction of RBF networks," Neural Networks, vol. 11, no. 5, pp. 963, 1998.
[80] H. Bischof, A. Leonardis, and A. Selb, "MDL principle for robust vector quantisation," Pattern Analysis & Applications, vol. 2, no. 1, pp. 59, 1999.
[81] K. Shinoda and T. Watanabe, "MDL-based context-dependent subword modeling for speech recognition," Acoustical Science and Technology, vol. 21, no. 2, pp. 79, 2000.
[82] T. Rakthanmanon, E. Keogh, S. Lonardi, and S. Evans, "MDL-based time series clustering," Knowledge and Information Systems, pp. 1, 2012.


[83] H. Bischof and A. Leonardis, "Fuzzy c-means in an MDL-framework," in Pattern Recognition, 2000. Proceedings. 15th International Conference on, 2000, vol. 2, pp. 740.
[84] I. Jonyer, L. Holder, and D. Cook, "MDL-based context-free graph grammar induction and applications," International Journal on Artificial Intelligence Tools, vol. 13, no. 1, pp. 65, 2004.
[85] S. Papadimitriou, J. Sun, C. Faloutsos, and P. Yu, "Hierarchical, parameter-free community discovery," Machine Learning and Knowledge Discovery in Databases, pp. 170, 2008.
[86] R. Zemel, "A minimum description length framework for unsupervised learning," Ph.D. dissertation, University of Toronto, 1993.
[87] J. Rissanen, "A universal prior for integers and estimation by minimum description length," The Annals of Statistics, vol. 11, no. 2, pp. 416, 1983.
[88] K. Judd and A. Mees, "On selecting models for nonlinear time series," Physica D: Nonlinear Phenomena, vol. 82, pp. 426, 1995.
[89] A. Edwards, Likelihood, Cambridge University Press, 1984.
[90] M. Hansen and B. Yu, "Model selection and the principle of minimum description length," Journal of the American Statistical Association, vol. 96, no. 454, pp. 746, 2001.
[91] P. Larrañaga and J. Lozano, Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation, vol. 2, Springer, 2002.
[92] G. Kechriotis, E. Zervas, and E. Manolakos, "Using recurrent neural networks for adaptive communication channel equalization," Neural Networks, IEEE Transactions on, vol. 5, no. 2, pp. 267, 1994.
[93] N. Sands and J. Cioffi, "Nonlinear channel models for digital magnetic recording," Magnetics, IEEE Transactions on, vol. 29, no. 6, pp. 3996, 1993.
[94] H. Tong, "Nonlinear time series analysis," Encyclopedia of Biostatistics, 2005.
[95] D. Boes, F. Graybill, and A. Mood, Introduction to the Theory of Statistics, Series in Probability, 1974.
[96] S. Geer, Applications of Empirical Process Theory, Cambridge University Press, 2000.
[97] A. Weigend and N. Gershenfeld, Time Series Prediction: Forecasting the Future and Understanding the Past, 1994, in Proceedings of a NATO Advanced Research Workshop on Comparative Time Series Analysis, held in Santa Fe, New Mexico.


BIOGRAPHICAL SKETCH

Songlin Zhao grew up in Jiangsu, China. She received her B.S. and M.S. degrees in electrical engineering from Xi'an Jiaotong University in 2006 and 2009 respectively. She was involved in a new-generation video compression standard at the Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, participating in the design of an embedded scalable multiple description video codec. In 2009, she joined the Computational NeuroEngineering Laboratory at the University of Florida as a Ph.D. student. Her research focuses on signal processing, adaptive filtering, and machine learning.