
Capturing Spike Train Similarity Structure

Permanent Link: http://ufdc.ufl.edu/UFE0042022/00001

Material Information

Title: Capturing Spike Train Similarity Structure: A Point Process Divergence Approach
Physical Description: 1 online resource (137 p.)
Language: english
Creator: Park, Il
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2010

Subjects

Subjects / Keywords: computational, divergence, hypothesis, memming, neuron, neuroscience, point, probability, process, spike, statistics, train
Biomedical Engineering -- Dissertations, Academic -- UF
Genre: Biomedical Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: Neurons mostly communicate via stereotypical events called action potentials (or spikes for short), giving rise to a time series called the neural spike train. Spike trains are random in nature, so one needs to deal with the probability law over the space of spike trains (a point process). Rich statistical descriptors are a prerequisite for statistical learning in the spike train domain; they provide the necessary analysis tools for neural decoding, change detection, and neuron model fitting. The first and second order statistics prevalently used in neuroscience -- such as the mean firing rate function and the correlation function -- do not fully describe the randomness and are thus only partial statistics. Restricting a study to these basic statistics implicitly limits what can be discovered. We propose three families of statistical divergences that enable non-Poisson and, moreover, distribution-free spike train analysis. We extend the Kolmogorov-Smirnov test, phi-divergence, and kernel based divergence to point processes. This is made possible by developing novel mathematical foundations for point process representations. Unlike similarity or distance measures for spike trains, which assume a predefined stochasticity and hence are not flexible, divergences applied to sets of spike trains capture the underlying probability law and measure statistical similarity. Divergences are therefore more robust and assumption-free. To evaluate their usefulness, we apply the methodology to real data from neuronal cultures as well as anesthetized animals, for neuroscience and neuroengineering applications posed as statistical inferences.
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by Il Park.
Thesis: Thesis (Ph.D.)--University of Florida, 2010.
Local: Adviser: Principe, Jose C.

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2010
System ID: UFE0042022:00001

Full Text

CAPTURING SPIKE TRAIN SIMILARITY STRUCTURE: A POINT PROCESS DIVERGENCE APPROACH

By

IL "MEMMING" PARK

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2010

© 2010 Il "Memming" Park

Dedicated to Kurt Gödel

ACKNOWLEDGMENTS

It would be only appropriate to thank my advisor Dr. Jose Carlos Santos Carvalho Príncipe first for his guidance and lessons not only for research but for life in general. A lot of people helped me get through my journey of graduate school, and perhaps my attempt to properly thank them all will fail miserably, but I have to try. Dr. Thomas B. DeMarse helped me enormously especially by letting me perform experiments, and he has been emotionally supporting my research as well. I owe my deepest gratitude to Dr. Murali Rao for bringing mathematical rigor to my clumsy ideas. My committee members Dr. Arunava Banerjee, Dr. Bruce Wheeler and Dr. Justin Sanchez supported me and kept me motivated. Dr. John Harris's kind support allowed me to make friends and connections around the world. Dr. Purvis Bedenbaugh brought me a special Christmas gift of auditory spiking data in 2009.

I am indebted to many of my colleagues; without their support this dissertation would not have been possible. Antonio Rafael da Costa Paiva has been a great friend and colleague for developing spike train based signal processing algorithms. Jianwu Xu and Weifeng Liu gave me great intuitions for reproducing kernel Hilbert spaces. Dongming Xu enlightened me on dynamical systems. Brainstorming with Karl Dockendorf was always a pleasure. I learned so much from the discussions with Steven Van Vaerenbergh and Luis Sanchez. Among all the most fruitful collaboration was with Sohan Seth. He has been a great friend, and brought joy to my work.

I greatly appreciate all the support my friends gave me in a number of ways. I only mention a few of them here: Pingping Zhu the operator operator, Jason Winters the creative, Aysegul Gunduz the brilliant, Vaibhav Garg the good, Sachin Talathi the synchrony, Abhishek Singh the mango shake, Alexander Singh-Alvarado the happy omniscient, Savyasachi Singh the expert hermit, Rajasimhan Rajagovindan the attention measurer, Manu Rastogi the cheerful, Sungho Oh the sosa, Ashish Myles the councilor, Lin Li the strong willed, Florian Kuehnel the adventurer, Yujin Kim the great scientist, Jennifer Jackson the kind, Dong-uk Hwang the wise, Erion Hasanbelliu the complainer, Katherine Feng the questioner, Austin Brockmeier the smart, and Yuriy Bobkov the polite electrophysiologist. Special thanks goes to Francesca Spedalieri. Without her support, I would have been in a completely different emotional state during the writing. Last but not least, I would like to thank my family for all their love and support.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
   1.1 Neuroscience Applications
      1.1.1 Change Detection
      1.1.2 Estimation and Goodness-of-fit
      1.1.3 Neural Assembly Identification
      1.1.4 Coding Capacity
      1.1.5 Functional Connectivity via Dependence Detection

2 SPIKE TRAIN SPACES
   2.1 Representations of Spike Trains
   2.2 Distance and Dissimilarity Measures
   2.3 Reproducing Kernel Hilbert Space for Spike Trains
   2.4 Kernel Design Problem

3 POINT PROCESS SPACES
   3.1 Definition of Point Process
      3.1.1 Random Counting Measure
      3.1.2 Conditional Intensity Function
      3.1.3 Probability Measure on Spike Train Space
   3.2 Precisely Timed Spike Trains
      3.2.1 Precisely Timed Action Potential Model
      3.2.2 Precisely Timed Spike Train Model
   3.3 Point Process Divergence and Dissimilarity

4 CUMULATIVE BASED DIVERGENCE
   4.1 Introduction
   4.2 Extended Kolmogorov–Smirnov Divergence
   4.3 Extended Cramér–von Mises Divergence
   4.4 Simulation Results
      4.4.1 Poisson Process
      4.4.2 Stationary Renewal Processes
      4.4.3 Precisely Timed Spike Trains
      4.4.4 Neuron Model
      4.4.5 Serial Correlation Model
   4.5 Optimal Stimulation Parameter Selection
      4.5.1 Problem
      4.5.2 Method
   4.6 Conclusion

5 φ-DIVERGENCE AND HILBERTIAN METRIC
   5.1 Hilbertian Metric
   5.2 Radon–Nikodym Derivative
   5.3 Hellinger Divergence
      5.3.1 Estimator
      5.3.2 Illustrative Example
      5.3.3 Detecting Learning
      5.3.4 Artificial Spike Trains
      5.3.5 Non-stationarity Detection
      5.3.6 Kernel Size
   5.4 Symmetric χ²-divergence
      5.4.1 Point Process Representation
      5.4.2 Estimation of Symmetric χ²-divergence
         5.4.2.1 Stratification approach
         5.4.2.2 Smoothed spike train approach
      5.4.3 Results
         5.4.3.1 Two action potentials
         5.4.3.2 Inhomogeneous Poisson process
         5.4.3.3 Stationary renewal processes
      5.4.4 Auditory Neuron Clustering
      5.4.5 Discussion

6 KERNEL BASED DIVERGENCE
   6.1 Introduction
   6.2 Kernel Based Divergence and Dissimilarity
   6.3 Strictly Positive Definite Kernels on Rⁿ and L2
      6.3.1 Composite Kernels on Rⁿ
      6.3.2 Schoenberg Kernels (or radial basis function) on L2
   6.4 Representation of Spike Trains and Point Process Spaces
      6.4.1 Smoothed Spike Train Space
   6.5 Stratified Spike Train Kernels
   6.6 Kernels on Smoothed Spike Trains
   6.7 Simulation Results
      6.7.1 Kernel PCA
      6.7.2 Statistical Power
   6.8 Discussion
   6.9 Proof of Theorem 14
   6.10 Proof of Theorem 15

7 CONCLUSION

APPENDIX

A REPRODUCING KERNEL HILBERT SPACES FOR INFORMATION THEORETIC LEARNING
   A.1 RKHS Framework for ITL
      A.1.1 The L2 Space of PDFs
      A.1.2 RKHS H_V Based on L2(E)
   A.2 Connection Between ITL and Kernel Methods via RKHS H_V

B POISSON PROCESS

C MEASURE THEORY

REFERENCES

BIOGRAPHICAL SKETCH

LIST OF TABLES

5-1 Comparison of integration methods for Hellinger divergence
5-2 Statistical power comparison of dissimilarities and Hellinger divergence on a set of experiments
6-1 Statistical power of different strictly positive definite kernel-induced divergences
6-2 List of kernels for spike trains of special interest and their corresponding time complexity

LIST OF FIGURES

2-1 Illustration of spike train representations
2-2 Illustration of kernel methods and reproducing kernel Hilbert space
3-1 Non-stationary trial-to-trial variability
3-2 Stratified spike train space
3-3 Precisely timed spike train
3-4 Illustration of precisely timed spike train model with realizations
3-5 Precisely timed spike train modeling of neuronal culture data
4-1 Statistical power of cumulative based divergences on Poisson processes
4-2 Statistical power of cumulative based divergence on renewal processes
4-3 Example precisely timed spike trains
4-4 Statistical power of cumulative based divergence on precisely timed spike trains
4-5 Statistical power of cumulative based divergences on Izhikevich neuron model
4-6 Discrimination of serial correlation with cumulative based divergence
4-7 Spike trains from sensory cortex in response to natural and electrical stimulus
4-8 Divergence of artificial response to the natural
4-9 Detailed raster of natural stimulus and selected response
5-1 Toy example raster plots and estimated point process illustration
5-2 Convergence of Hellinger divergence estimator
5-3 Empirical distribution of Hellinger divergence
5-4 Significance of divergence based on Hellinger distance values before and after learning protocol
5-5 Performance of test with Hellinger divergence depends on the number of samples
5-6 Nonstationarity detection from culture stimulus
5-7 Non-stationarity detection with dissimilarities compared to Hellinger
5-8 Kernel size effect on Hellinger divergence estimator
5-9 Two action potential example to test χ²-divergence
5-10 Performance of χ²-divergence on Poisson process
5-11 Performance of χ²-divergence on renewal process
5-12 Point process clustering of auditory neurons with χ² divergence
6-1 Kernel principal component analysis of spike trains with strictly positive definite kernels
6-2 Statistical power of kernel based divergences
7-1 Comparison of typical execution time of different divergence estimators
7-2 Comparison of statistical power among proposed methods
A-1 Reproducing kernel Hilbert space interpretation of ITL

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

CAPTURING SPIKE TRAIN SIMILARITY STRUCTURE: A POINT PROCESS DIVERGENCE APPROACH

By

Il "Memming" Park

August 2010

Chair: Jose C. Príncipe
Major: Biomedical Engineering

Neurons mostly communicate via stereotypical events called action potentials (or spikes for short), giving rise to a time series called the neural spike train. Spike trains are random in nature, so one needs to deal with the probability law over the space of spike trains (a point process). Rich statistical descriptors are a prerequisite for statistical learning in the spike train domain; they provide the necessary analysis tools for neural decoding, change detection, and neuron model fitting. The first and second order statistics prevalently used in neuroscience -- such as the mean firing rate function and the correlation function -- do not fully describe the randomness and are thus only partial statistics. Restricting a study to these basic statistics implicitly limits what can be discovered. We propose three families of statistical divergences that enable non-Poisson and, moreover, distribution-free spike train analysis. We extend the Kolmogorov-Smirnov test, φ-divergence, and kernel based divergence to point processes. This is made possible by developing novel mathematical foundations for point process representations. Unlike similarity or distance measures for spike trains, which assume a predefined stochasticity and hence are not flexible, divergences applied to sets of spike trains capture the underlying probability law and measure statistical similarity. Divergences are therefore more robust and assumption-free. To evaluate their usefulness, we apply the methodology to real data from neuronal cultures as well as anesthetized animals, for neuroscience and neuroengineering applications posed as statistical inferences.

CHAPTER 1
INTRODUCTION

Imagine you have access to the firing pattern of all neurons in your brain. You will observe seemingly random signals [112]. How will you analyze the data to discover the internal representation and mechanisms for information processing? The brain is spontaneously active and it is perhaps impossible to control its state within the natural operating regime. So instead of the whole brain, you decide to study the sensory or motor subsystem where it is relatively easy to control the internal states and repeat experiments. You will still find a lot of variability in the spatio-temporal neural firing patterns [46, 103, 118], because the brain operates with noisy components. Although the propagation of an action potential within a neuron is quite reliable, the synaptic mechanism and subthreshold neuronal dynamics are stochastic [3, 4, 69, 70, 74]. Therefore, the resulting action potential sequences are stochastic as well. The brain, in a sense, knows this intrinsic stochasticity at the action potential level and builds a reliable system. Hence, to analyze spike trains -- temporal patterns of action potential firing -- either an appropriate model for their stochasticity is required, or tools that can quantify arbitrary stochasticity.

Assumptions of the first kind have been widely deployed in neuroscience in the form of Poisson or renewal process models [13, 97, 98, 120]. These assumptions enable summarizing the complicated spike train observations into a few intuitive statistical quantities and enable building computational tools, often with elegant analytic solutions. In fact, the Poisson assumption for spike train analysis is analogous to the Gaussian assumption for conventional (real-valued) stochastic processes; while it allows powerful and simple tools to be built, it often breaks down when the assumptions are violated [23, 116].

The stochastic structure of spike train observations is directly linked to neural codes. If one assumes Poisson statistics, all the information is available in the rate function, since the rate function completely describes the process. Hence, neurons only need to decode the rate function; this fully supports the notion of a rate code [112]. In a similar way, a renewal process with a fixed interval distribution shape is also fully described by its mean rate function. However, recent evidence supports the notion of a temporal code -- a neural coding scheme that assumes that there is more information than what is contained in the mean rate function. The first set of evidence came from the early sensory systems where highly repeatable spatio-temporal patterns at the millisecond time scale were observed. These precisely timed spike trains are abundantly reported in both in vitro and in vivo preparations [26, 28, 54, 66, 74, 102, 103]. Precise phase locking to local field potential oscillation is also widely observed in vivo, and it is considered to create population-wide precise activity patterns [115]. Synchrony is another temporal coding scheme where neurons fire in a highly time locked manner compared to the coincidence probability predicted from the rate function modulations [41]. Modulation of the Fano factor (a measure of spike count variability) without co-modulation of the mean firing rate was observed in various tasks and areas in monkey, which indicates Poisson statistics is not enough [19]. Although it remains to be shown that these non-Poisson statistics are indeed used by the brain for decoding (rather than being epiphenomena), this evidence suggests that the Poisson/renewal assumption is not always appropriate for analysis.

This dissertation was inspired by the information theoretic learning (ITL) framework [101] for non-Gaussian, non-linear signal processing, and generalizes the ITL approach to spike trains in several ways. In ITL, information theoretic quantities -- entropy and mutual information -- are utilized in developing signal processing and machine learning algorithms. Instead of the traditional mean squared error (MSE) as a central cost function, which is optimal in the maximum likelihood estimation (MLE) sense for white Gaussian noise, information theoretic quantities are used as cost functions in ITL. While the MSE only considers up to second order statistics, information theoretic quantities reflect all aspects of the underlying probability law. The estimation of the information theoretic quantities can be done efficiently, and in fact the success of ITL in part lies in its simple distribution-free estimators that are derived from Parzen windowing. The estimator provides an interesting connection to kernel methods, via the theory of reproducing kernel Hilbert spaces (RKHS). We present the detailed interpretation and connections in appendix A.1.

In the spirit of ITL, our goal is to develop distribution-free statistical divergences for point processes (stochastic spike train observations). A statistical divergence (or simply divergence) is a statistic that quantifies the difference between probability laws -- it takes the value of zero if and only if the probability laws are identical, and otherwise it takes a positive real value which indicates how different they are. Many divergences are generalizations of information theoretic quantities; for example, the Kullback-Leibler divergence has a central role in Shannon's information theory, and the Rényi α-divergence and φ-divergence families have their own information theoretic interpretations [22, 67, 89]. In fact, an entropy can often be derived from the divergence against a Lebesgue measure (which is not a probability measure) [20].

Divergences for point processes have been explored in the literature for parametric point processes. Kullback-Leibler divergence, Hellinger distance, and total variation, for example, are often used under the Poisson assumption [104]. For general point processes, the Prohorov distance and variation distance can be defined; however, they only have theoretical value since no estimator is provided [58]. Estimation of Kullback-Leibler divergence on binned spike trains has been proposed [55]; however, due to the large data requirement, it is difficult to apply in practice.

We approach the problem of designing point process divergences while keeping in mind practical estimation. The most straightforward approach is to use a distribution-free point process estimate and plug it into known divergences such as the Hellinger distance, ∫(√f(x) − √g(x))² dx, where f and g are probability densities. We develop distribution-free point process estimators by extending the ideas of the empirical cumulative distribution function (CDF) and kernel density estimation (also known as Parzen windowing) from Euclidean space to spike train space. The extended empirical CDF leads to the generalization of the Kolmogorov–Smirnov (K-S) and Cramér–von Mises (C-M) tests, while the kernel density estimation allows the estimation of a family of divergences known as Hilbertian metrics.

Another generic approach is to use likelihood ratio based divergences -- if the likelihood ratio deviates from 1, the two point processes are different. φ-divergence is based on information theoretic extensions of likelihood deviation. Thus, we develop likelihood ratio estimators for point processes. It is trivial to utilize the probability density estimator to evaluate the likelihood ratio; however, we can skip the complicated estimation of two densities and directly estimate the ratio using kernel regression. Kernel regression requires a symmetric positive definite kernel on the spike trains, which we have developed in previous studies [80, 90]. We show that such kernels are powerful enough to represent any L2 function from the spike trains to the reals; thus a good approximation of the likelihood ratio can be obtained and plugged in to estimate the φ-divergence.

In fact, kernels provide not only the ability to approximate functions, but also the ability to represent the spike trains in a Hilbert space (a vector space with inner products). Under certain conditions (namely, a Hermitian strictly positive definite kernel), we can show that each point process is also uniquely represented in the same Hilbert space. Since the Hilbert space is equipped with a metric, the distance between two point processes in the Hilbert space is also a point process divergence. As a matter of fact, the unique representation corresponds to the expectation of the spike train realizations of the point process in the Hilbert space, and can be easily estimated by the empirical mean (convergence is guaranteed by the law of large numbers). We also show this formulation of divergence and its estimation holds without the symmetry requirement for the kernel (hence without an explicit Hilbert space embedding). This family of divergences is a generalization of many divergences and dissimilarities in quadratic form, such as D_ED of ITL, the C-M test statistic on the count distribution, and the square integrated distance of mean rate functions.

Once we have statistical divergences, we can perform statistical inference on point processes. Two major components of statistical inference are estimation and hypothesis testing [58]. Estimation theory requires an objective function or a loss function which captures the optimality. A divergence can provide a proper similarity of the estimate to the data. For hypothesis testing, one builds a hypothesis of the form P = Q where P and Q are two probability laws underlying spike train models or datasets. A divergence can be used to measure the deviation from P = Q with a single number. Unlike partial summarizing statistics such as the count distribution, or parametric testing, divergences are fundamentally well suited for hypothesis testing, since given a consistent estimator they will always correctly reject in the asymptotic large data limit.

1.1 Neuroscience Applications

A partial list of applications of point process divergences and statistical inferences in the context of neuroscience is presented in this section.

1.1.1 Change Detection

One of the fundamental questions that is constantly encountered in a variety of neuroscience problems is whether two sets of spike train observations follow the same probability law or not. Detecting change is crucial in neuroscience experiments where stable conditions must be ensured across trials, and finding which protocol induces significant change in the neural system is often part of the research goal. From a statistical perspective, this question can be framed as a hypothesis testing problem using a suitable divergence measure; the statistical rejection of the null hypothesis of P = Q would support that there is indeed a change.

The problem of detecting plasticity and non-stationarity of a neural system from spike train observations has been a challenge for neuroscience, and it hindered plasticity research using external electrode arrays [91, 92]. Researchers would report changes in the firing rate profile which indicate structural change in the network, but they did not have a sophisticated tool to test for subtle changes other than the rate. We will apply the developed methods to such plasticity experiments in in vitro settings, without any assumption on the system, in section 5.3.5.

1.1.2 Estimation and Goodness-of-fit

How good is a model that generates spike trains with respect to the target spike train dataset? The standard MSE is not appropriate for measuring the goodness-of-fit of spiking models. For example, the problem of searching for a set of parameters that produces spike trains closest to a specified probability law can be tackled with divergences. We demonstrate the power of divergences for the optimal stimulation parameter search problem in section 4.5.

Using a divergence as the loss function, one can also build minimum divergence estimators [89]. For example, in neuroscience Sharpee proposed the use of mutual information (Kullback-Leibler divergence), and Paninski proposed the use of φ-divergence, as tools to estimate the linear portion of the linear-nonlinear spiking neuron model [86, 113]. While the widely used first order and second order based methods -- spike triggered averaging (or reverse correlation), and spike triggered covariance techniques -- are shown to be inconsistent estimators for this problem, the corresponding divergence based estimators are consistent [86].

1.1.3 Neural Assembly Identification

A neural assembly is a group of neurons that work together transiently for a particular task [78]. When an electrode array is used to record many neurons in in vivo experiments, one of the interesting tasks is to find neural assemblies. It is widely assumed that those neurons fire in synchrony [8, 39, 41]; however, there is no biophysical restriction to elect synchrony as the common property. If we can find any statistical similarities between the neuronal population firing patterns, it would be fair to assume that the probed neurons form a statistical neural assembly. In a sense, a statistical neural assembly can be used to perform ensemble averages within a single trial. To assist the identification of statistical neural assemblies, we need to identify neurons with similar statistics -- a task that can be solved by a clustering algorithm on the divergence values. We demonstrate neuronal response clustering in section 5.4.4.

1.1.4 Coding Capacity

Mutual information is often used to quantify the efficiency of the neural code of a spiking neural system [55, 88, 123]. Neural decoding can be viewed as a classification problem when the input is categorical. The classification error limit is due to the overlap of the probability laws given each category. Divergences can often strongly bound the classification errors [67, 117]. Hence, we can analyze the coding capacity by measuring the divergences.

1.1.5 Functional Connectivity via Dependence Detection

A divergence measure can be extended to an (in)dependence measure when the joint and the product of marginal probability measures are applied. The nonparametric estimator can also be used to estimate the joint distribution of a pair of spike trains, hence it is possible to extend the method to a point process dependence measure. In the same stimulation paradigm where the input is kept constant, dependence between point processes implies dependence between neurons. From a signal processing point of view, cross correlation and Granger causality have been applied to binned spike trains to estimate functional connectivity [76, 93]. However, they implicitly assume that the first and second order statistics carry the information. The proposed approach can be a valuable addition for functional connectivity analysis without assumptions. The functional connectivity can also be applied to identifying neural assemblies.
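The hypothesis-testing recipe that runs through these applications -- reject P = Q when a divergence statistic between two sets of spike trains is improbably large -- can be illustrated with a deliberately simple baseline. The sketch below is not one of the estimators developed in this dissertation: it pools interspike intervals and applies the classical one-dimensional two-sample Kolmogorov-Smirnov test from SciPy, so it only probes a partial statistic (the interval distribution). The helper names `pooled_isis` and `isi_change_test` are mine, chosen for illustration.

```python
import numpy as np
from scipy.stats import ks_2samp

def pooled_isis(spike_trains):
    """Pool interspike intervals from a list of spike trains.

    Each spike train is a sorted 1-D array of spike times
    (the sequence representation of Chapter 2).
    """
    return np.concatenate([np.diff(st) for st in spike_trains if len(st) > 1])

def isi_change_test(trains_a, trains_b):
    """Classical two-sample K-S test on pooled ISIs.

    A baseline illustration of divergence-based change detection:
    the statistic vanishes only when the two ISI distributions agree.
    """
    stat, p_value = ks_2samp(pooled_isis(trains_a), pooled_isis(trains_b))
    return stat, p_value

# Two conditions: homogeneous Poisson firing at 5 Hz vs. 20 Hz.
rng = np.random.default_rng(0)
cond_a = [np.cumsum(rng.exponential(1 / 5.0, 60)) for _ in range(30)]
cond_b = [np.cumsum(rng.exponential(1 / 20.0, 220)) for _ in range(30)]
stat, p = isi_change_test(cond_a, cond_b)
print(f"K-S statistic = {stat:.3f}, p = {p:.2e}")  # tiny p: change detected
```

Because this baseline discards spike ordering, it is blind to changes that preserve the interval distribution (e.g., the serial correlation model of section 4.4.5); the point process divergences developed in Chapters 4–6 are designed to capture the full probability law instead.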

PAGE 20

CHAPTER2SPIKETRAINSPACESNeuronscommunicatethroughasequenceofstereotypicalactionpotentials.Sincetheamplitudeofeachactionpotentialisvirtuallyconstant,theonlyinformationisinthetimingofeachevent.Thisbiologicalsignalabstractionasasequenceofeventtimesiscalledthespiketrain.Spiketrainsaredierentfromconventionalamplitudebasedsignals(eitherincontinuoustimeorindiscretetime)inthesensethattheycannotberepresentedinalinearspacesuchasL2orl2.Infact,spiketrainslackanaturalalgebraicstructure:amajorroadblock.Neitheradditivenormultiplicativestructuresisnaturallydenedforspiketrains.Inthischapterwediscusshowspiketrainscanbemathematicallyrepresented,andreviewtwowaysofintroducingbasicstructurestothespiketrainspace.Therststructureistheconceptofdistance,andthesecondisakernelthatinducesaHilbertspace.Wediscusstheadvantagesanddisadvantagesoftheseapproachesandposetheproblemofcapturingtheunderlyingprobabilitystructuretobetterquantifysimilarity. 2.1RepresentationsofSpikeTrainsSpiketrainscanberepresentedasseveraldierenttypesofmathematicalobjects.Thesedierentrepresentationsleadtodierenttypesofdivergencesinlaterchapters. Denition1(Sequencerepresentationofaspiketrain). Let!=(t1;t2;:::)beanenumerablesequenceofrealnumbersinstrictlyincreasingorderwhereeachelementtidenotesthetimeofi-thactionpotentialobservation.Wecall!thesequencerepresentationofthespiketrain.Thisrepresentationisstraight-forwardandfrequentlyusedtostorespiketrainsindevicessuchasdigitalcomputers.Practicallyforaniteobservationinterval,!hasanitelength.Notethateventhougheachsequenceisofnitelength,theircorrespondinglengthsaredierentingeneral.Hence,thisrepresentationdoesnotallowavectorspaceembeddingunlessanexplicitpaddingrulemakesthemequallength.Padding 20


makes undesirable assumptions about correspondence between $i$-th spike timings that are unnatural, and it is not very robust to insertion or deletion of action potentials.

A more technical representation used in the mathematics community is to represent spike trains as counting measures. A measure is a function, hence operations on functions can be extended to spike trains. Since we will be using measure theory frequently in this thesis, here we briefly present the basic definitions (extensive discussion can be found in Daley and Vere-Jones [25], Halmos [42], Karr [58]). Let $\Omega$ be a set. A set of subsets $\mathcal{F}$ of $\Omega$ is a $\sigma$-algebra if $\Omega \in \mathcal{F}$ and it is closed under countable set union and complement. The pair $(\Omega, \mathcal{F})$ is called a measurable space and the elements of $\mathcal{F}$ are called measurable sets. A measure is a function $\mu: \mathcal{F} \to \mathbb{R}^+$ that is countably additive; that is, if $A_1, A_2, \ldots$ is a sequence of disjoint measurable sets, $\mu(\cup_{i=1}^{\infty} A_i) = \sum_i \mu(A_i)$. A counting measure is a measure such that $\mu(A) \in \mathbb{N}$ for any measurable set $A$. A simple counting (point) measure is a counting measure such that $\mu(\{x\}) \leq 1$ for $x \in \Omega$. The fundamental simple point measure is the Dirac measure $\delta_x(A) = 1_{x \in A}$, where $1$ is the indicator function.

Since we are dealing with time as the primary space for events, the spike train is represented as a simple counting measure on the real line. Specifically, we use the natural topological structure of $\mathbb{R}$ and use the Borel algebra $\mathcal{B}(\mathbb{R})$ as the $\sigma$-algebra. A Borel algebra of a topological space is the smallest $\sigma$-algebra that contains all the open sets. For a sequence of action potential timings $t_1, t_2, \ldots, t_n$, the corresponding counting measure is the sum of Dirac measures:

    N = \sum_{i=1}^{n} \delta_{t_i}

By virtue of the real line and extension theorems [42], the counting measures on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ can be fully specified by an integer-valued function on $\mathbb{R}$, $N(t) := N((-\infty, t])$ (see figure 2-1), that is right continuous with left limits (cadlag). We denote the space of all simple counting measures as $\mathcal{M}_s$.


Figure 2-1. Illustration of spike train representations. (Top) Counting function. (Bottom) Sequence of Dirac delta functions.

An alternative representation scheme is to transform each spike train to a unique function in $L_2$. Many smoothing schemes fall into this category. A stable causal linear filter $h(t) \in L_2$ applied to the sequence representation, $\sum_{i=1}^{n} h(t - t_i)$, is widely used to create functions in $L_2$. Other nonlinear and non-causal schemes are also possible [44]. The binning approach with a small bin size can also be thought of as a crude approximation of smoothing, where uniqueness holds up to timing jitter within each bin. This representation is directly linked to the kernel based representation (section 2.3).

These representations -- time sequence, simple counting measure, and smoothed spike train -- are equivalent in the sense that each provides a scheme that uniquely represents every spike train. However, different sets of mathematical tools can be more easily applied to each of them.

2.2 Distance and Dissimilarity Measures

Similarity and dissimilarity can be imposed on objects without other algebraic structure such as linearity, and still allow some pattern recognition and visualization [96]. In a metric space, algorithms such as the k-nearest neighbor classifier can be implemented. In this section, we discuss various distance/dissimilarity/similarity measures proposed in the literature [85].

Through decades of neuroscience research on neural codes, we have learned that the firing rate and precise timing of action potentials are important for encoding information; hence similarity in the mean rate function or spike timing should imply similarity or


closeness of spike trains. Given the same or similar experimental conditions, the resulting observed spike trains should be similar, and as the condition deviates, the distance should increase. From a probabilistic point of view, a distance between objects should ideally be inversely related to the posterior probability distribution [96]. The intuitive notions of rate similarity and timing similarity have inspired the two most important spike train metrics: the Victor-Purpura distance [123] and the van Rossum distance [121].

Victor and Purpura suggested extending the idea of edit distance, which had been successfully used in natural language processing and bioinformatics, to pairs of spike trains [123]. The Victor-Purpura (VP) distance is the first binless distance measure proposed in the neuroscience literature. Since the trial-to-trial variability of spike trains suggests that jitter of spike timing and missing spikes should be considered as noise [118], their approach of assigning a cost to shifting an action potential in time and to adding or removing an action potential is motivated by neurophysiological ideas that emphasize coincidence.

The VP distance between spike trains is based on the cost of transforming one spike train into the other. Three elementary operations in terms of single spikes are established: moving one spike to perfectly synchronize with another, deleting a spike, and inserting a spike. Once a sequence of operations is set, the distance is given as the sum of the cost of each operation. While the cost of deleting or inserting a spike is set to one, the cost of aligning a spike at $t$ to $t + \Delta t$ is $q|\Delta t|$, where $q$ is a scaling parameter for relative temporal sensitivity. Because a higher $q$ implies that the distance increases more when a spike needs to be moved, the distance as a function of $q$ controls the precision of the spike times of interest.

Recall the axioms of a metric; for spike trains $\omega_i$, $\omega_j$ and $\omega_k$:

(i) Symmetry: $d(\omega_i, \omega_j) = d(\omega_j, \omega_i)$
(ii) Positiveness: $d(\omega_i, \omega_j) \geq 0$, with equality holding if and only if $\omega_i = \omega_j$
(iii) Triangle inequality: $d(\omega_i, \omega_j) \leq d(\omega_i, \omega_k) + d(\omega_k, \omega_j)$.


To ensure the triangle inequality and uniqueness of the distance between any two spike trains, the sequence which yields the minimum cost in terms of the operations is used. Therefore, the VP distance between spike trains $\omega_i$ and $\omega_j$ is defined as

    d_{VP}(\omega_i, \omega_j) = \min_{C(\omega_i \leftrightarrow \omega_j)} \sum_l d_q\left(t^i_{c_i[l]}, t^j_{c_j[l]}\right),    (2-1)

where $C(\omega_i \leftrightarrow \omega_j)$ is the set of all possible sequences of elementary operations that transform $\omega_i$ to $\omega_j$, and $c_\cdot[\cdot] \in C(\omega_i \leftrightarrow \omega_j)$, where $c_i[l]$ denotes the index of the spike time of $\omega_i$ manipulated in the $l$-th step of a sequence. $d_q(t^i_{c_i[l]}, t^j_{c_j[l]})$ is the cost associated with the step of mapping the $c_i[l]$-th spike of $\omega_i$ at $t^i_{c_i[l]}$ to $t^j_{c_j[l]}$, corresponding to the $c_j[l]$-th spike of $\omega_j$. Note that the VP distance is non-Euclidean.

$d_q$ is a distance metric between two spike times. Given two spike trains with only one spike each, the mapping between the two spike trains is achieved through the three above-mentioned operations, and the distance is given by

    d_q(t, s) = \min\{q|t - s|, 2\} = \begin{cases} q|t - s|, & |t - s| < 2/q \\ 2, & \text{otherwise.} \end{cases}    (2-2)

This means that if the difference between the two spike times is smaller than $2/q$, the cost is linearly proportional to their time difference. However, if the spikes are further apart, it is less costly to simply delete and insert appropriately. Later, Houghton and Sen [45] showed that the VP distance yields, in practice, almost identical results to the $L_1$ distance of spike trains smoothed with a rectangular kernel. The VP metric has been used by the neuroscience community for analyzing the precision and potential of temporal coding [99, 103].

van Rossum [121] suggested the use of the $L_2$ distance between smoothed spike trains. Because the smoothing effectively transforms the spike train to a real-valued signal, the $L_2$ distance is a natural choice. This is also a natural extension of the traditionally used time
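The minimum over operation sequences in (2-1) can be computed with the same dynamic program used for classic edit distance. Below is a minimal sketch in Python, not the authors' implementation; the function name and the unit deletion/insertion costs follow the description above.

```python
def vp_distance(s, t, q=1.0):
    """Victor-Purpura distance between two sorted spike-time lists.

    Elementary costs: delete or insert a spike = 1, shift a spike by dt = q*|dt|.
    D[i][j] holds the minimum cost of transforming the first i spikes of s
    into the first j spikes of t.
    """
    n, m = len(s), len(t)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = float(i)          # delete all i spikes
    for j in range(1, m + 1):
        D[0][j] = float(j)          # insert all j spikes
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(
                D[i - 1][j] + 1.0,                          # delete s[i-1]
                D[i][j - 1] + 1.0,                          # insert t[j-1]
                D[i - 1][j - 1] + q * abs(s[i - 1] - t[j - 1]),  # shift
            )
    return D[n][m]
```

The recursion automatically reproduces the $\min\{q|t-s|, 2\}$ behavior of (2-2): moving a distant spike is dominated by one deletion plus one insertion.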


binning method, where the smoothing kernel is rectangular and the smoothed function is sampled on a discrete time grid. The van Rossum distance also has biological relevance, since the smoothing can be considered a first order approximation of the synaptic signal transduction process [121].

Houghton [44] recently extended this idea to incorporate physiological dynamics at the synapse. As in van Rossum [121], the $L_2$ distance is defined between the post-synaptic potentials evoked according to dynamical models of synapses. For spike trains from zebra finch recordings, Houghton [44] showed an improvement in classification performance which was particularly significant when accounting for the depletion of receptors, suggesting history dependence in the interaction of action potential timings. We showed that van Rossum's and Houghton's distance measures can be derived from the perspective of kernels [82, 84].

2.3 Reproducing Kernel Hilbert Space for Spike Trains

The main advantage of the smoothed spike train approach is that the spike trains can be treated as regular continuous-amplitude, continuous-time signals. $L_2$ is a Hilbert space (a linear structure with inner product and norm), therefore it allows traditional signal processing tools to be directly applied to spike trains -- linear filtering, cross-correlation analysis, and most supervised learning algorithms, to mention a few.

The kernel method is a powerful machine learning technique in which linear algorithms can be easily extended to nonlinear ones through the theory of reproducing kernel Hilbert spaces (RKHS). Given a bivariate symmetric positive definite kernel defined on the input space, the objects in the input space can be mapped to a Hilbert space where the inner product is defined by the evaluation of the kernel [2] (see figure 2-2). The success of the support vector machine (SVM) brought the development of a variety of kernel methods such as kernel regression [109], kernel principal component analysis (KPCA) [110], kernel least mean squares (KLMS) [68], and the kernel Fisher discriminant [109], to name just a few.


Figure 2-2. Illustration of the reproducing kernel Hilbert space induced by a symmetric positive definite kernel. Elements in the unstructured space X are projected to a Hilbert space H via the kernel K. Linear structure and inner product are indicated with dotted lines.

The kernel method is particularly useful for data types that do not have a natural representation in Euclidean space. Many specialized kernels were designed for strings, bags of objects, images, shapes [109], and transducers [21, 62]. However, they rely on the prior knowledge and insight of the designer. We have independently developed a kernel method that induces an RKHS that can isometrically embed the van Rossum distance [82, 84, 94]. In addition, the Cauchy-Schwarz divergence, which measures the angular distance in the RKHS, corresponds to the similarity measure used by Schreiber et al. [111] to perform clustering of in vitro spike train observations.

Using a kernel smoothing representation of the spike train, define the smoothed spike trains as

    \hat{s}_i(t) = \sum_{m=1}^{N_i} \mathrm{pdf}(t - t^i_m)    (2-3)

where $t^i_m$ is the $m$-th spike timing of the spike train indexed by $i$. We define the memoryless cross intensity (mCI) kernel on spike trains as

    I(\omega_i, \omega_j) = \int_T \hat{s}_i(t) \hat{s}_j(t) \, dt.    (2-4)


The mCI kernel is symmetric positive (semi-)definite and hence induces an RKHS [84]. This kernel is closely related to other single trial rate estimation processes such as various binning methods. Hence, using this kernel resembles many linear methods that have been widely used in neuroscience. One important property of the mCI kernel is that when pdf is an exponential decay(1), (2-4) can be evaluated efficiently as

    I(\omega_i, \omega_j) = \frac{1}{N_i N_j} \sum_{l=1}^{N_i} \sum_{m=1}^{N_j} K\left(t^{(i)}_l - t^{(j)}_m\right)    (2-5)

where $K(\cdot)$ is the double exponential function (Laplacian kernel) [84]. Also, in this case the van Rossum distance coincides with the distance in this RKHS.

To overcome the linear properties of the kernel inherited from the $L_2$ space, we can extend it with additional nonlinearity. We have proposed two such nonlinear kernels, called nCI1 and nCI2. The nCI1 is defined as

    K_1(\omega_i, \omega_j) = \exp\left( -\frac{\| \hat{s}_i(t) - \hat{s}_j(t) \|^2}{\sigma} \right),    (2-6)

while nCI2 is defined as

    K_2(\omega_i, \omega_j) = \int \exp\left[ -\frac{\left(\hat{s}_i(t) - \hat{s}_j(t)\right)^2}{\sigma} \right] dt.    (2-7)

Each is a nonlinear transformation of the structure that is created by the mCI RKHS. We have successfully applied this technique to continuous-time cross-correlation [93] and spectral clustering [82, 83]. We will discuss the implications of the nCI kernels in the light of strictly positive definite functions in chapter 6.

(1) This corresponds to a first order IIR low-pass filter.
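The closed form (2-5) reduces the integral in (2-4) to a double sum of Laplacian kernel evaluations over spike-time pairs. A minimal sketch in Python follows; the function name and the normalization by $N_i N_j$ mirror (2-5) as printed, and the time constant `tau` is an illustrative free parameter.

```python
import math

def mci_kernel(spikes_i, spikes_j, tau=0.01):
    """Memoryless cross-intensity kernel, eq. (2-5), for exponential smoothing.

    The smoothed-train inner product collapses to a normalized double sum
    of a Laplacian kernel K(d) = exp(-|d| / tau) over all spike-time pairs.
    """
    if not spikes_i or not spikes_j:
        return 0.0  # an empty train has zero smoothed intensity
    total = sum(math.exp(-abs(t - s) / tau)
                for t in spikes_i for s in spikes_j)
    return total / (len(spikes_i) * len(spikes_j))
```

The $O(N_i N_j)$ cost of the double sum, instead of numerical integration over $t$, is the practical payoff of choosing the exponential smoothing function.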


2.4 Kernel Design Problem

What is the optimal kernel for a specific problem? The choice of kernel determines the structure of the space where the data is linearly represented [81, 109]. In general, the kernel should capture the similarity of the data, and ideally the distance in the space should correspond to the posterior distribution for classification [96]. The problem of finding a kernel that best approximates this condition is known as the "kernel design problem".

All of the previously mentioned distance or similarity measures are based on the insight that exact spike timing (or a synchronized instantaneous increase of firing rate) is an important feature. Although the measures have a few free parameters related to time scale that can be optimized, they cannot cope with the general statistical structure of the observations. In other words, these measures impose a rigid structure on the spike train space, as parametric models do. In fact, the structures are closely related to the smoothing process -- an injective mapping of the spike trains to the $L_1$ or $L_2$ space.

There is a set of kernel design methods that utilize a generative model of the data, such as the Fisher kernel [50], the product probability kernel [51], Hilbertian metrics [43], and the non-extensive entropic kernel [71] in Euclidean spaces. The Fisher kernel is one of the earliest kernels proposed for generative probability models [49, 50]. Given a parametric family of probabilistic models, two objects are mapped using the score function (gradient of the log likelihood) and the inner product is taken with respect to the Riemannian manifold. The idea is valid when the probability law is in the exponential family with a fixed number of parameters, such that the representation becomes the space of sufficient statistics.

In information theoretic learning [101], based on the optimization principles for Renyi's quadratic entropy, a space of square-integrable probability distributions is used, and it is an RKHS based on the cross information potential kernel(2) as the inner product

(2) $K(X, Y) = \int f_X(z) f_Y(z) \, dz$, where $f_X$ is the probability density of $X$.


between random variables (section A). This kernel can also be viewed as a conditional expectation of the probability density with respect to the other random variable. Hence, Jebara et al. [51] call it an expected likelihood kernel, and generalize it to a broader family of kernels of the form $K(p, q) = \int p^r(x) q^r(x) \, dx$, where $p$ and $q$ are two probability densities and $r > 0$ (the product probability kernel). Given a nonparametric estimator for the probability densities, these kernels can be readily implemented. When a single trial estimation using Parzen windowing is used, it forms a subset of regular kernel methods. Also, when $r = \frac{1}{2}$, it is the Bhattacharyya kernel, which is related to the Hellinger distance.

The Hellinger distance has the nice property that it is independent of the reference measure for integration. Hein and Bousquet [43] proposed a family of such metrics for probability measures which can be embedded isometrically in a Hilbert space, and presented the kernel that provides the inner product between probability measures. This family of Hilbertian metrics provides a foundation for designing kernels from generative models with non-Euclidean representations. The family of RKHS includes the classical Hellinger, Jensen-Shannon, $\chi^2$-divergence, and total variation. Martins et al. [71] also suggested a family of kernels on probability measures based on the Tsallis entropy. We will explore the Hilbertian metrics applied to point processes in chapter 5.
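For discretized (histogram) density estimates, the product probability kernel above is a one-line computation. The following is an illustrative sketch, assuming densities given as probability vectors over shared bins; the function name is not from the cited works.

```python
def product_probability_kernel(p, q, r=0.5):
    """Product probability kernel of the form K(p, q) = sum_x p(x)^r q(x)^r,
    evaluated on two histogram densities over the same bins.

    With r = 0.5 this is the Bhattacharyya kernel, whose associated distance
    is the Hellinger distance.
    """
    return sum((a ** r) * (b ** r) for a, b in zip(p, q))
```

For identical densities and $r = 1/2$ the kernel evaluates to 1; for densities with disjoint support it evaluates to 0, the two extremes of the Hellinger geometry.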


CHAPTER 3
POINT PROCESS SPACES

Trial-to-trial variability studies clearly show that spike train observations are noisy. Perhaps due to the diversity and heterogeneity of neural systems, this variability structure is highly heterogeneous across neurons and experimental conditions; while one neuron may show Poisson-process-like variability, another neuron may show precisely timed action potentials, or the same neuron might show precise time locking behavior for a strong stimulus and increased timing jitter as the stimulus gets weaker (see figure 3-1 for a trial-to-trial variability example).

Figure 3-1. Recording from the ventral auditory thalamus of a rat. Three different frozen noise stimuli were presented: dynamic ripple noise (DRN), filtered Gaussian noise, and speech babble from the International Collegium of Rehabilitative Audiology (ICRA). Each box contains 56 trials, and the gray line is the smoothed PSTH.

The trial-to-trial variability should be considered as noise in the input for a downstream neuron. To overcome the noise, there are several possible strategies -- post-synaptic smoothing and pattern detection, both at the single spike train level as well as in the population activity. An ideal neuron would encode information that survives the spike train noise, and decode optimally given knowledge of the noise statistics. The


information needs to be conveyed in the dimension maximally "orthogonal" to the noise in order to be efficiently transferred; signals should be separated in a way most dissimilar to the noise. However, the relevant "coding dimension" is by no means unique, and will be relative to the tuning of the downstream neuron. Hence, in the context of optimal decoding, capturing the noise structure is critical.

The wide dynamic range of stochasticity of spike train observations is best captured by point process models; a point process is a probabilistic description of spike train generation. In this chapter, we rigorously define point processes based on the representation of spike trains discussed in the previous chapter. We introduce a point process model of precisely timed spike trains as a benchmark for trial-to-trial variability. In later chapters, we will develop point process divergences as the key statistic for distribution-free statistical inference.

3.1 Definition of Point Process

A point process is a probability law that governs the generation of spike trains. There are multiple formal frameworks for rigorously defining point processes.

3.1.1 Random Counting Measure

Analogous to random variables that describe probability laws in Euclidean space $\mathbb{R}^d$, a random counting measure is a function from the sample space to the space of counting measures. Let $(\Omega, \mathcal{F}, P)$ be a probability space where $\Omega$ is the sample space, $\mathcal{F}$ is the $\sigma$-algebra, and $P$ is the probability measure. A measurable function $N: \Omega \to \mathcal{M}_s$ is a random counting measure that defines a point process. Since $\mu \in \mathcal{M}_s$ is a function of the form $\mu: \mathcal{B}(\mathbb{R}) \to \mathbb{N}$, we can see that $N: \Omega \to \mathcal{B}(\mathbb{R}) \to \mathbb{N}$, and it can also be represented as $N: \mathcal{B}(\mathbb{R}) \to \Omega \to \mathbb{N}$. Thus $N(A)$ is an integer-valued random variable given a Borel set $A$. Specifying a consistent joint probability law for a collection of random variables $N(A_1), N(A_2), \ldots, N(A_n)$ for any $\{A_i \in \mathcal{B}(\mathbb{R})\}$ is also equivalent to defining a point process.


Let us define the (inhomogeneous) Poisson process as an example(1). Given an intensity measure $\Lambda: \mathcal{B}(\mathbb{R}) \to \mathbb{R}^+$, the random counting process of a Poisson process is specified by independent integer-valued random variables

    \Pr[N(A) = k] = \frac{\Lambda(A)^k}{k!} e^{-\Lambda(A)}    (3-1)

for any $A \in \mathcal{B}(\mathbb{R})$, where $k \in \mathbb{N}$ [116].

3.1.2 Conditional Intensity Function

Recall that $\mu \in \mathcal{M}_s$ in time can also be described by a non-decreasing staircase function $\mu: \mathbb{R} \to \mathbb{N}$. Again, we can see that $N: \Omega \to \mathbb{R} \to \mathbb{N}$ as well as $N: \mathbb{R} \to \Omega \to \mathbb{N}$ are valid representations. Indeed, this is the same description as a random process where at each time point the value $N(t)$ is random. For such a point process representation satisfying the usual conditions, martingale theory allows a decomposition, and we can obtain the compensator process (Doob-Meyer decomposition). More rigorous conditions are omitted here; refer to Karr [58] for details. The compensator of a point process can be simply represented by the conditional intensity function defined as

    \lambda(t) = \lim_{\delta \to 0} \frac{E[N((t, t + \delta]) \mid \mathcal{H}_t]}{\delta}    (3-2)

where $\mathcal{H}_t$ is the $\sigma$-algebra of events occurring at times up to but not including $t$ [25]. The conditional intensity function completely describes any point process [25]. The conditional intensity function can be parameterized with a generalized linear model (GLM), which often provides a unique maximum likelihood fit and is popular in point process modeling [12, 79, 87, 99].

(1) For a detailed description of the Poisson process see B.
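An inhomogeneous Poisson realization can be drawn by thinning (Lewis-Shedler style rejection sampling): propose events from a homogeneous process at an upper-bound rate and keep each with probability proportional to the intensity. A minimal sketch, assuming an intensity function `rate` bounded by `rate_max`; the names are illustrative.

```python
import random

def sample_inhom_poisson(rate, t_max, rate_max):
    """One realization of an inhomogeneous Poisson process on [0, t_max).

    Thinning: homogeneous candidates arrive at rate_max (exponential
    inter-event times); each candidate at time t is kept with probability
    rate(t) / rate_max, which requires rate(t) <= rate_max everywhere.
    """
    spikes, t = [], 0.0
    while True:
        t += random.expovariate(rate_max)   # next candidate event time
        if t >= t_max:
            return spikes
        if random.random() < rate(t) / rate_max:
            spikes.append(t)
```

For a constant rate the thinning step always accepts, and the count over an interval $A$ is Poisson with mean $\Lambda(A)$, matching (3-1).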


The first order statistic of the conditional intensity representation is the intensity function, also known as the (mean) rate function $\lambda: \mathbb{R} \to \mathbb{R}^+$,

    \lambda(t) = E[\lambda(t \mid \mathcal{H}_t)].    (3-3)

For a Poisson process, $\lambda(t \mid \mathcal{H}_t) = \lambda(t)$, meaning the process is independent of the history, and the intensity function is simply related to the intensity measure by $\int_a^b \lambda(t) \, dt = \Lambda([a, b))$. The peri-stimulus time histogram (PSTH) is the conventionally used estimator for $\lambda(t)$ [97, 98].

3.1.3 Probability Measure on Spike Train Space

Here we take a rather unconventional route to avoid unnecessary complications that may hinder the presentation of the proposed statistics. Instead of assuming an arbitrary probability space as the previously introduced approaches have done, we directly define the point process to be a probability measure on the spike train space. This allows simple descriptions of our methods and is appropriate for non-stationary fixed-time-interval observations. In order to define measures, we first need a measurable space for spike trains. Let $\Omega$ be the set of all finite spike trains on a bounded interval $X \subset \mathbb{R}$; that is, each $\omega \in \Omega$ can be represented as a finite set of events (action potentials) $\omega = \{t_1 < t_2 < \cdots < t_n\}$. Each spike train with $n$ events can equivalently be represented by the vector $t = (t_1, \ldots, t_n) \in \mathbb{R}^n$ of the

corresponding action potential timings, whereas $A_n$ denotes a subset of these spike trains involving only $n$ action potentials each.

It is convenient to have a reference measure on the space. Let $\mu(A) = \mu_0(A) + \sum_{n=1}^{\infty} \mu_n(A \cap \Omega_n)$ denote the extension of the Lebesgue measures $\mu_n$ for $\Omega_n$ ($n > 0$) to $\Omega$, where $\mu_0$ is a Dirac measure for the single element in $\Omega_0$, which represents the empty spike train. We will use $\mu$ as the reference measure when necessary. However, it will turn out that the choice of $\mu$ is not important, since $\phi$-divergences are independent of the reference measure.

Figure 3-2. Illustration of how the spike train space is stratified. It can be written as a union of Euclidean spaces. See section 3.1.3 for details.

A point process is defined as a probability measure $P$ on the measurable space $(\Omega, \mathcal{F})$ [25]. We call this the stratified approach to representing point processes. A similar approach can be used to induce probability measures on the smoothed spike train space (see section 5.4.1).

3.2 Precisely Timed Spike Trains

When the same stimulation is presented to a neuronal system, the train of action potentials observed as a result sometimes shows a highly repeatable spatio-temporal


pattern at the millisecond time scale. Recently, such precisely timed spike trains have been abundantly reported in both in vivo and in vitro preparations [26, 28, 54, 66, 74, 102, 103]. It has been speculated that this spatio-temporal pattern can support temporal coding [122]. Despite being highly reproducible, different forms of trial-to-trial variability have also been observed [118]. It is crucial to understand this variability, since to utilize a precisely timed spike pattern as a temporal code, the system should presumably be robust to its variability structure, and possibly learn to reduce it [11]. In this section, we develop a parametric point process model that intuitively captures such variability structure.

Figure 3-3. Illustration of different variabilities in a precisely timed spike train: precision, reliability, and additive action potentials. The jitter distribution on the bottom represents precision, the dotted box indicates a missing action potential (reliability), and the red action potential represents additive noise which is not precisely time locked.

Precisely timed spike trains (PTST) consist of precisely timed action potentials (PTAP) which are highly repeatable in time (Fig. 3-3). The temporal jitter with respect to stimulation onset is known as the precision of a PTAP, and the portion of trials in which the PTAP is identified is known as the reliability. Precision and reliability are the main characterizations of the variability of a PTAP [118]. In a PTST, there are normally also additive action potentials that do not correspond to any PTAP, adding to the trial-to-trial variability.

These variabilities of PTSTs are not appropriately described by the widely used point process models, such as the Poisson process or renewal models, which describe rate models


well [1]. Although PTSTs have been previously quantified in practice [14, 18, 126], there has not been a formal point process model to describe them. In this section, we provide a mathematically rigorous point process model and describe its properties.

In auditory cortex, it has been observed that some neurons only fire a single action potential with a precise delay after the stimulation [27]. The concept of "binary spiking" was introduced in [28], quantifying the probability of firing and the Fano factor. We extend the binary action potential model as a unit composing a general precisely timed spike train in which there is more than one of these PTAPs; therefore, we represent a PTST as a collection of PTAPs. Our model is autonomous, i.e., it does not have an input, unlike the generalized linear models [59, 87], which accurately model the spike train response given the full input trace to the system.

To obtain a model with manageable complexity, we assume that the neuronal system is stationary between trials. Changes in the system, such as learning and memory due to either the stimulation or a spontaneous process, would be considered cross-trial non-stationarity and are not considered in the present framework. This simplifying assumption is similar to the independence of action potentials in a Poisson process or the independence of intervals in a renewal process.

The statistical model describing the variability of a PTST is presented in two steps; first the model for a single precisely timed action potential is described, then the point process as a superposition of precisely timed action potentials, additive action potentials, and rate modulated action potentials.

3.2.1 Precisely Timed Action Potential Model

We use the random counting measure to describe the precisely timed action potential.

Definition 2. The counting process $N(t)$ of the precisely timed action potential (PTAP) is described by two independent random variables $R$ and $T$. $R$ is a Bernoulli random variable, i.e., $\Pr[R = 1] = p$ and $\Pr[R = 0] = 1 - p$. $T$ is an arbitrary non-negative real-valued random variable with finite mean and corresponding cumulative distribution


function $F(t)$. The counting process is defined as

    N(t) = R \cdot 1_{T \leq t}    (3-4)

where $1_{(\cdot)}$ is the indicator function.

$R$ represents the reliability; if it takes the value 0, the corresponding event does not occur.(2) $T$ represents the actual occurrence pattern of the event, which is the jitter distribution of the action potential (Fig. 3-3). Note that $T$ is completely ignored when $R$ takes 0. Note that definition 2 describes the full joint distribution for a finite collection of random variables $\{N(A_i)\}$ for any disjoint intervals $\{A_i\}$, where $N(A)$ denotes the counting random variable for the number of events in $A$. Thus it defines a random process. $N(t)$ only takes the value of either 0 or 1, with the following probability:

    \Pr[N(t) = 1] = p F(t)
    \Pr[N(t) = 0] = 1 - \Pr[N(t) = 1]    (3-5)

Clearly, this point process is highly temporally correlated. Once a point occurs at some time, there cannot be another point anywhere else. In other words, the conditional probability $\Pr[N(A) = 1 \mid N(B) = 1] = 0$ whenever $A \cap B = \emptyset$.

Temporal correlation and reliability are widely measured in the literature with either the Fano factor, the coefficient of variation, or, less often, the peri-stimulus time variance [75, 99]. Since these quantities depend on the window size when temporal correlation is present, we use a windowed version of the Fano factor.

(2) In the context of survival analysis, this is similar to the censoring random variable.
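Definition 2 translates directly into a two-line sampler, which can be used to check (3-5) empirically. A minimal sketch, assuming Python; `sample_T` stands in for an arbitrary jitter distribution $F$ (here uniform on $[0,1]$ in the check), and the names are illustrative.

```python
import random

def sample_ptap(p, sample_T):
    """One PTAP realization (Definition 2).

    Returns the spike time, or None when the unit did not fire:
    R ~ Bernoulli(p) gates the event; T ~ F gives its jittered timing.
    """
    if random.random() < p:   # reliability R
        return sample_T()     # timing T
    return None

random.seed(0)
p = 0.8
trials = [sample_ptap(p, lambda: random.uniform(0.0, 1.0))
          for _ in range(20000)]
# Empirical check of eq. (3-5) with F(t) = t on [0, 1]:
# Pr[N(0.5) = 1] should be close to p * F(0.5) = 0.4.
frac = sum(1 for t in trials if t is not None and t <= 0.5) / len(trials)
```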


Definition 3 (Windowed Fano Factor). The windowed Fano factor of a counting process $N(t)$ for an interval $A$ is defined as

    FF(N, A) = \frac{\mathrm{var}(N(A))}{E[N(A)]}    (3-6)

Note that $FF(N, [0, \infty))$ is the Fano factor of the entire process. It is well known that the Fano factor of any interval is unity for a Poisson process [23]. If the Fano factor is smaller than 1, we use the term sub-Poisson for the corresponding counting random variable. For a PTAP and an interval $A = [a, b)$, let $q = p(F(b) - F(a))$; then

    FF(N, A) = \frac{\mathrm{var}(N(A))}{E[N(A)]} = \frac{q(1 - q)}{q} = 1 - q \leq 1

Thus, $N(A)$ behaves like a Bernoulli random variable with probability $q$. Note that as the window size $(b - a)$ becomes smaller, the Fano factor approaches unity, as in the case of a renewal process [75].

The conditional intensity function is given by

    \lambda(t) = \lambda(t \mid \mathcal{H}_t) = \lim_{\delta \to 0} \frac{\Pr[N(t, t + \delta) = 1 \mid \mathcal{H}_t]}{\delta} = \frac{p f(t)}{1 - F(t)} 1_{T > t}    (3-7)

where $\mathcal{H}_t$ is the filtration (or history) at time $t$, whenever the derivative $f(t) = \frac{dF(t)}{dt}$ exists. Note that the conditional intensity function is a random variable. The marginal intensity function is given by

    \lambda(t) = E[\lambda(t \mid \mathcal{H}_t)] = \lim_{\delta \to 0} \frac{\Pr[N(t, t + \delta) = 1]}{\delta} = p f(t)    (3-8)

3.2.2 Precisely Timed Spike Train Model

We introduce the notion of the precisely timed spike train as a superposition of independent point processes representing PTAPs, with other rate modulations in addition.


Definition 4. The counting process for the precisely timed spike train (PTST) is defined as

    N(t) = \sum_{i=1}^{M} N_i(t) + N_{\lambda}(t)    (3-9)

where $\{N_i(t)\}_{i=1}^{M}$ is a collection of $M$ indistinguishable and independent PTAPs with corresponding $\{p_i, F_i(t)\}_{i=1}^{M}$, $N_i(t) = R_i 1_{T_i \leq t}$, and $N_{\lambda}(t)$ is a counting process for an inhomogeneous Poisson process with intensity function $\lambda(t)$.

The rate process $N_{\lambda}$ accounts for the additive action potentials that are assumed to be a rate response from the stimuli and/or the background activity. This is illustrated in figure 3-4. Note that the summation in (3-9) removes the origin of individual action potentials. A superposition of independent sub-Poisson processes (or equivalently, the summation of the corresponding counting processes) results in a sub-Poisson process, hence the PTST is restricted to be sub-Poisson. In fact, the Fano factor of the PTST is given as

    FF(N, A) = \frac{\sum_{i=1}^{M} q_i (1 - q_i) + q}{\sum_{i=1}^{M} q_i + q} \leq 1

where $q_i = p_i \int_A f_i(s) \, ds$ and $q = \int_A \lambda(s) \, ds$. It is easy to show that the marginal intensity function of the PTST is

    \lambda_N(t) = \sum_{i=1}^{M} p_i \frac{dF_i(t)}{dt} + \lambda(t) = \sum_{i=1}^{M} \lambda_i(t) + \lambda(t).    (3-10)

A likelihood measure for a spike train model is a probability measure on the set of all possible spike trains, including the empty spike train $(\emptyset)$:

    P(\emptyset) = \prod_{i=1}^{M} (1 - p_i) \exp\left( -\int \lambda(t) \, dt \right)

    \frac{dP}{d\mu}(t_1, t_2, \ldots, t_N) = \sum_{\mathrm{perm}} \prod_{k=L+1}^{M} (1 - p_{i_k}) \prod_{k=1}^{L} \lambda_{i_k}(t_{j_k}) \prod_{k=L+1}^{N} \lambda(t_{j_k}) \exp\left( -\int \lambda(t) \, dt \right)    (3-11)

where $N$ is the number of action potentials in a realization at corresponding times $t_1 < t_2 < \cdots < t_N$, and the summation is over all possible permutations assigning the

Figure 3-4. Illustration of the precisely timed spike train model. 5 PTAPs are shown on top with realizations in the middle. The bottom shows equi-intensity Poisson realizations.

action potentials to $L$ PTAPs and $N - L$ to the rate process. Due to the ambiguity created by the summation of counting processes, the likelihood measure for the PTST is combinatorial in nature. Notice that for the empty spike train the measure has finite probability, whereas for a non-empty spike train only the density (Radon-Nikodym derivative with respect to a Lebesgue measure) is well defined [116].

In figure 3-5, we fit real data from a neuronal culture with the proposed PTST model. Both the mean and the variance fit well, and the generated spike trains are almost indistinguishable from the real data.
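Sampling from Definition 4 is just a superposition: draw each PTAP independently, add a Poisson background, and sort. A minimal sketch, assuming Gaussian jitter for the $F_i$ and a homogeneous background rate; the parameterization is illustrative, not the thesis's fitting code.

```python
import random

def sample_ptst(ptaps, bg_rate, t_max):
    """One PTST realization (Definition 4).

    `ptaps` is a list of (p, mu, sd) triples: each PTAP fires with
    probability p at a Gaussian-jittered time (an assumed F_i).
    The additive action potentials come from a homogeneous Poisson
    background of rate bg_rate on [0, t_max).
    """
    spikes = []
    for p, mu, sd in ptaps:
        if random.random() < p:              # reliability R_i
            spikes.append(random.gauss(mu, sd))
    t = random.expovariate(bg_rate)          # Poisson background events
    while t < t_max:
        spikes.append(t)
        t += random.expovariate(bg_rate)
    return sorted(spikes)                    # superposition loses origins
```

The final sort mirrors the summation in (3-9): once the counting processes are added, the identity of each action potential (PTAP versus rate process) is no longer recoverable, which is exactly what makes the likelihood (3-11) combinatorial.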


The proposed PTST model is designed from observations to capture the trial-to-trial variability of precisely timed spike trains. It provides an intuitive and natural point process model for understanding the noise that the neural system has to overcome if the precise timing information is to be used optimally. Knowing the noise structure is critical for analyzing the temporal code [123], building an optimal neural decoder (e.g., a classifier) [30, 118], and deriving synaptic learning rules to enhance neural coding [11]. It also restricts the amount of information that can be encoded in such a manner by setting the resolution for distinguishable spike patterns. The space of all possible spike trains that encode information has finite volume due to biological constraints such as firing rate and intrinsic noise sources. The modeled variability gives a clue about how much of that space is available for coding. Hence, the current work is a complementary approach to understanding the neural code based on precisely timed spike trains.

Figure 3-5. Precisely timed spike train modeling of neuronal culture data. Data and model generated spike trains, and peri-stimulus mean and variance histograms (3 ms sliding window).


In this section, we only presented the model for a single spike train; however, the extension to multiple spike trains is straightforward. Using a population of precisely timed spike trains would be advantageous for decreasing the uncertainty of the representation and improving overall performance. In fact, the multiple interaction process (MIP) proposed by Kuhn and coworkers has a similar structure [64]. MIP was invented to theoretically analyze the effect of different kinds of synchrony on the firing rate of the post-synaptic neuron. The proposed model is a biologically relevant alternative for studying the statistics of neuron models.

The more widely used approach to point process modeling of spike trains is the renewal model [9, 120]. However, modeling the inter-spike interval (ISI) is not appropriate for precisely timed sequences of events. Since the precisely timed events often originate from the external world (e.g., patterns in sensory input), modeling the interval distribution makes less sense than modeling the timings measured against an external reference time. As demonstrated by Mainen and Sejnowski [69] and other in vivo visual stimulation studies [e.g., 102], stimulation with features distributed over time (as in frozen noise or natural stimulation) is the origin of PTAPs. If there is no temporal structure, as in the case of constant current injection, the timing error accumulates, which reduces the temporal correlation of APs among realizations and increases variability across trials despite the highly repeatable input [69].

The simplifying assumption in the proposed model is the independence among PTAPs. Several datasets showed in their count statistics that some action potentials tend to fire together or fire exclusively (data not shown). This correlated behavior is not captured by our model, but is straightforward to include.

3.3 Point Process Divergence and Dissimilarity

A divergence is a bivariate function of probability measures $d(P, Q): \mathcal{M} \times \mathcal{M} \to \mathbb{R}^+$ that evaluates how close two probability measures are, and takes the value of zero if and only if $P$ and $Q$ are identical. In statistics, a family of divergences called $\phi$-divergences (or


$f$-divergences) is widely used [67]:

    d_{\phi}(P, Q) = \int \phi\left( \frac{dQ}{dP} \right) dP,    (3-12)

where $\phi$ is a convex function and $\phi(1) = 0$. The $\phi$-divergences include other well known divergences such as the Kullback-Leibler and Hellinger divergences. Divergences are closely related to convex functions and information theory; e.g., the Kullback-Leibler divergence between the product of the marginal probabilities and the joint probability is the mutual information [22]. Hypothesis testing using divergences results in a distribution-free test, meaning that the test does not assume an underlying distribution of the samples; any distribution can be plugged in.

In neuroscience, conventionally the Wilcoxon test, t-test, and ANOVA are used for discriminating changes in the number of action potentials, the time to first spike, and the inter-spike interval (e.g., [63]). The statistics used in these tests are powerful, yet they are not divergences -- for example, given the distribution of the number of action potentials, there can be many distinct point processes for which such a statistic takes the value 0; such statistics are known as dissimilarities [96]. Therefore, the tests derived from them do not necessarily guarantee discrimination of the underlying probability laws.

The recently proposed distance measure for point processes by Paiva and coworkers [84] is also only a dissimilarity, not a divergence. This dissimilarity statistic for general point processes is defined from their (marginal) intensity functions $\lambda_P(t), \lambda_Q(t)$, which can readily be estimated from data:

    d_{L_2}(P, Q)^2 = \int \left( \lambda_P(t) - \lambda_Q(t) \right)^2 dt    (3-13)

Note that this is a comparison in the first order statistics only. We will compare these dissimilarities with the proposed methods in the experiments.

A generalization of (3-13) suggests the use of square-integrable conditional intensity functions in place of the marginal intensity functions, $\int \left( \lambda_P(t \mid \mathcal{H}_t) - \lambda_Q(t \mid \mathcal{H}_t) \right)^2 dt$. However, this


quantity is a random variable, hence an expectation is necessary to obtain a real-valued function:

$$d_{L2}(P,Q)^2 = E\!\left[\int \left(\lambda_P(t) - \lambda_Q(t)\right)^2 dt\right] \qquad (3\text{-}14)$$

Although (3-14) is a divergence, it is difficult to estimate this quantity in practice unless a parametric model is used.
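As a concrete illustration of the intensity-based dissimilarity (3-13), the following Python sketch smooths spike times with a Gaussian kernel to estimate each rate function and integrates the squared difference on a uniform grid. The function names, the observation window, and the default bandwidth are illustrative choices, not prescribed by the text.

```python
import numpy as np

def smoothed_rate(trials, grid, sigma):
    """Estimate the intensity lambda(t) by Gaussian smoothing of all spike
    times, averaged over trials (units: spikes per unit time)."""
    rate = np.zeros_like(grid)
    for spikes in trials:
        for t in spikes:
            rate += np.exp(-0.5 * ((grid - t) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return rate / len(trials)

def l2_dissimilarity(trials_p, trials_q, t_max=1.0, sigma=0.02):
    """Plug-in estimate of d_L2(P,Q)^2 = int (lambda_P - lambda_Q)^2 dt (3-13),
    evaluated on a grid an order of magnitude finer than the kernel width."""
    grid = np.arange(0.0, t_max, sigma / 10.0)
    diff = smoothed_rate(trials_p, grid, sigma) - smoothed_rate(trials_q, grid, sigma)
    return np.trapz(diff ** 2, grid)
```

Being a first order statistic, this quantity is blind to any difference beyond the rate profile, which is exactly the limitation the divergences developed below address.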


CHAPTER 4
CUMULATIVE BASED DIVERGENCE

4.1 Introduction

For distribution-free hypothesis testing, among the most widely used tests are the standard tests on cumulative distribution functions (CDFs), such as the Kolmogorov-Smirnov (K-S) test and the Cramer-von Mises (C-M) test. Their main advantage is that the CDF can be estimated consistently by the empirical CDF without any free parameter. The empirical distribution function for samples $\{x_1, \ldots, x_n\}$ is defined as

$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}_{x_i \le x}. \qquad (4\text{-}1)$$

The convergence is at least almost sure by the law of large numbers; however, it can be shown to be stronger:

Theorem 1 (Glivenko-Cantelli). $F_n$ converges to $F$ uniformly almost surely.

The two-sided Kolmogorov-Smirnov test statistic uses the $L^\infty$ metric of the difference between the empirical distributions, $KS = \sup_x |F_n(x) - G_n(x)|$, while the Cramer-von Mises test statistic uses the squared $L^2(\mathbb{R}, F+G)$ distance, $CM = \int (F_n(x) - G_n(x))^2 \, d(F+G)$. These statistics deviate significantly from zero when $F \ne G$. The K-S test measures only the maximum deviation, while the C-M test measures the squared sum of deviations; neither one always performs better than the other.

The concept of a CDF is only well defined for real random variables, where a full ordering is provided. In the space of spike trains there is no naturally provided ordering. However, within each stratum it is possible to have a multi-dimensional CDF.¹ The stratified space approach to representing point processes allows these traditional statistical tests to be implemented. Therefore, the extension of the K-S and C-M tests to point processes requires careful application of stratification.

¹ The level of the stratified model is developed in chapter 3.
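On the real line, both statistics can be computed from the empirical CDFs evaluated at the pooled sample points, since that is where the empirical CDFs change value. A minimal Python sketch follows; normalizing the C-M sum by the pooled empirical measure is one common convention and is an assumption of this sketch.

```python
import numpy as np

def ecdf(samples, t):
    """Empirical CDF F_n(t) = (1/n) sum_i 1{x_i <= t}, as in (4-1)."""
    samples = np.asarray(samples, dtype=float)
    return np.mean(samples[:, None] <= np.asarray(t, dtype=float)[None, :], axis=0)

def ks_cm_statistics(x, y):
    """Two-sample K-S (maximum deviation) and C-M (integrated squared
    deviation with respect to the pooled empirical measure) statistics."""
    pooled = np.sort(np.concatenate([x, y]))
    fx, fy = ecdf(x, pooled), ecdf(y, pooled)
    diff = fx - fy
    return np.max(np.abs(diff)), np.mean(diff ** 2)
```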


4.2 Extended Kolmogorov-Smirnov Divergence

A Kolmogorov-Smirnov (K-S) type divergence between $P$ and $Q$ can be derived from the $L^1$ distance between the probability measures, following the equivalent representation

$$d_1(P,Q) = \int d|P - Q| \ge \sup_{A \in \mathcal{F}} |P(A) - Q(A)|. \qquad (4\text{-}2)$$

Since (4-2) is difficult, and perhaps impossible, to estimate directly without a model, our strategy is to use the stratified spaces $(\Omega_0, \Omega_1, \ldots)$ defined in section 3.1.3, and to take the supremum only within each conditioned probability measure, approximating the lower bound. Let $\mathcal{F}_i = \mathcal{F} \cap \Omega_i := \{F \cap \Omega_i \mid F \in \mathcal{F}\}$. Since $\bigcup_i \mathcal{F}_i \subset \mathcal{F}$,

$$d_1(P,Q) \ge \sum_{n \in \mathbb{N}} \sup_{A \in \mathcal{F}_n} |P(A) - Q(A)| = \sum_{n \in \mathbb{N}} \sup_{A \in \mathcal{F}_n} \left| P(\Omega_n) P(A \mid \Omega_n) - Q(\Omega_n) Q(A \mid \Omega_n) \right|.$$

Since each $\Omega_n$ is a Euclidean space, we can induce the traditional K-S test statistic by further reducing the search space to $\tilde{\mathcal{F}}_n = \{\prod_i (-\infty, t_i] \mid \mathbf{t} = (t_1, \ldots, t_n) \in \mathbb{R}^n\}$. This results in the following inequality:

$$\sup_{A \in \mathcal{F}_n} |P(A) - Q(A)| \ge \sup_{A \in \tilde{\mathcal{F}}_n} |P(A) - Q(A)| = \sup_{\mathbf{t} \in \mathbb{R}^n} \left| P(\Omega_n) F_P^{(n)}(\mathbf{t}) - Q(\Omega_n) F_Q^{(n)}(\mathbf{t}) \right|, \qquad (4\text{-}3)$$

where $F_P^{(n)}(\mathbf{t}) = P[T_1 \le t_1 \wedge \cdots \wedge T_n \le t_n]$ is the cumulative distribution function (CDF) corresponding to the probability measure $P$ conditioned on $\Omega_n$. Hence, we define the K-S divergence as

$$d_{KS}(P,Q) = \sum_{n \in \mathbb{N}} \sup_{\mathbf{t} \in \mathbb{R}^n} \left| P(\Omega_n) F_P^{(n)}(\mathbf{t}) - Q(\Omega_n) F_Q^{(n)}(\mathbf{t}) \right|. \qquad (4\text{-}4)$$

Given a finite number of samples $X = \{x_i\}_{i=1}^{N_P}$ and $Y = \{y_j\}_{j=1}^{N_Q}$ from $P$ and $Q$ respectively, we have the following estimator for equation (4-4):

$$\hat{d}_{KS}(P,Q) = \sum_{n \in \mathbb{N}} \sup_{\mathbf{t} \in \mathbb{R}^n} \left| \hat{P}(\Omega_n) \hat{F}_P^{(n)}(\mathbf{t}) - \hat{Q}(\Omega_n) \hat{F}_Q^{(n)}(\mathbf{t}) \right| = \sum_{n \in \mathbb{N}} \sup_{\mathbf{t} \in X_n \cup Y_n} \left| \hat{P}(\Omega_n) \hat{F}_P^{(n)}(\mathbf{t}) - \hat{Q}(\Omega_n) \hat{F}_Q^{(n)}(\mathbf{t}) \right|, \qquad (4\text{-}5)$$


where $X_n = X \cap \Omega_n$, and $\hat{P}$ and $\hat{F}_P$ are the empirical probability and the empirical CDF, respectively. Notice that we search for the supremum only over the locations of the realizations $X_n \cup Y_n$, and not over the whole of $\mathbb{R}^n$, since the empirical CDF difference $\hat{P}(\Omega_n)\hat{F}_P^{(n)}(\mathbf{t}) - \hat{Q}(\Omega_n)\hat{F}_Q^{(n)}(\mathbf{t})$ only changes value at those locations.

Theorem 2 ($d_{KS}$ is a divergence).

$$d_1(P,Q) \ge d_{KS}(P,Q) \ge 0 \qquad (4\text{-}6)$$
$$d_{KS}(P,Q) = 0 \iff P = Q \qquad (4\text{-}7)$$

Proof. The first property and the ($\Leftarrow$) direction of the second property are trivial. From the definition of $d_{KS}$ and the properties of the CDF, $d_{KS}(P,Q) = 0$ implies that $P(\Omega_n) = Q(\Omega_n)$ and $F_P^{(n)} = F_Q^{(n)}$ for all $n \in \mathbb{N}$. Given probability measures for each $(\Omega_n, \mathcal{F}_n)$, denoted $P_n$ and $Q_n$, there exist corresponding unique extended measures $P$ and $Q$ on $(\Omega, \mathcal{F})$ such that their restrictions to $(\Omega_n, \mathcal{F}_n)$ coincide with $P_n$ and $Q_n$; hence $P = Q$.

Theorem 3 (Consistency of the K-S divergence estimator). As the sample size approaches infinity,

$$d_{KS} - \hat{d}_{KS} \xrightarrow{a.u.} 0 \qquad (4\text{-}8)$$

Proof. Note that $\left| \sum_n \sup (\cdot) - \sum_n \sup (\cdot) \right| \le \sum_n \left| \sup (\cdot) - \sup (\cdot) \right|$. Due to the triangle inequality for the supremum norm,

$$\left| \sup_{\mathbf{t} \in \mathbb{R}^n} \left| P(\Omega_n) F_P^{(n)}(\mathbf{t}) - Q(\Omega_n) F_Q^{(n)}(\mathbf{t}) \right| - \sup_{\mathbf{t} \in \mathbb{R}^n} \left| \hat{P}(\Omega_n) \hat{F}_P^{(n)}(\mathbf{t}) - \hat{Q}(\Omega_n) \hat{F}_Q^{(n)}(\mathbf{t}) \right| \right| \le \sup_{\mathbf{t} \in \mathbb{R}^n} \left| P(\Omega_n) F_P^{(n)}(\mathbf{t}) - Q(\Omega_n) F_Q^{(n)}(\mathbf{t}) - \left( \hat{P}(\Omega_n) \hat{F}_P^{(n)}(\mathbf{t}) - \hat{Q}(\Omega_n) \hat{F}_Q^{(n)}(\mathbf{t}) \right) \right|.$$


Again, using the triangle inequality, we can show the following:

$$\begin{aligned}
&\left| P(\Omega_n) F_P^{(n)}(\mathbf{t}) - Q(\Omega_n) F_Q^{(n)}(\mathbf{t}) - \hat{P}(\Omega_n) \hat{F}_P^{(n)}(\mathbf{t}) + \hat{Q}(\Omega_n) \hat{F}_Q^{(n)}(\mathbf{t}) \right| \\
&\quad = \Big| P(\Omega_n) F_P^{(n)}(\mathbf{t}) - P(\Omega_n) \hat{F}_P^{(n)}(\mathbf{t}) - Q(\Omega_n) F_Q^{(n)}(\mathbf{t}) + Q(\Omega_n) \hat{F}_Q^{(n)}(\mathbf{t}) \\
&\qquad\quad + P(\Omega_n) \hat{F}_P^{(n)}(\mathbf{t}) - \hat{P}(\Omega_n) \hat{F}_P^{(n)}(\mathbf{t}) + \hat{Q}(\Omega_n) \hat{F}_Q^{(n)}(\mathbf{t}) - Q(\Omega_n) \hat{F}_Q^{(n)}(\mathbf{t}) \Big| \\
&\quad \le P(\Omega_n) \left| F_P^{(n)}(\mathbf{t}) - \hat{F}_P^{(n)}(\mathbf{t}) \right| + Q(\Omega_n) \left| F_Q^{(n)}(\mathbf{t}) - \hat{F}_Q^{(n)}(\mathbf{t}) \right| + \hat{F}_P^{(n)}(\mathbf{t}) \left| P(\Omega_n) - \hat{P}(\Omega_n) \right| + \hat{F}_Q^{(n)}(\mathbf{t}) \left| Q(\Omega_n) - \hat{Q}(\Omega_n) \right|.
\end{aligned}$$

Then the theorem follows from the Glivenko-Cantelli theorem and from $\hat{P}, \hat{Q} \xrightarrow{a.s.} P, Q$.

4.3 Extended Cramer-von Mises Divergence

We can extend equation (4-4) to derive a Cramer-von Mises (C-M) type divergence for point processes. Let $\mu = \frac{P+Q}{2}$; then $P$ and $Q$ are absolutely continuous with respect to $\mu$. Note that $F_P^{(n)}, F_Q^{(n)} \in L^2(\Omega_n, \mu|_{\Omega_n})$, where $\mu|_{\Omega_n}$ denotes the restriction of $\mu$ to $\Omega_n$; i.e., the CDFs are $L^2$ integrable, since they are bounded. Analogous to the relation between the K-S test and the C-M test, we would like to use an integrated squared deviation statistic in place of the maximal deviation statistic. Integrating over the probability measure instead of taking the supremum, and using the $L^2$ instead of the $L^\infty$ distance in (4-4), we define

$$d_{CM}(P,Q) = \sum_{n \in \mathbb{N}} \int_{\mathbb{R}^n} \left( P(\Omega_n) F_P^{(n)}(\mathbf{t}) - Q(\Omega_n) F_Q^{(n)}(\mathbf{t}) \right)^2 d\mu|_{\Omega_n}(\mathbf{t}). \qquad (4\text{-}9)$$

This can be seen as a direct extension of the C-M criterion. The corresponding estimator can be derived using the strong law of large numbers:

$$\hat{d}_{CM}(P,Q) = \sum_{n \in \mathbb{N}} \left[ \frac{1}{2} \sum_i \left( \hat{P}(\Omega_n) \hat{F}_P^{(n)}(x_i^{(n)}) - \hat{Q}(\Omega_n) \hat{F}_Q^{(n)}(x_i^{(n)}) \right)^2 + \frac{1}{2} \sum_i \left( \hat{P}(\Omega_n) \hat{F}_P^{(n)}(y_i^{(n)}) - \hat{Q}(\Omega_n) \hat{F}_Q^{(n)}(y_i^{(n)}) \right)^2 \right]. \qquad (4\text{-}10)$$
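The estimators (4-5) and (4-10) reduce to counting: within each stratum $\Omega_n$, the weighted empirical CDF is the fraction of all observed trains that land in $\Omega_n$ and whose $n$ sorted spike times are componentwise below the evaluation point. The following Python sketch uses illustrative conventions (spike trains as sequences of times; the sums in (4-10) taken as averages over each sample):

```python
import numpy as np
from collections import defaultdict

def _by_count(sample):
    """Stratify a sample of spike trains by spike count n."""
    groups = defaultdict(list)
    for train in sample:
        groups[len(train)].append(np.sort(np.asarray(train, dtype=float)))
    return groups

def _weighted_cdf(group, total, t):
    """hat-P(Omega_n) * hat-F_P^(n)(t): fraction of all observed trains that
    fall in Omega_n and are componentwise <= t."""
    if not group:
        return 0.0
    arr = np.stack(group)                      # (m, n) rows of sorted spike times
    return np.sum(np.all(arr <= t, axis=1)) / total

def ks_cm_divergences(X, Y):
    """Plug-in K-S (4-5) and C-M (4-10 style) divergence estimates."""
    gx, gy = _by_count(X), _by_count(Y)
    n_x, n_y = len(X), len(Y)
    d_ks = d_cm = 0.0
    for n in sorted(set(gx) | set(gy)):
        pts = gx.get(n, []) + gy.get(n, [])    # the empirical CDFs change only here
        diffs = np.array([_weighted_cdf(gx.get(n, []), n_x, t)
                          - _weighted_cdf(gy.get(n, []), n_y, t) for t in pts])
        d_ks += np.abs(diffs).max()
        m = len(gx.get(n, []))
        part_x = np.mean(diffs[:m] ** 2) if m else 0.0
        part_y = np.mean(diffs[m:] ** 2) if len(pts) > m else 0.0
        d_cm += 0.5 * (part_x + part_y)
    return d_ks, d_cm
```

Both quantities are zero when the two samples coincide, and they grow as the count distributions or the within-stratum timing distributions drift apart.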


Theorem 4 ($d_{CM}$ is a divergence). For $P$ and $Q$ with square integrable CDFs,

$$d_{CM}(P,Q) \ge 0 \qquad (4\text{-}11)$$
$$d_{CM}(P,Q) = 0 \iff P = Q. \qquad (4\text{-}12)$$

Proof. Similar to theorem 2.

Theorem 5 (Consistency of the C-M divergence estimator). As the sample size approaches infinity,

$$d_{CM} - \hat{d}_{CM} \xrightarrow{a.u.} 0 \qquad (4\text{-}13)$$

Proof. Similar to (4-8), we find an upper bound and show that the bound uniformly converges to zero. To simplify the notation, we define $g_n(x) = P(\Omega_n) F_P^{(n)}(x) - Q(\Omega_n) F_Q^{(n)}(x)$ and $\hat{g}_n(x) = \hat{P}(\Omega_n) \hat{F}_P^{(n)}(x) - \hat{Q}(\Omega_n) \hat{F}_Q^{(n)}(x)$. Note that $\hat{g}_n \xrightarrow{a.u.} g_n$ by the Glivenko-Cantelli theorem and $\hat{P} \xrightarrow{a.s.} P$ by the strong law of large numbers. Writing $d_{CM} = \frac{1}{2}\sum_n \left( \int g_n^2 \, dP|_{\Omega_n} + \int g_n^2 \, dQ|_{\Omega_n} \right)$ and $\hat{d}_{CM} = \frac{1}{2}\sum_n \left( \int \hat{g}_n^2 \, d\hat{P}|_{\Omega_n} + \int \hat{g}_n^2 \, d\hat{Q}|_{\Omega_n} \right)$,

$$\left| d_{CM} - \hat{d}_{CM} \right| \le \frac{1}{2} \sum_{n \in \mathbb{N}} \left( \left| \int g_n^2 \, dP|_{\Omega_n} - \int \hat{g}_n^2 \, d\hat{P}|_{\Omega_n} \right| + \left| \int g_n^2 \, dQ|_{\Omega_n} - \int \hat{g}_n^2 \, d\hat{Q}|_{\Omega_n} \right| \right),$$

where $\hat{P}$ and $\hat{Q}$ are the empirical measures corresponding to $\{x_i\}$ and $\{y_i\}$. Without loss of generality, we only bound the first term; the corresponding term for $Q$ is bounded similarly:

$$\begin{aligned}
\left| \int g_n^2 \, dP|_{\Omega_n} - \int \hat{g}_n^2 \, d\hat{P}|_{\Omega_n} \right| &= \left| \int g_n^2 \, dP|_{\Omega_n} - \int \hat{g}_n^2 \, dP|_{\Omega_n} + \int \hat{g}_n^2 \, dP|_{\Omega_n} - \int \hat{g}_n^2 \, d\hat{P}|_{\Omega_n} \right| \\
&\le \left| \int \left( g_n^2 - \hat{g}_n^2 \right) dP|_{\Omega_n} \right| + \left| \int \hat{g}_n^2 \, d\!\left( P|_{\Omega_n} - \hat{P}|_{\Omega_n} \right) \right|.
\end{aligned}$$


Applying the Glivenko-Cantelli theorem and the strong law of large numbers, both terms converge to zero since $\hat{g}_n^2$ is bounded. Hence, the C-M estimator is consistent.

4.4 Simulation Results

We present a set of two-sample problems and apply various statistics to perform hypothesis testing. As a baseline measure, the widely used Wilcoxon rank-sum test (or, equivalently, the Mann-Whitney U test) is performed on the count distribution (e.g., [60]); this is a non-parametric median test on the total number of action potentials. In addition, the difference in the rate function, i.e., the one-dimensional first-order summarizing statistic of the point process, is measured with an integrated squared deviation statistic, $L^2 = \int (\lambda_1(t) - \lambda_2(t))^2 dt$, where $\lambda(t)$ is estimated by smoothing the spike timings with a Gaussian kernel, evaluated on a uniform grid at least an order of magnitude finer than the standard deviation of the kernel. We report the performance of this test with varying kernel sizes. All tests are quantified by the power of the test given a significance threshold (type-I error) of 0.05. The null hypothesis distribution is computed empirically, either by generating independent samples or by permuting the data, to create at least 1000 values.

4.4.1 Poisson Process

The Poisson process is the simplest point process model, widely used as a basic stochastic neuron model. We test whether the methods can detect a difference in the rate profile while the average rate is held constant. In a Poisson process, the count distribution has variance equal to the mean count; hence, for higher rates the data become sparsely spread among the dimensions $\Omega_n$. For this example with rate 3, 90% of the count distribution is concentrated in the range of one to seven spikes. Since the rate function fully describes this process, the rate based statistic $L^2$ performs best in this case (see Figure 4-1). Also, because the count distributions for $H_0$ and $H_1$ are identically Poisson distributed, the Wilcoxon test (denoted N)


Figure 4-1. Statistical power of cumulative based divergences on Poisson processes. (Left) Spike trains from the null and alternate hypotheses. The rate function is constant during each 100 ms interval. The rate of $H_0$ changes from 20 to 10 spk/s, and for $H_1$ it changes from 10 to 20 spk/s. (Right) Comparison of the power of each method. The Wilcoxon test on the mean count (labeled N) stays around the threshold level (0.05). All other methods are empirically consistent for this example. The error bars are standard deviations over 10 Monte Carlo runs.

fails to detect the difference. The K-S and C-M based divergences perform similarly, yet it is interesting to note that C-M is consistently better than the K-S divergence.

4.4.2 Stationary Renewal Processes

The renewal process is a widely used point process model that compensates for deviations from the Poisson process [75]. A stationary renewal process with a gamma interval distribution is simulated. The mean rates are the same; therefore, the rate function statistic and the Wilcoxon test do not yield consistent results, while the proposed measures attain high power with a small number of samples. The C-M test is more powerful than K-S in this case; this can be interpreted by the fact that the difference in the cumulative distributions is not concentrated but spread out over time because of the stationarity.
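The gamma renewal setup of this section (Figure 4-2: shape 3 versus 0.5, with the mean count fixed at 10) can be simulated in a few lines of Python. This is a minimal sketch: it starts the process fresh at time zero rather than from the equilibrium delay distribution, so the realization is only approximately stationary.

```python
import numpy as np

def gamma_renewal_train(shape, mean_count, t_max=1.0, rng=None):
    """One spike train on [0, t_max) from a renewal process with
    gamma(shape, scale) ISIs; scale chosen so E[ISI] = t_max / mean_count."""
    rng = np.random.default_rng() if rng is None else rng
    scale = t_max / (mean_count * shape)
    times, t = [], 0.0
    while True:
        t += rng.gamma(shape, scale)
        if t >= t_max:
            return np.array(times)
        times.append(t)
```

Decreasing the shape parameter below 1 makes the intervals burstier than Poisson while leaving the mean rate unchanged, which is exactly the regime where count- and rate-based statistics lose power.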


Figure 4-2. Gamma distributed renewal process with shape parameter $\theta = 3$ ($H_0$) and $\theta = 0.5$ ($H_1$). The mean number of action potentials is fixed to 10. (Left) Spike trains from the null and alternate hypotheses. (Right) Comparison of the power of each method. The error bars are standard deviations over 20 Monte Carlo runs.

Figure 4-3. Precisely timed spike train model ($H_0$) versus equi-intensity Poisson process ($H_1$). Spike trains from the null and alternate hypotheses for $L = 4$.


Figure 4-4. Comparison of the power of each method for $L = 1, 2, 3, 4$ on the precisely timed spike train model ($H_0$) versus the equi-intensity Poisson process ($H_1$). See Figure 4-3 for example spike trains. (Left) Power comparison for all methods except N. The rate statistics $L^2$ are not labeled, since they are not able to detect the difference. (Right) Wilcoxon test on the number of action potentials. The error bars are standard deviations over 10 Monte Carlo runs.

4.4.3 Precisely Timed Spike Trains

As described in section 3.2, a precisely timed spike train in an interval is modeled by $L$ probability density and probability pairs $\{(f_i(t), p_i)\}_{i=1}^{L}$. Each $f_i(t)$ corresponds to the temporal jitter, and $p_i$ corresponds to the probability of generating the spike. Each realization of the PTST model produces at most $L$ spikes. The equi-intensity Poisson process has the rate function $\lambda(t) = \sum_i p_i f_i(t)$. We test whether the methods can differentiate between the PTST ($H_0$) and the equi-intensity Poisson process ($H_1$) for $L = 1, 2, 3, 4$ (see Figure 4-3). Note that $L$ determines the maximum dimension for the PTST. The $f_i(t)$ were equal-variance Gaussian distributions on a grid sampled from a uniform random variable, and $p_i = 0.9$. As shown in Figure 4-4, only the proposed methods perform well. Since the rate function profile is identical for both models, the rate function statistic $L^2$ fails to differentiate. The Wilcoxon test does work for intermediate dimensions; however, its performance is highly variable and unpredictable. This is because the assumption that the


count distributions come from a location family is violated in this example. In contrast to the example in section 4.4.1, the K-S test is consistently better than the C-M statistic in this problem.

4.4.4 Neuron Model

Figure 4-5. Izhikevich neuron model. (Left) Example spike trains from the model for the baseline current (top) and with increased gain (middle). The bottom trace is the waveform of the injected current. (Right) Comparison of the power of each method for different input scalings. TTFS is the K-S statistic for the jitter distribution of the time to first spike.

We investigate the sensitivity of the proposed methods in a neurophysiological scenario. We simulated repeated injection of current into a neuron model and observed the output spike train pattern. By varying the gain factor of the input, we investigate the sensitivity of the various statistics to a realistic statistical change. Izhikevich's simplified neuron model for a Class 2 neuron is stimulated with noisy current injection [48, Ch. 8]. The dynamics of the model are fully described by the following equations:

$$\dot{v} = 0.04(v+42)(v+82) - u + I, \qquad \dot{u} = 0.2(0.26v - u); \qquad \text{if } v \ge 30 \text{, then } v \leftarrow -65,$$


where spike times are recorded whenever $v$ is reset. Frozen noise is generated from a white Gaussian waveform with a duration of 40 ms. The noise is injected into the neuron model together with an additional noise current that is generated anew for each trial. The neuron model is sensitive to certain features of the noise [69]; hence, precisely timed action potentials are observed (see Figure 4-5). When the frozen noise was scaled up, the action potentials generally reduced their variance and fired at earlier times on average (see Figure 4-5). The input current was adjusted such that 2 action potentials are generated on average. The sample size is 30 spike trains for each condition. The C-M and K-S measures both perform at least as well as the best rate based measure. Additionally, a K-S test on the jitter distribution of the first event (time to first spike), ignoring the second spike timing, is compared and shown to be slightly worse than the proposed methods.

4.4.5 Serial Correlation Model

Figure 4-6. Renewal process with ($H_0$) or without ($H_1$) serial correlation. (Left) Spike trains from the null and alternate hypotheses. (Right) Comparison of the power of each method. The error bars are standard deviations over 10 Monte Carlo runs.

Serial correlation in a point process is defined by the autocovariance of the intervals (the times between spikes). Due to the internal dynamics of neurons, spike trains can have


serial correlation [33]. We simulated a renewal process without serial correlation and a point process with the same marginal interval distribution but with a non-zero serial correlation [15]. We restrict our problem to two intervals only, where each interval distribution is a convolution of two uniform distributions. It turns out this problem is quite difficult, and hundreds of samples are needed to discriminate them (see Figure 4-6). Both K-S and C-M type tests perform equally well, while the Wilcoxon test fails to differentiate between the two point processes. An optimized kernel size for the rate function statistic outperforms both the K-S and C-M type tests.

4.5 Optimal Stimulation Parameter Selection

4.5.1 Problem

In the context of brain-machine interfaces, one of the challenges is to create a sensory neural prosthetic: a device that artificially generates the perception of an appropriate sensation. One way of achieving this is through electrical stimulation of the sensory pathway, such as the thalamus. If one accepts the notion that similar sensory cortex activation leads to similar sensory perception, the problem becomes optimizing the stimulation for maximum similarity. Thus, a principled method for measuring similarity is essential. Since we may not be able to find the exact parameter that generates the target response pattern, the similarity measure needs to be well behaved over a wide range of responses, providing distinct similarity values. An experimental dataset consisting of multi-trial measurements of sensory cortex responses to natural touch stimuli of the rat paw is used. The natural stimulation serves as the target point process, while a fixed number of parameterized microstimulus responses create the search space.

4.5.2 Method

The data used in this analysis are from a single rat implanted with multi-site electrodes. The anesthetized rat's neuronal activity was recorded from 16 cortical


Figure 4-7. Spike trains from the sensory cortex in response to natural and electrical stimuli. Each box shows the raster of 140 spike trains (except for the natural stimulus, which has 240) of duration 40 ms. The spike trains are sorted by the number of spikes and the time to first spike.

electrodes in S1 and 16 thalamic electrodes in the ventral posterolateral nucleus (VPL) area. The rat was anesthetized with isoflurane, followed by a nembutal injection, and maintained with isoflurane. Neurons with significant natural responses were identified by natural touch stimulation via thwacking of the first digit of the rat's right paw at 2 Hz for 120 seconds, yielding 240 thwack responses. The thalamic microstimulation was performed on two electrodes, chosen for their significant response to natural touch (based on the audible quality of the spiking). The stimulations consisted of single biphasic pulses with specified pulse duration and amplitude, always at a frequency of 2 Hz. During the trial, 19 distinct pairs of pulse duration and amplitude were applied, with the 140 responses from each pair randomly permuted throughout the recording. Thus, the goal was to compare the 240 natural touch responses to the 140 responses for each pair. In order to compare the temporal responses, the natural touch and microstimulation responses needed to be aligned. A difficulty with microstimulation is that it produces an artifact that


prevents recording for nearly 6 ms on the entire amplifier array. The corresponding initial 6 ms of the response to natural touch was removed. A window of 40 ms was used for the comparison. The firing rate returned to the spontaneous level by the end of the window, as in figure 4-9.

Figure 4-8. Dissimilarity/divergences across parameter sets. Each measure is shifted and scaled to be in the range of 0 to 1. $L^2$ used 2.5 ms bins with no smoothing.

Figure 4-9. The natural response (left), the microstimulation set selected by $L^2$ (center), and the set selected by K-S and C-M (right). The top row shows the spike trains stratified by number of spikes and then by spike time. The bottom row shows the average response PSTH binned at 2.5 ms; the windowed variance is shown as a thin green line.

As an application of the proposed measures, we demonstrate how they can be used as a cost function to be minimized. The problem is to find the optimal electrical stimulation


parameters that produce a spiking pattern in the sensory cortex closest to the target pattern generated by natural sensory input. Given 240 responses from the natural input and around 140 responses from each stimulation parameter (ranging over the duration and amplitude of a biphasic current injection), we use the proposed divergences to find the most similar response pattern, taking the variability structure over trials into account. The results of applying the C-M, K-S, and $L^2$ measures between the set of natural stimulus responses and each parameter set are shown in Figure 4-8. The overall trend among the dissimilarity/divergences is consistent, but the location of the minimum does not coincide for $L^2$. The natural stimuli and the optimal responses are shown in Figure 4-9. The set selected by the proposed methods matches the early timing pattern better. In fact, the deviation of the windowed variance from the mean supports the non-Poisson nature of the response, probably due to the time-locked first action potential.

4.6 Conclusion

We have proposed two novel measures of divergence between point processes based on cumulative distribution functions. The proposed measures have been derived from the basic probability law of a point process, and we have shown that these measures can be estimated from data efficiently and consistently. Using divergences for statistical inference lets us break free from using only first and second order statistics, and enables distribution-free spike train analysis. The time complexity of both methods is $O\!\left(\sum_n n\left(N_P(n) N_Q(n) + N_P^2(n) + N_Q^2(n)\right)\right)$, where $N_P(n)$ is the number of spike trains from $P$ that have $n$ spikes. In practice this is often faster than the binned rate function estimation, which has time complexity $O(BN)$, where $B$ is the number of bins and $N = \sum_n n\left(N_P(n) + N_Q(n)\right)$ is the total number of spikes in all the samples. In several of the examples demonstrated here, the $L^2$ distance of the rate function statistic outperformed our proposed methods. However, it involves a search over the smoothing kernel size and bin size, which can make the process slow and prohibitive.


In addition, it brings the danger of multiple testing, since some smoothing kernel sizes may pick up spurious patterns that are only fluctuations due to the finite sample size.


CHAPTER 5
φ-DIVERGENCE AND HILBERTIAN METRIC

Recall the definition of the φ-divergence (3-12):

$$D_\phi(P,Q) = \int \phi\!\left(\frac{dQ}{dP}\right) dP. \qquad (5\text{-}1)$$

The φ-divergence family encompasses several well known divergences, such as the Kullback-Leibler divergence, the Hellinger distance, the total variation, and the χ²-divergence, which are widely applied in estimation theory, decision theory, the formulation of classification bounds, and information theory [67]. While φ-divergences are not metrics in general, a subclass of φ-divergences are squared Hilbertian metrics on the space of probability laws. We investigate two special cases in the intersection: the Hellinger divergence and the symmetric χ²-divergence. We extend these to point processes and present estimators via Radon-Nikodym derivatives. We present two distinct methods for estimating the Radon-Nikodym derivative: via direct density estimation, or via approximating the derivative with kernel methods. In addition, two datasets are analyzed for change detection and clustering applications.

5.1 Hilbertian Metric

A statistical divergence is not a metric in general; neither symmetry nor the triangle inequality is satisfied. Symmetrization can be achieved by defining $D(Q,P) + D(P,Q)$; however, satisfying the triangle inequality is not trivial. Since many algorithms can operate in a metric space, it is desirable to study the divergences that have the metric properties, and hence to work in a metric space of probability measures. Moreover, as we will see, a family of divergences known as Hilbertian metrics provides a distance on the probability space that can be isometrically embedded in a Hilbert space. These divergences are closely related to kernel methods; in fact, there exists a family of corresponding symmetric positive definite kernels on probability measures. Therefore, we can provide an inner product structure in addition to the distance through these divergences.
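For intuition, (5-1) is easy to evaluate for discrete distributions, where the Radon-Nikodym derivative is simply the ratio of probability mass functions. A minimal Python sketch follows; the helper name and the convention of skipping points where $p = 0$ are illustrative assumptions of this sketch.

```python
import numpy as np

def phi_divergence(p, q, phi):
    """D_phi(P,Q) = sum_x phi(q(x)/p(x)) p(x), the discrete form of (5-1);
    terms with p(x) = 0 carry zero mass and are skipped."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(phi(q[mask] / p[mask]) * p[mask]))

# two members of the family: phi(r) = (1 - sqrt(r))^2 gives the squared
# Hellinger distance; phi(r) = -log(r) gives the Kullback-Leibler divergence
hellinger2 = lambda p, q: phi_divergence(p, q, lambda r: (1.0 - np.sqrt(r)) ** 2)
kl = lambda p, q: phi_divergence(p, q, lambda r: -np.log(r))
```

Both choices of $\phi$ are convex with $\phi(1) = 0$, so each evaluates to zero if and only if the two distributions coincide.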


A metric $d$ is a Hilbertian metric if and only if $-d^2(x,y)$ is a conditionally positive definite (cpd) kernel¹ [7, 108]. Since probabilities and their densities take positive real values, one can induce a metric between probability measures from a distance on the positive reals. In order to ensure that the distance is independent of the reference measure, which is crucial for its evaluation, the distance must be a $\frac{1}{2}$-homogeneous Hilbertian metric.

Definition 5 (γ-homogeneous function [43]). A symmetric function $f$ is γ-homogeneous if for all $c \in \mathbb{R}^+$,

$$f(cx, cy) = c^\gamma f(x, y). \qquad (5\text{-}2)$$

Hein and Bousquet defined a family of such metrics that lead to Hilbertian metrics on probability measures.

Theorem 6 (Fuglede [36], Hein and Bousquet [43]). The function $d_{\alpha|\beta}: \mathbb{R}^+ \times \mathbb{R}^+ \to \mathbb{R}$ defined by

$$d^2_{\alpha|\beta}(x,y) = \frac{2^{-1/\alpha}\left(x^\alpha + y^\alpha\right)^{1/\alpha} - 2^{-1/\beta}\left(x^\beta + y^\beta\right)^{1/\beta}}{2^{-1/\alpha} - 2^{-1/\beta}} \qquad (5\text{-}3)$$

is a $\frac{1}{2}$-homogeneous Hilbertian metric on $\mathbb{R}^+$ whenever $\beta \in [1, \infty]$ and $\alpha \in [\frac{1}{2}, \beta]$ or $\alpha \in [-\infty, -1]$; the expression is symmetric in the two indices.

Proposition 7 (Hein and Bousquet [43]). Let $P$ and $Q$ be two probability measures on $\mathcal{X}$, and let $\mu$ be a measure that dominates both probability measures. Let $d_{\mathbb{R}^+}$ be a $\frac{1}{2}$-homogeneous Hilbertian metric on $\mathbb{R}^+$. Then $d_{\mathcal{M}}$ defined as

$$d^2_{\mathcal{M}}(P,Q) = \int_{\mathcal{X}} d^2_{\mathbb{R}^+}\!\left(\frac{dP}{d\mu}, \frac{dQ}{d\mu}\right) d\mu \qquad (5\text{-}4)$$

is a Hilbertian metric on $\mathcal{M}$. $d^2_{\mathcal{M}}$ is independent of the dominating measure $\mu$.

¹ A symmetric kernel $K$ is cpd if $\sum_i \sum_j c_i c_j K(x_i, x_j) \ge 0$ for all $c$ with $\sum_i c_i = 0$.


Proof. Since $d^2_{\mathbb{R}^+}$ is a 1-homogeneous function, the definition is invariant to the choice of the dominating measure:

$$d^2_{\mathcal{M}}(P,Q) = \int_{\mathcal{X}} d^2_{\mathbb{R}^+}\!\left(\frac{dP}{d\mu}, \frac{dQ}{d\mu}\right) d\mu = \int_{\mathcal{X}} \frac{d\mu}{d\nu} \, d^2_{\mathbb{R}^+}\!\left(\frac{dP}{d\mu}, \frac{dQ}{d\mu}\right) d\nu = \int_{\mathcal{X}} d^2_{\mathbb{R}^+}\!\left(\frac{dP}{d\mu}\frac{d\mu}{d\nu}, \frac{dQ}{d\mu}\frac{d\mu}{d\nu}\right) d\nu = \int_{\mathcal{X}} d^2_{\mathbb{R}^+}\!\left(\frac{dP}{d\nu}, \frac{dQ}{d\nu}\right) d\nu$$

by the chain rule [42], where $\mu \ll \nu$. By the linearity of the integration, the cpd property of $-d^2_{\mathbb{R}^+}$ transmits to $-d^2_{\mathcal{M}}$; hence it is a Hilbertian metric.

Schölkopf showed the connection between cpd kernels and their behavior as a distance based feature space [108].

Proposition 8 (cpd and pd kernels, Schölkopf [108]). Let $\kappa: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ be a symmetric kernel. Then $\tilde{\kappa}$, defined as

$$\tilde{\kappa}(x,y) = \frac{1}{2}\left(\kappa(x,y) - \kappa(z,y) - \kappa(x,z) + \kappa(z,z)\right), \qquad (5\text{-}5)$$

is pd if and only if $\kappa$ is cpd.

It is thus possible to build pd kernels from cpd kernels. Note that from (5-5) we can induce an RKHS based on $\tilde{\kappa}$. Let $\Phi(x)$ denote the projected vector corresponding to $x \in \mathcal{X}$. Then the squared distance in the RKHS becomes

$$\begin{aligned}
\|\Phi(x) - \Phi(y)\|^2 &= \tilde{\kappa}(x,x) + \tilde{\kappa}(y,y) - 2\tilde{\kappa}(x,y) \qquad (5\text{-}6) \\
&= \frac{1}{2}\big(\kappa(x,x) - \kappa(z,x) - \kappa(x,z) + \kappa(z,z) + \kappa(y,y) - \kappa(z,y) - \kappa(y,z) + \kappa(z,z)\big) \\
&\qquad - \big(\kappa(x,y) - \kappa(z,y) - \kappa(x,z) + \kappa(z,z)\big) \\
&= -\kappa(x,y) + \frac{1}{2}\big(\kappa(x,x) + \kappa(y,y)\big). \qquad (5\text{-}7)
\end{aligned}$$

Therefore, by using the pd kernel $\tilde{\kappa}$ with an algorithm that only requires distances, the RKHS can essentially be linked to a cpd kernel.
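The family (5-3) and the centering construction (5-5) can be checked numerically. In the sketch below, (5-3) is written as a normalized difference of power means, an equivalent arrangement under which $d(x,x) = 0$ and the member with indices $\frac{1}{2}$ and $1$ recovers the squared Hellinger metric $(\sqrt{x}-\sqrt{y})^2$; the function names are illustrative.

```python
import numpy as np

def d2_ab(x, y, alpha, beta):
    """Squared metric d^2_{alpha|beta}(x, y) of (5-3) on R+, written via
    power means M_g(x, y) = ((x^g + y^g)/2)^(1/g)."""
    m = lambda g: (0.5 * (x ** g + y ** g)) ** (1.0 / g)
    return (m(alpha) - m(beta)) / (2.0 ** (-1.0 / alpha) - 2.0 ** (-1.0 / beta))

def pd_from_cpd(kappa, z):
    """Center a cpd kernel at an anchor point z to get a pd kernel (5-5)."""
    return lambda x, y: 0.5 * (kappa(x, y) - kappa(z, y) - kappa(x, z) + kappa(z, z))
```

For instance, $\kappa(x,y) = -d^2_{\frac{1}{2}|1}(x,y)$ is the negated squared Hellinger metric on $\mathbb{R}^+$; centering it at $z = 0$ yields $\tilde{\kappa}(x,y) = \sqrt{xy}$, the Hellinger (Bhattacharyya) kernel appearing in (5-8).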


In terms of Hilbertian metrics for probability measures, using (5-4) and (5-3), and taking the $z$ in (5-5) to be the zero measure, results in a class of pd kernels $k_{\alpha|\beta}(P,Q)$ [43]:

$$\begin{aligned}
k_{-1|1}(P,Q) &= \int \frac{p(x)\, q(x)}{p(x) + q(x)} \, d\mu(x), \qquad
k_{\frac{1}{2}|1}(P,Q) = \int \sqrt{p(x)\, q(x)} \, d\mu(x), \\
k_{1|1}(P,Q) &= -\frac{1}{\log 2} \int \left( p(x) \log \frac{p(x)}{p(x)+q(x)} + q(x) \log \frac{q(x)}{p(x)+q(x)} \right) d\mu(x), \qquad (5\text{-}8)
\end{aligned}$$

where $p = \frac{dP}{d\mu}$ and $q = \frac{dQ}{d\mu}$. These kernels induce RKHSs in which the distance corresponds to using the symmetric χ² measure, the Hellinger metric, and the Jensen-Shannon divergence, respectively.

5.2 Radon-Nikodym Derivative

Both φ-divergences and Hilbertian metrics can be written in the integral form

$$\int f\!\left(\frac{dP}{dQ}\right) dR, \qquad (5\text{-}9)$$

where $P$, $Q$, and $R$ are σ-finite measures. Here, $\frac{dP}{dQ}$ is the Radon-Nikodym derivative of $P$ with respect to $Q$.

Definition 6 (Absolute continuity). Let $(\Omega, \mathcal{F})$ be a measurable space, and let $P, Q$ be two measures. If for all $A \in \mathcal{F}$, $Q(A) = 0$ implies $P(A) = 0$, then $P$ is absolutely continuous with respect to $Q$. This is written as $P \ll Q$.

Theorem 9 (Radon-Nikodym derivative [42]). If $\mu$ is a σ-finite measure and $\nu \ll \mu$, then there exists a finite valued measurable function $f$ on $\Omega$ such that $\nu(E) = \int_E f \, d\mu$ for every measurable set $E$. $f$ is the Radon-Nikodym derivative, denoted $\frac{d\nu}{d\mu}$; it is unique up to $\mu$-null sets.

The Radon-Nikodym derivative of probability measures $P$ and $Q$ is equivalent to the likelihood ratio. The question is how to estimate $\frac{dP}{dQ}$ from finite samples from $P$ and $Q$. A measure $P$ is a diffuse measure if $P(\{x\}) = 0$ for all $x \in \Omega$. If the space is equipped with a reference measure $\mu$ such that any diffuse probability measures $P$ and $Q$ satisfy $P, Q \ll \mu$, then we can use the chain rule of Radon-Nikodym derivatives [42] and write


(5-9) as

$$\int f\!\left(\frac{dP}{d\mu} \Big/ \frac{dQ}{d\mu}\right) dR. \qquad (5\text{-}10)$$

Hence, given an estimator for $\frac{dP}{d\mu}$, the divergence can be estimated. In the case of the stratified approach, we defined a reference measure $\mu$ derived from the Lebesgue measure of the Euclidean spaces (section 3.1.3). Hence, for a point process $P$, $\frac{dP}{d\mu}$ can be estimated via Parzen windowing using density estimation kernels on Euclidean spaces (assuming the point processes are diffuse). The probability measure $P$ of a point process can also be decomposed according to the partition of $\Omega$. Define $P_n(A) = P(A \cap \Omega_n)$; then we can write $P = \sum_{n=0}^{\infty} P_n$. Also denote $P_n(\Omega)$ by $p_n$, the probability of having $n$ action potentials. $P$ is assumed to be absolutely continuous with respect to $\mu$ (written as $P \ll \mu$), thus we can take the Radon-Nikodym derivative

$$\frac{dP}{d\mu}(\omega) = P_0(\Omega)\,\delta_0(\omega) + \sum_{n=1}^{\infty} \frac{dP_n}{d\mu_n}(\omega_n) = p_0\, \delta_0(\omega) + \sum_{n=1}^{\infty} p_n f_n(\omega_n), \qquad (5\text{-}11)$$

where $f_n$ is the unordered joint location density, which is symmetric under permutation of its arguments, and $\omega_n = \omega|_{\Omega_n}$. $f_n$ has a close relationship with the density of the Janossy measure, $j_n(\omega) = n! \, p_n f_n(\omega)$. For a spike train $\omega = (t_1, \ldots, t_n)$, the Janossy measure has a simple interpretation: $j_n(t_1, \ldots, t_n) \, dt_1 \cdots dt_n$ is the probability that there are exactly $n$ events in the point process, one in each of the $n$ distinct infinitesimal intervals $(t_i, t_i + dt_i)$ [25]. Note that if the $n$ action potentials are ordered, the density should be divided by $n!$ to represent the joint density of the ordered times, which has a smaller state space. Given a sequence of observations $X = \{\omega_i\}_{i=1}^{m}$, we can use the decomposition (5-11) for the estimation of the finite point process. Let the subsequences $X^{(n)} = \{\omega_i \mid \omega_i \in \Omega_n, \; i = 1, \ldots, m\}$ be the sets of all spike trains of length $n$. The frequency based estimate of the


total count distribution and the kernel density estimate of $f_n$ can then be written as

$$\hat{p}_n = \frac{\mathrm{card}(X^{(n)})}{\mathrm{card}(X)}, \qquad n = 0, 1, \ldots \qquad (5\text{-}12)$$

$$\hat{f}_n(\mathbf{x}) = \frac{1}{\mathrm{card}(X^{(n)})} \sum_{\omega_i \in X^{(n)}} \kappa_n(\mathbf{x} - \omega_i; \sigma_n), \qquad n = 1, \ldots \qquad (5\text{-}13)$$

where $\kappa_n(\cdot; \sigma)$ is a symmetric $n$-dimensional density estimation kernel with bandwidth parameter $\sigma$ that is also symmetric under permutation of its arguments, and $\mathrm{card}(\cdot)$ denotes the cardinality of a set. $\{\hat{p}_n, \hat{f}_n\}$ completely describes a general finite point process, as we discussed before. In the experimental section, we use a spherical normal distribution for $\kappa(\cdot; \sigma)$. The optimal bandwidth of the kernel is usually set by a cross validation method. However, to speed up the computation, we instead used a modified Silverman's rule that guarantees consistent density estimation [114]: $\sigma_n = \mathrm{card}(X^{(n)})^{-1/(n+4)} \sigma_1$, where $n > 1$ is the dimension and $\sigma_1$ is the bandwidth for a single sample in one dimension.

5.3 Hellinger Divergence

First, we investigate the Hellinger divergence for point processes. There are two forms: a Hilbertian metric form (5-14), using $d^2_{\frac{1}{2}|1}$ from (5-3), and the φ-divergence form (5-15), with $\phi(x) = (1 - \sqrt{x})^2$:

$$d_H^2(P,Q) = \int d^2_{\frac{1}{2}|1}\!\left(\frac{dP}{d\mu}, \frac{dQ}{d\mu}\right) d\mu = \int \left( \sqrt{\frac{dP}{d\mu}} - \sqrt{\frac{dQ}{d\mu}} \right)^2 d\mu \qquad (5\text{-}14)$$

$$= \int \left( 1 - \sqrt{\frac{dP}{d\mu} \Big/ \frac{dQ}{d\mu}} \right)^2 dQ \qquad (5\text{-}15)$$
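A direct transcription of (5-12) and (5-13) in Python, using the spherical Gaussian kernel and the modified Silverman bandwidth; for brevity, spike-time vectors are kept sorted instead of symmetrizing the kernel over permutations, which is an assumption of this sketch rather than of the text.

```python
import numpy as np
from collections import defaultdict

def fit_point_process(sample, sigma1=0.01):
    """Stratified nonparametric model {p_n (5-12), f_n (5-13)} of a finite
    point process, with sigma_n = card(X^(n))^(-1/(n+4)) * sigma1."""
    groups = defaultdict(list)
    for train in sample:
        groups[len(train)].append(np.sort(np.asarray(train, dtype=float)))
    model = {}
    for n, trains in groups.items():
        p_n = len(trains) / len(sample)
        if n == 0:
            model[n] = (p_n, None)      # the empty train carries an atom, not a density
            continue
        sigma_n = len(trains) ** (-1.0 / (n + 4)) * sigma1
        centers = np.stack(trains)      # (m, n) sorted spike-time vectors

        def f_n(x, centers=centers, s=sigma_n, n=n):
            d2 = np.sum((centers - np.asarray(x, dtype=float)) ** 2, axis=1)
            return float(np.mean(np.exp(-0.5 * d2 / s ** 2)
                                 / ((2.0 * np.pi) ** (n / 2.0) * s ** n)))
        model[n] = (p_n, f_n)
    return model
```

The returned pair $(\hat{p}_n, \hat{f}_n)$ per stratum is exactly the plug-in representation used by the divergence estimators below.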


5.3.1 Estimator

Applying the nonparametric estimator of the finite point process, (5-12) and (5-13), to (5-14) with respect to the reference measure $\mu$, we obtain the Hellinger distance estimator

$$d^2_{H1}(\hat{P}, \hat{Q}) = \int \left( \sqrt{\frac{d\hat{P}}{d\mu}} - \sqrt{\frac{d\hat{Q}}{d\mu}} \right)^2 d\mu. \qquad (5\text{-}16)$$

Since the integral in (5-16) is not easily computable, we resort to the Monte Carlo method for estimation. We would like to use the average probability measure $\nu = \frac{\hat{P} + \hat{Q}}{2}$ for importance sampling; however, $\mu$ is usually not absolutely continuous with respect to $\nu$. Using the Lebesgue decomposition theorem [42], $\mu$ can be decomposed as $\mu = \mu_\perp + \mu_\nu$, where $\mu_\perp \perp \nu$ and $\mu_\nu \ll \nu$. Since $d^2(0,0) = 0$, the integration against $\mu_\perp$ contributes zero, and hence the importance sampling can be carried out as

$$\begin{aligned}
d^2_{H1}(\hat{P}, \hat{Q}) &= \int_{\mathcal{X}} d^2\!\left(\frac{d\hat{P}}{d\mu}, \frac{d\hat{Q}}{d\mu}\right) d\mu = \int_{\mathcal{X}} d^2\!\left(\frac{d\hat{P}}{d\mu}, \frac{d\hat{Q}}{d\mu}\right) d\mu_\perp + \int_{\mathcal{X}} d^2\!\left(\frac{d\hat{P}}{d\mu}, \frac{d\hat{Q}}{d\mu}\right) d\mu_\nu \\
&= \int_{\mathcal{X}} d^2\!\left(\frac{d\hat{P}}{d\mu}, \frac{d\hat{Q}}{d\mu}\right) \frac{d\mu_\nu}{d\nu} \, d\nu = \int_{\mathcal{X}} d^2\!\left(\frac{d\hat{P}}{d\mu}, \frac{d\hat{Q}}{d\mu}\right) \frac{2}{\frac{d\hat{P}}{d\mu} + \frac{d\hat{Q}}{d\mu}} \, d\nu. \qquad (5\text{-}17)
\end{aligned}$$

Using (5-17), we can either use all the original spike trains that were used to estimate $\hat{P}$ and $\hat{Q}$, or generate an arbitrary number of spike trains from $\hat{P}$ and $\hat{Q}$. The latter approach provides better controlled variance but is slower. Alternatively, we can use an estimator derived from (5-15):

$$d^2_{H2}(\hat{P}, \hat{Q}) = \int \left( 1 - \sqrt{\frac{d\hat{P}}{d\mu} \Big/ \frac{d\hat{Q}}{d\mu}} \right)^2 d\hat{Q} = E_{\hat{Q}}\!\left[ \left( 1 - \sqrt{\frac{d\hat{P}}{d\mu} \Big/ \frac{d\hat{Q}}{d\mu}} \right)^2 \right]. \qquad (5\text{-}18)$$

Note that the expectation can be estimated by the strong law of large numbers.
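For a single stratum with known density estimates, the Monte Carlo version of (5-17) amounts to averaging the integrand under samples from the mixture. The sketch below resamples the pooled observations as a stand-in for $\nu$, which assumes the two samples are of comparable size; names and defaults are illustrative.

```python
import numpy as np

def hellinger_mc(p, q, samples_p, samples_q, n_iter=2000, rng=None):
    """Monte Carlo estimate of (5-17) on one stratum: average of
    (sqrt(p) - sqrt(q))^2 * 2 / (p + q) over draws from nu ~ (P + Q)/2,
    approximated here by resampling the pooled observations."""
    rng = np.random.default_rng() if rng is None else rng
    pooled = np.concatenate([samples_p, samples_q])
    z = rng.choice(pooled, size=n_iter)
    pz, qz = p(z), q(z)
    return float(np.mean((np.sqrt(pz) - np.sqrt(qz)) ** 2 * 2.0 / (pz + qz)))
```

Each summand lies in $[0, 2]$, so the running mean converges quickly, consistent with the behavior reported in Figure 5-2.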


5.3.2 Illustrative Example

Two point processes with two events or fewer, having the same marginal intensity function but different event correlation structures, are chosen to illustrate the method (Fig. 5-1). In point process A, the two timings are correlated; the interspike interval (ISI) has a narrow distribution. In point process B, the two timings are independent; each action potential has a precise timing. Both point processes have a lossy noise: each action potential has a probability $p = 0.1$ of disappearing. Note that process A represents a renewal type of neural code, while process B represents precisely timed action potentials. We took 40 realizations of each process and computed the Hilbertian metric corresponding to the Hellinger metric. We chose point processes A and B such that $p_n$ and $f_1$ are identical for both; the only difference is in $f_2$. Non-parametric estimation from the samples shows that the correlation can be detected visually (Fig. 5-1). Due to the small number of realizations for $f_1$, the estimate is crude, but recall that it will be weighted by $\hat{p}_1$, which is also small. The Monte Carlo integration of (5-17) with the Hellinger metric converged within hundreds of iterations. This is illustrated in figure 5-2, where the squared distance between two sets of realizations of process A, and between one set from process A and one from process B, are compared. The 90% confidence interval can distinguish process A from process B within 50 iterations (see also figure 5-3). The statistical power of the test increases as the number of samples increases. Table 5-1 summarizes the power of discriminating processes A and B with a threshold of 0.05. The top row is computed using only the samples themselves (twice the number used to estimate the probability measures, since there are two sets), and the bottom row uses 500 generated samples from the estimated measures (see section 3.2).

5.3.3 Detecting Learning

Let us consider a hypothetical experiment where one investigates the effectiveness of a drug or a stimulation protocol that modulates or induces synaptic plasticity. We collect


Figure 5-1. (Top rasters) 40 realizations each from point process A (top) and B (bottom); the x-axis is time in seconds. The spike trains are sorted in terms of the time of the first action potential (right). In both A and B, $t_1 \sim U(0.2, 0.3)$. In A, $t_2 \sim t_1 + 0.3 + N(0, 0.01)$, and in B, $t_2 \sim U(0.5, 0.6) + N(0, 0.01)$. Both $t_1$ and $t_2$ have 0.1 probability of loss. (Below) Estimated parameters from the spike trains shown above: $p_n$, $f_1$, and $f_2$ are depicted from left to right. A non-symmetric $f_2$ is plotted for simplicity.
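The two processes of Figure 5-1 are simple to simulate. In the sketch below, $N(0, 0.01)$ is read as a Gaussian with standard deviation 0.01 (whether 0.01 is the standard deviation or the variance is not stated, so this is an assumption), and the 0.1 loss probability is applied independently to each spike:

```python
import numpy as np

def sample_process(kind, n_trials=40, rng=None):
    """Realizations of point process A (correlated ISI) or B (independently
    precisely timed spikes) from Figure 5-1, with lossy spike deletion."""
    rng = np.random.default_rng() if rng is None else rng
    trials = []
    for _ in range(n_trials):
        t1 = rng.uniform(0.2, 0.3)
        if kind == "A":
            t2 = t1 + 0.3 + rng.normal(0.0, 0.01)   # locked to the first spike
        else:
            t2 = rng.uniform(0.5, 0.6) + rng.normal(0.0, 0.01)
        spikes = [t for t in (t1, t2) if rng.random() >= 0.1]  # 0.1 loss probability
        trials.append(np.sort(spikes))
    return trials
```

By construction, the count distribution and the single-spike marginal are shared between A and B, so only a measure sensitive to the pairwise structure $f_2$ can separate them.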


Figure 5-2. 1000 parallel Monte Carlo integrations showing the convergence. The max and min, the 90% empirical confidence interval, the estimated standard deviation, and the mean of the estimate of (5-17) for two fixed sets of realizations of point process A (blue), and between two fixed sets of realizations of point processes A and B (red), are plotted.

Figure 5-3. Empirical distribution of the Hellinger divergence. Histogram of the statistics estimated from 40 realizations (left) and 100 realizations (right).

data from piecewise stationary segments where little plasticity is presumed. The goal is to detect the plasticity by computing the divergence between datasets. A randomly connected network of 1000 neurons (800 excitatory regular spiking, 200 inhibitory fast spiking [48]) with delays and spike timing dependent plasticity (STDP) on excitatory-to-excitatory synapses was simulated, similar to [47]. The network received current injection to 50 randomly preselected neurons for 2 ms once a second. The response of each neuron in the 100 ms window after the current injection was collected as a trial. As an initialization, STDP was turned on to adapt the weights for the first 20 s; for the second 20 s, STDP was turned off to reach the steady state. The following 120 s


Number of samples   10     20     30     40     50     60     70     80     90     100
Samples only        0.201  0.420  0.763  0.887  0.965  0.993  0.999  0.999  1      1
500 MC              0.202  0.463  0.749  0.904  0.983  0.995  0.999  1.000  1      1

Table 5-1. Statistical power of the two integration methods as a function of the number of samples.

consists of three 40 s blocks: the first 40 s corresponds to the baseline measurements; during the second 40 s, STDP was turned on (or off for the control experiment) to induce changes; and the last 40 s corresponds to trials from the modified system (or from the null hypothesis for the control). Each neuron was analyzed separately. Note that the stimulation protocol is a typical stimulation given to cultured neurons on microelectrode arrays (MEAs) [126].

Although the data originated from one long experiment cut into 40 trials, and divergences from individual neurons were pooled together, the data with and without STDP showed a significant difference (Fig. 5-4). Treating the neurons as an ensemble, 91.9% were rejected based on the empirical surrogate test. Detailed observation revealed that STDP in general brings the timing of firing closer to the stimulation time, while some neurons did not change at all.

5.3.4 Artificial Spike Trains

In Table 5-2, the proposed method is tested on 4 distinct point processes that emphasize features relevant to neuroscience: correlated spike timings [34], firing rate change [46], precisely timed spike trains [26, 28, 103], and serial correlation [15]. To answer the question of whether a set of spike trains originates from the null hypothesis, we performed hypothesis testing by generating the empirical distribution of the divergence for a finite number of samples. Generating the surrogate distribution with carefully chosen assumptions can compensate for the bias and gives an estimated p-value for the test. In the case of Table 5-2, we also know the alternative hypothesis, hence we computed the statistical power. A non-parametric method is a generalist, while each parametric method is a specialist. As expected, one can see from the table that the proposed method is a


Figure 5-4. Significance of divergence based on Hellinger distance values before and after the learning protocol. (Left) The solid curve represents the null hypothesis: sorted divergence values for each neuron in the stationary condition (without learning). The horizontal dotted line is the 0.05 significance threshold (0.161) for rejection. Crosses represent the corresponding divergence values for each neuron when STDP was enabled for 40 s; 91.9% of the neurons had significantly different spiking statistics after STDP learning ($p < 0.05$). (Right) Histograms for the null and alternative hypotheses. The stimulated neurons and inhibitory neurons are excluded (n = 754).

generalist. One needs a generalist when the change is subtle or unknown. When a large data set with various possibilities is given, a generalist would be a good choice for detecting changes. Given an infinite number of samples, the proposed method is guaranteed to detect the change if there is one.

5.3.5 Non-stationarity Detection

Independence and stationarity across trials are essential requirements for the analysis of experimental observations. Testing whether the measured data set has the desired property is a challenging statistical problem, even more so when the measurements are in the form of spike trains. The best we can do is to use sensitive methods to detect non-stationarity and support the claim of stationarity from a failure of detection. The point process divergence can be used to perform hypothesis testing for the inter-trial


Exp   FR      FF      ISI     N       L2100   L210    TTFS    HL100   HL10
I     0.0685  0.0589  0.9516  0.0502  0.0657  0.0164  0.0415  0.0579  1.0000
II    0.9990  0.2869  0.2067  0.9275  0.9990  0.9971  0.4579  0.9217  0.8927
III   0.0859  0.9690  1.0000  0.1632  0.0995  0.1391  0.2531  0.8657  1.0000
IV    0.0000  0.0000  0.0483  0.0000  0.0541  0.1140  0.0309  0.7942  0.6386

Table 5-2. Statistical power (higher is better) of various methods for hypothesis testing with threshold 0.05. The power was computed with respect to the empirical surrogate distribution. Data set I: a two-action-potential model with or without correlation; II: homogeneous Poisson processes with different rates; III: a precisely timed spike train and its Poisson equivalent; IV: a renewal process with or without serial correlation. 40 spike trains are used for I, II, and III; 320 spike trains are used for IV. FR: absolute difference between mean firing rates; FF: Fano factor difference; ISI: KS-statistic of the inter-spike interval distribution; N: KS-statistic of the total spike count distribution; L210, L2100: mean square difference of smoothed PSTHs with the corresponding kernel size; TTFS: KS-statistic of the time-to-first-spike distribution; HL100, HL10: proposed non-parametric method with 100 ms and 10 ms kernel sizes, respectively.

Figure 5-5. The performance of the test with the Hellinger divergence depends on the number of samples. The blue (thin) and black (thick) curves correspond to two different scales of the A4 vs A5 discrimination problem. The dotted line is the $\phi$-divergence form of the estimator (5-18). The error bars are standard deviations estimated from 10 Monte Carlo runs.
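The powers in Tables 5-1 and 5-2 are fractions of Monte Carlo runs in which the test statistic exceeds a threshold derived from the null distribution. The following is a hedged, generic sketch of that procedure; the statistic and sampling distributions here are toy stand-ins of our own choosing, not the divergences of the text.

```python
import random
import statistics

def estimate_power(draw_h0, draw_h1, statistic, n_runs=200, alpha=0.05, seed=0):
    """Monte Carlo power estimate: build the null distribution of the statistic
    from H0-vs-H0 sample pairs, use its (1 - alpha) quantile as the rejection
    threshold, and count how often H0-vs-H1 pairs exceed it."""
    rng = random.Random(seed)
    null = sorted(statistic(draw_h0(rng), draw_h0(rng)) for _ in range(n_runs))
    threshold = null[int((1 - alpha) * (n_runs - 1))]
    hits = sum(statistic(draw_h0(rng), draw_h1(rng)) > threshold
               for _ in range(n_runs))
    return hits / n_runs

# Toy stand-in: mean-difference statistic on Gaussian "trials".
mean_diff = lambda a, b: abs(statistics.mean(a) - statistics.mean(b))
power = estimate_power(lambda r: [r.gauss(0, 1) for _ in range(40)],
                       lambda r: [r.gauss(1, 1) for _ in range(40)],
                       mean_diff)
```

With a unit mean shift and 40 samples per set, the estimated power is close to 1; replacing `mean_diff` with a divergence estimator reproduces the table-building procedure.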


non-stationarity. As an application of the proposed method, we use it to investigate the short-term non-stationarity and/or stationarity of data sets collected from neural cultures.

Periodically repeated stimulation of a neural system is a standard technique to probe and analyze the state of the system. Certainly this is not a natural input for in vivo neural systems, where the stimulus is more or less random, but the resulting spike train signals are easier to analyze; in many cases the response patterns are strictly time locked to the stimulation. From data sets observed from neural cultures, we have previously observed that trial-to-trial variability plays a major role in classification or decoding of the repeated input stimulus conditions [30].

For a single probing trial, the measured spike train is not stationary in general. However, multiple trials can be considered independent and stationary through proper initialization, for example by controlling the inter-trial interval. By inter-trial stationarity, we mean that the probability law behind the set of trials does not change over time. In other words, if a set of trials obeys inter-trial stationarity, each trial is independent and identically distributed. When the observations are spike trains, this means that the trials are independent realizations of the same point process (non-stationary within the trial in general). Given a measure of divergence between two sets of spike trains, we can perform a statistical test to show whether the underlying point processes are different, hence showing inter-trial non-stationarity.

When electrically stimulated, the response of a neural culture recorded through a micro-electrode array (MEA) changes as the stimulation is periodically repeated. Various resources of the neurons are used, and short term plasticity is first observed in the action potential timings. However, after a while the responses stabilize due to homeostatic mechanisms. The time period over which this happens is related to the time constants of the neurophysiological processes. Hence, when stimulated for a long time, the response becomes more or less stationary. In neural cultures, most action potentials occur time locked to the stimulation time, so this is visually verifiable. Assuming a certain


portion of the trials to be inter-trial stationary, we would like to determine by how much we can extend that assumption. To utilize the multi-channel nature of the recordings, we assumed that the channels have similar non-stationarity patterns over time. By averaging over the channels, a smoother estimate of the divergence is obtained. The distribution for the null hypothesis must also be changed accordingly, by taking the mean across channels.

We selected 5 to 15 active channels with phasic responses which seemed to be changing in the experimental setup. We removed the channels that did not have 5 or fewer action potentials in 90% of the trials. The inactive channels that cannot be reliably estimated (mean number of action potentials less than 0.5) were removed from the analysis as well. This process also reduces the bias and increases the sensitivity in some cases. Including all the channels in the analysis does not change the results significantly.

We assume that the culture reaches a stationary state at the end (the last 80 trials out of 600 (or 1200) trials over 5 minutes), and we want to determine how much further one can extend the window while remaining within the stationarity condition. The null hypothesis is that the set of trials of interest came from the same point process as the last 40 trials. We used a resampling technique to create the surrogate distribution of divergences for the null hypothesis. First, recall that the last 80 trials are i.i.d. by the previous assumption. We randomly divide the last 80 trials into two halves of 40. The divergence between these two halves is repeatedly computed 2000 times to create the empirical distribution under the null hypothesis. The actual divergence value is compared against this distribution to test whether it deviates significantly from the null hypothesis.

Figure 5-6 shows that the cultures are non-stationary for the first 350 samples, corresponding to 3 minutes. The dashed line and the dotted line correspond to the empirical and Gaussian-assumed 95 percent confidence levels for the null hypothesis. The divergence values (blue and green) have a decreasing trend during this period and cross the threshold. The non-stationarity (deviation from the null hypothesis) is steep for the first 200 samples,


Figure 5-6. Non-stationarity detection. (Top) The Hellinger divergence of a moving window of 40 trials. Values are averaged over 10 selected channels. (Bottom two rows) Example raster plots. The three thick black bars indicate the window of 40 trials against which the moving window was compared.

followed by a plateau. These observations can also be made with the other methods except ISI and FF. The right panel of Figure 5-7 shows a more discontinuous state change at around 400 samples. Despite a 5 minute rest at the 600 sample time point, the culture response is still within the pseudo-stationary region. Hence, the sudden change is likely a long-term plasticity effect. Again, similar observations can be made with different divergence measures. Depending on the culture, we could distinguish short-term changes and long-term changes (N = 6).
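The resampling scheme described above, splitting the presumed-stationary tail into random halves to build the null distribution, can be sketched generically. The divergence here is a caller-supplied placeholder, not the Hellinger estimator itself, and the function name is ours.

```python
import random

def surrogate_pvalue(stationary_trials, divergence, observed,
                     n_resamples=2000, seed=0):
    """Build a surrogate null distribution by repeatedly splitting a pool of
    presumed-i.i.d. trials into two random halves and computing the divergence
    between the halves; return the p-value of an observed divergence."""
    rng = random.Random(seed)
    trials = list(stationary_trials)
    half = len(trials) // 2
    null = []
    for _ in range(n_resamples):
        rng.shuffle(trials)
        null.append(divergence(trials[:half], trials[half:]))
    exceed = sum(d >= observed for d in null)
    return (1 + exceed) / (1 + n_resamples)
```

A window of trials whose divergence from the reference exceeds the null's 95th percentile (p < 0.05) would then be flagged as non-stationary.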


Figure 5-7. Non-stationarity detection with dissimilarities. (Left) Different colors correspond to different windows of 40 trials for the same data as Figure 5-6. (Right) A different culture with a lengthened protocol: two 5 minute stimulation sessions separated by 5 minutes at the 600 sample time. Three additional reference time periods for divergence estimation are added as well; two just before and one after the resting period. No stimulation was applied during the 5 minute rest, and any short-term effect is presumed to be reversed.

5.3.6 Kernel Size

The proposed estimator has a free parameter: the kernel size (also known as the bandwidth). If the kernel size is too small, every small difference will be considered different, resulting in no discriminability. On the other hand, if the kernel size is too big, no matter how different the spike timings are, they will be considered very similar, and again the discrimination power is lost. In theory, if the kernel size goes to zero as the number of samples goes to infinity, it is possible to have a consistent density estimator [114].


However, in practice we only have a finite number of samples, and choosing the best kernel size becomes an issue. In Figure 5-8, it can be seen that the kernel size plays a significant role. The type of feature, the number of samples, and the problem difficulty (jitter size) are all related, and hence it is non-trivial to estimate the best kernel size. However, for the current kernel size scaling rules, it can be seen that the physiological jitter size is in the range of the optimal kernel size, and the performance is not very sensitive to small changes in the kernel size. Hence, we recommend using physiologically relevant kernel sizes (in the 1 millisecond to 1 second range) and optimizing for performance if possible, as is usually done with spike train distance measures [123, 124].

Figure 5-8. Jitter of 20 ms is more difficult than 10 ms. The peak performance for the 10 ms jitter case is maintained for about 0.5 orders of magnitude of kernel size. The number of samples is fixed to 30. The $\phi$-divergence based estimator (solid lines) is compared against importance sampling (dotted lines); they have essentially the same performance.
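One practical default, used later in Section 5.4 for the simulations, is the median of all pairwise distances among the samples. A minimal sketch, with the distance function supplied by the caller:

```python
import itertools
import statistics

def median_heuristic(samples, dist):
    """Median of all pairwise distances: a common default for the kernel size."""
    return statistics.median(dist(a, b)
                             for a, b in itertools.combinations(samples, 2))
```

For example, `median_heuristic(spike_times, lambda a, b: abs(a - b))` gives a data-driven bandwidth that can then be refined within the physiological range discussed above.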


5.4 Symmetric $\chi^2$-divergence

In this section, we focus on the $\chi^2$-divergence, where $\phi(t) = (t - 1)^2$, that is,

$$D_{\chi^2}(P; Q) = \int \left( \frac{dP}{dQ} - 1 \right)^2 dQ. \qquad (5\text{-}19)$$

Notice that the existence of $D_{\chi^2}$ requires the Radon-Nikodym derivative $dP/dQ$, also known as the likelihood ratio, to be $L_2(Q)$ integrable. The $\chi^2$-divergence (or any $\phi$-divergence as such) can be consistently estimated as

$$\hat{D}_{\chi^2} = \int \left( \widehat{\frac{dP}{dQ}} - 1 \right)^2 d\hat{Q}, \qquad (5\text{-}20)$$

where $\widehat{dP/dQ}$ is a consistent estimator of the Radon-Nikodym derivative and $\hat{Q}$ is the empirical probability measure. However, the estimation of the Radon-Nikodym derivative only makes sense when $P \ll Q$, since the term $dP/dQ$ becomes undefined otherwise, for example in the extreme case when $P$ and $Q$ have disjoint supports. To tackle this problem we propose to work with $D_{\chi^2}(P, (P+Q)/2)$ instead, since $P \ll (P+Q)/2$. This divergence is the same as the symmetric $\chi^2$-divergence

$$D_S(P; Q) = \int \left( \frac{d(P - Q)}{d(P + Q)} \right)^2 d(P + Q), \qquad (5\text{-}21)$$

addressed in [43], except for a multiplicative factor: $D_S(P; Q) = 2 D_{\chi^2}(P, (P+Q)/2) = 2 D_{\chi^2}(Q, (P+Q)/2)$. Moreover, the symmetric $\chi^2$-divergence is a metric on the space of all probability measures [43].

In this section, we explore two conceptually different approaches to estimating the Radon-Nikodym derivative and the symmetric $\chi^2$-divergence. The first method is identical to that of the previous section, whereas the second method maps the spike trains into a more structured space and utilizes strictly positive definite kernels on that space to estimate the Radon-Nikodym derivative. The use of strictly positive definite kernels to estimate Radon-Nikodym derivatives and divergence functionals has recently gained considerable


interest in $\Omega = \mathbb{R}^d$ [56, 77]. We extend this idea and show that this approach can also be applied on a more abstract space.

5.4.1 Point Process Representation

In addition to the stratified space approach (Section 5.2), we present an alternative method to approximate the Radon-Nikodym derivative using the smoothed spike train space. The stratification approach inherently places samples with different numbers of action potentials in different groups, and therefore two spike trains with different numbers of action potentials never interact. This poses a problem in estimation if the count distribution is flat, i.e. the spike trains tend to have different numbers of action potentials. An alternative approach to representing a spike train is to project it into a different, and perhaps more structured, space.

Let $S$ be the space of all $L_2$ integrable functions over $X$, i.e., $S = L_2(X)$. Given $\omega = \{t_1, \ldots, t_n\} \in \Omega$, define a mapping $G : \Omega \to S$ as $G(\omega)(t) = \sum_{i=1}^{n} g(t, t_i)|_X$ such that $G$ is injective. When $g$ is a translation invariant function that decays at infinity, $G(\omega)$ can be considered a smoothed spike train [84]. There are many different $g$'s that make the mapping $G$ injective, for example when $g(x, y)$ is a bounded strictly positive definite function. To see this, consider two spike trains $\omega_1 = \{t^1_1, \ldots, t^1_n\}$ and $\omega_2 = \{t^2_1, \ldots, t^2_m\}$ such that all spike timings are distinct, i.e., $\omega_1 \cap \omega_2 = \emptyset$, and assume that $G(\omega_1) = G(\omega_2)$. Since the kernel matrix $[K]_{ij} = g(t_i, t_j)$, where $t_i, t_j \in \omega_1 \cup \omega_2$, is full rank (by virtue of strict positive definiteness), $G(\omega_1)(t) = G(\omega_2)(t)$ for all $t \in \omega_1 \cup \omega_2$ implies that $\omega_1 = \omega_2$.

The $\sigma$-algebra of $S = L_2(X)$ is induced by $\mathcal{B}(\Omega)$ and $G$ as $\sigma(S \cup \{G(A) \mid A \in \mathcal{B}(\Omega)\})$. Then $G$ is measurable, and we can define an induced probability measure $U$ such that $U(S \setminus G(\Omega)) = 0$ and $U(G(A)) = P(A)$ for $A \in \sigma(\Omega)$. Let $U$ and $V$ be the two probability laws on $S$ induced by $P$ and $Q$, respectively. Then the following two propositions show that the Radon-Nikodym derivative in $\Omega$ can be transferred to $S$.

Proposition 10. If $P \ll Q$ then $U \ll V$.


Proof. Let $A \subseteq G(\Omega)$. Then $V(A) = 0 \Rightarrow Q(G^{-1}(A)) = 0 \Rightarrow P(G^{-1}(A)) = 0 \Rightarrow U(A) = 0$. A similar proof follows when $A \not\subseteq G(\Omega)$.

Proposition 11. $dP/dQ(\omega) = dU/dV(G(\omega))$.

Proof. For any integrable function $\psi : S \to \mathbb{R}$,

$$\int \psi(G(\omega)) \, dP(\omega) = \int_S \psi(f) \, dU(f) = \int_S \psi(f) \frac{dU}{dV}(f) \, dV(f) = \int \psi(G(\omega)) \frac{dU}{dV}(G(\omega)) \, dQ(\omega).$$

Therefore, $dP(\omega) = \frac{dU}{dV}(G(\omega)) \, dQ(\omega)$.

Corollary 12. The $\chi^2$-divergence between $P$ and $Q$ is the same as the $\chi^2$-divergence between $U$ and $V$, i.e.

$$\int \left( \frac{dP}{dQ}(\omega) - 1 \right)^2 dQ(\omega) = \int_S \left( \frac{dU}{dV}(f) - 1 \right)^2 dV(f).$$

This corollary shows that the problem of estimating the Radon-Nikodym derivative and the $\chi^2$-divergence in $\Omega$ can be framed in $S$, which is more structured.

5.4.2 Estimation of the Symmetric $\chi^2$-divergence

Following the two representations described in the previous section, we propose two distinct estimators of the symmetric $\chi^2$-divergence. Notice that in this section we assume $P \ll Q$ to simplify the derivations. However, the proposed methods can be trivially extended to estimate the symmetric $\chi^2$-divergence by replacing $Q$ with $(P+Q)/2$.

5.4.2.1 Stratification approach

The Radon-Nikodym derivative $dP/dQ$ in $\mathbb{R}^n$ can be consistently estimated, under appropriate conditions, by the ratio of the Parzen estimates $\widehat{dP/d\lambda} \big/ \widehat{dQ/d\lambda}$, where $\lambda$ denotes the Lebesgue measure [17]. In our case, since $dP/dQ = \sum_{n=0}^{\infty} dP_n/dQ_n$, we


estimate the Radon-Nikodym derivative as

$$\widehat{\frac{dP}{dQ}} = \sum_{n=0}^{\infty} \frac{\widehat{dP_n/d\lambda_n}}{\widehat{dQ_n/d\lambda_n}}.$$

Using this estimate, the estimator of the $\chi^2$-divergence becomes

$$\hat{D}_{\chi^2} = \int \left( \widehat{\frac{dP}{dQ}} - 1 \right)^2 d\hat{Q} = \sum_{n=0}^{\infty} \int \left( \frac{\widehat{dP_n/d\lambda_n}}{\widehat{dQ_n/d\lambda_n}} - 1 \right)^2 d\hat{Q}_n = \left( \frac{l \, \mathrm{card}(X^{(0)})}{m \, \mathrm{card}(Y^{(0)})} - 1 \right)^2 + \sum_{n=1}^{\infty} \frac{1}{\mathrm{card}(Y^{(n)})} \sum_{y_i \in Y^{(n)}} \left( \frac{l \sum_{x_j \in X^{(n)}} \kappa_n(\omega^1_j - \omega^2_i)}{m \sum_{y_j \in Y^{(n)}} \kappa_n(\omega^2_j - \omega^2_i)} - 1 \right)^2,$$

where $\kappa_n$ is a smoothing kernel on $\mathbb{R}^n$, $X^{(n)}$ and $Y^{(n)}$ denote the samples with $n$ action potentials from $P$ and $Q$, and $\{\omega^1_i\}_{i=1}^{m}$ and $\{\omega^2_i\}_{i=1}^{l}$ are the samples from $P$ and $Q$, respectively. Notice that although the individual estimator for each $n$ is consistent, by virtue of the consistency of the Radon-Nikodym derivative estimate and of the empirical measure, the entire estimator might not be consistent, since it requires adding a countable number of such estimators. Moreover, the convergence rate of the estimated Radon-Nikodym derivative depends on the dimensionality of the Euclidean space; therefore, the convergence rate of the entire estimator is governed by the slowest convergence rate. However, these issues can be tackled by assuming that the number of spikes in a spike train is upper bounded.
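A minimal sketch of one term of this estimator, restricted to a single stratum: the Radon-Nikodym derivative is estimated at each Q-sample by a ratio of Gaussian Parzen estimates and plugged into the squared deviation from 1. The function name and the Gaussian kernel are our choices; the full estimator also handles the empty stratum and sums over all spike counts.

```python
import math

def chi2_stratum(p_samples, q_samples, sigma):
    """Single-stratum chi^2 estimate. All samples are equal-length tuples of
    spike times (one stratum); dP/dQ is approximated by a ratio of Gaussian
    Parzen density estimates evaluated at each Q-sample."""
    def parzen(samples, point):
        return sum(math.exp(-sum((a - b) ** 2 for a, b in zip(s, point))
                            / (2 * sigma ** 2)) for s in samples) / len(samples)
    return sum((parzen(p_samples, y) / parzen(q_samples, y) - 1.0) ** 2
               for y in q_samples) / len(q_samples)
```

When the two sample sets come from the same law the ratios hover near 1 and the estimate is near 0; well-separated sets push it up.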

This inequality shows that the error between the actual and estimated Radon-Nikodym derivatives is bounded by two $L_2$ distances. The first term goes to zero as $n \to \infty$ since $\hat{V} \to V$ almost surely. Therefore, in order to get a consistent estimate of $D_{\chi^2}$, it is important to get an appropriate estimate of the Radon-Nikodym derivative that makes the second term arbitrarily close to zero.

Following [56], we assume that $\widehat{dU/dV}(f) = \sum_{i=1}^{l} \alpha_i \tilde{\kappa}(f - g_i)$, where the $\alpha_i$ are real coefficients and $\tilde{\kappa}(f - g)$ is a strictly positive definite kernel [10]. This expansion is justified by the following proposition, which states that functions of the form $\sum_{i=1}^{\infty} \alpha_i \tilde{\kappa}(f - g_i)$ are dense in $L_2(S; V)$, i.e. they can approximate any function $p \in L_2(S; V)$ with arbitrary accuracy in the $L_2$ sense. Notice that since $dU/dV \in L_2(S; V)$, this implies that the proposed expansion can approximate the Radon-Nikodym derivative arbitrarily well.

Theorem 13. Let $\tilde{\kappa}(x, y)$ be a symmetric strictly positive definite continuous kernel on $S \times S$ and let $V$ be a probability measure on $S$ such that $\int_S \tilde{\kappa}^2(x, y) \, dV(x) < \infty$ for all $y \in S$. Then $\mathrm{span}(\tilde{\kappa}(x, y) : y \in G \subseteq S)$ is dense in $L_2(S; V)$, where $G$ denotes the subset on which the measure $V$ lies.

Proof. Assume that the span is not dense in $L_2(S; V)$. Then there exists a function $g \in L_2(S; V)$ such that $\int g(y) \tilde{\kappa}(x, y) \, dV(y) = 0$. Therefore, $\iint g(x) g(y) \tilde{\kappa}(x, y) \, dV(x) \, dV(y) = 0$. Since $\tilde{\kappa}$ is strictly positive definite, this implies that $g$ is zero a.e. $V$.

Theorem 13 is closely related to the notion of a universal kernel; instead of the continuous functions $C(S)$ for a compact $S$, we showed that the span is dense in $L_2(S)$. Using the kernel expansion, the second term in (5-22) can be expressed as

$$\int \left( \frac{dU}{dV}(f) - \sum_{i=1}^{l} \alpha_i \tilde{\kappa}(f, g_i) \right)^2 dV(f) = \int \left( \frac{dU}{dV} \right)^2(f) \, dV(f) - 2 \sum_{i=1}^{l} \int \alpha_i \tilde{\kappa}(f, g_i) \, dU(f) + \int \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j \tilde{\kappa}(f, g_i) \tilde{\kappa}(f, g_j) \, dV(f) \approx C - \frac{2}{m} \sum_{i=1}^{l} \sum_{j=1}^{m} \alpha_i \tilde{\kappa}(f_j, g_i) + \frac{1}{l} \sum_{i=1}^{l} \sum_{j=1}^{l} \sum_{k=1}^{l} \alpha_i \alpha_j \tilde{\kappa}(g_k, g_i) \tilde{\kappa}(g_k, g_j),$$


where $C$ is a constant. Therefore, the second term in (5-22) is minimized for $\alpha = (l/m)(K_{QQ} K_{QQ} + \epsilon l I)^{-1} K_{QP} \mathbf{1}$, where $\epsilon$ is a regularization parameter required to avoid overfitting and $[K_{PQ}]_{ij} = \tilde{\kappa}(G(\omega^1_i), G(\omega^2_j))$ is the Gram matrix. The estimated $\alpha$ can then be used to compute $\hat{D}_{\chi^2} = \| K_{QQ} \alpha - \mathbf{1} \|^2 / l$. Examples of kernels that satisfy the required properties are the nCI kernels (Section 2.3):

$$\tilde{\kappa}(f, g) = \exp\left( -\int_X (f(t) - g(t))^2 \, dt \Big/ \sigma^2 \right) \quad \text{and} \quad \tilde{\kappa}(f, g) = \int_X \exp\left( -(f(t) - g(t))^2 / \sigma^2 \right) dt.$$

For the simulations, we use the first kernel, set $\sigma$ to the median distance between the samples $f_i$, and set the regularization parameter to $1/n$.

5.4.3 Results

We present 3 hypothesis testing experiments to demonstrate the pros and cons of each method. For each example, there are two classes of spike trains, and the test is to find whether the spike trains originate from the same probability law. The statistical power, which is one minus the type 2 error probability for a fixed type 1 error (0.05), is estimated via a surrogate test. We compare the results with the baseline dissimilarity L2, which computes the L2 distance between PSTHs (peri-stimulus time histograms) as a statistic. We observe that the kernel based estimator consistently approaches high power quickly for all examples.

5.4.3.1 Two action potentials

Two point processes with at most two events, with the same marginal intensity function but different event correlation structure, are chosen to illustrate the methods (Fig. 5-9, left). In the point process from H0, the two timings are correlated; the interspike interval (ISI) has a narrow distribution. In the point process from H1, the two timings are independent; each action potential has a precise timing. Both point processes have a lossy noise: each action potential disappears with probability $p = 0.1$. Note that H0


Figure 5-9. Two action potentials that are correlated (H0) and independent (H1). $\sigma_1 = 10$ ms is used for the stratified kernel. (Left) Spike trains from the null and alternative hypotheses. (Right) Comparison of the power of each method. The error bars are standard deviations over 5 Monte Carlo simulations.

represents a renewal type of neural code, while H1 represents precisely timed action potentials. Since the dimension of the problem is at most 2, this problem is easier for the stratified estimator (Fig. 6-2, right). Nevertheless, the kernel based estimator quickly catches up as the number of samples increases. The firing rate function based dissimilarity fails to discriminate the two processes because the intensity functions are designed to be identical.

5.4.3.2 Inhomogeneous Poisson process

For a Poisson process, the mean rate function fully describes the process. We simulate two inhomogeneous Poisson processes whose rates change at 100 ms. In Figure 5-10, L2 with the right bin size outperforms the others. The stratified approach suffers from the spread of samples across different strata and the curse of dimensionality, while the kernel based estimator quickly approaches high power.
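For reference, the closed form from Section 5.4.2 (solve $(K_{QQ}K_{QQ} + \epsilon l I)\alpha = (l/m) K_{QP}\mathbf{1}$, then $\hat D_{\chi^2} = \|K_{QQ}\alpha - \mathbf{1}\|^2/l$) can be sketched in a few lines. This is a pure-Python illustration on precomputed Gram matrices with a naive linear solver, not an optimized implementation; all names are ours.

```python
def solve(a, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bv] for row, bv in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def chi2_kernel_estimate(kqq, kqp, eps):
    """Regularized kernel chi^2 estimate from Gram matrices: kqq is the l-by-l
    Gram matrix among Q-samples, kqp the l-by-m cross Gram matrix to P-samples."""
    l, m_cols = len(kqq), len(kqp[0])
    a = [[sum(kqq[i][t] * kqq[t][j] for t in range(l))
          + (eps * l if i == j else 0.0) for j in range(l)] for i in range(l)]
    b = [l / m_cols * sum(row) for row in kqp]
    alpha = solve(a, b)
    resid = [sum(kqq[i][j] * alpha[j] for j in range(l)) - 1.0
             for i in range(l)]
    return sum(r * r for r in resid) / l
```

When the P and Q Gram matrices coincide (same underlying law) and the regularization is small, the residual $K_{QQ}\alpha - \mathbf{1}$ is near zero and so is the estimate.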


Figure 5-10. Poisson process with the rate changing as a step function from 3 to 2, and from 2 to 3, at 100 ms. $\sigma_1 = 100$ ms is used for the stratified kernel. The mean number of action potentials is fixed to 5. (Left) Spike trains from the null and alternative hypotheses. (Right) Comparison of the power of each method. The error bars are standard deviations over 5 Monte Carlo runs.

5.4.3.3 Stationary renewal processes

The renewal process is a widely used point process model to compensate for deviations from the Poisson process [75]. A stationary renewal process with a gamma interval distribution is simulated. Despite the widely spread count distribution and the high dimensionality, the stratified estimator performs best. This is because the difference in the count distribution is easily captured when the bandwidth is relatively large. The kernel based method does not suffer from this factor and performs well. The mean firing rate based estimator is not consistent in this example, in the sense that the performance of the method does not improve with the sample size; as the number of samples approaches infinity, it fails to detect. However, interestingly, in the finite sample range the variance of the rate function depends on the regularity of the firing pattern, so it is able to detect the difference.


Figure 5-11. Gamma distributed renewal process with shape parameter $\theta = 3$ (H0) and $\theta = 0.5$ (H1). $\sigma_1 = 100$ ms is used for the stratified kernel. The mean number of action potentials is fixed to 10. (Left) Spike trains from the null and alternative hypotheses. (Right) Comparison of the power of each method. The error bars are standard deviations over 20 Monte Carlo runs.

5.4.4 Auditory Neuron Clustering

We explore the auditory responses of rat thalamic neurons. As seen in Figure 3-1, the trial-to-trial variability of these neurons has some component of precise timing. An ICRA frozen noise was presented 56 times to the system, and the corresponding 56 single neuron responses were recorded. We applied the smoothed spike train based $\chi^2$ divergence estimator (Theorem 13) to all pairs of sets of spike trains. The aim is to cluster similar response point processes together; this could possibly help identify neural assemblies. 67 single units from the medial geniculate body (49 ventral, 18 dorsal) are analyzed. Each trial resulted in 2.5 seconds of spike train recording. We applied distance based hierarchical clustering (Figure 5-12). Note that compared to mean rate function clustering, the symmetric $\chi^2$ measure gives different results. Also, there is a bias (around 0.2) in the distance estimation.


Figure 5-12. Point process clustering of auditory neurons with the $\chi^2$ divergence compared against L2. The neurons were divided into three groups according to their mean rate: high ($> 3.5$, n = 17), low ($< 1.5$, n = 25), and medium (rest, n = 25). An agglomerative hierarchical cluster tree minimizing the average pairwise distance is used to create the hierarchy. The x-axis corresponds to neuron index, and the y-axis is time for the rasters and distance for the dendrograms.


5.4.5 Discussion

In this section, we have introduced two distinct methods for estimating the $\chi^2$-divergence and the Radon-Nikodym derivative between two point processes, namely the stratification approach and the smoothed spike train approach. The stratification approach is simpler in nature and can be evaluated more efficiently than the latter: the stratification approach is $O(ml)$ in computation, whereas the smoothed spike train approach is $O(l^3 + ml^2)$. However, the computation can be improved by exploiting the structure of the Gram matrices, which has been studied elaborately in the literature [35]. The stratification approach, on the other hand, suffers when the count distribution of the spike trains is flat, i.e. the spike trains tend to contain different numbers of spikes (as demonstrated in subsection 5.4.3.1). This poses a problem in estimation, since each stratum then contains few samples in a small sample size situation. Therefore, the smoothed spike train approach is applicable to a larger variety of point processes, but at the cost of higher computation.

The problem of estimating the Radon-Nikodym derivative, however, can also be approached in several other ways that remain to be explored. For example, notice that although the space of spike trains is not Euclidean, it is still a (semi-)normed space, where the norm is induced by either the stratification or the smoothed spike train representation, or by other available distance metrics such as the Victor-Purpura metric [123]. Recently, there has been considerable work on estimating regression functionals on semi-normed spaces by introducing bounded kernels on the norm of the space [24]. A similar approach can be considered for estimating the Radon-Nikodym derivative by employing similar kernels, albeit properly normalized, with the help of Parzen type estimation.

Moreover, in this section we have only addressed the problem of estimating the symmetric $\chi^2$-divergence. In practice, however, other divergences, such as the KL-divergence, are also widely used. The proposed methods can be extended to estimate any arbitrary divergence following the approach presented in [77]. This requires the strictly positive definite kernel


to be rich enough to represent a wide variety of continuous functions, or in a sense to be universal [73]. However, we leave these interesting extensions as future work.


CHAPTER 6
KERNEL BASED DIVERGENCE

6.1 Introduction

In this chapter, we follow a kernel based approach, since a kernel (a bivariate positive definite function) can be defined on any arbitrary space, and it provides an implicit similarity measure on the space that can be used to design divergences and dissimilarities on the probability laws defined on that space [43]. This approach has also been explored by Diks and Panchenko [29], and has recently been popularized by Gretton and coworkers in the context of the two sample problem [40].

We present a family of (strictly) positive definite kernels on the space of spike trains, and the dissimilarity (divergence) measures induced by them. We show that this framework encompasses many well known measures of dissimilarity, e.g. measures based on intensity functions and count statistics. We also show that a class of positive definite kernels on spike trains introduced previously is actually strictly positive definite, which explains their superior performance over other kernels [84].

The rest of the chapter is organized as follows. In Section 6.2 we discuss the notion of kernel based divergence and dissimilarity measures. In Section 6.3 we discuss some special forms of strictly positive definite kernels on $\mathbb{R}^n$; we later use these kernels to design kernels on the spike train space. In Sections 6.5 and 6.6 we discuss families of (strictly) positive definite kernels and their corresponding (divergence) dissimilarity measures using the stratified representation and the smoothed representation, respectively. In Section 6.7 we provide some simulation results showing the performance of the proposed kernels.

6.2 Kernel Based Divergence and Dissimilarity

Let $(X, \mathcal{F})$ be a measurable space, where $\mathcal{F} = \mathcal{B}(X)$ is the Borel algebra induced from the topology on $X$.

Definition 7 (Positive definite kernel). Let $(X, \mathcal{F})$ be a measurable space, and let $K : X \times X \to \mathbb{C}$ be an $(\mathcal{F} \otimes \mathcal{F} / \mathcal{B}(\mathbb{C}))$-measurable function. The kernel $K$ is called positive


definite if and only if for any finite non-zero complex Borel measure $\mu : \mathcal{F} \to \mathbb{C}$, $\iint K(x, y) \, d\mu(x) \, d\overline{\mu}(y) \geq 0$. Furthermore, if the inequality is strict, then the kernel $K$ is called strictly positive definite.

Note that the definition does not ensure symmetry. Let $\mathcal{M}^+$ denote the set of probability measures on $(X, \mathcal{F})$. Given a positive definite kernel $K : X \times X \to \mathbb{C}$, we define a measure of dissimilarity $D_K : \mathcal{M}^+ \times \mathcal{M}^+ \to \mathbb{R}^+_0$ as

$$D_K(P, Q) = \iint K(x, y) \, d\mu(x) \, d\mu(y), \qquad (6\text{-}1)$$

where $\mu = P - Q$. Due to the positive definiteness of $K$, $D_K(P, Q)$ is non-negative, and $P = Q$ implies $D_K(P, Q) = 0$. A dissimilarity measure defined in such a way compares particular features of $P$ and $Q$. For example, if $X = \mathbb{R}$ and $K(x, y) = xy$, then $D_K(P, Q) = (\mathbb{E}_P[X] - \mathbb{E}_Q[Y])^2$, i.e. the kernel $K(x, y) = xy$ only compares the means of the random variables $X \sim P$ and $Y \sim Q$.

A divergence measure $D_K : \mathcal{M}^+ \times \mathcal{M}^+ \to \mathbb{R}^+_0$ can be constructed using (6-1) by incorporating a strictly positive definite kernel. Due to the strict positive definiteness of $K$, $D_K(P, Q)$ is non-negative, and zero if and only if $P = Q$. As a matter of fact, one can show that the resulting divergence is a metric induced from the distance in the reproducing kernel Hilbert space (RKHS) if the positive definite kernel is Hermitian [29].

One of the major advantages of (6-1) is the simplicity of its estimator.

Definition 8 (Estimator). Given samples $\{x_i\}_{i=1}^{N_P}$ and $\{y_j\}_{j=1}^{N_Q}$ from $P$ and $Q$ respectively, the estimator for $D_K(P, Q)$ can be defined as

$$\hat{D}_K(P, Q) = \frac{1}{N_P^2} \sum_{i=1}^{N_P} \sum_{j=1}^{N_P} K(x_i, x_j) - \frac{2}{N_P N_Q} \sum_{i=1}^{N_P} \sum_{j=1}^{N_Q} K(x_i, y_j) + \frac{1}{N_Q^2} \sum_{i=1}^{N_Q} \sum_{j=1}^{N_Q} K(y_i, y_j). \qquad (6\text{-}2)$$

This estimator is consistent, i.e. $\hat{D}_K(P, Q) \xrightarrow{a.s.} D_K(P, Q)$.
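A direct, self-contained sketch of the estimator (6-2) for real-valued samples with a Gaussian kernel; both choices are illustrative stand-ins (spike train kernels are the subject of Sections 6.5 and 6.6), and the function names are ours.

```python
import math

def kernel_dissimilarity(xs, ys, kernel):
    """Empirical D_K as in (6-2): mean kernel within the P-samples and within
    the Q-samples, minus twice the mean kernel across the two sets."""
    n_p, n_q = len(xs), len(ys)
    pp = sum(kernel(a, b) for a in xs for b in xs) / (n_p * n_p)
    pq = sum(kernel(a, b) for a in xs for b in ys) / (n_p * n_q)
    qq = sum(kernel(a, b) for a in ys for b in ys) / (n_q * n_q)
    return pp - 2.0 * pq + qq

gauss = lambda x, y: math.exp(-(x - y) ** 2)
```

With a strictly positive definite kernel such as `gauss`, identical sample sets give 0 and well-separated sets give a strictly positive value.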


While (6-1) provides a large class of divergences, instead of using a fixed kernel $K$, which, casually speaking, compares two probability laws on the entire sample space, it is possible to use a kernel that compares two probability laws only on their support. This motivates us to design a measure dependent divergence as follows.

Definition 9 (Measure dependent divergence). A measure dependent divergence is defined as

$$D_{K_\nu}(P, Q) = \iint K_\nu(x, y) \, d\mu(x) \, d\mu(y), \qquad (6\text{-}3)$$

where $\mu = P - Q$, $\mu \ll \nu$ (absolute continuity), $\nu$ depends on $P$ and $Q$, and $K_\nu$ is a strictly positive definite kernel for all non-zero measures absolutely continuous with respect to $\nu$ (cf. Definition 7).

It directly follows from the definition that (6-3) is a divergence. For example, if $\nu = \frac{P+Q}{2}$, then the induced divergence compares the difference between $P$ and $Q$ only on the support of $P + Q$ instead of the entire space.

6.3 Strictly Positive Definite Kernels on $\mathbb{R}^n$ and $L_2$

In $\mathbb{R}^n$, strictly positive definite kernels are well studied [37]. We briefly discuss a few families of strictly positive definite kernels of special interest.

6.3.1 Composite Kernels on $\mathbb{R}^n$

Definition 10 ($\mathcal{S}$-admissible functions). Let $\mathcal{S}$ be the Schwartz space of rapidly decreasing functions. $g : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ is $\mathcal{S}$-admissible if for every $h \in \mathcal{S}$ the following integral equation has a solution $f$: $\int g(x, u) f(u) \, du = h(x)$.

Theorem 14 (Composition kernels). If $g : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{C}$ is an $\mathcal{S}$-admissible or a strictly positive definite function, and $\mu$ is a measure with $\mathrm{supp}(\mu) = \mathbb{R}^n$, then the following kernel is strictly positive definite on $\mathbb{R}^n$: $K(x, y) = \int g(x, u) \overline{g(y, u)} \, d\mu(u)$. [Proof at the end of the chapter.]

Examples of basis functions $g(x, u)$ used for composition kernels on $\mathbb{R}$ are $e^{ixu}$, $I(x \leq u)$, and $e^{-(x-u)^2}$. In $\mathbb{R}^n$, we can use the tensor product kernels $\prod_i g(x_i, u_i)$.
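As a quick numerical check of Theorem 14 for the Gaussian basis $g(x, u) = e^{-(x-u)^2}$ on $\mathbb{R}$ with Lebesgue $\mu$, the composition integral has the closed form $\sqrt{\pi/2}\, e^{-(x-y)^2/2}$ (completing the square in the exponent). The sketch below compares midpoint-rule quadrature against it; the integration bounds and grid size are arbitrary choices of ours.

```python
import math

def composition_kernel_numeric(x, y, lo=-10.0, hi=10.0, n=20000):
    """Midpoint-rule approximation of K(x, y) = integral over [lo, hi] of
    exp(-(x-u)^2) * exp(-(y-u)^2) du."""
    h = (hi - lo) / n
    return h * sum(math.exp(-(x - (lo + (i + 0.5) * h)) ** 2
                            - (y - (lo + (i + 0.5) * h)) ** 2)
                   for i in range(n))

def composition_kernel_closed(x, y):
    """Closed form of the same integral: sqrt(pi/2) * exp(-(x - y)^2 / 2)."""
    return math.sqrt(math.pi / 2) * math.exp(-(x - y) ** 2 / 2)
```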


6.3.2 Schoenberg Kernels (or Radial Basis Functions) on $L_2$

Definition 11 (Completely monotone functions [106]). A function $\varphi : (0, \infty) \to \mathbb{R}$ is said to be completely monotone on $[0, \infty)$ if $\varphi \in C^\infty(0, \infty)$, $\varphi$ is continuous at zero, and $(-1)^l \varphi^{(l)}(r) \geq 0$ for $l \in \mathbb{N}$, $r > 0$, where $\varphi^{(l)}$ denotes the $l$-th derivative.

Examples of completely monotone functions on $[0, \infty)$ are $\alpha$, $e^{-\alpha x}$, and $\frac{1}{x + \beta^2}$, where $\alpha$ and $\beta$ are constants.

Theorem 15 (Strictly positive definite Schoenberg kernels on $L_2$). If a function $\varphi : [0, \infty) \to \mathbb{R}$ is completely monotone on $[0, \infty)$ but not a constant function, then $K(x, y) = \varphi(\|x - y\|^2)$ is strictly positive definite on $L_2$, where $\|\cdot\|$ denotes the $L_2$ norm. [Proof at the end of the chapter.]

6.4 Representation of Spike Trains and Point Process Spaces

6.4.1 Smoothed Spike Train Space

An alternative representation of the spike train space uses an injective transform $S : \Omega \to L_2(X)$ from the space of spike trains to the space of $L_2$ integrable continuous functions with a finite number of discontinuities [44, 84]. Given a spike train $\omega = \{t_i\}_{i=1}^{n} \in \Omega_n$, this is usually achieved by the linear smoothing $\omega \mapsto \tilde{x}(t) = \sum_{i=1}^{n} g(t - t_i)|_X$, where $g$ is the impulse response of the smoothing filter [84, 121]. The function $g$ is often taken to be strictly positive definite in the sense of Bochner [10], since this makes $S$ injective.

6.5 Stratified Spike Train Kernels

The stratification introduced in Section 6.3 provides an opportunity to build kernels by combining kernels on $\mathbb{R}^n$.

Theorem 16 (Stratified strictly positive definite kernel). Let $\{K^{(n)}\}_{n=0}^{\infty}$ be a family of (strictly) positive definite kernels $K^{(n)} : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{C}$ for every $n \in \mathbb{N}$. Define

$$K_s(\omega_i, \omega_j) = \begin{cases} K^{(n)}(\omega_i, \omega_j) & \text{if both } \omega_i, \omega_j \in \Omega_n, \\ 0 & \text{otherwise.} \end{cases} \qquad (6\text{-}4)$$

Then $K_s$ is a (strictly) positive definite kernel on spike trains.


Proof. By direct sum.

The corresponding dissimilarity (divergence) can be simplified as

$$D_s(P, Q) = \sum_{n=0}^{\infty} \iint K^{(n)}(\omega_i, \omega_j) \, d\mu|_n(\omega_i) \, d\mu|_n(\omega_j), \qquad (6\text{-}5)$$

where $\mu = P - Q$ and $\mu|_n$ denotes the restriction of the measure $\mu$ to $\mathcal{F}_n$. Below, we present a few special cases of this type of kernel.

If $K^{(n)} \equiv 1$ for all $n \in \mathbb{N}$, then the dissimilarity becomes $D_1(P, Q) = \sum_n (P(n) - Q(n))^2$, which corresponds to the Cramér-von Mises (C-M) statistic for the count distributions $[P(n)]_n$ and $[Q(n)]_n$.

Another interesting kernel is found by using the following composite kernels on $\mathbb{R}^n$:

$$K_s^{(n)}(\omega_i, \omega_j) = \int I(\omega_i \leq \omega) \, I(\omega_j \leq \omega) \, d\nu(\omega),$$

where $\nu$ is a positive measure with $\mathrm{supp}(\nu) = \Omega_n$, $I(\omega_i \leq \omega_j) = \prod_d I(\omega_i^d \leq \omega_j^d)$, and $\omega = [\omega^1, \ldots, \omega^d]$. The corresponding divergence is given by

$$D_s = \sum_n \iint \left[ \int I(\omega_1^{(n)} \leq \omega^{(n)}) \, I(\omega_2^{(n)} \leq \omega^{(n)}) \, d\nu|_n(\omega^{(n)}) \right] d(P - Q)|_n(\omega_1^{(n)}) \, d(P - Q)|_n(\omega_2^{(n)}) = \sum_n \int \left( P(n) F_P(\omega^{(n)}) - Q(n) F_Q(\omega^{(n)}) \right)^2 d\nu|_n(\omega^{(n)}), \qquad (6\text{-}6)$$

where $F$ denotes the cumulative distribution function in $\mathbb{R}^n$. Given $\nu = \frac{P+Q}{2}$, it can be shown that the resulting divergence is the cumulative distribution function (CDF) based Cramér-von Mises type divergence proposed in Chapter 4.

If $g(x, u)$ is the delta function $\delta(x, u)$ such that $\int f(x) \delta(x, u) \, dx = f(u)$ for every continuous function $f(x)$, then the induced divergence is given by

$$D_{K_p}(P, Q) = \sum_n \int \left( P(n) f_P^{(n)}(u) - Q(n) f_Q^{(n)}(u) \right)^2 d\lambda|_n(u), \qquad (6\text{-}7)$$


where $f_P^{(n)}$ and $f_Q^{(n)}$ are the (continuous) density functions on $\Omega_n$; i.e., the induced divergence is just the $L^2$ distance between two density functions. In practice the $\delta$ function can be replaced by a smoothing function $g$ that satisfies the condition in Theorem 14.

6.6 Kernels on Smoothed Spike Trains

A (strictly) positive definite kernel $K(\omega_1, \omega_2)$ on $\Omega$ can be defined using a (strictly) positive definite kernel $\tilde{K}(\tilde\lambda_1, \tilde\lambda_2)$ on the $L^2$ space, since
$$\iint K(\omega_1, \omega_2)\, d\mu(\omega_1)\, d\mu(\omega_2) = \iint \tilde{K}(\tilde\lambda_1, \tilde\lambda_2)\, d\tilde\mu(\tilde\lambda_1)\, d\tilde\mu(\tilde\lambda_2) \ge 0,$$
where $\tilde\mu$ denotes the measure in $L^2$ induced by the measure $\mu$ in $\Omega$, i.e., $\tilde\mu(A) = \mu(S^{-1}(A))$ if $A \subseteq S(\Omega)$ and zero otherwise.

Paiva and coworkers proposed the following kernel on spike trains [84]:
$$K_{L^2}(\omega_1, \omega_2) = \int_X \tilde\lambda_1(t)\, \tilde\lambda_2(t)\, dt, \tag{6-8}$$
where $\tilde\lambda$ is obtained by linear smoothing (see Section 5.4.1). It is easy to see that this is a symmetric positive definite (but not strictly positive definite) kernel; therefore, the dissimilarity statistic $D_{K_{L^2}}$ is not a divergence. $K_{L^2}$ induces a spike train RKHS where the van Rossum like distance [121] is induced by the inner product.

Direct sums of (strictly) positive definite kernels define a (strictly) positive definite kernel. This motivates the following construction.

Theorem 17. Given a bounded strictly positive definite function $\phi : \mathbb{R} \to \mathbb{R}$ and a positive function $h : \mathbb{R} \to \mathbb{R}^+$, the kernel $K : \Omega \times \Omega \to \mathbb{R}$ defined by
$$K(\omega_1, \omega_2) = \int_X h(t)\, \phi\!\left( \tilde\lambda_1(t) - \tilde\lambda_2(t) \right) dt \tag{6-9}$$
is strictly positive definite.

Proof. It is easy to see from the following form:
$$\iint K(\omega_i, \omega_j)\, d\mu(\omega_i)\, d\mu(\omega_j) = \int_X h(t) \iint \phi\!\left( \tilde\lambda_{\omega_i}(t) - \tilde\lambda_{\omega_j}(t) \right) d\mu(\omega_i)\, d\mu(\omega_j)\, dt.$$
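For the inner product kernel (6-8), certain smoothing filters make the double integral collapse into a sum over spike pairs. A sketch assuming a causal exponential filter $g(t) = e^{-t/\tau}$ for $t \ge 0$, under which $\int g(t - t_i)\, g(t - s_j)\, dt = \frac{\tau}{2} e^{-|t_i - s_j|/\tau}$ (the name `mci_kernel` and the value of $\tau$ are illustrative):

```python
import numpy as np

def mci_kernel(train1, train2, tau=0.1):
    # K_L2 of (6-8) with causal exponential smoothing g(t) = exp(-t/tau):
    # the integral of the product of smoothed trains reduces to
    # (tau / 2) * sum_i sum_j exp(-|t_i - s_j| / tau).
    t = np.asarray(train1, float).reshape(-1, 1)
    s = np.asarray(train2, float).reshape(1, -1)
    return float(0.5 * tau * np.exp(-np.abs(t - s) / tau).sum())
```

The pairwise sum avoids any numerical integration, which is why this kernel is cheap to evaluate compared with the nonlinear kernels below.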


As a special case, the nonlinear kernel of the following form, proposed in [84], is strictly positive definite:
$$K_a(\omega_1, \omega_2) = \int_X \exp\!\left( -\left( \tilde\lambda_1(t) - \tilde\lambda_2(t) \right)^2 \right) dt. \tag{6-10}$$
Finally, following Theorem 15, any kernel of the form
$$K(\omega_1, \omega_2) = \phi\!\left( \left\| \tilde\lambda_1 - \tilde\lambda_2 \right\|^2 \right), \tag{6-11}$$
where $\phi(\cdot)$ is a completely monotone function and $\|\cdot\|$ is the $L^2$ norm, is strictly positive definite. In [84], the authors have also proposed the following special case:
$$K_b(\omega_1, \omega_2) = \exp\!\left( -\int \left( \tilde\lambda_1(t) - \tilde\lambda_2(t) \right)^2 dt \right). \tag{6-12}$$

6.7 Simulation Results

6.7.1 Kernel PCA

    shape parameter    1        2        4        10
    K_b                0.0289   0.4367   0.9999   0.9999
    K_a                0.1072   0.9893   0.9999   0.9999
    K_s                0.0367   0.0782   0.7623   0.9999

Table 6-1. Statistical power estimated from 1000 Monte Carlo runs for each kernel, given 40 spike trains. Columns correspond to the shape parameter of the Gamma interval distribution (see Figure 6-1).

When a Hermitian strictly positive definite kernel is used, the kernel based divergence between two probability laws can be seen as the distance between the corresponding mean elements in the RKHS [117]. Therefore, the performance of a specific kernel can be visualized through the separation of the mean elements in the RKHS. However, since it is not possible to visualize the mean element in the infinite-dimensional RKHS, we perform kernel principal component analysis (KPCA) and use the first two eigenvectors for visualization [107].
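The KPCA visualization can be sketched directly from a precomputed Gram matrix: double-center it, take the leading eigenvectors, and project. A generic sketch (not the thesis code; the eigenvector scaling is one common convention):

```python
import numpy as np

def kernel_pca(gram, n_components=2):
    # Project samples on the first principal axes in the RKHS, given a
    # precomputed Gram matrix gram[i, j] = K(omega_i, omega_j).
    n = gram.shape[0]
    one = np.ones((n, n)) / n
    kc = gram - one @ gram - gram @ one + one @ gram @ one  # double centering
    evals, evecs = np.linalg.eigh(kc)                       # ascending order
    top = np.argsort(evals)[::-1][:n_components]
    # alpha_k = v_k / sqrt(lambda_k) normalizes the feature-space axes;
    # the projections are then kc @ alpha_k = sqrt(lambda_k) * v_k.
    return evecs[:, top] * np.sqrt(np.maximum(evals[top], 0.0))
```

Applied to a Gram matrix of spike train kernel evaluations, the first two columns of the result give the scatter plots of Figure 6-1.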


Figure 6-1. KPCA results of different strictly positive definite kernels, visualized on the first two principal components. $K_b$ is global, $K_a$ is local, and $K_s$ is stratified (see text for details). 100 spike trains each are generated from two renewal processes with fixed rate and Gamma distributed intervals (top). The shape parameter of the Gamma distribution varies across columns; shape parameter 1 corresponds to the Poisson process. One of the two classes was always fixed to be Poisson; the first column corresponds to the indistinguishable case. The diamond and circle are the means of each class. The mean number of spikes is set to 10.

We compare the performance of the following three kernels: $K_b$ (6-12), $K_a$ (6-10), and $K_s$ (6-4). For $K_b$, the smoothing is obtained using the Heaviside function, i.e., $\tilde\lambda(t) = \sum_i I(t \ge t_i)$, whereas for $K_a$ the smoothing is obtained using a rectangular window, i.e., $\tilde\lambda(t) = \sum_i I(t_i - \Delta \le t \le t_i + \Delta)$. Therefore, while $K_b$ remembers all the history over time, $K_a$ is only sensitive to local events within the window. For $K_s$ we have taken $K^{(n)}$ to be an $n$-dimensional spherical Gaussian. Since a strictly positive definite kernel is characteristic [117], the mean in the RKHS for each class is unique whenever the generating point processes are distinct (the right three columns in the figure; the first column is for identical point processes) [117].
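The Heaviside and rectangular smoothings just described can be evaluated numerically on a time grid; the following sketch of $K_a$ (6-10) and $K_b$ (6-12) uses a plain Riemann sum (the grid, the half-width, and the function names are illustrative choices, not the thesis implementation):

```python
import numpy as np

def smooth(train, t_grid, kind="heaviside", half_width=0.05):
    # lambda~(t): Heaviside smoothing accumulates step functions;
    # rectangular smoothing uses a window around each spike.
    lam = np.zeros_like(t_grid)
    for ti in train:
        if kind == "heaviside":
            lam += (t_grid >= ti).astype(float)
        else:
            lam += (np.abs(t_grid - ti) <= half_width).astype(float)
    return lam

def kernel_a(train1, train2, t_grid, half_width=0.05):
    # K_a of (6-10): integrate a Gaussian of the pointwise difference.
    d = (smooth(train1, t_grid, "rect", half_width)
         - smooth(train2, t_grid, "rect", half_width))
    dt = t_grid[1] - t_grid[0]
    return float(np.sum(np.exp(-d ** 2)) * dt)

def kernel_b(train1, train2, t_grid):
    # K_b of (6-12): Gaussian of the integrated squared difference.
    d = (smooth(train1, t_grid, "heaviside")
         - smooth(train2, t_grid, "heaviside"))
    dt = t_grid[1] - t_grid[0]
    return float(np.exp(-np.sum(d ** 2) * dt))
```

Identical trains give $K_b = 1$ exactly, and $K_a$ approaches the length of the observation window as the grid is refined.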


Figure 6-2. Hypothesis test on two action potentials that are correlated (H0) and independent (H1). (Left) Spike trains from the null and alternate hypotheses. (Right) Comparison of the power of each method. The y-axis indicates the statistical power (the complement of the type II error rate) given a fixed type I error ($\alpha = 0.05$), as estimated via a surrogate test. Various kernels are compared; see the caption of Figure 6-1 for details. For L2, the number indicates the bin size for rate estimation.

However, the KPCA projections show that the representation differs significantly between kernels. We emphasize that it is not necessary for the kernel to cluster the sample points to derive a good divergence measure; only the distance between the means matters. We show the performance of the KPCA in Figure 6-1 and report the power of the corresponding hypothesis test in Table 6-1. We observe that each kernel has high discriminating power on this dataset.

6.7.2 Statistical Power

In this section, we empirically compare the small sample power of the proposed divergences against a baseline dissimilarity (L2) which computes the $\ell_2$ distance between PSTHs (peri-stimulus time histograms). We generate two classes of spike trains and test whether they follow the same probability law. Both point processes share the same marginal intensity function, but they have different correlation structure (Fig. 6-2, left). In the first process (H0 in the figure), the two event timings are correlated; the interspike interval (ISI)


has a narrow distribution. In the second process (H1 in the figure), the two event timings are independent; each action potential has a precise timing. Both point processes have lossy noise: each action potential disappears with probability $p = 0.1$. Note that H0 represents a renewal type of neural code, while process H1 represents precisely timed action potentials. Since the highest dimension of the problem is 2, this problem is easier for the stratified kernels (Fig. 6-2, right). Nevertheless, the smoothed kernel based divergences quickly catch up as the number of samples increases. The firing rate function based dissimilarity (L2) performs poorly, since the intensity function is designed to be identical.

6.8 Discussion

In this chapter, we have introduced a family of (strictly) positive definite kernels on the space of spike trains to design dissimilarities (divergences) between point processes. We have shown that the proposed kernels often lead to popular dissimilarity (divergence) measures that are widely used in practice, and that many existing kernels are actually strictly positive definite in nature. However, given this collection of kernels, it is natural to ask which kernel is better. In the following paragraphs, we briefly discuss the pros and cons of the proposed kernels.

The kernels defined on the stratified space are more efficient to compute and easy to design. However, this type of kernel suffers with small sample sizes: if the count distribution is flat (not concentrated), i.e., the samples are scattered across different partitions $\Omega_n$, the divergence becomes difficult to evaluate. In addition, the stratified approach suffers from the curse of dimensionality when the number of spikes in a spike train is large. Hence, it may be necessary to reduce the time window of observation to reduce the dimension. On the other hand, the kernels defined on the smoothed space do not suffer from these problems, since they can compare two spike trains with different numbers of spikes. However, these kernels are rather difficult to compute, due to the integration required for evaluating the kernel.
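The surrogate tests behind the power estimates of Section 6.7.2 are permutation tests: pool the two samples, re-split them repeatedly at random, and compare the observed statistic against the resulting null distribution. A generic sketch (the divergence is passed in as a function; all names are illustrative):

```python
import numpy as np

def permutation_pvalue(divergence, sample_p, sample_q, n_perm=500, rng=None):
    # Surrogate test: p-value of the observed divergence under the null
    # hypothesis that both samples come from the same point process.
    rng = np.random.default_rng(rng)
    observed = divergence(sample_p, sample_q)
    pooled = list(sample_p) + list(sample_q)
    n_p = len(sample_p)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        perm_p = [pooled[i] for i in idx[:n_p]]
        perm_q = [pooled[i] for i in idx[n_p:]]
        if divergence(perm_p, perm_q) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)  # add-one correction avoids p = 0
```

Rejecting whenever the p-value falls below $\alpha$ fixes the type I error; the fraction of rejections over many draws from H1 estimates the power.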


In the smoothed representation, we have introduced three different kernels. The first kernel (6-8) is simply the inner product between the smoothed spike trains. This kernel is easier to evaluate than the other kernels on this space. However, it is a positive definite kernel (not strictly), and the corresponding dissimilarity measure can be expressed as the $L^2$ distance between two intensity functions [84]. The second kernel (6-10) is strictly positive definite; however, it is difficult to evaluate, since it requires an integral that might be expensive. The final kernel (6-12) provides a compromise between performance and computational cost, since it is strictly positive definite but its computational cost is similar to that of the inner product kernel. Notice that the computational cost of these kernels can be reduced by choosing appropriate smoothing functions $g$ such as the Heaviside function or a rectangular function. In Table 6-2, we provide a brief description of the proposed methods along with their complexity and relation to the literature.

6.9 Proof of Theorem 14

First we show that it is positive definite:
$$D_K(P,Q) = \iint K(x,y)\, d\mu(x)\, d\mu(y) = \iiint g(x,u)\, g(y,u)\, d\nu(u)\, d\mu(x)\, d\mu(y) = \int \left( \int g(x,u)\, d\mu(x) \right)^2 d\nu(u) \ge 0.$$
To show strict positive definiteness, we need to show that $D_K(P,Q) = 0$ implies $\mu = 0$. Suppose $D_K(P,Q) = 0$; then
$$\int \left( \int g(x,u)\, d\mu(x) \right)^2 d\nu(u) = 0 \iff \int g(x,u)\, d\mu(x) = 0 \quad \nu\text{-a.e. } u,$$
and, since $\mathrm{supp}(\nu) = \mathbb{R}^n$, this holds everywhere. If $g$ is strictly positive definite, $\iint g(x,y)\, d\mu(x)\, d\mu(y) = 0$ implies $\mu = 0$. In the case where $g$ is $\mathcal{S}$-admissible, we multiply both sides by an arbitrary function $f(u)$ and integrate:
$$\int f(u) \int g(x,u)\, d\mu(x)\, du = 0 \implies \iint g(x,u)\, f(u)\, du\, d\mu(x) = 0.$$


    eq#     Kernel                                                                          divergence?   time complexity   ref

  smoothed
    (6-8)   $K = \int_X \tilde\lambda_1(t)\,\tilde\lambda_2(t)\,dt$                         no            $O(N^2 M^2)$ (a)  [84]
    (6-10)  $K = \int_X \exp(-(\tilde\lambda_1(t)-\tilde\lambda_2(t))^2)\,dt$               yes           $O(N^2 T)$        [84] †
    (6-12)  $K = \exp(-\int(\tilde\lambda_1(t)-\tilde\lambda_2(t))^2\,dt)$                  yes           $O(N^2 M^2)$ (a)  [84] †
    (6-8)   $K = \int_X \tilde\lambda_{S1}(t)\,\tilde\lambda_{S2}(t)\,dt$                   -             $O(N^2 T)$        [44] (b)

  stratified
    -       $K^{(n)} = 1$                                                                   no            $O(N M)$          count C-M
    (6-6)   $K_s^{(n)} = \int I(\omega_1 \le \omega)\,I(\omega_2 \le \omega)\,d\frac{P+Q}{2}(\omega)$  yes  $O(N^2 M^2)$   Ch. 4 †
    -       $K_{ch}^{(n)} = \int \exp(ix\omega)\exp(-iy\omega)\,d\nu(\omega)$               yes           $O(N^2 M^2)$      [57] (c) †
    -       $K_p^{(n)} = \int \kappa(x-u)\,\kappa(y-u)\,du$                                 yes           $O(N^2 M^2)$      [127] †

  (a) using an exponential smoothing kernel. (b) nonlinear smoothing. (c) presented in the context of independence.

Table 6-2. List of kernels for spike trains of special interest and their corresponding time complexity. The worst case time complexity is presented, where $N = \max(N_P, N_Q)$, $M$ is the maximum number of spikes over all the samples, and $T$ is the number of numerical integration steps. Note that the stratified kernels can be evaluated more efficiently because spike trains with different numbers of events are not compared. Although we refer to the literature (marked with †) for the kernels, to the best of our knowledge they were not explicitly proven to be strictly positive definite before.

For any $h \in \mathcal{S}$, $\int g(x,u)\, f(u)\, du = h(x)$ has a solution $f$; therefore, $\int h(x)\, d\mu(x) = 0$. Since $h(x) \in \mathcal{S}$, this condition implies that $\mu = 0$. This follows from the properties of the Schwartz space.

6.10 Proof of Theorem 15

Since $\phi$ is a completely monotone function, it has a Laplace transform representation
$$\phi(x) = \int_0^\infty e^{-xt}\, dm(t),$$
where $m$ is a finite positive Borel measure. Let $\{\psi_k\}$ be an orthonormal basis of $L^2$. We can represent any function $f \in L^2$ as $f = \sum_j a_j \psi_j$. Then, for any set $\{\alpha_l\} \subset \mathbb{C}$ and


$\{f_l\} \subset L^2$,
$$\sum_l \sum_k \alpha_l \bar\alpha_k\, \phi(\|f_l - f_k\|^2) = \int_0^\infty \sum_l \sum_k \alpha_l \bar\alpha_k \exp(-t \|f_l - f_k\|^2)\, dm(t) = \int_0^\infty \sum_l \sum_k \alpha_l \bar\alpha_k \exp(-t \|a_l - a_k\|^2)\, dm(t) \ge 0,$$
where $a_l$ is the vector of coefficients $\{a_j^l\}$ and the inequality follows from the fact that the Gaussian is a positive definite function. It remains to show that equality implies that all $\alpha_l$ are zero. Suppose equality holds, that is,
$$\sum_l \sum_k \alpha_l \bar\alpha_k \exp(-t \|f_l - f_k\|^2) = 0 \quad m\text{-a.e. } t.$$
Take a sequence of independent zero mean, unit variance Gaussian random variables $\{x_j\}$. Due to the Fourier transform property of the Gaussian,
$$E\!\left[ \exp\!\left( i \sum_j a_j x_j \right) \right] = \exp\!\left( -\frac{1}{2} \sum_j a_j^2 \right).$$
$$\implies \sum_l \sum_k \alpha_l \bar\alpha_k \exp\!\left( -\frac{1}{2} \sum_j (a_j^l - a_j^k)^2 \right) = E\!\left[ \left| \sum_k \alpha_k \exp\!\left( i \sum_j a_j^k x_j \right) \right|^2 \right] = 0.$$
Hence,
$$\sum_k \alpha_k \exp\!\left( i \sum_j a_j^k x_j \right) = 0 \quad \text{a.e.}$$
By taking conditional expectations,
$$\sum_k \alpha_k \exp\!\left( i a_1^k x_1 \right) E\!\left[ \exp\!\left( i \sum_{j \ge 2} a_j^k x_j \right) \right] = \sum_k \alpha_k \beta_k \exp\!\left( i a_1^k x_1 \right) = 0 \quad \text{a.e.},$$
where $\beta_k = E[\exp(i \sum_{j \ge 2} a_j^k x_j)] = \exp(-\frac{1}{2} \sum_{j \ge 2} (a_j^k)^2) > 0$. Multiplying both sides with the conjugate and taking the expectation, we get
$$\sum_l \sum_k \alpha_l \bar\alpha_k \beta_l \bar\beta_k \exp\!\left( -\frac{1}{2} (a_1^k - a_1^l)^2 \right) = 0.$$


From the strict positive definiteness of the Gaussian kernel, and from the fact that $\beta_k > 0$, we conclude that $\alpha_k = 0$ for all $k$.
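The claim of Theorem 15 can be sanity-checked numerically: with a completely monotone $\phi$ such as $\phi(r) = e^{-r}$, the Gram matrix $\phi(\|x_i - x_j\|^2)$ over distinct points should have strictly positive eigenvalues. A sketch in which finite coefficient vectors stand in for $L^2$ elements (all names are illustrative):

```python
import numpy as np

def schoenberg_gram(points, phi=lambda r: np.exp(-r)):
    # Gram matrix K[i, j] = phi(||x_i - x_j||^2); phi completely monotone.
    pts = np.asarray(points, float)
    sq = np.sum((pts[:, None, :] - pts[None, :, :]) ** 2, axis=2)
    return phi(sq)

rng = np.random.default_rng(0)
pts = rng.normal(size=(8, 3))  # 8 distinct points standing in for L2 elements
eigs = np.linalg.eigvalsh(schoenberg_gram(pts))
```

With $\phi(r) = e^{-r}$ this is exactly the Gaussian RBF case of the theorem, so all eigenvalues of `schoenberg_gram(pts)` come out strictly positive for distinct points.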


CHAPTER 7
CONCLUSION

The non-stationary, non-Poisson, and heterogeneous nature of trial-to-trial variability in spike train observations calls for flexible analysis tools for measuring similarities. Instead of assuming a parametric structure on spike trains, our goal is to develop statistical divergences that directly measure the similarity between point processes, connecting the fundamental concepts of noise and similarity. Divergences are direct extensions of ITL concepts where the full description of the probability laws is utilized instead of partial statistics: they measure distances in the space of point processes. Divergence serves as a cornerstone to build statistical learning and inference. We posed several neuroscience problems, such as change detection, clustering, and parameter estimation, as statistical inferences.

We proposed three divergence families and corresponding estimators. They are motivated from different principles and estimation strategies. The cumulative based divergence extends the K-S test through the use of a CDF representation on the stratified space. Its main advantage is the lack of free parameters; however, using the empirical CDF without smoothing leads to low statistical power of the estimator. The cumulative based point process divergence has low time complexity (Figure 7-1). Due to stratification, it works best with a small number of action potentials per trial. Therefore, it is suitable for searching massive datasets with a relatively high number of trials and a small number of action potentials per trial.

The Hilbertian metric and $\phi$-divergence bring an inner product structure to the space of point processes. We presented the Hellinger and symmetric $\chi^2$-divergences, both of which are quite powerful (Figure 7-2). While both divergences can be estimated via Parzen windowing on the stratified space, the symmetric $\chi^2$-divergence allows a direct estimation of the likelihood ratio while maintaining $L^2$ consistency. Although density estimation based divergence estimation performs better in low dimensional


Figure 7-1. Comparison of typical execution time of different divergence estimators. (Intel Pentium IV 3.2 GHz, MATLAB 2009b)

problems (fewer spikes), it suffers from slow convergence on high dimensional data. In contrast, the direct estimation of the likelihood ratio performs equally well regardless of the dimension; however, its computational complexity is much higher (the slowest; see Figure 7-1). Being Hilbertian, the divergence is directly connected to a symmetric positive definite kernel on point processes, thus enabling a plethora of kernel algorithms.

The kernel based divergence (based on strictly positive definite kernels) approaches the problem directly, in the sense that no intermediate statistical object, such as a cumulative distribution, probability density, or likelihood ratio, is estimated. However, it turns out that quadratic divergence measures such as the extended C-M test, which was developed as a cumulative based divergence, as well as the L2 distance (in the case of the mCI kernel), are special cases of the kernel based divergence. The estimation is straightforward, and has intermediate time complexity. When a symmetric kernel is applied, it can be interpreted as the distance between the means in the RKHS; this is equivalent to the $D_{ED}$ criterion (Euclidean divergence) in the ITL framework. Moreover, the symmetric kernel induces a


Hilbertian metric on the point process space as well, but this is fundamentally different from the $\phi$-divergence subclass.

Figure 7-2. Comparison of statistical power among the proposed methods. (Left) PTST against an equi-intensity Poisson process. (Right) Renewal process against an equi-intensity Poisson process.

All the proposed measures are binless and asymptotically consistent. In theory, consistency means that given an arbitrarily large number of trials, the estimator converges to the theoretical value. However, when the number of trials is finite, as is always the case in practice, we may have poor estimation. The binless property implies that the estimator utilizes all the given information, which would have been lost if binning were performed. For example, the direct estimation of entropy with binning requires orders of magnitude more trials than binless methods [123, 125]. Each proposed estimator behaves differently in the small sample size regime. Generally speaking, the stratified kernels are more sensitive to changes in the count distribution, and the non-stratified kernel-based methods are more sensitive to firing rate profile changes. In addition, the choice of parameters such as the kernel type and kernel size adds complexity to using the divergence. Applying multiple divergences or tuning the parameters can lead to an incorrectly high rejection rate in hypothesis testing, and careful design of multiple testing is necessary [6]. We leave this as an open problem. In practice, the proposed methods should be applied when the simple test on the rate function fails.


One immediate extension of the proposed methods is to generalize them to multiple spike trains. A pairwise measure is only useful when observing and analyzing a single spike train, and the need for multivariate analysis is evident; hypothesis testing can be used to determine whether two point processes are statistically dependent or not. However, estimation of a joint point process is a much more difficult problem. The stratified space approach is very likely to fail, due to the combinatorial explosion of the number of strata. Therefore, we should focus on developing kernels of the form $K : \Omega^d \times \Omega^d \to \mathbb{R}$. The simplest form is the product kernel; however, it is possible to design other efficient kernels directly in the multi-dimensional form. We also leave this for future directions.

Another interesting future application of divergence measures is to use them as cost functions to obtain adaptive filtering and learning algorithms. In ITL, this approach has brought fruitful results such as robust adaptive filtering and information theoretic clustering [101]. These divergences can potentially improve MSE or MLE based estimation processes as well. Designing synaptic plasticity rules by optimizing the divergences is also an interesting avenue [11, 16, 61, 119].

In addition, individual divergences provide different structure to the point process space. Being divergences, they behave similarly when the two point processes are very close to each other. However, when far apart, they emphasize different aspects of the distance. In an optimization problem, when the optimum (zero divergence) is achieved, the choice of divergence only alters the convergence rate and estimation quality. In contrast, if the optimum is not achievable, each divergence may give a distinct answer even theoretically. Therefore, further investigation of each divergence is necessary as future work.

The simulated examples clearly show that the divergences are superior to dissimilarities. However, the analysis of the real datasets shows that despite being a dissimilarity, the L2 distance of the mean rate function (or equivalently the kernel dissimilarity with the mCI kernel) performs comparably to the proposed divergences. There are two explanations for this phenomenon. First, the experimentalist's bias plays a key role. We have not designed


the experiments to demonstrate our superiority, and the experimentalist is likely to discard the conditions or protocols that do not show a significant change in the mean rate profile, thus creating datasets that can be mostly described by the mean rate function. Second, a change in the neural system often accompanies a change in the mean firing rate. This is perhaps the reason why scientists still rely on firing rate and count statistics. It is undeniable that the mean firing rate contains a significant amount of information. However, the point of distribution-free divergences is to analyze the "additional" information that scientists may be blind to when only using the first order statistic. As future work, we propose a two stage approach to find evidence that the full statistic is needed for real data. For a target sensory system, collect as many trials as possible with the slightest changes in the input, so that the output spike trains cannot be fully classified via the mean rate function only. Then, using a divergence, verify whether it is possible to sub-classify the results that the mean rate function classifies only ambiguously. This may reveal novel tuning properties of neurons as well.

One last disadvantage of using a divergence for hypothesis testing that we are aware of is that when the hypothesis is rejected, the divergence does not explicitly indicate which feature caused the rejection. This is because all aspects are merged into a single number. Therefore, it is necessary to further analyze the data to understand it better. One possible line of future work is to design a divergence that can be broken into small explicit features, hence naturally allowing hierarchical tests.

I would like to clarify which parts of this thesis are my original work (many of which are results of collaboration). The theoretical contribution of the dissertation is the development of mathematical frameworks for (1) Hilbert space representation of spike trains, (2) an alternative point process formulation, (3) a precisely timed point process model, (4) statistical inference through point process divergences, and (5) novel divergences and estimators for point processes. In addition, the theoretical connections among ITL, RKHS, and (strictly) positive definite functions are better understood.


The practical contributions are the development of a set of data analysis and modeling tools for neuroscience applications. A few problems with real data are demonstrated throughout the thesis: (1) non-stationarity detection, (2) plasticity detection, (3) optimal parameter selection, (4) clustering and visualization of spike trains, and (5) clustering point processes. The estimators and algorithms described in the thesis are implemented in MATLAB as an open source project called IOCANE and are freely available on the web.[1]

The general point process view provided by the current body of work describes what the noise is, and measures similarity based on the noise structure, but it tells little about what the signal is. However, we strongly believe that analyzing the noise is the first step towards better understanding the signal and its processing in the neural system. Our analysis framework is focused on dealing with the non-stationary and arbitrary noise structure that is often observed in experiments, and provides principled statistical tools for analysis. We hope the proposed methods contribute to the neuroscience community and transform how we think about spike train similarities.

[1] http://www.cnel.ufl.edu/~memming/iocane/ and http://code.google.com/p/iocane/


APPENDIX A
REPRODUCING KERNEL HILBERT SPACES FOR INFORMATION THEORETIC LEARNING

The initial goal of ITL was to propose alternative cost functions for adaptive filtering [31].[1] Entropy characterizes the uncertainty in the error distribution; however, the difficulty with Shannon's entropy resides in non-parametric estimation within the constraints of optimal filtering (e.g., smooth costs). For this reason we embraced [32] a generalization of Shannon's entropy proposed by Alfréd Rényi. It is a parametric family of entropies given by
$$H_\alpha(X) = \frac{1}{1-\alpha} \log \int f^\alpha(x)\, dx, \qquad \alpha > 0,$$
where $f(x)$ is the PDF of the continuous random variable $X$. It is easy to show that the limit $\alpha \to 1$ yields Shannon's entropy, and [105] shows that the singularity is not essential. There is a practical nonparametric estimator for the quadratic Rényi entropy ($\alpha = 2$),
$$H_2(X) = -\log \int f^2(x)\, dx = -\log E[f(X)], \tag{A-1}$$
i.e., it is the $-\log$ of the first moment of the PDF itself. Since the logarithm is monotonic, the quantity of interest is just its argument,
$$V(X) = \int f^2(x)\, dx, \tag{A-2}$$

[1] © 2008 IEEE. Reprinted, with permission, from IEEE Transactions on Signal Processing, "A Reproducing Kernel Hilbert Space Framework for Information-Theoretic Learning," J. Xu, A. R. Paiva, I. Park, and J. C. Principe.


which is called the information potential[2] (IP), so named due to a similarity with potential fields in physics [100]. A non-parametric, asymptotically unbiased and consistent estimator for a given PDF $f(x)$ is defined as [95]
$$\hat f(x) = \frac{1}{N} \sum_{i=1}^{N} \kappa_\sigma(x, x_i), \tag{A-3}$$
where $\kappa_\sigma(\cdot,\cdot)$ is called the Parzen window, or kernel. Here the Parzen kernel is chosen as a symmetric non-negative definite function with non-negative values, just as in kernel-based learning theory, such as the Gaussian, polynomial, etc. [37].[3] Then, by evaluating the expectation of the Parzen PDF approximation in (A-2), the integral can be estimated directly from the data as
$$\hat V(X) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \kappa_\sigma(x_i, x_j), \tag{A-4}$$
where $\{x_i\}_{i=1}^{N}$ is the data sample and $N$ is the total number of samples; this is the estimator for the IP. The concept and properties of the information potential (and its estimator) have been mathematically studied, and a new criterion based on the information potential, called MEE (Minimization of Error Entropy), has been proposed to adapt linear and nonlinear systems [31]. MEE serves as an alternative cost function to the conventional MSE (Mean Square Error) in linear/nonlinear filtering, with several performance advantages when the error distribution is not Gaussian. If we think for a moment, we see the big difference between MSE and MEE: MSE is the second order moment of the data, and MEE is the first moment of the PDF of the data. Since all the information contained in the random

[2] Note that in previously published papers, we called the estimator of this quantity (A-4) the information potential. In this paper, we generalize the concept and call the statistical descriptor behind it (A-2) the IP, and refer to (A-4) as the estimator of the IP. The physical interpretation still holds.
[3] Not all Parzen window kernels are non-negative definite (e.g., the rectangular kernel), and not every non-negative definite kernel is non-negative valued.


variable is represented in its PDF, we can expect better performance from the latter than from MSE.

In information theory, mutual information is used to quantify the divergence between the joint PDF and the product of the marginal PDFs of two random variables. Another well-known divergence measure is the Kullback-Leibler divergence [65]. However, both are difficult to estimate in practice without imposing simplifying assumptions about the data; numerical methods are required to evaluate the integrals. The IP and two divergence measures among PDFs, one based on their Euclidean distance and the other on the Cauchy-Schwarz inequality, have been proposed to surpass these limitations [100]. Given two probability density functions $f(x)$ and $g(x)$, their Euclidean divergence is defined as
$$D_{ED}(f,g) = \int (f(x) - g(x))^2\, dx = \int f^2(x)\, dx - 2 \int f(x)\, g(x)\, dx + \int g^2(x)\, dx. \tag{A-5}$$
The divergence measure based on the Cauchy-Schwarz inequality is given by
$$D_{CS}(f,g) = -\log \frac{\int f(x)\, g(x)\, dx}{\sqrt{\int f^2(x)\, dx \int g^2(x)\, dx}}. \tag{A-6}$$
Notice that both $D_{ED}(f,g)$ and $D_{CS}(f,g)$ are greater than or equal to zero, and equality holds if and only if $f(x) = g(x)$. Notice the form of the integrals: we have in both the first moment of each PDF, and a new term $\int f(x)\, g(x)\, dx$, the first moment of the PDF $g(x)$ over the other PDF $f(x)$ (or vice versa), which is called the cross information potential (CIP) [100]. The CIP measures the similarity between two PDFs, as can be expected from its resemblance to the Bhattacharyya distance and other distances, as explained in Gokcay and Principe [38]. The CIP appears in both the Euclidean and Cauchy-Schwarz divergence measures, and if one substitutes $g(x)$ by $f(x)$ in the CIP, it becomes the argument of Rényi's quadratic entropy. As expected, all these terms can be estimated directly from data as in (A-4).
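These quantities can all be estimated by pairwise kernel averages in the style of (A-4). A sketch with a Gaussian window; the bandwidth and the omission of the normalization constant (which cancels in $D_{CS}$ and only scales the other quantities) are illustrative choices:

```python
import numpy as np

def cip(x, y, sigma=1.0):
    # Cross information potential estimate: average of kappa(x_i, y_j)
    # over all pairs; cip(x, x) estimates the information potential V.
    x, y = np.asarray(x, float), np.asarray(y, float)
    d = x[:, None] - y[None, :]
    return float(np.mean(np.exp(-d ** 2 / (2 * sigma ** 2))))

def euclidean_divergence(x, y, sigma=1.0):
    # D_ED of (A-5): V(f) - 2 CIP(f, g) + V(g).
    return cip(x, x, sigma) - 2.0 * cip(x, y, sigma) + cip(y, y, sigma)

def cauchy_schwarz_divergence(x, y, sigma=1.0):
    # D_CS of (A-6): -log CIP / sqrt(V(f) V(g)).
    return float(-np.log(cip(x, y, sigma) /
                         np.sqrt(cip(x, x, sigma) * cip(y, y, sigma))))
```

Both estimates vanish when the two samples coincide and grow as the samples separate, mirroring the properties of the underlying divergences.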


The IP estimator (A-4) can be interpreted in an RKHS. Indeed, according to Mercer's theorem [72], any symmetric non-negative definite kernel function that is square integrable has an eigen-decomposition $\kappa(x,y) = \langle \Phi(x), \Phi(y) \rangle_H$, where $\Phi(x)$ is the nonlinearly transformed data in the RKHS $H$ induced by the kernel function, and the inner product is performed in $H$. Therefore, we can re-write (A-4) as
$$\hat V(X) = \left\langle \frac{1}{N} \sum_{i=1}^{N} \Phi(x_i), \frac{1}{N} \sum_{j=1}^{N} \Phi(x_j) \right\rangle_H = \left\| \frac{1}{N} \sum_{i=1}^{N} \Phi(x_i) \right\|^2.$$
Similar interpretations of the Cauchy-Schwarz divergence in $H$ were developed in [53]. The RKHS $H$ is data independent, since the kernel is pre-designed and acts on individual data samples, which means that extra computation involving functional evaluations in $H$ is required when statistical quantities are estimated. The example of the IP estimator is still simple and can exploit the kernel trick, but in general this may not be the case. The difficulty is that the inner product structure of $H$ does not translate the statistics of the data.

A.1 RKHS Framework for ITL

From the various definitions in information-theoretic learning summarized above, we see that the most general quantity of interest is the integral of the product of two PDFs, $\int f(x)\, g(x)\, dx$, which we called the CIP. Therefore, this will be our starting point for the definition of the ITL RKHS, which will include the statistics of the input data in the kernel.

A.1.1 The L2 Space of PDFs

Let $E$ be the set consisting of all square integrable one-dimensional probability density functions, i.e., $f_i(x) \in E$, $\forall i \in I$, where $\int f_i^2(x)\, dx < \infty$ and $I$ is an index set. We then form a linear manifold
$$\left\{ \sum_{i \in I'} \alpha_i f_i(x) \right\} \tag{A-7}$$


for any countable $I' \subseteq I$ and $\alpha_i \in \mathbb{R}$. Complete the set in (A-7) using the metric
$$\| f_i(x) - f_j(x) \| = \sqrt{ \int (f_i(x) - f_j(x))^2\, dx }, \qquad \forall i,j \in I, \tag{A-8}$$
and denote the set of all linear combinations of PDFs and its limit points by $L^2(E)$. $L^2(E)$ is an $L^2$ space on PDFs. Moreover, by the theory of quadratically integrable functions, we know that the linear space $L^2(E)$ forms a Hilbert space if an inner product is imposed accordingly. Given any two PDFs $f_i(x)$ and $f_j(x)$ in $E$, we can define an inner product as
$$\langle f_i(x), f_j(x) \rangle_{L^2} = \int f_i(x)\, f_j(x)\, dx, \qquad \forall i,j \in I. \tag{A-9}$$
Notice that this inner product is exactly the CIP. This definition of the inner product has the corresponding norm (A-8). Hence, $L^2(E)$ equipped with the inner product (A-9) is a Hilbert space. However, it is not an RKHS, because the inner product is not reproducing in $L^2(E)$, i.e., the point evaluation of an element of $L^2(E)$ cannot be represented via the inner product between two functionals in $L^2(E)$. Next we show that the inner product (A-9) is symmetric non-negative definite, and by the Moore-Aronszajn theorem it uniquely defines an RKHS.

A.1.2 The RKHS HV Based on L2(E)

First, we define a bivariate function on the set $E$ as
$$\mathcal{V}(f_i, f_j) = \int f_i(x)\, f_j(x)\, dx, \qquad \forall i,j \in I. \tag{A-10}$$
In RKHS theory, the kernel function is a measure of similarity between functionals. Notice that (A-10) corresponds to the definition of the inner product (A-9) and to the cross information potential between two PDFs; hence it is natural and meaningful to define the kernel function as $\mathcal{V}(f_i, f_j)$. Next, we show that (A-10) is symmetric non-negative definite in $E$.

Property 1 (Non-Negative Definiteness): The function (A-10) is a symmetric non-negative definite function $E \times E \to \mathbb{R}$.


Proof. The symmetry is obvious. For any $N$, any set $\{f_1(x), f_2(x), \ldots, f_N(x)\} \subset E$, and any not-all-zero real numbers $\{\alpha_1, \alpha_2, \ldots, \alpha_N\}$, by definition we have
$$\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j \mathcal{V}(f_i, f_j) = \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j \int f_i(x)\, f_j(x)\, dx = \int \left( \sum_{i=1}^{N} \alpha_i f_i(x) \right) \left( \sum_{j=1}^{N} \alpha_j f_j(x) \right) dx = \int \left( \sum_{i=1}^{N} \alpha_i f_i(x) \right)^2 dx \ge 0. \tag{A-11}$$
Hence, $\mathcal{V}(f_i, f_j)$ is symmetric non-negative definite, and it is also a kernel function.

According to the Moore-Aronszajn theorem, there is a unique RKHS, denoted by $H_V$, associated with the symmetric non-negative definite function (A-10). We construct the RKHS $H_V$ bottom-up. Since the bivariate function (A-10) is symmetric and non-negative definite, it also has an eigen-decomposition by Mercer's theorem [72],
$$\mathcal{V}(f_i, f_j) = \sum_{k=1}^{\infty} \lambda_k\, \psi_k(f_i)\, \psi_k(f_j), \tag{A-12}$$
where $\{\psi_k(f_i), k = 1, 2, \ldots\}$ and $\{\lambda_k, k = 1, 2, \ldots\}$ are the sequences of eigenfunctions and corresponding eigenvalues of the kernel function $\mathcal{V}(f_i, f_j)$, respectively. The series above converges absolutely and uniformly on $E \times E$ [72]. Then we define a space $H_V$ consisting of all functionals $G(\cdot)$ whose evaluation for any given PDF $f_i(x) \in E$ is defined as
$$G(f_i) = \sum_{k=1}^{\infty} \lambda_k a_k \psi_k(f_i), \tag{A-13}$$
where the sequence $\{a_k, k = 1, 2, \ldots\}$ satisfies the condition
$$\sum_{k=1}^{\infty} \lambda_k a_k^2 < \infty. \tag{A-14}$$
Furthermore, we define an inner product of two functionals in $H_V$ as
$$\langle G, F \rangle_{H_V} = \sum_{k=1}^{\infty} \lambda_k a_k b_k, \tag{A-15}$$
where $G$ and $F$ are of the form (A-13), and $a_k$ and $b_k$ satisfy property (A-14).


It can be verified that the space $H_V$ equipped with the kernel function (A-10) is indeed a reproducing kernel Hilbert space, and that the kernel function $\mathcal{V}(f_i, \cdot)$ is a reproducing kernel, because of the following two properties:

1. $\mathcal{V}(f_i, f_j)$ as a function of $f_i(x)$ belongs to $H_V$ for any given $f_j(x) \in E$, because we can rewrite $\mathcal{V}(f_i, f_j)$ as
$$\mathcal{V}(f_i, \cdot)(f_j) = \sum_{k=1}^{\infty} \lambda_k b_k \psi_k(f_j), \qquad b_k = \psi_k(f_i).$$
That is, the constants $\{b_k, k = 1, 2, \ldots\}$ become the eigenfunctions $\{\psi_k(f_i), k = 1, 2, \ldots\}$ in the definition of $G$. Therefore,
$$\mathcal{V}(f_i, \cdot) \in H_V, \qquad \forall f_i(x) \in E.$$

2. Given any $G \in H_V$, the inner product between the reproducing kernel and $G$ yields the function itself, by the definition (A-15):
$$\langle G, \mathcal{V}(f_i, \cdot) \rangle_{H_V} = \sum_{k=1}^{\infty} \lambda_k a_k b_k = \sum_{k=1}^{\infty} \lambda_k a_k \psi_k(f_i) = G(f_i).$$
This is the so-called reproducing property.

Therefore, $H_V$ is an RKHS with the kernel function and inner product defined above. By the reproducing property, we can re-write the kernel function (A-12) as
$$\mathcal{V}(f_i, f_j) = \langle \mathcal{V}(f_i, \cdot), \mathcal{V}(f_j, \cdot) \rangle_{H_V}, \qquad \mathcal{V}(f_i, \cdot) : f_i \mapsto \left[ \sqrt{\lambda_k}\, \psi_k(f_i) \right], \; k = 1, 2, \ldots \tag{A-16}$$
The reproducing kernel linearly maps the original PDF $f_i(x)$ into the RKHS $H_V$. We emphasize here that the reproducing kernel $\mathcal{V}(f_i, f_j)$ is deterministic and data-dependent, by which we mean that the norm of the transformed vector in the RKHS $H_V$ depends on the PDF of the original random variable, because
$$\| \mathcal{V}(f_i, \cdot) \|^2 = \langle \mathcal{V}(f_i, \cdot), \mathcal{V}(f_i, \cdot) \rangle_{H_V} = \int f_i^2(x)\, dx.$$
This is very different from the reproducing kernel $\kappa(x,y)$ used in kernel-based learning theory. The norm of the nonlinearly projected vector in the RKHS $H$ does not rely on the


statistical information of the original data, since
$$\| \Phi(x) \|^2 = \langle \Phi(x), \Phi(x) \rangle_H = \kappa(0)$$
if we use translation-invariant kernel functions [37]. Moreover, if $X$ is a random variable, $\Phi(X)$ is also a random variable in the RKHS $H$. The value of $\kappa(0)$ is a constant regardless of the original data. Consequently, the reproducing kernel Hilbert spaces $H_V$ and $H$, determined by $\mathcal{V}(f_i, f_j)$ and $\kappa(x,y)$ respectively, are very different in nature.

A.2 Connection Between ITL and Kernel Methods via RKHS HV

Figure A-1. The relationship among the sample space, the PDF space, the proposed ITL RKHS $H_V$, and the RKHS $H$. The sample space and $H$ are connected via the nonlinear transformation $\Phi(\cdot)$. The PDF space and $H_V$ are connected via the feature map $\mathcal{V}(f, \cdot)$. A realization of a PDF in the PDF space corresponds to a set of points in the sample space. The ensemble average of functionals in $H$ corresponds to one functional in $H_V$. Kernel methods and ITL are related via the Parzen window. (Reprinted from [127] with permission. © 2006 IEEE)

In this section, we connect ITL and kernel methods via the RKHS framework. As we have mentioned in the previous section, because the RKHS $H$ is induced by a data-independent kernel function, the nonlinearly projected data in $H$ is still stochastic, and statistical inference is required in order to compute quantities of interest. For instance, in order to compute the statistics of the functionals, the mean and covariance are required. The expected value of functionals in the RKHS $H$ is defined as $E[\Phi(x)]$. The cross-covariance is defined as the unique operator $\Sigma_{XY}$ such that for any


functionals $f$ and $g$ in $H$,
$$\langle g, \Sigma_{XY} f \rangle_H = E[g(y)\, f(x)] - E[g(y)]\, E[f(x)] = \mathrm{Cov}[f(x), g(y)]. \tag{A-17}$$
The mean and cross-covariance operators, as statistics of functionals in $H$, become intermediate steps in developing algorithms such as the maximum mean discrepancy (MMD) [40], kernel independent component analysis (Kernel ICA) [5], and others. However, the proposed ITL RKHS $H_V$ is based on the CIP (the integral of the product of PDFs); therefore the transformed functional in $H_V$ is deterministic, and only algebra is needed to carry out statistical inference in the ITL RKHS. Hence our proposed RKHS offers simplicity and elegance in dealing with data statistics.

The RKHS $H$ and the RKHS $H_V$ are related via the expectation operator. To justify this statement, Parzen's non-parametric, asymptotically unbiased and consistent PDF estimator (A-3) is employed to estimate the PDFs used in the ITL descriptors [52]. The Parzen window evaluates the PDFs in the sample space. Provided one chooses a non-negative definite kernel function as the Parzen window, it connects the RKHS $H_V$ to the RKHS $H$ used in kernel methods. As illustrated in Fig. A-1, the feature map $\Phi(x)$ nonlinearly projects the sample space into a stochastic RKHS $H$. Alternatively, the feature map $\mathcal{V}(f, \cdot)$ transforms the PDF space into a deterministic RKHS $H_V$. Hence the stochasticity is implicitly embedded into the feature map, and immediate algebraic operations can be applied to compute statistics. The $H$ methodology, however, has to rely on intermediate steps, by defining mean and covariance operators.


APPENDIX B
POISSON PROCESS

While there are many equivalent approaches to define and describe point processes (we just discussed the complete intensity function and the random measure), we will use the counting process representation, which is the most intuitive for those already exposed to random variables. Let $\mathcal{B}$ be a Borel space. Let $N(B)$ be a non-negative valued measure for $B \in \mathcal{B}$. Let $\mathfrak{B}$ be the set of all measures $N(\cdot)$. A point process is essentially a probability distribution on the set $\mathfrak{B}$. In the case of the usual temporal point processes, $\mathcal{B}$ is the Borel set of the real line, and the realization $N([0,t))$ of the random measure signifies the number of events that occurred during the interval $[0,t)$. As a notation, we define $N(t) = N([0,t))$ and $N(s,t) = N(t) - N(s)$. The function $N(t)$ is a non-decreasing, integer-valued function with initial value 0.

The Poisson process is the simplest and the most random point process. Most point processes are some form of generalization of the Poisson process. Also, several forms of general limit theorems indicate that the superposition of independent renewal processes satisfying some conditions asymptotically converges to a Poisson process. There are many equivalent ways to define a Poisson process; here is one.

Definition 12 (Homogeneous Poisson process). A (homogeneous) Poisson process with rate parameter $\lambda \in \mathbb{R}$ is defined by the following properties, in the limit as $\Delta$ goes to 0:

\[
\Pr[N(t, t+\Delta) = 1 \mid H_t] = \lambda\Delta + o(\Delta) \tag{B-1}
\]
\[
\Pr[N(t, t+\Delta) > 1 \mid H_t] = o(\Delta) \quad \text{(conditional orderliness)} \tag{B-2}
\]
\[
\Pr[N(0) = 0] = 1 \quad \text{(initial condition)} \tag{B-3}
\]

where $H_t$ is any event in the interval $[0,t)$, and $o(\Delta)$ represents higher-order terms.
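Property (B-1) can be checked on a simulated realization. A minimal sketch (not from the thesis; the process is generated from i.i.d. exponential inter-event intervals, and all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
lam, T = 5.0, 1000.0

# Generate a homogeneous Poisson process on [0, T) by accumulating
# i.i.d. exponential inter-event intervals with rate lam.
isi = rng.exponential(1.0 / lam, size=int(2 * lam * T))
spikes = np.cumsum(isi)
spikes = spikes[spikes < T]

# For a small bin of width delta, Pr[exactly one event] ~ lam * delta (B-1).
delta = 0.001
counts, _ = np.histogram(spikes, bins=np.arange(0.0, T, delta))
print(np.mean(counts == 1), lam * delta)  # both close to 0.005
```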


The conditional orderliness guarantees that there can be only one event at a single time instance (the class of point processes with this property is known as the simple point processes). Since the occurrence of an event does not depend on any of the history, the Poisson process is memoryless. A discrete-time analogy of the Poisson process is the Bernoulli trial process (in the sense of the limit $p \to 0$ with the constraint $\lambda = pn$).

Let us informally derive the interval distribution of a Poisson process. The memoryless property can be restated in terms of the random variable $\tau$ of the interval time as follows,

\[
\Pr[\tau > s + t \mid \tau > s] = \Pr[\tau > t] \tag{B-4}
\]

which means that if there was no event up till $s$, the distribution of having no event during the next $t$ is the same as the unconditioned distribution. Let the complementary cumulative probability distribution function of $\tau$ be $F_c(t) = \Pr[\tau > t]$. Using the Bayes rule on (B-4),

\begin{align*}
\Pr[\tau > s + t, \tau > s] &= \Pr[\tau > t] \Pr[\tau > s] \\
\Pr[\tau > s + t] &= \Pr[\tau > t] \Pr[\tau > s] \\
F_c(s + t) &= F_c(t) F_c(s)
\end{align*}

The only continuous function that satisfies this condition is the exponential function. Therefore, the probability distribution $f(t)$ of the intervals is the exponential distribution. The exponential interval distribution is the hallmark of continuous-time Markov chains. We will show that the Poisson process is equivalent to a pure birth process. From the conditional orderliness assumption, the following probability decomposition is valid for
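The memoryless property (B-4) is easy to verify empirically on simulated exponential intervals. A minimal sketch (not from the thesis; rate and interval values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
lam = 2.0
tau = rng.exponential(1.0 / lam, size=200_000)  # Poisson-process intervals

s, t = 0.5, 0.3
# Memoryless property (B-4): Pr[tau > s + t | tau > s] = Pr[tau > t].
lhs = np.mean(tau[tau > s] > s + t)
rhs = np.mean(tau > t)
print(lhs, rhs, np.exp(-lam * t))  # all three approximately equal
```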


$k \geq 1$:

\begin{align*}
\Pr[N(s, t+\Delta) = k] &= \Pr[N(s,t) = k] \Pr[N(t, t+\Delta) = 0] + \Pr[N(s,t) = k-1] \Pr[N(t, t+\Delta) = 1] + o(\Delta) \\
&= \Pr[N(s,t) = k](1 - \lambda\Delta + o(\Delta)) + \Pr[N(s,t) = k-1](\lambda\Delta + o(\Delta)) + o(\Delta)
\end{align*}

The second equality is from the definition of the Poisson process (B-1). Hence,

\begin{align}
&\Pr[N(s, t+\Delta) = k] - \Pr[N(s,t) = k] \tag{B-5} \\
&\quad = -\lambda\Delta \Pr[N(s,t) = k] + \lambda\Delta \Pr[N(s,t) = k-1] + o(\Delta) \tag{B-6}
\end{align}

Taking the limit $\Delta \to 0^+$,

\[
\frac{d\Pr[N(s,t) = k]}{dt} = -\lambda \Pr[N(s,t) = k] + \lambda \Pr[N(s,t) = k-1] \tag{B-7}
\]

And in the case of $k = 0$,

\[
\frac{d\Pr[N(s,t) = 0]}{dt} = -\lambda \Pr[N(s,t) = 0] \tag{B-8}
\]

These differential equations are a special case of the pure birth process in continuous-time Markov processes. Eq. (B-8) can be solved with the initial condition given by the third property in the definition of the Poisson process:

\[
\Pr[N(s,t) = 0] = e^{-\lambda(t-s)} \tag{B-9}
\]

which is equivalent to the interval distribution, because this is the probability that there is no event in the interval. By recursively solving Eq. (B-7), we get the Poisson distribution:

\[
\Pr[N(s,t) = k] = \frac{(\lambda(t-s))^k}{k!} e^{-\lambda(t-s)} \tag{B-10}
\]

In the case of discrete time, this corresponds to the binomial distribution. The homogeneous Poisson process is stationary in the following sense.
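The count distribution (B-10) can be checked against simulated realizations built from exponential intervals. A minimal sketch (not from the thesis; rate, window, and trial count are illustrative):

```python
import numpy as np
from math import exp, factorial

rng = np.random.default_rng(3)
lam, t_len, n_trials = 3.0, 2.0, 100_000

# Many independent realizations: cumulative sums of exponential intervals;
# the count N(s, t) is the number of events falling before t_len.
max_events = 30  # P[N > 30] is negligible for mean lam * t_len = 6
isi = rng.exponential(1.0 / lam, size=(n_trials, max_events))
event_times = np.cumsum(isi, axis=1)
counts = (event_times < t_len).sum(axis=1)

# Compare empirical frequencies with the Poisson pmf of Eq. (B-10).
mu = lam * t_len
for k in range(5):
    empirical = np.mean(counts == k)
    theory = mu ** k / factorial(k) * exp(-mu)
    print(k, round(empirical, 4), round(theory, 4))
```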


Definition 13 (Stationary point process). A point process is stationary if the probability distribution of $N(A)$ is invariant over time shifts of a Borel set $A$.

An extension of the homogeneous Poisson process that makes it non-stationary is the inhomogeneous Poisson process, where the rate depends on time.

Definition 14 (Inhomogeneous Poisson process).

\[
\Pr[N(s,t) = k] = \frac{(\Lambda(t) - \Lambda(s))^k}{k!} e^{-(\Lambda(t) - \Lambda(s))} \tag{B-11}
\]
\[
\Pr[N(0) = 0] = 1 \quad \text{(initial condition)} \tag{B-12}
\]
\[
\Pr[N(A)], \Pr[N(B)] \text{ are independent if } A \cap B = \emptyset \tag{B-13}
\]

where $\Lambda(\cdot)$ is a finite-valued, non-decreasing, and non-negative function.

Using Eq. (B-11) in the definition is slightly more general than saying $\Pr[N(t, t+\Delta) = 1 \mid H_t] = \lambda(t)\Delta + o(\Delta)$ where $\Lambda(t) - \Lambda(s) = \int_s^t \lambda(\tau)\,d\tau$, because $\Lambda(\cdot)$ might not be differentiable or even continuous, hence $\lambda(t)$ may not be well defined. Whenever $\Lambda$ has a discontinuity, the probability of having an event there is 1; thus the process can be decomposed into fixed events and a continuous $\Lambda'$ when needed.

Theorem 18 (Sample function density for an inhomogeneous Poisson process [116]). Let $\omega = \{W_1, W_2, \ldots, W_n; N(t) = n\}_{[s,t)}$ be a particular realization of a Poisson process $N(\cdot)$ with rate function $\lambda(\cdot)$. The probability of having a particular realization is given by

\[
p(\omega) =
\begin{cases}
\exp\left(-\int_s^t \lambda(\tau)\,d\tau\right), & N(s,t) = 0, \\[4pt]
\left(\prod_{i=1}^{n} \lambda(W_i)\right) \exp\left(-\int_s^t \lambda(\tau)\,d\tau\right), & N(s,t) = n \geq 1
\end{cases} \tag{B-14}
\]
\[
= \exp\left(-\int_s^t \lambda(\tau)\,d\tau + \int_s^t \ln(\lambda(\tau))\,N(d\tau)\right) \tag{B-15}
\]


A large class of simple point processes can be uniquely determined in terms of the intensity function.

Definition 15 (Intensity function).

\[
\lambda(t; H_t) = \lim_{\Delta \to 0^+} \frac{\Pr[N(t, t+\Delta) > 0 \mid H_t]}{\Delta} \tag{B-16}
\]

The history $H_t$ can include history dependence, input dependence, internal state dynamics, and hidden stochastic processes.

Theorem 19 (Decomposition of a Poisson process into two random variables). If $\lambda$ is well defined, an inhomogeneous Poisson process on a fixed interval $[s,t)$ can be represented with two independent random variables. One is the Poisson random variable $N(s,t)$, and the other is a continuous random variable $X$ for the distribution of the locations of the events. The probability distribution function of $X$ is given by normalizing the intensity function $\lambda(\cdot)$ as follows,

\[
f_X(\tau) = \frac{\lambda(\tau)}{\int_s^t \lambda(\tau')\,d\tau'} \tag{B-17}
\]

A realization of the Poisson process is a collection of $n$ independent realizations of $X$, where $n$ is a realization of $N(s,t)$. In the case of a homogeneous Poisson process, $X$ follows the uniform distribution, hence it is possible to easily generate realizations using a pseudo-random number generator, which is assumed to be uniform.
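The decomposition in Theorem 19 suggests a direct sampler for the inhomogeneous case: draw the count, then draw locations i.i.d. from the normalized intensity $f_X$. A minimal sketch (not from the thesis; the sinusoidal intensity, the rejection-sampling step, and all names are illustrative):

```python
import numpy as np

def sample_inhomogeneous_poisson(rate_fn, s, t, rate_max, rng):
    """One realization on [s, t) via the two-variable decomposition:
    a Poisson count N(s, t), then locations drawn i.i.d. from the
    normalized intensity f_X by rejection sampling."""
    grid = np.linspace(s, t, 1000)
    total = np.mean(rate_fn(grid)) * (t - s)  # integral of lambda over [s, t)
    n = rng.poisson(total)                    # the count N(s, t)
    events = []
    while len(events) < n:                    # rejection sampling from f_X
        x = rng.uniform(s, t)
        if rng.uniform(0.0, rate_max) < rate_fn(x):
            events.append(x)
    return np.sort(events)

rng = np.random.default_rng(4)
rate = lambda x: 10.0 * (1 + np.sin(2 * np.pi * x))  # example intensity
spikes = sample_inhomogeneous_poisson(rate, 0.0, 5.0, 20.0, rng)
print(len(spikes))  # about 50 events on average (integral of the rate)
```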


APPENDIX C
MEASURE THEORY

A measure is a convenient mathematical object (function) that can represent the positions of strawberries in a field, the distribution of water in the ocean, or the probabilities of winning over the lottery numbers; the measure counts the number of strawberries in a given area, reports the amount of water in a certain sea, and evaluates the probability of a lottery ticket winning. This abstract unifying framework enables one to rigorously "measure" quantities over a space, and also enables integration. It also allows elegant notation for probability theory. Here we briefly describe key ideas of measure theory without proof. This material is mostly based on Daley and Vere-Jones [25] and Halmos [42].

To define a measure, we need a measurable space $(\Omega, \mathcal{F})$: a non-empty set $\Omega$ and a $\sigma$-algebra $\mathcal{F}$ on $\Omega$. Here $\Omega$ is the space where the stuff to be measured lies, and $\mathcal{F}$ gives special structure to the space such that things are well defined and pathological sets can be avoided. An algebra $\mathcal{F}$ of $\Omega$ is a set of subsets of $\Omega$ such that it contains the empty set and is closed under set union and complement. A $\sigma$-algebra is an algebra that is closed under countable union. Elements in $\mathcal{F}$ are said to be measurable. A measure $\mu$ on $(\Omega, \mathcal{F})$ is a non-negative, extended real valued function on $\mathcal{F}$ that is countably additive,

\[
\mu\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \mu(A_i), \tag{C-1}
\]

where the $A_i$ are disjoint measurable sets. Additivity makes sense because we want $\mu(A) + \mu(B) = \mu(A \cup B)$ when $A$ and $B$ are disjoint (the number of strawberries should add up for different fields). If $\mu$ is always finite valued, then it is called a finite measure. As a special case, a probability measure is a finite measure with $\mu(\Omega) = 1$. Given a set $F$, we denote the smallest $\sigma$-algebra that contains $F$ as $\sigma(F)$ and say $\sigma(F)$ is the $\sigma$-algebra generated by $F$. A measure on $\mathcal{F}$ is determined by its values on any algebra that generates $\mathcal{F}$ (Carathéodory extension theorem).


A predicate $P(x)$ holds $\mu$-almost everywhere (or $\mu$-a.e. for short) if it is true except for a set of measure zero, that is, there exists $E$ with $\mu(E) = 0$ such that $P(x)$ holds for all $x \in E^c$. A function $f$ from a measurable space $(X, \mathcal{F})$ to a measurable space $(Y, \mathcal{G})$ is measurable if $\forall E \in \mathcal{G}, f^{-1}(E) \in \mathcal{F}$. For a topological space $(X, \mathcal{U})$, the $\sigma$-algebra generated by the open sets is called the Borel $\sigma$-algebra (or Borel set). The Borel set links the measurable functions and the continuous functions: every continuous function from $\Omega$ to the real line is measurable with respect to the Borel algebra. When the topological space is induced by a metric, the Borel set and the Baire set coincide. The Baire set is the smallest $\sigma$-algebra with respect to which the continuous functions are measurable.

Real-valued Borel-measurable functions are closed under algebraic operations and limits. Moreover, they can be approximated by limits of simple functions. A simple function is a finite linear combination of indicator functions of measurable sets. By the linearity of integration, the integration of a Borel-measurable function with respect to a measure $\mu$ can be defined by letting the integral of an indicator function on the measurable set $A$ be $\mu(A)$: $\int I_A \, d\mu = \mu(A)$.
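The defining identity $\int I_A \, d\mu = \mu(A)$, extended by linearity, can be illustrated on a finite space where every function is simple. A minimal sketch (not from the thesis; the weights and set are illustrative):

```python
import numpy as np

# A finite measure on Omega = {0, 1, 2, 3}: mu({i}) = weights[i].
weights = np.array([0.5, 1.0, 2.0, 0.25])

def mu(A):
    """Measure of a subset A of Omega (additive by construction)."""
    return sum(weights[i] for i in A)

def integrate(f):
    """Integral of f with respect to mu, defined on indicators and
    extended by linearity: int f dmu = sum_x f(x) * mu({x})."""
    return sum(f(i) * weights[i] for i in range(len(weights)))

# int I_A dmu = mu(A) for the indicator of A = {1, 3}.
A = {1, 3}
indicator = lambda x: 1.0 if x in A else 0.0
print(integrate(indicator), mu(A))  # both 1.25
```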


REFERENCES

[1] A. Amarasingham, T. L. Chen, S. Geman, M. T. Harrison, and D. L. Sheinberg. Spike count reliability and the Poisson hypothesis. J. Neurosci., 26(3):801–809, January 2006. ISSN 1529-2401.

[2] Nachman Aronszajn. Theory of reproducing kernels. Transactions of the American Mathematical Society, 68(3):337–404, 1950.

[3] R. Azouz and C. M. Gray. Cellular mechanisms contributing to response variability of cortical neurons in vivo. J. Neurosci., 19(6):2209–2223, March 1999. ISSN 0270-6474.

[4] Rony Azouz and Charles M. Gray. Dynamic spike threshold reveals a mechanism for synaptic coincidence detection in cortical neurons in vivo. PNAS, 97(14):8110–8115, July 2000.

[5] F. R. Bach and M. I. Jordan. Kernel independent component analysis. J. Mach. Learn. Res., 3:1–48, 2002.

[6] Yoav Benjamini and Yosef Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), 57(1):289–300, 1995. ISSN 0035-9246.

[7] Christian Berg, Jens Peter Reus Christensen, and Paul Ressel. Harmonic Analysis on Semigroups: Theory of Positive Definite and Related Functions. Springer-Verlag, 1984.

[8] Denise Berger, Christian Borgelt, Sebastien Louis, Abigail Morrison, and Sonja Grün. Efficient identification of assembly neurons within massively parallel spike trains. Computational Intelligence and Neuroscience, 2010:1–19, 2010. ISSN 1687-5265.

[9] Michael J. Berry II and Markus Meister. Refractoriness and neural precision. The Journal of Neuroscience, 18(6):2200–2211, March 1998. ISSN 0270-6474.

[10] S. Bochner. Hilbert distances and positive definite functions. Annals of Mathematics, 42(3):647–656, July 1941.

[11] Sander M. Bohte and Michael C. Mozer. Reducing the variability of neural responses: a computational theory of spike-timing-dependent plasticity. Neural Computation, 19(2):371–403, February 2007.

[12] Emery N. Brown, David P. Nguyen, Loren M. Frank, Matthew A. Wilson, and Victor Solo. An analysis of neural receptive field plasticity by point process adaptive filtering. PNAS, 98:12261–12266, 2001.


[13] Emery N. Brown, Robert E. Kass, and Partha P. Mitra. Multiple neural spike train data analysis: state-of-the-art and future challenges. Nature Neuroscience, 7(5):456–461, May 2004. ISSN 1097-6256.

[14] Daniel A. Butts, Chong Weng, Jianzhong Jin, Chun-I Yeh, Nicholas A. Lesica, Jose-Manuel Alonso, and Garrett B. Stanley. Temporal precision in the neural code and the timescales of natural vision. Nature, 449(7158):92–95, 2007.

[15] Maurice J. Chacron, Benjamin Lindner, and Andre Longtin. Noise shaping by interval correlations increases information transfer. Physical Review Letters, 92(8):080601+, February 2004.

[16] Gal Chechik. Spike-timing-dependent plasticity and relevant mutual information maximization. Neural Computation, 15(7):1481–1510, July 2003.

[17] S. Chen, Y. S. Hsu, and J. T. Liaw. On kernel estimators of density ratio. Statistics, 2009.

[18] Z. Chi, P. L. Rauske, and D. Margoliash. Pattern filtering for detection of neural activity, with examples from HVC activity during sleep in zebra finches. Neural Computation, 15(10):2307–2337, October 2003. ISSN 0899-7667.

[19] Mark M. Churchland, Byron M. Yu, John P. Cunningham, Leo P. Sugrue, Marlene R. Cohen, Greg S. Corrado, William T. Newsome, Andrew M. Clark, Paymon Hosseini, Benjamin B. Scott, David C. Bradley, Matthew A. Smith, Adam Kohn, J. Anthony Movshon, Katherine M. Armstrong, Tirin Moore, Steve W. Chang, Lawrence H. Snyder, Stephen G. Lisberger, Nicholas J. Priebe, Ian M. Finn, David Ferster, Stephen I. Ryu, Gopal Santhanam, Maneesh Sahani, and Krishna V. Shenoy. Stimulus onset quenches neural variability: a widespread cortical phenomenon. Nature Neuroscience, 13(3):369–378, February 2010. ISSN 1097-6256.

[20] Andrzej Cichocki and Shun-ichi Amari. Families of alpha- beta- and gamma-divergences: flexible and robust measures of similarities. Entropy, 12(6):1532–1568, June 2010.

[21] Corinna Cortes, Patrick Haffner, and Mehryar Mohri. Rational kernels: theory and algorithms. J. Mach. Learn. Res., 5:1035–1062, 2004. ISSN 1533-7928.

[22] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. Wiley-Interscience, August 1991. ISBN 0471062596.

[23] D. R. Cox and Valerie Isham. Point Processes. Monographs on Applied Probability and Statistics. Chapman and Hall, 1980.
[24] S. Dabo-Niang and N. Rhomari. Kernel regression estimation in a Banach space. Journal of Statistical Planning and Inference, 139(4):1421–1434, 2009.

[25] D. J. Daley and D. Vere-Jones. An Introduction to the Theory of Point Processes. Springer, 1988.


[26] Rob R. de Ruyter van Steveninck, Geoffrey D. Lewen, Steven P. Strong, Roland Koberle, and William Bialek. Reproducibility and variability in neural spike trains. Science, 275:1805–1808, 1997.

[27] Steven P. Dear, James A. Simmons, and Jonathan Fritz. A possible neuronal basis for representation of acoustic scenes in auditory cortex of the big brown bat. Nature, 364(6438):620–623, 1993.

[28] Michael R. DeWeese, Michael Wehr, and Anthony M. Zador. Binary spiking in auditory cortex. J. Neurosci., 23(21):7940–7949, August 2003.

[29] C. G. H. Diks and V. Panchenko. Nonparametric tests for serial independence based on quadratic forms. CeNDEF Working Papers 05-13, Universiteit van Amsterdam, Center for Nonlinear Dynamics in Economics and Finance, 2005.

[30] Karl Dockendorf, Il Park, Ping He, Jose C. Príncipe, and Thomas B. DeMarse. Liquid state machines and cultured cortical networks: the separation property. Biosystems, 95(2):90–97, February 2009.

[31] D. Erdogmus and J. C. Principe. An error-entropy minimization algorithm for supervised training of nonlinear adaptive systems. IEEE Trans. Signal Process., 50(7):1780–1786, July 2002.

[32] D. Erdogmus and J. C. Principe. From linear adaptive filtering to nonlinear information processing. IEEE Signal Process. Mag., 23:14–33, November 2006.

[33] Farzad Farkhooi, Martin, and Martin P. Nawrot. Serial correlation in neural spike trains: experimental evidence, stochastic modeling, and single neuron variability. Physical Review E (Statistical, Nonlinear, and Soft Matter Physics), 79(2):021905+, 2009.

[34] Jean-Marc Fellous, Paul H. E. Tiesinga, Peter J. Thomas, and Terrence J. Sejnowski. Discovering spike patterns in neuronal responses. J. Neurosci., 24(12):2989–3001, March 2004.

[35] S. Fine and K. Scheinberg. Efficient SVM training using low-rank kernel representations. J. Mach. Learn. Res., 2:243–264, 2001.

[36] B. Fuglede. Spirals in Hilbert space: with an application in information theory. Expositiones Mathematicae, 23(1):23–45, April 2005. ISSN 0723-0869.

[37] Marc G. Genton. Classes of kernels for machine learning: a statistics perspective. J. Mach. Learn. Res., 2:299–312, 2002. ISSN 1533-7928.
[38] E. Gokcay and J. C. Principe. Information theoretic clustering. IEEE Trans. Pattern Anal. Mach. Intell., 24(2):158–171, February 2002.

[39] Yukiori Goto and Patricio O'Donnell. Network synchrony in the nucleus accumbens in vivo. J. Neurosci., 21(12):4498–4504, June 2001.


[40] Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Schölkopf, and Alexander J. Smola. A kernel method for the two-sample problem. CoRR, abs/0805.2368, 2008.

[41] Sonja Grün, Markus Diesmann, and Ad Aertsen. Unitary events in multiple single-neuron spiking activity: I. detection and significance. Neural Computation, 14(1):43–80, January 2002. ISSN 0899-7667.

[42] Paul R. Halmos. Measure Theory. Graduate Texts in Mathematics, 18. Springer-Verlag, 1974. ISBN 0387900888.

[43] Matthias Hein and Olivier Bousquet. Hilbertian metrics and positive definite kernels on probability measures. In Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics, pages 136–143, 2005.

[44] Conor Houghton. Studying spike trains using a van Rossum metric with a synapse-like filter. Journal of Computational Neuroscience, 26(1):149–155, February 2009.

[45] Conor Houghton and Kamal Sen. A new multineuron spike train metric. Neural Computation, 20(6):1495–1511, June 2008. ISSN 0899-7667.

[46] D. H. Hubel and T. N. Wiesel. Receptive fields of single neurones in the cat's striate cortex. The Journal of Physiology, 148:574–591, October 1959. ISSN 0022-3751.

[47] Eugene M. Izhikevich. Polychronization: computation with spikes. Neural Computation, 18(2):245–282, February 2005.

[48] Eugene M. Izhikevich. Dynamical Systems in Neuroscience: The Geometry of Excitability and Bursting. Computational Neuroscience. MIT Press, 2007.

[49] Tommi Jaakkola, Mark Diekhans, and David Haussler. Using the Fisher kernel method to detect remote protein homologies. In Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology, pages 149–158. AAAI Press, 1999. ISBN 1-57735-083-9.

[50] Tommi S. Jaakkola and David Haussler. Exploiting generative models in discriminative classifiers. In Advances in Neural Information Processing Systems, 1999.

[51] Tony Jebara, Risi Kondor, and Andrew Howard. Probability product kernels. J. Mach. Learn. Res., 5:819–844, 2004. ISSN 1533-7928.

[52] R. Jenssen, D. Erdogmus, J. Principe, and T. Eltoft. Some equivalences between kernel methods and information theoretic methods. J. VLSI Signal Process., 45:49–65, 2006.


[53] R. Jenssen, J. C. Principe, D. Erdogmus, and T. Eltoft. The Cauchy-Schwarz divergence and Parzen windowing: connections to graph theory and Mercer kernels. Journal of the Franklin Institute, 343:614–629, 2006.

[54] R. S. Johansson and I. Birznieks. First spikes in ensembles of human tactile afferents code complex spatial fingertip events. Nat. Neurosci., 7(2):170–177, February 2004. ISSN 1097-6256.

[55] Don H. Johnson, Charlotte M. Gruner, Keith Baggerly, and Chandran Seshagiri. Information-theoretic analysis of neural coding. Journal of Computational Neuroscience, 10(1):47–69, January 2001.

[56] T. Kanamori, S. Hido, and M. Sugiyama. Efficient direct density ratio estimation for non-stationarity adaptation and outlier detection. In Advances in Neural Information Processing Systems 20, pages 809–816, 2008.

[57] A. Kankainen. Consistent Testing of Total Independence Based on Empirical Characteristic Function. PhD thesis, University of Jyväskylä, 1995.

[58] Alan Karr. Point Processes and Their Statistical Inference. CRC Press, 1991. ISBN 9780824785321.

[59] Justin Keat, Pamela Reinagel, Clay R. Reid, and Markus Meister. Predicting every spike: a model for the responses of visual neurons. Neuron, 30(3):803–817, June 2001.

[60] Adam Kepecs, Naoshige Uchida, Hatim A. Zariwala, and Zachary F. Mainen. Neural correlates, computation and behavioural impact of decision confidence. Nature, 455(7210):227–231, August 2008. ISSN 0028-0836.

[61] Stefan Klampfl, Robert Legenstein, and Wolfgang Maass. Spiking neurons can learn to solve information bottleneck problems and extract independent components. Neural Computation, 21(4):911–959, April 2009. ISSN 0899-7667.

[62] Leonid A. Kontorovich, Corinna Cortes, and Mehryar Mohri. Kernel methods for learning languages. Theor. Comput. Sci., 405(3):223–236, 2008. ISSN 0304-3975.

[63] Gabriel Kreiman, Christof Koch, and Itzhak Fried. Category-specific visual responses of single neurons in the human medial temporal lobe. Nature Neuroscience, 3(9):946–953, September 2000. ISSN 1097-6256.

[64] A. Kuhn, A. Aertsen, and S. Rotter. Higher-order statistics of input ensembles and the response of simple model neurons. Neural Computation, 15(1):67–101, January 2003. ISSN 0899-7667.
[65] S. Kullback and R. A. Leibler. On information and sufficiency. Ann. Math. Stat., 22(1):79–86, 1951.


[66] Romesh D. Kumbhani, Mark J. Nolt, and Larry A. Palmer. Precision, reliability, and information-theoretic analysis of visual thalamocortical neurons. Journal of Neurophysiology, 98:2647–2663, 2007.

[67] F. Liese and I. Vajda. On divergences and informations in statistics and information theory. Information Theory, IEEE Transactions on, 52(10):4394–4412, 2006.

[68] Weifeng Liu, P. P. Pokharel, and J. C. Principe. The kernel least-mean-square algorithm. Signal Processing, IEEE Transactions on, 56(2):543–554, 2008.

[69] Zachary F. Mainen and Terrence J. Sejnowski. Reliability of spike timing in neocortical neurons. Science, 268(5216):1503–1506, 1995.

[70] Zachary F. Mainen, Jasdan Joerges, John R. Huguenard, and Terrence J. Sejnowski. A model of spike initiation in neocortical pyramidal neurons. Neuron, 15(6):1427–1439, December 1995.

[71] Andre F. T. Martins, Mario A. T. Figueiredo, Pedro M. Q. Aguiar, Noah A. Smith, and Eric P. Xing. Nonextensive entropic kernels. In ICML '08: Proceedings of the 25th International Conference on Machine Learning, pages 640–647, New York, NY, USA, 2008. ACM. ISBN 978-1-60558-205-4.

[72] J. Mercer. Functions of positive and negative type, and their connection with the theory of integral equations. Philos. Trans. Royal Society of London, 209:415–446, 1909.

[73] C. A. Micchelli, Y. Xu, and H. Zhang. Universal kernels. J. Mach. Learn. Res., 7:2651–2667, 2006. ISSN 1533-7928.

[74] Martin P. Nawrot, Philipp Schnepel, Ad Aertsen, and Clemens Boucsein. Precisely timed signal transmission in neocortical networks with reliable intermediate-range projections. Front. Neural Circuits, 3(1), 2009.

[75] Martin P. P. Nawrot, Clemens Boucsein, Victor R. Molina, Alexa Riehle, Ad Aertsen, and Stefan Rotter. Measurement of variability dynamics in cortical spike trains. J. Neurosci. Methods, 169(2):374–390, April 2008. ISSN 0165-0270.

[76] Aatira Nedungadi, Govindan Rangarajan, Neeraj Jain, and Mingzhou Ding. Analyzing multiple spike trains with nonparametric Granger causality. Journal of Computational Neuroscience, 27(1):55–64, August 2009. ISSN 0929-5313.
[77] X. Nguyen, M. Wainwright, and M. I. Jordan. Estimating divergence functionals and the likelihood ratio by penalized convex risk minimization. Advances in Neural Information Processing Systems 20, pages 1089–1096, 2008.

[78] Miguel A. L. Nicolelis and Mikhail A. Lebedev. Principles of neural ensemble physiology underlying the operation of brain machine interfaces. Nature Reviews Neuroscience, 10(7):530–540, July 2009. ISSN 1471-003X.


[79] Yoshiko Ogata. The asymptotic behaviour of maximum likelihood estimators for stationary point processes. Annals of the Institute of Statistical Mathematics, 30(1):243–261, December 1978. ISSN 0020-3157.

[80] Antonio R. C. Paiva. Reproducing Kernel Hilbert Spaces for Point Processes, with Applications to Neural Activity Analysis. PhD thesis, University of Florida, 2008.

[81] Antonio R. C. Paiva and Il Park. Which measure should we use for unsupervised spike train learning? In Statistical Analysis of Neuronal Data (SAND5), 2010.

[82] Antonio R. C. Paiva, Il Park, and Jose Príncipe. Optimization in reproducing kernel Hilbert spaces of spike trains. (book chapter in press).

[83] Antonio R. C. Paiva, Sudhir Rao, Il Park, and Jose C. Príncipe. Spectral clustering of synchronous spike trains. In IEEE International Joint Conference on Neural Networks, Orlando, FL, USA, August 2007.

[84] Antonio R. C. Paiva, Il Park, and Jose Príncipe. A reproducing kernel Hilbert space framework for spike trains. Neural Computation, 21(2):424–449, February 2009.

[85] Antonio R. C. Paiva, Il Park, and Jose Príncipe. A comparison of binless spike train measures. Neural Computing & Applications, 19:405–419, 2010.

[86] Liam Paninski. Convergence properties of three spike-triggered analysis techniques. Network: Computation in Neural Systems, 14:437–464, August 2003.

[87] Liam Paninski. Maximum likelihood estimation of cascade point-process neural encoding models. Network: Comput. Neural Syst., 15:243–262, 2004.

[88] Liam Paninski. Estimating entropy on m bins given fewer than m samples. Information Theory, IEEE Transactions on, 50(9):2200–2203, 2004. ISSN 0018-9448.

[89] Leandro Pardo. Statistical Inference Based on Divergence Measures, volume 185 of Statistics, Textbooks and Monographs. Chapman & Hall/CRC, 2006.

[90] Il Park. Continuous time correlation analysis techniques for spike trains. Master's thesis, University of Florida, 2007.

[91] Il Park and Jose C. Príncipe. Significance test for spike trains based on finite point process estimation. In Society for Neuroscience, 2009.
[92] Il Park and Jose C. Príncipe. Quantification of inter-trial non-stationarity in spike trains from periodically stimulated neural cultures. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2010. Special session on Multivariate Analysis of Brain Signals: Methods and Applications.

[93] Il Park, Antonio R. C. Paiva, Thomas B. DeMarse, and Jose C. Príncipe. An efficient algorithm for continuous-time cross correlogram of spike trains. Journal of Neuroscience Methods, 168(2):514–523, March 2008.


[94] Il Park, Murali Rao, Thomas B. DeMarse, and Jose C. Príncipe. Point process model for precisely timed spike trains. In Frontiers in Systems Neuroscience. Conference Abstract: Computational and Systems Neuroscience (COSYNE), 2009.

[95] E. Parzen. On estimation of a probability density function and mode. Ann. Math. Stat., 33(3):1065–1076, September 1962.

[96] Elzbieta Pekalska and Robert P. W. Duin. The Dissimilarity Representation for Pattern Recognition. World Scientific Publishing Co., Inc., River Edge, NJ, USA, 2005. ISBN 9812565302.

[97] D. H. Perkel, G. L. Gerstein, and G. P. Moore. Neuronal spike trains and stochastic point processes. I. The single spike train. Biophysical Journal, 7(4):391–418, 1967.

[98] D. H. Perkel, G. L. Gerstein, and G. P. Moore. Neuronal spike trains and stochastic point processes. II. Simultaneous spike trains. Biophysical Journal, 7(4):419–440, July 1967.

[99] Jonathan W. Pillow, Liam Paninski, Valerie J. Uzzell, Eero P. Simoncelli, and E. J. Chichilnisky. Prediction and decoding of retinal ganglion cell responses with a probabilistic spiking model. Journal of Neuroscience, 25:11003–11013, 2005.

[100] J. C. Principe, D. Xu, and J. W. Fisher. Information theoretic learning. In S. Haykin, editor, Unsupervised Adaptive Filtering, pages 265–319. John Wiley & Sons, 2000.

[101] Jose C. Príncipe. Information Theoretic Learning. Springer, 2010. ISBN 1441915699.

[102] Pamela Reinagel and R. Clay Reid. Temporal coding of visual information in the thalamus. Journal of Neuroscience, 20(14):5392–5400, 2000.

[103] Pamela Reinagel and R. Clay Reid. Precise firing events are conserved across neurons. J. Neurosci., 22(16):6837–6841, August 2002.

[104] Rolf-Dieter Reiss. A Course on Point Processes. Springer Series in Statistics. Springer, 1993. ISBN 9780387979243.

[105] A. Rényi. On measures of entropy and information. In Selected Papers of A. Rényi, volume 2, pages 565–580. Akademiai Kiado, Budapest, Hungary, 1976.

[106] Robert Schaback and Holger Wendland. Multivariate Approximation and Applications, chapter Characterization and construction of radial basis functions, pages 1–24. Cambridge University Press, 2001. ISBN 0521800234.
[107] B. Schölkopf, A. Smola, and K. R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput., 10(5):1299–1319, 1998. ISSN 0899-7667.

[108] Bernhard Schölkopf. The kernel trick for distances. In Neural Information Processing Systems, pages 301–307, 2000.


[109] Bernhard Schölkopf and Alexander J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Adaptive Computation and Machine Learning. MIT Press, 2002.

[110] Bernhard Schölkopf, Alexander Smola, and Klaus-Robert Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299–1319, 1998.

[111] S. Schreiber, J. M. Fellous, D. Whitmer, P. Tiesinga, and T. J. Sejnowski. A new correlation-based measure of spike timing reliability. Neurocomputing, 52-54:925–931, 2002.

[112] Michael N. Shadlen and William T. Newsome. Noise, neural codes and cortical organization. Curr. Opin. Neurobiol., 4:569–579, 1994.

[113] Tatyana O. Sharpee, Kenneth D. Miller, and Michael P. Stryker. On the importance of static nonlinearity in estimating spatiotemporal neural filters with natural stimuli. J. Neurophysiol., 99(5):2496–2509, May 2008. ISSN 0022-3077.

[114] B. W. Silverman. Density Estimation for Statistics and Data Analysis. Chapman and Hall, New York, 1986.

[115] William E. Skaggs, Bruce L. McNaughton, Matthew A. Wilson, and Carol A. Barnes. Theta phase precession in hippocampal neuronal populations and the compression of temporal sequences. Hippocampus, 6(2):149–172, 1996. ISSN 1098-1063.

[116] Donald L. Snyder and Michael I. Miller. Random Point Processes in Time and Space. Springer-Verlag, 1991.

[117] Bharath K. Sriperumbudur, Arthur Gretton, Kenji Fukumizu, Gert Lanckriet, and Bernhard Schölkopf. Injective Hilbert space embeddings of probability measures. In COLT, 2008.

[118] Paul Tiesinga, Jean-Marc Fellous, and Terrence J. Sejnowski. Regulation of spike timing in visual cortical circuits. Nature Reviews Neuroscience, 9:97–107, February 2008.

[119] Taro Toyoizumi, Jean-Pascal Pfister, Kazuyuki Aihara, and Wulfram Gerstner. Optimality model of unsupervised spike-timing-dependent plasticity: synaptic memory and weight distribution. Neural Computation, 19(3):639–671, March 2007.

[120] Henry C. Tuckwell. Introduction to Theoretical Neurobiology: Volume 2, Nonlinear and Stochastic Theories. Cambridge University Press, 1988. ISBN 0521352177, 9780521352178.

[121] M. C. W. van Rossum. A novel spike distance. Neural Computation, 13:751–763, 2001.


[122] Rufin VanRullen, Rudy Guyonneau, and Simon J. Thorpe. Spike times make sense. Trends in Neurosciences, 28(1):1–4, January 2005.

[123] J. D. Victor and K. P. Purpura. Metric-space analysis of spike trains: theory, algorithms and application. Network: Computation in Neural Systems, 8(2):127–164, 1997.

[124] Jonathan D. Victor. Spike train metrics. Current Opinion in Neurobiology, 15:585–592, 2005.

[125] Jonathan D. Victor. Approaches to information-theoretic analysis of neural activity. Biological Theory, 1(3):302–316, 2006. ISSN 1555-5542.

[126] D. A. Wagenaar, J. Pine, and S. M. Potter. Effective parameters for stimulation of dissociated cultures using multi-electrode arrays. J. Neurosci. Methods, 138(1-2):27–37, September 2004. ISSN 0165-0270.

[127] Jian-Wu Xu, Antonio R. C. Paiva, Il Park, and Jose C. Príncipe. A reproducing kernel Hilbert space framework for information-theoretic learning. Signal Processing, IEEE Transactions on, 56(12):5891–5902, 2008.


BIOGRAPHICAL SKETCH

Il "Memming" Park is a computational neuroscientist who aims to understand the information processing of the human brain. He attended Korea Advanced Institute of Science and Technology (KAIST), where he received a Bachelor of Science degree in computer science in 2004. He pursued his graduate study with Jose C. Príncipe in the Computational NeuroEngineering Laboratory (CNEL) at the University of Florida from Fall 2005. He received his Master of Science degree from the electrical and computer engineering department at the University of Florida in 2007, on efficient algorithms for binless spike train domain signal processing. He continued his study at CNEL in the biomedical engineering department. He is known by his pseudonym MEMMING on the web. He was born in Germany in 1979.