
A Biologically Plausible Approach for Noise Robust Vowel Recognition

Permanent Link: http://ufdc.ufl.edu/UFE0022585/00001

Material Information

Title: A Biologically Plausible Approach for Noise Robust Vowel Recognition
Physical Description: 1 online resource (97 p.)
Language: english
Creator: Uysal, Ismail
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2008

Subjects

Subjects / Keywords: Electrical and Computer Engineering -- Dissertations, Academic -- UF
Genre: Electrical and Computer Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: Present-day commercial automatic speech recognition (ASR) systems still pale in comparison to the human ability to recognize speech. The inherent non-stationarity in the timing of speech signals, together with dynamic changes in the environment, makes analysis and recognition extremely difficult. For decades, researchers have tried to mimic biology to improve machine recognition, and ASR is no exception. The widely used and accepted Mel frequency cepstral coefficients, for example, derive more effective filter banks from the tonotopic distribution of the hair cells along the basilar membrane. However, such approaches stem only from psychoacoustical and anatomical considerations. This research takes the biological inspiration one step further by imitating some of the dynamic computation of the auditory system, describing a biologically plausible algorithm that uses spikes exclusively in both the feature extraction and recognition stages. The main goal of this research is to investigate the computational potential of spike trains produced at the early stages of the auditory system for a simple acoustic classification task. The system uses an inner hair cell model to convert sound pressure waves into physical membrane motion, which is then converted into a spiking representation at nerve fibers centered at different frequencies. Spike coding schemes ranging from temporal to rate coding are explored, along with spike-based encoders of varying complexity such as rank order coding and the liquid state machine. One of the coding schemes considers the phase synchrony among different nerve fibers: a phase synchrony plot is constructed that shows the degree of synchrony at various frequencies, and this plot is used to detect the features inherent in the vowel signal, namely the formant frequencies.
The experiments show that this yields a spike-based, noise robust feature set for the analysis of speech signals. This research also introduces a duplex theory of spike coding in the early stages of the auditory system, based on the intensity and noise levels of the acoustic stimuli. According to this theory, at low intensity levels, where auditory nerve firings cannot generate high enough synchrony among neuron ensembles, rate coding is likely favored over phase-locking via synchrony coding. At conversational intensity levels, by contrast, phase synchrony coding is preferred because of its superior, highly noise robust performance. The theory is supported both by evidence from biology and by experimental simulations using biologically plausible models of the entire processing chain from spike generation to recognition. Based on these findings, a biologically plausible system architecture is proposed for the recognition of phonetically simple acoustic signals that uses spikes exclusively for computation. The prototype system is demonstrated on voiced phonemes and shows competitive recognition performance on a vowel dataset in the presence of noise. Two tests gauge the theoretical and real-life performance of the algorithm; in both, it performs well down to SNR values as low as 5 dB, where the performance of the baseline algorithm used for comparison drops considerably. These results suggest not only a possible explanation for the recognition mechanism of the auditory system, but also a novel way of processing speech signals. In summary, the research presented in this dissertation makes three major contributions to the field. First, a novel spike-based feature extraction technique is presented, establishing the first use of spike-based features in the form of phase synchrony for speech recognition.
Second, for the defined problem of vowel recognition, the proposed spike-based algorithms outperform the best-known techniques currently used in commercial ASR systems. Finally, the process of developing these algorithms yielded real biological implications, such as the theory of duplex spike coding in the auditory system.
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by Ismail Uysal.
Thesis: Thesis (Ph.D.)--University of Florida, 2008.
Local: Adviser: Harris, John G.

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2008
System ID: UFE0022585:00001



Full Text

This work has been carried out under the guidance of my advisor, Dr. John G. Harris. His brilliant ideas and suggestions more than helped shape this research. Like all his other students, I feel privileged and fortunate to have him as my advisor. I also would like to thank Dr. Harsha Sathyendra for his great collaboration on this and other research projects. Finally, I would like to thank each of my committee members, Dr. Jose C. Principe, Dr. Janise Y. McNair and Dr. Arunava Banerjee, for their advice and contribution to my education thus far and in the future.

ACKNOWLEDGMENTS .... 4
LIST OF TABLES .... 7
LIST OF FIGURES .... 8
ABSTRACT .... 10

CHAPTER

1 INTRODUCTION .... 13
    1.1 Automatic Speech Recognition .... 13
    1.2 The Inspiration .... 13
    1.3 The Goal and the Approach .... 13
    1.4 A Brief History of Automatic Speech Recognition .... 15
    1.5 A Brief History of Bio-Inspired Speech Processing .... 16
    1.6 Objectives and Contributions of the Research .... 19
    1.7 Road Map .... 20

2 CONVENTIONAL SPEECH RECOGNITION .... 21
    2.1 Feature Extraction: Mel Frequency Cepstral Coefficients .... 21
        2.1.1 MFCC Implementation of Slaney's Auditory Toolbox .... 24
        2.1.2 MFCC Implementation of the Hidden Markov Model Toolkit .... 25
    2.2 Classification: Hidden Markov Model .... 25
        2.2.1 Markov Process .... 26
        2.2.2 Markov Analysis of Speech .... 26
        2.2.3 Characterization of Hidden Markov Model .... 26

3 BIOLOGICALLY PLAUSIBLE SPEECH RECOGNITION: SPIKE GENERATION, CODING AND CLASSIFICATION .... 29
    3.1 Inner Hair Cell Model .... 29
    3.2 Spike Coding Techniques .... 34
        3.2.1 Rate Coding .... 35
        3.2.2 Direct Temporal Coding .... 36
        3.2.3 Phase Coding .... 37
        3.2.4 Time to First Spike Coding .... 37
        3.2.5 Synchrony Coding .... 38
    3.3 Spike Classification Techniques .... 39
        3.3.1 Rank Order Coding .... 39
        3.3.2 Liquid State Machine .... 41

4 .... 44
    4.1 Introduction .... 44
    4.2 Detection of Formant Frequencies and Feature Extraction .... 44
    4.3 The Meddis Hair Cell Model .... 45
    4.4 Synchrony Between Hair Cells .... 46

5 SPIKE-BASED CLASSIFICATION .... 54
    5.1 ROC-Synchrony Coding Architecture (ROC-DoS) .... 54
        5.1.1 Rank Order Coding .... 55
        5.1.2 Classification Via Rank Order Coding .... 55
    5.2 LSM-Rate, Direct Temporal and Synchrony Coding Architecture .... 58

6 SAMPLE SPACING ENTROPY ESTIMATION .... 62
    6.1 Introduction .... 62
    6.2 Entropy Estimation Methods .... 64
        6.2.1 Kernel Density Estimation Based Entropy Estimates .... 64
        6.2.2 Sample Spacing Estimates .... 65
    6.3 Continuously Differentiable Sample-Spacing Entropy Estimation .... 66
    6.4 Minimum Error Entropy and Adaptive System Training .... 69
    6.5 Experimental Results and Comparisons .... 70
        6.5.1 Case Study I: Mackey-Glass Time Series .... 71
        6.5.2 Case Study II: Time Series with High Dimensional Dynamics and Drifting Parameters .... 73
    6.6 Conclusions .... 76

7 RESULTS AND EXPERIMENTS .... 78
    7.1 Controlled Experiment .... 78
    7.2 The Real Classification Problem with Real Vowels .... 80
        7.2.1 Test Settings .... 80
        7.2.2 Test Results .... 81
    7.3 Progressive Performance Tests .... 84
    7.4 Discussions .... 87

8 DISCUSSION, CONCLUSIONS AND FUTURE WORK .... 88
    8.1 Future Applications .... 89
        8.1.1 Pitch Detection .... 89
        8.1.2 Auditory Scene Analysis .... 89
    8.2 Future Work: Where Do We Go From Here? .... 91

REFERENCES .... 92

BIOGRAPHICAL SKETCH .... 97

4-1 Mean formant frequencies for common English vowels .... 45
6-1 Computation Time vs. L1 Performance Comparisons and Trade-offs for the Learning Schemes - Mackey-Glass Time Series Prediction .... 72
6-2 Computation Time vs. L1 Performance Comparisons and Trade-offs for the Learning Schemes - Motion of a Damped, Driven Particle .... 75
7-1 Mean formant frequencies for vowels used in the controlled experiment .... 79
7-2 Percentage of vowels correctly classified for the controlled experiment .... 79
7-3 Mean formant frequencies for vowels used in the real classification test .... 80
7-4 Percentage of vowels correctly classified for ROC-Synchrony Coding .... 81
7-5 Percentage of vowels correctly classified at 10 dB SPL for LSM-rate, direct temporal and synchrony coding .... 82
7-6 Percentage of vowels correctly classified at 60 dB SPL for LSM-rate, direct temporal and synchrony coding .... 83
7-7 Percentage of vowels correctly classified for LSM-synchrony coding and MFCC-HMM engine .... 84
7-8 Percentage of vowels (10 classes) correctly classified for LSM-synchrony coding and MFCC-HMM engine .... 85
7-9 Percentage of vowels (10 classes) correctly classified for the augmented features and MFCC with the HMM engine .... 86

1-1 A general block diagram of automatic speech recognition .... 17
1-2 Word error rate for large-vocabulary (5000 words) continuous speech recognition. Four different noise sources were used to corrupt the clean speech whereas the machine recognition was carried out using the common HMM toolkit. .... 18
2-1 The relationship between Mel scale and Hertz scale .... 22
2-2 The distribution of MFCC filter bank in frequency .... 23
2-3 Mel frequency energy spectra is derived from the filter bank outputs, followed by a DCT operation for compression. .... 23
2-4 A typical multiple-emitting state hidden Markov model .... 27
3-1 A typical action potential .... 30
3-2 The flow diagram showing the generation of a spike train in response to a given acoustic stimuli .... 32
3-3 Speech-to-spike conversion block which shows the transduction of sound waves into trains of action potentials at nerve fibers connected to inner hair cells .... 33
3-4 Rate coding in the form of spike density .... 35
3-5 Phase coded neural response due to an oscillation .... 37
3-6 Time to first spike coding in response to a stimuli onset .... 38
3-7 Synchrony of a group of neurons in response to stimuli .... 39
3-8 The basic structure of a rank order decoder for L different number of classes and N different number of presynaptic neurons .... 40
3-9 The basic structure of a liquid state machine with M readouts trained using supervised learning .... 42
4-1 Spike train outputs for a set of 10 hair cells all with a central frequency of 300 Hz .... 47
4-2 Inter-spike time intervals for a set of hair cells centered at 300 Hz .... 48
4-3 Log-magnitude spectral envelope for /iy/ and the corresponding degree of synchrony for a set of hair cells centered at Fc=300 Hz (computed for a noisy utterance with 5 dB SNR) .... 49
4-4 Degree of synchrony within 2 sets of hair cells centered at Fc=300 Hz and Fc=250 Hz in response to noisy vowel signal with F1=300 Hz .... 50

.... 52
5-1 Block diagram of one of the branches of vowel classification algorithm for one of the vowels .... 57
5-2 Output of the decoders for a test utterance of the second vowel /ih/ as in "bit" .... 58
5-3 The block diagram for the LSM-synchrony coding architecture .... 59
6-1 The polynomial kernel for i = 3 and ~e_i = 0 .... 68
6-2 The error probability densities for MSE, MEE-KDE and MEE-CDSS for the Mackey-Glass time series prediction on the test data .... 73
6-3 The test error probability densities for MSE, MEE-KDE and MEE-CDSS for the prediction of the motion of a weakly damped and periodically driven particle .... 76
8-1 Log-magnitude spectral envelope for /iy/ and the corresponding degree of synchrony for a set of hair cells centered at Fc=300 Hz (computed for a noisy utterance with 5 dB SNR) - Note that the nerve fibers centered at low frequencies can similarly phase-lock to the fundamental frequency .... 90

[1]. Other early recognition efforts worth mentioning include [2], [3], [4], [5], [6].

[7]. LPC enabled straightforward estimation and analysis of the vocal tract response extracted from time-domain speech waveforms. Using LPC also made it possible for traditional pattern recognition tools to be applied to speech recognition [8]. However, until the 1980s, speech recognition was still mostly viewed as template matching in time or frequency. The introduction of the hidden Markov model (HMM) enabled the transition to a more statistical framework [9]. Meanwhile, research on a more robust feature space on which the HMM could operate yielded Mel-frequency cepstral coefficients (MFCC) [10]. MFCC relies on spectral analysis on the Mel scale, which approximates our auditory system's natural response more closely than the linearly spaced frequency scale used for previous feature extractors such as LPC. Present-day, state-of-the-art ASR systems still use the basic building block of the MFCC feature set and HMM classifier, and include advanced versions of lexicon and language models for better performance. Figure 1-1 shows the general block diagram of a typical ASR system. In this figure, HMM and MFCC constitute the acoustic model block, which will be discussed in more detail in the next chapter. If we assume that the acoustic model represents sub-words, then the lexicon model determines how they are linked together to represent full words. For a continuous speech recognition task, the language model represents the topmost hierarchical structure, which decides on the likelihood of a specific order of words making up a meaningful sentence. This research deals with bio-inspired feature extraction and classification; thus we concentrate on the acoustic model block.

Figure 1-1. A general block diagram of automatic speech recognition

auditory neural computation. The biggest motivation behind this research is the fact that human-engineered automatic speech recognition (ASR) systems perform poorly as the variability associated with the speech signal increases, especially in noisy environments. Figure 1-2 shows a brief comparison of human and machine performance for a continuous speech recognition experiment. It is obvious that the human brain is much more capable of processing the ever-continuous stream of input with unparalleled accuracy. Even though much is still unknown about how the brain exactly works, it is well known that neurons in the brain use action potentials to communicate timing information from the sensory periphery to more complex levels of processing in the cortex. We strongly believe that

Figure 1-2. Word error rate for large-vocabulary (5000 words) continuous speech recognition. Four different noise sources were used to corrupt the clean speech whereas the machine recognition was carried out using the common HMM toolkit.

computation with spike trains is not a mere artifact of biology but instead holds the key to the robustness and performance of the auditory system. The idea of using bio-inspired techniques for machine recognition tasks is certainly not a new concept. In ASR, the most commonly used feature set, Mel frequency cepstral coefficients (MFCC), imitates the distribution of cochlear filter banks by employing logarithmically distributed filters along the frequency axis [10]. More elaborate approaches include human factor cepstral coefficients, which use known facts from human psychoacoustics, such as the relationship between center frequency and critical bandwidth, to decouple filter bandwidth from filter spacing [11]. The performance and robustness of said techniques compared to previously common feature extractors such as linear

[7]. The objective of this research is to take this inspiration one step further to include the neural computation used in the human auditory system for both the feature extraction and recognition stages of a simple acoustic classification problem. In order to accomplish this goal, we have followed a systematic approach where different spike-based operators are evaluated along with different spike coding techniques for different types and intensities of noise and input acoustic stimuli [12-14].

[10]. The most important reason behind the success of MFCC is the noise robustness of the features compared to feature extractors using linear prediction and spectrum. A typical MFCC feature extraction consists of the following steps:
1) Divide the whole speech segment into frames, where quasi-stationary analysis is possible. A typical speech frame will have a duration of 20 to 40 ms.
2) Take the Fourier transform of the speech signal within that frame to obtain its spectral representation.
3) Using triangular overlapping windows on mel scale, map the energies of the obtained spectrum.
4) Take the logarithm of the energy found at each mel frequency to construct a vector of energies.
5) Find the discrete cosine transform of the above vector to obtain the final feature set for a particular speech frame.

Figure 2-1. The relationship between Mel scale and Hertz scale

6) Follow the same procedure for each speech frame for a complete analysis of the whole speech segment.
In order to understand how MFCC is derived, one needs to understand the relationship between the mel scale and the linear Hertz scale. The mel scale was first proposed by Stevens, Volkman and Newman in 1937 [15]. It dictates the logarithmic sensitivity of human hearing with respect to acoustic input frequency, as also shown in Figure 2-1. The triangular overlapping windows are placed in mel frequency. Figure 2-2 shows the filter bank used by Davis and Mermelstein in the original MFCC feature extraction algorithm. The first 10 filters are linearly spaced within 100 and 1000 Hz, whereas the rest of the filters are log-spaced at 5 filters per octave, following the mel frequency distribution.
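The six steps above can be sketched end to end in Python. This is a minimal illustrative pipeline, not the dissertation's implementation: the 25 ms frame, 20-filter bank with a 100 Hz lower edge, 13 coefficients, the uniformly mel-spaced triangles, and the 2595 log10(1 + f/700) mel formula are common textbook choices, and the input tone is synthetic.

```python
import numpy as np

def hz_to_mel(f):
    # A common mel approximation; one of several definitions in the literature.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, fs, n_filters=20, n_ceps=13):
    """Steps 2-5 for one frame: FFT -> triangular mel bank -> log -> DCT."""
    spectrum = np.abs(np.fft.rfft(frame)) ** 2            # step 2: power spectrum
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    # step 3: triangular overlapping windows equally spaced on the mel scale
    edges = mel_to_hz(np.linspace(hz_to_mel(100.0), hz_to_mel(fs / 2), n_filters + 2))
    energies = np.empty(n_filters)
    for j in range(n_filters):
        lo, mid, hi = edges[j], edges[j + 1], edges[j + 2]
        rising = np.clip((freqs - lo) / (mid - lo), 0.0, None)
        falling = np.clip((hi - freqs) / (hi - mid), 0.0, None)
        weights = np.clip(np.minimum(rising, falling), 0.0, 1.0)
        energies[j] = np.sum(spectrum * weights)
    log_e = np.log(energies + 1e-12)                      # step 4: log energies
    i = np.arange(n_ceps)[:, None]
    j = np.arange(n_filters)[None, :]
    dct_basis = np.cos(np.pi * i * (j + 0.5) / n_filters) # step 5: DCT-II basis
    return dct_basis @ log_e

# Steps 1 and 6: split a 1 s synthetic signal into 25 ms frames, analyze each.
fs = 8000
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 440.0 * t)
frame_len = int(0.025 * fs)
frames = [signal[k:k + frame_len] for k in range(0, fs - frame_len, frame_len)]
features = np.array([mfcc_frame(f, fs) for f in frames])  # one row per frame
```

Each frame yields one 13-dimensional feature vector; a real front-end would add windowing, pre-emphasis and frame overlap on top of this skeleton.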

Figure 2-2. The distribution of MFCC filter bank in frequency

Figure 2-3. Mel frequency energy spectra is derived from the filter bank outputs, followed by a DCT operation for compression.

The next step is to calculate the spectral energy in each band (i.e., each filter output) and take the logarithm of the corresponding energy vector, as depicted in Figure 2-3. Finally, the resulting MFCC feature vector is found by taking the DCT of the vector [m_1 ... m_M], where M denotes the number of cepstral coefficients. In a more compact form, each feature can be found using the following equation:

c_i = sum_{j=1..M} m_j cos( pi i (j - 1/2) / M )    (2-2)

2.1.1 MFCC Implementation of Slaney's Auditory Toolbox

[16]. In this version, the filter bank has 40 filters with equal areas, compared to Davis and Mermelstein's 20 filters with equal heights. The following equations describe this particular implementation of MFCC:

fc_i = 133.33 + 66.67 i,        i = 1, ..., 12
fc_i = fc_12 * lam^(i-12),      i = 13, ..., 40, where lam = 2^0.1

and fc_i is the center frequency of the i-th filter in Hertz.
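The two center-frequency formulas above can be checked numerically. This small sketch reproduces only the center-frequency spacing of the 40-filter bank, not the equal-area filter shapes:

```python
import numpy as np

# Center frequencies of the 40-filter bank: the first 12 linearly spaced,
# the rest log-spaced with ratio lam = 2**0.1 per filter.
lam = 2.0 ** 0.1
fc = np.zeros(41)                      # fc[1]..fc[40], matching the text's 1-based indices
for i in range(1, 13):
    fc[i] = 133.33 + 66.67 * i         # fc[1] = 200 Hz, fc[12] ~ 933.4 Hz
for i in range(13, 41):
    fc[i] = fc[12] * lam ** (i - 12)   # each filter 2**0.1 times the previous one
```

With these constants the first filter sits at 200 Hz and the log-spaced region grows by a factor of 2^0.1 (about 1.07) per filter, reaching roughly 6.5 kHz at filter 40.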

[17]. It uses a slightly different definition of mel frequency warping, as follows:

f̂ = 2595 log10(1 + f/700)
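The warping and its inverse are straightforward in code (a small sketch; the 1 kHz sanity check follows directly from the constants in the formula):

```python
import math

def htk_mel(f):
    """Mel-frequency warping f_hat = 2595 * log10(1 + f / 700), f in Hz."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def htk_mel_inv(m):
    """Inverse warping, recovering frequency in Hz from a mel value."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# 1 kHz maps to approximately 1000 mel on this scale.
one_khz_in_mel = htk_mel(1000.0)
```

The 2595 constant is chosen precisely so that 1000 Hz lands near 1000 mel, which is the conventional anchor point of the scale.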

2-4. This type of left-to-right topology is used for almost every ASR application, using different numbers of

Figure 2-4. A typical multiple-emitting state hidden Markov model

states. For example, the application discussed in this thesis deals with almost stationary vowels, and thus a single state is enough for sufficient modeling. The HMMs are typically characterized by the following parameters:
- N denotes the number of states in the model. For instance, in Figure 2-4 there are three different states.
- A denotes the state transition probability matrix. In other words, it defines the probability of going from one state to another. Since this is a left-to-right HMM, backward state transitions are not allowed and carry zero probability.
- B is the output probability distribution defined for each emitting state. The most common HMM models used in speech recognition are defined with Gaussian output probability distributions.
- Finally, π denotes the initial state distribution. In other words, it defines where the system will start at time t = 0.
This dissertation focuses on single-state HMMs, N = 1, because vowels can be considered stationary within the frame of analysis. A Gaussian mixture model is used for that single state; in other words, B, the output probability distribution, is a
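The parameters N, A, π and the per-state output densities come together in the forward algorithm, which scores an observation sequence against a model. Below is a toy left-to-right example with three states and scalar Gaussian outputs; every number is invented for illustration (a real acoustic model learns its parameters from speech features):

```python
import numpy as np

N = 3
pi = np.array([1.0, 0.0, 0.0])              # always start in the first state
A = np.array([[0.6, 0.4, 0.0],              # left-to-right: no backward
              [0.0, 0.7, 0.3],              # transitions, so A is
              [0.0, 0.0, 1.0]])             # upper-triangular
means = np.array([0.0, 2.0, 4.0])           # one Gaussian per emitting state
stds = np.array([1.0, 1.0, 1.0])

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def log_likelihood(obs):
    """log P(obs | model) via the forward recursion alpha_t = (alpha_{t-1} A) * b(x_t)."""
    alpha = pi * gauss(obs[0], means, stds)
    for x in obs[1:]:
        alpha = (alpha @ A) * gauss(x, means, stds)
    return np.log(alpha.sum())

# A sequence that drifts through the three state means scores well;
# the same sequence reversed fights the left-to-right topology.
obs = np.array([0.1, 0.2, 1.9, 2.1, 3.8, 4.2])
```

In the dissertation's single-state case (N = 1) the recursion collapses to summing per-frame log output probabilities, with B a Gaussian mixture rather than the single Gaussians used here.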


[19]. According to this model, sound pressure waves are converted into mechanical motion at the basilar membrane (BM), which has a tonotopic distribution of inner hair cells. When the sound pressure waves vibrate the BM, the attached hair cells deflect, which changes the permeability of the cell membranes that make synaptic connections to nerve fibers. A change in permeability also changes the amount of neurotransmitter currently present in the synaptic cleft, which starts the mechanism for generating action potentials at the nerve fibers. The model used in this paper takes into account many improvements over this basic concept throughout the years, such as the introduction of adaptation as seen in the human cochlea and the non-linear temporal properties associated with spike generation [20, 21], [22]. A significant claim is that the shape of an action potential, as shown in Figure 3-1, is not likely to carry discernible information. The information is more likely carried in the timing and frequency of the occurrence of spikes throughout the nerve fiber, which will be discussed in the next section.

Figure 3-1. A typical action potential

Speech signals, on the other hand, are nothing but sound pressure waves that are first detected by the pinna, the outermost region of the ear. Through a complex chain of processing, these pressure waves are converted into a spatio-temporal array of action potentials for further processing. The first part of the research, which concentrates on the extraction of features inherent in a speech signal, will focus on this transduction of pressure waves to action potentials. Hence, it is suitable to provide the reader with some background knowledge. The ear is an engineering marvel in numerous aspects. The basic idea is that the cochlea, which is a part of the inner ear, is a fluid-filled chamber in which sound pressure waves flow and are filtered. These waves modulate the basilar membrane at various points along the cochlea corresponding to the frequencies in the sound signal. As the name implies, the basilar membrane acts as a base for the sensory cells and acts like a frequency

[23]. The inner hair cells are attached to the basilar membrane throughout and are responsible for converting mechanical vibrations to electrical signals. Their characteristics and operation were first clearly formulated by Meddis in 1986 [19]. Figure 3-2 shows the flow diagram for the hair cell model. Inner hair cells have mechanically-gated ion channels instead of the more usual potassium and sodium voltage-gated ion channels. When a hair cell is deflected, the mechanically gated ion channels open, which lets positively charged ions enter the cell. This, in turn, opens the calcium voltage-gated ion channels and thus triggers the release of neurotransmitters into the synaptic cleft between the hair cell and the nerve fibers. This depolarizes the nerve fiber and causes a spike to fire. Meddis was able to explain this phenomenon with mathematical equations. His model consists of 3 important variables: the hair cell receptor potential, which is a function of the basilar membrane velocity, the calcium-controlled transmitter release, and the probability of an action potential. Since the introduction of the model in 1986, it has been revised and updated several times. Currently the model also incorporates the adaptation of the neural firings to constant stimuli [20], [21]. Once the sound waves are converted to action potentials propagating throughout the auditory nerve, the literature on the subsequent processing steps reveals less and less as one reaches the higher levels of the auditory system, particularly the auditory cortex. As with other cortical areas, comprehensive analysis of the auditory cortex usually requires invasive techniques, which limits our understanding of the dynamics of the computation which takes place on the input provided by the auditory nerve fibers. Most of the evidence, usually obtained through surgeries, provides physiological insights into the organization
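The Meddis model itself is a set of coupled transmitter-reservoir equations and is not reproduced here. As a crude illustration of the transduction idea only (not the Meddis equations), one can half-wave rectify a tone, since hair cells depolarize for deflection in one direction only, and use the result to drive a probabilistic spike generator; the 300 Hz tone and 200 spikes/s peak rate are arbitrary example values:

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 300.0 * t)         # stimulus at the fiber's best frequency

# Crude transduction: half-wave rectification (depolarization in one
# deflection direction only), then a Bernoulli-per-sample approximation
# of an inhomogeneous Poisson spike generator driven by that intensity.
rate = 200.0 * np.clip(tone, 0.0, None)      # instantaneous firing rate, spikes/s
spikes = rng.random(fs) < rate / fs          # spike probability per 1/fs sample
spike_times = t[spikes]
```

Because the firing intensity is zero on negative half-cycles, every generated spike falls in the positive phase of the stimulus, which is a toy version of the phase-locking that the later synchrony coding chapters exploit.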

Figure 3-2. The flow diagram showing the generation of a spike train in response to a given acoustic stimuli

of the cortex. For example, the tonotopic mapping we observe on the basilar membrane is preserved even in the cortex [24]. The latest research divides the cortex into several regions and explains the cell structure and the connections between these regions, as well as the tonotopic mappings observed within each region [24]. This research models these theoretical findings using spiking network models and performs simulations on

Figure 3-3. Speech-to-spike conversion block which shows the transduction of sound waves into trains of action potentials at nerve fibers connected to inner hair cells

these networks, whose parameters will include connectivity to other networks, frequency selectivity and number of neurons relative to other regions. Figure 3-3 shows the front-end of the overall spike-based architecture. The speech is passed through a series of equivalent rectangular bandwidth (ERB) filters spanning the frequency range of 200 Hz to 4 kHz, which includes most of the frequency content present in a typical speech signal. The center frequencies of these filters are distributed logarithmically and follow the frequency resolution observed in real cochlear filters. Hence, the resolution is a decreasing function of frequency, with more filters centered around lower frequencies. There are 20 frequency channels and 50 filters in each channel for more accurate application of some of the coding schemes such as synchrony and rate coding, while also having a reasonably low computational cost. The outputs of these filters are
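A cochlea-like center-frequency layout of the kind described above can be generated with the Glasberg and Moore ERB-rate scale. The 20-channel, 200 Hz to 4 kHz span follows the text; the scale constants are the standard published ones, and this sketches only the spacing, not the filters themselves or the dissertation's exact placement:

```python
import numpy as np

def erb_number(f):
    """Glasberg & Moore ERB-rate scale (in Cams), f in Hz."""
    return 21.4 * np.log10(4.37e-3 * f + 1.0)

def erb_number_inv(e):
    return (10.0 ** (e / 21.4) - 1.0) / 4.37e-3

# 20 channel center frequencies spaced uniformly on the ERB scale between
# 200 Hz and 4 kHz: dense at low frequencies, sparse at high frequencies,
# mirroring the resolution of real cochlear filters.
centers = erb_number_inv(np.linspace(erb_number(200.0), erb_number(4000.0), 20))
spacing = np.diff(centers)   # grows monotonically with frequency
```

Uniform steps on the ERB scale translate into Hz gaps that widen toward high frequencies, which is exactly the "resolution is a decreasing function of frequency" property the text requires.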


[25], [26]. One of the major differences between the proposed architecture and these approaches is the enforcement of the spike coding schemes on top of the spike trains from the cochlea. Spike coding is not only used as a feature extractor but also to investigate how robustness might be encoded within these spike trains. Without any applied coding scheme, the spike-based operator is forced to differentiate the higher order features contained within a spike train, thus decreasing the robustness of the overall system.
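The ERB-spaced filter bank front-end described earlier can be sketched as follows. The Glasberg-Moore ERB-number formula is one standard choice for the cochlear frequency scale; the dissertation does not spell out its exact gammatone parameters, so the constants below are assumptions, not the author's implementation:

```python
import numpy as np

def hz_to_erb_number(f_hz):
    # Glasberg & Moore (1990) ERB-number ("Cam") scale
    return 21.4 * np.log10(4.37 * f_hz / 1000.0 + 1.0)

def erb_number_to_hz(erb):
    # Inverse of the mapping above
    return (10.0 ** (erb / 21.4) - 1.0) * 1000.0 / 4.37

def erb_center_frequencies(n_channels=20, f_lo=200.0, f_hi=4000.0):
    """Center frequencies equally spaced on the ERB scale: channels are
    packed more densely at low frequencies, as on the basilar membrane."""
    erbs = np.linspace(hz_to_erb_number(f_lo), hz_to_erb_number(f_hi), n_channels)
    return erb_number_to_hz(erbs)
```

Spacing the 20 channels uniformly on the ERB scale automatically yields the decreasing frequency resolution described in the text, with inter-channel spacing in Hz widening toward high frequencies.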


Rate coding in the form of spike density.

[22]. In order to find the rate, the spike count can be averaged over:
1. time (rate as spike count),
2. several experimental trials (rate as spike density),
3. populations of neurons (rate as population activity).
Figure 3-4 illustrates the most commonly used form of rate coding. There are many experimental findings in support of rate coding, as well as many others claiming otherwise in some certain cases [27]. For example, in Thorpe's 1996 paper, he shows that humans have the ability to recognize a visual scene in just a few hundred milliseconds, which does not allow enough time for an averaging over time. Similarly, a fly species responds to visual stimuli after the postsynaptic neuron only receives one or two spikes [22]. Rate coding, without a doubt, is still among the most common schemes applied by scientists to real world problems to discover possible connections between behavior and observed spike train data [22]. Averaging over time simply assumes the frequency of spike


[28]. This means that, given two different inputs, as long as their SPLs are around conversational levels, the rate coding at the end of the nerve fibers will not be able to differentiate the two different acoustic stimuli. Hence, this paper investigates rate coding at two different SPLs, namely 60 dB and 10 dB, pointing towards a duplex theory of spike coding which might depend on input SPL and noise levels. In addition, for speech processing, rate coding does not seem to be a suitable way of coding because it is known that the firing of an auditory neuron is saturated during a normal conversation. This means that, no matter what the stimulus is, the rates of firing throughout the neurons with different center frequencies are very similar. Hence, one cannot differentiate one stimulus from another by simply looking at the firing rate at an auditory nerve fiber during normal conversational speech.

[29-31]. Direct temporal coding in particular uses the spike firing times as-is, and all the timing information is supplied to the spike-based operator. For the architecture in the system, this corresponds to 1000 individual spike trains being fed into the spike-based operator. Depending on the classifier architecture, the process is simplified by using fewer spike trains per frequency channel, which will be discussed in more detail in chapter 3.
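The three averaging schemes of rate coding listed earlier can be sketched directly. This is a minimal illustration; the window lengths and bin counts below are arbitrary choices for demonstration, not values from the text:

```python
import numpy as np

def rate_as_spike_count(spike_times, t_window):
    """(1) Average over time: spikes per second within a single trial."""
    return len(spike_times) / t_window

def rate_as_spike_density(trials, t_window, n_bins):
    """(2) Average over repeated trials: a PSTH in spikes per second per bin."""
    edges = np.linspace(0.0, t_window, n_bins + 1)
    counts = sum(np.histogram(np.asarray(tr), bins=edges)[0] for tr in trials)
    return counts / (len(trials) * (t_window / n_bins))

def rate_as_population_activity(n_spikes_in_window, n_neurons, dt):
    """(3) Average over a population: spikes per neuron per second in a
    short window dt."""
    return n_spikes_in_window / (n_neurons * dt)
```

The first form is the one that Thorpe's timing argument targets: a few hundred milliseconds is too short a `t_window` for a stable single-trial average.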


Phase coded neural response due to an oscillation.

Figure 3-5 gives a basic idea of how phase coding might be used to code a stimulus. Phase coding has been studied by different researchers, and there is some experimental evidence suggesting that it might be used when coding the location of some animals, where the phase of a spike with respect to hippocampus oscillations carries this information [32], [33].

[27], [34]. It relies on the idea that all the stimulus information is encoded in the first spike firing times of the neurons. Figure 3-6 shows a simple example of how this coding scheme works. When an input comes, depending on the class of stimulus, i.e., a table or a chair in case it's a visual stimulus, the notion that the second neuron fired faster than the first, which in turn fired


Time to first spike coding in response to a stimulus onset.

faster than the third neuron, implies that a particular class, say table, is presented as a visual stimulus. This also means that most of the information is carried with the first spike firing. Other researchers were able to show that the first 20 to 50 ms from the stimulus onset does indeed carry the most information in a given spike train, using concepts from information theory [35]. Some of these experiments also show that the brain usually only has time enough to process one spike per neuron, which makes this scheme an applicable candidate for plausible coding.

In Figure 3-7, when a particular group of neurons, 1, 2, 3 and 4, start firing together, this can signify a certain stimulus. This type of coding has been studied extensively by many researchers [36], [31]. It is common consensus that synchrony plays a significant role in the group communications of neurons.


Synchrony of a group of neurons in response to stimuli.

[37-39].

[40]. As is evident from the algorithm description, the response times are much shorter when compared to other alternatives such as rate coding, and it is also not affected by the input intensity, considering the presynaptic neurons will simply fire faster without changing


The basic structure of a rank order decoder for L different numbers of classes and N different numbers of presynaptic neurons.

the order of firing. These properties render the algorithm very useful for image processing applications [38], [41]. Training a system using this type of coding is extremely simple and fast, and slightly resembles that of a multi-layer perceptron. Figure 3-8 shows a simple block diagram for a classification example where there are L number of classes and L corresponding decoder neurons. The activation level of a decoding (post-synaptic) neuron i at time t is given as follows:

    Activation(i, t) = sum_{j=1}^{m} k^order(j) * w_{j,i}

Order(j) is the firing order of the presynaptic neuron j, and k is chosen to be any number between 0 and 1. w_{j,i} is the synaptic weight between the decoding neuron i and


In Figure 3-8, when class 1 is presented, since the order of firing will correspond best to the presynaptic weights for the first decoder neuron, it will reach its threshold and fire a spike signaling detection of class 1. The synaptic weight value at the end of the training phase will be proportional to the mean modulation of the synapse. Hence, in case one knows the mean modulation for the training data samples belonging to a specific class, just one sample which is close to the mean modulation can be used for training, which results in an exceptionally fast training phase, very much unlike the conventional machine learning engines. For speech, without some type of pre-coding enforcement, it is not possible to classify signals with just the order of firing among afferent auditory neurons. The reason is that the neurons centered at higher frequencies will always fire later than the neurons centered at lower frequencies, no matter what the input stimulus is.
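A minimal sketch of this decoder follows. The one-shot weight-setting rule below (copying k^order(j) into the target class's weights) is an illustrative assumption consistent with the activation equation, not the dissertation's exact training procedure:

```python
import numpy as np

def _ranks(firing_order, n):
    # order[j] = firing rank of presynaptic neuron j (0 = fired first)
    order = np.empty(n)
    order[list(firing_order)] = np.arange(n)
    return order

def rank_order_activation(firing_order, weights, k=0.5):
    """Activation(i) = sum_j k**order(j) * w[j, i] for every decoder neuron i."""
    order = _ranks(firing_order, weights.shape[0])
    return (k ** order) @ weights

def train_one_shot(patterns, n_classes, k=0.5):
    """Set w[j, c] = k**order(j) for class c's pattern, so an input firing
    in the trained order maximizes its own decoder's activation
    (one training sample per class)."""
    n = len(patterns[0])
    W = np.zeros((n, n_classes))
    for c, firing_order in enumerate(patterns):
        W[:, c] = k ** _ranks(firing_order, n)
    return W
```

With k in (0, 1), earlier-firing neurons contribute exponentially more, which is why the scheme is insensitive to overall input intensity: scaling the input shifts all firing times but does not change their order.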


The basic structure of a liquid state machine with M readouts trained using supervised learning.

in the form of liquid state machines and echo state networks, respectively [39], [42]. The overall structure of both of these approaches is quite similar to a multi-layer perceptron, except for the fact that there are recurrent connections within the reservoir/neural microcircuit, and the weights corresponding to these connections are not trained. Only the output weights that are used to extract readouts from the state of the reservoir/neural microcircuit are trained using supervised learning techniques. A major difference between the two architectures is the fact that the LSM uses spikes for computation and thus is more suitable for the algorithm discussed in this paper.


Figure 3-9 shows the basic structure of a typical LSM. The neural microcircuit is a network of spiking neurons randomly connected to each other with a desired ratio of inhibitory and excitatory synapses. The input is supplied to this network via analog or spiking synapses, and the state of the neural circuit is recorded over time. One can define this state to be an internal variable, and for a spike-based classification problem it can be chosen as the low-pass filtered spike trains generated by the individual elements of the neural microcircuit:

    X(t) = NMC(u(t))

Here the state vector X(t) is defined by the liquid filter NMC, operating on the input vector u(t). For a particular classification task, the last step is to map the state vector to the desired output via memoryless readout functions:

    y_desired(t) = F_{1:M}(X(t))

assuming M different classes in this example. We have chosen to use a feed-forward multi-layer neural network to train the readout functions. The next two chapters will discuss the different system architectures combining the coding schemes and the spike-based operators in more detail.
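The state extraction and readout mapping can be sketched as follows. A linear least-squares readout stands in here for the feed-forward network mentioned above, and the time constants are illustrative assumptions:

```python
import numpy as np

def liquid_state(spikes, dt=1e-3, tau=30e-3):
    """x(t): exponentially low-pass filtered spike trains of the microcircuit
    neurons.  `spikes` is an (n_neurons, n_steps) 0/1 array."""
    decay = np.exp(-dt / tau)
    x = np.zeros(spikes.shape[0])
    states = np.empty(spikes.shape, dtype=float)
    for t in range(spikes.shape[1]):
        x = decay * x + spikes[:, t]   # leaky integration of incoming spikes
        states[:, t] = x
    return states

def fit_linear_readout(states, targets, reg=1e-8):
    """Memoryless readout y(t) = w . x(t), fitted by ridge regression."""
    X = states.T                                   # (n_steps, n_neurons)
    A = X.T @ X + reg * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ targets)
```

Only the readout weights are fitted; the (here trivialized) microcircuit dynamics stay fixed, which is the defining property of reservoir computing shared by the LSM and the echo state network.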


[7]. This linear filter models the vocal tract resonances, whereby the filter's magnitude response defines the spectral envelope, which has distinct peaks at particular frequencies known as formants. These peaks are located at different frequencies for each vowel class in English. They also differ between utterances of the same vowel spoken by different people. However, one can normally classify a vowel with its first three formant frequencies. The proposed algorithm uses a computational technique based on spikes for the detection of formant frequencies. First, the speech is filtered through a gamma-tone equivalent rectangular bandwidth (ERB) filter bank, and the filter bank output is given as input to the Meddis Hair Cell Model (MHCM) to obtain spiking probabilities [19], [20], [21]. Multiple hair cells centered at different frequencies are used to detect formants using the synchrony between the generated spike trains for each hair cell. Once the formants are detected, Rank Order Coding (ROC) and the Liquid State Machine (LSM) are used for classification based on synchrony [37], [41], [38] and other spike coding methods. ROC is a time-based coding technique which uses the order of firing among afferent neurons to encode and decode information, whereas the LSM can make use of different coding schemes for classification.


Mean formant frequencies for common English vowels.

    Vowel                  F1 (Hz)   F2 (Hz)   F3 (Hz)
    /iy/ as in "beet"        270      2290      3010
    /ae/ as in "bat"         660      1720      2410
    /aa/ as in "hot"         730      1090      2240
    /ao/ as in "bought"      570       840      2410
    /uh/ as in "foot"        440      1020      2240

[10], [7]. For vowel recognition, the first three formant frequencies can be used as a feature set, because these parameters typically identify a unique vowel. Table 7-3 lists the typical formant frequencies for some of the most commonly used vowels in English. These vowels are also the ones used in this paper as the class types for classification. For the feature extraction from a speech sample, we use a model of the cochlea with the MHCM and look for the synchrony between the hair cells having the same center frequencies, as well as the frequency of firing and the temporal code. Synchrony coding will be explained in further detail in the next section.

[19], [20], [21]. The model consists of four main variables: permeability, neurotransmitters in the inner hair cell, neurotransmitters in the synaptic cleft, and spiking probabilities. An acoustic input signal first changes the permeability of the inner hair cell membrane. This results in a release of neurotransmitters into the synaptic cleft from the inner hair cell. The number of neurotransmitters in the synaptic cleft is then associated with the probability of generating a spike.


[20], [21].

[31]. We believe the redundancy caused by the number of nerve fibers packed densely along the basilar membrane gives rise to such synchrony, which is one of the factors explaining the robustness of the auditory system to noise. In the literature one can find many different definitions of synchrony amongst spike trains [43]. The concept of synchrony coding as it is applied in this paper's architecture is best explained with a simple example. Consider the vowel /iy/ as in "beet". For simplicity, let us assume there are 20 nerve fibers placed at a particular frequency channel. When the acoustic signal, /iy/, is given as input to the frequency channel centered at 300 Hz, the spike trains that are observed along these nerve fibers are shown in Figure 4-1. The input signal is corrupted with white noise at a very low SNR of around 5 dB and is barely audible at such a noise intensity. When the spike trains on the 10 nerve fibers are observed, it is difficult to see a trivial pattern regarding action potential timings. The next step is to look at the inter-spike time interval distribution, which is shown in Figure 4-2. As is clearly observed from the figure, these nerve fibers are phase locked to integer multiples of T = 3.26 ms. Hence, the phase locked frequency is Fpl ≈ 305 Hz.


Spike train outputs for a set of 10 hair cells, all with a central frequency of 300 Hz.

Figure 8-1 shows the spectral magnitude of the inter-spike time interval distribution compared to the log-magnitude spectral envelope of the input vowel. Vowels are predominantly defined by their first three formant frequencies, which are the peaks shown in the top plot. Vowel /iy/ in particular has its first formant frequency at F1 = 305 Hz. Figure 8-1 shows that the spectral magnitude plot has a peak at Fp = 305 Hz, indicating that the nerve fibers centered at Fc = 300 Hz were able to phase lock to the first formant frequency even for such a noisy signal. However, the noise robust feature which will carry this information forward is not the frequency they are phase locked to, but instead the degree of phase locking, which is simply the magnitude of the peak at that particular frequency. The reasoning behind this will become more apparent in Figure 4-4.
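The peak-magnitude computation can be sketched as follows: pool the inter-spike intervals of one channel's fibers and scan a frequency grid for the strongest phase alignment. This is a simplified stand-in for the DFT of the interval histogram used in the text, and the frequency grid is an assumption:

```python
import numpy as np

def degree_of_synchrony(spike_trains, f_grid):
    """For each candidate frequency f, sum the unit phasors
    exp(-2j*pi*f*isi) over all pooled inter-spike intervals; intervals that
    are integer multiples of 1/f add up coherently.
    Returns (peak magnitude, frequency of the peak)."""
    isis = np.concatenate([np.diff(np.sort(np.asarray(st))) for st in spike_trains])
    phasors = np.exp(-2j * np.pi * np.outer(f_grid, isis))
    mags = np.abs(phasors.sum(axis=1)) / len(isis)   # 1.0 = perfect locking
    i = int(np.argmax(mags))
    return mags[i], f_grid[i]
```

A fiber set phase locked to a 305 Hz formant produces intervals near integer multiples of 3.26 ms, so the peak lands near 305 Hz; its height, not its location, is the noise-robust feature the text builds on.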


Inter-spike time intervals for a set of hair cells centered at 300 Hz.

Figure 4-4 shows the spectral magnitude of the inter-spike interval distribution corresponding to two different sets of hair cells centered at Fc1 = 300 Hz and Fc2 = 200 Hz. We have already explained why the first plot has a peak at 300 Hz. Interestingly, when we look at the second plot, even though the other set of nerve fibers is centered 100 Hz off the vowel's first formant, they are still able to phase lock to that particular frequency. This is because they have a finite bandwidth associated with their reception of acoustic stimuli. However, the most crucial difference lies within the magnitudes of the peaks observed at the phase locked frequency for the two sets of nerve fibers. As the figure clearly shows, the peak magnitude for the second plot is less than for the first plot, indicating a weaker degree of synchrony or phase locking for nerve fibers centered further apart from the dominant frequency in the input stimulus. This result immediately leads to a new set of


Log-magnitude spectral envelope for /iy/ and the corresponding degree of synchrony for a set of hair cells centered at Fc = 300 Hz (computed for a noisy utterance with 5 dB SNR).

features defined as follows: "highest spectral magnitude peak value of the inter-spike time interval histogram at a non-zero frequency for each channel", which we will call the degrees of synchrony (DoS). Our experiments showed that, independent of the spike-based classifier operator, this novel feature set is extremely robust to noise, and for a particular vowel remains unaffected by the change in noise type or intensity. To determine a measure of synchrony, 50 hair cells are used for each predetermined center frequency. This number was found to be a reasonable trade-off between a low-enough computational load and a high-enough synchrony for high frequencies. Table 7-3 shows that the lowest formant frequency mean is at 270 Hz and the highest formant frequency mean is at 3010 Hz. Therefore, the hair cells in our model span the frequency range


Degree of synchrony within 2 sets of hair cells centered at Fc = 300 Hz and Fc = 250 Hz in response to a noisy vowel signal with F1 = 300 Hz.

from 200 Hz to 3100 Hz. The frequency resolution in the system is programmed as a decreasing function of frequency, resembling the resolution of biological systems [44]. As a simple computational approximation, three classification regions are created for the three respective formant frequencies. The first region spans the frequency range from 200 Hz to 800 Hz and is used to classify based upon the first formant frequency, F1. The frequency resolution in this region is 10 Hz, which means that there are 50 hair cells centered at every multiple of 10 Hz starting from 200 Hz. Similarly, the second classification region is for the second formant, F2 (and sometimes the third formant, F3), and spans the frequency range from 800 Hz to 2.4 kHz. The frequency resolution in this region is 25 Hz. The final classification region is for F3 (and sometimes F2) and spans the frequency range from 1.6 kHz to 3.1 kHz. The frequency resolution in this region is 50 Hz. There is some overlap


[45]. This saturation implies that hair cell firing rate cannot be used as a reliable feature for detecting formant locations. The phase locking of hair cell spike trains appears to be a much more plausible and noise robust feature [46]. A stronger phase locking to a formant frequency for a hair cell implies a higher energy input at the corresponding frequency. Figure 4-1 shows the spike trains generated in our model for 10 hair cells centered at Fc = 300 Hz as a response to a noisy vowel input which has F1 = 300 Hz. It is difficult to observe phase locking only by looking at the spike trains. However, when the inter-spike time intervals are observed as in Figure 4-2, a high degree of phase locking to the first formant frequency of the noisy vowel signal is obtained. The peaks in Figure 4-2 are at t1 = 3.26 ms, t2 = 6.5 ms and so on, which implies that most of the neurons are phase locked to the frequency Fpl = 1/3.26 ms ≈ 305 Hz, which is very close to F1 = 300 Hz even for a very noisy signal (5 dB SNR). The next step is to find a measure of synchrony using the phase locked neurons. There are a number of proposals in the literature for computing the degree of phase synchrony [43]. However, we have initially implemented a simple brute-force approximation which will utilize leaky integrate and fire (LIF) neurons to function as a Discrete Fourier Transform (DFT) on the inter-spike time interval distribution. The DFT will peak at the frequency for which we observe the strongest phase locking. Hence, the magnitude of the DFT at this point is used as an indicator of the degree of synchrony. This is clearly observed in Figure 4-4. The first plot shows the DFT magnitude for the same vowel input with F1 = 300 Hz, where the hair cells are centered at Fc = 300 Hz. As expected, there is a peak around 300 Hz. The second plot shows the DFT magnitude for a different set of hair cells with Fc = 200 Hz. Because the bandwidth of the hair cells includes F1, the peak at 300 Hz can still be observed. However, as the first formant frequency gets further from the center frequency of the hair cells, the peak magnitude


Degrees of synchrony for 3 regions of classification for each set of hair cells (vowel /iy/ as in "beet").

of the DFT shrinks, which indicates a lower degree of synchrony. This observation directly leads to the conclusion that formant frequencies can be determined using synchrony within a set of hair cells centered at the same frequency. In other words, the center frequency of the set of hair cells which has the highest synchrony among all the other sets of hair cells should be the best approximation to the respective formant frequency. Specifically, when there is a higher synchrony in a particular set of hair cells, the LIF neuron connected to that set will reach its threshold faster and fire a spike. The LIF neuron that first fires a spike is the one connected to the set of hair cells with a center frequency closest to the respective formant frequency. Figure 4-5 shows the degree of synchrony for different hair cell sets in the system for an utterance of /iy/ as in "beet". The synchrony map for all three classification regions




[47], [18]. However, neither is spike-based, and they usually require sophisticated training procedures using a substantial amount of training data. Since the proposed algorithm uses spikes for computation, a spike-based classifier has to be used. There are several spike-based classification techniques, which have been investigated before without any baseline classifiers for comparison [48], [25], [49]. This research employs two spike-based classifiers with different advantages. While rank order coding (ROC) is a very simple and efficient technique which requires very little training data, the liquid state machine (LSM) with supervised learning is more complex and suitable for better generalization of the proposed algorithm to different classification tasks [37], [41], [38], [49]. With the possibility of using two different spike-based operators and three spike coding schemes, there are 6 different possible architectural combinations. This chapter will discuss 4 of the more competitive combinations, under different types and intensities of noise and different SPLs, in greater detail.


[40]. There are three major advantages to this type of coding. First, the response times are much shorter than the alternatives because the information is coded in only one spike per neuron. Second, the information capacity is high, and it is invariant to changes in the input intensity and input contrast. Finally, the training is simple and fast.

As shown in Figure 5-1, before the first-stage decoder neurons there are leaky integrate and fire (LIF) neurons which determine the synchrony within each set of hair cells. As described earlier, LIF neurons will be used to replace the DFT operation performed on the inter-spike intervals of the hair cells. The LIF neuron connected to the set of hair cells which has the highest synchrony reaches its threshold of firing faster than the others and fires a spike. By using the strength of synchrony as input, we obtain a firing pattern on the LIF neurons, which can then be decoded using a rank order decoding neuron. The outputs of these decoder neurons are then given as inputs to 10 second-stage decoder neurons, one for each vowel, which are responsible for the final decision. The activation level of a decoding neuron i at time t is given by the following equation:

    Activation(i, t) = sum_{j=1}^{m} k^order(j) * w_{j,i}
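The LIF race feeding these decoders can be illustrated with a simple Euler-integrated membrane; the time constant, threshold and drive values below are illustrative assumptions. A stronger synchrony drive pushes the membrane to threshold sooner, so the first spike identifies the channel nearest the formant:

```python
def lif_first_spike_time(drive, dt=1e-4, tau=10e-3, threshold=1.0, t_max=0.2):
    """Leaky integrate-and-fire membrane: dv/dt = (-v + drive) / tau.
    Returns the first threshold-crossing time, or None if it never fires."""
    v = 0.0
    t = 0.0
    while t < t_max:
        v += dt / tau * (drive - v)   # forward-Euler leaky integration
        t += dt
        if v >= threshold:
            return t
    return None
```

Since the membrane settles toward `drive`, a channel whose synchrony strength stays below threshold never fires at all, which is exactly the winner-take-first behavior the rank order decoder then reads out.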


Figure 5-1 shows the general block diagram of the system architecture for one of the branches for one of the vowels.


Block diagram of one of the branches of the vowel classification algorithm for one of the vowels.


Figure 5-2 shows the output of the decoders for a test utterance of the second vowel /ih/. It is clear from the figure that the second-stage decoder neuron of the second vowel has the highest activation energy, as expected.

Figure 5-2. Output of the decoders for a test utterance of the second vowel /ih/ as in "bit".


The block diagram for the LSM-synchrony coding architecture.

As an example, the architecture which uses the LSM and the degree of synchrony as its feature set is shown in Figure 5-3.


[50]. During the training phase, first the state vector is obtained via low-pass filtering the spike outputs of the 300 neurons in the microcircuit with a 300 Hz cut-off frequency. The state vector is then sampled every 20 ms and associated with a class label corresponding to the input signal. This generates input-desired output pairs to be used to train a single hidden layer feed-forward neural network with the well known back-propagation algorithm. The neural network uses a tangential sigmoid output, which is quantized by the number of available input classes and is used to approximate the memoryless readout function mapping the state of the microcircuit to the desired class label. Minimum error-entropy (MEE) is used as a performance criterion for supervised training of the neural network. However, with the neural microcircuit employing 300 neurons, training the outputs of such an enormous set with the memory limitations of MEE turned out to be a difficult task.
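The pairing of filtered, periodically sampled states with class labels can be sketched as follows. The one-pole filter is a simplified stand-in for the 300 Hz low-pass; the 20 ms sampling period follows the text:

```python
import numpy as np

def readout_training_pairs(spikes, label, fs=1000.0, cutoff=300.0, period_s=0.02):
    """Low-pass filter each microcircuit neuron's spike train with a one-pole
    filter at `cutoff` Hz, sample the state vector every `period_s` seconds,
    and pair each sampled state with the utterance's class label."""
    alpha = 1.0 - np.exp(-2.0 * np.pi * cutoff / fs)   # one-pole smoothing factor
    state = np.zeros(spikes.shape[0])
    step = int(round(period_s * fs))
    X, y = [], []
    for t in range(spikes.shape[1]):
        state = state + alpha * (spikes[:, t] - state)
        if (t + 1) % step == 0:
            X.append(state.copy())
            y.append(label)
    return np.array(X), np.array(y)
```

Each utterance thus contributes several (state, label) pairs, one per 20 ms window, which is what the back-propagation-trained readout network is fitted on.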




[51-53]. The simple, analyzable structure and low computational requirements are the most important advantages of the MSE criterion. Under a Gaussianity assumption, MSE, which solely constrains the second-order statistics, is capable of extracting all possible information from the data, whose characteristics are uniquely defined by its mean and variance. In many applications, data densities take complex forms; hence, the Gaussianity assumption becomes restrictive, and many real life phenomena cannot be sufficiently described by only second order statistics. At this point, constraining the information


[54-57]. Due to its conceptual simplicity, error entropy minimization merits special attention here. Entropy was introduced by Shannon as a measure of the average information in a given probability density function [58], [59]. Entropy is an explicit function of the probability density function (pdf) itself; hence, it includes all the higher order statistical properties defined in the pdf. Consequently, as an optimality criterion, entropy is superior to MSE, as minimizing the entropy constrains all moments of the error pdf, whereas MSE constrains only the first and second moments of the pdf. In this context, minimizing the output error entropy is equivalent to minimizing the distance between the probability densities of the output and the desired signal [56]. Entropy is a function of the data pdf, and analytical data distributions are never available in real applications. Therefore, entropy must be estimated from the data samples. One common approach is to directly substitute an estimate of the pdf of the signal into the sample mean approximation for the expectation [60], and here, kernel density estimation (KDE) is the typical density estimation scheme [61], [62]. If the kernel function itself is continuously differentiable, KDE yields a continuously differentiable density estimate, which is crucial for gradient based adaptation. Another alternative is employing density estimators based on sample spacing. This is another nonparametric approach, which is based on the distance between pairs, or generally n-tuples, of the data samples. In this approach, there is no problem such as kernel selection; however, the resulting estimates are not differentiable, hence not suitable for gradient based adaptive learning. The computational bottleneck for these entropy estimators is the complexity of the density estimation method employed. Typically, KDE results in O(N^2) complexity, whereas the sample spacing methods result in O(Nm), where N is the total number of


The KDE estimate of the error density is

    p_hat(e) = (1/N) * sum_{i=1}^{N} kappa_sigma(e - e_i)    (6-1)

where kappa_sigma is the kernel function with bandwidth sigma [61], [63], [64]. Methods vary from nearest neighbor based heuristics to principled maximum likelihood based approaches. Still, the selection of the optimal bandwidth is an open ended problem.


    H_alpha(e) = (1/(1-alpha)) * log integral p^alpha(e) de    (6-2)

Substituting (6-1) into (6-2), in the plug-in estimation framework, one can obtain the KDE based entropy estimate as

    H_hat(e) = (1/(1-alpha)) * log (1/N) sum_{j=1}^{N} p_hat^(alpha-1)(e_j)    (6-3)

A sample-spacing alternative estimates the density between successive order statistics e_1 ≤ e_2 ≤ ... ≤ e_N as

    p_hat(e) = 1 / ((N-1)(e_{i+1} - e_i))  if e_i ≤ e ≤ e_{i+1},  0 otherwise    (6-4)

Note that the expected probability mass between two successive samples of the random variable here is 1/(N-1).

    p_hat(e) = 1 / ((N-m)(e_{i+m} - e_i))  if e_i ≤ e ≤ e_{i+m},  0 otherwise    (6-5)

This is called the m-spacing estimator. Similarly, the expected probability mass between successive samples is 1/(N-m) [65].
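A runnable sketch of a spacing-based Shannon entropy estimate follows. It uses the common N/m normalization from the sample-spacing literature, which differs slightly from the (N - m) factor in Eq. (6-5), and picks m on the order of the square root of N, as the chapter later suggests; both choices are assumptions of this sketch rather than the exact estimator of the text:

```python
import numpy as np

def m_spacing_entropy(samples, m=None):
    """Shannon entropy estimate from m-spacings of the order statistics:
    H_hat = mean_i log( (N/m) * (e_{i+m} - e_i) ).
    Assumes a continuous distribution (no tied samples)."""
    e = np.sort(np.asarray(samples, dtype=float))
    N = len(e)
    if m is None:
        m = max(1, int(round(np.sqrt(N))))   # m ~ sqrt(N): bias and variance both shrink
    spacings = e[m:] - e[:-m]
    return float(np.mean(np.log((N / m) * spacings)))
```

No kernel or bandwidth has to be selected, which is the appeal of spacing estimators noted above; the price is that the raw estimate is not differentiable in the samples, motivating the CDSS construction that follows.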


[65], [66]. The sample-spacing density estimate assigns the constant value

    p_hat(ẽ_i) = 1 / ((N-m)(e_{i+m} - e_i)),  where  ẽ_i = (e_{i+m} + e_i) / 2    (6-6)

in the interval [e_i, e_{i+m}], and zero otherwise. To make the estimate continuously differentiable, each such constant block is replaced by a kernel K_i(e) that:
1. is zero outside the interval [e_i, e_{i+m}];
2. satisfies the boundary conditions at e_i and e_{i+m}: the limits of the kernel function and of its derivative at these points have to be 0, i.e., lim_{e→e_i} K_i(e) = 0, lim_{e→e_{i+m}} K_i(e) = 0, lim_{e→e_i} K_i'(e) = 0,


3. integrates to 1/(N-m), the probability mass of one spacing, over [e_i, e_{i+m}];
4. is continuously differentiable.
A fourth order polynomial, with double roots at e_i and e_{i+m}, is the minimum order polynomial choice.
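This minimum-order kernel can be written in closed form as K_i(e) = c (e - e_i)^2 (e_{i+m} - e)^2: the double roots give zero value and zero derivative at both endpoints. Since the integral of x^2 (w - x)^2 over [0, w] is w^5/30, setting c = 30 * mass / w^5 makes the kernel carry the spacing's probability mass. A sketch (passing the mass in explicitly, e.g. 1/(N-m) per the conditions above):

```python
import numpy as np

def cdss_kernel(e, a, b, mass):
    """Quartic kernel with double roots at a and b: zero value and zero
    derivative at both endpoints, integrating to `mass` over [a, b],
    and identically zero outside [a, b]."""
    c = 30.0 * mass / (b - a) ** 5
    k = c * (e - a) ** 2 * (b - e) ** 2
    return np.where((e >= a) & (e <= b), k, 0.0)
```

Because the kernel is a fixed polynomial in the sample locations, the resulting density (and hence the entropy estimate) is continuously differentiable in the error samples, which is what gradient based adaptation requires.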


The polynomial kernel for i = 3 and ẽ_i = 0.

nature of the kernel and remove the unnecessary kernel evaluations, one should change the summation indices. This yields


Typical selections for m include m = sqrt(N) and m = N^(1/4) [65]. The computational complexity of the CDSS entropy estimator is O(mN), which becomes O(N^(3/2)) and O(N^(5/4)) for these choices, both of which are significantly cheaper than the O(N^2) computational complexity of the KDE entropy estimator. Moreover, for large training sets, the computational complexity can be further reduced to O(N) if a priori information about the data or empirical evaluations yield a sufficient constant value for m.

[56], [57]. Although the error entropy minimization approach is very promising, it can only be applied to small scale problems due to high computational complexity. In order to overcome this problem we presented a computationally efficient CDSS entropy estimator. In this section, we will show how this approach can be applied to supervised training of adaptive systems. Consider a training set of input and desired output pairs and an adaptive system with parameter vector w; one can design a gradient based adaptation. Gradient descent is the simplest and the most commonly used choice, whereas second order alternatives such as Newton's method are also possible. Let e_i denote the training error for the i-th input-output pair. The gradient of the error entropy with respect to the system




[67], [68].

[69]. For MEE-KDE the kernel size is empirically chosen to be sigma^2 = 0.01. As suggested in the literature, the sample spacing m for MEE-CDSS is chosen as an integer multiple of sqrt(N) [65]. Please note that such a selection also satisfies the following two conditions:
1. lim_{N→∞} m = ∞ (asymptotic unbiasedness)
2. lim_{N→∞} m/N = 0 (asymptotic consistency)
After empirical analysis, m is chosen to be 3*sqrt(N)


Computation time vs. L1 performance comparisons and trade-offs for the learning schemes: Mackey-Glass time series prediction.

    Scheme        Computation Time    L1 Performance
    MSE               0.05 units          0.064
    MEE-CDSS          0.45 units          0.048
    MEE-KDE           2.1 units           0.046
    MSE               2 units             0.052
    MEE-CDSS          1.8 units           0.038

Table 6-1 compares the performances and computational complexities of the three supervised learning schemes. For the first three rows, all TDNNs were trained with the exact same number of iterations and the same size (200 samples, as described above) of training data. As expected, MEE-KDE has the best L1 performance while requiring more computation time than the other two algorithms. On the other hand, MEE-CDSS has a very similar performance to MEE-KDE with less than only 1/4th of its computation time. The novelty of the MEE-CDSS scheme becomes more obvious when it is allowed to train using more training samples, such that its computation time is approximately similar to MEE-KDE's. The last 2 rows of Table 6-1 display the results for such a scenario, where both MSE and MEE-CDSS are allowed to use extra computation time for training to match that of MEE-KDE's. The L1 performance of MSE, while getting closer, still cannot outperform MEE-KDE; on the contrary, MEE-CDSS now has a lower error than MEE-KDE while displaying asymptotic stability. Figure 6-2 shows the test error distributions for the 3 schemes for different numbers of iterations. Even though they are all concentrated around 0 mean error, MSE has a considerably larger variance, whereas the MEE based error distributions display quite similar characteristics. More importantly, when the computation time is fixed and MEE-CDSS


The error probability densities for MSE, MEE-KDE and MEE-CDSS for the Mackey-Glass time series prediction on the test data.

is allowed to use more training samples to compensate for reduced complexity, the error variance is decreased.

[68]. This data set was used in the Santa Fe time series competition directed by Neil Gershenfeld of the Massachusetts Institute of Technology and Andreas Weigend. The internal state dynamics and the output of the system can be summarized by the following equations. The time-series is basically generated by numerically integrating the dynamic equations of motion for a weakly damped and periodically driven particle [70] given


[68]. The main reason for choosing such a system is the existence of high dimensional dynamics and the slow drift of parameters to ensure a long term change in transition probabilities. This system exhibits complex high-dimensional dynamics observed through a complex nonlinear measurement, and thus is extremely challenging. Moreover, by picking such an observable, the complexity of the problem is increased by having more internal states than externally observable ones. The parameters (number of iterations, m with 3*sqrt(N)


Table 6-2. Computation time vs. L1 performance comparisons and trade-offs for the learning schemes: motion of a damped, driven particle.

    Scheme        Computation Time    L1 Performance
    MSE               0.15 units          0.092
    MEE-CDSS          1.4 units           0.04
    MEE-KDE           7.5 units           0.036
    MSE               2 units             0.078
    MEE-CDSS          7.2 units           0.034

Table 6-2 compares the performances and computational complexities of the three supervised learning schemes for the second case study. As the table shows, the results confirm those of the first experiment. When the exact same number of iterations and the same size of training data are used, MEE-KDE has the best performance with the trade-off of a higher computation time. Also note that, with the increasing size of the training samples available, the difference between the computation times has increased as well. Once again, the performance of the MEE-CDSS scheme surpasses MEE-KDE assuming a fixed amount of computation time is allowed for both algorithms. In other words, MEE-CDSS still has the best trade-off between computational complexity and performance. Figure 6-3 shows the error distributions for the 3 training schemes. Given the same number of iterations and the same size of training data, the error variance of MEE-KDE is noticeably less than that of MSE; however, there is only a marginal difference between MEE-KDE and MEE-CDSS. On the other hand, given a fixed computation time and more training samples, the error density of MEE-CDSS achieves a more concentrated peak around zero.


The test error probability densities for MSE, MEE-KDE and MEE-CDSS for the prediction of the motion of a weakly damped and periodically driven particle.

To summarize, it has already been shown that MEE-KDE produces a larger and more concentrated peak around zero error and approximates the pdf of the desired output much better than MSE, at the cost of a substantially increased computational time during the training of the system [56]. All the results presented in this section simply demonstrate that, with the use of continuously differentiable sample spacing, MEE-CDSS can achieve a similar performance to MEE-KDE with much less computational complexity.




[16]. Training data consisted of the 10 vowels listed in Table 7-1, which have the listed mean formant frequencies. Testing data was generated by randomly varying the formant frequencies for each generated vowel. The training data was given as mean male formant frequencies for each vowel, and thus the testing data consisted of male utterances. For male speakers the variance in formant frequencies is usually smaller than 100 Hz, i.e., the first formant of /iy/ can be between 170 Hz

Mean formant frequencies for vowels used in the controlled experiment.

    Vowel                  F1 (Hz)   F2 (Hz)   F3 (Hz)
    /iy/ as in "beet"        270      2290      3010
    /ih/ as in "bit"         390      1990      2550
    /eh/ as in "bet"         530      1840      2480
    /ae/ as in "bat"         660      1720      2410
    /ah/ as in "but"         520      1190      2390
    /aa/ as in "hot"         730      1090      2240
    /aa/ as in "hot"         730      1090      2240
    /ao/ as in "bought"      570       840      2410
    /uh/ as in "foot"        440      1020      2240
    /er/ as in "bird"        490      1350      1690

Percentage of vowels correctly classified for the controlled experiment.

    Noise type     25 dB SNR   15 dB SNR   5 dB SNR
    Pink Noise       100.0%      100.0%     100.0%
    White Noise      100.0%      100.0%      71.3%


Mean formant frequencies for vowels used in the real classification test.

    Vowel            F1 (Hz)   F2 (Hz)   F3 (Hz)
    /iy/ "beet"        270      2290      3010
    /ae/ "bat"         660      1720      2410
    /aa/ "hot"         730      1090      2240
    /ao/ "bought"      570       840      2410
    /uh/ "foot"        440      1020      2240

Table 7-3 shows the 5 most commonly used vowels in the English language, which are the vowels that the systems will try to classify. The tests will be done for all 4 of the architectural combinations discussed in chapter 4, using 2 different types of noise, white and pink, and 3 different SNR levels: 25 dB, 15 dB and 5 dB. Also, in order to see the viability of the 3 spike-coding schemes at different SPLs, the LSM-rate, direct temporal and synchrony coding structures will also be tested at 2 levels of sound pressure as well: 60 dB and 10 dB.


Percentage of vowels correctly classified for ROC-synchrony coding.

    Noise type     25 dB SNR   15 dB SNR   5 dB SNR
    Pink Noise        80.6%       80.2%      79.9%
    White Noise       79.8%       78.9%      77.5%

[12]. Table 7-4 shows the performance of the architecture using the degree of synchrony and rank order coding at 60 dB SPL. As observed from the table, for both noise types, even at 5 dB SNR the performance is roughly the same as at 25 dB SNR, which shows the robustness of the degree of synchrony feature set to noise. Rate coding being unreliable at high SPL levels, the same experiment is performed at 10 dB SPL using the LSM as the spike-based operator. Table 7-5 shows the results for pink noise (white noise being very similar) for RC, DoS and DTC. At the lowest SNR value RC


Percentage of vowels correctly classified at 10 dB SPL for LSM-rate, direct temporal and synchrony coding.

    Scheme    25 dB SNR   15 dB SNR   5 dB SNR
    RC          77.9%       74.2%      63.0%
    DTC         77.8%       72.0%      59.8%
    DoS         76.2%       71.6%      58.4%

The results in Table 7-5 indicate that rate coding might be preferred over temporal coding at low SPLs. Even though all three schemes perform similarly at high SNR levels, the performances of DTC and DoS degrade faster than RC with increasing noise. There are two explanations for this phenomenon. First of all, both DTC and DoS require as many spikes generated by the real acoustic input as possible to code information effectively, which is not the case at 10 dB SPL. Moreover, at low intensity values, phase synchrony occurs more easily at low frequency channels than at high frequency channels. Hence, when noise is introduced to some of the low frequency channels where there is no real spectral content from the actual acoustic input, the randomly generated synchrony might negate the contribution of real synchrony happening at higher frequencies. This is not true for rate coding, because at 10 dB SPL the firing rates for every channel change almost linearly with changes in acoustic input intensity. Moreover, the inherent averaging over time also contributes to its higher performance under noise. The third test compares all algorithms at 60 dB SPL using the LSM as the spike-based operator again. Table 7-6 shows the results for pink noise, and DoS is superior


Table 7-6. Percentage of vowels correctly classified at 60 dB SPL for LSM-rate, direct temporal, and synchrony coding

       25 dB SNR   15 dB SNR   5 dB SNR
RC       36.2%       35.4%      35.2%
DTC      80.2%       72.5%      64.9%
DoS      93.0%       92.5%      91.0%

Table 7-7 shows the comparison of the LSM-synchrony coding architecture and the MFCC-HMM engine.


Table 7-7. Percentage of vowels correctly classified for LSM-synchrony coding and the MFCC-HMM engine

                25 dB SNR   15 dB SNR   5 dB SNR
MFCC-White N.     92.5%       84.0%      76.0%
DoS-White N.      91.0%       90.0%      87.5%
MFCC-Pink N.      94.0%       88.0%      78.5%
DoS-Pink N.       93.0%       92.5%      91.0%

The experiments are then extended in three ways: 1) increase the number of vowel classes to the 10 listed in Table 7-1; 2) introduce two different types of noise, car and street; 3) augment the phase-synchrony features with MFCC and use them with the HMM.


Table 7-8. Percentage of vowels (10 classes) correctly classified for LSM-synchrony coding and the MFCC-HMM engine

                 25 dB SNR   15 dB SNR   5 dB SNR
MFCC-Street N.     90.2%       82.5%      71.4%
DoS-Street N.      81.4%       77.3%      70.1%
MFCC-Car N.        91.9%       85.4%      73.2%
DoS-Car N.         82.0%       78.9%      72.5%

Table 7-8 shows the percentage of vowels correctly classified for the fully spike-based algorithm and the MFCC-HMM engine. At high SNR values, the MFCC-HMM algorithm significantly outperforms DoS-LSM. Even though the difference between the two algorithms becomes smaller with increasing levels of noise, MFCC-HMM always stays ahead in performance. However, the drop in performance with decreasing SNR is close to 20% for MFCC-HMM, whereas it is only around 10% for DoS-LSM. This table is unique in the sense that it still shows a degree of noise robustness for the spike-based method; however, the generalization of the liquid state machine with the increasing number of classes falls behind that of the hidden Markov model. As mentioned before, we have the following two goals for this research: 1) fully spike-based algorithms that come as close as possible to complete biological plausibility while still maintaining viable performance, and 2) partially spike-based algorithms that employ the best advantages of spike-based computation to engineer noise-robust systems rivaling state-of-the-art conventional approaches to speech recognition.


Table 7-9. Percentage of vowels (10 classes) correctly classified for the augmented features and MFCC with the HMM engine

                     25 dB SNR   15 dB SNR   5 dB SNR
MFCC-Street N.         90.2%       82.5%      69.4%
DoS-MFCC-Street N.     93.4%       92.8%      89.0%
MFCC-Car N.            91.9%       85.4%      73.2%
DoS-MFCC-Car N.        98.8%       96.7%      91.3%

Table 7-9 shows the percentage of correctly classified vowels for the augmented (or hybrid) feature set compared to MFCC. The MFCC results are taken from Table 7-8. As expected, the hybrid feature set vastly outperforms MFCC at nearly every noise level, and the performance difference is as much as 20% at the lowest SNR values. The conclusion is that the hybrid set preserves the noise robustness of the phase-synchrony feature set and combines it with the better generalization of the HMM.
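Augmenting MFCC with the phase-synchrony features amounts to concatenating the two per-frame feature vectors before HMM training. A minimal sketch, in which the function name and feature dimensions are illustrative assumptions rather than the values used in the experiments:

```python
import numpy as np

def hybrid_features(mfcc_frames, dos_frames):
    """Concatenate per-frame MFCC and degree-of-synchrony (DoS) feature
    vectors into one hybrid vector per frame. Both inputs are arrays with
    one row per analysis frame and must have the same number of frames."""
    mfcc_frames = np.atleast_2d(mfcc_frames)
    dos_frames = np.atleast_2d(dos_frames)
    if mfcc_frames.shape[0] != dos_frames.shape[0]:
        raise ValueError("frame counts must match")
    return np.hstack([mfcc_frames, dos_frames])

# e.g. 100 frames of 13 MFCCs plus 100 frames of 8 DoS values -> 100 x 21.
feats = hybrid_features(np.zeros((100, 13)), np.ones((100, 8)))
```

The HMM then models the concatenated vectors exactly as it would plain MFCC frames, which is why the hybrid set inherits the HMM's generalization while keeping the synchrony features' noise robustness.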






Figure 8-1 shows the regular phase-locking behavior of nerve fibers to the magnitude peaks observed in the spectral envelope. For a possible spike-based pitch detection algorithm, low-frequency fibers would phase-lock to the fundamental frequency instead.
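If low-frequency fibers phase-lock to the fundamental, the pitch period appears directly in their inter-spike intervals. A hedged sketch of such an estimator follows; the function `estimate_f0` and the use of the median interval are illustrative assumptions, not an algorithm from this work.

```python
import numpy as np

def estimate_f0(spike_times):
    """Estimate the fundamental frequency (Hz) from a low-frequency fiber's
    spike train, assuming the fiber phase-locks to F0 so that successive
    spikes fall roughly one pitch period apart. The median inter-spike
    interval gives some robustness to occasional missed or extra spikes."""
    intervals = np.diff(np.sort(np.asarray(spike_times, dtype=float)))
    if len(intervals) == 0:
        raise ValueError("need at least two spikes")
    return 1.0 / np.median(intervals)

# A fiber locking to a 125 Hz fundamental fires about every 8 ms.
f0 = estimate_f0(np.arange(0.0, 0.1, 0.008))
```

A full detector would pool such estimates across several low-frequency channels, mirroring how the formant features pool synchrony across higher-frequency channels.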


Figure 8-1. Log-magnitude spectral envelope for /iy/ and the corresponding degree of synchrony for a set of hair cells centered at Fc = 300 Hz (computed for a noisy utterance with 5 dB SNR). Note that the nerve fibers centered at low frequencies can similarly phase-lock to the fundamental frequency.

Currently, we are working on the sound separation of 4 classes of input (speech, music, noise, and speech in noise) using the spike output from two different types of cochlea, software and electronic (for a possible hardware implementation). Even though this work is at a very early stage, the spike-based framework is approaching the performance of state-of-the-art methods for this particular classification task. Preliminary results show that with a dataset of 4 classes and 30 samples for each class (recordings of 30 seconds) we achieve 100% correct classification on training data and 85% correct classification on testing data. As a comparison, a recently introduced state-of-the-art algorithm by Buchler has 91% correct classification on testing data [71].


The exact same principles could be applied to the spike-based architecture employing the neural micro-circuit to recognize multi-syllable words [72]. It would be quite naive to think it is an easy task to develop a framework that could replace algorithms built upon fifty years of research. However, we believe that the findings presented in this research will at least set the first stepping stones for a new way of information processing for speech.


[1] K. H. Davis, R. Biddulph, and S. Balashek, "Automatic recognition of spoken digits," J. Acoust. Soc. Am., vol. 24, no. 6, pp. 627-642, 1952.

[2] H. F. Olson and H. Belar, "Phonetic typewriter," J. Acoust. Soc. Am., vol. 28, no. 6, pp. 1072-1081, 1956.

[3] J. W. Forgie and C. D. Forgie, "Results obtained from a vowel recognition computer program," J. Acoust. Soc. Am., vol. 31, no. 11, pp. 1480-1489, 1959.

[4] J. Suzuki and K. Nakata, "Recognition of Japanese vowels - preliminary to the recognition of speech," J. Radio Res. Lab, vol. 37, no. 8, pp. 193-212, 1961.

[5] J. Sakai and S. Doshita, "The phonetic typewriter," Information Processing 1962, Proc. IFIP Congress, Munich, 1962.

[6] D. B. Fry and P. Denes, "The design and operation of the mechanical speech recognizer at University College London," J. British Inst. Radio Engr., vol. 19, no. 4, pp. 211-229, 1959.

[7] B. S. Atal and S. L. Hanauer, "Speech analysis and synthesis by linear prediction," J. Acoust. Soc. Am., vol. 50, pp. 637-655, 1971.

[8] L. R. Rabiner, S. E. Levinson, A. E. Rosenberg, and J. G. Wilpon, "Speaker-independent recognition of isolated words using clustering techniques," IEEE Trans. Acoustics, Speech and Signal Proc., vol. 27, pp. 336-349, 1979.

[9] S. E. Levinson, L. R. Rabiner, and M. M. Sondhi, "An introduction to the application of the theory of probabilistic functions of a Markov process to automatic speech recognition," Bell Syst. Tech. J., vol. 62, no. 4, pp. 1035-1074, 1983.

[10] S. B. Davis and P. Mermelstein, "Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences," IEEE Trans. Acoust., Speech, Signal Processing, vol. 28, pp. 357-366, 1980.

[11] M. D. Skowronski and J. G. Harris, "Exploiting independent filter bandwidth of human factor cepstral coefficients in automatic speech recognition," J. Acoust. Soc. Am., vol. 116, no. 3, pp. 1774-1780, 2004.

[12] I. Uysal, H. Sathyendra, and J. G. Harris, "A biologically plausible system approach for noise robust vowel recognition," IEEE Proc. of the Midwest Symposium on Circuits and Systems, vol. 1, pp. 245-249, 2006.

[13] I. Uysal, H. Sathyendra, and J. G. Harris, "A duplex theory of spike coding in the early stages of the auditory system," IEEE Proc. of the International Conference on Acoustics, Speech and Signal Processing, vol. 4, pp. 733-736, 2007.


[14] I. Uysal, H. Sathyendra, and J. G. Harris, "Spike-based feature extraction for noise robust speech recognition using phase synchrony coding," IEEE Proc. of the International Symposium on Circuits and Systems, pp. 1529-1532, 2007.

[15] S. S. Stevens, J. Volkman, and E. Newman, "A scale for the measurement of the psychological magnitude of pitch," J. Acoust. Soc. Am., vol. 8, no. 3, pp. 185-190, 1937.

[16] M. Slaney, "Auditory toolbox, version 2, technical report no. 1998-010," Interval Research Corporation, Tech. Rep., 1998.

[17] S. Young, J. Jansen, J. Odell, D. Ollasen, and P. Woodland, "The HTK book (version 2.0)," Entropics Cambridge Research Lab, Tech. Rep., 1995.

[18] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, 1989.

[19] R. Meddis, "Simulation of mechanical to neural transduction in the auditory receptor," J. Acoust. Soc. Am., vol. 79, pp. 702-711, 1986.

[20] C. J. Sumner and E. A. Lopez-Poveda, "A revised model of the inner-hair cell and auditory-nerve complex," J. Acoust. Soc. Am., vol. 111, no. 5, pp. 2178-2188, 2002.

[21] C. J. Sumner, E. A. Lopez-Poveda, L. P. O'Mard, and R. Meddis, "Adaptation in a revised inner-hair cell model," J. Acoust. Soc. Am., vol. 113, no. 2, pp. 893-901, 2003.

[22] F. Rieke, D. Warland, R. de Ruyter van Steveninck, and W. Bialek, Spikes: Exploring the Neural Code. MIT Press, Cambridge, MA, 1999.

[23] B. Nageris, J. C. Adams, and S. N. Merchant, "A human temporal bone study of changes in the basilar membrane of the apical turn in endolymphatic hydrops," Am. J. Otol., vol. 17, pp. 245-252, 1996.

[24] R. Konig, P. Heil, E. Budinger, and H. Scheich, The Auditory Cortex. Lawrence Erlbaum Associates, Publishers, 2005.

[25] J. J. Hopfield and C. D. Brody, "What is a moment? Transient synchrony as a collective mechanism for spatiotemporal integration," Proc. Natl. Acad. Sci. USA, vol. 98, no. 3, pp. 1282-1287, 2001.

[26] D. Verstraeten, B. Schrauwen, D. Stroobandt, and J. V. Campenhout, "Isolated word recognition with the liquid state machine: a case study," Information Processing Letters, vol. 95, pp. 521-528, 2005.

[27] S. Thorpe, D. Fize, and C. Marlot, "Speed of processing in the human visual system," Nature, vol. 381, pp. 520-522, 1996.
[28] M. B. Sachs, "Neural coding of complex sounds: speech," Annual Review of Physiology, vol. 46, pp. 261-273, 1984.


[29] P. Dayan and L. F. Abbott, Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems. MIT Press, Cambridge, MA, 2001.

[30] R. VanRullen, R. Guyonneau, and S. J. Thorpe, "Spike times make sense," Trends in Neurosciences, vol. 28, no. 1, pp. 1-4, 2005.

[31] D. Terman and D. Wang, "Global competition and local cooperation in a network of neural oscillators," Physica D, vol. 81, pp. 148-176, 1995.

[32] J. J. Hopfield and A. V. M. Herz, "Rapid local synchronization of action potentials: towards computation with coupled integrate-and-fire networks," Proc. Natl. Acad. Sci. USA, vol. 92, pp. 6655-6662, 1995.

[33] J. O'Keefe, "Hippocampus, theta, and spatial memory," Curr. Opin. Neurobiol., vol. 3, pp. 917-924, 1993.

[34] M. J. Tovee and E. T. Rolls, "Information encoding in short firing rate epochs by single neurons in the primate temporal visual cortex," Vis. Cognit., vol. 2, no. 1, pp. 35-58, 1995.

[35] A. Treves, E. T. Rolls, and M. Simmen, "Time for retrieval in recurrent associative memories," Physica D, vol. 107, pp. 392-400, 1997.

[36] A. K. Kreiter and W. Singer, "Oscillatory neuronal responses in the visual cortex of the awake macaque monkey," Eur. J. Neurosci., vol. 4, pp. 369-375, 1992.

[37] S. J. Thorpe and J. Gautrais, "Rank order coding," in Computational Neuroscience: Trends in Research, J. Bower, Ed. New York: Plenum Press, 1998, pp. 113-119.

[38] A. Delorme and S. J. Thorpe, "Face identification using one spike per neuron: resistance to image degradations," Neural Networks, vol. 14, pp. 795-803, 2001.

[39] W. Maass, T. Natschlager, and H. Markram, "Real-time computing without stable states: A new framework for neural computation based on perturbations," Neural Computation, vol. 14, no. 11, pp. 2531-2560, 2002.

[40] R. V. Rullen, R. Guyonneau, and S. J. Thorpe, "Spike times make sense," Trends Neurosci., vol. 28, pp. 1-4, 2005.

[41] R. V. Rullen, J. Gautrais, A. Delorme, and S. J. Thorpe, "Face processing using one spike per neurone," Biosystems, vol. 48, pp. 229-239, 1998.

[42] H. Jaeger, "The 'echo state' approach to analysing and training recurrent neural networks," German National Research Center for Information Technology, Tech. Rep. GMD Report 148, 2001.
[43] U. Moissl and U. Meyer-Base, "A comparison of different methods to assess phase-locking in auditory neurons," in International Conference of IEEE-EMBS, vol. 2, 2000, pp. 840-843.


[44] E. Zwicker, "Subdivision of the audible frequency range into critical bands," J. Acoust. Soc. Am., vol. 33, p. 248, 1961.

[45] P. Dallos, "Response characteristics of mammalian cochlear hair cells," J. Neurosci., vol. 5, no. 6, pp. 1591-1608, 1985.

[46] P. A. Cariani, "Temporal coding of the periodicity pitch in the auditory system: an overview," Neural Plast., vol. 6, no. 4, pp. 147-172, 1999.

[47] D. A. Reynolds and R. C. Rose, "Robust text-independent speaker identification using Gaussian mixture speaker models," IEEE Trans. Speech Audio Process., vol. 3, pp. 72-83, 1995.

[48] J. J. Hopfield and C. D. Brody, "What is a moment? Cortical sensory information over a brief interval," Proc. Natl. Acad. Sci. USA, vol. 97, no. 25, pp. 13919-13924, 2000.

[49] W. Maass, T. Natschlager, and H. Markram, "Computational models for generic cortical microcircuits," in Computational Neuroscience: A Comprehensive Approach, J. Feng, Ed. Chapman Hall CRC Press, 2004, pp. 575-605.

[50] H. Markram, J. Lubke, M. Frotscher, and B. Sakmann, "Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs," Science, vol. 275, pp. 213-215, 1997.

[51] B. Widrow and S. D. Stearns, Adaptive Signal Processing. New Jersey: Prentice-Hall, 1985.

[52] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd Ed. New Jersey: Prentice-Hall, 1999.

[53] A. H. Sayed, Fundamentals of Adaptive Filtering. New York: Wiley, 2003.

[54] A. Hyvarinen, J. Karhunen, and E. Oja, Independent Component Analysis. New York: Wiley, 2001.

[55] A. Cichocki and S. I. Amari, Adaptive Blind Signal and Image Processing: Learning Algorithms and Applications. New York: Wiley, 2002.

[56] D. Erdogmus and J. C. Principe, "An error-entropy minimization algorithm for supervised training of nonlinear adaptive systems," IEEE Transactions on Signal Processing, vol. 50, no. 7, pp. 1780-1786, 2002.

[57] D. Erdogmus and J. C. Principe, "Generalized information potential criterion for adaptive system training," IEEE Transactions on Neural Networks, vol. 13, no. 5, pp. 1035-1044, 2002.

[58] C. E. Shannon and W. Weaver, The Mathematical Theory of Communication. University of Illinois Press, Urbana, 1964.

[59] T. Cover and J. Thomas, Elements of Information Theory. New York: Wiley, 1991.


[60] J. Beirlant, E. J. Dudewicz, L. Gyorfi, and E. C. van der Meulen, "Nonparametric entropy estimation: An overview," International Journal of Mathematical and Statistical Sciences, vol. 6, no. 1, pp. 17-39, 1997.

[61] E. Parzen, "On estimation of a probability density function and mode," Ann. Math. Statist., vol. 33, pp. 1065-1076, 1962.

[62] L. Devroye and G. Lugosi, Combinatorial Methods in Density Estimation. New York: Springer-Verlag, 2001.

[63] B. W. Silverman, Density Estimation for Statistics and Data Analysis. London: Chapman and Hall, 1986.

[64] D. Comaniciu, "An algorithm for data-driven bandwidth selection," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 25, no. 2, pp. 281-288, 2003.

[65] E. G. Miller and J. W. Fisher, "ICA using spacings estimates of entropy," in Proceedings of the Fourth International Symposium on ICA and Blind Signal Separation, 2003, pp. 1047-1052.

[66] O. Vasicek, "A test for normality based on sample entropy," Journal of the Royal Statistical Society, Series B, vol. 38, no. 1, pp. 54-59, 1976.

[67] D. Kaplan and L. Glass, Understanding Nonlinear Dynamics. New York: Springer-Verlag, 1995.

[68] A. S. Weigend and N. A. Gershenfeld, Time Series Prediction: Forecasting the Future and Understanding the Past. Reading, MA: Addison-Wesley, 1994.

[69] D. G. Luenberger, Linear and Nonlinear Programming. Reading, MA: Addison-Wesley, 1973.

[70] S. Linkwitz and H. Grabert, "Energy diffusion of a weakly damped and periodically driven particle in an anharmonic potential well," Physical Review B (Condensed Matter), vol. 44, no. 21, pp. 11888-11900, 1991.

[71] M. Buchler, S. Allegro, S. Launer, and N. Dillier, "Sound classification in hearing aids inspired by auditory scene analysis," EURASIP Journal on Applied Signal Processing, vol. 18, pp. 2991-3002, 2005.

[72] M. D. Skowronski and J. G. Harris, "Minimum mean squared error time series classification using an echo state network prediction model," IEEE Proc. of the International Symposium on Circuits and Systems, pp. 3153-3156, 2006.


Ismail Uysal was born in Ankara, Turkey, in 1980. He received his B.S. degree in electrical engineering (summa cum laude) in 2003 from Orta Dogu Teknik Universitesi, Ankara, Turkey. He received the M.S. degree (under the guidance of Dr. John G. Harris) in electrical engineering (GPA 4.0) in 2006 from the University of Florida, Gainesville, Florida. He received the Ph.D. degree (under the guidance of Dr. John G. Harris) in electrical engineering (GPA 4.0), with emphasis in digital signal and speech processing, in 2008 from the University of Florida, Gainesville, Florida. Dr. Uysal came to the United States for his graduate studies in 2003. He has been working with Dr. John G. Harris since 2004 as a research assistant in the Computational NeuroEngineering Laboratory, UF. During his studies, he has authored or co-authored over a dozen peer-reviewed articles and is the co-inventor of the patent titled "Method and System for Bandwidth Expansion for Voice Communications." Dr. Uysal is a member of Eta Kappa Nu and Tau Beta Pi. He has received multiple awards while pursuing his Ph.D. degree, including the UF Alumni Fellowship and the International Center Award for Excellency. His current research interests include speech processing, computational neuroscience specializing in the auditory system, and machine learning.