Automatic Assessment of Disordered Speech Intelligibility

Material Information

Title:
Automatic Assessment of Disordered Speech Intelligibility
Creator:
Singh, Savyasachi
Place of Publication:
[Gainesville, Fla.]
Florida
Publisher:
University of Florida
Publication Date:
Language:
english
Physical Description:
1 online resource (77 p.)

Thesis/Dissertation Information

Degree:
Doctorate (Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Electrical and Computer Engineering
Committee Chair:
HARRIS,JOHN GREGORY
Committee Co-Chair:
PRINCIPE,JOSE C
Committee Members:
WU,DAPENG
SHRIVASTAV,RAHUL
Graduation Date:
8/9/2014

Subjects

Subjects / Keywords:
Covariance ( jstor )
Databases ( jstor )
Feature extraction ( jstor )
Linear regression ( jstor )
Mutual intelligibility ( jstor )
Personnel evaluation ( jstor )
Regression analysis ( jstor )
Signals ( jstor )
Sound pitch ( jstor )
Spoken communication ( jstor )
Electrical and Computer Engineering -- Dissertations, Academic -- UF
automatic -- disordered -- dysarthric -- intelligibility -- parkinsons -- processing -- speech
Genre:
bibliography ( marcgt )
theses ( marcgt )
government publication (state, provincial, territorial, dependent) ( marcgt )
born-digital ( sobekcm )
Electronic Thesis or Dissertation
Electrical and Computer Engineering thesis, Ph.D.

Notes

Abstract:
The aim of this study is to develop an automatic system for evaluating the speech intelligibility of patients with Parkinson's disease. The speech analysis system presented here is flexible and versatile, and can also be used for dysarthric speech arising from other disease processes. The perceptual evaluation of disordered speech by a speech pathologist is an important way of identifying and differentiating among the different types of dysarthria, as well as quantifying their severity. Perceptual rating by listeners is called subjective evaluation of speech, whereas the proposed system performs objective evaluation of speech, using the acoustic speech signal and automatic processing to predict scores that match human judgements. The proposed system is inexpensive, non-invasive, not time-consuming and highly consistent. Our system takes a speech signal as input and consists of two major steps. We have developed a feature extraction system which employs a computational auditory model for obtaining spectro-temporal internal representations. The internal representation of the patient's speech is compared to that of healthy (perfectly intelligible) speech using correlation. The next step of processing is that of scoring, which accepts feature vectors as inputs and maps them to a scalar value representing the speech intelligibility score. We solve the problem of scoring using the supervised learning technique of regression, where feature vectors are the input variables (regressors) and the quality score is the target (dependent) variable. We apply linear regression and other non-linear methods such as Gaussian process regression and support vector regression. These models are trained on a dataset with known perceptual ratings, and thereafter used for prediction. The prediction performance is evaluated using various metrics such as mean-square error and Pearson's correlation coefficient.
In this document we study the performance of our system in the intelligibility prediction experiment using a database of 160 sentences collected from 48 Parkinson's patients. The results of intelligibility prediction are encouraging; for example, the correlation between perceptual scores and computed scores is high (>0.9), showing excellent agreement. ( en )
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Bibliography:
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Thesis:
Thesis (Ph.D.)--University of Florida, 2014.
Local:
Adviser: HARRIS,JOHN GREGORY.
Local:
Co-adviser: PRINCIPE,JOSE C.
Statement of Responsibility:
by Savyasachi Singh.

Record Information

Source Institution:
UFRGP
Rights Management:
Copyright Singh, Savyasachi. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
Classification:
LD1780 2014 ( lcc )

Full Text

AUTOMATIC ASSESSMENT OF DISORDERED SPEECH INTELLIGIBILITY

By

SAVYASACHI SINGH

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2014

© 2014 Savyasachi Singh

To my family, friends and teachers

ACKNOWLEDGMENTS

I am greatly indebted to my advisor Dr. John G. Harris for giving me the opportunity to work in his lab. He has been a very understanding and a patient mentor. It is from him that I learnt to be analytical, think creatively, explore new ideas and explore simple solutions for complex problems. I am grateful to him for trusting me with the responsibility of defining the scope of my projects, courses and research work. As a good coach he was always there to critically point out the flaws in my decision making process and to encourage me whenever I did manage to put things in place in time. I would like to thank my committee members Dr. Jose C. Principe, Dr. Rahul Shrivastav and Dr. Dapeng Wu for taking the time to provide me with valuable feedback on my research. Dr. Shrivastav has provided me feedback, ideas and helped me broaden my horizons at various junctures of this research. I am also thankful to Dr. Principe for his comments and suggestions which significantly shaped this research work. Vaibhav Garg and Manu Rastogi have been very patient friends and critics throughout. I am specially thankful to Dr. Shrivastav's lab for providing me with the disordered speech database without which this research could not be conducted. I have gained immensely from the intense discussions with Jeremy Anderson and Kwansun Cho which gave rise to new ideas. Without their technical expertise, help and support I possibly would not have been able to complete this work. CNEL labmates, especially Meena Ramani, Ismail Uysal, Alexander Singh and Sohan Seth, have been a constant source of support, motivation and fun during my time at CNEL. I am indebted to my parents and my family for providing me with the best possible education and opportunities. I am thankful that they ingrained into me the value of education and self-reliance. They have been a constant source of encouragement, support and enlightenment. Finally, I would like to acknowledge the Veterans Affairs Administration, Gainesville for providing the funding to conduct this research.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 GENERAL INTRODUCTION
  1.1 Dysarthria
    1.1.1 Definition
    1.1.2 Causes
    1.1.3 Diagnosis and Treatment
  1.2 Evaluation of Disordered Speech
    1.2.1 Subjective Method
    1.2.2 Objective Method
2 BACKGROUND ON OBJECTIVE METHODS
  2.1 Speech Intelligibility Prediction
  2.2 Outline of the Proposed System
3 FEATURE EXTRACTION
  3.1 Overview of the Proposed Feature Extraction System
  3.2 Description of the Proposed Feature Extraction System
    3.2.1 Jepsen's Auditory Model
      3.2.1.1 Dual resonance nonlinear filterbank
      3.2.1.2 Hair-cell transduction
      3.2.1.3 Neural adaptation
      3.2.1.4 Modulation filterbank
    3.2.2 Time Alignment
      3.2.2.1 Mel-frequency cepstrum coefficients
      3.2.2.2 Dynamic time warping
    3.2.3 Pitch Analysis
4 SCORING
  4.1 Regression Analysis
    4.1.1 Linear Regression
    4.1.2 Gaussian Process Regression
      4.1.2.1 Weight space view
      4.1.2.2 Function space view
      4.1.2.3 Covariance functions (kernels)
      4.1.2.4 Model selection and adaptation of hyperparameters
    4.1.3 Multi-layer Perceptron Regression
    4.1.4 Radial Basis Function Network Regression
    4.1.5 Support Vector Regression
  4.2 Performance Evaluation of Regression Result
5 EXPERIMENTS AND DISCUSSION
  5.1 Speech Intelligibility Evaluation Experiment
    5.1.1 Database
    5.1.2 Feature Extraction
    5.1.3 Regression Analysis
    5.1.4 Automatic Speech Recognition Based Speech Intelligibility Prediction Experiment
  5.2 Discussion
  5.3 Future Work
  5.4 Conclusion

REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

3-1 Values of the DRNL filterbank parameters.
4-1 Some commonly used covariance functions.
4-2 Some commonly used radial basis functions.
5-1 List of sentences from the disordered speech database of Parkinson's patients.
5-2 Prediction performance metrics for the proposed system.
5-3 Prediction performance metrics for the DTW based system.
5-4 Prediction performance metrics for the proposed system without pitch analysis.
5-5 Prediction performance metrics for the proposed system without modulation filterbank.

LIST OF FIGURES

2-1 Schema of a typical automatic speech evaluation system.
2-2 Schema of the proposed disordered speech intelligibility evaluation system.
3-1 Schema of the proposed feature extraction system.
3-2 Schema of the auditory model.
3-3 Schema of the DRNL filterbank.
3-4 Neural adaptation circuit of the auditory perception model.
3-5 Modulation filterbank.
3-6 Schema of the time alignment system.
3-7 DFT weighting functions for mel-frequency cepstrum computation.
3-8 Time alignment of two utterances using DTW.
3-9 Pitch analysis using the SWIPE′ algorithm.
5-1 Histogram of speech intelligibility scores for Parkinson's speech database.
5-2 Performance curve for automatic adaptation of hyperparameters for Gaussian process regression model.
5-3 Scatter plot of actual and predicted intelligibility scores for linear regression.
5-4 Scatter plot of actual and predicted intelligibility scores for GP regression.
5-5 Scatter plot of actual and predicted intelligibility scores for ε-SV regression.

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

AUTOMATIC ASSESSMENT OF DISORDERED SPEECH INTELLIGIBILITY

By

Savyasachi Singh

August 2014

Chair: John G. Harris
Major: Electrical and Computer Engineering

This dissertation describes an automatic system for evaluating the speech intelligibility of patients with Parkinson's disease. The speech analysis system presented here is flexible and versatile, and can also be used for dysarthric speech arising from other disease processes. The perceptual evaluation of disordered speech by a speech pathologist is an important way of identifying and differentiating among the different types of dysarthria, as well as quantifying their severity. Perceptual rating by listeners is called subjective evaluation of speech, whereas the proposed system performs objective evaluation of speech, using the acoustic speech signal and automatic processing to predict scores that match human judgements. The proposed system is inexpensive, non-invasive, not time-consuming and highly consistent. Our system takes a speech signal as input and consists of two major steps. We have developed a feature extraction system which employs a computational auditory model for obtaining spectro-temporal internal representations. The internal representation of the patient's speech is compared to that of healthy (perfectly intelligible) speech using correlation. The next step of processing is that of scoring, which accepts feature vectors as inputs and maps them to a scalar value representing the speech intelligibility score. We solve the problem of scoring using the supervised learning technique of regression, where feature vectors are the input variables (regressors) and the quality score is the target (dependent) variable. We apply linear regression and other non-linear methods such as Gaussian process regression and support vector regression. These models are trained on a dataset with known perceptual ratings, and thereafter used for prediction. The prediction performance is evaluated using various metrics such as mean-square error and Pearson's correlation coefficient.

In this document we study the performance of our system in the intelligibility prediction experiment using a database of 160 sentences collected from 48 Parkinson's patients. The results of intelligibility prediction are encouraging; for example, the correlation between perceptual scores and computed scores is high (>0.9), showing excellent agreement.
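As a minimal illustration of the two evaluation metrics named above (mean-square error and Pearson's correlation coefficient), a sketch such as the following could be used; the function name and interface are illustrative, not the dissertation's code:

```python
import numpy as np

def evaluate_predictions(y_true, y_pred):
    """Return (mean-square error, Pearson's r) for predicted scores."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = np.mean((y_true - y_pred) ** 2)
    # Pearson's r: covariance of the two score series divided by the
    # product of their standard deviations.
    r = np.corrcoef(y_true, y_pred)[0, 1]
    return mse, r
```

An r above 0.9 between perceptual and predicted scores, as reported here, would indicate near-linear agreement even if the two scales are offset.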

CHAPTER 1
GENERAL INTRODUCTION

Oral language is a code with structural properties characterized by a set of rules for producing and comprehending spoken utterances. Human communication makes use not only of language, but of other vocal sounds such as screams and whines. Speech is our most common mode of using oral language. Normal speech is a remarkably useful, powerful and convenient form of communication. Any abnormality in the speech production or communication mechanism is called a speech disorder. Speech disorders can be of different types such as dysphasia, phonological disorders, voice disorders, dysarthria and disorders in speech flow (stuttering).

The aim of this study is to develop an automatic analysis system for predicting speech intelligibility for patients with Parkinson's disease. The speech analysis system presented here is flexible and versatile, and can be used for different types of dysarthric speech. Hence the term "disordered speech" is used in this document, which reflects the wide applicability of the methods proposed in this study.

1.1 Dysarthria

1.1.1 Definition

A heterogeneous group of speech disorders exists as the result of various disturbances in voluntary control over the speech musculature. These disturbances are consequences of different forms of damage or maldevelopment of the nervous (central or peripheral) or muscle systems and are called dysarthria. The patient is simply unable to speak with normal muscular speed, strength, precision or timing. We can hear the effects of these muscular aberrations: problems of phonation because of interference with the laryngeal and respiratory musculature, of resonation because of interference with the muscles of the soft palate and pharynx, and of articulation because of interference with the lip, tongue and jaw musculature. Prosody, the melody, stress and rhythm patterns of contextual speech, is disordered because of the cumulative problems of these different motor-speech systems.

In some cases, dysarthria is mild while in others, muscular weakness and incoordination are so severe that speech is unintelligible and practically useless. The term anarthria is often used to denote nearly unintelligible dysarthric speech or the complete absence of speech for reasons related to dysarthria. Dysarthria can occur at any age. Dysarthria is one of the most difficult to remediate of all the disorders, and it is a disorder that will become more widespread as the average life expectancy eases upward and the neurological diseases of the aged become more common.

1.1.2 Causes

Dysarthria can be caused by various conditions, including:

1. Cerebral palsy
2. Head trauma
3. Huntington's disease
4. Multiple sclerosis
5. Neurological disorders affecting the central nervous system
6. Neuromuscular disease
7. Parkinson's disease
8. Stroke

1.1.3 Diagnosis and Treatment

Upon listening to the speech of individuals with severe dysarthria, it is evident that their speech has limited effectiveness as a means of communication. Dysarthria has serious social, psychological and economical consequences for the patients. Speech clinicians and speech researchers are striving to develop better methods with which to help dysarthric patients acquire more intelligible speech.

There is a cause and effect relationship between the location of damage within the nervous system and the type of dysarthria that results. Each of the different anatomical and functional components of the nervous system serves a specific purpose in the speech production mechanism. Should a particular motor component of the nervous system be damaged, its special function will be impaired. That impairment will reveal itself by means of a specific kind of movement abnormality which will produce similarly specific dysarthric speech changes that can be identified by the skilled clinician. Many neurological diseases reveal themselves first, and often only, by way of peripheral speech mechanism malfunction. Hence aberrant motor speech behavior can be like a fingerprint of the presence and location of tissue damage in the nervous system.

The speech and language pathologist's (SLP) auditory perception of abnormal speech is extremely important for identifying and differentiating among the different types of dysarthria and for planning a course for rehabilitation. There are two common ways of listening to speech in the evaluation. One is to listen for specific elements within prescribed speech tasks in an attempt to isolate which structures or subsets of structures are impaired. For example, a sustained phonation (as steady and clear as possible) of a vowel can help detect abnormalities of pitch, loudness, duration and steadiness.

Another common way of listening to dysarthric speech requires a connected discourse speech sample. The SLP attends to the tonality or overall voice quality of the patient's speech and attempts to make judgements about the function of speech output parameters and their different interactions. The respiratory system might be suspected of dysfunction via a patient's low voice loudness, mono-loudness or uncontrollable alteration of loudness. The laryngeal mechanism might be suspected of control problems if a patient exhibits an excessively low or high pitch voice, or voice tremor, or a voice that is perceived as monopitched. Laryngeal control problems also manifest themselves as voice quality disorders, such as "hoarseness", "breathiness" and "roughness". Motor disorders of articulatory musculature are revealed by a generalized lack in precision in the production of consonants, shortening of sound durations, abnormally slow or rapid speech rates, alteration in rate from slow to fast or intermittent breakdowns in the accuracy of articulation. Hence, the SLP's ear is an important clinical device. Other methods of diagnosing dysarthria include inspection of the peripheral speech mechanism and other physiological and acoustical analysis of speech.

Treatment for dysarthria varies from case to case. Methods include medical-surgical intervention, neurological facilitation approaches and learning-relearning therapies. The outcome of these treatments on the patient's speech is analyzed by the SLP, which helps in tracking the progress of treatment.

1.2 Evaluation of Disordered Speech

1.2.1 Subjective Method

As described in Section 1.1.3, the auditory perception of disordered speech by the SLP is the primary method of diagnosis. The SLP listens to the patient's speech and attaches perceptual labels to it, such as "hoarse", "rough", "breathy", "fast" and "slow". In addition, speech pathologists often assign scores to these labels for quantifying their severity or magnitude. These scores can be on a continuous scale or a discrete scale. A composite score can also be assigned to a speech sample quantifying the overall speech quality or degree of disorder. This type of speech evaluation is a subjective method. One of the drawbacks of this approach is poor inter- and intra-judge reliability. The SLP will often score the same speech sample differently at different trials over time, which raises issues about consistency. Sometimes the perceptual labels are very complex and the judges fail to reach a consensus (statistically) due to the differences in perception. Controlling the variability of subjective judgements requires elaborate experimental methodologies, which makes such tests impractical for routine clinical use. The outcome of a subjective evaluation is heavily dependent on the level of training and experience of the clinician. Many of the methods employed in practice are non-standardized. Despite these shortcomings, the subjective evaluation method is the most popular and trusted approach for diagnosis, primarily because humans outperform computers in this speech pattern analysis task. Humans are equipped with a highly specialized and sophisticated auditory processing system coupled with the brain, which processes the responses to speech/audio stimuli and creates the auditory perception. Hence, the perceptual evaluation/scoring by the SLP serves as a gold standard upon which other methods are evaluated. Different kinds of rating scales and various rating systems, such as Grade, Roughness, Breathiness, Asthenia, Strain (GRBAS) or the consensus auditory-perceptual evaluation of voice (CAPE-V), have been proposed for perceptual assessment of voice quality.

Another kind of subjective evaluation assesses the intelligibility of speech. In phonetics, intelligibility is a measure of how comprehensible speech is, or the degree to which speech can be understood. Intelligibility is affected by factors such as spoken clarity, comprehensibility, and precision. For example, speech from a patient suffering from mild dysarthria will be of lower quality but could still be highly intelligible. Conversational speech is the most socially valid context for evaluating speech intelligibility. An underlying goal of all human speech communication is to present a message that is intelligible to another human being. Achieving that goal is not simple, because a large number of variables are involved. First, the speaker and listener must be using the same language. Second, the speaker must produce all of the units of that language with sufficient amplitude and with appropriate prosodic variation. Third, the listener must have sufficient hearing to decipher the spoken language. Fourth, the ambient noise level must be low enough to prevent interference with the signal. Similarly, making oneself understood in conversational speech is not simple, because understanding what is being said is not a single event but rather a series of events. Speaking rates and intensities of output vary considerably from utterance to utterance, as does accuracy of production, and the length and complexity of the utterances. In addition, temporary distractions (cognitive or visual) place an added burden on the listener. Conversational speech includes elements that actually assist the listener, including the linguistic context of the message (creating varying degrees of predictability within each utterance), nonverbal signals (gestures, postures and facial expressions), and the conversational milieu (i.e., topic being discussed, preceding utterances, etc.). The interest level of the listener plays a role as well.

1.2.2 Objective Method

One of the aims of automatic evaluation of speech is to mimic human perception with high reliability. The measurements from the patient's speech are processed by a computer to provide scores. This is the objective speech evaluation method. Objective evaluation of speech is inexpensive, non-invasive, not time-consuming, highly consistent and fully automatic. Further, acoustic data can be easily shared among clinics and researchers. Hence, automatic evaluation methods are becoming an indispensable tool for speech pathologists, who can devote more time to the speech rehabilitation of the patient. Objective evaluation methods largely employ auditory signal processing and pattern analysis algorithms. Some of the evaluation methods are based on signal processing models that are inspired by biological mechanisms. Since these methods are based on clearly defined algorithms and well-defined principles, they can be easily standardized. An objective evaluation system can be implemented as software with a simple interface for use by the SLP. Currently, clinical centers and hospitals do not have access to automatic methods for evaluation of disordered speech.

Most often, objective scores are determined solely on the basis of acoustical signals, because of the ease and non-invasive nature of the speech data acquisition process. The patient's speech is recorded by a clinician and stored digitally for further processing. Hence, the acoustical signal based scoring approach has attracted a lot of research recently.

In this study, an acoustical signal based speech intelligibility evaluation system is proposed for disordered voices. Further organization of this report is as follows. Chapter 2 briefly discusses some of the current approaches and introduces the proposed system. Chapter 3 describes the feature extraction system used in this study. In Chapter 4 we discuss the scoring techniques which map the feature measurements to the speech intelligibility score. We conclude with Chapter 5, with the description of experiments conducted in this study, discussion of their results and plans for future work.

CHAPTER 2
BACKGROUND ON OBJECTIVE METHODS

As discussed in Section 1.2, the field of automatic disordered speech evaluation is an active research area with a wide variety of approaches being proposed. Figure 2-1 shows the schema of a typical objective speech evaluation system. The speech signal is processed to extract relevant features, which are further processed by a scoring algorithm to provide a score (or multiple scores). Objective evaluation of speech intelligibility has been widely researched in the communication systems area, including telecommunications and speech enhancement applications. No published studies exist which tackle the problem of predicting speech intelligibility by computational methods for dysarthric patients, which is the goal of the present study. Section 2.1 reviews some of the approaches for speech intelligibility prediction based on acoustic measurements.

Figure 2-1. Schema of a typical automatic speech evaluation system. (Speech → Feature Extraction → Scoring Algorithm → Score)

2.1 Speech Intelligibility Prediction

Many research efforts have focussed on the problem of predicting the intelligibility of speech degraded by noise, transmission channel distortions and speech-codec artifacts. In the development process of noise-reduction algorithms, an objective machine-driven intelligibility measure which shows high correlation with speech intelligibility is of great interest. Besides reducing time and costs compared to real listening experiments, an objective intelligibility measure could also help provide answers on how to improve the intelligibility of noisy unprocessed speech.

French and Steinberg [1] presented one of the earliest systems to predict the intelligibility of speech sent through a transmission system, called the Articulation Index (AI). AI is based on the idea that the response of a speech communication system can be divided into twenty frequency bands, each of which carries an independent contribution to the intelligibility of the system, and that the total contribution of all the bands is the sum of the contributions of the individual bands. (AI may also be measured using one-third octave or octave bands.) Signal-to-noise ratios are computed for each individual band, then weighted and combined to yield an intelligibility score. The AI varies in value from 0 (completely unintelligible) to 1 (perfect intelligibility). An AI of 0.3 or below is considered unsatisfactory, 0.3 to 0.5 satisfactory, 0.5 to 0.7 good, and greater than 0.7 very good to excellent. This approach evolved into the speech intelligibility index (SII) and was standardized as S3.5-1997 [2]. Since AI is mainly meant for simple linear degradations, e.g., additive noise, Steeneken and Houtgast [3] proposed the speech transmission index (STI), which is also able to predict the intelligibility of speech degraded by reverberations, nonlinear distortions, peak clipping and center clipping. For this objective measure, a noise signal with the long-term average spectrum of speech is amplitude modulated at several modulation frequencies with a cosine function and applied to the communication channel. The eventual outcome of the STI is then based on the effect on the modulation depth within several frequency bands at the output of the communication channel. Rhebergen et al. [4] presented the extended SII (ESII) for the prediction of the speech reception threshold in fluctuating noise. However, the ESII requires access to the target speech and the interfering noise separately and cannot be used in cases where the mixture is degraded or enhanced by some type of signal processing algorithm. Similarly, Kates and Arehart [5] proposed the coherence SII (CSII) as a means for estimating speech intelligibility under conditions of additive stationary noise or bandwidth reduction. Goldsworthy and Greenberg [6] provide an overview of objective measures based on AI. Although the afore-mentioned objective intelligibility measures are suitable for several types of degradation (e.g., additive noise, reverberation, filtering and clipping), it turns out that they are less appropriate for
methods where noisy speech is processed by some type of time-frequency (TF) varying gain function. This includes single-channel noise-reduction algorithms [7], but also speech separation techniques such as ideal time-frequency segregation, where typically a binary TF-weighting is used. For example, STI and various STI-based measures predict an intelligibility improvement when spectral subtraction is applied. This is not in line with the results of listening experiments in the literature, where it is reported that single-channel noise-reduction algorithms generally are not able to improve the intelligibility of noisy speech. Furthermore, measures like the coherence CSII and a normalized covariance-based STI procedure (CSTI) [6] both show low correlation with the intelligibility of ITFS-processed speech. The SII, STI and variants of these all include properties of auditory frequency selectivity in the calculations. Some of the models include ad-hoc corrections to account, to some extent, for upward spread of masking. Still, the models operate on the physical signals and do not consider various aspects and principles of auditory signal processing. Also, they do not take into account the portions of the speech signal that are temporally masked or emphasized by the processing in the auditory system.

Recently, Ma et al. [8] showed that several intelligibility measures could benefit from the use of new (signal-dependent) band-importance functions (BIF). For example, the correlation of CSII and CSTI with the speech intelligibility of single-channel noise-reduced speech increased significantly by the use of these new BIFs. In Holube and Kollmeier [9], consonant-vowel-consonant words were presented after a carrier phrase. The target word was chosen among five alternatives differing only in one of the phonemes. Recognition scores were simulated by processing the test word and the five alternatives with the auditory model of Dau et al. [10]. The alternative with the smallest distance to the test word at the level of the internal representation of the stimuli was considered as the recognized word. This means that the model of Holube and Kollmeier [9] can only be used in an experimental setup where there are reference words available for the identification of each test word. Santos et al. [11] studied several intelligibility prediction methods for cochlear implant users in complex listening environments. Boldt and Ellis [12] presented a correlation based speech intelligibility prediction system for nonlinear speech enhancement and separation. Taal et al. [7] proposed a short-time objective intelligibility measure (STOI). STOI is based on correlation between the temporal envelopes of the clean and degraded speech, in short-time (384 ms), overlapping segments.

Several approaches have been proposed which rely on computational auditory models. Christiansen et al. [13] presented an intelligibility evaluation system for noisy speech using the model proposed by Dau et al. [14]. In this study they combined a psychoacoustically validated model of auditory preprocessing [14] with a simple central stage that describes the similarity of the test signal with the corresponding reference signal at the level of the internal representation of the signals. The proposed model successfully captures the trends in the speech-in-noise training data, and provides a better prediction of the binary mask test data, particularly when the binary masks degenerate to a noise vocoder. Similarly, Jorgensen et al. [15] also presented an approach which utilizes auditory modeling, called the speech-based envelope power spectrum model (sEPSM). sEPSM estimates the envelope power signal-to-noise ratio after modulation-frequency selective processing. The multi-resolution sEPSM accounts well for changes of speech intelligibility for normal-hearing listeners in conditions with additive stationary noise, reverberation, and nonlinear processing with spectral subtraction. In the latter condition, the standardized speech transmission index fails. The multi-resolution sEPSM is demonstrated to account for intelligibility obtained in conditions with stationary and fluctuating interferers, and noisy speech distorted by reverberation or spectral subtraction. The results support the hypothesis that the envelope power signal-to-noise ratio is a powerful objective metric for speech intelligibility prediction.
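The envelope-correlation idea behind measures such as STOI can be illustrated with a deliberately simplified sketch. This is not the actual STOI algorithm (it omits the 1/3-octave band analysis, the 384 ms segmentation, normalization and clipping); the frame length and RMS envelope are placeholder assumptions:

```python
import numpy as np

def envelope(x, frame_len=256):
    """Crude temporal envelope: RMS energy of consecutive frames."""
    n_frames = len(x) // frame_len
    frames = x[:n_frames * frame_len].reshape(n_frames, frame_len)
    return np.sqrt(np.mean(frames ** 2, axis=1))

def envelope_correlation(clean, degraded, frame_len=256):
    """Correlate the temporal envelopes of clean and degraded speech.
    Values near 1 suggest the degradation preserves the clean envelope,
    which STOI-like measures associate with preserved intelligibility."""
    e1 = envelope(np.asarray(clean, float), frame_len)
    e2 = envelope(np.asarray(degraded, float), frame_len)
    n = min(len(e1), len(e2))
    return np.corrcoef(e1[:n], e2[:n])[0, 1]
```

A signal compared against itself yields a correlation of 1, while heavy distortion of the modulation pattern drives the value down.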

2.2 Outline of the Proposed System

Figure 2-2. Schema of the proposed disordered speech intelligibility evaluation system. (Healthy & Patient Speech → Time Alignment → Feature Extraction → Scoring Algorithm → Intelligibility Score)

The purpose of this study is to design an automatic intelligibility evaluation system for dysarthric speech on a quantitative scale. Figure 2-2 shows the outline of the proposed system. The input to the system is an acoustic speech signal which can be a word, sentence or a paragraph. The focus of our system is to evaluate the continuous speech sample, because vowels may not ideally reflect the vocal characteristics of the subject. Several studies have discussed the merit of using continuous speech for the assessment of dysarthria, as it provides a more reliable and valid assessment of the patient's control over vocal parameters, and may correlate better with subjective evaluations.

The input speech signal is then time-aligned to a "healthy" speech utterance using dynamic time warping (DTW). The feature extraction system employs a computational auditory model and pitch analysis. Finally, regression analysis is used for modeling the SLP-assigned scores using extracted features as regressors on a training dataset. A trained regression model is then used to predict scores for new utterances. Subsequent chapters describe each of these steps in detail.
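The time-alignment step uses DTW; a generic textbook sketch of the algorithm is shown below. This is not the dissertation's MFCC-based implementation (detailed in Chapter 3): the feature sequences and the Euclidean local cost here are placeholder assumptions:

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping between two feature sequences.
    a, b: sequences of feature vectors (or scalars). Returns the
    accumulated alignment cost; backtracking the cost matrix D would
    give the warp path used to time-align the two utterances."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            # Allowed steps: diagonal match, insertion, deletion.
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]
```

Identical sequences have zero cost, and a sequence with a repeated frame (a locally slower utterance) still aligns at zero cost, which is exactly why DTW suits comparing utterances spoken at different rates.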


CHAPTER 3
FEATURE EXTRACTION

The first step of processing in the speech evaluation system is that of feature measurement (Figure 2-1), which provides an appropriate representation of the time-varying speech. Feature extraction techniques are based on signal processing algorithms which convert the acoustic speech waveform into some type of parametric representation, generally at a considerably lower information rate. A wide variety of possibilities exist for the parametric representation of the speech signal, including short-time energy, zero crossing rate and linear prediction coefficients. Some features are based on bio-inspired algorithms which model the human auditory system to some level of detail. Models of auditory processing may be roughly classified into biophysical, physiological, mathematical and perceptual models, depending on which aspects of processing are considered. Features can be taxonomized broadly into three categories, viz., temporal, spectral and cepstral, based on the domain of representation.

Time domain processing of speech involves direct operations on the acoustic signal (or a filtered version of it). These methods generally perform short-time analysis of the speech signal under the assumption that the properties of the speech signal change relatively slowly with time compared to the detailed sample-to-sample variations of the waveform. Short segments of speech are isolated and processed as if they were short segments from a sustained sound with fixed (non-time-varying) properties. This segmentation process is also called framing, in which the waveform is partitioned into equal sized segments (called analysis frames) with or without overlap between successive frames. The resulting frames are often multiplied by a window function, e.g., Hamming, Hanning or Blackman. The crucial issue in short-time processing is the choice of segment duration or frame length: shorter segments provide better time resolution at the cost of a higher degree of uncertainty in the measurement of the targeted speech parameter, and vice versa.


Let x[n] be a speech signal of length N, and w[n] be a data window of length L. Then the framing process with an overlap of O samples (where O ∈ [0, L)) will yield F = ⌊(N − O)/(L − O)⌋ frames, where the ith frame (x̂_i for i = 1, 2, ..., F) is given by

    \hat{x}_i[n] = x[(i-1)(L-O) + n]\, w[n], \quad n = 0, 1, \ldots, L-1    (3-1)

Frequency domain processing involves operations on a Fourier representation of the speech signal. This is particularly useful in the context of the source-filter theory of speech production, because the Fourier transform of a speech waveform reflects the properties of the excitation, the vocal tract and the radiation frequency responses. The short-time analysis technique (framing) in conjunction with Fourier transformation provides a powerful and flexible representation of speech signals called the short-time Fourier transform (STFT).

Cepstral domain methods are based on homomorphic signal processing, where the central idea is the separation or deconvolution of a speech segment into a component representing the vocal tract response and a component representing the excitation source. The cepstrum of a signal is defined as the inverse Fourier transform of the logarithm of the Fourier transform of the signal.

3.1 Overview of the Proposed Feature Extraction System

The feature extraction system proposed in this study employs a computational model of human auditory signal processing proposed by Jepsen et al. [16]. The output of the model is subjected to multi-resolution processing based on the short-time analysis of signals. Figure 3-1 shows the outline of the feature extraction system. The system takes healthy speech and disordered speech utterances of the same word or sentence as inputs and provides a feature vector as an output. The feature vector is then used by the scoring algorithm to predict the intelligibility score. Several studies ([9, 13, 15]) have similarly employed a psychoacoustically motivated perception model for predicting speech intelligibility.
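As an illustration, the framing operation described above can be sketched in a few lines of NumPy (a minimal sketch; the frame count follows the expression F = ⌊(N − O)/(L − O)⌋, and the window choice here is arbitrary):

```python
import numpy as np

def frame_signal(x, L, O, window=None):
    """Partition x into F = (N - O) // (L - O) frames of length L with
    O samples of overlap, each multiplied by a window function."""
    hop = L - O
    F = (len(x) - O) // hop
    if window is None:
        window = np.hamming(L)          # any taper (Hamming, Hanning, ...) works
    return np.stack([x[i * hop : i * hop + L] * window for i in range(F)])

x = np.arange(100, dtype=float)         # toy signal, N = 100
frames = frame_signal(x, L=20, O=10)
print(frames.shape)                     # (9, 20)
```

With N = 100, L = 20 and O = 10 this yields (100 − 10)//(20 − 10) = 9 frames, matching the expression above.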


Figure 3-1. Schema of the proposed feature extraction system.

The sequence of processing steps is as follows. First the healthy speech and disordered speech waveforms are time aligned using the dynamic time warping (DTW) algorithm and the resulting warping path information is stored. Next both input speech signals are processed by Jepsen's model [16]. Figure 3-2 shows the outline of Jepsen's model. The auditory model transforms the input speech signal into a time-varying spectro-temporal internal representation (IR). The model uses two filterbanks with channels spread across the acoustic frequency and modulation frequency continuum. Hence, the resulting IR is four dimensional with the following axes: time, acoustic frequency, modulation frequency and model output. The detailed specifications of the model are presented in Section 3.2.1. The output of the auditory model is processed using multi-resolution analysis with modulation-filter-dependent frame duration, as suggested in [13]. The IR is segmented into frames using a rectangular window with no overlap. The duration of the window is set equal to the inverse of the modulation channel center frequency. This results in a sequence of IR frames for every acoustic channel and modulation channel. Next, the time alignment information is used to match corresponding IR frames from the healthy and patient speech, and the correlation is computed between them.
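The per-frame correlation step can be sketched as below; the Pearson form makes the comparison invariant to overall gain and offset differences between the healthy and patient representations. This is a minimal sketch with 1-D toy frames, not the full four-dimensional IR:

```python
import numpy as np

def frame_correlation(f_ref, f_test):
    """Pearson correlation between two time-aligned IR frames (1-D arrays)."""
    f_ref = f_ref - f_ref.mean()
    f_test = f_test - f_test.mean()
    denom = np.linalg.norm(f_ref) * np.linalg.norm(f_test)
    return float(f_ref @ f_test / denom) if denom > 0 else 0.0

a = np.array([0.1, 0.4, 0.9, 0.3])
print(frame_correlation(a, 2.0 * a + 1.0))   # ~1.0: invariant to gain and offset
```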


Now the healthy speech waveform is subjected to pitch analysis using the SWIPE′ algorithm by Camacho and Harris [17]. In addition to pitch estimates, SWIPE′ also provides a pitch strength estimate, which is defined as the saliency of pitch. The frames are then grouped into two categories of low and high pitch strength. The correlation values for every acoustic channel are averaged over all modulation channels for both pitch strength groups. This results in a feature vector whose size is twice the number of acoustic channels used in Jepsen's model.

One approach to estimating the objective intelligibility of degraded speech is to make some type of correlation-based comparison between the spectro-temporal internal representations of the clean and degraded speech signals. For example, CSTI determines a correlation coefficient between octave-band temporal envelopes, and CSII is based on the coherence function, which is a measure of correlation between complex Fourier coefficients, over time, as a function of frequency. Another example of a correlation-based measure is the normalized subband envelope correlation proposed by Boldt and Ellis [12]. In addition to speech corrupted by background noise, a correlation-based comparison can also be used for other (nonlinear) types of distortion, e.g., noise-reduced speech. The feature extraction system proposed in this study is motivated by this correlation based approach.

3.2 Description of the Proposed Feature Extraction System

3.2.1 Jepsen's Auditory Model

Jepsen's auditory model, shown in Figure 3-2, forms the core of the feature extraction system. It is a functional model which tries to simulate the input-output behavior of the peripheral auditory system. The model successfully explains perceptual masking phenomena, the effects of intensity discrimination and spectral and temporal masking [16]. The input to the model is the acoustic speech signal, where an amplitude of 1 corresponds to a maximum sound pressure level (SPL) of 100 dB. The signal is filtered with a bandpass filter which models the outer-ear and middle-ear


Figure 3-2. Schema of the auditory model by Jepsen et al. [16].

transfer functions. The outer-ear filter is a headphone-to-eardrum transfer function. The middle-ear filter simulates the mechanical impedance change from the outer ear to the middle ear. These transfer functions are realized by two linear-phase finite impulse response (FIR) filters. The combined function has a symmetric bandpass characteristic with a maximum at about 800 Hz and slopes of 20 dB/decade. The output of this stage is assumed to represent the peak velocity of vibration at the stapes as a function of frequency.

3.2.1.1 Dual resonance nonlinear filterbank

Figure 3-3. Schema of the DRNL filterbank.

The output of the outer-middle-ear filtering is processed by a dual resonance nonlinear (DRNL) filterbank proposed by Lopez-Poveda and Meddis [18]. Stapes motion transmits energy to the intracochlear fluid, which in turn induces oscillations in the basilar membrane (BM). This process is modeled by the DRNL filterbank. Figure


3-3 shows the structure of the filterbank. DRNL filters also account for the non-linear behavior of the BM, which enables them to perform better than gammatone filters. The parameters of the DRNL units vary with respect to position along the cochlear partition but are fixed with respect to the intensity of the stimulus. A single DRNL unit includes two parallel bandpass processing paths, a linear one and a compressive nonlinear one; the final output is the sum of the outputs of the two paths. The linear path consists of a linear gain function, a gammatone bandpass filter, and a lowpass filter. The nonlinear path consists of a gammatone filter, a compressive function which applies an instantaneous broken-stick nonlinearity, another gammatone filter, and, finally, a lowpass filter. The output of the linear path dominates the sum at high signal levels (above 70-80 dB SPL). The nonlinear path behaves linearly at low signal levels (below 30-40 dB SPL) and is compressive at medium levels (40-70 dB SPL). The parameters of the DRNL filterbank were matched to the behavior of the human cochlea by fitting them to psychophysical pulsation-threshold data [18]. The present study uses the parameters specified in [16], which are a slight modification of the original parameters [18]: the amount of compression was adjusted to stay constant above 1.5 kHz, whereas it was assumed to increase continuously in the original parameter set.

Table 3-1. Values of the DRNL filterbank parameters.

    Parameter           p0          m
    BW_lin              0.03728     0.75
    BW_nlin            -0.03193     0.77
    LP_lin cutoff      -0.06762     1.01
    a (CF > 1.5 kHz)    4.00471     0.00
    b (CF > 1.5 kHz)   -0.98015     0.00
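The compressive element of the nonlinear path, the instantaneous broken-stick nonlinearity, can be sketched as follows. The parameter values a, b and c below are illustrative placeholders, not the fitted per-channel values of the model:

```python
import numpy as np

def broken_stick(x, a=1e4, b=5e3, c=0.25):
    # Output follows the linear branch a*x at low amplitudes and the
    # compressive branch sign(x) * b * |x|**c at higher amplitudes.
    # a, b, c are illustrative, not the fitted DRNL channel parameters.
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.minimum(a * np.abs(x), b * np.abs(x) ** c)

print(broken_stick(1.0))      # 5000.0: compressive branch dominates
print(broken_stick(1e-8))     # ~1e-4: linear branch dominates at low level
```

Taking the pointwise minimum of the two branches is one simple way to realize the "broken stick": whichever branch is smaller governs the output at that input level.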


3.2.1.2 Hair-cell transduction

The next processing step simulates the hair-cell transduction, i.e., the transformation from mechanical vibrations of the BM into inner-hair-cell receptor potentials. This is modeled by half-wave rectification followed by first order lowpass filtering with a 1 kHz cutoff. The lowpass filter preserves the temporal fine structure of the signal at low frequencies and extracts the envelope of the signal at high frequencies. The lowpass filter output is then transformed into an intensity-like representation by applying a squaring expansion.

3.2.1.3 Neural adaptation

Figure 3-4. Neural adaptation circuit of the auditory perception model (five feedback loops).

The output of the squaring expansion serves as the input to the neural adaptation stage of the model. This stage simulates the adaptive properties of the auditory nerve. Adaptation refers to dynamic changes in the gain of the system in response to changes in input level. This stage compresses stationary signals almost logarithmically, whereas fast fluctuations of the input are transformed more linearly. Figure 3-4 shows the structure of the neural adaptation stage, which consists of a chain of five feedback loops in series with different time constants. Within each single element, the low-pass filtered output is fed back to form the denominator of the dividing element. The divisor is the momentary charging state of the lowpass filter, determining the attenuation applied to the input. The time constants vary between 5 and 500 milliseconds and are fitted to account for perceptual forward-masking data [10]. After the onset of stationary signals the lowpass filters are charged according to their time constants. Because the charging state of the lowpass filter enters as divisor, a greater charging causes a greater attenuation of the input signal. For stationary signals an input value I produces a value


of I^(2^-5), which approaches a logarithmic transform. For input variations that are rapid compared to the time constants of the lowpass filters, the transformation through the adaptation loops is more linear, leading to an enhancement of fast temporal variations, or onsets and offsets, at the output of the adaptation loops. In response to signal onsets, the output of the adaptation loops is characterized by a pronounced overshoot. This overshoot is limited, such that the maximum ratio of the onset response amplitude and the steady-state response amplitude is 10.

3.2.1.4 Modulation filterbank

Figure 3-5. Transfer functions of the modulation filters of the auditory model.

The output of the neural adaptation stage is filtered by a first order lowpass filter with a cutoff of 150 Hz. This filter simulates a decreasing sensitivity to sinusoidal modulation as a function of modulation frequency. The output of the lowpass filter is processed by the modulation filterbank. The highest modulation filter center frequencies in the filterbank are limited to one-quarter of the center frequency of the peripheral channel driving the filterbank, and maximally to 1 kHz. The lowest modulation filter is a second order lowpass filter with a cutoff frequency of 2.5 Hz. The modulation filters tuned to 5 and 10 Hz have a constant bandwidth of 5 Hz. For modulation frequencies at and above 10 Hz, the modulation filter center frequencies are logarithmically scaled and


the filters have a constant Q value of 2. The magnitude transfer functions of the filters overlap at their 3 dB points. Figure 3-5 shows the transfer functions of the modulation filters. All the modulation filters are complex frequency-shifted first order lowpass filters. These filters have a complex valued output, and either the absolute value of the output or its real part can be considered. For the filters centered above 10 Hz, the absolute value is considered. This is comparable to the Hilbert envelope of the bandpass filtered output and only conveys information about the presence of modulation energy in the respective modulation band, i.e., the modulation phase information is strongly reduced. For modulation filters centered at and below 10 Hz, the real part of the filter output is considered. The output of modulation filters above 10 Hz was attenuated by a factor of √2, so that the rms value at the output is the same as for the low frequency channels in response to a sinusoidal amplitude modulated input signal of the same modulation depth. In order to simulate limited resolution, Gaussian distributed noise is added to each channel at the output of the modulation filterbank. The final output can now be interpreted as a four dimensional, time-varying activity pattern called the internal representation.

3.2.2 Time Alignment

As shown in Figure 3-1, the patient's speech is matched with healthy speech using time alignment. Time alignment is a process by which temporal regions of the test utterance are matched with appropriate regions of the reference utterance. Time alignment of speech is important because different acoustic renditions of the same utterance (e.g., word, phrase, sentence) are seldom realized at the same speed (speaking rate) across the entire utterance. Therefore, when comparing two utterances, the variation in speaking rate should be accounted for to get a meaningful comparison. Dynamic time warping (DTW) is a technique for comparing two speech utterances based on template matching that inherently accomplishes time alignment


[19]. Figure 3-6 shows the outline of the time alignment system employed in the present study.

Figure 3-6. Schema of the time alignment system.

The input speech waveforms are transformed into a type of cepstrum representation called Mel-frequency cepstrum coefficients (MFCC). The dissimilarity matrix, a matrix of pairwise distances between the MFCC coefficients of the frames of the two utterances, is computed. Finally, the DTW algorithm operates on the dissimilarity matrix and provides the overall dissimilarity score and the warping path which gives the best time alignment.

3.2.2.1 Mel-frequency cepstrum coefficients

Several psychoacoustic studies have been devoted to deriving frequency scales that attempt to model the response of the human auditory system. The inner ear acts as a spectrum analyzer, and together with the action of the auditory nerve the perceptual attributes of sounds at different frequencies are not entirely linear in nature. The mel frequency scale is a perceptually motivated scale that is linear below 1 kHz and logarithmic above, with an equal number of samples taken below and above 1 kHz. The mel scale is defined as

    \mathrm{Mel}(f) = 2595 \log_{10}\!\left(1 + \frac{f}{700}\right)    (3-2)

MFCC is a type of cepstrum representation where the frequency analysis is based upon a filterbank with critical band spacing of filters and bandwidths based on the mel scale. Figure 3-7 shows an example of a filterbank with triangular weighting functions. For computing MFCC, the magnitude spectrum of a frame of speech is computed using the Fourier transform. The DFT values are then grouped together in critical bands and


Figure 3-7. DFT weighting functions for mel-frequency cepstrum computation.

weighted by the triangular functions of the filterbank. Finally, the discrete cosine transform of the logarithm of the filter outputs provides the MFCC values for the frame. MFCC has a useful compression property by virtue of which the entire spectral information is captured by a small number of coefficients (typically 12-13). Also, the information carried by individual MFCC coefficients across frames is maximally de-correlated. MFCC has become firmly established as the basic feature for most speech pattern analysis problems due to its very robust and reliable performance.

Since speech is a dynamic signal evolving over time, it is useful to capture the temporal dynamics of the signal in some representation. Time derivatives (first and second) of the MFCC feature measurements are used for this purpose. They are commonly known as delta (or velocity) and delta-delta (or acceleration) parameters, corresponding to first and second order derivatives respectively. Delta MFCC is often computed as a least squares approximation to the local slope (over a region around the current sample), thereby providing a locally smoothed estimate as

    \Delta\mathrm{mfcc}_i[n] = \frac{\sum_{k=-M}^{M} k \,\mathrm{mfcc}_{i+k}[n]}{\sum_{k=-M}^{M} k^2}    (3-3)

where mfcc_i[n] is the MFCC value for the ith frame and nth coefficient. Equation (3-3) essentially fits a straight line via linear regression on the M frames before and


the M frames after the ith frame. Delta-delta MFCC values can be calculated along the same lines by using Equation (3-3) with Δmfcc_i[n] replacing mfcc_i[n] on the right hand side.

3.2.2.2 Dynamic time warping

Consider two speech utterances X and Y that are represented by short-time acoustic feature sequences (x_1, x_2, ..., x_{T_x}) and (y_1, y_2, ..., y_{T_y}) respectively, where x_i and y_i are feature vectors with a well defined distance measure between them. Commonly used distance measures are the Euclidean and cosine distances. T_x and T_y are the durations of the two feature sequences, which need not be identical. Let i_x and i_y denote the time indices of X and Y, respectively, where i_x = 1, 2, ..., T_x and i_y = 1, 2, ..., T_y. Also, the distance between feature vectors of X and Y, d(x_{i_x}, y_{i_y}), is simply denoted as d(i_x, i_y). Time alignment of X and Y involves the use of two warping functions, φ_x and φ_y, which map the time indices i_x and i_y to a common, "normal" time axis k, i.e.,

    i_x = \phi_x(k), \quad k = 1, 2, \ldots, T    (3-4)
    i_y = \phi_y(k), \quad k = 1, 2, \ldots, T    (3-5)

A global dissimilarity measure d_φ(X, Y) is then defined based on the warping function pair φ = (φ_x, φ_y) as the accumulated distortion over the entire utterance:

    d_\phi(X, Y) = \frac{1}{M_\phi} \sum_{k=1}^{T} d(\phi_x(k), \phi_y(k))\, m(k)    (3-6)

where d(φ_x(k), φ_y(k)) is the distance between x_{φ_x(k)} and y_{φ_y(k)}, m(k) is a nonnegative (path) weighting coefficient and M_φ is a (path) normalizing factor. A large number of possibilities exist for warping function pairs, so the overall dissimilarity d(X, Y) is defined as

    d(X, Y) \triangleq \min_{\phi}\, d_\phi(X, Y)    (3-7)

where φ must satisfy a set of constraints. Equation (3-7) chooses the best path so as to minimize the accumulated distortion along the alignment path, which means the


dissimilarity is measured based on the best possible alignment, in order to compensate for the nonlinear speaking rate differences between X and Y, which represent utterances of the same word (or sentence). Finding the best path involves solving the minimization problem to compute the overall dissimilarity. This minimization is solved using dynamic programming, which is based on Bellman's optimality principle.

For achieving meaningful time alignment, certain constraints need to be imposed on the warping function pair, such as disallowing time reversal. Typical warping constraints include

- endpoint constraints
- monotonicity constraints
- local continuity constraints
- global path constraints
- slope weighting

These constraints are discussed in detail in [19]. Applying the dynamic programming solution along with the warping constraints provides an efficient way to solve the minimization in Equation (3-7), which computes the overall dissimilarity d(X, Y) and also provides a meaningful time alignment defined by φ.

The summary of the dynamic programming implementation of finding the best path through the T_x by T_y grid, beginning at (1, 1) and ending at (T_x, T_y), is as follows.

1. Initialization

    D_A(1, 1) = d(1, 1)\, m(1)

2. Recursion


For 1 ≤ i_x ≤ T_x, 1 ≤ i_y ≤ T_y such that i_x and i_y stay within the allowable grid, compute

    D_A(i_x, i_y) = \min_{(i'_x, i'_y)} \left[ D_A(i'_x, i'_y) + \zeta\big((i'_x, i'_y), (i_x, i_y)\big) \right]
    \zeta\big((i'_x, i'_y), (i_x, i_y)\big) = \sum_{l=0}^{L_s} d\big(\phi_x(T'-l), \phi_y(T'-l)\big)\, m(T'-l)

with L_s being the number of moves in the path from (i'_x, i'_y) to (i_x, i_y) according to φ_x and φ_y.

3. Termination

    d(X, Y) = \frac{D_A(T_x, T_y)}{M_\phi}

One of the advantages of the DTW algorithm is that it does not require any prior training. Also, the computational complexity of the algorithm is very low due to the use of the dynamic programming strategy.

Figure 3-8. Time alignment of two utterances using DTW. Left panel shows the dissimilarity matrix calculated using the cosine distance between the short-time MFCCs. Right panel shows the warping path overlaid on the minimum cost/distortion-to-this-point matrix.

Figure 3-8 shows the result of the DTW procedure on two utterances of the same sentence, "University of Florida gators". Utterance 1 is roughly 1.7 times longer than utterance 2. First, short-time MFCC coefficients were calculated for both utterances


and then the cosine distance was calculated between each frame of utterance 1 and all frames of utterance 2, where the cosine distance between two vectors x and y is defined as 0.5(1 − x^T y/(‖x‖‖y‖)). The left panel shows the dissimilarity matrix whose (i, j) entry corresponds to the distance between the ith frame of utterance 1 and the jth frame of utterance 2. Low dissimilarity values can be seen as a deep blue stripe running diagonally from the bottom left to the top right. The right panel shows the best time alignment path, which visibly follows the deep blue stripe in the left panel. The time alignment path is overlaid on the matrix of minimum partial accumulated distortion at each time step. Moreover, the top right corner of this matrix gives the overall dissimilarity measure as expressed in Equation (3-7).

3.2.3 Pitch Analysis

As shown in Figure 3-1, the healthy speech signal is subjected to pitch analysis. The perceptual quantity that is related to sound frequency is called pitch. The perceived pitch is influenced by the sound frequency and the sound intensity. In general, pitch (a subjective attribute) is highly correlated with fundamental frequency (a physical attribute). Pitch is an attribute of sound that gives important information about its source. In speech, it helps us to identify the gender of the speaker (females tend to have higher pitch than males) and gives additional meaning to words (a set of words may be interpreted as an affirmation or a question depending on the intonation). In music, it determines the names of the notes. One of the characteristics of pitch is that human listeners are extremely sensitive to changes in frequency and can reliably distinguish two tones separated by 3 Hz (or more) if the frequency of the tones is at or below 500 Hz. If the frequency of the tones is at or above 500 Hz, humans can reliably determine that two tones are of different frequencies if they are separated by 0.003·f_0, where f_0 is the frequency of the lower tone. Recently, Camacho and Harris [17] proposed a novel pitch estimator which outperforms contemporaneous algorithms evaluated on datasets comprising normal speech, disordered voices and musical instruments.
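The cosine-distance dissimilarity matrix and the dynamic programming recursion of Section 3.2.2.2 can be sketched compactly as below. This sketch uses the simple symmetric step pattern (1,0), (0,1), (1,1) with unit path weights; the full constraint set of [19] is omitted:

```python
import numpy as np

def cosine_distance(x, y):
    # 0.5 * (1 - x.y / (||x|| ||y||)), as used for the dissimilarity matrix
    return 0.5 * (1.0 - x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def dtw(X, Y):
    """X: (Tx, d), Y: (Ty, d) feature sequences; returns the accumulated
    distortion D_A(Tx, Ty) using steps (1,0), (0,1), (1,1)."""
    Tx, Ty = len(X), len(Y)
    d = np.array([[cosine_distance(xi, yj) for yj in Y] for xi in X])
    D = np.full((Tx, Ty), np.inf)
    D[0, 0] = d[0, 0]
    for i in range(Tx):
        for j in range(Ty):
            if i == 0 and j == 0:
                continue
            prev = min(D[i - 1, j] if i else np.inf,
                       D[i, j - 1] if j else np.inf,
                       D[i - 1, j - 1] if i and j else np.inf)
            D[i, j] = d[i, j] + prev
    return D[-1, -1]

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(dtw(A, A))   # 0.0: identical sequences align with zero distortion
```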


The algorithm is called the sawtooth waveform inspired pitch estimator (SWIPE′). In addition to pitch estimates, SWIPE′ also provides pitch strength estimates. The attribute of sound that conveys the saliency of pitch is called pitch strength. Pitch strength is not a categorical variable but a continuum that allows us to determine the degree to which the sensation of pitch exists. Sounds which elicit a strong pitch sensation have a higher pitch strength. Also, pitch strength is independent of pitch, i.e., two sounds can have the same pitch but different pitch strength. Unvoiced sounds normally have a pitch strength in the range of 0 to 0.2, and voiced sounds have a pitch strength of more than 0.2 and less than or equal to 1. SWIPE′ extracts the pitch but not the fundamental frequency (defined as the maximum common divisor of its spectral components). In many cases these two attributes coincide, but not always. For example, a periodic signal formed by the 13th, 19th and 25th harmonics of 50 Hz (i.e., 650, 950 and 1250 Hz) is perceived as having a pitch of 334 or 650 Hz, but not 50 Hz. SWIPE′ has been shown to be vastly superior to current pitch estimators on a disordered voice database of vowels [17].

SWIPE′ estimates the pitch as the fundamental frequency of the sawtooth waveform whose spectrum best matches the spectrum of the input signal. The sequence of steps in the algorithm is as follows:

1. For each pitch candidate f within a pitch range [f_min, f_max], compute a score as follows. First, compute the square root of the spectrum of the signal. Then normalize the square root of the spectrum and apply an integral transform using a normalized cosine kernel whose envelope decays as 1/√f.

2. Estimate the pitch as the highest scoring candidate.

Figure 3-9 shows the output of the SWIPE′ algorithm applied to utterance 1 used in the time alignment experiment of Section 3.2.2.2. The text of the utterance is "University of Florida gators". The top panel shows the acoustic waveform of the utterance. The bottom panel plots the pitch estimates (Hz) in blue and the pitch strength estimates in green. The regions of the utterance for which the pitch strength is less than 0.2 are unvoiced.
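The voiced/unvoiced grouping of frames by pitch strength, used later in the feature extraction, can be sketched as:

```python
import numpy as np

def group_by_pitch_strength(pitch_strength, threshold=0.2):
    """Split frame indices into low (unvoiced-like) and high (voiced-like)
    pitch-strength groups; 0.2 is the boundary quoted in the text."""
    s = np.asarray(pitch_strength)
    return np.where(s <= threshold)[0], np.where(s > threshold)[0]

low, high = group_by_pitch_strength([0.05, 0.5, 0.9, 0.1])
print(low, high)   # [0 3] [1 2]
```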


Figure 3-9. Pitch analysis using the SWIPE′ algorithm. Top panel shows the waveform of the utterance. Bottom panel shows the extracted pitch (Hz) and pitch strength values.


CHAPTER 4
SCORING

The second step of processing in the speech evaluation system is that of scoring (Figure 2-1), which provides a score quantifying the overall speech intelligibility. The scoring system receives the feature measurements from an utterance and then predicts the intelligibility score. Scoring can be viewed as a supervised learning task of regression, where the feature vector is the predictor variable (regressor) and the intelligibility score is the response variable (regressand).

4.1 Regression Analysis

Assume we have a training dataset D = {(x_i, y_i) | i = 1, ..., N} ⊂ R^D × R comprising N observations, where each x_i = (x_i1, x_i2, ..., x_iD)^T is a D-dimensional feature (covariate) vector and y_i is the target variable (dependent variable) for the ith data sample. Let X denote the N × D matrix whose rows are the x_i, and let y = (y_1, y_2, ..., y_N)^T denote the target or response vector, so that D = (X, y). The goal of training is to learn the mapping x_i ↦ y_i. A learning machine is defined by a set of mappings x_i ↦ f(x_i, α), where the functions f themselves are labeled by adjustable parameters α. Different values of α result in different mappings, and the training procedure chooses the best α̂ based on some optimality criterion. After training we can predict the response to any new input observation x⋆ as f(x⋆, α̂). Regression analysis attempts to model the relationship between inputs and targets, i.e., the conditional distribution of the targets given the inputs. A large body of techniques for carrying out regression analysis has been developed, including both parametric and non-parametric approaches. The following sections describe the techniques used in this study.

4.1.1 Linear Regression

The linear regression model has the form

    f(x, w) = \sum_{j=0}^{p-1} w_j \phi_j(x) = w^T \phi(x)    (4-1)


where w = (w_0, w_1, ..., w_{p-1})^T is the unknown weight (parameter) vector, {φ_j(x)} are the basis functions and the model order is denoted by p. The parameter w_0 allows for any fixed offset (or bias) in the data by defining a dummy basis function φ_0(x) = 1. Several choices for the basis functions exist, including linear and non-linear functions such as

- the linear function φ_j(x) = x;
- transformations of input features such as log, square root, square or tanh;
- basis expansions such as φ_j(x) = x^j, leading to a polynomial representation;
- interactions between variables.

Irrespective of the choice of basis functions, the model is linear in the parameters. From a probabilistic point of view, the predictive distribution p(y|x) expresses the uncertainty about the value of y for each value of x. From this conditional distribution we can make predictions of y⋆ for any new value of x⋆ in such a way as to minimize the expected value of the squared loss function. This leads to the method of least squares, where w is chosen so that it minimizes the residual sum of squares (RSS)

    \mathrm{RSS}(w) = \sum_{i=1}^{N} \big(y_i - f(x_i, w)\big)^2    (4-2)

Let Φ denote the N × p matrix, called the design matrix, whose elements are given by Φ_ij = φ_j(x_i), so that

    \Phi = \begin{bmatrix}
    \phi_0(x_1) & \phi_1(x_1) & \cdots & \phi_{p-1}(x_1) \\
    \phi_0(x_2) & \phi_1(x_2) & \cdots & \phi_{p-1}(x_2) \\
    \vdots & \vdots & \ddots & \vdots \\
    \phi_0(x_N) & \phi_1(x_N) & \cdots & \phi_{p-1}(x_N)
    \end{bmatrix}

Then we can rewrite the residual sum of squares as

    \mathrm{RSS}(w) = (y - \Phi w)^T (y - \Phi w)    (4-3)
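As a numerical sketch of this least-squares setup, the weights of a polynomial model with basis φ_j(x) = x^j can be recovered with NumPy; the data below are synthetic, generated only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 1.0 + 2.0 * x - 3.0 * x**2 + 0.01 * rng.standard_normal(50)   # true w = (1, 2, -3)

Phi = np.vander(x, 3, increasing=True)            # design matrix, columns 1, x, x**2
w_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # minimizes (y - Phi w)^T (y - Phi w)
print(np.round(w_hat, 1))                         # close to [ 1.  2. -3.]
```

`np.linalg.lstsq` solves the normal equations in a numerically stable way, rather than forming the pseudo-inverse explicitly.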


This is a quadratic function in the p parameters. To minimize the RSS we differentiate Equation (4-3) with respect to w to obtain

    \frac{\partial \mathrm{RSS}}{\partial w} = -2 \Phi^T (y - \Phi w), \qquad
    \frac{\partial^2 \mathrm{RSS}}{\partial w\, \partial w^T} = 2 \Phi^T \Phi    (4-4)

Assuming that Φ has full column rank, i.e., Φ^T Φ is positive definite, setting the first derivative to zero yields the unique solution

    \hat{w} = (\Phi^T \Phi)^{-1} \Phi^T y    (4-5)

The matrix (Φ^T Φ)^{-1} Φ^T is known as the Moore-Penrose pseudo-inverse of the matrix Φ. The predicted values for the training inputs are

    \hat{y} = \Phi \hat{w} = \Phi (\Phi^T \Phi)^{-1} \Phi^T y    (4-6)

The residual vector y − ŷ is orthogonal to the column space of Φ (the subspace of R^N spanned by the columns of the matrix Φ). Hence the estimate ŷ is the orthogonal projection of y onto this subspace, for which the projection matrix is Φ(Φ^T Φ)^{-1} Φ^T. The Gauss-Markov theorem implies that the least squares estimator has the smallest mean squared error of all linear estimators with no bias. However, there may well exist a biased estimator with smaller mean squared error. Linear regression has the advantage of simplicity in implementation and interpretation, but suffers from limited flexibility; if the relationship between the input and output cannot be reasonably approximated by a linear function, the model will give poor predictions.

4.1.2 Gaussian Process Regression

We wish to find a function (mapping) f from a finite training dataset D so that we can make predictions for new inputs x⋆ not seen in D. A linear regression technique restricts f to be a linear function of the inputs. In a different approach we can assign a prior probability to every possible function, where higher probabilities are given to functions


considered to be more likely. There exists an uncountably infinite set of possible functions, which seems to make this approach intractable. A Gaussian process (GP) provides a sophisticated framework to make this approach computationally tractable. A Gaussian process is defined as a collection of random variables, any finite number of which have a joint Gaussian distribution. A GP is completely specified by its mean function and covariance function. The book by Rasmussen and Williams [20] provides a comprehensive analysis of GP regression; we closely follow their text in our discussion here. For the regression model defined by Equation (4-1), if there is a Gaussian prior distribution on the weights and we assume Gaussian noise, then there are two equivalent ways of obtaining the regression function: (i) by performing computations in weight space, and (ii) by taking a function space view.

4.1.2.1 Weight space view

We will reconsider the linear regression model f(x, w) = w^T φ(x) (Equation (4-1)) from the Bayesian viewpoint. Let the weights have a prior distribution which is a zero mean Gaussian, w ∼ N(0, Σ_w), where N(m, Σ) denotes a multivariate Gaussian distribution with mean vector m and covariance matrix Σ. Assuming that the targets y_i are generated from the underlying function with additive Gaussian noise of variance σ², the likelihood of w is

    p(y \mid \Phi, w) = \mathcal{N}(\Phi w, \sigma^2 I)    (4-7)

where I is the identity matrix and Φ is the N × p design matrix. The posterior distribution for the weights is given by

    p(w \mid y, \Phi) = \frac{p(y \mid \Phi, w)\, p(w)}{\int p(y \mid \Phi, w)\, p(w)\, dw}    (4-8)

As the prior and likelihood are Gaussian, the posterior is also Gaussian:

    p(w \mid y, \Phi) \sim \mathcal{N}\!\left(\frac{1}{\sigma^2} A^{-1} \Phi^T y,\; A^{-1}\right)    (4-9)


where A = σ^{-2} Φ^T Φ + Σ_w^{-1}. The mean of the posterior distribution p(w|y, Φ) is also its mode, which is called the maximum a posteriori (MAP) estimate of w. The posterior mean value of the weights ŵ is the choice of w that minimizes the quadratic form

    E = \frac{1}{2\sigma^2} (y - \Phi w)^T (y - \Phi w) + \frac{1}{2} w^T \Sigma_w^{-1} w    (4-10)

which gives

    \hat{w} = \frac{1}{\sigma^2} A^{-1} \Phi^T y    (4-11)

To make predictions for a test case we average over all possible parameter values, weighted by their posterior probability. This is in contrast to non-Bayesian schemes, where a single parameter is typically chosen by some criterion. The predictive distribution for f⋆ ≜ f(x⋆) is given by averaging the output of all possible linear models w.r.t. the Gaussian posterior:

    p(f_\star \mid x_\star, \Phi, y) = \int p(f_\star \mid x_\star, w)\, p(w \mid \Phi, y)\, dw
    = \mathcal{N}\!\left(\phi_\star^T \hat{w},\; \phi_\star^T A^{-1} \phi_\star\right)
    = \mathcal{N}\!\left(\phi_\star^T \Sigma_w \Phi^T (K + \sigma^2 I)^{-1} y,\;
       \phi_\star^T \Sigma_w \phi_\star - \phi_\star^T \Sigma_w \Phi^T (K + \sigma^2 I)^{-1} \Phi \Sigma_w \phi_\star\right)    (4-12)

where φ⋆ ≜ φ(x⋆) and K = Φ Σ_w Φ^T.

Note that in Equation (4-12) the feature space always appears in the form Φ Σ_w Φ^T, φ⋆^T Σ_w φ⋆ or φ⋆^T Σ_w Φ^T; thus the entries of these matrices are invariably of the form φ(x)^T Σ_w φ(x′), where x and x′ belong to either the training or test sets. We define a covariance function, or kernel, as k(x, x′) = φ(x)^T Σ_w φ(x′), which is an inner product (with respect to Σ_w). If an algorithm is defined solely in terms of inner products in input space, then it can be lifted into feature space by replacing occurrences of those inner products by k(x, x′); this is also known as the kernel trick.
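The equivalence between the weight-space predictive mean and its kernel form can be checked numerically. The sketch below uses a simple explicit feature map φ(x) = (1, x)^T with Σ_w = I; all data values are illustrative:

```python
import numpy as np

def phi(x):                                  # explicit feature map (1, x)^T
    return np.stack([np.ones_like(x), x], axis=-1)

X = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.1, 1.1, 1.9, 3.2])
x_star = np.array([1.5])
sigma2, Sigma_w = 0.25, np.eye(2)

Phi, phi_s = phi(X), phi(x_star)
K = Phi @ Sigma_w @ Phi.T                    # K = Phi Sigma_w Phi^T

# Kernel form of the predictive mean: phi*^T Sigma_w Phi^T (K + s^2 I)^{-1} y
mean_kernel = phi_s @ Sigma_w @ Phi.T @ np.linalg.solve(K + sigma2 * np.eye(len(X)), y)

# Weight-space form: phi*^T w_hat with w_hat = (1/s^2) A^{-1} Phi^T y
A = Phi.T @ Phi / sigma2 + np.linalg.inv(Sigma_w)
mean_weight = phi_s @ np.linalg.solve(A, Phi.T @ y) / sigma2

print(np.allclose(mean_kernel, mean_weight))   # True
```

The agreement is a direct consequence of the matrix identity relating A^{-1} Φ^T and Σ_w Φ^T (K + σ²I)^{-1}.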


4.1.2.2 Function space view

In Section 4.1.2.1 the uncertainty in the problem was described through a probability distribution over the weights. Alternatively, we can deal directly with the uncertainty in the function values at the points we are interested in. This is the stochastic process, or function space, view of the problem. As described previously, GPs are the subset of stochastic processes that are specified by giving only the mean vector and covariance matrix for any finite subset of points.

In the function space view of linear regression we consider the kinds of functions that can be generated from a fixed set of basis functions with random weights. An example of a GP is our Bayesian linear regression model f(x) = φ(x)^T w with prior w ∼ N(0, Σ_w), for which the mean and covariance are

    E[f(x)] = \phi(x)^T E[w] = 0, \qquad
    E[f(x) f(x')] = \phi(x)^T E[w w^T] \phi(x') = \phi(x)^T \Sigma_w \phi(x')    (4-13)

where E[·] is the expectation operator. The specification of the covariance function k(x, x′) implies a distribution over functions.

Consider a test data matrix X⋆ of dimensions N⋆ × D, comprising N⋆ samples. Let f = (f(x_1), ..., f(x_N))^T be the vector of function values at the training data points, and let f⋆ be the vector of function values for the test data. Also, let K(X, X⋆) denote the N × N⋆ matrix of covariances (kernel values) evaluated at all pairs of training and test points; similar definitions hold for the matrices K(X, X), K(X⋆, X⋆) and K(X⋆, X). Including the measurement noise term σ², we can express the joint distribution of the observed target values and the function values at the test locations as

    \begin{bmatrix} y \\ f_\star \end{bmatrix} \sim
    \mathcal{N}\!\left(0, \begin{bmatrix} K(X, X) + \sigma^2 I & K(X, X_\star) \\ K(X_\star, X) & K(X_\star, X_\star) \end{bmatrix}\right)    (4-14)


Finally, the conditional predictive distribution is given by $p(f_*|X, y, X_*) \sim \mathcal{N}(\bar{f}_*, \mathrm{cov}(f_*))$, where

$\bar{f}_* = K(X_*, X)[K(X, X) + \sigma^2 I]^{-1} y$ (4)
$\mathrm{cov}(f_*) = K(X_*, X_*) - K(X_*, X)[K(X, X) + \sigma^2 I]^{-1} K(X, X_*)$

Note that these results exactly correspond to the ones derived under the weight space view in Equation (4) by using $K(C, D) = \Phi(C)\, \Sigma_w\, \Phi(D)^T$, where $C$, $D$ stand for either $X$ or $X_*$. For any set of basis functions we can compute the corresponding covariance function as $k(x, x') = \phi(x)^T \Sigma_w \phi(x')$; conversely, for every (positive definite) covariance function $k$ there exists a (possibly infinite) expansion in terms of basis functions.

For the case of a single test point $x_*$, let $k_* = (k(x_1, x_*), \ldots, k(x_N, x_*))^T$ denote the vector of covariances between the test point and the $N$ training points. Then Equation (4) reduces to the following

$\bar{f}_* = k_*^T [K(X, X) + \sigma^2 I]^{-1} y, \quad V[f_*] = k(x_*, x_*) - k_*^T [K(X, X) + \sigma^2 I]^{-1} k_*$ (4)

Although the GP defines a joint Gaussian distribution over all of the $y$ variables, for making predictions at $x_*$ we only care about the $(N+1)$-dimensional distribution defined by the $N$ training points and the test point. Note also that the variance in Equation (4) does not depend on the observed targets, but only on the inputs; this is a property of the Gaussian distribution.

The marginal likelihood (or evidence) $p(y|X)$ is defined as the integral of the likelihood times the prior, $p(y|X) = \int p(y|f, X)\, p(f|X)\, df$. The term marginal likelihood refers to the marginalization over the function values $f$. Observing that $y \sim \mathcal{N}(0, K + \sigma^2 I)$ and


performing the integration yields the log marginal likelihood

$\log p(y|X) = -\frac{1}{2} y^T (K + \sigma^2 I)^{-1} y - \frac{1}{2} \log|K + \sigma^2 I| - \frac{N}{2} \log 2\pi$ (4)

4.1.2.3 Covariance functions (kernels)

As seen above, covariance functions (or kernels) $k(x, x')$ play a crucial role in the GP predictor. The covariance function defines the notion of similarity between data points. It allows embedding of the data in a high dimensional feature space (via the mapping $\phi(x)$) where linear pattern analysis is performed, resulting in non-linear pattern analysis in the input space. The use of kernels enables this technique to be applied without paying the computational penalty implicit in the number of dimensions, since it is possible to evaluate the inner product between the images of two inputs in feature space without explicitly computing their coordinates.

A stationary covariance function is a function of $x - x'$ and hence invariant to translations in input space. Moreover, if the covariance function is only a function of $\|x - x'\|$ then it is called isotropic; it is thus invariant to all rigid motions. If a covariance function depends on $x$ and $x'$ solely through $x^T x'$ then it is called a dot product covariance function. By definition covariance functions are symmetric, i.e., $k(x, x') = k(x', x)$. As defined previously, given the set of input points $\{x_i \mid i = 1, \ldots, N\}$ we can compute the Gram matrix $K$ whose entries are $K_{ij} = k(x_i, x_j)$, $i, j = 1, \ldots, N$, also called the kernel matrix or covariance matrix. The matrix is symmetric, i.e., $K = K^T$, and positive semi-definite, i.e., $v^T K v \geq 0$ for all $v \in \mathbb{R}^N$.

Table 4-1 provides expressions for some commonly used covariance functions. The squared exponential (SE) or Gaussian covariance function is the most prevalent choice of isotropic kernel, where the parameter $\ell$ defines the characteristic length scale. The SE covariance function is infinitely differentiable, which means that the GP with this covariance function has mean square derivatives of all orders, and is thus very smooth. The SE kernel is infinitely divisible in that $(k(x, x'))^t$ is a valid kernel for all $t > 0$; the


effect of raising $k$ to the power $t$ is simply to rescale $\ell$. Also, the SE covariance function corresponds to a Bayesian linear regression model with an infinite number of basis functions. Rasmussen and Williams [20] provide a detailed analysis of various types of kernels.

Table 4-1. Some commonly used covariance functions $k(x, x')$ for $x, x' \in \mathbb{R}^D$.

Covariance function     Expression
Constant                $\sigma_0^2$
Noise                   $\sigma_0^2\, \delta(x, x')$
Linear                  $\sum_{d=1}^{D} \sigma_d^2\, x_d x'_d$
Polynomial              $(x^T x' + \sigma_0^2)^p$
Squared exponential     $\exp\left(-\frac{\|x - x'\|^2}{2\ell^2}\right)$
$\gamma$-exponential    $\exp\left(-\left(\frac{\|x - x'\|}{\ell}\right)^\gamma\right)$
Rational quadratic      $\left(1 + \frac{\|x - x'\|^2}{2\alpha\ell^2}\right)^{-\alpha}$

where $\gamma$, $\alpha$, $\ell$, $\sigma_0$, $\sigma_d$ are real scalars and $\delta(a, b)$ is the Kronecker delta function, which is 1 iff $a = b$ and 0 otherwise.

New kernels can be constructed by combining or modifying existing valid kernels. Given valid kernels $k_1(x, x')$ and $k_2(x, x')$, the following constructions are also valid kernels

$k(x, x') = c\, k_1(x, x')$
$k(x, x') = f(x)\, k_1(x, x')\, f(x')$
$k(x, x') = q(k_1(x, x'))$
$k(x, x') = \exp(k_1(x, x'))$
$k(x, x') = k_1(x, x') + k_2(x, x')$ (4)
$k(x, x') = k_1(x, x')\, k_2(x, x')$
$k(x, x') = (k_1(x, x'))^p$
$k(x, x') = x^T A x'$
$k(x, x') = k_3(\phi(x), \phi(x'))$


where $c > 0$ is a constant, $f(\cdot)$ is any function, $q(\cdot)$ is a polynomial with non-negative coefficients, $A$ is a symmetric positive definite matrix, $p$ is a positive integer, $\phi(x)$ maps $x$ to $\mathbb{R}^M$ and $k_3(\cdot, \cdot)$ is a valid kernel in $\mathbb{R}^M$.

4.1.2.4 Model selection and adaptation of hyperparameters

Typically the covariance functions employed in GP regression have some free parameters. For example, the SE kernel has one parameter $\ell$ and the polynomial kernel has two parameters $\sigma_0$ and $p$. In general the free parameters are called the hyperparameters. In many practical applications, it may not be easy to specify all aspects of the covariance function with confidence. In addition, the exact form and possible free parameters of the likelihood function may also not be known in advance. GPs provide a practical solution to this problem of model selection and hyperparameter setting.

Model selection for GP regression with Gaussian noise is based on Bayesian inference principles. We can explain Bayesian model selection by using a hierarchical specification of the model. At the lowest level are the parameters, $w$. At the second level are the hyperparameters $\theta$, which control the distribution of the parameters at the bottom level. At the top level we may have a (discrete) set of possible model structures, $H_i$, under consideration. Inference is done layer by layer using the laws of probability. According to Bayes' rule the posterior over the parameters is given by

$p(w|y, X, \theta, H_i) = \frac{p(y|X, w, H_i)\, p(w|\theta, H_i)}{\int p(y|X, w, H_i)\, p(w|\theta, H_i)\, dw}$ (4)

where $p(y|X, w, H_i)$ is the likelihood and $p(w|\theta, H_i)$ is the parameter prior. The normalizing constant in the denominator is independent of the parameters, and is called the marginal likelihood (or evidence). Similarly, at the next level we calculate the posterior over the hyperparameters, where the marginal likelihood from the first level plays the role of the likelihood

$p(\theta|y, X, H_i) = \frac{p(y|X, \theta, H_i)\, p(\theta|H_i)}{\int p(y|X, \theta, H_i)\, p(\theta|H_i)\, d\theta}$ (4)


where $p(\theta|H_i)$ is the hyper-prior. At the top level, we compute the posterior for the model

$p(H_i|y, X) = \frac{p(y|X, H_i)\, p(H_i)}{\sum_i p(y|X, H_i)\, p(H_i)}$ (4)

Equation (4) provides the expression for the marginal likelihood from Equation (4), which can be explicitly written conditioned on the hyperparameters (the parameters of the covariance function). We restate the equation below

$\log p(y|X, \theta) = -\frac{1}{2} y^T K_y^{-1} y - \frac{1}{2} \log|K_y| - \frac{N}{2} \log 2\pi$ (4)

where $K_y = K + \sigma^2 I$. We can tune the hyperparameters by maximizing the log marginal likelihood $\log p(y|X, \theta)$. The partial derivative of $\log p(y|X, \theta)$ with respect to the hyperparameter $\theta_j$ gives

$\frac{\partial \log p(y|X, \theta)}{\partial \theta_j} = \frac{1}{2} y^T K_y^{-1} \frac{\partial K_y}{\partial \theta_j} K_y^{-1} y - \frac{1}{2}\, \mathrm{trace}\left(K_y^{-1} \frac{\partial K_y}{\partial \theta_j}\right) = \frac{1}{2}\, \mathrm{trace}\left((\alpha \alpha^T - K_y^{-1}) \frac{\partial K_y}{\partial \theta_j}\right), \quad \text{where } \alpha = K_y^{-1} y$ (4)

The bulk of the computation involves inverting the matrix $K_y$, which requires $O(N^3)$ time for an $N \times N$ matrix. Once $K_y^{-1}$ is known, computing the derivatives requires only $O(N^2)$ time per hyperparameter. The derivatives from Equation (4) can be used in any unconstrained optimization technique, e.g., the conjugate gradient algorithm, for finding the maxima. Although the marginal likelihood suffers from multiple local maxima, this is not a huge disadvantage since each local maximum offers a different interpretation of the data. Apart from the marginal likelihood approach, the problem of model selection and hyperparameter setting can also be tackled using k-fold cross-validation or leave-one-out cross-validation, at the cost of a higher computational burden.
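The log marginal likelihood can be sketched numerically as follows; a Cholesky factorization supplies both the $O(N^3)$ solve and the log-determinant in one pass (assuming numpy; the toy data are illustrative):

```python
import numpy as np

def log_marginal_likelihood(K, y, sigma2):
    """log p(y|X) = -1/2 y^T Ky^{-1} y - 1/2 log|Ky| - N/2 log(2 pi), Ky = K + sigma2*I."""
    N = len(y)
    Ky = K + sigma2 * np.eye(N)
    L = np.linalg.cholesky(Ky)                    # the O(N^3) step
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # alpha = Ky^{-1} y
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))          # 1/2 log|Ky| via the Cholesky factor
            - 0.5 * N * np.log(2 * np.pi))

# Sanity check against the closed form for K = 0 (pure noise model, Ky = I):
y = np.array([0.5, -0.3, 1.2])
lml = log_marginal_likelihood(np.zeros((3, 3)), y, sigma2=1.0)
expected = -0.5 * y @ y - 1.5 * np.log(2 * np.pi)
print(np.isclose(lml, expected))  # True
```

Reusing the factor $L$ for every hyperparameter gradient is what gives the $O(N^2)$ per-hyperparameter cost quoted above.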


4.1.3 Multi-layer Perceptron Regression

The multi-layer perceptron (MLP) is a feed-forward neural network architecture which provides a highly flexible system for solving regression and classification problems [21]. As the name suggests, the MLP comprises one or more hidden layers and an output layer, where each layer is made up of processing elements (or units). For the task of regression we consider an MLP with one hidden layer of $N_H$ units and an output layer with a single unit. Given a training input $(x, y) \in D$, $x \in \mathbb{R}^D$, $y \in \mathbb{R}$, the output of a hidden unit is given by

$z_j = \varphi\left(\sum_{i=1}^{D} w_{ji}^{(h)} x_i + w_{j0}^{(h)}\right) \quad \text{for } j = 1, \ldots, N_H$ (4)

where the superscript $(h)$ indicates that the parameter belongs to the hidden layer and $\varphi(\cdot)$ is a differentiable nonlinear activation function. The parameters $w_{ji}^{(h)}$ and $w_{j0}^{(h)}$ are known as weights and biases, respectively. As before, we can absorb the bias parameter into the set of weight parameters by augmenting the input vector $x$ with an additional variable $x_0$ whose value is clamped at $x_0 = 1$. Then we can compactly rewrite the output of a hidden unit as

$z_j = \varphi\left(\sum_{i=0}^{D} w_{ji}^{(h)} x_i\right)$ (4)

The choices for the nonlinear activation function $\varphi$ are generally sigmoidal functions such as the logistic sigmoid (logsig) or tanh:

$\varphi_{\mathrm{logsig}}(a) = \frac{1}{1 + e^{-a}}$ (4)

$\varphi_{\tanh}(a) = \tanh(a) = \frac{2}{1 + e^{-2a}} - 1$ (4)

For the regression problem the output layer consists of a single unit with identity activation function ($\varphi_{\mathrm{identity}}(a) = a$). Therefore, following Equation (4) we can


linearly combine the $z_j$ to give the overall response of our MLP to input $x$ as

$\hat{y} = \sum_{j=0}^{N_H} w_j^{(o)} z_j = \sum_{j=0}^{N_H} w_j^{(o)} \varphi\left(\sum_{i=0}^{D} w_{ji}^{(h)} x_i\right)$ (4)

where the superscript $(o)$ indicates that the parameters belong to the output layer. Again we have absorbed the output layer bias into the output layer weights by augmenting the hidden unit outputs $(z_j \mid j = 1, \ldots, N_H)$ with $z_0 = 1$. The computations of Equation (4) can be interpreted as the forward propagation of information through the network. Note that if the activation functions of all the hidden units of a network are linear, then we can always define an equivalent network without hidden units. If a network consists of multiple hidden layers, then we can extend our formulation of a weighted linear combination followed by an element-wise transformation with a nonlinear activation function (Equation (4)) to each layer, where the outputs of the preceding layer serve as inputs to the current layer. This MLP architecture is important because it has been shown that networks with one hidden layer are universal approximators as the number of hidden units tends to infinity, for a large class of functions (but excluding polynomials).

Given the dataset $D = (X, y)$, training the MLP is the task of finding the values of the parameters ($w^{(h)}$ and $w^{(o)}$) by minimizing the sum-of-squares error function given by

$E(w) = \frac{1}{2}\|\hat{y} - y\|^2$ (4)

where we have grouped the weights and biases of all layers into a single vector $w$, and $\hat{y} = (\hat{y}_1, \ldots, \hat{y}_N)^T$ is the MLP response to the training inputs calculated using Equation (4). A probabilistic (maximum likelihood) treatment of MLP training also leads to the same error function (consult Bishop [21] for a comprehensive analysis of neural network design and training).

Since $E(w)$ is a smooth continuous function of $w$, minima will occur at points in weight space where the gradient of the error function vanishes, i.e., $\nabla E(w) = 0$.
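The forward propagation just described can be sketched in a few lines of pure Python; this is a minimal illustration with tanh hidden units and biases absorbed as the 0th weights (all weight values below are arbitrary, chosen only for the example):

```python
import math

def mlp_forward(x, W_hidden, W_out):
    """Forward pass of a one-hidden-layer MLP with tanh hidden units
    and a linear (identity) output unit.

    W_hidden: list of N_H weight rows, each of length D+1 (bias first).
    W_out:    list of length N_H+1 (output bias first).
    """
    x_aug = [1.0] + list(x)                       # x_0 = 1 absorbs the hidden biases
    z = [math.tanh(sum(w * xi for w, xi in zip(row, x_aug))) for row in W_hidden]
    z_aug = [1.0] + z                             # z_0 = 1 absorbs the output bias
    return sum(w * zi for w, zi in zip(W_out, z_aug))

# Two hidden units, one 2-D input.
y_hat = mlp_forward([0.5, -1.0],
                    W_hidden=[[0.1, 0.4, -0.2], [0.0, -0.3, 0.5]],
                    W_out=[0.2, 1.0, -1.0])
print(y_hat)
```

Each hidden unit computes a weighted sum followed by the nonlinearity, and the output unit linearly combines the hidden responses, exactly mirroring the two equations above.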


Points at which the gradient vanishes are called stationary points, and may be further classified into minima, maxima, and saddle points. The error function has a highly nonlinear dependence on the weight and bias parameters, and so there will be many points in weight space at which the gradient vanishes (or is numerically very small). Furthermore, typically there are multiple inequivalent stationary points and in particular multiple inequivalent minima. For successful training of MLPs it is not necessary to find the global minimum (and in general it is not known whether the global minimum has been found), but it may be necessary to compare several local minima in order to find a sufficiently good solution. Since the equation $\nabla E(w) = 0$ cannot be solved analytically, iterative numerical procedures are employed which rely on gradient information. The generic approach to minimizing $E(w)$ by gradient descent is called back-propagation. Because of the compositional form of the model, the gradient can be easily derived using the chain rule for differentiation. It can be computed by a forward and backward sweep over the network, keeping track only of quantities local to each unit. Back-propagation involves choosing some initial value $w_0$ for the weight vector and then moving through weight space in a succession of steps of the form

$w_{n+1} = w_n + \Delta w_n$ (4)

where $n$ labels the iteration number. Different algorithms involve different choices for the weight vector update $\Delta w_n$, which may include first and/or second order gradient information.

For training our MLP we use the Levenberg-Marquardt (LM) algorithm [21], which computes the Jacobian matrix $J$ for the weight update. The elements of the Jacobian are given by the derivatives of the network outputs with respect to the weights

$J_{ki} = \frac{\partial \hat{y}_k}{\partial w_i}$ (4)

where each such derivative is evaluated with all other weights held fixed. Like the quasi-Newton methods, the LM algorithm is designed to approach second-order training speed without having to compute the Hessian matrix ($\nabla\nabla E(w)$). For the sum-of-squares error function (Equation (4)) the Hessian can be approximated by $J^T J$, with the following Newton-like update

$w_{n+1} = w_n + [J^T J + \mu I]^{-1} J^T (y - \hat{y}_n)$ (4)

where $\hat{y}_n$ is the network response at the $n$th iteration (calculated by using $w_n$ in Equation (4)) and $\mu$ is a scalar known as the damping parameter. When $\mu = 0$ the LM update is just Newton's method, using the approximate Hessian matrix. When $\mu$ is large, this becomes gradient descent with a small step size. Since Newton's method is faster and more accurate near a minimum, the aim is to transition towards Newton's method once we are close to a minimum. Thus $\mu$ is decreased after each successful step (reduction in $E(w)$) and is increased only when a tentative step would increase the value of $E(w)$, thereby ensuring that the error is always reduced at each iteration. The superior convergence of the LM algorithm comes at the penalty of a higher computational cost compared to a simple gradient descent technique.

At the start of training we initialize the network parameters $w_0$ using the Nguyen-Widrow initialization algorithm [22]. This algorithm chooses values in order to distribute the active region of each unit in the layer approximately evenly across the layer's input space, with a certain degree of randomness. Also, to avoid the problem of over-fitting we use the technique of early stopping. For early stopping, we split the training dataset into train and validation subsets. The training subset is used for computing the weight updates $\Delta w_n$, while the error on the validation subset is monitored during the training process. Normally, during the initial phase of training, the error on the training and validation subsets decreases. However, when the network begins to overfit the data, the error on


the validation set typically begins to rise, and when the validation error increases for a specified number of iterations we terminate the training.

4.1.4 Radial Basis Function Network Regression

In Section 4.1.1 we considered a model based on linear combinations of fixed basis functions, $f(x, w) = w^T \phi(x)$ (Equation (4)). A common choice for the basis functions is that of radial basis functions (RBF), which have the property that each basis function depends only on the radial distance (typically Euclidean) from a center $\mu_i$, so that $\phi_i(x) = \varphi(\|x - \mu_i\|)$, where $\varphi(\cdot)$ is a non-linear differentiable function. An RBF network can be seen as a two layer feed-forward network (analogous to an MLP) with a hidden layer of radial basis units and an output layer with a single unit [21]. All RBF centers and nonlinearities in the hidden layer are fixed, which results in the model being linear in the parameters. Thus the hidden layer performs a fixed nonlinear mapping into feature space with no adjustable parameters. The output layer then implements a linear combiner (the same as the output unit of the MLP regression model) on the feature space, and the only adjustable parameters are the weights of this linear combiner. These parameters can be set by a least-squares approach.

Consider an RBF network with $N_H$ hidden units (or centers) and an input vector $x \in D$ from the training dataset. Then the overall model response to $x$ is

$\hat{y} = w_0 + \sum_{i=1}^{N_H} w_i\, \varphi(\|x - \mu_i\|)$ (4)

where $\mu_i$ is the $i$th RBF center and $\{w_j \mid j = 0, \ldots, N_H\}$ are the output layer weights. Note that $w_0$ serves as the bias term for the output unit. Some common choices of the nonlinearity $\varphi(\cdot)$ are listed in Table 4-2. For the thin-plate-spline, $\varphi(a) \to \infty$ as $a \to \infty$, and for the Gaussian, $\varphi(a) \to 0$ as $a \to \infty$.

RBF network training starts with choosing the centers $\mu_i$, followed by the fitting of the parameters by a least squares approach. The most crucial problem in the training is that of suitable center selection from a large set of candidates. Since a fixed center


Table 4-2. Some commonly used radial basis functions $\varphi(a)$ for $a \in \mathbb{R}$.

Radial basis function      Expression
Thin-plate-spline          $a^2 \log(a)$
Gaussian                   $\exp\left(-\frac{a^2}{2\sigma^2}\right)$
Multiquadratic             $\sqrt{a^2 + \sigma^2}$
Inverse multiquadratic     $\frac{1}{\sqrt{a^2 + \sigma^2}}$

corresponds to a given regressor in a linear regression model, the selection of RBF centers can be regarded as a problem of subset model selection. We use the orthogonal least squares (OLS) learning algorithm by Chen et al. [23], which yields a parsimonious RBF network for a given performance bound. At each training iteration, the next data point to be chosen as a basis function center corresponds to the one that gives the greatest reduction in the sum-of-squares error (Equation (4)); hence the increment in the explained variance of the desired output is maximized. Furthermore, the oversizing and ill-conditioning problems occurring frequently in random selection of centers are automatically avoided. OLS learning is an efficient algorithm which requires only one pass over the training data, and the selection of centers is directly linked to the reduction of the error signal.

4.1.5 Support Vector Regression

Support vector machines (SVM) have recently become highly popular for solving pattern recognition problems [21, 24, 25]. SVM techniques were developed by Vapnik and his colleagues in the 1990s at AT&T Bell Laboratories, and have thenceforth become highly competitive with the best machine learning systems in real world applications. The advantages of SVMs include sparseness and a convex objective function, so any local solution is also a global optimum.

Given the training dataset $D$, the goal of $\varepsilon$-SV regression is to find a mapping $x_i \mapsto f(x_i)$ that has at most $\varepsilon$ deviation from the targets $y_i$ for the entire dataset, while $f$ should be as flat as possible. If we rewrite the linear regression model of Equation


(4) with identity basis functions $\phi(x) = x$ and an explicit bias term $b$, then we get $f(x) = w^T x + b$, for $b \in \mathbb{R}$, $x \in D$. One way of achieving flatness of $f(x)$ is to minimize $\|w\|^2$ subject to the constraints that $|y - f(x)| \leq \varepsilon$. This is a convex optimization problem and is called feasible if a solution $f$ exists. To cope with infeasible constraints we allow for some errors by introducing the slack variables $\xi_i, \xi_i^*$. Hence we arrive at the following optimization problem

minimize $\frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} (\xi_i + \xi_i^*)$
subject to $y_i - w^T x_i - b \leq \varepsilon + \xi_i$, $\quad w^T x_i + b - y_i \leq \varepsilon + \xi_i^*$, $\quad \xi_i, \xi_i^* \geq 0$ (4)

where the constant $C > 0$ determines the trade-off between the flatness of $f$ and the amount up to which deviations larger than $\varepsilon$ are tolerated. This corresponds to dealing with the so-called $\varepsilon$-insensitive loss function $|\xi|_\varepsilon$ described by

$|\xi|_\varepsilon = \begin{cases} 0 & \text{if } |\xi| \leq \varepsilon \\ |\xi| - \varepsilon & \text{otherwise} \end{cases}$ (4)

The optimization in Equation (4) can be easily solved in its dual formulation, which also helps in extending it to nonlinear functions via kernels (refer to Section 4.1.2.3). The key idea is to construct a Lagrange function from the objective function and the corresponding constraints, by introducing a dual set of variables. It can be shown that this function has a saddle point with respect to the primal and dual variables at the solution. Using Lagrange multipliers $\alpha_i, \alpha_i^*$ and a kernel function $k$, we arrive at the


following optimization problem (for $C, \varepsilon \geq 0$ chosen a priori)

maximize $-\varepsilon \sum_{i=1}^{N} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{N} (\alpha_i - \alpha_i^*)\, y_i - \frac{1}{2} \sum_{i,j=1}^{N} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\, k(x_i, x_j)$ (4)
subject to $\alpha_i, \alpha_i^* \in [0, C]$ for all $i = 1, \ldots, N$, and $\sum_{i=1}^{N} (\alpha_i - \alpha_i^*) = 0$

The regression function takes the form

$\hat{y} = \sum_{i=1}^{N} (\alpha_i - \alpha_i^*)\, k(x_i, x) + b$ (4)

where $b$ is calculated using the Karush-Kuhn-Tucker conditions. We have a sparse expansion of $w$ in terms of the $x_i$ (i.e., we do not need all $x_i$ to describe $w$). Data points associated with nonzero Lagrange coefficients $\alpha_i, \alpha_i^*$ are called support vectors.

Let $c(x_i, y_i, f(x_i))$ denote the cost function determining how we penalize the estimation error for the $i$th data sample; then we can write the so-called empirical risk functional as $R_{emp}[f] \triangleq \frac{1}{N} \sum_{i=1}^{N} c(x_i, y_i, f(x_i))$. To avoid overfitting we should add a capacity control term, which for the SV case leads to the regularized risk functional

$R_{reg}[f] = R_{emp}[f] + \frac{\lambda}{2}\|w\|^2$ (4)

where $\lambda > 0$ is the regularization constant. For the $\varepsilon$-insensitive cost function we still have the problem of choosing an adequate value of $\varepsilon$ in order to achieve good performance with the SV machine. There exists, however, a method to construct SV machines that automatically adjust $\varepsilon$ and moreover also, at least asymptotically, have a predetermined fraction $\nu$ ($0 \leq \nu \leq 1$) of sampling points as SVs. This is achieved by minimizing the following risk functional

$R[f] = R_{emp}[f] + \frac{\lambda}{2}\|w\|^2 + \nu\varepsilon$ (4)
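The $\varepsilon$-insensitive loss used above is simple to state in code; a minimal pure-Python sketch (the residual values are illustrative):

```python
def eps_insensitive(residual, eps):
    """|xi|_eps: zero inside the eps-tube, growing linearly outside it."""
    return max(0.0, abs(residual) - eps)

# Residuals inside the eps = 0.5 tube cost nothing; outside, the
# penalty grows linearly with the distance to the tube boundary.
losses = [eps_insensitive(r, 0.5) for r in (-1.2, -0.3, 0.0, 0.4, 2.0)]
print(losses)
```

This flat region of the loss is what produces the sparse expansion: training points whose residuals stay inside the tube contribute nothing to the objective and end up with zero Lagrange coefficients.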


The optimization problem for $\nu$-SV regression can be written in its primal form as

minimize $\frac{1}{2}\|w\|^2 + C \left( \sum_{i=1}^{N} |y_i - f(x_i)|_\varepsilon + N\nu\varepsilon \right)$
subject to $y_i - w^T x_i - b \leq \varepsilon + \xi_i$, $\quad w^T x_i + b - y_i \leq \varepsilon + \xi_i^*$, $\quad \xi_i, \xi_i^* \geq 0$ (4)

and the dual form is given by

maximize $-\frac{1}{2} \sum_{i,j=1}^{N} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\, k(x_i, x_j) + \sum_{i=1}^{N} (\alpha_i - \alpha_i^*)\, y_i$
subject to $\sum_{i=1}^{N} (\alpha_i - \alpha_i^*) = 0$, $\quad \sum_{i=1}^{N} (\alpha_i + \alpha_i^*) \leq CN\nu$, $\quad \alpha_i, \alpha_i^* \in [0, C]$ (4)

Essentially, $\nu$-SV regression improves upon $\varepsilon$-SV regression by allowing the $\varepsilon$-tube width to adapt automatically to the data. One can show that the regression is not influenced if we perturb points lying outside the tube. Thus, the regression is essentially computed by discarding a certain fraction of outliers, specified by $\nu$, and computing the regression estimate from the remaining points.

4.2 Performance Evaluation of Regression Result

We wish to evaluate the quality of the prediction result $\hat{y}$ of a regression algorithm trained on the dataset $D = (X, y)$. For this purpose we calculate the following three metrics:

- Mean square error (MSE)
- Pearson's correlation coefficient (R)
- Slope and offset of the straight line fit


Mean square error is defined as $MSE = \frac{1}{N}\|y - \hat{y}\|^2$. Let $a = y - \frac{1}{N}\sum_{i=1}^{N} y_i$ and $b = \hat{y} - \frac{1}{N}\sum_{i=1}^{N} \hat{y}_i$; then Pearson's $R = \frac{a^T b}{\|a\|\,\|b\|}$. Finally, we fit the straight line model $\hat{y} = \beta y + \gamma$ using least squares, where $\beta$ is the slope and $\gamma$ is the y-axis intercept (offset). For the case of perfect prediction, $\hat{y} = y$, we have unity slope and zero offset.
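These three metrics can be sketched in a few lines of pure Python; the perfect-prediction check at the end mirrors the unity slope / zero offset remark (the data values are illustrative):

```python
import math

def regression_metrics(y, y_hat):
    """MSE, Pearson's R, and the least-squares slope/offset of y_hat = slope*y + offset."""
    N = len(y)
    mse = sum((t - p) ** 2 for t, p in zip(y, y_hat)) / N
    my, mp = sum(y) / N, sum(y_hat) / N
    a = [t - my for t in y]                      # mean-centered targets
    b = [p - mp for p in y_hat]                  # mean-centered predictions
    sab = sum(u * v for u, v in zip(a, b))
    r = sab / math.sqrt(sum(u * u for u in a) * sum(v * v for v in b))
    slope = sab / sum(u * u for u in a)          # least-squares fit of y_hat on y
    offset = mp - slope * my
    return mse, r, slope, offset

# Perfect prediction: zero MSE, R = 1, unity slope, zero offset.
mse, r, slope, offset = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
print(mse, r, slope, offset)
```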


CHAPTER 5
EXPERIMENTS AND DISCUSSION

In this chapter we discuss the evaluation of the proposed scoring system for predicting the speech intelligibility of Parkinson's disease (PD) patients. Parkinson's disease is a degenerative disorder of the central nervous system. PD is the second most common neurodegenerative disorder and affects approximately seven million people globally. The early symptoms of the disease are voice disorders and movement related problems including shaking, rigidity, slowness of movement and difficulty with walking and gait. During the later stages of the disease cognitive and behavioral problems may arise, with dementia commonly occurring in the advanced stages of the disease. Depression is the most common psychiatric symptom. Parkinson's disease can cause neuropsychiatric disturbances which can range from mild to severe. A PD patient has two to six times the risk of dementia compared to the general population, and the prevalence of dementia increases with the duration of the disease. Parkinson's disease has no cure, but medications, surgery and multidisciplinary management can provide relief from the symptoms. Speech therapy plays an important role in the rehabilitation of the PD patient. PD is more common in the elderly, and prevalence rises from 1% in those over 60 years of age to 4% of the population over 80. The mean age of onset is around 60 years, although 5-10% of cases, classified as young onset, begin between the ages of 20 and 50. Section 5.1 describes the database, analysis technique and results for the intelligibility prediction task. We conclude with a discussion of the current results and ideas for future work.

5.1 Speech Intelligibility Evaluation Experiment

5.1.1 Database

We collected a speech database of Parkinson's patients covering the different stages of the disease. The database consists of 160 utterances collected from 48 patients, of whom 20 were female and 28 male. The female patients were aged


between 57 and 86 years with a median age of 72.5 years and an inter-quartile range of 17.5 years. The male patients were aged between 48 and 90 years with a median age of 73 years and an inter-quartile range of 18 years. All the patients were volunteers from the Veterans Affairs hospital in Gainesville, the UF motor movement disorders clinic and PD support groups in and around Gainesville. The recordings were collected by experienced graduate students from the Department of Speech, Language, and Hearing Sciences at the University of Florida. The database comprises fifteen sentences culled from the SPIN database such that they form a phonetically well balanced set. Table 5-1 lists the sentences used in the database. The speakers were comfortably seated during the recording and were allowed to speak at a comfortable loudness and speaking rate. All 160 utterances were collected using a high quality sound recorder with solid state memory at a sampling rate of 44.1 kHz and digitized with 16 bit resolution. The recordings were subsequently stored on a PC for further processing.

Table 5-1. List of sentences from the disordered speech database of Parkinson's patients. The words in bold are chosen for the intelligibility prediction experiment.

1. His boss made him work like a slave.
2. He caught the fish in his net.
3. The beer drinkers raised their mugs.
4. I made the phone-call from a booth.
5. The cut on his knee formed a scab.
6. I gave her a kiss and a hug.
7. The soup was served in a bowl.
8. The cookies were kept in a jar.
9. The baby slept in his crib.
10. The cop wore a bulletproof vest.
11. How long can you hold your breath.
12. At breakfast he drank some juice.
13. I ate a piece of chocolate fudge.
14. The judge is sitting on the bench.
15. The boat sailed along the coast.

All the utterances were manually segmented to mark the word boundaries. Table 5-1 marks the selected words for every sentence in bold. The selected words (47 in total)


were clipped from the entire PD database, yielding a total of 485 utterances. This PD word database is used for the speech intelligibility prediction experiment. Two-thirds of the word database was used for training the regression models and the remaining utterances formed the test set. We chose to predict the intelligibility of words instead of full sentences due to the limitations of the time alignment algorithm. The DTW algorithm fails at time aligning Parkinson's speech utterances of full sentences. Additionally, a hidden Markov model (HMM) based text-dependent automatic speech recognition system trained on a clean speech corpus (the TIMIT database) also performed unsatisfactorily in the task of time aligning sentences spoken by Parkinson's patients. Both DTW and HMM based approaches work equally well with words. We chose the DTW algorithm over the HMM based system for simplicity.

Figure 5-1. Histogram of the speech intelligibility scores for the Parkinson's speech database.

The utterances in the database were perceptually rated for speech intelligibility on a ratio scale of 1 through 1000. A ratio scale is a way of assigning speech intelligibility scores to utterances such that an utterance with a score of k is twice as intelligible as an utterance with a score of 2k, provided both utterances are renditions of the same word or sentence. The ratio scale offers a more convenient way of scoring complex perceptual constructs such as speech intelligibility. In this study the ratio scale is chosen such that a score of 100 denotes perfectly intelligible speech. Nine listeners with normal hearing (bilaterally), all students in the Department of Speech,


Language, and Hearing Sciences at the University of Florida, participated in the rating task. All the judges received training for the task prior to the actual scoring. All the tests were conducted in a sound-treated booth. The stimuli were delivered to each subject via high-fidelity headphones designed to deliver a flat frequency response at the eardrum. All the stimuli were scaled to an output level of 75 dB SPL and presented binaurally. Each stimulus was presented ten times in random order to each listener, and the scores were averaged to yield a single score per listener. The overall intelligibility score for an utterance was calculated as the geometric mean of the scores from all listeners. Figure 5-1 plots the distribution of the speech intelligibility scores for the entire database. The healthy utterances were chosen from the SPIN database, which included 150 utterances for each of the 15 sentences. Ten template (exemplar) healthy utterances for every sentence were chosen with the help of hierarchical clustering using clustering trees (dendrograms). During clustering, the similarity between a pair of utterances was computed using the DTW algorithm. It was ensured that the templates for every sentence included speakers of both genders.

5.1.2 Feature Extraction

Section 3.1 provides the description of the feature extraction step. All the utterances were lowpass filtered using a linear phase FIR filter and downsampled to 16 kHz prior to processing. Figure 3-1 shows the outline of the feature extraction system. First, the test utterance is time aligned with the ten healthy speech templates of the same sentence. The template with the lowest overall dissimilarity score is chosen for subsequent processing. As shown in Figure 3-6, the MFCC coefficients are first extracted for computing the dissimilarity matrix. MFCC processing was done using 32 ms Hamming windows with 50% overlap. The mel-filterbank consisted of 40 filters with center frequencies spanning the range from 0 to 8 kHz. A type II DCT was used for the final cepstral transformation. Finally, the delta and delta-delta MFCC coefficients were


appended. Subsequently, the dissimilarity matrix is computed using the cosine distance measure, which serves as the input to the DTW algorithm.

The template speech utterance and the test utterance are processed by Jepsen's auditory model as described in Section 3.2. The DRNL filterbank consisted of 20 channels covering the frequency range from 63 Hz to 8 kHz, with channel center frequencies evenly distributed on the ERB scale [16]. The modulation filterbank consisted of eight second-order bandpass filters with octave spacing and center frequencies covering a range from 2 Hz to 256 Hz. The highest modulation filter center frequencies in the filterbank are limited to one-quarter of the center frequency of the peripheral channel driving the filterbank, as proposed by Verhey et al. [26]. The output of the modulation filterbank is segmented into frames using a rectangular window without overlap. The frame duration is the inverse of the modulation channel center frequency; for example, the 50 Hz modulation channel output is segmented into frames of 20 ms duration. Correlation is computed between matching frames of the test utterance and the template utterance. Next, the template utterance was processed by the SWIPE' algorithm to compute the pitch strength values. The pitch search range spanned 30 Hz to 800 Hz and the candidates were evenly distributed on a base-2 logarithmic scale with a spacing of 1/48. The pitch estimates were updated every 5 ms. A pitch strength cutoff of 0.2 was used to partition frames into low and high groups for every audio frequency channel. This yielded a feature vector of size 40.

5.1.3 Regression Analysis

As described in Chapter 4, we use regression techniques for predicting intelligibility for our database. Specifically, we employ linear regression, GP regression and ε-SV regression for this experiment. At the start, the entire database is split into two sets, one for training and the other for testing, using stratified sampling. For linear regression we use a model with a bias and identity basis functions. For the case of GP regression we use a covariance function which is the summation of the SE and noise covariance functions


(Table 4-1). Also, the mean function of the GP is modeled as an affine transformation, i.e., a linear combination of the inputs plus a constant term. As described in Section 4.1.2.4, we optimize the hyperparameters of the GP regression model using at most 50 iterations of the conjugate gradient algorithm (Polak-Ribiere formulation), along with the gradient information calculated using Equation (4). Prior to automatic adaptation of the hyperparameters, all free parameters associated with the covariance function were initialized to -1 and those associated with the mean function were initialized to 0. Figure 5-2 plots the log marginal likelihood (Equation (4)) across iterations for a typical session of hyperparameter adaptation; it can be seen that the conjugate gradient method converges within 20 iterations. The final score is computed using the mean of the predictive distribution given by Equation (4) or Equation (4). This is equivalent to kernel ridge regression (KRR), or ridge regression in dual variables [27]; the kernel matrix in kernel ridge regression can be interpreted as the covariance of a Gaussian process prior. For ε-SV regression, a Gaussian kernel was used and the model was trained using an SMO type algorithm proposed by Fan et al. [28]. The ε parameter, the Gaussian kernel bandwidth and the cost factor C were chosen by grid-search.

Figure 5-2. Performance curve for the automatic adaptation of hyperparameters for the Gaussian process regression model.
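The grid-search step for the ε-SVR hyperparameters can be sketched as follows. The validation function here is a hypothetical stand-in (a toy analytic surface rather than actual SVR training and scoring), so the example stays self-contained:

```python
import itertools

def validation_mse(C, epsilon, gamma):
    """Hypothetical stand-in for training an epsilon-SVR with the given
    (C, epsilon, kernel bandwidth) and returning its held-out MSE.
    A toy analytic surface with a known minimum is used here."""
    return (C - 10) ** 2 + (epsilon - 0.1) ** 2 + (gamma - 1.0) ** 2

# Exhaustive search over a small grid of candidate hyperparameters.
grid = itertools.product([1, 10, 100], [0.01, 0.1, 1.0], [0.1, 1.0, 10.0])
best = min(grid, key=lambda params: validation_mse(*params))
print(best)  # (10, 0.1, 1.0)
```

In practice each grid point would involve training the SVR on the training fold and evaluating it on held-out data; the selection logic is otherwise identical.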


Trained models are used to predict intelligibility for both the train and test sets from the PD word database. As described in Section 4.2, we calculate the MSE, Pearson's R, and the slope and offset of the straight line fit as metrics for evaluating the performance of our intelligibility prediction system. We ran 100 independent simulations, each involving the following steps: partitioning the dataset, training the model and finally prediction. We computed the following performance metrics across the 100 simulations: mean MSE, mean slope, mean offset, mean Pearson's R and the standard deviation of Pearson's R.

Table 5-2 lists the performance metrics of the proposed system calculated across the 100 simulations. It can be seen that all the regression methods give satisfactory performance on this dataset. We have obtained a robust Pearson's R value (>0.9) for the predicted scores, showing good agreement between the actual and predicted intelligibility. All the regression methods work equally well, with the performance of GP regression and ε-SV regression being slightly better. Figure 5-3, Figure 5-4 and Figure 5-5 show the scatter plots of actual intelligibility versus predicted intelligibility from one of the simulations for the three regression methods.

Table 5-2. Speech intelligibility prediction performance metrics for the proposed system calculated across 100 independent simulations.

Regression method    mean MSE (x10^3)    mean slope    mean offset    mean R    std. of R (x10^-2)
Linear               1.446               0.805         38.780         0.896     1.060
GP                   1.012               0.853         28.137         0.917     0.970
ε-SV                 0.993               0.849         27.642         0.929     0.968

5.1.4 Automatic Speech Recognition Based Speech Intelligibility Prediction Experiment

SLPs often subjectively quantify the intelligibility of conversational speech by using speech recognition tasks. These tasks include scaling procedures, where the listener estimates the proportion of the intended targets that were understood, and word identification tests, in which the listener attempts to determine (by transcription in the


Figure 5-3. Scatter plot of the actual and predicted intelligibility scores for linear regression.

Figure 5-4. Scatter plot of the actual and predicted intelligibility scores for GP regression.

Figure 5-5. Scatter plot of the actual and predicted intelligibility scores for ε-SV regression.


case of conversational speech) exactly what was said. We have designed an experiment involving an automatic speech recognition (ASR) system to predict the intelligibility of full sentences from our PD database. We use a text-independent ASR system, in which we determine the most likely phone sequence under the assumption that every phone is equally likely at each time step. The speech signal can be segmented into a sequence of phones, where each phone corresponds to a unique vocal tract shape; the phones serve as the basic indivisible units of speech. A phone sequence can be transformed into a word sequence using a dictionary. The output of the ASR system is a string of words recognized from the input utterance. The output transcription is processed to obtain the word insertion errors (NI), the word deletion errors (ND) and the word substitution errors (NS). If N is the number of words in the utterance, then the standard recognizer performance measures are the accuracy, (N − ND − NS − NI)/N, and the correctness, (N − ND − NS)/N. For the present study we chose the Google Speech API (version 2) as the ASR system. Building a text-independent ASR system that can recognize any sentence in American English is a tough challenge, requiring careful and complicated signal processing and pattern recognition methods as well as an enormous amount of training data in order to obtain a well-generalized recognizer. We decided to use the Google ASR system, a state-of-the-art recognizer that is widely used in various Google projects such as Chromium and Android and is also freely accessible for public use. The output of the Google ASR system is the recognized text and a confidence score. The confidence score is a number ranging from 0 to 1, with higher values reflecting a higher degree of certainty about the recognizer output. We used full sentences from the PD database for this experiment. All the utterances in the database were processed by the Google ASR system and the output was used to compute the accuracy, correctness and confidence. The resulting three-dimensional features were used in the linear regression model with intelligibility scores as the target variable.
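As a concrete illustration of these measures, the sketch below aligns a reference/hypothesis word pair by minimum edit distance, counts substitutions, deletions and insertions, and forms the accuracy and correctness defined above. The example sentences are made up for illustration and are not from the PD database.

```python
# Sketch of recognizer scoring: count word substitutions (NS), deletions (ND)
# and insertions (NI) from a minimum-edit-distance alignment, then compute
# accuracy = (N - ND - NS - NI)/N and correctness = (N - ND - NS)/N.
def align_counts(ref, hyp):
    """Return (NS, ND, NI) for aligning hypothesis words to reference words."""
    m, n = len(ref), len(hyp)
    # cost[i][j] = (total edits, NS, ND, NI) for ref[:i] versus hyp[:j]
    cost = [[None] * (n + 1) for _ in range(m + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, m + 1):
        cost[i][0] = (i, 0, i, 0)                 # delete all reference words
    for j in range(1, n + 1):
        cost[0][j] = (j, 0, 0, j)                 # insert all hypothesis words
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if ref[i - 1] == hyp[j - 1]:
                cand = [cost[i - 1][j - 1]]       # match: no edit
            else:
                e, s, d, ins = cost[i - 1][j - 1]
                cand = [(e + 1, s + 1, d, ins)]   # substitution
            e, s, d, ins = cost[i - 1][j]
            cand.append((e + 1, s, d + 1, ins))   # deletion
            e, s, d, ins = cost[i][j - 1]
            cand.append((e + 1, s, d, ins + 1))   # insertion
            cost[i][j] = min(cand)                # fewest edits wins
    return cost[m][n][1:]

# hypothetical example utterance (not from the PD database)
ref = "the birch canoe slid on the smooth planks".split()
hyp = "the birch canoes slid on smooth planks".split()
NS, ND, NI = align_counts(ref, hyp)               # 1 substitution, 1 deletion
N = len(ref)
accuracy = (N - ND - NS - NI) / N                 # 6/8 = 0.75
correctness = (N - ND - NS) / N                   # 6/8 = 0.75
```

Since NI counts insertions, accuracy can be negative for very poor recognitions, while correctness is bounded below by 0.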


The resulting fit yielded a correlation of R = 0.76, which indicates that the selected features are not good proxies for the subjective speech intelligibility evaluations of our database.

5.2 Discussion

We performed three additional experiments to study the contributions of different sub-systems in the proposed intelligibility system. The first experiment involved the use of the DTW-based dissimilarity score (Equation (3)) for predicting the intelligibility of the Parkinson's word database. As described in Section 5.1.2, input speech waveforms were processed to extract the MFCC feature matrices. The DTW algorithm was then used to time-align the utterances using the dissimilarity matrix calculated from the MFCC matrices. The final dissimilarity score was used as the predictor variable in the regression analysis. Table 5-3 lists the results aggregated from 100 simulations. The results show the poor performance (R = 0.623) of the MFCC-based DTW dissimilarity score for predicting the subjective speech intelligibility evaluations of our database.

Table 5-3. Speech intelligibility prediction performance metrics for the DTW-based system calculated across 100 independent simulations.

Regression method   MSE (×10⁻³)   Slope   Offset    R       Std(R) (×10⁻²)
Linear              8.797         0.399   129.922   0.623   3.556

For the second experiment we used a setup similar to the proposed system as described in Section 5.1.2, but with a modified feature extraction step. The feature extraction method for this experiment eliminates the pitch analysis but retains the other sub-systems without any modifications (Figure 3-1). We averaged the correlation values of the matching frames across all modulation filterbank channels for every peripheral audio channel. The resulting feature vector is used to predict the intelligibility of the Parkinson's database as described in Section 5.1.3. Table 5-4 lists the results from 100 simulations. The overall performance is slightly worse than that of the proposed system. The ε-SV regression method yields the best overall results (R = 0.902), as before. These results demonstrate the usefulness of the pitch-strength-based frame grouping approach of the proposed system.
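The DTW baseline in the first experiment can be sketched as follows. This is a simplified illustration: random matrices stand in for real MFCC features, and the path-length normalization is an assumed convention rather than the exact score of Equation (3).

```python
# Simplified sketch of the DTW-based dissimilarity: time-align two MFCC
# feature matrices (frames x coefficients) by dynamic programming and return
# the accumulated frame dissimilarity along the optimal warping path.
import numpy as np

def dtw_dissimilarity(A, B):
    m, n = A.shape[0], B.shape[0]
    # local dissimilarity: Euclidean distance between every frame pair
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    D = np.full((m + 1, n + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # extend the cheapest of the three allowed predecessor paths
            D[i, j] = d[i - 1, j - 1] + min(D[i - 1, j], D[i, j - 1],
                                            D[i - 1, j - 1])
    return D[m, n] / (m + n)    # length normalization (an assumed convention)

rng = np.random.default_rng(1)
healthy = rng.normal(size=(40, 13))          # stand-in for healthy-speech MFCCs
patient = rng.normal(size=(55, 13))          # stand-in for patient-speech MFCCs
same = dtw_dissimilarity(healthy, healthy)   # identical utterances align at 0
score = dtw_dissimilarity(healthy, patient)  # positive for differing speech
```

The scalar score is then used as the single predictor variable in the regression analysis, as described above.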


This indicates that the different temporal regions of speech contribute differently to the perceived intelligibility, depending on the magnitude of their pitch strength.

Table 5-4. Speech intelligibility prediction performance metrics for the proposed system without pitch analysis, calculated across 100 independent simulations.

Regression method   MSE (×10⁻³)   Slope   Offset   R       Std(R) (×10⁻²)
Linear              2.215         0.733   54.547   0.849   1.519
GP                  1.637         0.758   51.405   0.897   1.246
ε-SV                1.494         0.777   48.328   0.902   1.261

For the third experiment we modified Jepsen's auditory model in the feature extraction step. The modulation filterbank was replaced by a first-order lowpass filter with a cutoff frequency of 8 Hz, as suggested in Dau et al. [10]; this filtering preserves all information about the modulation phase for low modulation frequencies. The rest of the experimental setup was identical to the proposed system as described in Section 5.1. The results from the 100 simulations are listed in Table 5-5; the overall performance is much worse than that of the proposed system. GP regression yields the best-performing result (R = 0.847), with similar performance from ε-SV regression. The results from this experiment are also worse than those from the previous experiment, where we had eliminated the pitch analysis step. These results demonstrate the importance of employing a modulation filterbank analysis. Jorgensen et al. [15] also use a modulation filterbank analysis for predicting the intelligibility of speech corrupted by noise. The results obtained here demonstrate that the temporal modulation-frequency selectivity based approach can account for the intelligibility of Parkinson's speech.

5.3 Future Work

The next step of the study would involve extending the current approach to include sentences and paragraphs. This would require a modification in the time alignment step (Figure 3-1), where the DTW algorithm would be replaced by an ASR-based phonetic transcription system.


Table 5-5. Speech intelligibility prediction performance metrics for the proposed system without the modulation filterbank, calculated across 100 independent simulations.

Regression method   MSE (×10⁻³)   Slope   Offset   R       Std(R) (×10⁻²)
Linear              2.551         0.721   69.427   0.817   1.479
GP                  2.257         0.680   60.510   0.847   1.223
ε-SV                2.301         0.708   58.116   0.843   1.236

The rest of the functional blocks of the system would work unmodified. The intelligibility prediction experiment has yielded encouraging results which support the validity of the proposed system. The performance of regression analysis looks promising for predicting speech intelligibility for disordered voices. The current database, however, includes only Parkinson's patients, and repeating the experiment with a database of other dysarthria-causing diseases would establish the usefulness of the proposed approach.

Another extension of the proposed system would involve predicting intelligibility scores on an ordinal scale. A type of supervised learning method called ranking or ordinal regression, where examples are labeled on an ordinal scale called the rank, can be used. The target (dependent) variable in the ranking problem belongs to a finite and discrete set of symbols; in our case it could represent labels like `highly intelligible', `moderately intelligible', `barely intelligible' and `totally unintelligible' as ratings for a speech utterance. The ratings have a natural order, which distinguishes ordinal regression from general multi-class classification. Note that SLPs often assign ordinal ratings to voice samples during subjective evaluations. Recently many algorithms have been proposed for ranking from a machine learning perspective [29-33]. We specifically plan to use the manifold method by Zhou et al. [29], Gaussian process ordinal regression by Chu and Ghahramani [32] and the SVM approach to ordinal regression using extended binary classification by Li and Lin [33].
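To make the ordinal-regression extension concrete, the sketch below illustrates the reduction of Li and Lin [33] in a heavily simplified form: K ordered labels are handled by K−1 binary classifiers answering "is the rank greater than k?", and the predicted rank is the number of positive answers. The data is synthetic and scikit-learn's logistic regression stands in for the SVM formulation of the original paper.

```python
# Illustrative (simplified) reduction of ordinal regression to binary
# classification: train K-1 "rank > k" classifiers and sum their votes.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4))
latent = X @ np.array([1.0, -0.5, 0.8, 0.3])       # synthetic latent score
# 4 ordered labels: 0 = 'totally unintelligible' ... 3 = 'highly intelligible'
y = np.digitize(latent, np.quantile(latent, [0.25, 0.5, 0.75]))

K = 4
clfs = [LogisticRegression().fit(X, (y > k).astype(int)) for k in range(K - 1)]

def predict_rank(Xq):
    votes = np.stack([c.predict(Xq) for c in clfs])  # (K-1, n) binary answers
    return votes.sum(axis=0)                          # rank = number of "yes"

pred = predict_rank(X)
train_acc = float(np.mean(pred == y))
```

The full method of Li and Lin additionally weights the binary examples and shares structure across the K−1 classifiers; the sketch above only conveys the reduction idea.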


Moreover, if we obtain conforming prediction results from the ordinal regression and metric regression approaches, then that would support the reliability of the computed scores.

5.4 Conclusion

In this dissertation we have proposed an automatic system for the evaluation of speech intelligibility for Parkinson's disease patients. The main goal of the system is to provide robust and reliable intelligibility ratings for pathological speech which mimic the scores from subjective evaluation. The proposed system consists of two stages, viz., feature extraction and scoring. As detailed in Chapter 3, we compute correlation-based features from spectro-temporal representations obtained from a computational model of the human auditory system. The next step involves the mapping of feature measurements to a rating value. We tackle this problem using the supervised learning technique of regression, where feature vectors are the regressors and the intelligibility score is the target variable. Regression analysis using linear and non-linear methods provides a powerful tool for predicting intelligibility scores using machine learning principles.

We conducted an experiment to study the effectiveness of the proposed system in predicting intelligibility, on a database comprising 160 sentences from 48 Parkinson's disease patients. Table 5-2 provides the performance results for intelligibility prediction using linear regression, GP regression and ε-SV regression. We obtained good prediction results, and the computed scores show high correlation (>0.9) with the perceptual ratings. These results indicate the suitability of the proposed system for automatic evaluation of speech intelligibility.


REFERENCES

[1] N. R. French and J. C. Steinberg, "Factors Governing the Intelligibility of Speech Sounds," The Journal of the Acoustical Society of America, vol. 19, no. 1, pp. 90, Jan. 1947.

[2] ANSI, "Methods for Calculation of the Speech Intelligibility Index," New York, Tech. Rep., 1997.

[3] H. J. M. Steeneken and T. Houtgast, "A physical method for measuring speech-transmission quality," The Journal of the Acoustical Society of America, vol. 67, no. 1, p. 318, 1980.

[4] K. S. Rhebergen, N. J. Versfeld, and W. A. Dreschler, "Extended speech intelligibility index for the prediction of the speech reception threshold in fluctuating noise," The Journal of the Acoustical Society of America, vol. 120, no. 6, p. 3988, 2006.

[5] J. M. Kates and K. H. Arehart, "Coherence and the speech intelligibility index," The Journal of the Acoustical Society of America, vol. 117, no. 4, p. 2224, 2005.

[6] R. L. Goldsworthy and J. E. Greenberg, "Analysis of speech-based speech transmission index methods with implications for nonlinear operations," The Journal of the Acoustical Society of America, vol. 116, no. 6, p. 3679, 2004.

[7] C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen, "An Algorithm for Intelligibility Prediction of Time-Frequency Weighted Noisy Speech," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 7, pp. 2125, Sep. 2011.

[8] J. Ma, Y. Hu, and P. C. Loizou, "Objective measures for predicting speech intelligibility in noisy conditions based on new band-importance functions," The Journal of the Acoustical Society of America, vol. 125, no. 5, pp. 3387, May 2009.

[9] I. Holube and B. Kollmeier, "Speech intelligibility prediction in hearing-impaired listeners based on a psychoacoustically motivated perception model," The Journal of the Acoustical Society of America, vol. 100, no. 3, pp. 1703, Sep. 1996.

[10] T. Dau, D. Puschel, and A. Kohlrausch, "A quantitative model of the "effective" signal processing in the auditory system. I. Model structure," The Journal of the Acoustical Society of America, vol. 99, no. 6, pp. 3615, Jun. 1996.

[11] J. F. Santos, S. Cosentino, O. Hazrati, P. C. Loizou, and T. H. Falk, "Objective speech intelligibility measurement for cochlear implant users in complex listening environments," Speech Communication, vol. 55, no. 7-8, pp. 815, Sep. 2013.
[12] J. B. Boldt and D. P. W. Ellis, "A simple correlation-based model of intelligibility for nonlinear speech enhancement and separation," Proc. EUSIPCO '09, 2009.


[13] C. Christiansen, M. S. Pedersen, and T. Dau, "Prediction of speech intelligibility based on an auditory preprocessing model," Speech Communication, vol. 52, no. 7-8, pp. 678, Jul. 2010.

[14] T. Dau, B. Kollmeier, and A. Kohlrausch, "Modeling auditory processing of amplitude modulation. I. Detection and masking with narrow-band carriers," The Journal of the Acoustical Society of America, vol. 102, no. 5, p. 2892, 1997.

[15] S. Jorgensen, S. D. Ewert, and T. Dau, "A multi-resolution envelope-power based model for speech intelligibility," The Journal of the Acoustical Society of America, vol. 134, no. 1, pp. 436, Jul. 2013.

[16] M. L. Jepsen, S. D. Ewert, and T. Dau, "A computational model of human auditory signal processing and perception," The Journal of the Acoustical Society of America, vol. 124, no. 1, pp. 422, 2008.

[17] A. Camacho and J. G. Harris, "A sawtooth waveform inspired pitch estimator for speech and music," The Journal of the Acoustical Society of America, vol. 124, no. 3, pp. 1638, 2008.

[18] E. A. Lopez-Poveda and R. Meddis, "A human nonlinear cochlear filterbank," The Journal of the Acoustical Society of America, vol. 110, no. 6, p. 3107, 2001.

[19] L. Rabiner and B. H. Juang, Fundamentals of Speech Recognition. Prentice Hall PTR, 1993.

[20] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning. Cambridge, MA, USA: MIT Press, Apr. 2006.

[21] C. M. Bishop, Pattern Recognition and Machine Learning, 1st ed. Springer, 2006.

[22] D. Nguyen and B. Widrow, "Improving the learning speed of 2-layer neural networks by choosing initial values of the adaptive weights," in International Joint Conference on Neural Networks. IEEE, 1990, pp. 21.

[23] S. Chen, C. N. Cowan, and P. M. Grant, "Orthogonal least squares learning algorithm for radial basis function networks," IEEE Transactions on Neural Networks, vol. 2, no. 2, pp. 302, Jan. 1991.

[24] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121, 1998.

[25] A. J. Smola and B. Scholkopf, "A tutorial on support vector regression," Tech. Rep. 3, Aug. 2004.
[26] J. L. Verhey, T. Dau, and B. Kollmeier, "Within-channel cues in comodulation masking release (CMR): experiments and model predictions using a modulation-filterbank model," The Journal of the Acoustical Society of America, vol. 106, no. 5, pp. 2733, Nov. 1999.


[27] C. Saunders, A. Gammerman, and V. Vovk, "Ridge regression learning algorithm in dual variables," in Proceedings of the Fifteenth International Conference on Machine Learning, ser. ICML '98. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1998, pp. 515.

[28] R. Fan, P. Chen, and C. Lin, "Working set selection using second order information for training support vector machines," The Journal of Machine Learning Research, vol. 6, pp. 1889, 2005.

[29] D. Zhou, J. Weston, A. Gretton, O. Bousquet, and B. Scholkopf, "Ranking on data manifolds," in Advances in Neural Information Processing Systems 16: Proceedings of the 2003 Conference. The MIT Press, 2004, p. 169.

[30] M. Breitenbach and G. Z. Grudic, "Clustering through ranking on manifolds," in Proceedings of the 22nd International Conference on Machine Learning - ICML '05. New York, New York, USA: ACM Press, 2005, pp. 73.

[31] W. Chu and S. S. Keerthi, "New approaches to support vector ordinal regression," in Proceedings of the 22nd International Conference on Machine Learning - ICML '05, pp. 145, 2005.

[32] W. Chu and Z. Ghahramani, "Gaussian processes for ordinal regression," Journal of Machine Learning Research, vol. 6, no. 1, p. 1019, 2006.

[33] L. Li and H.-T. Lin, "Ordinal regression by extended binary classification," Advances in Neural Information Processing Systems, vol. 19, p. 865, 2007.


BIOGRAPHICAL SKETCH

Savyasachi Singh was born in Lucknow, India in 1982 to Alpana and Swatantra Kumar Singh. He has one younger sister, Sonali Singh. Savyasachi received a Bachelor of Technology degree in Electronics and Communication Engineering from the Vellore Institute of Technology (VIT), Vellore, India in May 2005. Starting in fall 2005, Savyasachi began pursuing a Master of Science degree in the Department of Electrical and Computer Engineering at the University of Florida. Since fall 2006, Savyasachi has been a research assistant at the Computational Neuro-Engineering Laboratory (CNEL) at the University of Florida, working with Dr. John G. Harris on developing signal processing algorithms for the evaluation of disordered speech. His research interests include digital signal processing, speech processing and pattern recognition. He is also interested in computer programming and likes to learn new programming languages and paradigms. Savyasachi received his Master of Science (MS) degree from the University of Florida in 2007 and his Ph.D. degree in electrical engineering in August 2014, also from the University of Florida.