Simulation of human speech production applied to the study and synthesis of European Portuguese
Permanent Link: http://ufdc.ufl.edu/AA00008930/00001
 Material Information
Title: Simulation of human speech production applied to the study and synthesis of European Portuguese
Series Title: EURASIP Journal on Applied Signal Processing
Physical Description: Archival
Language: English
Creator: Teixeira, Antonio J. S.
Martinez, Roberto
Silva, Luis Nuno
Jesus, Luis M. T.
Principe, Jose C.
Vaz, Francisco A. C.
Publisher: BioMed Central
Hindawi Publishing Corporation
Publication Date: 2005
 Notes
Abstract: A new articulatory synthesizer (SAPWindows), with a modular and flexible design, is described. A comprehensive acoustic model and a new interactive glottal source were implemented. Perceptual tests and simulations made possible by the synthesizer contributed to deepening our knowledge of one of the most important characteristics of European Portuguese, the nasal vowels. First attempts at incorporating models of frication into the articulatory synthesizer are presented, demonstrating the potential of performing fricative synthesis based on broad articulatory configurations. Synthesis of nonsense words and Portuguese words with vowels and nasal consonants is also shown. Despite not being capable of competing with mainstream concatenative speech synthesis, the anthropomorphic approach to speech synthesis, known as articulatory synthesis, proved to be a valuable tool for phonetics research and teaching. This was particularly true for the European Portuguese nasal vowels.
General Note: Publication of this article was funded in part by the University of Florida Open-Access Publishing Fund. In addition, requestors receiving funding through the UFOAP project are expected to submit a post-review, final draft of the article to UF's institutional repository, IR@UF (www.uflib.ufl.edu/ufir), at the time of funding. The Institutional Repository at the University of Florida (IR@UF) is the digital archive for the intellectual output of the University of Florida community, including research, news, outreach, and educational materials.
 Record Information
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution.
Resource Identifier: doi - 1687-6180-2005-709592
System ID: AA00008930:00001



Full Text

EURASIP Journal on Applied Signal Processing 2005:9, 1435
© 2005 Hindawi Publishing Corporation

Simulation of Human Speech Production Applied to the Study and Synthesis of European Portuguese

António J. S. Teixeira
Instituto de Engenharia Electrónica e Telemática de Aveiro (IEETA), 3810-193 Aveiro, Portugal
Departamento de Electrónica e Telecomunicações, Universidade de Aveiro, 3810-193 Aveiro, Portugal
Email: ajst@det.ua.pt

Roberto Martinez
Instituto de Engenharia Electrónica e Telemática de Aveiro (IEETA), 3810-193 Aveiro, Portugal
Email: martinezrs@ieeta.pt

Luís Nuno Silva
Instituto de Engenharia Electrónica e Telemática de Aveiro (IEETA), 3810-193 Aveiro, Portugal
Email: lnors@ieeta.pt

Luís M. T. Jesus
Instituto de Engenharia Electrónica e Telemática de Aveiro (IEETA), 3810-193 Aveiro, Portugal
Escola Superior de Saúde, Universidade de Aveiro, 3810-193 Aveiro, Portugal
Email: lmtj@essua.ua.pt

José C. Príncipe
Computational Neuroengineering Laboratory (CNEL), University of Florida, Gainesville, FL 32611, USA
Email: principe@cnel.ufl.edu

Francisco A. C. Vaz
Instituto de Engenharia Electrónica e Telemática de Aveiro (IEETA), 3810-193 Aveiro, Portugal
Departamento de Electrónica e Telecomunicações, Universidade de Aveiro, 3810-193 Aveiro, Portugal
Email: fvaz@det.ua.pt

Received 29 October 2003; Revised 31 August 2004

A new articulatory synthesizer (SAPWindows), with a modular and flexible design, is described. A comprehensive acoustic model and a new interactive glottal source were implemented. Perceptual tests and simulations made possible by the synthesizer contributed to deepening our knowledge of one of the most important characteristics of European Portuguese, the nasal vowels. First attempts at incorporating models of frication into the articulatory synthesizer are presented, demonstrating the potential of performing fricative synthesis based on broad articulatory configurations. Synthesis of nonsense words and Portuguese words with vowels and nasal consonants is also shown. Despite not being capable of competing with mainstream concatenative speech synthesis, the anthropomorphic approach to speech synthesis, known as articulatory synthesis, proved to be a valuable tool for phonetics research and teaching. This was particularly true for the European Portuguese nasal vowels.

Keywords and phrases: articulatory synthesis, speech production, European Portuguese, nasal vowels, fricatives.

1. INTRODUCTION

Recent technological developments are characterized by increasing physical and psychological similarity to humans. One example is the well-known human-like robots. Being one of the distinct characteristics of humans, speech is a natural candidate for imitation by machines. Also, information can be transmitted very fast, and speech frees hands and eyes for other tasks.

Various designs of machines that produce and understand human speech have been available for a long time [1, 2]. The use of voice in computer systems interfaces will

be an added advantage, allowing, for example, the use of information systems by people with different disabilities and access by telephone to new information services. However, our current knowledge of the production and perception of voice is still incomplete. The quality (or lack of it) of the synthetic voice of currently available systems is a clear indication of the necessity to improve this knowledge [2].

There are two types of motivations for research in the vast domain of voice production and perception [3]. The first one aims at a deep understanding of its diverse aspects and functions; the second is the design and development of artificial systems. When artificial systems are closely related to the way humans do things, these two motivations can be merged. These systems contribute to an increased knowledge of the process, and this knowledge can be used to improve current systems.

We have been developing an articulatory synthesizer since 1995, which will hopefully produce high-quality synthetic European Portuguese (EP) speech. We aim at a simultaneous improvement of our synthesis quality (technological motivation) and an expansion of our knowledge of Portuguese production and perception.

2. ARTICULATORY SYNTHESIS

Articulatory synthesis generates the speech signal through modeling of the physical, anatomical, and physiological characteristics of the organs involved in human voice production. This is a different approach when compared with other techniques, such as formant synthesis [5]. In the articulatory approach, the system is modeled instead of the signal or its acoustic characteristics. Approaches based on the signal try to reproduce the signal of a natural voice as faithfully as possible, with few or no concerns about how it is produced. In contrast, a model based on the production system uses physical laws to describe the sound propagation in the vocal tract and models mechanical and aeroacoustic phenomena to describe the oscillation of the vocal folds.

2.1. Basic components of an articulatory synthesizer

To implement an articulatory synthesizer in a digital computer, a mathematical model of the vocal system is needed. Synthesizers usually include two subsystems: an anatomic-physiological model of the structures involved in voice production, and a model of the production and propagation of sound in these structures.

The first model transforms the positions of the articulators, like the jaw, tongue body, and velum, into cross-sectional areas of the vocal tract. The second model consists of a set of equations that describe the acoustic properties of the vocal tract system. Generally it is divided into submodels to simulate different phenomena such as the creation of a source of periodic excitation (vocal fold oscillation), sound sources caused by turbulent flow where constriction zones exist (areas sufficiently reduced along the vocal tract), propagation of the sound above and below the vocal folds, and radiation at the lips and/or nostrils.

The parameters for the models can be produced by different methods. They can be obtained directly from the voice signal by a process of inversion with optimization, be defined manually by the researcher, or be the output of the linguistic processing part of a TTS (text-to-speech) system.

2.2. Motivations

Articulatory synthesis has not received as much attention in recent years as it could have, because it is not yet an alternative to the synthesis systems currently used in TTS systems. This is due to different factors: the difficulty of obtaining information about the vocal tract and the vocal folds during the production of voice in humans; the fact that measurement techniques generally provide information regarding static configurations, while information concerning the dynamics of the articulators is incomplete; the lack, so far, of a full and reliable inversion process for obtaining the articulatory parameters from natural voice; and the complex calculations this technique involves, raising problems of stability in the numerical resolution.

Despite these limitations, articulatory synthesis presents some important advantages: the parameters of the synthesizer are directly related to the human articulatory mechanisms, being very useful in studies of production and perception of voice [6]; this method can produce high-quality nasal consonants and nasal vowels [7]; source-tract interaction, essential for a natural sound, can be conveniently modeled when simulating the vocal folds and the tract as one system [8]; the parameters vary slowly in time, so they can be used in efficient coding processes; the parameters are easier to interpolate than LPC and formant synthesizer parameters [9]; and small errors in the control signals do not generally produce low-quality speech sounds, because the interpolated values will always be physically possible.

According to Shadle and Damper [10], articulatory synthesis is clearly the best way to reproduce some attributes of speech we are interested in, such as the ability to sound like an extraordinary speaker (e.g., a singer, someone with disordered speech, or an alien with extra sinuses), and the ability to change to another speaker type, or to alter the voice quality of a given speaker, without going through as much effort as required for the first voice. Articulatory synthesizers have parameters that can be conceptualized, so that if a speech sample sounds wrong, intuition is useful in fixing it, always teaching us something and providing opportunities to learn more as we work to produce a commercially usable system.

Articulatory synthesis holds promise for overcoming some of the limitations and for sharpening our understanding of the production/perception link [11]. There is only partial knowledge about the dynamics of the speech signal, so continued research in this area is needed. The systematic study of coarticulation effects is of special importance for the development of experimental phonetics and the sciences related to the processing of voice [12]. An articulatory synthesizer can be used as a versatile speaker and therefore contribute to such studies. Articulatory synthesizers can generate

speech using carefully controlled conditions. This can be useful, for example, to test pitch-tracking algorithms [13].

The articulatory synthesizer can be combined with a speech production evaluation tool to develop a system that can produce real-time audio-visual feedback to help people with specific articulatory disorders. For example, computer-based speech therapy [14] of speakers with dysarthria tries to stabilize their production at syllable or word level, to improve the consistency of production. For severely hearing-impaired persons, the aim is to teach them new speech patterns and increase the intelligibility of their speech. For children with cleft lip and palate and velopharyngeal incompetence, the aim is to eliminate misarticulated speech patterns so that most of these speakers can achieve highly intelligible normal speech patterns.

Also, "the use of such a [articulatory] synthesizer has much to commend it in phonetic studies" [15]. The audio-visual feedback could be used as an assistant for teaching phonetics to foreign students to improve their speech quality. The synthesizer can be used to help teach characteristic features of a given language such as pitch level and vowel space [16].

Recent developments presented at the ICPhS [11] show that articulatory synthesis is worth revisiting as a research tool and as a part of TTS systems. Better ways of measuring vocal tract configurations, an increased research interest in the visual representation of speech, and the use of simpler control structures have renewed the interest in this research area [11]. Current articulatory approaches to synthesis include an open-source infrastructure that can be used to combine different models [17], recent developments in the Haskins configurable articulatory synthesizer CASY [18], the characterization of lip movements [19], the ICP virtual talking head that includes articulatory, aerodynamic, and acoustic models of speech [20], and the quasiarticulatory (articulatory parameters controlling a formant synthesizer) approach of Stevens and Hanson [21].

3. SAPWINDOWS ARTICULATORY SYNTHESIZER

Object-oriented programming was used to implement the synthesizer. The model-view-controller concept was adopted to separate models from their controls and viewers. The application, developed using Microsoft Visual C++, can synthesize speech segments from parameter sequences. These sequences can be defined in a data file or edited by the user. The synthesis process is presented step by step on a graphical interface.

Presently, the implemented models allow quality synthesis only of vowels (oral or nasal), nasal consonants, and fricatives. The next sections briefly present the currently implemented models.

3.1. Anatomic models

For nonnasal sounds, we only have to consider the vocal tract, that is, a variable-area tube between the glottis and the lips. For nasal sounds, we also have to consider the nasal tract. The nasal tract area is essentially constant, with the exception of the soft palate region. The vocal tract varies continually and its form must be specified in intervals shorter than a few milliseconds [23].

Figure 1: Vocal tract model, based on Mermelstein's model [22]. The articulators shown are the jaw, tongue body, tongue tip, lips (opening and protrusion), hyoid, and velum.

3.1.1. Vocal tract model

The proposed anatomic model, shown in Figure 1, assumes midsagittal plane symmetry to estimate the vocal tract cross-sectional area. Model articulators are tongue body, tongue tip, jaw, lips, velum, and hyoid. Our model is an improved version of the University of Florida MMIRC model [24], which in turn was a modified version of Mermelstein's model [22]. It uses a nonregular grid to estimate section areas and lengths.

3.1.2. Nasal tract model

The model of the nasal tract allows the inclusion of different nasal tract shapes and several paranasal sinuses. The nasal cavity is modeled in a similar way to the oral tract and can be considered as a side branch of the vocal tract.

The major difference is that the area function of the nasal tract is fixed for the most part of the nasal tract, for a particular speaker. The variable region, the soft palate, changes with the degree of nasal coupling. The velum parameter of the articulatory model controls this coupling. RLC shunt circuits, representing Helmholtz resonators, simulate the paranasal sinuses [7].

Our synthesizer allows the definition of different tract shapes and the inclusion of the needed sinuses at any position

by simply editing an ASCII file. Also, blocking of the nasal passages at any position can be simulated by defining a null area section at the point of occlusion. Implementation details were reported in [25].

In most of our studies, we use the nasal tract dimensions from [26], as shown in Figure 2, which were based on studies by Dang and Honda [27] and Stevens [28].

Figure 2: Default nasal model based on [26].

3.2. Interactive glottal source model

We designed a glottal excitation model that included source-tract interaction, for oral and nasal sounds [29], that allowed direct control of source parameters, such as fundamental frequency, and that was not too demanding computationally. The interactive source model we developed was based on [30]. The model was extended to include a two-mass parametric model of the glottal area, jitter, shimmer, aspiration, and the ability to synthesize dynamic configurations.

To calculate the glottal excitation, u_g(t), it became necessary to model the subsystems involved: the lungs, the subglottal cavities, the glottis, and the supraglottal tract.

The role of the lungs is the production of a quasiconstant pressure source, modeled as a pressure source p_l in series with the resistance R_l. To represent the subglottal region, including the trachea, we used three RLC resonant circuits [31].

Several approaches have been used for vocal fold modeling: self-oscillating models, parametric glottal area models, and so forth. We wanted to have a physiological model, like the two-mass model, that resulted in high-quality synthesis, but at the same time a model not too demanding computationally. Also, direct control of parameters such as F_0 was required. We therefore chose the model proposed by Prado [24], which directly parameterizes the two glottal areas. In the model, R_g and L_g, which depend on glottal aperture, represent the vocal folds.

Systems above the glottis were modeled by the tract input impedance, z_in(t), obtained from the acoustic model. This approach results in an accurate modeling of frequency-dependent losses.

The various subsystems can be represented by the equivalent circuit shown in Figure 3.

Pressure variation along the circuit can be represented by

p_l − R_l·u_g(t) − Σ_{i=1}^{3} p_sg,i − d[L_g·u_g(t)]/dt − R_g·u_g(t) − p_s(t) = 0.  (1)

Figure 3: Electrical analogue of the implemented glottal source. Adapted from [32].

The glottal source model includes parameters needed to model F_0 and glottal aperture perturbations, known as jitter and shimmer. The model also takes into account the aspiration noise generation as proposed by Sondhi and Schroeter [23].

Table 1: Glottal source time-varying parameters.

Parameter   Description             Typical value   Unit
p_l         Lungs pressure          10000           dyne/cm²
F_0         Fundamental frequency   100             Hz
OQ          Open quotient           60              % of T_0
SQ          Speed quotient          2
A_g0        Minimum glottal area    0               cm²
A_gmax      Maximum glottal area    0.3             cm²
A_2 − A_1   Slope                   0.03            cm²
Jitter      F_0 perturbation        2               %
Shimmer     A_gmax perturbation     5               %
Asp         Aspiration

Our source model is controlled by two kinds of parameters. The first type of parameters can vary in time, having a role similar to the tract parameters. In the synthesis process, these parameters can be used to control intonation, voice quality, and related phenomena. They are presented in Table 1. The second type of source parameters (including lung resistance, glottis dimensions, etc.) does not vary in time. Their values can be altered by editing a configuration file.
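The time-varying parameters of Table 1 map naturally onto a small parametric pulse generator. The sketch below is ours, not the synthesizer's code: it builds a piecewise-linear glottal-area train from F_0, OQ, SQ, jitter, and shimmer, under the assumption that SQ is the ratio of rise to fall duration within the open phase.

```python
import random

def glottal_area(f0=100.0, oq=0.6, sq=2.0, ag0=0.0, agmax=0.3,
                 jitter=0.02, shimmer=0.05, n_cycles=5, fs=10000, seed=1):
    """Piecewise-linear glottal area pulses, A_g in cm^2.

    Per cycle: a closed phase of (1-OQ)*T0, then an open phase whose
    rising and falling parts are in the ratio SQ (speed quotient).
    Jitter perturbs the period, shimmer perturbs the peak area.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_cycles):
        t0 = (1.0 / f0) * (1.0 + jitter * rng.uniform(-1, 1))
        peak = agmax * (1.0 + shimmer * rng.uniform(-1, 1))
        n = round(t0 * fs)
        n_open = round(oq * n)
        n_rise = round(n_open * sq / (sq + 1.0))  # SQ = rise/fall duration ratio
        for i in range(n):
            if i >= n - n_open:                   # open phase at end of cycle
                j = i - (n - n_open)
                if j < n_rise:
                    a = ag0 + (peak - ag0) * j / max(n_rise, 1)
                else:
                    a = peak - (peak - ag0) * (j - n_rise) / max(n_open - n_rise, 1)
            else:
                a = ag0                           # closed phase
            samples.append(a)
    return samples

area = glottal_area()
print(len(area), max(area))  # peak stays within the 5% shimmer band around A_gmax
```

The two-mass parametric area model of Prado [24] is more elaborate than this; the sketch only illustrates how the Table 1 parameters shape each cycle.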
3.3. Acoustic model

Several techniques have been proposed for the simulation of sound propagation in the oral and nasal tracts [33]: direct numeric solution of the equations; time-domain simulation using wave digital filters (WDF), also known as the Kelly-Lochbaum model; and frequency-domain simulation. After analyzing the pros and cons of these three approaches, we chose the frequency-domain technique for our first implementation of the acoustic model. The main reason for this choice was the possibility of easily including the frequency-dependent losses.

In our acoustic model, we made the following approximations: propagation is assumed planar; the tract is straight; and the tube is approximated by the concatenation of elementary acoustic tubes of constant area. An equivalent circuit, represented by a transmission matrix, models each one of these elementary tubes. Analysis of the circuit is performed in the frequency domain [9].
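The tube-concatenation idea can be sketched by chaining 2x2 transmission (ABCD) matrices, one per elementary tube. The synthesizer itself uses lossy equivalent circuits; the lossless uniform-tube matrices below, and all names, are simplifying assumptions for illustration only.

```python
import cmath

RHO, C = 1.14e-3, 3.5e4  # air density (g/cm^3) and sound speed (cm/s), approx.

def tube_matrix(area_cm2, length_cm, f_hz):
    """Lossless ABCD (chain) matrix of one uniform acoustic tube."""
    k = 2 * cmath.pi * f_hz / C      # wavenumber
    z0 = RHO * C / area_cm2          # characteristic impedance of the tube
    kl = k * length_cm
    return (cmath.cos(kl), 1j * z0 * cmath.sin(kl),
            1j * cmath.sin(kl) / z0, cmath.cos(kl))

def chain(areas, lengths, f_hz):
    """Multiply the per-tube matrices, glottis to lips."""
    A, B, Cm, D = 1, 0, 0, 1
    for s, l in zip(areas, lengths):
        a, b, c, d = tube_matrix(s, l, f_hz)
        A, B, Cm, D = A * a + B * c, A * b + B * d, Cm * a + D * c, Cm * b + D * d
    return A, B, Cm, D

# Uniform 17.5 cm, 5 cm^2 tract, closed at the glottis and open at the lips:
# with an ideal open end, U_lips/U_glottis = 1/D, so |D| -> 0 near a resonance.
areas, lengths = [5.0] * 10, [1.75] * 10
for f in (500, 1000):
    A, B, Cm, D = chain(areas, lengths, f)
    print(f, abs(D))
```

For this uniform tube the odd quarter-wave resonances sit near 500 Hz, 1500 Hz, and so on, so |D| collapses at 500 Hz and returns to about 1 at 1000 Hz, which is the behaviour a chain-matrix analysis exploits to locate formants.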

Figure 4: Matrices and impedances involved in the calculation of the transfer function H_gn, between the glottis and a constriction point, which in turn is used in the calculation of the flow at the noise source location.

Speech is generated by the acoustic model. We use a frequency-domain analysis and time-domain synthesis method usually designated as the hybrid method [9]. The use of the convolution method avoids the problem of continuity of resonance in the faster method proposed by Lin [34]. The use of a fast implementation of the IFFT (the MIT FFTW [35]) minimizes the convolution calculation time. A similar procedure is applied to the input impedance (Z_in), in order to obtain z_in(n), needed for the source-tract interaction modeling by the glottal source model.

3.4. Acoustic model for fricatives

The volume velocity at a constriction is obtained by the convolution of the glottal flow with the impulse response calculated, using an IFFT, from the transfer function between the glottis and the constriction point, H_gn (see Figure 4).
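The hybrid method condenses to three steps: sample a transfer function on a frequency grid, inverse-FFT it to an impulse response, and convolve with the excitation in the time domain. The sketch below substitutes a toy one-resonance transfer function for the real H_gn; everything here is illustrative.

```python
import numpy as np

fs, nfft = 10000, 512
f = np.fft.rfftfreq(nfft, d=1.0 / fs)

# Toy stand-in for a tract transfer function: one resonance near 500 Hz
f0, bw = 500.0, 50.0
pole = np.exp(-np.pi * bw / fs) * np.exp(2j * np.pi * f0 / fs)
z = np.exp(2j * np.pi * f / fs)
H = 1.0 / ((1 - pole / z) * (1 - np.conj(pole) / z))

h = np.fft.irfft(H, nfft)      # impulse response via inverse FFT
u_g = np.zeros(1024)
u_g[::100] = 1.0               # crude 100 Hz pulse train as "glottal flow"
p_out = np.convolve(u_g, h)    # time-domain synthesis by convolution
print(p_out.shape)
```

The real model computes H_gn from the chain matrices of Figure 4 rather than from a pole pair, but the IFFT-plus-convolution step is the same.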
3.4.1. Noise sources

Fluctuations in the velocity of airflow emerging from a constriction (at an abrupt termination of a tube) create monopole sources, and fluctuations of the forces exerted by an obstacle (e.g., teeth, lips) or surface (e.g., palate) oriented normal to the flow generate dipole sources. Since dipole sources have been shown to be the most influential in the fricative spectra [36], the noise source of the fricatives has only been approximated by equivalent pressure (voltage) dipole sources in the transmission-line model. Nevertheless, it is also possible to insert the appropriate monopole sources, which contribute to the low-frequency amplitude and can be modeled by an equivalent current (volume velocity) source.

Frication noise is generated at the vocal tract according to the suggestions of Flanagan [37], and Sondhi and Schroeter [9]. A noise source can be introduced automatically at any T-section of the vocal tract network, between the velum and the lips. The synthesizer's articulatory module registers which vocal tract tube cross-sectional areas are below a certain threshold (A < 0.2 cm²), producing a list of tube sections that might be part of an oral constriction that generates turbulence.

The acoustic module calculates the Reynolds number (Re) at the sections selected by the articulatory module and activates noise sources at tube sections where the Reynolds number is above a critical value (Re_crit = 2000, according to [9]). Noise sources can also be inserted at any location in the vocal tract, based on additional information about the distribution and characteristics of sources [36, 38]. This is a different source placement strategy from that usually used in articulatory synthesis [9], where the sources are primarily located in the vicinity of the constriction. The distributed nature of some noise sources can be modeled by inserting several sources located in consecutive vocal tract sections. This will allow us to try combinations of the canonical source types (monopole, dipole, and quadrupole).

A pressure source with amplitude proportional to the squared Reynolds number,

P_noise = 2×10⁻⁶ · rand · (Re² − Re²_crit),  Re > Re_crit,
P_noise = 0,  Re ≤ Re_crit,  (2)

is activated at the correct place in the tract [9, 37]. The internal resistance of the noise pressure source is proportional to the volume velocity at the constriction: R_noise = ρ|U_c|/(2A_c²), where ρ is the density of the air, U_c is the flow at the constriction, and A_c is the constriction cross-sectional area. The turbulent flow can be calculated by dividing the noise pressure by the source resistance. This noise flow could also be filtered in the time domain to shape the noise spectrum [36] and test various experimentally derived dipole spectra.

3.4.2. Propagation and radiation

The general problem associated with having N noise sources is decomposed into N simple problems by using the superposition principle. In order to calculate the radiated pressure at the lips due to each noise source, the vocal tract is divided into the following three sections: pharyngeal, the region between the velum coupling point and the noise source, and the region after the source. Data structures based on the area function of each section are defined and ABCD matrices calculated [9]. The ABCD matrices are then used to calculate the downstream (Z_1) and upstream (Z_2) input impedances, as well as the transfer function, H, given by

H = [Z_1/(Z_1 + Z_2)] · 1/(C·Z_rad + D),  (3)

where C and D are parameters from the ABCD matrix (from noise source to lips), and Z_rad is the lip radiation impedance. The radiated pressure at the lips due to a specific source is given by p_radiated(n) = h(n) * u_noise(n), where h(n) = IFFT(H). The output sound pressures due to the different noise sources are added together. The output sound pressure resulting from the excitation of the vocal tract by a glottal source is also added when there is voicing.

4. RESULTS

In this section, we present examples of simulation experiments performed with the synthesizer and two perceptual studies regarding European Portuguese nasal vowels.
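Before moving to the results, the noise-generation rules of Section 3.4.1 (candidate sections with A < 0.2 cm², activation when Re > Re_crit = 2000, amplitude from equation (2), and the source resistance R_noise) can be put together in a short sketch. The paper does not spell out its Reynolds-number formula, so the characteristic length √A_c and the viscosity value used here are our assumptions:

```python
import math

RHO = 1.14e-3   # air density, g/cm^3
NU = 0.15       # kinematic viscosity of air, cm^2/s (approx., our assumption)
A_THRESH = 0.2  # cm^2, constriction-candidate threshold from the text
RE_CRIT = 2000.0

def reynolds(u_c, a_c):
    """Re at a constriction; characteristic length sqrt(A_c) is assumed."""
    velocity = u_c / a_c  # cm/s from volume velocity (cm^3/s)
    return abs(velocity) * math.sqrt(a_c) / NU

def noise_sources(areas, u_c, rand=1.0):
    """Return (section index, noise pressure, source resistance) triples.

    Articulatory module: sections with area below A_THRESH are candidates.
    Acoustic module: a source turns on where Re exceeds RE_CRIT, with
    amplitude from eq. (2) and resistance R_noise = rho*|U_c|/(2*A_c^2).
    """
    out = []
    for i, a in enumerate(areas):
        if a >= A_THRESH:
            continue
        re = reynolds(u_c, a)
        if re <= RE_CRIT:
            continue
        p_noise = 2e-6 * rand * (re**2 - RE_CRIT**2)  # eq. (2)
        r_noise = RHO * abs(u_c) / (2.0 * a**2)
        out.append((i, p_noise, r_noise))
    return out

# A narrow 0.1 cm^2 section carrying 300 cm^3/s of flow turns its source on
print(noise_sources([3.0, 0.1, 1.5], u_c=300.0))
```

Dividing each returned noise pressure by its resistance gives the turbulent flow, as described in the text.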

Figure 5: Movement of the velum and oral articulators for a nasal vowel between two stop consonants (CṼC context). The three phases of a nasal vowel in this context are shown.

We start with the description of the perceptual tests; then, recent results in fricative synthesis; finally, examples of produced words and quality tests are presented.

4.1. Nasal vowels studies

The synthesizer was used to produce stimuli for several perceptual tests, most of them for studies of nasal vowels. Next, we present two representative studies: the first investigating the effect of velum, and other oral articulators, variation over time; the second addressing the source-tract interaction effects in nasal vowels.

Experiment 1. Study of the influence of velum variation on the perception of nasal vowels in CṼC contexts [39].

Several studies point to the need of regarding speech as a dynamic phenomenon. The influence of dynamic information on oral vowel perception has been a subject of study for many years. In addition, some researchers also see nasal vowels as dynamic. To produce high-quality synthetic nasal vowels, it would be useful to know to what extent we need to include dynamic information.

We investigated whether, to produce a good-quality Portuguese nasal vowel, it is enough to couple the nasal tract, or whether varying the degree of coupling in time improves quality. The null hypothesis is that static and dynamic velum movements will produce stimuli of similar quality.

Our first tests addressed the CṼC context, nasal vowels between stops, the most common context for nasal vowels in Portuguese.

The velum and oral passage aperture variation for a nasal vowel produced between stop consonants is represented schematically in Figure 5. During the first stop consonant, the nasal and oral passages are closed. The beginning of the nasal vowel coincides with the release of the oral occlusion. To produce the nasal vowel, both the oral passage and the velum must be open. Possibly due to the slow speed of velum movements, in European Portuguese there is a period of time where the oral passage is open and the velum is in a closed, or almost closed, position, producing a sound with oral vowel characteristics, represented in Figure 5 by a V. The velum continues its opening movement, creating simultaneous sound propagation in the oral and nasal tracts. This zone is represented by Vn. The oral passage must close for the following stop consonant, so the early oral closure (before the velar closure) creates a zone with only nasal radiation, represented by N. The place of articulation of this nasal consonant, created by coarticulation, is the same as that of the following stop.

Stimuli

For this experiment, 3 variants of each of the 5 EP nasal vowels were produced, differing in the way velum movement was modeled. For the first variant, called static, the velum was open at a fixed value during all of the vowel production. The other two variants used time-varying velum opening. In the first 100 milliseconds, the velum stayed closed, making an opening transition in 60 milliseconds to the maximum aperture, and then remaining open. In one of these variants, a final bilabial stop consonant, [m], was created at the end by lip closure at 250 milliseconds. All stimuli had a fixed duration of 300 milliseconds.

Listeners

A total of 11 (9 male and 2 female) European Portuguese native speakers participated in the test. They had no history of speech, hearing, or language impairments.

Procedure

We used a paired comparison test [40, page 361], because we were analysing the synthesis quality, despite the demand for more decisions by each listener, which also increases test duration. The question answered by listeners was as follows: which of the two stimuli do you prefer as a European Portuguese nasal vowel? In preparing the test, we noticed that listeners had, in some cases, difficulty in choosing the preferred stimulus. The causes were traced to either good or poor quality of both stimuli. To handle this situation, we added two new possibilities, for a total of four possible answers: first, second, both, and none.

The test was divided into two parts. In the first part, we compared static versus dynamic velum stimuli. In the second part, comparison was made between dynamic stimuli with and without a final bilabial nasal consonant. Stimuli were presented 5 times in both AB and BA order. The interstimuli interval was 600 milliseconds.

The results for each possible pair of stimuli in the test were checked for listener consistency. They were retained if the listener preferred one stimulus in more than 70% of the presentations. Only clear choices of one stimulus against others were analyzed.

Results

Variable velum preferred to static velum. Preference scores (percentage of the designated stimuli chosen as the preferred one) for fixed velum aperture, variable velum aperture, and the difference between the two are presented in the box plots of Figure 6.

Clearly, listeners preferred stimuli with time-variable velum aperture. The average preference, including all vowels and listeners, was as high as 71.8%. The confidence interval (CI, p = 0.95) for the difference in preference score was between 24.2 and 65.6%, in favour of the variable velum case.

Repeated measures ANOVA showed a significant velum variation effect [F(1, 10) = 5.67, p < 0.05] and a nonsignificant (p > 0.05) vowel effect and interaction between the two main factors (vowel and velum variation).
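The consistency screening used above (retain a listener's data for a pair only when one stimulus is preferred in more than 70% of its presentations) is simple to state in code; the response encoding below is invented for illustration:

```python
def retain(responses, threshold=0.7):
    """Keep a listener's judgments for a stimulus pair only if one stimulus
    ('A' or 'B') was chosen in more than `threshold` of the presentations.
    'both'/'none' answers count toward the total but toward neither stimulus."""
    n = len(responses)
    for choice in ("A", "B"):
        if responses.count(choice) / n > threshold:
            return choice
    return None  # no clear choice: discard this pair for this listener

# 10 presentations (5 AB + 5 BA) of one stimulus pair for one listener
print(retain(["A"] * 8 + ["both", "B"]))  # -> 'A' (8/10 = 80% > 70%)
print(retain(["A"] * 6 + ["B"] * 4))      # -> None (no clear choice)
```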

Figure 6: Box plots of the preference scores for the first part of the perceptual test for nasal vowels in CṼC context, comparing stimuli with fixed and variable velum apertures, showing the effect of the velum aperture variation.

Figure 7: Box plots of the preference scores for the second part of the perceptual test for nasal vowels in CṼC context, comparing stimuli with and without a final nasal consonant, showing the effect of the final nasal consonant.

Nasal consonant at nasal vowel end was preferred. In general, listeners preferred stimuli ending in a nasal consonant. Looking at the preference scores represented graphically in Figure 7, stimuli with a final nasal consonant were preferred more than stimuli without the final consonant. The confidence interval (CI, p = 0.95) for the difference in preference score was between 36.1 and 87.0%, in favour of the stimuli with a final nasal consonant.

Figure 8: Glottal wave of 3 variants of the vowel [ĩ]: (a) without tract load (no interaction); (b) with total tract load; (c) with tract input impedance calculated discarding the nasal tract input impedance.

Figure 9: Input impedance for the vowel [ĩ], with and without the nasal tract input impedance.

ANOVA results, with two main factors, confirmed a significant effect of the final nasal consonant [F(1, 8) = 9.5, p < 0.05] and a nonsignificant (p > 0.05) vowel effect and interaction between the main factors.

Experiment 2. Study of source-tract interaction for nasal vowels [29].

We investigated whether the extra coupling of the nasal tract in nasal vowels produced identifiable alterations in the glottal source due to source-tract interaction, and whether modeling of such effects resulted in more natural-quality synthetic speech. Figure 8 depicts the effect of the 3 different input impedances on the nasal vowel [ĩ]. The nasal tract load has a great influence on the glottal source wave, because of the noticeable difference in the input impedance calculated with or without the nasal tract input impedance, shown in Figure 9.

This difference is due to the fact that for high vowels such as [ĩ], the impedance load for the pharyngeal region, which is equal to the parallel of the oral cavity and nasal tract input impedances, is almost equal to the nasal input impedance (see Figure 10). The effect is less noticeable in low vowels, such as [ɐ̃].

Stimuli

Stimuli were produced for the EP nasal vowels varying only one factor: the input impedance of the tract used by the

1442EURASIPJournalonAppliedSignalProcessing Z in nasal Z in oralParallel0100020003000400050000 20 40 60 80 Figure10:Inputimpedancesinthevelumregionfornasalvowel[ TheFigurepresentstheoralinputimpedanceZ in oral),thenasaltractinputimpedanceZ in nasal),andtheequivalentparal-lelimpedance(parallel).Theparallelimpedanceis,forthisvowel,approximatelyequaltothenasaltractinputimpedance.interactivesourcemodel.Thisfactorhad3values:1)inputimpedanceincludingthee ectofallsupraglottalcavities;(2)inputimpedancecalculatedwithouttakingintoaccountthenasaltractcoupling;or3)notractload.Only3vowels,[ 5 ], [ and[ u],wereconsideredtoreducetestrealizationtime.Thesametimingwasusedforallvowels.Intherst100milliseconds,thevelumstayedclosed,makinganopen-ingtransitionin60millisecondstothemaximumvalue.Thevelumremainedatthismaximumuntiltheendofthevowel.Thestimuliendedwithanasalconsonant,abilabial[m],producedbyclosingthelips.Closingmovementofthelipsstartedat200millisecondsandended50millisecondslater.Stimulusdurationwasxedat300millisecondsforallvowels.ThesechoiceswerebasedontheresultsoftheExperiment1,wheredynamicvelumstimuliwerepreferred.TheinteractivesourcemodelwasusedwithvariableF 0 F 0 startsaround100Hz,raisesto120Hzintherst100mil-liseconds,andthengraduallygoesbackdownto100Hz.Theopenquotientwas60%andthespeedquotient2.Jitterandshimmerwereaddedtoimprovenaturalness.Listeners Atotalof14,11malesand3femalesEuropeanPortuguesenativespeakersparticipatedinthetest.Theyhadnohistoryofspeech,hearing,orlanguageimpairments.ProcedureA4IAXfour-intervalforced-choicediscriminationtestwasperformedtoinvestigateiflistenerswereabletoperceivechangesintheglottalexcitationcausedbytheadditionalcou-plingofthenasaltract.The4IAXtestwaschosen,insteadofthemorecommonlyusedABXtest,becausebetterdiscriminationresultshavebeenreportedwiththistypeofperceptualtest[4 ]. 
In the 4IAX paradigm, listeners hear two pairs of stimuli, with a small interval in between. The members of one pair are the same (AA); the members of the other pair are different (AB). Listeners have to decide which of the two pairs has different stimuli.

Table 2: Results of the 4IAX test (percentage of correct answers).

Listener  Sex   [ɐ̃]     [ĩ]      [ũ]     Average
1         M     50.0    33.3    41.7    41.7
2         M     58.3    100.0   50.0    69.4
3         F     50.0    41.7    50.0    47.2
4         F     33.3    83.3    66.7    61.0
5         M     16.7    58.3    33.3    36.1
6         M     66.7    66.7    66.7    66.7
7         M     50.0    50.0    41.7    47.2
8         M     58.3    58.3    41.7    52.8
9         F     41.7    50.0    66.7    52.7
10        M     58.3    50.0    58.3    55.6
11        M     33.3    83.3    58.3    58.3
12        M     75.0    58.3    58.3    63.9
13        M     50.0    41.7    33.3    41.4
14        M     83.3    50.0    58.3    63.8
Average         51.8    58.9    51.8    54.1
Std             17.3    18.6    11.9    10.3

Signals were presented over headphones in rooms with low ambient noise. Each of the 4 combinations (ABAA, ABBB, AAAB, and BBAB) was presented 3 times in a random order. With this arrangement, each pair to be tested appears 12 times. The order was different for each listener. The interstimulus interval was 400 milliseconds and the interpair interval was 700 milliseconds.

Results

Table 2 shows the percentage of correct answers for the 4IAX test. The table presents results for each listener and vowel. The statistics (mean and standard deviation) for each vowel, and for the 3 vowels together, are presented at the bottom of the table. Results are condensed, in graphical form, in Figure 11.

From the table and the box plots, it is clear that the listeners' correct answers were close to 50%, being a little higher for the nasal vowel [ĩ]. These results indicate that the stimulus differences are difficult for listeners to perceive. Statistical tests, having as null hypothesis H0: μ = 50 and alternative H1: μ > 50, were only significant, at a 5% level of significance, for [ĩ]. For this vowel, the 95% confidence interval for the mean was between 50.1 and 67.7. For [ɐ̃] we obtained p = 0.36 and for [ũ], p = 0.29. For the 3 vowels considered together, the average was also not significantly superior to 50% (p = 0.08).
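The per-vowel significance pattern above can be reproduced, to rounding, from the listener scores in Table 2 with a one-sided one-sample t test against the 50% chance level. The paper does not name the exact test used, so this sketch rests on that assumption; the data layout and function names are ours.

```python
# Sketch of the significance analysis: H0: mu = 50% vs H1: mu > 50% over the
# 14 listeners' scores of Table 2. The paper does not name the test used;
# a one-sided one-sample t test is assumed here.
from math import sqrt
from statistics import mean, stdev

scores = {  # percentage of correct answers per listener (Table 2),
            # keys in SAMPA-like notation for the vowels [ɐ̃], [ĩ], [ũ]
    "6~": [50.0, 58.3, 50.0, 33.3, 16.7, 66.7, 50.0,
           58.3, 41.7, 58.3, 33.3, 75.0, 50.0, 83.3],
    "i~": [33.3, 100.0, 41.7, 83.3, 58.3, 66.7, 50.0,
           58.3, 50.0, 50.0, 83.3, 58.3, 41.7, 50.0],
    "u~": [41.7, 50.0, 50.0, 66.7, 33.3, 66.7, 41.7,
           41.7, 66.7, 58.3, 58.3, 58.3, 33.3, 58.3],
}

T_CRIT = 1.771  # one-sided 5% critical value of Student's t with 13 df

def t_stat(xs, mu0=50.0):
    """One-sample t statistic against the chance level mu0."""
    return (mean(xs) - mu0) / (stdev(xs) / sqrt(len(xs)))

for vowel, xs in scores.items():
    t = t_stat(xs)
    print(vowel, round(t, 2), "significant" if t > T_CRIT else "n.s.")
```

Only [ĩ] comes out significant (t ≈ 1.79, just above the critical value 1.771), in line with the result reported above; the other two vowels stay well below the threshold.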
Discussion

Simulations showed some small effects of the nasal tract load on the time and frequency properties of the glottal wave. Results of the perceptual tests, conducted to study to what extent these alterations were perceived by listeners, supported the idea that these changes are hardly perceptible. These results agree with results reported in [41]. In their work, Titze and Story reported that "An open nasal port ... showed no measurable effect on oscillation threshold pressure or glottal flow."


Speech Production Simulation Applied to European Portuguese

Figure 11: Box plot of the 4IAX discrimination test results ([ɐ̃], [ĩ], [ũ]; correct discrimination, %), for evaluation of the listeners' ability to perceive the effects of source-tract interaction on nasal vowels.

There is, however, a tendency for the effect of the interaction to be more perceptible for the high vowel [ĩ], produced with a reduced vocal cavity. Our simulation results suggest, as an explanation for this difference, the relation between the nasal tract input impedance and the impedance of the vocal cavity at the nasal tract coupling point.

4.2. Fricatives

In a first experiment, the synthesizer was used to produce sustained unvoiced fricatives [42]. The vocal tract configuration, derived from a natural high vowel, was adjusted by raising the tongue tip in order to produce a sequence of reduced vocal tract cross-sectional areas. The lung pressure was linearly increased and decreased at the beginning and end of the utterance, to produce a gradual onset and offset of the glottal flow.

The second goal was to synthesize fricatives in VCV sequences [42]. Articulatory configurations for vowels were obtained by inversion [43]. The fricative segment was obtained by manual adjustment of articulatory parameters. For example, to define a palato-alveolar fricative configuration for the fricative in [iʃi], we used the configuration of vowel [i] and only changed the tongue tip articulator to a raised position, ensuring a cross-sectional area small enough to activate noise sources. For [ifi], besides raising the tongue tip, as described for [iʃi], we used lip opening to create the necessary small area passage at the lips. Synthesis results for the nonsense word /ifi/ are shown in Figure 12.

An F0 value of 100 Hz and a maximum glottal opening of 0.3 cm² were used to synthesize the vowels. The time trajectory of the glottal source parameter Ag max rises to 2 cm² at the fricative middle point and, at the end of the fricative, returns to the value used during vowel production.

Figure 12: Synthetic [ifi], showing speech signal and spectrogram.
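The phrase "small enough to activate noise sources" can be made concrete with the aerodynamic criterion commonly used in frication modeling (e.g., in Flanagan-style models [37]): a noise source at a constriction is switched on when the Reynolds number of the flow through it exceeds a critical value. The sketch below is a generic illustration of that criterion, not the synthesizer's actual rule, and all numeric values are assumed.

```python
from math import pi, sqrt

# Generic frication criterion (illustration only; not SAPWindows code):
# a noise source at a constriction turns on when the Reynolds number of
# the flow through it exceeds a critical value.
NU_AIR = 0.15        # kinematic viscosity of air, cm^2/s (approximate)
RE_CRIT = 1800.0     # assumed critical Reynolds number

def reynolds(flow_cm3s: float, area_cm2: float) -> float:
    """Re = u*d/nu, with u = volume flow / area, d the equivalent diameter."""
    d = sqrt(4.0 * area_cm2 / pi)   # diameter of a circular duct of same area
    u = flow_cm3s / area_cm2        # mean flow velocity in the constriction
    return u * d / NU_AIR

def noise_gain(flow_cm3s: float, area_cm2: float) -> float:
    """Zero below threshold; grows as Re^2 - Re_crit^2 above it."""
    re = reynolds(flow_cm3s, area_cm2)
    return max(0.0, re * re - RE_CRIT * RE_CRIT)

# A narrow fricative constriction turns the source on; a vowel-like area
# with the same volume flow does not.
print(noise_gain(300.0, 0.1) > 0.0, noise_gain(300.0, 3.0) > 0.0)  # True False
```

Because the velocity grows as the inverse of the area, shrinking the cross-section at constant flow raises the Reynolds number sharply, which is why only the reduced cross-sectional areas mentioned above excite the noise sources.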
Figure 13: Synthetic [ivi], showing speech signal and spectrogram.

Nonsense words with voiced fricatives were also produced, keeping the glottal folds vibrating throughout the fricative. Results for the [ivi] sequence are presented in Figure 13.

4.3. Words

The synthesizer is also capable of producing words containing vowels (oral or nasal), nasal consonants, and (lower-quality) stops. To produce such words, and since the synthesizer is not connected to the linguistic and prosodic components of a text-to-speech system, we used the following manual process:

(1) obtaining durations for each phonetic segment entering the word composition (presently by direct analysis


of natural speech, although an automatic process, such as a CART tree, can be used in the future);
(2) obtaining oral articulator configurations for each of the phones. For vowels, we used configurations obtained by an inversion process based on the natural vowels' first four formants [43, 44]. These configurations were already available from previous work [39, 43]. For the consonants, for which we do not yet have an inversion process, configurations were obtained manually, based on articulatory phonetics descriptions and published X-ray and MRI images;
(3) defining the velum trajectory, using adequate values for each vowel and consonant;
(4) setting the glottal source parameters, in particular the fundamental frequency (F0).

We first attempted to synthesize words containing nasal sounds due to their relevance in the Portuguese language [45]. We now present three examples of synthetic words: mão, mãe, and António.

Example 1 (word mão (hand)). First, from natural speech analysis, we measured durations of 100 milliseconds for the [m] and 465 milliseconds for the nasal diphthong. In this case, the [m] configuration was obtained manually, and the configurations for [a] and [u] were obtained by an inversion process [43, 46]. The three configurations are presented in Figure 14.

Figure 14: Tract configurations used to synthesize the word mão (hand): (a) [m], (b) [a], and (c) [u].

A velum trajectory was defined, based on articulatory descriptions of the intervening sounds. As shown in Figure 15, the velum starts closed, in a preproduction position, opens for the nasal consonant, opens more during the first vowel of the diphthong, and finally raises towards closure in the second part of the diphthong.

Fundamental frequency, F0, and other source parameters were also defined. F0 starts at 120 Hz, increases to 130 Hz at the end of the nasal consonant, then to 150 Hz to stress the initial part of the diphthong, and finally decreases to 80 Hz at the end of the word. This variation in time was based, partially, on the F0 contour of natural speech. Values of 60% for the open quotient (OQ) and 2 for the speed quotient (SQ) were used. Jitter, shimmer, and source-tract interaction were also used.

Figure 15: Velum trajectory used to synthesize the word mão (hand).
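The manual process above amounts to specifying, per word, a list of segment durations plus piecewise-linear trajectories for the velum and F0. Below is a minimal sketch of such a specification for mão, using the durations and F0 targets quoted in Example 1; the 250 ms time of the 150 Hz target is our assumption (the text does not give it), and the data structures and names are ours, not the synthesizer's API.

```python
# Sketch of a manual word specification for "mão" (hand). Durations and F0
# values are from the text; the 250 ms breakpoint is an assumed placement,
# and the SAMPA-like label "6~w~" for the nasal diphthong is our notation.
from bisect import bisect_right

segments = [("m", 100.0), ("6~w~", 465.0)]   # (phone, duration in ms)

# (time_ms, value) breakpoints, linearly interpolated in between.
f0_targets = [(0.0, 120.0), (100.0, 130.0), (250.0, 150.0), (565.0, 80.0)]

def interp(targets, t):
    """Piecewise-linear evaluation of a (time, value) breakpoint list."""
    times = [p[0] for p in targets]
    i = bisect_right(times, t) - 1
    if i < 0:
        return targets[0][1]
    if i >= len(targets) - 1:
        return targets[-1][1]
    (t0, v0), (t1, v1) = targets[i], targets[i + 1]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

total = sum(d for _, d in segments)                    # 565.0 ms
print(interp(f0_targets, 100.0), interp(f0_targets, total))  # 130.0 80.0
```

The velum trajectory of Figure 15 can be encoded the same way, as a second breakpoint list evaluated by the same interpolation routine.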
Two versions were produced: with and without lip closure at the end of the word. Due to the open state of the velum, this final oral closure results in a final nasal consonant [m]. The spectrogram of this last version is presented in Figure 16.

Figure 16: Spectrogram of the word mão produced by the articulatory synthesizer.

Example 2 (word mãe (mother)). A possible phonetic transcription for the word mãe (mother) is [mɐ̃j̃ɲ], including a palatal nasal consonant at the end [45, page 292]. Keeping the oral passage open at the end of the word produced a variant. Due to the lack of precise information regarding the oral tract configuration during production of [ɐ̃], we produced variants differing in the configuration used for the nasal vowel [ɐ̃]. One version was produced using the configuration of the oral vowel [a]; another, with a higher tongue position, using the configuration of vowel [ɐ]. Another parameter varied was F0: versions with values obtained by analysis of natural speech, and versions with synthetic F0. For the synthetic case, a further variation was used: the inclusion or not of source-tract interaction. Figure 17 shows the speech signal and respective spectrogram for nonnatural F0, source-tract interaction, the configuration of [a] for the nasal vowel [ɐ̃], and final palatal occlusion.

Example 3 (word António). The first name of the first author, António [ɐ̃ˈtɔnju], was also synthesized, using the same process as in the two previous examples. This word has a nasal vowel at the beginning, a stop, an oral vowel, a nasal


consonant, and a final oral diphthong. Two versions were produced: one with natural F0 and another with synthetic F0. The signal and the spectrogram obtained for the first version are presented in Figure 18. The stop consonant [t] was obtained by closing and opening the oral passage, without modeling phenomena important for the perception of a natural-quality stop, such as the voice onset time (VOT) and the aspiration at the release of the closure.

Figure 17: Speech signal and spectrogram of one of the versions of the word mãe synthesized using an [a] configuration at the beginning of the nasal diphthong, oral occlusion at the end, source-tract interaction, and synthetic values for F0.

Figure 18: Speech signal and spectrogram for the synthetic word António [ɐ̃ˈtɔnju], produced using F0 extracted from a natural pronunciation.

As part of a mean opinion score (MOS) quality test, this and many other stimuli produced by our synthesizer were evaluated. To document the quality level achieved by our models, Table 3 shows the ratings of the various versions of the 3 examples presented above. The normalized (to 5) results varied between 3 and 4 (from fair to good). The top-rated word obtained 3.7 (3.4 without normalization).

5. CONCLUSION

From the experience with simulations and perceptual tests using stimuli generated by our articulatory synthesizer, we believe that articulatory synthesis is a powerful approach to speech synthesis, both because of its anthropomorphic origin and because it allows us to address questions regarding human speech production and perception.

We developed a modular articulatory synthesizer architecture for Portuguese, using object-oriented programming. The separation of control, model, and viewer allows the addition of new models without major changes to the user interface. The implemented models comprise an interactive glottal source model, a flexible nasal tract area model, and a hybrid acoustic model capable of dealing with asymmetric nasal tract configurations and frication noise sources. The synthesized speech has a quality ranging from fair to good.

The synthesizer has been used, mainly, in the production of stimuli for perceptual tests of Portuguese nasal vowels (e.g., [39, 47, 48]). The two studies on nasal vowels reported in this paper were only possible with the use of the articulatory approach to speech synthesis, which allows the creation of stimuli by direct and precise control of the articulators and the glottal source. They illustrate the potential of articulatory synthesis in production and perception studies and the flexibility of our synthesizer.

Perceptual tests and simulations contributed to improving our knowledge of EP nasal sounds, namely the following.

(1) It is necessary to include the time variation of the velum aperture, combined with the time variation of the articulators controlling the oral passage, in order to synthesize high-quality nasal vowels.

(2) Nasality is not controlled solely by the velum movement. Oral passage reduction, or occlusion, can also be used to improve nasal vowel quality. When nasal vowels were word-final, lip or tongue movement, even without occlusion, improved the quality of the synthesized nasal vowel by increasing the predominance of nasal radiation. Oral occlusion, due to coarticulation, before stops also contributes to nasal quality improvement.

(3) The source-tract interaction effect due to the extra coupling of the nasal tract is not easily perceived. Discrimination was significantly above chance level only for the high vowel [ĩ], which can possibly be explained by the relation of the nasal and oral input impedances at the nasal tract coupling point.
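The MOS figures reported for the synthetic words (Table 3) are normalized so that natural speech maps to 5. A sketch of that rescaling follows; the natural-speech MOS is only partly legible in our copy of the table, so the 4.6 used below is an assumed, illustrative value, not a figure from the paper.

```python
# Sketch of the Table 3 normalization: a raw MOS is rescaled so that natural
# speech would score 5. NATURAL_MOS = 4.6 is an assumed, illustrative value;
# the exact figure is not fully legible in our copy of the table.
NATURAL_MOS = 4.6

def normalize(mos: float, natural_mos: float = NATURAL_MOS, top: float = 5.0) -> float:
    """Rescale a raw 1-5 MOS so that natural speech maps to `top`."""
    return round(mos * top / natural_mos, 1)

print(normalize(3.4))  # top-rated raw score; with 4.6 assumed, this gives 3.7
```

Under this assumed natural-speech score, the top-rated raw MOS of 3.4 maps to the normalized 3.7 quoted in the text.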


Table 3: Quality ratings for several words produced by the synthesizer. For each word, the table includes the mean opinion score (MOS), its respective 95% confidence interval (some CI digits are illegible and marked "?"), and the normalized value resulting from scaling natural speech scores to 5.

Word           F0         Interac.  Observ.           MOS   CI 95%      Norm.
mão            Synthetic  Yes       no [m] at end     3.4   [3.?, ?.7]  3.7
               Synthetic  Yes       [m] at end        3.0   [2.?, ?.4]  3.3
mãe            Natural    Yes       [a], [ɲ] at end   2.9   [2.?, ?.3]  3.2
               Synthetic  Yes       [a], [ɲ] at end   3.1   [2.?, ?.4]  3.3
               Synthetic  Yes       [ɐ], [ɲ] at end   2.9   [2.?, ?.3]  3.2
               Synthetic  Yes       [a], no [ɲ]       3.0   [2.?, ?.4]  3.3
               Synthetic  No        [ɐ], [ɲ] at end   2.9   [2.?, ?.3]  3.1
               Synthetic  No        [a], [ɲ] at end   2.8   [2.?, ?.2]  3.1
António        Natural    Yes                         3.0   [2.?, ?.2]  3.3
               Synthetic  Yes                         2.7   [2.?, ?.9]  2.9
Natural speech                                        4.?               5.0

A nasal vowel, at least in European Portuguese, is not a sound obtained only by lowering the velum. The way this aperture and the other articulators vary in time is important; in particular, how the velum and the oral articulators vary in the various contexts improves quality.

With the addition of noise source models and modifications to the acoustic model, our articulatory synthesizer is capable of producing sustained fricatives and fricatives in VCV sequences. First results were presented and judged, in informal listening tests, as highly intelligible. Our model of fricatives is comprehensive and flexible, making the new version of SAPWindows a valuable tool for trying out new or improved source models and for running production and perceptual studies of European Portuguese fricatives [49]. The possibility of automatically inserting and removing noise sources along the oral tract is a feature we regard as having great potential.

The SAPWindows articulatory synthesizer is useful in phonetics research and teaching. We have explored the first area for several years with very interesting results, as shown in this paper. Recently, we started exploring the second area, aiming at using the synthesizer in phonetics teaching at our University's Languages and Cultures Department. Articulatory synthesis is also of interest in the field of speech therapy because of its potential to model different speech pathologies.

The development of this synthesizer is an unfinished task. The addition of new models for other Portuguese sounds, the use of combined data (MRI, EMA, EPG, etc.) for a detailed description of the vocal tract configurations and an optimal match between the synthesized and the Portuguese natural spectra [49], and the integration of the synthesizer in a text-to-speech system are planned as future work.

ACKNOWLEDGMENTS

This work was partially funded by the first author's Ph.D. Scholarship BD/3495/94 and the project Articulatory Synthesis of Portuguese, P/PLP/11222/1998, both from the Portuguese Research Foundation (FCT), PRAXIS XXI program. We also have to thank the University of Florida's MMIRC, headed by Professor D. G. Childers, where this work started.

REFERENCES

[1] R. Linggard, Electronic Synthesis of Speech, Cambridge University Press, Cambridge, UK, 1985.
[2] M. R. Schroeder, Computer Speech: Recognition, Compression, Synthesis, vol. 35 of Springer Series in Information Sciences, Springer-Verlag, New York, NY, USA, 1999.
[3] J.-P. Tubach, "Présentation générale," in Fondements et Perspectives en Traitement Automatique de la Parole, H. Méloni, Ed., Universités Francophones, 1996.
[4] G. J. Borden, K. S. Harris, and L. J. Raphael, Speech Science Primer: Physiology, Acoustics, and Perception of Speech, LWW, 4th edition, 2003.
[5] D. Klatt, "Software for a cascade/parallel formant synthesizer," Journal of the Acoustical Society of America, vol. 67, no. 3, pp. 971, 1980.
[6] P. Rubin, T. Baer, and P. Mermelstein, "An articulatory synthesizer for perceptual research," Journal of the Acoustical Society of America, vol. 70, no. 2, pp. 321, 1981.
[7] S. Maeda, "The role of the sinus cavities in the production of nasal vowels," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP), vol. 2, pp. 911, Paris, France, May 1982.
[8] T. Koizumi, S. Tanigushi, and S. Hiromitsu, "Glottal source-vocal tract interaction," Journal of the Acoustical Society of America, vol. 78, no. 5, pp. 1541, 1985.
[9] M. M. Sondhi and J. Schroeter, "A hybrid time-frequency domain articulatory speech synthesizer," IEEE Trans. Acoust., Speech, Signal Processing, vol. 35, no. 7, pp. 955, 1987.
[10] C. H. Shadle and R. Damper, "Prospects for articulatory synthesis: A position paper," in Proc. 4th ISCA Tutorial and Research Workshop (ITRW '01), Perthshire, Scotland, August-September 2001.
[11] D. H. Whalen, "Articulatory synthesis: Advances and prospects," in Proc. 15th International Congress of Phonetic Sciences (ICPhS), pp. 175, Barcelona, Spain, August 2003.
[12] B. Kühnert and F. Nolan, "The origin of coarticulation," Forschungsberichte des Instituts für Phonetik und Sprachliche Kommunikation der Universität München (FIPKM), vol. 35, pp. 61, 1997; also in [50]. Online. Available: http://www.phonetik.uni-muenchen.de/FIPKM/index.html.
[13] A. Pinto and A. M. Tomé, "Automatic pitch detection and MIDI conversion for the singing voice," in Proc. WSES International Conferences: AITA, AMTA, MCBE, MCBC, pp. 312, Greece, 2001.


[14] A. M. Öster, D. House, A. Protopapas, and A. Hatzis, "Presentation of a new EU project for speech therapy: OLP (Ortho-Logo-Paedia)," in Proc. TMH-QPSR, Fonetik 2002, vol. 44, pp. 45, Stockholm, Sweden, May 2002.
[15] F. S. Cooper, "Speech synthesizers," in Proc. 4th International Congress of Phonetic Sciences (ICPhS), A. Sovijärvi and P. Aalto, Eds., pp. 3, The Hague: Mouton, Helsinki, Finland, September 1961.
[16] M. Wrembel, "Innovative approaches to the teaching of practical phonetics," in Proc. Phonetics Teaching & Learning Conference (PTLC), London, UK, April 2002.
[17] S. S. Fels, F. Vogt, B. Gick, C. Jaeger, and I. Wilson, "User-centred design for an open source 3-D articulatory synthesizer," in Proc. 15th International Congress of Phonetic Sciences (ICPhS), vol. 1, pp. 179, Barcelona, Spain, August 2003.
[18] K. Iskarous, L. Goldstein, D. H. Whalen, M. K. Tiede, and P. E. Rubin, "CASY: The Haskins configurable articulatory synthesizer," in Proc. 15th International Congress of Phonetic Sciences (ICPhS), vol. 1, pp. 185, Barcelona, Spain, 2003.
[19] S. Maeda and M. Toda, "Mechanical properties of lip movements: How to characterize different speaking styles?" in Proc. 15th International Congress of Phonetic Sciences (ICPhS), vol. 1, pp. 189, Barcelona, Spain, August 2003.
[20] P. Badin, G. Bailly, F. Elisei, and M. Odisio, "Virtual talking heads and audiovisual articulatory synthesis," in Proc. 15th International Congress of Phonetic Sciences (ICPhS '03), vol. 1, pp. 193, Barcelona, Spain, August 2003.
[21] K. N. Stevens and H. M. Hanson, "Production of consonants with a quasi-articulatory synthesizer," in Proc. 15th International Congress of Phonetic Sciences (ICPhS), vol. 1, pp. 199, Barcelona, Spain, August 2003.
[22] P. Mermelstein, "Articulatory model for the study of speech production," Journal of the Acoustical Society of America, vol. 53, no. 4, pp. 1070, 1973.
[23] J. Schroeter and M. M. Sondhi, "Speech coding based on physiological models of speech production," in Advances in Speech Signal Processing, Marcel Dekker, New York, NY, USA, 1992.
[24] P. P. L. Prado, "A target-based articulatory synthesizer," Ph.D. dissertation, University of Florida, Gainesville, Fla, USA, 1991.
[25] A. Teixeira, F. Vaz, and J. C. Príncipe, "A comprehensive nasal model for a frequency domain articulatory synthesizer," in Proc. 10th Portuguese Conference on Pattern Recognition (RecPad '98), Lisbon, Portugal, March 1998.
[26] M. Chen, "Acoustic correlates of English and French nasalized vowels," Journal of the Acoustical Society of America, vol. 102, no. 4, pp. 2360, 1997.
[27] J. Dang and K. Honda, "MRI measurements and acoustic investigation of the nasal and paranasal cavities," Journal of the Acoustical Society of America, vol. 94, no. 3, pp. 1765, 1993.
[28] K. N. Stevens, Acoustic Phonetics, Current Studies in Linguistics, MIT Press, Cambridge, Mass, USA, 1998.
[29] A. Teixeira, F. Vaz, and J. C. Príncipe, "Effects of source-tract interaction in perception of nasality," in Proc. 6th European Conference on Speech Communication and Technology (EUROSPEECH '99), vol. 1, pp. 161, Budapest, Hungary, September 1999.
[30] D. Allen and W. Strong, "A model for the synthesis of natural sounding vowels," Journal of the Acoustical Society of America, vol. 78, no. 1, pp. 58, 1985.
[31] T. V. Ananthapadmanabha and G. Fant, "Calculation of true glottal flow and its components," Speech Communication, vol. 1, no. 3-4, pp. 167, 1982.
[32] L. Silva, A. Teixeira, and F. Vaz, "An object oriented articulatory synthesizer for Windows," Revista do Departamento de Electrónica e Telecomunicações, Universidade de Aveiro, vol. 3, no. 5, pp. 483, 2002.
[33] E. L. Riegelsberger, "The acoustic-to-articulatory mapping of voiced and fricated speech," Ph.D. dissertation, The Ohio State University, Columbus, Ohio, USA, 1997.
[34] Q. Lin, "A fast algorithm for computing the vocal-tract impulse response from the transfer function," IEEE Trans. Speech Audio Processing, vol. 3, no. 6, pp. 449, 1995.
[35] M. Frigo and S. Johnson, "FFTW: an adaptive software architecture for the FFT," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP), vol. 3, pp. 1381, Seattle, Wash, USA, 1998.
[36] S. S. Narayanan and A. A. H. Alwan, "Noise source models for fricative consonants," IEEE Trans. Speech Audio Processing, vol. 8, no. 3, pp. 328, 2000.
[37] J. L. Flanagan, Speech Analysis, Synthesis and Perception, Springer-Verlag, New York, NY, USA, 2nd edition, 1972.
[38] C. H. Shadle, "Articulatory-acoustic relationships in fricative consonants," in Speech Production and Speech Modelling, W. J. Hardcastle and A. Marchal, Eds., pp. 187, Kluwer Academic, Dordrecht, The Netherlands, 1990.
[39] A. Teixeira, F. Vaz, and J. C. Príncipe, "Influence of dynamics in the perceived naturalness of Portuguese nasal vowels," in Proc. 14th International Congress of Phonetic Sciences (ICPhS), San Francisco, Calif, USA, August 1999.
[40] N. Kitawaki, "Quality assessment of coded speech," in Advances in Speech Signal Processing, S. Furui and M. M. Sondhi, Eds., chapter 12, pp. 357, Marcel Dekker, New York, NY, USA, 1992.
[41] I. R. Titze and B. H. Story, "Acoustic interactions of the voice source with the lower vocal tract," Journal of the Acoustical Society of America, vol. 101, no. 4, pp. 2234, 1997.
[42] A. Teixeira, L. M. T. Jesus, and R. Martinez, "Adding fricatives to the Portuguese articulatory synthesizer," in Proc. 8th European Conference on Speech Communication and Technology (EUROSPEECH '03), pp. 2949, Geneva, Switzerland, September 2003.
[43] A. Teixeira, F. Vaz, and J. C. Príncipe, "A software tool to study Portuguese vowels," in Proc. 5th European Conference on Speech Communication and Technology (EUROSPEECH '97), G. Kokkinakis, N. Fakotakis, and E. Dermatas, Eds., vol. 5, pp. 2543, Rhodes, Greece, September 1997.
[44] A. Teixeira, F. Vaz, and J. C. Príncipe, "Some studies of European Portuguese nasal vowels using an articulatory synthesizer," in Proc. 5th IEEE International Conference on Electronics, Circuits and Systems (ICECS '98), vol. 3, pp. 507, Lisbon, Portugal, September 1998.
[45] J. Laver, Principles of Phonetics, Cambridge Textbooks in Linguistics, Cambridge University Press, Cambridge, UK, 1st edition, 1994.
[46] D. G. Childers, Speech Processing and Synthesis Toolboxes, John Wiley & Sons, New York, NY, USA, 2000.
[47] A. Teixeira, L. C. Moutinho, and R. L. Coimbra, "Production, acoustic and perceptual studies on European Portuguese nasal vowels height," in Proc. 15th International Congress of Phonetic Sciences (ICPhS), Barcelona, Spain, August 2003.
[48] A. Teixeira, F. Vaz, and J. C. Príncipe, "Nasal vowels following a nasal consonant," in Proc. 5th Seminar on Speech Production: Models and Data, pp. 285, Bavaria, Germany, May 2000.
[49] L. M. T. Jesus and C. H. Shadle, "A parametric study of the spectral characteristics of European Portuguese fricatives," Journal of Phonetics, vol. 30, no. 3, pp. 437, 2002.


[50] W. J. Hardcastle and N. Hewlett, Eds., Coarticulation: Theoretical and Empirical Perspectives, Cambridge University Press, Cambridge, UK, 1999.

António J. S. Teixeira was born in Paredes, Portugal, in 1968. He received his first degree in electronic and telecommunications engineering in 1991, the M.S. degree in electronic and telecommunications engineering in 1993, and the Ph.D. degree in electrical engineering in 2000, all from the University of Aveiro, Aveiro, Portugal. His Ph.D. dissertation was on articulatory synthesis of the Portuguese nasals. Since 1997, he has been teaching in the Department of Electronics and Telecommunications Engineering at the University of Aveiro, as a Professor Auxiliar since 2000, and has been a Researcher, since its creation in 1999, in the Signal Processing Laboratory at the Institute of Electronics and Telematics Engineering of Aveiro (IEETA), Aveiro, Portugal. His research interests include digital processing of speech signals, particularly (articulatory) speech synthesis; Portuguese phonetics; speaker verification; spoken language understanding; dialogue systems; and man-machine interaction. He is also involved, as the Coordinator, in a new Master's program in the area of speech sciences and hearing. He is a Member of The Institute of Electrical and Electronics Engineers, the International Speech Communication Association, and the International Phonetic Association.

Roberto Martinez was born in Cuba in 1961. He received his first degree in physics in 1986 from the Moscow State University M. V. Lomonosov, former USSR. He has been a Microsoft Certified Engineer since 1998. From 1986 to 1994, he was an Assistant Professor of mathematics and physics at Havana University, Cuba, doing research in computer-aided molecular design. From 1996 to 1998, he was with SIME Ltd., Cuba, as an Intranet Developer and System Administrator. From 1999 to 2001, he was with DISAIC Consulting Services, Cuba, training network administrators and consulting in Microsoft BackOffice systems integration and network security. He is currently working toward the Doctoral degree in articulatory synthesis of Portuguese at the University of Aveiro, Portugal.
Luís Nuno Silva received his first degree in electronics and telecommunications engineering in 1997 and the M.S. degree in electronics and telecommunications engineering in 2001, both from the Universidade de Aveiro, Aveiro, Portugal. From 1997 till 2002, he worked in research and development at the Instituto de Engenharia Electrónica e Telemática de Aveiro, Aveiro, Portugal (former Instituto de Engenharia de Sistemas e Computadores of Aveiro, Aveiro, Portugal) as a Research Associate. Since 2002, he has been working as a Software Engineer at the Research and Development Department of NEC Portugal, Aveiro, Portugal. His research interests include digital processing of speech signals and speech synthesis. He is a Member of The Institute of Electrical and Electronics Engineers.

Luís M. T. Jesus received his first degree in electronic and telecommunications engineering in 1996 from the Universidade de Aveiro, Aveiro, Portugal, the M.S. degree in electronics in 1997 from the University of East Anglia, Norwich, UK, and the Ph.D. degree in electronics in 2001 from the University of Southampton, UK. Since 2001, he has been a Reader in the Escola Superior de Saúde da Universidade de Aveiro, Aveiro, Portugal, and has been a member of the Signal Processing Laboratory at the Instituto de Engenharia Electrónica e Telemática de Aveiro, Aveiro, Portugal. His research interests include acoustic phonetics, digital processing of speech signals, and speech synthesis. He is a Member of The Acoustical Society of America, Associação Portuguesa de Linguística, the International Phonetic Association, the International Speech Communication Association, and The Institute of Electrical and Electronics Engineers.
José C. Príncipe is a Distinguished Professor of electrical and biomedical engineering at the University of Florida, Gainesville, where he teaches advanced signal processing and artificial neural networks (ANNs) modeling. He is a BellSouth Professor and Founder and Director of the University of Florida Computational NeuroEngineering Laboratory (CNEL). He has been involved in biomedical signal processing, brain-machine interfaces, nonlinear dynamics, and adaptive systems theory (information-theoretic learning). He is the Editor-in-Chief of IEEE Transactions on Biomedical Engineering, President of the International Neural Network Society, and former Secretary of the Technical Committee on Neural Networks of the IEEE Signal Processing Society. He is also a Member of the Scientific Board of the Food and Drug Administration, and a Member of the Advisory Board of the University of Florida Brain Institute. He has more than 100 publications in refereed journals, 10 book chapters, and over 200 conference papers. He has directed 42 Ph.D. dissertations and 57 M.S. theses.

Francisco A. C. Vaz was born in Oporto, Portugal, in 1945. He received the electrical engineering degree from the University of Oporto, Portugal, in 1968, and the Ph.D. degree in electrical engineering from the University of Aveiro, Portugal, in 1987. His Ph.D. dissertation was on automatic EEG processing. From 1969 to 1973, he worked for the Portuguese Nuclear Committee. After several years working in industry, he joined, in 1978, the staff of the Department of Electronics Engineering and Telecommunications, the University of Aveiro, where he is currently a Full Professor. His research interests have centred on the digital processing of biological signals, and since 1995 on digital speech processing.