EURASIP Journal on Applied Signal Processing 2005:9, 1435-1448
© 2005 Hindawi Publishing Corporation

Simulation of Human Speech Production Applied to the Study and Synthesis of European Portuguese

António J. S. Teixeira
Instituto de Engenharia Electrónica e Telemática de Aveiro (IEETA), 3810-193 Aveiro, Portugal
Departamento de Electrónica e Telecomunicações, Universidade de Aveiro, 3810-193 Aveiro, Portugal
Email: ajst@det.ua.pt

Roberto Martinez
Instituto de Engenharia Electrónica e Telemática de Aveiro (IEETA), 3810-193 Aveiro, Portugal
Email: martinezrs@ieeta.pt

Luís Nuno Silva
Instituto de Engenharia Electrónica e Telemática de Aveiro (IEETA), 3810-193 Aveiro, Portugal
Email: lnors@ieeta.pt

Luis M. T. Jesus
Instituto de Engenharia Electrónica e Telemática de Aveiro (IEETA), 3810-193 Aveiro, Portugal
Escola Superior de Saúde, Universidade de Aveiro, 3810-193 Aveiro, Portugal
Email: lmtj@essua.ua.pt

José C. Príncipe
Computational NeuroEngineering Laboratory (CNEL), University of Florida, Gainesville, FL 32611, USA
Email: principe@cnel.ufl.edu

Francisco A. C. Vaz
Instituto de Engenharia Electrónica e Telemática de Aveiro (IEETA), 3810-193 Aveiro, Portugal
Departamento de Electrónica e Telecomunicações, Universidade de Aveiro, 3810-193 Aveiro, Portugal
Email: fvaz@det.ua.pt

Received 29 October 2003; Revised 31 August 2004

A new articulatory synthesizer (SAPWindows), with a modular and flexible design, is described. A comprehensive acoustic model and a new interactive glottal source were implemented. Perceptual tests and simulations made possible by the synthesizer contributed to deepening our knowledge of one of the most important characteristics of European Portuguese, the nasal vowels. First attempts at incorporating models of frication into the articulatory synthesizer are presented, demonstrating the potential of performing fricative synthesis based on broad articulatory configurations. Synthesis of nonsense words and Portuguese words with vowels and nasal consonants is also shown. Despite not being capable of competing with mainstream concatenative speech synthesis, the anthropomorphic approach to speech synthesis, known as articulatory synthesis, proved to be a valuable tool for phonetics research and teaching. This was particularly true for the European Portuguese nasal vowels.

Keywords and phrases: articulatory synthesis, speech production, European Portuguese, nasal vowels, fricatives.

1. INTRODUCTION

Recent technological developments are characterized by increasing physical and psychological similarity to humans. One example is the well-known human-like robots. Being one of the distinctive characteristics of humans, speech is a natural candidate for imitation by machines. Also, information can be transmitted very fast, and speech frees hands and eyes for other tasks.

Various designs of machines that produce and understand human speech have been available for a long time [1, 2]. The use of voice in computer system interfaces will be an added advantage, allowing, for example, the use of information systems by people with different disabilities and access by telephone to new information services. However, our current knowledge of the production and perception of voice is still incomplete. The quality (or lack of it) of the synthetic voice of currently available systems is a clear indication of the need to improve this knowledge [2].
There are two types of motivation for research in the vast domain of voice production and perception [3]. The first aims at a deep understanding of its diverse aspects and functions; the second is the design and development of artificial systems. When artificial systems are closely related to the way humans do things, these two motivations can be merged: such systems contribute to an increased knowledge of the process, and this knowledge can be used to improve current systems. We have been developing an articulatory synthesizer since 1995, which will hopefully produce high-quality synthetic European Portuguese (EP) speech. We aim at a simultaneous improvement of our synthesis quality (technological motivation) and an expansion of our knowledge of Portuguese production and perception.

2. ARTICULATORY SYNTHESIS

Articulatory synthesis generates the speech signal through modeling of the physical, anatomical, and physiological characteristics of the organs involved in human voice production. This is a different approach when compared with other techniques, such as formant synthesis [5]. In the articulatory approach, the system is modeled instead of the signal or its acoustic characteristics. Approaches based on the signal try to reproduce the signal of a natural voice as faithfully as possible, with little or no concern about how it is produced. In contrast, a model based on the production system uses physical laws to describe the sound propagation in the vocal tract and models mechanical and aeroacoustic phenomena to describe the oscillation of the vocal folds.

2.1. Basic components of an articulatory synthesizer

To implement an articulatory synthesizer in a digital computer, a mathematical model of the vocal system is needed. Synthesizers usually include two subsystems: an anatomic-physiological model of the structures involved in voice production, and a model of the production and propagation of sound in these structures.

The first model transforms the positions of the articulators, like the jaw, tongue body, and velum, into cross-sectional areas of the vocal tract. The second model consists of a set of equations that describe the acoustic properties of the vocal tract system. Generally it is divided into submodels that simulate different phenomena, such as the creation of a source of periodic excitation (vocal fold oscillation), sound sources caused by turbulent flow where constrictions exist (zones of sufficiently reduced area along the vocal tract), propagation of the sound above and below the vocal folds, and radiation at the lips and/or nostrils.

The parameters for the models can be produced by different methods. They can be obtained directly from the voice signal by a process of inversion with optimization, be defined manually by the researcher, or be the output of the linguistic processing part of a TTS (text-to-speech) system.

2.2. Motivations

Articulatory synthesis has not received as much attention in recent years as it could have, because it is not yet an alternative to the synthesis techniques currently used in TTS systems. This is due to several factors: the difficulty of getting information about the vocal tract and the vocal folds during human voice production; the fact that measurement techniques generally provide information on static configurations, while information concerning the dynamics of the articulators is incomplete; the lack of a full and reliable inversion process for obtaining the articulatory parameters from natural voice; and the complex calculations this technique involves, which raise stability problems in the numerical resolution.
Despite these limitations, articulatory synthesis presents some important advantages: the parameters of the synthesizer are directly related to the human articulatory mechanisms, being very useful in studies of production and perception of voice [6]; the method can produce high-quality nasal consonants and nasal vowels [7]; source-tract interaction, essential for a natural sound, can be conveniently modeled when simulating the vocal folds and the tract as one system [8]; the parameters vary slowly in time, so they can be used in efficient coding schemes; the parameters are easier to interpolate than LPC and formant synthesizer parameters [9]; and small errors in the control signals do not generally produce low-quality speech sounds, because the interpolated values will always be physically possible.

According to Shadle and Damper [10], articulatory synthesis is clearly the best way to reproduce some attributes of speech we are interested in, such as being able to sound like an extraordinary speaker (e.g., a singer, someone with disordered speech, or an alien with extra sinuses), and being able to change to another speaker type, or alter the voice quality of a given speaker, without having to go through as much effort as was required for the first voice. Articulatory synthesizers have parameters that can be conceptualized, so that if a speech sample sounds wrong, intuition is useful in fixing it, always teaching us something and providing opportunities to learn more as we work to produce a commercially usable system.

Articulatory synthesis holds promise for overcoming some of these limitations and for sharpening our understanding of the production/perception link [11]. There is only partial knowledge about the dynamics of the speech signal, so continued research in this area is needed. The systematic study of coarticulation effects is of special importance for the development of experimental phonetics and the sciences related to voice processing [12]. An articulatory synthesizer can be used as a versatile speaker and can therefore contribute to such studies. Articulatory synthesizers can generate speech under carefully controlled conditions. This can be useful, for example, to test pitch-tracking algorithms [13].

The articulatory synthesizer can be combined with a speech production evaluation tool to develop a system that produces real-time audio-visual feedback to help people with specific articulatory disorders. For example, computer-based speech therapy [14] for speakers with dysarthria tries to stabilize their production at syllable or word level, to improve the consistency of production. For severely hearing-impaired persons, the aim is to teach them new speech patterns and increase the intelligibility of their speech. For children with cleft lip and palate and velopharyngeal incompetence, the aim is to eliminate misarticulated speech patterns so that most of these speakers can achieve highly intelligible, normal speech patterns.

Also, "the use of such a [articulatory] synthesizer has much to commend it in phonetic studies" [15]. The audio-visual feedback could be used as an aid for teaching phonetics to foreign students to improve their speech quality. The synthesizer can be used to help teach characteristic features of a given language, such as pitch level and vowel space [16].
Recent developments presented at the ICPhS [11] show that articulatory synthesis is worth revisiting as a research tool and as a part of TTS systems. Better ways of measuring vocal tract configurations, increased research interest in the visual representation of speech, and the use of simpler control structures have renewed interest in this research area [11]. Current articulatory approaches to synthesis include an open-source infrastructure that can be used to combine different models [17], recent developments in the Haskins configurable articulatory synthesizer CASY [18], the characterization of lip movements [19], the ICP virtual talking head that includes articulatory, aerodynamic, and acoustic models of speech [20], and the quasiarticulatory (articulatory parameters controlling a formant synthesizer) approach of Stevens and Hanson [21].

3. SAPWINDOWS ARTICULATORY SYNTHESIZER

Object-oriented programming was used to implement the synthesizer. The model-view-controller concept was adopted to separate models from their controls and viewers. The application, developed using Microsoft Visual C++, can synthesize speech segments from parameter sequences. These sequences can be defined in a data file or edited by the user. The synthesis process is presented step by step on a graphical interface.

Presently, the implemented models allow quality synthesis only of vowels (oral or nasal), nasal consonants, and fricatives. The next sections briefly present the currently implemented models.

3.1. Anatomic models

For nonnasal sounds, we only have to consider the vocal tract, that is, a variable-area tube between the glottis and the lips. For nasal sounds, we also have to consider the nasal tract. The nasal tract area is essentially constant, with the exception of the soft palate region. The vocal tract varies continually, and its form must be specified at intervals shorter than a few milliseconds [23].

3.1.1. Vocal tract model

The proposed anatomic model, shown in Figure 1, assumes midsagittal plane symmetry to estimate the vocal tract cross-sectional area. Model articulators are tongue body, tongue tip, jaw, lips, velum, and hyoid. Our model is an improved version of the University of Florida MMIRC model [24], which in turn was a modified version of Mermelstein's model [22]. It uses a nonregular grid to estimate section areas and lengths.

Figure 1: Vocal tract model, based on Mermelstein's model [22], with parameters for the jaw, tongue tip, tongue body, lips opening, lips protrusion, hyoid, and velum position.

3.1.2. Nasal tract model

The model of the nasal tract allows the inclusion of different nasal tract shapes and several paranasal sinuses. The nasal cavity is modeled in a similar way to the oral tract and can be considered as a side branch of the vocal tract. The major difference is that the area function of the nasal tract is fixed, for a particular speaker, over most of its length. The variable region, the soft palate, changes with the degree of nasal coupling. The velum parameter of the articulatory model controls this coupling. RLC shunt circuits, representing Helmholtz resonators, simulate the paranasal sinuses [7].

Our synthesizer allows the definition of different tract shapes and the inclusion of the needed sinus at any position by simply editing an ASCII file. Also, blocking of the nasal passages at any position can be simulated by defining a null area section at the point of occlusion. Implementation details were reported in [25].

In most of our studies, we use the nasal tract dimensions from [26], as shown in Figure 2, which were based on studies by Dang and Honda [27] and Stevens [28].

Figure 2: Default nasal tract model, from velum to nostrils, including the maxillary sinus, based on [26].
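To make concrete how a paranasal sinus can enter a transmission-line model as an RLC shunt, the following minimal sketch (our illustration, not SAPWindows code) evaluates the impedance of a series RLC branch whose inductance and capacitance follow the standard Helmholtz-resonator lumped approximation; the neck and cavity dimensions are placeholder values, not the paper's.

import numpy as np

RHO = 1.14e-3   # air density (g/cm^3), a value typical in speech acoustics
C = 3.5e4       # speed of sound (cm/s)

def helmholtz_shunt_impedance(f, neck_area, neck_length, volume, R=0.0):
    """Impedance of an RLC shunt branch modeling a Helmholtz resonator.
    L (acoustic mass) models the air plug in the sinus opening;
    Ca (acoustic compliance) models the air cushion in the cavity."""
    w = 2 * np.pi * f
    L = RHO * neck_length / neck_area      # acoustic inductance
    Ca = volume / (RHO * C**2)             # acoustic capacitance
    return R + 1j * w * L + 1.0 / (1j * w * Ca)

# Hypothetical sinus: 0.1 cm^2 opening, 0.5 cm neck, 4 cm^3 cavity.
f = np.linspace(100, 2000, 500)
Z = helmholtz_shunt_impedance(f, neck_area=0.1, neck_length=0.5, volume=4.0)
# At resonance, f0 = (C / 2 pi) * sqrt(area / (length * volume)), the branch
# impedance is minimal and the shunt "traps" energy, producing a zero
# (antiresonance) in the tract transfer function.
f0 = C / (2 * np.pi) * np.sqrt(0.1 / (0.5 * 4.0))
print(f"|Z| minimum near {f[np.argmin(np.abs(Z))]:.0f} Hz; theory {f0:.0f} Hz")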
3.2. Interactive glottal source model

We designed a glottal excitation model that includes source-tract interaction for oral and nasal sounds [29], allows direct control of source parameters, such as the fundamental frequency, and is not too demanding computationally. The interactive source model we developed was based on [30]. The model was extended to include a two-mass parametric model of the glottal area, jitter, shimmer, aspiration, and the ability to synthesize dynamic configurations.

To calculate the glottal excitation, u_g(t), it became necessary to model the subsystems involved: the lungs, the subglottal cavities, the glottis, and the supraglottal tract.

The role of the lungs is the production of a quasiconstant pressure source, modeled as a pressure source p_l in series with the resistance R_l. To represent the subglottal region, including the trachea, we used three RLC resonant circuits [31].

Several approaches have been used for vocal fold modeling: self-oscillating models, parametric glottal area models, and so forth. We wanted a physiological model, like the two-mass model, that resulted in high-quality synthesis, but at the same time a model not too demanding computationally. Also, direct control of parameters such as F_0 was required. We therefore chose the model proposed by Prado [24], which directly parameterizes the two glottal areas. In the model, the vocal folds are represented by R_g and L_g, which depend on the glottal aperture.

Systems above the glottis were modeled by the tract input impedance z_in(t), obtained from the acoustic model. This approach results in an accurate modeling of frequency-dependent losses.

The various subsystems can be represented by the equivalent circuit shown in Figure 3. Pressure variation along the circuit can be represented by

\[ p_l - R_l u_g(t) - \sum_{i=1}^{3} p_{sg_i} - \frac{d\left(L_g u_g(t)\right)}{dt} - R_g u_g(t) - p_s(t) = 0. \tag{1} \]

Figure 3: Electrical analogue of the implemented glottal source: lungs (p_l, R_l), trachea (three RLC sections), glottis (R_g, L_g), and tract load Z_in(t). Adapted from [32].

The glottal source model includes the parameters needed to model F_0 and glottal aperture perturbations, known as jitter and shimmer. The model also takes into account aspiration noise generation, as proposed by Sondhi and Schroeter [23]. Our source model is controlled by two kinds of parameters. The first type can vary in time, having a role similar to the tract parameters. In the synthesis process, these parameters can be used to control intonation, voice quality, and related phenomena. They are presented in Table 1. The second type of source parameters (including lung resistance, glottis dimensions, etc.) does not vary in time. Their values can be altered by editing a configuration file.

Table 1: Glottal source time-varying parameters.

Parameter   Description             Typical value   Unit
p_l         Lung pressure           10000           dyne/cm^2
F_0         Fundamental frequency   100             Hz
OQ          Open quotient           60              % of T_0
SQ          Speed quotient          2
A_g0        Minimum glottal area    0               cm^2
A_gmax      Maximum glottal area    0.3             cm^2
A_2 - A_1   Slope                   0.03            cm^2
Jitter      F_0 perturbation        2               %
Shimmer     A_gmax perturbation     5               %
Asp         Aspiration
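To make the role of the perturbation parameters in Table 1 concrete, here is a minimal sketch (our illustration, not SAPWindows code) that draws per-cycle F_0 and peak glottal area values using the jitter and shimmer percentages above; the zero-mean uniform perturbation law is an assumption for illustration, since the paper does not specify the distribution.

import numpy as np

rng = np.random.default_rng(0)

def perturbed_cycles(n_cycles, f0=100.0, ag_max=0.3, jitter=0.02, shimmer=0.05):
    """Per-cycle F0 (jitter) and A_gmax (shimmer) perturbations.
    jitter/shimmer are the fractional ranges from Table 1 (2% and 5%)."""
    f0_cycles = f0 * (1 + jitter * rng.uniform(-1, 1, n_cycles))
    ag_cycles = ag_max * (1 + shimmer * rng.uniform(-1, 1, n_cycles))
    periods = 1.0 / f0_cycles              # cycle durations T0 (s)
    return f0_cycles, ag_cycles, periods

f0s, ags, T0s = perturbed_cycles(5)
for k, (f, a, T) in enumerate(zip(f0s, ags, T0s)):
    print(f"cycle {k}: F0 = {f:6.2f} Hz, A_gmax = {a:.4f} cm^2, T0 = {1000*T:.2f} ms")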
3.3. Acoustic model

Several techniques have been proposed for the simulation of sound propagation in the oral and nasal tracts [33]: direct numeric solution of the equations; time-domain simulation using wave digital filters (WDF), also known as the Kelly-Lochbaum model; and frequency-domain simulation. After analyzing the pros and cons of these three approaches, we chose the frequency-domain technique for our first implementation of the acoustic model. The main reason for this choice was the possibility of easily including the frequency-dependent losses.

In our acoustic model, we made the following approximations: propagation is assumed planar; the tract is straight; and the tube is approximated by the concatenation of elementary acoustic tubes of constant area. An equivalent circuit, represented by a transmission matrix, models each one of these elementary tubes. Analysis of the circuit is performed in the frequency domain [9].

Speech is generated by the acoustic model. We use a frequency-domain analysis and time-domain synthesis method usually designated as the hybrid method [9]. The use of the convolution method avoids the problem of continuity of resonance present in the faster method proposed by Lin [34]. The use of a fast implementation of the IFFT (the MIT FFTW [35]) minimizes the convolution calculation time. A similar procedure is applied to the input impedance (Z_in), in order to obtain z_in(n), needed for the source-tract interaction modeling by the glottal source model.

3.4. Acoustic model for fricatives

The volume velocity at a constriction is obtained by the convolution of the glottal flow with the impulse response calculated, using an IFFT, from the transfer function between the glottis and the constriction point, H_gn (see Figure 4).

Figure 4: Matrices and impedances involved in the calculation of the transfer function H_gn between the glottis and a constriction point (subglottal system, pharyngeal tube, backward tract, and nasal tract, each represented by an ABCD matrix or an impedance), which in turn is used in the calculation of the flow at the noise source location.
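The frequency-domain machinery just described rests on chaining transmission (ABCD) matrices of uniform tube sections and converting the resulting transfer function into an impulse response with an IFFT. The sketch below is a minimal lossless version of that idea, under our own simplifying assumptions (no frequency-dependent losses, no nasal branch, radiation impedance set to zero); the four-section area function is a made-up example, not a configuration from the paper.

import numpy as np

RHO, C = 1.14e-3, 3.5e4   # air density (g/cm^3), sound speed (cm/s)

def abcd_uniform_tube(f, area, length):
    """Lossless transmission (ABCD) matrix of one uniform tube section."""
    k = 2 * np.pi * f / C                 # wavenumber
    Zc = RHO * C / area                   # characteristic impedance
    return np.array([[np.cos(k * length), 1j * Zc * np.sin(k * length)],
                     [1j * np.sin(k * length) / Zc, np.cos(k * length)]])

def tract_transfer(f, areas, lengths, Zrad):
    """Chain section matrices glottis->lips; with [P1;U1] = M [P2;U2] and
    P2 = Zrad*U2 at the lips, U_lips/U_glottis = 1/(C*Zrad + D)."""
    M = np.eye(2, dtype=complex)
    for a, l in zip(areas, lengths):
        M = M @ abcd_uniform_tube(f, a, l)
    Cc, D = M[1]
    return 1.0 / (Cc * Zrad + D)

freqs = np.arange(50, 5000, 10.0)
areas = [1.0, 0.8, 4.0, 5.0]     # cm^2, illustrative only
lengths = [4.0, 4.0, 4.5, 4.5]   # cm
H = np.array([tract_transfer(f, areas, lengths, Zrad=0.0) for f in freqs])
mag = np.abs(H)
peaks = freqs[np.r_[False, (mag[1:-1] > mag[:-2]) & (mag[1:-1] > mag[2:]), False]]
print("formant estimates (Hz):", peaks[:4])

In the hybrid method, a transfer function of this kind is sampled on a frequency grid, an impulse response h(n) is obtained by IFFT, and the output is computed by time-domain convolution with the source flow.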
3.4.1. Noise sources

Fluctuations in the velocity of the airflow emerging from a constriction (at an abrupt termination of a tube) create monopole sources, and fluctuations of the forces exerted by an obstacle (e.g., teeth, lips) or a surface (e.g., the palate) oriented normal to the flow generate dipole sources. Since dipole sources have been shown to be the most influential in fricative spectra [36], the noise source of the fricatives has only been approximated by equivalent pressure (voltage) dipole sources in the transmission-line model. Nevertheless, it is also possible to insert the appropriate monopole sources, which contribute to the low-frequency amplitude and can be modeled by an equivalent current (volume velocity) source.

Frication noise is generated at the vocal tract according to the suggestions of Flanagan [37] and Sondhi and Schroeter [9]. A noise source can be introduced automatically at any T-section of the vocal tract network between the velum and the lips. The synthesizer's articulatory module registers which vocal tract tube cross-sectional areas are below a certain threshold (A < 0.2 cm^2), producing a list of tube sections that might be part of an oral constriction that generates turbulence. The acoustic module calculates the Reynolds number (Re) at the sections selected by the articulatory module and activates noise sources at tube sections where the Reynolds number is above a critical value (Re_crit = 2000, according to [9]). Noise sources can also be inserted at any location in the vocal tract, based on additional information about the distribution and characteristics of sources [36, 38]. This is a different source placement strategy from that usually used in articulatory synthesis [9], where the sources are primarily located in the vicinity of the constriction. The distributed nature of some noise sources can be modeled by inserting several sources located in consecutive vocal tract sections. This will allow us to try combinations of the canonical source types (monopole, dipole, and quadrupole).

A pressure source with amplitude proportional to the squared Reynolds number,

\[ P_{noise} = \begin{cases} 2 \times 10^{-6}\,\mathrm{rand}\,\left(Re^2 - Re_{crit}^2\right), & Re > Re_{crit},\\ 0, & Re \leq Re_{crit}, \end{cases} \tag{2} \]

is activated at the correct place in the tract [9, 37]. The internal resistance of the noise pressure source is proportional to the volume velocity at the constriction: R_noise = ρ|U_c| / (2 A_c^2), where ρ is the density of the air, U_c is the flow at the constriction, and A_c is the constriction cross-sectional area. The turbulent flow can be calculated by dividing the noise pressure by the source resistance. This noise flow could also be filtered in the time domain to shape the noise spectrum [36] and test various experimentally derived dipole spectra.

3.4.2. Propagation and radiation

The general problem associated with having N noise sources is decomposed into N simple problems by using the superposition principle. In order to calculate the radiated pressure at the lips due to each noise source, the vocal tract is divided into the following three sections: pharyngeal, the region between the velum coupling point and the noise source, and the region after the source. Data structures based on the area function of each section are defined and ABCD matrices calculated [9]. The ABCD matrices are then used to calculate the downstream (Z_1) and upstream (Z_2) input impedances, as well as the transfer function, H, given by

\[ H = \frac{Z_1}{Z_1 + Z_2}\,\frac{1}{C Z_{rad} + D}, \tag{3} \]

where C and D are parameters of the ABCD matrix (from noise source to lips), and Z_rad is the lip radiation impedance. The radiated pressure at the lips due to a specific source is given by p_radiated(n) = h(n) * u_noise(n), where h(n) = IFFT(H). The output sound pressures due to the different noise sources are added together. The output sound pressure resulting from the excitation of the vocal tract by a glottal source is also added when there is voicing.
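A compact sketch of the activation logic described in Section 3.4.1 (the area threshold, equation (2), and the source resistance) follows. It is our paraphrase of the paper's rules, with two declared assumptions: the Reynolds number is estimated as Re = ρ|U_c| / (μ √A_c), one common choice that uses √A_c as the characteristic dimension (the paper does not spell out its exact definition), and "rand" in equation (2) is taken as a zero-mean uniform random number.

import numpy as np

RHO = 1.14e-3     # air density (g/cm^3)
MU = 1.86e-4      # air viscosity (g/(cm*s))
RE_CRIT = 2000.0  # critical Reynolds number [9]
A_THRESH = 0.2    # constriction area threshold (cm^2)

rng = np.random.default_rng(0)

def reynolds(Uc, Ac):
    # Assumed form: particle velocity Uc/Ac times sqrt(Ac), over the
    # kinematic viscosity MU/RHO.
    return RHO * abs(Uc) / (MU * np.sqrt(Ac))

def noise_source(Uc, Ac):
    """Pressure amplitude (eq. (2)) and internal resistance of a dipole
    noise source at one tube section; (0, None) when inactive."""
    if Ac >= A_THRESH:
        return 0.0, None                    # not a candidate constriction
    Re = reynolds(Uc, Ac)
    if Re <= RE_CRIT:
        return 0.0, None                    # below turbulence onset
    p = 2e-6 * rng.uniform(-1, 1) * (Re**2 - RE_CRIT**2)
    r = RHO * abs(Uc) / (2 * Ac**2)         # source resistance
    return p, r

# Example: 200 cm^3/s through a 0.1 cm^2 constriction activates the source.
p, r = noise_source(Uc=200.0, Ac=0.1)
print(f"Re = {reynolds(200.0, 0.1):.0f}, P_noise = {p:.2f}, R_noise = {r:.1f}")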
4. RESULTS

In this section, we present examples of simulation experiments performed with the synthesizer and two perceptual studies regarding European Portuguese nasal vowels. We start with the description of the perceptual tests; then, recent results in fricative synthesis; finally, examples of produced words and quality tests are presented.

4.1. Nasal vowels studies

The synthesizer was used to produce stimuli for several perceptual tests, most of them for studies of nasal vowels. Next, we present two representative studies: the first investigating the effect of the variation over time of the velum and other oral articulators; the second addressing the source-tract interaction effects in nasal vowels.

Experiment 1. Study of the influence of velum variation on the perception of nasal vowels in CṼC contexts [39].

Several studies point to the need to regard speech as a dynamic phenomenon. The influence of dynamic information on oral vowel perception has been a subject of study for many years. In addition, some researchers also see nasal vowels as dynamic. To produce high-quality synthetic nasal vowels, it would be useful to know to what extent we need to include dynamic information. We investigated whether, to produce a good-quality Portuguese nasal vowel, it is enough to couple the nasal tract, or whether varying the degree of coupling in time improves quality. The null hypothesis is that static and dynamic velum will produce stimuli of similar quality.

Our first tests addressed the CṼC context, nasal vowels between stops, the most common context for nasal vowels in Portuguese.

Velum and oral passage aperture variation for a nasal vowel produced between stop consonants is represented schematically in Figure 5. During the first stop consonant, the nasal and oral passages are closed. The beginning of the nasal vowel coincides with the release of the oral occlusion. To produce the nasal vowel, both the oral passage and the velum must be open. Possibly due to the slow speed of velum movements, in European Portuguese there is a period of time where the oral passage is open and the velum is in a closed, or almost closed, position, producing a sound with oral vowel characteristics, represented in Figure 5 by V. The velum continues its opening movement, creating simultaneous sound propagation in the oral and nasal tracts. This zone is represented by Ṽn. The oral passage must close for the following stop consonant, so the early oral closure (before the velar closure) creates a zone with only nasal radiation, represented by N. The place of articulation of this nasal consonant, created by coarticulation, is the same as that of the following stop.

Figure 5: Movement of the velum and oral articulators for a nasal vowel between two stop consonants (CṼC context). The three phases of a nasal vowel in this context are shown.

Stimuli

For this experiment, 3 variants of each of the 5 EP nasal vowels were produced, differing in the way velum movement was modeled. For the first variant, called static, the velum was open at a fixed value during the entire vowel production. The other two variants used a time-varying velum opening. In the first 100 milliseconds, the velum stayed closed, making an opening transition in 60 milliseconds to the maximum aperture, and then remaining open. In one of these variants, a final bilabial stop consonant, [m], was created at the end by lip closure at 250 milliseconds. All stimuli had a fixed duration of 300 milliseconds.

Listeners

A total of 11 European Portuguese native speakers, 9 male and 2 female, participated in the test. They had no history of speech, hearing, or language impairments.

Procedure

We used a paired comparison test [40, page 361], because we were analysing synthesis quality, despite the demand for more decisions by each listener, which also increases test duration. The question answered by listeners was as follows: which of the two stimuli do you prefer as a European Portuguese nasal vowel? In preparing the test, we noticed that listeners had, in some cases, difficulty in choosing the preferred stimulus.
The causes were traced to either the good or the poor quality of both stimuli. To handle this situation, we added two new possibilities, for a total of four possible answers: first, second, both, and none.

The test was divided into two parts. In the first part, we compared static versus dynamic velum stimuli. In the second part, the comparison was made between dynamic stimuli with and without a final bilabial nasal consonant. Stimuli were presented 5 times in both AB and BA order. The interstimuli interval was 600 milliseconds.

The results for each possible pair of stimuli in the test were checked for listener consistency. They were retained if the listener preferred one stimulus in more than 70% of the presentations. Only clear choices of one stimulus against the others were analyzed.

Results

Variable velum preferred to static velum. Preference scores (percentage of the designated stimuli chosen as the preferred one) for fixed velum aperture, variable velum aperture, and the difference between the two are presented in the box plots of Figure 6. Clearly, listeners preferred stimuli with a time-variable velum aperture. The average preference, including all vowels and listeners, was as high as 71.8%. The confidence interval (CI, p = 0.95) for the difference in preference score was between 24.2 and 65.6%, in favour of the variable velum case. Repeated measures ANOVA showed a significant velum variation effect [F(1,10) = 5.67, p < 0.05] and a nonsignificant (p > 0.05) vowel effect and interaction between the two main factors (vowel and velum variation).

Figure 6: Box plots of the preference scores for the first part of the perceptual test for nasal vowels in CṼC context, comparing stimuli with fixed and variable velum apertures, showing the effect of the velum aperture variation.

Nasal consonant at nasal vowel end was preferred. In general, listeners preferred stimuli ending in a nasal consonant. Looking at the preference scores represented graphically in Figure 7, stimuli with a final nasal consonant were preferred more than stimuli without the final consonant. The confidence interval (CI, p = 0.95) for the difference in preference score was between 36.1 and 87.0%, in favour of the stimuli with a final nasal consonant. ANOVA results, with two main factors, confirmed a significant effect of the final nasal consonant [F(1,8) = 9.5, p < 0.05] and a nonsignificant (p > 0.05) vowel effect and interaction between the main factors.

Figure 7: Box plots of the preference scores for the second part of the perceptual test for nasal vowels in CṼC context, comparing stimuli with and without a final nasal consonant, showing the effect of the final nasal consonant.

Experiment 2. Study of source-tract interaction for nasal vowels [29].

We investigated whether the extra coupling of the nasal tract in nasal vowels produces identifiable alterations in the glottal source due to source-tract interaction, and whether modeling such effects results in more natural-quality synthetic speech. Figure 8 depicts the effect of the 3 different input impedances on nasal vowel [ĩ]. The nasal tract load has a great influence on the glottal source wave, because of the noticeable difference in the input impedance calculated with or without the nasal tract input impedance, shown in Figure 9.

Figure 8: Glottal wave of 3 variants of vowel [ĩ]: (a) without tract load (no interaction); (b) with total tract load; (c) with tract input impedance calculated discarding the nasal tract input impedance.

Figure 9: Input impedance for vowel [ĩ] with and without the nasal tract input impedance.

This difference is due to the fact that, for high vowels such as [ĩ], the impedance load for the pharyngeal region, which is equal to the parallel combination of the oral cavity and nasal tract input impedances, is almost equal to the nasal input impedance (see Figure 10). The effect is less noticeable in low vowels, such as [ɐ̃].

Figure 10: Input impedances in the velum region for nasal vowel [ĩ]. The figure presents the oral input impedance (Z_in oral), the nasal tract input impedance (Z_in nasal), and the equivalent parallel impedance (parallel). The parallel impedance is, for this vowel, approximately equal to the nasal tract input impedance.
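The parallel-load argument can be checked with two lines of arithmetic: when the oral branch is nearly occluded, its input impedance is much larger than the nasal one, and the parallel combination collapses onto the nasal impedance. A toy numerical illustration (the magnitudes are invented for the example, not taken from Figure 10):

def parallel(z_oral, z_nasal):
    # Input impedances at the coupling point combine like parallel circuit
    # elements seen from the pharynx.
    return z_oral * z_nasal / (z_oral + z_nasal)

# High vowel [i~]: narrow oral passage -> large |Z_oral|
print(parallel(50.0, 5.0))   # ~4.5: close to the nasal impedance alone
# Low vowel [6~]: open oral cavity -> comparable impedances
print(parallel(8.0, 5.0))    # ~3.1: clearly below either branch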
Stimuli

Stimuli were produced for the EP nasal vowels varying only one factor: the input impedance of the tract used by the interactive source model. This factor had 3 values: (1) input impedance including the effect of all supraglottal cavities; (2) input impedance calculated without taking into account the nasal tract coupling; or (3) no tract load. Only 3 vowels, [ɐ̃], [ĩ], and [ũ], were considered, to reduce test realization time.

The same timing was used for all vowels. In the first 100 milliseconds, the velum stayed closed, making an opening transition in 60 milliseconds to the maximum value. The velum remained at this maximum until the end of the vowel. The stimuli ended with a nasal consonant, a bilabial [m], produced by closing the lips. The closing movement of the lips started at 200 milliseconds and ended 50 milliseconds later. Stimulus duration was fixed at 300 milliseconds for all vowels. These choices were based on the results of Experiment 1, where dynamic velum stimuli were preferred.

The interactive source model was used with variable F_0: F_0 starts around 100 Hz, rises to 120 Hz in the first 100 milliseconds, and then gradually goes back down to 100 Hz. The open quotient was 60% and the speed quotient 2. Jitter and shimmer were added to improve naturalness.

Listeners

A total of 14 European Portuguese native speakers, 11 male and 3 female, participated in the test. They had no history of speech, hearing, or language impairments.

Procedure

A 4IAX (four-interval forced-choice) discrimination test was performed to investigate whether listeners were able to perceive changes in the glottal excitation caused by the additional coupling of the nasal tract. The 4IAX test was chosen, instead of the more commonly used ABX test, because better discrimination results have been reported with this type of perceptual test [4]. In the 4IAX paradigm, listeners hear two pairs of stimuli, with a small interval in between. The members of one pair are the same (AA); the members of the other pair are different (AB). Listeners have to decide which of the two pairs has different stimuli.

Signals were presented over headphones in rooms with low ambient noise. Each of the 4 combinations (ABAA, ABBB, AAAB, and BBAB) was presented 3 times in a random order. With this arrangement, each pair to be tested appears 12 times. The order was different for each listener. The interstimuli interval was 400 milliseconds and the interpair interval was 700 milliseconds.

Results

Table 2 shows the percentage of correct answers for the 4IAX test. The table presents results for each listener and vowel. Also, the statistics (mean and standard deviation) for each vowel, and for the 3 vowels, are presented at the bottom of the table. Results are condensed, in graphical form, in Figure 11.

Table 2: Results of the 4IAX test.

Listener   Sex   [ɐ̃]     [ĩ]      [ũ]     Average
1          M     50.0    33.3    41.7    41.7
2          M     58.3    100.0   50.0    69.4
3          F     50.0    41.7    50.0    47.2
4          F     33.3    83.3    66.7    61.0
5          M     16.7    58.3    33.3    36.1
6          M     66.7    66.7    66.7    66.7
7          M     50.0    50.0    41.7    47.2
8          M     58.3    58.3    41.7    52.8
9          F     41.7    50.0    66.7    52.7
10         M     58.3    50.0    58.3    55.6
11         M     33.3    83.3    58.3    58.3
12         M     75.0    58.3    58.3    63.9
13         M     50.0    41.7    33.3    41.4
14         M     83.3    50.0    58.3    63.8
Average          51.8    58.9    51.8    54.1
Std              17.3    18.6    11.9    10.3
From the table and the box plots, it is clear that the listeners' correct answers were close to 50%, being a little higher for the nasal vowel [ĩ]. These results indicate that the differences between stimuli are difficult for listeners to perceive. Statistical tests, having as null hypothesis H_0: μ = 50 and alternative H_1: μ > 50, were only significant, at a 5% level of significance, for [ĩ]. For this vowel, the 95% confidence interval for the mean was between 50.1 and 67.7. For [ɐ̃], we obtained p = 0.36, and for [ũ], p = 0.29. For the 3 vowels considered together, the average was also not significantly superior to 50% (p = 0.08).

Discussion

Simulations showed some small effects of the nasal tract load on the time and frequency properties of the glottal wave. Results of the perceptual tests, conducted to study to what extent these alterations were perceived by listeners, supported the idea that these changes are hardly perceptible. These results agree with the results reported in [41]. In their work, Titze and Story reported that "An open nasal port ... showed no measurable effect on oscillation threshold pressure or glottal flow."

Figure 11: Box plot of the 4IAX discrimination test results for evaluation of the listeners' ability to perceive the effects of source-tract interaction on nasal vowels.

There is, however, a tendency for the effect of the interaction to be more perceptible for the high vowel [ĩ], produced with a reduced oral cavity. Our simulation results suggest, as an explanation for this difference, the relation between the nasal tract input impedance and the impedance of the vocal cavity at the nasal tract coupling point.

4.2. Fricatives

In a first experiment, the synthesizer was used to produce sustained unvoiced fricatives [42]. The vocal tract configuration derived from a natural high vowel was adjusted by raising the tongue tip in order to produce a sequence of reduced vocal tract cross-sectional areas. The lung pressure was linearly increased and decreased at the beginning and end of the utterance, to produce a gradual onset and offset of the glottal flow.

The second goal was to synthesize fricatives in VCV sequences [42]. Articulatory configurations for vowels were obtained by inversion [43]. The fricative segment was obtained by manual adjustment of articulatory parameters. For example, to define a palato-alveolar fricative configuration for the fricative in [iʃi], we used the configuration of vowel [i] and only changed the tongue tip articulator to a raised position, ensuring a cross-sectional area small enough to activate noise sources. For [ivi], besides raising the tongue tip, as described for [iʃi], we used lip opening to create the necessary small area passage at the lips. Synthesis results for the nonsense word [iʃi] are shown in Figure 12.

An F_0 value of 100 Hz and a maximum glottal opening of 0.3 cm^2 were used to synthesize the vowels. The time trajectory of the glottal source parameter A_gmax rises to 2 cm^2 at the fricative middle point and, at the end of the fricative, returns to the value used during vowel production.

Figure 12: Synthetic [iʃi], showing speech signal and spectrogram.

Nonsense words with voiced fricatives were also produced, keeping the vocal folds vibrating throughout the fricative. Results for the [ivi] sequence are presented in Figure 13.

Figure 13: Synthetic [ivi], showing speech signal and spectrogram.
4.3. Words

The synthesizer is also capable of producing words containing vowels (oral or nasal), nasal consonants, and (lower-quality) stops. To produce such words, and since the synthesizer is not connected to the linguistic and prosodic components of a text-to-speech system, we used the following manual process:

(1) obtaining durations for each phonetic segment entering the word composition (presently by direct analysis of natural speech, although an automatic process, such as a CART tree, can be used in the future);

(2) obtaining oral articulator configurations for each of the phones. For vowels, we used configurations obtained by an inversion process based on the first four formants of the natural vowels [43, 44]. These configurations were already available from previous work [39, 43]. For the consonants, for which we do not yet have an inversion process, configurations were obtained manually, based on articulatory phonetics descriptions and published X-ray and MRI images;

(3) velum trajectory definition, using adequate values for each vowel and consonant (see the parameter-track sketch after Example 1);

(4) setting glottal source parameters, in particular, the fundamental frequency (F_0).

We first attempted to synthesize words containing nasal sounds due to their relevance in the Portuguese language [45]. We now present three examples of synthetic words: mão, mãe, and António.

Example 1 (word mão (hand)). First, from natural speech analysis, we measured durations of 100 milliseconds for the [m] and 465 milliseconds for the nasal diphthong. In this case, the [m] configuration was obtained manually and the configurations for [a] and [u] were obtained by an inversion process [43, 46]. The three configurations are presented in Figure 14.

Figure 14: Tract configurations used to synthesize the word mão (hand): (a) [m], (b) [a], and (c) [u].

A velum trajectory was defined, based on articulatory descriptions of the intervening sounds. As shown in Figure 15, the velum starts closed, in a preproduction position, opens for the nasal consonant, opens more during the first vowel of the diphthong, and finally rises towards closure in the second part of the diphthong. The fundamental frequency, F_0, and other source parameters were also defined. F_0 starts at 120 Hz, increases to 130 Hz at the end of the nasal consonant, then to 150 Hz to stress the initial part of the diphthong, and finally decreases to 80 Hz at the end of the word. This variation in time was based, partially, on the F_0 contour of natural speech. Values of 60% for the open quotient (OQ) and 2 for the speed quotient (SQ) were used. Jitter, shimmer, and source-tract interaction were also used.

Figure 15: Velum trajectory (from closed to open) used to synthesize the word mão (hand).

Two versions were produced: with and without lip closure at the end of the word. Due to the open state of the velum, this final oral closure results in a final nasal consonant [m]. The spectrogram of this last version is presented in Figure 16.

Figure 16: Spectrogram of the word mão produced by the articulatory synthesizer.
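As referenced in step (3) above, parameter tracks of the kind used in Example 1 can be written down as piecewise-linear trajectories and sampled at the control frame rate. The sketch below is our illustration, not SAPWindows code: the F_0 breakpoint values follow Example 1 (120, 130, 150, 80 Hz over the 565 ms word), the time of the 150 Hz peak is our guess, and the velum aperture values are invented (only its closed/open shape is described in the paper).

import numpy as np

def track(breakpoints, t):
    """Piecewise-linear parameter trajectory.
    breakpoints: list of (time_ms, value); t: query times in ms."""
    times, values = zip(*breakpoints)
    return np.interp(t, times, values)

# F0 contour for 'mao': 120 Hz at onset, 130 Hz at the end of [m] (100 ms),
# 150 Hz stressing the start of the diphthong (peak time assumed), 80 Hz
# at the end of the word (100 + 465 = 565 ms).
f0_track = [(0, 120), (100, 130), (185, 150), (565, 80)]

# Velum aperture, 0 = closed, 1 = fully open; values are illustrative only.
velum_track = [(0, 0.0), (20, 0.0), (100, 0.5), (185, 1.0),
               (380, 1.0), (565, 0.1)]

t = np.arange(0, 566, 10.0)          # 10 ms control frames
f0 = track(f0_track, t)
velum = track(velum_track, t)
print("F0 at 0/100/185/565 ms:", track(f0_track, [0, 100, 185, 565]))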
Example 2 (word mãe (mother)). A possible phonetic transcription for the word mãe (mother) is [mɐ̃j̃ɲ], including a palatal nasal consonant at the end [45, page 292]. Keeping the oral passage open at the end of the word produced a variant. Due to the lack of precise information regarding the oral tract configuration during production of [ɐ̃], we produced variants differing in the configuration used for the nasal vowel [ɐ̃]. One version was produced using the configuration of the oral vowel [a]; another, with a higher tongue position, used the configuration of vowel [ɐ]. Another parameter varied was F_0: versions with values obtained by analysis of natural speech, and versions with synthetic F_0. For the synthetic case, a further variation was used: the inclusion or not of source-tract interaction. Figure 17 shows the speech signal and respective spectrogram for nonnatural F_0, source-tract interaction, configuration of [a] for the nasal vowel [ɐ̃], and final palatal occlusion.

Figure 17: Speech signal and spectrogram of one of the versions of the word mãe, synthesized using an [a] configuration at the beginning of the nasal diphthong, oral occlusion at the end, source-tract interaction, and synthetic values for F_0.

Example 3 (word António). The first name of the first author, António [ɐ̃tɔnju], was also synthesized, using the same process as in the two previous examples. This word has a nasal vowel at the beginning, a stop, an oral vowel, a nasal consonant, and a final oral diphthong. Two versions were produced: one with natural F_0, and another with synthetic F_0. The signal and its spectrogram obtained for the first version are presented in Figure 18. The stop consonant [t] was obtained by closing and opening the oral passage, without modeling phenomena important for the perception of a natural-quality stop, such as the voice onset time (VOT) and the aspiration at the release of the closure.

Figure 18: Speech signal and spectrogram for the synthetic word António [ɐ̃tɔnju] produced using F_0 extracted from a natural pronunciation.

As part of a mean opinion score (MOS) quality test, this and many other stimuli produced by our synthesizer were evaluated. To document the quality level achieved by our models, Table 3 shows the ratings of the various versions of the 3 examples presented above. The normalized (to 5) results varied between the values 3 and 4 (from fair to good). The top-rated word obtained 3.7 (3.4 without normalization).
5. CONCLUSION

From the experience with simulations and perceptual tests using stimuli generated by our articulatory synthesizer, we believe that articulatory synthesis is a powerful approach to speech synthesis because of its anthropomorphic origin and because it allows us to address questions regarding human speech production and perception.

We developed a modular articulatory synthesizer architecture for Portuguese, using object-oriented programming. Separation of control, model, and viewer allows the addition of new models without major changes to the user interface. Implemented models comprise an interactive glottal source model, a flexible nasal tract area model, and a hybrid acoustic model capable of dealing with asymmetric nasal tract configurations and frication noise sources. Synthesized speech has a quality ranging from fair to good.

The synthesizer has been used, mainly, in the production of stimuli for perceptual tests of Portuguese nasal vowels (e.g., [39, 47, 48]). The two studies on nasal vowels reported in this paper were only possible with the use of the articulatory approach to speech synthesis, allowing the creation of stimuli by direct and precise control of the articulators and the glottal source. They illustrate the potential of articulatory synthesis in production and perception studies and the flexibility of our synthesizer.

Perceptual tests and simulations contributed to improving our knowledge regarding EP nasal sounds, namely the following.

(1) It is necessary to include the time variation of the velum aperture, combined with the time variation of the articulators controlling the oral passage, in order to synthesize high-quality nasal vowels.

(2) Nasality is not controlled solely by the velum movement. Oral passage reduction, or occlusion, can also be used to improve nasal vowel quality. When nasal vowels were word-final, the lips or tongue movement, even without occlusion, improved the quality of the synthesized nasal vowel by increasing the predominance of nasal radiation. Oral occlusion, due to coarticulation, before stops also contributes to nasal quality improvement.

(3) The source-tract interaction effect due to the extra coupling of the nasal tract is not easily perceived. Discrimination was significantly above chance level only for the high vowel [ĩ], which can possibly be explained by the relation of nasal and oral input impedances at the nasal tract coupling point.

Table 3: Quality ratings for several words produced by the synthesizer. For each word, the table includes the mean opinion score (MOS), its respective 95% confidence interval, and the normalized value resulting from scaling natural speech scores to 5.

Word            F_0        Interac.   Observ.            MOS    CI 95%        Norm.
mão             Synthetic  Yes        no [m] at end      3.4    [3.…, 3.7]    3.7
                Synthetic  Yes        [m] at end         3.0    [2.…, 3.4]    3.3
mãe             Natural    Yes        [a], [ɲ] at end    2.9    [2.…, 3.3]    3.2
                Synthetic  Yes        [a], [ɲ] at end    3.1    [2.…, 3.4]    3.3
                Synthetic  Yes        [ɐ], [ɲ] at end    2.9    [2.…, 3.3]    3.2
                Synthetic  Yes        [a], no [ɲ]        3.0    [2.…, 3.4]    3.3
                Synthetic  No         [ɐ], [ɲ] at end    2.9    [2.…, 3.3]    3.1
                Synthetic  No         [a], [ɲ] at end    2.8    [2.…, 3.2]    3.1
António         Natural    Yes                           3.0    [2.…, 3.2]    3.3
                Synthetic  Yes                           2.7    [2.…, 2.9]    2.9
Natural speech                                           4.5                  5.0
A nasal vowel, at least in European Portuguese, is not a sound obtained only by lowering the velum. The way this aperture and the other articulators vary in time is important; namely, modeling how the velum and the oral articulators vary in the various contexts improves quality.

With the addition of noise source models and modifications to the acoustic model, our articulatory synthesizer is capable of producing sustained fricatives and fricatives in VCV sequences. First results were presented, and were judged in informal listening tests as highly intelligible. Our model of fricatives is comprehensive and flexible, making the new version of SAPWindows a valuable tool for trying out new or improved source models, and for running production and perceptual studies of European Portuguese fricatives [49]. The possibility of automatically inserting and removing noise sources along the oral tract is a feature we regard as having great potential.

The SAPWindows articulatory synthesizer is useful in phonetics research and teaching. We explored the first area for several years with very interesting results, as shown in this paper. Recently, we started exploring the second area, aiming at using the synthesizer in phonetics teaching at our University's Languages and Cultures Department. Articulatory synthesis is also of interest in the field of speech therapy because of its potential to model different speech pathologies.

Development of this synthesizer is an unfinished task. The addition of new models for other Portuguese sounds, the use of combined data (MRI, EMA, EPG, etc.) for a detailed description of the vocal tract configurations and an optimal match between the synthesized and the Portuguese natural spectra [49], and the integration of the synthesizer in a text-to-speech system are planned as future work.

ACKNOWLEDGMENTS

This work was partially funded by the first author's Ph.D. Scholarship BD/3495/94 and the project Articulatory Synthesis of Portuguese, P/PLP/11222/1998, both from the Portuguese Research Foundation (FCT), PRAXIS XXI program. We also have to thank the University of Florida's MMIRC, headed by Professor D. G. Childers, where this work started.

REFERENCES

[1] R. Linggard, Electronic Synthesis of Speech, Cambridge University Press, Cambridge, UK, 1985.
[2] M. R. Schroeder, Computer Speech: Recognition, Compression, Synthesis, vol. 35 of Springer Series in Information Sciences, Springer-Verlag, New York, NY, USA, 1999.
[3] J.-P. Tubach, "Présentation générale," in Fondements et Perspectives en Traitement Automatique de la Parole, H. Méloni, Ed., Universités Francophones, 1996.
[4] G. J. Borden, K. S. Harris, and L. J. Raphael, Speech Science Primer: Physiology, Acoustics, and Perception of Speech, LWW, 4th edition, 2003.
[5] D. Klatt, "Software for a cascade/parallel formant synthesizer," Journal of the Acoustical Society of America, vol. 67, no. 3, pp. 971 ff., 1980.
[6] P. Rubin, T. Baer, and P. Mermelstein, "An articulatory synthesizer for perceptual research," Journal of the Acoustical Society of America, vol. 70, no. 2, pp. 321 ff., 1981.
[7] S. Maeda, "The role of the sinus cavities in the production of nasal vowels," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP), vol. 2, pp. 911 ff., Paris, France, May 1982.
[8] T. Koizumi, S. Taniguchi, and S. Hiromitsu, "Glottal source-vocal tract interaction," Journal of the Acoustical Society of America, vol. 78, no. 5, pp. 1541 ff., 1985.
[9] M. M. Sondhi and J. Schroeter, "A hybrid time-frequency domain articulatory speech synthesizer," IEEE Trans. Acoust., Speech, Signal Processing, vol. 35, no. 7, pp. 955 ff., 1987.
[10] C. H. Shadle and R. Damper, "Prospects for articulatory synthesis: A position paper," in Proc. 4th ISCA Tutorial and Research Workshop (ITRW'01), Perthshire, Scotland, August-September 2001.
[11] D. H. Whalen, "Articulatory synthesis: Advances and prospects," in Proc. 15th International Congress of Phonetic Sciences (ICPhS), pp. 175 ff., Barcelona, Spain, August 2003.
[12] B. Kühnert and F. Nolan, "The origin of coarticulation," Forschungsberichte des Instituts für Phonetik und Sprachliche Kommunikation der Universität München (FIPKM), vol. 35, pp. 61 ff., 1997; also in [50]. Online. Available: http://www.phonetik.uni-muenchen.de/FIPKM/index.html.
[13] A. Pinto and A. M. Tomé, "Automatic pitch detection and midi conversion for the singing voice," in Proc. WSES International Conferences: AITA, AMTA, MCBE, MCBC, pp. 312 ff., Greece, 2001.
[14] A. M. Öster, D. House, A. Protopapas, and A. Hatzis, "Presentation of a new EU project for speech therapy: OLP (Ortho-Logo-Paedia)," in Proc. TMH-QPSR, Fonetik 2002, vol. 44, pp. 45 ff., Stockholm, Sweden, May 2002.
[15] F. S. Cooper, "Speech synthesizers," in Proc. 4th International Congress of Phonetic Sciences (ICPhS), A. Sovijärvi and P. Aalto, Eds., pp. 3 ff., The Hague: Mouton, Helsinki, Finland, September 1961.
[16] M. Wrembel, "Innovative approaches to the teaching of practical phonetics," in Proc. Phonetics Teaching & Learning Conference (PTLC), London, UK, April 2002.
[17] S. S. Fels, F. Vogt, B. Gick, C. Jaeger, and I. Wilson, "User-centred design for an open source 3-D articulatory synthesizer," in Proc. 15th International Congress of Phonetic Sciences (ICPhS), vol. 1, pp. 179 ff., Barcelona, Spain, August 2003.
[18] K. Iskarous, L. Goldstein, D. H. Whalen, M. K. Tiede, and P. E. Rubin, "CASY: The Haskins configurable articulatory synthesizer," in Proc. 15th International Congress of Phonetic Sciences (ICPhS), vol. 1, pp. 185 ff., Barcelona, Spain, 2003.
[19] S. Maeda and M. Toda, "Mechanical properties of lip movements: How to characterize different speaking styles?" in Proc. 15th International Congress of Phonetic Sciences (ICPhS), vol. 1, pp. 189 ff., Barcelona, Spain, August 2003.
[20] P. Badin, G. Bailly, F. Elisei, and M. Odisio, "Virtual talking heads and audiovisual articulatory synthesis," in Proc. 15th International Congress of Phonetic Sciences (ICPhS'03), vol. 1, pp. 193 ff., Barcelona, Spain, August 2003.
[21] K. N. Stevens and H. M. Hanson, "Production of consonants with a quasi-articulatory synthesizer," in Proc. 15th International Congress of Phonetic Sciences (ICPhS), vol. 1, pp. 199 ff., Barcelona, Spain, August 2003.
[22] P. Mermelstein, "Articulatory model for the study of speech production," Journal of the Acoustical Society of America, vol. 53, no. 4, pp. 1070 ff., 1973.
[23] J. Schroeter and M. M. Sondhi, "Speech coding based on physiological models of speech production," in Advances in Speech Signal Processing, Marcel Dekker, New York, NY, USA, 1992.
[24] P. P. L. Prado, "A target-based articulatory synthesizer," Ph.D. dissertation, University of Florida, Gainesville, Fla, USA, 1991.
[25] A. Teixeira, F. Vaz, and J. C. Príncipe, "A comprehensive nasal model for a frequency domain articulatory synthesizer," in Proc. 10th Portuguese Conference on Pattern Recognition (RecPad 98), Lisbon, Portugal, March 1998.
[26] M. Chen, "Acoustic correlates of English and French nasalized vowels," Journal of the Acoustical Society of America, vol. 102, no. 4, pp. 2360 ff., 1997.
[27] J. Dang and K. Honda, "MRI measurements and acoustic investigation of the nasal and paranasal cavities," Journal of the Acoustical Society of America, vol. 94, no. 3, pp. 1765 ff., 1993.
[28] K. N. Stevens, Acoustic Phonetics, Current Studies in Linguistics, MIT Press, Cambridge, Mass, USA, 1998.
[29] A. Teixeira, F. Vaz, and J. C. Príncipe, "Effects of source-tract interaction in perception of nasality," in Proc. 6th European Conference on Speech Communication and Technology (EUROSPEECH 99), vol. 1, pp. 161 ff., Budapest, Hungary, September 1999.
[30] D. Allen and W. Strong, "A model for the synthesis of natural sounding vowels," Journal of the Acoustical Society of America, vol. 78, no. 1, pp. 58 ff., 1985.
[31] T. V. Ananthapadmanabha and G. Fant, "Calculation of true glottal flow and its components," Speech Communication, vol. 1, no. 3-4, pp. 167 ff., 1982.
[32] L. Silva, A. Teixeira, and F. Vaz, "An object oriented articulatory synthesizer for Windows," Revista do Departamento de Electrónica e Telecomunicações, Universidade de Aveiro, vol. 3, no. 5, pp. 483 ff., 2002.
[33] E. L. Riegelsberger, "The acoustic-to-articulatory mapping of voiced and fricated speech," Ph.D. dissertation, The Ohio State University, Columbus, Ohio, USA, 1997.
[34] Q. Lin, "A fast algorithm for computing the vocal-tract impulse response from the transfer function," IEEE Trans. Speech Audio Processing, vol. 3, no. 6, pp. 449 ff., 1995.
[35] M. Frigo and S. Johnson, "FFTW: an adaptive software architecture for the FFT," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP), vol. 3, pp. 1381 ff., Seattle, Wash, USA, 1998.
[36] S. S. Narayanan and A. A. H. Alwan, "Noise source models for fricative consonants," IEEE Trans. Speech Audio Processing, vol. 8, no. 3, pp. 328 ff., 2000.
[37] J. L. Flanagan, Speech Analysis, Synthesis and Perception, Springer-Verlag, New York, NY, USA, 2nd edition, 1972.
[38] C. H. Shadle, "Articulatory-acoustic relationships in fricative consonants," in Speech Production and Speech Modelling, W. J. Hardcastle and A. Marchal, Eds., pp. 187 ff., Kluwer Academic, Dordrecht, The Netherlands, 1990.
[39] A. Teixeira, F. Vaz, and J. C. Príncipe, "Influence of dynamics in the perceived naturalness of Portuguese nasal vowels," in Proc. 14th International Congress of Phonetic Sciences (ICPhS), San Francisco, Calif, USA, August 1999.
[40] N. Kitawaki, "Quality assessment of coded speech," in Advances in Speech Signal Processing, S. Furui and M. M. Sondhi, Eds., chapter 12, pp. 357 ff., Marcel Dekker, New York, NY, USA, 1992.
[41] I. R. Titze and B. H. Story, "Acoustic interactions of the voice source with the lower vocal tract," Journal of the Acoustical Society of America, vol. 101, no. 4, pp. 2234 ff., 1997.
[42] A. Teixeira, L. M. T. Jesus, and R. Martinez, "Adding fricatives to the Portuguese articulatory synthesizer," in Proc. 8th European Conference on Speech Communication and Technology (EUROSPEECH 03), pp. 2949 ff., Geneva, Switzerland, September 2003.
[43] A. Teixeira, F. Vaz, and J. C. Príncipe, "A software tool to study Portuguese vowels," in Proc. 5th European Conference on Speech Communication and Technology (EUROSPEECH 97), G. Kokkinakis, N. Fakotakis, and E. Dermatas, Eds., vol. 5, pp. 2543 ff., Rhodes, Greece, September 1997.
[44] A. Teixeira, F. Vaz, and J. C. Príncipe, "Some studies of European Portuguese nasal vowels using an articulatory synthesizer," in Proc. 5th IEEE International Conference on Electronics, Circuits and Systems (ICECS 98), vol. 3, pp. 507 ff., Lisbon, Portugal, September 1998.
[45] J. Laver, Principles of Phonetics, Cambridge Textbooks in Linguistics, Cambridge University Press, Cambridge, UK, 1st edition, 1994.
[46] D. G. Childers, Speech Processing and Synthesis Toolboxes, John Wiley & Sons, New York, NY, USA, 2000.
[47] A. Teixeira, L. C. Moutinho, and R. L. Coimbra, "Production, acoustic and perceptual studies on European Portuguese nasal vowels height," in Proc. 15th International Congress of Phonetic Sciences (ICPhS), Barcelona, Spain, August 2003.
[48] A. Teixeira, F. Vaz, and J. C. Príncipe, "Nasal vowels following a nasal consonant," in Proc. 5th Seminar on Speech Production: Models and Data, pp. 285 ff., Bavaria, Germany, May 2000.
[49] L. M. T. Jesus and C. H. Shadle, "A parametric study of the spectral characteristics of European Portuguese fricatives," Journal of Phonetics, vol. 30, no. 3, pp. 437 ff., 2002.
[50] W. J. Hardcastle and N. Hewlett, Eds., Coarticulation: Theoretical and Empirical Perspectives, Cambridge University Press, Cambridge, UK, 1999.

António J. S. Teixeira was born in Paredes, Portugal, in 1968. He received his first degree in electronic and telecommunications engineering in 1991, the M.S. degree in electronic and telecommunications engineering in 1993, and the Ph.D. degree in electrical engineering in 2000, all from the University of Aveiro, Aveiro, Portugal. His Ph.D. dissertation was on articulatory synthesis of the Portuguese nasals. Since 1997, he has been teaching in the Department of Electronics and Telecommunications Engineering at the University of Aveiro, as a Professor Auxiliar since 2000, and has been a Researcher, since its creation in 1999, in the Signal Processing Laboratory at the Institute of Electronics and Telematics Engineering of Aveiro (IEETA), Aveiro, Portugal. His research interests include digital processing of speech signals, particularly (articulatory) speech synthesis; Portuguese phonetics; speaker verification; spoken language understanding; dialogue systems; and man-machine interaction. He is also involved, as the Coordinator, in a new Master's program in the area of speech sciences and hearing. He is a Member of The Institute of Electrical and Electronics Engineers, the International Speech Communication Association, and the International Phonetic Association.

Roberto Martinez was born in Cuba in 1961. He received his first degree in physics in 1986 from the Moscow State University M. V. Lomonosov, former USSR. He has been a Microsoft Certified Engineer since 1998. From 1986 to 1994, he was an Assistant Professor of mathematics and physics at the Havana University, Cuba, doing research in computer-aided molecular design. From 1996 to 1998, he was with SIME Ltd., Cuba, as an Intranet Developer and System Administrator. From 1999 to 2001, he was with DISAIC Consulting Services, Cuba, training network administrators and consulting in Microsoft BackOffice, systems integration, and network security. He is currently working toward the Doctoral degree in articulatory synthesis of Portuguese at the University of Aveiro, Portugal.

Luís Nuno Silva received his first degree in electronics and telecommunications engineering in 1997 and the M.S. degree in electronics and telecommunications engineering in 2001, both from the Universidade de Aveiro, Aveiro, Portugal. From 1997 till 2002, he worked in research and development at the Instituto de Engenharia Electrónica e Telemática de Aveiro, Aveiro, Portugal (former Instituto de Engenharia de Sistemas e Computadores of Aveiro, Aveiro, Portugal) as a Research Associate. Since 2002, he has been working as a Software Engineer at the Research and Development Department of NEC Portugal, Aveiro, Portugal. His research interests include digital processing of speech signals and speech synthesis. He is a Member of The Institute of Electrical and Electronics Engineers.

Luis M. T. Jesus received his first degree in electronic and telecommunications engineering in 1996 from the Universidade de Aveiro, Aveiro, Portugal, the M.S. degree in electronics in 1997 from the University of East Anglia, Norwich, UK, and the Ph.D. degree in electronics in 2001 from the University of Southampton, UK. Since 2001, he has been a Reader at the Escola Superior de Saúde da Universidade de Aveiro, Aveiro, Portugal, and has been a member of the Signal Processing Laboratory at the Instituto de Engenharia Electrónica e Telemática de Aveiro, Aveiro, Portugal. His research interests include acoustic phonetics, digital processing of speech signals, and speech synthesis. He is a Member of The Acoustical Society of America, Associação Portuguesa de Linguística, the International Phonetic Association, the International Speech Communication Association, and The Institute of Electrical and Electronics Engineers.
José C. Príncipe is a Distinguished Professor of electrical and biomedical engineering at the University of Florida, Gainesville, where he teaches advanced signal processing and artificial neural networks (ANNs) modeling. He is a BellSouth Professor and the Founder and Director of the University of Florida Computational NeuroEngineering Laboratory (CNEL). He has been involved in biomedical signal processing, brain-machine interfaces, nonlinear dynamics, and adaptive systems theory (information-theoretic learning). He is the Editor-in-Chief of IEEE Transactions on Biomedical Engineering, President of the International Neural Network Society, and former Secretary of the Technical Committee on Neural Networks of the IEEE Signal Processing Society. He is also a Member of the Scientific Board of the Food and Drug Administration, and a Member of the Advisory Board of the University of Florida Brain Institute. He has more than 100 publications in refereed journals, 10 book chapters, and over 200 conference papers. He has directed 42 Ph.D. degree dissertations and 57 M.S. degree theses.

Francisco A. C. Vaz was born in Oporto, Portugal, in 1945. He received the Electrical Engineering degree from the University of Oporto, Portugal, in 1968, and the Ph.D. degree in electrical engineering from the University of Aveiro, Portugal, in 1987. His Ph.D. dissertation was on automatic EEG processing. From 1969 to 1973, he worked for the Portuguese Nuclear Committee. After several years working in industry, he joined, in 1978, the staff of the Department of Electronics Engineering and Telecommunications, the University of Aveiro, where he is currently a Full Professor. His research interests have centred on the digital processing of biological signals, and since 1995 on digital speech processing.