<%BANNER%>

Real-World evaluation of mobile phone speech enhancement algorithms

University of Florida Institutional Repository

PAGE 3

TABLEOFCONTENTS ACKNOWLEDGMENTS........................ii ABSTRACT..............................v CHAPTERS 1INTRODUCTION..........................1 1.1Background................................2 1.1.1EnergyRedistribution......................2 1.1.2BandwidthExpansion......................5 1.1.3CombinedAlgorithm.......................7 1.2ListeningTests..............................7 1.3ChapterSummary............................9 2PCBASEDLISTENINGTESTS..................10 2.1IntelligibilityTest.............................10 2.1.1BandwidthExpansionResults..................11 2.1.2ERVUResults...........................12 2.2PerceptualLoudnessTest........................12 2.2.1ObjectiveLoudness........................12 2.2.2SubjectiveLoudness.......................13 2.3AcceptabilityTest.............................14 3EXPANDEDPCBASEDLISTENINGTESTS............17 3.1Motrola TM VSELPVocoder.......................17 3.2NoiseSources...............................18 3.2.1SNRCalculation.........................19 3.2.2SegmentalSNR..........................19 3.2.3A-Weighting............................21 3.2.4ChoosingtheSNRlevels.....................22 3.3AudioEQFilter..............................23 3.4ListenerDemographicsandTestResults................24 3.5ANoteonERVU.............................27 4JAVAIMPLEMENTATIONOFLISTENINGTESTS.........29 4.1J2MEandJ2SE..............................29 4.2WhyJ2ME?................................30 iii

PAGE 4

4.2.1DevelopinginJ2ME.......................31 4.2.2J2MEConstraints........................31 4.3MotorolaiDENSeriesPhones......................33 4.4ListeningTestSetup...........................33 4.5 ListenT MIDlet..............................33 4.5.1LoudnessTest...........................36 4.5.2IntelligibilityTest.........................38 4.5.3AcceptabilityTest........................41 4.5.4DatabaseStorage.........................41 4.5.5 RandObj Class..........................42 4.6 ListenDB Class..............................42 4.7 Voicenotes MIDlet............................42 4.8VoiceNoteCollection...........................44 4.9 DbUpload MIDlet.............................44 4.10PCCodingUsingJ2SE..........................46 5CONCLUSION..........................49 APPENDIX AJ2MECLASSESANDMETHODS..................51 A.1TheListenTClass............................51 A.2TheDbUpload..............................51 A.3VoiceNotes2................................51 BJ2SECLASSESANDMETHODS..................55 B.1DbDownload...............................55 B.2ControlPanelandScrollingPanel.....................56 B.3ConnectPhone...............................56 REFERENCES............................58 BIOGRAPHICALSKETCH......................60 iv

PAGE 5

AbstractofThesisPresentedtotheGraduateSchool oftheUniversityofFloridainPartialFulllmentofthe RequirementsfortheDegreeofMasterofScience REAL-WORLDEVALUATIONOFMOBILEPHONESPEECH ENHANCEMENTALGORITHMS By WilliamThomasO'Rourke December2002 Chairman:JohnG.Harris MajorDepartment:ElectricalandComputerEngineering Thisworkevaluatestheperformanceoftwonewclassesofautomaticspeechenhancementalgorithmsbymodellinglisteningtestsclosertoreal-worldenvironments. Resultsfromearlierlisteningtestsshowthatthewarpedbandwidthexpansionalgorithmincreasesperceptualloudnessandtheenergyredistributionvoiced/unvoiced algorithmincreasesintelligibilityofthespeechwithoutaddingadditionalpowerto thespeechsignal. Thisthesispresentsresultsfromlisteningtestsconductedwithamodelofrealworldenvironmentsandprovidesaplatformforcellularphonebasedlisteningtests. Bothalgorithmsarecombinedonaframebasistoincreaseintelligibilityandloudness.Thespeechsignalsareencodedanddecodedtomodeleectsofcellularphone vocoders.Threetypicalenvironmentalnoises(pink,babbleandcar)areusedtotest thealgorithms'performancetonoise.Perceptualtechniquesareusedtocalculatethe signaltonoiseratio(SNR).AspeakerEQmodelisusedtoemulatethefrequency limitsofcellularphonespeakers.Finally,cellularphonebasedlisteningtestsare v

PAGE 6

developedusingtheJava2MicroEditionplatformforMotorolaiDENJavaenabled cellularphones. Thelisteningtestsresultedina4.8%intelligibilityincreaseat-5dBSNRanda 4dBperceptualloudnessincreaseforthecombinedalgorithm.Thecellularphone basedlisteningtestswillprovideanideallisteningtestenvironmentoncetheJava environmentisequippedwithstreamingaudioabilitiesandthealgorithmsareimplementedonthephone. vi

PAGE 7

CHAPTER1 INTRODUCTION Theuseofcellularphonesisonanincreaseallovertheworldandnaturallyitis becomingmorecommontoseepeopleusingtheirphonesinhighnoiseenvironments. Theseenvironmentsmayincludedrivingincars,socializinginloudgatheringsor workinginfactories.Todealwiththenoise,cellularphoneuserswilloftenpressthe phonetotheirheadandturnupthevolumetothemaximumwhichmanytimesisstill notenoughtounderstandthespeech.Cellularphonemanufacturerscouldusemore powerfulspeakersorhighercurrentdrivers,bothincreasingcostandbatterysize. Algorithmsthatincreaseintelligibilityandoverallloudnesswillhelplowerbattery usageandeaseuserstrainwhenusingthephone. Thisthesisstudiestheimplementationandevaluationofanewclassofalgorithmforcellularphoneuse.Theenergyredistributionalgorithm[22],describedin Section1.1.1,isaneorttoincreaseoverallintelligibilityofspeech.Section1.1.2describesthebandwidthexpansionalgorithm[4],usedtoincreaseperceptualloudness ofspeech.Theaimofimplementingthesealgorithms,iseither(1)toenhancethe speechfornoisyenvironmentsor(2)tomaintainhequalityofthespeechatalower signalpowerinordertoextendbatterylife. Thisthesisprimarilyaddressesthetestingofthesealgorithmstoensurereal-world applicability.ThesetestsincludecontrolledenvironmentlaboratorytestingonPCs andreal-worldenvironmenttestingoncellularphones.ThePCtestingrstteststhe performanceofthetwoalgorithmswithoutreal-worldconsiderations.Next,thePC testingismodiedtobettermodelreal-worldenvironments.Finally,thelistening testsareimplementedonacellularphonetoevaluatereal-worldperformance. 1

PAGE 8

2 1.1Background ThisthesisisanalrequirementforthefullmentofworkdoneforiDENDivision ofMotorola.Theproposedworkattemptedtoincreaseintelligibilityandperceptual loudnesswithoutincurringadditionalpowercost.Theworkrequiredalgorithmsthat werefeasibleforreal-timeimplementationsandwouldnotaectthenaturalnessof thespeech.Itassumedthatstandardnoisereductiontechniqueswouldbeperformed priortotheapplicationofthesealgorithmsandthatthereceivedspeechcouldbe assumedtobeclean.Insupportofthiswork,theextendedlaboratorytestingand cellularphonebasedlisteningtestswererequired.Theideawastoenablethecomplete evaluationofthetwoalgorithmswhichresultedfromthisresearch. 1.1.1EnergyRedistribution Theenergyredistributionvoiced/unvoiced(ERVU)algorithm[22]hasbeenshown toincreasetheintelligibilityofspeech.Thealgorithmwasdevelopedbasedonpsychoacoustics.First,thepoweroftheunvoicedspeechiscrucialforintelligibility. Second,thepowerofthevoicedregionscanbeattenuateduptoacertainpoint withoutaectingtheintelligibilityandnaturalnessofthespeech.Voicedspeechis generatedanytimeglottalexcitationisusedtomakethesound.Voicedsignalstypicallyhavehigherpowerthanunvoicedsignals.Additionally,mostofthesignalpower liesinthelowerfrequenciesforvoicedspeech. Energyredistributionisperformedinthreebasicsteps.First,thevoicedandunvoicedregionsaredeterminedthroughthespectralatnessmeasurement(SFM)[11, 22],showninEquation1.1,onindividualwindowsofspeech. SFM = N 1 Q k =0 j X k j 1 N 1 N N 1 P k =0 j X k j (1.1)

PAGE 9

3 whereNisthewindowlengthandX k istheDiscreteFourierTransform(DFT)ofthe window. Second,theSFMiscomparedtotwothresholdsT 1 andT 2 .Thesethresholds aredeterminedbasedonstatisticalclassicationoftheSFMonvoicedandunvoiced speech,showninFigure1.1.ThevaluesforT 1 andT 2 weresetto0.36and0.47 respectively.ThedecisioncasescanbeseeninEquation1.2. Figure1.1:DiscriminationofphonemesbytheSFM Decision = 8 > > > > > > < > > > > > > : Voiced for SFMT 2 PreviousDecision otherwise (1.2) Next,theboostinglevelforthevoicedandunvoicedregionsmustbedetermined. Theboostinglevelisagainfactorthatwillbeappliedtothewindow.Forunvoiced windows,theboostingwillbegreaterthan1andforvoicedwindows,itmustbeless than1.Forwindowsthatfallbetweenboththresholds,theboostinglevelwillremain thesame.Boostinglevelsweredeterminedbyevaluatingvarioussentencesobtained

PAGE 10

4 fromtheTIMITdatabase[27].Thelevelswereadjusteduntilnaturalnesswaslost andthensettothepreviouslevel.Theresultinglevelsweresettobe0.55forvoiced speechand4forunvoiced.Inordertosmooththetransitions(goingfromvoicedto unvoicedwindowsorvice-versa),theboostinglevelisadjustedlinearlyintherst10 millisecondsofthewindow.Anexampleoforiginalspeechutterance\six"isplotted withthemodiedversioninFigure1.2.TheSFMtechniquewaschosenovertwo Figure1.2:ResultofapplyingERVUtotheutterance\six" othertechniquesdiscussedbyReinke[22].Thesetechniquesuseameasureofspectral transitiontodictateenhancement.Thepointsofhighspectraltransitionareimportantforretainingvowelperceptioninco-articulation.Similarresultswereobtained inintelligibilitytests.TheSFMtechniqueisalsolesscomputationallycomplexthan theothermethods. TheresultsoftestsconductedbyReinke[22]haveshownanincreaseofintelligibilitycloseto5percentat0dBSNR.Resultsindicatetheperformanceofthe ERVUalgorithmdecreasedwhentheoriginalspeechwascorruptedwithnoise.This

PAGE 11

5 shortfallistheresultofusingSFM.Theaddednoisellsthenullsbetweenformants typicallyassociatedwithvoicedspeech.Thisincreasesthegeometricmean(numeratorofEquation1.1)ofthespectrumsignicantly.TheresultingSFMisincreased andleadstoamisclassicationofvoicedspeech.However,thisthesisexaminesthe intelligibilityofcleanspeechforthesender'ssidewithnoiseonthereceiver'sside. 1.1.2BandwidthExpansion Bandwidthexpansion[4]utilizesawarpedltertoincreaseperceptualloudnessof vowelsincleanspeech.LiketheERVU,bandwidthexpansionusesmotivationfroma psychoacousticsperspective.Theunderlyingprincipleisthatloudnessincreaseswhen criticalbandsareexceeded[3].Loudnessreferstothelevelofperceivedintensityof signalloudnessandcanbemeasuredbothobjectivelyandsubjectively.Objective measurementscanbemadeusingtheISO-532Bstandard(Zwickermethod)[10]. Thehumanauditorysystemactsasifthereisadedicatedband-passlteraround allfrequenciesthatcanbedetectedbythehumans.Withinthisband,perceptual loudnessisdominatedbythefrequencieswiththestrongestintensity.Varioustests havebeenperformedbyZwickerandFastl[29]tomeasurethesebands.Theunderlyingideaisthatwhenenergywithinabandisxed,theloudnessremainsconstant. However,oncethebandwidthisexceeded(theenergyisspreadovermorethanone criticalband)therewillbeanincreaseintheperceivedloudness. Thebandwidthexpansionalgorithmusesthisideaofspreadingthespectralenergy ofaspeechsignalovermorecriticalbands.Theregionsofinterest,vowels,are foundusingvoiceactivitydetectiondescribedbyMotorolaCorporation[19].Speech enhancementisperformedinthreesteps.First,avocaltractmodelisestimated usinglinearpredictioncoecients(LPC) a ,calculatedusingtheLevinson-Durbin recursion[1].TheexcitationisthenfoundusingtheinverselterA( z ),anFIRlter whosecoecientsare a .Then,thesignalisevaluatedotheunitcircleinthe z domain.Evaluationotheunitcircleisdonebyrstselectingtheradius( r )atwhich

PAGE 12

6 thesignalwillbeevaluated,thenthesignalispassedthroughanIIRlterA(~ z )shown inEquation1.3.Figure1.3showsthatthepoledisplacementwidensthebandwidth oftheformantsinthe z -domain. A (~ z ) j ~ z = re j! = P X k =0 a k r k e jwk (1.3) Figure1.3:VisualizationofPoleDisplacement. Sincecriticalbandsarenotofequalbandwidth,expansionbyevaluationalong thecircleofradius r willnotbeoptimal.Forthisreason,awarpingtechnique isused.ThiswarpingtechniqueisperformedliketheLPCbandwidthwidening, however,itworksonthecriticalbandscale.Theideaistoexpandthebandwidth onascaleclosertothatofthehumanauditorysystem.ThexedbandwidthLPC poledisplacementmethodismodiedbyapplyingthesametechniquetoawarped implementation.TheWarpedLPClter(WLPC)isimplementedbyreplacingthe unit-delayofA(~ z )withanall-passltershowninEquation1.4.Thewarpinglter providesanadditionalterm calledthewarpingfactor.Therangeofvaluesfor

PAGE 13

7 are-1to1.When is0,nowarpingtakeseect,for > 0highfrequencies arecompressedandlowfrequenciesexpandedandfor < 0lowfrequenciesare compressedandhighfrequenciesexpanded. ~ z 1 = z 1 1 z 1 (1.4) AftertheWLPCanalysisisdone,theexcitationispassedthroughawarpedIIR (WIRR)thatresultsinthewarpedbandwidthenhancedspeech.Anadditionalradius term isaddedtheWIIRsothattheresultingspectralsloperemainsthesame.The newalgorithmforbandwidthexpansioncanbeseeninEquation1.5. H ( z )= A (~ z=r ) A (~ z= ) (1.5) ThenalarchitectureusedforbandwidthexpansioncanbeseeninFigure1.4.Boillot foundvaluesforof and equalto0.35and0.4respectively,thealgorithmperformed thebestinsubjectiveloudnesstests.Thevalueof r wassetbetween0.4and0.8as afunctionoftonality[4]. 1.1.3CombinedAlgorithm InordertoevaluatetheeectsofbothERVUandbandwidthexpansion,the twoalgorithmshavebeencombinedonawindowbywindowbasis.Thecombined algorithmincreasesintelligibilityandperceptualloudnessgainwithoutincreasingthe signalpower. 1.2ListeningTests Inordertoverifythereal-worldeectsofspeechenhancementalgorithms,some formofsubjectivelisteningtestsmustbeperformed.Threedierentlevelsoflisteningtestswereperformedforthispurpose.First,controlledtestswereadministered inthelaboratory.ThesetestswereperformedusingcleanspeechfromtheTIMIT

PAGE 14

8 Figure1.4:RealizationoftheWarpedBandwidthExpansionFilter.

PAGE 15

9 andTI-46databases.Thepurposeofthesetestswastoobtainpreliminaryresults andmakenecessaryadjustmentstothealgorithms.Second,thetestsaremodiedto bettermodelthereal-worldenvironment.Thesemodelsincludetheintroductionof environmentalnoiseandmodellingoftheiDENcellularphonespeaker.Finally,the testsareconductedusingtheactualphone.ThisisenabledbytherecentincorporationoftheJava2Micro-EditionvirtualmachineoncertainiDENphones.Thethree testsarediscussedalongwiththeirresultsinChapters2,3and4respectively. 1.3ChapterSummary Theoutlinefortheremainderofthisthesisisasfollows: Chapter2:PCbasedlisteningtests. Thischapterwilldescribethebasic set-upofthelaboratorylisteningtestswithemphasisoncleanspeechandnoisefree environment.Itwillalsopresentresultsobtainedinpastresearchusingthisset-up. Chapter3:ExpandedPCbasedlisteningtests. Inthischapter,thebasic listeningtestsdescribedinChapter2willbemodiedtobetterapproximaterealworldconditions.Forthistest,backgroundnoiseconsistingofpink,carandbabble noiseisadded.Additionally,amodelofthefrequencyresponseoftheMotorolai90c phoneisusedtobettersimulatecellularphoneuse.Theresultswillshowthatthe algorithmsbothimproveintelligibilityby4.8percentat-5dBSNRandresultina perceptualloudnessgainof4dB. Chapter4:Javaimplementationoflisteningtests. ThischapterwilldescribetheimplementationoflisteningtestsforMotorolaJavaenabledcellularphones. Itwillalsodescribethesupportapplicationsdevelopedtomanageandevaluatelisteningtests. Chapter5:Conclusionsandfuturework. Thischapterwilldiscussthe resultsofthelisteningtests,shortcomingsofthealgorithmsandfutureworkonthe subjectofspeechenhancementandreal-worldtesting.

PAGE 16

CHAPTER2 PCBASEDLISTENINGTESTS Therststepinevaluatingtheeectsofspeechenhancementalgorithmsissome formoflisteningtest.Theresultsofthesetestscanbeusedtotune,modifyordiscard algorithmsbasedoninitialperformance.Forthepurposeofthisthesis,theevaluation ofthealgorithmsstartswithasimpliedlisteningtest.Thesetestsarewrittenin MATLABandwereoriginallydevelopedbyCNELmemberMarkSkowronski.The testshavebeenmodiedthroughtimetoaccommodatechangestothelisteningtests. ThischapterwilldescribethetestsastheywerewhenBoillot[4]testedthebandwidth expansionalgorithm.Intelligibility,loudnessandacceptabilitytestswereconducted. TheEnergyRedistributionVoiced/Unvoiced(ERVU)algorithm,originallytestedby Reinke[22],wasnotinitiallytestedinthislisteningtestenvironment.However,the resultsofERVUintelligibilitytestswillbediscussedinSection2.1.2. 2.1IntelligibilityTest Thepurposeofcommunicationsistogetthemessageacrossasclearlyaspossible.Algorithmsthatattempttoincreaseintelligibilityorperceptualloudnessrequire testingtoverifytheirapplicability.Intelligibilitytestingmethodscomeinmany forms.SomeofthesemethodsincludetheDiagnosticRhymeTest(DRT),theModiedRhymeTest(MRT),andthePhoneticallyBalancedwordlists(PB).Themethod usedinthisexperimentisavariantoftheDRT[28,4]. TheintelligibilitytestswereconductedusingspeechsamplesfromtheTI-46 database[26]sampledat10kHz.SetsI,IIandIII,fromTable2.1wereused.These setswereoriginallyusedbyJunqua[12]totesttheeectoftheLombardeecton 10

PAGE 17

11 speechintelligibility.TheLombardeectisthewaypeoplespeakdierentlywhenin anoisyenvironment.Theytrytocompensateforthenoisebyspeakinglouder,slower, moreclearlyandwithmorestressandemphasis.Theindividualsetsareconsidered tobeeasilyconfusableandprovideagoodvocabularyfortestingintelligibility. TheMATLABGUIusedtoconductthetestisshowninFigure2.1.Foreach utterance,analternateutterancewasselectedfromthesamesetandpresentedasa confusablechoice.TheGUIallowedonlyonechoiceanddidnotlimitthenumberof timestheutterancewasplayed.Eachutterancehadanequalchanceofbeingselected. Theorderofselectionwasrandomizedsothatthelistenerhadnoknowledgeofwhich utterancewascorrect.Halftheutterancespresentedwereleftintheiroriginalform andtheotherhalfenhancedwiththebandwidthexpansionalgorithm. Figure2.1:IntelligibilitytestGUI 2.1.1BandwidthExpansionResults Allthoughthebandwidthexpansionalgorithmonlyattemptstoincreasethe loudnessofspeech,itstillmustbetestedtoensurethatthereisnodecreasein intelligibility.Atotalof60utterancesbepresentedtoeachlistenerwithadded Gaussiannoiseat0dBSNR.Thetestresultedinanoveralldecreaseinintelligibilityof 0.3% 3.1%ata95%condenceinterval.Theseresultsshowedthatthebandwidth expansionalgorithmhadnomeasurableeectonintelligibility[4].Theseresults

PAGE 18

If,s,x,yesIIa,eight,h,kIIIb,c,d,e,g,p,t,v,z,threeIVm,n2.1.2ERVUResultsReinkeinitiallyusedthesameformofintelligibilitytestasBoillottotesthisERVUalgorithmandahigh-passlter.HelaterusedtheDRTtotestotherintelligibilityenhancementalgorithms[22].ThetestsusedsetsI,II,IIIandIVfromTable2.1.Gaussiannoisewasaddedtotheutterancesat0dBand-10dBSNRlevels.Hereporteda3%increaseinintelligibilityat0dBSNRand5.5%at-10dBSNR.Histestwasadministeredto25listeners,whoseagesrangedfrom21to37.2.2PerceptualLoudnessTestLoudnessisthehumanperceptionofintensityofsound[8].Itisafunctionofsoundintensity,frequencyandquality.Theloudnessofsinglesinusoidscanbemeasuredusingtheequalloudnesscontours[29].Thismeasure,usingtheunitsphon,inonlyvalidfornarrowbandsignals.Forthisreason,BoillotusedISO532Banalysis[10].Subjectivemeasuresareperformedusingaloudnesstest.2.2.1ObjectiveLoudnessSpeechisawidebandsignalandperceptualloudnesscannotbemeasuredusingtheequalloudnesscontours.Instead,amodelrstdevelopedbyZwicker[6],ISO532Bwasusedtoperformobjectivemeasures.ISO532Busesconceptsfromauditoryltermodelsandthecriticalbandconcept[7].Additionally,ISO532Bfollowsthemodel

PAGE 19

V1zero,one,two,four,ve,six,seven,nineV2enter,erase,help,repeat,right,ruboutV4a,eight,h,kV5b,c,d,e,g,p,t,v,z,threeV5m,nAtotalof80wordswaspresentedtoeachlistener.Boillotused15%ofthesewordstoperformascreeningevaluation.Thelistenerhadnoknowledgeofwhichwasthemodiedortheenhancedwordandtheorderwasrandomlyselected.First,bothwordsarenormalizedtoequalpowerandthentheenhancedversionwouldbescaledrandomlybetween0dBand5dBin0.5dBincrements.Thisprovidedtheperceptualgainoftheenhancedalgorithm.Thescalingpointwhichproduceda50%selectionratemarksthisperceptualgaincrossoverpoint.Thebandwidthexpansion

PAGE 20

14 Figure2.2:LoudnesstestGUI algorithmresultedinanapproximate2dBcrossoverpoint.Thisisthepointatwhich thelistenerisguessing,hence,50%theenhancedversionwaschosen.Theseresults canbeseeninFigure2.3,wheretheresultsareshowninsolidlines. Thescreeningprocessensuredthatthedatacollectionwasaccurate.Forthis portionofthetestsneitherwordwasmodiedbutonewasscaled.Ittestedthe hearingresolutionofthelistenerand,atthesametime,ensuredthelistenerwas payingattentionandnotsueringfromfatigue.Itwasimportanttoverifythelevel atwhichthehumanauditorysystemcouldperceiveachangeinloudnessbefore thealgorithmcouldbeconsideredeective.FromFigure2.3,thescreeningresults (indicatedbydashedlines)areequalat50%at0dBanddivergeasexpected. 2.3AcceptabilityTest Thegoalofthespeechenhancementalgorithmsistoincreasetheintelligibilityand perceptualloudnessofspeechwithoutdeterioratingthenaturalnessofthespeech. Boillotfoundthattheloudnessofvowelsincreasedmonotonicallyuntilthespectrum

PAGE 21

15 Figure2.3:TheBandwidthExpansionresultsforalllistenersontheLoudnesstest. Verticalbarsindicate95%condenceintervals[4]. wasat.Thisleadtoanobviousdistortionofthespeech.Toensurethatthewarped bandwidthexpansionalgorithmdidnoteectthequalityofspeech,Boillotincluded anacceptability,orquality,test[4].Thistestusedanumberratingsystemtoquantify theoverallimpression.Boillotusedpairedcomparisonteststoevaluateacceptability. Additionally,thetestincludedasubjectiveloudnessassessment.Thetestusedan originalandmodied(enhanced)versionofthesamesentencetakenfromtheTIMIT database.Atotalof20phoneticallybalancedsentencessampledat16kHzwereused. Theoriginalandthemodiedsentenceswerescaledtohaveequalpoweronaframeby-framebasis.Thelistenerswouldplayeachsentencepairandthensubjectivelyrank eachone.Thelistenersgaveamarkofexcellent,good,orfairforeachsentence.The numberonethroughthreecorrespondedtoeachmarkrespectively.Listenerswere directedtoscorerelatively.Forexample,evenifbothsentencessoundedexcellent, theyshouldstilltrytodeterminewhichofthetwowasbetterandgivethatsentence theexcellentmarkandtheotheramarkofgood.Theloudnessassessmentjustasked thelistenertodeterminewhichsentencesoundedlouderoverall.

PAGE 22

16 Figure2.4:AcceptabilitytestGUI Boillotreportedaqualityratingof1.56fororiginaland1.47formodiedspeech. Themodiedsentencewasselectedlouder90%ofthetime.Theseresultsinindicate thattheoverallqualityisnotaected.Theloudnessassessmentresultsprovidea previewtohowthealgorithmwouldperformonsentencesinsteadofthesingleword utterancesusedinloudnessandintelligibilitytests.

PAGE 23

CHAPTER3 EXPANDEDPCBASEDLISTENINGTESTS ThelisteningtestsdiscussedinChapter2didnotimplementtypicalreal-world eectsonspeech.Inthischapter,wewillexpandtheseteststomorecloselymodel real-worldenvironments.Thisisdoneinthreesteps.First,tosimulatetheeectsof vocodersusedoncellularphones,thespeechisvocodedthendevocoded.Then,noise sourcesarechosenbasedonreal-worldenvironments.Finally,theAudioEQModelfor theMotorolaI85ccellularphoneisimplementedtosimulatethecellularphonespeaker frequencyresponse.Additionally,thebandwidthexpansionandERVUalgorithms arecombinedinanattempttoincreaseperceptualloudnessandintelligibility.Itis thiscombinationalgorithmthatwillbeusedinthischapter.Theresultingsignals modiedbythisalgorithmwillbereferredtoasthe\enhanced"or\modied"signal inthischapter.Forthesetests,SonyMDR-CD-10headphoneswereused. 3.1Motrola TM VSELPVocoder TheMotorolaiDENi90cphoneusestheVectorSumExcitedLinearPrediction (VSELP)Vocoder[17]toprovideencodinganddecodingofspeechfortransmission. Thepurposeoftheencodingonthesendsideistocompressspeechtolimittransmissionbandwidth.Onthereceiveside,thevocoderthendecodesthecompressed speech.Theresultofencodinganddecodingisdegradationofthespeech.This degradationincludesanalreadylimitedfrequencyrangeandlossofnaturalness.To simulatethisdegradation,MotorolahasprovidedaCprogramtoemulatetheencodinganddecodingofspeech.Thisallowsthelisteningteststobettermodelthesound ofspeechdeliveredbythephone.TheCprogramusestheAdvancedMulti-Band 17

PAGE 24

18 Table3.1:DescriptionandSamplesofNoiseSources. PinkNoiseacquiredbysamplinghigh-qualityanalognoisegenerator (Wandel&Goltermann).Exhibitsequalenergyper1/3octave. BabbleNoiseacquiredbyrecordingsamplesfrom1/2"B&Kcondensormicrophoneontodigitalaudiotape(DAT).Thesourceof thisbabbleis100peoplespeakinginacanteen.Theroomradius isovertwometers;therefore,individualvoicesareslightlyaudible. Thesoundlevelduringtherecordingprocesswas88dBA. Volvo340noiseacquiredbyrecordingsamplesfrom1/2"B&Kcondensormicrophoneontodigitalaudiotape(DAT).Thisrecording wasmadeat120km/h,in4thgear,onanasphaltroad,inrainy conditions. Excitation(AMBE)vocodermodelwhichisalsousediniDENphones.Todothis, thespeechmustrstbere-sampledatarateof8kHZ(thesamplingrateusedonthe phones).Thisfurthersthereal-worldmodelbybandwidth-limitingthespeech. 3.2NoiseSources ThelisteningtestsdescribedinChapter2exclusivelyusedwhiteGaussiannoise but,thereisalargervarietyofnoisetypesweexperienceinoureverydaylifeandfew ofthemareGaussian.Someofthesenoisesarecar,cocktailparty(babble),machine andpinknoise.Itwouldbeidealtotesttheperformanceofouralgorithmwithall possiblenoisesources.However,inordertopreventlistenerfatigueandtoallow multipleSNRleveltesting,thenoisesourceswerelimitedtocar,babble,andpink noises.ThesenoisesourceswereobtainedfromRiceUniversitySignalProcessing InformationBase[23].Thenoisesamplesaresampledat19.98kHzandavailablein bothMATLAB.MATformatand.WAVformat.Table3.1describesthenoisesand providessamples.

PAGE 25

19 3.2.1SNRCalculation ThereareseveralmethodsforcalculatingtheSNRofasignal.Theclassicform isshowninEquation3.1.Where x [ n ]isthecleanspeechsignal, v [ n ]istheadditive noiseandNisthenumberofsamplesforthesignal. SNR =10 log 10 0 B B B @ N 1 P n =0 ( x [ n ]) 2 N 1 P n =0 ( v [ n ]) 2 1 C C C A (3.1) ThisformofSNRisusuallytakenovertheentiresignallength.Unfortunately, thisformwillnoteectivelymeasuretheperceptualsignicanceofnoisetohuman hearing.Humansarelimitedintheirfrequencyrangeofhearing.Humanhearing typicallyrangesfrom20Hztoupwardsof20kHz.Obviously,ahighpowersignal atfrequenciesaboveandbelowthisrangewillnoteecttheperceptualSNRfrom thelistenersstandpoint.Additionally,thereisasharpdrop-ointheintensityof soundabove4kHz.Thisdrop-oisapparentinthe equalloudnesscontours shownin Figure3.1.Forthisreason,analternateapproachtothemethodusedinEquation3.1 isneedled. 3.2.2SegmentalSNR ClassicSNRcalculationscarrylittleperceptualsignicance,sincetheywilltend tobehigherastheratioofvoicedtounvoicedspeechincreases.Abettercalculation canbeobtainedbyconsideringtheSNRvaluesforframesofspeech.SegmentalSNR ( SNR seg ),showninEquation3.2,usesaframebasedaverageofthestandardSNR inEquation3.1.Thebasisofthe SNR seg calculationisthatittakesintoaccount theshortdurationofunvoicedspeech.Ifthe SNR seg equationisrearranged,we seethatitisageometricmeanofthewindowedspeechsignalSNR.Thisisseenin Equation3.3.

PAGE 26

20 Figure3.1:EqualLoudnessContours SNR seg = 1 L M 1 X k =0 10 log 10 0 B B B B @ L ( k +1) 1 P n = L k ( x [ n ]) 2 L ( k +1) 1 P n = L k ( v [ n ]) 2 1 C C C C A (3.2) SNR seg =10 log 10 0 B B B B B @ M 1 Y k =0 0 B B B B @ L ( k +1) 1 P n = L k ( x [ n ]) 2 L ( k +1) 1 P n = L k ( v [ n ]) 2 1 C C C C A 1 L 1 C C C C C A (3.3) ThewindowedbasedSNRsareusuallylimitedtoupperandlowervalues.Thisisused forinstanceswherethewidowedSNRissignicantlyloworhigh.Inthesecasesthe extremeswoulddominatethemeasurement.Anexampleofthisisifthesignalpower goestozerothewindowedSNRwouldbesetto-10dBinsteadof .Likewise,if thenoisesignalgoestozero,thewidowedSNRissetto45dBinsteadof 1 Theprimaryproblemwithusing SNR seg tocalculatethenoisegainrequired,is thatitcallsforaniterativeprocess.Thatis,sincethecalculationisnon-linear,there isnoclosedformsolutionlikestandardSNR.Thecalculationmustberepeatedfor

PAGE 27

21 severalnoisegainvaluesbeforethedesiredSNRlevelisachieved.Forthisreason,we neededanotherapproachtocalculateperceptualSNR. 3.2.3A-Weighting AnotherapproachtoeectivelyaddingnoiseataspecicperceptualSNRlevelis A-Weighting[20].A-Weightingcoecientsareusedtolterboththespeechsignal andthenoise.Thesetemporarysignalsarethenusedtocalculatetherequiredgain forthenoisesourceinordertoachieveaspecicperceptualSNR.Thefrequency responseoftheA-WeightinglterisshowninsolidredinFigure3.2.Thegure alsoshowstheresultsofaveragingtheinverseofthe equalloudnesscontours from Figure3.1indashedblue. Figure3.2:A-WeightingFilterMagnitudeResponse. NambaandMiura[20]foundthatA-WeightingwasidealforcalculatingtheperceptualSNRofnarrow-bandnoise.Thoughspeechandmanynoisetypesareconsideredwide-bandsignals,theuseofA-Weightingisstillabetterapproximationthan theclassiccalculation.Additionally,thetimerequiredforthelisteningtestsshould berelativelyshortandmorecomplexcalculations,suchasISO532b,arenotpractical

PAGE 28

22 forthelisteningtests.Forthesereasons,A-weightingwasusedforthecalculationof theSNRlevelforthelisteningtestdiscussedinthischapter. 3.2.4ChoosingtheSNRlevels Ifweweretotestthefullrangeofalistener'shearinginnoisyenvironments, thepeakincreaseinintelligibilitycouldbefound.Thiswouldrequireadjustingthe SNRfromthepointwherethelistenerwasachieving100%accuracytowhereheor sheismerelyguessingeverytimeontheintelligibilitytest.Figure3.3showsthe expectedresultsiftheSNRwasadjustedfromaverylowSNRleveltoaveryhigh level.Thisgureshowsanexpectedperformanceandvaluesareprovidedforclarity only.AtaparticularSNRlevel,themaximumpercentincreasewouldbeachieved. Unfortunately,thiswouldrequirethatthetestdurationbeextremelylong.And, giventhethreenoisesources,itwouldmostdenitelyleadtolistenerfatigue.This requiredsomepretestingtoestablishwhatlevelstouse.Wedecidedthatthedesired accuracyforun-enhancedspeechshouldbesomewherecloseto75%.Thisnumber, beinghalfwaybetweentheupperandlowerlimits,wouldleaveroomforinexperienced andexperiencedlisters. Figure3.3:ConceptualIntelligibilityTestResultsforaSingleListener.

PAGE 29

23 TheprocedureforndingtheseSNRlevelsinvolvedapreliminaryintelligibility test.Fourlistenerswereusedforthesetests.FirsttheSNRwassetto5dBandthen anadaptivealgorithmwasused.Foreachnoisesource,atotalof80utteranceswere presented.After20utterances,thepercentcorrectwascalculated.Ifthepercent correctwashigherthan75%,theSNRwasloweredby1dB.Ifitwaslowerthan 75%,theSNRwasraised.Aftereachadditionalveutterances,thepercentcorrect wasagaincalculated.Ifthepercentcorrectwasstillapproaching75%,thenthe SNRwasnotchanged.However,ifitwasmovingawayorunchangedtheSNRwas adjusted.Table3.2showstheresultsofthepreliminaryintelligibilitytests.The listener'slanguageandSNRdBlevelwhichresultedin75%correctfortherespective noisesource.Basedontheseresults,SNRlevelsof-5dBand5dBwerechosen. Table3.2:ResultsofPreliminaryIntelligibilityTest. ListenerNativeLanguageBabbleCarPink IEnglish1.7dB-9.4dB-0.8dB IIEnglish1.0dB-2.2dB0.0dB IIIHindi2.2dB-3.2dB-3.5dB IVChinese3.5dB2.5dB2.7dB 3.3AudioEQFilter Cellularphonespeakersaredesignedwithsizeandcostinmind.Theyrequire compactsizeandlimitedcostinordertobeusedincellularphones.Additionally, theFederalCommunicationsCommission(FCC)putsconstraintsonthespeakerpeak outputsoundpressurelevel(SPL).Becauseofthesedesignconstraintsthespeakers havelimitedfrequencyrange.Motorolaprovidedthefrequencyresponseforthe iDENi85phonespeaker(assumedtobelinear),showninFigure3.4.TheMATLAB function firls:m ,fromtheSignalProcessingToolbox,waschosentodesignthelter.

PAGE 30

24 The firls:m functionusesaleast-squares(LS)approachtoderivetheFIRlter coecientsfortheAudioEQmodel[14].TheFIRlterislinear-phaseandtherefore doesnotdistortthespeechsignal.Thepurposeofthismodelwastomimicthe Figure3.4:PhaseandFrequencyResponsefortheSpeakerEQModel. cellularphoneenvironment.Thismodelwasusedtolterthespeechsignalsafterthey wereenhancedwiththecombinedalgorithm.Sinceweknowthatenvironmentalnoise isnotlimitedtothesamebandwidthascellularphonespeech,thiswasperformed beforetheSNRcalculationandonlythespeechsignalwasltered. 3.4ListenerDemographicsandTestResults Demographics Atotalof22listersweretestedinbothLoudnessandIntelligibility tests.AtotalofsixlistenerswerenativeEnglishspeakers.Nineteenofthelisteners weremaleandthreewerefemale.Fiveofthelistenerswereconsideredexperienced (takenmultiplelisteningtests)listeners.Thelistenersrangedfrom22to42yearsof age.Theaveragetesttimewas22minutesand38seconds.

PAGE 31

25 ListeningTestFlowDiagrams Figures`3.5and3.6showtheowdiagramsfor theloudnessandintelligibilitytestsrespectively. Figure3.5:SignalFlowDiagramfortheLoudnessTests. Figure3.6:SignalFlowDiagramfortheIntelligibilityTests. ResultsforLoudnessTests Theaverageperceptualloudnessgainwas4dB.This resultisapparentinthecrossoverplot,Figure3.7.Thescreeningprocess,shownby thedottedlines,indicatesthattheresultsareaccurateandthattheuserswerepaying attention.Ofthe22listenersresultsonlyfourfellbelowthe4dBcrossoverandnone ofthesefellbelow2dB.Table3.3showsthetotalresultsfortheloudnesstests.These resultsarehigherthanearliertestsconductedbyBoillot.Thismaybeattributedto theapplicationoftheAudioEQmodel.The2.5dBgainwasforformantexpansionon speechsampledat16kHz.The4dBgainwasachievedusingthecombinedalgorithm onvocodedspeechsampledat8kHZ.TheAudioEQmodelhasapeakaroundthe

PAGE 32

26 Figure3.7:TheresultsforalllistenersontheExpandedLoudnesstest.Verticalbars indicate95%condenceintervals[4]. 2-3kHzrangewhichalsocorrespondstothehighestsensitivityontheISO-226equal loudnesscurves[9]. ResultsforIntelligibilityTests TheIntelligibilitytestsresultedinanincreaseof 4.8%at-5dBSNRforenhancedspeechoverallnoisetypesandconfusablesets.This isaminimumincreaseandweexpectthatthemaximumincreasewouldbelarger.At a5dbSNRlevelthetestsresultedinlessthan1%decreaseinintelligibility.The95% condenceintervalsareshownforoverallresults,inTables3.4and3.5.Thesetables Table3.3:ResultsforSubjectiveLoudnessTests. Scalingof TimesSelectedTimesSelectedPercentEnhanced Modied OriginalEnhancedSelected -5dB 1389340 -4dB 10510951 -3dB 6610862 -2dB 6418568 -1dB 4121084 0dB 3621586

PAGE 33

27 Table3.4:IntelligibilityTestresultsfor5dBSNR. Alg. All I II III IV Overall O 83.56 5.94 89.06 92.36 79.93 69.00 E 82.70 4.69 82.24 86.31 81.96 81.18 Car O 88.35 91.67 90.35 86.46 34.21 E 88.35 91.23 88.85 85.15 82.89 Babble O 84.92 93.86 91.05 78.59 45.61 E 81.65 82.46 92.11 79.21 65.79 Pink O 78.20 56.00 90.79 72.02 60.53 E 77.40 65.44 51.05 83.50 75.69 Table3.5:IntelligibilityTestresultsfor-5dBSNR. Alg. All I II III IV Overall O 66.49 3.98 66.57 81.83 64.31 53.73 E 71.31 4.92 68.45 73.25 74.59 61.26 Car O 73.08 83.86 74.91 64.31 31.58 E 78.47 81.14 59.74 82.22 39.47 Babble O 62.48 55.26 78.20 67.79 14.47 E 72.17 70.18 66.18 74.65 44.91 Pink O 64.13 50.88 79.91 61.00 43.86 E 65.09 46.05 75.12 69.49 50.38 alsoshowresultsforallthreenoisesourcesvs.allfourconfusablesetsinTable2.1 andfortheoriginal(O)andenhanced(E)signals. 3.5ANoteonERVU ThoughtheSFMtechniquewasusedforvoiced/unvoiceddecisionintestingfor itslowercomputationalcomplexity,thereisanissueofprecisionwhenusingaxedpointDigitalSignalProcessor(DSP).ThegeometricmeanistakenontheDFT valuesofaframeofspeech.Weknowthatthesevaluesarelessthanone.Theresult ofmultiplyingallthevaluesinoneframewouldresultinanumberlessthanthe precisionoftheMotorola56000DSPusedintheiDENphones.Forexample0 : 9 160 = 4 : 8 10 8 whencomparedtotheDSPsprecisionof2 15 =3 : 11 10 5 .Moreelaborate

PAGE 34

28 calculationsusinglogarithmsproduceasolutionbutrequirehighercomputational complexity.Weproposeanalternatemethodforvoice/unvoiceddecisionusinga peakautocorrelationratiotechnique. r [ k ]= 1 N N 1 X n =0 x [ n ] x [ n k ] ; where N> 0(3.4) Equation3.4showsthebiasedautocorrelationfunctionforlag k .Autocorrelation, iscommonlyusedinpitchrecognition[1]systems.Pitch,therateatwhichtheglottal opensandcloses,isinherenttovoicedspeech.Thepeaksintheautocorrelation functiontakeonvoicedspeechareseparatedbytheperiodicityofthepitch.For unvoicedspeechtheautocorrelationfunctionresemblessomethingclosetoanimpulse response.Thisisduetothecharacteristicofunvoicedspeechbeingclosetostationary whiteGaussiannoise. ratio = max m 24 m N 1 r [ m ] r [0] ; where N> 0 and f s 4 m = maximumpitch. (3.5) Insteadofcalculatingthepitch,weconsidertheratioofthesignalpower,equivalentto r [0],andthemaximumvalueoftheautocorrelationfunctionfromlag 4 m to N 1. 4 m ischosentoremoveanyspreadingoftheimpulsearoundlagzeroandto ignoreunrealisticpitchvalues.Thepeakautocorrelationratiotechniqueresultsina 6.2%voiced/unvoicedclassicationerrorascomparedtoa3.8%errorusingtheSFM technique.However,theperformanceofSFMdecreasesastheSNRdecreases.This canbealleviatedusingthepeakautocorrelationratiotechnique.Thistechniqueis consideredveryrobusttonoise[1]forpitchdetection.

PAGE 35

CHAPTER4 JAVAIMPLEMENTATIONOFLISTENINGTESTS InChapter3wediscussedmethodsformakingthelisteningtestsdiscussedin Chapter2morerelevanttofuturereal-worldoperationofthealgorithms.These enhancementsincludedreal-worldnoise,vocodereectsandmodellingthespeakerEQ curesofthecellularphones.However,theselisteningtestsresultsarestillsomewhat articialsincetheuserisactuallylisteningonaheadsetandnotonacellularphone. Clearly,ifwecouldmoveourwholelisteningtestenvironmenttothephone,thenthe testscouldberuninthetrueenvironmentsuchasridinginacaronahighwayor usingthephoneinacrowdedsocialgathering.Thischapterdiscussestheeortsof implementingthelisteningtestsontheJavaphone{Javaenabledcellularphone{,the interfacebetweenthePCandphoneandthedatabasemanagement.Thelistening testsconductedinChapter3gavepromisingresultstowardsreal-worldperformance. Tonishtheevaluation,wemustbeabletoquantifytheperformanceinthetrue listeningenvironment.Real-worldtestingcanbeperformedusingtheJavaphonein anaturalenvironment. 4.1J2MEandJ2SE TheJava2MicroEdition(J2ME)isasoftwaredevelopmentkit(SDK)commonly usedinmobiledevices.DevelopmentinJ2MEislimitedinseveralwaysascompared totheJava2StandardEdition(J2SE).First,thelimitedmemoryspace,describedin section4.3,requiresecientcoding.Second,thelimitedclasssetreducesfunctionalityandsometimesrequiresadditionalcoding.UnlikeJ2SE,developmentofJ2ME applicationsrequirestheuseofacongurationandprole.TheConnectedLimited 29

PAGE 36

30 DeviceCongurationdescribes(CLDC)theAPIforacertainfamilyofdeviceswhich includestheMotorolaiDENseriesJavaenabledphones.TheMobileInformation DeviceProle(MIDP)sitsontopofthecongurationandtargetsaspecicclass ofdevices[15].AllJavaapplicationswrittenforthephoneareanextensionofthe MIDletclass.AMIDletisaMIDPapplication.ProgramminginJ2MEisperformed byutilizingtheclasseswithinMIDPandCLDCAPIs.TheMIDletisdevelopedspecictothedevicesitwillrunonand,inthiscase,theMotorolaJavaenablediDEN phones. 4.2WhyJ2ME? Javaisanever-evolvingsoftwaredevelopmenttool.Motorolahasincorporated theJava2MicroEdition(J2ME)virtualmachine,describedinSection4.1,incertainiDENmodelphones.SomeadvantagesofJavaareportability(writeonce,run anywhere),thoroughdocumentationandextendednetworkingability.ThoughJ2ME islimitedintheseadvantages,itstillprovidesanexcellentenvironmentforthedevelopmentofapplicationsoncellularphones.Thefollowingisalistofsomeofthe abilitiesJ2MEhas. 1.CommunicationwithaPCviatheserialport. 2.Communicationwithwebresourcesviatheinternet. 3.Adatabasestoragesystem. 4.BasicGUIoperation. 5.Controlofvocodedspeechles. 6.Imagerenderingandanimation. 7.Multi-threadedoperation.

PAGE 37

31 ThedevelopmentofthelisteningtestsontheiDENphonesincorporatestheabilities listedinitems1,2,3,4and5,inthelistabove. 4.2.1DevelopinginJ2ME TheprocedurefordevelopingJ2MEcodeisasfollows: 1.Thecodeiswrittenwithinthelimitationofthedeviceitdesignedtorunon. 2.ItiscompiledusinganystandardJavacompilerandJ2MElibraries. 3.Itispreveriedusingadevice-specicpreverier. 4.Thecodeistestedonanemulatorforbugs. 5.Anyproblemsencounteredintheemulationaredebugged. 6.Thecodeisrecompiled,packagedintoaMIDletsuite,convertedtoJavaArchive (JAR)lesanduploadedtothephone. TheuploadingofMIDletsisperformedusingtheJavaApplicationLoader(JAL) utility.WhenthephoneisconnectedtothePC'sserialport,theJALutilitycanbe usedtoview,delete,anduploadMIDlets.Ifanybugsarediscoveredafterrunning theMIDlet,thecodemustbedebuggedandstepstwothroughsixarerepeated.The prevericationandemulationsoftwaremaynotalwayscatchproblemsthatwilloccur oncetheMIDletsareexecutedonthephone.MultipleMIDletscanbeuploadedto thephone.IftheseMIDletsarerequiredtoshareresources,theymustbepartof thesamesuite.ThemultipleMIDletsarepackagedintoaMIDletsuiteandthen compiledandconvertedtoJARlestobeuploadedtothephone.SeeAppendixA foranexplanationofallclassesandmethodscreatedtoimplementthelisteningtests. 4.2.2J2MEConstraints Constraintsonfont,screencolors,screensizeandimagerenderingneedtobe carefullyconsideredwhenwritingJ2MEcodeforaspecicdevice.Thebasicuser

PAGE 38

32 Figure4.1:Motorolai90ciDENphone displayobjectusedinJ2MEisthe Display class.Thisclasscontainsallmethods forbringingdisplayableinformationtotheforegroundofthedisplayscreen.These methodscancontrolanydisplayableobjects.The Screen objectisasubclassof the Display class,henceitinheritsdisplayableproperties.The Screen classand itssubclassesareusedinallcodewrittenforthelisteningtests.Anotherclass,the Canvas objectisusedfordrawingonthescreenandwasnotutilizedinthelistening tests.Threesubclassesof Screen are Form List and TextBox .Theseclasses allowforuserinputthroughthedevicekeypadandinformationdisplayonthedevice screen.AMIDletistypicallywrittentonavigatethroughdierentscreensbasedon userinputs.

PAGE 39

33 4.3MotorolaiDENSeriesPhones TheJavaphonehasthreeuserinputdevices[21].Theyincludeaalpha-numeric keypad,similartostandardtouch-tonetelephones,a4-waynavigationkeyandtwo optionkeys.Throughthekeypadallnumbers,lettersandpunctuationcanbeentered bysequentiallypressingkeys(multi-tapping.).The4-waynavigationkeycanbeis usedtomovethoughmenus,lists,radio-buttonsandchoicegroups.Thetwooption keysareusedascontroltoselectfrommenus,lists,radio-buttons,choice-groupsorto programdenedoptions.TheMotorolai90ciDENphoneisshowninFigure4.1.The phonehas3typesofmemorydedicatedtotheJavaVM[16].Datamemory(256k Bytes)isusedtostoreapplicationdata,suchasimageles.Programmemory(320k Bytes)referstothememoryusedtoinstallapplications(MIDlets).Heapmemory (256kBytes)referstotheRandomAccessMemory(RAM)availabletorunaJava application. 4.4ListeningTestSetup Threeseparatelisteningtestsrunonthephone.Theyincludeloudness,intelligibilityandacceptability,similartothoseusedinChapter2.Thesetestscanbe runusingthe ListenT MIDletonthephonewhichispartofthe ListenM package. Theuserisaskedtoentertheinformationincluding;name,age,nativelanguageand date.Next,theuserselectsbetweenoneofthethreetests.Thebasicowchartfor the ListenT MIDletisshowninFigure4.2.Thethreelisteningtestowchartscan beseeninFigures4.4,4.5and4.6. 4.5 ListenT MIDlet The ListenT MIDletisanextensionofclass MIDlet andimplementstheinterface CommandListener .Theimplementationof CommandListener allowstheMIDlet tomonitorcommandsreceivedthroughthephone'soptionkeys.Thecommandsare theninterpretedbasedonthecurrentscreen,commandselectedandindexselected,

PAGE 40

34 Figure4.2:FlowchartforListeningTests ListenT:java ifany.Initially,whentheMIDletisrun,itrstexecutesitsconstructor.Thissetsup anyclassesorvariablesthatareinitiallyneededtoexecutetheMIDletproperly.In ListenT theconstructorcreatesthree ListenDB databases(explainedinsection4.6) foreachofthethreelisteningtests,the ansBuffer buerandthe RandObj object{ arandomnumbergeneratorwhichextendstheabilityofthe Random class.Itthen initializesthescreens testScreen userScreen and doneScreen bycallingtheirinitializationmethods.Next,itcallsthe MIDlet:startApp method(thisisalwaysthe casewithanyMIDlet).Withinthismethodthe userScreen issetasthecurrent displayandthelisteningtestisreadytostartandthe randomizeDir methodis called.The randomizeDir methodtakesallVoiceNoteFiles(VNF)(describedin

PAGE 41

35 Figure4.3:JavaphoneListeningTestGUI. Section4.7)andrandomizestheordersothateverytimeatestistakentheorder ofutteranceschanges.ThreeseparateVNFdirectoriesarecreatedfortheloudness test,intelligibilitytestandacceptabilitytest.Todothisanamingschemewasused ontheVNFs.EachVNFnamebeginswithatwoletterdesignatorfollowedbyan \ ".Thesedesignatorsare\lt",\it"and\at"forloudnesstest,intelligibilitytest andacceptabilitytestrespectively.Fromthispointontheusernavigatesthrough dierentscreensbasedontheinputreceived. Onceinformationisenteredandthecommand\Begin"iscalledbypressingtheleft optionkey.The CommandListener method commandAction methodthencompares thecommandtothecurrentscreen.Itstorestheuserinformationin ansBuffer and

PAGE 42

36 setsthedisplayto testScreen .Theuserthenselectsoneofthethreetesttypesand pressesthecommand\Select"usingtheleftoptionkey.Again,the commandAction methodcomparesthecommandtothecurrentscreen.Ittheninitializesthecorrespondinglisteningtestscreen.Thenextthreesubsectionswillgivedetailedprogram executionbasedontheuserstestselection. 4.5.1LoudnessTest TheloudnesstestowchartisshownFigure4.4forareference.Iftheuserselectionwas\LoudnessTest"fromthe testScreen ,the CommandListener method commandAction willinitializetheloudnesstestscreen,setthecountertozeroand generatearandomsequence,basedonthelengthofthetest(inthiscase20).Thesequencewillconsistofthenumberszerooroneandwillbeusedtodetermineinwhich ordertheenhancedandun-enhancedutteranceswillbeplayed.The loudScreen screenissetasthecurrentdisplay.Theuserthenselectsoneoftheutterances,not knowingwhichisenhanced,andplaysthesoundbypressing\Play"usingtheleftoptionkey.Themethod playSample ofclass ListenT ispassedtheutteranceselected. Thismethodutilizesmethod VoiceNote:play toplaytheutterance. VoiceNote isa JavapackageprovidedbyMotorolathatallowstheplaying,recordingandmanagementofsoundlesin.vcfand.vnfformat. Oncebothsoundshavebeenplayed,theuserthenselectswhichsoundedlouder bypressing\Select"usingtherightoptionkey.The commandAction methodveries thatbothsoundshavebeenplayedandthencomparestheselectiontotherandom sequenceelementatthenumberofthecounter.Iftheyareequivalent,then\right"is storedinthe ansBuffer .Otherwise,\wrong"isstored.Next,thecounterischecked toseeif20utteranceshavebeenevaluated.Ifitisnotreached,thedisplayremains loudScreen andthetestiscontinued. Oncethetestiscomplete(thecounterreaches20),thetestresultsarestored tothedatabaseusingthe ltAnsDB object.Databasestoragewillbediscussedin

PAGE 43

37 Figure4.4:FlowchartforLoudnessTestsubprogram.

PAGE 44

38 Subsection4.5.4.Afterthedataisstored,thedisplayisset doneScreen .Fromhere, theusercanchoosetotakeanothertestorexittheMIDlet.Figure4.3showsthe GUIforthelisteningtestsonthephonediscussedinthenextthreesections. 4.5.2IntelligibilityTest TheintelligibilitytestowchartisshownFigure4.5forareference.Iftheuserselectionwas\IntelligibilityTest"fromthe testScreen ,the CommandListener method commandAction willinitializetheintelligibilitytestscreen,setthecountertozero andgeneratearandomsequence,basedonthelengthofthetest(inthiscase20e). Thesequencewillconsistofthenumberszeroandoneandwillbeusedtodetermine theorderthatthecorrectandincorrectchoiceswillbedisplayed.The intelScreen screenisthenasthecurrentdisplay.Theuserthenplaysthesoundbypressing \Play"usingtheleftoptionkey.Themethod playSample ofclass ListenT ispassed theutteranceselected. Oncethesoundhasbeenplayed,theuserthenselectswhichutterancewasheard bypressing\Select"usingtherightoptionkey.The commandAction methodveriesthatthesoundhasbeenplayedandthencomparestheselectiontotherandomsequenceelementatthenumberofthecounter.Iftheyareequivalent,then \right [utterance] [algorithm]"isstoredinthe ansBuffer .Otherwise,\wrong [utterance] [algorithm]"isstored.Next,thecounterischeckedtoseeif20utterances havebeenevaluated.Ifitisnotreached,thedisplayremains intelScreen andthe testiscontinued. Oncethetestiscomplete(thecounterreaches20),thetestresultsarestored tothedatabaseusingthe itAnsDB object.Databasestoragewillbediscussedin Subsection4.5.4.Afterthedataisstored,thedisplayisset doneScreen .Fromhere, theusercanchoosetotakeanothertestorexittheMIDlet.

PAGE 45

39 Figure4.5:FlowchartforIntelligibilityTestsubprogram.

PAGE 46

40 Figure4.6:FlowchartforAcceptabilityTestsubprogram

PAGE 47

41 4.5.3AcceptabilityTest TheacceptabilitytestowchartisshownFigure4.6forareference.Iftheuserselectionwas\AcceptabilityTest"fromthe testScreen ,the CommandListener method commandAction willinitializetheacceptabilitytestscreenandsetthecountertozero. The acceptScreen screenisthensetasthecurrentdisplay.Theuserthenplaysthe soundbypressing\Play"usingtherightoptionkey.Themethod playSample of class ListenT ispassedtheutteranceselected.Oncethesoundhasbeenplayed,the userthenratesthequalitybyselecting\Excellent",\Good",\Fair"or\Poor"and pressing\Select"usingtherightoptionkey.The commandAction methodveries thatboththesoundhasbeenplayedandthencomparestheselectiontotherandom sequenceelementatthenumberofthecounter.Then\sent# [algorithm] [quality rating]"isstoredinthe ansBuffer .Next,thecounterischeckedtoseeiftenutteranceshavebeenevaluated.Ifitisnotreached,thedisplayremains acceptScreen andthetestiscontinued. Oncethetestiscomplete(thecounterreachesten),thetestresultsarestored tothedatabaseusingthe atAnsDB object.Databasestoragewillbediscussedin Subsection4.5.4.Afterthedataisstored,thedisplayisset doneScreen .Fromhere, theusercanchoosetotakeanothertestorexittheMIDlet. 4.5.4DatabaseStorage Thestorageofanswersisperformedusingthe RecordStore classandmethods. The ltAnsDB itAnsDB and atAnsDB objectsareinstancesof ListenDB ,which werecreatedintheMIDletconstructor,andcontrolstorageofdatatoarecordusing themethod ListenDB:addTaskRecord .Theuserinformationtypeandanswersare storedsequentiallyusingthedelimiter\:?:"ina String object.Thedataisthen storedina RecordStore bythepassingthestringtothemethod addTaskRecord .

PAGE 48

42 4.5.5 RandObj Class The RandObj classisanextensionofthe Random classandallowsthegeneration ofrandomnumbersandsequences.The Random classisapseudorandomnumber generatorcapableofprovidingarandomnumberuniformlydistributedbetween 2 15 Themethod Date:getTime isusedtogenerateaseedwhichisthenpassedtothe method Random:setSeed withintheRandObjobject.Method getRandNum ofthe RandObj classarecalledfrom ListenT togeneratethesenumbers. 4.6 ListenDB Class The ListenDB classprovidesdatabasemanagementabilitytoboth ListenT and DbUpload MIDlets.Thetwomaincontrolmethodsof ListenDB open and close allowaccesstoa RecordStore databasebycheckingtoseeifthedatabaseexists,verifyingitisopen,openingitandclosingit.Fourfunctionalmethods, addTaskRecord deleteTaskRecord getRecordsByID and enumerateTaskRecord ,allowtheadditionof records,deletionofrecords,browsingofrecordsandorganizationofrecords,respectively. Note:The ListenDB classisnotaMIDlet.Itisonlyaclassthataddsfunctionalitytootherclasses.Itisnotexecutableandbecomesainner-classwheninstantiated insideaMIDlet.Theuseofaseparateclassconservesmemory.Sinceitisusedby multipleMIDlets,itwillnotoccupymemoryineachMIDlet. 4.7 Voicenotes MIDlet The Voicenotes MIDlet,whichimplementsthe VoiceNote Class,allowstherecording,playback,renaminganddeletingofvoicesamplesrecordedonorothephone. Theonlydirectaccesstosoundplaybackandrecordingisthrough VoiceNotes .The soundlesarestoredasvocodeddata.TheowchartcanbeseeninFigure4.7. Twootherapplicationsarewrittentosupportuploadinganddownloadingsoundles toandfromaPC.The simpleWriteVNF MIDletisrunonthephonewhilethe

PAGE 49

43 Figure4.7:Flowchartfor VoiceNotes MIDlet

PAGE 50

44 simpleRead J2SEapplicationisrunonthePC.Thesetwoprograms,providedby Motorola,utilizetheJava TM CommunicationsAPI( commapi )package[24]. 4.8VoiceNoteCollection VoiceNoteFiles(VNFs)collectedonthephoneanddownloadedtothePCas describedinSection4.7.Theselesarethende-vocodedusingaPCapplication called\VoiceNoteRecorder"providedbyMotorola.Thede-vocodedlecanthenbe readbyMATLABandprocessed.Atthispoint,anyenhancementormodicationof thesoundsamplecanbedone.Thenextstepwouldbetovocodethesoundsample usingthe"VoiceNoteRecorder"utilityanduploadittothephone.Additionally, PCrecordedsoundlescanbevocodedanduploadedtothephone.Since,thePC recordedsamplesareonlyvocodedonce,therefore,thequalityofthesampleisbetter thanthatofthesamplesrecoveredfromthephone.ThenewVNFsareuploadedto thephoneusingtheJALutility.Fromthere,theycanbeplayedordeletedusingthe VoiceNotes MIDlet.Figure4.8showstheGUIandprogramowforthe Voicenotes MIDlet. 4.9 DbUpload MIDlet Thelisteningtestsresultsarestoredasrecordsonthephone.Torecoverthese records,the DbUpload MIDletisused.ThisMIDletispartofthe ListenM package, whichallowsitaccesstorecordsstoredin recAnsDb bythe ListenT MIDlet.The owchartforthisMIDletisshowninFigurerefg:dbow.Thisprogramrequires thatthephonebeconnectedtothePC'sserialportforproperexecution.Theuser isrstpromptedtoconnectthephonetothePCandrunthePCbasedapplication DbDownload ,explainedinSection4.10.Next,theuserchooseswhichofthethree testtypestodownload.Whennished,theuserisaskedifthedownloadedlesareto bedeleted.Finally,theMIDletisended.Figure4.10showtheGUIforthe DbUpload MIDlet.

PAGE 51

45 Figure4.8:JavaphoneVoicenotesGUI.

PAGE 52

46 Figure4.9:Flowchartfor DbUpload MIDlet 4.10PCCodingUsingJ2SE TheJ2SEapplication DbDownload isusedconcurrentlywiththe DbUpload MIDletonthephone.Forthisapplication,theOBDCutilitywasusedinWindowsto connectanMSAccessdatabasetotheapplication.TheMSaccessdatabaseandthe JDBCconnectionmustsharethesamelename.Initially,theapplicationconnects tothedatabaseusingJDBC(JavaDatabaseConnectivity).Theapplicationwindow canbeseeninitsinitialmodeconrmingconnectiontothedatabase,inreg:dbdng. Thiswindow,createdfromclass Frame ,hasthreefunctionalbuttons;Exit,Connect toPhone,andAddRecords.Italsohasthreetext-boxestoindicatethetesttype, numberofrecordstobeaddandaneditcommentboxtoaddedtotherecords.Additionally,ithasastatusdisplaythatindicateswhatsteptheMIDletisinthedownload anddatabasestorageprocess.Thisdisplayconrmsstep-by-stepproceduresandindicateswhenproblemsoccur.Oncethephoneisconnectedtotheserialportand

PAGE 53

47 Figure4.10:JavaphoneDbUploadGUI. DbUpload isrun,the\ConnecttoJavaphone"buttonispressed.Theapplication willindicatethenumberofrecordsandthetesttypethatwasdownloaded.Theuser canthenchoosetoaddacommenttotherecordsorleaveitblank.Thebutton\Add Records"ispushedandthesystemaddstherecordstothedatabaseusingstandard SQLcommandsinsub-class AddRecords .Theseresultscanthenbeanalyzedusing MSAccess.

PAGE 54

48 Figure4.11:GUIfor DbDownload ApplicationRunningonPC

PAGE 55

CHAPTER5 CONCLUSION Thegoalofthisthesiswastoevaluatethereal-worldperformanceoftheEnergy RedistributionVoived/Unvoiced(ERVU)andWarpedBandwidthExpansionalgorithms.Earliertestingresultedinincreasedintelligibilityandincreasedperceptual loudnessforthesealgorithmsrespectively.Thealgorithmswerecombinedtoconcurrentlyenhancebothintelligibilityandperceptualloudness.Environmentalnoise, vocoding/devocodingeectsandcellularphonespeakercharacteristicswereincorporatedinlaboratorytestingtomimicthecellularphonelisteningenvironment.PC basedlisteningtestswereperformedtoquantifytheperformanceofthecombined algorithm.Toovercomethelimitsoflaboratorytesting,cellularphonebasedlisteningtestsweredevelopedinJ2MEtoprovideaplatformfortestingalgorithmsin real-worldenvironments.Thiswillprovideconcreteresultsandhelpdetermineifthe algorithmswillbeimplementedeet-wide. Thelisteningtestsresultedina4.5%increaseinintelligibilityat-5dBSNRand 4dBperceptualloudnessgain.Theseresultsshowthatthecombinedalgorithmwill provideincreasedperformancewithoutanyaddedpowertothespeechsignal.This providessucientmotivationtowardsimplementationoftheenhancementalgorithms oncellularphones. Theapplicationsdevelopedforthephonesbasedlisteningtestsallowtheevaluationofvocodedspeech.Therearetwoshort-comingswhichmustberesolvedbefore thetestswillprovideanyconclusiveresults.First,theJavaVirtualMachine(JVM) musthaveaccesstocontrollingstreamingaudio.Atthistimeitonlyhasdirectcontroloverprerecordedvocodedspeech(VoiceNotes).Thismayrequireseparateclass 49

PAGE 56

50 developmentandamoreelaborateinterfacebetweentheJVMandthecellularphone DSP.Atthistime,speechenhancementcanonlybeperformedbeforethespeechis vocoded.Thisisthereverseorderofwhichtheimplementationwillprocessspeech. Thiswillleadtoinconclusiveresultssincetheeectoftheencodingprocessbythe vocoderonenhancedspeechisunknown.WhenthedecisionismadebytheMotorola iDENGrouptoimplementthealgorithmsonthecellularphoneDSPandtheinterfacebetweentheJVMandDSParecompleted,theJ2MEcodecanbeappropriately modied. Futureworkshouldconsiderextensivecellularphonebasedtestingoncetheproper implementationsaremade.Thismayleadtoareevaluationofthealgorithmsand parameters.Itwillbetheseteststhatwillprovidethetrueperformanceofthealgorithms.Additionally,wirelessimplementationofthecommunicationbetweenthe phoneandPCwillrequirelessoverheadonthephoneandmaketestalterationsimpler.MakingchangestotestsfromthePCsidesuchasmodifyingquestions,test lengthandspeechsamplescouldbepossible.Theuseof UserDatagramProtocol (UDP)[5]{aninternetcommunicationsprotocolthatinvolvesthesendingofinformationpackets{andstreamingaudiomayhelpexpediteanydesiredchanges. TheJ2MElisteningtestmayalsoprovideausefulbyproductthatallowsthe evaluationofspeechcodersusedinthecellularphoneindustry.Likealgorithmevaluation,theoptimaltestingenvironmentforvocodersisonthecellularphoneitself. Theabilitytoevaluateandquantifyperformanceofvocodersonthephonecouldlead toamoretimelydeterminationofimplementation.

PAGE 57

APPENDIXA J2MECLASSESANDMETHODS Thefollowingsectionswillpresenttheclasses,subclassesandmethodsdeveloped intheJ2MEenvironment.Allmethodscreatedwillbediscussed.Methodsinherited throughextensionofaclassorimplementationofaninterfacewillnotbediscussed. Foradescriptionofthesemethodssee[25,18,13].Additionally,objectsdened inaclassthatarealreadypartoftheJ2SEorJ2MESDKswillnotbediscussed. SourcecodemaybeobtainedbysendinganemailrequesttoWilliamO'Rourkeat worourke@cnel.u.edu. A.1TheListenTClass TheListenTclassextendstheMIDletClassandimplementstheCommandListenerinterface.SeeTableA.1. TheRandObjclass RandObjisasubclassoftheListenTclassthatextendsthe Randomclass.Itgeneratespseudorandomintegersfrom 2 31 .Aseedissetby callingRandom.setseed(longseed)andpassingitthecurrentsystemdatemeasured fromtheepoch.SeeTableA.2. A.2TheDbUpload TheDbUploadclassextendstheMIDletclassandimplementstheCommandListenerinterface.SeeTableA.3. A.3VoiceNotes2 TheVoiceNotes2classextendstheMIDletclassandimplementstheCommandListenerandVoicenotesEnglishinterfaces.SeeTableA.4. 51

PAGE 58

52 TableA.1:MethodsforListenT.class MethodArgumentsReturnsDescription getVoiceNoteListString[]Returnsallvoicenotes onthephone. initUserScreenInitializestheUser InformationScreen. initTestScreenInitializestheTest SelectScreen initLoudTestScreenInitializestheLoudness TestScreen initInteligTestScreenInitializestheIntelligibility TestScreen initAcceptTestScreenInitializestheAcceptability TestScreen initDoneScreenInitializestheExit Screen setUpIntelSetsuptheIntelligibility screenforproperdisplay. playSampleStringPlaysavoicenoteby name. playSampleint,intPlaysavoicenote randomlybyID. randomizeDirString[]String[]Randomizesvoicenote directory.

PAGE 59

53 TableA.2:SubclassRandObj.classofListenT.class MethodArgumentsReturnsDescription getRandNumintintReturnsthenextpseudorandom integerlimitedto 2 numBits 1 getRandNumintReturnsthenextpseudorandom integer. TableA.3:MethodsforDbUpload.class MethodArgumentsDescription initConnectScreenInitializesthePC ConnectionScreen. initTestListScreenInitializestheTest SelectScreen initDeleteRecordsScreenInitializestheDelete OptionScreen initDeleteOKScreenInitializestheVerify DeleteScreen initDoneScreenInitializestheExit Screen sendRecordsintGathersrecordsbytype andcallssendRecord. deleteSentRecordsintDeletesaListeningTest Recordsbytype sendRecordStringDeletesaspecicrecord

PAGE 60

54 TableA.4:MethodsforVoiceNotes2.class. MethodArgumentsDescription initCommandScreenInitializestheCommand SelectScreen. initRecordInfoScreenInitializesvoicenote infoScreen. initRecordScreenInitializesthevoicenote recorderScreen. initListInitializesthevoice notelistScreen. initDeleteScreenInitializesthevoice noteDeleteScreen. initRenameScreenInitializesthevoice noterenameScreen. initDoneScreenInitializestheExit Screen. recordStringRecordsavoicenote. playPlaysavoicenote. deleteDeletesavoicenote. renameString,StringRenamesaspecic voicenote.

PAGE 61

APPENDIXB J2SECLASSESANDMETHODS Thefollowingsectionswillpresenttheclasses,subclassesandmethodsdeveloped intheJ2SEenvironmentinthesamemannerasAppendixA. B.1DbDownload TheDbDownload.classextendstheJFrameclassandhassubclassesshownin FigureB.1.Thisclasscreatesandinstanceofitselfwhichisshowninthegureas DbDownload$1.ThisclasscreatestheJDBCconnection,instantiatesScrollingPanel andControlPanelobjects.Nomethodswererequiredtobeaddedtothisclass. FigureB.1:TheDbDownloadclassanditssubclasses 55

PAGE 62

56 TableB.1:MethodsforConnectPhone.class MethodArgumentsDescription sortDataString[]Sortsreceivedstring usingdelimiter\:?:". B.2ControlPanelandScrollingPanel TheControlPanelandScrollingPanelclassesextendtheJPanelclass.Theseclass havenoaddedmethods.TheControlPanelclasshasinstantiatesExitMainandConnectPhoneobjects.TheExitMainclasshasnoaddedmethods. B.3ConnectPhone TheConnectPhoneclassimplementstheActionListenerinterface.ThisclassinstantiatesanDownloadDataobject.DownloadDataimplementsRunnableandSerialPortEventListenerinterfaces.SeeTableB.2andB.1.

PAGE 63

57 TableB.2:MethodsforDownloadData.class MethodArgumentsDescription getPhoneDataSetsupserialport inputstream. runStartsThreadfor readingdata. stopStopsThreadfor readingdata. serialEventSerialPortEventListensforevents ontheserialport. phoneConnectCapturestheserial port.

PAGE 64

REFERENCES [1]A.AceroX.HuangandH.Hon. SpokenLanguageProcessing .Prentice-Hall PTR,UpperSaddleRiver,NJ,2001. [2]T.BaerB.R.GlasbergandB.C.Moore.RevisionofZwicker'sLoudnessModel. Acustica ,82:335{445,1996. [3]M.A.Boillot.APsychoacousticApproachtotheLoudnessEnhancementof SpeechPartI:FormantExpansion.Submittedto InternationalConferenceon Acoustic,SpeechandSignalProcessing, HongKong2003. [4]M.A.Boillot. AWarpedFilterImplementationfortheLoudnessEnhancement ofSpeech. PhDDissertation,UniversityOfFlorida,2002. [5]H.DeitelandP.Deitel. Java TM HowtoProgram .Prentice-HallInc.,Upper SaddleRiver,NJ,1999. [6]H.FastlE.ZwickerandC.Dallmayr.Basic-programforCalculatingtheLoudness ofSoundsfromtheir1/3-oct.BandSpectraAccordingtoISO532B. Acustica 55:63{67,1984. [7]H.FletcherandW.J.Munson.Loudness,itsDenition,Measurement,and Calculation. JournaloftheAcousticalSocietyofAmerica ,5:82{108,1933. [8]W.Hartmann. Signals,SoundandSensation .Springer,NewYork,NY,1998. [9]TheInternationalOrganizationforStandardization. ISO226.Acoustics{Normal EqualLoudnessContours. ISO,Geneva,Switzerland1987. [10]TheInternationalOrganizationforStandardization. ISO532B.1975.Method BforCalculatingtheLoudnessofaComplexToneThatHasBeenAnalyzedin TermsofOne-thirdOctaveBands. ISO,Geneva,Switzerland1975. [11]J.Johnston.TransformCcodingofAudioSignalsUsingPerceptualNoiseCriteria. IEEEJournalonSelectedAreasinCommunications ,6(2):314{323,1988. [12]J.C.Junqua.TheInuenceofPsychoacousticandPsycholinguisticFactorson ListenerJudgmentsofIntelligibilityofNormalandLombardSpeech.In ProceedingsofInternationalConferenceonAcoustic,Speech,andSignalProcessing volume1,pages361{364,CausalProductionsPtyLtd.,Toronto,Canada1991. 58

PAGE 65

59 [13]M.KrollandS.Haunstein. Java TM 2MicroEditionApplicationDevelopment SamsPublishing,Indianapolis,IN,2002. [14]S.Mitra. DigitalSignalProcessing:AComputer-BasedApproach SecondEdition.McGraww-Hill,NewYork,NY,2001. [15]M.Morrison. SamsTeachYourselfWirelessJavawithJ2MEin21Days .Sams, Indianapolis,IN,2001. [16]MotorolaCorporation.J2ME TM TechnologyforWirelessCommunicationDevices.http://developers.motorola.com/developers/j2me,AccessedJuly2002. [17]MotorolaCorporation. MotorolaProtocolManual68P81129E15-BVSELP4200 BPSVoiceCodingAlgorithmforiDEN .iDENDivision,Plantation,FL,1997. [18]MotorolaCorporation. MotorolaSDKComponentsGuidefortheJ2MEPlatform .Austin,TX,2000. [19]MotorolaCorporation. StandAloneVoiceActivityDetectorHigh-LevelandLowLevelDesignDocument .iDENDivision,Plantation,FL,1999. [20]S.NambaandH.Miura.AdvantagesandDisadvantagesofA-WweightedSound PressureLevelinRelationtoSubjectiveImpressionofEnvironmentalNoises. NoiseControlEngineeringJournal ,33:107{115,1989. [21]NexteliDEN. i90cPhoneUsersGuide .NextelCommunicationsInc.,Reston, VA,2001. [22]T.L.Reinke. AutomaticSpeechIntelligibilityEnhancement .Master'sThesis, UniversityOfFlorida,2001. [23]RiceUniversity.SignalProcessingInformationBase. http://spib.rice.edu/spib/select noise.html,AccessedJune2002. [24]SunMicrosystemsInc.CommunicationsAPI. http://java.sun.com/products/javacomm/,AccessedApril2002. [25]SunMicrosystemsInc.Java TM 2StandardEditionAPI. http://java.sun.com/j2se/1.4/docs/api/index.html,AccessedSeptember2001. [26]TexasInstruments.TI46-WordSpeaker-DependentIsolatedWordCorpus(cdrom),1991. [27]TexasInstruments.DARPATIMITAcoustic-PhoneticContinuousSpeechCorpus(cd-rom),1991. [28]W.D.Voiers.Ch.34 DiagnosticEvaluationofSpeechIntelligibility. Dowden, Hutchinson,andRossInc.,NewYork,NY,1977. [29]E.ZwickerandH.Fastl. Psychoacoustics:FactsandModels 2ndEdition. SpringerVerlag,NewYork,NY,1999.

PAGE 66

BIOGRAPHICALSKETCH WilliamO'RourkewasbornonNovember5,1970,inBualo,NY.Attheageof vehisfamilymovedtoBocaRaton,Florida.Afterhighschool,heenteredtheUS Navy.Duringthattime,hespentveyearsstationedontwoshipsforwarddeployed totheUSSeventhFleetinJapan.HetravelledextensivelythroughAsiameetingnew peopleandlearningnewcustoms,apricelessexperience.Attheageof27,hedecided toleavetheNavyandpursueadegreeinelectricalengineeringattheUniversityof Florida. 60


Permanent Link: http://ufdc.ufl.edu/UFE0000585/00001

Material Information

Title: Real-World evaluation of mobile phone speech enhancement algorithms
Physical Description: Mixed Material
Creator: O'Rourke, William Thomas ( Author, Primary )
Publication Date: 2002
Copyright Date: 2002

Record Information

Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.
System ID: UFE0000585:00001

Permanent Link: http://ufdc.ufl.edu/UFE0000585/00001

Material Information

Title: Real-World evaluation of mobile phone speech enhancement algorithms
Physical Description: Mixed Material
Creator: O'Rourke, William Thomas ( Author, Primary )
Publication Date: 2002
Copyright Date: 2002

Record Information

Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.
System ID: UFE0000585:00001


This item has the following downloads:


Full Text











REAL-WORLD EVALUATION OF MOBILE PHONE SPEECH
ENHANCEMENT ALGORITHMS












By

WILLIAM THOMAS O'ROURKE


A THESIS PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF SCIENCE


UNIVERSITY OF FLORIDA


2002















ACKNOWLEDGMENTS


I am immensely indebted to Dr. John Harris for giving me the opportunity to

work on this project. The direction, guidance and encouragement I received from

him gave me the confidence needed to achieve. He is the epitome of an academic

advisor and his contributions to this thesis cannot be overstated. I would like to

extend this gratitude to his wife, Mika. For it was her support at their home that was

equally important in allowing him to be have such a significant effect on my research.

I would like to express my sincere thanks to Dr. Jose Principe and Dr. Purvis

Bedenbaugh for agreeing to be on my thesis committee and giving me guidance in

the process of completing my research.

I would like to thank my parents, Tom and Maureen O'Rourke, my sisters, Mau-

reen, Colleen, Kathy, Elizabeth, Bridget, Marie and Dawn, and my brothers Bobbll

and Tommy, for the love, concern and support they gave me in all my endeavors.

Finally, I would also like to thank friends and fellow lab mates Dr. Marc Boillot

of Motorola iDEN Division, Mark Skowronski, Adnan Sabuwala and Kaustubh Kale

for their help and contributions to my research.















TABLE OF CONTENTS


ACKNOWLEDGMENTS ............... . ii

ABSTRACT . . . . . . . .. v

CHAPTERS

1 INTRODUCTION . . . . . . 1
1.1 Background . . . . . . . . 2
1.1.1 Energy Redistribution .......... ........... 2
1.1.2 Bandwidth Expansion .......... ........... 5
1.1.3 Combined Algorithm .......... ............ 7
1.2 Listening Tests .................. .... 7
1.3 C!i ipter Sum m ary .................. ... 9

2 PC BASED LISTENING TESTS . . . . 10
2.1 Intelligibility Test .................. .......... .. 10
2.1.1 Bandwidth Expansion Results ................. .. 11
2.1.2 ERVU Results .. .......... ....... .. 12
2.2 Perceptual Loudness Test .................. .... .. 12
2.2.1 Objective Loudness ............... .. .. 12
2.2.2 Subjective Loudness ..... .......... .... 13
2.3 Acceptability Test ............... .......... .. 14

3 EXPANDED PC BASED LISTENING TESTS.... . 17
3.1 MotrolaTM VSELP Vocoder ................ .... 17
3.2 Noise Sources ............... ........... .. 18
3.2.1 SNR Calculation ................ ... .... .. 19
3.2.2 Segmental SNR ............... ....... .. 19
3.2.3 A-Weighting. .................... ........ .. 21
3.2.4 C!....-i_- the SNR levels ................ .. .. 22
3.3 Audio EQ Filter ........... . . . .... 23
3.4 Listener Demographics and Test Results ............... .. 24
3.5 A Note on ERVU .... .......... ......... .. 27

4 JAVA IMPLEMENTATION OF LISTENING TESTS . . 29
4.1 J2M E and J2SE .................. ........... .. 29
4.2 W hy J2ME? .................. ............. .. 30









4.2.1 Developing in J2ME ..... .......... .... 31
4.2.2 J2ME Constraints ............... .. .. 31
4.3 Motorola iDEN Series Phones .................. ..... 33
4.4 Listening Test Setup .................. ........ .. 33
4.5 ListenT MIDlet ............. .......... .. 33
4.5.1 Loudness Test ............. . ... 36
4.5.2 Intelligibility Test .... ..... ...... ... .. 38
4.5.3 Acceptability Test .............. .. .. 41
4.5.4 Database Storage .... ..... ...... ... .. .. 41
4.5.5 RandObj Class .................. ....... .. 42
4.6 ListenDB Class .................. ........... .. 42
4.7 Voicenotes MIDlet .................. ......... .. 42
4.8 Voice Note Collection .................. ...... .. .. 44
4.9 Db Upload M IDlet .................. ........ .. .. 44
4.10 PC Coding Using J2SE .................. ....... .. 46

5 CONCLUSION . . . . . . 49

APPENDIX

A J2ME CLASSES AND METHODS . . . . 51
A.1 The ListenT Class ............... ........ .. 51
A.2 The DbUpload .................. ......... .. .. 51
A.3 VoiceNotes2 .................. ............. .. 51

B J2SE CLASSES AND METHODS . . . . 55
B.1 DbDownload .................. ............ .. 55
B.2 ControlPanel and ScrollingPanel ...... .......... .... 56
B.3 ConnectPhone .................... . ..... 56

REFERENCES . . . . . . . 58

BIOGRAPHICAL SKETCH . . . . . 60















Abstract of Thesis Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Master of Science

REAL-WORLD EVALUATION OF MOBILE PHONE SPEECH
ENHANCEMENT ALGORITHMS

By

William Thomas O'Rourke

December 2002


C'i i ii il : John G. Harris
Major Department: Electrical and Computer Engineering

This work evaluates the performance of two new classes of automatic speech en-

hancement algorithms by modelling listening tests closer to real-world environments.

Results from earlier listening tests show that the warped bandwidth expansion al-

gorithm increases perceptual loudness and the energy redistribution voiced/unvoiced

algorithm increases intelligibility of the speech without adding additional power to

the speech signal.

This thesis presents results from listening tests conducted with a model of real-

world environments and provides a platform for cellular phone based listening tests.

Both algorithms are combined on a frame basis to increase intelligibility and loud-

ness. The speech signals are encoded and decoded to model effects of cellular phone

vocoders. Three typical environmental noises (pink, babble and car) are used to test

the algorithms' performance to noise. Perceptual techniques are used to calculate the

signal to noise ratio (SNR). A speaker EQ model is used to emulate the frequency

limits of cellular phone speakers. Finally, cellular phone based listening tests are









developed using the Java 2 Micro Edition platform for Motorola iDEN Java enabled

cellular phones.

The listening tests resulted in a 4 -' intelligibility increase at -5dB SNR and a

4dB perceptual loudness increase for the combined algorithm. The cellular phone

based listening tests will provide an ideal listening test environment once the Java

environment is equipped with streaming audio abilities and the algorithms are imple-

mented on the phone.















CHAPTER 1
INTRODUCTION


The use of cellular phones is on an increase all over the world and naturally it is

becoming more common to see people using their phones in high noise environments.

These environments may include driving in cars, socializing in loud gatherings or

working in factories. To deal with the noise, cellular phone users will often press the

phone to their head and turn up the volume to the maximum which many times is still

not enough to understand the speech. Cellular phone manufacturers could use more

powerful speakers or higher current drivers, both increasing cost and battery size.

Algorithms that increase intelligibility and overall loudness will help lower battery

usage and ease user strain when using the phone.

This thesis studies the implementation and evaluation of a new class of algo-

rithm for cellular phone use. The energy redistribution algorithm [22], described in

Section 1.1.1, is an effort to increase overall intelligibility of speech. Section 1.1.2 de-

scribes the bandwidth expansion algorithm [4], used to increase perceptual loudness

of speech. The aim of implementing these algorithms, is either (1) to enhance the

speech for noisy environments or (2) to maintain he quality of the speech at a lower

signal power in order to extend battery life.

This thesis primarily addresses the testing of these algorithms to ensure real-world

applicability. These tests include controlled environment laboratory testing on PCs

and real-world environment testing on cellular phones. The PC testing first tests the

performance of the two algorithms without real-world considerations. Next, the PC

testing is modified to better model real-world environments. Finally, the listening

tests are implemented on a cellular phone to evaluate real-world performance.







2

1.1 Background

This thesis is a final requirement for the fulfilment of work done for iDEN Division

of Motorola. The proposed work attempted to increase intelligibility and perceptual

loudness without incurring additional power cost. The work required algorithms that

were feasible for real-time implementations and would not affect the naturalness of

the speech. It assumed that standard noise reduction techniques would be performed

prior to the application of these algorithms and that the received speech could be

assumed to be clean. In support of this work, the extended laboratory testing and

cellular phone based listening tests were required. The idea was to enable the complete

evaluation of the two algorithms which resulted from this research.

1.1.1 Energy Redistribution

The energy redistribution voiced/unvoiced (ERVU) algorithm [22] has been shown

to increase the intelligibility of speech. The algorithm was developed based on psy-

choacoustics. First, the power of the unvoiced speech is crucial for intelligibility.

Second, the power of the voiced regions can be attenuated up to a certain point

without affecting the intelligibility and naturalness of the speech. Voiced speech is

generated anytime glottal excitation is used to make the sound. Voiced signals typi-

cally have higher power than unvoiced signals. Additionally, most of the signal power

lies in the lower frequencies for voiced speech.

Energy redistribution is performed in three basic steps. First, the voiced and un-

voiced regions are determined through the spectral flatness measurement (SFM) [11,

22], shown in Equation 1.1, on individual windows of speech.


(Hi Xk
SFM (1.1)
N-1
kE Xk
k-0











where N is the window length and Xk is the Discrete Fourier Transform (DFT) of the

window.

Second, the SFM is compared to two thresholds T1 and T2. These thresholds

are determined based on statistical classification of the SFM on voiced and unvoiced

speech, shown in Figure 1.1. The values for Tland T2 were set to 0.36 and 0.47

respectively. The decision cases can be seen in Equation 1.2.


8-
-vowels
stops
fricatives
7 glides
-nasals
others
A affricates


01 02 03 04 05 06
Spectral Flatness Measer


07 08 09


Figure 1.1: Discrimination of phonemes by the SFM


Decision


Voiced for SFM < Ti

Unvoiced for SFM > T2

Previous Decision otherwise


(1.2)


Next, the boosting level for the voiced and unvoiced regions must be determined.

The boosting level is a gain factor that will be applied to the window. For unvoiced

windows, the boosting will be greater than 1 and for voiced windows, it must be less

than 1. For windows that fall between both thresholds, the boosting level will remain

the same. Boosting levels were determined by evaluating various sentences obtained











from the TIMIT database [27]. The levels were adjusted until naturalness was lost

and then set to the previous level. The resulting levels were set to be 0.55 for voiced

speech and 4 for unvoiced. In order to smooth the transitions (going from voiced to

unvoiced windows or vice-versa), the boosting level is adjusted linearly in the first 10

milliseconds of the window. An example of original speech utterance "six" is plotted

with the modified version in Figure 1.2. The SFM technique was chosen over two

Original






-



0 100 200 300 400 500

Modified
2-









10)-
(O
-2







-3
0 100 200 300 400 500
Modified











Time in milliseconds

Figure 1.2: Result of applying ERVU to the utterance "six"


other techniques discussed by Reinke [22]. These techniques use a measure of spectral

transition to dictate enhancement. The points of high spectral transition are impor-

tant for retaining vowel perception in co-articulation. Similar results were obtained
2_ -





2















in intelligibility tests. The SFM technique is also less computationally complex than
0 100 200 300 400 500







Timethe other methods.mliseconds
Figure 1.2: Result of applying ERVU to the utterance "six"



other techniques discussed by Reinke [22]. These techniques use a measure of spectral

transition to dictate enhancement. The points of high spectral transition are impor-

tant for retaining vowel perception in co-articulation. Similar results were obtained

in intelligibility tests. The SFM technique is also less computationally complex than

the other methods.

The results of tests conducted by Reinke [22] have shown an increase of intel-

ligibility close to 5 percent at 0 dB SNR. Results indicate the performance of the

ERVU algorithm decreased when the original speech was corrupted with noise. This






5

shortfall is the result of using SFM. The added noise fills the nulls between formants

typically associated with voiced speech. This increases the geometric mean (numer-

ator of Equation 1.1) of the spectrum significantly. The resulting SFM is increased

and leads to a misclassification of voiced speech. However, this thesis examines the

intelligibility of clean speech for the sender's side with noise on the receiver's side.

1.1.2 Bandwidth Expansion

Bandwidth expansion [4] utilizes a warped filter to increase perceptual loudness of

vowels in clean speech. Like the ERVU, bandwidth expansion uses motivation from a

psychoacoustics perspective. The underlying principle is that loudness increases when

critical bands are exceeded [3]. Loudness refers to the level of perceived intensity of

signal loudness and can be measured both objectively and subjectively. Objective

measurements can be made using the ISO-532B standard (Zwicker method) [10].

The human auditory system acts as if there is a dedicated band-pass filter around

all frequencies that can be detected by the humans. Within this band, perceptual

loudness is dominated by the frequencies with the strongest intensity. Various tests

have been performed by Zwicker and Fastl [29] to measure these bands. The under-

lying idea is that when energy within a band is fixed, the loudness remains constant.

However, once the bandwidth is exceeded (the energy is spread over more than one

critical band) there will be an increase in the perceived loudness.

The bandwidth expansion algorithm uses this idea of spreading the spectral energy

of a speech signal over more critical bands. The regions of interest, vowels, are

found using voice activity detection described by Motorola Corporation [19]. Speech

enhancement is performed in three steps. First, a vocal tract model is estimated

using linear prediction coefficients (LPC) a, calculated using the Levinson-Durbin

recursion [1]. The excitation is then found using the inverse filter A(Z), an FIR filter

whose coefficients are a. Then, the signal is evaluated off the unit circle in the z-

domain. Evaluation off the unit circle is done by first selecting the radius (r) at which









the signal will be evaluated, then the signal is passed through an IIR filter A(>) shown

in Equation 1.3. Figure 1.3 shows that the pole displacement widens the bandwidth

of the formants in the z-domain.


A >) I=yg,,j


P
Sak k e-wk
kO0


(1.3)


Figure 1.3: Visualization of Pole Displacement.

Since critical bands are not of equal bandwidth, expansion by evaluation along

the circle of radius r will not be optimal. For this reason, a warping technique

is used. This warping technique is performed like the LPC bandwidth widening,

however, it works on the critical band scale. The idea is to expand the bandwidth

on a scale closer to that of the human auditory system. The fixed bandwidth LPC

pole displacement method is modified by applying the same technique to a warped

implementation. The Warped LPC filter (WLPC) is implemented by replacing the

unit-delay of A(>) with an all-pass filter shown in Equation 1.4. The warping filter

provides an additional term a called the warping factor. The range of values for


I









a are -1 to 1. When a is 0, no warping takes effect, for a > 0 high frequencies

are compressed and low frequencies expanded and for a < 0 low frequencies are

compressed and high frequencies expanded.


S-1~
z1 a1 (1.4)
1- az-1

After the WLPC analysis is done, the excitation is passed through a warped IIR

(WIRR) that results in the warped bandwidth enhanced speech. An additional radius

term 7 is added the WIIR so that the resulting spectral slope remains the same. The

new algorithm for bandwidth expansion can be seen in Equation 1.5.


H(z) A (1.5)


The final architecture used for bandwidth expansion can be seen in Figure 1.4. Boillot

found values for of 7 and a equal to 0.35 and 0.4 respectively, the algorithm performed

the best in subjective loudness tests. The value of r was set between 0.4 and 0.8 as

a function of tonality [4].

1.1.3 Combined Algorithm

In order to evaluate the effects of both ERVU and bandwidth expansion, the

two algorithms have been combined on a window by window basis. The combined

algorithm increases intelligibility and perceptual loudness gain without increasing the

signal power.

1.2 Listening Tests

In order to verify the real-world effects of speech enhancement algorithms, some

form of subjective listening tests must be performed. Three different levels of listen-

ing tests were performed for this purpose. First, controlled tests were administered

in the laboratory. These tests were performed using clean speech from the TIMIT


















































Figure 1.4: Realization of the Warped Bandwidth Expansion Filter.









and TI-46 databases. The purpose of these tests was to obtain preliminary results

and make necessary adjustments to the algorithms. Second, the tests are modified to

better model the real-world environment. These models include the introduction of

environmental noise and modelling of the iDEN cellular phone speaker. Finally, the

tests are conducted using the actual phone. This is enabled by the recent incorpora-

tion of the Java 2 Micro-Edition virtual machine on certain iDEN phones. The three

tests are discussed along with their results in ('!i lpters 2, 3 and 4 respectively.

1.3 Chapter Summary

The outline for the remainder of this thesis is as follows:

Chapter 2: PC based listening tests. This chapter will describe the basic

set-up of the laboratory listening tests with emphasis on clean speech and noise free

environment. It will also present results obtained in past research using this set-up.

Chapter 3: Expanded PC based listening tests. In this chapter, the basic

listening tests described in C'!i lpter 2 will be modified to better approximate real-

world conditions. For this test, background noise consisting of pink, car and babble

noise is added. Additionally, a model of the frequency response of the Motorola i90c

phone is used to better simulate cellular phone use. The results will show that the

algorithms both improve intelligibility by 4.8 percent at -5 dB SNR and result in a

perceptual loudness gain of 4 dB.

Chapter 4: Java implementation of listening tests. This chapter will de-

scribe the implementation of listening tests for Motorola Java enabled cellular phones.

It will also describe the support applications developed to manage and evaluate lis-

tening tests.

Chapter 5: Conclusions and future work. This chapter will discuss the

results of the listening tests, shortcomings of the algorithms and future work on the

subject of speech enhancement and real-world testing.














CHAPTER 2
PC BASED LISTENING TESTS


The first step in evaluating the effects of speech enhancement algorithms is some

form of listening test. The results of these tests can be used to tune, modify or discard

algorithms based on initial performance. For the purpose of this thesis, the evaluation

of the algorithms starts with a simplified listening test. These tests are written in

MATLAB and were originally developed by CNEL member Mark Skowronski. The

tests have been modified through time to accommodate changes to the listening tests.

This chapter will describe the tests as they were when Boillot [4] tested the bandwidth

expansion algorithm. Intelligibility, loudness and acceptability tests were conducted.

The Energy Redistribution Voiced/Unvoiced (ERVU) algorithm, originally tested by

Reinke [22], was not initially tested in this listening test environment. However, the

results of ERVU intelligibility tests will be discussed in Section 2.1.2.

2.1 Intelligibility Test

The purpose of communications is to get the message across as clearly as possi-

ble. Algorithms that attempt to increase intelligibility or perceptual loudness require

testing to verify their applicability. Intelligibility testing methods come in many

forms. Some of these methods include the Diagnostic Rhyme Test (DRT), the Modi-

fied Rhyme Test (I! RT), and the Phonetically Balanced word lists (PB). The method

used in this experiment is a variant of the DRT [28, 4].

The intelligibility tests were conducted using speech samples from the TI-46

database [26] sampled at 10 kHz. Sets I, II and III, from Table 2.1 were used. These

sets were originally used by Junqua [12] to test the effect of the Lombard effect on









speech intelligibility. The Lombard effect is the way people speak differently when in

a noisy environment. They try to compensate for the noise by speaking louder, slower,

more clearly and with more stress and emphasis. The individual sets are considered

to be easily confusable and provide a good vocabulary for testing intelligibility.

The MATLAB GUI used to conduct the test is shown in Figure 2.1. For each

utterance, an alternate utterance was selected from the same set and presented as a

confusable choice. The GUI allowed only one choice and did not limit the number of

times the utterance was p1- i,- d. Each utterance had an equal chance of being selected.

The order of selection was randomized so that the listener had no knowledge of which

utterance was correct. Half the utterances presented were left in their original form

and the other half enhanced with the bandwidth expansion algorithm.


SNH = -5 JB

Ccuri = 1 of 80


M N



Play Next


Figure 2.1: Intelligibility test GUI


2.1.1 Bandwidth Expansion Results

All though the bandwidth expansion algorithm only attempts to increase the

loudness of speech, it still must be tested to ensure that there is no decrease in

intelligibility. A total of 60 utterances be presented to each listener with added

Gaussian noise at 0 dB SNR. The test resulted in an overall decrease in intelligibility of

0.;:'. 3.1 at a 95' confidence interval. These results showed that the bandwidth

expansion algorithm had no measurable effect on intelligibility [4]. These results









were as expected since bandwidth expansion only modifies the voiced phonemes and

unvoiced phonemes predominantly define intelligibility.


Table 2.1: Vocabulary of words used for Intelligibility Test



I f, s, x, yes
II a, eight, h, k
III b, c, d, e, g, p, t, v, z, three
IV m, n


2.1.2 ERVU Results

Reinke initially used the same form of intelligibility test as Boillot to test his ERVU

algorithm and a high-pass filter. He later used the DRT to test other intelligibility

enhancement algorithms [22]. The tests used sets I, II III and IV from Table 2.1.

Gaussian noise was added to the utterances at 0 dB and -10 dB SNR levels. He

reported a :'. increase in intelligibility at 0 dB SNR and 5.5'. at -10 dB SNR. His

test was administered to 25 listeners, whose ages ranged from 21 to 37.

2.2 Perceptual Loudness Test

Loudness is the human perception of intensity of sound [8]. It is a function of sound

intensity, frequency and quality. The loudness of single sinusoids can be measured

using the equal loudness contours [29]. This measure, using the units phon, in only

valid for narrow band signals. For this reason, Boillot used ISO 532B analysis[10].

Subjective measures are performed using a loudness test.

2.2.1 Objective Loudness

Speech is a wide band signal and perceptual loudness cannot be measured using

the equal loudness contours. Instead, a model first developed by Zwicker [6], ISO 532B

was used to perform objective measures. ISO 532B uses concepts from auditory filter

models and the critical band concept [7]. Additionally, ISO 532B follows the model









discussed by Baer, Moore and Glasberg [2], which was a revision of Zwicker's model

and compensated for low frequencies and levels. Boillot's test on TIMIT sentences

showed a 2.1 dB gain for vowels using ISO 532B analysis.

2.2.2 Subjective Loudness

To verify the perceptual loudness increase of the bandwidth expansion algorithm,

an apparent gain measurement would be required. Until now, there were no known

tests for quantifying speech loudness. For this purpose, subjective loudness tests

were developed. The subjective loudness tests were performed in MATLAB and used

utterances from the TI-46 database shown in Table 2.2. The utterances were sampled

at 10 kHz and the a term of Equation 1.4 was set to 0.5. The listener was presented

with two utterances of the same word, one being the original and the other being the

enhanced. The MATLAB GUI used can be seen in Figure 2.2. The listener would

pl i, each sound and then make a judgment on which one sounded louder.


Table 2.2: Vocabulary of words used for Loudness Test



V1 zero, one, two, four, five, six, seven, nine
V2 enter, erase, help, repeat, right, rubout
V4 a, eight, h, k
V5 b, c, d, e, g, p, t, v, z, three
V5 m, n


A total of 80 words was presented to each listener. Boillot used 15'. of these

words to perform a screening evaluation. The listener had no knowledge of which

was the modified or the enhanced word and the order was randomly selected. First,

both words are normalized to equal power and then the enhanced version would be

scaled randomly between 0 dB and 5 dB in 0.5 dB increments. This provided the

perceptual gain of the enhanced algorithm. The scaling point which produced a 50'.

selection rate marks this perceptual gain crossover point. The bandwidth expansion










01


Count = 1 of 80



Next



Play 1 Play 2



S1 is Louder i '2 is Louder



Figure 2.2: Loudness test GUI

algorithm resulted in an approximate 2dB crossover point. This is the point at which

the listener is guessing, hence, 50'. the enhanced version was chosen. These results

can be seen in Figure 2.3, where the results are shown in solid lines.

The screening process ensured that the data collection was accurate. For this

portion of the tests neither word was modified but one was scaled. It tested the

hearing resolution of the listener and, at the same time, ensured the listener was

p iing attention and not suffering from fatigue. It was important to verify the level

at which the human auditory system could perceive a change in loudness before

the algorithm could be considered effective. From Figure 2.3, the screening results

(indicated by dashed lines) are equal at 50'-. at OdB and diverge as expected.

2.3 Acceptability Test

The goal of the speech enhancement algorithms is to increase the intelligibility and

perceptual loudness of speech without deteriorating the naturalness of the speech.

Boillot found that the loudness of vowels increased monotonically until the spectrum






















r ---.- -
o i 2 3 4
dB drop

Figure 2.3: The Bandwidth Expansion results for all listeners on the Loudness test.
Vertical bars indicate 95'. confidence intervals [4].

was flat. This lead to an obvious distortion of the speech. To ensure that the warped

bandwidth expansion algorithm did not effect the quality of speech, Boillot included

an acceptability, or quality, test [4]. This test used a number rating system to quantify

the overall impression. Boillot used paired comparison tests to evaluate acceptability.

Additionally, the test included a subjective loudness assessment. The test used an

original and modified (enhanced) version of the same sentence taken from the TIMIT

database. A total of 20 phonetically balanced sentences sampled at 16 kHz were used.

The original and the modified sentences were scaled to have equal power on a frame-

by-frame basis. The listeners would phli each sentence pair and then subjectively rank

each one. The listeners gave a mark of excellent, good, or fair for each sentence. The

number one through three corresponded to each mark respectively. Listeners were

directed to score relatively. For example, even if both sentences sounded excellent,

they should still try to determine which of the two was better and give that sentence

the excellent mark and the other a mark of good. The loudness assessment just asked

the listener to determine which sentence sounded louder overall.










do atypic I3 rarrrer; grc-.'j oak

Court = 1 cr 20







Play 1 Play 2


1 i. Lol.jder F 2 i. L3uder


R astna. iR trig
,r E.,-.ellenI P E .e.llernl

C gc.:.d C go:d

C r ti


Figure 2.4: Acceptability test GUI


Boillot reported a quality rating of 1.56 for original and 1.47 for modified speech.

The modified sentence was selected louder C91' of the time. These results in indicate

that the overall quality is not affected. The loudness assessment results provide a

preview to how the algorithm would perform on sentences instead of the single word

utterances used in loudness and intelligibility tests.















CHAPTER 3
EXPANDED PC BASED LISTENING TESTS


The listening tests discussed in C'! ipter 2 did not implement typical real-world

effects on speech. In this chapter, we will expand these tests to more closely model

real-world environments. This is done in three steps. First, to simulate the effects of

vocoders used on cellular phones, the speech is vocoded then devocoded. Then, noise

sources are chosen based on real-world environments. Finally, the Audio EQ Model for

the Motorola I85c cellular phone is implemented to simulate the cellular phone speaker

frequency response. Additionally, the bandwidth expansion and ERVU algorithms

are combined in an attempt to increase perceptual loudness and intelligibility. It is

this combination algorithm that will be used in this chapter. The resulting signals

modified by this algorithm will be referred to as the i ii'. l I" or "modified" signal

in this chapter. For these tests, Sony MDR-CD-10 headphones were used.

3.1 MotrolaTM VSELP Vocoder

The Motorola iDEN i90c phone uses the Vector Sum Excited Linear Prediction

(VSELP) Vocoder [17] to provide encoding and decoding of speech for transmission.

The purpose of the encoding on the send side is to compress speech to limit trans-

mission bandwidth. On the receive side, the vocoder then decodes the compressed

speech. The result of encoding and decoding is degradation of the speech. This

degradation includes an already limited frequency range and loss of naturalness. To

simulate this degradation, Motorola has provided a C program to emulate the encod-

ing and decoding of speech. This allows the listening tests to better model the sound

of speech delivered by the phone. The C program uses the Advanced Multi-Band









Table 3.1: Description and Samples of Noise Sources.



Pink Noise acquired by sampling high-quality analog noise generator
(Wandel & Goltermann). Exhibits equal energy per 1/3 octave.

Babble Noise acquired by recording samples from 1/2" B&K con-
densor microphone onto digital audio tape (DAT). The source of
this babble is 100 people speaking in a canteen. The room radius
is over two meters; therefore, individual voices are slightly audible.
The sound level during the recording process was 88 dBA.

Volvo 340 noise acquired by recording samples from 1/2" B&K con-
densor microphone onto digital audio tape (DAT). This recording
was made at 120 km/h, in 4th gear, on an asphalt road, in i ,iiv
conditions.


Excitation (AMBE) vocoder model which is also used in iDEN phones. To do this,

the speech must first be re-sampled at a rate of 8 kHZ (the sampling rate used on the

phones). This furthers the real-world model by bandwidth-limiting the speech.

3.2 Noise Sources

The listening tests described in Chapter 2 exclusively used white Gaussian noise

but, there is a larger variety of noise types we experience in our every i, life and few

of them are Gaussian. Some of these noises are car, cocktail party (babble), machine

and pink noise. It would be ideal to test the performance of our algorithm with all

possible noise sources. However, in order to prevent listener fatigue and to allow

multiple SNR level' ii- the noise sources were limited to car, babble, and pink

noises. These noise sources were obtained from Rice University Signal Processing

Information Base [23]. The noise samples are sampled at 19.98 kHz and available in

both MATLAB .MAT format and .WAV format. Table 3.1 describes the noises and

provides samples.









3.2.1 SNR Calculation

There are several methods for calculating the SNR of a signal. The classic form

is shown in Equation 3.1. Where x[n] is the clean speech signal, v[n] is the additive

noise and N is the number of samples for the signal.



SNR 10 x log0 ,, -- (3.1)
: (VE])2)
\n=-0 /
This form of SNR is usually taken over the entire signal length. Unfortunately,

this form will not effectively measure the perceptual significance of noise to human

hearing. Humans are limited in their frequency range of hearing. Human hearing

typically ranges from 20 Hz to upwards of 20 kHz. Obviously, a high power signal

at frequencies above and below this range will not effect the perceptual SNR from

the listeners standpoint. Additionally, there is a sharp drop-off in the intensity of

sound above 4 kHz. This drop-off is apparent in the equal loudness contours shown in

Figure 3.1. For this reason, an alternate approach to the method used in Equation 3.1

is needled.

3.2.2 Segmental SNR

Classic SNR calculations carry little perceptual significance, since they will tend

to be higher as the ratio of voiced to unvoiced speech increases. A better calculation

can be obtained by considering the SNR values for frames of speech. Segmental SNR

(SNRseg), shown in Equation 3.2, uses a frame based average of the standard SNR

in Equation 3.1. The basis of the SNRse, calculation is that it takes into account

the short duration of unvoiced speech. If the SNRse, equation is rearranged, we

see that it is a geometric mean of the windowed speech signal SNR. This is seen in

Equation 3.3.











10000


S 801



40







0,02 0.05 0.1 0.2 0.5 1 2 5 10
Frequency, kHz

Figure 3.1: Equal Loudness Contours



/Lx(k+l)-1 \
t M-1 E (X2
SNR, = 10 x logo k-1x (3.2)
o-0: E (v )2
\ n=Lxk

,Ii /Lx(k+l)-l \ L-
M -1 n : (X[_])2
SNRg -0 ox loglo H kn-o/ (3.3)



The windowed based SNRs are usually limited to upper and lower values. This is used

for instances where the widowed SNR is significantly low or high. In these cases the

extremes would dominate the measurement. An example of this is if the signal power

goes to zero the windowed SNR would be set to -10dB instead of -oo. Likewise, if

the noise signal goes to zero, the widowed SNR is set to 45dB instead of oo.

The primary problem with using SNRse,,g to calculate the noise gain required, is

that it calls for an iterative process. That is, since the calculation is non-linear, there

is no closed form solution like standard SNR. The calculation must be repeated for










several noise gain values before the desired SNR level is achieved. For this reason, we

needed another approach to calculate perceptual SNR.

3.2.3 A-Weighting

Another approach to effectively adding noise at a specific perceptual SNR level is

A-Weighting [20]. A-Weighting coefficients are used to filter both the speech signal

and the noise. These temporary signals are then used to calculate the required gain

for the noise source in order to achieve a specific perceptual SNR. The frequency

response of the A-Weighting filter is shown in solid red in Figure 3.2. The figure

also shows the results of averaging the inverse of the equal loudness contours from

Figure 3.1 in dashed blue.



7707 i77- ---------- ------7---7---7i----..-------------7,-- ..-------------------------..









5 0
0 --- -----. -------.----- -----,-- -- -- ,-- ------ ----- ---,.-.---- .. ---..... ...




10' 10' 10' 104
frequency/Hz

Figure 3.2: A-Weighting Filter Magnitude Response.


N i n1_ i and Miura [20] found that A-Weighting was ideal for calculating the per-

ceptual SNR of narrow-band noise. Though speech and many noise types are consid-

ered wide-band signals, the use of A-Weighting is still a better approximation than

the classic calculation. Additionally, the time required for the listening tests should

be relatively short and more complex calculations, such as ISO 532b, are not practical










for the listening tests. For these reasons, A-weighting was used for the calculation of

the SNR level for the listening test discussed in this chapter.


3.2.4 Choosing the SNR levels

If we were to test the full range of a listener's hearing in noisy environments,

the peak increase in intelligibility could be found. This would require adjusting the

SNR from the point where the listener was achieving 1(111' accuracy to where he or

she is merely guessing every time on the intelligibility test. Figure 3.3 shows the

expected results if the SNR was adjusted from a very low SNR level to a very high

level. This figure shows an expected performance and values are provided for clarity

only. At a particular SNR level, the maximum percent increase would be achieved.

Unfortunately, this would require that the test duration be extremely long. And,

given the three noise sources, it would most definitely lead to listener fatigue. This

required some protesting to establish what levels to use. We decided that the desired

accuracy for un-enhanced speech should be somewhere close to 7.' This number,

being half way between the upper and lower limits, would leave room for inexperienced

and experienced listers.

Expected Intelligibllty Results
1 Enhanced I Always Right
Original
90 Difference
so -c /~*~
w / MaxiHmum
SC Difference

/
/








Low SNR High SNR


Figure 3.3: Conceptual Intelligibility Test Results for a Single Listener.









The procedure for finding these SNR levels involved a preliminary intelligibility

test. Four listeners were used for these tests. First the SNR was set to 5dB and then

an adaptive algorithm was used. For each noise source, a total of 80 utterances were

presented. After 20 utterances, the percent correct was calculated. If the percent

correct was higher than 77"' the SNR was lowered by 1dB. If it was lower than

7',. the SNR was raised. After each additional five utterances, the percent correct

was again calculated. If the percent correct was still approaching 77,.'. then the

SNR was not changed. However, if it was moving away or unchanged the SNR was

adjusted. Table 3.2 shows the results of the preliminary intelligibility tests. The

listener's language and SNR dB level which resulted in 7 .' correct for the respective

noise source. Based on these results, SNR levels of -5dB and 5dB were chosen.


Table 3.2: Results of Preliminary Intelligibility Test.



Listener Native Language Babble Car Pink

I English 1.7dB -9.4dB -0.8dB
II English 1.0dB -2.2dB 0.0dB
III Hindi 2.2dB -3.2dB -3.5dB
IV Chinese 3.5dB 2.5dB 2.7dB




3.3 Audio EQ Filter

Cellular phone speakers are designed with size and cost in mind. They require

compact size and limited cost in order to be used in cellular phones. Additionally,

the Federal Communications Commission (FCC) puts constraints on the speaker peak

output sound pressure level (SPL). Because of these design constraints the speakers

have limited frequency range. Motorola provided the frequency response for the

iDEN i85 phone speaker (assumed to be linear), shown in Figure 3.4. The MATLAB

function firls.m, from the Signal Processing Toolbox, was chosen to design the filter.










The firls.m function uses a least-squares (LS) approach to derive the FIR filter

coefficients for the Audio EQ model [14]. The FIR filter is linear-phase and therefore

does not distort the speech signal. The purpose of this model was to mimic the










R -50
-150 I I I I I I I
0 500 1000 1500 2000 2500 3000 3500 4000
Frequency (Hz)
2000
200 |--------------------------------------*---------------


^ -2000 I ------- -------- ----------------

-4000 -- --

-6000
0. 500 1000 1500 2000 2500 3000 3500 4000
Frequency (Hz)

Figure 3.4: Phase and Frequency Response for the Speaker EQ Model.


cellular phone environment. This model was used to filter the speech signals after they

were enhanced with the combined algorithm. Since we know that environmental noise

is not limited to the same bandwidth as cellular phone speech, this was performed

before the SNR calculation and only the speech signal was filtered.


3.4 Listener Demographics and Test Results

Demographics A total of 22 listers were tested in both Loudness and Intelligibility

tests. A total of six listeners were native English speakers. Nineteen of the listeners

were male and three were female. Five of the listeners were considered experienced

(taken multiple listening tests) listeners. The listeners ranged from 22 to 42 years of

age. The average test time was 22 minutes and 38 seconds.









Listening Test Flow Diagrams Figures'3.5 and 3.6 show the flow diagrams for

the loudness and intelligibility tests respectively.


Figure 3.5: Signal Flow Diagram for the Loudness Tests.


Figure 3.6: Signal Flow Diagram for the Intelligibility Tests.



Results for Loudness Tests The average perceptual loudness gain was 4dB. This

result is apparent in the crossover plot, Figure 3.7. The screening process, shown by

the dotted lines, indicates that the results are accurate and that the users were p ,iing

attention. Of the 22 listeners results only four fell below the 4dB crossover and none

of these fell below 2dB. Table 3.3 shows the total results for the loudness tests. These

results are higher than earlier tests conducted by Boillot. This may be attributed to

the application of the Audio EQ model. The 2.5dB gain was for formant expansion on

speech sampled at 16 kHz. The 4dB gain was achieved using the combined algorithm

on vocoded speech sampled at 8 kHZ. The Audio EQ model has a peak around the










-o -0-- original
/'' -- modified
S0. 81

-a 0.6 /
0)0



o0


0 -----,--

0 1 2 Bdrop3 4 5
dB drop

Figure 3.7: The results for all listeners on the Expanded Loudness test. Vertical bars
indicate 95'. confidence intervals [4].


2-3 kHz range which also corresponds to the highest sensitivity on the ISO-226 equal

loudness curves [9].


Results for Intelligibility Tests The Intelligibility tests resulted in an increase of

4 '. at -5dB SNR for enhanced speech over all noise types and confusable sets. This

is a minimum increase and we expect that the maximum increase would be larger. At

a 5db SNR level the tests resulted in less than 1 decrease in intelligibility. The 95'.

confidence intervals are shown for overall results, in Tables 3.4 and 3.5. These tables


Table 3.3: Results for Subjective Loudness Tests.


Scaling of Times Selected Times Selected Percent Enhanced
Modified Original Enhanced Selected


-5dB 138 93 40
-4dB 105 109 51
-3dB 66 108 62
-2dB 64 185 68
-1dB 41 210 84
OdB 36 215 86









Table 3.4: Intelligibility Test results for 5dB SNR.


Alg. All I II III IV
Overall O 83.56 5.94 89.06 92.36 79.93 69.00
E 82.70 4.69 82.24 86.31 81.96 81.18
Car O 88.35 91.67 90.35 86.46 34.21
E 88.35 91.23 88.85 85.15 82.89
Babble O 84.92 93.86 91.05 78.59 45.61
E 81.65 82.46 92.11 79.21 65.79
Pink O 78.20 56.00 90.79 72.02 60.53
E 77.40 65.44 51.05 83.50 75.69


Table 3.5: Intelligibility Test results for -5dB SNR.


SAlg. All I II III IV
Overall O 66.49 3.98 66.57 81.83 64.31 53.73
E 71.31 4.92 68.45 73.25 74.59 61.26
Car O 73.08 83.86 74.91 64.31 31.58
E 78.47 81.14 59.74 82.22 39.47
Babble O 62.48 55.26 78.20 67.79 14.47
E 72.17 70.18 66.18 74.65 44.91
Pink O 64.13 50.88 79.91 61.00 43.86
E 65.09 46.05 75.12 69.49 50.38


also show results for all three noise sources vs. all four confusable sets in Table 2.1

and for the original (0) and enhanced (E) signals.

3.5 A Note on ERVU

Though the SFM technique was used for voiced/unvoiced decision in testing for

its lower computational complexity, there is an issue of precision when using a fixed-

point Digital Signal Processor (DSP). The geometric mean is taken on the DFT

values of a frame of speech. We know that these values are less than one. The result

of multiplying all the values in one frame would result in a number less than the

precision of the Motorola 56000 DSP used in the iDEN phones. For example 0.9160

4.8 108- when compared to the DSPs precision of 2-15 = 3.11 10-5. More elaborate









calculations using logarithms produce a solution but require higher computational

complexity. We propose an alternate method for voice/unvoiced decision using a

peak autocorrelation ratio technique.


1 N-1
r[k] = N x[n] x*[n k], where N > 0 (3.4)
n=o

Equation 3.4 shows the biased autocorrelation function for lag k. Autocorrelation,

is commonly used in pitch recognition [1] systems. Pitch, the rate at which the glottal

opens and closes, is inherent to voiced speech. The peaks in the autocorrelation

function take on voiced speech are separated by the periodicity of the pitch. For

unvoiced speech the autocorrelation function resembles something close to an impulse

response. This is due to the characteristic of unvoiced speech being close to stationary

white Gaussian noise.



max r[m]
mEAm-N- 1 fs .
ratio r where N > 0 and = maximum pitch. (3.5)
r[0] Am

Instead of calculating the pitch, we consider the ratio of the signal power, equiva-

lent to r [0], and the maximum value of the autocorrelation function from lag Am to

N 1. Am is chosen to remove any spreading of the impulse around lag zero and to

ignore unrealistic pitch values. The peak autocorrelation ratio technique results in a

6.'"- voiced/unvoiced classification error as compared to a 3.>'-. error using the SFM

technique. However, the performance of SFM decreases as the SNR decreases. This

can be alleviated using the peak autocorrelation ratio technique. This technique is

considered very robust to noise [1] for pitch detection.















CHAPTER 4
JAVA IMPLEMENTATION OF LISTENING TESTS


In Chapter 3 we discussed methods for making the listening tests discussed in

C'!I pter 2 more relevant to future real-world operation of the algorithms. These

enhancements included real-world noise, vocoder effects and modelling the speaker EQ

cures of the cellular phones. However, these listening tests results are still somewhat

artificial since the user is actually listening on a headset and not on a cellular phone.

Clearly, if we could move our whole listening test environment to the phone, then the

tests could be run in the true environment such as riding in a car on a highway or

using the phone in a crowded social gathering. This chapter discusses the efforts of

implementing the listening tests on the Javaphone Java enabled cellular phone-, the

interface between the PC and phone and the database management. The listening

tests conducted in C! Ilpter 3 gave promising results towards real-world performance.

To finish the evaluation, we must be able to quantify the performance in the true

listening environment. Real-world testing can be performed using the Javaphone in

a natural environment.

4.1 J2ME and J2SE

The Java 2 Micro Edition (J2ME) is a software development kit (SDK) commonly

used in mobile devices. Development in J2ME is limited in several v- -v- as compared

to the Java 2 Standard Edition (J2SE). First, the limited memory space, described in

section 4.3, requires efficient coding. Second, the limited class set reduces function-

ality and sometimes requires additional coding. Unlike J2SE, development of J2ME

applications requires the use of a configuration and profile. The Connected Limited









Device Configuration describes (CLDC) the API for a certain family of devices which

includes the Motorola iDEN series Java enabled phones. The Mobile Information

Device Profile (\111)P) sits on top of the configuration and targets a specific class

of devices [15]. All Java applications written for the phone are an extension of the

MIDlet class. A MIDlet is a MIDP application. Programming in J2ME is performed

by utilizing the classes within MIDP and CLDC APIs. The MIDlet is developed spe-

cific to the devices it will run on and, in this case, the Motorola Java enabled iDEN

phones.

4.2 Why J2ME?

Java is an ever-evolving software development tool. Motorola has incorporated

the Java 2 Micro Edition (J2ME) virtual machine, described in Section 4.1, in cer-

tain iDEN model phones. Some advantages of Java are portability (write once, run

anywhere), thorough documentation and extended networking ability. Though J2ME

is limited in these advantages, it still provides an excellent environment for the de-

velopment of applications on cellular phones. The following is a list of some of the

abilities J2ME has.

1. Communication with a PC via the serial port.

2. Communication with web resources via the internet.

3. A database storage system.

4. Basic GUI operation.

5. Control of vocoded speech files.

6. Image rendering and animation.


7. Multi-threaded operation.









The development of the listening tests on the iDEN phones incorporates the abilities

listed in items 1, 2, 3, 4 and 5, in the list above.

4.2.1 Developing in J2ME

The procedure for developing J2ME code is as follows:

1. The code is written within the limitation of the device it designed to run on.

2. It is compiled using any standard Java compiler and J2ME libraries.

3. It is preverified using a device-specific preverifier.

4. The code is tested on an emulator for bugs.

5. Any problems encountered in the emulation are debui-:, I-

6. The code is recompiled, packaged into a MIDlet suite, converted to Java Archive

(JAR) files and uploaded to the phone.

The uploading of MIDlets is performed using the Java Application Loader (JAL)

utility. When the phone is connected to the PC's serial port, the JAL utility can be

used to view, delete, and upload MIDlets. If any bugs are discovered after running

the MIDlet, the code must be debui.--, .1 and steps two through six are repeated. The

preverification and emulation software may not alv--x catch problems that will occur

once the MIDlets are executed on the phone. Multiple MIDlets can be uploaded to

the phone. If these MIDlets are required to share resources, they must be part of

the same suite. The multiple MIDlets are packaged into a MIDlet suite and then

compiled and converted to JAR files to be uploaded to the phone. See Appendix A

for an explanation of all classes and methods created to implement the listening tests.

4.2.2 J2ME Constraints

Constraints on font, screen colors, screen size and image rendering need to be

carefully considered when writing J2ME code for a specific device. The basic user









4.3 Motorola iDEN Series Phones

The Javaphone has three user input devices [21]. They include a alpha-numeric

keypad, similar to standard touch-tone telephones, a 4-way navigation key and two

option keys. Through the keypad all numbers, letters and punctuation can be entered

by sequentially pressing keys (multi-tapping.). The 4-way navigation key can be is

used to move though menus, lists, radio-buttons and choicegroups. The two option

keys are used as control to select from menus, lists, radio-buttons, choice-groups or to

program defined options. The Motorola i90c iDEN phone is shown in Figure 4.1. The

phone has 3 types of memory dedicated to the Java VM [16]. Data memory (256k

Bytes) is used to store application data, such as image files. Program memory (320k

Bytes) refers to the memory used to install applications (I\I)lets). Heap memory

(256k Bytes) refers to the Random Access Memory (RAM) available to run a Java

application.

4.4 Listening Test Setup

Three separate listening tests run on the phone. They include loudness, intel-

ligibility and acceptability, similar to those used in C'! lpter 2. These tests can be

run using the ListenT MIDlet on the phone which is part of the ListenM package.

The user is asked to enter the information including; name, age, native language and

date. Next, the user selects between one of the three tests. The basic flowchart for

the ListenT MIDlet is shown in Figure 4.2. The three listening test flowcharts can

be seen in Figures 4.4, 4.5 and 4.6.

4.5 ListenT MIDlet

The ListenT MIDlet is an extension of class MIDlet and implements the interface

CommandListener. The implementation of CommandListener allows the MIDlet

to monitor commands received through the phone's option keys. The commands are

then interpreted based on the current screen, command selected and index selected,
































Loud \elect Acept

Intel


1 2) Z


Figure 4.2: Flowchart for Listening Tests ListenT.java


if any. Initially, when the MIDlet is run, it first executes its constructor. This sets up

any classes or variables that are initially needed to execute the MIDlet properly. In

ListenT the constructor creates three ListenDB databases (explained in section 4.6)

for each of the three listening tests, the ansBuffer buffer and the RandObj object

a random number generator which extends the ability of the Random class. It then

initializes the screens testScreen, userScreen and doneScreen by calling their ini-

tialization methods. Next, it calls the MIDlet.startApp method (this is alv-v- the

case with any MIDlet). Within this method the userScreen is set as the current

display and the listening test is ready to start and the randomizeDir method is

called. The randomizeDir method takes all Voice Note Files (VNF) (described in



















T-ls i




















Figure 4.3: Javaphone Listening Test GUI.


Section 4.7) and randomizes the order so that every time a test is taken the order

of utterances changes. Three separate VNF directories are created for the loudness

test, intelligibility test and acceptability test. To do this a naming scheme was used

on the VNFs. Each VNF name begins with a two letter designator followed by an

"_". These designators are "It", "it" and i for loudness test, intelligibility test

and acceptability test respectively. From this point on the user navigates through

different screens based on the input received.

Once information is entered and the command "B1 3,i is called by pressing the left

option key. The CommandListener method commandAction method then compares

the command to the current screen. It stores the user information in ansBuffer and









sets the display to testScreen. The user then selects one of the three test types and

presses the command "Select" using the left option key. Again, the commandAction

method compares the command to the current screen. It then initializes the corre-

sponding listening test screen. The next three subsections will give detailed program

execution based on the users test selection.

4.5.1 Loudness Test

The loudness test flowchart is shown Figure 4.4 for a reference. If the user se-

lection was "Loudness Test" from the testScreen, the CommandListener method

commandAction will initialize the loudness test screen, set the counter to zero and

generate a random sequence, based on the length of the test (in this case 20). The se-

quence will consist of the numbers zero or one and will be used to determine in which

order the enhanced and un-enhanced utterances will be pl i -d. The loudScreen

screen is set as the current di-pli w. The user then selects one of the utterances, not

knowing which is enhanced, and pl .i-, the sound by pressing "Play" using the left op-

tion key. The method pl,., uGample of class ListenT is passed the utterance selected.

This method utilizes method VoiceNote.play to pl i, the utterance. VoiceNote is a

Java package provided by Motorola that allows the pl i ,ing, recording and manage-

ment of sound files in .vcf and .vnf format.

Once both sounds have been p1l i,. d, the user then selects which sounded louder

by pressing "Select" using the right option key. The commandAction method verifies

that both sounds have been pl i., d and then compares the selection to the random

sequence element at the number of the counter. If they are equivalent, then iight" is

stored in the ansBuffer. Otherwise, '.--i ig" is stored. Next, the counter is checked

to see if 20 utterances have been evaluated. If it is not reached, the display remains

loudScreen and the test is continued.

Once the test is complete (the counter reaches 20), the test results are stored

to the database using the ItAnsDB object. Database storage will be discussed in






















































Figure 4.4: Flowchart for Loudness Test subprogram.









Subsection 4.5.4. After the data is stored, the di -piv is set doneScreen. From here,

the user can choose to take another test or exit the MIDlet. Figure 4.3 shows the

GUI for the listening tests on the phone discussed in the next three sections.

4.5.2 Intelligibility Test

The intelligibility test flowchart is shown Figure 4.5 for a reference. If the user se-

lection was "Intelligibility Test" from the testScreen, the CommandListener method

commandAction will initialize the intelligibility test screen, set the counter to zero

and generate a random sequence, based on the length of the test (in this case 20e).

The sequence will consist of the numbers zero and one and will be used to determine

the order that the correct and incorrect choices will be di-i, ',, 't. The intelScreen

screen is then as the current display. The user then pl i,-; the sound by pressing

"P1 ,v using the left option key. The method ;pl,/ ample of class ListenT is passed

the utterance selected.

Once the sound has been pl ., '1, the user then selects which utterance was heard

by pressing "Select" using the right option key. The commandAction method ver-

ifies that the sound has been pl i. and then compares the selection to the ran-

dom sequence element at the number of the counter. If they are equivalent, then

iight_[utterance]_[algorithm]" is stored in the ansBuffer. Otherwise, .n-15,,_ [ut-

terance] _[algorithm]" is stored. Next, the counter is checked to see if 20 utterances

have been evaluated. If it is not reached, the display remains intelScreen and the

test is continued.

Once the test is complete (the counter reaches 20), the test results are stored

to the database using the itAnsDB object. Database storage will be discussed in

Subsection 4.5.4. After the data is stored, the di -piv is set doneScreen. From here,

the user can choose to take another test or exit the MIDlet.






















































Start


Figure 4.5: Flowchart for Intelligibility Test subprogram.



















































Figure 4.6: Flowchart for Acceptability Test subprogram









4.5.3 Acceptability Test

The acceptability test flowchart is shown Figure 4.6 for a reference. If the user se-

lection was "Acceptability Test" from the testScreen, the CommandListener method

commandAction will initialize the acceptability test screen and set the counter to zero.

The acceptScreen screen is then set as the current display. The user then p1l li- the

sound by pressing "Play" using the right option key. The method p;/.,,/ ample of

class ListenT is passed the utterance selected. Once the sound has been p1 li,'. the

user then rates the quality by selecting "E:;. -, ni "Good", "Fair" or "Poor" and

pressing "Select" using the right option key. The commandAction method verifies

that both the sound has been p1l i,- d and then compares the selection to the random

sequence element at the number of the counter. Then ,ni _[algorithm]_ [quality

rating] is stored in the ansBuffer. Next, the counter is checked to see if ten ut-

terances have been evaluated. If it is not reached, the display remains acceptScreen

and the test is continued.

Once the test is complete (the counter reaches ten), the test results are stored

to the database using the atAnsDB object. Database storage will be discussed in

Subsection 4.5.4. After the data is stored, the di ppl is set doneScreen. From here,

the user can choose to take another test or exit the MIDlet.

4.5.4 Database Storage

The storage of answers is performed using the RecordStore class and methods.

The ItAnsDB, itAnsDB and atAnsDB objects are instances of ListenDB, which

were created in the MIDlet constructor, and control storage of data to a record using

the method ListenDB.addTaskRecord. The user information type and answers are

stored sequentially using the delimiter ":?:" in a String object. The data is then

stored in a RecordStore by the passing the string to the method addTaskRecord.









4.5.5 RandObj Class

The RandObj class is an extension of the Random class and allows the generation

of random numbers and sequences. The Random class is a pseudo random number

generator capable of providing a random number uniformly distributed between 215.

The method Date.getTime is used to generate a seed which is then passed to the

method Random.setSeed within the RandObj object. Method getRandNum of the

RandObj class are called from ListenT to generate these numbers.

4.6 ListenDB Class

The ListenDB class provides database management ability to both ListenT and

DbUpload MIDlets. The two main control methods of ListenDB, open and close,

allow access to a RecordStore database by checking to see if the database exists, ver-

ifying it is open, opening it and closing it. Four functional methods, addTaskRecord,

deleteTaskRecord, getRecordsBylD and enumerateTaskRecord, allow the addition of

records, deletion of records, browsing of records and organization of records, respec-

tively.

Note: The ListenDB class is not a MIDlet. It is only a class that adds functional-

ity to other classes. It is not executable and becomes a inner-class when instantiated

inside a MIDlet. The use of a separate class conserves memory. Since it is used by

multiple MIDlets, it will not occupy memory in each MIDlet.

4.7 Voicenotes MIDlet

The Voicenotes MIDlet ,which implements the VoiceNote Class, allows the record-

ing, p1 I backc, renaming and deleting of voice samples recorded on or off the phone.

The only direct access to sound p1 backk and recording is through VoiceNotes. The

sound files are stored as vocoded data. The flowchart can be seen in Figure 4.7.

Two other applications are written to support uploading and downloading sound files

to and from a PC. The simpleWriteVNF MIDlet is run on the phone while the


















































Figure 4.7: Flowchart for VoiceNotes MIDlet









simpleRead J2SE application is run on the PC. These two programs, provided by

Motorola, utilize the JavaTMCommunications API (commapi) package [24].

4.8 Voice Note Collection

Voice Note Files (VNFs) collected on the phone and down loaded to the PC as

described in Section 4.7. These files are then de-vocoded using a PC application

called "Voice Note Recorder" provided by Motorola. The de-vocoded file can then be

read by MATLAB and processed. At this point, any enhancement or modification of

the sound sample can be done. The next step would be to vocode the sound sample

using the "Voice Note Recorder" utility and upload it to the phone. Additionally,

PC recorded sound files can be vocoded and uploaded to the phone. Since, the PC

recorded samples are only vocoded once, therefore, the quality of the sample is better

than that of the samples recovered from the phone. The new VNFs are uploaded to

the phone using the JAL utility. From there, they can be p1 ,i, i or deleted using the

VoiceNotes MIDlet. Figure 4.8 shows the GUI and program flow for the Voicenotes

MIDlet.

4.9 DbUpload MIDlet

The listening tests results are stored as records on the phone. To recover these

records, the DbUpload MIDlet is used. This MIDlet is part of the ListenM package,

which allows it access to records stored in recAnsDb by the ListenT MIDlet. The

flowchart for this MIDlet is shown in Figure reffig:dbflow. This program requires

that the phone be connected to the PC's serial port for proper execution. The user

is first prompted to connect the phone to the PC and run the PC based application

DbDownload, explained in Section 4.10. Next, the user chooses which of the three

test types to download. When finished, the user is asked if the downloaded files are to

be deleted. Finally, the MIDlet is ended. Figure 4.10 show the GUI for the DbUpload

MIDlet.































49-


MGTm4


p.,O


Figure 4.8: Javaphone Voicenotes GUI.





















Delete
uploaded
Icrods From
DataBse



C"


Figure 4.9: Flowchart for DbUpload MIDlet


4.10 PC Coding Using J2SE

The J2SE application DbDownload is used concurrently with the DbUpload MI-

Dlet on the phone. For this application, the OBDC utility was used in Windows to

connect an MS Access database to the application. The MS access database and the

JDBC connection must share the same file name. Initially, the application connects

to the database using JDBC (Java Database Connectivity). The application window

can be seen in its initial mode confirming connection to the database, in reffig:dbdnfig.

This window, created from class Frame, has three functional buttons; Exit, Connect

to Phone, and Add Records. It also has three text-boxes to indicate the test type,

number of records to be add and an edit comment box to added to the records. Addi-

tionally, it has a status display that indicates what step the MIDlet is in the download

and database storage process. This display confirms step-by-step procedures and in-

dicates when problems occur. Once the phone is connected to the serial port and































Figure 4.10: Javaphone DbUpload GUI.

DbUpload is run, the "Connect to Javaphone" button is pressed. The application

will indicate the number of records and the test type that was downloaded. The user

can then choose to add a comment to the records or leave it blank. The button "Add

Records" is pushed and the system adds the records to the database using standard

SQL commands in sub-class AddRecords. These results can then be analyzed using

MS Access.


































Exit Connect to JavaPlhone


vriIl ileius &r i


Test Type:

Comlllenis on Records:

Number of Records to Add:

Connection to Database successful







Figure 4.11: GUI for DbDownload Application Running on PC















CHAPTER 5
CONCLUSION


The goal of this thesis was to evaluate the real-world performance of the Energy

Redistribution Voived/Unvoiced (ERVU) and Warped Bandwidth Expansion algo-

rithms. Earlier testing resulted in increased intelligibility and increased perceptual

loudness for these algorithms respectively. The algorithms were combined to con-

currently enhance both intelligibility and perceptual loudness. Environmental noise,

vocoding/devocoding effects and cellular phone speaker characteristics were incorpo-

rated in laboratory testing to mimic the cellular phone listening environment. PC

based listening tests were performed to quantify the performance of the combined

algorithm. To overcome the limits of laboratory testing, cellular phone based lis-

tening tests were developed in J2ME to provide a platform for testing algorithms in

real-world environments. This will provide concrete results and help determine if the

algorithms will be implemented fleet-wide.

The listening tests resulted in a 4.5'. increase in intelligibility at -5dB SNR and

4dB perceptual loudness gain. These results show that the combined algorithm will

provide increased performance without any added power to the speech signal. This

provides sufficient motivation towards implementation of the enhancement algorithms

on cellular phones.

The applications developed for the phones based listening tests allow the evalua-

tion of vocoded speech. There are two short-comings which must be resolved before

the tests will provide any conclusive results. First, the Java Virtual Machine (JVM)

must have access to controlling streaming audio. At this time it only has direct con-

trol over prerecorded vocoded speech (Voice Notes). This may require separate class









development and a more elaborate interface between the JVM and the cellular phone

DSP. At this time, speech enhancement can only be performed before the speech is

vocoded. This is the reverse order of which the implementation will process speech.

This will lead to inconclusive results since the effect of the encoding process by the

vocoder on enhanced speech is unknown. When the decision is made by the Motorola

iDEN Group to implement the algorithms on the cellular phone DSP and the inter-

face between the JVM and DSP are completed, the J2ME code can be appropriately

modified.

Future work should consider extensive cellular phone based testing once the proper

implementations are made. This may lead to a reevaluation of the algorithms and

parameters. It will be these tests that will provide the true performance of the al-

gorithms. Additionally, wireless implementation of the communication between the

phone and PC will require less overhead on the phone and make test alteration sim-

pler. Making changes to tests from the PC side such as modifying questions, test

length and speech samples could be possible. The use of User D ilp.itrl, Protocol

(UDP) [5] an internet communications protocol that involves the sending of infor-

mation packets -and streaming audio may help expedite any desired changes.

The J2ME listening test may also provide a useful byproduct that allows the

evaluation of speech coders used in the cellular phone industry. Like algorithm eval-

uation, the optimal testing environment for vocoders is on the cellular phone itself.

The ability to evaluate and quantify performance of vocoders on the phone could lead

to a more timely determination of implementation.















APPENDIX A
J2ME CLASSES AND METHODS


The following sections will present the classes, subclasses and methods developed

in the J2ME environment. All methods created will be discussed. Methods inherited

through extension of a class or implementation of an interface will not be discussed.

For a description of these methods see [25, 18, 13]. Additionally, objects defined

in a class that are already part of the J2SE or J2ME SDKs will not be discussed.

Source code may be obtained by sending an e-mail request to William O'Rourke at

worourkeacnel.ufl.edu.

A.1 The ListenT Class

The ListenT class extends the MIDlet Class and implements the CommandLis-

tener interface. See Table A.1.


The RandObj class RandObj is a subclass of the ListenT class that extends the

Random class. It generates pseudo random integers from 231. A seed is set by

calling Random.setseed(long seed) and passing it the current system date measured

from the epoch. See Table A.2.

A.2 The DbUpload

The DbUpload class extends the MIDlet class and implements the CommandLis-

tener interface. See Table A.3.

A.3 VoiceNotes2

The VoiceNotes2 class extends the MIDlet class and implements the Comman-

dListener and VoicenotesEnglish interfaces. See Table A.4.














Table A.I: Methods for ListenT.class


Method


Arguments


getVoiceNoteList


initUserScreen


initTestScreen


initLoudTestScreen


initInteligTestScreen


initAcceptTestScreen


Returns Description


String[] Returns all voice notes
on the phone.


Initializes the User
Information Screen.

Initializes the Test
Select Screen

Initializes the Loudness
Test Screen


Initializes the
Test Screen

Initializes the
Test Screen


initDoneScreen


setUpIntel


p1 liSample


p1 lSample


randomizeDir


Initializes the Exit
Screen

Sets up the Intelligibility
screen for proper display.


String


int, int


String [


Pl-, ,- a voice note by
name.

Pl-, ,- a voice note
randomly by ID.

String[] Randomizes voice note
directory.


Intelligibility


Acceptability











Table A.2: Subclass RandObj.class of ListenT.class



Arguments Returns Description


getRandNum


getRandNum


Returns the next pseudo random
integer limited to 2UmBits-1

Returns the next pseudo random
integer.


Table A.3: Methods for DbUpload.class


Method


Arguments


initConnectScreen


initTestListScreen


initDeleteRecordsScreen


initDeleteOKScreen


initDoneScreen


sendRecords


deleteSentRecords


sendRecord


Description

Initializes the PC
Connection Screen.

Initializes the Test
Select Screen

Initializes the Delete
Option Screen

Initializes the Verify
Delete Screen

Initializes the Exit
Screen

Gathers records by type
and calls sendRecord.

Deletes a Listening Test
Records by type

Deletes a specific record


Method


String















Table A.4: Methods for VoiceNotes2.class.


Arguments


Description


initCommandScreen


initRecordInfoScreen


initRecordScreen


initList


initDeleteScreen


initRenameScreen


initDoneScreen


record


String


pl1 ,i


delete

rename


String, String


Initializes the Command
Select Screen.

Initializes voice note
info Screen.

Initializes the voice note
recorder Screen.

Initializes the voice
note list Screen.

Initializes the voice
note Delete Screen.

Initializes the voice
note rename Screen.

Initializes the Exit
Screen.

Records a voice note.

Pli--v a voice note.

Deletes a voice note.


Renames a specific
voice note.


Method
















APPENDIX B
J2SE CLASSES AND METHODS



The following sections will present the classes, subclasses and methods developed

in the J2SE environment in the same manner as Appendix A.

B.1 DbDownload

The DbDownload.class extends the JFrame class and has subclasses shown in

Figure B.1. This class creates and instance of itself which is shown in the figure as

DbDownii 1-'. This class creates the JDBC connection, instantiates ScrollingPanel

and ControlPanel objects. No methods were required to be added to this class.

DbDownload DbDownload$1





ExtMain --------- ntrolPanel ScrollingPanel




AldRFcord ------------- nnectPhone -------- DownloadData


E Project Q(ss
Inner / Qter
-------- (her (reference, etc )


Figure B.1: The DbDownload class and its subclasses










Table B.1: Methods for ConnectPhone.class



Method Arguments Description

sortData String[] Sorts received string
using delimiter ":?:".



B.2 ControlPanel and ScrollingPanel

The ControlPanel and ScrollingPanel classes extend the JPanel class. These class

have no added methods. The ControlPanel class has instantiates ExitMain and Con-

nectPhone objects. The ExitMain class has no added methods.

B.3 ConnectPhone

The ConnectPhone class implements the ActionListener interface. This class in-

stantiates an DownloadData object. DownloadData implements Runnable and Seri-

alPortEventListener interfaces. See Table B.2 and B.1.
























Table B.2: Methods for DownloadData.class


Arguments


Description


getPhoneData


run


stop


serialEvent


SerialPortEvent


phoneConnect


Sets up serial port
input stream.

Starts Thread for
reading data.

Stops Thread for
reading data.

Listens for events
on the serial port.

Captures the serial
port.


Method














REFERENCES


[1] A. Acero X. Huang and H. Hon. Spoken Liur,.gr.g: Processing. Prentice-Hall
PTR, Upper Saddle River, NJ, 2001.

[2] T. Baer B.R. Glasberg and B.C. Moore. Revision of Zwicker's Loudness Model.
Acustica, 82:335-445, 1996.

[3] M.A. Boillot. A Psychoacoustic Approach to the Loudness Enhancement of
Speech Part I: Formant Expansion. Submitted to International Conference on
Acoustic, Speech and S.:g,,,1 Processing, Hong Kong 2003.

[4] M.A. Boillot. A Warped Filter Implementation for the Loudness Enhancement
of Speech. PhD Dissertation, University Of Florida, 2002.

[5] H. Deitel and P. Deitel. Ir ,',~ How to P,'..gin, Prentice-Hall Inc., Upper
Saddle River, NJ, 1999.

[6] H. Fastl E. Zwicker and C. Dallmayr. Basic-program for Calculating the Loudness
of Sounds from their 1/3-oct. Band Spectra According to ISO 532 B. Acustica,
55:63-67, 1984.

[7] H. Fletcher and W. J. Munson. Loudness, its Definition, Measurement, and
Calculation. Journal of the Acoustical S .: I/; of America, 5:82-108, 1933.

[8] W. Hartmann. S.:g,.,l Sound and Sensation. Springer, New York, NY, 1998.

[9] The International Organization for Standardization. ISO 226. Acoustics-Normal
Equal Loudness Contours. ISO, Geneva, Switzerland 1987.

[10] The International Organization for Standardization. ISO 532 B. 1975. Method
B for Cal. l,.rl.I.: the Loudness of a Complex Tone That Has Been A,.l;,.. ,. in
Terms of One-third Octave Bands. ISO, Geneva, Switzerland 1975.

[11] J. Johnston. Transform Ccoding of Audio Signals Using Perceptual Noise Crite-
ria. IEEE Journal on Selected Areas in Communications, 6(2):314-323, 1988.

[12] J. C. Junqua. The Influence of Psychoacoustic and Psycholinguistic Factors on
Listener Judgments of Intelligibility of Normal and Lombard Speech. In Pro-
ceedings of International Conference on Acoustic, Speech, and S.:g,.,l Processing,
volume 1, pages 361 364, Causal Productions Pty Ltd., Toronto, Canada 1991.









[13] M. Kroll and S. Haunstein. lii.T1 2 Micro Edition Application Development.
Sams Publishing, Indianapolis, IN, 2002.

[14] S. Mitra. Digital S.:j,,.,l Processing: A Computer-Based Approach Second Edi-
tion. McGraww-Hill, New York, NY, 2001.

[15] M. Morrison. Sams Teach Yourself Wireless Java with J2ME in 21 Days. Sams,
Indianapolis, IN, 2001.

[16] Motorola Corporation. J2METMTechnology for Wireless Communication De-
vices. http://developers.motorola.com/developers/j2me, Accessed July 2002.

[17] Motorola Corporation. Motorola Protocol Manual 68P81129E15-B VSELP 4200
BPS Voice Coding Algorithm for iDEN. iDEN Division, Plantation, FL, 1997.

[18] Motorola Corporation. Motorola SDK Components Guide for the J2ME Plat-
form. Austin, TX, 2000.

[19] Motorola Corporation. Stand Alone Voice A I.: .:I Detector High-Level and Low-
Level Design Document. iDEN Division, Plantation, FL, 1999.

[20] S. N ,il i, and H. Miura. Advantages and Disadvantages of A-Wweighted Sound
Pressure Level in Relation to Subjective Impression of Environmental Noises.
Noise Control Engineering Journal, 33:107-115, 1989.

[21] Nextel iDEN. i90c Phone Users Guide. Nextel Communications Inc., Reston,
VA, 2001.

[22] T.L. Reinke. Automatic Speech Inr, ll.,,i.,lii Enhancement. Master's Thesis,
University Of Florida, 2001.

[23] Rice University. Signal Processing Information Base.
http://spib.rice.edu/spib/select-noise.html, Accessed June 2002.

[24] Sun Microsystems Inc. Communications API.
http://java.sun.com/products/javacomm/, Accessed April 2002.

[25] Sun Microsystems Inc. JavaTM2 Standard Edition API.
http://java.sun.com/j2se/1.4/docs/api/index.html, Accessed September 2001.

[26] Texas Instruments. TI 46-Word Speaker-Dependent Isolated Word Corpus (cd-
rom), 1991.

[27] Texas Instruments. DARPA TIMIT Acoustic-Phonetic Continuous Speech Cor-
pus (cd-rom), 1991.

[28] W. D. Voiers. Ch. 34 Diagnostic Evaluation of Speech Irn/l11. l...I:i, Dowden,
Hutchinson, and Ross Inc., New York, NY, 1977.

[29] E. Zwicker and H. Fastl. P-;. I.. r. .>ustics: Facts and Models 2nd Edition.
Springer Verlag, New York, NY, 1999.















BIOGRAPHICAL SKETCH


William O'Rourke was born on November 5, 1970, in Buffalo, NY. At the age of

five his family moved to Boca Raton, Florida. After high school, he entered the US

Navy. During that time, he spent five years stationed on two ships forward deploy, ,1

to the US Seventh Fleet in Japan. He travelled extensively through Asia meeting new

people and learning new customs, a priceless experience. At the age of 27, he decided

to leave the Navy and pursue a degree in electrical engineering at the University of

Florida.