PERCEPTION AND PRODUCTION OF ENGLISH LEXICAL STRESS BY THAI SPEAKERS

By

JIRAPAT JANGJAMRAS

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2011
© 2011 Jirapat Jangjamras
To my parents
ACKNOWLEDGMENTS

I cannot believe that the biggest and most complicated project I have ever undertaken is coming to an end. Many people have supported me at various stages of my work. First and foremost, I extend my heartfelt thanks to Dr. Ratree Wayland, my committee chair, an expert in experimental phonetics and psycholinguistics. Her thorough and innovative advice helped me develop a comprehensive research design. Her encouragement when I felt down and her patience with my myriad questions helped me finish the project.

I would also like to express my gratitude to my committee members. Dr. James Harnsberger gave me training on recording equipment and read my endless word lists. He also allowed me to sit in on his transcription class and intensive summer reading group, and encouraged me to redo data collection in a professional way. Dr. Caroline Wiltshire mentioned in her Introduction to Graduate Research class that we should propose research as if we did not care about time and money. This has unconsciously become the underlying philosophy of my dissertation. I admire Dr. Wind Cowles's expertise in psycholinguistics as well as her organizational skills and kindness. I also need to include Dr. Edith Kaan as my unofficial committee member. She introduced me to E-Prime, let me take over her sound booth for two months, and helped me with R code, Excel, and statistical analysis.

A low-tech person like me received technical support from people I met in various venues. Helpers on Praat scripts include Priyankoo Sarmah (my former classmate), Daniel Hirst, Paul Edmunds, and especially Dr. Boon Yang, who kindly wrote a customized script for my acoustic analysis. I benefited greatly from attending the Linguistics Mini-Institute at Ohio State University in the summer of 2008, where Mary Beckman introduced me to the R program.

I would also like to express my deep appreciation to my past teachers and professors in other fields who taught me how to learn. They are Ajaarn Suganda Tangtavechagit, Dr. Pak Silavipaporn, Dr. Chetana Nagavachara, Dr. Rudolph Troike, Dr. M.J. Hardman, and Dr. Eric Potsdam. I am indebted to the Royal Thai Government, which granted me the MA-PhD scholarship, and to the English Department, Chiang Mai University, for allowing me to take a long leave of absence. This research was partially funded by a Dissertation Grant awarded by the journal Language Learning.

During the ups and downs of graduate school, I received support from these endearing people: Bill and Mary Sullivan, Jules Gliesche, Andrea Dallas, Donruethai Laphasradakul, Han Ye, Zoe Ziliak, Weihua Zhu, Yunjuan He, Divya Gogoi, Lynne Harshman, Jolee Gibbs, Noon, Bot, To, P'Mor, P'Cain, Golf, Vap, and N'Ple. I sincerely thank the Athey family, Marvin, Bonnie, and Rich, for showing me real American life. My greatest thanks go to two special people who love me unconditionally and give me endless care and support, Samritphon and Jiemjai Jangjamras. I feel so fortunate to be their daughter.
TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
ABSTRACT

CHAPTER

1 INTRODUCTION
  Motivation
  Goals
  Main Research Questions
  Hypotheses
  Research Design

2 SECOND LANGUAGE SPEECH PRODUCTION AND PERCEPTION
  Stress Perception
    Stress Position in L1 and Stress Typology
    Acoustic Cue for Stress Perception
    Proposed Models
      Stress deafness model (SDM)
      Stress typology model (STM)
  Stress Production
    Second Language Stress Placement Strategy
    Acoustic Parameters of Second Language Stress
    Proposed Model: STM
    Limitations of Previous Stress Production Studies
  Relation between Stress Perception and Production

3 PROSODIC PHONOLOGY OF ENGLISH AND THAI
  American English
    American English Segmental System
    Stress
    Lexical Stress
    Lexical Stress Pattern
    Acoustic Correlates of Stress
    Accent
    Intonation
  Thai
    Thai Segmental System
    Syllabic Structure
    Tone
    Stress
    Interaction between Thai Prosodic Units
  Conclusion

4 THE CURRENT STUDY
  Motivation for the Study
  Research Questions and Hypotheses
  Overview of the Study
  Detailed Research Questions
    Experiment 1: Speech Production
    Experiment 2: Speech Perception
    Experiments 1 & 2: Relation between Speech Perception and Production
  Improvement in Methodology
  Significance of the Study

5 METHODOLOGY
  Participants
  Materials
  Procedure I: Production Experiment
  Procedure II: Perception Experiment
  Procedure III: Production Evaluation
    Native American English Listener Judgments
    Acoustic Analysis
  Procedure IV: Perception Evaluation
  Statistical Design
  Conclusion

6 RESULTS
  Speech Production
    Perceptual Evaluation
    Acoustic Data: Three-Way ANOVAs
    Acoustic Data: Stepwise Regression
  Speech Perception
    Accuracy
    Accuracy (without Vowel Reduction Cue)
    Reaction Time
    Acoustic Analysis of Perception Stimuli
  Relation between Speech Perception and Production
  Overall Summary of Results

7 THAI TONE TRANSFER
  Methodology
    Research Questions and Hypotheses
    Materials
  Two-Way Repeated Measures
    Average F0
    Duration
    Intensity
    Summary of the Two-Way Repeated Measures Results
  Thai Tone Transcription
    Transcription Procedures
    Results
  Conclusion

8 GENERAL DISCUSSION AND CONCLUSIONS
  Stress Production
  Thai Tone Transfer
  Stress Perception
  Relation between Speech Perception and Production
  STM and the Current Findings
  Conclusion
  Limitations of the Present Study
  Future Directions

APPENDIX

A WORDLIST FOR SPEECH PRODUCTION TASK READ BY A PHONETICIAN
B MATERIALS FOR SPEECH PRODUCTION TASK AS PRESENTED ON E-PRIME
C WORDLIST FOR SPEECH PERCEPTION TASK READ BY A PHONETICIAN
D QUESTIONNAIRE FOR LANGUAGE BACKGROUND CHECK
E ENGLISH LEXICAL STRESS: POWERPOINT PRESENTATION

LIST OF REFERENCES
BIOGRAPHICAL SKETCH
LIST OF TABLES

3-1 American English consonants
3-2 American English vowels
3-3 Thai consonants
3-4 Thai vowels
5-1 25 word types
6-1 Results of three-way ANOVAs on four acoustic correlates
6-2 Descriptive statistics of average F0 by syllabic structure × position × group
6-3 Descriptive statistics of three acoustic correlates by all combinations of stress position × syllabic structure, with the mean difference of each contrastive pair and its t statistic
6-4 Summary of stepwise regression results for AE and NT for each syllabic structure
6-5 Summary of predictors for each stress position
6-6 Descriptive statistics of average F0 values and difference between stressed and unstressed syllables in each structure × position
6-7 Descriptive statistics of intensity difference values and difference between stressed and unstressed syllables in each structure × position
6-8 Summary of stepwise regression results for AE and NT stress perception scores
7-1 Hypothesized F0 patterns according to Gandour's and Thai tonal rule predictions on English nonwords produced by Thai speakers
7-2 Descriptive statistics of average F0 ratio by syllabic structure and stress position
7-3 Descriptive statistics of duration ratio by syllabic structure and stress position
7-4 Descriptive statistics of average intensity difference by syllabic structure and stress position
7-5 Thai tone distribution of initial-stress words
7-6 Thai tone distribution of final-stress words
7-7 Thai tonal rule predictions
7-8 Gandour's (1979) predictions
A-1 Stimulus word duration in milliseconds
B-1 Stimuli presented on E-Prime
C-1 Speech perception wordlist, block 1
C-2 Speech perception wordlist, block 2
C-3 Speech perception reduced vowel wordlist, block 1
C-4 Speech perception reduced vowel wordlist, block 2
LIST OF FIGURES

2-1 Typology of stress parameters and relevant languages
3-1 Functions and physical parameters of the world's prosodic systems
3-2 English prosodic hierarchy
3-3 F0 contours of the five Thai tones in isolated monosyllabic words
6-1 Overall mean percentage and standard errors of stress production score by NT and AE
6-2 Mean percentage and standard errors of stress production score by each stress position by both language groups
6-3 Mean percentage and standard errors of stress production score by stress position and language group
6-4 Mean percentage and standard errors of stress production score by syllabic structures of both stress positions by both language groups
6-5 Mean percentage and standard errors of stress production score by syllabic structure by stress position and language group
6-6 Overall mean percentage and standard errors of stress perception accuracy by NT and AE
6-7 Mean percentage and standard errors of stress perception accuracy by each stress position by both language groups
6-8 Mean percentage and standard errors of stress perception accuracy by stress position by NT and AE
6-9 Mean percentage and standard errors of stress perception accuracy by syllabic structures of both stress positions by both language groups
6-10 Mean percentage and standard errors of stress perception accuracy by syllabic structure by language group
6-11 Mean percentage and standard errors of stress perception accuracy by syllabic structure by stress position and language group
6-12 Overall mean percentage and standard errors of stress perception accuracy without the vowel reduction cue by NT and AE
6-13 Mean percentage and standard errors of stress perception accuracy without the vowel reduction cue by each stress position by both language groups
6-14 Mean percentage and standard errors of stress perception accuracy without the vowel reduction cue by stress position by NT and AE
6-15 Mean percentage and standard errors of stress perception accuracy without the vowel reduction cue by syllabic structure of both stress positions and both language groups
6-16 Mean percentage and standard errors of stress perception accuracy without the vowel reduction cue by syllabic structure by language group
6-17 Mean percentage and standard errors of stress perception accuracy without the vowel reduction cue by syllabic structure by stress position and language group
6-18 Overall mean percentage and standard errors of stress perception accuracy by NT and AE
6-19 Mean percentage and standard errors of stress perception accuracy by each stress position by both language groups
6-20 Mean percentage and standard errors of reaction time by syllabic structures of both stress positions by AE
6-21 Mean percentage and standard errors of reaction time by stress position by NT and AE
6-22 Mean percentage and standard errors of reaction time by syllabic structure by stress position of both language groups
6-23 Mean percentage and standard errors of reaction time by syllabic structure by stress position and by NT and AE
6-24 Series of four acoustic parameters in stressed and unstressed syllables of each syllabic structure and stress position combination
6-25 Scatterplot of overall production and perception accuracy percentages by NT
6-26 Scatterplot of overall production and perception accuracy percentages by AE
6-27 Scatterplot of NT production and perception accuracy percentages for the initial CVV.xx syllabic structure
6-28 Scatterplot of NT production and perception accuracy percentages for the final xx.CVV syllabic structure
LIST OF ABBREVIATIONS

AE     American English or native speakers of American English
ANOVA  Analysis of variance
F0     Fundamental frequency
ITI    Intertrial interval
JND    Just noticeable difference
L1     First language or native language
L2     Second language or target language
NT     Native speakers of Thai
RT     Reaction time
SDM    Stress Deafness Model
SPEAK  Speaking Proficiency English Assessment Kit
STM    Stress Typology Model
TSE    Test of Spoken English
Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

PERCEPTION AND PRODUCTION OF ENGLISH LEXICAL STRESS BY THAI SPEAKERS

By

Jirapat Jangjamras

May 2011

Chair: Ratree Wayland
Major: Linguistics

This study investigated the effects of first language prosodic transfer on second language learners' perception and production of English lexical stress. It also investigated the relation between their stress perception and production. To test the effect of Thai tonal distribution rules and stress patterns on native Thai speakers' perception and production of stress in English, paired experiments were conducted using English nonwords with controlled syllabic structures. Native speakers of Thai (NT) and native speakers of American English (AE) were first asked to concatenate two aurally presented monosyllables and to produce them as disyllabic nonwords with initial or final stress. They were then asked to identify the stress location in the same set of disyllabic nonwords produced by a trained phonetician. Acoustic analyses of both groups' productions indicated that they implemented word-initial stress with a wider mean F0 range, a higher average F0, and a higher average intensity than word-final stress. However, word-final stress was realized with longer vowel duration than word-initial stress. Both groups' productions were heard as intended at a similar overall rate (~70%). However, their word-initial stress productions were heard as intended more often than their word-final stress productions (73% vs. 67%).
In addition, a syllable with a short vowel followed by an obstruent (CVO) was the syllabic structure on which stress production was least often heard as intended. Results of stepwise regression analyses showed that native judges relied heavily on vowel duration in their perception of NT word-initial as well as word-final stress productions. By contrast, intensity and average F0 were their main perceptual cues to AE word-initial and word-final stress, respectively. Even though the F0 patterns of NT stress production could be identified as similar to Thai tones, little evidence was found that Thai tonal distribution rules transferred to their stress production.

The stress perception results showed that NT identified stress position as accurately as AE, and both groups showed an initial-stress preference. Both groups were more accurate at identifying word-initial stress than word-final stress, even though final stress was produced with more acoustic correlates than initial stress. Reaction time analysis also revealed that Thai speakers spent more time identifying final stress. In addition, stepwise regression analyses revealed no significant predictor of either AE or NT perception accuracy for initial stress. For final stress, vowel duration was the predictor for AE, while intensity, duration, and mean F0 were strong predictors for NT. A careful examination of the acoustic data suggested that average F0 may have been responsible for the superior initial stress perception. However, NT did not show the same sensitivity to vowel reduction as a cue to stress as AE did. Furthermore, a moderate correlation between the perception and production scores and acoustic reliance was observed among NT but not among AE. This suggests a relatively closer relation between stress production and perception among NT. The overall findings are inconsistent with the Stress Typology Model. They suggest that tone language speakers with fixed L1 stress patterns can accurately produce and perceive variable stress in an L2, and that acoustic features used contrastively in the L1 are exploited in L2 stress perception and production. More detailed acoustic studies of L2 stress acquisition among tone language speakers will deepen our understanding of how the L1 prosodic system influences L2 stress learning. Such studies would lead to a more comprehensive and accurate model of cross-language stress acquisition.
CHAPTER 1
INTRODUCTION

All languages in the world have segmental systems, or inventories of vowels and consonants, which distinguish words with different meanings. In English, /p/ in pin and /b/ in bin are contrastive consonants, and /ɛ/ in pet and /ɪ/ in pit are contrastive vowels. Besides segmental systems, suprasegmental features such as stress, tone, intonation, and rhythm also have linguistic functions in the world's languages. For example, stress distinguishes the English noun PROduce from the verb proDUCE. When stress falls on the first syllable, the word is a noun referring to food or other substances obtained through farming. When stress falls on the second syllable, it is a verb meaning 'to make something.' A stressed syllable is usually louder, longer, and higher in pitch than the surrounding unstressed syllables. In tone languages, voice pitch (the auditory impression of the rate of vocal fold vibration) distinguishes the meaning of one word from another. These lexical tones are characterized by their pitch height (e.g., high, mid, low) or pitch contour (e.g., level, falling, rising). For example, in Thai, maa produced with a mid tone means 'to come,' while máa produced with a high tone means 'horse.' In English, by contrast, a change in pitch does not affect a word's meaning, but a change in intonation (pitch movement) can differentiate a statement from a question. Thus, learning a second language requires mastery not only of vowel and consonant production, but also of the suprasegmental, or prosodic, features that govern larger linguistic units.
Motivation

People of various nationalities have frequently told me that, as a native Thai speaker, I speak English with musical tones. One of the most detailed observations came from an American English accent reduction instructor. He observed that Thai learners of English tend to drop word endings (i.e., final consonants), insert musical tones on vowels, nasalize vowels, or devoice final voiced consonants. For example, Thai speakers produced the English word nine /naɪn/ as [naɪ] with a high tone, or as [nat] 'night.' Many Thai graduate students at the University of Florida struggled to pass the Speaking Proficiency English Assessment Kit (SPEAK), a test of spoken English proficiency similar to the Test of Spoken English (TSE). Having heard such comments and knowing of these struggles, I became interested in conducting research on second language (L2) prosody production.

Goals

Most adult second language learners speak with a foreign accent, and the influence of their first language (L1) has often been regarded as a source of non-target-like speech production. L1 sounds also affect L2 speech perception, whether positively or negatively. For decades, researchers have examined the influence of native speech sounds on the perception and production of nonnative vowels and consonants. However, research on the perception and production of L2 prosody, including lexical stress, rhythm, and intonation, remains scarce. Because less is known in this area, the primary goal of this dissertation is to investigate prosodic transfer at the word level, namely L2 lexical stress, in both speech perception and production. Thai and English are examined because of
systematic differences in their prosodic systems. Thai is a tone language with five phonemic tones: mid, low, high, falling, and rising. The distribution of these tones varies according to syllable structure. For example, all five tones can occur on syllables containing a long vowel with or without a nasal final consonant (CVV or CVVN). On the other hand, only the low and falling tones can occur on syllables with a long vowel and a final obstruent consonant (CVVO). Vowel length is also contrastive in Thai. At the phrasal level, Thai exhibits final phrasal stress, the main acoustic correlate of which is duration (Potisuk, 1996). Unlike Thai, English lacks lexical tones and instead has variable lexical stress patterns employing dynamic acoustic cues, including duration, intensity, and voice fundamental frequency (F0) (Cutler, 2005; Guion et al., 2003). At present, little is known about how an L1 lexical tone system may influence the perception and production of L2 lexical stress. One may hypothesize that Thai speakers assign Thai tones to English words, since English loanwords in Thai have been found to be assigned Thai tones according to their syllable structure and word position (Gandour, 1979). The present study focuses on the transfer of the acoustic properties used to realize lexical tone, phrasal stress, and the vowel length distinction to L2 lexical stress perception and production. The specific question asked was whether these L1 prosodic factors would affect stress production and perception on English disyllabic nonwords such as CVC.CVVO. The scope of L2 stress production in this study is limited to disyllabic words, in order to investigate the acoustic properties native Thai speakers (NT) use to implement stress in English. The study does not address NT knowledge of stress assignment patterns on multisyllabic English words, but rather their phonetic implementation
of acoustic correlates of known stress patterns. The relation between L2 stress production and perception is also investigated.

Main Research Questions

Do Thai tone distribution patterns, which are governed by syllable structure, affect NT production and perception of English stress?
Does the stress pattern of Thai influence NT perception and production of lexical stress?
What is the relation between L2 lexical stress perception and production?

Hypotheses

Knowledge of Thai tone distribution patterns was hypothesized to affect the F0 patterns NT produced on stressed syllables of disyllabic nonwords. Specifically, it was predicted that NT would have difficulty producing stress on CVVO and CVO syllables because of Thai restrictions on the F0 patterns (tones) allowed on these syllables. Since only falling and low tones are allowed on CVVO syllables, NT may not be able to produce appropriate F0 values to indicate stress on these syllables. Both low and high tones are allowed on CVO syllables in Thai, and implementing English stress on these syllables as a Thai low tone could lead to the syllable being perceived as unstressed. Second, it was predicted that NT would produce final stress more accurately than initial stress due to transfer from their L1 fixed final stress pattern. As for perception, since Thai is a tone language with a phonemic vowel length distinction, it was predicted that NT would not have difficulty perceiving English lexical stress because they could rely on pitch and vowel duration as the main perceptual cues. The combined results from the production and perception tasks were expected to yield insight into the relation between L2 stress production and perception.
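The tone-distribution restrictions behind the first hypothesis can be summarized as a lookup table. The sketch below encodes only the syllable types and permitted tones stated in this chapter (all five tones on CVV and CVVN; low and falling on CVVO; low and high on CVO) and flags the syllable types whose tone inventory is restricted, which are the ones predicted to be difficult for NT. The names `PERMITTED_TONES`, `disallowed_tones`, and `predicted_difficulty` are illustrative assumptions, and the table is a simplification rather than a complete account of Thai tone phonotactics.

```python
THAI_TONES = frozenset({"mid", "low", "high", "falling", "rising"})

# Permitted tones by syllable type, as summarized in this chapter.
# V = short vowel, VV = long vowel, N = nasal coda, O = obstruent coda.
PERMITTED_TONES = {
    "CVV": THAI_TONES,
    "CVVN": THAI_TONES,
    "CVVO": frozenset({"low", "falling"}),
    "CVO": frozenset({"low", "high"}),
}


def disallowed_tones(syllable_type: str) -> set:
    """Tones that Thai does not allow on this syllable type."""
    return set(THAI_TONES - PERMITTED_TONES[syllable_type])


def predicted_difficulty(syllable_type: str) -> bool:
    """Hypothesis sketch: stress is harder for NT where the tone inventory is restricted."""
    return bool(disallowed_tones(syllable_type))


for syl in PERMITTED_TONES:
    print(syl, predicted_difficulty(syl), sorted(disallowed_tones(syl)))
```

Listing the disallowed tones, not just a yes/no flag, makes the two predictions distinguishable: CVVO excludes high tone, while CVO permits high tone but also low tone, which an NT speaker might choose.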
Research Design

The overall design of the study involves administering a stress production test and a stress perception test to a group of native Thai speakers and a group of native speakers of English (AE). The same set of test words is used for both tasks to allow an investigation of the relationship between production and perception of L2 stress. The production test was designed to examine stress production accuracy as well as the acoustic properties used in the implementation of stress. The perception test examined the accuracy of stress location identification and the acoustic properties listeners employed. The results would demonstrate which aspects of L1 prosody are transferred to the production and perception of L2 stress and provide evidence for the relation between stress perception and production. Moreover, they would reveal the specific acoustic properties that AE and NT speakers use to produce and perceive stress. Such findings will serve as a resource for L2 classroom planning and a novel starting point for L2 prosody research.

Outline

This dissertation is structured as follows. Chapter 2 addresses dominant models of speech perception and production and reviews the literature on the acquisition of L2 suprasegmentals, L2 stress perception and processing, and acoustic correlates of L2 stress production. Chapter 3 describes the suprasegmental systems of Thai and English and compares the functions and acoustic correlates of Thai tone and stress to those of English lexical stress. Chapter 4 presents the goals, specific research questions, predictions, and implications of the study. Chapter 5 describes the methodology of the speech perception and production experiments. Chapter 6 presents the data analysis, results, and combined findings for speech production and perception. Chapter 7
presents the data analysis and results for Thai tone transfer. Finally, Chapter 8 presents a general discussion, conclusions, limitations of the study, and directions for future research.
CHAPTER 2
SECOND LANGUAGE SPEECH PRODUCTION AND PERCEPTION

Most adult learners of a second language speak with a detectable foreign accent and experience some speech processing difficulties. A number of variables, including age of onset of L2 learning, linguistic environment, length of residence in an L2-speaking country, gender, formal instruction, motivation, language-learning aptitude, and influence from the L1 sound system, have been proposed to account for these phenomena (Ortega, 2009; Piske, MacKay & Flege, 2001; Trofimovich, 2006). The assumption is that interference from the native phonological and phonetic systems is an important source of deviation from native-like speech production and perception (Flege, 1995; Best, 1995). Similarity and dissimilarity between L1 and L2 phones have been hypothesized to predict the success of L2 speakers from differing L1 backgrounds. Similarity can be defined acoustically or articulatorily, and the units of comparison can be segments, allophones, phonemes, or larger units. Dominant L2 speech learning theories, such as the Perceptual Assimilation Model (Best, 1995) and the Speech Learning Model (Flege, 1995), have for decades been used to examine the influence of native segmental (vowel and consonant) systems on the perception and production of nonnative segments (see Strange, 1995, for an overview). Very few studies, however, have focused on L2 speech prosody, that is, tone, stress, pitch accent, rhythm, and intonation. The goal of this dissertation is to investigate one of these less understood areas of prosodic transfer, with a focus on the acquisition of second language lexical stress. This chapter presents background literature on L2 stress perception and production and related L2 stress models.
Stress Perception

The ability to detect the location of primary stress is important for efficient word recognition and identification (Jongenburger, 1996). Stress languages can differ in where stress falls. In stress languages like English and Dutch, stress position can be contrastive; for example, the location of stress determines word class (i.e., produce is a noun when stressed initially and a verb when stressed finally). Stress can also be positionally fixed (i.e., always occurring at a fixed position relative to a word edge), as in French and Polish, or sensitive to syllable weight or the presence of specific affixes, as in English, among other factors. Previous studies have shown that nonnative speakers of an L2 stress language have varying success rates in stress perception tasks. Factors investigated in previous research on L2 stress perception include L1 stress position, L1 stress typology and the specific parameters of an L1 stress system, and the relative weighting of acoustic cues in L2 stress perception.

Stress Position in L1 and Stress Typology

Evidence of an L1 stress bias in the perception of L2 stress location has long been documented. Berinstein (1979) found that participants from three language groups (English, Spanish, and K'ekchi, a Mayan language spoken in Guatemala) identified stress on four-syllable (CV.CV.CV.CV) stimuli varying in vowel duration differently, in line with their L1 stress patterns. That is, English listeners showed an initial-syllable bias, Spanish listeners no positional bias, and K'ekchi listeners a final-position bias. L1 stress patterns have thus been reported to affect the perception of stress patterns in nonword stimuli.
Speakers of French, a language with fixed final stress, were found to perform poorly on stress discrimination tasks with CVCV nonwords that differ only in the position of stress, in contrast to speakers of Spanish, a language with variable stress position (Dupoux, Sebastián & Mehler, 1997; Dupoux, Peperkamp & Sebastián-Gallés, 2001). French speakers have been shown to exhibit a so-called 'stress deafness', which was found in a high-variability, high-memory-load ABX task (i.e., A and B were produced by different talkers, and participants responded within a time limit) (Dupoux et al., 1997). However, in a same-different AX discrimination task in which both stimuli (A and X) were produced by the same talker, French speakers discriminated stimuli that differed only in stress location very well. This showed that speakers of French, a language that does not use stress contrastively, could detect the acoustic correlates of stress (Dupoux et al., 1997). Subsequent studies using a sequence recall task concluded that French speakers lack the ability to encode contrastive stress in their phonological representations of words (Dupoux, Sebastián-Gallés, Navarrete & Peperkamp, 2008). In other words, stress deafness may be a linguistic problem rather than a perceptual one. Stress deafness was also observed in speakers of other predictable-stress languages; for example, Finnish and Hungarian speakers performed worse on a stress discrimination task than Polish speakers (Peperkamp & Dupoux, 2002). Unlike in Finnish and Hungarian, vowel length in Polish is not contrastive but is used as a stress cue. In addition, the Polish stress rule has some lexical exceptions. With a similar goal of testing the effect of L1 stress patterns on L2 stress perception, Altmann (2006) investigated the stress identification ability of L2 learners of
English from various L1 stress backgrounds: predictable-stress, nonpredictable-stress, and nonstress languages. Predictable stress was defined as regular primary stress whose position can be predicted from phonological characteristics of the word alone (e.g., the position of a syllable within the word, syllable weight); nonpredictable stress referred to primary stress that is not fixed in one position. Results from a stress identification task with multisyllabic English nonwords showed that speakers of L1 fixed-stress languages (Arabic, Turkish, and French) scored significantly lower than speakers of L1s without stress (Japanese, Korean, and Chinese) and speakers of an L1 with unpredictable stress (Spanish). However, all groups showed similar position-related patterns of difficulty: identification was hardest for stress on the final syllable of two- and three-syllable words, followed by stress on the initial and final syllables of four-syllable words (125 test items). Her findings were used to support the Stress Typology Model, discussed later in this chapter. Peperkamp, Vendelin & Dupoux (2010) further investigated the factors related to stress perception among speakers of L1 predictable-stress languages. They extended the sequence recall paradigm used in their earlier work (Dupoux et al., 2008) to speakers of five languages with predictable stress (Standard French, Southeastern French, Finnish, Hungarian, and Polish) and one language with nonpredictable stress (Spanish). They hypothesized four factors that could predict the degree of stress deafness across languages: the domain of stress (word or phrase), the lexical use of one or more phonetic correlates of stress for other functions (e.g., duration for contrastive length or F0 for tone or pitch accent), variability in stress position (none, moderate, high)
and lexical exceptions to the stress rules of each language. Results showed that all languages with predictable stress except Polish exhibited strong stress deafness, while Spanish speakers exhibited no such deafness. Polish speakers exhibited weak stress deafness. Only the lexical-exception factor correctly predicted this result: Polish has a small number of lexical exceptions to its stress rule, yielding moderate variability in stress position, whereas the other predictable-stress languages have no lexical exceptions. This study indicates that languages can vary significantly in their suprasegmental contrasts (e.g., tone, length, stress, and pitch accent). Even among languages grouped as having predictable stress, subtle differences in parameters such as lexical exceptions to stress rules can contribute to varying success in L2 stress perception.

Acoustic Cues for Stress Perception

A less explored area in L2 stress perception is auditory sensitivity to the acoustic signal of stress. Two issues may be considered. First, sensitivity to acoustic/phonetic correlates of stress may be accounted for by the function of an acoustic cue in the L1, whether phonetic or phonemic, in relation to L2 stress cue selection (Berinstein, 1979). Second, stress perception involves the accessibility of acoustic cues, which may be cues used in prosodic systems other than stress or cues not present in the L1. When multiple cues are available, listeners weight them differently (Zhang & Francis, 2010). Berinstein (1979) predicted that duration may not be a significant stress cue in languages in which vowel length is phonemic. To test this hypothesis, two groups of speakers of L1 fixed-stress languages that differ in whether they have a phonemic vowel length system were asked to judge stress on artificial-language stimulus words with four syllables,
CV.CV.CV.CV, that varied only in duration. The prediction was borne out: speakers of a fixed-stress language without a phonemic duration contrast (Cakchiquel) used duration as a perceptual cue to stress, while speakers of a fixed-stress language with a phonemic vowel length system (K'ekchi) did not. Berinstein concluded that phonemic vowel length in K'ekchi prevented the use of duration as a stress cue in nonword stimuli. This finding implies that if an acoustic cue is used phonemically in a language, as F0 is in a tone language, that cue would not play a significant role in perceiving and/or producing stress. One limitation of her study is that it focused on only one acoustic correlate of stress and ignored other cues. Additionally, the synthetic materials used did not approximate naturally produced words; they were syllables with a fundamental frequency of 100 Hz, which is lower than the minimum of 120 Hz for a typical male speaker (Traunmüller & Eriksson, 1995). Zhang & Francis (2010) investigated the weighting of acoustic cues to English lexical stress by speakers of languages with vastly different prosodic typologies: Mandarin, a tone language, and English, a stress language. The researchers noted that lexical tones in Mandarin are acoustically instantiated primarily by F0, then by duration and intensity, but not by vowel reduction, whereas English lexical stress is instantiated by multiple cues, especially F0, intensity, duration, and vowel quality. Both groups of speakers were asked to judge lexical stress placement in synthesized tokens of the English word desert with manipulated acoustic properties (vowel quality (F1-F3), F0, duration, and intensity) under natural and flat pitch contour conditions. Results showed that both language groups consistently weighted vowel quality more heavily than other cues;
vowel quality and duration were treated as a combined cue by both language groups. However, the groups differed in their reliance on F0. Mandarin listeners consistently showed greater reliance on F0 than English listeners in both pitch contour conditions, while English listeners relied on F0 more in the natural than in the flat pitch contour condition. The study reported the novel finding that a nonnative cue, vowel reduction, was weighted more heavily by Mandarin listeners than the acoustic cues employed in their L1. Nevertheless, the results suggest robust sensitivity to F0 among native speakers of tone languages, since Mandarin speakers showed more sensitivity to F0 contrasts in both pitch contour conditions.

Proposed Models

The L2 stress perception studies above revealed that success in L2 stress perception can be predicted by various factors. Most studies assume that the position of L1 stress influences the perception of L2 stress patterns (Archibald, 1998; Altmann, 2006; Berinstein, 1979; Dupoux et al., 1997; Dupoux et al., 2001; Peperkamp & Dupoux, 2002). This assumption was incorporated into two models: the Stress Deafness Model (SDM) and the Stress Typology Model (STM).

Stress deafness model (SDM)

Peperkamp & Dupoux (2002) proposed a hierarchy of languages with predictable stress. They associate languages with more predictable stress with poorer stress discrimination relative to segmental discrimination. In their experiment, participants were required to memorize two nonwords that differed in one of two dimensions, segmental (e.g., kupi vs. kuti) or stress location (e.g., MIpa vs. miPA), and to associate them with different keys on a computer keyboard. Then they heard random
sequences of the two nonwords and were asked to transcribe them using the specified keys. Every sequence was followed by the word OK to prevent the use of echoic memory. The stress discrimination results showed that speakers of various stress languages vary in their performance (Dupoux et al., 1997; Peperkamp & Dupoux, 2002), and the features of their L1 stress systems were used as the grounds for the model. SDM categorizes languages with predictable stress into a hierarchy of stress deafness from Class I (major problems distinguishing stress contrasts) to Class IV (no problems). The hierarchy is as follows:
Class I (French, Finnish): regular stress always at an utterance edge (no phrase-final unstressed function words)
Class II (Fijian): regular stress at an utterance edge based on syllable weight: utterance-final if heavy, otherwise penultimate (no phrase-final unstressed function words)
Class III (Hungarian): regular stress at an utterance edge, except for unstressed function words
Class IV (Polish): regular stress pattern for content words, but not at an utterance edge (unless monosyllabic)
Stress parameters, such as whether stress is contrastive, are claimed to be set during first language acquisition. If children can observe that stress in their language is regular, they will not encode stress information in their phonological representations and thus lose the ability to use this information in the speech stream in the course of L1 and L2 acquisition. Accordingly, languages with regular stress that always falls on utterance edges (Class I) yield a higher degree of stress deafness than those that require a) knowledge about syllable weight (Class II), b) the ability to
distinguish between function and content words (Class III), or c) awareness of content word boundaries (Class IV). SDM is concerned only with general perceptual ability, not with stress production or issues in L2 acquisition. Importantly, the model has never been tested on speakers of tone languages, which also lack word-level stress. In addition, findings by Peperkamp, Vendelin and Dupoux (2010) have shown that SDM's classification is not detailed enough to predict stress discrimination by speakers from various stress backgrounds. Polish speakers did exhibit weak stress deafness, while French and Hungarian speakers showed strong stress deafness. Lexical exceptions were found to be a more accurate predictor, distinguishing Polish speakers from Spanish speakers, who showed no stress deafness.

Stress typology model (STM)

STM was proposed by Altmann & Vogel (2002) and elaborated by Altmann (2006). STM predicts the success of bilinguals' L2 stress perception according to their L1 stress patterns and parameters. Overall, STM predicts poor stress discrimination by speakers of L1 predictable-stress languages and excellent discrimination by speakers of nonstress languages (i.e., languages without word-level stress). STM categorizes languages into languages with predictable stress, languages with nonpredictable stress, and nonstress languages, the last including tone languages and pitch accent languages. Binary features marking the presence (positive) or absence (negative) of a property of L1 word stress are used to predict the success rate of L2 speakers. Figure 2-1 shows the parameter settings and languages selected for Altmann's experiments.
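The STM's prediction logic, in which positive feature settings are assumed to cause interference while negative settings are not, can be sketched by counting positive settings per L1. This hypothetical simplification reduces the model to the two binary features named in this section (±stress, ±predictable) and assigns the languages to the groups described here; it does not reproduce Altmann's full parameter settings in Figure 2-1. The names `StressSettings`, `interference_score`, and `predicted_ranking` are illustrative only. A lower score means better predicted stress identification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StressSettings:
    stress: bool       # +stress: the L1 has word-level stress
    predictable: bool  # +predictable: stress position is rule-governed


# Language groups as described in this section (illustrative assignments).
L1_SETTINGS = {
    "Chinese": StressSettings(stress=False, predictable=False),
    "Japanese": StressSettings(stress=False, predictable=False),
    "Korean": StressSettings(stress=False, predictable=False),
    "Spanish": StressSettings(stress=True, predictable=False),
    "Arabic": StressSettings(stress=True, predictable=True),
    "Turkish": StressSettings(stress=True, predictable=True),
    "French": StressSettings(stress=True, predictable=True),
}


def interference_score(settings: StressSettings) -> int:
    """Count positive settings; only positive settings are assumed to interfere."""
    return int(settings.stress) + int(settings.predictable)


def predicted_ranking(table: dict) -> list:
    """Order L1s from best to worst predicted L2 stress identification."""
    # sorted() is stable, so languages with equal scores keep their table order.
    return sorted(table, key=lambda lang: interference_score(table[lang]))


print(predicted_ranking(L1_SETTINGS))
```

The resulting order (nonstress L1s, then Spanish, then the predictable-stress L1s) mirrors the group ranking reported for Altmann (2006); the tie-breaking within a group is arbitrary and carries no theoretical meaning.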
Speakers of an L1 with more negative stress features (-stress, -predictable), such as nonstress languages, are expected to perform best in stress location identification tasks, since there is no interference from L1 stress patterns. Speakers of an L1 with one positive (+stress) and one negative stress feature (-predictable), i.e., an L1 with unpredictable stress patterns, are predicted to perform better than speakers of languages with two positive stress features (+stress, +predictable), i.e., languages with predictable stress. The model assumes that negative settings for any parameter do not affect the success of L2 stress acquisition, while positive settings are likely to cause interference. The predictions of this model were borne out in Altmann (2006). Her findings showed that native speakers of languages without word-level stress (Chinese, Japanese, and Korean) earned near-perfect scores in an English nonword stress identification task. Speakers of an unpredictable-stress language (Spanish) scored second best. On the other hand, native speakers of languages with predictable stress (Arabic, Turkish, and French) had difficulty perceiving stress location in the same task. Altmann's STM is a major contribution to the study of second language stress perception as well as production (discussed in the next section). The model is not limited to L1 stress languages; it also includes nonstress languages. However, as Altmann acknowledged, the model accounts only for observable surface stress patterns in L1 production. STM does not deal with the perception of acoustic cues to L2 stress or with sensitivity to acoustic signals of L2 stress that could be transferred from sensitivity in the L1 prosodic system, e.g., to acoustic correlates of L1 stress, L1 tone, pitch accent, intonation, or phonemic duration contrasts. Focusing on acoustic properties can offer an alternative
explanation as to why native speakers of languages without word-level stress (Chinese, Japanese, and Korean) excelled in the stress identification task: their success may be due not only to the lack of any surface stress features (-stress, -predictable) but also to other acoustic properties of the L1. Like SDM, STM cannot account for the varying degrees of stress deafness among speakers of predictable-stress languages (Peperkamp et al., 2010). Neither SDM nor STM can readily explain the discrepant results on French stress perception. Neither explicitly explains why French speakers, discussed earlier, performed well in the AX task but not in the stress discrimination task (Dupoux et al., 1997) or the stress identification task (Altmann, 2006). In other words, the issue of perceptual vs. linguistic processing remains unsettled. Success in stress perception seems to be more complicated than the surface stress position of the L1 would suggest. More evidence on the acoustic correlates used in L2 stress perception can broaden our knowledge of L2 stress perception, especially when vastly different language typologies come into contact. This would allow us to test perceptual cue weighting and competition between phonemic and phonetic cues. So far, Zhang and Francis (2010) have shown that a nonnative cue, vowel reduction, takes precedence over a native cue, F0, in L2 stress perception among Mandarin speakers. Berinstein's (1979) hypothesis that an acoustic cue used phonemically would not be used in stress perception has never been reexamined in more recent literature. In addition, linguistic transfer can be observed by using stimuli that test specific L1 constraints that might affect L2 stress perception. Future studies should address multiple facets of stress perception (stress position and perception of acoustic cues of
stress and L1 linguistic constraints) so that we can gain a better understanding of L2 stress perception. In sum, this section reviewed literature related to L2 stress perception. Previous studies suggested that success in L2 stress perception could be predicted by the L1 stress position and stress typology of L2 speakers. The stress perception models, SDM and STM, assumed that L1 stress position and its associated parameters could account for varying degrees of success in stress discrimination and stress identification by L2 speakers. A less investigated area in L2 stress perception concerns sensitivity to acoustic cues to L2 stress. Berinstein (1979) suggested that a cue used contrastively in the L1 would play an insignificant role in L2 stress perception. In addition, Zhang and Francis's (2010) findings revealed that a nonnative cue, vowel reduction, was ranked higher than other stress cues (F0, duration, and intensity) by Mandarin listeners of L2 English.

Stress Production

Producing stress effectively in a second language involves the ability to place stress accurately in a word as well as the ability to articulate stress in a way that allows native listeners to process it easily. Most second language stress production research focuses on stress assignment strategies in English, as English is a major international language with notoriously complicated stress patterns (Altmann, 2006; Archibald, 1993, 1997; Davis & Kelly, 1997; Guion et al., 2004; Ling & Grabe, 1999; Wayland, 2006). The tasks used to elicit stress assignment range from reading a list of real words aloud and reading nonwords aloud to producing stress after hearing aural stimuli. Because of the nature of each task, it is not clear whether poor stress production arises from nonnative stress assignment patterns or from unsuccessful stress articulation. Fewer studies have focused on the realization of L2
stress or the acoustic parameters of L2 stress (e.g., Adams & Munro, 1978; Chen et al., 2001; Zhang, Nissen & Francis, 2008). These studies, which used written English real words as stress production stimuli, described subtle acoustic differences between native and nonnative stress production. Transfer from the L1 suprasegmental system was suggested to account for the deviations in acoustic correlates or patterns observed.

Second Language Stress Placement Strategies

L2 stress production research has indicated that speakers of an L1 with stress produce stress better than speakers from a nonstress background; that is, the former tend to be sensitive to syllable weight and word class in stress production. For example, Archibald (1993) reported that native speakers of fixed-stress languages, Polish and Hungarian, showed sensitivity to word class and syllable structure when assigning stress to English real words of two to four syllables (e.g., aroma, erase, cinema, and Minnesota) in a reading-aloud task. In addition, Polish speakers tended to stress the penultimate syllable (the syllable before the final syllable of a word) and Hungarian speakers the initial syllable. In contrast, native speakers of nonstress languages, Mandarin, Cantonese, and Japanese, did not show similar sensitivity when completing the same task. Archibald (1997) interpreted these findings as indicating that native speakers of nonstress languages treat stress as a purely lexical phenomenon, memorizing stress as part of a word's phonological representation. However, this suggestion was based on a small sample (10 participants representing three language groups, 35 test items). Similarly, Altmann (2006) found that speakers of fixed-stress languages (Arabic, Turkish, and French) produced stress patterns similar to those of AE native speakers; speakers of an unpredictable-stress language (Spanish) showed undecided patterns or tended to
stress the final vowel; and speakers of L1 nonstress languages (Chinese, Japanese, and Korean) produced stress patterns that could not be grouped together. Chinese speakers showed the least native-like stress patterns across word types, such as placing stress on the last stressable vowel, including final syllables. Korean speakers as a group were often undecided for many word types. Japanese speakers showed incorrect final stress for some word types, native-like stress patterns for others, and were undecided on other word types. Altmann's (2006) study is more systematic than previous studies that used real-word stimuli. The use of nonwords avoids the familiarity effect associated with real-word stimuli: with real words, it is difficult to determine whether the observed stress assignment is based on metrical grid computation, on knowledge of syllable weight, or simply on memorization. Thus, she presented nonwords written in English orthography on paper to her participants, 10 in each language group. All 46 nonword test items consisted of open syllables and varied in length from two to four syllables. They also had the following features: syllable boundaries were marked by a large black dot; the syllable combinations reflected those of English real words; and the words were presented and recorded in citation form, with no syllabic template. Examples include le•soo (Cə.CV), da•boo•va (Cə.CV.Cə), and mey•ze•la•noe (CV.Cə.Cə.CV). The intended syllabic structures are in parentheses, and the predicted stress location is underlined. Vowel reduction (a Cə syllable) was expected in unstressed syllables, but a full vowel was accepted as well. Although Altmann avoided the familiarity effect, other problems were present. First, a statistical analysis could not be performed because a large number of words
were excluded due to coding difficulties. Second, presenting nonwords in English orthography resulted in nontarget vowel production in many cases, as English lacks a one-to-one correspondence between graphemes and phonemes. Reading nonwords aloud could also discourage vowel reduction in unstressed syllables if participants unconsciously mapped a vowel letter such as 'a' onto a full vowel such as /a/ (a low back vowel). Furthermore, it was not clear whether the poorer performance among speakers of nonstress languages arose from inaccurate stress assignment patterns or from nonnative implementation of stress, since these speakers lack experience in both assigning and articulating stress. Finally, this study tested only stress assignment strategies related to syllable structure and position and did not consider syntactic and morphological factors. Guion and colleagues also investigated the stress assignment strategies of native vs. nonnative speakers using a novel method. They proposed a paradigm that simultaneously tests three factors in stress assignment, namely syllable structure, lexical class (noun or verb), and the stress patterns of phonologically similar words. Their stress preference paradigm included two tasks in which participants were asked to produce and give perceptual judgments on 40 nonwords of varying syllable structures in noun and verb sentence frames. In the production task, participants were asked to combine two monosyllabic words, presented aurally, into a single word and say it in a frame sentence (verb: I'd like to ___; noun: I'd like a ___). In the perception task, participants reported their preference for initial or final syllable stress on the two-syllable nonwords used in the production experiment, which were presented aurally in noun and verb sentence frames. Guion and colleagues (2004) found that
AE native speakers were more likely to assign stress to syllables with long vowels than to syllables with short vowels for both lexical classes (CVV.CVCC nonwords were preferred to CV.CVCC nonwords) and to syllables with consonant-cluster codas rather than single-consonant codas (CV.CVCC was preferred to CV.CVC) (Guion et al., 2003). Early Spanish bilinguals also showed nativelike stress placement strategies, while late Spanish bilinguals showed weakly nativelike strategies (Guion et al., 2004). Speakers of a tone language, Thai, were found to differ from speakers of stress languages (AE and Spanish) in their sensitivity to syllabic structure (Wayland, 2006). For Thai speakers, syllables with a long vowel attracted stress more often, but they did not assign stress to syllables with coda consonant clusters in the stress production task. Referring to L1 syllabic structure and tone assignment rules, Wayland (2006) argued that Thai speakers were sensitive to vowel length because it constrains Thai tone distribution rules, while the number of coda consonants does not. In fact, coda clusters do not exist in Thai.
The relation between the L1 syllabic structure of tone languages and L2 stress assignment has been investigated further. Ou (2007) found that L1 syllabic structure does not always predict L2 stress preference. Using Guion's (2003) perceptual stress preference task, Ou found that Chinese and Vietnamese ESL learners preferred to assign stress to a syllable closed by a sonorant (CVS) rather than to a syllable closed by an obstruent (CVO). CVO syllables do not exist in Chinese and were hypothesized to trigger an undecided preference pattern. However, Chinese speakers showed a preference for stress on CVS syllables. Ou concluded that the
finding supports the phonological universal hypothesis that sonorous codas tend to contribute more syllable weight cross-linguistically.
In sum, the studies reviewed above suggest that speakers of L1 stress languages perform stress production tasks better than speakers of L1 non-stress languages. The first group, such as Polish and Spanish speakers, might benefit from their L1 stress representation and thus be more sensitive to factors affecting stress assignment such as syllable weight, word class and frequency-based regularity. Speakers of L1 non-stress languages, such as Chinese, Japanese and Thai, appeared to rely more on a word-by-word learning strategy or on the stress patterns of familiar words. However, this latter group of speakers was also found to prefer assigning stress to syllables with long vowels and to syllables closed by a sonorous coda.
Three elicitation techniques were also discussed. The most conventional way of testing stress assignment in production was to ask participants to read words aloud from a reading list and to transcribe their stress assignment (Archibald, 1993 & 1997). To avoid an effect of the known stress patterns of English real words, a more recent method presents a reading list of English nonwords that resemble English syllabic structures (Altmann, 2006). However, since English orthography is not transparent and is likely to interfere with the intended pronunciation (e.g., vowel reduction in unstressed syllables), the most recent approach uses aural stimuli to elicit stress production (Guion et al., 2004).
Both the stress perception and the stress production studies reviewed above investigated factors affecting the success of L2 stress acquisition. L1 stress pattern and stress typology generate opposite predictions for L2 stress perception and L2 stress production. A fixed stress position in the L1 results in poor stress discrimination (Dupoux and colleagues, 1997
& 2001) and identification, while variable stress patterns or the absence of stress in the L1 are associated with good stress perception (Altmann, 2006). However, the reverse pattern is shown in stress production experiments: speakers from non-stress language backgrounds have difficulty with stress production, while speakers of L1 stress languages show more nativelike stress production (Altmann, 2006). Speakers from non-stress backgrounds appeared not to show the same sensitivity to syllabic structure as speakers of L1 stress languages, especially when assigning stress (Archibald, 1993 & 1997; Altmann, 2006; Wayland, 2006). These speakers were assumed to learn L2 stress on a word-by-word basis rather than through a rule-based approach.
Acoustic Parameters of Second Language Stress
L2 learners from various L1 groups also differ in how they realize stress acoustically. A few acoustic studies of L2 stress production have documented fine-grained acoustic differences between native and nonnative speakers of English. Although individual differences existed, speakers from the same L1 group tended to produce the acoustic correlates of stress in a similar way, suggesting transfer from the L1 suprasegmental system. Since the validity of the results depends on methodology, findings and methodology are presented together in this section.
Early work on the acoustic correlates of L2 stress in connected speech by nonnative speakers was conducted by Adams & Munro (1978). The researchers found that eight nonnative speakers of various Asian languages characterized by syllable-timed rhythm (their specific L1s were not reported) employed similar acoustic cues but differed qualitatively in stress production from Australian English native
speakers. Besides anomalous stress placement, the nonnative speakers' production was characterized by invariable F0 patterns (rise only or fall only, instead of rise-fall or fall-rise F0 contours). In addition, falls in amplitude at the boundaries of stressed syllables resulted in a staccato rhythm, and their unstressed syllables were longer than those of native speakers.
Over the past decade, studies on the acoustic correlates of stress production have focused on speakers from the same L1 prosodic background, specifically the tone languages Chinese and Vietnamese. Chen et al. (2001) analyzed the acoustic correlates of English sentential stress produced by 40 native Chinese speakers who were advanced learners of English. Their results showed that the Mandarin speakers employed the same acoustic correlates to signal stress as American English speakers, namely F0, duration and intensity. However, the Mandarin speakers produced stressed words with a higher average F0 and shorter duration; they also produced unstressed words with a higher average F0 and greater average intensity than the American English group. No critical divergence in the way Mandarin speakers implemented American English sentence stress was observed.
Other research on Mandarin learners of English has focused on lexical stress production (Zhang & Francis, 2006; Zhang, Nissen & Francis, 2008). Zhang and colleagues (2008) asked ten Mandarin speakers and an American English control group to produce seven English stress minimal pairs, in which the stress pattern distinguishes verb/noun class and meaning, for example, object, contract and rebel. Results showed that the Mandarin speakers used all four acoustic correlates, namely F0 (average and peak location), average intensity, duration contrast and vowel reduction, to
distinguish stressed from unstressed syllables. However, they produced a significantly higher average F0 and had more difficulty with the vowel reduction cue than the control group. Acceptability ratings by five native listeners with linguistic training revealed that the Mandarin speakers produced significantly less acceptable stress contrasts than the English speakers. Mandarin high tones, characterized by a much higher pitch range, were thought to underlie the unusually high average F0 that Mandarin speakers produced in English stressed syllables. Significant differences in formant patterns across groups suggest that Mandarin speakers were able to achieve appropriate vowel reduction in some unstressed syllables, but not in others. For instance, the Mandarin speakers' production of the high front lax vowel /ɪ/ in the unstressed syllable de- in desert was not close enough to that of their native English counterparts; the Mandarin speakers also showed a tendency to map their native vowel  to English  when the English speakers chose to produce  in the unstressed syllable con- in contract. In sum, the acoustic data suggested influence from the L1 tone system and from similarities and differences between the L1 and L2 vowel inventories.
Other recent studies have investigated stress production by a different population of tone-language speakers, Vietnamese learners of English ( & Ingram, 2005; Ingram & Pensalfini, 2008), who failed to produce a nativelike duration contrast for stress.  & Ingram (2005) examined English lexical stress production by beginning versus advanced learners of English, ten in each group. English real-word minimal pairs were used as stimuli in a reading task with contrastive sentence pairs, for example, 1a. Say the word conduct again; 1b. Say the word conduct again, where italics indicated stress position. The acoustic measurements included F0, intensity, and the
duration of the vowel and of the syllable for the stressed and unstressed target syllables. The stressed-to-unstressed ratios of the four acoustic parameters were calculated. Results showed that both groups of Vietnamese speakers employed the F0 and intensity cues as well as the Australian English native speakers did, while the Vietnamese beginning group did not use nativelike duration cues and produced no duration difference between stressed and unstressed syllables. L1 prosodic transfer was used to interpret these findings. Positive transfer from the L1 tonal system was suggested to account for the nativelike use of the F0 cue on a stressed word in accented position by both groups of learners. In contrast, the lack of a contrastive length distinction in Vietnamese was interpreted as negative transfer, leading to the beginning group's failure to manipulate duration contrast.
Ingram & Pensalfini (2008) conducted a subsequent study on the production of English contrastive stress, or accent, by Vietnamese speakers. Results showed that beginning and advanced learners differed from Australian English speakers in their use of acoustic patterns, which reflected L1 phonology, for realizing three English stress patterns, namely broad focus NP, narrow focus NP and compound word. For example ( ., 2008: 158):
a. Broad focus NP: This is a bottle which is colored blue. It's a blue bottle.
b. Narrow focus NP: This bottle isn't colored yellow. It's a blue bottle.
c. Compound word: This kind of jellyfish is quite common here. It's a blue bottle.
First, beginning learners failed to deaccent the second constituent in the compound and narrow focus patterns, whereas the advanced learners deaccented the noun to a greater extent. Second, the nonnative speakers
produced more varied F0 patterns than native speakers. This suggests a transfer effect of the various tonal contours on different syllable types from Vietnamese (p. 174). However, the overall results indicated that Vietnamese speakers can manipulate contrastive levels of F0 and intensity on accent-bearing syllables but fail to realize duration contrast in a nativelike way.
The finding that Vietnamese speakers ( & Ingram, 2005;  ., 2008) could manipulate F0 as a stress correlate better than duration contrast, and its interpretation in terms of their L1 tonal system, in fact contradict Berinstein's (1979) hypothesis that acoustic parameters used phonemically in an L1 would not be used as primary cues in L2 stress perception and production. In her stress production experiment, speakers of Kekchi, a language with a phonemic length contrast, relied primarily on F0 contrast, secondarily on intensity and lastly (insignificantly) on duration when producing stress. Berinstein suggested that phonemic vowel length in Kekchi prevented duration from being used noncontrastively as a correlate of L2 stress.
In summary, findings from Chinese and Vietnamese speakers of L2 English suggested a possible transfer effect from the L1 tone system and L1 vowel inventory. Since F0 is the primary cue to phonemic tone in these languages, neither group had problems manipulating F0 contrast in the L2 stress system. Nonetheless, Chinese speakers were found to produce higher F0 and Vietnamese speakers were found to produce various F0 (tonal) contours. Neither Chinese nor Vietnamese uses a contrastive length distinction; this might explain why Chinese speakers (Chen et al., 2001) produced shorter durations in L2 stress production and why Vietnamese speakers
( & Ingram, 2005;  ., 2008) had problems producing duration contrast in L2 stress. However, Berinstein (1979) argued, in contrast, that an acoustic parameter used phonemically in an L1 would not be used primarily in the production of L2 stress.
Proposed Model: STM
The reviewed research suggests that STM is the only model that predicts the success rate of stress production. Results from stress production experiments revealed that speakers of L1s with predictable stress (Turkish, French and Arabic) fared best in the production of English nonwords; speakers of a language with non-predictable stress (Spanish) performed second best; and speakers of non-stress languages showed the least nativelike patterns. Altmann (2006) argued that the complete absence of word-level stress (non-stress language) in the L1 seems to cause difficulties with the production of target-like stress placement in the L2. Based on these findings, STM states that in speech perception, only positive settings seem to impede the ability to correctly locate stressed syllables, while negative settings do not influence performance. For the production of L2 stress, on the contrary, speakers with a positive setting for stress language would produce L2 stress in a target-like manner, while speakers of languages without stress would show non-target-like stress placement strategies. Thus, the topmost parameter in the STM, namely stress language, is the most important factor in determining the success of L2 stress production. If the parameter is set positively, as in the case of stress languages, there appear to be no (major) problems with the correct pronunciation of L2 words with regard to stress placement. When the first parameter is set negatively, as in the case of non-stress languages, there are major problems with correct placement of L2 stress,
leading to different and inadequate strategies for stress production (Altmann, 2006: p. 157). Altmann suggested that further research examine the possibility of L1 transfer in such languages. She also pointed out that in addition to being able to perceive the location of stress, L2 learners need to produce stress in a way that native speakers of the L2 perceive as stress. Since not much is known about L2 stress production by speakers of L1 non-stress languages, it is not surprising that STM has no more detailed parameters on the right (non-stress) side of the typology tree and generates no further predictions for these speakers. More acoustic evidence on L2 stress by speakers of non-stress languages would provide insight into the L1 transfer issue that Altmann raised. Such transfer might come from other prosodic systems, such as a tonal system. Further investigation into the functions and acoustic correlates of stress in tone languages could reveal the mechanism underlying poor stress production by speakers of non-stress languages.
Limitations of Previous Stress Production Studies
An important issue in stress production tasks is the elicitation technique. Real words written in English orthography cannot test whether L2 speakers assign stress from patterns already learned through prior exposure to the test items. This is a major drawback of studies investigating stress assignment strategies with real words (Archibald, 1997). Acoustic studies of nonnative stress using real-word stimuli also suffered from the familiarity effect. Nonwords avoid the familiarity effect but not the orthography effect. This latter confound results in non-target production because English lacks a one-to-one correspondence between sounds and letters (Altmann, 2006). The orthography effect might also have prevented nonnative speakers of English from producing vowel reduction in unstressed syllables, since there
is no difference in spelling between the noun and verb members of the stress minimal pairs (Zhang et al., 2008). Presenting nonwords aurally, as proposed by Guion and colleagues, appears to be the best technique so far for testing stress assignment strategies, since both confounding factors are avoided.
The second issue in stress production studies is the gap between research on stress assignment strategies and acoustic analysis of L2 stress. Most stress production studies focused on stress assignment and overlooked other related issues that speakers from L1 non-stress backgrounds might face. This group of speakers, largely represented by native speakers of tone languages (Mandarin, Thai, Vietnamese), could be much less familiar with the concept of stress and how to realize it than speakers of stress languages. None of the previous studies (Adams & Munro, 1978; Altmann, 2006; Wayland, 2006) clarified whether incorrect stress placement was due to inappropriate realization of stress on an intended stressed syllable. Altmann (2006: 127) noted that speakers of non-stress languages may have intended to place stress on a certain syllable but were not successful in combining the correlates appropriately. Furthermore, previous acoustic studies suggested possible transfer from an L1 tone system when discussing F0 as an L2 stress correlate, but so far none of the studies has designed specific experiments to test L1 transfer. To provide more understanding of these remaining questions, future studies should focus more on the acoustic correlates of L2 stress while treating stress assignment as a constant factor.
Relation between Stress Perception and Production
The relation between stress perception and production can be observed in studies that included paired experiments. The following are results from stress placement studies and acoustic analyses of stress. The combined results on L2 stress assignment
strategies by Altmann (2006) revealed that good perception of L2 stress does not necessarily lead to good production of L2 stress, and poor perception does not entail poor production. In other words, hearing stress and articulating stress are independent of each other. The combined results from the production and perception experiments on new words by Wayland (2006) pointed in a similar direction: Thai speakers relied heavily on the stress patterns of already known words, or a word-by-word learning strategy, and less on syllabic structure and lexical class. Note that this conclusion is congruent with Archibald (1997), who found that Chinese and Japanese speakers tended to assign stress using a whole-word learning strategy. An acoustic correlate study by Berinstein (1979) revealed that speakers of a fixed-stress language with a phonemic vowel length system, Kekchi, did not use duration as the primary acoustic correlate of stress and, in fact, did not use duration as a perceptual cue to stress at all. So far, research investigating the relation between stress perception and production is scarce. More paired studies, especially with an emphasis on acoustic aspects of L2 stress, are needed in order to shed light on the relation between perception and production of L2 lexical stress.
[Tree diagram, STRESS PARAMETERS: stress language → predictable (quantity sensitive, Left/Right: Arabic; quantity insensitive, Left/Right: French, Turkish) or not predictable (English, Spanish); non-stress language → pitch (tone: Chinese; pitch accent: Japanese) or no pitch (Korean).]
Figure 2-1. Typology of stress parameters and relevant languages. [Reprinted with permission from Altmann, H. (2006). The perception and production of second language stress: a cross-linguistic experimental study. Ph.D. Dissertation (Page 38, Figure 2), University of Delaware, Newark, Delaware.]
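The STM predictions summarized above for perception and production can be made concrete as a small lookup over the two top parameters of the tree in Figure 2-1. The sketch below is one illustrative reading of those predictions, not Altmann's own formalization. The names `Language`, `predict_perception` and `predict_production_rank` are hypothetical, and grouping Thai with the non-stress languages follows the studies reviewed in this chapter rather than the figure itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Language:
    name: str
    stress_language: bool               # topmost STM parameter
    predictable: Optional[bool] = None  # meaningful only for stress languages


# Classification read off Figure 2-1; Thai is added (an assumption) as a
# non-stress tone language, following the studies reviewed in this chapter.
LANGUAGES = [
    Language("Arabic", True, True),
    Language("French", True, True),
    Language("Turkish", True, True),
    Language("English", True, False),
    Language("Spanish", True, False),
    Language("Chinese", False),
    Language("Japanese", False),
    Language("Korean", False),
    Language("Thai", False),
]


def predict_perception(lang: Language) -> str:
    """Perception: only positive settings on both parameters (a stress
    language with predictable stress) are predicted to impede locating
    L2 stress; non-stress and variable-stress L1s are not impeded."""
    if lang.stress_language and lang.predictable:
        return "impeded"
    return "not impeded"


def predict_production_rank(lang: Language) -> int:
    """Production: 1 = most target-like, 3 = least target-like.

    Predictable-stress L1s fared best, non-predictable stress L1s
    second, and non-stress L1s worst.
    """
    if not lang.stress_language:
        return 3
    return 1 if lang.predictable else 2


for lang in LANGUAGES:
    print(f"{lang.name:9} perception: {predict_perception(lang):12} "
          f"production rank: {predict_production_rank(lang)}")
```

The two functions deliberately branch on the same parameters in opposite directions, which mirrors the opposite predictions the STM makes for perception and production.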
CHAPTER 3
PROSODIC PHONOLOGY OF THAI AND ENGLISH
In this study, prosodic transfer in the perception and production of English lexical stress by adult L1 Thai learners of L2 English was investigated. Thai and English were selected because of their vastly different prosodic systems. In brief, Thai is a tone language with phrase-final stress, whose dominant production cue is duration (Potisuk, 1996). English is an intonation language and has variable stress patterns employing dynamic cues such as duration, F0, intensity contrast and vowel reduction in unstressed syllables (Cutler, 2005). Prosodic categories in Thai and English, both lexical and nonlexical, have overlapping acoustic correlates, or physical properties, as shown in Figure 3-1. This chapter presents background on both languages, beginning with a description of each language's segments and then moving to its suprasegmental system.
American English
English is a West Germanic language. For this experiment, the emphasis was on American English. In the US, between 2006 and 2008, 225 million people spoke only English at home and 55 million people spoke a language other than English at home (American Community Survey, 2008). There are many dialects of American English. However, most of my L1 English participants were raised in Florida, which is often described as not belonging to any particular dialect group.
American English Segmental System
English consonants are presented in Table 3-1. Different varieties of American English have different numbers of vowels, but the 12 simple vowels most commonly present in AE dialects are presented in Table 3-2.
Stress
Stress is relative because its degree or presence cannot be established unless a syllable is compared with other nearby syllables or words. There are two types of stress in English: word (lexical) stress and sentence (sentential) stress. They differ in domain, position and function. The domain of word stress is the syllable, while the domain of sentence stress is the word (Couper-Kuhlen, 1986). Kingdon (1958: 1) defined word stress as the relative degree of force used in pronouncing the different syllables of a word of more than one syllable. In contrast, sentence stress is the relative degree of force given to the different words in a sentence; it is closely bound up with intonation.
Word stress, or lexical stress, is a property of a word, and its position within the word is largely fixed. For example, stress always falls on the first syllable of the words challenge and mother. Monosyllables cannot be said to have word stress (Kingdon, 1958). In a disyllabic word, there is always a primary stress and either an unstressed syllable (e.g., balloon) or a secondary stress (e.g., raccoon); in a polysyllabic word, there is a primary stress, a secondary stress and an unstressed syllable.
Sentence stress involves selecting one word or phrase within a sentence and giving that item special emphasis in pronunciation. John hasn't arrived can be uttered in three ways (Fudge, 1984: 1; capitals mark the stressed word):
a. John hasn't ARRIVED.
b. John HASN'T arrived.
c. JOHN hasn't arrived.
In a., John has set out to get here, but is not yet here; b. might be uttered as a correction to someone else's assertion that John has arrived; c. might be said if John was expected to be among the people who have arrived, but is not in fact among them.
These three sentences show that sentence stress can vary in position. In contrast, the stress in the word arrived is always placed on the second syllable, never the first. In the sentence domain, monosyllabic words can be stressed and multisyllabic words can be unstressed, depending on their function in the sentence. Conventionally, content words such as nouns and verbs are stressed, while grammatical (function) words such as articles, auxiliary verbs and prepositions are unstressed. In this dissertation, the term lexical stress is used when referring to stressed syllables in order to avoid terminological confusion, since some researchers (e.g., Chen et al., 2001) use the term word stress to refer to sentence stress.
Note that the terms prominence, stress and accent were used interchangeably in the literature prior to approximately 1990. Cruttenden (1986: 16) attempted to sort out these unsystematic differences by suggesting that stress, in either words or sentences, often refers to breath-force or loudness, implying that syllables are made prominent through loudness, although loudness in fact plays a minor role in producing prominence. In contrast, accent commonly implies that such prominence is principally associated with pitch, hence the common term pitch accent. However, in intonational phonology, the term accent is used at the phrase and utterance levels (Gussenhoven, 2004; Ladd, 2008; Pierrehumbert & Hirschberg, 1990) to refer to the most prominent stressed syllable of an accentual phrase or of phrases in a sentence.
Lexical Stress
Lexical stress is important for word recognition and is encoded in the phonological representation of a lexical item for native speakers of stress-based languages such as English, Spanish and Dutch (Cutler & Van Donselaar, 2001). Children as young as nine
months have been shown to perceive stress in English (Jusczyk et al., 1993). English speakers use lexical stress to distinguish a considerable set of stress minimal pairs, for example, PROduce (noun) vs. proDUCE (verb) (Kingdon, 1958: 45-50). This is the contrastive function of stress in English. Importantly, lexical stress is a fundamental element of the English prosodic hierarchy: it provides the docking site for the pitch accents superimposed by the intonation system (Howard & Smith, 2002; Ladd, 2008; Wennerstrom, 2001).
Lexical Stress Pattern
Lexical stress positions in English, either primary or secondary, are a major challenge for adult learners of English. The rules for placing primary stress on words depend on several factors, such as the type of word (simple word or root, word with prefixes and suffixes, compound) and syllable weight (weak/light, strong). See Chomsky & Halle (1968), Fudge (1984) and Halle (1998) for analyses of stress patterns. The general rules for placing primary stress on simple words are: a) stress falls on the penultimate (second-from-last) syllable in disyllabic words, e.g., Ozone, SINGle; b) stress falls on either the penultimate or the antepenultimate (third-from-last) syllable in trisyllabic or longer words, e.g., toMAto, Antelope (Fudge, 1984: 17). The history of English borrowings explains the overlapping stress rules (Wennerstrom, 2001; Gussenhoven, 2004). Words of Germanic origin, the most frequently used words in English, follow the rule: stress the first syllable of the word's root, ignoring the affixes. This gives the trochaic, strong-weak structure of words such as MOther and KINDness. Words of Latin origin have more complicated stress rules; stress depends on the number of syllables in the word, the part of speech (noun, verb, adjective or adverb), the moraic structure of the rhyme node (long or short vowels and
whether a syllable ends in consonants) and the type of affixes attached to the root. Words of French origin, adopted into English more recently, keep their original stress position, final-syllable stress (Wennerstrom, 2001: 48). Although English stress patterns appear very complicated, several facts indicate that English stress rules can be internalized. For example, English-speaking children learn the rules quite well (Jarmulowicz, 2002), which shows that stress placement is not simply a matter of memorizing stress patterns. Moreover, native speakers of English show a good degree of agreement when placing stress on nonsense multisyllabic words like JABberwocky or supercalifragilisticexpialiDOcious, which show alternating stress patterns (Wennerstrom, 2001: 48).
Acoustic Correlates of Stress
From an articulatory viewpoint, a stressed syllable is pronounced with a greater amount of energy than an unstressed syllable and is more prominent in the flow of speech. This usually involves pushing out more air from the lungs by contracting the muscles of the rib cage, and perhaps increasing the pitch by using the laryngeal muscles. The extra activity may result in the sound having greater length (Ladefoged, 2001: 231-232). The physical (acoustic) dimensions of stress include duration, intensity, fundamental frequency and formant structure (Fry, 1955 & 1958; Lehiste, 1958; Adams & Munro, 1978; Beckman & Edwards, 1994; Sluijter, van Heuven & Pacilly, 1997). The four corresponding psychological dimensions perceived by a listener are length, loudness, pitch and vowel quality. The first three dimensions have been claimed to be the primary acoustic cues to a stressed syllable, whereas secondary cues, which are more related to segmental aspects, include noncentralized vowels, glottal stops and aspirated plosives (Couper-Kuhlen, 1986).
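Acoustic studies of L2 stress reviewed in Chapter 2 quantified these correlates by comparing the stressed and unstressed syllables of a word, for example as stressed-to-unstressed ratios of F0, intensity, vowel duration and syllable duration in one of the studies of Vietnamese learners. The following is a minimal sketch of such a ratio measure, assuming each syllable has already been measured (e.g., in Praat). The class `SyllableMeasures`, the function `stress_ratios` and the example values are hypothetical and are not data from any study.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SyllableMeasures:
    """Hypothetical per-syllable measurements of four stress correlates."""
    f0_hz: float         # mean F0 of the vowel (Hz)
    intensity_db: float  # mean intensity of the vowel (dB)
    vowel_ms: float      # vowel duration (ms)
    syllable_ms: float   # syllable duration (ms)


def stress_ratios(stressed: SyllableMeasures,
                  unstressed: SyllableMeasures) -> dict:
    """Return the stressed-to-unstressed ratio for each parameter.

    A ratio above 1 means the stressed syllable has the larger value.
    For intensity, a dB difference (stressed minus unstressed) is a
    common alternative to a ratio of dB values.
    """
    ratios = {}
    for f in fields(SyllableMeasures):
        s = getattr(stressed, f.name)
        u = getattr(unstressed, f.name)
        if u <= 0:
            raise ValueError(f"unstressed {f.name} must be positive, got {u}")
        ratios[f.name] = s / u
    return ratios


# Made-up values for one disyllabic token (illustration only, not real data).
stressed = SyllableMeasures(f0_hz=220.0, intensity_db=75.0,
                            vowel_ms=180.0, syllable_ms=300.0)
unstressed = SyllableMeasures(f0_hz=200.0, intensity_db=70.0,
                              vowel_ms=90.0, syllable_ms=200.0)
for name, value in stress_ratios(stressed, unstressed).items():
    print(f"{name}: {value:.2f}")
```

Because ratios are computed within a word, they partly normalize for speaker differences in overall pitch range, loudness and speech rate, which is one reason such measures are convenient for comparing native and nonnative groups.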
Other interfering factors in the measurement of the suprasegmental aspects of stressed vs. unstressed vowels or syllables are vowel identity, consonantal context, position in the phrase or sentence, and speech rate. For instance, high front vowels are intrinsically higher in pitch, but less loud and shorter, than low back vowels; preceding voiceless consonants cause a sudden rise in F0 in the following vowel; intensity and F0 decrease over a phrase or sentence, while vowels at the ends of phrases tend to be lengthened; and in rapid speech, unstressed vowels are reduced in length more than stressed vowels, and F0 tends to be higher (Couper-Kuhlen, 1986: 20-21). Not all of these physical properties need to be present for a syllable to be perceived as stressed; a native speaker can identify a stressed syllable correctly by relying on only some of the cues (see the review by Cutler, 2005).
Over nearly sixty years, researchers have investigated the relative weight of stress perception cues. There seems to be agreement that duration is the major perceptual cue to stress, while the status of other cues is more controversial. Fry (1955 & 1958) studied word stress in English noun/verb minimal pairs such as SUBject vs. subJECT to test three dimensions of stress: duration, intensity and F0. In these studies, Fry synthesized speech stimuli in which the physical dimensions could be varied independently and systematically. For example, duration and intensity were varied independently in 5 steps each, and F0 was varied in 16 steps. Listeners' judgments showed a greater effect of the durational manipulation than of the intensity manipulation when these two dimensions were investigated. Change in F0 showed an all-or-none effect; that is, whether a listener perceives an F0 change at all matters more than the magnitude of the change. In addition, syllables
with a higher F0 peak or clear F0 movement were always judged as stressed. In a subsequent study, Fry (1965) investigated the effect of the vowel reduction cue in unstressed syllables in comparison with duration and intensity manipulation. Step changes were applied to the vowel formant ratios; e.g., for the word object, the vowel was changed from a full vowel / to reduced vowels / in the first syllable, or from / to  in the second syllable. The results showed that duration and intensity were more important cues for stress judgment than this type of vowel reduction cue.
A number of researchers have argued that F0 is not a cue to stress but a cue to accent (Bolinger, 1958; Sluijter & van Heuven, 1996; Sluijter, van Heuven & Pacilly, 1997; Beckman & Edwards, 1994). Bolinger (1958) pointed out that the salient turn of pitch reported in Fry (1958) was the realization of a pitch accent on the stressed syllable of the word. (See the discussion of English accent and intonation systems below.) Sluijter and colleagues (1996, 1997) argued that at the sentence level, when words are spoken outside focus (i.e., without a pitch accent on the stressed syllable), the F0 cue is hardly visible on the stressed syllable of deaccented words; thus, the position of the stress has to be inferred from the remaining cues, such as duration and intensity.
Intensity, claimed to be a less important stress cue (Fry, 1955), has also been extensively reinvestigated. Beckman (1986) proposed that a measure of total intensity (amplitude) across a syllable could reveal its effects on stress judgment. However, Sluijter and colleagues (1996 & 1997) argued that such a measure is confounded with syllable duration and proposed an alternative measure, spectral balance. Spectral balance is the intensity level in the higher frequencies; stressed syllables have more energy in the higher frequency regions of the spectrum than unstressed vowels do. In
the experiment using the disyllabic Dutch nonsense word nana, spectral balance was found to have a greater effect when the speech was presented via loudspeakers than via headphones (Sluijter et al., 1997). Note that spectral balance can be compared only between similar vowels in different stress positions. Campbell and Beckman (1997), however, failed to replicate this effect in English; instead, they found spectral differences as a function of focal accent, but not on unaccented stressed syllables. Okobi (2006) reported that spectral tilt (the amplitude difference between the first harmonic, H1, and the most prominent harmonic in the third formant region, A3, i.e., H1*-A3*), F0 and intensity prominence predict accented syllables well.

In contrast to F0, intensity and the other cues discussed, duration alone appears to be a sufficient cue for perceived stress. Turk and Sawusch (1996) conducted a series of experiments in which the speech stimuli were manipulated so that length and intensity were not confounded with each other. They found that length and loudness are perceived integrally and that length is much easier to extract than loudness. In fact, listeners can use length alone, but not loudness alone, when making stress judgments.

Accent

In the mental lexicon, every multisyllabic English word carries stress; however, the stressed syllable is not necessarily emphasized or accented (Bolinger, 1986: 15). In other words, accented syllables are stressed syllables made prominent at the sentence or discourse level (Chun, 2002: 148). In addition, the accents within each accentual phrase are the building blocks of intonation contours (Chun, 2002; Ladd, 2008). Pitch, length, loudness, rhythm, and vowel quality are manifestations of accent (Bolinger, 1986: 19). Many researchers agree that F0 is the most important acoustic
correlate of English accent, followed by duration, with intensity the least significant (Beckman, 1986; Chun, 2002; Wennerstrom, 2001). Accent is often associated with pitch movement on the accented syllable, or with an anchor point for the alignment between the F0 contour and spectral patterns (Beckman & Edwards, 1994: 12). This description of accent captures the fact that either an F0 peak or an F0 valley (a sudden fall in F0) can be observed on a stressed syllable; that is, a stressed syllable can have either a higher or a lower average F0 than an unstressed syllable.

Intonation

Intonation refers to the rise and fall of pitch when native speakers of English speak (Armstrong, 1931). Intonation can be described as the melody of the language because it is mainly characterized by changes in F0. Phonetic parameters of intonation include pitch, loudness, length and pause (Cruttenden, 1986). The intonation contour of English can be altered by discourse roles or meanings that apply to phrases or utterances as a whole, such as the type of speech act, or focus and information structure (Couper-Kuhlen, 1986; Ladd, 1996; Wennerstrom, 2001). Intonation has also been reported to have sociolinguistic functions, such as identifying individuals as members of social groups, age groups, and socioregional and occupational groups (Chun, 2002). It is currently most commonly represented in the autosegmental-metrical framework (Pierrehumbert, 1980; Pierrehumbert & Beckman, 1988), which treats intonation contours as sequences of discrete units (pitch accents and boundary tones) defined in terms of the top and bottom of the speaker's current pitch range. Pitch accents are associated with prominent or accented syllables, while boundary tones are assigned to the edges of phrases.
In summary, English prosodic systems are organized hierarchically, from the syllable level up to the utterance level. Figure 3-2 shows the English prosodic hierarchy with a sample English sentence. At the syllable level, there are strong (S) and weak (W) syllables. At the word level, stress is assigned to the first, strong syllable of the word sister. At the phrase level, there is a pitch accent on the stressed syllable /sɪs/ in the phrase my sister. To express a proposition or a complete thought unit, other phrases are needed to accompany the accented phrase my sister. These accentual phrases are then organized into a larger unit and strung together by an intonation contour (Levelt, 1989).

Thai

Thai is the official language of Thailand and is also spoken in Laos. In Thailand, it is the medium of education and of most mass communication. In 2000, about 20 million people worldwide spoke Thai as their mother tongue, and as of 2001, 40 million spoke Thai as a second language (Gordon, 2005). While Thai belongs to the Tai language family, its use of aspirated consonants differentiates it from the Tai languages spoken in northern Vietnam, Burma, Assam (a state of India), and southwestern China (Li, 1992). Thai is an SVO language with no definite or indefinite articles and no verb conjugation for tense, number, or gender. A general description of Thai consonants and vowels is presented in the following sections.

Thai Segmental System

Thai consonants and vowels are presented in Tables 3-3 and 3-4, respectively. Thai has 21 consonants. Aspiration is a distinctive feature of Thai consonants. In the aspiration of /pʰ, tʰ, kʰ/, the fairly long lag between the release of the stop and the onset of voicing is filled with turbulence, i.e., noise excitation of the relatively unimpeded
supraglottal vocal tract, while during the noise of the affricate /tɕʰ/, the voicing lag excites a narrow postalveolar constriction, thus giving rise to local turbulence (Tingsabadh & Abramson, 1999: 149). The glottal stop /ʔ/ has phonemic status before both short and long vowels. In final position, /ʔ/ has epenthetic status; it can occur only after a short vowel in a stressed syllable. A syllable ending in the epenthetic glottal stop, for example, /sàʔ/ 'to shampoo', is subject to the same tonal constraints as a syllable with a short vowel and a final (phonemic) stop, for example, /sàk/ 'only' (Abramson, 1962, cited in Kallayanamit, 2004; Peyasantiwong, 1986).

The Thai vowel system contains nine monophthongs and three diphthongs. The diphthongs are formed by combining a high vowel with the low vowel /a/. Length is contrastive for all twelve vowels, e.g., /dàp/ 'to extinguish' vs. /dàːp/ 'sword'; /sòt/ 'fresh' vs. /sòːt/ 'unmarried'. The physical correlate of the phonemic vowel length distinction in Thai is relative duration. Abramson (2001) reported that the ratio of long to short vowel duration ranges from 1.5 to 2.2, depending on speech style: the ratio is small at a fast speaking rate, such as when reading carrier sentences, and larger at a slow speaking rate, such as when reading conversational excerpts.

Syllabic Structure

Thai syllable structure can be reduced to C(C)V(:)T(C), where T is a tone (Wutiwiwatchai & Furui, 2007). The initial consonant can be either a single consonant (e.g., p, ph, t, th, c, ch, k, kh, b, d, m, n) or a cluster (e.g., pr, pl, tr, kr, kl, kw, phr, phl, thr, khr, khl, khw), whereas V can be either a single vowel or a diphthong. The final consonant can only be a single consonant from the following
set: voiceless unaspirated stops, nasals, and approximants. All final consonants /p, t, k, m, n, ŋ, w, j/ in Thai are phonetically unreleased (Wayland, 1997).

Tone

Thai has five tones: mid (unmarked), low (à), falling (â), high (á), and rising (ǎ) (Luksaneeyanawin, 1998; Tingsabadh & Abramson, 1999):

mid       /kʰaː/   'to get stuck'
high      /kʰáː/   'to trade'
low       /kʰàː/   'galangal'
rising    /kʰǎː/   'leg'
falling   /kʰâː/   'to kill' or 'I' (first-person pronoun)

Although tone is an inherent property of every syllable, not every syllable can bear all five tones. Thai tones are subject to syllable-structure constraints (Moren & Zsiga, 2006). Only open syllables with a long vowel (CVV) and syllables closed by a sonorant consonant (CVS and CVVS) can bear all five tones. CVO syllables (where O = obstruent) can bear only high and low tones, and on CVVO syllables, only falling and low tones are allowed. These restrictions are illustrated below:

Tone      CVO                  CVVO
mid       -                    -
low       /làk/ 'stake'        /làːk/ 'various'
high      /lák/ 'to steal'     -
falling   -                    /lâːk/ 'to tow'
rising    -                    -

According to Gandour (1978), the primary acoustic correlate of tone is fundamental frequency (F0). F0 contour and height contribute to the shape of each
tone in Thai. Mid, low, and high tones are static, while falling and rising tones are dynamic. The tone names do not exactly match the F0 patterns; thus, to avoid possible confusion, both the conventional names and, in parentheses, the names proposed by the UCLA Phonetics Lab (Ladefoged, 2010) are provided. Figure 3-3 presents the F0 contours of the Thai tone system. The static group is characterized by relatively smooth F0 movement: the mid (mid-falling) and low (low-falling) tones actually fall continuously within a narrow F0 range, while the high tone (high-rising) rises continuously within a narrow F0 range. The dynamic group has sharp F0 contours that move over a wide F0 range: the falling tone (high-falling) begins high and falls continuously, and the rising tone (low-falling-rising) begins low and rises continuously (Luksaneeyanawin, 1983, 1998; Satravaha, 2002). In addition, voice quality has been reported as an acoustic correlate of tone. Wayland (1997) identified the falling and high tones as glottalized, whereas the mid, low, and rising tones were non-glottalized.

Stress

The status of stress in Thai differs from that of English lexical stress; it operates at a phonetic level (Luksaneeyanawin, 1993, 1998) or serves primarily contrastive and/or emphatic purposes (Saengsuriya, 1989). Past research on Thai stress has revealed that the syllable in word-final position is the most prominent, i.e., carries strong stress (reviewed by Peyasantiwong, 1986). Luksaneeyanawin (1983, 1998) further analyzes Thai as a fixed-accent language. In her accentual system for monosyllabic Thai words, all content words are accented and all grammatical words are unaccented. In polysyllabic words, the primary accent always falls on the last syllable of the word, and secondary accent placement is rule-governed. Luksaneeyanawin (1983: 75) used the
term accent to refer to the potential of the syllable or syllables in a word to be realized with stress, either when the word occurs by itself in an utterance or together with other words in an utterance.

Most monosyllabic Thai words are of native origin, and stress realization under the Thai accentual system is relatively simple in monosyllabic words, as delineated above. However, there has been massive borrowing from other languages, such as Khmer, Pali, Sanskrit and English, and these borrowed words are multisyllabic morphemes. The following examples are drawn from Bennett (1995: 89):

kha.jaʔ             'rubbish' (Khmer)
na.li.nii           'lotus pond' (Sanskrit)
tam.ruat            'police' (Khmer)
khɔm.phiw.tɤː       'computer' (English)
phaa.saa            'language' (Sanskrit)
at.sa.waa.n k       'cavalry' (Sanskrit)

Stress realization rules in multisyllabic words are therefore more complicated because of secondary stress. Luksaneeyanawin (1983) proposed an accentual system for polysyllabic words in Thai. She divided syllables into two types, linker syllables (L) and non-linker syllables (O). Linker syllables are syllables that have a phonemic /a/ and end with an epenthetic glottal stop; all other syllables are non-linker. The marker (ˈ) indicates stress position; the last syllable of the word is stressed. The following examples illustrate Thai primary stress placement in disyllables:

LˈL     ra.jaʔ       'distance'
ˈOˈO    tam.naːn     'a legend'
LˈO     cha.bap      'a copy'
ˈOˈL    waa.raʔ      'an occasion'

Secondary stress placement is subject to the following rules.

Bisyllabic words: LL, LO, OO, OL
Trisyllabic words (ˈ = primary accent, ˌ = secondary accent): (ˌLL)ˈL, (ˌLL)ˈO, (LˌO)ˈO, (LˌO)ˈL, (ˌOO)ˈO, (ˌOO)ˈL, (ˌOL)ˈL, (ˌOL)ˈO

Tetrasyllabic words: (LO)OO, (LO)OL, (LO)LO, (LO)LL, (OL)OO, (OL)OL, (OL)LO, (OL)LL, (LL)LL, (LL)LO, (LL)OO, (LL)OL, (OO)OO, (OO)OL, (OO)LL, (OO)LO

For trisyllabic and tetrasyllabic words, primary accent is assigned to the final syllable. If the two remaining syllables (the ones in parentheses) are of the same type, secondary accent is assigned to the syllable farthest from the primary accent. If only one of the remaining syllables is a non-linker syllable, secondary accent is assigned to it. Note that Luksaneeyanawin collected her data by asking Thai participants to read her word lists aloud in isolation.

Another analysis of Thai stress assignment in polysyllabic words was carried out in a metrical-foot framework. Bennett (1995) proposed that the prosodic structure of Thai is right-headed, with only two levels of prominence, stressed and unstressed. Light syllables cannot stand at the right edge of a prosodic word. Thai permits light syllables to be included in polysyllabic words without being parsed into foot structure, and it allows prosodic words to include more than one foot. The following is an example:

Prosodic word layer:   ( x )
Foot layer:            ( x ) ( x )
Syllable layer:        x x x x x
                       ma.nut.sa.ja.chon
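The trisyllabic part of Luksaneeyanawin's (1983) rule is explicit enough to state procedurally. The following is a minimal Python sketch, assuming that each syllable has already been labeled L (linker) or O (non-linker); the function names are hypothetical, and tetrasyllabic words are deliberately not handled.

```python
from itertools import product

# Labels as in the text: "L" = linker syllable (phonemic /a/ closed by an
# epenthetic glottal stop), "O" = any other (non-linker) syllable.
PRIMARY, SECONDARY, NONE = "primary", "secondary", "none"


def accent_trisyllable(pattern: str) -> list:
    """Accent levels for a three-syllable L/O pattern (hypothetical helper).

    Primary accent goes on the final syllable. If the two remaining
    syllables are of the same type, secondary accent goes to the one
    farthest from the primary accent (the first); otherwise it goes to
    the non-linker (O) syllable.
    """
    if len(pattern) != 3 or not set(pattern) <= {"L", "O"}:
        raise ValueError("expected exactly three syllables labeled 'L' or 'O'")
    levels = [NONE, NONE, PRIMARY]
    first, second = pattern[0], pattern[1]
    if first == second or first == "O":
        levels[0] = SECONDARY  # same type, or the first syllable is the O
    else:
        levels[1] = SECONDARY  # pattern L O _: the O syllable gets it
    return levels


def notate(pattern: str) -> str:
    """Render a pattern with stress marks, e.g. 'LLL' -> '(ˌLL)ˈL'."""
    marks = {PRIMARY: "ˈ", SECONDARY: "ˌ", NONE: ""}
    syls = [marks[lv] + syl for syl, lv in zip(pattern, accent_trisyllable(pattern))]
    return "(" + syls[0] + syls[1] + ")" + syls[2]


if __name__ == "__main__":
    # Enumerate all eight trisyllabic L/O patterns.
    for combo in product("LO", repeat=3):
        print(notate("".join(combo)))
```

Running the script enumerates the eight trisyllabic patterns listed above, which makes it easy to check that the verbal rule and the pattern list agree.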
Phonetic stress as the realization of the accentual system proposed by Luksaneeyanawin (1983) seems to be a good analysis of the fixed stress position in Thai. Cutler (1986) suggested that if stress can occur in only one position for a given string of phonetic segments, it cannot be used for lexical purposes; rather, it serves word boundary identification.

Stress affects the realization of vowel length in Thai. A long accented syllable is always realized as a distinctively long stressed syllable, whereas a long unaccented syllable is always realized as a relatively short unstressed syllable (Luksaneeyanawin, 1998: 377). Thai has no vowel equivalent to the English mid central unstressed vowel, schwa, so lack of stress does not affect vowel quality (Ariyapitipun, 1988; Panlay, 1997); however, Thai unstressed vowels have reduced length (Luangthongkum, 1977). Peyasantiwong (1986) presented phonological rules, such as vowel shortening, glottal stop deletion, and tone alteration, that distinguish weakly stressed from normally stressed syllables. In disyllabic words (compounds, loanwords, and reduplications), various types of vowel shortening occur in unstressed syllables: (1) long vowels are reduced to short vowels (Vː → V), (2) original short vowels are reduced to about half of their length, and (3) diphthongs are also reduced in duration. Other rules affecting unstressed syllables include glottal stop deletion in CV syllables that are not word-final (example a.) and tone neutralization (example b.), in which high and low tones become mid.

a. thaʔ.lee → tha.lee 'sea'
b. pràʔ.thêet → pra.thêet 'country'

In multisyllabic words, the final glottal stop does not occur in linker (CV) syllables, such as examples c. and d.
c. sǒn.thaʔ.yaa → sǒn.tha.yaa 'twilight'
d. thaʔ.naa.khaan → tha.naa.khaan 'bank'

Recent studies on the acoustic correlates of stress production and perception have confirmed that duration is the dominant cue to Thai stress. In a stress production test by Potisuk (1996), five Thai participants read aloud sentences containing 25 stress minimal pairs. These are ambiguous strings: either a noun-verb sequence with a strong-strong stress pattern or a compound with a weak-strong stress pattern. Acoustic analysis revealed that rhyme duration is the predominant cue to Thai stress; F0 played a subservient role (it was not significant in a stepwise regression analysis) and intensity played no role. The average height of the F0 contours did not change as a function of stress, but the standard deviations of the tonal contours were significantly larger in stressed than in unstressed syllables for all tones. The researchers explained that F0 was not an important cue to stress in Thai because F0 is reserved for tone realization. Thubthong, Kijsirikul and Luksaneeyanawin (2001) likewise found that duration was the most important acoustic correlate of Thai stress. A study of Thai stress perception by Nitisaroj (2004) reported that duration was the primary cue, followed by vowel quality/bias and amplitude; F0 did not appear to play any role in signaling stress. In her stress judgment task, participants identified the stress location in digitally manipulated stimuli built from stress minimal pairs (compounds vs. phrases).

Interaction between Thai Prosodic Units

Larger prosodic units in Thai, such as phrasal stress (Potisuk, 1996) and intonation (Luksaneeyanawin, 1983, 1998; Kallayanamit, 2004), do not contaminate the phonological system and phonetic features of tone. The five-way tonal contrast in
unstressed syllables remains the same at a fast speaking rate (Gandour et al., 1999).

Conclusion

English and Thai differ greatly in their segmental systems, phonotactics and suprasegmental systems. While the two languages employ similar acoustic cues to prosody, such as F0, intensity and duration, these cues are used at different linguistic levels. Thai uses F0 phonemically in its tone system, while English uses F0 at the post-lexical level in its accentual and intonation systems. Duration is used phonemically in the Thai vowel length system; it is also used at the post-lexical level as a phrasal stress cue in Thai. In English, on the other hand, duration is only one of many acoustic cues that signal lexical stress; various cues, including F0, duration, intensity and vowel quality, have been reported. It is difficult to say which acoustic property is the most salient cue to English lexical stress, because English stress does not stand alone but is superimposed on the accent and intonation systems. In addition, the predominant stress pattern in English is word-initial (especially in disyllables), while Thai primary stress is word-final. Given these differences, it is worth examining whether Thai stress patterns transfer when Thai speakers produce and perceive English lexical stress.
Figure 3-1. Functions and physical parameters of the world's prosodic systems [Reprinted with permission from Hirst & Di Cristo (1998). Intonation systems: A survey of twenty languages. (Page 5). Cambridge, UK: Cambridge University Press.]

Table 3-1. American English consonants (by manner; place in parentheses)
Plosive: p b (bilabial), t d (alveolar), k g (velar)
Affricate: tʃ dʒ (post-alveolar)
Nasal: m (bilabial), n (alveolar), ŋ (velar)
Fricative: f v (labiodental), θ ð (dental), s z (alveolar), ʃ ʒ (post-alveolar), h (glottal)
Approximant: ɹ, j (palatal), w
Lateral approximant: l (alveolar)
Note: [Reprinted with permission from Ladefoged, P. (1999). American English. In Handbook of the International Phonetic Association: A guide to the use of the International Phonetic Alphabet (Page 41). Cambridge, United Kingdom: Cambridge University Press.]
Table 3-2. American English vowels (columns: front, central, back; rows: high, mid-high, mid, mid-low, low)
High: i (front), u (back)
Mid-high: e (eɪ) (front), o (oʊ) (back)
Low: a
Note: American English diphthongs are /aɪ aʊ ɔɪ/. [Adapted from Ladefoged, P. (2001). A course in phonetics (Pages 29 & 36). Boston, MA: Heinle & Heinle.]

Figure 3-2. English prosodic hierarchy [Reprinted with permission from Howard, D., & Smith, K. (2002). The effects of lexical stress in aphasia word production. Aphasiology, 16(1/2), 198-237. (Page 202).]
Table 3-3. Thai consonants (by manner; place in parentheses)
Plosive: p pʰ b (bilabial), t tʰ d (alveolar), k kʰ (velar), ʔ (glottal)
Affricate: tɕ tɕʰ (post-alveolar)
Nasal: m (bilabial), n (alveolar), ŋ (velar)
Fricative: f (labiodental), s (alveolar), h (glottal)
Trill: r (alveolar)
Approximant: j (palatal), w (labial-velar)
Lateral approximant: l (alveolar)
Note: [Reprinted with permission from Tingsabadh, K., & Abramson, A. S. (1999). Thai. In Handbook of the International Phonetic Association: A guide to the use of the International Phonetic Alphabet (pp. 147-150). (Page 148). New York: Cambridge University Press.]

Table 3-4. Thai vowels
High: /i, iː/ (front), /ɯ, ɯː/ (central), /u, uː/ (back)
Mid: /e, eː/ (front), /ɤ, ɤː/ (central), /o, oː/ (back)
Low: /ɛ, ɛː/ (front), /a, aː/ (central), /ɔ, ɔː/ (back)
Note: Thai diphthongs include /ia, iːa, ɯa, ɯːa, ua, uːa/. [Adapted from Ahkuputra, V., Maneenoi, E., Luksaneeyanawin, S., & Jitapunkul, S. (2003). Acoustic Modelling of Vowel Articulation on the Nine Thai Spreading Vowels. (Page 174). International Journal of Computer Processing of Oriental Languages, 16(3), 171-195.]
Figure 3-3. F0 contours of the five Thai tones in isolated monosyllabic words. [Adapted from Luksaneeyanawin, S. (1998). Intonation in Thai. In D. Hirst & A. Di Cristo (Eds.), Intonation systems: A survey of twenty languages (pp. 345-359). (Page 377). New York: Cambridge University Press.]
CHAPTER 4
THE CURRENT STUDY

In this dissertation, the perception and production of English lexical stress by NT were investigated. In this study, speech production refers to the acoustic realization of L2 lexical stress, and speech perception refers to the ability to identify stress placement in disyllabic words.

Motivation for the Study

The majority of previous work on L2 lexical stress acquisition has focused on the stress placement strategies used by native and nonnative speakers (Altmann, 2006; Guion et al., 2004; Ling, 1999; Wayland, 2006). While a few studies (Adams & Munro, 1978; Chen et al., 2001; Zhang et al., 2008) have focused on the acoustic correlates of L2 stress production by speakers from non-stress language backgrounds and have suggested L1 influence on L2 stress production, none of them directly evaluated L1 prosodic transfer. To my knowledge, no previous study has specifically examined the relation between L2 lexical stress perception and production, especially with a focus on the acoustic properties of stress. In addition, there is scarce evidence regarding the acquisition of L2 stress by speakers of tonal L1s; so far only Chinese and Vietnamese populations have been examined (Chen et al., 2001; […], 2008; Zhang et al., 2008; Zhang & Francis, 2010). Thai remains unresearched in this area. It has a rich prosodic system that uses F0 and duration contrastively in its tone and phonemic vowel length systems. Further, Thai phonology differs vastly from that of American English, especially at the suprasegmental level.
To narrow the gap in L2 stress acquisition research and to answer new research questions, the current study focuses on how tone assignment rules, stress patterns and acoustic correlates are transferred from L1 to L2.

Research Questions and Hypotheses

The major research questions and hypotheses presented in Chapter 1 are repeated here.

Do Thai tone assignment rules based on syllable structure affect NT production and perception of English lexical stress?
Does the primary stress pattern of Thai influence NT production and perception of English lexical stress?
What is the relation between L2 lexical stress production and perception?

As explained in Chapter 1, all five Thai tones can be assigned to CVV, CVS and CVVS syllables, but only the low and falling tones can occur on CVVO syllables. It was therefore predicted that NT would have the most difficulty producing stress on CVVO syllables. CVO syllables were predicted to be the second most difficult for stress production because of their limited F0 contour patterns (only high and low tones). However, since the high tone can appear on CVO syllables, NT should not find it too difficult to produce stress on this syllable type. Secondly, it was predicted that NT would produce final stress more accurately than initial stress because of transfer from their L1 fixed final stress pattern. Furthermore, since Thai is a tone language with a phonemic vowel length distinction, it was predicted that NT would not have difficulty perceiving English lexical stress, because they could rely on pitch and vowel duration as the main perceptual cues. The combined results from the production and perception tasks were expected to yield insight into the relation between L2 stress production and perception.
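The production predictions rest on the tone-by-syllable-type constraints described in the section on Thai tone (Moren & Zsiga, 2006). As a minimal Python sketch, those constraints can be encoded as a lookup table. The function names and the three-way coda labels (open, sonorant, obstruent) are illustrative assumptions, not part of the study's materials; treating a short open syllable as CVO follows the statement that a syllable ending in the epenthetic glottal stop patterns tonally like a short vowel plus final stop.

```python
from typing import Optional

# Tones licensed by each Thai syllable type, as summarized in the text.
# V = short vowel, VV = long vowel, S = sonorant coda, O = obstruent coda.
ALL_TONES = frozenset({"mid", "low", "falling", "high", "rising"})

ALLOWED_TONES = {
    "CVV": ALL_TONES,
    "CVS": ALL_TONES,
    "CVVS": ALL_TONES,
    "CVO": frozenset({"high", "low"}),
    "CVVO": frozenset({"falling", "low"}),
}


def syllable_type(long_vowel: bool, coda: Optional[str]) -> str:
    """Classify a syllable (hypothetical helper).

    coda is None for an open syllable, "sonorant" (m, n, ŋ, w, j) or
    "obstruent" (p, t, k). A short open syllable surfaces with an
    epenthetic glottal stop, so it is treated as CVO.
    """
    vowel = "VV" if long_vowel else "V"
    if coda is None:
        return "CVV" if long_vowel else "CVO"
    if coda == "sonorant":
        return "C" + vowel + "S"
    if coda == "obstruent":
        return "C" + vowel + "O"
    raise ValueError("coda must be None, 'sonorant' or 'obstruent'")


def tone_is_licensed(structure: str, tone: str) -> bool:
    """True if the tone can occur on a syllable of this type."""
    return tone in ALLOWED_TONES[structure]


def missing_tones(structure: str) -> list:
    """Tones that are unavailable on this syllable type, sorted by name."""
    return sorted(ALL_TONES - ALLOWED_TONES[structure])


if __name__ == "__main__":
    # /lák/ 'to steal' (CVO, high) and /lâːk/ 'to tow' (CVVO, falling)
    print(tone_is_licensed(syllable_type(False, "obstruent"), "high"))
    print(tone_is_licensed(syllable_type(True, "obstruent"), "falling"))
    print(missing_tones("CVVO"))
    print(missing_tones("CVO"))
```

The sketch only records which tones are available on each syllable type. It does not model the further reasoning behind the prediction that the availability of the high tone makes CVO syllables less problematic than CVVO syllables.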
Overview of the Study

The main research questions were investigated in two experiments. Experiment 1, Stress Production, examined whether NT and AE could produce the assigned lexical stress on English nonwords that conformed to Thai syllable structures. Their stress production was evaluated by a panel of 42 American English listener judges and then submitted to a detailed acoustic analysis. Experiment 2, Stress Perception, examined whether NT could identify the stress location of disyllabic nonwords as accurately as AE. Accuracy scores and reaction times were collected.

Detailed Research Questions

Detailed research questions and predictions for each experiment are presented in this section. The last set of questions relates to findings from both experiments.

Experiment 1: Speech Production

Production score analysis

What is the overall stress production ability of the two language groups? Prediction: AE will show better production than NT, as reflected in higher group stress production scores.
Is there a stress position effect? Prediction: NT will show a final stress preference. AE will show an initial stress preference.
Which syllable structures are the most difficult for each language group to produce the intended stress on? Prediction: CVO and CVVO syllables will be the most difficult for NT to stress. No specific prediction for AE.
Is there an interaction between stress position, structure of the stressed syllable, and language group? Prediction: NT will experience more difficulty when producing initial stress on CVO and CVVO syllables. No specific prediction for AE.
Acoustic analysis

Does stress position affect which acoustic parameters are used by each group? Prediction: NT will rely heavily on duration contrast to indicate final stress. No specific prediction for AE.
Does the structure of a stressed syllable constrain F0 realization in stress production? Prediction: NT will employ less F0 contrast when CVVO syllables are stressed than when syllables of other structures are stressed.
Which acoustic correlate(s) of lexical stress do the judges rely on when identifying the location of stress produced by the two language groups? Prediction: The judges will rely on multiple acoustic cues.

Experiment 2: Speech Perception

Accuracy analysis

What is the overall stress identification score of the two language groups? Prediction: Both groups will show a ceiling effect, as reported in Altmann (2006).
Is there a group difference? Prediction: No. According to the STM's prediction, NT should excel in a stress identification task, since Thai, like Chinese, is a tonal language.
Is there a stress position effect? Prediction: NT will identify final stress better than initial stress. AE will show the reverse pattern, as English prefers initial stress.
Is there a syllable structure effect? Prediction: NT will be less accurate at identifying stress on CVVO and CVO syllables than on syllables with other structures. No prediction for AE.
Is there an interaction between stress position, syllable structure and language group? Prediction: In initial stress position, NT will be less accurate at identifying stress on CVVO and CVO syllables than on syllables with other structures. No specific prediction for AE.
Reaction Time Analysis

Is there a group difference? Prediction: Overall, AE will have faster reaction times (RTs) than NT.
Which stress position will participants take longer to identify? Prediction: NT will identify final stress more quickly than initial stress. AE will identify initial stress more quickly than final stress.
Which syllable structures will lead to the longest RTs? Prediction: NT will need more time to identify stressed CVVO and CVO syllables than stressed syllables with other structures. No prediction for AE.
Is there an interaction between stress position, syllable structure and group? Prediction: In initial stress position, NT will need more time to identify stressed CVVO and CVO syllables than stressed syllables with other structures. No specific prediction for AE.

Acoustic Analysis

Which acoustic cue(s) to lexical stress do NT and AE rely on when identifying the location of stress produced by the phonetician? No specific prediction.
Does stress position affect which acoustic cues are used by each group? No specific prediction.

Experiments 1 & 2: Relation between Speech Perception and Production

What is the overall relation between speech perception and production? Prediction: NT who excel in the stress identification task will show similar performance in the stress production task.
What is the relation between perception and production for specific syllable structures? No prediction.
What is the overall relation between the acoustic parameters employed in speech perception and production?
Improvement in Methodology

The research methodology in this study improves upon previous studies in several ways. First, the present study includes a larger number of participants and native listener judges than previous studies. Fifteen participants from each language group took part in both experiments. Each token of lexical stress production was perceptually evaluated by a panel of 21 native listeners; 42 native listeners in total took part in the stress production evaluation task. The current study includes speech samples from 30 speakers, considerably more than many previous detailed acoustic analysis studies. For example, Adams and Munro (1978) analyzed the stress production of only eight learners of English from unidentified Asian language backgrounds. In Guion et al.'s (2004) English stress assignment preference tasks, only 10 Spanish speakers were included, and only two native listeners assessed the stress assignment production. Similarly, Wayland (2006) tested the production of only 10 Thai participants, with only two native listener judges. In Archibald's (1997) stress assignment task, only 10 speakers of Japanese and Chinese in total were included, and only one native listener (possibly the researcher himself) evaluated the stress production.

Second, the stress production elicitation was improved to avoid orthography effects, familiarity effects, and confusion about which syllable to stress. English spelling is opaque; no written symbol indicates whether a vowel is stressed. When participants read aloud from a list of English real words, the test of their ability to produce stress is confounded with all of the problems mentioned above. In this study, the stimuli were presented aurally with a visual stress location
prompt. The use of nonwords also removes the familiarity effect that might give native speakers an advantage over nonnative speakers.

Third, in the perception experiment, reaction times (RTs) were collected in addition to accuracy scores. At the time of data collection, none of the previous L2 stress perception studies had measured RTs. This information was collected to support further exploration of language processing, especially if Thai participants showed a ceiling effect in stress identification, as Chinese listeners did in Altmann (2006).

Fourth, I specifically avoided a task order effect by having participants complete the speech production task before the perception task; completing the stress perception task first could have affected participants' stress production on English nonwords. To ensure that all participants understood the concept of English lexical stress, a PowerPoint presentation on this topic and a few examples of English real words, including stress minimal pairs, were presented before the experiments began.

Finally, the paired-study design, examining both stress perception and production, allows investigation of the relation between speech perception and production. The same individuals completed both tasks, and the stimuli were drawn from the same nonword corpus, specifically designed to test the influence of Thai tone assignment conditioned by syllable structure.

Significance of the Study

This study makes a novel contribution to our understanding of L2 prosodic transfer, of the acoustic parameters of L2 lexical stress perception and production, and of the relation between speech perception and production, all of which are not well understood. The findings can be used to support or challenge current models of L2 stress perception and production and may shed light on other L2 speech models that
79 mostly rely on segmental evidence. Moreover, the results present direct pedagogical implications for Engli sh as a second language classrooms and accent reduction training. To date, no study has addressed the English stress identification performance or English word stress realization by NT This current study will present the findings needed to diagnose any possible problems. If results in one set of tasks are poorer than in the other, then we will know which area(s) to concentrate in future training study.
CHAPTER 5 METHODOLOGY
This study tested two groups of participants: NT and AE. Participants in each language group completed two tasks: production and perception of English lexical stress. The stimuli used in each task consisted of English nonsense words that conformed to Thai syllabic structures. In the speech production task, the participants were required to produce disyllabic English nonwords with a specified English lexical stress, either initial or final. Their speech production was recorded and first evaluated by a group of native speakers of American English, who identified the intended lexical stress position of the disyllabic stimulus words. The production data were then subjected to a detailed acoustic analysis. In the speech perception task, participants identified the stress location of disyllabic nonwords. Accuracy scores and reaction times were collected. The goal of this research is to investigate the transfer effects of an L1 (Thai) prosodic system on L2 (American English) prosodic speech perception and production, and the relation between speech perception and production.
Participants
Two groups of participants took part in this experiment: NT and AE, the latter serving as a control group. All reported normal hearing and speech and passed a bilateral hearing screening in the range of 750 to 8,000 Hz at 25 dB HL (using a DSP Pure Tone Audiometer). Fifteen NT (nine female and six male; mean age = 28; age range: 20-38; age mode = 28) took part in this experiment. One additional male was excluded due to a recording error. The NT all came to the United States as adults and spoke English as a second language. Their length of residence in the US ranged from 2 months to 4 years and 10 months (mean = 2.6 years; mode = 3 years). All participants began learning English in primary school, either in grade 1 or grade 5. Fourteen participants were affiliated with the University of Florida: two participants were undergraduate students from the business school; 11 were graduate students from various disciplines, including interior design, veterinary medicine, fisheries & aquatic sciences, nutrition, education, linguistics, chemistry, pharmaceutics, and materials science & engineering; and two were researchers in insect pathology and agronomy, respectively. One participant had just received his master's degree in furniture design from another US university but lived in Gainesville, Florida, at the time of recording. All were paid for their participation at the rate of $7.50 per hour.
Fifteen AE (eight female and seven male; mean age = 20; age range: 18-27; age mode = 21) who did not speak any tone language took part in this experiment. They were all undergraduate students studying various subjects at the University of Florida, including English literature (3), journalism (1), communication sciences and disorders (3), economics (1), linguistics (2), applied physiology and kinesiology (1), biology (1), history (1), and physics (1); one had an undeclared major. Of the 15 participants, two were paid for their time, while the other 13 participated for course credit.
Materials
The materials used in all experiments were drawn from a specifically designed English nonword corpus. Nonwords were selected for this research because they offer several advantages over English real words. First, familiarity effects, such as previously known stress patterns that would inflate perception and production scores, especially for the AE group, could be avoided. Second, the use of nonwords allows strict control of the syllabic structures and segments of test words.
To test the influence of Thai tonal constraints, five syllabic structures conforming to Thai phonotactics were chosen to create stimuli for the speech production and perception tasks. These syllabic structures were CVV, CVS, and CVVS (smooth syllables); CVVO (long checked syllable); and CVO (short checked syllable), where S = sonorant and O = obstruent. Using these simple syllabic structures, with no final clusters and only voiceless stops (O) and nasal consonants (S) in coda position, would also minimize segmental errors made by Thai speakers. These five syllabic structures were paired in all possible combinations, yielding 25 disyllabic word types, for instance, CVV.CVV, CVV.CVVS, and CVV.CVVO (see Table 5-1 below). The speech perception task used all 25 word types with variable segments, while the speech production task used only 12 word types with more strictly controlled segments, as discussed below. Only word types found to be of interest in a previous stress identification perception study (Jangjamras, 2008) were selected for the production task. This was done to minimize test time and fatigue.
In the perception task, the onsets included the stops /b, p, d, t/, the nasals /m, n/, and the fricatives /f, s/; the codas included /t, p, m, n/; and the nuclei included the English vowels /a, e
The segments selected for creating stimuli for the production task were more limited. Only /t/ was used as an onset, and only /p/ and /m/ appeared in codas. A pilot study showed that participants had difficulty hearing less audible onsets in the speech production stimuli, such as /f/ or /s/, and could not distinguish coda /m/ from /n/ very well. Importantly, having only one onset consonant also facilitates segmentation. Only point vowels were included: English high front tense/lax /i, /, high back tense/lax /u, and low back / /. In addition, to test Thai long syllabic structures such as CVVO and CVVS, a long high front vowel /i:/ and a long high back vowel /u:/ were included. Table 5-1 shows the 25 word types and some sample words. The first row under each syllabic structure shows a sample word for the perception task, whereas the second row, when present, shows a word used in the production task.
A male phonetician who is a native speaker of American English, aged 37, produced the test materials, which were written in IPA symbols. The recording of the perception word list was made in a soundproof booth using a Marantz PMD 660 digital recorder and a Shure SM2 headset with a microphone held about 1 inches from the left corner of the reader's mouth. To maintain a rhythmic pattern, the reader was first asked to produce initial stress on all 100 disyllabic words. Then the same stimulus words, in randomized order, were produced with final stress only. The reader was told not to change the vowel quality of unstressed vowels in order to keep the original syllabic structure. Three repetitions were obtained for each stimulus word, and the middle token of each triad was selected. For this set, 200 stimulus words were obtained (100 x 2 stress positions). On a different day, the same reader produced 25 disyllabic words with alternating initial and final stress, again with three repetitions per word. For this second reading set, he was asked to apply vowel reduction in unstressed syllables, and 50 stimulus words (25 x 2 stress positions) were collected. In sum, 250 aural stimuli were used for the perception experiment.
The recording of the speech production stimuli was made in a sound-attenuated booth using a Marantz PMD 660 digital recorder and a Shure SM10A head-mounted microphone held about 1 inches from the left corner of the reader's mouth. Each monosyllabic word was written on a flash card (e.g., thu:, thi:, tha:m, thp) and presented to the reader one word at a time in random order. Three repetitions were obtained for each stimulus word, and 17 stimulus words were selected for the production experiment. See Appendix A for the stimulus words produced by the phonetician and their vowel lengths, Appendix B for the materials presented in the speech production task, and Appendix C for the word list for the speech perception task.
Procedure I: Production Experiment
Before the experiment, the participants read a consent form and completed a short questionnaire on their language background and personal information (see Appendix D). They then watched a self-paced PowerPoint presentation on English lexical stress, created to ensure that everyone understood what lexical stress was. In addition to the written description on the screen, sound files of English stress minimal pairs such as PERmit (noun) vs. perMIT (verb) and PROduce (noun) vs. proDUCE (verb) were played (see Appendix E). During the experiment, the experimenter sat in the sound booth with each participant to monitor the microphone level meter and to check whether any stimulus trial was missed. On average, this experiment lasted about an hour; in general, the American English group completed the task faster than the Thai group. The speech production task was administered before the perception task to avoid familiarity effects from the speech perception stimuli, which were disyllabic words produced with lexical stress by a phonetician.
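As a quick check on the stimulus counts given in the Materials section, the sketch below tallies the perception stimuli from the two reading sets. It is a minimal illustration. The function and parameter names are hypothetical, and it assumes, as described above, that only one of the three repetitions of each word is kept for each stress position.

```python
STRESS_POSITIONS = 2  # initial vs. final stress
WORD_TYPES = 25       # 5 x 5 syllabic-structure pairings


def perception_stimulus_counts(first_set_words=100, second_set_words=WORD_TYPES):
    """Tally the aural perception stimuli (hypothetical helper).

    first_set_words:  words read WITHOUT vowel reduction in unstressed syllables
    second_set_words: words read WITH vowel reduction in unstressed syllables
    Each word contributes one kept token per stress position, so the three
    recorded repetitions do not add to the count.
    """
    unreduced = first_set_words * STRESS_POSITIONS
    reduced = second_set_words * STRESS_POSITIONS
    return {"unreduced": unreduced, "reduced": reduced, "total": unreduced + reduced}


counts = perception_stimulus_counts()
print(counts)                     # {'unreduced': 200, 'reduced': 50, 'total': 250}
print(100 // WORD_TYPES)          # 4 unreduced samples per word type
print(counts["total"] == 5 * 50)  # True: five perception blocks of 50 trials
```

Splitting the count this way shows that the 250 perception trials consist of 4 unreduced samples per word type and stress position, plus 1 reduced sample.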
The participants were asked to concatenate two monosyllabic words, presented aurally as isolated, stressed syllables, into a single word and to say this new word in a frame sentence. Four counterbalanced blocks of 18 trials each were used, for a total of 72 trials (36 words x 2 stress positions). Blocks 1 and 3 elicited initial stress; blocks 2 and 4 elicited final stress. Each participant was randomly assigned to one of the four block orders, and the order of trials within each block was randomized. The total number of utterances produced by each speaker was 432 (18 trials x 4 blocks x 6 repetitions per trial). One word from each nonword pair was presented in each block, so each word was presented twice during the experiment. Half of the productions were produced with initial stress and half with final stress. Participants were instructed to stress the initial syllable when they saw the visual prompt "X___" and the final syllable when they saw "___X". The participants were asked to produce the target word three times in isolation and then three times in the frame sentence "They said ___ twice." There was a short break after each block. Participants were given five practice trials using nontest items before the first block began. The experiment was self-paced and controlled by each participant. See Appendix B for the stimuli used in each block.
Stimulus presentation was controlled by E-Prime version 1.1 (Psychology Software Tools, Inc.) on a personal computer located outside the sound-attenuated room. The stimuli were played over high-quality loudspeakers and shown on an external monitor in the room. For each trial, the participants heard two monosyllabic words with a 1,000 ms interstimulus interval (ISI) and saw the sentence frame on the screen. They could replay the aural stimulus in each trial as many times as they wished by pressing the ENTER key on an external keyboard. Once they had finished producing the target word, they read aloud the trial number (for coding purposes) and then pressed the N key to continue to the next trial. The participants were told to take their time and consider all possible responses. If they made a mistake, they were told to say "correct" and then repeat the response correctly.
The speech production was recorded in a sound-attenuated booth using a Marantz PMD 660 digital recorder and a Shure SM10A head-mounted microphone held about 2 inches (three fingers) from the left corner of the subject's mouth. The production was recorded and digitized at 44.1 kHz (16 bit) on a personal computer.
Procedure II: Perception Experiment
The participants listened to prerecorded test materials through a Plantronics DSP (digital signal processing) GameCom Pro 1 headset at a self-adjusted comfortable listening level in a quiet room. They were asked to identify the stress placement of each disyllabic word by pressing computer keys in a two-alternative forced-choice task. The participants were told to use their dominant hand to press the computer keys labeled 1 or 2 when they heard initial or final stress, respectively. Because RTs were measured, the participants were instructed to respond as quickly as possible. Stimulus presentation was controlled by E-Prime version 1.1 (Schneider et al., 2002). A short practice session with 10 practice trials that were not used as test trials preceded the real task. The experiment included five blocks of 50 trials each. The intertrial interval (ITI) was 1,500 ms. There was a short break after each block. The blocks and trials were presented in random order. Participants listened to 250 aural stimuli in total: 200 unreduced stimuli (25 word types x 2 stress positions x 4 samples) and 50 stimuli (25 word types x 2 stress positions) with reduced vowels in unstressed position. This experiment took approximately 20 minutes or less to complete.
Procedure III: Production Evaluation
Stress productions were evaluated in two ways: 1) perceptual evaluation by native American English listener judges and 2) acoustic analysis. One of the three tokens of each target nonword produced in the frame sentence "They said ___ twice" was extracted. The judges then perceptually evaluated its stress position, and it was also submitted to a detailed acoustic analysis. Disyllabic target words were considered candidates for inclusion if there was no evidence of spurious pitch tracking in Praat, such as excessive rises or falls within a syllable, and no excessive creaky voice. If these conditions were met, the second repetition was preferred to the first or last repetition. Each selected word was edited into its own file and normalized to 98% peak intensity using MODRIFF in the UAB software package (Steve Smith, University of Alabama). These normalized words were presented to the American English judges in the perceptual assessment described below. The original digital recordings were reserved for acoustic analysis.
Native American English Listener Judgments
Stress productions by NT and AE were perceptually evaluated by 42 judges before the acoustic analysis was done. These forty-two native speakers of American English (24 female and 20 male; mean age = 23; age range: 18-32; age mode = 20) all reported normal hearing and speech and passed a bilateral hearing screening in the range of 750 to 8,000 Hz at 25 dB HL (with a DSP Pure Tone Audiometer). Two additional potential participants were rejected: one failed the screening test, and the other scored more than 2 SD below the overall mean accuracy score. All judges were recruited on the University of Florida campus through flyers and the Linguistics and Communication Sciences and Disorders participant pool. They were undergraduate or graduate students, except for two judges who were not affiliated with the university. Twenty-four judges were paid $15 for their time, and 18 received course credit. Prior to the experiment, the judges read a consent form and completed a short questionnaire on their language background and demographic information. They then watched a PowerPoint presentation on English lexical stress to ensure that they understood what lexical stress was.
Twenty-one judges were assigned to each of the two listening blocks, A and B. The judges' gender and recruitment method were counterbalanced in each block. Each block consisted of disyllabic words produced by the 30 speakers, with half of each speaker's 72 productions presented in each listening block, plus control words produced by a trained phonetician. Each judge listened to 1,152 tokens (36 test words x 30 speakers + 72 control words). The judges were told to press the computer keys labeled 1 or 2 when they heard initial or final stress, respectively, and were instructed to respond as quickly as possible. Each token was presented once. If they could not identify the stress location, they were told to guess by pressing either key. The procedure of this evaluation task was similar to that of the speech perception task. Stimulus presentation was controlled by E-Prime version 1.1 (Schneider et al., 2002). A short practice session with 10 practice trials that were not used as test trials preceded the real task. The experiment was divided into two sessions, administered on two consecutive days for all judges except two. These two judges completed both sessions on the same day, with a break of at least 1.5 hours between sessions. Each session, presented in counterbalanced order, included eight blocks of 72 trials each. The ITI was 1,500 ms. Each block included 36 productions by a native speaker of Thai and 36 productions by a native speaker of AE. Trials within each block and the eight blocks within each session were presented in randomized order. A short, optional break was given at the end of each block. After the fourth block, there was a mandatory 10-minute break before the fifth block started. On average, the first session, including check-in time, lasted approximately an hour; the second session took about 45 minutes.
Acoustic Analysis
The acoustic analysis served to determine the acoustic cues the judges relied on when identifying lexical stress produced by the NT and AE. After distracter items in the production task were excluded, 56 words produced by each speaker were submitted to acoustic analysis. The recordings (56 tokens x 30 talkers) were analyzed using Praat version 5.0.42 (Boersma and Weenink, 2009). The following acoustic parameters were measured for the stressed and unstressed vowels of each disyllabic token: vowel duration (in ms); average intensity (in dB); average fundamental frequency (F0, in Hz), maximum F0, minimum F0, and F0 range; and the first and second formants (in Hz). Each vowel was divided into 20 evenly spaced data points at which intensity, F0, and the two formants were measured. However, the averaged formant values are not presented in this report because they did not fit well with the other variables included in the stepwise regression analysis. The speech tokens were sampled at 44.1 kHz with 16-bit quantization and a low-pass filter of 11.025 kHz. Following recommendations in the Praat manual, the pitch range for measuring F0 was set to 100-500 Hz for female speakers and 75-300 Hz for male speakers. However, for creaky voice (approximately 3% of all tokens), the pitch floor was set to 50 Hz for that particular token across all talkers. When measuring formants, the maximum formant was set at 5500 Hz for female talkers and 5000 Hz for males. The analysis window size was 25.6 ms.
Segmentation Criteria
The onset of all production tokens was [th]. There were three rhyme types:
1. The CVV structure had an open vowel (no coda). These include the syllables [tha:], [thu:], and [thi:].
2. The CVVO and CVO structures had the stop /p/ as a coda. These include the syllables [thi:p], [thip], [th, [thu:p], [thup], [th p], and [tha:p].
3. The CVS and CVVS structures had the nasal /m/ as a coda. These include the syllables [thi:m], [thim], [thum], [thu:m], and [tha:m].
Segmentation criteria were based on waveform and spectrogram cues as described by Peterson and Lehiste (1952) and Ladefoged (2001), as well as on auditory cues. Vowel boundaries were segmented according to the following criteria. The vowel onset was defined as the first upward zero crossing at the beginning of voicing in the waveform. On the spectrogram, it was the visible cessation of aspiration of the onset /th/ and the onset of voicing, usually accompanied by the onset of the first formant (see Figure 5-1). In the rare cases when aspiration extended into the vowel portion, the onset of F1 was treated as the vowel onset. The vowel offset in an open rhyme was marked at the downward zero crossing immediately following the final glottal pulse in the waveform (Zhang et al., 2008). On the spectrogram, this was usually the cessation of all formants; if extra oscillations or cycles were present, a sudden drop in F1 was used as a cue (de Jong, 2004). In a syllable with an obstruent coda, the vowel offset was defined as the downward zero crossing at the end of F2; the final release burst of the voiceless bilabial stop was ignored. In a syllable with a bilabial nasal coda, the visual cues to the end of the vowel included a decrease or drop in F1 on the spectrogram and higher formants that were rather faint compared to those of the vowel portion (see Figure 5-2). In many cases, the formant cues were more straightforward when the nasal followed a high front vowel or low back vowel than when it followed a high back vowel. When the vowels were nasalized, there was usually an extra peak of energy, generally between the oral and nasalized portions of the vowel. (The nasalized portion of the vowel was included in the vowel duration.) Extra aperiodic cycles or creaky voice seen on the waveform after the regular offset (for both open and closed syllables) were excluded.
Acoustic Computation
To control for variations in speaking rate, duration, intensity, and F0 between recordings and across talkers in the two groups, two types of normalization were used. First, a ratio of the first to the second syllable values was used for average F0, F0 range, and duration (Fry, 1955; Fry, 1958; Sereno & Jongman, 1995). For example, the average F0 ratio of each token was calculated by dividing the average F0 of the stressed vowel by the average F0 of the unstressed vowel. A ratio greater than 1 showed that the average F0 of the stressed vowel was higher than that of the unstressed vowel. A ratio below 1 showed the reverse. This value served as a dependent variable in subsequent statistical analyses. Second, for intensity, the difference in dB between the stressed and unstressed vowels was used instead of a ratio. This was done because dB is already a logarithmic scale rather than an absolute scale such as Hz or milliseconds.
Procedure IV: Perception Evaluation
Stress perception was evaluated in three ways: 1) accuracy scores, 2) RT analysis, and 3) acoustic analysis. The acoustic analysis served to determine the acoustic cues the NT and AE relied on when perceiving stress produced by the phonetician. After the 50 items with vowel reduction in unstressed syllables were excluded, 200 words produced by the phonetician were submitted to acoustic analysis. This analysis followed the same procedures as the acoustic analysis of stress production.
Statistical Design
The statistical design for both the speech production task and the speech perception task was a mixed three-way (Position x Syllabic Structure x Group) ANOVA. Stress position and syllabic structure were within-subjects factors, and language group was a between-subjects factor. Position had two levels (initial and final), Syllabic Structure had five levels (CVO, CVS, CVV, CVVO, and CVVS), and Group had two levels (AE and NT). The dependent variables were accuracy percentages, acoustic values (production), and mean RTs (perception). The Bonferroni post hoc test (adjusted p < .05) was used for pairwise mean comparisons.
Acoustic data on stress production were also submitted to a stepwise regression analysis. The independent variables were the duration ratio, F0 range ratio, average F0 ratio, and average intensity difference. The dependent variables were stress production identification scores (21 = highest possible score per token). Separate regression analyses were performed for each stress position and syllabic structure in each language group.
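The normalization described under Acoustic Computation can be illustrated with a short sketch. It follows the worked example in the text, dividing stressed by unstressed values. It assumes that each vowel's raw measurements are stored in a dict with hypothetical keys (`dur_ms`, `f0_mean`, `f0_max`, `f0_min`, `db_mean`), and it is not the script used in the study.

```python
def stress_contrasts(stressed, unstressed):
    """Return normalized stress contrasts for one disyllabic token.

    `stressed` and `unstressed` are dicts of raw vowel measurements using the
    hypothetical keys dur_ms, f0_mean, f0_max, f0_min, and db_mean.
    """
    def f0_range(vowel):
        return vowel["f0_max"] - vowel["f0_min"]

    return {
        # Ratios are always positive: above 1 means the stressed vowel has the
        # larger value, below 1 means the unstressed vowel does.
        "duration_ratio": stressed["dur_ms"] / unstressed["dur_ms"],
        "f0_mean_ratio": stressed["f0_mean"] / unstressed["f0_mean"],
        # Assumes the unstressed vowel has a nonzero F0 range.
        "f0_range_ratio": f0_range(stressed) / f0_range(unstressed),
        # dB is already logarithmic: 10*log10(I1/I0) - 10*log10(I2/I0)
        # equals 10*log10(I1/I2), so a dB difference already encodes a ratio.
        "db_difference": stressed["db_mean"] - unstressed["db_mean"],
    }


# Hypothetical measurements for one token (not data from the study).
token = stress_contrasts(
    {"dur_ms": 180, "f0_mean": 220, "f0_max": 250, "f0_min": 200, "db_mean": 72},
    {"dur_ms": 120, "f0_mean": 200, "f0_max": 210, "f0_min": 190, "db_mean": 68},
)
print(token)
```

Because subtracting dB values is equivalent to taking the ratio of the underlying intensities, the intensity difference is on a comparable relative footing with the three ratios.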
Acoustic data of the perception stimuli were submitted to three statistical tests. First, dependent t tests were conducted, with raw acoustic values as dependent variables and stressed versus unstressed as independent variables. Second, correlation tests between each acoustic parameter (ratio or difference) and perception accuracy scores were performed for each stress position in each language group. Finally, a stepwise regression analysis was performed for each stress position in each language group. The acoustic ratios and difference were the independent variables, and stress perception accuracy scores (15 = highest possible score per token) were the dependent variables.
The relation between speech perception and production was investigated through correlation tests. Two tests were conducted on 1) the overall perception and production mean scores and 2) specific syllabic structures in each stress position. Both linear (Pearson) and rank-based (Spearman's rho) correlation tests were used.
Conclusion
This chapter described the procedures of the two experiments. Fifteen NT and fifteen AE participated in stress production and perception experiments using English nonwords. These were drawn from a specifically designed corpus that tested the influence of Thai tone assignment and stress position on the production and perception of English stress. Participants were asked to produce English nonwords using a novel elicitation technique. Their recorded speech was evaluated by 42 native listener judges (21 judges per word), and selected speech tokens were submitted to detailed acoustic analysis to uncover the acoustic correlates of lexical stress employed by both groups. In the speech perception task, participants were asked to identify the stress location of English nonwords. Accuracy scores and RTs were recorded.
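The two correlation measures named above capture different things. Pearson's r measures linear association. Spearman's rho is Pearson's r computed on ranks, so it captures any monotonic association. The following standard-library sketch shows both. The function names are hypothetical, and tied values receive the average of their ranks.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # convert 0-based positions to 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r on the (tie-averaged) ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]            # monotonic but not linear
print(round(pearson_r(x, y), 3))   # 0.981
print(spearman_rho(x, y))          # 1.0
```

The example shows why both tests can be informative. For a perfectly monotonic but curved relation, Spearman's rho reaches 1 while Pearson's r stays below 1.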
The participants' stress production scores, perception scores, and mean RTs were submitted to a three-way ANOVA to answer the question of how syllabic structure, stress position, and language group affect stress production and perception. Acoustic analyses were performed on words produced by the two language groups and by a phonetician. The production data were submitted to a series of stepwise regression analyses to identify the primary acoustic correlates employed in stress judgments by native listener judges. Acoustic data of the perception stimuli were submitted to three statistical tests: dependent t tests, correlation tests, and stepwise regressions. Correlations between the accuracy means of stress production and perception were calculated to answer the research question regarding the relation between stress production and perception for NT and AE.
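The stepwise regressions summarized above select, from a set of acoustic predictors, those that best account for identification scores. The sketch below uses only the standard library and illustrates just the forward-entry half of that idea. At each step it adds the predictor with the largest partial F and stops when no candidate reaches a fixed F-to-enter threshold (3.84 here, an assumption). Full stepwise procedures also re-test already-entered predictors for removal, so this is a simplified illustration rather than the study's procedure. All names are hypothetical.

```python
def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def ols_r2(columns, y):
    """R^2 of an ordinary least-squares fit of y on `columns` plus an intercept."""
    n = len(y)
    design = [[1.0] * n] + [list(c) for c in columns]
    p = len(design)
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(design[i][t] * design[j][t] for t in range(n)) for j in range(p)]
           for i in range(p)]
    xty = [sum(design[i][t] * y[t] for t in range(n)) for i in range(p)]
    beta = _solve(xtx, xty)
    fitted = [sum(beta[i] * design[i][t] for i in range(p)) for t in range(n)]
    y_bar = sum(y) / n
    ss_tot = sum((v - y_bar) ** 2 for v in y)
    ss_res = sum((v - f) ** 2 for v, f in zip(y, fitted))
    return 1.0 - ss_res / ss_tot


def forward_select(predictors, y, f_to_enter=3.84):
    """Return predictor names (keys of `predictors`) in the order they entered."""
    n = len(y)
    selected, r2_old = [], 0.0
    while True:
        best_name, best_f, best_r2 = None, f_to_enter, r2_old
        for name in predictors:
            if name in selected:
                continue
            df_res = n - (len(selected) + 1) - 1  # residual df of candidate model
            if df_res <= 0:
                continue
            r2_new = ols_r2([predictors[s] for s in selected + [name]], y)
            gain = r2_new - r2_old
            if gain <= 0:
                f = 0.0
            elif 1.0 - r2_new <= 1e-12:           # (near-)perfect fit
                f = float("inf")
            else:
                f = gain / ((1.0 - r2_new) / df_res)  # partial F-to-enter
            if f >= best_f:
                best_name, best_f, best_r2 = name, f, r2_new
        if best_name is None:
            return selected
        selected.append(best_name)
        r2_old = best_r2


# Toy data (not from the study): scores track the duration ratio closely.
dur = [1.0, 1.2, 1.5, 1.7, 2.0, 2.2, 2.5, 2.8]
noise = [0.1, -0.2, 0.05, 0.1, -0.1, 0.2, -0.05, 0.0]
scores = [10 * d + e for d, e in zip(dur, noise)]
db = [3, 1, 4, 1, 5, 9, 2, 6]
print(forward_select({"duration_ratio": dur, "db_difference": db}, scores))
```

In the study's setting, `y` would hold per-token identification scores (0-21). The predictors would be the duration ratio, F0 range ratio, average F0 ratio, and intensity difference.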
Table 5-1. 25 word types

Word type | 1 | 2 | 3 | 4 | 5
Structure | CVV.CVV | CVV.CVVS | CVV.CVVO | CVV.CVO | CVV.CVS
Perception | ba:.ba: | ba: b :n | da:.fe t | ba:.b t | da:.b m
Production | - | - | thu:.tha:p | thi:.thip | -

Word type | 6 | 7 | 8 | 9 | 10
Structure | CVVS.CVVS | CVVS.CVVO | CVVS.CVO | CVVS.CVV | CVVS.CVS
Perception | t :n.b :n | n :n.fe t | t :n.b t | n :n.ba: | t :n.b m
Production | - | thim:.tha:p | tha:m.thip | - | -

Word type | 11 | 12 | 13 | 14 | 15
Structure | CVVO.CVVO | CVVO.CVO | CVVO.CVV | CVVO.CVVS | CVVO.CVS
Perception | pe t.bu:t | ke t.b t | pe t.ba: | ke t.b :n | pe t.b m
Production | - | thi:p.thup | thip:.thu: | thi:p.thi:m | -

Word type | 16 | 17 | 18 | 19 | 20
Structure | CVO.CVO | CVO.CVS | CVO.CVV | CVO.CVVS | CVO.CVVO
Perception | n t b t | s t b m | n t b a: | s t b :n | n t.fe t
Production | th .thp | - | - | thp:.thi:m | thup.thi:p

Word type | 21 | 22 | 23 | 24 | 25
Structure | CVS.CVS | CVS.CVV | CVS.CVVS | CVS.CVVO | CVS.CVO
Perception | k m.b m | t m.ba: | k m b :n | t m.fe t | k m.b t
Production | - | thum.thu: | - | - | thim:.thp
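The 25 word types in Table 5-1 are all ordered pairings of the five syllabic structures, including each structure paired with itself. A minimal sketch using Python's `itertools.product` makes the count explicit. Its ordering matches the table's first row; later rows of the table use their own order, but the set of pairings is the same.

```python
from itertools import product

# The five Thai-conforming syllabic structures (S = sonorant, O = obstruent).
STRUCTURES = ["CVV", "CVVS", "CVVO", "CVO", "CVS"]


def word_types(structures=STRUCTURES):
    """All ordered first.second syllable pairings, including identical pairs."""
    return [f"{first}.{second}" for first, second in product(structures, repeat=2)]


types = word_types()
print(len(types))   # 25
print(types[:5])    # ['CVV.CVV', 'CVV.CVVS', 'CVV.CVVO', 'CVV.CVO', 'CVV.CVS']
```

Because order matters (CVO.CVVS and CVVS.CVO are different word types), the count is 5 x 5 = 25 rather than the 15 unordered combinations.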
Figure 5-1. A spectrogram of the word th :.thp with vowel boundaries.
Figure 5-2. A spectrogram of the word thm .thim with vowel boundaries.
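The zero-crossing criteria used for vowel boundaries (illustrated in Figures 5-1 and 5-2) can be sketched as follows. This is a simplified illustration, not the procedure used in the study. In the study, boundaries were placed by hand in Praat using waveform, spectrogram, and auditory cues. These hypothetical helpers only locate the sample at which the waveform changes sign, searching forward from a given sample, for example from an estimated start of voicing or from the final glottal pulse.

```python
def first_upward_zero_crossing(samples, start=0):
    """Index of the first sample at or after `start` where the waveform goes
    from negative to non-negative, or None if no such crossing exists."""
    for i in range(max(start, 1), len(samples)):
        if samples[i - 1] < 0 <= samples[i]:
            return i
    return None


def first_downward_zero_crossing(samples, start=0):
    """Index of the first sample at or after `start` where the waveform goes
    from positive to non-positive, or None if no such crossing exists."""
    for i in range(max(start, 1), len(samples)):
        if samples[i - 1] > 0 >= samples[i]:
            return i
    return None


def duration_ms(onset_index, offset_index, sample_rate=44_100):
    """Convert a pair of sample indices to a duration in milliseconds."""
    return (offset_index - onset_index) * 1000 / sample_rate


# Toy waveform (not real speech). The vowel starts at the upward crossing and
# ends at the first downward crossing after a hypothetical final pulse at index 3.
wave = [-0.2, -0.1, 0.3, 0.5, 0.1, -0.4, -0.2, 0.2]
onset = first_upward_zero_crossing(wave)              # 2
offset = first_downward_zero_crossing(wave, start=3)  # 5
print(onset, offset, duration_ms(onset, offset, sample_rate=1000))  # 2 5 3.0
```

At the study's 44.1 kHz sampling rate, one sample corresponds to about 0.023 ms. Choosing the sample just after the sign change therefore has a negligible effect on measured vowel durations.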
97 CHAPTER 6 RESULTS This chapter presents the results obtained from the stress production and perception experiments described in chapter 5. To investigate any transfer from Thai tonal rule assignments and other Thai prosodic features onto L2 str ess production, NT and AE, the control group, were asked to concatenate two monosyllabic English nonwords into a single word with a specified stress location. The recorded speech materials were analyzed in three ways: the stress identification by native listener judges, the detailed acoustical analysis of the production data and the Thai tone transfer analysis (in chapter 7). To further investigate L1 prosodic transfer in L2 lexical stress perception, the same participant groups completed a stress identification task using English nonwords. Their stress production and perception scores and RTs were collected and analyzed. In addition, a similar detailed acoustical analysis was conducted on the speech perception stimuli to reveal which acoustic cues participants relied on in the stress perception task. The combined results from the two experiments we re used to examine the relation between lexical stress production and perception. The results are presented in three sections: production, perception and relation between production and perception. The statistical design for the stress identification score and reaction time analyses in the two experiments (production and perception) involved a 2 x 2 x 5 ANOVA with language as the betweensubjects factor (Thai and English), and Position (initial and final stress) and Syllabic Structure (CVO, CVS, CVV, CVVO, CVVS) as within subjects factors. The Bonferroni post hoc test (adjusted p <.05) was selected for pair wise means comparisons. When the Mauchlys sphericity
98 test result was significant, the degrees of freedom were corrected by HuynhFeldt Epsilon and thus reported instead of the original degrees of freedom. Speech P roduction Perceptual E valuation Stress location identification scores obtained from 21 AE native listener judges for the 56 disyllabic nonwords produced by each of the 30 speakers (15 NT, 15 AE), or stress production scores, were used for this analysis. When the intended stress location of each item was reported as a stressed syllable by a judge, that item was assigned a score of 1. In contrast, if the intended stress location did not match with a stressed location identified by a judge, the item was assigned a score of 0. The highest possible stress production score that an item can have is 21. However, 21 judgements were excluded due to a sound file extraction error. Therefore, 35,259 responses [(56 x 30 x 21) 21] were included in this analysis. A threeway repeated measures ANOVA analysis revealed no evidence of overall difference between Thai and AE production of stress, [main effects of Group, F (1,28) = .001, p = .978]. Figure 61 shows that both groups scored above chance level (50%). The analysis also showed that both groups produced initial stress more accurately than final stress (73.37% vs. 67.02%) (Fig. 62 & 6 3) [main effect of Position, F (1,28) = 5.47, p = .027]. I n addition, their stress production score varied significantly across the five syllable structures (Fig. 6 4) [main effects of Structure, F (3.28,92.06) = 17.24, p = .000]. Post hoc, pair wise comparisons indicated that stress product ion was less accurate for CVO ( 61.75%) than for the four other structures, and that CVV (69.3%) was produced significantly less accurately than CVVS (74.9%). These results indicate that CVO was the most dif f icult structure for stress production, CVV was not as difficult, and C VVS was the easiest. No significant
99 interaction between any factors was found. Fig. 65 reports mean percentage and standard errors of stress production score by syllabic structure by stress position and language group. Acoustic Data: ThreeW ay ANOVAs A series of threeway ANOVAs was conducted to evaluate the extent to which each of the four acoustic parameters, namely duration, F0 range, average F0 and average dB, was realized in stress production by N T and AE. Position (initial and final) and Struct ure (CVVO, CVVS and CVS) were within subject factors and Group (Thai and AE) was the betweensubject factor. The dependent variables for these analyses were a subset of acoustic data evaluated by native listener judges. Since each Syllabic structure x P osition condition does not have equal samples, an attempt was made to make each analysis cell equal and this constituted the subset of acoustic data. O utliers (3 SDs below or above the means) were removed. The total number of data points in this analysis was 720; each cell contained 60 data points. Table 61 reports F statistics and p values of all tests. The alpha level is .05. Main effect of stress position showed that both N T and AE produced initial stress with a higher mean F0 range [ F (1,118) = 71.517, p = .000], higher average F0 [ F (1,118) = 5.436, p = .000] and higher intensity [ F (1,188) = 44.886, p = .000]. That is, both language groups produced initial stress that had a larger pitch range and higher pitch and that was louder than final stres s production. However, they produced final stress with longer vowel duration [ F (1,118) = 285.84, p = .000] than that of initial stress production. Main effect of Syllabic Structure was found for three acoustic parameters measured: vowel duration, [ F (1,118) = 4.046, p=.047], average F0, [ F (2, 228.074) =
4.156, p = .018] and intensity [F(2, 118) = 9.492, p = .000]. Post hoc Bonferroni pairwise comparisons showed that (1) the mean duration ratios of all three syllabic structures differed significantly from each other: CVVS (1.869, SE = .069) was longer than CVVO (1.57, SE = .055) and CVO (1.337, SE = .065), and CVVO was longer than CVO; (2) the mean F0 ratio of the CVVO syllable (1.131, SE = .014) was significantly higher than that of CVO (1.18, SE = .017); and (3) the mean dB differences of CVVO (3.085, SE = .307) and CVO (2.993, SE = .285) were significantly higher than that of CVVS (1.579, SE = .284). A main effect of Group was found for only one acoustic parameter, average F0 [F(1, 118) = 6.36, p = .013]: NT produced stressed syllables with a higher average F0 relative to unstressed syllables (thus a greater F0 ratio) than AE did. See Table 6-2 for detailed descriptive statistics of average F0 by syllabic structure and language group. A significant two-way interaction between Position and Group was found for all three acoustic correlates. In addition, the Structure × Group interaction on intensity difference was significant. No other significant interactions were found. Follow-up tests on the Position × Group interaction included a paired-samples t-test and an independent-samples t-test at each level of Structure. The paired-samples t-test, which examined the effect of Position with group means collapsed, showed that overall both groups produced (1) a significantly longer duration contrast when stressing finally, (2) a significantly larger F0 range when stressing finally and (3) a significantly higher average F0 when stressing initially. See Table 6-3 for descriptive and t statistics.
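The acoustic measures analyzed in these ANOVAs compare each stressed vowel with its unstressed neighbor, and extreme values were trimmed with a 3-SD rule. The Python sketch below shows one plausible way to derive the measures and apply that rule. It assumes that a ratio means stressed divided by unstressed, that the intensity difference is stressed minus unstressed dB, and that F0 range is maximum minus minimum F0 within the vowel; the `Vowel` record and the function names are hypothetical and are not the Praat scripts actually used in this study.

```python
import statistics
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Vowel:
    """Hypothetical per-vowel acoustic measurements (e.g., exported from Praat)."""
    duration_ms: float
    f0_min_hz: float
    f0_max_hz: float
    f0_mean_hz: float
    intensity_db: float


def stress_measures(stressed: Vowel, unstressed: Vowel) -> Dict[str, float]:
    """Relative measures of a stressed vowel against its unstressed neighbor.

    Assumption: ratio = stressed / unstressed, difference = stressed - unstressed.
    """
    stressed_range = stressed.f0_max_hz - stressed.f0_min_hz
    unstressed_range = unstressed.f0_max_hz - unstressed.f0_min_hz
    return {
        "duration_ratio": stressed.duration_ms / unstressed.duration_ms,
        "f0_range_ratio": stressed_range / unstressed_range,
        "mean_f0_ratio": stressed.f0_mean_hz / unstressed.f0_mean_hz,
        # dB is already a logarithmic scale, so a difference (not a ratio) is used.
        "intensity_diff_db": stressed.intensity_db - unstressed.intensity_db,
    }


def trim_3sd(values: List[float]) -> List[float]:
    """Keep only values within 3 sample SDs of the mean."""
    m = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - m) <= 3 * sd]


# Illustrative values only (not study data).
example = stress_measures(
    Vowel(150.0, 180.0, 220.0, 200.0, 72.0),
    Vowel(100.0, 170.0, 190.0, 180.0, 69.0),
)
print(example)
```

Using a difference for intensity but ratios for the other parameters keeps every measure scale-free with respect to the talker, which is what allows NT and AE productions to be pooled in one ANOVA.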
An independent-samples t-test with Position means collapsed showed that NT produced a significantly higher average F0 than AE when stressing CVVS syllables [t(238) = 2.502, p = .013]. Although the other mean comparisons did not reach significance at p = .025, it should be noted that NT also produced a larger F0 range than AE when stressing CVVO structures [t(238) = 1.98, p = .049], a higher average F0 on CVO syllables [t(238) = 1.893, p = .06] and a larger mean intensity difference on CVVS structures [t(238) = 1.97, p = .05]. Follow-up tests on the Structure × Group interaction showed that in initial stress production, the average dB difference varied significantly among the three syllabic structures [F(2, 357) = 4.758, p = .009]. Post hoc Bonferroni comparisons showed that both groups produced stressed CVO syllables (4.68 dB) with a significantly higher average intensity than CVVS syllables (2.92 dB). In final stress production, the intensity difference also varied significantly among the three structures [F(2, 357) = 3.631, p = .027]. Post hoc Bonferroni comparisons showed that the overall intensity of CVVO (1.98 dB) was significantly higher than that of CVVS syllables (0.23 dB). In summary, when each acoustic parameter was examined closely, syllabic structure and stress position affected stress production in a similar way for both groups of speakers. That is, they produced initial stress with a larger F0 range, higher mean F0 and greater intensity than final stress, whereas final stress was produced with a longer duration ratio than initial stress.

Acoustic Data: Stepwise Regression

An analysis was conducted to examine which acoustic properties the judges relied on when identifying stress location as produced by the two language groups. Twelve separate stepwise regressions were conducted, one for each syllabic structure (CVVO,
CVO, CVVS) in each stress position (initial and final) for each language group (NT and AE). The stress production score was the dependent variable, and four acoustic parameters, duration ratio, F0 range ratio, mean F0 ratio and mean intensity difference (dB), were the independent variables. In a stepwise regression, the strongest predictor enters the model first, then the next strongest, and so forth. All the predictors reported below showed a positive relationship with the stress production score (i.e., when the duration ratio increased, the score increased). A total of 1,193 cases were included in this analysis after removing the 5.32% of acoustic measurements that were considered outliers (more than 3 SDs below or above the mean of any one of the four independent variables). See Table 6-4 for a summary of the regression results. The percentages in parentheses indicate the percentage of variance explained by each predictor. The first predictor listed under each condition (Structure × Position × Group) is the primary predictor of the model; the predictors listed under it add further predictive power to the model. Results for AE in each stress position are considered first. For AE initial stress production on CVVO syllables, the regression based on 127 cases yielded a significant model with three acoustic measures, excluding F0 range, F(3, 126) = 16.366, p = .000, with a fair model fit, R² = .285. This three-predictor model accounted for 29% of the variance. The strongest predictor of perceived stress was vowel duration (13% of the variance), followed by mean F0 (an additional 10%) and then average intensity (an additional 5%). For AE initial stress production on CVO syllables, the results based on 101 cases showed that a three-predictor model was statistically reliable, F(3, 100) = 183.315, p =
.025, with a fair model fit, R² = .345. This model accounted for 35% of the variance. The strongest predictor was mean F0 (20% of the variance), followed by duration (an additional 10%) and finally intensity (an additional 3.5%). For AE initial stress production on CVVS syllables, the results based on 57 cases showed that a model with just one predictor was statistically reliable, F(1, 56) = 35.383, p = .000, with a fair model fit, R² = .391. The only predictor in this model, intensity, accounted for 39% of the variance. The NT regression results are presented next. For NT initial stress production on CVVO syllables, the regression based on 130 cases showed that a three-predictor model, excluding F0 range, was statistically reliable, F(3, 129) = 36.016, p = .000, with a fairly good model fit, R² = .462. This full model accounted for 46% of the variance. The strongest predictor was intensity (22% of the variance), followed by duration (an additional 17%) and average F0 (an additional 7%). For NT initial stress production on CVO syllables, the results based on 103 cases showed that a three-predictor model was statistically reliable, F(3, 102), p = .000, with a very good model fit, R² = .610. The full model accounted for 61% of the variance. The strongest predictor was duration (37% of the variance), followed by intensity (an additional 6.9%) and mean F0 (an additional 6.9%). The CVVS syllable results, based on 55 cases, showed that a three-predictor model was statistically reliable, F(3, 54) = 11.947, p = .000, with a moderately good model fit, R² = .413. This full model accounted for 41% of the variance. The strongest predictor was intensity (25% of the variance), followed by mean F0 (an additional 10%) and duration (an additional 6.6%).
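The stepwise logic described above, in which the strongest predictor enters first and each later predictor adds to the explained variance, can be sketched as forward selection on the gain in R². This is a simplification under stated assumptions: statistical packages such as SPSS typically enter and remove predictors using F-test probability criteria, whereas this sketch uses a hypothetical `min_gain` threshold and never removes a predictor once entered. The predictor names and the synthetic data at the bottom are illustrative only and are not study data.

```python
import random
from typing import Dict, List, Sequence, Tuple


def r_squared(y: Sequence[float], columns: List[Sequence[float]]) -> float:
    """R² of an OLS fit of y on the given predictor columns plus an intercept."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yv for r, yv in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting solves for the coefficients.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        pivot = a[c][c]
        a[c] = [v / pivot for v in a[c]]
        for r in range(k):
            if r != c:
                f = a[r][c]
                a[r] = [v - f * w for v, w in zip(a[r], a[c])]
    beta = [a[i][k] for i in range(k)]
    mean_y = sum(y) / n
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    ss_res = sum((yv - sum(b * x for b, x in zip(beta, r))) ** 2 for r, yv in zip(rows, y))
    return 1.0 - ss_res / ss_tot


def forward_stepwise(
    y: Sequence[float], predictors: Dict[str, Sequence[float]], min_gain: float = 0.02
) -> List[Tuple[str, float]]:
    """Repeatedly add the predictor with the largest R² gain until gains fall below min_gain."""
    selected: List[str] = []
    steps: List[Tuple[str, float]] = []
    current = 0.0
    while len(selected) < len(predictors):
        gains = {
            name: r_squared(y, [predictors[s] for s in selected] + [col]) - current
            for name, col in predictors.items()
            if name not in selected
        }
        best = max(gains, key=gains.get)
        if gains[best] < min_gain:
            break
        selected.append(best)
        current += gains[best]
        steps.append((best, gains[best]))
    return steps


# Synthetic illustration: the score depends on duration and intensity, not on F0 range.
rng = random.Random(1)
dur = [rng.random() for _ in range(120)]
inten = [rng.random() for _ in range(120)]
f0_range = [rng.random() for _ in range(120)]
score = [3 * d + i + rng.gauss(0, 0.1) for d, i in zip(dur, inten)]
steps = forward_stepwise(score, {"duration": dur, "intensity": inten, "f0_range": f0_range})
print(steps)
```

Each `(name, gain)` pair corresponds to one line of a regression summary such as "duration (13% of the variance), followed by mean F0 (an additional 10%)", with the gain expressed as a proportion rather than a percentage.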
For final stress production by both groups and each syllable type, only a three-predictor model, always excluding F0 range, was found to be significant (p = .000). For AE final stress production on CVVO syllables, the regression based on 127 cases showed that a three-predictor model, F(3, 126) = 44.084, p = .000, produced a good model fit, R² = .518. This full model accounted for 52% of the variance. The strongest predictor was intensity (33% of the variance), followed by mean F0 (an additional 18%) and duration (an additional 6.5%). For AE final stress production on CVO syllables, the results based on 96 cases showed that a three-predictor model was statistically reliable, F(3, 95) = 46.58, p = .000, with a good model fit, R² = .603. This full model accounted for 60% of the variance. The strongest predictor was mean F0 (37%), followed by duration (an additional 14.8%) and intensity (an additional 8.3%). For AE final stress on CVVS syllables, the regression based on 85 cases showed that a three-predictor model was statistically reliable, F(3, 84) = 53.685, p = .000, with a good model fit, R² = .665. The strongest predictor was average F0 (39% of the variance), followed by duration (an additional 20.7%) and intensity (an additional 7%). Finally, the NT results are presented. For Thai final stress production on CVVO syllables, the regression based on 128 cases showed that a three-predictor model, F(3, 127) = 23.065, p = .000, produced a good model fit, R² = .358. The strongest predictor was intensity (19%), followed by duration (an additional 8.6%) and mean F0 (an additional 8.2%). For NT final stress production on CVO syllables, the results based on 100 cases showed that a three-predictor model, F(3, 99) = 33.096, p = .000, produced a
moderately good model fit, R² = .508. The full model accounted for 50% of the variance. The strongest predictor was duration (27%), followed by mean F0 (an additional 15.8%) and intensity (an additional 7.8%). The results for NT final stress production on CVVS syllables, based on 84 cases, showed that a three-predictor model, F(3, 83) = 47.261, p = .000, produced a good model fit, R² = .639. The strongest predictor was duration (29% of the variance), followed by mean F0 (an additional 32.8%) and intensity (an additional 2.2%). Summing the variance each predictor explained across the three syllabic structures, duration (60.6%) appeared to be the strongest predictor of the NT initial stress score, followed by intensity (53.9%) and mean F0 (23.9%). Duration (64.6%) was also the strongest predictor of the NT final stress production score, followed by mean F0 (56.8%) and intensity (29%). For the AE initial stress production score, on the other hand, intensity (47.5%) accounted for the most variance, followed by average F0 (30%) and duration (23%). For their final stress production score, average F0 (94%) emerged as the strongest predictor overall, followed by intensity (48.3%) and duration (42%). F0 range was the only parameter that did not account for a significant amount of variance in either group's stress production scores. In addition, the relatively low model fit (R² value) of each regression model of AE initial stress production across the three syllabic structures suggests that the judges probably based their judgments on other acoustic cues not measured here, such as vowel centralization. In summary, the stepwise regressions showed that the native listener judges relied on different acoustic cues when identifying stress produced by NT than when identifying
stress produced by AE. The strongest predictor of both the NT word-initial and word-final stress production scores was vowel duration. In contrast, for AE, average intensity was the strongest predictor of production accuracy for word-initial stress, while mean F0 was the strongest predictor for word-final stress. Table 6-5 presents a summary of the predictors for each stress position. In conclusion, this section has presented the results concerning stress production by NT and AE, which were obtained in several steps. First, the intended stress productions were judged by a panel of 21 native listeners. The resulting stress production scores were submitted to a three-way ANOVA to investigate the effects of Structure, Position and Group. NT and AE showed similar patterns of performance: both groups' overall score was 70%, and initial stress was more often heard as intended than final stress (73% vs. 67%). Second, a detailed acoustic analysis revealed the acoustic parameters both language groups employed in stress production. Again, three-way ANOVAs were used to investigate three factors: Structure, Position and Group. When each acoustic parameter was examined closely, syllabic structure and stress position affected stress production similarly in both groups. Both groups produced initial stress with a larger F0 range, higher average F0 and greater intensity difference than final stress, and they produced final stress with a longer vowel duration than initial stress. Finally, stepwise regression analyses were conducted to investigate the acoustic cues native listener judges relied on when identifying the stress location produced by NT and
AE. The results of the stepwise regression analyses showed that, with varying degrees of strength across the three syllabic structures examined, vowel intensity, vowel duration and average F0 all emerged as significant predictors of both NT and AE stress production scores. Based on the variance summed across the three syllabic structures, duration was the strongest predictor of the NT initial and final stress scores. For the AE initial stress production score, on the other hand, intensity accounted for the most variance, and for their final stress production score, average F0 was the strongest predictor overall. F0 range was the only parameter that did not account for a significant amount of variance in either group's stress production scores.

Speech Perception

The dependent variables of the speech perception analysis include the correct stress identification score (henceforth, accuracy data) and reaction times (RTs). The accuracy data were analyzed in two ways. The first analysis included all 250 test items per listener. The second included 200 test items; the 50 items with a reduced vowel in the unstressed position were excluded. This was done to examine how the vowel reduction cue affects stress identification by NT and AE.

Accuracy

A total of 7,500 stress location responses from NT and AE were used in this analysis. The results of a repeated-measures ANOVA indicated no significant difference between Thai and AE accuracy in identifying stress location [main effect of Group, F(1, 28) = .521, p = .476]. Figure 6-6 shows close-to-ceiling performance by both language groups: Thai 89.36% (SE 2.25) and AE 91.65% (SE 2.25). Participants were significantly more accurate in identifying stress in initial position (93.65%) than in final
position (87.3%) [main effect of Position, F(1, 28) = 16.69, p = .000], and their performance varied significantly across the five syllable structures [main effect of Structure, F(4, 112) = 4.99, p = .001] (Figs. 6-7, 6-8 & 6-9). Post hoc pairwise comparisons showed that they were less accurate on CVS (87%) than on CVVS (92.6%) and CVVO (92.2%). A significant interaction between Structure and Group [F(4, 112) = 6.18, p = .000] was also found (Fig. 6-10). Follow-up tests revealed that this was due mainly to the fact that AE, but not NT, identified stress position significantly less accurately on CVO (87.33%) than on CVVS (94.80%) syllables, on CVS (90.00%) than on CVVS (94.8%) syllables, and on CVS than on CVVO (92.53%) syllables. No other significant interaction was found. Figure 6-11 reports mean percentages and standard errors of stress perception accuracy by syllabic structure, stress position and language group.

Accuracy (without Vowel Reduction Cue)

A total of 6,000 stress location responses from NT and AE were used in this stress identification analysis after the test words with reduced vowel quality in unstressed position had been removed. The results of a repeated-measures ANOVA indicated no significant difference between NT (89.1%, SE 2.36) and AE (91.46%, SE 2.36) accuracy in identifying stress location (Fig. 6-12) [main effect of Group, F(1, 28) = .502, p = .485]. Note that the previous perception analysis reported the accuracy means as NT, 89.36% (SE 2.47) and AE, 91.65% (SE 2.47). Participants were significantly more accurate in identifying stress in initial position (93.60%) than in final position (86.96%) (Fig. 6-13) [main effect of Position, F(1, 28) = 15.63, p = .000], and their performance varied significantly across the five syllable structures (Fig. 6-15) [main effect of Structure, F(4, 112) = 3.637, p = .008]. Post hoc pairwise comparisons showed that they were less accurate on CVS (87.5%) than on CVVO (92.25%). A significant interaction
between Position and Group [F(1, 28) = 5.97, p = .021] was also found. Follow-up tests revealed that both language groups were significantly more accurate on initial stress than on final stress (Fig. 6-14): Thai initial stress 94.47% vs. final 83.73% [r = .877, p = .000]; AE initial stress 92.73% vs. final 90.20% [r = .53, p = .042]. The Structure × Group interaction was also significant (Fig. 6-16) [F(4, 112) = 6.363, p = .000]. Follow-up tests revealed that only AE were significantly less accurate on CVO (86.5%) than on CVV (93%) [F(2.35, 32.89) = 5.066, p = .009]. Figure 6-17 reports mean percentages and standard errors of stress perception accuracy without the reduction cue by syllabic structure, stress position and language group. The results of stress perception with and without the vowel reduction cue differ in two ways. First, for the main effect of Structure, the pairwise comparison between CVS and CVVS was significant in the analysis of all test words but not in the analysis without the reduction cue (compare Fig. 6-9 and Fig. 6-15). Second, in addition to the Structure × Group interaction, the Position × Group interaction was significant in the analysis without the reduction cue.

Reaction Time

A total of 6,438 accurate responses with RTs below 5,000 ms, measured from stimulus onset, were used in this analysis. The RT means of each syllabic structure in each stress position were the dependent variables. A three-way repeated-measures ANOVA revealed no overall significant difference in response times between NT and AE (Fig. 6-18) [main effect of Group, F(1, 28) = .744, p = .396]; overall, more time was spent on final stress identification (3,038 ms) than on initial stress identification (2,901 ms) (Fig. 6-19) [main effect of Position, F(1, 28) = 31.429, p = .000]. The analysis also indicated that RT varied significantly across the five syllabic structures (Fig. 6-20) [main effect of Structure,
F(3.45, 96.73) = 5.68, p = .001]. Post hoc pairwise comparisons showed that the mean RT for the CVS structure (3,035 ms) was significantly longer than for CVV (2,930 ms), CVVO (2,952 ms) and CVVS (2,929 ms). Significant two-way interactions between Position and Group [F(1, 28) = 11.266, p = .002] and between Position and Structure [F(4, 112) = 3.32, p = .013] were found. Follow-up tests revealed that the Position × Group interaction arose because NT, but not AE, listeners spent more time judging word-final stress (3,131 ms) than word-initial stress (2,912 ms), as shown in Fig. 6-21. Follow-up tests on the Position × Structure interaction revealed that both groups spent more time on final stress identification of CVS (3,149 ms) than of CVV (2,970 ms), CVVO (3,033 ms) and CVVS (2,964 ms), as shown in Fig. 6-22. Figure 6-23 reports mean reaction times and standard errors by syllabic structure, stress position and language group.

Acoustic Analysis of Perception Stimuli

The results from the previous section revealed that NT did not perform well on final stress identification of CVS syllables and that their reaction times for final stress identification were longer than for initial stress identification. In addition, AE identified final stress on CVV syllables better than on other syllabic structures. To seek explanations for these observations, the stimulus words produced by the trained phonetician were submitted to the same detailed acoustic analysis described in the speech production experiment. The acoustic correlates of interest were F0 range, average F0, vowel duration and intensity. Only the 200 words without vowel reduction cues were included in this analysis, because the other 50 words were produced with an additional cue (vowel reduction) and provided only five samples for each syllabic structure × stress position condition. The descriptive statistics and dependent t-test results on the raw acoustic correlates of stressed versus unstressed syllables of the 200 words (n = 20 in each condition) produced
without the vowel reduction cue are presented in Figure 6-24. These bar graphs use a color system to represent stressed versus unstressed syllables and significant versus nonsignificant t-test results. Blue represents the acoustic values of stressed syllables and red those of unstressed syllables when the difference reached significance in the two-tailed dependent t-test (p < .025); lighter shades of blue and red represent acoustic values whose difference did not reach significance. The acoustic results showed that final stress was produced with all four cues, whereas initial stress was not produced with all four cues in every syllabic structure type. In initial stress, only the average F0 (mean F0) and intensity (dB) cues were used consistently across structure types. In addition, the magnitude of the duration contrast was much greater for final stress stimuli than for initial stress stimuli. A close inspection showed that the average F0 difference and the intensity difference between an initial stressed vowel and its neighbor (i.e., the unstressed vowel) were considerably greater than those of the final stressed vowel across the five structures. The average F0 difference between stressed and unstressed syllables is approximately 20.86 Hz greater for initial stress than for final stress (ranging from 18.3 to 23.4 Hz across the five structures). Table 6-6 shows the average F0 of stressed and unstressed syllables in each stress position and the average F0 difference between them. The average intensity difference between stressed and unstressed vowels is approximately 3.54 dB greater for initial stress than for final stress (ranging from 2.4 to 4.6 dB across the five structures). Table 6-7 shows the average intensity of stressed and unstressed syllables in each stress position and the average intensity difference between them. Note that the just noticeable
difference (JND), a measure of the ear's sensitivity to acoustic change, is about 4 Hz for frequencies below 500 Hz and about 1 dB for intensity (Ladefoged, 1996). Based on the perception accuracy results and the t-test results, both the average F0 difference and the intensity difference in the initial stress stimuli were likely contributors to the superior initial stress perception scores of both language groups. This interpretation is supported by the fact that only the average F0 and intensity parameters were consistently evident across structure types in initial stress, whereas all four acoustic parameters of stress were present in the final stress stimuli; yet NT as well as AE were more accurate at identifying initial stress than final stress (93.6% vs. 86.96% in the stress perception score without vowel reduction). In addition, average F0 contrast was speculated to be the most important of the four cues measured in stress perception, since the difference in average F0 contrast between initial and final stress was much larger (20.5 Hz, or about 5 times the JND) than that of the other parameters. Furthermore, NT might have been affected more than AE by the difference in average F0 contrast between initial and final stress, because the gap between their initial and final stress accuracy scores (94.47% vs. 83.73%) was much larger than that of the AE control group (92.73% vs. 90.20%). Finally, NT spent more time judging word-final stress (3,131 ms) than word-initial stress (2,912 ms), which further suggests that NT had difficulty processing final stress. However, it should be mentioned that in final stress identification NT were least accurate on the CVS structure (with the longest RTs) and AE were least accurate on the CVO structure. Although the average F0 difference of these structures (xxCVS = 19.0 Hz, xxCVO = 20.6 Hz) was higher than that of the other structures with
long vowels (xxCVVO = 15.8 Hz, xxCVVS = 12.5 Hz and xxCVV = 11.2 Hz), the lower accuracy on these short-vowel structures implies that duration might compensate for a low average F0 contrast in final stress identification, and that an F0 difference of about 20 Hz alone might be too small to signal stress in the absence of a durational contrast. This finding is interesting insofar as it suggests a trading relation between these two acoustic correlates of stress. To investigate how the four acoustic parameters of stressed versus unstressed syllables interact and how they affected stress perception (e.g., to evaluate the hypothesis that NT relied heavily on the average F0 cue and to seek other explanations for the initial stress preference), two statistical analyses were conducted.

Correlation: Acoustic Data and Stress Perception Accuracy Score

Correlation tests were conducted to examine the relationships among the acoustic parameters of the perception stimuli, as well as the association between each acoustic parameter and the stress identification score of each language group. Four acoustic parameters, duration ratio, F0 range ratio, mean F0 ratio and mean intensity difference (dB), were correlated with the stress perception scores for each stress position (initial and final) and each language group (NT and AE). Each correlation was based on 100 cases, pairing the stress perception score of each word (maximum = 15) with its acoustic values. Significant correlations were found only among the acoustic ratios of the final stress perception stimuli. A positive relationship was found between duration ratio and intensity difference [r = .233, p = .02] at p < .05, and between F0 range ratio and intensity difference [r = .262, p = .009] at p < .01. This suggests that when the phonetician produced final stress with a greater intensity difference, the F0 range ratio and duration ratio also tended to increase. However, duration ratio was negatively correlated with average
F0 ratio [r = −.674, p = .000]. This indicates a strong tendency for the average F0 ratio to decrease as the duration ratio increased. In other words, when stressed vowels were lengthened, the phonetician tended to produce a smaller average F0 contrast on the final stressed vowel. The following are the correlation results between the acoustic ratios and the stress identification scores. No significant correlation between any of the four acoustic parameters and the stress perception accuracy scores of AE or NT was found for initial stress position. For final stress, however, the following significant correlations were found. Pearson correlations showed positive correlations between AE stress perception accuracy scores and duration ratio [r = .342, p = .000] at p < .01, and between the scores and F0 range ratio [r = .222, p = .026] and intensity difference [r = .201, p = .044] at p < .05. This shows that AE stress identification scores were likely to increase the most as the duration ratio of final stress increased, followed by F0 range ratio and intensity difference. For Thai final stress, Pearson correlations showed positive correlations between Thai stress perception accuracy scores and intensity difference [r = .389, p = .000] at p < .01, and between the scores and duration ratio [r = .311, p = .002]. These results show that the final stress accuracy score of NT appeared to increase as intensity difference and duration ratio increased. No other correlations were significant. In sum, significant correlations were found only for final stress perception scores, and only with some of the four acoustic parameters. AE stress identification scores appeared to increase the most when the duration ratio of final stress increased, followed by F0 range ratio and intensity difference. NT final stress accuracy scores tended to increase as duration ratio and intensity difference increased. Since no significant correlations
were found for initial stress, an important acoustic cue accounting for the successful performance in initial stress identification was not revealed. The correlation results provided no direct evidence for the interpretation, based on the t-test results, that NT relied heavily on the average F0 cue, especially in initial stress identification. Lastly, the correlation results suggested that when the phonetician produced final stress with a greater intensity difference, the F0 range and duration ratio also increased. However, an extended vowel duration contrast appeared to have an inverse relationship with average F0 contrast: the longer the stressed vowel, the lower the average F0 ratio. This implies that it could be difficult to lengthen the final vowel and raise average F0 at the same time, or that this pattern could reflect final F0 lowering in English falling intonation.

Stepwise Regression: Acoustic Data and Stress Perception Accuracy Score

An analysis was conducted to examine which acoustic properties NT and AE relied on when identifying stress location as produced by the phonetician. Four separate stepwise regressions were conducted, one for each stress position (initial and final) for each language group (NT and AE). Participants' stress identification score was the dependent variable, and four acoustic parameters, duration ratio, F0 range ratio, mean F0 ratio and mean intensity difference (dB), were the independent variables. A total of 200 cases (100 cases from each stress position), drawn from the speech perception stimuli without the vowel reduction cue, were included in this analysis. Since all the words were produced by one talker, all of the acoustic data were used in the analysis. See Table 6-8 for a summary of the regression results. The percentages in parentheses indicate the percentage of variance explained by each predictor. The first predictor listed
under each condition (Position × Group) is the primary predictor of the model; the predictors in the following lines add further predictive power to the model. For initial stress, there were no significant predictors for either AE or NT, suggesting that none of the acoustic parameters measured accounted for a significant amount of variance in their stress perception scores. This may have been partly due to a lack of variability in the perception scores (i.e., initial stress perception accuracy was very high, approximately 94%, or about 14 of 15 points), which made it difficult to relate the scores to the predictors. The AE final stress perception results, based on 100 cases, showed that a model with just one predictor was statistically reliable, F(1, 98) = 12.990, p = .000, with a poor model fit, R² = .117. The only predictor in this model, duration, accounted for 12% of the variance: AE were more accurate in identifying final stress as the duration of the stressed vowels increased. The NT final stress perception results, based on 100 cases, showed that a three-predictor model, excluding F0 range, was statistically reliable, F(3, 96) = 10.173, p = .000, with a fair model fit, R² = .241. The full model accounted for 24% of the variance. The strongest predictor was intensity (15% of the variance), followed by duration (an additional 5%) and mean F0 (an additional 4%). NT were more accurate in identifying final stress as the intensity and duration of the stressed vowels increased but as the mean F0 of the stressed vowels decreased. In sum, the stepwise regression results showed that in final stress perception, duration ratio was the only important predictor for AE, whereas NT relied on intensity, duration ratio and mean F0 ratio. The final regression results
supported the interpretation that final CVO was the most difficult structure for stress identification by AE because CVO had the shortest vowel among the five syllabic structures. Similarly, the regression results for final stress identification suggested that NT scored lowest on CVS syllables in final stress position because this structure had a very low average intensity difference (as shown in Table 6-7) and was relatively short in duration compared with other structures. In addition, in the NT final stress regression model, intensity and duration ratio had a positive relationship with the stress perception accuracy score, while mean F0 was negatively related to these scores. No predictors were found for the initial stress perception models; the regression results did not offer a specific explanation for the initial stress preference of both language groups. In conclusion, this section has presented various aspects of the stress perception results. First, the stress perception accuracy scores of NT and AE were presented. The accuracy analyses were divided into overall stress perception scores and stress perception scores without vowel reduction cues. Both groups showed excellent stress identification scores, with a higher accuracy rate for initial stress than for final stress. NT showed the most difficulty in CVS final stress identification, while AE showed the least accurate scores on CVO in final stress identification. The stress perception accuracy analysis without the vowel reduction cue revealed that NT were not as sensitive to the vowel reduction cue as AE. The second analysis used mean RT as the dependent variable. NT spent more time identifying final stress than initial stress, and CVS in final stress position showed the longest reaction time compared with the other syllabic structures. There was no significant difference in mean RTs between initial and final stress identification by AE.
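The RT analysis summarized above was restricted to correct responses faster than 5,000 ms (measured from stimulus onset), with per-condition means serving as the dependent variables. A minimal Python sketch of that filtering step follows. The `Trial` record and its field names are hypothetical, and the sketch assumes that means are computed per listener for each Position × Structure cell before they enter the repeated-measures ANOVA.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, NamedTuple, Tuple


class Trial(NamedTuple):
    """One hypothetical stress-identification trial."""
    listener: str
    group: str        # "NT" or "AE"
    position: str     # intended stress: "initial" or "final"
    structure: str    # e.g., "CVS", "CVVO"
    correct: bool
    rt_ms: float      # measured from stimulus onset


def rt_cell_means(
    trials: Iterable[Trial], cutoff_ms: float = 5000.0
) -> Dict[Tuple[str, str, str], float]:
    """Mean RT per (listener, position, structure), using correct, sub-cutoff trials only."""
    cells = defaultdict(list)
    for t in trials:
        # Incorrect responses and responses at or above the cutoff are discarded.
        if t.correct and t.rt_ms < cutoff_ms:
            cells[(t.listener, t.position, t.structure)].append(t.rt_ms)
    return {key: mean(rts) for key, rts in cells.items()}


# Illustrative trials only (not study data).
demo = [
    Trial("T01", "NT", "final", "CVS", True, 3000.0),
    Trial("T01", "NT", "final", "CVS", True, 3200.0),
    Trial("T01", "NT", "final", "CVS", False, 2500.0),  # incorrect: excluded
    Trial("T01", "NT", "final", "CVS", True, 6000.0),   # too slow: excluded
    Trial("T01", "NT", "initial", "CVS", True, 2800.0),
]
print(rt_cell_means(demo))
```

Averaging within each listener before the ANOVA keeps listeners with many fast, correct trials from dominating a cell, which matches the use of participant as the repeated-measures unit.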
The last analysis was a detailed acoustic analysis of the speech perception stimuli produced by a phonetician. The acoustic values were calculated in two ways: raw acoustic values of stressed versus unstressed syllables, and ratio/difference values. T-test results showed that final stressed words were produced with significantly different values on all acoustic parameters between stressed and unstressed syllables, while only average F0 and intensity were consistently different in initial stressed words. However, initial stress was produced with a greater average F0 contrast and intensity difference than final stress. Given this magnitude, average F0 was speculated to carry the most weight in initial stress perception by both groups. Correlation results showed that AE final stress identification scores tended to increase most as the duration ratio of final stress increased, followed by the F0 range ratio and the intensity difference. NT final stress accuracy scores tended to increase as the intensity difference and the duration ratio increased, in that order. No significant correlations were found for initial stress. In addition, it was found that the phonetician produced final stress with increased intensity difference, F0 range and duration ratio. Finally, stepwise regression results showed that the duration ratio was the only important predictor of final stress perception by AE, while three predictors (intensity, duration and mean F0 ratio) were included in the model of final stress identification by NT. The intensity and duration ratio were positively related to stress perception accuracy, while mean F0 was negatively correlated with the final stress scores. The significant predictors in the final stress regression models explained why CVO and CVS in final stress position were the most difficult structures for AE and NT, respectively. Similar to the correlation results above, none of the predictors were found
significant for the initial stress perception models. Unlike the t-test results, both the correlation and the stepwise regression results failed to identify the most important predictor of the superior initial stress identification scores. This might be because average F0 and intensity difference both contributed to the excellent performance in initial stress identification. The next section presents the combined findings from stress production and perception.

Relation between Speech Perception and Production

The relation between lexical stress perception and production of English nonwords was investigated in two ways. First, the overall correlation between perception and production accuracy was examined. Figures 6-25 and 6-26 show scatterplots of overall production and perception accuracy percentages for NT and AE, respectively. A Pearson correlation showed a positive correlation between stress perception and production in the Thai group, although it was not statistically significant at p < .05 [r = .488, p = .065]; no relation was observed in the AE group [r = .216, p = .439]. Second, the relationship between perception and production for specific syllabic structures was investigated. All responses from the production task, and only those perception responses drawn from the 12 matching word types, were analyzed across the two stress positions and five syllabic structures. Significant correlations were found only for the Thai group. In word-initial position (Fig. 6-27), Spearman's rho revealed a significant correlation for the CVV structure [r = .549, p = .034]. In word-final position (Fig. 6-28), CVV also showed a significant correlation between perception and production [Pearson r = .584, p = .022; Spearman's rho = .706, p = .003 (p < .01)].
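The correlation tests above can be sketched in a few lines of Python. Pearson's r is tested with t = r√(n - 2)/√(1 - r²) on n - 2 degrees of freedom, and Spearman's rho is Pearson's r computed on ranks, with tied values receiving their average rank. The sketch assumes n = 15 participants per group, which is consistent with the 15 Thai speakers analyzed in Chapter 7 and reproduces the reported p-values; the two-tailed p-value is obtained by integrating the t density with Simpson's rule so that no statistics library is needed. All function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend over a run of ties
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


def t_from_r(r, n):
    """t statistic (df = n - 2) for testing the null hypothesis of zero correlation."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def t_two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value of Student's t via Simpson's rule on the density (steps must be even)."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = abs(t) / steps
    total = pdf(0.0) + pdf(abs(t))
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    return 1 - 2 * total * h / 3  # 1 minus the central area between -|t| and |t|


# Thai group, overall accuracy: r = .488 with an assumed n = 15
t = t_from_r(0.488, 15)
print(round(t, 3), round(t_two_tailed_p(t, 13), 3))  # about 2.016 and .065
```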
The acoustic results of stress production and perception revealed that NT were not strongly influenced by Thai tonal rules when producing English lexical stress, as no consistent evidence of such influence was found across syllabic structures. NT relied more on duration and intensity contrasts when producing stress in both positions. In the stress perception task, t-test results suggested that average F0 contrast and intensity difference were most likely the important acoustic cues that NT attended to in initial stress perception. In final stress perception, the regression results showed that intensity was the most important predictor, followed by duration and average F0. The pattern of reliance on acoustic parameters thus appears simple for NT: duration and intensity were used primarily for both stress perception and production. In contrast, for AE, average F0 was the strongest predictor of production accuracy for word-final stress, while the significant predictors of word-initial stress production accuracy varied across syllabic structures (intensity, duration and average F0). The relatively low R² values of the regression models suggested that AE also produced stress with other acoustic parameters not measured in this study. In stress perception, AE relied on vowel reduction, as suggested by the perception accuracy scores. As with NT, F0 and intensity might have contributed to their success in initial stress perception, as suggested by the t-test results. Regression results revealed no significant predictor of initial stress perception scores, and duration was the only significant predictor of final stress perception. The AE pattern of reliance on acoustic parameters thus appears more complex than that of NT. The combined results suggested that AE relied on multiple acoustic properties in stress production and perception, including suprasegmental and segmental cues. There was strong evidence
that average F0 was the primary suprasegmental property on which AE relied for stress production and perception.

Overall Summary of Results

The first part of the stress production results reported stress production accuracy scores as determined by native American English judges. The prediction that AE would produce stress better than NT, as shown by higher group stress production accuracy scores, was not borne out: NT and AE production scores were similar, approximately 70% in both groups. In addition, the prediction that NT would produce final stress better than initial stress was not borne out. Both groups produced initial stress (73%) more accurately than final stress (67%). With regard to syllabic structure, CVO and CVVO syllables were predicted to be the most difficult for NT to assign stress to, while no specific prediction was made for AE. This prediction was partially borne out. Syllabic structure conditioned each group's performance in a similar way. For both groups, CVO showed the lowest accuracy rate (62%) among the five structures. In addition, stress was produced significantly less accurately on CVV (69%) than on CVVS (75%). It was concluded that CVO is the most difficult syllabic structure on which to implement stress, CVV is less difficult, and CVVS is the easiest. Finally, the prediction that NT would experience more difficulty when producing initial stress on CVO and CVVO syllables was not borne out. The second part of the stress production results reported acoustic data. NT were predicted to rely heavily on the duration parameter to indicate final stress, while no specific prediction was made for AE. Acoustic analyses were conducted on three selected syllabic structures, CVO, CVVO and CVVS, in both stress positions. Overall, syllabic structure and stress position affected the four acoustic parameters (average
F0, F0 range, duration and intensity) produced by both groups of speakers in a similar way. That is, they produced initial stress with a larger F0 range, higher mean F0 and greater intensity than final stress. However, they produced final stress with a larger duration ratio than initial stress. Thus, the prediction that NT would use duration contrast to signal final stress was borne out. Duration was also the main acoustic correlate of final stress among AE. To examine which acoustic correlate(s) of lexical stress the judges relied on when identifying the location of stress produced by the two language groups, stepwise regression analyses were performed. It was predicted that the judges would rely on multiple acoustic cues. This prediction was borne out. Stepwise regression results showed that the native listener judges relied on different cues to identify stress produced by AE and NT. When evaluating stress produced by AE, the judges used various acoustic parameters, relying most on intensity, then average F0, then vowel duration for initial stress, and most on average F0, then intensity, then vowel duration for final stress. In contrast, when identifying stress produced by NT, the native judges relied most on duration for both initial and final stress. Intensity and average F0 emerged as the second and third strongest predictors, respectively, for initial stress, while average F0 was the second strongest and intensity the weakest predictor for final stress produced by NT. The stress perception experiments yielded two types of stress identification accuracy scores (with and without vowel reduction) and RT results. Concerning the overall stress identification score, both groups were predicted to show a ceiling effect, with no significant difference between NT and AE scores. These predictions were borne out. The results of stress identification including all test items (7,500) revealed no
significant difference in perceptual accuracy: NT identified 89% of the stressed syllables correctly, and AE were 92% accurate. With regard to stress position, NT were predicted to identify final stress better than initial stress, while AE were predicted to identify initial stress better than final stress. The first prediction was not borne out, as NT identified initial stress more accurately than final stress; in fact, both NT and AE identified initial stress more accurately than final stress. With regard to syllabic structure, NT were predicted to be less accurate at identifying stress on CVVO and CVO syllables than on syllables with other structures, especially in initial stress position. This prediction was not borne out. Syllabic structure did not affect NT perceptual accuracy, but it did affect AE. With all test items, AE identified stressed CVO syllables (87%) less accurately than stressed CVVS syllables (95%), and stressed CVS syllables (90%) less accurately than stressed CVVS (95%) and CVVO (92%) syllables. The analysis without vowel reduction items (6,000) revealed that AE identified CVO (86%) less accurately than CVV syllables (93%). The change in stress identification patterns between the two types of accuracy scores suggested that AE relied on the vowel reduction cue, whereas NT relied on it less. Predictions regarding RT were made on the same factors: group, stress position and syllabic structure. First, it was predicted that, overall, AE would have faster RTs than NT. This prediction was not borne out; there was no significant difference in overall RT between AE and NT. Second, it was predicted that NT would identify final stress more quickly than initial stress and AE would identify initial stress more quickly than final stress. This set of predictions was not borne out. RT results revealed that only NT, but not AE, identified initial stress more quickly than final stress. Third, NT were predicted to spend more time identifying stressed CVVO and CVO syllables than stressed
syllables with other structures, especially in initial stress position. This prediction was not borne out either. The results showed that both groups spent more time on word-final stressed CVS syllables than on word-final stressed CVV, CVVO and CVVS syllables. A detailed acoustic analysis of the speech perception stimuli produced by a phonetician revealed that final stress on all five structures was produced with several acoustic parameters that differed significantly between stressed and unstressed syllables. However, only average F0 and average intensity were consistently different in the initial stress test stimuli. This finding is surprising given that initial stress was identified more accurately than final stress by both language groups. Detailed examination showed that the average F0 contrast and intensity difference in the initial stress perception stimuli were larger than those in the final stress stimuli. Although no specific prediction was made as to which acoustic parameters AE and NT would rely on, the regression analyses suggested that AE relied solely on duration to identify final stress, while NT relied on intensity, duration and average F0. Stepwise regression did not yield any significant predictor of initial stress perception scores for either group. This may have been due to a ceiling effect; initial stress perception accuracy scores were very high (above 90%) for both AE and NT. However, since average F0 and intensity were the two acoustic dimensions that were significantly differentiated for initial stress, and since the magnitude of the differentiation was greater than for final stress, one might speculate that these two acoustic parameters were the ones NT relied on in their perception of initial stress. The overall stress perception and production accuracy scores of each individual in each language group and syllabic structure were investigated to examine the
relationship between production and perception. It was predicted that NT who excelled in the stress identification task would show similar performance in the stress production task. This prediction was borne out: a positive correlation (approaching significance) between the overall perception and production accuracy scores was found only for NT. The next research question concerned the relation between perception and production for specific syllabic structures. No specific prediction was made. The results showed a significant correlation for the CVV structure in both word-initial and word-final position, but only for NT. Combining the acoustic results from both experiments revealed the overall relation between the acoustic parameters employed in speech perception and production. The acoustic data suggested a possible direct relationship between stress perception and production by NT. NT appeared to rely heavily on duration and intensity for stress production. In the absence of a strong F0 cue, as in final stressed words, NT relied most on intensity, as suggested by the regression results on the final stress scores. NT also showed considerable reliance on average F0 contrast in initial stress perception, as suggested by the t-test results. In contrast, the relationship between the acoustic cues used in stress perception and production by AE was more complex, and integrated acoustic parameters might have been employed. AE appeared to rely most on average F0 in both stress production and perception. However, other acoustic parameters, including duration, intensity, vowel reduction and parameters not measured here, might also have been employed by AE. This chapter reported the overall performance of stress production and stress perception by both language groups. The analyses of the effects of syllabic structure and stress position showed that, overall, NT and AE perceived and produced stress similarly. However, detailed acoustic
results showed subtle differences. Although the present chapter answered many research questions, the role of Thai tone assignment in L2 lexical stress production has not yet been directly addressed. This question is taken up in the next chapter.
Figure 6-1. Overall mean percentage and standard errors of stress production score by NT and AE. [Figure: score (%) by language group (Thai, AE).]
Figure 6-2. Mean percentage and standard errors of stress production score by each stress position by both language groups
Figure 6-3. Mean percentage and standard errors of stress production score by stress position and language group. [Figure: score (%) by stress position (Initial, Final) x language group (Thai, AE).]
Figure 6-4. Mean percentage and standard errors of stress production score by syllabic structures of both stress positions by both language groups
Figure 6-5. Mean percentage and standard errors of stress production score by syllabic structure by stress position and language group
Table 6-1. Results of three-way ANOVAs on four acoustic correlates
Main effects (Position, Structure, Group)
Duration: Position F(1, 118) = 285.84, p = .000 (mean 1 = 0.863, mean 2 = 2.32); Structure F(2, 209.173) = 20.023, p = .000 (all three structures differ: 1 = 1.57, 2 = 1.337, 3 = 1.869); Group non sig.
F0 range: Position F(1, 118) = 71.517, p = .000 (mean 1 = 1.973, mean 2 = 1.660); Structure non sig.; Group non sig.
Mean F0: Position F(1, 118) = 5.436, p = .000 (mean 1 = 1.25, mean 2 = 1.076); Structure F(2, 228.074) = 4.156, p = .018 (1 higher than 2); Group F(1, 118) = 6.36, p = .013.
Mean dB: Position F(1, 118) = 44.886, p = .000 (mean 1 = 3.931, mean 2 = 1.174); Structure F(2, 118) = 9.492, p = .000 (1 higher than 3; 2 higher than 3); Group non sig.
Note: Position, 1 = initial, 2 = final; Structure, 1 = CVVO, 2 = CVO, 3 = CVVS.
Table 6-1. Continued
Interactions (Position x Group, Structure x Group, Position x Structure, Position x Structure x Group)
Duration: Position x Group F(1, 118) = 4.046, p = .047; all other interactions non sig.
F0 range: Position x Group F(1, 118) = 24.06, p = .000; all other interactions non sig.
Mean F0: Position x Group F(1, 118) = 0.404, p = .048; all other interactions non sig.
Mean dB: Structure x Group F(2, 118) = 3.13, p = .046; all other interactions non sig.
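Table 6-2. Descriptive statistics of average F0 by syllabic structure x position x group (N = 60 in every cell)
inCVVO: Thai M = 1.25, SD = 0.28; AE M = 1.18, SD = 0.20
inCVO: Thai M = 1.30, SD = 0.27; AE M = 1.20, SD = 0.26
inCVVS: Thai M = 1.36, SD = 0.43; AE M = 1.20, SD = 0.23
fiCVVO: Thai M = 1.04, SD = 0.19; AE M = 1.05, SD = 0.19
fiCVO: Thai M = 1.12, SD = 0.24; AE M = 1.10, SD = 0.24
fiCVVS: Thai M = 1.09, SD = 0.20; AE M = 1.06, SD = 0.22
Note: in = initial, fi = final.

Table 6-3. Descriptive statistics of three acoustic correlates for all combinations of stress position x syllabic structure, with the mean difference of each contrastive pair and its t statistic (N = 120)
Pair 1: inCVVOdur (M = 0.94, SD = 0.37) vs. fiCVVOdur (M = 2.21, SD = 1.26); mean difference = 1.27, SD = 1.40, SE = 0.13, t(119) = 9.94, p = .000
Pair 2: inCVVOfr (M = 1.15, SD = 1.05) vs. fiCVVOfr (M = 2.57, SD = 2.57); mean difference = 1.42, SD = 2.66, SE = 0.24, t(119) = 5.84, p = .000
Pair 3: inCVVOmf (M = 1.22, SD = 0.24) vs. fiCVVOmf (M = 1.04, SD = 0.19); mean difference = 0.17, SD = 0.30, SE = 0.03, t(119) = 6.24, p = .000
Pair 4: inCVOdur (M = 0.63, SD = 0.23) vs. fiCVOdur (M = 2.04, SD = 1.36); mean difference = 1.42, SD = 1.34, SE = 0.12, t(119) = 11.59, p = .000
Pair 5: inCVOfr (M = 1.13, SD = 1.09) vs. fiCVOfr (M = 2.47, SD = 3.01); mean difference = 1.34, SD = 3.27, SE = 0.30, t(119) = 4.49, p = .000
Pair 6: inCVOmf (M = 1.25, SD = 0.27) vs. fiCVOmf (M = 1.11, SD = 0.24); mean difference = 0.14, SD = 0.34, SE = 0.03, t(119) = 4.56, p = .000
Pair 7: inCVVSdur (M = 1.02, SD = 0.40) vs. fiCVVSdur (M = 2.72, SD = 1.54); mean difference = 1.69, SD = 1.66, SE = 0.15, t(119) = 11.14, p = .000
Pair 8: inCVVSfr (M = 1.17, SD = 1.19) vs. fiCVVSfr (M = 2.40, SD = 2.99); mean difference = 1.23, SD = 3.28, SE = 0.30, t(119) = 4.11, p = .000
Pair 9: inCVVSmf (M = 1.28, SD = 0.35) vs. fiCVVSmf (M = 1.07, SD = 0.21); mean difference = 0.21, SD = 0.37, SE = 0.03, t(119) = 6.09, p = .000
Note: in = initial, fi = final, dur = duration, mf = average F0, fr = F0 range. Mean differences are reported as absolute values.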
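Each t in Table 6-3 is a paired-samples statistic, so it can be recovered from the summary columns as t = mean difference / (SD / √N) with df = N - 1. The minimal Python sketch below (the function name is hypothetical) reproduces several rows; any small discrepancies reflect the two-decimal rounding of the mean-difference and SD columns.

```python
import math


def paired_t_from_summary(mean_diff, sd_diff, n):
    """Return (SE, t, df) for a paired-samples t-test from summary statistics."""
    se = sd_diff / math.sqrt(n)
    return se, mean_diff / se, n - 1


# (pair, mean difference, SD of differences) taken from Table 6-3; N = 120
rows = [(1, 1.27, 1.40), (4, 1.42, 1.34), (5, 1.34, 3.27), (7, 1.69, 1.66)]
for pair, mean_diff, sd_diff in rows:
    se, t, df = paired_t_from_summary(mean_diff, sd_diff, 120)
    print(f"pair {pair}: SE = {se:.2f}, t({df}) = {t:.2f}")
```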
Table 6-4. Summary of stepwise regression results for AE and NT for each syllabic structure (predictors with the percentage of variance explained)
CVVO
AE initial stress: R² = .285, p = .002; duration (13%), mean F0 (10%), intensity (5%); n = 127
NT initial stress: R² = .462, p = .000; intensity (22%), duration (17%), mean F0 (7%); n = 130
AE final stress: R² = .518, p = .000; intensity (33%), mean F0 (18%), duration (6.5%); n = 127
NT final stress: R² = .358, p = .000; intensity (19%), duration (8.6%), mean F0 (8.2%); n = 128
CVO
AE initial stress: R² = .345, p = .025; mean F0 (20%), duration (10%), intensity (3.5%); n = 101
NT initial stress: R² = .610, p = .000; duration (37%), intensity (6.9%), mean F0 (6.9%); n = 103
AE final stress: R² = .603, p = .000; mean F0 (37%), duration (14.8%), intensity (8.3%); n = 96
NT final stress: R² = .508, p = .000; duration (27%), mean F0 (15.8%), intensity (7.8%); n = 100
CVVS
AE initial stress: R² = .391, p = .000; intensity (39%); n = 57
NT initial stress: R² = .413, p = .000; intensity (25%), mean F0 (10%), duration (6.6%); n = 55
AE final stress: R² = .665, p = .000; mean F0 (39%), duration (20.7%), intensity (7%); n = 85
NT final stress: R² = .639, p = .000; duration (29%), mean F0 (32.8%), intensity (2.2%); n = 84

Table 6-5. Summary of predictors for each stress position (ranked)
AE initial stress: 1 intensity (47.5%); 2 mean F0 (30%); 3 duration (23%)
NT initial stress: 1 duration (60.6%); 2 intensity (53.9%); 3 mean F0 (23.9%)
AE final stress: 1 mean F0 (94%); 2 intensity (48.3%); 3 duration (42%)
NT final stress: 1 duration (64.6%); 2 mean F0 (56.8%); 3 intensity (29%)
Figure 6-6. Overall mean percentage and standard errors of stress perception accuracy by NT and AE
Figure 6-7. Mean percentage and standard errors of stress perception accuracy by each stress position by both language groups
Figure 6-8. Mean percentage and standard errors of stress perception accuracy by stress position by NT and AE
Figure 6-9. Mean percentage and standard errors of stress perception accuracy by syllabic structures of both stress positions by both language groups
Figure 6-10. Mean percentage and standard errors of stress perception accuracy by syllabic structure by language group
Figure 6-11. Mean percentage and standard errors of stress perception accuracy by syllabic structure by stress position and language group
Figure 6-12. Overall mean percentage and standard errors of stress perception accuracy without the vowel reduction cue by NT and AE
Figure 6-13. Mean percentage and standard errors of stress perception accuracy without the vowel reduction cue by each stress position by both language groups
Figure 6-14. Mean percentage and standard errors of stress perception accuracy without the vowel reduction cue by stress position by NT and AE
Figure 6-15. Mean percentage and standard errors of stress perception accuracy without the vowel reduction cue by syllabic structure of both stress positions and both language groups
Figure 6-16. Mean percentage and standard errors of stress perception accuracy without the vowel reduction cue by syllabic structure by language group
Figure 6-17. Mean percentage and standard errors of stress perception accuracy without the vowel reduction cue by syllabic structure by stress position and language group
Figure 6-18. Overall mean percentage and standard errors of stress perception accuracy by NT and AE
Figure 6-19. Mean percentage and standard errors of stress perception accuracy by each stress position by both language groups
Figure 6-20. Mean and standard errors of reaction time by syllabic structures of both stress positions by AE
Figure 6-21. Mean and standard errors of reaction time by stress position by NT and AE
Figure 6-22. Mean and standard errors of reaction time by syllabic structure by stress position of both language groups
Figure 6-23. Mean and standard errors of reaction time by syllabic structure by stress position and by NT and AE
Figure 6-24. Series of four acoustic parameters in stressed and unstressed syllables of each syllabic structure and stress position combination. A) CVVO initial stress, B) CVVO final stress, C) CVO initial stress, D) CVO final stress, E) CVVS initial stress, F) CVVS final stress, G) CVS initial stress, H) CVS final stress, I) CVV initial stress and J) CVV final stress.
Table 6-6. Descriptive statistics of average F0 values of stressed and unstressed syllables and their difference in each structure x position condition (xx marks the unstressed syllable, e.g., CVVOxx = initial stress on CVVO; values are Mean (SD))
Structure x Position: Stressed (Hz); Unstressed (Hz); Mean difference (Hz)
CVVOxx: 126.1 (5.6); 87.7 (2.4); 38.3 (5.8)
xxCVVO: 113.4 (6.4); 97.6 (3.4); 15.8 (6.0)
CVOxx: 126.0 (5.6); 87.1 (2.3); 38.9 (6.2)
xxCVO: 117.1 (6.1); 96.4 (2.8); 20.6 (6.5)
CVVSxx: 122.2 (3.8); 88.1 (4.1); 34.1 (5.0)
xxCVVS: 110.6 (6.0); 98.1 (3.2); 12.5 (7.4)
CVSxx: 126.4 (7.2); 88.8 (2.6); 37.5 (7.2)
xxCVS: 116.2 (4.2); 97.2 (3.8); 19.0 (5.1)
CVVxx: 126.4 (5.0); 91.8 (4.1); 34.6 (6.0)
xxCVV: 110.2 (5.8); 99.0 (3.1); 11.2 (7.1)

Table 6-7. Descriptive statistics of intensity values of stressed and unstressed syllables and their difference in each structure x position condition (values are Mean (SD))
Structure x Position: Stressed (dB); Unstressed (dB); Mean difference (dB)
CVVOxx: 71.0 (1.5); 63.0 (2.7); 8.0 (2.9)
xxCVVO: 69.6 (2.3); 65.7 (3.7); 3.9 (4.4)
CVOxx: 73.0 (1.7); 64.0 (1.9); 9.0 (2.2)
xxCVO: 71.8 (1.2); 65.1 (2.6); 6.6 (2.3)
CVVSxx: 70.7 (1.4); 62.8 (1.7); 7.9 (2.2)
xxCVVS: 69.3 (1.3); 66.0 (2.1); 3.3 (1.8)
CVSxx: 70.8 (1.5); 64.8 (2.5); 6.0 (2.1)
xxCVS: 68.3 (1.9); 64.8 (2.6); 3.5 (2.5)
CVVxx: 75.6 (1.6); 65.1 (2.2); 10.5 (1.7)
xxCVV: 73.5 (1.9); 67.0 (2.6); 6.4 (2.0)

Table 6-8. Summary of stepwise regression results for AE and NT stress perception scores
AE initial stress: N/A (no significant predictors); n = 100
NT initial stress: N/A (no significant predictors); n = 100
AE final stress: R² = .117, p = .000; duration (12%); n = 100
NT final stress: R² = .241, p = .000; intensity (15%), duration (5%), mean F0 (4%); n = 100
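The ratio and difference values used in the perception-stimulus analysis can be illustrated with the group means in Tables 6-6 and 6-7. The Python sketch below assumes that a ratio is the stressed value divided by the unstressed value and that a difference is the stressed value minus the unstressed value; the helper names are hypothetical. Because it works from table means, the differences agree with the mean-difference columns up to rounding (the mean of differences equals the difference of means), whereas a ratio of means only approximates an average of per-token ratios. The check confirms that, for every structure, the initial-stress F0 and intensity differences exceed the final-stress ones.

```python
# (stressed, unstressed) means from Tables 6-6 and 6-7,
# given as (initial-stress pair, final-stress pair) for each structure.
F0_HZ = {
    "CVVO": ((126.1, 87.7), (113.4, 97.6)),
    "CVO": ((126.0, 87.1), (117.1, 96.4)),
    "CVVS": ((122.2, 88.1), (110.6, 98.1)),
    "CVS": ((126.4, 88.8), (116.2, 97.2)),
    "CVV": ((126.4, 91.8), (110.2, 99.0)),
}
INTENSITY_DB = {
    "CVVO": ((71.0, 63.0), (69.6, 65.7)),
    "CVO": ((73.0, 64.0), (71.8, 65.1)),
    "CVVS": ((70.7, 62.8), (69.3, 66.0)),
    "CVS": ((70.8, 64.8), (68.3, 64.8)),
    "CVV": ((75.6, 65.1), (73.5, 67.0)),
}


def ratio(stressed, unstressed):
    """Assumed ratio measure: stressed value divided by unstressed value."""
    return stressed / unstressed


def difference(stressed, unstressed):
    """Difference measure: stressed minus unstressed (dB for intensity)."""
    return stressed - unstressed


def contrasts(table, measure):
    """Apply a contrast measure to the initial- and final-stress pairs of each structure."""
    return {s: (measure(*initial), measure(*final)) for s, (initial, final) in table.items()}


for name, table in [("F0 (Hz)", F0_HZ), ("intensity (dB)", INTENSITY_DB)]:
    for structure, (ini, fin) in contrasts(table, difference).items():
        print(f"{name} {structure}: initial {ini:.1f}, final {fin:.1f}")
```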
Figure 6-25. Scatterplot of the overall production and perception accuracy percentages for NT
Figure 6-26. Scatterplot of the overall production and perception accuracy percentages for AE
Figure 6-27. Scatterplot of NT production and perception accuracy percentages for the initial CVVxx syllabic structure
Figure 6-28. Scatterplot of NT production and perception accuracy percentages for the final xxCVV syllabic structure
CHAPTER 7
THAI TONE TRANSFER

This chapter focuses on the role of Thai tonal constraints in L2 lexical stress production. This research question differs from the other questions investigated in Chapter 6. In Chapter 6, the results concerning F0 showed that NT did not rely heavily on F0 when producing stress. A series of stepwise regression analyses indicated that NT relied mainly on duration and intensity contrasts when producing stress, while AE relied mostly on average F0 and intensity contrasts. Nonetheless, when only the acoustic data were considered (three-way ANOVAs), the results showed that NT employed a significantly higher average F0 than AE when producing stressed CVVS syllables. These results, however, did not indicate whether Thai tonal constraints transfer to English lexical stress production by NT. To evaluate this issue directly, two additional analyses were designed; their results are presented and discussed in this chapter.

Methodology

Research Questions and Hypotheses

The goal of this chapter is to investigate whether Thai speakers of L2 English assign Thai tones when they produce English lexical stress. It was hypothesized that NT might transfer their implicit knowledge of the Thai tone distribution rule, which is constrained by syllabic structure (see Table 7-1), and thus produce stress on the target syllable with F0 patterns (tones) consistent with this rule. Thai is a tone language whose tonal distribution depends on syllabic structure, unlike some other tone languages. The five Thai tones (mid, low, falling, high and rising) can be assigned freely to syllables with a long
vowel (CVV) or with a sonorant coda (CVS and CVVS). Syllables with an obstruent coda are more restricted: CVO can carry only low and high tones, and CVVO can bear only low and falling tones. Gandour (1979) observed that native speakers of Thai assigned Thai tones to English loanwords entering the Thai language, and he proposed tone assignment rules for such loanwords according to syllable type and syllable position. He grouped Thai syllables into two sets: smooth syllables (CVV, CVVS and CVS) and checked syllables (CVVO and CVO). In non-final position, smooth syllables bear a mid tone and checked syllables bear a high tone. In final position, smooth syllables bear a falling tone, while checked syllables bear a low tone. Although his predictions were generated from a corpus of English loanwords, that is, new words being adopted into Thai, they might also apply in the reverse direction, when native speakers of Thai learn English. That is, Thai participants were hypothesized to assign Thai tones according to Gandour's predictions for each syllabic structure and stress position when producing stress in English. These two sources of Thai tone predictions motivated the design of the English nonwords in the two lexical stress experiments. Table 7-1 presents the hypothesized F0 patterns that Thai speakers might employ when producing English lexical stress for all 12 disyllabic word types recorded in the speech production task. Gandour's predictions and the Thai tone-rule predictions are presented in separate columns, which are further divided by syllabic position: first and final (the second syllable in this study).
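The two sources of tone predictions can be encoded as a small lookup structure, which makes it easy to see where they agree. The Python sketch below encodes only the simplified rules summarized in this section (it is not a full model of Thai tone phonology), and names such as PERMITTED and gandour_tone are hypothetical. It shows that the two sources diverge in one case: for CVVO in non-final position, Gandour's loanword rule predicts a high tone, whereas the distribution rule allows CVVO to bear only low or falling tones.

```python
ALL_TONES = frozenset({"mid", "low", "falling", "high", "rising"})

# Thai tone distribution by syllable structure, as summarized in the text
PERMITTED = {
    "CVV": ALL_TONES,
    "CVS": ALL_TONES,
    "CVVS": ALL_TONES,
    "CVO": frozenset({"low", "high"}),
    "CVVO": frozenset({"low", "falling"}),
}

SMOOTH = {"CVV", "CVS", "CVVS"}
CHECKED = {"CVO", "CVVO"}


def gandour_tone(structure, final):
    """Tone predicted by Gandour's (1979) loanword rule for a syllable type and position."""
    if structure in SMOOTH:
        return "falling" if final else "mid"
    if structure in CHECKED:
        return "low" if final else "high"
    raise ValueError(f"unknown syllable structure: {structure}")


# Cases where the loanword rule predicts a tone the distribution rule does not allow
conflicts = [
    (s, "final" if final else "non-final")
    for s in PERMITTED
    for final in (False, True)
    if gandour_tone(s, final) not in PERMITTED[s]
]
print(conflicts)  # [('CVVO', 'non-final')]
```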
Materials

One hundred twenty disyllabic words (4 structures x 2 stress positions x 15 speakers) representing four syllabic structures (CVO, CVVO, CVS and CVVS) in two stress positions, produced by Thai speakers only, were included in the following analyses. To control for variation in the unstressed syllables, all selected words had unstressed syllables with long vowels (CVVO, CVVS or CVV). These words were used in two analyses: (1) two-way repeated measures ANOVAs on F0, duration and intensity, and (2) a Thai tone transcription analysis.

Two-Way Repeated Measures

This analysis was designed to test whether the acoustic parameters NT used when producing stress, as reported in Chapter 6, vary as a function of the syllabic type of the stressed syllable and the stress position. The average F0 ratio is the parameter of greatest interest, since F0 is the physical property underlying tone. However, the duration and intensity contrasts between stressed and unstressed syllables were investigated along with F0 because they were reported as the major acoustic parameters NT used when producing stress. A close examination of these two parameters might help explain why F0 played an insignificant role as a stress cue for NT. A series of two-way repeated measures ANOVAs was used to examine mean differences in the three acoustic parameters in question. The factors investigated were syllabic structure (CVO, CVVO, CVS and CVVS) and stress position (initial and final), both treated as within-subject factors.
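To show what a two-way repeated measures ANOVA computes, the following Python sketch partitions the total sum of squares for a fully within-subject design and tests each effect against its subject-by-effect interaction. It is a sketch under stated assumptions, not the software used in the study: it assumes balanced data with no missing cells and applies no sphericity correction, whereas the fractional degrees of freedom reported in this chapter (e.g., F(1.73, 24.23)) indicate that a correction such as Greenhouse-Geisser was applied. The function name rm_anova_2way and the toy data are hypothetical.

```python
from itertools import product


def rm_anova_2way(y):
    """Two-way fully within-subject ANOVA with no sphericity correction.

    y[s][a][b] is the score of subject s at level a of factor A and level b
    of factor B. Each effect is tested against its subject-by-effect
    interaction. Returns (table, components, ss_total), where table maps an
    effect name to (SS, df, F, df_error).
    """
    n, p, q = len(y), len(y[0]), len(y[0][0])
    S, A, B = range(n), range(p), range(q)
    grand = sum(y[s][a][b] for s, a, b in product(S, A, B)) / (n * p * q)
    m_s = [sum(y[s][a][b] for a, b in product(A, B)) / (p * q) for s in S]
    m_a = [sum(y[s][a][b] for s, b in product(S, B)) / (n * q) for a in A]
    m_b = [sum(y[s][a][b] for s, a in product(S, A)) / (n * p) for b in B]
    m_sa = [[sum(y[s][a][b] for b in B) / q for a in A] for s in S]
    m_sb = [[sum(y[s][a][b] for a in A) / p for b in B] for s in S]
    m_ab = [[sum(y[s][a][b] for s in S) / n for b in B] for a in A]

    comp = {
        "S": p * q * sum((m_s[s] - grand) ** 2 for s in S),
        "A": n * q * sum((m_a[a] - grand) ** 2 for a in A),
        "B": n * p * sum((m_b[b] - grand) ** 2 for b in B),
        "AxB": n * sum((m_ab[a][b] - m_a[a] - m_b[b] + grand) ** 2 for a, b in product(A, B)),
        "AxS": q * sum((m_sa[s][a] - m_s[s] - m_a[a] + grand) ** 2 for s, a in product(S, A)),
        "BxS": p * sum((m_sb[s][b] - m_s[s] - m_b[b] + grand) ** 2 for s, b in product(S, B)),
        "AxBxS": sum(
            (y[s][a][b] - m_sa[s][a] - m_sb[s][b] - m_ab[a][b]
             + m_s[s] + m_a[a] + m_b[b] - grand) ** 2
            for s, a, b in product(S, A, B)
        ),
    }
    ss_total = sum((y[s][a][b] - grand) ** 2 for s, a, b in product(S, A, B))

    def f_test(effect, error, df, df_err):
        return comp[effect], df, (comp[effect] / df) / (comp[error] / df_err), df_err

    table = {
        "A": f_test("A", "AxS", p - 1, (p - 1) * (n - 1)),
        "B": f_test("B", "BxS", q - 1, (q - 1) * (n - 1)),
        "AxB": f_test("AxB", "AxBxS", (p - 1) * (q - 1), (p - 1) * (q - 1) * (n - 1)),
    }
    return table, comp, ss_total


# Hypothetical toy data: 3 subjects x 2 positions (A) x 2 structures (B)
data = [
    [[3, 5], [4, 7]],
    [[2, 5], [4, 6]],
    [[4, 6], [5, 9]],
]
table, comp, ss_total = rm_anova_2way(data)
for effect, (ss, df, f, df_err) in table.items():
    print(f"{effect}: SS = {ss:.3f}, F({df}, {df_err}) = {f:.2f}")
```

Because each effect is tested against its own subject-by-effect error term, adding a constant to all of one subject's scores changes only the subject sum of squares and leaves every F unchanged.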
Average F0

The effect of Thai tone transfer can be assessed indirectly by comparing the average F0 values of each syllabic structure. If transfer occurred, the average F0 ratio would differ across syllabic structure types. Each Thai tone has a different F0 pattern (contour), so its onset and offset values in Hz vary. Low tone should have the lowest average F0, since its midpoint and offset are lower than those of the other tones. Words with CVVO syllables in stressed position were predicted to have the lowest F0 ratio, since CVVO can bear only low and falling tones in Thai. CVO was predicted to have a low F0 ratio as well, but not as low as CVVO, because it can bear low and high tones according to Thai tonal rules. CVVS and CVS were predicted to have a high average F0 ratio, as these stressed syllables can carry any tone in Thai. Moreover, the average F0 ratio might be affected by the position of the syllable, as predicted by Gandour's (1979) rule. Thus, all four syllabic structures were investigated for both initial and final stress. Table 7-2 presents descriptive statistics of the average F0 ratio for each syllabic structure in each stress position. The two-way repeated measures ANOVA on average F0 ratio revealed no significant difference among the four syllabic structures [main effect of Structure, F(1.73, 24.23) = 2.334, p = .124]. However, the analysis showed that Thai speakers produced initial stress with a higher average F0 ratio (1.3) than final stress (1.08) [main effect of Position, F(1, 14) = 9.171, p = .009]. A significant interaction between stress position and syllabic structure was found [F(2.93, 41.08) = 3.069, p = .039]. Follow-up tests showed that, for two syllabic structures, CVVS and CVS, the means differed significantly (p < .0125) between the two stress positions: CVVSxx
(1.43) vs. xxCVVS (1.1) [t(14) = 2.927, p = .011] and CVSxx (1.41) vs. xxCVS (0.97) [t(14) = 3.2, p = .006]. No other contrasts were significant. This result does not support the prediction that CVVO syllables would show a lower average F0 ratio than other syllabic structures such as CVVS or CVS.

Duration

The two-way ANOVA revealed a significant effect of Position [F(1, 14) = 22.898, p = .000] and a marginally significant effect of Structure [F(2.277, 31.88) = 3.137, p = .051]. The duration ratio of final stress (2.058) was larger than that of initial stress (0.727). Post hoc Bonferroni pairwise comparisons showed that the duration ratio of CVVS syllables (1.665) was significantly larger than that of CVO syllables (1.305). The two-way interaction between Position and Structure was not significant [F(2.116, 29.63) = .991, p = .387]. Table 7-3 displays the mean duration ratio of each syllabic structure by stress position. The result suggests that NT take advantage of duration contrast when it is feasible to manipulate it. Since CVVS naturally carries the longest vowel duration of the four syllable types, NT had no difficulty manipulating the duration ratio to signal stress on this syllable. CVO, which carries the shortest vowel duration of all syllable types, caused NT the most difficulty in indicating stress through duration. NT appeared to be constrained by the inherent length of this syllabic structure and were unable to modify its duration to implement stress. However, the duration ratio was neither significantly larger for CVVO than for CVO syllables nor significantly larger for CVVS than for CVS syllables, which suggests that the effect of vowel lengthening depends on the type of coda. It may be easier to manipulate vowel duration in a syllable ending with a sonorant than in one ending with a stop consonant; thus the difference between
stressed and unstressed syllables of the CVVS/CVS type is about the same. On the other hand, vowel lengthening could be less effective in a syllable ending with a stop consonant (the CVVO/CVO type); thus the difference between stressed and unstressed syllables of the CVVO/CVO type is also the same.
Intensity
Lastly, the intensity difference of the four syllabic structures was submitted to two-way ANOVAs. The results showed a significant effect of Position [F(1, 14) = 4.901, p = .044] and of Structure [F(3, 42) = 15.692, p < .001]. The average intensity difference of initial stress (1.094 dB) was significantly higher than that of final stress (-0.553 dB). The negative value for final stress shows that, on average, finally stressed syllables were produced with lower intensity than the adjacent unstressed syllable. In other words, stressed syllables in word-final position are not always produced with higher intensity than unstressed syllables, as they are in word-initial position. Table 74 displays the means of each syllabic structure in each position. The means collapsed across stress positions are as follows: CVO, -0.411 dB; CVVO, .457 dB; CVVS, 1.93 dB; and CVS, 2.967 dB. Post hoc pairwise comparisons showed that all contrasting pairs were significantly different, with the exception of the CVO vs. CVVO pair. No significant Position x Structure interaction was found [F(2.654, 37.154) = .992, p = .400].
Summary of the Two-Way Repeated Measures Results
The results in this section do not support the Thai tone transfer hypothesis in the predicted direction. CVVO and CVO syllables did not have the lowest mean F0 ratio of all the stressed syllables. Instead, their means appear similar to those of stressed CVVS syllables. Position, instead, has an effect on average F0 ratio. NT produced initially stressed syllables with a higher average F0 ratio than finally stressed syllables, especially when
they were stressing CVVS and CVS syllables. The duration ratio analysis revealed that CVVS syllables were produced with a greater vowel duration contrast than CVO syllables. This implies that NT used vowel duration contrast only when it was feasible to do so. The intensity contrast analysis showed that initial stress was produced with a higher average intensity difference than final stress. To conclude this section, there is no substantial evidence that NT were constrained by L1 tonal rules when producing stress. Syllabic structure was not an immediate factor in the differences in F0 ratio values. Stress position played a more uniform role in accounting for the differences between stressed and unstressed syllables.
Thai Tone Transcription
Another way to trace possible Thai tone transfer onto English nonword production by NT is to describe the F0 patterns. The materials for tone transcription are the same word set used in the two-way repeated measures analysis above. That analysis compared only average F0 ratios, which might not be the best representation of each syllable's F0 contour. Before tone transcription was chosen as the method for this analysis, an attempt was made to compare F0 values in hertz (10 measurement points) reported by Praat. However, the numbers alone were difficult to interpret. The values near the onset and offset of a tone were high because of the voiceless onset and coda consonants. Mid and low tones appeared to have similar contours: both start in a similar frequency range and gradually fall from the midpoint to the offset. Although their offsets end at different frequencies, the difference is very small compared to the offsets of the other tones. Moreover, some syllables carried F0 contours that could not be mapped
onto any Thai tone. Given all of these challenges, auditory evaluation was judged a more valid method for assessing Thai tone transfer.
Transcription Procedures
Three native speakers of Thai with linguistic training took part in a Thai tone mapping analysis. The first two transcribers listened to each sound file of the 80 disyllabic words used in the two-way ANOVAs above. They were asked to identify the Thai tone heard in each syllable (160 syllables in total). The six response options were the mid, low, falling, high and rising tones, and none of these. Transcribers worked at their own pace and could listen to each sound file as many times as they liked. The third transcriber, who had the most experience in tone transcription, was asked to resolve disagreements between the first two transcribers. The reported tones were then selected as follows. If two of the three listeners identified the same tone in a syllable, that tone was used for the analysis. The tone of a syllable was coded as n/a (not available) in two cases: when all three listeners reported different tones for the syllable, or when two transcribers agreed that the F0 pattern heard did not map well onto any Thai tone.
Results
The results of the Thai tone transcription are reported in Tables 75 and 76. Across the four stressed syllable types (CVO, CVVO, CVVS and CVS), the F0 patterns in initial stress position were most often mapped to the Thai high tone (51.7%). In final stress position, the observed F0 patterns were most often mapped to the falling tone (56.7%) and the mid tone (18.3%). In both positions, the F0 patterns of unstressed syllables were most often mapped to the low tone, followed by the mid tone.
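The tone-selection rule above can be written as a small decision function. The sketch below is a hypothetical illustration, not the procedure actually used in the study. The function name `resolve_tone` and the label strings are assumptions; "none" stands for the sixth response option (none of the Thai tones).

```python
from collections import Counter

# Response options available to the transcribers ("none" = no Thai tone fits).
TONES = {"mid", "low", "fall", "high", "rise", "none"}


def resolve_tone(first, second, third=None):
    """Return the tone label used for one syllable, or "n/a".

    first, second: labels from the two primary transcribers.
    third: label from the tie-breaking transcriber, needed only when the
    first two disagree.
    """
    for label in (first, second, third):
        if label is not None and label not in TONES:
            raise ValueError(f"unknown tone label: {label!r}")

    if first == second:
        # Two transcribers agreeing that no Thai tone fits -> n/a.
        return "n/a" if first == "none" else first

    if third is None:
        raise ValueError("disagreement requires the third transcriber")

    label, votes = Counter([first, second, third]).most_common(1)[0]
    # All three labels differ (no majority), or a majority chose "none" -> n/a.
    if votes < 2 or label == "none":
        return "n/a"
    return label
```

For example, `resolve_tone("low", "fall", "fall")` yields "fall", while `resolve_tone("mid", "low", "high")` yields "n/a" because no two transcribers agree.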
The distribution of Thai tones mapped onto English stress production did not support the Thai tone prediction very well (see the predictions in Table 77). In initial stress position, the tones on CVVO syllables were heard as low (5 times), falling (3 times) and high (7 times). High tone was not predicted for CVVO syllables, yet it was the tone most often assigned to them. In fact, all four syllable types were most often produced with high tone. The unstressed syllables in Table 75 were mostly produced with low tone. These uniform F0 patterns in initially stressed words explain why the F0 ratio of CVVO syllables did not contrast significantly with that of the other syllable types. In final stress position, falling tone was the tone most often assigned to all four syllabic structures. The F0 pattern on CVO syllables was heard as a falling tone 11 times (73% of the time). This contradicts the Thai tone prediction that CVO would be produced with low or high tone. Unstressed syllables of finally stressed words were reported to carry mid (25%), low (46.7%) and high (16.7%) tones. In contrast, those of initially stressed words predominantly carried low tone (63.3%). This suggests that it is more difficult to produce a lower frequency (lower pitch) on unstressed syllables word-initially. Gandour's predictions (see Table 78) were partially supported by the F0 patterns on CVO and CVVO syllables in initially stressed words, in non-final (stressed) and final (unstressed) position. However, the rest of his predictions were not supported by the data. Taken together, the results and predictions do not mean that NT produced F0 patterns on English nonwords entirely free of Thai tone influence. The data did not show strong evidence of Thai tone transfer according to the predicted syllabic structure and position. Even so, more than 50% of the data showed that F0
patterns produced by NT could be mapped to Thai tones, as reported above. The transcribers commented that some syllables were produced with obvious Thai tones. Others were produced with F0 patterns modified from Thai tones or unlike Thai tones altogether. Each Thai participant appeared to have their own way of producing F0 patterns, and individual differences account for this variation. A small sample of English nonwords produced by AE was transcribed by the most experienced tone transcriber. Their F0 productions were not readily identified as any Thai tone. This suggests that, to some degree, Thai speakers' implementation of F0 on stressed syllables is consistent with the Thai tone distribution rule.
Conclusion
In summary, the two-way repeated measures ANOVAs on 80 English disyllabic words produced by NT failed to show a difference between the average F0 ratios of CVVO or CVO syllables and those of other syllabic structures. According to the Thai tonal rule, CVVO and CVO were predicted to show lower average F0 ratios than the other syllables, which can carry any tone. Contrary to the prediction, however, the average F0 ratios of CVS and CVVS syllables differed significantly between the two stress positions. The repeated measures ANOVAs on duration ratio showed that CVVS syllables had a significantly greater duration ratio than CVO syllables. Furthermore, the mean intensity differences of the structures were significantly different from each other, except for CVO and CVVO syllables. The Thai tone transcription data did not conclusively show that NT realized F0 patterns on stressed syllables according to the Thai tone distribution rule. It appears that NT manipulated F0, intensity and duration simultaneously to signal stress, and that this manipulation was influenced by both syllabic structure and stress position.
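The follow-up comparisons reported for the F0 ratio (e.g., CVVSxx vs. xxCVVS, t(14) = 2.927) compare the same 15 speakers in two conditions. They are judged against a Bonferroni-corrected threshold of .05/4 = .0125 for four comparisons. Below is a minimal standard-library sketch of the paired t statistic and the corrected threshold. The function names are hypothetical. Obtaining the p-value from the t distribution with n - 1 degrees of freedom is left to a statistics package.

```python
import math
import statistics


def paired_t(first, second):
    """Paired-samples t statistic and degrees of freedom.

    first, second: one value per speaker (e.g., a speaker's mean F0 ratio
    for CVVSxx and for xxCVVS), listed in the same speaker order.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


def bonferroni_alpha(alpha=0.05, comparisons=4):
    """Per-comparison significance threshold after Bonferroni correction."""
    return alpha / comparisons
```

With 15 speakers, `paired_t` returns 14 degrees of freedom, matching t(14) above. A comparison counts as significant only if its p-value falls below `bonferroni_alpha()`, i.e. .0125.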
Table 71. Hypothesized F0 patterns according to Gandour's and the Thai tone predictions on English nonwords produced by Thai speakers
Code   Syllabic structure   Gandour: first   Gandour: final   Thai tone: first   Thai tone: final
3      CVV.CVVO             M                L                5                  L,F
4      CVV.CVO              M                L                5                  L,H
7      CVVS.CVVO            M                L                5                  L,F
10     CVVS.CVS             M                F                5                  5
12     CVVO.CVO             H                L                L,F                L,H
13     CVVO.CVV             H                F                L,F                5
14     CVVO.CVVS            H                F                L,F                5
17     CVO.CVS              H                F                L,H                5
19     CVO.CVVS             H                F                L,H                5
20     CVO.CVVO             H                L                L,H                L,F
22     CVS.CVV              M                F                5                  5
25     CVS.CVO              M                L                5                  L,H
Note: M = mid, L = low, F = falling, H = high and 5 = all five Thai tones.

Table 72. Descriptive statistics of average F0 ratio by syllabic structure and stress position
Structure x Position   Mean   SD     N
CVOxx                  1.37   0.35   15
xxCVO                  1.19   0.24   15
CVVOxx                 1.22   0.27   15
xxCVVO                 1.08   0.24   15
CVVSxx                 1.44   0.45   15
xxCVVS                 1.10   0.19   15
CVSxx                  1.41   0.40   15
xxCVS                  0.98   0.22   15
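This section does not formally define the F0 ratio. One plausible reading, assumed here and not confirmed by the text, is the mean F0 of the stressed vowel divided by the mean F0 of the unstressed vowel in the same word. On that reading, values above 1 indicate higher F0 on the stressed syllable, as in most cells of Table 72. Values below 1 indicate the reverse, as for xxCVS. Below is a sketch under that assumption. The function name is hypothetical, and measurement points where no F0 was found are represented as None.

```python
import statistics


def f0_ratio(stressed_hz, unstressed_hz):
    """Mean F0 of the stressed vowel divided by that of the unstressed vowel.

    Assumed definition. Each argument is a list of F0 measurements (Hz)
    taken across one vowel; points with no F0 are None and are skipped.
    """
    stressed = [f for f in stressed_hz if f is not None]
    unstressed = [f for f in unstressed_hz if f is not None]
    if not stressed or not unstressed:
        raise ValueError("each vowel needs at least one voiced F0 point")
    return statistics.mean(stressed) / statistics.mean(unstressed)
```

For example, a stressed vowel averaging 210 Hz next to an unstressed vowel averaging 160 Hz gives a ratio of 1.3125.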
Table 73. Descriptive statistics of duration ratio by syllabic structure and stress position
Structure x Position   Mean   SD     N
CVOxx                  0.64   0.26   15
xxCVO                  0.83   0.34   15
CVVOxx                 0.91   0.31   15
xxCVVO                 0.54   0.19   15
CVVSxx                 1.97   1.07   15
xxCVVS                 1.91   0.96   15
CVSxx                  2.42   1.55   15
xxCVS                  1.93   0.95   15

Table 74. Descriptive statistics of average intensity difference (dB) by syllabic structure and stress position
Structure x Position   Mean   SD     N
CVOxx                  0.22   4.72   15
xxCVO                  2.12   2.68   15
CVVOxx                 1.63   2.99   15
xxCVVO                 3.67   4.01   15
CVVSxx                 1.04   4.46   15
xxCVVS                 1.21   5.73   15
CVSxx                  2.23   4.16   15
xxCVS                  2.27   2.84   15

Table 75. Thai tone distribution of initial stress words
Stressed syllables Unstressed syllables mid low fall high rise n/a mid low fall high rise n/a CVO 1 4 8 2 CVVO 1 10 4 0 CVVO 5 3 7 0 CVVS 7 6 1 1 CVVS 6 1 8 0 CVVO 13 2 0 CVS 4 1 1 8 1 CVV 9 3 1 2 total 11 7 8 31 0 3 total 8 38 10 0 1 3 % of 60 18.3 11.7 13.3 51.7 0 5 % of 60 13.3 63.3 16.7 0 1.67 5
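Each "% of 60" row in Tables 75 and 76 is a tone count divided by the 60 stressed (or unstressed) syllables in that stress position. The short sketch below reproduces the stressed-syllable percentages of Table 75 from its totals row. The function name is hypothetical.

```python
def percent_of_total(counts, total=60):
    """Convert tone counts to percentages of `total`, rounded to 0.1."""
    if sum(counts.values()) != total:
        raise ValueError("counts do not sum to the stated total")
    return {tone: round(100 * n / total, 1) for tone, n in counts.items()}


# Totals row for stressed syllables in Table 75 (initial stress words).
stressed_initial = {"mid": 11, "low": 7, "fall": 8, "high": 31, "rise": 0, "n/a": 3}
print(percent_of_total(stressed_initial))
# prints {'mid': 18.3, 'low': 11.7, 'fall': 13.3, 'high': 51.7, 'rise': 0.0, 'n/a': 5.0}
```

The same function applied to the stressed totals of Table 76 (11, 6, 34, 5, 0, 4) gives the 56.7% falling-tone figure cited in the text.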
Table 76. Thai tone distribution of final stress words
Stressed syllables Unstressed syllables mid low fall high rise n/a mid low fall high rise n/a CVO 2 11 2 0 CVVO 8 2 4 1 CVVO 3 8 2 2 CVVS 8 6 1 0 CVVS 4 9 1 1 CVVO 9 1 3 2 CVS 7 1 6 1 CVVS 7 5 2 1 total 11 6 34 5 0 4 total 15 28 3 10 0 4 % of 60 18.3 10 56.7 8.3 0 6.67 % of 60 25 46.7 5 16.7 0 6.7

Table 77. Thai tonal rule predictions
Syllables                   Thai tones
Smooth: CVV, CVVS, CVS      mid   low   fall   high   rise
Long checked: CVVO          -     low   fall   -      -
Short checked: CVO          -     low   -      high   -

Table 78. Gandour's (1979) predictions
Syllables                   Non-final position   Final position
Smooth: CVV, CVVS, CVS      mid                  fall
Checked: CVVO, CVO          high                 low
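Tables 77 and 78 amount to two lookup rules. The first is keyed by syllable type; the second by syllable type and word position. Encoding them as data makes it simple to check a transcribed tone against each prediction. The names below are hypothetical; the tone sets come from the two tables.

```python
# Table 77: tones permitted by the Thai tonal rule for each syllable class.
THAI_RULE = {
    "smooth": {"mid", "low", "fall", "high", "rise"},
    "long_checked": {"low", "fall"},
    "short_checked": {"low", "high"},
}

SYLLABLE_CLASS = {
    "CVV": "smooth", "CVVS": "smooth", "CVS": "smooth",
    "CVVO": "long_checked", "CVO": "short_checked",
}

# Table 78: Gandour's (1979) predicted tone by syllable class and position.
GANDOUR_1979 = {
    ("smooth", "nonfinal"): "mid",
    ("smooth", "final"): "fall",
    ("checked", "nonfinal"): "high",
    ("checked", "final"): "low",
}


def allowed_by_thai_rule(structure, tone):
    """True if the Thai tonal rule permits `tone` on `structure`."""
    return tone in THAI_RULE[SYLLABLE_CLASS[structure]]


def gandour_prediction(structure, position):
    """Tone predicted by Gandour (1979); position is 'nonfinal' or 'final'."""
    cls = "smooth" if SYLLABLE_CLASS[structure] == "smooth" else "checked"
    return GANDOUR_1979[(cls, position)]
```

`allowed_by_thai_rule("CVO", "fall")` is False. This is why the falling tone heard on stressed CVO syllables in final position counts against the Thai tone prediction. `gandour_prediction("CVVO", "nonfinal")` returns "high", matching the H listed in Table 71 for CVVO-initial word types.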
CHAPTER 8
GENERAL DISCUSSION AND CONCLUSIONS
The analyses in Chapters 6 and 7 have answered the three main research questions raised in the Introduction:
- Do Thai tone assignment rules linked to syllabic structures affect Thai speakers' production and perception of English stress?
- Does the stress pattern of Thai influence Thai speakers' perception and production of lexical stress?
- What is the relation between L2 lexical stress perception and production?
The following sections discuss the production and perception of lexical stress separately. Next, the relation between speech perception and production is addressed. All the results are then brought together in light of existing L2 lexical stress models. Finally, the last section presents the conclusions and suggestions for further research.
Stress Production
The stress production experiment was designed to test the effects of syllabic structure and stress position. First, it was hypothesized that Thai speakers would have difficulty producing stress on CVVO and CVO syllables because of Thai tone distribution rules. CVVO was predicted to be the most difficult syllable for stress assignment, since it can bear only the low and falling tones in Thai. CVO was predicted to be the second most difficult, since it can bear only the low and high tones. Second, Thai speakers were hypothesized to produce final stress better than initial stress, since Thai has been reported to have a fixed final stress pattern. Overall, AE and NT produced stress on English nonwords with equal success; approximately 70% of the intended stresses were accurately
heard by native listener judges. Both groups produced initial stress (73%) more successfully than final stress (67%). Their stress production scores, as judged by native listeners, differed significantly across structures. From highest to lowest, they were CVVS (74.9%), CVV (69.3%) and CVO (61.75%). These scores indicate that CVO is the most difficult structure for stress production. Note also that NT produced the intended stress as successfully as AE, but they generally showed more effort when producing stress and needed almost twice as much recording time. Both groups obtained similar stress production scores across syllabic structures and positions. This suggests only a weak effect of the Thai tone distribution rules, which predict F0 patterns on the basis of syllabic structure. For both AE and NT, stress was produced more successfully on CVVS and CVV than on CVO syllables. A language-general cue to stress assignment such as syllable weight might predict this advantage better than transfer of prosodic features associated with L1 syllabic structure. CVVS and CVV are heavy syllables, as they have a long vowel and/or are closed by a sonorant. CVO, by contrast, is a light syllable closed by an obstruent. This interpretation is consistent with the finding that Chinese and Vietnamese learners of English preferred to stress CVS syllables over CVO (Ou, 2007). Another possible interpretation concerns vowel length as a stress cue. Both CVVS and CVV have a longer vowel than CVO, and the longer vowel attracts stress better than the shorter one. If this were true, however, the open syllable with a long vowel, CVV, should receive a higher stress identification score from native listeners than CVVS, whose vowel is slightly shorter. Yet rhyme
duration is longer in CVVS than in CVV. In addition, F0 and intensity realized on a final sonorant may make CVVS easier to hear as stressed. This explanation implies that syllable weight may be more important than vowel duration in the implementation of stress. To investigate further how NT and AE realized stress phonetically, three syllabic structures (CVO, CVVO and CVVS) were submitted to acoustic analysis. Four acoustic measures of stressed and unstressed vowels (average F0, F0 range, duration and average intensity) were computed and used as variables in statistical analyses. The three-way ANOVA results showed that both groups produced initial stress with a larger F0 range, higher mean F0 and greater intensity than final stress. In contrast, they produced final stress with a greater duration ratio than initial stress. This might partly be due to the final lengthening effect. In addition, NT produced a higher average F0 than AE when stressing CVVS syllables. Interestingly, the acoustic correlates of both initial and final stress were similar for NT and AE, yet stepwise regression analyses yielded different significant predictors of the stress production score for the two groups. This means that native listener judges relied on different acoustic cues when identifying stress produced by each language group. Predictors were ranked by the total variance they accounted for across all three syllabic structures examined (CVVO, CVVS and CVO). On this basis, the strongest predictor of NT initial stress production was duration, followed by intensity and mean F0. For NT final stress production, duration was again the strongest predictor, followed by mean F0 and intensity. For AE initial stress production, on the other hand, intensity was the strongest predictor, followed by average F0 and duration. For their final stress production,
average F0 emerged as the strongest predictor overall, followed by intensity and duration. These findings are interesting because they suggest that the acoustic parameters differentiated in NT and AE stress production were not necessarily the ones that led the native judges to hear it accurately. Recall that both AE and NT produced initial stress with a significantly larger F0 range, higher average F0 and higher vowel intensity, and final stress with significantly longer vowel duration. Yet duration was the acoustic dimension the judges relied on most when identifying Thai initial and final stress. By contrast, they relied most on intensity when identifying AE initial stress, and on average F0 when identifying AE final stress. NT and AE productions were heard equally accurately, and the acoustic parameters measured did not explain all the variance in the judges' scores. It is therefore likely that the native listener judges used other acoustic parameters to identify the stress position produced by NT and AE. These include vowel reduction in the unstressed syllable, a glottal stop after a stressed syllable, and longer aspiration in the onset of a stressed syllable. Vowel duration is the most differentiated acoustic dimension in NT final stress production. This may have been influenced by the presence of final phrasal stress and phonemic vowel length in Thai. According to Potisuk (1996), the primary acoustic correlate of Thai final stress was duration. However, whether an acoustic parameter with phonemic status in the L1 is used in L2 stress production and perception remains
controversial. Berinstein (1979) asserted that a cue used contrastively in an L1 would not be used as the primary cue in L2 stress production (and perception). As evidence, she cited speakers of Kekchi, a language with phonemic vowel length, who used F0 contrast but not duration contrast as the primary acoustic correlate of stress. In contrast, (2008) found that Vietnamese ESL learners had difficulty manipulating duration contrast when producing stress, because Vietnamese is a tone language without phonemic vowel length. However, the Vietnamese speakers in their study could manipulate contrastive levels of F0 and intensity on accent-bearing syllables. The present finding that vowel duration was the main acoustic correlate of (2008) and suggests that acoustic properties with phonemic status in an L1 may serve as acoustic correlates in L2 stress production. In addition, F0 (used contrastively in Thai lexical tones) was also one of the most differentiated parameters in Thai L2 initial stress production, besides F0 range and intensity. The results for stress position did not support the hypothesis that NT would produce final stress better than initial stress. On the contrary, both NT and AE produced initial stress better than final stress. For AE, this is consistent with the well-known fact that English has a strong tendency to stress disyllables word-initially. However, it is harder to explain why NT, contrary to the prediction, also produced initial stress better than final stress. One explanation could be that initial stress is easier to implement than final stress regardless of the predominant L1 stress pattern. According to the three-way ANOVA on acoustic parameters of stress, both groups were found to
produce stress in a similar manner. Initial stress was produced with a larger F0 range and higher F0 and intensity than final stress. The exception was vowel duration, which was longer for final stress than for initial stress.
Thai Tone Transfer
The results of the Thai tone transcription analysis indicated that NT were not constrained by the specific Thai tonal distribution rules. Nor were they constrained by Gandour's (1979) predictions for Thai tones on English loanwords. The Thai tone predictions allow only low and falling tones on CVVO syllables, yet CVVO syllables in initial stress position most often carried high tone. In fact, all four syllable types (CVO, CVVO, CVS and CVVS) were most often produced with high tone in initial stress position and with falling tone in final stress position. The tone transcriptions of final syllables contradict the Thai tone prediction that stressed CVO syllables would carry low or high tone. They also contradict Gandour's prediction that CVO syllables in word-final position would carry low tone. The syllabic structures that constrain Thai tone distribution did not predict NT's F0 patterns, yet NT still produced F0 contours that were recognizable as Thai tones. In general, they seemed to be aware that stressed syllables should have higher pitch than unstressed syllables, and they modified their seemingly Thai tonal F0 contours accordingly. Moreover, NT appeared to manipulate F0, intensity and duration simultaneously to signal stress. This manipulation seems to be influenced by both syllabic structure and stress position. One question that might arise is whether these Thai-tone-like F0 patterns affect stress judgments by native listeners. According to the Thai tone transcription results, most Thai speakers tended to assign level tones (high
and low) rather than dynamic tones (falling and rising) to stressed versus unstressed syllables. The exception was final stress production, which most often carried falling tone. Since NT stress production was heard as accurately as AE production, it is reasonable to suggest that NT's F0 patterns did not significantly affect native listeners' stress judgments. Fry (1958) addressed a similar question by investigating how effectively different types of fundamental frequency variation determine stress judgments. With duration controlled, his stimulus words contained syllables with three kinds of F0 pattern: level, a linear change and a curvilinear change. His results showed that the combinations of F0 patterns on the stressed and unstressed syllables did not play a big role in stress position judgment. As long as a frequency change was detected, listeners could accurately perceive stress. Data from the Thai tone analysis seem to support Fry's conclusion. In initial stress production, NT tended to map high tone onto the stressed syllable and low tone onto the unstressed syllable, as in pattern a. Fry (1958: 1489) described this as an "un-English" intonation-like pattern. Nevertheless, pattern a still received nearly as many initial stress responses (mean 70%) as pattern b (mean 80%), which showed an English intonation-like pattern.
[Figure: F0 patterns a and b]
In final stress, NT assigned low and falling tones to the unstressed and stressed syllables, respectively; this is pattern c. A more common English intonation pattern that resulted in final stress judgments is pattern d.
[Figure: F0 patterns c and d]
Pattern c (produced by NT) and pattern d (a more native-like English pattern) evoked the same degree of final stress judgment. This further suggests that Thai tone patterns had minimal influence on English nonword production and on lexical stress judgments by AE listener judges. However, several points should be considered before concluding that Thai tonal rules interfere only minimally with L2 stress production by NT. First, both the present study and Fry's study (1958) investigated stress judgment only by comparing stress production scores. If reaction time were treated as a dependent variable, subtle differences between the two language groups might emerge when F0 contrast is considered as a factor, despite their similar overall scores. Reaction times might be longer when AE native listener judges are presented with non-native-like F0 patterns on disyllabic words, especially when other acoustic cues are controlled. Second, this study used naturally produced words, so many competing factors could affect stress judgments. Beckman's (1986) study used hybrid resynthesis (manipulating acoustic values of naturally produced words) and found that English listeners relied heavily on F0 when judging stress.
Stress Perception
The results of the speech perception task showed that, overall, NT identified the lexical stress location of disyllabic nonwords as accurately as AE. This native-like performance by NT in the stress identification task is similar to Altmann's (2006) finding. In that study, speakers of languages without word-level stress (Chinese, Japanese and Korean) obtained perfect stress identification scores on English nonwords.
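Identification accuracy is the percentage of trials on which the listener's response matched the intended stress position. The reaction time comparison discussed next contrasts initial and final stress trials. Below is a minimal sketch of that tabulation, assuming per-trial records with hypothetical field names. It also assumes, as a common convention the text does not state, that reaction time is averaged over correct trials only.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Trial:
    intended: str   # stress position of the stimulus: "initial" or "final"
    response: str   # position the listener chose: "initial" or "final"
    rt_ms: float    # reaction time in milliseconds


def summarize(trials):
    """Accuracy (%) and mean RT of correct trials for each stress position.

    Averaging RT over correct trials only is an assumption of this sketch.
    """
    summary = {}
    for position in ("initial", "final"):
        subset = [t for t in trials if t.intended == position]
        if not subset:
            continue
        correct = [t for t in subset if t.response == position]
        summary[position] = {
            "accuracy_pct": 100 * len(correct) / len(subset),
            "mean_rt_ms": statistics.mean([t.rt_ms for t in correct]) if correct else None,
        }
    return summary
```

The difference between `summary["final"]["mean_rt_ms"]` and `summary["initial"]["mean_rt_ms"]` corresponds to the kind of final-stress delay (about 200 ms for NT) reported below.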
The analysis of stress perception accuracy scores also showed that both groups identified initial stress better than final stress. This result was expected for AE, as English has a strong tendency to stress initially, especially in disyllables (Cutler, 1987). English-learning infants as young as nine months prefer strong-weak over weak-strong stress patterns (Jusczyk, Cutler & Redanz, 1993). However, NT, like AE, also identified initial stress better than final stress. This result did not support the prediction that NT would perceive final stress better than initial stress, given that Thai exhibits fixed final stress in polysyllabic words (Luksaneeyanawin, 1983). NT's preference for initial stress was further confirmed by their reaction times: final stress judgments took approximately 200 ms longer than initial stress judgments. It is possible that Thai actually exhibits phrase-final rather than word-final stress. NT's difficulty with final stress suggests that L1 phrase-level stress patterns do not necessarily facilitate perception of L2 word-level stress. It is also possible that a fixed (word- or phrase-level) final stress in Thai makes NT less, rather than more, sensitive to perceptual cues to stress in this position. The reason would be that neither its position nor its acoustic correlates are encoded in NT's long-term representations. Further research is clearly needed to validate this claim. NT's pattern of stress position identification in this study is inconsistent with the result of Wayland et al. (2006). In their study, ten native speakers of Thai preferred final stress assignment over initial stress regardless of syllabic structure or lexical class (noun vs. verb). However, the goal and experimental paradigm of Wayland et al.'s
study differ from those of the current study, which might have resulted in different stress preference patterns. Specifically, participants in Wayland et al. were not explicitly asked whether initial or final stress was heard. Nonsense disyllabic words, produced with initial and final stress, were embedded at the end of a verb-frame and a noun-frame sentence ("I'd like to __" vs. "I'd like a __"). Participants were simply asked which sentence they preferred. Preference for initial or final stress was predicted from the syllable structure and lexical class (noun or verb) of the target nonsense words. Contrary to the predictions, Thai participants preferred final stress for both types of sentences. The present study, on the other hand, explicitly asked listeners on which syllable of each disyllabic nonword the stress was heard. Thus, rather than investigating the effects of lexical class and syllabic structure on stress assignment, the current study examined the acoustic information used to perceive stress. In Wayland et al.'s study, the carrier sentences may have triggered lexical activation in the Thai participants, leading to a preference for the dominant L1 stress pattern. Another possibility is transfer of the L1 phrase-final effect. Since the target stimuli were placed in sentence-final position, the Thai participants may have automatically referred to the default position of phrasal stress in Thai, which is phrase-final. Nonetheless, the acoustic characteristics of the test words offer an alternative explanation of why both groups identified initial stress more accurately than final stress. A trained phonetician produced the perception stimuli, and their detailed acoustic analysis showed that in initially stressed words, only the average F0 (Hz) and intensity (dB) of stressed and unstressed syllables differed consistently across syllabic structures.
In contrast, in final stress all four acoustic correlates of stress measured (average F0, F0 range, duration and intensity contrasts) always differed significantly. Correlations among the acoustic ratio values revealed how the phonetician realized final stress. Vowel duration and average F0 ratios were negatively correlated: the more the phonetician increased the vowel duration contrast in final stress, the more he reduced the average F0 contrast. The relationship between the acoustic data of the perception stimuli and stress identification accuracy was investigated through stepwise regression analysis. Increased vowel duration (12%) was the strongest predictor of AE final stress perception accuracy scores. Intensity (15%) was the strongest predictor of NT final stress perception scores, followed by increased vowel duration (5%) and average F0 (4%). Unfortunately, no significant predictor of initial stress perception accuracy was found for either language group. This could be partly due to a ceiling effect: both groups obtained very high scores (Thai, 89.36% vs. AE, 91.65%) with little variance. Nevertheless, a closer look at the acoustic data is revealing. Average F0 and intensity were the only two consistent acoustic correlates of initial stress. Even so, they separated the initially stressed syllable from the neighboring (i.e., final) unstressed syllable by a much larger margin than in final stress. On average, the F0 difference was 20.86 Hz greater and the intensity difference 3.54 dB greater than for finally stressed syllables. Both the average F0 and intensity differences might explain why AE and NT identified initial stress better than final stress. Average F0 was speculated to play the most important role in stress perception by both groups because of its larger effect size relative to that of intensity
difference, based on just-noticeable differences (JNDs): the F0 difference was about 5 JNDs, while the intensity difference was about 3.5 JNDs. This result points to the relative importance of F0 as a correlate of stress in AE and NT perception. It also suggests that accurate stress perception is better predicted by the magnitude of acoustic saliency than by the number of acoustic correlates. F0 is the major acoustic correlate of the English accent and intonation systems and of the Thai tone system. It is therefore very likely that both language groups were unconsciously sensitive to the salient F0 contrast in initial stress position. Other studies have also found that English listeners use F0 as the most important cue to stress perception in an accented position (Fry, 1958; Beckman, 1986). Interestingly, AE and NT identified final stress less accurately despite its larger difference in vowel duration. This suggests that duration, though more salient here, is relatively less important than F0 in final stress perception, perhaps because of the final vowel lengthening effect. In other words, since a final vowel is expected to be longer than an initial vowel, vowel duration difference plays a reduced role as an acoustic correlate of stress. However, the final stress predictor for AE supports the finding of Turk & Sawusch (1996) that duration alone appears to be a sufficient cue for stress perception. Turk & Sawusch also suggested that duration and intensity are perceived integrally, and that listeners cannot use intensity alone to make stress judgments. This is consistent with the result that intensity and duration were significant predictors in the NT final stress regression model. Few studies have investigated how speakers of tonal languages weight acoustic cues to English lexical stress. Yu & Andruski (2010) found that Mandarin listeners relied
primarily on F0 in stress identification of real English words, nonwords and hums (no segmental information), and employed duration as a secondary cue to final stress in nonwords. Their findings appear to support my speculation that average F0 contrast is the primary cue and that duration is less important than F0 in stress perception. However, Zhang and Francis (2010) found a different cue-weighting pattern for Mandarin speakers and American English controls. Both language groups weighted vowel quality more heavily than other cues, and both treated vowel quality and duration as a combined cue when listening to synthesized tokens of the word desert; however, Mandarin listeners were more sensitive to the F0 cue than American listeners. In this study, Thai listeners did not appear to show the same detectable sensitivity to vowel reduction as the control group. These differing results might be partly due to the different types of stimuli used: natural vs. synthetic speech.

Relation between Speech Perception and Production

Another research question addressed in this study concerns the relation between the perception and production of stress in English nonwords. Two measures, accuracy scores and acoustic information, were used to examine the relation between the two. The two language groups showed different patterns on both measures, and the correlations between stress perception and production accuracy scores differed for NT and AE. For NT, a moderate positive correlation (approaching significance) was found between stress production and perception scores. Recall that stress production scores reflect the extent to which NT and AE stress production was accurately heard by native listener judges, and stress perception scores reflect their
ability to identify the intended stress position. Therefore, the moderate correlation between NT production and perception scores suggests that the perceptual cues native judges used to identify stress in NT productions match relatively well the perceptual cues NT themselves use to identify stress. There was also a significant positive correlation between perception and production scores for the CVV structure in both stress positions, but not for other syllable structures. This may be due to the relatively simple acoustic and perceptual correlates of CVV: vowel duration and intensity were the only two acoustic dimensions consistently implemented (and used by native listener judges) on this syllabic structure, and NT effectively made use of these two dimensions, saliently present in the perception stimuli, in their own perception of stress. By contrast, the acoustic implementation of other syllabic structures such as CVVO or CVVS is more varied and more complex. Vowel duration may be lengthened more effectively, and thus be perceptually more salient, in CVVS than in CVVO. With a sonorant coda, the F0 contour of CVVS may also extend longer than that of CVVO. Furthermore, an obstruent coda in the CVVO structure might have been realized as a glottal stop or might have triggered a short pause between the initially stressed and unstressed syllables. These could be secondary cues to stress (Couper-Kuhlen, 1986). These extra acoustic correlates were not clearly present in CVV. For this reason, a mismatch is likely between the various acoustic features of stress produced by NT (and used by native listener judges) on CVVO and CVVS and NT's own perceptual cues to stress on these two syllabic structures. In other words, unlike CVV, production and perception of these two syllabic structures are subject to different constraints.
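The perception-production relation discussed above rests on correlating two accuracy scores per participant. The text does not name the coefficient, so the sketch below assumes a Pearson correlation and its usual t statistic with n - 2 degrees of freedom. The helper names and the five participants' scores are hypothetical, invented only to show the computation; they are not data from this study.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between paired scores (one pair per participant)."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length sequences of at least 3 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # e.g. every participant at ceiling on one task: r is undefined
        raise ValueError("one set of scores has no variance")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t = r * sqrt((n - 2) / (1 - r**2)), compared against a t distribution with n - 2 df."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical per-participant accuracy scores (proportion correct).
perception = [0.82, 0.91, 0.88, 0.76, 0.95]
production = [0.70, 0.84, 0.79, 0.73, 0.86]
r = pearson_r(perception, production)
print(f"r = {r:.3f}, t({len(perception) - 2}) = {t_statistic(r, len(perception)):.3f}")
```

The zero-variance guard reflects the ceiling problem noted for initial stress: when nearly every score is identical, a correlation is unstable, and when all are identical it is undefined.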
For AE, no direct relation between stress perception and production scores was observed, either in overall accuracy scores or in scores on particular word types (syllabic structure combinations). The lack of any correlation between overall perception and production scores for AE suggests a mismatch between the perceptual cues the native judges used to identify stress position in AE productions and the perceptual cues AE themselves used to identify stress position. This might be because some of the tested syllabic structures, such as CVO and CVVO, are less frequently used, which may have made stress harder to articulate. If real words had been used in both tasks, significant correlations would likely have emerged, since AE would be familiar with the test words and would score well in both tasks. It is also interesting to note that even though the acoustic correlates of stress production are similar for AE and NT, their perceptual cues to stress differ. For example, both groups manipulated vowel duration to signal final stress. However, while vowel duration was the only significant perceptual cue to final stress among AE, NT relied on intensity and average F0 as well as vowel duration. This pattern suggests that stress production and stress perception are conditioned by different factors.

STM and the Current Findings

STM (Stress Typology Model) is the only model that predicts the success rate of both perception and production by L2 learners. The present findings seem both to support and to challenge STM. According to the stress perception scores, Thai speakers showed nativelike competence in the stress identification task. If Thai is categorized as a tone language, STM's prediction that speakers of an L1 nonstress language (−stress, −predictable) would excel in stress identification tasks was borne out. Alternatively,
if Thai is treated as a fixed-stress language like French (+stress, +predictable, quantity sensitive), the STM prediction that speakers with predictable stress would have difficulty perceiving stress location is not borne out. Thai speakers in this study differed from the French speakers in Altmann (2006) in that they had no perceptual difficulty identifying the stress location of disyllabic words. One possible way to resolve this typological ambiguity is to specify stress categories in more detail. In addition to categorizing languages according to their observable surface stress patterns, STM might need to specify that suprasegmental features with phonemic status (e.g., tone, pitch accent) outrank features with phonetic status, such as phrasal stress in Thai. Although Thai and French have the same fixed final stress, the contrastive tone system of Thai justifies placing Thai in the nonstress group rather than the stress group. Another problem is that STM does not explain in detail why speakers of nonstress languages perform well in stress identification. The model merely assumes that positive settings for L1 stress are likely to cause interference. This assumption rests solely on L1 stress position and ignores sensitivity to acoustic cues to stress. The present findings indicate that L1 stress position (final in Thai) did not strongly affect stress perception, since both language groups showed an initial stress preference. It is likely that sensitivity to the acoustic dimensions of stress is, in fact, an important factor underlying Thai speakers' success in stress identification. The detailed acoustic analysis of the perception stimuli suggests that NT (and AE) reliance on average F0 might have been responsible for their success in initial stress perception, and that a lesser degree of F0 differentiation on final stress might have been one
of the reasons why NT were less successful in perceiving final stress. Thus, variation in sensitivity to different acoustic parameters of L2 stress, rather than surface stress pattern, could better explain the excellent stress identification by NT and the poor performance by French speakers in Altmann (2006). The degree of reliance on suprasegmentals in the L1 could partly predict success in L2 stress identification. Thai relies heavily on suprasegmental cues, while French makes little use of them. French stress is not cued by F0, duration or energy contrasts, but is realized as final syllable lengthening. Moreover, French has no contrastive tone, pitch accent or stress, and duration is not used contrastively in vowels or consonants (reviewed by Dupoux, Sebastián-Gallés, Navarrete & Peperkamp, 2008). With regard to stress production, STM predicts that speakers without L1 stress will use non-target-like stress placement strategies. This prediction cannot be evaluated with the present findings, since the ability to realize stress was tested while stress placement was controlled. However, the production results suggest that Thai speakers could realize stress successfully, that is, in a way that allowed native speakers of English to identify the intended stress location. This result shows that speakers of nonstress languages such as Thai can articulate stress. The relation between stress perception and production has not been directly addressed by STM. STM bases its predictions only on L1 stress position and does not investigate the role of L1 suprasegmental properties in L2 stress acquisition. The NT findings argue against treating L1 stress patterns as the sole predictor of success in L2 stress perception and production. Berinstein's (1979) prediction concerning acoustic parameters and their function
in the L1 and their relative ranking in L2 stress perception and production is more relevant to the findings of this study than STM. Berinstein hypothesized that an acoustic parameter used phonemically in an L1 would not be used primarily in the perception and production of L2 stress. The current finding that NT (and AE) used F0 range as well as average F0 to signal initial stress, and duration to signal final stress, is inconsistent with this hypothesis. Instead, the present findings suggest that acoustic features used contrastively in the L1 (F0 for lexical tones and duration for phonemic vowel length) can be used in L2 stress production. Acoustic analyses of the perception stimuli also suggested that average F0 may have been responsible for NT (and AE) success in initial stress perception. Furthermore, both duration and F0 were perceptual cues to final stress among NT. In sum, to predict the success of L2 stress learners, a model is needed that looks beyond the observable surface L1 stress position and generates predictions based on the acoustic dimensions of L1 stress as well as other related L1 prosodic systems and their function (lexical or nonlexical). In addition, such a model should incorporate two possible levels of stress processing and production: phonetic and phonological representations of L2 stress. This study investigated stress acquisition only at the phonetic level and found that Thai learners of L2 English can perceive and produce stress in English nonword tasks as well as the AE control group.

Conclusion

In summary, NT had difficulty neither in identifying nor in producing lexical stress. Because of the specific design of the test stimuli, other factors such as English orthography, familiarity effects and complex syllabic structures were excluded. Given that Thai is a tone language with a phonemic vowel length distinction,
NT relied on F0 to produce L2 initial stress and on duration to signal final stress. Average F0 also plays a role in NT perception of L2 stress, particularly word-initially. This may account for the moderate relationship between stress production and perception among NT.

Limitations of the Present Study

This study investigated many areas of stress acquisition, including stress perception and production and the role of Thai tone transfer. First, because of the large scope of the study, vowel reduction and RT could not be included as dependent variables in the stress production analysis. Second, the segments and syllabic structures of the English nonword wordlist were not strictly controlled, out of concern for the ecological validity of the wordlist, which should approximate real words as much as possible. However, because most items combined two different syllabic structures and different vowels in the stressed and unstressed positions, the wordlist might have introduced some variance into the detailed acoustic analysis. If the vowels and syllabic structures of the unstressed syllables had all been the same (e.g., THUM.tha, THIM.tha and THAA.tha), comparisons of ratio values across the five structures of stressed syllables would likely have yielded more statistically significant results. Each vowel has intrinsic F0, intensity and duration values, so comparing the acoustic parameters of two different vowels yields ratio values that partly reflect these intrinsic differences. In addition, using the same vowel in both syllables of a disyllabic word, such as THA.tha, would facilitate computing vowel reduction and make spectral balance measurement possible. Finally, the present stepwise regression analysis could not identify the most important acoustic predictor of initial stress perception scores because of the minimal variance in these scores (i.e., uniformly high scores in both groups) and the small number of data points (200 scores) in the dependent variable.
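The stepwise regression referred to in this chapter adds acoustic predictors one at a time. As a minimal sketch, assuming forward selection on R² gain (reading the reported percentages, such as intensity at 15%, as R² increments is itself an assumption), the code below uses NumPy least squares. It swaps in a fixed, hypothetical minimum-gain threshold (`min_gain`) for the F-to-enter significance test that statistics packages usually apply, and it runs on simulated data, not on the study's measurements. The last call illustrates the limitation just described: when the outcome has no variance, no predictor can enter the model.

```python
import numpy as np


def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """R^2 of an ordinary least-squares fit of y on the columns of X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    ss_res = float(np.sum((y - design @ coef) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    if ss_tot == 0.0:
        return 0.0  # nothing to explain, e.g. every score identical (ceiling)
    return 1.0 - ss_res / ss_tot


def forward_stepwise(X: np.ndarray, y: np.ndarray, names: list[str],
                     min_gain: float = 0.02) -> list[tuple[str, float]]:
    """Add the predictor with the largest R^2 gain until no gain reaches min_gain."""
    selected: list[int] = []
    remaining = list(range(X.shape[1]))
    current = 0.0
    steps: list[tuple[str, float]] = []
    while remaining:
        gains = [(r_squared(X[:, selected + [j]], y) - current, j) for j in remaining]
        best_gain, best_j = max(gains)
        if best_gain < min_gain:
            break
        selected.append(best_j)
        remaining.remove(best_j)
        current += best_gain
        steps.append((names[best_j], best_gain))
    return steps


# Simulated item-level data (not the study's): 200 items, 3 hypothetical predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
names = ["intensity", "vowel_duration", "mean_f0"]
scores = 0.6 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.5, size=200)

for name, gain in forward_stepwise(X, scores, names):
    print(f"{name}: R^2 gain = {gain:.3f}")

# With no variance in the outcome (all scores at ceiling), nothing is selected.
print(forward_stepwise(X, np.ones(200), names))  # []
```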
An alternative statistical design that regresses stress perception scores on the acoustic values, relating the individual scores of each listener (200 scores × 15 listeners) to each set of acoustic parameters, might be considered in a future study.

Future Directions

This study is the first to present comprehensive findings regarding stress perception and production by Thai speakers, the role of L1 tonal transfer, and the relation between stress perception and production based on accuracy scores and acoustic parameters. English nonword stimuli were selected for this study to test the effect of L1 transfer. Future research could compare results for English nonword stimuli with those for English real-word stimuli. It is hypothesized that if Thai participants were asked to produce real English words, they would perform poorly in stress production because of poor sound and visual representations of English orthography. Other potential areas of future research include comparing the performance of Thai speakers with that of speakers from different stress typologies on the same task. First, with a focus on L2 stress perception, it would be interesting to examine whether Thai listeners exhibit any degree of stress deafness like French listeners in the discrimination paradigm used by Dupoux et al. (1997, 2008). Given that neither Thai nor French uses stress contrastively, it is possible that Thai speakers, who excelled in the stress identification task, will perform poorly in stress discrimination. If so, this would provide further support for at least two levels of stress processing, phonetic and phonological. More research is needed on the nature of stress representation by listeners from nonstress languages.
Second, synthetic materials that vary in one or two acoustic correlates at a time (Fry, 1958; Zhang & Francis, 2010) should be used in combination with naturally produced words, which have more complex speech signals. Administering both types of stimuli to the same participants could help resolve the conflicting results of previous studies of L2 stress perception by Mandarin listeners (Zhang & Francis, 2010; Yu & Andruski, 2010). In addition to sensitivity to various acoustic cues to stress, stress representation by listeners from nonstress languages has never been directly tested. The discrimination paradigm of Dupoux et al. (1997, 2008) might reveal novel findings on L2 stress perception by speakers of tonal languages. Such findings, combined with the present ones, would constitute a wealth of cross-linguistic data and a potential resource for proposing a new model of L2 stress perception and production that emphasizes acoustic evidence.
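The proposal for synthetic materials amounts to a parameter grid in which only one or two cues change while all other cues stay at a baseline. A minimal sketch of generating such a grid, assuming each cue is expressed as a stressed/unstressed ratio; the cue names, baseline and step values are hypothetical, and the resynthesis itself would be done separately (for example with Praat's manipulation tools).

```python
from itertools import product
from typing import Mapping, Sequence


def continuum(baseline: Mapping[str, float],
              vary: Mapping[str, Sequence[float]]) -> list[dict[str, float]]:
    """Stimulus specs for a synthetic continuum in which only the cues in `vary` change."""
    if not 1 <= len(vary) <= 2:
        raise ValueError("vary one or two cues at a time")
    unknown = set(vary) - set(baseline)
    if unknown:
        raise KeyError(f"cues missing from baseline: {sorted(unknown)}")
    cues = list(vary)
    specs = []
    for values in product(*(vary[c] for c in cues)):
        spec = dict(baseline)           # every other cue stays at its baseline value
        spec.update(zip(cues, values))  # overwrite only the manipulated cue(s)
        specs.append(spec)
    return specs


# Hypothetical baseline: no acoustic difference between the two syllables.
baseline = {"f0_ratio": 1.0, "duration_ratio": 1.0, "intensity_ratio": 1.0}

f0_only = continuum(baseline, {"f0_ratio": [1.0, 1.1, 1.2, 1.3]})
f0_by_duration = continuum(baseline, {"f0_ratio": [1.0, 1.2],
                                      "duration_ratio": [1.0, 1.25, 1.5]})
print(len(f0_only), len(f0_by_duration))  # 4 6
print(f0_by_duration[1])  # {'f0_ratio': 1.0, 'duration_ratio': 1.25, 'intensity_ratio': 1.0}
```

Crossing at most two cues keeps the design small (here 2 × 3 = 6 steps) while still allowing a trade-off between cues to be measured.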
APPENDIX A
WORDLIST FOR SPEECH PRODUCTION TASK READ BY A PHONETICIAN

Syllabic structures and segments used in the production task
CVV    thii   thuu   thaa
CVVS   thiim  thuum  thaam
CVVO   thiip  thuup  thaap
CVO    thip   thup   thap   th p   th p
CVS    thim   thum   tham
[where th = tʰ]

Table A-1. Stimulus word duration in milliseconds
Word         Whole word   Vowel portion
thup         432          118
thip         409          131
thum         320          133
th p (AE)    477          145
th p (AE)    400          149
thuup        570          223
thiip        557          239
thiim        425          240
thaap        559          245
thuum        496          288
thaam        583          288
thuu         495          335
thaa         567          355
thii         636          396
Note: AE = American English vowels that do not have corresponding vowels in Thai
APPENDIX B
MATERIALS FOR SPEECH PRODUCTION TASK AS PRESENTED ON E-PRIME

Sentence frame for initial stress: They spoke X twice.
Sentence frame for final stress: They spoke X twice.

Table B-1. Stimuli presented on E-Prime
Running Block  SFrame  Number  SoundOne      SoundTwo      TrialNumber
Block1         X       1       thii.wav      thip.wav      4a
Block1         X       2       thuup.wav     thuum.wav     14b
Block1         X       3       thup.wav      thiip.wav     20c
Block1         X       4       thiim.wav     thaap.wav     7a
Block1         X       5       thuu.wav      thaap.wav     3b
Block1         X       6       thaap.wav     thUp_AE.wav   12c
Block1         X       7       thiip.wav     thup.wav      12a
Block1         X       8       thIp_AE.wav   thuup.wav     20a
Block1         X       9       thUp_AE.wav   thIp_AE.wav   17b
Block1         X       10      thuum.wav     thim.wav      10b
Block1         X       11      thUp_AE.wav   thii.wav      22b
Block1         X       12      thiip.wav     thuu.wav      13a
Block1         X       13      thaap.wav     thii.wav      13c
Block1         X       14      thim.wav      thIp_AE.wav   25c
Block1         X       15      thaam.wav     thuup.wav     7c
Block1         X       16      thaa.wav      thUp_AE.wav   4c
Block1         X       17      thIp_AE.wav   thUp_AE.wav   25a
Block1         X       18      thUp_AE.wav   thuum.wav     19b
Block2         X       1       thiim.wav     thaap.wav     7af
Block2         X       2       thii.wav      thip.wav      4af
Block2         X       3       thuu.wav      thaap.wav     3bf
Block2         X       4       thaap.wav     thii.wav      13cf
Block2         X       5       thiip.wav     thup.wav      12af
Block2         X       6       thaap.wav     thUp_AE.wav   12cf
Block2         X       7       thIp_AE.wav   thUp_AE.wav   25af
Block2         X       8       thaam.wav     thuup.wav     7cf
Block2         X       9       thuum.wav     thim.wav      10bf
Block2         X       10      thuup.wav     thuum.wav     14bf
Block2         X       11      thUp_AE.wav   thii.wav      22bf
Block2         X       12      thiip.wav     thuu.wav      13af
Block2         X       13      thim.wav      thIp_AE.wav   25cf
Block2         X       14      thUp_AE.wav   thuum.wav     19bf
Block2         X       15      thaa.wav      thUp_AE.wav   4cf
Block2         X       16      thIp_AE.wav   thuup.wav     20af
Block2         X       17      thup.wav      thiip.wav     20cf
Block2         X       18      thUp_AE.wav   thIp_AE.wav   17bf
Block3         X       2       thii.wav      thuup.wav     3a
Block3         X       3       thaa.wav      thiip.wav     3c
Block3         X       4       thuum.wav     thiip.wav     7b
Block3         X       5       thiim.wav     thUp_AE.wav   10a
Block3         X       6       thaam.wav     thIp_AE.wav   10c
Block3         X       7       thuup.wav     thaa.wav      13b
Block3         X       8       thiip.wav     thiim.wav     14a
Block3         X       9       thaap.wav     thuum.wav     14c
Block3         X       10      thuup.wav     thIp_AE.wav   12b
Block3         X       11      thIp_AE.wav   thiim.wav     19a
Block3         X       12      thip.wav      thaam.wav     19c
Block3         X       13      thUp_AE.wav   thaap.wav     20b
Block3         X       14      thIp_AE.wav   thum.wav      17a
Block3         X       15      thip.wav      thUp_AE.wav   17c
Block3         X       16      thUp_AE.wav   thup.wav      25b
Block3         X       17      thIp_AE.wav   thaa.wav      22a
Block3         X       18      thum.wav      thuu.wav      22c
Block4         X       1       thuu.wav      thIp_AE.wav   4bf
Block4         X       2       thii.wav      thuup.wav     3af
Block4         X       3       thaa.wav      thiip.wav     3cf
Block4         X       4       thuum.wav     thiip.wav     7bf
Block4         X       5       thiim.wav     thUp_AE.wav   10af
Block4         X       6       thaam.wav     thIp_AE.wav   10cf
Block4         X       7       thuup.wav     thaa.wav      13bf
Block4         X       8       thiip.wav     thiim.wav     14af
Block4         X       9       thaap.wav     thuum.wav     14cf
Block4         X       10      thuup.wav     thIp_AE.wav   12bf
Block4         X       11      thIp_AE.wav   thiim.wav     19af
Block4         X       12      thip.wav      thaam.wav     19cf
Block4         X       13      thUp_AE.wav   thaap.wav     20bf
Block4         X       14      thIp_AE.wav   thum.wav      17af
Block4         X       15      thip.wav      thUp_AE.wav   17cf
Block4         X       16      thUp_AE.wav   thup.wav      25bf
Block4         X       17      thIp_AE.wav   thaa.wav      22af
Block4         X       18      thum.wav      thuu.wav      22cf
APPENDIX C
WORDLIST FOR SPEECH PERCEPTION TASK READ BY A PHONETICIAN
Set 1: Full vowels in unstressed position
Instructions for reader: Please concatenate the two monosyllabic words into one disyllabic word. Read aloud items in blocks 1a and 2a with initial stress. Read aloud items in blocks 1b and 2b with final stress. Do not change the vowel quality of the unstressed syllable.
Block 1: a. the words will be produced with initial stress; b. the words will be produced with final stress
Table C-1. Speech perception wordlist block 1
# for reader set code # words # for reader set code # words 1 1 1 ba: ba: 26 21 3 g m b m 2 1 3 ga: ba: 27 23 3 g m b :n 3 3 3 ga: fe t 28 23 5 t m b :n 4 3 5 ta: fe t 29 2 2 da: b :n 5 5 1 ba: b m 30 2 4 pa: b :n 6 5 3 ga: b m 31 2 2 da: b t 7 6 2 d :n b :n 32 2 4 pa: b t 8 6 4 p :n b :n 33 7 3 g :n fe t 9 8 2 d :n b t 34 7 5 t :n fe t 10 8 4 p :n b t 35 9 1 b :n ba: 11 10 2 d :n b m 36 9 3 g :n ba: 12 10 4 p :n b m 37 12 2 de t b t 13 11 3 ge t bu:t 38 12 4 pe t b t 14 11 5 te t bu:t 39 14 2 de t b :n 15 13 1 be t ba: 40 14 4 pe t b :n 16 13 3 ge t ba: 41 17 1 b t b m 17 15 3 ge t b m 42 17 3 g t b m 18 15 5 te t b m 43 19 3 g t b :n 19 16 2 d t b t 44 19 5 t t b :n 20 16 4 p t b t 45 22 2 d m ba: 21 18 2 d t ba: 46 22 4 p m ba: 22 18 4 p t ba: 47 24 2 d m fe t 23 20 2 d t fe t 48 24 4 p m fe t 24 20 4 p t fe t 49 25 1 b m b t 25 21 1 b m b m 50 25 3 g m b t
Block 2: a. the words will be produced with initial stress; b. the words will be produced with final stress
Table C-2. Speech perception wordlist block 2
# for reader set code # words # for reader set code # words 1 1 5 ta: ba: 26 21 7 n m b m 2 1 7 na: ba: 27 23 7 n m b :n 3 3 7 na: fe t 28 23 9 s m b :n 4 3 9 sa: fe t 29 25 5 t m b t 5 5 5 ta: b m 30 25 7 n m b t 6 5 7 na: b m 31 2 6 ka: b :n 7 6 6 k :n b :n 32 2 8 fa: b :n 8 6 8 f :n b :n 33 4 6 ka: b t 9 8 6 k :n b t 34 4 8 fa: b t 10 8 8 f :n b t 35 7 7 n :n fe t 11 10 6 k :n b m 36 7 9 s :n fe t 12 10 8 f :n b m 37 9 5 t :n ba: 13 11 7 ne t bu:t 38 9 7 n :n ba: 14 11 9 se t bu:t 39 12 6 ke t b t 15 13 5 te t ba: 40 12 8 fe t b t 16 13 7 ne t ba: 41 14 6 ke t b :n 17 15 7 ne t b m 42 14 8 fe t b :n 18 15 9 se t b m 43 17 5 t t b m 19 16 6 k t b t 44 17 7 n t b m 20 16 8 f t b t 45 19 7 n t b :n 21 18 6 k t ba: 46 19 9 s t b :n 22 18 8 f t ba: 47 22 6 k m ba: 23 20 6 k t fe t 48 22 8 f m ba: 24 20 8 f t fe t 49 24 6 k m fe t 25 21 5 t m b m 50 24 8 f m fe t
Set 2: Reduced vowels in unstressed position
Block 1
Instructions for reader
1. Concatenate these two monosyllabic words into one word.
2. Stress the first syllable and reduce the vowel quality of the second, unstressed syllable.
3. Take your time practicing the new disyllabic word. When ready, read the identification number aloud, followed immediately by the new word, with three repetitions. For example, one point two, [dab
Table C-3. Speech perception reduced vowel wordlist block 1
# for reader words # for reader words 1.2 da: ba: 14.1 be t b:n 12.5 tet b t 15.8 fe t b m 3.1 ba: fe t 16.3 g t b t 4.5 ta: b t 17.6 k t b m 22.7 n m ba: 18.7 n t ba: 6.7 n:n bn 19.8 f t b:n 7.4 p:n fe t 20.1 b t fe t 8.3 g:n b t 21.4 p m b m 25.4 p m b t 9.6 k:n ba: 10.1 b:n b m 23.1 b m b:n 24.9 s m fe t 11.2 de t bu:t 5.9 sa: b m 2.3 ga: b:n 13.6 ke t ba:
Block 2
Instructions for reader
1. Concatenate these two monosyllabic words into one word.
2. Stress the second syllable and reduce the vowel quality of the first, unstressed syllable.
3. Take your time practicing the new disyllabic word. When ready, read the identification number aloud, followed immediately by the new word, with three repetitions. For example, one point two, [d
Table C-4. Speech perception reduced vowel wordlist block 2
# for reader words # for reader words 7.4 p:n fe t 14.1 be t b:n 2.3 ga: b:n 4.5 ta: b t 3.1 ba: fe t 16.3 g t b t 15.8 fe t b m 17.6 k t b m 5.9 sa: b m 10.1 b:n b m 6.7 n:n bn 19.8 f t b:n 18.7 n t ba: 20.1 b t fe t 8.3 g:n b t 21.4 p m b m 9.6 k:n ba: 12.5 te t b t 22.7 n m ba: 23.1 b m b:n 11.2 de t bu:t 24.9 s m fe t 25.4 p m b t 1.2 da: ba: 13.6 ke t ba:
APPENDIX D
QUESTIONNAIRE FOR LANGUAGE BACKGROUND CHECK
Name: _______________ UFID #: ___________
Age: ____ years ____ months
Major: ___________
Circle accordingly.
Gender: male female
Dominant hand: left right
Have you had any history of hearing or speaking problems?
Do you have a cold?
Questions for Thai participants only
List the languages you speak:
English language background check
1. When did you start learning English?
2. a) When did you arrive in the US? b) How old were you at that time?
3. How long have you been in the US?
4. How much English do you use per day/per week?
Questions for AE participants only
Please identify your American English regional dialect, e.g. Midwest, South, Northeast.
Where did you grow up?
List the languages you speak and rate your proficiency.
APPENDIX E
ENGLISH LEXICAL STRESS: POWERPOINT PRESENTATION
Slide 1: English Lexical Stress
Slide 2: Stress is one of the devices used in English to distinguish one word from another. For example, PERmit (noun) is distinguished from perMIT (verb). The first syllable PER in PERmit (noun) is said to be stressed while the second syllable mit is not. The opposite is true for perMIT (verb).
Slide 3: A stressed syllable is stronger (perceptually more salient or prominent) than an unstressed syllable.
Slide 4: Permit (present the sound files)
Slide 5: PERmit (noun): an official document that allows you to do something, e.g. parking permit. perMIT (verb): to allow something.
Slide 6: Produce (present the sound files)
Slide 7: PROduce (noun): food from a farm, e.g. fresh produce. proDUCE (verb): to make.
Slide 8: There is always a stressed syllable in polysyllabic words.
Slide 9: beauty, beautiful (present sound files)
Slide 10: End
LIST OF REFERENCES
Abramson, A. S. (2001). The stability of distinctive vowel length in Thai. In K. Tingsabadh & A. S. Abramson (Eds.), Essays in Tai Linguistics (pp. 13-26). Bangkok: Chulalongkorn University Press.
Adams, C., & Munro, R. R. (1978). In search of the acoustic correlates of stress: fundamental frequency, amplitude, and duration in the connected utterance of some native and nonnative speakers of English. Phonetica, 35, 125-156.
Ahkuputra, V., Maneenoi, E., Luksaneeyanawin, S., & Jitapunkul, S. (2003). Acoustic Modelling of Vowel Articulation on the Nine Thai Spreading Vowels. International Journal of Computer Processing of Oriental Languages, 16(3), 171-195.
Altmann, H. (2006). The perception and production of second language stress: a cross-linguistic experimental study. Ph.D. Dissertation, University of Delaware, Newark.
Altmann, H., & Vogel, I. (2002). L2 Acquisition of Stress: the role of L1. Paper presented at Multilingualism Today, Mannheim, Germany, March 2002.
Archibald, J. (1993). The learnability of English metrical parameters by adult Spanish speakers. International Review of Applied Linguistics and Language Teaching, 31/32, 129-142.
Archibald, J. (1997). The acquisition of English stress by speakers of nonaccentual languages: lexical storage versus computation of stress. Linguistics, 35, 167-181.
Ariyapitipun, S. (1988). An Analysis of Phonological Errors in the Pronunciation of English Consonants and Vowels by Selected Native Speakers of Thai. Ph.D. Dissertation, University of Georgia, Athens.
Armstrong, L. E., & Ward, I. C. (1931). A handbook of English intonation. Cambridge, UK: W. Heffer & Sons Ltd.
Beckman, M. E. (1986). Stress and Non-Stress Accent. Riverton, NJ: Foris Publications.
Beckman, M. E., & Edwards, J. (1994). Articulatory evidence for differentiating stress categories. In Phonological Structure and Phonetic Form: Papers in Laboratory Phonology III (pp. 233). New York: Cambridge University Press.
Bennett, J. F. (1995). Metrical foot structure in Thai and Kayah Li: Optimality-Theoretic studies in the prosody of two Southeast Asian languages. Ph.D. Dissertation, University of Illinois at Urbana-Champaign, Urbana.
Berinstein, A. E. (1979). A cross-linguistic study on the perception and production of stress. Ph.D. Dissertation, University of California, Los Angeles.
Best, C. T. (1995). A direct realist view of cross-language speech perception. In W. Strange (Ed.), Speech perception and linguistic experience: Issues in cross-language research (pp. 171-204). Timonium, MD: York Press.
Best, C. T., McRoberts, G. W., & Sithole, N. M. (1988). Examination of perceptual reorganization for nonnative speech contrasts: Zulu click discrimination by English-speaking adults and infants. Journal of Experimental Psychology: Human Perception and Performance, 14, 345-360.
Boersma, P., & Weenink, D. (2009). Praat: doing phonetics by computer (Version 5.0.42). Retrieved September 2009, from http://www.praat.org/.
Bolinger, D. (1958). A theory of pitch accent in English. Word, 14, 109-149.
Bolinger, D. (1986). Intonation and its parts: Melody in spoken English. London: Edward Arnold.
Campbell, N., & Beckman, M. E. (1997). Stress, prominence, and spectral tilt. In A. Botinis, G. Kouroupetroglou, & G. Carayiannis (Eds.), Intonation: Theory, Models and Applications (Proceedings of an ESCA Workshop, September 18-20, 1997, Athens, Greece). ESCA and University of Athens Department of Informatics.
Chen, Y., Robb, M., Gilbert, H., & Lerman, J. (2001). A study of sentence stress production in Mandarin speakers of American English. The Journal of the Acoustical Society of America, 109(4), 1681-1690.
Chomsky, N., & Halle, M. (1968). The sound pattern of English. New York, NY: Harper and Row.
Chun, D. M. (2002). Discourse Intonation in L2: From theory and research to practice. Philadelphia, PA: John Benjamins Publishing Company.
Couper-Kuhlen, E. (1986). An introduction to English prosody. London: Edward Arnold.
Cruttenden, A. (1986). Intonation. Cambridge: Cambridge University Press.
Crystal, D. (1969). Prosodic Systems and Intonation in English. London: Cambridge University Press.
Cutler, A. (1986). Forbear is a homophone: lexical prosody does not constrain lexical access. Language and Speech, 29(3), 201-220.
Cutler, A. (2005). Lexical Stress. In D. B. Pisoni & R. E. Remez (Eds.), The Handbook of Speech Perception (pp. 264-289). Malden, MA: Blackwell Publishing Ltd.
Cutler, A., & Carter, D. M. (1987). The predominance of strong initial syllables in the English vocabulary. Computer Speech and Language, 2, 133-142.
Cutler, A., & Van Donselaar, W. (2001). Voornaam is not (really) a Homophone: Lexical Prosody and Lexical Access in Dutch. Language and Speech, 44(2), 171-195.
Davis, S. M., & Kelly, M. H. (1997). Knowledge of the English noun-verb stress difference by native and nonnative speakers. Journal of Memory and Language, 36, 445-460.
de Jong, K. (2004). Stress, lexical focus, and segmental focus in English: patterns of variation in vowel duration. Journal of Phonetics, 32(4), 493-516.
Dupoux, E., Pallier, C., Sebastián, N., & Mehler, J. (1997). A destressing 'deafness' in French? Journal of Memory and Language, 36, 406-421.
Dupoux, E., Peperkamp, S., & Sebastián-Gallés, N. (2001). A robust method to study stress "deafness". The Journal of the Acoustical Society of America, 110(3), 1606-1618.
Dupoux, E., Sebastián-Gallés, N., Navarrete, E., & Peperkamp, S. (2008). Persistent stress 'deafness': the case of French learners of Spanish. Cognition, 106(2), 682-706.
Dupoux, E., Peperkamp, S., & Sebastián-Gallés, N. (2010). Limits on bilingualism revisited: stress 'deafness' in simultaneous French-Spanish bilinguals. Cognition, 114(2), 266-275.
Flege, J. E. (1995). Second language speech learning: Theory, findings, and problems. In W. Strange (Ed.), Speech Perception and Linguistic Experience (pp. 233-277). Timonium, MD: York Press.
Fry, D. B. (1955). Duration and intensity as physical correlates of linguistic stress. The Journal of the Acoustical Society of America, 27(4), 765-768.
Fry, D. B. (1958). Experiments in the perception of stress. Language and Speech, 1, 126-152.
Fry, D. B. (1965). The dependence of stress judgments on vowel formant structure. In Proceedings of the Fifth International Congress of Phonetic Sciences (pp. 306-311). New York, NY: Karger.
Fudge, E. (1984). English word-stress. London: George Allen & Unwin.
Gandour, J. (1979). Tonal rules for English loanwords in Thai. In Studies in Tai and Mon-Khmer phonetics and phonology in honour of Eugénie J. A. Henderson (pp. 94-105). Bangkok: Chulalongkorn University Press.
Gandour, J., Tumtavitikul, A., & Satthamnuwong, N. (1999). Effects of speaking rate on Thai tones. Phonetica, 56, 123-134.
Gandour, J. (1978). The Perception of Tone. In V. A. Fromkin (Ed.), Tone: A Linguistic Survey. New York, NY: Academic Press.
Gordon, R. G. (2005). Ethnologue: Languages of the world. Dallas, TX: SIL International.
Guion, S. G., Clark, J. J., Harada, T., & Wayland, R. P. (2003). Factors affecting stress placement for English nonwords include syllabic structure, lexical class, and stress patterns of phonologically similar words. Language and Speech, 46(4), 403-427.
Guion, S. G., Harada, T., & Clark, J. J. (2004). Early and late Spanish-English bilinguals' acquisition of English word stress patterns. Bilingualism: Language and Cognition, 7(3), 207-226.
Gussenhoven, C. (2004). The phonology of tone and intonation. New York, NY: Cambridge University Press.
Halle, M. (1998). The stress of English words 1968-1998. Linguistic Inquiry, 29, 539-568.
Hirst, D., & Di Cristo, A. (Eds.). (1998). Intonation systems: A survey of twenty languages. Cambridge: Cambridge University Press.
Howard, D., & Smith, K. (2002). The effects of lexical stress in aphasia word production. Aphasiology, 16(1-2), 198-237.
Jangjamras, J. (2008). English lexical stress perception by native Thai speakers. Paper presented at the 156th Meeting of the Acoustical Society of America, Miami, FL, November 21-23.
Jarmulowicz, L. (2002). English Derivational Suffix Frequency and Children's Stress Judgments. Brain & Language, 81(1-3), 192-205.
Jongenburger, W. (1996). The role of lexical stress during spoken-word processing. Dordrecht: ICG.
Jusczyk, P. W., Cutler, A., & Redanz, N. J. (1993). Infants' Preference for the Predominant Stress Patterns of English Words. Child Development, 64, 675-687.
Kallalyanamit, S. (2004). Intonation in Standard Thai: Contours, registers and boundary tones. Ph.D. Dissertation, Georgetown University, Washington, DC.
Kingdon, R. (1958). The groundwork of English stress. London: Longman.
Knaus, J., Wiese, R., & Janßen, U. (2007). The processing of word stress: EEG studies on task-related components. Paper presented at the 16th International Congress of Phonetic Sciences, Saarbrücken, Germany, August 6–10.
Ladd, D. R. (1996). Intonational phonology. New York, NY: Cambridge University Press.
Ladd, D. R. (2008). Intonational phonology (2nd ed.). New York, NY: Cambridge University Press.
Ladefoged, P. (1996). Elements of acoustic phonetics. Chicago, IL: The University of Chicago Press.
Ladefoged, P. (1999). American English. In Handbook of the International Phonetic Association: A guide to the use of the International Phonetic Alphabet. Cambridge, United Kingdom: Cambridge University Press.
Ladefoged, P. (2001). A course in phonetics. Boston, MA: Heinle & Heinle.
Ladefoged, P. (2010). UCLA Phonetics Lab Data Site. Retrieved November 1, 2010, from http://phonetics.ucla.edu/course/chapter10/thai/thai.html
Lehiste, I., & Peterson, G. E. (1958). Vowel amplitude and phonemic stress in American English. The Journal of the Acoustical Society of America, 31(4), 428–435.
Levelt, W. J. M. (1989). Speaking: From intention to articulation. Cambridge, MA: The MIT Press.
Li, F.-K. (1992). The Tai languages. In C. J. Compton & J. F. Hartmann (Eds.), Papers on Tai languages, linguistics, and literatures (pp. 1–4). DeKalb, IL: Northern Illinois University.
Ling, L. E., & Grabe, E. (1999). A contrastive study of prosody and lexical stress placement in Singapore English and British English. Language and Speech, 42(1), 39–56.
Luangthongkum, T. (1977). Rhythm in standard Thai. Ph.D. Dissertation, University of Edinburgh, Edinburgh.
Luksaneeyanawin, S. (1983). Intonation in Thai. Ph.D. Dissertation, University of Edinburgh, Edinburgh.
Luksaneeyanawin, S. (1998). Intonation in Thai. In D. Hirst & A. Di Cristo (Eds.), Intonation systems: A survey of twenty languages (pp. 345–359). New York, NY: Cambridge University Press.
Morén, B., & Zsiga, E. (2006). The lexical and post-lexical phonology of Thai tones. Natural Language & Linguistic Theory, 24, 113–178.
Nguyễn, T. A. T., & Ingram, C. L. J. (2005). Vietnamese acquisition of English word stress. TESOL Quarterly, 39(2), 309–319.
Nguyễn, T. A. T., Ingram, C. L. J., & Pensalfini, J. R. (2008). Prosodic transfer in Vietnamese acquisition of English contrastive stress patterns. Journal of Phonetics, 36, 158–190.
Nitisaroj, R. (2004). Perception of stress in Thai. The Journal of the Acoustical Society of America, 116(4), 2645.
Okobi, A. O. (2006). Acoustic correlates of word stress in American English. Ph.D. Dissertation, Massachusetts Institute of Technology, Cambridge.
Ortega, L. (2009). Understanding second language acquisition. London: Hodder Education.
Ou, S. C. (2007). Linguistic factors in L2 word stress acquisition: A comparison of Chinese and Vietnamese EFL learners' development. Paper presented at the 16th International Congress of Phonetic Sciences, Saarbrücken, Germany, August 6–10.
Panlay, S. (1997). The effect of English loanwords on the pronunciation of Thai. M.A. Thesis, Michigan State University, East Lansing.
Peperkamp, S., & Dupoux, E. (2002). A typological study of stress 'deafness'. In C. Gussenhoven & N. Warner (Eds.), Laboratory Phonology 7: Phonology & Phonetics (pp. 203–240). New York: Mouton de Gruyter.
Peperkamp, S., Vendelin, I., & Dupoux, E. (2010). Perception of predictable stress: A cross-linguistic investigation. Journal of Phonetics, 38, 422–430.
Peterson, G. E., & Lehiste, I. (1960). Duration of syllable nuclei in English. The Journal of the Acoustical Society of America, 32(6), 693–703.
Peyasantiwong, P. (1986). Stress in Thai. In R. J. Bickner & T. J. Hudak (Eds.), Papers from a Conference on Thai Studies in Honor of William J. Gedney (pp. 211–230). Ann Arbor: Center for South & Southeast Asian Studies, University of Michigan.
Pierrehumbert, J. (1980). The phonology and phonetics of English intonation. Ph.D. Dissertation, Massachusetts Institute of Technology, Cambridge.
Pierrehumbert, J., & Beckman, M. E. (1988). Japanese tone structure. Cambridge, MA: MIT Press.
Pierrehumbert, J., & Hirschberg, J. (1990). The meaning of intonation contours in the interpretation of discourse. In P. Cohen, J. Morgan & M. Pollack (Eds.), Intentions in Communication (pp. 271–311). Cambridge, MA: MIT Press.
Piske, T., MacKay, I. R. A., & Flege, J. E. (2001). Factors affecting degree of foreign accent in an L2: A review. Journal of Phonetics, 29(2), 191–215.
Potisuk, S., Gandour, J., & Harper, M. P. (1996). Acoustic correlates of stress in Thai. Phonetica, 53(4), 200–220.
Saengsuriya, N. (1989). Variations in duration, pitch and amplitude of Thai words in different utterance conditions. Ph.D. Dissertation, University of Kansas, Lawrence.
Satravaha, N. (2002). Tone classification of syllable-segmented Thai speech based on multilayer perceptron. Ph.D. Dissertation, West Virginia University, Morgantown.
Schneider, W., Eschman, A., & Zuccolotto, A. (2002). E-Prime reference guide (Version 1.1). Pittsburgh, PA: Psychology Software Tools Inc.
Sereno, J. A., & Jongman, A. (1995). Acoustic correlates of grammatical class. Language and Speech, 38(1), 57–76.
Sluijter, A. M. C., & van Heuven, V. J. (1996). Spectral balance as an acoustic correlate of linguistic stress. The Journal of the Acoustical Society of America, 100(4), 2471–2485.
Sluijter, A. M. C., van Heuven, V. J., & Pacilly, J. J. A. (1997). Spectral balance as a cue in the perception of linguistic stress. The Journal of the Acoustical Society of America, 101(1), 503–513.
Strange, W. (Ed.). (1995). Speech perception and linguistic experience. Timonium, MD: York Press.
Thubthong, N., Kijsirikul, B., & Luksaneeyanawin, S. (2001). Stress and tone recognition of polysyllabic words in Thai speech. Paper presented at the International Conference on Intelligent Technologies, Bangkok, Thailand, November 27–29.
Tingsabadh, K., & Abramson, A. S. (1999). Thai. In Handbook of the International Phonetic Association: A guide to the use of the International Phonetic Alphabet (pp. 147–150). New York, NY: Cambridge University Press.
Traunmüller, H., & Eriksson, A. (1995). The frequency range of the voice fundamental in the speech of male and female adults. Unpublished manuscript. Retrieved December 6, 2010, from http://www.ling.su.se/staff/hartmut/aktupub.htm
Trofimovich, P., & Baker, W. (2006). Learning second language suprasegmentals: Effect of L2 experience on prosody and fluency characteristics of L2 speech. Studies in Second Language Acquisition, 28(1), 1–30.
Turk, A. E., & Sawusch, J. R. (1996). The processing of duration and intensity cues to prominence. The Journal of the Acoustical Society of America, 99, 3782–3789.
Wayland, R. (1997). Non-native production of Thai: Acoustic measurements and accentedness ratings. Applied Linguistics, 18(3), 345–373.
Wayland, R. (2006). Native Thai speakers' acquisition of English word stress patterns. Journal of Psycholinguistic Research, 35(3), 285–304.
Wennerstrom, A. (1994). Intonational meaning in English discourse: A study of non-native speakers. Applied Linguistics, 15(4), 399–420.
Wennerstrom, A. (2001). The music of everyday speech. New York, NY: Oxford University Press.
Wutiwiwatchai, C., & Furui, S. (2007). Thai speech processing technology: A review. Speech Communication, 49(1), 8–27.
Yu, V. Y., & Andruski, J. E. (2010). A cross-language study of perception of lexical stress in English. Journal of Psycholinguistic Research, 39, 323–344.
Zhang, Y., & Francis, A. (2010). The weighting of vowel quality in native and non-native listeners' perception of English lexical stress. Journal of Phonetics, 38(2), 260–271.
Zhang, Y., Nissen, S., & Francis, A. (2008). Acoustic characteristics of English lexical stress produced by native Mandarin speakers. The Journal of the Acoustical Society of America, 123(6), 4498–4513.
Zhang, Y., & Francis, A. L. (2006). Acoustic correlates of English lexical stress produced by native speakers of Mandarin Chinese. The Journal of the Acoustical Society of America, 120(5), 3175.
BIOGRAPHICAL SKETCH
Jirapat Jangjamras was born and grew up in Bangkok, Thailand. She received a Bachelor of Arts degree in English in 1998 from Silpakorn University, Nakorn Pathom. She then became an instructor in the English Department at Chiang Mai University. After four years of teaching, she traveled to the United States in 2002 and received a Master of Arts in English as a second language in 2004 from the University of Arizona, Tucson. In the same year, she moved to Florida to study in the Linguistics Department at the University of Florida. She completed a Doctor of Philosophy in linguistics with a specialization in phonetics in 2011. During her Ph.D. training, she worked as a teaching assistant for the Linguistics 2000 course in Spring 2006, a grader for the psycholinguistics course in Spring 2009, and a grader for the morphology course in Spring 2010. She received a six-year Royal Thai Government scholarship for her M.A. and Ph.D. studies. In 2008, she received a Dissertation Grant from the journal Language Learning, which supported her doctoral data collection. She also received several travel grants from the College of Liberal Arts and Sciences, the Graduate Student Council, the Graduate School, and the Acoustical Society of America. After graduation, she will resume her teaching position in the English Department at Chiang Mai University, Thailand.