PERCEPTION AND PRODUCTION OF ENGLISH [n] AND [l] BY CHINESE DIALECT SPEAKERS

By

BIN LI

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2006
Copyright 2006 by Bin Li
To my parents.
ACKNOWLEDGMENTS

I would like to take this opportunity to express my gratitude to all who have helped me through the four-year pursuit of a dream. Without any of them, this dissertation would not have been possible.

First of all, I owe my heartfelt gratitude to my advisor, Dr. Ratree Wayland, who is an excellent mentor both in academics and in life. She has always been supportive and encouraging throughout my study. Her guidance is the source of my progress towards the doctoral degree. Her efficiency in working and persistence in academic pursuits have set for me a life-long model of an outstanding researcher and a successful teacher.

My gratitude also goes to the committee members. Dr. Caroline Wiltshire has helped me in various ways ever since the days when I applied for the program. Dr. Gillian Lord and Dr. Bonnie Johnson have given me valuable comments on the dissertation drafts, especially in the final stage.

I am also thankful to fellow linguistics students who have helped or shown moral support. Mohammed Al-Khairy, who graduated in 2004, was my resource. Jinping Zhu, Priyankoo Sarmah, Chris Barkley, Pedro Alcocer, and Jenna Silver all showed their kindness in helping with script writing and data collection. Andrea Dallas read through the draft carefully and provided helpful comments. I am also thankful to Jeomer Ta-ala, who is my "dissertation buddy".

The friendship that I found in Gainesville has turned the four years into an experience that will never be forgotten in a lifetime. I cherish every moment that has been
shared with friends. They have changed this place into a home for me. My neighbor, who was also my roommate, played a crucial part in my dissertation, because it was her "consistent mispronunciation" of certain words that sparked my interest in the current research.

Finally, I owe special thanks to one person who has seen me through these years and has shared with me every moment, sad and happy. Without him, I would not have been able to survive some of the hard times. I wish him the best of luck after graduation.

I would also like to thank Professor Ann Schmidt at Kent State University, who provided results from her research in the same area, and Professor Si Liu at Beihang University, who lent me the Experimental Semantics Lab for data collection in China.

I would like to use the last and most important paragraph to express my gratitude to my parents, who will not be able to come and witness the realization of a dream that we have all shared. Their sacrifice and support have made possible my academic pursuit in the USA. I am confident that I will continue to make them feel proud to have such a daughter.
TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION AND BACKGROUND
    The Chinese Language
    Mandarin Chinese
    Jianghuai, Xinan and Gan
    An Overview of the Study
        Research Questions
        Research Design

2 ARTICULATORY AND ACOUSTIC FEATURES OF ENGLISH /l/ AND /n/
    Articulatory and Acoustic Features of English [l]
        Phonetic Factors in [l] Perception and Production
        Syllable Position
        Summary
    Articulatory and Acoustic Features of English [n]
    Comparison of English /n/ and /l/
        Spectral Moments
        Summary

3 CROSS-LANGUAGE SPEECH PERCEPTION
    Models of (Cross-) Language Speech Perception
        The Native Language Magnet Model (NLM)
        The Perceptual Assimilation Model (PAM)
        The Speech Learning Model (SLM)
    Cross-language Research on the Perception of Sonorants
    Effects of Training in Cross-Language Speech Perception
        Experimental Factors in Speech Perception Research
            Factors affecting the degree of discriminability of non-native speech contrasts
            Factors affecting the outcome of training
    Relationship between Speech Perception and Production
    Summary

4 PERCEPTION EXPERIMENT
    Research Questions
    Experiment Design
        Participants
        Stimuli
        Procedure
        Data analysis
    Results
        Perception in the Pre-test
        Effects of Vowel Context in the Pre-test
        Syllable Structure
        Effects of Training
        Pre-test vs. Post-test Comparison
        Pre-test vs. Generalization Test
    Discussion
        Perception Patterns
            Vowel context
            Syllable structure
        Training Effects
    Summary

5 PRODUCTION OF ENGLISH [l] AND [n] BY AMERICAN ENGLISH AND XINAN SPEAKERS
    Research Questions
    Experiment Design
        Speakers
        Stimuli
        Acoustic Measurement
        Statistical Analysis
    Results and Discussion
        Duration of [n] and [l]
        Duration of the CV Transition
        Spectral Properties of English [n] and [l]
            Frequency of the first three formants (F1, F2, F3) and difference between F2 and F1
            Bandwidth of the first three formants
            Intensity of the first three formants
        Rate of Mean Intensity Change
        Spectral Moments of the Consonant
        Analysis of the Constriction Release
            Track of the F1 and F2 frequency
            Track of the F2 intensity
        Vowel Nasalization
    Discussion

6 GENERAL DISCUSSION AND CONCLUSION

APPENDIX: DISCRIMINANT FUNCTION ANALYSIS
LIST OF REFERENCES
BIOGRAPHICAL SKETCH
LIST OF TABLES

Table

1-1 Consonants in Jianghuai, Xinan and Gan (Yuan 1960)
1-2 Vowels in Jianghuai, Xinan and Gan (Yuan 1960)
2-1 Formant frequencies of English /l/ in previous studies
2-2 Mean duration of [l] and [n] at syllable initial positions in previous studies
4-1 English wordlist used in the data collection
4-2 Mean percentage correct in [l] and [n] identification for Jianghuai (n = 15), Xinan (n = 18), Gan (n = 7) and NM (n = 7) speakers in the pre-test
4-3 Mean percentage correct of [l] and [n] identification in four vowel contexts for NM speakers (n = 5) in the pre-test
4-4 Mean percentage correct of [l] and [n] identification in four vowel contexts for Jianghuai speakers (n = 15) in the pre-test
4-5 Mean percentage correct of [l] and [n] identification in four vowel contexts for Xinan speakers (n = 18) in the pre-test
4-6 Mean percentage correct of [l] and [n] identification in four vowel contexts for Gan speakers (n = 7) in the pre-test
4-7 Mean percentage correct in [l] and [n] identification in two syllable structures for NM speakers (n = 5) in the pre-test
4-8 Mean percentage correct of [l] and [n] identification in two syllable structures for Jianghuai speakers (n = 15) in the pre-test
4-9 Mean percentage correct of [l] and [n] identification in two syllable structures for Xinan speakers (n = 18) in the pre-test
4-10 Mean percentage correct of [l] and [n] identification in two syllable structures for Gan speakers (n = 7) in the pre-test
4-11 Mean percentage correct in [l] and [n] identification in two syllable structures for NM speakers (n = 5) in the pre-test
4-12 Mean percentage correct of [l] and [n] identification in two syllable structures for Jianghuai speakers (n = 15) in the pre-test
4-13 Mean percentage correct of [l] and [n] identification in two syllable structures for Xinan speakers (n = 18) in the pre-test
4-14 Mean percentage correct of [l] and [n] identification in two syllable structures for Gan speakers (n = 7) in the pre-test
5-1 Mean comparison of consonantal duration produced by Xinan and AE speakers
5-2 Formant frequency of the consonant in syllable singleton for all AE speakers
5-3 Formant frequency of the consonant in syllable singleton for Xinan female and male speakers
5-4 Bandwidth of F1, F2, F3 of AE speakers
5-5 Bandwidth of F1, F2, F3 of Xinan speakers
5-6 Mean intensities of individual formants of AE speakers
5-7 Mean intensities of individual formants by Xinan speakers
5-8 Rate of intensity change from right before the release into the vowel
5-9 Frequencies of F1 and F2 measured at the three consecutive pulses represented by A, B, and C of AE speakers
5-10 Acoustic dimensions that [l] and [n] contrast in the production of AE and Xinan speakers
LIST OF FIGURES

Figure

1-1 Atlas of Chinese dialects and distribution of sub-dialects of Mandarin, redrawn based on the description in Wurm and Li, et al. 1987, and Hou 2005
2-2 An FFT spectrum of [l] generated by PRAAT for the English word "Lee" produced by a male AE speaker in this study
2-3 An FFT spectrum of [n] generated by PRAAT for the English word "knee" produced by a female AE speaker in this study
4-1 Identification of [l] and [n] of all dialect speakers in the pre-test
4-3 Identification of [l] and [n] in two syllable structures by all dialect speakers in the pre-test
4-4 Comparison of mean percentage of correction between the pre-test and the post-test for Jianghuai, Xinan, Gan speakers
4-5 Comparison of mean percentage of correction between the pre-test and the generalization test for Jianghuai, Xinan, Gan speakers
4-6 Conceptual assimilation of [l] and [n] by Jianghuai and Gan speakers
5-1 Waveform of the word "Lee" produced by the speaker AE-5
5-2 Broadband spectrogram of "Lee" produced by AE-5
5-3 Waveform of "Lee" produced by AE-5
5-4 Mean consonantal duration in two syllable structures by AE and Xinan speakers
5-5 Mean CV duration in two syllable structures by Xinan and AE speakers
5-6 Mean formant frequencies for AE and Xinan female speakers
5-7 Mean formant frequencies for AE and Xinan male speakers
5-8 Mean values for F2-F1 for both AE and Xinan speakers
5-9 Intensity of the first three formants for AE and Xinan speakers
5-10 Comparison of the rate of intensity change between AE and Xinan speakers (n-s: /n/ in singleton, n-c: /n/ in cluster)
5-11 Center of Gravity for AE and Xinan speakers
5-12 Standard Deviation for AE and Xinan speakers
5-13 Kurtosis for AE and Xinan speakers
5-14 Skewness for AE speakers and Xinan speakers
5-15 Comparison of F1 track by AE and Xinan speakers
5-16 Comparison of F1 track by AE and Xinan speakers
5-17 Comparison of F2-F1 track by AE and Xinan speakers
5-18 Comparison of F2 intensity track between AE and Xinan speakers
5-19 Two-dimension spectra of "Lee" by the speaker AE 5
5-20 Two-dimension spectra of "knee" produced by the speaker AE 5
5-21 Two-dimension spectra of "Lee" produced by the speaker Xinan 6
5-22 Two-dimension spectra of "knee" produced by the speaker Xinan 6
Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

PERCEPTION AND PRODUCTION OF THE ENGLISH [n] AND [l] BY CHINESE DIALECT SPEAKERS

By

Bin Li

December 2006

Chair: Ratree Wayland
Major Department: Linguistics

Three Chinese dialects, Jianghuai, Xinan and Gan, do not differentiate the alveolar nasal from the alveolar lateral. Native speakers of these dialects have difficulty in both perceiving and producing the two sounds in English, even after years of experience. This study investigates whether short-term auditory training improves the ability of Chinese dialect learners of English to perceptually identify English [n] and [l] across four vowel contexts and two syllable structures. An acoustic experiment is also carried out to examine the production of these two sounds in English by both American English (AE) speakers and Xinan speakers. Both temporal and spectral dimensions of [n] and [l] are measured and compared between AE and Xinan speakers.

The perceptual experiment finds that Chinese dialect speakers demonstrate varying degrees of difficulty in perceiving [n] and [l], depending on their native dialect. Vowel contexts and syllable structures both affect the accuracy of identification for speakers of Jianghuai, Xinan, and Gan. Acoustic analysis indicates that production of
[n] and [l] by Xinan speakers differs in certain dimensions, such as F2 bandwidth and formant intensity, though in far fewer dimensions than the production of AE speakers. The results indicate that although Xinan speakers' perceptual ability improved after training, their production remained unchanged. Implications of the results and suggestions for future research are discussed.
CHAPTER 1
INTRODUCTION AND BACKGROUND

Languages differ in the number of consonants in their phonological inventories and in the phonetic properties used to distinguish those consonants. Such differences have implications for how adults who are learning a second language (L2) perceive and produce consonants, especially those whose phonetic qualities are not represented in a listener's native language (L1). Previous literature has established that adult L2 learners often have difficulty perceiving and producing consonants that are found in the L2 but not in their L1. A frequently cited and well-documented example is the perception and production of the English liquids, /ɹ/ and /l/, by adult Japanese speakers. Specifically, Japanese learners of English as an L2 have been reported to have difficulty in both perceiving and producing this English contrast because their native language lacks such a distinction (Miyawaki et al. 1975).

Failure to perceive and produce L2 consonants in a native-like fashion among adult L2 learners has led some researchers to hypothesize that a critical period exists for L2 speech learning (Lenneberg 1967; Scovel 1969; Patkowski 1989). However, even after years of research, it remains unclear whether such a critical period exists and, if it does, whether it is conditioned by biological or environmental factors. Thus, L2 speech perception and production difficulties could arise from one or more of the following factors: (a) a deterioration, slowing, or complete loss of some basic speech learning mechanism(s) (Lenneberg 1967); (b) inadequate L2 phonetic input (Flege 1995); and (c) similarities and differences between the L1 and L2 phonological systems (Best 1995). If the human speech learning mechanism is reduced or lost after a critical period,
one might reasonably ask if adult L2 learners are capable of learning to perceive or produce L2 speech sounds accurately.

The overall goal of this dissertation is to examine the efficacy of short-term laboratory training on the perception and production of a non-native consonant contrast by adult learners of English. Specifically, it investigated the ability to perceive the English [l] and [n] distinction by native speakers of three different Chinese dialects whose L1 phonological systems do not include such a contrast. To our knowledge, this study is among the first to systematically compare and contrast the perception of English [l] and [n] among the three groups of native Chinese listeners. Before discussing the design of this study, a brief overview of Chinese dialect classification systems, and of the three dialects under investigation, will be presented.

The Chinese Language

Chinese refers to the language used by the Han nationality. The Han nationality is one of fifty-six ethnic groups in China, and accounts for more than 90% of the total population. The other fifty-five ethnic groups are usually referred to as minority nationalities, and most have their own spoken languages (e.g., Tibetan of the Sino-Tibetan family; Uygur, Mongolian and Xibe of the Altaic family; Tajik and Russian of the Indo-European family; De'ang (Benglong) and Wa of the Austroasiatic family; as well as Austronesian languages spoken in Taiwan (Wurm and Li et al. 1987)).

On regional, social and historical grounds, varieties of Chinese, unlike the minority languages outlined above, are commonly regarded as dialects rather than separate languages (Chappell 2001, Wurm and Li et al. 1987, Yuan 1960), despite a lack of mutual intelligibility among some of them (Yip 2002). Although dialects differ in spoken forms, they all share one writing system in official use and education. This uniformity in
the writing system has served as a culturally unifying force for thousands of years. Chappell (2001) stated that the uniform written medium "has reinforced the belief that the spoken varieties in China are dialects of the one language rather than related languages" (2001: 4).

Regarding the dialectal classification of Chinese, some researchers claim that there are eight dialect families in Chinese (Wurm and Li et al. 1987), while others propose a classification of seven dialect families (Yuan 1960, Norman 1988, Hou 2005). This dissertation adheres to the eight-family classification system proposed by Wurm et al. (1987). The eight dialect families are listed as follows, and their geographic distribution is shown in Figure 1-1, in which areas numbered as 1 are places where languages other than Chinese are spoken:

1. Northern Chinese (Mandarin)
2. Jin
3. Xiang
4. Gan
5. Wu
6. Min
7. Kejia (Hakka)
8. Yue (Cantonese)

The seven- and eight-family classification systems differ in their treatment of Jin, which is spoken in the northwestern area of China. Jin is classified as a separate dialect in the eight-family system, and as a sub-dialect within one dialect family (Mandarin) in the seven-family system. Except for this difference, the two classification systems agree on the distinction between Mandarin and the other dialects.

Mandarin Chinese

In general, Mandarin deviates to a greater extent from ancient Chinese than other dialects do. To differentiate Mandarin from the other dialects, Hou (2005) describes
eight unique characteristics of Mandarin. (1) Voiced consonants in ancient Chinese disappeared and changed into voiceless counterparts in Mandarin. (2) Fricatives and affricates are only aspirated when bearing level tones. (3) Three syllable-final nasals, /m/, /n/, and /ŋ/, in ancient Chinese have merged into two in Mandarin: /n/ and /ŋ/. (4) There are fewer lexical tones in Mandarin than in other dialects (except in Jianghuai and Xinan). Besides phonological differences, Mandarin also differs from other dialects in its lexicon and grammar. (5) Mandarin uses a different word for general reference of the third person singular pronoun. (6) When referring to domestic animals and fowl, the lexeme that indicates the gender precedes the name. (7) In addition to tonal differences, for certain disyllabic words Mandarin and other dialects show opposite compounding order (e.g., cauliflower is called [ ] in Mandarin, whereas it is called [ ] in other dialects). (8) Syntactically, when a verb takes two accusatives, a person and an object, the accusative denoting a person precedes the object in Mandarin, whereas the object comes before the person in other dialects.

The shaded area in Figure 1-1 is where Mandarin is spoken. It is evident from the atlas that, geographically, Mandarin claims the largest expanse of territory, covering most areas in northern, central, and southwestern China. The other seven dialect families fall into near complementary geographic distribution with Mandarin, covering the northwestern, southeastern and southern parts of China. Moreover, Mandarin has the most speakers, with almost 73% of the total population in China.

Covering a vast geographic area and claiming a large number of speakers, Mandarin presents a very complex situation in terms of further classification into sub-dialects. In terms of regional distribution, historical development, and linguistic features, Mandarin is further divided into eight sub-dialects (Wurm and Li et al. 1987):

1. Beijing
2. Dongbei (Northeast)
3. Jiaoliao
4. Jilu
5. Zhongyuan (Central Prairie)
6. Lanyin
7. Jianghuai
8. Xinan (Southwest)

Although mutual intelligibility across dialect families is low, as speakers from different regions in the south cannot understand each other without the help of Putonghua (the Common Language), Mandarin displays a high degree of mutual intelligibility among its varieties. Speakers from the northernmost provinces can communicate with those from the southwestern provinces without resorting to Putonghua, or with little accommodation (Yuan 1960, Wurm and Li et al. 1987).

Jianghuai, Xinan and Gan

In this dissertation, one dialect of Chinese and two sub-dialects of Mandarin were investigated: Gan, Jianghuai, and Xinan. Gan is one of the six southern dialects, and it is mainly spoken in the province of Jiangxi (as shown in Figure 1-1). Jianghuai, a sub-dialect of Mandarin, is mainly spoken in areas to the north of the Yangtze River in Anhui and Jiangsu Provinces, and also in the areas between the Yangtze River and the city of Zhenjiang (as shown in Figure 1-1). Xinan, another sub-dialect of Mandarin, is mainly spoken in six southern provinces: Hubei (excluding its southeastern areas), Sichuan, Yunnan, Guizhou, northwest Guangxi, and northwest Hunan (as shown in Figure 1-1). For purposes of convenience, the three varieties will henceforth be referred to simply as 'dialects', as opposed to one dialect and two sub-dialects.
The three dialects differ phonologically in several ways, such as in their number of consonants and tones, as well as in their lexicons (Hou 2005, Yuan 1960). The phonetic realization of /n/ and /l/ is one of the major classifying distinctions among these dialects. Across all eight Chinese dialect families, /n/ and /l/ can be found as separate phonemes, but this contrast does not exist, or is only partially maintained in certain phonetic contexts, in the three dialects of interest: Gan, Jianghuai, and Xinan. The lack of a distinction between /n/ and /l/ in certain word-initial positions in Gan was documented as early as the 1940s by Zhao (1948). Moreover, Zhao's fieldwork in the cities of Wuchang (Xinan), Hankou (Xinan), and Hanyang (Xinan) showed that in Xinan there was no distinction between /n/ and /l/, with [n] accounting for the majority of realizations. The merger of /l/ and /n/ is also evident in some areas where Wu and Xiang dialects are spoken (Yuan 1960). Yuan also reported that the mixing of /n/ and /l/ varies in the southern areas. In Nanjing (Jianghuai), [n] occurred only in word-final position, whereas [l] occurred only word-initially. In Chongqing (Xinan), /l/ was not found in the sound system at all.

Table 1-1 and Table 1-2 show the phoneme inventories of Gan, Jianghuai, and Xinan. Except for /n/ and /ŋ/ in Xinan and Gan, and /p/, /t/, /k/ in Gan, consonants are not allowed to appear in the coda of a syllable, and no consonant clusters of any kind are legal in any of the dialects. For vowels, all high vowels can be combined with other non-high vowels, except the central vowel, to form diphthongs.

In summary, the distribution of /l/ and /n/ varies in the three dialects of interest. Unlike in Putonghua, where /l/ and /n/ are phonemically contrastive, [n] and [l] are allophones of a single phoneme in Jianghuai. It is realized as [l] syllable-initially, and as
[n] syllable-finally. In Xinan, only one phoneme of the pair remains: /n/ occurs in both syllable-initial and syllable-final positions; /l/ is entirely absent from the phonological system. The distribution of /n/ and /l/ is more complex in Gan: the contrast is maintained before high vowels, whereas [n] and [l] occur in free variation in all other environments, and /l/ is not allowed in syllable-final position.

An Overview of the Study

Research Questions

The overall goal of this study was to investigate the ability of adult native speakers of Jianghuai, Xinan and Gan dialects to perceptually identify English [l] and [n]. It also aimed to test whether laboratory training would improve these speakers’ perception of the non-native contrast. The carry-over effect of perceptual training to production was also examined. Specifically, this research was guided by the following questions:

Research question 1: Do Chinese dialect speakers have difficulty in perceptually identifying English [l] and [n] as predicted by current speech perception models?

Research question 2: Does their identification accuracy vary as a function of their native dialect?

Research question 3: Is their identification accuracy influenced by such phonetic contexts as vowel environment and syllable structure?

Research question 4: Will their identification accuracy improve after training? Would the amount of improvement vary as a function of L1 background?

Research question 5: What are the acoustic cues American English speakers use to distinguish [l] and [n] in their production?

Research question 6: Will Xinan speakers be able to distinguish English [l] and [n] in production after perceptual training?

Research Design

To answer the six research questions, the research was designed as follows. The ability to perceptually distinguish English [l] and [n] among native speakers of
Jianghuai (n = 15), Xinan (n = 18) and Gan (n = 7) dialects before and after perceptual training was investigated. Five speakers of northern Mandarin sub-dialects were also included as controls in the pre-test of the perception experiment. The general design of the perception experiment included four phases: a pre-test phase, a perceptual training phase, a post-test phase and a generalization phase. During the pre-test phase, the ability of all participants to identify English [l] and [n] in natural stimuli was assessed. In the perceptual training phase, the Jianghuai, Xinan, and Gan participants were trained to identify English [l] and [n] using the same stimuli as those used in the pre-test. After 4 days of training, their ability to identify English [l] and [n] was evaluated in a post-test phase. The post-test procedure was identical to the pre-test procedure. Furthermore, to assess whether the effectiveness of the training transferred to stimuli produced by different talkers, the participants’ identification of English [l] and [n] was assessed again during a generalization phase.

Unlike in some previous studies (Bradlow et al. 1997), the issue of whether improved perception skill after training would readily transfer to production skill without further production training was not one of our original research questions. Therefore, a comparison of English [l] and [n] production by speakers of Jianghuai, Xinan, and Gan before and after the perceptual training was not part of our original research protocol. Nonetheless, six Xinan speakers were available the day after the post-test to provide production data. To see whether Xinan speakers distinguished English [l] and [n] in their production after perceptual training, their production was acoustically analyzed and compared to that of native speakers of American English. It is important to note, however, that if a distinction between English [l] and [n] is evident in Xinan
speakers’ production, the distinction cannot be conclusively attributed to training. However, if no distinction is realized, it may be concluded that the effectiveness of the perceptual training did not transfer to production immediately after the training.

In the following chapters, I will first discuss and compare the articulatory and acoustic features of English [l] and [n]. Next, I will review theoretical models of L2 speech perception and how each model proposes to account for the difficulty posed by English /l/ and /n/ in Chinese dialect speakers’ L2 perception. In Chapter 4, I will focus on the perception experiment. First, research questions will be elaborated, and predictions and hypotheses will be proposed. Next, the methodology adopted in the perception experiment will be described and justified. Then, results will be presented and discussed. In the following chapter, Chapter 5, I will focus on the acoustic analysis of the American speakers and Xinan speakers. Results and implications will be discussed. In the last chapter (Chapter 6), I will conclude the dissertation. Implications as well as limitations of the current study will be addressed, and future research will be proposed.

Figure 1-1 Atlas of Chinese dialects and distribution of sub-dialects of Mandarin, redrawn based on the description in Wurm and Li et al. 1987, and Hou 2005.
Table 1-1 Consonants in Jianghuai, Xinan and Gan (Yuan 1960).

Table 1-2 Vowels in Jianghuai, Xinan and Gan (Yuan 1960).

CHAPTER 2
ARTICULATORY AND ACOUSTIC FEATURES OF ENGLISH /l/ AND /n/

The consonant systems of many of the world’s languages contain a voiced lateral approximant and a nasal consonant. According to the UCLA Phonological Segment Inventory Database (UPSID), 122 out of 317 languages have a voiced dental/alveolar lateral approximant /l/, and 155 languages include a voiced dental/alveolar nasal /n/ in their consonant inventories (Maddieson 1984). Additionally, /l/ and /n/ constitute a phonemic contrast in many of these languages, for example, English, German, Romanian, Sinhalese, and Hausa. The contrast between /l/ and /n/ is also evident in many dialects of the Chinese language, for example, the northern sub-dialects of Mandarin and Jin. Such a distinction is, however, absent or not completely maintained in the three dialects of interest: Jianghuai, Xinan, and Gan. In Jianghuai, the phonemic inventory contains only /l/, which has two allophones: [l] in syllable-initial positions and [n] in syllable-final positions. In Xinan, only /n/ appears in the system and it does not have allophones. In Gan, /l/ and /n/
contrast before high vowels, but are in free variation in other vowel contexts. Recall that the goal of this dissertation is to investigate the perception of English [l] and [n] by speakers of the three dialects of interest. As little research has been devoted to a direct comparison between /l/ and /n/ in English, this chapter will discuss and compare the two sounds in terms of their articulatory and acoustic characteristics before the study of their perception and production is addressed.

In the first section, articulatory and acoustic characteristics of /l/ are reviewed and discussed. Although English /l/ has two allophonic variants, namely the prevocalic [l] as in the word “light”, and the postvocalic [l] as in the word “feel”, only the prevocalic allophone [l], also referred to as the ‘light’ [l], will be discussed here as this is the focus of the dissertation. Section 2 discusses articulatory and acoustic properties of /n/. Acoustic characteristics of /l/ and /n/ are then compared and contrasted in section 3; and finally, a discussion of additional acoustic dimensions that serve to distinguish /l/ and /n/ is presented.

Articulatory and Acoustic Features of English [l]

English [l] is phonetically described as a lateral because it is produced with the tip of the tongue forming a mid-line closure at or near the alveolar ridge, allowing airflow to escape on either side of the occlusion (i.e., lateral release) (Kent and Read 1992). According to Stevens (1998), this splitting of the airway results in certain acoustic characteristics in the middle and higher frequencies which serve to differentiate the spectra of [l] from other speech sounds, especially glides and vowels. Specifically, it is hypothesized that such bifurcation of the vocal cavity divides the energy of the source (i.e., the vibrating vocal folds) into two channels other than the main route (Hayward 2000, Stevens 1998), which introduces antiformants or zeros (i.e., negative peaks or a gap in the acoustic output) into the lateral resonance, as illustrated in Figure 2-1. As shown in this figure, the antiformant or zero cancels the formant or pole, causing a drop in the acoustic intensity of the sound.

This hypothesis has been confirmed by empirical studies on lateral sounds. For instance, Kent and Read (1992) reported that English [l] has less low-frequency energy than surrounding vowels, as a result of antiformants caused by the interaction of the sound waves in the two lateral cavities. In addition to a lower-range energy reduction, Stevens (1998: 546) reported that antiformants for [l] might also occur within the frequency range of 2200-4400 Hz, but that the precise value depended on the length of the side-branch (the air trapped on top of the tongue). Possible locations of antiformants in the English word “Lee” are shown in Figure 2-2. The antiformants are located at approximately 1700 Hz, 2700 Hz, and 3800 Hz. In sum, the configuration of the vocal tract during the production of [l] introduces both poles (formants) and zeros (antiformants) in its acoustic output spectrum.

Formant frequencies of English [l] have been measured in a number of previous studies. Table 2-1 shows mean formant frequencies (F1, F2 and F3) of [l] from three different studies (Lehiste 1964, Nolan 1983, Stevens 1998). According to Lehiste (1964), the frequency of the first formant (F1) of [l] measured at the glottal pulse following the constriction release is usually higher than that of other approximants such as [j] or [w]. The second formant (F2) has been found in some studies to be similar in frequency to F2 in [ ] (Kent and Read 1992) but with a larger bandwidth (Stevens and Blumstein 1994; Stevens 1998). The larger bandwidth or broad resonance peaks in the spectrum of [l] are due to its articulatory properties: the bifurcation of the vocal cavity results in a greater
surface area available in the vocal tract for the production of [l] compared to other non-lateral and non-nasal sounds. Therefore, more sound energy is absorbed and the amplitude of [l] is lowered (Stevens 1998). Additionally, the narrow constriction for [l] introduces acoustic losses that increase the F1 bandwidth (Stevens 1998).

In addition to differences in spectral features, the articulatory configuration of [l] creates unique temporal characteristics when compared to other non-lateral sounds. Two of these characteristics, which provide essential cues to differentiate [l] from other approximants and stops, are the rate of consonant-vowel (CV) transition and the duration of formant transitions. Dalston (1975) suggested that the tongue contact involved in the articulation of [l] might account for F1 and F2 being longer in [l] than in other approximants, and that the rapid movement of the tongue tip may be the cause of short and rapid F1 transitions in [l]. Furthermore, Lehiste (1964) found a rapid rise in F1, which she considered a strong cue for identifying /l/ compared to other approximants like /j/ or /w/.

Spectral differences have also been attributed to tongue movement. The rapid movement of the tongue tip from the alveolar ridge during the production of [l] may be responsible for a sharp spectral discontinuity at the offset of a prevocalic [l]. According to Johnson (2002), this abrupt acoustic discontinuity is presumably due to the abrupt movement of the tongue tip as it leaves the alveolar ridge, thereby producing a single central cavity rather than two lateral ones. Additionally, spectral discontinuities have been considered typical of prevocalic laterals (Lehiste 1964).

Phonetic Factors in [l] Perception and Production

Phonetic properties of [l] have also been shown to be affected by syllable structure and vowel contexts (Fant 1960, Lehiste 1964). For example, Lehiste (1964) reported that
a syllable-final [l] seems to have a smaller F2-F1 space than a syllable-initial [l]. Additionally, Stevens and Blumstein (1994) found that F1 and F2 frequencies in prestressed laterals varied depending on the formant frequencies of the following vowel: the F2 of [l] is higher before a front vowel and its F1 is higher before a low vowel. The F1 of [l] may also be close to the F1 frequency of [I] (Kent and Read 1992). In other words, the formant structure of a lateral interacts with that of an adjacent vowel: the first three formants of a prevocalic [l] anticipate those of the following vowel.

Syllable Position

Regarding the effect of syllable structure on the phonetic properties of [l], O’Connor et al. (1957) reported that there is a more significant energy change between the lateral [l] and the following vowel if the lateral is syllable-initial and not in a cluster with another consonant. On the other hand, O’Connor et al. (1957) found that if the lateral was in a cluster with another consonant, there would be a greater energy change between [l] and the following vowel if the preceding consonant was voiced; in this environment, the lateral is also completely voiced.

Summary

The English lateral [l] is produced with a mid-line tongue-tip constriction allowing sound to radiate through one or both of the lateral openings. Such bifurcation of the vocal tract introduces antiformants, increases formant bandwidth, and decreases the overall amplitude of [l] relative to that of vowels and other non-lateral consonants. Finally, the rapid movement of the tip of the tongue away from the alveolar ridge during the production of [l] leads to an abrupt spectral discontinuity from [l] to the following vowel and a relatively short and rapid F1 transition. The features described
above are unique to [l]; as such, they play a role in a listener’s ability to distinguish [l] from other non-lateral consonants.

Articulatory and Acoustic Features of English [n]

The production of the English voiced alveolar nasal [n] requires an occluded oral cavity, a lowered velum and a radiation of sound through the nasal cavity. The vocal tract extends from the glottis to the nostrils, and the oral cavity behind the articulatory closure functions as the side-branch (Fujimura and Erickson 1997).

The velopharyngeal opening and the accompanying occluded oral cavity during the production of [n] cause sound to radiate exclusively through the nose and give rise to the acoustic feature of a nasal murmur (Kent and Read 1992). Similar to vowels, the nasal murmur contains a number of spectral peaks or formants. However, the amplitude of these formants is much weaker than that of the vowel formants. The only formant in the nasal murmur that has amplitude comparable to that of the vowel formants is the low-frequency formant typically referred to as the nasal formant (Kent and Read 1992). In other words, the nasal murmur is characterized by a dominant low-frequency (around 300 Hz) formant, namely the nasal formant, and a number of much weaker formants at higher frequencies.

Similar to the bifurcation of the vocal tract in the articulation of [l], the involvement of both the oral and the nasal cavities during the articulation of [n] produces antiformants in its resonance (Hayward 2000, Kent and Read 1992). Specifically, the coupling between the nasal and the oral passage at the velopharyngeal port causes the formation of antiformants. The frequencies of the antiformants depend on the length of the nasal cavity, which extends from the uvula to the nares, and have an average spacing of 1400 Hz for adult
males. According to Stevens (1997), the first antiformant might occur in the range of 250-300 Hz, and the second in the range of 1500-1800 Hz. Figure 2-3 below illustrates possible locations of antiformants in the spectrum of the English word “knee” taken at the mid-point of [n]. One antiformant is below 500 Hz and the other one is at around 1700 Hz.

The antiformants are, however, barely measurable through conventional spectral analysis such as LPC (linear predictive coding), partly because they are often disguised by the formant structure of the nasal sound, or filled up by background noise (Johnson 2002). Johnson suggests that it is unlikely that the human auditory system could pick up on anti-resonance as the cue for perceiving nasals.

As mentioned above, besides the low-frequency nasal formant and the antiformants, the spectrum of the nasal murmur also contains formants in the higher frequency region. According to Kent and Read (1992), similar to the antiformants, the formants of the nasal cavity (the resonator) are determined by the length of the cavity extending from the uvula to the nares, which is estimated to be approximately 12.5 cm in adult males. Thus, the spacing between nasal formants is approximately 1400 Hz. As stated above, the amplitude of these higher-frequency formants is, however, much weaker than the amplitude of the vowel formants.

The reduction in the amplitude of higher formants in the nasal murmur is due to the coupling between the oral and nasal cavities. As already mentioned, the coupling of the two cavities introduces antiformants into the nasal spectrum. According to Johnson (2003), the presence of an antiformant reduces the amplitude of all other formants above it. Additionally, the coupling of the nasal and the oral cavity results in increased surface
area and additional volumes, which absorb more energy than in non-nasal sounds. This weakens the formants of [n] and widens their bandwidth. Such a weakening effect shows up in the output spectrogram as an overall lower amplitude of [n] than of surrounding vowels.

Although nasals have relatively low amplitude compared to surrounding vowels, variations in amplitude do not seem to affect the perception of nasals in certain vowel contexts. Abramson et al. (1979) found a statistically insignificant shift in the position of the boundary along a synthesized [d]-[n] continuum in high and low vowel contexts. To account for this shift between oral and nasal boundaries, they hypothesized that low vowels require a larger area of velopharyngeal coupling in order for listeners to perceive the consonant as being nasal, but that the magnitude of boundary shift due to loudness level is insufficient to account for the oral-nasal category boundary between open and close vowels.

In summary, the occlusion of the oral cavity accompanied by the opening of the velopharyngeal port generates the so-called ‘nasal murmur’ during the production of a nasal consonant. The nasal murmur is characterized by a dominant low-frequency resonance commonly referred to as a ‘nasal formant’ and a number of weaker formants at higher frequencies.

Comparison of English /n/ and /l/

Direct comparisons of the acoustic properties of [n] and [l] are relatively sparse in the speech acoustics literature. Acoustically, laterals are most similar to nasals because the source (the glottis) is the same and the filter function contains antiformants due to the bifurcation of the vocal cavity (nasal and oral tracts in the case of [n]; lateral branches in the case of [l]) (Ladefoged 2003: 145). As discussed above, when compared to vowels and other non-nasal and non-lateral consonants, both [l] and [n] have a relatively low acoustic energy with a predominantly low-frequency concentration.

From an acoustic point of view, Coleman (1998) claims that /l/ and /n/ constitute a natural class in both having additional spectral zeros between F2 and F3, which are caused by side-branches in articulation as discussed earlier. [l] and [n] also have similar spectra: the F3 locus for [l] in onset positions is as high as that of [n], and both have a similar formant-antiformant pair in a higher frequency range; furthermore, most of the energy is in the low frequencies, with damped higher formants for both sounds (Lass 1996).

Such acoustic similarity has been reported to cause confusion between nasals and laterals in automatic recognition studies focusing on the distinction between /n/ and /l/, laterals being the only class of approximants that shares the features +sonorant, +voice and -syllabic with nasals. Mermelstein (1977) reported that /l/ and / / before high vowels were often confused with nasals in machine recognition. Similar confusion has also been confirmed by more recent studies (Espy-Wilson 1992, Pruthi 2004). According to these researchers, a majority of the misclassifications of nasals occur when nasals appear in a context with high-frequency energy such as a low vowel / /. Automatic recognition research has, thus, suggested that currently known cues are not adequate for successful discrimination between /n/ and /l/. An investigation into more robust and unambiguous acoustic cues in the distinction between nasals and laterals is warranted.

The articulation of [n] and [l] differs in important ways. First, unlike [l], sound energy is radiated only through the nasal passage during the articulation of [n], producing a nasal formant (250-300 Hz) in the spectrum for a nasal murmur. Second, the velum
remains lowered shortly after the production of [n], causing the following vowels to become nasalized to a varying degree. Nasals have been shown to affect surrounding vowels due to slow movement of the velum. Acoustically, the reduction in amplitude of F1 has been observed to be the primary cue of nasalization (Fant 1960, House and Stevens 1956). House and Stevens (1956) used synthetic stimuli to show that a lowering of F1 amplitude by 6-8 dB was necessary to achieve a significant level of perceived nasality.

Hawkins and Stevens (1984) investigated the distinction between nasal and oral vowels in a perceptual experiment with listeners from four language backgrounds. Based on their results, they postulated that there is a basic acoustic property of nasality, independent of vowels, to which the auditory system responds in a distinctive way regardless of language background. In addition, there are one or more additional acoustic properties that may be used to varying degrees in different languages to enhance the contrast between a nasal and an oral vowel. Hawkins and Stevens (1984) proposed that the degree of prominence of the spectral peak in the vicinity of F1 was the basic differing acoustic property. Shifts in the center of gravity of the low-frequency spectral prominence were also considered important in the nasal vs. non-nasal distinction.

Transitional segments, both before and after the nasal murmur, are also effective as cues for detecting the nasal as a class, as the nasal quality is slightly prolonged, although transitions to nasals are relatively short, generally not exceeding 50 ms (Mermelstein 1977).

Nasalization in natural speech can also give rise to changes in the spectrum at higher frequencies. There may be shifts in the frequencies of higher formants, modifications in the amplitudes of the spectral peaks, as well as the introduction of
additional spectral peaks. These changes in the acoustic spectrum at high frequencies, however, do not seem to be as consistent across different speakers and vowels as those in the vicinity of F1 (Hawkins and Stevens 1984). This lack of consistency was attributed to individual differences in the anatomy of the nasal cavity.

The third difference between [n] and [l] is the configuration of the tongue. Articulation of [l] involves the sides of the tongue, as well as an alveolar constriction. [n] shares the alveolar constriction, but does not involve lowering the center and curling the sides of the tongue. The different articulations of [l] and [n] result in different transitions to any following vowels. More specifically, during the few tens of milliseconds following the consonantal release, the rate of release of the tip of the tongue would be similar for both, but the downward movement of the lateral edges would be different (Stevens 1998). Thus, an analysis of the two sounds at the consonantal release should provide more information on the difference between the two. Spectral analysis of the glottal pulses immediately near the release or the CV transition onset has been investigated, and has proved effective in the perception of place of articulation for nasals (Kurowski and Blumstein 1984). Kurowski and Blumstein also found that the highest scores of correct identification by listeners were obtained when the stimuli consisted of the last three glottal pulses of the murmur and the first three glottal pulses of the transition in a CV syllable.

The fourth difference between [l] and [n] concerns their energy distribution. Although both exhibit predominantly low-frequency concentrations, their energy distributions in the higher frequency region may differ. Stevens (1998: 547) points out that
    Small changes in the frequency of the zero can give rise to large changes in the amplitudes of the spectral peaks corresponding to the poles in higher-frequency region (1500 to 4000 Hz). Consequently, when the vocal tract is in the lateral configuration, considerable variability can be expected in the spectral shape in the frequency range of 2500 to 4000 Hz.

Therefore, an examination of overall energy distribution may yield interesting results, warranting a look at spectral moments.

Spectral Moments

Spectral moments aim at capturing “both local (mean frequency) and global (spectral tilt and peakedness) aspects of speech sounds” (Jongman et al. 2000: 1253). Spectral moments include four components: Center of Gravity (COG), Standard Deviation (SD), Skewness, and Kurtosis.

Center of gravity: The first moment, Center of Gravity, is a measure of where the center of the energy concentration produced by the constriction lies. It provides information about the average concentration of energy (Forrest et al. 1988, Jongman et al. 2000).

Standard deviation: The second moment, Standard Deviation, is a measure of how much the energy spreads out from the Center of Gravity. It indicates the range in energy concentration.

Skewness: The third moment, Skewness, is a measure of how much the shape of the distribution below the Center of Gravity differs from the shape of the distribution above the Center of Gravity. It provides information about the tilt of the energy distribution (Forrest et al. 1988). A value of ‘0’ points to symmetry around the mean; ‘positive’ values denote that the right tail of the distribution extends further than the left tail, whereas a ‘negative’ skewness implies that the left tail extends further than the right tail (Jongman et al. 2000).
Kurtosis: The fourth moment, Kurtosis, is a measure of how much the distribution differs from a Gaussian shape in terms of the peakedness of the central section. It indicates the peakedness of the energy distribution (Forrest et al. 1988); negative values are an indication of a flat spectrum, while positive values are a sign of a spectrum with clearly defined peaks (Jongman et al. 2000).

Two primary factors affect the outcome of spectral moments: the bandpass filter and the window length in spectrum analysis. Generally, a 30-ms window fits best for speech spectrum analysis (Carter 2002, Stevens 1998). Details of the filter applied to the spectrum have an important role in the outcome of a moment analysis. This is particularly true in the case of liquids (Carter 2002). There is, however, no generally accepted standard for selecting regions of spectra. Indeed, some analysts select more than one fixed window to act as a bandpass filter on the spectrum before a center of gravity analysis is performed (Beddor and Hawkins 1990), while other analysts have used a dynamic approach (Traunmüller 1990).

Moment analyses have been used in studies of fricatives (Jongman et al. 2001, Al-Khairy 2005). They have not yet been applied to the contrastive analysis of /n/ and /l/. Considering the existence of antiformants and their possible variable locations in the spectra of [n] and [l], the distribution of energy (or peaks and zeros in the spectrum) of the two sounds should be different. Moment analysis would be able to capture the difference more successfully than common spectral analysis.

The fifth and last important difference concerning [l] and [n] is that they may differ in consonantal duration. Romero (1993) reported a difference in duration between the fricative and the approximant in Spanish: the Spanish bilabial approximant had a longer
duration than its homorganic fricative. She attributed this difference to the difference in articulation of the two classes of sounds: fricatives have a longer, more precise articulatory configuration than approximants, which create no turbulence and, consequently, need not remain in a particular articulatory configuration for long. As /n/ is a stop and also has a precise articulatory configuration, its duration may be longer than that of /l/, which is an approximant. A review of previous studies, however, showed an opposite pattern: the duration of [l] at the onset of a syllable was on average longer than that of [n], except in Toft (2002); the values are listed in Table 2-2 below (Barry 2000, Byrd 1993, Toft 2002, Umeda 1976). Such a discrepancy among studies raises more interest in an investigation of the duration of [l] and [n].

Summary

Although acoustically similar to each other, [n] and [l] are distinguishable along certain acoustic dimensions. Among these dimensions, features of the consonant formants, CV transition, and spectral analysis of the constriction release have been used to investigate [n] and [l]. Other dimensions, e.g., duration of the consonant, the rate of mean intensity change, spectral moments of the consonant, and the nasalization of the following vowel, have not previously been used to compare [n] and [l].

As this dissertation uses natural tokens, it is not possible to control all acoustic features; thus their effects on perception will not be investigated and compared. The effects of two contexts, however, can and will be examined: vowel contexts and positions of /l/ and /n/ in a syllable. Regarding the production analysis, the following eight acoustic dimensions will be examined and compared for both AE and Xinan speakers: consonant duration, CV transition, formant frequencies and bandwidth and intensity, the difference between F1 and F2, mean intensity change, spectral properties of glottal pulses, spectral
moments, and vowel nasality. In the next chapter, theories and models of cross-language speech perception involved in this dissertation will be reviewed and discussed.

Figure 2-1 A simplified two-dimensional spectrum for the illustration of antiformants.

Figure 2-2 An FFT spectrum of [l] generated by PRAAT for the English word “Lee” produced by a male AE speaker in this study.
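The 1400 Hz spacing cited in this chapter for the nasal formants and antiformants is consistent with treating the nasal passage as an idealized uniform tube closed at one end, whose resonances lie at odd multiples of c/4L and are therefore spaced c/2L apart. The following is a minimal sketch under that assumption; the function names and the speed of sound (35,000 cm/s) are illustrative choices and are not taken from the cited studies.

```python
# Idealized uniform tube, closed at one end. This is an assumption: the real
# nasal tract is not uniform, so only the spacing is meaningful here.
SPEED_OF_SOUND_CM_PER_S = 35000.0  # assumed value for warm, moist air


def resonance_spacing_hz(tube_length_cm, c=SPEED_OF_SOUND_CM_PER_S):
    """Spacing between successive resonances of the tube: c / (2L)."""
    return c / (2.0 * tube_length_cm)


def tube_resonances_hz(tube_length_cm, count, c=SPEED_OF_SOUND_CM_PER_S):
    """First `count` resonances: (2k - 1) * c / (4L) for k = 1..count."""
    return [(2 * k - 1) * c / (4.0 * tube_length_cm) for k in range(1, count + 1)]


# 12.5 cm is the adult-male cavity length cited from Kent and Read (1992).
spacing = resonance_spacing_hz(12.5)
```

With a 12.5 cm tube, the sketch yields a spacing of 1400 Hz, in line with the value given in the text. The absolute resonance values of the idealized tube should not be read as measured nasal formant frequencies.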
Figure 2-3 An FFT spectrum of [n] generated by PRAAT for the English word “knee” produced by a female AE speaker in this study.

Table 2-1 Formant frequencies (Hz) of English /l/ in previous studies.

Study                      F1     F2      F3
Nolan (1983)               360    1350    3050
Lehiste (1964)             295    980     2600
Stevens (1998), female     350    1180
Stevens (1998), male       360    900

Table 2-2 Mean duration of [l] and [n] at syllable-initial positions in previous studies.

[l] onset        Duration (ms)    [n] onset        Duration (ms)
Toft (2002)      81               Toft (2002)      86
Barry (2000)     95               Umeda (1976)     71
Umeda (1976)     66               Byrd (1993)      59
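As a supplementary illustration of the four spectral moments defined in this chapter, the following minimal sketch treats a normalized power spectrum as a probability distribution over frequency and computes its center of gravity, standard deviation, skewness, and kurtosis. It is not the analysis procedure used in this study: the function name, the use of power rather than amplitude as the weighting, and the toy spectrum are all assumptions made for illustration. Kurtosis is computed as excess kurtosis (3 is subtracted), so that a Gaussian shape scores 0, in keeping with the reading of negative values as a flat spectrum.

```python
import math


def spectral_moments(freqs_hz, powers):
    """Return (center of gravity, standard deviation, skewness, kurtosis).

    The power values are normalized so that they sum to 1 and are then
    treated as a probability distribution over frequency.
    """
    total = sum(powers)
    if total <= 0:
        raise ValueError("spectrum has no energy")
    weights = [p / total for p in powers]

    # First moment: mean frequency (center of gravity).
    cog = sum(f * w for f, w in zip(freqs_hz, weights))

    # Central moment of a given order around the center of gravity.
    def central(order):
        return sum(((f - cog) ** order) * w for f, w in zip(freqs_hz, weights))

    variance = central(2)
    if variance == 0:
        raise ValueError("all energy lies at a single frequency")
    sd = math.sqrt(variance)
    skewness = central(3) / sd ** 3
    # Excess kurtosis: 3 is subtracted so that a Gaussian shape scores 0.
    kurtosis = central(4) / variance ** 2 - 3.0
    return cog, sd, skewness, kurtosis


# Toy spectrum, symmetric around 200 Hz (hypothetical values).
cog, sd, skew, kurt = spectral_moments([100.0, 200.0, 300.0], [1.0, 2.0, 1.0])
```

For the symmetric toy spectrum, the center of gravity falls at 200 Hz and the skewness is zero; shifting energy toward the low frequencies leaves a longer right tail and produces a positive skewness. In practice, the spectrum would first be obtained from a windowed and bandpass-filtered portion of the signal, and the choice of window and filter affects the resulting values, as noted in the text.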
CHAPTER 3
CROSS-LANGUAGE SPEECH PERCEPTION

Adults are language-specific perceivers whose L1 profoundly influences their L2 perception (Aoyama et al. 2004, Flege 1993, Goto 1971, Lively et al. 1994). The acquisition of an L2 phonological system after the establishment of an L1 in childhood requires “reattunement of perceptual phonetic processes and the perceptual reorganization of phonological categories” (Strange 1995: 197).

For decades, researchers have been motivated by questions regarding speech processing by children and adults, and by questions on cross-language speech perception. In the 1970s and 1980s, perception studies concerning native speakers and L2 learners demonstrated that phonetic segments were generally perceived in terms of the speaker’s native categories (Abramson and Lisker 1970, Werker and Tees 1984). Since then, theories and models, such as the Native Language Magnet Model, the Perceptual Assimilation Model, and the Speech Learning Model, have been proposed to explain the relationship between acoustic signals and linguistic elements such as phonemes and distinctive features, and the mapping of L2 sounds onto L1 phonetic categories.

In this chapter, the three models of cross-language speech perception are reviewed and discussed in the first section. This is followed by a review of previous studies on sonorants and of variables in speech perception experiments. Next, the relationship between speech perception and production is discussed, followed by a review of relevant research. Factors that may influence the effects of training are discussed after the review.
Models of (Cross-) Language Speech Perception

Within cross-language perception research, several factors have been shown to affect an individual's ability to discriminate non-native sounds. These include differences and similarities between the sound systems of the L1 and the L2, and the amount of experience a person has with the target language. Differences and similarities between the L1 and the L2 sound systems include the phonemic distinctions employed by the two languages as well as the phonetic characteristics with which those phonemes are realized. Comparison of the L1 and L2 phonemic systems, known as 'Contrastive Analysis', gained popularity in the 1960s; researchers conducting such analyses took phonemic differences and similarities as the main source of cross-language misperception and misproduction. A new generation of researchers, however, found similarities or differences between the L1 and L2 at the phonetic level to be a more promising source of explanation for cross-language perceptual constraints. This is evident in the fact that phonemic and phonetic level differences and similarities have been incorporated in current speech perception models such as the Native Language Magnet Model (NLM) (Iverson and Kuhl, 1995), the Perceptual Assimilation Model (PAM) (Best, 1995), and the Speech Learning Model (SLM) (Flege, 1995).

While research on infant speech perception forms the basis for the NLM, the SLM is formulated to account for changes in L1 and L2 perception, as well as production, among second language learners. PAM, on the other hand, is concerned with the perception of novel or foreign contrasts. Its focus is on the filtering effect of the L1 perceptual categories on the perception of foreign sounds.
The Native Language Magnet Model (NLM)

This model was proposed to account for the early stages of speech perception development in infants and for how adult listeners' perception develops so that it is particular to their native language (Kuhl 1991). The NLM focuses on the internal organization of category boundaries and proposes that this organization is the basis on which listeners differentiate between good and less good exemplars of a native category (Kuhl 1992, 1993). According to NLM, each phonetic category contains a best exemplar or prototype. Like a magnet, a phonetic prototype in a category assimilates non-prototypical members of the same category and shrinks the acoustic-phonetic space toward it, thus making prototype vs. non-prototype discrimination difficult.

Additionally, this model predicts that perceptual discriminability is reduced for L2 sounds that fall within the psychoacoustic space of an L1 sound. Specifically, the NLM argues that perceived 'category goodness' influences the discriminability of exemplars identified as instances of a single phoneme. The prototype in the category of the phoneme exerts the 'Perceptual-Magnet Effect' (PME) and shrinks the boundaries around it. PME has been reported, mostly by Kuhl and her colleagues, for both vowels and consonants in both infant and adult listeners (Grieser and Kuhl 1989, Kuhl 1991, 1993, Kuhl et al. 1992, Iverson et al. 1994, Iverson and Kuhl 1996).

According to the NLM, difficulties for Chinese dialect speakers in perceiving the English [l] and [n] would result from perceived similarity between the L2 sounds and a native prototype. Specifically, the perceived locations and psychoacoustic distance of English [l] and [n] from a native Chinese category prototype (/n/ or /l/ or another phoneme in the Chinese dialects) will determine their degree of discriminability. In other words, if English [l] and [n] fall within the psychoacoustic spaces of two different categories (i.e.,
the two sounds are pulled in by two different native prototypes), then they will be easy to discriminate. It is predicted that the perception of Gan and NM speakers will fall into this scenario. If both [l] and [n] fall within the psychoacoustic space of the same category, their discriminability will be determined by their psychoacoustic distance both from the prototype and from each other. If they are both poor exemplars (very different from the prototype), then their discriminability will be higher if they are psychoacoustically far apart than if they are close to each other. Specifically, it is predicted that both [l] and [n] will be assimilated into /n/ in Xinan, and into /l/ in Jianghuai. The discrimination will be good if [l] and [n] are both perceived as poor exemplars. If one is perceived as a good exemplar and the other as poor, their discrimination will be good.

These assumptions, however, are hard to test with natural stimuli (Harnsberger 2001). Harnsberger argued that "the complexity of natural stimuli makes it problematic to assume that the particular measures taken constitute most or all of the perceptually relevant differences between the stimuli, particularly in experiments involving 'exotic' contrasts that have not been examined in much or any prior work" (2001: 491). In this sense, the NLM does not offer specific, concrete predictions concerning the discrimination patterns for speakers of the three Chinese dialects of interest in the study at hand.

The Perceptual Assimilation Model (PAM)

Unlike the NLM, the Perceptual Assimilation Model (PAM) (Best, 1995), another influential model in the domain of cross-language speech perception, was proposed on the basis of an investigation into the perception of natural non-native stimuli. Both models incorporate the notion of category goodness to account for the varying degrees of discriminability of non-native contrasts.
PAM was developed by Best and her colleagues (Best, McRoberts and Sithole, 1988; Best, 1995). The primary goal of this model is to provide an explanation for the process of cross-language speech perception. According to PAM, foreign sounds are perceived according to their similarities to, or discrepancies from, the L1 sounds that are closest in terms of articulation (Best, 1995). On this basis, the perceived distance between a pair of foreign sounds and the closest L1 sounds will lead to different degrees of discriminability.

According to PAM, a pair of L2 sounds that are identified as instances of two distinct sounds in the L1, called a two-category (TC) type, will be relatively easy to discriminate. On the other hand, a pair of L2 sounds that are identified as instances of a single L1 sound, but are judged to differ in how good they are as instances of that L1 sound, called the category-goodness (CG) type, will be more difficult to discriminate. Finally, two L2 sounds that are judged to be equally good instances of a single L1 sound, called the single-category (SC) type, will be the most difficult to discriminate. Thus, these three perceptual assimilation types can be rank ordered according to their discriminability from high to low as follows: TC > CG > SC.

Three other types of perceptual assimilation patterns identified by PAM are the Uncategorized-Categorized (UC) type, the Uncategorized-Uncategorized (UU) type and the Non-Assimilable (NA) type. For the UC type, one non-native phone is categorized as one of the L1 phones while the other is not. A high degree of discrimination is predicted for this assimilation pattern, and discriminability may be comparable to the best CG contrast (Harnsberger 2001). On the other hand, neither sound in a UU pair can be categorized as any L1 category. Its discriminability is expected to range from fair to good, depending on perceived similarities between the two sounds making up the
contrast, and their nearby L1 categories. However, since "these two proximities are not clearly defined, or weighted with respect to one another, UU contrast cannot be rank ordered" (Harnsberger, 2001: 490). Finally, for the NA type, neither non-native sound can be categorized as any of the L1 sounds, and both are perceived instead as non-speech sounds. Good to excellent discrimination, depending on their perceived difference as non-speech sounds, is predicted for this perceptual pattern.

Support for PAM has been obtained in several studies (Best, McRoberts and Sithole, 1988; Best 1994, 1996; Guion et al., 2000; Best, McRoberts and Goodall, 2001). With the exception of Guion et al. (2000) and Best et al. (2001), perceptual assimilation patterns in earlier studies were obtained in a post-hoc fashion by the authors based on the phonetic description of the sounds in question (Harnsberger, 2000). Recent findings, on the other hand, suggest the inadequacy of PAM to account for "a large proportion of the assimilations that occur outside of the laboratory" (Harnsberger, 2001: 498).

According to PAM, the English contrast of /l/ and /n/ would be difficult for Chinese dialect speakers due to the different phonemic structures of English and the three dialects. Moreover, unlike NLM, PAM can generate predictions concerning the degree of difficulty that speakers would encounter in perceiving the non-native [l] and [n] contrast. According to PAM, the perceived degree of discriminability of [l] and [n] in English should vary across the three groups of Chinese speakers, as the distribution of /n/ and /l/ varies across the three Chinese dialects. More specifically, for Jianghuai speakers, both English /n/ and /l/ would be assimilated to /l/ and constitute a 'single category' assimilation type according to PAM's classification. The discrimination of English [l] and [n] is, therefore, expected to be very poor for this group of speakers. Similarly, a
single category assimilation pattern is also expected for Xinan speakers. That is, since Xinan only has /n/, both [l] and [n] would be perceived as Xinan /n/, and discrimination of [l] and [n] would be expected to be very poor. Unlike those of Jianghuai and Xinan speakers, Gan speakers' assimilation patterns are not as straightforward, because the distribution of /l/ and /n/ in Gan depends on the following vowel context: the two are contrastive before high vowels and in free variation before other vowels. Therefore, in the context of a high vowel, the English [l] and [n] may be assimilated as two categories, which should lead to very good discrimination. A single category assimilation pattern is expected for other vowel environments. Accordingly, discrimination is expected to be very good in a high vowel context relative to a non-high vowel environment.

In sum, PAM predicts that discrimination of English [l] and [n] would be equally poor for all three groups of Chinese speakers, except that Gan speakers will outperform Jianghuai and Xinan speakers in a high vowel context. For Gan speakers, the two English sounds could be perceived as belonging to two categories, or to one category while differing in Category Goodness (CG). In the latter (CG) case, performance would be moderate to good depending on how well each sound fits the native category.

The Speech Learning Model (SLM)

Similar to PAM, the Speech Learning Model (Flege 1995), another influential model in the domain of cross-language speech perception, addresses cross-language speech perception in terms of phonetic (dis)similarity between an L1 and an L2. However, unlike PAM, SLM focuses more on ultimate attainment in speech learning and on the relationship between perception and production in second language (L2) acquisition. SLM is concerned with age-related limits on how well an L2 learner produces and perceives an L2 sound, based on the phonetic difference between the L2 sound and an L1
sound (Flege 1995). The SLM views L2 learning as similar to L1 learning in that it "is also influenced importantly by the nature of input received, and L2 production is guided by perceptual representations stored in the long term memory" (Flege 2005: 84). The core assumption of the SLM is that basic speech learning mechanisms, including the ability to establish new phonetic categories, remain malleable across the life span. The model also proposes that there is one phonological space that accommodates both the L1 and L2 phonetic systems, which influence each other.

A series of hypotheses are put forth in the SLM (Flege 2005). First, the greater the perceived dissimilarity of an L2 sound from the closest L1 sound, the more likely a new category will be formed for the L2 sound. Second, category formation for an L2 sound becomes less likely through adulthood as representations for neighboring L1 sounds develop. Third, when a category is not formed for an L2 sound because it is too similar to an L1 counterpart, the L1 and L2 categories will assimilate to form a merged L1-L2 category. Fourth, when a new category is established for an L2 sound, it may dissimilate from neighboring L1 and/or L2 sounds, and vice versa, to preserve phonetic contrast.

Similar to PAM, SLM models how well the listener will perform in perceiving non-native sounds based on the perceived phonetic distance between L1 and L2 sounds. SLM claims that speech learning mechanisms are most effective in the early stages of L2 acquisition (Flege 1995). Various studies have provided evidence in support of this claim (Flege and Munro et al. 1995, Yamada 1993). All else being equal, SLM predicts that early L2 learners will be more likely to establish new phonetic categories for L2 speech sounds than late bilinguals. SLM also predicts that the likelihood of a category being
formed for an L2 speech sound varies directly as a function of its degree of perceived dissimilarity from the closest L1 speech sound.

SLM proposes that the more distant an L2 sound (phonetic segment) is from the closest L1 speech sound, the more learnable the L2 sound will be. Aoyama et al. (2003) investigated whether native Japanese speakers would have more success acquiring English /ɹ/ than /l/. This longitudinal study of the perception and production of English /l/, /ɹ/, and /w/ by native Japanese adults and children living in the US suggested that there was greater improvement for English /ɹ/ than for English /l/ among children. As Japanese /ɾ/ is phonetically different from American English /l/ and /ɹ/, SLM holds that the degree of perceived phonetic dissimilarity influences L2 learners' success in acquiring L2 phonetic segments.

In agreement with NLM and PAM, SLM also predicts that the Chinese dialect speakers would have difficulty in perceiving the English /l/ and /n/ distinction when the non-native sounds are strongly identified with a single native category. But as the SLM is not mainly concerned with initial perceptual discrimination, it cannot provide predictions on discrimination patterns as specific as PAM's. On the other hand, according to SLM, changes in perceptual ability are possible because age is not what ultimately determines L2 acquisition. Therefore, in the current study, there should be some improvement in dialect speakers' ability to perceptually distinguish English [l] and [n] after training.

Cross-language Research on the Perception of Sonorants

The formulation of these three speech perception models benefited from a number of cross-language speech perception studies. Many of these studies investigated the perception of English sonorants by speakers of other
languages. Among the English sonorants, the /ɹ/ and /l/ contrast has been the most extensively investigated, which may be due to the fact that both Japanese and Korean speakers have demonstrated difficulty in acquiring this distinction, which is absent in their native languages. According to Ladefoged (2001), the Japanese /ɾ/ is phonetically an apico-alveolar tap that is quite different from American English /l/ and /ɹ/, which are lateral and central approximants, respectively. In fact, Japanese /ɾ/ is phonetically more similar to tapped /t/ and /d/ in American English than it is to either /l/ or /ɹ/.

Despite the articulatory difference between Japanese /ɾ/ and the English liquids /l/ and /ɹ/, Japanese speakers seem to perceptually assimilate both English liquids to Japanese /ɾ/. Best and Strange (1992) suggested that both English [l] and [ɹ] will be heard as poor exemplars of a single Japanese consonant, either [w] or [ɾ]. Takagi (1993) reported that syllable-initial English /l/ and /w/ were usually identified as Japanese /ɾ/ by Japanese adults. Similarly, the Japanese adults tested by Guion et al. (2000) identified English [l] and [ɹ] as either Japanese [ɾ] or a high back unrounded vowel. The authors took the results as suggesting a two-to-one cross-language mapping pattern.

As early as 1975, Miyawaki reported non-categorical discrimination of /ɹ/ and /l/ by native Japanese speakers. In this study, adult native speakers of Japanese and of American English were tested on the discrimination of a set of synthetic speech-like [ɹ]s and [l]s. The stimuli varied in the initial stationary frequency of F3 and its transition to the following vowel. Discrimination tests of a comparable set of stimuli consisting of the isolated F3 components provided a "non-speech" control. For speech-like stimuli, Americans' discrimination was nearly categorical, whereas Japanese speakers performed only
slightly better than chance. The author attributed this difference to the fact that Japanese lacks a phonemic contrast between /ɹ/ and /l/. For non-speech stimuli, however, almost identical, highly accurate performance was observed for both Japanese and American speakers. Based on this result, the author suggested that the effect of linguistic experience is specific to perception in the speech mode and does not extend to the non-speech mode.

MacKain et al. (1981) carried out a similar experiment. They investigated categorical perception of a synthetic /ɹ-l/ continuum with Japanese bilinguals varying in level of English-language experience (indexed by length of English conversation training). The tasks that they used were absolute identification, AXB discrimination, and oddity discrimination. In an AXB task, participants hear three stimuli per trial. Their job is to decide whether the second stimulus (X) belongs to the same phonetic category as the first (A) or the last (B) stimulus. In the oddity discrimination task, on the other hand, participants have to decide which of the three stimuli belongs to a different phonetic category. Similar to the previous studies, they found categorical perception by American speakers and near-chance performance by less experienced Japanese speakers. Experienced Japanese speakers, however, perceived /ɹ/ and /l/ categorically, and their performance was similar to that of the American English controls.

Besides Japanese speakers, researchers have also investigated Korean speakers' perception of English liquids. Ingram and Park (1998) compared Japanese and Korean speakers' identification and discrimination of the /l/-/ɹ/ phonemic contrast. Their goal was to examine "the interaction of potentially competing factors operating at different levels of perceptual processing (phonetic and phonological) and over different domains (language specific and universal)" (p. 1172). The results they obtained suggest
that earlier perceptual learning affects L2 perception by means of language-specific phonetic and phonological factors.

In comparison to [l] and [ɹ], studies on cross-language perception of English [l] and [n] are far fewer in number. Schmidt and her colleagues carried out a series of studies on the perception of English [l] and [n] by speakers of certain Chinese dialects (Schmidt 1996, Schmidt and Kaminski 1997, Schmidt et al. 1999). In one study (1996), she compared the perception of English [l] and [n] by speakers of Chinese northern and southern dialects. She found that northern dialect speakers had no difficulty in distinguishing [l] and [n], but that the [l]-[n] distinction was hard for southern dialect speakers. In a later perception study (Schmidt and Kaminsky 1997), identification performance for natural and amplified tokens was compared; however, no differences were found in identification performance between natural and amplified stimuli for the two sounds, [n] and [l].

Effects of Training in Cross-Language Speech Perception

As mentioned earlier in this chapter, native Japanese speakers learning English have persistent difficulty in perceptually differentiating the liquid consonants /ɹ/ and /l/, even after extensive conversational instruction. This led researchers to investigate perceptual training on this non-native contrast (Flege et al. 1997, Jamieson and Morosan 1986, Strange and Dittmann 1984). Such studies on the development of second language speech perception fall into two categories: cross-sectional studies of L2 learners at different stages in the acquisition process (Flege et al. 1997); and studies in which listeners are trained on segments or suprasegments of a second language in a laboratory setting over the course of one or several experimental sessions (Flege 1995, Jamieson and Morosan 1986, Wang et al. 1999, Wayland and Li 2005).
Contrary to the Critical Period Hypothesis (Lenneberg 1967, Scovel 1969, Patkowski 1989), in which the plasticity of the human perceptual system is believed to decline with age, even to the point where changes cannot occur, training studies on non-native phonological contrasts are based on the assumption that the human perceptual system remains somewhat malleable over the life span (Flege 1995, 2005).

Among the early training studies were those designed to train American English listeners to perceive non-native voicing contrasts of stop consonants. Pisoni et al. (1982), for example, reported that after a short period of laboratory training, monolingual speakers of American English were able to reliably label and discriminate voiceless aspirated, voiceless unaspirated and voiced stops differing in Voice Onset Time (VOT: the time between the release of the consonant constriction and the onset of vocal cord vibration). Other studies have examined the ability to identify the American English fricatives /θ/ and /ð/ among native speakers of French (Jamieson and Morosan 1986).

Among approximant consonants, /l/ and /ɹ/ have received the most attention in cross-language speech training research. Specifically, the ability of Japanese listeners to discriminate American English /l/ and /ɹ/ has been the focus of several studies (Bradlow et al. 1997, Lively et al. 1994, Logan et al. 1991, Strange and Dittmann 1984). Strange and Dittmann (1984) compared pre-training with post-training tests of natural speech minimal pairs contrasting /ɹ/ and /l/ in several contexts, and categorical perception tests with two synthetic speech series contrasting /ɹ/ and /l/ in word initial position. Using a same-different discrimination task with immediate feedback, female Japanese speakers were given extensive training on a synthetic "rock"-"lock" stimulus
series. Performance improved gradually for all participants over the 14-28 training sessions. The results indicated transfer of training to the more demanding identification and oddity discrimination tasks. Participants also improved in identification and oddity discrimination of an acoustically dissimilar "rake"-"lake" synthetic series. However, transfer did not extend to natural speech words contrasting initial /ɹ/ and /l/. The investigators concluded that modification of the perception of some phonetic contrasts in adulthood takes time and effort, but that improved laboratory training tasks may be useful in establishing categorical perception of these contrasts.

Most studies on perception training report a significant improvement in the identification of the non-native contrast after training. Such improvement was also found to extend to other non-native contrasts in novel contexts and was retained long after training (Bradlow et al. 1997). These findings were taken to support the view that the perceptual mechanisms used by adults in categorizing speech sounds can be easily modified with simple laboratory techniques in a short period of time, suggesting that the human perceptual system remains somewhat malleable over the life span. This assumption is also in agreement with the findings that adult L2 speech production and perception improve with experience (Yamada et al. 1994, Flege et al. 1996, Flege et al. 1997).

Unlike /ɹ/ and /l/, very little attention has been given to the perception of English /n/ and /l/ by speakers of other languages. The work of Schmidt and her colleagues (1997, 1999) on the perception and production of English /n/ and /l/ by Chinese speakers constitutes the only exception. They found that some southern dialect speakers from China had difficulty in perceiving and producing English /n/ and /l/. The enhancement of
the amplitude of the CV transitions did not result in improvement in perception, nor did a training task improve the speakers' production. Even if their participants could produce the pair of sounds in English, they could not distinguish the /n/ and /l/ contrast perceptually. These studies on /l/ and /n/, however, did not specify which dialects the participants spoke; therefore it is not possible to evaluate the L1's effect on L2 perception.

Experimental Factors in Speech Perception Research

As reviewed above, laboratory training on non-native phonetic contrasts has been shown in a number of studies to be effective in improving perception among adult listeners. Results of these studies showed a significant improvement in the ability to identify or discriminate non-native contrasts for most non-native participants following training. The amount of improvement, however, varies considerably from study to study due to differences in the characteristics of the non-native participants studied (i.e., inexperienced or experienced L2 learners) and in the nature (i.e., synthetic or natural; single token or multiple tokens from one or many talkers) as well as the type (i.e., vowels or consonants) of stimuli used.

Factors affecting the degree of discriminability of non-native speech contrasts

As reviewed above, there is a great deal of literature addressing the issue of which non-native contrasts are more difficult for L2 learners to master. These studies vary in terms of target and native languages, participants, stimuli, and tasks, all of which may affect the discriminability of non-native sounds, and consequently the outcome of a perception study.

Among the factors that affect the degree of discriminability, the degree of similarity and difference between the phonemic inventories of the native and second
languages is generally agreed to be a main determiner of relative perceptual difficulty for L2 learners. Other potential contributing factors have also been addressed in the literature on second language speech perception (Beddor and Gottfried 1995, Flege 1995, Polka 1991). Polka (1991) investigated monolingual English speakers' perception of Hindi consonants (contrasting in place of articulation and voicing). In addition to the factor of phonemic function, she identified two other linguistic factors that can differentiate between phonetic categories: phonetic experience (whether the non-native contrasts are perceived as allophones or free variants) and acoustic salience (voicing and spectral features). Therefore, when interpreting perception results, not only the phonemic features of the related languages but also allophonic structures and free variants need to be taken into consideration.

Besides linguistic factors, experimental variables such as participants, stimuli, and task have been shown to affect performance in non-native speech perception (Beddor and Gottfried 1995). In terms of participant differences, it has been observed that age and experience exert an important influence on L2 learners' perception. The earlier the learner starts learning the L2, the longer the learner is exposed to the L2 environment, and the greater the variety and amount of L2 input the learner receives, the more likely it is that the learner will be able to acquire L2 sounds (Flege et al. 1995, Flege et al. 1996, Polka 1992, Yamada and Tohkura 1992).

Regarding stimulus materials, phonetic context (e.g. the syllable position where a target sound occurs) may have a negative effect on perceptual performance because certain contexts may pull listeners' attention away from a target non-native sound
contrast (Beddor and Gottfried 1995). Phonetic contexts, on the other hand, may facilitate perception if they conform to listeners' native phonotactic rules (Sheldon and Strange 1982). Concerning types of stimuli, researchers have used three kinds: natural, edited natural, and synthesized. Synthesized stimuli allow the researcher to manipulate phonetic variations accurately, whereas natural stimuli faithfully retain the physical properties of the signal. Beddor and Gottfried (1995) summarized that more recent categorical discrimination tests tend to use natural stimuli over synthesized ones and that both types yielded similar patterns in cross-language studies.

Task variables in a perception study include types of instruction, practice trials, configuration of the task, training, and presentation of feedback. It is common practice for instructions to be given in the targeted language of an experiment, i.e., when listeners are tested on an L1 contrast, the instruction will be given in the L1; if they are tested on an L2 distinction, the instruction will be given in the L2. In comparison with the type of instruction, the content of instruction is more important, as it can be used to direct the listener's attention to intended distinctions and to suggest perceptual strategies, and thus may change performance. Another variable concerning the task is the use of practice trials. They are assumed to be important in perception tests, "given the general findings concerning transfer and practice effects in experimental psychology" (Strange 1995: 220).

A third crucial variable that influences a listener's perceptual performance is the configuration of the task. Generally, studies employ identification or discrimination tests to serve different research goals (Aoyama et al. 2004, Best et al. 1997, Ingram and Park 1998, Wayland and Li 2005). Identification tasks require listeners to assign a linguistic
label to a sound. The listener's memory load is reduced if a fixed set of responses is provided, compared with open-set labeling. In a discrimination test, the listener is instructed to decide whether stimuli are the same or different. This type of test includes several variations: oddity discrimination (one in a group of three stimuli is different from the other two), AXB discrimination (X is identical to A or B, which are acoustically different from each other), and AX discrimination (X is the same as or different from A).

Factors affecting the outcome of training

The first important variable is the training procedure. Specifically, two training procedures, namely identification (ID) training and same/different discrimination (SD) training, have been used. Commonly, stimuli used in training studies are multiple productions of minimal pairs that exemplify the phonetic contrast of interest (e.g., night vs. light for the /n/ and /l/ distinction). In ID training, participants hear a single stimulus on each trial. Their task is to identify the stimulus in terms of two categories (e.g., /l/ and /n/). Feedback regarding the correct identity of the stimulus is given immediately after a response is received. Through the provision of feedback, participants are expected to gradually learn the phonetic properties associated with each category. On the other hand, participants who are administered an SD training procedure hear two stimuli on each trial. Their task is to determine whether the two stimuli in each trial are instances of the same category or instances of two different categories.

The advantages and disadvantages of the two training procedures have been a subject of debate. Logan et al. (1991) suggested that ID training might be more effective than SD training in that it encouraged participants to rely more on phonetic codes stored in long-term memory than on rapidly fading sensory information in
short-term memory. Similarly, Lively et al. (1994) assumed that the combination of the "minimal uncertainty" of the two-alternative forced-choice ID procedure and the provision of immediate feedback (p. 2076) promoted the formation of new and robust phonetic categories that are not adversely affected by acoustic variations irrelevant to the phonetic identity of the categories in question. These include variations induced by different speaking rates and/or idiosyncratic characteristics of individual speakers. Jamieson and Morosan (1986, 1989) suggested that SD training encouraged participants to pay more attention to within-category acoustic differences than to between-category acoustic properties. Consequently, participants may fail to recognize the core acoustic information that defines the two categories. However, Polka (1992) argued that participants who receive ID training may learn to respond correctly by attending to any properties that might be used to differentiate the two non-native categories. Some of these properties may not be the ones used by native speakers.

Flege (1995) directly compared the relative efficacy of the identification and categorical SD training procedures in a study of the perception of the English word-final /t/-/d/ distinction by Mandarin¹ Chinese speakers. In the categorical SD procedure, multiple tokens of each category were used; as such, the stimuli in each pair were always physically non-identical. The participants' task was not to decide whether one stimulus had been presented twice in succession (i.e., physically identical), but to determine whether two different realizations of a single phonetic category had been presented. The results from this study indicated that small but significant gains were observed for both groups immediately after training and in a delayed post-test two months later, regardless of the procedure used.

¹ Mandarin in this study refers to Putonghua, the Common Language of Chinese.
Moreover, for both groups, the effect of training generalized to new tokens. Interestingly, the magnitude of generalization did not differ significantly between the two groups.

In another recent study (Wayland and Li 2005), the effects of the ID and SD procedures were evaluated by examining the perception of Thai tones by native English (NE) speakers and native Chinese (NC) speakers. Both procedures yielded significant improvement in perception. Both NE and NC speakers improved more with ID training than with SD training, although the difference in the amount of improvement was not statistically significant.

Conversely, studies on Japanese speakers' perception of English liquids have demonstrated that ID may be better than SD (Logan et al. 1991, Lively et al. 1994, Strange and Dittmann 1984, Yamada 1993). Training studies of Japanese speakers' perception of English liquids showed that the effect of discrimination training on synthesized tokens did not generalize to the identification of natural tokens (Strange and Dittmann 1984). Moreover, Japanese speakers trained on /r/-/l/ minimal pairs using an ID task with natural tokens from various talkers could generalize the effects to novel talkers as well as novel words (Lively et al. 1994, Yamada 1993).

Besides the two commonly used training procedures, Jamieson and Morosan (1986, 1989) developed a fading method in which the presentation of stimuli progresses from easy to difficult identification. This method proved effective in training French listeners to perceive English interdental fricatives. The same training method, however, did not result in improvement in the perception of synthesized English [l] and [n] by Chinese speakers (Schmidt 1999). It seems that the fading method is most applicable to stimuli that differ along a temporal dimension, and thus may not be appropriate for the distinction of [l] and
[n], which may differ in consonantal duration and CV transition duration, as discussed earlier in Chapter 2.

Although it seems that ID and SD tasks may be equally effective in perceptual training on non-native contrasts, as mentioned earlier, ID has been shown to be superior to SD with natural tokens and with inexperienced listeners of an L2, because it directs listeners to attend to categorical differences, to abstract the common features that categorize sounds, and to refer to this information in long-term memory during later learning. Therefore, it is assumed that ID training may be more effective than SD training for Chinese dialect speakers learning the English /l/-/n/ distinction.

Besides the training procedure, the presentation of stimuli in training experiments is important. Recently, with the development of computer technology, training experiments have adopted participant-controlled stimulus presentation. Wayland et al. (2005) investigated tonal training in which listeners were allowed to repeat each token as many times as they wanted before making a response. Self-controlled presentation enables listeners to focus on difficult stimuli. It can also accommodate the varying learning abilities of participants, allowing them to choose the training pace that suits them best.

A third variable in training research is feedback. Feedback is critical in speech training, as it is in the development of any skilled behavior (Anderson 1990). Trial-by-trial feedback is used extensively to provide information on the listener's responses to his or her best advantage (Logan and Pruitt 1995, Wayland and Li 2005).

A last important variable is the duration of training: short-term vs. long-term. Short-term training does not exceed one session. Ten-minute training was shown to be effective in improving American English speakers' perception of the VOT continuum
falling in the prevoiced range (Pisoni et al. 1982). Logan and Pruitt (1995) suspected that the improvement might result from the fact that VOT is allophonic in American English. Generally speaking, long-term training covering several days or several weeks is more successful in improving perception. Studies vary greatly in the duration of long-term training, which may involve anywhere from one session per day to multiple sessions per day. One study used six sessions (Rochet 1995), while others have used as many as forty-five sessions (Yamada 1993). The rate of improvement follows a power function (Anderson 1990). That is, the listener's ability increases more rapidly in the early stages and improvement slows with additional training, as confirmed by Yamada (1993), who found that of the forty-five sessions, the first ten resulted in the largest gain in perception.

In summary, a number of experimental variables interact with and influence the effect of speech perception training. This dissertation addresses these variables in the following way: a four-day training period was adopted with an identification task; natural stimuli from multiple talkers were used to ensure the variability and enhance the robustness of stimuli; instructions for the training were given in English; participants were allowed, to some extent, to control the pace of their learning; and visual feedback was provided trial-by-trial during training.

Relationship between Speech Perception and Production

The relationship between speech perception and production has been investigated by researchers since the 1950s.
Generally speaking, three approaches to the link between perception and production have been proposed: the Motor Theory (MT), which assumes that acoustic signals are perceived in terms of articulatory gestures and that a single module is shared by perception and production; the acoustic-auditory theories, such as the Acoustical Invariance Theory (AIT), which posits an invariant relationship between signals and linguistic features and treats perception and production as autonomous processes; and the Direct-realist Theory (DRT), which proposes a common communicative goal shared by the two processes and a gestural object of speech perception (Bradlow et al. 1997). These theories all address issues in the general perception of sounds of any language, but offer different explanations for the relationship between speech perception and production.

The MT proposes that the discriminability of speech sounds is closely related to the presence or absence of functional (i.e., phonemic) differences between sounds (Liberman and Mattingly 1985, Holt et al. 2004). According to this model, perception and production operate through a common set of neural representations. Changes resulting from perceptual training, therefore, should be available to production through modifications to these representations. The AIT, on the other hand, predicts that perception, as a process autonomous from production, is only indirectly linked to production through the acoustical features of target speech sounds and auditory feedback during production (Stevens and Blumstein 1978, 1981). Perception should therefore not directly affect production; furthermore, learning in production occurs during production. Similar to the MT, the DRT assumes a direct association between perception and production, but the DRT proposes that articulatory gestures are perceived directly instead of being reconstructed and interpreted from sensory input by the brain (Fowler 1986). As articulatory gestures are the object of speech perception, perception and production should be interrelated and should influence each other.

In line with general speech production, the acquisition of novel phonetic categories by non-native speakers has received attention for quite some time (Bradlow et al. 1997).
Previous studies have shown that even highly proficient L2 speakers could not achieve native-like accuracy in pronunciation (Flege and Hillenbrand 1987, Flege 1988). There is also research reporting a significant correlation between perception accuracy and production intelligibility of English /r/-/l/ tokens by Japanese speakers (Yamada et al. 1994). These studies imply that perception is associated with production in some way in L2 acquisition. An investigation of the perception and production of English [l] and [n] by Chinese dialect speakers would provide a window onto the relationship between L2 perception and production.

The cross-language speech acquisition models discussed earlier in this chapter all offer theoretical predictions regarding the correlation between perception and production. First, the SLM is able to make predictions from a developmental perspective. The model hypothesizes that the ability to perceive and produce non-native sounds remains plastic in adults. New categories will be established when L2 speakers detect the phonetic differences between an L2 sound and the closest L1 sounds. Thus, according to the SLM, speakers produce speech sounds based on how they perceive them, and perceptual learning is therefore beneficial to speech production.

Similarly, the NLM proposes that language experience is crucial in shaping the boundaries of perceptual spaces. The model suggests that after some language exposure, representative exemplars function as magnets and shrink the auditory space around them, thus allowing for the creation of equivalence classes for phonetic segments. This effect reduces differences among good representations of a sound, thus helping individuals ignore irrelevant differences between members of a category. According to the NLM, when space shrinks around prototypes, distinctiveness within a category decreases and
distinctiveness between categories increases. The perceptual prototypes serve as speech production targets; thus speech production would benefit from perceptual learning.

The models and theories reviewed here all predict an improvement in the perception of English /l/ and /n/ for Chinese dialect speakers; however, predictions as to whether the improvement would transfer to production of the two sounds are not consistent, because of the varying theoretical stances these models take on the relationship between perception and production.

Summary

In conclusion, this chapter reviewed the three most influential models of speech perception. Predictions and hypotheses were proposed regarding the difficulty facing Chinese dialect speakers in perceiving English [l] and [n], as well as the varying degrees of difficulty due to phonemic differences between English and the Chinese dialects. In addition, a series of factors that would affect speech perception and training were addressed, and a study design was proposed accordingly. In the next chapter, the perception experiment will be presented and discussed.
CHAPTER 4
PERCEPTION EXPERIMENT

The focus of this dissertation is on the perception of American English word-initial [l] and [n] by native speakers of three Chinese dialects: Jianghuai, Xinan, and Gan. Five speakers of northern dialects of Mandarin (NM) were recruited as controls. As discussed in the last chapter, according to three major models of cross-language speech perception, speakers of Jianghuai, Xinan, and Gan would have varying degrees of difficulty in identifying English [l] and [n] because of the different distributions of the two sounds in their native language systems. Moreover, it was hypothesized that perceptual training would improve these speakers' abilities to identify the two sounds in English.

In order to test these hypotheses and predictions, a series of experiments was implemented, the details of which will be discussed in this chapter. The chapter is organized as follows. First, the research questions will be elaborated and justified. Next, the perception experiment will be described in terms of design, data collection, and data analysis. In the third section, results from the perception experiment will be presented and discussed.

Research Questions

The overall goal of this study was to investigate the ability of adult native speakers of the Jianghuai, Xinan, and Gan dialects to perceptually identify English [l] and [n]. It also aimed to test whether laboratory training would improve these speakers' perception of the non-native contrast. The carry-over effect of perceptual training to production was
also examined. Specifically, this perceptual research was guided by the following questions:

Research question 1: Would Chinese dialect speakers have difficulty in perceptually identifying English [l] and [n] as predicted by current speech perception models?

Research question 2: Would their identification accuracy vary as a function of their native dialect?

Research question 3: Would their identification accuracy be influenced by such phonetic contexts as vowel environment and syllable structure?

Research question 4: Would their identification accuracy improve after training? Would the amount of improvement vary as a function of L1 background?

These questions will be elaborated one by one in the following sections, and the corresponding hypotheses will be discussed and justified.

Research question 1: Would Chinese dialect speakers have difficulty in perceptually identifying English [l] and [n] as predicted by current speech perception models?

Recall from Chapter 1 that the phonemic status and distribution of the /l/-/n/ contrast vary across the Chinese dialects. In northern subdialects of Mandarin (NM), /l/ and /n/ are two separate phonemes and contrast syllable-initially. This contrast is either lost or maintained only in certain contexts in Jianghuai, Xinan, and Gan. Specifically, only /l/ exists in Jianghuai, and it has two allophones: [l] syllable-initially and [n] syllable-finally. On the other hand, only /n/ is present in the Xinan consonant inventory. Finally, in Gan, /n/ and /l/ contrast syllable-initially before high vowels, but occur in free variation in other vowel conditions.

As reviewed in Chapter 2, adult L2 learners often have difficulty perceiving and producing consonants that are found in the L2 but not in their L1. Since English /l/ and /n/ are either absent or show a different distribution in the three Chinese dialects, it is
hypothesized that this non-native contrast would be difficult for Jianghuai, Xinan, and Gan speakers. This English contrast should, however, be easy for NM speakers, since the same contrast exists in their native dialect. Therefore, speakers of Jianghuai, Xinan, and Gan would have difficulty identifying English [l] and [n], whereas NM speakers would not.

Research question 2: Would their identification accuracy vary as a function of their native dialect?

Because the phonemic status and distribution of /l/ and /n/ differ across their native dialects, it is also hypothesized that the perceptual difficulties experienced by Jianghuai, Xinan, and Gan speakers would not be of the same degree. According to the PAM, perception of non-native contrasts falls into three types: English /n/ and /l/ could be perceived by Chinese dialect speakers as exemplars of a native category, as exemplars of a non-native category, or as non-speech sounds.

In the context of the current study, the PAM predicts that English /l/ and /n/ would be perceptually assimilated to NM /l/ and /n/ respectively, which is a Two-Category (TC) assimilation pattern. NM speakers' ability to distinguish between English [l] and [n] should, therefore, be very good. On the other hand, both English [l] and [n] would be assimilated to Jianghuai /l/, thus exhibiting a Single-Category (SC) assimilation pattern, as the Jianghuai phonological system has only /l/. Jianghuai speakers' identification of English [l] and [n] is therefore predicted by the PAM to be very poor. An SC assimilation pattern and very poor identification accuracy were also predicted for Xinan speakers, because both English /l/ and /n/ were expected to assimilate to Xinan /n/. The assimilation patterns of Gan speakers are more complex than those of the other groups. English /n/ and /l/ may be assimilated as two categories, as Gan does contrast the two before high vowels, or as a Single Category (SC) under other
conditions. Accordingly, TC assimilation should lead to very good discrimination, while CG would result in performance that is moderate to good.

Research question 3: Would their identification accuracy be influenced by such phonetic contexts as vowel environment and syllable structure?

It was predicted that identification accuracy of English [l] and [n] for Gan speakers would vary with vowel context. As mentioned earlier, Gan contrasts /n/ and /l/ syllable-initially before high vowels. Given this phonological fact, the question was raised whether Gan speakers could identify English [n] and [l] better in these contexts. For speakers of the other dialects, vowel context should not affect identification, because these dialects do not differ in terms of the distribution of the vowels investigated here (refer to Chapter 1 for the vowel inventories of the three dialects).

Moreover, as reviewed in Chapter 3, the perceptual difficulty of a non-native phonemic contrast may vary according to syllable position. As none of the Chinese dialects involved allows consonant clusters anywhere in a syllable, it was predicted that English [l] and [n] in clusters would pose more difficulty than those in singletons, and that this influence should be present in the performance of all dialect speakers.

Research question 4: Would their identification accuracy improve after training? Would the amount of improvement vary as a function of L1 background?

In Chapter 1 and Chapter 2, it was noted that, regarding the failure of adult L2 learners to achieve native-like perception, the Critical Period Hypothesis (CPH) proposes that the ability to learn an L2 successfully is lost after a certain age; in the context of this study, the CPH predicts that adult learners will fail to achieve native-like perception of an L2. Other researchers have argued that the failure may be caused by other factors, including
inadequate input or differences between the L1 and the L2 (as discussed in Chapter 3). For example, the Speech Learning Model (SLM) proposes that the speech learning mechanism remains accessible and malleable throughout the life span.

This dissertation assumes that the human speech perceptual system remains malleable after puberty and that it allows modification. The present study was, in fact, intended to test this assumption of the SLM through four intensive perceptual training sessions. It was hypothesized that all Jianghuai, Xinan, and Gan speakers should benefit from the training, which would be evident in the post-test and the generalization test. It was also hypothesized that the benefits of training would differ for speakers of the three dialects; that is, the amount of improvement in English [l] and [n] identification would differ, because speakers of the three dialects may have varying degrees of familiarity with /l/ and /n/, given that their L1s differ in the phonological distribution of the two phonemes.

Experiment Design

The perception experiment employed a pre-test/post-test paradigm, combined with four days of audio training and a generalization test after the training. The total duration of the experiment was five days. On the first day, the pre-test was administered to assess the ability of all speakers to identify English [n] and [l]. On the second day, audio training began. Two training sessions were administered each day for four days, with each session lasting approximately one hour. Altogether, participants received four days of training, totaling eight hours. On the fifth day, after the last training session, the post-test and the generalization test were run to evaluate the training effect.
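The five-day timeline described above can be laid out as a small data structure. The following is only an illustrative sketch, not part of the original experiment software: the names `SESSION_HOURS`, `build_schedule`, and `total_training_hours` are hypothetical, and placing the training on days 2 through 5 is an assumption based on the statements that training began on the second day and that the post-test followed the last training session on the fifth day.

```python
# Illustrative sketch only; names and the day-by-day layout are assumptions.
SESSION_HOURS = 1.0            # each session lasted approximately one hour
SESSIONS_PER_DAY = 2           # two training sessions per training day
TRAINING_DAYS = (2, 3, 4, 5)   # assumed: training began on day 2 and ran four days


def build_schedule():
    """Return a mapping from day number to the ordered activities on that day."""
    schedule = {day: [] for day in range(1, 6)}
    schedule[1].append("pre-test")
    for day in TRAINING_DAYS:
        schedule[day].extend(["training session"] * SESSIONS_PER_DAY)
    # The post-test and generalization test followed the last training session.
    schedule[5].extend(["post-test", "generalization test"])
    return schedule


def total_training_hours(schedule):
    """Sum the approximate training time across all days."""
    n_sessions = sum(acts.count("training session") for acts in schedule.values())
    return n_sessions * SESSION_HOURS


if __name__ == "__main__":
    plan = build_schedule()
    for day, activities in plan.items():
        print(f"Day {day}: {', '.join(activities)}")
    print("Total training hours:", total_training_hours(plan))
```

Under these assumptions the sketch yields eight one-hour sessions, which agrees with the eight hours of training reported in the text.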
Participants

Participants were forty-five adult native speakers of Chinese dialects (Jianghuai: 15, 4 female and 11 male; Xinan: 18, 2 female and 16 male; Gan: 7, 1 female and 6 male; NM: 5, 2 female and 3 male). All participants ranged in age from 18 to 21 years. Jianghuai, Xinan, and Gan participants completed all three tests and received full training. NM speakers were recruited as controls and participated only in the pre-test.

All participants were enrolled as non-English majors at Beihang University in Beijing, China, at the time of testing. None had lived abroad or had any special training in English pronunciation or conversation. However, all had studied English for six years in a classroom setting, beginning in junior high school at the age of 12. According to a language background questionnaire administered before the perception experiment, all participants were born and raised in their native dialect regions. All had been in Beijing for less than a month when the experiment was carried out. None of the participants reported any history of a speech or hearing impairment at the time of testing.

Participants were told before the experiment that the study was intended to help them learn English /n/ and /l/. Because all participants self-reported that they had difficulty both perceiving and producing the two English sounds when learning English, they were all highly interested and motivated to participate in the study. All participants received cash compensation after completing the experiment.

Stimuli

The stimuli used in the perception experiment were natural tokens of English [l] and [n] produced by five American English (AE) speakers (3 females and 2 males). The wordlist, as shown in Table 4-1, consisted of 13 minimal pairs of monosyllabic English
words contrasting /n/ and /l/ syllable-initially. Stimuli also contrasted in four vowel contexts (/i/, /u/, [ ], / /) and two syllable structures (absolute singleton and /s/-cluster). All words were placed in the carrier sentence "Please read ____ three times." Three repetitions of each word were elicited, totaling 78 tokens. All 78 tokens and 20 fillers (3 at the beginning and 17 mixed with the target words) were randomly ordered and printed out for the AE speakers to read aloud. The speakers were instructed to familiarize themselves with the wordlist before recording and were asked to read the wordlist clearly but naturally.

Speakers were recorded one at a time in a soundproof booth in the Linguistics Program phonetics lab at the University of Florida. A unidirectional head-mounted microphone (Shure SM 10A) and a SONY TCD-D8 DAT recorder were used to capture the productions. The data were then converted into WAV format using a Kay Lab CSL 4400 machine and stored on an IBM computer. The recordings were redigitized at a sampling rate of 44.1 kHz with 16-bit quantization. The seventy-eight target words were segmented out from the carrier frame and saved as individual WAV files. All tokens used for the perceptual experiment were then normalized for peak intensity (98 of the intensity scale) with the UAB software developed at the University of Alabama at Birmingham. Productions of three AE speakers (2 females and 1 male, coded as AE 1 to AE 3) were used in the pre-test, training, and the post-test. Productions of the other two speakers (1 female and 1 male, coded as AE 4 and AE 5) were used in the generalization test to assess the training effect in the context of new AE speakers.

The presentation of the stimuli was controlled by the UAB software. The pre-test and the post-test used the same random orders of presentation. During the pre- and post-tests, the stimuli were presented in a total of 468 trials: 26 tokens x 2 repetitions¹ x 3 talkers x 3 blocks. The same number of trials was presented in each of the three blocks; however, the order of presentation was different for each block. A short break was given after each block. The total number of stimuli in the generalization test was 312: 26 tokens x 2 repetitions x 2 talkers x 3 blocks. Three different orders were also used in the generalization test, and the order of presentation was different from that in the pre- and post-tests. The stimuli used during training were the same as those used during the pre-test.

¹ Several words produced by AE 5 were not produced clearly and were therefore unusable. To keep the design balanced, two repetitions of each word from all talkers were chosen.

Procedure

The experiment was carried out on five computers in the Experimental Semantics Lab at Beihang University. Participants were told that the study was designed to help them learn English [l] and [n]. The instructions for technical familiarization with the software were in Chinese. Instructions concerning the task were in English and appeared on the screen for reference during the experiment. Participants heard one stimulus per trial, and the inter-trial interval was set at 1500 ms. They were instructed to pay attention to the sound at the beginning of each trial and to decide whether they heard it as English [l] or [n] by clicking the button labeled L or N.

Training sessions differed only in the order of presentation. In each block during a training session, the first one-third of the stimuli (52 stimuli) was intended as a learning phase. These stimuli were presented in a fixed order: the odd-numbered trials were all /n/-words and the even-numbered trials were all /l/-words. The remaining two-thirds of the block (104 stimuli) were presented in random order. For all training trials,
participants received trial-by-trial feedback: if their response was incorrect, the correct button would blink. They could hear the stimulus as many times as they wanted before making a choice. Once a button was selected, no replay was allowed.

Data analysis

Mean percentage correct of [l] and [n] identification was calculated for each participant. The results from the pre-test were submitted to a repeated measures ANOVA with Consonant (2 levels: [l], [n]) as the within-subjects factor and Dialect (4 levels: Jianghuai, Xinan, Gan, NM) as the between-subjects factor. If significant main effects were found for these factors, a Bonferroni post hoc test was run. If significant interactions between the factors were found, a multivariate ANOVA was run afterwards. If no interaction was found, a repeated measures ANOVA was run for identification of [n] and [l] separately for each dialect. In addition, to test the effect of the two phonetic factors, a repeated measures ANOVA was run with Vowel Context (4 levels: /i/, / /, [ai], /u/) and Syllable Structure (2 levels: singleton, cluster) as the within-subjects factors and Dialect (3 levels: Jianghuai, Xinan, Gan) as the between-subjects factor.

To test the training effect on perception, a repeated measures ANOVA was run with Test (2 levels: pre-test, post-test) as the within-subjects factor and Dialect (3 levels: Jianghuai, Xinan, Gan) as the between-subjects factor. If interactions between the factors were found, a multivariate ANOVA was run; otherwise, a repeated measures ANOVA was run for each dialect with Test as the within-subjects factor. The same method was adopted for the comparison between the pre-test and the generalization test.

Results from this experiment will be presented and discussed in the following section.
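The first step of the analysis described above, computing each participant's mean percentage correct for [l] and [n], can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used to score the data: the trial format (participant ID, target consonant, response), the function name `percent_correct`, and the sample responses are all hypothetical.

```python
from collections import defaultdict


def percent_correct(trials):
    """Compute percentage correct per (participant, target consonant).

    `trials` is an iterable of (participant_id, target, response) tuples,
    where target and response are "l" or "n". This record layout is an
    assumption made for illustration.
    """
    # (participant, target) -> [number correct, number of trials]
    counts = defaultdict(lambda: [0, 0])
    for participant, target, response in trials:
        cell = counts[(participant, target)]
        cell[0] += int(response == target)
        cell[1] += 1
    return {key: 100.0 * correct / total
            for key, (correct, total) in counts.items()}


if __name__ == "__main__":
    # Made-up responses for two hypothetical participants.
    sample_trials = [
        ("P01", "l", "l"), ("P01", "l", "n"),
        ("P01", "n", "n"), ("P01", "n", "n"),
        ("P02", "l", "n"), ("P02", "n", "n"),
    ]
    for (participant, target), score in sorted(percent_correct(sample_trials).items()):
        print(f"{participant} [{target}]: {score:.2f}% correct")
```

Scores in this per-participant, per-consonant form are the kind of input that a repeated measures ANOVA with Consonant as the within-subjects factor would take.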
Results

First I will report results from the pre-test. Besides comparing the perceptual patterns of Jianghuai, Xinan, Gan, and NM speakers, I will also examine the effects of two linguistic factors on their perception: vowel context and syllable structure. Next, I will report results concerning the effects of training on speakers' perceptual ability, by comparing the pre-test with the post-test and with the generalization test, and by examining results from the four days of training. In the last section, I will discuss each of these results.

Perception in the Pre-test

Mean percent correct identification of English [l] and [n] for all four groups of speakers (Jianghuai, Xinan, Gan, and NM) is reported in Table 4-2 and Figure 4-1.

As shown in Table 4-2 and Figure 4-1 above, the predicted trend was observed: NM speakers were highly accurate in identifying English [n] and [l] (mean = 98.89%), whereas the scores of Jianghuai (62.89%), Xinan (55.20%), and Gan (69.93%) speakers were far lower. Additionally, except for NM speakers, who did not show differences between [l] and [n], [n] was better identified than [l] in all three other dialects, but the degree of difference seemed to vary among the groups.

To test for cross-dialect differences, the data from NM, Jianghuai, Xinan, and Gan speakers were submitted to a repeated measures ANOVA with Consonant (2 levels: /n/, /l/) as the within-subjects factor and Dialect as the between-subjects factor. This analysis showed a significant main effect of Dialect [F (3, 41) = 33.825, p < .001], but the main effect of Consonant was not significant [F (1, 41) = 2.769, p = .104]. No interaction between the two factors was found. For the main effect of Dialect, a Bonferroni post hoc pair-wise comparison confirmed that NM outperformed all three
other dialect groups. Additionally, both Jianghuai and Gan speakers performed significantly better than Xinan speakers.

A similar repeated measures ANOVA was also performed to compare the performance of only the dialects of interest (i.e., Jianghuai, Xinan, and Gan), without the control group (i.e., NM speakers). Unlike the first analysis, this analysis revealed, besides the main effect of Dialect, a significant main effect of Consonant [F (1, 37) = 4.501, p = 0.041]. No interaction between the two factors was found. A Bonferroni post hoc test indicated that Gan speakers showed better overall identification than Xinan speakers (p = 0.007), whereas neither the difference between Jianghuai and Xinan nor that between Jianghuai and Gan reached significance. The main effect of Consonant confirmed that identification of English [n] by the three dialect groups was significantly more accurate than that of English [l].

To evaluate the predictions generated by the PAM that Jianghuai and Xinan speakers would show better identification of [l] and [n], respectively, paired-samples t-tests were run for all three dialects. The analyses showed that only Jianghuai speakers identified the two sounds significantly differently [t = 2.918, df = 14, p = 0.011]. [n] was numerically better identified than [l] among Xinan and Gan speakers; however, the difference did not reach statistical significance [Xinan: t = 0.628, df = 17, p = 0.538; Gan: t = 0.421, df = 6, p = 0.688].

Effects of Vowel Context in the Pre-test

As presented in the section on experiment design, English [l] and [n] produced in four different vowel contexts were used in the study: /i/, / /, [ ], and /u/. Mean percentages of correct identification of [l] and [n]
for each vowel context for each speaker group are reported in Table 4-3 through Table 4-6, and their error bars are presented in Figure 4-2.

As shown in Table 4-3 through Table 4-6 and Figure 4-2, a trend in perception across the four vowel contexts was observed. Except for NM speakers, speakers of the other dialects performed differently in different vowel contexts. Specifically, [l] was more accurately identified when preceding [ ] by Jianghuai, Xinan, and Gan speakers, whereas [n] was better identified when preceding [i], [u], and [ ].

A repeated-measures ANOVA was run with Vowel Context (4 levels: /i/, / /, [ ], /u/) and Consonant (2 levels: /l/, /n/) as the within-subjects factors and Dialect as the between-subjects factor. A main effect of Vowel Context was found [F(3, 228) = 12.477, p < 0.001], as was an interaction between Vowel Context and Consonant [F(3, 228) = 12.863, p < 0.001]. A repeated-measures ANOVA with Vowel Context as the within-subjects factor was then performed for each dialect group. The analyses revealed that, when English [l] and [n] were considered together, there was generally no significant difference among the four vowel contexts for speakers of Jianghuai [F(3, 87) = 0.622, p = 0.603], Xinan [F(3, 105) = 0.310, p = 0.818], or Gan [F(3, 39) = 2.803, p = 0.054]. However, when the results were broken down by sound, interesting patterns were found. For identification of [n], the repeated-measures ANOVA revealed a significant main effect of Vowel Context [F(3, 111) = 10.997, p < 0.001]; a similar effect was found for identification of [l] [F(3, 111) = 5.484, p = 0.001]. A follow-up Bonferroni pairwise comparison showed that [n] was identified less accurately in an [ ] context than in the [i], [ ], and [u] contexts (p < 0.001, p < 0.001, p = 0.002). By contrast, identification
of [l] in an [ ] context was better than in the other three (p = 0.03, p = 0.06, p = 0.001). No analysis was run for NM speakers, as their performance was at ceiling.

Syllable Structure

The mean percentage of correct identification of English [l] and [n] in two syllable positions (i.e., word-initial singleton and word-initial cluster with /s/) for all four groups of speakers is reported in Table 4-7 through Table 4-10 and in Figure 4-3.

As shown in Table 4-7 through Table 4-10 and in Figure 4-3, except for NM speakers, speakers from the three dialects of interest demonstrated different identification patterns for different syllable structures. Specifically, [n] was better identified in singletons than in clusters by speakers of Jianghuai (68.85% and 55.05%), Xinan (58.39% and 53.55%), and Gan (73.67% and 55.95%), whereas [l] was more accurately identified in clusters than in singletons by speakers of Xinan (60.34% and 55.39%) and Gan (73.02% and 68.18%), but not by Jianghuai speakers (49.26% and 58.59%). Within each syllable structure, [n] was more accurately identified than [l] in singletons for all three groups, whereas [l] was more accurately identified than [n] in clusters except by Jianghuai speakers.

Data from Jianghuai, Xinan, and Gan speakers were then submitted to a repeated-measures ANOVA with Syllable Structure (2 levels: singleton, cluster) and Consonant (2 levels: /l/, /n/) as the within-subjects factors and Dialect (3 levels: Jianghuai, Xinan, Gan) as the between-subjects factor. A main effect of Syllable Structure was found [F(1, 76) = 19.835, p < 0.001]. Interactions between Syllable Structure and Consonant [F(1, 76) = 11.437, p < 0.001] and between Syllable Structure and Dialect [F(2, 76) = 6.357, p = 0.003] were also found.

A follow-up repeated-measures ANOVA for each dialect group was run with Syllable Structure and Consonant as the within-subjects factors. The analysis revealed a
main effect of Syllable Structure in Jianghuai [F(3, 42) = 7.345, p < 0.001] and Gan [F(3, 18) = 4.705, p = 0.014]. A follow-up pairwise comparison showed that Jianghuai speakers identified [n] better when it appeared in singletons than in clusters (p = 0.007). Moreover, Jianghuai speakers identified [n] better than [l] in singletons (p = 0.027). Gan speakers also identified [n] better when it was in singletons than in clusters (p = 0.015), and identified [n] better than [l] when they were in clusters (p = 0.008). Syllable structure made no difference to Xinan speakers' identification of [l] or [n].

Effects of Training

Training is examined from three perspectives: pre-test vs. post-test, to assess the effect of training on the same tokens produced by the same talkers; pre-test vs. generalization test, to evaluate how participants generalized the training effect to the same tokens from new AE speakers; and performance across the four training sessions.

Pre-test vs. Post-test Comparison

Percentages of correct identification before training (pre-test) and after training (post-test) for Jianghuai, Xinan, and Gan speakers are reported in Table 4-11 through Table 4-14 and in Figure 4-4.

As shown in Table 4-11 through Table 4-14 and in Figure 4-4, improved performance was observed overall for speakers of all three dialects (61.94% vs. 67.95% for Jianghuai, 56.90% vs. 61.10% for Xinan, and 69.93% vs. 79.36% for Gan), yielding a mean perceptual gain of 5.8% (the mean percentage of correct responses was 61.06% in the pre-test and 66.86% in the post-test). A repeated-measures ANOVA was carried out with Test (2 levels: pre-test, post-test) as the within-subjects factor and Dialect (3 levels: Jianghuai, Xinan, Gan) as the between-subjects factor. It revealed a
significant main effect of Test [F(1, 37) = 17.680, p < 0.001] and of Dialect [F(2, 37) = 4.411, p = 0.019], but no interaction between the two factors [F(2, 37) = 0.852, p = 0.435], suggesting that a comparable amount of improvement was realized for all three groups of speakers. Numerically, Jianghuai and Gan speakers both performed better than Xinan speakers, and Gan speakers better than Jianghuai speakers; but a Bonferroni post hoc test indicated that only the difference between Gan and Xinan speakers was significant in both the pre- and post-tests (p = 0.017).

As shown in Figure 4-4, the improvement in [l] identification seemed to be greater than that in [n] for all groups. To examine the improvement in [l] and [n] separately, a repeated-measures ANOVA with Test (2 levels: pre-test, post-test) as the within-subjects factor was carried out for each dialect group. A significant increase in [l] was found for both Gan listeners [F(1, 6) = 7.172, p = 0.037] and Jianghuai listeners [F(1, 14) = 13.891, p = 0.002], but not for the Xinan group.

Pre-test vs. Generalization Test

To evaluate whether the effects of training generalized to English [l] and [n] produced by new AE speakers, all three groups were administered a generalization test using stimuli produced by two additional AE speakers (AE4, AE5). The results of this test are shown in Table 4-11 through Table 4-14, presented in the previous section, and in Figure 4-5.

As shown in Table 4-11 through Table 4-14 and Figure 4-5, all three groups showed an increase in the percentage of correct identification in the generalization test (the mean percentage correct in the generalization test was 64.79%, compared to 61.06% in the pre-test). To assess whether the training effect transferred to new AE speakers, data from the generalization test were compared to those from the pre-test in a repeated-measures analysis with Test (2 levels: pre-test, generalization test) as the within-subjects factor and Dialect
(3 levels: Jianghuai, Xinan, Gan) as the between-subjects factor. It revealed a significant main effect of Test [F(1, 37) = 8.268, p = 0.007] and of Dialect [F(2, 37) = 4.004, p = 0.027]. A Bonferroni post hoc test showed that Gan speakers outperformed Xinan speakers in both the pre-test and the generalization test (p = 0.023). The amount of improvement among the three dialects was, however, comparable, as the analysis did not find a significant interaction between Test and Dialect [F(2, 37) = 0.336, p = 0.717].

To examine the improvement in [l] and [n] separately, a repeated-measures ANOVA with Test (2 levels: pre-test, generalization test) as the within-subjects factor was carried out for each dialect group. A significant increase in [l] was found for both Jianghuai speakers [F(1, 14) = 4.84, p = 0.041] and Gan speakers [F(1, 6) = 5.99, p = 0.049], but not for the Xinan group.

Discussion

The discussion is structured as follows. First, I will examine the perceptual patterns of all speakers before any training. Next, I will discuss the effects of vowel context and syllable structure on identifying [l] and [n] prior to training. Last, I will discuss the training effects.

Perception Patterns

In Chapter 3, it was predicted that perception of the [l] and [n] contrast would be easy for NM speakers but difficult for Jianghuai, Xinan, and Gan speakers. It was further predicted that the degree of difficulty would vary among the southern dialect groups.

The first prediction was confirmed: NM speakers were the best at identifying the non-native contrast, and in fact far better than the other three groups. According to the Perceptual Assimilation Model (PAM), the discriminability of English /n/ and /l/ depends
on the degree of phonetic (dis)similarity to the phonological categories present in the listener's L1. The contrast between /l/ and /n/ is present in NM, which enabled the NM speakers to assimilate non-native [l] and [n] into separate phonemic categories, termed Two-Category (TC) assimilation. Thus, as expected, NM speakers' performance was excellent. Because the /l/ and /n/ contrast functions at best partially at the phonemic level in the other three dialects, speakers of those dialects perceived it less accurately than the NM speakers did. However, the prediction concerning relative degrees of identification accuracy among Jianghuai, Xinan, and Gan speakers was not entirely confirmed. It was predicted that Gan speakers would perform better than both the Jianghuai and Xinan groups, because Gan contrasts /n/ and /l/ phonemically at least in the context of a high vowel, whereas the contrast is absent in all vowel contexts in Jianghuai and Xinan. The lack of contrast in Jianghuai and Xinan suggests that both groups should have performed equally poorly. This aspect of the hypothesis was not confirmed, as the results indicate that Jianghuai speakers performed significantly better than Xinan speakers.

The Xinan and Gan cases will be discussed first, as they followed the prediction. First, Xinan speakers' poor discrimination can be accounted for in terms of the difference in phonemic organization between Xinan and English: of the two English phonemes /n/ and /l/, Xinan has only /n/. Xinan speakers assimilated the two English sounds into a single category (SC) in the native phonological system. For Gan speakers, it was predicted that assimilation patterns would be more complex than for Xinan speakers.
This was because Gan speakers might perceive the contrast as either TC or Single-Category (SC): TC assimilation because Gan does contrast the two sounds before high vowels, and SC assimilation because Gan does not distinguish /n/ and /l/ before non-high
vowels. The results suggested, however, that the moderate degree of identification accuracy (69.93%) in the non-high vowel condition found for Gan speakers was better characterized as reflecting a Category-Goodness (CG) type of assimilation pattern rather than an SC assimilation pattern. In this sense, Gan speakers assimilated both English [l] and [n] into one native phoneme category (though which phoneme cannot be concluded from the present study). [n] was perceived as more similar to the native phoneme than [l] was, as revealed by the fact that the identification score for [n] was better than that for [l]. The assimilation into one category prevented the Gan speakers from discriminating as well as NM speakers.

Turning to the Jianghuai speakers' perception, it was predicted that their performance would be as poor as that of Xinan speakers. However, the results revealed that their performance was significantly better than that of Xinan speakers (mean % correct: 61.93% vs. 56.89%). Jianghuai speakers' perception had been predicted to fall under SC assimilation, as there is one phoneme, /l/, in Jianghuai vs. two phonemes, /n/ and /l/, in English. But Jianghuai speakers performed better than Xinan speakers, who showed SC assimilation. It seems that the difference cannot be accounted for at the phonemic level. Recall that Jianghuai /l/ has two allophones: [l] in syllable-initial positions and [n] in syllable-final positions. Neither the Native Language Magnet model (NLM) nor PAM takes into consideration the effect of allophonic variation on non-native contrast perception. The Speech Learning Model (SLM), on the other hand, proposes that in addition to phonemic features, L2 perception should also be explained in terms of allophonic differences. The function of allophones facilitated the mapping of the phonetic
features of English /n/ and /l/ onto native phonetic categories, which resulted in better discrimination than that of Xinan speakers, who could rely only on the phonemic differences.

It is also possible that Jianghuai speakers' perception was of the CG assimilation type, as was the case with Gan speakers. If this is so, the degree of goodness of fit to a native category must be different in the two groups. That is, both dialect groups perceived one of the two sounds as a better exemplar of a native category than the other sound, but the distance between the two assimilations was different. Such an interpretation may be generated through the Native Language Magnet model (NLM), as it allows a psychoacoustic comparison between two non-native sounds. When non-native sounds fall into a native phonetic category, the magnet effect of the prototype reduces the discrimination between them, but their psychoacoustic distances from the prototype and from each other in the native space will determine the perceptual sensitivity. In Gan and Jianghuai, if both non-native sounds fell into one native category and both differed from the prototype, assimilation distances could be conceptualized as illustrated in Figure 4-6 (a similar conception was used in Hansberger 2001).

For the sake of illustration, both dialects are put into one figure, although they may differ in which phoneme they assimilated the non-native sounds into. The circle represents a conceptual phonetic space, and the star at the center indicates where the prototype is located. Both [l] and [n] fell at a greater distance from the prototype within the space for Gan than for Jianghuai. Gan speakers might perceive the sounds as poorer exemplars of their prototype than Jianghuai speakers, who perceive the similarity between the sounds and their prototype (this could not be tested in the current study). But if this were the case, then it could explain the better discrimination of Gan speakers. As both Gan and
Jianghuai speakers perceived the sounds as poor exemplars of their prototype, the distance between the exemplars themselves determined the degree of discriminability. The two assimilations in Gan were closer to each other psychoacoustically, resulting in no significant difference between the two, as shown earlier in Figure 4-1. For Jianghuai speakers, although both assimilations fell away from the prototype, they differed to a greater extent physically than the assimilations in Gan, which resulted in better discrimination between the two. Moreover, because the perceived assimilation of [n] was farther away from the prototype than that of [l], [n] was better identified.

In summary, PAM, using the TC and SC assimilation cases, could account for the discrimination of NM and Xinan speakers. The explanation of Gan and Jianghuai speakers' performance required consideration of additional factors, as well as a combination of different models.

In addition to examining the contribution of phonemic and phonetic differences between L1 and L2, the experiment also studied the effects of two other linguistic factors on perception of English [l] and [n], which are presented in the following sections.

Vowel context

As shown in the earlier section, except for NM speakers, speakers of the other three dialects all performed differently in different vowel contexts. Specifically, [l] was better identified when preceding [ ] by Jianghuai, Xinan, and Gan speakers, whereas [n] was better identified when preceding [i], [u], and [ ]. The different perception of [l] and [n] across the four vowel contexts can be attributed to the acoustic characteristics of each vowel. /i/ and /u/ are both high vowels and have a formant within the low-frequency range, whereas / /, a low vowel, lacks such a low-frequency formant. The presence of low-frequency resonance in [i] and [u] provided an acoustic context that is more similar to [n], which also has a formant in the
similar range (Abramson et al. 1979). This effect, however, did not apply to NM speakers, who were already familiar with the /l/ and /n/ contrast. The hypothesis that vowel contexts do not influence perception by NM speakers could have been strengthened if more stimuli contrasting [l] and [n] in various syllable positions had been used. In summary, the prediction that vowel contexts would influence the identification of English [l] and [n] is confirmed, but from a different perspective: the influence of vowel context arises not at the phonemic level but at the phonetic level.

Syllable structure

Results showed that, except for NM speakers, speakers of the other three dialects demonstrated a different pattern of identification in different syllable structures. Specifically, [n] was better identified in singletons than in clusters by all speakers, whereas [l] was better identified in clusters than in singletons except by Jianghuai speakers. [n] was better identified than [l] in singletons by all three groups, whereas [l] was better identified in clusters except by Jianghuai speakers.

It was mentioned in Chapter 2 that phonotactic rules in L1 may also contribute to L2 speech perception. None of the three dialects allows [l] or [n] in clusters in either syllable-initial or syllable-final positions. The current study failed to support the hypothesis that perception should be easier in singletons than in clusters, since only the identification of [n] was consistently better in singletons than in clusters for all dialect groups. Identification of [l] was, if anything, easier in clusters than in singletons. This cannot be explained by the influence of L1 phonotactic rules, and suggests that other perspectives need to be taken. In stimuli that contain clusters, [s] precedes the target sound. As a fricative, [s] contains strong energy in the high-frequency range. It is
speculated that the presence of prominence at high frequencies in the preceding fricative may have hindered the perception of the following [l] and [n]. It is further hypothesized that such hindrance is greater for [n] than for [l]. This hypothesis cannot be confirmed by the current study, but could be tested in future studies through acoustic analysis of the fricative [s] and an investigation of the higher frequencies of both [l] and [n].

Training Effects

A comparison between the pre-test and the post-test suggests that, overall, speakers of all three dialects improved equally after training. Moreover, all three groups showed better identification in the generalization test. Furthermore, the improvement in [l] identification seemed to be greater than that in [n] for all dialect groups, but only Jianghuai and Gan speakers achieved a significant increase in [l].

Examination of the data from the four-day training showed that speakers from Jianghuai, Xinan, and Gan all demonstrated similar trajectories of improvement. The increase was greatest between the pre-test and the first day of training, and then slowed down. There was a sudden drop in performance in the post-test and the generalization test.

The results provide evidence that is not in favor of the Critical Period Hypothesis (CPH), which claims that after a certain age the ability to learn an L2 successfully disappears. Instead, the results support the prediction that laboratory training would be beneficial to all speakers. Generally, speakers from Jianghuai, Xinan, and Gan all improved equally from pre-test to post-test, and they could all apply the training to the identification of stimuli produced by new AE speakers in the generalization test. These findings are
in favor of SLM in that the speech learning mechanism (here, the ability of these dialect speakers to identify English /n/ and /l/) remains accessible and malleable in adulthood.

Secondly, Jianghuai and Gan speakers' identification of [l] improved more than that of Xinan speakers. This suggests that the L1 phonemic inventory may play a role in perceptual training. Recall that Jianghuai has [l] as an allophone and Gan has it as a phoneme before high vowels. Familiarity with the target sound thus appears to facilitate speakers' learning of the non-native sound. On the other hand, a similar influence was not observed in the improvement of [n] identification. Therefore, this conclusion may apply only to certain non-native sounds.

Besides implications about training in L2 speech perception, the findings in this experiment also provide support for the trend of training power mentioned in Chapter 3. The trajectory of speakers' performance throughout the five days of the experiment revealed that the greatest improvement occurred on the first day of training. The improvement slowed down in the following days. The drop from the fourth training session to the post-test could be attributed to the experiment design. First, in the training sessions participants were allowed to take as much time as they needed to make an identification choice, and were provided with feedback after each trial. Moreover, they were not under the "testing pressure" of the post-test. Therefore, the accuracy of their identification was higher in the training sessions than in the post-test and the generalization test. Secondly, for practical reasons, the post-test and the generalization test were both carried out on the same day after the last training session. Speakers had completed two hours of training before the final testing. Fatigue might have played an important role in the deteriorating accuracy in
identifying the English contrast on the last day. Nonetheless, speakers' performance was significantly better in the post-test and the generalization test than in the pre-test.

Summary

In summary, the results provided answers to the research questions: NM speakers did not have much difficulty in distinguishing English [l] and [n], whereas Jianghuai, Xinan, and Gan speakers had difficulty in perceiving the non-native contrast. The difficulty they encountered, however, was not of the same degree. Current cross-language speech perception models could account for the degree of difficulty experienced by the three dialect groups, but none could offer a satisfactory explanation by itself. The differences can be attributed to phonemic and phonetic differences between L1 and L2, as well as to two other linguistic factors: vowel context and syllable structure. Vowel context and syllable structure influenced Jianghuai, Xinan, and Gan speakers' accuracy in identifying [l] and [n], but had no effect on NM speakers. This suggests that phonetic factors influence L2 learners' perception only when they are inexperienced with the non-native contrasts.

Furthermore, Jianghuai, Xinan, and Gan speakers all improved in their ability to perceptually distinguish English [l] and [n] in syllable-initial position. The results are in favor of the hypothesis that the human perceptual mechanism remains accessible after puberty. It can be concluded that Jianghuai, Xinan, and Gan speakers' perceptual abilities concerning certain speech sounds can be modified through laboratory training with natural stimuli. The carry-over effect of perceptual training to production by Xinan speakers will be discussed in the next chapter.
Table 4-1 English wordlist used in the data collection.

Table 4-2 Mean percentage correct in [l] and [n] identification for Jianghuai (n = 15), Xinan (n = 18), Gan (n = 7) and NM (n = 7) speakers in the pre-test.

          Jianghuai         Xinan             Gan               NM
          [n]     [l]       [n]     [l]       [n]     [l]       [n]     [l]
  Mean    66.72   57.15     56.06   54.34     70.94   68.92     98.62   98.91
  SD      8.68    17.33     7.42    7.1       10      21.77     1.36    0.97

[Figure: bar chart of mean % correct (0 to 100) for /n/ and /l/ by dialect group (Jianghuai, Xinan, Gan, NM).]
Figure 4-1 Identification of [l] and [n] of all dialect speakers in the pre-test.
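The per-group figures quoted in the text can be related to Table 4-2 with a short calculation. The following is only a sketch: it assumes that each group's overall score is the unweighted average of its [n] and [l] means (i.e., that both consonants contributed the same number of tokens), and the names TABLE_4_2 and group_summary are hypothetical labels introduced for illustration.

```python
# Mean percent correct per consonant, copied from Table 4-2 (pre-test).
TABLE_4_2 = {
    "Jianghuai": {"n": 66.72, "l": 57.15},
    "Xinan": {"n": 56.06, "l": 54.34},
    "Gan": {"n": 70.94, "l": 68.92},
    "NM": {"n": 98.62, "l": 98.91},
}


def group_summary(table):
    """Return {group: (overall mean, [n] minus [l])}.

    Assumption: the overall mean is the unweighted average of the
    [n] and [l] means, which holds only if both consonants were
    tested with the same number of tokens.
    """
    summary = {}
    for group, scores in table.items():
        overall = (scores["n"] + scores["l"]) / 2
        n_advantage = scores["n"] - scores["l"]
        summary[group] = (round(overall, 2), round(n_advantage, 2))
    return summary


summary = group_summary(TABLE_4_2)
for group, (overall, n_advantage) in summary.items():
    print(f"{group:10s} overall = {overall:6.2f}   [n] - [l] = {n_advantage:+6.2f}")
```

Under this assumption the Xinan and Gan averages come out at 55.20 and 69.93, and the [n] advantage is largest for Jianghuai speakers, the only group for which the paired-samples t-test reached significance.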
Table 4-3 Mean percentage correct of [l] and [n] identification in four vowel contexts for NM speakers (n = 5) in the pre-test.

          [n]                               [l]
          [i]     [ ]     [ ]     [u]       [i]     [ ]     [ ]     [u]
  Mean    98.28   98.93   99.21   98.93     98.97   98.28   99.63   99.28
  SD      1.52    0.88    0.52    1.01      0.90    0.85    0.34    0.21

Table 4-4 Mean percentage correct of [l] and [n] identification in four vowel contexts for Jianghuai speakers (n = 15) in the pre-test.

          [n]                               [l]
          [i]     [ ]     [ ]     [u]       [i]     [ ]     [ ]     [u]
  Mean    68.52   55.80   68.40   68.52     57.31   65.92   59.01   48.02
  SD      15.64   14.96   23.56   12.54     21.10   16.78   29.06   19.76

Table 4-5 Mean percentage correct of [l] and [n] identification in four vowel contexts for Xinan speakers (n = 18) in the pre-test.

          [n]                               [l]
          [i]     [ ]     [ ]     [u]       [i]     [ ]     [ ]     [u]
  Mean    61.73   51.54   56.07   59.88     54.94   60.29   56.69   53.09
  SD      15.59   9.87    13.49   9.72      16.71   14.67   16.28   12.43

Table 4-6 Mean percentage correct of [l] and [n] identification in four vowel contexts for Gan speakers (n = 7) in the pre-test.

          [n]                               [l]
          [i]     [ ]     [ ]     [u]       [i]     [ ]     [ ]     [u]
  Mean    81.55   59.79   84.66   67.99     74.60   84.65   75.13   70.90
  SD      11.50   9.30    8.40    15.56     21.54   15.70   30.03   17.72
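The opposite behavior of [n] and [l] across vowel contexts can be read directly off Tables 4-4 through 4-6. The sketch below is illustrative only: it uses the column order of the tables with the labels V1 to V4 from the caption of Figure 4-2 (V1 = /i/, V4 = /u/), and the names MEANS and extreme_contexts are hypothetical. It compares descriptive means and does not replace the ANOVA and Bonferroni comparisons reported in the text.

```python
# Column labels follow the caption of Figure 4-2: V1 = /i/, V4 = /u/;
# V2 and V3 are the two remaining vowel contexts, in table order.
CONTEXTS = ["V1", "V2", "V3", "V4"]

# Mean percent correct per vowel context, copied from Tables 4-4 to 4-6.
MEANS = {
    "Jianghuai": {"n": [68.52, 55.80, 68.40, 68.52],
                  "l": [57.31, 65.92, 59.01, 48.02]},
    "Xinan": {"n": [61.73, 51.54, 56.07, 59.88],
              "l": [54.94, 60.29, 56.69, 53.09]},
    "Gan": {"n": [81.55, 59.79, 84.66, 67.99],
            "l": [74.60, 84.65, 75.13, 70.90]},
}


def extreme_contexts(means):
    """For each group, find the context with the lowest [n] mean
    and the context with the highest [l] mean."""
    result = {}
    for group, by_consonant in means.items():
        n_scores = by_consonant["n"]
        l_scores = by_consonant["l"]
        result[group] = {
            "worst_n": CONTEXTS[n_scores.index(min(n_scores))],
            "best_l": CONTEXTS[l_scores.index(max(l_scores))],
        }
    return result


extremes = extreme_contexts(MEANS)
for group, info in extremes.items():
    print(f"{group:10s} lowest [n]: {info['worst_n']}   highest [l]: {info['best_l']}")
```

For all three groups the same context (the second column of the tables) yields both the lowest [n] mean and the highest [l] mean, which is the pattern the pairwise comparisons describe.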
[Figure: bar chart of mean % correct (0 to 100) in vowel contexts V1 to V4 for Jianghuai-n, Jianghuai-l, Xinan-n, Xinan-l, Gan-n, Gan-l, NM-n, and NM-l.]
Figure 4-2 Identification of [l] and [n] in four vowel contexts by all dialect speakers (V1 /i/, V2 / /, V3 [ ], and V4 /u/).

Table 4-7 Mean percentage correct in [l] and [n] identification in two syllable structures for NM speakers (n = 5) in the pre-test.

          [n]                       [l]
          Singleton   Cluster       Singleton   Cluster
  Mean    98.28       98.87         99.21       98.78
  SD      1.69        1.02          0.70        1.04

Table 4-8 Mean percentage correct of [l] and [n] identification in two syllable structures for Jianghuai speakers (n = 15) in the pre-test.

          [n]                       [l]
          Singleton   Cluster       Singleton   Cluster
  Mean    68.85       55.00         58.59       49.26
  SD      10.39       6.49          17.86       18.71

Table 4-9 Mean percentage correct of [l] and [n] identification in two syllable structures for Xinan speakers (n = 18) in the pre-test.

          [n]                       [l]
          Singleton   Cluster       Singleton   Cluster
  Mean    58.39       53.55         55.39       60.34
  SD      10.29       9.85          12.52       10.21

Table 4-10 Mean percentage correct of [l] and [n] identification in two syllable structures for Gan speakers (n = 7) in the pre-test.

          [n]                       [l]
          Singleton   Cluster       Singleton   Cluster
  Mean    73.67       55.95         68.18       73.02
  SD      11.43       2.50          24.34       8.44

[Figure: bar chart of mean % correct (0 to 100) for [n] in singleton, [n] in cluster, [l] in singleton, and [l] in cluster, by dialect group (Jianghuai, Xinan, Gan, NM).]
Figure 4-3 Identification of [l] and [n] in two syllable structures by all dialect speakers in the pre-test.

Table 4-11 Mean percentage correct in [l] and [n] identification in two syllable structures for NM speakers (n = 5) in the pre-test.

          [n]                       [l]
          Singleton   Cluster       Singleton   Cluster
  Mean    98.28       98.87         99.21       98.78
  SD      1.69        1.02          0.70        1.04

Table 4-12 Mean percentage correct of [l] and [n] identification in two syllable structures for Jianghuai speakers (n = 15) in the pre-test.

          [n]                       [l]
          Singleton   Cluster       Singleton   Cluster
  Mean    68.85       55.00         58.59       49.26
  SD      10.39       6.49          17.86       18.71

Table 4-13 Mean percentage correct of [l] and [n] identification in two syllable structures for Xinan speakers (n = 18) in the pre-test.

          [n]                       [l]
          Singleton   Cluster       Singleton   Cluster
  Mean    58.39       53.55         55.39       60.34
  SD      10.29       9.85          12.52       10.21
Table 4-14 Mean percentage correct of [l] and [n] identification in two syllable structures for Gan speakers (n = 7) in the pre-test.

          [n]                       [l]
          Singleton   Cluster       Singleton   Cluster
  Mean    73.67       55.95         68.18       73.02
  SD      11.43       2.50          24.34       8.44

[Figure: bar chart of mean % correct (0 to 100) in the pre-test (PRE) and post-test (POST) for Jianghuai-n, Jianghuai-l, Xinan-n, Xinan-l, Gan-n, and Gan-l.]
Figure 4-4 Comparison of mean percentage correct between the pre-test and the post-test for Jianghuai, Xinan, and Gan speakers.

[Figure: bar chart of mean % correct (0 to 100) in the pre-test (PRE) and generalization test (GENE) for Jianghuai-n, Jianghuai-l, Xinan-n, Xinan-l, Gan-n, and Gan-l.]
Figure 4-5 Comparison of mean percentage correct between the pre-test and the generalization test for Jianghuai, Xinan, and Gan speakers.
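The overall perceptual gain reported for the pre-test vs. post-test comparison can be approximately reproduced from the group means given in the text. The sketch below assumes that the overall means are averages over all 40 trained participants, so each group mean is weighted by its group size (15, 18, and 7); the names GROUPS and weighted_mean are hypothetical, and small discrepancies from the reported values would be due to rounding of the group means.

```python
# group: (number of speakers, pre-test mean, post-test mean), taken from
# the pre-test vs. post-test comparison and the group sizes in Table 4-2.
GROUPS = {
    "Jianghuai": (15, 61.94, 67.95),
    "Xinan": (18, 56.90, 61.10),
    "Gan": (7, 69.93, 79.36),
}


def weighted_mean(pairs):
    """Mean of (value, weight) pairs, weighted by group size."""
    total_weight = sum(weight for _, weight in pairs)
    return sum(value * weight for value, weight in pairs) / total_weight


pre_overall = weighted_mean([(pre, size) for size, pre, _ in GROUPS.values()])
post_overall = weighted_mean([(post, size) for size, _, post in GROUPS.values()])
overall_gain = post_overall - pre_overall

# Per-group gains, for comparison with the non-significant
# Test x Dialect interaction reported in the text.
group_gains = {name: post - pre for name, (_, pre, post) in GROUPS.items()}

print(f"pre-test  overall: {pre_overall:.2f}")
print(f"post-test overall: {post_overall:.2f}")
print(f"overall gain:      {overall_gain:.2f}")
for name, gain in group_gains.items():
    print(f"{name:10s} gain: {gain:.2f}")
```

With these inputs the weighted means come out at roughly 61.07 and 66.86, a gain of about 5.8 percentage points, in line with the values reported for the pre-test and post-test.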
[Figure: conceptual phonetic space with the native prototype at the center and the positions of Jianghuai-l, Jianghuai-n, Gan-l, and Gan-n.]
Figure 4-6 Conceptual assimilation of [l] and [n] by Jianghuai and Gan speakers.
CHAPTER 5
PRODUCTION OF ENGLISH [L] AND [N] BY AMERICAN ENGLISH AND XINAN SPEAKERS

Results of the perception experiment reported in Chapter 4 demonstrated that native speakers of the Jianghuai, Xinan, and Gan dialects of Chinese had difficulty identifying English [l] and [n] before training. More importantly, consistent with previous training studies, the ability of all three groups of Chinese speakers to identify the two English consonants was found to improve after four days of perceptual training. As reviewed in Chapter 2, some previous training studies also demonstrated that improved perception ability after training transferred to production ability without further production training (Bradlow 1997, Yamada et al. 1993). This finding supported the hypothesis that speech perception is intricately linked to speech production, as proposed by the Motor Theory and the Direct-realist Theory, which were reviewed in Chapter 3.

As mentioned in Chapter 1, the overall goal of this research was to evaluate the effectiveness of perception training on the identification of English [l] and [n] by native Chinese speakers, not to determine whether the production of English [l] and [n] by these groups of speakers would improve after perception training. The production of English [l] and [n] by all three groups of Chinese speakers before and after the perception training was, therefore, not recorded. Nonetheless, six native Xinan speakers were recorded producing the 13 minimal pairs of English /l/ and /n/ contrasts used in the perception experiment. It was decided that their production and that of the five native English speakers would be acoustically analyzed to see (a) which acoustic dimensions
differentiate English [l] and [n] among both English and Xinan speakers; and (b) how Xinan speakers' production of English [l] and [n] differs from or is similar to that of native English speakers.

Research Questions

Recall that in Chapter 1, two questions were asked regarding the production of English [l] and [n]:

Research question 5: What are the acoustic cues American English speakers use to distinguish [l] and [n] in their production?

Research question 6: Would Xinan speakers be able to distinguish English [l] and [n] in production after perceptual training?

As mentioned in Chapter 2, an investigation into the acoustic dimensions that serve to distinguish English [l] and [n] is currently lacking. As such, this study will be among the first to systematically document phonetic similarities and/or differences between English [l] and [n]. Additionally, acoustic analyses of English [l] and [n] produced by Xinan speakers may provide insight into why they have difficulty identifying these two non-native sounds. Furthermore, if Xinan speakers could not produce the difference between English [l] and [n], then one may conclude that improved perception after four days of perceptual training did not result in improved production. These results would be inconsistent with the findings of some previous training studies and with theories of speech perception that propose a close link between speech perception and production.

The chapter is organized as follows. First, the design of the production experiment will be described and justified. Next, results will be presented and discussed for each acoustic dimension examined. In this section, AE speakers' production will be examined to test the prediction that they will distinguish [l] and [n] across a series of acoustic dimensions: not only in commonly used ones, but also in "new" dimensions
proposed in this study. Next, Xinan speakers’ production will be measured to test the prediction that they may show some degree of distinction between English [l] and [n] in certain acoustic dimensions, the number of which may be smaller than that of AE speakers. AE and Xinan speakers will then be compared to examine in which dimensions Xinan speakers were similar to or different from native speakers. Finally, implications for the relationship between perception and production will be discussed based on what was found in the study.

Experiment Design

Speakers

There were eleven speakers altogether in this production experiment. Five were the same AE speakers (AE 1-5) who produced the stimuli used in the perception experiment. The other six were Xinan speakers (2 females and 4 males), who were recorded after they completed the entire perception experiment.

Stimuli

The same 13 minimal pairs contrasting English /l/ and /n/ syllable-initially that were used in the perception test were recorded for the acoustic analysis. Xinan speakers read the same wordlist as the AE speakers did. Altogether 264 tokens (11 talkers x 12 pairs of words x 2 repetitions) were recorded using the same equipment and procedures as described in the section on experiment design of the perception experiment. However, unlike the stimuli used in the perception experiment, in which perceptual loudness needed to be controlled, these stimuli were not normalized for peak intensity.

Acoustic Measurement

As discussed in Chapter 2, based on the ways English [l] and [n] are produced, eight acoustical dimensions were proposed to serve as cues to the distinction between [l]
and [n]. As reviewed in Chapter 2, three of the eight dimensions have traditionally been used to compare [l] and [n]. They are the duration of the CV transition and the frequency, bandwidth and intensity of the consonant formants. One dimension, the spectral features of the glottal pulses around the constriction release, has previously been used to analyze [l] and [n] separately, but has not been used to compare the two sounds. The remaining four dimensions have not been used to investigate the differences between [l] and [n] in previous literature. They are the duration of the consonant, the rate of mean intensity change during the CV transition, the spectral moments of the consonant and the nasalization of following vowels. Each dimension was calculated using the PRAAT software developed at the University of Amsterdam (Boersma & Weenink, 2004).

1. Duration of the consonant from the onset of the consonant to the release of the consonant (vowel onset), as shown in Figure 5-1. This is a waveform generated by PRAAT for the English word “Lee” produced by a female AE speaker (the dotted vertical lines represent glottal pulses). The total duration of the word is 294.85 milliseconds (ms).

2. Duration of the CV transition from the release of the consonant to the point when the second formant of the vowel becomes steady, as shown in Figure 5-1 and Figure 5-2.

3. First three formants (F1, F2, F3) and their intensities (I1, I2, I3) and bandwidths (B1, B2, B3) of the consonant, measured at the consonant midpoint. Figure 5-2 shows a broadband spectrogram generated by PRAAT. This is the English word “Lee” produced by a female AE speaker. The dotted lines track how the formants change. The plain line represents the mean intensity trend over the 288.84 ms shown here. The vertical spike-like line at the onset of the CV transition indicates the release of the constriction of the lateral. The intensity of a formant at a specified point is calculated in the following way.
First, a 30-ms Gaussian window was centered at the target time point. For example, when measuring the consonantal formant intensities, the target time would be the midpoint of the steady state of the consonant. Second, an FFT spectrum was generated for the windowed selection. PRAAT calculates the spectrum based on the center half of the selection, and the edges are largely ignored. Third, the spectrum was converted into an LTAS (long-term average spectrum) using the 1-to-1 setting to avoid loss of frequency resolution. An LTAS is the averaged intensity or amplitude spectrum across a selected frequency range for continuous speech, reflecting the contribution of the glottal source and the vocal tract to the voice quality; and the
best length for speech analysis is 30 ms (Nordemberg and Sundberg 2003). As a last step, the intensity was measured from the LTAS at the specified frequency range obtained earlier in this procedure.

4. The difference between the first two formants (F2-F1), measured as in the procedure listed in 3.

5. The rate of mean intensity change (mean intensity is represented by the plain line in Figure 5-2), calculated as the difference in intensity measured between the onset and the offset of the CV transition, divided by the duration between the two points.

6. Spectral moments, which are the Center of Gravity (COG), the Standard Deviation (STD), the Skewness, and the Kurtosis, measured at the midpoint of the consonant. Spectral analysis of all four moments was performed using PRAAT. A 30-ms Gaussian window was used, and the spectrum at the center point of the windowed selection was derived via Fast Fourier Transform (FFT). Before the spectrum was generated, the sound slice was fed through a Hanning bandpass filter. The present study adopts a fixed window using a filter of 100-1000 Hz to capture the low-frequency spectra.

7. Spectral analysis of three consecutive glottal pulses around the constriction release. The first and second formants and their rates of change were measured. The three consecutive glottal pulses are: (1) the one immediately before the constriction release (P1), (2) the one at the constriction release (P2), and (3) the one immediately after the constriction release (P3), as shown in Figure 5-3.

8. Nasalization of the vowel (V) following the consonant, measured at the midpoint of the vowel. Parameters measured are F1 frequency, F1 bandwidth and F1 intensity.

Statistical Analysis

Values obtained for all eight acoustic parameters measured were submitted to statistical analyses.
A repeated measures ANOVA was run with Consonant (2 levels: /n/, /l/) and Syllable Structure (2 levels: singleton, cluster) as within-subjects factors, and Language (2 levels: AE, Xinan) and Gender³ (2 levels: female, male) as between-subjects factors. All acoustic parameters measured were also analyzed with a Discriminant Function Analysis (DFA) (see Appendix I for details) to examine their relative degree of effectiveness in classifying sounds as being either /l/ or /n/.

³ Gender differences are expected in formant frequencies and bandwidths due to anatomical reasons.
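As an illustration of the fifth measurement described above, the rate of mean intensity change can be sketched as a simple slope: the intensity difference between the offset and the onset of the CV transition, divided by the time between them. The function name and the sample values below are hypothetical and are not taken from the study; the actual measurements were made in PRAAT.

```python
def intensity_change_rate(onset_db, offset_db, onset_s, offset_s):
    """Mean intensity change over the CV transition, in dB per second.

    onset_db / offset_db: intensity (dB) at the onset and offset of the transition.
    onset_s / offset_s:   the corresponding time points in seconds.
    """
    duration = offset_s - onset_s
    if duration <= 0:
        raise ValueError("the offset must be later than the onset")
    return (offset_db - onset_db) / duration


# Invented example: intensity rises by 6 dB over a 25-ms transition.
rate = intensity_change_rate(62.0, 68.0, 0.120, 0.145)
print(f"{rate:.1f} dB/s")
```

A positive value indicates that intensity rises across the transition; a larger magnitude indicates a more abrupt change.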
Results and Discussion

In this section, acoustic dimensions are discussed one by one. For each dimension, AE and Xinan speakers’ production will be discussed separately, followed by a comparison between the two.

Duration of [n] and [l]

As mentioned above, the duration of the consonant was measured from the formation of the constriction of the consonant to its release. Figure 5-4 shows the mean duration of English [l] and [n] in milliseconds in two syllable structures (NP1: singleton, NP2: cluster) produced by all AE and Xinan speakers.

For AE speakers, as shown in this figure, both [l] and [n] are much longer in word-initial singleton positions than in word-initial cluster positions (123.53 ms vs. 47.46 ms for [l] and 147.28 ms vs. 41.61 ms for [n]). Additionally, [l] is shorter than [n] in singletons, but the opposite is true in clusters for AE speakers. These observations were tested by submitting the data to a repeated measures ANOVA with Consonant and Syllable Structure as the within-subjects factors. Specifically, the analysis yielded a significant main effect of Syllable Structure [F (1, 63) = 29.957, p < 0.001]. Results of the follow-up paired-samples t-test analyses revealed that both [n] and [l] were longer in singletons than in clusters [[n]: t = 8.568, df = 23, p < 0.001; [l]: t = 6.278, df = 21, p < 0.001]. However, no difference was found between the two sounds either in singletons [t = 1.002, df = 21, p = 0.328] or in clusters [t = -0.584, df = 23, p = 0.565]. Additionally, the Discriminant Function Analysis (DFA) failed to show that consonantal duration could effectively classify the stimuli into /n/ and /l/ categories.
Similar to the AE data just reported, as illustrated in Figure 5-4, English [l] and [n] are produced with a shorter duration in clusters than in singletons by Xinan speakers (106.17 ms vs. 80.42 ms for [l] and 128.73 ms vs. 78.31 ms for [n]). However, unlike for AE speakers, the difference between the two positions did not appear to be as great. Similar to AE speakers, [l] appeared to be slightly shorter than [n] in singletons, while the opposite was true in clusters. A repeated measures ANOVA with Consonant and Syllable Structure as the within-subjects factors was run, and a significant main effect of Syllable Structure was found [F (1, 63) = 10.357, p < 0.001]. A paired-samples t-test analysis showed a significant difference for Syllable Structure in [n], in that the duration of [n] was longer in singletons than in clusters [t = 4.627, df = 20, p < 0.001]. No significant differences were found between [l] and [n] either in singletons or in clusters.

When the production of AE speakers was compared with that of Xinan speakers, it was found that duration was longer for AE speakers in clusters, while shorter in singletons. Table 5-1 lists the difference in mean durations between Xinan and AE speakers in each syllable structure for both [l] and [n]. Except for [n] in singletons, all productions by Xinan speakers were different from those by AE speakers. Another repeated measures ANOVA with Language (2 levels: English, Xinan) as the between-subjects factor and Consonant (2 levels: [l], [n]) and Syllable Position (2 levels: singleton, cluster) as the within-subjects factors was performed, and the results are reported in Table 5-1. These results suggest that Xinan speakers do not differentiate [l] and [n] in English in terms of consonantal duration, and that their production is far from native-like.
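The paired-samples t-tests reported in this section compare two measurements taken from the same set of items, which is why the degrees of freedom equal the number of pairs minus one. A minimal sketch of the statistic follows. The function name and the duration values are invented for illustration and are not data from this study, whose analyses were run in a statistics package.

```python
import math


def paired_t(first, second):
    """Return (t, df) for a paired-samples t-test on two equal-length lists."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Unbiased sample variance of the pairwise differences.
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    if var_diff == 0:
        raise ValueError("the differences have zero variance")
    std_error = math.sqrt(var_diff / n)
    return mean_diff / std_error, n - 1


# Invented durations (ms) for the same four items in singleton and cluster position.
singleton_ms = [120.0, 135.0, 110.0, 142.0]
cluster_ms = [50.0, 48.0, 45.0, 60.0]
t_value, df = paired_t(singleton_ms, cluster_ms)
print(f"t = {t_value:.3f}, df = {df}")
```

A positive t here means that the first condition (singleton) has the longer mean duration; the p value would then be read from the t distribution with the returned degrees of freedom.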
Duration of the CV Transition

Duration of the CV transition was measured from the release of the consonant to the point when the second formant of the vowel became steady. Figure 5-5 illustrates the CV durations for Xinan and AE speakers. For AE speakers, as shown in this figure, the CV transition of [l] is longer than that of [n] in both singletons ([l]: 25.23 ms, [n]: 20.23 ms) and clusters ([l]: 37.65 ms, [n]: 20.34 ms). Furthermore, the CV transition of [l] is shorter in singletons than in clusters, whereas there is almost no difference between the syllable structures for [n].

Results of a repeated measures ANOVA for the AE data indicated that the CV transition for [l] was longer than that for [n] in singletons [F (1,101) = 6.422, p = 0.013] and in clusters [F (1,88) = 13.836, p < 0.001]. O’Connor et al. (1957) found that transition duration is an important factor for distinguishing [l] and [n]. These researchers found that if CV transitions were too brief, listeners confused [l] with [n]. Specifically, they reported that when the CV transition duration of [l] was as brief as 30 ms, it was confused with nasals, whereas when the transition was between 60 and 70 ms, the confusion disappeared. The results found here are partially in agreement with O’Connor et al.’s finding: [l] had a significantly longer CV transition duration than [n], even though the AE speakers’ durations were much shorter than what O’Connor et al. reported. CV transition is thus confirmed to be a parameter that distinguishes [l] from [n] in American English.

Such a distinction was not found in the Xinan data. As illustrated in Figure 5-5, Xinan speakers exhibited the opposite pattern: the CV transition of [n] was only slightly longer than that of [l] in singletons ([l]: 18.87 ms, [n]: 19.86 ms) and clusters ([l]: 21.89 ms, [n]: 22.65 ms). The DFA revealed that CV transition as a classifying predictor could successfully account for 88.9% of the cases of [n] and 37.5% of the cases of [l] in singletons
[Eigenvalue = 0.64, Wilks’ Lambda = 0.94, p < 0.013]; and 89.5% of the cases of /n/ and 69.6% of the cases of /l/ in clusters [Eigenvalue = 0.169, Wilks’ Lambda = 0.856, p < 0.001].

In order to compare the duration of the CV transition for AE and Xinan speakers in both syllable positions, a repeated measures ANOVA was performed with Language (2 levels: AE, Xinan) as the between-subjects factor and Syllable Structure (2 levels: singleton, cluster) as the within-subjects factor for both [l] and [n]. A significant main effect of the factor Language was found for [l] in both singletons [F (1, 173) = 15.110, p < 0.001] and clusters [F (1, 89) = 13.158, p < 0.001]: the duration was longer for the AE speakers than for the Xinan speakers. Such a difference was not found for [n]. That is, for Xinan speakers the duration of the CV transition for English [n] fell within the native range, whereas that for [l] did not.

Spectral Properties of English [n] and [l]

A series of spectral properties of English [l] and [n] were measured. They included formant frequencies (F1, F2, F3), their intensities (I1, I2, I3), and their bandwidths (B1, B2, B3), measured at the midpoint of the consonant.

Frequency of the first three formants (F1, F2, F3) and difference between F2 and F1

Table 5-2 shows the mean formant frequencies for both female and male AE speakers. As shown in the table, F1 of [l] is higher than that of [n], whereas F2 of [l] is lower than that of [n]. The difference between F2 and F1 is larger for [n] than for [l]. These differences hold for both female and male speakers.

A repeated measures ANOVA was run with Consonant (2 levels: [l] vs. [n]) and Syllable Structure (2 levels: singleton vs. cluster) as the within-subjects factors for female and male speakers separately. It was found that compared with [l], [n] in singletons has a lower F1 [female: F (1,25) = 29.290, p < 0.001, male: F (1,23) = 10.156, p = 0.004], a
higher F2 [female: F (1,25) = 15.270, p < 0.001, male: F (1,23) = 5.376, p = 0.03], and a greater difference between F2 and F1 [female: F (1,25) = 28.287, p = 0.001, male: F (1,23) = 9.050, p = 0.006], shown in Figure 5-4. In clusters, except for F2 of the male speakers, [n] shows the same pattern as in the singleton structure: lower F1 [female: F (1,19) = 8.780, p = 0.008, male: F (1,20) = 9.702, p = 0.006], higher F2 [female: F (1,19) = 23.308, p = 0.001], and a greater difference between F2 and F1 [female: F (1,19) = 32.774, p < 0.001, male: F (1,20) = 4.906, p = 0.039].

The DFA revealed that F1, F3 and the difference between F1 and F2 are the three predictors that can accurately place sounds into the two categories, [l] and [n]. The percentage of cases that can be accounted for is very high: 97.9% for /n/ and 84.1% for /l/ (Eigenvalue = 1.803, Wilks’ Lambda = 0.357, df = 3, p < 0.001).

For Xinan speakers, patterns similar to those of AE speakers were found. As shown in Table 5-3, F1 of [l] is higher than that of [n], whereas F2 of [l] is lower than that of [n]. The difference between these two formants is larger for [n]. A repeated measures ANOVA with Consonant as the within-subjects factor, however, did not find a significant main effect of Consonant. That is, although Xinan speakers’ formant frequencies of [l] and [n] were different, their productions of the two consonants did not differ significantly from each other in the way that those of native speakers did.

The difference between the AE and Xinan groups in general is evident only in singleton structures. A repeated measures ANOVA with Language as the between-subjects factor and Consonant as the within-subjects factor revealed no significant main effects of the factors for any of the formants in clusters. A repeated measures ANOVA with Syllable Structure as the within-subjects factor was run for both [l] and [n] in singletons. It was
found that, for [n], F1 and F2 of the Xinan group are higher than those of the AE group [F (1,95) = 31.049, p < 0.0001; F (1, 95) = 5.358, p = 0.023]; for [l], F2 and the difference between F2 and F1 of the Xinan group are higher than those of the AE group [F (1, 74) = 48.003, p < 0.0001; F (1, 74) = 45.902, p < 0.0001]. The general patterns of F1-F2-F3 for both females and males in the Xinan group are almost identical between [l] and [n], as illustrated in Figure 5-6 and Figure 5-7. Both consonants exhibit a pattern that is more similar to that of the [n] of the AE group.

F1 is usually taken to correlate with the openness of the vocal tract during articulation (Stevens, 1998). In liquids, F2-F1 could also be an indication of change in the front/back dimension. Figure 5-8 illustrates the F2-F1 difference for both AE and Xinan speakers. As demonstrated in this figure and confirmed by a repeated measures ANOVA with Consonant (2 levels: [l], [n]) as the within-subjects factor, F2-F1 of [l] of the AE group is smaller than that of [n] [F (1, 74) = 15.06, p < 0.001]. This suggests that AE speakers’ [n] is more front than their [l], although both are classified as alveolar. As also shown in Figure 5-8 and confirmed by the repeated measures ANOVA with Language (2 levels: AE, Xinan) as the between-subjects factor for both [l] and [n], F2-F1 of [l] produced by the Xinan group is larger than that of the AE speakers [F (1, 74) = 12.32, p < 0.001], suggesting that Xinan speakers’ [l] is more forward than that of the AE speakers. Furthermore, the difference between F2 and F1 in [n] showed a marginal main effect of the factor Language [F (1,74) = 3.81, p = 0.051]. In addition, while the AE group showed a great difference between [l] and [n], such a difference was not found in the Xinan group.
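The F2-F1 measure discussed above can be illustrated with the AE singleton means listed in Table 5-2. The sketch below simply subtracts the mean F1 from the mean F2 for each group; the dictionary layout and function name are illustrative choices, and subtracting group means is assumed here to approximate the tabulated F2-F1 values only to within rounding.

```python
# Mean F1 and F2 (Hz) of the AE singleton consonants, copied from Table 5-2.
AE_MEANS = {
    ("female", "n"): (291.32, 1565.1),
    ("female", "l"): (408.34, 1138.9),
    ("male", "n"): (306.35, 1676.3),
    ("male", "l"): (398.37, 1181.4),
}


def f2_minus_f1(f1_hz, f2_hz):
    """Distance between the second and first formant in Hz."""
    return f2_hz - f1_hz


differences = {key: f2_minus_f1(f1, f2) for key, (f1, f2) in AE_MEANS.items()}

for (gender, consonant), diff in differences.items():
    print(f"{gender} [{consonant}]: F2-F1 = {diff:.1f} Hz")
```

For both genders the value for [n] comes out well above the value for [l], which is the pattern the ANOVA above tests.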
Bandwidth of the first three formants

This study found that the [n] of the AE group exhibited a sharp increase in F2 bandwidth compared to F1 bandwidth, and a similar increase, yet to a lesser degree for
[l]’s bandwidth, was identified, as shown in Table 5-4. A repeated measures ANOVA with Consonant as the within-subjects factor for female and male speakers confirmed that these increases were significant (Female AE: F (1,25) = 25.328, p < 0.001; Male AE: F (1,23) = 7.237, p = 0.013).

A repeated measures ANOVA with Consonant as the within-subjects factor revealed that the production of /n/-words by Xinan speakers exhibited an increase in F2 bandwidth compared to F1 bandwidth, similar to AE speakers. A similar increase was also identified for [l] (Female Xinan: F (1,83) = 5.288, p = 0.024; Male Xinan: F (1,37) = 9.808, p = 0.003); yet, as with the AE speakers, [l]’s increase in bandwidth was not as large as [n]’s. Details are listed in Table 5-5.

To compare AE and Xinan speakers’ production, a repeated measures ANOVA was run with Language as the between-subjects factor and Consonant as the within-subjects factor for both female and male speakers. It revealed that only for the bandwidth of the first formant was there a significant difference between the two groups [female [n]: F (1,64) = 9.982, p < 0.002; female [l]: no significance; male [n]: F (1,31) = 7.758, p < 0.009; male [l]: F (1,29) = 8.489, p < 0.007].

Previous studies have found that one spectral characteristic of [n] is highly damped formants with wider bandwidths compared with oral consonants (Fant 1960, Kurowski and Blumstein 1984, Stevens 1998). For a similar articulatory reason, when the tongue is in the [l] configuration, there is also an increased bandwidth and decreased prominence of F2, which was confirmed by the current study.

Intensity of the first three formants

Table 5-6 shows the mean intensity of the formants produced by AE speakers. As shown in the table, intensities of the first three formants decrease as frequencies increase. Generally
speaking, [l] had a more prominent F1 and F2 than [n]. A repeated measures ANOVA with Consonant as the within-subjects factor confirmed that there was a significant difference between [l] and [n] for the intensity of F2 [F (1, 41) = 15.235, p < 0.001]. This finding is in agreement with the change of bandwidth found in the previous section.

The same procedure used for the AE group was adopted to analyze formant intensity for the Xinan group; that is, the intensity of all three formants at the steady state during the consonant was measured, and the results are presented in Table 5-7. As shown in this table, the intensity of [l] is greater than that of [n] for all three formants. A repeated measures ANOVA with Consonant as the within-subjects factor revealed significant differences between [l] and [n] in the intensities of F1 and F2 [F (1, 40) = 8.325, p < 0.001; F (1, 40) = 12.781, p < 0.001].

A comparison of the production of AE and Xinan speakers suggests that the Xinan speakers exhibited a pattern in intensity similar to that of the AE group for [n] (as shown in Figure 5-9), which was confirmed by a one-way ANOVA with Language as the between-subjects factor for the intensities of the three formants [F (1, 40) = 2.84, p = 0.10]. The intensity dropped sharply for F2, and the decrease was larger than that of [l]. The Xinan group’s [l] pattern was more similar to the [n] intensity pattern than to the AE [l] pattern. Generally speaking, [l] had a more prominent F1 and F2 than [n].

Rate of Mean Intensity Change

For singletons, a repeated measures ANOVA with Consonant as the within-subjects factor revealed that neither AE nor Xinan speakers showed differences between [l] and [n] in the rate of mean intensity change. To compare the two groups according to syllable structure, a repeated measures ANOVA with Language as the between-subjects factor was run for each syllable structure for both [l] and [n]. The
statistical results are presented in Table 5-8. As shown in this table, for clusters, both AE and Xinan speakers showed significant differences, but with different patterns. Figure 5-10 suggests that the Xinan speakers had a greater rate for [n] than for [l], whereas the AE speakers showed the opposite pattern. Such a difference is only evident for [n].

Spectral Moments of the Consonant

Four spectral moments were calculated: the Center of Gravity (COG), the Standard Deviation (STD), Skewness, and Kurtosis. Values of the four moments are presented separately in Figure 5-11, Figure 5-12, Figure 5-13 and Figure 5-14.

A repeated measures ANOVA with Consonant as the within-subjects factor in the AE group showed an effect of Consonant in three of the four moments: COG [F (1, 110) = 12.019, p < 0.001], STD [F (1, 110) = 8.201, p = 0.03], and Skewness [F (1,110) = 20.576, p < 0.001]. As shown in Figure 5-11, [n] had a lower center of gravity, at around 250 Hz, than [l], which had energy centered around 350 Hz. Though both had low-frequency prominence, [n] demonstrated an energy corresponding to a nasal murmur. As illustrated in Figure 5-12, [n] also had a smaller Standard Deviation, which means that its energy is more closely distributed around the center, within a range of about 70 Hz. [l] had prominence spread out within a range of about 130 Hz around the center. Predictably, the distribution of the energy of [n] was more left-skewed than that of [l], as its center of gravity was positioned at the lower frequency end of the spectrum, the average value of which is illustrated in Figure 5-14. As demonstrated in Figure 5-13, [n] also had a greater Kurtosis value than [l], which suggests that both consonants had clearly defined peaks in the spectrum. This is not unexpected, since both consonants involve a bifurcation of the vocal cavity and had antiformants in the resonance.
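For readers unfamiliar with spectral moments, the following sketch shows how the four values can be derived from a sampled spectrum by treating it as a distribution over frequency. It assumes evenly spaced frequency bins and weights each bin by the squared magnitude (power 2); whether the PRAAT settings used in this study correspond exactly to these choices is an assumption, and the function name and sample spectrum are illustrative only.

```python
import math


def spectral_moments(freqs_hz, magnitudes, power=2.0):
    """Return (COG, SD, skewness, excess kurtosis) of a sampled spectrum.

    freqs_hz:   bin frequencies in Hz (assumed evenly spaced).
    magnitudes: spectral magnitude at each bin.
    power:      exponent applied to the magnitudes to form the weights.
    """
    weights = [m ** power for m in magnitudes]
    total = sum(weights)
    if total <= 0:
        raise ValueError("the spectrum has no energy")
    # First moment: the weighted mean frequency (center of gravity).
    cog = sum(f * w for f, w in zip(freqs_hz, weights)) / total

    def central_moment(order):
        return sum(((f - cog) ** order) * w for f, w in zip(freqs_hz, weights)) / total

    mu2 = central_moment(2)
    if mu2 == 0:
        raise ValueError("the spectrum has zero spread")
    sd = math.sqrt(mu2)
    skewness = central_moment(3) / mu2 ** 1.5
    kurtosis = central_moment(4) / mu2 ** 2 - 3.0
    return cog, sd, skewness, kurtosis


# Illustrative symmetric spectrum with its peak at 300 Hz.
cog, sd, skew, kurt = spectral_moments([200.0, 300.0, 400.0], [1.0, 2.0, 1.0])
print(f"COG = {cog:.1f} Hz, SD = {sd:.1f} Hz")
```

In this formulation a spectrum whose energy is concentrated tightly around its peak yields a small standard deviation, and a symmetric spectrum yields a skewness of zero.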
In order to test the distinguishing function of the moments, values were submitted to the Discriminant Function Analysis (DFA). Of the four moments, COG and Skewness satisfied the criteria and were entered in the analysis (DF coefficients are 0.710 and -0.425). Overall, 78.8% of the grouped cases were correctly classified using these two predictors. Specifically, the classification accuracy is 83.9% for /n/ and 72.9% for /l/ (Wilks’ Lambda = 0.620, df = 2, p < 0.0001). The Eigenvalue for the function is 0.613 and accounts for 100% of the variance.

For the Xinan group, [n] has a center of gravity at 325 Hz, which is lower than that of [l], which is at 390 Hz. The Standard Deviation is, however, the same for both consonants, which means that both spread their energy within a range of about 100 Hz. Correspondingly, the Skewness for [n] is larger than that of [l], suggesting a distribution that is more left-skewed for [n] than for [l]. The Kurtosis value is similar for the two consonants.

For the Xinan group, Skewness and Kurtosis satisfied the criteria and were entered in the analysis (DF coefficients are 1.078 and -0.614). Overall, 67.0% of the grouped cases were correctly classified using the two predictors. Specifically, [n] was better classified, with an accuracy rate of 88.1%, whereas only 39.6% of the /l/ cases were correctly classified (Wilks’ Lambda = 0.903, df = 2, p < 0.001; the Eigenvalue for the function is 0.108, accounting for 100% of the variance).

Analysis of the Constriction Release

The three consecutive glottal pulses investigated were the pulse immediately before the consonantal release, the pulse at the release, and the pulse immediately after the release.
Track of the F1 and F2 frequency

Overall, F1 and F2 for both consonants increase as the articulation moves from the consonant to the vowel. For the AE group, [l] had a higher F1 and a lower F2 than [n] at all three points. These values are listed in Table 5-9. Value differences between every two consecutive points were calculated to examine the extent of frequency change. As shown in Table 5-9, generally, F1 increased along with the movement from the release into the CV transition, and F2 increased before the release and decreased after it. The extent of increase from before the release to the release point was greater for [l] than for [n]. The change from the release to the CV transition had the reverse pattern: that of [n] was greater than that of [l]. The second-phase increase was sharper than the first-phase increase for [n]. The increase of F2 in the first phase was larger than that of F1 in the same phase.

For Xinan speakers, F1 and F2 for both consonants increased as the articulation moved from the consonant to the vowel. Results showed an interesting pattern for the Xinan group’s production. The tracks of F1, F2, and F2-F1 were almost identical for the Xinan [l] and [n], and they looked more similar to those of the AE [l]. The similarities are illustrated more clearly in Figure 5-15, Figure 5-16, and Figure 5-17, respectively.

Track of the F2 intensity

The prominence of F2 increased from immediately before to after the release of the constriction. No difference was found between [l] and [n], nor between AE and Xinan speakers (report stats). Figure 5-18 illustrates the intensity of F2 for both AE and Xinan speakers. Overall, [l] had a stronger prominence than [n] for both AE and Xinan speakers [F (1, 74) = 6.21, p = 0.022; F (1, 95) = 7.21, p = 0.01].
A two-dimensional spectrum was calculated to capture antiformants, which are shown in Figures 5-19 through 5-22. The spectra were calculated from narrowband spectrograms of “Lee” and “knee” produced by a male speaker, AE 6 (Figure 5-19 and Figure 5-20), and a male speaker, Xinan 6 (Figure 5-21 and Figure 5-22). In the figures, the plain line represents the pulse before the release. The dashed line is the spectrum at the release. The dotted line is taken at the pulse after the release.

As can be seen in Figure 5-19, the intensity of the formants of [l] produced by AE 6 rises at the release and afterwards. Antiformants are identified before the release (the plain line) at around 1500 Hz and below 3500 Hz, and an antiformant occurs below 1500 Hz at the release (the dashed line).

Figure 5-20 shows the spectra of “knee”. More antiformants are observed in this spectrum. They occur both before and after the release in the low frequency ranges around 500-800 Hz and 1500-2000 Hz, and also in the higher frequency range of 3000-4000 Hz.

Figure 5-21 shows the spectra of “Lee” produced by the speaker Xinan 6. Antiformants are identified in the frequency ranges of 1500-2000 Hz and 3000-4000 Hz. Figure 5-22 shows the spectra of “knee” produced by the same speaker. Antiformants are observed around 1500 Hz and 3500 Hz.

Vowel Nasalization

Nasals are produced with the velum lowered. After the release of the oral constriction of a nasal, the relatively slow-moving velum spreads a “nasal murmur” onto the following vowel, causing nasality of the vowel. The interplay between the nasal and oral cavities results in lower formants with an increased bandwidth and damped amplitude compared to an oral vowel (Johnson 2001).
Neither an ANOVA nor the DFA found any statistical significance in the spectral correlates of nasalization measured for [l] and [n], nor between the AE and Xinan speakers. The results suggest that correlates of nasality may not be an effective acoustic cue for the classification of /l/ and /n/. Such a similarity may be due to an articulatory reason: [l] is formed by a central constriction at around the alveolar ridge and a curling up of the tongue side(s) toward the palate; upon releasing [l] into a vowel-like position, the tongue’s central body and its sides need to move into the same articulation position. Such coordination may take time and serve a function similar to that of the velum in nasals.

Discussion

In summary, this chapter examined eight series of acoustical parameters in order to find out which of them serve as effective cues to the distinction between English [l] and [n]. It was found that AE speakers showed differences between [l] and [n] in all dimensions but vowel nasalization. Table 5-10 reports the dimensions in which AE and Xinan speakers demonstrated differences between [l] and [n].

The Xinan speakers, however, predictably showed differences in far fewer of these dimensions. Moreover, this chapter demonstrated that certain acoustical parameters can be used to predict the classification of /l/ and /n/ (e.g., duration of the CV transition, frequencies of F1 and F2, bandwidth and intensity of F1 and F2, spectral moments, and spectral properties of glottal pulses at the constriction release).

With regard to the implications for the relationship between speech perception and production, or to answer the question of whether perceptual training results in improved production, the results of this study do not support any conclusive claims.
As shown in previous sections, Xinan speakers could not distinguish [l] and [n] in these acoustic dimensions, even in a close-to-native-like fashion, after having significantly improved in
the domain of perception. This discrepancy may be accounted for from several perspectives. First, as reviewed in Chapter 3, perception and production are related to each other through articulatory gestures; inaccurate perception of articulation may therefore result in inaccurate configuration of the articulators in production. Secondly, as discussed in previous chapters, it may take time for L2 learners to apply perception learning to production; therefore, production immediately after training may not reflect transfer from perceptual training. Moreover, the limited number of differences between Xinan speakers’ [l] and [n] cannot be taken as evidence that perceptual training leads to production improvement, because there are no data from before training to compare with. It can only be safely concluded that Xinan speakers cannot distinguish English [l] and [n] in production after perceptual improvement.

[Figure: waveform; x-axis: Time (s), 0 to 0.294853; amplitude range: -0.1202 to 0.1145; labeled intervals: onset of the consonant, duration of the consonant [l], onset of the vowel, CV transition, duration of the steady state of the vowel [i]]
Figure 5-1. Waveform of the word “Lee” produced by the speaker AE-5.
Figure 5-2. Broadband spectrogram of “Lee” produced by AE-5.

Figure 5-3. Waveform of “Lee” produced by AE-5.
Figure 5-4. Mean consonantal duration in two syllable structures by AE and Xinan speakers. (Bar chart; y-axis: consonantal duration (ms), 0 to 160; groups: AE, Xinan; series: [l] in cluster, [l] in singleton, [n] in cluster, [n] in singleton.)

Table 5-1. Mean comparison of consonantal duration produced by Xinan and AE speakers.
                    XINAN-AE Mean (ms)   df      F        Sig.
[n] in singleton    -10.240              1, 49   0.322    0.573
[n] in cluster      -50.601              1, 42   9.819    0.003
[l] in singleton     63.751              1, 44   11.198   0.002
[l] in cluster       31.173              1, 66   17.608   0.000
Figure 5-5. Mean CV duration in two syllable structures by Xinan and AE speakers. (Bar chart; y-axis: CV duration (ms), 0 to 40; groups: Xinan, AE; series: [n] in cluster, [n] in singleton, [l] in cluster, [l] in singleton.)

Table 5-2. Formant frequency of the consonant in syllable singleton for all AE speakers.
          Female /n/                 Female /l/
          Mean (Hz)    Std. Error    Mean (Hz)    Std. Error
F1        291.32       12.32         408.34       18.53
F2        1565.1       75.59         1138.9       41.72
F2-F1     1273.7       68.56         730.6        50.54
          Male /n/                   Male /l/
          Mean (Hz)    Std. Error    Mean (Hz)    Std. Error
F1        306.35       9.86          398.37       27.14
F2        1676.3       106.7         1181.4       184.85
F2-F1     1369.9       103.41        783.05       165.43

Table 5-3. Formant frequency of the consonant in syllable singleton for Xinan female and male speakers.
          Female /n/ (Hz)   Std. Error   Female /l/ (Hz)   Std. Error
F1        449.56            9.50         485.88            21.42
F2        1923.84           56.85        1926.19           50.30
F2-F1     1474.28           54.26        1440.31           59.45
          Male /n/ (Hz)     Std. Error   Male /l/ (Hz)     Std. Error
F1        280.72            7.05         328.89            20.03
F2        1512.52           51.86        1564.69           17.22
F2-F1     1231.80           52.36        1235.79           27.32
Figure 5-6. Mean formant frequencies for AE and Xinan female speakers. (Chart; x-axis: F1, F2, F3; y-axis: frequency, 0 to 3200; series: Xinan-n, Xinan-l, AE-n, AE-l.)

Figure 5-7. Mean formant frequencies for AE and Xinan male speakers. (Chart; x-axis: F1, F2, F3; y-axis: frequency, 0 to 3000; series: Xinan-n, Xinan-l, AE-n, AE-l.)
Figure 5-8. Mean values for F2-F1 for both AE and Xinan speakers. (Chart; y-axis: difference between F2 and F1 (Hz), 600 to 1600; groups: XINAN, AE; series: female-n, female-l, male-n, male-l.)

Table 5-4. Bandwidth of F1, F2, F3 of AE speakers.
        Female /n/                 Female /l/
        Mean (Hz)    Std. Error    Mean (Hz)    Std. Error
B1      136.12       17.01         155.92       26.33
B2      804.26       64.11         275.97       78.01
B3      510.19       87.28         868.13       185.84
        Male /n/                   Male /l/
        Mean (Hz)    Std. Error    Mean (Hz)    Std. Error
B1      281.32       41.58         189.27       39.01
B2      954.69       145.87        432.87       127.85
B3      436.43       218.68        344.97       73.41

Table 5-5. Bandwidth of F1, F2, F3 of Xinan speakers.
        Female /n/                 Female /l/
        Mean (Hz)    Std. Error    Mean (Hz)    Std. Error
B1      252.62       21.02         213.63       19.51
B2      634.16       102.98        310.54       86.76
B3      654.80       127.11        899.53       123.18
        Male /n/                   Male /l/
        Mean (Hz)    Std. Error    Mean (Hz)    Std. Error
B1      165.00       20.78         93.64        7.30
B2      623.90       99.54         276.26       37.79
B3      424.62       63.25         397.33       61.956

Table 5-6. Mean intensities of individual formants of AE speakers.
        Intensity of F1 (dB)   Std. Error   Intensity of F2 (dB)   Std. Error   Intensity of F3 (dB)   Std. Error
AE-n    19.00                  2.82         -5.77                  1.99         -0.38                  1.92
AE-l    22.84                  2.95         12.75                  3.32         -4.67                  2.60
Table 5-7. Mean intensities of individual formants by Xinan speakers.
           I1 (dB)   Std. Error   I2 (dB)   Std. Error   I3 (dB)   Std. Error
XINAN-n    5.98      3.53         -10.22    3.21         -2.61     2.36
XINAN-l    17.88     3.08         3.56      2.24         2.15      2.07

Figure 5-9. Intensity of the first three formants for AE and Xinan speakers. (Chart; x-axis: I1, I2, I3; y-axis: mean intensity (dB), -15 to 25; series: AE-n, CN-n, AE-l, CN-l.)

Table 5-8. Rate of intensity change from right before the release into the vowel.
Syllable    Consonant   Group   Mean (dB/ms)   Std. Error   F-statistics
Singleton   [n]         Xinan   40.86          10.14        F (1,88) = 4.010, p < 0.05
                        AE      72.64          9.57
            [l]         Xinan   25.26          12.64        No significance
                        AE      53.06          22.97
Cluster     [n]         Xinan   163.24         24.14        F (1,38) = 6.011, p < 0.019
                        AE      98.43          9.44
            [l]         Xinan   92.31          23.89        No significance
                        AE      126.06         9.12

Figure 5-10. Comparison of the rate of intensity change between AE and Xinan speakers (n-s: /n/ in singleton, n-c: /n/ in cluster, l-s: /l/ in singleton, l-c: /l/ in cluster). (Chart; y-axis: mean intensity ratio (dB/ms), 0 to 200; series: Xinan, AE.)
Figure 5-11. Center of Gravity for AE and Xinan speakers. (Bar chart; y-axis: COG (Hz), 0 to 500; groups: female, male; series: Xinan-n, Xinan-l, AE-n, AE-l.)

Figure 5-12. Standard Deviation for AE and Xinan speakers. (Bar chart; y-axis: Std. Deviation (Hz), 0 to 180; groups: female, male; series: Xinan-n, Xinan-l, AE-n, AE-l.)
Figure 5-13. Kurtosis for AE and Xinan speakers. (Bar chart; y-axis: kurtosis, 0 to 250; groups: female, male; series: Xinan-n, Xinan-l, AE-n, AE-l; bar labels as printed: 9.21, 67.37, 12.87, 88.04, 149.15, 56.29, 3.61, 5.69.)

Figure 5-14. Skewness for AE speakers and Xinan speakers. (Bar chart; y-axis: skewness, 0 to 5; groups: female, male; series: Xinan-n, Xinan-l, AE-n, AE-l.)
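The four spectral moments plotted in Figures 5-11 to 5-14 can be illustrated with a minimal sketch. It assumes the common definition in which the power spectrum is normalized and treated as a probability distribution over frequency. The function name, the toy spectrum, and the choice to report excess kurtosis (subtracting 3) are assumptions made for illustration; they are not taken from the analysis reported in this chapter.

```python
import math


def spectral_moments(freqs_hz, power):
    """Return (center of gravity, standard deviation, skewness, excess kurtosis).

    Assumption: the power values are normalized to sum to 1 and used as
    weights over the frequency bins, so the moments are those of a
    discrete distribution over frequency.
    """
    if len(freqs_hz) != len(power) or not freqs_hz:
        raise ValueError("freqs_hz and power must be non-empty and equal in length")
    total = sum(power)
    if total <= 0:
        raise ValueError("total power must be positive")

    weights = [p / total for p in power]

    # First moment: center of gravity (weighted mean frequency).
    cog = sum(w * f for w, f in zip(weights, freqs_hz))

    # Second central moment and its square root (spread around the COG).
    variance = sum(w * (f - cog) ** 2 for w, f in zip(weights, freqs_hz))
    sd = math.sqrt(variance)
    if sd == 0:
        raise ValueError("all power lies at one frequency; higher moments are undefined")

    # Third and fourth central moments, normalized by powers of the SD.
    skewness = sum(w * (f - cog) ** 3 for w, f in zip(weights, freqs_hz)) / sd ** 3
    kurtosis = sum(w * (f - cog) ** 4 for w, f in zip(weights, freqs_hz)) / sd ** 4 - 3.0

    return cog, sd, skewness, kurtosis


# Hypothetical symmetric toy spectrum (not data from this study).
toy_cog, toy_sd, toy_skew, toy_kurt = spectral_moments(
    [100.0, 200.0, 300.0], [1.0, 2.0, 1.0]
)
```

For a symmetric spectrum such as the toy one, the skewness is zero; energy concentrated in the lower frequencies with a tail toward higher frequencies yields positive skewness.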
Table 5-9. Frequencies of F1 and F2 measured at the three consecutive pulses, represented by A, B, and C, of AE speakers.
        AE-n (N = 29)              AE-l (N = 29)
        Mean       Std. Error      Mean       Std. Error
AF1     309.89     9.67            442.22     31.97
BF1     340.44     13.38           554.22     57.60
CF1     400.85     19.42           538.27     43.97
AF2     1797.59    51.77           1309.00    119.72
BF2     2005.25    36.62           1477.05    128.12
CF2     2020.43    47.45           1438.25    92.18

Figure 5-15. Comparison of F1 track by AE and Xinan speakers. (Chart; x-axis: AF1, BF1, CF1; y-axis: 200 to 600; series: Xinan-n, AE-n, Xinan-l, AE-l.)

Figure 5-16. Comparison of F2 track by AE and Xinan speakers. (Chart; x-axis: AF2, BF2, CF2; y-axis: 0 to 2500; series: Xinan-n, AE-n, Xinan-l, AE-l.)
Figure 5-17. Comparison of F2-F1 track by AE and Xinan speakers. (Chart; x-axis: AF2-1, BF2-1, CF2-1; y-axis: 0 to 2000; series: Xinan-n, AE-n, Xinan-l, AE-l.)

Figure 5-18. Comparison of F2 intensity track between AE and Xinan speakers. (Chart; x-axis: 1, 2, 3; y-axis: 0 to 25; series: Xinan-n, Xinan-l, AE-n, AE-l.)

Figure 5-19. Two-dimensional spectra of “Lee” by the speaker AE 5.
Figure 5-20. Two-dimensional spectra of “knee” produced by the speaker AE 5.

Figure 5-21. Two-dimensional spectra of “Lee” produced by the speaker Xinan 6.
Figure 5-22. Two-dimensional spectra of “knee” produced by the speaker Xinan 6.

Table 5-10. Acoustic dimensions in which [l] and [n] contrast in the production of AE and Xinan speakers.
     AE                           Xinan
1    Duration of CV transition    F2 bandwidth
2    Consonant F1, F2, F2-F1      F1 and F2 intensity
3    F2 bandwidth                 2 spectral moments
4    F1 and F2 intensity
5    4 spectral moments
6    Constriction release
CHAPTER 6
GENERAL DISCUSSION AND CONCLUSION

This dissertation aimed to investigate the ability of adult native Chinese speakers of the Jianghuai, Xinan and Gan dialects to perceptually identify English syllable-initial [l] and [n]. Its second aim was to test whether this ability could be improved through laboratory training. An acoustic analysis of English [l] and [n] produced by American English (AE) speakers and by Xinan speakers after the training was carried out to investigate the acoustic cues employed by the two groups. Another aim of the acoustic analysis was to examine whether the gains from perceptual training transfer to production without further production training.

Differences in the phonemic status and the distribution of /l/ and /n/ in the NM, Jianghuai, Xinan and Gan dialects of Chinese provided an interesting testing ground for predictions generated by speech perception theories in general and by current cross-language speech perception models in particular. According to the three most cited current cross-language speech perception models, namely the Native Language Magnet (NLM), the Perceptual Assimilation Model (PAM) and the Speech Learning Model (SLM), native speakers of NM, Jianghuai, Xinan and Gan should exhibit varying degrees of difficulty in their identification of English [l] and [n]. Specifically, NM speakers should have no difficulty perceiving the two non-native sounds. Jianghuai and Xinan speakers, on the other hand, should perceive the non-native contrast poorly, because these speakers would assimilate the two sounds into a single L1 category. Gan speakers should assimilate the non-native sounds into either one L1 category in a non-high vowel context or two L1
categories in a high vowel environment. Their ability to identify English [n] and [l] was, therefore, predicted to be excellent in a high vowel context but poor in other vowel conditions. Overall, Gan speakers’ performance was expected to be moderate to good. Moreover, the ability to identify English [l] and [n] among all three groups of speakers should improve after short-term laboratory perceptual training. All speakers should also generalize their learning in training to new AE speakers.

The results obtained indicate that only some of the predictions about the relative degrees of perceptual difficulty for the three dialect groups, which were based on perceptual assimilation patterns at the phonemic level, were upheld. That is, the excellent performance shown by NM speakers and the poor performance exhibited by Xinan speakers were as predicted. However, the prediction that Jianghuai and Xinan speakers would perform equally poorly was not supported. Contrary to the prediction, Jianghuai speakers’ performance was found to be superior to that of Xinan speakers. The difference across the three dialects could, however, be accounted for conceptually by NLM by referring to the psychoacoustic dissimilarity between the location of the perceived non-native contrast and the prototype in the native category. Therefore, the current results suggest that not only do phonemic differences between L1 and L2 determine L2 perception difficulty, but cross-language differences at the phonetic level also contribute to the degree of discriminability of non-native contrasts.

In addition to the perceptual patterns and the degrees of perceptual difficulty, it was predicted that phonetic factors, i.e., syllable structures and vowel contexts, would affect perception. Results from the pretest showed that these two factors did influence the perception of English [l] and [n], and that such influence was evident in the identification
performance of all speakers of Jianghuai, Xinan, and Gan. Specifically, it was found that identification of [n] was facilitated in a high vowel context while identification of [l] was enhanced in a low vowel context. One may speculate that a lower F1 in a high vowel enhances the acoustic salience of the nasal murmur, whose spectrum is dominated by low-frequency resonance, while a higher F1 in a low vowel facilitates the perception of [l], whose spectrum is dominated by relatively higher frequencies. Interestingly, the varying facilitating effects of vowel contexts on [l] and [n] also held for Gan speakers, who were predicted to show more accurate identification of both sounds in high vowel contexts. These results suggest that the spectral features of the following vowel (i.e., the locations of vowel formants) have a universal influence on the perception of certain non-native contrasts.

NM speakers’ identification, however, did not vary across vowel contexts. One possible reason is that their performance reached ceiling. Another explanation is the influence of their native dialect. In NM, /l/ and /n/ are two separate phonemes, as they are in English; NM speakers are therefore more familiar with the contrast than Jianghuai, Xinan, and Gan speakers, who have less experience with such a contrast, which is absent in their L1s. Therefore, it is possible that the effect of vowel contexts on the perception of a non-native contrast is only evident when L2 learners are not familiar with the sounds.

Similarly, syllable structure was also found to play a role in the perception of English [l] and [n]. Syllable-initial singletons resulted in more accurate identification than clusters for all dialect groups except NM speakers, who had equally excellent accuracy in both syllable structures. As none of the dialects investigated allows clusters in syllable-initial or syllable-final position, it may be concluded that L1 phonotactic rules play a role in L2 perception.
As NM speakers, however, are as unfamiliar with clusters as speakers of the other three dialects, the difference in their accuracy can only be attributed to L1 influence. NM speakers have relatively more experience with [l] and [n], which is a benefit from their L1. Such familiarity helps them overcome their inexperience with a novel syllable structure. In conclusion, phonetic factors play an important role in the perception of English [l] and [n], and the degree of their influence decreases with more experience with the L2 sounds.

Another interesting finding in this study is that all participants showed better identification of [n] than of [l] in the pretest, but this difference disappeared in the post-test. That is, [n] was easier to perceive than [l] in the pretest, but the improvement in the identification of [n] was smaller than that of [l]. It is possible that English [n] contains acoustic information to which dialect speakers’ ears are sensitive. As no phonetic parameters are available for [l] and [n] in the Chinese dialects, it can only be speculated that English [n] is more similar to Chinese [n] than to Chinese [l].

Concerning the size of the training effect, the average perceptual gain (5.8%) was smaller than those reported in other studies (Bradlow et al. 1997, Yamada et al. 1993). Bradlow et al. (1997) and Yamada et al. (1993) both reported more than a 10% improvement in the perception of English liquids by adult Japanese speakers. The differences between this study and theirs may be attributed to several factors. First, their studies involved longer perceptual training, which totaled 45 sessions and lasted 3 to 4 weeks. Second, their target non-native contrast is different from the contrast examined in the current study. As reviewed in Chapter 2, it has been established that [ɹ] and [l] differ from each other in the F3 transition. Previous studies have shown that [l] and [n] are acoustically similar and perceptually confusable. There has, however, been no
conclusion on the dimensions along which [l] and [n] directly contrast with each other. It could be that the contrast studied in this dissertation is perceptually more difficult than the pair of liquids.

Compared to the study on the perception of English [l] and [n] by Schmidt et al. (1997), the amount of improvement found in our study is promising, as Schmidt et al. did not find any improvement in their participants’ perception as a result of their training. The differences can be attributed to the following factors. First, different perceptual tasks were used in the two studies. Schmidt et al. used a fading method with synthesized stimuli, whereas ours used an identification (ID) task with natural stimuli. The ID task, as examined in Chapter 4, has been shown to be superior in training. Second, different types of stimuli were used in the two studies. Our study used natural stimuli produced by multiple speakers, which provided more robustness and variability in the input than the synthesized stimuli in their study. In conclusion, the training method using the identification task in our study is effective in improving the ability to perceptually distinguish English [l] and [n].

In addition to the study on perception, this dissertation investigated the production of English [l] and [n] by both AE and Xinan speakers. As reviewed in Chapter 3, a number of general theories concerning speech perception and production (Motor Theory, Direct Realist Theory) postulate a close link between speech perception and production. In the context of cross-language speech perception research, these theories predict that improved perception ability, through either laboratory training or increased exposure to the target language in a natural setting, would result in improved production without specific production training. Thus far, this prediction has been supported by empirical
evidence obtained in previous studies on cross-language speech perception (Bradlow et al. 1997, Yamada et al. 1993).

Although hard evidence as to whether the training in the current study resulted in improved production is lacking, as pre-training production was not measured and compared to post-training production, acoustic analyses of English [l] and [n] produced by Xinan speakers after the four-day perception training suggest that improved perception did not readily result in improved production. That is, while English [l] and [n] produced by native English speakers were distinguished along seven acoustic dimensions, Xinan speakers’ productions of [l] and [n] were hardly distinguishable. Additionally, the production of English [l] and [n] by Xinan speakers differs, to a varying extent, from that of native English speakers in all acoustic dimensions examined.

Based on the results from the acoustic analysis, it cannot be concluded which acoustic dimensions of the L1 phonetic system influenced Xinan speakers’ articulation of English sounds, as no acoustic parameters of native [n] in Xinan were available. The phonetic features of Xinan /n/ were briefly mentioned in only one early work, Zhao and Li (1939:172), where Xinan /n/ was described as a ‘nasal-oral’ sound that was similar to both /n/ and /l/ in northern sub-dialects of Mandarin. On the other hand, three factors can account for the lack of distinction between [n] and [l] in the production of Xinan speakers. First, as reviewed in Chapters 3 and 4, perceptual training effects may not immediately transfer to production; such a transfer takes effort, since the articulators must be configured accordingly. As Xinan speakers were recorded immediately after training, they may not yet have applied their learning in the perceptual
training to their production. Second, the lack of distinction may be due to a relatively small amount of input, as participants had only four days of training, much shorter than the month-long training provided in previous studies on Japanese speakers’ acquisition of English /ɹ/ and /l/. Third, the inability to produce the difference between English [l] and [n] could be attributed to their limited improvement in perception. That is, even though it improved significantly from the pretest, their overall identification accuracy only reached 61.10% after training, suggesting that Xinan speakers still cannot consistently detect phonetic differences between [n] and [l]. According to the Speech Learning Model, new categories in L2 will be established only after L2 learners detect the difference between L2 and L1 sounds. With this level of performance, it would be premature to conclude that Xinan speakers have learned to perceive the difference between English [l] and [n], and that new phonetic categories have been formed for these two English sounds among this group of speakers. It is possible that improved perception without category formation is not sufficient for production to be improved. Therefore, the result of the production experiment should not be taken as evidence against the hypothesis that perception and production are intricately connected to and benefit from each other.

In conclusion, the results of the perception and production experiments conducted in this study suggest that perceptually identifying English [l] and [n] is difficult for adult Chinese speakers from three dialects, and that the degree of perceptual difficulty depends on phonemic and phonetic differences between their L1 and English. Factors like vowel context and syllable structure were found to influence perception. Moreover, their ability to identify English [l] and [n] can be improved through training as short as
four days. Unfortunately, however, Xinan speakers do not appear able to distinguish English [l] and [n] in their production, even after perceptual training.

There is, of course, room for improvement in this study. One improvement for future studies would be to include more minimal pairs of English words contrasting [l] and [n] not only syllable-initially but also in other syllable positions. Moreover, more subjects could be recruited from each dialect group. An equal number of subjects in all groups would increase the reliability of the statistical analysis. In short, higher variability in stimuli and subjects would strengthen any conclusions concerning the effects of L1 phonetic characteristics and phonotactic rules on L2 perception. Second, in order to further test predictions generated by PAM concerning the discrimination of non-native contrasts, same-different (SD) discrimination tasks should be incorporated into the perception experiment. The inclusion of SD tasks would enable us to examine degrees of discriminability of non-native contrasts and to compare discriminability between within-category and between-category contrasts.

Third, as suggested in the previous section, a study comparing native Chinese speakers’ production before and after perceptual training would shed light on the relationship between perception and production. In addition to an acoustic analysis of production, other means of analysis, such as judgments by native English speakers or self-assessment, can be employed to obtain psychoacoustic parameters in the perception of [n] and [l]. These psychoacoustic parameters can be used to test the predictions generated by NLM, to further investigate the different perceptual patterns of Jianghuai and Xinan speakers, who assimilated the non-native contrast into a single L1 category, and to examine the goodness of fit of Gan speakers’ perception of [l] and [n].
What is more, in order to test training effects over the long term, a perception test and a production test can be run one or two months after the training. Synthesized stimuli differing along various acoustic dimensions could also be used in a discrimination test, in order to test the hypothesis that the perception of L2 contrasts is weighted by the phonetic or psychoacoustic differences and similarities between perceived L2 sounds and L1 prototypes.

The current research can also be complemented by studies using other methodologies. For example, an X-ray, ultrasound, or fMRI study can be implemented to examine the tongue position when [n] and [l] are produced in both English and the Chinese dialects. Results along this line will provide empirical evidence as to whether visual stimuli (showing the face of the speaker) should be incorporated into perception training. Previous studies have shown that visual information facilitates speech perception (McGurk and McDonald 1976, Saldana and Rosenblum 1993). Additionally, a physical analysis can be run to investigate the dynamic airflow through the nasal and oral cavities, both to corroborate the findings of the acoustic study and to provide evidence for an accurate phonetic description of the Xinan /n/.

In conclusion, results from the study at hand not only provide insight into the understanding and modification of current models and theories of cross-language speech perception and speech production, but also provide valuable information for pedagogy. In foreign language teaching, language teachers can use such information to guide their class design. As it is generally accepted that perception precedes production, the inclusion of perception training with natural stimuli would be very helpful in teaching non-native sounds. For classrooms where lab teaching is available, the results on training
methods from this study can be applied. Foreign language teaching can incorporate an identification task using natural stimuli as well as synthesized stimuli that are enhanced along certain acoustic dimensions. For classrooms where lab training is not available, teaching can be supplemented by listening to a variety of minimal pairs produced by native speakers, as we have shown that natural stimuli provide high and robust variability in the input.

Furthermore, this study helps to fill the gap in the literature on the acoustic characteristics of English [l] and [n]. For American English speakers, [l] and [n] differ not only in temporal dimensions but also in spectral dimensions. More importantly, differences between the two sounds were found in “new” acoustic dimensions, such as spectral moments and spectral features measured around the glottal pulses at the constriction release. Moreover, the acoustic characteristics of Xinan speakers’ production of the two English sounds provide valuable information for future studies on speech perception and production among these particular speakers. Finally, findings from the acoustic analysis at hand provide parameters that can be used in automatic speech recognition by machines and in developing models in signal processing. In conclusion, cross-language studies of /l/ and /n/, when refined through various linguistic perspectives and interdisciplinary approaches, can provide useful information in various domains.
APPENDIX
DISCRIMINANT FUNCTION ANALYSIS

DFA is a statistical procedure that is useful in situations where a predictive model of group membership is desired based on observed characteristics of each case. “The procedure generates a discriminant function based on linear combinations of the predictor variables that provide the best discrimination between the groups” (SPSS Tutorial 2001). DFA can be used for exploratory and predictive purposes: it can be used to determine whether there are differences among the average scores of a series of variables for two or more groups; to determine which variables account for these differences; to classify cases; and to investigate how groups are discriminated. Applied to acoustic data, DFA can be used to classify tokens into two or more groups (e.g., manner or place of articulation) using a set of predictors (e.g., acoustic cues). This analysis has been shown to be effective in the place-of-articulation classification of consonants (Arabic fricatives: Al-Khairy 2005; English fricatives: Jongman et al. 2000; voiceless obstruents: Nissen 2003).

DFA in this dissertation used a stepwise method in which only the predictor (acoustic dimension) that minimized Wilks’ Lambda was entered at any given step. Wilks’ Lambda is used to test the null hypothesis that the groups have identical means on the discriminant function, that is, that the acoustic cues are equal in their ability to discriminate English [l] and [n]. The smaller Wilks’ Lambda is, the more doubt is cast upon the null hypothesis. The criterion for entry at each step of the DFA was set at p = 0.05, and the criterion for removal at p = 0.10. Also, since the levels of the dependent variable (/n/ vs. /l/) had unequal numbers of cases, the prior probabilities for group membership were calculated from the group sizes. The prior probability of a group is the proportion of cases that belong to it, and serves as the baseline against which classification using the independent variables (acoustic dimensions) is evaluated.

The output of the DFA contains a series of statistical parameters. The eigenvalue is an indication of how much each discriminant function contributes to the analysis. The value of Wilks’ Lambda indicates the degree of effectiveness of the discriminant functions. Finally, DFA provides classification results showing what percentage of the original cases could be correctly classified into the predicted membership groups (/n/ vs. /l/).
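The two quantities described above, the prior probabilities computed from group size and Wilks’ Lambda, can be sketched for the simplest case of a single predictor. This is a minimal illustration under stated assumptions, not the SPSS procedure used in this dissertation: with one predictor, Wilks’ Lambda reduces to the ratio of the within-group sum of squares to the total sum of squares. The function names and the toy values are hypothetical.

```python
from collections import Counter


def priors_from_group_sizes(labels):
    """Prior probability of each group = its share of all cases."""
    if not labels:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)
    n = len(labels)
    return {group: count / n for group, count in counts.items()}


def wilks_lambda_one_predictor(values, labels):
    """Wilks' Lambda for one predictor: SS_within / SS_total.

    Values near 0 mean the group means are far apart relative to the
    spread within groups; a value of 1 means the group means coincide.
    """
    if len(values) != len(labels) or not values:
        raise ValueError("values and labels must be non-empty and equal in length")

    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        raise ValueError("predictor has no variance")

    # Collect the values belonging to each group.
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)

    # Sum squared deviations of each case from its own group mean.
    ss_within = 0.0
    for members in groups.values():
        group_mean = sum(members) / len(members)
        ss_within += sum((v - group_mean) ** 2 for v in members)

    return ss_within / ss_total


# Hypothetical toy data: one acoustic measure for three /n/ and two /l/ tokens.
toy_values = [290.0, 300.0, 310.0, 400.0, 410.0]
toy_labels = ["n", "n", "n", "l", "l"]

toy_priors = priors_from_group_sizes(toy_labels)
toy_lambda = wilks_lambda_one_predictor(toy_values, toy_labels)
```

In a stepwise procedure of the kind described above, a value like `toy_lambda` would be computed for each candidate predictor, and the one yielding the smallest Lambda would be the candidate for entry at that step, subject to the entry criterion.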
LIST OF REFERENCES

Abramson, A. S. and Lisker, L. "Discriminability Along the Voicing Continuum: Cross Language Tests." Proceedings of the 6th International Congress of Phonetic Sciences (1970): 569-73.
Abramson, A. S., Nye, P. W., Henderson, J., and Marshall, C. W. "The Perception of an Oral-Nasal Continuum Generated by Articulatory Synthesis." Haskins Laboratories Status Report on Speech Research 57 (1979): 17-38.
Al-Khairy, Mohamed Ali. "Acoustic Characteristics of Arabic Fricatives." Ph.D. Dissertation. University of Florida, 2005.
Anderson, J. R. The Adaptive Character of Thought. Hillsdale, NJ: Erlbaum, 1990.
Aoyama, Katsura, Flege, James E., Guion, Susan G., Akahane-Yamada, Reiko, and Yamada, Tsuneo. "Perceived Phonetic Dissimilarity and L2 Speech Learning: The Case of Japanese /r/ and English /l/ and /r/." Journal of Phonetics 32 (2003): 233-50.
Barry, M. "A Phonetic and Phonological Investigation of English Clear and Dark Syllabic /l/." Bulletin de la Communication Parlée 5 (2000): 77-87.
Beddor, P. S. and Hawkins, S. "The Influence of Spectral Prominence on Perceived Vowel Quality." Journal of the Acoustical Society of America 87.6 (1990): 2684-704.
Beddor, P. S. and Gottfried, T. L. "Methodological Issues in Cross-Language Speech Perception Research with Adults." Speech Perception and Linguistic Experience: Theoretical and Methodological Issues in Cross-Language Speech Research. Ed. W. Strange. Timonium, MD: York Press Inc, 1995. 207-32.
Best, C. T., McRoberts, G. W., and Sithole, N. M. "Examination of Perceptual Reorganization for Nonnative Speech Contrasts: Zulu Click Discrimination by English-Speaking Adults and Infants." Journal of Experimental Psychology: Human Perception and Performance 14.3 (1988): 345-60.
Best, C. T. and Strange, W. "Effects of Phonological and Phonetic Factors on Cross-Language Perception of Approximants." Journal of Phonetics 20 (1992): 305-30.
Best, C. T. Learning to Perceive the Sound Pattern of English. Ablex, Norwood, NJ: Haskins Laboratories, 1994.
---. "The Emergence of Native-Language Phonological Influences in Infants: A Perceptual Assimilation Model." The Development of Speech Perception: The Transition from Speech Sounds to Spoken Words. Ed. H. C. Nussbaum. Cambridge, MA: MIT Press, 1994. 167-224.
---. "A Direct-Realist View of Cross-Language Speech Perception." Speech Perception and Linguistic Experience: Issues in Cross-Language Research. Ed. W. Strange. Timonium, MD: York Press, 1995. 171-204.
Best, C. T., McRoberts, Gerald W., and Goodell, Elizabeth. "Discrimination of Non-Native Consonant Contrasts Varying in Perceptual Assimilation to the Listener's Native Phonological System." Journal of the Acoustical Society of America 109.2 (2001): 775-94.
Boersma, P. and Weenink, D. Praat: A System for Doing Phonetics by Computer. Vers. 4.4.20. Computer software. Institute of Phonetic Sciences of the University of Amsterdam, 2004.
Bohn, O.-S. and Flege, J. "Interlingual Identification and the Role of Foreign Language Experience in L2 Vowel Perception." Applied Psycholinguistics 11.3 (1990): 303-28.
Borden, Gloria J., Harris, Katherine S., and Raphael, Lawrence J. Speech Science Primer. 1st ed. Baltimore: Williams & Wilkins, 1980.
Bradlow, A., Pisoni, D., Yamada, R., and Tohkura, Y. "Training Japanese Listeners to Identify English /r/ & /l/: IV. Some Effects of Perceptual Learning on Speech Production." Journal of the Acoustical Society of America 101 (1997): 2299-310.
Bradlow, A., Yamada, R., Pisoni, D., and Tohkura, Y. "Training Japanese Listeners to Identify English /r/ & /l/: Long-Term Retention of Learning in Perception and Production." Perception and Psychophysics 61.5 (1999): 977-85.
Byrd, D. "54,000 American Stops." UCLA Working Papers in Phonetics 83 (1993): 97-115.
Carter, Paul Graham. "Structured Variation in British English Liquids: The Role of Resonance." Ph.D. Dissertation. University of York, 2002.
Chappell, Hilary, ed. Sinitic Grammar: Synchronic and Diachronic Perspectives. Oxford: Oxford University Press, 2001.
Coleman, J. S. "Cognitive Reality and the Phonological Lexicon: A Review." Journal of Neurolinguistics 11 (1998): 295-320.
Coleman, J. Phonological Representations: Their Names, Forms and Powers. Cambridge: Cambridge University Press, 1998.
Cooper, Franklin S., Delattre, Pierre C., Liberman, Alvin M., Borst, John M., and Gerstman, Louis J. "Some Experiments on the Perception of Synthetic Speech Sounds." Journal of the Acoustical Society of America 24 (1952): 597-606.
Dalston, R. M. "Acoustic Characteristics of English /w, r, l/ Spoken Correctly by Young Children and Adults." Journal of the Acoustical Society of America 57 (1975): 462-69.
Delattre, P. "The Physiological Interpretation of Sound Spectrograms." Publications of the Modern Language Association 66 (1951): 864-75.
Delattre, P. C., Liberman, A. M., and Cooper, F. S. "Acoustic Loci and Transitional Cues for Consonants." Journal of the Acoustical Society of America 27 (1955): 769-73.
Diehl, R., Kluender, K., and Walsh, M. "Some Auditory Bases of Speech Perception and Production." Advances in Speech Hearing and Language Processing. Ed. W. Ainsworth. Vol. 1. London, England: JAI Press, 1990. 243-67.
Eimas, P. D. "The Relation between Identification and Discrimination Along Speech and Non-Speech Continua." Language and Speech 6 (1963): 206-17.
Espy-Wilson, Carol Y. "Acoustic Measures for Linguistic Features Distinguishing the Semivowels /w, j, r, l/ in American English." Journal of the Acoustical Society of America 92.2 (Pt. 1) (1992): 736-57.
Fant, G. Acoustic Theory of Speech Production. The Hague: Mouton, 1960.
Flege, J. E. and Hillenbrand, J. M. "A Differential Effect of Release Bursts on the Stop Voicing Judgments of Native French and English Listeners." Journal of Phonetics 15 (1987): 68-73.
Flege, J. "Factors Affecting Degree of Perceived Foreign Accent in English Sentences." Journal of the Acoustical Society of America 84 (1988): 70-79.
Flege, J., Munro, M., and MacKay, I. "Effects of Age of Second-Language Learning on the Production of English Consonants." Speech Communication 16 (1995): 1-26.
Flege, J., Takagi, N., and Mann, V. "Lexical Familiarity and English-Language Experience Affect Japanese Adults' Perception of /r/ and /l/." Journal of the Acoustical Society of America 99.2 (1996): 1161-73.
Flege, J., Bohn, O.-S., and Jang, S. "Effects of Experience on Non-Native Speakers' Production and Perception of English Vowels." Journal of Phonetics 25.4 (1997): 437-70.
Flege, J. "Origins and Development of the Speech Learning Model." 1st ASA Workshop on L2 Speech Learning. Simon Fraser University, Vancouver, BC, Canada, 2005.
Forrest, K., Weismer, G., Milenkovic, P., and Dougall, R. N. "Statistical Analysis of Word-Initial Voiceless Obstruents: Preliminary Data." Journal of the Acoustical Society of America 84.1 (1988): 115-23.
Fujimura, O. "Analysis of Nasal Consonants." Journal of the Acoustical Society of America 34 (1962): 1865-75.
Fujimura, Osamu and Donna Erickson. "Acoustic Phonetics." The Handbook of Phonetic Sciences. Ed. William J. Hardcastle and John Laver. Blackwell Publishers, 1997. 65-115.
Goto, H. "Auditory Perception by Normal Japanese Adults of the Sounds 'l' and 'r'." Neuropsychologia 9 (1971): 317-23.
Grieser, D., and Kuhl, P. K. "Categorization of Speech by Infants: Support for Speech-Sound Prototypes." Developmental Psychology 25 (1989): 577-88.
Guion, S., Flege, J., and Loftin, J. "The Effect of L1 Use on Pronunciation in Quichua-Spanish Bilinguals." Journal of Phonetics 28.1 (2000): 27-42.
Harnsberger, James D. "A Cross-Language Study of the Identification of Non-Native Nasal Consonants Varying in Place of Articulation." Journal of the Acoustical Society of America 108.2 (2000): 764-83.
---. "On the Relationship between Identification and Discrimination of Non-Native Nasal Consonants." Journal of the Acoustical Society of America 110.1 (2001): 489-503.
Hawkins, S. and Stevens, K. "Acoustic and Perceptual Correlates of the Non-Nasal-Nasal Distinction for Vowels." Journal of the Acoustical Society of America 77.4 (1985): 1560-75.
Hawkins, S. "Reevaluating Assumptions About Speech Perception: Interactive and Integrative Theories." The Acoustics of Speech Communication. Ed. J. Pickett. Boston: Allyn and Bacon, 1999. 232-88.
Hayward, K. Experimental Phonetics. Essex, England: Pearson Education, 2000.
Holt, L. L., Lotto, A. J., and Diehl, R. L. "Auditory Discontinuities Interact with Categorization: Implications for Speech Perception." Journal of the Acoustical Society of America 116 (2004): 1763-73.
Hou, Jingyi. Xiandai Hanyu Fangyan Gailun (An Overview on Modern Chinese Dialects). Shanghai: Shanghai Education Publishing House, 2005.
House, A. S. and Stevens, K. N. "Analog Studies of the Nasalization of Vowels." Journal of Speech and Hearing Disorders 21 (1956): 218-32.
House, A.S. "Analog Studies of Nasal Consonants." Journal of Speech and Hearing Disorders 22 (1957): 190-204.
Ingram, J. C. and Park, S.-G. "Language, Context, and Speaker Effects in the Identification and Discrimination of English /r/ and /l/ by Japanese and Korean Listeners." Journal of the Acoustical Society of America 103 (1998): 1161-74.
Iverson, G. and Sohn, H.-S. "Liquid Representation in Korean." Theoretical Issues in Korean Linguistics. Ed. Y.-K. Kim-Renaud, 1994. 79-100.
Iverson, P. and Kuhl, P.K. "Influences of Phonetic Identification and Category Goodness on American Listeners' Perception of /R/ and /L/." Journal of the Acoustical Society of America 92 (1996): 1130-40.
Jamieson, D. and Morosan, D. "Training Non-Native Speech Contrasts in Adults: Acquisition of the English /ð/ and /θ/ Contrast by Francophones." Perception & Psychophysics 40.4 (1986): 205-15.
Jamieson, D. G. and Morosan, D. E. "Training New, Nonnative Speech Contrasts: A Comparison of the Prototype and Perceptual Fading Techniques." Canadian Journal of Psychology 43 (1989): 88-96.
Johnson, K. Acoustic and Auditory Phonetics. 2nd ed. Oxford: Blackwell, 2002.
Jongman, A., Wayland, R., and Wong, S. "Acoustic Characteristics of English Fricatives." Journal of the Acoustical Society of America 108.3 Pt 1 (2000): 1252-63.
Kent, R. D. and Read, C. The Acoustic Analysis of Speech. San Diego: Singular Publishing Group, 2002.
Kuhl, P. and Miller, J. "Speech-Perception by Chinchilla: Identification Functions for Synthetic VOT Stimuli." Journal of the Acoustical Society of America 63.3 (1978): 905-17.
Kuhl, P. K. "Human Adults and Human Infants Show a Perceptual Magnet Effect for the Prototypes of Speech Categories, Monkeys Do Not." Perception & Psychophysics 50 (1991): 93-107.
---. "Psychoacoustics and Speech Perception: Internal Standards, Perceptual Anchors, and Prototypes." Developmental Psychoacoustics. Ed. L. A. Werner and E. W. Rubel. Washington DC: American Psychological Association, 1992. 293-332.
---. "Innate Predispositions and the Effects of Experience in Speech Perception: The Native Language Magnet Theory." Developmental Neurocognition: Speech and Face Processing in the First Year of Life. Ed. S. de Schonen, B. deBoysson-Bardies, P. Jusczyk, P. McNeilage, and J. Morton. Dordrecht, Netherlands: Kluwer Academic Publishers, 1993. 259-74.
Kuhl, P.K. and Iverson, P. "Linguistic Experience and the 'Perceptual Magnet Effect.'" Speech Perception and Linguistic Experience. Ed. W. Strange. Baltimore: York Press, 1995.
Kurowski, K., and Blumstein, S. E. "Perceptual Integration of the Murmur and Formant Transitions for Place of Articulation in Nasal Consonants." Journal of the Acoustical Society of America 76 (1984): 383-90.
Ladefoged, P., and Maddieson, I. The Sounds of the World's Languages. Oxford: Blackwell, 1996.
Ladefoged, P. A Course in Phonetics. 4th ed. Orlando, FL: Harcourt College, 2001.
---. Phonetic Data Analysis: An Introduction to Fieldwork and Instrumental Techniques. 1st ed. Malden: Blackwell Publishers, 2003.
Lass, N.J. Principles of Experimental Phonetics. Vol. Xiii. St. Louis, Mo: Mosby, 1996.
Lehiste, I. "Acoustic Characteristics of Selected English Consonants." International Journal of American Linguistics (1964): 10-115.
Lenneberg, E. H. Biological Foundations of Language. New York: John Wiley & Sons, 1967.
Liberman, A. M., Cooper, F. S., Shankweiler, D. P., and Studdert-Kennedy, M. "Perception of the Speech Code." Psychological Review 74 (1967): 431-61.
Liberman, A., and Mattingly, I. "The Motor Theory of Speech Perception Revisited." Cognition 21 (1985): 1-36.
Lisker, L., and Abramson, A.S. "A Cross-Language Study of Voicing in Initial Stops: Acoustic Measurements." Word 20 (1964): 384-422.
Lively, S., Logan, J., and Pisoni, D. "Training Japanese Listeners to Identify English /r/ and /l/. II. The Role of Phonetic Environment and Talker Variability in Learning New Perceptual Categories." Journal of the Acoustical Society of America 94 (1993): 1242-55.
Lively, S. E., Pisoni, D. B., Yamada, R. A., Tohkura, Y., and Yamada, T. "Training Japanese Listeners to Identify English /r/ and /l/: III. Long-Term Retention of New Phonetic Categories." Journal of the Acoustical Society of America 96 (1994): 2076-87.
Logan, J., Lively, S., and Pisoni, D. "Training Japanese Listeners to Identify English /r/ and /l/: A First Report." Journal of the Acoustical Society of America 89 (1991): 874-86.
Logan, J. S., and Pruitt, J. S. "Methodological Issues in Training Listeners to Perceive Non-Native Phonemes." Speech Perception and Linguistic Experience: Issues in Cross-Language Research. Ed. W. Strange. Baltimore: York Press, 1995.
MacDonald, J. and McGurk, H. "Visual Influences on Speech Perception Processes." Perception and Psychophysics 24 (1978): 253-57.
MacKain, K.S., Best, C.C., and Strange, W. "Categorical Perception of English /r/ and /l/ by Japanese Bilinguals." Applied Psycholinguistics 2 (1981): 369-90.
Maddieson, Ian. Patterns of Sounds. Cambridge: Cambridge University Press, 1984.
Malecot, A. "Vowel Nasality as a Distinctive Feature in American English." Language 36 (1960): 222-29.
McGurk, Harry and MacDonald, John. "Hearing Lips and Seeing Voices." Nature 264 (1976): 746-48.
Mermelstein, Paul. "On Detecting Nasals in Continuous Speech." Journal of the Acoustical Society of America 61.2 (1977): 581-87.
Miller, J. D., Wier, C. C., Pastore, R., Kelly, W. J., and Dooling, R. J. "Discrimination and Labeling of Noise-Buzz Sequences with Varying Noise-Lead Times: An Example of Categorical Perception." Journal of the Acoustical Society of America 60 (1976).
Miyawaki, K., Strange, W., Verbrugge, R., Liberman, A.M., Jenkins, J.J., and Fujimura, O. "An Effect of Linguistic Experience: The Discrimination of [r] and [l] by Native Speakers of Japanese and English." Perception and Psychophysics 18 (1975): 331-40.
Nakata, K. "Synthesis and Perception of Nasal Consonants." Journal of the Acoustical Society of America 31 (1959): 661-66.
Nolan, F. The Phonetic Bases of Speaker Recognition. Cambridge: Cambridge University Press, 1983.
Norman, Jerry. Chinese. New York: Cambridge University Press, 1988.
O'Connor, J. D., Gerstman, L. J., Liberman, A.M., Delattre, P. C., and Cooper, F. S. "Acoustic Cues for the Perception of Initial /w, r, l/ in English." Word 13 (1957): 24-43.
Patkowski, M. "The Sensitive Period for the Acquisition of Syntax in a Second Language." Language Learning 30 (1980): 449-72.
Pisoni, D. "Identification and Discrimination of Relative Onset of Two-Component Tones: Implications for Voicing Perception in Stops." Journal of the Acoustical Society of America 61 (1977): 1352-61.
Pisoni, D., Aslin, R., Perey, A., and Hennessy, B. "Some Effects of Laboratory Training on Identification and Discrimination of Voicing Contrasts in Stop Consonants." Journal of Experimental Psychology: Human Perception and Performance 8 (1982): 297-314.
Polka, L. "Cross-Language Speech Perception in Adults: Phonemic, Phonetic, and Acoustic Contributions." Journal of the Acoustical Society of America 89 (1991): 2961-77.
---. "Characterizing the Influence of Native Experience on Adults Speech Perception." Perception & Psychophysics 52 (1992): 37-52.
Pruthi, Tarun, and Espy-Wilson, Carol Y. "Acoustic Parameters for Automatic Detection of Nasal Manner." Speech Communication 43 (2004): 225-39.
Rochet, B. L. "Perception and Production of Second-Language Speech Sounds by Adults." Speech Perception and Linguistic Experience. Ed. W. Strange. Timonium, MD: York Press, 1995.
Romero, Joaquin. "Duration Aspects in Manner of Articulation Distinctions." Journal of the Acoustical Society of America 94.3 (1993): 1764.
Saldana, H. M., and Rosenblum, L. D. "Visual Influences on Auditory Pluck and Bow Judgments." Perception and Psychophysics 54 (1993): 406-16.
Schmidt, A. M. "Perception of English /r, l, n, d/ by Native Speakers of Chinese." Journal of the Acoustical Society of America 99 (1996): 2548.
Schmidt, A.M., and Kaminski, A. "Perception of English /n, l/ with Amplified Consonants or Transitions by Native Speakers of Chinese." ASA meeting. State College, PA, 1997.
Schmidt, A.M., Kaminsky, Amy, and Chung, Hyunjoo. "Training Chinese Speakers to Perceive English /n, l/: Part II." ASA meeting. Columbus, OH, 1999.
Scovel, T. "Foreign Accents, Language Acquisition, and Cerebral Dominance." Language Learning 19 (1969): 245-53.
Seo, M. "A Perception-Based Study of Sonorant Assimilation in Korean." OSU Working Papers in Linguistics 55 (2001): 43-69.
Sheldon, A., and Strange, W. "The Acquisition of /r/ and /l/ by Japanese Learners of English: Evidence That Speech Production Can Precede Speech Perception." Applied Psycholinguistics 3 (1982).
Stevens, K. N., and Blumstein, S. E. "Invariant Cues for Place of Articulation in Stop Consonants." Journal of the Acoustical Society of America 64 (1978): 1358-68.
---. "The Search for Invariant Acoustic Correlates of Phonetic Features." Perspectives of the Study of Speech. Ed. P. D. Eimas and J. L. Miller. Hillsdale, NJ: Erlbaum, 1981.
Stevens, Kenneth N., and Blumstein, Sheila E. "Attributes of Lateral Consonants." Journal of the Acoustical Society of America 95.5 (1994): 2875.
Stevens, K. N. Acoustic Phonetics. Massachusetts: MIT Press, 1998.
Strange, W., and Dittmann, S. "Effects of Discrimination Training on the Perception of /R-L/ by Japanese Adults Learning English." Perception and Psychophysics 36.2 (1984): 131-45.
Strange, W., ed. Speech Perception and Linguistic Experience: Issues in Cross-Language Speech Research. Timonium, MD: York Press, 1995.
Takagi, N. Perception of American English /r/ and /l/ by Adult Japanese Learners of English: A Unified View. Unpublished Ph.D. Dissertation, Univ. of California-Irvine.
Toft, Zeo. "The Phonetics and Phonology of Some Syllabic Consonants in Southern British English." ZAS Papers in Linguistics 28 (2002): 111-44.
Traunmüller, H. "Analytical Expressions for the Tonotopic Sensory Scale." Journal of the Acoustical Society of America 88 (1990): 97-100.
Umeda, N. "Consonant Duration in American English." Journal of the Acoustical Society of America 61.3 (1976): 846-58.
Wang, Y., Spence, M. M., Jongman, A., and Sereno, J. A. "Training American Listeners to Perceive Mandarin Tones." Journal of the Acoustical Society of America 106 (1999): 3649-58.
Wayland, R.P., and Guion, S.G. "Discrimination of Thai Tones by Naive and Experienced Native English Speakers." Journal of the Acoustical Society of America 110 (2001): 2686.
Wayland, R.P. and Li, Bin. "Training Native Chinese and Native English Listeners to Perceive Thai Tones." ISCA Workshop on Plasticity in Speech Perception (PSP2005). Senate House, London, UK, 2005.
Werker, J. F., and Tees, R. C. "Cross-Language Speech Perception: Evidence for Perceptual Reorganization during the First Year of Life." Infant Behavior and Development 7 (1984): 49-63.
Wright, R., Frisch, S., and Pisoni, D. B. Speech Perception: Research on Spoken Language Processing. Bloomington, IN: Speech Research Laboratory, Indiana University, 1996.
Wurm, S. A., and Li, Rong, eds. The Language Atlas of China. Hong Kong: Longman Group (Far East), 1987.
Yamada, R.A., and Tohkura, Y. "The Effects of Experimental Variables on the Perception of American English /r/ and /l/ by Japanese Listeners." Perception & Psychophysics 52 (1992): 376-92.
Yamada, R. A. "Effects of Extended Training on /r/ and /l/ Identification by Native Speakers of Japanese." Journal of the Acoustical Society of America 93 (1993): 2931(A).
Yamada, R. A., Strange, W., Magnuson, J. S., Pruitt, J. S., and Clarke III, W. D. "The Intelligibility of Japanese Speakers' Productions of American English /r/, /l/, and /w/, as Evaluated by Native Speakers of American English." The International Conference of Spoken Language Processing. Yokohama: Acoustical Society of Japan, 1994. 2023-26.
Yamada, R. "Age and Acquisition of Second Language Speech Sounds: Perception of American English /r/ and /l/ by Native Speakers of Japanese." Speech Perception and Linguistic Experience: Issues in Cross-Language Research. Ed. W. Strange. Timonium, MD: York Press, Inc, 1995. 305-20.
Yip, M. Tone. Cambridge: Cambridge University Press, 2002.
Yuan, J. Hanyu Fangyan Gaiyao. Beijing: Yuwen Chubanshe (Chinese Press), 1960.
Zhao, Y. Hubei Fangyan Diaocha Baogao. Beijing: Shangwu Chubanshe, 1948.
BIOGRAPHICAL SKETCH
Bin Li, daughter of Guiliang Li and Tingxiu Wang, was born in Jinan, China. She received her B.A. in English from the Shandong University of Technology (merged into Shandong University in 2000) and earned her M.A. in Foreign Linguistics and Applied Linguistics at Beihang University. Upon graduation in 2002, she came to the United States of America to pursue a Ph.D. in linguistics on a four-year Alumni Fellowship. In the Linguistics Program at the University of Florida, she developed a strong interest in phonetics. During her Ph.D. training, she worked as a teaching assistant for the course "Sounds of Human Language" in 2004-2005 and as a research assistant in a series of phonetic studies. After graduation, she will work as an assistant professor in the Department of Chinese, Translation, and Linguistics at the City University of Hong Kong in China.