University of Florida | Journal of Undergraduate Research | Volume 15, Issue 3 | Summer 2014

The Effect of Tone Production in Lexical Tone Discrimination Training

Eric Holgate, Shuang Lu, and Dr. Edith Kaan
College of Liberal Arts and Sciences, University of Florida

Non-native speakers have been shown to be less accurate than native speakers when discriminating between lexical tones, such as those which occur in languages like Mandarin Chinese and Thai. Several studies have shown that discrimination accuracy can be increased through laboratory training, but little research has been done on the effect of production in such training programs. The present study compared the effect of perception-only and perception-plus-production training on lexical tone discrimination accuracy. Analysis of the collected data showed that accuracy increased after training, but that including production in the training did not have a significant effect. Further investigation with a larger population of participants is necessary to resolve these null results.

INTRODUCTION

Lexical tones and training

Pitch differentiation is used in many languages to mark speech. Many spoken languages use intonation to convey emphasis and other paralinguistic information, as shown by the rising pitch inherent in many English questions. Tonal languages, like Mandarin Chinese, use pitch differentiation at the syllabic level to convey lexical information (e.g., the syllable [ma] in Mandarin can mean mother, hemp, horse, or scold, or can serve as a question particle, depending upon the pitch of the speaker's voice). Native speakers of languages that do not have such lexical tones, such as English, are often less accurate when asked to discriminate between foreign lexical tones than are native speakers of tonal languages (Wayland & Guion, 2004), but can improve through short-term laboratory training (Wang, Spence, Jongman, & Sereno, 1999). Previous studies mostly employed training techniques that involved perception only.
For instance, participants are asked either to identify the tone category of the sounds presented (e.g., Wang et al., 1999) or to indicate whether the sounds presented have the same or a different tone (e.g., Wayland & Guion, 2004; Kirkham, Lu, Wayland, & Kaan, 2011).

Speech production and perception

The present study examines whether producing tones during training, in addition to perceiving them, leads to an even greater increase in discrimination accuracy. It is part of a larger study which will include event-related potential analysis. This investigation was prompted by consideration of the Motor Theory of Speech Perception (Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967), which asserts that listeners perceive speech by identifying intended articulatory gestures (the motions within the vocal tract necessary for the production of speech sounds, e.g., the closure of the lips and adduction of the vocal folds in the production of [b]), and not the sound patterns they produce (individual phones or speech sounds, such as [b], [g], or [e]). Similarly, the Direct Realist Theory (Best, 1995) asserts that articulatory gestures play the central role in speech perception, but holds that the actual gestures, not the intended gestures, are used in the perception process. As speech production occurs very rapidly, the production of speech sounds is often affected by the environment of the target sound in a process known as coarticulation (e.g., in English, [n] is usually produced with the tip of the tongue at the alveolar ridge, which lies behind the front teeth; in the production of "tenth", however, it is interdental due to assimilation to the following sound). Whatever the differences between these theories, they both imply that a speaker's ability to make the articulatory gestures needed to produce a given sound directly affects that speaker's ability to perceive the same sound.
It is therefore expected that tone discrimination training paradigms that include production (as opposed to perception only) will further augment the increase in accuracy in behavioral tasks that has already been documented (e.g., Wong & Perrachione, 2007).

THE PRESENT STUDY

The present study was conducted to determine the effect of lexical tone production in tone discrimination training by comparing the effect of perception-only training and perception combined with production training on the accuracy of native English speakers in Mandarin lexical tone discrimination tasks. The study was part of a larger study which included the collection of electroencephalography data for event-related potential analysis; these latter data will not be reported here. We will discuss data collected from a discrimination task conducted before and after training, and will investigate the change in accuracy over the course of training.
RESEARCH DESIGN & METHOD

Participants

A total of 14 native speakers of American English were recruited from the University of Florida and the surrounding Gainesville community to participate in this study. Each participant was randomly assigned to one of two groups: seven received only perception training (perception group), and seven received both production and perception training (production group). Participants had no previous experience with tone languages, were 18 to 35 years of age, were right-handed according to the Edinburgh Handedness Inventory (Oldfield, 1971), and had no history of neurological or speech impairment. Additionally, participants' hearing from 250 to 8000 Hz was tested at 25 dB at the time of the experiment to ensure that hearing deficits did not adversely affect the data. A comparison of the demographics of the two groups can be found in Table 1.

Table 1. Training Group Demographics

Group        n   Mean Age   Male   Female   AMMA
Perception   7   19.42      0      7        Average
Production   7   19.14      1      6        Average

Additionally, no participant reported having received formal musical training, as many studies have shown music training to affect the perception of lexical tones (e.g., Wong, Skoe, Russo, Dees, & Kraus, 2007). In addition to the inclusion of this criterion in the demographic questionnaire, participants' musical aptitude was tested with the Advanced Measures of Music Audiation (AMMA; Gordon, 1989), which provides a standardized assessment of musicality. No participant received a high rating on the AMMA assessment.

Many participants sought course credit for participating in the study and received course credit for two hours of participation. All participants were financially compensated for the remaining time spent in the lab. All participants signed an informed consent form as per University of Florida IRB procedures.
Stimuli

A total of 144 stimuli were created by Shuang Lu for use in the study. Twelve monosyllables that naturally occur in English ([pha], [phi], [phho], [kha], [khi], [khho], [tha], [thi], [thho]) were digitally altered in pitch using the Praat software (Boersma & Weenink, 2013) to match Mandarin tones 1 (high level), 2 (rising), and 4 (high falling) (Figure 1). While the tone contours used in this study are not natural contours, they approximate them in the way suggested by Wong & Perrachione (2007). Tone 3 (low dipping) was not included in this study as it has been shown to be confusing to native and non-native Mandarin speakers alike (Kirkham, Lu, Wayland, & Kaan, 2011). Native English syllables were selected specifically so that participants would be able to focus solely on lexical tone and not be preoccupied with the presence of foreign syllables.

Two female native speakers of American English produced each syllable twice, creating a total of four tokens per syllable, and it was this set of files that was digitally altered to mimic the Mandarin lexical tones. Recordings were made in a sound-attenuated booth to ensure clean samples. Female speakers were chosen because adult female speakers have been rated as more intelligible than their male counterparts (Bradlow, Torretta, & Pisoni, 1996). The pitch contours were superimposed from the start of vocalization of each syllable using the pitch-synchronous overlap-and-add (PSOLA) function in the Praat software. The syllables [pha], [kho], [tha], [thi], [thho] were used to conduct the pre-training and post-training assessment. The syllables [pha], [phi], [phho], [kha], [khi], [khho] were used in the training programs.

For tone 1, the mean fundamental frequency (F0) of each recording was used as the onset value and, as the tone is high level, this frequency was maintained across the syllable. Tone 2 slopes upward 26% to its offset at the same frequency as tone 1.
The onset of tone 1 was used for the onset of tone 4, with the offset being 82% lower. Duration (400 ms) and intensity were normalized across all stimuli, so that the stimuli differed only in syllable and pitch contour. Five native Mandarin-speaking informants judged the stimuli for accuracy of tone contour and returned an accuracy rate of 95% or above.

Discrimination task (pre- and post-training)

During the discrimination task, participants were presented with 288 stimulus pairings and asked to determine whether or not the two stimuli featured the same lexical tone, answering via a button box. Each stimulus pairing featured the same syllable in both stimuli. For example, if syllable one was Ta1 (Ta paired with tone 1), syllable two was also Ta, but may or may not have featured tone 1. Additionally, syllable two featured the same speaker as syllable one, and was of the same token. All pairings were presented in a pseudorandom order, so while all participants received the same stimuli the same number of times in the experiment, the stimuli were not necessarily presented in the same order. At the start of this task, participants were given a six-trial practice period in order to clarify the instructions. They did not receive feedback regarding their responses after this practice period. Participants were given a rest period after 144 trials.
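The F0 targets given in the Stimuli section can be illustrated with a short sketch. This is not the Praat procedure used in the study; it only computes onset and offset values and interpolates between them. Several points are assumptions made for illustration: the function name `tone_contour` is hypothetical, the contours are taken to be linear, the 26% rise of tone 2 is read as an onset 26% below the tone 1 frequency, and the tone 4 offset is read literally as 82% below its onset.

```python
def tone_contour(tone: int, mean_f0: float, n_points: int = 5) -> list[float]:
    """Return n_points F0 targets (Hz), evenly spaced across the syllable.

    Assumptions (not stated in the source): contours are linear, tone 2
    starts 26% below the tone 1 frequency, and tone 4 ends 82% below
    its onset.
    """
    if n_points < 2:
        raise ValueError("need at least an onset and an offset point")
    if tone == 1:    # high level: mean F0 held across the syllable
        onset, offset = mean_f0, mean_f0
    elif tone == 2:  # rising: ends at the tone 1 frequency
        onset, offset = mean_f0 * (1 - 0.26), mean_f0
    elif tone == 4:  # high falling: starts at the tone 1 frequency
        onset, offset = mean_f0, mean_f0 * (1 - 0.82)
    else:
        raise ValueError("only tones 1, 2 and 4 were used in the study")
    step = (offset - onset) / (n_points - 1)
    return [onset + i * step for i in range(n_points)]


# Hypothetical speaker with a mean F0 of 200 Hz.
for tone in (1, 2, 4):
    print(tone, [round(f, 1) for f in tone_contour(tone, 200.0)])
```

In a resynthesis workflow, points like these would be placed along the 400 ms syllable before overlap-add resynthesis; the number of points is arbitrary here.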
Figure 1. Mandarin Chinese lexical tones.

Training tasks

Each training program lasted approximately an hour and a half and featured a six-trial practice section to give participants an opportunity to understand the instructions and ask questions. After the practice period, the training consisted of repeated exercises of the same eight steps (see Figure 2).

Figure 2. General training structure. This figure displays the general structure of both training types. *The critical difference between training programs occurs in steps (6) and (8).

Perception-only training

In each trial, participants:
(1) heard one stimulus/tone;
(2) heard a second stimulus/tone 500 ms after the offset of the first stimulus;
(3) determined whether the lexical tone of syllable two matched that of syllable one and responded via button box;
(4) received visual feedback on the monitor as to the correctness of their judgment;
(5) heard the first stimulus again and were presented with a visual display of the tone contour (see Figure 1 above, but note that only one tone contour was presented);
(6) were prompted to say "next" to proceed to the second stimulus;
(7) heard the second stimulus again and were presented with a visual display of its tone contour; and
(8) were prompted to say "next" to proceed to the next trial.

As in the baseline task, the stimuli in each trial featured different tokens of the same syllable, produced by the same speaker. Each session consisted of 192 trials, half (96) featuring stimuli of the same lexical tone and half featuring stimuli of different tones.

Production training

Production training differed from perception training in steps (6) and (8). In step (6), participants were prompted to reproduce the tone of stimulus one instead of saying "next".
In step (8), participants were prompted to reproduce the tone of stimulus two instead of saying "next". The participants' productions of the lexical tones were recorded with a digital recorder to ensure that they were actually reproducing the syllable and tone and not something else.

General procedure

Each participant completed the experiment over the course of three consecutive days. On Day 1, participants were asked to complete an Informed Consent Form (IRB approved), a demographic questionnaire, and the AMMA assessment. On this day, they also completed the electroencephalography (EEG) portion of the larger study, and a memory and working memory task not reported here. They concluded the first session with the baseline task described above. On Day 2, participants completed one of the two training programs described above. Finally, on Day 3, participants repeated the EEG task (not reported here) and the baseline task.

ANALYSIS AND RESULTS

As already stated, the present study used the behavioral discrimination task described above to gather both pre- and post-training data, and is crucially concerned with the differences (if any) in the level of improvement shown by participants in the perception and production training groups. All of the following statistical calculations were completed in the R environment (Gentleman & Ihaka, 1997).

Steps depicted in Figure 2:
1  Presentation of Syllable 1
2  Presentation of Syllable 2
3  Participant prompted to respond
4  Participant receives feedback on their response
5  Syllable 1 presented again with visual aid
6* Participant prompted to speak to proceed*
7  Syllable 2 presented again with visual aid
8* Participant prompted to speak to proceed*
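The eight training steps, which the two programs share except for steps (6) and (8), can be written out as a small data structure. The sketch below is illustrative only: the function name `trial_steps` and the step wording are hypothetical and are not taken from the experiment software.

```python
def trial_steps(training_type: str) -> list[str]:
    """Return the eight steps of one training trial, in order.

    The two training types share every step except (6) and (8).
    """
    if training_type == "perception":
        step6 = step8 = 'say "next" to proceed'
    elif training_type == "production":
        step6 = "reproduce the tone of stimulus 1"
        step8 = "reproduce the tone of stimulus 2"
    else:
        raise ValueError("training_type must be 'perception' or 'production'")
    return [
        "hear stimulus 1",
        "hear stimulus 2 (500 ms after the offset of stimulus 1)",
        "judge same/different tone via button box",
        "receive visual feedback on the judgment",
        "hear stimulus 1 again with its tone contour displayed",
        step6,
        "hear stimulus 2 again with its tone contour displayed",
        step8,
    ]


perception = trial_steps("perception")
production = trial_steps("production")
# Indices (0-based) of the steps where the two programs differ.
differing = [i for i, (a, b) in enumerate(zip(perception, production)) if a != b]
print([i + 1 for i in differing])  # 1-based step numbers
```

Keeping the shared steps in one list makes it explicit that any group difference can only come from what participants do at the two spoken prompts.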
Discrimination task

The mean and standard deviation of the accuracy of the participants in each group before and after training can be found in Table 2. Importantly, as expected (e.g., Wang et al., 1999), both groups showed an increase in accuracy after training.

Table 2. Baseline Mean (Standard Deviation) Accuracy Across All Trials

Training Group   Pre-Training   Post-Training   Change
Perception       0.74 (0.42)    0.80 (0.36)     0.06
Production       0.82 (0.37)    0.90 (0.25)     0.08

A mixed-effects logit model was then run on the response given by participants (1 for accurate, 0 for incorrect). Training Type, Session (pre- or post-training), and their interaction were included as factors. These factors were centered to reduce collinearity. The factor Subject was included as a random effect, with Session nested within Subject to account for individual variation between participants. While the change in mean accuracy of the production group was approximately 2 percentage points larger than that of the perception group, only the main effect of Session was significant: participants were more accurate in the post-training task than in the pre-training task [β = 0.86, SE = 0.17, z = 4.95, p < 0.0001].

It is worth accounting for the fact that only the syllables [pha] and [kho] were present in both the training programs and the baseline tasks. To see whether Training Type would have an effect on stimuli that were featured in the training programs, the above tests were rerun on a subset of the original data consisting of only the trials that featured these syllables. This subset featured all 14 participants, but only 48 trials per session (a fourth of the original number of trials).
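The remark above that the factors were centered to reduce collinearity can be illustrated numerically. The sketch below is not the model fitted in the study: it assumes a balanced two-by-two design and a ±0.5 coding (the source does not state the exact coding used), and it only compares how strongly the interaction column correlates with a main-effect column under 0/1 coding versus centered coding. The helper `pearson` is written out here for self-containment.

```python
from itertools import product
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


# One row per design cell: (training_type, session), both coded 0/1.
# A balanced design is assumed, so each cell carries equal weight.
cells = list(product([0, 1], [0, 1]))

# 0/1 (dummy) coding: the product term overlaps with the main effect.
dummy_type = [t for t, s in cells]
dummy_interaction = [t * s for t, s in cells]

# Centered coding (-0.5 / +0.5): the product term is orthogonal to it.
centered_type = [t - 0.5 for t, s in cells]
centered_interaction = [(t - 0.5) * (s - 0.5) for t, s in cells]

r_dummy = pearson(dummy_type, dummy_interaction)
r_centered = pearson(centered_type, centered_interaction)
print(f"dummy coding:    r = {r_dummy:.3f}")
print(f"centered coding: r = {r_centered:.3f}")
```

With centered predictors, each main-effect estimate describes the effect averaged over the levels of the other factor, which is how the Session effect reported above is usually read.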
Table 3 displays the mean and standard deviation of the accuracy across trials featuring [pha] and [kho]. In these trials, the production group showed a numeric increase in accuracy that was roughly double that of the perception group.

Table 3. Baseline Mean (Standard Deviation) Accuracy Across Trials Featuring [pha] and [kho]

Training Group   Pre-Training   Post-Training   Change
Perception       0.74 (0.42)    0.79 (0.34)     0.05
Production       0.80 (0.39)    0.91 (0.25)     0.11

The mixed-effects logit model estimated for only the trials featuring [pha] and [kho] still yielded a statistically significant effect of Session [β = 0.99, SE = 0.22, z = 4.50, p < 0.0001], but did not yield a significant effect of Training Type [β = 0.65, SE = 0.50, z = 1.30, p > 0.1] or of the interaction between the two variables [β = 0.64, SE = 0.44, z = 1.45, p > 0.1]. On the basis of the Motor Theory, we had expected an effect of Training Type. This is not borne out in our data. However, the lack of a significant effect of Training Type does not allow us to conclude that the Motor Theory is incorrect.

Training

Since the logistic regression of the pre- and post-training data did not show a significant difference between Training Types, the accuracy of the participants during the training program was analyzed to examine how it changed over the course of training. To achieve this, the training program was divided into quartiles containing 48 trials for each participant. With seven participants per training type, this resulted in 336 trials per quartile. Since this section of the analysis concerns data contained entirely within each training program, and it is always beneficial to maximize the signal-to-noise ratio, there is no need to restrict the data to the trials containing the syllables [pha] and [kho]. Participants' mean accuracy was averaged over these trials to determine the mean accuracy across all participants for each quartile (Figure 3).
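The quartile averaging described above can be sketched as follows. The function name `quartile_means` and the toy responses are hypothetical; the only details taken from the text are that each participant's trials are split into four equal consecutive blocks (48 trials each for a 192-trial session) and that participants' block means are then averaged within a group.

```python
def quartile_means(responses, n_quartiles=4):
    """Mean accuracy per quartile, averaged over participants.

    responses: one list per participant of 0/1 accuracy values in trial
    order; all participants must have the same number of trials.
    """
    n_trials = len(responses[0])
    if any(len(r) != n_trials for r in responses):
        raise ValueError("all participants need the same number of trials")
    if n_trials % n_quartiles != 0:
        raise ValueError("trial count must divide evenly into quartiles")
    size = n_trials // n_quartiles  # 48 when n_trials is 192
    means = []
    for q in range(n_quartiles):
        # Mean accuracy of each participant within this block of trials.
        per_participant = [
            sum(r[q * size:(q + 1) * size]) / size for r in responses
        ]
        means.append(sum(per_participant) / len(per_participant))
    return means


# Toy data (not from the study): two participants, eight trials each.
toy = [
    [0, 1, 1, 1, 1, 0, 1, 1],
    [0, 0, 1, 0, 1, 1, 1, 1],
]
print(quartile_means(toy))
```

Because every participant contributes the same number of trials to each quartile, averaging participant means gives the same value as pooling all 336 trials in a quartile.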
Figure 3. Mean accuracy of Training Type by quartile. This graph depicts the mean accuracy of participants in each training group split by quartiles of 48 trials. (Axes: Quartile, 1 to 4; Accuracy, 0 to 1. Series: Perception, Production.)

Descriptively, while the perception group shows a steady increase in mean accuracy until the fourth quartile, the production group dips in mean accuracy during the second quartile before evening out 0.89% below the mean accuracy in the first quartile. Additionally, there was no change in mean accuracy between quartiles three and four for the production group. The mean accuracy per quartile of each production group participant is shown in Figure 4. The mean accuracy per quartile of each participant in the production group mimics that of the group as a whole. No individual participant displays a consistent upward trend, and participant number three even shows a consistent downward trend. The increase in accuracy that the production group displayed in the post-training discrimination task must therefore either occur very early in the training program and plateau quickly, or emerge over a larger number of trials than considered here; further testing of a longer duration would be required.
Figure 4. Mean accuracy of production group participants by quartile. This graph displays the accuracy of the production group participants (participant numbers 1, 3, 5, 8, 10, 11, and 14) throughout the duration of the training program.

To confirm the conclusions drawn from the above data, a mixed logit model was run on the accuracy of all participants, with Training Type, Trial Number (where one Trial is defined as the completion of the eight steps depicted in Figure 2), and the interaction between these variables as factors. As predicted by the previous analysis of the change in mean accuracy across quartiles, the mixed logit model does not indicate a significant effect of Trial Number [β = 0.0015, SE = 0.0012, z = 1.19, p > 0.1].

Summary of Results

The results of the above analysis reveal that (a) Session (pre- vs. post-training) is significant in predicting an individual's lexical tone discrimination accuracy, (b) Training Type (perception vs. production) is not significant in predicting an individual's accuracy, and (c) Trial Number is not significant in predicting an individual's accuracy during a training program.

DISCUSSION

This study was conducted to determine the effect of lexical tone production in tone discrimination training by comparing the effect of perception and production training on the accuracy of native English speakers in Mandarin lexical tone discrimination tasks. Fourteen native English speakers were randomly assigned to one of two training groups: perception and perception-plus-production. Based on the Motor Theory, it was hypothesized that the participants who were asked to produce the Mandarin lexical tones used in this study (tones one, two, and four) would outperform those who underwent only perception training in a tone discrimination task.
A baseline discrimination task was administered before and after the training session, and participants' accuracy in these tasks was used to determine the effect of production on accuracy. Mixed logit models revealed that the data collected in the present study do not support the stated hypothesis and that only Session (pre- vs. post-training) was significant in predicting accuracy. This study therefore replicated the results of other studies that examined lexical tone discrimination and found that non-native speakers can increase in discrimination accuracy (e.g., Wong & Perrachione, 2007).

Further analysis of the accuracy of participants throughout the training programs yielded interesting results: though Trial Number (where one Trial is defined as the completion of the eight steps described in Figure 2) was not significant in predicting accuracy (i.e., participants did not become more or less accurate at a significant rate over the duration of the training), participants in the perception group numerically displayed a steady upward trend in accuracy over the duration of the training program, whereas participants in the production group actually dipped in accuracy during the middle of the training.

Ultimately, given the small n, it is difficult to discern from these data why Training Type and Trial Number had no significant effect. Descriptively, while both training types replicated an increase in accuracy, the increase exhibited by the production group is notably higher than that exhibited by the perception group. It is possible that, with so few participants in the study, a large enough number of participants had a natural proficiency for tone discrimination to skew the data and nullify any informative results.
While these data do not support the stated hypothesis that the inclusion of production in lexical tone discrimination training would augment any increase in discrimination accuracy, the null results do not constitute evidence against it and warrant further inquiry.

FURTHER INVESTIGATION

Further testing is necessary in order to clarify the null results reported here. A larger number of participants in both training groups would provide a more accurate measure of the significance of main effects. Additionally, while several studies have documented that individuals without native experience of a tonal language can improve their ability to discriminate between lexical tones through short-term training paradigms, there are no data, to our knowledge at the time of publication, that describe the rate of acquisition of the ability to successfully discriminate between tones. The data examined in the present study are interesting in that participants in the perception and production training paradigms did not exhibit the same trend of increasing accuracy throughout the duration of the training program. Additional data from significantly longer training exercises would be necessary to determine whether there is a reliable upward trend in accuracy over the duration of training exercises, and if and when the effect tapers off.

(Figure 4 axes: Participant, numbers 1, 3, 5, 8, 10, 11, and 14; Mean Accuracy, 0 to 1. Series: quartiles 1 to 4.)
REFERENCES

Best, C. T. (1995). A direct realist view of cross-language speech perception. In W. Strange (Ed.), Speech Perception and Linguistic Experience: Issues in Cross-Language Research (pp. 171-204). Baltimore: York Press.

Boersma, P., & Weenink, D. (2013). Praat: Doing Phonetics by Computer [Computer program]. Version 5.3.20, retrieved 5 July 2012 from http://www.praat.org/

Bradlow, A. R., Torretta, G. M., & Pisoni, D. B. (1996). Intelligibility of normal speech I: Global and fine-grained acoustic-phonetic talker characteristics. Speech Communication, 20, 255-272.

Gentleman, R., & Ihaka, R. (1997). The R Project for Statistical Analysis [Computer program]. Version R 2.15.2 GUI 1.53 Leopard build 64-bit, retrieved 13 December 2012 from www.r-project.org/

Gordon, E. E. (1989). Manual for the Advanced Measures of Music Audiation. Chicago: GIA.

Kirkham, J., Lu, S., Wayland, R., & Kaan, E. (2011). Comparison of vocalists and instrumentalists on lexical tone perception and production tasks. Proceedings of the 17th International Congress of Phonetic Sciences, 1098-1101.

Liberman, A. M., Cooper, F. S., Shankweiler, D. P., & Studdert-Kennedy, M. (1967). Perception of the speech code. Psychological Review, 74, 431-461.

Oldfield, R. C. (1971). The assessment and analysis of handedness: The Edinburgh inventory. Neuropsychologia, 9, 97-113.

Wang, Y., Spence, M. M., Jongman, A., & Sereno, J. A. (1999). Training American listeners to perceive Mandarin tone. Journal of the Acoustical Society of America, 106, 3649-3658.

Wayland, R., & Guion, S. (2004). Training native English and native Chinese speakers to perceive Thai tones. Language Learning, 54(4), 681-712.

Wong, P. C. M., Skoe, E., Russo, N. M., Dees, T., & Kraus, N. (2007). Musical experience shapes human brainstem encoding of linguistic pitch patterns. Nature Neuroscience, 10(4), 420-422.

Wong, P. C. M., & Perrachione, T. K. (2007). Learning pitch patterns in lexical identification by native English-speaking adults. Applied Psycholinguistics, 28, 565-585.