Intelligibility of speech processed through the cochlea of fetal sheep in utero


Material Information

Title:
Intelligibility of speech processed through the cochlea of fetal sheep in utero
Physical Description:
xii, 190 leaves : ill. ; 29 cm.
Language:
English
Creator:
Huang, Xinyan, 1964-
Publication Date:

Subjects

Subjects / Keywords:
Sheep -- Fetuses   ( lcsh )
Cochlea   ( lcsh )
Sheep -- Physiology   ( lcsh )
Language acquisition -- Fetuses   ( lcsh )
Communication Sciences and Disorders thesis, Ph. D   ( lcsh )
Dissertations, Academic -- Communication Sciences and Disorders -- UF   ( lcsh )
Genre:
bibliography   ( marcgt )
non-fiction   ( marcgt )

Notes

Thesis:
Thesis (Ph. D.)--University of Florida, 1999.
Bibliography:
Includes bibliographical references (leaves 178-189).
Statement of Responsibility:
by Xinyan Huang.
General Note:
Typescript.
General Note:
Vita.

Record Information

Source Institution:
University of Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 030474611
oclc - 43413630
System ID:
AA00013620:00001












INTELLIGIBILITY OF SPEECH PROCESSED THROUGH
THE COCHLEA OF FETAL SHEEP IN UTERO













By

XINYAN HUANG


A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

























Dedicated to my wife, Min Feng














ACKNOWLEDGMENTS


First and foremost, I would like to express my sincerest appreciation and gratitude

to my committee chairman and mentor, Dr. Kenneth Gerhardt, for his constant guidance,

support, encouragement, and invaluable contribution to my professional and personal

development. I would like to respectfully thank Dr. Robert Abrams for the constant

support and encouragement he gave me regarding science, academics, and American

culture.

Secondly, my thanks go to my committee members, Dr. Scott Griffiths, Dr.

Francis Joseph Kemker, and Dr. Kyle Rarey, for their thoughtful suggestions and support.

I would like to thank the faculty and staff in the Department of Communication Sciences

and Disorders and in the Perinatology Research Laboratory for their valuable help during

this study. I especially thank Mr. Rodney Housen for his computer programming

assistance.

Finally, I wish to express my deepest appreciation and thanks to my wife.

Without her love, patience, understanding, and continued support, this endeavor would

have not been possible. My love and appreciation are imparted to my parents whose

inspiration has kept me in constant pursuit of my dreams.















TABLE OF CONTENTS
page


ACKNOWLEDGMENTS ............................................................................................ iii

LIST OF TABLES ..................................................................................................... vi

LIST OF FIGURES ................................................................................................... ix

ABSTRACT ............................................................................................................... xi

CHAPTERS

1 INTRODUCTION .................................................................................................

2 REVIEW OF LITERATURE ................................................................................ 6

Fetal Hearing ........................................................................................................ 6
Development of the Auditory System ........................................................... 6
Development of the Place Principle .............................................................. 9
Central Auditory System .............................................................................. 13
Fetal Behavioral Response to Sound ............................................................ 14
Fetal Sound Environment .................................................................................. 16
Intrauterine Background Noise .................................................................... 16
Sound Transmission into the Uterus ............................................................ 19
Fetal Sound Isolation ................................................................................... 21
Route of Sound Transmission into the Fetal Inner Ear ...................................... 23
Model of Fetal Hearing ...................................................................................... 25
Intelligibility of Speech Sounds Recorded within the Uterus ............................ 27
Fetal Auditory Experiences and Learning .............................................................. 31
Prenatal Effects of Sound Experience ............................................................... 31
Postnatal Effects of Prenatal Sound Experience ............................................... 35
Speech Perception ................................................................................................... 44
Speech Perception in Infancy ............................................................................ 44
Characteristics of Speech ................................................................................... 45
Intelligibility of Speech ..................................................................................... 48

3 MATERIALS AND METHODS ........................................................................ 56

Surgery .................................................................................................................... 56










Recording Speech Stimuli ....................................................................................... 58
Perceptual Testing ................................................................................................... 62
Subjects ............................................................................................................. 62
Speech Stimuli .................................................................................................. 62
Procedures ......................................................................................................... 64
Data Analyses ......................................................................................................... 65
Statistical Analyses ........................................................................................... 65
Information Analyses ........................................................................................ 67
Acoustic Analyses ............................................................................................. 68

4 RESULTS AND DISCUSSION .......................................................................... 70

Intelligibility ........................................................................................................... 70
Consonant Feature Transmission ............................................................................ 94
Acoustic Analyses of Vowel Transmission .......................................................... 117

5 SUMMARY AND CONCLUSIONS ................................................................ 153

APPENDICES

A SUBJECT RESPONSE SHEET ........................................................................ 158

B RAW DATA FROM SUBJECT RESPONSE FORMS ................................... 161

C RAW DATA FROM ACOUSTIC ANALYSES OF VOWELS ...................... 169

REFERENCES ......................................................................................................... 178

BIOGRAPHICAL SKETCH ................................................................................... 190














LIST OF TABLES


Table                                                                                                                     page

3-1 Perceptual tests .................................................................................................. 63

4-1 VCV stimulus intelligibility scores ................................................................... 76

4-2 CVC stimulus intelligibility scores ................................................................... 77

4-3 ANOVA summary table for VCV stimuli ........................................................ 78

4-4 Post hoc multiple comparisons (Newman-Keuls test) for VCV stimuli ........... 79

4-5 ANOVA summary table for CVC stimuli ........................................................ 80

4-6 Post hoc multiple comparisons (Newman-Keuls test) for CVC stimuli ........... 81

4-7 Consonant confusion matrix for male talker, recorded in air at 105 dB SPL ... 95

4-8 Consonant confusion matrix for male talker, recorded in air at 95 dB SPL ..... 96

4-9 Consonant confusion matrix for male talker, recorded in the uterus at 105 dB
SPL ...................................................................................................................... 97

4-10 Consonant confusion matrix for male talker, recorded in the uterus at 95 dB
SPL ...................................................................................................................... 98

4-11 Consonant confusion matrix for male talker, recorded from CM-ex utero at 105
dB SPL ................................................................................................................ 99

4-12 Consonant confusion matrix for male talker, recorded from CM-ex utero at 95 dB
SPL .................................................................................................................... 100

4-13 Consonant confusion matrix for male talker, recorded from CM-in utero at 105
dB SPL .............................................................................................................. 101

4-14 Consonant confusion matrix for male talker, recorded from CM-in utero at 95 dB
SPL .................................................................................................................... 102











4-15 Consonant confusion matrix for female talker, recorded in air at 105 dB SPL .. 103

4-16 Consonant confusion matrix for female talker, recorded in air at 95 dB SPL .... 104

4-17 Consonant confusion matrix for female talker, recorded in the uterus at 105 dB
SPL .................................................................................................................... 105

4-18 Consonant confusion matrix for female talker, recorded in the uterus at 95 dB
SPL .................................................................................................................... 106

4-19 Consonant confusion matrix for female talker, recorded from CM-ex utero at 105
dB SPL .............................................................................................................. 107

4-20 Consonant confusion matrix for female talker, recorded from CM-ex utero at 95
dB SPL .............................................................................................................. 108

4-21 Consonant confusion matrix for female talker, recorded from CM-in utero at 105
dB SPL .............................................................................................................. 109

4-22 Consonant confusion matrix for female talker, recorded from CM-in utero at 95
dB SPL .............................................................................................................. 110

4-23 Conditional percentage of voicing, manner, and place information received (of
bits sent) for each talker, recording location, and stimulus level condition for the
nonsense syllables (VCV) ................................................................................. 112

4-24 Average fundamental frequencies (F0) and first three formant frequencies (F1, F2,
F3) for five vowels produced by each talker and recorded in air ...................... 128

4-25 Mean and standard deviation (S.D.) of relative intensity levels (dB) of
fundamental frequency (F0) and first three formant frequencies (F1, F2, F3) for
vowel /i/ produced by each talker at different recording sites in the 105 dB
condition ........................................................................................................... 129

4-26 Mean and standard deviation (S.D.) of relative intensity levels (dB) of
fundamental frequency (F0) and first three formant frequencies (F1, F2, F3) for
vowel /ɪ/ produced by each talker at different recording sites in the 105 dB
condition ........................................................................................................... 135

4-27 Mean and standard deviation (S.D.) of relative intensity levels (dB) of
fundamental frequency (F0) and first three formant frequencies (F1, F2, F3) for
vowel /ɛ/ produced by each talker at different recording sites in the 105 dB
condition ........................................................................................................... 138










4-28 Mean and standard deviation (S.D.) of relative intensity levels (dB) of
fundamental frequency (F0) and first three formant frequencies (F1, F2, F3) for
vowel /æ/ produced by each talker at different recording sites in the 105 dB
condition ........................................................................................................... 143

4-29 Mean and standard deviation (S.D.) of relative intensity levels (dB) of
fundamental frequency (F0) and first three formant frequencies (F1, F2, F3) for
vowel /ʌ/ produced by each talker at different recording sites in the 105 dB
condition ........................................................................................................... 146

4-30 Summary of acoustic analyses of vowels ......................................................... 150














LIST OF FIGURES


Figure                                                                                                                   page

3-1 Schematic drawing showing the animal and the setup of devices for stimulus
generation, stimulus measurement, and recording in air, in the uterus, and from
the fetal inner ear (cochlear microphonic) .......................................................... 59

3-2 Examples of CMs evoked by airborne pure tones at 0.5 and 2.0 kHz .................. 61

3-3 The frequency responses of two types of earphones ............................................ 66

4-1 Mean percent intelligibility of VCV nonsense stimuli spoken by a male and a
female talker recorded in air, in the uterus, from the fetal CM ex utero, and from
fetal CM in utero at two airborne stimulus levels ............................................... 72

4-2 Mean percent intelligibility of CVC words spoken by a male and a female talker
recorded in air, in the uterus, from the fetal CM ex utero, and from fetal CM in
utero at two airborne stimulus levels ................................................................... 74

4-3 Mean percent intelligibility of VCV nonsense stimuli spoken by a male and a
female talker recorded in air, in the uterus, from the fetal CM ex utero, and from
fetal CM in utero when combining two airborne stimulus levels ....................... 84

4-4 Mean percent intelligibility of CVC words spoken by a male and a female talker
recorded in air, in the uterus, from the fetal CM ex utero, and from fetal CM in
utero when combining two airborne stimulus levels ........................................... 87

4-5 Conditional percentage of voicing, manner, and place information received for a
male (M) and a female (F) talker; in air (A), in the uterus (U), from the fetal CM
ex utero (X), and from the fetal CM in utero (I); at 105 dB (H) and 95 dB (L)
SPL ..................................................................................................................... 114

4-6 Spectrographic recordings of "Mark the word lash" at different recording
conditions .......................................................................................................... 119

4-7 Mean of intensity levels (dB relative) of fundamental frequency (F0) and first
three formant frequencies (F1, F2, and F3) for vowel /i/ produced by both talkers
recorded at different locations at 105 dB SPL .................................................. 131










4-8 Mean of intensity levels (dB relative) of fundamental frequency (F0) and first
three formant frequencies (F1, F2, and F3) for vowel /ɪ/ produced by both talkers
recorded at different locations at 105 dB SPL .................................................. 137

4-9 Mean of intensity levels (dB relative) of fundamental frequency (F0) and first
three formant frequencies (F1, F2, and F3) for vowel /ɛ/ produced by both talkers
recorded at different locations at 105 dB SPL .................................................. 140

4-10 Mean of intensity levels (dB relative) of fundamental frequency (F0) and first
three formant frequencies (F1, F2, and F3) for vowel /æ/ produced by both talkers
recorded at different locations at 105 dB SPL .................................................. 145

4-11 Mean of intensity levels (dB relative) of fundamental frequency (F0) and first
three formant frequencies (F1, F2, and F3) for vowel /ʌ/ produced by both talkers
recorded at different locations at 105 dB SPL .................................................. 148














Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

INTELLIGIBILITY OF SPEECH PROCESSED THROUGH
THE COCHLEA OF FETAL SHEEP IN UTERO

By

Xinyan Huang

August 1999

Chairman: Kenneth J. Gerhardt
Major Department: Communication Sciences and Disorders

The intelligibility of speech stimuli recorded from the fetal sheep inner ear

(cochlear microphonic, CM) in utero was determined perceptually using a group of

untrained judges. A fetus was prepared for acute recordings during a surgical procedure.

Two separate lists, one of meaningful and one of nonmeaningful speech, were spoken by a

male and a female talker, delivered through a loudspeaker to the side of a pregnant ewe,

and recorded with an air microphone, a hydrophone placed inside the uterus, and an

electrode secured to the round window of the fetus in utero. Perceptual test audio compact

discs (CDs) generated from these recordings were played to 139 judges.

The intelligibility of the phonemes recorded in air was significantly greater than the

intelligibility of these stimuli when recorded from within the uterus. The intelligibility of

the phonemes recorded from CM ex utero was significantly greater than from CM in utero.

Overall, male and female talker intelligibility scores recorded within the uterus averaged









91% and 85%, respectively. When recorded from the fetal CM in utero, intelligibility

scores averaged 45% and 42% for the male and female talkers, respectively.

An analysis of the transmission of consonant feature information revealed that

"voicing" is better transmitted into the uterus and into the fetal inner ear in utero than

"manner" or "place." Voicing information for the male, as well as manner and place

information, was better preserved in the fetal inner ear in utero than for the female.

Spectral analyses of vowels showed that the fundamental frequency (F0) and the first

three formants (F1, F2, and F3) were well preserved in the uterus recordings for both talkers,

but only F0, F1, and F2 (< 2000 Hz) were preserved in the fetal inner ear in utero. Only the

lower frequency contents of vowels were present in fetal inner ear recordings.

This study demonstrated the presence of external speech signals in the fetal inner

ear in utero and described the type of phonetic information that was detected at the fetal

inner ear in utero.














CHAPTER 1
INTRODUCTION



There is overwhelming evidence that the human fetus detects and responds to

sound in utero (Querleu et al., 1989; Hepper, 1992; Lecanuet and Schaal, 1996). Studies

in pregnant humans (Walker, Grimwade and Wood, 1971; Querleu et al., 1988a; Richards

et al., 1992) and sheep (Armitage, Baldwin and Vince, 1980; Vince et al., 1982, 1985;

Gerhardt, Abrams and Oliver, 1990) have shown the existence of a rich diversity of

sound in the fetal environment, heavily dominated by the mother's voice and other

internal noises and permeated by varied rhythmic and tonal sounds from the external

environment. The human fetus has a well-developed hearing mechanism by the sixth

month of gestation (Rubel, 1985a; Pujol and Uziel, 1988; Pujol, Lavigne-Rebillard and

Uziel, 1990). During the last trimester, sound exposure may have a pronounced effect on

fetal behavior and central nervous system maturation. Speech perception and voice

recognition by the newborn may result directly from its prenatal experience (Fifer and

Moon, 1988, 1995).

Linguistic theorists have proposed two alternative hypotheses regarding language

development: that infants at birth are equipped with either a generalized auditory

mechanism or a specialized speech-specific mechanism designed for perception of

speech. Some theorists hold that human infants are born with a "speech module," a

mechanism designed specifically for processing the complex and intricate acoustic








signals needed by humans to communicate with one another (Liberman, 1982; Fodor,

1983; Liberman and Mattingly, 1985; Wilkins and Wakefield, 1995; Fowler, 1996). An

alternative theory of the neonate's initial state suggests that infants enter the world

without specialized mechanisms dedicated to speech and language, but rather respond to

speech using general sensory, motor, and cognitive abilities (Aslin, 1987; Kuhl, 1987,

1992; Jusczyk 1996; Ohala, 1996; Fitch, Miller and Tallal, 1997). Which theory, if

either, applies to the human fetus is not known. What is known is that the fetus is

beginning the dynamic process of acquiring the necessary skills for speech and language

acquisition during prenatal life in utero (Querleu et al., 1989; Lecanuet, Granier-Deferre

and Busnel, 1991; Lecanuet and Schaal, 1996).

The maternal voice is a naturally occurring and salient stimulus in utero that

occurs during a crucial time period of fetal ontogeny (Querleu et al., 1988a; Benzaquen et

al., 1990; Richards et al., 1992) in which several psychobiological systems, including the

auditory system, are developing. The immediate effects of exposure to the mother's

voice on the fetus may provide a way of tracking auditory system development, as well as

measuring fetal ability to process sensory information (Fifer and Moon, 1988, 1994,

1995). Fetal auditory discrimination has also led to the hypothesis that prenatal

experience with auditory stimulation is the precursor to postnatal linguistic development

(Cooper and Aslin, 1989; Querleu et al., 1989; Ruben, 1992; Abrams, Gerhardt and

Antonelli, 1998).

DeCasper and his colleagues (DeCasper and Fifer, 1980; DeCasper and Prescott,

1984) demonstrated that newborn infants preferred their mother's voice over that of other

talkers. While this preference was assumed to be the product of in utero exposure to the








mother's voice and suggested that the fetus detected maternal vocalizations and retained

memories of her speech patterns, it is not known what speech information actually

reaches the fetal inner ear nor the extent to which the auditory system responds to

externally generated speech. Querleu et al. (1988b) and more recently Griffiths et al.

(1994) reported on the intelligibility of speech recorded with a hydrophone in the human

(Querleu et al., 1988b) and sheep (Griffiths et al., 1994) uterus. In both studies, the

recordings were played back to juries of normal listeners and speech intelligibility was

calculated from their responses. The intelligibility of in utero recordings of speech was

poorer than that of air recordings because the acoustic signature of human speech is

modified by the abdominal wall, uterus, and amniotic fluids as it passes from air to the

fetal head. The attenuation properties of the abdomen and uterus can be modeled as a

low-pass filter with a high frequency cutoff at 250 Hz and a rejection rate of

approximately 6 dB/octave (Gerhardt, Abrams and Oliver, 1990).
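
The filter model described above can be restated numerically. The following Python sketch is not from the dissertation; the function name and the simple piecewise form are assumptions used only to illustrate the published parameters (250-Hz cutoff, roughly 6 dB/octave rejection above it).

```python
import math

def uterine_attenuation_db(freq_hz, cutoff_hz=250.0, rolloff_db_per_octave=6.0):
    """Illustrative low-pass model of the maternal abdomen and uterus:
    negligible attenuation up to the 250-Hz cutoff, then about 6 dB per
    octave above it (parameters from Gerhardt, Abrams and Oliver, 1990).
    Function name and piecewise form are this sketch's assumptions."""
    if freq_hz <= cutoff_hz:
        return 0.0  # speech energy below the cutoff passes largely unattenuated
    # Each doubling of frequency above the cutoff adds one octave of rolloff.
    octaves_above_cutoff = math.log2(freq_hz / cutoff_hz)
    return rolloff_db_per_octave * octaves_above_cutoff

# Example: 2000 Hz (near a typical second-formant region) lies three
# octaves above 250 Hz, so the model predicts about 18 dB of attenuation.
print(round(uterine_attenuation_db(2000.0), 1))  # prints 18.0
```

Under this model, the fundamental frequency of either talker (well below 250 Hz) passes essentially unattenuated, while higher-formant energy is progressively suppressed, consistent with the intelligibility losses reported for in utero recordings.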

While the results of these studies reflect the perceptibility of the speech energies

present in the amniotic fluid, they do not specify what speech energy might be present at

the level of the fetal inner ear. Measures of acoustic transmission to the fetal inner ear are

quite limited at present (Gerhardt et al., 1992). Much work needs to be completed before

conclusions can be drawn regarding what speech energies reach the fetal inner ear and can be

perceived by the fetus.

The present experiment was designed to evaluate the intelligibility of speech

produced through a loudspeaker and recorded with an electrode secured to the fetal sheep

round window. The electrode recorded a bioelectric potential called the cochlear

microphonic (CM). The CM is generated at the level of the hair cells and mimics the







input in amplitude and frequency (Gulick, Gescheider and Frisina, 1989). Recordings of

the CM represent the time displacement patterns of the basilar membrane and reflect the

initial response of the auditory periphery. The hypothesis is that speech is further

degraded as it passes into the inner ear. Sheep were used in this study not only because

sound attenuation characteristics of the abdominal contents of pregnant sheep are similar

to those of pregnant women (Armitage, Baldwin and Vince, 1980; Querleu et al., 1988a;

Gerhardt, Abrams and Oliver, 1990; Richards et al., 1992), but also because of their

precocious hearing and the similarity of their auditory sensitivity to that of humans. Sheep's hearing

is only slightly poorer than that of humans for frequencies below about 8000 Hz

(Wollack, 1963). The objective of this study was to determine what speech information

was transmitted into the uterus and presented within the inner ear of the sheep fetus in

utero.

The following hypotheses were tested:

1. The intelligibility of monosyllabic words and nonsense syllables will be reduced when

recorded in the uterus compared to air.

2. The intelligibility of monosyllabic words and nonsense syllables will be reduced when

recorded from the fetal inner ear in utero compared to the uterus.

3. The intelligibility of a male talker will be greater than the intelligibility of a female

talker when recorded in the uterus and from the fetal inner ear in utero.

4. Transmission into the uterus and fetal inner ear will be greater for voicing

information than for manner and place information.







5. The transmission of voicing, manner, and place information will be better for males

than for females when recorded in the uterus and from the inner ear of the fetus in

utero.

6. Acoustic energy in the second and third formants of vowels measured in air for both

male and female talkers will be reduced when recorded in the uterus, and will be

reduced to the noise floor when recorded from the fetal inner ear in utero.














CHAPTER 2
REVIEW OF LITERATURE



The human, unlike most mammalian species, is born with highly developed

auditory sensitivity. By the 20th week of gestation, the structures of the peripheral

auditory system, including the outer, middle, and inner ear, are anatomically like that of

an adult, thus enabling the fetus to detect sounds during the last trimester of pregnancy

(Rubel, 1985a; Pujol and Uziel, 1988; Pujol, Lavigne-Rebillard and Uziel, 1990).

Responsiveness of the fetus to auditory stimuli begins during the 24th week of gestation

(Birnholz and Benacerraf, 1983; Shahidullah and Hepper 1993). Maturation of auditory

processing capabilities takes place through prenatal and perinatal periods. An

appreciation of the process of auditory development is important not only for an

understanding of the normal auditory system, but also for an understanding of the impact

of prenatal sound experience on postnatal development, from structural and functional to

behavioral development (Lecanuet and Schaal, 1996).



Fetal Hearing



Development of the Auditory System

The earliest embryological signs of the human auditory apparatus are thickenings

of the ectoderm on the sides of the head, bilaterally, called the auditory placodes. About








the 23rd day of gestational age (GA), each placode begins to invaginate to form the

auditory pit, which then splits off from the overlying ectoderm to form an otocyst at the

30th day. At about 4 to 5 weeks, the otocyst divides into two parts, the vestibular portion

and the cochlea. During the 8th through 11th week, the two and a half coils of the

cochlea are attained. Complete maturation of sensory and supporting cells in the cochlea

does not occur until the 20th week when the cochlea reaches adult size (Northern and

Downs, 1991; Peck, 1994). Cytodifferentiation occurs during the 9th to 10th weeks

within the cochlear duct, where there is a thickening of epithelium. From the 3rd to the

5th month, the thickening epithelium differentiates into the distinct receptor and

supporting cells of the organ of Corti.

Compared with the stage in other mammals at which the first responses to sound

can be evoked, the human cochlea has achieved a functional stage by 20 weeks of

gestation (Pujol and Uziel, 1988). At this time, the cochlea may have high thresholds and

very poor discriminative properties. It is thus not possible to detect signs of cochlear

activity using behavioral or electrophysiological methods, which explains why the first

responses to acoustic stimulation can only be recorded a few weeks later (Starr et al.,

1977; Birnholz and Benacerraf, 1983).

Rubel (1984) indicated that no single event triggers the onset of cochlear function.

Many simultaneous and synchronous events contribute to the maturation of mechanical

and neural properties. These events include thinning of the basilar membrane, formation

of the inner spiral sulcus, maturation of the pillar cells, freeing of the inferior margin of

the tectorial membrane, opening of the tunnel of Corti, formation of Nuel's spaces,







differentiation of the hair cells, establishment of mature cilia structure, and the maturation

of synapses (Pujol and Hilding, 1973).

These final maturational events do not occur simultaneously throughout the length

of the cochlea. There are two general developmental gradients in the differentiation and

maturation of cochlear hair cells and their neural connections. The first is the classic

basal-to-apical gradient: at each maturational stage, the mid-basal region develops first and

maturation spreads in both directions, with the apex maturing last. The second gradient is from

inner hair cells (IHCs) to outer hair cells (OHCs); IHCs differentiate and develop first

(Pujol and Uziel, 1988; Pujol, Lavigne-Rebillard and Lenoir, 1998). This does not

necessarily imply that IHCs are the first to achieve all adult characteristics. For example,

the completion of the ciliogenesis process occurs first at OHCs. Generally, synapse

formation on IHCs occurs early and undergoes only minor modifications thereafter. The

OHCs are initially surrounded by afferent terminals, which are gradually replaced by

numerous efferents. Then the large calyciform efferent terminals form, typical of the

mature cochlea.

Based on cat studies, the functional development of the auditory system is divided

into three stages (Walsh and McGee, 1990). During the first stage, which is through the

cats' first postnatal week and corresponds to the second trimester of human gestation,

auditory responses can be elicited, but hearing thresholds are very high and well outside

of the range of naturally occurring acoustic events. Response sensitivity does not

significantly improve during this stage and the responsive frequency range is limited to

low-frequency and mid-frequency sounds. During the second stage, in cats through the

third postnatal week and in humans probably through the final trimester, rapid maturation








of auditory function takes place. Thresholds decrease substantially, the adult frequency

response range is attained, and response duration is perceived. These changes are

attributable in large part to cochlear maturation, and to a lesser extent to maturation of the

central auditory system. During the final developmental stage, the remaining

components within the auditory system mature slowly and myelination is complete. The

adult characteristics for the cat are acquired during the second month after birth.

However, further maturation of the human auditory system occurs after birth and

continues for the next few years.



Development of the Place Principle

Young mammals do not respond initially to all of the frequencies to which they

respond as adults. Generally, initial responses are elicited by low- or mid-frequency

sounds. As development proceeds, responsiveness to both lower and higher frequencies

increases. Responsiveness to the highest frequencies develops last (Rubel, 1978; Rubel,

1985a). However, cochlear differentiation occurs first in basal or mid-basal high-

frequency regions, then spreads in both directions. The last part of the cochlea to

undergo differentiation is the apical, low-frequency region (Rubel, 1978). A similar

differentiation gradient also occurs in eighth-nerve ganglion cells and cochlear nuclei;

regions receiving input from the basal, high-frequency region of the cochlea mature prior

to the development of apical projection areas (Romand and Romand, 1982; Rubel, Smith

and Miller, 1976; Schweitzer and Cant, 1984).

A paradox of cochlear development was pointed out by Rubel in 1978. During

the early stages of hearing, the base or mid-basal region of the cochlea and the basal








representation areas of the central nervous system are the first to respond to sound.

However, these areas are initially most sensitive to relatively low-frequency sound, even

though this region of the cochlea will ultimately be tuned to respond to high-frequency

sound. With maturation of both mechanical and neural properties of the cochlea, the

place code gradually shifts toward the apex until mature organization is achieved.

In an effort to understand more fully the mechanisms underlying this apparent

paradox, Rubel and Ryals (1983) studied the position of hair cell damage produced by

high-intensity pure tones of three different frequencies on three age groups of young

chicks. The results showed that the position of maximum damage produced by each

frequency shifted systematically toward the apex as a function of age. This experiment

was carried out during the late stages of hearing development in the chick, corresponding

to the perinatal or immediate postnatal periods in humans. In a related study, Lippe and

Rubel (1983) evaluated the relationship between the location of neurons of the brainstem

in chicks (nucleus magnocellularis and nucleus laminaris) and the frequency to which

they were most sensitive. In both nuclei of the brainstem, embryonic neurons were most

sensitive to tones 1-1.5 octaves below the frequencies that activate the same neurons one

to two weeks after hatching. These two experiments provided support for the model of

cochlear development offered by Rubel in 1978.

Later investigations, again in chicks, revealed some inconsistencies in the theory

developed by Rubel (1978). The discrepancy between these studies may be attributed to

developmental changes in middle-ear transfer function, the changes of the physical size

of the basilar papilla, and temperature effects on frequency tuning (Rübsamen and Lippe,

1998). Currently, there are two alternative hypotheses for the development of the








cochlear frequency map in chicks. One theory suggests that frequency representation

does not change developmentally. Another theory proposes that frequency representation

shifts developmentally but that the shift is restricted to regions along the papilla that code

mid- and high-frequency sounds, while low-frequency sounds are always represented at

the apical location. Responses to mid-frequency sounds occur progressively more

apically as the base becomes responsive to high-frequency sounds (Rübsamen and Lippe,

1998).

Dallos and his colleagues (Harris and Dallos, 1984; Yancey and Dallos, 1985;

Arjmand, Harris and Dallos, 1988) studied the developmental change of the place code

in gerbils. They reported that the cutoff frequency of the cochlear microphonic (CM) and

the summating potential in the mid-basal turn (15 kHz location) increased about 1.5 to 2

octaves between the onset of sound-evoked responses on the 12th postnatal day and the

time when frequency representation becomes adultlike on the 21st postnatal day. In

contrast, the cutoff frequency of the CM at a second-turn location (2.5 kHz) remained stable during

development.

More direct evidence was provided by the finding that the characteristic

frequencies of spiral ganglion neurons at a constant basal cochlear location increased up

to 1.5 octaves between the second and third postnatal weeks (Echteler, Arjmand and

Dallos, 1989). It has been uniformly reported that tonotopic organization in the mid- and

high-frequency regions of the cochlea and central auditory nuclei changes during

development. However, tonotopy in the cochlear apex and its central projection sites

appeared to be developmentally stable (Rübsamen and Lippe, 1998). As a result of this

new information, two updated explanations for the place code have been proposed. First,








the shifts in frequency code are attributed to maturational changes in the passive

mechanical properties of the cochlea (Lippe and Rubel, 1985). Second, Romand (1987)

proposed that the shifts in frequency organization should be attributed to maturational

changes in cochlear active processes mediated by the outer hair cells. Both factors were

examined by comparing tone-evoked distortion product otoacoustic emissions before and

after an injection of furosemide in gerbils between 14 days old and adult (Mills, Norton

and Rubel, 1994; Mills and Rubel, 1996). Results showed that increase in the passive

base cutoff frequency rather than maturational changes in active processes accounts for

the place code shift.

Currently, a revised model of the place code shift hypothesis for mammals, based

on the evidence from developmental studies of central and peripheral frequency maps, is

suggested. The entire length of the basilar membrane is capable of supporting a

traveling wave at or very soon after the onset of hearing. Frequency representation in the

cochlear apex is developmentally stable. From the very onset of hearing, the apex

responds to its correct (adult) frequency, although the sensitivity and sharpness of tuning

are reduced. In contrast, the more basal regions of the cochlea, mid- and high-frequency

regions, undergo a shift in frequency organization such that each location becomes

responsive to progressively higher frequencies in older animals. Shifts in the cochlear

map result largely from maturational changes in the mechanical properties of the cochlear

partition. The active mechanism also contributes to the shift in frequency organization

(Rübsamen and Lippe, 1998).








Central Auditory System

The development of the central auditory system and its relation to the maturation

of the auditory periphery has been studied in animal models (Rubel, 1985a). Normal

growth of central auditory neural elements requires an intact peripheral mechanism.

However, initial stages of development of the auditory centers in the central nervous

system are independent of peripheral regulation. The proliferation and migration of

neurons in the central auditory system do not depend on the cochlea. The major

pathways are established prior to or simultaneously with the development of peripheral

function. Marty (1962) showed that in newborn kittens, the cortical evoked responses

were elicited by electrical stimulation of the auditory nerve. The cochlea is immature at

this time, and it is not possible to reliably evoke cortical responses to sound.

Following the establishment of functional connections between the periphery and

the central nervous system, the continued maturation of neurons is highly dependent on

the functional integrity of their afferents. Rubel and his colleagues (Rubel, Smith and

Miller, 1976; Jackson, Hackett and Rubel, 1982) revealed that in chicks after the time

when functional connections normally are established between the eighth nerve and the

cochlear nucleus cells, the absence of peripheral innervation caused rapid and severe

degeneration of the neurons. Abrams et al. (1987) demonstrated the impairment of

glucose utilization in the auditory as well as nonauditory portions of the brain after

cochlear ablation in fetal sheep.








Fetal Behavioral Response to Sound

The human fetal auditory system is functional by the start of the third trimester

(Birnholz and Benacerraf, 1983). Although direct measurement of fetal hearing cannot

be made by electrophysiological methods, indirect methods have been applied to measure

fetal behavioral responses to sound stimuli. The most common approaches used to

measure responsiveness to sound include the monitoring of fetal heart rate (Johansson,

Wedenberg and Westen, 1964), fetal movement (Shahidullah and Hepper, 1994) and

reflexive responses such as the auropalpebral reflex (Birnholz and Benacerraf, 1983).

Fetal movements in response to sound, to vibroacoustic stimulation, or to both relate

closely to the development of fetal audition (Gelman et al., 1982; Hepper and

Shahidullah, 1994a).

In 1983, Birnholz and Benacerraf measured fetal responsiveness to an electronic

artificial larynx (EAL) applied directly to the maternal abdomen. The auropalpebral

reflex (blink-startle response) of the 236 fetuses tested from 16 to 32 weeks of gestation

was monitored by ultrasonography. Reflexive eye movements were first elicited in some

fetuses between 24 and 25 weeks of gestational age, and responses increased in frequency

after 26 weeks. Consistent responses to EAL were observed after 28 weeks of

pregnancy.

Shahidullah and Hepper (1993) examined the response of fetuses to a 110 dB SPL

broadband airborne stimulus (80-2000 Hz) at 15, 20 and 25 weeks of gestation. Using a

response criterion of movement within 4.5 seconds of stimulus onset,

the investigators found that fetuses heard the noise at 25 weeks of gestation, but not

earlier. However, when the stimulus was changed from a single pulse to a series of ten








pulses with two-second duration and ten-second inter-stimulus interval, a response was

observed at 20 weeks of pregnancy. Thus, very early diffuse motor responses of slow

latency appeared as early as 20 weeks of gestation; by 25 weeks the response had

become an immediate auditory startle response.

The auditory system of the fetus does not begin to function uniformly across

frequency. While the adult range of audibility is from 20 Hz to 20,000 Hz with greatest

sensitivity in the 300 to 3000 Hz range, the fetus hears a much more limited range.

Hepper and Shahidullah (1994b) examined the range of frequencies and intensity levels

required to elicit human fetal movements as assessed with ultrasonography. Out of 450

fetuses involved in the study, only one demonstrated a response to a 500 Hz tone at 19

weeks gestational age. The range of frequencies to which the fetus responded expanded

first to low frequencies, 100 Hz and 250 Hz, and then to high frequencies, 1000 Hz and

3000 Hz. By 27 weeks, 96% of the fetuses responded to tones at 100, 250 and 500 Hz,

while none responded to frequencies at 1000 and 3000 Hz. It was not until weeks 29

(1000 Hz) and 31 (3000 Hz) that the fetuses responded to these tones. Between 33 and

35 weeks, the fetuses responded 100% of the time to presentations of 1000 and 3000 Hz.

As gestation progressed from 19 to 37 weeks, the fetuses exhibited responsiveness to

frequencies over a progressively wider frequency range. During this period, there was a

significant decrease (20-30 dB) in the intensity level of stimulus required to elicit a

response for all frequencies. This finding suggests that fetal sensitivity to pure tones

improves as gestation proceeds.

The ability to discriminate frequency is fundamental for the interpretation of

auditory information and for the development of speech perception and speech








production. Adults can detect changes of less than 2 Hz when the primary tone is

between 100 Hz and 1000 Hz (Yost, 1994). The development of frequency

discrimination in the human fetus was studied by Shahidullah and Hepper (1994) through

a habituation/dishabituation paradigm. Ultrasound imaging was used to

monitor fetal response to 250 and 500 Hz tones at 27 and 35 weeks gestation (N=48).

They found that 35-week-old fetuses were capable of distinguishing between the two

pure tones. However, fetuses at 27 weeks were not as likely to demonstrate this same

discrimination.

Shahidullah and Hepper (1994b) also evaluated the abilities of 36 fetuses to

differentiate between speech sounds. Fetuses at 27 and 35 weeks of age were exposed to

a pair of pre-recorded syllables presented at 110 dB SPL through an earphone placed on

the maternal abdomen. Half of the fetuses received /baba/ as their habituating stimulus and

/bibi/ as their dishabituating stimulus and vice versa. Although all fetuses habituated,

fewer stimuli were required for habituation for the 35-week-old fetuses than the 27-week-

olds, and a greater number of the 35-week-old fetuses (17 of 18) demonstrated

dishabituation compared to the younger ones (3 of 18). Thus, fetuses at thirty-five weeks

possess the ability to discriminate among different phonemes.



Fetal Sound Environment



Intrauterine Background Noise

The fetal sound environment is composed of a variety of internally generated

noises, as well as many sounds originating from the environment of its mother. The once








held belief that the fetus develops in an environment devoid of external stimulation

(Grimwade, Walker and Wood, 1970) has been replaced by the recognition that the fetus grows

in a uterus filled with rich and diverse sounds originating inside and outside the

mother (Gerhardt, 1989; Querleu et al., 1989).

The acoustic characteristics of internal noises and of external sounds that transmit

into the uterus have been described in the human from various recording sites including

inside the vagina (Bench, 1968), inside the cervix (Grimwade, Walker and Wood, 1970),

and inside the uterus following amniotomy (Querleu et al., 1988b; Benzaquen et al.,

1990; Richards et al., 1992). These intrauterine sounds in humans were very similar to

those recorded in pregnant sheep, via a chronically implanted hydrophone on the fetal

head inside the uterus with an intact amniotic sac (Vince et al., 1982, 1985; Gerhardt,

Abrams and Oliver, 1990).

Sounds generated inside the mother and present in the uterus are associated with

maternal respiration (Vince et al., 1982; Gerhardt, Abrams and Oliver, 1990), maternal

heartbeats (Walker, Grimwade, and Wood, 1971; Querleu et al., 1988a), maternal

intestinal activity (Vince et al., 1982; Gerhardt, Abrams and Oliver, 1990; Benzaquen et

al., 1990), maternal physical movements (Vince et al., 1982; Gerhardt, Abrams and

Oliver, 1990), and with placental and fetal circulation (Querleu et al., 1988a). These

sounds provide a background or "noise floor" above which maternal vocalizations and

externally generated sounds emerge (Vince et al., 1982, 1985; Querleu et al., 1988b;

Gerhardt, Abrams and Oliver, 1990; Benzaquen et al., 1990; Richards et al., 1992).

In 1968, Bench measured the intrauterine noise floor at 72 dB SPL in a pregnant

woman during labor. Three years later, Walker et al. (1971) reported an average intensity







of the background noise at 85 dB SPL (sound pressure level), with a peak at 95 dB SPL,

which was associated with maternal heartbeats. However, the accuracy of these early

studies was questioned by further studies using a hydrophone instead of a rubber-covered

microphone previously used to measure the intrauterine sound level.

The use of a hydrophone represented an important technological improvement

and provided more accurate data than was previously collected with air microphones.

Studies in pregnant sheep (Vince et al., 1982; Gerhardt, Abrams and Oliver, 1990) and

human (Querleu et al., 1988a; Benzaquen et al., 1990; Richards et al., 1992) showed that

there is a quiet background with a muffled quality to sounds inside the uterus.

Intrauterine sounds are predominantly low frequency (< 100 Hz) and reach 90 dB SPL

(Querleu, Renard and Crepin, 1981; Vince et al., 1982; Gerhardt et al., 1990). Spectral

levels decrease as frequency increases, and are as low as 40 dB for higher frequencies

(Benzaquen et al., 1990; Gagnon, Benzaquen and Hunse, 1992). Gagnon et al. (1992)

positioned a hydrophone in a pocket of fluid by the human fetal neck and measured

sound pressure levels of 85 dB SPL at 12.5 Hz, decreasing to 60 dB for 100 Hz and less

than 40 dB for 200 Hz and above. When measured in dBA, the human intrauterine sound

level was only 28 dBA (Querleu et al., 1988a). Thus, for both humans and sheep, the

noise floor tends to be dominated by low-frequency energy less than 100 Hz and can

reach levels as high as 90 dB SPL.

Recently, Abrams et al. (1998) explored the origin of the intrauterine background

noise in sheep under well-controlled laboratory conditions. The intrauterine noise level

was measured before and after death of the ewe and fetus, and the average reduction in

sound level postmortem approached 10-15 dB for frequencies below 100 Hz. The result








showed that sounds originating in the ewe and fetus contribute significantly to the low

frequency (< 100 Hz) component of the background noise.



Sound Transmission into the Uterus

Specifications of the amplitudes and frequency distributions of external sounds

transmitted into the uterus have been well described in humans (Querleu et al., 1988a;

Richards et al., 1992) and sheep (Armitage, Baldwin and Vince, 1980; Vince et al., 1982,

1985; Gerhardt, Abrams and Oliver, 1990). The attenuation of sound by the maternal

abdominal wall, uterus and amniotic fluid is low in the low frequencies and increases in

the high frequencies. In pregnant women, studied by Querleu et al. (1981), the

attenuation is 2 dB at 250 Hz, 14 dB at 500 Hz, 20 dB at 1000 Hz and 26 dB at 2000 Hz.

For high frequencies ranging from 3800 to above 18000 Hz, the attenuation is 20 to 40

dB (Querleu et al., 1988a). More recent results from Richards et al. (1992) showed that

there was an average of 3.7 dB enhancement at 125 Hz, with progressively increasing

attenuation up to 10.0 dB at 4000 Hz. Similar conclusions came from studies in sheep

(Armitage, Baldwin and Vince, 1980; Vince et al., 1982, 1985; Gerhardt, Abrams and

Oliver, 1990).

For frequencies below 250 Hz the reduction in sound pressure level through

maternal tissue and fluids was less than 5 dB. Some enhancement of low-frequency

sound pressures has been reported in both humans (Querleu et al., 1981; Richards et al.,

1992) and sheep (Vince et al., 1982, 1985; Gerhardt, Abrams and Oliver, 1990). That is,

the sound pressure in the amnion was greater than the sound pressure in air. Above 250

Hz, attenuation increased at a rate of about 6 dB per octave up to approximately 4000 Hz,








where the average attenuation was 20 to 25 dB. However, at 8000 Hz transmission loss

was 15 dB (Gerhardt, Abrams and Oliver, 1990). These general findings have been

refined and extended by Peters et al. (1993a, 1993b) who evaluated the transfer of

airborne sounds across the abdominal wall of sheep as a function of frequency and

intraabdominal location.

Peters et al. (1993a) studied the transmission of airborne sound into the abdomen

of sheep over a wide frequency range (50-20,000 Hz). They found that mean attenuation

varied from a high of 28 dB to a low of -3 dB. The greatest attenuation occurred for the

frequencies between 5,000 and 12,500 Hz. Surprisingly, sound attenuation varied

inversely as a function of stimulus level for low frequencies (50-125 Hz) and for high

frequencies (7,000-20,000 Hz). At higher stimulus levels (110 dB SPL in air),

attenuation was greater than the attenuation at lower stimulus levels (90 dB SPL). Thus,

the 90 dB stimulus was transmitted more efficiently than the 110 dB stimulus. In the middle frequency range

(200-4,000 Hz), no effect of stimulus level was found.

In another study by Peters et al. (1993b), a hydrophone was positioned at each of

45 locations in a 20 x 20 x 20 array in the abdomen of five non-pregnant sheep post

mortem. Isoattenuation contours within the abdomen were obtained. The sound pressure

at different locations within the three-dimensional space of the sheep was highly variable.

Low-frequency bands (< 250 Hz) of noise revealed strong enhancement of sound

pressure by up to 12 dB in the ventral part of the abdomen. For mid-frequencies (250-

2000 Hz), attenuation reached as high as 20 dB. Attenuation for high frequencies (>

3150 Hz) was somewhat less than for mid-frequencies and reached an upper limit of

approximately 16 dB.








Over the frequency range from 250 to 4000 Hz, the abdomen can be characterized

as a low-pass filter with high-frequency energy rejected at a rate of approximately 6

dB/octave (Gerhardt, Abrams and Oliver, 1990). Thus, external stimuli are shaped by the

tissues and fluids of pregnancy before reaching the fetal head.
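The low-pass characterization above can be sketched numerically. This is an illustrative simplification only: the 250 Hz corner, the 6 dB/octave slope, and the plateau near 4000 Hz are the round figures quoted in the text, and the function name is introduced here for illustration.

```python
import math

def abdominal_attenuation_db(freq_hz):
    """Rough sketch of the ~6 dB/octave low-pass behavior of the
    abdomen described in the text (not a fitted model)."""
    if freq_hz <= 250:
        return 0.0          # little or no attenuation below 250 Hz
    f = min(freq_hz, 4000)  # attenuation plateaus near 4000 Hz
    octaves_above_corner = math.log2(f / 250)
    return 6.0 * octaves_above_corner

# 4000 Hz lies four octaves above 250 Hz, giving about 24 dB,
# consistent with the reported 20-25 dB maximum attenuation.
print(round(abdominal_attenuation_db(4000), 1))  # 24.0
```

Note that this simple slope does not capture the reduced transmission loss reported at 8000 Hz (15 dB), which is why the plateau is capped at 4000 Hz.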



Fetal Sound Isolation

It is known how much sound pressure is present at the fetal head. There is now also

information about how much sound actually reaches the fetal inner ear (Gerhardt et al.,

1992). For the fetus in utero, external airborne sound energy must pass from the air

medium to the fluid medium of the amnion before reaching the fetal inner ear. As sound

energy changes medium, it is reduced because of the impedance difference at the air-

tissue interface. The two quantities, pressure and particle velocity, are related and are

dependent on the acoustic impedance of the medium. The acoustic impedance of water is

much higher than that of air; for a given pressure disturbance, the particle velocity is

lower by a factor of approximately 3600 (10 log 3600 = 35.5 dB) (Hawkins and

Myrberg, 1983). Thus, equal pressures in air and fluid differ in sound energy by

approximately 35 dB. One would assume that the sound pressure level required to

produce a physiological response from the fetus would be approximately 35 dB greater

than the sound pressure level in air necessary to produce the same response from the

newborn (Gerhardt, 1990; Gerhardt et al., 1992). Factors that determine how much ex

utero sound reaches the inner ear of the fetus include the sound pressure attenuation

through maternal tissue and fluid and the transformation of this pressure into basilar

membrane displacement.
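The approximately 35 dB figure quoted above follows directly from expressing the particle-velocity ratio in decibels; a one-line arithmetic check (the slight difference from the quoted 35.5 dB is only rounding):

```python
import math

# For equal pressures, sound energy in water is lower than in air
# by roughly the particle-velocity factor of ~3600 (Hawkins and
# Myrberg, 1983); expressed in decibels:
velocity_ratio = 3600
level_difference_db = 10 * math.log10(velocity_ratio)
print(round(level_difference_db, 1))  # 35.6
```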








Gerhardt et al. (1992) studied the extent to which the fetal sheep in utero is

isolated from sounds produced outside the mother. Inferences regarding sound

transmission to the inner ear were made from cochlear microphonic (CM) input-output

functions to stimuli with different frequency content. The CM, an alternating current

generated by the hair cells of the inner ear, mimics the input signal in frequency and

amplitude over a fairly wide range. As the signal amplitude increases, so does the

amplitude of the CM. Cochlear microphonics recorded from the round window are

sensitive indices of transmission characteristics of the middle ear. Thus, changes in the

condition of the middle ear influence the amplitude of the CM. By comparing the sound

pressure levels necessary to produce equal CM amplitude from the fetus in utero, and

later, from the newborn lamb in the same sound field, estimates of fetal sound isolation

can be made.

Cochlear microphonic input-output functions were recorded from in utero fetuses

in response to one-third octave band noises from 125 to 2000 Hz and then again from the

same animals after birth. The magnitude of fetal sound isolation was dependent upon

stimulus frequency. For 125 Hz, sound isolation ranged from 6 to 17 dB, whereas for

2000 Hz fetal sound isolation ranged from 27 to 56 dB. The averages for each stimulus

frequency were 11.1 dB for 125 Hz, 19.8 dB for 250 Hz, 35.3 dB for 500 Hz, 38.2 dB for

1000 Hz and 45.0 dB for 2000 Hz. Thus, for lower frequencies (< 500 Hz) the fetal

auditory system appears to be sensitive to pressure variations produced by the stimulus

originating outside the mother.
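The average isolation values above can be used for a back-of-the-envelope estimate of the stimulus level effectively reaching the fetal inner ear. The isolation averages are those reported from Gerhardt et al. (1992); the `effective_level` function and the 70 dB SPL example are introduced here for illustration.

```python
# Average fetal sound isolation (dB) at each test frequency (Hz),
# as reported above from Gerhardt et al. (1992).
isolation_db = {125: 11.1, 250: 19.8, 500: 35.3, 1000: 38.2, 2000: 45.0}

def effective_level(airborne_db, freq_hz):
    """Estimated level at the fetal inner ear: airborne level
    minus the average isolation at that frequency."""
    return airborne_db - isolation_db[freq_hz]

# A 70 dB SPL airborne component at each test frequency:
for f in sorted(isolation_db):
    print(f, round(effective_level(70, f), 1))
# 125 Hz -> 58.9, 250 Hz -> 50.2, 500 Hz -> 34.7,
# 1000 Hz -> 31.8, 2000 Hz -> 25.0
```

The pattern matches the text's conclusion: components below 500 Hz lose relatively little level, while components at 500 Hz and above are reduced by 35 dB or more.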








Route of Sound Transmission into the Fetal Inner Ear

Another factor that influences how airborne stimuli affect the fetus is related to

the transmission of sound pressure from the fluid at the fetal head into the inner ear.

Transmission is governed by the route that pressure variations take to reach the inner ear.

The route of sound transmission postnatally is through the outer and middle ear system.

Normal auditory function requires an air-filled middle ear cavity, an intact tympanic

membrane, and functional hair cells and neural mechanisms. In order to stimulate the hair

cells of the inner ear, the movement of the stapes footplate in and out of the oval window

creates hydraulic motion of the cochlear fluids, which causes basilar membrane

displacement. However, in the fetus this route is likely to be rendered less efficient

because the mechanical properties of the middle ear are highly dampened. The fetal

middle ear and external ear canal are filled with amniotic fluid, which decreases the

mechanical advantage of the middle ear. In addition, sound pressure may be present with

the same phase at the oval window and round window. The lack of a phase difference, as

well as the lack of a middle ear amplifier, may substantially decrease basilar membrane

displacement and therefore cause a decrease in hearing sensitivity.

Two hypotheses have been proposed that describe the route that exogenous

sounds take to reach the fetal cochlea. It has been suggested that acoustic stimuli in the

fetal environment pass easily through the fluid-filled external auditory canal and middle

ear system to the inner ear (Rubel, 1985b; Querleu et al., 1989). The impedance of inner

ear fluids is similar to that of amniotic fluid, thus, little acoustic energy is lost due to an

impedance mismatch (Querleu et al., 1989).








Hearing via bone conduction is a second alternative. Researchers have shown

that the contribution of the external auditory meatus to auditory sensitivity in underwater

divers is negligible (Hollien and Feinstein, 1975). By comparing the ability of a diver to

hear under different conditions while in water, bone conduction has been shown to be

much more effective in transmitting underwater sound energy. Similarly, fetal hearing

occurs in a fluid environment and sound transmission may be through bone conduction as

well.

Gerhardt et al. (1996) compared the effectiveness of the two routes of sound

transmission (outer and middle ear vs. bone conduction) by recording CM amplitudes

from fetal sheep in utero in response to airborne sounds. CM input-output functions

were obtained from the fetus in utero during three different conditions: uncovered fetal

head, entirely covered fetal head, and covered fetal head with exposed pinna and ear canal.

Results showed that when the fetal head was covered with sound attenuating

material, even though the pinna and ear canal remained uncovered, sound levels necessary

to evoke a response were greater than those necessary to evoke the same response from

the fetus with its head uncovered. This fact revealed that acoustic energy in amniotic

fluid reaches the fetal inner ear through a bone conduction route. External sounds

transmitted into the uterus stimulate the inner ear by vibrating the fetal skull directly,

which in turn results in basilar membrane displacement. Thus, more sound energy is

necessary to vibrate the skull and stimulate the hair cells by bone conduction than by air conduction.








Model of Fetal Hearing

Gerhardt and Abrams (1996) proposed a model of fetal hearing that considers

what sounds are present in the environment of the fetus and to what extent these sounds

can be detected. The model includes information regarding intrauterine background

noise, sound transmission through the tissues and fluids associated with pregnancy and

sound transmission through the fetal skull into the inner ear.

For the fetus to detect a signal from outside the mother, extrinsic sounds have to

exceed the ambient sound level in utero. The internal noise floor of the mother is

dominated by low-frequency energy produced by respiration, intestinal function,

cardiovascular system, and maternal movements. Spectral levels decrease as frequency

increases, and are 60 dB for 100 Hz and lower than 40 dB for 200 Hz and above.

Presumably, the ability of the fetus to detect exogenous sounds will be dependent in part

on the spectrum level of the noise floor because of masking effects. As expected, high-

frequency sound pressures would be reduced by about 20 dB. The attenuation of low-

frequency sounds by the abdominal wall, uterus and fluids surrounding the fetal head is

quite small and in some cases enhancement of sound pressure of about 5 dB has been

noted. Between 250 and 4000 Hz, sound pressure levels drop at a rate of 6 dB/octave.

At 4000 Hz, maximum attenuation is approximately 20 dB. At frequencies higher than

4000 Hz, the attenuation is reduced to less than 20 dB.

Sound pressures at the fetal head create compressive forces through bone

conduction that result in displacements of the basilar membrane thereby producing a CM.

For 125 and 250 Hz, an airborne signal would be reduced by 10-20 dB in its passage to

the fetal inner ear over what would be expected to reach the inner ear of the organism in







air. For 500 through 2000 Hz, the signal would be reduced by 40-45 dB. For frequencies

in this range, the fetus is indeed buffered from sounds in the environment surrounding its

mother probably because of limited function of the ossicular chain. However, for low-

frequency sounds, the fetus is not well isolated. Low-frequency stimuli reach the inner

ear of the fetus with far greater amplitudes than high-frequency stimuli. Interestingly, the

development of the inner ear is such that low-frequency stimuli are detected before high-

frequency stimuli. If the development of normal function is dependent on external

stimulation, then the developmental pattern of the auditory system provides a mechanism

to ensure that each neuronal region receives adequate stimulation from the environment

(Rubel, 1984).

The fetus in utero will detect speech, but probably only the low-frequency

components less than 500 Hz, and only when the airborne signal exceeds about 60 dB

SPL. If it is less than that, the signal could be masked by internal noises. It is predicted

that the human fetus could detect speech at conversational levels (65-75 dB SPL), but

would not be able to discriminate many of the speech sounds with high-frequency

components. Likewise, if music was played to the mother at comfortable listening levels,

the temporal characteristics of music, rhythms, could be sensed by the fetus, but the high-

frequency overtones would not be of sufficient amplitude to be detected (Abrams et al.,

1998). Simply put, the fetus would be stimulated by music with the "bass" register

turned up and the "treble" register turned down. This information may relate to in utero

development of speech and language, to musical preferences and to subsequent cognitive

development.








Intelligibility of Speech Sounds Recorded within the Uterus

Speech produced during normal conversation is approximately 70 dB SPL and is

comprised of acoustic energy primarily between 200 and 3000 Hz. The average

fundamental frequency of the adult voice is approximately 125 Hz for males and 220 Hz for

females. Speech becomes unintelligible when the background noise in the speech-

frequency range exceeds the level of the message by approximately 10 dB.

There are many factors that determine how well a fetus will hear sounds from

outside its mother. These factors include: the frequency content and level of the internal

noise floor; the attenuation of external signals provided by the tissues and fluids

surrounding the fetal head; sound transmission into the fetal inner ear; and the sensitivity

of the auditory system at the time of sound stimulation.

As a result of experimental work, the characteristics of the intrauterine sound

environment are now fairly well understood. Studies in sheep (Vince et al., 1982, 1985;

Gerhardt, Abrams and Oliver, 1990) and in humans (Querleu et al., 1988a; Benzaquen et

al., 1990; Richards et al., 1992) have shown that the mother's voice and speech sounds

from outside the mother transmit easily into the uterus with little attenuation, and form

part of the intrauterine sound environment. Vince et al. (1982, 1985) implanted a

hydrophone inside the amniotic sac of pregnant ewes, and obtained long-term recordings.

They showed that the sound of maternal vocalizations forms a prominent part of the

intrauterine sound environment, and is louder inside the uterus than outside. Gerhardt et

al. (1990) also noted that, when listening to the internal recordings from sheep, conversations between experimenters speaking with normal vocal effort 3 feet from the ewe could be recognized. Speech was muffled and intelligibility was poor; however, pitch,








intonation, and rhythm were quite clear. These findings are in accordance with data

provided by human studies. Querleu et al. (1988b) presented various human voices

through a loudspeaker to pregnant women and recorded the speech with a hydrophone in

the uterus. The voices included the mother talking directly, the mother's voice recorded on tape and played back, and the recorded voices of other women and men. All types of

recorded voices (presented at 60 dBA) emerged above the basal noise floor (28 dBA) by

+8 to +12 dB. The mother's voice recorded directly was 24 dB greater than the noise

floor. The intensity of the maternal voice transmitted to the uterine cavity was greater

than that of outside voices. Moreover, it was also transmitted to the fetus more often than

any other voices. In 1990, Benzaquen et al. reported that maternal vocalization was

easily recorded in utero in ten pregnant women tested in the study. The sound spectrum

produced by pronouncing the word "99" was characterized by a peak intensity of 70 to

75 dB SPL at 200 to 250 Hz and was approximately 20 dB above the intrauterine

background noise at those frequencies.

Richards et al. (1992) studied the transmission of speech into the uterus.

Intrauterine sound pressure levels of the mother's voice were enhanced by an average of

5.2 dB in the low-frequency range, whereas external male and female voices were

attenuated by 2.1 and 3.2 dB, respectively. However, these studies only provided the

information about the existence of speech sound in the intrauterine sound environment.

The understandability of speech recorded from within the uterus is another critical issue

for our understanding of early speech and language development. Fetal identification of

its mother's voice and its ability to form memories of early exposure to speech are in part

dependent on the intelligibility of the speech message.








Currently, two published studies address the perceptibility of speech recorded

from inside the uterus. Querleu et al. (1988b) recorded the voices of five pregnant

women and voices of other male and female talkers with a modified microphone

positioned by the head of the fetus. Six listeners were able to recognize about 30% of the

3120 French phonemes. No significant difference was noted between the male and

female voices, and the mother's voice, although more intense, was not better perceived.

The recognition of vowels was correlated with their second formant. The intonation

patterns, with frequencies ranging from 100 to 1000 Hz, were discriminated far better than linguistic content.

In a more recent study conducted by Griffiths et al. (1994), a panel of over 100

untrained individuals judged the intelligibility of speech recorded in utero from a

pregnant ewe. Two separate word lists, one of meaningful and one of non-meaningful speech stimuli, were delivered to the side of the ewe through a loudspeaker and were

simultaneously recorded with an air microphone located 15 cm from the flank and with a

hydrophone previously sutured to the neck of the fetus. Perceptual test tapes generated

from these recordings were played to 102 judges. Intelligibility was influenced by three

factors: transducer site (maternal flank or in utero); gender of the talker (male or female);

and intensity level (65, 75 or 85 dB). For recordings made at the maternal flank, there

was no significant difference between male and female talkers. Intelligibility scores

increased with increased stimulus level for both talkers and at both recording sites. However,

intelligibility scores were significantly lower for females than for males when the

recordings were made in utero.








An analysis of the feature information from recordings inside and outside the

uterus showed that voicing information is better transmitted in utero than place or manner

information. "Voicing" refers to the presence or absence of vocal fold vibrations (e.g., /s/

vs. /z/), "place" of articulation refers to the location of the major air-flow constriction

during production (e.g., bilabial vs. alveolar), and "manner" refers to the way the speech

sound is produced (e.g., plosive vs. glide).

Miller and Nicely (1955) reported that low-pass filtering of speech signals

resulted in a greater loss of manner and place information than of voicing information.

They concluded that the higher frequency information in the speech signal is critical for

accurate identification of manner and place of articulation. The findings of Griffiths et al.

(1994) are consistent with those of Miller and Nicely (1955) in that transmission into the

uterus can be modeled as a low-pass filter. The poorer in utero reception of place and

manner information is associated with the greater high frequency attenuation.
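The low-pass character of uterine transmission described above can be sketched with a simple one-pole filter. This is purely illustrative: the 500 Hz cutoff used below and the first-order filter shape are assumptions for the sketch, not the measured transfer function of maternal tissues.

```python
import math

def one_pole_lowpass(samples, fs_hz, cutoff_hz):
    """First-order low-pass filter: a crude stand-in for the attenuation
    of high-frequency speech energy by maternal tissues and fluids.
    Cutoff frequency and filter order are illustrative assumptions."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / fs_hz
    alpha = dt / (rc + dt)          # smoothing coefficient in (0, 1)
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)        # y[n] = y[n-1] + alpha*(x[n] - y[n-1])
        out.append(y)
    return out

# A 125 Hz component (near a male fundamental) passes with little loss,
# while a 3000 Hz component (consonant-range energy) is strongly attenuated,
# consistent with voicing surviving in utero better than place or manner cues.
```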

Voicing information from the male talker, which is carried by low-frequency

energy, was largely preserved in utero. The judges evaluated the male talker's voice

equally well regardless of transducer site. Speech of the female talker carried less well

into the uterus. The fundamental frequency of the female talker was higher than that of

the male talker. Thus, it is understandable that voicing information from the male would

carry better into the uterus than that from the female.

Male and female talker intelligibility scores averaged approximately 55% and

34%, respectively, when recorded from within the uterus. Although these results reflect

the perceptibility of the speech energies present in the amniotic fluid, they do not specify

what speech energy might be present at the fetal inner ear. Measures of acoustic








transmission to the fetal inner ear are quite limited at present. Much work needs to be

completed before conclusions can be drawn regarding what speech energies reach and are

able to be perceived by the fetus.



Fetal Auditory Experiences and Learning

During the last trimester, the human fetus, with a well-developed hearing

mechanism, is exposed to a large variety of simple and complex sounds. Prolonged

exposure, for several weeks or even months, to external and maternal sounds may have

several consequences to the fetus at structural, functional, and behavioral levels. Prenatal

activation of the auditory system may contribute to normal development of peripheral

structures and central connections, as well as maintenance of anatomic and functional

integrity during prenatal maturation. On a more general level, fetal auditory stimulation

may contribute to the formation of auditory perceptual abilities, and to the organization of

the newborn's preferences for a particular acoustical signal (Lecanuet and Schaal, 1996).



Prenatal Effects of Sound Experience

Human fetal responsiveness to intense acoustical stimulation has been studied

only in the past two decades. Fetuses are not only responsive to intense stimulation, they

also display differential auditory responses as a function of the characteristics of the

stimulus. When acoustic or vibroacoustic stimuli are above 110 dB SPL, fetuses display

heart rate accelerations and motor-startle movement responses. Below 100 dB SPL, no

reliable movement responses can be recorded, but fetuses display small, transient heart-

rate decelerations rather than heart-rate accelerations (Lecanuet, Granier-Deferre and








Busnel, 1989, 1995). Heart-rate accelerations to auditory stimulation are typically associated with a so-called "startle" or defensive response, while decelerations reflect an "orienting" or attentive response (Berg and Berg, 1987).

Experiments have shown that repetition at a short interval (every 3-4 seconds) of

a 92 to 95 dB SPL acoustic stimulus led to the disappearance of a cardiac deceleration

response that had been induced by the first presentation of the stimulus, indicating habituation (Lecanuet et al., 1992). Habituation is defined as the decrement in response

after repeated presentation of a stimulus. Habituation is essential for the efficient

functioning and survival of the organism, enabling it to ignore familiar stimuli and attend

to new stimuli. Habituation represents one of the simplest yet most essential learning

processes the individual possesses, and underlies much of our functioning and

development (Hepper, 1992). Using a classical habituation / dishabituation procedure,

Kisilevsky and Muir (1991) obtained a significant decrement of both fetal cardiac

acceleration and movement responses to a complex noise (at 110 dB SPL), followed by a

recovery of these responses when triggered by a novel vibroacoustic stimulus. The

fetuses were between 37 and 42 weeks gestation during the experiment. Habituation in

utero relates not only to the reception of the sensory message, but also its integration at

lower levels of the central nervous system. Therefore, the fetus in utero is capable of

learning (Querleu et al., 1989).

Lecanuet et al. (1989, 1993) studied the auditory discriminative capacities of the

near-term fetus by using habituation/dishabituation of heart-rate deceleration responses.

In one study (Lecanuet, Granier-Deferre and Busnel, 1989), fetuses at 35 to 38 weeks

gestation displayed a transient heart-rate deceleration response when they were exposed to







the repeated presentation (every 3.5 seconds) of a pair of French syllables: /ba/ and /bi/ or

/bi/ and /ba/, spoken by a female talker at 95 dB SPL. Reversing the order of the paired

syllables after 16 presentations also reliably induced the same type of response. This was

observed in 15/19 fetuses in the BABI/BIBA condition and in 10/14 fetuses in the

BIBA/BABI condition. Response recovery suggested that fetuses discriminated between

the two stimuli. The discrimination that occurred may have been performed on the basis of a perceptual difference in loudness (intensity) between /ba/ and /bi/, since the syllables were equated in SPL rather than in hearing level. This equalization makes /bi/ louder than /ba/ for adult listeners. Similarly, Shahidullah and

Hepper (1994) found that fetuses at 35 weeks gestation had the ability to discriminate

between /baba/ and /bibi/.

In another experiment (Lecanuet et al., 1993), the ability of near-term fetuses to

discriminate different speakers producing the same sentence was studied. The heart-rate

responses of fetuses between 36 to 39 weeks gestation were recorded before, during and

after stimulation with the sentence 'Dick a du bon thé' (Dick has some good tea). The

sentence was spoken by either a male talker (minimum fundamental frequency Fo = 83 Hz) or a female talker (minimum Fo = 165 Hz) and delivered through a loudspeaker 20

cm above the mother's abdomen at the same level (90-95 dB SPL). The fetuses were

exposed to a first voice presentation (male or female), followed by the other voice

or the same voice (control condition) after fetal heart-rate response returned to baseline.

The results demonstrated that in the first 10 s after presentation of the initial voice, the

voice (male or female) induced a high and similar proportion of heart rate deceleration

changes (77% to the male voice, 66% to the female voice) compared to a group of non-








stimulated subjects (9% of deceleration and 46% of acceleration). Within the first 10 s

following the voice change, 69% of the fetuses exposed to the other voice displayed a

heart-rate deceleration response, whereas 43% of the fetuses in the control condition

displayed heart-rate acceleration change. The authors pointed out that near-term fetuses

might perceive a difference between the voice characteristics of two speakers, at least

when they are highly contrasted for Fo and timbre. The results cannot be generalized for

all male and female voices or for all speakers since voices with extremely low Fo were

used in the study (Lecanuet, Granier-Deferre and Busnel, 1995; Lecanuet, 1996).

Hepper et al. (1993) studied the ability of fetuses to discriminate between a

strange female's voice and the mother's voice by measurement of the number of fetal

movements during a 2-minute speech presentation. The results showed that fetuses at 36

weeks gestation did not discriminate between their mother's voice and that of a stranger,

when tape recordings were played to them via an air-coupled loudspeaker placed on the

abdomen. However, the fetuses were able to discriminate between their mother's voice

recorded on tape and played to them over the loudspeaker and the mother's voice

produced naturally; fewer movements were noted in response to the mother's direct

speaking voice when compared to a tape recording of her voice. According to the

authors, discrimination may be due to the presence of internally transmitted components

of speech which the fetus perceives when the mother is speaking, but that are not present

when the tape recording of the mother's voice is played.

The possibility of prenatal recognition of a familiar child's rhyme was studied by

DeCasper et al. (1994). Seventeen pregnant women recited a child's rhyme aloud three

times a day from their 33rd to 37th week of pregnancy. Fetal heart-rate response was








used to assess differential fetal responsiveness to the target rhyme versus a novel rhyme.

During the 37th week of gestation, each fetus was stimulated with one rhyme for 30 seconds

through a loudspeaker placed over the mother's abdomen. The first rhyme was followed

by 75 s of silence and then the other rhyme was presented for 30 s. Stimulus level for

both rhymes was set at 80-82 dB SPL. Care was taken during fetal testing to keep the

mother unaware of which rhyme was being presented so that she could not inadvertently

cue her fetus. The results showed that fetal heart rates significantly decreased from

prestimulus levels when the target rhyme was presented and significantly increased over

prestimulus levels when the novel rhyme was presented, regardless of presentation order.

This differential heart-rate change implied that the fetus discriminated the two rhymes.

Moreover, since these rhymes were counterbalanced across fetuses, the different patterns

of heart-rate responses could not be attributed to any unique acoustic attributes of one

rhyme.

There is now a growing body of data showing that fetuses perceive acoustical

stimuli. Near-term fetuses can discriminate between two complex stimuli (such as

syllables), between two speech passages, and they are able to learn. Such a competence

may be partly a consequence of fetal familiarization to speech sounds.



Postnatal Effects of Prenatal Sound Experience

Prenatal auditory experience may result in general and / or specific learning

effects that are evidenced in postnatal life. Stimuli familiar to the fetus may selectively

soothe the baby after birth or may elicit orienting responses during quiet states. Familiar

stimuli are more alerting than unfamiliar ones. It is well documented that prenatal








auditory experience plays a major role in the development of human newborn auditory

preferences and capabilities (Fifer, 1987; Leanuet, 1996).

It has been shown that maternal heartbeat (Salk, 1962) and recordings of

intrauterine noises (Rosner and Doherty, 1979) can calm a restless baby and serve as potent reinforcers during operant-conditioning nonnutritive sucking procedures (DeCasper and Sigafoos, 1983). Indeed, intrauterine cardiac rhythms are potent reinforcers for 2- to

3-day-old newborns, a finding that suggests that prenatal auditory experience affects

postnatal behavior.

Nonnutritive sucking procedures made it possible to objectify newborns'

discriminative abilities and to test the newborn's preference for a given stimulus. The

human voice, especially that of its mother, is likely to have increased salience for the

fetus relative to other auditory stimuli. Mother's voice in the fetal sound environment

differs from other sounds in its intensity, variability, and other multimodal characteristics.

Mother's voice has been reported to be the most intense acoustic signal measured in the

amniotic environment (Querleu et al., 1988a; Benzaquen et al., 1990; Richards et al.,

1992). The nature of the maternal voice may promote greater fetal responsiveness to

mother's voice than any other prenatal sound. The earliest evidence for differential

responsiveness to maternal voice came from work with older infants (Mills and Melhuish,

1974). The experiments demonstrated a differential sensitivity to the maternal voice in

20- to 30-day-old infants. The amount of time spent sucking and number of sucks per

minute increased after a brief presentation of the mother's voice. In a later

study using 1-month-old infants (Mehler et al., 1978), sucks were reinforced with either a

mother's or a stranger's voice, intonated or monotone. A significant increase in sucking







was only observed when mother's voice was normally intonated. The role of intonation

in recognition of the mother's voice was suggested. Although these procedures clearly

demonstrate that infants respond differentially to their mother's normal voice, the

differences in responding do not necessarily indicate a preference for her voice (Fifer,

1987).

The study by DeCasper and Fifer (1980), using two different nonnutritive sucking

procedures, was the first to provide direct experimental evidence that neonates prefer

their mother's voice. Using a temporal discrimination procedure, 2- to 3-day-old infants

were observed for a 5-minute baseline period in which nonrewarded sucks on a

nonnutritive nipple were recorded. The median of the interburst intervals (IBIs) was

calculated and used to set the contingency for the testing. For 5 of the 10 infants tested,

sucking bursts that ended IBIs shorter than the baseline median IBI (mIBI) turned on a

tape recording of the infant's mother reading a children's story, whereas sucking bursts that ended IBIs equal to or longer than the mIBI turned on a tape recording of another

infant's mother reading the same story. For the other five infants, the IBI/story

contingency was reversed. The results showed that 8 of the 10 infants shifted their

overall medians significantly in the direction necessary to turn on the recording of their

mother's voice. Also, the infants turned on the recording of their mother's voice more

often and for a longer total period of time than the unfamiliar female voice.
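The interburst-interval contingency described above can be sketched as a small helper. This is a paraphrase of the procedure as reported in the text; the function name and the recording labels are placeholders, not anything from the original study.

```python
import statistics

def recording_for_burst(ibi_s, baseline_ibis_s,
                        short_ibi_recording="mother",
                        long_ibi_recording="other_mother"):
    """Sketch of the DeCasper and Fifer (1980) contingency: a sucking
    burst ending an IBI shorter than the baseline median (mIBI) turns on
    one recording, while an IBI equal to or longer than the mIBI turns on
    the other.  For half the infants the contingency was reversed, which
    here would just mean swapping the two label arguments."""
    m_ibi = statistics.median(baseline_ibis_s)
    return short_ibi_recording if ibi_s < m_ibi else long_ibi_recording
```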

In the second procedure, which involved a signal discrimination paradigm, the

presence or absence of a 4-s 400 Hz tone signaled the availability of the different voices,

and the voices remained on for the duration of the sucking burst. For 8 of the 16 infants

tested, sucking on the nipple during the tone resulted in the cessation of the tone and








turned on a recording of their own mother's voice reading a children's story, whereas

sucking during silence turned on a recording of another woman reading the same story.

For the other eight infants, the signal/story contingency was reversed. Again, evidence of

newborns' preference for their own mother's voice was obtained. Infants showed a

significantly greater probability of sucking during the signal (tone or silence) that led to

the presentation of the maternal voice recording.

Since it is possible that preference for the mother's voice could be generated very

fast by the newborn's initial postnatal contact with the mother, several subsequent studies

have attempted to rule out the effect of postnatal auditory experience. Fifer (1987) failed

to find any evidence that preference in newborns for maternal voice was related to either

postnatal age (1- vs. 3-day-olds) or method of feeding (bottle-fed vs. breast-fed).

Another study showed that 2-day-old newborns did not prefer their father's voice to that of

another male's voice, even though these newborns had 4 to 10 hours of postnatal contact

with their fathers (DeCasper and Prescott, 1984). This study also determined that the

absence of a preference for the paternal voice was not due to the inability of newborns to

discriminate between pairs of male voices. Furthermore, the authors compared the

preference between an airborne version of the mother's voice and an "intrauterine",

low-pass filtered version. Using tone/silence discriminative responding procedures, 2- to

3-day-old infants were given a choice of hearing their mother's voice (or other female's

voice) either unfiltered or low-pass filtered at 1000 Hz (Spence and DeCasper, 1987).

Infants showed no preference for either the unfiltered or low-pass filtered version of their

mother's voice, whereas infants preferred the unfiltered version of the nonmaternal

to the filtered nonmaternal voice. According to the authors, since there is apparently little








prenatal experience with the low-frequency features of other female voices, but

considerable postnatal experience with their full spectral characteristics, the newborns

preferred the more familiar version of the female stranger's voice. In contrast, both the

filtered and unfiltered versions of maternal voice contained the necessary low-frequency

features for maternal voice recognition, so the infants showed no preference.

Finally, Fifer and Moon (1989), using a modified version of the "intrauterine"

mother's voice mixed or not mixed with maternal cardiovascular sounds, found that 2-

day-old newborns preferred a low-pass filtered version of the maternal voice to an

unfiltered version when 500 Hz was the cutoff frequency. Therefore, it is possible that

the infants in the previous study (Spence and DeCasper, 1987) did not show a preference

for the filtered maternal voice because it was more similar to their postnatal rather than

their prenatal experience with the maternal voice. Newborns' prenatal familiarity with

maternal voice may explain the findings of Hepper et al. (1993). Using an analysis of movement responses, Hepper et al. demonstrated that 2- to 4-day-old newborns discriminated normal speech from "motherese" speech in their mothers' voice, but not between normally intonated and "motherese" versions of a strange female's voice. Newborns, however,

discriminated the maternal voice from a strange female voice.

Taken together, these results suggest that prenatal auditory experience determines

at least some of the infant's early auditory preferences. This prenatal effect was

demonstrated more directly by the study conducted by DeCasper and Spence (1986).

Sixteen pregnant women recited one of the three children's stories aloud twice each day

during the final 6 weeks of their pregnancies. After birth, the newborns (average age of

55.8 hours) were tested using the nonnutritive IBI contingent sucking procedure. For







eight of the infants in the prenatal group, sucking bursts following IBIs < mIBI turned on

a recording of a woman (either the infant's own mother or the mother of another infant)

reading the story that the infant's mother had read while pregnant. Sucking bursts which

followed IBIs > mIBI turned on a recording of that same woman reading a novel story.

For the other eight infants in the prenatal group, the IBI/story contingency was reversed.

Additionally, a control group (12 infants) was tested under the same conditions except

that these infants had no experience with any of three stories. The results showed that

regardless of which story the mothers had recited while pregnant and regardless of the

IBI/story contingency, the newborns in the prenatal group were more likely to suck after

IBIs required to turn on the familiar story, the one they had heard prenatally, whereas

infants in the control group showed no systematic change in their sucking pattern from

baseline. Moreover, these preferences for one of three stories were not dependent on the

specific voice of the storyteller. This result showed that the induction of a preference for

a story (speech passage) generalized from the maternal to a nonmaternal voice. This implies that

the newborn retains two different kinds of acoustic information from prenatal experience:

information about specific characteristics of the mother's voice (perhaps fundamental

frequency) and more general characteristics that are not necessarily mother-specific, such

as intonation contours and / or temporal characteristics.

These studies provide strong evidence that the late-term human fetus is able to

process some aspects of vocal stimulation presented by the mother and retain some of

that information for at least several days after birth. It remains unclear, however, which

specific aspects of prenatal auditory stimulation were responsible for postnatal auditory

preferences.








Because external low-frequency sound is transmitted into the uterus with little

attenuation and because high-frequency sound is attenuated, the fetus can only detect the

low-frequency components of a passage presented by the mother. It appears that these

newborns could not merely depend on segmental information (phonetic components of

speech, i.e., the specific consonants and vowels making up the words), which they

experienced prenatally, as the basis for their postnatal recognition, since segmental

information is carried by those frequencies that appear to be most attenuated in utero

(frequencies above 1000 Hz). In contrast, the suprasegmental information (intonation,

frequency variation, stress, and rhythm) contained in the maternal voice and in the stories

recited by the mother is available to the fetus with very little attenuation. The hypothesis

about the role of suprasegmental information in fetal auditory perception has been

investigated (Cooper and Aslin, 1989).

In an effort to test whether prenatally available suprasegmental information would

be sufficient to induce a postnatal preference, the authors had 13 pregnant women sing the tune of "Mary Had a Little Lamb" using the syllable "la" instead of the actual words of the melody (Cooper and Aslin, 1989). Each woman sang the melody 5

minutes daily starting on the 14th day prior to her due date. The newborns of these

mothers were tested between 34 and 72 hours after birth (mean age = 52 hours old) using

the IBI procedure. For the seven infants in the prenatal group, sucking bursts that ended

IBIs < mIBI turned on a recording of "Mary Had a Little Lamb" sung by a professional female singer (using "la" instead of the words), whereas sucking bursts that ended IBIs ≥ mIBI turned on a recording of the same singer singing "Love Somebody", also with "la"

instead of the words. These two melodies were sung in the same key and contained the







same absolute notes, but the notes occurred in different orders to yield different melodic

contours. For the other six infants in the prenatal group, the IBI/melody contingency was

reversed. In addition, a control group of eight newborns was tested under the identical

condition except that they had no prior experience with either melody. The results

showed that the newborns in the prenatal group produced more of the IBIs to turn on their

familiar melody compared to their baseline performance, while the newborns in the

control group did not, regardless of condition. This study demonstrated that the

suprasegmental characteristics of a prenatally experienced melody were sufficient to

induce a postnatal preference for that melody.

Further supporting evidence for the salience of suprasegmental information in

fetal perception comes from the demonstration that newborns discriminated and preferred

their native language to a foreign language (Mehler et al., 1988; Moon, Cooper and Fifer,

1993). Using the /a/ or /i/ signal discrimination procedure (Moon and Fifer, 1990), Moon

et al. (1993) demonstrated that 2-day-old newborns whose mothers were monolingual

speakers of Spanish or English, preferred their mother's language to the other one.

Demonstration of a preference for the native language at such an early age favors an

interpretation of the study by Mehler et al. (1988) in terms of a prenatal familiarization.

In the latter studies, using a noncontingent habituation / dishabituation of high-amplitude

sucking procedure, Mehler et al. (1988) demonstrated that 4-day-old native French

newborns could discriminate a recording of a woman speaking Russian from the same

woman speaking French, but did not differentially respond to English versus Italian

recordings. Also, 4-day-olds of non-French parents did not respond differentially to

either Russian or French recordings. Thus, very young infants seem to require some








experience with a language in order to respond differentially to languages. This

interpretation is strengthened by additional data (Mehler et al., 1988) showing that native

English 2-month-olds also did not respond differentially to Russian or French, but easily

discriminated English from Italian. Thus, it was not merely the young age of the

newborns that resulted in their failure to respond differentially to nonnative languages.

Prenatal maternal speech is one likely source of native language experience for the

newborns.

Finally, Mehler et al. (1988) demonstrated that native French 4-day-old newborns

and native English 2-month-olds could still discriminate French from Russian and

English from Italian, respectively, even when all of the these recordings were low-pass

filtered at 400 Hz, which effectively removed most segmental information and

maintained their intonational and temporal structures. It is more likely that prenatal

auditory experience with the suprasegmental features of maternal speech influences the

ability of newborns to discriminate their native language from a nonnative language,

although it certainly is possible that newborns rely on both segmental and suprasegmental

information when discriminating their native language from a foreign language.

There is now clear evidence that from the earliest days of postnatal life the human

infant is actively engaged in processing sounds, particularly those containing acoustic

attributes of the infant's native language. The infant's prenatal experience with maternal

speech may, in large part, determine the early postnatal perceptual salience of a specific

mother's speech and native speech.








Speech Perception


Speech Perception in Infancy

There are two characterizations of infants' "initial state" regarding speech

perception. One argues that infants enter the world equipped with specialized speech-

specific mechanisms evolved for the perception of speech, and that infants are born with

a "speech module" to decode the complex and intricate speech signals (Fodor, 1983;

Mehler and Dupoux, 1994). The other holds that infants begin life without specialized

mechanisms dedicated to speech, and that infants' initial responsiveness to speech can be

attributed to their more general sensory and cognitive abilities (Aslin, 1987; Kuhl, 1987;

Jusczyk, 1996).

In fact, the capacity of newborns to distinguish minimal speech contrasts is

remarkable (Aslin, Pisoni and Jusczyk, 1983; Aslin, 1987; Kuhl, 1987; Mehler and

Dupoux, 1994). Eimas et al. (1971) were the first to demonstrate that human infants, as

young as one month old, can discriminate subtle acoustic properties in a categorical

manner that differentiate for English-speaking adults the stop-consonant-vowel syllables

/ba/ from /pa/, which are different in voice onset time (VOT). In their study, computer-

generated (synthetic) speech differing only in VOT was presented in pairs to infants for

testing with the high-amplitude sucking procedure. Only one of these VOT pairs spanned

the boundary between English-speaking adults' phonemic categories for /ba/ and /pa/.

This between-category VOT pair was discriminated by the infants, whereas several other

within-category pairs were not discriminated, even though the VOT difference between

each pair was identical (20 ms). Since then, a growing body of evidence has shown that
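The logic of the Eimas et al. (1971) design can be made concrete with a small sketch. The 25 ms boundary below is an illustrative value for the adult English /ba/-/pa/ category boundary, not a figure reported in the study.

```python
# English /ba/-/pa/ boundary near +25 ms VOT: an illustrative value,
# not a figure from the original study.
BOUNDARY_MS = 25

def label(vot_ms):
    """Categorical labeling of a stop by its voice onset time."""
    return "/ba/" if vot_ms < BOUNDARY_MS else "/pa/"

def discriminated(vot_a, vot_b):
    """Under strictly categorical perception, a pair is discriminated
    only when its members fall into different phonemic categories."""
    return label(vot_a) != label(vot_b)

# Three pairs, each separated by the same 20 ms of VOT.
pairs = [(0, 20), (20, 40), (40, 60)]
results = [discriminated(a, b) for a, b in pairs]
# Only the (20, 40) pair crosses the boundary: [False, True, False]
```

The sketch captures the key result: equal physical differences are perceived unequally, depending on whether they straddle a phonemic boundary.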

nearly all speech contrasts (phonetic contrasts) used in any of the world's natural

languages can be discriminated by 6 months of age (Aslin, Pisoni and Jusczyk, 1983;

Aslin, 1987; Kuhl, 1987; Jusczyk, 1996). There are also indications that during the early

stages, the mechanisms that underlie speech processing by infants may be a part of more

general auditory processing capacities (Aslin, Pisoni and Jusczyk, 1983; Aslin, 1987;

Kuhl, 1987; Jusczyk, 1996). Prior to 6 months of age, infants are performing their

analysis of speech sounds solely on the basis of acoustic differences. These acoustic

differences are sufficient to permit categorical perception, just as similar acoustic

mechanisms presumably support the processing of nonspeech contrasts by infants

(Jusczyk et al., 1983) and the processing of speech contrasts by nonhumans (Kuhl and

Miller, 1975, 1978).



Characteristics of Speech

Speech signals have numerous distinctive acoustic properties or attributes that are

used in the earliest stages of perceptual analysis. The average intensity of normal speech,

measured at a distance of 30 centimeters from the speaker's lips, is about 66 dB intensity

level (IL), and individual variation between speakers is about 5 dB (Dunn and White,

1940). If the pauses (silent intervals) are excluded, the experimental data indicated that

these levels would be increased by 3 dB (Fletcher, 1953). Loud speech may reach 86 dB IL,

while soft speech may be as low as 46 dB. In the course of ordinary conversation, the

dynamic range of speech is about 35-40 dB (Fletcher, 1953). In a more recent study (Cox

and Moore, 1988), the mean sound pressure level at 1 meter for a male talker speaking

with normal vocal effort was 61 dB and for a female talker was 59 dB. The average

spectra were similar in the range from 400 to 5000 Hz between male and female talkers.
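Fletcher's ~3 dB figure follows from the arithmetic of average power: if speech energy is present during roughly half of the record (an assumption used here only for illustration), discarding the pauses doubles the mean power, and 10·log10(2) ≈ 3 dB.

```python
import math

def level_db(power, ref_power=1.0):
    """Level in dB relative to a reference power."""
    return 10.0 * math.log10(power / ref_power)

# Assume, for illustration only, that speech energy is present during
# half of the record and the pauses contribute essentially no power.
active_fraction = 0.5
mean_power_with_pauses = active_fraction * 1.0   # active-stretch power = 1
gain_db = level_db(1.0) - level_db(mean_power_with_pauses)
# gain_db = 10 * log10(2), about 3 dB: the rise obtained by excluding pauses
```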

Interestingly, the comparison of long-term average speech spectra over 12 languages

showed that the spectrum was similar for all languages although there were many small

differences (Byrne, et al., 1994). The average value of sound pressure level at 20

centimeters for males was 71.8 dB SPL, while that for females was 71.5 dB SPL. For

one-third octave bands of speech, the maximum short-term r.m.s. level was 10 dB above

the maximum long-term r.m.s. level, and was consistent across languages and frequency.

Most of the energy of speech derives from vowels. Vowels are usually more

intense and relatively longer in duration than consonants. The average difference in

intensity between vowels and consonants is about 12 dB. In English, the intensity

difference between the weakest consonant /θ/ and the strongest vowel /ɔ/ is about 28 dB

(Fletcher, 1953). The frequency range of speech extends from 80 Hz to several thousand

Hertz, while the frequencies important to the speech signal are within the 100 to 5000 Hz

range (Borden and Harris, 1984). The human voice is composed of many frequencies.

The lowest frequency is the fundamental frequency of the voice, driven by the vibration

of the vocal folds. The fundamental frequency is constantly changing during articulation,

and varies considerably from one person to another. The fundamental frequency of a

low-pitched male voice is about 90 Hz, while a woman with a high-pitched voice may

speak at a fundamental frequency of about 300 Hz. On average, the female voice

corresponds to middle C or 256 Hz, whereas the male voice is about an octave lower

(Fletcher, 1953).
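A common way to estimate the fundamental frequency from a waveform is to locate the strongest autocorrelation peak within the voice-pitch range. The sketch below, run on a synthetic 120 Hz "voice," is purely illustrative and is not a method used in this study.

```python
import numpy as np

def estimate_f0(signal, fs, fmin=60.0, fmax=400.0):
    """Estimate F0 as the strongest autocorrelation peak within the
    plausible voice-pitch range (fmin to fmax)."""
    ac = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return fs / lag

# Synthetic "voiced" signal: a 120 Hz fundamental plus two harmonics.
fs = 16000
t = np.arange(fs) / fs
voice = (np.sin(2 * np.pi * 120 * t)
         + 0.5 * np.sin(2 * np.pi * 240 * t)
         + 0.25 * np.sin(2 * np.pi * 360 * t))
f0 = estimate_f0(voice, fs)   # recovers a value close to 120 Hz
```

Restricting the lag search to the 60-400 Hz range reflects the spread of fundamental frequencies described above, from low-pitched male voices near 90 Hz to high-pitched female voices near 300 Hz.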

The energy in vowels is concentrated mainly in the harmonic sounds of the

fundamental frequency, which for each vowel is divided into several typical frequency

regions, called formants, whose center frequency depends on the shape of the vocal tract

(resonance of the vocal tract). In addition to the fundamental frequency (F0), four

formants are usually recognized; the lowest two formants (F1 and F2) are stronger than

the other two and occur at frequencies typical for each vowel. The lowest three formants

are the most important for correct recognition of English vowels. The frequency range of

these formants fits fairly well within the 300-3500 Hz range, which is the standard

bandwidth used in the telephone industry (Borden and Harris, 1984; Kent, 1997). If the

fundamental frequency is raised by an octave, the formant values increase by only 17

percent (Peterson and Barney, 1952).

The consonants differ essentially from the vowels in that they usually have no

distinct formant composition; they are composed of mostly high-frequency noise

components. In most consonants, however, energy is concentrated mainly in

characteristic frequency regions. Thus, consonant sounds have components that are

higher in frequency and lower in intensity than vowel sounds. The intensity tends to be

scattered continuously over the frequency region characteristic of each consonant sound

(French and Steinberg, 1947; Borden and Harris, 1984; Kent, 1997).

In contrast to acoustic phonetics that identifies speech sounds in terms of acoustic

parameters (frequency composition, relative intensity, and duration changes), traditional

phonetics describes speech sounds in terms of the way they are produced. The main

divisions are voicing, place and manner. "Voicing" is related to vocal fold vibration, e.g.,

voiced or voiceless. "Place" is related to the location of the major airflow constriction of

the vocal tract during articulation, e.g., bilabial, labio-dental, lingui-dental, alveolar,

palatal or velar. "Manner" is related to the degree of nasal, oral, or pharyngeal cavity

constriction, e.g., vowels, stops (plosives), nasals, fricatives, affricates, liquids or glides.

Thus, /b/ in the word "best" is a voiced bilabial stop (plosive) (Borden and Harris, 1984).



Intelligibility of Speech

The ability to understand speech is the most important measurable aspect of

human auditory function. Speech can be detected as a signal as soon as the most intense

point of its spectrum exceeds the ear's pure tone threshold at the frequency concerned.

This intensity is called the speech detection threshold or threshold of detectability (Egan,

1948; Schill, 1985). At this intensity level, a listener is just able to detect the presence of

speech sounds about 50% of the time. When the intensity is increased by some 8 dB, the

subjects begin to understand some words and can repeat half of the speech material

presented; this is the speech reception threshold or threshold of perceptibility (Egan,

1948; Hawkins and Stevens, 1950; Schill, 1985). The speech reception threshold of

spondee words (two syllables), which is considerably lower than one-syllable words, is at

about 20 dB SPL (Davis, 1948; Penrod, 1985). However, only after the average intensity

of speech has reached between 30 to 33 dB SPL, are 50 percent of monosyllabic words

understood (Kryter, 1946; French and Steinberg, 1947; Davis, 1948; Egan, 1948).

Speech intelligibility or speech discrimination, expressed in terms of percentage correct,

is used to describe how much speech sound can be understood. The factors affecting

speech intelligibility are numerous. These include physical factors related to the speech

stimuli such as level of presentation, frequency composition, distortion, and signal to

noise ratio.

French and Steinberg (1947) used nonsense monosyllables of the consonant-

vowel-consonant (CVC) type as word material in their studies, and examined

intelligibility after low-pass and high-pass filtering. They found that when intensity was

increased, discrimination improved up to a certain limit, after which it remained largely

constant even if intensity was further increased. Optimal intensity with different filter

settings proved to be approximately the same, within a range of 10 dB. The optimal

intensity was 75 dB SPL. At this level, when all frequencies above 1000 Hz were passed

through the filter, 90% of CVC syllables were recognized correctly. However, when only

the frequencies below 1000 Hz were presented, correct identification of the CVC

syllables declined to 27%. The French and Steinberg study clearly demonstrated the

importance of the high frequencies for correct identification of CVC syllables.

Furthermore, when intelligibility scores were plotted as a function of cutoff-frequency of

at optimal intensity levels, the low-pass and high-pass curves intersected at 1900 Hz,

where the intelligibility score was 68%. It was said that the crossover point divided the

frequency scale into two equivalent parts; the frequencies above the crossover were as

important as the frequencies below the crossover frequency.
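The crossover point can be located by interpolating where the low-pass and high-pass intelligibility curves intersect. The score table below is hypothetical, chosen only to illustrate the computation; it is not French and Steinberg's published data.

```python
import numpy as np

def crossover(cutoffs, lp_scores, hp_scores):
    """Cutoff frequency at which the low-pass and high-pass
    intelligibility curves intersect (linear interpolation of
    their difference between tabulated points)."""
    diff = np.asarray(lp_scores, float) - np.asarray(hp_scores, float)
    i = int(np.where(np.diff(np.sign(diff)) != 0)[0][0])  # sign-change bracket
    f1, f2 = cutoffs[i], cutoffs[i + 1]
    d1, d2 = diff[i], diff[i + 1]
    return f1 + (f2 - f1) * (-d1) / (d2 - d1)

# Hypothetical percent-correct scores, for illustration only:
cutoffs   = [500, 1000, 1900, 3000, 5000]
lp_scores = [10,   27,   68,   88,   95]   # frequencies below cutoff passed
hp_scores = [97,   90,   68,   40,    8]   # frequencies above cutoff passed
fc = crossover(cutoffs, lp_scores, hp_scores)   # 1900 Hz, by construction
```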

The type of speech material distinctly affects the intelligibility of filtered speech

(Hirsh, Reynolds and Joseph, 1954). The speech materials in their study included

nonsense syllables, monosyllabic words (Central Institute for the Deaf Auditory Test W-

22), disyllabic words (spondees, iambs and trochees) and polysyllabic words. The input

speech level for all filter conditions was 95 dB SPL. They found that nonsense

monosyllables and monosyllable words suffered most in intelligibility during frequency

filtering. When the cutoff frequency (high-pass filter) was less than 3200 Hz, the

intelligibility did not decrease significantly. But intelligibility decreased rapidly as the

cutoff frequency increased above 3200 Hz. Under low-pass filter conditions, it was only

when all the frequencies above 800 Hz were eliminated that the intelligibility decreased

noticeably from its maximum, and then it dropped rapidly as the more extreme filter

conditions were reached. The functional curves for the different speech materials

remained nearly constant under both high-pass and low-pass filtering. The fewer

syllables there were in a meaningful word, the lower its intelligibility. Nonsense

monosyllables were the least intelligible of all. Intelligibility of nonsense syllables and

monosyllable words is severely affected by frequency distortion. However, as word

length increases, intelligibility is retained. For nonsense syllables, the low-pass and high-

pass functional curves intersected at 1700 Hz, where the intelligibility score was 75%.

The higher crossover frequency (1900 Hz) with lower intelligibility score (68%) in the

French and Steinberg (1947) curves may be due to the high rejection rate of the filters.

Hirsh et al. (1954) also studied noise-masking effects on the intelligibility of different

types of speech materials. The intelligibility of easy speech material increased more

rapidly as a function of signal-to-noise (S/N) ratio than did the intelligibility of more

difficult material. At a given S/N ratio, noise levels significantly affect intelligibility. In

general, intelligibility at a noise level of 70 dB was higher than that at other noise levels.

The results also showed that the intelligibility of polysyllabic, disyllabic and

monosyllabic words in noise was higher when they appeared in sentences than when they

appeared as discrete items on a list. Differences among the intelligibility scores of the different

types of words were much smaller when the words appeared in sentences. Sentence

context had the greatest benefit on understanding monosyllabic words.

Pollack (1948) increased the difficulty of the test method for studying the effect

of low-pass and high-pass filtering by adding continuous spectrum white noise at 81.5 dB

SPL as a constant background noise. The test material consisted of monosyllabic,

phonetically balanced words. The overall speech level was about 68 dB SPL at a

distance of 1 meter from the talker. In general, the results indicated that speech

intelligibility increased as the intensity level of the speech signal and the frequency range

were increased. Owing to the background noise, +10 dB orthotelephonic gain (ratio of

the sound intensity at the listener's ear produced by the test system to the orthotelephonic

reference system, about 75 dB SPL) gave only 30 percent discrimination even to

unfiltered speech. With low-pass and high-pass filtering, the intelligibility improved

continuously with increasing intensity, up to a +50 dB orthotelephonic gain with different

filter settings, even though the rise of the curves between orthotelephonic gain of +30 and

+50 dB was fairly slight. The introduction of background noise resulted in shifting

optimal intensity from +10 dB orthotelephonic gain (French and Steinberg, 1947) to the

+30 to +50 dB level.

The Pollack (1948) study also demonstrated that the contribution to the

intelligibility of the higher speech frequencies alone was small. When a high-pass filter

with a 2375 Hz cutoff was used, intelligibility was only 5% at maximal gain. However,

these same frequencies made an appreciable difference in intelligibility when the low

frequency sounds were also passed at the same time. When the cutoff frequency of low-

pass filter was extended from 2500 Hz to 3950 Hz, the intelligibility was improved from

70% to 90%. It was suggested that the contribution to intelligibility of a given band of

speech frequencies was not independent of the contribution being made at the same time

by other bands of frequencies. There was an interaction among the contributions of the

various bands. Similarly, the contribution to intelligibility of very low speech

frequencies was also small. No words were recognized when the frequencies below 425

Hz alone were heard. However, when high-pass cutoff frequency was decreased from

580 Hz to 350 Hz, the intelligibility was improved from 85% to 93%.

A study of the effects of noise and frequency filtering on the perceptual

confusions of English consonants revealed that noise and low-pass filtering ensured more

homogeneous and well-defined results, whereas the mistakes from high-pass filtering

were more indefinite (Miller and Nicely, 1955). Nonsense consonant-vowel (CV)

syllables were used as the test material. The 16 consonants were spoken initially before

the vowel /a/. The results showed that voicing and nasality (manner of articulation) were

much less affected by a random masking noise than were the other features. Affrication

and duration (manner of articulation) were somewhat superior to place but far inferior to

voicing and nasality. Voicing and nasality were discriminable at an S/N ratio as poor as -12

dB, whereas the place of articulation was hard to distinguish at S/N ratios less than 6 dB,

an 18 dB difference in efficiency. After low-pass filtering (cutoff frequency ranged from

5000 Hz to 300 Hz), voicing and nasality features were well preserved compared with

affrication and place information although affrication was superior to place of

articulation. These results showed the considerable similarity between masking by

broadband noise and filtering by low-pass filters. The authors explained that the uniform

noise spectrum masked high frequencies more than low frequencies since the high-

frequency components of speech were relatively weaker than low-frequency components,

so it was in effect a kind of low-pass filter. However, high-pass filtering (cutoff

frequency ranged from 1000 Hz to 4500 Hz) produced a totally different pattern. All

features deteriorated in about the same way as the low frequencies were removed. Thus,

low-pass filters affected linguistic features differentially, leaving the phonemes audible

but similar in predictable ways, whereas high-pass filters removed most of the acoustic

power in the consonants, leaving them inaudible and producing quite random confusions.

Audibility was the problem for high-pass filtering and confusibility was the problem for

low-pass filtering. In addition, the crossover point of the high-pass and low-pass function

curves was 1550 Hz, and it became 1250 Hz when plotted by the relative amount of

information transmitted instead of the intelligibility score. The downward shift of

crossover point in frequency indicated that relative to the intelligibility, the low-pass

information was greater and the high-pass information was smaller in consonant

recognition.
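The "relative amount of information transmitted" that Miller and Nicely plotted is the mutual information of the stimulus-response confusion matrix. A minimal sketch of that computation, on toy two-consonant matrices:

```python
import numpy as np

def transmitted_information(confusions):
    """Mutual information T(x;y) in bits computed from a
    stimulus-by-response confusion-count matrix."""
    p = np.asarray(confusions, dtype=float)
    p = p / p.sum()                       # joint stimulus-response probabilities
    px = p.sum(axis=1, keepdims=True)     # stimulus marginals
    py = p.sum(axis=0, keepdims=True)     # response marginals
    nz = p > 0
    return float(np.sum(p[nz] * np.log2(p[nz] / (px @ py)[nz])))

# Perfect transmission of two equiprobable consonants carries 1 bit;
# complete confusion carries none.
perfect = [[50, 0], [0, 50]]
chance = [[25, 25], [25, 25]]
```

Because the measure credits systematic confusions less than chance ones, a curve plotted in transmitted information can cross over at a different frequency than one plotted in percent correct, as noted above.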

Wang and her colleagues studied perceptual features of consonant confusions in

noise (Wang and Bilger, 1973), and following filtering distortion of speech (Wang, Reed

and Bilger, 1978), by sequential information analysis (SINFA), which sequentially

identifies features with a high proportion of transmitted information contributing to

consonant perception. Nonsense syllables were used as test materials in their studies.

The stimuli represented all phonologically permissible consonant-vowel (CV) and vowel-

consonant (VC) syllables, which were formed by combining one of 25 consonants with the

vowels /i/, /a/ or /u/. Wang and Bilger (1973) demonstrated that articulatory and

phonological features could account for a large proportion of transmitted information.

The particular features, which resulted in high levels of performance, varied significantly

from one syllable set to another and in some cases varied within syllable sets as a

function of listening conditions. Voice and nasal features were well perceived both in

noise and in quiet, and they were identified as perceptually important in every syllable set

where they were distinctive. The feature round (/w/ and /ʍ/) was also well perceived

both in noise and in quiet. Other features, such as frication and place, appeared to have

different perceptual importance depending upon the listening condition. Under filtering

conditions, there were differential effects of high-pass and low-pass filtering on feature

recognition (Wang, Reed and Bilger, 1978). Low-pass filtering (cutoff frequency ranged

from 5600 Hz to 500 Hz) produced systematic changes in the importance of different

features, whereas high-pass filtering (cutoff frequency ranged from 355 Hz to 4000 Hz)

produced less consistent changes in feature recognition. When the low-pass cutoff was

lowered from 2800 to 1400 Hz, sibilance (/s/, /z/, /ʃ/, /tʃ/, /ʒ/ and /dʒ/) (manner of

articulation) quickly lost its perceptibility. The high-pass filtering had little effect on the

recognition of sibilance. The high crossover point of the functions at 2800 Hz indicated

that cues for sibilant sound lay in the high-frequency region of the spectrum, above 2000

Hz. High (/k/, /g/, /ʃ/, /tʃ/, /ʒ/, /dʒ/, /ŋ/, /w/ and /j/) and anterior (/p/, /t/, /b/, /d/, /f/, /s/,

/v/, /z/, /m/, /n/, /l/, /θ/ and /ð/) features (place of articulation) also dropped noticeably

when the cutoff of low-pass filter was lowered to 1400 Hz. For CV syllables, the

crossover point, approximately 1700 Hz, was lower than that for VC syllables, about

2400 Hz. Thus, the cues for high / anterior features were partly dependent on the position

of the consonant within the syllables. However, voice and nasality became increasingly

important as the low-pass cutoff was lowered, while they were adversely affected by

high-pass filtering. The characteristics of consonant confusions following filtering were

quite similar to that noted by Miller and Nicely (1955).

The patterns of consonant confusions generated by subjects with sensorineural

hearing loss were like those generated by normal hearing subjects in response to the

appropriate filtering distortion of speech (Bilger and Wang, 1976; Wang, Reed and

Bilger, 1978). For example, severe low-pass filtering produced consonant confusions

comparable to those of listeners with high-frequency hearing loss. Severe high-pass

filtering gave a result comparable to that of patients with flat or rising hearing loss.

In 1994, Griffiths et al. investigated the intelligibility of speech stimuli recorded

within the uterus of a pregnant sheep. The results showed that the intelligibility of the

phonemes recorded in the air was significantly greater than the intelligibility of phonemes

recorded in utero. A male talker's voice was more intelligible than a female talker's

voice when the recordings were made in utero. Furthermore, an analysis of the feature

information transmission from recordings inside and outside the uterus revealed that

voicing information is better transmitted in utero than place or manner information. The

findings are quite similar to those of studies conducted by Miller and Nicely (1955) and

Wang et al. (1978) in that transmission into the uterus can be modeled as a low-pass

filter. While the results of the Griffiths et al. (1994) study reflect only the perceptibility of the

speech energies present in the amniotic fluid, they do not specify what speech energy might

be present at the level of fetal inner ear. Measurements of acoustic transmission to the fetal

inner ear are quite limited at present. The purpose of the current study was to evaluate the

intelligibility of externally generated speech utterances transmitted to and recorded at the

fetal sheep inner ear in utero.

CHAPTER 3
MATERIALS AND METHODS



The overall aims of this project were to determine the intelligibility of speech

information that was transmitted into the uterus and present within the inner ear of the sheep

fetus in utero. Cues inherent in the speech of both the mother and external talkers may be

perceived by the fetus, thus forming the basis for language acquisition. This study was

intended to provide evidence of fetal inner ear physiological responses to externally

generated speech and to address the hypotheses included in Chapter 1. The study had two

distinct components. The first involved recording speech produced through a loudspeaker

with an air microphone, a hydrophone placed in the uterus of a pregnant sheep and an

electrode secured to the round window of the fetus in utero (cochlear microphonic, CM).

The second portion of the study involved playing the recordings to a jury of normal hearing

adults so speech intelligibility could be evaluated.



Surgery

Eight time-mated pregnant ewes carrying fetuses at gestational ages from 130-140

days were prepared for surgery (term is 145 days). From this group, speech stimuli

recorded from only one animal were used in this study. Recordings from this animal were

judged by the experimenter to have the best fidelity. Speech signals produced from a

loudspeaker were recorded with an air microphone, a hydrophone placed in the uterus of

pregnant sheep and an electrode secured to the round window of the fetus. The Animal Use

Protocol in this study was approved by the Institutional Animal Care and Use Committee

(IACUC) of the University of Florida.

In preparation for measurements of fetal cochlear microphonic (CM), ewes were

fasted, anesthetized and maintained on a mixture of oxygen and halothane (1.5-2%) during

surgery and subsequent experimentation. The ewe was placed in the supine position and

the fetal head was delivered through a midline hysterotomy. An incision was made over the

fetal right bulla posterior and inferior to the pinna. The incision was located at the

attachment of the cartilaginous portion of the canal to the lateral surface of the skull and

was made parallel to the posterior border of the mandibular ramus. The bulla was exposed

and a small hole was opened through the bulla. The round window was located with an

operating microscope. An electrode was made from insulated stranded stainless steel wire

(Cooner Wire Company, Chatesworth, CA) with the insulation removed from one end. The

uninsulated end was rolled into a 2-mm diameter ball and placed inside the round window

niche (positive electrode). After verifying the impedance of the round window electrode (<

10 kΩ), the bulla was refilled with amniotic fluid and sealed over with methylmethacrylate.

Additional Cooner wire electrodes were sutured to tissue overlying the bulla (negative

electrode) and to tissue at a remote site (ground electrode). The skin over the bulla was

sutured and the electrodes were carefully secured to the fetus with silk thread. The fetus

was returned to the uterus and the uterus and abdomen were closed with clamps. Electrode

wires passed through the incisions and were connected to a biological amplifier (Grass

Instruments Co., model P511K, Quincy, MA).



Recording Speech Stimuli

The anesthetized ewe was placed supine on a stretcher and transported to a sound-

treated booth (Industrial Acoustics Co., model GDC-IL, Bronx, NY). Speech stimuli for

producing fetal CM were prerecorded on cassette tape and consisted of Vowel-Consonant-

Vowel (VCV) nonsense syllables and Consonant-Vowel-Consonant (CVC) monosyllable

words spoken by a male and a female talker. The center of a loudspeaker was one meter

from the ewe and was adjusted to the same height as the center of the lateral wall of the

ewe's abdomen. A calibrated air microphone (Brüel and Kjær, type 4165, Marlborough,

MA) was positioned over the maternal abdomen at a distance of 10 cm. A miniature

hydrophone (Brüel and Kjær, model 8103), calibrated with a pistonphone (Brüel and Kjær,

model 4223), was inserted in the uterus and connected to a charge amplifier (Brüel and

Kjær, type 2635). The output from the tape player (Harman Kardon, model TD 392,

Woodbury, NY) was routed through a power amplifier (Peavey DECA/1200, Peavey

Electronics Corp., Meridian, MS) that activated the loudspeaker (Peavey HDH-2). The

cochlear potentials, CMs recorded from the fetal inner ear in response to the speech stimuli,

were amplified (Grass Instruments Co., model P511K, Quincy, MA) and high-pass filtered

at 100 Hz (Krohn-Hite Corp., model 3550, Avon, MA, 24 dB/octave). Figure 3-1 shows

a schematic drawing of the recording-system setup.

Because the CM is produced during acoustic stimulation, the potential can be

contaminated with electromagnetic artifact emanating from the loudspeaker and associated


Figure 3-1. Schematic drawing showing the animal and the setup of devices for stimulus generation, stimulus
measurement, and recording in air, in the uterus, and from the fetal inner ear (cochlear microphonic).

wires. The electrical interference produces a voltage output from the biological amplifier

that mimics the true biologic potential. Because electromagnetic energy travels at the speed

of light, whereas acoustic energy travels at the speed of sound (344 m/s), uncontaminated

CM occurred approximately 3 ms after the onset of the stimulus. If this onset delay was not

present in the recording, then measurements were repeated after appropriate equipment

adjustment and / or grounding. The presence of an onset delay confirmed that the recorded

waveform was bioelectric rather than electromagnetic (Gerhardt et al., 1992).
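The ~3 ms criterion follows from the acoustic travel time alone. With the loudspeaker one meter from the animal and sound travelling at 344 m/s, a genuine cochlear response cannot begin before the sound arrives:

```python
SPEED_OF_SOUND = 344.0   # m/s, as given in the text

def acoustic_delay_ms(distance_m, c=SPEED_OF_SOUND):
    """Expected onset latency of a genuine cochlear microphonic:
    the acoustic travel time from loudspeaker to fetal inner ear.
    Electromagnetic artifact, by contrast, arrives essentially at 0 ms."""
    return 1000.0 * distance_m / c

delay = acoustic_delay_ms(1.0)   # loudspeaker one meter from the ewe
# delay is about 2.9 ms, matching the ~3 ms acceptance criterion
```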

Before recording speech stimuli, CMs (Figure 3-2) were verified by using tone-

bursts (0.5, 1.0 and 2.0 kHz). An evoked potential averaging computer (Tucker-Davis

Technologies, Gainesville, FL) delivered stimuli to the loudspeaker. Tone bursts were

delivered to the ewe's flank at intensity levels that were capable of producing CM

responses. Twenty stimuli were delivered and averaged for each CM response. Stimulus

duration (10 or 20 ms), sweep time (20 or 50 ms) and filtering (100-3,000 Hz or 100-10,000

Hz) varied with stimulus frequency (0.5, 1.0 and 2.0 kHz). The rate of stimulation was 5/s

and the rise/fall time was 0.2 ms.
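A gated tone burst with these parameters can be sketched as follows. The linear ramp shape and the 44.1 kHz sampling rate are assumptions; the methods specify only the frequency, duration, and the 0.2 ms rise/fall time.

```python
import numpy as np

def tone_burst(freq_hz, dur_ms, rise_ms, fs=44100):
    """Gated sinusoid with linear onset/offset ramps (ramp shape and
    sampling rate assumed, not specified in the methods)."""
    n = int(fs * dur_ms / 1000.0)
    t = np.arange(n) / fs
    r = int(fs * rise_ms / 1000.0)
    env = np.ones(n)
    env[:r] = np.linspace(0.0, 1.0, r)      # onset ramp
    env[n - r:] = np.linspace(1.0, 0.0, r)  # offset ramp
    return np.sin(2 * np.pi * freq_hz * t) * env

burst = tone_burst(500, 10, 0.2)   # a 0.5 kHz, 10 ms burst, as for CM checks
```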

The speech stimuli were delivered to the flank of pregnant ewes at two intensity

levels (105 and 95 dB SPL). First, the signals were simultaneously detected with a

microphone located over the abdomen and electrodes placed on the fetal round window in

utero. The outputs from the microphone and inner ear (CM) were recorded on two separate

channels of a DAT tape recorder (SONY Corporation, type ZA5ES, Japan). Then, the same

speech stimuli were repeated and recorded with a hydrophone placed in the uterus and

electrodes placed on the fetal round window ex utero. The fetal external canal and middle

ear cavity were cleared of fluids during ex utero measurement. At the completion of all
[Waveform traces for tone bursts at 500 Hz (70, 60 and 50 dB) and 2000 Hz (70 and 60 dB) appear here.]

Figure 3-2. CM responses obtained from a fetal sheep. Examples of CMs evoked by airborne pure tones at 0.5 and
2.0 kHz and at stimulus levels indicated under each waveform. The apparent onset latency represents the acoustic travel time
from the loudspeaker to the fetal inner ear.

measurements, the ewe and fetus were euthanized as prescribed by the IACUC of the

University of Florida.



Perceptual Testing



Subjects

A total of 155 undergraduate students from the Department of Communication

Sciences and Disorders at University of Florida volunteered to participate in this study.

From this group, responses from 139 students who judged the intelligibility of speech

stimuli were used. Sixteen students were excluded from the study for the following

reasons: eight judges used unreadable symbols; four judges were nonnative American

English speakers; and four judges reported hearing loss. The descriptive information of the

perceptual tests is presented in Table 3-1.

All of the judges had taken or were taking an undergraduate course in phonetics,

although as a group they would not be considered experienced phoneticians. All testing

was completed in a single 45-minute session. The protocol for the perceptual testing was

approved by the University of Florida Institutional Review Board (UFIRB Project # 1998-

563).



Speech Stimuli

Two sets of stimuli were used: vowel-consonant-vowel (VCV) nonsense syllables,

and consonant-vowel-consonant (CVC) monosyllabic words based on the Griffiths (1967)

word lists, both spoken by male and female talkers. Each stimulus item was presented in a

Table 3-1. Perceptual tests.

Perceptual audio CD    Contents    Number of judges
A                      VCV         33
B                      CVC         19
C                      CVC         21
D                      CVC         20
E                      CVC         21
F                      CVC         25

carrier phrase, "Mark the word ____." The 14 nonsense syllables (C = /p, t, k, b, d, g, f, v, s,

z, m, n, ʃ, tʃ/) spoken by both a male and a female talker were preceded and followed by

the vowel /a/ (e.g. /aga/). The mean fundamental frequencies were 120 and 225 Hz for the

male and female talkers, respectively. Sixty-four items were recorded at each of 16

conditions among gender of talker (male and female), stimulus levels (105 and 95 dB SPL),

and recording locations (air, uterus, CM ex utero, and CM in utero).



Procedures

The word list, spoken by both male and female talkers, was played through the

loudspeaker via a cassette tape recorder at two different airborne levels measured at the

maternal flank: 105 and 95 dB SPL (dB re: 20 µPa). The outputs from the air microphone,

the hydrophone, and the fetal inner ear (CM) ex utero and in utero were recorded on DAT

tapes. One set of recordings with the best quality sound from one fetus was chosen for

constructing perceptual tapes. First, speech stimuli were digitized and reproduced via a

computer program (Cool Edit, Syntrillium Software Corporation, Phoenix, AZ) with 44.1-

kHz sampling rate and 16-bit resolution. The amplitudes of the speech stimuli were

adjusted to the same relative voltage levels. Second, each syllable item with a carrier phrase

was saved as an individual file. Then a computer program was used to randomize and

counter-balance the speech stimuli among gender of talker (male and female), stimulus

levels (105 and 95 dB SPL), and recording locations (air, uterus, CM ex utero, and CM in

utero). Finally, six different perceptual audio compact discs (CDs) were created. One

contained randomized recordings of 224 nonsense items (14 nonsense syllables recorded








under 16 conditions). The five other CDs contained recordings of 800 monosyllabic words;

each version consisted of 160 words (10 words recorded under 16 conditions, and the same

word occurred no more than 4 times in each version). A 5-second silence interval separated

each test item.
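The randomization and counter-balancing step described above can be sketched as follows; this is an illustrative reconstruction, not the original software, and the item names are placeholders.

```python
import random

# Illustrative sketch of the randomization and counter-balancing described
# above: every test item is tagged with talker gender, stimulus level, and
# recording location, and the tagged items are shuffled so that all 16
# conditions are interleaved on each perceptual CD.
talkers = ["male", "female"]
levels = [105, 95]                                  # dB SPL at the maternal flank
locations = ["air", "uterus", "CM ex utero", "CM in utero"]

conditions = [(t, lv, loc) for t in talkers for lv in levels for loc in locations]
assert len(conditions) == 16

words = [f"word{i}" for i in range(10)]             # placeholder CVC items
playlist = [(w, c) for w in words for c in conditions]   # 160 tracks per CVC CD
random.shuffle(playlist)
```

Shuffling the full tagged list, rather than blocking by condition, is what minimizes order and learning effects across the 16 conditions.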

The recordings were used to conduct a perceptual test of speech intelligibility. The

test required groups of judges to listen to the utterances in the carrier phrase and mark on

paper what they heard. The judges' responses provided the basis for determining

intelligibility scores (percent correct) associated with the VCV nonsense items and the CVC

words.

For the 14 VCV nonsense items, the judges filled in a blank in a /a _ a/ frame with

the vowel set to /a/. For example, if a judge heard "Mark the word /apa/," he or she would

have to write a "p" in the blank to be correct.

For the 50 CVC words, each judge selected his or her response from a closed set of

six monosyllable words that differed in either the initial or final consonant. For example,

one stimulus item was "Mark the word bat" and the response list included "batch, bash, bat,

bass, back, badge." To be correct, the judge would have to mark the word "bat."
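Scoring both response formats reduces to a hit count against the answer key; a minimal sketch (the function name is hypothetical, not the original scoring code):

```python
def percent_correct(responses, answers):
    """Intelligibility score: percentage of a judge's responses that match
    the target items (e.g. marking "bat" among "batch, bash, bat, bass,
    back, badge").  Hypothetical helper, not the original scoring code."""
    hits = sum(r == a for r, a in zip(responses, answers))
    return 100.0 * hits / len(answers)

# A judge who marks "bat" and then "back" against two "bat" targets scores 50%.
```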

Each version of the perceptual audio CDs was played to a group of judges comprising

20-30 normal-hearing young adults. All testing was conducted in a specially designed

listening laboratory which accommodated up to 25 people at one time. The perceptual

audio CDs were played over earphones (HS-95 and HS-56, SONY) to the judges at an

output level set to be comfortably loud (approximately 70 dB SPL). Figure 3-3 shows the

frequency responses of two types of earphones used in the perceptual tests. Each listening























Figure 3-3. The frequency responses of the two types of earphones used for the perceptual tests: SONY HS-95 (dotted line)
and HS-56 (solid line).








test was preceded by a brief practice session, using a perceptual audio CD different

from the actual test CD, to ensure that subjects understood the perceptual tests.



Data Analyses



Statistical Analyses

Intelligibility, consonant confusion matrices and spectral analyses of recorded

speech signals were assessed. The speech intelligibility scores (percent correct) were

derived from the judges' responses to the perceptual audio CDs for the VCV nonsense

syllables and CVC words by gender, intensity level, and recording location. Multifactor

analysis of variance (ANOVA) was performed on the data of the VCV nonsense syllables

and CVC words separately. The independent variables included three factors: gender of the

talker (male and female), sound pressure level of the airborne stimulus (105 and 95 dB), and

location of recording (air, uterus, CM from ex utero fetus, and CM from in utero fetus).

The dependent variables were percentage of correct identification of nonsense syllables and

monosyllabic words (perceptual scores). In order to meet the variance assumptions for

statistical analysis, the percent intelligibility data, which are binomial variables (Thornton

and Raffin, 1978), were transformed using an arcsine function, 2 × arcsin(√(%/100)), to

normalize the variance prior to further analysis (Winer, Brown and Michels, 1991).
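The arcsine transform above is a one-liner; a minimal sketch (function name hypothetical):

```python
import math

def arcsine_transform(percent_correct):
    """Variance-stabilizing transform for binomial intelligibility scores:
    2 * arcsin(sqrt(p / 100)), applied before the ANOVA.  Hypothetical
    helper illustrating the formula quoted in the text."""
    return 2.0 * math.asin(math.sqrt(percent_correct / 100.0))
```

Under this transform 0% maps to 0, 50% to π/2, and 100% to π, so score differences near the floor and ceiling are no longer compressed relative to those near 50%.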








Information Analyses

Data were presented in the form of a 14 x 14-item confusion matrix for each

condition. A total of 16 matrices for VCV nonsense syllables were collected. Sequential

Information Analysis (SINFA; Wang, 1976) of perceptual pattern was performed. SINFA

is applied to the error matrices in order to evaluate the amount of feature information

received. SINFA allows for the partitioning of the contingent information transmitted and

received for particular features of the stimuli (e.g., voicing, manner, and place). From these

results a relative measure of performance may be calculated (the ratio of the bits of

information received to the bits sent, with the effects of other features held constant). The

data from all 16 conditions were analyzed using SINFA.



Acoustic Analyses

Acoustic analyses of five vowels (/i/, /ɪ/, /ɛ/, /æ/, /ʌ/) selected from the Griffiths

word list (CVC) were performed across the recording conditions (105-dB stimuli of both

male and female speakers recorded in air, in the uterus, CM from ex utero fetus, and CM

from in utero fetus). The fundamental frequency (F0) and the first three formant frequencies

(F1, F2, and F3), and their relative intensity levels were measured by using a signal-

processing computer program (Cool Edit, Syntrillium Software Corporation, Phoenix, AZ).

Each real-time speech waveform was digitized with 44.1-kHz sampling rate and 16-bit

resolution. An averagel50-ms segment was selected around the steady-state portion of each

vowel. The F0 and formants (F,, F,, and F,) of each segment were measured by visual

inspection of the corresponding Fourier transform spectrum using Hamming window with







69
4096 Fourier size followed by smoothing (Lee, Potamianos and Narayanan, 1999).

F0 and the formant frequencies (F1, F2, and F3) were estimated with reference to the

values measured by Peterson and Barney (1952) and Hillenbrand et al. (1995). The relative

intensity levels were also calculated by subtracting the background noise value from the peak

value under different recording conditions. Two-factor repeated measures ANOVAs were

performed on the relative intensity levels of F0, F1, F2, and F3 across the recording

locations for each vowel.
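The measurement pipeline above (150-ms segment, Hamming window, 4096-point Fourier transform, peak level minus background-noise level) can be sketched as follows; the function name is hypothetical and numpy is assumed to be available:

```python
import numpy as np

def spectrum_db(segment, fs=44100, nfft=4096):
    """Magnitude spectrum in dB of a vowel segment, using a Hamming window
    and a 4096-point FFT as in the acoustic analyses above.  Returns
    (freqs, levels); F0 and formant peaks can then be read off near their
    expected regions, and a relative level is a peak level minus the
    background-noise level.  Illustrative sketch only."""
    chunk = np.asarray(segment, dtype=float)[:nfft]
    windowed = chunk * np.hamming(len(chunk))
    mag = np.abs(np.fft.rfft(windowed, nfft))
    freqs = np.fft.rfftfreq(nfft, 1.0 / fs)
    return freqs, 20.0 * np.log10(mag + 1e-12)  # small offset avoids log(0)
```

At 44.1 kHz a 4096-point FFT gives roughly 10.8-Hz bins, fine enough to place F0 and formant peaks by inspection.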













CHAPTER 4
RESULTS AND DISCUSSION



One hundred and thirty-nine judges completed the perceptual tests. Because the

speech stimuli were completely randomized and counter-balanced across gender of talkers

(male and female), stimulus levels (105 and 95 dB SPL) and recording locations (air, uterus,

CM ex utero, and CM in utero), learning effects were minimized.



Intelligibility

The speech intelligibility scores (percent correct) derived from the judges'

responses to the perceptual audio compact discs (CDs) for the VCV nonsense syllables

and CVC words are displayed in Figures 4-1 and 4-2, respectively. A few general

observations can be made about both figures. First, intelligibility scores as a function of

location alone decreased from air to hydrophone locations and decreased again from CM

ex utero to CM in utero. That is to say, intelligibility scores of the VCV and CVC lists

were high when recorded in air and slightly less when recorded with a hydrophone in the

uterus. The scores, when recorded from the inner ear of the fetus ex utero, are 20-40%

lower than recordings from either the air or hydrophone locations. The intelligibility

scores recorded from the inner ear of the fetus in utero are about 10-20% poorer than the

scores recorded from the fetal CM ex utero. Second, from casual inspection of the two

figures, there appear to be slight gender and level effects, primarily for the VCV lists.
























Figure 4-1. Mean percent intelligibility of VCV nonsense stimuli spoken by a male and a female talker
recorded in air, in the uterus, from the fetal CM ex utero, and from fetal CM in utero at two airborne stimulus levels.
Bars equal the standard error of the mean.
































Figure 4-2. Mean percent intelligibility of CVC words spoken by a male and a female talker recorded in air, in
the uterus, from the fetal CM ex utero, and from fetal CM in utero at two airborne stimulus levels. Bars equal the
standard error of the mean.

















Gender and level effects are more pronounced from recordings of the CM than from

recordings in air or in the uterus. Summaries of the means and standard deviations for

intelligibility by gender, stimulus level, and location that contributed to these figures are

presented in Tables 4-1 and 4-2.

The results of a three-factor repeated measures ANOVA for the VCV

stimuli are summarized in Table 4-3. There was a significant three-way interaction among

gender, stimulus level, and location (F3,96 = 14.582, p < 0.0001). The main effects were

significant for each of the three factors: location (F3,96 = 994.982, p < 0.0001), gender

(F1,32 = 210.258, p < 0.0001), and stimulus level (F1,32 = 25.869, p < 0.0001). The results of

the post hoc multiple comparison test (Newman-Keuls) are presented in Table 4-4. Not

all of the paired results were included in this table. Note that intelligibility in all cases

was significantly greater (p < 0.01) for CM ex utero than for CM in utero. Also,

intelligibility of the nonsense syllables (VCV) was better at higher presentation levels

than at lower presentation levels. When both stimulus levels were compared, statistical

significance (p < 0.01) was attained for the male voice recorded in the uterus, from CM

ex utero, and from CM in utero, as well as for the female voice recorded from CM in

utero.

The ANOVA results for CVC words (Table 4-5) showed a significant three-way

interaction among gender, stimulus level, and location (F3, 315 = 22.459, p < 0.0001). This

was similar to the results for the nonsense syllables (VCV). The main effects were

significant for location (F3,315 = 1213.579, p < 0.0001) and stimulus level (F1,105 =

102.820, p < 0.0001), but not for gender (F1,105 = 1.247, p = 0.267). The results of the post

hoc multiple comparison test (Newman-Keuls) are given in Table 4-6, in which not all of









Table 4-1. VCV stimulus intelligibility scores for each talker, stimulus level and recording site.


In Air In Uterus CM-ex utero CM-in utero

Male talker 105 dB 95 dB 105 dB 95 dB 105 dB 95 dB 105 dB 95 dB

Mean (%) 99.35% 98.48% 89.61% 96.10% 80.52% 70.13% 46.75% 32.47%

S.D. (%) 2.09% 2.97% 7.16% 5.08% 11.33% 8.83% 12.25% 9.46%

No. correct (N=14) 13.909 13.788 12.545 13.455 11.273 9.818 6.545 4.545

S.D. 0.292 0.415 1.003 0.711 1.587 1.236 1.716 1.325

No. of judges 33 33 33 33 33 33 33 33

Female talker

Mean (%) 90.26% 91.34% 82.90% 78.79% 50.22% 48.48% 39.39% 28.57%

S.D. (%) 5.87% 3.89% 8.74% 10.18% 7.68% 10.22% 11.73% 11.01%

No. correct (N=14) 12.636 12.788 11.606 11.030 7.030 6.788 5.515 4.000

S.D. 0.822 0.545 1.223 1.425 1.075 1.431 1.642 1.541

No. of judges 33 33 33 33 33 33 33 33









Table 4-2. CVC stimulus intelligibility scores for each talker, stimulus level and recording site.


In Air In Uterus CM-ex utero CM-in utero

Male talker 105 dB 95 dB 105 dB 95 dB 105 dB 95 dB 105 dB 95 dB

Mean (%) 95.47% 97.45% 93.40% 85.75% 62.26% 57.83% 60.09% 40.85%

S.D. (%) 6.19% 4.38% 7.55% 13.23% 16.69% 15.24% 12.31% 15.92%

No. correct (N=10) 9.547 9.745 9.340 8.575 6.226 5.783 6.009 4.085

S.D. 0.619 0.438 0.755 1.323 1.669 1.524 1.231 1.592

No. of judges 106 106 106 106 106 106 106 106

Female talker

Mean (%) 98.02% 92.74% 90.57% 86.70% 63.02% 62.74% 52.26% 46.23%

S.D. (%) 4.66% 6.25% 7.28% 11.85% 16.28% 16.07% 15.14% 15.58%

No. correct (N=10) 9.802 9.274 9.057 8.670 6.302 6.274 5.226 4.623

S.D. 0.466 0.625 0.728 1.185 1.628 1.607 1.514 1.558

No. of judges 106 106 106 106 106 106 106 106









Table 4-3. ANOVA summary table for VCV stimuli.


Source Sum of Squares df Mean Squares F p-value
Location 180.991 3 60.330 994.982 <0.0001

Error (Location) 5.821 96 0.06063

Gender 22.470 1 22.470 210.258 <0.0001

Error (Gender) 3.420 32 0.107

Level 0.894 1 0.894 25.869 <0.0001

Error (Level) 1.106 32 0.03456

Location x Gender 3.948 3 1.316 21.539 <0.0001

Error (Location x Gender) 5.866 96 0.0611

Location x Level 2.738 3 0.913 23.181 <0.0001

Error (Location x Level) 3.779 96 0.03936

Gender x Level 0.00407 1 0.00407 0.104 0.749

Error (Gender x Level) 1.249 32 0.03904

Location x Gender x Level 2.180 3 0.727 14.582 <0.0001

Error (Location x Gender x Level) 4.784 96 0.04983










Table 4-4. Post hoc multiple comparisons (Newman-Keuls test) for VCV stimuli.


Conditions
AMH AML UMH UML XMH XML IMH IML AFH AFL UFH UFL XFH XFL IFH IFL
AMH
AML
UMH **
UML **
XMH ** **
XML ** ** **
IMH ** ** **
IML ** ** ** **
AFH **
AFL **
UFH ** **
UFL ** **
XFH ** ** **
XFL ** ** **
IFH ** ** ** **
IFL ** ** ** **
Note: A = In Air; U = In Uterus; X = CM-ex utero; I = CM-in utero; M = Male; F = Female; H = 105 dB; L = 95 dB.
-- p > 0.05; * p < 0.05; ** p < 0.01.










Table 4-5. ANOVA summary table for CVC stimuli.

Source                              Sum of Squares  df   Mean Squares  F         p-value
Location                            505.738         3    168.579       1213.687  <0.0001
Error (Location)                    43.753          315  0.139
Gender                              0.192           1    0.192         1.247     0.267
Error (Gender)                      16.154          105  0.154
Level                               9.484           1    9.484         102.820   <0.0001
Error (Level)                       9.685           105  0.09224
Location x Gender                   1.2995          3    0.433         3.456     0.0658
Error (Location x Gender)           39.486          315  0.125
Location x Level                    2.821           3    0.940         8.857     <0.0001
Error (Location x Level)            33.439          315  0.106
Gender x Level                      0.119           1    0.119         0.566     0.454
Error (Gender x Level)              22.126          105  0.211
Location x Gender x Level           7.713           3    2.571         22.459    <0.0001
Error (Location x Gender x Level)   36.061          315  0.114













Table 4-6. Post hoc multiple comparisons (Newman-Keuls test) for CVC stimuli.

Conditions: AMH AML UMH UML XMH XML IMH IML AFH AFL UFH UFL XFH XFL IFH IFL

[Pairwise significance entries not recoverable from this copy; the significant contrasts are summarized in the text.]

Note: A = In Air; U = In Uterus; X = CM-ex utero; I = CM-in utero; M = Male; F = Female; H = 105 dB; L = 95 dB.
-- p > 0.05; * p < 0.05; ** p < 0.01.








the paired results were included. It is noted that intelligibility was significantly greater (p

< 0.01) for CM ex utero than for CM in utero, except for the male voice recorded at 105

dB SPL (p > 0.05). Also, intelligibility of the words (CVC) was better at higher

presentation levels than at lower presentation levels, except for the male voice recorded

in air. When both stimulus levels were compared, statistical significance was

achieved for the male voice recorded in air (p < 0.05), in the uterus, and from CM in

utero (both p < 0.01), as well as for the female voice recorded in air and from CM in utero.

Figure 4-3 simplifies those data presented in Figure 4-1 by combining levels.

For VCV stimuli, the average intelligibility scores for the male voice recorded in air, in

the uterus, from fetal CM ex utero, and from fetal CM in utero were 98.9%, 92.9%,

75.3%, and 39.6%, respectively. For the female voice recorded in air, in the uterus, from

fetal CM ex utero, and from fetal CM in utero, the intelligibility scores were 90.8%,

80.8%, 49.4%, and 34.0%, respectively. A two-factor repeated measures ANOVA

indicated significant interaction between gender and location (F3,96 = 20.925, p < 0.0001),

and main effects for gender (F1,32 = 192.744, p < 0.0001) and location (F3,96 = 1048.477,

p < 0.0001). The post hoc multiple comparison test (Newman-Keuls) indicated that the

intelligibility scores of the male voice were significantly higher (p < 0.01) than that of the

female voice at all four recording locations. Also, for both male and female talkers, the

intelligibility scores recorded in air were significantly higher (p < 0.01) than that of each

of the other three recording locations. The scores recorded in the uterus were

significantly higher (p < 0.01) than that of recordings from CM ex utero and CM in utero.

The scores recorded from CM ex utero were significantly higher (p < 0.01) than that from

CM in utero.

























Figure 4-3. Mean percent intelligibility of VCV nonsense stimuli spoken by a male and a female talker
recorded in air, in the uterus, from the fetal CM ex utero, and from fetal CM in utero when combining two airborne
stimulus levels. Bars equal the standard error of the mean.

















Similarly, Figure 4-4 clarifies those data presented in Figure 4-2 by combining

levels. For CVC words, the average intelligibility scores for the male voice recorded in

air, in the uterus, from fetal CM ex utero, and from fetal CM in utero were 96.5%, 89.6%,

60.1%, and 50.5%, respectively. For the female voice recorded in air, in the uterus, from

fetal CM ex utero, and from fetal CM in utero, the intelligibility scores were 95.4%,

88.6%, 62.9%, and 49.3%, respectively. A two-factor repeated measures ANOVA

indicated significant interaction between gender and location (F3,315 = 3.386, p = 0.0184),

and main effects for location (F3,315 = 1045.347, p < 0.0001), but not for gender (F1,105 =

1.427, p = 0.235). A post hoc multiple comparison test (Newman-Keuls) indicated that,

for both male and female talkers, the intelligibility scores recorded in air were

significantly higher (p < 0.01) than that of each of the other three recording locations.

The scores recorded in the uterus were significantly higher (p < 0.01) than that of

recordings from CM ex utero and CM in utero. The scores recorded from CM ex utero

were significantly higher (p < 0.01) than that from CM in utero. There were no statistical

differences (p > 0.05) between the male voice and the female voice across recording

locations, except when recorded in air (p < 0.05).

As reported above, speech (VCV and CVC stimuli) intelligibility scores were

significantly higher for the recordings in air than in the uterus. Likewise, the

intelligibility was significantly greater for the recordings from CM ex utero than from

CM in utero. The recordings within the uterus reflect the speech energies present in

amniotic fluid, whereas the recordings from CM in utero represent the actual fetal

physiological responses to externally generated speech. The characteristics of

transmission of external sound pressure into the maternal abdomen and uterus have been
























Figure 4-4. Mean percent intelligibility of CVC words spoken by a male and a female talker recorded in air, in
the uterus, from the fetal CM ex utero, and from fetal CM in utero when combining two airborne stimulus levels. Bars
equal the standard error of the mean.


















well described in humans (Querleu et al., 1988a; Richards et al., 1992) and sheep

(Armitage, Baldwin and Vince, 1980; Vince et al., 1982, 1985; Gerhardt, Abrams and

Oliver, 1990). The abdominal wall, uterus, and amniotic fluid can be characterized as a

low-pass filter with a high-frequency cutoff at 250 Hz and a rejection rate of

approximately 6 dB per octave. For frequencies below 250 Hz, sound pressures passing

through to the fetus are unattenuated, and, in some cases, are enhanced. Above 250 Hz,

sound pressures are increasingly attenuated by up to 20 dB (Gerhardt, Abrams and

Oliver, 1990). Thus, the speech signals would be altered as they passed through tissues

of the ewe into the uterus. Additionally, the spectral contents of external sounds are

further modified by the route of sound transmission into the fetal inner ear. Sound

pressures pass through the fetal head by a bone conduction pathway (Gerhardt et al.,

1996). For 125 to 250 Hz, an airborne signal would be reduced by 10-20 dB before

reaching the fetal inner ear. For 500 through 2000 Hz, the signal would be reduced by

35-45 dB (Gerhardt et al., 1992). Therefore, the recordings of speech from CM in utero

would be further degraded and less intelligible than the recordings in air and in the uterus.
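The low-pass filtering described above can be summarized as a rough first-order model; the function below is an illustrative sketch only, with the 250-Hz corner, 6-dB/octave slope, and ~20-dB maximum taken from the cited values, not a fitted transfer function.

```python
import math

def tissue_attenuation_db(freq_hz):
    """Rough model of sound transmission into the uterus as described in the
    text: little or no attenuation at or below 250 Hz, then roughly 6 dB per
    octave, capped near the reported 20-dB maximum (after Gerhardt, Abrams
    and Oliver, 1990).  Illustrative sketch, not a measured transfer function."""
    if freq_hz <= 250.0:
        return 0.0
    return min(6.0 * math.log2(freq_hz / 250.0), 20.0)

# One octave above the corner (500 Hz) -> 6 dB; 2 kHz -> 18 dB; 8 kHz -> capped at 20 dB.
```

Note that this sketch omits the additional 10-45 dB head attenuation along the bone-conduction route to the fetal inner ear, which the text treats separately.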

The present findings reveal better intelligibility for speech in the uterus than has

been previously found (Querleu et al., 1988b; Griffiths et al., 1994). Querleu et al.

(1988b) found that about 30% of 3120 French phonemes recorded within the uterus of

pregnant women were recognized. In 1994, Griffiths et al. evaluated the intelligibility of

speech stimuli (VCV nonsense syllables and CVC words) recorded within the uterus of a

pregnant sheep. The intelligibility scores were approximately 55% and 34% for the male

and female talkers, respectively. However, the results from the current study showed that

the intelligibility scores, averaged across the stimulus types and intensity levels, were



