Encoding of sound-source elevation by the spike patterns of cortical neurons


Material Information

Title:
Encoding of sound-source elevation by the spike patterns of cortical neurons
Physical Description:
ix, 133 leaves : ill. ; 29 cm.
Language:
English
Creator:
Xu, Li, 1963-
Publication Date:

Subjects

Subjects / Keywords:
Research   ( mesh )
Neurons, Afferent -- physiology   ( mesh )
Auditory Cortex -- physiology   ( mesh )
Sound Localization   ( mesh )
Acoustic Stimulation   ( mesh )
Neural Networks (Computer)   ( mesh )
Cats   ( mesh )
Department of Neuroscience thesis Ph.D   ( mesh )
Dissertations, Academic -- College of Medicine -- Department of Neuroscience -- UF   ( mesh )
Genre:
bibliography   ( marcgt )
non-fiction   ( marcgt )

Notes

Thesis:
Thesis (Ph.D.)--University of Florida, 1999.
Bibliography:
Bibliography: leaves 124-131.
General Note:
Typescript.
General Note:
Vita.
Statement of Responsibility:
by Li Xu.

Record Information

Source Institution:
University of Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 030103542
oclc - 51555765
System ID:
AA00022316:00001

Table of Contents

    Title Page
    Acknowledgments
    Table of Contents
    List of Figures
    Abstract
    Chapter 1. Introduction
    Chapter 2. Background
    Chapter 3. Sensitivity to sound-source elevation in nontonotopic auditory cortex
    Chapter 4. Auditory cortical sensitivity to vertical source location: Parallels to human psychophysics
    Chapter 5. Summary and conclusions
    References
    Biographical sketch

Full Text










ENCODING OF SOUND-SOURCE ELEVATION BY THE SPIKE PATTERNS OF
CORTICAL NEURONS















By

LI XU













A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY


UNIVERSITY OF FLORIDA


1999














ACKNOWLEDGMENTS

First of all, I thank my mentor and role model, Dr. John Middlebrooks, for his

teaching, guidance, support, and encouragement during my graduate training. The

knowledge and experience that I have gained in his laboratory have contributed greatly to

the development of my academic career.

I thank the members of my supervisory committee, Drs. Roger Reep, Charles

Vierck, Jr., and Robert Sorkin, for their constructive comments as well as critical

questions. I thank Dr. David Green, who, although retired from the supervisory

committee, has provided me with continuous help.

I am grateful to have worked with several postdoctoral fellows in Dr.

Middlebrooks's laboratory: Drs. Ann Clock Eddins, Shigeto Furukawa, and Ewen

Macpherson. Ann helped me to fit in the lab. Shigeto has participated in most

experiments and has contributed one good idea after another for my data analysis and

final discussion. Ewen has helped me make sense of the mysteries of psychophysical

modeling in spatial hearing. New students in Dr. Middlebrooks's laboratory, Julie

Arenberg and Brian Mickey, have brought fresh thoughts to the lab. Many thanks go

to Zekiye Onsan, who has provided the ultimate technical assistance in the lab.

I thank my fellow graduate students Tony Acosta-Rua, Kellye Daniels, Sean

Hurley, Alyson Peel, and Jeff Petruska for their friendship, and I wish them all the

best in their careers.








I thank the Department of Neuroscience for allowing me to do my dissertation

research away from Florida, and, equally, I thank the Kresge Hearing Research Institute

of the University of Michigan for accepting me to complete my research there and for

awarding me a one-year traineeship (funded by NIDCD).

Finally, I would like to thank my friends and my family, whom I always keep in my

heart, for their understanding, patience, and faith throughout the years.














TABLE OF CONTENTS


                                                                            page


ACKNOWLEDGMENTS .............................................................. ii

LIST OF FIGURES .............................................................. vi

ABSTRACT ................................................................... viii

CHAPTERS

1 INTRODUCTION ................................................................ 1

2 BACKGROUND .................................................................. 4
    Acoustical Cues for Sound Localization ................................... 4
    Auditory Cortex: Structure and Function .................................. 8
        Area A1 .............................................................. 8
        Area A2 ............................................................. 14
        AAF ................................................................. 15
        Area AES ............................................................ 17
    Neural Codes for Sensory Stimuli ........................................ 20
        Spike Rate as Neural Codes ......................................... 20
        Spike Timing as Neural Codes ....................................... 22

3 SENSITIVITY TO SOUND-SOURCE ELEVATION IN
  NONTONOTOPIC AUDITORY CORTEX ............................................. 28
    Introduction ............................................................ 28
    Methods ................................................................. 30
    Results ................................................................. 33
        General Properties of Sound-Source Elevation Sensitivity ........... 33
        Neural Network Classification of Spike Patterns .................... 38
        Comparison of Elevation Coding in Areas AES and A2 ................. 47
        Contribution of SPL Cues to Elevation Coding ....................... 48
        Frequency Tuning Properties and Network Performance ................ 54
        Relation between Azimuth and Elevation Coding ...................... 58
    Discussion .............................................................. 60
        Acoustical Cues and Localization in Median Plane ................... 60
        A2 versus AES: Elevation Sensitivity and Frequency Tuning
          Properties ........................................................ 63
        Correlation between Azimuth and Elevation Coding ................... 65
        Concluding Remarks ................................................. 66

4 AUDITORY CORTICAL SENSITIVITY TO VERTICAL SOURCE
  LOCATION: PARALLELS TO HUMAN PSYCHOPHYSICS ............................... 68
    Introduction ............................................................ 68
    Methods ................................................................. 71
        Experimental Apparatus ............................................. 71
        Multichannel Recording and Spike Sorting ........................... 72
        Stimulus Paradigm and Experimental Procedure ....................... 73
        Data Analysis ...................................................... 76
    Results ................................................................. 77
        General Properties of Neural Responses to Broadband and
          Narrowband Stimuli ............................................... 78
        Network Classification of Responses to Broadband Stimulation ...... 80
        Neural Network Classification of Responses to Narrowband
          Stimulation ...................................................... 82
        The Model of Spectral Shape Recognition ............................ 86
        Correspondence of Physiology with Behavioral Simulation ............ 92
        Neural Responses to Stimuli Containing a Narrowband Notch .......... 97
        Comparison of Narrowband Noise Results to Highpass Noise Data .... 100
        Elevation Sensitivity by Spike Counts ............................. 108
    Discussion ............................................................. 111
        Spectral Features and Elevation Coding ............................ 112
        Influences of Spectral Notches on Elevation Coding ................ 116
        Elevation Coding by Spike Counts and Spike Timing ................. 117
        Concluding Remarks ................................................ 119

5 SUMMARY AND CONCLUSIONS ................................................. 121

REFERENCES ................................................................ 124

BIOGRAPHICAL SKETCH ....................................................... 132














LIST OF FIGURES


Figure                                                                      page


3.1.  Spike-count-versus-elevation profiles ................................ 34
3.2.  Distribution of depth of modulation of spike count by elevation ...... 36
3.3.  Distribution of the range of elevations over which spike counts greater
      than half maximum were elicited ...................................... 37
3.4.  Distribution of locations of best-elevation centroids ................ 39
3.5.  Raster plot of responses from two AES units (A: 950531 and B: 950754)
      and an A2 unit (C: 970821) ........................................... 40
3.6.  Network performance of the same unit (950531) as in Figure 3.5A ...... 41
3.7.  Network performance of the same unit (950754) as in Figure 3.5B ...... 43
3.8.  Network performance of the same unit (970821) as in Figure 3.5C ...... 44
3.9.  Distribution of elevation coding performance across the entire sample
      of units ............................................................. 46
3.10. Comparison of network performance of A2 and AES units ................ 48
3.11. Sound levels and neural network performance .......................... 50
3.12. Percentage of unit sample activated as a function of stimulus tonal
      frequency ............................................................ 55
3.13. Frequency tuning bandwidth and neural network performance ............ 57
3.14. Correlation between network performance in azimuth and elevation ..... 59
4.1.  Unit responses elicited by broadband and narrowband noise
      (unit 9806C02) ....................................................... 79
4.2.  Network analysis of spike patterns of the same unit (9806C02) as in
      Figure 4.1 ........................................................... 81
4.3.  Unit responses elicited by broadband, narrowband, and notched noise
      (unit 9806C16) ....................................................... 84
4.4.  Network estimates of elevation ....................................... 85
4.5.  Network analysis of spike patterns and model predictions in response
      to narrowband stimulation ............................................ 87
4.6.  Head-related transfer functions (HRTFs) in the median plane measured
      from left ears of 3 cats ............................................. 88
4.7.  Spectral differences between the narrowband stimulus spectra and
      HRTFs ................................................................ 90
4.8.  Correspondence between model prediction and network outputs .......... 93
4.9.  Distribution of percent correct for all narrowband center frequencies
      across the sample of units ........................................... 96
4.10. Network analysis of spike patterns elicited by notched noise ......... 99
4.11. Unit responses elicited by broadband, narrowband, and highpass noise
      (unit 9811C03) ...................................................... 101
4.12. Comparison of network classification of the spike patterns elicited by
      narrowband and highpass noise ....................................... 103
4.13. Sum of the squared differences (SSD) of network outputs ............. 105
4.14. Distribution of percentile of matched SSD across the sample of
      units ............................................................... 107
4.15. Accuracy of elevation coding by spike counts and by full spike
      patterns ............................................................ 109
4.16. Network classification of spike counts elicited by narrowband
      sounds .............................................................. 110














Abstract of Thesis Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

ENCODING OF SOUND-SOURCE ELEVATION BY THE SPIKE PATTERNS OF
CORTICAL NEURONS

By

Li Xu

May 1999

Chairman: John C. Middlebrooks
Major Department: Neuroscience

Previous studies have demonstrated that the spike patterns of auditory cortical

neurons carry information about sound-source location in azimuth. The question arises

as to whether those neurons integrate the multiple acoustical cues that signal the location

of a sound source, or whether they merely demonstrate sensitivity to a specific parameter

that covaries with sound-source azimuth, such as interaural level difference. We

addressed that issue by testing the sensitivity of cortical neurons to sound locations in the

median vertical plane, where interaural difference cues are negligible. We also tested

whether and how cortical neurons use spectral information to derive their elevation

sensitivity. The study involved extracellular recording of units in the nontonotopic

auditory cortex (areas AES and A2) of chloralose-anesthetized cats. Broadband noise

and various spectrally-filtered stimuli were presented in an anechoic room from 14

locations in the vertical midline in 20° steps, from 60° below the front horizon, up and









over the head, to 20° below the rear horizon. Artificial neural networks were used to

recognize spike patterns, which contain both the number and timing of spikes, and to

thereby estimate the locations of sound sources in elevation. The network performance

was fairly accurate in classifying spike patterns elicited by broadband noise. Using the

same neural network that was trained with spike patterns elicited by broadband noise, we

presented spike patterns elicited by spectrally-filtered noise and recorded network

estimates of the locations in elevation of those stimuli. This procedure could be

considered as the physiological analog of asking a psychophysical listener to report the

apparent location of a spectrally-filtered noise. The network elevation estimates based

on spike patterns elicited by narrowband and highpass noise exhibited tendencies similar

to localization judgments by human listeners. A quantitative model derived from

comparison of the stimulus spectrum with the external-ear transfer functions of individual

cats could successfully predict the region in elevation that was associated with

narrowband noise. These results further support the theory that full spike patterns

(including spike counts and spike timing) of cortical neurons code information about

sound location and that such neural responses underlie the localization behavior of the

animal.
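The analysis strategy summarized in this abstract — train a network on spike patterns elicited by broadband noise, then probe it with patterns elicited by other stimuli — can be illustrated in miniature. Everything below is a hypothetical stand-in, not the networks or data of the study: synthetic Poisson "spike patterns" whose response peak shifts with elevation, and a small one-hidden-layer softmax network trained by plain batch gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 14 elevations, 20 trials each, spike
# trains binned into 40 time bins. This toy generator shifts a response
# peak with elevation, a crude analog of elevation-dependent spike timing.
n_elev, n_trials, n_bins = 14, 20, 40

def fake_pattern(e):
    peak = 2.0 + 2.5 * e                      # peak latency varies with elevation
    rate = 4.0 * np.exp(-0.5 * ((np.arange(n_bins) - peak) / 1.5) ** 2)
    return rng.poisson(rate) / 4.0            # noisy, scaled spike counts

X = np.array([fake_pattern(e) for e in range(n_elev) for _ in range(n_trials)])
y = np.repeat(np.arange(n_elev), n_trials)
T = np.eye(n_elev)[y]                         # one-hot elevation targets

# One hidden layer, softmax output, batch gradient descent on
# cross-entropy loss (a drastic simplification of the study's networks).
W1 = 0.1 * rng.standard_normal((n_bins, 25)); b1 = np.zeros(25)
W2 = 0.1 * rng.standard_normal((25, n_elev)); b2 = np.zeros(n_elev)
for _ in range(6000):
    H = np.tanh(X @ W1 + b1)
    logits = H @ W2 + b2
    P = np.exp(logits - logits.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)
    G = (P - T) / len(X)                      # gradient of loss wrt logits
    GH = (G @ W2.T) * (1.0 - H ** 2)          # backprop through tanh
    W2 -= 0.2 * (H.T @ G);  b2 -= 0.2 * G.sum(0)
    W1 -= 0.2 * (X.T @ GH); b1 -= 0.2 * GH.sum(0)

def predict(patterns):
    """Network's elevation estimate for each spike pattern."""
    H = np.tanh(patterns @ W1 + b1)
    return np.argmax(H @ W2 + b2, axis=1)

train_acc = np.mean(predict(X) == y)
```

Once trained on the "broadband" patterns, `predict` can be handed patterns from any other stimulus condition; its outputs then play the role of a listener's reported locations, which is the sense in which the procedure is a physiological analog of a psychophysical experiment.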














CHAPTER 1
INTRODUCTION


The auditory cortex is essential for sound localization behavior. Human patients

with unilateral temporal lobe lesions have difficulties in localizing sounds from the side

contralateral to the lesion (Greene 1929; Klingon and Bontecou 1966; Sanchez-Longo

and Forster 1958; Wortis and Pfeiffer 1948). Experimental ablations of the cat's auditory

cortex also result in deficits in localization of sound sources presented on the side

contralateral to the lesion (Jenkins and Masterton 1982). Despite sustained effort in

neurophysiological studies of the auditory cortex, the cortical codes for sound

localization are still not well understood.

Studies of the optic tectum in the barn owl (Knudsen 1982) and the superior

colliculus in mammals (Middlebrooks and Knudsen 1984; Palmer and King 1982) show

evidence of single neurons that are selective for sound-source location. The neurons'

preferred sound-source locations vary systematically according to the locations of the

neurons within the midbrain structure. Therefore, the working hypothesis for most

studies of the auditory cortex has been that there exists a topographic code for sound

localization in the auditory cortex (Brugge et al. 1994; Clarey et al. 1994; Imig et al.

1990; Middlebrooks and Pettigrew 1981; Rajan et al. 1990b). Unfortunately, results reported

from the aforementioned studies have not produced evidence to support such a

hypothesis.








In 1994, Middlebrooks and colleagues proposed an alternative hypothesis that a

distributed code exists for sound localization in the auditory cortex. Studies in his

laboratory have shown that spike patterns (spike counts and spike timing) of the auditory

cortical neurons carry information about sound-source location (Middlebrooks et al.

1994, 1998; Xu et al. 1998). The essence of the hypothesis of the distributed code for

sound localization is that the activity of each individual neuron can carry information

about broad ranges of location and that accurate sound localization is derived from

information that is distributed across a large population of neurons.

The present study extended that line of research in Middlebrooks's laboratory and

expanded the observation from the horizontal plane to the vertical plane. In the central

nervous system, the computational processes for sound localization in the vertical plane

are different from those involved for sound localization in the horizontal plane, due to

different acoustical cues that are used for localization in the two dimensions. Interaural

difference cues (i.e., interaural time difference and interaural level difference) are used for

horizontal localization, whereas spectral shape cues are used for vertical localization and

front/back discrimination. The computational processes for those cues are parallel and

segregated as early as in the cochlear nucleus and all the way throughout the brainstem.

The present study was designed to address whether the cortical neurons that have

previously been shown to code azimuth integrate the multiple acoustical cues that signal

the location of a sound source, or whether they merely demonstrate sensitivity to a

specific parameter that covaries with sound-source azimuth, such as interaural level

difference. Manipulation of source spectra can confound spectral shape cues for vertical

localization. Listeners make systematic misjudgments when asked to localize spectrally-








manipulated noise. Since interaural difference cues are still intact, such a spectral

manipulation does not cause error in horizontal localization. Thus, manipulation of

source spectra provides a way to test more directly whether the cortical neurons utilize the

spectral shape cues to code sound-source elevation and that their activities are closely

related to the localization behavior of the animal. We studied the changes in the

elevation sensitivity of the cortical neurons under the conditions of spectrally-

manipulated noise stimulation.

The remainder of the document is organized in the following manner. Chapter 2

reviews the acoustical cues for sound localization with an emphasis on the vertical and

front/back dimensions. It also provides a background on the structure and function of

the auditory cortex followed by a short review on the cortical codes for sensory stimuli

with special attention to the coding of stimuli by the timing of spikes. Two subsequent

chapters describe two major research projects that deal with elevation coding in the

auditory cortex, each with detailed introduction, methods, results, and discussion.

Chapter 3 describes the sensitivity to sound-source elevation in the nontonotopic

auditory cortex. Chapter 4 describes the responses of auditory cortical neurons to

spectrally-manipulated noise stimuli that produce localization illusion. Finally, Chapter 5

provides a brief summary and conclusions from the present research.














CHAPTER 2
BACKGROUND

Acoustical Cues for Sound Localization


Unlike visual space, which is mapped onto the retina in a point-to-point fashion,

sound-source locations are not mapped directly onto the ear. Instead, locations must be

computed by the brain from sets of acoustical cues that result from the interaction of the

incident sound wave with the head and external ears. Azimuth information is derived at

high frequencies from the interaural level differences (ILDs) and at low frequencies from

interaural phase differences (IPDs). Those binaural difference cues, however, are

ambiguous in distinguishing the vertical and front/back locations (i.e., the elevation). In

the median sagittal plane, for example, ILD and IPD values are zero at all locations, if the

head is perfectly symmetrical. Off the median plane, ILD and IPD are constant for

locations that fall on the surface of virtual cones centered on the interaural axis. Thus,

Woodworth (1938) coined the term "cone of confusion." Batteau (1967) was one of

the first to draw our attention to the pinna-based spectral cues as a necessary factor to

disambiguate the position around the cone. The convoluted surfaces of the pinna and

concha differentially modify the frequency spectrum of the incoming acoustical signal

depending on the angle of incidence of the signal. The spectral features, or spectral

shape cues, that result from the modification by the pinna, including spectral peaks and

notches, vary systematically with sound-source locations (Shaw 1974; Mehrgardt and








Mellert 1977; Humanski and Butler 1988; Middlebrooks et al. 1989; Wightman and

Kistler 1989). The frequencies of the spectral peaks and notches increase as sound-

source locations are shifted from low to high elevation, both in the front and rear

locations. The peaks and notches grow smaller at high elevations (above -70), resulting

in a relatively less transformed spectra for sources above the head. There is significant

individual variation in the spectral shape cues due to the physical shape and size

differences of the pinnae and heads among subjects (Middlebrooks 1999a).
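The ambiguity of the interaural cues discussed above can be made concrete with Woodworth's classic spherical-head approximation of the interaural time difference, ITD = (a/c)(θ + sin θ), where a is the head radius, c the speed of sound, and θ the source azimuth. The sketch below is illustrative only; the default radius is a nominal adult human head, an assumption not taken from this study (a cat's head would use a smaller radius).

```python
import math

def woodworth_itd(azimuth_deg, head_radius_m=0.0875, c=343.0):
    """Spherical-head approximation of the interaural time difference
    (after Woodworth 1938): ITD = (a/c) * (theta + sin(theta)).
    head_radius_m defaults to a nominal human head, an assumption."""
    theta = math.radians(azimuth_deg)
    return (head_radius_m / c) * (theta + math.sin(theta))

# The cue grows from zero at the midline to its maximum at 90 degrees
# azimuth; every location on a given "cone of confusion" shares the
# same value, which is why this cue alone cannot specify elevation.
for az in (0, 30, 60, 90):
    print(f"{az:2d} deg -> {woodworth_itd(az) * 1e6:5.0f} us")
```

Because the formula depends only on the angle from the interaural axis, all sources on a cone around that axis yield the same ITD, which restates why spectral shape cues are needed to resolve vertical and front/back position.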

Several lines of evidence from psychophysical studies indicate that spectral shape

cues are the major cues for vertical localization. For example, vertical localization is

most accurate when the stimulus has a broad bandwidth that contains energy at 4 kHz

and above (Butler and Helwig 1983; Gardner and Gardner 1973; Hebrank and Wright

1974b; Makous and Middlebrooks 1990; Roffler and Butler 1968). Spectral shape cues

from one ear seem to be sufficient for vertical localization. Vertical localization with a

single ear, tested by plugging the other ear, is almost as accurate as with both ears (Hebrank

and Wright 1974a; Oldfield and Parker 1986). Patients who have congenital deafness in

one ear but normal hearing in the other show accurate vertical localization (Slattery and

Middlebrooks 1994). However, a recent virtual localization study revealed some

discrepancies in monaural localization between free-field results and virtual-source results

(Wightman and Kistler 1997). In that study, vertical localization was eliminated using

monaurally-delivered virtual source sounds.

There are numerous studies on how localization is affected by perturbing,

obscuring, or removing the spectral shape cues. Gardner and Gardner (1973) measured

median plane localization accuracy as listeners' pinnae were gradually occluded with








rubber inserts. Performance was progressively degraded by various degrees of occlusion.

These effects were also observed by Fisher and Freedman (1968), who bypassed the

listener's pinnae with inserted tubes. A recent study by Hofman and colleagues (1998)

offered an intriguing new insight into how the brain learns the transfer functions of the

ears. Those researchers modified the subjects' spectral shape cues by reshaping their

pinnae with plastic molds. The localization of sound elevation was dramatically degraded

immediately after the modification. After six weeks of wearing these molds

continuously, though, all subjects seemed to have learned the transfer functions of the

new ears, so their vertical localization with the new ears was normal again. More

interestingly, learning the new spectral shape cues did not interfere with the neural

representation of the original cues, as the subject could localize sounds with both normal

and modified pinnae (Hofman et al. 1998).

Bandpassing the acoustic signal is another commonly-used method to either

partially or completely remove spectral shape cues from the signal depending on the

bandwidth of filter. In the case of tonal stimulation, the source spectrum consists of a

single sinusoid component. Roffler and Butler (1968) used tonal signals in their studies

of median plane localization. They demonstrated that the apparent elevation of a source

depended on its frequency and was independent of its actual position. Some other

experiments were performed with narrowband noise stimuli. Blauert (1969/1970)

presented 1/3-octave noise from the median plane and showed that the center frequencies

of the noise determined whether the apparent position was in front, above or behind.

Similar effects were shown by Butler and Helwig (1983) using 1-kHz-wide noise bands

with center frequencies ranging from 4 to 14 kHz. A final example of narrowband








localization is described by Middlebrooks (1992). In his experiment, subjects reported a

compelling illusion of an auditory image located at an elevation that was determined by

the center frequency of the 1/6-octave-wide narrowband sounds, not by the actual source

location. A typical subject, for instance, consistently reported an image high and in front

when the center frequency was 6 kHz and low and to the rear when the center frequency

was 10 kHz. A model that incorporated measurement of the external-ear transfer

functions could predict the reported sound locations. In such a model, similarity between

the spectra of narrowband stimuli and the external-ear transfer functions was calculated

by way of correlation. Localization judgments of the subjects were biased to locations

for which the external-ear transfer function most closely resembled the stimulus spectrum

(Middlebrooks 1992).

It is worth noting that disruption of spectral shape cues does not affect accurate

localization in azimuth (Hofman et al. 1998; Kistler and Wightman 1992; Middlebrooks

1992, 1999b; Oldfield and Parker 1984). It seems that interaural difference cues and

spectral shape cues are utilized independently to derive sound-source azimuth and

elevation, respectively. The brain is therefore capable of integrating multiple acoustical

cues, including ILDs, IPDs, and spectral shape cues, to synthesize the sound locations.

How the brain interprets the spectral shape cues is a puzzling question. Models of sound

localization support the concept of a central repository of direction templates, derived

from the directional transformation of the external ears (Macpherson 1998;

Middlebrooks 1992; Zakarauskas and Cynader 1993). In such a theory, the frequency

spectrum of an incoming sound is compared to each of the templates, and the one that

matches the best then signals the direction of the incoming sound.
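The template-matching idea just described — correlate the incoming spectrum with a stored directional transfer function for each candidate location and take the best match (Middlebrooks 1992) — can be sketched as follows. The templates here are synthetic Gaussian spectral peaks that migrate upward in frequency with elevation, a stand-in for measured external-ear transfer functions; the numbers are illustrative, not measured values.

```python
import numpy as np

# Hypothetical direction "templates": gain spectra for 14 elevations over
# 64 frequency bins, with a spectral peak that moves up in frequency as
# elevation increases. Real templates would be measured HRTF spectra in dB.
n_elev, n_freq = 14, 64
f = np.arange(n_freq)
templates = np.array(
    [10.0 * np.exp(-0.5 * ((f - (10 + 3 * e)) / 6.0) ** 2) for e in range(n_elev)]
)

def predict_elevation(stimulus_db):
    """Correlate the stimulus spectrum against every template and
    report the elevation whose template matches best."""
    scores = [np.corrcoef(stimulus_db, t)[0, 1] for t in templates]
    return int(np.argmax(scores))

# A narrowband stimulus peaking where the elevation-5 template peaks is
# "localized" at elevation 5, regardless of its actual source location --
# the template-matching account of the narrowband illusion.
narrowband = 10.0 * np.exp(-0.5 * ((f - (10 + 3 * 5)) / 4.0) ** 2)
print(predict_elevation(narrowband))
```

The same mechanism explains why a narrowband source produces a compelling but frequency-determined illusion: the comparison is biased toward whichever template the stimulus spectrum happens to resemble.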








Auditory Cortex: Structure and Function


This section describes the morphological organization of the auditory cortex, i.e.,

the laminar characteristics and the thalamic connections. Focus then moves to the

physiological representations in the auditory cortex, including tonotopic arrangement,

binaural processing, and sound localization. This review will consider primarily studies in

the cat, the species used in the present research.

The cat's auditory cortex is displayed on the lateral surface of the brain. Based on

cytoarchitectural characteristics and physiological properties, the auditory cortex is

divided into subregions. They are the primary auditory cortex (A1), the second auditory

cortex (A2), the anterior auditory field (AAF), the dorsal posterior (DP), posterior (P),

ventral posterior (VP), ventral (V), and temporal (T) auditory fields, and the anterior

ectosylvian sulcus area (area AES) (Clarey and Irvine 1986; Imig and Reale 1980). The

most complete studies have been done in areas A1, A2, AAF, and AES.

Area Al

The primary auditory cortex is characterized by an overall high packing density in

layers II, III and IV of the six layers. The high density of granular cells gives the cortex

the term koniocortex, or "dust cortex." The human primary auditory cortex is a

900-1600 mm2 area of classic koniocortex along the transverse temporal gyri of Heschl,

corresponding to area 41 (Brodmann 1909). It is surrounded by nonprimary cortex that

can be subdivided into four or five areas. In the cat, Al is located in the dorsal middle

ectosylvian gyrus. The distinction of Al from other auditory cortical areas can be made

in sections stained for cell bodies by the light band of the inner sublayer of layer V (Rose








1949). A detailed description of the A1 cytoarchitecture was provided by Winer

(1992). The molecular layer (layer I) is remarkable for its few neurons. The bulk of its

connections are with the apical dendrites of deeper-lying neurons or within layer I. The

external granule cell layer (layer II) has a wide range of both pyramidal and nonpyramidal

neurons, a columnar and vertical organization that is conserved in the deeper layers, and

significant neurochemical diversity. Its principal connections are with adjacent

nonprimary auditory areas, and it provides local interlaminar projections to layers I-III.

The external pyramidal cell layer (layer III) has a complex set of intrinsic and extrinsic

connections, including relations with the auditory thalamus and ipsilateral as well as

contralateral auditory cortices. This is reflected in its diverse neuronal architecture. The

pyramidal cells of various sizes that are more common in the deeper one-half represent

the most conspicuous population in this layer. Many commissural cells of origin lie in

this layer. The granule cell layer (layer IV), only about 250 µm thick, represents one-

eighth of the cortical depth. Its connectivity is dominated by thalamic, corticocortical,

and intrinsic input. It also receives projections from the commissural system but does not

send fibers to the system like layer III does. The vertical column organization is

particularly obvious in this layer. The internal pyramidal cell layer (layer V) has a cell-

sparse, myelin-rich outer half (Va), and an inner half (Vb) with many medium-sized and

large pyramidal cells. It is the source of connections to the ipsilateral nonprimary

auditory cortex, the contralateral Al, the auditory thalamus and the inferior colliculus.

The multiform layer (layer VI) contains the most diverse neuronal population within Al,

consisting of at least nine readily recognized types of cells (Winer 1992).








The major thalamic input to A1 comes from the ventral division of the medial

geniculate body (MGB). This specific auditory relay system ends predominantly in layers

III and IV (Winer 1992). The thalamocortical and corticothalamic A1 projections are

highly reciprocal (Andersen et al. 1980). In addition, the connections between MGB and

A1 preserve the systematic topography. For example, injection of an anterograde tracer

into A1 results in sheetlike labeling in the ventral division of the MGB, and the labeled

sites change systematically with the characteristic frequencies of the injection sites. A1

also receives minor input from a nontonotopic thalamic nucleus (medium-large cell

division of the medial division) (Morel and Imig 1987).

The tonotopic organization of A1 in the cat was first demonstrated at the single-

cell level by Merzenich and associates (1973, 1975). Frequency is represented across the

mediolateral dimension of Al cortex as isofrequency bands. On an axis perpendicular to

this plane of representation, the best frequencies change as a simple function of cortical

location. Low frequencies are represented posteriorly, and high frequencies anteriorly.

The frequency tuning curves of the vast majority of the Al neurons are narrow, with the

sharpest tuning at higher best frequencies (Phillips and Irvine 1981). Along the

isofrequency contour, gradients of tuning sharpness exist. The sharpest frequency tuning

is found near the center of the mediolateral extent of Al, and the sharpness of tuning

gradually decreases toward the medial and lateral borders of A1, as revealed by multiple-

unit recordings (Schreiner and Mendelson 1990). In a single-unit study, the gradient in

bandwidth at 40 dB above minimum threshold (BW40) exists in the dorsal half of A1

(A1d), but the ventral half of A1 (A1v) shows no clear BW40 gradient (Schreiner and

Sutter 1992). It is a common observation that within the same vertical penetration into








A1, the best frequency is remarkably constant. The cortical area that represents the

higher frequencies is disproportionately larger than that representing the lower frequencies,

suggesting that more of the cat's neural machinery is devoted to encoding or extracting

information relevant to high frequencies.

The representation of a "point" on the sensory epithelium of the cochlea as a "band"

of cortex suggests that some other parameter of the auditory stimulus is functionally

organized along the isofrequency dimension. There is evidence that groups of neurons

with different binaural response properties are segregated within an A1 isofrequency band.

More than 90% of the neurons encountered in A1 can be classified into either the

excitatory/excitatory (EE) or excitatory/inhibitory (EI) interaction class (Middlebrooks et

al. 1980). Typically, a cortical neuron is excited by a sound stimulus at the contralateral

ear. If a stimulus at the ipsilateral ear also excites the neuron and binaural stimulation

produces facilitation of the neuronal response, the neuron is an EE neuron. If, instead,

ipsilateral stimulation does not excite the neuron and binaural stimulation produces a

weaker response, then the neuron is an EI neuron. All neurons encountered along a

given radial penetration are of the same binaural response class. In a surface view,

neurons of the same binaural response properties aggregate to form patches. Patches

formed by the two types of cells are organized in strips running roughly at right angles to

the isofrequency contours (Middlebrooks et al. 1980). The thalamic sources of input to

these binaural response-specific bands are strictly segregated from each other in the

ventral division of the MGB, as identified with retrograde tracers (Middlebrooks and

Zook 1983). The functional roles of the binaural topographic organization are unclear.








One hypothesis is that EI regions are responsible for the processing of spatial location

information and EE regions for frequency pattern analysis (Middlebrooks et al. 1980).

Early studies by Middlebrooks and Pettigrew (1981) examined the functional

organization pertaining to sound localization within Al. Single units were recorded

while tonal stimuli were presented in a free sound field. The receptive fields were

mapped by plotting boundaries of spatial regions within which stimuli elicited a given

neural response. About half of the neurons encountered were location-insensitive or

omnidirectional. Two discrete populations of cells could be identified from the pool of

the location-selective units. One was hemifield units which responded to sounds

presented in the contralateral sound field; the other was axial units, which had small,

completely circumscribed receptive fields. The axial units had high-frequency tuning, and

their receptive fields reflected the directionality of the contralateral ear at those

frequencies. It is noteworthy that no systematic map of sound space was found in Al of

the cat. Rajan et al. (1990a) found that neurons were sensitive to contra-field, ipsi-field

or central-field and neurons of the same type tended to cluster together along the

frequency-band strip. However, there were often rapid changes in the azimuth tuning

type in units isolated over short distances, even though the electrode steps were usually

100 µm and sometimes 50 µm. A1 was found not to be organized in a point-to-point

pattern for the sound-source azimuth. Using noise bursts as stimuli, Imig and colleagues

(1990) also found that neighboring units exhibited similar azimuth and stimulus level

selectivity, suggesting that modular organizations might exist in A1 related to both

azimuth and level selectivity. There is a clear relationship between the nonmonotonic

rate-level function and the strength of the directionality. That is, virtually all of the cells








in A1 that have the most strongly nonmonotonic level functions are also sensitive to

azimuth. Since a similar property was not found in the ventral nucleus of the MGB, they

concluded that the linkage between azimuth sensitivity and nonmonotonic level tuning

emerged in the cortex (Barone et al. 1996).

Recently, a topography of the monotonicity of rate-level functions in cat A1 was

revealed (Sutter and Schreiner 1995). The amplitude selectivity varies systematically

along the isofrequency contours. Clusters sharply tuned for intensity (i.e., nonmonotonic

clusters) are located near the center of the contour. A second nonmonotonic region is

several millimeters dorsal to the center. The lowest thresholds of single neurons are

consistently located in the nonmonotonic regions. The scatter of single-neuron intensity

threshold is smallest at these locations. Although the nonmonotonic neurons have been

shown to be predominantly directionally sensitive (Imig et al. 1990), the restricted

intensity response and threshold range would not favor them for encoding intensity-

independent sound location. However, the response properties of neurons in the dorsal

part of Al are of interest in the context of sound localization. Sutter and Schreiner

(1991) recorded single-unit frequency tuning curves in Al. About 20% of the neurons

had multipeaked tuning curves and 90% of them were in the dorsal part of Al.

Inhibitory/suppressive bands, as demonstrated with a two-tone paradigm, were often

present between peaks. It was suggested that these neurons might be sensitive to specific

spectrotemporal combinations in the acoustic input and might be involved in complex

sound processing. It is an attractive idea that these subpopulations of neurons in the

dorsal part of Al are particularly suitable for detecting the spectral notches that are

flanked by two spectral peaks or plateaus. Because spectral notches have been indicated








to be important acoustical cues for localization in elevation, it might be worthwhile to

investigate the coding of elevation by these neurons in our future experiments.

Area A2

A2 is located ventral to Al on the middle ectosylvian gyrus, extending at least 6

mm ventrally from A1. The transition area between A1 and A2 defined physiologically

has a width of about 0.5-1 mm, concordant with a gradual change of the

cytoarchitecture of the border (Schreiner and Cynader 1984). A2 has a distinctive

cytoarchitectural arrangement: there are fewer of the pyramidal cells characteristic of

layer III in A1, the density of neurons is more or less uniform throughout, except in layer

Vb, and large or giant pyramidal neurons mark layer Va. Nevertheless, layer IV is

dominated by small, round cells, and the columnar arrangement evident in Al is

conserved here as well (Winer 1992).

A2 loci are thalamocortically and corticothalamically connected with the caudal

dorsal nucleus, the ventral lateral nucleus of the ventral division, and the medial division

of the MGB. The dorsal division projections are the heaviest of all. These connections

are largely segregated from those between A1 and the MGB. Injection studies revealed no

apparent systematic topography of A2 projection to and from the MGB nuclei. While

the connections between A1 or AAF and the ventral division of the MGB are termed the

"cochleotopic system," the connections between A2 and the MGB are called the "diffuse

system" (Andersen et al. 1980).

A2 neurons are much more broadly tuned in frequency than Al neurons. There is

a gradual transition from sharply tuned Al neurons to broadly tuned A2 neurons on the

border of A I and A2. Typical A2 neurons are slightly less sensitive to tonal stimuli than









A1 cells and are almost equally sensitive across a broad range of frequencies, commonly

spanning several octaves. Therefore, the tonotopic organization within A2 concordant

with Al in orientation is significantly blurred by the strong variability of the characteristic

frequencies, isolated low-frequency islands, and increasing bandwidth of the frequency

receptive fields (Andersen et al. 1980; Schreiner and Cynader 1984). A2 is bordered

posteriorly by tonotopically organized regions of cortex (P and VP) (Andersen et al.

1980).

In terms of binaural interactions, the segregation of EE and EI responses has also

been demonstrated in A2, but grouping of "like" responses tends to be highly variable in

shape and orientation between animals as compared to A1. The proportion of EO (no

interaction, monaural only) neurons in A2 (~24%) is slightly larger than that in A1

(~18%) (Schreiner and Cynader 1984). Discharges of EO neurons are determined by

stimulation of one ear (usually contralateral side) and are unaffected by simultaneous

stimulation of the other ear. Therefore, their binaural responses are indistinguishable

from the monaurally-evoked responses from the sensitive ear.

AAF

AAF is located anterior to A1 on the middle and anterior ectosylvian gyri. In

AAF, the neuronal density is somewhat lower than that in A1 and the cells are slightly

larger, the pyramidal cell populations in layers IIIa and Va have larger somata than their

A1 counterparts, and the cell-poor part of Vb is reduced. In addition, layer IV contains a

significant number of pyramidal cells, unlike layer IV in A1 (Winer 1992).

The systematic topography of the reciprocal thalamocortical and corticothalamic

projections of AAF with the auditory thalamus is similar to that of the A1 connections








(Andersen et al. 1980). However, the connections with the ventral division of the MGB

are weaker than in A1. The major tonotopic input comes from the lateral part of the

posterior group of thalamic nuclei (Po). AAF also receives major input from the

nontonotopic thalamic nucleus (medium-large cell region of the medial division) (Morel

and Imig 1987).

In AAF, there is a clear tonotopic organization which is a mirror image of that in

A1. High frequencies are represented dorsoventrally along the border with the high-

frequency region of A1; lower frequencies are represented in the more rostral cortex.

Comparison of the properties of AAF and Al shows that these two areas are similar in

many important features, including unit response properties, short latency, and

a disproportionately greater representation of higher frequencies. They also share some

common thalamocortical inputs. These similarities suggest that AAF is not a

"secondary" cortical field, but rather that it and Al are parallel processors of ascending

acoustical information (Knight 1977).

Phillips and Irvine (1982) obtained data on the binaural interactions of 40 AAF

neurons. The binaural interactions of AAF neurons were qualitatively similar to those of

A I neurons, but they regarded the data as preliminary due to the small number of

neurons studied.

Azimuthal tuning of AAF neurons was measured by Korte and Rauschecker

(1993). Spatial tuning of individual neurons, defined by a spatial tuning index that was

simply the ratio between the minimal and maximal responses across all 7 azimuth locations

(-60° to +60° in 20° steps), was found not to differ from that of AES neurons. This

study was done in only two cats and the number of AAF neurons versus AES neurons









studied was not reported. Certainly, more studies need to be done before any

conclusions on the functional organization of AAF in sound localization can be drawn.
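The spatial tuning index used by Korte and Rauschecker reduces to a one-line computation. The sketch below is illustrative only; the function name and the example response values are mine, not from the study.

```python
def spatial_tuning_index(responses):
    """Ratio of minimal to maximal response across the tested azimuths.

    `responses` holds one response measure (e.g., spike count) per azimuth;
    Korte and Rauschecker (1993) used 7 locations from -60 to +60 degrees
    in 20-degree steps. Values near 0 indicate sharp spatial tuning; values
    near 1 indicate a nearly omnidirectional neuron.
    """
    return min(responses) / max(responses)

# a sharply tuned (hypothetical) neuron vs. a nearly omnidirectional one
print(spatial_tuning_index([2, 5, 20, 50, 18, 6, 3]))      # 0.04: sharp tuning
print(spatial_tuning_index([40, 45, 50, 48, 44, 47, 42]))  # 0.8: weak tuning
```

Note that a single min/max ratio discards the shape of the tuning curve, one reason such an index can fail to distinguish cortical fields.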

Area AES

Area AES is located on the banks and fundus of the anterior ectosylvian sulcus.

It is a multiple-modality sensory cortex where neurons responsive to somatosensory,

auditory, and visual stimulation are apparently intermingled throughout both banks and

fundus of the AES. It is still controversial, however, whether there are modality-specific

(purely visual or purely somatosensory) subregions within the banks and fundus of the

AES, and how large those regions are (see Meredith and Clemo 1989; Clarey and Irvine 1990a).

Barbiturate anesthesia, which has been shown to suppress the auditory responses, was

considered to be the reason for the discrepancy among different studies (Clarey and

Irvine 1990a).

As would be expected for a multisensory cortex, area AES has a wide range of

inputs from the thalamus and other cortical regions. Roda and Reinoso-Suarez (1983)

studied the thalamic projections to the cortex of AES by the use of retrograde labeling

with a direct visual approach to the AES region. It was shown that all labeled neurons in

the thalamus were ipsilateral to the injection. The thalamic afferents originated from the

ventromedial thalamic nucleus (VM), lateral medial subdivision of the lateral posterior-

pulvinar complex (LM), suprageniculate nucleus (Sg), posterior thalamic nuclear group

(Po), and magnocellular (or medial) division of the MGB. A small number of labeled

neurons was found in the ventral part of the lateral posterior nucleus (LP), VA/VL, MD,

and intralaminar nuclei. Slightly different patterns of these thalamocortical connections

were observed depending on the portion of the AES region considered. Clarey and









Irvine (1990b) used a physiological guide to inject horseradish peroxidase into the

acoustically responsive regions of the AES. The labeling of the medial division of MGB

(i.e., the magnocellular division) and other thalamic nuclei were similar to previously

described results. The posterior group of thalamic nuclei (Po), a tonotopically organized

auditory thalamus, was also found to project to area AES. Since no neurons in area AES

were found to show sharp frequency tuning, some degree of convergence of the input

from Po must have occurred. No input from the ventral MGB was described.

The cortical input to area AES arises from a number of unimodal and

multisensory areas, with a dominant input from the cortex of the suprasylvian sulcus

(SSS), which contains several extrastriate visual fields and to a lesser extent some

anterior multimodal regions. Area AES also receives input from contralateral AES and

contralateral SSS (Clarey and Irvine 1990b; Reinoso-Suarez and Roda 1985). It is not

clear whether area AES receives input from other auditory cortex. A recent report did

show that AES neurons project to auditory cortical areas A1 and A2 and to the temporal (T)

auditory field. In coronal sections of A1, the labeling appeared in patches. When the

sections were aligned and serially arranged, the patches formed bands that extended in a

rostrocaudal direction across A1 (Miller and Meredith 1998).

Area AES receives input from the motor regions of the thalamus and cortex

(Reinoso-Suarez and Roda 1985); therefore, it might be involved in functions that

require sensorimotor integration. This speculation was supported by the fact that area

AES has a dense projection to the deep layers of the superior colliculus (SC) (Meredith and

Clemo 1989). In the anterograde and retrograde labeling study, Meredith and Clemo

(1989) demonstrated that of the auditory cortices (Al; A2; areas A, P, VP, and AES),









only area AES projected to the SC. Auditory SC neurons responded to electrical

stimulation of area AES only. However, neither anatomical nor physiological

techniques revealed a clear topographic relationship between the area AES and the SC

but suggested instead a diffuse and extremely divergent/convergent projection.

No tonotopic organization has been identified in the area AES. The following

characteristics of AES cells distinguish them from the bordering Al and AAF cells: a loss

of sharply tuned responses and the appearance of broad or irregular high-frequency

tuning, an increase in the latency of response, an increase in the strength of the

suprathreshold response to noise, and the advent of response to visual stimulation

(Clarey and Irvine 1986, 1990a). The distinction between the AES neurons and A2

neurons is less clear cut. Generally, the AES neurons are more responsive to noise and

some are responsive to visual stimulation. When tested for binaural interactions, the

AES neurons have predominantly EE responses (Clarey and Irvine 1990a).

Korte and Rauschecker (1993) reported that more than half of the neurons they

recorded from the AAF and area AES were "directional." Preliminary data from the

same laboratory showed that the neurons' preferred azimuth changed continuously over a

certain range, until it jumped discontinuously. A piecewise continuous representation of

location preference in the auditory cortex was suggested (Henning et al. 1995). One of

the obvious limitations of their work is that azimuth sensitivity was measured within only

60° of the frontal midline. A complete account of the experiment is still not available.

Middlebrooks and collaborators (1998) recorded azimuth tuning through 360° from

154 AES neurons and showed that azimuth tuning of the AES neurons was usually broad

and that no systematic change of preferred azimuth was seen.









Neural Codes for Sensory Stimuli


This section reviews two theories on the neural codes for sensory stimuli. One is

the traditional view of neural coding and is based on spike rate; the other has evolved

more recently and incorporates spike timing in the theory.

Spike Rate as Neural Codes

Edgar Adrian, who was the first to study the nervous system at the cellular level

in the 1920s, established three fundamental facts about the neural code: (1) individual neurons

produce stereotyped action potentials, or spikes; (2) the rate of spiking increases as the

stimulus intensity increases; and (3) spike rate begins to decline if a static stimulus is

continued for a very long time. Later, the notion of feature selectivity, in which the cell's

response depends most strongly on a small number of stimulus parameters and is

maximal at some optimum value of these parameters, was clearly enunciated by Barlow

(1953), who was Adrian's student. A specific example from Barlow's work is the "bug

detector" of the frog retina, a class of ganglion cells that respond with great specificity to

small black disks moving within the neurons' receptive fields (Barlow 1953; also see Lettvin

et al. 1959). His "neuron doctrine" formulated from the above observations maintains

that sensory neurons are tuned to specific "trigger features" and that a strong discharge

by a neuron would signal the presence of a trigger feature within its receptive field

(Barlow 1972). In the context of the "bug detector," sensory neurons are represented as

yes/no devices, signaling the presence or absence of certain elementary features. As a

consequence of this neuron specificity, a given stimulus would be represented by a

minimum number of active neurons.









The ideas of feature selectivity and cortical maps have dominated the exploration

of the cortex. Cortical map or topographic organization is maintained from sensory

epithelia to the sensory cortex. In the visual system, the visual space is mapped to the

retina from which a point-to-point projection ascends to the primary visual cortex. The

same is true for the somatosensory system in which the sensory input from the body

surface projects topographically to the primary somatosensory cortex in the form of a

homunculus. In the auditory system, the sensory epithelium in the cochlea is tonotopically

organized so that high frequencies are represented at the base of the cochlea and low

frequencies at the apex. Such a tonotopic organization is maintained all the way to the

primary auditory cortex.

In other instances, computational maps could emerge from the integrative activity

of the central nervous system. For example, many cells in the visual cortex are selective

not only for the size of the objects (e.g., the width of a bar) but also for their orientation.

Neighboring neurons are tuned to neighboring orientations, so that such a computational

feature selectivity is mapped over the surface of the cortex (Hubel and Wiesel 1962).

Hubel and Wiesel (1962) also rationalized that this orientation selectivity could be built

out of center-surround neurons, suggesting that higher percepts are built out of

elementary features. In the auditory system, single neurons in the optic tectum in the

barn owl and the superior colliculus in mammals are selective for sound-source location

(barn owl: Knudsen 1982; guinea pig: Palmer and King 1982; cat: Middlebrooks and

Knudsen 1984; monkey: Jay and Sparks 1984). In those midbrain structures, the

preferred sound-source locations of neurons vary systematically according to the










locations of neurons within the structure. In other words, there exists an auditory spatial

map in the midbrain.

The neural code based on spike rate has led us quite far in our understanding of

brain function. It is disappointing, however, that despite sustained efforts in several

laboratories, a spatial map has not been found in the auditory cortex, a structure essential

for sound localization. Previous studies have examined cortical area Al (Brugge et al.

1994, 1996; Imig et al. 1990; Middlebrooks and Pettigrew 1981; Rajan et al. 1990b), the

anterior ectosylvian area (area AES) (Korte and Rauschecker 1993; Middlebrooks et al.

1998) and, to a lesser degree, the anterior auditory field (AAF) (Korte and Rauschecker

1993). Those studies have shown that the spatial tuning of cortical neurons, as measured

by spike rate, is broad. Moreover, increased stimulus intensity causes significant expansion of

the spatial receptive field in the neurons. At any sound-source location, a stimulus

evokes firing from a large proportion of neurons in the auditory cortex (Middlebrooks et

al. 1998). There are no systematic shifts in the "best location" of the neurons when the

recording electrode changes location in the cortex. The "best location" changes as the

stimulus levels are changed. These data are inconsistent with a spike-rate-based

topographical code for sound localization. An alternative hypothesis of the neural codes

for sound localization, in which spike timing as well as spike counts is incorporated, was

proposed and tested by Middlebrooks and colleagues (1994, 1998).

Spike Timing as Neural Codes

As studies of sensory percepts increase in complexity, a simple spike rate code

may be rendered inadequate as a predictor of behavior. Although controversy still exists

regarding whether spike timing contributes to sensory coding in the cortex (Shadlen and









Newsome 1994; Softky 1995), evidence is rapidly growing that supports neural

codes in which the spike timing of cortical neurons carries information about stimulus

parameters. In the context of this review, temporal code is defined as a neural code in

which the temporal pattern of a neuron's discharge transmits important information about

the stimulus. In the temporal pattern of a neuron's discharge, spike latency and interspike

interval enter the picture. A temporal code might also incorporate the relative spike timing

among multiple neurons, giving rise to the term ensemble temporal code

(Eggermont 1998). Note that a theory of temporal code does not preclude a rate code

being superimposed on it simultaneously.

Temporal code has been shown to be superior to rate code in various sensory

systems in the following three categories: representation of time-dependent signals,

information rates and coding efficiency, and reliability of computation (Rieke et al.

1997). In order for the temporal code to be useful, repetitive firing in the neurons should

be sufficiently reliable. Mainen and Sejnowski (1995) demonstrated that the spike-

generating mechanisms of the cortical neurons are intrinsically precise. Spike trains

could be produced with timing reproducible to less than 1 ms. Such precision is

necessary for the propagation of information by a high-resolution temporal code. To

address the significance of temporal code, it is necessary to consider not just the intrinsic

variability of response to the same stimulus, but also to compare this variability with the

variability encountered as the stimulus attribute is changed. Victor and Purpura (1996) used

a metrical analysis of spike patterns to study the nature and precision of temporal coding

in the visual cortex. They found that ~30% of recordings would be regarded as showing

a lack of dependence on the stimulus attribute if one considered spike count but









demonstrated substantial tuning when temporal pattern was taken into consideration.

Temporal precision was highest for stimulus contrast (10-30 ms) and lowest for texture

type (100 ms). Their finding suggested the possibility that multiple submodalities can be

represented simultaneously in a spike train with some degree of independence. The firing

patterns, viewed with high temporal resolution, might represent contrast, while the same

pattern, viewed with a substantially lower resolution, might represent texture or another

correlate of visual form.
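The spike-train metric behind the Victor-Purpura analysis is an edit distance: a spike can be inserted or deleted at unit cost, or shifted in time at a cost q per unit time, and 1/q sets the temporal precision being probed. The dynamic-programming sketch below is my own illustration of that published metric, not code from the study.

```python
def vp_distance(s1, s2, q):
    """Victor-Purpura distance between two spike trains (sorted spike times).

    Cost 1 to insert or delete a spike; cost q*|t1 - t2| to shift a spike.
    Small q ignores timing (the distance approaches the difference in spike
    counts); large q makes the metric sensitive to fine temporal structure.
    """
    n, m = len(s1), len(s2)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = float(i)                 # delete all remaining spikes of s1
    for j in range(m + 1):
        d[0][j] = float(j)                 # insert all remaining spikes of s2
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + 1,                                   # delete
                          d[i][j - 1] + 1,                                   # insert
                          d[i - 1][j - 1] + q * abs(s1[i - 1] - s2[j - 1]))  # shift
    return d[n][m]

print(vp_distance([10.0, 20.0], [10.0, 20.0], q=1.0))  # 0.0: identical trains
print(vp_distance([10.0], [12.0], q=0.1))              # 0.2: cheap to shift
print(vp_distance([10.0], [12.0], q=2.0))              # 2.0: cheaper to delete and insert
```

Sweeping q and asking at which value stimulus classes separate best is how the 10-30 ms and 100 ms precision figures above are obtained.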

Information about tactile stimulus location is well preserved in the precise

topographic maps in the primary somatosensory cortex (SI), as discussed in the previous

section. In the secondary somatosensory cortex (SII), neurons have large receptive fields

and the topographic organization disappears. Nicolelis and his colleagues (1998)

recently showed that different cortical areas could use different combinations of encoding

strategies to represent the location of a tactile stimulus. Information about stimulus

location could be transformed from a spatial code (based on spike rate) in area SI to an

ensemble temporal code in area SII. They made simultaneous multi-site neural ensemble

recordings in three areas of the primate somatosensory cortex (areas 3b, SII and 2). An

artificial neural network algorithm was then used to measure how well the firing patterns

of cortical ensembles could predict, on a single trial basis, the location of a punctate

tactile stimulus applied to the animal's body. The neural network could successfully

discriminate multiple stimulus locations based on spike patterns of cortical ensembles of

each of the three areas. However, by integrating neuronal firing data into a range of bin

sizes (3, 5, 15 or 45 ms), a procedure that was referred to as "bin clumping," they found

that the discrimination ability of only area SII neural ensembles was significantly









deteriorated. Therefore, while the neuronal responses in areas 3b and 2 contained

information about stimulus location in the form of rate code, the spatiotemporal

character of neuronal responses in the SII cortex contained the requisite information

using temporally patterned spike sequences (Nicolelis et al. 1998).
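The "bin clumping" procedure itself is simple rebinning: spike counts in fine time bins are summed into progressively coarser bins, which discards fine temporal structure while preserving the total spike count. The minimal sketch below is my own (the bin sizes follow the study; the function name and example data are hypothetical).

```python
def clump(counts, factor):
    """Sum consecutive fine bins into coarser bins of `factor` fine bins each.

    If a cortical area's decoding accuracy survives clumping, a rate code
    suffices; if accuracy deteriorates (as Nicolelis et al. 1998 reported
    for area SII), the fine temporal pattern carried information.
    """
    return [sum(counts[i:i + factor]) for i in range(0, len(counts), factor)]

spikes_1ms = [0, 1, 0, 0, 2, 1, 0, 0, 1]   # hypothetical 1-ms bin counts
print(clump(spikes_1ms, 3))                 # 3-ms bins: [1, 3, 1]
print(sum(clump(spikes_1ms, 3)) == sum(spikes_1ms))  # total count preserved: True
```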

Another elegant example of temporal coding comes from reports by Richmond,

Optican and their collaborators, who used information theory to describe the time-

dependent neural responses in the monkey visual system. The question that they set out to

answer was whether temporal patterns of neuronal firing represent stimulus features

such as visual spatial patterns. Their first experiments were done on cells in the inferior

temporal cortex (Richmond and Optican 1987), and subsequent experiments have used

the same methods to study neurons in several different visual areas (McClurkin et al.

1991; Richmond and Optican 1990). The visual cortical neurons produced the same

average number of spikes during the presentation of different spatial patterns (Walsh

functions). On the other hand, it was clear that the temporal pattern of spikes during the

stimulus presentation was very different (Richmond et al. 1987; 1990). In their studies,

they first filtered spike trains in response to a large set of two-dimensional spatial

patterns to generate smoothed spike patterns. They then approximated the smoothed

spike patterns as a sum of successively more complex waveforms (the principal

components). Each instance of the spike pattern was then transformed into a set of

coefficients, in much the same way that a Fourier series transforms a function of time into

the discrete set of Fourier coefficients. It was shown that the first principal component,

which was highly correlated with spike count, carried only about half of the information

that was available in the spike patterns. Higher principal components, which were










uncorrelated with spike count and yet represented the tendency of the spikes to cluster at

different times following the onset of the static visual stimulus, carried nearly half of the

total information. Their observations suggested that features of spike patterns additional

to spike counts, presumably spike timing, carry stimulus-related information in the visual

cortex.
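The decomposition that Richmond and Optican describe can be sketched with synthetic data (everything below is illustrative: toy "smoothed spike patterns" are built so that one component of variation drives total count and an independent component drives timing). The first principal component then tracks spike count, while a higher component carries count-independent timing information.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "smoothed spike patterns": 60 trials x 50 time bins.  Each trial
# mixes a count-like component (overall rate) with an independent timing
# component (early-vs-late clustering of spikes).
t = np.linspace(0.0, 1.0, 50)
rate = rng.uniform(0.5, 3.0, size=60)       # drives total spike count
timing = rng.uniform(-1.0, 1.0, size=60)    # drives spike timing only
patterns = rate[:, None] * np.ones_like(t) + timing[:, None] * np.sin(2 * np.pi * t)

# Principal components of the mean-subtracted patterns (via SVD).
centered = patterns - patterns.mean(axis=0)
_, _, components = np.linalg.svd(centered, full_matrices=False)
coeffs = centered @ components.T            # one coefficient set per trial

counts = patterns.sum(axis=1)               # the spike-count analogue
r1 = abs(np.corrcoef(coeffs[:, 0], counts)[0, 1])
r2 = abs(np.corrcoef(coeffs[:, 1], counts)[0, 1])
print(round(r1, 2), round(r2, 2))   # PC1 tracks count; PC2 does not
```

This mirrors the reported finding only qualitatively: the first component is highly correlated with spike count, and a higher component captures when the spikes cluster, independently of how many there are.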

Middlebrooks and collaborators (1994, 1998) showed that spike patterns of

auditory cortical neurons carry information about sound-source azimuth. In their studies,

an artificial neural network was used as a generic pattern classifier. Such a neural-net

algorithm allowed them to "read out" the sound-source azimuth from the firing patterns

of single cortical neurons. They observed a moderate level of localization performance

based on spike counts alone, and performance improved when spike timing was

incorporated. Principal components analysis showed that information-bearing elements

of the firing patterns of the cortical neurons included spike counts and temporal

dispersion of the firing patterns (Middlebrooks and Xu 1996). Their research along with

that of others leads us to the concept of a "panoramic code" in which stimulus-related

information is embedded in the temporal patterns of the neuronal discharges. Each single

neuron codes many stimulus attributes, e.g., stimulus location around 360°

(Middlebrooks et al. 1994; 1998), visual spatial patterns (Richmond et al. 1987; 1990),

or visual contrast and texture (Victor and Purpura 1996). With this scheme, one can

interpret a continuously varying output of a neuron to decode a continuously varying

stimulus parameter. In contrast, a coding scheme based on spike rate would require one

to integrate the activity of a neuron over a period of time to obtain a spike rate which is

then interpreted as the probability that a particular stimulus is present. In a real-world









situation, the strategy using a timing-based panoramic code is therefore obviously

superior to that using a rate-based code in the neural representation of time-dependent

sensory information.















CHAPTER 3
SENSITIVITY TO SOUND-SOURCE ELEVATION IN NONTONOTOPIC
AUDITORY CORTEX

Introduction


We have shown that the spike patterns of auditory cortical neurons carry

information about sound-source azimuth (Middlebrooks et al. 1994, 1998). The

principal cues for the location of a sound source in the horizontal dimension (i.e.,

azimuth) are those provided by the differences in sounds at the two ears, i.e., interaural

time difference (ITD) and interaural level difference (ILD). In contrast, the principal cues

for location in the vertical dimension are spectral-shape cues that are produced largely by

the interaction of the incident sound wave with the convoluted surface of the pinna (see

Middlebrooks and Green 1991 for review). The question arises as to whether the spike

patterns that we studied represent the output of a system that integrates these multiple

cues for sound-source location, or whether they merely demonstrate neuronal sensitivity

to an interaural difference that co-varies with sound-source azimuth, such as ILD. Sound

sources located anywhere in the vertical midline produce small, perhaps negligible,

interaural differences. For that reason, one would predict that a neuron that was

sensitive only to interaural differences would show no sensitivity to the vertical location

of a sound source in the midline and would be unable to distinguish front from rear locations.

Alternatively, if cortical neurons integrate multiple types of location information, we

would expect to observe sensitivity to both the horizontal and the vertical location of a









sound source. We addressed this issue by testing the sensitivity of neurons for the

vertical location of sound sources in the median plane.

The spatial tuning properties of cortical auditory neurons have been studied by

several groups of investigators (area Al: Brugge et al. 1994, 1996; Imig et al. 1990;

Middlebrooks and Pettigrew 1981; Rajan et al. 1990a, 1990b; area AES: Korte and

Rauschecker 1993; Middlebrooks et al. 1994, 1998). Most of those studies were

restricted to the azimuthal sensitivity of the neurons. Middlebrooks and Pettigrew

(1981) described a few units that showed elevation sensitivity to near-threshold sounds,

but the stimuli in that study were pure tone bursts, which lacked the spectral information

that is crucial for vertical localization of sounds that vary in sound pressure level (SPL).

Brugge and colleagues (1994, 1996) confirmed that most Al cells are differentially

sensitive to sound-source direction using "virtual space" clicks as stimuli that simulated

1650 sound-source locations in a three-dimensional space. Near threshold, many of the

neurons in their study showed virtual space receptive fields that were restricted in the

horizontal and vertical dimensions. When stimulus levels were increased, however, most

of the spatial receptive fields enlarged and the vertical selectivity disappeared. Imig et al.

(1997) found that, at the level of the medial geniculate body, neurons showed sensitivity

to sound-source elevation when stimulated with broadband noise. Such elevation

sensitivity disappeared when stimulated with pure tones. They suggested that those

neurons were capable of synthesizing their elevation sensitivity by utilizing spectral cues

that were present in the broadband noise stimuli.

The present study was undertaken to examine the coding of sound-source

elevation by neurons in cortical areas AES and A2. The spike counts of most of these









neurons showed rather broad tuning for sound-source elevation. Nevertheless, spike

patterns (i.e., spike counts and spike timing) varied with sound-source elevation. Using

an artificial neural network paradigm like the one that we used in the previous studies of

azimuth coding (Middlebrooks et al. 1994, 1998), we found that it was possible to

identify sound-source elevation by recognizing spike patterns. This result leads us to

reject the hypothesis that neurons are merely sensitive to ITD or ILD. Our initial data all

were collected from units in area AES (Xu and Middlebrooks 1995). Many of those

units failed to discriminate among low elevations. When tested with tones, most of those

AES neurons responded only to frequencies greater than 15 kHz. We reasoned that the

accuracy in lower elevation coding might improve if we could find neurons that were

sensitive to lower frequency tones, because spectral details in the range of 5 to 10 kHz

are thought to signal lower elevations (Rice et al. 1992). Therefore, we expanded our

experiments to area A2 in which neurons sensitive to broader bands of frequency are

more often found. In this report, results from areas AES and A2 were compared in terms

of their elevation-coding accuracy and their frequency tuning properties. The role that

source sound pressure level might play in elevation coding was addressed. The

relationship between network performance in azimuth and elevation of the same neurons

was examined.


Methods


Methods of surgical preparation, electrophysiological recording, stimulus

presentation, and data analysis were described in detail in Middlebrooks et al. (1998). In

brief, 14 cats were used for this study. Cats were anesthetized for surgery with









isoflurane, then were transferred to a-chloralose for single-unit recording. The right

auditory cortex was exposed for microelectrode penetration. Our on-line spike

discriminator sometimes accepted spikes from more than one unit, so we must note the

possibility that we have underestimated the precision of elevation coding by single units.

We recorded from the anterior ectosylvian sulcus auditory area (area AES) and auditory

area A2. Recordings from area AES were made from the portion of area AES that lies

on the posterior bank of the anterior ectosylvian sulcus. Recordings from area A2 were

made from the crest of the middle ectosylvian gyrus ventral to area Al. Area A2 was

distinguished from neighboring Al by frequency tuning curves that were at least one

octave wide at 40 dB above threshold. Following each experiment, the cat was

euthanized and then perfused. The half brain was stored in 10% formalin with 4%

sucrose and later transferred to 30% sucrose. Frozen sections stained with cresyl violet

were examined with a light microscope to determine the electrode location in the cortex.

Sound stimuli were presented in an anechoic chamber from 14 loudspeakers that

were located on the median sagittal plane, from 60° below the frontal horizon (-60°), up and over the head, to 20° below the rear horizon (+200°) in 20° steps. Stimuli consisted

of broadband Gaussian noise burst stimuli of 100-ms duration with abrupt onsets and

offsets. Loudspeaker frequency responses were closely equalized as described in

Middlebrooks et al. (1998). All speakers were 1.2 m from the center of the cat's head.

The stimulus levels were 20 to 40 dB above the threshold of each unit in 5-dB steps. A

total of 24 to 40 trials was delivered for each combination of stimulus location and

stimulus level; locations and levels were varied in a pseudorandom order. Whenever

possible, the frequency tuning properties of the units also were studied, using pure tone










stimuli. The pure tone stimuli were 100-ms tone bursts (with 5-ms onset and offset

ramps) with frequencies ranging from 3.75 to 30.0 kHz at one-third octave steps. They

were presented at 10 dB and 40 dB above threshold from a speaker in the horizontal

plane from which strong responses to broadband noise were obtained, usually at

contralateral 20° or 40° azimuth.

Off-line, an artificial neural network was used to perform pattern recognition on

the neuronal responses (Middlebrooks et al. 1998). Neural spike patterns were

represented by estimates of spike density functions based on bootstrap averages of

responses to 8 stimuli, as described in the previous paper. The two output units of the

neural network produced the sine and cosine of the stimulus elevation, and the arctangent

of the two outputs gave a continuously varying output in degree in elevation. We did not

constrain the output of the network to any particular range, so the scatter in network

estimation of elevation sometimes fell outside the range of locations to which the

network was trained (i.e., from -60° to +200°).
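A minimal sketch of this readout stage follows. The function names are illustrative, and the branch cut used to unwrap the arctangent (chosen so the trained range of -60° to +200° maps onto itself) is our assumption, not a detail given in the original analysis.

```python
import numpy as np

rng = np.random.default_rng(1)

def bootstrap_pattern(trial_patterns, n_avg=8):
    """One spike-density estimate: the average of the binned responses
    of n_avg trials drawn with replacement."""
    idx = rng.integers(0, len(trial_patterns), size=n_avg)
    return np.mean([trial_patterns[i] for i in idx], axis=0)

def decode_elevation(sin_out, cos_out):
    """Combine the network's two output units into degrees of elevation.
    The arctangent leaves the estimate unconstrained within one full
    cycle; the branch cut at -90 deg is our assumption."""
    ang = np.degrees(np.arctan2(sin_out, cos_out))
    return ang + 360.0 if ang < -90.0 else ang

# Hypothetical, slightly noisy network outputs for a stimulus at +140 deg.
elev = np.radians(140.0)
print(round(decode_elevation(np.sin(elev) + 0.01, np.cos(elev) - 0.01), 1))
```

Because the two outputs are treated as a point on the unit circle, small errors in either output perturb the decoded angle continuously, which is what allows the scatter of estimates to fall outside the trained range.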

Measurement of directional transfer functions of the external ears was carried out

in six of the cats after the physiological experiments. A 1/4" tube microphone was

inserted in the ear canal through a surgical opening at the posterior base of the pinna.

The probe stimuli delivered from each of the 14 speakers in the median plane were pairs

of Golay codes (Zhou et al. 1992) that were 81.92 ms in duration. Recordings from the

microphone were amplified and then digitized at 100 kHz, yielding a spectral resolution

of 12.2 Hz from 0 to 50 kHz. We subtracted from the amplitude spectra a common

term that was formed by the root-mean-squared sound pressure averaged across all

elevations. Subtraction of the common term left the component of each spectrum that









was specific to each location (Middlebrooks and Green 1990). Those measurements

permitted us to study in detail the directional transfer functions of the external ear;

however, in the present study, we considered only the spatial patterns of sound levels of

three one-octave frequency bands: low-frequency (3.75–7.5 kHz), mid-frequency (7.5–15 kHz), and high-frequency (15–30 kHz).
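The common-term subtraction can be sketched as follows, using synthetic spectra in place of real ear-canal recordings. Averaging dB values within each octave band is our simplification of how a band level might be summarized.

```python
import numpy as np

rng = np.random.default_rng(2)

FS = 100_000           # sampling rate (Hz)
NFFT = 8192            # 81.92 ms at 100 kHz
freqs = np.fft.rfftfreq(NFFT, d=1 / FS)   # ~12.2-Hz spacing, 0-50 kHz

# Toy ear-canal amplitude spectra (dB) for the 14 elevations.
spectra_db = rng.normal(0.0, 3.0, size=(14, freqs.size))

# Common term: root-mean-squared pressure averaged across elevations.
lin = 10.0 ** (spectra_db / 20.0)
common_db = 20.0 * np.log10(np.sqrt(np.mean(lin ** 2, axis=0)))

# Subtracting it leaves the location-specific component of each spectrum.
dtf_db = spectra_db - common_db

def band_level(spec_db, lo_hz, hi_hz):
    """Mean level (dB) within one frequency band, per elevation."""
    mask = (freqs >= lo_hz) & (freqs < hi_hz)
    return spec_db[:, mask].mean(axis=1)

low = band_level(dtf_db, 3_750, 7_500)     # low-frequency octave
mid = band_level(dtf_db, 7_500, 15_000)    # mid-frequency octave
high = band_level(dtf_db, 15_000, 30_000)  # high-frequency octave
print(round(freqs[1], 2), low.shape)
```

Note that 100 kHz divided over 8192 samples reproduces the 12.2-Hz spectral resolution quoted above, and that subtracting the RMS common term forces the mean squared linear gain across elevations to unity at every frequency.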


Results


General Properties of Sound-Source Elevation Sensitivity

A total of 195 units was recorded from areas AES (113 units) and A2 (82 units).

Figure 3.1 shows the elevation sensitivity of two AES units (Figure 3.1, A and B) and

two A2 units (Figure 3.1, C and D). Left and right columns of the figure plot data from

20 dB and 40 dB above threshold, respectively. The elevation tuning of the units in

Figure 3.1, A and C, was among the sharpest in our sample. Most often, however, units

showed some selectivity at the lower sound pressure level, but the selectivity broadened

considerably at higher sound pressure levels. The units in Figure 3.1, B and D, are

typical. The region of stimulus elevation that produced the greatest spike counts from

each unit was represented by the "best-elevation centroid", which was the spike-count-

weighted center of mass of the peak response, with the peak defined by a spike count

greater than 75% of the unit's maximum. The rationale for representing elevation

preferences by best-elevation centroids rather than by single peaks or best areas was that

the location of a centroid is influenced by all stimuli that produced strong responses, not

just by a single stimulus location (Middlebrooks et al. 1998). The primary centroids for

the examples in Figure 3.1 are marked by arrows. However, for the responses at 40 dB











Figure 3.1. Spike-count-versus-elevation profiles. A, B: AES units (950719 and
950984). C, D: A2 units (9607A2 and 960721). The left column represents spike-count-
versus-elevation profiles at a stimulus level 20 dB above threshold, and the right column
40 dB above threshold. In these polar plots, the angular dimension gives the speaker
elevation in the median plane, with 0° straight in front of the cat, 90° straight above the
cat's head, and 180° straight behind, as marked in A. The radial dimension gives the
mean spike counts (spikes per stimulus presentation). Arrows mark the primary elevation
centroids, i.e., the spike-count-weighted centers of mass of the peaks, with a peak defined
by a spike count greater than 75% of the unit's maximum. No centroids could be
calculated for the 40-dB data in B and D.









above threshold represented by the right column of Figure 3.1, B and D, no centroids

could be computed because the spatial tuning became too flat.
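The best-elevation centroid computation described above can be sketched like this (the spike-count profile is hypothetical, and treating the elevation axis as linear rather than circular is our simplification):

```python
import numpy as np

ELEVATIONS = np.arange(-60, 201, 20)   # the 14 speaker elevations (deg)

def best_elevation_centroid(spike_counts):
    """Spike-count-weighted center of mass of the peak response, with the
    peak defined by counts greater than 75% of the unit's maximum.
    Returns None when modulation by elevation is below 50%, the case in
    which no centroid was computed."""
    counts = np.asarray(spike_counts, dtype=float)
    if counts.max() <= 0:
        return None
    if (counts.max() - counts.min()) / counts.max() < 0.5:
        return None                      # tuning too flat
    peak = counts >= 0.75 * counts.max()
    return float(np.sum(ELEVATIONS[peak] * counts[peak]) / counts[peak].sum())

# A hypothetical unit tuned near +40 deg, and a flat unit.
tuned = 10.0 * np.exp(-(((ELEVATIONS - 40) / 40.0) ** 2))
print(best_elevation_centroid(tuned), best_elevation_centroid(np.ones(14)))
```

Because every supra-criterion location contributes in proportion to its spike count, the centroid shifts smoothly with the whole peak rather than jumping with a single best speaker.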

The elevation sensitivity of spike counts in our sample of units is summarized in

Figures 3.2 and 3.3. At stimulus levels 20 dB above threshold, 86% of the AES units

and 66% of the A2 units showed more than 50% modulation of spike counts by sound-

source elevation (Figure 3.2, left panels), but that proportion of the sample dropped to

48% for AES units and 13% for A2 units when the stimulus level was raised to 40 dB

above threshold (Figure 3.2, right panels). The height of elevation tuning was defined as the range of elevations over which units responded with spike counts greater than half maximum; Figure 3.3 shows histograms of this measure. Fifty-two percent of the AES

units and 84% of the A2 units showed heights larger than 180° at stimulus levels 20 dB above threshold (Figure 3.3, left panels), and the heights of nearly all units from either area AES or area A2 were larger than 180° at 40 dB above threshold (Figure 3.3, right

panels). In general, A2 units tended to show broader tuning in sound-source elevation

than did AES units (Mann-Whitney U test, P < 0.01). Note that all measurements of

elevation were made in the vertical midline. Elevation sensitivity might have appeared

somewhat sharper if it had been tested in a vertical plane off the midline that passed

through the peaks in units' azimuth profiles. That approach has been used, for instance,

in studies of the superior colliculus (Middlebrooks and Knudsen 1984) and medial

geniculate body (Imig et al. 1997).
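The two tuning measures just summarized can be sketched as follows (the spike-count profile is a toy example, and counting 20° of elevation per supra-half-maximum speaker is our simplification of the height measure):

```python
import numpy as np

SPACING_DEG = 20.0   # speaker spacing in elevation

def depth_of_modulation(spike_counts):
    """Percent modulation of spike count by elevation:
    100 * (max - min) / max."""
    c = np.asarray(spike_counts, dtype=float)
    return 100.0 * (c.max() - c.min()) / c.max()

def tuning_height(spike_counts):
    """Range of elevations (deg) over which the unit responded with more
    than half its maximal spike count, counting one 20-deg step per
    supra-half-maximum speaker."""
    c = np.asarray(spike_counts, dtype=float)
    return SPACING_DEG * np.count_nonzero(c >= 0.5 * c.max())

# A sharply tuned hypothetical unit (counts at the 14 elevations).
sharp = np.array([0, 1, 2, 8, 10, 7, 2, 1, 0, 0, 0, 0, 0, 0], dtype=float)
print(depth_of_modulation(sharp), tuning_height(sharp))
```

A unit whose counts barely vary across elevation scores a low depth of modulation and a height near the full 280° span, which is the pattern most units approached at 40 dB above threshold.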














[Figure 3.2 histograms appear here, titled "Depth of Modulation of Spike Count by Elevation". Area A2: N = 82, median = 59.6% at 20 dB and 31.6% at 40 dB above threshold. Abscissa: Depth of Modulation (%).]




Figure 3.2. Distribution of depth of modulation of spike count by elevation. Open bars
in the upper panels represent area AES units. Filled bars in the lower panels represent
area A2 units. Left panels plot data at a stimulus level 20 dB above threshold. Right
panels plot data at a stimulus level 40 dB above threshold.












[Figure 3.3 histograms appear here, titled "Height of Elevation Tuning at Half-Maximal Spike Count". Abscissa: Height in Elevation (degrees).]


Figure 3.3. Distribution of the range of elevations over which spike counts greater than
half maximum were elicited. Conventions as in Figure 3.2.











The best-elevation centroids of our population of 195 units were distributed

throughout the elevations of the median plane. However, more centroids were located in

the frontal elevations from 20° to 80° than in any other locations (Figure 3.4). For 34%

of the AES units and 14% of the A2 units that were studied at 20 dB above threshold,

best-elevation centroids were not computed because the modulation of the spike counts

of the units by sound-source elevation was smaller than 50%. Such percentages

increased to 51% and 87%, respectively, at stimulus levels 40 dB above threshold. These

units were represented by the bars marked by "NC" in Figure 3.4. No consistent orderly

progression of centroids along electrode penetrations was evident in either area AES or

area A2. Rarely, for low-intensity stimuli, we saw an orderly progression of centroids

along a short distance of the penetration. However, this organization did not persist at

higher stimulus levels.

Neural Network Classification of Spike Patterns

Examples of the spike patterns of two AES units and an A2 unit are shown in

Figure 3.5 in a raster plot format. Each panel in the figure represents one unit, and only

responses elicited at 40 dB above threshold are shown here. Sound-source elevation is

plotted on the ordinate and the post-onset time of the stimulus is plotted on the abscissa.

Each dot represents one spike recorded from the unit. For each of the spike patterns,

one can see subtle changes in the numbers and distribution of spikes and in the latencies

of the patterns from one elevation to another. It is also noticeable that spike patterns

from different units differ significantly.

Figure 3.6 plots the results from artificial neural network analysis of the spike

patterns at 40 dB re threshold of the same AES unit as in Figure 3.5A. In panel A,










[Figure 3.4 histograms appear here, titled "Distribution of Best-Elevation Centroids". Abscissa: Elevation (degrees).]



Figure 3.4. Distribution of locations of best-elevation centroids. The percentages of
units for which no centroids could be calculated are marked "NC" on the abscissa.
Conventions as in Figure 3.2.





Figure 3.5. Raster plot of responses from two AES units (A: 950531 and B: 950754)
and an A2 unit (C: 970821). Each dot represents one spike from the unit. Each row of
dots represents the spike pattern recorded from 10 ms before the onset to 10 ms after the
offset of one presentation of the stimulus at the location in elevation indicated along the
vertical axis. Only 10 of the 40 trials recorded at each elevation are plotted. Stimuli
were 100-ms noise bursts starting at 0 ms, represented by the thick bars. Stimulus level
was 40 dB above threshold.











Figure 3.6. Network performance of the same unit (950531) as in Figure 3.5A. In A,
each plus sign represents the network output in response to input of one bootstrapped
pattern. The abscissa represents the actual stimulus elevation, and the ordinate
represents the network estimate of elevation. The solid line connects the mean directions
of network estimates for each stimulus location. Perfect performance is represented by
the dashed diagonal line. Panel B shows the distribution of network errors. The dashed
line represents 7.1%, which is the expected random chance performance given 14
speaker elevations.










each plus sign represents the network estimate of elevation based on one spike pattern,

and the solid line indicates the mean direction of responses at each stimulus elevation. In

general, the neural-network estimates scattered around the perfect performance line

represented by the dashed line. Some large deviations from the targets were seen at

certain locations in elevation (e.g., -60° to -20° in this particular example). The neural network classification of the spike patterns of this unit yielded a median error of 32.2°,

which was among the smallest in our sample. The distribution of errors in estimation of

elevation for this unit is shown in Figure 3.6B. Seventeen percent of network errors

were within 10° of the targets. In contrast, the expected value of random chance

performance given 14 speakers is 7.1%.

Results of neural-network analysis of responses of another AES unit are shown in

Figure 3.7; the spike patterns of this unit are plotted in Figure 3.5B. The network

estimates of elevation based on the responses of this unit were less accurate than the

estimates shown in Figure 3.6. The network scatter was larger and, at elevations -60° to -20°, the network estimates consistently pointed above the stimuli. Nevertheless, the network produced systematically varying estimates of elevation within the region of 0° to 140°. The unit represented in Figure 3.7 was typical of many units in that network analysis of its spike patterns tended to undershoot elevations at the extremes of the range that we tested (e.g., -60° to -20° and 160° to 200° in this particular example). The median error for this unit was 47.5°, which is slightly larger than the mean of our entire

population.

Undershoots at the extremes of the range were also common for A2 units.

However, some A2 units could discriminate the lower elevations fairly well. Figure













Figure 3.7. Network performance of the same unit (950754) as in Figure 3.5B.
Conventions as in Figure 3.6.








Figure 3.8. Network performance of the same unit (970821) as in Figure 3.5C.
Conventions as in Figure 3.6.




3.8 shows the network analysis of spike patterns shown in Figure 3.5C. The mean

directions of the responses were fairly accurate at all locations except at 160° to 200°,

where undershoots were seen (Figure 3.8A). The distribution of errors (Figure 3.8B)

shows a bias toward negative errors because of those undershoots.

For all the 195 units studied at 40 dB above threshold, the median errors of the

network performance averaged 46.4°, ranging from 25.4° to 67.5°. The distribution of the median errors is shown in Figure 3.9 (right panel). For stimulus levels 20 dB above threshold, the median errors of the network performance averaged 6° less than those at 40 dB above threshold (Figure 3.9, left panel). The bulk of the distribution for all stimulus level conditions was substantially better than the chance performance of 65°, which is marked by arrows in Figure 3.9. The chance performance of 65° is the theoretical median error when we consider the entire 260° range of elevation. When we tested the network with data in which the relation between spike patterns and stimulus elevations was randomized, we obtained an average median error of 66.5° ± 1.7° across all 195 units. In general, the median errors of network performance in elevation averaged 2° to 3° larger than those we found in network outputs in azimuth

(Middlebrooks et al. 1998). This is consistent with an observation from a study of

localization by human listeners (Makous and Middlebrooks 1990). For stimuli in the

frontal midline, vertical errors were roughly twice as large as horizontal errors. Results

from behavioral studies in cats are difficult to compare in terms of localization accuracy

in vertical and horizontal dimensions because only a very limited range of elevation was

employed in those studies (Huang and May 1996a; May and Huang 1996).
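The randomization control described above can be illustrated with a Monte Carlo sketch (the decoder outputs and noise level below are synthetic and arbitrary): shuffling the pairing between estimates and true elevations pushes the median error from the decoder's actual accuracy up toward the chance level.

```python
import numpy as np

rng = np.random.default_rng(3)
elevations = np.arange(-60, 201, 20)     # 14 speaker elevations (deg)

# Toy decoder: estimates scatter around the true elevation.
targets = rng.choice(elevations, size=2000)
estimates = targets + rng.normal(0.0, 30.0, size=targets.size)

matched = np.median(np.abs(estimates - targets))

# Randomization control: break the relation between spike patterns
# (here, estimates) and stimulus elevations, then re-score.
chance = np.median(np.abs(estimates - rng.permutation(targets)))

print(round(matched, 1), round(chance, 1))
```

The matched median error reflects only the decoder's scatter, whereas the shuffled score depends only on the geometry of the speaker array, which is why it serves as an empirical check on the theoretical chance value.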














[Figure 3.9 histograms appear here. Abscissa: Median Error (degrees).]





Figure 3.9. Distribution of elevation coding performance across the entire sample of
units. Chance performance of 65° is marked by the arrows. Conventions as in Figure 3.2.










We demonstrated in our previous paper that coding of sound-source azimuth by

spike patterns is more accurate than coding by spike counts alone (Middlebrooks et al.

1998). We evaluated the coding of sound-source elevation by those two coding

schemes. Consistent with our previous paper, we found that median errors in neural

network outputs obtained with spike counts were significantly larger than those obtained

with complete spike patterns. Median errors in network output obtained in the spike-

count-only condition averaged 8° to 12° larger than those obtained in the complete-spike-

pattern condition, depending on cortical area (A2 or AES) and stimulus level (20 or 40

dB above threshold).

Comparison of Elevation Coding in Areas AES and A2

We compared our sample of A2 units with our sample of AES units in regard to

the accuracy of coding of elevation by spike patterns. Averaged across all elevations, the

median errors at sound levels of 20 dB above threshold were slightly smaller for A2 units

than for AES units (t test, P < 0.05), but the two areas did not differ significantly at 40 dB above threshold (compare upper panels with lower panels in

Figure 3.9). When we consider particular ranges of elevation, however, we often found

that in area AES, the median errors at locations below the front horizon were much

larger than those at the rest of the locations in elevation. In the case of A2 units, this

difference was less prominent. Individual examples were given in Figures 3.6–3.8. We

then calculated the median errors at each of the 14 elevations for units from areas AES

and A2. The mean and standard error of the median errors were plotted in Figure 3.10.

Asterisks in Figure 3.10 marked the locations at which the differences in the means of the

median errors between the two cortical areas were statistically significant (t test, P <














Figure 3.10. Comparison of network performance of A2 and AES units. Plotted here
are the means and standard errors of the median errors from the network analysis of AES
(open bars) and A2 units (filled bars) at each individual elevation. Asterisks mark the
locations where the means of A2 units are significantly different from those of AES units
(t test, P <0.05).








0.05). The median errors at elevations from 0° to 120° for A2 units and 20° to 140° for AES units were fairly small. The median errors of AES units at -60° to 0° elevation were significantly larger than those of A2 units; the reverse was true at 120° to 200°. Thus, compared with AES units, A2 units achieved a better balance in network output errors between the lower elevations and the rear locations.

Contribution of SPL Cues to Elevation Coding

Spectral shape cues are regarded as the major acoustical cue for location in the

median plane (Middlebrooks and Green 1991). However, the modulation of SPL in the

cat's ear canal due to the directionality of the pinna also can serve as a cue; we refer to it as the SPL cue. We wished to test the hypothesis that SPL cues alone could account

for our results. We measured the SPLs in the cat's ear canal and compared the acoustical

data with the network performance. Specifically, we compared the network performance

among sound-source elevations at which the stimuli produced similar SPLs in the ear

canal. If the SPL cue played a dominant role, the artificial neural network would not be

able to discriminate those elevations successfully. We also tested the network

performance under conditions in which the SPL of the sound source was varied. If the

SPL cue dominated, we would expect that the network performance would be degraded

substantially when the variation of the source SPL is large relative to the dynamic range

of the modulation of SPL in the cat's ear canal.

The elevation sensitivity of SPLs varies somewhat with frequency, so we

measured SPLs within three one-octave bands: low, 3.75–7.5 kHz; middle, 7.5–15 kHz;

and high, 15–30 kHz. The spatial patterns of sound levels in these three frequency

bands were similar among the six cats that were used in the acoustic measurement.

Figure 3.11 A plots the sound levels in those three frequency bands as a function of

sound-source elevation from the measurement of one of the cats. The entire ranges of

the sound level profiles for the low-, mid-, and high-frequency regions were 11.9, 17.8,

and 29.2 dB, respectively (Figure 3.11A). For the low- and high-frequency bands, sound

from 0° elevation produced the maximal gain in the external ear canal of the cat. Sound

levels decreased more or less monotonically when the sound source moved below or

above the horizontal plane and behind the cat. For the mid-frequency band, however,

sounds from -20° and 0° and those from 100° and 120° produced the largest gains in the

























Figure 3.11. Sound levels and neural network performance. A: Sound levels measured
at the external ear canal as a function of sound-source elevation. Levels were measured
in low- (3.75–7.5 kHz), mid- (7.5–15 kHz), and high-frequency (15–30 kHz) bands.
B: Sound levels in the low-frequency band are plotted with triangles on the left ordinate.
The mean directions of neural network responses of a unit (960553) that responded well
to the low-frequency tones are plotted with filled circles on the right ordinate. The two
ordinates are scaled so that the ranges of two curves roughly overlap. The small arrows
mark the pair of sound-source elevations at which sound levels were found similar to one
another (within 1 dB) but at which network estimates of elevation were different. C:
Sound-level profile at mid-frequency region (open squares) and mean directions of the
network responses (filled circles) of a unit (950915) that responded well to mid-
frequency tones are plotted in the same format as B. D: Sound-level profiles at high-
frequency band at 10 dB above and 10 dB below the actual one shown in A are plotted
on the left ordinate with crosses to simulate the 20-dB range of the roving levels. Mean
directions of the network responses of a unit (950702) that responded well to high-
frequency tones are plotted on the right ordinate. The network was trained with spike
patterns from 5 SPLs, from 20 to 40 dB above threshold. Filled and open circles are
mean directions of network output when tested with spike patterns obtained with
stimuli at 20 and 40 dB above threshold. Arrows mark examples at which the two
network outputs point to the same correct locations.
















external ear canal. The sound levels dropped at locations behind the cat and at those

below the frontal horizon.

We compared the elevation sensitivity of sound levels with the neural network

estimation of elevation by plotting sound levels and neural network output on common

abscissas (Figure 3.11, B and C). Figure 3.11B shows the network analysis of a unit that

responded best to frequencies in the low-frequency band. The triangles show the sound

levels in that band. Figure 3.11C shows network data and mid-frequency sound levels

for a unit that responded best to the middle frequencies. The left ordinate, used for SPL

data, and the right ordinate, used for neural network estimate, were scaled so that both

sets of data roughly overlapped. If the network identification of elevation was due

simply to SPL variation, sound sources that differed in elevation but produced the same

SPLs in the ear canal would result in the same elevations in the network output. In fact,

the neural network could distinguish pairs of speakers at which similar SPLs (within 1

dB) were produced. Examples of such pairs of locations are marked by arrows in Figure

3.11, B and C. The results are inconsistent with the prediction based on the SPL cue.
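The test above can be sketched as a simple search over elevation pairs: an SPL-only code could not separate locations whose ear-canal SPLs agree to within 1 dB, so any such pair that the network nevertheless distinguishes argues against that code. All values below are illustrative placeholders, not the measured data:

```python
import itertools

# Hypothetical per-elevation values (degrees, dB, degrees); the real
# measurements appear in Figure 3.11, B and C.
elevations = [-60, -40, -20, 0, 20, 40, 60, 80, 100, 120, 140, 160, 180, 200]
ear_canal_spl = [14, 17, 20, 21, 20, 19, 18, 17, 20, 21, 18, 14, 10, 8]
net_estimate = [-40, -30, -15, 5, 22, 38, 61, 82, 98, 118, 132, 150, 165, 172]

# Pairs an SPL-only code would confuse: similar ear-canal SPL (within 1 dB)
# yet clearly different network estimates of elevation.
for i, j in itertools.combinations(range(len(elevations)), 2):
    if abs(ear_canal_spl[i] - ear_canal_spl[j]) <= 1.0 \
            and abs(net_estimate[i] - net_estimate[j]) > 20.0:
        print(elevations[i], elevations[j])
```

With these placeholder numbers the search reports pairs such as 0° and 120°, which produce identical SPLs but very different network estimates.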

Next, we tested the effect of roving the source SPLs. Figure 3.11 D was plotted

for another unit in a similar format to Figure 3.11, B and C. This unit responded best to

frequencies in the high-frequency band. Here, we plotted two high-frequency sound-

level curves separated by 20 dB, simulating the SPL cues under conditions in which we

varied the stimulus SPLs in a range of 20 dB. A neural network was trained with spike

patterns from five SPLs between 20 and 40 dB above threshold in 5-dB steps. The

network outputs based on spike patterns elicited with single source SPLs at 20 and 40 dB

above threshold were plotted using the right ordinate. One can see from Figure 3.11D









that even though the high-frequency band provided the strongest SPL cues for

localization in elevation, those SPL cues were greatly confounded when stimulus levels

were roved in the range of 20 dB. For instance, a stimulus of 20 dB SPL at 0° and a

stimulus of 40 dB SPL at 180° would produce similar sound levels at the ear canal.

Nevertheless, neural-network recognition of spike patterns produced by two single

stimulus levels (20 and 40 dB above threshold) was fairly accurate and comparable.

Arrows show examples in which the network recognized two sets of spike patterns as

responses to stimuli at the same elevation, even when the stimulus SPLs differed by 20

dB. The median error in network output for the unit represented in Figure 3.11D was

29.0°. That means that one half of the network outputs fell within a range of roughly

58.0° (± 29.0°) around the correct elevation. That range of errors is 22.3% of the 260°

range of elevation that was tested. In contrast, SPL cues to sound-source elevation were

confounded by source levels that roved over a range of 20 dB, which is 68.5% of the

29.2-dB range of variation of SPL produced by a constant-level source moved through

260° of elevation. We applied the same approach as in Figure 3.11 to all the units in our

sample that had median errors smaller than 40° and obtained results qualitatively similar

to those shown in the figure. These results contradict the hypothesis that elevation

sensitivity is due entirely to the elevation dependence of SPL.
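The two percentages quoted above follow from simple ratios of the ranges involved; a quick arithmetic check:

```python
# A median error of 29.0 deg means half of the network outputs fell within
# +/-29.0 deg of the correct elevation, i.e., a 58.0-deg span.
error_span = 2 * 29.0            # degrees
tested_range = 260.0             # elevations spanned -60 to +200 deg

# Roved source levels spanned 20 dB, compared with the 29.2-dB range of
# ear-canal SPL produced by a constant-level source moved through 260 deg.
rove_range = 20.0                # dB
spl_range = 29.2                 # dB, high-frequency band

print(round(error_span / tested_range * 100, 1))  # -> 22.3 (%)
print(round(rove_range / spl_range * 100, 1))     # -> 68.5 (%)
```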

Our systematic analysis of the effect of roving levels on network performance

further supports the hypothesis that level-invariant information about sound-source

location is present in the spike patterns. For the sample of 195 units, the averaged

median errors of the network when trained and tested with responses to stimuli that were

20 and 40 dB above threshold were 40.3° and 46.4°, respectively. Neural network









analysis yielded an average median error of 47.9° when trained and tested with 5 roving

levels (20, 25, 30, 35, and 40 dB above threshold). Statistics did not show any

significant difference of the averaged median errors between the condition of a single

level at 40 dB above threshold and that of 5 roving levels (paired t test, P > 0.05).
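A paired t test is the appropriate comparison here because each unit contributes a median error under both the single-level and roving-level conditions. A minimal stdlib sketch with simulated per-unit errors (the real analysis used the recorded median errors, not these placeholder values):

```python
import math
import random
import statistics

random.seed(1)

# Simulated per-unit median errors (degrees) for 195 units; the reported
# group averages were 46.4 deg (single level, 40 dB above threshold) and
# 47.9 deg (5 roving levels). These numbers are illustrative only.
single_level = [random.gauss(46.4, 15.0) for _ in range(195)]
roving_levels = [s + random.gauss(0.5, 8.0) for s in single_level]

# Paired t statistic: each unit is its own control, so test whether the
# mean within-unit difference departs from zero.
diffs = [r - s for s, r in zip(single_level, roving_levels)]
n = len(diffs)
t_stat = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

# The two-tailed critical value for df = 194 at alpha = 0.05 is about 1.97;
# |t| below that corresponds to P > 0.05, as reported in the text.
print(f"t = {t_stat:.2f}")
```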

Frequency Tuning Properties and Network Performance

The coding of sound source elevation requires integration of information across a

range of frequencies. Frequency tuning properties of a neuron might be related to a

neuron's elevation sensitivity. In this section, we explored the relation between the

frequency tuning properties and the network performance in the two cortical areas. We

found that A2 units showed broader frequency tuning than did AES units. The broader

frequency tuning in A2 arose mainly because the low-cutoff frequencies of the

frequency tuning curves of the A2 units extended toward lower frequencies. Acoustic

measures of the cat's head-related transfer function (Rice et al. 1992) and behavioral

studies in cats (Huang and May 1996a) suggested that spectral details in the lower

frequency range (e.g., 5–10 kHz) might signal low elevations. In fact, as we showed earlier, the

AES units tended to produce larger errors in the low elevations (-60° to 0°) than did A2

units (Figure 3.10). Could the broader frequency tuning and lower low-cutoff

frequencies of the A2 units account for their better performance in the low elevations?

First, we consider the frequency tuning properties of the units. The units that we

encountered in areas AES and A2 responded well to broadband noise burst stimuli. We

recorded frequency tuning responses to tone bursts of 100-ms duration in 173 of the 195

units. Among them, 91 units were from area AES and 82 from area A2. Most of the units

showed stronger responses to higher frequency tones (>15 kHz) than to lower frequency















Figure 3.12. Percentage of unit sample activated as a function of stimulus tonal
frequency. The three lines in each panel represent the percentage of units activated at or
above 25, 50, and 75% of maximal spike counts. A. Pooled data from 91 AES units. B.
Pooled data from 82 A2 units.












tones (<15 kHz). Figure 3.12, A and B, shows, for our sample of AES and A2 units,


respectively, the percentage of the population activated to levels at or above 25, 50, and


75% of maximal spike counts at various tonal frequencies, at a stimulus level 40 dB


above threshold. At almost all frequencies, more than half of the population in both areas


AES and A2 were activated above 25% of maximal spike counts. Tonal stimuli activated


a larger fraction of the unit population in area A2 than in area AES, especially in lower


frequencies. Hence, frequency tuning bandwidth appeared broader in our sample of A2











units than in the AES units. The conventional way of defining tuning bandwidth is to

find thresholds at various frequencies and then to measure the bandwidth at a certain

level above the lowest threshold. That might not provide an accurate description of

tuning bandwidth under condition of free-field sound stimulation because the transfer

functions of the pinnae will be added to the frequency sensitivity of the unit. Instead, we

defined the tuning bandwidth as follows. First, we measured spike counts in response to

tones at various frequencies with a fixed level of 40 dB above the threshold for the best

frequency. The tuning bandwidth was the frequency range over which the spike counts

were at or above 50% of the maximal spike count. That provided a somewhat more

appropriate measure of the bandwidth of frequency that influenced the unit responses in

our study. The distribution of the frequency tuning bandwidths in our sample of A2 and

AES units is shown in the upper panels of Figure 3.13. The mean bandwidth in A2 was

2.02 octaves and that in AES neurons was 1.49 octaves. This difference was statistically

significant (t test, P < 0.01).
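The 50%-of-maximum bandwidth measure defined above can be sketched directly. The tuning curve below is hypothetical, sampled at quarter-octave steps over the band used in the study:

```python
import math

def tuning_bandwidth_octaves(freqs_khz, spike_counts, criterion=0.5):
    """Frequency range (in octaves) over which spike counts stay at or
    above `criterion` times the maximal count, as defined in the text."""
    cutoff = criterion * max(spike_counts)
    above = [f for f, c in zip(freqs_khz, spike_counts) if c >= cutoff]
    return math.log2(max(above) / min(above))

# Hypothetical responses to 100-ms tone bursts at 40 dB above threshold,
# sampled from 3.75 to 30 kHz in quarter-octave steps.
freqs = [3.75 * 2 ** (i / 4) for i in range(13)]
counts = [2, 3, 5, 9, 14, 18, 20, 19, 16, 12, 8, 5, 3]

print(round(tuning_bandwidth_octaves(freqs, counts), 2))  # -> 1.25 octaves
```

Because the measure is taken from free-field responses at a fixed level, it folds the pinna transfer function into the estimate, which is what motivated the departure from the conventional threshold-based bandwidth.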

Next, in order to explore whether this difference in frequency tuning bandwidth

could account for the difference between AES and A2 units in neural network

performance in low elevation coding, we measured the correlation of the bandwidths of

individual A2 and AES units with their neural network performance, particularly in the

lower elevation coding. Lower panels of Figure 3.13 are scatter plots of the neural

network performance at lower elevations as a function of frequency tuning bandwidth for

our AES and A2 units, respectively. The lower elevations represented are -60° to 0°,

which is the range in which differences between the two cortical areas were evident

(Figure 3.10). No correlation could be seen between the network performance














































Figure 3.13. Frequency tuning bandwidth and neural network performance. Upper
panels represent the distribution of bandwidth in AES units (left, open bars) and in A2
units (right, filled bars). Lower panels represent the relation between the neural network
performance at the lower elevations and the frequency tuning bandwidth. Left and right
panels represent areas AES and A2, respectively. Median errors were computed in a
range of -60° to 0° elevation.









represented by the median errors and the frequency tuning bandwidth. Similarly, we

measured the correlation of the low-cutoff frequencies of the frequency tuning curves of

individual A2 and AES units with their neural network performance in the lower

elevations. We found a marginally significant correlation between the network output

errors at low elevations and low-cutoff frequencies in the sample of A2 units (r = 0.24,

0.01 < P < 0.05) but not in the sample of AES units.

Relation between Azimuth and Elevation Coding

For 175 units, responses to stimuli from both horizontal and vertical speakers

were obtained. Across these 175 units, there was a significant positive correlation

between the network performance in azimuth and in elevation (Figure 3.14). Each panel

in Figure 3.14 is a scatter plot of the median errors of the same units in encoding sound-

source azimuth and elevation. AES units (N=113) are presented in the upper panels and

A2 units (N=62) in the lower panels. Left panels plot data obtained at a stimulus level of

20 dB above threshold and right panels at 40 dB above threshold. Correlation coefficients

(r) between median errors in azimuth and elevation ranged from 0.23 to 0.53

depending on the cortical areas and the stimulus levels. The correlation coefficients of

the A2 units were larger than those of the AES units, especially for the stimulus level at

40 dB above threshold. Among the units that coded elevation with median errors of 40°

or less, for example, the majority of units also showed median errors of 40° or less in

azimuth. The principal acoustic cues for localization in elevation differ from those for

localization in azimuth. If neurons are sensitive only to a particular localization cue, no

correlation or perhaps negative correlation between network performance in the two

dimensions would be expected. The fact that we observed positive correlations between















[Figure 3.14 appears here. Panel statistics: area AES, Thr + 20 dB, N = 113, r = .43;
area AES, Thr + 40 dB, N = 113, r = .23; area A2, Thr + 20 dB, N = 62, r = .46;
area A2, Thr + 40 dB, N = 62, r = .53. Abscissa: Median Errors in Azimuth (degrees).]


Figure 3.14. Correlation between network performance in azimuth and elevation. Each
dot in the scatter plots represents, for one unit, the median error of the network
performance in elevation versus that in azimuth. There is a positive correlation between
network performance in both dimensions. Open circles in the upper panels represent area
AES units. Filled circles in the lower panels represent area A2 units. Left panels plot
data at a stimulus level 20 dB above threshold. Right panels plot data at a stimulus level
40 dB above threshold.











the two dimensions indicates that many units can integrate information from multiple

types of localization cues.
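The correlations in Figure 3.14 are ordinary Pearson coefficients computed across units. A self-contained sketch with hypothetical per-unit median errors (not the recorded values):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical per-unit median errors (degrees). A positive r, as in
# Figure 3.14, means units that are accurate in one dimension tend to be
# accurate in the other.
azimuth_err = [12, 18, 25, 30, 35, 42, 50, 55, 60, 70]
elevation_err = [20, 22, 28, 40, 33, 48, 52, 45, 66, 72]

print(round(pearson_r(azimuth_err, elevation_err), 2))
```

If neurons were each sensitive to a single cue, one would expect r near zero or negative; the positive values observed are the quantitative basis for the cue-integration argument above.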

Discussion


Results presented in Middlebrooks et al. (1998) support the hypothesis that

sound-source azimuth is represented in the auditory cortex by a distributed code. In that

code, responses of individual neurons carry information about 360° of azimuth, and the

information about any particular sound-source location is distributed among units

throughout entire cortical areas. The present study extends that observation to the

dimension of sound-source elevation. The acoustical cues for sound-source elevation

differ from those for azimuth, and identification of source azimuth and elevation

presumably requires distinct neural mechanisms. The observation that units in areas AES

and A2 show similar coding for azimuth and elevation supports the hypothesis that

neurons integrate the multiple cues that signal the location of a sound source rather than

merely coding a particular acoustical parameter that happens to co-vary with sound-

source location. In this Discussion, we consider the acoustical cues that could underlie

the elevation sensitivity that we observed, evaluate the similarities and differences

between areas AES and A2 in regard to elevation and frequency sensitivity, and comment

on the significance of the correlation between azimuth and elevation coding accuracy.

Acoustical Cues and Localization in Median Plane

Acoustical measurements of directional transfer functions in the ear canal and

behavioral studies have provided insights into the acoustical cues for sound localization

in the vertical dimension. Due to the approximate left-right symmetry of the head and









ears, a stimulus presented in the median plane will reach both ears simultaneously with

equal levels. Interaural time differences and interaural level differences that are important

for localization in the horizontal plane may contribute little if any to the localization in the

median plane (Middlebrooks and Green 1991; Middlebrooks et al. 1989).

Sound pressure level, on the other hand, can be a cue for vertical localization if

the source level is known and constant. The SPL in the ear canal varies with sound-

source elevation. Earlier recordings in cats have shown that within the range of -60° to

+90° elevation, SPL varies by a few dB for lower frequency tones to as much as 20 dB for

high frequency tones (Middlebrooks and Pettigrew 1981; Musicant et al. 1990; Phillips et

al. 1982). In the present study, the acoustical recording of the directional transfer

function at the entrance of the external ear canal of cats was carried out in the range of

elevation from -60° to 200°. Instead of examining each individual frequency, we plotted

the SPL profile in three frequency bands (Figure 3.11A). The high-frequency band (15–

30 kHz) had the largest variation in SPL. The entire ranges of the sound level profiles for

the low-, mid-, and high-frequency regions were 11.9, 17.8, and 29.2 dB, respectively.

To test the degree to which SPL cues might have contributed to our physiological

results, we compared the elevation sensitivity of unit responses with the elevation

sensitivity of ear-canal SPLs. There were two indications that SPL cues are not the

principal cues for the elevation sensitivity we observed. First, we observed many

instances in which sound sources at two locations produced roughly the same SPL in the

ear canals, yet produced unit responses that could be readily distinguished by an artificial

neural network. Second, under conditions in which we roved stimulus SPLs over a range

of 20 dB, a sound source at a single location produced SPLs ranging over 20 dB, yet








produced unit responses containing SPL-invariant features that resulted in roughly equal

neural-network estimates of elevation. Although SPL cues might contribute to elevation

sensitivity under certain conditions in which sound-source SPLs are constant, these two

observations indicate that SPL cues alone could not have accounted for the neuronal

elevation sensitivity that we observed.

A body of evidence suggests that spectral-shape cues are the principal cues for

localization in the vertical dimension. Measurement of the directional transfer functions

of human ears (Middlebrooks et al. 1989; Shaw 1974; Wightman and Kistler 1989) and

those of cat ears (Musicant et al. 1990; Rice et al. 1992) has shown that spectral shape

features vary systematically with sound-source elevations. The most conspicuous

features of the transfer functions of a cat ear are probably the spectral notches. The

center frequencies of the spectral notches (5-18 kHz in cat) increase as sound-source

elevation changes from low to high (Musicant et al. 1990; Rice et al. 1992). Recent

behavioral studies in cats have provided evidence that indicates that the mid-frequency

spectral-shape cues are important for vertical localization (Huang and May 1996a,

1996b; May and Huang 1996). A recent report from Imig and colleagues (1997) has

demonstrated that at least some elevation sensitive units in the medial geniculate body

lose that sensitivity when tested with tonal stimuli, also suggesting a spectral basis for

elevation sensitivity (Imig et al. 1997). We do not yet have any direct evidence that the

elevation sensitivity that we observed was due to sensitivity to spectral-shape cues.

Having ruled out SPL cues, however, sensitivity to spectral-shape cues certainly is the

most likely explanation for the elevation sensitivity that we see.








A2 versus AES: Elevation Sensitivity and Frequency Tuning Properties

Our initial data from area AES showed larger errors at frontal locations below the

horizon than at higher elevations and in the rear. We explored auditory area A2 to test

whether sensitivity to low frontal elevations might be more accurate in another cortical

area. Averaged across all elevations, the accuracy of elevation coding for units from

areas A2 and AES was not significantly different. Nevertheless, differences between

cortical areas were found in the errors at low frontal and rear locations (i.e., -60° to 0°

and +120° to +200°). For both cortical areas, errors of the network output at lower

elevations and rear locations were much larger than those at other locations. These large

errors were almost always caused by underestimation of targets. These undershoots

might be due to an edge effect of the neural network analysis. That is, the network

would tend not to give mean outputs at locations beyond the limits of the training set.

However, the edge effect could not explain why there were differences in the accuracy of

network output in various elevation ranges between the two cortical areas.

Since spectral-shape cues of the sound are important for localization in vertical

plane, it is conceivable that differences in the frequency tuning of neurons in areas AES

and A2 might account for differences in elevation sensitivity. Previous studies showed

that broadly tuned neurons were found in both areas (Andersen et al. 1980; Clarey and

Irvine 1986; Reale and Imig 1980; Schreiner and Cynader 1984). In area AES, neurons

were shown to respond to ranges of frequency that most often were weighted toward

high frequencies (Clarey and Irvine 1986). In area A2, a dorsoventral gradient of

frequency tuning bandwidth was demonstrated with the lowest Q10 values found in the

most ventral parts of A2. Frequency bands often extended to low frequencies (Schreiner








and Cynader 1984). In our sample of 91 AES units and 82 A2 units, most units

showed stronger responses to higher frequency tones (>15 kHz) than to lower frequency

tones (< 15 kHz). Frequency tuning bandwidth was broader in our sample of A2 units

than in the AES units, and tonal stimuli activated a larger fraction of the unit population

in area A2 than in area AES, especially at lower frequencies (Figures 3.12 and 3.13). We

could postulate that the properties of broad frequency tuning in area A2 would make A2

neurons more suitable for detecting the spectral shape cues that are important for

elevation coding than AES neurons. However, our results were not conclusive in this

regard. No correlation was found between the frequency tuning bandwidth and the

network output errors at the locations at which differences between A2 and AES neurons

were evident (Figure 3.13). Only a marginally significant correlation was found between

the low-cutoff frequencies and network output errors at low elevations in the sample of

A2 units. Perhaps overall frequency tuning bandwidth of the cortical neurons is not as

important as are details of frequency response areas that consist of excitatory and

inhibitory regions, as suggested in the data obtained from the medial geniculate body

(Imig et al. 1997). Our limited data, as well as earlier studies on frequency tuning of the

A2 and AES neurons, have shown that some of the neurons from either cortical area

have irregular frequency tuning curves in which two or more peaks are present

(Clarey and Irvine 1986; Schreiner and Cynader 1984). Such irregular frequency tuning

may produce spectral regions of inhibition and facilitation which in turn may provide the

basis for a neuron's directional sensitivity.








Correlation between Azimuth and Elevation Coding

We find that, in general, those cortical units in areas AES and A2 that exhibit the

most accurate elevation coding also tend to show good azimuth sensitivity. The

psychophysical literature supports the view that azimuth sensitivity derives primarily from

interaural difference cues and that elevation sensitivity derives from spectral shape cues

(Middlebrooks and Green 1991). We would like to conclude that single cortical neurons

receive information both from brain systems that perform interaural comparisons as well

as those that analyze details of spectra at each ear. An alternative interpretation,

however, is that the units that we studied were not sensitive to interaural differences and

that both the azimuth sensitivity and the elevation sensitivity that we observed were

derived from spectral shape cues. Indeed, acoustical studies in cat and human indicate

that spectra measured at each ear vary conspicuously as a broadband sound source is

varied in azimuth (Rice et al. 1992; Shaw 1974). Moreover, human patients who are

chronically deaf in one ear can show reasonably accurate localization in azimuth,

presumably by exploiting monaural spectral cues for azimuth (Slattery and Middlebrooks

1994).

These conflicting conclusions can be resolved only by future studies in which

specific acoustical cues are controlled directly. At this time, however, at least two lines

of evidence lead us to reject the view that the spatial sensitivity of the units that we

studied is derived entirely from spectral shape cues. First, Imig and colleagues (1997)

searched for units in the cat's medial geniculate body that showed azimuth sensitivity

derived predominantly from monaural spectral cues. Only about 17% of units in the

ventral nucleus (VN) and the lateral part of the posterior group (PO) showed azimuth








sensitivity that persisted after the ipsilateral ear was plugged. That study is not directly

relevant to the current one, since VN and PO project most strongly to cortical area A1,

not A2 or AES. Nevertheless, those results argue that in at least two divisions of the

auditory thalamus only a small minority of units shows azimuth sensitivity that is

dominated by monaural spectral cues. Second, studies in area A2 that used dichotic

stimulation have shown that about a third of area A2 units show excitatory/inhibitory

binaural interactions (Schreiner and Cynader 1984). That type of binaural interaction

would necessarily result in sensitivity to interaural level differences. About 40% of units

in area A2 and ~69% of units in area AES show excitatory/excitatory binaural

interactions (Clarey and Irvine 1986; Schreiner and Cynader 1984), and

excitatory/excitatory interactions also can result in sensitivity to interaural level

differences (Wise and Irvine 1984). Even if we consider only the excitatory/inhibitory

units in area A2, a minimum of a third of our A2 sample should have included units that

were sensitive to interaural level differences. It would be difficult to argue that both the

elevation and azimuth sensitivity shown by units in areas AES and A2 is due primarily to

spectral shape sensitivity.

Concluding Remarks

The study reported in Middlebrooks et al. (1998) demonstrated that the responses

of single units in areas AES and A2 can code sound-source location in the horizontal

plane throughout 360° of azimuth. That result raised the question of whether units in

those cortical areas integrate multiple acoustical cues for sound-source location or

whether they simply code the value of a single acoustical parameter, such as interaural

level difference, that co-varies with azimuth. In the present study, we have found that







the responses of units also can code the elevation of a sound source in the median plane,

in which interaural difference cues presumably are negligible. Moreover, the units that

show the best elevation coding accuracy also code azimuth well. These results do not

constitute conclusive evidence of a direct role of these neurons in sound-localization

behavior. They do, however, support the hypothesis that single cortical neurons can

combine information from multiple acoustical cues to identify the location of a sound

source in azimuth and elevation.














CHAPTER 4
AUDITORY CORTICAL SENSITIVITY TO VERTICAL SOURCE LOCATION:
PARALLELS TO HUMAN PSYCHOPHYSICS

Introduction


We have reported previously that the spike patterns (spike counts and spike

timing) of neurons in the nontonotopic auditory cortex carry information about sound-

source location (Middlebrooks et al. 1994, 1998; Xu et al. 1998). The results support

the hypothesis that the activity of individual neurons carries information about broad

ranges of location and that accurate sound localization is derived from information that is

distributed across a large population of neurons. The spike patterns that we studied

represent an output of a system that integrates multiple cues for sound-source location.

Human psychophysical studies have demonstrated that accurate localization of

broadband sounds in the vertical plane utilizes spectral-shape cues that are produced by

the interaction of the incident sound wave with the head and the convoluted surface of

the pinna (see Middlebrooks and Green 1991 for review). Human listeners can localize

accurately when presented with stimuli that have spectra that are fairly broad and flat, as

is true of most natural sounds. When certain filters are applied to stimuli, however,

localization based on spectral shape cues is confounded and listeners make systematic

errors in the vertical and front/back dimensions. Similarly, behavioral studies in cats have

shown that cats can accurately localize broadband sounds in the vertical plane and that








vertical localization fails when stimulus spectra are restricted to narrow bands of

frequency (Huang and May 1996a; May and Huang 1996; Populin and Yin 1998).

If the neurons that we have studied in the auditory cortex contribute to sound

localization behavior, one would expect that their responses would correctly signal the

locations of broadband sound sources, as we have observed previously. By analogy with

behavioral results, we also would expect their responses to signal systematically incorrect

locations when presented with certain filtered sounds. It is that expectation that we

tested in the present study.

We chose to study auditory cortical area A2 because A2 neurons are broadly

tuned to frequency (Andersen et al. 1980; Reale and Imig 1980; Schreiner and Cynader

1984) and because elevation sensitivity encoded by their spike patterns was demonstrated in

a previous report (Xu et al. 1998). Stimuli consisted of broadband noise and three

types of filtered noise. Broadband noise was chosen because human and feline listeners

tend to localize sounds accurately in the vertical and front/back dimensions when

stimulus spectra are broad and flat (Makous and Middlebrooks 1990; May and Huang

1996). The filtered noise included narrow bandpass noise (narrowband noise), narrow

band-reject noise (notched noise) and highpass noise. We chose narrowband noise

because human listeners make systematic errors when required to localize a narrowband

sound and because that pattern of errors is predicted well by a quantitative model

(Middlebrooks 1992). Similar behavioral results were observed in head-orientation

experiments in cats (Huang and May 1996a). We chose notch stimuli because a possible

localization illusion due to spectral notches was observed in human behavioral studies

(Bloom 1977; Watkins 1978) and because analysis of feline head-related transfer








functions has led several groups to speculate that notches might provide salient cues for

localization (Musicant et al. 1990; Rice et al. 1992). Highpass noise was chosen because

behavioral studies have shown that human localization judgements are influenced by the

cut-off frequency of a highpass sound (Hebrank and Wright 1974b) and because recent

human psychophysical studies from this laboratory have shown that narrowband and

highpass noise stimuli that have equal low-frequency cut-offs tend to produce equivalent

localization judgements (Macpherson and Middlebrooks 1999).

In the present study, we performed pattern recognition on cortical spike patterns

using an artificial neural network paradigm that we employed in previous studies of

azimuth and elevation coding (Middlebrooks et al. 1994, 1998; Xu et al. 1998). We

trained neural networks to recognize the spike patterns elicited by broadband noise

sources at various elevations. When presented with such spike patterns, the trained

networks produced estimates of the source location that corresponded reasonably well

with the actual locations. Later, the trained network was used to classify cortical

responses to filtered noise. In response to spike patterns elicited by narrowband noise of

a given center frequency, the network produced fairly constant elevation estimates,

regardless of the actual source elevation. When presented with spike patterns elicited by

narrowband sounds that varied in center frequency, the network produced elevation

estimates that tended to vary systematically in elevation. The region in elevation that was

associated with a given center frequency could be predicted by a localization model

based on spectral shape recognition. Highpass stimuli tended to produce spike patterns and

network outputs similar to those of narrowband stimuli when the low-frequency cut-offs

of both stimuli match each other. Our data support the hypothesis that the elevation








sensitivity of these cortical neurons derives from computational principles similar to those

that underlie human vertical localization.

Methods


Eight adult cats of either sex were used in this study. Cats were anesthetized for

surgery with isoflurane, then were transferred to α-chloralose for single-unit recording.

The right auditory cortex was exposed for microelectrode penetration. Both ears of the

cat were supported in a symmetrical forward position that resembled the ear position

adopted by a cat attending to a frontal sound. Details of anesthesia procedures and

surgical preparation are available in Middlebrooks et al. (1998).

Experimental Apparatus

Experiments were conducted in a sound-attenuating chamber that was lined with

acoustical foam (Illbruck) to suppress reflections of sounds at frequencies > 500 Hz.

Sound stimuli were presented from loudspeakers (Pioneer model TS-879 two-way

coaxials) mounted on 2 circular hoops, one in the horizontal plane and one in the vertical

midline plane. On the horizontal hoop, 18 loudspeakers spaced by 20° covered 360°.

On the vertical hoop, 14 loudspeakers spaced by 20° ranged from 60° below the frontal

horizon, up and over the top, to 20° below the rear horizon. Vertical locations were

labeled continuously in 20° steps from -60° to +200°. All loudspeakers had a distance of

1.2 m from the center of the chamber where the head of the animal was positioned. In

the present study, we focused only on the vertical plane.

Experiments were controlled with an Intel-based personal computer. Acoustic

stimuli were synthesized digitally, using equipment from Tucker-Davis Technologies








(TDT). The sampling rate for audio output was 100 kHz, with 16-bit resolution. Before

each experiment, the loudspeakers were calibrated by presenting maximum-length

sequences (Golay codes) and recording the responses with a 1/2-in microphone (Larson-

Davis model 2540) placed in the center of the chamber in the absence of the cat (Golay

1961; Zhou et al. 1992). Loudspeaker responses were equalized individually so that the

root-mean-squared variation in sound level, computed in 6.1-Hz steps from 1,000 to

30,000 Hz, was < 1.0 dB.
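The Golay-code measurement can be sketched as follows. This is a minimal illustration (the function names are ours, not from the laboratory's software), exploiting the defining property of a complementary pair: the autocorrelations of the two codes sum to a delta function, so cross-correlating each recorded response with its code and summing recovers the loudspeaker impulse response directly.

```python
import numpy as np

def golay_pair(order):
    """Complementary Golay pair of length 2**order (values +/-1),
    built by the standard recursive append/append-negated rule."""
    a, b = np.array([1.0]), np.array([1.0])
    for _ in range(order):
        a, b = np.concatenate([a, b]), np.concatenate([a, -b])
    return a, b

def impulse_response(rec_a, rec_b, a, b):
    """Recover the system impulse response from the recorded responses
    to the two codes.  Because |A(f)|^2 + |B(f)|^2 = 2N at every
    frequency, the two cross-correlations sum to 2N times h."""
    n = len(a)
    H = (np.fft.fft(rec_a) * np.conj(np.fft.fft(a)) +
         np.fft.fft(rec_b) * np.conj(np.fft.fft(b))) / (2 * n)
    return np.real(np.fft.ifft(H))
```

Equalization then divides each stimulus spectrum by the measured loudspeaker response, which is how the residual ripple can be driven below 1.0 dB.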

Multichannel Recording and Spike Sorting

We used silicon-substrate thin-film multichannel recording probes to record unit

activities. Each probe had 16 recording sites on a one-dimensional shank spaced at

intervals of 100 µm and allowed simultaneous recording from up to 16 sites (Drake et

al. 1988; Najafi et al. 1985). The nominal impedances were ~4 MΩ. We recorded from

auditory cortical area A2. The probe was passed in a dorsoventral orientation, roughly

parallel to the cortical surface, near the crest of the ventral middle ectosylvian gyrus.

Generally, the probe passed through the middle cortical layers that are active under

anesthesia, although recordings did not necessarily all come from the same cortical layer.

An on-line spike discriminator (TDT model SD 1) and custom graphic software were

used to monitor spike activities from one selected channel at a time. Prior to detailed

study at each probe placement, we determined the frequency tuning properties of units at

the most dorsal recording sites. We sometimes detected sharp frequency tuning, which

was taken as evidence that the probe was in auditory cortical area A1. In such cases,

we retracted the probe and moved it further ventral.








Signals from the recording probe were amplified with a custom 16-channel

amplifier, digitized at a 25-kHz rate, sharply low-pass filtered below 6 kHz, re-sampled

at a 12.5 kHz sample rate, and then stored on a PC hard disk. Off-line, we isolated unit

activities from the digitized signal using custom spike-sorting software. Spike times

were stored at 20-µs resolution for further analysis. Occasionally, we encountered well-

isolated single units, but most often the recordings were characteristic of unresolved

clusters of several units. We presume that the addition of responses of multiple units

could only increase the apparent breadth of spatial tuning of single units and could only

decrease the spatial specificity of spike patterns. For that reason, we regard our results

as conservative estimates of the accuracy of spatial coding by single units. Some unit

recordings were regarded as weak or unstable and thus were excluded from further

analysis. Usable recordings met the following two criteria. (1) In response to broadband

noise, the maximum mean spike rate across all tested sound levels and elevations was > 1

spike per trial. (2) Across all presentations of broadband noise, the mean spike rate in

the first half of the trials differed from that in the second half by no more than a factor of

2.
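The two inclusion criteria are simple enough to state as code. A sketch (the function name and data layout are ours, not the dissertation's):

```python
import numpy as np

def is_usable(counts_in_order, max_mean_rate):
    """counts_in_order: per-trial spike counts for all broadband-noise
    presentations, in the order they were delivered.  max_mean_rate:
    the largest mean spike count per trial over all tested levels and
    elevations."""
    # Criterion 1: some condition evoked > 1 spike per trial on average.
    if max_mean_rate <= 1.0:
        return False
    # Criterion 2 (stability over time): the mean rate in the first
    # half of trials is within a factor of 2 of that in the second half.
    half = len(counts_in_order) // 2
    first = np.mean(counts_in_order[:half])
    second = np.mean(counts_in_order[half:])
    return max(first, second) <= 2.0 * min(first, second)
```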

Stimulus Paradigm and Experimental Procedure

At each placement of a recording probe, we recorded responses to tones,

broadband noise, and filtered noise. The entire stimulus set required about 6-8 hours to

present. We first studied the frequency tuning properties of the units. Pure-tone stimuli

consisted of 80-ms tone bursts (with 5-ms onset and offset ramps) with frequencies

ranging from 1.18 to 30.0 kHz in 1/3-oct steps. They were presented at +80° or +100°








elevation at stimulus levels of 10, 20, 30 and 40 dB above the threshold of the most

sensitive unit.

Elevation sensitivity was then studied by presenting broadband noise bursts from

the 14 loudspeakers in the vertical midline plane, one loudspeaker at a time. The

broadband noise stimuli consisted of independent Gaussian noise samples of 80-ms

duration (with 0.5-ms onset and offset ramps). The spectra of the Gaussian noise bursts

were bandpassed between 1 and 30 kHz with abrupt cutoffs. The stimulus levels were 20

to 40 dB above the unit's threshold in 5-dB steps. A total of 40 trials was delivered for

each combination of stimulus location and stimulus level; locations and levels were varied

in a pseudorandom order.

Spectrally-filtered noise, consisting of 80-ms bursts of narrowband noise, notched

noise, and highpass noise, was always presented at +80° or +100° elevation. We chose

those locations to present the spectrally-filtered noise because cats' head-related transfer

functions typically were flattest for these locations. The narrowband noise had a flat

center 1/6-oct wide and skirts that fell off at 128 dB per octave. The center frequencies

(Fc's) of the narrowband noise stimuli that we used were usually from 4 to 18 kHz in 1-

kHz steps. In some cases, the range of Fc's was extended to 28 kHz. The reject bands

for the notch stimuli had a flat center 1/6-oct, 1/2-oct, or 1-oct wide and skirts that rose

at 128 dB per octave. The depth of the notch was 40 dB and the widths at the top were

0.792, 1.125, or 1.625 octaves. The Fc's of the notch typically ranged from 4 to 18 kHz in

1-kHz steps. The highpass noise had a positive slope of 128 dB per octave. The 3-dB

cutoff frequencies of the highpass noise ranged from 6 to 20 kHz in 1-kHz steps. The

sound levels of the spectrally-filtered noise were equalized by root-mean-squared power.








Perceptually, two sounds of equal root-mean-squared power that differ in spectral shape

might produce different loudnesses. Therefore, the stimulus levels all were expressed as

stimulus levels above the unit's threshold for each type of spectrally-filtered noise. Stimulus

levels 20, 30, and 40 dB above threshold were used for the spectrally-filtered stimuli. A

total of 20 trials was delivered for each combination of stimulus Fc or cutoff frequency

and stimulus level; frequencies and levels were varied in a pseudorandom order.

Narrowband stimuli at 1-3 Fc's also were varied across a range of elevations to

study the elevation sensitivities of neurons to the narrowband noise. The narrowband

noise of selected Fc's was presented from the 14 loudspeakers in the vertical plane, one

loudspeaker at a time. The stimulus levels for each Fc were 20, 30, and 40 dB above

threshold. A total of 20 trials was delivered for each combination of stimulus location

and stimulus level; locations and levels were varied in a pseudorandom order.

Measurement of head-related transfer functions (HRTFs) of the external ears was

carried out in all cats after the physiological experiments. A 1/2" probe microphone

(Larson-Davis model 2540) was inserted into the ear canal through a surgical opening at

the posterior base of the pinna. The probe stimuli delivered from each of the 14

loudspeakers in the median plane were pairs of Golay codes (Golay 1961; Zhou et al.

1992) that were 81.92 ms in duration. Recordings from the microphone were amplified

and then digitized at a rate of 100 kHz, yielding a spectral resolution of 12.2 Hz from 0

to 50 kHz. We divided the amplitude spectra by a common term that was formed by

the root-mean-squared sound pressure averaged across all elevations. Removal of the

common term left the component of each spectrum that was specific to each location; we

have referred to that term previously as the directional transfer function (Middlebrooks








and Green 1990), but now adopt the term HRTF in agreement with common usage. We

convolved each HRTF in the linear frequency scale with a bank of bandpass filters to

transfer it to a logarithmic (i.e., octave) scale (Middlebrooks 1999a). The filter bank

consisted of 118 triangular filters. The 3-dB bandwidth of the filters was 0.0571 octave,

filter slopes were 105 dB per octave, and the center frequencies were spaced in equal

intervals of 0.0286 octave from 3 to 30 kHz, yielding the 118 bands. The interval of 0.0286

octave was chosen to give steps of 2% in frequency.
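The conversion to a log-frequency representation can be sketched as follows. This is a simplified illustration using linear-amplitude triangles at the stated octave spacing; the dissertation's filters additionally specify a 3-dB bandwidth of 0.0571 octave and 105-dB/oct slopes, which this sketch only approximates (with these constants the simple endpoint rule below yields 117 centers rather than the reported 118; the off-by-one depends on how the upper endpoint is handled).

```python
import numpy as np

def log_freq_filterbank(freqs_hz, f_lo=3000.0, f_hi=30000.0,
                        spacing_oct=0.0286, width_oct=0.0571):
    """Return (centers_hz, weights); weights has one row per band.
    Each row is a triangle in log2-frequency with half-base width_oct,
    normalized to unit sum so it acts as a weighted average."""
    n_bands = int(np.floor(np.log2(f_hi / f_lo) / spacing_oct)) + 1
    centers = f_lo * 2.0 ** (spacing_oct * np.arange(n_bands))
    # distance of every frequency from every center, in octaves
    dist = np.abs(np.log2(freqs_hz[None, :] / centers[:, None]))
    weights = np.clip(1.0 - dist / width_oct, 0.0, None)
    weights /= weights.sum(axis=1, keepdims=True)
    return centers, weights
```

Multiplying `weights` by a linear-frequency amplitude spectrum yields the 2%-spaced log-frequency representation used for the HRTFs.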

Data Analysis

The goals of the data analysis were, first, to map the correspondence of

broadband sound-source elevations with cortical spike patterns and, then, to associate

spike patterns elicited by various filtered sounds with broadband source elevations.

Artificial neural networks were employed to map spike patterns onto source elevations.

Networks were constructed using MATLAB Neural Network Toolbox (The Mathworks,

Natick, MA) and were trained with the back-propagation algorithm (Rumelhart et al.

1986). The architecture, as detailed in Middlebrooks et al. (1998), consisted of a 4-unit

hidden layer with sigmoid transfer functions and a 2-unit linear output layer. The inputs

to the neural network were spike density functions expressed in 1-ms time bins. The

spike density functions were derived from a bootstrap averaging procedure (Efron and

Tibshirani 1991) in which each spike density function was formed by repeatedly drawing

8 samples with replacement from the neural responses to a particular stimulus condition.

The two output units of the neural network produced the sine and cosine of the stimulus

elevation, and the arctangent of the two outputs gave a continuously varying output in

degrees of elevation, i.e., the polar angle around the interaural axis. We did not constrain








the output of the network to any particular range, so the scatter in network estimation of

elevation sometimes fell outside the range of locations to which the network was trained

(i.e., from -60° to +200°). Typically, we formed 20 bootstrapped training patterns from

the odd-numbered trials of the neural responses to the broadband noise stimuli and used

them to train the artificial neural network. The trained network was then subjected to

testing with patterns consisting of 100 bootstrapped trials derived from either the even-

numbered trials of the neural responses to broadband noise or the entire set of neural

responses to spectrally-filtered noise.
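The input construction and output readout described above can be sketched as follows. This is a minimal illustration (the original analysis used the MATLAB Neural Network Toolbox; the function names and the fixed 50-ms window here are our choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_sdf(trial_spike_times_ms, n_draw=8, n_bins=50):
    """One bootstrapped spike density function: draw n_draw trials with
    replacement and average their 1-ms-bin spike histograms."""
    picks = rng.integers(0, len(trial_spike_times_ms), size=n_draw)
    hists = [np.histogram(trial_spike_times_ms[i],
                          bins=n_bins, range=(0, n_bins))[0]
             for i in picks]
    return np.mean(hists, axis=0)

def decode_elevation(sin_out, cos_out):
    """Arctangent readout of the two linear output units, in degrees.
    Note that arctan2 returns values in (-180, 180], so unconstrained
    estimates need not fall within the trained -60 to +200 degree range."""
    return np.degrees(np.arctan2(sin_out, cos_out))
```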

Results


Usable unit and unit-cluster data were obtained at 389 recording sites in 33

multichannel probe placements in auditory area A2 in 8 cats. All of the A2 units showed

relatively broad frequency tuning that was defined by frequency tuning curves that were

at least one octave wide at 40 dB above threshold. For 60.2% of the units, the tuning

curve of each unit spanned the entire mid-frequency range of 6-19 kHz. In the

following, we report the general properties of these units in response to broadband and

narrowband noise stimulation at various source elevations. We then examine the

sensitivity of units for the elevation of broadband noise sources. A quantitative model

that predicts human judgements of the locations of narrowband sounds is adapted for the

cat, then model predictions are compared with the locations signaled by cortical neurons

in response to narrowband stimuli. The neural responses to notch stimuli are also

analyzed using the neural-network algorithm. Next, we compare the elevation sensitivity

of the neural responses to highpass noise stimulation with that of neural responses to








narrowband noise stimulation. Finally, we examine the consequences for localization

coding of excluding information conveyed by the timing of spikes.

General Properties of Neural Responses to Broadband and Narrowband Stimuli

As we demonstrated in the previous study (Xu et al. 1998), A2 units showed

broad elevation tuning in response to broadband noise stimulation. An example of the

spike patterns of one representative unit (9806C02) in response to broadband noise is

represented by a raster plot in Figure 4.1A. Sound-source elevation is plotted on the

ordinate and the post-stimulus onset time is plotted on the abscissa. Each dot represents

one spike recorded from the unit. Only 20 trials of responses for each stimulus condition

elicited at 30 dB above threshold are shown here. One can see subtle changes in the

numbers and distribution of spikes and in the latencies of the spike patterns from one

elevation to another. The elevation tuning of the unit's mean spike counts in response to

broadband noise at 20 to 40 dB above threshold in 5-dB steps is plotted in Figure 4.1D.

Spike counts showed some elevation tuning at the lowest stimulus level but tuning

flattened out at higher stimulus levels. We quantified the elevation tuning of spike counts

by the average modulation of the spike counts by sound-source elevation across 20, 30,

and 40 dB above threshold. The modulation for the unit in Figure 4.1A, averaged across

sound levels, was 59.2%. Across the whole population of 389 units that we studied

using broadband noise, the median of the average modulation was 47.8%, which was

comparable with our previous report (Xu et al. 1998).
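The modulation metric can be stated compactly. The dissertation does not restate the formula in this chunk; a common definition, which we assume here for illustration, is the peak-to-trough depth relative to the peak:

```python
import numpy as np

def depth_of_modulation(mean_counts_by_elevation):
    """Percent modulation of mean spike count by elevation, assumed
    here to be 100 * (max - min) / max across tested elevations."""
    r = np.asarray(mean_counts_by_elevation, dtype=float)
    return 100.0 * (r.max() - r.min()) / r.max()
```

Averaging this value across the 20-, 30-, and 40-dB levels gives the "average modulation" reported above.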

Narrowband stimuli produced weaker elevation tuning than did broadband

stimuli. The raster plots (Figure 4.1, B and C) show the spike patterns of the same unit

elicited by narrowband noise centered at Fc of 6 and 16 kHz, respectively. Spike











[Figure 4.1: raster plots for broadband noise (A) and 6- and 16-kHz narrowband noise (B, C), axes Post-Onset-Time (ms) vs. Stimulus Elevation (degrees), and spike-rate-versus-elevation profiles (D-F), for unit 9806C02, area A2.]


Figure 4.1. Unit responses elicited by broadband and narrowband noise (unit 9806C02).
A: Raster plot of responses to broadband sounds presented from 14 locations in the

median plane. Each dot represents one spike from the unit. Each row of dots represents

the spike pattern recorded from one presentation of the stimulus at the location in
elevation indicated along the vertical axis. Only 20 trials recorded at each elevation are
plotted. Stimuli were 80 ms in duration and 30 dB above threshold. B and C: Raster
plots of responses to 1/6-oct narrowband noise with center frequencies at 6 and 16 kHz,
respectively. Other conventions are the same as in A. D: Spike-rate-versus-elevation
profiles for the responses to broadband stimulation. Each line represents the spike-rate-
versus-elevation profile at one of the five stimulus levels (i.e., 20, 25, 30, 35, and 40 dB

above threshold). E and F: Spike-rate-versus-elevation profiles for the responses to 6-

and 16-kHz narrowband stimulation, respectively. Stimulus levels were 20, 30, and 40
dB above threshold. Symbols and line types match those in D that represent the

equivalent levels.








patterns showed less variation from one elevation to another than did those elicited by

broadband stimuli. On the other hand, spike patterns showed considerable variation

across Fc. Fewer spikes were elicited by 6-kHz narrowband noise than by 16-kHz

narrowband noise. The spike patterns elicited by 16-kHz narrowband noise usually

started with a single short-latency (< 20 ms) spike followed by a silent period of about 3

ms and then several spikes at short interspike intervals (Figure 4.1C). These firing

patterns resembled those elicited by broadband noise at +20° to +60° elevation (Figure

4.1A). Figure 4.1, E and F, plots the elevation tuning of the unit in response to the two

narrowband stimuli at 20, 30 and 40 dB above threshold. The elevation tuning curves

were flatter than those for broadband noise stimulation; the average modulation by

elevation was 30.6 and 20.8% for 6- and 16-kHz narrowband stimulation, respectively.

Across the sample of 158 units that we recorded using narrowband stimuli, the median

of the average modulation of spike counts by elevation of narrowband noise was 39.9%.

Network Classification of Responses to Broadband Stimulation

Results from artificial-neural-network analysis of the spike patterns elicited by

broadband noise stimulation were comparable with our previous report (Xu et al. 1998).

The A2 neurons could code sound-source elevation with their spike patterns with varying

degrees of accuracy. As an example, the network analysis of the spike patterns of the

same unit as in Figure 4.1A elicited at 30 dB above threshold is shown in Figure 4.2A.

Each plus (+) represents the network estimate of elevation based on one spike pattern,

and the solid line indicates the median direction of responses at each stimulus source

elevation. In general, the neural-network estimates scattered around the perfect

performance line (---). Some large deviations from the targets were seen at certain



































[Figure 4.2: scatter plots of network estimates of elevation; panels A (Broadband Noise) and B (Narrowband Noise); abscissa: Sound-Source Elevation (degrees).]


Figure 4.2. Network analysis of spike patterns of the same unit (9806C02) as in Figure
4.1. A: Network performance in classifying spike patterns elicited by broadband noise at
30 dB above threshold. Each symbol represents the network output in response to input
of one bootstrapped pattern. The abscissa represents the actual stimulus elevation, and
the ordinate represents the network estimate of elevation. The solid line connects the
median directions of network estimates for each stimulus location. Perfect performance
is represented by the dashed diagonal line. B: Network classification of spike patterns
elicited by narrowband noise of center frequencies at 6 kHz (o) and 16 kHz (x). The
neural network was trained with spike patterns elicited by broadband noise at 5 roving
levels (20, 25, 30, 35, and 40 dB above threshold) and was tested with those elicited by
narrowband noise at 30 dB above threshold. Other conventions are the same as in A.










locations in elevation (e.g., -60° in this example). We calculated the median error of the

neural-network estimates as a global measure of network performance. The neural

network classification of the spike patterns of the unit shown in Figure 4.2A yielded a

median error of 27.8°, which was among the smallest in our sample of recordings with

broadband noise stimuli.

Across all the 389 units that we studied with broadband noise stimuli, the median

errors of the network performance averaged 41.7° and 50.4° for stimulus levels of 20 and

40 dB above threshold, respectively, ranging from 19.9° to 67.2°. The averaged median

errors were 3° to 4° larger than in the data set that we reported previously (Xu et al.

1998). This small difference probably was due to differences in unit recording and spike

sorting techniques. Nonetheless, the bulk of the distribution of median errors was

substantially better than chance performance of 65°. The distribution of the median

errors was unimodal. We selected the half of the distribution with the lowest median

errors at 40 dB above threshold (194 units; median errors < 50.4°) for analysis of

responses to filtered sounds. Among those 194 elevation-sensitive units, 73 units were

tested using narrowband noise of fixed Fc's at various elevations. Using stimuli fixed in

elevation at +80° or +100°, all 194 elevation-sensitive units were tested with narrowband

noise of varying Fc's, 127 were tested with notches of varying Fc's, and 74 were tested

using highpass noise stimuli.
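The median-error summary used above can be sketched as follows. Whether errors were wrapped circularly is our assumption; since elevation here is a polar angle around the interaural axis, circular differences are the natural choice:

```python
import numpy as np

def median_error(estimates_deg, target_deg):
    """Median absolute error of network estimates, with differences
    wrapped onto the interval (-180, 180] degrees."""
    d = (np.asarray(estimates_deg, dtype=float)
         - target_deg + 180.0) % 360.0 - 180.0
    return np.median(np.abs(d))
```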

Neural Network Classification of Responses to Narrowband Stimulation

Spike patterns elicited by narrowband noise presented from 14 midline

elevations showed less variation across locations than did spike patterns elicited by

broadband noise, as shown in Figure 4.1. When we trained the artificial neural network








with spike patterns elicited by broadband stimulation and used this trained network to

classify the spike patterns elicited by narrowband stimulation, we found that the network

outputs tended to cluster around certain locations in elevation, regardless of the actual

source locations. Figure 4.2B shows an example of the neural-network outputs for one

of the elevation-sensitive units (9806C02); the spike patterns of this unit are plotted in

Figure 4.1, B and C. The network estimates of elevation for 6-kHz narrowband noise

are plotted with crosses (x) and those for 16-kHz narrowband noise are plotted with

circles (o). The neural-network outputs for spike patterns elicited by the 6-kHz

narrowband noise tended to scatter in the upper-rear quadrant, whereas those for spike

patterns elicited by 16-kHz narrowband noise tended to point around 50° above the front

horizon. The network estimates of elevation for the neuronal responses to narrowband

stimulation were dependent on the center frequency but independent of the actual source

location.

In the following analysis, we tested the neural responses to narrowband

stimulation of different Fc's presented at a fixed location. In this test, we trained the

neural network with spike patterns elicited by broadband noise at 5 roving levels (20, 25,

30, 35, and 40 dB above threshold). After the neural network learned to recognize the

spike patterns of broadband stimulation according to sound-source elevation, the trained

network was used to classify the neural responses to narrowband noise stimulation of

varying Fc's.

An example of the spike patterns elicited by broadband noise and narrowband

noise from one of our elevation-sensitive units (9806C16) is shown in Figure 4.3 in a

similar format to that of Figure 4.1. Broadband noise stimuli were presented from 14











[Figure 4.3: raster plots for broadband noise (A), narrowband noise at +80° elevation (B), and notches at +80° elevation (C), with spike-rate profiles (D-F), for unit 9806C16, area A2. Axes: Post-Onset-Time (ms), Stimulus Elevation (degrees), Center Frequency (kHz).]


Figure 4.3. Unit responses elicited by broadband, narrowband, and notched noise (unit
9806C16). A: Raster plot of responses to broadband stimulation presented from 14
locations in the median plane. Conventions as in Figure 4.1A. B: Raster plots of responses
to narrowband noise of various center frequencies. The narrowband stimuli were
presented from +80° elevation. The narrowband center frequencies were from 4 to 18
kHz as indicated along the vertical axis, with BBN indicating spike patterns elicited by
broadband sounds presented at +80° elevation. Stimuli were 20 dB above threshold. C:
Raster plots of responses to 1/6-oct notched noise of center frequencies ranging from 4
to 18 kHz in 1-kHz steps. Other conventions are the same as in B. D: Spike-rate-
versus-elevation profiles for the responses to broadband stimulation. Conventions as in
Figure 4.1A. E and F: Spike-rate-versus-center-frequency profiles for the responses to
narrowband and notched noise, respectively. Stimulus levels were 20, 30, and 40 dB
above threshold. Symbols and line types match those in D that represent the equivalent
levels. BBN on the abscissa indicates spike rate elicited by broadband noise.

















[Figure 4.4: network estimates of elevation (degrees) plotted against Narrowband Center Frequency (kHz), with BBN at the left of the abscissa.]



Figure 4.4. Network estimates of elevation. The network analysis was based on the
responses to narrowband sounds that varied in center frequency; the neural responses of
the unit (9806C16) are shown in Figure 4.3. The neural network was trained with spike
patterns elicited by broadband noise presented from 14 elevations at 5 roving levels (20,
25, 30, 35, and 40 dB above threshold) and was tested with those elicited by narrowband
noise at 30 dB above threshold. Each column of symbols represents network outputs for
spike patterns elicited by narrowband noise of a given center frequency as indicated along
the abscissa. BBN indicates the network responses to spike patterns elicited by
broadband noise. All stimuli were presented from +80° elevation. The background of
gray-scale rectangles for the narrowband stimuli represents the acoustical model
predictions that are based on the spectral differences between the narrowband stimulus
spectra and the head-related transfer functions at each elevation. Values of the spectral
differences were scaled to span the full lightness between the extremes of black and
white. White and light gray indicate small spectral differences and the network estimates
that fall in those regions are plotted in black. Black and dark gray indicate large spectral
differences and the network estimates that fall in those regions are plotted in white.









elevations (Figure 4.3A). The narrowband stimuli of Fc's from 4 to 18 kHz in 1-kHz

steps were presented at +80° elevation (Figure 4.3B). Only 20 response patterns in

each stimulus condition are shown here. The spike rate tuning of the unit at 5 different

stimulus levels of broadband noise and 3 different stimulus levels of narrowband noise

are plotted in Figure 4.3, D and E. Both the elevation tuning to broadband noise and the

frequency tuning to narrowband noise were fairly broad.

Figure 4.4 shows the network estimates of elevation based on responses of the
same unit (9806C16) to narrowband sounds that varied in Fc. Each column of plus signs
represents the network output for one Fc. The background of gray-scale rectangles
represents the acoustical model that is described in the next section. In this case, the
network estimates of elevation for the narrowband noise data tended to shift
monotonically to lower elevations as Fc increased. The network outputs for the broadband
noise data are shown on the stripe of white background. The median direction of the
network estimation for the broadband noise data was +59.9°, which was about 20° off
the location (+80° elevation) from which the broadband noise was actually presented.

Figure 4.5 shows an example from a unit (9803A02) in a different cat.
Narrowband noise stimuli with 10 different Fc's (7 to 16 kHz in 1-kHz steps) were
presented at +80° elevation. In this case, the network estimates of elevation varied
somewhat erratically with the Fc of the stimuli. The median direction of the network
estimation for the broadband noise data was +93.7°, which was 13.7° off the target (+80°
elevation) from which the broadband noise was actually presented.

The Model of Spectral Shape Recognition

In a previous human psychophysical study, we presented a quantitative model
















[Figure 4.5 appears here. Ordinate: network estimate of elevation (degrees), approximately -40° to +200°. Abscissa: BBN and Narrowband Center Frequency (kHz), with ticks at 8, 10, 12, 14, and 16.]






Figure 4.5. Network analysis of spike patterns and model predictions in response to
narrowband stimulation. This example is taken from a unit (9803A02) in a different cat
from that shown in Figure 4.4. Narrowband center frequencies varied from 7 to 16 kHz
in 1-kHz steps. Other conventions are the same as in Figure 4.4.
















[Figure 4.6 appears here. Panels A (cat9803), B (cat9806), and C (cat9811). Abscissa: Frequency (kHz), 5 to 30. Ordinate labels in B: midline elevations from -60° to +200°.]

Figure 4.6. Head-related transfer functions (HRTFs) in the median plane measured from
the left ears of 3 cats. The measurement and processing of HRTFs are described in detail in
METHODS. Starting from the bottom, each line represents an HRTF for one of the 14
midline elevations from -60° to +200°, as indicated on the left in B. A: cat9803. B:
cat9806. C: cat9811.



that used a comparison of stimulus spectra with head-related transfer functions (HRTFs)
to predict listeners' judgements of the locations of narrowband sounds (Middlebrooks
1992). In the present study, we adapted that model to the cat as a means of simulating
cats' location judgements. The model was adapted by substituting feline HRTFs for
human HRTFs and by extending the frequency range of the analysis to higher frequencies
to accommodate the cats' higher audible range.

Figure 4.6 shows examples of HRTFs for all 14 midline elevations measured
in the left ears of 3 cats (A, cat9803; B, cat9806; C, cat9811). There were considerable
individual differences among cats. In general, however, spectral features, such as peaks
and notches, tended to increase in center frequency as sound sources increased in
elevation in the front (-60° to +80°) and, to a lesser degree, in the rear (+200° to +100°).
The most systematic variation occurred in the mid-frequency region (5-18 kHz), which
has been emphasized in previous studies of cat HRTFs (Musicant et al. 1990; Rice et
al. 1992). In most cats, HRTFs at overhead locations (+80° to +100° elevation) were
relatively flat, although exceptions did occur (e.g., Figure 4.6A). Differences in the
midline HRTFs measured from the left and right ears of a given cat tended to be smaller
than the differences among cats. The median spectral difference between the left and right
ears across all 8 cats was 10.4 dB², whereas the median spectral difference between the left
ears of all 28 pairs of cats was 14.5 dB². In the spectral recognition model that predicted
the narrowband noise localization behavior of the individual cats, we used the HRTFs
measured from each cat's own left ear, i.e., contralateral to the physiological recording
site.
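The ear-to-ear and cat-to-cat comparisons above can be sketched in code, assuming the variance-based spectral-difference metric defined in the next section; the function names and the array layout (HRTF log magnitudes in dB, one array per ear) are our own illustrative choices, not the original analysis code.

```python
import numpy as np
from itertools import combinations

def spectral_diff(h1, h2):
    """Spectral difference (dB^2) between two HRTFs at each elevation:
    the variance, across frequency, of their dB difference."""
    return np.var(h1 - h2, axis=-1)

def median_left_right(left, right):
    """Median spectral difference between left- and right-ear HRTFs.
    left, right: arrays of shape (n_cats, n_elevations, n_freqs), in dB."""
    return np.median(spectral_diff(left, right))

def median_between_cats(left):
    """Median spectral difference between the left-ear HRTFs of all
    pairs of cats (28 pairs for 8 cats)."""
    diffs = [spectral_diff(left[i], left[j])
             for i, j in combinations(range(len(left)), 2)]
    return np.median(np.concatenate(diffs))
```

With 8 cats and 14 elevations per ear, `median_left_right` pools 8 x 14 within-cat comparisons and `median_between_cats` pools 28 x 14 between-cat comparisons before taking the median.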









































[Figure 4.7 appears here. Left panel abscissa: Frequency (kHz), 5 to 30. Right panel abscissa: Elevation (degrees), -60 to +180.]






Figure 4.7. Spectral differences between the narrowband stimulus spectra and HRTFs.
Left panel: Spectra of narrowband noise of center frequencies from 4 to 18 kHz in 1-kHz
steps. Symbols represent the center frequencies. Right panel: Spectral differences. Each
line represents the spectral differences between the spectrum of the narrowband noise of
a given center frequency as indicated on the left of the line and the HRTFs measured
from 14 elevations as indicated by the abscissa. HRTFs were taken from cat9806 (Figure
4.6, B).








We defined a metric to quantify the similarity between the narrowband noise
stimuli and the HRTFs. First, the stimulus spectrum was added to the HRTF of the
elevation at which the stimulus was presented. Next, we subtracted, frequency by
frequency, the log-magnitude spectrum of each HRTF from that of each narrowband
stimulus. Then, we computed the variance of each difference distribution across all
frequencies. We referred to the variance of the difference distribution as the spectral
difference. The smaller the spectral difference, the more similar are the stimulus
spectrum and the HRTF. Figure 4.7 illustrates how this computation was accomplished

for the data from one of the cats (cat9806). The amplitude spectra of the 1/6-oct

narrowband noise stimuli with Fc's from 4 to 18 kHz in 1-kHz steps are shown in the left

panel of Figure 4.7. The right panel of Figure 4.7 plots the spectral differences. The

abscissa in the right panel of Figure 4.7 represents the source elevations at which the 14

HRTFs were measured; those HRTFs are shown in Figure 4.6B. Each line in the right

panel of Figure 4.7 represents the spectral difference between one narrowband noise

stimulus (Figure 4.7, left panel) and the 14 HRTFs (Figure 4.6B). The symbols used for

the lines match the symbols used to represent the Fc's of the narrowband noise spectra

shown in the left panel of Figure 4.7.
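The spectral-difference computation described above can be sketched as follows; the function name and array shapes are our own, and all spectra are assumed to be log-magnitude values in dB.

```python
import numpy as np

def spectral_difference(stim_spectrum, presented_hrtf, candidate_hrtfs):
    """Spectral differences (dB^2) between a narrowband stimulus and
    the HRTFs of the candidate elevations.

    stim_spectrum:   (n_freqs,) log-magnitude stimulus spectrum, dB
    presented_hrtf:  (n_freqs,) HRTF at the presentation elevation, dB
    candidate_hrtfs: (n_elevations, n_freqs) HRTFs at each elevation, dB
    """
    # The proximal stimulus: source spectrum filtered by the HRTF of the
    # presentation elevation (addition in the log-magnitude domain).
    proximal = stim_spectrum + presented_hrtf
    # Subtract each candidate HRTF from the proximal spectrum,
    # frequency by frequency ...
    diff = proximal - candidate_hrtfs
    # ... and take the variance of each difference distribution
    # across all frequencies.
    return np.var(diff, axis=1)
```

The elevation with the smallest returned value is the one whose HRTF shape best matches the stimulus, i.e., the elevation toward which the model predicts the location judgement will be biased.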

Our model predicts that an individual animal's judgement of a narrowband sound

source would be biased towards elevations at which the spectral differences are small. If

the responses of cortical neurons are influenced by the narrowband noise stimulus in the

same way as is the behavior of the animal, the spike patterns elicited by narrowband noise

of a particular Fc should resemble the spike patterns elicited by broadband noise at source

elevations at which the spectral differences are small. In terms of the artificial-neural-