Study of vocal fold vibration and the glottal sound source using synchronized speech, electroglottography and ultra-high...


Material Information

Title:
Study of vocal fold vibration and the glottal sound source using synchronized speech, electroglottography and ultra-high speed laryngeal films
Physical Description:
viii, 211 leaves : ill. ; 28 cm.
Language:
English
Creator:
Krishnamurthy, Ashok Kumar, 1957-
Publication Date:

Subjects

Subjects / Keywords:
Glottis   ( lcsh )
Vocal cords   ( lcsh )
Speech   ( lcsh )
Larynx   ( lcsh )
Genre:
bibliography   ( marcgt )
theses   ( marcgt )
non-fiction   ( marcgt )

Notes

Thesis:
Thesis (Ph. D.)--University of Florida, 1983.
Bibliography:
Includes bibliographical references (leaves 204-210).
Statement of Responsibility:
by Ashok Kumar Krishnamurthy.
General Note:
Typescript.
General Note:
Vita.

Record Information

Source Institution:
University of Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
oclc - 11556760
ocm11556760
System ID:
AA00011148:00001

Full Text










STUDY OF VOCAL FOLD VIBRATION
AND THE GLOTTAL SOUND SOURCE
USING SYNCHRONIZED SPEECH,
ELECTROGLOTTOGRAPHY AND
ULTRA-HIGH SPEED LARYNGEAL FILMS












By


ASHOK KUMAR KRISHNAMURTHY


A DISSERTATION PRESENTED TO THE GRADUATE
SCHOOL OF THE UNIVERSITY OF FLORIDA
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF DOCTOR OF PHILOSOPHY




UNIVERSITY OF FLORIDA


1983































To Amma and Appa














ACKNOWLEDGEMENTS


I wish to express deep appreciation to my advisor and

committee chairman, Dr. Donald G. Childers, for his

invaluable guidance, encouragement and assistance in all

phases of the research. My interaction with him has been a

most gratifying learning experience.

I am indebted to Dr. G. Paul Moore for the many

insightful discussions about vocal fold vibration, his

expert assistance with the high-speed photography and for

his participation in my supervisory committee. I also wish

to thank the remaining members of my committee, Drs. L. W.

Couch, R. L. Sullivan and E. R. Chenette for their time and

interest.

Thanks are due to my colleagues at the Center for Mind-

Machine Interaction Research at the University of Florida.

In particular, Jay Naik and Jerry Larar spent many long days

(and nights) getting all the data into the computer in a

usable form--thanks, you guys! I also thank the numerous

persons who assisted with the film data measurements.

I wish to acknowledge the financial support provided by

grants ECS-8116341 from the National Science Foundation and

NIH17078 from the National Institutes of Health and by the

University of Florida Center of Excellence Program in

Information Transfer and Processing.










I also wish to thank my typist Debbie LaMar for a

cheerful and competent typing job accomplished at short

notice.

During the course of this study, I was fortunate to

have the support and encouragement of some very good

friends: Ravi Sundaresan, Tiru Rao, GVS Bhaskar, Gautam

Das, Chaitanya Baru and Liley Prasad. I thank them all.

Finally, I am grateful to my parents, brother and

sister for their love, support and encouragement which made

this accomplishment possible. They always believed I could

do it, and to them I dedicate this work.















TABLE OF CONTENTS


Page
ACKNOWLEDGEMENTS .......... iii

ABSTRACT .......... vii

CHAPTER

1 INTRODUCTION .......... 1

2 EXPERIMENTAL DATA BASE:
COLLECTION AND MEASUREMENT .......... 8

Subjects and Tasks .......... 8
Data Collection and Equipment .......... 9
Data Measurements and Preprocessing .......... 13
Data Synchronization .......... 17
Potential Errors in the Data Sets .......... 18

3 A STUDY OF THE SYNCHRONIZED ULTRA-HIGH
SPEED FILMS AND THE ELECTROGLOTTOGRAPH
SIGNAL .......... 20

Introduction .......... 20
Structure of the Vocal Folds .......... 21
Vibration of the Vocal Folds .......... 23
Interpretations of the EGG .......... 26
Glottal Area and the EGG .......... 31
EGG and the Length of Glottal Contact .......... 62
EGG and Observations of the High Speed
Films .......... 70
A Qualitative Model for the EGG .......... 76
Conclusions .......... 79

4 SYNCHRONIZED GLOTTAL VOLUME VELOCITY,
GLOTTAL AREA AND THE EGG .......... 80

Introduction .......... 80
The Linear Model for Voiced Speech .......... 82
Some Inverse Filtering Techniques .......... 89
The Closed Phase Covariance Method
of Inverse Filtering .......... 92
Results from the Synchronized Data
Base .......... 107
Note .......... 138

5 TWO CHANNEL SPEECH ANALYSIS USING THE EGG
AND THE ACOUSTIC SPEECH SIGNAL .......... 139

Introduction .......... 139
Voiced/Unvoiced Classification and
F0 Estimation .......... 142
Estimation of the Vocal Tract Filter
and the Glottal Volume Velocity .......... 157
Note .......... 187

6 CONCLUSIONS .......... 188

Summary .......... 188
Directions for Future Research .......... 190

APPENDICES

A CALIBRATION OF THE ELECTROGLOTTOGRAPH
CIRCUITS .......... 193

B TAPE RECORDER DISTORTION CORRECTION .......... 195

C FIR BAND PASS FILTER RESPONSE .......... 198

D PITCH-SYNCHRONOUS CIRCULAR
CORRELATION (PSA) METHOD .......... 200

E FILE NAMING CONVENTION USED .......... 202

REFERENCES .......... 204

BIOGRAPHICAL SKETCH .......... 211














Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy



STUDY OF VOCAL FOLD VIBRATION
AND THE GLOTTAL SOUND SOURCE
USING SYNCHRONIZED SPEECH,
ELECTROGLOTTOGRAPHY AND
ULTRA-HIGH SPEED LARYNGEAL FILMS




By

ASHOK KUMAR KRISHNAMURTHY


DECEMBER, 1983

Chairman: D. G. Childers
Major Department: Electrical Engineering


The purpose of the research was to establish an

understanding of the Electroglottograph (EGG) signal and its

relationship to vocal fold vibration. The method adopted is

to compare the EGG with data from simultaneously and

synchronously recorded ultra-high speed laryngeal films and

the acoustic speech signal. A second concern of this study

was to evaluate the feasibility of using the EGG in

improving speech analysis techniques.

The data measured from the ultra-high speed laryngeal

films include the glottal area and the length of the glottal

opening. The acoustic speech wave is inverse filtered to









obtain the glottal volume velocity. A comparison of the EGG

with these synchronized data shows that the EGG is a

function of the lateral area of contact between the vocal

folds. A description of the EGG during the various glottal

phases and a qualitative model for the EGG are presented.

The experimental results show that the EGG is an excellent

indicator of the period of the glottal vibration. An

algorithm for automatically locating the glottal opening and

closing instants from the EGG is described and

experimentally evaluated. The spectra of the EGG and the

glottal volume velocity are computed and compared. This

comparison does not show a consistent relationship between

the two spectra.

A method for voiced/unvoiced classification and

fundamental frequency contour estimation using the EGG is

described. A comparison of the results obtained using this

method with the results from a speech signal based method

indicates that the EGG based method is simpler, more

reliable and requires less computation.

The EGG is also used to implement three pitch-

synchronous linear prediction analysis methods. Results are

presented to show that the pitch-synchronous closed phase

analysis method provides an accurate estimation of the vocal

tract formant frequencies and bandwidths. This method is

also used to implement an automatic, pitch-synchronous

technique to obtain the glottal volume velocity in

continuous speech.
















CHAPTER 1

INTRODUCTION

Speech is the most natural medium of communication for

human beings. The physiological speech mechanism consists

of the lungs, vocal folds, and oral and nasal tracts. The

improper functioning of any part of this system can result

in an impairment of the ability to speak. The detection,

diagnosis and treatment of speaking disorders are important

problems and require an understanding of the speech

production process. There is a need to understand human

speech production from another point of view also; namely to

find efficient and reliable methods for the storage and

transmission of speech. Closely related to this is the

problem of voice communication with machines.

A schematic representation of the human vocal system is

shown in Figure 1.1. The lungs act as a reservoir of air.

An increase in the lung pressure causes the flow of air into

the trachea. For the voiced sounds in speech, this flow of

air interacts with the vocal folds causing them to

vibrate. The aerodynamic-myoelastic theory (1) attempts to

provide a mathematical description of vocal fold

vibration. The vibration of the vocal folds "chops" the air

flow into discrete pulses that act as an acoustic sound

source. This glottal sound source is filtered by the vocal

















































Figure 1.1 Schematic representation of the human
speech production system (from (2)).









and nasal tracts and is radiated into the atmosphere as a

pressure wave at the mouth and nostrils. The mathematical

description of the filtering process imposed by the

supraglottal tract is the acoustic theory of speech

production (3).

Vocal fold vibration and its acoustic correlate, the

glottal sound source, are as yet poorly understood areas of

speech research (4). There are several problems where such

knowledge is of importance, e.g., the detection and

treatment of laryngeal disorders (5), training aids for the

hearing impaired (6), the synthesis of natural sounding

speech (7), and improved modeling of the speech signal.

The relative inaccessibility of the larynx makes direct

observation of vocal fold vibration in vivo impossible. One

must, therefore, resort to various indirect observation

techniques, e.g., indirect laryngoscopy, ultra-high speed

photography, photoglottography, ultrasound, X-rays,

electroglottography and inverse filtering of the acoustic

speech signal. A review of these methods can be found in

references 5 and 8. Most of these indirect methods have one

or more of the following drawbacks: The procedure is

difficult to apply and so can be used with only a limited

cross section of the population; the apparatus is difficult

to obtain and maintain; or the method can be used with only

a limited range of phonations.








Electroglottography, in contrast, appears to be an

inexpensive procedure that can be used with a majority of

the population and over a wide range of phonations.

Electroglottography is basically an electrical impedance

measuring technique (9) and has its origins in the work of

Fabre (10). The principle behind the device is simple

(11): A pair of electrodes is applied to the neck at the

level of the larynx. A high frequency (about 5 MHz) current

passes from one electrode through the neck and is picked up

by the other electrode. As the subject phonates, the

opening and closing of the vocal folds change the electrical

impedance of the neck in the region of the electrodes. This

modulates the radio frequency (RF) current, which is

demodulated using a detector to yield the electroglottograph

(EGG) signal. The EGG is presumably a measure of the

changing electrical impedance at the neck, and hence of the

vocal fold vibration. Figure 1.2 is a block diagram

illustrating this principle.

While electroglottography was proposed in 1957,

successful implementation of the method was accomplished

only recently (11,12). The primary difficulty appears to

have been the instrumentation of the method; a good

description of requirements of the measurement procedure and

the design of a device to meet the requirements is given in

reference 11.

In spite of its apparent usefulness as a glottal

sensor, electroglottography is not well understood, and





















[Figure 1.2 block labels: Neck; EGG]


Figure 1.2 Schematic of the Electroglottographic Technique








hypotheses about the EGG signal have not been sufficiently

validated. The primary concern of this study is to

establish an understanding of the EGG and its relation to

vocal fold activity in normal adult speakers. The

methodology of the research is a comparison of the data

obtained from ultra-high speed laryngeal films, the EGG and

the acoustic speech wave. A second concern of this study is

to evaluate the feasibility of using the EGG as a second

channel of information for improving existing speech

analysis techniques.

The organization of this dissertation is as follows:

The experimental data base used in this study is described

in Chapter 2. We discuss there the data collection

procedure, the equipment used, the measurements made on the

high speed films and the synchronization and preprocessing

of the signals.

In Chapter 3, we compare the EGG with the data from the

high speed films. Based on this comparison, the various

phases in the EGG are identified, and a qualitative model

for the EGG is presented.

The linear source-filter model (13) for voiced speech

is discussed in Chapter 4, and the necessity for closed

phase analysis of the speech signal is established. The

closed phase covariance method of glottal inverse filtering

is then introduced as one possible method for closed phase

analysis. This method is used to obtain the glottal volume

velocity from the speech data in the experimental data








base. Finally, temporal and spectral comparisons of the

glottal volume velocity, the EGG and the glottal area are

presented.

Chapter 5 deals with the applications of the EGG in

speech analysis. A method for voiced/unvoiced

classification and pitch period estimation using the EGG is

described. Results obtained using this method are compared

with the results from a method using only the speech

signal. The autocorrelation and covariance methods of

linear prediction analysis of the speech signal are then

discussed. The use of the EGG to segment the speech

waveform into individual pitch periods, and further into the

closed and open phases in each pitch period is described.

We introduce and compare three pitch-synchronous linear

prediction analysis methods. Results from the

autocorrelation and the three pitch-synchronous linear

prediction methods are discussed.

We summarize the important results and conclusions of

this study in Chapter 6. A number of problems for future

research are also identified.














CHAPTER 2

EXPERIMENTAL DATA BASE: COLLECTION AND MEASUREMENT


The primary goal of this investigation was to establish

an understanding of the EGG by relating its features to

vocal fold vibration. To achieve this goal, we decided to

compare the EGG with the vocal fold vibrations and the

speech waveform obtained simultaneously and synchronously on

ultra-high speed films and magnetic tape, respectively. In

this chapter, we describe the data collection and

measurement procedures used to obtain the experimental data

used in this study.



Subjects and Tasks

Four normal adult males (JMN, DMK, GPM and AKK), who

possessed no evidence of voice disorders or laryngeal

pathology, were the subjects used in the study. The

experimental tasks for each of the subjects consisted of

phonation of the vowel /i/ at three different intensities at

each of three different fundamental frequencies. The vowel

/i/ was chosen so that the epiglottis was held out of the

optical pathway of the vocal fold image during filming;

however, because the tongue was held down and a laryngeal

mirror was used, the procedure resulted in a sound closer to

an /a/ in most cases. The recorded phonation was sustained

for about three seconds.








The three fundamental frequencies used were 125 Hz, 170

Hz and 340 Hz; to control the fundamental frequency during

the experiments, the subjects were asked to match a pure

tone of the appropriate frequency that they heard over a

pair of headphones. The three different intensities at each

fundamental frequency represent a "comfortable" intensity,

an intensity approximately 4 dB above it and another

intensity about 4 dB below it. The actual intensities

produced were monitored using a sound level meter.

Thus, there were nine tasks for each subject, for a

total of thirty-six tasks.
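The task grid can be enumerated directly, which makes the count of thirty-six explicit. The sketch below is illustrative only: the level names ("soft", "comfortable", "loud") and the dictionary layout are assumptions, while the subject initials, the target frequencies and the 4 dB offsets are taken from the text.

```python
from itertools import product

SUBJECTS = ["JMN", "DMK", "GPM", "AKK"]
F0_TARGETS_HZ = [125, 170, 340]
# Offsets in dB relative to each subject's "comfortable" intensity.
# The level names are hypothetical labels, not the author's terminology.
INTENSITY_OFFSETS_DB = {"soft": -4, "comfortable": 0, "loud": 4}


def enumerate_tasks():
    """Return one record per (subject, fundamental frequency, intensity) task."""
    return [
        {"subject": subject, "f0_hz": f0, "level": level, "offset_db": offset}
        for subject, f0, (level, offset) in product(
            SUBJECTS, F0_TARGETS_HZ, INTENSITY_OFFSETS_DB.items()
        )
    ]


tasks = enumerate_tasks()
print(len(tasks))  # 4 subjects x 3 frequencies x 3 intensities
```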



Data Collection and Equipment

High Speed Photography

The technique of ultra-high speed photography of the

vibrating vocal folds is described in (8,14). Briefly, a

laryngeal mirror is held in the subject's mouth at the back

of the pharynx. A high intensity light source is focused

onto the mirror, which reflects the light beam 90° downwards

onto the vocal folds. The image of the vocal folds,

reflected by the same mirror, is focused by a system of

lenses to a high speed camera. As the subject phonates, the

details of the vibration are captured on the film. The film

can then be played back later at a slower speed to view the

detailed vibratory behavior.

The photographic equipment and configuration used for

this study have been described elsewhere (8). The high









speed camera used was a Fastax model WF-14, which is capable

of exposure rates of 8000 frames/second. The camera

controls were adjusted to obtain a film speed of 5000

frames/second over the last portion of the film, when the

exposure rate is nearly constant.

The camera has two lens systems through which images

can be photographed. The first lens system was used for the

photography of the vocal folds. A grid, adjusted to be in

the focal plane of the vocal folds, was also photographed

via this lens system. This allows absolute measures of

vocal fold vibratory patterns to be made, while previously

only relative measures were possible.

The second lens system, specifically designed to

photograph an oscilloscope face, protrudes from the side of

the camera. Two timing signals (to be described later) were

photographed via this lens. The traces of these two timing

signals were adjusted to lie along one edge of the film.

The EGG waveform was also displayed on a third trace of the

oscilloscope. This trace was positioned on the other edge

of the film away from the two timing traces. Because the

two lens systems are at a 90° angle, the three oscilloscope

traces appear on a film frame that is displaced five frames

behind the film frame recording the corresponding vocal fold

image.

The high speed films used were one of two different

types--the black and white Kodak 7277 4-X reversal film or

the color Kodak Ektachrome 7250 high speed video news film.








EGG and Speech Signals

The speech signal was obtained using a hearing aid

microphone coupled directly to one channel of a stereo tape

recorder. The tape recorder used was either a Revox A77 or

a Teac A-2060. The microphone was attached to the handle of

the laryngeal mirror at the point where the mirror frame

joins with the handle. The distance of the glottis from

this point varies from subject to subject, but was

approximately 11 cm in most cases. The microphone was used

at this particular location to shield the audio signal from

the camera motor noise. The audio bandwidth of the

microphone has been measured to be about 6 KHz with a slight

peak at 4 KHz.

The EGG signal was obtained using an electroglottograph

designed by D. Teaney and manufactured by Synchrovoice

Associates. The EGG was recorded on one channel of a Sony

model TC530 stereo tape recorder. The rise and fall times

of the EGG circuits were tested using a square wave

calibration circuit; this is described in Appendix A.

The second channel of both tape recorders was used to

record a 10 KHz square wave timing signal. Both tape

recorders were run at 7-1/2 ips to obtain a flat frequency

response from 50 Hz to 5 KHz.


The Timing Signals

A special time code generator has been designed to

allow temporal synchronization of the EGG, speech and film








data (15). The time code generator provides three timing

signals: a 10 KHz square wave that was recorded on the

second channel of both tape recorders, a 5 KHz square wave

derived from the 10 KHz signal, and an 8 bit counter

signal. The latter two timing signals were photographed on

the laryngeal film via the oscilloscope face as described

earlier. The 8 bit counter signal counts the number of

100-cycle blocks of the 5 KHz signal that occur following the

initiation of the timing signals; in other words, the 8 bit

counter is at 0 V except after every 100 cycles of the 5 KHz

signal. At these instants, the counter value is incremented

by 1, and the new count asserted on the counter signal

line. Thus, given any frame in the film, if N is the number

of cycles of the 5 KHz square wave between this frame and

the last counter output, and the decimal value of the last

counter output is k, then 100k+N cycles of the 5 KHz square

wave have elapsed between the initiation of the timing and

this frame.

The 10 KHz signal, recorded on the second channel of

the tape recorders, was used as the external clock signal

for the Analog to Digital (A-D) converter while digitizing

the speech and EGG signals. Since all the timing signals

were initiated simultaneously, and the 5 KHz clock is

obtained from the 10 KHz signal, the number of samples

corresponding to 100k+N cycles of the 5 KHz square wave is

200k + 2N. Thus, the (200k + 2N)th sample of the EGG is

temporally aligned with the given film frame. For the









speech signal, the corresponding sample number for

synchronization is (200k + 2N + d) where d is an additional

factor to account for the propagation delay from the glottis

to the microphone.
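The index arithmetic above can be written out as a short sketch. The function and constant names are hypothetical; only the relations 100k+N, 200k+2N and 200k+2N+d come from the text. The value d = 4 in the example is the four-sample shift quoted in the data synchronization section of this chapter.

```python
# Timing relations from the text: the film carries a 5 KHz trace and a counter
# that advances once every 100 cycles; the A-D converter is clocked at 10 KHz,
# so each 5 KHz cycle corresponds to two samples.
CYCLES_PER_COUNT = 100
SAMPLES_PER_CYCLE = 2


def elapsed_cycles(k, n):
    """Cycles of the 5 KHz wave since timing start: 100k + N."""
    return CYCLES_PER_COUNT * k + n


def egg_sample_index(k, n):
    """EGG sample aligned with the film frame: 200k + 2N."""
    return SAMPLES_PER_CYCLE * elapsed_cycles(k, n)


def speech_sample_index(k, n, d):
    """Speech sample aligned with the film frame: 200k + 2N + d."""
    return egg_sample_index(k, n) + d


# Example: last counter value k = 3, then N = 25 further cycles.
print(elapsed_cycles(3, 25))          # 325 cycles
print(egg_sample_index(3, 25))        # sample 650
print(speech_sample_index(3, 25, 4))  # sample 654
```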



Data Measurements and Preprocessing


Film Data Measurements

A number of parameters were measured from the high

speed films to characterize the vibration. The operator

projected the film onto a screen a frame at a time using an

Athena 224-ES stop frame projector. A segment of the film

where the number of 5 KHz square wave cycles between

successive counter outputs was close to 100 was isolated at

this time. This represents a segment where the film speed

was close to 5000 frames per second. One hundred and fifty

frames from this section were chosen for analysis.

We have described elsewhere a semi-automated,

computerized system for the analysis of high speed laryngeal

films (16). The hardware for this system consists of a

Vidicon TV camera attached to a Spatial Data Systems EyeCom

108PT image processing terminal. This terminal, interfaced

to a Data General NOVA 4 minicomputer, has the capability of

displaying video images along with superposed graphics. The

display screen is divided into 640 x 480 coordinate

locations. A cursor, controlled by the operator with a

joystick, can be moved to any desired location on the

screen, and the cursor coordinates transferred to the








computer. The terminal is also capable of digitizing images

with a 640 x 480 pixels spatial resolution and an intensity

resolution of 256 gray levels.

The operation of the film measurement system can be

briefly described as follows:

The glottal images are projected using the Athena stop

frame projector onto a 45° mirror, which reflects the image

upwards onto a translucent screen. The image formed on the

screen is scanned by the TV camera and displayed on the

EyeCom display. The operator, using the joystick cursor,

measures the length of the glottis and the width at five

chosen locations. The glottal boundary can then be

approximated using a number of straight lines. The computer

program calculates the glottal area using this straight line

approximation. This procedure is illustrated in Figure 2.1.

The EGG trace, photographed on the film, can also be

digitized using the system. The EGG signal obtained by this

method will henceforth be referred to as the traced EGG in

this study.
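The straight-line area approximation described above can be sketched as a polygon area calculation. This is a minimal illustration, not the measurement program used in the study: it assumes that each width is centred on the line joining the two ends of the glottis, that the glottis tapers to a point at both ends, and that the positions of the five width measurements along the length are known. The function name and argument layout are hypothetical.

```python
def glottal_area(length, positions, widths):
    """Area of the straight-line glottal outline (shoelace formula).

    length    -- measured glottal length
    positions -- distance of each width measurement from one end, increasing
    widths    -- width measured at each position (same units as length)
    """
    if len(positions) != len(widths):
        raise ValueError("positions and widths must have the same length")
    # Upper boundary from one end to the other, then lower boundary back.
    upper = [(0.0, 0.0)]
    upper += [(x, w / 2.0) for x, w in zip(positions, widths)]
    upper += [(float(length), 0.0)]
    lower = [(x, -w / 2.0) for x, w in zip(reversed(positions), reversed(widths))]
    polygon = upper + lower
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# Example in arbitrary units: five widths along a glottis of length 10.
print(glottal_area(10, [2, 4, 5, 6, 8], [1, 2, 2, 2, 1]))
```

With a grid photographed in the focal plane of the vocal folds, as described earlier, screen coordinates could be scaled to absolute units before the area is computed.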

The difficulty of outlining the glottal image

consistently introduces noise in the measured values of the

glottal area, length and widths. The traced EGG is also

noisy because of the limited spatial resolution of the

EyeCom terminal. Consequently, these measures have to be

suitably smoothed. Since the smoothing technique needs to

preserve the abrupt changes in the signals between glottal

phases while eliminating sharp, point like jumps, purely













Figure 2.1 Measurement of a laryngeal film frame. The glottal
length is marked, and W1-W5 are the locations where the widths
are measured (A: Anterior, P: Posterior, VP: Vocal Process). The
glottal contour is outlined using straight lines to measure the
glottal area.









linear smoothing methods are unsuitable. Instead, a

combination of nonlinear median smoothing and linear

smoothing as described in (17) was used.
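A combination of this kind can be sketched as a running median followed by a short weighted moving average. The text states only that median and linear smoothing were combined as in (17); the five-point median window and the three-point weights (1/4, 1/2, 1/4) used here are assumptions chosen for illustration.

```python
from statistics import median


def median_smooth(x, width=5):
    """Running median; the window is truncated at the ends of the record."""
    half = width // 2
    return [median(x[max(0, i - half):i + half + 1]) for i in range(len(x))]


def linear_smooth(x):
    """Three-point weighted average (1/4, 1/2, 1/4); end points are copied."""
    out = list(x)
    for i in range(1, len(x) - 1):
        out[i] = 0.25 * x[i - 1] + 0.5 * x[i] + 0.25 * x[i + 1]
    return out


def smooth(x, width=5):
    """Median smoothing followed by linear smoothing."""
    return linear_smooth(median_smooth(x, width))


# A single-sample spike at index 3 and a genuine step at index 7.
signal = [0, 0, 0, 9, 0, 0, 0, 5, 5, 5, 5, 5]
print(median_smooth(signal))  # spike removed, step kept
print(smooth(signal))         # step edge slightly rounded by the linear stage
```

The median stage removes isolated point-like jumps without blurring an abrupt transition, which a purely linear filter of comparable length could not do.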

Digitization and Preprocessing of the EGG and Speech

As explained earlier, the 10 KHz timing signal

recorded on the second channel of the tape recorders was

used as the clock source for the A-D converter in digitizing

the speech and EGG signals. Due to the limited bandwidth of

the tape recorders, the timing signal was passed through a

waveshaping circuit to obtain "clean" square waves. Small

variations in the tape speed and jitter in the waveshaping

circuit are, however, sufficient to introduce errors in the

synchronization amongst the various signals.

After digitization, the EGG and speech signals are

subject to two stages of preprocessing: Correction for tape

recorder distortion and highpass filtering to remove noise

and power line components.

Tape Distortion. The capacitor coupling used in normal

audio tape recorders introduces phase and magnitude

distortion, primarily in the low frequency region below 200

Hz. This distortion can significantly affect the results

obtained from the inverse filtering of the speech waveforms

(18). In the EGG, the distortion is manifested as a

downward slope in the EGG during the glottal open phase.

Berouti (19) has described a method for correcting such

distortion. The method, described in Appendix B, involves

the derivation of the tape recorder transfer function using









a reference signal. The traced EGG was used as the

reference in correcting the recorded EGG for the

distortion. This enabled the correction parameters to be

obtained for each task. A similar reference signal is,

however, not available for the speech signal. Consequently,

a fixed correction had to be derived and applied to all the

speech waveforms. This is also discussed in Appendix B.

Highpass Filtering. The speech and EGG waveforms were

band pass filtered using a 351 point, linear phase FIR

filter (20). The transfer function of the filter is shown

in Appendix C.
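A linear phase FIR filter has symmetric coefficients and therefore a constant delay of (351 - 1)/2 = 175 samples, which has to be applied equally to the speech and EGG signals if their alignment is to be kept. The sketch below illustrates this with a Hamming-windowed sinc design, which is a stand-in and not necessarily the design method of (20). The band edges of 80 Hz and 4500 Hz are placeholders, since the actual response is given only in Appendix C; the function names are hypothetical.

```python
import math


def bandpass_fir(num_taps, f_lo, f_hi, fs):
    """Hamming-windowed ideal band pass response; num_taps must be odd."""
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        m = n - mid
        if m == 0:
            ideal = 2.0 * (f_hi - f_lo) / fs
        else:
            ideal = (math.sin(2 * math.pi * f_hi * m / fs)
                     - math.sin(2 * math.pi * f_lo * m / fs)) / (math.pi * m)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def gain_at(taps, f, fs):
    """Magnitude of the frequency response at frequency f."""
    re = sum(h * math.cos(2 * math.pi * f * n / fs) for n, h in enumerate(taps))
    im = sum(h * math.sin(2 * math.pi * f * n / fs) for n, h in enumerate(taps))
    return math.hypot(re, im)


FS = 10000.0                                # sampling rate set by the 10 KHz clock
taps = bandpass_fir(351, 80.0, 4500.0, FS)  # band edges are placeholders
delay_samples = (len(taps) - 1) // 2        # 175 samples, i.e. 17.5 ms at 10 KHz
print(delay_samples, gain_at(taps, 1000.0, FS), gain_at(taps, 0.0, FS))
```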



Data Synchronization

We have already explained the procedure for

theoretically synchronizing the different glottal waveforms

using the timing signal. In practice, we found that small

synchronization errors existed after following the alignment

procedure. These errors were primarily due to the sampling

errors during digitization, as explained above. However,

the traced EGG is obtained from the films, and is

consequently in perfect alignment with the film data. The

approach adopted to solve the synchronization problem was

therefore the following:



1) the EGG obtained from the tape was shifted

sufficiently to align it with the traced EGG. This

typically involved shifts of less than 10 samples; and









2) the assumption was made that the speech and EGG

obtained from the tapes are in synchronization; therefore,

the speech signal is also shifted by the same amount as the

shift required to align the traced and recorded EGG

signals. The speech was further shifted by four samples to

compensate for the acoustic propagation delay from the

glottis to the microphone.
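The two steps above can be sketched as a small lag search followed by a common shift. The text does not say how the alignment shift was chosen, so the cross-correlation search below is an assumption, as is the premise that the traced and recorded EGG are available on the same sample grid. The function names are hypothetical; the limit of 10 samples and the extra 4 samples for the speech come from the text.

```python
def best_lag(reference, signal, max_lag=10):
    """Lag in samples, within +/- max_lag, at which signal best matches reference.

    Uses an unnormalized cross-correlation; removing the mean of both
    signals beforehand is advisable for real data.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(signal):
                score += r * signal[j]
        if score > best_score:
            best, best_score = lag, score
    return best


def advance(signal, lag):
    """Return the signal advanced by lag samples, zero padded at the ends."""
    n = len(signal)
    return [signal[i + lag] if 0 <= i + lag < n else 0.0 for i in range(n)]


PROPAGATION_DELAY_SAMPLES = 4  # glottis-to-microphone delay used in the text


def align(traced_egg, recorded_egg, speech):
    """Step 1: align the recorded EGG; step 2: shift the speech accordingly."""
    lag = best_lag(traced_egg, recorded_egg)
    aligned_egg = advance(recorded_egg, lag)
    aligned_speech = advance(speech, lag + PROPAGATION_DELAY_SAMPLES)
    return lag, aligned_egg, aligned_speech
```

For example, if the recorded EGG lags the traced EGG by 3 samples, `best_lag` returns 3 and the speech is advanced by 7 samples in total.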


Potential Errors in the Data Sets


Film Data

There are two primary sources of error in the data

measured from the films:

1) the entire vocal folds from the anterior to the

posterior may not be exposed in the film. Typically, this

is due to the shadowing of the anterior portions of the

vocal folds by the epiglottis. The occluded portion of the

glottis is left out of the measurement because of the

difficulty of extrapolating the glottal contour over this

portion. This is a possible source of systematic error in

the film data; and

2) the second source of error is the inaccuracies in

the measurement process itself; such errors are discussed in

(21).



Digitized Data

The errors in the digitized data also arise from two

sources:







1) the synchronization errors due to the sampling

process; and

2) the errors due to the tape recorder distortion.

The traced EGG can be used to correct the recorded EGG

and reduce these errors significantly. The use of a 4

channel FM recorder in future work should eliminate both of

these problems.













CHAPTER 3

A STUDY OF THE SYNCHRONIZED
ULTRA-HIGH SPEED FILMS AND THE
ELECTROGLOTTOGRAPH SIGNAL


Introduction

The importance of the electroglottograph signal (EGG)

as a method of assessing vocal fold vibration was pointed

out in Chapter 1. The interpretation of the various phases

and features of the EGG was identified as a research goal

there. The experimental data base on which this study is

based and the various measures of vocal behavior obtained

from this data base were developed in Chapter 2. In this

chapter, the glottal area and the length of the glottal

opening measured from the ultra-high speed films, and visual

observations of these films are used to analyze and

interpret the synchronized EGG. The plan of this chapter is

as follows:

Since some understanding of the structure of the normal

vocal folds and the vibration of the vocal folds in normal

voice is essential to studying the EGG, these are discussed

in the next two sections.

The current interpretations of the EGG and evidence on

which these are based are outlined in the third section.

The next three sections are concerned with correlating

the EGG with the glottal area, the length of glottal contact

and visual observations from the films respectively.










Finally, in the last section a qualitative model for

the EGG is presented, based on the results of the previous

three sections.



Structure of the Vocal Folds


The human vocal folds contained within the larynx are

the basic vibrators that provide the source for the voiced

sounds of speech. The morphological and histological

structure of the vocal folds is therefore of considerable

importance in speech science and has been the subject of

much research (22,23). It is only the free surface of the

vocal folds that takes part in the vibration, and this is

typically described as consisting of two layers, a body and

a cover (22). The body is the vocalis muscle and the cover

is the mucosal layer covering this muscle. A schematic

representation of this layer structure is shown in Figure

3.1. This separation of the vocal folds into two layers is

considered essential in sustaining vocal fold vibrations

(1).

The vocal folds act as a mechanical vibrator, and so

adequate lubrication of this mechanism is necessary for

their proper and sustained functioning (24). This

lubrication is provided by the mucus squirted on to the

cords by the ventricular glands. Therefore, as pointed out

by Fourcin (25), the mucus can be considered a third layer

or part of the vocal folds. While the mucus is typically

left out in most discussions of vocal fold vibration, it can

influence the EGG considerably, as will be evident in the

sequel.



Figure 3.1 Structure of the human vocal folds (from
(23), with permission of the author).
[Labels legible in the figure: COVER, LAMINA PROPRIA]



Vibration of the Vocal Folds


Observations of the vibrations of the excised vocal

folds using stroboscopy (26) and of the normal folds during

phonation using high speed films (8) reveal that the vocal

folds undergo complex three-dimensional movements. Phase

differences during a vibratory cycle exist among the

different portions of the vocal folds, both along their

thickness and their length. These complicated wave-like

behaviors are being understood and modeled only of late

(27,28).

It is now generally accepted that during normal chest

voice phonation the more inferior body of the vocal folds

vibrates out of phase with respect to the more superior

cover. In fact, according to the current flow separation

theory of vocal fold vibration, it is this phase difference

that transfers energy from the air flowing through the

glottis to the vibrating system (1).

Most descriptions of vocal fold vibration divide a

single vibratory cycle into at least 3 distinct phases: i)

an opening phase during which the vocal folds pull apart

increasing the area of the glottal opening, ii) a closing

phase during which the vocal folds come together reducing

the glottal area, and iii) a closed phase during which the

vocal folds are maximally closed. Note that in some








vibratory modes as in a breathy voice, a distinct closed

phase may not exist and the area of the glottal opening

shows an almost sinusoidal variation with time.

Based on observations using excised larynges (26,29),

ultra-high speed photography (8,14,30), ultrasonography (31)

and X-ray stroboscopy (32), the movements of the vocal folds

during these three phases in normal chest voice can be

described as follows:

During the opening phase, the vocal folds first

separate inferiorly and the opening moves upwards with a

wave-like motion in the mucous membrane. Occasionally, the

opening first appears on the superior surface as a small

"chink" which then opens up in a "zipper" like fashion.

The closing phase begins with contact between the lower

edges of the glottis. The closure then proceeds along the

length of the lower edge and is then followed by the mucosal

layers coming together.

The closed phase is not necessarily associated with an

increasing amount of contact between the vocal folds. It is

often observed (26) that as the vocal folds come into

contact in a vertical plane, they may be pulling apart at

the same time in a different vertical plane.

A schematic representation of vocal fold vibration

observed in an excised canine larynx (26) is shown in Figure

3.2. This figure serves to elucidate the verbal description

given above.







Figure 3.2 Schematic representation of vocal fold
vibration in chest voice phonation (from
(26) with permission of the author).









Interpretations of the EGG

Almost all the current interpretations of the EGG are

based on correlating the EGG waveform with one or more

simultaneously and synchronously obtained glottographic

signals. Since no one glottographic signal provides

complete information about the vibration, the observed

behavior is then extrapolated based on the knowledge of the

expected behavior of the vibrating vocal folds. The present

study uses the glottal volume velocity obtained by inverse

filtering the acoustic speech wave and the high speed films

of the vocal folds as the corroborative glottographic

signals. Different glottographic signals provide evidence

of different aspects of vocal fold vibration, so it is

useful to review the glottographic signal--EGG studies that

have been done.

Fant et al. (33) correlated the EGG with optical

glottography and inverse filtering of the speech wave and

concluded that i) the flat portion of the EGG corresponds to

the glottal open phase, ii) the rapid fall in the EGG

corresponds to the closing portion, and iii) the ascending

portion of the EGG is when the vocal folds are opening. A

"slope break" in the opening phase of the EGG was sometimes

seen, and this corresponds to the opening instant.

Fourcin (34) studied the EGG combined with stroboscopic

photography and concluded that there is an antiphase

relationship between the EGG and the glottal area of

opening. He also states








. . . the electrical output is only really significant
during the period of vocal fold closure . . . (34, page
318).

Fog-Pedersen (35) combined EGG with stroboscopic

observation and based on this arrived at the representation

of the EGG during a single cycle as shown in Figure 3.3.

Lecluse (36) also combined electroglottography with

simultaneous stroboscopic observations and postulated the

model for the EGG shown in Figure 3.4. He also measured

numerous quotients from the EGG and identified two basic

forms of the EGG:

a broad electroglottogram, which occurred mainly in the
low-frequency range (below 150 Hz), and a narrow, nearly
symmetrical electroglottogram, which occurred principally
in the frequency range above 150 Hz. (36, page 162)

Fourcin (25) and Rothenberg (37) correlated the EGG

with the glottal volume velocity derived by inverse

filtering the acoustic speech signal. Rothenberg has used

the idealized model for the EGG shown in Figure 3.5 to

describe the features in the EGG. He notes that the start

of the glottal open phase can be usually associated with a

discontinuity in the slope of the EGG. This is in keeping

with the observations of Fant (33) and Fourcin (25).

Smith (15) and Childers, Smith and Moore (4) combined

the EGG with observations of the ultra-high speed films

taken simultaneously. They measured the length of contact

of the vocal folds along the midsagittal plane and found a

high degree of correlation between this length and the

EGG. Their observations support the model of Figure 3.5.

























Figure 3.3 Fog-Pedersen's model for the EGG (after 35).
1 Maximum opening phase
2 Maximum closing phase
Points 3 and 4 are changes from the plateau to
the glottal slope of the glottographic curves.




















Figure 3.4 Lecluse's model for the EGG (after 36).
1 is the moment of initial closure at a single point
2 is the moment at which closure is completed over the whole
  length, but not in the vertical plane
3 is the moment at which closure is completed over the whole
  vertical plane
4 is the moment at which opening begins
5 is the moment at which time whole length is open


















Figure 3.5 Rothenberg's model for the EGG (after 37).
1-2 vocal folds maximally closed
3-4 folds separating from lower margins
    towards upper margins
3-5 upper fold margins separating
7 lower margins close
3-7 folds apart
1 closure reaches upper fold margins









Baer, Titze and Yoshioka (38) studied synchronized EGG,

photoglottography and the glottal volume velocity. Their

experiments support the conclusions of Rothenberg and

Fourcin.

Recently, Baer, Lofquist and McGarr (39) compared the

information obtained from synchronized high speed films,

photoglottography and EGG. Results are presented for one

task from a male and one task from a female subject. They

found that the minimum in the EGG, corresponding to maximum

glottal contact, seems to occur at the instant of glottal

closure. The instant of glottal opening coincided with a

slope discontinuity in the EGG in the example from the male

subject. Glottal opening for the female subject was gradual

with large horizontal phase differences along the length of

the folds.



Glottal Area and the EGG

The projected glottal area as measured from the high

speed films has been used as a "measure" of vocal fold

vibration. Several parameters can be defined to

characterize the vibration (8,30). The glottal area has

also been used to determine the instants of glottal closure

and opening. Thus the first step in studying the EGG is to

establish correspondence between the glottal area and the

EGG.

Figures 3.6-3.9 are plots of synchronized EGG,

differentiated EGG and glottal area for a number of typical

tasks from our data base.



Figure 3.6 Synchronized EGG, Diff EGG and glottal area, Subj: JMN
(a) 125 Hz, 60 dB; (b) label not legible in the source

Figure 3.7 Synchronized EGG, Diff EGG and glottal area, Subj: DMK
(a) 125 Hz, 77 dB; (b) label not legible in the source

Figure 3.8 Synchronized EGG, Diff EGG and glottal area, Subj: GPM
(a) 170 Hz, 64 dB; (b) 340 Hz, 67 dB

Figure 3.9 Synchronized EGG, Diff EGG and glottal area, Subj: AKK
(a) 190 Hz, 72 dB; (b) 214 Hz, 70 dB



The arrangement of these plots is as follows:

The first graph in each plot is the EGG (EGG). Next is

the differentiated EGG (D-EGG) and the last is of the

glottal area (AREA). Two sets of dashed vertical lines have

been drawn in these graphs. The first set is drawn at the

glottal opening and closing instants in each pitch period of

the glottal area. Also included in this set are vertical

lines at the maximum value of glottal area in each period.

The second set of vertical lines is drawn at the maximum and

minimum in the differentiated EGG in each pitch period. The

significance of this second set of lines will be obvious

shortly.

Based on the study of such plots for all the tasks in

the data base, we describe the EGG during the different

glottal phases in the next subsection.

Description of the EGG

Closed Phase. The start of the closed phase is usually

associated with a rapid decrease in the EGG. The minimum in

the EGG, corresponding to maximum lateral contact, occurs in

the closed phase after glottal closure. The behavior

reported by Baer et al. (39) in which the minimum in the EGG

occurred at the instant of glottal closure was not observed

in any of the data sets. The EGG begins to increase from

its minimum while still in the closed phase, reflecting the

separation of the folds from the inferior surfaces towards

the upper margins.









The observed shape of the EGG in this period is

typically parabolic, implying that the depth of contact of

the folds is continuously changing. Most of the examples in

Figures 3.6-3.9 show this behavior. Occasionally, the EGG

has an "almost" flat region during which the depth of

contact is presumably constant. An example is shown in

Figure 3.8(a).

Opening Phase. The opening phase is defined in terms

of the glottal area as the duration from glottal opening to

the maximum value of the glottal area. The glottal area

during this phase increases monotonically to its maximum.

Observation of the corresponding high speed film frames

reveals that the initial glottal opening is gradual with

large horizontal phase differences. Thus it may take

several film frames for the glottal opening to spread to the

entire length of the folds. Further increase in the glottal

area is brought about by the folds moving apart with no

change in the lateral contact between them. The EGG

consequently shows two distinct phases. In the first, the

EGG increases monotonically reflecting the decreasing

lateral contact between the folds. Once the folds have

separated, the EGG remains constant while the folds pull

apart further.

This description of the EGG during the opening phase is

consistent with the observations of Baer et al. (39).

Closing Phase. The closing phase is defined as the

duration from the maximum glottal area to the instant of









glottal closure. The area decreases monotonically to zero

during this time and is usually symmetric with respect to

the opening phase. The movements of the vocal folds,

however, reveal a basic asymmetry between the opening and

the closing phases. Over a large portion of the closing

phase, the vocal folds adduct towards their medial position

with little or no change in the length of contact along the

midsagittal line. Just prior to closure, the vocal folds

are almost parallel with a narrow opening along their entire

length. Closure occurs almost simultaneously along the

entire midsagittal line. Thus while the glottal area does

not reflect this fact, the glottal closure is an abrupt

phenomenon.

The EGG, as a result, again has two distinct phases.

In the first, the EGG continues to maintain a constant value

while the vocal folds come together without contact. Then

comes the characteristic rapid fall in the EGG corresponding

to the almost simultaneous contact along the length of the

folds. This is perhaps the most consistently observed

feature in the EGG during normal, chest voice phonation.

The experiments of Baer et al. (39), Rothenberg (37), Fourcin

(34) and Lecluse (36) agree on this point.

Rothenberg (37) has suggested that the EGG may be
influenced by the electrical capacitance of the glottal

opening, particularly when the folds are close but not

touching. As explained above, this is typically the case

just prior to glottal closure. The EGG, however, does not









show any changes during this time, implying that the

capacitive effects are not significant.

Determination of the Opening and Closing Instants from the EGG


The description of the EGG in the previous section made

no reference to features in the EGG that mark the instants

of glottal opening and closure. One of the important

applications of the EGG is in automating analysis of vocal

fold behavior. It is therefore necessary to define suitable

operational techniques to locate these instants from the

EGG. The usefulness of the definition must be validated by

comparing the results obtained by using the EGG against

other standard techniques.

While previous studies have located features in the EGG

that correspond to glottal opening and closure, they do not

enable a unique determination of these times. For example,

glottal closure is said to occur during the "rapid fall" in

the EGG. Since this fall can span several film frames, just

which of these corresponds to glottal closure? The case of

the glottal opening is worse since the corresponding

feature, a slope discontinuity in the EGG, need not even be

present in the waveform.

The discussion so far points to the rate of change of

the EGG with time as a better candidate for locating the

glottal opening and closing instants rather than the EGG

itself. The time differentiation for sampled-time data, as

used in this study, can be easily approximated by the

discrete time filter,

H(z) = 1 - z^{-1}.
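
As a minimal sketch of this first-difference filter, the Diff EGG can be computed as y(n) = x(n) - x(n-1). The function name `diff_egg` and the use of NumPy are assumptions made for illustration; they are not part of the original study.

```python
import numpy as np

def diff_egg(egg):
    """First difference y[n] = x[n] - x[n-1], i.e. H(z) = 1 - z^-1."""
    egg = np.asarray(egg, dtype=float)
    # prepend=egg[0] sets y[0] = 0, so the output has the same length as the input
    return np.diff(egg, prepend=egg[0])
```

With samples taken at a uniform rate, this output is proportional to the slope of the EGG, so its extrema mark the fastest rise and fall of the waveform.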









The differentiated EGG (Diff EGG) is included in the

synchronized data plots of Figures 3.6-3.9. Thirty out of

the thirty six data sets used in this study show similar

waveforms. The six data sets that do not fit in this

category appear to be examples of breathy phonation in which

the vocal folds vibrate without any significant contact

between them.

Closing Instant. As was explained earlier, closure

occurs almost simultaneously along the length of the folds

and the EGG decreases rapidly during this time. This rapid

fall is typically less than 0.6 ms in duration. The Diff

EGG has a sharp negative spike that corresponds to the

greatest rate of decrease of the EGG. The instant of

glottal closure is operationally defined for this study as

the minimum in the Diff EGG during a voice period. The

rapidity of glottal closure ensures that this feature is

usually within 0.6 ms of the actual closure instant.

Opening Instant. Our earlier discussions have pointed

out that the EGG changes slowly during the glottal opening

phase because of the horizontal phase differences associated

with the opening. Examination of the synchronized data

plots in Figures 3.6-3.9 shows that the Diff EGG is maximum

very close to the instant of glottal opening. This instant

typically corresponds to a point of inflection in the EGG,

where it changes from a concave upwards to a concave

downwards curve. When a slope discontinuity is present in

the EGG, our observation has been that the point of slope









discontinuity is also such an inflection point. Now the

second derivative of a function is either zero or does not

exist at a point of inflection (40). A second observation

is that the second derivative of the EGG does not exist at

this point of inflection in the EGG. We illustrate these

remarks with the sketches shown in Figure 3.10.
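
The link between the maximum of the Diff EGG and a point of inflection can be checked numerically. The sketch below uses a synthetic, smooth rising curve (a shifted tanh, chosen only for illustration and not taken from the data base), so it covers the case in which the second derivative passes through zero rather than the case in which it fails to exist. The function name is hypothetical.

```python
import numpy as np

def max_slope_and_curvature(x):
    """Return the index of the largest first difference and the second differences."""
    d1 = np.diff(x)    # d1[i] = x[i+1] - x[i]
    d2 = np.diff(d1)   # d2[i] = d1[i+1] - d1[i]
    return int(np.argmax(d1)), d2

# Synthetic rising segment: concave upwards first, then concave downwards.
t = np.arange(-20, 21)
rise = np.tanh((t - 0.5) / 5.0)
k, d2 = max_slope_and_curvature(rise)
# The second difference is positive just before the steepest step and negative
# just after it, which is the change of concavity at the inflection point.
```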

The close correspondence between the maximum in the

Diff EGG (which also happens to be the point of inflection)

and the glottal opening is observed very consistently in all

the data sets we have studied. The EGG and Diff EGG

waveforms in (41) also fit this model. Other researchers

have not used the Diff EGG, but our description of the EGG

during glottal opening appears applicable to the waveforms

published in (37) also.

Based on this discussion, we define the opening instant

as the maximum in the Diff EGG during a glottal period.

Period and Open Quotient. Once the opening and closing

instants have been determined, the pitch period, T, is

defined as the time duration between two successive closing

instants.

The open quotient (O.Q.) is defined as

O.Q. = (duration of the open phase) / (pitch period).
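
A small worked sketch of these two definitions follows. It assumes that the open phase runs from an opening instant to the next closing instant, and that instants are given as sample indices at the 10 kHz sampling rate quoted in the error figures; the function and argument names are hypothetical.

```python
def period_and_open_quotient(close_n, open_n, next_close_n, fs=10_000):
    """Pitch period in seconds and open quotient for one glottal cycle.

    close_n, open_n, next_close_n are sample indices with
    close_n < open_n < next_close_n (assumed ordering within a cycle).
    """
    period_samples = next_close_n - close_n   # closing instant to next closing instant
    open_samples = next_close_n - open_n      # opening instant to next closing instant
    return period_samples / fs, open_samples / period_samples

period, oq = period_and_open_quotient(100, 140, 180)
```

For this made-up cycle the period is 80 samples (8 ms, or 125 Hz) and the folds are open for 40 of them, so the open quotient is 0.5.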
Figure 3.10 Illustration of EGG during glottal opening
(EGG and Diff EGG sketched for opening without a slope
discontinuity and for opening with a slope discontinuity).



Note that the opening and closing instants have been

defined as the maximum and the minimum in the Diff EGG in a

single glottal period. Given an EGG record containing

several glottal periods, the value of the maximum and the

minimum of the Diff EGG need not be the same in the

different periods. A method for automatically locating the

EGG opening and closing instants for all the periods in the

record has been implemented and can be described by the

algorithm given below.

Algorithm EGG-Closed-Open

Let the EGG record be EGG(1)...EGG(NUMB).

1. Remove the mean from the EGG record.

2. Differentiate the EGG using the filter

   H(z) = 1 - z^{-1}.

3. Locate the positive-to-negative zero crossing

instants and negative-to-positive zero crossing instants

in the EGG. Label these PN(i) and NP(j) respectively.

Note that the simple form of the EGG ensures that either

NP(1) < PN(1) < NP(2) < PN(2) < ... < NP(K) < PN(K) < ... OR

PN(1) < NP(1) < ... < PN(K) < NP(K) < ... etc.

depending on whether the record starts with a

positive-to-negative or negative-to-positive zero

crossing. Let n = number of positive-to-negative zero

crossings and m = number of negative-to-positive zero

crossings. Note that |n - m| < 2.

4. Initialization:

If NP(1) < PN(1)

Then locate a maximum in the Diff EGG between Diff

EGG(1) and Diff EGG(PN(1)). Label this instant

OPEN(1); it is the first opening instant.

Else, locate a minimum in the Diff EGG between Diff

EGG(1) and Diff EGG(NP(1)). Label this instant

CLOSE(1); it is the first closure instant.

5. The loop:

For i = 1, ..., n - 1,

locate a closure instant CLOSE(i) as the location

of the minimum in Diff EGG between Diff

EGG(OPEN(i)) and Diff EGG(NP(i)).

For i = 1, ..., n - 1,

locate an opening instant OPEN(i) as the location

of the maximum in Diff EGG between Diff

EGG(CLOSE(i)) and Diff EGG(PN(i)).

6. End.

Note that this algorithm locates the opening and

closing instants sequentially and uses only zero crossing

and peak picking information. It is therefore capable of

real time implementation.
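
The idea behind Algorithm EGG-Closed-Open can be sketched in a few lines. This is a simplified batch variant written for illustration, not the program used in the study: instead of the sequential search windows of steps 4 and 5, it looks for the Diff EGG minimum between successive negative-to-positive crossings and for the Diff EGG maximum between successive positive-to-negative crossings, and it ignores the end effects handled by the initialization step. All names are hypothetical, and indices are 0-based.

```python
import numpy as np

def egg_closed_open(egg):
    """Return (opening instants, closing instants) as 0-based sample indices."""
    x = np.asarray(egg, dtype=float)
    x = x - x.mean()                        # step 1: remove the mean
    d = np.diff(x, prepend=x[0])            # step 2: H(z) = 1 - z^-1
    neg = x < 0
    # step 3: index n is a crossing when the sign changes between n-1 and n
    pn = np.flatnonzero(~neg[:-1] & neg[1:]) + 1   # positive-to-negative
    npc = np.flatnonzero(neg[:-1] & ~neg[1:]) + 1  # negative-to-positive
    closes, opens = [], []
    # The rapid fall (closure) lies between two successive rising crossings.
    for a, b in zip(npc[:-1], npc[1:]):
        closes.append(int(a) + int(np.argmin(d[a:b])))
    # The steepest rise (opening) lies between two successive falling crossings.
    for a, b in zip(pn[:-1], pn[1:]):
        opens.append(int(a) + int(np.argmax(d[a:b])))
    return opens, closes

# Synthetic EGG-like record: four identical 10-sample periods (made-up values).
one_period = [1, 1, 0.5, -1, -1, -0.8, -0.2, 0.6, 0.9, 1]
opens, closes = egg_closed_open(np.tile(one_period, 4))
```

Because each instant depends only on the samples between two neighbouring zero crossings, the same search can be run cycle by cycle as the data arrive, which is the property the text relies on for real time use.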

The opening and closing instants also need to be

determined from the glottal area function to compare the two

methods, EGG based and area based. A similar algorithm to

determine the opening and closing instants from the area has

also been implemented.

Algorithm AREA-Closed-Open

1. Locate the peaks in the glottal area record. Let

the number of peaks = n.

2. For i=1,2,..,n-1

do

i) locate the minimum glottal area between the glottal

area peaks i and i+1.

ii) let A1 = area of glottal peak i

A2 = area of glottal peak i+1

M = minimum glottal area between peaks.

Set the threshold as

THRES = 0.1 * ((A1 + A2)/2 - M) + M.

iii) locate the closing instant as the index j such that

Area (j-1) > THRES and Area (j) < THRES

locate the opening instant as the index K such that

Area (K-1) < THRES and Area (K) > THRES.

3. End effects:

i) locate a closing/opening instant between the start

and the first area peak (if possible).

ii) locate a closing/opening instant between the last

area peak and the end (if possible).

4. Return.
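
Step 2 of Algorithm AREA-Closed-Open can be sketched as follows. The sketch assumes the area peaks of step 1 have already been located and are passed in as sample indices, it omits the end effects of step 3, and it treats a sample exactly equal to the threshold as open. The function name and the test values are hypothetical.

```python
import numpy as np

def area_closed_open(area, peaks):
    """Return (closing instants, opening instants) as 0-based sample indices."""
    area = np.asarray(area, dtype=float)
    closes, opens = [], []
    for p1, p2 in zip(peaks[:-1], peaks[1:]):
        seg = area[p1:p2 + 1]                    # from peak i to peak i+1
        m = seg.min()                            # M: minimum area between the peaks
        thres = 0.1 * ((area[p1] + area[p2]) / 2.0 - m) + m
        below = seg < thres
        j = int(np.argmax(below))                # first sample below THRES: closing
        k = j + int(np.argmax(~below[j:]))       # first sample back above THRES: opening
        closes.append(p1 + j)
        opens.append(p1 + k)
    return closes, opens

# Made-up area record with peaks at samples 0, 8 and 13.
area = [10, 6, 2, 0, 0, 0, 1, 5, 10, 6, 0, 0, 4, 10]
closes, opens = area_closed_open(area, [0, 8, 13])
```

The threshold sits 10% of the way from the minimum area to the mean of the two neighbouring peaks, so a small residual area or measurement noise near closure is still counted as closed.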

Results

A computer program has been implemented that
incorporates the above two algorithms and automatically

computes the errors in locating the opening instant, the

closing instant and in computing the period and O.Q. from

the EGG. The values of these variables obtained from the

glottal area are used as the reference. The program also

plots out the synchronized EGG, Diff EGG and glottal area

with the relevant points marked to allow the researcher to

verify that the algorithms have indeed performed correctly.









This program was run twice for each of the 36 data sets,

once comparing the area with the EGG traced from the film

and once comparing the area with the EGG recorded on the

audio tape. As was explained in Chapter 2, the

synchronization between the traced EGG and the glottal area

is very good; hence this was chosen for further analysis.

In some data sets, however, the EGG trace went off the film

or else the algorithm made obvious errors. In such cases,

the glottal area--EGG off audio tape comparison was used.

Since the closing and opening instants in the area are

well defined only when complete glottal closure exists, only

such tasks (22 out of the 36 in the data base) have been

included in the analysis of opening and closing instants

determination error.

These results have been summarized in the form of a

series of tables and figures.

Opening instant error. The error in determining the

opening instant from the EGG as compared with the opening

instant determined from the glottal area for each of the

four subjects in this study is shown in Tables 3.1, 3.3, 3.5

and 3.7. The distribution of this error is shown in Figure

3.11.

Figure 3.11 reveals that while the error was less than

eight samples (0.8 ms) in most cases, there are two examples

where the error is more than twelve samples (1.2 ms). Also,

the error shows some subject dependency.
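
The conversion between an error in samples and an error in milliseconds can be sketched as below. The sketch assumes that the EGG-based and area-based instants have already been paired cycle by cycle, and that a task is summarized by the mean absolute error; the tables do not state how each entry was averaged, so that choice is an assumption, as are the names used.

```python
import numpy as np

def instant_error(egg_instants, area_instants, fs=10_000):
    """Mean absolute timing error in samples and in milliseconds."""
    egg = np.asarray(egg_instants)
    area = np.asarray(area_instants)     # reference instants from the glottal area
    mean_samples = float(np.abs(egg - area).mean())
    return mean_samples, 1000.0 * mean_samples / fs

# Made-up instants for three cycles, each 8 samples late.
err_samples, err_ms = instant_error([108, 188, 268], [100, 180, 260])
```

At Fs = 10 kHz one sample is 0.1 ms, so the eight-sample error in this example corresponds to 0.8 ms.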









Closing instant error. The error in locating the

closing instant for the four subjects is shown in Tables

3.2, 3.4, 3.6 and 3.8. Figure 3.12 shows the distribution

of the error. It is seen that the closing instant was

located with an error of less than six samples (0.6 ms) in

most cases.

Pitch period measurement. The pitch as measured from

the EGG and the glottal area, and the error are shown in

Tables 3.9-3.12. The error distribution is summarized in

Figure 3.13. It is seen, as might be expected, that the EGG

is an excellent signal for the measurement of pitch,

typically involving less than 0.5% error in the measurement

as compared with those obtained from the glottal area.
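
A percentage error of this kind can be sketched as follows, taking the period as the spacing between successive closing instants and using the glottal area as the reference. Averaging the periods over the record before comparing them is an assumption made for illustration, and the function name and values are hypothetical.

```python
import numpy as np

def period_percent_error(egg_closes, area_closes):
    """Percent error of the mean EGG period relative to the mean area period."""
    egg_period = np.diff(egg_closes).mean()     # samples between EGG closing instants
    area_period = np.diff(area_closes).mean()   # reference period from the glottal area
    return 100.0 * abs(egg_period - area_period) / area_period

# Made-up closing instants: one area instant is a single sample late.
err = period_percent_error([0, 80, 160, 240], [0, 80, 160, 241])
```

In this example a one-sample disagreement spread over three cycles of about 80 samples gives an error of roughly 0.4%.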

O.Q. measurement. The open quotients measured from the

EGG and the glottal area, and the error in the EGG

measurement are shown in Tables 3.13-3.16 and the error

distribution summarized in Figure 3.14. Note the strong

subject dependency of the error.

Discussion. The results obtained show that the EGG is

an excellent signal for locating the closing instants of the

vocal fold vibration and for determining the vibration

period. While it gives a good indication of the region

where opening occurs, the present algorithm is not very

effective in locating the exact instant. Consequently, the

O.Q. measured from the EGG also shows large errors.

Moreover, such errors appear to have a subject dependency.









TABLE 3.1 ERROR IN DETERMINING OPENING INSTANT
IN NUMBER OF SAMPLES

SUBJECT: JMN

         FREQ (Hz)
INT      125     170     340
LOW      4.00    3.2     N/A
MED      2.00    14.2    N/A
HIGH     6.25    3.4     N/A


TABLE 3.2 ERROR IN DETERMINING CLOSING INSTANT
IN NUMBER OF SAMPLES

SUBJECT: JMN

         FREQ (Hz)
INT      125     170     340
LOW      1.00    2.00    N/A
MED      2.25    3.5     N/A
HIGH     1.67    3.4     N/A









TABLE 3.3 ERROR IN DETERMINING OPENING INSTANT
IN NUMBER OF SAMPLES

SUBJECT: DMK

         FREQ (Hz)
INT      125     170     340
LOW      4.33    3.80    3.6
MED      2.00    3.20    2.6
HIGH     3.00    0.600   N/A


TABLE 3.4 ERROR IN DETERMINING CLOSING INSTANT
IN NUMBER OF SAMPLES

SUBJECT: DMK

         FREQ (Hz)
INT      125     170     340
LOW      1.00    3.48    1.6
MED      3.50    0.00    2.1
HIGH     1.25    2.20    N/A









TABLE 3.5 ERROR IN DETERMINING OPENING INSTANT
IN NUMBER OF SAMPLES

SUBJECT: AKK

         FREQ (Hz)
INT      125     170     340
LOW      11.5    7.6     12.25
MED      11.4    4.83    N/A
HIGH     6.5     N/A     7.3


TABLE 3.6 ERROR IN DETERMINING CLOSING INSTANT
IN NUMBER OF SAMPLES

SUBJECT: AKK

         FREQ (Hz)
INT      125     170     340
LOW      0.667   0.75    2.40
MED      1.00    2.86    N/A
HIGH     0.5     N/A     1.9









TABLE 3.7 ERROR IN DETERMINING OPENING INSTANT
IN NUMBER OF SAMPLES

SUBJECT: GPM

         FREQ (Hz)
INT      125     170     340
LOW      N/A     4.25    4.1
MED      N/A     N/A     3.1
HIGH     3.49    5.28    1.14


TABLE 3.8 ERROR IN DETERMINING CLOSING INSTANT
IN NUMBER OF SAMPLES

SUBJECT: GPM

         FREQ (Hz)
INT      125     170     340
LOW      N/A     4.00    2.0
MED      N/A     N/A     0.9
HIGH     7.56    6.00    3.71












[Tables 3.9-3.12 (pitch period) and Tables 3.13-3.16 (open quotient) are not legible in the source scan.]

Figure 3.11 Distribution of the error in locating the opening instant.
(Horizontal axis: number of samples error, Fs = 10 kHz. Legend: JMN, DMK, GPM, AKK.)


Figure 3.12 Distribution of the error in locating the closing instant.
(Horizontal axis: number of samples error, Fs = 10 kHz. Legend: JMN, DMK, GPM, AKK.)





Figure 3.13 Distribution of the error in measuring the pitch period.
(Horizontal axis: % error in measuring pitch period. Legend: JMN, DMK, GPM, AKK.)








Figure 3.14 Distribution of the error in measuring the open quotient.
(Horizontal axis: error in measuring O.Q.)









Another phenomenon observed is that for the same task,

the periods and O.Q.s measured from the glottal area for

each vibratory cycle show more variation than those measured

from the EGG. The implications of this have not yet been

pursued.



EGG and the Length of Glottal Contact


We have mentioned earlier that on frame-by-frame

projection of the high speed films of the vocal folds it is

seen that there exist phase differences along the length of

the folds during the opening and closing phases, i.e.,

during the closing (opening) phase, contact (opening)

between the folds first occurs over a small portion of their

length. In succeeding frames this contact (opening)

proceeds "zipper"-like along the length of the folds until

the whole glottis is closed (open). This behavior is more

pronounced during opening than closing phases.

Now, the lateral area of contact between the vocal

folds changes in two dimensions, along the length of the

folds and also along their thickness. However, we can

assume as a first approximation that the depth of contact

does not change appreciably during the period of time when

an initial glottal opening spreads to the entire length of

the folds. Then, the lateral area of contact is

proportional to the length of contact along the top margins

of the vocal folds. A similar remark also applies during

closure.









Since the EGG is presumably proportional to the inverse

of the lateral area of contact, the conjecture is that

during the opening phase the EGG is proportional to the

length of the glottal opening. Smith (15) and Childers,

Smith and Moore (4) found good correlation between the EGG

and the length of the glottal opening.

We compare the EGG and the glottal opening length in

Figures 3.15-3.19. These plots are arranged as follows:

The first graph is of the length of the glottal opening,

normalized to be between 0 and 1. The second graph is the

EGG. In the third graph, the length of the glottal opening

and the EGG are superposed. According to our arguments, the

EGG is proportional to the glottal opening length only

during the open phase. Thus in the third graph, the portion

of the EGG corresponding to the closed phase has not been

plotted. Also, the EGG during the open phase has been

scaled to be between 0 and 1 in each period.
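
The per-period scaling and comparison described above can be sketched as follows. This is only an illustration under stated assumptions: the sample values are hypothetical, the helper names `scale_unit` and `pearson` are not from the original study, and the Pearson coefficient is one possible way to quantify the agreement that the figures show visually.

```python
import math

def scale_unit(values):
    """Linearly rescale a sequence so its minimum maps to 0 and its maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical open-phase samples for one period (arbitrary units).
# The closed-phase portion of the EGG is assumed to have been removed already.
length_open = [0.0, 1.5, 3.2, 4.6, 5.0, 5.0, 4.8, 2.0, 0.0]
egg_open = [10.0, 13.1, 16.0, 19.4, 20.0, 20.1, 19.5, 14.2, 10.2]

length_norm = scale_unit(length_open)   # glottal opening length in [0, 1]
egg_norm = scale_unit(egg_open)         # open-phase EGG in [0, 1]
r = pearson(length_norm, egg_norm)      # agreement between the two traces
```

Scaling each period separately removes slow drifts in the EGG baseline, so the comparison reflects waveshape rather than absolute level.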

These figures show that in most of the examples, the

EGG and the glottal opening length correlate very well. The

rising portion of the EGG during opening, the flat portion

corresponding to no contact and the steep closing portion

agree with the corresponding phases in the length. There is

another important observation to be made--in some of the

data sets with a closed phase (Figures 3.17, 3.18 and 3.19),

it is seen that the EGG has a smaller value at the instant

of glottal opening than at the instant of glottal closure.

In other words, for the same length of contact between the








folds, the impedance across the folds is smaller during

opening than during closure. If we assume that the

electrical properties of the contacting surfaces are the same

during the opening and closing phases, this implies that the

thickness of the contacting region is much larger during the

opening than the closing phase. However, this is in direct

contradiction to what has been observed in practice. To

quote Baer,

Glottal closure also exhibited wavelike properties.
Tissues at the lower edge of closure were peeled apart,
while tissues above the point of closure were still coming
together. The depth of closure was almost negligible
immediately before the glottis opened. (26, page 40)

Even on the observations of the high speed films, it is

seen that just before opening, the texture and reflectance

of the contacting surface show a change that leads one to

believe that the depth of closure is very small.

Thus, if the depth of closure is in fact smaller at the

opening instant, then the lower impedance must be due to a

higher conductivity of the contacting surfaces.

Observations of the films show that in many instances,

the last layer of the vocal folds to separate during the

opening phase is the free mucus on the surface of the

folds. After repeated observations of some of the films in

the data base, we are convinced that the mucus is indeed

responsible for the lower impedance during the opening

phase. This point is taken up further in the next section.


































[Figure 3.15(a): SUBJ: DMK, 125 Hz, 77 dB. Figure 3.15(b): SUBJ: DMK, 170 Hz, 74 dB. Horizontal axis: time, ms x 10.]

Figure 3.15 Synchronized length of glottal opening and EGG,
Subj: DMK























[Figure 3.16(a): SUBJ: DMK, 340 Hz, 63 dB. Figure 3.16(b): SUBJ: JMN, 125 Hz, 56 dB. Horizontal axis: time, ms x 10.]

Figure 3.16 Synchronized length of glottal opening and EGG,
Subjs: DMK and JMN






[Figure 3.17(a): SUBJ: JMN, 170 Hz, 75 dB. Figure 3.17(b): SUBJ: JMN, 170 Hz, 72 dB. Horizontal axis: time, ms x 10.]

Figure 3.17 Synchronized length of glottal opening and EGG,
Subj: JMN






[Figure 3.18(a): panel title not recoverable. Figure 3.18(b): SUBJ: GPM, 340 Hz, 73 dB. Horizontal axis: time, ms x 10.]

Figure 3.18 Synchronized length of glottal opening and EGG,
Subj: GPM














[Figure 3.19(a): SUBJ: AKK, 190 Hz, 72 dB. Figure 3.19(b): SUBJ: AKK, 170 Hz, 68 dB. Horizontal axis: time, ms x 10.]

Figure 3.19 Synchronized length of glottal opening and EGG,
Subj: AKK









EGG and Observations of the High Speed Film


The final part of the study correlating the EGG and the

ultra-high speed films was the frame-by-frame visual

observation of the films to locate events which may be

responsible for the shape of the EGG waveform.

Two complete vibratory cycles were selected for each

task. Then, from the plot of the synchronized EGG trace,

the film frame corresponding to the opening instant

determined from the EGG, the frames corresponding to the

knees around the flat top of the EGG and the closing frame

from the EGG were determined. The high speed film was

projected onto a screen using a stop-frame projector and the

vibratory behavior observed during these frames noted

down. Figure 3.20 illustrates these remarks. This results

in a table of observations of the form shown in Table

3.17. The observations from several such tables were

collected to form the tables shown in Tables 3.18-3.20.

Perusing these three tables along the rows

corresponding to EGG opening and EGG knee it is seen that

i) In 6 of the 12 tasks, the EGG knee before the flat

open phase coincided with a break in a strand of free mucus

stretching between the folds.

ii) In 6 of the 12 tasks, the maximum in the Diff EGG

coincided with a frame in which there is some form of change

in a mucus bridge across the folds.

The effect of the mucus on the EGG is particularly

apparent in one of the tasks, Subj: JMN, 170 Hz, 72 dB. The


















[Figure 3.20: EGG and Diff. EGG traces with the marked events: opening instant, knee, knee, closing instant. A, B, C, D are the EGG events chosen for detailed observation on the high speed film.]

Figure 3.20 EGG events chosen for film observation






TABLE 3.17 TABLE OF OBSERVATIONS FOR
SUBJ: JMN, TASK: 170 Hz, 72 dB


EGG FEATURE          LENGTH/AREA FEATURE     FILM FRAMES DESCRIPTION

1. Opening,          1. Opening,             1. Small posterior opening has
   frame 12.5           frame 5                 started as early as frame 2.
                                                Between frames 12 and 13 a
                                                large mucus bridge at the vocal
                                                process begins to separate.

2. Knee,             2. Knee in length,      2. The mucus strand at the vocal
   frame 15.5           frame 15                process breaks in frames 15-16.

3. Knee,             3. Knee in length,      3. First lateral contact between
   frame 27.5           frame 27                the folds occurs in frames
                                                27-28.

4. Closure,          4. Closure,             4. Complete glottal closure by
   frame 29             frame 28                frame 28.

5. Opening,          5. Opening,             5. Posterior opening present from
   frame 43.5           frame 35                30. Change in mucus bridge
                                                frames 43-44 as described in 1.

6. Knee,             6. Knee,                6. Mucus strand breaks, frame 46.
   frame 46.5           frame 44

7. Knee,             7. Knee,                7. First contact between folds is
   frame 58             frame 58                in frame 58.

8. Closure,          8. Closure,             8. Closure occurs in frame 59.
   frame 59             frame 59

9. Opening,          9. Opening,             9. Opening has started frame 63,
   frame 73.5           frame 67                same comments as in 1.

















[Tables 3.18-3.20: collected film observations; table contents not recoverable.]









table of observations for this task is shown in Table

3.17. The length of the glottal opening and the EGG for

this task are shown in Figure 3.17(b). An examination of

this table and the corresponding high speed film is a

convincing demonstration of this fact.



A Qualitative Model for the EGG


The last three sections are the report of an

experimental study comparing the EGG and vocal fold

vibration as deduced from high speed films. Now, our

understanding of both vocal fold vibration and the EGG is

insufficient to completely describe the various types of EGG

observed, or even all the features in a given EGG record.

Nevertheless, the results presented in this chapter allow us

to describe an "ideal" EGG signal that incorporates all the

features that have been consistently observed. This is

indeed the purpose of the Rothenberg model of Figure 3.5.

The model we present below is a refinement of the

Rothenberg model. A schematic representation of our model

is shown in Figure 3.21. The discussion to follow is with

reference to this figure.

In Figure 3.21, A-B is the period of time in the

glottal opening phase when the vocal folds are moving apart

increasing the glottal area. There is no contact between the

folds and the EGG is consequently a constant.

During B-C, the folds are coming together decreasing

the glottal area, but first contact between the folds occurs

only at C. The EGG is thus constant during B-C.








The interval C-D corresponds to the rapid closure of

the folds along their length and at time D, the projected

glottal area becomes zero. The EGG decreases rapidly during

this time, and the large negative spike in the Diff EGG

occurs very close to time D.

The interval D-F is the glottal closed phase with zero

glottal area. The EGG decreases during the initial portion,

D-E, of the closed phase reflecting the increasing depth of

contact between the folds. At E the folds reach maximal

lateral contact and subsequently begin pulling apart at the

lower margins. This causes the EGG to increase between E and

F. The Diff EGG also increases during this time--thus the

EGG increases with an increasing slope; i.e., it is concave

upwards.

The point F corresponds to the first appearance of the

glottal opening on the upper margins of the folds. Usually,

but not always, this coincides with a discontinuity in the

slope of the EGG. Between F and G the glottal opening

spreads along the length of the folds, decreasing the amount

of lateral contact. The EGG consequently increases

monotonically; however, the Diff EGG is now decreasing and

so the EGG is concave downwards from F. The EGG has an

inflection point at F.

After time G, the folds are no longer in contact and

the EGG remains constant. The cycle then repeats itself.
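
The sequence of events A through G can be turned into a small numerical sketch of one idealized cycle. Everything quantitative here is an assumption made for illustration: the breakpoint positions, the levels at D and F, the quadratic segment shapes, and the function name `ideal_egg_cycle` are hypothetical and are not measurements from the study. Only the qualitative behavior in each interval follows the description above.

```python
def ideal_egg_cycle(n=200, c=0.40, d=0.45, e=0.60, f=0.80,
                    level_d=0.25, level_f=0.45):
    """One idealized EGG cycle from A (t = 0) to G (t = 1), arbitrary units.

    c, d, e, f are the assumed times of events C, D, E, F as fractions of
    the cycle; level_d and level_f are assumed EGG values at D and F.
    """
    egg = []
    for i in range(n):
        t = i / n
        if t < c:                      # A-B-C: no contact, EGG constant
            v = 1.0
        elif t < d:                    # C-D: rapid fall, steepest near D
            u = (t - c) / (d - c)
            v = 1.0 - (1.0 - level_d) * u * u
        elif t < e:                    # D-E: slower fall, depth of contact grows
            u = (t - d) / (e - d)
            v = level_d * (1.0 - u)
        elif t < f:                    # E-F: rise, concave upwards
            u = (t - e) / (f - e)
            v = level_f * u * u
        else:                          # F-G: rise, concave downwards
            u = (t - f) / (1.0 - f)
            v = level_f + (1.0 - level_f) * (1.0 - (1.0 - u) ** 2)
        egg.append(v)
    return egg

def diff(x):
    """First difference, a discrete stand-in for the Diff EGG."""
    return [b - a for a, b in zip(x, x[1:])]

egg = ideal_egg_cycle()
diff_egg = diff(egg)
# The largest negative value of the Diff EGG falls next to event D.
closing_index = min(range(len(diff_egg)), key=diff_egg.__getitem__)
```

With the default breakpoints, event D sits at sample 90 of 200, and the most negative Diff EGG sample lands immediately before it, in line with the remark that the negative spike occurs very close to time D.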



























[Figure 3.21: schematic traces of the EGG and the Diff. EGG.]

Figure 3.21 A qualitative model for the EGG








Conclusions

This chapter compared the electroglottograph signal

with simultaneously obtained ultra-high speed films of the

vocal folds. The comparisons indicate that the EGG is

indicative of lateral glottal contact. The experiments of

Smith (42,43) that allegedly show that the EGG registers

acoustic and mechanical effects appear incorrect.

The behavior of the EGG during the different glottal

phases was described. An algorithm for determining the

instants of glottal opening and closure from the Diff EGG

was described and evaluated by comparing against the glottal

area. The O.Q. and period computed from the EGG were also

compared against the values determined from the glottal

area. The results indicate that the EGG provides an

accurate determination of the closing instant and the voice

period. The determination of the instant of glottal opening

is not as reliable, but is typically within 0.8 ms of the

corresponding instant determined from the glottal area.

It was pointed out that the EGG is affected by mucus

strands bridging the folds. These appear to provide a

highly conductive path for the radio frequency signal used

in the EGG.

Finally, a qualitative model for the EGG was presented.














CHAPTER 4

SYNCHRONIZED GLOTTAL VOLUME VELOCITY,
GLOTTAL AREA AND THE EGG


Introduction


The vibration of the vocal folds and the relationship

between this vibration and the EGG was studied in the last

chapter. Here the concern is with correlating the EGG and

the acoustic consequence of vocal fold vibration, namely the

glottal sound source.

The periodic vibrations of the vocal folds cause

"puffs" of air to flow into the supraglottal cavities. This

airflow, or glottal volume velocity, is then shaped by the

acoustic vocal tract filter and radiated as sound at the

lips. The waveform of the glottal volume velocity (v-v) is

an important variable in determining the properties of the

radiated speech wave and is therefore fundamental to all

investigations of the speech production process.

While it was proposed as early as the 1830's that the

vocal folds act as a harmonic generator, the exact waveshape

of the v-v was obtained only in the 1950's, when the vocal

tract was finally understood and analysed as an acoustical

system (44). The reason for this is that the glottal v-v is

not easily transduced, but rather has to be somehow inferred

from the output at the mouth. This entails some method of

"cancelling out" or inverse filtering the effects of the









vocal tract from the mouth output. Since the vocal tract

filter is also unknown, this too has to be estimated from

the speech signal. One possible way is to assume a

parametric model for the vocal tract, estimate the

parameters using the speech wave, and then use this derived

model to inverse filter the speech. Details of this

technique, and a number of related techniques are presented

in the next two sections.

There are several reasons for studying synchronized EGG

and glottal v-v, and it is appropriate to introduce them at

this point. Firstly, a common problem in all the currently

used methods of inverse filtering is deciding when the

method has performed correctly, i.e., deciding when the

inverse filter and the estimated glottal v-v are indeed the

true ones. Typically, this decision is based on the

presence or lack of certain "expected" features in the

resulting v-v waveform. The EGG, being an independently

obtained signal, can provide an objective basis for making

this decision if one knows how features in the EGG and the

v-v are related (37). Carrying this argument further, it

may be possible that the EGG can be used in automating the

inverse filtering itself (37).

The second motivation, related to the first, has to do

with the difficulty of inverse filtering. The glottal v-v

has a significant influence on the quality of the voice

produced (45,46). Holmes (7) and Yea (47) have shown that

using a glottal excitation close to the true glottal v-v








can greatly improve the quality of the speech produced by

speech synthesizers. Thus, in situations such as a clinical

environment, voice or singing training, vocoding, etc.,

information about the glottal sound source is desirable, but

is precluded because of the difficulty or inappropriateness

of inverse filtering. The question then arises: Can the

EGG supply any of the information desired?

Finally, one of the long-term goals of the research in

laryngeal and voice source dynamics is that of deducing the

motions of the vocal folds from a set of simultaneously

transduced glottographic waveforms such as the glottal area,

the EGG and the glottal v-v (48). Establishing experimental

correlations between synchronized glottographic waveforms is

a first step in this project.

An additional remark: The last few years have seen a

great improvement in our understanding of the glottal v-v

and its dependence on the glottal area and the supraglottal

tract (49,50,51). However, there have been very few

systematic studies in which the glottal area and the glottal

volume velocity have been obtained in synchrony. Thus, even

without the EGG, the present data base should prove useful

in the testing and verification of these theories.


The Linear Model for Voiced Speech

Almost all techniques for inverse filtering the speech

signal to obtain the glottal v-v are based on the linear

model shown in Figure 4.1. The source is assumed to be a








periodic waveform generator which outputs pulses of v-v.

The v-v is input to a linear, time invariant vocal tract

filter. The transfer function of the vocal tract filter is

determined by the supraglottal articulators. The output of

this filter is then passed through a second filter that

models the radiation at the lips, and is finally output as

speech. While conceptually and computationally simple, this

model is not strictly correct, because the assumption that

the source and the tract are linearly separable, i.e., they

do not influence one another, is incorrect. As a matter of

fact, the glottal v-v is affected by the vocal tract

transfer function, and the above linear model needs a

careful interpretation. Since this interpretation is

essential to understanding the limitations of inverse

filtering schemes, we now discuss the steps leading to the

linear model.

The physiological system producing voiced speech

consists of two interacting subsystems: the mechanical

vibrations of the vocal folds and the sub- and supra-glottal

acoustic filters. The Ishizaka-Flanagan model (52) or the

model of Titze (27) leads to a set of coupled differential

equations describing the total system. The complexity and

computational requirements of these models are substantial,

and they do not lead to practical schemes for inverse

filtering (see, however, Note 1). Now, even though the two

systems are coupled, extensive simulations with these models

(as well as observations of vocal fold vibration) do not








show any significant influence of the vocal tract on the

vibration or the glottal area function. What does seem to

be affected by this coupling is the glottal v-v.

Many of the current studies in the glottal sound source

are concerned with making a simpler analysis of the source-

tract coupling effects than afforded by the Ishizaka-

Flanagan model (49,50,51). Here, the glottal area function

is treated as given, and a lumped parameter electrical

equivalent circuit is used for the vocal tract. This model

is shown in Figure 4.4. For simplicity, only a single

formant vocal tract is represented. The time-varying

resistance, Rg(t), and inductance, Lg(t), are the glottal

resistance and inductance respectively, and are controlled

by the assumed area function as well as the current flowing

through them. If the input impedance, Zt, of the vocal

tract as seen by the glottis, is very small compared to

Rg(t) for all t, the current flow Ug(t) would be determined

mostly by the glottal impedance, and there would be

negligible source-tract coupling. This assumption is true

only when the glottal area Ag(t) is zero or very small;

during the glottal open phase, Rg(t) and Zt are comparable

and this loading effect influences Ug(t) as follows:

1) The inertive nature of Zt at frequencies below that

of the first formant causes a delay in the peak of Ug(t) as

compared to Ag(t). This results in a steeper slope at

closure (and consequently more high frequency energy) in

Ug(t) as compared to Ag(t) (50).








2) The finite values of Rg(t) during the glottal open

phase cause an increase in the effective bandwidth of the

resonant frequencies; since Rg(t) is time-varying, the

frequency of resonance also changes. This modulation effect

on the frequencies and bandwidths of the resonances of the

system is the critical one in considering inverse filtering

(6).

Now, in spite of this coupling effect, if we define

everything to the left of the dashed line in Figure 4.4 as a

source with its output being the actual volume velocity

(current) flow Ug(t) for a given vocal tract configuration,

then in this context, the linear time invariant model of

Figure 4.1 is valid. The decoupling of the glottal and

supraglottal systems is achieved by including in the source

all the effects of source-tract coupling. Note, however,

that now the defined source is not independent of the tract.

Having established the framework in which the model of

Figure 4.1 is valid, we return to some general discussions

on glottal inverse filtering based on this model. First,

since all the blocks are linear, we can interchange the

order of vocal tract filtering and radiation leading to

Figure 4.2. It is well known that the radiation term can be

approximated very well by a differentiation (2). Combining

the first two blocks of Figure 4.2, we arrive at Figure 4.3

where the source is now the differentiated glottal v-v.
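
The interchange of the vocal tract and radiation blocks can be checked numerically with a small sketch. The assumptions are loud ones: a single two-pole resonator stands in for the vocal tract filter, a first difference stands in for the differentiating radiation term, the raised-cosine pulse shape and the 500 Hz / 60 Hz formant values are hypothetical, and all function names are illustrative.

```python
import math

def resonator(x, freq_hz, bw_hz, fs_hz):
    """Two-pole all-pole filter: y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    r = math.exp(-math.pi * bw_hz / fs_hz)
    a1 = 2.0 * r * math.cos(2.0 * math.pi * freq_hz / fs_hz)
    a2 = -r * r
    y = []
    for n, xn in enumerate(x):
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y.append(xn + a1 * y1 + a2 * y2)
    return y

def first_difference(x):
    """Discrete stand-in for the differentiating radiation term."""
    return [xn - (x[n - 1] if n >= 1 else 0.0) for n, xn in enumerate(x)]

def glottal_pulse_train(n_periods=3, period=80, open_len=48):
    """Hypothetical raised-cosine volume-velocity pulses, zero during closure."""
    one = [0.5 * (1.0 - math.cos(2.0 * math.pi * k / open_len))
           if k < open_len else 0.0
           for k in range(period)]
    return one * n_periods

fs = 10000.0
ug = glottal_pulse_train()

# Order of Figure 4.1: source -> vocal tract -> radiation.
speech_a = first_difference(resonator(ug, 500.0, 60.0, fs))
# Order of Figure 4.3: differentiated source -> vocal tract.
speech_b = resonator(first_difference(ug), 500.0, 60.0, fs)

max_gap = max(abs(p - q) for p, q in zip(speech_a, speech_b))
```

Because both blocks are linear and time invariant, the two outputs agree to within rounding error, which is what allows the differentiated glottal v-v to be treated as the source.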

The vocal tract filter for vowel sounds is usually

modeled as an all-pole filter; this can be theoretically














[Figure 4.1: block diagram; signal labels ug(t), um(t), s(t).]

Figure 4.1 Linear model for voiced speech.


[Figure 4.2: block diagram; signal labels ug(t), dug(t)/dt, s(t).]

Figure 4.2 Model of Figure 4.1 with vocal tract filter
and radiation interchanged.


[Figure 4.3: block diagram; signal labels dug(t)/dt, s(t).]

Figure 4.3 Model of Figure 4.2 with source and radiation
combined.


[Figure 4.4: circuit diagram; legible labels: Vocal Tract Filter, Zt.]

Figure 4.4 A simple model to study the source tract
interaction effects








justified on the basis of acoustic tube modeling of the

vocal tract (2). The inverse of the vocal tract filter

therefore contains only zeros, or antiresonances. If the

radiated speech wave is passed through this inverse vocal

tract filter, the output will be the differentiated glottal

v-v. A simple integration of this signal will yield the

glottal v-v.
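
The two steps just described, inverse filtering followed by integration, can be sketched in discrete time. This is a toy example under explicit assumptions: the vocal tract is a single two-pole resonator whose coefficients are known exactly (in practice they must be estimated), the radiation term is a first difference, the integration is a running sum, and the pulse shape and all names are hypothetical.

```python
import math

def all_pole(x, a1, a2):
    """Vocal tract stand-in: y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    y = []
    for n, xn in enumerate(x):
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y.append(xn + a1 * y1 + a2 * y2)
    return y

def inverse_filter(s, a1, a2):
    """All-zero inverse of all_pole: e[n] = s[n] - a1*s[n-1] - a2*s[n-2]."""
    e = []
    for n, sn in enumerate(s):
        s1 = s[n - 1] if n >= 1 else 0.0
        s2 = s[n - 2] if n >= 2 else 0.0
        e.append(sn - a1 * s1 - a2 * s2)
    return e

def first_difference(x):
    """Discrete stand-in for the radiation (differentiation) term."""
    return [xn - (x[n - 1] if n >= 1 else 0.0) for n, xn in enumerate(x)]

def running_sum(x):
    """Discrete stand-in for the final integration."""
    total, out = 0.0, []
    for xn in x:
        total += xn
        out.append(total)
    return out

fs = 10000.0
r = math.exp(-math.pi * 60.0 / fs)                    # assumed 60 Hz bandwidth
a1 = 2.0 * r * math.cos(2.0 * math.pi * 500.0 / fs)   # assumed 500 Hz formant
a2 = -r * r

period, open_len = 80, 48
ug = [0.5 * (1.0 - math.cos(2.0 * math.pi * k / open_len))
      if k < open_len else 0.0
      for k in range(period)] * 3                     # hypothetical glottal v-v

speech = all_pole(first_difference(ug), a1, a2)       # synthetic "speech"
dug_est = inverse_filter(speech, a1, a2)              # differentiated v-v
ug_est = running_sum(dug_est)                         # recovered glottal v-v
max_error = max(abs(p - q) for p, q in zip(ug, ug_est))
```

The recovery is exact here only because the inverse filter uses the same coefficients that generated the signal. With estimated coefficients the residual formant ripple in `ug_est` indicates how far the inverse filter is from the true one.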

The various inverse filtering schemes described in the

next section differ essentially in the methods of estimating

or implementing the inverse vocal tract filter. Now the

problem of estimating the resonances (frequency and

bandwidth) of the vocal tract filter is a common one in

speech analysis. Techniques such as short-time Fourier

analysis, linear prediction and homomorphic processing have

been applied to this problem (53,54). Note however that any

analysis scheme that is applied over several pitch periods

(or even an entire period), and assumes a time-invariant

vocal tract filter over the analysis duration will lead to

erroneous results because of the source-tract coupling

effects mentioned earlier. Only during the glottal closed

phase are the vocal tract characteristics stationary. To

estimate the vocal tract filter of Figure 4.1, analysis

should be restricted to this interval. This is what has led

to the concept of closed-phase speech analysis (6,19,55).
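
A small numerical experiment illustrates why restricting the analysis to the closed phase matters. The sketch below is an assumption-laden toy, not the procedure used in this study: it uses an order-2 covariance-type linear prediction solved by Cramer's rule, a single fixed resonator as the vocal tract (so the time-varying coupling effects themselves are not simulated), and a hypothetical half-sine excitation that is zero during closure. All names are illustrative.

```python
import math

def all_pole(x, a1, a2):
    """Vocal tract stand-in: y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    y = []
    for n, xn in enumerate(x):
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y.append(xn + a1 * y1 + a2 * y2)
    return y

def covariance_lpc2(s, start, stop):
    """Order-2 least-squares predictor fitted over samples start..stop-1.

    Minimizes the sum of (s[n] - c1*s[n-1] - c2*s[n-2])**2 over the window.
    Requires start >= 2 so that the past samples exist.
    """
    def phi(i, k):
        return sum(s[n - i] * s[n - k] for n in range(start, stop))
    p11, p12, p22 = phi(1, 1), phi(1, 2), phi(2, 2)
    p01, p02 = phi(0, 1), phi(0, 2)
    det = p11 * p22 - p12 * p12
    c1 = (p01 * p22 - p02 * p12) / det
    c2 = (p11 * p02 - p12 * p01) / det
    return c1, c2

fs = 10000.0
r = math.exp(-math.pi * 60.0 / fs)                         # assumed bandwidth
a1_true = 2.0 * r * math.cos(2.0 * math.pi * 500.0 / fs)   # assumed formant
a2_true = -r * r

period, open_len = 80, 48
excitation = [math.sin(math.pi * k / open_len) if k < open_len else 0.0
              for k in range(period)]                      # zero when closed
speech = all_pole(excitation, a1_true, a2_true)

closed = covariance_lpc2(speech, open_len, period)   # closed phase only
whole = covariance_lpc2(speech, 2, period)           # entire period

err_closed = max(abs(closed[0] - a1_true), abs(closed[1] - a2_true))
err_whole = max(abs(whole[0] - a1_true), abs(whole[1] - a2_true))
```

During the closed phase the excitation is zero, so the signal is a free decay of the resonator and the fit recovers its coefficients essentially exactly. Over the whole period the nonzero excitation enters the prediction error and biases the estimate, even in this simplified setting without source-tract coupling.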









Some Inverse Filtering Techniques


We present in this section some of the wide variety of

inverse filtering techniques possible. The primary

motivation is to show that while many different

implementations are possible they all lack an objective

criterion in deciding when the signal output is the true

glottal volume velocity.

Miller. One of the first investigators to successfully

obtain the glottal volume velocity was R.L. Miller in 1959

(44). Miller used a linear phase low pass filter to remove

the second and higher formants and an analog zero circuit to

cancel out the first formant of the speech wave. He

initially used a spectrographic analysis of the speech

signal to obtain the settings for the inverse filter

network, but later appears to have abandoned this step,

setting the controls directly. The criterion used in

deciding the correct setting was that the resulting glottal

v-v should have a "flat" closed phase.

Holmes. J.N. Holmes improved Miller's inverse

filtering technique by including antiresonances or zeros

for five formants (56). The inverse filter controls were

adjusted to produce minimum formant ripple in the output

v-v. Holmes was primarily interested in the improvement of

the naturalness of speech synthesizer outputs when such

measured v-v waveforms were used as the excitation source

(7).









Nakatsui and Suzuki. M. Nakatsui and J. Suzuki

implemented the methods of Miller and Holmes in the

discrete-time domain (57). Their digital inverse filter had

adjustable zeros for the first three formants and fixed

zeros for the fourth and fifth formants. The bandwidths were

computed using a fixed formula for all the formants. Again,

a flat closed phase was used as the criterion in adjusting

the filter coefficients.

Mathews, Miller and David. This method computes the

glottal v-v using a pitch-synchronous analysis technique

(58). The method consists of computing the Fourier

coefficients of a pitch period of the speech pressure

signal, locating the formant frequencies, removing the

formant poles from the spectrum, and regenerating the

glottal waveform from the residual by Fourier synthesis.

Note that since the analysis is done over an entire pitch

period, the estimated formant frequencies and bandwidths

will be incorrect.

Sondhi. M.M. Sondhi proposed a method of inverse

filtering in which the speaker inserts one end of a hard

walled acoustic tube into his or her mouth while phonating

(59). If the tube is properly matched and has a

reflectionless termination, Sondhi showed that the pressure

picked up anywhere in the tube should be a delayed version

of the glottal v-v. The method does not work with a

recorded phonation and cannot be used simultaneously with

filming of the vocal folds.









Rothenberg. M.R. Rothenberg used an analog inverse

filtering scheme similar to those of Miller and Holmes. The

novel feature of Rothenberg's technique is the use of a

circumferentially vented pneumotachograph mask to directly

measure the air volume velocity at the mouth (60). In such

a case, the radiation filter of Figure 4.1 is not present,

and response down to zero frequency can be obtained. Also,

using suitable calibration, absolute airflow levels can be

measured. Since no integration of the inverse filter output

is required to obtain the glottal v-v, the method is less

sensitive to low-frequency noise than schemes that use the

radiated pressure wave. The primary disadvantage is that

the mask has a frequency response only up to 1.5 KHz.

Later, Rothenberg and Zahorian (61) reported a

nonlinear filtering scheme, where by using suitable

feedback, the inverse filter antiresonance frequency and

bandwidth are changed synchronously with the glottal flow to

simulate the effects of the frequency and bandwidth changes

during the open phase. Under these conditions, the inverse

filter output should be proportional to the glottal area of

Figure 4.2. Fant (62), however, states that the method may

not be correct. In any case, it is difficult to instrument.

Berouti. M. Berouti, in 1976, proposed a method for

accurately estimating the vocal tract formant frequencies

and bandwidths from the speech signal by analysis over the

closed glottis interval (19). His approach is based on the

discrete time linear prediction technique that has proved









very successful in speech analysis. Berouti identifies the

closed glottal interval by visual inspection of the speech

signal. Since his method is a special case of a more

general approach to be described next, we do not discuss it

further.

Wong, Markel and Gray. In 1979 D. Wong, J. Markel and

A. Gray published their inverse filtering technique based on

a linear prediction model for the speech signal (63). By a

careful analysis of the sequence of events in a single

glottal period, they were able to propose a criterion for

locating the interval of glottal closure.

Since their method

1) was the only one available that had an objective

procedure for selecting the inverse filter and

2) could be implemented in software without extensive

new instrumentation,

it was decided to use this method to carry out the

inverse filtering tasks required for this study. The

theoretical and practical implementation considerations of

the method are presented in the next section.



The Closed Phase Covariance
Method of Inverse Filtering


Theory

The inverse filtering method of Wong, Markel and Gray

is based on a discrete time formulation of the linear speech

production model shown in Figure 4.5(a).