Citation
Intelligibility of selected consonant sounds distorted by infinite peak clipping

Material Information

Title:
Intelligibility of selected consonant sounds distorted by infinite peak clipping
Creator:
Joseph, Maurice, 1926-
Publication Date:
Language:
English
Physical Description:
vi, 83 leaves : ill. ; 28 cm.

Subjects

Subjects / Keywords:
Amplitude ( jstor )
Consonants ( jstor )
Experimentation ( jstor )
Fricative consonants ( jstor )
Mutual intelligibility ( jstor )
Pollock ( jstor )
Spoken communication ( jstor )
Syllables ( jstor )
Vowels ( jstor )
Waveforms ( jstor )
Dissertations, Academic -- Speech -- UF ( lcsh )
Speech thesis Ph. D ( lcsh )
Speech, Intelligibility of ( lcsh )
Genre:
bibliography ( marcgt )
non-fiction ( marcgt )

Notes

Thesis:
Thesis - University of Florida.
Bibliography:
Bibliography: leaves 79-81.
General Note:
Manuscript copy.
General Note:
Vita.

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
This item is presumed in the public domain according to the terms of the Retrospective Dissertation Scanning (RDS) policy, which may be viewed at http://ufdc.ufl.edu/AA00007596/00001. The University of Florida George A. Smathers Libraries respect the intellectual property rights of others and do not claim any copyright interest in this item. Users of this work have responsibility for determining copyright status prior to reusing, publishing or reproducing this item for purposes other than what is allowed by fair use or other copyright exemptions. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder. The Smathers Libraries would like to learn more about this item and invite individuals or organizations to contact the RDS coordinator (ufdissertations@uflib.ufl.edu) with any additional information they can provide.
Resource Identifier:
022059336 ( ALEPH )
13489197 ( OCLC )















INTELLIGIBILITY OF SELECTED CONSONANT


SOUNDS


DISTORTED BY INFINITE PEAK CLIPPING


By

MAURICE JOSEPH


A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL OF
THE UNIVERSITY OF FLORIDA
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE
DEGREE OF DOCTOR OF PHILOSOPHY











UNIVERSITY OF FLORIDA
December, 1966






































To My Wife
















ACKNOWLEDGMENTS


The author expresses sincere appreciation to his advisor and chairman, Dr. Donald Dew, for guidance throughout the planning and execution of this study, and to members of the dissertation committee for suggestions and encouragement.

Special appreciation is expressed to Dr. Harry Hollien, and the faculty, staff, and students of the Communication Sciences Laboratory of the University of Florida for cooperation in all phases of the study.

In addition, the author gratefully acknowledges Dr. A. E. Brandt's advice and aid in statistical treatment of the data.

The author wishes especially to thank his wife, Pauline, for her constant encouragement, inspiration, and untiring aid at all times.

Finally, the author expresses gratitude to the Vocational Rehabilitation Administration for financial aid through a Special Rehabilitation Research Fellowship.

















TABLE OF CONTENTS


LIST OF TABLES

LIST OF FIGURES

CHAPTER
I BACKGROUND AND STATEMENT OF PROBLEM

II PROCEDURE

III RESULTS AND DISCUSSION

IV SUMMARY AND CONCLUSIONS

APPENDIX A DETAILS OF THE CLIPPING APPARATUS

APPENDIX B SYSTEM CALIBRATION PILOT STUDY

APPENDIX C STIMULUS-RESPONSE MATRICES

BIBLIOGRAPHY

BIOGRAPHICAL DATA

















LIST OF TABLES


Table

1 Vocabulary of nonsense syllables used as stimuli

2 Summary of analysis of variance

3 Difference between treatments for each consonant

4 Matrix of significant differences between sounds under each distortion treatment

5 Differences between voiced-voiceless cognate pairs subjected to infinite peak clipping

6 Effect of order of treatment for each treatment

7 Intelligibility scores for PB lists without distortion and with infinite peak clipping

















LIST OF FIGURES


Figure

1 A comparison of a complex wave before and after infinite peak clipping

2 Block diagram of equipment

3 Structural scheme of factorial design

4 Intelligibility scores of consonants

5 Confusion matrix of responses to undistorted consonants

6 Confusion matrix of responses to consonants distorted by infinite peak clipping

7 Schematic of clipper amplifier module

8 Output waveforms of modules

9 Action of clipper on speech waveforms

10-29 Matrices of stimulus-responses by ten observers under two treatment conditions

















CHAPTER I

BACKGROUND AND STATEMENT OF PROBLEM




Introduction

Experimental studies have demonstrated that word lists retain intelligibility when subjected to distortion by infinite peak clipping (Licklider and Pollack, 1948; Licklider, Bindra, and Pollack, 1948). These findings have been interpreted to mean that the temporal pattern of zero-axis crossings of the speech waveform provides information that is in itself sufficient for perception of speech. A corollary inference is that clues provided by relative amplitude variations in the speech waveform may be discarded without degradation of intelligibility. These interpretations of the effects of infinite peak clipping are of considerable theoretical importance: to the extent that they are true, they describe an invariant factor in speech intelligibility. However, there are two reasons to suspect that these generalizations may need qualification. One reason is that the speech sample used in the early studies may have provided contextual clues to word perception in addition to the frequency and amplitude clues inherent in the speech wave. Thus, with multiple clues









operating, the role of one physical parameter is more difficult to assess. A second reason to question the generalizations about zero-axis crossing and relative amplitude clues is that these clues may assume different importance for different types of speech. Factors that affect intelligibility of word lists may differ from those that affect individual sounds; moreover, it is possible that these factors may assume different importance among different individual speech sounds.

It may be argued that speech communication generally involves units larger than individual speech sounds, and that word intelligibility provides a closer approximation to intelligibility of conversational speech. It may be true that whole words, with contextual clues free to operate, have higher face validity as representative of speech in general. However, speech communication situations do exist--in military communications, for instance--in which identification of specific speech sounds may be essential. Moreover, fundamental generalizations about factors that determine intelligibility of speech should be capable of verification on molecular levels (figuratively) such as individual phonemes, consonant clusters, and syllables, as well as on the molar level of words and continuous discourse.

The focus of this study is on the individual speech sound. Specifically, the present study represents an attempt to determine the effects of infinite peak clipping on









intelligibility of specific consonants. It also represents an attempt to minimize the opportunity for contextual clues to operate.

Infinite Peak Clipping

Definition

Peak clipping is a form of distortion in which amplitude is prevented from exceeding a certain maximum limit. Amplitude is held constant during times when, in a linear system, it would exceed the limiting level. When the positive-going and negative-going parts of the speech wave are equally limited, the effect is said to be "symmetrical peak clipping." If a wave thus limited is reamplified so that the clipped output wave equals the amplitude of the peaks of the original wave (before clipping), and if this operation is continued through several successive stages of clipping and reamplification until the waveform is approximately rectangular in shape, the operation is called "infinite peak clipping." Line B of Figure 1 illustrates the result of infinite peak clipping on the waveform shown on Line A. Two characteristics of the clipped wave may be noted on this figure: the points (in time) that correspond to axis crossings in the undistorted wave are the same in the clipped wave, but the amplitudes have only two values, a single maximum excursion above and below the axis. In addition, all indications of differences in rise-time or fall-time, or of changes that did not cross






































Figure 1. A comparison of a complex wave before and after infinite peak clipping (after Licklider, Bindra, and Pollack, 1948).

A. Undistorted
B. Same wave following infinite peak clipping









the axis, are eliminated in the clipped wave.

Relevant literature

Effects on intelligibility. The most important study on the intelligibility of speech distorted by infinite clipping was that of Licklider and Pollack (1948). Phonetically balanced word lists recorded on discs by one talker were subjected to infinite peak clipping and presented to five listeners in a series of twenty-five sessions. Scores averaged approximately sixty-eight percent in the first listening session, but improved in succeeding sessions. Intelligibility scores of over ninety percent were achieved by the twenty-fifth session. The authors felt that this increase was independent of the listeners' increased familiarity with PB word lists but instead represented learning to understand clipped speech. This suggests that statements regarding the intelligibility of clipped speech should specify the observers' familiarity with clipped speech.

The effects of peak clipping in combination with other forms of distortion were also studied. Integration of the waveform, the equivalent of a low-pass filter slope of six dB per octave, severely degrades intelligibility of PB-50 word lists if the integration occurs prior to clipping. Differentiation of the waveform, the equivalent of a high-pass filter slope of six dB per octave, performed before clipping, results in intelligibility slightly better than









with clipping alone. Integration or differentiation of the waveform following the clipping had no effect on intelligibility (Licklider and Pollack, 1948). When combined with distortion by reduction of sound duration, subsequent clipping had little effect on the intelligibility of vowels, but did reduce the intelligibility of consonants in CVC and VC syllables (Ahmed and Fatehchand, 1959).
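The interaction between pre-filtering and clipping described above can be illustrated with a minimal sketch. It is not a reconstruction of the Licklider and Pollack apparatus: the sampling rate, the two-tone test signal, and the function names are all assumptions made for illustration, and a first difference and a running sum stand in for the six dB per octave differentiator and integrator. The sketch shows only that the filter applied before clipping decides which spectral component governs the axis crossings that survive; it says nothing about intelligibility.

```python
import math

def infinite_peak_clip(samples):
    """Keep only polarity: +1.0 at or above the axis, -1.0 below it."""
    return [1.0 if s >= 0.0 else -1.0 for s in samples]

def zero_crossings(samples):
    """Count polarity changes between successive samples of the clipped wave."""
    clipped = infinite_peak_clip(samples)
    return sum(1 for a, b in zip(clipped, clipped[1:]) if a != b)

def differentiate(samples):
    """First difference, an approximation of a +6 dB per octave slope."""
    return [0.0] + [b - a for a, b in zip(samples, samples[1:])]

def integrate(samples):
    """Running sum with its mean removed, an approximation of a -6 dB per octave slope."""
    total, running = 0.0, []
    for s in samples:
        total += s
        running.append(total)
    mean = sum(running) / len(running)
    return [r - mean for r in running]

FS = 16000  # assumed sampling rate in Hz
# Toy signal (an assumption, not speech): strong 100 Hz plus weak 2000 Hz component.
signal = [math.sin(2 * math.pi * 100 * n / FS)
          + 0.3 * math.sin(2 * math.pi * 2000 * n / FS) for n in range(FS)]

zc_plain = zero_crossings(signal)                # clipping alone
zc_diff = zero_crossings(differentiate(signal))  # differentiation before clipping
zc_int = zero_crossings(integrate(signal))       # integration before clipping
print(zc_plain, zc_diff, zc_int)
```

In this toy case differentiation lets the weak high-frequency component control the crossings, while integration leaves only the crossings of the low-frequency component.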

Spectral effects. In attempting to predict or to explain the effects of infinite peak clipping under various conditions, it would be helpful to know the effects of clipping on the spectrum of speech. Unfortunately, the information available is scarce and somewhat equivocal. Only one report gives empirical measurements: Licklider, Bindra, and Pollack (1948), reporting the results of an intensity-frequency-time spectral analysis of the word "shoe-bench," both undistorted and peak clipped, state, ". . . although many of the details of the pattern are changed by infinite peak clipping, the general plan of the terrain is by no means rendered unrecognizable. The main concentrations of low-frequency and of high-frequency energy are still in the same places despite the rearrangement of minor peaks." This report, even while minimizing the effects of clipping on the spectrum, does support the contention that changes do occur in the spectrum under infinite peak clipping.

Two theoretical papers, analyzing the clipping process









mathematically, exhibit similar ambiguity. Velichkin (1962) concludes, ". . . It is evident that amplitude clipping causes a broadening of the speech spectrum, but this broadening is slight, even with maximum clipping." Mathematical calculations by Dukes (1954) confirmed the likelihood that clipping would not seriously reduce the intelligibility of speech, but he adds, ". . . it is clear that the results have significance in respect of very long samples . . . With this formulation nothing can be deduced about the intelligibility of individual sounds, except of course that large deviations below the average must be relatively infrequent." He points out that in applying his formulas some of the values obtained represent averages over all possible unvoiced and voiced sounds.

The most serious shortcoming of the spectral information available, whether empirical or theoretical, is that it is based on relatively long-term averaged values. Apparently spectral changes, however small, do occur under infinite peak clipping, but are obscured by averaging over time. Specific moment-to-moment changes are not yet known in detail; this makes it difficult to predict the effects on brief speech sounds such as consonants.

Implications of previous studies on peak clipping

Two related implications of the Licklider and Pollack (1948) study have been considered of major importance. These concern the relative importance of dynamic amplitude variation









and of the temporal pattern of zero-axis crossings in the speech waveform.

Amplitude clues. Since the speech wave subjected to infinite peak clipping assumes a dichotomous value of amplitude, the almost infinite range of possible amplitude values present in the undistorted wave is reduced to one value of maximum and minimum excursion. This, in effect, means that information regarding relative amplitudes in the speech wave is discarded. The implication drawn is that ". . . the variations in intensity from moment to moment appear not to be basic cues for the recognition of words," and that "the so-called dynamic characteristics of speech are not of vital importance for intelligibility. It is apparently just as well to reproduce all the fundamental speech sounds (or what is left of them after clipping) at the same intensity level as it is to preserve their normal intensities" (Licklider and Pollack, 1948).

Zero-axis crossings. It has been reasoned that since infinite peak clipping eliminates information regarding relative amplitudes, as well as on- and off-slope and wave shape, all that remains in the clipped wave is the temporal pattern of zero-axis crossings. The inference drawn is that the axis crossing information is sufficient to provide speech intelligibility (Licklider, 1950). It should be noted that this is negative evidence based upon the elimination of other possibilities.
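The two claims above, that relative amplitude is discarded and that the timing of axis crossings is retained, can be checked on a synthetic wave. The following is a minimal sketch under stated assumptions: an ideal two-level clipper stands in for the cascaded clipping and reamplification stages, and the sampling rate, tone frequency, and rising envelope are hypothetical values chosen for illustration.

```python
import math

def infinite_peak_clip(samples):
    """Ideal clipper: +1.0 for samples at or above the axis, -1.0 below it."""
    return [1.0 if s >= 0.0 else -1.0 for s in samples]

def crossing_indices(samples):
    """Sample indices at which the polarity differs from the preceding sample."""
    clipped = infinite_peak_clip(samples)
    return [n for n in range(1, len(clipped)) if clipped[n] != clipped[n - 1]]

FS = 8000        # assumed sampling rate in Hz
N_SAMPLES = 800  # 0.1 second of signal

# A steady tone, and the same tone under a rising (always positive) envelope.
steady = [math.sin(2 * math.pi * 440 * n / FS) for n in range(N_SAMPLES)]
envelope = [0.1 + 0.9 * n / N_SAMPLES for n in range(N_SAMPLES)]
swelling = [s * e for s, e in zip(steady, envelope)]

# After clipping the two waves are indistinguishable: the envelope is gone,
# and the crossings fall at the same sample indices.
same_wave = infinite_peak_clip(steady) == infinite_peak_clip(swelling)
same_crossings = crossing_indices(steady) == crossing_indices(swelling)
print(same_wave, same_crossings)
```

Any two waves that differ only by a positive amplitude envelope are mapped to the same rectangular wave, which is the sense in which amplitude information is lost.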









Black and Hixson (1959) were unable to demonstrate a direct relationship between density of zero-axis crossings and intelligibility.

The implications of the Licklider and Pollack study may be paraphrased as follows:

1) The information provided by the temporal pattern of zero-axis crossings is necessary and sufficient for intelligibility of speech, and

2) The information provided by the dynamic pattern of amplitude variations is not necessary for intelligibility of speech.

The point of view that has stimulated the present study is that:

1) While the information provided by the temporal pattern of zero-axis crossings is probably necessary, it is possible that there are specific specimens of speech for which it may not be sufficient, and

2) While the information provided by the dynamic pattern of amplitude variations is probably not necessary for intelligibility of many speech samples, it is possible that there are some specific speech sounds for which this information may be necessary.

Applications of peak clipping in communication

Thus far infinite peak clipping has been treated in terms of distortion. It may also be viewed as a means of









simplifying the speech wave for efficient transmission.

Speech encoding. Licklider and Pollack (1948) observe that infinite peak clipping can reduce speech to a bivariate code more efficiently than pulse-modulation procedures. Not only is the speech wave easily encoded by infinite peak clipping, but it may be decoded by the human ear with no further decoding apparatus necessary other than the transducer that would ordinarily be used in the transmission system. Southworth (1963) describes several speech digitizing techniques based upon infinite clipping; these include pulse-number modulation and delta modulation techniques.

Broadcast transmission. One of the earliest applications of peak clipping was in broadcast transmission. Premodulation clipping has been found to increase the efficiency of power use in broadcast transmission (Kryter, Licklider, and Stevens, 1947).

Protective limiting. Clipping has been found to be useful in protecting the ear from high-energy speech peaks (Pollack and Pickett, 1959). This was investigated in detail for applications in hearing aid design (Davis, et al., 1947).

In all of the above applications, the ultimate usefulness will depend to some extent upon how intelligible peak clipped speech remains in relation to the needs of the specific situation. It is already known that word intelligibility is satisfactory under peak clipping. To the extent that specific









communication situations demand fine discriminations, the intelligibility of peak clipped speech must be determined in greater detail.

Limitations of previous studies on peak clipping

Influence of test materials. Experimental results on intelligibility are highly dependent on the type of test materials used. Under identical acoustic conditions, an intelligibility score may vary by as much as ninety percent depending upon whether the test materials are digits or nonsense syllables (Miller, Heise, and Lichten, 1951). Differences in intelligibility have been associated with meaning, number of syllables, and, to a smaller extent, syllabic stress (Hirsh, Reynolds, and Joseph, 1954). Since the currently available information about intelligibility of speech subjected to infinite peak clipping is based on monosyllabic words, the intelligibility of other speech samples, such as individual phonemes, remains unknown until tested empirically.

Effect of contextual clues. An important factor is the information provided by the context of sounds in a word or syllable; this may affect the number of alternative responses available if a particular sound is unintelligible (Miller, 1951). If, for instance, a test word is stra_, with the final consonant unintelligible to the observer, and the test consists of nonsense syllables, there are more than twenty possible final consonants available as responses. These include the









nonsense words: stram, strat, strab, strad, stran, strak, strang, strag, stral, strav, straf, straz, strath (voiced and voiceless), stras, strash, strach. If, however, the test vocabulary consisted of real English words, then there is probably but one possible response: strap. Thus, if the final sound were rendered unintelligible by some distortion, on the real word test the observer's knowledge of the statistical probabilities of occurrence of certain sounds, combinations of sounds, and orders of sounds would probably enable him to correctly identify the word. If a distortion, such as infinite peak clipping, produced a systematic effect on the intelligibility of certain sounds, this could be obscured in PB-50 scores if word contexts led to correct identification of words in which the sounds occurred.

As J. D. Harris (1960) has observed:

In order to uncover the contribution of
each type of physical cue alone, it is
not enough to eliminate it by some legerdemaine in the laboratory. When this is
done . . . intelligibility is often unaffected. A false conclusion could in
that case be reached . . . that the cue
eliminated is of minor importance in
speech communication. What is necessary
is to eliminate progressively one, two,
and more cues simultaneously.

Since contextual clues may contribute to the intelligibility of distorted speech, it is considered important in this study to control contextual clues while studying the effects of distortion on intelligibility of individual sounds.









Statement of the Problem

This study addresses itself to three questions:


1) Are individual speech sounds affected in intelligibility by infinite peak clipping?

2) If so, are these effects equal from one phoneme to another?

3) If individual speech sounds are not equally affected in intelligibility by infinite peak clipping, is there a systematic pattern in the way in which sounds are affected?

















CHAPTER II

PROCEDURE




Stimulus materials

Selection of stimuli. For this study twelve consonants /p b t d k g θ ð f v s z/ were chosen, which represent a variety of positions and manners of articulation, as well as acoustic effects. In particular, the selected consonants include examples of fricatives and plosives, of voiced-voiceless cognate pairs, and of lingua-alveolar, lingua-dental, bilabial, and velar placements. They also represent the following distinctive features: contrasts of strident-mellow, tense-lax, continuous-discontinuant, grave-acute, and compact-diffuse.

Obviously some of these consonants, namely the plosives, cannot be spoken without an adjacent vowel. Moreover, it is well known that the acoustic characteristics of consonants can be altered by the vowel context in which they are produced. Consequently, in order to counterbalance the vowel environment, the selected consonants were embedded in a series of VCV syllables. Specifically, a corpus of ninety-six nonsense Vowel-Consonant-Vowel words was designed, drawing upon









the twelve consonants in combination with four stressed vowels and the neutral schwa sound. The counterbalancing was accomplished by selecting stressed vowels /i æ ɑ u/, which represent extremes of position on a vowel quadrilateral, and combining them with each of the consonants in both the pre-vocalic and post-vocalic positions. By adding the schwa to each of these CV and VC combinations in the initial or final position, respectively, VCV syllables were formed which correspond to trochaic or iambic stress patterns. These may be schematized as either stressed vowel + consonant + schwa, or schwa + consonant + stressed vowel. Thus for any replication of the entire corpus there are eight utterances of each consonant embedded in different vocalic environments. The entire set of stimulus words is presented in Table 1.
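The construction of the corpus can be sketched in a few lines. This is an illustration of the counterbalancing scheme only, not the procedure actually used to prepare the list; the IPA symbols are an assumed reading of the consonant and vowel inventories, and the names `build_corpus`, `CONSONANTS`, and `STRESSED_VOWELS` are hypothetical.

```python
# Assumed IPA reading of the twelve consonants and four stressed vowels.
CONSONANTS = ["p", "b", "t", "d", "k", "g", "θ", "ð", "f", "v", "s", "z"]
STRESSED_VOWELS = ["i", "æ", "ɑ", "u"]
SCHWA = "ə"

def build_corpus():
    """Return (consonant, stress_pattern, syllable) for every VCV item."""
    items = []
    for consonant in CONSONANTS:
        for vowel in STRESSED_VOWELS:
            # Trochaic: stressed vowel + consonant + schwa.
            items.append((consonant, "trochaic", vowel + consonant + SCHWA))
            # Iambic: schwa + consonant + stressed vowel.
            items.append((consonant, "iambic", SCHWA + consonant + vowel))
    return items

corpus = build_corpus()
per_consonant = {c: sum(1 for item in corpus if item[0] == c) for c in CONSONANTS}
print(len(corpus), per_consonant)
```

Twelve consonants crossed with four stressed vowels and two stress patterns give the ninety-six syllables, eight for each consonant.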

Because there are individual differences in the production of speech sounds, it was deemed desirable to use three different talkers to produce samples of the syllables. The three adult male talkers were chosen from the Communication Sciences faculty. Each talker had extensive training and experience in phonetic transcription.

In preparation for recording spoken examples of the selected stimuli, a phonetic transcription was made of the ninety-six nonsense syllables. The transcriptions were replicated three times, once for each talker, and arranged in one complete random order. On the final copies of this











Table 1. Vocabulary of nonsense syllables used as stimuli.

isə  æsə  ɑsə  usə  əsi  əsæ  əsɑ  əsu
iθə  æθə  ɑθə  uθə  əθi  əθæ  əθɑ  əθu
ifə  æfə  ɑfə  ufə  əfi  əfæ  əfɑ  əfu
izə  æzə  ɑzə  uzə  əzi  əzæ  əzɑ  əzu
iðə  æðə  ɑðə  uðə  əði  əðæ  əðɑ  əðu
ivə  ævə  ɑvə  uvə  əvi  əvæ  əvɑ  əvu
ipə  æpə  ɑpə  upə  əpi  əpæ  əpɑ  əpu
itə  ætə  ɑtə  utə  əti  ətæ  ətɑ  ətu
ikə  ækə  ɑkə  ukə  əki  əkæ  əkɑ  əku
ibə  æbə  ɑbə  ubə  əbi  əbæ  əbɑ  əbu
idə  ædə  ɑdə  udə  ədi  ədæ  ədɑ  ədu
igə  ægə  ɑgə  ugə  əgi  əgæ  əgɑ  əgu









list of 288 phonetically transcribed syllables, individual items were identified by talker so that the recordings could be made in a single session.
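The preparation of the reading list lends itself to a brief sketch: one replication of the ninety-six items for each of three talkers, arranged in a single random order with each entry tagged by talker. The talker labels, the use of item indices in place of transcriptions, and the fixed seed are assumptions for illustration; the original randomization method is not described in the text.

```python
import random

TALKERS = ["talker_1", "talker_2", "talker_3"]  # hypothetical labels
N_SYLLABLES = 96                                # items in one replication

def build_reading_list(seed=0):
    """Pair every item with every talker, then shuffle into one random order."""
    entries = [(item, talker) for talker in TALKERS for item in range(N_SYLLABLES)]
    rng = random.Random(seed)  # fixed seed only so that the sketch is repeatable
    rng.shuffle(entries)
    return entries

reading_list = build_reading_list()
items_per_talker = {t: sum(1 for _, who in reading_list if who == t) for t in TALKERS}
print(len(reading_list), items_per_talker)
```

Because the talker is attached to each entry before shuffling, the randomization across talkers is fixed in advance of the recording session.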

Recording the stimulus tape. For two reasons, the spoken examples of stimuli were tape recorded during a single session. First, if recordings were made separately a subsequent splicing or rerecording step would have been necessary to randomize the stimuli produced by different talkers. However, by recording all the talkers in a single session, this randomization was accomplished prior to the recordings. Secondly, since the talkers were instructed to correct any errors in the production of the stimuli, they and the experimenter provided an initial check of each other's articulation.

In the recording session the talkers were seated in a series 1200 Industrial Acoustic Corporation room approximately two feet from the Altec M-20 microphone system, which was coupled to an Ampex model 351-C full-track tape recorder located outside the sound-treated room. Each talker-listener had a complete list of the stimuli indicating which talker was to read each of the words. Talkers were instructed to use the carrier phrase "now" for each VCV word with no intervening pause. By using such a phrase an opportunity for the talker to reach a stable vocal level was provided before his production of the VCV syllable. The word "now" was chosen, rather than a more conventional "say the word" or "write the









word," because it does not include any of the sounds used as stimuli. This avoided the possibility of a constant standard of reference for one or more of the test stimuli. The experimenter provided a flash of light as a cue for the beginning of each word in order to space stimuli at four-second intervals.

Following the recording session, the tape was edited to remove errors and irrelevant materials and to insert an identifying number before each group of ten stimuli. Since one of the talkers tended to use a slightly lower vocal level than the others, the gain was adjusted during the rerecording process to equalize the levels of the speech peaks on the carrier phrase.

Observations

Observers. The criteria for selection of observers were: 1) training in phonetics, and 2) normal speech and hearing. Hearing sensitivity within normal limits was determined by means of individual screening tests at 250, 500, 1000, 2000, 4000, and 6000 Hz. This testing was accomplished with a Beltone model 10 A portable audiometer, set at a screening level of 15 dB relative to the 1964 I.S.O. standard.

Ten observers who met the criteria were recruited from among the students and faculty associated with the Communication Sciences program at the University of Florida. This included eight male and two female observers whose ages ranged









from twenty to fifty years. For each of the observers, the experimental session was the first exposure to speech subjected to infinite peak clipping.

Playback system. The complete system as used in the experimental trials is depicted schematically in Figure 2. The stimulus tape was played from an Ampex 351-C tape recorder into one channel of a Marantz model 7 preamplifier with controls set in the normal flat frequency response position. The output of this channel was led to a double-pole double-throw selector control which either connected or bypassed the clipper. The clipping unit, consisting of three cascaded printed-circuit amplifier modules operated beyond their linear range, is described in detail in Appendix A. The "pass-clip" switching control output led to a Hewlett-Packard 350-D decade attenuator set. The signal was then reamplified through the second channel of the preamplifier connected to a Marantz model 8-B power amplifier. The output of the power amplifier drove an AR-3 acoustic suspension loudspeaker system located in the I.A.C. room. A dual-beam Tektronix type 564 oscilloscope was used to constantly monitor the speech waveforms at the input of the clipping unit and at the output of the power amplifier. Accordingly, both undistorted and clipped waveforms could be observed simultaneously.

Playback level. Stimuli were presented to observers at 70 dB Sound Pressure Level. In order to achieve the desired






















Figure 2. Block diagram of equipment.









level consistently for all experimental trials, a preliminary calibration was performed. A General Radio sound level meter, located at a position in the I.A.C. room, where observers would sit during experimental trials, was used to determine sound pressure levels. The RMS voltage reading at the input to the loudspeaker which produced a 70 dB Sound Pressure Level at the observer position was noted and used as reference setting during experimental trials. The signal used to set voltage readings was a 1000 Hz calibration tone. The calibration tone was recorded on the test tape at a V.U. level that corresponded to the average V.U. of the talkers' speech peaks on the tape; the signal source for the recording was a General Radio model 1304-B beat frequency audio generator. RMS voltages at the transducer input were determined with a Ballantine model 321-C A.C. Vacuum Tube Voltmeter. Sound Pressure Levels were read on the B scale of the sound-level meter.

System calibration. In order to determine whether the present experimental apparatus was comparable with that of previous studies, a system calibration was performed. This took the form of a pilot study using a speech sample similar to that used in the Licklider and Pollack (1948) study. Details are presented in Appendix B. The apparatus was found to be comparable.

Conducting experimental trials. Observers were seated









in one of three seats placed at a distance of two and one-half feet from the loudspeaker. Half of the observers heard the test list under the "clip" treatment first and "pass" (undistorted) second; the other five observers heard the lists with treatment conditions in the reverse order.

The following recorded instructions were presented to the observers before the VCV list under each treatment condition:

Transcribe only the consonant in the word. Do
not transcribe the word "now" which comes before each word. For example, if you hear
"Now ESe," write the symbol for /S/. If you hear "Now e0," write the symbol for /m/. To help you keep your place, after each group of
ten words the number of the following word
will be given.

Just prior to the instructions for the "clip" condition, whether it came first or second in order, certain recorded materials were presented to provide some familiarization with the sound of clipped speech. These materials included one unfamiliar passage--a paragraph from an out-dated news magazine, and two passages which were presumed to be very familiar: the pledge of allegiance to the flag, and digits from one to twenty.

A brief rest period was provided between treatment conditions. At this time the voltage to the loudspeaker was rechecked and the tape was advanced to the position for the following treatment. The listening session required about









one hour to complete both treatment conditions.

Data reduction

Analysis of variance. The primary analysis of results used a design with four factors: Consonant (C), Observer (O), Talker (T), and Distortion (D). Factor C consisted of the twelve consonants used in this study, Factor O represented ten observers, and Factor T represented three talkers. The two treatments in Factor D were the distorted, or "clip," condition and the undistorted, or "pass," condition. The latter was included as the experimental control for the effects of clipping. The design utilized a mixed model in which the Consonant and Treatment factors were considered fixed effects and the Observer and Talker factors were considered random effects. The structural design is depicted in Figure 3. The number of correct responses (from zero to eight) made by a single observer to a talker's productions of a given consonant was entered in the individual cell. For purposes of this analysis, differences among vowels and stress patterns in syllables were ignored; the syllable was viewed simply as the vehicle for providing a means of counterbalancing a variety of coarticulation effects among the test stimuli, and as a method of limiting the number of response alternatives for the total VCV vocabulary in order to control the contextual variable.
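The way responses are reduced to cell entries can be sketched as follows. This illustrates the layout only (twelve consonants by ten observers by three talkers by two treatments, each cell holding a count from zero to eight) and is not the original tabulation procedure; the function names, the index coding of consonants and treatments, and the synthetic responses are all assumptions.

```python
N_CONSONANTS, N_OBSERVERS, N_TALKERS, N_TREATMENTS = 12, 10, 3, 2
ITEMS_PER_CELL = 8  # four stressed vowels x two stress patterns

def empty_table():
    """Nested lists indexed as table[consonant][observer][talker][treatment]."""
    return [[[[0] * N_TREATMENTS for _ in range(N_TALKERS)]
             for _ in range(N_OBSERVERS)]
            for _ in range(N_CONSONANTS)]

def score_cells(trials):
    """Count correct responses per cell.

    Each trial is (consonant, observer, talker, treatment, response), all as
    integer indices; a response is correct when it equals the consonant.
    """
    table = empty_table()
    for consonant, observer, talker, treatment, response in trials:
        if response == consonant:
            table[consonant][observer][talker][treatment] += 1
    return table

# Synthetic data (hypothetical): every response is correct, except that
# consonant 0 under treatment 1 is always reported as consonant 1.
trials = []
for c in range(N_CONSONANTS):
    for o in range(N_OBSERVERS):
        for t in range(N_TALKERS):
            for d in range(N_TREATMENTS):
                for _ in range(ITEMS_PER_CELL):
                    response = 1 if (c == 0 and d == 1) else c
                    trials.append((c, o, t, d, response))

table = score_cells(trials)
print(len(trials), table[0][0][0][0], table[0][0][0][1])
```

Vowel and stress pattern do not appear as indices, which mirrors the decision to ignore them in this analysis.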

Figure 3. Structural scheme of factorial design.

Individual comparisons. A factorial design makes possible the analysis of two or more experimental variables simultaneously, both of their individual effects and of their interactions with one another. In an analysis of variance, the F ratios obtained are evaluated with reference to the theoretical distribution of F in order to test the significance of differences among the treatment means.

It is often deemed desirable to investigate the sources contributing to the significance of a given variable. This may be accomplished by making specific comparisons of the significances of differences among means of the levels or categories of a variable. For instance, in the present study the information that the Consonant factor had statistical significance would not indicate which consonants differed from one another, but comparisons among the individual consonants or groups of consonants could provide this information.

For enumerative data such as those collected in this study the chi-square test provides an appropriate test of significance. The present data represent the number of correct responses that were actually obtained compared with the number of correct responses possible. According to Snedecor (1956) the basic chi-square formula may be verbalized as: "Chi-square is the sum of such ratios as (deviation squared)/(hypothetical number)." The factorial chi-square, developed by Brandt and Snedecor (Brandt, 1949), represents an alternative method of calculation of chi-square that is easily adaptable to use in a factorial design. In this method the sums of squares are multiplied by a chi-square factor to determine the value of chi-square associated with a given comparison. The chi-square factor represents $e^2 / [o(e - o)]$, where $e$ equals the total possible sum of scores and $o$ equals the obtained sum of scores. The sum of squares associated with the comparison is $d^2 / [f(s_1^2 + s_2^2)]$, where $d$ equals the difference between sums of scores of the quantities being compared, $f$ is the frequency of each item being compared, and $s_1$ and $s_2$ are the number of items comprising each half of the comparison. For example, if one sound is compared with another, $(s_1^2 + s_2^2)$ would equal twice one squared, or two; if six sounds were compared with six other sounds, $(s_1^2 + s_2^2)$ would equal twice six squared, or seventy-two. The obtained value of chi-square is tested for the number of degrees of freedom appropriate to the effect or interaction for which the comparison is made.
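The two quantities described above can be written out directly. The following is a minimal sketch rather than the procedure actually used in the study: the function names are hypothetical, the counts are invented for illustration, and $f$ is assumed to be the number of trials per item. Under that assumption, a comparison of one item with another gives the same value as the ordinary chi-square for a 2 x 2 table of correct and incorrect responses, which the sketch uses as a cross-check.

```python
def chi_square_factor(e, o):
    """e^2 / [o (e - o)]: e = total possible score, o = obtained score."""
    return e ** 2 / (o * (e - o))


def comparison_sum_of_squares(d, f, s1, s2):
    """d^2 / [f (s1^2 + s2^2)] for a comparison of s1 items against s2 items."""
    return d ** 2 / (f * (s1 ** 2 + s2 ** 2))


def factorial_chi_square(d, f, s1, s2, e, o):
    """Sum of squares for the comparison multiplied by the chi-square factor."""
    return comparison_sum_of_squares(d, f, s1, s2) * chi_square_factor(e, o)


def pearson_2x2(a, b, c, d):
    """Ordinary chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical counts (not data from the study): two items, 100 trials each.
trials = 100
correct_1, correct_2 = 90, 80

chi_factorial = factorial_chi_square(
    d=correct_1 - correct_2,   # difference between the two sums of scores
    f=trials,                  # assumed: trials per item
    s1=1, s2=1,                # one item in each half of the comparison
    e=2 * trials,              # total possible score
    o=correct_1 + correct_2,   # obtained score
)
chi_pearson = pearson_2x2(
    correct_1, trials - correct_1,
    correct_2, trials - correct_2,
)

print(round(chi_factorial, 4), round(chi_pearson, 4))
```

For a grouped comparison such as six sounds against six, only `s1` and `s2` change, giving the divisor term of seventy-two mentioned in the text.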

This method was used to make voiced-voiceless comparisons, fricative-plosive comparisons, and individual consonant comparisons. It was also used to test the effect of order of presentation of treatments.

Stimulus-response matrices. Information about types of error responses was displayed in confusion matrices which were compiled for each observer (see Appendix C), similar to those described by Miller and Nicely (1955). These matrices preserved specific responses made by observers to each talker's stimuli.


















CHAPTER III

RESULTS AND DISCUSSION




Analysis of Variance

A summary of the results of the analysis of variance is presented in Table 2.

Main effects

Two main effects were found significant. First, the Distortion factor was significant, indicating that clipping does produce a decrement in consonant intelligibility. The total percent of correct responses under the "pass" condition was 94.6%; the corresponding score for clipped speech was 76.3%. This gross effect of clipping on all stimuli together does provide an affirmative answer to the experimenter's first question, are individual speech sounds affected by infinite peak clipping?

The Consonant main effect was also significant. This may be due to an inherent difference in consonant intelligibility, whether clipped or undistorted. It may be noted in Table 3 that for both /θ/ and /ð/, intelligibility scores were lower than for other consonants under both conditions.









Table 2. Summary of analysis of variance.

Source                  d.f.    Mean Square       F      Significance

Consonant (C)            11        92.45         9.41
Observer (O)              9         3.18         0.33       N.S.
Talker (T)                2         0.05         0.01       N.S.
Distortion (D)            1       387.20        39.55

CxO                      99        13.07         1.29
CxT                      22        52.05         5.14
CxD                      11       142.71        14.08
OxT                      18         2.23         0.22       N.S.
OxD                       9        48.99         4.83
TxD                       2       200.53        19.79

CxOxT                   198         7.81         0.63       N.S.
CxOxD                    99        20.57         1.65
CxTxD                    22        80.19         6.44
OxTxD                    18        26.69         2.14

CxOxTxD                 198        12.45

pooled residual a       414         9.79
  (OxT, CxOxT, CxOxTxD)
pooled residual b       396        10.13
  (CxOxT, CxOxTxD)

Error term for main effects was pooled residual a.
Error term for two-way interactions was pooled residual b.
Error term for three-way interactions was CxOxTxD.

** .01 level of significance
*  .05 level of significance
N.S. not significant












Table 3. Difference between treatments for each consonant.

                Percent Correct
Consonant      Pass       Clip      Chi-square

s              98.8       94.6          1.67
θ              85.8       38.8        213.56
f              89.6       70.8         33.87
z              99.2       83.3         25.44
ð              65.8       27.5        141.56
v              97.9       74.6         52.45
p             100.0       87.5         15.05
t              99.6       91.3          6.69
k              99.6       83.8         24.15
b              99.2       84.2         21.68
d              99.6       87.9         13.11
g             100.0       91.3          7.38

With 11 df, chi-square needed for significance at .05 - 19.7; .01 - 24.7









The main effects for Talker and for Observer were not found to be significant. This was reassuring, since both of these factors were potential sources of considerable experimental error. Since over-all differences among talkers and observers were not statistically significant, many of the scores to be reported are based upon the pooled talker and observer scores.

Interactions

Despite the lack of significance of the Talker and Observer main effects, there were several significant interactions involving talkers and observers. Visual inspection of the data indicated that the Consonant x Talker interaction may have been due to a higher score of correct responses to /θ/ as spoken by talker A than by the other talkers, and a higher score on /ð/ as spoken by talker C. Furthermore, the Talker x Distortion interaction showed significance since talker C yielded a higher score than talkers A and B for the "pass" condition but a lower score than the others under clipping. These interactions involving talkers tend to confirm the advisability of using more than one talker in a study such as this. Divergent over-all scores under clipping between two observers seemed to be the basis of the significant Observer x Distortion interaction.

Of major importance in this study is the Consonant x Distortion interaction. The significance of this interaction provides an answer to the experimenter's second question: are these effects equal? The effect of clipping on individual sounds is not equal.

Individual Comparisons

Specific consonants

The difference between distortion treatments for each consonant is presented in Table 3, in which the percentage of correct responses is compared for each consonant. For almost half of the consonants that were included as stimuli, the intelligibility under clipping was not significantly different from that for undistorted speech. The sounds that were affected little were /s g t d p/. Significantly different at the .01 level were /ð θ v f/. Of the consonants that were affected by clipping, the decrement in percent of correct responses was as great as 47%. These relationships may be visualized easily in Figure 4.
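The decrements discussed here follow directly from the Table 3 percentages. The sketch below is offered only as a reading aid: it copies the Table 3 values, computes the pass-minus-clip decrement for each consonant, and lists the consonants whose chi-square falls below the .05 criterion of 19.7 given with the table. The variable names are hypothetical.

```python
# (pass %, clip %, chi-square) for each consonant, copied from Table 3.
table3 = {
    "s": (98.8, 94.6, 1.67),
    "θ": (85.8, 38.8, 213.56),
    "f": (89.6, 70.8, 33.87),
    "z": (99.2, 83.3, 25.44),
    "ð": (65.8, 27.5, 141.56),
    "v": (97.9, 74.6, 52.45),
    "p": (100.0, 87.5, 15.05),
    "t": (99.6, 91.3, 6.69),
    "k": (99.6, 83.8, 24.15),
    "b": (99.2, 84.2, 21.68),
    "d": (99.6, 87.9, 13.11),
    "g": (100.0, 91.3, 7.38),
}

CRITICAL_05 = 19.7  # chi-square needed at the .05 level with 11 d.f.

# Decrement in percent correct caused by clipping.
decrement = {c: round(p - q, 1) for c, (p, q, _) in table3.items()}

# Consonants whose pass/clip difference does not reach the .05 criterion.
not_significant = sorted(
    c for c, (_, _, chi) in table3.items() if chi < CRITICAL_05
)

largest = max(decrement, key=decrement.get)
mean_pass = round(sum(p for p, _, _ in table3.values()) / len(table3), 1)
mean_clip = round(sum(q for _, q, _ in table3.values()) / len(table3), 1)

print("largest decrement:", largest, decrement[largest])
print("not significant at .05:", not_significant)
print("mean pass / clip:", mean_pass, mean_clip)
```

The unweighted means of the two columns reproduce the over-all scores of 94.6% and 76.3% reported for the Distortion main effect, since every consonant was presented the same number of times.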

Considering each treatment separately, the difference from one consonant to another is shown in Table 4. This table is arranged in two matrices; the upper matrix represents the "pass" condition and the lower represents the "clip" condition. In each matrix, the presence of an asterisk at the intersect of a column and row indicates a statistically significant difference between the two corresponding sounds in the margins. A blank space indicates that for the treatment under consideration, the difference between the two sounds (with respect to the number of stimuli correctly identified) is not statistically significant at the .01 level. Under the "pass" condition, /ð/ is statistically different from all of the other sounds. Under the "clip" condition, there are four consonants that are uniformly less intelligible than the others; these are /ð θ f v/. Reference to Table 4 makes it possible to ascertain the significance of the difference in intelligibility, under either treatment, between any two of the test stimuli.

Figure 4. Intelligibility scores of consonants. A. Open bar indicates undistorted condition. B. Closed bar indicates infinite peak clipped condition.

Table 4. Matrix of significant differences¹ between sounds under each distortion treatment. Upper matrix: Pass. Lower matrix: Clip. Rows and columns: s (95.0%), t (91.3%), g (91.3%), d (87.9%), p (87.5%), b (84.2%), k (83.8%), z (83.3%), v (74.5%), f (70.8%), θ (38.8%), ð (27.5%).

Sounds are arranged in order of decreasing intelligibility under clipping.
¹ significance level .01

Results of the present study seem to support the observations made by Licklider and Pollack that a speech waveform can retain intelligibility even when stripped of relative amplitude clues so that only the temporal pattern of zero-axis crossings is retained. In the present study, data are also presented which show that the extent to which this is true depends upon the speech sample.¹ With contextual clues controlled, the intelligibility of individual consonant sounds in VCV syllables ranged from below thirty percent to scores above ninety percent, for observers' first exposure to speech distorted by infinite peak clipping. The differential effect of clipping on individual consonants is emphasized by noting the scores for the sounds that were the most affected and those that were the least affected by clipping. Scores for /θ/ and /ð/ were 38.8% and 27.5%; compared with the scores obtained without distortion, these represent decrements of 47.0% and 38.3% respectively. The score of 95.0% for /s/ represents a decrement of only 3.6%, which was not statistically significant.

¹ A pilot study was conducted as a calibration procedure to determine the equivalence of the present experimental situation to that of the Licklider and Pollack study. It was concluded that the two were comparable. See Appendix B for details.

Voiced-voiceless grouping

The value of chi-square associated with the grouped scores of /s θ f p t k/ and grouped scores of /z ð v b d g/ under clipping was 0.36; the value necessary for significance at the .05 level was 19.70. There were no significant differences under clipping between pairs of voiced-voiceless cognate sounds, as shown in Table 5. Thus, clipping produced the same degree of effect on intelligibility in the voiced and voiceless sounds. Since these groups are primarily differentiated on the basis of the presence or absence of low-frequency periodic excitation, it is to be expected that clipping would not affect recognition of voicing.

Table 5. Differences between voiced-voiceless cognate pairs subjected to infinite peak clipping.

Consonant Pair      Chi-square

s - z                 14.21
θ - ð                 12.19
f - v                  1.35
p - b                  1.07
t - d                  1.07
k - g                  5.43

With 11 df, value of chi-square needed for significance at .05 - 19.7; .01 - 24.7

Fricative-plosive grouping

In comparing consonants grouped according to the fricative-plosive classification, a value of chi-square of 46.14 was obtained; the value necessary for significance at the .01 level was 24.70. It is apparent that the fricatives were different, as a group, from the plosives. Intelligibility of the fricatives was considerably lower. However, this must be qualified; two of the fricatives, /s/ and /z/, were relatively unaffected by clipping, while the other four, /θ ð v f/, were severely degraded in intelligibility. Consequently, it appears that the distinguishing characteristics of fricatives are obscured more than those of plosives, and furthermore, that the characteristics of the dental fricatives are obscured more than those of the alveolar fricatives. This provides an answer to the third question: is there a systematic way in which sounds are affected by peak clipping? The following reasons are offered in partial explanation:

1) Cavity resonance. The four severely affected fricatives are formed at the teeth, so far forward in the mouth that they are relatively free of oral cavity resonance effects. The friction component of these sounds consists only of the turbulence of the breathstream between tongue and teeth or between lip and teeth, radiating into the air (plus voicing for /ð/ and /v/). It is evident that without oral cavity resonance there is little basis for distinguishing the lingua-dental sounds from the labio-dental sounds.

2) Friction and vocalic components of fricatives. An experiment performed at the Haskins Laboratories seems especially relevant to this discussion. Harris (1958) separated the friction components of fricative sounds in VC syllables from the vocalic portions and combined the friction component of one fricative with the vocalic portion associated with a different fricative. She found that identification of /s/ and /z/ could be made by observers entirely on the basis of the friction component, but that identification of /f v θ ð/ depended on the combination of both friction and vocalic portions. Since in the present study /s/ and /z/ retained such high intelligibility, it might be inferred that the removal of relative amplitude clues did not affect the transmission of information inherent in the friction portion of fricatives, but that it did affect the vocalic portion.

3) Balance of noise. The /θ/ and /ð/ are characterized by a low intensity noise, while the /s/ and /z/ are considered to have a high intensity noise component (Jacobson, Fant, and Halle, 1952). Since one characteristic of infinite peak clipping is to equalize amplitudes to a uniformly high level, it is possible that clipping might do more violence to the naturally lower noise levels of /θ/ and /ð/ than to the normally higher noise levels of /s/ and /z/.

4) Durational clues. Still another explanation can be made on the basis of the clue of duration. While traditionally the fricatives are considered longer in duration than plosives, Miller and Nicely (1955) included /s/, /z/, and the lateral sibilant fricatives in one class, representing the consonants of longest duration, but the dental fricatives were placed in the same category of duration as the plosives. Since temporal characteristics would be unchanged by clipping, this would provide an additional means of identifying the /s/ and /z/. It would also help to explain certain errors that occurred, in which /p t k/ were the responses to /f/ and to /v/.

5) Affrication in plosives. A factor related to confusions between fricatives and plosives may be the affrication that typically follows plosives in American speech. This affrication, when amplified by peak clipping, might tend to make the clipped plosives perceptually resemble the fricatives.

Order of treatment

The possibility of an order effect based on the procedure in which five observers received the "pass" condition first and the other five received the "clip" condition first was investigated. As Table 6 shows, the order of presentation did not yield a significant difference for either the "pass" or the "clip" condition.

Error Responses

Confusion matrices were prepared which combined results for all talkers and for all observers. Figures 5 and 6 show types of responses actually made to the stimuli and their frequency of occurrence. The individual scores shown in these figures are based on ten observers' pooled responses to the eight syllables spoken by three talkers for each consonant, a total of 240 stimuli per consonant.
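The way pooled responses become percentage entries in a confusion matrix can be sketched as follows. This is an illustration under stated assumptions, not the tabulation method used in the study: the function name is hypothetical and the response list is invented. Only the count of 240 stimuli per consonant is taken from the text.

```python
from collections import Counter, defaultdict


def confusion_percentages(trials):
    """Tally (stimulus, response) pairs into percent-response rows."""
    counts = defaultdict(Counter)
    for stimulus, response in trials:
        counts[stimulus][response] += 1
    return {
        stimulus: {
            response: 100.0 * n / sum(row.values())
            for response, n in row.items()
        }
        for stimulus, row in counts.items()
    }


# Ten observers x eight syllables x three talkers, as stated in the text.
stimuli_per_consonant = 10 * 8 * 3

# Hypothetical responses to one consonant: 238 correct, 2 heard as /p/.
trials = [("f", "f")] * 238 + [("f", "p")] * 2
matrix = confusion_percentages(trials)

# Smallest non-zero entry possible with 240 stimuli per consonant.
one_response_pct = 100.0 / stimuli_per_consonant

print(matrix["f"])
print(round(one_response_pct, 2))
```

With 240 stimuli per consonant, a single response corresponds to roughly 0.42 percent, so every entry in such a matrix is a multiple of that step.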















Table 6. Effect of order of treatment for each treatment.

                  Sum of Scores, by Order
Treatment      Pass First      Clip First      Difference      Chi-square

Pass              1374            1350             24              1.60
Clip              1135            1061             74             15.26

With 9 df, value of chi-square needed for significance at .05 - 16.9; .01 - 21.7









Figure 5. Confusion matrix of responses to undistorted consonants. Entries are percent response on the pass treatment.

Figure 6. Confusion matrix of responses to consonants distorted by infinite peak clipping. Entries are percent response on the clip treatment.









Generally, under the "pass" condition most response errors could be classified in the same voicing and manner of release category as their respective stimuli. However, under the "clip" condition the types of error were divergent. There were very few instances of voiced-voiceless confusions, but there were many confusions on the plosive-fricative dimension. These tended to be unidirectional; that is, fricatives, especially /θ/ and /ð/, were frequently identified as plosives, but plosives were rarely heard as fricatives. There were other interesting examples of confusions that were not reciprocal: twenty-eight percent of the responses to /θ/ were /f/, but only seven percent of the responses to /f/ were /θ/. Similarly, forty-three percent of the responses to /ð/ were /v/, but only nine percent of the responses to /v/ were /ð/. Under the "pass" condition the confusions between /θ/ and /f/ were approximately equal, but confusions between /ð/ and /v/ exhibited the same sort of asymmetry as under clipping.

There were many more types of error responses used by observers under the "clip" than the "pass" condition. This was more noticeable for the dental fricatives than for other sounds.

















CHAPTER IV

SUMMARY AND CONCLUSIONS




Summary

An experiment was performed to determine whether individual consonant sounds are affected differentially by infinite peak clipping. The consonants, representing plosives and fricatives, voiced and voiceless, and five articulatory positions, were placed in Vowel-Consonant-Vowel nonsense syllables. These stimuli were spoken by three trained talkers following a predetermined random word order and were tape recorded. Ten adult observers, trained in phonetics, transcribed the consonants in the stimulus words under two conditions: one in which the speech was passed through the system undistorted, and one in which the speech was distorted by infinite peak clipping.

The data were subjected to a factorial analysis of variance to appraise the statistical significance of the results. The main effect for Distortion treatment was significant, as was the interaction between Distortion and Consonant. This indicated that not only did infinite peak clipping affect the intelligibility of speech, but it did so differently for different consonant sounds. The main effects reflecting Talker and Observer variance were not significant, indicating a lack of systematic error from these sources.

The dental fricatives were the most severely affected by clipping. The plosives and /s/ and /z/ were the least affected by clipping. Intelligibility scores for clipped consonants ranged from 27.5% to 94.6%; the range for the undistorted condition was from 65.8% to 100%. Sounds that exhibited the poorest intelligibility when not distorted also showed the lowest intelligibility under clipping. Voiced and voiceless sounds were equally affected by clipping.

Conclusions

Infinite peak clipping affects the intelligibility of individual speech phonemes.

The effect of infinite peak clipping on the intelligibility of individual consonant sounds is different from one sound to another.

There is a systematic pattern in the relative effects of infinite peak clipping on intelligibility of individual sounds. Dental fricatives show the largest decrement under clipping; plosives and sibilant fricatives are the least affected of the sounds sampled. Intelligibility of voiced and voiceless sounds is equally affected by infinite peak clipping.









































APPENDIX A

















DETAILS OF THE CLIPPING APPARATUS


Three resistance-capacitance-coupled printed-circuit amplifier modules were used in cascade. Clipping was achieved by overdriving the amplifiers between saturation and cutoff, as described by Littwin (1964). Each module consisted of one emitter-follower stage driving two stages of common-emitter amplification. A schematic of the amplifier modules is shown in Figure 7.
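The effect of such a clipper can be approximated digitally. The sketch below is an idealization offered for illustration and is not a model of the amplifier modules described here: it assumes sampled data and reduces every sample to its sign, which removes relative amplitude information while leaving the zero-axis crossings in place. The function names, sampling rate, and test signal are all hypothetical.

```python
import math


def infinite_peak_clip(samples):
    """Idealized clipper: keep only the sign of each sample."""
    return [math.copysign(1.0, x) if x != 0 else 0.0 for x in samples]


def zero_crossings(samples):
    """Indices i at which the sign changes between sample i and i + 1."""
    return [
        i for i in range(len(samples) - 1)
        if samples[i] * samples[i + 1] < 0
    ]


# Hypothetical test signal: a decaying 1000 Hz tone sampled at 16 kHz.
# The half-sample offset keeps every sample away from exact zero.
rate = 16000
signal = [
    math.exp(-40 * n / rate) * math.sin(2 * math.pi * 1000 * (n + 0.5) / rate)
    for n in range(160)
]
clipped = infinite_peak_clip(signal)

print(sorted(set(clipped)))
print(zero_crossings(signal) == zero_crossings(clipped))
```

The decaying envelope of the test signal disappears entirely in the clipped version, yet the list of zero crossings is unchanged, which is the property the intelligibility results depend on.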

Undistorted operation of each module separately and of the three modules cascaded, when operated within their linear range, was demonstrated by observing the input and output waveforms on a Tektronix type 564 dual-beam storage oscilloscope. These waveforms were photographed using a C-12 oscilloscope camera with Polaroid pack. In Figure 8 the upper traces show a 1000 Hz tone at the output of a Hewlett-Packard model 201-C oscillator, and the lower traces show the waveforms at the output of successive cascaded modules. In Figure 8A the unit is shown operating within its linear range; in B it is operating as a peak clipper. The action of the clipper on samples of speech is shown in Figure 9. Photographs A and B show waveforms at two different moments during utterance of the word "mash," corresponding to the vowel and the final consonant. The upper traces show the speech signal at the tape playback output of an Ampex model 354 tape recorder; the lower traces show the same samples monitored at the output of the clipper.

Figure 7. Schematic of clipper amplifier module. Resistors: 3,000; 47,000; 10,000; 4,700; 4,700; 680; 3,300; 2,100; and 1,500 ohms. Capacitors: 10, 100, .001, and 100 microfarads. Transistors: Q1, low beta p-n-p; Q2, low beta p-n-p; Q3, high beta p-n-p.

Figure 8. Output waveforms of modules. A. Linear operation. B. Peak clipping.

Figure 9. Action of clipper on speech waveforms. A. Vowel. B. Consonant.

The frequency response of each module and of the three in tandem was found to be 60 to 12,000 Hz ± 3 dB. This was determined by measuring the root mean square output voltage of the clipper, with constant input, using pure tones generated by a General Radio model 1304-B beat frequency audio oscillator. Voltages were measured with a Ballantine model 321 A.C. Vacuum Tube Voltmeter.








































APPENDIX B


















SYSTEM CALIBRATION PILOT STUDY


Purpose

To determine if the present experimental situation was comparable to that used by previous investigators, the traditional procedures were replicated in a pilot study. In particular, the intelligibility of undistorted and clipped PB words was compared using materials described by Licklider and Pollack (1948).


Stimulus materials

Lists one and two of the Harvard PB-50 words (Egan, 1948) were arranged in random word order. As in the Licklider and Pollack (1948) study, one adult male talker who had training in phonetics read both lists. The words were tape recorded in a series 1200 Industrial Acoustics Corporation room using an Altec M-20 condenser microphone coupled to an Ampex model 350-C full-track tape recorder.

The stimulus words, each in the carrier phrase "Write the word __," were spoken as one continuous sentence. During the recording, the talker monitored the deflections of a VU meter associated with this phrase in order to maintain a uniform level of vocal intensity among the words. The words were separated by an interval of five seconds, and before each ten words a number identifying the following stimulus was given, using the phrase, "Next is number __."

Apparatus

The peak clipping unit consisted of three cascaded printed-circuit amplifier modules operated beyond their linear range. Details of this instrument are given in Appendix A. A high quality audio playback system which is described in Chapter II was used for presentation of stimulus materials.

Subjects

Ten observers, three males and seven females, were selected from the staff of the Communication Sciences Laboratory. The observers ranged in age from nineteen to thirty-six years and demonstrated normal hearing. Hearing was screened at 15 dB relative to the I.S.O. standard at test frequencies of 250, 500, 1000, 2000, 4000, and 6000 Hz, using a Beltone model 10 A portable audiometer.

Procedure

Experimental trials were conducted in a series 1200 Industrial Acoustics Corporation auditory test room. Observers were seated in one of three chairs positioned at two and one-half feet from the loudspeaker. Instructions were given to write the words that they heard, using response forms that were provided with spaces for the fifty words of a PB word list.










In order to avoid any systematic effect associated with the difficulty of one PB list relative to another, the distortion treatments and word lists were counterbalanced. For the "pass" treatment (without peak clipping), five observers heard List One, and the other five observers heard List Two. For the "clip" treatment, each observer heard the list that was not presented to him under the "pass" treatment.

Results

Results are shown in Table 7. The mean intelligibility score for the ten observers for the "pass" treatment was 95.8%; the mean score for the "clip" treatment was 68.8%. In the Licklider and Pollack study (1948) the mean score for five observers in the first test session was approximately 98% without distortion, and 64% with infinite peak clipping.

Conclusion

With PB-50 lists as stimulus material, the apparatus and procedure used in the present experiment duplicated results obtained by Licklider and Pollack (1948). Intelligibility scores obtained without distortion and with infinite peak clipping were of the same order of magnitude as those reported in the literature.
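The over-all pilot scores can be recovered from the two group means given in Table 7. The sketch below is only an arithmetic check under the assumption, taken from the counterbalancing described above, that each group contained five observers, so that the over-all mean is the simple average of the group means. The names are hypothetical.

```python
# Group means (percent correct) from Table 7; each group had five observers.
pass_scores = {"group 1 (List 1)": 95.2, "group 2 (List 2)": 96.4}
clip_scores = {"group 1 (List 2)": 65.6, "group 2 (List 1)": 72.0}


def pooled_mean(group_means):
    """Mean over equal-sized groups, rounded to one decimal place."""
    return round(sum(group_means.values()) / len(group_means), 1)


overall_pass = pooled_mean(pass_scores)
overall_clip = pooled_mean(clip_scores)

print(overall_pass, overall_clip)
```

Because the lists were counterbalanced across groups, each over-all mean contains equal contributions from List One and List Two.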














Table 7. Intelligibility scores for PB lists without distortion and with infinite peak clipping.

Observers      No Distortion        Clipping

Group 1        List 1   95.2%       List 2   65.6%
Group 2        List 2   96.4%       List 1   72.0%
All                     95.8%                68.8%








































APPENDIX C







Number of Responses


S 04:z vp


+ k 6 8 S omt I r w rn n


A
B
C
A
B
C
A
B
C
A
B
C
A

C
A
8
C
A
B
C
A




B
C
A
B
C
A
B
C
A
B
C
A


Figure 10. Matrix of responses by Observer 1
consonants.


to undistorted


Letters A, B, and C identify talkers.


177







7j1







1 71







Number of Responses


b c Hi H w I[ L
A 42


A 74" 2 1




At4~12 1l1 2K
3 ~ 3


S i 2-14
SC 1 4 . .. .
=n V I 2
L A
C U


oA

k j




B G 2
Al 5I 2 1; 1
A


A H



Figure 11. Matrix of responses by Observer .1 to consonants distorted by infinite peak clipping.


Letters A, B, and C identify talkers.







Number of Responses s 0-z v p t k k d oc fJ r w rn n f 5 r







A 3 '


I
A ..
Z B

. - - - _ _ 8 -.... ...



2 0 B 8 6




B i i


c ,
A



oA -1-..









Figure 12. matrix of responses by Observer 2 to undistorted consonants.


Letters A, B, and C identify talkers.







Number of Responses


I\[7 I Il I

o ______ t.. ..

B


A
Z J7-1
(121

V B
C L7




o C_7
A
A 2 2 41






C
u A





C 13 M
A b
6 2
4J P C 71
A
t E3
0 C17


Fiur A3 Iari Ifrsossb bere cnoat
distorted~~~~ byifnt7ea lpig


Letters A, B, and C identify talkers.











A

C A

C A
4 B
C A
Z B
C

E
-j C
A
V

C DA 4j C
A
0S
0
A
k a
C A C A 0 C oA



A




C
A


A






Figure 14. consonant s


Number of Responses


-s 0 F z r ~ t C W fi f w r






71 3 4~
4


Matrix of responses by Observer 3 to undistorted


Letters A, B, and C identify talkers.


J







Number of Responses


s 04z v p t k 6 omt r w n n J �1




2 1 2<


A 7
Z B

C









I T
c~ -Ka2 1IJ





S
A
0 BA8



cI'0 j- I

B ~ 7
C 7 - - -- - --
A 8

C 7!1
A 1



Figure 15. Matrix of responses by Observer 3 to consonants distorted by infinite peak clipping.


Letters A, B, and C identify talkers.







Number of Responses


A~

C I ___ .



CI I
Ci






cI I
3 4




.. I 5 2-H '

5 2
A0 I 8I

d
.vs 8
V8



A
0 SI
U
A











Figure 16. Matrix of responses by Observer 4 to undistorted

consonants.


Letters A, B, and C identify talkers.







Number of Responses

sO4z v p tk 6 �,t r w nn D f' +r

A

: B8
C a


A
A 2 2




Z B 1
C
A 7 C I

uld A 7 V 7
a) 7j

ID C
D A6 2
r. C 5 2
o
A 7
A 2
B 5 A



t B
c - -



A
C62 A

C.



Figure 17. Matrix of responses by Observer 4 to consonants distorted by infinite peak clipping.


Letters A, B, and C identify talkers.







Number of Responses


n n 9 J b) +i


s -F z v p t k 6 4 g -.;t I r w


SA

C A

C A

C A Z B
C A

C A v a
C A P B

A t B
C A k 6

C
A C A d B
C A


Figure 18. Matrix of responses by Observer .5 to undistorted consonants.


Letters A, B, and C identify talkers.


U)




0
U


8


53



__ 1





71 7




I L
8
',,


114� : ! ! ,!,,8
Ba B8
8

: 3 ....


8





b � � , ,8







Number of Responses


S


Z

Ca


VV

0

0

U
k b 6,


s 6 Z oz v p k 6 r w r n { AG
38, C6 2
A 2 1221 B 14 i I c L 3 ~~21 A 3 2 B 7
C 5 II A 422 B 8
C 412 A 8 B I 4 2 1 C L3 2 2 A 5 I B 7 C 16 I A 7 I B 61 C 8 A 71 B 8 C _8 A 7 B 8
C
A 8 B I 511 C I 7 A 7 B 62 C 52 A 7
B 7
C. L I-- - - - - - .


Figure 19. Matrix of responses by Observer 5 to consonants distorted by infinite peak clipping.


Letters A, B, and C identify talkers.







Number of Responses

z v p P k 6 r w " 5
AB


C
A 7
A
B 7 1
C _ 61 .A 2
4 B I


C 8

A 6 B


A T nV B1
Ci)




0
t A 8 0B 8
U C------ 8 ------A C
A
b B
O .....8
A


A



Figure 20. Matrix of responses by Observer 6 to undistorted consonants.


Letters A, B, and C identify talkers.







Number of Responses
s O -z v p t k 6 .t r w rn n


A

C A

C A

C A Z B
C A



P B
C


A
V B
C A


P B
A t B
C A





C
A


C
A


Figure 21. Matrix of responses by Observer 6 to consonants distorted by infinite peak clipping.


Letters A, B, and C identify talkers.







Number of Responses s 0 z v p � k 6 .;t r W rn D J b

bBB
06
A

A 8
8 6 2 ---- --
C 5 1

A 1
B ,
C B
Al 2
Z B 8 4
C 4
A
V S 6
C A

tB 8 P c 8
A
t

A 8
k 8
A

C
A a

C -8
A 8
9 8 - - -


Figure 22. consonants.


Matrix of responses by Observer 7.to undistorted


Letters A, B, and C identify talkers.






Number of Responses


s 6 - z v p t k 6 , o, t


Srwrnn D


j.


A S B
C A

C A

C A Z B
C A

C A V B
C A

P B
A t B
C A k B
C A

C A
C A
C


Figure 23. Matrix of distorted by infinite


responses by Observer peak clipping.


7.to consonants


Letters A, B, and C identify talkers.


53- ~~
4 42 2
2 1 3



8
7 I

1 5
I T
17
I 5






771
8
35

17
8d 7






7
I I I
- '7-----------_ 7 i







Number of Responses


s 6 F z v p t k 6 o.;t I r wre n


C
A
C

C A

C A
z


C A



V B
C A
P
C A

C A
k 8




A

C
A

C
A C


Figure 24. Matrix of responses by Observer 8 to undistorted consonants.


Letters A, B, and C" identify talkers.






Number of Responses


S 09 *z v p t k 6 d 9 1 r w rn n


A

C A

C A
C A Z B
C A
B



C A



V
C A
Z a
c





A i-B
C A k 6
C





A

C
A

C
A
k


Figure 25. Matrix of responses by Observer Sto consonants distorted by infinite peak clipping.


Letters A, B, and C identify talkers.







Figure 26. Matrix of responses by Observer 9 to undistorted consonants.


Letters A, B, and C identify talkers.






Figure 27. Matrix of responses by Observer 9 to consonants distorted by infinite peak clipping.


Letters A, B, and C identify talkers.






Figure 28. Matrix of responses by Observer 10 to undistorted consonants.


Letters A, B, and C identify talkers.






Figure 29. Matrix of responses by Observer 10 to consonants distorted by infinite peak clipping.


Letters A, B, and C identify talkers.
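
The stimulus-response matrices of this appendix tabulate, for each talker and each stimulus consonant, how many times an observer gave each response. A minimal Python sketch of that bookkeeping follows. It is an illustration only: the function names and the sample trials are hypothetical and are not data from the study.

```python
from collections import Counter

# The twelve stimulus consonants named in Chapter II.
CONSONANTS = ["p", "b", "t", "d", "k", "g", "θ", "ð", "f", "v", "s", "z"]


def tally_responses(trials):
    """Count responses per (stimulus, talker, response) cell.

    `trials` is an iterable of (talker, stimulus, response) triples,
    e.g. ("A", "s", "f") when talker A said /s/ and the observer wrote /f/.
    """
    counts = Counter()
    for talker, stimulus, response in trials:
        counts[(stimulus, talker, response)] += 1
    return counts


def percent_correct(counts, stimulus):
    """Percentage of presentations of `stimulus` that were identified correctly."""
    total = sum(n for (s, _, _), n in counts.items() if s == stimulus)
    correct = sum(n for (s, _, r), n in counts.items() if s == stimulus and r == s)
    return 100.0 * correct / total if total else 0.0


# Hypothetical example: talker A's eight utterances of /s/,
# seven heard as /s/ and one heard as /f/.
example = [("A", "s", "s")] * 7 + [("A", "s", "f")]
print(percent_correct(tally_responses(example), "s"))
```

Because each talker produced eight utterances of each consonant, every talker row of a complete matrix should sum to eight, which gives a quick consistency check on a transcribed matrix.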

















BIBLIOGRAPHY


Ahmend, R., and Fatehchand, R., Effect of sample duration on
the articulation of sounds in normal and clipped
speech, J. Acous. Soc. Am., 31, 1959, 1022-1029.

Black, J. W., Accompaniments of word intelligibility, J. Speech
Hearing Dis., 17, 1952, 409-418.

Black, J. W., and Hixson, W. C., Number of axis crossings and
intelligibility of speech, J. Acous. Soc. Am., 31, 1959,
1384-1385.

Brandt, A. E., Forms of analysis for either measurement or
enumeration data amenable to machine methods, Proceedings, Computation Seminar, December 1949, N. Y., International Business Machines Corporation.

Davis, H., Stevens, S. S., Nichols, R. H., Jr., Hudgins, C. V.,
Marquis, R. J., Peterson, G. E., and Ross, D. A., Hearing Aids: An Experimental Study of Design Objectives.
Cambridge, Massachusetts: Harvard University Press,
1947.

Dukes, J. M. C., The effect of severe amplitude limitation on
certain types of random signal: a clue to the intelligibility of 'infinitely' clipped speech, Monograph No.
111R, Proc. Instn. of Elect. Engrs., 1954, 88-97.

Egan, J. P., Articulation testing methods, The Laryngoscope,
58, 1948, 955-991.

Harris, J. D., Combinations of distortion in speech, A.M.A.
Archives of Otolaryngology, 72, 1960, 227-232.

Harris, K. S., Cues for the discrimination of American English
fricatives in spoken syllables, Language and Speech, 1,
1958, 1-7.









Hirsh, I. J., Reynolds, E. G., and Joseph, M., Intelligibility
of different speech materials, J. Acous. Soc. Am., 26,
1954, 530-538.

Jacobson, R., Fant, C. G. M., and Halle, M., Preliminaries
to speech analysis, Tech. Rept. No. 13, Acoustics Lab.,
M.I.T., 1952.

Kryter, K. D., Licklider, J. C. R., and Stevens, S. S.,
Premodulation clipping in AM voice communication,
J. Acous. Soc. Am., 19, 1947, 125-131.

Licklider, J. C. R., The intelligibility of amplitude-dichotomized, time-quantized speech waves, J. Acous.
Soc. Am., 22, 6, 1950, 820-823.

Licklider, J. C. R., Bindra, D., and Pollack, I., The intelligibility of rectangular speech-waves, Am. J. Psychol.,
61, 1948, 1-20.

Licklider, J. C. R., and Pollack, I., Effects of differentiation, integration, and infinite peak clipping upon the intelligibility of speech, J. Acous. Soc. Am., 20,
1948, 42-51.

Littwin, S., Ed., Pulse Generators in Industrial Electronics.
New York: John F. Rider Publisher, 1964.

Miller, G. A., Language and Communication, New York: McGraw-Hill, 1951.

Miller, G. A., Heise, G., and Lichten, W., The intelligibility
of speech as a function of the context of the test
materials, J. Exp. Psychol., 41, 1951, 329-335.

Miller, G. A., and Nicely, P. E., Analysis of perceptual
confusions among some English consonants, J. Acous.
Soc. Am., 27, 1955, 338-352.

Pollack, I., and Pickett, J. M., Intelligibility of peak-clipped speech at high noise levels, J. Acous. Soc.
Am., 31, 1959, 14-16.

Snedecor, G. W., Statistical Methods, Fifth Edition, Ames,
Iowa: The Iowa State University Press, 1956.

Southworth, G. R., Speech digitizing techniques, J. Audio
Engr. Soc., 11, 4, October 1963, 349-352.








Velichkin, A. I., Amplitude clipping of speech, Soviet
Physics-Acoustics, 8, 2, Oct.-Dec. 1962, 130-134.

Vilbig, F., and Haase, K. H., Theoretical investigations to
reduce harmonic distortions in a clipping process,
J. Acous. Soc. Am., 29, 1957, 776.

















BIOGRAPHICAL DATA


Maurice Joseph was born November 25, 1926, at Astoria, New York. In February 1944 he was graduated from William Cullen Bryant High School in Astoria. From 1944 to 1946 he served in the United States Army. Following his discharge he returned to school and subsequently received the degree of Bachelor of Arts from Queens College, Flushing, New York, in February 1950. He was a graduate assistant at the University of Pittsburgh and received the degree of Master of Science in August 1952. After two years as a research assistant at Central Institute for the Deaf in St. Louis, Missouri, Mr. Joseph taught for ten years at Tulane University Medical School. In 1964 he was enrolled at the University of Florida where he has continued to work toward the degree of Doctor of Philosophy.

Mr. Joseph has published articles, as junior author, in the Journal of the Acoustical Society of America and in the Southern Medical Journal. He was granted a Special Rehabilitation Research Fellowship by the Vocational Rehabilitation Administration.







Maurice Joseph is married to the former Pauline Julia Bruckenstein and is the father of two children, Laura and David. He is a member of the American Speech and Hearing Association, the American Association for the Advancement of Science, the American Association of University Professors, and Sigma Alpha Eta.











This dissertation was prepared under the direction of the chairman of the candidate's supervisory committee and has been approved by all members of that committee. It was submitted to the Dean of the College of Arts and Sciences and to the Graduate Council, and was approved as partial fulfillment of the requirements for the degree of Doctor of Philosophy.


December 17, 1966




Dean, College of Arts and Sciences





Dean, Graduate School



SUPERVISORY COMMITTEE:




Chairman




Full Text

PAGE 1

INTELLIGIBILITY OF SELECTED CONSONANT SOUNDS DISTORTED BY INFINITE PEAK CLIPPING

By MAURICE JOSEPH

A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA
December, 1966

PAGE 3

ACKNOWLEDGMENTS

The author expresses sincere appreciation to his advisor and chairman, Dr. Donald Dew, for guidance throughout the planning and execution of this study, and to members of the dissertation committee for suggestions and encouragement. Special appreciation is expressed to Dr. Harry Hollien, and the faculty, staff, and students of the Communication Sciences Laboratory of the University of Florida for cooperation in all phases of the study. In addition, the author gratefully acknowledges Dr. A. E. Brandt's advice and aid in statistical treatment of the data.

The author wishes especially to thank his wife, Pauline, for her constant encouragement, inspiration, and untiring aid at all times.

Finally, the author expresses gratitude to the Vocational Rehabilitation Administration for financial aid through a Special Rehabilitation Research Fellowship.

PAGE 4

TABLE OF CONTENTS

LIST OF TABLES . . . v
LIST OF FIGURES . . . vi
CHAPTER
  I    BACKGROUND AND STATEMENT OF PROBLEM . . . 1
  II   PROCEDURE . . . 14
  III  RESULTS AND DISCUSSION . . . 28
  IV   SUMMARY AND CONCLUSIONS . . . 45
APPENDIX A  DETAILS OF THE CLIPPING APPARATUS . . . 47
APPENDIX B  SYSTEM CALIBRATION PILOT STUDY . . . 53
APPENDIX C  STIMULUS-RESPONSE MATRICES . . . 58
BIBLIOGRAPHY . . . 79
BIOGRAPHICAL DATA . . . 82

PAGE 5

LIST OF TABLES

Table                                                              Page
1  Vocabulary of nonsense syllables used as stimuli . . . 16
2  Summary of analysis of variance . . . 29
3  Difference between treatments for each consonant . . . 30
4  Matrix of significant differences between sounds under each distortion treatment . . . 34
5  Differences between voiced-voiceless cognate pairs subjected to infinite peak clipping . . . 37
6  Effect of order of treatment for each treatment . . . 41
7  Intelligibility scores for PB lists without distortion and with infinite peak clipping . . . 57

PAGE 6

LIST OF FIGURES

Figure                                                             Page
1      A comparison of a complex wave before and after infinite peak clipping . . . 4
2      Block diagram of equipment . . . 20
3      Structural scheme of factorial design . . . 24
4      Intelligibility scores of consonants . . . 33
5      Confusion matrix of responses to undistorted consonants . . . 40
6      Confusion matrix of responses to consonants distorted by infinite peak clipping . . . 43
7      Schematic of clipper amplifier module . . . 49
8      Output waveforms of modules . . . 50
9      Action of clipper on speech waveforms . . . 51
10-29  Matrices of stimulus-responses by ten observers under two treatment conditions . . . 59-78

PAGE 7

CHAPTER I

BACKGROUND AND STATEMENT OF PROBLEM

Introduction

Experimental studies have demonstrated that word lists retain intelligibility when subjected to distortion by infinite peak clipping (Licklider and Pollack, 1948; Licklider, Bindra, and Pollack, 1948). These findings have been interpreted to mean that the temporal pattern of zero-axis crossings of the speech waveform provides information that is in itself sufficient for perception of speech. A corollary inference is that clues provided by relative amplitude variations in the speech waveform may be discarded without degradation of intelligibility.

These interpretations of the effects of infinite peak clipping are of considerable theoretical importance: to the extent that they are true they describe an invariant factor in speech intelligibility. However, there are two reasons to suspect that these generalizations may need qualification. One reason is that the speech sample used in the early studies may have provided contextual clues to word perception in addition to the frequency and amplitude clues inherent in the speech wave. Thus, with multiple clues

PAGE 8

operating, the role of one physical parameter is more difficult to assess. A second reason to question the generalizations about zero-axis crossing and relative amplitude clues is that these clues may assume different importance for different types of speech. Factors that affect intelligibility of word lists may differ from those that affect individual sounds; moreover, it is possible that these factors may assume different importance among different individual speech sounds.

It may be argued that speech communication generally involves units larger than individual speech sounds, and that word intelligibility provides a closer approximation to intelligibility of conversational speech. It may be true that whole words, with contextual clues free to operate, have higher face validity as representative of speech in general. However, speech communication situations do exist--in military communications, for instance--in which identification of specific speech sounds may be essential. Moreover, fundamental generalizations about factors that determine intelligibility of speech should be capable of verification on molecular levels (figuratively) such as individual phonemes, consonant clusters, and syllables, as well as on the molar level of words and continuous discourse.

The focus of this study is on the individual speech sound. Specifically, the present study represents an attempt to determine the effects of infinite peak clipping on

PAGE 9

intelligibility of specific consonants. It also represents an attempt to minimize the opportunity for contextual clues to operate.

Infinite Peak Clipping

Definition

Peak clipping is a form of distortion in which amplitude is prevented from exceeding a certain maximum limit. Amplitude is held constant during times when, in a linear system, it would exceed the limiting level. When the positive and negative-going parts of the speech wave are equally limited, the effect is said to be "symmetrical peak clipping." If a wave thus limited is reamplified so that the clipped output wave equals the amplitude of the peaks of the original wave (before clipping), and if this operation is continued through several successive stages of clipping and reamplification until the waveform is approximately rectangular in shape, the operation is called "infinite peak clipping."

Line B of Figure 1 illustrates the result of infinite peak clipping on the waveform shown on Line A. Two characteristics of the clipped wave may be noted on this figure: the points (in time) that correspond to axis crossings in the undistorted wave are the same in the clipped wave, but the amplitudes have only two values, a single maximum excursion above and below the axis. In addition, all indications of differences in rise-time or fall-time, or of changes that did not cross

PAGE 10

Figure 1. A comparison of a complex wave before and after infinite peak clipping (after Licklider, Bindra, and Pollack, 1948). A. Undistorted. B. Same wave following infinite peak clipping.
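
As a numerical illustration of the definition and of Figure 1, the following Python sketch reduces a sampled waveform to a two-valued wave. It is not the apparatus of this study, which used analog amplifier modules (Appendix A). The function names, the test waveform, and the per-stage gain of 20 are assumptions made only for the example.

```python
import math


def infinite_peak_clip(wave):
    """Idealized infinite peak clipping of a list of samples.

    Every nonzero sample keeps its sign but takes the magnitude of the
    original peak, so only two amplitude values remain. Samples that
    are exactly zero are left at zero.
    """
    peak = max(abs(x) for x in wave)
    return [math.copysign(peak, x) if x != 0 else 0.0 for x in wave]


def staged_clip(wave, stages=3, gain=20.0):
    """Approximate the same result by repeated amplification and limiting.

    Amplifying by `gain` and then limiting at the original peak is
    equivalent to clipping at peak/gain and reamplifying to the peak.
    `stages` and `gain` are hypothetical values chosen for illustration.
    """
    peak = max(abs(x) for x in wave)
    out = list(wave)
    for _ in range(stages):
        out = [max(-peak, min(peak, x * gain)) for x in out]
    return out


def demo_wave(n=1000):
    """A two-component complex wave, one second long at n samples."""
    return [
        math.sin(2 * math.pi * 5 * k / n) + 0.5 * math.sin(2 * math.pi * 13 * k / n)
        for k in range(n)
    ]


if __name__ == "__main__":
    wave = demo_wave()
    clipped = infinite_peak_clip(wave)
    print(sorted(set(clipped)))  # the only amplitude values that remain
```

In both versions the sign of every sample is unchanged, so the axis crossings fall at the same points in time as in the undistorted wave, while differences in peak height and in rise-time or fall-time are lost.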

PAGE 11

the axis, are eliminated in the clipped wave.

Relevant literature

Effects on intelligibility. The most important study on the intelligibility of speech distorted by infinite clipping was that of Licklider and Pollack (1948). Phonetically balanced word lists recorded on discs by one talker were subjected to infinite peak clipping and presented to five listeners in a series of twenty-five sessions. Scores averaged approximately sixty-eight percent in the first listening session, but improved in succeeding sessions. Intelligibility scores of over ninety percent were achieved by the twenty-fifth session. The authors felt that this increase was independent of the listeners' increased familiarity with PB word lists but instead represented learning to understand clipped speech. This suggests that statements regarding the intelligibility of clipped speech should specify the observers' familiarity with clipped speech.

The effects of peak clipping in combination with other forms of distortion were also studied. Integration of the waveform, the equivalent of a low-pass filter slope of six dB per octave, severely degrades intelligibility of PB-50 word lists, if the integration occurs prior to clipping. Differentiation of the waveform, the equivalent of a high-pass filter slope of six dB per octave, performed before clipping, results in intelligibility slightly better than

PAGE 12

with clipping alone. Integration or differentiation of the waveform following the clipping had no effect on intelligibility (Licklider and Pollack, 1948). When combined with distortion by reduction of sound duration, subsequent clipping had little effect on the intelligibility of vowels, but did reduce the intelligibility of consonants (Ahmend and Fatehchand, 1959) in CVC and VC syllables.

Spectral effects. In attempting to predict or to explain the effects of infinite peak clipping under various conditions, it would be helpful to know the effects of clipping on the spectrum of speech. Unfortunately, the information available is scarce and somewhat equivocal. Only one report gives empirical measurements: Licklider, Bindra, and Pollack (1948), reporting the results of an intensity-frequency-time spectral analysis of the word "shoe-bench," both undistorted and peak clipped, state ". . . although many of the details of the pattern are changed by infinite peak clipping, the general plan of the terrain is by no means rendered unrecognizable. The main concentrations of low-frequency and of high-frequency energy are still in the same places despite the rearrangement of minor peaks." This report, even while minimizing the effects of clipping on the spectrum, does support the contention that changes do occur in the spectrum under infinite peak clipping.

Two theoretical papers, analyzing the clipping process

PAGE 13

mathematically, exhibit similar ambiguity. Velichkin (1962) concludes ". . . It is evident that amplitude clipping causes a broadening of the speech spectrum, but this broadening is slight, even with maximum clipping." Mathematical calculations by Dukes (1954) confirmed the likelihood that clipping would not seriously reduce the intelligibility of speech, but he adds, ". . . it is clear that the results have significance in respect of very long samples . . . With this formulation nothing can be deduced about the intelligibility of individual sounds, except of course that large deviations below the average must be relatively infrequent." He points out that in applying his formulas some of the values obtained represent averages over all possible unvoiced and voiced sounds.

The most serious shortcoming of the spectral information available, whether empirical or theoretical, is that it is based on relatively long-term averaged values. Apparently spectral changes, however small, do occur under infinite peak clipping, but are obscured by averaging over time. Specific moment-to-moment changes are not yet known in detail; this makes it difficult to predict the effects on brief speech sounds such as consonants.

Implications of previous studies on peak clipping

Two related implications of the Licklider and Pollack (1948) study have been considered of major importance. These concern the relative importance of dynamic amplitude variation

PAGE 14

and of the temporal pattern of zero-axis crossings in the speech waveform.

Amplitude clues. Since the speech wave subjected to infinite peak clipping assumes a dichotomous value of amplitude, the almost infinite range of possible amplitude values present in the undistorted wave is reduced to one value of maximum and minimum excursion. This, in effect, means that information regarding relative amplitudes in the speech wave is discarded. The implication drawn is that ". . . the variations in intensity from moment to moment appear not to be basic cues for the recognition of words," and that "the so-called dynamic characteristics of speech are not of vital importance for intelligibility. It is apparently just as well to reproduce all the fundamental speech sounds (or what is left of them after clipping) at the same intensity level as it is to preserve their normal intensities" (Licklider and Pollack, 1948).

Zero-axis crossings. It has been reasoned that since infinite peak clipping eliminates information regarding relative amplitudes, as well as on- and off-slope and wave shape, all that remains in the clipped wave is the temporal pattern of zero-axis crossings. The inference drawn is that the axis crossing information is sufficient to provide speech intelligibility (Licklider, 1950). It should be noted that this is negative evidence based upon the elimination of other possibilities.
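
The claim that only the temporal pattern of zero-axis crossings survives clipping can be checked on a sampled signal. The Python sketch below is an illustration under stated assumptions, not a procedure from the studies cited: the function names, the amplitude-modulated test wave, and the convention of treating a zero sample as positive are all choices made for the example.

```python
import math


def zero_crossings(wave):
    """Indices i at which the sign changes between sample i and i + 1.

    A sample that is exactly zero is treated as positive so that a
    crossing is never counted twice.
    """
    positive = [x >= 0 for x in wave]
    return [i for i in range(len(wave) - 1) if positive[i] != positive[i + 1]]


def two_valued(wave):
    """Idealized infinitely clipped wave with unit amplitude."""
    return [1.0 if x >= 0 else -1.0 for x in wave]


def crossing_rate(wave, sample_rate):
    """Zero-axis crossings per second for a wave sampled at `sample_rate` Hz."""
    return len(zero_crossings(wave)) / (len(wave) / sample_rate)


def demo_wave(n=1000):
    """A 5 Hz sine whose amplitude varies slowly between 0.5 and 1.5."""
    wave = []
    for k in range(n):
        t = (k + 0.5) / n
        envelope = 1.0 + 0.5 * math.cos(2 * math.pi * t)
        wave.append(envelope * math.sin(2 * math.pi * 5 * t))
    return wave


if __name__ == "__main__":
    wave = demo_wave()
    # The amplitude envelope is discarded by clipping; the crossings are not.
    print(zero_crossings(wave) == zero_crossings(two_valued(wave)))
```

The crossing rate computed here is a density of the kind Black and Hixson (1959) attempted to relate to intelligibility; the sketch shows only how such a count might be taken and says nothing about its perceptual significance.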

PAGE 15

Black and Hixson (1959) were unable to demonstrate a direct relationship between density of zero-axis crossings and intelligibility.

The implications of the Licklider and Pollack study may be paraphrased as follows:

1) The information provided by the temporal pattern of zero-axis crossings is necessary and sufficient for intelligibility of speech, and
2) The information provided by the dynamic pattern of amplitude variations is not necessary for intelligibility of speech.

The point of view that has stimulated the present study is that:

1) While the information provided by the temporal pattern of zero-axis crossings is probably necessary, it is possible that there are specific specimens of speech for which it may not be sufficient, and
2) While the information provided by the dynamic pattern of amplitude variations is probably not necessary for intelligibility of many speech samples, it is possible that there are some specific speech sounds for which this information may be necessary.

Applications of peak clipping in communication

Thus far infinite peak clipping has been treated in terms of distortion. It may also be viewed as a means of

PAGE 16

simplifying the speech wave for efficient transmission.

Speech encoding. Licklider and Pollack (1948) observe that infinite peak clipping can reduce speech to a bivariate code more efficiently than pulse-modulation procedures. Not only is the speech wave easily encoded by infinite peak clipping, but it may be decoded by the human ear with no further decoding apparatus necessary other than the transducer that would ordinarily be used in the transmission system. Southworth (1963) describes several speech digitizing techniques based upon infinite clipping; these include pulse-number modulation and delta modulation techniques.

Broadcast transmission. One of the earliest applications of peak clipping was in broadcast transmission. Premodulation clipping has been found to increase the efficiency of power use in broadcast transmission (Kryter, Licklider, and Stevens, 1947).

Protective limiting. Clipping has been found to be useful in protecting the ear from high-energy speech peaks (Pollack and Pickett, 1959). This was investigated in detail for applications in hearing aid design (Davis, et al., 1947).

In all of the above applications, the ultimate usefulness will depend to some extent upon how intelligible peak clipped speech remains in relation to the needs of the specific situation. It is already known that word intelligibility is satisfactory under peak clipping. To the extent that specific

PAGE 17

communication situations demand fine discriminations, the intelligibility of peak clipped speech must be determined in greater detail.

Limitations of previous studies on peak clipping

Influence of test materials. Experimental results on intelligibility are highly dependent on the type of test materials used. Under identical acoustic conditions, an intelligibility score may vary by as much as ninety percent depending upon whether the test materials are digits or nonsense syllables (Miller, Heise, and Lichten, 1951). Differences in intelligibility have been associated with meaning, number of syllables, and, to a smaller extent, syllabic stress (Hirsh, Reynolds, and Joseph, 1954). Since the currently available information about intelligibility of speech subjected to infinite peak clipping is based on monosyllabic words, the intelligibility of other speech samples, such as individual phonemes, remains unknown until tested empirically.

[...] of contextual clues. An important factor is the information provided by the context of sounds in a word or syllable; this may affect the number of alternative responses available if a particular sound is unintelligible (Miller, 1951). If, for instance, a test word is /stræ_/, with the final consonant unintelligible to the observer, and the test consists of nonsense syllables, there are more than twenty possible final consonants available as responses. These include the

PAGE 18

nonsense words: stram, strat, strab, strad, stran, strak, strang, strag, stral, strav, straf, straz, strath (voiced and voiceless), stras, strash, strach. If, however, the test vocabulary consisted of real English words, then there is probably but one possible response: strap. Thus, if the final sound were rendered unintelligible by some distortion, on the real word test the observer's knowledge of the statistical probabilities of occurrence of certain sounds, combinations of sounds, and orders of sounds would probably enable him to correctly identify the word. If a distortion, such as infinite peak clipping, produced a systematic effect on the intelligibility of certain sounds, this could be obscured in PB-50 scores if word contexts led to correct identification of words in which the sounds occurred. As J. D. Harris (1960) has observed:

    In order to uncover the contribution of each type of physical cue alone, it is not enough to eliminate it by some legerdemain in the laboratory. When this is done . . . intelligibility is often unaffected. A false conclusion could in that case be reached . . . that the cue eliminated is of minor importance in speech communication. What is necessary is to eliminate progressively one, two, and more cues simultaneously.

Since contextual clues may contribute to the intelligibility of distorted speech, it is considered important in this study to control contextual clues while studying the effects of distortion on intelligibility of individual sounds.

PAGE 19

Statement of the Problem

This study addresses itself to three questions:

1) Are individual speech sounds affected in intelligibility by infinite peak clipping?
2) If so, are these effects equal from one phoneme to another?
3) If individual speech sounds are not equally affected in intelligibility by infinite peak clipping, is there a systematic pattern in the way in which sounds are affected?

PAGE 20

CHAPTER II

PROCEDURE

Stimulus materials

Selection of stimuli. For this study twelve consonants /p b t d k g θ ð f v s z/ were chosen which represent a variety of positions and manners of articulation, as well as acoustic effects. In particular, the selected consonants include examples of fricatives and plosives, of voiced-voiceless cognate pairs, and of lingua-alveolar, lingua-dental, bilabial, and velar placements. They also represent the following distinctive features: contrasts of strident-mellow, tense-lax, continuous-discontinuant, grave-acute, and compact-diffuse.

Obviously some of these consonants, namely the plosives, cannot be spoken without an adjacent vowel. Moreover, it is well known that the acoustic characteristics of consonants can be altered by the vowel context in which they are produced. Consequently, in order to counterbalance the vowel environment, the selected consonants were embedded in a series of VCV syllables.

Specifically, a corpus of ninety-six nonsense Vowel-Consonant-Vowel words was designed, drawing upon

PAGE 21

the twelve consonants in combination with four stressed vowels and the neutral schwa sound. The counterbalancing was accomplished by selecting stressed vowels /i æ a u/, which represent extremes of position on a vowel quadrilateral, and combining them with each of the consonants in both the pre-vocalic and post-vocalic positions. By adding the schwa to each of these CV and VC combinations in the initial or final position, respectively, VCV syllables were formed which correspond to trochaic or iambic stress patterns. These may be schematized as either stressed vowel + consonant + schwa, or schwa + consonant + stressed vowel. Thus for any replication of the entire corpus there are eight utterances of each consonant embedded in different vocalic environments. The entire set of stimulus words is presented in Table 1.

Because there are individual differences in the production of speech sounds, it was deemed desirable to use three different talkers to produce samples of the syllables. The three adult male talkers were chosen from the Communication Sciences faculty. Each talker had extensive training and experience in phonetic transcription.

In preparation for recording spoken examples of the selected stimuli, a phonetic transcription was made of the ninety-six nonsense syllables. The transcriptions were replicated three times, once for each talker, and arranged in one complete random order. On the final copies of this

PAGE 22

Table 1. Vocabulary of nonsense syllables used as stimuli.

isə   æsə   əsi   əsæ   asə   usə   əsa   əsu
iθə   æθə   əθi   əθæ   aθə   uθə   əθa   əθu
ifə   æfə   əfi   əfæ   afə   ufə   əfa   əfu
izə   æzə   əzi   əzæ   azə   uzə   əza   əzu
iðə   æðə   əði   əðæ   aðə   uðə   əða   əðu
ivə   ævə   əvi   əvæ   avə   uvə   əva   əvu
ipə   æpə   əpi   əpæ   apə   upə   əpa   əpu
itə   ætə   əti   ətæ   atə   utə   əta   ətu
ikə   ækə   əki   əkæ   akə   ukə   əka   əku
ibə   æbə   əbi   əbæ   abə   ubə   əba   əbu
idə   ædə   ədi   ədæ   adə   udə   əda   ədu
igə   ægə   əgi   əgæ   agə   ugə   əga   əgu
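
The construction of the corpus in Table 1, and of the 288-item reading list made from it, can be sketched in a few lines of Python. This is an illustration of the design as described, not the author's procedure: the function names, the use of a seeded pseudo-random shuffle, and the plain letter "a" for the open stressed vowel are assumptions.

```python
import itertools
import random

CONSONANTS = ["s", "θ", "f", "z", "ð", "v", "p", "t", "k", "b", "d", "g"]
STRESSED_VOWELS = ["i", "æ", "a", "u"]  # "a" stands for the open vowel as printed
SCHWA = "ə"


def build_corpus():
    """Return the 96 VCV syllables: 12 consonants x 4 vowels x 2 stress patterns."""
    syllables = []
    for consonant, vowel in itertools.product(CONSONANTS, STRESSED_VOWELS):
        syllables.append(vowel + consonant + SCHWA)  # trochaic: stressed vowel first
        syllables.append(SCHWA + consonant + vowel)  # iambic: stressed vowel last
    return syllables


def reading_list(talkers=("A", "B", "C"), seed=0):
    """Replicate the corpus once per talker and place all items in one random order.

    The seed is a hypothetical convenience so that the order can be reproduced.
    """
    items = [(talker, syllable) for talker in talkers for syllable in build_corpus()]
    random.Random(seed).shuffle(items)
    return items


if __name__ == "__main__":
    corpus = build_corpus()
    print(len(corpus), len(reading_list()))
```

Each consonant appears in eight syllables of the corpus, so a reading list for three talkers contains twenty-four utterances of each consonant, eight per talker.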

PAGE 23

list of 288 phonetically transcribed syllables, individual items were identified by talker so that the recordings could be made in a single session.

Recording the stimulus tape. For two reasons, the spoken examples of stimuli were tape recorded during a single session. First, if recordings were made separately a subsequent splicing or rerecording step would have been necessary to randomize the stimuli produced by different talkers. However, by recording all the talkers in a single session, this randomization was accomplished prior to the recordings. Secondly, since the talkers were instructed to correct any errors in the production of the stimuli, they and the experimenter provided an initial check of each other's articulation.

In the recording session the talkers were seated in a series 1200 Industrial Acoustic Corporation room approximately two feet from the Altec M-20 microphone system, which was coupled to an Ampex model 351-C full-track tape recorder located outside the sound-treated room. Each talker-listener had a complete list of the stimuli indicating which talker was to read each of the words. Talkers were instructed to use the carrier phrase "now" for each VCV word with no intervening pause. By using such a phrase an opportunity for the talker to reach a stable vocal level was provided before his production of the VCV syllable. The word "now" was chosen, rather than a more conventional "say the word" or "write the

PAGE 24

word," because it does not include any of the sounds used as stimuli. This avoided the possibility of a constant standard of reference for one or more of the test stimuli. The experimenter provided a flash of light as a cue for the beginning of each word in order to space stimuli at four-second intervals.

Following the recording session, the tape was edited to remove errors and irrelevant materials and to insert an identifying number before each group of ten stimuli. Since one of the talkers tended to use a slightly lower vocal level than the others, the gain was adjusted during the rerecording process to equalize the levels of the speech peaks on the carrier phrase.

Observations

Observers. The criteria for selection of observers were: 1) training in phonetics, and 2) normal speech and hearing. Hearing sensitivity within normal limits was determined by means of individual screening tests at 250, 500, 1000, 2000, 4000, and 6000 Hz. This testing was accomplished with a Beltone model 10 A portable audiometer, set at a screening level of 15 dB relative to the 1964 I.S.O. standard.

Ten observers who met the criteria were recruited from among the students and faculty associated with the Communication Sciences program at the University of Florida. This included eight male and two female observers whose ages ranged

PAGE 25

from twenty to fifty years. For each of the observers, the experimental session was the first exposure to speech subjected to infinite peak clipping.

Playback system. The complete system as used in the experimental trials is depicted schematically in Figure 2. The stimulus tape was played from an Ampex 351-C tape recorder into one channel of a Marantz model 7 preamplifier with controls set in the normal flat frequency response position. The output of this channel was led to a double-pole double-throw selector control which either connected or bypassed the clipper. The clipping unit, consisting of three cascaded printed-circuit amplifier modules operated beyond their linear range, is described in detail in Appendix A. The "pass-clip" switching control output led to a Hewlett-Packard 350-D decade attenuator set. The signal was then reamplified through the second channel of the preamplifier connected to a Marantz model 8-B power amplifier. The output of the power amplifier drove an AR-3 acoustic suspension loudspeaker system located in the I.A.C. room. A dual-beam Tektronix type 564 oscilloscope was used to constantly monitor the speech waveforms at the input of the clipping unit and at the output of the power amplifier. Accordingly, both undistorted and clipped waveforms could be observed simultaneously.

Playback level. Stimuli were presented to observers at 70 dB Sound Pressure Level. In order to achieve the desired

PAGE 26

Figure 2. Block diagram of equipment.

PAGE 27

level consistently for all experimental trials, a preliminary calibration was performed. A General Radio sound level meter, located at the position in the I.A.C. room where observers would sit during experimental trials, was used to determine sound pressure levels. The RMS voltage reading at the input to the loudspeaker which produced a 70 dB Sound Pressure Level at the observer position was noted and used as the reference setting during experimental trials. The signal used to set voltage readings was a 1000 Hz calibration tone. The calibration tone was recorded on the test tape at a V.U. level that corresponded to the average V.U. of the talkers' speech peaks on the tape; the signal source for the recording was a General Radio model 1304-B beat frequency audio generator. RMS voltages at the transducer input were determined with a Ballantine model 321-C A.C. Vacuum Tube Voltmeter. Sound Pressure Levels were read on the B scale of the sound-level meter.

System calibration. In order to determine whether the present experimental apparatus was comparable with that of previous studies, a system calibration was performed. This took the form of a pilot study using a speech sample similar to that used in the Licklider and Pollack (1948) study. Details are presented in Appendix B. The apparatus was found to be comparable.

Conducting experimental trials. Observers were seated

PAGE 28

in one of three seats placed at a distance of two and one-half feet from the loudspeaker. Half of the observers heard the test list under the "clip" treatment first and "pass" (undistorted) second; the other five observers heard the lists with treatment conditions in the reverse order. The following recorded instructions were presented to the observers before the VCV list under each treatment condition:

Transcribe only the consonant in the word. Do not transcribe the word "now" which comes before each word. For example, if you hear "Now eje," write the symbol for /j/. If you hear "Now emo," write the symbol for /m/. To help you keep your place, after each group of ten words the number of the following word will be given.

Just prior to the instructions for the "clip" condition, whether it came first or second in order, certain recorded materials were presented to provide some familiarization with the sound of clipped speech. These materials included one unfamiliar passage, a paragraph from an outdated news magazine, and two passages which were presumed to be very familiar: the Pledge of Allegiance to the flag and the digits from one to twenty.

A brief rest period was provided between treatment conditions. At this time the voltage to the loudspeaker was rechecked and the tape was advanced to the position for the following treatment. The listening session required about

PAGE 29

one hour to complete both treatment conditions.

Data Reduction

Analysis of variance. The primary analysis of results used a design with four factors: Consonant (C), Observer (O), Talker (T), and Distortion (D). Factor C consisted of the twelve consonants used in this study, Factor O represented ten observers, and Factor T represented three talkers. The two treatments in Factor D were the distorted, or "clip," condition and the undistorted, or "pass," condition. The latter was included as the experimental control for the effects of clipping. The design utilized a mixed model in which the Consonant and Treatment factors were considered fixed effects and the Observer and Talker factors were considered random effects. The structural design is depicted in Figure 3. The number of correct responses (from zero to eight) made by a single observer to a talker's productions of a given consonant was entered in the individual cell. For purposes of this analysis, differences among vowels and stress patterns in syllables were ignored; the syllable was viewed simply as the vehicle for counterbalancing a variety of coarticulation effects among the test stimuli, and as a method of limiting the number of response alternatives for the total VCV vocabulary in order to control the contextual variable.

Individual comparisons. A factorial design makes

PAGE 30

Figure 3. Structural scheme of factorial design.

PAGE 31

possible the analysis of two or more experimental variables simultaneously, both of their individual effects and of their interactions with one another. In an analysis of variance, the F ratios obtained are evaluated with reference to the theoretical distribution of F in order to test the significance of differences among the treatment means. It is often deemed desirable to investigate the sources contributing to the significance of a given variable. This may be accomplished by making specific comparisons of the significance of differences among means of the levels or categories of a variable. For instance, in the present study the information that the Consonant factor had statistical significance would not indicate which consonants differed from one another, but comparisons among the individual consonants or groups of consonants could provide this information.

For enumerative data such as those collected in this study, the chi-square test provides an appropriate test of significance. The present data represent the number of correct responses that were actually obtained compared with the number of correct responses possible. According to Snedecor (1956), the basic chi-square formula may be verbalized as: "$\chi^2$ is the sum of such ratios as (deviation squared)/(hypothetical number)." The factorial chi-square, developed by Brandt and Snedecor (Brandt, 1949), represents an alternative method of calculation of chi-square that is easily

PAGE 32

adaptable to use in a factorial design. In this method the sums of squares are multiplied by a chi-square factor to determine the value of chi-square associated with a given comparison. The chi-square factor represents $e^2 / [o\,(e - o)]$, where $e$ equals the total possible sum of scores and $o$ equals the obtained sum of scores. The sum of squares associated with the comparison is $d^2 / [f\,(s_1^2 + s_2^2)]$, where $d$ equals the difference between sums of scores of the quantities being compared, $f$ is the frequency of each item being compared, and $s_1$ and $s_2$ are the number of items comprising each half of the comparison. For example, if one sound is compared with another, $(s_1^2 + s_2^2)$ would equal twice one squared, or two; if six sounds were compared with six other sounds, $(s_1^2 + s_2^2)$ would equal twice six squared, or seventy-two. The obtained value of chi-square is tested for the number of degrees of freedom appropriate to the effect or interaction for which the comparison is made.

This method was used to make voiced-voiceless comparisons, fricative-plosive comparisons, and individual consonant comparisons. It was also used to test the effect of order of presentation of treatments.

Stimulus-response matrices. Information about types of error responses was displayed in confusion matrices which were compiled for each observer (see Appendix C), similar to those described by Miller and Nicely (1955). These matrices preserved

PAGE 33

specific responses made by observers to each talker's stimuli.

PAGE 34

CHAPTER III

RESULTS AND DISCUSSION

Analysis of Variance

A summary of the results of the analysis of variance is presented in Table 2.

Main effects

Two main effects were found significant. First, the Distortion factor was significant, indicating that clipping does produce a decrement in consonant intelligibility. The total percent of correct responses under the "pass" condition was 94.6%; the corresponding score for clipped speech was 76.3%. This gross effect of clipping on all stimuli together provides an affirmative answer to the experimenter's first question: Are individual speech sounds affected by infinite peak clipping?

The Consonant main effect was also significant. This may be due to an inherent difference in consonant intelligibility, whether clipped or undistorted. It may be noted in Table 3 that for both /θ/ and /ð/, intelligibility scores were lower than for other consonants under both conditions.

PAGE 35

Table 2. Summary of analysis of variance.

Source                  d.f.    Mean Square       F      Significance
Consonant (C)            11        92.45         9.44        **
Observer (O)              9         3.18         0.33        N.S.
Talker (T)                2         0.05         0.01        N.S.
Distortion (D)            1       387.20        39.55        **
CxO                      99        13.07         1.29        *
CxT                      22        52.05         5.14        **
CxD                      11       142.71        14.08        **
OxT                      18         2.23         0.22        N.S.
OxD                       9        48.99         4.83        **
TxD                       2       200.53        19.79        **
CxOxT                   198         7.81         0.63        N.S.
CxOxD                    99        20.57         1.65        *
CxTxD                    22        80.19         6.44        *
OxTxD                    18        26.69         2.14        **
CxOxTxD                 198        12.45
pooled residual (a)
  (OxT, CxOxT, CxOxTxD) 414         9.79
pooled residual (b)
  (CxOxT, CxOxTxD)      396        10.13

Error term for main effects was pooled residual (a).
Error term for two-way interactions was pooled residual (b).
Error term for three-way interactions was CxOxTxD.
**   .01 level of significance
*    .05 level of significance
N.S. not significant
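The pooling of residual terms and the formation of F ratios in Table 2 can be checked with a short calculation. The sketch below is illustrative only: it assumes that each pooled mean square is the sum of squares of its component sources divided by their combined degrees of freedom, and that the printed F ratios were formed from the rounded mean squares. The function names are hypothetical and do not come from the original analysis.

```python
# Degrees of freedom and mean squares as printed in Table 2.
RESIDUAL_SOURCES = {
    "OxT": (18, 2.23),
    "CxOxT": (198, 7.81),
    "CxOxTxD": (198, 12.45),
}


def pooled_mean_square(names):
    """Pool sources: combined sum of squares over combined degrees of freedom."""
    terms = [RESIDUAL_SOURCES[name] for name in names]
    total_df = sum(df for df, _ in terms)
    total_ss = sum(df * ms for df, ms in terms)  # SS = d.f. x mean square
    return total_df, total_ss / total_df


def f_ratio(effect_ms, error_ms):
    """F ratio of an effect mean square to its error mean square."""
    return effect_ms / error_ms


# Pooled residual (a): error term for main effects.
df_a, ms_a = pooled_mean_square(["OxT", "CxOxT", "CxOxTxD"])
# Pooled residual (b): error term for two-way interactions.
df_b, ms_b = pooled_mean_square(["CxOxT", "CxOxTxD"])

# Assumption: F ratios use the mean squares rounded to two decimals.
f_distortion = f_ratio(387.20, round(ms_a, 2))
f_consonant = f_ratio(92.45, round(ms_a, 2))

print(df_a, round(ms_a, 2))
print(df_b, round(ms_b, 2))
print(round(f_distortion, 2), round(f_consonant, 2))
```

Under these assumptions the pooled values agree with the 414 and 396 degrees of freedom and the 9.79 and 10.13 mean squares printed in the table; a few printed F ratios differ from this calculation in the last decimal place, presumably through rounding.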

PAGE 36

Table 3. Difference between treatments for each consonant.

               Percent Correct
Consonant      Pass       Clip       Chi-square
s              98.8       94.6          1.67
θ              85.8       38.8        213.56
f              89.6       70.8         33.87
z              99.2       83.3         25.44
ð              65.8       27.5        141.56
v              97.9       74.6         52.45
p             100.0       87.5         15.05
t              99.6       91.3          6.69
k              99.6       83.8         24.15
b              99.2       84.2         21.68
d              99.6       87.9         13.11
g             100.0       91.3          7.38

With 11 df, chi-square needed for significance at .05: 19.7; at .01: 24.7.
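The factorial chi-square described in Chapter II can be illustrated with the Table 3 data. The following is a minimal sketch, not the author's worksheet. It assumes that the chi-square factor is computed over the whole experiment (5,760 possible responses, of which 4,920 were correct according to the sums of scores in Table 6), that the frequency f is the 240 stimuli per consonant per treatment, and that the counts of correct responses can be recovered by rounding the printed percentages. These assumptions reproduce several of the printed values, but they remain inferences.

```python
STIMULI_PER_CONSONANT = 240  # 10 observers x 3 talkers x 8 syllables
TOTAL_POSSIBLE = 12 * STIMULI_PER_CONSONANT * 2  # 12 consonants, 2 treatments
TOTAL_OBTAINED = 1374 + 1350 + 1135 + 1061  # sums of scores from Table 6


def chi_square_factor(e, o):
    """e^2 / [o (e - o)]; e = total possible score, o = obtained score."""
    return e ** 2 / (o * (e - o))


def comparison_chi_square(sum_1, sum_2, f, s1=1, s2=1,
                          e=TOTAL_POSSIBLE, o=TOTAL_OBTAINED):
    """Chi-square for one comparison: sum of squares times chi-square factor."""
    d = sum_1 - sum_2  # difference between the two sums of scores
    sum_of_squares = d ** 2 / (f * (s1 ** 2 + s2 ** 2))
    return sum_of_squares * chi_square_factor(e, o)


def correct_count(percent):
    """Recover a count of correct responses from a printed percentage."""
    return round(percent / 100 * STIMULI_PER_CONSONANT)


# Pass versus clip for single consonants (s1 = s2 = 1, so s1^2 + s2^2 = 2).
chi_s = comparison_chi_square(correct_count(98.8), correct_count(94.6),
                              f=STIMULI_PER_CONSONANT)
chi_theta = comparison_chi_square(correct_count(85.8), correct_count(38.8),
                                  f=STIMULI_PER_CONSONANT)

print(round(chi_s, 2), round(chi_theta, 2))
```

With these inputs the sketch returns values that agree with the Table 3 entries for /s/ and /θ/ to two decimal places. For a comparison of six sounds with six others, `s1` and `s2` would both be set to 6, giving the divisor of seventy-two mentioned in the text.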

PAGE 37

The main effects for Talker and for Observer were not found to be significant. This was reassuring, since both of these factors were potential sources of considerable experimental error. Since over-all differences among talkers and observers were not statistically significant, many of the scores to be reported are based upon the pooled talker and observer scores.

Interactions

Despite the lack of significance of the Talker and Observer main effects, there were several significant interactions involving talkers and observers. Visual inspection of the data indicated that the Consonant x Talker interaction may have been due to a higher score of correct responses to /θ/ as spoken by talker A than by the other talkers, and a higher score on /d/ as spoken by talker C. Furthermore, the Talker x Distortion interaction showed significance since talker C yielded a higher score than talkers A and B for the "pass" condition but a lower score than the others under clipping. These interactions involving talkers tend to confirm the advisability of using more than one talker in a study such as this. Divergent over-all scores under clipping between two observers seemed to be the basis of the significant Observer x Distortion interaction.

Of major importance in this study is the Consonant x Distortion interaction. The significance of this

PAGE 38

interaction provides an answer to the experimenter's second question: Are these effects equal? The effect of clipping on individual sounds is not equal.

Individual Comparisons

Specific consonants

The difference between distortion treatments for each consonant is presented in Table 3, in which the percentage of correct responses is compared for each consonant. For almost half of the consonants that were included as stimuli, the intelligibility under clipping was not significantly different from that for undistorted speech. The sounds that were affected little were /s g t d p/. Significantly different at the .01 level were /ð θ v f/. Of the consonants that were affected by clipping, the decrement in percent of correct responses was as great as 47%. These relationships may be visualized easily in Figure 4.

Considering each treatment separately, the difference from one consonant to another is shown in Table 4. This table is arranged in two matrices; the upper matrix represents the "pass" condition and the lower represents the "clip" condition. In each matrix, the presence of an asterisk at the intersection of a column and row indicates a statistically significant difference between the two corresponding sounds in the margins. A blank space indicates that for the treatment under consideration, the difference between the two sounds

PAGE 39

Figure 4. Intelligibility scores of consonants. A. Open bar indicates undistorted condition. B. Closed bar indicates infinite peak clipped condition.

PAGE 40

Table 4. Matrix of significant differences (a) between sounds under each distortion treatment.

[Matrix body not legible in this copy.]

Sounds are arranged in order of decreasing intelligibility under clipping.
(a) Significance level .01.

PAGE 41

(with respect to the number of stimuli correctly identified) is not statistically significant at the .01 level. Under the "pass" condition, /ð/ is statistically different from all of the other sounds. Under the "clip" condition, there are four consonants that are uniformly less intelligible than the others; these are /ð θ f v/. Reference to Table 4 makes it possible to ascertain the significance of the difference in intelligibility, under either treatment, between any two of the test stimuli.

Results of the present study seem to support the observations made by Licklider and Pollack that a speech waveform can retain intelligibility even when stripped of relative amplitude clues so that only the temporal pattern of zero-axis crossings is retained. In the present study, data are also presented which show that the extent to which this is true depends upon the speech sample.¹ With contextual clues controlled, the intelligibility of individual consonant sounds in VCV syllables ranged from below thirty percent to scores above ninety percent, for observers' first exposure to speech distorted by infinite peak clipping. The differential effect

¹ A pilot study was conducted as a calibration procedure to determine the equivalence of the present experimental situation to that of the Licklider and Pollack study. It was concluded that the two were comparable. See Appendix B for details.

PAGE 42

of clipping on individual consonants is emphasized by noting the scores for the sounds that were the most affected and those that were the least affected by clipping. Scores for /θ/ and /ð/ were 38.8% and 27.5%; compared with the scores obtained without distortion, these represent decrements of 47.0% and 38.3% respectively. The score of 95.0% for /s/ represents a decrement of only 3.6%, which was not statistically significant.

Voiced-voiceless grouping

The value of chi-square associated with the grouped scores of /s θ f p t k/ and grouped scores of /z ð v b d g/ under clipping was 0.36; the value necessary for significance at the .05 level was 19.70. There were no significant differences under clipping between pairs of voiced-voiceless cognate sounds, as shown in Table 5. Thus, clipping produced the same degree of effect on intelligibility in the voiced and voiceless sounds. Since these groups are primarily differentiated on the basis of the presence or absence of low-frequency periodic excitation, it is to be expected that clipping would not affect recognition of voicing.

Fricative-plosive grouping

In comparing consonants grouped according to the fricative-plosive classification, a value of chi-square of 46.14 was obtained; the value necessary for significance at the .01 level was 24.70. It is apparent that the fricatives

PAGE 43

Table 5. Differences between voiced-voiceless cognate pairs subjected to infinite peak clipping.

Consonant Pair     Chi-square
s  z                 14.21
θ  ð                 12.19
f  v                  1.35
p  b                  1.07
t  d                  1.07
k  g                  5.43

With 11 df, value of chi-square needed for significance at .05: 19.7; at .01: 24.7.

PAGE 44

were different, as a group, from the plosives. Intelligibility of the fricatives was considerably lower. However, this must be qualified; two of the fricatives, /s/ and /z/, were relatively unaffected by clipping, while the other four, /θ ð v f/, were severely degraded in intelligibility. Consequently, it appears that the distinguishing characteristics of fricatives are obscured more than those of plosives, and furthermore, that the characteristics of the dental fricatives are obscured more than those of the alveolar fricatives. This provides an answer to the third question: Is there a systematic way in which sounds are affected by peak clipping? The following reasons are offered in partial explanation:

1) Cavity resonance. The four severely affected fricatives are formed at the teeth, so far forward in the mouth that they are relatively free of oral cavity resonance effects. The friction component of these sounds consists only of the turbulence of the breathstream between tongue and teeth or between lip and teeth, radiating into the air (plus voicing for /ð/ and /v/). It is evident that without oral cavity resonance there is little basis for distinguishing the lingua-dental sounds from the labio-dental sounds.

2) Friction and vocalic components of fricatives. An experiment performed at the Haskins Laboratories seems especially relevant to this discussion. Harris (1958) separated the friction components of fricative sounds in VC

PAGE 45

syllables from the vocalic portions and combined the friction component of one fricative with the vocalic portion associated with a different fricative. She found that identification of /s/ and /z/ could be made by observers entirely on the basis of the friction component, but that identification of /f v θ ð/ depended on the combination of both friction and vocalic portions. Since in the present study /s/ and /z/ retained such high intelligibility, it might be inferred that the removal of relative amplitude clues did not affect the transmission of information inherent in the friction portion of fricatives, but that it did affect the vocalic portion.

3) Balance of noise. The /θ/ and /ð/ are characterized by a low intensity noise, while the /s/ and /z/ are considered to have a high intensity noise component (Jacobson, Fant, and Halle, 1952). Since one characteristic of infinite peak clipping is to equalize amplitudes to a uniformly high level, it is possible that clipping might do more violence to the naturally lower noise levels of /θ/ and /ð/ than to the normally higher noise levels of /s/ and /z/.

4) Durational clues. Still another explanation can be made on the basis of the clue of duration. While traditionally the fricatives are considered longer in duration than plosives, Miller and Nicely (1955) included /s/, /z/, and the lateral sibilant fricatives in one class, representing the consonants of longest duration, but the dental fricatives were placed in

PAGE 46

the same category of duration as the plosives. Since temporal characteristics would be unchanged by clipping, this would provide an additional means of identifying the /s/ and /z/. It would also help to explain certain errors that occurred, in which /p t k/ were the responses to /f/ and to /v/.

5) Affrication in plosives. A factor related to confusions between fricatives and plosives may be the affrication that typically follows plosives in American speech. This affrication, when amplified by peak clipping, might tend to make the clipped plosives perceptually resemble the fricatives.

Order of treatment

The possibility of an order effect based on the procedure in which five observers received the "pass" condition first and the other five received the "clip" condition first was investigated. As Table 6 shows, the order of presentation did not yield a significant difference for either the "pass" or the "clip" condition.

Error Responses

Confusion matrices were prepared which combined results for all talkers and for all observers. Figures 5 and 6 show types of responses actually made to the stimuli and their frequency of occurrence. The individual scores shown in these figures are based on ten observers' pooled responses to the eight syllables spoken by three talkers for each consonant, a total of 240 stimuli per consonant.

PAGE 47

Table 6. Effect of order of treatment for each treatment.

                 Sum of Scores, by Order
Treatment      Pass First     Clip First     Difference     Chi-square
Pass              1374           1350            24             1.60
Clip              1135           1061            74            15.26

With 9 df, value of chi-square needed for significance at .05: 16.9; at .01: 21.7.

PAGE 48

Figure 5. Confusion matrix of responses to undistorted consonants. Axes: stimulus; percent response on pass treatment.

PAGE 49

Figure 6. Confusion matrix of responses to consonants distorted by infinite peak clipping. Axes: stimulus; percent response on clip treatment.

PAGE 50

Generally, under the "pass" condition most response errors could be classified in the same voicing and manner of release category as their respective stimuli. However, under the "clip" condition the types of error were divergent. There were very few instances of voiced-voiceless confusions, but there were many confusions on the plosive-fricative dimension. These tended to be unidirectional; that is, fricatives, especially /θ/ and /ð/, were frequently identified as plosives, but plosives were rarely heard as fricatives.

There were other interesting examples of confusions that were not reciprocal: twenty-eight percent of the responses to /θ/ were /f/, but only seven percent of the responses to /f/ were /θ/. Similarly, forty-three percent of the responses to /ð/ were /v/, but only nine percent of the responses to /v/ were /ð/. Under the "pass" condition the confusions between /θ/ and /f/ were approximately equal, but confusions between /ð/ and /v/ exhibited the same sort of asymmetry as under clipping.

There were many more types of error responses used by observers under the "clip" than the "pass" condition. This was more noticeable for the dental fricatives than for other sounds.
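The tallying that underlies a confusion matrix, and the kind of non-reciprocal confusion described above, can be sketched as follows. The trial data in this example are hypothetical toy values chosen only to show the arithmetic; they are not the observers' responses, and the function names are illustrative.

```python
from collections import Counter, defaultdict


def confusion_matrix(trials):
    """Tally (stimulus, response) pairs into a stimulus-by-response table."""
    matrix = defaultdict(Counter)
    for stimulus, response in trials:
        matrix[stimulus][response] += 1
    return matrix


def percent_response(matrix, stimulus, response):
    """Percent of presentations of `stimulus` that drew `response`."""
    total = sum(matrix[stimulus].values())
    return 100.0 * matrix[stimulus][response] / total


def asymmetry(matrix, a, b):
    """Percent of a heard as b, minus percent of b heard as a."""
    return percent_response(matrix, a, b) - percent_response(matrix, b, a)


# Hypothetical toy trials: ten presentations each of two fricatives.
toy_trials = (
    [("θ", "θ")] * 6 + [("θ", "f")] * 3 + [("θ", "p")] * 1
    + [("f", "f")] * 9 + [("f", "θ")] * 1
)
toy_matrix = confusion_matrix(toy_trials)

print(percent_response(toy_matrix, "θ", "f"))  # 30.0
print(percent_response(toy_matrix, "f", "θ"))  # 10.0
print(asymmetry(toy_matrix, "θ", "f"))         # 20.0
```

A positive asymmetry indicates that the first sound is reported as the second more often than the reverse, which is the pattern the text reports for /θ/ and /f/ under clipping.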

PAGE 51

CHAPTER IV

SUMMARY AND CONCLUSIONS

Summary

An experiment was performed to determine whether individual consonant sounds are affected differentially by infinite peak clipping. The consonants, representing plosives and fricatives, voiced and voiceless, and five articulatory positions, were placed in Vowel-Consonant-Vowel nonsense syllables. These stimuli were spoken by three trained talkers following a predetermined random word order and were tape recorded. Ten adult observers, trained in phonetics, transcribed the consonants in the stimulus words under two conditions: one in which the speech was passed through the system undistorted, and one in which the speech was distorted by infinite peak clipping.

The data were subjected to a factorial analysis of variance to appraise the statistical significance of the results. The main effect for Distortion treatment was significant, as was the interaction between Distortion and Consonant. This indicated that not only did infinite peak

PAGE 52

clipping affect the intelligibility of speech, but it did so differently for different consonant sounds. The main effects reflecting Talker and Observer variance were not significant, indicating a lack of systematic error from these sources.

The dental fricatives were the most severely affected by clipping. The plosives and /s/ and /z/ were the least affected by clipping. Intelligibility scores for clipped consonants ranged from 27.5% to 94.6%; the range for the undistorted condition was from 65.8% to 100%. Sounds that exhibited the poorest intelligibility when not distorted also showed the lowest intelligibility under clipping. Voiced and voiceless sounds were equally affected by clipping.

Conclusions

Infinite peak clipping affects the intelligibility of individual speech phonemes.

The effect of infinite peak clipping on the intelligibility of individual consonant sounds is different from one sound to another.

There is a systematic pattern in the relative effects of infinite peak clipping on intelligibility of individual sounds.

Dental fricatives show the largest decrement under clipping; plosives and sibilant fricatives are the least affected of the sounds sampled.

Intelligibility of voiced and voiceless sounds is equally affected by infinite peak clipping.

PAGE 53

APPENDIX A

PAGE 54

DETAILS OF THE CLIPPING APPARATUS

Three resistance-capacitance-coupled printed-circuit amplifier modules were used in cascade. Clipping was achieved by overdriving the amplifiers between saturation and cutoff, as described by Littwin (1964). Each module consisted of one emitter-follower stage driving two stages of common-emitter amplification. A schematic of the amplifier modules is shown in Figure 7.

Undistorted operation of each module separately and of the three modules cascaded, when operated within their linear range, was demonstrated by observing the input and output waveforms on a Tektronix type 564 dual-beam storage oscilloscope. These waveforms were photographed using a C-12 oscilloscope camera with Polaroid pack. In Figure 8 the upper traces show a 1000 Hz tone at the output of a Hewlett-Packard model 201-C oscillator, and the lower traces show the waveforms at the output of successive cascaded modules. In Figure 8A the unit is shown operating within its linear range; in B it is operating as a peak clipper.

The action of the clipper on samples of speech is shown in Figure 9. Photographs A and

PAGE 55

R1    3,000 ohms        C1    10. microfarads
R2    47,000 ohms       C2    100. microfarads
R3    10,000 ohms       C3    001 microfarads
R4    4,700 ohms        C4    100. microfarads
R5    4,700 ohms
R6    680 ohms          Q1    Low Beta p-n-p
R7    3,300 ohms        Q2    Low Beta p-n-p
R8    2,100 ohms        Q3    High Beta p-n-p
R9    1,500 ohms

Figure 7. Schematic of clipper amplifier module.

PAGE 56

Figure 8. Output waveforms of modules. A. Linear operation. B. Peak clipping.

PAGE 57

Figure 9. Action of clipper on speech waveforms. A. Vowel. B. Consonant.

PAGE 58

B show waveforms at two different moments during utterance of the word "mash," corresponding to the vowel and the final consonant. The upper traces show the speech signal at the tape playback output of an Ampex model 354 tape recorder; the lower traces show the same samples monitored at the output of the clipper.

The frequency response of each module and of the three in tandem was found to be 60 to 12,000 Hz ± 3 dB. This was determined by measuring the root mean square output voltage of the clipper, with constant input, using pure tones generated by a General Radio model 1304-B beat frequency audio oscillator. Voltages were measured with a Ballantine model 321 A.C. Vacuum Tube Voltmeter.
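The effect of infinite peak clipping on a waveform can be modelled in idealized form. The sketch below is a digital approximation under stated assumptions, not a model of the overdriven amplifier modules: it treats the clipper as an ideal sign function, uses a hypothetical 8000 Hz sampling rate, and applies it to a decaying 1000 Hz tone. It shows that the clipped signal takes only two amplitude values while the positions of the zero-axis crossings are unchanged.

```python
import math


def infinite_peak_clip(samples):
    """Idealized clipper: keep only the sign of each sample."""
    return [1.0 if x >= 0.0 else -1.0 for x in samples]


def zero_crossings(samples):
    """Indices at which the sign differs from that of the previous sample."""
    signs = [x >= 0.0 for x in samples]
    return [i for i in range(1, len(signs)) if signs[i] != signs[i - 1]]


SAMPLE_RATE = 8000  # Hz; hypothetical value chosen for the illustration
TONE_HZ = 1000      # frequency of the test tone

# A decaying 1000 Hz tone; the phase offset keeps samples away from exact zero.
wave = [
    math.exp(-n / 400) * math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE + 0.3)
    for n in range(80)
]
clipped = infinite_peak_clip(wave)

# Relative amplitude information is removed; crossing positions are retained.
print(sorted(set(clipped)))
print(zero_crossings(wave) == zero_crossings(clipped))
```

The decaying envelope of the input stands in for the relative amplitude clues of speech. After clipping, the envelope is gone, yet the temporal pattern of zero crossings, which the text identifies as the information retained, is identical in both signals.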

PAGE 59

APPENDIX B

PAGE 60

SYSTEM CALIBRATION PILOT STUDY

Purpose

To determine if the present experimental situation was comparable to that used by previous investigators, the traditional procedures were replicated in a pilot study. In particular, the intelligibility of undistorted and clipped PB words was compared using materials described by Licklider and Pollack (1948).

Stimulus materials

Lists one and two of the Harvard PB-50 words (Egan, 1948) were arranged in random word order. As in the Licklider and Pollack (1948) study, one adult male talker who had training in phonetics read both lists. The words were tape recorded in a series 1200 Industrial Acoustics Corporation room using an Altec M-20 condenser microphone coupled to an Ampex model 350-C full-track tape recorder. The stimulus words, each in the carrier phrase "Write the word ___," were spoken as one continuous sentence. During the recording, the talker monitored the deflections of a VU meter associated with this phrase in order to maintain a

PAGE 61

uniform level of vocal intensity among the words. The words were separated by an interval of five seconds, and before each ten words a number identifying the following stimulus was given, using the phrase, "Next is number ___."

Apparatus

The peak clipping unit consisted of three cascaded printed-circuit amplifier modules operated beyond their linear range. Details of this instrument are given in Appendix A. A high quality audio playback system, which is described in Chapter II, was used for presentation of stimulus materials.

Subjects

Ten observers, three males and seven females, were selected from the staff of the Communication Sciences Laboratory. The observers ranged in age from nineteen to thirty-six years and demonstrated normal hearing. Hearing was screened at 15 dB relative to the I.S.O. standard at test frequencies of 250, 500, 1000, 2000, 4000, and 6000 Hz, using a Beltone model 10 A portable audiometer.

Procedure

Experimental trials were conducted in a series 1200 Industrial Acoustics Corporation auditory test room. Observers were seated in one of three chairs positioned at two and one-half feet from the loudspeaker. Instructions were given to write the words that they heard, using response forms that were provided with spaces for the fifty words of a PB word list.

PAGE 62

In order to avoid any systematic effect associated with the difficulty of one PB list relative to another, the distortion treatments and word lists were counterbalanced. For the "pass" treatment (without peak clipping), five observers heard List One, and the other five observers heard List Two. For the "clip" treatment, each observer heard the list that was not presented to him under the "pass" treatment.

Results

Results are shown in Table 7. The mean intelligibility score for the ten observers for the "pass" treatment was 95.8%; the mean score for the "clip" treatment was 68.8%. In the Licklider and Pollack study (1948) the mean score for five observers in the first test session was approximately 98% without distortion, and 64% with infinite peak clipping.

Conclusion

With PB-50 lists as stimulus material, the apparatus and procedure used in the present experiment duplicated results obtained by Licklider and Pollack (1948). Intelligibility scores obtained without distortion and with infinite peak clipping were of the same order of magnitude as those reported in the literature.

PAGE 63

Table 7. Intelligibility scores for PB lists without distortion and with infinite peak clipping.

Observers      No Distortion        Clipping
Group 1        List 1    95.2%      List 2    65.6%
Group 2        List 2    96.4%      List 1    72.0%
All                      95.8%                68.8%
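The "All" row of Table 7 follows from the group means when the two groups are of equal size. The short sketch below assumes five observers per group, as stated in the procedure, and uses illustrative names; it also reports the decrement attributable to clipping in percentage points.

```python
GROUP_SIZE = 5  # observers per group, per the counterbalanced procedure

# Group means from Table 7, in percent of words correct.
no_distortion = {"Group 1, List 1": 95.2, "Group 2, List 2": 96.4}
clipping = {"Group 1, List 2": 65.6, "Group 2, List 1": 72.0}


def overall_mean(group_means, group_size=GROUP_SIZE):
    """Mean over all observers, given equal-sized group means."""
    total = sum(mean * group_size for mean in group_means.values())
    return total / (group_size * len(group_means))


pass_mean = overall_mean(no_distortion)
clip_mean = overall_mean(clipping)
decrement = pass_mean - clip_mean  # loss in percentage points under clipping

print(round(pass_mean, 1), round(clip_mean, 1), round(decrement, 1))
```

Because each list appears once under each treatment, any difference in difficulty between the two lists contributes equally to both overall means and so does not bias the comparison.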

PAGE 64

APPENDIX C

PAGE 65

Figure 10. Matrix of responses by Observer 1 to undistorted consonants. Letters A, B, and C identify talkers. Axes: consonants used as stimuli; number of responses.

PAGE 66

Figure 11. Matrix of responses by Observer 1 to consonants distorted by infinite peak clipping. Letters A, B, and C identify talkers. Axes: consonants used as stimuli; number of responses.
PAGE 67

Figure 12. Matrix of responses by Observer 2 to undistorted consonants. Letters A, B, and C identify talkers. Axes: consonants used as stimuli; number of responses.

PAGE 68

Figure 13. Matrix of responses by Observer 2 to consonants distorted by infinite peak clipping. Letters A, B, and C identify talkers. Axes: consonants used as stimuli; number of responses.

PAGE 69

Figure 14. Matrix of responses by Observer 3 to undistorted consonants. Letters A, B, and C identify talkers.

PAGE 70

Figure 15. Matrix of responses by Observer 3 to consonants distorted by infinite peak clipping. Letters A, B, and C identify talkers.

PAGE 71

Figure 16. Matrix of responses by Observer 4 to undistorted consonants. Letters A, B, and C identify talkers.

PAGE 72

Figure 17. Matrix of responses by Observer 4 to consonants distorted by infinite peak clipping. Letters A, B, and C identify talkers.

PAGE 73

Figure 18. Matrix of responses by Observer 5 to undistorted consonants. Letters A, B, and C identify talkers.
PAGE 74

Figure 19. Matrix of responses by Observer 5 to consonants distorted by infinite peak clipping. Letters A, B, and C identify talkers.

PAGE 75

Figure 20. Matrix of responses by Observer 6 to undistorted consonants. Letters A, B, and C identify talkers.

PAGE 76

Figure 21. Matrix of responses by Observer 6 to consonants distorted by infinite peak clipping. Letters A, B, and C identify talkers.
PAGE 77

Figure 22. Matrix of responses by Observer 7 to undistorted consonants. Letters A, B, and C identify talkers.

PAGE 78

Figure 23. Matrix of responses by Observer 7 to consonants distorted by infinite peak clipping. Letters A, B, and C identify talkers.

PAGE 79

Figure 24. Matrix of responses by Observer 8 to undistorted consonants. Letters A, B, and C identify talkers.

PAGE 80

Figure 25. Matrix of responses by Observer 8 to consonants distorted by infinite peak clipping. Letters A, B, and C identify talkers.
PAGE 81

Figure 26. Matrix of responses by Observer 9 to undistorted consonants. Letters A, B, and C identify talkers.

PAGE 82

Figure 27. Matrix of responses by Observer 9 to consonants distorted by infinite peak clipping. Letters A, B, and C identify talkers.

PAGE 83

Figure 28. Matrix of responses by Observer 10 to undistorted consonants. Letters A, B, and C identify talkers.

PAGE 84

Figure 29. Matrix of responses by Observer 10 to consonants distorted by infinite peak clipping. Letters A, B, and C identify talkers.

PAGE 85

BIBLIOGRAPHY

Ahmed, R., and Fatehchand, R., Effect of sample duration on the articulation of sounds in normal and clipped speech, J. Acous. Soc. Am., 31, 1959, 1022-1029.

Black, J. W., Accompaniments of word intelligibility, J. Speech Hearing Dis., 17, 1952, 409-418.

Black, J. W., and Hixson, W. C., Number of axis crossings and intelligibility of speech, J. Acous. Soc. Am., 31, 1959, 1384-1385.

Brandt, A. E., Forms of analysis for either measurement or enumeration data amenable to machine methods, Proceedings, Computation Seminar, December 1949, N. Y., International Business Machines Corporation.

Davis, H., Stevens, S. S., Nichols, R. H., Jr., Hudgins, C. V., Marquis, R. J., Peterson, G. E., and Ross, D. A., Hearing Aids: An Experimental Study of Design Objectives. Cambridge, Massachusetts: Harvard University Press, 1947.

Dukes, J. M. C., The effect of severe amplitude limitation on certain types of random signal: a clue to the intelligibility of 'infinitely' clipped speech. Monograph No. Proc. Instn. of Elect. Engrs., 1954, 88-97.

Egan, J. P., Articulation testing methods. The Laryngoscope, 58, 1948, 955-991.

Harris, J. D., Combinations of distortion in speech, A.M.A. Archives of Otolaryngology, 72, 1960, 227-232.

Harris, K. S., Cues for the discrimination of American English fricatives in spoken syllables. Language and Speech, 1, 1958, 1-7.

PAGE 86

Hirsh, I. J., Reynolds, E. G., and Joseph, M., Intelligibility of different speech materials, J. Acous. Soc. Am., 26, 1954, 530-538.

Jacobson, R., Fant, C. G. M., and Halle, M., Preliminaries to speech analysis. Tech. Rept. No. 13, Acoustics Lab., M.I.T., 1952.

Kryter, K. D., Licklider, J. C. R., and Stevens, S. S., Premodulation clipping in AM voice communication, J. Acous. Soc. Am., 19, 1947, 125-131.

Licklider, J. C. R., The intelligibility of amplitude-dichotomized, time-quantized speech waves, J. Acous. Soc. Am., 22, 6, 1950, 820-823.

Licklider, J. C. R., Bindra, D., and Pollack, I., The intelligibility of rectangular speech-waves, Am. J. Psychol., 61, 1948, 1-20.

Licklider, J. C. R., and Pollack, I., Effects of differentiation, integration, and infinite peak clipping upon the intelligibility of speech, J. Acous. Soc. Am., 20, 1948, 42-51.

Littwin, S., Ed., Pulse Generators in Industrial Electronics. New York: John F. Rider Publisher, 1964.

Miller, G. A., Language and Communication. New York: McGraw-Hill, 1951.

Miller, G. A., Heise, G., and Lichten, W., The intelligibility of speech as a function of the context of the test materials, J. Exp. Psychol., 41, 1951, 329-335.

Miller, G. A., and Nicely, P. E., Analysis of perceptual confusions among some English consonants, J. Acous. Soc. Am., 27, 1955, 338-352.

Pollack, I., and Pickett, J. M., Intelligibility of peak-clipped speech at high noise levels, J. Acous. Soc. Am., 31, 1959, 14-16.

Snedecor, G. W., Statistical Methods. Fifth Edition, Ames, Iowa: The Iowa State University Press, 1956.

Southworth, G. R., Speech digitizing techniques, J. Audio Engr. Soc., 11, 4, October 1963, 349-352.

PAGE 87

Velichkin, A. I., Amplitude clipping of speech, Soviet Physics-Acoustics, 8, 2, Oct.-Dec. 1962, 130-134.

[...] and Haase, K. H., Theoretical investigations to reduce harmonic distortions in a clipping process, J. Acous. Soc. Am., 29, 1957, 776.

PAGE 88

BIOGRAPHICAL DATA

Maurice Joseph was born November 25, 1926, at Astoria, New York. In February 1944 he was graduated from William Cullen Bryant High School in Astoria. From 1944 to 1946 he served in the United States Army. Following his discharge he returned to school and subsequently received the degree of Bachelor of Arts from Queens College, Flushing, New York, in February 1950. He was a graduate assistant at the University of Pittsburgh and received the degree of Master of Science in August 1952. After two years as a research assistant at Central Institute for the Deaf in St. Louis, Missouri, Mr. Joseph taught for ten years at Tulane University Medical School. In 1964 he was enrolled at the University of Florida where he has continued to work toward the degree of Doctor of Philosophy.

Mr. Joseph has published articles, as junior author, in the Journal of the Acoustical Society of America and in the Southern Medical Journal. He was granted a Special Rehabilitation Research Fellowship by the Vocational Rehabilitation Administration.

PAGE 89

Maurice Joseph is married to the former Pauline Julia Bruckenstein and is the father of two children, Laura and David. He is a member of the American Speech and Hearing Association, the American Association for the Advancement of Science, the American Association of University Professors, and Sigma Alpha Eta.

PAGE 90

This dissertation was prepared under the direction of the chairman of the candidate's supervisory committee and has been approved by all members of that committee. It was submitted to the Dean of the College of Arts and Sciences and to the Graduate Council, and was approved as partial fulfillment of the requirements for the degree of Doctor of Philosophy.

December 17, 1966

Dean, Graduate School

SUPERVISORY COMMITTEE:

Chairman