Sound symbolism in natural languages


Material Information

Title:
Sound symbolism in natural languages
Physical Description:
vii, 292 leaves ; 29 cm.
Creator:
Ciccotosto, Nick, 1955-
Publication Date:
1991

Subjects / Keywords:
Sound symbolism   ( lcsh )
Psycholinguistics   ( lcsh )
bibliography   ( marcgt )
theses   ( marcgt )
non-fiction   ( marcgt )


Thesis (Ph. D.)--University of Florida, 1991.
Includes bibliographical references (leaves 265-291).
Statement of Responsibility:
by Nick Ciccotosto.

Record Information

Source Institution:
University of Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 001714616
notis - AJC6981
oclc - 25605028

Full Text









I dedicate this work to my Dad, Donald Ciccotosto, also known

as Don Tosta and to my Mom, Irene Ciccotosto, also known as Irene

Tosta. I further dedicate this work to Carol Ciccotosto, my wife, and

Christopher J. Costoff, her son.

These people have immeasurably enriched my life and there is

no doubt I would not have accomplished this study without their

support and inspiration.


I would like to express my warmest gratitude to my committee

members, Dr. Linda D. Wolfe, Dr. Christiana M. Leonard, Dr. Robert

Lawless, Dr. Ronald Kephart, and Dr. Norman N. Markel. They have

all encouraged and graced me with much perceptive criticism about

this topic.

I also would like to thank Dr. Ronald Randles, chairperson of

the statistics department at the University of Florida. He gave

timely insight on the use of nonparametrics applied to linguistic

topics. I thank my friend Dr. Stanley R. Witkowski at Northern
Illinois University for his ever present humor and direction in

staging the entire series of experiments. It was with his help that

the first pilot studies on sound symbolism were carried out.

Finally, I would like to thank my mother, Irene, and father,

Donald, for their love and concern over the years. There are no

greater parents in the world. My wife, Carol, deserves special

thanks for her kind patience, interest, and concern when I was

working too many hours on one facet of human existence. I

especially want to thank my brother, Rick, and sisters, Nita, Angel,

and Dawn, and my friends, Tom McNulty, Jeff Rosenberg, Greg
McKinney, Brian Akers, and Larry Redman, for their interest and



ACKNOWLEDGEMENTS ............................................................................. iii

ABSTRACT ................................................................................................ iv

I     HYPOTHESES IN NATURAL LANGUAGES ........................................ 1

      Introduction ...................................................................................... 1
      Sound Symbolism and Proto-language ......................................... 6
      The Nature of Sound Symbolism ................................................. 10
      Sound Symbolism Hypotheses ...................................................... 15
           Physiological ............................................................................. 18
           Anatomical ................................................................................ 29
           Semantically Ancient ............................................................... 31

II    SOUND SYMBOLISM DATA AND ANALYSIS ................................. 39

      The Universe of the Linguistic Data ........................................... 39
      Coding the Linguistic Data ........................................................... 41
      Hypothesis Testing Using Chi-Square ........................................ 44
      Hypothesis Testing Using Rank Ordering .................................. 63

III   NATURAL LANGUAGES ................................................................... 72

      Introduction .................................................................................... 72
      Sound Symbolism and Prosody .................................................... 74
      Sound Symbolic Terminologies .................................................... 80
      Evidence of Sound Symbolism in Natural Languages ............ 106

IV    OTHER SOUND SYMBOLISM EXPERIMENTS ............................. 141

      Types of Experiments and their Limitations .......................... 141
      "Size" Sound Symbolism Experiments ...................................... 146
      Artificial Lexicons in Sound Symbolism Experiments ........... 155
      Natural Lexicons in Sound Symbolism Experiments .............. 165
      "Goodness-of-Fit" Sound Symbolism Experiments ................. 174
      Synaesthetic Studies into Sound Symbolism ........................... 180
      Summary of Sound Symbolic Experiments .............................. 188

V     CONCLUDING REMARKS ................................................................ 191

      Summary ........................................................................................ 191
      Theoretical Weaknesses ............................................................... 195
      Future Research ............................................................................ 197

A     WORD LIST FOR 16 CONCEPTS ................................................... 200
      GLOSSES ......................................................................................... 231
C     CODING PARAMETERS FOR ALL GLOSSES ................................. 252
F     PHONETIC CHARACTERS ............................................................... 263

REFERENCES ........................................................................................... 265

BIOGRAPHICAL SKETCH ........................................................................ 292

Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy



SOUND SYMBOLISM IN NATURAL LANGUAGES

By

Nick Ciccotosto

December, 1991

Chairperson: Linda D. Wolfe
Major Department: Anthropology

A major assumption in modern linguistics is that sounds
composing words arbitrarily associate with meanings. Saussure's

early 20th century arbitrary sound-meaning tenet has been neither

adequately examined nor challenged. This dissertation casts doubt

upon this theory by gathering evidence of sound symbolism from

virtually all known language phyla. Major sound symbolism
experiments are reviewed, and finally, a series of sound symbolism

hypotheses is proposed for a group of basic vocabulary words.
These glottochronological words, of a supposed arbitrary sound-
meaning nature, are routinely utilized by linguists to trace genetic
relationships among language phyla.

Dissertation data are composed of a lexical sample representing

1% of 5000 world languages. Sixteen glosses contain 50 words per

meaning from 50 languages, and are taken from at least 10 of the
17 human language phyla. The set includes: NECK, TOOTH, MOUTH,


SPIT, FOOD, WATER, and CHEW. These 800 glosses, taken from a pool
of 229 languages, are tallied according to sub-phonemic distinctive

articulatory and acoustic features such as nasal, stop, spirant,

bilabial, and others.
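The tallying procedure described above can be sketched in code. The feature table below is an invented miniature for illustration only, not the dissertation's actual coding parameters (those appear in its appendix on coding parameters):

```python
from collections import Counter

# Invented miniature feature table; segments map to sub-phonemic features.
FEATURES = {
    "m": {"nasal", "bilabial"},
    "n": {"nasal"},
    "p": {"stop", "bilabial"},
    "t": {"stop"},
    "s": {"spirant"},
    "a": {"vowel"},
    "u": {"vowel"},
}

def tally(words):
    """Count how often each sub-phonemic feature occurs across a gloss."""
    counts = Counter()
    for word in words:
        for segment in word:
            counts.update(FEATURES.get(segment, ()))
    return counts

tally(["mata", "napu"])  # nasal, bilabial, and stop each occur twice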
For the 16 concepts, a total of 63 hypotheses are proposed. Each

hypothesis argues that certain sub-phonemic features are to be

found at higher or lower levels than those in the remaining sample

of 750 words. Chi-square tests run on 63 hypotheses give 23
instances of association at significant levels, p<.05. The application of
the rank-order median test of Kruskal-Wallis to the same
hypotheses gives similar results. For the ordered alternative
Jonckheere-Terpstra test, all predicted features based on three k-

samples are highly significant.
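The chi-square procedure can be sketched as follows, with invented counts (not the dissertation's data): a feature's frequency in the 50 words for one gloss is compared against the remaining 750-word sample in a 2x2 table.

```python
# Hypothetical 2x2 table: rows are (target gloss, remaining sample);
# columns are (feature present, feature absent). Counts are invented.
def chi_square_2x2(table):
    """Pearson chi-square statistic, no continuity correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

observed = [[30, 20],    # target gloss: 50 words
            [300, 450]]  # remaining sample: 750 words
stat = chi_square_2x2(observed)
# With 1 degree of freedom, stat > 3.841 corresponds to p < .05.
```

For these invented counts the statistic is about 7.74, well above the 3.841 critical value, so the feature's frequency in the gloss would count as significantly different from the rest of the sample.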
Such synchronically extensive sound symbolism is striking.
Sound symbolism, within the basic behavioral and physiological
meanings of these words, shows a hierarchy of sub-phonemic
features. Their evolutionary adaptive value may allow conspecifics
facile entry into a communication network.



Sound symbolism, a nonarbitrary, one-to-one relation between

acoustic and motor-acoustic features and meaning, is an important

study for anthropologists because its accurate delineation may shed

light upon an underlying nature of the human language faculty.

Additionally, understanding its mechanics may render a fuller

explication of the lexicon possessed by humankind in pre-sapiens

times. This dissertation examines sound symbolism and argues that

it relates to primitive cognitive levels such as those required of

neonates and early and pre-sapiens society. The crux of this type of

examination is that:

"There will always be layers of the vocabulary, representing a
more primitive stage of language in which the relation between
sound and meaning is partly motivated. There is a need for a
systematic investigation of this vocabulary in various languages,
supplemented by psycholinguistic tests, in order to find out what is
universal in the expressive function of these partly motivated
signs." (Fischer-Jorgensen 1978:80)

In this chapter, I sketch sound symbolism and present a series

of hypotheses about motivated meanings and their representations

with nonarbitrary linguistic features. The language data are
discussed in Chapter II; they represent 800 words taken from 229

languages. The data set includes 16 semantic categories (i.e., words
and their meanings) which are hypothesized to contain sound

symbolic elements. These words are part of the glottochronological

list devised by Swadesh (1971) and refer to basic and proto-typical
ethnoanatomical, physiological, and culturally specific semantic
domains. My word sample includes: (ethnoanatomical) BREAST;
(physiological) EAT, DRINK, CHEW, SWALLOW, SPIT; (culturally specific) WATER,

The data set exposes semantically basic words and as such, the
categories may reflect universally unmarked domains. That is,
unmarked domains contain words of short form, phonetically
archaic in shape, which are basic in meaning, and which are learned
earliest by language speakers (Battistella 1990:23-68).
This data set is admittedly minimal, for a number of

reasons. Presently, world culture exhibits at least 5,000 separate

languages. Given an upward limit on the actual size of a particular

language lexicon, an overestimate would be that any language
contains more than 1,000,000 words. Even so, 5,000 languages with
1,000,000 words each, means that 5 billion words are spoken on
earth. Clearly this demonstrates an expansion of lexicons
everywhere at a distant time when phonemes, through a changing

neuro-physiological morphology, became disentangled from primate
call structures (Hewes 1983).

Statistically speaking, more than two-thirds (about 70%) of all
languages contain a phonemic inventory of between 20 and 35
phonemes. Even so, the range of phonemes actually produced in all
human languages is at least 500 (Pullum and Ladusaw 1986).
Phonemic inventories range in size from 11 (Hawaiian-Austronesian
Phyla) to 141 (!Kung-Khoisan Phyla) (Maddieson 1984:7).

In turn, each phoneme is a mental construct of a given cultural

group, composed of binary distinctive articulatory and motor-
acoustic features (Sapir 1929). This suggests that languages are
largely composed of arbitrary sound-meanings. The impetus for
accepting the view that there is an arbitrary connection between
signifier and signified comes from the work of the great structural
linguist Saussure (1959). In his groundbreaking work, he held that
a word is composed of sounds and reference to a concept. If the
association between sound and concept were not predominantly
arbitrary, languages would cease to change (Saussure 1959:67-71).

While languages are endlessly changing bio-cultural entities
and completely replace their lexicon approximately every 100,000
years (1-5% per 1,000 years) (see Gudschinsky 1964, for example),
would it be unusual to find more than 1,000 sound symbolic words

in any given lexicon? For the supposed maximal 1 million words per
language, this would represent a negligible one tenth of 1% of a
language's lexicon. Still, although any language might contain
1,000,000 items, scholars generally agree that an average speaker
might command behavioral and physiological mastery of 10,000-
30,000 words actively (Durbin 1969). For the neonate, child,

mentally handicapped, or the emerging bilingual speaker, the total

can be considerably smaller. Taking the latter figure as more

realistic would mean a large sound symbolic system could command

more than 3% of a language's lexical system. This may have already

been demonstrated for Japanese (Hamano 1986), and I argue this

for English in Chapter III. However, exactly how a language's sound

symbolic lexicon should be measured is still a matter of some

debate (Ultan 1978; Malkiel 1990a).
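The proportions in this passage can be checked directly; a trivial sketch of the text's own figures:

```python
# Figures from the text: 1,000 sound symbolic words measured against a
# maximal 1,000,000-word lexicon versus a realistic active vocabulary
# of 30,000 words.
symbolic_words = 1_000

share_of_maximal = symbolic_words / 1_000_000  # one tenth of 1%
share_of_active = symbolic_words / 30_000      # just over 3%
```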
The importance of these statistical assumptions is that if a

number of basic glottochronological words are compiled from a

geographically and genetically distanced sample of world languages,
the expectation is that, the languages not having been in contact

for more than 100,000 years, only 1% of the terms should be similar. Otherwise, since

contact and borrowing are ruled out, internal and cross-culturally
parallel forces are at work. This more reliable expectation means that

sound symbolic words should appear significantly above limits set

by glottochronologists in many languages. Further, there is nothing
"primitive" about a vocabulary rich in sound symbolic words versus

one appearing less so. Sound symbolism may rank more as a
creative force in producing "new" words, than as a label for

aberrant morphological words.
At as yet uncovered levels of cognition and bio-mechanics,

sound symbolic processes approach "least moves" theories, that is,

they express exceedingly close association between sound and

concept. Contrary to what Saussure and disciples argued, sound

symbolic words are linguistically pervasive, proto-typical, and if

time frames must be given, at least hundreds of thousands of years

old. As LeCron Foster points out, the "arbitrary relationship between

phonological representation and meaning becomes questionable
once motivation is discovered for assignment of a particular
meaning to a particular phonological unit" (LeCron Foster 1978:83).

The subconscious levels of language use are yet to be fully

explicated, in part because the extent and importance of sound symbolism
in world languages remain unmeasured. The function of sound symbolism as a citadel of

special word-meaning formations is not well studied. Much
speculation and many poorly designed studies have been done, to

be sure, and few scholars suggest sound symbolism can expose
primordial words, for fear of reiterating some variation of the
disdained "bow-wow," "sing-song," "ding-dong" language origin
theories. Additionally, linguists have omitted sound symbolism as
an arena of attention because of a focus upon sound changes and
the etymological primacy of words (Jespersen 1921/1947:410).

Among the few to propose nonarbitrary sound meanings for

primordial words are Mary LeCron Foster (1978) and Gordon W.
Hewes (1983).

So far, historically documented languages attest sound
symbolism examples from 12 of the 17 language phyla. There is
little doubt much more evidence of sound symbolism is forthcoming
from the lesser studied language phyla. Just as easily, one can see a
sub-field emerging to be labelled "generative phono-semantics" or
"psycho-semiotics" (Markel 1991) to deal with the under studied

mental structures which imbue language its affective use within

socially dynamic contexts.

Psycholinguists, linguists, and anthropologists have

implemented numerous types of experiments upon sound

symbolism. Their investigations involve textual analysis and the

psychological testing of differing linguistic groups with the creation

of artificial lexicons and the use of sound symbolic words. This

research has never been incorporated into anthropological theories

about language origins. Below, sound symbolism is placed back into

this context.

Sound Symbolism and Proto-language

The evolutionary advantages of vocal communication in

primates are considerable. Calls warn others away from danger or

toward food. It is no small observation that they confer "life-

lengthening" advantages to select individuals capable of their

efficient production and understanding (Bickerton 1990:147). This

most basic tenet of communicative function, when placed in the

context of human bio-social evolution, witnesses humans as

paragons of communicative efficiency. Humans are the only species

producing a vocal communication allowing themselves defense

outside of real evolutionary time. This is to say, they can warn each

other about dangers which are unknowable through the immediate

senses, such as cancer and global warming (Pinker and Bloom 1990).


Among current speculation on language origins is the endless

though necessary reiteration that language evolution has had many
causes: bipedality (Washburn 1960), vocal-morphological

restructuring (Lieberman 1984), increased brain size (Jerison 1976),
neural-reshuffling (Falk 1990), gestural-motor enhancement

(Ojemann and Mateer 1979), gender differences (Jonas and Jonas

1975), use of fire (Goudsblom 1983), increasing face to face
interaction (Tanner and Zihlman 1976), and so forth. Beyond this,

however, most language origin arguments splinter into gradualist
versus punctuated scenarios. Stephen Jay Gould's school argues
language is an "exaptation," a combination of otherwise spurious
physiological events coalescing into a remarkably sudden
referential system (Pinker and Bloom 1990). The classical school of
language origin antedates even Wallace's and Darwin's ideas on the
subject. This school presents evidence of a gradualistic "language

design" apparent in nature, even at the expense of efficiently eating,
drinking, breathing, and swallowing (Hockett and Ascher 1964;
Lenneberg 1967; Lieberman 1984).

Scholars like to quibble over which selective pressures resulted
in early hominids leaving the forest. Our distant ancestors,
Bickerton argues, used their proto-words most likely in alarm calls,

animal imitations, expressive grunts, and chance associations
(Bickerton 1990:156). Arising as a representational system,

language was adaptable because it described nature. The only real
intent of proto-words was "to get the point across," says Bickerton,

and this echoes Wittgenstein's philosophy of language (C.H. Brown

1976). Wittgenstein states, "Whereof one cannot talk, one is silent."

Simply put, this means that where there is no selection pressure to

produce a sound, there is not one there. Chomsky claimed that
humans developed a sudden and apparent "linguistic organ"

through the evolving neural tissue (Chomsky 1968). The more

typical Wittgenstein attitude must prevail. Instead of the "rules" of
language being innate, Wittgenstein argues that the capacity to form

rules of language is innate. This view more closely follows the
findings of Ojemann and Mateer (1979), that syntax could have

developed in concert with increasing fine motor control.

The primary function of language is to represent nature, and

as intrinsically connected to animal communication as a whole, this

function is crucial to the intent of all humanly produced words. The

meanings which words contain are only to be found within a range

of human behaviors as an animal species. More basic meanings may

be inseparable from the sounds composing them because they

consistently "get the point across." Whether these basic meanings

are called 'flee', 'fight', 'mate', or 'feed' versus 'run', 'hate', 'love', or

'food' is a moot point. This is exactly what LeCron Foster proposed

when she derived even more distant proto-words from the proto-

words of reconstructed language phyla (1978). She writes:

"Early linguistic symbols (phonemes), apparently parental to all
present-day languages, are reconstructed from a group of languages
whose genetic relationship to one another is extremely remote. The
reconstructed symbols are found to be nonarbitrary. Their
motivation depends upon a gestural iconicity between manner of
articulation and movement or positioning in space which the symbol
represents. Thus, the hypothesis presented here implies that early

language was not naming in the conventional sense but
representation of one kind of physical activity by means of another,
displaced in time but similar in spatial relationship" (LeCron Foster

If a handful of proto-words or sound-symbols can be

manipulated so as to generate elementary propositions, a language

system can emerge with conspecific vocal partners. The advantage

of merely being able to indicate "THERE"+"FOOD" would be

tremendous to our early hominid relatives. Evidently, this capacity

to relate to (or to name) objects and delay enactment of behavioral

rote is well within the range of abilities demonstrated by our closest

genetically and morphologically expressed cousins Pan (Gardner and

Gardner 1971), Bonobo (Boehm 1989; Mori 1983), Pongo (Miles

1983), and Gorilla (Patterson and Linden 1981).

Bickerton's presentation of proto-language assumes the lexicon

of a Homo habilis or Homo erectus to be like a "miser's shoebox,"

each proto-word containing a meaning according to necessity's

rankings (Bickerton 1990:158). Proto-language also may have

contained a proto-syntax, including negators, question words,

pronouns, relative-time markers, quantifiers, modal auxiliaries, and

particles indicating location (Bickerton 1990:185).
The necessary semantic concepts identified for any human
time before 100,000 years ago are, in Wittgenstein's views,
synonymous with selective pressures. Without recourse to a sound
symbolism element in a language origin scenario, language origin
theories fail to show how any sound is ever connected to any
meaning. This is an absurdity because in order to be at an

overwhelming level of arbitrary sound-meaning, all the present

languages had to have undergone immensely long parallel development.


The trouble with a cursory dismissal of sound symbolism is

that in order to have arrived at fully arbitrary language now,

humans would have had to have totally forgone all emotion and

necessity from their utterances. This is clearly not the case with

any language.

I propose that the arbitrary sound-meaning hypothesis is an

unreachable end for all languages and that sound symbolism

mechanisms underlie naming processes.

The Nature of Sound Symbolism

Why should scholars of such differing eras as Socrates,
Aristotle, Plato, Condillac, Swift, Darwin, Wallace, Tylor, and Freud

(Jakobson and Waugh 1979) agree that some facets of words carry

meaning in and of themselves? The attractiveness of sound

symbolism is that it provides a bridge between external and inner
realities in hominids. Its plausibility has come into and out of

vogue. Presently, it is becoming increasingly important as an arena
holding vital answers about language origins.

Take the largely autonomic, primate vegetative process of

coughing, as an illustration. Here, coughing is a reflex integrated
neurally at the medulla and is initiated by irritation of the
bronchio-alveolar, tracheal, laryngeal, or pharyngeal mucosae

(Geoffrey, Bernthal, Bertozini, and Bosma 1984). Additionally,

auricular nerve stimulation can initiate the coughing reflex and it

can be produced voluntarily as a discrete sign, a diagnostic event, or

unconsciously with symbolic meaning (Leith 1977:547). During a

cough, as the glottis closes, strong intrapulmonary pressure builds

with the respiratory muscle contractions, and finally, the glottis

suddenly opens to release an explosive discharge of air, mucous,

water, and foreign bodies (Ganong 1983:180). The sound of a cough

varies from animal to animal, being species, age, sex, and in some

manners disease specific (Leith 1977). Nevertheless, the sounds of a

cough in all species take place within a few frequency bands of

acoustic energy, not all of them. Any animal that mimics, duplicates,

or reiterates a cough would create the description of the autonomic

process through the sympathetic nervous system.
There are miles of neural circuitry between the autonomic and

sympathetic nervous systems, but what makes sound symbolism

attractive is just that it "gets the point across" as Bickerton would

say. In hominid neural evolution, it points to a "least moves"

pathway inexorably trained upon language development. Sound

symbolism is known to provide a "least moves" route in a variety of

ways, not the least of which is that it provides a mnemonic assist to
peripherally included vocal partners such as neonates, other Homo

erectus individuals, or foreign language learners (Wescott 1971b,

Jakobson and Waugh 1979). If language is to include a wide range

of individual genotypes and intelligence, and still incorporate a list

of symbolic elements, it certainly needs mnemonic assists.

In contemporary linguistics, there are arguments for "weak"

sound symbolism. That is, given one particular and necessary
meaning, say "size," diverse languages will all utilize one feature
type to represent it (Durbin 1969). To date, evidence shows this
type of a sound symbolism argument only as a general proposition.
Among the more interesting "weak" though universal sound
symbolism examples is the observation that for most
languages the normal declarative order is Subject-Verb-Object (e.g.
English, "I Do It"). This word order represents better than any other
the actual order of transitive events (Greenberg 1966:76). In regard
to social relationships, terms for male/father and female/mother
universally appropriate labial consonants to the female and apical
consonants to the male ([mama] vs. [dada]) (Jakobson 1960).
A stronger sound symbolism argument supposes that all
humans share a common pool of semantically and evolutionarily
important events. In this case, the phonological, semantic, or
syntactic language universals are linked through sound symbolism

on a language by language basis (Durbin 1969:8). That the front
vowel [i] represents "smallness" in most languages is an example of a
semantic-phonological sound symbolism (e.g., "tiny" > "teeny," Bob > Bobbie).
Depending upon how the [i] vowel is used, it might also connect
with syntax. A clearer example of this syntax-phonological
symbolism is a connection between [FRICATIVE] and a pluralized
noun (in English [-s] or its voiced counterpart [-z]). Here, the sound
symbolism expresses the concept of "more" with continued sound

instead of plosive and brief sound (use of an [-s] instead of a [-p]).


Since sound symbolism is probably universal in language use, it
is necessary to regard the wider scope of language universals for

comparisons. Although language universal research focuses upon

the regularities of syntax, phonology, and lexicon, the lexical domain

was ignored until the late 1960s (Witkowski and Brown 1978).

Since then, implicational universals have been found in folk color

terminology (Berlin and Kay 1969), folk botanical (Berlin 1972; C.H.

Brown 1977), folk zoological life-forms (C.H. Brown 1979), kinship

(Witkowski 1972), ethnoanatomy (McClure 1975), and ethnobiology
(Berlin, Breedlove, and Raven 1973). An implicational universal is

apparent when the occurrence of an item in widespread languages
implies the occurrence of another item or items, but not vice versa
(Witkowski and Brown 1978:428).

As an illustration, an ethnobotanical lexical scheme is in order.

First, no language exists which does not contain at least one word

involving the name of a plant. Hence, naming the botanical universe

is certainly part of the human evolutionary cognitive experience.

But, many languages contain more than one term for plants. Some

languages spoken by pre-literate hunting-gathering societies
contain thousands of such terms. An implicational universal might

read then that if any languages have two words for botanical items,
at least one will be a term for "tree"(e.g., large plant). If any

languages have three terms, the third term will be a "grerb," a small
plant relative to the botanical inventory of a particular

environment, whose parts are chiefly herbaceous. Given four

botanical words in a language, the fourth will be either "bush" or
"vine" or "grass" (Witkowski and Brown 1978:434). One always gets

a term for "tree" before one for "vine," "grass," "grerb," and so on.
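The implicational relation illustrated above can be stated as a small predicate over term inventories. The inventories below are invented for illustration:

```python
# Hypothetical botanical term inventories for three languages.
languages = [
    {"tree"},
    {"tree", "grerb"},
    {"tree", "grerb", "vine"},
]

def implies(inventories, antecedent, consequent):
    """True if every inventory containing `antecedent` also contains `consequent`."""
    return all(consequent in terms
               for terms in inventories if antecedent in terms)

implies(languages, "grerb", "tree")  # True: "grerb" never occurs without "tree"
implies(languages, "tree", "grerb")  # False: the reverse does not hold
```

The asymmetry of the two calls is exactly the "but not vice versa" clause in the definition of an implicational universal.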

Biconditional universals are known as well for human language
speakers. Using the semantic-differential approach, Osgood, May,

and Miron (1975) found that people use the same qualifying

framework in applying connotation or affective meaning to words.
This biconditional universal implies that all human speakers rank

their emotional response to words and their sounds according to
evaluative (good/bad), potency (strong/weak), and activity

(active/passive) dimensions. For a biconditional universal, the

presence of one concept or term will always indicate the other.

With regard to sound symbolism, language universals expose

ancient human avenues of naming behavior. Like the proto-words
of Wescott and Bickerton, sound symbolic words may rank concepts

according to the earliest hominid survival necessities. Hence, the

more basic, primitive, or universal a word may be, the more sound

symbolism may be influencing emotional evaluations about such a

word. In other terms, basic words may represent the activities,

dimensions, or senses of primary sensory and survival value to
early language users with sound symbolism. Strictly speaking, early
naming behavior should contain a close connection between the

signifier and the event to be signified.

Sound Symbolism Hypotheses

The vocalizations of primate communication are dynamic

physical events. Their many complicated muscular and acoustic

productions include imploded fricatives, exploded grunts, coos,

screams, cries, hoots, gobbles, songs, clicks, geckers, whines,

whistles, growls, barks, pants, laughs, twitters, chirps, and "words."

The varied anatomies capable of such diverse modes of producing

sounds among primates strongly indicate that evolution selected for

vocalization effects in differing environments (Waser and Brown


Among humans, physiological parameters of vocalization are no

less complex. Voluntary production of sound requires coordination

of seven of twelve pairs of cranial nerves, seven major paired

muscle groups in the larynx alone, widely integrated brainstem,

midbrain, and cortical areas, and numerous recurrent thoracic and

lumbar nerves and muscles (Chusid 1970).

However, humans produce sound within the laws of acoustical physics,

as would any other primate. Namely, a rarefied and condensed

stream of air is modulated through modification of ventilatory

resonance chambers. Human oral anatomy consists of three

resonance chambers: the laryngeal, the oral, and the nasal. Sound

frequency and intensity is mainly a function of the vocal folds

located in the glottal region. An increase in muscle elasticity or an

elevation in tracheal air pressure can cause a rise in pitch, while

a decrease in vocal fold elasticity or an increase in tracheal air

pressure can cause a rise in intensity (Judson and

Weaver 1942:77).
The voluntary act of phonation in humans is so extraordinary

that an accomplished singer can effect over 2,100 variations of pitch
by varying the length of the glottal folds 1-1.5 micrometers (Wyke

1967:5). Additionally, humans alter the post-glottal sound wave by

movements of the tongue, mandible, lips, and velum with

astonishing speed and articulatory proficiency. John F. Kennedy, for

example, held the world record for an articulatory rate of 327 words

per minute in an outburst in a December 1961 speech

(McWhirter 1978:48). One can assume the topic was emotionally charged.


Although initiated voluntarily, the act of speaking is based

mechanistically upon the precise subconscious integration of a large

number of feed-back reflexes which constantly adjust the large
numbers of muscles required with any type of phonation (Wyke

1967:3-4). Three phonatory reflexes derive from mucosal, articular,

and myotatic mechanoreceptors. The first, presented above in the

cough reflex, produces occlusive glottal effects. Articular reflexes

occur very rapidly when the glottis is opened and closed. For the

key of middle C, a human glottis opens 256 times per second. The

articular reflexes produce what is called "phasic tuning." Finally,
much slower and phylogenetically older myotatic mechanoreceptors
produce stretching adjustments, tonic tuning reflexes, allowing a

consistent frequency emission (Wyke 1967:13).

Considering the many vegetative requirements of humans,

breathing, eating, drinking, swallowing, vomiting, coughing,

chewing, sucking, biting, and so on, it is doubtful every muscle and

nerve combination now existing would exist wholly because of such

vegetative functions (Judson and Weaver 1942:37). Of importance

here is what anatomical, neurological, and physiological differences

distinguish the speech mechanism from the vegetative mechanisms.

Unfortunately, this may never be possible to determine, considering the soft

tissue nature of the vocal apparatus in primates. Instead, it can be

argued that vegetative functions must have been closely connected

to the earliest semantic conceptions of hominids and these

conceptions are still present, though at a psycho-semiotic level, in

everyday language.
Below, I present sixteen words in three categories. For each

word presented in Table 1.a., there are 50 instances of this particular

meaning taken from at least 10 of the world's 17 language phyla. I

shall propose about each semantic gloss a number of hypotheses

arguing a nonarbitrary, though motivational, connection between

manners of articulation or places of articulation and meaning. (The

phonetical transcription of these 800 words and the languages they

are from are presented in Appendix A. Their supporting references

are presented in Appendix B. All phonetic characters utilized in

these words are presented and defined in Appendix F for easy

reference.)
Table 1.a.
Testing Glosses and Categories

Physiological:          COUGH, VOMIT, SPIT, EAT, DRINK, CHEW, SUCK, SWALLOW

Ethnoanatomical:        BREAST, TOOTH, NOSE, NECK, MOUTH

Semantically Ancient:   WATER, FOOD, DOG
The manners of articulation to be coded for in these words

include: a. stops, b. fricatives, c. affricates, d. nasals, e. resonants, f.

glides, and g. approximants. The places of articulation and their

involvement with various muscle groups and cranial nerves include:

a. bilabial, b. dental-alveolar, c. palatal, d. labio-velar, e. velar, f.

glottal, g. fronted vowels, and h. backed vowels. Mechanically

speaking, consistent modes of production for semantically similar

concepts across distant language phyla should not be expected

unless the glossary represents the proto-language constraint of

sound symbolism.


It must be assumed there was some importance to face-to-face

social interaction as some species of Australopithecines evolved into

early Homo erectus lineages (Tanner and Zihlman 1976:474).

Because of this, the association between highly physiologically

hedonistic activities, such as chewing and swallowing, and socially

expressive ones of emotional value through the face and the mouth

cannot be ignored (Dellow 1976:9).

A physiological sound symbolism origin is based upon the

assumption that part of the sound-producing mechanism is closely

involved in the activity which is named. Wescott (1980b) goes so

far as to state that a study of non-primate phonation and human

speech suggests that labiality was initially prominent in language
origins. The reason for the early focus upon lip sounds is the

behavioral reinforcement produced by synesthetic experience:

"[B]ecause the lips are the outermost speech-organs, they are, for
a speaker, the most touchable of his own speech-organs and, for a
hearer, the most visible of another's speech-organs. When the
senses of touch and sight overlap the sense of hearing, they not only
reinforce the latter but ease the evolutionary transition from a non-
auditory to an auditory channel of preferential information-
transfer." (Wescott 1980b:105)

Wescott's attitude is nothing less than a reworked version of

the gesture-speech origin of language. Its most important
proponents have included Darwin, Wallace, Tylor, Paget, and

Johannesson (Critchley 1967:27-38). In one manner or another,

each of these scholars proposed that meaningful gesture and
language arose together in a mutual type of synergism (Hewes

1973). Wallace, in particular, held that a wide variety of languages

utilized lip-pointing to express ideas such as coming and going, self

and other, up and down, and inwards and outwards.

At the center of gesture-speech origin theories is the

assumption that the shape of the physiological components

constituting certain sounds (tongue placement, lip protrusion, teeth

baring, extreme exhalation, etc.) may be sufficiently close in manner

to provide a shorthand synonymy for other important behaviors.

The gesture-speech language origin theory is better labelled

physiologically constrained sound symbolism. Two assumptions

underlie the following hypotheses: First, that these words, COUGH,

VOMIT, SPIT, EAT, DRINK, CHEW, SUCK, and SWALLOW, represent

physiological necessities for all primates; second, when they became
semantic entities as words, they still represented affective arenas of

behavior. Therefore, I assume that, as it became necessary for these

physiological processes to become words, they became so in
response to intense evolutionary selection.
Cough. A cough is one member of a larger class of respiratory

maneuvers in which respired gas acts as a fluid coupling which

transmits energy from the respiratory muscles to other sites in the

respiratory system. This class comprises three functions for which

the energy of the respiratory muscles may be used: 1. ventilation,

including breathing (gas exchange), panting (thermoregulation), and

sniffing (olfaction); 2. sound production, including phonation and

singing, whistling, snorting, and the Bronx cheer; 3. moving material

outward or inward, including coughing and forced expiration (lower

airways, larynx), clearing the throat (hypopharynx), spitting (mouth),

sneezing (upper airways), nose-blowing (nasopharynx, paranasal

sinuses, nose), sniffling (retaining secretions in the nose), and

snuffling (nasopharynx, nose, paranasal sinuses) (Leith 1977:545-).

Coughing appears rare when an animal possesses good health,

and it is likely the appearance of coughing increasingly became a

diagnostic sign to hominid groups as they improved upon other

social integration behaviors. If this is true, it is not unlikely

that in most languages the distinctive features naming COUGH could

also have a polysemic relation to words and concepts such as SICK,

HOT, DISEASE, and so on.
While this suggestion has not yet been tested, the null
hypotheses for COUGH are: Ho: stops, velars, back vowels, and

glottals find chance/normal distribution in the sample. The
alternate hypotheses are: Ha: stops, velars, back vowels, and

glottals find higher than chance/normal distribution in the sample.

The alternate hypotheses suppose that because a cough is such an
invariant autonomic process, it provides reference to itself through

sound symbolism.
Vomit. There are numerous mechanisms which protect an

animal from ingested toxins. These include, in decreasing order of

temporal effectiveness: 1. The smell or taste of potential foodstuffs

which may be avoided by innate or learned behaviors, 2. The
detection of toxins by the receptors in the gut followed by a central
reflex triggering appropriate responses; nausea to prevent further
consumption, inhibition of gastric motility to confine the toxin to the
stomach, and vomiting to purge the system of ingested, though not

entirely absorbed, toxin (Davis, Harding, Leslie and Andrews).
Vomiting is of great importance in human evolution considering

the vagaries of diet and health in a pre-scientific era. It is a

powerful reinforcer of memory and behavior for all primates.

Armelagos and Farb remark that back vowels are noticeable in world

languages in words for foods which can cause nausea (Farb and

Armelagos 1980). It can be suggested, therefore, that when selection

pressures developed a word for VOMIT, its features closely related to

those of other words for dangerous food items and visceral sensations,

such as POISON.
Emetic responses to emotionally charged events also occur and
humans can speak of "sickening sights" and "nauseating fights"
(Ganong 1983:180). Likewise, it can be suggested that because of
the inflammatory contexts they are found within, taboo words,
especially derogatory insults, contain features which are
synonymous with VOMIT. Wescott reports that for English, at least,

swear words about all manner of topics, include velar and labial
consonants ("kike," "mick," "dyke," "nigger," "bugger," "fucker,"
"wop," "polack," "gook," "mex," "spic," "canuck," "redneck," e.g.)

(Wescott 1971a:124). This back and front pattern relates at least
superficially with what could be considered fitting sound symbolic
phonetic features naming VOMIT.
Vomiting is a complex muscular event creating many points of
stress and noise, so presupposing features universal to the world's
examples of VOMIT is difficult. Its complexity can be noted in its

sequence of motor actions: 1. the elevation of the soft palate, 2.
larynx and hyoid drawn forward, 3. salivation and opening of the
mouth, 4. closure of glottis, 5. relaxation of the esophagus, 6.
opening of the cardia, 7. flaccid relaxation of the stomach, 8.
constriction of the lower end of the stomach, 9. inhibition of normal
respiration, 10. forced inspiration, 11. sharp contraction of
diaphragm and abdominal muscles, and 12. characteristic posture,
bent at waist, clenched fists, strained face, and so on.
The null hypotheses about VOMIT are: Ho: velars, glottals,

nasals, stops, and back vowels find chance/normal distribution in
the sample. The alternate hypotheses are: Ha: nasal features should

be found at low frequency because the velum is shut when
vomiting, so as to prevent vomitus from entering the nasal
passageways. Glottals, velars, and back vowels should be at high
frequency in the glosses for VOMIT because they correspond to
crucial areas of the process. Stops should be high frequency because
they imitate the suddenness and acoustic manner of vomiting.
Spit. Though spitting is generally thought of as a voluntary
activity, it is much like coughing and is present at birth in neonates.
The normal person secretes about 1.5 liters of saliva per day, which

contains a number of digestive enzymes, provides some measure of
anti-bacterial action, and lubricates and cleans the mouth (Ganong

1983).

It can be assumed that early hominids possessed some degree
of proficiency with spitting, and also put the secretion to important
bio-medical uses. Saliva is known in early and present cultures as

the means to cause fermentation of various grains for the
production of alcoholic drinks. In various human cultures, the act of
spitting can also be a segment of a threat display.
The bio-mechanisms of SPIT are much like COUGH. The
exception is that the liquid globule is usually gathered higher in the
airways. The null hypotheses assume: Ho: fricatives, stops, dental-

alveolars, and affricates should have a chance/normal distribution.
The alternate hypotheses are: Ha: stops, fricatives, dental-alveolars,

and affricates should find higher rates in the distribution. They
recapitulate the articulatory points in the act of spitting and the
sounds which are made in the course of violent and abrupt

expectoration.

Eat. Although a great deal is now known about eating centers of
the nervous system, this is of little aid in determining what
semantic intent a proto-language word such as EAT might contain.
The reason for this is that even though EAT refers to ingesting food,
the steps involved are diverse and complex. Eating involves
chewing, sucking, and swallowing. Each is in turn a behavior whose
foundations are largely autonomic.
It would appear that EAT may have become a word when
selective pressure announced a need to identify the good or bad
qualities of foodstuffs whose properties were not transparent to any
sensory detection. Proto-typically, EAT may mark an occasion
where non-poisonous foodstuffs might be ingested.
Of all the physiological words proposed here, EAT is the most
mysterious. Exactly what does it refer to? I propose these null

hypotheses: Ho: fricatives, dental-alveolars, stops, and front vowels
should have chance/normal distribution. Alternately, I propose: Ha:

fricatives, stops, dental-alveolars, and front vowels should have a
higher rate of distribution. The words for EAT may refer to getting
food to the front of the mouth (front vowels), the tools of eating
(dental-alveolars), sounds of chewing food (fricatives), or mechanics
of glottal closure in swallowing (stops).
Drink. The behavior of drinking is closely related to swallowing.
The difference between the two is that whereas a normal swallow

occurs in one-thirtieth of a second, drinking can occur for durations
exceeding one second (Fink 1975:109). Otherwise, when a person
drinks, a liquid is introduced into the oral cavity and the larynx is
elevated and glottis closes just as with swallowing.
The null hypotheses are: Ho: velars, palatals, resonants, and

stops should be at chance/normal distribution. The alternate
hypotheses are: Ha: velars, stops, palatals, and resonants should

find higher than chance/normal distribution in the sample. Velars
are elevated because the manipulation of the velum prevents liquid
from entering the naso-pharynx. Palatals represent the kinesthetic
sensations of a mouthful of liquid. Stops indicate the necessary
glottal shutting. Resonants mime the action of the tongue while

drinking.

Chew. As mentioned earlier, chewing is a hedonistic event for
hominids. Evidently, the pattern of mastication is generated by a
pool of motoneurones in the brainstem and is not proprioceptive in
nature (Lund 1976:145). The ability to gently crack a peanut or


crush a tiny blackberry seed arises from other proprioceptive facial,
oro-pharyngeal, and laryngeal motoneurones.
Since chewing is a reflex present at birth, its similarities to

features in the production of speech have not gone unnoticed. In
fact, one scholar recently stated that in the production of vowels

and consonants,

"incorporating noncyclical gestures at specific points in an
ongoing cycle of movements closely resembles the incorporation of
food transport and swallowing movements into the cyclical jaw
movements of chewing, suggesting that the pattern in speech is
taken over from eating, with modifications specific to manipulating
the shape of vocal tract resonators in place of ingesting food"
(Kingston 1990:738-739).

Chewing is only one stage in a series of behavioral steps to eat

food. Not only does chewing involve many cranial nerves and
muscles, it appears that humans chew soft foods more slowly than

hard foods (Lund 1976:146). With these bits of information on
chewing, the following null hypotheses are made: Ho: features

found at chance/normal rates, dental-alveolar, front vowels, velars,
and fricatives. The alternate hypotheses are made: Ha: features

found at above chance/normal rates, dental-alveolar, front vowels,
velars, fricatives. These hypotheses are made because chewing
involves articulation of the two dental arcades, in the anterior
portion of the oral cavity, bordered by the velum posteriorly, and
with sufficient force to cause breaking noises to be routinely heard.
Suck. There is little doubt that sucking is crucial in the early
post-natal period for primates. Some studies suggest that "sucking is

a functionally adaptive response that may be influenced by

nutritive reinforcement contingencies in the feeding situation"

(Siqueland and DeLucia 1969:1145).
A child may have tactile, muscular, and gustatory stimuli

initiate sucking, at first by triggering a flow of saliva to assist in the

labial seal on the nipple. The thrusting and closure of the infantile

lips and gum pads upon the peri-areolar tissue is responsible for
milk removal, and importantly, the true physical sucking is a
minimal factor in milk secretion (Dellow 1976:14).
Surprisingly, the effective reinforcement of sucking can be

achieved with a wide variety of stimuli including visual, auditory,

tactile, olfactory, and kinetic (Siqueland and DeLucia 1969:1146). In
other words, humming or rocking an infant may be used to
reinforce feeding behavior in a neonate over and above more
autonomic controls of the nervous system. It can be suggested,

therefore, that sucking reinforcement in early hominids was related

to direct communication with an infant with multi-modal sensory

elements, the ultimate purpose being to train and exercise effective

motions of the facial and oral musculature.

Sucking behaviors are also an important part of healing

procedures practiced by shamans and doctors in widespread
cultural areas. When SUCK was coded finally into word form, sound
symbolism could have set the limits to the features appropriate to
its reference. The null hypotheses are: Ho: palatals, fricatives,

affricates, and nasals should find chance/normal distribution in the
sample. The alternate hypotheses are: Ha: palatals, nasals,

fricatives, and affricates should find a higher rate in SUCK glosses.
The act of sucking creates a negative pressure inside the oral cavity,
explaining the palatal features chosen. Fricatives and affricates
mimic the sounds made in sucking. Nasals are hypothesized at a
higher rate because an infant can breathe and suck simultaneously
and the nasally produced consonants may have reinforcing and
calming qualities.
Swallow. When a swallow is initiated by the voluntary action of
collecting oral contents on the tongue and propelling them
backwards into the pharynx, a wave of involuntary contractions of
the pharyngeal muscles pushes the material at a rate of about 4
cm/s into the stomach. Inhibition of breathing and glottal closure
are vital parts of the swallowing reflex (Ganong 1983:393).
Swallowing is present in utero and the amount of amniotic fluid
swallowed shortly before birth closely corresponds to that of
mother's milk shortly after birth (Dellow 1976:7). This behavior is
such a major portion of human experience that even when fasting,
the normal human swallows approximately 2,400 times per day
(Ganong 1983:393).
Since SWALLOW refers to a virtually autonomic process, parts
of its sequence could be coded into the phonetic rendition of a word
with sound symbolism. The null hypotheses are: Ho: glides, velars,

and glottals should be at chance/normal distribution. The
alternative hypotheses are: Ha: glides are also known as semi-

vowels since the acoustic energy and articulatory form splits vowel
and consonant definitions. So, because of the similarity to

swallowing, glides should be found at higher than chance/normal

distribution. Velars and glottals should also be found at higher rates

because the act of swallowing inhibits respiration and closes the

glottis. Humans must manipulate both the glottis and velum to

prevent food or water from entering the nasal pharynx or the

trachea.

Ethnoanatomical

The universal presence of words labelling parts of the human

anatomy in all languages strongly suggests that ethnoanatomical

terms are members of a proto-lexicon. Which body parts were

named first in response to selection pressures is a mystery. One

function of body terms might have been to represent associated

behaviors with specific areas of the anatomy. Another function

might have been to extend self-reflective reference upon the outer

world. Widespread occurrence of this type of metaphor is seen in

world languages. Such extensions include "mouth of the river", "neck

of the woods", "shoulder of the road", "foot of the mountain", and so

on (Lehrer 1974:135). In some languages the more basic body

terms extend to name even more specific bodily locations, such as

"the neck of the hand" for "wrist" and "neck of the leg" for "ankle."

The basis for sound symbolic naming of anatomy rests in the

physical similarity between the place of articulation and the part

so named. This naming behavior presents a "least moves" economy,

allowing the memory of an area and the activity of that area to coincide.

Breast. For neonates, the human breast is an active area of
behavior. The null hypotheses about BREAST are: Ho: nasals,

bilabials, front vowels, and stops should have chance/normal
occurrence in the sample. The alternate hypotheses are: Ha: nasals,

bilabials, and front vowels should be higher than chance/normal in
the sample because they are found in the same area most used in
suckling. Since feeding is a continuous process, less than
chance/normal distribution of stops should be seen.
Tooth. The properties of human teeth include hardness,
smallness, and presence in the front of the mouth. Wescott argues
that terms for TOOTH also contain dental-alveolar elements
(Wescott 1971:424). With this in mind, the null hypotheses are: Ho:

dental-alveolars, stops, and bilabials should find chance/normal
distribution in the sample. The alternate hypotheses are: Ha:

dental-alveolars and stops should be higher in rate than
chance/normal distribution. Bilabials should be less than
chance/normal distribution in the sample. Though covering teeth,
the lips clearly are not teeth. I assume that the softness of the lips
versus the teeth also made this dichotomy obvious.
Nose. The nose is an anatomical center of unceasing air
turbulence. It contains the third resonance chamber necessary to
create nasal sounds. Likewise, it is the prominent structure of the
face for humans. The null hypotheses are: Ho: nasals, resonants,

and bilabials should find chance/normal distribution in the sample.
The alternate hypotheses are: Ha: nasals and resonants should find

higher than chance/normal rates because the nose is also part of
their place of articulation. Bilabials should be higher in frequency

than chance/normal distribution because the protrusion possible with

the lips visually resembles the protruding nose.
Neck. Many activities take place in the neck. It is the most
obvious source of phonation, coughing, hiccuping, choking,
swallowing, and drinking. The null hypotheses are: Ho: velars,

stops, and back vowels should find chance/normal distribution in
the sample. The alternate hypotheses are: Ha: velars, stops, and

back vowels should find higher than chance/normal occurrence in
the sample. These features are the most representative of the more
autonomic processes in the neck. In addition, it must be assumed
that since the paleolithic hunter era, humans have recognized the

crunching, cracking sound a neck makes as it is broken.
Mouth. It is not so clear what MOUTH refers to in many
languages. Though it is generally thought of as the cavity after the
lips and before the neck, its meaning, like that of so many terms,

varies cross-culturally. The null hypotheses are: Ho: stops, dental-

alveolars, bilabials, and velars should find chance/normal
distribution in the sample. The alternate hypotheses are: Ha: Stops,

because they inflate the oral cavity, should be found at higher than
chance/normal distribution. Dental-alveolars, velars, and bilabials
circumscribe the mouth and also should be present at higher than
chance/normal rates.
Semantically Ancient
Any "once upon a time" theory about human language origins
must include the necessities of finding water, food, and defense
against predators. If sound symbolism did play a pivotal part in the

proto-language naming system in early hominids, it did so by

transforming a number of sensory data into consistent acoustic
form. Semantically ancient examples of sound symbolism are based
upon the connection between the most distinctive feature attribute
of the object named and a referent acoustical metaphor. For
example, WATER is soft and fluid, so it would not be expected that it
be named with stops or dental-alveolars.
Water. A human cannot live for more than a week without
water. There is little doubt that the earliest savanna dwellers
became proficient in finding hidden water as a matter of survival
necessity. The null hypotheses are: Ho: labio-velars, dental-

alveolars, approximants, glides, front vowels, and stops should all
find a chance/normal distribution in the sample of world languages.
The alternative hypotheses are: Ha: dental-alveolars and stops

should be less than chance/normal distribution for WATER. Both
represent distinctness in oral gesturing and are incongruous with
water as a fluid. The labio-velars, approximants, glides, and front
vowels should be higher than chance/normal frequency since they
mime drinking behaviors.

Food. It is hard to imagine what actual food FOOD represents as
a semantic universal in world languages. Does it mean something
that is merely eaten, and thereby include medicinal herbs? Or does
it mean something that is eaten every day and carries an

appropriate set of preparative behaviors about itself? Although it
could be hypothesized that the taste of a food might determine its

name, it is hard to invent or even imagine any one food that might
taste the same for millions of genetically variable individuals.
Nonetheless, if a very sweet food like honey were to be named,
it might be named more for the front of the mouth where those
taste receptors are found, rather than the back of the mouth. For
example, the English "honey" and Greek "meli" both contain front
vowels and nasal consonants. If a food were bitter or used to induce
vomiting, like the gourd the English call "colocynth," a front

and back consonantal symbolism might be produced (Norwood).

For FOOD in general, the null hypotheses are: Ho: nasals and

front vowels should find chance/normal distribution in the sample.
The alternate hypotheses are: Ha: nasals and front vowels should

find a higher rate than chance/normal in the sample. I argue here
that humans identified FOOD in much the same way as BREAST.
Dog. It is uncertain when the wolf was domesticated by early
humans. It can be assumed that since the use of fire and the
production of lanceolate tools, the wolf ceased to be a threat to
human communities. Importantly, wolves are like humans in having
spread to all continents. Human cultures almost universally contain
myths concerning wolves. Dogs are important to humans because
when domesticated they also eat feces and reduce levels of
contamination in the immediate human environment. In various
cultures they are food, servant and work horse, pet and family
member, scientific subject, and god.

A DOG is most readily identified by the sounds it makes. The
null hypotheses are: Ho: velars, stops, back vowels, and glottals

should find chance/normal distribution in the sample of world
languages for DOG. The alternative hypotheses are: Ha: velars, back

vowels, glottals, and stops should find higher than chance/normal

distribution in the sample. The proto-word for DOG may have

synonymy with NECK, since the bark is produced near the neck.

Below are tables 1.b., 1.c., and 1.d. which recapitulate these
unwieldy hypotheses. Each table presents the 16 glosses and the

types of hypotheses argued about each. There are 63 predictions

away from an average feature frequency for all 16 glosses.

Table 1.b.
Glosses and Consonantal Articulation Hypotheses
(H = predicted higher than chance, M = chance/normal, L = predicted lower than chance)

Features:  Bilabial  Dental-Alveolar  Palatal  Labio-Velar  Velar  Glottal

Breast        H            M             M          M          M       M
Tooth         L            H             M          M          M       M
Nose          H            M             M          M          M       M
Neck          M            M             M          M          H       M
Mouth         H            H             M          M          H       M
Cough         M            M             M          M          H       H
Vomit         M            M             M          M          H       H
Suck          M            M             H          M          M       M
Eat           M            H             M          M          M       M
Drink         M            M             H          M          H       M
Chew          M            H             M          M          H       M
Swallow       M            M             M          M          H       H
Spit          M            H             M          M          M       M
Water         M            L             M          H          M       M
Dog           M            M             M          M          H       H
Food          M            M             M          M          M       M

Totals (H/M/L): 3/12/1    5/10/1        2/14/0    1/15/0     8/8/0   4/12/0

Table 1.c.
Glosses and Consonantal Manner Hypotheses
(H = predicted higher than chance, M = chance/normal, L = predicted lower than chance)

Features:  Affricates  Fricatives  Stops  Nasals

Breast         M           M         L       H
Tooth          M           M         H       M
Nose           M           M         M       H
Neck           M           M         H       M
Mouth          M           M         H       M
Cough          M           M         H       M
Vomit          M           M         H       L
Suck           H           H         M       H
Eat            M           H         H       M
Drink          M           M         H       M
Chew           M           H         M       M
Swallow        M           M         M       M
Spit           H           H         H       M
Water          M           M         L       M
Dog            M           M         H       M
Food           M           M         M       H

Totals (H/M/L): 2/14/0    4/12/0    9/5/2   4/11/1

Table 1.d.
Glosses and Vowel and Semi-Vowel Hypotheses

Features: B. Vowels Fr. Vowels Glides Approx. Reson.

Hypotheses: H M L H M L H M L H M L H M L


Breast + + + + +

Tooth + + + + +

Nose + + + + +

Neck + + + + +

Mouth + + + + +

Table 1.d. continued

Features: B. Vowels Fr. Vowels Glides Approx. Reson.

Hypotheses: H M L H M L H M L H M L H M L


Cough + + + + +

Vomit + + + + +

Suck + + + + +

Eat + + + + +

Drink + + + + +

Chew + + + + +

Swallow + + + + +

Spit + + + + +

Water + + + + +

Dog + + + + +

Food + + + + +

Totals: 5 11 0 6 10 0 2 14 0 1 15 0 2 14 0

With the presentation of the hypotheses completed, Chapter II

will provide the tally and analysis of this language data. Three

types of statistical tests are made upon these 63 hypotheses. These

include the standard Chi-Square test, the Kruskal-Wallis median

test, and the Jonckheere-Terpstra ordered alternative test. These

tables are used when Kruskal-Wallis median-rank and Jonckheere-

Terpstra testing is done in Chapter II.
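As a sketch of the first of these tests, a chi-square statistic compares the observed count of a feature in a gloss sample against the count expected under the null (chance/normal) hypothesis. The counts below are invented for illustration and are not the tallies reported in Chapter II.

```python
# Illustrative chi-square test of one null hypothesis (Ho: a feature
# occurs at chance rates in a gloss sample). Observed and expected
# counts here are hypothetical, not the dissertation's data.
def chi_square(observed, expected):
    """Pearson chi-square statistic over paired observed/expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# e.g., words for COUGH containing a stop vs. words without one,
# against a (hypothetical) even chance expectation
observed = [38, 12]
expected = [25, 25]

stat = chi_square(observed, expected)
print(round(stat, 2))  # → 13.52, compared against 3.84 (1 df, p = .05)
```

A statistic exceeding the critical value rejects Ho in favor of the alternate hypothesis that the feature departs from chance frequency.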

Following Chapter II, I discuss the incorporation of prosody as a

subset of sound symbolism in Chapter III. In Chapter III, I also

identify and discuss more than a dozen synonymous sound

symbolism terms and introduce some order to such references

found scattered in the literature. Finally, Chapter III presents

natural language examples of sound symbolism for world languages.

These are illustrative of the extent of sound symbolism throughout

the world, types of sound symbolism, and functions of sound

symbolism.

Chapter IV critically discusses the most important sound

symbolism experiments carried out over the past 70 years. The

diversity of these experiments is not easily compared with the

results from Chapter II. Nevertheless, the concurrence they lend is

noteworthy.

Finally, a summary and concluding remarks are given in

Chapter V. Weaknesses of the dissertation design are outlined and

promising areas of future research are listed.


The Universe of the Linguistic Data

The hypotheses proposed in Chapter I regard human language as

a unitary event, though one expressed in over 5,000 regional
languages. To test the depth of the hypotheses outlined in Chapter I, a

representative sample of the 5,000 languages spoken among humans

is necessary. When testing any gloss of this sample, one major

assumption becomes apparent. This is that the presence of any

predicted feature or pattern of sound and meaning becomes

significant to a universal domain when its frequency falls above or

below chance levels of occurrence. In short, the arbitrary sound-

meaning hypothesis holds that both words and their sounds should

only find average levels of association regardless of meaning.

The data base consists of 800 monolexemes for 16 concepts. The

categories include: BREAST, TOOTH, NOSE, NECK, MOUTH; COUGH,

VOMIT, SPIT, EAT, DRINK, CHEW, SUCK, SWALLOW; WATER, DOG, and

FOOD. Each contains 50 examples or words, and each word comes from
a different language. For each category of 50 words, no more than 5

languages come from one of the 17 language phyla considered. So, for

each meaning and its 50 instances of globally sampled words, at least

10 of the 17 language phyla are represented. The

language phyla considered include: 1. Afro-Asiatic, 2. Australian, 3.

Austro-Asiatic, 4. Austronesian, 5. Eskimo-Aleut, 6. Indo-European,

7. Dravidian, 8. Indo-Pacific, 9. Niger-Kordofanian, 10. North

Amerind, 11. South Amerind, 12. Uralic, 13. Nilo-Saharan, 14.

Khoisan, 15. Austro-Thai, 16. Sino-Tibetan, and 17. Altaic. Language

phyla such as Na-Dene, Paleo-Siberian, Georgian, Basque and others

were excluded from this list because of the lack of representative

sources and ambiguities surrounding their phyletic assignments.
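The sampling constraints just described (no more than 5 languages per phylum per gloss, at least 10 of the 17 phyla represented) are mechanical and easy to verify. A minimal sketch in Python, with an invented mini-sample; the function name and the toy data are mine, not part of the dissertation's apparatus:

```python
from collections import Counter

def check_gloss_sample(phyla, max_per_phylum=5, min_phyla=10):
    """Check one gloss's 50-language sample against the sampling rules:
    no phylum contributes more than max_per_phylum languages, and at
    least min_phyla distinct phyla are represented."""
    counts = Counter(phyla)
    return max(counts.values()) <= max_per_phylum and len(counts) >= min_phyla

# Toy example: 50 languages spread over 12 phyla, 4-5 each (invented data).
toy = ["Afro-Asiatic"] * 5 + ["Australian"] * 5 + ["Austronesian"] * 4 + \
      ["Indo-European"] * 4 + ["Dravidian"] * 4 + ["Uralic"] * 4 + \
      ["Sino-Tibetan"] * 4 + ["Altaic"] * 4 + ["Khoisan"] * 4 + \
      ["Nilo-Saharan"] * 4 + ["North Amerind"] * 4 + ["South Amerind"] * 4

print(check_gloss_sample(toy))   # True: 12 phyla, none over 5
```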

The creation of this data base assumes that a balanced sample

of geographically or historically separated languages should

demonstrate languages composed of varying structural components.

That is, their differences should show apt use of the "language"

category because, by definition, languages are changing entities

never possessing the exact phoneme usage frequencies or phonetic

inventory. This should be so even though they use the same

distinctive features in recognizing and creating their phonemic

inventories. All told, what Saussure (1959) argues should hold:
there should be few strong connections between sounds and meanings,
between signifiers and the concepts they signify.

The fragmentary documentation of geographically separated

languages made collection of all 800 words from 50 languages

impossible. This would have been ideal because a range could have

been obtained for total numbers of phonemes present in the

sample. Unfortunately, the data set holds words from 229 sampled

languages, with no one language providing more than a total of 16

words for all 16 concepts. Thus, no single language's phonemic
range and sound frequencies could influence the results very much.

By sampling from 229 languages and phonologies instead of 50, any

association between sounds and the meanings would be impressive.
In point of fact, less than one percent of the words were
identical in all coded features with any other word. These words were

from the same language phyla and it is uncertain whether they

represented loans or cognates from a mother language. Clearly,

there is plenty of distance between the sounds used to represent
meaning in different cultures, especially when comparing across

phyletic boundaries. However, when predicted patterns of sound-

meaning relationships are consistently observed, the arbitrary

sound-meaning hypothesis is not supported.

Coding the Linguistic Data

Each word in the sample (N=800) was coded in a variety of

descriptive ways. (The entire set of words is presented in Appendix

A and each specific language's supporting reference is in Appendix

B.) First, all phones were tallied. A mean word length for each

category was found. Interestingly, the shortest word was EAT (3.6

phones per word) and the longest was SWALLOW (5.2 phones per

word). Perhaps the longer average reflects the fact that swallowing
is a less cultural, more autonomic behavior. In addition, over 90% of
all words contained between 4 and 5 phones. Below is Table 2.a.:

Table 2.a.

Data Sample Descriptive Tallies
Word Sample Phones/Word Total Phones Consonants Vowels

Eat 50 3.6 181 98 83

Water 50 3.7 186 91 95

Drink 50 3.9 196 106 90

Dog 50 4 204 104 100

Breast 50 4.1 208 108 100

Nose 50 4.2 212 120 92

Tooth 50 4.3 219 120 99

Mouth 50 4.4 220 118 102

Suck 50 4.4 223 127 96

Neck 50 4.5 226 125 101

Cough 50 4.7 237 129 108

Vomit 50 4.7 238 127 111

Spit 50 4.8 245 144 101

Food 50 5 250 124 126

Chew 50 5.1 258 142 116

Swallow 50 5.2 263 141 122

Totals 800 4.4 3566 1924 1642

Here, it is unclear what role word length plays with regard to
specific meaning. It is intriguing that EAT, WATER, and DRINK are
the shortest three words and FOOD, CHEW, and SWALLOW are the

longest three. It might be hypothesized that the longer words

represent longer or slower phenomena, the reverse might be true

for the shorter terms. It would be interesting to test such a guess by

simply replicating the same size sample with new languages. If true,

a length and meaning connection as a human language universal

could be analogous with examples in alloprimate communication

systems. These conjectures must await further testing, however,
because the word-length differences in this data set are not
statistically significant. The standard deviation of this sample is a
large 1.6. So, one standard deviation from the sample mean (4.4)
easily contains both the shortest word

EAT (3.6) and the longest SWALLOW (5.2).
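The per-gloss tallies in Table 2.a. can be checked for internal consistency, and the quoted grand mean recomputed, directly from the table. A quick sketch (the 1.6 standard deviation cannot be recovered from the table alone, since it requires the 800 individual word lengths):

```python
# (gloss, mean length, total phones, consonants, vowels), from Table 2.a.
rows = [
    ("Eat", 3.6, 181, 98, 83), ("Water", 3.7, 186, 91, 95),
    ("Drink", 3.9, 196, 106, 90), ("Dog", 4.0, 204, 104, 100),
    ("Breast", 4.1, 208, 108, 100), ("Nose", 4.2, 212, 120, 92),
    ("Tooth", 4.3, 219, 120, 99), ("Mouth", 4.4, 220, 118, 102),
    ("Suck", 4.4, 223, 127, 96), ("Neck", 4.5, 226, 125, 101),
    ("Cough", 4.7, 237, 129, 108), ("Vomit", 4.7, 238, 127, 111),
    ("Spit", 4.8, 245, 144, 101), ("Food", 5.0, 250, 124, 126),
    ("Chew", 5.1, 258, 142, 116), ("Swallow", 5.2, 263, 141, 122),
]

# Every row's phone count equals its consonants plus its vowels.
assert all(p == c + v for _, _, p, c, v in rows)

# Column totals match the Totals row of Table 2.a.
totals = tuple(sum(r[i] for r in rows) for i in (2, 3, 4))
print(totals)   # (3566, 1924, 1642)

# Mean of the per-gloss means, as quoted in the Totals row.
print(round(sum(r[1] for r in rows) / len(rows), 1))   # 4.4
```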

Analysis of the data set was done further for a comprehensive

number of articulatory and acoustic features. Each sound, whether

consonant or vowel, is identified according to its distinctive

features. Tallying is a binary decision: a language's word either
does or does not contain a feature. Hence,

the maximum number of words for each category possessing any
given feature is 50, or 100%. The gloss COUGH, for instance, gives 49

out of 50 languages with an obstruent in that meaning. These coding

parameters for vowels included all rounded or unrounded front,

central, or back vowels distinguished by high, mid, or low tongue
height. Consonantal coding was done for the following front to back

places of articulation: bilabial, labio-dental, interdental, dental-
alveolar, palatal, labio-velar, velar, uvular, and glottal. Consonants

were also coded for the following manners of articulation: stop,

fricative, affricate, nasal, glide, trill, lateral, approximants,

obstruent, and resonant. These six coding tables are given in

Appendix C according to ethnoanatomical, physiological, and cultural

glosses. Not all the coding parameters were used in testing

hypotheses. The vowels, for instance, are tested only according to

whether they are front or back. The extra coding parameters are

available to demonstrate the full scope of the data and for further

testing by interested scholars.
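The binary tallying scheme described above can be sketched as follows; the phoneme classes here are a toy fragment of the full Appendix C scheme, and the COUGH-like words are invented for illustration, not drawn from the 800-word sample:

```python
# Toy feature classes (a fragment; the full coding scheme has many more).
FEATURES = {
    "stop":     set("ptkbdg"),
    "nasal":    set("mn"),
    "bilabial": set("pbm"),
}

def has_feature(word, feature):
    """Binary decision: does the word contain at least one phone
    belonging to the feature class?"""
    return any(ch in FEATURES[feature] for ch in word)

def tally(words, feature):
    """Count how many of a gloss's words (maximum 50) carry the feature."""
    return sum(has_feature(w, feature) for w in words)

# Invented mini-sample of COUGH-like words.
cough = ["koho", "gohu", "tus", "ohon", "saal"]
print(tally(cough, "stop"))   # 3  (koho, gohu, tus)
```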

Hypothesis Testing Using Chi-Square

In Chapter I, 63 hypotheses contrasted sound symbolism and

arbitrary sound-meaning relations. The arbitrary sound-meaning

hypothesis is the null hypothesis. It argues that in the pantheon of

5,000 known languages, all phones will be randomly represented

over all meanings. There should be no particular agreement among
separate languages in the sounds attached to meanings. Further, when
a single category of words is compared among languages, the
interlanguage similarity should be equally small.

My 800 word data sample is synchronic. It takes words from

languages as they are known this century. No words represent the

proto-forms of any phyla. The statistical tests necessary are

nonparametric because the underlying population distribution of a

sample is not uniform (Wynne 1982:330). The 800 word data set

represents 229 languages and therefore, 229 distinct phonetic

inventories. The little information available about most makes

normality assumptions difficult to test.

However, the 800 word sample does represent 229 languages

and the phonetic range for this sample probably reaches 90% of the

possible phonetic variation known for human language in its
entirety. Then, by obtaining frequency counts of categories (words)

for certain qualitative variables (distinctive features), a two-by-two
contingency table or Chi-square can measure the significance of any
sound-meaning association.

In the 63 Chi-square tests below, a test word (e.g., COUGH) is

compared to all other words (15 other glosses) according to a
qualitative feature. That is, as a sample, COUGH might contain a total
of 50 examples of a certain feature for its 50 languages. This
number is compared to the total number of other languages and

features, which might total 750 features for 750 languages. Chi-
square results from the calculation of Phi shown below (Driver):

Phi = (ad - bc) / sqrt((a+b)(a+c)(b+d)(c+d))

Chi-square = Phi^2 x N

Since the degrees of freedom equalled 1, the Yates correction

was applied for distribution skewing. There is some debate recently
over whether the Yates correction for continuity is necessary. In our

case, the N is so large (800), that applying this correction lowers Chi
values very little. The nature of Chi-square only allows non-

directional associative findings. Although the hypothesis about

COUGH predicts it will contain more STOPS than average, deviation

in either direction will result in a significant Chi-square.
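The Phi and Chi-square calculations, with and without the Yates correction, can be reproduced from any of the 2x2 tables that follow. A sketch using the BREAST stop counts (23, 506, 27, 244) of Table 2.b.1.; the uncorrected value matches the reported 9.6 and Phi of .11, while the Yates value computes to 8.7 (reported, rounded, as 8):

```python
from math import sqrt

def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square for a 2x2 table [[a, b], [c, d]] via the Phi formula;
    with yates=True, |ad - bc| is reduced by N/2 before squaring."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(diff - n / 2, 0)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * diff ** 2 / denom

# BREAST +stop / -stop versus all other glosses (Table 2.b.1.).
a, b, c, d = 23, 506, 27, 244
chi = chi_square_2x2(a, b, c, d)
print(round(chi, 1))                             # 9.6, as reported
print(round(sqrt(chi / (a + b + c + d)), 2))     # Phi = 0.11
print(round(chi_square_2x2(a, b, c, d, yates=True), 1))   # 8.7
```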

All 63 hypotheses discussed in Chapter 1 contain the same


Null Hypothesis:

Ho: u = U (given a word sample of n = 50 with u occurrences of a
feature, and the larger sample N - n = 800 - 50 = 750 with U
occurrences of the feature, the two proportions should be similar);
Alternate Hypothesis:

Ha: u is not equal to U.

The test statistic is Chi-square, and the corrected Yates value is

given. The significance level sought is p<.05. At this level, the
test asks: if the true correlation between a feature and a meaning is
zero, what is the probability of obtaining, through sampling error, a
value as high as or higher than that observed? Since there are
repeated tests being made, the

results must be qualified. If 100 tests were made with Chi-square

at a .05 probability level, about 5 cases would be likely to reach
significance by chance alone. In the tests presented below, about
3 of the 63 hypotheses should be expected to yield results solely
according to chance associations. As the results will show, this error

is negligible due to the dramatic number of significant tests. Below

are the 16 glosses and the Chi-square tests for each:

Table 2.b.1.

Breast All-Breast p Yates p p<.05

+Stop 23 506 Phi=.11

-Stop 27 244 Chi=9.6 .001 8 .003 ****

+Bilabial 21 232 Phi=.05

-Bilabial 29 518 Chi=2.6 .10 2.1 .14

+Nasal 26 334 Phi=.03

-Nasal 24 416 Chi=1.0 .3 .77 .37

+F. Vow 24 337 Phi=.01

-F. Vow 26 413 Chi=.17 .67 .07 .78

For BREAST, distribution of the stop is significantly predicted

by the alternate hypothesis. Of 50 languages contributing to the

BREAST sample, only 46% contain one or more stops while 67% use

one or more stops for the 15 other concepts.

Table 2.b.2.

Tooth All-Tooth p Yates p p<.05

+Dental 42 507 Phi=.08

-Dental 8 243 Chi=5.8 .01 5.1 .02 ****

+Stop 30 499 Phi=.03

-Stop 20 251 Chi=.89 .34 .65 .42

+Bilabial 32 329 Phi=.14

-Bilabial 18 421 Chi=16.1 .0001 14.9 .0001 ****

+F.Vowel 32 329 Phi=.09

-F.Vowel 18 421 Chi=7.6 .005 6.8 .008 ****

These results demonstrate that world languages use dental-

alveolars and front vowels, not bilabials to name TOOTH. A linguist

should find the teeth named with sounds like ne, si, se, zi, chi, and

so on, but not mo, ma, ka, ta, duh, pu, po, ba, bo, and so on.

Table 2.b.3.

Nose All-Nose p Yates p p<.05
+Nasal 40 320 Phi=.18

-Nasal 10 430 Chi=26.3 .0001 24.9 .0001 ****

+Resonant 42 496 Phi=.09

-Resonant 8 254 Chi=6.7 .009 6.0 .01 ****

+Bilabial 27 226 Phi=.12

-Bilabial 23 524 Chi=12.3 .0004 11.2 .0008 ****

For NOSE, nasals, resonants (which include nasals and

approximants), and bilabials are favored features, presumably

because of iconic or gestural similarity. The NOSE sample shows 80%

of its languages choosing nasal, versus 43% for the other 15

concepts. For resonants, NOSE carries 84% to 57%, and for bilabial

54% to 30%.

Table 2.b.4.

Neck All-Neck p Yates p p<.05
+Velar 35 286 Phi=.15

-Velar 15 464 Chi=19.8 .0001 18.5 .0001 ****

+Stop 42 487 Phi=.09

-Stop 8 263 Chi=7.6 .005 6.7 .009 ****

+B.Vowel 30 402 Phi=.03

-B.Vowel 20 348 Chi=.7 .37 .5 .46

Highly significant associations for velars and stops are seen for

NECK. The velar feature may appear in NECK for iconic or
kinesthetic reasons: the place of velar articulation lies roughly at
the level of the neck. The significance may also arise because so many

vegetative processes occur in the neck involving the same processes

of phonatory stopping. Some evidence is seen for this below for

COUGH and VOMIT. Both contain significant levels of stops like NECK.

Table 2.b.5.

Vomit All-Vomit p Yates p p<.05
+Nasal 15 345 Phi=.07

-Nasal 35 405 Chi=4.8 .02 4.2 .03 ****

+Stop 40 489 Phi=.07

-Stop 10 261 Chi=4.5 .03 3.9 .05 ****

+Glottal 13 113 Phi=.07

-Glottal 37 637 Chi=4.2 .03 3.4 .06

+Velar 23 298 Phi=.03

-Velar 27 452 Chi=.7 .38 .52 .46

+B.Vowel 26 406 Phi=.01

-B.Vowel 24 344 Chi=.08 .76 .02 .88


For VOMIT, significance is found with nasal and stop features.

Presumably, nasals are not favored because the velum is usually

closed during vomiting. Stop features describe convulsive mechanics
of vomiting. The glottal features approach significance with p=.06.
Surprisingly, back vowels and velar consonants find an average
distribution. The most common VOMIT vowels are /a,a/.

Table 2.b.6.

Cough All-Cough p Yates p p<.05
+Stop 41 488 Phi=.08

-Stop 9 262 Chi=6 .01 5.2 .02 ****

+Velar 28 265 Phi=.1

-Velar 22 485 Chi=8.6 .003 7.7 .005 ****

+Glottal 13 113 Phi=.07

-Glottal 37 637 Chi=4.2 .03 3.4 .06

+B.Vowel 31 401 Phi=.04

-B.Vowel 19 349 Chi=1.3 .24 1 .3

Like the word NECK, the gloss COUGH contains significant

numbers of stops and velars. The glottal feature is suggestive at

p=.06. COUGH does carry features commonly known for most coughs,

namely velar stops. Back vowels are just as likely to be found as

front vowels in names for cough. Of all vowels, the most common for

COUGH is Back Mid Round /o/, at 36%.


Table 2.b.7.

Mouth All-Mouth p Yates p p<.05
+Bilabial 19 234 Phi=.03

-Bilabial 31 516 Chi=1 .31 .7 .39

+Dental 35 514 Phi=.007

-Dental 15 236 Chi=.04 .82 .003 .95

+Stop 32 497 Phi=.01

-Stop 18 253 Chi=.1 .74 .03 .86

+Velar 22 299 Phi=.02

-Velar 28 451 Chi=.3 .56 .18 .66

None of the hypotheses for MOUTH is significant. Apparently
there is no commonality in the features used to name MOUTH.

When the frequencies of all the features are ranked, as is done

below in the next analysis section, MOUTH is average for all features

except one. It is tied for last place, out of 16 rankings, for the use of

fricatives. Why MOUTH stands out among anatomical terms may

have something to do with the semantic vagueness of MOUTH itself.
What is the MOUTH? Where does it begin and end? Its vagueness

may aid in its arbitrary sound-meaning form.

Table 2.b.8.

Suck All-Suck p Yates p p<.05

+Palatal 17 115 Phi=.12

-Palatal 33 635 Chi=11.8 .0006 10.5 .001 ****

+Affricate 5 37 Phi=.05

-Affricate 45 713 Chi=2.4 .11 1.5 .21

+B.Vowel 39 393 Phi=.12

-B.Vowel 11 357 Chi=12.3 .0004 11.3 .0008 ****

+Fricative 21 280 Phi=.02

-Fricative 29 470 Chi=.4 .50 .25 .61

+Nasal 25 335 Phi=.02

-Nasal 25 415 Chi=.5 .46 .3 .55

For the gloss SUCK, palatals and back vowels are significant.

This duplicates the mechanics of suction whereby the tongue is

depressed due to negative ingressive pressures. Other features such
as nasal, fricative, and affricate are insignificant. Apparently, there is
little acoustic mimicry found in words for SUCK.

Table 2.b.9.

Eat All-Eat p Yates p p<.05


+Fricative 12 289 Phi=.07

-Fricative 38 461 Chi=4.2 .04 3.6 .057

+Dental 30 519 Phi=.04

-Dental 20 231 Chi=1.8 .17 1.4 .23

+Stop 27 502 Phi=.06

-Stop 23 248 Chi=3.5 .06 2.9 .08

+F.Vowel 16 345 Phi=.06

-F.Vowel 34 405 Chi=3.7 .05 3.1 .07

The gloss EAT is an enigma. No feature appears at levels above
average. Surprisingly, the rotary action of chewing and its intra-oral
noise must not contribute to fricative choice in EAT words. Fricative
frequency is 24% versus 38% for all other glosses.

Possibly the reason for this is that EAT, like MOUTH, is a culturally

more malleable word because of its semantic vagueness. Like

MOUTH, what does EAT refer to? Is it chewing, swallowing,

consumption, sipping, gulping, or slurping? Each culture approaches

EAT differently and this is paramount in its form.




Table 2.b.10.

Drink All-Drink p Yates p p<.05


+Velar 12 309 Phi=.08

-Velar 38 441 Chi=5.7 .01 5 .02 ****

+Palatal 11 121 Phi=.03

-Palatal 39 629 Chi=1.1 .27 .7 .37

+Resonant 34 504 Phi=.0004

-Resonant 16 246 Chi=.01 .90 .001 .96

+Stop 31 498 Phi=.02

-Stop 19 252 Chi=.4 .52 .2 .62

Only velar features are significant for DRINK. Since drinking

mechanics conspire to keep fluid from the nasal sinuses, velars occur
at less than the mean frequency for the other words. It may be that

physiological words tend to reject the very features that would

indicate a poor enactment of the named event. When velars are in a

word, movement of the velum draws attention to the border area

between the mouth and nose at the soft palate. Choking on a drink
or morsel of food involves the velum and the glottis and accurate
drinking behavior may be named to contrast with this.

Table 2.b.11.

Chew All-Chew p Yates p p<.05


+Velar 22 299 Phi=.02

-Velar 28 451 Chi=.3 .56 .1 .66

+Fricative 24 277 Phi=.05

-Fricative 26 473 Chi=2.4 .11 1.9 .15

+F.Vowel 36 341 Phi=.02

-F.Vowel 14 409 Chi=.5 .45 .3 .54

+Dental 36 513 Phi=.01

-Dental 14 237 Chi=.2 .59 .1 .7

Like the words EAT and MOUTH, CHEW is semantically

inscrutable because none of the tested features are significant. CHEW

may not be comparable in its failure to MOUTH and EAT and may

involve other senses. For nasals, CHEW ranks second among 16 for

frequency of such phonemes. Chewing may be tied to the

stimulation of the cranio-facial musculature and enhancement of

olfactory detections. Or, the act of chewing reduces food mass and
may reflect this in a front, small to back, large vowel apophony in

each word for CHEW. For CHEW, 30 of 50 languages show a vowel apophony of this kind.


Table 2.b.12.

Swallow All-Swallow p Yates p p<.05


+Glide 15 109 Phi=.11

-Glide 35 641 Chi=8.5 .003 7.4 .006 ****

+Velar 23 298 Phi=.03

-Velar 27 452 Chi=.7 .38 .5 .46

+Glottal 9 117 Phi=.01

-Glottal 41 633 Chi=.2 .6 .06 .8

SWALLOW contains glides at significant levels. Glides mime the

motion of the tongue as it propels a bolus of food toward the

esophagus. This result was predicted. Glottal and velars are random

and perhaps for reasons similar to why DRINK lacks velar features

at significant levels. Ideally, swallowing is a continuous and

autonomic process and glottal and velar articulation features stand

in the way of this. When swallowing goes awry it becomes choking,

and it is possible CHOKE would use features which are only random

in SWALLOW, much like DRINK.


Table 2.b.13.

Spit All-Spit p Yates p p<.05

+Fricative 32 269 Phi=.14

-Fricative 18 481 Chi=15.8 .0001 14.6 .0001 ****

+Stop 41 488 Phi=.08

-Stop 9 262 Chi=6 .01 5.2 .02 ****

+Dental 39 510 Phi=.05

-Dental 11 240 Chi=2.1 .14 1.7 .18

+Affricate 9 33 Phi=.14

-Affricate 41 717 Chi=17.4 .0001 14.8 .0001 ****

SPIT contains significant levels of affricates, fricatives, and

stops. All these features are present in the mechanics and acoustics

of spitting. This is, again, predicted. The dental-alveolar frequency
of SPIT ranks third of 16 words. Even so, such a frequency is only

average. As for the vowel a SPIT word tends to contain, though no

predictions were made, the SPIT sample contains the highest

number of High Back Round vowels of the data set. The vowel is /u/
and in 23 of 50 languages SPIT contains this vowel. It would be

interesting to see whether the finding holds up in a larger sample.

Table 2.b.14.

Food All-Food p Yates p p<.05

+Nasal 27 333 Phi=.04

-Nasal 23 417 Chi=1.7 .18 1.3 .24

+F.Vowel 24 337 Phi=.01

-F.Vowel 26 413 Chi=.1 .6 .07 .78

FOOD is not labelled with any predicted feature at significant

levels. This may be due to poor feature choice or the semantic

variability of FOOD across cultures. One culture's food is another's

waste. However, FOOD leads the category of central, unround

vowels, showing 39 of 50 languages with the vowel /a/. The

significance of this is unclear, but next most common for this vowel

is CHEW (36) and then EAT (33). Interestingly, these three terms

had the fewest successfully predicted features. Also pertinent is the

observation that FOOD contains the most dental-alveolars features

of any gloss. It tops even the TOOTH gloss. This suggests FOOD may
have polysemic overlap with TOOTH as the start of the eating process.

Table 2.b.15.

Dog All-Dog p Yates p p<.05


+Velar 22 299 Phi=.02

-Velar 28 451 Chi=.3 .55 .19 .66

+B.Vowel 25 407 Phi=.02

-B.Vowel 25 343 Chi=.3 .56 .18 .66

+Stop 39 490 Phi=.06

-Stop 11 260 Chi=3.3 .06 2.8 .09

+Glottal 7 119 Phi=.01

-Glottal 43 631 Chi=.12 .72 .02 .88

The gloss DOG is insignificant for all tested features. Contrary
to popular belief, the word for DOG is not similar across widely
disparate linguistic areas for stops, velars, glottals, and back vowels.

Other features may be similar but have not been measured. For
instance, DOG ranks third for labio-velars, front vowels, and

approximants. It may also display vowel apophony.

Table 2.b.16.

Water All-Water p Yates p p<.05


+Labio-Velar 8 46 Phi=.09

-Labio-Velar 42 704 Chi=7.2 .007 5.7 .01 ****

+Approximant 16 238 Phi=.001

-Approximant 34 512 Chi=.001 .96 .01 .9

+Stop 18 511 Phi=.16

-Stop 32 239 Chi=21.6 .0001 20.1 .0001 ****

+F.Vowel 22 339 Phi=.005

-F.Vowel 28 411 Chi=.02 .86 .003 .98

+Glides 13 111 Phi=.07

-Glides 37 639 Chi=4.4 .03 3.6 .05 ****

+Dental 27 522 Phi=.004

-Dental 23 428 Chi=.01 .89 .002 .98

Like SWALLOW, the gloss WATER contains significant

association with glides. This was predicted. Labio-velars are also

significant in the words collected for WATER. This is interesting

because this type of phoneme is produced at both ends of the oral

cavity and perhaps duplicates the wide oral area which water

contacts. Finally, WATER does not tend to use stops as naming

features. In this way, it is much like BREAST.
It would be interesting to compare terms for water from

cultures which are aware of ice and those which have had little

knowledge of ice. If there is a reference to water because of its
liquidity, would the cultures with knowledge of ice include more

stop features than average for their water term? (English contains a
stop in its water term, /t/, but also a labio-velar /w/).

Hypothesis Testing Using Rank Ordering

Since an 800 word sample is large and bulky, more than one

type of statistical analysis is useful to bring out significance. A large

number of ranking nonparametric tests are available to test the null
hypothesis for social scientists. One of the most widely used is the
Kruskal-Wallis one-way analysis of variance by ranks. This test is

useful when there are more than two categories comparing more

than two populations or samples. When only two categories and two
populations are given, the Kruskal-Wallis test is equivalent to the
Mann-Whitney test, and its significance is read from the same
Chi-square distribution tables (Daniel 1990:226).
Another nonparametric test useful to the types of data
considered here is known in the literature as the Jonckheere-
Terpstra test for ordered alternatives (Terpstra 1953; Jonckheere
1954). In the Kruskal-Wallis test, as in the Chi-square, the deviation
in a particular direction from the null hypothesis cannot be

measured (Hollander and Wolfe 1973:122). With the Jonckheere-
Terpstra test, the alternative hypotheses are ordered and at least

three samples drawn. Since this test is used with three or more

samples of observations, the distinction between one-sided and

two-sided tests is not maintained (Daniel 1990:235). It is, therefore,
a very powerful alternative nonparametric test which creates

simplified results available to any researcher with a rudimentary

understanding of z-score and normal distribution statistics (Odeh).

In Chi-square analysis, each of the 16 word categories has a

number of hypotheses. Presumably, each word as a category (n=50)
has a mean different from that of the larger number of words
(N-n=750) drawn from the same universe of

words (U=800). In nonparametric ranking analysis, each word
category is ranked against each other according to each of the 15
tested features: bilabial, dental-alveolar, palatal, labio-velar, velar,
glottal, affricate, fricative, stop, nasal, back vowel, front vowel,
glide, approximant, and resonant. The initial ranking needed for both
tests is given in Appendix D. The actual rankings are given in

Appendix E. Actual rankings average the ties between categories and
are not merely the 1 through 16 rankings found in the initial ranking.

Kruskal-Wallis Testing. The Kruskal-Wallis test is a median-
rank test. Any null hypothesis formed with it assumes that the k
sums of ranks (that is, the sums of the ranks in each sample) to be

about equal when adjusted for unequal sample sizes (Daniel

1990:227). According to the 63 hypotheses outlined in Chapter 1
and tested according to Chi-square in this chapter, we can only say

that each Chi-square test shows or fails to show significant

association between word and feature frequency. In the case to

follow, the testing feature (e.g. bilabial, velar, et cetera), not
individual hypotheses about words, is considered. In testing medians,
not means, the Kruskal-Wallis test can tell whether the hypotheses,
as grouped by feature, are significant or not.
In order to test using the Kruskal-Wallis design, the hypotheses

outlined at the end of Chapter I must be used. This time, as the

tables 1.a., 1.b., and 1.c. show, each feature is predicted to be High,
Mid, or Low in frequency in each of the 16 glosses. The 63

hypotheses now become 240 hypotheses, with the 177 unstated
Middle or average values considered hypotheses. Further, in using

this test, some of the features have only Mid and High values
predicted, while four, bilabials, dental-alveolars, stops, and nasals
have three values predicted. Below are the predictions made for 16

glosses and 15 features on two and three values (k=2, k=3 e.g.).

The Kruskal-Wallis test statistic is given below. In summary, it

is a measurement that is a weighted sum of the squares of

deviations of the sums of ranks from the expected sum of ranks,
using reciprocals of sample sizes as weights (Daniel 1990:227):

H = [12 / (N(N+1))] * SUM(i=1 to k) [Ri^2 / ni] - 3(N+1)

The use of this test statistic involves making the null

hypotheses that, given n1, n2, or n3 population comparisons (Hi,
Mid, or Lo samples), their medians will be identical. The
alternate hypotheses argue the medians are different from one
another in the predicted manners. There are 63 High or Low
frequency medians predicted for my data set. The remaining 177
are Mid predictions. When k=3, the degree of freedom is 2, for k=2,
the d.f. score is 1. The significance tables are the same as those used
for Chi-square. The table below gives the computed Kruskal-Wallis
test statistics:

Table 2.c.
Kruskal-Wallis Results and Significance

k sample Test-Stat (H) p<.05


Features Tested:

Bilabial 3 6.5 ****

Dental-Alveolar 3 1.7

Palatal 2 3.4

Labio-Velar 2 2.1

Velar 2 8.4 ****

Glottal 2 -.6

Affricate 2 4.7 ****

Fricative 2 .5

Stop 3 5.9 ****

Nasal 3 5.1

Back Vowel 2 1.2

Front Vowel 2 .09

Table 2.c. continued

k sample Test-Stat (H) p<.05

Features Tested:

Glide 2 4.1 ****

Approximant 2 -.12

Resonant 2 .8
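The H statistic can be recomputed from any set of rank groups. A sketch with invented ranks for a k=3 prediction (3 High, 10 Mid, and 3 Low glosses ranked 1 through 16 on some feature); these ranks are illustrative, not the dissertation's actual rankings:

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H = 12/(N(N+1)) * sum(Ri^2/ni) - 3(N+1), where
    Ri is the rank sum of group i.  Ranks are assumed already assigned,
    with ties averaged beforehand."""
    n = sum(len(g) for g in groups)
    s = sum(sum(g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * s - 3 * (n + 1)

# Hypothetical k=3 prediction over 16 glosses (invented ranks).
hi, mid, lo = [1, 2, 3], list(range(4, 14)), [14, 15, 16]
print(round(kruskal_wallis_h([hi, mid, lo]), 2))   # 11.18
```

With k=3 the statistic is referred to the Chi-square table at 2 degrees of freedom, exactly as the text describes.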

In these results, the predictions for bilabials, velars, stops,

affricates, and glides are significant. This represents one-third of

the feature categories tested. In comparison, about one-third of the

Chi-square scores of the 63 individual hypotheses were significant

at the same levels of probability. The concurrence speaks well of
the overall success of the hypotheses and the internal consistency
of the data.

Given usual pronouncements of the arbitrary sound-meaning
hypothesis, a sample such as the one created here should contain about
a 5% shared cognate set per 100,000 years of contact. The levels of
association between features and meanings are

entirely too high, almost 6 times more than expected. This exposes

either a serious flaw in linguistic reconstructionist arguments or

evidence that sound symbolism is present within many languages,
regardless of phyletic grouping.
Alternately, it might be argued the significance is due to a

sub-set of languages within the sample concurring. This seems
unlikely given that 229 languages provide the 800 words in the

data set. If there is actually an agreement among a subset of


languages, it would have to be remarkably obvious to create such a

strong showing.
Jonckheere-Terpstra Testing. While the Chi-square and
Kruskal-Wallis test statistics measure differences between selected

samples of words or features, neither indicates whether the

difference is in the predicted direction. Though there are many
ranking tests, one useful test is the little known Jonckheere-

Terpstra test for ordered alternatives. With this test, at least three
populations are required. In it, the null hypothesis predicts all
populations equal, but the alternate hypothesis predicts an

inequality in a particular direction. For the alternate hypothesis,
population 1 is less than or equal to population 2, which is less
than or equal to population 3. In short, the
Jonckheere-Terpstra test is a one-sided Mann-Whitney or Wilcoxon

test. The advantage of this test is that it takes into account the
partial prior information in a postulated previous ordering.
In the tables listing the hypotheses in Chapter 1, it can be seen

that only bilabial, dental-alveolar, stop, and nasal features contain a

k=3 and qualify for this type of testing. Additionally, all the

hypotheses of the dissertation can be summed and a grand score of
hypothesis efficacy can be figured. This type of test creates a J-

score which, given probability tables, yields a significance level.
Entering such a table, the desired p-level is matched with the k-score
and the k-score's three or more sample sizes. For instance, the
k-score for bilabial is 3, with sample sizes of 3, 12, and 1. The
probability level can then be read as less than .05.

The formula for obtaining the Jonckheere-Terpstra test is given

below. It tallies all pairwise comparisons from each population,
giving a score of 1 when one population element is greater than that
in another, and one-half point in the case of a tie. It measures
whether at least one of the population means is less than at least
one of the other population means (Daniel 1990:234).

J = SUM(i&lt;j) Uij

The k-scores for each of the five tests are non-symmetrical and

unusual. As a result, tables do not exist which can translate the J-
score into a probability statement. This is unfortunate, but not
devastating. When sample size is large enough, the J-score can be
converted quite readily into the standard z-score, which carries a
normal distribution. In the z-score, the mean is always 0 and the
variance 1. The formula to convert using the obtained J-score is
given below.

z = [J - (N^2 - SUM(i=1 to k) ni^2) / 4] /
    sqrt{ [N^2(2N+3) - SUM(i=1 to k) ni^2(2ni+3)] / 72 }

This test is useful because it relates the ability of the
hypotheses to predict order in a data set, which according to
arbitrary sound-meaning tenets, should not have order.
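The J statistic and its z-score conversion can be sketched directly from the definitions above; the three ordered groups here are invented toy values arranged in the predicted Lo < Mid < Hi direction, not the dissertation's feature counts:

```python
from math import sqrt

def jonckheere_terpstra(groups):
    """J counts, over every pair of groups i < j in the predicted order,
    how often an element of group i falls below an element of group j
    (ties score one half).  Returns (J, z) via the large-sample normal
    approximation."""
    j = 0.0
    for i in range(len(groups)):
        for k in range(i + 1, len(groups)):
            for x in groups[i]:
                for y in groups[k]:
                    j += 1.0 if x < y else (0.5 if x == y else 0.0)
    n = sum(len(g) for g in groups)
    mean = (n * n - sum(len(g) ** 2 for g in groups)) / 4
    var = (n * n * (2 * n + 3)
           - sum(len(g) ** 2 * (2 * len(g) + 3) for g in groups)) / 72
    return j, (j - mean) / sqrt(var)

# Toy ordered samples: Lo < Mid < Hi, as predicted (invented values).
j, z = jonckheere_terpstra([[1, 2], [3, 4, 5], [6, 7]])
print(j, round(z, 2))   # 16.0 2.57
```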

The scores are given below in Table 2.d. with their significance

Table 2.d.

Jonckheere-Terpstra Results for Feature Hypotheses (k=3)

J-Score Z-Score p p<.05

Feature(s) Tested:

Bilabial 47 3.5 .0002 ****

Dental-Alveolar 47 9.5 .0001 ****

Stop 56.5 9.7 .0001 ****

Nasal 51 8.9 .0001 ****

All Hypotheses (63): 8751 6.3 .0001 ****

(58)Hi &lt; (177)Mid &lt; (5)Lo

These strikingly significant results indicate that when enough

information warranted three predictions as to the direction of the
means of three populations about certain features, the hypotheses

were all significant. Further, the results show that as a whole, the
63 hypotheses proposed initially, when modified into 240
hypotheses by including populations which are merely considered

average, are highly significant. Succinctly, this indicates order can
be predicted for sound-meaning associations in a geographically and
genetically distant sample of world languages utilizing classical ideas
about sound symbolism. To date, such a simple design has never
been done by scholars researching the limits of the sound
symbolism phenomena.

The following chapter places these results into the context of

widespread sound symbolism examples from world languages.
Given such comparison, the marked results of this chapter appear
unusual only because of the lack of structured research into sound

symbolism phenomena.



Within this chapter, three related areas are examined.

They are important to consider because they shed light upon

the difficulties which arise when scholars choose to specialize
research domains and forget the overall unity of linguistic
phenomena. First, evidence suggesting sound symbolism

encompasses prosody is reviewed. Long labelled a "supra-
segmental" feature of linguistic patterning, prosody is essential
to all languages. Philosophers from Plato to Freud and linguists

from Ben Johnson to Roman Jakobson have held that prosodic
functions are intrinsic to the lineal nature of sound use in

communication purposes. Prosody not only occupies a pivotal
role in the language play during language acquisition for

children, it is basic in allowing meaning transfer between

speakers. Yet, until recently, prosody has received little
serious attention by language scholars.
Many works have sketched out schemes which place prosody within a
sound symbolic domain, or sound symbolism within a prosodic one.
Each paradigm reaches vastly different conclusions. Among the more
notable are Fonagy (1979), La Métaphore en Phonétique; Genette
(1976), Mimologiques: Voyage en Cratylie; Ertel (1969),
Psychophonetik; Jakobson and Waugh (1978), The Sound Shape of
Language; Wescott (1980c), Sound and Sense: Linguistic Essays on
Phonosemic Subjects; and Thass-Thienemann (1967), The Subconscious
Language.


Second, prosody is vast and its literature has not been adequately
reviewed anywhere. Neither has its body of knowledge ever been
truly compared with sound symbolism studies. So, even though this
cannot be done here, I will list and define a plethora of sound
symbolism terms currently used without much agreement among
scholars. In recognizing this immense arena claimed by the numerous
sound symbolism researchers, I propose that prosody is a subset of
a much tighter grouping of sound symbolism rules. I predict that
when the elements of a universal prosody are identified and
codified, they will be indistinguishable from sound symbolic ones.

Lastly, evidence of sound symbolism from 12 of the 17 major
language phyla is presented. I claim research will expose sound
symbolism in all known language phyla. Its apparent absence in the
remainder is due to the lack of published research data, for it
certainly appears present in scans of relevant dictionaries.

Sound Symbolism and Prosody

The bio-acoustic universe is composed of environmental sounds,
animal calls, and human speech. Sounds have always carried emotive
meanings for humans. Any survey of the cultural metaphors ascribed
to and debated about sounds in particular languages demonstrates
this pervasiveness. Each of these domains is described in all
cultures with varying numbers of semantically polar adjectives. A
far from exhaustive list includes the following contrasting beliefs
about bio-acoustically perceived sound: a sound may be described,
and thereby taught to be understood, as small or large, dry or wet,
light or dark, lightweight or heavy, fast or slow, hard or soft,
smooth or rough, weak or strong, sharp or dull, female or male,
quiet or loud, angular or round, clear or abstruse, near or far,
empty or full, gay or sad, pure or mixed, short or long, few or
many, sweet or sour, even or odd, squat or tall, high or low, thin
or wide, major or flat, tonal or atonal, nervy or calm, and so on
(Fonagy 1979). Even so, evidence remains anecdotal that any sounds
innately evoke emotions.

Though the acoustic features lending themselves to such binary
description are not well understood, there is general acceptance
among scholars that prosody plays a major part in this, carrying
"sound suggestiveness" and "intrinsic value" (Jakobson and Waugh
1978:198). In most definitions, prosody refers to a suprasegmental
manipulation of the forms of utterance. So defined, the prosodic
process takes place on a level which overlies a basic structure,
usually the phoneme. Any number of suprasegmentals can be created
and labelled prosodic. However, the most commonly cited ones
function such that pitch, loudness, tempo, duration, and rhythm are
linked, either innately or voluntarily, to connotative meaning
(Barry 1981:321).

Prosody has at least four functions. First, the "globally rhythmic"
and tonal pattern direct a hearer's attention and act as semantic
guides (Barry 1981:337). Prosodic tonality and tempo modulation aid
in dividing acoustically inseparable "connected speech" into
semantic units. "Connected speech" is common to all languages and
involves the ordinary blending of one word into another. This
phenomenon is witnessed in the difficulty of learning a foreign
language aurally when it is more easily learned through the written
form.

A second prosodic function is known as speaker attitude signalling.
For this function, a person hears and discerns whether a speaker is
agitated, angry, calm, seductive, happy, sad, or despondent by
voice quality. Though the prosodic elements processed to achieve
this aim can include pitch, tempo, and loudness, accurate
discernment of speaker attitude by conspecifics has been shown to
interact with social context. That is, even though emotional states
are broadly comparable for all humans, the traits used to identify
each are highly malleable, changing according to the particular
instance. Nevertheless, keeping a social situation qualifier in
mind, it has been shown for English speakers that mild anger
produces an increased tempo of speaking, whereas depression
produces a decrease (Markel, Bein, and Phillis 1973). When
listeners rate the emotions of speakers according to "softness" or
"harshness," it is evident that soft, empathetic emotions such as
grief and love are expressed through peak-pitch profiles. The
harsh, hostile emotions, such as anger and contempt, are expressed
through peak-loudness profiles (Costanzo, Markel, and Costanzo
1969:269). Additionally, length of utterance seems connected to an
expression of friendship (Markel 1988). Consequent to these
studies, no one now doubts that social context and prosodic
elements interact synergistically to convey speaker attitude.
Third, perceptual focussing is a function of prosody. Localization
in the tonal accent, determined by pitch movement, forces a
centralization upon the type of information being conveyed (Barry
1981:330). With this prosodic function, for example, most languages
utilize high and/or rising intonations to mark questions and the
converse to indicate statements (Bolinger 1964). Otherwise, a
speaker such as an irritated parent might indicate the imperative
in a command to a child, as in "Get in this house NOW!" Focussing
serves a double function in that it determines the communicatively
most important elements within the sense unit and at the same time
links the unit to its context (Barry 1981).

Finally, experiments show that when subjects are presented with
syntactically ruptured binaural sentences, the listener's attention
follows the prosody, while the syntacto-semantic switch merely
causes hesitations and omissions (Darwin 1975; Barry 1981). This
"guide" function of prosody is implicated in the emergence of
proto-syntax. That is to say, in the earliest language scenario,
prosody may have been the syntax. Consequently, conspecific sound
meant emotion and meaning emplacement within a social context.
Certainly, vocal pauses marked an upward physiological constraint
on utterance length and must have played a part in semantic
segmentation.

Cross-cultural similarity in the use of the fundamental frequency
to convey affect, intention, or emotion is well known from
anecdotal and experimental evidence (Ohala 1984:2). Neonates prefer
their own mother's voice over others (DeCasper and Fifer 1980).
"Baby-talk" or "motherese" consistently occupies higher and
harmonic regions of frequency and amplitude (Ferguson 1964; Fernald
and Kuhl 1987). Perhaps one of the oldest perceptions in any
hominid proto-language may be that MOTHER is FEMALE and SMALLER and
TONALLY HIGHER in acoustical production. If this conjecture is
extended, the earliest human culture and language began with
mother-infant interaction communicating affective intent.

It is little secret that all mammalian orders communicate emotional
activity with tonality and other prosodic features. Within humanly
conceived sound symbolic words, high tone tends to be associated
with words connoting or denoting small, diminutive, familiar, near,
or narrow, and the reverse meanings with low tone (Ohala 1984:4).
In phonemic terms, for vowels, this means the front vowels
represent the higher frequency versus the back vowels. For
consonants, the voiceless ones represent the higher versus the
voiced ones. As shown below, this is an important focus of testing
in sound symbolism experiments.
In humans, vowels are most easily recognized and are always
intonated. Intonation of utterance is universal, if only because
Nature creates animals of differing shapes and capacities and
intonation is possibly the most common denominator (Bolinger 1964).
For example, an evolutionary pattern of producing, accepting, and
perceiving a high front unrounded /i/ vowel by a female or male,
child or adult, of differing size and health is too widespread to
be explained by borrowing, descent from a common linguistic source,
or chance (Ohala 1984:2). Indeed, Lieberman proposed that the group
of articulatory parameters forming this intonation be called the
"supervowel" because it is identified with unerring accuracy among
a pantheon of cultural groups and actors (Lieberman 1984:158-161).
Intonation is thus deemed partly an innate and evolutionarily
selected behavior. Evidence shows it is crucial to the
socialization processes of alloprimates, allowing the inherent
variability of the individual a place in communicative
adaptiveness. Over-specialization gets a genus wiped out, and no
species can create high frequency vowels perfectly and invariably.
A process entailing the use of sound for communication of affective
intents must accommodate a multitude of constraining factors, among
them the animal's health, social context, age, sex, and emotional
state. Any one of these can alter the formation of a vowel
intonation. Too often, language or communication schemes assume
that "once upon a time" animals created a sonal frequency and that
this became an auditory frequency; all this, the assumption goes,
without the slightest variability.

Prosody is not yet a subset of any sound symbolic scheme. Partly,
this is due to the lack of cross-cultural data on prosody and the
lack of a unifying framework with which to study sound symbolism.
Even so, all vowels are intonated. Any two-phrase utterance occurs
within a temporal and commonly iconic scheme. Further, the use of
prosody is linked with intent within a social context, and the use
of sound symbolism is connected with clarifying intent within a
social context containing shared perceptual routines. In any case,
it seems absurd to argue that when small front vowels indicate
semantic "smallness" in a particular culture, this be labelled
"sound symbolism," while claiming that the use of a high frequency
register, including the same vowels and evincing affective
connotations, belongs for study within a prosodic subfield. The
troublesome blur between sound symbolic mechanics and prosodic ones
stems in part from faulty logic. Use of sound symbolic phonetic
devices implies a shared cognitive tradition. This tradition has
functions identical to those of prosody. Often, sound symbolism is
treated as if it must occur only within a vacuum, something a
categorical definition of prosody could never sustain.

Sound Symbolic Terminologies

Sound symbolism is labelled with a swath of terms including:
"iconic symbolism" (Wescott 1971b), "psycho-morphism" (Markel
1966), "phonosymbolism" (Malkiel 1990a), "phonetic symbolism"
(Sapir 1929; Newman 1933), "synaesthesia" (Karkowski, Odbert, and
Osgood 1942), "sound-meaning correlation" (Heise 1966),
"onomatopaeition" (Kahlo 1960), "vocal-gestural symbolism" (Paget
1930), "phememism" (Foster 1978), "animal talk" (Langdon 1978),
"ideophone symbolism" (Samarin 1970), "magical imitation" (Fisher
1983), "mimicry" (Bladon 1977), "expressiveness" (Henry 1936; Fudge
1970), and "holestheme-phonestheme symbolism" (Wescott 1987).

Such colorful nomenclature regards types of sound and meaning
within language mechanics as sometimes partially and sometimes
entirely motivated. These terms can refer to types of sound
symbolism: lexical, syntactic, morphic, psychological, and
phonological. Otherwise, they can appear as combinations of two or
more types. I delimit most below. A simple organization on an
expressive scale ranging from minimally to maximally arbitrary is
difficult to construct cross-culturally, though it has been done
for a single language elsewhere (Bladon 1977). Even in the case of
the least arbitrary, mimicry, the given definitions are
paradoxical. Nevertheless, in comparison, each possesses
semi-inclusive functions enabling communicative intent to be
interpreted among conspecifics in a manner more certain than in
purely arbitrary sound-meaning units.

Mimicry. Mimicry is the least arbitrary form of language use and
generally the best possible imitation of a particular sound source
by a conspecific (Bladon 1977). Individuals always vary in their
capacity to mimic, with vocal dexterity fluctuating widely among
speaking groups. An important difference exists, however, between
imitating a cat using a high-toned rasping falsetto voice and
reporting a name for what a cat says. The former can use vocal
pitch, amplitude, delivery speed, staccatoed presentation,
reduplication, and so on (in English, [miauw], [hsss], e.g.). The
latter are described below as onomatopes and represent an
abbreviated recall of an obvious auditory feature of the thing
described (in English, [kaet], [pus], e.g.).

Mimicry is not easily transcribed orthographically for linguists,
poets, and speech therapists. Consequently, it is not well studied
scientifically. Still, it is extensive in the collective psyche and
in the oral history of a culture's forms of dramatic recitation.
The great art of mimicry, whether of human voice, activity, or
emotion, is well known among primates.
Evidence abounds that humans possess extraordinary mimicry
capabilities and talents. Widespread communities astound the public
yearly by hosting pig-calling, eagle-calling, alligator-calling,
duck-calling, or turkey-calling festivals. The only requisite for a
person to become a rich and famous performer in Western society is
an uncanny ability to duplicate other people's voices and say
something which is semantically inappropriate to that persona's
voice.

One of the few studies done on this topic reports on speakers'
ability to create onomatopoetic words to describe auditory
phenomena. Wissemann (1954) asked subjects to describe various
sounds, which included rattling chains, snapping wood, sploshing
water, shattering glass, clanging bells, and the like.
Interestingly enough, the longer sound did not necessarily elicit
the longer name. Instead, the number of syllables corresponded to
the number of divisions heard in the noise. The syllables created
expressed the sound's differentiation, and stress highlighted
important sonal dimensions (Brown 1958:116). Abrupt onset of sound,
such as in snapping, breaking, pounding, and the like, usually was
named with a voiceless stop consonant (e.g., [p], [t], [k]).
Gradual onset noises were labelled with fricative consonants (e.g.,
[s], [z], [h]) (Brown 1958:117). Further, Wissemann's subjects
agreed upon a common scheme for vowel utilization in labelling
colors and sizes. Vowels produced frontally were used to refer to
bright, small noises, low back vowels the reverse (Brown 1958:118).
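These tendencies — abrupt onsets named with voiceless stops,
gradual onsets with fricatives, front vowels for bright small
noises — can be caricatured as a lookup (a purely illustrative
sketch; the feature labels, phoneme choices, and function name are
mine, not Wissemann's procedure):

```python
# Schematic of the naming tendencies reported in Brown (1958).
ONSET_TO_CONSONANT = {
    "abrupt": "t",    # snapping, breaking, pounding -> voiceless stop
    "gradual": "s",   # gradually rising noises -> fricative
}
QUALITY_TO_VOWEL = {
    "bright-small": "i",  # front vowel
    "dark-large": "o",    # low back vowel
}

def sketch_name(onset, quality, divisions):
    """Build one CV syllable per division heard in the noise."""
    syllable = ONSET_TO_CONSONANT[onset] + QUALITY_TO_VOWEL[quality]
    return syllable * divisions
```

On this caricature, a two-part snapping, bright sound would be
named "titi," while a single gradual, dark rushing sound would be
"so" — mirroring the finding that syllable count tracks the
divisions heard, not the sound's raw duration.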
This study raises the possibility that mimicry, or a process
similar to echoism, underlies naming principles for sensory
experiences. Roger Brown inquired: "Is it possible that primal man
created his first words in accordance with these same imitative
rules and that these rules, being 'natural' to all men, made
translation of the first words easy?" Such an earliest language
scenario presents mimicry as only part of a creative, manipulative
naming system in a dynamic communicative order, loaded with
changing social needs, for numerous primate genera. For example,
higher rank in early hominid vocalizations, by comparison with
other alloprimate observations, might have been signalled by
greater than normal use of vocalizations given and received from
conspecifics (Gouzoules, Gouzoules and Marler 1986).

Onomatopes. Onomatopes are "words" and not mere acoustical
imitations. As qualified "words," they seldom possess unchanging
spelling forms and show considerable difference in dictionary
definitions. They represent a sound source and are phonemically
characterized speech sounds. For example, sonogram comparisons
could show that the English voiced alveolar-palatal fricative /z/
resembles the sound of a bee buzzing. The /z/ and the sounds of the
word "buzz" are phonemes in English. In Yucatecan Mayan, there is
no /z/ phoneme to use in an onomatope for the sound a bee makes,
and their /b/ is implosive, the feature reversal of the English
/b/. If Mayan children make a word for what a bee says, it will not
contain a /z/ if it is an onomatope. The codification of phonemes
into those "words" varies cross-culturally among speaking groups.
Onomatopic production is distinct from mimicry, though, and
languages contain rules for compressing an imitation of what an
animal or process actually emits into a shared word. This
acoustical compression phenomenon of languages is little studied,
and few statements can be made regarding it.

"Morpho-phono-symbolics" or similarly, "phono-

semantics" are empty jargon. No one knows how speakers go
from imitating the bark of a dog, for example, to creating a
word for its bark. To give some examples from the Indo-
European family, English speakers' dogs can say [wufl,

Germans' [vaul, Frenchs' [wal, Icelandics' [gelta], Rumanians'

[latra], Croatians' [lajatil, Lithuanians' [lotil, and Palis'
[bhussatil. In the Altaic language family, a Turkish dog says

[haul, and a Japanese [wi~g. For the Niger-Khordofanic
language Mbukushu a dog says [kudha]. Tahitian, an

Austronesic language, allows dogs to say [aoa]. North Amerind

languages differ as well for dog barking. In Hopi it is [waha],

Crow [bahuk], Ojibwa [miki], and Micmac ['psagagwl. Finally,

for Mon, an Austro-Asiatic language, a dog's bark is [ki?]

(Bladon 1977:162; for others see dictionaries in Appendix B).

The common sense adage that dogs bark the same world

round is untrue. Even among packs of the same sub-species

barks may differ. Which types of dogs and what area of the

geographic world do the dogs bark in are two variables

influencing onomatopic construction of "bark." All this quickly

dismisses a tidy summary of a mechanical dog bark. In short,

simply naming the vocalization of an animal is a complex


Other onomatopes relate to sounds that a culture recognizes as
emotionally significant. In English these include "tee-hee,"
"boo-hoo," "ugh," "tut-tut," "no-no," and so on. Certain onomatopes
also have echoic reference to speech styles, such as "blah-blah,"
"la-dee-dah," "hem and haw," "yammer," "stammer," "babble,"
"stutter," "mutter," "sputter," and so on. Of particular importance
to this dissertation is a group of onomatopes regarding vegetative
processes such as hiccuping, sneezing, coughing, laughing, and so
on. Cross-cultural onomatopic similarities expose the operation of
sound and gestural symbolism. In the experiments following, this
"semantic" compression of sound value is further examined. It
should be noted that even with the most automatic event, say
coughing, the cross-cultural expressions are non-identical in some
ways, but identical in other, predictable, ways.

Synaesthesia. Synaesthesia labels a subject's connotative regard
for sounds as they associate with senses other than hearing. In
early Greece, Homer equated colors, emotions, and sounds (Pecjak
1970:625). More modern subjects, in response to music, report major
chords "wet" and minor "dry" (Karkowski, Odbert and Osgood 1942).
Similarly, Naval submarine radiomen during World War II, in
response to the need to share information about sonar recordings,
developed a specialized lexicon. In this creative vocabulary sounds
were called "bright," "shiny," and "dark." Large objects,
explosions, or processes were given low frequency phonemes. When
events approached the ship, they were called small, bright, and
high (Solomon 1958, 1959).
Sapir (1929), discussed at length in Chapter IV, using nonsense CVC
words (i.e., words built of consonant + vowel + consonant),
demonstrated that the more anteriorly produced the vowel, the
smaller the relative perceived size. Other tests have associated
high tones with sharp objects and low tones with round objects
(Davis 1961). Bilabial phonemes (e.g., /b/, /p/, /m/, /β/, /ɸ/)
associate with rounded shapes and velar stops (e.g., /k/, /g/, /ŋ/)
with angular shapes in English (Firth 1935).

Synaesthesia experiments are described in detail in the next
chapter. Compared with sound symbolism, synaesthetic definitions
are fuzzy because they were formulated upon archaic conceptions of
sense perception and sound dynamics. Just as any neurologist would
say there are more than five sense receptors, any audiologist would
say sound perception includes transduction of mechanical energy
through air, water, bone, chemical, and electrical media. Sound
lends itself to synaesthesia with light, touch, space, and the like
presumably because of overlapping somatosensory modes of sensory
processing in the brain.

Phonaesthetics. Phonaesthetics labels an emotional nature of
sounds. Good or bad, hot or cold, fast or slow, dangerous or safe
are varied affective connotations which types of sounds can acquire
in orderly fashion within a culture. Examples include: a.) [-æʃ],
found in words such as dash, gash, clash, lash, flash, etc.,
associates with violence; b.) the low mid back unrounded sound /ʌ/
(in mud, dud, cud, e.g.) associates with an unspecified heaviness
and dullness; c.) the [sm-] cluster carries a pejorative
connotation for English speakers (Markel 1966). Ideophones operate
in Niger-Khordofanian languages to label "big" or "harmonically
ideal," and "thin" or "discordant," speaking styles (Wescott 1980a;
Samarin 1967; Sapir 1975).

Phonaesthetic devices vary considerably between cultures. Yet no
comparative studies have been done upon universal world poetry,
song, or recitation tropes. The crucial value of an idea of
linguistic "beauty" in any language is underestimated. Language
speakers are critically directed to vary their speaking registers
from their earliest utterances. That each of these registers
carries its own rules of appropriateness is well known. The ability
to interact successfully within a social milieu is tied to knowing
the rules of the "pleasant" speech game (Farb 1974). Perhaps
because the rules are so fluid, or perhaps because they are so
subjective, scholars have failed to develop a scheme appropriate
for the study of phonaesthetics. Still, phonaesthetic devices are
little different from sound symbolic ones. Sounds which are made
during pleasant activities become synonymous with pleasantness.
Many of these accompany sucking, making love, smacking, and so on,
and are described in the following section upon sound symbolism in
natural languages.

Linguistic icons. Linguistic iconism denotes the use of sounds as
icons, nonarbitrary intentional signs acting as designations
bearing an intrinsic resemblance to the things they designate
(Wescott 1971b:416). Instances highlighting linguistic iconism in
the world's languages include: a.) quickness--in English, stop
consonants convey the iconic impression of brevity and
discontinuity, as in the contrast between "chirp" and "yelp" versus
"chirrr" and "yell." The rapidity with which they are made
iconically recapitulates their rank as "quick"; in terms of meters
per second, they are the fastest produced sounds humans can make.
b.) quietness--voiceless consonants imply inaudibility or a vocal
incapacity and are most effective when coupled with high front
vowels to imply smallness. English exemplars include "tick,"
"hiss," "sizzle," "whistle," "whisper," and "shush." Again,
diminished volume in speech terms parallels diminished activity of
a referent process. c.) temporality--later events are reflected
later in the naming event. This is evident in the commonality of
suffixing for past tense morphemes (Greenberg 1964). d.)
commonality--frequently used terms are shorter than average when
referent importance rises. These short basic terms are also learned
earliest by children (Brown and Witkowski).


Such a list of linguistic icons is hardly complete. An

exhaustive study of their pervasiveness has not been done. As

a whole, they demonstrate that vocal behavior parallels non-

vocal behavior as far as some semantic intents are concerned.

Iconism is abbreviated behavior display. As such, it is very

similar to sound symbolism devices. Like behaviors and
meanings get like expressions, albeit in greatly reduced forms.

Vocal icons. Vocal iconism is not strictly linguistic iconism.
Instead, it refers to the use of the gestural specificity of
vocality. For example, dentality can be a vocal icon. Since this
consonantal feature involves articulation with the teeth, it
connotes steady projection of something from a base. Many world
languages contain names of various projections from the earth or
the body utilizing dental consonants. Instances include
Proto-Indo-European *ed- "to bite" and *dent- "tooth;" Efik -ot
"head," eto- "tree;" Mixtec tu- "tail," thuk "horn," t'e "woods,"
and duti- "mountain" (Wescott 1971b:422).
Following this conjecture about vocal icons and the teeth, Hockett
proposed that the rise of the labiodental phonemes [f, v] was
caused by the advent of agriculture (Hockett 1985:284). He remarked
that these phonemes diffused from nascent agricultural centers and
represented the shift to the chewing configuration required for
grinding cereals instead of the scissor-bite required for cutting
meat. Such a shift became iconic, and presumably the terms for
grains of all types should overlap significantly with those for
teeth, at least as far as sharing phonemes.
In some languages, minimal articulatory shifts indicate minimal
semantic shifts. For English, instances include "this-that,"
"six-seven," or "four-five." In proto-Semitic, (*θinay) and
(θala:θu) are "three-four" and (šidθu) and (šab'u) "six-seven"
(Wescott 1971b:421).
Names for body parts often include just those parts so named. I
have compiled evidence that hundreds of languages name "tooth" with
dental consonants made with the teeth. Similarly, "lip" is named
with labial consonants. Vocal icons are necessarily redundant. For
example, the word for tongue in all languages will include movement
of the tongue. What would be of interest is to test through
electromyography whether the muscles of named anatomical parts
invariably respond when so named. If so, vocal iconism may be
considered an adjunct to other identified synergistic body
languages (Argyle 1973).

Psycho-morphs. A psycho-morph is "a non-morphemic unit of one or
more phonemes for which a connotative meaning can be established,
but, this connotative meaning may not accompany all occurrences of
the unit" (Markel 1966:2). Non-morphemic units for English can
include the phoneme clusters /sm-/ and /gl-/, for example. The
speaker associates, by cognitive mechanisms not well understood,
the identified psycho-morph with a select attitude. For instance,
English speakers negatively regard the /sm-/ cluster (Markel and
Hamp 1960).

The mechanisms for Markel's psycho-morphs are not inherited,
appearing culturally and language specific. However, like so many
speech behaviors, the active processes of the psycho-morph occur
below the normal level of speaker awareness. Unconscious attitudes
toward psycho-morphs influence speaker selection of appropriate
word choice when given competing alternatives (Markel 1966).

Psycho-morphs implicate linguistic units other than at the level of
the word, actively affecting word retrieval in a speaker's
cognitive mind. Within a culture, psycho-morphs demonstrate the
culture's self-reflexive processes, injected into actual language
use. Attitude is use, and use is iteration of attitude. For Markel,
the psycho-morph is only one of a number of processes expressing
the inner psychic world of a speaker. Even large groups of
vocabulary, expressive words of negative and positive connotation,
link up in frequency of use in hypertense speakers (Markel 1990).
Feelings reiterate use; use reiterates feelings. In themselves,
these findings recapitulate the views of virtually every
"mentalist" ethologist. Animals, including humans, overlay their
inner worlds upon extrinsic reality.

Ideophones. Ideophones are linguistically marginal units, their
exact definition being a matter of some debate. Africanist Clement
Doke first described as ideophones a group of grammatically deviant
expressive forms common to Bantu languages and conveying sensory
impressions (Doke 1935:118-119). He argued ideophones were a
separate part of speech, much as an adjective or adverb. Since
then, their special lexical status has been largely dismissed
(Wescott).


Other linguists have added to the growing corpus on the ideophone.
Samarin reports at least twenty-five terms synonymous with
ideophony (Samarin 1971). Westermann labels it Lautbild, "a word
that depicts a reaction to sensory impressions and expresses a
feeling in a suitable acoustic form" (Smithers 1954:73). Linguist
Gerard Diffloth characterizes ideophones as grammatical units which
can function by themselves as complete sentences. Their morphemic
constituents are phonic features (Diffloth 1972).

Ideophones contain unusual sounds, form exceptions to the rules of
length, tone, and stress applying to other elements, and are
commonly reduplicated (Smithers 1954:83). Two examples are
illustrative: a.) intensity--English ideophones can involve
consonantal doubling [mm, tt, dd, gg, pp, ss, ll, etc.] to indicate
intensity, such as in "puff," "yell," "guffaw," "chatter,"
"sluggish," and "quarrel." Verbs with voiced consonantal doubling
are rare in Old English as well as in Old Norse; there are six
known in each language. But when their usage increases in Middle
English, they are used in words expressing actions, gestures, or
movements of a sluggish, inert, or vacillating kind, or those that
are repeated (Smithers 1954:85). b.) sound duplication--another
event of ideophony is palimphony, or sound-repetition. Types abound
in English, including "pop," "crack," "plop," "boob," "dud," and so
on. Disyllabic examples are also well represented in "hot-head,"
"tid-bit," "kick-kack," "sad-sack," "sing-song," "rag-tag," and
"hobo." Echo-compound words can be seen as well in "hodge-podge,"
"hurly-burly," "pell-mell," and "tootsy-wootsy" (e.g., the bilabial
series); "rag-tag," "super-duper," "willy-nilly," "ding-a-ling,"
and "chit-chat" (e.g., the apical series); and "hootchy-kootchy"
and "hurdy-gurdy" (e.g., the velar series) (Wescott 1980a:200-202).