The Language Instinct: How the Mind Creates Language
Authors: Steven Pinker
So far we have just discussed the vowels: sounds where the air has clear passage from the larynx to the world. When some barrier is put in the way, one gets a consonant. Pronounce ssssss. The tip of your tongue (the sixth speech organ) is brought up almost against the gum ridge, leaving a small opening. When you force a stream of air through the opening, the air breaks apart turbulently, creating noise. Depending on the size of the opening and the length of the resonant cavities in front of it, the noise will have some of its frequencies louder than others, and the peak and range of frequencies define the sound we hear as s. This noise-making comes from the friction of moving air, so this kind of sound is called a fricative. When rushing air is squeezed between the tongue and palate, we get sh; between the tongue and teeth, th; and between the lower lip and teeth, f. The body of the tongue, or the vocal folds of the larynx, can also be positioned to create turbulence, defining the various “ch” sounds in languages like German, Hebrew, and Arabic (Bach, Chanukah, and so on).
Now pronounce a t. The tip of the tongue gets in the way of the airstream, but this time it does not merely impede the flow; it stops it entirely. When the pressure builds up, you release the tip of the tongue, allowing the air to pop out (flutists use this motion to demarcate musical notes). Other “stop” consonants can be formed by the lips (p), by the body of the tongue pressed against the palate (k), and by the larynx (in the “glottal” consonants in uh-oh). What a listener hears when you produce a stop consonant is the following. First, nothing, as the air is dammed up behind the stoppage: stop consonants are the sounds of silence. Then, a brief burst of noise as the air is released; its frequency depends on the size of the opening and the resonant cavities in front of it. Finally, a smoothly changing resonance, as voicing fades in while the tongue is gliding into the position of whatever vowel comes next. As we shall see, this hop-skip-and-jump makes life miserable for speech engineers.
Finally, pronounce m. Your lips are sealed, just like for p. But this time the air does not back up silently; you can say mmmmm until you are out of breath. That is because you have also opened your soft palate, allowing all of the air to escape through your nose. The voicing sound is now amplified at the resonant frequencies of the nose and of the part of the mouth behind the blockage. Releasing the lips causes a sliding resonance similar in shape to what we heard for the release in p, except without the silence, noise burst, and fade-in. The sound n works similarly to m, except that the blockage is created by the tip of the tongue, the same organ used for d and s. So does the ng in sing, except that the body of the tongue does the job.
Why do we say razzle-dazzle instead of dazzle-razzle? Why super-duper, helter-skelter, harum-scarum, hocus-pocus, willy-nilly, hully-gully, roly-poly, holy moly, herky-jerky, walkie-talkie, namby-pamby, mumbo-jumbo, loosey-goosey, wing-ding, wham-bam, hobnob, razzamatazz, and rub-a-dub-dub? I thought you’d never ask. Consonants differ in “obstruency”: the degree to which they impede the flow of air, ranging from merely making it resonate, to forcing it noisily past an obstruction, to stopping it up altogether. The word beginning with the less obstruent consonant always comes before the word beginning with the more obstruent consonant. Why ask why?
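The ordering rule can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the three-step scale (resonate, friction, stop) follows the description above, but the letter-to-rank table `OBSTRUENCY` and the helper names are hypothetical, and the sketch looks at spelling rather than at actual sounds.

```python
# Crude three-step obstruency scale taken from the description in the text:
# 0 = merely resonates, 1 = noisy friction, 2 = stops the air altogether.
# ASSUMPTION: the letter-to-rank table below is illustrative and based on
# spelling only; it is not a phonetic analysis.
OBSTRUENCY = {
    "r": 0, "l": 0, "w": 0, "m": 0, "n": 0,
    "s": 1, "f": 1, "z": 1,
    "p": 2, "t": 2, "k": 2, "b": 2, "d": 2, "g": 2,
}


def obstruency(word):
    """Return the rank of the word's first letter, or None if it is unlisted."""
    return OBSTRUENCY.get(word[0].lower())


def follows_rule(first, second):
    """True if `first` starts with a strictly less obstruent consonant.

    Returns None when either initial letter is missing from the table.
    """
    a, b = obstruency(first), obstruency(second)
    if a is None or b is None:
        return None
    return a < b


for pair in [("razzle", "dazzle"), ("super", "duper"), ("walkie", "talkie")]:
    print(pair, follows_rule(*pair))
```

A three-step scale is too coarse for pairs such as willy-nilly, where both initial consonants fall on the same step here; a finer-grained ranking would be needed to cover them.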
Now that you have completed a guided tour up the vocal tract, you can understand how the vast majority of sounds in the world’s languages are created and heard. The trick is that a speech sound is not a single gesture by a single organ. Every speech sound is a combination of gestures, each exerting its own pattern of sculpting of the sound wave, all executed more or less simultaneously; that is one of the reasons speech can be so rapid. As you may have noticed, a sound can be nasal or not, and produced by the tongue body, the tongue tip, or the lips, in all six possible combinations:
Speech organ    Nasal (Soft Palate Open)    Not Nasal (Soft Palate Closed)
Lips            m                           p
Tongue tip      n                           t
Tongue body     ng                          k
Similarly, voicing combines in all possible ways with the choice of speech organ:
Speech organ    Voicing (Larynx Hums)    No Voicing (Larynx Doesn’t Hum)
Lips            b                        p
Tongue tip      d                        t
Tongue body     g                        k
Speech sounds thus nicely fill the rows and columns and layers of a multidimensional matrix. First, one of the six speech organs is chosen as the major articulator: the larynx, soft palate, tongue body, tongue tip, tongue root, or lips. Second, a manner of moving that articulator is selected: fricative, stop, or vowel. Third, configurations of the other speech organs can be specified: for the soft palate, nasal or not; for the larynx, voiced or not; for the tongue root, tense or lax; for the lips, rounded or unrounded. Each manner or configuration is a symbol for a set of commands to the speech muscles, and such symbols are called features. To articulate a phoneme, the commands must be executed with precise timing, the most complicated gymnastics we are called upon to perform.
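One way to picture this matrix is as a lookup table keyed by feature bundles. The sketch below is a minimal illustration covering only the nine consonants from the two tables above; the names `Features`, `PHONEMES`, and `phoneme` are hypothetical, and treating the nasals m, n, and ng as voiced is an assumption added here to merge the two tables into one.

```python
from typing import NamedTuple, Optional


class Features(NamedTuple):
    """A bundle of features: major articulator plus two configurations."""
    articulator: str  # "lips", "tongue tip", or "tongue body"
    nasal: bool       # soft palate open?
    voiced: bool      # larynx hums?


# Only the nine consonants from the tables in the text.
# ASSUMPTION: the nasals are entered as voiced.
PHONEMES = {
    Features("lips", nasal=True, voiced=True): "m",
    Features("lips", nasal=False, voiced=True): "b",
    Features("lips", nasal=False, voiced=False): "p",
    Features("tongue tip", nasal=True, voiced=True): "n",
    Features("tongue tip", nasal=False, voiced=True): "d",
    Features("tongue tip", nasal=False, voiced=False): "t",
    Features("tongue body", nasal=True, voiced=True): "ng",
    Features("tongue body", nasal=False, voiced=True): "g",
    Features("tongue body", nasal=False, voiced=False): "k",
}


def phoneme(articulator: str, nasal: bool, voiced: bool) -> Optional[str]:
    """Look up the consonant for a feature bundle; None if the cell is empty."""
    return PHONEMES.get(Features(articulator, nasal, voiced))


print(phoneme("tongue body", nasal=False, voiced=True))
print(phoneme("lips", nasal=True, voiced=False))
```

Cells that this small table leaves empty, such as a voiceless nasal, come back as `None`. Adding another manner of articulation would amount to adding another field to `Features` and another layer of entries.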
English multiplies out enough of these combinations to define 40 phonemes, a bit above the average for the world’s languages. Other languages range from 11 (Polynesian) to 141 (Khoisan or “Bushman”). The total inventory of phonemes across the world numbers in the thousands, but they are all defined as combinations of the six speech organs and their shapes and motions. Other mouth sounds are not used in any language: scraping teeth, clucking the tongue against the floor of the mouth, making raspberries, and squawking like Donald Duck, for instance. Even the unusual Khoisan and Bantu clicks (similar to the sound of tsk-tsk and made famous by the Xhosa pop singer Miriam Makeba) are not miscellaneous phonemes added to those languages. Clicking is a manner-of-articulation feature, like stop or fricative, and it combines with all the other features to define a new layer of rows and columns in the language’s table of phonemes. There are clicks produced by the lips, tongue tip, and tongue body, any of which can be nasalized or not, voiced or not, and so on, as many as 48 click sounds in all!
An inventory of phonemes is one of the things that gives a language its characteristic sound pattern. For example, Japanese is famous for not distinguishing r from l. When I arrived in Japan on November 4, 1992, the linguist Masaaki Yamanashi greeted me with a twinkle and said, “In Japan, we have been very interested in Clinton’s erection.”
We can often recognize a language’s sound pattern even in a speech stream that contains no real words, as with the Swedish chef on The Muppets or John Belushi’s samurai dry cleaner. The linguist Sarah G. Thomason has found that people who claim to be channeling back to past lives or speaking in tongues are really producing gibberish that conforms to a sound pattern vaguely reminiscent of the claimed language. For example, one hypnotized channeler, who claimed to be a nineteenth-century Bulgarian talking to her mother about soldiers laying waste to the countryside, produced generic pseudo-Slavic gobbledygook like this:
Ovishta reshta rovishta. Vishna beretishti? Ushna barishta dashto. Na darishnoshto. Korapshnoshashit darishtoy. Aobashni bedetpa.
And of course, when the words in one language are pronounced with the sound pattern of another, we call it a foreign accent, as in the following excerpt from a fractured fairy tale by Bob Belviso:
GIACCHE ENNE BINNESTAUCCHE
Uans appona taim uase disse boi. Neimmese Giacche. Naise boi. Live uite ise mamma. Mainde da cao.
Uane dei, di spaghetti ise olle ronne aute. Dei goine feinte fromme no fudde. Mamma soi orais, “Oreie Giacche, teicche da cao enne traide erra forre bocchese spaghetti enne somme uaine.”
Bai enne bai commese omme Giacche. I garra no fudde, i garra no uaine. Meichese misteicche, enne traidese da cao forre bonce binnese.
Giacchasse!
What defines the sound pattern of a language? It must be more than just an inventory of phonemes. Consider the following words:
ptak
plaft
vlas
rtut
thale
sram
flutch
toasp
hlad
mgla
dnom
nyip
All of the phonemes are found in English, but any native speaker recognizes that thale, plaft, and flutch are not English words but could be, whereas the remaining ones are not English words and could not be. Speakers must have tacit knowledge about how phonemes are strung together in their language.
Phonemes are not assembled into words as one-dimensional left-to-right strings. Like words and phrases, they are grouped into units, which are then grouped into bigger units, and so on, defining a tree. The group of consonants (C) at the beginning of a syllable is called an onset; the vowel (V) and any consonants coming after it are called the rime:
The rules generating syllables define legal and illegal kinds of words in a language. In English an onset can consist of a cluster of consonants, as in flit, thrive, and spring, as long as they follow certain restrictions. (For example, vlit and sring are impossible.) A rime can consist of a vowel followed by a consonant or certain clusters of consonants, as in toast, lift, and sixths. In Japanese, in contrast, an onset can have only a single consonant and a rime must be a bare vowel; hence strawberry ice cream is translated as sutoroberi aisukurimo, girlfriend as garufurendo. Italian allows some clusters of consonants in an onset but no consonants at the end of a rime. Belviso used this constraint to simulate the sound pattern of Italian in the Giacche story: “and” becomes enne, “from” becomes fromme, “beans” becomes binnese.
Onsets and rimes not only define the possible sounds of a language; they are the pieces of word-sound that are most salient to people, and thus are the units that get manipulated in poetry and word games. Words that rhyme share a rime; words that alliterate share an onset (or just an initial consonant). Pig Latin, eggy-peggy, aygo-paygo, and other secret languages of children tend to splice words at onset-rime boundaries, as does the Yinglish construction in fancy-shmancy and Oedipus-Shmoedipus. In the 1964 hit song “The Name Game” (“Noam Noam Bo-Boam, Bonana Fana Fo-Foam, Fee Fi Mo Moam, Noam”), Shirley Ellis could have saved several lines in the stanza explaining the rules if she had simply referred to onsets and rimes.
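Splicing at the onset boundary is easy to sketch in code. The version below is a rough approximation that works on spelling, not sound: it treats the letters a, e, i, o, u as vowels and cuts each word before its first vowel letter, so everything after the cut (not just the first syllable's rime) is carried along. The function names are hypothetical, and vowel-initial words get no special Pig Latin treatment.

```python
VOWELS = "aeiou"  # ASSUMPTION: spelling-based stand-in for vowel sounds


def split_onset(word):
    """Split a word into (initial consonant letters, everything after them)."""
    for i, letter in enumerate(word):
        if letter.lower() in VOWELS:
            return word[:i], word[i:]
    return word, ""  # no vowel letter found: the whole word counts as onset


def pig_latin(word):
    """Move the word-initial onset to the end and add 'ay'."""
    onset, rest = split_onset(word)
    return rest + onset + "ay"


def shm(word):
    """Echo the word with its onset replaced by 'shm', as in fancy-shmancy."""
    _, rest = split_onset(word)
    return word + "-shm" + rest.lower()


print(split_onset("spring"))
print(pig_latin("pig"))
print(shm("fancy"))
print(shm("Oedipus"))
```

Both games use the same cut; they differ only in what they do with the onset once it has been split off: Pig Latin relocates it, and the shm- construction replaces it.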
Syllables, in turn, are collected into rhythmic groups called feet: