Morris Halle, "The Representation of Words in Memory" - MIT EECS Colloquium Lecture

PROFESSOR: Ladies and gentlemen, our speaker this afternoon is Professor Morris Halle, an Institute professor who has been associated with linguistics, now linguistics and philosophy, and with RLE. He joined RLE, as I did when Jerry Wiesner was directing it, 1951. So I've known Morris since somewhat before that.

Over all of the years, I think my favorite quotation came from when he received the Killian Award, and talked about how he had become a linguist, and said that in his childhood in Latvia, he grew up in a household in which every adult had a different language. And he induced, being an intelligent fellow, from his surroundings that when you grew up you got your own language. And indeed it happened, but he never guessed that his was going to be English. He's going to talk to us on the representation of words in memory.

HALLE: You need a handout. Anybody here, you can't follow this speech without a handout. Where's my-- Yeah, that's better. The light is so strong. In speaking, as well as in listening to the speech of others, we have the very clear and distinct impression that all utterances are composed of discrete words. Many people are quite surprised when it's pointed out to them that utterances are quasi-continuous acoustically, and normally contain no pauses separating one word from the next.

[LAUGHS]

And this impression of speaking and hearing discrete words in the continuous speech signal is not lost, even by the most experienced speech researchers, who are completely aware that utterances are not acoustically segmented into words. Now there's more to it. When I produce the English utterance, a dog never plays when he's alone, everyone in this room can not only repeat it, but when asked, can easily establish that the utterance consisted of eight words: a dog never plays when he is alone.

On the other hand, when I produce the utterance [NON-ENGLISH PHRASE], most people in this audience will have difficulty in repeating what I said and will find it almost impossible to determine how many words this utterance was composed of, yet you heard it just as clearly as the other utterance I made. Well, there's of course no mystery about the result of this little experiment. The second utterance was in Czech, a language that I suppose not many people in this room command.

And since they don't know Czech, they don't know the words of Czech, and knowledge of a language is, of course, crucial to all aspects of speech. Our experiment also brings out the fact that an essential aspect of the knowledge of a language is the knowledge of its words.

Now, most people think that that's all, but it's not true. But I will not enlighten you on that topic. I will limit-- I will perhaps reinforce the incorrect impression that many of you may have, that knowing the words is all. It's a large part of what you have to do, but it isn't all.

Now, this totally unsurprising proposition, however, has a number of corollaries that provide helpful hints about the nature of language. Perhaps the most important of these for our topic is that we are not born knowing the words of our language. The words we know we have learned from our parents, from other people, from books or other media. So what does it mean to know a word?

When we learn or come to know a word, we store in our memory information of a certain kind that enables us to use the word in an utterance that we ourselves produce, and to recognize it when used in an utterance by someone else. The information that we need in order to do this is of two kinds. On the one hand, we need information about the sound of the word. Let us call this the phonetic information.

On the other hand, we need information about the meaning of the word, about its lexical category, for example, we need to know is it a noun or a verb or an adjective, and its grammatical peculiarities, for example, is it a strong or weak verb? Does it form its plural with a regular S as in cakes or keys, or with an N as in children and oxen? Let's call the latter information grammatical semantic information.

The present talk will be limited to the phonetic information. I shall have nothing to say, next to nothing to say, about the other kind. So if anybody wants to talk about it, I'm happy to answer questions about it.

OK, so now we know that we must store in our memories the phonetic information about every word. Now, this-- that is, we are not born knowing it, and if you want to say dog or cat or table or whatever you happen to want to say, something has to be in your memory to say it. We very rarely make up words. For all practical purposes, we could say we never make up words. We always get them from the outside.

Now this naturally leads to the question as to the form in which this memory is stored, the form in which this information is stored in our memory. It has been assumed by most linguists and other students of language that the phonetic information about each word is stored as a sequence of discrete units called phonemes. As a first approximation, we may think of the phonemes as the units that are represented by the letters in written language.

Thus the claim that is being made here is that speakers of English remember the word cat as being composed of three phonemes, whereas the word scat is a sequence of four phonemes, et cetera, et cetera. If you reflect on it a bit, the proposition that the words are composed of phonemes is anything but self-evident. It implies that all speakers of all languages store the words they know as sequences of discrete letter-sized units.

Perhaps the most dubious implication of this proposition is that this must be true also of very young children, say, three or four years and younger, for many of them speak quite fluently. If our claim is correct, they must be able not only to split up into discrete words the continuous speech that they hear, but they must also be able to segment the words into phonemes. Such children must therefore be able to perform sophisticated analyses on the utterances they hear that are beyond present-day capabilities of speech science.

Even worse, perhaps even worse, is that there is nothing out there in the world that might suggest to a two-year-old, or to anyone for that matter, that speech consists of words and that words are composed of phonemes. You know, this discovery that the words are composed of phonemes underlies alphabetic writing. And alphabetic writing wasn't discovered till, you know, 3,000 years ago. But human beings have been around probably speaking for God knows, maybe a million years.

But anyway, it's a relatively recent discovery that sort of was a conscious discovery. Yet, as an unconscious discovery, if this claim is true, it must underlie all language at all times. And this is rather surprising, and far from self-evident. So how come that somebody learns it, since, you know, your mother never tells you to study the phonemes of your language?

And certainly mothers, maybe now, recently, that they become more anxious about their children getting into kindergarten. But even when I was a child, people didn't worry too much about it. So the question is, how come that children even begin to get the idea that they have to do that.

Well, when you are faced with unanswerable questions, there is one standard move that every professor, I mean everybody, knows, and that is to question the presuppositions. So instead of answering the question, we all say, well, what's the presupposition of this question? And maybe we can then get on. Now implicit in this question and objection is the assumption that children need to learn both that there are words and that the words are composed of phonemes.

Instead, we might assume something different. And here we follow a line of reasoning that, like lots of what goes on here, was pioneered by Chomsky originally, namely that this knowledge is hardwired into all of us at birth, and that the knowledge that speech is composed of phonemes is as much part of our physiological makeup as our heart or lungs and liver.

Our language has a word for such innate knowledge. And that's the word instinct. And here I quote from Webster's Third, and it says that instinct is a complex and specific response on the part of an organism to environmental stimuli that is largely hereditary and unalterable, though the pattern of behavior through which it is expressed may be modified by learning, that does not involve reason, and that has as its goal the removal of a tension or excitation.

Well, I mean if you have the instinct for chopping words into phonemes, well, then you do it. There is nothing to explain. That's the only way you can be because that's what it means to, at least in part, be human, and have the capacity to learn language. So children now, if this assumption is correct, learn words not because they're taught to do so, but rather because they are constructed from birth so as to do this. And part of their innate machinery consists of the mechanisms that analyze words into their phonemes.

And of course, we do not know very much about these mechanisms. But that's the logic of the situation. Now, there's even some very simple experimental evidence showing that very young English-speaking children analyze their words into phonemes. The evidence involves the ability of children to form plurals of nouns they know. So in English, the plurals are formed, as you all know, by adding S.

Well, that's not quite correct. If you look at the examples in 2, you will see that they differ somewhat. So, it's very clear, if you look in, let's say, 2C, that you don't add just S. Take the word beach: the plural of that is not beach plus sss, but it's beaches. You add a whole syllable.

And if you look in 2B, you don't add sss either, right? Notice, you don't say keyss. You don't say playss. You say keyz, playz, so you add zzz there. And finally, in fact, you do add sss, just in the case of the examples in 2A. All right? So, notice that there are three distinct plurals, not just one.

Those of you who came in late, better sit next to somebody who has a handout. I don't have any more. Because you can't follow this without the-- OK.

All right, so there's three suffixes. And you know, the children have to know which to say. For example, if a child says keyss, you'll notice that there's something wrong with the way the child speaks, and Mom might get anxious. But children never do that. Or if a child were to say cakus, instead of cakes, you would notice that.

And children never do. They know how to say it correctly. In fact, it turns out that these plural suffixes are not randomly distributed. You don't have to learn each word with its plural. But there is a general rule. And one form of the rule, we'll be talking about this rule, is given in 3. Now what it says there: if the noun ends in chuh, juh, suh, shuh, zhuh, then the plural is uz. OK, that's the first part of the rule.

Second part: otherwise, if it doesn't end in this list, then, if it ends in kuh, puh, tuh, fuh, thuh, then add sss. And finally, otherwise, add zzz. So you have like a decision tree, you know, you check a list. You understand. OK? Now, OK, so this is a rule. Now the question is, do children know this rule or is this something that clever linguists invented, you know, people who get PhDs from MIT?
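The decision list in rule 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `plural_suffix` and the ASCII phoneme labels are hypothetical, not notation from the handout, and `z` is added to the first list on the assumption that the transcript dropped it (it appears when the same list is restated later in the talk).

```python
# Hypothetical ASCII labels for stem-final phonemes (not a standard transcription).
FIRST_LIST = {"ch", "j", "s", "sh", "z", "zh"}  # rule 3, first part ("z" is an assumption)
SECOND_LIST = {"k", "p", "t", "f", "th"}        # rule 3, second part


def plural_suffix(final_phoneme: str) -> str:
    """Pick the plural suffix by checking the two lists in order."""
    if final_phoneme in FIRST_LIST:
        return "uz"   # a whole extra syllable, as in "beaches"
    if final_phoneme in SECOND_LIST:
        return "s"    # as in "cakes"
    return "z"        # everything else, as in "keys"


if __name__ == "__main__":
    for word, final in [("beach", "ch"), ("cake", "k"), ("key", "ee")]:
        print(word, "->", plural_suffix(final))
```

The order of the checks matters: the second list is consulted only when the first one fails, and the last suffix is the default.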

Well, the answer is that children know this rule. And in fact, in the literature there's an experiment that shows it. And that experiment is rather-- I'll tell you. It is a rather simple experiment. The experimenters had different kinds of dolls and they gave them names.

So one doll's name was Clye. And another doll was Clish. Let's say another doll, his name was Plick. And now, obviously these are not English words. The children had never heard these words. So they introduced these dolls to the children and then they ran the experiments. And they were very careful never to use plurals of these names.

And then they arranged them in little groups in this room where the test was conducted. The children were brought in. And the child was asked, what's that, pointing at a group of dolls of one type, and the child would say these are Clyes. Or these are, what was it I had. Whatever it was. I couldn't pass that test anyway. But they were-- I'm too old for that.

They said the correct plural. That is they always-- so for Clye, they said Clyes. For Clish, they said Clishes. And for Plick, they said Plicks, you know just like the way you're supposed to say by this rule number three. Now notice, that they had not heard these names before. These names were given right there. That's the first time they ever heard them.

And they had certainly never heard the plural of these words. But they knew how to do that. Now, the conclusion is, of course, that they knew the rule. But knowing the rule implies a little bit more than that. That is, not only did they know the rule, they could apply it. And in order to apply the rule, of course, what they had to do is they had to slice the word up into the last phoneme, and the rest, minimally.

And then, once they had done that, then they could pick the right suffix, namely, whatever it was for each of the three sets, just as is given in your rule number three. So the conclusion is that the children have knowledge of the kind that presupposes their ability to divide the continuous speech signal up into words, and the words into phonemes, and perform these rather complicated, perhaps not so complicated, but anyway, these tasks of making the plural.

Now the next question I will briefly discuss is what are phonemes? Now, when you study phonemes, I've given you a list of phonemes in four. So puh, buh, fuh, vuh, muh. You know, so pill, bill, Phil, vale, mall, or something like that, and similarly tuh, duh, and so on.

Now, as you study these things, you'll notice that when you pronounce these phonemes, all the ones that I've given you involve complete closure. There's closure of your mouth, so that the air is obstructed in its outflow. That's very different, let's say, when you have other phonemes like, say, ah, oh, or ee. These are vowels and are quite different from the consonants that I've given you, which all have this property that somehow there's an obstruction to the flow of air.

Now what differentiates these three groups is the piece of machinery that you have that obstructs the flow. So in the first set, puh, buh, fuh, and so on, it's the lips. In the second set, it's the flat part of the tongue, tuh, duh. See, it's not the lips. It's the flat part of the tongue. And in the third set, kuh, guh, it's the body of the tongue that stops the airflow.

So we have three groups, depending on the articulator, that is the technical term, that interrupts the airflow. So that's one property, that the airflow is interrupted, and there's a particular articulator that does the interrupting. And that's what makes them different.

In the first line, it says nasal. What you notice there is that we allow in some sounds, the air to flow through the nose. We open up a side port as it were, which allows the air to flow through the nose and that excites certain resonances in these cavities here. And you have these sounds like mmm, nnn, you know, these so-called nasal sounds.

The rest of the sounds are produced in such a way that the soft palate is raised up and no air is allowed to flow through the nose. And so, we have this-- and you see, what you're supposed to see here is that this is a particular gesture that is the same in all three types of sound. So nuh and muh are quite similar in that you have a lowering of this soft palate.

But in one case you superimpose this on a labial stoppage of flow. In the other case, you impose it on a coronal stoppage of flow. And in the third case on a dorsal stoppage of flow. What you see here is that they all, they have a sort of similar internal structure.

Now if you look at the non-nasal sounds, you notice two other things. You notice that sounds like puh and tuh and fuh are produced without vocal cord vibration. Now you can tell whether your vocal cords vibrate. When you say zzzz, they vibrate. And you can feel that when you say ssss, they don't vibrate. So there is another property, this vocal cord vibration, that goes in.

And now you notice again, you can vibrate your vocal cords while stopping the airflow with your lips, or while stopping it with the flat part of your tongue or the tongue body, and you can do that in different ways. And that is represented by the minuses and pluses in the second line. It says something happens at your glottis that you switch on and off, and either the vocal cords vibrate or they don't. And that's represented on line two.

Now finally, the stoppage can be complete, as it is, let's say, in puh or tuh, or it can be sort of partial as it is in, let's say, th or sss, where air is trapped inside your mouth, but only partially. OK. Now, what you notice here is that these are sort of subgestures, the closure, the lowering of the velum, the behavior of the vocal folds, and each of these is represented in three variants: in the first class, where the major closure is made with the lips, in the second class, where it's made with the flat part of the tongue, and in the third class, where it's performed by raising the tongue body.

And now these subgestures, you find in all-- each phoneme therefore is sort of a complex of subgestures. And the technical term is to say that phonemes are complexes of features. So a speech sound is not something which is further unanalyzable, but rather it's a complex of subproperties. And perhaps the most interesting observation to be made is, of all the sounds in the world, only phonemes are complexes of features, and only words can be analyzed into phonemes. And they must be analyzed into phonemes.

So when you hear-- when you represent, let's say, a sound of the car or a rattle in your car, you don't have phonemes to represent that. I mean some kind of rattle-- whereas words, they are divided into phonemes. And the phonemes are made up of features. That's sort of the idea. OK.

Now this digression, this technical digression, was necessary to bring out the next thing. Now obviously, most of you have never heard of phonemes before, and I'm here the first to tell you. Now what's interesting is that we now may ask a question. Do people who know English think about the phonemes in terms of indivisible units, or is the naive behavior of English speakers properly characterized in terms of features?

And we will try to show you how that works. Now, if the latter is true, then the people that know the plural rule, that know rule number three, of course, wouldn't have lists of phonemes. They would have feature lists of some sort. That is, this knowledge would not be represented by a list of phonemes. If phonemes are always feature bundles, then it would be represented by feature values.

So for example, let's now look at five, which is a translation of three into features. Now notice, the first set is the sounds suh, zuh, shuh, zhuh, chuh, juh. Now, all these sounds are made with the flat part of the tongue. They are coronal.

They have the coronal feature. And moreover, they have some other features which we won't bother with. OK, so the translation of the list, that's the list in 3A, is, as I've given in five, saying it's coronal. It has the feature coronal plus some other features.

Now the second list says puh, tuh, kuh, fuh, thuh. Now, those are all and only the sounds that are made without vibration of the vocal folds. They're unvoiced. And so the rule now has been translated into features. I mean five is just a translation into feature terms of that list that I've given you in three.

And now the question is, in which form do you, as a speaker of English, know the plural rule? Now the answer-- we'll now run an experiment right away, and we will see whether it works. OK, now the test was suggested to me many years ago by Lise Menn, who's now a professor at the University of Colorado.

And the test that she suggested was quite simple. First of all, she said, look, you can ask people to make plurals of non-English words. So for example, if you knew a Frenchman whose name was de la Rue, although [? Uu ?] doesn't exist in English, you could ask what's the plural of Monsieur de la Rue and his family. And somebody would--

So what is it? Which one? Which one? Which one? Well, which one, the first, the second, or the third? Third one. OK, fine. Good. We're getting there. Now.

The two lists make different predictions about this sound, oo. Number three says, regardless of what the sound is, if it's not an English sound, you have to make zzz. That's what number three says. But number five doesn't say that.

Number five says, look at its features. If it has the features, then apply the rule. OK, so now we have to find a phoneme which is not in English, but which has the appropriate features, so that you wouldn't pick number three. And we do have that. And everybody knows Johann Sebastian Bach.

Now Bach, he had many children. And so, what's the plural of Bach? Number two, not number three. OK? Now it's clear that everybody picks number two and not number three, although the final sound of Bach is not a phoneme of English. Therefore it couldn't be listed in anything that's a rule of English.

And therefore, since the action was performed here and the test was run here, it's clear that somehow or other people know something about features. Maybe they had never heard about it. But then you know a lot of things you haven't heard about that play a real role in life.
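The difference between the two formulations of the plural rule can be made concrete with a small Python sketch. Everything named here is an assumption for illustration: the ASCII phoneme labels, the label `x` for the final sound of Bach, and the feature name `strident`, which stands in for the "other features" that the talk leaves unnamed.

```python
# Rule 3 style: explicit lists of English phonemes (labels are hypothetical).
FIRST_LIST = {"ch", "j", "s", "sh", "z", "zh"}
SECOND_LIST = {"k", "p", "t", "f", "th"}

# Rule 5 style: each sound is a bundle of features (values are illustrative).
FEATURES = {
    "sh": {"coronal": True,  "strident": True,  "voiced": False},
    "t":  {"coronal": True,  "strident": False, "voiced": False},
    "k":  {"coronal": False, "strident": False, "voiced": False},
    "g":  {"coronal": False, "strident": False, "voiced": True},
    # Final sound of "Bach": not an English phoneme, but voiceless.
    "x":  {"coronal": False, "strident": False, "voiced": False},
}


def suffix_by_list(final: str) -> str:
    """A sound that is on neither list falls through to the default."""
    if final in FIRST_LIST:
        return "uz"
    if final in SECOND_LIST:
        return "s"
    return "z"


def suffix_by_features(final: str) -> str:
    """The same decision, stated over features instead of lists."""
    bundle = FEATURES[final]
    if bundle["coronal"] and bundle["strident"]:
        return "uz"
    if not bundle["voiced"]:
        return "s"
    return "z"


if __name__ == "__main__":
    for sound in FEATURES:
        print(sound, suffix_by_list(sound), suffix_by_features(sound))
```

On the English sounds the two functions agree. They come apart only on `x`, where the list version falls through to the default and the feature version picks the voiceless suffix, which is the choice the audience made.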

OK. So we have now established that phonemes are actually bundles of features. And that speakers have knowledge in a very strong sense of these features. They may not have any way of talking about it. But then we don't have any way of talking about how we walk. We don't know how we balance ourselves so that we don't fall down. But we know how to do that even though we can't talk about it. OK.

So these are features. I mean that a phoneme is a bundle of features. And now we can say something about the way the word is represented in memory, namely it will be represented by a matrix, where each column will represent a phoneme. Each phoneme will be a column of features. And there will be as many columns as there are phonemes in the word.

So that's now the claim that we're going to make about the representation of words in memory. Now we want to go on a little bit and say perhaps a little bit more. Now, notice that the features are, in one sense, they are instructions. They are instructions for actions. That is in one case, say lower your velum. Or close your lips. Or they are actions that are observable actions.

Now in memory of course, these instructions serve to distinguish one word from another. So the difference between, let's say, the word bill and the word mill, is that in mill, we have the instruction lower your velum, plus nasal. Whereas in bill, it's minus nasal. That is the instruction.
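The matrix picture can be sketched as follows, with one column per phoneme and one row per feature. The three feature rows, the ASCII phoneme labels, and the particular plus and minus values are assumptions chosen only to show the bill and mill contrast; they are not the feature set from the handout.

```python
# Rows of the matrix: a small, hypothetical feature inventory.
FEATURE_ROWS = ("nasal", "voiced", "labial")

# Feature bundles for four sounds (True = plus, False = minus; illustrative values).
PHONEMES = {
    "b": {"nasal": False, "voiced": True, "labial": True},
    "m": {"nasal": True,  "voiced": True, "labial": True},
    "i": {"nasal": False, "voiced": True, "labial": False},
    "l": {"nasal": False, "voiced": True, "labial": False},
}


def word_matrix(phonemes):
    """Return a list of rows; column c holds the features of the c-th phoneme."""
    return [[PHONEMES[p][row] for p in phonemes] for row in FEATURE_ROWS]


def differing_cells(a, b):
    """List (feature, column) pairs where two equal-sized matrices differ."""
    return [
        (FEATURE_ROWS[r], c)
        for r in range(len(FEATURE_ROWS))
        for c in range(len(a[r]))
        if a[r][c] != b[r][c]
    ]


if __name__ == "__main__":
    bill = word_matrix(["b", "i", "l"])
    mill = word_matrix(["m", "i", "l"])
    print(differing_cells(bill, mill))
```

Under these assumed values the two words differ in a single cell, the nasal feature of the first column.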

But in memory they serve as a kind of a purely abstract diacritic mark which distinguishes one matrix from another. And only at some particular point does it eventuate in action. Now interestingly, when you hear a signal, you therefore have to analyze the signal in terms of these actions in order to pick the word out of your memory, if the story is correct.

I mean, the matrices are matrices in terms of actions of particular pieces of the machinery here. Now so what this sort of presupposes or implies is that there is a kind of a primacy of the gymnastics that's going on over the sound that's out there. That is, you analyze a sound in terms of the gymnastics in order to figure out which word did he say in order for me to understand that person whose speech I'm now understanding?

Now this may sound rather surprising. Now some researchers have questioned the need for the translation from acoustics into articulations. Articulations is this gymnastics. And they have proposed instead that features in memory are directly related to properties in the acoustic signal, rather than being indirectly related to properties in the [INAUDIBLE].

It is therefore necessary at this point to review the evidence that supports the alternative view, the one that I have proposed, that the features in memory are not directly related to specific acoustic properties in the signal, but rather that this relationship is indirect, mediated by articulation. Now this view, by the way, has a name in the literature. And it's called the motor theory.

And it has been championed for a long time by Alvin Liberman of Haskins Laboratories and a number of other people that you may know. Now I will give a little bit of evidence why I think this is so, that there's a primacy of the articulation over the sound. And the first evidence is, when you ask speakers of any language to say a word, so you make a nonsense word like soup, let's say, just make up some word.

And you ask somebody to say it, even in the dark, so they can't see, or you play a record, so they can't see what you do when you say, please repeat this word, the actions that they perform will be enormously uniform. Maybe not 100%. But the uniformity is enormously surprising.

So when they hear that word they will start by raising the flat part of the tongue against the roof of the mouth. And they round their lips, and then they raise the body of the tongue. The actions are executed always the same. There will never be an attempt to mimic the sound by putting your fingers in your mouth or doing something desperate. You know, they will always be uniform.

And this is universal. I mean there is no instruction, no teaching, that happens when you make such a request. And so it's not only English speakers. When you ask somebody who never spoke and never heard English, and tell them to pronounce this sound, there will be an enormous uniformity. And the uniformity will be in terms of the execution of this particular action.

Now notice, when you ask people to produce a sound, and you make some odd sound, you know, like scratching two metal plates, and say, well, see what you can do, reproduce this sound, people get very lost. They're very flustered.

We have this built in. Certain sounds, we analyze into articulatory gestures, namely features. And those we then produce. And then with others we do all kinds of wild things. Another bit of evidence is the so-called McGurk experiment, as some of you may know.

Now the McGurk experiment consisted of the following rather surprising test. People were shown a video where somebody says, da, da, da, da, da, da, da. At a particular point, the audio portion remains the same, but the video portion changes to somebody saying, like this, you know, saying doo, doo, doo, doo essentially, although the audio portion continues saying da, da, da.

And the impression that people have is that the person switched. And that he is now saying doo, doo. And you can actually turn off the video and they hear the da, da, da again. And you turn the video back on, and the illusion returns.

And the illusion simply shows you that the articulatory information, once you receive this articulatory cue, is so overwhelming that you think that you hear something different. So here's another bit of evidence that suggests that.

Now the third is what happens in language. And this is my last point. I will take about 10 minutes to make it. And that is that, in addition to making plurals, there are other kinds of rules in language. And I'll begin with one.

And these rules as you will see, always affect features. They always affect one or two features as you will see in just a moment. So take for example, the words in six. Now notice, when we say, take a word like leaf. Now the plural of leaf happens not to be leafs, but it's leaves. We say chief, chiefs. But leaf, leaves.

OK. Now the same thing, we'll take house. The plural of house is not houses with sss. That would be the plural of Hermans and his family. But it's houzes. I mean so what happens is that in the plural, we change the shape of the stem.

The noun in the plural has a different sound than in the singular. So house in the singular, but houzz in the plural. That's not reflected in the spelling. But notice for f and v we reflect that even in the spelling. So we say leaf, but the plural is leaves and we write a different letter to reflect that. But for house we don't bother. We don't write z there.

There's a small number of English words which have this property. And in order to speak English correctly, you have to do that. If you said leafs, somebody would correct you. That's not correct. I mean, so there is this kind of small rule. And we have a rule, which I've written in number seven.

And what the rule says is that if you have a sound that is plus continuant, which means fff, sss, that kind of sound, which has incomplete blockage of the flow, this sound in the plural becomes voiced. It changes its composition so that instead of being produced without vocal cord vibration, it's now produced with vocal cord vibration.
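A rough Python sketch of this stem change follows. It rests on several assumptions that are not in the talk: the ASCII phoneme labels, the table of voiced counterparts, and the idea of marking the small set of affected words with an explicit list (`VOICING_STEMS`), which is only one hypothetical way to record that leaf changes while chief does not.

```python
# Voiceless continuants and their voiced counterparts (hypothetical ASCII labels).
VOICED_COUNTERPART = {"f": "v", "s": "z"}

# Hypothetical marking of the few stems that undergo the change in the plural.
VOICING_STEMS = {
    ("l", "ee", "f"),   # leaf
    ("h", "ow", "s"),   # house
}


def plural_stem(stem):
    """Return the stem as it appears in the plural, voicing a final continuant."""
    stem = list(stem)
    if tuple(stem) in VOICING_STEMS and stem[-1] in VOICED_COUNTERPART:
        return stem[:-1] + [VOICED_COUNTERPART[stem[-1]]]
    return stem


if __name__ == "__main__":
    print(plural_stem(["l", "ee", "f"]))    # stem of "leaves"
    print(plural_stem(["ch", "ee", "f"]))   # stem of "chiefs", unchanged
```

Note that only one feature of the final sound is touched. The rest of the stem, and every other feature of that sound, stays as it was.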

So there is such a rule, and that's part of knowing English. OK, now, interestingly, there's other things that affect voicing in English. So let's take the suffix that's spelled th, which you find on the one hand in the ordinal numbers, so twelfth. Twelve, but twelfth. Or five, and you say fifth. It's not five, but fifth.

So the vuh becomes fuh, which in speech terms means it becomes voiceless. And notice the same thing happens when the th is used to make a noun from an adjective. So we say wide, and we say width. Duh, but width. Or broad, we say breadth.

OK. I mean there are other such examples. All right? Now what's interesting is that here again, in terms of features, what we would say is that this consonant becomes minus voice before the suffix th. You see, this is the suffix th.

But here it's sort of wrong to put it that way. Because really what's going on here is that this th is itself minus voice. You know, I've written it here without the feature composition. But th itself is voiceless.

So now what's going on here is that the feature minus voice is sort of spread onto the preceding sound. It's being anticipated, as it were. See, it's not just any change that's going on here. It isn't as though, you know, this consonant becomes nasal or some crazy change.

The change here is really a change of a feature spreading from one phoneme onto the other. And what you have here is a kind of an anticipation of voicelessness. You know that this is going to be voiceless. So therefore, you anticipate it already.

But now let's go back to the English plural rule. Notice what we had there was, we said, well, if it's minus voice, then it becomes sss, otherwise it's zzz. Remember, that's what rule five said, the last part.

But again, what you have here, see, is a spreading in the opposite direction. That is, the spreading goes from the end of the stem onto the suffix. That is, the basic suffix is zzz. But if it follows a voiceless consonant, then the voicing feature now is spread onto it.

So here you have, as it were, a kind of a perseverance, inertia effect, where this feature of the stem is held over onto the suffix. And I've represented these two things in your handout: one is in 10 and the other one is in 11, where I try to show you that in one case we have a kind of an anticipation effect, and in the other case, we have an inertia effect.
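The two directions of spreading described here can be sketched in a few lines of Python. This is only an illustrative sketch, not the notation of the handout: the toy consonant inventory, the rough phoneme spellings, and the function names `anticipate` and `persevere` are all assumptions made for the example, and vowel changes (as in wide, width) are ignored.

```python
# Hypothetical toy inventory: consonant symbol -> is it voiced? Vowels are omitted.
VOICE = {"v": True, "f": False, "d": True, "t": False,
         "z": True, "s": False, "g": True, "k": False, "th": False}
# Voiceless counterpart of each voiced consonant in the toy inventory.
DEVOICED = {"v": "f", "d": "t", "z": "s", "g": "k"}


def devoice(seg):
    """Return the voiceless counterpart of seg (unchanged if it has none)."""
    return DEVOICED.get(seg, seg)


def anticipate(stem, suffix):
    """Leftward spreading: a voiceless suffix devoices the stem-final consonant."""
    if stem[-1] in VOICE and not VOICE.get(suffix[0], True):
        stem = stem[:-1] + [devoice(stem[-1])]
    return stem + suffix


def persevere(stem, suffix):
    """Rightward spreading: a voiceless stem-final consonant devoices the suffix."""
    if stem[-1] in VOICE and not VOICE[stem[-1]]:
        suffix = [devoice(suffix[0])] + suffix[1:]
    return stem + suffix


# five + th: the v anticipates the voicelessness of th.
print(anticipate(["f", "i", "v"], ["th"]))
# cat + z: the voicelessness of t is held over onto the plural suffix.
print(persevere(["k", "a", "t"], ["z"]))
```

In both functions the same feature, minus voice, is copied between neighboring segments; only the direction differs.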

And now I want to go on and show you a very surprising, unexpected effect. And it begins to make very good sense when we look at the actual gesture, which I'm going to tell you about now. OK, now how do you do-- how is voicing produced?

I mean, we vibrate our vocal cords. How do we do that? Well, we don't-- each vibration isn't individually controlled. They go much too fast. They go several hundred times a second. So it has to be done in some other way. And the way you do it is basically ttthpt, like the soul. I mean, it's a kind of a relaxation oscillation, where you blow air up from the lungs.

The vocal cords sit on top of the lungs. They're elastic. As the pressure builds up, the vocal cords are shoved apart. Then some air escapes. The pressure drops. The Bernoulli effect takes over. The vocal cords close. And, you know, the whole thing starts all over again from the beginning.

So that's the actual action. Now Ken Stevens and I, who, like Peter Elias and I, go back to prehistoric times, some 25 years ago investigated the matter. We studied this whole question of how that works.

And one of the things we noticed immediately, and for this we claim no credit, or we shouldn't claim any credit, is what happens when you stiffen the vocal cords. As you stiffen the vocal cords, what happens? Now, interestingly, the first thing that happens is that the rate of vibration will increase.

So as you make them stiffer, the rate of vibration increases. Now we know that we control-- what does the vocal cord vibration reflect? Well, when I increase my vocal cords-- let's say when I say a vowel, or any vowel, and I increase the rate of vibration of my vocal cord, the pitch goes up. I say, ah, oh, oh.

These are changes in the rate of vibration that are effected, probably, by changing the stiffness of the vocal cords. Now, however, and here's the more interesting fact, when you make a vowel sound, as I indicated, your mouth is wide open and no air is trapped in your mouth. And the pressure inside your mouth is roughly ambient pressure.

And so the drop from the lungs into the mouth is sort of the maximal pressure difference that there can be. Now what happens when you say, let's say, sss or zzz? Well, when you say sss, you trap air in your mouth. And some pressure is built up inside your mouth. And therefore the drop across the vocal folds, assuming that the pressure in the lungs remains the same, is less.

So we have two cases. In a sound like sss, we have a small pressure drop. And in a sound like ahh, we have maximum pressure drop. And now, the question we asked was, what happens as you make the same changes in vocal cord stiffness? What will be the behavior of the vocal cords?

Now we know in the case of a big pressure drop, there will be an increase in the rate of vibration. But in the case of a small pressure drop, what happens, not surprisingly when you think about it, is that as you increase the stiffness, the vocal cords stop vibrating. Because there's just not enough push at this point-- the stiffness is now so great and the pressure is so small that they simply don't vibrate.

So that says that as a gesture, the differences in pitch, that is high pitch versus low pitch, and voicelessness, absence of vocal cord vibration, and presence of vocal cord vibration, are produced by the same mechanism. Right? That is the same articulatory mechanism that is involved in making changes in pitch is present in making the difference-- differentiating sss from zzz from vvv and so on.
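The single mechanism with two acoustic outcomes can be written as a small lookup. This is a minimal sketch under stated assumptions: the function name `vocal_fold_outcome`, the boolean encoding of stiff versus slack and large versus small pressure drop, and the wording of the returned labels are all invented for illustration.

```python
def vocal_fold_outcome(stiff: bool, large_pressure_drop: bool) -> str:
    """Map one articulatory setting (stiff or slack folds) to its acoustic result.

    large_pressure_drop=True stands for an open vocal tract (a vowel),
    False for a constricted one (a consonant such as sss or zzz).
    """
    if large_pressure_drop:
        # Vowels: stiffness changes the rate of vibration, that is, the pitch.
        return "higher pitch" if stiff else "lower pitch"
    # Consonants: with little push across the folds, stiffness stops vibration.
    return "voiceless (no vibration)" if stiff else "voiced (vibration)"


for stiff in (True, False):
    for large in (True, False):
        print(stiff, large, vocal_fold_outcome(stiff, large))
```

The point of the sketch is that `stiff` is the only articulatory variable; whether it surfaces as pitch or as voicing depends on the pressure drop alone.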

Now that's simply the logic of the situation. That seems to be the way it is. But now notice what's interesting. If you look in your handout under 13, I've sort of made a table there, and I said, there's a large pressure drop and a small pressure drop. And on the left, there are stiff vocal folds and slack vocal folds.

With a large pressure drop, higher rate of vibration versus lower rate of vibration. With a small pressure drop, absence of vibration, cessation of vibration, versus presence of vibration. OK? But now notice, acoustically, and here is a question that we tried to raise: what has primacy? Is it the gymnastics or is it the sound?

OK, now notice, the sound is very different. I mean, there is really nothing in common between a high pitched vowel and a voiceless consonant, and a low pitched vowel and a voiced consonant. So acoustically they should go their separate ways. But from the point of view of the mechanism, they seem to be going the same way. But we even have a little bit of linguistic evidence to show that perhaps that's the right way to go. And this evidence is given in these examples from a Chinese dialect in number 14.

Now what goes on there is, in many of the Chinese languages, words are differentiated by the pitch contours that they have. So they have a falling contour or a rising contour or an even contour. And that makes differences in meaning. I mean, you know, everybody knows that.

OK, but now in this particular dialect, which is not the standard, I mean, it's not the Mandarin standard dialect or any of the-- it's a particular [? Amoid ?] dialect. Now what you see there is this. Look at the left-hand column. Now the numbers indicate pitches. So five is the highest pitch and one is the lowest. So five three is a falling pitch and three five is a rising. And four four is even.

But now notice, you have these two sets of words. And they have falling, even, and rising. And they come in two versions, a high pitched version when the sound is tuh, and a low pitched version when the sound is duh. So in the word T, they have three versions of the word T, you know, T and T and T. And then they have the same with D.

But there's the register in which it happens, whether it is a high register or a lower register, if you will. And that goes, of course, with the voicing of the preceding consonant. If you have tuh, which is the voiceless consonant, which is one that you form by stiffening your vocal folds, you now spread the stiffening of the vocal folds onto the following vowel. And therefore the pitch of the vowel goes up.

In the other case, you have slack in the vocal folds-- you have duh, with slackening of the vocal folds, and as a result, you spread that over to the vowel, and as a result, the pitch of the vowel drops. So in fact, there seems to be a rule in Chinese which says: spread that stiffening from the preceding consonant onto the following vowel, and that's it.

So the rule is really just exactly like the rule that we have for the English plural, only in the English plural, we spread it among consonants and leave the vowels alone, as it were. Whereas in Chinese, we spread from a consonant to the vowel. And the results are exactly as expected. But implicit in all this is that the gesture, the articulatory gymnastics, is the primary thing. Whereas the acoustic result is, as it were, a secondary thing.
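The consonant-to-vowel spreading just described can be sketched as follows. This is an illustrative sketch only: the `Segment` class, the single `stiff` feature, and the "high"/"low" register labels are assumptions made for the example, and the specific pitch numbers from the handout are deliberately left out.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Segment:
    symbol: str
    is_vowel: bool
    stiff: Optional[bool] = None  # None means the feature is not yet specified


def spread_stiffness(syllable: List[Segment]) -> List[Segment]:
    """Copy the stiff/slack value of a consonant onto a following unspecified vowel."""
    out = []
    current = None  # stiffness of the most recent specified consonant
    for seg in syllable:
        if seg.is_vowel and seg.stiff is None and current is not None:
            seg = replace(seg, stiff=current)
        elif not seg.is_vowel and seg.stiff is not None:
            current = seg.stiff
        out.append(seg)
    return out


def register(vowel: Segment) -> str:
    """Stiff folds on a vowel give the high register, slack folds the low one."""
    if vowel.stiff is None:
        return "unspecified"
    return "high" if vowel.stiff else "low"


t = Segment("t", is_vowel=False, stiff=True)   # voiceless: stiff vocal folds
d = Segment("d", is_vowel=False, stiff=False)  # voiced: slack vocal folds
v = Segment("V", is_vowel=True)                # any vowel, register not yet set

print(register(spread_stiffness([t, v])[1]))
print(register(spread_stiffness([d, v])[1]))
```

The same copy operation that devoices an English suffix here sets the pitch register of a vowel, which is the parallel drawn in the talk.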

So this brings me-- I'll just have one brief conclusion. I'll say that I've argued that words are stored in memory as sequences of discrete phonemes. I've suggested that phonemes are composed of more elementary entities, namely features. I've attempted to give some idea of the nature of the features. And I've discussed a number of facts that suggest that a significant part of the knowledge speakers have of their language involves features.

I noted that because of its great complexity, it's implausible to assume that the machinery involved in the memorization and production of words is learned by speakers. Instead it's more plausible to assume that much of the knowledge required for speaking is innate, hardwired in us at birth. And finally, I gave some evidence to show that the articulatory realizations of the different features have a more central character in the functioning of languages than their acoustic effects.

[APPLAUSE]

Any questions?

AUDIENCE: If English is not your [INAUDIBLE], when does the hard wiring set in?

HALLE: It's different for different people. You know I have a colleague, Ken Hale, for whom he's 55, hardwiring has yet to begin. But for most people it's around puberty. Puberty, about 12, 13.

AUDIENCE: How do deaf mutes ever learn a language? It must be very different.

HALLE: Well, notice that their natural language is, of course, for gesture. Now you see, what we speak, your mom always tells you not to, but most of us do. And, the deaf you see, they develop language which is every bit as complex as English. It's not English. I mean deaf, deaf mutes do not just sign words. I mean, they say English words in the order in which the English syntax puts them.

They have a sign language, which--

AUDIENCE: Their memory must work quite differently.

HALLE: No, their memory is in terms of gestures. And they happen to have gestures of the upper body and the hands, rather than gestures of the lips and those articulators that we use of those of us who can hear. And it's clearly-- and as you well know, that these children, for example, deaf children of parents who are not deaf, they learn sign and they learn sign correctly, although the parents very often don't know how to sign correctly.

Because a lot of it is really innate in them. And we have these both parallel channels, of which, for obvious energetic reasons, the spoken, the one that we use for speaking is to be preferred, because you're moving much smaller masses. You know, when you move your arms and your body, I mean, you're really moving pounds. You know, whereas when you move this, you just move a few grams. And therefore it's much more efficient from that point of view. So that's the preferred one.

But sign languages have existed and exist, and they're every bit as complicated as other languages. Louis.

AUDIENCE: The storing of a language involves many thousands of words and [INAUDIBLE] and so on. And it's tempting at least to, ask, it is a memory in terms of all these words, which is very large, or is it in terms of features or whatever, and features and rules, which can combine things to make the words. What's know about this? Or what's speculated?

HALLE: What I showed you that the representation would have to be in terms of some feature. I mean we're extremely gifted, enormously gifted for memorizing enormous numbers of words. You know, the average child aged six, knows something like I don't know 30,000 words. And that's an enormous number. You know, when you think that, I mean there's stages in children or ever they learn something like a word an hour, a new word an hour.

Those of you who've taken language courses and have had to memorize the words, well, we all know what a horrible, difficult task that is. And so we're very capable, I mean, naturally, very gifted at memorizing thousands and thousands of words at a young age. And the rules of language, like those that I showed you, are very obviously in terms of features.

A feature changes because you spread this feature onto some adjacent phoneme, as I tried to show you, or things like that. That's what goes on. Sir.

AUDIENCE: You make a good case for how [INAUDIBLE] mechanical question or gymnastic question. How do you explain roof, roofs, and hoof, hooves, or the fact that we use a different plural for the word loaf, depending on its meaning?

HALLE: Yeah. You didn't get the handout. In number seven, I said that, or number six, I forget now what number it was. Number seven, I said that there's a list. I said notice, applies only to leaf. There's about 30 words, where this rule applies. And it's part of, you have to memorize. I mean a large part of English is memorization.

AUDIENCE: [INAUDIBLE]

HALLE: Child will probably, notice, children will make mistakes. And the mistakes, so for example, you probably if you've seen children will say mouses. Now notice, they never heard mouses, or they will hear oxes. You know, they'll say oxes.

They never heard oxes. Because the language around them doesn't use that. So that goes, again, to show that they know the rule. And they've learned the rule. And they haven't learned the exception rule, namely that for ox you don't do it, you know. So they just go by the regular rule, until somebody takes them aside and says, hey, hey.
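The interplay between the regular rule and the memorized exceptions can be sketched like this. The sketch is a simplification under stated assumptions: it works on spelling rather than on phonemes and features, it stores whole irregular forms instead of marking words for a voicing rule as in the handout, and the names `regular_plural`, `plural`, and `IRREGULAR` are invented for the example.

```python
# A small memorized list; a real speaker's list would be longer.
IRREGULAR = {"leaf": "leaves", "mouse": "mice", "ox": "oxen"}


def regular_plural(word: str) -> str:
    """The regular rule, approximated on spelling (an assumption of this sketch)."""
    if word.endswith(("s", "x", "z", "ch", "sh")):
        return word + "es"
    return word + "s"


def plural(word: str, exceptions: dict) -> str:
    """Use a memorized exception if one is known, otherwise apply the regular rule."""
    return exceptions.get(word, regular_plural(word))


# A child who knows the rule but has not yet memorized the exceptions:
print(plural("mouse", {}), plural("ox", {}), plural("leaf", {}))
# An adult speaker with the exceptions memorized:
print(plural("mouse", IRREGULAR), plural("ox", IRREGULAR), plural("leaf", IRREGULAR))
```

With an empty exception list the function produces forms such as mouses and oxes that were never heard, which is the behavior the answer attributes to children.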

Ma'am.

AUDIENCE: In section eight, in item eight of your list.

HALLE: What?

AUDIENCE: In your item eight in your handout, you talked about the [INAUDIBLE] last consonant. The more striking thing there is the vowel change. Do you have an explanation for the vowel change?

HALLE: Yes I do, but that's a long story. You see me privately, I'll tell you. That's a long story, yeah. It's shortening thing. These are partly shortening and other things too. But that's a lie, I would take me 10 minutes to do it. So it's not appropriate.

AUDIENCE: You didn't get a chance to talk about things like syllable structure or stress pattern. I assume that's because you would group it under your grammatical.

HALLE: No, no, . Or This is less lesson one.

AUDIENCE: Could you comment on the role of things like that and the representation of words?

HALLE: Yes, I can comment. I don't know. I think syllable structure and stress are-- what I believe is this. That now, I can't give you any evidence, needless to say for this round. Is that the representation, see, I've said that the representation is a form of a matrix. You know so each phoneme is a sort of a block of features. That's what I said.

Now, I think that the proper way to think about it is even somewhat different. Namely that there is a timing slot associated with each phoneme. And the blocks of features are associated with the timing slot. And so it's possible, for example, for a given phoneme to be associated with two timing slots.

Like for example, when you say, oh, I don't know, say, coolly or something like this, you know, cool plus ly. Or things like that, OK? Where a given phoneme gets spread over two slots. And now syllable structure, I think, is imposed on these timing slots.

And it is represented on a plane that is orthogonal to the plane in which the feature composition is. And stress is on yet another plane, which is orthogonal. So that, if you want to have a physical picture, the knowledge that you have of a word is like a spiral bound notebook, where, you know, the registration is all in terms of these timing slots. And there are various structures of the timing slots.

And you can peek. I mean you can look from one page to the other. These pages, and so you look at something here and decide what to do about the stress and so on. And they have vast interesting, specific properties. And the syllable structure is a particular kind and I've written about this and so on but that's kind of a short answer to a very long question.
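The notebook picture, with several planes all registered against the same timing slots, can be sketched as a data structure. Everything in this sketch is an assumption made for illustration: the `Word` class, its field names, the rough phoneme spelling of coolly, and in particular the syllable division, which is not given in the talk.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    n_slots: int                          # the shared timing slots (the spiral binding)
    melody: List[Tuple[str, List[int]]]   # feature plane: phoneme -> slots it occupies
    syllables: List[List[int]]            # syllable plane: slots grouped into syllables
    stress: List[int]                     # stress plane: one value per syllable

    def phoneme_at(self, slot: int) -> str:
        """Look across planes: which phoneme is linked to this timing slot?"""
        for phoneme, slots in self.melody:
            if slot in slots:
                return phoneme
        raise IndexError(f"no phoneme linked to slot {slot}")

    def long_phonemes(self) -> List[str]:
        """Phonemes associated with more than one timing slot."""
        return [phoneme for phoneme, slots in self.melody if len(slots) > 1]


# cool + ly: a single l linked to two slots (slots are numbered from 0).
coolly = Word(
    n_slots=5,
    melody=[("k", [0]), ("u", [1]), ("l", [2, 3]), ("i", [4])],
    syllables=[[0, 1, 2], [3, 4]],  # hypothetical division, for illustration only
    stress=[1, 0],                  # stress on the first syllable
)

print(coolly.long_phonemes())
print([coolly.phoneme_at(i) for i in range(coolly.n_slots)])
```

Because every plane refers only to slot numbers, a rule about stress or syllables can consult the feature plane through the slots without the planes being merged.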

Here.

AUDIENCE: There are other experiments too. You sort argue that people represent these phoneme representations in sort of like motor control program. Not the acoustic signal isn't important, per se, sort of more try to map the acoustic signal into a motor program to recognize a phenome.

HALLE: Well I said that there's something, that there is knowledge of language use is independent of speaking. Like for example, it's conceivable that you could be, if you would have a let's say, an injury or somebody would have an injury. And that would not permit that person to speak. Now some neurological injury or simply block any speaking of the person. Would still know English perfectly well.

Just couldn't perform the actions. But the actions would still be represented by the language act. And so it's in terms of these features, and the features are related to actions, whether they can be carried out or not.

AUDIENCE: I mean, people have done experiments, where you shoot novocaine [INAUDIBLE] using a different motor sequence to produce [INAUDIBLE].

HALLE: Yes and no. Yes or no. Yes to some extent too.

HALLE: But also, they never make a T with their lips. Or they never make a B because they're abiding by manipulating the velum. I mean there is some plasticity. But the elasticity is very limited. See, I mean, you're absolutely right. I mean, there is this plasticity. But the plasticity is--

AUDIENCE: But it seems like they do those adjustments based on acoustic information. They try to produce--

HALLE: Look at there's a real reason why deaf people don't speak. You know I mean, the acoustics triggers it. I mean, so, I said, I didn't mean to say that. We don't have to listen, if you don't hear, you can't-- it never works, but there is this very close tie up between the acoustics and the articulation. And that's what I've tried to stress.

AUDIENCE: But internally you claim you think or you say that you believe words represented not so much as a sequence of features, but more as the sequence of actions.

HALLE: No. But as the features are actions. The features are actions. They represent those features. And the features are actions which have acoustic consequences. And I don't want to-- I'm not for a moment do I deny that.

PROFESSOR: Maybe we better stop here.

HALLE: All right. Thanks very much.

[APPLAUSE]