Rudolf E. Kálmán, “A New Look at Randomness as a Systems Phenomenon” - MIT EECS Colloquium


[MUSIC PLAYING]

MODERATOR: Well, I'm very pleased to introduce Professor Kálmán from the ETH in Zürich and the University of Florida in Gainesville. He, of course, needs no introduction. Amongst his many honors is the Kyoto Prize. I think you were one of the first winners of the Kyoto Prize.

KÁLMÁN: The first winner.

MODERATOR: The first winner-- I stand corrected-- of the Kyoto Prize, which, in the eyes of many, is equivalent to or perhaps higher than the Nobel Prizes. He's going to talk to us today about randomness as a systems phenomenon.

KÁLMÁN: Thank you very much. I have a very unfortunate-- do I have to wear this?

MODERATOR: Say what?

KÁLMÁN: Do I have to wear this?

MODERATOR: Yeah, you better.

KÁLMÁN: I have a habit, which is perhaps unfortunate, of being most eager to lecture on problems which are not solved in the hope that my efforts in the lecture will be rewarded by some interesting comments or interaction later, which sometimes happens. Today I'm going to talk about a problem that, in some sense, is solved. But at the same time, the solution gives rise to a new problem. So I'm doing both, talking about a solved problem and about an unsolved problem.

Well, first of all, let me say a word about the word "stochastic." Usually, in the literature, stochastics and randomness are treated as roughly the same thing. And the trouble with our current misunderstandings about a lot of things begins already with this word.

In the article in the Encyclopedia Britannica dealing with Linus Pauling, a very successful and very earnest scientist, his approach to science is characterized as a stochastic method, which is explained-- of course, this is a Greek word. It's explained in the article as meaning, "aims to divine the truth by conjecture." Very roughly speaking, I think it's fair to say that all this stuff, one can abbreviate it with the English word "guess." In other words, the method is you guess. Of course, Linus Pauling guessing or my guessing are quite different things, so I don't want to make fun of that. But nevertheless, that is the meaning.

Now, I have some Greek connections. And therefore, I wanted to find out whether this interpretation meets with their approval. And here is a-- and generally, they say it's not true. But I have a-- let me adjust. I have a letter from somebody who's certified to be Greek and says, "which arrives at conclusions by guessing." So that seems to be the origin of the word "stochastics."

Now, for a very long time, people have voiced various kinds of reservations about the way probabilistic modeling or probabilistic reasoning is used in regard to explaining phenomena in the real world. For example, a guy named Einstein said-- and this is terribly well known, although never recorded by Einstein himself. He said, "God does not play dice."

Some people would interpret this as Einstein's denying that there is randomness in nature. But I think that's a very crude way of interpreting a statement. There are lots of other ways of looking at it. So let's just take it literally. God does not play dice. And I'll come back to it.

[LAUGHTER]

Second, Kolmogorov was supposed to have said that, "There is something wrong with statistics." Unfortunately, I don't have a written reference to the statement, but I did put it in one of my own papers, so I can reference my own paper for this statement.

[LAUGHTER]

And of course, the question is, what is wrong with statistics? And if you're interested in the answer to that question, there's a paper to be written next week which will provide an answer, much of which I will also cover myself.

And then finally, to come back to the more classical side of the subject, perhaps the greatest expert on probability theory was Bruno de Finetti. He had written a big treatise on probability theory. And I gave you here a part of the preface, which I think is fairly explicit. "Probability does not exist"-- in the real world, I think one might add.

Now, one more point that should be made is that the issue that is discussed here-- namely, can I arrive at the truth by guessing-- has a long history which is associated prominently with the name of Newton, who said, in 1713, that "Hypotheses non fingo." This is a kind of Latin that cannot be translated completely accurately in polite company. The expression "fingo" is not quite polite. But it has an obvious meaning. It says, I do not dream up hypotheses to explain the facts. I let the facts explain themselves.

So my main point in this lecture will be to simply say that the question of the relationship between stochastic models and the real world is something that cannot be handled by guessing. This relationship cannot be handled by guessing from a general point of view, but it was handled that way for most of the time that people have attempted to look at this question, in spite of authorities like Newton, who would have been very unhappy if he had really paid attention to what Bernoulli was doing right around this time.

But Bernoulli, I remind you, was mainly interested in things like card games. That was the main application of probability theory at that time. And my objections do not apply to card games because those are constituted in such a fashion that the axioms of probability are automatically satisfied by putting yourself in a certain well-defined, idealized situation.

But I am interested in the relationship between stochastics and the real world, and this relationship has not been adequately explored aside from statements like what you see there. And I would furthermore add that this is a researchable question. It's not a question that should be debated purely on religious grounds, which is how it is normally debated today.

Now, there has been quite a bit of discussion about these questions in the past. And one of the slogans which brings out the objections to the usual procedure is called SSP, an abbreviation used by DeLeo, which means Standard Statistical Prejudice. Some people are very unhappy about the word "prejudice," which incidentally was already used by Newton in the same context as "Hypotheses non fingo."

If you don't like this, just replace it by the words "inappropriate assumptions." So you could say that Einstein didn't mean to say that it's a prejudice that God plays dice. He could have meant it's an inappropriate assumption, in looking at the physical world, that God plays dice.

So there is a standard statistical prejudice, which is what? We take a so-called population with a fixed invariant probability distribution. We assume that this object exists. And then usually, you also add to it the notion of independent sampling.

And my point is that none of these notions are things that can be really tested or researched in regard to the real world. These are all on the level of assumptions. They are a very heavy use of hypotheses fingere in the sense that Newton does not like.

And the situation is so bad that, in my personal opinion, which I will only very briefly sketch here, it is not possible to proceed along these lines further than has been done already, and you have to take a drastically different point of view. The drastically different point of view means that you will study mainly the notion of randomness, and much less the notion of probability. Probability will be treated as an induced thing, and it's quite difficult to determine in the natural world.

You can measure things like cross-sections of nuclear reactions relatively easily, but nobody in the physical world today has machines that really measure things like probabilities. In other words, it's the kind of notion that's not suitable, in many ways, for a direct definition or direct study.

So I want to introduce a new definition of randomness, and I will say almost nothing about probability. The definition of randomness will be the following: whatever is not determined or, if you prefer, constrained by classical rules, by which I mean, of course, macroscopic, mathematically precise rules.

So whenever you're in that situation-- it could be that everything is uniquely determined, but if things are not uniquely determined, then whatever is there has to be regarded as random, for lack of any better word. But this randomness, I claim, describes very quickly and directly a lot of the obviously random phenomena that you see in the real world in a way that conventional probability theory never even came close to describing.

Let me give a few simple examples. And then at the end, I will report an investigation that's directly related to econometrics, which is not only successful, but it forces me to make this kind of a definition. I could not use any other definition on the basis of the facts that I have available.

So an example: suppose we expand the square root of 2 into a decimal fraction, 1.414213562..., which continues a little longer than this. Now, in a certain sense, it's not random, because the number 2 occurs twice, 4 occurs twice, 1 occurs twice, and 7, for example, doesn't occur at all, and so forth.

Nevertheless, according to my definition, I would have to call this random. Why? Because the only thing I can say, on the basis of classical knowledge, about such a sequence is that it's ultimately periodic. That would be the classical knowledge. Periodic, of course, is highly regular, and it's certainly not random.

If that were the case, this is a rational number. But of course, already the Greeks knew that this was not a rational number. And therefore, this is random. So we take a point of view that's quite compatible with the Greeks, namely that whatever is not rational-- that is to say, given in this kind of a way-- has to be designated as random.

This leaves open the question of trying to prove, if you're really interested, what is the distribution of digits in this expansion? And there's a little bit of literature available for that. But to the best of my knowledge, nobody has ever, in a mathematically useful way, been able to define a joint probability distribution for all the possible n-tuples of digits in this sequence. That sort of activity is not interesting from the present point of view because it is not information that is relevant to the basic question.

But it seems to be perfectly obvious that this should be a random sequence also in the sense of approximately equal occurrence of these digits, and that seems to be an empirical fact. So that's how far we want to go with such a definition.
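
A minimal sketch of that digit-frequency check, assuming an arbitrary precision of 10,000 digits:

```python
# Count digit frequencies in the decimal expansion of sqrt(2).
from collections import Counter
from decimal import Decimal, getcontext

getcontext().prec = 10_000                        # significant digits to compute
digits = str(Decimal(2).sqrt()).replace(".", "")  # "14142135623..." as one string

counts = Counter(digits)
for d in sorted(counts):
    print(d, counts[d] / len(digits))             # each frequency should be near 0.10
```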

My second example, of which I will just mention the name, is [INAUDIBLE], who is credited with a famous theorem in topology, is now well over 80 years old, and published a number of papers between 1980 and 1990, including a book, I think, from about '89 or '90 on the treatment of experimental data, roughly speaking. He sets up certain examples, which I have published and talked about in other places, and attempts to analyze these examples in such a fashion that he runs into the problem of defining what is random and not random, and makes a horrible mistake.

This is totally wrong. I can't explain exactly why it is totally wrong, because that would take a lot of time, but it will be related to the discussion toward the end of the lecture. And the difficulty is simply that he uses an approach to the subject which tries to dodge the notion of randomness in the sense that I'm talking about it here.

Another example is leaves on a tree. If you look at any reasonable tree of a given specific type, then the distribution of leaves on this tree has a roughly random appearance, but of course, not totally random. There's a certain regularity as well.

Now, from the point of view of conventional probability theory, it's completely crazy to imagine that you would describe the joint probability distribution of all these leaves. It just doesn't make any sense. It would be a tremendous job. When you're finished with it, it would not be interesting, and so forth.

But there's a theory for this, called Lindenmayer systems. And the Lindenmayer systems are an immediate explanation of randomness in this sense. Namely, these are like finite automata, but with a not necessarily deterministic transition function.

In other words, as this automaton evolves, similar to the growth of a tree, occasionally points occur where any of two or more decisions is possible. And it is this non-uniqueness of the decisions that leads to the observed randomness of the trees, which can be simulated by computer graphics experiments. And you have amazing agreement with the real world without, of course, going too deep into questions of this sort.
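
A minimal sketch of such a non-deterministic rewriting system; the particular rules below are assumed for illustration, not taken from Lindenmayer or from the lecture:

```python
import random

# Each symbol may rewrite to one of several strings; which branch grows is not
# determined by the rules themselves -- that is the "randomness" in this sense.
RULES = {"F": ["F[+F]F[-F]F", "F[+F]F", "F[-F]F"]}

def grow(axiom: str, steps: int) -> str:
    for _ in range(steps):
        axiom = "".join(random.choice(RULES.get(c, [c])) for c in axiom)
    return axiom

print(grow("F", 3))   # repeated runs give different, tree-like branching strings
```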

So the essential point of my definition of randomness is that it looks at the world as possibly determined by classical rules, uniquely, or possibly non-uniquely determined by these classical rules. Well, it may very well happen that on a microscopic level, there are some additional rules and operations, so we don't claim that such things are necessarily random in the sense of not being deterministic. But the essential issue is that it's a fact that in the real world, not everything is determined uniquely by overall classical rules. Why that is so is something I'm not going to speculate about, in accordance with Newton's statement.

Let me give perhaps one more example which illustrates how I want to use these words. Suppose you take a cube and throw it. How do I apply this idea? Well, a cube cannot come to equilibrium on a vertex or an edge by some sort of general notion of stability. But it could come to equilibrium on a face, of which there are exactly six.

So as far as macroscopic rules of mechanics are concerned, these rules simply say that a cube will end up on one of the six faces when you throw it. And it says nothing about which of the six faces it will come to rest on. Then it's a reasonable argument that because of symmetry, the probability of each of six faces should be the same.

But that's a kind of discussion that is not necessary here. I don't want to go into it. And in a more complicated situation, the determination of such probabilities from more direct physical principles is a complicated matter.

But I'd like to mention that apparently, there's some work going on on the following question. You throw the die. At some point, the path is deterministic because, according to classical physics, everything is exactly determined by the conditions here. And then at some point, this becomes indeterministic.

And the physicists seem to be amused by the question, where does this thing become indeterministic? I think that's complete nonsense. And to avoid such complete nonsense, we have to use my definition of randomness. There is no such thing as a transition from deterministic to indeterministic and all that.

Now I'd like to talk about a specific example that came up in econometrics but which has a general significance. And then I will show some experimental results. So there is a connection with the real world. The main example is the following.

Suppose I'm given a cloud of points, x_t in R^n, where the number of points may be small or large but, in any case, will be thought of as bigger than n-- not necessarily much bigger. And I ask the question, do these points satisfy a linear relation, which can be expressed in the form A'x_t = 0, the matrix A' denoting a collection of rows, each row being one particular linear relation. And for various purposes, it's technically necessary to assume that the rank of A' is maximum, which means that it will be equal to q when I take q linear relations. So I consider a q by n matrix.

The question is, is there such a linear relation in the data? It's a trivial exercise in pure mathematics to prove that if that relation is satisfied exactly by the data-- in which case I call the data exact-- A' is unique. So a major fact in this situation is that A' is unique. You satisfy Newton's dictum, and nothing is assumed. It's the data, once this is satisfied, that automatically gives you A'-- unique in a mathematically sophisticated sense, not in a stupid sense. The individual numbers are not necessarily unique, only the space spanned by the rows, but that's not important.
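
A minimal numerical sketch of this uniqueness, under the assumption of exact data with made-up dimensions n = 4, q = 2: the row space of A' can be read off as the null space of the data matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n, q, T = 4, 2, 50
A_true = rng.standard_normal((q, n))              # q independent linear relations
null_basis = np.linalg.svd(A_true)[2][q:]         # orthonormal basis of {x : A'x = 0}
X = rng.standard_normal((T, n - q)) @ null_basis  # exact data, one point per row

# The relations are recovered as the right singular vectors of X belonging to
# (numerically) zero singular values; only their span is unique, not the numbers.
A_recovered = np.linalg.svd(X)[2][n - q:]
print(np.allclose(A_recovered @ X.T, 0, atol=1e-10))   # True: the relations hold
```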

Now the issue is, in the real world, you necessarily have a situation that can be thought of as x_t = x̂_t + x̃_t-- namely, the actual data is thought of as consisting of an exact part and a noise part. That's a totally conventional assumption, so-called additive noise. But now, at this point, there is a serious question of how to proceed.

According to the standard statistical prejudice, x̃_t will be defined as a random vector, with all that the assumption implies. In particular, the customary definition is something like this: x̃_{1t} is Gaussian with zero mean and fixed variance, and x̃_{kt} is 0 for all k not equal to 1. That's a typical example of the application of the standard statistical prejudice. That is to say, I take into account a theoretical object, which is a Gaussian random vector with certain specific assumptions, and then I analyze the problem on this basis.
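
A minimal sketch of data prepared under this prejudice, with arbitrary sizes and noise level assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 100, 3
x_exact = rng.standard_normal((T, n))         # stands in for the exact part x-hat
x_noise = np.zeros((T, n))
x_noise[:, 0] = 0.1 * rng.standard_normal(T)  # variable 1: Gaussian, zero mean, fixed variance
x = x_exact + x_noise                         # all other noise components identically zero
# (In the lecture's setting, x_exact would in addition satisfy the linear relations A'x = 0.)
```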

Now, the reason why this standard statistical prejudice is not acceptable is somewhat subtle. It could be acceptable in the case I guessed right. It is possible, in the real world, to imagine that the data has actually been prepared in such a fashion that this assumption is true. For example, you can run a simulation experiment in which you simply put this assumption into the data. So if it's true, then the assumption is harmless.

But the question is, how can I check whether it's true or not? And second, what do I do when the assumption is wrong? So in practical terms, the trouble is simply this, that few people have the imagination-- and that's almost a theorem-- to be able to test all such prejudices at the same time, which would eliminate or weaken the word "prejudice" considerably. So as long as I make one assumption, I'm certainly in a prejudiced situation. If I look at all possible assumptions, I'm in a less prejudiced situation.

So in the example to be discussed, it turns out that I could make an assumption, provided I'm very careful not to fix the prejudice at the beginning, but only toward the end of the investigation. But then, it turns out that the assumption will not buy me very much. It may be correct, but it has no serious explanatory power.

So a new approach has to be found, and the new approach essentially says that to split off noise in such a way that a linear relation will be exactly satisfied by the exact part of the data, I have to be very careful about how I produce this particular noise. So I have to check various possibilities. It's not known, even in this simple case, what the optimal or most reliable procedure might be. But there are some practical suggestions.

So to get to that point, let me introduce the definition that sigma is the covariance matrix of x and, similarly, sigma hat and sigma tilde for the exact part and the noise. Then I could look at the following situation to avoid explicitly making the standard statistical prejudice. I could say that, one, sigma hat has a given corank, q, which is due to the fact that the equation A'x̂_t = 0 is equivalent to the equation A'(sigma hat) = 0. And the corank of sigma hat will turn out to be the rank of the matrix A'. So this assumption, in regard to looking at linear relations, is precisely equivalent to what I have written over there.

The second assumption would be that I look at the situation in low noise-- namely, I assume that the norm of sigma tilde is less than, say, epsilon. If I do not make any assumption about the noise-- for example, I could declare, in this situation, the whole data to be noise and the exact part 0-- then, of course, I don't get anywhere. Therefore, it's in our interest to see whether or not some other assumption on the noise-- namely, a small amount of noise-- has some explanatory power for a given set of data or not.
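
A minimal sketch of how these two assumptions might be checked numerically, with a made-up data set and an ad hoc threshold: the corank q shows up as the number of small eigenvalues of the data covariance.

```python
import numpy as np

rng = np.random.default_rng(1)
n, q, T, eps = 4, 2, 200, 0.1
subspace = np.linalg.svd(rng.standard_normal((q, n)))[2][q:]   # the exact part lives here
X = rng.standard_normal((T, n - q)) @ subspace + eps * rng.standard_normal((T, n))

sigma = np.cov(X, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(sigma))
q_estimate = int(np.sum(eigvals < 10 * eps**2))   # "small" relative to the noise level
print(np.round(eigvals, 4), q_estimate)           # two eigenvalues of order eps^2; q_estimate == 2
```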

Now, this question was first considered, to the best of my knowledge, in the literature in a famous book by Ragnar Frisch published in 1934, which most people have not understood. In fact, I guarantee that nobody ever understood it, including Frisch. Apparently, he didn't even write it himself. Professor Sanderson told me a couple of days ago that it was probably written by Frisch's student, Haavelmo, who got the Nobel Prize in 1989. And I'm quite sure-- quite sure from direct contact-- that Haavelmo certainly hasn't understood it. So we have a serious problem, but it's not too important for the present lecture.

So Frisch wanted to check something related to this situation here, and he produced some simulated data which, unfortunately, is not published in the book-- only the corresponding covariance matrix, sigma. But we know that the simulated data had low noise-- namely, x̃ was approximately 10% of x, which is relatively low noise.

He then attempted to have a numerical procedure for getting back the assumptions he put in the data-- for example, determining q-- and verifying that the noise was low. For reasons that I am unable to fully understand, Frisch did not succeed fully in this experiment, although he tried it. And therefore, the experiment has to be redone, and I'll show you the main result on that.

I should add one more thing. Frisch had a method of investigating this problem, and that's the following: plot, let's say, the rows of sigma inverse. This prescription can be, to some extent, justified as follows. Each row of sigma inverse, written as a'_i, satisfies the relation here for a suitable definition of x̃. And these things are known as the elementary regressions in the so-called least-squares sense, which is too technical to explain here, but it means what you think it does, meaning that x̃_{it} is not identically 0.

So an elementary regression is a way of looking at the data in the light of the standard statistical prejudice, which makes an assumption like this, except that you vary which variable is allowed to be noisy; the other variables are always not noisy. That's the meaning of plotting the rows of sigma inverse. But I should add: plot the normalized rows-- namely, one coefficient is set to 1 so that a'_i will be written in the format (1, alpha, beta), or something like that.
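
A minimal sketch of this prescription, with synthetic stand-in data; the function name and the normalization convention are assumptions of this illustration, not Frisch's own computation:

```python
import numpy as np

def elementary_regressions(X: np.ndarray) -> np.ndarray:
    """Rows of inv(cov(X)), each scaled so that its own diagonal coefficient is 1."""
    sigma_inv = np.linalg.inv(np.cov(X, rowvar=False))
    return sigma_inv / np.diag(sigma_inv)[:, None]

# Row i is the least-squares relation obtained when only variable i is allowed
# to be noisy; plotting the remaining coefficients of each normalized row gives
# the kind of picture discussed next.
rng = np.random.default_rng(2)
print(elementary_regressions(rng.standard_normal((50, 3))))
```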

In the situation that Frisch was concerned with, we had n equals 4 and q equals 2. These assumptions, plus the low noise, are put into the data. Now if you analyze this by a modern procedure which I cannot, at this point, explain in detail, you obtain the following plot. This is real computation obtained only about a couple of weeks ago because I checked to make sure that we didn't make some mistake in using the Frisch data.

Now, what's interesting in this plot is that all four points are on the same straight line. These are the four rows of sigma inverse. Now, if you have four points on a straight line, it surely has some major mathematical significance. So, to understand this-- Frisch did not plot this thing, which was a terrible mistake. And in fact, he did not really understand the whole thing, but he was very close to understanding it.

So from the practical point of view, what you have to do is this. Suppose I see this. Never mind why. I'll explain that later. What is the explanation that all points lie in a straight line?

Well, the explanation is straightforward. We have a theorem. The theorem says that under these circumstances, when q is equal to 2, all points will lie on a line with a certain band around it, the thickness of the band being plus/minus epsilon. The points will lie within this band, and so forth, if and only if the assumptions I stated there are met-- in other words, if and only if we have this situation with q equal to 2.

Then if you perform the plot of the rows of sigma inverse, they must lie on a straight line. And you can measure the amount of noise-- the epsilon-- relative to the rows of sigma inverse by measuring how far these points are from a straight line.

Now, in the plot that you see, there is no cheating. If you look at it very carefully, these things are not exactly on the line. But they're so close that within the normal accuracy of such a display, you can't tell any difference. But they're not exactly on the line. By measuring how far away from the line you are, you get an estimate for the epsilon.
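
A minimal sketch of that measurement, with made-up sample points: fit a line to the plotted points in the total-least-squares sense and take the largest deviation as an estimate of epsilon.

```python
import numpy as np

def distance_from_best_line(points: np.ndarray) -> float:
    """Largest distance of 2-D points from their best-fit line through the centroid."""
    centered = points - points.mean(axis=0)
    normal = np.linalg.svd(centered, full_matrices=False)[2][-1]  # direction of least variance
    return float(np.max(np.abs(centered @ normal)))

pts = np.array([[0.0, 1.0], [1.0, 2.02], [2.0, 2.97], [3.0, 4.01]])
print(distance_from_best_line(pts))   # small value: nearly collinear points, i.e. low noise
```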

So this is the solution of the simulation problem of Frisch-- namely, if I do what Frisch says, then I get an indication of the situation in this data which is strongly characterized by the number q, and also by this epsilon. If epsilon is large, forget about it. If your data is very noisy, it's not worth taking. Throw it away.

Statisticians have traditionally made a terrible mistake in trying to explore the unknown by concentrating on the noise. Scientists, on the other hand, regard noise with a certain amount of contempt, and their interest is getting things like this straight line. The straight line is really a system that's in the data and which you have identified, and you see there's no fuzziness whatever about the straight line, at least within the accuracy of the picture.

So this is a situation concerning Frisch's book, which is now available. However, historically, this was not how things went. Historically, I became interested for various reasons in data published in the book of [INAUDIBLE], a French econometrics textbook, which used data from the French economy, as shown in these diagrams, and which you can see perhaps a little better when it's plotted against time, as here. So in terms of time behavior, you see that these three variables are roughly moving together, and the last variable is small and essentially random.

Now, under these circumstances, you would ask, what can we say for sure about the apparently random behavior of these three variables, forgetting the last variable, which is obviously noise? I'll come back later to the question of what to do when one variable is suspected to be noise, how to verify that fact.

Now, one thing you could do is take the whole data from 1949 to 1966, compute the covariance matrices for that, and, by some theory, try to decide what q is. So we have the fact that from 1949 to 1966, by various ad hoc methods, you can verify that q is equal to 2. And everything would be simple.

But this is not a completely satisfactory statement because obviously, this data may have a time gradient component in it, and it is desirable to investigate whether the behavior of the system-- namely, the French economy-- is significantly different, for example, over this period of time or over this period of time. In his book, [INAUDIBLE] claims that the common market brought about a big change from '60 to '66, and one might be interested in checking whether the claim is right or not.

Now furthermore, even in the standard statistical prejudice, it is a fact that if you subdivide the data, the subsets of the data must satisfy exactly the same assumptions as the whole data. So for example, if I assume this for all t, then for subsets of t, this must also hold. And that's a simple mathematical feature of all such things.

Therefore, it would be interesting to see what happens if you look at the data in a segmented way. And now we will see what happens if you have 5-year windows in the data. This situation was first investigated on July 20, 1983. And I got the results back at approximately 11:58 AM, a point in time which rather accurately marks the death of classical econometrics.

[LAUGHTER]
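
A minimal sketch of the windowed computation described above; the French data is not reproduced here, so synthetic numbers stand in, and the window width of 5 follows the lecture:

```python
import numpy as np

def window_regressions(X: np.ndarray, width: int = 5):
    """Normalized rows of inv(cov) over each sliding window of `width` years."""
    for start in range(X.shape[0] - width + 1):
        sigma_inv = np.linalg.inv(np.cov(X[start:start + width], rowvar=False))
        yield start, sigma_inv / np.diag(sigma_inv)[:, None]

rng = np.random.default_rng(3)
X = rng.standard_normal((18, 3))      # 18 "years" of 3 variables gives 14 windows of 5
for start, rows in window_regressions(X):
    print(start, np.round(rows, 2))   # do the plotted points stay on one common line?
```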

Now, the question is, exactly why is this so? And I have to make one more explanation before showing you the slides. Econometric theory, as it existed in 1983, was not yet freed from the prejudice that I have discussed. There's also the so-called Frisch prejudice.

The Frisch prejudice says that sigma tilde is a diagonal matrix. It can be shown that under certain circumstances, you can arrange the noise so that the data covariance of the noise-- these are all data covariances, not population. There's no population discussed at all. But the data covariance of the noise is diagonal.

Then, when n is equal to 3, there are two situations. Either the three points that we're plotting come out together, like this, and then q is equal to 1. Or they are like this, in three different quadrants, and then q is equal to 2. So we wanted to know whether the statement that the data has q equal to 1 or q equal to 2, which is the critical question of the nature of the data, is verified by looking at the windows.

So let me begin showing you various windows. We're plotting three rows of the inverse of a 3 by 3 covariance matrix, and the first window, from 1949 to '54, I think, looks like that. That's rather interesting. You see a straight line between the three points. Actually, what you draw by computer is a triangle connecting the three points, but the result is a straight line.

And of course, now we know that this has something to do with the Frisch simulation experiment. But at the time that this was done, that conclusion was not yet clear. If we take the second window, we have a similar situation, but the points are quite different. For the third window, the points happen to coincide. Notice that they're always in the second quadrant.

For the next window, the points now correspond to q equals 2 in the Frisch sense. So far, they were q equals 1 in the Frisch sense, and they're all over. For the next window, the points again coincide. For the next window, they coincide about where they are, but now they're in the other quadrant, in the fourth quadrant.

Once more, they coincide like that. And then something happens. The thing explodes, and there's a point off this particular scale. But again, they're in the third quadrant. Remember that the difference between any two consecutive graphs is only one year. One year is taken out, and one year is put into the data. Four years are in common.

The next year, suddenly, these three points come together again. Then again, we have Frisch q equals 2, like that. Then we have Frisch q equals 2 again. Then we have a few more. Now Frisch q is again 1, and we're back in the second quadrant. q equals 2. And finally, q equals 1.

From my personal psychological-emotional point of view, these graphs represent a discovery roughly on a par with what I had heard about, for example, the discovery of radioactivity. These graphs are so unbelievably different from what you expect that it's a serious task to find an explanation for them.

So let me summarize the property of the graphs. q Frisch can be equal to 1 or 2, and the graphs are not stable with respect to this property. This depends on what years we put together in some particular way. They're not stable.

And the quadrant containing the points is also not stable. If the Frisch prejudice were correct, then a practical statistical investigation of this would have to give the response that q is either always 1 or always 2. And if q is always 1, the points would always be in the same quadrant. The graphs that we have seen completely contradict this.

On the other hand, the line in the graph containing all three points is completely stable. There's not a single sample in the 14 windows in which the three points are not on the line. So the property that the line contains the three points is stable. The line containing the three points is definitely stable.

But much more is true, which is why I have the transparencies. The lines coincide. It's always the same line. Let's take a look, for example, at the last window-- we have the last window here. And I go back until I have these coincident points, for example.

I have coincident points on window number seven, and this is window number 14. If I superimpose window seven on 14, which have no points in common-- they're disjoint windows-- it's a little hard to do this very accurately, but you can see that, roughly to the accuracy I can do it, these coincident points lie all together. In fact, they're on top of one of the points in window 14. So if I lift this up, you'll see there's a point here. And on top of that are the coincident points of window seven.

Now, that's such an enormous coincidence that it cannot be a random effect. So lines coincide, and these two things are non-random effects. Because in this style of sloppy conventional statistics, it is an unbelievably small probability that for no reason at all, these points coincide.

We could make one more demonstration of that. For example, I take window five and compare it with, say, window one, which is the long line. Here's window one. And in window one, the line doesn't go into the next quadrant, but you can still see that the points in window five, which is disjoint from window one, lie on the continuation of this line almost exactly.

So these are clearly nonrandom effects. But in accordance with my discussion, there's also a random effect-- namely, the position of the points on the line, even if the line is not drawn in a particular graph. Where the points fall on the line is random in my sense. It's not constrained.

In fact, this is exactly what the theorem here says. The theorem claims that under the conditions stated over there, the points must always lie within a narrow band of this line. The narrower the band, the clearer the assumption of low noise. But nothing is said about where they lie. So the actual position is left to be random because the theorem, which gives a precise mathematical statement, says nothing about it.

A more mathematical way of saying it is this. Suppose a line, l, is generated by two points. This statement is a rule which says that I must somehow get two points which are on a fixed line-- a fixed l-- and which generate it. So the line is fixed.

But the statement that two points generate that line says nothing about where these points are, except that they cannot effectively coincide because that would be a violation of the statement generate. But this is not an interesting difficulty. They can be essentially anywhere on the line as long as they're not exactly coincident.

Therefore, according to my definition, where they are is random. Therefore, we see that the standard statistical prejudice-- which, incidentally, gives you one point, and this one point wanders around like crazy because of this fact, because one point is random-- is dead wrong in this case. But the correct explanation of the data, with q equals 2, is very stable with respect to all the windows-- so, all the data.

And at the same time, it explains why nothing intelligent can be said-- nothing further can be said-- about where these points are. They happen to be the effect of this part of the matrix when I write the expression (sigma hat plus sigma tilde) inverse. And by explicitly analyzing the algebra by which this calculation is made, you can easily see that it is these parts of the matrix that cause these points to move up and down.

I just want to make one comment as to why this is deadly for classical econometrics. In classical econometrics, I proceed by making one such assumption. And in this case, it doesn't matter which point I look at. Every one of the points is randomly varying over the windows.

If I make this assumption, then I have the statement that there's one linear relation between the variables, for example, due to this point here. And then it causes big trouble to econometricians if this point wanders over here, which we have seen occurs back and forth throughout this data. Because here, even the sign of a coefficient changes into the opposite, which causes serious problems with economic interpretation.

Well, the reason why this happens and the reason why this standard statistical prejudice with one point is totally false is that a single point on a line is never identifiable because what is identifiable is the line itself. The line can be generated by two points, but the generators cannot be identified. Therefore, a single point has no significance.

So to conclude, let me state one theorem which follows from this setup along purely mathematical lines but which has terrible implications in practice, and which is the reason why classical econometrics died once you saw these things. And the theorem says: if you don't know what you are doing-- which technically means you use the wrong prejudice, the wrong guess-- then all your computations are noise. No information at all. Thank you very much.

[APPLAUSE]