Leslie Kaelbling: 6.01 Lecture 09

Search transcript...

PROFESSOR KAELBLING: OK. Well, this is sparse attendance, but it's going to be the most fun lecture. So that's what you get rewarded by. So we're going to talk about probability.

Let me just remind you that there's an exam tomorrow night. That could be related to the sparsity. 7:30 to 3:30, 26-100 or 10-250, depending on your last name-- oops, last names N through Z. There's been lots of talk on Piazza. If you haven't gone looking there, it's probably worth looking. So there's that.

OK, so what we're working on in this last section of the class, if you remember last time we had a lecture two weeks ago, we talked about the idea that we're going to build systems that have models of how the world works, not just in the engineer's head, but actually in the head of the system itself. The robot, when it's thinking about where it might be in the hallway, right, as we did last week in lab-- it uses a model of how its sensors work and how its wheels work to move it from state to state in order to aggregate the observations it gets over time and make an estimate of where it thinks it might be in the world. So that's still what we're up to.

And what we're going to do today is really think in a lot more detail about state estimation and what it means, and that sort of thing. So first, what I want to do is start with some stuff to just kind of get us a little happier with the idea of subjective probability. So I have in my hand here a LEGO piece. It's got you could say bumps on one side, and holes on the other. Everybody knows about LEGO. So we've got bumps and we've got holes.

So that's sort of like heads versus tails, right? Bumps versus holes. I could flip a coin, but you guys all have some built in biases about how coins work. I don't know what you think about LEGOs.

So if I were to flip this LEGO-- imagine that we wanted to do a betting game involving flipping a LEGO piece. And I say, OK, I'm going to flip this LEGO and I'll pay you $1.00. Actually, a little later, we'll do a higher stakes game. I'm not offering money on this LEGO game right now, but I suppose I could.

Imagine if I said I would pay $1.00 if it comes up bumps, OK? That's a proposition I could make to you. How much would you pay for the opportunity to play that game with me? I'm offering you $1.00 if it comes up bumps. What would you pay?

STUDENT: [INAUDIBLE].

PROFESSOR KAELBLING: $0.50. OK, there's a lot of $0.50. Is there any dissent?

STUDENT: As little as I can get away with.

PROFESSOR KAELBLING: As little as you could get away with. OK, there we go. OK, we have an entrepreneur. Good. Anybody else?

OK, so somebody who said $0.50, say why.

STUDENT: [INAUDIBLE].

PROFESSOR KAELBLING: Right. But why 50%? Why do you say that? Do you think that-- I mean, have you ever flipped one of these, or flipped it 20 times, or 100 times, or 1,000 times? Do you think that if I flipped it 1,000 times, I'd get 500 bumps? Do you think if I flipped it a bazillion times, I'd get a bazillion over 2 bumps, roughly? Yeah?

So why are you so convinced that this is a 50-50 device? Maybe the bumps are heavier than the holes. Might not be, right? It might be that this thing is actually fundamentally imbalanced. In fact, I'm guessing it probably is.

In fact, I was guessing you probably wouldn't say 50%. I never know how these conversations are going to go. OK, good.

So at least it seems to me that there's a pretty good chance-- oh, that's a risky thing. It seems likely that this thing is probably actually not, in the long term-- it's probably not 50% completely evenly likely to land on one side versus the other. It's probably not.

I would say that it might still be rational to make that $0.50 bet. And the reason could be that even in the situation where you don't know the long term payoff of this thing-- you don't know the long term likelihood that it's going to end up bumps versus holes-- you just used your 50-50 to say, meh, I don't really know how it's going to come out. Probably.

So when you said 50%, you were probably basing it on your own lack of knowledge about this gizmo more than you were basing it on some statistical understanding of how this thing was going to work out, or your physics knowledge and aerodynamics and all that, or whatever it was. So the idea that you could say 50%, not because you have statistical evidence that the thing is going to come up bumps 50% of the time, but because you don't really have any reason to prefer one hypothesis over the other-- that's this idea of subjective probability, the idea of using probability to represent your own personal lack of information about what's going on in the world.

So when we talk about the robot having beliefs about where it is in the world, it's that same idea. We're talking about the robot having some personal uncertainty about where it is in the world. We're not thinking about long run frequencies or anything in particular. We're just thinking about the robot having personal uncertainty. So maybe the robot's not a person, but robotic uncertainty.

OK, so that was a little exercise. Now I'm going to do a more interesting one, and it'll give us some practice doing Bayes' rule stuff. So actually, before I do this in complete detail, I need two volunteers. So I want to have collusion. Two volunteers that are sitting near each other.

OK, so you two. You can do this job. So I have two envelopes. One is labeled raw material, and in the raw material envelope, there's red and blue LEGOs, a bunch of them. And in this envelope, it says Official 601 betting apparatus.

So what I want you two to do-- wait, it was you and you. I don't know how you're going to do this. Maybe it's going to be tricky. I don't know. You can work together.

I want you to put four LEGOs out of this envelope into this one, and don't tell me which four. You can collude, you can talk, you can decide what you're going to do. So they're going to put four LEGOs into that envelope. I have to wait for them to do it, so the pressure's on, too.

We can sing the Jeopardy! song. (HUMMING THE JEOPARDY SONG) Did you know that my 15-year-old had the idea of playing the Jeopardy! song faster and at increasing pitch at the end of the nano quizzes? But I told her that would be cruel. That's cruel, right?

Then Adam had the idea of some horrible music from Mario drowning or something. I don't know, maybe you guys know the Mario drowning song.

OK, are you done? OK, good. I need them back. I won't look. OK, I'll take the leftovers, too.

OK, so now we have the official 601 betting apparatus. Good. Now here's the bet. The bet is the following.

So there's four LEGO bricks in here, and there's some number of blue ones and some number of red ones. I don't know what's in here. You don't either, except for these two people who are not allowed to say anything more today.

OK, so I have here a bearer bond. It says "pay to the bearer $20 if a red LEGO is drawn from the bag." So at some point near the end of this lecture, we're going to have an event where we pick one LEGO out of that bag, and whoever has the bond, if the LEGO comes out red, gets $20. I even have $20.

OK, so how much would you pay for this bond? Would you pay $1.00?

STUDENT: [INAUDIBLE].

PROFESSOR KAELBLING: Yeah. Would you pay $19?

STUDENT: No.

PROFESSOR KAELBLING: No? OK. $5?

STUDENT: [INAUDIBLE].

PROFESSOR KAELBLING: In the back?

STUDENT: Is it possible that there are no red bricks?

PROFESSOR KAELBLING: It's entirely possible that there are no red bricks in here. Or they could all be red. Right?

So this is not like the coin that you think is 50/50 anymore, because there's a real chance that those guys-- who knows what they did? Maybe they put all red ones in. Maybe they put all blue ones in. Maybe it is 50/50. I don't really know.

OK, how much would you pay? Would somebody pay $5?

STUDENT: [INAUDIBLE].

PROFESSOR KAELBLING: Would somebody pay $10? Yeah, some people would pay $10. Would anybody pay $12? Sure? $12?

Because I feel bad taking money off students, I'm going to sell this to Denny. Denny's already told me he'll pay me $10. So we're going to do this. We'll see who wins at the end of the thing.

STUDENT: I don't know about this.

PROFESSOR KAELBLING: Oh, yeah. Excellent. OK, this is a $10. Oh, now I have $30. That's even better. All right, here.

STUDENT: [INAUDIBLE].

PROFESSOR KAELBLING: You can hold the $20. OK, good. So Denny has a sheet of paper, I have $10, we have someone holding the $20.

So let's think about how much this is worth just the way it stands, OK? So what could be in the bag? Well, there could be four blue, three blue and one red, two blue and two red, et cetera, right? So those are the possibilities.

Now, if we were to try to think sort of probabilistically about this, we have to think, well what's the distribution over those possible arrangements of things? And again, for the lack of a better idea, unless we have particular insight into the psychology of our people who picked those LEGOs out, I don't really know how to think about that. So I'm just kind of going to assume that these are equally likely.

So if those are equally likely, we can think about sort of the expected value of this bet. We can say, well, there's a set of states, right? The set of states is the number of red LEGOs in the bag. And each one of those states, we think, well, maybe they're equally likely.

But now, depending on the number of red LEGOs in the bag, the bag is worth more or less to us in expectation, right? Because if there's no red LEGOs in the bag, we know for sure we're getting zero. If there's all red LEGOs in the bag, we know for sure we're getting $20. If there's an intermediate number, like if there's only one red LEGO in the bag, then there's a one in four chance that we're going to win, so one in four chance of getting $20, and a 3/4 chance of getting $0 is $5. I'm not really going to require you to understand expected value, but you probably have an intuitive idea of what's going on here. So given this distribution of how likely the different states of the bag are, and how much we would expect to get if, in fact, that was the makeup of the bag, we find that the expected value-- the average payoff we'd get in the long run, if those probabilities were right-- is $10.
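
Just to make that arithmetic concrete, here's a tiny sketch of the expected value computation in Python. It's purely illustrative-- the names are made up and it's not part of the course software.

```python
# States: the number of red LEGOs in the bag (out of 4), each assumed equally likely.
states = [0, 1, 2, 3, 4]
prior = {s: 0.2 for s in states}

# If the bag really holds s red LEGOs, the chance of drawing red is s/4, paying $20.
expected_value = sum(prior[s] * (s / 4.0) * 20 for s in states)
print(expected_value)   # 10.0
```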

OK, so that's what Denny paid, so that's clever, Denny. That seems good, and many of you would have done the same thing. But what we can do is get more information, and I'm going to just kind of skip to here. So what we're going to do is we're going to get some more information. We're going to do some experiments on this bag. They don't count as the official draws.

We're going to do a couple of experiments and see how that adjusts our belief about what's in here. And then we can see whether we think-- we could be Monty Hall, but we're not going to. We can offer the chance to change our mind or something like that, but we're not.

So what we're going to do is we're going to think about making observations of the contents of this bag. So what are the observations we can make? Well, we could pull a LEGO out and we can look at it, and that will tell us something about the composition of the bag. We're going to put it back in. We're always going to put it back in, so the observations we make aren't going to change the underlying probabilities, OK?

So what we want to know, if we pull one LEGO out and look at the color, we're going to want to know a distribution-- a belief state-- which is a distribution of the possible states of the world. In this case, the states of the world are the proportions of red and blue LEGOs in the bag. We're going to want to update our distribution over the states of the world based on the observation that we get. That is to say, the color of the LEGO that we pulled out. Does that make sense to everybody?

OK, so you can pick one out and show it to us and put it back in. It is blue. OK, uh-oh, Denny's not happy about that.

No, no, no, put it back. Put it back. OK, we've got one blue LEGO. Good. Actually, that's the case that's worked out in detail already in my slides, so I don't have to do it on the board.

OK, so you guys all remember this kind of way of drawing a picture of what Bayes' Rule does, or what the state estimation does. So we're going to go through this for Bayes' Rule practice.

So when we started out, our prior, our belief about the states of the world-- in this case, the contents of the bag-- was uniform. 1/5, 1/5, 1/5, 1/5, 1/5. We thought each possible contents was equally likely. But now we've got an observation of blue, and so what we're going to do is do Bayes' Rule to figure out the distribution over the state, given that the observation was blue.

OK, so if you remember, there's sort of two steps to this update. First, we're going to multiply through by the observation probability. What's the probability we get an observation of blue given the state is S?

OK, if the state is this one, 0, which means there are no red LEGOs, what's the probability of observing blue? 1. Good.

If the state is that there are four red LEGOs, the probability of observing blue is? 0, OK, good. So those are the probabilities of observing blue when we pull one out, right? There's four LEGOs, so it all depends on what the underlying makeup of the bag is, what the probability of seeing blue is. Does that make sense?

So those are the observation probabilities. They don't add up to one. That's OK, right? Because it's the distribution over observation equals blue, given S. So that's not a distribution on S.

Now the next step is that we multiply, right? We multiply each of our prior probabilities by the observation probabilities, and we get these numbers. That was easy multiplication.

And now what we have is this thing, but it's not a probability distribution, right? The numerators add up to 10, but the denominator is 20, so this thing adds up to 1/2. So we have to make it add up to 1. We do that by dividing through by 0.5-- by 1/2. And we get this probability distribution.

OK, so this is now, if we are rational Bayesian reasoners, which we may not be-- there's all sorts of interesting evidence about whether humans are or are not rational Bayesian reasoners-- but if we were, then this is what we ought to believe. This is what our posterior distribution on the contents of this envelope ought to be, right? We now think there's a 4/10 chance that it's got four blues in it.
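
If it helps to see that whole Bayes' rule step written out, here is a minimal sketch in Python. This is illustrative only, not the course library's implementation, and the function name is made up. The same function just gets applied again, with this posterior as the new prior, when we do the next draw.

```python
def bayes_update(prior, likelihood):
    """prior: dict state -> probability; likelihood: dict state -> P(observation | state)."""
    unnormalized = {s: prior[s] * likelihood[s] for s in prior}
    total = sum(unnormalized.values())          # here, 1/2
    return {s: p / total for s, p in unnormalized.items()}

states = [0, 1, 2, 3, 4]                        # number of red LEGOs in the bag
prior = {s: 0.2 for s in states}                # uniform prior, 1/5 each
p_blue = {s: (4 - s) / 4.0 for s in states}     # P(draw blue | s red in the bag)

after_one_blue = bayes_update(prior, p_blue)
# {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1, 4: 0.0}
```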

So if you, knowing what you know now, if I offered to sell you that bond, would you pay more or less than $10 for it? Less. Good. Is there anybody who would pay $10 for it now? OK, I guess not. Poor Denny.

All right, so good. So this is what we think about what's going on. OK, let's do one more draw.

Oh, another blue. Another blue. OK, this is the one that's not worked out in our notes. I'll do it on the board. Another blue.

So, this is the probability distribution over S, given O0 is blue, right? That's what we had before. And it's 4/10, 3/10, 2/10, 1/10, 0. Yeah?

STUDENT: Is it the same blue block each time?

PROFESSOR KAELBLING: I don't know. We don't know. We don't know. But we're believing here that the blue blocks are interchangeable and the red ones are interchangeable, and we can't tell the difference. So we're just really interested in the proportion.

You're right. You're right. OK, she has a great point, though. I mean, maybe, if the identities of the blocks were discernible, and you realized that this blue block came out, but it was the same as the blue block you got last time, that might actually change the update that we're going to do here.

That's a trickier thing, which we're not going to do. So we'll assume that they're anonymous. OK, we're not going to dust them for fingerprints.

OK, so now let's see. What are we going to do? We're going to do an update. We're going to multiply by the probability of seeing a blue, given that state, right? So we're interested in now, what's the probability that the observation 1 is blue, given S?

And those are the same probabilities as before. This is 4/4, 3/4, 2/4, 1/4, and 0. We multiply, and get 16/40, 9/40, 4/40, 1/40, and 0/40. So we've multiplied in the likelihood of seeing blue, given each of these states.

STUDENT: I have a question.

PROFESSOR KAELBLING: Yeah, good.

STUDENT: If it was the same blue block that was pulled out, wouldn't nothing change?

PROFESSOR KAELBLING: So I actually have to think this one through. Would nothing change?

STUDENT: Only if you could identify that block.

PROFESSOR KAELBLING: Well, no, but that's her point. So imagine the blocks-- not only are they blue, but they have little numbers on them. So I think that nothing would change, but I think that you shouldn't hold me to that, and I'll go figure it out and let you all know. It's certainly not the same situation as this.

There's all sorts of interesting studies of this, like IJ Good, who's a famous statistician, did all this stuff during World War II about trying to estimate how many tanks the Germans had based on looking at the serial numbers of the tanks that they captured. So there's all sorts of stuff about when there's finite numbers of things, and you know their identities, and how can you reason about the size of the population, and so on.

So time to renormalize. The numerators here add up to 30, so we divide by 30/40, and we get 16/30. So we're going to normalize, and I'll write it out in decimals, because it's easier to think about now. So we get 0.53, 0.3, 0.13, 0.03, and 0.

So it's not looking so good for Denny and his bet, right? I don't know, maybe those miserable students read ahead. We hope not. OK, we'll deal with the rest of this later.

So that was meant to be just kind of a reminder about how to do belief state update, and the fact that you can just keep computing. After you see one observation, you can do some computation and see the next one, and so on. So now I'm going to cover some territory that I covered in mini lectures, and that was relevant for last week. But it's going to be-- for those of you in particular who haven't done the state estimator yet, these are important ideas.

So we're going to look at hidden Markov models and their generalization to something called stochastic state machines. And they are actually used to model all kinds of sequential processes. People use them to model, for instance, gene sequences, or word sequences, or letter sequences within words, or various kinds of statistical sequences of things. That's a common use for hidden Markov models.

We start with some initial distribution on the states. We have a conditional distribution that says how the new state depends on the old one, and we have an observation model that tells us how the observations depend on the states. So you've had a lot of practice with this already.

And the idea is that given a whole sequence of observations, what you're interested in is computing-- actually, there's a number of questions you could ask, right? You could say, given this whole sequence of observations, where was I when I started? Because you know that better, having had this whole sequence. But mostly, we'll concentrate on this question, which is having seen all this sequence of observations, what state am I in now?

OK, so at the very end of the last lecture that we did out loud, we had this are my leftovers edible example. I forgot to bring in my edible leftover prop. You're probably just as happy.

So this was the question-- I think this was the question that we ended on just as we ran out of time. So this is the state transition model. It's three probability distributions, which together make up one conditional probability distribution.

It says, conditioned on the state at time T, what's the distribution on the state at time T plus 1? So note that each of these rows adds up to 1. If before, my leftovers were tasty, now this is a probability distribution on what they'll be like in the next step. If before they were smelly, here's the distribution, and so on.

OK, so how can we answer the question, what's the probability that State 1 is tasty? What's the probability that State 1 is tasty? Well, in order to answer that question, you had to know what State 0 was like-- our prior. And here, we're assuming State 0 was surely tasty.

So if State 0 was certainly tasty, what's the probability that State 1 is tasty?

STUDENT: 0.8.

PROFESSOR KAELBLING: 0.8, right? You can read that right out of this transition table. You say, the previous state was tasty, so the distribution on the next state says that 0.8 is tasty. So that's just a warm-up of what the transition table looks like-- what it's for.

OK, just as we had a little graphical trick for doing Bayes' rule, there's a kind of nice graphical trick for thinking about how transition distributions happen. And we kind of exercised this idea informally in the last lab of last week, but this is now sort of a graphical way of thinking about it. So let's now try to ask the question, what's the distribution over the whole state of my leftovers at Time 1, and then what's the distribution over the whole state of my leftovers at Time 2?

OK, the Time 1 case is pretty easy, right? Because all of our probability is-- at Time 0, we're assuming surely they're tasty. So what we can do is sort of make arrows from each of our states to each of the possible states it could transition to. Generally speaking, there will be nine arrows. There are six here because there are several zeroes in that transition matrix, so we didn't bother putting those arrows in.

So if it's tasty on this step, it could possibly be tasty or smelly on the next step, and the probabilities are 0.8 and 0.2. If it's smelly in the previous step, it could possibly be tasty or smelly or furry. If it's furry, it stays furry.

So I draw these arcs that connect the states at time T to the states at time T plus 1, and label them with the transition probabilities. So now I have my prior belief about the state I'm in. I have my state transition probabilities, and then to compute the likelihood of the next state, the probability that goes here, I take this probability times the transition probability-- so that's 0.8-- plus this probability times the transition probability. Well, this is 0, so 0 times 0.1 is 0, so I just get 0.8.

OK, let's do-- well, none of the cases are very interesting here, right? So because this was a 1, we're just copying that distribution down into the next row. So now let's do one more step. With this step, does anybody have questions about what's here? No, OK.

So now we say, all right, well, now we're interested in the state in two days. In two days, what will my leftovers be like, is the question. The transition probabilities are assumed to be the same, right? So the fundamental idea here is that the transition probabilities themselves and the observation probabilities don't change, but the state evolves over time. So this set of arcs and labels is the same as this set of arcs and labels.

But now when we ask, for instance, what's the likelihood that my leftovers are tasty on the second day, Day 2, it's 0.8 times the probability that they were good before-- 0.8-- plus 0.1 times the probability that they were smelly before, and they've just turned into fermented perfection. So to compute this 0.66, I take 0.8 times this 0.8, 0.2 times this 0.1, and I add them together. For each arrow that came into this state, I would add up a term, which is the product of the probability in the state that it came from times that transition probability. Does this picture make sense to everybody?

So in general, if you have a distribution over where you might be, and you want to compute a distribution over where you are next, this is actually a matrix multiplication, if you know how to think about things that way. But it's a pretty straightforward operation to figure out your new distribution. So the probability of tasty at Time 2 is 0.66.
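
In code, that transition step is just a weighted sum over the previous belief for each next state. Here's a small sketch with the leftovers numbers. It's illustrative only, not the course software; the tasty row and the smelly-to-tasty 0.1 are the numbers from the lecture, and the rest of the smelly row is invented just so the example runs.

```python
def transition_update(belief, transition):
    """belief: dict state -> prob; transition: dict old state -> (dict new state -> prob)."""
    new_belief = {s: 0.0 for s in belief}
    for old_state, p_old in belief.items():
        for new_state, p in transition[old_state].items():
            new_belief[new_state] += p_old * p
    return new_belief

# Tasty row and smelly->tasty are from the lecture; the rest of the smelly row is made up.
transition = {
    'tasty':  {'tasty': 0.8, 'smelly': 0.2, 'furry': 0.0},
    'smelly': {'tasty': 0.1, 'smelly': 0.5, 'furry': 0.4},
    'furry':  {'tasty': 0.0, 'smelly': 0.0, 'furry': 1.0},
}

day0 = {'tasty': 1.0, 'smelly': 0.0, 'furry': 0.0}
day1 = transition_update(day0, transition)   # {'tasty': 0.8, 'smelly': 0.2, 'furry': 0.0}
day2 = transition_update(day1, transition)   # tasty: 0.8*0.8 + 0.2*0.1 = 0.66
```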

So now let's do a little bit of an integrated example. In fact, some of the first people to think about estimating the hidden state of some system and then actually, more interestingly, controlling it, they were thinking about how to decide what to do with machines in a factory. I might have a really big expensive machine, and it's expensive to take it offline or to repair it, so I might try to diagnose, look at the things that are coming out of it, and decide whether it's broken enough that I should take it down to repair it or not, for instance. So we'll do a simple example of that, just where we're doing the estimation.

So imagine that I have a copy machine. I went and bought a new copy machine. I bought it from some really cheapo-- one of those websites that doesn't look too good, that looks kind of like our April 1st website, with lots of blinky stuff. So I'm not so sure it's very good anyway. It was pretty cheap, so I think that even when it arrives at my door, there's probability 0.9 it's good, and probability 0.1 it's bad. So that's what I think about this machine when I get it.

Now, I have two pieces of belief about how these copy machines work. I have a transition model, right, which says that this thing is going to decline precipitously. If it was good before, the probability it's good on the next step is only 0.7. So there's a 0.3 chance of it failing each day-- let's say each day.

And if it was bad before, it pretty surely stays bad, but there's some small chance it gets better. So that's the transition model, and then I have an observation model that tells me how likely are my observations, given the state of this machine? And let's imagine that I have three kinds of observations, right? I look at a copy that comes out, and it could be a great copy, it could be a little bit smudgy, or it could be just all black. The toner exploded.

So here's a conditional probability distribution that says, what do the observations look like when it's in a good state? Even when it's this thing, even when it's good, sometimes it makes black copies. And what do the observations look like when it's in a bad state? So this is now kind of a characterization of my copy machine and the way it's going to behave over time as a hidden Markov model.

OK, so now what we're going to do is ask ourselves some questions. If we get some observations, what do we believe about the machine? So first of all, let's assume that a good copy came out of it. Yay, we've got our copy machine, we brought it home, we plugged it in, we put in something, we made a copy, and it was beautiful. So now what do we think about our machine, after it made a beautiful copy?

So I'm going to show you two ways to think about it, one which is in terms of these operations on distributions that we already have implemented, and another one is in these sort of pictorial terms. So imagine that I find this information that observation 0 is perfect. How can I make use of that information? Well, one way to think about it is that I can make a joint distribution between the state and the observation.

So my state before was 0.9, 0.1. I thought it was pretty likely it was good, 0.1 it was bad. 0.9, 0.1. And I know the probability of getting a perfect observation in the good state, getting a perfect observation in the bad state. I get this observation model that tells me all the different observations in terms of the states.

So I can use that operation, which you guys have implemented, which takes a prior and a conditional distribution, and gives you a joint, right? So you all implemented that. You know how to do that-- make a joint distribution.

So I could do that. I could make this joint distribution. So this says, how are the observations and the states related to one another, within one time step? Assuming that it was 0.9 good and 0.1 bad.

So which numbers here were going to add up to 0.9? Some set of these numbers will add up to 0.9. What are they? The top row, right? If I computed the marginal distribution-- that's the one you write in the margin, the marginal distribution-- it's 0.9, 0.1, right? Probability of good is 0.9, probability of bad is 0.1.

So now I have this joint distribution. What I'm interested in doing now is conditioning on the fact that I saw a perfect observation. So remember, conditioning is like changing the universe, right? Conditioning is changing the universe. Ruling out all those things that couldn't have happened.

I did not see smudged and I did not see black, so I canceled them out. I kind of delete them from my consideration, and I say I'm only going to think about the parts of the world where things are perfect. So now, only looking at the parts of the world where I got a perfect observation, what's the probability that the state is good versus bad?

So to do that, I ask the question: what's the probability that the state is good and the observation is perfect, divided by the probability the observation is perfect? That's the definition of a conditional probability. So I can compute that looking here, and what I'd get is this new distribution. Good, 0.986, bad, 0.014.
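
Written out, that joint-then-condition route looks roughly like the sketch below. The 0.8 and 0.1 probabilities of a perfect copy are the lecture's numbers; the smudged and black entries are filled in here just so the rows sum to 1, and none of this is the course library's make-joint or condition code.

```python
# Prior over the state, and the observation model P(obs | state).
prior = {'good': 0.9, 'bad': 0.1}
obs_model = {'good': {'perfect': 0.8, 'smudged': 0.1, 'black': 0.1},
             'bad':  {'perfect': 0.1, 'smudged': 0.7, 'black': 0.2}}

# Joint distribution over (state, observation) pairs.
joint = {(s, o): prior[s] * obs_model[s][o] for s in prior for o in obs_model[s]}

# Condition on having seen a perfect copy: keep only those entries and renormalize.
p_perfect = sum(p for (s, o), p in joint.items() if o == 'perfect')   # 0.73
posterior = {s: joint[(s, 'perfect')] / p_perfect for s in prior}
# roughly {'good': 0.986, 'bad': 0.014}
```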

So what did I do? I made a joint distribution, and then I conditioned on my observation. That is a conceptually sort of neat way of thinking about what's going on. It's also computationally expensive. And that's the exercise that we have you doing in the state estimation problem from the last design lab-- doing it more efficiently.

So here's a way to think about how you're doing it more efficiently. This is sort of the pictorial method. We start out with probability of 0.9 and 0.1, good and bad. We multiply by the observation probabilities, right?

So what's the probability we've got a perfect copy, given that it was good? 0.8. Probability of getting a perfect copy, given it was bad? 0.1.

Do the component-wise multiplication, so I get 0.72, 0.01. Divide by the sum, that normalizes, and I get this distribution. So I get the same answer I got before. It's just two different ways of getting to the same place.

OK, time passes. So now, remember, I also had this probability that my machine is going to fail just sitting in my office overnight. And so, what I want to think about is this joint distribution on the probability of State 1 and State 0, given that the observation was perfect.

Let me actually just go straight to this picture, because I think it's the easier way to think about it. And this is like what we did with the leftovers. So this distribution here is what we believed after we saw that observation of perfect, right? 0.986, 0.014.

Now we say, OK, time passes. So we're going to apply our transition model, we're going to look at this distribution on the state at time 1, given the state at time 0, and we're going to see that if it was good before, there's a 0.7 chance that it's going to be good on the next step. And if it was bad before, there's a 0.1 chance it's going to be good on the next step. So we take these two ways of the machine being good on the new step, and we add them together.

So this is 0.986 times 0.7, plus 0.014 times 0.1. We add those together, and we get 0.692. We do the same thing to compute this number, and we don't need to re-normalize. Why?

Because these numbers sum to 1, and each of the distributions that we're using to compute the new numbers sums to 1. So if you do a little algebra, you can convince yourself that this thing is going to sum to 1. Modulo rounding errors. So you don't have to do a normalization step after you do this transition model.
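
Put together for the copy machine, one day's worth of update looks like this: multiply by the observation probabilities and normalize, then push the belief through the transition model with no renormalization. The 0.8, 0.1, 0.7, and 0.1 are the lecture's numbers; the rest is just illustrative Python, not the course code.

```python
# Prior belief about the cheap copy machine when it arrives.
belief = {'good': 0.9, 'bad': 0.1}

# Observation step: we saw a perfect copy.
p_perfect = {'good': 0.8, 'bad': 0.1}
unnorm = {s: belief[s] * p_perfect[s] for s in belief}     # {'good': 0.72, 'bad': 0.01}
total = sum(unnorm.values())                                # 0.73
belief = {s: p / total for s, p in unnorm.items()}          # ~{'good': 0.986, 'bad': 0.014}

# Transition step: overnight, good stays good with 0.7; bad becomes good with 0.1.
trans = {'good': {'good': 0.7, 'bad': 0.3}, 'bad': {'good': 0.1, 'bad': 0.9}}
belief = {new: sum(belief[old] * trans[old][new] for old in belief)
          for new in ['good', 'bad']}
# ~{'good': 0.692, 'bad': 0.308} -- already sums to 1, so no renormalization step
```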

OK, so what happened? Well, our copy machine is so prone to failure that overnight, even though we went to bed thinking it was pretty good, we get up in the morning and we think, oh, I don't know. Maybe not so good after all.

It's worked out here. I don't think I'm going to go through it. We could deal with the issue of getting another copy that's smudged, taking the observation into account, in the same way that we did before-- taking the transition into account. So at the end of the second day, one perfect copy, one smudged copy. I'm pretty sure my machine is not very good anymore. Not such a good machine.

OK, any questions about this process? Not on this exam, but certainly on future exams, we'll expect you to be able to do it. And you have the exercise of implementing it in Python, so it's important to understand it.

OK. The only difference between a hidden markup model and a stochastic state machine-- all these fancy names we're flinging around-- is that in a stochastic state machine, we're actually thinking about there being an input to the model. An HMM, you can just think of it as being something that emits observations. It's got some hidden state, which we don't know, and observations come out of it, one on each step, right? So copies come out of the copy machine.

The only input we gave to the copy machine was, like, pushing the print button. We're not assuming that the input has any effect on the observations. To turn it into a stochastic state machine, we need to give it inputs. And that just means that the transition model depends both on the previous state and the input. That's the only difference.

OK, now we can do this all in Python. You have some experience with that. I'm not really going to go into it.

Actually, I do want to talk about this, though, because I think it's kind of fun. So inside Python-- inside our 601, [INAUDIBLE] 601, there's something called a stochastic state machine. And to describe the stochastic state machine, we have to specify distribution over the start state.

These should really be called transition model and observation model-- maybe next year I'll go and change this around-- because those are conditional probability distributions. They're not properly distributions.

But OK, let's look at this part. So when we say, hey, state machine, what's your start state? It says, well, I don't know. I'm a probabilistic state machine. I don't really know my start state. So what I'm going to do is I'm going to draw an example from the starting distribution, according to the probabilities. So if my starting distribution is a bag of LEGOs, I'll pick one out. So that's a start state.

And what's get next values for one of these guys? Well, remember that get next values returns a next state and an output, right? So to compute the next state, I put in the input, right-- that's the action that I'm taking, the input to the machine-- and I put in the current state. That should give me a distribution over new states.

And I draw an example. I say, hey, what new state would you go to? Pick one at random from that distribution. And similarly, to get an observation, to get an output, I take the observation distribution that goes with the state I'm in right now, and I draw an observation. So it just runs randomly. That's what a stochastic state machine does.
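
Roughly, a machine like that can be sketched as a little Python class. This is a sketch of the idea only-- not the actual class in the course library-- and the draw helper for sampling from a dictionary distribution is made up here.

```python
import random

def draw(dist):
    """Sample a value from a dict mapping values to probabilities (made-up helper)."""
    r = random.random()
    total = 0.0
    for value, p in dist.items():
        total += p
        if r <= total:
            return value
    return value   # guard against floating-point round-off

class StochasticSM:
    """Sketch of a stochastic state machine: start, transition, and observation are all random."""
    def __init__(self, startDist, transitionModel, observationModel):
        self.startDist = startDist                  # dict state -> prob
        self.transitionModel = transitionModel      # (input, state) -> dict newState -> prob
        self.observationModel = observationModel    # state -> dict observation -> prob

    def getStartState(self):
        # Draw a start state from the starting distribution.
        return draw(self.startDist)

    def getNextValues(self, state, inp):
        # Draw the next state from the transition distribution for this input and state,
        # and draw an output from the observation distribution for the current state.
        nextState = draw(self.transitionModel(inp, state))
        output = draw(self.observationModel(state))
        return (nextState, output)
```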

So, for instance, you could take the copy machine, think of it as a stochastic state machine, and ask it to transduce a string of 20 copy inputs. Imagine that the input is just copy. We have to have some input. So this is the distribution over what will come out of the copy machine.

And actually, if you look at it, and if you did this a whole bunch of times, you'd see that early on in the string, we got some pretty good copies. But this copy machine is going downhill, right? You look at that transition model, and you know that it's gradually going to be more likely to be bad. So you get more smudgy and black things at the end.

OK, state estimation. So what is a state estimator? So this is a stochastic state machine. It's a probabilistic object. When you put in an input, you're not sure what output you're going to get. You're not sure what state it's going to transition to.

So that's something. Maybe it's a thing in the world, right? Maybe it's the robot driving around. Maybe it's a coffee machine, maybe it's something like that. The job that we typically have of trying to make the robot understand where it is, or trying to understand whether we should maintain our copy machine, is the problem of building a state estimator.

So a state estimator is a state machine. I'll call it state estimator. And what it takes as input-- the state estimator-- is pairs of observations and actions. Observations and inputs. Things that were done to and observed to come out of this machine, whose state we don't know.

And so what we're going to do is build a state machine whose job it is to try to figure out what in the world is going on with this random process, that it just has only a little bit of access to. So the internal state in the state estimator is the belief state, right? What's the belief state? It's a probability distribution over the hidden state inside this machine.

So this guy's job-- the job of the state estimator-- is to start out in some initial state. The initial state of the state estimator should be the initial distribution of the process whose state you're trying to estimate. And then, on each step, it takes an observation and an input, and it uses those to update its belief about the underlying state of the machine.

And so that's like that copy machine update process that we did, right? It would take an observation and then an input, and it would compute a new belief about what was going on. And that belief is the output of the state estimator, and it's also the next state.

Does this relationship make sense? Typically, this is in the world. This is something that's happening out in the world, and this is in the head of your robot. But notice that part of what you need to pass in to the state estimator when you create it-- part of what the state estimator depends on really critically-- is actually a stochastic state machine model.

If it didn't already have an idea about how copy machines tended to break down, and how copy machines tended to generate observations, it wouldn't be able to make any useful inferences from what it sees coming out of the machine. But because it knows how those observations relate to the underlying state of the machine, it can make a decent estimate of the state of that machine.
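
Here's the corresponding sketch of the state estimator as a state machine of its own: its state is the belief, its input is an (observation, input) pair, and it needs the model's transition and observation distributions to do anything useful. Again, this is illustrative Python with made-up names, not the course library's estimator.

```python
class StateEstimator:
    """Sketch: a state machine whose state is a belief, i.e., a dict state -> prob."""
    def __init__(self, startDist, transitionModel, observationModel):
        self.startState = dict(startDist)           # initial belief = process's start distribution
        self.transitionModel = transitionModel      # (input, state) -> dict newState -> prob
        self.observationModel = observationModel    # state -> dict observation -> prob

    def getNextValues(self, belief, obsAndInput):
        obs, inp = obsAndInput
        # 1. Condition the belief on the observation (Bayes' rule), then normalize.
        unnorm = {s: belief[s] * self.observationModel(s).get(obs, 0.0) for s in belief}
        total = sum(unnorm.values())
        belief = {s: p / total for s, p in unnorm.items()}
        # 2. Push the belief through the transition model for the given input.
        newBelief = {s: 0.0 for s in belief}
        for old, pOld in belief.items():
            for new, p in self.transitionModel(inp, old).items():
                newBelief[new] += pOld * p
        # The updated belief is both the estimator's next state and its output.
        return (newBelief, newBelief)
```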

Questions about this? It is confusing and funny, I know, so probably somebody has a question. Unless it's too confusing. Well, in design lab again this week, we're going to do more of this type of stuff.

So let me show you some movies. It's time to wake up. I'll show you some movies, and then we'll talk about what we're going to do this week-- Wednesday, Thursday, Friday. So I'm going to show you some movies of state estimation. Let's start with this one, because it's most like what we've been doing.

So this is a robot. This is the little robot with his sonar readings. This is some work that somebody else did at University of Washington, Dieter Fox. A little robot with its sonar readings in a very big floor of their computer science department. And it's trying to estimate its position in this building.

It's using what's called a particle filter, which is a kind of a state estimator, which, instead of keeping a mathematical representation of the distribution over where the robot could be, it keeps a bunch of samples that try to represent together the distribution of where the robot could be, and those are the little red dots. But as it goes along, it does a state update, just like we've been doing. That's probably got some talking.

So number of samples, 40,000. That's a lot of samples. That's where the robot is most likely positioned at the moment. OK, so now the actual robot is going along, getting observations. And as it goes around and gets observations, you'll see that some particles are dying out. Those are some places that the robot initially thought it might have been, but as it gets more and more observations, it's killing them, just like when you did the stuff last week. As your robot got more observations of the halls, it ruled out certain things.

So now the robot's pretty sure it's in a hallway. Notice that now we all just have kind of nice left side of the straight hallway type particles left. It's getting these long measurements, which are really only consistent with being down there. The particles are clumping up. It's very exciting.

There it is. OK, it found itself. OK, good. So that's one example of state estimation.

Let me show you a couple of other examples. These are actually from my research group. Actually, let me go forward just enough so we see the robot. So here's a robot we have up on the fourth floor. You can see it's got little yellow rubber glovey fingers on. Inside those rubber glovey fingers are $2,000 sensors which sense the torques on the end of the fingertip.

And so from that information, we can estimate the point on the fingertip and the direction of the forces it's feeling when it touches something. So we get three of these torque measurements, one on each finger. And what you see down there is sort of like a belief state. So what happens is we use computer vision to look and estimate where the drill is. We don't get such a great estimate.

And this purple stuff you see down in the corner, the dark blue thing and the red dots are an estimate of the most likely position we think the drill is in. But that light blue blob down on the floor is sort of a projection of the set of places that we think it might be. So that means we're not really very certain about where it is. There's a lot of uncertainty.

And the goal here for this robot is to be able to pick the drill up and actually activate the trigger. So it has to be holding it in a very precise way. Now, what's interesting here is that if it goes and it tries to grab the handle of the drill, the handle feels roughly cylindrical, and its fingers aren't very reliable. So if it goes and tries to grab the handle, that doesn't give it any information about which way the drill is oriented, right? Because if I grab it like that, I don't get any information.

So what you'll see in the system is two pieces. You'll see the state estimator, which is responsible for shrinking down that purple area. And in fact, the focus of this research work was on selecting how to touch the thing next in order to get the most information. Because there's places where you can touch that drill, and they don't really tell you anything about its orientation, and places where you can touch it and you get a lot of information. So anyway, so you get it.

So it touches it, it re-computes, it updates its distribution about where the object might be. It decides in a minute that it has to change the orientation, because it couldn't reach it when it was turned like that. It touches the bottom, scoots it over, you can see the belief moving down on the bottom window. It's pretty sure now. It's going to do a couple more verifications, because it wants to be really sure.

And now, yay, it pulls the trigger. OK, so there's another belief state update. I'm going to show you one more, just for fun. This is what I do for a living, when I'm not teaching you. What? Oh, you'll tell me later.

OK, so here's another thing. This isn't a simulation, but we have this actual robot, too, and we're going to be working on getting it to do this stuff. So here's a two armed robot that can roll around, and it's supposed to manipulate some stuff in a kitchen. Eventually we're going to make this more complicated.

But it's not sure where the positions of these objects are. So now its belief state is the relative positions and orientations of the robot, and the tables, and the cups on the tables, and so on. So that's one set of part of its belief.

And another part of its belief is actually the stuff we call fog, which are these grey areas, which are just places that it hasn't looked in. So it doesn't have any confidence that there's not stuff in the fog. If you're driving in the fog, you want to look before you drive through some area that you haven't seen before.

So it's keeping track of an estimate of the distribution of the positions of these objects in its world, plus keeping track of where it's looked before. And again, it's trying to decide, in order to do some stuff with these things in the world, where does it need to look? And with this robot-- it's like the robots, actually, that we have-- it has pretty bad odometry. So if it thinks it knows where it is relative to something, and then it moves the base, it loses certainty about where it is with respect to that thing. And you can see the distributions kind of fuzz up a little.

And then it looks again, and it says, oh, OK, now I got it. So every time it moves, it has to look again at the thing it's going to try to manipulate, otherwise it won't have a good idea of where it is.

So you can see as it turns its head and looks, it dispels some of the fog. It moves up there. It may have to move its hand out of the way to take another good look at the object. Reasons about that, goes back, picks the thing up, moves it out of the way. I think it's trying to put the red one over there where the green one was.

See how the green one got big? That's because that's the size of its uncertainty, 90% confidence interval about where the green one was. Then it saw it and it got small again.

So it comes over here, moves its hand, looks carefully at the red thing. The distributions tighten up again. It picks up the red thing, goes over there, puts it down. So our summer project is to get this now to go back and work on the actual robot. But that's another example of state estimation.

So we're going to be doing the same thing in lab, only on a little bit smaller scale. So we're going to have our little guy in [INAUDIBLE], and I hope that if tomorrow night goes well with our practicing, we can do it on the real robot. And the idea is we're going to have the robot driving up and down the hallway, sort of like the driving up and down the hallway thing we did last week. Only now it's the real robot in real continuous space.

So the robot is going to be moving along this hallway, and we're going to have to discretize its position as it moves along. And it's going to be getting these real sonar readings, and we're really going to just use the leftmost sonar in this simple example, just so that it is the closest thing to something that you've already done. And so we're going to try to estimate where the robot is, and then maybe have it go park in a particular parking spot once it's gotten itself localized.

So I'm not going to do this in complete detail, but the transition model is actually an interesting thing. So we're going to discretize the robot's position in the hallway, because doing continuous distributions is sort of more complicated, and not a thing that we want to do in this class right now. So we're going to have a discrete distribution over positions the robot could have in the hallway.

But what happens if you're sure-- imagine that you're sure that the robot is somewhere in this interval. You don't know where it is, exactly. You just know it's at position 10. And then it moves a distance to the left.

If you took that block and moved it over and plopped it down, it might be that some of it will be in one bin, and some of it will be in another bin, right? It's like if you knew for sure I'm between 1 and 2, and then you move to the right a meter and a half. Well, all your probability mass didn't land in the same bin, so you're not exactly sure where you are.

So one of the things that we use the transition noise to model is actually the sort of error that gets introduced by discretization. So we say, well, if we move forward a certain distance and this whole piece of probability mass lands on a boundary, then we're going to use our idea of uncertainty to say, well, I'm not exactly sure which next discrete state I land in. Probably I'm going to land in maybe the one where the center of this goes, but there's some chance that I landed in the earlier one, and maybe some chance that I landed in the later one, right?
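
As a sketch of that idea in Python: suppose the hallway is divided into bins of a fixed size, and all of the probability mass starts in one bin. After a nominal move, the mass that straddles a bin boundary gets split between the two neighboring bins. The bin size and the particular way of splitting below are invented for illustration; this isn't the design lab code.

```python
def discretized_move(bin_index, move, bin_size=0.1):
    """Distribution over next bins when a robot known to be in bin_index moves `move` meters.
    Splitting the mass between two bins models the discretization error (illustrative only)."""
    exact = bin_index * bin_size + move           # nominal new position of the bin's left edge
    lower = int(exact // bin_size)                # bin containing that edge
    frac = (exact - lower * bin_size) / bin_size  # how far the mass overlaps the next bin
    return {lower: 1.0 - frac, lower + 1: frac}

# Sure we're in bin 10 and we move 0.15 m: half the mass lands in bin 11, half in bin 12.
print(discretized_move(10, 0.15))                 # roughly {11: 0.5, 12: 0.5}
```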

So a combination of discretization error and actual error in our observations of how much the robot moves is what we'll put together in order to come up with a transition model. So what we're going to be doing in design lab this week is working a lot on modeling. We're going to work on modeling the transition error in the robot. We also have to model the robot's observation distributions.

So we did some simple observation distribution modeling already last week, but now we have to think of things being a little more complicated. Here's the robot, and it's at some position and orientation in the world. It shoots out a ray. We have to be able to compute.

If the robot were standing here, what sonar reading would we expect it to get? And then we'll put an error distribution around that. So there's a little bit of trig that has to be done.

And now we're going to ask the question, well, if the reading should have been, say, 1, what's the distribution on observations we're likely, actually, to get? And that's something that might be natural to model using a continuous uncertainty, like a Gaussian distribution, or a mixture of Gaussians, right? Our robots-- imagine there's a wall a distance 1 away. There is a pretty good likelihood of observing 1 or something near it, but we might also observe a 4. Or actually, maybe we just have a spike on 5, depending on what the robot does when it gets a bad reading.

Or we might say that there's a mixture of the uniform distribution and the Gaussian distribution. We might say, well, there's a pretty good chance we get an observation that's related to the distance that the wall is really at, but I really can't say. We might get some other random thing, because it bounces off. And so maybe I should say there is some likelihood that we would see a 3, if we're here, even though the wall's 1 away.

So what we're going to use instead of these continuous Gaussians and mixtures of Gaussians are the distributions that come from the homework that you're doing now. So in Homework 4, there's this idea of square distributions and triangle distributions and mixtures of them, and those are a way to make discrete approximations to those continuous observation distributions.
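
For example, a discrete observation model of that flavor might look like the sketch below: a triangle-shaped distribution peaked at the reading we'd ideally get, mixed with a uniform ("square") distribution over all possible readings to cover bounces and junk readings. The bin spacing, peak width, and mixture weight here are invented for illustration; the square and triangle distribution classes in Homework 4 are the real versions.

```python
def triangle_dist(values, peak, half_width):
    """Discrete triangle-shaped distribution over `values`, peaked at `peak`."""
    weights = {v: max(0.0, half_width - abs(v - peak)) for v in values}
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}

def square_dist(values):
    """Uniform ("square") distribution over `values`."""
    return {v: 1.0 / len(values) for v in values}

def mixture(d1, d2, p):
    """Mix two distributions over the same values: p * d1 + (1 - p) * d2."""
    return {v: p * d1[v] + (1 - p) * d2[v] for v in d1}

# Sonar readings discretized to 0.0, 0.1, ..., 5.0 meters; the wall is really 1.0 m away.
readings = [round(0.1 * i, 1) for i in range(51)]
obs_given_wall_at_1m = mixture(triangle_dist(readings, peak=1.0, half_width=0.3),
                               square_dist(readings),
                               p=0.9)   # 90% a sensible reading, 10% anything at all
```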

So we extended the deadline on Homework 4 so that you wouldn't have to do it on the same night as the exam, people in Sections 1 and 2. But I strongly encourage people to read through it and understand it, even if you haven't implemented it, because the ideas will be really important.

OK, so then we'll have our little robot running along, and we'll have some estimate of where it is, observation probabilities, and so on. It'll be all sorts of good fun. What we used to do in previous years-- except that people found it mystifying, so we backed off a little bit-- I'll just tell you about it, because it's still fun. I mean, if we wanted our robot really to be able to find its way, imagine that we wanted our robot to be able to find its way in the Stata Center. That would be cool. Nobody can find their way in the Stata Center.

But actually, it's easier than some other buildings, because usually the problem in an office building is that all the corridors look alike, right? So poor little robot finds itself in a corridor, and it doesn't know which corridor it's in. At least in Stata, every single location is different. So if your sensors are reliable, you know exactly where you are. So that's actually a useful property of a crazy building.

But if we really wanted to get our little red robots to localize themselves in a real environment, we would have to extend our estimator to multiple dimensions. We would have to think about not just where it was along a given hallway, but its x, y, and we'd have to think of its rotation as a coordinate, too, right? Because depending on which way the robot is oriented, we would get a different set of expected readings. So the underlying distribution that we would need to estimate would be on this three dimensional space.

And there's some hairy coordinate transform stuff that has to happen, and you have to use all eight sonar readings. But what you get is something like this. So here's an example in [INAUDIBLE]. I don't know if you can see it very well, but here's the robot, and it's been running around in [INAUDIBLE].

And these are little visualizations that are like the ones that we've seen before, right, where red means unlikely and blue means likely. Let's look at the bottom one. It's the more interesting one. So if you could see, there's little arrowheads in that picture. And so in each x, y discrete location, we've drawn just the most likely orientation.

There's really eight orientations. For each x, y location, it's also considering that maybe it could be in one of eight discrete orientations. But here, we're just drawing the most likely orientation.

So in this window, right, this is the observation probability window. It says, well, the robot's standing right here, and getting this set of observations. And this is showing us how likely it is to get those observations. And what this window tells us is that, well, this looks pretty much like I'm looking at a wall, and it looks like mostly I'm looking at any wall.

The only difference is that these places here, where there are some bends, it expects to see sort of different sonar readings than the ones it's getting, right? Because if it were standing right here, its rightmost sonar would be shooting way down. So even for this one set of sonar readings, it doesn't think it's in this part of the world, because it would feel different if it were there.

Over here, we see the distribution over where the robot thinks it is-- the distribution over states-- and this is after having driven around for a while. So in fact, the only places where there are some good blue arrows are up here, facing in the same orientation that the robot's actually in. And there's a little green one right there, which is its most likely position.

So our robots, with a not very major extension of the thing that we're going to do this week in lab, can actually drive around and localize themselves in reasonably complicated environments. So that's sort of a fun thing.

OK, I think that probably I have another application of state estimation, but I don't think I'm going to do it. I think what we're going to do is finish the bet, and then I'll set you free. So I need one more volunteer, this time for the money. Here.

OK, Denny's money is riding on this. You'd better pick a good one. Blue! Yes! Excellent. Excellent.

So I get my $20 back. Yay! OK, that's good.

We'll see you all at the exam tomorrow night.

STUDENT: [INAUDIBLE].

PROFESSOR KAELBLING: What's in the bag? You want to know what's in the bag? Oh, they cheated! Oh, look at this. All blue. Next time-- OK, this is bad.

Next time, I'm going to have to set the bag up before we do the handout. I feel bad. I'm going to give Denny his money back now. Here's your money back. Here's my certificate. All right.