Leslie Kaelbling: 6.01 Lecture 02


PROFESSOR LESLIE KAELBLING: Good. So we're going to be having lectures in here henceforth, so this is a much nicer room and I'm pretty sure we'll fit. It's a whirlwind, right? This is only the second lecture of the third week, and we have finished the first module of this course. Well, we've kind of finished it. So we have been concentrating on the ideas of compositionality and design in software, and in particular, looking at it in the context of object-oriented programming.

One of the things I happen to really like about this course is that it's not about programming, but we do a lot of programming throughout the rest of the term. So we're going to exercise and build on, throughout the rest of the semester, the ideas that we just put in place in the last two weeks. So over time, we'll build many more classes that represent systems of objects that compose in various interesting kinds of ways. So although the software engineering module is over in the sense that we're going to move to another primary topic, we're going to go back over that over and over and over again. So please don't imagine that because, technically speaking, we did object-oriented programming, A, that you're an expert in it, or B, that we're going to quit talking about it, because probably neither of those things is true. Maybe for some of you it is.

So what are we going to do next? What we're going to do next is think about analyzing systems that interact with the world, trying to predict something about their behavior. So what you've seen so far is that we can build state machines that control a robot in interesting ways. You did the wall following state machine, and you could see how you could actually build up quite complex behaviors out of state machines composed of state machines or layered on top of other state machines and so on. So you now have the tools to build really interesting and complicated robot controllers. So that's a good thing.

The bad thing, potentially, is that if you build a particular program to control a robot, and say to me, look, here's my program to control the robot, I don't have any way of looking at that program or thinking about the robot in its environment that will help me understand whether that program that you just wrote for the robot is a good one or not. So the only way that we have of assessing general purpose computer programs that control robots is by running them and looking to see what happens. And because there's variability in the world and the sensors and so on, you would have to run them a lot of times and so on. So just looking at a program is not a very satisfying way to understand how well it works, and watching it run takes time and money, and you have to possibly worry about statistical analyses of whether you've done enough tests and so on.

So that's not such a good thing. We can write really complicated things, but we don't have a good way of understanding how they work. So what we're going to do in this section of the course and then again in a more complicated way at the very end is think about building models of both the controller that we put in the robot and of the world that the robot's interacting with, and we're going to build those models in such a way that we can do mathematical analysis of the interaction of the robot and its world and prove something about how well it's going to behave. So we're going to do mathematical analysis of the system that lets us know how well it's going to behave under the assumptions that we put into the model. So there's always going to be a little bit of slop between the models we make and the actual world, but this will give us some access to understanding how well a program is likely to work actually before we start running it. So that's a good thing.

So we'll continue with the idea of state machines, and now what we can do is think about sort of a composite model where we have a controller. So that may be like the controller that you wrote, something that takes in, for instance, the example that we'll look at a lot this week is you would like your robot to be a certain distance from the wall in front of it. So maybe somebody tells the robot what distance it wants to be from the wall in front of it. Then we have a controller that somehow takes that desired distance and possibly the actual distance, and then tries to figure out what to do about that. So you're used to writing a piece of program that's like a controller for the robot.

Then we have this box over here with the crazy name, plant. So this is something that Control Theory people call "the world." It's the plant. That's because when people started doing control theory, they were controlling manufacturing plants, and so that was the world, the plant. Students have wondered whether there was something green and leafy involved here. The answer is no. No leafy greens, just the world that my controller is interacting with.

So if I think of myself as a controller of students in 6.01, that's kind of a noisy system to control. I'm trying to control your brains so that they arrive in a state that understands the whole set of material. You can think of that as my job. Then I'm the controller and you're the plant. If it's the robot interacting with the world, then one way to think about it is that the controller is the little piece of program that you're writing right now, and the plant is actually the combination of the physical robot plus the wall and the other stuff that it's working with. So it's a very general idea.

So if we were to make a state machine description of a controller and a state machine description of a plant, we know how to compose those and do feedback, and that gives us now one thing which is still a state machine. And so maybe, we'll get a handle on understanding the whole system by thinking about it as a composition of pieces that are state machines. So that's a nice idea, maybe. We think of the whole thing as the composition of state machines.

If we use general purpose state machines, they don't give us a very good handle. They don't let us analyze what's going on very well. So again, if I hand you a computer program and I ask you even a very simple question like, will this ever terminate? The answer is that you cannot in general answer that question. So general computer programs are very hard to analyze, very, very, very hard to analyze, and they won't give us a way of really understanding how well things are going to work.

So what we're going to do in this module is restrict ourselves to actually a very restricted class of state machines, but they have the property that we can do analysis on them and make predictions about how they're going to behave. So we're going to consider this class of state machines called LTI machines, Linear Time-Invariant systems. And we'll see in detail what linear and time-invariant mean to us.

So since you already understand state machines, you can think of them as state machines. The state of these machines that we're going to think about is simply some number of previous inputs that the machine has seen and some number of previous outputs that the machine has generated. Now, you've already written a state machine that, for instance, has to remember two previous inputs. You have a pretty good idea of how you could make that state machine remember as many previous inputs as it needs. And similarly, it wouldn't be that hard to just add to the state of your state machine some number of previous outputs.

So the idea of a state machine with some number of previous inputs, some number of previous outputs, that's the basic idea of the state. And then the output is going to be a fixed linear function of the input and the state. So we'll look at this in detail, see what that means, a fixed linear function. But the good thing about this is that these systems can be analyzed mathematically by making this restriction to this class of state machines. We can analyze them mathematically, and they're also compositional, which is cool.

So if we have two LTI state machines and we make a cascade composition of them, that's still an LTI state machine. It's still in the simple class. If we take one and we do feedback on it, it's still in this class. If we take them and we compose them in parallel, they're still in this class. So that's an important feature. We really like compositional systems because we can build up very complicated things out of simple pieces.

And again, this is an idea that we're already used to from state machines, thinking about describing a system by the way it transforms its inputs into its outputs. So a signal-- and we'll talk in detail about what we mean by a signal, but just think of it now as a sequence of values-- an infinite sequence of values comes in, the system operates on it in some way and ultimately, it generates output, which is also some infinite sequence of values. And we'll see that we have PCAP systems-- primitives, combination, abstraction, and patterns-- for operating on signals and on systems. And so what we're going to do today and next week and the week after that is spin this out into a really interesting and powerful theory and methodology for designing systems that control an external world.

So here is a problem that you did last week. Last week, you had the robot, and it was facing some kind of wall, and we asked you to write a brain to control the robot so that it would stay a particular fixed distance from the wall. Now, most people wrote something which said, well, if I'm too far away, I should go forward, and if I'm too close, I should go backwards. And so that was OK, but there was a problem a lot of people had. Was anybody unsatisfied with their controller for their robot last week? No, they were all perfect?

Oscillation. So a lot of people had some problem with oscillation, right? They wrote something that said, well, if I'm too close, I should go backwards, if I'm too far away, I should go forward. But of course, the robot would never hit it right on the line and it would just go back and forth. So some people cured that by saying, well, I don't have to be exactly right. So they put a kind of band around the middle to say, if I'm near this dividing line, I just won't bother moving. So that was a way to kill the oscillation but at the price of not knowing where you wanted to go. So that wasn't so good.

So an alternative would be to make the control, the speed that we tell the robot to go, be proportional to the amount of error it has. So if it's way far away from where it's supposed to be, it goes really fast, and if it's almost there, it goes really slow, because the problem is that you'd like to just sneak up on the right place and stop. So here's Soar. I'm so brave, turning on a live demo here. We'll see if it works. Reload Brave New World.
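The two controller strategies just described can be sketched in a few lines of Python. This is a sketch only: the function names, speeds, deadband width, and gain here are made up for illustration, not the actual 6.01 brain code.

```python
# Two wall-distance controllers (illustrative constants, not the real brain).

def bang_bang(desired, actual, deadband=0.02):
    """Fixed speed forward or backward; do nothing inside a small band."""
    error = actual - desired           # positive means too far from the wall
    if abs(error) < deadband:
        return 0.0                     # kills the oscillation, but we stop
                                       # caring exactly where we end up
    return 0.1 if error > 0 else -0.1  # fixed-speed forward / backward

def proportional(desired, actual, k=0.5):
    """Velocity proportional to the error: fast when far, slow when close."""
    return k * (actual - desired)
```

The proportional version is the one the demo runs: as the robot approaches the desired distance, the commanded velocity shrinks toward zero, so it can sneak up on the target instead of bouncing across it.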

So here's Soar, and here's the robot facing the wall, and now I have it set up so its velocity is proportional to the error. Instead of saying always go forward at a fixed speed or backward at a fixed speed, we're going to go proportionally. And so if I run this brain, there's the robot. Now, that wasn't totally satisfying either, right? Were you satisfied with this? Here we go. Let's do it again. Oops, I have to move that. Sorry. We see these lovely figures. One more time. So watch the robot.

So that's a plot of the distance to the wall in front of the robot, right? It starts at this distance, it wants to be at roughly this distance, and these are the distances we get. Is that good? Do you like it? Do you not like it? No. So why don't you like it?

STUDENT: It's not perfect.

PROFESSOR LESLIE KAELBLING: It's not perfect. That's right. And we demand perfection of our robot. That's right. Good. It's not perfect. It overshoots. On the other hand, you can believe that if we just let it run for a while longer, it's going to arrive at the right distance. So that seems like a good thing. We didn't sacrifice the ability to arrive at the right distance, but we did somehow have this problem that it overshoots. And you could imagine that if we were trying to get close to a wall that was closer, we might have a problem. It might eventually even whack into the wall or something.

So what did we do? I wrote a program, I ran it. We looked at it, we see it's not so good, and you can imagine now entering a cycle of going back and messing with the program until we like it. That's a cycle we're all used to, messing with a program until we like it. Instead, we're going to try to go into now a different cycle, which is analyzing this program mathematically and seeing if we can understand how to change it by reasoning about it, rather than by watching it go so that it will have some good properties. So we'll go back to our previously scheduled program.

So here we called it undershoot because it makes the distance smaller than we wanted it to be. Why does it undershoot? What would be your explanation for that? What's it doing? Any idea? Yeah?

STUDENT: Can it immediately stop? It has to decelerate and then--

PROFESSOR LESLIE KAELBLING: That's right. So we have this issue that the robot has mass, inertia, and all that stuff, and even if it knew the exact distance it needed to go and therefore the velocity, it couldn't just teleport right there and stop instantly. So it gets going, and then it has trouble slowing down and stopping. So what we have to do is modulate. We have to think about, I said I was going to make the velocity proportional to the error, but we'll have to think about what that constant of proportionality is. So maybe the issue here is that it's still trying to get there too quickly. Maybe if we let it try to get there more slowly, it will not overshoot. That's the kind of thing that we're going to try to answer.

So if you think about now, we think about this, we've been calling it WallFinder. So let WallFinder be the composition of the robot and the world around it, and we're putting a signal in that says, hey robot, I want you to be this distance from the wall. And then the combination of the robot program and the world is running, it's doing something, and what we're getting out is a plot that looks sort of like this which tells us the current distance to the wall.

So this system-- remember this system is now the combination of the controller and the world-- together, we can think of it as a state machine that transduces this signal, which is the desired distance, into this signal, which is in fact, the actual distance. And we can now think about how would we change this system so that the signal that we get as output is something that we like better, maybe, than this one?

So this idea of signals and systems applies very broadly. Physics, biology, water tanks, cell phones, the idea of some system that takes in a signal and generates another one is a very broad view. We'll study it in the context of robots, but don't imagine that this piece of the course is about robots. It applies quite widely. Another important thing about this view is that it's modular. The idea that I'm speaking to my cell phone-- well, I'm not speaking into my cell phone. I could. I'm speaking into a microphone. Anyway, so I'm making some little membrane vibrate and we're capturing that and then digitizing it and sending it somewhere else and so on. The idea is that in the end, if we take this system and we compose these different transformations together, what I get is something that takes sound in and gives sound out.

And once I abstract this system as something that takes sound in and gives sound out, we can use it to do stuff, and we don't actually care whether what we have in between here is a cell phone system or a soup can and a piece of string. Sound in to sound out is what matters when I'm talking to you on the phone. How it gets implemented doesn't matter in detail although of course, certain transducers will have better properties of efficiency or sound quality or various things like that. So it'll matter in that sense. But thinking in terms of the fact that I'm going from sound to sound means that I don't have to worry in detail about what goes in between.

It's also hierarchical, and it's the same basic idea which is that once I have this thing, sound in to sound out, now I can use that as a building block in a more complicated description of a system. So it's compositional and hierarchical. I could put the pieces together, and then having put them together, I can use that piece as a module in a bigger system. So let's think about that.

So a lot of the analysis of signals and systems happens in continuous time. And those of you who are taking 18.03 right now are thinking about differential equations, and that's really the background that we need to do analysis of continuous time systems. What we're going to do in this class is look at discrete time versions of signals and systems. It integrates very nicely with the state machine view, and it's a good way to start, and you could go from here then to think about the continuous time case. But it's a perfect model, actually, of what goes on in our robots, right? So the robot operates in a discrete time control loop where it reads the sensors, presents the sensor values to the brain for some amount of computation and decision. The brain computes a velocity command for the robot, and then that gets executed and we go around the loop.

So I said what we were going to do is think about LTI systems. So here's the deal. Here's the definition of what an LTI system is. It's a kind of state machine. Let's just actually look right here. So what I said was that the state of the state machine was the k previous output values and j previous input values, so that's the state, same as before. And what's interesting, then, is to look at the output. So I'm going to use this variable, y, to stand for the output. That's a convention. I don't know why, but y is the output and x is the input conventionally.

So what this says, this is a description of how we compute the output from the state in one of these machines. So this is the output at time, n, so the output I'm going to generate, say, right now if this is time n is a linear function of previous outputs and previous inputs-- possibly the current input and previous inputs. So all these c's are constants. They're constants that we fix when we build the controller. This is 0.2 and minus 9 and whatever, but the c's are constants and the d's are constants. So the c's and d's are constants, and they don't change while the system is running. So time invariant, that means that the constants don't change. The output is always the same linear function of the previous inputs and the previous outputs. Of course the state changes. It wouldn't be a very interesting state machine if the state were not time invariant. The state changes, but these constants don't change. So it's the same linear function.

So yn minus 1 is the output that this machine generated on the last step, on the previous step. This is the output it generated two steps ago, this is the output it generated k steps ago. xn is the input that's happening right now. OK, so here's a question. If I wanted, in the context of an LTI system, to represent a pure function, all the coefficients would be zero except for one of them. Which one would be non-zero for a pure function? d0, yay. OK, good. So d0 says, yn, my output, is going to be some constant times the input. That's the most complicated pure function we can represent in this set up. So you can see that there really is a restriction. This is not all the state machines in the world. This is a small subset of state machines, but that's an interesting subset.
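The definition being described is the difference equation y[n] = c1*y[n-1] + ... + ck*y[n-k] + d0*x[n] + d1*x[n-1] + ... + dj*x[n-j]. A small Python sketch makes it concrete as a state machine; the class and method names here are illustrative, not the actual 6.01 state machine library.

```python
class LTISystem:
    """y[n] = c1*y[n-1] + ... + ck*y[n-k] + d0*x[n] + ... + dj*x[n-j].

    State = the k previous outputs and the j previous inputs."""

    def __init__(self, c, d):
        self.c = list(c)                         # feedback constants c1..ck
        self.d = list(d)                         # feedforward constants d0..dj
        self.prev_y = [0.0] * len(self.c)        # y[n-1], ..., y[n-k]
        self.prev_x = [0.0] * (len(self.d) - 1)  # x[n-1], ..., x[n-j]

    def step(self, x):
        xs = [x] + self.prev_x                   # x[n], x[n-1], ..., x[n-j]
        y = (sum(ci * yi for ci, yi in zip(self.c, self.prev_y))
             + sum(di * xi for di, xi in zip(self.d, xs)))
        # Slide both state windows forward one step.
        if self.prev_y:
            self.prev_y = [y] + self.prev_y[:-1]
        if self.prev_x:
            self.prev_x = [x] + self.prev_x[:-1]
        return y

# The pure function case: every coefficient zero except d0.
pure = LTISystem(c=[], d=[2.0])                  # y[n] = 2 * x[n]
```

With c=[1.0] and d=[1.0] the same machine is an accumulator, y[n] = y[n-1] + x[n], which is about the simplest use of the feedback terms.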

STUDENT: [INAUDIBLE].

PROFESSOR LESLIE KAELBLING: My point was in a pure function-- this is a problem that many of you have been thinking about recently, a pure function state machine-- the output depends only on the current input. Current output just depends on the current input. There's no memory. So in this whole, hairy difference equation, we have various terms that depend on previous outputs that I generated and terms here that depend on previous inputs that the system has received.

But if I set all the c's to 0 and all the d's to 0 except for d0, then I get a system where the current output, y sub n, depends only on the current input. And so it's just a function. And so if d is 2, then my output is 2 times my input. If d is minus 1, it's minus 1 times my input. That's really all it can be. So does that make sense to everybody?

So let me go over these terms again in the context of this formula and then we can answer questions, because this is really an important thing. So linear time-invariant. So linear means that the way the output depends on the inputs is linear. It's just the old inputs and the old output values times constants added together. I'm not allowed to write yn minus 1 times yn minus 2. That would not be a linear combination of my previous outputs. So I can only add up a weighted combination of my previous outputs or my previous inputs. That's what I'm allowed to do. That makes it linear.

Time-invariant is that these coefficients, the c's and the d's, don't change over time. And causal, this is something that we don't talk about a whole lot, but we're really only allowed to define the output value at some time, n, in terms of previous output values and previous input values. We're not allowed to define the current output value in terms of input that we'll get in the future. You could make a lot of money in the stock market if you could do that. The systems that we talk about here, the way we define them, we're not going to allow ourselves to do that.

So any questions about this? That make sense? This is the cornerstone of all the stuff we're going to do, so it's worth making sure. I'll wait until you're uncomfortable. OK, good.

So we'll start out by looking at even a simpler subset of these systems where we define the output, y, only in terms of previous inputs. So we're not yet, for right now, going to worry about cases where y depends on previous outputs. So imagine that I have a system that's defined this way. y of n is x of n minus x of n minus 1. So colloquially, I would say, oh, the output is the current input minus the previous one, right? That's what that says.

So let's now think about how this machine defined that way would operate on a simple input. So we'll use this input here, which is called the unit sample signal. We'll talk about it a lot more, too. So it's a signal. It's something that's defined to have value 1 at time 0 and value 0 at all other times. So here's a plot of a piece of that signal. The unit sample signal is an infinite sequence of values. It goes all the way backwards and all the way forwards. You can never enumerate all the values in a signal. Maybe you could describe the signal compactly mathematically, but it's an infinite set of values.

And it looks like this. We use these plots a lot. So this just says it's a discrete time graph of the values of the signal. It has value 0 at minus 1 and minus 2 and minus 3 and so on. It has value 1 at time 0, and value 0 at time 1, 2, 3, 4, and so on. So what if we let x be this signal? So imagine that we let x be this signal. Then what would the output be? What would y be? We can think about that by making a table.

So let's do that. Let's make a table where we have n, x, and y. And we'll start n off at minus 1, 0, 1, 2, 3. And x is going to be the unit sample signal, so at time minus 1, x has value 0. At time 0, 1. Good. 0, 0, 0.

So now, let's figure out what y is going to be. So y is x of n minus x of n minus 1. Well, let's start right here. We can think about that later. So y at 0 is x at 0 minus x at minus 1. So what's y of 0? 1. It's this minus that. y at 1? Negative 1. y at 2? y at 2,000? 0. OK, good. Because now there are no more differences, right? You can see that y is taking just a local difference of two adjacent values of the signal. Since the signal only has one place where it's nonzero, there are just these two places where y has value nonzero also. So because it's got zeroes all the way back, we know that y at minus 1 is 0. OK.
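The table just worked through can be checked with a few lines of Python (a sketch; `unit_sample` is just a helper name for the signal):

```python
# y[n] = x[n] - x[n-1], where x is the unit sample signal.
def unit_sample(n):
    return 1 if n == 0 else 0

for n in range(-1, 4):                     # n = -1, 0, 1, 2, 3
    y = unit_sample(n) - unit_sample(n - 1)
    print(n, unit_sample(n), y)            # columns: n, x, y
```

The y column comes out 0, 1, -1, 0, 0, matching the table: one difference on the way up, one on the way down, zero everywhere else.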

So here's this new signal, and that's the output. That's the y that we get if x is the unit sample signal, and this difference equation characterizes the LTI system that we're feeding it through. So here's the example of that. Does that make sense to everybody?

So we have another way. We're going to have several ways of describing these systems, and they'll help you think about them, sometimes to draw pictures, sometimes to write down one kind of equation, sometimes to write another kind of an equation. And we'll have to get used to converting back and forth between them and it really helps us understand things better by thinking about multiple representations.

So another representation of this difference equation, of what it does to its inputs, how it gets its output from its input, is by using a block diagram that looks like this. Now, you're already used to the idea of composing state machines, and so that's what we're doing here. So all the little boxes in this diagram are state machines, and the arrows are compositions. So this thing, this triangle, is called a gain, but it's really just a multiplication. The triangle is a pure function that takes the input in and multiplies it by a constant that we write inside the little triangle. And the delay is the delay state machine. You already know about the delay state machine. It delays the input by one time step. And this is a pure function that takes two inputs and adds them up and generates an output.

So those are going to be our three primitives. If you think about, we're doing linear combinations of values that are potentially delayed. Those are our tools, right? Linear combinations give us scaling and addition, and then being able to look back into history is what we get from delay. So those are the tools that we have to build up these systems.

So this block diagram corresponds to this difference equation. How do we see that? Well, the output is the sum of two things-- and that seems right, because y is the sum of two things. And what is it the sum of? Well, it's the sum of x at that same time, and x scaled by minus 1-- that's how we get this negation-- and run through a delay. So when we take x and we run it through a delay, it means that what comes out here is the x from the previous time step. So we can use this way of describing things in terms of compositions of primitives as an equivalent description of the system to this difference equation. Does that make sense? So the difference equation is compact, easy to write down. You don't have to get out your drawing program. But sometimes the block diagrams can help you visualize what's going on better, so we'll use both methods.
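The three block-diagram primitives can themselves be written as tiny state machines, and then wired together to get exactly this diagram. This is illustrative Python, not the actual 6.01 library classes:

```python
class Gain:
    """Pure function: multiply the input by a constant."""
    def __init__(self, k):
        self.k = k
    def step(self, x):
        return self.k * x

class Delay:
    """State machine: output the previous input (initially 0)."""
    def __init__(self, init=0.0):
        self.prev = init
    def step(self, x):
        out, self.prev = self.prev, x
        return out

# Wiring for y[n] = x[n] - x[n-1]: delay x, scale it by -1, add to x.
delay, neg = Delay(), Gain(-1.0)
def system_step(x):
    return x + neg.step(delay.step(x))

print([system_step(x) for x in [1.0, 0.0, 0.0, 0.0]])  # unit sample response
```

Feeding in the unit sample produces 1, -1, 0, 0, the same output signal as the table.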

So when we drew this picture and when we thought about the difference equation, we were talking always about particular values of these signals, the value at time n or the value at time n minus 1. And now what we want to do is change our view a little bit and we want to say, rather than thinking about what happens at time 2 or time 4 or time 93, I want to think about how this box transforms a whole infinite sequence of values into another whole infinite sequence of values. Because it's time-invariant, the transformation is going to be the same thing from signals to signals. Whether we think about it operating at time minus 100 or at time plus 100, the fundamental way the machine operates is the same.

So a shift in view. This is the same figure as before, but now I've labeled the wires with a big X and a big Y, and these are now standing for whole signals, whole infinite sequences of values. And we can think of these boxes now as operating on whole signals. So now we're going to descend briefly into a little PCAP story, a little Primitive Composition and Abstraction story, on signals, which are infinite sequences of values.

So just to get some notational conventions, we'll use a big X or some other letter-- we'll have to have more than just X and Y eventually-- but we'll use a capital letter to stand for a whole signal, and we'll use the small letter with a subscript of a time, x sub n, for the value of that signal at time step n. Y could be a whole output signal, y sub n is the value of the output signal at time step n.

And the notion of transduction is still the same from state machines. Remember we had this method transduce? In the state machines, we had a method transduce which took a finite set of values and gave you a finite set of output values. We're just going to think now of transduce as actually operating on infinite sequences. In the mathematical world, that's OK. In Python, it's also going to be OK. You'll see.

So now, we can think of these operators as manipulating signals, whole signals. We're going to think about now scaling a whole signal, delaying a whole signal, adding whole signals. So we're going to think about PCAP, and we're going to start with actually a single primitive. The primitive that we're going to think about is the primitive we already introduced, which is this unit sample signal. So here's a signal defined to be value 1 at time 0, value 0 everywhere else. That's an infinite object and has values everywhere at every possible index. There are other useful primitives which are described more in the notes, but we can get pretty far with just this one.

So now, what can we do to a signal? We can scale a signal. Scaling a signal means multiplying the whole signal, every single value in the infinite sequence, by some constant. So we can scale it by some constant, c. I've written here a way of thinking about what's going on. If I take the signal, X, and multiply it by c-- so now here, I'm taking a signal and scaling it by c-- and I ask what the nth value in that signal is, the answer to that question is c, the constant, times the nth value of the signal. So we're just effectively multiplying this infinite sequence of values, each one of them by c. So we can do 4 delta, that gives us this, minus 3.3 delta gives us something that looks like that. This constant, c, is often called a gain.
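One simple way to capture this in code is to represent a signal as a function from the time index n to a value: an infinite object you can sample anywhere. This is a sketch under that representation; the names are illustrative.

```python
def delta(n):
    """Unit sample signal: 1 at n = 0, 0 everywhere else."""
    return 1.0 if n == 0 else 0.0

def scale(c, X):
    """(c*X)[n] = c * X[n]: every sample multiplied by the gain c."""
    return lambda n: c * X(n)

four_delta = scale(4.0, delta)   # value 4 at n = 0, value 0 everywhere else
```

Nothing about the infinite extent is a problem here, because we only ever compute a sample when somebody asks for it.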

A really interesting operation is delay. So another thing that we can do to a signal is delay it. We use this name, R, to stand for delaying a signal. It's called R because given the way that we draw the signals, delaying a signal shifts it to the right one. So R is for shift to the right. That means make it happen later. I know that every now and then, I have a total mental failure and I look at this and I think, it doesn't seem like later, it seems farther. But it all just has to do with which way you think time grows. Anyway, later is to the right in these pictures.

So RX-- X is a signal, any signal you want it to be. RX is that same signal, just shifted to the right by one. It was infinite before, it's infinite now. That's fine. It's just shifted over by one. And if we want to say exactly how we get the samples out of a shifted signal, we can say, well, if you give me a shifted signal and ask me for the nth value-- say the fifth value in the shifted signal-- I can answer that question by looking at the original signal and looking at the value at time, 4. So the value at time 5 of the shifted signal is the value at time 4 of the original signal. Does that make sense?

So here's R of delta. Here's the unit sample signal shifted right by one. Here's the unit sample signal shifted right by 3. And just for notational convenience, and also it's going to turn out to be even better than notational convenience, we'll write R cubed when we mean shifted to the right 3 times.
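In the same signals-as-functions sketch (illustrative names again), the delay operator R is just index arithmetic, and R applied k times shifts by k:

```python
def delta(n):
    """Unit sample signal: 1 at n = 0, 0 everywhere else."""
    return 1.0 if n == 0 else 0.0

def R(X):
    """(R X)[n] = X[n-1]: the whole signal shifted right (later) by one."""
    return lambda n: X(n - 1)

def R_pow(k, X):
    """R applied k times, written R^k: shift right by k steps."""
    return lambda n: X(n - k)

shifted = R(delta)           # unit sample shifted right by one: 1 at n = 1
shifted3 = R_pow(3, delta)   # same as R(R(R(delta))): 1 at n = 3
```

The definition "the value at time 5 of the shifted signal is the value at time 4 of the original" is literally the `X(n - 1)` in the code.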

OK, so we could scale, we could delay, and now we can add. So we can take two signals and add them together, and it goes just the way you'd expect it to. We just take the one signal, take the other signal, line them up so their indices match, and add the values. Now again, this is something you can do in Plato's world of infinite perfect mathematics. These are infinite things that we're adding up. It's sort of like vector addition of infinite vectors or something.

But we can describe what we mean quite compactly. Even though these are infinite things we're adding together, we're saying, well, if you ask me for the nth value, say the fifth value of this signal which I get by adding two signals together, if you ask me for that value, I can give it to you. And the way I'm going to give it to you is I'm going to get the fifth value of the first signal and add it to the fifth value of the second signal. So that's a complete and compact definition of how we can add infinite objects.

So here, we've added the unit sample signal, plus the unit sample signal delayed by 2, plus the unit sample signal delayed by 4. And here we've done a similar thing, but we've scaled them before adding them.

So now the thing that's nice, we have primitives, we have ways of combining them with scaling, delay, and addition, and now we can abstract by naming the signals. So we could say, all right, I'm going to define the signal 3 delta plus 4R delta minus 2R squared delta, I could call that Y. It's a signal. It's sort of like a procedure. It's a complicated thing we defined, we give it a name, and now we can use it to define other signals. So I can say, well now my new signal, Z, is Y plus 0.3RY. And if I took my signal from before, Y, and did this to it, then I would get a new signal. And I can do anything I want with the Z's and so on. So I have a nice compositional system on signals.

And one thing that's important to know, if you think about it, it's not so surprising, really, but any signal that has finitely many nonzero samples in it can be constructed from the unit sample signal with delay, add, and gain because you could just put those peaks wherever you want them. Questions?
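Putting scale, delay, and add together, we can build the named signals from the lecture. These helpers are my own sketch of the idea, not the course library:

```python
def unit_sample(n):
    return 1 if n == 0 else 0

def scale(c, s):
    return lambda n: c * s(n)

def R(s):
    return lambda n: s(n - 1)

def add(*sigs):
    # add signals pointwise, index by index
    return lambda n: sum(s(n) for s in sigs)

# Y = 3*delta + 4*R(delta) - 2*R^2(delta)
Y = add(scale(3, unit_sample),
        scale(4, R(unit_sample)),
        scale(-2, R(R(unit_sample))))

# Z = Y + 0.3*R(Y): a named signal can be used to define new ones
Z = add(Y, scale(0.3, R(Y)))

print([Y(n) for n in range(4)])  # [3, 4, -2, 0]
```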

Most of the time when we draw a diagram like this, we're going to think about it as operating on whole entire signals. And we can describe it in terms now of what we call operator equations or operator notation. At least right here, it has a similar form to the difference equation notation but instead of talking about the values at particular times, we're writing an equation that describes how whole signals are related to one another.

So I can write Y equals X minus RX. And that's a description of the operations that I can do on the signal, X, to compute my output signal, Y, all in one go. I don't have to talk about n minus 1 or n minus 2. I just say, here's how to make the whole signal. Take the original signal X, take a copy of it, shift it to the right, subtract the one from the other, now that's a new signal. So that's a description of how the whole thing works.

So Y equals RX. Which of the following things here is true? Just to kind of help you understand something about what's going on. You can now think about it for a minute and then put up your hand with the number of fingers corresponding to the answer. Peace, man. OK, good. The people who are awake said two. So this is just to check to be sure that you understand R and the relationship. This is a difference equation. Actually, I can do that. It's red now.

This is a difference equation and this is an operator equation. They say exactly the same thing. This says y at time step n plus 1 is equal to x at the previous time step. Now, does it perturb you that it does-- it could have said y at n is equal to x at n minus 1. That's saying precisely the same thing, right? It's just the relationship between the indices that matters. And so this says y is equal to x delayed by 1. Yep?

STUDENT: Could you just quickly go over what the actual difference is between a difference equation and an operator equation?

PROFESSOR LESLIE KAELBLING: Good. The difference between a difference equation and an operator equation is the following. A difference equation is about samples. It's talking about lowercase variables which stand for the particular value of the signal at a particular time. So I can say y at time 100, y at time 52, y at time minus 10. It's just talking about particular values.

An operator equation has these big variables, and it's talking about infinite sequences all at once. So this uppercase Y is an infinite sequence, X is an infinite sequence. And this is describing a way that these two infinite sequences are related to one another. That's a really, really important thing to have in your head. So signals are infinite sequences. We can talk about how whole signals relate to one another.

Now of course, the way whole signals relate to one another can also be described in terms of how samples relate to one another, but it's a question of whether you're talking about time points or the whole thing. Other questions? So now the really cool stuff starts.
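The sample-level view of Y = RX can be made concrete by filling in a little table, the way we would on the board (a quick Python sketch of my own, not course code):

```python
# the difference equation y[n] = x[n-1] is the sample-level view of Y = R X
# x is defined for n = -1..4 so the lookup x[n - 1] works at n = 0
x = {n: (1 if n == 0 else 0) for n in range(-1, 5)}  # unit sample
y = {n: x[n - 1] for n in range(0, 5)}
print(y)  # {0: 0, 1: 1, 2: 0, 3: 0, 4: 0}
```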

We have these equations, right? We have things that we can write down. Let's go back to here for a second. We wrote Y equals X minus RX here. And once your high school algebra skills kick in, you say, well I could factor an X out of that. And yeah, algebraically you can. And the question is, what does that mean? Does that mean something? Is that an OK thing to do? Just because it looks like something that used to be OK, is it OK in this new world? And the answer is yes.

The algebraic operations that you're used to applying when you're operating on numbers apply when you're operating on signals as well. So let's just go through these. There are little proofs in the notes in more detail, but we'll just look and see what some of the consequences are.

So for instance, we can see that commutativity holds. So I can write R times 1 minus R times X. That's the same as 1 minus R times R times X. So let's look at this. So 1 minus R is the operation of taking the signal and subtracting a delayed version of it. So this chunk of stuff right here does 1 minus R. The operation R is a delay. 1 minus R of R of X is the same as R of 1 minus R of X.

So you can see if you look at this one, that sort of corresponds to this picture. It says, I take X, I run it through the 1 minus R machine-- that's running it through this-- and then I run it through the R machine-- that's running it through the delay-- and out comes my Y. If I commute those operations, I do them in the other order, then what I do is I run it through the R machine first and then through the 1 minus R machine to get Y. So that sort of corresponds to this expression. And those two things are equivalent under a condition that I'll describe in a minute.

So this is cascading, right? This is taking the output of one machine and putting it into another one. It's multiplication in the algebraic sense, right? Composition. So here, when I multiply these operations, 1 minus R by R, that's like put it through this box and then put it through that box. That corresponds to cascade composition in our system of machine compositions.

So these two machines are equivalent if something. So what's the if? Remember when we made delay machines, state machines, and we had to specify the initial output? Because when you delay a signal, you have to output something the first time somebody asks you a question, and you don't know what it is. So there's this notion that when the system wakes up, it wakes up in some kind of state.

If these delays wake up with 0s in them, if they wake up what we call at rest, then you could always do this commutativity maneuver and everything is OK. If they don't wake up at rest, then you might need to put different initial values into the delays in order to make it all generate the same sequence of values. They'll have the same long term behavior, which we'll spend time exploring next week. But if you want to say they're exactly the same, they get exactly the same outputs, then you have to worry about the values in the delays. But for right now, when we're thinking about the algebra, we say, oh, we're going to assume they wake up with 0s, then you can commute them and it's fine. So here's an algebraic operation that it's OK to perform on these operator expressions.

We also have the distributive law. So multiplication distributes over addition in normal algebra, and it works here, too. It's easy to see it in the block diagrams. You could say, well, if I take X and I subtract RX from it, and then take that and run it through a delay, that's the same as taking two copies of the delay box and putting one along this wire and one over here. Because it's still the sum of X delayed by 1 minus X delayed by 2. So here, we can see it, X delayed by 1 minus X delayed by 2.

Algebraically, in the operator expressions, you can see it down here. What we're doing to our signal is delaying by 1 the result of taking the signal and subtracting its delayed value. And that's the same as doing delay by 1 minus delay by 2. Is that cool with everybody?
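Both laws are easy to check numerically in the functional sketch from before. Note that in this purely functional model every signal is zero for negative n, so the at-rest condition is built in:

```python
def unit_sample(n):
    return 1 if n == 0 else 0

def R(s):
    return lambda n: s(n - 1)

def sub(a, b):
    return lambda n: a(n) - b(n)

def one_minus_R(s):
    # the (1 - R) operation: a signal minus its delayed self
    return sub(s, R(s))

X = unit_sample

# Commutativity: R((1 - R) X) == (1 - R)(R X)
lhs, rhs = R(one_minus_R(X)), one_minus_R(R(X))
print(all(lhs(n) == rhs(n) for n in range(-2, 10)))    # True

# Distributivity: R(X - R X) == R X - R^2 X
lhs2, rhs2 = R(sub(X, R(X))), sub(R(X), R(R(X)))
print(all(lhs2(n) == rhs2(n) for n in range(-2, 10)))  # True
```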

It gets hairier from here. I'm going to let you look at this. But basically, the usual kinds of associative operations, they still apply. You can puzzle over all the triangles and boxes in your free time. I'm also going to let you solve this problem in your free time, because it takes a little stewing, but let me just give you a hint about how you think about it. So the question is, how many of these systems are equivalent? In fact, all three of them are equivalent. So people say, oh, how could I ever know that?

So first of all, I can't just look at these and say, oh yeah, totally they're equivalent. Maybe Denny can look at them and say, oh totally, they're equivalent, but I can't. So if you're not a very practiced person at doing this stuff, then you can just fall back on either the mathematical tools or on tracing through the diagram with your finger. So let's just do the tracing through case.

What you can see is that there are four pathways from the X to get to the Y. It can go upstairs and upstairs, upstairs and downstairs, downstairs and upstairs, downstairs and downstairs. So that gives you a sum of four terms in the expression that describes how Y depends on X. And you can see a similar thing in each of these, and you can work through and see that those sums are the same. Again, I'm going to let you work that on your own time. Feel free to ask us questions on Piazza if you want to.

So now I'm going to get to the next big conceptual thing. So what we've seen so far is that we can take signals, we can describe how systems operate on these signals using these components of addition, gain, and delay. But so far, all the examples we've seen have been what are called feed forward systems where all the arrows, all the water flows home to the sea. So all the input flows in the same direction, it gets modulated in various ways, added up, and we get the output.

But there's another thing we can do, which is that we're allowed-- I said at the very beginning that LTI systems, that output was allowed to depend on previous inputs, but also to depend on previous outputs. So far, we haven't done that yet, but now we're going to let our output depend on previous outputs.

So what it looks like in pictures is this. What we see is that there are now cycles in our diagrams. So here's an output, and it's going through a delay box and coming back and being added in. So that's a fundamental feedback loop, right here. So how do we describe this in terms of operator equations? Well, it's not that hard. If you just relax and don't worry too much about what you're writing down, it's pretty easy, and then we'll think about what it means in a minute.

So what is Y? Well, Y is the sum of two things. Y is the sum of two things. It better be, because Y is that point of an adder. OK, sum of two things. One of the things that it's the sum of is X. That's easy. And then, what's the other thing that's coming into the adder? Well, it's the delay of Y. So this signal right here coming into the adder is RY. You believe that? We don't know what Y is, but whatever Y is, this is RY. So now we've defined Y is equal to RY plus X. Is that cool?

Now, this feels a little bit trickier, because we're not sure what Y is. Before, if you knew X, it was totally clear how to get Y. A way to compute Y was obvious. Here, this doesn't tell us how to compute Y. It's like an algebraic equation when you say, a squared minus 4 is equal to a over 2 and then you have to find a. Nobody told you what a is quite yet. It's the same thing here. We're giving you an equation that will determine the value of Y, but we haven't told you the value of Y yet.

So we can do some algebraic manipulations on this if we want to. We could write it this way, 1 minus R of Y is equal to X. So that means if we take Y-- so the 1 minus R of something, we're used to this, right? This was Y is 1 minus R of X. That didn't perturb us at all, right? This says, if I took X and I subtracted its delayed self, I would get Y. This says, if I took Y and subtracted its delayed self, I would get X. Well, that's cool, but that's not the way I need to run my show because I know X and I need Y, but it describes the relationship between these two signals.

So now what we have to think about is how we can take this description of the relationship between the two signals and see what does it tell us about Y if we know X. So let's look at an example now, so we'll make a table. Just so we keep things straight, this was for Y is X minus RX. That was this guy. So now let's try to do the same trick, think about feeding the unit sample signal into this new machine we have with the feedback.

So the new machine is Y is equal to X plus RY. So let's make a table. N, X, Y. And we're still going to think about X being the unit sample signal. This is to give us some intuition about how this machine works. So let's go with this, 0, 1, 2, 3. And the unit sample signal is the same unit sample signal that it was before, so it's 1, 0, 0, 0. So now, let's try to figure out the value of signal Y at time 0. What do you think? What should that value be? There's a finger. Tell me. It should be 1. Why should it be 1?

STUDENT: Because X is 1 and the last Y is presumably 0.

PROFESSOR LESLIE KAELBLING: Good. So what he said was that this should be 1 because the X at the same time was 1, and the Y at the previous time was presumably 0, and that's a very nice way of saying it, presumably 0. Whether or not it's 0, we actually kind of have to know this value in order to know how to fill this table in. We have to know the previous output of this machine. Now to understand its long term behavior, it turns out that we don't have to know that, but to understand the particular values that are going to come out of this machine when we run it, it's going to depend on the initial value that we put in there. So again, just like the delay machine, it depends on the initial value that we put in.

So this is an assumption that it is at rest. If we assume that it's 0, then we can go ahead and compute the rest of these values. If you assume that it's something else, you can also compute values and they would be different. So now what is it? X plus RY. Actually, let's write down the equivalent difference equation because that makes things a little bit easier to think about. So what difference equation is equivalent to this? I'm going to say Y at time n is equal to X at what time? n plus Y when? n minus 1. OK, good.

So this is an absolutely equivalent statement to this, but when you're actually doing the business of filling in a table, sometimes it's easier to see what to do when you look at the difference equation view because we're trafficking in individual samples. We're trying to compute Y at time 0. Y at time 0 is X at time 0 plus Y at time negative 1. So presuming at rest, then this is 1. So now, the pattern is that the value here is going to be the sum of these two guys will give us the value here, and that's 1, and the sum of these two guys give us the value here, and that's 1. So the output is going to be 1 forever into the future.

This machine is called an accumulator. It's just like the accumulator state machine that you wrote. It adds up all the inputs it's ever seen. In the case where it just sees one spike of value 1, it's going to have value 1 forevermore. So this is a way, using our simple tools-- before, remember I said that from the unit sample signal, we could create any signal that had a finite number of nonzero values with add and scale and delay? Now we can see that by using feedback, we can actually get signals that have infinitely many nonzero values.

So the little slogan down here, "Persistent response to transient input." A transient input is one that only has finitely many nonzero values. You could just enumerate the places where it's not 0. That's what it means to be a transient signal. And a persistent response is something that actually has nonzero values infinitely into the future. So that's an interesting thing. Feedback really changes the game in a significant way. You put in one little pulse, and it stays there forever.

So how can we think about the accumulator? This equation may seem a little funny somehow, because we want to solve for Y, and we're not sure what it means to solve for Y in an expression like this. So we saw that various kinds of commutativity were OK with operators, but dividing through by 1 minus R should make you at least nervous. It's not clear what that means. What is 1 minus R, and how can we divide through by it? It's going to turn out that algebraically, it's actually an OK thing to do and it means something sensible in the systems sense.

So let's first think about that infinite system. So let's think about this Y is X plus RY. So one thing that you could do if you want to try to understand that, Y is X plus RY, you could say, well, Y is equal to X plus RY. That's X plus RY. And I could expand things out a little bit more. I get X plus RX plus R of X plus RY. You can see where this is going now probably. So this is going to turn into X plus RX plus R squared X plus R cubed X et cetera. And we can factor the X out. That's OK. These are all operations that we've decided are OK. So that's going to be equal to X times 1 plus R plus R squared plus R cubed et cetera. And so there's a proof in the readings that shows us that this is equivalent to X over 1 minus R.

And this is a nice picture to help you have intuition about that expansion I just did. So in the feedback case, whatever the output was, it's going to come around and get added in, but another way to see that is just that we take X and then we delay it and add it in again, delay it by 2 and add it in again, delay it by 3 and add it in again. So if you think of the unit sample signal as having that one spike at 1, this is just saying, I'm going to take infinitely many copies of the unit sample signal, delay each one by some integer amount, and add them all up. So that gives me spikes of height 1 at all future time points from 0 forever.
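That expansion X(1 + R + R squared + ...) can be checked over a finite window by adding up finitely many delayed copies (my own sketch; the true sum has infinitely many terms):

```python
def unit_sample(n):
    return 1 if n == 0 else 0

def R(s):
    return lambda n: s(n - 1)

def truncated_series(s, terms):
    # X + R X + R^2 X + ... with a finite number of terms
    copies, cur = [], s
    for _ in range(terms):
        copies.append(cur)
        cur = R(cur)
    return lambda n: sum(c(n) for c in copies)

# with 20 terms, this matches the accumulator's output for n = 0..19
Y = truncated_series(unit_sample, 20)
print([Y(n) for n in range(6)])  # [1, 1, 1, 1, 1, 1]
```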

I am going to skip this problem because we have other stuff to do, and we'll come back to it if we have time. So I want to recap here the LTI system idea and then talk a little bit about, again, how we work with these systems. So the state is k previous output values and j previous input values, and the output is a linear combination of these things. Again, there's an example that I'm going to come back to if we have time.

What I want to do is be sure that you're happy with the conversion between difference equations and operator equations. So they have the same number of terms, they have the same coefficients in this form. So we can write an operator equation this way, we can write a difference equation this way, and it really is that we're using R to talk about an operation on a whole signal instead of naming particular indexed values into the signals. So difference equations and operator equations. OK?

So there's something, a skill, that you'll need to have in Design Lab, and I just want to be sure that you see how to do this because it's one of the real advantages of working in the operator equation way of thinking about things. So imagine somebody gives you a description of a system here and a description of another system and says, I have them wired up in this way. And then what they'd like you to give them back is a description in terms of a difference equation or an operator equation of how Y depends on X.

So I go down to the state machine store and I buy one of these and one of these and a thing to add them together and I wire them up, and I'd like to get a certificate or a proof or something that says, if I wire these pieces up this way, how is the output going to depend on the input? So I know the certificates for the two boxes. I know how their outputs depend on their inputs historically. And so I know this is an LTI system, I know this is an LTI system. And I know from our arguments of compositionality that this whole thing is an LTI system, and I want a compact description of how Y depends on X. So that's the problem I'd like to solve.

So let's imagine, again, that when I went to the store, I looked up the certificates for these two boxes and I was told that, well, Y is just some constant, K, times E. Obviously, this signal, E, is X plus Z. That's what an adder does. And the certificate for this box says that Z is RY minus R squared Y. So do you get the problem? So each of these boxes comes with a description of how it works. We know how an adder works. We want to know, how does the whole system work together? I'm going to do this on the board.

E is X plus Z. And Z is some complicated thing, RY minus R squared Y. Z is equal to RY minus R squared Y. This is an algebra problem. What we want to do is find-- in order to describe how this whole machine works, we want to get rid of those internal variables. We don't want to think about them. That's an implementation detail of my particular box here. But we want to find an expression for Y in terms of X. So this is plain old algebra. You can either watch me make mistakes or you can make your own mistakes on your page. In the end, we should all arrive at the same answer or agree on an answer. So up to you, watch me, do it yourself, but be sure that in the end, we agree.

So let's see. What am I going to do? Well, let's get rid of E. It seems easy to get rid of. So we'll get Y is K times X plus Z. That was good enough. I don't need those equations anymore. And X is an input, so that's good. So now my issue is Z. So with Z, well we have an actual definition for Z here. So we can get this expression Y is K times X plus-- we put this in-- RY minus R squared Y.

So now, all we have to do is solve for Y. So how am I going to do that? Well, I get Y minus KRY-- I'm going to do this wrong. I can't do algebra. I'm just doing this to humiliate myself or something-- plus KR squared Y is equal to KX. So now, what do I do? I just factor the Y out. So I'm going to get Y is equal to KX divided by 1 minus KR plus KR squared.

So that's some kind of a description of how Y depends on X. That's a description we'll know what to do with, actually, better next week. But we can get out of this a different kind of description that you might find easier to think about. So let's try to convert this now. I can't put that board up so I'm going to come over here. Another kind of a description that could be useful to us is a difference equation. That tells us how does this machine run.

So let's think about the difference equation view of this machine. So what would that be? So you could help me with this. So I'm going to say, y at time n, little y, is equal to what? Tell me a term that goes on this side.

STUDENT: [INAUDIBLE].

PROFESSOR LESLIE KAELBLING: What?

STUDENT: kx n.

PROFESSOR LESLIE KAELBLING: kx n, good. What else?

STUDENT: ky minus 1.

PROFESSOR LESLIE KAELBLING: Good. Plus ky at n minus 1. We got this from this guy. ky n minus 1. And we get one more term, and it's going to be minus ky what?

STUDENT: n minus 2.

PROFESSOR LESLIE KAELBLING: n minus 2. So there's another description of that same system. So you can always go back and forth between them. Sometimes it's easier to think about one, sometimes it's easier to think about another. This tells you how the state machine works. You need the current input and two previous outputs. If you knew those things, you could compute the current output. So this tells you what the state needs to be, and it tells you how you need to compute the output as a function of the state. This way of using operator equations is a really convenient way of describing how machines operate and in particular how to compose them and how to understand how the aggregate machine works. We will be doing stuff using-- question, thank you.

STUDENT: Are there algebraic methods you can't use when you're trying to solve like this?

PROFESSOR LESLIE KAELBLING: You mean, is there anything that you're tempted to do algebraically that's actually forbidden?

STUDENT: Yes.

PROFESSOR LESLIE KAELBLING: I don't think so. No. It's very cool.

STUDENT: [INAUDIBLE].

PROFESSOR LESLIE KAELBLING: Initial rest, right. Under the assumption that the delays start out with zeroes in them, all of algebra [INAUDIBLE]. Yes?

STUDENT: You said that we know that Y is equal to a constant [INAUDIBLE]. How do we exactly know this?

PROFESSOR LESLIE KAELBLING: Because when you bought this part from Radio Shack, it said, I'm a gain. Yeah.

STUDENT: [INAUDIBLE].

PROFESSOR LESLIE KAELBLING: Right. So this is not an always true kind of thing. This is just saying, if I bought a box from Radio Shack-- so I bought this box, and this box came with the certificate that Y was equal to K times E so that this multiplies by K. And this box comes with a certificate that what it does is it takes its input, it delays it by 1, and subtracts the input delay by 2. So that's the certificate for this box, that's the certificate for this box, and then the question is, what happens when we put them together? You can't buy those exact boxes from Radio Shack. Other questions about this?
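Before moving on, the difference equation we just derived, y of n equals k x of n plus k y of n minus 1 minus k y of n minus 2, can be checked with a quick simulation. The gain value 0.5 here is just an assumed example, not from the lecture:

```python
def run(xs, k):
    # y[n] = k*x[n] + k*y[n-1] - k*y[n-2], at rest before time 0
    ys = []
    for n, xv in enumerate(xs):
        y1 = ys[n - 1] if n >= 1 else 0
        y2 = ys[n - 2] if n >= 2 else 0
        ys.append(k * xv + k * y1 - k * y2)
    return ys

# unit sample response with an assumed gain k = 0.5
print(run([1, 0, 0, 0, 0], 0.5))  # [0.5, 0.25, -0.125, -0.1875, -0.03125]
```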

So we're going to have a whole bunch of different ways of creating state machines that are equivalent to these LTI machines defined by difference equations. I have two of them here. There's another one that we'll use a little bit in Design Lab too which will be described in one of the mini lectures. So imagine that you just wanted to create a state machine from scratch that did this. You can see that you can do it just by remembering the old input, by making a state machine whose next state is the old input and whose output is the current input minus the state, which is the old input. So you've looked at a delay by 2 machine. This is a machine that just takes the current input and subtracts the delayed input.

So that's the same machine. You could make an instance of this state machine. Here, you can feed in a finite piece of the signal to transduce and it will give you out the transduction of that piece. This is something that I want you guys to look at pretty carefully because this is, again, a thing that's going to be important to us in the short term.
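A from-scratch version of that machine for Y equals X minus RX might look like this. It's modeled loosely on the course's state machine interface, but the method names here are my own and may not match the library exactly:

```python
class Diff:
    # state machine for y[n] = x[n] - x[n-1]: the state is the previous input
    start_state = 0  # at rest: the delay wakes up holding 0

    def get_next_values(self, state, inp):
        # next state is the current input; output is current input
        # minus the old input (the state)
        return inp, inp - state

    def transduce(self, inputs):
        # feed in a finite piece of the signal, get out its transduction
        state, outputs = self.start_state, []
        for inp in inputs:
            state, out = self.get_next_values(state, inp)
            outputs.append(out)
        return outputs

print(Diff().transduce([1, 0, 0, 0]))  # [1, -1, 0, 0]
```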

Another way to construct this state machine is instead of building it yourself-- building it yourself was maybe OK when it was this simple, but when they start to get really complicated and you try to manage all of this yourself, it'll get buggy and really, really, really hard to debug. And what's nice is to look at this and say, oh, this is a composition of a bunch of primitives. So let me define the primitive state machines and then put them together. And we can do that using our state machine composition methods.

So I can say, oh yeah, this machine, what I do is I'm going to use in this case a thing called Parallel Add. It's a state machine combinator. You can read about it in the software documentation. It takes as input a pair of values and it gives out the sum of those two values. So it's going to combine two state machines, and its output is going to be the sum of the outputs of those state machines.

So I'm adding two state machines here together. So there's a state machine that's described by the value on this wire, and the output of the state machine that's described by the value here. This top state machine is a really boring one. It's so boring we gave it a name. We gave it the name Wire. So Wire is a state machine. It's the same as multiplying by 1, which is to say its output is just the same as its input. That's fine. We're happy with simple things. So sm.Wire is this state machine that's just a wire.

This state machine down here is a cascade. It's a cascade of a gain of minus 1. So gain is a multiplier. The Gain state machine corresponds to a triangular thing in our diagram. It means multiply by minus 1. So that's this. And then we have a delay. We have this state machine called sm.R. You know it as sm.Delay, but it has another name now, R. That works. sm.R, if you don't give it any arguments, assumes that it starts at rest. It starts with value 0. But you can also pass in the value that you'd like it to start out with.

So now, I'm going to cascade the gain with the delay. So sm.Cascade of Gain and Delay is a description of this much of the machine. Then when I take these two guys in parallel and add their values, I get a description of the whole machine. Yeah?

STUDENT: I was just wondering why the name Cascade.

PROFESSOR LESLIE KAELBLING: Why the name Cascade? I don't know. Some people call it Series, Series Composition or Serial Composition. Later, we're going to be doing circuits and they have a notion of series which is sort of a different thing, so we decided that we didn't want to use that. I don't know. But it flows, it flows. The output of one flows into the other one. That's why it cascades.

STUDENT: All right.

PROFESSOR LESLIE KAELBLING: Now, if I had changed this around, if I wrote sm.R of 0, sm.Gain of minus 1, would I get the same or a different machine? Same. So changing the order of arguments here would be like changing the order of the Delay and the Gain here. That's, formally speaking, a different actual contraption, but it will cause the same mathematical transformation of the signal as long as the delays start out with zeroes.
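Minimal sketches of these combinators make the point runnable. The names mirror the lecture (Wire, Gain, delay, Cascade, Parallel Add), but the implementations here are my own plain-function illustrations, not the course's sm classes:

```python
def wire():
    # output equals input, like multiplying by 1
    return lambda xs: list(xs)

def gain(c):
    return lambda xs: [c * x for x in xs]

def delay(init=0):
    # shift right by one; init is the value the delay wakes up holding
    def run(xs):
        out, prev = [], init
        for x in xs:
            out.append(prev)
            prev = x
        return out
    return run

def cascade(m1, m2):
    # feed the output of m1 into m2
    return lambda xs: m2(m1(xs))

def parallel_add(m1, m2):
    # run both on the same input and sum the outputs
    return lambda xs: [a + b for a, b in zip(m1(xs), m2(xs))]

# Y = X - R X, built from primitives as in the diagram
machine = parallel_add(wire(), cascade(gain(-1), delay(0)))
print(machine([1, 0, 0, 0]))  # [1, -1, 0, 0]

# swapping the delay and the gain gives the same transformation,
# as long as the delay starts at rest
machine2 = parallel_add(wire(), cascade(delay(0), gain(-1)))
print(machine2([1, 0, 0, 0]))  # [1, -1, 0, 0]
```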

So we have lots of representations of LTI systems. We can represent them as difference equations which lets you simulate how they work, as block diagrams which let us have intuition about how the signals flow around, operator descriptions, which are particularly useful for combination and algebraic manipulation. Python subclasses, so there's a Python subclass called LTI State Machine, and you can just put in the C coefficients and the D coefficients and some start up values, and that'll make any LTI state machine in the whole world you want in one blow. Or there is what we just saw, a Python combination of primitive state machines which lets you wire together the pieces to make a state machine.
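In the spirit of that coefficient-driven class, here's a generic difference-equation simulator: y of n is a sum of c coefficients times previous outputs plus d coefficients times current and previous inputs. The exact coefficient convention of the course's LTI State Machine class may differ; this is a sketch:

```python
def lti(c, d, xs):
    # y[n] = sum_i c[i]*y[n-1-i] + sum_j d[j]*x[n-j], starting at rest
    ys = []
    for n in range(len(xs)):
        y = sum(ci * ys[n - 1 - i] for i, ci in enumerate(c) if n - 1 - i >= 0)
        y += sum(dj * xs[n - j] for j, dj in enumerate(d) if n - j >= 0)
        ys.append(y)
    return ys

# accumulator: y[n] = y[n-1] + x[n]
print(lti([1], [1], [1, 0, 0, 0]))      # [1, 1, 1, 1]
# Y = X - R X
print(lti([], [1, -1], [1, 0, 0, 0]))   # [1, -1, 0, 0]
```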

So this week, we're going to have no software lab. There's Design Lab as usual. There are exercises on the Tutor to do, so you can start doing those. Please post questions on Piazza. Go to office hours. See you later.