Computation and the Transformation of Practically Everything: The March of Technology
AGARWAL: I am Anant Agarwal, and chairing our next session. We have a couple of very exciting speakers for this March of Technology session. The first speaker will be Rod Brooks, who was director of CSAIL in the early part of the decade. He's also been a founder of a string of successful robotics companies, iRobot, and most recently, Heartland Robotics. Rod.
Actually, it says that Rodney L. Brooks is speaking, and I don't know who he is. I'm Rodney A. Brooks, but I'll impersonate him.
So I'm going to talk about the March of Technology, and I'm going to look back a little and forward, as everyone has done. Back in the old days, ordinary people couldn't touch computers. They were behind glass walls. One of the reasons was that people were too dangerous for computers, because they all smoked back then, and the smoke particles got in the disk drives. So you had to seal the computers off from people so they couldn't affect them. And if you wanted your information technology to deliver information or change, you had to go through a whole chain of people-- systems analysts, programmers, punch card operators, et cetera. You couldn't actually do anything yourself. But now ordinary people can touch computers. And so there's been this explosion over the last 30 years of forms of computation, and how we use it, and information that we get, and we have control.
I'm a robotics guy and robots in factories today are just like they were in 1961, when the first production lines used robots in New Jersey in a GM plant. And ordinary people can't touch those robots, in this case, because the robots are too dangerous, and they will swing around and kill someone without noticing. But what if ordinary people could touch robots? And over the last 10 years, there has been a transformation where they can. Now ordinary people can touch robots.
And up on the top left side is a surgeon using an Intuitive Surgical da Vinci system to operate remotely inside the patient. When the surgeon moves their fingers maybe a millimeter, the robot moves a fifth of a millimeter inside, so they can do much more delicate surgery and the incision is much smaller. You might question whether a surgeon is an ordinary person; they take that to mean something different than the rest of us do when we say it.
But then there are home robots. And I'm going to talk about numbers in a minute, on the top right. Lots of robots now in people's homes. There weren't any 10 years ago. And on the bottom, there's been a transformation of the US military over the last 10 years of not only air vehicles, unmanned air vehicles, but also ground robots. And we see various things on the bottom left. A robot in Iraq doing improvised explosive device remediation, a robot sniffing in the trunk of a car looking for a bomb. In the middle on the bottom there, you see a guy who came in with his PackBot to a depot in Iraq; it had been blown up-- it'd been out defusing bombs.
So he came in, the robot is in a box in pieces, he said, can you fix Scooby? And the depot manager said, no, but we can give you a new robot. No, no, I don't want a new robot. I want Scooby. Scooby and I have been through a lot together. And you see on the back of Scooby's head there-- if I can make this work-- I can't make that work.
There's a separate point. On the back of Scooby's head, you see one vehicle-based IED, 17 improvised explosive devices, and one unexploded ordnance before Scooby blew up.
And now we see the unmanned vehicles getting into the infantry. The US Army has just started a program of record of putting one robot with every troop of nine soldiers. So there's been a real change over the last 10 years-- many of these robots are from one of my companies, but not all of them-- where the ground robots in the US military have gone from zero to over 9,000. And robots in people's homes have gone from zero to over 6 million. That's happened just since the middle of 2002.
And why has that happened? Why do we have all these robots, and all these robot forms? Oh, before I say that, I want to just say one thing, because this is MIT. These are some of the robots that iRobot sells in non-consumer areas-- and I'll come back. There are some of these in Fukushima right now at the nuclear power plant. This is an underwater Seaglider from the University of Washington, which is where Ed Lazowska is. And these were out in the Gulf looking at the oil spill last year. So not just military applications, but also other sorts of applications.
I meant to mention up here: UROPs. One of the great things about MIT is the Undergraduate Research Opportunities Program, with undergraduates working with faculty. And this company, iRobot, was started by two UROPs and me, a faculty member. And we went out and started the company together. Yesterday, we had Tom Leighton talking about Akamai. Again, UROPs were really part of that whole process of starting the company. And so that's one of the great things about MIT.
Colin Angle, who's the CEO of iRobot, likes to point out that he's only ever had two jobs with a regular paycheck. One is camp counselor and one is CEO of a company.
Anyway, why have things taken off? Things have taken off because computation and sensors have gotten exponentially cheaper for the last 50 years. We've heard that exponential again and again. I'm going to come back to that in a second, because there are some key points there that I think explain what's happened a little more. Research in computer vision and things like simultaneous localization and mapping have made major strides in the last 10 years. A lot of that work has been done at MIT, but also Stanford and other places.
And for certain tasks, robots have passed a usability threshold that makes them useful to untrained people.
You have to press this hard or long? I'm not sure. IT exponentials. We've said that again and again the last few days, but it all comes back to Moore's law. And this is Gordon Moore's paper from 1965-- a little 10-page article-- where he says in the abstract that, with unit cost falling as the number of components rises, by 1975, economics may dictate squeezing as many as 65,000 components on a single chip.
And in the abstract, the second paragraph there, he says this is going to lead to-- this is 1965, remember, when a computer cost a million dollars at least and filled a room-- home computers, automatic controls for automobiles-- and if anyone's used a 1965 or earlier automobile, you know there wasn't much electronics in them back then-- and best of all, personal portable communications equipment. So he foresaw all that.
And there was a cartoon in the paper with someone selling cosmetics right next to a handy home computer. I asked Gordon in 2005, at the 40-year celebration of his paper, whether he'd had anything to do with that cartoon. He said, no, not at all. It just showed up in the article. My guess is the cartoonist was sort of making fun of the idea of having home computers, but it turns out to be what really happened.
But this is Gordon's key graph. He's only got four data points. He plotted how many components were on a chip against the year, with those four data points. And he just extrapolated-- he didn't dare extrapolate more than 10 years-- how many components would be on a chip. And that comes out to 2 to the 16th, or 65,000.
What's an exponential? An exponential is something where the rate of change of the amount of stuff you've got is proportional to the instantaneous amount of stuff that you have around already. So is that why these chips got more and more components? Was it because you had chips, so you had computers with which you could design better chips, so you could build bigger chips, and so on?
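Brooks's definition can be sketched numerically: if the rate of change is proportional to the current amount, the amount doubles on a fixed schedule. A minimal sketch in Python, with the starting count and doubling rate chosen to match Moore's 1965 extrapolation rather than taken from the talk:

```python
import math

def exponential(x0, rate, t):
    """Amount at time t when the growth rate is proportional to the amount."""
    return x0 * math.exp(rate * t)

# Doubling once per year (rate = ln 2): a 1965 chip with 2^6 = 64
# components extrapolates to 2^16 = 65,536 components ten years later.
doubling_rate = math.log(2)
components_1975 = exponential(64, doubling_rate, 10)
print(round(components_1975))  # 65536
```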
Well, David Mindell, who is running the whole MIT 150th, in his book, Digital Apollo, talks about the lunar module computer. And that was the first computer with chips. In the Block I computer, it was one gate per chip; then they upgraded to two gates per chip-- an exponential increase. But that didn't come around till 1968. So that's not why this exponential took off originally, and it wasn't in play in 1965. In fact, I think there are three exponential forms.
The first is that the rate of improvement is proportional to the current level of adoption. And certainly, the network effects and uptake of Facebook and Twitter and all those things that us old people don't use are due to that.
The second one is that the existence of the law tells everyone what to aim for. And I think that was a driver at Intel and all the Japanese memory companies. The engineers would walk past that plot in the morning of how many bits they had to have on a chip, and in what year, and they'd know what technology they needed to use. Or thirdly, someone else is driving an exponential, and you get to hop on it for free. And I'm going to come back to that in a minute.
But first, I want to say this thing about information technology and exponentials. There's a special relationship between IT and physics, and that is that all digital instantiations of information technology are based on a one-bit abstraction of a bulk physical property of some sort. We heard yesterday, Shannon said, oh, we can have an open or closed circuit. That's a 1 or a 0. That's whether the electrons are flowing or not. It's a one-bit abstraction.
So if you look at these piles of sand and tell which ones are red piles and which ones are green piles, there's a certain amount of bulk physical stuff, but you can take away half-- you can make each of those piles of sand half the size-- and they'd still be piles of red sand or green sand. Then you could do it again, and again, and again. And the property stays.
Eventually, you get down to one grain, and then you have to go to multicore. But.
So the physical bulk can be reduced while maintaining that abstraction. That's the important thing about information technology and why you can get exponentials. And Suzanne was talking about some of the new technologies-- energy, biotech, et cetera-- being very capital intensive. I think that Sand Hill Road VCs got spoiled by this, and they sort of thought that any new technology would go exponential, but it doesn't have the same relationship in energy, for instance. So there are exponentials all over the place in information technology. And then those exponentials beget other exponentials.
And so information technology exponentials are in performance going up and cost going down. But that's not true in other technologies. And in my area, robotics, it's not completely true. There's a family of robots from iRobot ranging down from 350 pounds. These two are about 65 pounds, this is 35 pounds, and this is a two-pound robot.
They've got much the same form, but their strength is not the same and their speed is not the same. If you've got a mechanical arm and you cut off half of it, it gets weaker. It can't hold stuff up as well when you get rid of the physical bulk. So there's a relationship between mechanical performance and physical bulk which is not there in IT.
Likewise in price. One of my master's students a few years ago went around the lab and found all the old McMaster-Carr catalogs lying around. And he found five components that we'd been using in the lab for 13 years, looked at their prices, and then normalized for the consumer price index. And they didn't go down exponentially over those 13 years. They just went down a few percent. So the cost of mechanical systems, likewise, doesn't go down exponentially. The cost of energy systems doesn't go down exponentially.
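The comparison the student did can be reproduced in a few lines: deflate the historical price by the consumer price index before comparing. The component price and CPI figures below are hypothetical placeholders, not the actual numbers from his survey:

```python
def real_price(nominal_price, cpi_then, cpi_now):
    """Express a historical price in today's dollars using the CPI."""
    return nominal_price * (cpi_now / cpi_then)

# Hypothetical mechanical component: $10.00 then, $12.00 today.
# The CPI values are illustrative, not official figures.
then_in_todays_dollars = real_price(10.00, cpi_then=130.7, cpi_now=172.2)
print(f"${then_in_todays_dollars:.2f}")  # $13.18

# In real terms the price fell only slightly -- nothing like the
# exponential cost decline of IT components over the same years.
decline = 1 - 12.00 / then_in_todays_dollars
print(f"{decline:.1%}")  # 8.9%
```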
But if someone else is driving an exponential, maybe you can use it for free. And that's the good news for robotics: the IT exponentials have been very, very good to robotics over the last few years. And exponentials help robots in all sorts of places-- cameras, vision, machine learning, training set size, when you go on the web and get lots of examples and then train up your learning algorithms, et cetera. And even the number of cores on a chip. In robotics, we don't necessarily have to be quite as smart about having a general parallel programming methodology, because we can dedicate one core to processing sound, one core to processing the left image, and one core to processing the right image. We don't have to have parallel programs at such fine grain.
But exponentials do impact robots. In 1979, when I was a grad student at Stanford-- this is the old Stanford Artificial Intelligence Lab-- Hans Moravec had this project with a robot called The Cart, and I was his junior gofer. And around midnight, when most people had gone home and the DEC 10 mainframe was available, we'd set up this room and set the robot going. The flashes are because this is being recorded on 16-millimeter film and the camera keeps stopping.
You'll see this thing slide across. This is the digital camera, and it was $50,000. He couldn't afford two of them, so we had to mechanically slide it to get stereo. And then the processor would look at those images, compute for 15 minutes, and then move the robot a meter. So this was a six-hour run overnight-- getting a robot to go 20 meters over six hours.
And around 6:00 AM-- we didn't even have our own computer just for the AI lab-- the Center for Computer Research in Music and Acoustics came in at 6:00 AM and started composing on the computer, and the performance went way down. So that was the six-hour window. So that's the Stanford AI Lab in 1979: 20 meters in six hours.
This camera, by the way, was $50,000. Recently, I saw a retail camera, a webcam, for $2.99, so I bought it. It came with the USB cable for that price. Now, it was a really, really lousy webcam, but it was way better than this $50,000 camera.
So that's 1979. In 1992, Ian Horswill, one of my students, had a robot here at MIT, Polly, which would give tours of the AI lab. It went about 2,000 meters in six hours on a typical day. And by 2005, Sebastian Thrun, head of the same lab as The Cart, the Stanford Artificial Intelligence Lab, had his robot, Stanley, go 200 kilometers across the desert in six hours. So we see a four-orders-of-magnitude improvement over 26 years-- about a doubling every two years, 13 doublings.
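The doubling arithmetic checks out, using the distances and dates given in the talk:

```python
import math

# Robot distance covered in a six-hour run:
cart_1979 = 20          # meters (Stanford Cart, 1979)
stanley_2005 = 200_000  # meters (Stanley, DARPA Grand Challenge, 2005)

improvement = stanley_2005 / cart_1979          # 10,000x: four orders of magnitude
doublings = math.log2(improvement)              # about 13.3 doublings
years_per_doubling = (2005 - 1979) / doublings  # about 2 years per doubling

print(improvement, round(doublings, 1), round(years_per_doubling, 1))
```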
And so we see the exponential performance increase in these robots largely driven by the exponential growth of information technology. And a few clever algorithms here and there.
I claim the world needs robots, which is convenient, since I build them. I'll just show you one little piece here. This is the population of Europe in 1950-- a histogram in five-year intervals, men on the left, women on the right. And this is what happens by 2000. And by 2050, it looks like the snake swallowed something, and you see that people are getting older.
So over the next few years, Ed, as we baby boomers enter retirement age, not only is Social Security going to be stressed, because we're all pulling on Social Security, but those dollars will have to compete for fewer workers to provide physical services for us. And I think that's really going to drive the use of robotics to do physical stuff in the world, because the ratio of older to younger people is changing dramatically. Not just in Europe-- also the US and Japan.
I just want to finish with a few clips of practical research in robots from some of my students over the last few years. These are my PhD students, but many bachelors, masters, and postdocs have also been involved. And show you the sorts of things that robots can do today.
So this is actually a few years ago. Here's a robot looking at this person's eyes. And when she looks somewhere, the robot figures out her gaze direction and looks where she's looking. Very different from the current industrial robot: aware of the person, aware of how they're interacting. If I was going to show Ed how to do something, I'd be looking at his eyes all the time to see whether he was paying attention to the right thing, and I'd be looking at his gaze direction. We use gaze direction as a cue.
The good thing about robots is that they can affect the world. So if you have a vision problem-- you want to see what an object is against the background-- you can move it around, and that segments it from the background. So you don't have to be perfect on one image. Rather than just doing your best on an image, you change the world: push things around, change things.
And so here's the robot learning about the appearance of an object which looks much like the background it's been put on. The robot pushes it, segments it, and then in one shot gets a view of the object, which it can then use to match later. And you'll see over its shoulder there, looking inside its brain, its model, which is a 2D image model of an object. Very different from the way we people have models of objects; I'll get back to that in a minute. And here, it's integrating some of the speech work from Jim Glass and Victor Zue's group. And I hope the sound is on.
BROOKS: Single-shot teaching of words. If it's more than one syllable, it takes two more trials.
BROOKS: Now watch the robot's eyes.
BROOKS: It knows what the gaze direction refers to. And I just want to finish up by showing how different current research robots are from industrial robots. Industrial robots have very little sensing of the world; they don't know what's going on, don't know about people. But this is the robot Domo, and Aaron Edsinger, the grad student whose work this was, and there he is safely interacting with the robot-- right in close to it, playing with it. He's giving it objects that it's never seen before. It moves them around in its view and gets the extent of them, so it can manipulate them.
Here, it's aware of him, it's following him visually. And helping him with a task by being there as an assistant. And here, you see the robot, it's picked up this hammer. Didn't know about the hammer before. And it moves it around very quickly, gets its visual extent against a cluttered background, messy sort of environment. And is able to predict how it moves around.
And then it's able to do tasks-- it's just finished with that task. Here it is putting a pipe in a hole that it knew nothing about before. It hasn't seen these objects, has no model. It builds the model dynamically by looking at how those things move against the cluttered and maybe moving background. And then it does an insertion. In his thesis defense, he had the robot making a margarita.
Okay. So I'm going to finish up with that and just say that for me, the challenge is to make robots even better and easier to work with. Getting back to some of the things Patrick Winston talked about yesterday: trying to make robots do things the way humans do. Because although we can do image-based object recognition, we can't recognize classes. We can't recognize that this is a shoe if we've never seen a black shoe that looks just like that before, whereas a two-year-old child can.
A four-year-old child is able to understand language in noisy environments, and be able to understand lots of accents. A six-year-old child is able to tie shoe laces and do every task that is asked of a worker in a Chinese factory. And by the time a child is actually nine, but it makes it more fun to say eight here, the child understands the difference in what they know, and what other people know, and what other people know in terms of-- the child is able to keep track of what the other person has seen, and what they know, and how that differs from their own model of the world. Whereas when they're six years old, they think everyone knows exactly the same stuff as they do.
So if you're looking for challenges in robotics, build these sorts of capabilities or move towards these sort of capabilities, and we'll have more and more of these robots. They'll be able to hug Ed in his old age as he's getting more and more decrepit. Thank you.
Are we going to do questions? Are there any questions? Yeah.
AUDIENCE: So I see from your talk that you're very optimistic about the progress of robotics and technology in general. But you've also expressed skepticism with the technological singularity. So where do you see things stopping, assuming it will stop at some point, right?
BROOKS: Well, I think Butler was right in his talk this morning that if you're a materialist, you have to believe that uploading is technologically possible. But I think it's way too early for us to worry about that. I think that's sort of a bunch of technology people looking for eternal salvation without the inconvenience of having to believe in God. So I think all of us, not just Ed and me, but all of us here are going to die. I mean, it's just going to happen. Sorry.
I guess that's the end.
AGARWAL: Thank you, Rod. The next session, you'll see my robot come and do this introduction. It's my pleasure to introduce John Hennessy, who will talk next. John is the president of Stanford, although I'm a little dubious about his taste in graduate students. I was one of his early students in the late '80s. John.
HENNESSY: Thank you, Anant, I guess. As somebody told me recently, of all the introductions I've ever had, that's the most recent.
Okay, so I think, as is traditional in these sorts of talks, I'm going to try to both take a look backward and then a look forward, and talk about what the implications are. The March of Technology is indeed a good uber-title for this talk, because it really is about the dramatic changes, about the inflection point that we've passed through, and what some of those implications are.
Let's face it, most of the world is not going to use computers the way those of us in my generation have used computers. I started on, actually, paper tape and punch cards for a little while, and then moved to time sharing. But most of that time was with a computer that was at least on my desk and not physically movable. The next generation is going to be using these things to access the internet. And that's the way they're going to operate.
We've already got this incredible inflection point coming in 2012, which may actually move up into 2011 because of the rapid acceleration of tablets. And if you put tablets with smartphones, which I think is probably a reasonable thing to do-- they're mobile devices, they're small-- then we may actually get the crossover in 2011. And desktops are fading in this whole picture.
So it's happening very quickly. That means the growth in CPUs is actually being driven by that low end-- plus, of course, this enormous consumption of computers in the cloud, as we build these giant warehouse-scale computers that are providing much of the storage, backup, content, and information for the web.
So what happens now? Batteries become crucial. Energy becomes a key factor in determining how useful these devices are. And whether you look at a tablet or a cell phone, you quickly discover that energy becomes the limiting factor. How long does this last? Certainly, if you start using the web a lot on this, it doesn't last very long. And that's a key issue.
So battery lifetime is crucial, which means energy consumption becomes a crucial metric. By the way, it does in the cloud, too. If you look at the initial capital cost, this piece of it in red here is the cost of power and cooling. That's just capital cost-- the infrastructure to bring up a large-scale cloud. If you then look at the overall operational cost, you see the power here, combined with the capital amortization of the infrastructure, is a significant piece of that puzzle.
And anybody who's looked at the kinds of clouds that are being built by Google or Yahoo or others, the first thing you'll notice is they're not using the cutting-edge fastest processor. Because, as you'll see, the cutting-edge fastest processor is a disproportionate power pig; it's energy inefficient. And I'll talk some about why that's the case.
So we're in this age now where we're really back to the future. We rode Moore's law. And computer architects-- that's what I did for a living before I was a university president-- computer architects have basically had one function for quite some time: we take Moore's law, and the magic of this doubling of transistors that's been given to us by the integrated circuit industry, and we try to turn it into performance that's usable by the kinds of software systems that we write.
So if you think about what happened in the late '70s and early '80s when the RISC movement was about, it was really about efficiency. In the end, it was: how do we take these transistors and use them effectively, given that we're going to be on a single chip? How does that reset the picture for how we build computers? By simplifying the instruction set, without losing very much translation efficiency from the software systems, we freed up a lot of transistors that could then be used for pipelining and caches, et cetera.
And we gained runtime efficiency. I think early on, people realized that if you could spend a little more time at compilation time and less at runtime, that was a good trade for production-oriented environments.
So what did we learn in the 1980s? Well, we really discovered that the ability of these simpler instruction sets to exploit instruction-level parallelism was substantially better than the more complex instruction sets. And that's really what allowed us to get onto what I would call the instruction-level parallelism path. And it was a path that was going up like gangbusters. Every year, we were just getting more instruction-level parallelism, introducing new ideas to extract more of it, and thereby being able to deliver lots of new performance without changing the software model.
Of course, one of the things we discovered there that I think none of us suspected when we started-- and I still remember putting this on the slides that we pitched to the venture capitalists when we started MIPS-- we said the emergence of high-level languages and Unix would make software portability much better than it had ever been, and therefore new architectures would be easier to introduce. What we missed, of course, was the emergence of shrink-wrap software. When shrink-wrap software came out, it meant that carrying that software burden along-- getting the independent software vendors to move to your new architecture-- was absolutely crucial. And that was hard to do.
So silicon and power were almost unlimited in the 1990s. We rode that instruction-level parallelism path. Caches became gigantic. We got lots of performance. And then things changed.
What happened? Well, two things happened at the same time. With the emergence of the embedded world and what I would call personal mobile devices becoming widespread, energy efficiency became critical. Even at the high-end desktop, energy efficiency became crucial. The amount of power we could get in and out of the chip became a limiting factor in how we designed it.
I mean, imagine now, you go and buy an Intel Core i7: it has a detector on there that slows the clock rate down if the chip gets too hot. You can write a workload which will cause that to come on. In fact, you can write a workload that will turn the processor completely off if it gets near the critical junction temperature.
So energy is the key. The tough thing about energy is that the power consumed is proportional to how many transistors you're switching every clock cycle, times the clock rate. If you think about the dilemma that presents to computer architects, what it says basically is: you speed up the clock, you consume more power; you switch more transistors, you consume more power.
What this really says is that one needs to think about energy efficiency in a whole new way. Because either way you advance performance, you're going to pay more. But if you do it more efficiently, it will work better than otherwise.
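What Hennessy describes is the standard CMOS dynamic-power relation: power scales with the switched capacitance (how many transistors you switch each cycle), the square of the supply voltage, and the clock frequency. A rough sketch of the dilemma, with all the numbers illustrative rather than from the talk:

```python
def dynamic_power(switched_capacitance, voltage, frequency):
    """Classic CMOS dynamic power: P = C * V^2 * f (activity folded into C)."""
    return switched_capacitance * voltage ** 2 * frequency

baseline = dynamic_power(1.0, 1.0, 1.0)

# Speeding up the clock 30% costs 30% more power, even before the
# voltage bump a higher clock usually requires.
faster = dynamic_power(1.0, 1.0, 1.3)
print(round(faster / baseline, 2))  # 1.3

# If the higher clock also needs 10% more voltage, power grows ~57%.
faster_and_hotter = dynamic_power(1.0, 1.1, 1.3)
print(round(faster_and_hotter / baseline, 2))  # 1.57
```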
So a few people predicted that the amount of instruction-level parallelism was unlimited, and therefore, we would just keep extracting more and more, and this path would continue forever. Well, the answer is yes, there's lots of instruction-level parallelism, but exploiting very much of it is just tremendously inefficient. And that insight holds in lots of different ways. It holds whether you try to do it statically or dynamically. We've tried lots of different techniques and run into the same roadblock. I'll show you why just quickly.
Here's a bunch of random programs and how much instruction-level parallelism there is. If you have ideal speculation-- by ideal speculation, I mean you know a priori the outcome of every single branch-- you can run arbitrarily far ahead in the code, and any instruction which doesn't depend on instruction x can be operated in parallel with instruction x. Remember, that's ideal speculation. So there's lots of parallelism here. 18 is the lowest one, but 150 in scientific code. GCC is 55 instructions per clock-- I mean, that's incredible. If we ever got there, that's a factor of probably between 20 and 30 better than we can actually do.
Well, what happens? When you start to make the amount of lookahead finite, the number drops very quickly. So that's the first problem you run into. And the reason is there's a lot of locality in the dependences in the computation when you look across a short instruction stream. So you see this number falling down very quickly.
The other problem is that speculation basically hit some fundamental, very difficult limits to overcome. And this shows you interesting data, just recently collected on an Intel i7, that shows what fraction of the instructions that actually get sent to the processor are wasted. And you see it's 25%, 30%, 20% here.
Now remember, every instruction that gets wasted, you paid an energy cost to execute that instruction, essentially all of it. Now you pay an energy cost to undo the effects of this instruction that you should not have done. And that's what happened. Speculation just couldn't get good enough to overcome these kinds of limits. And of course, it's a probability thing. If you're 90% successful on the first branch, and 90% on the second branch, then you're only 80% on getting both of them right. And that just overcomes the advantages that speculation has after a short amount of time.
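The probability argument compounds geometrically. A small sketch using Hennessy's 90%-per-branch example:

```python
def all_correct(accuracy, depth):
    """Probability that `depth` consecutive branch predictions are all right."""
    return accuracy ** depth

# Hennessy's example: 90% per branch, so two branches is roughly 80%.
print(round(all_correct(0.90, 2), 2))   # 0.81

# Seven branches deep, the speculative path is right less than half the time.
print(round(all_correct(0.90, 7), 2))   # 0.48

# Even 95%-accurate prediction is mostly wrong 14 branches deep.
print(round(all_correct(0.95, 14), 2))  # 0.49
```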
So what happened? So we started thinking about multicore, and threads, and switching to thread level parallelism as the key issue. It has lots of advantages. It's an effective way to overcome the memory wall. Anant did a lot of work in this particular area early on using multithreading. We can exploit it in lots of ways, both through simultaneous multithreading, multithreading within a single core, as well as by going to multicore.
And Moore's law makes this easy. In fact, there was a great quote somebody said at one conference: multicore makes Moore's law the programmer's problem now. It's not quite the right metaphor, but the point was you now have to worry about getting performance efficiency; it's not the CPU designer's problem anymore.
But the big questions are: can multithreading and multicore deliver more performance in an efficient fashion? So here is some fascinating data showing the performance and energy efficiency for an SMT-- a simultaneous multithreaded single core-- running with two threads versus running with one thread.
The thing you'll notice: we get a performance boost which goes anywhere from one down here-- not very much on a few programs-- to numbers up close to two. So it's running close to twice as fast by threading two threads on a single core. But look at the energy numbers. The energy numbers are actually pretty good. And in fact, in some cases, energy efficiency has been gained, meaning the two threads running on a single core actually complete the computation with less total energy expended. Which is an amazing insight.
This would never happen for ILP. It doesn't happen when you simply try to issue two instructions per clock; you can't get this result. So it's an interesting insight that we're able to do that. But I should say, these are Java programs and a bunch of things from the PARSEC benchmark suite-- what you might call modern, computationally intensive programs.
So now what about when you go to multicore? Does this trend continue in multicore? And the answer is, sometimes. This is a more complicated chart here. I've lumped together the PARSEC benchmarks and the Java benchmarks-- speedup here, and energy efficiency over here. Notice that when the speedup is very good, as it is in the case of the PARSEC benchmarks, the energy efficiency is good. When you get less speedup, of course, you then see a lower level of energy efficiency.
So for some of these benchmarks-- in the case of the Java benchmarks-- because you get less speedup, there's less of a reduction in execution time, and there's more power consumed in a multicore environment, so you're paying more power. The result is that you don't necessarily get an improvement in energy efficiency. Nonetheless, most CPU designers would agree that compared to any alternative available in the instruction level parallelism arena, this is still better from an efficiency viewpoint.
When you combine them, sometimes you get the advantages, sometimes you don't. The problem with combining SMT and multicore is that if you have a four core machine with two threads per core, you need eight levels of parallelism to really keep it busy. And if you don't have that, then, of course, you pay a higher penalty, and you actually see this kind of falloff here when the speedup isn't very good, as happens in these Java benchmarks.
So where are we? We're in an interesting state now where there are essentially two divisions, two different ways to go forward with multicore based designs. One uses fine grained multithreading. This is sort of the path that Sun's Niagara is pursuing, and it could be something that Intel might pursue in a future processor design, more for the embedded and mobile market than for the PC market.
And we've got things like the Intel i7 and lots of processors over here that use some amount of speculation, multiple issue-- three to four issues per clock-- aggressive memory hierarchies, and tackle the problem that way. The scientific computing guys are often off in a different realm. They're trying to figure out what to do with GPUs. And someday, GPUs will be designed by CPU designers, and then maybe the problem will get fixed. But we're not there yet.
So a couple of concluding thoughts about this. We're moving in this direction. There's no going back. There's no other viable alternative on the horizon. We are going to have to accept the fact that if you want more performance, you're going to get it with more cores on a chip, and the software people are going to have to figure it out. Well, it's sort of back to the future. There's a wonderful quote in David Kuck's oral history talking about the ILLIAC IV. On the ILLIAC IV, he was not only the head software person, he was the software person responsible for the system software.
So you see what happened: a bunch of hardware designers said, well, if we build the hardware, the software will work. And of course, that's not what happened. It turned out to be a very difficult machine to program. Even for a narrow range of applications it was hard to use, although it did work quite well for a subset of those. But it's a reminder that the interaction between software and hardware is a fundamental part of the problem.
So going back to some of the points that were made earlier-- and I absolutely agree with what Barbara Liskov said earlier. Programming multiple processors in ways that are not impossibly difficult-- and today one can debate whether they're impossibly difficult, but it's close. Somebody once told me it's not hard to program multiple processors. And I said, it's not hard if I don't care about correctness and I don't care about performance.
Programming multiple processors in ways that are not impossibly difficult. Retaining some degree of portability across architectures, so that we don't have to turn every single programmer into a hardware architecture expert-- understanding things like the size of the caches, the levels at which coherency is done, the multiple levels of interconnect and how they operate. Today, if you really want to push the performance, those are all things you have to become aware of.
Having some flexibility across processor count variation. Today we have a dilemma: many pieces of code work fine if they're written for four threads, but work marginally on eight, and don't work on 16. We need to do a better job of thinking about how we get scalability when we write our code. And of course, achieving efficiency, because in the end, the reason I was willing to use some kind of multiprocessor was to get more performance, and I have to get that efficiency.
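The falloff from four to eight to 16 threads has a classic explanation in Amdahl's law: the serial fraction of a program caps its speedup regardless of core count. This is a minimal sketch with a hypothetical 10% serial fraction, not a measurement from the talk:

```python
# A minimal Amdahl's-law sketch (hypothetical serial fraction) of why
# code tuned for 4 threads can look fine there, marginal on 8, and
# disappointing on 16: the serial fraction caps achievable speedup.

def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Ideal speedup on `cores` cores for a given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

for cores in (4, 8, 16):
    s = amdahl_speedup(0.10, cores)
    print(f"{cores:2d} cores -> {s:.2f}x speedup")
```

With 10% serial work, 16 cores deliver only about 6.4x, so both scalability and energy efficiency degrade exactly as the chart suggests.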
I think this is, in the computer systems area, one of the truly most difficult problems. We're more than a decade behind, and we're more than a decade behind because everybody was dreaming that the ILP roadmap would continue into the far future, and we wouldn't have to solve this problem. Unfortunately, the investments were not made in really trying to focus on this problem.
I agree with what I think Barbara hinted at earlier: there likely is no silver bullet here. There's no one thing that's going to crisply solve the problem. But I also believe that there is great opportunity, and that it's going to take somebody in the next generation of computer scientists to solve this problem. So I urge the young people to think hard about what they might do to solve this problem. The person who solves this problem will really propel the industry along, and we'll all be grateful, because we'll all have faster machines that do all the great things we want to do. Thank you.
AGARWAL: We have time for a couple questions.
Please come forward to the mics if you can.
STUDENT: Hello. So I'm very interested in this challenge. And I was wondering about your insights on what would be the most promising or most suitable level of the software stack at which to address this challenge?
HENNESSY: I think it will probably get solved at multiple levels of the software stack. I think MapReduce is one model for one class of problems that have lots of natural high level parallelism and can be divided into tasks which are more or less identical, run across the data. But it's not an adequate model for all our applications. I think we're going to have to have fundamental progress in terms of methodology that eventually gets coded into some language support mechanisms and a programming environment.
We're going to need that. And I think it's not enough to just have this very high level parallelism. It works fine if you're searching the World Wide Web; it works fine if you're running large database queries. But the truth is, I want my little tiny device here to have a user interface which is absolutely spectacular, so that I can talk to it under all kinds of situations. It filters out all the background noise, it figures out what to do, and it really makes my life better. That is not going to be coded in that kind of application environment. So I think we're going to have to work on languages and programming methodology as well.
AUDIENCE: I appreciate your comments about the parallel processing, but I had a question about the processor architectures and how they would evolve in the future. In particular, how do you see changes in microprocessor architectures with the different applications, such as data centers versus mobile devices?
HENNESSY: Yeah, so we are seeing some divergence. I actually threw a slide out of this to save some time. Actually, it's probably the next slide in the order.
If they put it back up.
They've taken it off. Okay, they probably put it in as a backup slide. Oh, here it is. So this shows you two different microprocessors. And this is a phenomenal opportunity that only comes along once in a great while: two processors, same instruction set architecture, same compiler, same implementation technology. So now we get to compare two different organizational approaches to performance.
Okay, one is Intel's high end i7; the other is the Atom 230, which is in a lot of the very low end netbooks. So look at this. Look at the incredible performance advantage the i7 has: three or four times at a minimum, up to seven or eight times. So you say, it's that much faster, it must be a much better processor. Well, not if your metric is energy. It's worse by about a factor of two in terms of energy consumption.
So I think we are seeing it. When you see this data, it immediately tells you there may be a split in how we design things: for the very low end, where we really care about energy, and for some things where we care more about performance and we're willing to give up on energy efficiency in order to get that performance. That split may be coming; it just may be unavoidable.
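A back-of-the-envelope check makes the tradeoff concrete. Using illustrative round numbers in the range quoted above, not the exact measurements from the slide: if one chip finishes a task 4x faster but spends 2x the total energy, then since power = energy / time, its average power draw must be 8x the other's.

```python
# Illustrative ratios (not the exact slide data): a faster chip that
# spends MORE total energy per task must be drawing disproportionately
# more power, since power = energy / time.

speedup = 4.0        # the i7-class chip finishes the task 4x faster...
energy_ratio = 2.0   # ...but consumes 2x the total energy doing it

# time shrinks by `speedup`, so power must grow by energy_ratio * speedup
power_ratio = energy_ratio * speedup
print(power_ratio)   # 8.0 -> eight times the average power draw
```

That factor-of-eight power gap is exactly why the same instruction set ends up with two very different implementations for the mobile and high-performance markets.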
AUDIENCE: So watt meters are coming into use as we try to save energy at home. But as a software engineer, I have to run a separate profiler to figure out where my time is being wasted, and I have almost no introspection as to where my power is being wasted. How can we instrument the world in such a way that, now that power is my problem as a software engineer, I have the tools to attack it?
HENNESSY: Yeah, that's a very good question. This data was actually collected by hooking up watt meters to computers, because the rated power doesn't tell you anything about these processors. The max power is not equal to the thermal design power, which is not equal to the actual power consumption. They're all different measurements.
So in order to figure out how much power is actually being consumed, you have to hook up a watt meter right now. So there is now this notion that we should have measurements inside the processor-- just as we now have all these counters that tell us what the speculation ratio is, what the branch misprediction rate is, what the cache miss rates are-- that tell us what the power consumption is. And I think if we had that, we could actually begin to think about this issue. And if you care about low power devices, you're going to have to contemplate what that means.
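On-chip energy counters of roughly this kind did later appear; for example, on Linux, Intel's RAPL counters are exposed through the powercap sysfs interface. The sketch below assumes the common counter path and a 32-bit wraparound width, both of which vary by kernel and CPU, so treat the specifics as assumptions rather than a portable recipe:

```python
# Sketch: sampling a package-energy counter around a region of code.
# The sysfs path and the wraparound width are assumptions -- they vary
# by kernel and CPU -- so check your machine before relying on them.

RAPL_FILE = "/sys/class/powercap/intel-rapl:0/energy_uj"

def read_energy_uj(path: str = RAPL_FILE) -> int:
    """Read the cumulative package energy counter, in microjoules."""
    with open(path) as f:
        return int(f.read())

def energy_delta_uj(before: int, after: int, max_uj: int = 2**32) -> int:
    """Energy consumed between two samples, handling counter wraparound."""
    return after - before if after >= before else max_uj - before + after

# The wraparound logic can be checked without the hardware:
print(energy_delta_uj(100, 350))          # 250 microjoules
print(energy_delta_uj(2**32 - 50, 150))   # wrapped: 200 microjoules
```

With counters like these, the per-region energy profiling the questioner asks for becomes a software problem: sample before and after a code region, take the delta, and attribute it like time in a profiler.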
Okay, thank you.
AGARWAL: Thank you, John.