Brains, Minds, and Machines: Vision and Action

MODERATOR: And I would like to welcome all of you to the symposium, and to this first panel this morning. We start with vision and action, because they represent problems that are computationally very difficult and that have been, until now, mostly impossible to reproduce in machines. Just think about the irony of this. We have machines that can beat humans at chess, perform better medical diagnoses than many doctors, trade stocks more efficiently, but we don't yet have any robot that can be a chef in a kitchen, or a gardener.

However, it's not an exaggeration to say that vision is a proxy for intelligence. In fact, we say "I see" to mean "I understand". And motor control and planning is similarly difficult. But despite this fact, we are finally beginning to get some practical successes in robotics and in vision. And we'll have today first Takeo Kanade, and then Rod Brooks, and Amnon Shashua tell us about some pretty amazing machines, and perhaps also why none of them is really quite intelligent yet.

So let me start by introducing Takeo Kanade. He is the Whitaker Professor of Computer Science and Robotics at Carnegie Mellon. He received his PhD in electrical engineering from Kyoto University in Japan in '74.

He has held many posts. He was director of the Robotics Institute. He has founded the Digital Human Research Center in Tokyo, served as its director for the last 10 years. He has worked in multiple areas of robotics and vision, computer vision, multimedia, manipulators, autonomous mobile robots, medical robotics, and sensors.

He has many awards. He is a member of the National Academy of Engineering. He won the Okawa Prize and several other prizes. I don't want to take too much time out of his 10 minutes, so just to say that he has been one of the great pioneers in computer vision and robotics, has been a great model for me. And he will speak about past successes of vision in robotics by sketching some history of scene analysis, navigation, and object recognition. Takeo.

KANADE: I'd like to talk about vision from my point of view, progress and also non-progress.

What is vision? I think vision is the intelligent process to reach description of the scene, given an image or images, which is in a sense a measurement taken by the device, called cameras, of this three dimensional scene. Now, in spite of the fact this is probably the oldest, the earliest attempt in AI, it turns out to be very difficult. There are lots of reasons for that.

Technically, we have to deal with a large volume of data, as big as 100 megabytes per second. But that's rather technical. More fundamentally, we have to deal with the 2-D/3-D degeneracy. An image is a projection of 3-D, so directly inverting it is mathematically impossible.
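
The degeneracy described here can be illustrated with a minimal sketch, assuming an ideal pinhole camera with no lens distortion. The function name `project` and the `focal_length` parameter are hypothetical, chosen only for this illustration. Any two 3-D points on the same viewing ray land on the same image point, which is why the projection cannot be inverted from a single image.

```python
def project(point, focal_length=1.0):
    """Project a 3-D point (x, y, z) onto the image plane of an ideal pinhole camera."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    # Perspective division discards depth: only the ratios x/z and y/z survive.
    return (focal_length * x / z, focal_length * y / z)


near = (1.0, 2.0, 4.0)
far = tuple(2.5 * c for c in near)  # same viewing ray, 2.5 times farther away

print(project(near))
print(project(far))
```

Both calls print the same 2-D coordinates, so the image alone cannot say which of the two points produced it.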

Furthermore, the signal is compounded. It is the combination of many factors, such as lighting, material, viewing angle, and the media between the camera and object. So dealing with this compounded information and separating the effects is not a trivial matter.

And in AI, we actually talk about context as one of the most difficult problems. Now, the context is like this. When this picture is given, when you ask what is this, it's obvious it's a car. Now, it doesn't look like a car because it's only a box. Then if you ask why is it a car, then many people say, well, because it's on the road. That's how initially the artificial intelligence tried to explain.

Now indeed, context is playing a role. Because if you remove all of them, we only see a small rectangle. It doesn't look like a car. And if you put back context, it does look like a car.

Now, the problem is, however, if you ask then why is this a road, which gives the context of the car? And then the answer that I get 30 years ago is because there's a car on it.

Now, this is the fundamental problem, because of the vision difficulty, difficulty of the vision. That is a cyclic dilemma. And this is true in context, occlusion, attention, and so forth. We can explain all these phenomena, but the explanation actually includes the answer. Therefore, in order to build the machine to do the same, we have to resolve this ambiguity.

And I have a share of the failed attempts at vision, as well. In 1975, my student, Ohta, developed outdoor scene analysis. Surprisingly, at that time it was an enormously ambitious goal: given the input image, work out an over-segmentation, which in today's terminology is called superpixels. And then he developed a semantic net that connects all these image items, image regions, into building, windows, sky, tree, and car.

And we barely succeeded. Indeed, around that time, if you processed two images, you can write a good paper. And this is indeed one of them.

Now, of course, what went wrong? Now, of course we wrote a recognition knowledge module. For example, this is the module that we wrote. Like, how can we recognize the window and building in the image? So if you find a small rectangle, then find out if there are a grid of those rectangles, which is a good reason for believing that is the region of the building wall.

Now, all these rules, however, we wrote manually. And we said, oh, if you find such a thing, then the probability of that building region to be a building wall is we thought, OK, 1.2. And then we keep doing that. So of course, as you can easily see, we didn't go too far.

And because of this difficulty, which I think in general I call a signal to symbol mismatch, I think vision has departed from AI. I call it early divorce, because of too young marriage, too early marriage. And since then, in fact, the vision group has, I think, gone out of AI and gone to a more mechanical, engineering oriented society. And during that, I think we had some success, progress, and funds.

But let me talk about some of it from my own work and CMU's work-- autonomous driving. Early on, we could drive in the park. In 1985, we developed Navlab 1, which is made of some threes and fours, cameras and range finders, and can drive barely actually in the street, and then detect some of the clues, and so forth.

Now then, around 1995, we could actually do a campaign which we named No Hands Across America. At that time actually, Navlab 5, the fifth version of the Navlab, could steer 98.2% of the distance from DC to San Diego, completely by computer vision driven program. But only 98-- we can call it as much as 98.2%, or as little as 98.2%. But that was, what, '95.

Now as you may know, 2007, CMU group won Urban Challenge, which was completely autonomously driven, autonomously by this car. And we actually recognize and understand various traffic signs, and so forth, and then do a parking, and so forth. And a similar-- and so forth.

And also, I had fun with multi-camera vision. For example, I built a 51 camera system and 39 camera system around '94. In 2001, we can digitize four dimensional events so that we can completely generate the new view from arbitrary viewpoints. And that was back in 1995.

And then I had the fun of actually applying the idea to the Super Bowl, with the 33 camera vision system.

NANTZ: Called Eye Vision. It provides panoramic coverage similar to the special effects in the hit movie, The Matrix.

[MUSIC PLAYING]

Here at the Super Bowl, 30 robotic cameras have been mounted on the scoreboard, and all along the upper deck in intervals of 7 degrees.

ANNOUNCER: We will talk from time to time tonight about Eye Vision. If you're wondering what it is, this is what it looks like.

ANNOUNCER: You got a pass rush, the quarterback, he drops back. Look, he sees a big lane. Look at that big lane--

KANADE: To actually apply some of the vision systems to such a highly visible event.

Now, at this moment I'm now building a 1,000 camera system for the goal of completely modeling the human activities and shape and motion, et cetera.

Now, how about object recognition? Face is my favorite object that I started with my PhD, back in 1973. At that time, I could only recognize 33 people with a 75% recognition rate, 1973. Now today, as you may know, a computer actually can recognize 97% out of 1.6 million people, as long as good pictures are given. And it is reported it can actually outperform, in some cases, humans.

Now, how about detection? I think we developed-- actually, Poggio was one of the pioneers in face detection. In the mid-90s, we developed face detection, which as you may know we use everywhere today in camera and cell phone, and so forth.

And now from detection to align, we can align parts even with extreme expressions. And now we can explicitly handle occlusions by actually including visibility variables within the model. This is one of the attempts to actually tackle the cyclic dilemma that I mentioned in the beginning.

Now, this is the same idea applied to car alignment, and this program actually can detect the cars and align the car, even with occlusions. And the important thing is that this program does not just handle occlusion, it knows where the cars are actually occluded. And it looks like Roberts '65 now. The scene is actually analyzable by this kind of program, which can understand the shape of the car, et cetera.

Now, once we align such a thing, then we're going to do an interesting thing, which I call Obama speaks Japanese.

[SPEAKING JAPANESE]

Now, once we have such a program, what we are doing is, what we can do is develop a video conference room. But the difference is, in one of the channels, face cloning is inserted. So you may be thinking you're talking with a woman, but you're actually talking with a man. You may be thinking the guy is saying yes, but the guy may be saying no. So by this, we can actually understand in what way humans understand each other.

Finally, how about the original problem of scene analysis? I think we are getting back to this problem in the last 10 years, or 5 to 10 years, by actually looking at the context, object, image, and viewpoints in a complete system, most often by using some graphical model. And some of the programs actually can generate the scene description, as was promised early on-- not only the object, but also some functionality is understood from the scene. This is the example by Dr. Gupta and company at Carnegie Mellon, the most recent one. And then the program can understand where the person may sit, may reach, may lie on, et cetera.

So finally, I'd like to conclude that, going through this cycle, I think it is time to revisit vision as a search for description problem. In other words, I think it might be time to remarry AI and vision. And I think this time, we are a little more mature, with a lot more computation, a lot better models, a lot better modules that we have developed in the past 20, 30 years. Thank you.

MODERATOR: Thank you, Takeo. We'll have the other short presentations, then a discussion among the panel. And then we'll take questions from the audience, so hold your questions for a little while.

So now I have the pleasure of introducing Rod Brooks. Many years ago, I was involved-- I was at AI Lab when he was a post-doc there. And then I was involved in the easy decision of convincing him to come to MIT as a faculty, and a lot has happened since then. He just retired from his position as Panasonic Professor of Robotics at MIT to be full time at his last startup. I'm sure he will speak about it.

Before that, he had started iRobot, and I'm sure he will speak about it, because he will in fact speak about how simple animals with simple neural systems inspired the robot architecture that now has six million instances in people's homes, and over 4,000 instances in the US military. Rod.

BROOKS: Thanks, Tommy.

I'm going to talk about the action part of vision and action. And the first question is, why don't we see more robots? And I think we don't see more robots out in the world because it's difficult for us to tell what is hard and what is easy. And conventional, top down engineering often goes astray when we try to develop robust robotic solutions.

So what is hard and what is easy? Fantasy versus reality has gotten in the way for us as external observers. And sci-fi robots are fantastic, anything we see in science fiction. Unfortunately, some companies have fantastic robots, too, in the fantasy sense, where many of the demos are wind up toys and great video, but the robots can't actually do what we infer they can do by looking at the videos. And that confuses people.

And then lab demos, in all good intentions, often make many non-obvious simplifying assumptions that don't work out in the real world. So getting robots out into the real world is hard. And perception, as Takeo said, is really, really hard. Perception is effortless for people, but for robots I think it's still largely unsolved. For navigation, yeah, we've got that. But we don't have it for other things. And the latest little buzz amongst the venture capital community is oh, the Kinect, that gives you 3-D. That's going to make everything easy. But 3-D doesn't make things easy by itself.

Now let me give you an example of what's easy and what's hard. If you give me a bunch of money, I can from scratch go and take an airplane and make it so it can fly from Boston to Los Angeles autonomously. I'm confident I can do that. But something I'm not confident that I can do, get a robot to do, given an obscene amount of money is this.

Reaching my hand into my pocket and pulling out my keys without hurting myself is an incredibly hard problem, much harder than flying a plane autonomously from Boston to LA. Why is that? Why is this different?

Well, it's different because a conventional machine, such as an airplane flying from Boston to LA, operates in distinct phases, with stable boundary conditions for long periods. But for a robot, reaching into a pocket, grabbing something unknown, the boundary conditions change every few milliseconds, and the robot has to adjust to unpredictable and often unmeasurable boundary conditions, millisecond to millisecond.

And the senses that are being used provide at best changing projections into a sparse representational system. It's very hard. It seems easy. Gee, anyone can reach into their pocket and pull out their keys. But that's much harder, I say, than flying an airplane from Boston to LA.

So in normal engineering, we'd use hierarchical decomposition. And Takeo actually gave a good example of that, in his 1975 work, where you decompose into sub modules or subtasks. The decomposition, hierarchically, is used in different sorts of ways. And then trying to couple these pieces together to get the whole system to work becomes very difficult.

And eventually-- and here, I've just got the time frame, the representation here with these green arrows, we get stuck with what is the representation between the modules? What do we have to hand? What's the information? What are the symbols that we have to hand? And it's a very representational stance that standard engineering uses to understand how to do a complex robotic task.

25 years ago, I started looking at simple animals for inspiration, and realized that they operate very differently. I'm going to tell a bunch of anecdotes here. I'm not claiming to do science. I'm just looking at how animals do stuff, and looking at various literature. And I'm getting inspiration about how to change my engineering.

So in this particular sort of jellyfish, it's got swimming and feeding behaviors. And it has two completely different neural networks to do them. It has different neurons dedicated to them, different biochemistry, different sorts of connections, different transmitters. Different networks, completely, for the different behaviors. Hmm. That's somewhat different from an information processing idea, where you would have modules, and you would send messages to the modules, and use the same communication network to do that.

And so that led me to think about what were maybe different metaphors. Maybe we have this computational metaphor for neural systems. Maybe that's not the right way to think about it. Evolution didn't think about it that way. Maybe there's other metaphors which let you understand these simple animals.

Here at MIT in 1959, Lettvin, Maturana, McCulloch and Pitts, the famous paper, What the Frog's Eye Tells the Frog's Brain-- which everyone knows about, but no one has read is my observation-- looked at a frog. They put it in a-- this is the virtual reality system of 1959, Takeo. It's a little simpler than your 1,000 cameras. It was a magnet pulling around a black marker in a sphere, while looking at the neurons of the frog.

And out of that, they found there were four types of signals in the tectum coming out, which were very related to the behaviors that the frog did. One sort of signal for one sort of behavior, and another sort of signal for another sort of behavior, not a central representation of what the world was like out there.

Jumping spiders have fairly simple sets of behaviors. They follow prey, and they mate with other spiders. These spiders have eight eyes, as all spiders do. Some of them are vestigial, but the eyes are specialized for different tasks. The side eyes are used for orienting towards something that it sees out of the corner of its side eye.

This is the virtual reality system for the spiders. It's more complex. There's a drum that can be driven by a motor, so that you can change the way the spider sees the world change as it tries to turn in place on a floor that slips underneath it. And there's some fairly simple stuff that comes out. When it sees something on the side, it seems to encode how many steps it should take, and then goes open loop to turn towards the thing. And then it has to decide whether it's worth mating with, or whether it's not worth mating with. This is what looks attractive to a jumping spider for mating, the percentages up there.

And how does it find out whether something-- how does it decide whether a stimulus is an attractive looking spider to mate with? Well, its front eyes have torsion muscles and side to side muscles, and a funny shaped retina. So in other words, flat V shape, vertical flat V shape there, on the right. And when it's turned towards a stimulus, it then torsions its eye to line up with the longest elements, and then slides them side to side, back and forth. And so it mechanically scans for the correct leg angles, and that's what says what's an attractive spider to mate with. So again, very different from a scene analysis idea of trying to get a good representation of everything, but just getting just enough from the environment to know what to do.

So that led me to think about building robot control systems somewhat differently than the early marriage of vision and AI, and the early marriage of robotics and AI-- thanks for that metaphor, Takeo-- that brains are about performance, and representations are an engineering way to think about software. And maybe we get them confused when we're building robots. So I've tried to build robots based on performance, rather than internal representations.

And my reinspired engineering connected sensors to actuators in parallel, many times, with very short connections. As we see in neural systems, the diameter of the human brain is only six neurons wide from anywhere to anywhere, rather than long chains of reasoning. And each layer has some perception, planning, and action, and these layers all operate in parallel.
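
The idea of several short sensor-to-actuator loops running side by side can be sketched as follows. This is a minimal illustration, not the control system of any robot discussed here: the layer names, the sensor keys (`range_m`, `light_left`, `light_right`), the thresholds, and the fixed-priority arbitration in `step` are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional

Sensors = Dict[str, float]
Layer = Callable[[Sensors], Optional[str]]


def avoid(sensors: Sensors) -> Optional[str]:
    # Short loop: back away when something is very close.
    return "reverse" if sensors["range_m"] < 0.3 else None


def seek_light(sensors: Sensors) -> Optional[str]:
    # Short loop: turn toward the brighter side, if one side is clearly brighter.
    if sensors["light_left"] > sensors["light_right"] + 0.1:
        return "turn_left"
    if sensors["light_right"] > sensors["light_left"] + 0.1:
        return "turn_right"
    return None


def wander(sensors: Sensors) -> Optional[str]:
    # Default behavior when nothing else has an opinion.
    return "forward"


LAYERS: List[Layer] = [avoid, seek_light, wander]  # highest priority first


def step(sensors: Sensors) -> str:
    # Every layer reads the raw sensors on every tick; there is no shared world model.
    proposals = [layer(sensors) for layer in LAYERS]
    for command in proposals:
        if command is not None:
            return command
    return "stop"
```

Calling `step({"range_m": 0.1, "light_left": 0.9, "light_right": 0.1})` lets the obstacle layer win even though the light layer also has a proposal. Each layer stays a few lines long because it needs only the one or two sensor readings relevant to its behavior.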

And with that, I started building various robots in the lab, really looked at insects quite a bit to see how they worked, and built robots that could scurry over rough terrain. From that, by the way, we built a robot, Colin Angle and I, who's now the CEO of iRobot. Colin Angle built this robot for JPL in 1989, which then led to the Rocky series of rovers.

And meanwhile, we started a company called iRobot, which was going to commercialize these robots by sending them to the moon, back in 1991. We couldn't quite get there, although we did build a robot that we put in a brilliant pebble for the Ballistic Missile Defense Organization. I don't have time to show you the video of it launching out of Edwards Air Force Base. And that ultimately shamed NASA into going back to their Rocky robots at JPL, and sending them to Mars. So we managed to get this control system to Mars in 1997.

Then we went on to build, as Tommy mentioned, home robots and military robots. And these robots are out there in large numbers. And this has been a tremendous uptake over the last nine years. From 0 ground robots in the US military, there are over 9,000, now. About half of them are from iRobot. And 0 robots in people's homes nine years ago, and just iRobot alone has now six million of these robots, based on models of simple animals, inspired by simple animals to do simple tasks.

And these robots are out with the military, dealing with IEDs in Iraq and Afghanistan. They often get blown up, rather than the bomb techs, which is a good thing. We have them in Fukushima, the Fukushima reactor buildings right now. They're going out measuring radiation. We have some other robots there that are somewhat larger to do some tasks out there.

And we had them out in the Gulf, during the oil spill. These are robots that can go out for months at a time, and make various measurements of the plume and the dispersal of the oil.

And along the way, I've also been building robots for almost 20 years now with a more human form, but based on the same sorts of simple principles of building feedback loops and coupling what the robot sees, what the robot does, with whatever's out in the environment. And for these humanoid robots, it's coupling it with people, and interacting with people in a way that industrial robots today cannot, because they're just not aware of anything outside of them.

And as Tommy mentioned, I've left MIT to start yet another company, Heartland Robotics. And our goal is to try and use these principles to build a new generation of industrial robots which will bring manufacturing back to the US and change the balance of payments that we have with China. So thank you.

MODERATOR: So Amnon Shashua is the next panelist to speak. He is a professor in Computer Science at Hebrew University in Jerusalem. He got his PhD at MIT, in the Artificial Intelligence Lab, working in computational vision. And he has been a pioneer after that in work in vision, especially with the geometry of vision, the recognition of objects under variable lighting.

He has been Chairman of the Computer Science Department in Jerusalem. He is also a serial entrepreneur. His last company, Mobileye, is making devices that allow cars to see. And in fact, I am very proud to say that Amnon was a student and a post-doc in my group. He's a dear friend and a colleague, and he's probably my exhibit number one in at least showing that AI, and in particular machine learning and computer vision have achieved impressive human level performance in certain narrow domains. Amnon.

SHASHUA: Patrick Winston mentioned yesterday that commercial distraction had an adverse effect on the growth of AI. So I guess I'm representing one of those distractions this morning.

12 years ago, I founded Mobileye-- it's a company that has today about 250 employees-- to develop sensors, cameras for the then emerging field of driver assistance systems. At that point, I understood that in order to make significant progress in computer vision, one needs to muster resources that go way beyond what I can do in academia. So far, we have spent more than $200 million in developing what I'll be showing.

So the automotive industry, the cars that you drive, is the first really wide scale, highly demanding introduction of computer vision. You find there areas of visual object detection, notably detecting cars, detecting pedestrians, detecting traffic signs, and visual motion perception like ego-motion estimation, collision assessment with other cars. Very demanding performance specifications, more than 99% availability, very low false positives. For example, in detecting cars and pedestrians, one expects a false positive once every 20 hours of driving, and a false activation of brakes once every 200,000 kilometers of driving. So this is very, very demanding. And in some of those very narrow domains, one even achieves better performance than human vision, especially in traffic sign recognition, and also in pedestrian detection, especially when pedestrians are hidden in clutter.

And the future road map in this industry is even more challenging. It's semi-autonomous driving, starting from 2013, towards even more extensive autonomous driving from 2015 onward. So I'll explain a bit what that field is, and what the challenges are, and what's in store in the future.

So Takeo mentioned how difficult computer vision is. So I'll not expand on that. I'll just mention that object recognition, we generally have two types of object recognition. We have one that's a category level, like people, cars, faces. And then we have a within category level, like Tommy, like my car, my home, and so forth. And then we have motion, action, understanding action, depth, color, and so forth, and individual perception.

Now, detecting people is notoriously difficult. This is a very active area in the scientific community, in computer vision. Tommy was the pioneer of introducing machine learning into pedestrian detection in the mid-90s, so he knows how difficult the task is. And it's still very much an ongoing problem. That is because the image variability that people generate is very, very large. People tend to be found in domains that have a lot of clutter, objects that look like pedestrians near them. And it's very, very difficult to reach a very high level of performance.

So let me start with a commercial by Volvo. I don't have any equity in the company. So I'll show a commercial by Volvo, and then I'll say something about it.

[MUSIC PLAYING]

[BEEPING]

[NON-ENGLISH SPEECH]

This is a commercial. Now, in a commercial you can do whatever you like, but this is an actual system. Volvo has already sold 50,000 cars that have this system. So it's a camera detecting pedestrians, cars, and other objects, but specifically the pedestrian detection: when a pedestrian is detected on a collision course, the car emits a warning signal, and if the collision is still imminent, the car will actually brake. Now, when you go and buy a car at the dealership, they take you for a spin. And here is a clip that I downloaded from the internet.

So right there, at the end of the road.

[BEEPING]

So this is automatic braking. The driver did not brake, and on the cluster it appears that the braking was automatic.

So these are systems that you have today. Volvo specifically launched it six months ago, and it has said so far 50,000 units have been sold. Actually, if you want to buy this car there's a waiting list of at least six months, because of the system. This is what Volvo claims.

This is just a list of car manufacturers and years of launching introduction of systems like this, systems that do pedestrian detection, vehicle detection, traffic sign recognition, road analysis, detecting lanes, emitting warnings when there is a lane departure, a force feedback controlled by maintaining the car in the center of the road.

Just in 2010, we-- my company is the leader in this area-- we sold 300,000 units. By 2014, several car manufacturers are introducing this as standard fit in all cars. And in 2015, it's expected to have a widespread standard fit. Also, the regulators are playing a very active role in this area, because these are technologies that on one hand are affordable. It's only a camera, so the camera sensor is a few dollars, microprocessor is a few dollars, electronics a few dollars, that's it. So it's very, very affordable, and it saves lives. So regulators are actively involved in regulating by introducing star ratings to encourage car manufacturers to introduce these systems. And this is exactly what is happening.

In terms of under the hood, here is a clip showing what the system does. When you have a square, it's a detection of a vehicle. A rectangle is a detection of a motorcycle or a pedestrian. Red means the pedestrian is on a collision course. Blue and white means a pedestrian is in positions outside of a collision course.

What you can see here is that pedestrians are usually found in a lot of clutter. It's not that you have a classical situation in which a person is standing in the middle of the road and you need to detect it. It's standing amidst many, many distractors and clutter. And you need to be able to detect the pedestrian and only the pedestrian, and not anything else. So this is a very challenging problem.

As an example of traffic sign recognition, there are about 40 to 50 different traffic signs that a system needs to detect. Once a traffic sign is being detected, the car, the driver gets information about the speed limit information and other information related to the signs that were detected. This was launched already in 2008 by BMW, and now launched by many, many other car manufacturers.

So overall, when you look at the applications that you have in a car, most of the applications use the forward facing camera. There is lane analysis for lane departure warning, high beam control. The car can automatically turn on the high beams by scanning the image and understanding the scene. Forward collision warning against vehicles and pedestrians. The pedestrian detection and collision mitigation by braking, and so forth. And the road map is going into semi-autonomous and autonomous driving.

And the key computer vision technologies are centered on object recognition and identification-- like I said, pedestrians, cars, traffic signs. Motion estimation technologies, there's a lot of motion estimation, structure from motion going on here. And roadway understanding, which lets us understand the scene itself, in order to predict all sorts of collision situations.

So in terms of key development principles behind this, in terms of-- well, there's software and hardware. In terms of software, we use a lot of statistical learning techniques, both linear and also non-linear estimators. We use a cascade model, in which simple classifiers work densely over the image, on every pixel, and then much more advanced classifiers work only on selected regions on the image.
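
The cascade idea can be sketched in a few lines. This is a toy illustration under stated assumptions, not the production system: the image is a small grid of brightness values, and `cheap_test`, `expensive_test`, and both thresholds are hypothetical stand-ins for the simple and advanced classifiers.

```python
def cheap_test(image, r, c):
    # Stage 1: runs densely on every pixel, so it must be very cheap.
    return image[r][c] >= 0.5


def expensive_test(image, r, c):
    # Stage 2: runs only on survivors; here, the mean of the 3x3 neighborhood.
    rows, cols = len(image), len(image[0])
    window = [
        image[i][j]
        for i in range(max(0, r - 1), min(rows, r + 2))
        for j in range(max(0, c - 1), min(cols, c + 2))
    ]
    return sum(window) / len(window) >= 0.35


def cascade_detect(image):
    detections, expensive_calls = [], 0
    for r in range(len(image)):
        for c in range(len(image[0])):
            if not cheap_test(image, r, c):
                continue  # rejected early; the costly stage never sees this pixel
            expensive_calls += 1
            if expensive_test(image, r, c):
                detections.append((r, c))
    return detections, expensive_calls


IMAGE = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.9, 0.0, 0.0],
    [0.0, 0.9, 0.9, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.9],  # isolated bright pixel: passes stage 1 only
]

detections, expensive_calls = cascade_detect(IMAGE)
print(detections, expensive_calls)
```

On this grid the costly stage runs on 5 of the 25 pixels and keeps the four pixels of the bright block, while the isolated pixel is rejected. The saving comes from the cheap stage discarding most locations before any expensive computation is spent on them.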

In terms of measurements, it's not clear what kind of measurements one needs to make out of images, whether to take the pixel values, or take gradients, or take histograms of gradients, to take histograms of other measurements, to do sub-similarity measurements. So what we do, we collect all sorts of measurements, everything you can imagine, and then let a greedy approach through learning select those measurements that would reduce the classification loss as much as possible.
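
A greedy selection of this kind can be sketched as plain forward selection. The talk does not say which learning method is actually used, so this is only an assumed stand-in: the classifier is an unweighted sum of the selected measurement scores, the loss is the number of misclassified samples, and the measurement names and values in `FEATURES` are made up for the example.

```python
def misclassified(selected, features, labels):
    """Count errors of a classifier that sums the selected measurement scores."""
    errors = 0
    for i, label in enumerate(labels):
        score = sum(features[name][i] for name in selected)
        predicted = 1 if score > 0 else -1
        errors += predicted != label
    return errors


def greedy_select(features, labels):
    selected = []
    best_loss = misclassified(selected, features, labels)
    while True:
        candidates = [name for name in features if name not in selected]
        if not candidates:
            break
        # Try adding each remaining measurement and keep the one with the lowest loss.
        losses = {n: misclassified(selected + [n], features, labels) for n in candidates}
        best = min(losses, key=losses.get)
        if losses[best] >= best_loss:
            break  # no remaining measurement reduces the loss
        selected.append(best)
        best_loss = losses[best]
    return selected, best_loss


LABELS = [1, 1, 1, -1, -1, -1]  # toy data: +1 = pedestrian, -1 = background
FEATURES = {
    "edge_gradient": [2, 2, -1, -2, -2, 1],
    "gradient_hist": [1, -1, 2, -1, 1, -2],
    "raw_pixel": [1, -1, -1, 1, -1, 1],
}

print(greedy_select(FEATURES, LABELS))
```

On this toy data the procedure picks `edge_gradient` first, then `gradient_hist`, and stops with zero errors because adding `raw_pixel` would only make things worse.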

Data-- it's lots of it. This is something that goes way beyond what one can acquire in a scientific setting, in academia. We're talking about training sets in the millions and tens of millions. And they cover the narrow domain of the problem. And this is very important. The fact that the domain is narrow enables us to develop systems that actually work. When you widen the domain, it becomes much more difficult.

There's also domain constraints. We have a very constrained computing budget, because it's a portable system. Power consumption, the power consumption of an electronic system in a car is about two watts. The power consumption of a MacBook Air is about 45 watts, just to give you an order of scale.

And there are all sorts of real time constraints that dictate algorithmic approach. And actually, those constraints are good for development. Because when you are not constrained, you tend to develop algorithms that do not really meet reality later on. The fact that you have constraints pushes you to develop algorithms, then, that actually work.

In terms of incrementality, typically what we found is that there isn't a super algorithm. It's not that we have developed an algorithm that works very differently from what you know about. Actually, what has been developed are layers of sophisticated, advanced algorithms that work one on top of the other. Usually, the nth layer tries to resolve-- it's kind of a repair mechanism, tries to resolve problems that the n minus 1 layer was not able to solve.

And you can find similar things in biology. This is something that I took from discussions with Tommy Poggio, like DNA repair mechanisms, the structure of the visual cortex, the layered structure of the visual cortex. So there are indications that a layered approach is really the right thing to do. And this is something that we ended up doing in our development.

In terms of hardware considerations-- I'll finish it in a moment-- we ended up developing our own microprocessor, because there isn't a microprocessor today that is suited for computer vision. In tomorrow's panel, I'll explain a bit more about hardware considerations.

And the near future road map is all the things that take us to autonomous driving. I'll talk about this later tomorrow. In terms of the broader challenge-- I'll just end here-- one can make the following conjecture: if you have a single object class, O(1) classes, a small number of object classes, the state of the art that exists today, with sufficient resources, is sufficient to make systems that work. But humans can effortlessly handle thousands of object classes, 10,000 object classes.

Replicating resources is unpractical, unwieldy. So the next big challenge is how to achieve sublinear growth, in terms of resources, when the number of classes grows to the thousands? And this is something that I'll also describe a bit in detail, in the panel tomorrow. OK, thank you.

MODERATOR: So we have heard about quite impressive machines, but we are still a long way from being able to call these computers really intelligent. And several of us believe that this will require going beyond computers and using what we learn from neuroscience. And so Matt Wilson will give us a glimpse through the eyes of a neuroscientist, especially about planning and motor representations.

Matt's lab is next door to mine. He received his PhD from Caltech. He's the Sherman Fairchild Professor of Neuroscience in the Department of Brain and Cognitive Sciences. And-- Matt, come.

WILSON: Great, thanks. Thanks, Tommy.

So I guess I'm the token neuroscientist on the panel. And it would be impossible for me to summarize what the contributions of neuroscience and biology are to the complex problem of intelligence and perception. So I thought I would try to just point out what it is that biology does that I think is the primary challenge for robotics and intelligence. And that is what I've highlighted here.

It's not so much in the successes, it's in the failures. When we look at intelligent organisms, when confronted with incomplete information, when confronted with a question that doesn't actually have a clear answer, we make guesses. We make intelligent guesses, and those guesses are based upon a process of informed guesswork. We have an internal model, we evaluate that model, we make plans, and we hope that those plans actually work out if not optimally, at least satisfactorily.

So it's the 1.8% of the cases, that 1.8% of the time, navigating across the United States. The question is, how do you fail in those 1.8% of the cases? How do you fail when you have a traffic sign that actually is not legible and you have to make a guess. There's no information out in the world to inform you. It's evaluating these internal models.

So how does a brain construct and evaluate an internal model? Well, we believe that there is some contribution of memory. And that is that building a model requires that we try to understand how things work in the world, an inference of causality through the evaluation of time ordered events. How we interact with the world gives us the raw information to build a model upon which we can make informed guesses. And necessarily, our interactions with the world involve time and space.

Some of you may have come over here not having been on the MIT campus for a while. There are new buildings, the Koch building, McGovern Institute, Picower Institute, the Stata Center-- some of you may have been confused. This doesn't conform to my former model. Yet, I make a good guess, and here you are. Your past interactions in time and space create a record of your interactions, and now you're going to use that to make a plan, to make a guess, what does that look like in the brain?

Now, I'm going to give you a little example from some of the research that we've done in the lab of our studies in a brain structure, the hippocampus, that is essential for memory of events. Interestingly, the hippocampus is also essential for navigation and memory of space. Memory, space, and time. Recent work-- and this is human cognitive work, which is actually quite compelling, compelling and sort of challenging our notion of what memory is-- finds that humans that have damage to this structure, which impairs the ability to recall past events-- if I ask you what you did yesterday, you would have difficulty answering that question. But I ask you a seemingly distinct, perhaps unrelated question, what are you going to do tomorrow? Strikingly, humans with damage to the hippocampus have difficulty in answering that question. Ability to imagine the future is somehow linked to the ability to recall the past.

And this is the hippocampus in a rodent. And the approach that we use is to insert electrodes into the brain to monitor this hidden process, what goes on in this structure that is so critical for memory of time and space. And I'm just going to highlight these individual brain cells. The approach involves recording the activity, the electrical discharges of individual neurons. And then I'm going to show you here what does this look like? What does the activity look like in an actual experiment?

So here's an experiment, a top down view of a little maze. Sorry, the pointer is a little finicky. Rod, you got a-- yeah, give me that Star Wars pointer. There we go.

Top down view of a little maze. This is a rat. The little green circle shows you where the animal is, as we're tracking it with very unsophisticated visual processing algorithms, just little diodes on the animal's head. And the colored dots that you see here are the individual discharges of the collections of neurons here that again are color coded. So this is neurons in the vicinity of a single recording electrode, within a range of tens of microns. And we're mapping out where the animal is when individual cells discharge.

Now, there are a couple of interesting things to notice here. I'm just going to run this one more time. I have five minutes left.

Couple of things to note. So when the animal moves, you see cells discharging. And certain cells will fire at certain locations. Here's this blue cell, fires whenever the animal's here. But occasionally, the animal stops. The blue cell fires here, the animal's going to stop again. Cell's firing. What does it all mean?

Well, there's another way in which we can represent this information. Instead of showing the raw spiking activity, we're going to decode these patterns. And that is infer the location of the animal based upon the ongoing neural activity. And so the triangles here, as you see them, represent our best guess, our estimate of where the animal is based on the correspondence of neural activity and location.
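To make the idea of decoding concrete, here is a minimal sketch of one common approach, a maximum-likelihood decoder that assumes Poisson spiking and a flat prior over position. The talk does not say which decoding method the lab uses, so this is an assumption for illustration, and the place-field table `rate_maps` is made up.

```python
import math

def decode_position(spike_counts, rate_maps, dt):
    """Return the position bin with the highest Poisson log-likelihood.

    spike_counts: spikes per cell observed in a window of dt seconds.
    rate_maps: rate_maps[cell][bin] is that cell's firing rate (Hz) in that bin.
    """
    n_bins = len(rate_maps[0])
    best_bin, best_ll = None, -math.inf
    for b in range(n_bins):
        ll = 0.0
        for count, rates in zip(spike_counts, rate_maps):
            expected = rates[b] * dt
            # Poisson log-likelihood; the log(count!) term is dropped
            # because it is the same for every candidate bin.
            ll += count * math.log(expected) - expected
        if ll > best_ll:
            best_bin, best_ll = b, ll
    return best_bin

# Hypothetical place fields: 3 cells, 4 position bins, rates in Hz (all > 0).
rate_maps = [
    [20.0, 2.0, 1.0, 1.0],   # cell 0 fires mostly in bin 0
    [1.0, 2.0, 20.0, 2.0],   # cell 1 fires mostly in bin 2
    [1.0, 1.0, 2.0, 20.0],   # cell 2 fires mostly in bin 3
]

# Five spikes from cell 1 in a quarter-second window point to bin 2.
print(decode_position([0, 5, 0], rate_maps, dt=0.25))
```

The decoder uses only the neural activity, not the tracked position, so the same function can be applied to periods when the animal is stopped or asleep.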

And what you can see here is that as the animal moves, these triangles, our estimate tracks the animal's location very accurately. But when the animal stops, you notice the triangle goes away. Our estimate-- and now you notice the triangles seem to appear at different locations on the maze. What does that mean?

Well there's an interesting property of this structure, and the brain in general, during ongoing behavior. And that is that you can rapidly switch between what I'll refer to as online and offline modes.

In the online mode, when an animal is moving through space, it's actively perceiving, it's actually responding to the environment. And that's characterized by certain patterns of activity, the activity that allows us to estimate the animal's location, good correspondence between the brain state and the external world.

But there are periods when the animal stops. They may be brief, lasting only a few seconds. But when the animal stops, the brain switches from this evaluation of the external world to some kind of internal evaluation. And this internal evaluation is reflected in a change in the way in which the brain responds. You see that instead of this relatively steady activity, the animal stops, and there are periods when many cells are discharging. These bursts, brief bursts, lasting only a fraction of a second.

I'm going to zoom in, just going to take one of these brief periods. It lasts only a fraction of a second. We're going to ask, what goes on during one of those periods? Lasts a fraction of a second. And this is the-- we're going to decode the activity. So I'm just going to give you a little example. I'm going to play out--

So what we're doing here, this is about half a second. We're moving along and decoding the activity, the neural activity that's going on, when the animal is sitting and is stopped at that location. And what you note is the decoded location, rather than corresponding to where the animal is, what the animal's perceiving, its ongoing interaction with the world, rather now reflects a pattern, a sequence that corresponds to where the animal could go-- future locations. It's running through a path, or a trajectory that could correspond to what the animal might do in the future.

Now, a few other interesting properties of this. And it's unclear how this may actually relate and inform our efforts to understand how we build models, or what models look like in the brain. But one thing that struck us was when we look at these reactivated memories, they take on a characteristic form. And that is, they are piecewise continuous, with a fixed chunk size. And that is, an extended sequence takes the form of chains of shorter sequences. And that is our memory of the world, and the way in which we evaluate it, is really composed of smaller blocks, that we link together elements of experience in order to evaluate or perhaps estimate, or plan future actions.

Now, a question, an open question is, do these blocks, can they be recomposed in some way? Do they represent some fundamental compositional element of the brain from which we build and evaluate these internal models?

I think one example of how this internal state clearly distinguishes animals, biological systems from robots is the fact that these brains are active even when animals are not interacting with the environment. And so here we see the same process, decoding activity in this memory center. But now the animal is no longer even on the maze. It's just sitting here. You can see the little panel here shows a rat curled up off in a corner, sleeping. And during brief intervals, as we see here, we see the same process going on. The animal is, if you give me the liberty of anthropomorphizing, the animal is dreaming about its past experience in space.

So what we see is that the brain can re-evaluate sequential events that have occurred during these online states. The animal is sitting thinking, dreaming. This evaluation may be used to build or evaluate internal models. And there's a question of is this what thinking and dreaming actually look like in the brain, and how might that actually be used to build robots that can do the same thing?

So I would posit that the day in which we determine that robots have achieved some level of intelligence is not when they have successfully solved problems of action and perception in the world, but rather when they can introspect, think, or dream about their successes and failures.

MODERATOR: So I would like to show a couple of slides as a way to ask a question. This is work done in my group recently about the problem of visual recognition in the cortex. This problem has been described by David Marr, there in the middle of all of that picture, 30 years ago as the question of finding what is where, and vision is a way to answer this question by looking.

So what is where has a correspondence actually in the nervous system, in the visual cortex. It's an oversimplification, but neuroscientists have been distinguishing for the last couple of decades a series of visual areas called the ventral stream, which is mainly involved with answering the what question-- what is in the image, which objects-- and the dorsal stream, especially dedicated to the question of where and motion.

So we have been developing a model trying to mimic what we know about the anatomy and the physiology, the neural properties of these two streams. Here is the ventral stream, going from V1 to higher visual areas, from the back of the head to the front. Information goes from the eyes to V1, which is in the back, and then towards the front, up to high visual areas called the inferotemporal cortex.

And this model here is just one of a series of models that a number of people have developed over the last two decades, including some people here, like [INAUDIBLE]. And the point I want to make is that this model seems to be quite successful in reproducing-- although they are based just on physiological data in the macaque monkey, to first of all reproduce human performance in a rapid categorization task in which you have to say after briefly looking at an image like the one you have seen before, whether, for instance, there was an animal or not in the image. And the performance of this model and human observers, MIT undergraduates anyway, it is about the same.

This is no proof that the model is correct, but it turns out, even more surprisingly, that this class of models does pretty well in recognizing different object types in images-- pretty well means compared to state of the art vision systems, like eight different classes of objects. And it also does pretty well in a version that is designed to mimic the dorsal stream in categorizing actions, micro-behavior of a mouse. This is a system that can be used by biologists looking at the behavior of mutants, as a way to do quantitative phenotyping. But the interesting thing is that it does about as well, actually a little bit better than human technicians labeling the videos.

So I'm showing all of this just to ask this question to the panel. We have here an example of models that summarize neuroscience data. They were not designed really to outperform computer recognition systems. They were designed to mimic what the neurons in different areas are doing. And I was personally quite surprised to find that they perform as well or about the same level as a computer vision system.

So the question is, is neuroscience beginning to know enough so that it can tell engineers and computer scientists how to do AI, or go about developing a better vision system, better robots? Will neuroscientists begin to compete with the engineers? Rod, what do you say?

BROOKS: I'm not going to answer the question directly, but to say that I think sometimes we get confused. You know, I've definitely been inspired by neuroscience and by animal behavior, but sometimes other methods work better for practical problems. And you know, I think Google is a great example of that. Google search is not operating in a way that humans would search, but it is nevertheless a very useful thing.

So we have to temper it. We have to not get too hung up on it has to work just the way that a human system works. But at the same time, a human system outperforms in other ways so many of our systems that any tricks we can learn from there can be helpful, to expand the range. And I think in tomorrow's panel, we can see that the-- oops.

In tomorrow's panel on applications, I think Dave Ferrucci, who may be here, is going to talk about Watson. And Watson can do great things, but it's not the same way as humans. But nevertheless, it might be more useful for more immediate applications, in medical applications, or whatever.

MODERATOR: Takeo.

KANADE: I have been probably the engineer of the engineer approach, in the past 30 years. And I think the reality is the engineering system has surpassed the performance in the past 30 years. And then we've been hearing, even though it's better, so-called biology based or even inspired approach, but it was not as good as human. As a result, in the last 30 years, we've been hearing we have to switch approach to bioengineering based, and keep the fact that we keep hearing 30 years, the same thing, probably implies that probably in the past it was not as successful.

However, I think a lot of the reasons is that in my mind, in the past those research was mostly explanation, or findings based, not necessarily presenting the algorithm. Of course, the inspiration was there. I'm simply talking algorithm level. Like Rod said, lots of inspiration.

And also, reality is, I think it's the question is maybe ill posed. I think what probably engineering-based system is actually influencing how we model the human or biological system works. So it's a Hyundai and a Honda relationship.

Now, at the same time, I think I see sort of a new, hopefully promising trend, like your saliency, focus of attention work, which seems to be more on algorithm, how to perform level, rather than how to explain level. I think that's a new development, and I think we engineers should-- I mean, at least for me-- I'm sort of cautiously optimistic that we can learn something new from here.

At the same time, I agree completely with Rod that there's no reason that we have to believe that the human is the optimal machine. I don't believe that. Therefore, I think-- well, I'm repeating. I think the two approaches will go side by side, and I think we are maybe in the new stage of mutually learning, rather than simply looking at sort of the analogy.

MODERATOR: Amnon.

SHASHUA: I think the difference between engineering and neuroscience is like the difference between AIDS research and cancer research. AIDS research was historically focused on finding a cure. Cancer research more or less it's focused on understanding the processes that give rise to cancer. Engineering is focused on solving a problem. Neuroscience, or science in general, is focused on understanding the processes behind intelligence.

But I would say that there are certain structural dissimilarities that are intriguing. And for example, in Tommy's model, which follows the layered structure of the visual cortex, there is a structural dissimilarity between that and the way conventional machine learning algorithms work. Conventional machine learning algorithms are very shallow. It's like a single layer network, like a support vector machine, whereas the visual cortex has many, many layers.

And this is a dissimilarity that is intriguing, because maybe we engineers can think and understand why the human, why the biological system adopted, evolved into a very deep, very wide layered approach, compared to our algorithms, that are very shallow.

MODERATOR: Matt, do you want to add the perspective of the neuroscientist?

WILSON: Well, you know, I think the neuroscientist-- as an engineer myself, I certainly appreciate the problem solving approach. And I think this is where AI and computer science have focused their efforts, rightly so, in terms of performance, to solve a problem.

The case of neuroscience is it's the problem that cannot be anticipated. It's the adaptive nature of behavior. Again, in the face of incomplete information, how do we nonetheless perform adequately, and adequate performance across sort of a spectrum of contexts and conditions requires that we fully integrate the evaluation of complex state, which is what neuroscientists study. They study, increasingly so, the way in which across modalities, across systems, how do we integrate information, in order to make informed guesses with the maximal information that we have available, and the maximal internal knowledge that we can bring to bear on it.

So it is the engineering problem of intelligence, start here, which you could say is what neuroscience is ultimately trying to understand. And I think that there can be domains, for instance, in the areas of visual perception, in which we can understand principles of neural function which could be applicable, and I think could even inform, at an algorithmic level, some of the data that I showed.

Suggests particular mechanisms by which a combination of closed loop and open loop evaluation of sort of spatially estimated temporal structures. So we think about the way the brain seems to work is to represent time in terms of space. Space is-- it's not just a metaphor, it's actually sort of a practical substitution that comes from our everyday experience.

And these are the kinds of tricks which neuroscience seems to use, which may actually have immediate applications. But I think in the broader context, it is the fact that neuroscience is focused on trying to answer this more broadly defined, more integrative question, that ultimately will give us the means to integrate all of these solution oriented approaches and have what I think Rod pointed out is not just a demonstration of robots that seem to be intelligent, seem to emulate human behavior, but which can actually do so. So that's what I believe we can contribute to the larger picture.

BROOKS: It's going to be interesting to see the marketing challenge for when the Volvos are dreaming.

WILSON: That's something I would like to see.

SHASHUA: But I wouldn't be carried away with engineering successes. As Rod demonstrated, just pulling keys out of my pocket is an impossible task for today's technology. In visual perception it's the same, that there isn't a robot today that can take a video stream and process it to the level of detail that we humans can do. We can detect instances of thousands of object classes, understand action, very intricate action dynamics. No computer algorithm is even close to that today.

MODERATOR: Well, it's fair to say that engineering can do pretty well, as we have seen, in specific domains. And in specific domains, there are already visual systems that do better than humans, say for PC board inspection, for instance. So the question-- and I think there will be vision systems, I think Takeo mentioned there are face identification systems that are probably better than humans, under controlled conditions. So it will have been better than human vision, in specific domains. We have it already.

The question is, does what we know about the technology of today, machine learning, and so on allow us, do we see a way to have the broad visual intelligence of humans? Is it enough to put together the car vision system with your system-- you know, a lot of systems, and we get all the broad abilities, capabilities of human vision, or will we need to learn from neuroscience?

I want to add to that obviously, understanding the brain is a very important task in itself, even if the neuroscientists will not compete with engineers. But you know, neuroscience is a large enterprise, growing. There are 40,000 people at the annual meeting of the Neuroscience Society. And we're learning more and more, and more rapidly about the brain. So I think it's almost a given that neuroscience will tell something to engineers. And my question is whether it will be how to bridge, how to create a broad ability, for instance, in vision.

KANADE: I think one of the things that I feel the most curious and want to know is that obviously, the human-- I believe human is using some representation. You may not like it.

BROOKS: No, I--

KANADE: Well, and it seems then for us, in engineering a system, we come up with some ad hoc, or at least we try to justify-- but independent, separate representation for a problem. And we believe they are at least reasonably good, and in some cases pretty good, seems to be, to get the good result.

But human-- I don't know, do we have a separate representation for each problem? Or do we have some general, relatively common representation for perception, action, and space, and so forth, which obviously our system, human system is so robust, so flexible, so noise resistant, and so forth. And I think if we have some hint toward that, I think that is the place where the engineer's system and neuroscience and cognitive science and so forth mutually can learn.

MODERATOR: Let me read to you a question that came through the email question system we set up, which was kind of the same point. It says, despite impressive advances in vision, speech, robotics, and other fields, we still appear to be very far from matching human capabilities in any of these domains. Given that we now have immensely more data and computational power than ever before, what is holding us back? What are the central challenges that must be overcome to unlock major progress? Anybody?

BROOKS: If we knew exactly what was holding us back, we wouldn't be held back. And I want to expand on that, on going from neuroscience to engineering, and engineering back to neuroscience. I think it's a dance. And we will make mistakes going both ways. As you mentioned, how engineering can then be used to model neuroscience, and then neuroscience can be used to get clues the other way. We see that mistakes get made.

And I think a great example of that has played out over the last 50 or 60 years with DNA, the central dogma. Von Neumann talked about there is a tape, and the tape is both copied and interpreted. And then DNA seemed to fit that very early on, and only in the last few years we're seeing the complexity is much more than simple copying and simple interpretation. So it was useful for a while, but now that simple Von Neumann approach is no longer good enough to explain what's going on in DNA.

And likewise, I think we see going back and forth the idea of representations, and of course there are representations at some level, going into the hippocampus and what is the representation there and coming back. We have to refine those words over time, and define what they mean. And that's an incremental process.

So I don't think there's going to be a secret that comes from one to the other. I think it's going to be a back and forth, over time.

MODERATOR: Let us take some questions from the audience. There will be people going around with wireless microphones, so if you have a question, can you get the microphone?

AUDIENCE: Thank you. This is a metaquestion, really. It strikes me that what's going on here and now is a slice of what nature does in evolution. There's a tension between performance and understanding, and you use that discrepancy to move forward. And I wonder if anything can be learned in the engineering process from what goes on in evolution.

SHASHUA: You can try. I think one of the big dissonances is that the computer architecture and the biological machine are very, very different. And because the architectures are so different, you can compensate in one architecture, you can do things that compensate for strength in the other architecture. For example, we have today computers that play chess, that play chess better than humans. Have we learned anything from it? It's kind of disappointing. No new revelation came out of it.

Google Translate has very impressive performance because they use a huge database, a database that a human doesn't need in order to learn a language. In Mobileye pedestrian detection, we use huge databases of millions of examples, clearly way beyond what a human needs in order to learn the class of pedestrian detection. So you can use the strength of the computer architecture to compensate for certain cleverness that the biological system has, that can do what it does without these masses of data.

And I don't know if it's good or bad. It could be that-- you know, airplanes don't flap their wings. It's not that we need to make an airplane that flaps its wings, in order to say that we have the ability to fly. Could be that if we better understand the strengths of the computer architecture, we'll find the necessary algorithms that will bring us to the performance that we want to reach, those performances that humans are very good at.

MODERATOR: Now-- microphones.

BROOKS: Yeah, there it is.

MODERATOR: OK.

AUDIENCE: Crick pointed out that there was a fundamental problem in visual perception. It was called the binding problem. That is, you took a visual system and you broke it up into a whole bunch of modules, 40 of them maybe, and then it came together in a single perception. And the question is, how that happened.

And so the thing that one would like to know from the engineers is how do you put together all your data and et cetera into a single perception? And that is how, in some basic sense, engineering may help us to deal with the problem in neurophysiology.

BROOKS: Or it may be a misinterpretation to say that it does come together in the human system. And so definitely, I think in engineering we haven't brought them together because we use these specialized things for particular subcases, and don't have anything to answer there. But some of these notions go back and forth about what must be there, and I suspect that they'll kind of seem a little quaint later on, about whether those were the right questions or not.

WILSON: I think the binding problem, the question of whether or not having separate streams of processing ultimately confounds efforts to integrate information across domains, or even within modalities. And I think it fundamentally comes down to what Takeo actually pointed out as the context problem. And that is, you can imagine all of us here, independent entities, who say there's fundamentally a binding problem. How could we possibly integrate information across all of these minds and brains that are out there? And yet, the common context allows us to infer the relatedness of these hidden internal states.

So understanding context, understanding how the elements-- they don't have to be put back together. But if they are processed in a way that in the end allows us to infer that they are related, we can define that as contexts. And then the question is, once we have context, how do we map information back into that? How do we extract information from that? And what does that context indicate, as an appropriate solution?

So when we can understand how context is ultimately represented, we can say that is the solution. It doesn't have to be brought back together. It is already a whole. And so understanding how the brain then takes that assumption, an assumption of distributed processing, in which there is some common context, and then in the end, how do we evaluate a system which has now solved the problem in a distributed form, to come up with a solution. So the integrated knowledge problem that ultimately does not-- in a sense side step the problem of binding, I think, is what neuroscience does.

BROOKS: And I think often in these interpretations there's a hidden homunculus.

WILSON: Yes.

BROOKS: And it requires a different way of thinking about some of these problems.

MODERATOR: One more question.

AUDIENCE: Yes. OK.

What's holding us back? The first is the area of multi-processing. I remember old arguments I'd have with Marvin Minsky, where he said that a fast single machine could always emulate a parallel system. And since then, we found out that that just isn't so.

So this concept of having architectures which are truly, incredibly parallel, far different from our limited Von Neumann machines, which are so virus prone, et cetera, to be able to operate in a whole new way, in fact, where programming might not even be what we think it is today. It's not programmed, it becomes a learning machine. And by definition, it will make mistakes. And so what it gets down to is the concept of letting go, the concept of not being control freaks, where everything we do-- we worry about litigious issues, and so on. I'm sure iRobot and others have this as a major concern.

So that's one area, namely whole new architectures, whole new approaches, whole new philosophies, which can be done. It's just that something is holding us back, and I think it's our philosophies, our views.

The second question-- the second issue, rather, that I'd like you to address is the issue of collaboration, limited by the Tower of Babel which exists between departments, interdepartments, PhDs who learn more and more about less and less, the jargon issue, the conventions issue, the journal issues-- all of the infrastructures which really make departments very compartmentalized and very competitive, rather than collaborative, to say nothing about the corporate world and patents, which gets into a whole other realm. But that we are such a very non-collaborative species, with a lot more to be said. But just those two issues, I'd appreciate your comments on.

WILSON: Well, I'd just like to point out that one of the speculations I raised at the end, can robots do what rats seem to do, is something we're actually pursuing in collaboration with CSAIL, with Nick Roy, John Leonard, Seth Teller, thinking broadly, even within the Department of Brain and Cognitive Sciences. And we sort of pride ourselves on having this interdisciplinary environment, in which we can think and act on these opportunities for cross-disciplinary collaborations.

So I think that is what we have here at MIT. Whether we can leverage it, maximally leverage it is what we're hoping to achieve. And hopefully, what will come out of these forums, demonstration of potential, and then maximally leveraging that opportunity. But I agree, there may be practical obstacles. But I think we are certainly in a position to overcome them, and I think there are many examples of efforts to do so, the Intelligence Initiative being one of the most recent and promising, at least from the standpoint of brain and cognitive science.

MODERATOR: Yeah, the Intelligence Initiative that Josh Tenenbaum and I are trying to develop at MIT, and that is behind this symposium, is an example of trying to get various departments, and not only departments, also schools at MIT to work together on this problem of intelligence. And part of the assumption is that we need to speak even more between engineering and neuroscience.

Neuroscience needs engineering in various ways, not only as a tool, but also in order to prove that a model or a theory is valid. The first test is whether it does what it's supposed to do, whether it solves the problems that it's supposed to solve.

So you have to be an engineer if you want to understand the brain. And conversely, as I said, there are probably ways in which engineering can leverage knowledge from neuroscience.

Let's get a last question, and then--

AUDIENCE: When I was a student under Mike Dertouzos, he always talked about a computational environment, of neural computation. And one might think of minicolumns at this point. We've talked about perhaps-- there's a theory going around that the brain processes facts, and what's going on in the cortex is all of the Bayesian probabilities. Has anyone boiled that-- I guess I've got a couple of questions. Has anyone boiled that down, and how does each of you-- does this kind of thinking fit into the thinking that you're doing?

MODERATOR: I think we'll discuss this issue later today, in the last panel, and you could probably ask it again to a couple of people there, including Jeff Hawkins and Josh Tenenbaum. So thank you all, and please thank with me our panelists.
