MIT/Brown Vannevar Bush Symposium - Fifty Years After 'As We May Think' (Part 4/5)
PRESENTER: Good morning, ladies and gentlemen. We're ready to start the morning session. And I couldn't think of a better eye-opener than the pioneer of digital convergence, Nicholas Negroponte. As you well know, he is director of the world famous and still unique interdisciplinary Media Lab here at MIT, which just has been celebrating 10 years of accomplishment with a big bash.
His hit book, Being Digital, and his monthly columns in WIRED are helping to educate the lay public about the all-important paradigm shift that is happening because of digital convergence. And I, personally, am very glad that he is contributing his atoms rather than his bits here today. Nicholas.
NEGROPONTE: [? I ?] [? think ?] [? I'm going to ?] [? use this thing. ?] Ah. Keep forgetting podiums have these slopes to them.
[NO AUDIO]
Am I live? We're now live. Okay. As Andy mentioned, the Media Lab just celebrated its 10th anniversary, which is not particularly important. But what's happened is that when you make an event of that scale, we invite all of the sort of consortia that form parts of the laboratory and all of the various groups. So this has not been one day. This has been a bonanza of meetings that started Sunday night.
And so you get the dregs this morning, I'm sorry to say. I will do my best. And I don't know if anybody attended the 10th, but I may repeat some of the stories I even told at the beginning of it.
About three years ago, I became involved with WIRED Magazine. And I became involved in a very odd way. It wasn't my idea. I mean, it was Louis Rossetto's and Jane Metcalfe's. But nobody would fund them. None of the big media tycoons or none of the VCs would fund them.
And I agreed to fund them. And to protect my investment, I wrote the back page. I said, I'll write the back page, too. So if this thing goes down the tubes, at least I'll be part of it going down the tubes. And if it's a success, then let me be part of that, too. But I would invest more than money. I'd invest some energy.
And about a year and a half ago-- maybe it's a little more, a little less-- I learned something absolutely astonishing. At least, it really touched me. And that was that the largest number of subscribers were kids buying WIRED for their parents for Christmas. Okay.
And that touched me, actually. Because it was basically kids saying, Mom and Dad, this is about me. And here's your Christmas present.
Now, when I sort of heard that, and it was about the 18th issue, I said, well, maybe the thing to do is to take the 18 back pages and make them into a book for Mom and Dad, which, of course, is awfully naive. Because I'd like to just take the 18 stories, and you staple them together, and you send them back to the publisher, and they put a cover on it. It was not that at all. It was basically writing something all over again.
But in the process of doing it, I ran across-- and I'm not even sure when and how-- but sort of bumped into the idea of thinking about the world in terms of bits and atoms, and using that perspective. And that, if you looked at something as we normally do-- in fact, as laws are created and so on-- they're usually based on atoms-- things that we can touch and feel and have weight, mass, and so on.
And yet, that the world of bits was really quite different. And even though in my experience at MIT and computer science, either formally or informally, we hadn't really thought very much about the fact that bits didn't have size, weight, color, mass, and more or less traveled at the speed of light, and that this was really a very kind of interesting dichotomy.
And there are two stories that I'm very fond of saying. One will touch your hearts, I think, quite directly. Other's more related to telecommunications. But think of some of the things that have happened in the past 10 years.
And my first example will be in 1983 when Judge Greene broke up the telephone company. He told the RBOCs-- the Regional Bell Operating Companies-- he said, you are not allowed in the information business.
Now, think of that remark. Because it's actually a very strange remark, because every single one of the Regional Bell Operating Companies in this country produces yellow pages. And those yellow pages, in every case, all seven, is the highest margin business that any of them run. It's not the biggest cash generator, okay. But just in terms of margins, they're almost obscene. Okay.
So the yellow pages, which is certainly the information business, was fair game. They could do that. So if you think of what Judge Greene was saying in terms of bits and atoms, what he was basically saying was, it's absolutely okay to, more or less, defoliate, to squeeze ink on dead trees, perhaps even to use child labor to hurl these huge, chunky books over the transoms of American doorsteps-- but that it was utterly illegal and against the law to transmit a no-return, no-deposit, ecologically sound bit at the speed of light into the home.
So I mean, if you really looked at it in terms of bits and atoms, it was the most absurd position that he was taking in 1983. And, of course, it took over 10 years to even start to change it.
There are some people in this room who are actively involved in digital libraries. And you look at digital libraries. And you ask yourself, why do library-libraries, or physical libraries-- why do they work? And one of the reasons they work is absolutely-- I mean, it's almost completely based on atoms. The reason a library works today is that I take my atoms down to the library. Some of us have a few too many of them.
When I get there, let's presume I'm allowed in the stacks. And I borrow the book. Something that people don't tell you about atoms is that when you borrow an atom, there's no atom left. Okay, there's empty space.
You take the book off. You read it for a week. You bring it back. And miraculously, somebody borrows it again. You bring it back. Miraculously, somebody borrows it again a week later. And at the end of that little unlikely episode, 52 people have read that book in one year.
Now, as some of you are doing, and people that we all know, when you take that model and you build a digital library and you turn it into bits, what happens? First of all, you don't have to take your atoms down there. And second of all-- and, again, we're never told this, okay, because it's so obvious-- when you borrow a bit, there's always a bit left.
So lo and behold, 20 million people in theory can borrow the same bit. And we've violated copyright law in any country that has it, and certainly violated the sense of intellectual property.
Now, my point is really quite simple. My point is if you look at some very, very-- and, again, they're very simple observations-- and you think of them in terms of bits and atoms, and I'll get back to this afterwards, it really is a very, very different perspective.
And so I started down this path to do that and discovered in the process that what was going on in the United States in general, in the past sort of 12 months, really was much more dramatic than even the most extreme and most sort of wild predictions. We're all sort of falling short.
Let me give you an example. The official statistic is that 45% to 55% of all new personal computers are going into the American home. And it depends who you speak to. I mean, Intel will tell you it's 50-something. But the range is between 45% and 50%. And they think, boy, isn't that a big number? I mean, it's a very big number.
Well, the real number is more like 75%. And the reason it's 75% is for a reason that is obvious, again, but nobody knows how to count it.
And that is that almost every single corporation in the United States, when they look at somebody's desktop and they say, upgrade your 386 to a Pentium or something, what do they say? They say, take the old one home. Or they say, why don't you buy a laptop, and that way you can go back and forth between work and home?
So it turns out that today, 50% of black teenagers have personal computers at home. 75% of all teenagers have computers at home. And none of us, even those who were considered nuts in their speculations, would ever have predicted that kind of rate of change, partly because-- and this, again, is speculation on my part-- but I think part of it is just our body clocks aren't exponentially sensitive.
We're not sensitive to things that happen exponentially. We don't know how to understand them. Yes, we all can plot things on log-log paper. But we don't understand things that happen. Web's doubling every 50 days now. We don't know what that means. We don't understand what doubling every 50 days means.
But we do understand, it turns out that today, roughly, there's a new home page every five seconds. That we can understand. We can say, aha, yes, that's [INAUDIBLE] But the doubling part, we really don't understand.
So in parallel with, again, the kids subscribing for their parents, there was this other phenomenon of sort of exponential growth, but in a very, very odd demographic in the United States right now. If you look at the population and you find that there are basically no 15-year-old Americans that are digitally illiterate. Okay.
Now, if I count Nintendo and SEGA, that's 100% true. And if you pulled Nintendo and SEGA out of the equation, it's still roughly probably 90% true. Or at least, it's so true that it's as good as 100%.
And then what happens? It's really very odd. It's a period in time that won't last long. But it's a very odd phenomenon. The highest percentage of people getting online as a percent of their age group are the elderly-- 55 years old and up-- right now. And that really is true.
So what do you have? You've got this big swell that moves very quickly. Anybody who's a parent in this room knows how fast that moves. So you've got this big swell here at the sort of lower end. You've then got this other swell at the other-- not as big, but at least growing very, very rapidly. And what you have in the middle is what I call the "digital homeless."
And the digital homeless, who are they? They're affluent. They're literate, well-educated. And it's people who have the one thing that the other two groups-- or sorry, missing the one thing that the other two groups have in spades. And that is time. They don't have time. They're just too busy. And they're not, like ourselves, sort of in the field.
So you get this very odd sort of distribution right now. And it's the demographics, at least in the United States right now, is indeed very, very peculiar. It's not normal at all.
And probably in four years it'll all shift over. But for the moment, it really is a very special condition, which, in my opinion, leads to a lot of the-- I don't want to call them "problems" because they haven't come up as problems yet, necessarily-- but leads to a lot of kind of the misunderstanding. Because that same digitally homeless group tend to be the people who are making laws and decisions and run companies and do all sorts of things. So in some sense, the intellectual and management power of the United States tends to occupy that space in the middle.
So, again, I started sort of thinking about it. And after sort of pushing the book Being Digital out, I realized when I finally shared the podium in Los Angeles with Don Rickles and [? David ?] [? Koresh's ?] mother. I said, boy, yes, computing maybe is becoming a rather popular thing. And it was a very odd experience. And I don't recommend it.
And certainly don't intend to-- actually, it was really bizarre. I shared the podium with a four-legged llama. Not sure why. And then I shared the podium on another occasion with [? Martín. ?] [? Martín ?] was a man who changed his sex to a woman in order to have a lesbian relationship with his wife while they brought up the four children.
So that was considered on the same wavelength as Being Digital.
And so it was an odd experience for me. But in doing that, one of the things that happened to me was when I was in Michigan. You sign your life away for two weeks. And the publisher, then, almost puts you on a sort of an orbit for that two weeks. And I happened to be in Michigan in February the day after this kid, Jake Baker, was arrested for posting his message on whatever it was-- alt.sex.bestiality or something.
And I don't know if you remember the story. But what he had done is he had written this story, posted it. I forget how many days it lived out there. And a guy in Moscow, okay, which is a cute detail, logged in to whatever it was-- alt.sex.bestiality-- and was offended. Now, digest that for a second.
My image is, walking down a street in Amsterdam, passing what might have been a store front selling I don't know-- god knows what. But the glass has been painted over in black. And it says, with giant, orange letters "sex shop." And dangling from the front door [? are ?] inflated prophylactics.
And you go through, into the door. You're not going to get offended. I mean, you've made a pretty fundamental decision when you go through that door.
And this Russian, in my opinion, made that same decision. Read it, got offended, and happened to be-- and, again, I don't know how many people know the exact story-- but happened to be an alumnus of the University of Michigan.
And he complained-- I don't know if it was to the alumni office or to the president's office. But he complained. And lo and behold, at 11 o'clock at night, the sheriff arrests Jake Baker, confiscates his reading glasses.
Okay. Dead serious, okay. Plop him in jail without bail-- okay, without bail. And for those of you who know the story, the part that I've left out is that he had actually used a real person's name. And there were some people claiming that-- oh, I don't know.
At any rate, I don't even want to try and imagine what the case is, because it's the without bail part that got me. Because I didn't think we did that in the United States. And there he sat for over 30 days without bail.
The reason I knew about is because his mother and stepfather came to see me and asked if I could help get him out of jail. And he did, indeed, finally get out on bail. And I don't know what effect I had. But at any rate, he got out on bail.
And when it hit the courts in June, it was dismissed. The case was dismissed as frivolous-- that the government's case was frivolous, that it was fiction. And it didn't get beyond that point. It was just dismissed.
And when you see something like that happen-- and, again, I'm certainly not a lawyer-- but it seems to me that the law is, by definition, a reactive phenomenon. And when it reacts sort of so absurdly, a little bit like a floppy, flopping, sort of dying fish on a dock, I think something rather big-- it's sort of like an early warning system.
Just, again, it's another story I'm fond of telling-- happened a little bit before the Jake Baker incident. And less people knew about it, at least the first part of it. A lot of people knew about the second part.
What happened was, I think it was in the middle of January, a cleric in Pakistan, actually, asked officially that the United States extradite Madonna and Michael Jackson to stand trial for having violated fundamentalist Islamic law. And this did not get the front page of any paper. In fact, I'm not even sure the State Department responded officially. And most people sort of just laughed it off.
But almost at exactly the same time, a couple whose name I can't remember, in Cupertino, or actually the neighboring town, had some bits on their computer that were absolutely legal by Cupertino standards and legal by California law and legal by [INAUDIBLE]
And some guy in Tennessee logs in, sucks the bits over to Tennessee, doesn't like what he sees on his screen. Calls his sheriff. The sheriff doesn't like what he sees on the screen. The sheriff in Tennessee calls the sheriff in Cupertino. They not only arrest the people, they extradite them to Tennessee.
Okay, we didn't do this with Michael Jackson and Madonna. And a lot of parents would have wished we had. At any rate, there he is. He is put on trial, found guilty, and is guilty today. I mean, they're appealing, but it's still there.
And you say, wait a moment. What's going on? And what's going on is, in my opinion, just something very simple, that this is a big one and we don't understand it, and that the bits and atoms issue, in fact, people will say it's all hype, okay.
Yes, there's a lot of hype. But it is a wild understatement. I think what's going on right now is the tip of an iceberg that, in the next year or so when you get things like untraceable digital cash, when you get some of the intellectual property issues, and all of these things that start to come up, which will all surface in the next 12 months-- it's sort of a big way. I think it's a nontrivial period.
The fact that it's the 50th anniversary that created this event-- it's the 50th anniversary also this year of Arthur Clarke's famous paper on satellites. And I mean, there's a lot of sort of 50th's. It's the 50th anniversary of the UN. There seems to be a lot of 50th's, but what's going to happen in the 51st year is going to be pretty dramatic of all of those sort of events.
And I think it's a really big deal. And I think we've all underestimated it. Okay, every single one of us. And a lot of people in this room were very much part of things like the internet and things like the world wide web and multimedia and hypertext.
My god, I mean, what a group-- I mean, of all the people. But I think it's fair to say that every single person in this room, okay, underestimated it. We didn't think it would be as big as it's turned out.
And, again, I think if you just try to understand what it will do to electronic commerce, and so on-- some of us use big words and strong voices and parade with great energy and can say these things-- but deep down in your hearts, I think a lot of you-- including myself-- we don't really believe that it's as big as it probably is. And I think that's quite important. And, again, I'm not sure a symposium like this is the one that's going to change that.
Andy has asked me to sort of end 5 minutes early because there's some group photograph that has to be taken. And I've got a deadline at the other end. And I want to leave time for questions. So I'm only going to use about 5 or 10 more minutes and sort of tell you where I think, again, we-- whatever that means-- what the Media Lab has been doing, since we did celebrate our 10th.
And it's kind of a soul searching phenomenon-- sort of what we think we did and where we think we're going. Because for the very same reason that bits are weightless, colorless, sizeless, travel at the speed of light, we can't experience them. Bits are not something that humans can experience in any way until they're turned back into atoms.
Now, yes, maybe we can start wiring ourselves in the future. But for the time being, we really have to turn them back into atoms. And the art of turning atoms into bits and bits into atoms has, in some sense, been-- though I would have never said it this way-- has been what I've spent my career doing. It's what people like Andy started 30 years ago, Alan did 25, 30 years ago. Some of us-- I mean, there are just many people in this room who were doing that, though I don't think we thought of it in those exact terms.
One of the things that happened in the late '70s is that a few people-- and it was sort of considered sissy computer science at the time-- felt that there just wasn't a sensory richness to this sort of bits to atoms conversion, and that maybe the right thing to do is to look.
Because as the kind of jingle at the Media Lab is, bits are bits, and there's no such thing as video or audio or data. They're all bits. And that you should get into the so-called multimedia environment, et cetera, and the kind of sorts of things that we've done.
What's happening now is that that agenda is over. Industry [? has ?] taken at least the bits into the atom side of it. The richness of the output of most computers is by no means a done deal. But it's at least, compared to 10 years ago, quite extraordinary.
The reverse is still a mess. I mean, my god. I mean, the path back in, in fact, in my opinion is as decrepit as it was 20 years ago. But so that's obviously on the agenda for the next decade.
But we have always-- and "we," again, I'll speak for the 300 people at the Media Lab vaguely-- is that we've always thought of turning the bits into atoms in order to experience the bits. And so the atoms were there, just because that's what you see. And it's the phosphors that glow, or you wiggle a little air and you make a sound, and so on.
But the pervasiveness of the atoms was not particularly important. And that's why, as part of our agenda over the next decade, we've started this thing called "things that think." Because there is, now, a new space that has to do with the personality of objects. And that personality comes from thinking and linking and so on and so forth, and will certainly be for us a major agenda.
And in fact, there's a very big consortium that was launched or was announced on Tuesday which has a lot of, for us, new companies. In fact, it's rather exciting, because it's companies like Nike [? Sneaker-- ?] or Nike the company-- [? they ?] make a lot of other things other than sneakers. Federal Express, Steelcase Furniture. I mean, it's a whole different world for us.
And what they have in common is that they make things, but they don't talk to each other at all. And one of the things we have started doing is to work more extensively, because people had been doing this for a long time, on wearable computing.
And it turns out that some of the faculty have discovered that you really can ship 100,000 bits per second through the body using the body basically as a bus. And it's a whole new definition, I guess, of a "backplane."
And what that does is, that if you shake hands with somebody, you really can exchange 100,000 bits per second. So maybe in 10 years from now, the next conference you go to, you just shake hands with everybody, and then go home and print out all the calling cards.
Or you might go to a phone booth and pick up a handset, and just in the few seconds it takes to get that handset off the hook to your ear, you have downloaded 100,000 bits per second for the three seconds and put in all of your speech and voice patterns and everything so then, all of a sudden, the handset, it's much easier for it to do the speech recognition, et cetera. And, again, in the things that think, if your body can both radiate and take in the amount of bits, it's probably a pretty interesting thing to be working on.
So those are the kinds of things that I think we'll be doing over the next decade. And what it all gets down to-- and this is, I guess, sort of the right place to end-- is that computing really is not about computers anymore, okay.
And if you think it is, you're in the wrong field. Okay. Computing is about life. And it really has left the desktop. It's left a lot of the things that we have taken for granted.
And that's sort of a hard thing to accept. But I think it's important-- in fact, I think it's very important that we recognize that. Because otherwise, you're just going to just be scaling up that kind of junk sitting on the table. And that's not what it's about. And I think it's time that we recognize that.
And having said that, I'm going to leave the next 10 minutes for questions. And then we break for Andy. And if there's no questions, I'll ask a question of myself, so don't worry.
NEGROPONTE: Oh, you have to go to the mic. Okay. Okay. Sorry, I didn't know the protocol.
AUDIENCE: Yeah. So I'm [? Michael ?] [? Lesk. ?] One of my problems with all these intelligent objects is that they argue with each other about control.
I heard a wonderful talk from a gentleman at Philips who said that, well, between the set-top box makers and the TV set makers, and even the remote control makers, everybody wants their box to be intelligent and in control and all the other boxes to be dumb peripherals. And I got no sense-- and I, in general, have no sense-- that we're learning how to solve the social problem of persuading all these companies that they should be making co-operating boxes. And I just don't know what we do about that.
NEGROPONTE: Well, in our image of things that think, we don't think too much about boxes, okay. Because as soon as you say the word "box," it seems to me a giveaway that you have already got a computing engine of some sort that it's expected to do something that is already of a digital nature. For us, we're thinking in terms of doorknobs, hinges, paint, thumbtacks, nails.
It's like a very fine grained parallel processor versus big, chunky things. Because a set-top box and a TV set are, again, in that family. And the set-top box makers are going to be history for another reason. So we can dismiss them. And they're going to evaporate. The TV manufacturers are going to be history for yet another reason.
So I don't care about them too much. But I do care about the people who make toasters and people who make microwave ovens and so on and so forth. Because, again, the personality of an object-- take a doorknob.
It would not be very difficult to get people to appreciate the fact that if the doorknob could see, could listen, could recognize burglars, could recognize the Federal Express man and sign for you, could open the door when you're coming in with packages and so on, that that might be kind of enhancing the behavior and, again, the personality of the doorknob.
The likelihood of that happening in the doorknob is quite small. Okay. Right, at least for the next few years. And even if it could happen in the doorknob, you don't want to replicate it in all doorknobs, necessarily.
And this is where you're dead on-- there's going to have to be a lot of linking going on. And these manufacturers are going to have to talk to each other. But it's not so much, John Malone talking to Apple Computer, or talking to Zenith. It really is Yale Lock talking to-- I don't know-- Stanley Something-or-Other, talking to whoever makes carpet, et cetera. And I think it's a different grain. Yes.
AUDIENCE: With respect to the horror stories you just told about people getting arrested and extradited and stuff, would you consider that to be a reason not to go online in the first place?
NEGROPONTE: Mm-mm. No, not at all.
AUDIENCE: I mean, you're suddenly being found by a whole new category of people who didn't know about you before and you don't know what's involved. [LAUGHS]
NEGROPONTE: No, I wouldn't even begin to recognize that. My goodness. I think I can probably personally claim to have produced 50% of AOL's new users in the past year. I wouldn't say that at all. I think what's happening-- and the part that's grotesque-- I mean, the US gov-- there's something-- did I see Bob Kahn in the back of the room? Oh, I see, yes. Hi, Bob. I mean, you guys down in Washington seem to have all your sign bits wrong.
Not you, personally, but the people who surround you-- sort of the government.
I mean, the export laws on encryption are mindless. It guarantees us one thing-- only the bad guys will have it. The Senator Exon stuff is mindless. I mean, because you obviously can't argue too hard against him, because you become a child pornographer if you do. But you can't control it. And why doesn't somebody tell him that?
I think government is going to find-- just in general, the nation state's going to find less and less of a role. And it's already found less of a role in the financial community. And like all things digital, they seem to get bigger and smaller at the same time. And I think you'll find that governance will get smaller and more local and more global at the same time.
AUDIENCE: Mark Bernstein, Eastgate. One of the themes, especially in Bush '45, also in Engelbart and Nelson's original work, was augmentation, liberation, better people. And in the last year or two, it seems we've seen the unexpected side effect come to roost-- laws that a transgression is worse if you're carrying a gun or do it with a computer.
NEGROPONTE: I'm sorry, [? laws ?] [? are ?] worse?
AUDIENCE: Well, for example, pornography is bad. But pornography with the aid of a computer--
NEGROPONTE: Is worse, right.
AUDIENCE: --will get its own law.
AUDIENCE: Theft is bad. But computer theft is worse.
AUDIENCE: Trespassing is bad.
AUDIENCE: But trespassing on someone else's computer is a felony. And this seems to tie in with what you were saying about the [? Gulch, ?] the people who are too busy or otherwise not tied in. But since they are also the people with the power--
NEGROPONTE: Yeah. For a while.
AUDIENCE: For a while.
AUDIENCE: How do we keep them from stomping out the whole process for a while?
NEGROPONTE: Because they can't stop the whole process. They can do anything they want. And my favorite analogy-- and I want to be very upfront. I first heard this from Mitch Resnick, and then I modified his stories.
But I love the story of the ducks flying south, and that there's some people that think the front duck is leading, which, of course, it isn't. My variation of that story is that if you go out-- and I don't recommend it-- but if you go out and shoot the front duck, they'll all scatter. And then they'll come back together, and there'll be another front duck. And it is not the vice president duck that became president duck, okay.
And there are people in Washington who think that. And the internet works like the ducks. And until people recognize that ducks flying south and the behavior of the net are one and the same, and unless you shoot all the ducks, you've got nothing.
And so really, it's like fighting against HDTV, which I did vehemently four years ago. And I haven't spoken about it [? since, ?] because it's just going to die of its own weight. Okay, the so-called "grand alliance" is just going to poop out. It's going to look like a sort of yak dung in about two years, on its own accord. And that's why the same thing's going to happen with the people-- they're just going to evaporate, because they just shoot one or two ducks.
AUDIENCE: Yeah. [? Matt Erst ?] from Brown University. I'd like to ask you a question about, basically, bits and atoms. Atoms have the wonderful property that they're accessible. If I get a book, anyone can read a book, whereas bits have to be interpreted. And there's an encoding. And encodings become obsolete.
So as we go to digital media, what steps do you think we need to take to make sure that it's accessible to everyone, both during one time period and for future generations [? whenever the ?] encoding changes?
NEGROPONTE: Well, the future generation part's separate, in the sense of how do you archive the bits. I mean, you could drill holes in tellurium or something that loses a few angstroms a century and then probably preserve the bits in many different ways.
But in terms of the accessibility-- and a lot of people told me, hey, smart ass, you think it's so digital, and yet you publish this stupid paper book which isn't even on the net, okay.
It's there right. And you know why it's not on the net? Because I may be a smart ass, but the publisher is a smarter ass and won't let me put it on the net.
And it turns out that the medium right now, at least for that group, that slug in the middle, the display device they have is a book. And one of our biggest projects starting up at the Media Lab is to make electronic paper. And we think we have a way of making a pulp-like medium with a very high contrast ratio that you can write in parallel.
You basically bind 100 pages of this stuff into something that'll look like a book. You write it over 20 minutes. Get, again, the contrast ratio of paper. When you're finished, you plug it back in. It sucks the bits out. It takes away the thing. And then you write it again.
And if we can pull that off, we'll actually have, in a few years, a damn thing that looks and feels like a book. But believe it or not, is erasable and rewritable and so on. So maybe that's an interim technology. So there are a lot of bets you have to make along the way. I'm not trying to ride on one particular bits, atoms transducer. Okay, give it to me hard. [LAUGHS]
[LAUGHTER AND CLAPPING]
Or this is the last question, okay.
AUDIENCE: It's very simple. Two years ago, New Media Magazine asked me to interview you. And we had a very nice chat. And I asked for a definition of "media." And you got kind of flustered and said you'd get back to me.
And I just wondered if you have one.
NEGROPONTE: Media is like air in the sense that you don't notice it until it's missing. Okay. And that's probably really what's the common denominator. Because see, one of the reasons-- and I get into trouble when I say, the message is not the [? medium. ?] People really dump on me, especially Alan Kay, who knows a lot more about Marshall McLuhan than I [? do, ?] will sort of give a very sophisticated response to that.
But I really have come to the conclusion that whatever carries the bits, and whether you think the carrier is something that has a message in it, even though the bits may be transcodable in the receiver, which is, I don't think, something McLuhan had thought about, that you could actually transmit bits, and that they actually have no, quote, "form" until they arrive at the receiver, whereupon they get transcoded into one form or another, again, it's really like air. It's not that it's paper. It's not that it's TV. It's not that it's a screen. And so it really is the bits. But it's what air is to us.
AUDIENCE: So you've said what it's like.
Could you rephrase that as a definition?
NEGROPONTE: Could I rephrase that as a definition?
AUDIENCE: Yes. You said media is like air.
NEGROPONTE: Yeah. Okay. The media is air.
AUDIENCE: So-- [CLEARS THROAT]
AUDIENCE: So wind, then--
AUDIENCE: --would be the motion of bits in some direction under pressure? And weather would be, let's say, the global infosphere.
NEGROPONTE: Sure. And a fart will be a fact.
I think my time's up. [LAUGHS]
PRESENTER: Thanks very much, Nick. I think you did wake us up. All right. Photo op for five minutes. Take a stretch break. The rest of you, Raj Reddy is next. Would the speakers come up front right away, please? Nick really does have to leave.
It's a great pleasure to welcome Raj Reddy to the podium. Raj is, of course, the AI guru's guru. And he's best known for his pioneering work in speech recognition.
He just told me this morning a little detail that I'd forgotten. At the '68 Fall Joint Computer Conference, he competed head to head with the mother of all demos.
And in a parallel session, presented CMU's pioneering work on computers with hands, eyes, and ears. And he says, just like [? Doug ?] is still working on his program of research, so is he. And so is his school.
He's dean of the School of Computer Science at Carnegie Mellon. And that includes not just the Department of Computer Science, but also the Institute for Robotics. And he's the Herbert A. Simon Professor of Computer Science and Robotics.
He's won many awards. I'll only mention the Gomory Award and the all-important Turing Award. Turing Award is the big one in our field, as you know. He's also the only person that I know of in computer science who has a Légion d'honneur, presented by President Mitterrand.
I asked him to speak about an aspect of Bush's writing that doesn't get much press, which is his musings on speech and language understanding, and other things that are related to that-- the vocoder-- and anything else that he wants to tell us about. Raj, please.
REDDY: Thank you. Thank you, Andy. In reading the paper, Bush, and thinking about what he must have been thinking, it was clear to me, as you know, he was the crown prince of scientific establishment in the '40s in this country. And he had many problems.
And the key problem he faced was access to information on demand. And so he invented the Memex and the idea of multimedia databases based on the technologies of the day that he could think of. The theme of my talk is, that problem is still there with us. And it'll be there for 50 more years, when we have the 100th anniversary of Bush's famous paper.
The second thing we want to keep in mind is that time period between about 1945 to '55 was a very important period for the intellectual ferment that it caused to happen as a result of the digital technology being born. While Bush may not have fully understood the implications of digital technology, Alan Turing, who by that time had already invented the symbol processor and decoded the German Enigma encryption scheme, and Simon and Newell and Minsky and McCarthy were all beginning to think about the computer as a symbol processor, and how symbolic processing would come to dominate computation as a whole.
And that vision is the thing, I think, that's, for us, kind of important to go away with. Namely, not the specific details that Bush proposed, but the fact that he thought, perhaps there may be ways in which one can capture multiple media, whether it's images or speech or language and text, and have it available at your fingertips on demand.
And so what I hope to do today is to kind of quickly review the progress in each of these fields and kind of show you a videotape of the current status as I see it. And then spend a few minutes talking about what the future might hold in all of these areas.
The other important link to people like Doug Engelbart here is what he was looking for is a way of amplifying [? his ?] own capabilities-- human augmentation. What all of AI is about is about enhancing the capabilities of the human being. "Intelligence amplifier" is the phrase that has been used.
The question is, what are the kinds of things we as a community can do that will make each of us 10 times, 100 times more [? effective, ?] [? then, ?] in the things that we need to do and we want to do, than we can physically be, given our human capabilities?
So those are the twin themes-- human augmentation and multimedia databases-- that I think are the central theses of Bush's paper. So if we look at it, he proposes the capture of image information of various types. And I'd like to review for you where we are in that. That includes capture, storage, retrieval, and processing in some form. He wasn't quite sure how it might be processed.
He had known about the vocoder at that time. Dudley had already shown it. And he thought perhaps it might be possible to begin to use voice input, both for annotations-- I know these days we call it the "talking paper." You can add electronic annotations to your page. And so paper which won't talk back at you can suddenly begin to talk.
And things like that, he was already thinking about. And the whole issue of hyperlinks and hypertext and language processing and text processing was innately connected with all of these [? ideas. ?]
What he did not say, and perhaps was not even thinking about, is the next issue in information on demand. I think several people pointed out-- I think it may be Bob Kahn-- and Herb Simon put it very succinctly. And if you read the recent Scientific American, there's a quote from him-- we have a wealth of information, but poverty of human attention.
The whole issue of future multimedia databases is going to be not simply providing information on demand, or data on demand, but getting the relevant information in decision-ready form to the person that needs it. People that manage that transformation and have those kinds of tools at their fingertips would be 100,000 times more effective than the rest of us who don't have those tools. And that is the key of human augmentation, in some sense.
So where are we in each of these technologies? So in image processing, I kind of first got introduced to image processing when Marvin Minsky came to Stanford in '64. And in our own hubris in AI, we thought we could create a Mars Rover that we could put on the planet Mars in 10 years. And [? Marvin ?] and John McCarthy thought we could do it in three years. So we started interfacing an image input device to the PDP-6, I think it was.
And so the fact that you could actually transmit images or information was known even to Bush. But the fact you could capture them in digital form was already practical in the early '60s, although very painfully.
But only last year it became possible to capture full video for hours on end, and put it on a disk. Until we got a card that could plug into your PC or workstation, which would do [? MPEG ?] compression in real time, it was not practical. It was very painful to get any kind of moving data.
So we had to wait almost 30 years. It was, you could always capture a particular picture. But capturing moving scenes was not practical until recently.
So what we can do now, routinely-- and there's no magic to this-- we can digitize, compress, store, retrieve, and display images. And that was primarily the vision of Bush.
What we cannot yet do is what Michael was saying yesterday about image interpretation. That's not quite true. We can do some.
In fact, we have a van-- autonomous vehicle-- which drove from Washington DC all the way to San Diego. And 98% of the time, it was able to interpret the images it was seeing and drove by itself. Whenever it got confused, it beeped at the driver and said, sorry, I'm confused. You better take over.
So the theme of AI has been slowly shifting from replacing the human being or doing everything that a human being can do completely autonomously to the theme of human augmentation. Let us do everything that we can do. And if there are 5% or 10% that we don't know how to do automatically, let the human being take care of it. Please wait. I'll call my master. All right.
So that is the theme. And so in the image processing, we can do road navigation, digital mapping, robotic vision. People use that routinely. And there are all kinds of tricks they have to do.
But the broad idea, if [? you ?] can give someone any random sequence of images and say, interpret them completely, we would fail. But if you said, interpret them 80% of the time, and let the human being annotate the rest of it, I think it is doable. It's not been done, because ARPA hasn't provided the funding to do it yet. [LAUGHS] But it's the question of, what is doable at any given point in time?
So what I'd like to do is show you the first videotape in one minute. And this tape essentially gives the next generation vision of a digital camera. A digital camera you can buy these days from Canon and Kodak, and there's even a standard that's being established, so you can take digital pictures.
What you could not do with the old cameras and the new cameras is to have a camera which will produce a panoramic [? vision ?] of a 360-degree vision. And Apple recently introduced a thing called QuickTime VR where you could stitch together images for virtual reality purposes. It was a poor man's virtual reality.
And so you can actually turn around. And as you turn around, you could see what you would see. But the problem with it was, to take those pictures you had to have a tripod and do the whole picture very [? precisely ?] [? on a ?] plane so that stitching could happen.
One of our students went to Apple. He was working on 3D vision. And the transformations you need if you didn't want to have those restrictions, [? are ?] you have images with distortions that result from pan and tilt and zoom.
So you have translation and rotation and perspective distortions. And if you can actually solve those problems and stitch together multiple images to make a composite image, then suddenly, you have a camera which is currently not practical.
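The talk does not spell out the student's method, but the transformations named here can be illustrated. For a camera that pans, tilts, and zooms from roughly one spot, translation, rotation, and perspective distortion between two views are commonly folded into a single 3x3 matrix (a homography). The sketch below is only an illustration under that assumption; the function names and all numbers are hypothetical.

```python
import math

def make_homography(angle_deg, tx, ty, px=0.0, py=0.0):
    """Build a 3x3 matrix from an in-plane rotation, a translation,
    and two perspective terms (px, py). All parameters are illustrative."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    return [
        [c, -s, tx],
        [s, c, ty],
        [px, py, 1.0],
    ]

def warp_point(h, x, y):
    """Map pixel (x, y) through homography h using homogeneous coordinates."""
    xh = h[0][0] * x + h[0][1] * y + h[0][2]
    yh = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    # Dividing by w is what produces the perspective effect.
    return (xh / w, yh / w)

# Pure translation: every pixel shifts by (100, 20).
shift_only = make_homography(0.0, 100.0, 20.0)
shifted = warp_point(shift_only, 10.0, 5.0)

# A small perspective term shrinks points that lie far along x.
with_perspective = make_homography(0.0, 0.0, 0.0, px=0.001)
foreshortened = warp_point(with_perspective, 1000.0, 500.0)
```

Stitching then comes down to finding, for each new picture, the matrix that makes its overlapping pixels land on top of those already in the composite.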
So I'd now like to show the video showing a small part of this. There's no audio on this. So he went to San Francisco, took a whole bunch of pictures of different kinds. And using a power PC, a Mac, he was able to then stitch together all these images and do the appropriate [? perspective, ?] [? distortion ?] transformations so that you now have a 3D camera-- a panoramic camera-- a panoramic digital camera.
AUDIENCE: It actually looks slightly out of focus. Or is that the video?
REDDY: It is the video. These are all homebrew video packages. So some of these are not as good as they could be. So you can see it's slowly adjusting itself and doing appropriate transformations. And ultimately, it stitches itself together.
And so this is now a practical technology that can be done using today's workstations. And, of course, those of you who haven't studied the problem, image registration has been a fundamental problem in computer vision for 30 years.
In its purest form, if you have 1,000-by-1,000 image, you have a million pixels. If you have to register them with another million pixels, you have 10 to the 12 operations. Each operation is a correlation operation, which easily takes between 1,000 to 10,000 operations.
So to register two images in the brute force form, you're talking about 10 to the 15, 10 to the 16 operations. So a lot of work has been done by many people, both at MIT and Stanford and CMU-- a number of places-- to do this registration issue. And all kinds of heuristics and tricks are used to do that.
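The brute-force estimate can be checked with a few lines of arithmetic. The figures are the ones quoted in the talk; the variable names are only for illustration.

```python
# Brute-force registration cost, using the numbers quoted in the talk.
pixels_per_image = 1_000 * 1_000           # a 1,000-by-1,000 image
pairings = pixels_per_image ** 2           # every pixel against every pixel
low_estimate = pairings * 1_000            # 1,000 operations per correlation
high_estimate = pairings * 10_000          # 10,000 operations per correlation

print(f"pairings:      {pairings:.0e}")
print(f"low estimate:  {low_estimate:.0e}")
print(f"high estimate: {high_estimate:.0e}")
```

The pairings come to 10 to the 12, and the two totals to 10 to the 15 and 10 to the 16, which is why heuristics that cut down the search are needed.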
And here, the heuristic it's using is the human being roughly positions it where it is. The idea is to get the precise stitching of the images, okay. So that's the current status of the kind of video cameras and video input that is possible using today's technology that Vannevar Bush may not have thought about.
The second area I want to talk about is the speech recognition area. Here is one area where ARPA had the foresight to continue supporting at various levels for 30 years. And almost all the areas we are looking at, that Vannevar Bush wanted-- speech, image, language, retrieval, all kinds of things-- these are what I call imprecise technologies. They're never going to be perfect. And you need to kind of continuously keep--
The interesting thing, of course, is when used by human beings, human beings are not perfect either. But nevertheless, they have figured out how to use these sources of information effectively to do the job they want to do.
How many of you can recognize all the faces and retrieve the right face in this room, that you have met? And it's pretty poor. In fact, it is dismal. And so the human retrieval capabilities are pretty lousy. So given that, the kinds of things we'll be able to do with these systems is substantial.
So speech processing is one of those areas which is never going to be perfect. But nevertheless, I think we're at a stage where useful systems doing useful things for average people is possible.
And I'd like to show you my next videotape. What it is, is it is a speaker-independent, unlimited vocabulary dictation running on a standalone Pentium--
- Raj Reddy.
- Bob Whitey.
REDDY: --doing voicemail application.
- Latest Sphinx-III application.
REDDY: It makes errors.
- Sorry I missed you last night, period. I had hoped to show you our latest implementation of Sphinx before you left, period. As with our previous unlimited vocabulary systems, comma, it is speaker-independent and continuous speech, period.
REDDY: So when it makes an error, you just go type it in or speak it in. It's not a big deal. So expecting a system to be perfect is not--
- It is.
REDDY: --the right attitude to have.
- The error rate is on the order of 15% to 20%, period.
REDDY: And we believe this will go down to about 5% in two to three years on about a same kind of machine, which P6 is like a 200 MIPS machine.
REDDY: And currently uses 64 megabytes. And the memory capacity may be somewhat [? lowered ?] in the future.
REDDY: But we're talking about people with that kind of capability using this technology.
- The system is running standalone on a new Pentium Pro, period. As you can see, comma, it is pretty close to real time, period. As PC platforms become faster and more affordable, comma--
REDDY: So and he's also speaking very carefully.
- --we will continue to use all of the cycles at our disposal.
REDDY: That's the way you would speak--
REDDY: --to a secretary who doesn't understand all the terminology that you're speaking, and also can't write fast enough, perhaps.
REDDY: Sometimes you can speak and make the correction. Sometimes you type it in. You try once. Try speaking again.
- At our.
REDDY: And if it doesn't recognize, you type it in. So there, even the second [? time it didn't. ?]
- At our.
REDDY: That hour.
So ultimately, you give up, and you type it in.
So that kind of gives you an idea of where we are. From a historical perspective, we've been working on it collectively as a community for 30 years. Victor Zue, here at MIT, John Makhoul and others at BBN, people at SRI, CMU, AT&T, and IBM have all been contributing.
And we are at a stage now-- and Michael was talking about precision and recall and metrics. The ARPA community has embraced that with a speech program about 10 years ago. We've been running metrics every year. And we have very precise measures of how well we do.
And the best indication of the progress of science in this area is any one of us does something, and we all publish these papers and talk about it at the annual conferences-- within three months or six months, everybody else can implement it and have it running. So the replication of results, which is the nature of a true science and the scientific method, is something that happens in this area on a routine basis these days, which is very satisfying.
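The talk does not say which measure the yearly evaluations use. A common one for dictation is word error rate: the word-level edit distance between what was said and what the recognizer produced, divided by the number of words said. The following is a minimal sketch under that assumption; the function name and the sample sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance (substitutions, deletions, insertions)
    divided by the number of words in the reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[len(hyp)] / len(ref)

# Made-up example: one wrong word out of five.
rate = word_error_rate("the cycles at our disposal",
                       "the cycles at hour disposal")
```

One wrong word in five gives a rate of 0.2, the same order as the error rate quoted in the demonstration.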
So the next area I want to kind of go to is the whole language processing area. Of all of the areas, both image and speech, language processing is still perhaps the, to me, most challenging. If I had to rank all of the difficulty, speech is probably the easiest. Spoken language, of course, is hard. But if you just take the speech part, as you were seeing.
Language is probably perhaps the most important technologies for the next decade. And it's somewhat harder. Image processing and image understanding is an order of magnitude harder, and probably will be another 20 years or 30 years before we get to the right place.
And the impact it will have to me appears to be less at this point in time than the impact language technologies will have. The kinds of applications that we are looking at are not question-answering things we tried in the '60s in language, but more importantly, things like retrieval, which Michael talked about yesterday, understanding, summarization, translation, and a whole range of related things.
The best way I can illustrate this is to show you my next videotape, which is about multimedia databases, and then come back and talk about some fundamental difficult problems in language processing implementing the Vannevar Bush vision would have. Can we now show the next videotape, please?
- Informedia, can I hear the music of Mozart?
[PIANO MUSIC PLAYING]
Are there any portraits of Mozart? Do you have anything on Mozart's childhood?
- And this was because--
REDDY: Can we turn on those lights?
- --he was also a highly talented performer.
REDDY: It might be a little bit better.
- One of the most extraordinary prodigies of all time. He was the infant prodigy. And he could play blindfolded. Or he could play with a cloth over the keys.
- Do you have a video of a performance of Mozart?
[ORCHESTRAL MUSIC PLAYING]
- Until this last generation, that knowledge of Mozart could only be found here. Throughout history, humankind has placed the highest emphasis on recording and preserving of information for generations to come. But with the digital society we live in, today's libraries are becoming transformed.
Video, or motion pictures, have become the preferred method for this documentation of our society. Yet there are limited means to access the footage sitting in the vaults of our television and film studios around the country. Until now.
With the combined efforts of Carnegie Mellon's research scientists and video experts at WQED Pittsburgh, we are developing Informedia, a multimedia library system with the ability to bring you information on demand. This concept of information on demand is a revolutionary idea in accessing large amounts of video resources and makes Informedia the most advanced retrieval system under development today.
[PIANO MUSIC PLAYING]
Through advanced software technologies, Informedia is able to access hundreds of hours of video footage and present short segments that meet the user's needs. This unique function is being made possible through technologies being developed at Carnegie Mellon University.
These technologies are integrated into an intelligent search and indexing system. Speech recognition, natural language processing, and image understanding make up these cooperating technologies.
First, and probably the most familiar use of speech recognition is to allow the user to access the entire system through voice commands. Computer show transcript.
- A new religion replaced the old. But the cathedral is also a symbol of continuity in Mexico.
- CMU has given speech recognition an entirely new dimension. When applied to any video footage, speech recognition software automatically generates a transcript. Transcripts are then searched for subjects requested by the user-- the first method by which Informedia gives you the information you need.
Searching the transcript may not always be sufficient. The subject matter may not be referred to directly in the text. This is where our second cooperating technology comes into the picture.
Natural language processing allows a higher level of semantic understanding than merely the actual text, allowing the computer to search on topics that may only be inferred.
Since this is a visual medium, the actual video image is important to understanding the content-- image understanding. By analyzing each picture frame, the computer is given another tool for searching video footage. This enables the user to search video for shots with the same scenic composition-- particular backgrounds, such as space or blue sky, or searching on only footage with a single person talking.
Now, let's take these technologies and see how they create the Informedia system. Informedia can process any type of video footage. This footage is then broken down into its video and audio tracks. The audio track is automatically transcribed through speech recognition technology. The transcript is then indexed using natural language processing technology to give a comprehensive topical index.
The video track is then segmented using image understanding technology. The video is divided into useful-sized clips and automatically stored and indexed.
The two tracks are then rejoined, now in a useful, indexed multimedia library. Once online, users can search this multimedia library through voice commands using speech recognition. The video is then presented, giving the user information on demand.
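The indexing half of the pipeline just described can be sketched in a few lines. This is only an illustration under stated assumptions: the transcript is taken to be a list of (word, start time in seconds) pairs already produced by a recognizer, the names `build_index` and `find_clips` are hypothetical, and plain word matching stands in for the natural language processing the narration mentions.

```python
from collections import defaultdict

def build_index(timed_words):
    """Map each lowercased word to the times (in seconds) at which it is spoken."""
    index = defaultdict(list)
    for word, start in timed_words:
        index[word.lower().strip(".,?!")].append(start)
    return index

def find_clips(index, query, padding=5.0):
    """Return a (start, end) window around each spoken occurrence of the query word."""
    hits = index.get(query.lower(), [])
    return [(max(0.0, t - padding), t + padding) for t in hits]

# Made-up time-stamped transcript of an audio track.
transcript = [
    ("Mozart", 2.0), ("was", 2.6), ("an", 2.8), ("infant", 3.0), ("prodigy.", 3.5),
    ("Mozart", 40.0), ("could", 40.5), ("play", 40.8), ("blindfolded.", 41.2),
]
index = build_index(transcript)
clips = find_clips(index, "mozart")
```

Because every word keeps its time stamp, a text search on the transcript leads straight back to the matching stretch of video.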
[MILITARY DRUM BEAT]
- For the commoners, the Spanish conquest and the later Mexican Revolution brought merely an end to one form of government and the imposition of another.
- Carnegie Mellon University and WQED Pittsburgh are doing what's never been done before-- creating vast, explorable digital video libraries. For those who wish to see a more extended demonstration of Informedia in action, let the tape continue to roll.
The current Informedia system is under development at Carnegie Mellon University and possesses many of the functions and applications the complete system will eventually have. Do you have anything on Mars? The system has a speaker-independent speech recognition interface to allow users to enter their queries through voice commands.
The computer shows the results of the search through these icons and titles. Each of these icons represents a short segment of a larger video, enabling the system to return only the information you require.
- From Star Trek-- The Next Generation, I've led many an expedition in science fiction. Now, as your host on Space Age, I will guide you through an incredible real-life adventure, the story of how humans and machines have combined to bring us to the brink of a new age of discovery.
- Here are a few of the Informedia research scientists to demonstrate the system's new functions and applications.
- Hello. I'm Alex Hauptmann of the Carnegie Mellon speech recognition research group. Speech recognition has taken on a variety of other functions within the Informedia system. What about the bus that was bombed by Muslim fundamentalists in Israel?
Speech recognition can automatically generate transcripts from the audio track. This transcript is then time-synced to the video to allow for word following.
- --exploded. And blood on the floor, blood [INAUDIBLE] [? spread ?] [? all over. ?]
- A towel covered a decapitated corpse.
- Hello there. I'm Michael Witbrock from Carnegie Mellon University's Computer Science Department. A recent development in Informedia is its application for Informedia News On Demand. With Informedia News On Demand, a user has access to a library of recent news stories from around the globe.
What do you have about a tank in San Diego?
- It was a frightening scene--
- Informedia News On Demand can access news information from broadcast, radio, and wire news services, giving the user a wider variety of perspectives than any single source.
- [? --crushing ?] everything in its path. Correspondent [? Reed ?] [? Galen ?] has the story and the incredible pictures.
- 58 tons of war machine on a rampage in San Diego.
- There have been many recent developments in the way our users can access the video footage. Tell me about the new Microsoft Windows operating system. The first development is the use of skims for reviewing the video segments. At compression rates of up to 20 times real time, the user can quickly access the main ideas of the segment.
- --in Miami opened-- [? at ?] [? the-- ?] [? Professor ?] says Microsoft has hype game.
- To find an exact location within a video segment, we have created video film strips. These markers indicate the exact location in the footage where the query topic is mentioned.
- What is it? Windows 95 is an operating system, which is the basic software that controls personal computers.
- We have developed another kind of skim with a lower compression rate that focuses on the content of the video. International Space University. Informedia skims are automatically generated using statistical language processing.
- 137 students train for tomorrow's challenges in space.
- Artificial [? G ?] shift.
- You've got two sleeping cubicles. We'd like to think that we're coming into a new age.
- Space is for commercialization.
- A whole generation has arisen.
- Instant information-- digitized, indexed, searched, retrieved. It's the age of Informedia.
REDDY: So what I'd like to kind of do is, it all looks kind of sexy and glitzy, but there are a lot of interesting solutions, and also lots of interesting research problems. I'd like to share with you some of the interesting research problems. First, I want to tell you about the problems in language processing.
So let us look at two kinds of information retrieval problems. One is, let us say I was a commodity trader, and I said, give me all the stories about gold.
It is possible that I might also be interested in getting other stories about precious metals like platinum or uranium or whatever. But on the other hand, I don't want stories about a golden parachute or American Express Gold Card. So that requires what I think Doug was talking about yesterday, of semantically-based retrieval systems.
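One crude way to approximate the behavior described here is to widen the query with related terms and to discard known misleading phrases before matching. The sketch below does that with a tiny hand-written table. The table contents and the function name are hypothetical, and a real semantically based system would need far more than a lookup.

```python
# Hypothetical hand-written tables; a real system would derive these some other way.
RELATED_TERMS = {"gold": {"gold", "platinum", "precious metals"}}
MISLEADING_PHRASES = {"gold": {"golden parachute", "gold card"}}

def matches_topic(story, topic):
    """Return True if the story mentions the topic or a related term,
    ignoring phrases that only look like the topic."""
    text = story.lower()
    # Blank out the misleading phrases so they cannot trigger a match.
    for phrase in MISLEADING_PHRASES.get(topic, ()):
        text = text.replace(phrase, " ")
    # Simple substring matching; good enough for a sketch only.
    return any(term in text for term in RELATED_TERMS.get(topic, {topic}))

stories = [
    "Gold prices rose sharply in London.",
    "Platinum futures fell on weak demand.",
    "The chairman left with a golden parachute.",
    "American Express Gold Card fees are going up.",
]
selected = [s for s in stories if matches_topic(s, "gold")]
```

Here the first two stories are selected and the last two are dropped, which is the outcome the commodity trader in the example is asking for.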
There's a different kind of associative retrieval that's also important, which we don't fully know how to do yet. Supposing I asked a news story, saying, do you have any news clips of atrocities in Banja Luka? Perhaps you might have it.
But an intelligent retrieval system might also want to say, would you like stories about that in [? Gorazde? ?] And that requires understanding of geography and the fact that these two cities or towns are in the same geographical region in Bosnia, and the fact you're interested in one might also imply that you're interested in stories in the other one.
We are not anywhere close to that kind of thing. But we will get there-- 5, 10, 15 years-- probably, hopefully before the retirement that Michael was talking about. Even when we get there, it will never be perfect. But given the fact the human retrieval systems are pretty lousy themselves, we will be very happy to use whatever we can get.
The only indications of this, [? are ?] [? you ?] look at the Lycos and Yahoo and other systems, which are internet catalogs, they're not that good. But the last I looked, every day more than 100,000 people log into Lycos catalog index just to find out where they can find other related information.
For many people, it is the way of doing business every day today. And it fits Michael's test, saying if I tried to take it away and you complain bitterly, then the technology has arrived.
And the same is going to be true for speech or language or vision. If we can build applications that normal people can routinely use, in spite of the fact they will be full of errors-- full of errors is a proportional term, a relative term. And we just need to kind of have metrics which will help us to determine at what point something is ready for prime time.
And so let me give you some other facts. All the news stories you saw there were picked right off of the airwaves and fully, completely, automatically indexed. The speech recognition transcripts were fully produced automatically. I don't want you to think they were perfect-- far from it. The error rate there in most of those transcripts is on the order of 20%.
If you simply didn't have a lot of the shortcuts we were using-- for example, in order to produce the transcript, we not only used the language model from all the news stories of 1994, we have a thing called LM du jour, the language model of the day, which picks off all the news stories of the day. And it also uses the closed caption data, which is roughly about 70% correct. And the trick here is not just to produce the transcript, but also align the video and audio and text, which is [? Ted's ?] transclusion problem of relating all the related things that are the same in multiple media, and doing it in a way that is useful on a routine basis.
The image segmentation-- we find the problem of aligning the news story with the image segmentation is something we don't know how to do either. And we do [? it. ?] Many times, it is off by half a second or a second. So you get a little incorrect strip at the beginning or the end. You lose something. And people don't seem to mind.
So the issue here is, how do we create technologies that are imperfect in themselves, imprecise in themselves, but nevertheless useful to a human being in the human augmentation sense? Let me give you a couple of other language problems-- [? nonretrieval ?] [? type. ?]
You saw when you click on an icon, it was showing you a headline. That is intended to be a newspaper headline story of that story. So given 100 words or 200 words, how do you produce a five-word headline out of it? We don't know how to do it right. But whatever we are doing is better than not having it at all. And, again, we don't have good metrics for defining what constitutes a high quality headline.
The second thing you saw is the skim-- essentially, semantic fast forward. If you have used a conventional VCR type of fast forward, you'll just see the image but you won't hear anything. And whatever you hear would not be coherent.
The whole idea here is, can you see a two-hour lecture or a two-hour movie in five minutes, and then decide whether you're going to listen to the rest of it afterwards? If we could do that, we would have increased most of our productivity by a factor of 10 or so. That is true human augmentation, in the sense of [? that. ?]
So there are a number of potential leap frogs that would happen as we begin to build on these technologies. And what I'd like to do now is to show you a look into the future of some of the things we're doing at CMU. This kind of relates to 3D photography and 3D imaging, and how that might transform the way we do things in the future. Can I next show the last videotape, please?
- At Carnegie Mellon Robotics Institute, we have made substantial progress in 3D computer vision and are working on its new applications. We have built a video-rate stereo machine that can produce a complete [? dense ?] [? depth ?] map of a scene at 30 frames per second, aligned with [? intense ?] images-- probably the first of its kind.
- A five-camera stereo head is used to image a scene. This is the intensity image. The machine processes multiple image data from five cameras based on the multi-baseline stereo theory and computes depth for each pixel. In the depth image shown here, brighter pixels indicate closer points. The darker the pixel, the farther from the camera.
We can observe the video rate operation. The size of the depth image is up to 254 by 240. The depth resolution is currently 5 bits. And an increase to 7 bits is being planned.
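As a rough illustration of what the 5-bit and 7-bit depth resolutions mentioned above imply, the sketch below counts the distinct depth levels for each bit width and maps a quantized depth code to a display brightness, with nearer points brighter as in the depth image described in the video. The function names and the convention that code 0 is the nearest level are assumptions made for this example, not details of the CMU machine.

```python
def depth_levels(bits: int) -> int:
    """Number of distinct depth values representable with the given bit width."""
    return 2 ** bits


def depth_to_brightness(depth_code: int, bits: int) -> int:
    """Map a quantized depth code to an 8-bit brightness value.

    Assumption for this sketch: code 0 is the nearest level and the largest
    code is the farthest, so nearer points come out brighter.
    """
    max_code = depth_levels(bits) - 1
    if not 0 <= depth_code <= max_code:
        raise ValueError("depth code out of range for this bit width")
    return round(255 * (max_code - depth_code) / max_code)


if __name__ == "__main__":
    for bits in (5, 7):
        print(bits, "bits ->", depth_levels(bits), "depth levels")
```

Going from 5 to 7 bits quadruples the number of distinguishable depth levels, from 32 to 128.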
- The capability of obtaining [? dense, ?] 3D, distance, and shape information at video-rate opens up a new class of applications. One of them is what we named "z-key," a new technique for merging real and virtual images.
- Chroma key is a standard technique for image superimposition, such as a TV reporter in front of a weather map. Unlike the chroma key that uses blue color as the key to switch between the real and virtual scenes, the z-key compares depth information from the stereo machine and the z buffer from the graphics system for switching. As a result, a real person can be between the two virtual lamps or partially in front of and behind the virtual object.
We can also have the virtual object interact with the real object consistently. Here, the virtual lamp is casting a shadow on the person with the correct shape as he moves. Note the person is also casting a shadow onto the virtual wall.
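The z-key switching described above can be sketched as a per-pixel depth comparison: at each pixel, show whichever source (real camera or graphics system) is closer to the viewer. This is a minimal sketch under stated assumptions, not the stereo machine's implementation. The function name `z_key`, the list-of-rows image layout, and the convention that a smaller depth value means closer are all assumptions made for illustration.

```python
def z_key(real_pixels, real_depth, virtual_pixels, virtual_depth):
    """Merge a real and a virtual image by comparing depth at each pixel.

    Each argument is a list of rows; depth values are distances from the
    camera (assumed: smaller means closer). The closer source wins.
    """
    merged = []
    for r_row, rd_row, v_row, vd_row in zip(
        real_pixels, real_depth, virtual_pixels, virtual_depth
    ):
        merged.append(
            [
                r if rd <= vd else v  # keep the real pixel when it is nearer
                for r, rd, v, vd in zip(r_row, rd_row, v_row, vd_row)
            ]
        )
    return merged


if __name__ == "__main__":
    # One row, three pixels: a person at distance 2 standing between
    # a near virtual lamp (distance 1) and a far virtual lamp (distance 3).
    real = [["person", "person", "person"]]
    real_z = [[2, 2, 2]]
    virtual = [["near lamp", "wall", "far lamp"]]
    virtual_z = [[1, 9, 3]]
    print(z_key(real, real_z, virtual, virtual_z))
```

Unlike chroma key, nothing here depends on color. The person hides the far lamp and is hidden by the near one purely because of the depth ordering.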
- Another project that we are working on is the virtualized reality studio that can turn a real-world event into a virtual one by using 3D computer vision and graphics tools. The goal is to immerse the user into a reconstruction of a real-world event completely interactively.
- The virtualized reality studio dome is fully covered by many cameras from all directions. In this example, we used cameras located on one half of the dome and recorded a scene onto videotapes.
The range or depth of every point in an image was computed using the same multi-baseline stereo algorithm used in the video-rate stereo machine, but offline. The depth map is edited to focus on the region of interest. The scene can be reconstructed with the depth and intensity information by placing a virtual or soft camera from the front, from the left, from the right, or from the top, or moving the soft camera as the user moves freely.
Earlier results of the virtualized reality studio dealt with moving scenes. The event originally observed from a single viewpoint can be viewed from many angles or even moving viewpoints-- from the left, close-up, from above to below, and from the left trying to steal the ball.
- For this baseball scene, we can create a ball's-eye view.
- It is our intention to develop a full video-rate system using the video-rate stereo technology. Virtualized reality has great significance in training, telepresence, and entertainment, in that it would enable users to interact with a real event or performance from dynamically selected viewpoints.
REDDY: So Takeo Kanade whom you saw there is the director of the Robotics Institute. He is too modest to fully hype the potential of this technology. But I am not, so I'll tell you about it.
Basically, what you're seeing here is the beginnings of teleportation and [? tele-time-travel. ?]
Okay. I'm not joking. The teleportation idea is supposing you wanted to be at the Mariners game that went for 11 innings. But you're in Boston. You can place yourself right next to the batter and see exactly what he's seeing with this technology. So that's not quite teleportation of Star Trek. That is, you cannot change physically anything that's going on in the baseball game. But you can position yourself at that place.
The same technology begins to show us what time travel might be like. Those of you who are in need of an OJ fix and want to somehow go back and relive what OJ was doing, you could sit right next to him, put yourself back two weeks or three weeks, sit right next to him, and see what he was seeing when a witness was actually saying something about him. More importantly, if this technology were available in 1945, you would have been able to sit right next to-- no, in 1944-- Churchill, Stalin, and Roosevelt, and see what they were seeing and what they were saying in the same kind of way.
So it's not quite time travel into the future. But it's at least time travel into the past. And you can't change the past. So in that sense, we are beginning to see the beginnings of what you might call teleportation and time travel.
And each of these images takes about 50 billion operations to produce a 3D map. And so we can use up all the cycles that anybody can produce for a long time to come. And you can expect when we have our 100th anniversary of this, almost all the domes-- Houston dome and everything else-- will have a few thousand cameras. So those of us couch potatoes can have any kind of a seat we want from anywhere in the stadium. Okay.
So with that, I'd like to conclude. For me, the lessons of Bush are kind of three-fold. First, the vision of information on demand I think is still very important. And in particular, access to relevant information rapidly in decision-ready form involves a number of fundamental technologies which we don't know how to solve, which are all still extremely important.
And when we solve these problems even partially, the increasing human augmentation and intelligence amplification are, if we don't want to use those hype words, simply productivity enhancement. If you could do the things you do by a factor of 5 or 10 or 100, then you are essentially creating a super human race on the planet. Those of you who have these tools will be 100 times more effective in whatever it is that you're trying to do.
And in that sense, Bush's vision is still extremely important. There are a number of very interesting technical problems, which all seem to be solvable. They will never be solved perfectly. But that's okay.
The second important lesson out of all of this is the implications of the whole technology of multimedia databases and information on demand is for socialization of knowledge. It's no longer the case that a few [? Mandarins ?] who have access to the right information can sit in a corner and decide all these things.
Everyone in the world will have access to the same information. Even those who are illiterate and poor and perhaps don't even know how to type or read can have access to this information. And that's an amazing prospect.
And the third is what Bob Kahn called [? "de-fixation" ?] yesterday. I love that word. Each of you can capture everything you have ever said in your entire life in about a terabyte of information. Today, using the new CD disk, that's about 200 disks. In 20 years from now, I bet it'll be about 20 disks with the densities increasing. And it might even be one disk because this technology is moving very fast.
And that's not very many disks. So if you have the appropriate kind of retrieval thing, all the way from your baby babble to the time of your profound pronouncements when you're 80, everything can be captured. And you can have access to all of that forever. So everything you've said.
You can also have access to everything you have done from a 3D sense. Okay. It's only 1,000 times more. I did the calculation on the bits. The de-fixation of everything you have done is about a petabyte.
And if you again say, how many disks is that? Projecting into the future, it's about 1,000 disks of a terabyte each-- CDs of this size. And that will fit in a shoebox. And so when you die, they can bury your shoebox disks of everything you've done.
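The disk counts quoted above can be checked with a few lines of arithmetic. The sketch below assumes decimal units (a terabyte as 10^12 bytes, a petabyte as 10^15 bytes). The helper names are made up for this example, and the per-disk capacities are simply what the quoted figures imply, not specifications of any particular disk format.

```python
TERABYTE = 10**12  # bytes, decimal convention (an assumption of this sketch)
PETABYTE = 10**15  # bytes, 1,000 terabytes


def capacity_per_disk(total_bytes: int, disk_count: int) -> int:
    """Capacity each disk must hold for the total to fit on disk_count disks."""
    return total_bytes // disk_count


def disks_needed(total_bytes: int, disk_capacity: int) -> int:
    """Number of disks of a given capacity needed to hold total_bytes."""
    return -(-total_bytes // disk_capacity)  # ceiling division


if __name__ == "__main__":
    # "Everything you have ever said": about a terabyte.
    print(capacity_per_disk(TERABYTE, 200) / 10**9, "GB per disk at 200 disks")
    print(capacity_per_disk(TERABYTE, 20) / 10**9, "GB per disk at 20 disks")
    # "Everything you have done": about 1,000 times more, a petabyte.
    print(disks_needed(PETABYTE, TERABYTE), "disks of one terabyte each")
```

So 200 disks per terabyte implies about 5 GB per disk, 20 disks implies about 50 GB per disk, and a petabyte on terabyte-sized disks is the 1,000 disks mentioned in the talk.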
It's too bad Negroponte is gone. But I have to request Alan Kay in his stead. Perhaps one of you would write a book called Being Immortal. This is it. The way you become immortal is to capture everything you've ever done and bury it with you so that in future, if somebody wants to know what you've done, you can do it. And when we figure out how to do simulated evolution, you can create a clone of you so that that person does the same things in the future. Thank you.
PRESENTER: Terrific, Raj. Okay, folks. Keep him busy.
REDDY: Yes, sir.
AUDIENCE: Dick Marcus, MIT. You said that in order to do really smart information retrieval, we're going to need some sophisticated natural language processing techniques.
REDDY: At least.
AUDIENCE: Yes, and more. But I'd like to suggest that we can already do sophisticated information retrieval. And perhaps what you're saying is that to do it completely automatically, if we go back to what Vannevar Bush was doing, he was talking about interactive systems in which the intelligence of the human works in conjunction with the mechanical things. And I would suggest that if we look at the same paradigm today, that we do have expert searchers who do quite effective information retrieval.
AUDIENCE: And some of us are working on research to show that if we have automated expert assistance, we can achieve the same kind of expertise even now.
REDDY: Right. I'm a big proponent of that. In fact, when I give talks to my AI colleagues, I say, we should be working on 80/20 systems. Let us do the 80% that can be done routinely.
And if we can also make this system so that they can say, I don't know the answer to this, please wait, I'll call my supervisor, which is a human being-- and that person can do the remaining 20%. And then those of us in AI research will look over your shoulder and say, what is it that you're doing for the 20%? And apply the 80/20 rule to that. Hopefully in another 10 years, we would be at a 96% level. So that's I think exactly the right paradigm.
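The 96% figure follows from applying the 80/20 rule twice: automate 80% of the work, then automate 80% of the remaining 20%. A small sketch of that arithmetic, using exact fractions to avoid floating-point rounding (the function name `automated_share` is an assumption made for this example):

```python
from fractions import Fraction


def automated_share(rounds: int, rate: Fraction = Fraction(4, 5)) -> Fraction:
    """Share of cases handled automatically after repeatedly automating
    `rate` of whatever is still left to the human supervisor."""
    remaining = Fraction(1)
    for _ in range(rounds):
        remaining *= 1 - rate  # each round leaves 20% of the previous remainder
    return 1 - remaining


if __name__ == "__main__":
    for rounds in (1, 2, 3):
        print(rounds, "round(s):", float(automated_share(rounds)))
```

One round gives 80%, two rounds give 80% + 0.8 × 20% = 96%, and a third round would give 99.2%.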
AUDIENCE: I'm Bob Kahn. Raj, I found the example that you gave of the interplay between the reality world and the digital world very compelling. And I'd like to take it one step further by asking you to speculate a bit. I mean, you had the example on the screen of-- was it Takeo Kanade in [? lamp ?] [? land? ?] And on the other hand, you might have, I guess, Roger Rabbit in the dean's office, is the other example.
The interactions that you're talking about there are extremely static, even though there is some motion and a little bit of dynamics. But you can't really get any interplay where one thing affects the other in those examples.
So if you were to take your example of Raj Reddy in the ballpark in Seattle, could you imagine that you could be at the batter's plate and actually not replace the batter to do what he did, but in fact get up against the pitcher in simulation land and see what you could have done to effect the change? Or maybe practice playing right field?
AUDIENCE: How much can you see in terms of real, dynamic, noncausal interaction taking place in this interplay?
REDDY: That's a great question. I have three vignettes I want to share with you about that. One is that we're working on a thing called video karaoke, which is something like that.
But we had a piece on the Informedia tape, which we took out for lack of time. The way we got started on that whole project was, about four years ago we discovered there are a hundred hours of interviews with Arthur Clarke in WQED archives. They only used four minutes of it. And I said, this is crazy. Isn't there some way we can capture all of those things and then kind of find out what he had to say?
And one of the students did something very brilliant, I thought. Once you capture and have your complete text of the transcript of the whole thing, you can ask any question that comes to your mind and interact with Arthur Clarke. And he will give you the answer.
It goes right to that particular phrase where he used the same phrase that you had in your question, and then start answering something. And you would think he's answering you. He's not. It just so happened that phrase occurred in some context. And it's a synthetic interview. It's kind of another example of the kind of interaction.
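The synthetic-interview idea described above can be sketched as a phrase lookup over a time-stamped transcript: find the first segment that shares a phrase with the question and jump playback to it. This is a minimal sketch under stated assumptions, not the student's actual system. The function names, the choice of matching on two-word phrases, and the (start time, text) segment layout are all assumptions made for illustration.

```python
import re


def words(text):
    """Lowercase word tokens, ignoring punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def phrases(tokens, n):
    """Set of all runs of n consecutive words."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def find_segment(question, segments, n=2):
    """Return the start time of the first transcript segment that shares an
    n-word phrase with the question, or None if no segment matches.

    `segments` is a list of (start_seconds, text) pairs.
    """
    wanted = phrases(words(question), n)
    for start, text in segments:
        if wanted & phrases(words(text), n):
            return start
    return None


if __name__ == "__main__":
    # Invented placeholder text, not quotations from the actual interviews.
    transcript = [
        (0.0, "I grew up close to the sea"),
        (42.5, "Communication satellites changed everything"),
    ]
    print(find_segment("What about communication satellites?", transcript))
```

As the talk points out, nothing here understands the question. The clip appears to answer only because the same phrase happens to occur in it.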
The best ones I know of this kind is the work of Joe Bates, who's working on interactive drama and interactive fiction and interactive agents. And Joe, he is the kind of leading proponent within the AI community building these agents like a fish or a Roger Rabbit or somebody who would come and talk to you and interact with you. And you could play a game together or do whatever. And the whole spectrum of interactive fiction and interactive drama is going to be there. And it'll be fun to watch.
Then Dr. [INAUDIBLE].
REDDY: [? Ted? ?]
AUDIENCE: Oh, here you go.
[? [LAUGHTER] ?]
AUDIENCE: Just one point about transclusion.
The term-- you referred to resolving a number of tapes and concurrent and coordinated materials. I prefer the term "resolution" for that. "Transclusion" means precisely the reuse of the same bits in a different context as labels.
AUDIENCE: Thank you.
AUDIENCE: But it was the [? transparallel ?] [? representation. ?]
AUDIENCE: Hi, Raj.
AUDIENCE: Ed Moriarty at MIT. A lot of people tend to get a little camera shy. In general, if you just take a camera, point it at them, you, all of a sudden, no longer have the real-life event. You have people playing to the camera or somehow altering reality because of the fact you're putting a camera or somehow recording.
And if we took this thought to a 50-year extreme, I'm really curious-- we have people no longer are needed for remembering things. So your brain doesn't need to do that. And we have the technology, potentially, becoming kind of the entire environment in which people are playing.
And I'm really curious what the impact could be, in sort of an insidious way, of just the impact on what is a person at that point? And what is our role? And can you just talk about this bad side, the dark side?
REDDY: Yeah. I don't see any dark side at all. In fact, no matter what, people will be thinking, because the bits will be constantly churning. And it's kind of the nature of the human race, our human species.
See, when we felt cold, we didn't wait till genetics to grow fur on our body. We made fur coats. When we wanted to fly, we didn't wait to have wings. We made airplanes. When we want to remember and think, we will make our artifacts and use them as an augmentation device.
And that is the way it'll all happen. And the human being will be at the center of everything. There is no way, I would, or you would, or anybody would stop thinking. Even an illiterate person in a little village I come from, it is not going to be the case. Yeah. Alan?
AUDIENCE: Actually, I have a different answer for that question.
AUDIENCE: Which is, it always reminds me of that wonderful cartoon of the classroom in which the professor's tape recorder is talking to all the student tape recorders. And both the professor and the students are off doing something more worthwhile.
AUDIENCE: So one of the ways of thinking about this is, this is the answer to the problem of Moore's law, which is how can you get computers to use up all their capacity? And the answer is, well, have them talk to each other. And meanwhile, we'll go off and do something.
AUDIENCE: Because the other way of getting remembered forever is to have just one good idea, not to have a shoe box full of all your trivial ones.
AUDIENCE: So I kind of like that old way better.
PRESENTER: Thanks again, Raj. That was great.