
In mid-2007, the nascent Jeopardy system wasn't really a machine at all. Fragments of a Jeopardy player existed as a collection of software programs, some of them hand-me-downs from the recent bake-off, all of them easy to load onto a laptop. As engineers pieced together an architecture for the new system, Chu-Carroll pondered a fundamental question: How knowledgeable did this computer really need to be? One of its forebears, Basement Baseline, had hunted down its answers on the Web. Blue J wouldn't have that luxury. So as Chu-Carroll sat down for Blue J's first day of school, her pupil was a tabula rasa.

She quickly turned to a promising resource. James Fan had already demonstrated the value of Wikipedia for answering a small subset of Jeopardy clues. “It related to popular culture and what people care about,” Chu-Carroll said. So she set to work extracting much of the vast corpus of Wikipedia articles from the online site and putting them into a format that Blue J could read.

But how about books? Gutenberg.org offered hundreds of classics for free, along with a ranking of the most popular downloads. Chu-Carroll could feed any or all of them to Blue J. After all, words didn't take up much space. Moby Dick, for example, was only 1.5 megabytes. Photographs on new camera phones packed more bits than that. So one day she downloaded the Gutenberg library and gave Blue J a crash course on the Great Books.

“It wasn't a smart move,” she later admitted. “One of the most popular books on Gutenberg was a manual for surgeons from a hundred years ago.” This meant that when faced with a clue about modern medicine, Blue J could be consulting a source unschooled in antibiotics, CAT scans, and HIV, one fixated instead on scurvy, rickets, and infections (not to mention amputations) associated with trench warfare. “I'm not sure why that book is so popular,” said Chu-Carroll. “Are people doing at-home surgery?”

Whatever their motives, most human readers knew exactly what they were getting when downloading the medical relic. If not, they quickly found out. In addition to surgical descriptions, the book contained extraordinary pictures of exotic and horrifying conditions, such as elephantiasis of the penis. Aside from these images, the book's interest was largely historical. Humans had little trouble placing it in this context. Chu-Carroll's pupil, by contrast, had a maddening habit endemic among its ilk: It tended to take every source at its word.

Blue J's literal-mindedness posed the greatest challenge at every step of its education. Finding suitable data for this gullible machine was only the first task. Once Blue J had its source material, from James Joyce to archives of the Boing-Boing blog, the IBM team would have to teach it to make sense of all those words: to place names and facts into context and to come to grips with how they were related to one another. Hamlet, to pick one example, was related not only to his mother, Gertrude, but also to Shakespeare, Denmark, Elizabethan literature, a famous soliloquy, and themes ranging from mortality to self-doubt. Preparing Blue J to navigate all of these connections for virtually every entity on earth, factual or fictional, would be the machine's true education. The process would involve creating, testing, and fine-tuning thousands of algorithms. The final challenge would be to prepare it to play the game itself. Blue J would have to come up with answers it could bet on within three to five seconds. That job was still a year or two down the road.
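Baker doesn't say how Blue J actually stored these connections, but the web he describes can be pictured as a graph of typed relations, each entity one hop away from the facts that give it context. A minimal sketch, with the relation labels invented for illustration:

```python
# Toy graph of typed relations around Hamlet; the relation labels
# are invented for illustration, not Blue J's actual representation.
from collections import defaultdict

relations = defaultdict(list)   # entity -> list of (relation, other entity)

def relate(subject, relation, obj):
    relations[subject].append((relation, obj))
    relations[obj].append((relation + "_of", subject))   # crude inverse link

relate("Hamlet", "mother", "Gertrude")
relate("Hamlet", "author", "Shakespeare")
relate("Hamlet", "setting", "Denmark")
relate("Hamlet", "genre", "Elizabethan literature")
relate("Hamlet", "theme", "mortality")
relate("Hamlet", "theme", "self-doubt")

# Everything the machine "knows" about Hamlet, one hop away:
for relation, other in relations["Hamlet"]:
    print(f"Hamlet --{relation}--> {other}")
```

Multiply that handful of edges by every entity on earth, factual or fictional, and the scale of the education becomes clear.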

For now, Chu-Carroll found herself contemplating academic heresy. Like college students paging through CliffsNotes or surfing Wikipedia, she began to wonder whether Blue J should bother with books at all. Each one contained so many passages that could be misconstrued. In the lingo of her colleagues, books had a sky-high noise-to-signal ratio. The signals, the potential answers, swam in oceans of words, so-called noise.

Imagine Blue J reading Mark Twain's Huckleberry Finn. In one section, Huck and the escaped slave, Jim, are contemplating the night sky:

We had the sky up there, all speckled with stars, and we used to lay on our backs and look up at them, and discuss about whether they was made or only just happened. Jim he allowed they was made, but I allowed they happened; I judged it would have took too long to MAKE so many. Jim said the moon could a LAID them; well, that looked kind of reasonable, so I didn't say nothing against it, because I've seen a frog lay most as many, so of course it could be done.

Assuming that Blue J could slog through the idiomatic language—no easy job for a computer—it could “learn” something about the cosmos. Both characters, it appeared, agreed that the moon, like a frog, could have laid the stars. It seemed “reasonable” to them, a conclusion Blue J would be likely to respect. A human would put that passage into context, learn something about Jim and Huck, and perhaps laugh. Blue J, it was safe to say, would never laugh. It would likely take note of an utterly fallacious parent-offspring relationship between the moon and the stars and record it. No doubt its mad hunt through hundreds of sources to answer a single Jeopardy clue would bring in much more astronomical data and statistically overwhelm this passage. In time, maybe the machine would develop trusted sources for such astronomical questions and wouldn't be so foolish as to consult Huck Finn and Jim about the cosmos. But still, most books had too many words—too much noise—for the job ahead.

This led to an early conclusion about a Jeopardy machine. It didn't need to know books, plays, symphonies, or TV sitcoms in great depth. It only needed to know about them. Unlike literature students, the machine would not be pressed to compare and contrast the themes of family or fate in Hamlet with those in Oedipus Rex. It just had to know they were there. When it came to art, it wouldn't be evaluating the brushwork of Velázquez and Manet. It only needed to know some basic biographical facts about them, along with a handful of their most famous paintings. Ken Jennings, Ferrucci's team learned, didn't prepare for Jeopardy by plowing through big books. In Brainiac, he described endless practice with flash cards. The conclusion was clear: The IBM team didn't need a genius. They had to build the world's most impressive dilettante.

From their statistical analysis of twenty thousand Jeopardy clues drawn randomly from the past twenty years, Chu-Carroll and her colleagues knew how often each category, from U.S. presidents to geography, was likely to pop up. Cities and countries each accounted for a bit more than 2 percent of the clues; Shakespeare and Abraham Lincoln were regulars on the big board. The team proceeded to load Blue J with the data most likely to contain the answers. It was a diet full of lists, encyclopedia entries, dictionaries, thesauruses, newswire articles, and downloaded Web pages. Then they tried out batches of Jeopardy clues to see how it fared.
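The analysis behind those numbers is, at bottom, frequency counting. A minimal sketch of the idea, with the record layout and sample clues invented for illustration:

```python
# Minimal sketch of the category analysis: count how often each
# category appears and rank by share. The records here are invented
# stand-ins; the real study drew twenty thousand clues at random.
from collections import Counter

clues = [
    ("WORLD CAPITALS", "This city on the Danube is Hungary's capital", "Budapest"),
    ("WORLD CAPITALS", "This city is the capital of Romania", "Bucharest"),
    ("U.S. PRESIDENTS", "He delivered the Gettysburg Address", "Abraham Lincoln"),
]

counts = Counter(category for category, _clue, _answer in clues)
total = sum(counts.values())
for category, n in counts.most_common():
    print(f"{category:16s} {100 * n / total:5.1f}% of clues")
```

Ranked this way, the categories told the team where to spend Blue J's limited disk space.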

Blue J was painfully slow. Laboring on a single computer, it created a logjam of data, sending information through the equivalent of a skinny straw when it needed a fire hose. It didn't have enough bandwidth (the rate of data transfer). And it lacked computing muscle. This led to delays, or latency. It often took an hour or two to puzzle out the meaning of the clue, dig through its data to come up with a long list of possible answers, or “candidates,” evaluate them, choose one, and decide whether it was confident enough to bet. The best way to run the tests, Chu-Carroll and her colleagues eventually realized, was to ask Blue J questions before lunch. The machine would cogitate; they'd eat. It was an efficient division of labor.
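Baker names the stages: puzzle out the clue, dig up candidate answers, evaluate them, choose one, and decide whether to bet. A schematic sketch of that sequential flow, in which every function body is a deliberately crude stand-in rather than Blue J's actual code:

```python
# Schematic of the stages Baker lists: puzzle out the clue, dig up
# candidates, score them, choose one, and decide whether to bet.
# Every function body is a crude stand-in, not Blue J's actual code.

def understand(clue: str) -> set[str]:
    # Stand-in for clue analysis: keep the longer words as key terms.
    return {w.strip(",.?!\"").lower() for w in clue.split() if len(w) > 3}

def gather_candidates(key_terms: set[str], corpus: dict[str, str]) -> list[str]:
    # Stand-in for the document hunt: any titled passage mentioning
    # a key term becomes a candidate answer.
    return [title for title, text in corpus.items()
            if key_terms & set(text.lower().split())]

def score(title: str, key_terms: set[str], corpus: dict[str, str]) -> float:
    # Stand-in for the evidence scorers: fraction of key terms that
    # the candidate's passage contains.
    words = set(corpus[title].lower().split())
    return len(key_terms & words) / len(key_terms)

def answer(clue: str, corpus: dict[str, str], bet_threshold: float = 0.5):
    key_terms = understand(clue)
    candidates = gather_candidates(key_terms, corpus)
    if not key_terms or not candidates:
        return None, 0.0          # nothing to bet on: pass
    best = max(candidates, key=lambda t: score(t, key_terms, corpus))
    confidence = score(best, key_terms, corpus)
    return (best if confidence >= bet_threshold else None), confidence
```

The point is the shape of the flow: on a single machine, every stage grinds through the data serially, and the waits add up.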

An hour or so after they returned to the office, Blue J's list of candidate answers would arrive. Many of them were on target, but some were ridiculous. One clue, for example, read: “A 2000 ad showing this pop sensation at ages 3 & 18 was the 100th ‘got milk?’ ad.” Blue J, after exhaustive work, missed the answer (“Who is Britney Spears?”) by a mile, suggesting “What is Holy Crap?” as a possibility. The machine also volunteered that the diet of grasshoppers was “kosher” and that the Russian word for goodbye (“What is do svidaniya?”) was “cholesterol.”

The dumb answers didn't matter—at least not yet. They had to do with the computer's discernment, which was still primitive. The main concern for the Jeopardy team at this stage was whether the correct answer popped up anywhere at all on its list. This was its measure of binary recall, a metric that Blue J shared with humans. If a student in geography were asked about the capital of Romania, she might come up with Budapest and Bucharest and not remember which of the two was right. In Blue J's world, those would be her candidate answers, and she clearly had the data in her head to answer correctly. From Chu-Carroll's perspective, this student would have passed the binary recall test and needed no more data on capitals. (She just had to brush up on the ones she knew.) In a similar test, Blue J clearly didn't recognize Britney Spears or do svidaniya as the correct answer. But if those words showed up somewhere on its candidate lists, then it had the wherewithal to answer them (once it got smarter). By focusing on categories where Blue J struck out most often, the researchers worked to fill in the holes in its knowledge base.
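Of the ideas in this passage, binary recall is concrete enough to write down: did the right answer appear anywhere on the candidate list, at any rank, with any confidence? A minimal sketch, in which the normalization is an assumption:

```python
# Binary recall: the share of clues whose correct answer shows up
# anywhere on the candidate list, at any rank, at any confidence.

def normalize(text: str) -> str:
    # Assumption: lowercasing is enough matching for this sketch.
    return text.strip().lower()

def binary_recall(candidate_lists: list[list[str]], gold_answers: list[str]) -> float:
    hits = sum(
        normalize(gold) in {normalize(c) for c in candidates}
        for candidates, gold in zip(candidate_lists, gold_answers)
    )
    return hits / len(gold_answers)

# The Romania student: Budapest and Bucharest both on her list still
# counts as a hit. The knowledge is there, even if the ranking isn't.
print(binary_recall([["Budapest", "Bucharest"]], ["Bucharest"]))  # 1.0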

Week by week, gigabyte by gigabyte, Blue J's trove of data grew. But by the standards of consumer electronics, it remained a pip-squeak. The fifth-generation iPods, which were selling five miles down the road, stored 30 to 80 gigabytes. Blue J topped out at the level of a high-end iPod, at some 75 gigabytes. Within a year or two, cell phones might hold as much. But there was a reason for the coziness of Blue J's stash. The more data it had to rummage through, the longer it took—and it was already painfully slow. What's more, the easiest and clearest sources for Blue J to digest were lists and encyclopedia entries. They were short and to the point, and harder for their literal-minded pupil to misinterpret. And they didn't take up much disk space.

While Chu-Carroll wrestled with texts, James Fan was part of a team grappling with a challenge every bit as important: coaxing the machine to understand the convoluted Jeopardy clues. If Blue J couldn't figure out what it was supposed to look for, the data on its disk wouldn't make any difference. From Blue J's perspective, each clue was a riddle to be decoded, and the key was to figure out the precise object of its hunt. Was it a country? A person? A kind of fish? In Jeopardy, it wasn't always clear.

The crucial task was to spot the word representing what Blue J was supposed to find. In everyday questions, finding it was simple. In “Who assassinated President Lincoln?” or “Where did Columbus land in 1492?” the “who” and “where” point to a killer and a place. But in Jeopardy, where the clues are statements and the answers questions, finding these key words, known as Lexical Answer Types (LATs), was a lot trickier. Often a clue would signal the LAT with a “this,” as in: “This title character was the crusty & tough city editor of the Los Angeles Tribune.” Blue J had no trouble with that one. It identified “title character” as its focus and returned from its hunt with the right answer (“Who is Lou Grant?”).

But others proved devilishly hard. This clue initially left Blue J befuddled: “In nine-ball, whenever you sink this, it's a scratch.” Blue J, Fan said, immediately identified “this” as the object to look for. But what was “this”? The computer had to analyze the rest of the sentence. “This” was something that sank. But it was not related, at least in any clear way, to vessels, the most common sinking objects. To identify the LAT, Blue J would have to investigate the two other distinguishing words in the clue, “nine-ball” and “scratch.” They led the computer to the game of pool and, eventually, to the answer (“What is a cue ball?”).
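A toy detector for the easy pattern, “this” followed by a noun phrase, makes the contrast plain: it nails the Lou Grant clue and comes back empty-handed on the nine-ball clue, where “this” stands alone. The regex is my own illustration; Blue J's real LAT detection leaned on full syntactic parsing:

```python
# Toy first pass at spotting a Lexical Answer Type: grab the word or
# two that immediately follows "this" or "these". A caricature of the
# easy case; Blue J's real detector relied on full syntactic parsing.
import re

LAT_PATTERN = re.compile(r"\b(?:this|these)\s+([a-z]+(?:\s+[a-z]+)?)", re.IGNORECASE)

def guess_lat(clue: str) -> str | None:
    match = LAT_PATTERN.search(clue)
    return match.group(1).lower() if match else None

print(guess_lat("This title character was the crusty & tough city editor "
                "of the Los Angeles Tribune"))                            # title character
print(guess_lat("In nine-ball, whenever you sink this, it's a scratch"))  # None
```

When the heuristic returns nothing, the machine is thrown back on the rest of the clue, which is exactly where the nine-ball hunt began.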

Sometimes the LAT remained a complete mystery. The computer, Fan said, had all kinds of trouble figuring out what to look for in this World Leaders clue: “In 1984, his grandson succeeded his daughter to become his country's prime minister.” Should the computer look for the grandson? The daughter? The country? Any human player would quickly understand that it was none of the above. The trick was looking for a single person whose two roles went unmentioned: a father and a grandfather. To unearth this, Blue J would have had to analyze family relationships. In the end, it failed, choosing the grandson (“Who is Rajiv Gandhi?”). In its list of answers, Blue J did include the correct name (“Who is Nehru?”), but it had less confidence in it.
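The clue's demand can be phrased as a small relational query: find the person, never named by role, of whom one prime minister is a child and another a grandchild. A toy sketch over hand-entered family triples, which Blue J would first have had to extract from its sources:

```python
# Toy relational query for the World Leaders clue: find the person
# whose child and whose grandchild both became prime minister. The
# triples are hand-entered; Blue J would have had to mine them.

parent_of = {            # child -> parent
    "Indira Gandhi": "Jawaharlal Nehru",
    "Rajiv Gandhi": "Indira Gandhi",
}
prime_ministers = {"Jawaharlal Nehru", "Indira Gandhi", "Rajiv Gandhi"}

def grandparent_of(person: str) -> str | None:
    parent = parent_of.get(person)
    return parent_of.get(parent) if parent else None

for candidate in prime_ministers:
    children = [c for c, p in parent_of.items()
                if p == candidate and c in prime_ministers]
    grandchildren = [g for g in prime_ministers
                     if grandparent_of(g) == candidate]
    if children and grandchildren:
        print(f"Who is {candidate}?")   # Who is Jawaharlal Nehru?
```

The answer never appears on the left side of any stated relation, which is precisely why a machine that only matched surface roles picked Rajiv Gandhi instead.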

Troubles with specific clues didn't matter. Even Ken Jennings won the buzz only 62 percent of the time. Blue J could afford to pass on some. The important thing was to fix chronic mistakes and to orient the machine to succeed on as many clues as possible. In previous weeks, Fan and his colleagues had identified twenty-five hundred different LATs in Jeopardy clues and ranked them by their frequency. The easiest for Blue J were the most specific: the machine could zero in on songs, kings, criminals, or plants in a flash. But most LATs were far vaguer. “He,” for example, was the most common, accounting for 2.2 percent of the clues. Over the coming months, Fan would have to teach Blue J how to explore the rest of each clue to figure out exactly what kind of “he” or “this” it should look for.

It was possible, Ferrucci thought, that someday a machine would replicate the complexity and nuance of the human mind. In fact, in IBM's Almaden research labs, on a California hilltop high above Silicon Valley, a scientist named Dharmendra Modha was building a simulated brain boasting seven hundred million electronic neurons. Within years, he hoped to map the brain of a cat, then a monkey, and eventually a human. But mapping the human brain, with its hundred billion neurons and trillions or quadrillions of connections among them, was a long-term project. With time, it might result in a bold new architecture for computing that would lead to a new level of computer intelligence. Perhaps then machines would come up with their own ideas, wrestle with concepts, appreciate irony, and think more like humans.
