Final Jeopardy

Authors: Stephen Baker

Unlike the brittle expert systems, neural networks were supple. They specialized in pattern detection, not a series of if/then commands. They never choked on changes in the data but simply adjusted. While expert systems processed data sequentially, as if following a recipe, the electronic neurons crunched in unison—in parallel. Their weakness? Since these collections of artificial neurons learned by themselves, it was nearly impossible to figure out how they reached their conclusions or to understand what they were picking up about the world. A neural net was a black box.

By the time Ferrucci returned to IBM Research, in 1995, he was looking beyond expert systems and neural nets. In his spare time, he and a colleague from RPI, Selmer Bringsjord, were building a machine called Brutus, which wrote fiction. And they were writing a book about their machine, Artificial Intelligence and Literary Creativity. Brutus, they wrote, is “utterly devoid of emotion, but he nonetheless seems to have within his reach things that touch not only our minds, but our heart.”

The idea for the program, Ferrucci later said, came when Bringsjord asked him if a machine could create its own story line. Ferrucci took up the challenge. Instead of teaching the machine to dream up plots, he programmed it with about a dozen themes, from betrayal to revenge. For each theme, the machine was first given a series of literary examples and then a program to develop stories along those lines. One of its models for betrayal was Shakespeare's Julius Caesar (the program was named for Caesar's confidant-turned-conspirer, Brutus). The program produced serviceable plots, but they were less than riveting. “The one thing it couldn't do is figure out if something was interesting,” Ferrucci said. “Machines don't understand that.”

In his day job, Ferrucci was teaching computers more practical lessons. As head of Semantic Analysis and Integration at IBM, he was trying to instruct them to make sense of human communication. On the Internet, records of our words and activities were proliferating as never before. Companies—IBM and its customers alike—needed tools to interpret these new streams of information and put them to work. Ideally, an IBM program would tell a manager what customers or employees were saying or thinking as well as what trends and insights to draw from them and perhaps what decisions to make.

Within IBM itself, some two hundred researchers were developing a host of technologies to mine what humans were writing and saying. But each one operated within its own specialty. Some parsed sentences, analyzing the grammar and vocabulary. Others hunted Google-style for keywords and Web links. Some constructed massive databases and ontologies to organize this knowledge. A number of them continued to hone expert systems and neural networks. Meanwhile, the members of the Q-A team coached their computer for the annual TREC competitions. “We had lots of different pockets of researchers working on these different analytical algorithms,” Ferrucci said. “But any time you wanted to combine them, you had a problem.” There was simply no good way to do it.

In the early 2000s, Ferrucci and his team put together a system to unify these diverse technologies. It was called UIMA, Unstructured Information Management Architecture. It was tempting to think of UIMA as a single brain and all of the different specialties, from semantic analysis to fact-checking, as cognitive regions. But Ferrucci maintained that UIMA had no intelligence of its own. “It was just plumbing,” he said. Idle plumbing, in fact, because for years it went largely unused.

But a Jeopardy project, he realized, could provide a starring role for UIMA. Blue J would be more than a single machine. His team would pull together an entire conglomeration of Q-A approaches. The machine would house dozens, even hundreds of algorithms, each with its own specialty, all of them chasing down answers at the same time. A couple of the jury-rigged algorithms that James Fan had ginned up could do their thing. They would compete with others. Those that delivered good answers for different types of questions would rise in the results, a bit like the best singers in the Handel sing-along. As each one amassed its record, it would gain stature in its specialty and be deemed clueless in others. Loser algorithms, those that failed to produce good results in even a single niche, would be ignored and eventually removed. (Each one would have to prove its worth in at least one area to justify its inclusion.) As the system learned which algorithms to pay attention to, it would grow smarter. Blue J would evolve into an ecosystem in which the key to survival, for each of the algorithms, would be to contribute to correct responses to Jeopardy clues.
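The ecosystem idea can be made concrete with a small sketch. It is an illustration only, not IBM's code: the class name TrackRecord, the functions pick_answer and prune, the smoothing formula, and the 0.5 cutoff are all assumptions chosen for the example. Each algorithm keeps a record per question type, its vote counts for more where its record is good, and an algorithm that shines in no niche is dropped.

```python
class TrackRecord:
    """Hypothetical per-algorithm scorecard, kept separately for each question type."""

    def __init__(self):
        self.tried = {}   # question type -> number of attempts
        self.right = {}   # question type -> number of correct answers

    def update(self, question_type, was_correct):
        self.tried[question_type] = self.tried.get(question_type, 0) + 1
        if was_correct:
            self.right[question_type] = self.right.get(question_type, 0) + 1

    def weight(self, question_type):
        # Smoothed accuracy (an assumption): an untested algorithm starts at 0.5.
        tried = self.tried.get(question_type, 0)
        right = self.right.get(question_type, 0)
        return (right + 1) / (tried + 2)


def pick_answer(candidates, records, question_type):
    """candidates maps algorithm name -> proposed answer; the heaviest total vote wins."""
    scores = {}
    for name, answer in candidates.items():
        scores[answer] = scores.get(answer, 0.0) + records[name].weight(question_type)
    return max(scores, key=scores.get)


def prune(records, threshold=0.5):
    """Keep only algorithms that beat the threshold in at least one question type."""
    return {
        name: record
        for name, record in records.items()
        if any(record.weight(q) > threshold for q in record.tried)
    }
```

In this toy version, an algorithm that has been right on eight of eight geography clues outvotes one that has been wrong on all eight, and the second would be pruned unless it had earned its keep on some other kind of clue.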

While part of his team grappled with Blue J's architecture, Ferrucci had several researchers trolling the Internet for Jeopardy data. If this system was going to compete with humans in the game, it would require two types of information. First, it needed Jeopardy clues, thousands of them. This would be the machine's study guide, what those in the field of machine learning called a training set. A human player might watch a few Jeopardy shows to get a feel for the types of clues and then take some time to study country capitals or brush up on Shakespeare. The computer would do the same work statistically. Each Jeopardy clue, of course, was unique and would never be repeated, so it wasn't a question of learning the answers. But a training set would orient the researchers. Given thousands of clues, IBM programmers could see what percentage of them dealt with geography, U.S. presidents, words in a foreign language, soap operas, and hundreds of other categories, and how much detail the computer would need for each. The clue asking which presidential candidate carried New York State in 1948, for example (“Who is Thomas Dewey?”), indicated that the computer would have to keep track of White House losers as well as winners. What were the odds of a presidential loser popping up in a clue?
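The kind of tally described here, the share of clues falling into each category, takes only a few lines. The sketch below is a guess at the general shape of such a count, not the team's actual tooling; the function name category_shares and the (category, clue text) pair format are assumptions made for the example.

```python
from collections import Counter


def category_shares(clues):
    """clues is a list of (category, clue_text) pairs; returns category -> fraction of clues."""
    counts = Counter(category for category, _ in clues)
    total = sum(counts.values())
    # most_common() orders the result from the most frequent category down.
    return {category: count / total for category, count in counts.most_common()}
```

Run over thousands of archived clues, a table like this would suggest where to spend the machine's study time, and the same counting applied to finer labels (say, losing presidential candidates) would answer the question that closes the paragraph above.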

Digging through the training set, researchers could also rank various categories of puzzles and word games. They could calculate the odds that a Jeopardy match would include a puzzling Before & After, asking, for example, about the “Kill Bill star who played 11 seasons behind the plate for the New York Yankees” (“Who is Uma Thurman Munson?”). A rich training set would give them a chance to scrutinize the language in Jeopardy clues, including abbreviations, slang, and foreign words. If the machine didn't recognize AKA as “also known as” or “oops!” as a misunderstanding, if it didn't recognize “sayonara,” “au revoir,” “auf Wiedersehen,” and hundreds of other expressions, it could kiss entire Jeopardy categories goodbye. Without a good training set, researchers might be filling the brain of their bionic student with the wrong information.

Second, and nearly as important, they needed data on the performance of past Jeopardy champs. How often did they get the questions right? How long did they take to buzz in? What were their betting strategies in Double Jeopardy and Final Jeopardy? These humans were the competition, and their performance became the benchmark for Blue J.

In the end, it didn't take a team of sleuths to track down much of this data. With a simple Internet search, they found a Web site called J! Archive, a trove of historical Jeopardy data. A labor of love by Jeopardy fans, the site detailed every game in the show's history, with the clues, the contestants, their answers, and even the comments by Alex Trebek. Here were more than 180,000 clues, hundreds of categories, and the performance of thousands of players, from first-time losers to champions like Brad Rutter and Ken Jennings.

In these early days, the researchers focused only on Jennings. He was the gold standard. And with records of his seventy-four games—more than four times as many as any other champion—they could study his patterns, his strengths and vulnerabilities. They designed a chart, the Jennings Arc, to map his performance: the percentage of questions on which he won the buzz and his precision on those questions. Each of his games was represented by a dot, and the best ones, with high buzz and high accuracy, floated high on the chart to the extreme right. His precision averaged 92 percent and occasionally reached 100 percent. He routinely dominated the buzz, on one game answering an astounding 75 percent of the clues. For each of these games, the IBM team calculated how well a competitor would have to perform to beat him. The numbers varied, but it was clear that their machine would need to win the buzz at least half the time, get about nine of ten right—and also win its share of Daily Doubles.
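Each dot on a chart like the Jennings Arc comes down to two ratios per game. The sketch below shows one way to compute them and to check a game against the rough target named above (winning the buzz at least half the time and getting about nine of ten right). The names GameRecord, arc_point, and meets_target are invented for the example, and the numbers in the usage lines are made up, not taken from Jennings's record.

```python
from dataclasses import dataclass


@dataclass
class GameRecord:
    clues_in_game: int   # clues played in the game
    buzzes_won: int      # clues on which the player won the buzz
    correct: int         # of those, how many were answered correctly


def arc_point(game):
    """Return (buzz share, precision) for one game: one dot on the chart."""
    buzz_share = game.buzzes_won / game.clues_in_game
    precision = game.correct / game.buzzes_won if game.buzzes_won else 0.0
    return buzz_share, precision


def meets_target(game, min_buzz_share=0.5, min_precision=0.9):
    """Check a game against the rough bar described in the text (thresholds are assumptions)."""
    buzz_share, precision = arc_point(game)
    return buzz_share >= min_buzz_share and precision >= min_precision


# Illustrative numbers only.
strong_game = GameRecord(clues_in_game=60, buzzes_won=45, correct=42)
weak_game = GameRecord(clues_in_game=60, buzzes_won=20, correct=19)
```

Here strong_game wins the buzz on 75 percent of clues and clears both thresholds, while weak_game is precise but buzzes too rarely to qualify. Daily Doubles and wagering, which the text also mentions, are left out of this sketch.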

In the early summer of 2007, after the bake-off, the Jeopardy team marked the performance of the Piquant system on the Jennings Arc. (Basement Baseline, which lacked a confidence gauge, did not produce enough data to be charted there.) Piquant's performance was so far down and to the left of Ken Jennings's dots, it appeared to be . . . well, exactly what it was: an alien species, and not destined for Jeopardy greatness.

When word of this performance spread around the Yorktown labs, it only fueled the concerns that Ferrucci's team was heading for an embarrassing fall—if it ever got that far. Mark Wegman, then the head of computer science at IBM Research, described himself as someone who's “usually wildly optimistic about technology.” But when he saw the initial numbers, he said, “I thought there was a 10 percent chance that in five years we could pull it off.”

For Ferrucci, Piquant's failure was anything but discouraging. It gave him the impetus to march ahead on a different path, toward Blue J. “This was a chance to do something really, really big,” he said. However, he wasn't sure his team would see it this way. So he gathered the group of twelve in a small meeting room at the Hawthorne labs. He started by describing the challenges ahead. It would be a three- to five-year project, similar in length to a military deployment. It would be intense, and it could be disastrous. But at the same time they had an opportunity to do something memorable. “We could sit here writing papers for the next five years,” he said, “or we build an entirely new type of computer.” He introduced, briefly, a nugget of realpolitik. There would be no other opportunities for them in Q-A technologies within IBM. He had effectively engineered a land grab, putting every related resource into his Jeopardy ecosystem. If they wanted to do this kind of science, he said, “this was the only place to be.”

Then he went around the room with a simple question: “Are you in or are you out?”

One by one, the researchers said yes. But their response was not encouraging. The consensus was that they could build a machine that could compete with, but probably not beat, a human champion. “We thought it could earn positive money before getting to Final Jeopardy,” said Chu-Carroll, one of the few holdovers on Ferrucci's team from the old TREC unit. “At least we wouldn't be kicked off the stage.”

With this less than ringing endorsement, Ferrucci sent word to Paul Horn that the Jeopardy challenge was on. He promised to have a machine, within twenty-four months, that could compete against average human players. Within thirty-six to forty-eight months, his machine, he said, would beat champions one-quarter of the time. And within five to seven years, the Jeopardy machine would be “virtually unbeatable.” He added that this final goal might not be worth pursuing. “It is more useful,” he said, “to create a system that is less than perfect but easily adapted to new areas.” A week later, Ferrucci and a small team from IBM Research flew to Culver City, to the Robert Young Building on the Sony lot. There they'd see whether Harry Friedman would agree to let the yet-to-be-built Blue J play Jeopardy on national television.

4. Educating Blue J

JENNIFER CHU-CARROLL, sitting amid a clutter of hardware and piles of paper in her first-floor office in the Hawthorne labs, wondered what in the world to teach Blue J. How much of the Bible would it have to know? The Holy Book popped up in hundreds of Jeopardy clues. But did that mean the computer needed to know every psalm, the laws of Deuteronomy, Jonah's thoughts and prayers while inside the whale? Would a dose of Dostoevsky help? She could feed it The Idiot, Crime and Punishment, or any of the other classics that might pop up in a Jeopardy clue. When it came to traditional book knowledge, feeding Blue J's brain was nearly as easy as Web surfing.

This was in July 2007. Chu-Carroll's boss, David Ferrucci, and the small IBM contingent had just flown back from Culver City, where they had been given a provisional thumbs-up from Harry Friedman. A man-machine match would take place, perhaps in late 2010 or early 2011. IBM needed the deadline to mobilize the effort within the company and to establish it as a commitment, not just a vague possibility. Jeopardy, for its part, would bend the format a bit for the machine. The games would not include audio or visual clues, where contestants have to watch a snippet of video or recognize a bar of music. And they might let the machine buzz electronically instead of hitting a physical button. The onus, according to the preliminary agreement, was on IBM to come up with a viable player in time for the match.

It was up to Chu-Carroll and a few of her colleagues to map out the machine's reading curriculum. Chu-Carroll had black bangs down to her eyes and often wore sweatshirts and jeans. Like practically everyone else on the team, she had a doctorate in computer science, hers from the University of Delaware. She had worked for five years at Lucent Technologies' Bell Labs, in New Jersey. There she taught machines how to participate in a dialogue and how to modulate their voices to communicate different signals. (Lucent was developing automated call centers.) When Chu-Carroll came to IBM in 2001, joining her husband, Mark, she plunged into building Q-A technologies. (Mark later left for Google.)
