Authors: Daniel Suarez
“What’s my objective?”
The colonel closed his laptop and focused his gaze on Odin. “You are to discover the source of the drone attacks on the continental United States.”
“You’re referring to the terror bombings, sir?”
“That’s a cover story. The truth is more worrisome. We think whoever was behind the Karbala attack is also behind the CONUS attacks.”
Odin considered the implications.
“One more thing.” The colonel paused for emphasis. “You are to pursue your mission no matter where it leads you—even if you are commanded to stop. You must continue, and you must succeed. Do you understand me, Master Sergeant?”
Odin nodded. “Yes, I believe I do, sir.”
CHAPTER 3
Raconteur
“Good afternoon, ladies and gentlemen. My name is Joshua Strickland, team lead for visual intelligence development here at the Stanford Vision Lab. I’d like to thank you all for coming today.”
Strickland stood at the head of a darkened, windowless lecture hall in the basement of the Gates Computer Science Building. Beside him the camera-eye logo of the Vision Lab filled a large projection screen. In the PowerPoint afterglow he saw familiar and unfamiliar faces among a small audience seated primarily in the front two rows. He focused on the serious faces seated just before him.
“An especially warm welcome to our distinguished guests from the Transformational Convergence Technology Office. Thanks also to our faculty advisor, Doctor Lei Li, without whose support we would not be presenting to you today.”
There was timid applause from somewhere in the darkness.
Strickland paused to collect his thoughts. So much was riding on this. He took a breath then began, “What you’re about to see is a visual intelligence technology we call Raconteur.” A click of his wireless remote, and the slide changed to an animation of dozens, then hundreds, and then thousands of individual video insets, swarming. It was a vast stream of graphic data. “Visual intelligence is often confused with ‘computer vision’—but it’s much more than that. Visual intelligence means giving machines the ability not merely to identify objects in images—which has been possible for years—but the cognitive ability to discern what’s occurring in a scene. Concept detection, integrated cognition, interpolation—prediction. What could have happened, and what might happen next. It means giving machines not only the ability to see but to understand what they see.”
He searched the faces of those front and center. “Why is this important?”
He clicked the remote, and the slide changed to surveillance images of London subway bombers moving through stations and standing in railcars. “In an increasingly dangerous world, video surveillance represents society’s best hope to detect threats before they materialize. But this flood of visual imagery means an exponential increase in the volume of surveillance video that must be analyzed—and analyzed real-time if it is to be of use not just in reviewing criminal acts after the fact but in preventing criminal acts.”
The image changed to that of a burned-out Starbucks on an urban street. Then another photo from a newspaper showing a burned-out SUV beneath the headline SENATOR ASSASSINATED IN TERROR BOMBING. “We need only consider the recent unsolved terror bombings here in the United States to recognize how critical visual intelligence is to our future.”
Strickland scanned the faces of his audience. They were with him.
“How do we imbue machines with this ability? We do this by emulating the way humans process spatiotemporal events. Human visual cognition is closely attuned to change, and it’s these changes that create what we call ‘attention states.’ We acquire ‘attention states’ from video imagery through an algorithmic mechanism that includes notions of focus of attention, markers placed on salient objects, and the critical relationships between those objects in terms of motion and contact. These are necessary to distinguish individual events from one another. A series of attentional states over time then becomes a visual attention trace—or VAT—which begins to form the elements of a story. One that can be programmatically narrated through machine-readable text—text that can then be algorithmically searched for relevance, in real time, by an ‘audience’ of other, simpler programs. This is why we call our system ‘Raconteur’—because it tells the story of what’s happening in a way that common systems can understand. And like any good storyteller, ‘Raconteur’ remembers how the current scene fits into the whole.”
Strickland knew that his combination of youth and poise would be an advantage here. Disruptive technology was like that. Now, at twenty-two, he was leading a team that was about to revolutionize visual image processing. Although he wasn’t the driving force behind the innovations, he did know how to spot and recruit talent to his work teams. If history was any guide, that was the primary skill necessary for success in Silicon Valley. Being able to spot a good idea and knowing who could make it work. Removing obstacles and inspiring others, that was the biggest part of innovation.
“We have worked with DARPA’s technical staff to coordinate the following demonstration, in strict adherence to the Mind’s-Eye Project guidelines. Please remember that our system has not been previously exposed to the images that you—and it—are about to see. We look forward to taking your questions after the test. Until then, ladies and gentlemen, I give you ‘Raconteur,’ the storyteller. . . .”
More light applause as the screen went black.
Strickland stepped aside as two smaller screens glowed to life up front—one bearing the title “TCTO Phase 1—Recognition Test.” The other screen displayed a blinking cursor.
Strickland moved to the side to stand with his project team, bracing for whatever came next. He cast a tense look at his development lead, Vijay Prakash, but the handsome, dour Bengali ignored Strickland’s arched eyebrows and looked to the screen. The rest of the grad student crew—Sourav Chatterjee, Gerhard Koepple, Wang Bao-Rong, and Nikolay Kasheyev—nodded in acknowledgment of the moment. Then they all turned to watch the screens too.
The words “TCTO Phase 1—Recognition Test” soon appeared also on the right-hand screen. The twin projections were set up so that whatever appeared in the left-hand screen, Raconteur would have to make sense of and describe in text on the right-hand screen.
Strickland felt relief wash over him as he stood in the darkness. Failing simple character recognition while reading the title card would have killed them, but then, OCR was handled by a licensed library, not their code. Still, he knew the DARPA judges wouldn’t cut them any slack for choosing a bad library.
But the test was already moving on. No time to ponder disaster scenarios. The left-hand screen changed to black-and-white surveillance video. It depicted a woman walking down an office hallway carrying a cardboard records box.
Strickland tensed again. He’d seen the VI algorithms work a hundred thousand times and had a pretty good idea how they functioned, but they’d never been run live in front of such an important audience. What happened next would decide the next several years of his life—of their lives—and quite possibly the trajectory of Strickland’s career. He focused on the blinking cursor on the right-hand screen—the Raconteur output panel.
As the video continued, text began to appear. . . .
Person carries object along corridor.
Murmurs of approval swept through the room, but Strickland remained tense.
C’mon. Do it. Do it, baby. . . .
The cursor then began expanding on the details.
Woman carries box along corridor.
More murmurs and some clapping. Strickland cast a glance at the DARPA managers, who were nodding and talking softly among themselves. Taking notes. A wave of relief flowed through him. He’d had no idea how clenched he was, but now that initial impressions were good, the judges would be more receptive if there was a later glitch. He told himself that no matter what happened from here on, they had at least avoided a meltdown. They had gotten on the scoreboard.
The scene changed to an exterior; an American soldier standing on a littered street in some Middle Eastern slum, weapon slung and motioning to unseen people. A small—possibly Iraqi—child entered the frame behind him. Strickland felt the dread returning, as the text scrolled. . . .
Armed person . . . approached by child.
More applause and some actual shouts of excitement.
Strickland felt a smile crease his face before he clamped down on it. Too early to celebrate.
Uniformed soldier approached by child in street.
The hoots continued. So far so good, but Strickland knew the difficulty levels were only going to increase. As he watched, the system mistook another soldier entering the frame as a possible threat—
#ALERT—armed person.
Not too far off the truth, though.
The control frame faded to black and displayed the title: “TCTO Phase 1—Interpolation Test.”
Here we go. The complexity of visual concepts ramped up fast. It was why their system focused on deriving context first while interpreting a scene, and why it never forgot what it had seen previously. That was key to avoiding a lot of useless processing. Humans walking down a city sidewalk, for example, do not suddenly expect to see a mountain vista or a rolling sea all around them. That would be impossible—thus, even if these things appeared, they were likely to be graphical representations like ads, not the actual thing. Daisy-chaining events made it possible to take the known and use it as a base camp from which to explore the unknown—pushing that frontier back just a little at a time, like ants exploring terrain.
As Strickland knew, even a person with Down syndrome was a generalized genius compared to special-purpose computer algorithms. Breaking things down to their simplest elements was the only way to accomplish anything useful. Prakash had worked out the architecture, and the design made Strickland’s head hurt. But if the damned thing worked, he’d forgive all of the man’s arrogance.
The scene on the left changed to a woman in a burka—a burka! What U.S. troops called a “BMO,” short for “black moving object.” DARPA bastards. No face, no clear view of her arms or torso. On-screen she resembled a walking bag. But if memory served, Vijay and Gerhard’s gait detection code should help assign the attribute of “human” to walking objects—and along with “humanity” came implied geometry, potential actions, and patterns of movement. The burka woman was moving along a narrow village road carrying what appeared to be a plastic water jug on her head.
The room waited with bated breath. Then the text started scrolling.
Person carries object down street.
Okay, so far so good.
The woman entered a dwelling through a doorway on the left, and the system correctly described her disappearance. Then all was quiet for a moment, until she reemerged without the jug on her head. This was the real test. Cognition.
#ALERT—DROPPED—ITEM: Person observed carrying object into building and leaving without it.
Strickland felt the importance of this moment as loud applause filled the room. They had just passed the bomber test. Years of work flashed before him. He felt the backslaps of his teammates, and he turned to their smiling faces in the semidarkness. He even grabbed hands and side-hugged Prakash. They’d never gotten along well—always struggling for the reins. But this moment was what they’d been working for. Even the eternally serious Prakash gave the barest hint of a smile. A smirk, really.
Strickland had to admit the guy knew what he was doing. “Great work, Vijay.”
Prakash nodded. “It’s a start.”
Prick. Couldn’t he enjoy anything?
There were calls for quiet as the test continued, but a warm tingling had settled across Strickland. They would get their research grant. He knew it now. The excited discussions among the judges told him they’d outperformed anything they’d ever seen. His professional career had begun, and he would forever remember this moment. He couldn’t wait to tell Sandra.
But then he remembered that they weren’t seeing each other anymore.
* * *
Strickland popped the cork on a bottle of cheap champagne and let foam spew in all directions as his research mates screamed in jubilation. Back in the KSL lab cluster on the second floor, there was much to celebrate. The lab was an open workspace with HD digital video cameras clamped to brackets and on tripods scattered here and there, rack servers in one corner, their LED lights flickering as though in time to the music. LCD monitors on desks and mounted to the ceiling scrolled Raconteur-generated text of the festivities . . . most of it not too far off—but then they would now have a federal grant to perfect it, wouldn’t they?
What they’d believed to be groundbreaking work had been recognized. The venture capital arm of a U.S. intelligence agency had tentatively agreed to finance their research project, but along with that came top-level introductions to other private venture firms. His team now represented the very bleeding edge in the field of visual intelligence. All their personal rancor and disagreements had been for a purpose, and now the entire crew shouted another toast, enjoying the moment—Chatterjee, Koepple, Prakash, Wang, Kasheyev—a truly international team. And then there were the other lab teams in the cluster, along with their faculty advisors. There were spouses and significant others, as well—turning it into a full-scale party. Strickland wished he had someone to share it with too. But that would come in time, especially now that success had found him. In a few years he hoped to be a partner in some venture capital firm on Sand Hill Road. He was on his way.