Data and Goliath


Authors: Bruce Schneier


Most electronic surveillance doesn’t happen that way. It’s covert. We read newspapers
online, not realizing that the articles we read are recorded. We browse online stores,
not realizing that both the things we buy and the things we look at and decide not
to buy are being monitored. We use electronic payment systems, not thinking about
how they’re keeping a record of our purchases. We carry our cell phones with us, not
understanding that they’re constantly tracking our location.

Buzzfeed is an entertainment website that collects an enormous amount of information
about its users. Much of the data comes from traditional Internet tracking, but Buzzfeed
also has a lot of fun quizzes, some of which ask very personal questions. One of them,
“How Privileged Are You?”, asks about financial details, job stability, recreational
activities, and mental health.
Over two million people have taken that quiz, not realizing that Buzzfeed saves data
from its quizzes. Similarly, medical information sites like WebMD collect data on
what pages users search for and read.

Lest you think it’s only your web browsing, e-mails, phone calls, chats, and other
electronic communications that are monitored, old-fashioned paper mail is tracked
as well. Through a program called Mail Isolation Control and Tracking, the US Postal
Service photographs the exterior, front and back, of every piece of mail sent in the
US. That’s about 160 billion pieces annually. This data is available to law enforcement,
and certainly other government agencies as well.

Off the Internet, many surveillance technologies are getting smaller and less obtrusive.
In some cities, video cameras capture our images hundreds of times a day. Some are
obvious, but we don’t see a CCTV camera embedded in a ceiling light or ATM, or a gigapixel
camera a block away. Drones are getting smaller and harder to see; they’re now the
size of insects and soon the size of dust.

Add identification software to any of these image collection systems, and you have
an automatic omnipresent surveillance system. Face recognition is the easiest way
to identify people on camera, and the technology is getting better every year. In
2014, face recognition algorithms started outperforming people. There are other image
identification technologies in development: iris scanners that work at a distance,
gait recognition systems, and so on.

There’s more hidden surveillance going on in the streets. Those contactless RFID chip
cards in your wallet can be used to track people. Many retail stores are surreptitiously
tracking people by the MAC addresses and Bluetooth IDs—which are basically identification
numbers—broadcast by their smartphones. The goal is to record which aisles they walk
down, which products they stop to look at, and so on. People can be tracked at public
events by means of both these approaches.

In 2014, a senior executive from the Ford Motor Company told an audience at the Consumer
Electronics Show, “We know everyone who breaks the law, we know when you’re doing
it. We have GPS in your car, so we know what you’re doing.” This came as a shock and
surprise, since no one knew Ford had its car owners under constant surveillance. The
company quickly retracted
the remarks, but the comments left a lot of wiggle room for Ford to collect data on
its car owners. We know from a Government Accountability Office report that both automobile
companies and navigational aid companies collect a lot of location data from their
users.

Radar in the terahertz range can detect concealed weapons on people, and objects through
eight inches of concrete wall. Cameras can “listen” to phone conversations by focusing
on nearby objects like potato chip bags and measuring their vibrations. The NSA, and
presumably others, can turn your cell phone’s microphone on remotely, and listen to
what’s going on around it.

There are body odor recognition systems under development, too. On the Internet, one
company is working on identifying people by their typing style. There’s research into
identifying people by their writing style. Both corporations and governments are harvesting
tens of millions of voiceprints—yet another way to identify you in real time.

This is the future. Store clerks will know your name, address, and income level as
soon as you walk through the door. Billboards will know who you are, and record how
you respond to them. Grocery store shelves will know what you usually buy, and exactly
how to entice you to buy more of it. Your car will know who is in it, who is driving,
and what traffic laws that driver is following or ignoring. Even now, it feels a lot
like science fiction.

As surveillance fades into the background, it becomes easier to ignore. And the more
intrusive a surveillance system is, the more likely it is to be hidden. Many of us
would refuse a drug test before being hired for an office job, but many companies
perform invasive background checks on all potential employees. Likewise, being tracked
by hundreds of companies on the Internet—companies you’ve never interacted with or
even heard of—feels much less intrusive than a hundred market researchers following
us around taking notes.

In a sense, we’re living in a unique time in history; many of our surveillance systems
are still visible to us. Identity checks are common, but they still require us to
show our ID. Cameras are everywhere, but we can still see them. In the near future,
because these systems will be hidden, we may unknowingly acquiesce to even more surveillance.

AUTOMATIC SURVEILLANCE

A surprising amount of surveillance happens to us automatically, even if we do our
best to opt out. It happens because we interact with others, and they’re being monitored.

Even though I never post or friend anyone on Facebook—I have a professional page,
but not a personal account—Facebook tracks me. It maintains a profile of non-Facebook
users in its database. It tracks me whenever I visit a page with a Facebook “Like”
button. It can probably make good guesses about who my friends are based on tagged
photos, and it may well have the profile linked to other information it has purchased
from various data brokers. My friends, and those sites with the Like buttons, allow
Facebook to surveil me through them.

I try not to use Google search. But Google still collects a lot of information about
the websites I visit, because so many of them use Google Analytics to track their
visitors. Again, those sites let Google track me through them. I use various blockers
in my browser so Google can’t track me very well, but it’s working on technologies
that will circumvent my privacy practices.

I also don’t use Gmail. Instead, I use a local ISP and store all of my e-mail on my
computer. Even so, Google has about a third of my messages, because many of the people
I correspond with use Gmail. It’s not just Gmail.com addresses; Google hosts a lot
of organizations’ e-mail, even though those organizations keep their domain name addresses.
There are other examples. Apple has a worldwide database of Wi-Fi passwords, including
my home network’s, from people backing up their iPhones. Many companies have my contact
information because my friends and colleagues back up their address books in the cloud.
If my sister publishes her genetic information, then half of mine becomes public as
well.

Sometimes data we only intend to share with a few becomes surveillance data for the
world. Someone might take a picture of a friend at a party and post it on Facebook
so her other friends can see it. Unless she specifies otherwise, that picture is public.
It’s still hard to find, of course—until it’s tagged by an automatic face recognition
system and indexed by a search engine. Now that photo can be easily found with an
image search.

I am constantly appearing on other people’s surveillance cameras.
In cities like London, Chicago, Mexico City, and Beijing, the police forces have installed
surveillance cameras all over the place. In other cities, like New York, the cameras
are mostly privately owned. We saw the difference in two recent terrorism cases. The
London subway bombers were identified by government cameras, and the Boston Marathon
bombers by private cameras attached to businesses.

That data is almost certainly digital. Often it’s just stored on the camera, on an
endless loop that erases old data as it records new data. But increasingly, that surveillance
video is available on the Internet and being saved indefinitely—and a lot of it is
publicly searchable.
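
The "endless loop" storage described above can be pictured as a fixed-size buffer that discards the oldest frame whenever a new one arrives. The following is only a minimal sketch of that idea, not a description of any real camera's firmware; the class name `LoopRecorder`, the capacity of three, and the frame labels are all made up for illustration.

```python
from collections import deque


class LoopRecorder:
    """Hypothetical recorder that keeps only the most recent frames."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest item when full.
        self.frames = deque(maxlen=capacity)

    def record(self, frame):
        # Appending to a full buffer erases the oldest frame.
        self.frames.append(frame)

    def stored(self):
        # Return what is still on the "camera", oldest first.
        return list(self.frames)


recorder = LoopRecorder(capacity=3)
for frame in ["frame-1", "frame-2", "frame-3", "frame-4", "frame-5"]:
    recorder.record(frame)

print(recorder.stored())  # ['frame-3', 'frame-4', 'frame-5']
```

Footage that is saved indefinitely corresponds to removing the capacity limit, so nothing is ever erased.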

Unless we take steps to prevent it, being captured on camera will get even less avoidable
as life recorders become more prevalent. Once enough people regularly record video
of what they are seeing, you’ll be in enough of their video footage that it’ll no
longer matter whether or not you’re wearing one. It’s kind of like herd immunity,
but in reverse.

UBIQUITOUS SURVEILLANCE

Philosopher Jeremy Bentham conceived of his “panopticon” in the late 1700s as a way
to build cheaper prisons. His idea was a prison where every inmate could be surveilled
at any time, unawares. The inmate would have no choice but to assume that he was always
being watched, and would therefore conform. This idea has been used as a metaphor
for mass personal data collection, both on the Internet and off.

On the Internet, surveillance is ubiquitous. All of us are being watched, all the
time, and that data is being stored forever. This is what an information-age surveillance
state looks like, and it’s efficient beyond Bentham’s wildest dreams.

3

Analyzing Our Data

In 2012, the New York Times published a story on how corporations analyze our data
for advertising advantages.
The article revealed that Target Corporation could determine from a woman’s buying
patterns that she was pregnant, and would use that information to send the woman ads
and coupons for baby-related items. The story included an anecdote about a Minneapolis
man who’d complained to a Target store that had sent baby-related coupons to his teenage
daughter, only to find out later that Target was right.

The general practice of amassing and saving all kinds of data is called “big data,”
and the science and engineering of extracting useful information from it is called
“data mining.” Companies like Target mine data to focus their advertising. Barack
Obama mined data extensively in his 2008 and 2012 presidential campaigns for the same
purpose. Auto companies mine the data from your car to design better cars; municipalities
mine data from roadside sensors to understand driving conditions. Our genetic data
is mined for all sorts of medical research. Companies like Facebook and Twitter mine
our data for advertising purposes, and have allowed academics to mine their data for
social research.

Most of these are secondary uses of the data. That is, they are not the reason the
data was collected in the first place. In fact, that’s the basic promise of big data:
save everything you can, and someday you’ll be able to figure out some use for it
all.

Big data sets derive value, in part, from the inferences that can be made from them.
Some of these are obvious. If you have someone’s detailed location data over the course
of a year, you can infer what his favorite restaurants are. If you have the list of
people he calls and e-mails, you can infer who his friends are. If you have the list
of Internet sites he visits—or maybe a list of books he’s purchased—you can infer
his interests.
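
The obvious inferences above amount to little more than counting. As a rough sketch under invented data (the place names, contact names, and the helper `most_frequent` are all hypothetical, and real data mining is far more elaborate), tallying how often each item appears in a log already suggests someone's favorite restaurants and closest contacts:

```python
from collections import Counter


def most_frequent(events, top_n=2):
    """Return the top_n most common items in a log, most common first."""
    return [item for item, _count in Counter(events).most_common(top_n)]


# Made-up logs standing in for a year of location and call records.
location_log = [
    "Thai Palace", "Joe's Diner", "Thai Palace",
    "Gym", "Thai Palace", "Joe's Diner",
]
call_log = ["Alice", "Bob", "Alice", "Carol", "Alice", "Bob"]

print(most_frequent(location_log))  # ['Thai Palace', "Joe's Diner"]
print(most_frequent(call_log))      # ['Alice', 'Bob']
```

Nothing in the sketch needs the content of a call or the purpose of a visit; the record that the event happened is enough.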

Some inferences are more subtle. A list of someone’s grocery purchases might imply
her ethnicity. Or her age and gender, and possibly religion. Or her medical history
and drinking habits. Marketers are constantly looking for patterns that indicate someone
is about to do something expensive, like get married, go on vacation, buy a home,
have a child, and so on. Police in various countries use these patterns as evidence,
either in a court or in secret. Facebook can predict race, personality, sexual orientation,
political ideology, relationship status, and drug use on the basis of Like clicks
alone. The company knows you’re engaged before you announce it, and gay before you
come out—and its postings may reveal that to other people without your knowledge or
permission. Depending on the country you live in, that could merely be a major personal
embarrassment—or it could get you killed.

There are a lot of errors in these inferences, as all of us who’ve seen Internet ads
that are only vaguely interesting can attest. But when the ads are on track, they
can be eerily creepy—and we often don’t like it. It’s one thing to see ads for hemorrhoid
suppositories or services to help you find a girlfriend on television, where we know
they’re being seen by everyone. But when we know they’re targeted at us specifically,
based on what we’ve posted or liked on the Internet, it can feel much more invasive.
This makes for an interesting tension: data we’re willing to share can imply conclusions
that we don’t want to share. Many of us are happy to tell Target our buying patterns
for discounts and notifications of new products we might like to buy, but most of
us don’t want Target to figure out that we’re pregnant. We also don’t want the large
data thefts and fraud that inevitably accompany these large databases.

When we think of computers using all of our data to make inferences, we have a
very human way of thinking about it. We imagine how we would make sense of the data,
and project that process onto computers. But that’s not right. Computers and people
have different strengths, weaknesses, and limits. Computers can’t abstractly reason
nearly as well as people, but they can process enormous amounts of data ever more
quickly. (If you think about it, this means that computers are better at working with
metadata than they are at handling conversational data.) And they’re constantly improving;
computing power is still doubling every eighteen months, while our species’ brain
size has remained constant. Computers are already far better than people at processing
quantitative data, and they will continue to improve.
