
This is because most of the cost of privacy breaches falls on the people whose data
is exposed. In economics, this is known as an externality: an effect of a decision
not borne by the decision maker. Externalities limit the incentive for companies to
improve their security.

You might expect users to respond by favoring secure services over insecure ones—after
all, they’re making their own buying decisions on the basis of the same market model.
But that’s not generally possible. In some cases, software monopolies limit the available
product choice. In other cases, the “lock-in effect” created by proprietary file formats,
existing infrastructure, compatibility requirements, or software-as-a-service makes
it harder to switch. In many cases, we don’t know who is collecting our data; recall
the discussion of hidden surveillance in Chapter 2. In all cases, it’s hard for buyers
to assess the security of any data service. And it’s not just nontechnical buyers;
even I can’t tell you whether or not to entrust your privacy to any particular service
provider.

Liabilities change this. By raising the cost of privacy breaches, we can make companies
accept the costs of the externality and force them to expend more effort protecting
the privacy of those whose data they have acquired. We’re already doing this in the
US with healthcare data; privacy violations in that industry come with serious fines.

And it’s starting to happen here with data from stores, as well. Target is facing
several lawsuits as a result of its 2013 breach. In other cases, banks are being sued
for inadequately protecting the privacy of their customers. One way to help would be
to require companies to inform users about all the information they possess that might
have been compromised.

These cases can be complicated, with multiple companies involved in a particular incident,
and apportioning liability will be hard. Courts have been reluctant to find a value
in privacy, because people willingly give it away in exchange for so little. And because
it is difficult to link harms from loss of privacy to specific actions that caused
those harms, such cases have been very difficult to win.

There’s a better way to approach this: treat the privacy breach itself as the point of
harm, not any ensuing damages. Companies must be compelled to comply with regulations like
the 1973 Code of Fair Information Practices and other similar privacy regulations
that are currently not mandatory; then the violation becomes failure to adhere to
the regulations.

There’s a parallel with how the EPA regulates environmental pollutants. Factories
have a legal obligation to limit the amount of particulates they emit into the environment.
If they exceed these limits, they are subject to
fines. There’s no need to wait until there is an unexpected surge of cancer cases.
The problem is understood, the regulations are in place, companies can decide whether
to build factories with coal furnaces or solar panels, and sanctions are waiting if
they fail to comply with what are essentially best practices. That is the direction
we need to go.

The US Code of Fair Information Practices (1973)

The Code of Fair Information Practices is based on five principles:

1. There must be no personal data record-keeping systems whose very existence is secret.

2. There must be a way for a person to find out what information about the person
is in a record and how it is used.

3. There must be a way for a person to prevent information about the person that was
obtained for one purpose from being used or made available for other purposes without
the person’s consent.

4. There must be a way for a person to correct or amend a record of identifiable information
about the person.

5. Any organization creating, maintaining, using, or disseminating records of identifiable
personal data must assure the reliability of the data for their intended use and must
take precautions to prevent misuses of the data.

To be sure, making systems more secure will cost money, and corporations will pass
those costs on to users in the form of higher prices if they can. But users are already
paying extra costs for insecure systems: the direct and indirect costs of privacy
breaches. Making companies liable for breaches moves those costs to them and, as a
by-product, causes the companies to improve their security. The relevant term from
economics is “least cost avoider”: it is economically efficient to assign liability
to the entity in possession of the data because it is best positioned to minimize
risk. Think about it—what can you do if you want Facebook to better secure your data?
Not much. Economic theory says that’s why the company should bear the cost of poor
security practices.

REGULATE DATA USE

Unlike in the EU, in the US today personal information about you is not your property;
it’s owned by the collector. Laws protect specific categories of personal data—financial
data, healthcare information, student data, videotape rental records—but we have nothing
like the broad privacy protection laws you find in European countries. But broad legal
protections are really the only solution; leaving the market to sort this out will
lead to even more invasive mass surveillance.

Here’s an example. Dataium is a company that tracks you as you shop for a car online.
It monitors your visits to different manufacturers’ websites: what types of cars you’re
looking at, what options you click on for more information, what sorts of financing
options you research, how long you linger on any given page. Dealers pay for this
information about you—not just information about the cars they sell, but the cars
you looked at that are sold by other manufacturers. They pay for this information
so that when you walk into a showroom, they can more profitably sell you a car.

Think about the economics here. That information might cost you (ballpark estimate)
$300 extra on the final price when you buy your car. That
means it’s worth no more than $300 to protect yourself from Dataium’s tactics. But
there are 16 million cars sold annually in the US. Even if you assume that Dataium
has customer information relevant to just 2% of them, that means it’s worth about
$100 million to the company to ensure that its tactics work.
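
Here is the back-of-the-envelope arithmetic behind that figure, using the ballpark numbers
above (a rough sketch in Python, not market data):

    # Back-of-the-envelope arithmetic for the Dataium example (illustrative figures only).
    value_per_buyer = 300            # rough extra cost on one car purchase, in dollars
    cars_sold_per_year = 16_000_000  # annual US car sales
    dataium_share = 0.02             # assume Dataium has data on just 2% of those buyers

    # What it's worth to any single buyer to protect themselves:
    print(f"Worth to one buyer:   ${value_per_buyer}")

    # What the tactic is worth to the company in aggregate:
    aggregate = value_per_buyer * cars_sold_per_year * dataium_share
    print(f"Worth to the company: ${aggregate:,.0f}")  # about $96 million -- roughly $100 million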

This asymmetry is why market solutions tend to fail. It’s a collective action problem.
It’s worth $100 million to all of us collectively to protect ourselves from Dataium,
but we can’t coordinate effectively. Dataium naturally bands the car dealers together,
but the best way for us customers to band together is through collective political
action.

The point of use is a sensible place to regulate, because much of the information
that’s collected about us is collected because we want it to be. We object when that
information is being used in ways we didn’t intend: when it is stored, shared, sold,
correlated, and used to manipulate us in some stealthy way. This means that we need
restrictions on how our data can be used, especially restrictions on ways that differ
from the purposes for which it was collected.

Other problems arise when corporations treat their underlying algorithms as trade
secrets: Google’s PageRank algorithm, which determines what search results you see,
and credit-scoring systems are two examples. The companies have legitimate concerns
about secrecy. They’re worried both that competitors will copy them and that people
will figure out how to game them. But I believe transparency trumps proprietary claims
when the algorithms have a direct impact on the public. Many more algorithms can be
made public—or redesigned so they can be made public—than currently are. For years,
truth in lending and fair lending laws have required financial institutions to ensure
that the algorithms they use are both explainable and legally defensible. Mandated
transparency needs to be extended into other areas where algorithms hold power over
people: they have to be open. And where full openness isn’t possible, there are ways of
auditing algorithms for fairness without making them public.
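
As a rough illustration of that last point, an auditor can treat a scoring algorithm as a
black box: feed it test applicants and compare outcomes across groups. This is only a
sketch; score_applicant stands in for whatever proprietary model is being audited, and the
threshold and field names are made up:

    # Hypothetical sketch of a black-box fairness audit: probe the model, compare outcomes by group.
    # score_applicant() is a stand-in for a proprietary scoring algorithm we cannot inspect.
    from collections import defaultdict

    def audit_approval_rates(applicants, score_applicant, threshold=600):
        """Return each demographic group's approval rate without seeing the model's internals."""
        approvals = defaultdict(int)
        totals = defaultdict(int)
        for person in applicants:
            totals[person["group"]] += 1
            if score_applicant(person) >= threshold:
                approvals[person["group"]] += 1
        return {group: approvals[group] / totals[group] for group in totals}

    # An auditor would compare these rates across groups (for instance, using the
    # "80 percent rule" for disparate impact) and flag the model if one group's
    # approval rate falls far below another's.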

Corporations tend to be rational risk assessors, and will abide by regulation. The
key to making this work is oversight and accountability. This isn’t something unusual:
there are many regulated industries in our society, because we know what they do is
both important and dangerous. Personal information and the algorithms used to analyze
it are no different. Some regular audit mechanism would ensure that corporations are
following the rules, and would penalize those that don’t.

This all makes sense in theory, but actually doing it is hard. The last thing we want
is for the government to start saying, “You can only do this and nothing more” with
our data. Permissions-based regulation would stifle technological innovation and change.
We want rights-based regulation—basically, “You can do anything you want unless it
is prohibited.”

REGULATE DATA COLLECTION AS WELL

Regulating data use isn’t enough. Privacy needs to be regulated in many places: at
collection, during storage, upon use, during disputes. The OECD Privacy Framework
sets them out nicely, and they’re all essential.

There’s been a concerted multi-year effort by US corporations to convince the world
that we don’t need regulations on data collection, only on data use. Companies seek
to eradicate any limitations on data collection because they know that any use limitations
will be narrowly defined, and that they can slowly expand the permitted uses once they
have our data. (A common argument against any particular data-use regulation is that it’s a
form of censorship.) They know that if collection limitations are in place, it’s much
harder to change them. But as with government mass surveillance, the privacy harms
come from the simple collection of the data, not only from its use. Remember the discussion
of algorithmic surveillance from Chapter 10. Unrestricted corporate collection will
result in broad collection, expansive sharing with the government, and a slow chipping
away at the necessarily narrowly defined use restrictions.

We need to fight this campaign. Limitations on data collection aren’t new. Prospective
employers are not allowed to ask job applicants whether they’re pregnant. Loan applications
are not allowed to ask about the applicants’ race. “Ban the Box” is a campaign to
make it illegal for employers to ask about applicants’ criminal pasts. The former
US gays-in-the-military compromise, “Don’t Ask Don’t Tell,” was a restriction on data
collection. There are restrictions on what questions can be asked by the US Census
Bureau.

Extending this to a world where everything we do is mediated by computers isn’t going
to be easy, but we need to start discussing what sorts
of data should never be collected. There are some obvious places to start. What we
read online should be as private as it is in the paper world. This means we should
legally limit recording the webpages we read, the links we click on, and our search
results. It’s the same with our movements; it should not be a condition of having
a cell phone that we subject ourselves to constant surveillance. Our associations—with
whom we communicate, whom we meet on the street—should not be continually monitored.
Maybe companies should be allowed to use some of this data immediately but then be required
to purge it. Maybe they’ll be allowed to save it for a short period of time.

One intriguing idea has been proposed by University of Miami Law School professor
Michael Froomkin: requiring both government agencies and private companies engaging
in mass data collection to file Privacy Impact Notices, modeled after Environmental
Impact Reports. This would serve to inform the public about what’s being collected
and why, and how it’s being stored and used. It would encourage decision makers to
think about privacy early in any project’s development, and to solicit public feedback.

One place to start is to require opt-in. Basically, there are two ways to obtain consent.
Opt-in means that you have to explicitly consent before your data is collected and
used. Opt-out is the opposite; your data will be collected and used unless you explicitly
object. Companies like Facebook prefer opt-out, because they can make the option difficult
to find and know that most people won’t bother. Opt-in is fairer, and the use of a
service shouldn’t be contingent on allowing data collection.
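
In software terms, the difference between the two models often comes down to a single
default setting. A minimal, hypothetical sketch (the class and field names are mine, not
any company’s actual design):

    # Hypothetical sketch: opt-in vs. opt-out is largely a question of the default value.
    from dataclasses import dataclass

    @dataclass
    class OptInAccount:
        data_collection_allowed: bool = False  # nothing is collected until the user explicitly agrees

    @dataclass
    class OptOutAccount:
        data_collection_allowed: bool = True   # collection happens unless the user finds the setting

    # Because most people never change defaults, the company's choice of default
    # largely determines whose data gets collected.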

Right now, there’s no downside to collecting and saving everything. By limiting what
companies can collect and what they can do with the data they collect, by making companies
responsible for the data under their control, and by forcing them to come clean with
customers about what they actually collect and what they do with it, we will influence
them to collect and save only the data about us they know is valuable.

Congress needs to begin the hard work of updating US privacy laws and stop making
excuses for inaction. Courts can also play a significant role in safeguarding consumer
privacy by enforcing current privacy laws. The regulatory agencies, such as the FTC
and the FCC, have some authority to protect consumer privacy in certain domains. But
what the United States needs
today is an independent data protection agency comparable to those in other countries
around the world. And we have to do better than patching problems only after they
have caused serious harm. These challenges are big and complex, and we need an agency
with the expertise and resources to have a meaningful impact.

MAKE DO WITH LESS DATA

By and large, organizations could make do collecting much less data, and storing it
for shorter periods of time, than they do now. The key is going to be understanding
how much data is needed for what purpose.

For example, many systems that collect identifying information don’t really need it.
Often, authorization is all that’s required. A social networking site doesn’t need
to know your real identity. Neither does a cloud storage company.
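
To make the distinction concrete, here is a simplified sketch of authorization without
identification: the service verifies an opaque credential and never learns who holds it.
The design and names are illustrative, not any real service’s protocol:

    # Hypothetical sketch: authorization without identification.
    # The service checks a random, opaque token; it never needs a real name, address, or ID.
    import hmac, hashlib, secrets

    SERVER_KEY = secrets.token_bytes(32)  # kept by the service

    def issue_token():
        """Hand out an opaque credential tied to nothing about the person."""
        account_id = secrets.token_hex(16)  # random, not derived from anyone's identity
        tag = hmac.new(SERVER_KEY, account_id.encode(), hashlib.sha256).hexdigest()
        return f"{account_id}.{tag}"

    def is_authorized(token):
        """Verify the credential without knowing who holds it."""
        account_id, tag = token.split(".")
        expected = hmac.new(SERVER_KEY, account_id.encode(), hashlib.sha256).hexdigest()
        return hmac.compare_digest(tag, expected)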

Some types of data analysis require you to have data on a lot of people, but not on
everyone. Think about Waze. It uses surveillance data to infer traffic flow, but doesn’t
need everyone’s data to do that. If it has enough cars under surveillance to get broad
coverage of major roads, that’s good enough. Many retailers rely on ubiquitous
surveillance to measure the effectiveness of advertisements, infer buying patterns,
and so on; but again, they do not need everyone’s data. A representative sample is
good enough for those applications, and sampling was the norm when data collection was
expensive.
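
A small simulation with made-up numbers shows why sampling is enough for aggregate
questions: a statistic estimated from a few percent of the population lands very close to
the figure you would get from everyone:

    # Illustrative only: estimate an aggregate (average speed on a road) from a 2% sample.
    import random

    random.seed(0)
    population = [random.gauss(mu=45, sigma=8) for _ in range(1_000_000)]  # "all drivers"
    sample = random.sample(population, k=20_000)                           # 2% under observation

    full_average = sum(population) / len(population)
    sample_average = sum(sample) / len(sample)
    print(f"Average from everyone:  {full_average:.2f} mph")
    print(f"Average from 2% sample: {sample_average:.2f} mph")  # differs by a tiny fraction of an mph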
