Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy

Authors: Cathy O'Neil

Recently, Google processed images of a trio of happy young African Americans and its automatic photo-tagging service labeled them as gorillas. The company apologized profusely, but in systems like Google’s, errors are inevitable. It was most likely faulty machine learning (and probably not a racist running loose in the Googleplex) that led the computer to confuse Homo sapiens with our close cousin, the gorilla. The software itself had flipped through billions of images of primates and had made its own distinctions. It focused on everything from shades of color to the distance between eyes and the shape of the ear. Apparently, though, it wasn’t thoroughly tested before being released.

Such mistakes are learning opportunities—as long as the system receives feedback on the error. In this case, it did. But injustice persists. When automatic systems sift through our data to size us up for an e-score, they naturally project the past into the future. As we saw in recidivism sentencing models and predatory loan algorithms, the poor are expected to remain poor forever and are treated accordingly—denied opportunities, jailed more often, and gouged for services and loans. It’s inexorable, often hidden and beyond appeal, and unfair.

Yet we can’t count on automatic systems to address the issue. For all of their startling power, machines cannot yet make adjustments for fairness, at least not by themselves. Sifting through data and judging what is fair is utterly foreign to them and enormously complicated. Only human beings can impose that constraint.

There’s a paradox here. If we return one last time to that ’50s-era banker, we see that his mind was occupied with human distortions—desires, prejudice, distrust of outsiders. To carry out the job more fairly and efficiently, he and the rest of his industry handed the work over to an algorithm.

Sixty years later, the world is dominated by automatic systems chomping away on our error-ridden dossiers. They urgently require the context, common sense, and fairness that only humans can provide. However, if we leave this issue to the marketplace, which prizes efficiency, growth, and cash flow (while tolerating a certain degree of error), meddling humans will be instructed to stand clear of the machinery.

This will be a challenge, because even as the problems with our old credit models become apparent, powerful newcomers are storming in. Facebook, for example, has patented a new type of credit rating, one based on our social networks. The goal, on its face, is reasonable. Consider a college graduate who goes on a religious mission for five years, helping to bring potable water to impoverished villages in Africa. He comes home with no credit rating and has trouble getting a loan. But his classmates on Facebook are investment bankers, PhDs, and software designers. Birds-of-a-feather analysis would indicate that he’s a good bet. But that same analysis likely works against a hardworking housecleaner in East St. Louis, who might have numerous unemployed friends and a few in jail.

Meanwhile, the formal banking industry is frantically raking through personal data in its attempts to boost business. But licensed banks are subject to federal regulation and disclosure requirements, which means that customer profiling carries reputational and legal risk. American Express learned this the hard way in 2009, just as the Great Recession was gearing up. No doubt looking to reduce risk on its own balance sheet, Amex cut the spending limits of some customers. Unlike the informal players in the e-score economy, though, the credit card giant had to send them a letter explaining why.

This is when Amex delivered a low blow. Cardholders who shopped at certain establishments, the company wrote, were more likely to fall behind on payments. It was a matter of statistics, plain and simple, a clear correlation between shopping patterns and default rates. It was up to the unhappy Amex customers to guess which establishment had poisoned their credit. Was it the weekly shop at Walmart or perhaps the brake job at Grease Monkey that placed them in the bucket of potential deadbeats?

Whatever the cause, it left them careening into a nasty recession with less credit. Worse, the lowered spending limit would appear within days on their credit reports. In fact, it was probably there even before the letters arrived. This would lower their scores and drive up their borrowing costs. Many of these cardholders, it’s safe to say, frequented “stores associated with poor repayments” because they weren’t swimming in money. And wouldn’t you know it? An algorithm took notice and made them poorer.

Cardholders’ anger attracted the attention of the mainstream press, including the New York Times, and Amex promptly announced that it would not correlate stores to risk. (Amex later insisted that it had chosen the wrong words in its message and that it had scrutinized only broader consumer patterns, not specific merchants.)

It was a headache and an embarrassment for American Express. If they had indeed found a strong correlation between shopping at a certain store and credit risk, they certainly couldn’t use it now. Compared to most of the Internet economy, they’re boxed in, regulated, in a certain sense handicapped. (Not that they should complain. Over the decades, lobbyists for the incumbents have crafted many of the regulations with an eye to defending the entrenched powers—and keeping pesky upstarts locked out.)

So is it any surprise that newcomers to the finance industry would choose the freer and unregulated route? Innovation, after all, hinges on the freedom to experiment. And with petabytes of behavioral data at their fingertips and virtually no oversight, opportunities for the creation of new business models are vast.

Multiple companies, for example, are working to replace payday lenders. These banks of last resort cater to the working poor, tiding them over from one paycheck to the next and charging exorbitant interest rates. After twenty-two weeks, a $500 loan could cost $1,500. So if an efficient newcomer could find new ways to rate risk, then pluck creditworthy candidates from this desperate pool of people, it could charge them slightly lower interest and still make a mountain of money.

That was Douglas Merrill’s idea. A former chief operating officer at Google, Merrill believed that he could use Big Data to calculate risk and offer payday loans at a discount. In 2009, he founded a start-up called ZestFinance. On the company web page, Merrill proclaims that “all data is credit data.” In other words, anything goes.

ZestFinance buys data that shows whether applicants have kept up with their cell phone bills, along with plenty of other publicly available or purchased data. As Merrill promised, the company’s rates are lower than those charged by most payday lenders. A typical $500 loan at ZestFinance costs $900 after twenty-two weeks, 60 percent lower than the industry standard.
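
The arithmetic behind that comparison can be sketched in a few lines. This is only an illustration under a stated assumption: that the $1,500 and $900 figures quoted in the text are total repayments on a $500 principal, which the text does not spell out. The function names are hypothetical. Under that reading, the 60 percent figure matches the drop in finance charges ($1,000 versus $400), while total repayment falls by 40 percent.

```python
# Assumption: the quoted figures are total repayments on a $500 principal.
# Function names are illustrative, not taken from any lender's system.

def finance_charge(principal, total_repaid):
    """Amount paid on top of the principal."""
    return total_repaid - principal

def percent_lower(baseline, alternative):
    """How much lower `alternative` is than `baseline`, in percent."""
    return 100.0 * (baseline - alternative) / baseline

PRINCIPAL = 500
PAYDAY_TOTAL = 1500   # industry figure quoted in the text
ZEST_TOTAL = 900      # ZestFinance figure quoted in the text

payday_charge = finance_charge(PRINCIPAL, PAYDAY_TOTAL)
zest_charge = finance_charge(PRINCIPAL, ZEST_TOTAL)

# Compare the two loans both ways: by fees alone and by total repayment.
charge_saving = percent_lower(payday_charge, zest_charge)
total_saving = percent_lower(PAYDAY_TOTAL, ZEST_TOTAL)

print(f"Finance charge: ${payday_charge} vs ${zest_charge} "
      f"({charge_saving:.0f}% lower)")
print(f"Total repaid:   ${PAYDAY_TOTAL} vs ${ZEST_TOTAL} "
      f"({total_saving:.0f}% lower)")
```

Either way, the borrower still pays back far more than the amount borrowed within twenty-two weeks.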

It’s an improvement, but is it fair? The company’s algorithms process up to ten thousand data points per applicant, including unusual observations, such as whether applicants use proper spelling and capitalization on their application form, how long it takes them to read it, and whether they bother to look at the terms and conditions. “Rule followers,” the company argues, are better credit risks.

That may be true. But punctuation and spelling mistakes also point to low education, which is highly correlated with class and race. So when poor people and immigrants qualify for a loan, their substandard language skills might drive up their fees. If they then have trouble paying those fees, this might validate that they were a high risk to begin with and might further lower their credit scores. It’s a vicious feedback loop, and paying bills on time plays only a bit part.

When new ventures are built on WMDs, troubles are bound to follow, even when the players have the best intentions. Take the case of the “peer-to-peer” lending industry. It started out in the last decade with the vision of borrowers and lenders finding each other on matchmaking platforms. This would represent the democratization of banking. More people would get loans, and at the same time millions of everyday people would become small-time bankers and make a nice return. Both sides would bypass the big greedy banks.

One of the first peer-to-peer exchanges, Lending Club, launched as an application on Facebook in 2006 and received funding a year later to become a new type of bank. To calculate the borrower’s risk, Lending Club blended the traditional credit report with data gathered from around the web. Their algorithm, in a word, generated e-scores, which they claimed were more accurate than credit scores.

Lending Club and its chief rival, Prosper, are still tiny. They’ve generated less than $10 billion in loans, which is but a speck in the $3 trillion consumer lending market. Yet they’re attracting loads of attention. Executives from Citigroup and Morgan Stanley serve as directors of peer-to-peer players, and Wells Fargo’s investment fund is the largest investor in Lending Club. Lending Club’s stock offering in December of 2014 was the biggest tech IPO of the year. It raised $870 million and reached a valuation of $9 billion, making it the fifteenth most valuable bank in America.

The fuss has little to do with democratizing capital or cutting out the middleman. According to a report in Forbes, institutional money now accounts for more than 80 percent of all the activity on peer-to-peer platforms. For big banks, the new platforms provide a convenient alternative to the tightly regulated banking economy. Working through peer-to-peer systems, a lender can analyze nearly any data it chooses and develop its own e-scores. It can develop risk correlations for neighborhoods, zip codes, and the stores customers shop at, all without having to send them embarrassing letters explaining why.

And what does that mean for us? With the relentless growth of e-scores, we’re batched and bucketed according to secret formulas, some of them fed by portfolios loaded with errors. We’re viewed not as individuals but as members of tribes, and we’re stuck with that designation. As e-scores pollute the sphere of finance, opportunities dim for the have-nots. In fact, compared to the slew of WMDs running amok, the prejudiced loan officer of yesteryear doesn’t look all that bad. At the very least, a borrower could attempt to read his eyes and appeal to his humanity.

* Even so, I should add, fixing them can be a nightmare. A Mississippi resident named Patricia Armour tried for two years to get Experian to expunge from her file a $40,000 debt she no longer owed. It took a call to Mississippi’s attorney general, she told the New York Times, before Experian corrected her record.

 

Late in the nineteenth century, a renowned statistician named Frederick Hoffman created a potent WMD. It’s very likely that Hoffman, a German who worked for the Prudential Life Insurance Company, meant no harm. Later in his life, his research contributed mightily to public health. He did valuable work on malaria and was among the first to associate cancer with tobacco. Yet on a spring day in 1896, Hoffman published a 330-page report that set back the cause of racial equality in the United States and reinforced the status of millions as second-class citizens. His report used exhaustive statistics to make the case that the lives of black Americans were so precarious that the entire race was uninsurable.

Hoffman’s analysis, like many of the WMDs we’ve been discussing, was statistically flawed. He confused causation with correlation, so that the voluminous data he gathered served only to confirm his thesis: that race was a powerful predictor of life expectancy. Racism was so ingrained in his thinking that he apparently never stopped to consider whether poverty and injustice might have something to do with the death rate of African Americans, whether the lack of decent schools, modern plumbing, safe workplaces, and access to health care might kill them at a younger age.

Hoffman also made a fundamental statistical error. Like the presidential commission that issued the 1983 Nation at Risk report, Hoffman neglected to stratify his results. He saw blacks only as a large and homogeneous group. So he failed to separate them into different geographical, social, or economic cohorts. For him, a black schoolteacher leading an orderly life in Boston or New York was indistinguishable from a sharecropper laboring twelve hours a day barefoot in the Mississippi Delta. Hoffman was blinded by race.
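
A small sketch may help show what failing to stratify does to a comparison. The numbers below are entirely invented for illustration and are not Hoffman's data. The group labels, the two living-condition strata, and every count are assumptions chosen so that death rates are identical within each stratum, yet the pooled rates differ simply because the two groups are distributed differently across the strata.

```python
from collections import defaultdict

# Invented, illustrative numbers only: (group, living conditions,
# population, deaths). Within each stratum both groups die at the same rate.
records = [
    ("A", "good", 900, 9),    # 1% death rate
    ("A", "poor", 100, 3),    # 3% death rate
    ("B", "good", 100, 1),    # 1% death rate
    ("B", "poor", 900, 27),   # 3% death rate
]

# Unstratified view: pool every stratum together for each group.
totals = defaultdict(lambda: [0, 0])  # group -> [population, deaths]
for group, stratum, population, deaths in records:
    totals[group][0] += population
    totals[group][1] += deaths
aggregate_rates = {g: d / p for g, (p, d) in totals.items()}

# Stratified view: keep the living-conditions cohorts separate.
stratified_rates = {(g, s): d / p for g, s, p, d in records}

for group, rate in sorted(aggregate_rates.items()):
    print(f"Pooled rate for group {group}: {rate:.1%}")
for (group, stratum), rate in sorted(stratified_rates.items()):
    print(f"Group {group}, {stratum} conditions: {rate:.1%}")
```

In this toy case the pooled figures make group B look far riskier, even though the only real difference between the groups is how many of their members live in poor conditions.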
