Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy
Cathy O'Neil
And so was his industry. With time, of course, insurers advanced a bit in their thinking and sold policies to African American families. After all, there was money to be made. But they clung for decades to Hoffman’s idea that entire groups of people were riskier than others—and some of them too risky.
Insurance companies as well as bankers delineated neighborhoods where they would not invest. This cruel practice, known as redlining, has been outlawed by various pieces of legislation, including the Fair Housing Act of 1968.
Nearly a half century later, however, redlining is still with us, though in far more subtle forms. It’s coded into the latest generation of WMDs. Like Hoffman, the creators of these new models confuse correlation with causation. They punish the poor, and especially racial and ethnic minorities. And they back up their analysis with reams of statistics, which give them the studied air of evenhanded science.
On this algorithmic voyage through life, we’ve clawed our way through education and we’ve landed a job (even if it is one that runs us on a chaotic schedule). We’ve taken out loans and seen how our creditworthiness is a stand-in for other virtues or vices. Now it’s time to protect our most treasured assets—our home and car and our family’s health—and make arrangements for those we one day leave behind.
Insurance grew out of actuarial science, a discipline whose roots reach back to the seventeenth century. This was a period in which Europe’s growing bourgeoisie was acquiring great wealth. It allowed many the luxury, for the first time, to think ahead to future generations.
While advances in math were providing the tools necessary to make predictions, an early generation of data hounds was looking for new things to count. One was a draper in London named John Graunt. He went through birth and death records and in 1662 came up with the first study of the mortality rates of an entire community of people. He calculated, for example, that children in London faced a 6 percent death risk in each of the first six years of their lives. (And with statistics, he was able to dispel the myth that the plague swept through every year a new monarch came into power.) For the first time, mathematicians could calculate the most probable arc of a person’s life. These numbers didn’t work for individuals, of course. But with big enough numbers, the average and range were predictable.
Mathematicians didn’t pretend to foresee the fate of each individual. That was unknowable. But they could predict the prevalence of accidents, fires, and deaths within large groups of people. Over the following three centuries, a vast insurance industry grew around these predictions. The new industry gave people, for the first time, the chance to pool their collective risk, protecting individuals when misfortune struck.
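To see why the group was predictable even when the individual was not, here is a minimal simulation sketch in Python. The numbers are made up, except that the 6 percent rate echoes Graunt's childhood-mortality estimate; the loss amount and pool sizes are arbitrary illustrations.

```python
import random

# A minimal sketch of the pooling idea, with made-up numbers: each member of a
# pool either suffers a loss this year or doesn't, and nobody can say which in
# advance. Only the 6 percent rate echoes the text; the loss amount and pool
# sizes are arbitrary.

LOSS_PROBABILITY = 0.06
LOSS_AMOUNT = 10_000  # hypothetical cost when misfortune strikes

def simulate_pool(size, trials=100):
    """Average, lowest, and highest per-member cost over many simulated years."""
    per_member_costs = []
    for _ in range(trials):
        losses = sum(1 for _ in range(size) if random.random() < LOSS_PROBABILITY)
        per_member_costs.append(losses * LOSS_AMOUNT / size)
    return sum(per_member_costs) / trials, min(per_member_costs), max(per_member_costs)

for pool_size in (10, 1_000, 100_000):
    avg, low, high = simulate_pool(pool_size)
    print(f"pool of {pool_size:>7}: average cost {avg:6.0f}, range {low:.0f} to {high:.0f}")
```

Any one member's year is all or nothing, but a pool of a hundred thousand lands within a few percent of the expected cost, and that narrow band is what an insurer can price.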
Now, with the evolution of data science and networked computers, insurance is facing fundamental change. With ever more information available—including the data from our genomes, the patterns of our sleep, exercise, and diet, and the proficiency of our driving—insurers will increasingly calculate risk for the individual and free themselves from the generalities of the larger pool. For many, this is a welcome change. A health enthusiast today can demonstrate, with data, that she sleeps eight hours a night, walks ten miles a day, and eats little but green vegetables, nuts, and fish oil. Why shouldn’t she get a break on her health insurance?
The move toward the individual, as we’ll see, is embryonic. But already insurers are using data to divide us into smaller tribes, to offer us different products and services at varying prices. Some might call this customized service. The trouble is, it’s not individual. The models place us into groups we cannot see, whose behavior appears to resemble ours. Regardless of the quality of the analysis, its opacity can lead to gouging.
Take auto insurance. In 2015, researchers at Consumer Reports conducted an extensive nationwide study looking for disparities in pricing. They analyzed more than two billion price quotes from all the major insurers for hypothetical customers from every one of the 33,419 zip codes in the country. What they found was wildly unfair, and rooted—as we saw in the last chapter—in credit scores.
Insurers draw these scores from credit reports and then, using proprietary algorithms, create their own ratings, or e-scores. These are proxies for responsible driving. But Consumer Reports found that the e-scores, which include all sorts of demographic data, often count for more than the driver’s record. In other words, how you manage money can matter more than how you drive a car. In New York State, for example, a dip in a driver’s credit rating from “excellent” to merely “good” could jack up the annual cost of insurance by $255. And in Florida, adults with clean driving records and poor credit scores paid an average of $1,552 more than the same drivers with excellent credit and a drunk driving conviction.
We’ve already discussed how the growing reliance on credit scores across the economy works against the poor. This is yet another example of that trend, and an egregious one—especially since auto insurance is mandatory for anyone who drives. What’s different here is the focus on the proxy when far more relevant data is available. I cannot imagine a more meaningful piece of data for auto insurers than a drunk driving record. It is evidence of risk in precisely the domain they’re attempting to predict. It’s far better than other proxies they consider, such as a high school student’s grade point average. Yet it can count far less in their formula than a score drawn from financial data thrown together on a credit report (which, as we’ve seen, is sometimes erroneous).
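No insurer publishes its formula, but the shape of the problem can be sketched with a hypothetical one in Python. Every number below (the base rate, the weights, the e-score scale) is invented for illustration; the point is only that when the credit-based e-score carries a far heavier weight than the driving record, a careful driver with poor credit ends up quoted more than a drunk driver with excellent credit.

```python
# A hypothetical premium formula, not any insurer's actual model. The base
# rate, the weights, and the e-score scale are all invented; they only show
# how a heavily weighted credit-based e-score can swamp the driving record.

BASE_PREMIUM = 1_200  # dollars per year, arbitrary

def quote(credit_escore, dui_convictions, at_fault_crashes):
    """credit_escore runs from 0.0 (worst) to 1.0 (best)."""
    credit_penalty = (1.0 - credit_escore) * 2_000                   # dominant term
    record_penalty = 400 * dui_convictions + 300 * at_fault_crashes  # minor term
    return BASE_PREMIUM + credit_penalty + record_penalty

clean_record_poor_credit = quote(credit_escore=0.2, dui_convictions=0, at_fault_crashes=0)
dui_record_great_credit = quote(credit_escore=0.95, dui_convictions=1, at_fault_crashes=0)

print(clean_record_poor_credit)  # 2800.0 -- spotless driving record, battered credit
print(dui_record_great_credit)   # 1700.0 -- drunk driving conviction, excellent credit
```

With these made-up weights, the careful driver pays $1,100 a year more than the convicted drunk driver, the same upside-down result Consumer Reports measured in Florida.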
So why would their models zero in on credit scores? Well, like other WMDs, automatic systems can plow through credit scores with great efficiency and at enormous scale. But I would argue that the chief reason has to do with profits. If an insurer has a system that can pull in an extra $1,552 a year from a driver with a clean record, why change it? The victims of their WMD, as we’ve seen elsewhere, are more likely to be poor and less educated, a good number of them immigrants. They’re less likely to know that they’re being ripped off. And in neighborhoods with more payday loan offices than insurance brokers, it’s harder to shop for lower rates. In short, while an e-score might not correlate with safe driving, it does create a lucrative pool of vulnerable drivers. Many of them are desperate to drive—their jobs depend on it. Overcharging them is good for the bottom line.
From the auto insurer’s perspective, it’s a win-win. A good driver with a bad credit score is low risk and superhigh reward. What’s more, the company can use some of the proceeds from that policy to address the inefficiencies in the model. Those might include the drivers with pristine credit reports who pay low premiums and crash their cars while drunk.
That may sound a tad cynical. But consider the price optimization algorithm at Allstate, the insurer self-branded as “the Good Hands People.” According to a watchdog group, the Consumer Federation of America, Allstate analyzes consumer and demographic data to determine the likelihood that customers will shop for lower prices. If they aren’t likely to, it makes sense to charge them more. And that’s just what Allstate does.
It gets worse. In a filing to the Wisconsin Department of Insurance, the CFA listed one hundred thousand microsegments in Allstate’s pricing schemes. These pricing tiers are based on how much each group can be expected to pay. Consequently, some receive discounts of up to 90 percent off the average rate, while others face an increase of 800 percent. “Allstate’s insurance pricing has become untethered from the rules of risk-based premiums and from the rule of law,” said J. Robert Hunter, CFA’s director of insurance and the former Texas insurance commissioner. Allstate responded that the CFA’s charges were inaccurate. The company did concede, however, that “marketplace considerations, consistent with industry practices, have been appropriate in developing insurance prices.” In other words, its models study a host of proxies to calculate how much to charge customers. And the rest of the industry does, too.
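What such a price optimization step might look like can be sketched, hypothetically, in a few lines of Python. This is not Allstate's actual algorithm; the shopping probabilities and markups are invented. The logic the CFA describes is simply this: start from the risk-based premium, then nudge it up or down according to how likely the customer is to comparison-shop.

```python
# A hypothetical "price optimization" step, not Allstate's actual algorithm.
# Start from the risk-based premium, then adjust it by how likely the customer
# is to comparison-shop. The probabilities and markups are invented.

def optimized_premium(risk_based_premium, shop_probability):
    """shop_probability is the model's 0-to-1 guess that this customer will shop around."""
    if shop_probability < 0.2:
        markup = 0.25    # unlikely to leave, so charge well above the risk price
    elif shop_probability < 0.5:
        markup = 0.10
    else:
        markup = -0.05   # likely to shop, so undercut to win or keep the business
    return risk_based_premium * (1 + markup)

print(optimized_premium(1_000, shop_probability=0.1))  # 1250.0 -- loyal, so gouged
print(optimized_premium(1_000, shop_probability=0.9))  # 950.0  -- shopper, so courted
```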
The resulting pricing is unfair. This abuse could not occur if insurance pricing were transparent and customers could easily comparison-shop. But like other WMDs, it is opaque. Every person gets a different experience, and the models are optimized to draw as much money as they can from the desperate and the ignorant. The result—another feedback loop—is that poor drivers who can least afford outrageous premiums are squeezed for every penny they have. Some of them, inevitably, fall too far, defaulting on their auto loans, credit cards, or rent. That further punishes their credit scores, which no doubt drops them into an even more forlorn microsegment.
When Consumer Reports issued its damning report on the auto insurers, it also launched a campaign directed at the National Association of Insurance Commissioners (NAIC), complete with its own Twitter appeal: @NAIC_News to Insurance Commissioners: Price me by how I drive, not by who you think I am! #FixCarInsurance.
The underlying idea was that drivers should be judged by their records—their number of speeding tickets, or whether they’ve been in an accident—and not by their consumer patterns or those of their friends or neighbors. Yet in the age of Big Data, urging insurers to judge us by how we drive means something entirely new.
Insurance companies now have manifold ways to study drivers’ behavior in exquisite detail. For a preview, look no further than the trucking industry.
These days, many trucks carry an electronic logging device that registers every turn, every acceleration, every time they touch the brakes. And in 2015, Swift Transportation, the nation’s largest trucking company, started to install cameras pointed in two directions, one toward the road ahead, the other at the driver’s face.
The stated goal of this surveillance is to reduce accidents.
About seven hundred truckers die on American roads every year. And their crashes also claim the lives of many in other vehicles. In addition to the personal tragedy, this costs lots of money.
The average cost of a fatal crash, according to the Federal Motor Carrier Safety Administration, is $3.5 million.
But with such an immense laboratory for analytics at their fingertips, trucking companies aren’t stopping at safety. By combining geoposition data, onboard tracking technology, and cameras, they can draw a rich and constant stream of behavioral data from every driver. Trucking companies can now analyze different routes, assess fuel management, and compare results at different times of the day and night. They can even calculate ideal speeds for different road surfaces.
And they use this data to figure out which patterns provide the most revenue at the lowest cost.
They can also compare individual drivers. Analytics dashboards give each driver a scorecard. With a click or two, a manager can identify the best and worst performers across a broad range of metrics. Naturally, this surveillance data can also calculate the risk for each driver.
This promise is not lost on the insurance industry. Leading insurers including Progressive, State Farm, and Travelers are already offering drivers a discount on their rates if they agree to share their driving data. A small telemetric unit in the car, a simple version of the black boxes in airplanes, logs the speed of the car and how the driver brakes and accelerates. A GPS monitor tracks the car’s movements.
In theory, this meets the ideal of the Consumer Reports campaign. The individual driver comes into focus. Consider eighteen-year-olds. Traditionally they pay sky-high rates because their age group, statistically, indulges in more than its share of recklessness. But now, a high school senior who avoids jackrabbit starts, drives at a consistent pace under the speed limit, and eases to a stop at red lights might get a discounted rate. Insurance companies have long given an edge to young motorists who finish driver’s ed or make the honor roll. Those are proxies for responsible driving. But driving data is the real thing. That’s better, right?
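As a thought experiment, here is what a score built purely on driving data might look like, sketched in Python. This is not any insurer's formula; every field and weight is invented, chosen only to mirror the kinds of things a telemetric unit records: hard braking, jackrabbit starts, speeding, and late-night miles.

```python
from dataclasses import dataclass

# A thought experiment, not any insurer's actual formula: a risk score built
# only from driving behavior. The trip fields and event weights are invented.

@dataclass
class TripLog:
    miles: float
    hard_brakes: int          # decelerations past some g-force threshold
    jackrabbit_starts: int    # abrupt accelerations from a stop
    miles_over_limit: float   # miles driven above the posted speed limit
    night_miles: float        # miles driven between midnight and 5 a.m.

def behavior_score(trips):
    """Higher means riskier, normalized per 100 miles driven."""
    total_miles = sum(t.miles for t in trips) or 1.0
    weighted_events = sum(
        2.0 * t.hard_brakes
        + 1.5 * t.jackrabbit_starts
        + 0.5 * t.miles_over_limit
        + 0.3 * t.night_miles
        for t in trips
    )
    return 100 * weighted_events / total_miles

# One hypothetical round trip: a calm driver on a late-night commute.
commute = [TripLog(miles=26, hard_brakes=0, jackrabbit_starts=0,
                   miles_over_limit=0.0, night_miles=13.0)]
print(round(behavior_score(commute), 1))  # 15.0
```

Even this behavior-only sketch quietly penalizes when and where a person has to drive, which depends on the job she holds rather than on her skill behind the wheel.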
There are a couple of problems. First, if the system attributes risk to geography, poor drivers lose out. They are more likely to drive in what insurers deem risky neighborhoods. Many also have long and irregular commutes, which translates into higher risk.
Fine, you might say. If poor neighborhoods are riskier, especially for auto theft, why should insurance companies ignore that information? And if longer commutes increase the chance of accidents, that’s something the insurers are entitled to consider. The judgment is still based on the driver’s behavior, not on extraneous details like her credit rating or the driving records of people her age. Many would consider that an improvement.
To a degree, it is. But consider a hypothetical driver who lives in a rough section of Newark, New Jersey, and must commute thirteen miles to a barista job at a Starbucks in the wealthy suburb of Montclair. Her schedule is chaotic and includes occasional clopenings. So she shuts the shop at 11, drives back to Newark, and returns before 5 a.m. To save ten minutes and $1.50 each way on the Garden State Parkway, she takes a shortcut, which leads her down a road lined with bars and strip joints.