Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy

Cathy O’Neil is a mathematician and data scientist. Educated at UC Berkeley and holding a PhD from Harvard, she left an academic career to join the finance industry. After realizing that Wall Street was using mathematics unethically, she went on to research how algorithms are used across different industries and the harm they do to social groups and individuals. The product of this research is her latest book, “Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy.”

What is it about?

It is an easy yet fascinating read. O’Neil explains clearly how companies and institutions use big data to solve specific problems while at the same time creating by-products that affect people’s lives. Data scientists program these algorithms to achieve efficiency, which often comes at the expense of fairness, usually toward people at the lowest socio-economic levels of our society. Without a feedback loop, the algorithms remain unaltered and this negative impact becomes entrenched, which perpetuates inequality and, in the long run, threatens democracy.


Excerpts

So to sum up, these are the three elements of a WMD [Weapons of Math Destruction]: Opacity, Scale, and Damage. All of them will be present, to one degree or another, in the examples we’ll be covering.

While looking at WMDs, we’re often faced with a choice between fairness and efficacy. Our legal traditions lean strongly toward fairness. The Constitution, for example, presumes innocence and is engineered to value it. From a modeler’s perspective, the presumption of innocence is a constraint, and the result is that some guilty people go free, especially those who can afford good lawyers. Even those found guilty have the right to appeal their verdict, which chews up time and resources. So the system sacrifices enormous efficiencies for the promise of fairness. The Constitution’s implicit judgment is that freeing someone who may well have committed a crime, for lack of evidence, poses less of a danger to our society than jailing or executing an innocent person. WMDs, by contrast, tend to favor efficiency. By their very nature, they feed on data that can be measured and counted. But fairness is squishy and hard to quantify. It is a concept. And computers, for all of their advances in language and logic, still struggle mightily with concepts. They “understand” beauty only as a word associated with the Grand Canyon, ocean sunsets, and grooming tips in Vogue magazine. They try in vain to measure “friendship” by counting likes and connections on Facebook. And the concept of fairness utterly escapes them. Programmers don’t know how to code for it, and few of their bosses ask them to.

And why are nonwhite prisoners from poor neighborhoods more likely to commit crimes? According to the data inputs for the recidivism models, it’s because they’re more likely to be jobless, lack a high school diploma, and have had previous run-ins with the law. And their friends have, too. Another way of looking at the same data, though, is that these prisoners live in poor neighborhoods with terrible schools and scant opportunities. And they’re highly policed. So the chance that an ex-convict returning to that neighborhood will have another brush with the law is no doubt larger than that of a tax fraudster who is released into a leafy suburb. In this system, the poor and nonwhite are punished more for being who they are and living where they live.

Our livelihoods increasingly depend on our ability to make our case to machines. The clearest example of this is Google. For businesses, whether it’s a bed-and-breakfast or an auto repair shop, success hinges on showing up on the first page of search results. Now individuals face similar challenges, whether trying to get a foot in the door of a company, to climb the ranks—or even to survive waves of layoffs. The key is to learn what the machines are looking for. But here too, in a digital universe touted to be fair, scientific, and democratic, the insiders find a way to gain a crucial edge.

The practice of using credit scores in hirings and promotions creates a dangerous poverty cycle. After all, if you can’t get a job because of your credit record, that record will likely get worse, making it even harder to land work. It’s not unlike the problem young people face when they look for their first job—and are disqualified for lack of experience. Or the plight of the longtime unemployed, who find that few will hire them because they’ve been without a job for too long. It’s a spiraling and defeating feedback loop for the unlucky people caught up in it.

Facebook, for example, has patented a new type of credit rating, one based on our social networks. The goal, on its face, is reasonable.

Leading insurers including Progressive, State Farm, and Travelers are already offering drivers a discount on their rates if they agree to share their driving data. A small telemetric unit in the car, a simple version of the black boxes in airplanes, logs the speed of the car and how the driver brakes and accelerates. A GPS monitor tracks the car’s movements.

In these early days, the auto insurers’ tracking systems are opt-in. Only those willing to be tracked have to turn on their black boxes. They get rewarded with a discount of between 5 and 50 percent and the promise of more down the road. (And the rest of us subsidize those discounts with higher rates.) But as insurers gain more information, they’ll be able to create more powerful predictions. That’s the nature of the data economy. Those who squeeze out the most intelligence from this information, turning it into profits, will come out on top. They’ll predict group risk with greater accuracy (though individuals will always confound them). And the more they benefit from the data, the harder they’ll push for more of it. At some point, the trackers will likely become the norm. And consumers who want to handle insurance the old-fashioned way, withholding all but the essential from their insurers, will have to pay a premium, and probably a steep one. In the world of WMDs, privacy is increasingly a luxury that only the wealthy can afford.

As insurance companies learn more about us, they’ll be able to pinpoint those who appear to be the riskiest customers and then either drive their rates to the stratosphere or, where legal, deny them coverage. This is a far cry from insurance’s original purpose, which is to help society balance its risk. In a targeted world, we no longer pay the average. Instead, we’re saddled with anticipated costs. Instead of smoothing out life’s bumps, insurance companies will demand payment for those bumps in advance. This undermines the point of insurance, and the hits will fall especially hard on those who can least afford them.

Once companies amass troves of data on employees’ health, what will stop them from developing health scores and wielding them to sift through job candidates? Much of the proxy data collected, whether step counts or sleeping patterns, is not protected by law, so it would theoretically be perfectly legal. And it would make sense. As we’ve seen, they routinely reject applicants on the basis of credit scores and personality tests. Health scores represent a natural—and frightening—next step. Already, companies are establishing ambitious health standards for workers and penalizing them if they come up short. Michelin, the tire company, sets its employees goals for metrics ranging from blood pressure to glucose, cholesterol, triglycerides, and waist size. Those who don’t reach the targets in three categories have to pay an extra $1,000 a year toward their health insurance. The national drugstore chain CVS announced in 2013 that it would require employees to report their levels of body fat, blood sugar, blood pressure, and cholesterol—or pay $600 a year.

The potential for Facebook to hold sway over our politics extends beyond its placement of news and its Get Out the Vote campaigns. In 2012, researchers experimented on 680,000 Facebook users to see if the updates in their news feeds could affect their mood. It was already clear from laboratory experiments that moods are contagious. Being around a grump is likely to turn you into one, if only briefly. But would such contagions spread online? Using linguistic software, Facebook sorted positive (stoked!) and negative (bummed!) updates. They then reduced the volume of downbeat postings in half of the news feeds, while reducing the cheerful quotient in the others. When they studied the users’ subsequent posting behavior, they found evidence that the doctored news feeds had indeed altered their moods. Those who had seen fewer cheerful updates produced more negative posts. A similar pattern emerged on the positive side.

Much the same is true of Google. Its search algorithm appears to be focused on raising revenue. But search results, if Google so chose, could have a dramatic effect on what people learn and how they vote. Two researchers, Robert Epstein and Ronald E. Robertson, recently asked undecided voters in both the United States and India to use a search engine to learn about upcoming elections. The engines they used were programmed to skew the search results, favoring one party over another. Those results, they said, shifted voting preferences by 20 percent. This effect was powerful, in part, because people widely trust search engines. Some 73 percent of Americans, according to a Pew Research report, believe that search results are both accurate and impartial. So companies like Google would be risking their own reputation, and inviting a regulatory crackdown, if they doctored results to favor one political outcome over another.

The turn of the twentieth century was a time of great progress. People could light their houses with electricity and heat them with coal. Modern railroads brought in meat, vegetables, and canned goods from a continent away. For many, the good life was getting better. Yet this progress had a gruesome underside. It was powered by horribly exploited workers, many of them children. In the absence of health or safety regulations, coal mines were death traps. In 1907 alone, 3,242 miners died. Meatpackers worked twelve to fifteen hours a day in filthy conditions and often shipped toxic products. Armour and Co. dispatched cans of rotten beef by the ton to US Army troops, using a layer of boric acid to mask the stench. Meanwhile, rapacious monopolists dominated the railroads, energy companies, and utilities and jacked up customers’ rates, which amounted to a tax on the national economy. Clearly, the free market could not control its excesses. So after journalists like Ida Tarbell and Upton Sinclair exposed these and other problems, the government stepped in. It established safety protocols and health inspections for food, and it outlawed child labor. With the rise of unions, and the passage of laws safeguarding them, our society moved toward eight-hour workdays and weekends off. These new standards protected companies that didn’t want to exploit workers or sell tainted foods, because their competitors had to follow the same rules. And while they no doubt raised the costs of doing business, they also benefited society as a whole. Few of us would want to return to a time before they existed.

Movements toward auditing algorithms are already afoot. At Princeton, for example, researchers have launched the Web Transparency and Accountability Project. They create software robots that masquerade online as people of all stripes—rich, poor, male, female, or suffering from mental health issues. By studying the treatment these robots receive, the academics can detect biases in automated systems from search engines to job placement sites. Similar initiatives are taking root at universities like Carnegie Mellon and MIT.

If we want to bring out the big guns, we might consider moving toward the European model, which stipulates that any data collected must be approved by the user, as an opt-in. It also prohibits the reuse of data for other purposes. The opt-in condition is all too often bypassed by having a user click on an inscrutable legal box. But the “not reusable” clause is very strong: it makes it illegal to sell user data. This keeps it from the data brokers whose dossiers feed toxic e-scores and microtargeting campaigns. Thanks to this “not reusable” clause, the data brokers in Europe are much more restricted, assuming they follow the law.

Conclusion

Like O’Neil, I often see an unfounded trust in big data and technology to solve deeply rooted social problems. Yet big data and technology can also create new problems or amplify existing ones. Understanding how this happens can point us toward what we can do to stop and change it.