Read What Stays in Vegas Online
Authors: Adam Tanner
A lot had changed since the company had formalized its “no outside data” policy on clients in the mid-2000s. Many of the best insights were out in the open in social media, including Facebook.
Six Degrees to Harry Lewis
During Gary Loveman's time at Harvard Business School, a respected computer science professor became dean of Harvard College on the other side of the Charles River. Harry Lewis was even more of a numbers nerd than Loveman, having earned a PhD in applied mathematics. A top expert on data, he has taught thousands of students about computer science. Like Loveman, he was very interested in what human insights can be gained from people's data. Because he had worked so long in computer science, starting long before the personal computer era, he was concerned about how much could be revealed about people by aggregating their data. Lewis did not know Loveman, but had heard of him.
One day in January 2004, not long after he stepped down after eight years as dean, Lewis received an email from an ambitious student. Bitter winter winds whipped off the Charles River that morning. Lewis had looked forward to starting a sabbatical, the magical bonus of academic life when tenured professors enjoy a semester away from their normal teaching duties.
The student had grown interested in how public information could link different people and reveal interesting characteristics about them. He had set up a website showing connections between people at the university and had mined the school newspaper, The Harvard Crimson, for the information. “Professor, I've been interested in graph theory and its applications to social networks for a while now, so I did some research . . . that has to do with linking people through articles they appear in from the Crimson,” the student wrote. “I thought people would find this interesting, so I've set up a preliminary site that allows people to find the connection (through people and articles) from any person to the most frequently mentioned person in the time frame I looked at. This person is you.
“I wanted to ask your permission to put this site up though, since it has your name in its title.”
The student called the proposed site “Six Degrees to Harry Lewis.” The title referenced a famous study by Stanley Milgram, who found that a random person could link to a stranger through people they knew using about six intermediaries.
1
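The connection search the student describes can be pictured as a shortest-path problem on a graph in which two people are linked when the same article mentions both. The sketch below is only an illustration under that assumption: the article data, the placeholder names, and the functions `build_graph` and `shortest_chain` are hypothetical and are not taken from the student's site.

```python
from collections import deque

def build_graph(articles):
    """Link every pair of people who are mentioned in the same article."""
    graph = {}
    for people in articles:
        for person in people:
            graph.setdefault(person, set()).update(p for p in people if p != person)
    return graph

def shortest_chain(graph, start, target):
    """Breadth-first search: shortest list of people linking start to target, or None."""
    if start not in graph or target not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for neighbor in sorted(graph[path[-1]]):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None

# Hypothetical articles, each reduced to the set of people it mentions.
articles = [
    {"Student A", "Student B"},
    {"Student B", "Professor C"},
    {"Professor C", "Harry Lewis"},
]
graph = build_graph(articles)
chain = shortest_chain(graph, "Student A", "Harry Lewis")
print(chain)
```

In this made-up data, "Student A" reaches the most frequently mentioned person through two intermediaries; the number of intermediaries is the "degrees" figure in the site's title.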
Lewis reacted cautiously to the student, whom he knew from his course “Introduction to the Theory of Computation.”
“Can I see it before I say yes? It's all public information, but there is somehow a point at which aggregation of public information feels like an invasion of privacy,” Lewis wrote. “I am on sabbatical leave as of today, so you are catching me at a moment where I was about to relish some anonymity:)”
The student replied that he had already set up the site. He emailed the link. Professor Lewis replied with a few suggestions on how to improve it and gave his assent to going public: “Sure, what the hell. Seems harmless.”
2
Soon after, Professor Lewis, who had also taught Microsoft founder Bill Gates “Introduction to Combinatorial Mathematics” in the 1970s, said goodbye to another one of his brilliant students who decided to drop out of Harvard. A month later, the student, Mark Zuckerberg, built on that early experiment and created Facebook. In retrospect, Lewis's initial caution about aggregating public information has proven visionary. One can indeed learn a lot from many innocent facts about someone.
Ten years later, much as Zuckerberg briefly mined the Harvard Crimson to find trends, everyone from casino security officers to marketers and academic researchers regularly hunts for patterns in Facebook and other social networks. Often people find that sites like Facebook give away clues about us that we did not intend to reveal.
Consider, for example, the clues embedded in a person's “likes.” In June 2007, David Stillwell had just finished college in Britain and had time to kill before starting graduate school in psychology. Facebook had about thirty million users at the time (it passed the one billion point in 2012). The social network had just started allowing outside developers to write applications that operated inside the site. Such apps could tap into profiles if users granted permission. Stillwell created myPersonality, an online app that let people take a personality test measuring five main traits: openness to experience, conscientiousness, extraversion, agreeableness, and emotional stability.
3
Since then, nearly eight million people have taken the personality test, typically teenagers and twenty-somethings. Americans, Brits, and Canadians are the most active participants. Around 40 percent of those who took the test allowed Stillwell and other researchers to see their Facebook profiles.
In 2013, Stillwell and two other researchers published an analysis of what they could learn from the Facebook “likes” of 58,466 Americans.
4
In contrast to some of the more intimate Facebook preferences, likes are among the most innocuous and easily visible items. By default they are public. Just by looking at the likes (and not just obvious preferences such as liking a conservative political site or gay-oriented page), researchers could consistently infer intimate details such as sexual orientation, religion, political views, smoking, alcohol or drug use, and other characteristics. “If we can collect a few bits of data of a person, there is so much that we can predict,” says study coauthor Thore Graepel, a Microsoft researcher. It may sound like a cliché, but the survey supported the idea that liking the musical Wicked or the singer Britney Spears was a good predictor of male homosexuality, much as the preference for rap group Wu-Tang Clan or basketball player Shaquille O'Neal suggested heterosexuality.
The study's third coauthor, Michal Kosinski, who works with Stillwell at the Psychometrics Centre at the University of Cambridge, is especially sensitive to government abuse of personal information because he grew up in Communist Poland. In fact, he considers himself a product of the martial law that sought to repress the independent Solidarity trade union movement, which eventually undermined the Soviet-backed government. In 1981, when the Polish Communists imposed martial law, many couples opted to spend more time at home rather than go out on the town. Michal was born the following year. He says it might be unnerving for Europeans or Americans to be outed because of how they use the Internet. But it was far more dangerous for people in some other countries. “The same technology used in other countries is not unnerving, just directly dangerous,” he says. “There are many worse things that could happen.”
5
“The first situation in which Iran will use big data to put some people in prison, it will have a huge backlash on companies like Amazon or Facebook or whichever company will be unlucky to have their data used against human rights,” Kosinski says. Figuring out personality patterns from data is not difficult, he says. In fact, a high school student could write a Facebook “gaydar” application in an evening to out gay Facebook friends, perhaps with disastrous consequences. It could prompt suicides or other tragedies.
6
It's one thing for an academic to unmask intimate patterns from Internet postings. But would a company actually seek to use such information for profit? Of course. Jim Adler, the former chief privacy officer at Intelius, says data brokers should be able to publish anything that people can see in public.
7
Such a standard, in Adler's view, opens the way to recording when people walk into gay bars, cancer facilities, or Alcoholics Anonymous clinics. Mass urbanization has created an expectation of privacy that did not exist before, Adler says. But the Internet is returning standards to those of the small towns where people knew many details about one another.
8
“I really don't think we are violating people's privacy. I feel that there is an era of innovation that we are going through that is shrinking the world and putting us in public where we thought we were in private,” he says.
Knowing someone's sexual orientation could prove valuable for Las Vegas casinos advertising drag shows or gay bars, for example. But targeted ads could also offend. Ads in gay publications or on Internet sites visited by people with such interests (the theme of Chapter 13, on Internet advertising) may prove a more effective and less potentially offensive approach.
9
* * *
Likes are just one of many ways to discern unexpected private details from Facebook profiles. The same year Stillwell set up myPersonality, a Massachusetts Institute of Technology master's degree student and an undergraduate senior wanted to see just how much they could infer about a person's sexual orientation even if the person did not disclose that information in public. Without Facebook's permission, Behram Mistree and Carter Jernigan used a computer program to harvest Facebook profiles of 6,077 MIT students. The automated process took several weeks. They noted sexual orientation for people who stated a preference in Facebook's “Interested In” tab, where one can list men, women, or both.
In their sample they found that a typical straight male had 0.7 gay male friends, whereas those who declared themselves gay males had 4.6 gay friends on average.
10
They created a logistic regression model and found that if more than 1.89 percent of a male's friends identified themselves as gay, the Facebook user who did not express a sexual preference was likely gay. They checked their finding against students whose true sexual orientation they already knew.
11
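The cutoff described above comes down to simple arithmetic on a friend list. The sketch below illustrates only that reported threshold, not the researchers' logistic regression model: the function names and the sample friend list are hypothetical, and the attribute is left generic. It shows how little input such an inference needs, which is the privacy concern the passage raises.

```python
THRESHOLD = 0.0189  # the 1.89 percent cutoff reported in the text

def share_declaring(friends_declared):
    """Fraction of a user's friends who publicly declare the attribute."""
    if not friends_declared:
        return 0.0
    return sum(friends_declared) / len(friends_declared)

def exceeds_cutoff(friends_declared, threshold=THRESHOLD):
    """True when the declaring share of friends is above the cutoff."""
    return share_declaring(friends_declared) > threshold

# Hypothetical user: 200 friends, 4 of whom declare the attribute (2 percent).
friends = [True] * 4 + [False] * 196
print(share_declaring(friends), exceeds_cutoff(friends))
```

Note that the user's own profile never enters the calculation; only what friends disclose about themselves does.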
“It's not so much that you are inadvertently disclosing things that you hadn't wanted to,” Mistree said years later.
12
“It's actually that the locus of control for describing personal information about yourself using these social networks has moved from you to others. It's not your decision anymore. It's the decision of your neighbor. It's the decision of your basketball coach, all these people.” Jernigan added, “It's not about what they post about you, it is what they post on themselves that then reflects on you.”
13
The same kinds of techniques that reveal intimate information from Facebook can help outsiders figure out who you are when you have not identified yourself. The vast proliferation of personal data as well as advances in computing power have made it harder to maintain anonymity. That's because some parts of a person's data could match with another dataset about them with more identifying details. It is as if several city maps had been ripped into pieces. An individual piece might not show enough to recognize the place, but a few pieces together would. In 1997 Latanya Sweeney, who in 2014 served as chief technology officer at the FTC, showed just how easy it is to identify someone with a few simple clues, even for a graduate student, as she was at the time at MIT.
One May morning in 1996, Massachusetts Governor William Weld attended a graduation ceremony at Bentley University, outside Boston, to receive an honorary degree. The event brought attention to a school often overshadowed by better-known area institutions such as Harvard, MIT, and Boston University. Shortly after receiving his honorary law doctorate, Weld collapsed and lost consciousness for about a minute. An ambulance took him to Deaconess-Waltham Hospital. The graduation ceremony proceeded, with the crowd pausing for a moment of silence and prayer for the governor. At the hospital, doctors announced that they had conducted an electrocardiogram, a chest X-ray, and blood tests on the fifty-year-old Weld. They concluded he had suffered nothing more serious than the flu. He recovered quickly.
The following year, Sweeney wanted to see if she could identify medical patients from anonymous records. The Massachusetts Group Insurance Commission (GIC), a state body that looks at health-care costs and treatment, released hospital exit records on state employees to researchers but without the patients' names. “I remembered Weld had collapsed and that's why I thought, ‘Can I find Weld in the records?’” says Sweeney, who later became a Harvard professor.
She bought a copy of the voter rolls for Cambridge, the city where Weld lived. Those records contained the name, birth date, gender, and ZIP code for each resident. Only three men in the Cambridge area shared Weld's birth date, and he was the only one with that birth date in his ZIP code. Using that limited information, she pinpointed his hospital records.
14
Her study, published the following year, showed that just knowing someone's date of birth, gender, and postal code provided enough information to identify up to 87 percent of the US population. “You don't need very much information to reidentify people,” she says.
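The matching step can be pictured as a join between two datasets on the three shared fields. The sketch below is a minimal illustration under stated assumptions, not Sweeney's actual procedure: every record, placeholder name, field name, and the function `link_records` is hypothetical.

```python
from collections import defaultdict

def link_records(voter_roll, hospital_records):
    """Attach a name to a nameless record when its (birth date, gender, ZIP)
    combination belongs to exactly one person in the voter roll."""
    by_key = defaultdict(list)
    for voter in voter_roll:
        key = (voter["birth_date"], voter["gender"], voter["zip"])
        by_key[key].append(voter["name"])
    matches = []
    for record in hospital_records:
        key = (record["birth_date"], record["gender"], record["zip"])
        names = by_key.get(key, [])
        if len(names) == 1:  # a unique combination re-identifies the patient
            matches.append((names[0], record["diagnosis"]))
    return matches

# Entirely made-up data for illustration.
voter_roll = [
    {"name": "Alex Example", "birth_date": "1950-01-01", "gender": "M", "zip": "00001"},
    {"name": "Blake Sample", "birth_date": "1950-01-01", "gender": "M", "zip": "00002"},
    {"name": "Casey Placeholder", "birth_date": "1962-03-04", "gender": "F", "zip": "00001"},
    {"name": "Dana Placeholder", "birth_date": "1962-03-04", "gender": "F", "zip": "00001"},
]
hospital_records = [
    {"birth_date": "1950-01-01", "gender": "M", "zip": "00001", "diagnosis": "flu"},
    {"birth_date": "1962-03-04", "gender": "F", "zip": "00001", "diagnosis": "fracture"},
]
matches = link_records(voter_roll, hospital_records)
print(matches)
```

In this toy data the first hospital record maps to a single voter, while the second stays ambiguous because two voters share the same three fields. The more often a combination is unique, the less protection removing names provides.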
In 2013 Sweeney, working with a research assistant and two students, tried to unlock the names of participants in an especially ambitious medical research study. That effort, called the Personal Genome Project (PGP), aimed to spark new discoveries. George Church, a professor of genetics at Harvard Medical School, says that advances in data and in medicine make it impossible to guarantee anonymity for most medical experiments. When he set up the Personal Genome Project, he made no privacy promises. In the interest of advancing knowledge of human health and disease, he posts the data for all volunteers on the Internet for any researcher to study. He does not list names, but many participants share intimate details: abortions, depression, sexual ailments, and prescription drugs are listed along with their DNA sequence.