The Secret Life of Pronouns

Authors: James W. Pennebaker

One can imagine that by cataloging regional names, we could begin to isolate where people are from. All that would be needed would be to wait until speakers or writers mention their words for soft drinks, spices, or how their boats rolled over. Instead of waiting for relatively obscure words or topics to emerge, it might be more efficient to track more common words. Function words, perhaps?

AS WE’VE SEEN, stealth words vary by context. They can also vary by geographical region. Some regional differences in function word use are well known. For example, if people are talking to several of their friends, they might refer to them as “all of you” if from Seattle, “youse guys” if from New York, “y’all” if from the American South, or even “yat” in Louisiana (as in “Where are you at?”).

Even in relatively formal writing, regional differences exist in function word use. Cindy Chung and I had the opportunity to test this idea using thousands of essays people had written in response to the nationally syndicated radio program This I Believe. The original version of This I Believe ran briefly in the 1950s with journalist and radio commentator Edward R. Murrow. Over the course of a couple of years, Murrow invited some of the cultural leaders of the time to summarize their most important beliefs. The weekly broadcasts of politicians, sports stars, scientists, philosophers, and others captured the imagination of radio listeners. In 2005, Jay Allison and Dan Gediman resurrected the idea for National Public Radio. Rather than relying on celebrities, regular listeners were encouraged to contribute their own This I Believe essays. Over a four-year period, over seventy thousand essays were submitted, although only about two hundred were ever read on air. Most of the essays were eventually posted online for the public to read (www.thisibelieve.org).

Working with Allison and Gediman, we analyzed about 37,500 essays. A number of the stories were riveting, some tragic, many funny, most touching and inspirational. Regional differences emerged in terms of the topics of the essays themselves. Stories about sports were most common in the Midwest, racial issues in the South, and science in the Northeast.

Figure (top and bottom maps): Relative use of words within an immediacy cluster (I-words, short words, present-tense verbs, nonuse of articles) and making-distinction cluster (exclusive words, negations, causal words, nonuse of inclusive words). Darker regions reflect higher usage. Language samples are based on about 37,500 U.S. This I Believe essays.

Not only did people differ in their topics, but they also differed in terms of their function words. As you can see on the top map, people in the middle of the country tended to use the highest rates of I-words, present-tense verbs, and short words. As you might recall from earlier chapters, this constellation of words reflects psychological immediacy, wherein writers tend to be in the here and now. The darker areas reflect higher rates of immediacy. Contributors from the Northeast and the West were the least personal and social in their writing and the most concrete. These low levels of immediacy reflect a language style of psychological distance and formality.

Recall from earlier chapters that function words often clump together in a way that reflects analytic thinking. People who think analytically often make distinctions between ideas. In making distinctions, it is necessary to use words such as conjunctions (but, if, or), negations (no, not), and prepositions (with, over). In the bottom map, people in the middle of the country make the most distinctions and those in the far Northeast make the fewest.

What accounts for these regional differences? As we have seen with language style matching, people quickly adjust their speaking styles to others around them. The more time spent conversing, the more individuals begin seeing their worlds in similar ways. As a general rule, my neighbors have the same weather, eat similar foods, share the same community events, and deal with the same schools, tax collectors, stores, and bureaucracies as I do. The people in my community share many of the same issues with those in the next town and, to a certain degree, with those in the neighboring state. But as I travel farther and farther away from home, the weather, food, cultures, and concerns begin to change. As the social and physical environments change, so do the ways people approach their worlds and talk with others.

ALTHOUGH LANGUAGE DIFFERENCES should become more pronounced over greater distances, some variations in language can spring up in neighborhoods separated by only a few blocks where weather, terrain, ethnicity, social class, and every other factor is similar. I grew up in an oil-boom town in West Texas, where most families would move in for about four years before being transferred elsewhere. Even with the constant migration, new neighborhood children would quickly adopt accents and slang consistent with that neighborhood.

Even within schools, researchers have been able to isolate different language patterns among different subgroups. In an important analysis of a Detroit high school in the early 1980s, Penelope Eckert demonstrated that the language of the school’s jocks was as distinctive as that of its burnouts. Not unlike most secondary schools in the world, the different tightly knit groups adopted their own language styles in a way that reflected their group’s identity.

It’s not much of a stretch to imagine that different schools in the same geographical region could develop their own language styles. We have found evidence for this by analyzing over fifty thousand college admissions essays submitted by students who were accepted at the University of Texas at Austin over several years. Working with the school’s admissions office, linguist David Beaver and I looked at about two thousand essays from students from nine different high schools surrounding a single large metropolitan area in Texas. The students from the various high schools did equally well in high school and their first year in college and varied only modestly in their social class and ethnic makeup. Nevertheless, the ways they used pronouns, articles, prepositions, and other function words in their essays differed from school to school. In other words, each school had its own linguistic fingerprint.

College admissions essays, like This I Believe stories, are unique forms of writing. Most people will write something like them only a few times in their lives—if ever. Usually, the authors sit alone in their rooms (or coffee shops) and reflect on some of the bigger issues in their lives. Their stories reflect their families, their friends and community, and their society. The inner voices that guide their word choices are driven by the ways they think, what they attend to, their emotional states at the time, and their language history.

It is little wonder that self-reflective essays mirror people’s sense of place and the groups with which they spend most of their time. Even with relatively crude computer models, we can do much better than chance at estimating which part of the country, what city, and possibly what part of a city a person is from by the ways they use function words in their essays. If we are analyzing transcripts of people in conversations, these same function words provide hints to what people are doing, the situations they are in, and the nature of their connections to the people around them.

If you have paranoid tendencies, know that it is unlikely that a function-word-based predator drone will ever be developed. The words we use have always reflected who we are, where we are, and what we are doing. The who, where, and what have historically been obvious. Prior to the written word, if I were speaking with you, we would both know that I was talking (who), our location (where), and our current actions (what). Only through the fluke of technological advancements have we had a period where the who, where, and what of communicating became opaque. The delicious irony is that with additional advances in technology, we may eventually be able to determine the who, where, and what of communication at levels comparable to our ancestors more than five thousand years ago.

THIS CHAPTER HAS explored how the words people use in groups can reveal something about the groups themselves. Use of we-words by group members often suggests that the members identify with their group. Over time, as people become more comfortable with their group, everyone tends to use we-words more. When groups succeed or are threatened from the outside, group identity increases, with a corresponding increase in the use of we-words.

We-words reflect group identity but not the degree to which a group works well together. It may be possible to increase a team’s sense of identity, but that doesn’t mean the team will actually perform any better. There may be no I in team but there is no we in team either. Language analyses suggest that for groups to work best, they must think alike and pay close attention to the other team members. In all likelihood, language style matching reflects mutual interest and respect among different people in a group.

The definition of a group has been used rather loosely in this chapter. This is OK: I’m a social psychologist. What is particularly intriguing is that similar processes for the use of we-words and language style matching are apparent among dating couples, working laboratory groups, real-world work groups, online communities, and entire schools, communities, and societies. The unifying theme is that all of these groups use language to communicate. Words are the common currency of interaction whether written or spoken.

Finally, just as words of group members reveal information about group processes, they also tell us something about what groups are doing and where they are. In an odd way, function word usage is highly contagious. Whether in couples, small groups, neighborhoods, or communities, people tend to adopt the language styles of the people around them. Our words, especially our function words, inadvertently reveal what we are doing and where we are. Just as our accents, body language, and clothes reveal our social and psychological selves, so do our words.

If you are a private investigator, put away your spyglass. Instead, boot up your computer and start counting words.

CHAPTER 10

Word Sleuthing

IN STUDYING WORDS, I have frequently been asked to analyze language to answer questions that I would have never considered. Lawyers, historians, music lovers, political consultants, educators, intelligence agents, and others have occasionally contacted me to see if our language approach could give them a different perspective on a problem they have been thinking about.

This chapter brings together some of the more interesting projects my students and I have been playing with over the years. The topics vary quite a bit. Nevertheless, they showcase different ways words can be analyzed to answer novel questions.

USING WORDS TO IDENTIFY AUTHORS

The phone call I received from the senior partner in a law firm caught me off guard. He was curious if I could analyze an e-mail that had been sent to a member of his firm; let’s call her Ms. Livingston. It was quite sensitive, he confided, and it was important that he talk directly with the person who had sent the e-mail. The only problem was that the e-mail had been sent anonymously from an untraceable e-mail address. After I agreed to look at it, he sent me the following e-mail:

Ms. Livingston:

I think you should know that David Simpson has perpetuated the idea that you have no credibility among your colleagues. He says you altered depositions and falsified expense reports at your last job in New York. He says this is the reason you left so abruptly.

He has spread these stories to people in various departments, including Billing, Personnel, Public Relations and to those at the executive level. It is uncertain how and when our senior partners will deal with this. But if you start getting the cold shoulder, you will know why.

When I first heard of this, I was surprised, but took what he said at face value. Of course, this was before I learned of his voracious appetite for propagating half-truths, gossip, and outright lies, all in the name of somehow making himself look knowledgeable and “better.”

Such a pity. He obviously has talent, but it is all negated by his vile, malicious tongue. All I can think of is a tremendous sense of insecurity. But I digress. I just thought you would like to know.

A friend

After receiving the e-mail, Ms. Livingston turned it over to the law firm. She dismissed the rumor as provably false but was concerned that if David Simpson really was spreading false rumors, it could damage her reputation along with that of the firm. I had spent several years developing methods to analyze language and personality but had never been paid to be a word detective.

What kind of person may have written the note? Is “A friend” a male or female and what is his or her approximate age? What is the person’s link to Ms. Livingston, to David Simpson, and to the firm? Any hints as to the person’s personality traits?

In the years since I worked on the case, several new ways of looking at words have been developed. One involves comparing the words “A friend” used with those of tens of thousands of regular bloggers. For example, by looking at just the function and emotion words, we can guess that there is a 71 percent chance that the author is female and a 75 percent chance that she is between the ages of thirty-five and forty-five. It is much harder to get a good read on her personality. One analysis suggests that there is a fairly good chance that the author of the e-mail is high in the trait of narcissism—meaning she may be somewhat conceited and manipulative.

Look more closely at the e-mail and other hints emerge. The person is psychologically connected to the firm (“our senior partners”) and has knowledge of rumors from across several departments within the firm. The person also is working to impress Ms. Livingston by using a large vocabulary. Particularly interesting is the use of words and phrases such as “voracious appetite,” “vile,” and “malicious tongue.” These are Old Testament words that, in other analyses, were primarily used by people between forty-two and forty-four years of age at the time of the project.

One other important clue was the layout and punctuation. The e-mail was professionally typed with paragraphs of equivalent size. There was only one space between the period and the beginning of the next sentence, which suggests the person learned to type after about 1985—when desktop computers became popular—or the person had some background in journalism or publishing before 1985, where the single space after a period was the norm. (My wife, who was in publishing before 1985, explained this to me.)

What happened? When I submitted my report to the senior partner, he was relieved because it precisely matched the person he had suspected—a conscientious woman in her early forties with a background in newspapers who had been with the firm for several years. I never learned the final disposition of the case, but I see that Ms. Livingston is now a senior partner with the firm.

WHO WROTE IT? THE ART OF AUTHOR IDENTIFICATION

Deciphering linguistic clues to solve crimes has a rich tradition in criminology. The FBI, various national security agencies, and local police departments around the world occasionally seek the expertise of linguists to help decode ransom notes or written threats, or to assess who might have written legal or other documents.

One of the best-known early forensic linguists is Donald Foster, a professor of English at Vassar College. Using a mixture of computer and deductive skills, along with a broad knowledge of history and literature, Foster has worked with law enforcement agencies on high-profile cases such as the Unabomber, the 2001 anthrax attacks, and the 1997 JonBenét Ramsey murder case. He has also applied his methods to determine the authenticity of some works by Shakespeare and others. Perhaps his most successful venture was in identifying Joe Klein as the author of an anonymously published satirical novel on the Clinton presidency, Primary Colors.

Foster has been a controversial figure because several of his high-profile claims about authorship have not panned out. He has also been less than forthcoming about the details of his methods of author identification, something that reflects his training in English rather than statistics and science. Nevertheless, Foster’s approach has alerted the literary and forensic worlds to the promise of computer-based methods to identify authors and their work.

FINDING THE TELLS

World-class poker players closely watch and listen to their opponents in attempts to predict the cards they may be holding. Often players will pretend they have a poor set of cards when they have a good set; other times they will bluff by giving the impression they have a winning hand when they don’t. Experts look for telling signs of deception—or “tells.” Some players avoid looking around the table, others tap their feet, yet others talk more loudly. The ability to decipher tells can give card players a large advantage in high-stakes poker games.

There are various types of tells in people’s use of written language as well. Two are particularly good clues in identifying authors: function words and punctuation. This can be seen in looking back at the blogs we collected in 2001 as part of the September 11 project discussed in the last chapter. Recall that we saved about seventy blog entries from each of a thousand people in the two months before and after the 9/11 attacks. Every few years, my students and I revisit LiveJournal.com to see if the same people are still posting. Ten years later, 25 to 30 percent are still active. About 25 percent have erased their accounts. The remainder stopped posting, on average, five years after the attacks, in 2006. Many of the former posters migrated to other systems such as Facebook or Twitter.

Simply reading the last ten years of people’s posts provides an intimate picture of their lives. Not unlike Michael Apted’s Seven Up! documentary series, we have been able to track the unfolding experiences of the bloggers as they grow older. Many of the same issues still drive the authors. Even though some have now married, had children, and started careers, recurring insecurities, motives, and goals keep returning. Those who were happy and upbeat in 2001 tend to be the same optimistic people nine years later. For example, a young father writes in a random blog in 2001 about his favorite hockey team:

lucky lucky chicken bone. i shall do the happy-cup-dance. we shall win. we shall triumph. and there will be much rejoicing! i just need to get cable first. ok. i wasn’t just gonna post about hockey, but yvonne’s ready to go. yeah. shut up. you try resisting that sweet, sweet candeh.

And nine years later, you see the same person:

My first attempt at making salsa was, in my humble opinion, not too shabby. protip: don’t use Roma tomatoes. I’m not sure why the hell I thought they’d work out fine, but I was terribly wrong. Ok, not terribly just mildly. ah, salsa humor. I’m heading back to the mexi-mart today to pick up the goods to try another batch. Maybe i’ll have it done in time for the bbq. Who knows? Since my catharsis, I’ve been in an amazing headspace.

Obviously, these two writing samples are from the same person. I mean, anyone could spot it immediately.

Really?

Actually, we can see the similarities once we know that they were written by the same person. But what if we read blogs all day and came across the second one several hours after reading the first? In all likelihood, most people wouldn’t jump up yelling, “Aha! I have read that writing style earlier … yes, from the guy who wrote about hockey.” Could language experts or computers make a definitive match? Are language fingerprints as reliable as DNA or real fingerprints? The short answer is no. However, computerized language analyses do a reasonably good job at matching which writing goes with which person.

Imagine we had a large number of blog entries from twenty bloggers. Several years later, we retrieve a handful of new postings from each of the same twenty bloggers. Now imagine sitting on your living room floor with hundreds of pages of posts trying to match each current blog entry with the original posts of the twenty bloggers. All things being equal, anyone should be able to match 5 percent of the blog posts correctly just due to chance alone. Most people would do terribly on this task. It is unlikely that you would match at rates any better than 10–12 percent. The writing style differences are too subtle and there is just too much information.

Computers are more patient and systematic. If we just analyze function words, the computer correctly matches the recent blog posts with the original authors about 29 percent of the time. This is actually impressive given the time lag between the writing of the posts.
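The matching task described above can be sketched in a few lines of Python. This is only an illustration, not the method used in the study: the short function-word list, the cosine similarity measure, and the names profile, cosine, best_match, and chance_rate are all assumptions made for the example.

```python
import math
import re
from collections import Counter

# Hypothetical, heavily abbreviated function-word list (an assumption);
# real analyses track hundreds of such words.
FUNCTION_WORDS = ["i", "the", "a", "and", "but", "of",
                  "to", "in", "not", "we", "it", "that"]

def profile(text):
    """Return the rate of each function word per word of text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def best_match(new_text, known_texts):
    """Return the author whose earlier writing looks most like new_text.

    known_texts maps an author label to that author's earlier writing.
    """
    target = profile(new_text)
    return max(known_texts,
               key=lambda author: cosine(target, profile(known_texts[author])))

def chance_rate(n_authors):
    """Expected accuracy from guessing among n equally likely authors."""
    return 1 / n_authors
```

With twenty bloggers, chance_rate(20) gives the 5 percent baseline mentioned above, which is the figure any matching rate should be compared against.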

But there is more to author identification than function words. Look at the consistency of punctuation. The following woman, for example, continues to use asterisks in the same way nine years apart. This was part of an early 2001 entry:

Oh.. I have also discovered a shy streak I didn’t know I had. I guess you would call it shyness. Somebody made me *blush*. Repeatedly. That is *weird*. I don’t blush.

And in 2010:

We *are* in post-post-punk now, aren’t we? The guys in the band made a joke about how they just wrote that song yesterday, and maybe a quarter of the people in the room didn’t get why the rest of us were chuckling. weird. *shrug*

Others use punctuation in equally unique but more subtle ways. From a twenty-seven-year-old male in 2001:

I mailed memorial gift checks to Immanuel [endowment donation in honor of Joan’s mother]; and St Anne’s - for my favorite accounting professor the Smythe scholarship. Frank & Rebecca brought over “Midnight in the Garden of Good & Evil” and a couple homebrews. My eyelids want to close so I better …

In 2010:

I didn’t quite know what to say thinking, “hmm, mud, what is it … when I found a mirror I didn’t see any other “brown stuff” i brought a watermelon and Costco multi-grain chips, Had a couple beers, I took Yuengling B & T - dinner was boiled/grilled chicken, okra, slaw, “dipping” brownies.

This person is the Alvin Ailey of punctuation. He jumps, swirls, swoops, and rolls with the full gamut of punctuational possibilities: [ ; - … & “/. Oddly, when I first read his blog, I didn’t even notice his use of punctuation marks—they just blended into his writing. However, when his blogs were computer analyzed, his use of punctuation stood out.

Punctuation marks can identify some people better than anything they write. In fact, when looking only at punctuation, computer programs identified 31 percent of authors correctly—essentially the same rate as relying on function words. When both function words and punctuation were used together, the computer correctly paired the original bloggers with writing samples several years later 39 percent of the time.
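A punctuation profile can be computed much like a function-word profile. The sketch below is a rough illustration under stated assumptions: the set of marks tracked, the per-1,000-character scaling, and the names punctuation_profile and combined_features are invented for the example and are not taken from the study.

```python
from collections import Counter

# Marks to track. This particular set is an assumption for illustration.
PUNCTUATION = ["*", "[", "]", ";", "-", "&", "/", "\"", "…"]

def punctuation_profile(text):
    """Return occurrences of each tracked mark per 1,000 characters."""
    counts = Counter(ch for ch in text if ch in PUNCTUATION)
    scale = 1000 / max(len(text), 1)
    return {mark: counts[mark] * scale for mark in PUNCTUATION}

def combined_features(function_word_rates, text):
    """Append punctuation rates to an existing list of function-word rates.

    function_word_rates is any list of numbers computed elsewhere; the
    result is one longer feature vector describing the same text.
    """
    marks = punctuation_profile(text)
    return list(function_word_rates) + [marks[m] for m in PUNCTUATION]
```

Putting both kinds of features into one vector is one simple way to use function words and punctuation together; the two sets of rates are on different scales here, so a real analysis would likely standardize them first.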

Punctuation, function words, and content words that are used in everyday writing are all parts of our personal signature. To appreciate this, go to your own e-mail account and spend a few minutes looking at the e-mails you send to and receive from others. Start with the page layout. Some people tend to write very long e-mails, whereas others keep them to a sentence or two. People tend to differ in the length of their paragraphs and sentences. Their greetings and closings vary tremendously as well. Some use emoticons; some never do.

Some of these differences may be psychologically important but most probably aren’t. The person who ends most e-mails with “Sincerely” may do this just because they were told to do so when they were younger. Even though these variations may not say anything about your conflicts with your mother when you were an infant, they still mark you. That is, they are part of your general writing style that makes you stand out from everyone else. And that is the interesting story. All of the language features we can measure can help to identify you.

THE CASE OF THE FEDERALIST PAPERS

In 1787 and 1788, a series of eighty-five essays were published in pamphlets and newspapers across the American colonies in an attempt to sway people to support the proposed document that would become the U.S. Constitution. Published anonymously under the name Publius, the papers discussed a wide range of topics, including the role of the presidency, taxation, state versus federal power, etc. Even at the time, many knew that Publius was not a single person but, instead, James Madison (who would become the fourth president), Alexander Hamilton (the first secretary of the treasury), and John Jay (the first chief justice of the Supreme Court).

In the years that followed, the authorship of seventy-four of the essays gradually became known. Madison wrote fifteen, Hamilton fifty-one, Jay five, and Madison and Hamilton jointly wrote three of the articles. The authorship of the remaining eleven was never determined and has been a source of speculation ever since. The first serious attempt to identify the author of the eleven papers was undertaken by historian Douglass Adair as part of his dissertation in 1943. Adair’s historical analysis deduced that all eleven anonymous essays had been written by James Madison.

The debate resurfaced in 1964 when statisticians Frederick Mosteller and David Wallace introduced a new way to analyze words. By focusing on a small number of function words, they concluded that Adair was indeed correct because their elegant statistical models pointed to Madison as the likely author. Since then, identifying the anonymous authors of the Federalist Papers has become something of a sport whenever new language analysis methods are developed.

I am proud to announce the New Official Findings. Historians, prepare your quills.

Function Word Analyses

Using similar methods to those of Mosteller and Wallace, we find the same effects. The anonymous eleven cases all use pronouns, prepositions, and other stealth words in ways similar to James Madison. Case closed?

Not so fast. Other statisticians have discovered a small problem that exists with the investigation of our founding fathers’ function words. Since Mosteller and Wallace, another technique has been devised that is called cross-validation. The idea is to examine each of the original essays individually as if they were anonymously written. In other words, we pull one of the known essays out of the stack and then develop a computer model based on the remaining essays to try to determine who wrote the essay we pulled out. It’s a marvelous method because we are determining results about a question whose answer we already know. If our cross-validation analyses successfully guess the authors of all the known essays, we can place a tremendous amount of trust in our research methods.
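The pull-one-out procedure just described is often called leave-one-out cross-validation. A minimal sketch follows, assuming each essay has already been reduced to a list of numbers (for example, function-word rates). The nearest-centroid classifier is a stand-in chosen for brevity, not the statistical model used in any of the Federalist analyses, and the function names are hypothetical.

```python
def centroid(vectors):
    """Average a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def nearest_centroid(train, features):
    """Predict the label whose average training vector is closest."""
    by_label = {}
    for vec, label in train:
        by_label.setdefault(label, []).append(vec)

    def squared_distance(label):
        center = centroid(by_label[label])
        return sum((a - b) ** 2 for a, b in zip(center, features))

    return min(by_label, key=squared_distance)

def leave_one_out_accuracy(samples):
    """Hold out each known essay in turn and try to guess its author.

    samples is a list of (feature_vector, author_label) pairs.
    Returns the fraction of held-out essays classified correctly.
    """
    hits = 0
    for i, (features, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]   # everything except essay i
        if nearest_centroid(train, features) == label:
            hits += 1
    return hits / len(samples)
```

An accuracy well below 1.0 on essays whose authors are already known is the warning sign: the same model's guesses about the anonymous essays deserve correspondingly less trust.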

Heartbreak city. The cross-validation results suggest that Mosteller and Wallace might have been wrong. About 14 percent of the known essays are not classified correctly based on function words. This is a serious problem. If the computer can’t tell us what we already know with extremely high accuracy, we have to be careful in interpreting the results from essays about which we don’t know the author.

Punctuation Analyses

Recall that people’s use of punctuation can reveal authorship in many cases. Similar cross-validation analyses on punctuation were disappointing as well. Combining function words and punctuation to predict authorship produced slightly better results than function words alone. Interestingly, the function words plus punctuation results hinted that Hamilton wrote three of the eleven anonymous essays.

Going for the Tell: People’s Use of Obscure Words

Over the course of my career I have written more papers than I care to admit. Perhaps ten years ago, a colleague thanked me for a review I had written about her research. I was flattered, of course, but a bit puzzled since my review had been written anonymously. “How did you know I was the author of that review?” I blurted out. She just laughed and said one word: intriguing.

Intriguing, indeed. I went back to many of my reviews, then articles, and even books. I was shocked by how frequently I used the word intriguing. Even this book is littered with intrigue—I just can’t help myself. Over the years, I’ve noticed that most of my colleagues and friends have their own favorite but relatively obscure words that even they aren’t familiar with. The words aren’t used at high rates but they find their ways into the occasional e-mail, Facebook post, blog, tweet, or article.

Did Madison or Hamilton have tell words in their articles? With a little sleuthing it turns out the answer is yes. In almost half of his papers, Hamilton used the word readily; Madison never did. In nine of his fifteen articles, Madison used consequently, compared with Hamilton’s use of the word three times across his fifty-one papers. Hamilton also had a fondness for commonly, enough, intended, kind, and naturally. Madison tended to overuse absolutely, administer, betray, composing, compass, innovation, lies, proceedings, and wish.

If we just examine the use of these fourteen words, the statistics are promising—almost a perfect score for cross-validation. However, the story for the unknown authors comes out quite differently than what the earlier scholars claimed. They suggest that Hamilton actually wrote eight of the anonymous essays and Madison wrote only three.
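Counting tell words is simple to illustrate. The sketch below uses the words named above for each author; the scoring rule (Hamilton tells minus Madison tells) and the function names document_frequency and tell_score are assumptions made for the example, not the statistical model behind the reported results.

```python
import re

# Tell words as listed in the text above.
HAMILTON_TELLS = ["readily", "commonly", "enough", "intended",
                  "kind", "naturally"]
MADISON_TELLS = ["consequently", "absolutely", "administer", "betray",
                 "composing", "compass", "innovation", "lies",
                 "proceedings", "wish"]

def document_frequency(essays, word):
    """Count how many essays contain the word at least once."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return sum(1 for essay in essays if pattern.search(essay))

def tell_score(text, tells_a=HAMILTON_TELLS, tells_b=MADISON_TELLS):
    """Return (uses of A's tells) minus (uses of B's tells).

    A positive score leans toward author A, a negative one toward B.
    This simple difference is an illustrative assumption.
    """
    words = re.findall(r"[a-z]+", text.lower())
    a_hits = sum(words.count(w) for w in tells_a)
    b_hits = sum(words.count(w) for w in tells_b)
    return a_hits - b_hits
```

Calling document_frequency on one author's essays with a word such as "readily" gives the kind of tally quoted above, and tell_score gives a crude lean for an essay of unknown authorship.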

What is truth in this case? Reading Douglass Adair’s delightful account of the controversy surrounding the eleven articles, it is clear that Hamilton and Madison had very different memories of who wrote what. Adair is ultimately more sympathetic to Madison’s claims, although the objective evidence to assign authorship is not compelling either way. Like Mosteller and Wallace, I have no in-depth knowledge of the actual case. Nevertheless, historians should know that from a statistical perspective, the case is still open.
