The Half-Life of Facts
Author: Samuel Arbesman
There are many examples where a small error, despite being corrected later, has spread through a population. If you want to spend days poring over persistent errors that have spread far and wide, Snopes.com is a great font for these bits of information. Or even look at Wikipedia. In his delightfully nerdy Web comic xkcd, author Randall Munroe wishes for a world in which schoolchildren read the Wikipedia page on common misconceptions weekly, in order to learn truth as well as skepticism. Both of these sites are full of urban legends, false facts, and misconceptions that have become prevalent.
One good rule of thumb when examining how errors propagate over time is to look for a simple phrase: “contrary to popular belief.” While the phrase is a favorite of writers with a love for the counterintuitive point (and this author is not immune to this), it’s also a clear indication that a bit of knowledge has spread far and wide despite being inaccurate. The antidote to this false fact, which of course the writer is about to tell you, has yet to penetrate the popular consciousness. And this phrase is by no means new. I have found instances of it in books and magazines from the nineteenth century debunking false facts about lunar phases, medical knowledge, and even the heredity of genius.
There are even examples where such misinformation has been spread purposefully, albeit sometimes with a wink rather than with malicious intent. But since we often don’t track our sources, this can have a rather problematic effect. For example: I have a book on my shelf entitled Dictionary of Theories. Perusing it one day, I came across a curious entry:
Dynamics of an asteroid
(1809)
Astronomy
Initiated by C F Gauss (1777–1855), but reputed to have had its outstanding exposition in an elusive textbook by James Moriarty (c. 1840–1891), with later contributions by other mathematicians including K Weierstrass (1815–1897) and J E Littlewood (1885–1977).
The motion of an asteroid, which is now generally understood as a minor planet, is that of a body of negligible gravitational attraction in the gravitational field of two massive bodies, just like a spacecraft under the influence of the Earth and the Moon.
J F Bowers, “James Moriarty: A Forgotten Mathematician,” New Scientist, 124 (1989). Parts 1696–7, 17–19.
While I had never heard of this theory, I wasn’t terribly surprised, as the dictionary contains many obscure ideas. But I did recognize the reference, and that surprised me. That’s because the citation was to a monograph by none other than James Moriarty, the arch nemesis of Sherlock Holmes.
As brilliant as he might have been, Moriarty never existed. Yet here he was, in a fictional reference that had somehow jumped into the real world. Searching a bit further, I tracked down the article referenced in New Scientist, and discovered that it was a tongue-in-cheek analysis, by John Bowers of the School of Mathematics at the University of Leeds, of the mathematical contributions of Professor Moriarty, including his analysis of how gravity operates on asteroids.
But the citations to Moriarty’s work didn’t end there. I even found a dissertation by Kristian Kennaway, a physics doctoral student at the University of Southern California, that cites one of the other celebrated bits of research by Moriarty: a treatise on the binomial theorem, published in none other than the Bohemian Journal of Counting.
While this is no doubt a fun bit that Kennaway inserted to see if his committee members were paying attention (my aunt placed a banana bread recipe into her master’s thesis and no one noticed), Kennaway actually cites Moriarty’s mathematical work to explain something, providing the justification for a mathematical concept using the work of a fictional character.
Thus far, I haven’t seen many other examples of this, and I doubt such bizarre overlaps of fiction with reality have propagated far. But it does give one pause.
Bad information can spread fast. And a first-mover advantage in information often has a pernicious effect. Whatever fact first appears in print, whether true or not, is very difficult to dislodge. Sara Lippincott, a former fact-checker for The New Yorker, has made this explicit. These errors “will live on and on,…deceiving researcher after researcher through the ages, all of whom will make new errors on the strength of the original errors, and so on into an explosion of errata.” This is strong stuff. These errors become ever present and extremely difficult to correct. It’s like trying to gather dandelion seeds once they have been blown to the wind.
I myself was a victim when I actually propagated the myth that a frog, if boiled slowly, will not jump out of a pot. I mentioned this in passing in the Boston Globe, using it to explain how people don’t notice factual change if it happens slowly. I was taken to task soon after by James Fallows, of The Atlantic, who has worked hard to remove this falsehood from the population; in fact, the frog only remains in the pot if it’s brain-dead.
Can we understand in any rigorous way how these sorts of falsehoods continue to propagate? Happily, there is scientific research that delves into how they spread. But that science requires us first to take a little detour to examine some very old typos in ancient manuscripts, their surprising relationship to genetics, and how both of these fields deal with error.
. . .
We can look to the children’s game of telephone to understand how facts can be corrupted and spread: The children sit in a circle, and one person begins by whispering a phrase or sentence to the child next to them. This person whispers to their neighbor, who in turn does the same, continuing until the person who completes the circle, the last one to hear the sentence, says aloud what they heard. This is then compared to what the first person initially said, often with hilarious results. Of course, sometimes this is because there’s someone malicious somewhere along the line: the kid who delights in replacing every verb with “fart,” for example. But in general, the sentence decays without any malice or intent. It simply gets changed because hearing a whispered sentence doesn’t provide great fidelity. It’s what information scientists would refer to as a noisy channel. When information is passed from one person to another it has the potential to become inaccurate unless there are a whole host of error-checking mechanisms.
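To make the idea of a noisy channel concrete, here is a minimal sketch of the telephone game in Python. Everything in it is an illustrative assumption rather than a model taken from the research literature: the function names `whisper` and `play_telephone`, the per-letter `error_rate`, and the rule that a misheard letter is replaced by a random lowercase letter.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def whisper(sentence, error_rate, rng):
    """Pass a sentence through one noisy hop.

    Each letter is independently misheard with probability `error_rate`
    (an assumed, illustrative noise model) and replaced by a random
    lowercase letter. Spaces and punctuation pass through unchanged.
    """
    heard = []
    for ch in sentence:
        if ch.isalpha() and rng.random() < error_rate:
            heard.append(rng.choice(ALPHABET))
        else:
            heard.append(ch)
    return "".join(heard)

def play_telephone(sentence, players, error_rate, seed=0):
    """Relay a sentence around a circle of `players` children."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    for _ in range(players):
        sentence = whisper(sentence, error_rate, rng)
    return sentence

if __name__ == "__main__":
    original = "the cat sat on the mat"
    for n in (1, 5, 20):
        print(n, play_telephone(original, players=n, error_rate=0.05))
```

With no error checking anywhere along the line, small per-hop errors tend to pile up as the circle grows, which is the decay the game makes audible.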
A clear case of this that we can actually measure and study quantitatively can be found in the world of old texts. Surprisingly, understanding the errors in these manuscripts is actually quite similar to understanding genetics. This may sound a bit odd. What do handwritten manuscripts from the medieval period or earlier have to do with genetics? On the surface, nothing: One is a distinguished part of the humanities and the other a hard experimental science. However, while those who study each of these fields have very little to do with one another, it turns out that there is a great deal of symmetry. It mainly comes down to mutation.
Scholars who study paleography, the field of research that examines ancient writing, are all too aware of the mistakes that scribes make when copying a text. These types of errors, which can be used to understand the provenance of a document, are actually nearly identical to the types of errors caused by polymerase enzymes, the proteins responsible for copying DNA strands.
When it comes to copying DNA—those strands of information that code for proteins and so much more—there are a few advantages over simply hand copying a document. DNA’s language is made up of four letters, or bases, which come in complementary pairs: A always goes with T, and G always goes with C. When DNA is copied, its double helix is unzipped, and the letters of each helix—one side of the zipper—can be easily paired with their complementary letters. This results in two new double helices—closed zippers—both of which have properly paired letters, because the complementary letters act as a simple way to prevent errors.
Nonetheless, when DNA is replicated, it’s sometimes done imperfectly. The group of chemical machines responsible for duplicating a strand of DNA occasionally makes mistakes. That’s what can make up a mutation: an incorrect copying, or even a piece of DNA getting hit by a cosmic ray. However it happens, some error is introduced into the sequence. For example, an A gets turned into a G, or something much bigger happens. The types of mutations fall into a few categories, such as duplicating a section of DNA or deleting a letter, due to regular ways that the DNA copying mechanisms operate. The majority of these errors cause no problem whatsoever, but in some cases a change in a single letter of DNA can cause some large-scale issues, such as in the case of sickle-cell anemia.
There are systematic errors in copying a text as well. Whether it’s skipping a word or duplicating it, there is an order to the ways in which a scribe’s mind wanders during his transcription. Many of the errors can be grouped into categories, just like the different types of genetic mutations. And not only are there regularities to how both DNA and ancient manuscripts are copied incorrectly, but these types of errors are often very similar, despite the large differences between how scribes and enzymes work.
There is a common scribal error known by the Greek term homeoteleuton. This refers to a type of deletion, in which there are two identical word phrases separated by some other text and the scribe accidentally skips to the second phrase without transcribing the intervening portion, including the first instance of the phrase. For example, there is a verse at the end of the creation story in Genesis that reads, “And on the seventh day God finished the work that he had done, and he rested on the seventh day from all the work that he had done.” Notice that the phrase “the work that he had done” is repeated. If a scribe incorrectly transcribed the verse simply as “And on the seventh day God finished the work that he had done,” and then proceeded to the next verse, that would be a homeoteleuton.
In genetics, this same error is known as a slipped-strand mispairing mutation. AATTCGATATACGA gets copied as AATTCGA, skipping the middle section.
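The same skip can be written as a few lines of Python that apply equally to a verse and to a DNA sequence. This is only a sketch: the function name `eye_skip` is hypothetical, and it assumes the copyist writes the repeated phrase once and then resumes after its second occurrence, which is one simple way to reproduce the two examples given here.

```python
def eye_skip(text, phrase):
    """Simulate a copyist whose eye jumps between two copies of `phrase`.

    The copy keeps everything up to the end of the first occurrence and
    resumes after the end of the second, dropping the material between.
    If the phrase does not occur twice, the text is copied faithfully.
    """
    first = text.find(phrase)
    if first == -1:
        return text
    first_end = first + len(phrase)
    second = text.find(phrase, first_end)
    if second == -1:
        return text
    return text[:first_end] + text[second + len(phrase):]

verse = ("And on the seventh day God finished the work that he had done, "
         "and he rested on the seventh day from all the work that he had done.")

print(eye_skip(verse, "the work that he had done"))
# And on the seventh day God finished the work that he had done.

print(eye_skip("AATTCGATATACGA", "CGA"))
# AATTCGA
```

One function covers both cases because the error depends only on a repeated run of symbols, not on whether those symbols are words or bases.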
Insertions can occur during copying in both genetics and paleography as well. In genetics they are simply called insertions; in manuscripts the corresponding error is called dittography. There are also reversals: metathesis in paleography and chromosomal transpositions in genetics. And point mutations, substituting the wrong genetic base when copying DNA, also occur in handwritten manuscripts. In both cases the wrong letter is written, based on probabilities of their being similar. In DNA, C and T are quite similar chemically and can be confused easily. In ancient Greek, lambda and delta look similar, and are more likely to be exchanged as well. And the list goes on.
While it is fun to chronicle such similarities, they can also be exploited in the same way. Each type of error occurs with different yet predictable frequencies, which we can use if we want to make judgments about the ages of documents or sequences. For example, if a rare mutation is found frequently in a genetic sequence, the sequence can be assumed to be quite old, since a long period of time is needed for these errors to accumulate. In addition, errors can be used to infer the relationship between differing versions of documents or sequences. If two documents have few differences between them, we can assume that they are more closely related than two documents that have many differences.
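The rule of thumb that fewer differences means a closer relationship can be sketched by counting word-level edits between variants. The code below uses the standard Levenshtein edit distance as a stand-in. It is a deliberate simplification, not the method of any particular study, and the three variant texts and the name `closest_pair` are invented for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (here, lists of words)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # drop a word
                               current[j - 1] + 1,       # add a word
                               previous[j - 1] + cost))  # swap a word
        previous = current
    return previous[-1]

def closest_pair(variants):
    """Return (distance, name_1, name_2) for the two most similar variants."""
    names = sorted(variants)
    best = None
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            d = edit_distance(variants[x].split(), variants[y].split())
            if best is None or d < best[0]:
                best = (d, x, y)
    return best

# Invented variants of a single line, standing in for manuscript copies.
variants = {
    "A": "the quick brown fox jumps over the lazy dog",
    "B": "the quick brown fox jumps over the dog",
    "C": "a fast brown fox leaps over a sleepy dog",
}

print(closest_pair(variants))  # (1, 'A', 'B')
```

Here A and B differ by a single dropped word, so they would be judged more closely related to each other than either is to C.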
More generally, mutational differences between DNA sequences can be used to understand the evolutionary history of a population, or even of a group of species. So too with variants of the same manuscript. A famous example of this is of research that quantitatively studied the differences between the surviving versions of Geoffrey Chaucer’s The Canterbury Tales. By subjecting the variants to a battery of genetic analyses, the researchers were able to better understand the contents of the ancestral version, Chaucer’s own copy.
They used one of the better-known, and saucier, sections of Chaucer’s work, “The Wife of Bath’s Prologue,” in order to trace how changes can be used to find the original version. Based on fifty-eight different surviving versions of this section, which is 850 lines long, the team of researchers—made up of biochemists, information scientists, and humanities scholars—used off-the-shelf computer programs from the field of evolutionary genetics to deduce what Chaucer’s original version likely looked like. They concluded that Chaucer’s original was in fact an unfinished version, complete with his notes about intended additions and deletions.
Armed with a sense of how genetic tools can be applied to understand how texts spread and change, we can now use this to understand exactly what we have been trying to grasp: how information, especially misinformation, spreads.
Mutation of texts is far from an ancient problem. It happens in modern times and we can see it especially clearly in the world of science, when facts themselves are referenced: Citations—references in scholarly papers to previous works—also mutate over time.
Too often a popular paper isn’t actually read by a scientist and then cited in her own work. Sometimes scientists just look at the bibliographies of other papers and copy the citation to the paper instead. This somewhat lazy approach is unfortunately all too common, and if one scientist types it incorrectly, then suddenly there is a mutated version of the citation out in the wild. If other scientists come along and look only at that reference and not the original paper itself, that typo gets propagated from paper to paper, leading to a proliferation of errors. Just as we can learn about how ancient manuscripts spread errors, studying these mutations can allow us to learn about the history of the article that is being cited.
Mikhail Simkin and Vwani Roychowdhury, professors of electrical engineering at the University of California–Los Angeles, actually measured how often these sorts of factual corruptions occur in the scientific literature. In a series of papers, they explored the possible mechanisms for how this occurs, with most of their mathematical models relying on hypothetical scientists grabbing a few papers they’ve recently read and copying citations from the back. An assumption of laziness, certainly, but it also seems that something close to this might actually be the truth.
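A toy version of such a copying process can be simulated in a few lines of Python. This is a hedged sketch and not the model Simkin and Roychowdhury published: the function name `simulate_citations` and the parameters `read_prob` (the chance an author consults the original paper) and `typo_prob` (the chance of introducing a fresh misprint) are assumptions made purely for illustration.

```python
import random
from collections import Counter

def simulate_citations(n_papers, read_prob, typo_prob, seed=0):
    """Count the citation variants printed across `n_papers` citing papers.

    Assumed rules (illustrative only):
      * with probability `read_prob` an author reads the original and
        starts from the correct citation;
      * otherwise the author copies the citation exactly as printed in
        a randomly chosen earlier citing paper;
      * either way, with probability `typo_prob` a brand-new misprint
        is introduced while typing.
    """
    rng = random.Random(seed)
    printed = []      # citation string as it appears in each citing paper
    next_id = 1       # label for the next new misprint
    for _ in range(n_papers):
        if not printed or rng.random() < read_prob:
            citation = "correct"
        else:
            citation = rng.choice(printed)
        if rng.random() < typo_prob:
            citation = f"misprint-{next_id}"
            next_id += 1
        printed.append(citation)
    return Counter(printed)

if __name__ == "__main__":
    counts = simulate_citations(n_papers=1000, read_prob=0.2, typo_prob=0.02)
    for variant, n in counts.most_common(5):
        print(variant, n)
```

In a model like this, a misprint that shows up many times is evidence of copying, since authors who each read the original would be unlikely to make the identical typo independently.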