The Half-Life of Facts
Author: Samuel Arbesman
What does it mean for knowledge to exist but remain hidden? It’s one thing for a result to be ignored—that’s a specific instance of this sort of knowledge, which will be discussed later. But Swanson was talking about research that, because it has never been combined with other findings, is less valuable than it could be. Imagine that in one area of the scientific literature there was a paper showing that A implies B. Then, somewhere else, in some seldom-read journal (or even a journal read only by those in an entirely different area), an article contained the finding that B implies C. But since no one had read both papers, the obvious result—that A implies C—remained dormant, hidden in the literature as an unknown fact.
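To make the mechanism concrete, here is a toy sketch of this kind of literature-based discovery. The claim lists below are invented placeholders, not real extracted findings; an actual system would have to mine such cause-and-effect statements from the text of papers.

```python
# Findings reported in one body of literature, as (cause, effect) pairs.
literature_a = [("fish oil", "improved blood circulation")]

# Findings reported in a separate, rarely cross-read literature.
literature_b = [("improved blood circulation", "relief of Raynaud's syndrome")]

def hidden_links(lit_a, lit_b):
    """Join A->B claims with B->C claims on the shared middle term B,
    yielding candidate A->C hypotheses that no single paper states."""
    by_cause = {}
    for b, c in lit_b:
        by_cause.setdefault(b, []).append(c)
    return [(a, c) for a, b in lit_a for c in by_cause.get(b, [])]

# Prints the candidate hypothesis linking fish oil to Raynaud's syndrome.
print(hidden_links(literature_a, literature_b))
```

The join on the shared middle term is the entire trick: each half of the inference was already published, and only the combination was missing.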
Could this really happen? Swanson argued not only that it was possible but that there were many such examples. He demonstrated this with a novel finding: Swanson combined research showing that dietary fish oil improves blood circulation with entirely separate research associating poor blood circulation with a condition known as Raynaud’s syndrome. Aside from a few scattered reports, no one until Swanson had combined the findings of these two areas and recognized the mechanism by which fish oil could help those with Raynaud’s syndrome. Swanson, far from a biologist, was even able to publish this finding in a medical journal.
Building on this, Swanson continued to develop methods of combing through the newly digitized literature. He expanded his
use of MEDLINE, an online database run by the National Library of Medicine, which is housed at the National Institutes of Health. MEDLINE allowed Swanson to search rapidly for medical key words, and then to combine research that had remained separate. In the late 1980s, such databases were still in their relative infancy, but Swanson recognized their potential.
In 1996, a decade after his initial paper, Swanson and a collaborator, Neil Smalheiser, revisited undiscovered public knowledge to see whether it was simply a matter of one exceptional example or a generalizable concept with broad applications, one that could be pursued with computational resources. In this follow-up they highlighted six additional examples, revealing intriguing connections between things such as magnesium deficiency and migraines, and sleep habits and phospholipases—a type of enzyme. But while digitized information is powerful and could reveal a handful of such examples, this was but the tip of the iceberg.
. . .
MY father had studied information science and the idea of undiscovered public knowledge. So he did what a dermatologist, untrained in neurology but mindful of the concept of undiscovered public knowledge, would do: He looked in the neurological literature for a hint of some dermatological relationship to ALS that had remained hidden. After combing through many papers, he was led to a book from 1880 in which the first case of ALS was described. The physician author noted that patients with this newly described condition did not develop bedsores, even as the disease left them entirely immobile and bedridden.
This old book in turn led my father to reports by neurologists who had used biopsies of ALS patients to examine the structure of the skin’s framework. They found that there were specific chemical differences related to a decrease in the stretchiness of an ALS patient’s skin, known as its elasticity. These scientists had found that, even though this was a neurological disease, it could affect other
parts of the body, including the skin. My father realized that this was the critical clue.
My father knew that in dermatology there are numerous skin diseases that also reduce skin elasticity. But unlike in ALS, there were well-understood quantitative methods for measuring the changes in elasticity due to these diseases. He connected these two entirely unrelated areas and proposed that the techniques for measuring skin elasticity could be used to measure the biochemical changes in ALS patients, in an entirely noninvasive manner. After writing this theory up, my father submitted it to InnoCentive.
And he won, as one of five hypotheses selected.
Prize money in hand, my father went to researchers at the dermatology and neurology departments of Columbia University and suggested that they collaborate on testing his hypothesis. They conducted a pilot study with ALS patients, and it seems that my father’s piece of undiscovered public knowledge was right: There were quantitative changes in skin elasticity as the patients got sicker.
While my father didn’t win the final prize—that went to a team from Harvard—his team eventually won a monetary prize honoring how much progress they had made. Furthermore, some neurologists are currently exploring the connections between ALS and the skin. My father also showed me in a tangible way the power of connecting pieces of hidden knowledge.
Hidden knowledge takes many forms. At its most basic level, hidden knowledge consists of pieces of information that are unknown, or are known only to a few, and, for all practical purposes, have yet to be revealed. Other times hidden knowledge includes facts that are part of undiscovered public knowledge, where bits of knowledge must be connected to other pieces of information in order to yield new facts. Knowledge can be hidden in all sorts of ways, and new facts can be created only if this knowledge is recognized and exploited.
But, happily, hidden knowledge—and fact excavation—isn’t simply a matter of reading lots of papers and hoping for the best.
There is a science to understanding how new facts can be uncovered in what is already known.
. . .
ONE of the most fundamental rules of hidden knowledge is the lesson learned from InnoCentive: a long tail of expertise—everyday people in large numbers—has a greater chance of solving a problem than do the experts. The problems that go on to be solved by InnoCentive are precisely the ones that experts can’t solve. This was true even before computers and large scientific databases. Revealing hidden knowledge through the power of the crowd has been a great idea at almost any time in human history.
This is behind the concept of the innovation prize. The British government once offered a prize for the first solution to accurately measure longitude at sea—created in 1714 and awarded to John Harrison in 1773—but this was by no means the first such prize. Other governments had previously offered other prizes for longitude: the Netherlands in 1627 and Spain as early as 1567. They hoped that by getting enough people to work on this problem, the solution—perhaps obtained by drawing on ideas from different fields—would emerge.
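The intuition behind such prizes can be put in rough quantitative terms. Suppose, purely for illustration, that each would-be solver succeeds independently with some small probability p; then the chance that at least one of n solvers succeeds is 1 − (1 − p)^n, which climbs rapidly with n. The numbers below are assumptions, not data from the text:

```python
def chance_of_solution(p, n):
    """Probability that at least one of n independent attempts,
    each succeeding with probability p, solves the problem."""
    return 1 - (1 - p) ** n

# One expert who succeeds 10% of the time versus a crowd of 500
# outsiders who each succeed only 1% of the time.
print(round(chance_of_solution(0.10, 1), 3))    # 0.1
print(round(chance_of_solution(0.01, 500), 3))  # 0.993
```

Even under the independence assumption, which is generous, the point stands: a large enough crowd of unlikely solvers outperforms a single likely one.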
In 1771, a French academy offered a prize for finding a vegetable that would provide adequate nutrition during a time of famine. The prize was won two years later by Antoine Parmentier for his suggestion of the potato. To our ears, this sounds so obvious as to be silly. But at the time, the potato, due to its origins in South America, was generally unknown in France. And those who did know of it thought it was involved in outbreaks of leprosy. Parmentier’s research into starch, and his willingness to look into unplumbed areas of knowledge, provided the opportunity to uncover a fact—that potatoes are nutritious, not deadly—that was known in other parts of the world but had remained hidden in Europe.
But while prizes help tease out innovations and ideas that would otherwise remain hidden and accelerate the pace of knowledge
diffusion, hidden facts unfortunately remain a far too common part of how knowledge works.
. . .
IN 1999, Albert-László Barabási and Réka Albert wrote a celebrated paper, published in Science, one of the world’s premier scientific journals, about a process they termed preferential attachment. The process is responsible for creating a certain pattern of connections in networks—also known as a long tail of popularity—by the simple rule of the rich getting richer, or in this case, connections begetting more connections. For example, on Twitter there are a few individuals with millions of followers, while most users have only a handful. Their paper shows that by assuming one simple rule—newcomers to a network are more likely to connect to its most popular members—you can explain the properties of the entire network, in Twitter or elsewhere, that we actually observe. Using a wide variety of data sets and some mathematics, they demonstrated this result rigorously.
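A minimal simulation makes the rule concrete. The sketch below is not Barabási and Albert’s actual model (their networks add several links per newcomer), but a stripped-down, one-link-per-newcomer version of the same rich-get-richer rule; the node count and random seed are arbitrary choices for illustration:

```python
import random
from collections import Counter

def preferential_attachment(n_nodes, seed=None):
    """Grow a network where each newcomer links to one existing node,
    chosen with probability proportional to that node's current degree."""
    rng = random.Random(seed)
    # Start with two connected nodes. `endpoints` records every edge
    # endpoint, so a uniform draw from it is degree-proportional sampling.
    endpoints = [0, 1]
    for new_node in range(2, n_nodes):
        target = rng.choice(endpoints)   # the rich get richer
        endpoints.extend([new_node, target])
    return Counter(endpoints)            # maps node -> degree

degrees = preferential_attachment(10_000, seed=42)
ranked = sorted(degrees.values())
# A handful of heavily connected hubs emerge, while the typical
# node keeps only a link or two -- the long tail of popularity.
print("max degree:", max(ranked), "median degree:", ranked[len(ranked) // 2])
```

Run repeatedly with different seeds, the identities of the hubs change, but the skewed shape of the distribution does not; that robustness is what the rule explains.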
Unfortunately, they weren’t the first. Derek Price, the father of scientometrics, had written a paper in the 1970s showing that one can get this same pattern by invoking a similar rule with respect to how scientific papers cite one another. But Barabási and Albert didn’t know about Price.
Price wasn’t the first either. Herbert Simon, a renowned economist, had developed the same idea in the fifties, and that in turn was the same concept that Udny Yule had published several decades earlier.
The general concept of preferential attachment is actually known by many names. In sociology it’s known as the Matthew effect, a term coined by Robert Merton, and it is related to something known as Gibrat’s Law when it comes to understanding how cities and firms grow.
More generally, Mikhail Simkin and Vwani Roychowdhury, the same scientists who explored the errors in scientific citations, examined a few models developed in physics that are widely used to explain certain types of probability distributions. This includes
the models behind everything from chain reactions to income distributions. They explored how these models have been reinvented again and again, and they go into great detail, over the course of thirty-five pages, eventually summarizing these successive reinventions in a large table. They show, for example, that something known as the branching process was discovered in the mid-1840s, only to be rediscovered in the 1870s, then again in 1922, 1930, 1938, 1941, and 1944. The Erdős–Rényi random graph, written about by Paul Erdős and Alfréd Rényi in 1960, was first examined in 1941 by Paul Flory, the chemist and Nobel laureate. As Stigler’s Law of Eponymy states: “No scientific law is named after its discoverer.” Naturally, Stephen Stigler attributes this law to Robert Merton.
Extreme cases of this can especially be found during times of war. Richard Feynman, the celebrated physicist, shared the Nobel Prize with two other physicists, including Sin-Itiro Tomonaga. Tomonaga’s work was conducted in Japan during World War II, so it did not spread to the wider scientific world or to the West; Feynman later reached similar results independently.
This sort of situation was most pronounced during the many decades of the Cold War. During the latter half of the twentieth century, scientists in the West often duplicated research done in the Soviet Union. Certain concepts in computer science, related to the difficulty of mathematical problems, were independently discovered in both the United States and the Soviet Union. In addition, a precursor to the laser was independently developed in both the East and the West. Knowledge was effectively hidden from a large portion of the world, and duplication was the natural result.
Far from duplication of discovery being a strange, isolated situation common only during times of war (or at least cold war), it seems to occur quite often. Known as multiple independent discovery, such duplications have sometimes occurred five or more times nearly simultaneously, and they can make innovation seem almost inevitable. Classic examples of simultaneous innovation are the telephone, for which two patents were filed on the same day; the discovery of helium; and even the theory of natural selection, which was proposed by both Charles Darwin and Alfred Russel Wallace.
In some of these cases (though by no means all), there was a certain amount of delay: A discovery was simply not known by one party and ended up being duplicated, sometimes years later. If knowledge had spread widely, such a thing would not have occurred. But knowledge can be hidden for other reasons. There are occasions in science when knowledge is hidden because it is so far ahead of its time.
. . .
THERE is a rule in online circles known as Godwin’s law. It states that as the length of an Internet discussion approaches infinity, the probability that someone will be compared to Hitler or the Nazis approaches one. Perhaps a corollary should be that the longer a discussion about Shakespeare goes on, the more likely someone is to argue that Shakespeare did not actually write the plays attributed to him.
I’m not going to weigh in on the question of whether William Shakespeare authored the plays attributed to William Shakespeare. However, the concern that leads to this discussion—the comparison of Shakespeare’s background and training to what he actually produced—is not unique to literature. In fact, there are many Shakespeares, in all fields, including mathematics and science. There are individuals who, based on their environment, seem highly unlikely to have done what they did. Their accomplishments give us a sense of how knowledge can move forward. And one of these Shakespeares was George Green.
George Green was a miller who lived in the town of Nottingham during the nineteenth century. He received one year of schooling when he was eight years old, in 1801, and then went to work in the mill and bakery of his father. Until nearly the end of his short life—he died at the age of forty-seven—he was completely unknown. Yet this entirely unremarkable man, whose background consisted of knowledge related to grain and baked goods, produced a variety
of unbelievable innovations in mathematics and physics. Two of his contributions are known as Green’s theorem and Green’s functions, the latter of which are complex enough to vex many mathematicians and physicists.
His first work, An Essay on the Application of Mathematical Analysis to the Theories of Electricity and Magnetism, published in 1828, introduced both of these concepts. Frankly, no one is entirely sure what happened between 1801 and 1828, or what led Green to lay the foundations for the mathematics eventually used in quantum mechanics. There was one other person trained in mathematics in Nottingham, and while we don’t know whether Green and this man interacted, they did live near each other; this seems to be the only plausible solution to the mystery. But however Green learned advanced mathematics, Einstein once remarked that Green’s contributions were decades ahead of what was expected. Because his work was so far ahead of its time, and came from so far outside the mainstream, it was almost completely unknown until after his death in 1841.