Does this sound familiar? We are back in the dice game of the Chevalier de Méré. Had Pascal been miraculously reanimated at the L.A. County Courthouse, his testimony would have been very useful. He could point out that Los Angeles is a big place: once the population from which you draw your couples gets up around 4 million, the probability of two occurrences of these “1 in 12 million” characteristics (using the same power law calculation we used in Chapter 2) actually rises to about 1 in 3.
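For readers who want to see where that "1 in 3" comes from, here is a minimal sketch of one way to run the numbers, assuming the figures in the passage (a 1-in-12-million match probability and roughly 4 million couples); the framing of the calculation is a reading of the text, not a quotation from it:

```python
# A sketch of the calculation, assuming the passage's figures:
# a 1-in-12-million chance of a random match, and about 4 million
# couples in the Los Angeles area to draw from.
p = 1 / 12_000_000            # probability that any one couple matches
n = 4_000_000                 # couples in the candidate population

# Probability that at least one OTHER couple also matches:
another_match = 1 - (1 - p) ** n
print(f"{another_match:.2f}")  # ~0.28, i.e., roughly 1 in 3
```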
To give injustice its final varnishing, the prosecution made one of the most fundamental and common mistakes of probability, which, for obvious reasons, is known as the Prosecutor's Fallacy. This is the leap from the probability of a random match to the probability of innocence: having first assumed that the chance that another couple had the characteristics of the Collinses was 1 in 12 million, the prosecutor then assumed a 1-in-12-million chance that they had not robbed Mrs. Brooks (indeed, he cranked up the odds against innocence to "one in a billion" in his closing statement). Why is this a fallacy? Well, let's simplify it: what if the only identifying criteria were, say, being black and male? The chance of a random match over the U.S. population is around 1 in 20; does that, therefore, make any given black male defendant 19/20 guilty? Obviously not; but why not? Because there are uncounted ways of being innocent and only one way of being guilty; our legal system presumes innocence not just because the emperor Vespasian decreed it, but because of the underlying probabilities.
Even the fair-minded can fall into the Prosecutor's Fallacy, however, because they are bamboozled by the way these probabilities are presented. Suppose that we say instead: "Out of every 20 Americans, 1 is black and male. Two hundred people passed the crime scene that day, so we can expect, on average, 10 of them to be black males. Therefore, in the absence of other evidence, the chance that the defendant is the culprit is 1 in 10." Then, probability evidence would be a help rather than a hindrance. Juries, like doctors, find percentages much less comprehensible than frequencies, and are therefore much more likely to accept the prosecutor's view unquestioningly when told that the chance that, say, a DNA match is accidental is 0.1 percent rather than 1 in 1,000.
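The arithmetic behind that frequency framing is short enough to write out; a sketch using the passage's own illustrative figures:

```python
# The frequency framing, as plain arithmetic (figures from the passage).
passersby = 200
match_rate = 1 / 20                        # 1 American in 20 is black and male
expected_matches = passersby * match_rate  # 10 people, on average

# Absent other evidence, each of those 10 is equally likely to be the culprit:
p_culprit = 1 / expected_matches
print(expected_matches, p_culprit)         # 10.0, 0.1 -- i.e., 1 in 10
```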
The Collins jury was swept away by the numbers and voted guilty; the conviction was overturned on appeal, and the case is taught in every law school as a primer in errors to avoid, but these are persistent errors which, though chased out the door, return through the window.
In late 1999 Sally Clark, an English lawyer, was tried for murdering her two infant sons. Eleven-week-old Christopher had died in 1996, of what doctors believed at the time was a lung infection; a little over a year later, 8-week-old Harry died suddenly at home. The medical evidence presented at the trial was complex, confusing, sometimes contradictory, and generally inconclusive, revolving around postmortem indications that might suggest shaking, suffocation, or attempts at resuscitation, or might even be artificial products of the autopsy. Sally Clark's defense was that both children had died naturally. The phrase "sudden infant death syndrome" arose sometime during the pretrial discovery, and with it, probability stalked into the courtroom.
One of the most important prosecution witnesses was Professor Sir Roy Meadow, not a statistician but a well-known pediatric consultant. His principal reason for being there was to give medical evidence and to point out the suspicious similarities between the babies' deaths.
Meadow gave no probabilities for these apparent coincidences; but he had recently written the preface for a well-constructed, government-sponsored study of sudden infant death syndrome (SIDS) in which the odds of death were calculated against certain known factors (smoking, low income, young mothers). Sally Clark did not smoke; she was well paid and over 27, so the statistically determined likelihood of a death from SIDS in her family was 1 in 8,543. Following the lead of the study's authors, Meadow went on to speculate about the likelihood of two SIDS deaths appearing in the same family: "Yes, you have to multiply 1 in 8,543 times 1 in 8,543 . . . it's approximately a chance of 1 in 73 million."
He repeated the figure and added: “In England, Wales, and Scotland there are about say 700,000 live births a year, so it is saying by chance that happening will occur about once every hundred years.”
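Meadow's two steps, squaring the single-death figure and then dividing into the annual birth rate, can be reproduced directly; a sketch using his round numbers (the flaw, as the following paragraphs show, lies in the squaring itself):

```python
# Meadow's arithmetic, reproduced with his round figures.
p_one = 1 / 8_543                 # SIDS risk for a family with no risk factors
p_two = p_one ** 2                # squaring assumes the two deaths are independent
print(f"1 in {1 / p_two:,.0f}")   # 1 in 72,982,849 -- "1 in 73 million"

births_per_year = 700_000         # his figure for England, Wales, and Scotland
years_between = (1 / p_two) / births_per_year
print(f"about once every {years_between:.0f} years")  # ~104: "once every hundred years"
```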
The prosecuting barrister pounced: "So is this right, not only would the chance be 1 in 73 million but in addition in these two deaths there are features which would be regarded as suspicious in any event?" Professor Meadow replied, "I believe so."
The voice, once loosed, has no way to return. Did the professor realize that the numbers he was reciting would not only help imprison someone wrongly for four years, but would bring his own career to an ignominious end?
What was so wrong in what he said? Three things: first, SIDS is not the null hypothesis, the generic assumption if murder is ruled out. Nor is it a specific disease; it is by definition a death for which there is no apparent cause. Meadow himself later pointed this out: "All it is is a 'don't know.'" Ignorance, though, is not the same as randomness. We can certainly say that, in the UK, 1 family in 8,543 with no risk factors will, on average, suffer a death from SIDS: that is a statistical figure; it comes from observation. But to postulate that probability is at work, that these deaths result from rolling some vast 8,543-sided die, is not justified. Indeed, where any cause can be identified, such as a genetic predisposition or an environmental pollutant, the probability of two similar deaths in the same family would be much higher.
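To see how quickly dependence erodes the 73 million figure, here is a sketch in which some shared cause makes a second death r times more likely once a first has occurred; the values of r are purely hypothetical, chosen for illustration rather than taken from any study:

```python
# How dependence changes the arithmetic. The relative risk r below is
# HYPOTHETICAL: r = 1 recovers Meadow's independence assumption.
p_first = 1 / 8_543
for r in (1, 5, 10, 100):
    p_both = p_first * (r * p_first)   # second death r times more likely
    print(f"r = {r:>3}: 1 in {1 / p_both:,.0f}")
# r = 1 gives 1 in 72,982,849; r = 10 already gives 1 in 7,298,285,
# an order of magnitude away from the figure quoted in court.
```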
The second problem was the implication that a low probability of SIDS implied the guilt of Sally Clark. This prosecutor avoided committing his eponymous fallacy, but his use of "in addition" created a strong sense that any further evidence against the babies' mother merely reduced the already minuscule probability that their deaths could have happened naturally.
The third and most important flaw was that SIDS actually had nothing to do with the case. Sally Clark's defense team had never claimed that the babies' deaths were SIDS; it claimed they were natural. The postmortem examinations had revealed that something was wrong: these were not inexplicable deaths, they were unexplained. There was therefore no reason to discuss the probability of SIDS at all, except that the prosecution had assumed it would be the basis of the defense and had therefore spent time and effort in securing expert testimony on it. Unfortunately, none of these three objections was brought up in the cross-examination of Professor Meadow.
Who can know what will sway the heart of a jury? The medical evidence was complex and equivocal: there was nothing conclusive, nothing memorable. Now, out of a peripheral issue that should not even have been discussed, there arose this dramatic figure: 73 million to 1. In his summing up, the judge did indeed warn against excessive reliance on these numbers: "However compelling you may find them to be, we do not convict people in these courts on statistics. It would be a terrible day if that were so." But convict they did.
Four years later, Sally Clark's second appeal succeeded. There were evidentiary grounds, but the court also found that: “putting the evidence of 1 in 73 million before the jury with its related statistic that it was the equivalent of a single occurrence of two such deaths in the same family once in a century was tantamount to saying that without consideration of the rest of the evidence one could be just about sure that this was a case of murder.”
In a remarkable development, the Royal Statistical Society itself had issued an announcement to protest against "a medical expert witness making a serious statistical error, one which may have had a profound effect on the outcome of the case." Sir Roy Meadow was eventually struck off the medical rolls. The police and prosecution guidelines for infant mortality that he had helped develop (popularly described as "one is tragic, two suspicious, three murder") were scrapped; even the accepted standards for medical evidence came under suspicion. The press, which had reveled in morbid images of monster mothers, swiveled around to attack witchfinder doctors. The odds, always a dangerous way to deal with uncertainty, were reversed.
Yet even had Professor Meadow's 1-in-73-million calculation been relevant to the case, Bayes' theorem might have prevented a miscarriage of justice, because it would have made clear what likelihoods were actually being compared. If, in the absence of all other evidence, we agree to odds of 1 in 73 million against two cases of SIDS in the same family, what, using the same methods of calculation, are the odds against two cases of infanticide? One in 8.4 billion. Numbers need not only serve the prosecution; the statistical knife cuts both ways.
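One way to read that comparison is as a likelihood ratio between the two competing explanations; a sketch using the passage's two figures:

```python
# Setting the two rarities against each other, using the figures above.
p_two_sids = 1 / 73_000_000          # two SIDS deaths in one family
p_two_murders = 1 / 8_400_000_000    # two infanticides, by the same reckoning

# Before any other evidence is weighed, the relative odds of the two explanations:
odds = p_two_sids / p_two_murders
print(f"about {odds:.0f} to 1 in favor of natural death")  # ~115 to 1
```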
The likelihood of seeing Bayes make regular appearances in court is low. Juries are supposed to be ordinary people, using common sense in making their decisions, and judges are naturally dubious about anything that tends to replace common sense with the mysterious mechanism of calculation. The fear is that people may have an inflated respect for what they do not understand and convict an innocent suspect because “you can't argue with the figures.”
Unfortunately, bad probability has tended to drive out good. There are particular kinds of evidence for which Bayes' theorem would be a relatively uncontroversial guide for the perplexed: in evaluating identifications from fingerprint evidence, paternity tests, and DNA matching, for example. Here, the "learning term" in Bayes' equation is not a figure taken from the air: there are solid statistical reasons behind the numbers describing how likely a given identification is, or how reliable a positive result should be, given a genuine association.
Galton first proposed fingerprint evidence as a forensic tool, and more than a century's experience has failed to disprove the statistical uniqueness of everyone's individual set. Of course, matching the blurred partial print from the crime scene to the neat inked file card is a different matter; you would expect there to be known error rates that could be included in Bayesian calculation. But no: although all other expert testimony is now required by the Supreme Court to include its intrinsic error rates (the so-called Daubert ruling), fingerprint evidence is presented as absolute, a 100 percent sure yes or no. Indeed, the world's oldest and largest forensic professional organization forbids its members to make probabilistic statements about fingerprint identification, deeming it to be "conduct unbecoming."
Even where error rates are permitted, the courts are uneasy. In R. v. Adams, a recent British rape case, a positive DNA match was the only basis for identification; all the other evidence pointed away from the accused. The prosecution's expert witness gave the ratio of likelihood for the match, given the two hypotheses of guilt and innocence, as "at least 1/2,000,000 and possibly as high as 1/200,000,000." The defense's expert witness told the jury that the correct way to combine this ratio with the prior probabilities was through Bayes' theorem. Thus far, all was uncontroversial.
The defense went on to explain how Bayes' theorem would combine the probabilities based on the other evidence (that the victim did not recognize the accused in a lineup, that the accused was fifteen years older than the victim's description of her attacker, that the accused had an alibi for the time of the attack) before applying the likelihood ratio from the DNA match. Using conservative numbers for these independent probabilities results in a prior probability of guilt of around 1/3,600,000. This makes the likelihood ratio for the DNA match the critical question: if it is only 1/2,000,000, Bayes' theorem produces a probability of guilt of .36; if it is 1/200,000,000, the probability is .98. Knowing those two numbers should have focused the jury's deliberation on the key question: what was the real likelihood ratio of the DNA evidence?
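In odds form, the combination the defense described reduces to a single multiplication; this sketch reproduces the .36 and .98 figures from the 1/3,600,000 prior and the expert's two bounds, written here as likelihood ratios in favor of guilt:

```python
# Odds form of Bayes' theorem: posterior odds = prior odds * likelihood ratio.
prior_odds = 1 / 3_600_000               # guilt vs. innocence, from the other evidence

for lr in (2_000_000, 200_000_000):      # the expert's low and high bounds
    posterior_odds = prior_odds * lr
    p_guilt = posterior_odds / (1 + posterior_odds)
    print(f"LR {lr:>11,}: P(guilt) = {p_guilt:.2f}")
# LR   2,000,000: P(guilt) = 0.36
# LR 200,000,000: P(guilt) = 0.98
```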
Whatever it was the jury took into consideration, Adams was convicted, and the appeals court roundly condemned the use of Bayes' theorem:
The percentages [sic] chosen are matters of judgement: that is inevitable. But the apparently objective numerical figures used in the theorem may conceal the element of judgement on which it entirely depends . . . to introduce Bayes' Theorem, or any similar method, into a criminal trial plunges the jury into inappropriate and unnecessary realms of theory and complexity deflecting them from their proper task.
One might ask: what is the jurors' proper task, if not to use every means to clarify and define how they apply their common sense? But this is the current view: the technique has been forbidden, not because it doesn't work, but because the court may not understand it. We can continue to err, as long as we err in ways we find familiar.
In 1913, John Henry Wigmore, Dean of the Northwestern School of Law in Chicago, proposed "a novum organum for the study of Judicial Evidence." The reference to Francis Bacon cut two ways: it implied both that a science of evidence was possible and that it had not yet been achieved, that law still languished in medieval obscurity. Wigmore had come across a sentence in W. S. Jevons' Principles of Science that summed up the problems of shaping a mass of evidence to produce a just decision: "We are logically weak and imperfect in respect of the fact that we are obliged to think of one thing after another."
Dean Wigmore set out to arrange evidence not in time, but in space: not as a sequence, but as a network, showing “the conscious juxtaposition of detailed ideas for the purpose of producing rationally a single final idea.” Like all worthy tasks, this meant starting with a clean sheet of paper and a well-sharpened pencil.