The Sabermetric Revolution: Assessing the Growth of Analytics in Baseball

Finally, any discussion about predictive analytics in baseball in the wake of Moneyball would be remiss without mention of its most successful practitioner. Nate Silver, an erstwhile economic consultant at KPMG, sold his projection system PECOTA to Baseball Prospectus in 2003, the year Moneyball was published. As we have detailed in Chapter 2, the success of Baseball Prospectus, which was in no small part due to the quality of PECOTA, gave Silver a venue for refining his predictive analytical skills. By the summer of 2008, Silver had shifted those skills onto a new project: FiveThirtyEight.com, a website devoted to predicting the outcomes of national elections in the United States. After Silver correctly predicted the winner of the presidential election (encompassing correct predictions for the winner in forty-nine of the fifty states and the District of Columbia), as well as every U.S. Senate race, he received nationwide acclaim, a place for his blog on the New York Times website, and a two-book deal reportedly worth three-quarters of a million dollars.[27] One is left to wonder if any of this would have happened had Billy Beane turned Lewis away.

In summary, we have outlined the basic techniques for how sabermetricians analyze the two major components of offense: baserunning and hitting. We have developed the dual notions of accuracy and reliability, and explored the properties of many commonly used metrics in this light. With these dual notions come disparate purposes: statistics that are accurate are useful to quantify contributions in a given year, but not always useful for predicting future performance. Conversely, statistics that are reliable might not assess what we really want to know (e.g., their relationship to runs scored), but they might give us a better sense of what the future is likely to bring. In our view, understanding this distinction is critical to moving beyond senseless debates about, for example, whether the player with the most RBIs should win the MVP award.

4

An Overview of Current Sabermetric Thought II
Defense, WAR, and Strategy

In this chapter we first turn our attention to how sabermetricians have approached the analysis of defense in baseball and then focus on unresolved issues in sabermetric thought. As we will see, for a variety of reasons, the accurate measurement of player contributions on the defensive side has proven far more elusive to sabermetricians than the corresponding offensive components.

Defense

Bill James’s model for expected winning percentage (and common sense) makes it clear that preventing the opposing team from scoring is just as important as scoring runs. The repeated refrain that “defense wins championships” is as prevalent in baseball as it is in other sports, although it is sometimes morphed into phrases like “good pitching beats good hitting.” But the concept of defense in baseball is more similar to that of soccer or hockey than it is to basketball or football. In the former sports, one player (the goalie, or pitcher) is designated almost entirely for defensive purposes, and he plays a tremendous role in determining the number of points that are allowed, while the majority of his teammates are often focused more on offense and play more complementary defensive roles. Conversely, in basketball all players must contribute on both offense and defense, and while some players may excel in one facet of the game or another, there is no designation that changes what certain players are allowed to do. In football, with the days of the two-way player largely in the past, a change of possession usually brings a complete changeover of personnel for both teams. What is difficult in baseball (or soccer or hockey) is measuring the extent of the defensive contribution of the pitcher (goalie) relative to the other fielders (defensemen). In the same way that we presented separate discussions of baserunning and hitting (which together comprise offense) above, in what follows we present separate discussions of fielding and pitching (which together comprise defense).

Pitching

The evaluation of pitchers changed remarkably little until the theory of Defense Independent Pitching Statistics (DIPS) was advanced by Vörös McCracken in January 2001.[1] Prior to DIPS, starting pitchers were evaluated by their won-loss record, earned run average (ERA), and other proxies for the depth and quality of their contribution (innings pitched, strikeouts, etc.), while relievers were judged by those same metrics as well as by saves and holds. Indeed, Bill James and Rob Neyer developed a simple formula to combine these elements into a Cy Young Predictor, which has correctly identified the winner of the Cy Young Award in sixty-seven of ninety cases since the award was first given to a pitcher in each league in 1967.[2]
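The Cy Young Predictor is an additive point system built from exactly the conventional statistics listed above. The sketch below uses the weights as the formula is commonly reproduced (innings against earned runs, strikeouts, saves, shutouts, wins and losses, and a bonus for pitching on a division winner); the chapter does not spell the formula out, so treat these weights as an assumption to check against James and Neyer's original. The function name, its arguments, and the stat line are hypothetical.

```python
def cy_young_points(ip: float, er: int, so: int, sv: int, sho: int,
                    w: int, l: int, division_winner: bool) -> float:
    """Cy Young Predictor points (weights assumed, as commonly reproduced)."""
    run_prevention = 5 * ip / 9 - er      # credit for innings, debit for earned runs
    strikeouts = so / 12                  # one point per dozen strikeouts
    saves = 2.5 * sv                      # lets relievers compete with starters
    record = 6 * w - 2 * l                # a win counts three times as much as a loss
    victory_bonus = 12 if division_winner else 0
    return run_prevention + strikeouts + saves + sho + record + victory_bonus


# Hypothetical starter: 180 IP, 60 ER, 120 K, 1 shutout, 15-8 on a division winner.
print(cy_young_points(180, 60, 120, 0, 1, 15, 8, True))  # 137.0
```

The heavy weight on won-loss record is a reminder that the predictor models how voters behaved, not how well a pitcher actually pitched.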

Although many had noted how a pitcher’s won-loss record depended heavily on factors that were entirely outside the pitcher’s control (e.g., the offensive performance of his teammates, or the performance of the relievers who followed him in the game), ERA was still considered to be an accurate reflection of the quality of a pitcher’s performance.[3] However, DIPS theory has largely discredited even this venerable metric by demonstrating that the percentage of balls put into play against a particular pitcher that fall for hits is much more subject to chance than conventional wisdom allowed. Although the original incarnation of DIPS theory (that pitchers had zero control over this ratio) has been weakened by several arguments, its general thrust, that pitchers have far less control over the batting average on balls put in play against them than was previously believed, is widely accepted within the sabermetrics community, both inside and outside of the industry. This tenet is now fundamental to understanding the relationship between pitching and defense. Indeed, even major league pitchers such as Zack Greinke and Brandon McCarthy are known to be disciples of DIPS.[4] This represents a considerable change since Moneyball, at which time “McCracken’s astonishing discovery about major league pitchers had no apparent effect on the management, or evaluation, of actual pitchers.”[5]

Given its importance, we explore this notion in some detail below. McCracken’s original conclusion was: “There is little if any difference among major-league pitchers in their ability to prevent hits on balls hit in the field of play.”

What does this mean? Certainly it does not mean, as has been a common misinterpretation, that there are no meaningful differences among pitchers. Nor does it mean that all pitchers should be expected to give up the same number of hits per inning, or even hits per batter faced. McCracken is talking about batting average on balls in play (BABIP), which we defined above as the ratio of base hits to balls in play, and observing that it appears to be largely outside the pitcher’s control. That is, once the ball has been put into play, it doesn’t seem to matter all that much whether it was put into play against Roger Clemens or Roger Craig. In a mathematical sense, McCracken’s claim is that the conditional probability of a hit against each pitcher, given that the ball has already been hit into play, is more or less a constant. Thus, from a statistical point of view, there is little difference among pitchers in terms of their skill at preventing hits on balls in play. That is, if the league-average BABIP is about .290, then most pitchers do not seem to possess sufficient skill to keep their BABIP significantly and consistently under .290. However, the difficulty of putting the ball into play against a particular pitcher varies dramatically and, according to McCracken’s theory, captures nearly all of the variation in pitcher skill. If this is true, then pitchers should be judged by what happens when the ball is not put into play against them; that is, by the number of strikeouts they record and the number of walks and home runs they yield.
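To make the conditional-probability reading concrete, here is a minimal sketch that computes BABIP alongside the three outcomes McCracken says a pitcher does control. It assumes the widely used definition BABIP = (H − HR) / (AB − K − HR + SF), which removes home runs and strikeouts (neither puts a ball in the field of play) and counts sacrifice flies as balls in play. The `PitchingLine` record, its field names, the per-batter-faced rates, and the stat line are hypothetical illustrations.

```python
from dataclasses import dataclass


@dataclass
class PitchingLine:
    """Counting stats against one pitcher (field names are illustrative)."""
    batters_faced: int
    at_bats: int
    hits: int
    home_runs: int
    strikeouts: int
    walks: int
    sac_flies: int


def babip(line: PitchingLine) -> float:
    """Estimate of P(hit | ball hit into the field of play)."""
    balls_in_play = line.at_bats - line.strikeouts - line.home_runs + line.sac_flies
    return (line.hits - line.home_runs) / balls_in_play


def dips_rates(line: PitchingLine) -> dict[str, float]:
    """Per-batter-faced rates of outcomes that never involve the fielders."""
    bf = line.batters_faced
    return {
        "K%": line.strikeouts / bf,
        "BB%": line.walks / bf,
        "HR%": line.home_runs / bf,
    }


line = PitchingLine(batters_faced=800, at_bats=720, hits=170, home_runs=20,
                    strikeouts=180, walks=60, sac_flies=5)
print(f"BABIP {babip(line):.3f}")  # BABIP 0.286
print(dips_rates(line))
```

On McCracken's view, the first number printed says little about this pitcher, while the three rates carry nearly all of the information about his skill.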

The problem with a conventional metric, such as ERA, is that it depends heavily upon how many hits a pitcher allows, which, in turn, depends on his BABIP, which does not appear to have much to do with him. Compounding the problem is that for most pitchers, the ball is very often in play. In general, about 70 percent of all plate appearances end with the ball being put into play, and, thus, if McCracken is to be believed, the outcome of about 70 percent of the plate appearances has little to do with the skill of the pitcher. Intuitively, this suggests that about 70 percent of a typical pitcher’s ERA might just be noise, making it a relatively poor metric for evaluating pitchers.
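The 70 percent figure is simple arithmetic on plate-appearance outcomes: remove the plate appearances that end without a fair ball in the field of play (strikeouts, walks, hit batsmen, and home runs), and what remains ended with the ball in play. The helper below and its season totals are hypothetical, and it ignores rare outcomes such as catcher's interference.

```python
def in_play_share(pa: int, so: int, bb: int, hbp: int, hr: int) -> float:
    """Fraction of plate appearances that end with a ball in the field of play."""
    not_in_play = so + bb + hbp + hr
    return (pa - not_in_play) / pa


# Hypothetical season for one pitcher who faced 700 batters.
share = in_play_share(pa=700, so=140, bb=55, hbp=7, hr=18)
print(f"{share:.1%} of plate appearances ended in play")  # 68.6% ...
```

For this made-up pitcher, 480 of 700 outcomes fall in the category that, on McCracken's reading, owes little to his skill.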

One of McCracken’s original observations is that great pitchers like Pedro Martinez and Greg Maddux ranked among the league’s best pitchers in BABIP one year, but then among the very worst the next.[6] If BABIP truly reflected a skill, does it make sense that these two masters would suddenly lose, and then regain, that skill overnight? Would your answer change if the question were about strikeout rate? That is, would you believe that Martinez or Maddux would have among the league’s highest strikeout rates in one season, and then among the lowest in the next?

These questions return us to the notion of the reliability of a statistic, and our belief that if a statistic is truly measuring a skill, then repeated measurements over a relatively short period of time should be similar. McCracken found evidence that this is not true of BABIP for pitchers. In our framework, we can quantify this by saying that the reliability of BABIP for pitchers is extremely low (R = 0.17, on a year-to-year basis), which we can see graphically in Figure 8. While the correlation is not zero, it is very low compared to other performance metrics, and the implication is that what BABIP measures is not so much a skill of the pitcher as a combination of factors that don’t seem to stay the same for a particular pitcher. Clearly, the quality of the fielders playing behind the pitcher plays a role in determining his BABIP. The configuration of the ballpark (e.g., height and distance of outfield walls, extent of foul territory) plays a role. And chance plays a role, but none of these things has much to do with the pitcher. Accordingly, we observe that BABIP fluctuates from year to year in a pattern that is not consistent with complete randomness (because some of those things do stay the same over time, and pitcher skill does play a role), but is consistent with a large amount of randomness, far more than many people were willing to concede.

Figure 8. Batting Average on Balls in Play for Pitchers Has Very Low Reliability

Conversely, statistics that measure what happens when the ball is not in play show much higher reliability. As it did for batters, strikeout rate shows the highest reliability (R = 0.759), which is why you don’t see great pitchers like Martinez or Maddux putting up low strikeout rates in their prime. This is because the percentage of batters that a pitcher strikes out has almost entirely to do with the skills of that pitcher, and while those skills may erode over time, they are not likely to change much from one season to the next. Similarly, a pitcher’s walk rate is relatively reliable, since it too depends almost entirely on the pitcher, and reflects a persistent skill (his control) that is unlikely to change over short periods of time.

In the discussion of predictive analytics above, we suggested that a better model for predicting BABIP for batters would be an important step forward in predicting the future performance of hitters. The same is true of pitchers, and in our view this represents something of a “holy grail” of baseball analytics.
The search for a better way to predict BABIP began with McCracken’s statement that pitchers had little or no control over BABIP, which implies that the best estimate of a pitcher’s future BABIP is the league-average BABIP. Recall that in the discussion of forecasting models we presented above, regression to the mean leads most models to predict the future value of a statistic by forming a weighted average of the estimate of the player’s ability and an estimate of the league average. The extent to which the first number is shifted toward the second varies from model to model, but it should reflect how large a role chance plays in the player’s observed value of the statistic. If the statistic measures pure chance, then the best estimate of every player’s future statistic is the league average. Conversely, if the statistic measures pure skill, then you would use the individual player’s estimate entirely and discard the league average as irrelevant. The gauntlet that McCracken threw down implied that the best estimate of each pitcher’s future BABIP was the league-average BABIP, or at least something very close to it.
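The weighted average described here fits in one line. A minimal sketch, assuming one common choice of weight: the statistic's reliability R, so that the projection keeps a fraction R of the player's own observed value and takes the rest from the league average (this is Kelley's classical true-score formula; individual forecasting systems choose their weights differently). The function name and example inputs are hypothetical.

```python
def regress_to_mean(observed: float, league_avg: float, reliability: float) -> float:
    """Shrink an observed value toward the league average.

    reliability = 0 -> pure chance: predict the league average.
    reliability = 1 -> pure skill: keep the player's own value.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return reliability * observed + (1.0 - reliability) * league_avg


# A pitcher who allowed a .250 BABIP in a league averaging .290,
# weighted by the year-to-year BABIP reliability quoted in this chapter.
print(round(regress_to_mean(0.250, 0.290, 0.17), 4))  # 0.2832
```

With R = 0.17 the projection lands much closer to .290 than to .250, which is McCracken's challenge expressed as a number.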
