Perception is a funny thing. Small differences that are real and important are often very hard to perceive, even when you pay close attention. Bill James made the point quite clearly in his 1977 Baseball Abstract. What is the difference between hitters with batting averages of .275 and .300? These averages range from decent to good. But how many more hits does the good hitter collect than the decent hitter over a two-week period? One. That's right, a mere two hits a month separates the wheat from the chaff among hitters. If we assume the batters average twenty at-bats per week, over a two-week span the good hitter will have twelve hits and the decent hitter will have eleven. Over the course of the season, that translates to twelve extra hits for the good player. Even if you watched every game, it would be hard to know who was the better player. And if you stepped out to the bathroom at inopportune times over the year, you could easily gain the wrong impression of each player's ability.
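The arithmetic above is easy to verify in a few lines, taking the text's assumption of twenty at-bats per week and a roughly twenty-four-week season:

```python
# Hits for a .300 ("good") and a .275 ("decent") hitter,
# assuming 20 at-bats per week as in the text.
AT_BATS_PER_WEEK = 20

def hits(avg, weeks):
    return round(avg * AT_BATS_PER_WEEK * weeks)

# Over two weeks: 12 hits vs. 11 -- a single hit apart.
print(hits(0.300, 2), hits(0.275, 2))            # 12 11

# Over a 24-week season: a dozen extra hits.
print(hits(0.300, 24) - hits(0.275, 24))         # 12
```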
But, because the numbers generated at low levels of competition can’t tell us much about the professional baseball career of teenagers, even when aggregated, we have to look to something else. Scouts developed some rules—tall is better than short, skinny is better than fat, etc.—and tried to quantify tools that were measurable—radar-gun fastballs, speed around the base path, the mechanics of a swing, or an arm motion. They might even conduct interviews to find out the unquantifiable: competitive drive, the heart of a champion, sanity, etc. For most of the history of baseball, this is all the scout had to go on, and this information was better than no information at all. Teams that scouted prospects instead of randomly selecting talent tended to win more games.
It’s common to hear stat-heads emphasize the psychological biases that can cause scouts to develop irrational likes or dislikes for certain types of players; using numbers removes such biases. I think these biases do exist, though certainly there are good scouts out there who have learned to avoid the obvious pitfalls. But we don’t even need to consider these biases to show why stat-head scouting can have advantages over traditional methods. Consider the scouting of a player like Barry Zito. Zito is known for his deceptive curveball and changeup, with a fastball in the high-80s.
Many teams refused to give him a chance because he couldn’t break 90 on the radar gun. While speed tells the scout something, speed just isn’t everything. The failure to identify Zito as a future Cy Young winner wasn’t a product of human scouting, just bad scouting. Relying on a radar gun, rather than examining deceptive ability, could cause a scout to conclude Zito would be a dud. But a good scout would not be stuck on this particular metric. The failure of scouting in this instance had to do with individuals, not the process.
This is a very simple discovery to make, and most people in baseball, including scouts, are probably aware of the weaknesses of the gun. But what about some other weaknesses in scouting that are yet undiscovered? How does a GM find those? Statistical methods offer insights as to the things scouts are missing. Maybe college pitchers who always pitch the second half of Sunday doubleheaders face weaker competition, which artificially inflates their strikeout rates. I have no idea if my conjecture is true, but certainly such quirks exist, which might take a long time to find out without statistical analysis. And even if you figure out that these stats ought to be discounted, how much should you discount them? A couple of scouts sitting around talking may eventually realize such a phenomenon, but it’s something a stats-geek with a spreadsheet can pick up in an hour of goofing around with data. And imagine if you’re the first team to figure this out. You get a huge jump on the competition for new talent.
The process is important to avoid mistakes, but not because statistics avoid the biases of human perception. How did the Oakland A’s figure out Zito was worthy of being drafted in the first round? From the stats. When you look at major-league players, you can see that many succeed in different ways. There are many young guys who have fastballs near the century mark but can’t make it out of Double-A. Other pitchers with much lower velocity have great success in the major leagues, like Greg Maddux with his mid-80s fastball. Still, a common trait among major-league pitchers is throwing hard, and very few players in the big leagues get away with fastballs under 90 miles per hour. How can you figure out which of these pitchers are more likely to succeed? What if we had a spreadsheet (like the popular Microsoft Excel) with a list of characteristics for every twenty-year-old college pitcher? Table 21 shows a hypothetical short list of stats for three pitchers.
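Table 21 is not reproduced in this excerpt, but a spreadsheet of the kind described can be sketched as a simple list of records. Every name and number below is invented for illustration:

```python
# Hypothetical stat lines for three twenty-year-old college pitchers.
# All values are made up for illustration, not real scouting data.
pitchers = [
    {"name": "Pitcher A", "fastball_mph": 96, "k_per_9": 8.1, "bb_per_9": 5.0, "era": 4.10},
    {"name": "Pitcher B", "fastball_mph": 87, "k_per_9": 9.5, "bb_per_9": 2.1, "era": 2.95},
    {"name": "Pitcher C", "fastball_mph": 91, "k_per_9": 7.0, "bb_per_9": 3.4, "era": 3.60},
]

for p in pitchers:
    print(f'{p["name"]}: {p["fastball_mph"]} mph, '
          f'{p["k_per_9"]} K/9, {p["bb_per_9"]} BB/9, {p["era"]} ERA')
```

Notice that no single column settles the question: the hardest thrower here has the worst walk rate and ERA, which is exactly the sorting problem the text goes on to describe.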
How can we know which of these metrics gives us the most information about the future success of a pitcher? Certainly, all scouts will have their ideas, which are mostly correct: speed is good, walks are bad, and so on. But how can we determine what is most important among a sea of data? Some guys have low walk rates but no speed, or vice versa. We can see any number of combinations across players. Over time, a group of scouts may be able to figure out what is most important, and how to discount the bad or inflate the good, but these are all guesses. If you are lucky, your team will have some of the better guessers. Statistics give us a method to sort through the data and spot trends according to different factors. A better understanding of which characteristics are important for predicting the future performance of a player adds some certainty to predictions. Statistical methods also produce guesses, but they are much more precise guesses. Here is how this works for projecting baseball players.
Statisticians have developed methods to quantify the impact of individual factors on outcomes when many factors are involved. As long as we have a large group of players, these methods identify how the individual characteristics of pitchers differ and how those differences in college translate to differences in the future. We can then identify what each factor is worth in terms of predicting the future: when one goes up, the quality of the player goes up; when another goes down, the player’s success falls. Each factor receives a weight based on how strongly it affects the outcome.
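The book's actual formula is not reproduced in this excerpt, but a weighted model of the kind just described might look like the following sketch. The coefficients are invented placeholders, not the author's estimates:

```python
# A generic linear model: predicted age-27 ERA as a weighted sum of
# college statistics. Every coefficient below is hypothetical.
def predict_era_at_27(k_per_9, bb_per_9, hr_per_9):
    intercept = 4.50
    return (intercept
            - 0.15 * k_per_9    # strikeouts lower the predicted ERA
            + 0.20 * bb_per_9   # walks raise it
            + 0.40 * hr_per_9)  # home runs allowed raise it

print(round(predict_era_at_27(9.0, 2.5, 0.8), 2))  # 3.97
```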
A simple linear model of this sort tells us how much each of these important characteristics predicts the ERA of a player at the age of twenty-seven. Certainly some players will do better, and some worse, but knowing these factors can tell us more about what a player’s true performance will be by the time he is twenty-seven than just guessing the average ERA of all twenty-seven-year-old pitchers. How can we possibly know this? Well, the laws of statistics give us certain insights about the likelihood of events occurring based on other events. A particularly useful tool of applied statisticians, multiple regression analysis, can be a big help. Appendix A provides a further description of this method.
What does multiple regression tell us? Multiple regression can spot patterns in the data, even when the data seem quite confusing. When we observe the statistical records of a large number of individuals, we can identify differences in predicting factors (strikeouts, walks, etc.) that move with or against the predicted metric (ERA). Human eyes really can’t track all of these changes at once; it would be too hard. Multiple regression uses mathematics to measure the changes in multiple factors simultaneously: it identifies when the predicting factors and the predicted metric move together from player to player, and how big those movements are. With history as our guide, that is, looking at what college players did in the past, a performance scout can see what these factors reveal about future performance. Thus the multiple regression analysis of past data provides information on how to value performance in the present as it relates to the future. We might learn that a pitcher who strikes out a lot of batters can be expected to have a lower ERA at twenty-seven, while a pitcher who walks a lot of batters ought to have a higher ERA. Maybe pitcher ERAs in the ACC and SEC need to be valued differently.
If a scout wanted to see how Sunday doubleheaders affect the prediction, he could add a variable counting the number of Sunday doubleheaders pitched and estimate its weight. In the end, these estimates are not just educated guesses; they are based on objective past performance, which generates weights from a history of events. And if the data set used is sufficiently large and accurate, we can feel reasonably confident in the estimated weights.
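A minimal sketch of how such weights could be estimated is ordinary least squares on a design matrix, here run on invented data that includes a Sunday-doubleheader count. The "true" weights are built into the synthetic data, so the regression should recover them:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic college stats for n pitchers (all numbers invented).
k9 = rng.normal(8.0, 1.5, n)        # strikeouts per nine innings
bb9 = rng.normal(3.5, 1.0, n)       # walks per nine innings
sunday_dh = rng.integers(0, 5, n)   # Sunday doubleheaders pitched

# Invented "true" relationship plus noise; the regression should
# recover weights close to -0.20, +0.25, and -0.10.
era27 = (4.5 - 0.20 * k9 + 0.25 * bb9 - 0.10 * sunday_dh
         + rng.normal(0, 0.3, n))

# Design matrix with an intercept column; solve by least squares.
X = np.column_stack([np.ones(n), k9, bb9, sunday_dh])
weights, *_ = np.linalg.lstsq(X, era27, rcond=None)
print(np.round(weights, 2))  # approximately [4.5, -0.2, 0.25, -0.1]
```

Each entry of `weights` is exactly the kind of factor weight the text describes: how much a one-unit change in that statistic moves the predicted age-27 ERA, holding the other factors fixed.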
These estimates will not be perfect, but they can identify and correct unseen errors that human scouts miss in some areas. The human mind is just not equipped to process such a wealth of information all at once. Just as a metal detector helps a beachcomber find coins buried in the sand, statistical analysis helps general managers find players buried among sweaty teenagers.
Efficient Efficiency
The use of statistical methods to enhance scouting represents another type of efficiency. Efficiency is also a term statisticians use to judge what are known as estimators. Estimators are theoretical tools for producing a numerical estimate from a sample of data. Rather than being a number, an estimator is a method for generating estimates. There are lots of different methods we could choose to generate estimates from a group of data.
For example, let’s say I was asked to predict the SAT score of a randomly selected student at a university. I would have many options to educate my guess. I could ask the next student who walked down the street what his score was and go with that score. Or I could go with the mean (total scores divided by total students), the median (the middle score), or the mode (the most common score) of the student population. These choices represent methods, or estimators, that will generate an estimate that we can use to predict. Statisticians try to determine properties of competing estimators that will minimize the mistakes, in size and frequency, of estimates. The “stat-heads versus scouts” debate is an argument over estimators, except that the estimators are not necessarily purely known mathematical functions of the data, like the mean or median, which statisticians use. Each camp makes predictions about the same population of players, based on different sets of estimators.
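The competing estimators in the SAT example are easy to compare directly. The scores below are invented:

```python
from statistics import mean, median, mode

# A small, invented population of SAT scores.
scores = [1100, 1150, 1200, 1200, 1250, 1300, 1480]

# Three different estimators applied to the same data: each is a
# method for producing a single predicted score.
print("mean:  ", round(mean(scores)))  # 1240
print("median:", median(scores))       # 1200
print("mode:  ", mode(scores))         # 1200
```

Each line is a different rule for turning the same data into one prediction, which is all an estimator is; the question statisticians ask is which rule makes the smallest and least frequent mistakes.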
Statistical estimators are judged on two properties: bias and consistency. First, we want our estimator to be unbiased, that is, not consistently above or below the true value we are estimating. We don’t want to choose a method that will regularly over- or underpredict how good a player will become. Second, we want to minimize the size of the prediction mistakes, a quality known as consistency. We would prefer that the mistakes we make about a player’s future performance be small and infrequent. When we choose from several unbiased estimators to predict a true outcome from a population, we want the one that minimizes the errors of the prediction. Fewer and smaller mistakes are preferred to more and larger mistakes, right? In the language of statistics, an estimator with greater consistency, or smaller variance, is said to be more efficient than a less consistent estimator. In this sense, being efficient means making fewer and smaller mistakes in predicting the future. This, in turn, increases the economic efficiency of the organization, because it needs to devote fewer resources to scouting talent. By adopting a new technology, the A’s became more “efficient” in their evaluation of talent.
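Efficiency in this statistical sense can be illustrated with a small simulation. Both the "ask one random student" method and the sample mean are unbiased for the population average, but the mean's errors are far smaller. All numbers below are invented:

```python
import random

random.seed(42)
POP_MEAN = 1200  # true population average (invented)

def one_trial(sample_size=25):
    # Draw a sample of invented SAT-like scores.
    sample = [random.gauss(POP_MEAN, 100) for _ in range(sample_size)]
    single = sample[0]                    # "ask one student" estimator
    mean_est = sum(sample) / len(sample)  # sample-mean estimator
    return single, mean_est

# Average squared error over many trials: the sample mean is the
# more efficient (lower-variance) estimator.
trials = [one_trial() for _ in range(2000)]
mse_single = sum((s - POP_MEAN) ** 2 for s, _ in trials) / len(trials)
mse_mean = sum((m - POP_MEAN) ** 2 for _, m in trials) / len(trials)
print(mse_single > mse_mean)  # True
```

Neither estimator is biased, yet one makes much larger mistakes on average. That is the sense in which the text says a more consistent estimator is more efficient.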
Hobbyist stat-heads that I know tend to have a bias toward college players when it comes to the draft. Indeed, in Moneyball, Lewis touts the A’s focus on college players over high school talent as an edge. But the edge isn’t because college players are better. The A’s focused on college players because they are more predictable using the statistical tools the A’s favor. A technological innovation in performance scouting, such as the DIPS ERA (a modified ERA based heavily on strikeouts, walks, and home runs), can increase efficiency in evaluating talent. You can use a DIPS ERA to better predict a college player, because of the quantity and quality of the observations, but probably not a high school player. And if a technology can be employed with higher confidence in one area than in another, it’s no surprise that the A’s would concentrate on the talent pool where the new technology is useful. Just as the cotton gin caused Southern farmers to switch to cotton farming in the late eighteenth century, where this technological innovation was relevant, so too did the A’s turn to the college talent pool where their inventions were useful. The reason for this choice is well documented in the book: