The Sabermetric Revolution: Assessing the Growth of Analytics in Baseball


61. Bill James, “Underestimating the Fog,” Baseball Research Journal 33 (2004), pp. 29–33.

62. Thomas Gilovich, Robert Vallone, and Amos Tversky, “The Hot Hand in Basketball: On the Misperception of Random Sequences,” Cognitive Psychology 17 (1985), pp. 295–314.

63. James, “Underestimating the Fog,” p. 33. Nonetheless, in chapter 4 of The Book, Tango, Lichtman, and Dolphin find a significant clutch effect.

64. Samuel Arbesman and Steven Strogatz, “A Journey to Baseball’s Alternate Universe,” New York Times, March 30, 2008.

65. McCotter performed what statisticians call a permutation test.
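The logic of a permutation test for hitting streaks can be sketched as follows. This is a simplified illustration, not a reconstruction of McCotter’s actual procedure: it assumes a single hypothetical season written as a list of game outcomes (1 for a game with at least one hit, 0 otherwise), and the function names are made up for the example. Shuffling the order of the games keeps the player’s overall rate of hit games fixed but removes any dependence between consecutive games. If the observed longest streak lies far out in the tail of the shuffled distribution, streaks are longer than chance alone would suggest.

```python
import random


def longest_streak(games):
    """Length of the longest run of consecutive 1s (games with a hit)."""
    best = current = 0
    for hit in games:
        current = current + 1 if hit else 0
        best = max(best, current)
    return best


def streak_permutation_pvalue(games, n_permutations=10_000, seed=0):
    """One-sided permutation p-value for the observed longest hitting streak.

    Under the null hypothesis of no streakiness, every ordering of the same
    games is equally likely, so we compare the observed streak against the
    longest streaks found in randomly shuffled orderings.
    """
    rng = random.Random(seed)
    observed = longest_streak(games)
    shuffled = list(games)
    at_least_as_long = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if longest_streak(shuffled) >= observed:
            at_least_as_long += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (at_least_as_long + 1) / (n_permutations + 1)


# Hypothetical 20-game stretch, for illustration only.
season = [1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0]
observed, p = streak_permutation_pvalue(season, n_permutations=2_000)
print(f"longest streak = {observed}, permutation p-value = {p:.3f}")
```

A study of streakiness across baseball history would apply the same idea to many player-seasons at once, which is where the multiple-testing concern discussed in note 67 arises.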

66. Trent McCotter, “Hitting Streaks Don’t Obey Your Rules,” Chance, November 2011.

67. Jim Albert had written extensively on the subject of streakiness, and he engaged McCotter in a debate about the interpretation of his work. Albert did not dispute the elegance of McCotter’s method or the novelty of his findings. However, he suggested that McCotter’s discovery (1) is likely the result of failing to correct for multiple testing (a smart observation that should be addressed, but probably not enough to overturn the result) and (2) concerns an effect so small as to be of no practical significance (an argument we find less persuasive).
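Albert’s first objection concerns multiple testing: when many streak lengths, players, or seasons are each tested at the 5 percent level, some “significant” results are expected by chance alone. One standard remedy is the Bonferroni adjustment, which compares each p-value to α/m when m tests are run. It is shown here purely as an illustration of the idea; it is an assumption, not necessarily the correction Albert had in mind, and the p-values below are hypothetical.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag which hypotheses remain significant after a Bonferroni correction.

    With m tests, each p-value is compared against alpha / m, which keeps the
    probability of at least one false positive (the family-wise error rate)
    at or below alpha.
    """
    m = len(p_values)
    threshold = alpha / m
    return [p <= threshold for p in p_values]


# Hypothetical p-values from testing ten different streak lengths.
p_values = [0.004, 0.03, 0.2, 0.01, 0.6, 0.04, 0.9, 0.07, 0.001, 0.5]
print(bonferroni_significant(p_values))
```

Without a correction, five of these p-values fall below .05; after it, only the two smallest (.004 and .001) survive, which shows how a handful of nominally significant results can shrink once the number of tests is taken into account.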

68. Lewis refers to this type of calculation in a passage about defense: Moneyball, p. 134.

69. The 119 sacrifice bunts by the Rockies were 60 percent more than the NL average in that season. Perhaps even more extreme examples were the California Angels in 1985 and 1986, who sacrificed about twice as often as the average AL team in both of those seasons.

70. Keith Woolner, “Baseball’s Hilbert Problems,” Baseball Prospectus, February 10, 2004, http://www.baseballprospectus.com/article.php?articleid=2551.

71. Email communication, September 25, 2012.

72. Ibid.

Chapter 5. The Moneyball Diaspora

1. Interestingly, cricket, where player performance is also discrete, has not witnessed a blossoming of analytics. Ahsan Butt, “Why Haven’t We Seen the Moneyball-ification of Cricket?” www.firstpost.com, December 11, 2012.

2. Accordingly, a more careful model would treat the net points from a basketball score as a positive function of how close the shot comes to the end of the twenty-four-second clock. The calculus changes again as a possession nears the end of a quarter.

3. For some stats, the number of occurrences might be greater outside baseball. For instance, the number of plate appearances is limited in baseball, whereas the number of shots in basketball is not as constrained. Still, some might argue that the relevant observation in baseball is the pitch, rather than the plate appearance, which would multiply the sample size several-fold.

4. For an informative history of TENDEX as well as other statistics in football, basketball, and baseball, see Dave Heeren and Pete Palmer, Basic Ball (Haworth, N.J.: St. Johann Press, 2011).

5. Email exchange, August 21, 2012.

6. These were players who were on the all-star team in the year of the championship or in the adjacent years.

7. APBR stands for Association for Professional Basketball Research.

8. Email from Dan Rosenbaum, August 30, 2012.

9. http://espn.go.com/nba/story/_/page/PERDiem-121214/john-hollinger-farewell-column.

10. Bob Carroll and Pete Palmer, The Hidden Game of Football, 1988; David Romer, “Do Firms Maximize? Evidence from Professional Football,” Journal of Political Economy 114 (2006), pp. 340–365.

11. We are using “analytics” here as basically interchangeable with “statistical analysis and programming.” Clearly, different analysts do different types of work. For instance, it is our understanding that the Ravens hired Sandy Weil in large measure due to his skills in data management and information systems.

12. http://slashdot.org/topic/bi/buffalo-bills-latest-nfl-team-to-embrace-data-analytics/, accessed on January 2, 2013.

13. According to at least one observer, John Pollard, a general manager for Stats LLC, the pace of adoption of analytics in NFL front offices began to accelerate after the 2011–2012 season. See Judy Battista, “More NFL Teams Use Statisticians,” New York Times, November 14, 2012.

14. See note 2 above.

15. See David J. Berri, Martin B. Schmidt, and Stacey L. Brook, The Wages of Wins: Taking Measure of the Many Myths in Modern Sport (Stanford: Stanford University Press, 2006), chs. 5–7; and David J. Berri and Martin B. Schmidt, Stumbling on Wins: Two Economists Expose the Pitfalls on the Road to Victory in Professional Sports (Upper Saddle River, N.J.: FT Press, 2010), chs. 2–3 and appendix A.

16. Berri’s models are log-linear.

17. Berri and Schmidt, in Stumbling on Wins, also report, inter alia, that the factors predicting when a player will be chosen in the draft do not predict how successful he will be in the NBA, and that draft position is a very weak predictor of a player’s future NBA performance. See pp. 136–137.

18. In hockey, this system has proved useful, but it also has clear limitations, including the interdependency of play and the fact that role players may be advantaged or disadvantaged by it. For instance, a defender who is called upon repeatedly to face the opposing team’s top line is more likely to earn a minus than one who faces the second line. The development of hockey analytics has been advanced significantly by Gabriel Desjardins at behindthenet.ca.

19. According to Berri and Schmidt (Stumbling on Wins, p. 183), for 239 players only 7 percent of the variation in a player’s adjusted plus/minus value in 2008–2009 was explained by what he did in 2007–2008. Further, if the sample is restricted to the 87 players who switched teams, the percentage of variance explained falls to 1 percent.

20. Indeed, to the extent that plus/minus scores lack consistency over time, a new metric based on parameters estimated in a regression that uses plus/minus as the dependent variable is likely to suffer from the same problem.

21. For an excellent discussion of the crucial role played by offensive linemen, the importance of measuring their output, and the immense complexity of doing so, see Benjamin Alamar and Keith Goldner, “The Blindside Project: Measuring the Impact of Individual Offensive Linemen,” Chance 11 (2011), available at http://chance.amstat.org/2011/11/the-blindside-project/; and B. Alamar and J. Weinstein-Gould, “Isolating the Effect of Individual Linemen on the Passing Game in the National Football League,” Journal of Quantitative Analysis in Sports 4, no. 2 (April 2008).

22. Passer Rating (often wrongly called “Quarterback Rating”) is frequently cited during NFL game broadcasts and in the media. Its meaning is suspect for several reasons, including the use of arbitrary weights for various outcomes of passing plays and the complete exclusion of running plays.
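To make the point about fixed weights concrete, here is a sketch of the NFL passer rating formula as it is commonly published. Each of four per-attempt components (completion rate, yards per attempt, touchdowns per attempt, interceptions per attempt) is rescaled with its own constants and clipped to the range 0 to 2.375, and the sum is rescaled so that the maximum rating is 158.3. Nothing in the inputs reflects running plays or sacks. The function name and the sample stat line are illustrative assumptions.

```python
def passer_rating(completions, attempts, yards, touchdowns, interceptions):
    """NFL passer rating, following the commonly published formula."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")

    def clip(x):
        # Each component is bounded between 0 and 2.375.
        return max(0.0, min(x, 2.375))

    a = clip((completions / attempts - 0.3) * 5)        # completion percentage
    b = clip((yards / attempts - 3) * 0.25)             # yards per attempt
    c = clip((touchdowns / attempts) * 20)              # touchdowns per attempt
    d = clip(2.375 - (interceptions / attempts) * 25)   # interceptions per attempt
    return (a + b + c + d) / 6 * 100


# Hypothetical stat line: 20 of 30 for 250 yards, 2 TD, 1 INT.
print(round(passer_rating(20, 30, 250, 2, 1), 1))  # 100.7
```

The constants (0.3, 5, 3, 0.25, 20, 25, and the 2.375 cap) are exactly the kind of fixed weights the note calls arbitrary: they set how many completions one interception or touchdown is worth, independent of game situation.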

23. See, for instance, Berri and Schmidt, Stumbling on Wins, appendix B, “Measuring Wins Produced in the NFL.”

24. The analytical approach here is similar to that used in baseball with the run expectancy matrix. See, for one, Tom Tango et al., The Book: Playing the Percentages in Baseball (Dulles, Va.: Potomac Books, 2007).
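For readers unfamiliar with the baseball version, a run expectancy matrix records, for each of the 24 base-out states, the average number of runs scored from that point to the end of the inning. The sketch below shows how such averages can be estimated from play-by-play records. The input format is a hypothetical assumption: each inning is a list of (bases, outs, runs_on_play) tuples describing the state before each plate appearance, with bases written as a three-character string such as "1_3" for runners on first and third. The two innings are invented for illustration. Football expected-points models follow the same logic but condition on down, distance, and field position instead.

```python
from collections import defaultdict


def run_expectancy(innings):
    """Average runs scored from each (bases, outs) state to the end of the inning.

    `innings` is a list of innings; each inning is a list of plate appearances
    recorded as (bases, outs, runs_on_play), where bases and outs describe the
    state *before* the play.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for inning in innings:
        runs_on_plays = [runs for _, _, runs in inning]
        for i, (bases, outs, _) in enumerate(inning):
            # Runs from this plate appearance through the end of the inning.
            runs_to_end = sum(runs_on_plays[i:])
            totals[(bases, outs)] += runs_to_end
            counts[(bases, outs)] += 1
    return {state: totals[state] / counts[state] for state in totals}


# Two hypothetical innings, for illustration only.
innings = [
    # single, double (runner to third), two-run single, double play, out
    [("___", 0, 0), ("1__", 0, 0), ("_23", 0, 2), ("1__", 0, 0), ("___", 2, 0)],
    # solo home run, then three outs
    [("___", 0, 1), ("___", 0, 0), ("___", 1, 0), ("___", 2, 0)],
]
re_matrix = run_expectancy(innings)
print(re_matrix[("___", 0)])  # 1.0
```

With real data, the value of an event can then be measured as the change in run expectancy it produces plus any runs that score on the play.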

25. See Jim Armstrong, “Aggressiveness Index 2011,” Football Outsiders, January 27, 2012, http://footballoutsiders.com/stat-analysis/2012/aggressiveness-index-2011, for a discussion of these studies.

26. Berri and Schmidt, Stumbling on Wins, pp. 35–37, 182.

27. For clarity, the following statistics are the coefficient of determination ($R^2$), not the correlation coefficient ($r$).
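The distinction matters because the two numbers can look very different: in a simple one-variable linear regression, $R^2$ is the square of $r$, so an $R^2$ of .25 corresponds to a correlation of .5. The sketch below computes both from scratch on made-up data, with no statistics library assumed.

```python
from math import sqrt


def pearson_r(x, y):
    """Correlation coefficient r between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


def r_squared(x, y):
    """Coefficient of determination R^2 for the least-squares line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    intercept = mean_y - slope * mean_x
    # Share of the variance in y that the fitted line leaves unexplained.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Made-up values of some statistic for five players in two consecutive seasons.
last_year = [1, 2, 3, 4, 5]
this_year = [2, 1, 4, 3, 5]
r = pearson_r(last_year, this_year)
print(round(r, 2), round(r_squared(last_year, this_year), 2))  # 0.8 0.64
```

A year-to-year correlation of .8 thus means that 64 percent of the variance in this season's values is explained by last season's.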

28. Hockey stats in the NHL also show much stronger year-to-year consistency; for example, the $R^2$ (percentage of variance explained) for goals per minute is 63 percent.

29. Possibly compounding this effect, if your team does not have a strong winning record, then it will be hard to convince GMs around the league that you have desirable players on your roster.

30. The NBA draft lottery is an annual event held by the league in which the teams that missed the playoffs in the previous season, or teams that hold the draft rights of such a team, participate in a lottery to determine the order of the NBA draft. In the draft, teams obtain the rights to amateur U.S. college players and other eligible players, including international players. The lottery winner gets the first selection in the draft. Under the current weighted lottery rules (in place since 1990), fourteen teams participate in the lottery. The lottery is weighted so that the team with the worst record, or the team that holds the draft rights of the team with the worst record, has the best chance to obtain a higher draft pick. The lottery determines the first three picks of the draft. The rest of the first round follows the reverse order of the remaining teams’ win-loss records (where a pick has been traded, the record of the team that originally held it applies). The lottery does not determine the draft order in the second round.
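The mechanism described above, a weighted draw for the first three picks followed by reverse order of record for everyone else, can be sketched as follows. The team names and the chance weights are illustrative placeholders, not the NBA’s actual odds, and the sketch ignores ties and traded picks. Drawing each pick in proportion to the weights of the teams not yet selected mirrors the league’s practice of discarding and redrawing a combination that belongs to a team that has already won a pick.

```python
import random


def run_lottery(teams_worst_first, weights, lottery_picks=3, seed=None):
    """Simulate a weighted draft lottery.

    teams_worst_first: lottery teams ordered from worst record to best.
    weights: chance weights (e.g. numbers of ball combinations), same order.
    The first `lottery_picks` selections are drawn at random in proportion to
    the remaining teams' weights; everyone else picks in reverse order of record.
    """
    rng = random.Random(seed)
    remaining = list(zip(teams_worst_first, weights))
    order = []
    for _ in range(lottery_picks):
        names = [team for team, _ in remaining]
        chances = [w for _, w in remaining]
        winner = rng.choices(names, weights=chances, k=1)[0]
        order.append(winner)
        # Removing the winner is equivalent to redrawing if it comes up again.
        remaining = [(team, w) for team, w in remaining if team != winner]
    # Teams that did not win a lottery pick keep their worst-to-best order.
    order.extend(team for team, _ in remaining)
    return order


# Hypothetical 14-team lottery with steadily declining (illustrative) weights.
teams = [f"Team {i}" for i in range(1, 15)]
weights = list(range(14, 0, -1))
draft_order = run_lottery(teams, weights, seed=42)
print(draft_order)
```

One consequence of drawing only three picks is that the team with the worst record can fall no lower than fourth.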

31. Graham MacAree, “Football Analytics: Finding Bill James’ Cipher,” www.weaintgotnohistory.com/2012/7/8/3143394/football-analytics.

32. Jere Longman, “Messi’s Brilliance Transcends Numbers,” New York Times, December 11, 2012.

33. Armstrong, “Aggressiveness Index 2011.”

34. It is interesting that even though there have been no clearly demonstrated advantages to the use of football analytics, as there have been in baseball, those NFL front offices that practice analytics treat it with the utmost confidentiality. A 2012 article in the New York Times commented: “Few teams like to talk about the degree to which they use analytics because they fear giving away a competitive advantage. One general manager whose team does delve into statistics, but who didn’t want to be identified, wondered why the [Baltimore] Ravens announced the hire [of a director of football analytics in August 2012] at all.” Battista, “More NFL Teams Use Statisticians.” Meanwhile, the San Francisco 49ers have been openly stocking up on techies from Silicon Valley in the hope that they will uncover innovative applications. See Kevin Clark, “Silicon Valley Straps on Pads,” Wall Street Journal, December 11, 2012.

35. Benjamin Alamar and Vijay Mehrotra, “Sports Analytics, Part 2,” http://www.analytics-magazine.org/november-december-2011/476-sports-analytics-part-2, p. 5.

36. Battista, “More NFL Teams Use Statisticians.”

Chapter 6. Analytics and the Business of Baseball

1. Although it is the received wisdom, there are detractors who question how much competitive balance matters or which of its forms are the most significant. For one, see Simon Kuper and Stefan Szymanski, Soccernomics (New York: Nation Books, 2009). Competitive balance can apply to the uncertainty of individual games, the uncertainty of the outcome of a season, or the uncertainty of which teams will rise to the top from one year to the next.

2. To be sure, revenue sharing in MLB has other roles apart from promoting competitive balance. As it is structured in MLB, it lowers a player’s net marginal revenue product and, hence, other things being equal, lowers his salary. It is also intended to support the bottom line of financially fragile teams. For further discussion of these elements of revenue sharing, see Andrew Zimbalist, May the Best Team Win: Baseball Economics and Public Policy (Washington, D.C.: Brookings Institution Press, 2003).
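The effect on a player’s net marginal revenue product can be illustrated with a simple straight-pool plan, an assumption made here for clarity rather than a description of MLB’s actual formula: each team contributes a share t of its local revenue to a pool that is divided equally among N teams. An extra dollar of revenue generated by a player then nets the club only 1 - t + t/N, so the player’s value to the club, and hence the salary it is willing to pay, falls. The sharing rate and revenue figure below are hypothetical.

```python
def net_marginal_revenue(mrp, tax_rate, n_teams):
    """Revenue a club keeps from a player's marginal revenue product (MRP)
    under a hypothetical straight-pool revenue-sharing plan.

    The club pays tax_rate * mrp into the pool and receives back 1/n_teams
    of that contribution, so it keeps mrp * (1 - tax_rate + tax_rate / n_teams).
    """
    if not 0 <= tax_rate <= 1:
        raise ValueError("tax_rate must be between 0 and 1")
    return mrp * (1 - tax_rate + tax_rate / n_teams)


# A player who would add $10 million in local revenue, with a hypothetical
# 30 percent sharing rate among 30 teams.
print(round(net_marginal_revenue(10_000_000, 0.30, 30)))  # 7100000
```

Under these assumptions the club keeps about 71 cents of each dollar the player generates, which is the sense in which revenue sharing acts like a tax on talent.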

3. It is important to clarify that we are using year-end LRD (MLB’s Labor Relations Department) payroll data. LRD payroll is based on cash accounting for the entire forty-man roster. It does not use average annual contract value, as does MLB’s competitive balance tax accounting. Further, it is significant that it is year-end payroll that is always more highly correlated to team win percentage than is opening day or mid-year payroll. This is because teams that are doing well generally add players (and salary) as the season progresses, and teams that are doing poorly usually dump players (and salary).
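The difference between the two accounting conventions is easiest to see with a back-loaded contract. In the hypothetical three-year deal below, cash accounting charges each season what is actually paid that year, whereas average annual value spreads the contract’s total evenly across its years. The function names are illustrative.

```python
def cash_by_year(payments):
    """Cash accounting: each season is charged what is actually paid that year."""
    return list(payments)


def average_annual_value(payments):
    """AAV accounting: the contract's total value spread evenly across its years."""
    aav = sum(payments) / len(payments)
    return [aav] * len(payments)


# Hypothetical back-loaded three-year contract, in millions of dollars.
contract = [5, 10, 15]
print(cash_by_year(contract))          # [5, 10, 15]
print(average_annual_value(contract))  # [10.0, 10.0, 10.0]
```

Both conventions sum to the same total, but they assign different amounts to individual seasons, which is why the choice matters when payroll is compared with a single season’s win percentage.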

4. It is interesting to note that the $R^2$ between payroll and win percentage was actually higher in the years before the publication of Moneyball. For instance, between 1995 and 2002 the $R^2$ averaged .334. One might conjecture that this is due to the Yankees’ remarkable on-field success between 1996 and 2001, but this is only partially true: when the Yankees are excluded from the data, the $R^2$ drops by only about six percentage points, to .274.

5. More precisely, the idealized standard deviation is equal to $.5/\sqrt{n}$, where .5 represents a 50 percent chance for each team to win (i.e., the assumption that talent is equally distributed across all teams) and $n$ is the number of games each team plays per season.
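A sketch of how the idealized standard deviation is typically used: the actual standard deviation of teams’ win percentages is divided by $.5/\sqrt{n}$, a quantity often called the Noll-Scully ratio, so that values near 1 indicate no more spread than equal talent would produce by chance. The win percentages below are hypothetical, and a real calculation would include every team in the league.

```python
from math import sqrt
from statistics import pstdev


def idealized_sd(games_per_team):
    """Standard deviation of win percentage if every team had a 50 percent
    chance of winning each game: .5 / sqrt(n)."""
    return 0.5 / sqrt(games_per_team)


def noll_scully_ratio(win_pcts, games_per_team):
    """Actual spread of team win percentages relative to the idealized spread."""
    return pstdev(win_pcts) / idealized_sd(games_per_team)


# Hypothetical end-of-season win percentages on a 162-game schedule.
win_pcts = [0.599, 0.574, 0.543, 0.519, 0.506, 0.494, 0.463, 0.432, 0.407, 0.401]
print(round(idealized_sd(162), 4))  # 0.0393
print(round(noll_scully_ratio(win_pcts, 162), 2))
```

For a 162-game season, the idealized standard deviation is about .039, so a league whose actual standard deviation is twice that has a ratio of about 2.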

6. This explanation is discussed in greater detail in Andrew Zimbalist, Baseball and Billions (New York: Basic Books, 1992), chapter 4.
