Monday, July 14

I did some baseball research!

I'm a fan of baseball statistics, frequently eschewing traditional measures like batting average and RBI in favor of new-guard numbers like OPS+ and OBP, because I have read smart-sounding fellows describe how these metrics better capture players' value. Just for fun, I ran some numbers myself to reach my own conclusions.

I looked at the MLB team batting stats and graphed each team's runs per game against OPS+, OBP, and BA. I did this for 2008 (so far) and for all of 2007, for the AL, the NL, and all of MLB. For each, I calculated an R² value, a measure of how well one data set tracks another. 1 is perfect (e.g., plotting runs per game against runs per game), and 0 means no linear relationship at all. First, the 2008 numbers. Red is OBP, blue is OPS+, and green is BA.
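For anyone who wants to try this at home, here is a rough sketch of the calculation in Python. It assumes R² means the squared Pearson correlation, which is what you get from a straight-line fit of one variable against another. The function name `r_squared` is my own, and the team numbers are made up purely for illustration; they are not real MLB stats.

```python
from math import fsum


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = fsum(xs) / len(xs)
    mean_y = fsum(ys) / len(ys)
    # Sums of products of deviations from the means.
    sxy = fsum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = fsum((x - mean_x) ** 2 for x in xs)
    syy = fsum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when one variable is constant")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical numbers for five imaginary teams, NOT real MLB data.
runs_per_game = [4.2, 4.5, 4.8, 5.1, 5.4]
obp = [0.318, 0.331, 0.325, 0.342, 0.349]
batting_avg = [0.262, 0.255, 0.271, 0.259, 0.268]

# Sanity check: a series plotted against itself is a perfect fit.
print("runs vs. runs:", r_squared(runs_per_game, runs_per_game))
print("OBP vs. runs: ", r_squared(obp, runs_per_game))
print("BA vs. runs:  ", r_squared(batting_avg, runs_per_game))
```

Swap in one list per metric with one entry per team, and each call gives the number that goes with one line on the charts. If you have Python 3.10 or later, squaring the result of `statistics.correlation` should give the same value.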


OPS+ is the clear winner in the AL at 0.86, followed by OBP (0.64) and, bringing up the rear as it will continue to do, batting average (0.53). The NL is a bit more random - numbers are down across the board, but OPS+ is by far the winner (0.63) over the other two metrics. Maybe this is a result of the NL's different style, with pitchers hitting? When we include both leagues, naturally, the same order remains. Now let's have a look at last year, a larger sample size by about a factor of two:


Check out how OPS+ is still a winner for predicting AL run-scoring (0.83) and how the others have caught up somewhat, especially OBP at 0.78. Get on base, young hitters! Batting average is of some value, we see (0.65), but not nearly as instructive as these other numbers.

Last year's NL numbers are fascinating - OBP is on top! Its 0.73 R² value makes it a slightly better predictor than the ever-reliable OPS+ (0.71). I bet someone from SABR knows why this is, but I do not. Yet I do not fear it. Batting average, as usual, lags well behind at 0.45. Adding the two leagues together keeps OBP on top by a small amount - people aren't kidding when they say the most important thing in baseball hitting is not making outs. Some analysts think that, as a component of OPS, OBP is up to four times more valuable than SLG (I forget where I read that). Getting on base is important. Walking is important. Extra-base hits are important. A high batting average is not important.

I looked at the numbers for pitching as well, but ERA is essentially a direct count of runs allowed, so checking it against runs allowed is circular, and I lacked a solid angle for analysis. Also: this article is long. I'll defer to the fellows who have discovered Runs Created and the value of WHIP - as correct as they are on batting metrics, I'm willing to trust them when they say these are better indicators of pitching prowess than ERA or, heaven forbid, W-L record.

So there you have it: OPS+ is a very good predictor of runs, which we know are a predictor of wins (though some really anti-stat guys might say something clichéd like "all that matters is wins" without understanding how wins happen). OBP is nearly as good, occasionally even better. Batting average: not very good. Weird that I, a professional scientist, did this in some spare time, while I've yet to see a professional sportswriter do it even on an off-day. Get with the program!

Oh, and Go Tribe!
