17 May 2010

The Silliness of Pitching Wins (and Losses)

In the wake of recent developments that reveal the irrationality of awarding Wins and Losses to pitchers, I felt it would be an interesting exercise to visualize the extent of the silliness. I attempt to show how W-L record fails as both a descriptive and a performance statistic for starting pitching by plotting it, along with Win Probability Added (WPA) and Wins Above Replacement (WAR), against a performance stat (xFIP) and a descriptive stat (ERA).

If a winning statistic appropriately measures pitching performance, it should correlate well with other measures of pitching performance, such as xFIP. The stronger the correlation, the less scattered the points representing the two variables will be. In the chart below I plotted W-L Rate ((Wins - Losses) / Starts) in blue, WPA Rate (Win Probability Added / Starts) in red, and WAR Rate (Wins Above Replacement / Starts) in green against xFIP.
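The three per-start rates defined above can be sketched in a few lines of Python. This is only an illustration: the function name, the dictionary keys, and the sample season line are made up for the example and are not FanGraphs data.

```python
def rate_stats(wins, losses, wpa, war, starts):
    """Return the per-start W-L, WPA and WAR rates for one pitcher.

    All inputs are season totals accumulated in starts only.
    """
    if starts <= 0:
        raise ValueError("starts must be positive")
    return {
        # Parentheses matter: (Wins - Losses) / Starts, not Wins - (Losses / Starts).
        "wl_rate": (wins - losses) / starts,
        "wpa_rate": wpa / starts,
        "war_rate": war / starts,
    }


# Hypothetical season line, for illustration only.
example = rate_stats(wins=10, losses=5, wpa=1.5, war=3.0, starts=30)
print(example)
```

Dividing by starts puts pitchers with different workloads on the same scale, which is what makes the three stats comparable on one chart.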

Note: All the data presented herein are from starts only. Data from relief appearances by starters are not included. Source: FanGraphs

You can see that WPA Rate and WAR Rate correlate rather well with xFIP. Of course, WAR's strong correlation is expected, since FanGraphs' pitching WAR and xFIP are both based on unadjusted FIP.

WPA's strong correlation is a bit more surprising, since xFIP is supposed to eliminate contextual effects such as run support, defense and game state, to which both WPA and W-L record are subject. W-L Rate, however, is very fuzzy when plotted against xFIP, indicating that it is a poor measure of starter performance.

What's that you say? W-L record is a descriptive stat, not a performance stat? Wins and Losses must only measure what actually happened, not how much of what happened should be credited to the pitcher? Okay, I'll take the bait. Let's see how good W-L record is at describing outcomes by plotting it against another stat that is context-dependent, like ERA...

Not very well, it would seem. Again, WPA Rate and WAR Rate correlate with ERA far better than W-L Rate. The correlation matrix is posted below; as R approaches 1.0, the level of correlation between the two stats approaches perfection.

R       W-L Rate   WPA Rate   WAR Rate
xFIP    0.4901     0.6559     0.8223
ERA     0.6133     0.7485     0.7223
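For readers who want to reproduce numbers like these, here is a minimal sketch of a Pearson correlation in plain Python. It is an assumption that the table uses Pearson's R; the sample values below are invented for illustration and are not the data behind the table. Since a lower xFIP or ERA is better while a higher rate is better, the raw coefficient comes out negative, so the table presumably reports magnitudes.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and the two sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Invented values for five hypothetical starters (not real data).
xfip = [3.2, 3.8, 4.1, 4.6, 5.0]
wl_rate = [0.30, 0.15, 0.05, -0.10, -0.20]

r = pearson_r(xfip, wl_rate)
print("raw r:", r)            # negative: better pitchers have lower xFIP
print("magnitude:", abs(r))   # comparable to the unsigned values in the table
```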

It's not that W-L record itself is a terrible stat--a correlation of roughly 0.5 to 0.6 is nothing to sneeze at, and the statistical significance of the test is very high. It's just that we have so many better stats that tell us how well a pitcher performed and describe what happened in an individual game. And in that sense, W-L record is a failure.

In both comparisons, WPA and WAR do a better job than W-L record of measuring performance and describing outcomes, with WPA being a slightly better descriptor than WAR (0.7485 vs. 0.7223 against ERA) and WAR being a significantly better measure of performance (0.8223 vs. 0.6559 against xFIP).

Hopefully, the Cy Young and HOF voters will pay attention to these facts, building on the success of the 2009 Zack Greinke candidacy.

