01 April 2016

It Begins: 2016 Edition

This weekend marks the opening of the 2016 Major League Baseball regular season. Rational Pastime will be following it game-by-game, updating RPScore ratings along the way and checking in occasionally to see what insights the data yield.

As in the past, this year's RPScore ratings are seeded with WAR projections from FanGraphs and park data from ESPN. The season projections and spreads above are the result of running those data 1,024 times through my season simulator. At this point in the season, these projections are rather broad. For instance, the 20.6-win 90% confidence interval is nearly twice as large as the 12-win spread between the best and worst American League teams' median projections. Similarly, last year's disappointing Washington Nationals are just as likely to win 80 games as they are to win 100.*

*Please head on over to Cards Conclave to read my analysis of the Nationals' offseason and expectations for 2016.
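To give a feel for why the preseason intervals are so wide, here is a minimal sketch of a season simulation. It is not my actual simulator: it assumes a single fixed win probability for every game (no schedule, park or rating updates), and the names `simulate_season` and `win_interval` are made up for illustration.

```python
import random

def simulate_season(win_prob, games=162, rng=random):
    """Return the number of wins in one simulated season.

    Assumption: every game is an independent coin flip with the same
    win probability, which is far simpler than a real simulator.
    """
    return sum(rng.random() < win_prob for _ in range(games))

def win_interval(win_prob, runs=1024, coverage=0.90, seed=2016):
    """Simulate `runs` seasons and return (low, median, high) win totals."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    totals = sorted(simulate_season(win_prob, rng=rng) for _ in range(runs))
    tail = (1 - coverage) / 2
    low = totals[int(tail * runs)]             # roughly the 5th percentile
    high = totals[int((1 - tail) * runs) - 1]  # roughly the 95th percentile
    median = totals[runs // 2]
    return low, median, high

low, median, high = win_interval(0.5)
print(low, median, high, high - low)
```

Even for a perfectly average team, pure game-to-game luck over 162 games should produce a 90% interval in the neighborhood of 20 wins, which is why early projections deserve a healthy dose of skepticism.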

But first, let's take a look at how we get there from here.

I could begin 2016 by assuming every team is equally talented on opening day. We know this not to be true, but what other assumptions may we make? There are a variety of projection systems in the sabermetric universe, but all we need is a starting point. We don't need to overthink this; as the season continues, actual team performance will correct some opening day errors.

The FanGraphs WAR depth charts are among the most straightforward team projection systems available. FanGraphs uses the Steamer and ZiPS projections, combined with assumptions about each team's lineup, to project how many wins above replacement each team will generate. These numbers will serve as a suitable estimate of team talent for the RPScore system. From there, we can estimate how many wins each team's lineup will add and what Elo score we should seed the RPScore system with.

In the table above, I converted FanGraphs' WAR projections to a Wins Added stat by assuming a replacement level, where Wins Added = projected WAR + 81 - the average of all projected WARs. I then converted those Wins Added numbers to Elo ratings using the equation Elo = 700 * (Wins Added / 162) + 650, so that an exactly average, 81-win team starts at 1000. The Elo scores above will serve as each team's rating on opening day.
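The two conversions above can be sketched in a few lines of Python. The formulas come straight from this post; the team names and WAR totals below are placeholders, not actual FanGraphs projections, and the function names are my own for illustration.

```python
from statistics import mean

def wins_added(projected_war):
    """Wins Added = projected WAR + 81 - league-average projected WAR."""
    league_avg = mean(projected_war.values())
    return {team: war + 81 - league_avg for team, war in projected_war.items()}

def elo_from_wins_added(wa):
    """Elo = 700 * (Wins Added / 162) + 650."""
    return 700 * (wa / 162) + 650

# Placeholder WAR totals for three hypothetical teams (not real projections).
war = {"Team A": 45.0, "Team B": 35.0, "Team C": 25.0}

for team, wa in wins_added(war).items():
    print(f"{team}: {wa:.1f} Wins Added, Elo {elo_from_wins_added(wa):.0f}")
```

Because the league average is subtracted out, Wins Added always averages 81 across the league, and each additional projected win is worth 700 / 162, or a little over four Elo points.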

You may notice that the Wins Added projections derived from FanGraphs' depth charts differ by a couple of wins from the full-season projections in the graphic at the top of this post. That is the result of variations in strength of schedule. Based on FanGraphs' projections and the schedules each team faces, the defending National League Champion New York Metropolitans have the smoothest road ahead with an average opponent strength of .483. Thanks to the weakness of the rest of the NL East, the Mets' and Nats' easy schedules should grant them an extra three wins. At .509, the Orioles and Brewers have the rockiest rows to hoe.

Initial team strength is not the only variable the RPScore model considers. As game data come in, my model adjusts hits and walks for the parks in which each game takes place. Some parks favor hits and extended at-bats more than others, and these "park factors" are derived from five years of ESPN park factor data, except in the case of new parks or modified dimensions. In these cases, park values are fully regressed to the baseline value of 1.00 for all events (thus Marlins Park's 1.00 rating across the board for 2016).
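As a rough illustration of that rule, the sketch below derives a single park factor. It assumes a plain, unweighted average of the last five yearly values, which may not match the exact weighting I use; the function name and the sample numbers are hypothetical.

```python
from statistics import mean

BASELINE = 1.00  # neutral park: no adjustment to hits or walks

def park_factor(yearly_factors, new_or_modified=False):
    """Return the park factor to use for one event type (e.g. hits).

    Assumption: a simple mean of up to five seasons of factors.
    New parks or parks with changed dimensions are fully regressed
    to the baseline, as described above.
    """
    if new_or_modified or not yearly_factors:
        return BASELINE
    return mean(yearly_factors[-5:])

# Made-up yearly hit factors for an established park.
print(park_factor([1.08, 1.02, 1.05, 0.99, 1.06]))
# A park with new dimensions ignores its history entirely.
print(park_factor([1.08, 1.02, 1.05, 0.99, 1.06], new_or_modified=True))
```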

Moreover, the model adjusts game-by-game expectations for each team based on home field advantage, using Matt Swartz's numbers for division, league and interleague advantages. I converted each of these advantages to an Elo value that the model adds to or subtracts from each team's Elo rating when estimating the probability of victory.
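One way to picture that conversion is with the standard Elo expectation curve, which uses a 400-point logistic scale. That scale is an assumption here rather than something stated in this post, and the .540 home winning percentage is a placeholder, not one of Swartz's figures.

```python
import math

def elo_bonus(home_win_pct):
    """Elo points equivalent to a given home winning percentage.

    Assumption: the standard Elo logistic curve with a 400-point scale.
    """
    return 400 * math.log10(home_win_pct / (1 - home_win_pct))

def win_probability(home_elo, away_elo, bonus=0.0):
    """Home team's win probability after adding the home field bonus."""
    diff = home_elo + bonus - away_elo
    return 1 / (1 + 10 ** (-diff / 400))

bonus = elo_bonus(0.54)  # placeholder home winning percentage
print(round(bonus, 1))
print(round(win_probability(1000, 1000, bonus), 3))
```

By construction, two evenly rated teams meeting with this bonus applied give the home side exactly the winning percentage that was fed in.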

In RPScore's Elo system, ratings are adjusted based on a combination of factors: 1) each team's rating coming in (adjusted for home field advantage); 2) how dominant each victory or defeat is, based on the BaseRuns each team scored; and 3) how much emphasis the system puts on each individual game, known as the K factor. Several years of data indicate that a K factor of 16 produces the lowest root mean squared error for each individual game prediction.
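Those three ingredients fit together roughly as in the sketch below. The K factor of 16 is from this post; everything else is an assumption for illustration, in particular the 400-point Elo curve and the use of each team's share of total BaseRuns as the "dominance" of the result. The function names are hypothetical and this is not the exact RPScore formula.

```python
def expected_score(rating, opp_rating):
    """Standard Elo expectation (assumed 400-point scale)."""
    return 1 / (1 + 10 ** ((opp_rating - rating) / 400))

def baseruns_share(br_for, br_against):
    """Hypothetical dominance measure: share of total BaseRuns."""
    total = br_for + br_against
    return 0.5 if total == 0 else br_for / total

def update_ratings(home, away, br_home, br_away, k=16, home_bonus=0.0):
    """Return (new_home, new_away) ratings after one game."""
    # 1) ratings coming in, adjusted for home field advantage
    exp_home = expected_score(home + home_bonus, away)
    # 2) how dominant the result was, based on BaseRuns
    actual_home = baseruns_share(br_home, br_away)
    # 3) K factor scales how much a single game moves the ratings
    delta = k * (actual_home - exp_home)
    return home + delta, away - delta

# Two evenly rated teams; the home team out-produces the visitors 6 to 2.
print(update_ratings(1000, 1000, 6.0, 2.0))
```

Whatever dominance measure is used, the update is zero-sum: the points one team gains are exactly the points its opponent loses, so the league-wide average rating never drifts.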

These numbers will remain constant over the course of the year, except for park factor data, which I update at the start of the postseason. If you've read this far, I hope you'll stop by as the season continues to see how well your teams, as well as the RPScore system, are performing.

Play ball!
