MLB Projections (Beta)

Check out the blog for projections and ongoing coverage of the 2016 MLB Playoffs.



Explanation and Methods:
  • My MLB season projections are based on my team RPScore ratings, which you can view here.
  • After each update to the team ratings, I run a Monte Carlo simulation of the remainder of the regular season 50,000 times. Each game is simulated using a random number generator and the Log5 method, which combines both teams' ratings, with an adjustment for home-field advantage (using Matt Swartz's numbers).
    • Before each run, each team's rating is adjusted to account for the error between RPScore's estimate and the team's true talent.
      • A random value is drawn from a Gaussian distribution centered on each team's estimated rating.
      • The standard deviation of each distribution is assumed to equal the standard deviation of all RPScore ratings, which is usually between 0.06 and 0.07.
      • This produces a rating that differs somewhat from each team's estimated rating in any single run but clusters around the original estimate across runs.
    • The 50% column shows each team's median end-of-season record. It is calculated by adding the median simulated rest-of-season wins to the wins each team has already banked, then dividing by the expected number of games played (usually 162).
    • The 5% column represents a reasonable lower bound for each team's final regular season record. The 95% column represents a reasonable upper bound for each team's final regular season record.
      • In 5% of the simulations, a team matches or underperforms its record in the 5% column.
      • In 95% of the simulations, a team matches or underperforms its record in the 95% column.
      • In 5% of the simulations, a team matches or outperforms its record in the 95% column.
    • Postponed games with no make-up date are assumed to be played after the final day of the regular season until a make-up is scheduled or it is decided that the game will not be made up.
    • The GB column shows how many games behind first place each team would finish the regular season if every team matched its median projection.
    • The rSOS column lists each team's expected strength of schedule for the remainder of the season (not adjusted for home-field advantage). It is the weighted average of the ratings of the team's remaining opponents.
    • The rH% column lists the percentage of games remaining that each team is currently scheduled to play in its home park.
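
For readers who want to see how the steps above fit together, here is a minimal Python sketch. It is not my production code, and it rests on several assumptions: ratings are treated as if they were on a winning-percentage scale, the home-field bump (`home_edge`) is a made-up placeholder rather than Matt Swartz's actual numbers, and the function names, the toy two-team schedule, and the nearest-rank percentile rule are all illustrative choices.

```python
import math
import random


def log5(p_a, p_b):
    """Probability that team A beats team B under the Log5 formula."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)


def simulate_season(ratings, banked_wins, schedule, n_runs=1000,
                    sigma=0.065, home_edge=0.04, seed=None):
    """Return {team: sorted list of final win totals}, one entry per run.

    ratings:     {team: rating}, assumed to be on a winning-percentage scale
    banked_wins: {team: wins already recorded}
    schedule:    list of (home, away) pairs for the remaining games
    home_edge:   placeholder home-field bump, NOT Matt Swartz's figures
    """
    rng = random.Random(seed)
    totals = {team: [] for team in ratings}
    for _ in range(n_runs):
        # Draw a "true talent" rating for each team around its estimate,
        # clamped so the Log5 denominator can never reach zero.
        talent = {team: min(max(rng.gauss(rating, sigma), 0.01), 0.99)
                  for team, rating in ratings.items()}
        wins = dict(banked_wins)
        for home, away in schedule:
            p_home = log5(talent[home], talent[away]) + home_edge
            p_home = min(max(p_home, 0.0), 1.0)
            winner = home if rng.random() < p_home else away
            wins[winner] += 1
        for team in ratings:
            totals[team].append(wins[team])
    return {team: sorted(values) for team, values in totals.items()}


def percentile(sorted_values, q):
    """Nearest-rank percentile of an already sorted list, 0 < q <= 100."""
    rank = max(1, math.ceil(q / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


# Toy example: two hypothetical teams with six games left against each other.
ratings = {"A": 0.56, "B": 0.48}
banked = {"A": 80, "B": 70}
schedule = [("A", "B"), ("B", "A")] * 3
results = simulate_season(ratings, banked, schedule, n_runs=2000, seed=1)
for team, wins in results.items():
    low, mid, high = (percentile(wins, q) for q in (5, 50, 95))
    print(team, "5%:", low, "50%:", mid, "95%:", high)
```

With the nearest-rank rule, at least 5% of simulated seasons finish at or below the 5% value, which matches how the 5% and 95% columns are described above. Dividing each win total by the expected number of games played turns it into a record like the ones shown in the table.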

    Those who are interested in access to my data may request it via email. I download all of the data I use to calculate ratings via Zach Panzarino's excellent mlbgame package for Python. I encourage anyone with questions, comments or criticisms to share them in the comments below.
