MLB Projections

Explanation and Methods:
  • My MLB season projections are based on my team RPScore ratings, which you can view here.
  • After each update to the team ratings, I run a Monte Carlo simulation that plays out the remainder of the regular season 50,000 times.
    • Each game is simulated using a random number generator and the Log 5 method, incorporating both teams' ratings and applying an adjustment for home field advantage (using Matt Swartz's numbers).
    • Before each run, each team's rating is adjusted to account for the error between RPScore's estimate and the team's true talent level.
      • A random value is drawn from a Gaussian distribution centered on each team's estimated rating.
      • This produces a rating that lies some distance from the team's estimated rating but clusters around that original estimate.
      • The standard deviation of each distribution is set at 0.055. Historically, this value has produced the least error between the projected 80% confidence intervals and the actual 80th-percentile errors at the end of the season.
    • Additionally, as the simulation advances to later dates, it regresses each team's RPScore slightly towards .500.
      • For every day between the current date and the date being simulated, the simulator regresses each team's RPScore 0.2% towards .500.
      • This is consistent with how each team's RPScore regresses over the course of the season.
  • The table at the top of this page presents projected end-of-regular-season standings as well as high and low estimates for each team.
    • The 50% column shows each team's median end-of-season record. It is calculated by adding the simulated median rest-of-season wins to the wins the team has already banked, then dividing by the expected number of games played (usually 162).
    • The 10% column represents a reasonable lower bound for each team's final regular season record. The 90% column represents a reasonable upper bound for each team's final regular season record.
      • In 10% of the simulations, a team matches or underperforms its record in the 10% column.
      • In 90% of the simulations, a team matches or underperforms its record in the 90% column.
      • In 10% of the simulations, a team matches or outperforms its record in the 90% column.
    • Postponed games without a scheduled make-up date are assumed to be played after the final day of the regular season. This assumption holds until a make-up game is scheduled or it is decided that none will be played.
    • The GB column shows how far out of first place each team would finish if every team matched its median projection.
    • The rSOS column lists each team's expected strength of schedule over the remainder of the season (not adjusted for home field). It is the average rating of a team's remaining opponents, weighted by the number of games left against each.
    • The rH% column lists the percentage of games remaining that each team is currently scheduled to play in its home park.
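The simulation steps above can be sketched in Python. This is a minimal illustration under stated assumptions and is not the actual RPScore code.

- The home-field adjustment is applied as an odds-ratio bump using a hypothetical home win rate of .540. That figure is a placeholder, not Matt Swartz's numbers.
- The 0.2% daily regression is read as shrinking each rating's distance from .500 by 0.2% per day, compounded.
- Perturbed ratings are clamped to the range .01 to .99.
- The function names and the `(home, away, days_ahead)` schedule format are hypothetical.

```python
import random

SIGMA = 0.055             # SD of the rating-uncertainty Gaussian (from the text)
DAILY_REGRESSION = 0.002  # 0.2% per day towards .500 (from the text)
HOME_WIN_PCT = 0.54       # HYPOTHETICAL placeholder, not Matt Swartz's figure


def odds(p: float) -> float:
    return p / (1.0 - p)


def log5(p_a: float, p_b: float) -> float:
    """Log 5 probability that team A beats team B."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2.0 * p_a * p_b)


def home_win_prob(p_home: float, p_away: float) -> float:
    """Log 5 for the home team, shifted by an odds-ratio home edge (assumed method)."""
    o = odds(log5(p_home, p_away)) * odds(HOME_WIN_PCT)
    return o / (1.0 + o)


def perturb(rating: float, rng: random.Random) -> float:
    """Draw a 'true talent' rating from N(rating, SIGMA), clamped to [0.01, 0.99]."""
    return min(0.99, max(0.01, rng.gauss(rating, SIGMA)))


def regress(rating: float, days_ahead: int) -> float:
    """Shrink the distance from .500 by 0.2% per day, compounded (assumed reading)."""
    return 0.5 + (rating - 0.5) * (1.0 - DAILY_REGRESSION) ** days_ahead


def simulate_season(ratings, schedule, n_sims=50_000, seed=None):
    """Return team -> list of rest-of-season win totals, one entry per simulation.

    ratings:  team -> current RPScore-style rating (hypothetical inputs)
    schedule: list of (home, away, days_ahead) for every remaining game
    """
    rng = random.Random(seed)
    wins = {team: [] for team in ratings}
    for _ in range(n_sims):
        # One talent draw per team per run, reused for every game in that run.
        talent = {t: perturb(r, rng) for t, r in ratings.items()}
        tally = dict.fromkeys(ratings, 0)
        for home, away, days_ahead in schedule:
            p = home_win_prob(regress(talent[home], days_ahead),
                              regress(talent[away], days_ahead))
            tally[home if rng.random() < p else away] += 1
        for team, w in tally.items():
            wins[team].append(w)
    return wins


if __name__ == "__main__":
    # Hypothetical two-team example.
    ratings = {"NYY": 0.56, "BOS": 0.50}
    schedule = [("NYY", "BOS", 1), ("BOS", "NYY", 2), ("NYY", "BOS", 3)]
    sims = simulate_season(ratings, schedule, n_sims=10_000, seed=1)
    print(sum(sims["NYY"]) / len(sims["NYY"]))  # average NYY wins out of 3
```

In this sketch the perturbed rating is drawn once per simulated season rather than once per game. As a result, a team that draws a high rating stays strong for the whole run. Those correlated outcomes are what widen the spread between low and high estimates.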
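The table columns described above can be derived from the simulated rest-of-season win totals. This sketch relies on several assumptions:

- The 10% and 90% values use a nearest-rank percentile, so at least 10% of simulations fall at or below the 10% value and at least 90% fall at or below the 90% value.
- GB uses the standard formula: half the sum of the win gap and the loss gap to the leader, applied to one division's median-projection records.
- The remaining schedule is given as hypothetical `(home, away)` pairs.
- All function names are illustrative.

```python
import math
from statistics import median


def nearest_rank(values, q):
    """Smallest simulated value v such that at least a fraction q of sims are <= v."""
    ordered = sorted(values)
    k = max(1, math.ceil(q * len(ordered)))
    return ordered[k - 1]


def projection_row(banked_wins, sim_rest_wins, games=162):
    """10% / 50% / 90% columns as end-of-season winning percentages."""
    return {
        "10%": (banked_wins + nearest_rank(sim_rest_wins, 0.10)) / games,
        "50%": (banked_wins + median(sim_rest_wins)) / games,
        "90%": (banked_wins + nearest_rank(sim_rest_wins, 0.90)) / games,
    }


def games_back(records):
    """records: team -> (wins, losses) at the median projection, for one division."""
    lead_w, lead_l = max(records.values(), key=lambda r: r[0] - r[1])
    return {t: ((lead_w - w) + (l - lead_l)) / 2 for t, (w, l) in records.items()}


def rest_of_season(team, schedule, ratings):
    """Return (rSOS, rH%) for one team.

    schedule: list of (home, away) pairs for remaining games (hypothetical format).
    Averaging over every remaining game weights each opponent by games left.
    """
    games = [(h, a) for h, a in schedule if team in (h, a)]
    opponents = [a if h == team else h for h, a in games]
    rsos = sum(ratings[o] for o in opponents) / len(opponents)  # raw, no home adjustment
    rh_pct = 100.0 * sum(1 for h, _ in games if h == team) / len(games)
    return rsos, rh_pct
```

A team with no games remaining would raise a division-by-zero error in `rest_of_season`. A fuller version would need to handle that case separately.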

    Those who are interested in access to my data may request it via email. I download all of the data I use to calculate ratings via Zach Panzarino's excellent mlbgame package for Python. I encourage anyone with questions, comments or criticisms to share them in the comments below.
