**RPScore**

- This is my best guess of each team's "true" record, or the record one would expect a team to post if it played an infinite number of games against a .500 team on a neutral field.
- Because the distributions of its two components, **Elo Avg** and **Xº Win**, differ in breadth, **RPScore** is an average of the two components, each weighted by the other component's standard deviation.
- This ensures that neither component "outweighs" the other solely due to the breadth of its distribution.
- Since **Elo Avg** is seeded with projected team Wins Above Replacement, this has the effect of regressing early season ratings heavily towards preseason expectations.
**RPScore** = ((**Elo Avg** * **Elo Weight** + **Xº Win**) / (**Elo Weight** + 1)) + **Adjustment**
- **Elo Weight** is the amount that **Elo Avg** is weighted against **Xº Win** to determine **RPScore**.
- **Elo Weight** is based on the relative standard deviations of **Elo Avg** and **Xº Win** as well as how many regular season games have transpired.
**Elo Weight** = (SD of all **Xº Win** scores / SD of all **Elo Avg** scores)^((162 - regular season games played) / 162 + 1)
- **Adjustment** is the value that ensures that the average of **RPScores** equals .500.
- If the mean of all team **RPScores** deviates from .500, the difference between .500 and the initial mean is added to each final score to ensure that the mean of all **RPScores** equals .500.
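A minimal Python sketch of the blend described above. It assumes population standard deviation (the text does not say sample vs. population) and a single league-wide games-played count; the function names and inputs are hypothetical, not my production code.

```python
from statistics import mean, pstdev

def elo_weight(elo_avgs, x_wins, games_played, season_games=162):
    """(SD of Xº Win / SD of Elo Avg) ^ ((162 - games played) / 162 + 1)."""
    # Assumption: population SD across all teams.
    ratio = pstdev(x_wins) / pstdev(elo_avgs)
    exponent = (season_games - games_played) / season_games + 1
    return ratio ** exponent

def rpscores(elo_avgs, x_wins, games_played):
    """Blend Elo Avg and Xº Win for every team, then re-center on .500."""
    w = elo_weight(elo_avgs, x_wins, games_played)
    raw = [(e * w + x) / (w + 1) for e, x in zip(elo_avgs, x_wins)]
    adjustment = 0.5 - mean(raw)  # shifts the league mean to exactly .500
    return [r + adjustment for r in raw]

# Usage with made-up numbers for a three-team league at mid-season
scores = rpscores([0.55, 0.50, 0.45], [0.60, 0.50, 0.40], games_played=81)
```

Because the exponent runs from 2 on Opening Day down to 1 after 162 games, the SD ratio is squared early in the season and applied as-is at the end.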

**Elo Avg**

- Based on the Elo rating system most commonly used in chess, RP's version of Elo differs in several ways.
- First, my system uses Brandon Heipp's version (the second of his three) of David Smyth's Base Runs formula to predict the expected run differential for each game, calculates the Pythagenpat estimated win-loss record for that game, and then assigns a fraction of a win or loss to each team based on their Pythagenpat score. For instance, if the Nationals beat the Yankees 2-1, a traditional Elo system would assign a full win to Washington and a full loss to New York. Instead, RP's system assigns each team a fraction of a win based on their expected run differential if the two teams had played each other an infinite number of times.
- Second, my system adjusts for home field advantage (based on Matt Swartz's work) when calculating the expectation that either team will win the contest. The home team is awarded ~23 points for divisional match-ups, ~29 points for intraleague and ~35 points for interleague match-ups.
- Third, instead of starting each team out at 1000 at the beginning of the season, it adjusts their initial Elo to match a WAR depth chart where each team's expected total WAR is projected using Tom Tango's WARcel method.
- Starting Elo = ((Projected Team WAR + 52.8) / 162) * 700 + 650
- RP's Elo system assumes a K-factor of 8. This was arrived at by determining which K-factor produces estimations of next-game outcomes with the lowest Brier Score.
- The **Elo Avg** metric converts the Elo rating to a rate.
**Elo Avg** = 1 / (1 + 10^((1000 - Elo Rating) / 400))
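The pieces above can be sketched in Python as follows. The starting-Elo and Elo Avg formulas, the K-factor and the home-field bonuses come from the text; the expected-score and update steps are the textbook Elo formulas, which I am assuming here for illustration, and all function names are hypothetical.

```python
K_FACTOR = 8  # from the text

# Approximate home-field bonus in Elo points, by match-up type (from the text)
HOME_BONUS = {"divisional": 23, "intraleague": 29, "interleague": 35}

def starting_elo(projected_team_war):
    """Starting Elo = ((Projected Team WAR + 52.8) / 162) * 700 + 650."""
    return (projected_team_war + 52.8) / 162 * 700 + 650

def expected_score(home_elo, away_elo, matchup):
    """Home team's win expectation (assumed: standard Elo curve plus home bonus)."""
    diff = (home_elo + HOME_BONUS[matchup]) - away_elo
    return 1 / (1 + 10 ** (-diff / 400))

def update(home_elo, away_elo, home_win_fraction, matchup):
    """Assumed standard Elo update; home_win_fraction is the fractional
    (Pythagenpat-based) result in [0, 1] rather than a binary win or loss."""
    delta = K_FACTOR * (home_win_fraction - expected_score(home_elo, away_elo, matchup))
    return home_elo + delta, away_elo - delta

def elo_avg(elo_rating):
    """Elo Avg = 1 / (1 + 10^((1000 - Elo Rating) / 400))."""
    return 1 / (1 + 10 ** ((1000 - elo_rating) / 400))
```

Note that a team projected for 28.2 WAR starts at a rating of 1000, which converts to an **Elo Avg** of .500.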

**Xº Win**

- This is the simple average of **0º Win**, **1º Win**, **2º Win** and **3º Win**. It is inspired by Baseball Prospectus' Hit List Factor.
**Xº Win** = (**0º Win** + **1º Win** + **2º Win** + **3º Win**) / 4

**0º Win**

- This is simple win percentage.
**0º Win** = W / (W + L)

**1º Win**

- Pythagenpat, developed by Dave Smyth and Patriot, is a method for estimating true record using runs scored and allowed.
**1º Win** = Scored^X / (Scored^X + Allowed^X) where X = ((Scored + Allowed) / Games)^.287
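As a quick illustration, here is the Pythagenpat formula above as a small Python function. The function name and the sample run totals are made up for the example.

```python
def pythagenpat(scored, allowed, games):
    """Estimated win percentage from runs scored and allowed."""
    # The exponent floats with the run environment (runs per game)
    x = ((scored + allowed) / games) ** 0.287
    return scored ** x / (scored ** x + allowed ** x)

# Usage: a hypothetical team that scored 800 and allowed 700 over 162 games
estimate = pythagenpat(800, 700, 162)
```

A team that scores exactly as many runs as it allows comes out at .500 regardless of the run environment.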

**2º Win**

- This is Pythagenpat estimated win percentage but with the run differential replaced with Base Runs differential. It is inspired by Baseball Prospectus' 2nd Order Win Percentage.
- Furthermore, each of the component stats of Base Runs is multiplied by a decay factor, which intentionally underweights older totals at a rate of .999^day.
- For instance, a team's most recent one-game hit total (H_{0}) would be H * .999^*d* = H * .999^0 = H * 1 = H.
- A team's hit total from 10 days ago would count for H * .999^10 = .990 * H = H_{10}.
- A team's hit total from 100 days ago would count for H * .999^100 = .904 * H = H_{100}.
- A team's hit total from *n* days ago would count for H * .999^*n* = H_{n}.
- The sum of a team's decayed hits over a season of duration *n* would be H_{0} + H_{1} + ... + H_{(n-1)}. We can express this as ΣH_{(0...n-1)} or, more simply, H_{d}.
- As per David Smyth and Brandon Heipp, Base Runs = A * B / (B + C) + D where, when using decayed stats:
- A = H_{d} + BB_{d} - HR_{d} - CS_{d}
- B = 0.76*1B_{d} + 2.28*2B_{d} + 3.8*3B_{d} + 2.28*HR_{d} + 0.038*BB_{d} + 1.14*SB_{d}
- C = AB_{d} - H_{d}
- D = HR_{d}
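Here is a rough Python sketch of the decay and the Base Runs formula above. It assumes each stat is stored as a list of daily totals with index 0 being the most recent day; that layout, the dictionary keys and the function names are my own choices for illustration.

```python
DECAY = 0.999  # per-day decay factor from the text

def decayed_total(daily_values):
    """Sum daily totals, where daily_values[d] is the total from d days ago."""
    return sum(v * DECAY ** d for d, v in enumerate(daily_values))

def base_runs(s):
    """Base Runs = A * B / (B + C) + D.

    s maps stat name -> decayed total
    (assumed keys: H, BB, HR, CS, 1B, 2B, 3B, SB, AB).
    """
    a = s["H"] + s["BB"] - s["HR"] - s["CS"]
    b = (0.76 * s["1B"] + 2.28 * s["2B"] + 3.8 * s["3B"]
         + 2.28 * s["HR"] + 0.038 * s["BB"] + 1.14 * s["SB"])
    c = s["AB"] - s["H"]
    d = s["HR"]
    return a * b / (b + c) + d

# Usage: 10 hits today and 10 hits ten days ago, nothing in between
hits_d = decayed_total([10] + [0] * 9 + [10])
```

Running this once on a team's own batting stats and once on the stats it allowed gives the Base Runs scored and allowed that feed the Pythagenpat formula for **2º Win**.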

**SOS**

- This is the **2º Win** of a team's opponents, weighted for frequency, discounting stats logged against the team in question, and adjusted for park effects on H, 2B, 3B, HR and BB.

**3º Win**

- This is **2º Win** adjusted for strength of schedule (and, by extension, park effects). It is inspired by Baseball Prospectus' 3rd Order Win Percentage.
- **3º Win** is adjusted for strength of schedule by multiplying the odds ratio of a team's **2º Win** by the odds ratio of its strength of schedule.
**3º Win** = (**2º Win** / (1 - **2º Win**)) * (**SOS** / (1 - **SOS**)) / (1 + (**2º Win** / (1 - **2º Win**)) * (**SOS** / (1 - **SOS**)))
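The odds-ratio adjustment can be sketched in Python like this, reading each "x / 1 - x" term in the formula as the odds x / (1 - x). The function names are hypothetical.

```python
def odds(p):
    """Convert a win percentage into odds, p / (1 - p)."""
    return p / (1 - p)

def third_order_win(second_order_win, sos):
    """Multiply the two odds together, then convert back to a percentage."""
    combined = odds(second_order_win) * odds(sos)
    return combined / (1 + combined)

# Usage: a .550 team by 2º Win that has faced a .520 schedule
adjusted = third_order_win(0.55, 0.52)
```

A perfectly average schedule (**SOS** = .500) has odds of 1, so it leaves **2º Win** unchanged; a tougher schedule pushes the result up and an easier one pulls it down.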

**L7/L30**

- Change in **RPScore** over the past seven or 30 days multiplied by 1,000.
- **L7** = (**RPScore**_{0} - **RPScore**_{7}) * 1,000
- **L30** = (**RPScore**_{0} - **RPScore**_{30}) * 1,000
- **RPScore**_{0} = **RPScore** at present; **RPScore**_{n} = **RPScore** as of *n* days previous.

**Luck**

- This estimates how much better each team has performed relative to its **RPScore**.
- First, I re-scale each team's **0º Win** so that the range between the best and worst **0º Win** is the same as the range between the best and worst **RPScore**.
- Then I subtract each team's **RPScore** from its re-scaled **0º Win**.
- Multiplied by 1,000, this is how many basis points each team's record to date is over- or under-performing its **RPScore**.
**Luck** = ((((**0º Win** - 0.5) * ((**RPScore**_{MAX} - **RPScore**_{MIN}) / (**0º Win**_{MAX} - **0º Win**_{MIN}))) + 0.5) - **RPScore**) * 1,000
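For the curious, a small Python sketch that follows the **Luck** formula exactly as written above: it compresses each win percentage around .500 by the ratio of the two ranges, then subtracts **RPScore**. The function name and sample numbers are invented for the example.

```python
def luck(win_pcts, rpscores):
    """Luck for every team, in thousandths of win percentage."""
    # Ratio of the RPScore range to the 0º Win range across the league
    scale = (max(rpscores) - min(rpscores)) / (max(win_pcts) - min(win_pcts))
    return [((w - 0.5) * scale + 0.5 - r) * 1000
            for w, r in zip(win_pcts, rpscores)]

# Usage: three hypothetical teams
values = luck([0.70, 0.50, 0.40], [0.55, 0.50, 0.45])
```

A positive value means the team's actual record, once put on the same scale, sits above its **RPScore**; a negative value means it sits below.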

Those who are interested in access to my data may request it via email. I download all of the data I use to calculate ratings via Zach Panzarino's excellent *mlbgame* package for Python.

**I encourage anyone with questions, comments or criticisms to share them in the comments below.**
