19 March 2014

Rating Systems Challenge: Multiyear Comparison

It's bracket time again!

As I have for the past few years, I will once again be running the Rational Pastime Rating Systems Challenge, a contest where I pit several bracket-picking systems against each other in an ESPN Tournament Challenge pool. Here's a look at how the various systems have performed over the last few years.

The table below lists the thirteen systems that I have tracked in each of the last three years, ranked by percentage of correct picks. Centile is each system's average percentile finish against other ESPN contestants. Now, anyone who picks a bracket knows it's not just about getting the most correct picks; it's about getting the right picks at the right time. That said, my hunch is that correct pick percentage is a better predictor of future performance than how each system finishes against other ESPN competitors.

System                  Correct (3yr)   Centile (3yr)
FiveThirtyEight         62%             84.7
Chalk                   62%             82.1
ESPN National Bracket   61%             84.5
Jeff Sagarin            61%             72.6
Pomeroy                 61%             69.2
AP Preseason Poll       61%             55.6
Vegas                   60%             77.0
Sonny Moore             60%             61.9
ESPN Decision Tree      59%             58.9
LRMC Bayesian           58%             55.4
NCAA RPI                56%             38.2
Lunardi RPI             54%             34.2
Nolan Power Index       53%             50.9

It's a credit to Nate Silver that his FiveThirtyEight model has both the highest pick percentage and a better finish against the general population than any other system I've tracked. In fact, over the last few years, it's the only system that has outperformed chalk (the NCAA's assigned seeding order) on a pick-by-pick basis.

Over the last couple of years, however, I have added more systems. The table below lists the nineteen systems that I have tracked over the previous two years.

System                  Correct (2yr)   Centile (2yr)
Survival                68%             85
ESPN Preseason Poll     67%             75
Jeff Sagarin            65%             94
AP Preseason Poll       65%             71
ESPN Computer           64%             97
FiveThirtyEight         63%             91
ESPN National Bracket   63%             89
Vegas                   63%             87
AP Postseason Poll      63%             87
Chalk                   63%             87
Pomeroy                 63%             82
ESPN Decision Tree      63%             81
Sonny Moore             63%             69
ESPN BPI                62%             86
ESPN Postseason Poll    61%             64
LRMC Bayesian           60%             73
NCAA RPI                57%             41
Lunardi RPI             56%             35
Nolan Power Index       53%             53

The FiveThirtyEight model performs well in this sample, but it's not the best. The most correct picks belong to the Harvard Sports Analysis Collective's (and now Sports Illustrated's) Survival bracket, which won the 2012 Rating Systems Challenge. However, the model's accuracy did not serve it well last year: it picked a lot of early contests correctly but missed later ones. As a result, the Survival model has performed at the 85th percentile against ESPN contestants, on average.
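To see why a bracket can lead in correct picks and still trail in the pool, here is a rough sketch of round-weighted scoring. It assumes the pool awards 10, 20, 40, 80, 160 and 320 points per correct pick from the round of 64 through the title game. The two brackets are invented for illustration and are not any system's actual results.

```python
# Assumed points per correct pick, round of 64 through the championship.
POINTS_PER_ROUND = [10, 20, 40, 80, 160, 320]
GAMES_PER_ROUND = [32, 16, 8, 4, 2, 1]


def summarize(correct_by_round):
    """Return (correct-pick percentage, total points) for one bracket."""
    for correct, games in zip(correct_by_round, GAMES_PER_ROUND):
        if not 0 <= correct <= games:
            raise ValueError("correct picks cannot exceed games in a round")
    pct = 100 * sum(correct_by_round) / sum(GAMES_PER_ROUND)
    points = sum(c * p for c, p in zip(correct_by_round, POINTS_PER_ROUND))
    return pct, points


# Hypothetical brackets: one strong early, one strong late.
early_heavy = [27, 12, 4, 1, 0, 0]  # 44 of 63 correct
late_heavy = [22, 9, 5, 3, 2, 1]    # 42 of 63 correct

for name, bracket in [("early_heavy", early_heavy), ("late_heavy", late_heavy)]:
    pct, points = summarize(bracket)
    print(f"{name}: {pct:.1f}% correct, {points} points")
```

Under these assumptions the early-heavy bracket gets more picks right (44 to 42) but scores 750 points to the late-heavy bracket's 1480, because each later round is worth twice the one before it.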

The top performer in terms of percentile over the last two years, which also tied for first in 2013, has been ESPN's computer picker, a system that hides behind their Insider paywall. The second-best percentile performer was Jeff Sagarin's computer system, followed by last year's co-champ, FiveThirtyEight.

So which systems should you rely on? Well, none in particular, but you should be wary of any system that doesn't outperform chalk. If a rating system can't do better than the Selection Committee's own rankings in the long run, then it has no business ranking basketball teams in the first place. Look at several of the well-performing systems, take injuries into account (FiveThirtyEight already does this) and, most of all, have fun.
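If you want to apply the chalk test yourself, the sketch below runs it against the two-year numbers above. The function name and the choice to check both columns are my own assumptions; note too that the percentages are rounded, so a system listed as tied with chalk at 63% may actually sit slightly above or below it.

```python
# (system, correct-pick %, ESPN percentile) from the two-year table.
TWO_YEAR = [
    ("Survival", 68, 85), ("ESPN Preseason Poll", 67, 75),
    ("Jeff Sagarin", 65, 94), ("AP Preseason Poll", 65, 71),
    ("ESPN Computer", 64, 97), ("FiveThirtyEight", 63, 91),
    ("ESPN National Bracket", 63, 89), ("Vegas", 63, 87),
    ("AP Postseason Poll", 63, 87), ("Chalk", 63, 87),
    ("Pomeroy", 63, 82), ("ESPN Decision Tree", 63, 81),
    ("Sonny Moore", 63, 69), ("ESPN BPI", 62, 86),
    ("ESPN Postseason Poll", 61, 64), ("LRMC Bayesian", 60, 73),
    ("NCAA RPI", 57, 41), ("Lunardi RPI", 56, 35),
    ("Nolan Power Index", 53, 53),
]


def outperforms(results, baseline="Chalk"):
    """List the systems that strictly beat the baseline on each measure."""
    by_name = {name: (pct, centile) for name, pct, centile in results}
    base_pct, base_centile = by_name[baseline]
    better_picks = [n for n, pct, _ in results if pct > base_pct]
    better_finish = [n for n, _, centile in results if centile > base_centile]
    both = [n for n in better_picks if n in better_finish]
    return better_picks, better_finish, both


better_picks, better_finish, both = outperforms(TWO_YEAR)
print("More correct picks than chalk:", better_picks)
print("Better percentile than chalk:", better_finish)
print("Better on both:", both)
```

On the rounded two-year figures, only Jeff Sagarin and ESPN Computer beat chalk on both measures.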

An Incomplete List of NCAAB Rating Systems:

