08 April 2013

Rating Systems Challenge: Equations are Red, Experts are Blue

[Image: The Georgia Dome, site of the 2013 Final Four. Source: Wikimedia Commons]
This post is one in a series investigating how the various college basketball rating systems predict the men's NCAA Tournament. Click here to read the rest.

While tonight's contest will feature a battle between two squads of college students who represent the pinnacle of their sport, the winner will also determine whether the best prognosticators of NCAA Tournament Basketball were a council of experts—the survey samples for the ESPN and AP Preseason Polls—or advanced models developed by Nate Silver and ESPN.

In The Signal and the Noise, author Nate Silver makes a convincing argument that experts are typically bad prognosticators and groups of experts are seldom better. Instead, we should look at rigorous, testable models based on objective, verifiable data. However, even advanced models can fail to predict outcomes in low-information systems where minor variations can cascade into major disruptions.

Weather forecasting is one attempt to simplify such a system. Forecasting tournament play in sport is another.


To borrow more from Mr. Silver, in both meteorology and sports prognostication, simple historical baselines do not provide much useful information. Sure, there may be a 20% chance of rain* on any given April 10th in Atlanta, GA, but that doesn't tell Atlantans whether to take their umbrellas when they leave for work tomorrow. Likewise, while it may be true that a one-seed has won ~58% of NCAA Tournaments,** that tells us very little about whether one-seed Louisville is going to beat four-seed Michigan tonight in the Georgia Dome.

*not a researched estimate
**since the NCAA Tournament expanded to sixty-four teams, according to CBS Sports

Instead, we try to build models that rely on first principles (adiabatic heat exchange, Log5) and spit out probabilistic assessments. These models are most likely to fail when we have imperfect information and outcomes hinge on multistage causality. This is why, in a given year, a council of experts polled back in November might outwit even the most advanced models, including one designed by Mr. Silver himself, despite those models having the most up-to-date information.
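(Log5, for the curious, is Bill James's formula for converting two teams' standalone winning probabilities into a head-to-head probability. Here is a minimal sketch in Python; the Louisville and Michigan ratings are placeholder numbers made up for illustration, not output from any of the systems in this challenge.)

```python
def log5(p_a, p_b):
    """Bill James's Log5: probability that team A beats team B,
    given each team's win probability against an average opponent."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)

# Hypothetical ratings, purely illustrative placeholder values.
louisville, michigan = 0.92, 0.85
print(f"P(Louisville beats Michigan) = {log5(louisville, michigan):.3f}")
```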



A win for Louisville is a win for math: the FiveThirtyEight and ESPN Computer brackets may be trailing, but a Cardinals title would put them over the top. Conversely, a win for Michigan is a win for the experts, or at least November's experts. The preseason polls expected Indiana to win it all, so while they may lead now, a Michigan win is the only way they can hold onto that lead.

Either way, do not let tonight's outcome distract from the fact that the postseason polls underperformed chalk yet again. Maybe the experts, be they journalists or coaches, aren't that smart after all.

Follow us on Facebook and Twitter—or stay tuned—to find out which system won Rational Pastime's 2013 Rating Systems Challenge.

5 comments:

Mark Monnin said...

Thank you for comparing the many systems for us. Which system(s) do you think are consistently best? It looks to me like FiveThirtyEight gets better each year.

JD Mathewson said...

Mark:

The FiveThirtyEight system has performed better than it did last year. Then again, just about any system would perform better than it did in 2011, when none of the systems made a correct pick after the Elite Eight. So I would be careful before calling it a trend.

I only started covering the Survival Model last year, but over time I think it has performed the best.

JD Mathewson said...

I'll be doing a full wrap-up, including a multiyear comparison, soon. Thanks for following!

Mark said...

Hey JD! Are you going to be doing another comparison this year?

JD Mathewson said...

Sure am! I'll be posting that multiyear comparison today (the one I said I was gonna do 11 months ago) and then it's off to the races.