Wednesday, March 18, 2009

NCAA: How Did the Selection Committee Make Its Picks?

We're one day away from the (real) start of the NCAA tournament, and before play gets underway I want to explore the factors behind the selection committee's picks. Which polls does it look at? Does it care about RPI? Does it care about more advanced statistical rankings?

A look at the top 24 teams in the tournament (the top six seeds in each of the four regions) gives us some insight into how the decisions were made. I looked at four main indicators: the AP Poll, the USA Today/ESPN Coaches Poll, RPI, and Jeff Sagarin's advanced rankings. I limited the data to the top 24 teams so that every team would have a ranking in all four measures.

Looking at each poll separately, we can see how well it predicts the tournament seeding on its own. The chart below shows the percentage of the variability in seeding (R²) explained by each poll.

As we can see, the AP Poll alone explains 81% of the variability in the committee's seeding, leaving the other 19% to other factors. The ESPN poll and RPI also explain a high proportion of the variance, while the Sagarin Ratings did significantly worse at predicting the committee's selections, explaining only 62% of the variability.
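The "% of variability explained" is the R² of a regression of seed on a single poll's ranking. As a minimal sketch, assuming an ordinary least-squares line fit of seed on rank (the post does not show its exact setup), it can be computed in plain Python as below. The `r_squared` helper and the eight-team data are hypothetical placeholders, not the 2009 numbers.

```python
from statistics import mean


def r_squared(x, y):
    """Share of the variance in y explained by a least-squares line on x.

    With a single predictor this equals the squared Pearson correlation
    between x and y.
    """
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical toy data, NOT the 2009 rankings: one poll's rank for
# eight teams and the seed line each team received.
poll_rank = [1, 2, 3, 4, 5, 6, 7, 8]
seed = [1, 1, 2, 1, 2, 2, 3, 2]
print(f"R^2 = {r_squared(poll_rank, seed):.2f}")
```

Running the same function once per poll gives one bar of the chart each.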

Of course, the selection committee does not look at just one of these polls; it has access to all four metrics (and many more). What happens when we put all four ratings into the model at once?

The AP Poll and RPI are highly significant predictors of the seeding, and the Sagarin Ratings are marginally significant. Once the other factors are included, however, the ESPN poll is totally worthless: it adds nothing significant to the model. The selection committee appears to discard the ESPN poll as redundant with, and inferior to, the AP Poll, which it holds in high esteem.
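Putting several ratings into one model is a multiple regression. The sketch below is a hypothetical pure-Python version (the post does not say what software it used): it solves the ordinary least-squares normal equations for an intercept plus one coefficient per rating and reports R². The helper names and the toy rankings are illustrative assumptions. It does not compute standard errors or p-values, which are what the significance statements above rest on.

```python
from statistics import mean


def solve(a, b):
    """Solve the square system a @ x = b (Gauss-Jordan, partial pivoting)."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(predictors, y):
    """Least-squares fit of y on named predictors plus an intercept.

    predictors: dict mapping a rating name to one value per team.
    Returns (coefficients, r_squared); coefficients include 'intercept'.
    """
    names = list(predictors)
    rows = [[1.0] + [predictors[k][i] for k in names] for i in range(len(y))]
    p = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    my = mean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return dict(zip(["intercept"] + names, beta)), 1 - ss_res / ss_tot


# Hypothetical rankings for eight teams (1 = best); NOT the 2009 data.
ratings = {
    "AP": [1, 2, 3, 4, 5, 6, 7, 8],
    "ESPN": [1, 3, 2, 4, 5, 7, 6, 8],
    "RPI": [2, 1, 5, 3, 4, 8, 6, 7],
    "Sagarin": [3, 1, 2, 6, 4, 5, 8, 7],
}
seed = [1, 1, 2, 2, 3, 3, 4, 4]

full, r2_full = ols(ratings, seed)
reduced, r2_reduced = ols({k: v for k, v in ratings.items() if k != "ESPN"}, seed)
print(f"all four ratings: R^2 = {r2_full:.3f}")
print(f"without ESPN:     R^2 = {r2_reduced:.3f}")
```

Refitting without a redundant predictor and comparing the two R² values is a quick way to see that it adds little; a formal comparison of the nested models would use an F-test.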

Removing the ESPN Poll from the model, we find the following regression results:

All three remaining factors are significant, with by far the most weight belonging to the AP Poll. RPI is also a strong factor, despite the fact that it is mostly garbage. Perhaps surprisingly, the advanced statistical metrics do carry some weight with the committee, albeit not a lot.

And so it appears that the selection committee's choices (at least for the top six seeds) are based roughly on the following: 40% AP Poll, 35% RPI, 15% Sagarin Ratings (or other similar statistical methods), and 10% unknown committee preferences.
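The post does not say how the regression output was turned into percentages. One possible heuristic, shown here purely as a hypothetical sketch and not as the method used above, is to treat 1 − R² as the "unknown" share and split R² across the predictors in proportion to the size of their standardized coefficients (raw coefficient × sd(x) / sd(y)). With correlated predictors such a split is only a rough guide. The function names, the raw coefficients and the R² value below are placeholders; the standard deviations assume each rating is a 1-24 ranking and the seeds run 1-6 with four teams each.

```python
from statistics import stdev


def standardize(coef, sd_x, sd_y):
    """Raw regression coefficient -> standardized (beta) coefficient."""
    return coef * sd_x / sd_y


def weight_shares(std_betas, r_squared):
    """Hypothetical heuristic: split R^2 across predictors in proportion
    to |standardized coefficient|; label the remainder (1 - R^2) 'other'."""
    total = sum(abs(b) for b in std_betas.values())
    shares = {name: r_squared * abs(b) / total for name, b in std_betas.items()}
    shares["other"] = 1 - r_squared
    return shares


# Assumed shapes of the data: rankings 1-24, seeds 1-6 four times each.
sd_rank = stdev(range(1, 25))
sd_seed = stdev([s for s in range(1, 7) for _ in range(4)])

# Placeholder raw coefficients and R^2, NOT fitted values from the post.
raw_coefs = {"AP": 0.12, "RPI": 0.10, "Sagarin": 0.04}
std_betas = {k: standardize(c, sd_rank, sd_seed) for k, c in raw_coefs.items()}
for name, share in weight_shares(std_betas, r_squared=0.88).items():
    print(f"{name:8s} {share:.0%}")
```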

The model holds up remarkably well (at least for these 24 cases; next year I'll have to try the method out BEFORE the committee makes its selections). It saw only a few things differently: #2 seed Duke as a #1 seed instead of UConn, #5 seed Purdue as a #4 seed instead of Xavier, and #6 seed West Virginia as a #5 seed instead of Illinois. Of course, this only explains the reasoning behind the selection committee's picks, not where teams should have been seeded. That post will have to wait another day.
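Comparing the model with the actual bracket needs one more step: sort teams by their fitted value and hand out seed lines four at a time, since each seed line holds four teams. The sketch below assumes that simple rule and ignores any bracketing constraints; the function names, team labels and fitted values are hypothetical.

```python
def assign_seed_lines(predicted, teams_per_line=4):
    """Rank teams by fitted seed value (lower = better) and hand out seed
    lines in groups: the best four become #1 seeds, the next four #2, etc."""
    order = sorted(predicted, key=predicted.get)
    return {team: i // teams_per_line + 1 for i, team in enumerate(order)}


def disagreements(model_seeds, actual_seeds):
    """Map each team the model seeds differently to (actual, model)."""
    return {
        team: (actual_seeds[team], model_seeds[team])
        for team in actual_seeds
        if model_seeds[team] != actual_seeds[team]
    }


# Hypothetical fitted values and actual seeds for eight teams.
predicted = {"Team A": 1.1, "Team B": 1.4, "Team C": 1.2, "Team D": 1.9,
             "Team E": 1.6, "Team F": 2.3, "Team G": 2.1, "Team H": 2.6}
actual = {"Team A": 1, "Team B": 1, "Team C": 1, "Team D": 1,
          "Team E": 2, "Team F": 2, "Team G": 2, "Team H": 2}

model = assign_seed_lines(predicted)
print(disagreements(model, actual))  # {'Team D': (1, 2), 'Team E': (2, 1)}
```

As with the Duke/UConn swap above, a disagreement usually shows up in pairs: one team moves up a line and another moves down to make room.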