Friday, January 8, 2010

A method for making strength-of-schedule adjustments

So let's say that we have the basic end products of a season of some sport (quizbowl, baseball, football, whatever): a win-loss record for each team, along with the schedule showing whom those wins and losses came against, and a few cumulative stats (perhaps points for and against, yards for and against, runs for and against, strikeouts for and against, whatever).

When determining which of two teams is stronger, pundits often attempt to appear sophisticated by moving past the low-hanging fruit of win-loss record, and they instead latch on to net statistics like net points. Generally, a team that outscores its opponents by an average of twenty points per game is better than one that outscores its opponents by an average of five. But is that any better or worse than saying that the former team is better because it went 14-2, not 10-6?

I say not really. I say not really in large part because strength of schedule is unaccounted for. A team playing a much weaker schedule can inflate its net points even more than it inflates its record, and vice versa. So we need to construct a baseline.

Suppose there are six teams in two divisions. Each team plays its two division opponents twice each and the three teams in the other division once each, for a seven-game season. We have teams A, B, C in one division and teams D, E, F in the other. I propose we construct

rat_A = net_stat_A + (1/7)(2*rat_B + 2*rat_C + rat_D + rat_E + rat_F)
rat_B = net_stat_B + (1/7)(2*rat_A + 2*rat_C + rat_D + rat_E + rat_F)
rat_C = net_stat_C + (1/7)(2*rat_A + 2*rat_B + rat_D + rat_E + rat_F)
rat_D = net_stat_D + (1/7)(2*rat_E + 2*rat_F + rat_A + rat_B + rat_C)
rat_E = net_stat_E + (1/7)(2*rat_D + 2*rat_F + rat_A + rat_B + rat_C)
rat_F = net_stat_F + (1/7)(2*rat_D + 2*rat_E + rat_A + rat_B + rat_C)
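
The six equations all follow one pattern, which might be written in a general form as below. The notation is my own shorthand and not part of the setup above: r_i stands for team i's rating, s_i for its net stat, g_ij for the number of games team i plays against team j, and n_i for team i's total number of games.

```latex
r_i = s_i + \frac{1}{n_i} \sum_{j \ne i} g_{ij}\, r_j ,
\qquad
n_i = \sum_{j \ne i} g_{ij}
```

In the six-team example, g_ij is 2 for a division opponent and 1 for a team in the other division, so n_i = 7 for everyone.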

What does this mean? Solving this system produces team ratings where zero is a perfectly average team: one that played an average schedule and put up zero net points against it, or one whose opponents' ratings averaged seven points above zero and that put up negative seven net points against them, et cetera. Moreover, these ratings can be solved pretty quickly with a little linear algebra, or with an iterative method. For the iterative method, let's suppose that we initialize all the ratings to the net statistics. Let's say A is +40 on the year, B is +5, C is -20, D is +30, E is -10, F is -45. So we write

new_rat_A = 40 + (1/7)(10 - 40 + 30 - 10 - 45) = 40 - 55/7 = 32.14
new_rat_B = 5 + (1/7)(80 - 40 + 30 - 10 - 45) = 5 + 15/7 = 7.14
new_rat_C = -20 + (1/7)(80 + 10 + 30 - 10 - 45) = -20 + 65/7 = -10.71
new_rat_D = 30 + (1/7)(-20 - 90 + 40 + 5 - 20) = 30 - 85/7 = 17.86
new_rat_E = -10 + (1/7)(60 - 90 + 40 + 5 - 20) = -10 - 5/7 = -10.71
new_rat_F = -45 + (1/7)(60 - 20 + 40 + 5 - 20) = -45 + 65/7 = -35.71

and so forth, feeding each new set of ratings back into the right-hand side until they stop moving. Of course, the linear algebra method is much more attractive, particularly with 100+ game schedules and 30+ teams.
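
To make the iteration concrete, here is a minimal Python sketch of it for the six-team example. The names (`games`, `step`, `iterate`), the stopping tolerance, and the choice of exact fractions are all my own assumptions for illustration; nothing about the method requires them.

```python
from fractions import Fraction

# Net stats from the example (season totals).
net = {"A": 40, "B": 5, "C": -20, "D": 30, "E": -10, "F": -45}

# Games against each opponent: twice within the division, once across.
divisions = [["A", "B", "C"], ["D", "E", "F"]]
division_of = {team: i for i, div in enumerate(divisions) for team in div}
games = {
    t: {u: (2 if division_of[t] == division_of[u] else 1) for u in net if u != t}
    for t in net
}


def step(ratings):
    """One pass: net stat plus the schedule-weighted average opponent rating."""
    new = {}
    for t in ratings:
        total_games = sum(games[t].values())  # 7 for every team here
        opp_sum = sum(n * ratings[u] for u, n in games[t].items())
        new[t] = net[t] + Fraction(opp_sum) / total_games
    return new


def iterate(tolerance=Fraction(1, 1000), max_passes=100):
    """Repeat passes until no rating moves by more than `tolerance` (assumed cutoff)."""
    ratings = {t: Fraction(v) for t, v in net.items()}  # start from the net stats
    for passes in range(1, max_passes + 1):
        new = step(ratings)
        change = max(abs(new[t] - ratings[t]) for t in net)
        ratings = new
        if change < tolerance:
            break
    return ratings, passes


first_pass = step({t: Fraction(v) for t, v in net.items()})
final, passes = iterate()
for t in net:
    print(t, round(float(first_pass[t]), 2), round(float(final[t]), 2))
```

The first pass matches the hand computation (A comes out to 225/7, about 32.14), and because the net stats sum to zero and everyone plays seven games, the ratings keep summing to zero on every pass. For this schedule the values should settle in under a dozen passes at the tolerance assumed here.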

This type of rating doesn't have to be based on net points: it can be net yards, for example, or a net measure of some type of advanced stat. It's also general: it doesn't rely on the specifics of any given sport.
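
Since nothing here depends on the sport, the linear algebra route can be sketched as one function that takes any net stat and any schedule. What follows is a rough Python sketch under a few assumptions of mine: the function name `solve_ratings` and the `(team, opponent, games)` matchup format are made up for illustration, and the elimination is done by hand with exact fractions rather than with a numerical library. One wrinkle worth flagging: because each team's opponent weights sum to one, adding the same constant to every rating leaves all the equations satisfied, so the system alone doesn't pin down a unique answer. The sketch assumes we fix that by swapping the last equation for "the ratings sum to zero", which keeps zero as the average team. It also assumes every team is connected to every other through the schedule and that the net stats are consistent with it (as they are below, where they sum to zero and everyone plays seven games).

```python
from fractions import Fraction


def solve_ratings(net, matchups):
    """Solve rat_i = net_i + (schedule-weighted mean of opponents' ratings).

    net:      {team: net stat}
    matchups: list of (team, opponent, games played), each pairing listed once
    """
    teams = sorted(net)
    index = {t: i for i, t in enumerate(teams)}
    n = len(teams)

    # games[i][j] = number of times team i played team j
    games = [[0] * n for _ in range(n)]
    for a, b, count in matchups:
        games[index[a]][index[b]] += count
        games[index[b]][index[a]] += count

    # Augmented rows for: rat_i - sum_j (games_ij / total_i) * rat_j = net_i
    rows = []
    for i, t in enumerate(teams):
        total = sum(games[i])
        row = [Fraction(-games[i][j], total) for j in range(n)]
        row[i] += 1
        rows.append(row + [Fraction(net[t])])

    # Assumption: pin the free constant by replacing the last equation
    # with sum(rat) = 0.
    rows[-1] = [Fraction(1)] * n + [Fraction(0)]

    # Gauss-Jordan elimination with exact fractions.
    # next() raises StopIteration if no pivot exists (e.g. disconnected schedule).
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [x / rows[col][col] for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]

    return {t: rows[index[t]][n] for t in teams}


# The six-team example: division opponents twice, everyone else once.
net = {"A": 40, "B": 5, "C": -20, "D": 30, "E": -10, "F": -45}
matchups = [("A", "B", 2), ("A", "C", 2), ("B", "C", 2),
            ("D", "E", 2), ("D", "F", 2), ("E", "F", 2)]
matchups += [(x, y, 1) for x in "ABC" for y in "DEF"]

ratings = solve_ratings(net, matchups)
for team, value in ratings.items():
    print(team, round(float(value), 2))
```

For the example numbers this works out to exact ratings of 1855/54 for A, 385/54 for B, -665/54 for C, 1085/54 for D, -595/54 for E, and -2065/54 for F, or roughly 34.35, 7.13, -12.31, 20.09, -11.02, and -38.24. Swapping in net yards or any other net measure only changes the `net` dictionary.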
