
Motivation

In my last post, I was playing around with data from an API offered by http://collegefootballdata.com.

Then I found some blog posts on the site, and I thought a couple of them on how Simple Rating Systems work were interesting.

In particular, this one.

They did all their coding in Python, but I’m becoming more and more of an R fan, so I thought I’d re-code the wheel, so to speak.

In their post, they used data from the 2019 season, so for consistency, I’ll do the same.

Also, in the last post, I was using tidyr to un-nest JSON data.

It seemed odd to me that you have to keep repeating unnest_auto() steps to get everything parsed out.

If it’s automatic, why do I have to keep doing it manually?

After some more Googling, I found the tidyjson package, which has a nice spread_all function that I’ll use instead.

Simple Rating System: The Math

This, to me, is the coolest part.

I had no idea this is how some rating systems work, and it’s pretty slick.

It’s just one big system of linear equations that you solve with regular ol’ linear algebra.

In other words, solve $Ax = b$ for $x$, the vector of team ratings.

That’s it.

I’ll start with the b vector - that’s easiest to explain.

The b Vector

The b vector is each team’s average margin of victory for the season.

Couldn’t be any simpler.
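As a quick illustration of the arithmetic, here is a minimal base R sketch. The three teams and their scores are invented for the example and are not 2019 data.

```r
# Hypothetical three-team round robin; scores are invented for illustration.
games <- data.frame(
  home_team   = c("A", "B", "A"),
  away_team   = c("B", "C", "C"),
  home_points = c(28, 21, 35),
  away_points = c(14, 24, 10),
  stringsAsFactors = FALSE
)

# Margin from the home team's point of view.
home_margin <- games$home_points - games$away_points

# One row per team per game: the away team's margin is the
# negative of the home team's margin.
per_team <- data.frame(
  team   = c(games$home_team, games$away_team),
  margin = c(home_margin, -home_margin),
  stringsAsFactors = FALSE
)

# Mean margin of victory per team, in alphabetical order.
b_toy <- tapply(per_team$margin, per_team$team, mean)
b_toy
```

Team A wins by 14 and 25, so its entry is 19.5; B and C come out at -8.5 and -11.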

The A Matrix

This is a little more complicated.

The A matrix will have dimensions of 130x130 - one row and column for each FBS team.

The diagonal will be all 1’s, just like the identity matrix.

Think of the rest of the matrix in terms of rows.

We’ll set it up alphabetically, so the first row will be for Air Force.

First, we’ll count how many games Air Force played that season.

Then we’ll identify all of Air Force’s opponents - those are the columns.

As I said, the Air Force-Air Force entry will have a 1.

Moving across the columns, if Air Force didn’t play that team, put a 0 there.

If they did, divide the number of times Air Force played that team by the total number of games played and put that value in the column.

Keep doing that until you get to the last column (i.e., that last potential match-up).

Then repeat that process for the next team, Akron, and then the next, etc.

That’s it.
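The row-by-row recipe above can be sketched on a toy schedule. The three teams and their games are hypothetical; only the construction (1 on the diagonal, games against an opponent divided by total games everywhere else) comes from the description in this post.

```r
# Hypothetical schedule: A and B meet twice, and each plays C once.
teams <- c("A", "B", "C")
games <- data.frame(
  home_team = c("A", "B", "A", "C"),
  away_team = c("B", "A", "C", "B"),
  stringsAsFactors = FALSE
)

# Start from the identity matrix so the diagonal is already 1.
A_toy <- diag(1, nrow = length(teams))
dimnames(A_toy) <- list(teams, teams)

for (team in teams) {
  # Every opponent this team faced, one entry per game.
  opponents <- c(games$away_team[games$home_team == team],
                 games$home_team[games$away_team == team])
  n_played <- length(opponents)
  for (opp in unique(opponents)) {
    # Games against this opponent as a share of all games played.
    A_toy[team, opp] <- sum(opponents == opp) / n_played
  }
}

A_toy
```

Because every game in this toy schedule is between listed teams, the off-diagonal entries of each row sum to 1.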

This matrix represents each team’s strength of schedule.

Pretty clever, right?

A team’s rating is its mean margin of victory adjusted by its strength of schedule.
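Written out for a single team, each row of the system looks like the equation below. The symbols are shorthand introduced here, not notation from the original blog post: $r_i$ is team $i$'s rating, $n_i$ is its number of games, $g_{ij}$ is the number of times it played team $j$, and $\bar{m}_i$ is its mean margin of victory. Some other descriptions of SRS subtract the opponent term rather than adding it; the sign here simply mirrors the matrix as it is built in this post.

$$
r_i + \frac{1}{n_i} \sum_{j \neq i} g_{ij} \, r_j = \bar{m}_i, \qquad i = 1, \dots, 130
$$

Stacking the 130 rows gives $Ax = b$, with $x = (r_1, \dots, r_{130})$ and $b = (\bar{m}_1, \dots, \bar{m}_{130})$.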

The Code

First we need to get all the FBS team names so we can exclude non-FBS games.

library(tidyjson)
library(dplyr)
library(httr)

fbs <-
  httr::GET(
    url = "https://api.collegefootballdata.com/teams/fbs?year=2019",
    httr::add_headers(
      Authorization = paste("Bearer", Sys.getenv("YOUR_API_TOKEN"))
    )
  )

fbs_teams <-
  httr::content(fbs, "parsed") %>% # convert response to a nested list
  spread_all %>% # rectangularize nested list into a dataframe
  arrange(school) # make sure teams are in alphabetical order

Now we’ll get the game results for the 2019 season.

records <-
  httr::GET(
    url = "https://api.collegefootballdata.com/games?year=2019",
    httr::add_headers(
      accept = "application/json",
      Authorization = paste("Bearer", Sys.getenv("YOUR_API_TOKEN"))
    )
  )

team_records <-
  httr::content(records, "parsed") %>%
  spread_all

Now get scores and margin of victory for each game and eliminate non-FBS games.

Eventually we’ll use this for the $b$ vector, but first we’ll need it in this format for the A matrix.

scores <- team_records %>%
  filter(home_team %in% (fbs_teams %>% .$school) & away_team %in% (fbs_teams %>% .$school)) %>%
  select(home_team, away_team, home_points, away_points) %>%
  mutate(home_mov = home_points - away_points)
head(scores)
## # A tbl_json: 6 x 6 tibble with a "JSON" attribute
##   ..JSON                 home_team  away_team   home_points away_points home_mov
##   <chr>                  <chr>      <chr>             <dbl>       <dbl>    <dbl>
## 1 "{\"id\":401110723..." Florida    Miami                24          20        4
## 2 "{\"id\":401114164..." Hawai'i    Arizona              45          38        7
## 3 "{\"id\":401117854..." Cincinnati UCLA                 24          14       10
## 4 "{\"id\":401111653..." Clemson    Georgia Te~          52          14       38
## 5 "{\"id\":401114236..." Tulane     Florida In~          42          14       28
## 6 "{\"id\":401110731..." Texas A&M  Texas State          41           7       34

Ok, now we can start to generate the A matrix.

First, I’ll populate it with the number of times each pair of teams faced each other.

There’s probably a more elegant way, but this is what came to me first.

A <- data.frame(diag(0, nrow=130, ncol=130), row.names = fbs_teams %>% .$school)
colnames(A) <- fbs_teams %>% .$school

# populate the data frame with the number of times each pair of teams met
for (r in seq_len(nrow(scores))) {
  home <- scores$home_team[r]
  away <- scores$away_team[r]
  A[home, away] <- A[home, away] + 1
  A[away, home] <- A[away, home] + 1
}

# clean up
rm(away, home, r)

A[1:6, 1:6]
##                   Air Force Akron Alabama Appalachian State Arizona
## Air Force                 0     0       0                 0       0
## Akron                     0     0       0                 0       0
## Alabama                   0     0       0                 0       0
## Appalachian State         0     0       0                 0       0
## Arizona                   0     0       0                 0       0
## Arizona State             0     0       0                 0       1
##                   Arizona State
## Air Force                     0
## Akron                         0
## Alabama                       0
## Appalachian State             0
## Arizona                       1
## Arizona State                 0
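One loop-free possibility, sketched here on a toy schedule with hypothetical teams, is to cross-tabulate home and away teams with base R's table() and add the transpose so each game counts for both participants. This is an alternative to the loop, not what produced the output shown in this post.

```r
teams <- c("A", "B", "C")
games <- data.frame(
  home_team = c("A", "B", "A", "C"),
  away_team = c("B", "A", "C", "B"),
  stringsAsFactors = FALSE
)

# Fixing the factor levels guarantees a full teams-by-teams table,
# even for pairs that never met.
home <- factor(games$home_team, levels = teams)
away <- factor(games$away_team, levels = teams)

# Rows = home team, columns = away team.
home_away <- unclass(table(home, away))

# Adding the transpose makes the counts symmetric, so a game shows up
# in both teams' rows.
counts <- home_away + t(home_away)
dimnames(counts) <- list(teams, teams)
counts
```

On the real data, `teams` would be the alphabetized school names and `games` the filtered scores table.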

Hold that thought on the A matrix - we need a little more work to proceed.

Next, rearrange the scores data to get one margin of victory score for each team and each game.

mov <- scores %>%
  select(home_team, home_mov) %>%
  rename(team = home_team, mov = home_mov) %>%
  bind_rows(scores %>%
              select(away_team, home_mov) %>%
              rename(team = away_team, mov = home_mov) %>%
              mutate(mov = -mov))
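A property worth checking after this kind of reshaping: every game contributes one positive and one negative copy of the same margin, so the margins should sum to zero and there should be exactly two rows per game. A minimal base R sketch with invented margins:

```r
# Hypothetical home-team margins for three games.
scores_toy <- data.frame(
  home_team = c("A", "B", "A"),
  away_team = c("B", "C", "C"),
  home_mov  = c(14, -3, 25),
  stringsAsFactors = FALSE
)

# Stack the home view and the sign-flipped away view of each game.
mov_toy <- rbind(
  data.frame(team = scores_toy$home_team, mov = scores_toy$home_mov,
             stringsAsFactors = FALSE),
  data.frame(team = scores_toy$away_team, mov = -scores_toy$home_mov,
             stringsAsFactors = FALSE)
)

two_rows_per_game <- nrow(mov_toy) == 2 * nrow(scores_toy)
margins_cancel    <- sum(mov_toy$mov) == 0
c(two_rows_per_game, margins_cancel)
```

The same two checks could be run against the real `scores` and `mov` tables.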

Now count the total number of games each team played.

n_games <- mov %>% count(team) %>% .$n

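Multiply each row of A by 1 / n_games, so that every entry becomes a share of that team’s total games.

MARGIN=1 tells sweep() to work row-wise: row i is multiplied by the i-th element of 1 / n_games.

A <- sweep(A, MARGIN = 1, STATS = 1 / n_games, FUN = `*`)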
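A tiny demonstration of what sweep() does with MARGIN = 1 may help; the matrix and game counts below are made up.

```r
m <- matrix(1, nrow = 2, ncol = 3)   # two rows of ones
n <- c(2, 4)                         # hypothetical games played, one per row

# MARGIN = 1 pairs each element of STATS with a row:
# row 1 is multiplied by 1/2, row 2 by 1/4.
scaled <- sweep(m, MARGIN = 1, STATS = 1 / n, FUN = "*")
scaled
```

Every entry in the first row becomes 0.5 and every entry in the second row becomes 0.25.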

Finally, add the identity matrix and A is built.

A <- A + diag(1, nrow=130, ncol=130)
A[1:6, 1:6]
##                   Air Force Akron Alabama Appalachian State    Arizona
## Air Force                 1     0       0                 0 0.00000000
## Akron                     0     1       0                 0 0.00000000
## Alabama                   0     0       1                 0 0.00000000
## Appalachian State         0     0       0                 1 0.00000000
## Arizona                   0     0       0                 0 1.00000000
## Arizona State             0     0       0                 0 0.09090909
##                   Arizona State
## Air Force            0.00000000
## Akron                0.00000000
## Alabama              0.00000000
## Appalachian State    0.00000000
## Arizona              0.09090909
## Arizona State        1.00000000

Now calculate the mean margin of victory for each team.

This is the $b$ vector for the system of equations.

b <-
  mov %>%
  group_by(team) %>%
  summarize(mean_mov = mean(mov)) %>%
  .$mean_mov

It took a while to build the system of equations, but solving it is a one-liner.

solve(A, b)
##             Air Force                 Akron               Alabama
##           12.17488262          -22.66408637           32.60720600
##     Appalachian State               Arizona         Arizona State
##           21.80515687          -13.03736947            1.10205537
##              Arkansas        Arkansas State                  Army
##          -24.46835876           -0.52637216            1.05006201
##                Auburn            Ball State                Baylor
##            5.88613703            0.40799928           14.36295601
##           Boise State        Boston College         Bowling Green
##           17.32564683           -2.78480549          -33.10414546
##               Buffalo                   BYU            California
##           10.20151104            0.46092289           -6.46928285
##      Central Michigan             Charlotte            Cincinnati
##           10.31277888           -4.34284716            7.36008829
##               Clemson      Coastal Carolina              Colorado
##           41.48627376            0.83014089          -11.45324149
##        Colorado State           Connecticut                  Duke
##           -2.51297888          -22.92597291          -10.33694277
##         East Carolina      Eastern Michigan               Florida
##          -12.21872118           -2.75282448           12.69160016
##      Florida Atlantic Florida International         Florida State
##            8.62463543            1.37276246           -6.07744689
##          Fresno State               Georgia      Georgia Southern
##            0.50823966           15.23789346           -1.56619791
##         Georgia State          Georgia Tech               Hawai'i
##           -2.53872623          -20.00376049            0.03002992
##               Houston              Illinois               Indiana
##          -11.22633847            5.83248366            6.84673009
##                  Iowa            Iowa State                Kansas
##           10.19598383            8.32429892          -17.55664637
##          Kansas State            Kent State              Kentucky
##            9.86298135           -0.90629641           11.16790944
##               Liberty             Louisiana      Louisiana Monroe
##           11.38026279           14.12089636          -10.96010135
##        Louisiana Tech            Louisville                   LSU
##           19.83837303          -11.11555104           24.61782878
##              Marshall              Maryland               Memphis
##           -4.53506310          -21.13126417           16.67762020
##                 Miami            Miami (OH)              Michigan
##            3.40729287           -9.17435086            9.40439559
##        Michigan State      Middle Tennessee             Minnesota
##           -2.84667123           -5.41633827           14.54830499
##     Mississippi State              Missouri                  Navy
##          -10.04679640            5.12551526           14.51847575
##              NC State              Nebraska                Nevada
##          -10.13346295           -1.06634224          -10.70442169
##            New Mexico      New Mexico State        North Carolina
##          -18.96441796          -26.88151749            1.61200847
##           North Texas     Northern Illinois          Northwestern
##           -1.33040741           -5.75767308           -9.39415805
##            Notre Dame                  Ohio            Ohio State
##           21.22221810           10.59616933           36.35622851
##              Oklahoma        Oklahoma State          Old Dominion
##           15.82512098            3.23809017          -14.57638255
##              Ole Miss                Oregon          Oregon State
##           -0.47404615           20.18886699           -6.91414098
##            Penn State            Pittsburgh                Purdue
##           13.07833337           -4.24030653           -4.44792009
##                  Rice               Rutgers       San Diego State
##           -7.94447288          -27.74865910           11.75886571
##        San José State                   SMU         South Alabama
##           -0.51222213           14.52255973          -17.50751350
##        South Carolina         South Florida  Southern Mississippi
##          -20.16977658          -15.55091539           -3.01895055
##              Stanford              Syracuse                   TCU
##          -10.41277836           -5.93538233           -1.45957680
##                Temple             Tennessee                 Texas
##            2.25488829           -4.03217697            1.54246752
##             Texas A&M           Texas State            Texas Tech
##            0.24533392          -19.98074165           -2.56724537
##                Toledo                  Troy                Tulane
##           -9.23735051           -1.27992009           -1.44073888
##                 Tulsa                   UAB                   UCF
##           -9.98788905            7.89737667           22.70582261
##                  UCLA                 UMass                  UNLV
##           -9.73407669          -28.89607277          -12.08819590
##                   USC        UT San Antonio                  Utah
##            3.58662283          -18.39429259           20.89793643
##            Utah State                  UTEP            Vanderbilt
##           -9.99000619          -13.81130518          -22.07917803
##              Virginia         Virginia Tech           Wake Forest
##            1.72160891            8.58302730            1.14879646
##            Washington      Washington State         West Virginia
##            8.68703548            7.56504382          -12.05131807
##      Western Kentucky      Western Michigan             Wisconsin
##           11.04805953           10.09365243           11.70814611
##               Wyoming
##           10.07670372
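One way to sanity-check a solution like this is to plug it back in: multiplying A by the ratings should reproduce b up to floating-point error. Here is a sketch on a small hypothetical system. The numbers are invented, but the matrix has the same structure as the real one, with 1s on the diagonal and schedule shares elsewhere.

```r
A_toy <- matrix(c(1,   2/3, 1/3,
                  2/3, 1,   1/3,
                  1/2, 1/2, 1),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("A", "B", "C"), c("A", "B", "C")))
b_toy <- c(A = 10, B = -2, C = -8)   # invented mean margins

ratings <- solve(A_toy, b_toy)

# Largest absolute residual of the linear system; should be numerically zero.
residual <- max(abs(A_toy %*% ratings - b_toy))
residual
```

The same check on the full system would be `max(abs(as.matrix(A) %*% solve(A, b) - b))`.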

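If you’re familiar with linear models in R, this bit of code does the same thing: with as many equations as unknowns, a least-squares fit without an intercept recovers the exact solution.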

Don’t forget to include a -1 to drop the intercept term.

lm_A <- cbind(A, b)
coefficients(lm(b ~ . -1 , data=lm_A))
##             `Air Force`                   Akron                 Alabama
##             12.17488262            -22.66408637             32.60720600
##     `Appalachian State`                 Arizona         `Arizona State`
##             21.80515687            -13.03736947              1.10205537
##                Arkansas        `Arkansas State`                    Army
##            -24.46835876             -0.52637216              1.05006201
##                  Auburn            `Ball State`                  Baylor
##              5.88613703              0.40799928             14.36295601
##           `Boise State`        `Boston College`         `Bowling Green`
##             17.32564683             -2.78480549            -33.10414546
##                 Buffalo                     BYU              California
##             10.20151104              0.46092289             -6.46928285
##      `Central Michigan`               Charlotte              Cincinnati
##             10.31277888             -4.34284716              7.36008829
##                 Clemson      `Coastal Carolina`                Colorado
##             41.48627376              0.83014089            -11.45324149
##        `Colorado State`             Connecticut                    Duke
##             -2.51297888            -22.92597291            -10.33694277
##         `East Carolina`      `Eastern Michigan`                 Florida
##            -12.21872118             -2.75282448             12.69160016
##      `Florida Atlantic` `Florida International`         `Florida State`
##              8.62463543              1.37276246             -6.07744689
##          `Fresno State`                 Georgia      `Georgia Southern`
##              0.50823966             15.23789346             -1.56619791
##         `Georgia State`          `Georgia Tech`               `Hawai'i`
##             -2.53872623            -20.00376049              0.03002992
##                 Houston                Illinois                 Indiana
##            -11.22633847              5.83248366              6.84673009
##                    Iowa            `Iowa State`                  Kansas
##             10.19598383              8.32429892            -17.55664637
##          `Kansas State`            `Kent State`                Kentucky
##              9.86298135             -0.90629641             11.16790944
##                 Liberty               Louisiana      `Louisiana Monroe`
##             11.38026279             14.12089636            -10.96010135
##        `Louisiana Tech`              Louisville                     LSU
##             19.83837303            -11.11555104             24.61782878
##                Marshall                Maryland                 Memphis
##             -4.53506310            -21.13126417             16.67762020
##                   Miami            `Miami (OH)`                Michigan
##              3.40729287             -9.17435086              9.40439559
##        `Michigan State`      `Middle Tennessee`               Minnesota
##             -2.84667123             -5.41633827             14.54830499
##     `Mississippi State`                Missouri                    Navy
##            -10.04679640              5.12551526             14.51847575
##              `NC State`                Nebraska                  Nevada
##            -10.13346295             -1.06634224            -10.70442169
##            `New Mexico`      `New Mexico State`        `North Carolina`
##            -18.96441796            -26.88151749              1.61200847
##           `North Texas`     `Northern Illinois`            Northwestern
##             -1.33040741             -5.75767308             -9.39415805
##            `Notre Dame`                    Ohio            `Ohio State`
##             21.22221810             10.59616933             36.35622851
##                Oklahoma        `Oklahoma State`          `Old Dominion`
##             15.82512098              3.23809017            -14.57638255
##              `Ole Miss`                  Oregon          `Oregon State`
##             -0.47404615             20.18886699             -6.91414098
##            `Penn State`              Pittsburgh                  Purdue
##             13.07833337             -4.24030653             -4.44792009
##                    Rice                 Rutgers       `San Diego State`
##             -7.94447288            -27.74865910             11.75886571
##        `San José State`                     SMU         `South Alabama`
##             -0.51222213             14.52255973            -17.50751350
##        `South Carolina`         `South Florida`  `Southern Mississippi`
##            -20.16977658            -15.55091539             -3.01895055
##                Stanford                Syracuse                     TCU
##            -10.41277836             -5.93538233             -1.45957680
##                  Temple               Tennessee                   Texas
##              2.25488829             -4.03217697              1.54246752
##             `Texas A&M`           `Texas State`            `Texas Tech`
##              0.24533392            -19.98074165             -2.56724537
##                  Toledo                    Troy                  Tulane
##             -9.23735051             -1.27992009             -1.44073888
##                   Tulsa                     UAB                     UCF
##             -9.98788905              7.89737667             22.70582261
##                    UCLA                   UMass                    UNLV
##             -9.73407669            -28.89607277            -12.08819590
##                     USC        `UT San Antonio`                    Utah
##              3.58662283            -18.39429259             20.89793643
##            `Utah State`                    UTEP              Vanderbilt
##             -9.99000619            -13.81130518            -22.07917803
##                Virginia         `Virginia Tech`           `Wake Forest`
##              1.72160891              8.58302730              1.14879646
##              Washington      `Washington State`         `West Virginia`
##              8.68703548              7.56504382            -12.05131807
##      `Western Kentucky`      `Western Michigan`               Wisconsin
##             11.04805953             10.09365243             11.70814611
##                 Wyoming
##             10.07670372
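The agreement between the two approaches is easier to see on a small system. The sketch below uses an invented 3x3 matrix and right-hand side. Because the matrix is square and invertible, the no-intercept regression fits the data exactly and its coefficients match solve().

```r
A_toy <- matrix(c(1,   2/3, 1/3,
                  2/3, 1,   1/3,
                  1/2, 1/2, 1),
                nrow = 3, byrow = TRUE)
colnames(A_toy) <- c("A", "B", "C")
b_toy <- c(10, -2, -8)   # invented mean margins

# Direct solution of the linear system.
direct <- solve(A_toy, b_toy)

# Regress b on the columns of A, dropping the intercept with -1.
fit <- lm(b ~ . - 1, data = data.frame(A_toy, b = b_toy))

same_answer <- isTRUE(all.equal(unname(coef(fit)), unname(direct)))
same_answer
```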

To visualize the ratings, let’s make a plot of the top 25.

library(ggplot2)
library(forcats)

srs <-
  tibble(team = fbs_teams$school,
         rating = solve(A, b),
         color = fbs_teams$color)

top_25 <-
  srs %>%
  arrange(desc(rating)) %>%
  slice(1:25)

ggplot() +
  geom_col(data = top_25, aes(x = fct_reorder(team, rating), y = rating), fill = top_25$color) +
  coord_flip() +
  theme_bw() +
  ylab("Rating") +
  xlab("Team")

In the College Football Data blog, they further refine the rating by factoring in home field advantage, conference strength, and things like that.

That’s fine, but I just wanted to get the basic mechanics down.
