149: A conversation about college football
A Q&A with the creator of bcftoys.com, Brian Fremeau, plus a few visualizations.
Welcome to another edition of Bless your Chart.
We’re trying something a little bit different today. I caught up with Brian Fremeau to discuss a bunch of college football topics. Brian is the creator of the Fremeau Efficiency Index, or FEI, a college football rating system based on opponent-adjusted possession efficiency.
I’ve been following Brian’s work for a while, and I find his perspective on the game quite useful. The preseason FEI ratings were released today, August 6, and you can find the updated ratings at bcftoys.com.
We bounce around multiple topics and we’ve added in some visualizations to help support our conversation.
Please note this Q&A has been lightly edited for length and clarity.
BYC: In all your years tracking data, what is the most surprising statistic you've come across?
BF: I’m genuinely surprised at how stable the numbers can be each season, even though college football is wildly unpredictable. There are teams that kind of surprise and shock us in both directions every year, but in the aggregate, I can lean on past results to be a relatively strong way to forecast future results.
The anchor point for a lot of my work is expected points per drive based on starting field position. And that has remained, perhaps not as surprisingly, a consistent thing that I have not had to kind of modulate from year-to-year.
Of course, there are going to be edge cases like when an extreme jumps off the page. Wait, there were only 13 possessions in the game? How did that happen?
Or 46 possessions in a non-overtime game? The outliers are surprising, and when those things happen, I don't have an answer to why. It's football, and sometimes weird things happen.
It’s a surprise to me that the act of collecting the data doesn’t grow stale. The minutiae and the regularity of college football are interesting to me. Parts of the game are not changing in the direction I would like to see, but new data coming in with the games excites me.
Of course, the origin of my site is inspired by a true story: a catastrophic Notre Dame game against Boston College that I can't explain to anybody. I was there and I witnessed it, and it was just mind-numbing every moment of that game.

What's the difference between a team's ranking and rating? And do people conflate the two?
It's an issue if there isn't any kind of critical thought given to it, right? Because, yes, there are many people that kind of latch on to the idea of the ordinal ranking.
The ordinal ranking carries some relative meaning or we wouldn't do it otherwise. Six is better than seven and six is worse than five. But if you don't add this other level of critical thinking and say, "but six is basically the same as three through nine," then you’re missing something.
I think this is a human instinct. I'm sure I'm guilty of this myself from time to time. You can get kind of seduced by the ordinal ranking.
If I’m trying to resist that instinct, I try to think of teams in ranges of ordinal rankings. For example, if you're a contender for the national championship, your ordinal ranking ought to be, and probably will be at some point during the year, in the top three to four range.
Because the top three to four often create a pretty significant separation, and number five is close, but often there's some kind of extended difference. Five through 15 might be relatively interchangeable in a given year.
I do think the rating is a far more important number to understand. In my numbers, the rating represents expected points per possession against an average opponent, and it's carried out to the hundredths place.
So the difference between two teams ranked near one another is a few hundredths of a point per possession. Over the course of a 24-possession game, maybe that's a point in a game, right?
That's how little or fractionally different they can be.
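To make that back-of-the-envelope arithmetic concrete, here is a minimal Python sketch. It is an illustration only, not Brian's code: the function name and the two ratings are hypothetical, and it simply follows the description above by multiplying a per-possession rating gap by the 24 possessions in the example game.

```python
def implied_point_gap(rating_a, rating_b, possessions=24):
    """Scale a per-possession rating gap up to a full game.

    Assumption: mirrors the interview's rough arithmetic
    (rating gap times possessions in the game); not the FEI formula.
    """
    return (rating_a - rating_b) * possessions


# Hypothetical ratings for two teams ranked next to each other,
# separated by four hundredths of a point per possession.
team_six = 0.58
team_seven = 0.54

gap = implied_point_gap(team_six, team_seven)
print(f"Implied gap over a 24-possession game: {gap:.2f} points")
```

A gap of a few hundredths per possession works out to roughly one point per game, which is why neighboring ordinal ranks can be close to interchangeable.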
How does that show up in strength of schedule?
This is harder for people, myself included, to accept. The difference between the best team, number 1, and the 40th best team is a pretty chunky difference. It might be a 15 to 20 point difference if they played one another.
The difference between the hardest schedule and the 40th hardest schedule is more nuanced.
Are we evaluating it based on how hard it would be for a number 1 team to go undefeated against the two schedules? Or how hard it would be for the 40th ranked team to reach a bowl against the two schedules?
Those are different things, and we can and should measure them differently.
What does SP+ see that FEI misses and how does that show up in F+?
The origin of F+ is Bill Connelly and I were recruited to write and publish with the Football Outsiders team almost 18 years ago. I think Aaron Schatz, known for DVOA, said we have to publish one number. Figure out a way to combine your two numbers.
And it wasn't a significantly in-depth exercise. We just said let's see if we can just kind of average them, how might that work? And the readers liked it, and I sort of just carried the torch of republishing it, because people were asking about it after Football Outsiders folded.
I think about the differences in two ways. SP+ is examining a different cut of data at the play level than FEI does at the drive level. You’re going to capture different kinds of success rate at the play level versus the drive level. All those plays cumulatively roll up into one question: did you do anything with that drive? It's valuable to understand whether you actually performed successfully down to down.
The signal I trust is in aggregate, you're going to get the ball 12 times, and what do you do when you have the ball those 12 times? It's sort of just a more nuanced and dynamic way of asking the same kind of question.
The other way that I think of it is our systems are distinctly different in their design. SP+ is designed to be as predictive as possible for upcoming events.
FEI is designed to be as reflective of the reality we have already seen as possible. And those two things could be in conflict with one another.
I think F+ allows a little bit of this and a little bit of that, and it’s probably good. What I haven't done in a long time is ask whether F+ is performing better than either one individually. I don't know the answer to that question.
I'm glad we do it, and I'm glad it's been a cordial college football analyst collaboration for so many years.
Editor's note: personally, I love the novelty of just mashing the two metrics together to create F+. It’s not perfect, but that’s the point. It forces the person consuming the metric to imagine or think through how it’s useful. If they tweaked it to match up perfectly, I think you would lose some of the magic of mashing it together.
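For readers curious what "just kind of average them" could look like, here is a minimal Python sketch. It is an assumption-laden illustration, not the published F+ method: the interview only says the two ratings were averaged, so putting each system on a common scale with z-scores first is our own choice, and the team names and values are made up.

```python
from statistics import mean, pstdev


def standardize(ratings):
    """Convert raw ratings to z-scores so two systems share a scale."""
    mu = mean(ratings.values())
    sigma = pstdev(ratings.values())
    return {team: (value - mu) / sigma for team, value in ratings.items()}


def combine(system_a, system_b):
    """Average the standardized ratings for teams present in both systems.

    Assumption: a plain 50/50 average of z-scores, for illustration only.
    """
    z_a = standardize(system_a)
    z_b = standardize(system_b)
    shared = z_a.keys() & z_b.keys()
    return {team: (z_a[team] + z_b[team]) / 2 for team in shared}


# Hypothetical ratings on two different scales.
drive_based = {"Team A": 1.0, "Team B": 0.5, "Team C": 0.0}
play_based = {"Team A": 20.0, "Team B": 0.0, "Team C": 10.0}

combined = combine(drive_based, play_based)
for team, score in sorted(combined.items(), key=lambda item: -item[1]):
    print(f"{team}: {score:+.2f}")
```

Where the two systems disagree (Team B and Team C here), the blend splits the difference, which is the "little bit of this and a little bit of that" quality described above.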
How do you define complementary football and does it show up in your numbers?
I don't focus on it specifically. I'm not measuring in any way whether a defensive rating is a better or worse complement to a team’s offensive rating. But I do observe that there's something to it.
Because you see teams that are the number one offense in the country but the number 110 defense, or vice versa, and what is going on there that's not complementary? I think in terms of a championship pursuit, you do have to be strong on both sides of the ball.
I do think what plays into my numbers more is this idea that if you chart the pattern of movement up and down the field, there's a relationship at play between your offense, your defense, and your special teams.
I'm trying to parse out the credit and blame to each of those respective units in the way I delineate defense and special teams numbers, but the relationship between them is absolutely playing out.

What's changed in your preseason methodology over the years?
I first got asked by Aaron Schatz at Football Outsiders to produce a preseason rating in 2007. And I didn't know how I was going to start.
I think all I did at that time was invent in my head that something like a program rating would be a good idea. Let me combine five years of data and start there. And to a degree, I haven't really given up on that today because it’s proven to be good enough.
What I recognize is I’m not trying to get every team right. I think that is folly because we’re going to be wrong, and I think we should just accept that.
I can anchor to the year-over-year data. It’s a set of correlations from preseason ratings to end-of-year ratings that I feel good about. I thought only using last year’s data might outperform using the previous five years, but it doesn’t. It’s a sample size thing: you only have around 12 events in a given year, and there are going to be outliers.
I also think programs are relatively stable. This has been true for the 150 years college football has been played. Michigan is good at football. Are they good at football every year? No, but if your starting point is that Michigan is good at football, then ranking them in the top ten is a good starting point in a year that they win the national championship, and it is also a good starting point in a year they go 8-4.
It's almost more of a description of expectation than a prediction. If you start with the expectation that programs are relatively stable, you’re going to miss on Florida State and Indiana like last year. You’re also going to get Georgia, Oregon, Texas, Notre Dame, and Penn State right. If my goal is to start all the teams out fairly and then transition over the course of the season from preseason data to in-season data, I'm satisfied with that.
There were years where I was trying to get more sophisticated with returning starters and other transition factors, like if your coach changed. I tried to play with a coaching number, but all of those attempts felt just as arbitrary, though maybe I didn't have the time or interest to invest in making them good. What factor am I inventing here?
Does that mean if you change your offensive coordinator, something's going to change about your offense? It felt like I was inventing that out of thin air, rather than saying, I can measure what this thing does, so let me just stick to publishing it that way.
So I'm going to be higher on Michigan, to go back to that example, than most, because though last year was a dip, the previous four years of Michigan’s results are included in my numbers.
Ohio State is going to be number one even though I know intuitively that they've lost some things this past season. I don't think it's unfair to say, a program that good probably should be back in the playoff again.
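As a rough illustration of the "combine five years of data" idea, here is a minimal Python sketch. It is not Brian's preseason formula: the unweighted average, the function name, and the ratings are all assumptions chosen to show how one down year gets absorbed by the four seasons before it.

```python
from statistics import mean


def program_rating(season_ratings, window=5):
    """Average the most recent `window` end-of-season ratings.

    Assumption: a simple unweighted mean; the real preseason
    ratings may weight or adjust seasons differently.
    """
    recent = season_ratings[-window:]
    return mean(recent)


# Hypothetical end-of-season ratings, oldest to newest.
# Four strong years followed by a dip.
history = [0.95, 1.10, 1.25, 1.30, 0.40]

five_year = program_rating(history)
last_year_only = program_rating(history, window=1)

print(f"Five-year program rating: {five_year:.2f}")
print(f"Last year only:           {last_year_only:.2f}")
```

The five-year number starts this hypothetical program well above where last season alone would place it, which is the Michigan situation described above.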

How has realignment affected team interconnectivity and how do you think about interconnectivity?
I think interconnectivity is huge. Because if college football goes in a direction that reduces the interconnectivity, it's actually going to make the data worse, and I'm going to be frustrated by that.
I don't know that I would give up, but I will point to the 2020 season. That season is such a perfect lab experiment in schedule interconnectivity. It was insanely isolated relative to what college football has been in the past. It resulted in what I call bad data. We don’t need to throw it all out, but Buffalo would show up as a top-ten team.
Buffalo was not in the top-ten. None of the MAC played a single non-conference game against a Power Five team at the time. Buffalo went undefeated out of conference and we measure against the past because we didn’t know any better.
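One way to picture interconnectivity is as a graph where teams are nodes and games are edges. The Python sketch below is our own illustration, not part of FEI: the function name and the placeholder teams are hypothetical. It groups teams into clusters that are linked by at least one chain of games; when a league plays no one outside itself, it shows up as its own island with no results tying it to everyone else.

```python
from collections import defaultdict, deque


def connected_groups(games):
    """Group teams that are linked, directly or indirectly, by games played."""
    graph = defaultdict(set)
    for team_a, team_b in games:
        graph[team_a].add(team_b)
        graph[team_b].add(team_a)

    seen = set()
    groups = []
    for start in list(graph):
        if start in seen:
            continue
        # Breadth-first search outward from an unvisited team.
        seen.add(start)
        group = {start}
        queue = deque([start])
        while queue:
            team = queue.popleft()
            for opponent in graph[team]:
                if opponent not in seen:
                    seen.add(opponent)
                    group.add(opponent)
                    queue.append(opponent)
        groups.append(group)
    return groups


# Hypothetical schedule: League X teams only play each other.
games = [
    ("Team A", "Team B"),
    ("Team B", "Team C"),
    ("League X 1", "League X 2"),
]

groups = connected_groups(games)
print(f"{len(groups)} isolated groups of teams")
```

With no games bridging the two groups, any rating that compares a League X team to Team A has to lean on something other than this season's results, such as prior-year data.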

How do you feel about nine-game conference schedules?
Do I think a nine-game conference schedule for other leagues is good?
Sure, watching those games play out might be interesting, but I fear losing the interconnectivity between teams as someone who uses it to evaluate the relative performance of teams. So, I’m not in favor of a nine-game conference schedule.
In fact, my argument would be, and this is Notre Dame bias, we should go back to the heyday when there were 20 independents. It creates just as many interesting matchups, if not more so, because it's less rigid than the current system.
The idea of these monster conferences having teams that have almost nothing to do with one another is absurd to me. Indiana and Penn State last year were both playoff teams, and I'm not begrudging their performance, but we didn’t know much about them and they're in the same conference. They didn't play each other, and they barely played enough common opponents to have us judge whether one is better than the other.
Notre Dame has this one foot in, one foot out, ACC relationship and I understand why, but Notre Dame had basically as many common opponents with Clemson, who they didn't play, as Penn State had with Indiana. That should never happen, and yet it did last year.
And that feels like the future that I don't like for a lot of reasons, but as a data analyst I really don't like what that might do to my faith in the tools I've set up to measure these things.
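Counting common opponents is a simple set intersection. The Python sketch below is only an illustration with placeholder schedules; the function name and opponent labels are hypothetical and do not reflect any team's actual schedule.

```python
def common_opponents(schedule_a, schedule_b):
    """Return the opponents that appear on both schedules."""
    return set(schedule_a) & set(schedule_b)


# Hypothetical schedules for two teams in the same large conference
# that never play each other.
team_one = ["Opp 1", "Opp 2", "Opp 3", "Opp 4", "Opp 5"]
team_two = ["Opp 4", "Opp 5", "Opp 6", "Opp 7", "Opp 8"]

shared = common_opponents(team_one, team_two)
print(f"Common opponents: {sorted(shared)}")
```

The fewer names in that overlap, the less direct evidence there is for judging one team against the other.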
Does conference strength matter and can we actually measure it?
I don't publish a number that measures it. I know Bill and others do, and people could simply average up the teams or figure it out on their own if they're interested. I just don't know what I would do with that number, so that's why I kind of don't bother.
I don't think describing the SEC as x better than the Big Ten, or x better than the Big 12, is useful. This goes to my core argument: the SEC, in and of itself, isn't any of those teams. Those teams are members of the SEC, but every SEC team is effectively playing a very common, but independent schedule.
None of them are playing the same schedule, and now, with the size of these conferences, some of them are even playing wildly different ones. It would be misleading to characterize them as playing an SEC schedule and have that carry any meaning.
I resist this idea that the conference measurement is, in and of itself, meaningful.
I understand that you can talk about it and use it in the context of the competition. Yes, I agree with that, and I would also support that the competition is weaker in certain conferences and some leagues have a higher top end and a much lower floor. Those arguments can be made.
But I resist the idea that when you are talking about the merits of Alabama, Ole Miss, or SMU and Clemson, that you're using SEC numbers or an ACC number to have that conversation.
No, you can use Alabama's numbers and Ole Miss’ numbers, and Clemson's numbers and SMU's numbers, but you can't use this aggregated number because it’s not meaningful.
Conferences are just arrangements between independents. They are financial agreements of who's going to play who, but no one's playing a round robin schedule anymore.
So what advice would you give the playoff committee?
First, the committee needs stat people on it. I’m not advocating for myself. And I say that less to have them use my numbers or Bill’s numbers, and more to have Bill or a stat person inform the conversation.
They've got a mountain of data; Sports Source Analytics is the official data source. I presume that the committee is debating something, and I guarantee someone in that room is saying something silly about the numbers. Either they're using them the wrong way or they're saying one thing that they mean, but it means something else.
At a minimum, the committee needs someone to hold that in check. If they did, I bet the room's decision making and conversation would be better for it.
The second thing that I think the committee should do is communicate better to the public. There is no reason to not share the data that they are looking at each week.
You don't have to defend why you deviated from it or why you followed it. I don't understand why we don't know what they're looking at, or why they don't just publish what they're looking at. I think the committee would be better served and the public would be better served to peek inside the box.
Amen.
🤟 Thanks for reading this far 🤟
If you enjoyed this conversation, please check out bcftoys.com and give Brian a follow. There is a treasure trove of useful data at his site, and it was a lot of fun to ask him questions and share this conversation.

