Update: since first publishing this article, Rob Bateman, who runs Opta’s Data Editorial team, informed me that Sam Green did coin the abbreviation xG in his initial blog post in 2012. I’m happy to clarify this.
Stats in football are not going anywhere. That might be the most obvious takeaway from two new books: Expected Goals: The Story of How Data Conquered Football and Changed the Game Forever by Rory Smith, and Net Gains: Inside the Beautiful Game's Analytics Revolution by Ryan O’Hanlon. It’s not a completely new genre. Starting with Soccernomics by Simon Kuper and Stefan Szymanski (originally Why England Lose) in 2009, then followed by Football Hackers by Christoph Biermann and The Numbers Game by Chris Anderson and David Sally a few years later, this is a topic we now have a chunk of literature on. What’s exciting is how much the story has changed since those books were published.
Soccernomics wrote about using data to understand football as though it was a completely new concept (which, to a mass audience, it was). It didn’t even really want to change the sport in any way, functioning more as insight into something that the authors assumed was just fine as it was. The Numbers Game functioned as a sort of pitch for how Anderson and Sally think analytics inside a club should be done in 2013, while Football Hackers in 2019 (largely an English translation of Biermann’s German book a year earlier) showed us just how much the nerds had changed football in five years. But we were still in a sport of sceptics.
Sitting here in 2022, it feels like the data people quietly won the culture war. Many top clubs might do a poor job of integrating analytics into their process, but none of them deny it’s a thing that matters. “Football has taken data onboard insanely fast”, Smith told The Anfield Wrap, “and probably more quickly and more completely than baseball.” But unlike baseball, nobody wants to talk about it. Football clubs, as both books lament, like to keep their cards as close to their chest as possible. If a club figured out the secret to winning football matches, the last thing they would do is tell their opponents about it. Thus we need books like Expected Goals and Net Gains to figure out exactly what is going on.
It’s a subject I’ve followed closely. I remember being enthralled by the first edition of Soccernomics, before reading I don’t know how many posts from nerdy Liverpool fans around 2012 trying to explain why the team kept missing big chances and hitting the post. But a light went on in my head when I stumbled upon the blog StatsBomb about halfway through the 2013/14 season. There were people genuinely trying to understand the why of football. I wasn’t coming at it out of any motivation to make money, work in football, or anything like that. I was just a nerd who liked football and found it fascinating to learn more about how the sport really worked.
I was enthusiastic enough about it that I happened to become friends on Twitter with Mike L. Goodman, who later became the managing editor of StatsBomb and suggested I pitch him some article ideas. I became a regular columnist for the site, and that gave me my big break in football media, which is why I’m writing this newsletter instead of working my old job behind the till in a shop. Nothing I’ve ever written could really be described as “doing analytics”. I’m not the kind of person who would be covered in either of these books. I’m a writer with a degree in film studies who doesn’t know how to code. I just looked at analytics work done by others and used it to tell stories about football.
If previous books were about an oncoming transformation, both Smith and O’Hanlon get to write about a revolution right in the eye of the storm. They go about it in completely different ways. It’s not just that Smith, a British writer, chronicles football analytics while O’Hanlon, an American, writes about soccer analytics, though that’s a big part of it. Sitting from their vantage points on opposite sides of the Atlantic, they see the story from different angles.
Smith, currently a correspondent at the New York Times after working his way up through the sport pages of various British newspapers, isn’t necessarily a data evangelist. His book isn’t about the field of football analytics as a subject of intrigue, but simply a story of how it changed football on the inside. This is at times a strength of Expected Goals, as he keeps a level of neutrality in the culture war between data nerds and Proper Football Men. An old-school journalist, he reports what happened without trying to take sides. In that sense, though, this is a difficult story to tell. The book ends up leaning heavily on the experiences of Chris Anderson, in part as a lens on the challenges of changing the football industry, and probably in part because he was just the person willing to talk openly. In truth, though, he did not end up being a protagonist in the story of “how data conquered football”, and some of his chapters could have been devoted to others.
That distance can also be a weakness. Smith doesn’t have any interest in football analytics beyond how it changed the way clubs operate. This is probably how people inside the game saw data: as a magic box that can make them win more. The book pushes back on this at times, but it never really engages with football analytics as its own field of interest, regardless of what happens within clubs. As the title suggests, Smith delves into where expected goals came from and how it became the most well-known metric in the sport. But he never really gets into the purpose of xG and why it’s so useful.
O’Hanlon, by comparison, is no neutral bystander. In his time as an editor at Grantland and then The Ringer, he published some really important and valuable articles on football analytics, from writers such as Bobby Gardiner (now an analyst for AC Milan), James Yorke (currently Director of Football at StatsBomb), and a lot of groundbreaking stuff from Mike Goodman. After that, he wrote his heavily stats-themed Substack newsletter No Grass in the Clouds, before shutting it down – presumably because of the high standard of his competition – and taking a staff writer job at ESPN. (Full disclosure: I worked with O’Hanlon on FiveThirtyEight’s old Soccer Chats feature.)
He’s someone with a sincere enthusiasm for football analytics, and it shows throughout Net Gains. In stark contrast to the competition, it’s a book full of ideas about how the game is played and what analytics teaches us about events on the pitch. Both books talk about the emergence of expected goals, but only Net Gains outlines just why it’s so useful. O’Hanlon uses Cristiano Ronaldo as an example of a player who went on a huge finishing slump while the footballing world insisted he was finished, even though xG had his back the whole time. Needless to say, Ronaldo did not suddenly forget how to kick the ball into the goal. Ironically, despite the title, Expected Goals never teaches you about the metric in such a way. O’Hanlon’s book is full of insights about the field, such as the difficulty of quantifying the value of a controlling midfielder like Sergio Busquets. Expected Goals wants to tell us the who, what, when, where and how, but Net Gains is the only book interested in the why.
This enthusiasm arguably makes his history more sound. Both books recount the history of how xG emerged, acknowledging that many different people came up with the idea independently. But Smith credits its emerging popularity directly to Opta and their former data analyst Sam Green, writing:
“Green does not regard himself as the inventor of Expected Goals. He did not know, while he was building his system, of the work being done by StatDNA and Decision Technology, but he recognises now that they were all thinking along similar lines. There was a reason he said, that when he was out pitching Opta’s services to clubs, he did not spend much time in north London. Among his peers at clubs and consultancies, there were a handful of people developing metrics that did much the same thing, after all. His version, though, would be the one that caught the imagination, the one that popularised the idea, the one that would be presented to the Royal Statistical Society, the one that would, in time, germinate so broadly that it appeared on Match of the Day. Green was in the right place at the right time to give football what would turn out to be its breakthrough metric, the one that would take analytics if not into the mainstream, then certainly into one of its tributaries. Football has Green to thank for the – belated, eventual – arrival of xG.”
O’Hanlon tells a different story, crediting its popularity much more to the popularity of public blogging and social media, citing Michael Caley in particular. As he tells it,
“Michael Caley can’t quite claim credit for [analyst Paul] Power’s hated “expected goals”. […] In 2009, on his site, Soccermetrics, Howard Hamilton wrote about the sport’s need for an “expected-goal value” in a post titled “Moneyball and soccer.” According to Caley, Sarah Rudd, formerly of Arsenal, and Sam Green, an early employee of Opta, were both working on expected-goals models in the late [2000s] and early 2010s.
“Once you’re trying to get the components of goals, a lot of people have come to that independently,” Caley said. “I certainly didn’t discover anything because other people had already done it. But I think it’s an idea that makes sense once you start working with the data you have.”
Now, Caley does claim that he was the first person to come up with the abbreviation “xG.” And beyond Power’s criticism of the name, the power of xG is that it both scratches at the intuition of anyone who has ever watched a soccer game before and at the same time upends conventional wisdom. Caley and the like were the first to codify the idea with both a term and a mathematical model, but everyone who watches a soccer game is doing so with some kind of expected-goals model in their head.”
No one’s facts are necessarily off here. Green was undeniably a pioneer in the field of xG, with his work certainly changing the game internally at Opta. But, at least from my vantage point, xG does not become the metric without Caley and others lighting up Twitter with their data in the 2010s. It wasn’t just niche blogs and tweets. Two years before the supposed breakthrough moment of xG appearing on Match of the Day, Caley was using the metric in that super niche and obscure publication, The Washington Post. Smith almost completely ignores the online analytics work done in the public sphere, and that’s a glaring oversight (quite how you write a whole book on the subject without mentioning Colin Trainor even in passing is beyond me).
And that’s really the difference between the two books. Smith tells his story by talking to people inside football, so any work done outside that sphere, no matter how important, doesn’t seem to matter. It may be a story of what happened inside football clubs, but I don’t think it shows a complete picture of how and where this all came from. O’Hanlon comes with a much greater interest in exploring football analytics as something worth discussing whether it “changed the game forever” or not. I disagreed with a few of his conclusions, but I found Net Gains to be a richer and more thought-provoking text than Expected Goals.
Net Gains spends a decent chunk of one chapter on Karun Singh, best known for developing his expected threat model. Singh produced arguably the most exciting public-facing work in years. In the book, Singh explains to O’Hanlon his reasons for working on his own blog instead of within a football club:
“”A lot of my motivation starting out in analytics in the first place was kind of academic. From my perspective, what I wanted to see most is the field advance as a whole,” he said. “I think one of the most effective ways to do that is remain an outsider, not be affiliated with a particular club or organisation, which has a couple of benefits. First, of course, you get to open source your work, you get to share it more broadly, you get to share it at conferences, it’s just more accessible. And second, I think if you’re in that club or data company environment you’re always going to be motivated by the problems at hand, whether it’s business problems or the next game that you have to file a report for. Given my academic mindset or motivation toward the whole field, I feel pretty comfortable as an outsider. It gives me a broad view as well of the things people are doing across the board.” He continued, “I’m not ruling out working for a club or organisation at some point. It’s on my bucket list to work for a club and see what that’s like, but in the meantime, I’ve found peace with my current position. It allows me flexibility in terms of the ideas I want to explore. If I find something particularly interesting that isn’t exactly motivated by what a coach is asking for, I can do that without anyone asking me questions. It has its disadvantages – I don’t have as much context on the ground, I haven’t worked directly with other practitioners, I don’t know the best way to talk with them or work with them – but from an innovation perspective I like my current vantage point.”
But hey, if Arsenal come calling, who knows?”
Some time after Singh conducted this interview, Arsenal came calling. A week before Net Gains was published, Singh announced he had joined the Gunners as a data scientist, packing his bags and moving to London. He should be an excellent hire for Arsenal, and it’s obviously a huge opportunity for Singh. But it speaks to a disappointing truth: all of the best, most cutting edge analytics work is sealed off within football clubs. Liverpool have a particle physicist from CERN working on advanced pitch control models. Manchester City have a former statistician for the Treasury (the UK government department in charge of finances) with a PhD in computational astrophysics working on AI.
What have either of these people discovered about football under the employ of such wealthy clubs? I have absolutely no idea. The cutting edge stuff is top secret. All of these clubs invest in analytics as a zero sum game. They want to find an edge their opponents don’t know about. To ensure that’s the case, they have to make sure no one outside their organisations can know what’s going on. They have absolutely no interest in advancing the field overall if it means helping other clubs learn more.
I didn’t become interested in football analytics because I wanted to change the game. If clubs are leaving value on the table with obvious inefficiencies, that’s not my problem. I’m just here to watch the sport and write about it. What I really wanted was to learn more about how football works and understand the game on a deeper level. I’m here for interest and enjoyment, and I don’t gain any more of that if every club in Europe signs players based on an advanced EPV model.
The situation we have now is one where the field is presumably advancing very quickly, but separately, inside football clubs. If you got a senior data analyst at a top club drunk enough, I’d think they could tell you all kinds of groundbreaking things about football. But here on the outside, analytics discussions have barely evolved since I was writing for StatsBomb four years ago. Is this it? Is xG, crude pressing measures and some basic ball progression stuff the sum total of what us mere mortals will ever know about football?
Most clubs do not integrate data into their processes very well at this point, but let’s just say that changes. Let’s say that in ten years’ time, every club incorporates analytics into all their decisions, while using proprietary models far more advanced than anything we can see publicly. At that point, none of us will be able to know what good decision making and good strategy looks like. We’ll have come full circle and we will understand football even less than we ever did before, because the astrophysicists have figured out all the things we will never know.
That doesn’t feel like the exciting future I once hoped for.
Good point about the lack of evolution in the football analytics discussion. Sometimes I hear "this team underlying numbers..." and I think, yeah, you mean almost literally this team's xG.
I follow the NBA closely and I'm a bit more optimistic about this, though. My feeling with the NBA is that the teams are obviously way further ahead than public knowledge about analytics, but there are more or less constant improvements in the public discussion. It will be slower in football because there's much less 'stats culture' among the fans and the media, but I guess we will eventually get there somehow.
As someone who follows both baseball and football, I've seen one long-term effect of all (most?) teams using some form of data analysis: it has been quite anti-player and has been used to suppress earnings. Maybe with a less closed shop across Europe (for now… and the EPL makes this arguable too) this won’t be as much of a worry. Publicly, there have been some interesting things about biomechanics that have filtered from teams to the public a bit in baseball, and that could be interesting in the future for football.