Data and statistics have become far more prevalent in football over recent years. At the forefront of this is expected goals (or xG). Since xG was introduced in 2012 by Opta’s Sam Green, the metric has gone on to become one of the most widespread and insightful within football analytics.
Following early adoption by betting companies and professional clubs, expected goals has now become a regular feature for mainstream global broadcasters such as Sky Sports, BBC’s Match of the Day, beIN Sports and NBC. The metric has ascended from the laptops of analysts and now regularly finds itself in the mouths of Premier League managers and TV pundits.
Expected goals is one of the first advanced metrics to become widely known among general football fans and so it has inevitably faced its critics over the years (see Jeff Stelling in 2017). The debate has often been framed as a battle between the traditional way of viewing the game and the emerging world of data analytics. However, before we pass judgement, it is important to understand how the metric works and how we should be using it.
What Is Expected Goals (xG)?
Expected goals (or xG) measures the quality of a chance by calculating the likelihood that it will be scored, using information on similar shots in the past. We use nearly one million shots from Opta’s historical database to measure xG on a scale between zero and one, where zero represents a chance that is impossible to score, and one represents a chance that a player would be expected to score every single time.
We know that a chance from the halfway line isn’t as likely to result in a goal as a chance from inside the penalty area. With xG, we can give numbers to these scenarios. For example, suppose the chance from inside the box is assigned an xG of 0.1. This means that a player would, on average, be expected to score one goal from every ten shots in this situation or 10% of the time.
The terminology may be new, but football fans and commentators were expressing the same idea for years before xG was introduced: “he scores that nine times out of ten” or “he should’ve had a hat-trick today”.
How Do We Calculate Expected Goals?
While watching a game, we can intuitively tell which chances are more or less likely to be scored. How close was the shooter to goal? Were they shooting from a good angle? Was it a one-on-one? Was it a header?
The difficulty is that there are an average of 25 shots per game. That’s 250 shots over a weekend in most competitions. It would take a long time, even for the most well-trained eyes in the sport, to accurately assign likelihoods of scoring to each of these unique situations. And who has the time anyway? Luckily for you, our data scientists do!
With a model, we can calculate the likelihood of a goal being scored for all 9,609 shots taken in the 2022-23 Premier League season in a matter of seconds. The same goes for the 45,764 shots taken across the top five European leagues last campaign. And the 768,394 shots across all the competitions we covered last term. That’s a lot of shots.
Opta’s xG model uses a machine-learning technique called XGBoost (no relation to the naming of the metric) that is powered by nearly one million shots from our historical Opta data. This training data is taken from 40 competitions between 2018-19 and 2021-22.
The model uses information from before, and up to, the exact moment the shot was taken, and evaluates how more than 20 variables affect the likelihood of a goal being scored. Some of the most important factors are listed below:
- Distance to the goal.
- Angle to the goal.
- Goalkeeper position, giving us information on the likelihood that they’re able to make a save.
- The clarity the shooter has of the goal mouth, based on the positions of other players.
- The amount of pressure they are under from the opposition defenders.
- Shot type, such as which foot the shooter used or whether it was a volley/header/one-on-one.
- Pattern of play (e.g., open play, fast break, direct free-kick, corner kick, throw-in etc.).
- Information on the previous action, such as the type of assist (e.g., through ball, cross etc.).
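The first two factors in the list are pure geometry, so they are easy to illustrate. The sketch below is not Opta’s implementation; it assumes a simple coordinate system (x is metres out from the goal line, y is metres left or right of the centre of the goal) and a hypothetical helper called shot_geometry. The only fixed number is the 7.32 m width of a standard goal.

```python
import math

GOAL_WIDTH_M = 7.32  # distance between the posts of a standard goal


def shot_geometry(x, y):
    """Return (distance_m, angle_deg) for a shot taken at (x, y).

    Assumed coordinates: x is metres out from the goal line, y is metres
    left or right of the centre of the goal.
    """
    half = GOAL_WIDTH_M / 2
    # Straight-line distance from the ball to the centre of the goal.
    distance = math.hypot(x, y)
    # Angle between the two lines running from the ball to each post.
    angle = math.atan2(GOAL_WIDTH_M * x, x * x + y * y - half * half)
    return distance, math.degrees(angle)


penalty_spot = shot_geometry(11.0, 0.0)    # central, 11 m out
wide_position = shot_geometry(11.0, 15.0)  # same depth, 15 m off centre
```

From the penalty spot the goal mouth spans roughly 37 degrees; from the wide position the shooter is further away and sees a much narrower angle, which is the intuition behind both features.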
One unique and innovative feature in our expected goals model is the goalkeeper position feature, which allows us to estimate the probability of a goalkeeper making a save. It uses the distance of the goalkeeper to the shot (a proxy for their reaction time) and their position relative to the line of sight of the shot to the goal, including whether the goalkeeper was inside the penalty box and able to use their hands.
In addition to the interaction between these features, our xG model also infers where the shot taker is likely to aim in the goal and how this affects the likelihood of it being saved. These features allow us to evaluate goalkeeper positioning and see where the optimal position to make a save may have been.
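As a rough illustration of the two goalkeeper measurements described above, the sketch below computes the goalkeeper’s distance to the shot and how far they stand from the straight line between the ball and the centre of the goal. How the real model encodes “line of sight” is not stated, so the perpendicular-offset approach, the function name keeper_features and the coordinate system (x metres out from the goal line, y metres from the centre of the goal) are all assumptions.

```python
import math


def keeper_features(shot, keeper, goal_centre=(0.0, 0.0)):
    """Return (distance_to_shot, offset_from_shot_line) in metres."""
    sx, sy = shot
    kx, ky = keeper
    gx, gy = goal_centre
    # Distance from goalkeeper to ball: a proxy for reaction time.
    distance_to_shot = math.hypot(kx - sx, ky - sy)
    # Perpendicular distance from the goalkeeper to the straight line that
    # joins the ball and the centre of the goal. Assumes the shot is not
    # taken from the goal centre itself, where that line is undefined.
    dx, dy = gx - sx, gy - sy
    cross = dx * (ky - sy) - dy * (kx - sx)
    offset_from_shot_line = abs(cross) / math.hypot(dx, dy)
    return distance_to_shot, offset_from_shot_line


# Hypothetical positions: a central shot from 11 m, keeper 1 m off the line.
on_line = keeper_features(shot=(11.0, 0.0), keeper=(1.0, 0.0))
off_line = keeper_features(shot=(11.0, 0.0), keeper=(1.0, 2.0))
```

In this toy setup the first goalkeeper stands directly between ball and goal, while the second is two metres to one side of that line and so leaves more of the goal exposed.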
We recognise that some situations are distinctive enough to be modelled independently. Penalties are the most consistent shot in football and are given a constant value reflective of their historical conversion rate (0.79 xG).
Expected Goals in Women’s Competitions
Opta’s xG measures chances in women’s competitions with a separate model. It was found that some variables, like distance to goal and the goalkeeper’s likelihood of making a save, had a greater influence on the likelihood of a chance being scored in women’s competitions.
For example, we found that if we used the men’s model for women’s shots, we would be underestimating the effect of distance for close-range shots while we would be overestimating the effect of distance for long-range shots.
To preserve the benefits of the depth of historic data in the men’s competitions, the women’s model is powered by the same features described above but it is re-trained on relevant data across nine major women’s competitions between 2018-19 and 2021-22.
The main criticisms of expected goals often appear in scenarios where the metric isn’t being applied correctly, most commonly at the game level. A team having a higher xG total in a match doesn’t necessarily imply that they should’ve won the game. xG measures only chance quality, not the expected outcome of the game.
Another misconception is in the literal interpretation of the metric name. We do not “expect” goals to occur exactly as the likelihood predicts. We also understand that fractions of goals cannot be scored. The name “expected goals” is derived from the mathematical concept of “expected value”: the average result we would see if the same situation were repeated many times.
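One way to see why a higher xG total is not a verdict on the result is to turn each team’s shots into a distribution of possible scorelines. The sketch below does this under a simplifying assumption that every shot is independent of the others, which real matches do not strictly satisfy. The function names and the shot values are hypothetical.

```python
def goal_distribution(shot_xgs):
    """Return a list where entry k is the probability of scoring exactly
    k goals, treating each shot as an independent chance."""
    dist = [1.0]  # before any shot, zero goals is certain
    for p in shot_xgs:
        new = [0.0] * (len(dist) + 1)
        for goals, prob in enumerate(dist):
            new[goals] += prob * (1 - p)   # shot missed
            new[goals + 1] += prob * p     # shot scored
        dist = new
    return dist


def outcome_probabilities(team_a_xgs, team_b_xgs):
    """Return (team A win, draw, team B win) probabilities."""
    a = goal_distribution(team_a_xgs)
    b = goal_distribution(team_b_xgs)
    win = sum(pa * pb for i, pa in enumerate(a) for j, pb in enumerate(b) if i > j)
    draw = sum(pa * pb for i, pa in enumerate(a) for j, pb in enumerate(b) if i == j)
    loss = sum(pa * pb for i, pa in enumerate(a) for j, pb in enumerate(b) if i < j)
    return win, draw, loss


# Hypothetical match: 15 low-quality shots (1.5 xG) against 2 big chances (1.0 xG).
many_low = [0.1] * 15
few_high = [0.5, 0.5]
win, draw, loss = outcome_probabilities(many_low, few_high)
print(f"A wins {win:.0%}, draw {draw:.0%}, B wins {loss:.0%}")
```

Even with the larger xG total, the first team’s win probability in this toy example sits well short of certainty, because draws and defeats remain realistic outcomes.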
A fair coin toss is 50% likely to land on heads and 50% likely to land on tails, so the expected number of heads (or tails) from a single toss is 0.5. We do not expect exactly half of our tosses to land on each outcome, but rather that over a large number of coin tosses, the share of each outcome should settle close to this value (regressing to the mean). The same applies to expected goals. Variance from the expected value is inevitable and this is valuable information that we can analyse in football.
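A small simulation makes the point concrete. The sketch below repeatedly “takes” a chance worth 0.1 xG and counts the goals; the function name, the seed and the sample sizes are illustrative choices, and each shot is assumed to be independent.

```python
import random


def simulate_goals(xg, n_shots, rng):
    """Count goals from n_shots independent chances that each have value xg."""
    return sum(1 for _ in range(n_shots) if rng.random() < xg)


rng = random.Random(2012)  # fixed seed so the run is repeatable
for n_shots in (10, 1_000, 100_000):
    goals = simulate_goals(0.1, n_shots, rng)
    print(f"{n_shots:>7} shots: {goals} goals, expected {0.1 * n_shots:.0f}")
```

Over ten shots the goal count can easily differ from the expected one goal, but over a very large number of shots the scoring rate settles close to 10%.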
Understanding xG Overperformance
A player or team who has been overperforming their xG does not then have to underperform to regress to expectation. Believing that they must is known as the Gambler’s Fallacy. While we would expect their future shots to be scored in line with expectation, they have already ‘banked’ this overperformance.
If a player has already scored five goals more than their expected goals total early in the season, it is likely that they will still finish the season around five goals above their final xG total. In the same way, if a coin landed on heads ten times in a row, future tosses are still equally likely to land on heads as on tails, but the ten heads have already happened.
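The arithmetic behind ‘banked’ overperformance can be sketched in a few lines. The player and all of the numbers below are hypothetical, and the projection assumes future shots are scored at exactly their xG rate.

```python
def projected_season(goals_so_far, xg_so_far, xg_remaining):
    """Project final goals and final xG, assuming future shots are scored at
    exactly their xG rate (no further over- or underperformance)."""
    projected_goals = goals_so_far + xg_remaining
    season_xg = xg_so_far + xg_remaining
    return projected_goals, season_xg


# Hypothetical player: 10 goals from 5.0 xG so far, 12.0 xG still to come.
goals, xg = projected_season(goals_so_far=10, xg_so_far=5.0, xg_remaining=12.0)
banked = goals - xg  # the early five-goal surplus carries through unchanged
```

Regression to expectation affects the rate of future scoring, not the goals already on the board, so the projected surplus at the end of the season is still five.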
How Can We Use Expected Goals?
Let’s compare two players from their 2022-23 seasons: Marseille’s Alexis Sánchez in Ligue 1 and Wolverhampton Wanderers’ Rúben Neves in the Premier League. Both players took exactly 63 shots last season (excluding penalties) but scored 12 and three goals respectively.
So, what was the difference between their shots?
By quantifying the quality of the 63 chances for each player, xG adds context to their shots that goes beyond traditional metrics such as shots on target or average shot distance. We can compare the quality of chances that each player had.
From the chances that Sánchez had, we would expect the average player to score about 10 goals (10.2 xG). On the other hand, from Neves’ chances, we would expect the average player to score only three goals (2.8 xG).
Looking at their shot maps below, we can immediately understand why their goalscoring output was so different. Both players overperformed slightly according to their expected goals output, but their 63 chances were very different in quality, with Sánchez shooting from far closer to goal, and their output reflected that.
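The comparison can be reduced to two simple numbers per player: average chance quality (xG per shot) and goals scored relative to xG. The sketch below uses only the figures quoted above; the dictionary layout and the summarise helper are illustrative.

```python
# Non-penalty figures for 2022-23 as quoted in the text above.
players = {
    "Alexis Sánchez": {"shots": 63, "goals": 12, "xg": 10.2},
    "Rúben Neves": {"shots": 63, "goals": 3, "xg": 2.8},
}


def summarise(stats):
    """Return average chance quality and finishing relative to expectation."""
    return {
        "xg_per_shot": round(stats["xg"] / stats["shots"], 3),
        "goals_minus_xg": round(stats["goals"] - stats["xg"], 1),
    }


summary = {name: summarise(stats) for name, stats in players.items()}
for name, row in summary.items():
    print(name, row)
```

Sánchez’s average chance was worth about 0.16 xG against roughly 0.04 for Neves, while both finished slightly above expectation (+1.8 and +0.2 goals respectively).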
We’ve focused on an individual player example here, but the expected goals metric can also be applied to teams or games in a similar manner. Of course, we can see here that a player or team may score more or less often than their xG value suggests but this is exactly the variance we can now analyse. Is a player scoring less than he should be? Who is getting chances from high xG situations?
Expected Goals Depth
Football is a relatively low-scoring sport and so our ability to measure the likelihood of a goal being scored is essential context. With expected goals, we can arm pundits and analysts with another tool to quantify the stories that every football fan wants to hear. Which striker is struggling with their finishing? Which team’s form suggests they should be higher in the league table?
The unrivalled depth of Opta’s data means that we now have over 4.5m shots enriched with xG values for more than 100,000 players, which allows us to compare and understand the performances of players and teams all over the world.
xG is a metric that goes beyond the traditional shot counts, but it is important to remember that it is still just a metric. We can use it to evaluate underlying performances, but it is actual goals that are going to win you football matches.
Football is unpredictable and goals can come from any number of unexpected situations, but with expected goals we can explain just how unlikely these were.