using pca to create striker profiles
with forwards from europe's top 5 leagues
In this article I explore the feasibility of creating striker roles, and subsequently profiles, through the dimensionality-reduction method of Principal Component Analysis.
INTRODUCTION
The history of football contains a long list of legendary strikers, from the likes of Pele, Eusebio and Van Basten to those of Henry, Zlatan and Rooney. Whilst all of these great players left a lasting mark on the world of football, their styles varied widely, from physically menacing goal-scoring machines like Didier Drogba and Christian Vieri to nimble-footed fox-in-the-box players like Filippo Inzaghi and Gary Lineker. These different styles, strengths and weaknesses are a major part of what makes discussions about who amongst these greats is the greatest so entertaining and, at the same time, never conclusive.
Whilst the aforementioned strikers matched their playing styles to their individual strengths and weaknesses, their success within their respective roles was also made possible by the specific tactical instructions of the teams they played for, just as those teams’ success depended on them. For example, a team like Liverpool, which relies on heavy pressing to win the ball back as quickly as possible, would probably not have been as successful had they employed a striker who played more as a Target Man or Poacher instead of a Pressing Forward (these roles, as the basis of this article, will be discussed later in much more detail). This is where the use case of this article lies: instead of having to look at the many different features that make up a player’s profile to assess their suitability for specific roles, we explore the feasibility of reducing these features to several key components that can be attributed to specific roles.
This would, if effective of course, enable people working in or around the football industry to assess the specific profile of a player on a more objective foundation, whilst also avoiding the questions that can arise when working with a large number of features. In addition, we will test whether players really excel (albeit with a limited sample size) at one role but not another to the degree we perhaps expect them to.
METHODOLOGY
The data used for this article was downloaded from the football reference website, where it is provided for free by StatsBomb. The final dataset, after removing the goalkeeping features and some less relevant ones, consists of 132 features covering 172 strikers from the Premier League (England), Bundesliga (Germany), Ligue 1 (France), La Liga (Spain) and Serie A (Italy). These features cover overarching topics such as shooting, passing, pass types, dribbling, goal- and shot-creating actions and defensive actions, and will be transformed into five principal components.
To create these principal components we make use of a method called Principal Component Analysis (PCA). PCA is a popular method for reducing the dimensions of datasets with a large number of features per observation whilst preserving as much information as desired. PCA does this by linearly transforming the selected features into a new coordinate system in which most or all of the data’s variation can be described in fewer dimensions. The method will usually work particularly well in instances where multicollinearity is strongly present within a dataset. Multicollinearity means that several variables within a dataset are correlated with one another, which is often the case with football data. For example, a player who pressures opponents more often is very likely to have a higher number of tackles and/or interceptions as well.
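To make the idea concrete, here is a minimal sketch of PCA on correlated data using scikit-learn. The data is synthetic and purely illustrative (six made-up features driven by two hidden traits), and the choice to standardise the features first is my assumption for the sketch, not a description of the exact pipeline behind this article.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical data: 172 "players" and 6 features that are driven by
# 2 hidden traits (think defensive work rate and shooting volume) plus noise.
n_players = 172
traits = rng.normal(size=(n_players, 2))
mixing = np.array([
    [1.0, 0.9, 0.8, 0.0, 0.0, 0.0],  # trait 1 drives features 0-2
    [0.0, 0.0, 0.0, 1.0, 0.9, 0.8],  # trait 2 drives features 3-5
])
X = traits @ mixing + 0.1 * rng.normal(size=(n_players, 6))

# Standardise so that every feature contributes on the same scale.
X_std = StandardScaler().fit_transform(X)

# Project the 6 correlated features onto 2 principal components.
pca = PCA(n_components=2)
scores = pca.fit_transform(X_std)

explained = pca.explained_variance_ratio_.sum()
print(f"Share of variance kept by 2 of 6 dimensions: {explained:.1%}")
```

Because the features within each group move together, two components retain nearly all of the variation here. Real football data is messier, but the same mechanism is what makes PCA attractive for it.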
Using this technique, we extract 5 principal components from the dataset whilst preserving roughly 80 percent of its variance. We then use correlation scores between the original features and each principal component to determine which features do and do not contribute to it. From this we can derive the specific characteristics of each principal component and determine which striker playing style matches it. In addition, to make the components and their respective roles a bit more concrete, we pick for each component a striker who excels in it and look at whether their underlying stats match the role fitted to the component.
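The steps described above could be sketched roughly as follows. Everything in the block is a stand-in: the data is randomly generated, and names such as `feature_0` and `player_0` are hypothetical. It shows one way to pick the number of components that reaches an 80 percent variance threshold, correlate the original features with the component scores, and find the player who scores highest on a component.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Hypothetical stand-in for the real dataset: 172 players, 12 features
# generated from 3 hidden traits plus noise.
n_players, n_features = 172, 12
traits = rng.normal(size=(n_players, 3))
mixing = rng.normal(size=(3, n_features))
data = traits @ mixing + 0.5 * rng.normal(size=(n_players, n_features))
features = pd.DataFrame(
    data,
    columns=[f"feature_{i}" for i in range(n_features)],
    index=[f"player_{i}" for i in range(n_players)],
)

X_std = StandardScaler().fit_transform(features)

# Fit all components, then keep the smallest number that reaches 80% variance.
full_pca = PCA().fit(X_std)
cumulative = np.cumsum(full_pca.explained_variance_ratio_)
n_keep = int(np.searchsorted(cumulative, 0.80) + 1)

pca = PCA(n_components=n_keep)
scores = pd.DataFrame(
    pca.fit_transform(X_std),
    index=features.index,
    columns=[f"PC{i + 1}" for i in range(n_keep)],
)

# Pearson correlation between every original feature and every component score.
correlations = pd.DataFrame(
    {pc: features.corrwith(scores[pc]) for pc in scores.columns}
)

# Features most strongly tied to the first component (by absolute correlation),
# and the player who scores highest on it.
top_features_pc1 = correlations["PC1"].abs().sort_values(ascending=False).head(3)
top_player_pc1 = scores["PC1"].idxmax()

print(f"Components kept: {n_keep}")
print(top_features_pc1)
print(f"Highest PC1 score: {top_player_pc1}")
```

Reading the `correlations` table column by column is what allows a component to be given a name: strong positive correlations with passing features would point one way, strong correlations with defensive actions another.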
COMPONENT ONE:
THE FALSE NINE
The first component has a strong positive correlation with all of the passing, dribbling and goal- and shot-creating actions variables within the dataset. This, combined with the fact that players who score higher on this component have a lot of touches in the middle third of the field, shows a clear resemblance to the role of a False Nine.
A False Nine, for those unfamiliar with the concept, is a role in which the striker often drops deeper (centrally) on the field, moving away from the central defenders, to receive the ball in between the lines, connect play and attempt to draw defenders out of position. The concept is said to have originated with the English amateur side Corinthian in the 1890s, where center forward G.O. Smith would often drop back to create opportunities for his teammates, and specifically the wingers, with through balls. This was already a clear shift from the traditional role of a forward, who would often be instructed to stay as high up the field as possible.
One of the players who scores particularly high on this dimension is Dutch forward Memphis Depay. Depay usually plays as a center forward, occupying this position for 2563 of his 3882 minutes on the field in all competitions during the 21-22 season. When we take a look at Depay’s underlying stats, it is easy to see why he scores high in the role of a False Nine. He ranks above the 90th percentile in total passes attempted and completed, total progressive distance of his passes, key passes and touches in the middle and attacking thirds, reinforcing his suitability for this role.
![](https://daanmolendijk.nl/wp-content/uploads/2022/10/Memphis-Depay_pizza-1-947x1024.png)
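Percentile scores like the ones quoted for Depay can be computed in a few lines. The sketch below uses made-up per-90 numbers for five hypothetical strikers and pandas' rank-based percentile; the exact percentile definition used for the charts in this article may differ.

```python
import pandas as pd

# Hypothetical per-90 values for five strikers; these are not real data.
stats = pd.DataFrame(
    {
        "passes_attempted_per90": [18.0, 25.5, 31.2, 22.4, 40.1],
        "key_passes_per90": [0.6, 1.1, 1.9, 0.9, 2.4],
    },
    index=["striker_a", "striker_b", "striker_c", "striker_d", "striker_e"],
)

# rank(pct=True) returns each value's rank divided by the number of players,
# so the highest value in a column ends up at 1.0 (the 100th percentile).
percentiles = stats.rank(pct=True) * 100

print(percentiles.round(0))
```

With 172 strikers in the sample, a score above the 90th percentile means a player ranks among roughly the top 17 for that stat.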
COMPONENT TWO:
THE DEFENSIVE STRIKER
Next we take a look at the component of the Defensive Striker. This component has strong and moderate positive correlations with tackles, pressures and touches in the defensive third of the field, blocked passes and interceptions. It also has a moderate negative correlation with the shooting variables covering total shots, shots on goal and expected goals, as well as with touches in the penalty area.
The Defensive Striker does not have as explicit a place in the history of football as the False Nine. It is more a role that comes with overarching tactics in which a team holds its position deep in its own half, as is the case with, for example, the infamous tactic of Catenaccio.
Within this component Burnley’s Jay Rodriguez really excelled during the 21-22 Premier League season, a conclusion again reinforced by his underlying stats. Rodriguez excels at pressuring in the defensive third of the field, blocking passes and tackling in the defensive third, with percentile scores all above the 97th.
Rodriguez’s stats also reinforce that, as indicated by the correlation scores of this component, the role of a defensive forward will often lead to less offensive output. Rodriguez’s non-penalty expected goals (a player’s expected goals excluding any penalties he has taken) are at the 40th percentile, whilst his expected goals per shot put him at the 22nd. Both are fairly logical when we consider that Rodriguez, due to his defensive tasks, will probably have fewer opportunities to get into the box and thus fewer shots on goal from short distances (which in turn lowers his expected goals output).
![](https://daanmolendijk.nl/wp-content/uploads/2022/10/Jay-Rodriguez_pizza-1-947x1024.png)
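For readers unfamiliar with the expected goals figures mentioned above, the arithmetic behind them is simple. The sketch below uses invented season totals (not Rodriguez’s actual numbers) and hypothetical helper names to show how non-penalty expected goals, a per-90 rate and expected goals per shot relate to one another.

```python
def non_penalty_xg(total_xg: float, penalty_xg: float) -> float:
    """Expected goals with the xG from penalties taken removed."""
    return total_xg - penalty_xg


def per_90(value: float, minutes: float) -> float:
    """Scale a season total to a per-90-minutes rate."""
    return value / minutes * 90


def per_shot(npxg: float, non_penalty_shots: int) -> float:
    """Average chance quality: non-penalty xG divided by non-penalty shots."""
    return npxg / non_penalty_shots if non_penalty_shots else 0.0


# Hypothetical season totals, not taken from any real player.
npxg = non_penalty_xg(total_xg=8.3, penalty_xg=1.5)
npxg_per_90 = per_90(npxg, minutes=1800)
npxg_per_shot = per_shot(npxg, non_penalty_shots=50)

print(f"npxG: {npxg:.2f}, per 90: {npxg_per_90:.2f}, per shot: {npxg_per_shot:.3f}")
```

A low value per shot suggests that a player’s attempts come, on average, from positions where scoring is less likely, which fits a striker who spends much of his time far from the opposition box.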
COMPONENT THREE:
THE TARGET MAN
Following the Defensive Striker we move on to the third component. This component has strong and moderate positive correlations with headed passes, final third passes and passes made under pressure, whilst having a moderate negative correlation with several variables concerning dribbling and carries. At first glance this seems to connect well to the role of the Target Man. A Target Man is a player who is usually tall and physically strong, excels at heading the ball and whose main responsibility is to win aerial duels, hold up the ball high on the pitch (through his strength) and distribute it under pressure (often aerially, dueling the central defenders or defensive midfielders).
The Target Man appears to be a product of the traditional British tactic of Route One, a tactic aimed at playing as directly as possible through heavy reliance on long balls and aerial superiority, which has more recently been employed by teams such as Stoke City (especially under manager Tony Pulis) and Burnley (under Sean Dyche). To employ this tactic successfully, a forward who could win these aerial duels and hold up play was very desirable, and thus the Target Man was born.
For a player who resembles this style of play nowadays we have to move to Italy, however, where Hellas Verona’s forward Milan Đurić excels within this component. Looking at his physical profile, it is clear why this could be the case: Đurić is listed as 1.99 meters tall and reportedly weighs in at 99 kilograms.
The underlying stats of Đurić reinforce his strengths as a Target Man: within this sample he ranks as the player with the most headed balls per 90 and the most passes under pressure per 90. He is adept at distributing the ball short as well, with a percentile rank of 80. Besides these offensive stats, he also uses his physicality to help at defensive set pieces, scoring above the 90th percentile for both clearances and touches in his own penalty box.
![](https://daanmolendijk.nl/wp-content/uploads/2022/10/Milan-Duric_pizza-1-768x831.png)
COMPONENT FOUR:
THE POACHER
Following the component of the Target Man, we move on to the fourth component of this analysis. This component has a moderate positive correlation with the variables concerning shooting and goal-scoring, whilst correlating moderately negatively with the playmaking and receiving variables within the dataset. This combination shows a clear resemblance to the role of the Poacher.
A Poacher is, together with the Target Man and False Nine, one of the more classical striker roles within football. Well-known Poachers, who immediately spring to mind when discussing this role, are Van Nistelrooy, Klose and, of course, Filippo Inzaghi. From these examples you are probably already able to form a clear image of what the playing style of a Poacher looks like, but to be sure I’ll give a short summary. Poachers are usually the strikers who do not concern themselves too much with defensive and especially playmaking responsibilities. Instead, they are the sort of striker who seems to be largely invisible during the game, only to suddenly appear at the end of an attack through smart positioning and finish it off with excellent goal-scoring ability. Scoring high on this component thus corresponds, on average, to more shots, shots on goal, goals scored and expected goals per 90.
One of the strikers within our sample who scores high on this component is well-known Lazio striker Ciro Immobile. Immobile appears to fit the mold of a Poacher, with percentile scores above 90 for total shots, shots on target and average goals per match. It must be noted, however, that Immobile also scores fairly high on the components of the Pressing Forward and False Nine, which matches neither the description of a Poacher nor the correlation results themselves. It is likely that Immobile is in that regard an outlier compared to the average player in this sample, and that his stats also benefit from playing for a relatively dominant team.
![](https://daanmolendijk.nl/wp-content/uploads/2022/10/Ciro-Immobile_pizza-2-947x1024.png)
COMPONENT FIVE:
THE PRESSING FORWARD
Moving on to the fifth component, we find one that shows strong and moderate correlations with pressing in the offensive third, receiving the ball and shooting more often. This, combined with the fact that the players who score highest on this component play as strikers for teams well known for their pressing, shows that it matches well with the role of the Pressing Forward.
The Pressing Forward is a role that has grown vastly in popularity during the last decade, being an essential part of the successful high-pressing tactics employed by teams such as Liverpool, Borussia Dortmund and RB Leipzig. Within this tactical mindset the striker is relied upon to be one of the first players to trigger the team’s press high up the field when the ball has been lost.
A player who scores particularly high on this component is RB Leipzig’s Yussuf Poulsen. This is rather unsurprising, with RB Leipzig, as mentioned before, being well known for their pressing style of play. Poulsen’s stats reinforce this profile, with percentile scores for pressures in the defensive, middle and offensive thirds all at or above the 85th percentile. In addition he has respective percentile scores of 90, 95 and 88 for total tackles, tackles in the defensive third and tackles in the offensive third. He also scores around the 93rd percentile for blocked passes.
Offensively, Poulsen contributes well too. His non-penalty expected goals put him at the 88th percentile amongst strikers. In addition, his goals per 90 and shots on target per 90 are above average as well. This reinforces the earlier discussed correlation of this role with offensive output, alongside the defensive aspects of the Pressing Forward.
![](https://daanmolendijk.nl/wp-content/uploads/2022/10/Yussuf-Poulsen_pizza-1-947x1024.png)