For the past week I’ve been messing around with some stats, looking at the geographical distribution of Football League and Premier League clubs by region, using the NUTS 1 regional boundaries. The purpose of the exercise is to explore the link between population, wealth and footballing success, measured by the number of top clubs in each region.
Using stats on population by region from the Office for National Statistics, I produced this scatter graph of regional population against the number of Football League teams in each area.
As we can see, there does appear to be quite a strong relationship between the two: the number of Football League clubs increases in line with a region's total population, with a correlation coefficient of 0.75. This suggests that the total population of a region does have a reasonably strong bearing on how many top clubs can be found in that region.
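For anyone wanting to reproduce a figure like this, here is a minimal sketch of how a Pearson correlation coefficient can be computed. It assumes that Pearson's r is the coefficient being quoted, which the post does not state. The function name `pearson_r` and the numbers in the example are placeholders of my own, not the ONS figures.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the mean (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Placeholder values for illustration only, NOT the real regional data.
population_millions = [2.0, 4.0, 5.0, 7.0, 8.0]
league_clubs = [4, 7, 6, 12, 11]

r = pearson_r(population_millions, league_clubs)
print(round(r, 2))
```

A value near 1 means the two series rise together, a value near 0 means no linear relationship, and a value near -1 means one falls as the other rises. The function fails with a division by zero if either series is constant.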
For regional wealth, however, the relationship is not quite as strong. This graph shows the correlation between regional GVA (Gross Value Added), obtained from the ONS, and the number of Football League clubs.
There is again a positive relationship, with regions with higher total GVA having more top clubs, but the correlation coefficient here is a more moderate 0.59. That is still higher than the 0.42 correlation between the number of clubs and GVA per head. All this suggests that, of the two measures, total GVA and population, population is the stronger predictor of the number of Football League clubs in a region.
Taking this insight and using the regional population figures, I then decided to compare the performance of different regions. This is where things get slightly more interesting. Assuming that the 20 Premier League clubs and all 92 Football League clubs were shared out across England and Wales in proportion to population, I calculated the difference between the expected and the actual number of clubs in each region. The result is this graph:
The regions which perform best, with more Premier League and Football League clubs than would be expected based on their share of the total population, are the North West, West Midlands and London. Particularly poor performers are the East Midlands, East of England, South East, and Wales.
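The expected-versus-actual comparison can be sketched in a few lines. This is only an illustration of the arithmetic: each region's expected share is its fraction of the total population multiplied by the total number of clubs. The function names and the three made-up regions are my own placeholders, not the real data.

```python
def expected_clubs(populations, total_clubs):
    """Share out total_clubs across regions in proportion to population."""
    total_pop = sum(populations.values())
    return {region: total_clubs * pop / total_pop
            for region, pop in populations.items()}

def over_under(populations, actual_clubs):
    """Actual minus expected clubs per region (positive = over-performing)."""
    expected = expected_clubs(populations, sum(actual_clubs.values()))
    return {region: actual_clubs[region] - expected[region]
            for region in populations}

# Hypothetical regions and figures, for illustration only.
populations = {"Region A": 6.0, "Region B": 3.0, "Region C": 1.0}  # millions
premier_league = {"Region A": 10, "Region B": 8, "Region C": 2}    # 20 clubs

diff = over_under(populations, premier_league)
for region, value in diff.items():
    print(f"{region}: {value:+.1f}")
```

In this toy example Region A holds 60% of the population, so it would be expected to have 12 of the 20 clubs. With only 10 it comes out two clubs short, while Region B comes out two clubs ahead.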
One explanation for this difference in the regions' performance is historical. The North West was the cradle of the early Football League, which means clubs there are more established and benefit from deeper connections and more mature support bases than clubs in other regions. Another reason may be that the North West, West Midlands and London are all home to major conurbations, whereas regions which perform less well, such as the South East, East of England and South West, are not. Conurbations benefit clubs by giving them a wider catchment area for support, far wider than that of a club in even a decent-sized city like Newcastle or Sheffield. A good example is the three main conurbations of London, Manchester and Birmingham: in 2011/12 some 50% of Premier League sides came from one of these three urban areas.
One way of testing the importance of conurbations is to use data on regional population density. Though the measure produced by the ONS is an overall figure for each region, it is likely to be higher in regions with large conurbations. At first glance, however, there appears to be no correlation, but this is largely due to the effect of the outlier, London.
If we remove London from the data set then we get this:
The correlation coefficient is a strong 0.82, higher than the figure for total population, suggesting that population density is the more important factor in predicting the number of Football League clubs in a region.
We might also suspect that the relationship is stronger at the top of the Football League than at the bottom. Dividing the Football League in two, into the top two divisions and the bottom two, lets us test this.
The respective correlation coefficients for population density and the number of teams are 0.85 for the top two divisions and 0.62 for the bottom two, so regional population density does matter more in the upper half of the Football League.
Looking at the scatterplots in both cases, too, the North West, rather than being an anomaly, has roughly the number of clubs in the top half of the Football League that we would expect given its population density of a little over 500. This suggests that the history of football in the region is less of a factor in explaining its success; rather, it is the history of the region's urban development that matters.