Excitement. Tomorrow is the beginning of the new non-league season. I’ve made my plans already and will, all things being well, be putting up a post about my excursion during the week. In the meantime I have some more statistics to get out of my system. Again I’ve been looking at attendances, but this time I am looking at the impact of various factors on attendances on a match-by-match basis.
I have been messing around with a statistical technique known as regression analysis and for this analysis I have used data, obtained from www.football.co.uk, relating to attendances at Havant and Waterlooville stretching over two seasons: 2009/10 and 2010/11.
The factors I most wanted to look at were form, day of the week, local derbies and the ‘pulling power’ of the opposition.
Putting all the data into the regression model I obtained the following output:
- Adjusted R Squared = 0.30
- Intercept = 544.2
- Derby = 189.7*
- Win Ratio = 281.3
- Weekday = -150*
- Opponent Average = 0.16
(*) = significant at 5% level
Don’t worry too much about what these figures mean at this stage – all will be explained in due course! The first figure, the adjusted R squared, refers to how much of the variation in attendances the model as a whole explains. In this case it is 0.30, so we can say that the model explains 30% of the variation in attendances, with 70% being down to other, unknown, factors which lie beyond the model.
Then we have the intercept – this is the figure attendances would take if all the factors in the analysis were equal to 0. This figure does not necessarily make sense on its own, as in reality some factors will always be greater than 0. What we do with this figure is add the effects of the coefficients to it.
The next set of figures are the coefficients. These refer to how much difference each factor I have included in the model makes when all other factors remain constant. In just a moment I will look at each of these factors in more detail.
Finally, significance refers to how sure we can be that the results haven’t occurred by chance. A generally accepted standard for this is the 5% level – if this is not met then we can’t be reasonably sure that the results are not simply due to chance.
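To make the "intercept plus coefficients" idea concrete, here is a minimal Python sketch. The coefficients are the ones from the output above; the function name, the 0/1 coding of Derby and Weekday, and the example fixture are my own assumptions for illustration.

```python
# Coefficients copied from the regression output above.
INTERCEPT = 544.2
COEF_DERBY = 189.7
COEF_WIN_RATIO = 281.3
COEF_WEEKDAY = -150.0
COEF_OPPONENT_AVG = 0.16


def predict_attendance(derby, win_ratio, weekday, opponent_avg):
    """Return the intercept plus each coefficient times its factor.

    Assumptions: derby and weekday are 0/1 flags, win_ratio runs
    from 0 to 1, and opponent_avg is the opponent's average gate.
    """
    return (
        INTERCEPT
        + COEF_DERBY * derby
        + COEF_WIN_RATIO * win_ratio
        + COEF_WEEKDAY * weekday
        + COEF_OPPONENT_AVG * opponent_avg
    )


# Hypothetical fixture: a Saturday derby, win ratio of 0.6,
# against an opponent who averages 1000.
print(round(predict_attendance(derby=1, win_ratio=0.6, weekday=0, opponent_avg=1000), 1))
```

Bear in mind the model only explains about 30% of the variation, so any single prediction like this is a rough guide at best.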
1.) Form
To measure form I calculated a ‘win’ ratio over the previous 5 games, where a draw counts as half a win. So if the last 5 games had produced 2 wins, 2 draws and a defeat, this would be 1 + 1 + 0.5 + 0.5 + 0 = 3, divided by 5, which gives 0.6.
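The win ratio described above can be sketched in a few lines of Python. The "W"/"D"/"L" result codes and the function name are just my own shorthand for illustration.

```python
def win_ratio(results):
    """Form over the given results: win = 1, draw = 0.5, loss = 0."""
    points = {"W": 1.0, "D": 0.5, "L": 0.0}
    return sum(points[r] for r in results) / len(results)


# Two wins, two draws and a defeat in the last five games.
print(win_ratio(["W", "W", "D", "D", "L"]))  # 0.6
```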
If we look at the above graph we can see the relationship between attendance and form over the two seasons in question and to me there certainly appears to be something going on.
The coefficient is 281.3. This means that if all 5 games have been won (a win ratio of 1) we can expect an additional 281.3 people to show up, compared with a run of 5 defeats (a win ratio of 0), which adds nothing. If on the other hand we had 2 wins and 3 losses we would be looking at an additional 112.5. If it were 3 wins out of the last five? That would be 168.8 – so in that case an extra win would mean an additional 56.3 people could be expected at the ground.
It is important to note, however, that the coefficient is not statistically significant. This could be because we don’t have enough data, or because the measure of ‘form’ used is not as good as another measure such as goals scored, or form over the last 10 home games, or some such thing.
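For anyone who wants to check the arithmetic, this short Python sketch lists the form effect for each number of wins out of five. It assumes the remaining games were all lost (no draws), and the function name is my own.

```python
COEF_WIN_RATIO = 281.3  # form coefficient from the regression output


def form_effect(wins, games=5):
    """Extra spectators implied by the form coefficient.

    Assumes every game that was not won was lost (no draws).
    """
    return COEF_WIN_RATIO * wins / games


for wins in range(6):
    print(f"{wins} wins out of 5: about {form_effect(wins):.1f} extra spectators")
```

Each additional win moves the win ratio by 0.2, which is where the 56.3 figure comes from (281.3 x 0.2).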
2.) Day of the week
Here I have distinguished between games played on weekends (Saturdays) and games played on weekdays. When all other factors are held constant, a weekday game at Havant & Waterlooville can be expected to attract 150 fewer spectators than a game played on a Saturday. Unlike form, this coefficient is statistically significant at the 5% level.
3.) Local derbies
What constitutes a derby? Generally speaking it’s a case of proximity, but there are derbies and there are derbies – for example Southampton and Portsmouth is a more intense rivalry than Southampton and Bournemouth. In this analysis I have chosen county as the defining feature of a derby and designated a game as a derby when the opposition are, like Havant & Waterlooville, from Hampshire.
The coefficient was 189.7, meaning quite simply that, all other things held constant, derby games could be expected to attract this number of additional people. Over the two seasons H&W had 7 derby games, and in both seasons the game against Eastleigh drew the highest gate of the season: 1451 and 1020 respectively. This coefficient, like the weekday one, was statistically significant at the 5% level.
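Both the weekday and derby factors are simple yes/no flags. Here is a rough Python sketch of how they could be coded as 0 or 1 before going into a regression. The function names, the example dates and the non-Hampshire county are assumptions of mine, not taken from the data set.

```python
from datetime import date

HOME_COUNTY = "Hampshire"  # Havant & Waterlooville's county


def weekday_flag(match_date):
    """1 for a Monday-to-Friday game, 0 for a weekend game."""
    return 1 if match_date.weekday() < 5 else 0


def derby_flag(opponent_county):
    """1 if the opposition come from the same county, else 0."""
    return 1 if opponent_county == HOME_COUNTY else 0


# Hypothetical examples: a Saturday and a Tuesday in August 2010.
print(weekday_flag(date(2010, 8, 14)), weekday_flag(date(2010, 8, 17)))
print(derby_flag("Hampshire"), derby_flag("Kent"))
```

With flags coded this way, each coefficient reads directly as the number of spectators added (or lost) when the flag is 1.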
4.) Opposition pulling power
I have used the opposition’s average attendance for the season in question as a measure of the drawing power of the opposition. The ‘big’ teams in the league, those with the most drawing power, will tend to have a higher average and may also possess a larger travelling support. Though the coefficient of 0.16 seems small, consider what happens if an opponent has an average attendance of 1500: this translates into an additional 240 spectators, whereas for an opponent who only averages 300 we would expect only an additional 48. This would suggest that bigger teams do mean higher crowds; however, we must be careful not to overstate this, as the coefficient is not statistically significant.
Conclusions: I don’t like Mondays, Tuesdays, Wednesdays, Thursdays, or Fridays.
The overall results seem to suggest what I expected to find, namely that attendances are higher for derbies, for weekend games, against bigger clubs and when the home team have been playing well. Whilst the results we have seen for the latter two may be due to chance, we can say with a high degree of certainty that the first two do have an impact.
So to increase their attendances clubs need to a.) cultivate local rivalries, b.) avoid weekday matches as much as possible, c.) maintain a winning run and d.) play against opponents who themselves attract big crowds.