Introduction

With 14 weeks of the season past, the statistics available to us are taking on more and more significance. As such, here is a post outlining my method for simulating the final Premier League table. Plenty has been written on this topic, but few (f)analysts explain how they actually run such a simulation, at least in terms that don’t require an in-depth knowledge of mathematics to follow. I must stress that I am no mathematician or statistician, and that most of what follows I have adapted or taken from sources across the (f)analytics community. I will make every effort to credit each of these sources; if something stands out as familiar, please let me know so I can give credit where it is due.

Methodology

*Disclaimer* – This section is intended for those interested in the method behind such a simulation and my thoughts on why I have chosen to run it in this way. If you just want the results, feel free to skip this section.

The go-to method for running such an analysis is the Monte Carlo simulation. A Monte Carlo simulation draws a random number between 0 and 1 and compares it with the probability of an outcome: if the number falls below that probability, the outcome is recorded as having happened (say, a win); otherwise it is recorded as a loss. This process is repeated anywhere from 100 (too few) to 1,000,000 (overkill?) times and the results are averaged across all iterations. Personally, I use 10,100 iterations and discard the first 100 results, leaving 10,000 iterations for analysis. The problem with football is that you have 3 possible outcomes rather than the 2 you have in other sports such as baseball, basketball, and American football. To compensate for this, I follow Cassini’s method:

If the random number is lower than the probability of the away team winning, it is recorded as an away win.

If the random number is higher than the probability of an away win but lower than the probability of an away win + the probability of a draw, then the result is recorded as a draw.

If the random number is greater than that, the result is recorded as a home win.
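The three rules above can be sketched in a few lines of Python. This is only an illustration: the function name pick_result and the probabilities in the usage lines are my own assumptions, not part of Cassini’s write-up.

```python
def pick_result(r, p_away, p_draw):
    """Map a random number r in [0, 1) to a match result.

    p_away and p_draw are the away-win and draw probabilities;
    the home-win probability is whatever is left over.
    """
    if r < p_away:
        return "away"
    if r < p_away + p_draw:
        return "draw"
    return "home"


# Illustrative probabilities only, not taken from any real match
print(pick_result(0.10, 0.30, 0.25))  # away
print(pick_result(0.40, 0.30, 0.25))  # draw
print(pick_result(0.90, 0.30, 0.25))  # home
```

Stacking the outcomes along the 0 to 1 line like this means a single random number is enough to settle all three possibilities.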

That leads to the next issue: how do you find the probabilities for every game in the season? Bookmakers typically only price up matches for the following two weeks. If you want to simulate an entire season, you have to have a way of making your own odds.

Formulating Probabilities

I generally use my own rating system to formulate odds for games, but I’d need to write another article to explain my methodology for that, so for this post I’m going to use TSR [Total Shot Ratio: Shots For/(Shots For + Shots Against)].
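As a quick illustration of the TSR formula, here is a small Python sketch. The function name and the shot counts are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def total_shot_ratio(shots_for, shots_against):
    """TSR = shots for / (shots for + shots against)."""
    total = shots_for + shots_against
    if total == 0:
        raise ValueError("no shots recorded")
    return shots_for / total


# Hypothetical shot counts, not real team data
print(round(total_shot_ratio(180, 220), 2))  # 0.45
```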

Mark Taylor wrote an article for Pinnacle in which he outlines a method to convert TSR to odds. Using this method, we can find the chance of two teams winning a match. Let’s take the upcoming match between Watford and Everton as our example:

Watford have a TSR of 0.45

Everton have a TSR of 0.53

I add in a slight home advantage at this stage as TSR doesn’t account for this. Then following Taylor’s method we get each team’s chance of winning:

Watford: 41.08%

Everton: 49.93%

The problem here is that we haven’t taken the draw into consideration. By multiplying the chances of Watford and Everton winning together (0.4108 × 0.4993 ≈ 0.205) and adding an inflation figure to bring it in line with sports betting markets (I’ve found +0.15 to work best) we get the probability for the draw: 35.5%.

I know what you’re thinking, “Peter, 41%+50%+35% is more than 100%.” It is indeed: about 126%. So to bring the probabilities back in line we divide each outcome’s probability by that total:

 

 

[Image: the three probabilities after dividing by the total]
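The draw adjustment and the rescaling described above can be sketched as follows. The function and constant names are my own; the inputs are the rounded win chances quoted earlier, so the away figure may differ in the last decimal place from the one used later in the post.

```python
DRAW_INFLATION = 0.15  # the "+0.15" adjustment mentioned in the post


def three_way_probabilities(p_home_raw, p_away_raw, inflation=DRAW_INFLATION):
    """Add a draw probability, then rescale so the three sum to 1."""
    p_draw_raw = p_home_raw * p_away_raw + inflation
    total = p_home_raw + p_draw_raw + p_away_raw
    return {
        "home": p_home_raw / total,
        "draw": p_draw_raw / total,
        "away": p_away_raw / total,
    }


# Watford (home) 41.08%, Everton (away) 49.93%
probs = three_way_probabilities(0.4108, 0.4993)
for outcome, p in probs.items():
    print(f"{outcome}: {p:.2%}")
```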

When we convert these to decimal odds we get:

Watford -> 3.08

Everton -> 2.53

Draw -> 3.56

Or about 21/10, 6/4, and 5/2 in old money.
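Converting a probability to decimal odds is just a matter of taking its reciprocal. A minimal sketch, assuming fair odds with no bookmaker margin built in (the function name is mine):

```python
def decimal_odds(probability):
    """Fair decimal odds for an outcome: 1 / probability."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    return 1 / probability


# Everton's normalised chance of 39.47%
print(round(decimal_odds(0.3947), 2))  # 2.53
```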

Here are the average odds on offer from Oddsportal:

[Image: average odds from Oddsportal for Watford v Everton]

So I’m quite happy with that.

Simulating the Match

Using Excel’s RAND() function to give us a random number between 0 and 1 we get 0.259.

If this random number is less than the away team’s chance (0.3947 or 39.47%) it is recorded as an Everton win. If it is less than the chance of an away team win + the chance of a draw, it is recorded as a draw, and if it is higher than that it is recorded as a home win.

Since the random number (0.259) is lower than Everton’s chance of winning (0.3947), the match is recorded as an Everton victory. The home team gets 0 points, and the away team gets 3. We do this for all the remaining fixtures in the Premier League and sum the total points each team finishes with. We then use this final points tally as the input in the first row/column of a data table. I prefer to run mine horizontally rather than use Taylor’s vertical layout, as it makes collecting results much easier: you do not need to reformat the table afterwards or rewrite formulas to turn each column into a row, you just drag and copy.

We make a data table (10,100 iterations for me), and let Excel do its magic. This takes a while, so it’s good to do some busy work: make a cuppa, take the dog for a walk, read a novel, write a novel, grow a beard of wisdom and finally, voilà.
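For anyone who would rather not wait on Excel, the same loop can be sketched in Python. This is not the spreadsheet described above, just a rough equivalent: the team names, current points, fixtures, and probabilities are all made up, and I have not replicated the step of discarding the first 100 iterations.

```python
import random
from collections import defaultdict

# Hypothetical inputs: none of these numbers are real.
CURRENT_POINTS = {"Team A": 30, "Team B": 28, "Team C": 12}
# Each fixture: (home team, away team, P(away win), P(draw))
FIXTURES = [
    ("Team A", "Team B", 0.30, 0.28),
    ("Team B", "Team C", 0.20, 0.25),
    ("Team C", "Team A", 0.50, 0.26),
]


def simulate_season(rng):
    """Play every remaining fixture once and return final points."""
    points = dict(CURRENT_POINTS)
    for home, away, p_away, p_draw in FIXTURES:
        r = rng.random()
        if r < p_away:              # away win
            points[away] += 3
        elif r < p_away + p_draw:   # draw
            points[home] += 1
            points[away] += 1
        else:                       # home win
            points[home] += 3
    return points


def expected_points(iterations=10_000, seed=1):
    """Average each team's final points over many simulated seasons."""
    rng = random.Random(seed)
    totals = defaultdict(float)
    for _ in range(iterations):
        for team, pts in simulate_season(rng).items():
            totals[team] += pts
    return {team: total / iterations for team, total in totals.items()}


for team, pts in sorted(expected_points().items(), key=lambda kv: -kv[1]):
    print(f"{team}: {pts:.2f}")
```

Fixing the seed makes a run repeatable; change or drop it to get a fresh set of simulated seasons.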

I always check the % of home wins, draws, and away wins to make sure they are in line with past seasons. This simulation shows home wins accounting for 41% of results, draws making up 27%, and away wins making up 32%, so it’s definitely in range and I’m happy with that.

Results

So after cleaning up the data and getting our means, medians, quartiles, etc., we have our final table:

[Image: simulated final Premier League table]
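The summary figures themselves are easy to reproduce outside Excel. A small sketch using Python’s statistics module, with a made-up list standing in for one team’s simulated points totals; note that quartile conventions vary between tools, so the quartiles may not match a spreadsheet’s exactly.

```python
import statistics


def summarise(points):
    """Mean, median, and quartiles of a team's simulated points."""
    q1, _, q3 = statistics.quantiles(points, n=4)
    return {
        "mean": statistics.mean(points),
        "median": statistics.median(points),
        "q1": q1,
        "q3": q3,
    }


# Hypothetical points totals from eight simulated seasons
sample = [61, 64, 66, 67, 69, 70, 72, 75]
print(summarise(sample))
```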

Liverpool are still expected to win the league but only just. In this round of simulations, Chelsea closed the gap to 0.185 points – what an end to the season that would be!

Burnley, Sunderland, and Hull are expected to take the drop with Swansea just about managing to hang on, and last season’s champs are set for an obscure… hmm, I’m not a fan of that chart.

[Image: chart of the champions’ simulated finish]

15th. The champions of the EPL are expected to finish in the bottom half of the table. Yikes.

Using this method also lets us make some pretty cool charts. Here’s a box and whisker plot of the results:

[Image: box and whisker plot of the simulation results]

Alright, well that’s all for now. If you have any questions, please feel free to ask below or on Twitter @petermckeever