For the last month I have been struggling to publish a credible machine learning prediction of NFL playoff scores. Why? Because I want to get my hands dirty with data analytics. Why football? Because politics or social issues might lose me friends, financial markets might lose my friends' money, and science stuff would be just too dry. Besides, most of my friends have some interest in football.
Two weeks ago I published my predictions for the AFC and NFC Championship Games. Since that time I have been transferring my code to Google Colaboratory, so you can run it, and writing this blog post. Now it is the eve of Super Bowl LIII and time to wrap it up. My prediction? LA 29 NE 23. Would I bet on it? No way.
At this point in the post, I suggest you run my code and see all of this year’s playoff score predictions. Click on “Open in Colab” below. Using the “Runtime” menu, select “Run all.” Then scroll the right pane to the bottom and wait for the scores to compute.
Am I pleased with my program? Yes: I set out to learn new skills, and mission accomplished. Is it accurate? Sometimes. You will note that I have nailed both LA scores and missed badly on the low side with New England. I obviously have work to do.
I plan to continue improving the program over the next year. The effort to date has been a freewheeling hacker approach. I need to introduce some "light" engineering formalism, starting with testing against past seasons after each change. Something like: admit a change only if it improves the predictions in three out of four past seasons.
For those of you who have no interest in the details, this is a good point to leave. I will post on this blog each time I learn major new concepts about machine learning or football.
The first step is always: where to get data? Was this project going to end before it started? Luckily, for NFL football game data, someone else has done most of the work. Andrew Gallant's (aka BurntSushi) NFLDB provides an interface from NFL game day data on the web to a PostgreSQL database on my MacBook Pro. The only downside to NFLDB is that it uses Python 2.7, and I am committed to Python 3. So now I have a Python 2.7 environment with the BurntSushi software, but this is not a great inconvenience since I only use it to update the PostgreSQL database. I also added psycopg2 to my Python 3 environment as an interface to PostgreSQL.
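As a rough sketch of what reading that database from Python 3 looks like (the database name, user, and `game` table columns below reflect NFLDB's defaults as I understand them, so treat them as assumptions and check against your own install):

```python
# Minimal sketch: pull final scores from NFLDB's PostgreSQL database
# into pandas. Database/user names and columns are assumptions.
import psycopg2
import pandas as pd

conn = psycopg2.connect(dbname="nfldb", user="nfldb", password="...")

games = pd.read_sql(
    """
    SELECT gsis_id, home_team, away_team, home_score, away_score
    FROM game
    WHERE season_year = 2018 AND season_type = 'Regular'
    """,
    conn,
)
print(games.head())
```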
What data does NFLDB provide? A hierarchical set of tables. The highest level is "game." It contains the teams, who is home, and the final and quarterly scores. The second level is "drive." Its contents include the drive start (time and yard line), the end condition (e.g., Touchdown, Field Goal, Punt, Interception, Fumble) along with the ending yard line, the number of first downs, yards gained, penalty yards, and elapsed time. Drive data is as deep as I have gone for this first version of my algorithm. Below the drive there are "play," "play-player," and "player" tables. These tables are detailed enough to call the game for a radio broadcast: who carried the ball, who threw the pass to which intended receiver, and, on defense, who made the tackle and who assisted? With NFLDB the answers to all of these questions are yours.
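For example, the drive-level fields I rely on can be pulled with one query. A sketch, continuing from the connection above (the column names are what I recall from NFLDB's `drive` table; verify them in psql before trusting this):

```python
# Sketch of pulling drive-level data; column names are assumptions
# based on NFLDB's drive table (check with "\d drive" in psql).
import psycopg2
import pandas as pd

conn = psycopg2.connect(dbname="nfldb", user="nfldb", password="...")

drives = pd.read_sql(
    """
    SELECT gsis_id, drive_id, pos_team, start_field, result,
           first_downs, yards_gained, penalty_yards
    FROM drive
    """,
    conn,
)
print(drives.head())
```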
How does one predict a result (e.g., a football score) from data? Three components must be chosen. First, a mathematical model for the result; this can range from a linear equation to a many-layer neural network. Second, a list of features chosen from, or calculated from, the data. Finally, a training algorithm or process. Another expression for the training algorithm is machine learning.
I am using a simple linear equation to model football scores:

$$s = w_0 + w_h\,h + \sum_i w_{o,i}\,x_{o,i} + \sum_j w_{d,j}\,x_{d,j}$$

where $s$ is the score fitted/predicted for the team, $h$ is one if the team is home and zero otherwise, the $x_{o,i}$ are offensive features of the team, the $x_{d,j}$ are defensive features of the opposing team, and the $w$s are the weights fitted by the training algorithm.
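In code, the model is nothing more than a dot product. A toy sketch, with made-up weights and feature values purely for illustration:

```python
import numpy as np

# Illustrative only: w0 is the intercept, w_home the home-field weight,
# w_off/w_def the fitted weights, x_off/x_def the feature vectors.
w0, w_home = 17.0, 1.5          # made-up values
w_off = np.array([0.8, -0.3])   # made-up offensive weights
w_def = np.array([0.5, 0.2])    # made-up defensive weights

def predict_score(is_home, x_off, x_def):
    """Linear score model: intercept + home term + feature terms."""
    return w0 + w_home * is_home + w_off @ x_off + w_def @ x_def

print(predict_score(1, np.array([0.6, 0.1]), np.array([0.4, 0.2])))
```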
Useful features are correlated with football scores. They also need to be much smaller in number than the data they will be fitted to: the NFL regular season has 256 games, yielding 512 team scores, and if we chose 512 features our fit would merely describe what has happened and have no predictive power. I picked my feature set primarily as a set of probability estimates. The table below describes the offensive features; to be more precise, read "probability" as "probability estimated from the regular season." I have also allowed for per-play measurement features, and have added one example, "pyp," which is net penalty yards per play. A sketch of how one such probability can be computed follows the table.
| Feature | Description |
| --- | --- |
| Turnover | Probability that an offensive drive results in a turnover. |
| safety | Probability that an offensive drive results in a safety. |
| TDle20 | Probability that an offensive drive, starting between own goal line and own 20, results in a Touchdown. |
| TDle40 | Probability that an offensive drive, starting between own 20 and own 40, results in a Touchdown. |
| TDle60 | Probability that an offensive drive, starting between own 40 and opposing 40, results in a Touchdown. |
| TDle80 | Probability that an offensive drive, starting between opposing 40 and opposing 20, results in a Touchdown. |
| FGle20 | Probability that an offensive drive, starting between own goal line and own 20, results in a Field Goal. |
| FGle40 | Probability that an offensive drive, starting between own 20 and own 40, results in a Field Goal. |
| FGle60 | Probability that an offensive drive, starting between own 40 and opposing 40, results in a Field Goal. |
| FGle80 | Probability that an offensive drive, starting between opposing 40 and opposing 20, results in a Field Goal. |
| RZ | Probability that an offensive drive which reaches the opposing 20 results in a Touchdown. |
| nfd | Probability that an offensive drive with no first downs results in a Punt. |
| Ple20 | Probability that an offensive drive starts between own goal line and own 20. |
| Ple40 | Probability that an offensive drive starts between own 20 and own 40. |
| Ple60 | Probability that an offensive drive starts between own 40 and opposing 40. |
| Ple80 | Probability that an offensive drive starts between opposing 40 and opposing 20. |
| pyp | Average yards lost per play as a result of penalties. |
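Here is a sketch of how one of these probability estimates, the turnover feature, might be computed from the drive data. The dataframe is a stand-in for the NFLDB query above, and the drive "result" strings are assumptions drawn from the end conditions listed earlier:

```python
import pandas as pd

# Illustrative drive records; in practice these come from the NFLDB
# "drive" query sketched earlier.
drives = pd.DataFrame({
    "pos_team": ["NE", "NE", "LA", "LA", "LA"],
    "result":   ["Touchdown", "Interception", "Punt", "Fumble", "Field Goal"],
})

# Assumed "result" values marking a turnover.
TURNOVERS = {"Interception", "Fumble"}
drives["turnover"] = drives["result"].isin(TURNOVERS)

# Turnover feature: fraction of each team's offensive drives
# that end in a turnover.
turnover_prob = drives.groupby("pos_team")["turnover"].mean()
print(turnover_prob)
```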
What about defense? I use the same measures as on offense, but instead of the offensive team against all opponents, I calculate each parameter for all opponents against the defensive team. Yes, the direction of goodness is reversed, but the model can handle that; for example, a good defense will have a lower probability of allowing a touchdown in the Red Zone. These defensive parameters use the same names with a preceding "D."
To train my model, I use Bayesian Ridge Regression from scikit-learn. Why? Because I read somewhere that it minimizes the problems with multicollinearity and with overfitting on inappropriate features.
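A minimal sketch of the training step, with placeholder data standing in for the real feature matrix (one row per team-game: the home flag plus offensive and opposing-defensive features) and the 512 team scores of a 256-game season:

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

# Placeholder data for illustration only; the real X and y are built
# from the features described above.
rng = np.random.default_rng(0)
X = rng.random((512, 35))
y = rng.integers(0, 50, size=512).astype(float)

model = BayesianRidge()
model.fit(X, y)

# Predict both teams' scores in a hypothetical matchup.
print(model.predict(X[:2]))
```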
If you have not run the code yet, I suggest you click on “Open in Colab” below:
I changed my approach to initializing the environment from my previous Colaboratory notebooks. This time I used %%shell to write a shell script that loads the data files. On my own Mac I use SQL to load my pandas dataframes; saving them off to CSV files seems to be a good way to move them to Colaboratory.
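Something along these lines, where the URL is a hypothetical placeholder for wherever the CSV files are hosted:

```python
# The Colab %%shell cell fetching the data would look roughly like:
#
#   %%shell
#   wget -q https://example.com/nfl/drives.csv
#
# A following Python cell then reads the files back into pandas:
import pandas as pd

drives = pd.read_csv("drives.csv")
print(drives.head())
```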