Win Likelihood (NFL, MLB, NBA)

The other day I saw a stat that NFL teams that finish with 16 points win more frequently than teams that finish with 21 points. So I looked through the scores of every single NFL game since the 1990 season to see if this was true, and across 16,496 games…

nfl_wp.png

The stat was correct! Oddly enough, teams have also won more frequently with 9 points than with 14, and more frequently with 23 than with 29. Seems strange, right? You’d expect every single point scored to increase the likelihood of winning. I’m not quite sure if this is a result of the NFL’s unconventional scoring system, the small sample sizes for unusual scores, or an unexplained dependence between scores; regardless, this trend in NFL scores is definitely odd. I also wanted to look at the scoring distribution to see if that could explain anything.
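For anyone who wants to reproduce this kind of tally, here is a minimal sketch of how a win percentage per final score could be computed. It is not the project's actual code: the function name `win_pct_by_score`, the `(score_a, score_b)` input format, and the choice to count a tie as an appearance but not a win are all assumptions made for illustration, and the sample games are made up.

```python
from collections import defaultdict

def win_pct_by_score(games):
    """Map each final score to (wins, appearances, win_pct).

    `games` is an iterable of (score_a, score_b) pairs, one per game
    (an assumed format). A tie counts as an appearance for both teams
    and a win for neither (also an assumption).
    """
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for a, b in games:
        appearances[a] += 1
        appearances[b] += 1
        if a > b:
            wins[a] += 1
        elif b > a:
            wins[b] += 1
    return {
        score: (wins[score], appearances[score], wins[score] / appearances[score])
        for score in sorted(appearances)
    }

# Made-up games, only to show the shape of the result.
sample = [(16, 13), (21, 24), (16, 10), (21, 27), (21, 16)]
table = win_pct_by_score(sample)
for score, (w, n, pct) in table.items():
    print(f"{score:>3} points: {w}-{n - w} ({pct:.0%})")
```

The `appearances` count in each row is also the number of times that final score occurred, which is what a scoring distribution plot would show.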

nfl_totalG.png

At first glance, it looks like a normal distribution, but on a second look you can see that, although it’s not quite normal, there is still a pretty consistent pattern. A cycle seems to repeat every three to four points, but that still doesn’t explain the win percentages. Maybe there’s something behind the scenes that I’m missing or it’s just some Scorigami magic.
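One way to poke at the scoring-system idea is to count how many combinations of scoring plays add up to each total. The sketch below is only an illustration, not something from the original analysis: it assumes five scoring plays (safety = 2, field goal = 3, touchdown with no conversion = 6, touchdown plus extra point = 7, touchdown plus two-point conversion = 8), ignores order, and says nothing about how often each play actually happens. The name `combos_per_total` is hypothetical.

```python
def combos_per_total(max_total, plays=(2, 3, 6, 7, 8)):
    """Count unordered combinations of scoring plays that reach each total.

    Classic coin-change counting: ways[t] is the number of multisets of
    `plays` whose values sum to t. The default play values are an
    assumed simplification of NFL scoring.
    """
    ways = [0] * (max_total + 1)
    ways[0] = 1  # one way to score nothing: no scoring plays at all
    for play in plays:
        for total in range(play, max_total + 1):
            ways[total] += ways[total - play]
    return ways

ways = combos_per_total(30)
for total in (9, 14, 16, 21, 23, 29):
    print(f"{total} points: {ways[total]} combinations")
```

A count of combinations is not a frequency, so this cannot explain the win percentages on its own, but it does show why some totals are reachable in far more ways than their neighbors.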

I decided to look at scores for MLB and NBA games to see if there was anything weird going on in other leagues. The scores for MLB games provided results that were much easier to explain.

mlb_wp.png

The only somewhat strange occurrence comes at the 22-run mark. Teams that scored 22 runs have an all-time record of 18-1: this is solely because on May 17, 1979, the Chicago Cubs lost to the Philadelphia Phillies with a final score of 23-22. Every other team that has scored 18 or more runs in a game has won.

mlb_totalG.png

Overall, the distribution of MLB scores is much closer to what you’d expect. The much larger sample size of 128,524 games and a smaller spread of scores produce a neat right-skewed distribution. The larger amount of data collected in MLB makes the scoring distribution much cleaner, while also making advanced analytics much more precise than in other pro sports. Of course, there are still plenty of weird things that happen in baseball, but as far as this measure goes, this is as clean-cut as it gets.

Lastly, I wanted to look at the same relationship between points and win percentage for NBA games:

nba_wp.png

Although the right side of the plot looks weird, it can be entirely explained by small sample size. The portion of the plot ranging from around 75 to 120 points is representative of what you would expect. Everything outside of this range is strictly a result of not having many games with that specific final score. For example, teams that scored 184 points have a win percentage of 0%. It’s actually a situation similar to the Cubs loss described earlier: an already abnormally high-scoring game got pushed into overtime. On December 13, 1983, the Denver Nuggets lost to the Detroit Pistons with a final score of 186-184.
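To put a number on how little a one-game bucket tells you, here is a rough sketch that attaches a confidence interval to a win percentage. It uses the Wilson score interval, which is my choice for illustration and not something the original analysis did; the function name `wilson_interval` and the 95% level (`z=1.96`) are assumptions.

```python
import math

def wilson_interval(wins, games, z=1.96):
    """Wilson score interval for a win proportion.

    z=1.96 corresponds to roughly 95% confidence (an assumed default).
    Returns (low, high), clamped to the range [0, 1].
    """
    if games <= 0:
        raise ValueError("need at least one game")
    p = wins / games
    denom = 1 + z**2 / games
    center = (p + z**2 / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z**2 / (4 * games**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)

# Records taken from the post: 22-run MLB teams are 18-1,
# and the only 184-point NBA team lost.
mlb_22_runs = wilson_interval(18, 19)
nba_184_points = wilson_interval(0, 1)
print(mlb_22_runs, nba_184_points)
```

For the single 184-point game, the interval runs from 0 to roughly 79%, which is another way of saying that one game tells you almost nothing about the true win rate at that score.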

nba_totalG.png

Are there any important conclusions that can be made from all this? Probably not, but it’s still interesting to see the different trends in these pro sports leagues.

Here is the project’s GitHub link for those interested in how I collected the data and in the CSV files generated.
