Assessing win strength in MLB win prediction models

Morgan Allen, Paul Savala

TL;DR

The paper investigates how predicted win probabilities from a broad set of MLB win-prediction models relate to actual score differentials, treating win strength as a measurable attribute. Training on 2001–2015 data and testing on 2016–2019 data, it compares six model families plus a FiveThirtyEight baseline using AUROC, log-loss, and Brier score. Most models outperform the baseline, and logistic regression often delivers the best overall performance. The study further shows that predicted win probabilities correlate positively with score differentials on average, and that a targeted run-line betting strategy using probability cutoffs can yield positive returns. Overall, the work links win likelihood to win strength and offers practical insights for betting strategies and model ensemble design.

Abstract

In Major League Baseball, strategy and planning are major factors in determining the outcome of a game. Previous studies have aided this by building machine learning models for predicting the winning team of any given game. We extend this work by training a comprehensive set of machine learning models using a common dataset. In addition, we relate the win probabilities produced by these models to win strength as measured by score differential. In doing so we show that the most common machine learning models do indeed demonstrate a relationship between predicted win probability and the strength of the win. Finally, we analyze the results of using predicted win probabilities as a decision-making mechanism on run-line betting. We demonstrate positive returns when utilizing appropriate betting strategies, and show that naive use of machine learning models for betting leads to significant losses.

Paper Structure

This paper contains 30 sections, 1 equation, 10 figures, 9 tables.

Figures (10)

  • Figure 1: Percent of games on which pairs of models agree. Higher levels of agreement reduce the effectiveness of ensembling, as predictions rarely differ. Note that XGB has low agreement with all other models, and yet is highly accurate on its own. Thus XGB is a strong candidate for model ensembling.
  • Figure 2: Predicted home team win probability versus score differential (home team final score minus away team final score). All predicted win probabilities are unscaled and rounded to the nearest 10%. A dashed horizontal line at zero score differential (tied game) is shown for reference. LogR, KNN and ANN show the strongest positive linear trends. XGB and KNN perform the best for games with the most extreme predicted win probabilities. Note that SVM, KNN and FTE all fail to predict any games to have an especially high or low win probability.
  • Figure 3: Distribution of predicted home team win probabilities for FTE versus LogR. Note the narrow prediction region for FTE and the skinny tails. This means that many games will be marked as toss-ups.
  • Figure 4: Returns as a percentage of money invested using the high and low cutoffs shown on each axis. Predictions come from the LogR model. Setting the low and high cutoff equal to 0.5 is precisely the naive strategy described above. For appropriate choices of win and loss cutoffs we demonstrate positive returns, even into the double digits.
  • Figure 5: Percentage of games wagered on using each pair of cutoffs using the LogR model. Colors correspond to the returns shown in the previous figure. Note that in situations where positive returns are realized, approximately 0.5% to 5% of games are wagered on. In a normal season of 2430 games this corresponds to between 5 and 120 games per season.
  • ...and 5 more figures
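
The low and high cutoffs in Figures 4 and 5 can be read as a simple selection rule. The following is a minimal sketch, not the authors' implementation. It assumes a wager is placed on the home team when the predicted home win probability is at or above the high cutoff, on the away team when it is at or below the low cutoff, and on neither side otherwise. The function names and the sample probabilities are hypothetical, and no returns are computed, since that would require run-line odds that are not given here.

```python
from typing import List, Optional


def select_bets(home_win_probs: List[float],
                low_cutoff: float,
                high_cutoff: float) -> List[Optional[str]]:
    """Pick a side per game, or None when the game is skipped.

    Assumed rule (illustrative only): bet "home" when p >= high_cutoff,
    bet "away" when p <= low_cutoff, otherwise place no wager.
    """
    if low_cutoff > high_cutoff:
        raise ValueError("low_cutoff must not exceed high_cutoff")
    picks: List[Optional[str]] = []
    for p in home_win_probs:
        if p >= high_cutoff:
            picks.append("home")
        elif p <= low_cutoff:
            picks.append("away")
        else:
            picks.append(None)
    return picks


def fraction_wagered(picks: List[Optional[str]]) -> float:
    """Share of games that received a wager (the quantity in Figure 5)."""
    if not picks:
        return 0.0
    return sum(pick is not None for pick in picks) / len(picks)


# Hypothetical predicted home win probabilities for eight games.
probs = [0.35, 0.48, 0.52, 0.61, 0.72, 0.28, 0.55, 0.50]

# Selective strategy: only games with confident predictions are wagered on.
picks = select_bets(probs, low_cutoff=0.30, high_cutoff=0.70)
share = fraction_wagered(picks)

# Both cutoffs at 0.5 reproduces the naive strategy: every game is wagered on.
naive_share = fraction_wagered(select_bets(probs, 0.5, 0.5))
```

Widening the gap between the two cutoffs shrinks the share of games wagered on, which is the trade-off Figure 5 reports alongside the returns in Figure 4.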