Not feeling the buzz: Correction study of mispricing and inefficiency in online sportsbooks

Lawrence Clegg, John Cartlidge

TL;DR

A replication and correction of a recent article about mispricing and inefficiency in online sportsbooks is presented, demonstrating the importance of replication studies in sports forecasting, and the necessity to clean data.

Abstract

We present a replication and correction of a recent article (Ramirez, P., Reade, J.J., Singleton, C., Betting on a buzz: Mispricing and inefficiency in online sportsbooks, International Journal of Forecasting, 39:3, 2023, pp. 1413-1423, doi: 10.1016/j.ijforecast.2022.07.011), hereafter RRS. RRS measure profile page views on Wikipedia to generate a "buzz factor" metric for tennis players and show that it can be used to form a profitable gambling strategy by predicting bookmaker mispricing. Here, we use the same dataset as RRS to reproduce their results exactly, thus confirming the robustness of their mispricing claim. However, we discover that the published betting results are significantly affected by a single bet (the "Hercog" bet), which returns substantial outlier profits based on erroneously long odds. When this data quality issue is resolved, the majority of reported profits disappear and only one strategy, which bets on "competitive" matches, remains significantly profitable in the original out-of-sample period. Although a single profitable strategy offers weaker support than the original study, it still indicates that market inefficiencies may exist, as originally claimed by RRS. As an extension, we continue backtesting after 2020 on a cleaned dataset. Results show that (a) the "competitive" strategy generates no further profits, potentially suggesting markets have become more efficient, and (b) model coefficients estimated over this more recent period are no longer reliable predictors of bookmaker mispricing. We present this work as a case study demonstrating the importance of replication studies in sports forecasting, and the necessity to clean data. We release comprehensive datasets and code as open source.
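The sensitivity check described in the abstract, where a single long-odds bet accounts for most of a strategy's cumulative profit, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the bets in `toy_bets` are hypothetical values invented for the example and are not data from the paper or from RRS, unit stakes and decimal odds are assumed, and the helper names are ours. The sketch simply drops the bet with the longest odds, whereas the paper resolves the underlying data quality issue.

```python
from itertools import accumulate


def unit_stake_profit(decimal_odds, won):
    """Profit of a 1-unit bet at decimal odds: (odds - 1) on a win, -1 on a loss."""
    return decimal_odds - 1.0 if won else -1.0


def cumulative_profits(bets):
    """Running total of unit-stake profits over a sequence of (odds, won) bets."""
    return list(accumulate(unit_stake_profit(odds, won) for odds, won in bets))


# Hypothetical toy bets as (decimal odds, won?) pairs. NOT data from the paper.
toy_bets = [
    (1.9, True),
    (2.1, False),
    (2.0, False),
    (26.0, True),   # a single winning bet at suspiciously long odds
    (1.8, False),
    (2.2, False),
]

# Cumulative profit with every bet included.
with_outlier = cumulative_profits(toy_bets)

# Locate the bet with the longest odds and recompute without it.
outlier_index = max(range(len(toy_bets)), key=lambda i: toy_bets[i][0])
without_outlier = cumulative_profits(
    [bet for i, bet in enumerate(toy_bets) if i != outlier_index]
)

print("final profit with outlier:   ", round(with_outlier[-1], 2))
print("final profit without outlier:", round(without_outlier[-1], 2))
```

In this toy series the sign of the final profit flips once the long-odds bet is excluded, which is the kind of dependence on one observation that a replication would want to flag before trusting a backtest.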

Paper Structure (15 sections, 11 equations, 2 figures, 8 tables)

Figures (2)

  • Figure 1: Cumulative profits and the effect of the Hercog bet. When using Bet365 odds, the majority of the out-of-sample returns reported by RRS are generated from a single bet on Hercog to win against Doi (March 22, 2019; see dashed line). When the Hercog bet is removed, both models make a loss (see solid line).
  • Figure 2: Cumulative profits, using PM w/o RD: $p\in[0.4,0.6]$, shown in Table \ref{tab:table5corrected}: (a) Original out-of-sample period (Jan. 2019 to Feb. 2020); (b) Extended out-of-sample period (Jan. 2019 to Aug. 2023). Dotted vertical line indicates change in dataset.