March Madness Tournament Predictions Model: A Mathematical Modeling Approach

Christian McIver, Karla Avalos, Nikhil Nayak

TL;DR

The paper addresses predicting March Madness outcomes with an objective statistical approach. It employs a logistic-regression model using four predictors derived from efficiency metrics and a power rating to estimate win probabilities for 1-on-1 matchups, and then runs Monte Carlo simulations of entire brackets. It evaluates performance with naive matchup accuracy and Spearman correlation between predicted and actual final rounds, reporting around 74.6% test accuracy and bracket-level correlations ranging from about 0.37 to 0.75. The work demonstrates that a compact, interpretable feature set can yield competitive predictive power and highlights directions for incorporating time-varying statistics and other factors.
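As a rough illustration of how a logistic model can turn four team-level predictors into a head-to-head win probability, here is a minimal sketch. The coefficient values, the dictionary keys, and the choice to feed the model the difference between the two teams' statistics are all assumptions made for this example; they are not the paper's fitted values or its confirmed feature encoding.

```python
import math

# Hypothetical coefficients for illustration only (NOT the paper's fitted values).
# Keys mirror the four predictors named in the summary.
COEFS = {
    "adjoe": 0.10,            # adjusted offensive efficiency (higher is better)
    "adjde": -0.10,           # adjusted defensive efficiency (lower is better)
    "power": 3.0,             # power rating
    "two_pt_allowed": -0.05,  # two-point shooting percentage allowed (lower is better)
}
INTERCEPT = 0.0  # assumed zero so that P(A beats B) + P(B beats A) = 1


def win_probability(team_a, team_b):
    """Logistic win probability for team_a over team_b.

    Assumes the model input is the difference between the two teams'
    statistics; this encoding is an assumption of the sketch.
    """
    z = INTERCEPT + sum(c * (team_a[k] - team_b[k]) for k, c in COEFS.items())
    return 1.0 / (1.0 + math.exp(-z))


# Made-up team statistics, used only to exercise the function.
team_a = {"adjoe": 118.0, "adjde": 92.0, "power": 0.95, "two_pt_allowed": 46.0}
team_b = {"adjoe": 108.0, "adjde": 99.0, "power": 0.75, "two_pt_allowed": 50.0}

p_ab = win_probability(team_a, team_b)
p_ba = win_probability(team_b, team_a)
```

With a zero intercept and difference features, the two probabilities for a matchup sum to one, which is convenient when the same function is reused for every game in a simulated bracket.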

Abstract

This paper proposes a model to predict the outcome of the March Madness tournament based on historical NCAA basketball data since 2013. The framework is a simplification of the FiveThirtyEight NCAA March Madness prediction model that uses only four predictors: Adjusted Offensive Efficiency (ADJOE), Adjusted Defensive Efficiency (ADJDE), Power Rating, and Two-Point Shooting Percentage Allowed. A logistic regression on these metrics generates the probability of a particular team winning each game. A tournament simulation is then developed and compared to real-world March Madness brackets to determine the accuracy of the model. Accuracy was measured using a naive approach and the Spearman rank correlation coefficient.
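The simulation and evaluation steps described above can be sketched as follows. This is a minimal toy version under stated assumptions: an 8-team single-elimination bracket, a made-up one-number rating per team in place of the four-predictor model, invented "actual" results, and a comparison of the average simulated round reached against the actual round reached. The function names are hypothetical and none of the numbers come from the paper.

```python
import math
import random


def simulate_bracket(teams, win_prob, rng):
    """Play one single-elimination bracket (team count assumed a power of two).

    Returns a dict mapping each team to the number of games it won.
    """
    reached = {t: 0 for t in teams}
    alive = list(teams)
    round_no = 0
    while len(alive) > 1:
        round_no += 1
        winners = []
        # Adjacent teams in bracket order play each other.
        for a, b in zip(alive[0::2], alive[1::2]):
            winner = a if rng.random() < win_prob(a, b) else b
            reached[winner] = round_no
            winners.append(winner)
        alive = winners
    return reached


def mean_round_reached(teams, win_prob, n_sims, seed=0):
    """Monte Carlo estimate of the average round each team reaches."""
    rng = random.Random(seed)
    totals = {t: 0 for t in teams}
    for _ in range(n_sims):
        for team, rounds in simulate_bracket(teams, win_prob, rng).items():
            totals[team] += rounds
    return {t: totals[t] / n_sims for t in teams}


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Toy ratings in bracket order (made-up values, not real teams or results).
ratings = {"A": 2.0, "B": -1.5, "C": 0.5, "D": 0.0,
           "E": 1.5, "F": -1.0, "G": 1.0, "H": -0.5}
teams = list(ratings)


def toy_win_prob(a, b):
    # Stand-in for the fitted logistic model: logistic of the rating gap.
    return 1.0 / (1.0 + math.exp(-(ratings[a] - ratings[b])))


expected = mean_round_reached(teams, toy_win_prob, n_sims=2000, seed=42)

# Invented "actual" rounds reached, consistent with an 8-team bracket.
actual = {"A": 3, "B": 0, "C": 1, "D": 0, "E": 2, "F": 0, "G": 1, "H": 0}
rho = spearman([expected[t] for t in teams], [actual[t] for t in teams])
```

Averaging over many simulated brackets smooths out single-run randomness before ranking teams; whether the paper ranks teams this way or scores individual simulated brackets is not stated in this summary, so treat that choice as part of the sketch.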

Paper Structure

This paper contains 6 sections, 1 equation, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Logistic Regression as a Generalized Linear Model
  • Figure 2: Simulation results for the NCAA 2023 tournament for the South and Midwest regions, on the left and right side of the bracket, respectively. Green matchups were predicted correctly by the logistic regression model. Red matchups were predicted incorrectly. Blue matchups were composed of the wrong teams but had the correct winner.
  • Figure 3: Simulation results for the NCAA 2023 tournament for the East and West regions, on the left and right side of the bracket, respectively.
  • Figure 4: Feature Importance Plot
  • Figure 5: Selected Model Equation