Model-Agnostic Interpretation Framework in Machine Learning: A Comparative Study in NBA Sports

Shun Liu

TL;DR

We address interpretability in high-dimensional sports analytics by proposing a model-agnostic interpretation framework that preserves predictive performance through modular data processing. The approach fuses diverse post-hoc and global explanations (e.g., SHAP, PDPs) with a feature-centric pipeline spanning dataset integration, dimensionality reduction, preprocessing, regression and ANN modeling, and visualization. Key findings include competitive predictive performance (e.g., $R^2$ up to 0.8156 for win predictions and a substantial gain from nonlinear feature engineering in salary modeling) and clear, interpretable insights into feature importance and nonlinearity via weight plots, SHAP, and GAMs. The framework aims to enable trust, transparency, and actionable insights in NBA sports analytics and high-dimensional predictive modeling more broadly.

Abstract

The field of machine learning has seen tremendous progress in recent years, with deep learning models delivering exceptional performance across a range of tasks. However, these models often come at the cost of interpretability, as they operate as opaque "black boxes" that obscure the rationale behind their decisions. This lack of transparency can limit understanding of the models' underlying principles and impede their deployment in sensitive domains, such as healthcare or finance. To address this challenge, our research team has proposed an innovative framework designed to reconcile the trade-off between model performance and interpretability. Our approach is centered around modular operations on high-dimensional data, which enable end-to-end processing while preserving interpretability. By fusing diverse interpretability techniques and modularized data processing, our framework sheds light on the decision-making processes of complex models without compromising their performance. We have extensively tested our framework and validated its superior efficacy in achieving a harmonious balance between computational efficiency and interpretability. Our approach addresses a critical need in contemporary machine learning applications by providing unprecedented insights into the inner workings of complex models, fostering trust, transparency, and accountability in their deployment across diverse domains.

Paper Structure

This paper contains 19 sections, 9 equations, 10 figures, and 2 tables.

Figures (10)

  • Figure 1: Sample statistics for away team.
  • Figure 2: Field Goal Percentage (%) for home team in the proposed dataset.
  • Figure 3: Explained dataset variance as a function of the number of principal components (annotated as n for simplicity). When $X$ equals 2, the two principal components explain 80% of the variance; when $X$ equals 3, expressiveness improves (85% of the variance is explained). In the subsequent experiments, we set $X$ to 3, which conserves the majority of the input's statistical properties while reducing information redundancy to a larger extent.
  • Figure 4: Training losses (left) and retraining accuracy (right) of the artificial neural network (ANN). The training and validation losses keep declining before plateauing in the range 0.6-0.8, and the gap between them narrows as the number of training epochs increases, indicating that the network has converged. To improve performance, 200 epochs of retraining are conducted, and the scoring metric (accuracy) reaches 0.76.
  • Figure 5: Pair-wise correlation between "Four Factors" and "W" (the number of games won).
  • ...and 5 more figures
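
The component-count choice described in the Figure 3 caption rests on cumulative explained variance. The following is a minimal sketch of that calculation, not the paper's code: it assumes NumPy, runs on synthetic data rather than the NBA dataset, and the helper names `cumulative_explained_variance` and `smallest_n_components` are hypothetical.

```python
import numpy as np


def cumulative_explained_variance(X):
    """Cumulative fraction of total variance explained by the leading principal components."""
    Xc = X - X.mean(axis=0)                   # center each feature column
    s = np.linalg.svd(Xc, compute_uv=False)   # singular values, in descending order
    var = s ** 2                              # proportional to per-component variance
    return np.cumsum(var) / var.sum()


def smallest_n_components(X, threshold):
    """Smallest number of components whose cumulative explained variance reaches threshold."""
    cum = cumulative_explained_variance(X)
    n = int(np.searchsorted(cum, threshold)) + 1  # first index with cum >= threshold
    return min(n, len(cum))                       # guard against round-off at threshold = 1.0


if __name__ == "__main__":
    # Synthetic stand-in data (an assumption): 3 latent factors mixed into 10 features.
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(500, 3)) * np.array([3.0, 2.0, 1.0])
    mixing = rng.normal(size=(3, 10))
    X = latent @ mixing + 0.5 * rng.normal(size=(500, 10))

    print(cumulative_explained_variance(X).round(3))
    print("components needed for 85% variance:", smallest_n_components(X, 0.85))
```

A threshold-based rule like this makes the trade-off in the caption explicit: a higher threshold retains more of the input's variance at the cost of keeping more components.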