Model-Agnostic Interpretation Framework in Machine Learning: A Comparative Study in NBA Sports
Shun Liu
TL;DR
We address interpretability in high-dimensional sports analytics by proposing a model-agnostic interpretation framework that preserves predictive performance through modular data processing. The approach fuses diverse post-hoc and global explanations (e.g., SHAP, PDPs) with a feature-centric pipeline spanning dataset integration, dimensionality reduction, preprocessing, regression and ANN modeling, and visualization. Key findings include competitive predictive performance (e.g., $R^2$ up to 0.8156 for win predictions and a substantial gain from nonlinear feature engineering in salary modeling) and clear, interpretable insights into feature importance and nonlinearity via weight plots, SHAP, and GAMs. The framework aims to enable trust, transparency, and actionable insights in NBA sports analytics and high-dimensional predictive modeling more broadly.
Abstract
The field of machine learning has seen tremendous progress in recent years, with deep learning models delivering exceptional performance across a range of tasks. However, these models often come at the cost of interpretability, as they operate as opaque "black boxes" that obscure the rationale behind their decisions. This lack of transparency can limit understanding of the models' underlying principles and impede their deployment in sensitive domains, such as healthcare or finance. To address this challenge, our research team has proposed an innovative framework designed to reconcile the trade-off between model performance and interpretability. Our approach is centered around modular operations on high-dimensional data, which enable end-to-end processing while preserving interpretability. By fusing diverse interpretability techniques and modularized data processing, our framework sheds light on the decision-making processes of complex models without compromising their performance. We have extensively tested our framework and validated its superior efficacy in achieving a harmonious balance between computational efficiency and interpretability. Our approach addresses a critical need in contemporary machine learning applications by providing unprecedented insights into the inner workings of complex models, fostering trust, transparency, and accountability in their deployment across diverse domains.
