SHAP values through General Fourier Representations: Theory and Applications
Roberto Morales
TL;DR
This work develops a generalized Fourier framework for SHAP values on discrete, multi-valued input spaces by constructing an orthonormal tensor-product basis under product measures. It proves that SHAP attributions are linear functionals of the model’s Fourier coefficients and provides both deterministic stability bounds under Fourier truncation and probabilistic convergence results toward Gaussian-process limits, including those of infinite-width neural networks. A GP-based analysis yields explicit bounds on SHAP truncation errors, both in expectation and via concentration inequalities, while a neural-network experiment demonstrates substantial computational savings and strong agreement with Kernel-SHAP on a clinical dataset with significant class imbalance. The results bridge cooperative-game explanations with harmonic analysis, enabling scalable and robust SHAP interpretations for complex, non-binary domains and guiding sparse spectral recovery in practice.
Abstract
This article establishes a rigorous spectral framework for the mathematical analysis of SHAP values. We show that any predictive model defined on a discrete or multi-valued input space admits a generalized Fourier expansion with respect to an orthonormal tensor-product basis constructed under a product probability measure. Within this setting, each SHAP attribution can be represented as a linear functional of the model's Fourier coefficients. Two complementary regimes are studied. In the deterministic regime, we derive quantitative stability estimates for SHAP values under Fourier truncation, showing that the attribution map is Lipschitz continuous with respect to the distance between predictors. In the probabilistic regime, we consider neural networks in their infinite-width limit and prove convergence of SHAP values toward those induced by the corresponding Gaussian process prior, with explicit error bounds that hold in expectation and, via concentration inequalities, with high probability. We also provide a numerical experiment on a class-imbalanced clinical dataset to validate the theoretical findings.
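To make the statement that SHAP attributions are linear functionals of the Fourier coefficients concrete, the following minimal Python sketch checks it numerically on a tiny discrete input space. It is an illustration under stated assumptions, not the article's implementation: the feature cardinalities, marginals, and random model `f` are made up; the per-feature basis is obtained by Gram-Schmidt on monomials (any orthonormal basis whose first element is the constant 1 would do); and the value function is assumed to be the marginal expectation v(S) = E[f(x_S, X_{-S})] under the product measure. Under these assumptions, the Shapley value of feature i at x reduces to the sum, over multi-indices k whose support contains i, of c_k φ_k(x) / |supp(k)|, which the sketch compares against brute-force Shapley enumeration.

```python
import itertools
import math
import numpy as np

# Hypothetical toy setup: three discrete features with 2, 3 and 2 levels.
sizes = (2, 3, 2)
d = len(sizes)
margs = [np.array([0.3, 0.7]), np.array([0.2, 0.5, 0.3]), np.array([0.6, 0.4])]

def orthonormal_basis(p):
    """Gram-Schmidt on 1, t, t^2, ... in L^2(p); row 0 is the constant 1."""
    m = len(p)
    raw = np.vander(np.arange(m, dtype=float), m, increasing=True).T
    basis = []
    for v in raw:
        v = v.copy()
        for b in basis:
            v -= np.sum(p * v * b) * b       # remove component along b
        basis.append(v / np.sqrt(np.sum(p * v * v)))
    return np.array(basis)                   # basis[k][value]

bases = [orthonormal_basis(p) for p in margs]
grid = list(itertools.product(*[range(m) for m in sizes]))
mu = {x: math.prod(margs[i][x[i]] for i in range(d)) for x in grid}

rng = np.random.default_rng(0)
f = {x: rng.normal() for x in grid}          # arbitrary "model" on the grid

def phi(k, x):
    """Tensor-product basis function with multi-index k evaluated at x."""
    return math.prod(bases[i][k[i]][x[i]] for i in range(d))

# Fourier coefficients c_k = E_mu[f * phi_k]; multi-indices share the grid shape.
coef = {k: sum(mu[x] * f[x] * phi(k, x) for x in grid) for k in grid}

def value(x, S):
    """Marginal value function v(S) = E[f(x_S, X_{-S})] under the product measure."""
    total = 0.0
    for z in grid:
        if all(z[i] == x[i] for i in S):
            w = math.prod(margs[i][z[i]] for i in range(d) if i not in S)
            total += w * f[z]
    return total

def shapley_exact(x):
    """Brute-force Shapley values by enumerating all coalitions."""
    out = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for r in range(d):
            w = math.factorial(r) * math.factorial(d - r - 1) / math.factorial(d)
            for S in itertools.combinations(others, r):
                out[i] += w * (value(x, S + (i,)) - value(x, S))
    return out

def shapley_spectral(x):
    """Shapley values as a linear functional of the Fourier coefficients."""
    out = np.zeros(d)
    for k, c in coef.items():
        supp = [i for i in range(d) if k[i] != 0]
        for i in supp:
            out[i] += c * phi(k, x) / len(supp)
    return out

agree = all(np.allclose(shapley_exact(x), shapley_spectral(x)) for x in grid)
```

In this sketch `agree` records whether the two computations coincide at every grid point. Because `shapley_spectral` is linear in `coef`, dropping or perturbing coefficients changes the attributions linearly, which is the mechanism behind the truncation stability estimates described above.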
