The Distributional Uncertainty of the SHAP score in Explainable Machine Learning
Santiago Cifuentes, Leopoldo Bertossi, Nina Pardal, Sergio Abriola, Maria Vanina Martinez, Miguel Romero
TL;DR
SHAP scores depend on an underlying entity population distribution, which is typically unknown. This paper treats the SHAP score as a polynomial over an uncertainty region of product distributions, so that SHAP intervals can be computed as maxima and minima over hyperrectangles. It proves that key problems for these region-based SHAP scores are NP-hard or NP-complete, even for binary decision trees, and establishes related hardness results for ambiguity and dominance questions, while showing polynomial-time tractability for certain classifier families. An empirical study on the California Housing dataset demonstrates that feature rankings can vary with distributional assumptions and that shrinking the uncertainty region reduces this sensitivity. Overall, the work offers a principled, robust perspective on attribution under distributional uncertainty and lays groundwork for more reliable local explanations.
Abstract
Attribution scores reflect how important the feature values in an input entity are for the output of a machine learning model. One of the most popular attribution scores is the SHAP score, which is an instantiation of the general Shapley value used in coalition game theory. The definition of this score relies on a probability distribution on the entity population. Since the exact distribution is generally unknown, it needs to be assigned subjectively or be estimated from data, which may lead to misleading feature scores. In this paper, we propose a principled framework for reasoning about SHAP scores under unknown entity population distributions. In our framework, we consider an uncertainty region that contains the potential distributions, and the SHAP score of a feature becomes a function defined over this region. We study the basic problems of finding maxima and minima of this function, which allows us to determine tight ranges for the SHAP scores of all features. In particular, we pinpoint the complexity of these problems, and other related ones, showing them to be NP-complete. Finally, we present experiments on a real-world dataset, showing that our framework may contribute to more robust feature scoring.
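To make the idea of a SHAP score defined over an uncertainty region concrete, the following is a minimal brute-force sketch, not the paper's algorithm. It assumes binary features and a product distribution parameterized by p[i] = P(x_i = 1), with the region given as one interval per feature (a hyperrectangle). Under these assumptions the score is a multilinear polynomial in p, so its minimum and maximum over the hyperrectangle are attained at vertices, and enumerating the vertices is enough. All names (`expected_value`, `shap_score`, `shap_interval`, `and_model`) are hypothetical and chosen for illustration. The running time is exponential in the number of features, which is in line with the hardness results stated above.

```python
from itertools import combinations, product
from math import factorial


def expected_value(model, entity, fixed, p):
    """E[model(x)] when features in `fixed` keep the entity's values and
    every other feature i is drawn independently with P(x_i = 1) = p[i]."""
    free = [i for i in range(len(entity)) if i not in fixed]
    total = 0.0
    for bits in product((0, 1), repeat=len(free)):
        x = list(entity)
        weight = 1.0
        for i, b in zip(free, bits):
            x[i] = b
            weight *= p[i] if b == 1 else 1.0 - p[i]
        total += weight * model(tuple(x))
    return total


def shap_score(model, entity, feature, p):
    """Shapley value of `feature` for the game S -> expected_value(S)."""
    n = len(entity)
    others = [i for i in range(n) if i != feature]
    score = 0.0
    for k in range(n):
        coeff = factorial(k) * factorial(n - k - 1) / factorial(n)
        for subset in combinations(others, k):
            s = set(subset)
            with_feature = expected_value(model, entity, s | {feature}, p)
            without_feature = expected_value(model, entity, s, p)
            score += coeff * (with_feature - without_feature)
    return score


def shap_interval(model, entity, feature, region):
    """Range of the score over a hyperrectangle given as (low, high) pairs.
    Assumption: the score is multilinear in p, so checking vertices suffices."""
    values = [shap_score(model, entity, feature, v) for v in product(*region)]
    return min(values), max(values)


def and_model(x):
    """Toy classifier: outputs 1 only when both features are 1."""
    return 1 if x[0] == 1 and x[1] == 1 else 0


entity = (1, 1)
region = [(0.25, 0.75), (0.25, 0.75)]
for f in range(2):
    print(f, shap_interval(and_model, entity, f, region))
```

In this toy setting both features receive the same interval, yet at the vertex p = (0.25, 0.75) feature 0 scores higher than feature 1, and at p = (0.75, 0.25) the order is reversed. This is a small instance of the ranking sensitivity that the framework is meant to expose.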
