Explainability is NOT a Game
Joao Marques-Silva, Xuanxiang Huang
TL;DR
The paper challenges the reliability of Shapley-value-based feature attribution for XAI by showing that irrelevant features can receive the largest absolute Shapley values, which misrepresents their predictive role. It formalizes abductive (AXp) and contrastive (CXp) explanations, connects them with feature relevancy via minimal hitting set (MHS) duality, and defines Shapley values on boolean classifiers using the constructs $\Upsilon$, $\phi$, $\Delta$, and $\mathsf{Sv}$. Through a running example and exhaustive enumeration of 4-variable boolean functions, it uncovers widespread issues (I1–I7) and argues that Shapley-based explanations often conflict with logic-based relevancy. Consequently, SHAP and related approximations inherit fundamental flaws, which undermines their use in high-stakes domains.
Abstract
Explainable artificial intelligence (XAI) aims to help human decision-makers in understanding complex machine learning (ML) models. One of the hallmarks of XAI are measures of relative feature importance, which are theoretically justified through the use of Shapley values. This paper builds on recent work and offers a simple argument for why Shapley values can provide misleading measures of relative feature importance, by assigning more importance to features that are irrelevant for a prediction, and assigning less importance to features that are relevant for a prediction. The significance of these results is that they effectively challenge the many proposed uses of measures of relative feature importance in a fast-growing range of high-stakes application domains.
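To make the mismatch between Shapley values and relevancy concrete, here is a minimal Python sketch. It is not taken from the paper. It assumes the usual definitions behind the symbols named in the TL;DR: $\Upsilon(S)$ is the set of points that agree with the instance $v$ on the features in $S$, $\phi(S)$ is the average of the classifier over $\Upsilon(S)$ under a uniform input distribution, $\Delta_i(S) = \phi(S \cup \{i\}) - \phi(S)$, and $\mathsf{Sv}(i)$ is the Shapley-weighted sum of $\Delta_i(S)$ over all $S$ that exclude $i$. A feature is treated as relevant if it occurs in some AXp, that is, in some subset-minimal set of features whose values alone force the prediction. The function names and the toy classifier $x_1 \lor x_2$ are illustrative choices; features are indexed from 0 in the code.

```python
from fractions import Fraction
from itertools import combinations, product
from math import factorial


def upsilon(v, S):
    """Yield every point of {0,1}^n that agrees with v on the features in S."""
    free = [i for i in range(len(v)) if i not in S]
    for bits in product((0, 1), repeat=len(free)):
        x = list(v)
        for i, b in zip(free, bits):
            x[i] = b
        yield tuple(x)


def phi(kappa, v, S):
    """Average of kappa over upsilon(v, S); assumes uniformly distributed inputs."""
    pts = list(upsilon(v, S))
    return Fraction(sum(kappa(x) for x in pts), len(pts))


def shapley(kappa, v):
    """Exact Shapley value of each feature for the instance v (exponential time)."""
    n = len(v)
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = Fraction(0)
        for k in range(n):
            weight = Fraction(factorial(k) * factorial(n - k - 1), factorial(n))
            for S in combinations(others, k):
                S = frozenset(S)
                total += weight * (phi(kappa, v, S | {i}) - phi(kappa, v, S))
        values.append(total)
    return values


def axps(kappa, v):
    """Subset-minimal feature sets whose values in v force the prediction kappa(v)."""
    n, c = len(v), kappa(v)
    sufficient = [
        frozenset(S)
        for k in range(n + 1)
        for S in combinations(range(n), k)
        if all(kappa(x) == c for x in upsilon(v, S))
    ]
    return [S for S in sufficient if not any(T < S for T in sufficient)]


# Toy classifier (illustrative, not from the paper): kappa(x1, x2) = x1 OR x2.
kappa = lambda x: x[0] | x[1]
v = (1, 0)  # instance with prediction 1

sv = shapley(kappa, v)
relevant = set().union(*axps(kappa, v))

print("Shapley values:", sv)
print("Relevant features:", relevant)
```

For this instance the only AXp is the set containing the first feature, so the second feature is irrelevant, yet its Shapley value is $-1/8$ rather than 0. This toy case shows only the weaker phenomenon of a nonzero score for an irrelevant feature. The cases described above, in which an irrelevant feature has the largest absolute Shapley value, come from the paper's running example and its enumeration of 4-variable functions, which this sketch does not reproduce.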
