Towards trustable SHAP scores

Olivier Letoffe, Xuanxiang Huang, Joao Marques-Silva

TL;DR

This paper investigates trustable SHAP scores by arguing that failures of exact SHAP attributions arise from the characteristic function rather than Shapley values themselves. It formalizes desirable properties for characteristic functions (value independence, relevancy compliance, numerical neutrality) and introduces a family of similarity-based characteristic functions that align SHAP scores with abductive and contrastive explanations. The authors analyze the computational complexity of these new definitions, showing $\#\text{P}$-hard and $\text{NP}$-hard cases alongside practical polynomial-time scenarios for common representations, and propose concrete SHAP-tool modifications to adopt the new functions. Overall, the work provides a principled path to trustworthy feature attributions with provable properties and practical implications for XAI tooling and interpretation reliability.

Abstract

SHAP scores represent the proposed use of the well-known Shapley values in eXplainable Artificial Intelligence (XAI). Recent work has shown that the exact computation of SHAP scores can produce unsatisfactory results. Concretely, for some ML models, SHAP scores will mislead with respect to relative feature influence. To address these limitations, recently proposed alternatives exploit different axiomatic aggregations, all of which are defined in terms of abductive explanations. However, the proposed axiomatic aggregations are not Shapley values. This paper investigates how SHAP scores can be modified so as to extend axiomatic aggregations to the case of Shapley values in XAI. More importantly, the proposed new definition of SHAP scores avoids all the known cases where unsatisfactory results have been identified. The paper also characterizes the complexity of computing the novel definition of SHAP scores, highlighting families of classifiers for which computing these scores is tractable. Furthermore, the paper proposes modifications to the existing implementations of SHAP scores. These modifications eliminate some of the known limitations of SHAP scores, and have negligible impact in terms of performance.
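
To make the role of the characteristic function concrete, the following minimal Python sketch computes exact Shapley values for a tiny classifier under two characteristic functions. Everything in it is an illustrative assumption rather than the paper's code: the toy classifier `kappa`, the target instance `V`, the uniform distribution over binary features, and the reading of the similarity-based function as the conditional expectation of the predicate "the prediction equals the target prediction".

```python
from itertools import combinations, product
from math import factorial


# Hypothetical toy classifier over three binary features (not from the paper).
# Feature 2 is never read; feature 1 only matters when feature 0 is 0.
def kappa(x):
    if x[0] == 1:
        return 1
    return 3 if x[1] == 1 else 0


V = (1, 1, 1)  # assumed target instance; kappa(V) == 1


def conditional_mean(f, fixed):
    """Mean of f over all binary points that agree with V on `fixed`,
    assuming a uniform distribution over the feature space."""
    points = [x for x in product((0, 1), repeat=len(V))
              if all(x[j] == V[j] for j in fixed)]
    return sum(f(x) for x in points) / len(points)


def value_expected(fixed):
    """Expected-value characteristic function: conditional mean of kappa."""
    return conditional_mean(kappa, fixed)


def value_similarity(fixed):
    """Assumed similarity-based variant: conditional mean of the
    0/1 predicate [kappa(x) == kappa(V)]."""
    return conditional_mean(lambda x: int(kappa(x) == kappa(V)), fixed)


def shapley(n, value):
    """Exact Shapley values by enumerating all coalitions (exponential in n)."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


phi_expected = shapley(len(V), value_expected)
phi_similarity = shapley(len(V), value_similarity)
print("expected-value scores:  ", phi_expected)
print("similarity-based scores:", phi_similarity)
```

In this toy setting, fixing feature 0 alone already determines the prediction at `V`, so features 1 and 2 play no part in explaining it. The expected-value function still gives feature 1 a nonzero score because the numeric class labels enter the average, whereas the similarity-based variant gives it a score of zero. This only illustrates the kind of discrepancy the abstract refers to; it does not reproduce any of the paper's examples.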

Paper Structure

This paper contains 47 sections, 23 theorems, 20 equations, 3 figures, 2 tables.

Key Result

Proposition 1

$\exists(\mathbf{x}\in\mathbb{F}).\left(\mathsf{AEx}(\mathbf{x},{\mathcal{F}}\setminus{\mathcal{S}};{\mathcal{E}})\right)$ iff $\mathsf{WCXp}({\mathcal{S}};{\mathcal{E}})$, i.e. there exists a constrained adversarial example with the features ${\mathcal{F}}\setminus{\mathcal{S}}$ iff the set ...

Figures (3)

  • Figure 1: Simple regression tree model, adapted from james-bk17. The target sample is $((1,1),1)$.
  • Figure 2: AExs, AXPs & SHAP scores for the regression tree example (see Figure 1) and target sample $((1,1),1)$. For simplicity, parameterizations are elided.
  • Figure 3: Example decision tree (DT), representing classifier $\kappa_2$, and respective similarity predicate $\sigma_{2}$. The target sample is $((1,1,1,1),1)$.

Theorems & Definitions (33)

  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Proposition 4
  • Proposition 5
  • Theorem 1
  • Proposition 6
  • Proposition 7
  • Proposition 8
  • Proposition 9
  • ...and 23 more