The Explanation Game -- Rekindled (Extended Version)

Joao Marques-Silva, Xuanxiang Huang, Olivier Letoffe

TL;DR

The paper tackles the reliability of SHAP-based explanations in XAI by exposing fundamental flaws in the traditional $\mathsf{SHAP}_{\mathrm{T}}$ construction, which relies on expected-value characteristic functions. It introduces a rigorously defined alternative SHAP framework based on a monotone characteristic function $v_a$, paired with a data-based, sample-driven approach (sbXps) and a robust Shapley-estimation procedure (CGT) that guarantees bounded error with high confidence. The key contributions are (i) a new, non-misleading SHAP definition that preserves the connection between feature-attribution and feature-selection explanations, (ii) a scalable estimation pipeline with polynomial-time guarantees that assigns zero attribution to irrelevant features, and (iii) experiments on boolean, tabular, and image data indicating that the new scores rank features more faithfully than SHAP while remaining practical to compute. The findings are relevant to trustworthy XAI: the method is scalable, backed by theory, and yields more accurate feature-importance rankings.
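To make the critique of $\mathsf{SHAP}_{\mathrm{T}}$ concrete, here is a minimal sketch that computes exact Shapley values from an expected-value characteristic function. It assumes boolean features that are independent and uniformly distributed (an assumption of this sketch; SHAP tools typically average over a background dataset instead) and enumerates every coalition, so it is exponential in the number of features and meant only for toy cases. The classifier is hypothetical, not one of the paper's models: for the instance (1,1) with prediction 1, fixing feature 1 already forces prediction 1, whereas fixing feature 2 alone does not (the point (0,1) is predicted 0). Hence {1} is the only AXp and feature 2 is irrelevant, yet the expected-value construction gives feature 2 a nonzero score.

```python
from fractions import Fraction
from itertools import combinations, product
from math import factorial


def expected_value_cf(kappa, v, S):
    """v_e(S): mean prediction over all boolean points that agree with the
    instance v on the features in S (uniform, independent features assumed)."""
    free = [j for j in range(len(v)) if j not in S]
    total = 0
    for bits in product([0, 1], repeat=len(free)):
        x = list(v)
        for j, b in zip(free, bits):
            x[j] = b
        total += kappa(x)
    return Fraction(total, 2 ** len(free))


def shapley_values(cf, n):
    """Exact Shapley values of characteristic function cf over features 0..n-1,
    by enumerating every coalition (exponential in n)."""
    scores = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = Fraction(0)
        for k in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of size k
            weight = Fraction(factorial(k) * factorial(n - k - 1), factorial(n))
            for S in combinations(others, k):
                S = frozenset(S)
                phi += weight * (cf(S | {i}) - cf(S))
        scores.append(phi)
    return scores


# Hypothetical toy classifier: kappa(x) = x1 OR NOT x2 (features 0-indexed in code).
def kappa(x):
    return int(x[0] == 1 or x[1] == 0)


instance = (1, 1)  # kappa(instance) == 1
scores = shapley_values(lambda S: expected_value_cf(kappa, instance, S), 2)
print(scores)  # [Fraction(3, 8), Fraction(-1, 8)]; index 1 (feature 2) is irrelevant
```

Exact rational arithmetic rules out rounding as the cause of the nonzero score, and the two scores sum to 1/4, which equals $v_e(\{1,2\}) - v_e(\emptyset)$ as the efficiency axiom requires.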

Abstract

Recent work demonstrated the existence of critical flaws in the current use of Shapley values in explainable AI (XAI), i.e. the so-called SHAP scores. These flaws are significant in that the scores provided to a human decision-maker can be misleading. Although these negative results might appear to indicate that Shapley values ought not be used in XAI, this paper argues otherwise. Concretely, this paper proposes a novel definition of SHAP scores that overcomes existing flaws. Furthermore, the paper outlines a practically efficient solution for the rigorous estimation of the novel SHAP scores. Preliminary experimental results confirm our claims, and further underscore the flaws of the current SHAP scores.
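The rigorous estimation mentioned in the abstract rests on sampling with a probabilistic error bound. The paper's estimator (CGT) is not reproduced here; the sketch below swaps in plain permutation sampling, a different and well-known Monte Carlo estimator, only to illustrate how such a guarantee can be obtained. It assumes the characteristic function `v` is a callable on frozensets of 0-indexed features and that every marginal contribution lies in an interval of width `value_range`. Hoeffding's inequality with a union bound over the n features then gives a number of sampled permutations that keeps all n estimates within `eps` of the exact Shapley values with probability at least 1 - `delta`. The names `hoeffding_samples`, `shapley_permutation_estimate` and `v_toy` are hypothetical.

```python
import math
import random
from typing import Callable, FrozenSet, List

CharFn = Callable[[FrozenSet[int]], float]


def hoeffding_samples(value_range: float, eps: float, delta: float, n: int) -> int:
    """Permutations needed so that, by Hoeffding's inequality and a union bound
    over n features, all n estimates are within eps w.p. at least 1 - delta."""
    return math.ceil(value_range ** 2 * math.log(2 * n / delta) / (2 * eps ** 2))


def shapley_permutation_estimate(v: CharFn, n: int, eps: float, delta: float,
                                 value_range: float, seed: int = 0) -> List[float]:
    """Monte Carlo permutation-sampling estimate of Shapley values (not CGT)."""
    rng = random.Random(seed)
    m = hoeffding_samples(value_range, eps, delta, n)
    totals = [0.0] * n
    order = list(range(n))
    for _ in range(m):
        rng.shuffle(order)
        coalition: FrozenSet[int] = frozenset()
        prev = v(coalition)
        for i in order:
            # marginal contribution of feature i to the features preceding it
            coalition = coalition | {i}
            cur = v(coalition)
            totals[i] += cur - prev
            prev = cur
    return [t / m for t in totals]


# Hypothetical monotone characteristic function with values in {0, 1}:
# a coalition "wins" once it contains features 0 and 1.
def v_toy(S: FrozenSet[int]) -> float:
    return 1.0 if {0, 1} <= S else 0.0


est = shapley_permutation_estimate(v_toy, n=3, eps=0.05, delta=0.05, value_range=1.0)
print(est)  # exact values are [0.5, 0.5, 0.0]; feature 2 is always exactly 0.0
```

For a monotone characteristic function with values in [0, 1] (an assumption here; the TL;DR states only that $v_a$ is monotone), every marginal contribution lies in [0, 1], so `value_range = 1`. A feature whose marginal contributions are always zero receives an estimate of exactly 0, whatever permutations are drawn.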

Paper Structure (20 sections, 2 theorems, 16 equations, 4 figures, 5 tables, 2 algorithms)

Key Result

Proposition 1

Let $i\in{\mathcal{F}}$ be an irrelevant feature. Then,
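In the formal-XAI literature, a feature is irrelevant when it occurs in no AXp. The brute-force sketch below assumes boolean features and the standard definitions: a set S is a weak AXp if fixing the features in S to their values in the instance guarantees the same prediction for every completion, and an AXp is a subset-minimal weak AXp; dually, a weak CXp is a set whose features, when freed (all others fixed), allow a different prediction, and a CXp is a subset-minimal weak CXp. The classifier is hypothetical; it was chosen so that its AXps and CXps coincide with the sets reported for ${\mathcal{M}}_1$ in Figure 1, but it is not claimed to be that decision tree. Enumeration is exponential and meant only for toy models.

```python
from itertools import combinations, product


def _completions(v, free):
    """Yield every boolean point that agrees with v outside the `free` features."""
    for bits in product([0, 1], repeat=len(free)):
        x = list(v)
        for j, b in zip(free, bits):
            x[j] = b
        yield x


def _minimal(sets):
    """Keep only the subset-minimal sets."""
    return [s for s in sets if not any(t < s for t in sets)]


def enumerate_axps(kappa, v):
    n, c = len(v), kappa(v)
    weak = []
    for k in range(n + 1):
        for S in combinations(range(n), k):
            free = [j for j in range(n) if j not in S]
            # weak AXp: fixing S to v forces prediction c for every completion
            if all(kappa(x) == c for x in _completions(v, free)):
                weak.append(frozenset(S))
    return _minimal(weak)


def enumerate_cxps(kappa, v):
    n, c = len(v), kappa(v)
    weak = []
    for k in range(n + 1):
        for S in combinations(range(n), k):
            # weak CXp: freeing S (others fixed to v) allows a different prediction
            if any(kappa(x) != c for x in _completions(v, list(S))):
                weak.append(frozenset(S))
    return _minimal(weak)


def irrelevant_features(kappa, v):
    """Features that occur in no AXp."""
    relevant = set().union(*enumerate_axps(kappa, v))
    return [i for i in range(len(v)) if i not in relevant]


# Hypothetical classifier: predicts 1 iff features 1 and 2 (indices 0, 1) are both 1.
def kappa(x):
    return int(x[0] == 1 and x[1] == 1)


instance = (1, 1, 1, 1)
print(enumerate_axps(kappa, instance))       # [frozenset({0, 1})]
print(enumerate_cxps(kappa, instance))       # [frozenset({0}), frozenset({1})]
print(irrelevant_features(kappa, instance))  # [2, 3]
```

Indices are 0-based, so {0, 1} corresponds to the paper's {1, 2}. The TL;DR above states that the proposed scores assign zero attribution to features such as indices 2 and 3 here.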

Figures (4)

  • Figure 1: ML model ${\mathcal{M}}_1$, adapted from Fig. 6(a) in hms-ijar24. As shown, the instance is $((1,1,1,1),1)$. For the decision tree (DT), the set of AXps is $\mathbb{A}_1=\{\{1,2\}\}$ and the set of CXps is $\mathbb{C}_1=\{\{1\},\{2\}\}$. The expected values are used for computing the SHAP scores, as proposed in lundberg-nips17.
  • Figure 2: Simple ML model ${\mathcal{M}}_2$, with instance $((1,1),1)$, and $\alpha\not=0$. The expected values are computed for all possible sets of features. Clearly, $\mathbb{A}_2=\mathbb{C}_2=\{\{1\}\}$.
  • Figure 3: $\mathsf{SHAP}_{\mathrm{T}}$ scores for ${\mathcal{E}}_1$ and ${\mathcal{E}}_2$. These are the values that the tool SHAP lundberg-nips17 approximates.
  • Figure 4: Comparison of RBO values. Blue shows the comparison with $\mathsf{SHAP}_{\mathrm{E}}$ scores; green shows the comparison with absolute $\mathsf{SHAP}_{\mathrm{E}}$ scores.
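Figure 4 reports rank-biased overlap (RBO), which compares two feature rankings while weighting agreement near the top more heavily. Below is a minimal sketch of the extrapolated RBO of Webber, Moffat and Zobel; the persistence parameter p = 0.9, the restriction to equal-length rankings, and tie-breaking by feature index are assumptions of this sketch, since the page does not state which RBO variant or parameters the experiments use. The hypothetical helper `rank_by` turns scores into a ranking, optionally by absolute value, mirroring the signed/absolute distinction in the caption.

```python
def rbo_ext(s, t, p=0.9):
    """Extrapolated rank-biased overlap of two equal-length rankings, each
    listing distinct items; returns 1.0 for identical rankings."""
    if len(s) != len(t) or not s:
        raise ValueError("this sketch expects two non-empty rankings of equal length")
    k = len(s)
    weighted = 0.0
    for d in range(1, k + 1):
        agreement = len(set(s[:d]) & set(t[:d])) / d  # A_d: overlap at depth d
        weighted += agreement * p ** d
    a_k = len(set(s) & set(t)) / k
    return a_k * p ** k + (1 - p) / p * weighted


def rank_by(scores, absolute=False):
    """Feature indices from highest to lowest score (ties broken by index)."""
    key = (lambda i: -abs(scores[i])) if absolute else (lambda i: -scores[i])
    return sorted(range(len(scores)), key=key)


# Hypothetical scores for four features (not taken from the paper).
novel = [0.5, 0.5, 0.0, 0.0]
shap_e = [0.375, 0.375, -0.2, 0.05]
print(rbo_ext(rank_by(novel), rank_by(shap_e)))                 # signed ranking
print(rbo_ext(rank_by(novel), rank_by(shap_e, absolute=True)))  # absolute ranking
```

Ranking by absolute value treats a large negative score as important, which can change the ranking and hence the RBO value, as the two calls above illustrate.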

Theorems & Definitions (9)

  • Example 1
  • Example 2
  • Example 3
  • Proposition 1
  • Proposition 2
  • Example 4
  • Example 5
  • Example 6
  • Example 7