Explainability is NOT a Game

Joao Marques-Silva; Xuanxiang Huang

TL;DR

The paper challenges the reliability of Shapley-value-based feature attribution for XAI by showing that irrelevant features can receive the largest absolute Shapley values, which misrepresents their role in a prediction. It formalizes abductive (AXp) and contrastive (CXp) explanations, connects them to feature relevancy via minimal hitting set (MHS) duality, and defines Shapley values on Boolean classifiers using the constructs $\Upsilon$, $\phi$, $\Delta$, and $\mathsf{Sv}$. Through a running example and an exhaustive enumeration of 4-variable Boolean functions, it uncovers widespread issues (I1–I7) and argues that Shapley-based explanations often conflict with logic-based relevancy. Consequently, SHAP and related approximations inherit these fundamental flaws, which undermines their use in high-stakes domains.

Abstract

Explainable artificial intelligence (XAI) aims to help human decision-makers understand complex machine learning (ML) models. Among the hallmarks of XAI are measures of relative feature importance, which are theoretically justified through the use of Shapley values. This paper builds on recent work and offers a simple argument for why Shapley values can provide misleading measures of relative feature importance, by assigning more importance to features that are irrelevant for a prediction, and assigning less importance to features that are relevant for a prediction. The significance of these results is that they effectively challenge the many proposed uses of measures of relative feature importance in a fast-growing range of high-stakes application domains.

Paper Structure (11 sections, 13 equations, 3 figures, 4 tables)

Introduction
Definitions
Classification Problems
Formal Explanations
Shapley Values in Explainability
Feature (Ir)relevancy
Refuting Shapley Values for Explainability
Misleading Feature Importance
Issues with Shapley Values for Explainability
Verdict & Justification
Discussion

Figures (3)

Figure 1: Example classifier -- decision tree and its truth table. For this classifier, we have ${\mathcal{F}}=\{1,2,3,4\}$, ${\mathcal{D}}_i=\{0,1\},i=1,2,3,4$, $\mathbb{F}=\{0,1\}^4$, and ${\mathcal{K}}=\{0,1\}$. The classification function is given by the decision tree shown, or alternatively by the truth table. Finally, the instance considered is $((0,0,0,0),0)$, corresponding to row 1 in the truth table. The instance is consistent with path $\langle1,2,4,6,10\rangle$, which is highlighted in the DT. The prediction is 0, as indicated in terminal node 10.
Figure 2: Computing AXp's/CXp's for the example DT and instance $((0,0,0,0),0)$. All subsets of features are considered. For computing AXp's, and for some set ${\mathcal{S}}$, the features in ${\mathcal{S}}$ are fixed to their values as dictated by $\mathbf{v}$. The picked rows are the rows consistent with those fixed values. For example, if ${\mathcal{S}}=\{2,3,4\}$, then only rows 1 and 9 are consistent with having features 2, 3 and 4 assigned value 0. Similarly, for computing CXp's, and for some set ${\mathcal{S}}$, the features in ${\mathcal{F}}\setminus{\mathcal{S}}$ are fixed to their values as dictated by $\mathbf{v}$. The picked rows are again the rows consistent with those fixed values. For example, if ${\mathcal{S}}=\{2\}$, then ${\mathcal{F}}\setminus{\mathcal{S}}=\{1,3,4\}$, and so only rows 1 and 5 are consistent with having features 1, 3 and 4 assigned value 0. An AXp is an irreducible set of features that is sufficient for the prediction. In this example, only $\{2,3,4\}$ respects the criteria. Moreover, a CXp is an irreducible set of features which, if allowed to take any value from their domain, the prediction changes. For this example, $\{2\}$, $\{3\}$ and $\{4\}$ respect the criteria, i.e. by only changing one of these features, we are able to change the prediction.
Figure 3: Computation of Shapley values for the example DT and instance $((0,0,0,0),0)$. For each feature $i$, the sets to consider are all the sets that do not include the feature. For each set ${\mathcal{S}}$, we show the rows consistent with the values of the features in ${\mathcal{S}}$, as dictated by $\mathbf{v}$. For example, if ${\mathcal{S}}=\{2,4\}$, then the rows of the truth table consistent with having features 2 and 4 assigned value 0 are 1, 3, 9 and 11. The average values are obtained by summing up the values of the classifier in the rows consistent with ${\mathcal{S}}$ and dividing by the total number of rows. For ${\mathcal{S}}=\{2,4\}$, only row 3 in the truth table takes value 1, and so the average becomes $1/4$.
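
The subset enumerations described in the captions of Figures 2 and 3 can be sketched in a few lines of Python. The truth table of Figure 1 is not reproduced on this page, so the classifier `kappa` below is a hypothetical stand-in, chosen only so that it agrees with the facts stated in the captions; it should not be read as the paper's decision tree. The helper names (`consistent_points`, `minimal_sets`, `axps`, `cxps`, `phi`, `shapley`) are likewise assumptions made for illustration, and the Shapley weights follow the standard definition.

```python
from fractions import Fraction
from itertools import combinations, product
from math import factorial

FEATURES = (1, 2, 3, 4)

def kappa(x):
    """Hypothetical Boolean classifier (NOT the paper's truth table)."""
    x1, x2, x3, x4 = x
    return int(x2 or x3 or x4) if x1 == 0 else int(x2 and x4)

def consistent_points(fixed, v):
    """Rows of the truth table that agree with v on every feature in `fixed`."""
    return [x for x in product((0, 1), repeat=len(FEATURES))
            if all(x[i - 1] == v[i - 1] for i in fixed)]

def subsets(items):
    """All subsets of `items`, smallest first."""
    items = tuple(items)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)

def minimal_sets(predicate):
    """Subset-minimal feature sets satisfying a monotone predicate."""
    found = []
    for s in subsets(FEATURES):
        if predicate(s) and not any(m <= s for m in found):
            found.append(s)
    return found

def axps(v):
    """AXp: irreducible set whose fixed values force the prediction."""
    c = kappa(v)
    return minimal_sets(
        lambda s: all(kappa(x) == c for x in consistent_points(s, v)))

def cxps(v):
    """CXp: irreducible set which, when freed, can change the prediction."""
    c = kappa(v)
    return minimal_sets(
        lambda s: any(kappa(x) != c
                      for x in consistent_points(set(FEATURES) - s, v)))

def phi(s, v):
    """Average classifier value over the rows consistent with s."""
    rows = consistent_points(s, v)
    return Fraction(sum(kappa(x) for x in rows), len(rows))

def shapley(i, v):
    """Shapley value of feature i, summing over all sets that exclude i."""
    n = len(FEATURES)
    total = Fraction(0)
    for s in subsets(f for f in FEATURES if f != i):
        weight = Fraction(factorial(len(s)) * factorial(n - len(s) - 1),
                          factorial(n))
        total += weight * (phi(s | {i}, v) - phi(s, v))
    return total

if __name__ == "__main__":
    v = (0, 0, 0, 0)
    print("AXps:", [sorted(s) for s in axps(v)])
    print("CXps:", [sorted(s) for s in cxps(v)])
    for i in FEATURES:
        print(f"Sv({i}) =", shapley(i, v))
```

For this stand-in classifier and the instance $((0,0,0,0),0)$, feature 1 occurs in no AXp and no CXp, yet its Shapley value is nonzero, which is the kind of mismatch between relevancy and attributed importance that the paper examines. Exact `Fraction` arithmetic is used so that averages such as $1/4$ for ${\mathcal{S}}=\{2,4\}$ are reproduced without rounding.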
