Towards Attributions of Input Variables in a Coalition

Xinhao Zheng; Huiqi Deng; Quanshi Zhang

TL;DR

This paper analyzes the numerical effects of AND-OR interactions in AI models and extends the Shapley value to a new attribution metric for variable coalitions, revealing that specific interactions cause attribution conflicts.

Abstract

This paper focuses on the fundamental challenge of partitioning input variables in attribution methods for Explainable AI, particularly in Shapley value-based approaches. Previous methods always compute attributions given a predefined partition but lack theoretical guidance on how to form meaningful variable partitions. We identify that attribution conflicts arise when the attribution of a coalition differs from the sum of its individual variables' attributions. To address this, we analyze the numerical effects of AND-OR interactions in AI models and extend the Shapley value to a new attribution metric for variable coalitions. Our theoretical findings reveal that specific interactions cause attribution conflicts, and we propose three metrics to evaluate coalition faithfulness. Experiments on synthetic data, NLP, image classification, and the game of Go validate our approach, demonstrating consistency with human intuition and practical applicability.

Paper Structure (28 sections, 7 theorems, 33 equations, 7 figures, 5 tables)

Introduction
Related works
Algorithm
Preliminaries: AND-OR interactions
Revisiting attributions from interactions
Attribution value for a coalition
Explaining the conflict of attributions
Properties/axioms for the attribution of a coalition
Experiment
Evaluating faithfulness of a coalition
Application: explaining the Go game
Conclusion
Coalition attribution vs interaction effect
Universal-matching property of AND-OR interactions
Proof of Theorem 2
...and 13 more sections

Key Result

Theorem 3.2

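(Reformulation of the Shapley value, proved in the appendix section "Proof of Theorem 2") The Shapley value $\phi(i)$ of each input variable $x_i$ can be explained as $\phi(i)=\sum_{S\subseteq N,i\in S}\frac{1}{|S|}\left[I_\text{and}(S)+I_\text{or}(S)\right]$. In other words, the effect of every interaction $S$ that contains $x_i$ is shared equally among the $|S|$ variables in $S$.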
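The reformulation can be checked numerically on a toy example. The sketch below is illustrative and is not the authors' code: the interaction sets loosely mirror Figure 1(a), the effect values in `I_AND` and `I_OR` are made up, and the function `v` assumes that an AND interaction fires when all of its variables are present and an OR interaction fires when at least one of them is present. Under those assumptions, the classic subset-weighted Shapley value should agree with the sum of $\frac{1}{|S|}$-weighted interaction effects.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy model with 6 variables (indices 0..5 stand for x_1..x_6).
N = frozenset(range(6))

# Assumed interaction effects; the numeric values are invented for illustration.
I_AND = {frozenset({0, 1}): 2.0, frozenset({0, 1, 2, 3, 4, 5}): 3.0}
I_OR = {frozenset({4, 5}): -1.0}


def v(T):
    """Toy model output when only the variables in T are present.

    Assumption: an AND effect is added when S is a subset of T,
    an OR effect is added when S and T share at least one variable.
    """
    and_part = sum(e for S, e in I_AND.items() if S <= T)
    or_part = sum(e for S, e in I_OR.items() if S & T)
    return and_part + or_part


def shapley_classic(i):
    """Standard Shapley value: weighted marginal contribution over all subsets."""
    others = N - {i}
    n = len(N)
    total = 0.0
    for k in range(n):
        weight = factorial(k) * factorial(n - k - 1) / factorial(n)
        for T in combinations(others, k):
            T = frozenset(T)
            total += weight * (v(T | {i}) - v(T))
    return total


def shapley_from_interactions(i):
    """Reformulated Shapley value: sum of I(S)/|S| over interactions containing i."""
    total = 0.0
    for table in (I_AND, I_OR):
        total += sum(e / len(S) for S, e in table.items() if i in S)
    return total


if __name__ == "__main__":
    for i in sorted(N):
        print(i, shapley_classic(i), shapley_from_interactions(i))
```

The brute-force `shapley_classic` enumerates all $2^{n-1}$ subsets, so it only serves as a sanity check for small $n$; the interaction-based form needs only the (typically sparse) table of nonzero interaction effects.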

Figures (7)

Figure 1: (a) AND-OR interactions: Suppose the AI model encodes three interactions, $S_1 = \{x_1, x_2\}$, $S_2=\{x_1, x_2, x_3, x_4, x_5, x_6\}$, and $S_3 = \{x_5, x_6\}$. Then the Shapley value of $x_1$ can be decomposed as $\phi(x_1)=1/2\cdot I(S_1) + 1/6\cdot I(S_2)$. (b) Conflict of attributions: Consider another example with three interactions, $S_1=\{x_1,x_2,x_3,x_4\}$, $S_2=\{x_1,x_2\}$, and $S_3=\{x_2,x_3,x_4\}$. The attribution of the coalition $\{x_1,x_2\}$ is not equal to the sum of the attributions of the input variables $x_1$ and $x_2$, i.e., $\varphi(S=\{x_1,x_2\})\neq \phi(x_1)+\phi(x_2)$.
Figure 2: Visualization of two approaches for selecting coalitions in KataGo. For a coalition $S$ of stones, $\varphi(S)>0$ means that $S$ has a positive numerical effect for White, while $\varphi(S)<0$ means that it has a negative effect.
Figure 3: Analysis of shape patterns in Go compared to human intuition
Figure 4: Coalition attribution faithfulness metrics of VGG-11 on CIFAR-10 dataset
Figure 5: Coalition attribution faithfulness metrics of ResNet-20 on CIFAR-10 dataset
...and 2 more figures

Theorems & Definitions (18)

Definition 3.1
Theorem 3.2
Theorem 3.3
Theorem 3.4
Corollary 3.5
Theorem 3.6
Corollary 3.7
Corollary 3.8
proof
proof
...and 8 more
