Explaining Probabilistic Models with Distributional Values

Luca Franceschi, Michele Donini, Cédric Archambeau, Matthias Seeger

TL;DR

This paper introduces distributional values, random variables that track changes in the model output, and derives their analytic expressions for games with Gaussian, Bernoulli and Categorical payoffs.

Abstract

A large branch of explainable machine learning is grounded in cooperative game theory. However, research indicates that game-theoretic explanations may mislead or be hard to interpret. We argue that there is often a critical mismatch between what one wishes to explain (e.g. the output of a classifier) and what current methods such as SHAP explain (e.g. the scalar probability of a class). This paper addresses this gap for probabilistic models by generalising cooperative games and value operators. We introduce distributional values, random variables that track changes in the model output (e.g. flipping of the predicted class), and derive their analytic expressions for games with Gaussian, Bernoulli and Categorical payoffs. We further establish several characterising properties, and show through case studies on vision and language models that our framework provides fine-grained and insightful explanations.

Paper Structure (31 sections, 2 theorems, 68 equations, 6 figures, 1 table)

Key Result

Lemma 3.7

Denote $\alpha_j = \theta_{S\cup i, j}$, $\beta_j = \theta_{S, j}$ and $\nu_j = \alpha_j - \beta_j$, and assume (without loss of generality) that the categories are ordered so that $\nu_1\ge \nu_2\ge \dots\ge \nu_d$. Then, for any $i\in [n]$ and $S\in 2^{[n]\setminus i}$, the distribution of $v(S\cup i)$ [...] where, for $r\neq s$, $\tilde{Q}_{i, S}(r, s) = e^{\alpha_r + \beta_s}\left( C_s - C_r \right)$ [...]

Figures (6)

  • Figure 1: (Left) The model (black box), representing $f$, is a $3$-way classifier that outputs categorical distributions. (Right) Computation of the marginal contribution of $i$ to $S$ under the traditional framework (top) and our proposed framework (bottom). In both cases, we query the model with and without feature $i$, which results in two different categorical distributions. The standard approach (e.g. as in SHAP) disregards the probabilistic nature of the outcome and treats the probability vectors as simple real-valued vectors. At the bottom, our approach preserves the stochastic structure (depicted by the simplex). The resulting stochastic marginal contribution is a random variable (RV) taking values in the difference set. In the categorical case, this set is made of "switching points" between predicted classes, e.g. from cat to dog. Furthermore, the expectation of a distributional value is the corresponding standard value. This correspondence, formalized in Proposition \ref{prop:all}(i), is represented by the star symbol and the arrow connecting the top and bottom representations.
  • Figure 2: Applications of the categorical Shapley value to a digit classifier (top) and an object classifier (bottom). Test images are from MNIST [lecun1998gradient] and ImageNet [deng2009imagenet]. All gradations of white represent pixel-wise probabilities.
  • Figure 3: (Top two rows) We plot the standard (estimated) Shapley value (SV) for each of the digits, explaining the output probabilities: red gradations indicate positive contributions, blue negative ones. The values have been obtained as the expectation of the Categorical SV, but could also have been obtained with other techniques such as KernelSHAP [lundberg2017unified]. (Bottom five rows) We plot slices of the Categorical SV. All plots except the last two show transition probabilities from and to the digit '8' and complement Figure \ref{fig:images} in the main paper. The last two plots show examples of transition probabilities that do not involve the digit '8'.
  • Figure 4: Plots of several other transition probabilities for the cat example of the main paper. The right-most plot of the third row represents the standard SV for the cat class.
  • Figure 5: Fidelity studies for the MNIST (top row) and ImageNet (bottom row) cases.
  • ...and 1 more figure
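
The correspondence described in the Figure 1 caption (the expectation of the stochastic marginal contribution equals the standard one) can be checked numerically with a small sketch. Everything below is illustrative: the two probability vectors are made-up numbers for a 3-way classifier, the function names are hypothetical, and the two model outputs are coupled independently purely for simplicity. The paper's construction relies on a noise-sharing condition (Remark 3.6), which yields a different transition table; the expectation identity holds for any coupling with the same marginals.

```python
from itertools import product


def independent_transition_table(p_with, p_without):
    """table[r][s] = probability of predicting class r with feature i
    and class s without it, under an ASSUMED independent coupling
    (not the paper's noise-sharing construction)."""
    d = len(p_with)
    return [[p_with[r] * p_without[s] for s in range(d)] for r in range(d)]


def expected_difference(table):
    """Expectation of e_r - e_s, where (r, s) is drawn from the table
    and e_k denotes the k-th one-hot vector."""
    d = len(table)
    expectation = [0.0] * d
    for r, s in product(range(d), repeat=2):
        expectation[r] += table[r][s]  # +1 at the class predicted with i
        expectation[s] -= table[r][s]  # -1 at the class predicted without i
    return expectation


# Hypothetical model outputs for three classes (illustrative values only).
p_with = [0.7, 0.2, 0.1]     # categorical output for coalition S ∪ i
p_without = [0.2, 0.5, 0.3]  # categorical output for coalition S

table = independent_transition_table(p_with, p_without)

# Traditional marginal contribution: plain vector difference.
standard = [a - b for a, b in zip(p_with, p_without)]

# Expectation of the stochastic marginal contribution.
expected = expected_difference(table)
```

The off-diagonal entries of `table` play the role of the "switching points" between predicted classes: `table[r][s]` with `r != s` is the probability of a switch from class `s` to class `r` when feature `i` is added. Taking the expectation collapses this table back to the vector `standard`, which is why the distributional view carries strictly more information than the traditional one.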

Theorems & Definitions (9)

  • Definition 3.1: Cooperative stochastic games
  • Definition 3.2: Stochastic marginal contribution
  • Definition 3.3: Distributional value operators
  • Remark 3.4: Distributional values and reparameterizations
  • Example 3.5: The XOR game
  • Remark 3.6: On the "noise sharing" condition
  • Lemma 3.7: Categorical marginal contributions
  • Definition 3.8: Efficient and symmetric coalition structures
  • Proposition 3.9