Explaining a probabilistic prediction on the simplex with Shapley compositions

Paul-Gauthier Noé; Miquel Perelló-Nieto; Jean-François Bonastre; Peter Flach

TL;DR

The paper tackles explainability for multiclass probabilistic predictions by treating outputs as compositional data on the probability simplex. It extends the Shapley value to the simplex through the Aitchison geometry, defining Shapley compositions and proving their uniqueness under linearity, symmetry, and efficiency. An estimation algorithm and several visualisation tools in the isometric log-ratio space are proposed, with demonstrations on the Iris and digits datasets that illustrate how features move the distribution toward or away from class regions. This geometry-aware framework yields coherent, directional explanations across all classes at once, addressing the limitations of per-class (one-vs-rest) Shapley explanations. The approach provides a theoretical and practical foundation for interpretable multiclass predictions, though computational cost and feature dependencies remain practical considerations.

Abstract

Originating in game theory, Shapley values are widely used for explaining a machine learning model's prediction by quantifying the contribution of each feature's value to the prediction. This requires a scalar prediction as in binary classification, whereas a multiclass probabilistic prediction is a discrete probability distribution, living on a multidimensional simplex. In such a multiclass setting the Shapley values are typically computed separately on each class in a one-vs-rest manner, ignoring the compositional nature of the output distribution. In this paper, we introduce Shapley compositions as a well-founded way to properly explain a multiclass probabilistic prediction, using the Aitchison geometry from compositional data analysis. We prove that the Shapley composition is the unique quantity satisfying linearity, symmetry and efficiency on the Aitchison simplex, extending the corresponding axiomatic properties of the standard Shapley value. We demonstrate this proper multiclass treatment in a range of scenarios.
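The idea in the abstract can be sketched numerically. The block below is a minimal illustration, not the authors' estimation algorithm: it assumes the standard Aitchison operations (perturbation and powering) and computes exact Shapley compositions by brute-force enumeration of coalitions. The names `closure`, `perturb`, `power`, `shapley_compositions` and the toy value function `toy_value` are hypothetical and chosen for illustration only.

```python
import itertools
import math

import numpy as np


def closure(x):
    """Normalise a positive vector so that it sums to one."""
    x = np.asarray(x, dtype=float)
    return x / x.sum()


def perturb(x, y):
    """Aitchison perturbation: the simplex analogue of vector addition."""
    return closure(np.asarray(x, dtype=float) * np.asarray(y, dtype=float))


def power(a, x):
    """Aitchison powering: the simplex analogue of scalar multiplication."""
    return closure(np.asarray(x, dtype=float) ** a)


def shapley_compositions(value, n_features):
    """Exact Shapley compositions by enumerating every coalition.

    `value(S)` is a hypothetical callable mapping a frozenset of feature
    indices to a probability vector (the prediction when only the
    features in S are known). Cost grows exponentially with n_features.
    """
    n = n_features
    n_classes = len(value(frozenset()))
    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = closure(np.ones(n_classes))  # uniform = neutral element
        for r in range(n):
            weight = (math.factorial(r) * math.factorial(n - r - 1)
                      / math.factorial(n))
            for subset in itertools.combinations(others, r):
                S = frozenset(subset)
                # Marginal contribution of i: v(S ∪ {i}) "minus" v(S)
                diff = perturb(value(S | {i}), power(-1.0, value(S)))
                phi = perturb(phi, power(weight, diff))
        phis.append(phi)
    return phis


def toy_value(S):
    """Made-up 3-feature, 3-class value function (illustration only)."""
    logits = np.array([0.2, -0.1, 0.0])
    contrib = {0: [1.0, 0.0, -0.5], 1: [-0.3, 0.8, 0.1], 2: [0.0, -0.4, 0.9]}
    for j in S:
        logits = logits + np.array(contrib[j])
    if {0, 1} <= S:  # an interaction between features 0 and 1
        logits = logits + np.array([0.5, -0.5, 0.0])
    return closure(np.exp(logits))


phis = shapley_compositions(toy_value, 3)

# Efficiency: perturbing the base distribution by every Shapley
# composition should recover the full prediction.
reconstructed = toy_value(frozenset())
for phi in phis:
    reconstructed = perturb(reconstructed, phi)
prediction = toy_value(frozenset({0, 1, 2}))
```

Each element of `phis` is itself a composition (a point on the simplex), so a feature's contribution is a direction and magnitude of movement across all classes at once rather than a separate scalar per class.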

Paper Structure (26 sections, 3 theorems, 27 equations, 10 figures, 2 algorithms)

Remark
Introduction
Related work
The Shapley value in machine learning
Compositional data
The Aitchison geometry of the simplex
The isometric log-ratio transformation
Shapley composition on the simplex
Explaining a multiclass prediction with Shapley compositions
Visualisation in an isometric-log-ratio space
Three classes
Four classes
More classes: groups of classes and balances
Angles, norms and projections
Histograms and parallel coordinates
...and 11 more sections

Key Result

Theorem 1

The Shapley composition is the unique quantity that satisfies the following properties on the Aitchison simplex: linearity, symmetry and efficiency.
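For orientation, the three properties named in the abstract (linearity, symmetry, efficiency) can be written out as direct analogues of the classical Shapley axioms, with addition, scalar multiplication and subtraction replaced by Aitchison perturbation $\oplus$, powering $\odot$ and inverse perturbation $\ominus$. The notation below is an assumption made for this sketch ($N$ is the set of $n$ features, $\bm{v}(S)$ the simplex-valued prediction when only the features in $S$ are known); the paper's exact statements may differ in form.

```latex
% Shapley composition of feature i (assumed notation)
\bm{\phi}_i(\bm{v}) = \bigoplus_{S \subseteq N \setminus \{i\}}
  \frac{|S|!\,(n - |S| - 1)!}{n!} \odot
  \bigl( \bm{v}(S \cup \{i\}) \ominus \bm{v}(S) \bigr)

% Linearity, for real scalars a, b and value functions v, w
\bm{\phi}_i\bigl((a \odot \bm{v}) \oplus (b \odot \bm{w})\bigr)
  = \bigl(a \odot \bm{\phi}_i(\bm{v})\bigr) \oplus \bigl(b \odot \bm{\phi}_i(\bm{w})\bigr)

% Symmetry
\bm{v}(S \cup \{i\}) = \bm{v}(S \cup \{j\})
  \ \ \forall S \subseteq N \setminus \{i, j\}
  \;\Longrightarrow\; \bm{\phi}_i(\bm{v}) = \bm{\phi}_j(\bm{v})

% Efficiency
\bigoplus_{i \in N} \bm{\phi}_i(\bm{v}) = \bm{v}(N) \ominus \bm{v}(\emptyset)
```

Efficiency is the property that the figures rely on: perturbing the base distribution $\bm{v}(\emptyset)$ by all the Shapley compositions lands exactly on the prediction $\bm{v}(N)$.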

Figures (10)

Figure 1: A synthetic example of a Shapley composition-based explanation. This shows a $2$-dimensional space isomorphic to the $3$-class simplex where each point is a probability distribution as visualised with the histograms. The dashed black rays separate the maximum probability regions for each class and the dashed coloured vectors show the direction in favour of one class and against the other two. The space is additive such that the features' contributions $\{\tilde{\bm{\phi}}_i\}_{i \in \{1,2,3\}}$ translate the base distribution to the prediction.
Figure 2: The sum of the Shapley compositions in an ILR space from the base distribution to the prediction for the classification of an Iris instance.
Figure 3: Visualisation of the Shapley values for each class in a one-vs-the-rest manner for the same instance as in Figure 2, obtained using the SHAP toolkit (Lundberg and Lee, 2017). The red/blue bars represent positive/negative contributions of each feature on the prediction.
Figure 4: Shapley compositions in a $3$-dimensional ILR space for a four-class digit recognition task. The Shapley compositions are summed in the ILR space from the base distribution to the prediction. The gray transparent walls mark out the four maximum probability decision regions.
Figure 5: Bifurcation tree corresponding to the basis obtained with the Gram-Schmidt procedure as in Egozcue et al. (2003) and used in the examples of Figures 2 and 4.
...and 5 more figures
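The captions above describe summing contributions as ordinary vectors in an ILR space. A minimal sketch of why that works: the isometric log-ratio transform maps perturbation on the simplex to plain vector addition. The basis built by `ilr_basis` is one valid orthonormal choice (Helmert-type contrasts) and is an assumption of this sketch, not necessarily the basis used in the figures; the function names and the probability vectors are made up for illustration.

```python
import numpy as np


def closure(x):
    """Normalise a positive vector so that it sums to one."""
    x = np.asarray(x, dtype=float)
    return x / x.sum()


def ilr_basis(n_classes):
    """Rows form an orthonormal basis of the zero-sum (clr) hyperplane.

    Helmert-type contrasts: row i compares the first i classes with
    class i. This is one of many valid ILR bases (assumed here).
    """
    V = np.zeros((n_classes - 1, n_classes))
    for i in range(1, n_classes):
        V[i - 1, :i] = 1.0 / i
        V[i - 1, i] = -1.0
        V[i - 1] *= np.sqrt(i / (i + 1.0))
    return V


def ilr(p, V):
    """Map a composition to its ILR coordinates in R^(D-1)."""
    log_p = np.log(np.asarray(p, dtype=float))
    clr = log_p - log_p.mean()  # centred log-ratio
    return V @ clr


def ilr_inv(z, V):
    """Map ILR coordinates back to a composition on the simplex."""
    return closure(np.exp(V.T @ np.asarray(z, dtype=float)))


V = ilr_basis(3)

# Made-up base distribution and two made-up feature contributions.
base = np.array([0.5, 0.3, 0.2])
phi_1 = np.array([0.2, 0.5, 0.3])
phi_2 = np.array([0.6, 0.1, 0.3])

# On the simplex: perturb the base by each contribution.
prediction = closure(base * phi_1 * phi_2)

# In ILR space: the same path is a sum of ordinary vectors.
path_end = ilr(base, V) + ilr(phi_1, V) + ilr(phi_2, V)
```

Because `path_end` equals `ilr(prediction, V)`, the contributions can be drawn head to tail as arrows from the base distribution to the prediction, which is the construction the figure captions refer to.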

Theorems & Definitions (7)

Theorem 1
proof
Definition
Lemma
proof
Corollary
proof
