Explaining a probabilistic prediction on the simplex with Shapley compositions
Paul-Gauthier Noé, Miquel Perelló-Nieto, Jean-François Bonastre, Peter Flach
TL;DR
The paper tackles explainability for multiclass probabilistic predictions by treating the output as compositional data on the probability simplex. It extends the Shapley value to the simplex through the Aitchison geometry, defining Shapley compositions and proving that they are the unique quantity satisfying linearity, symmetry, and efficiency. The authors propose an estimation algorithm and several visualisation tools in the isometric log-ratio space, and demonstrate them on the Iris and digits datasets to show how each feature moves the predicted distribution toward or away from class regions. Because a single composition describes the effect on all classes at once, the explanations are coherent and directional, which addresses a limitation of per-class (one-vs-rest) Shapley explanations. The approach offers a theoretical and practical foundation for interpretable multiclass predictions, although computational cost and dependencies between features remain considerations.
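To make the Aitchison geometry and the isometric log-ratio (ILR) space mentioned above more concrete, here is a minimal sketch in Python with NumPy. It is not the authors' code. The function names (`closure`, `perturb`, `power`, `clr`, `helmert_basis`, `ilr`, `ilr_inv`) are chosen for illustration, and the Helmert-type basis is only one of many valid orthonormal bases; the paper may use a different one.

```python
import numpy as np

def closure(x):
    """Rescale a positive vector so that it sums to 1 (a point on the simplex)."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def perturb(p, q):
    """Aitchison 'addition': element-wise product followed by closure."""
    return closure(np.asarray(p, dtype=float) * np.asarray(q, dtype=float))

def power(p, a):
    """Aitchison 'scalar multiplication': element-wise power followed by closure."""
    return closure(np.asarray(p, dtype=float) ** a)

def clr(p):
    """Centred log-ratio transform: log-probabilities minus their mean."""
    logp = np.log(np.asarray(p, dtype=float))
    return logp - logp.mean(axis=-1, keepdims=True)

def helmert_basis(n):
    """(n-1) x n matrix with orthonormal rows, each orthogonal to the all-ones vector.
    Assumption: a Helmert-type basis; any orthonormal basis of that subspace works."""
    V = np.zeros((n - 1, n))
    for i in range(1, n):
        V[i - 1, :i] = 1.0 / i
        V[i - 1, i] = -1.0
        V[i - 1] *= np.sqrt(i / (i + 1.0))
    return V

def ilr(p):
    """Isometric log-ratio transform: simplex with n parts -> R^(n-1)."""
    p = np.asarray(p, dtype=float)
    return clr(p) @ helmert_basis(p.shape[-1]).T

def ilr_inv(z):
    """Inverse ILR transform: R^(n-1) -> simplex with n parts."""
    z = np.asarray(z, dtype=float)
    return closure(np.exp(z @ helmert_basis(z.shape[-1] + 1)))

if __name__ == "__main__":
    p = np.array([0.2, 0.3, 0.5])   # a 3-class prediction
    q = np.array([0.6, 0.3, 0.1])   # a composition that shifts mass toward class 0
    print("ilr(p)          =", ilr(p))
    print("ilr(p (+) q)    =", ilr(perturb(p, q)))
    print("ilr(p) + ilr(q) =", ilr(p) + ilr(q))
```

In ILR coordinates, perturbation becomes ordinary vector addition and powering becomes scalar multiplication, which is why a three-class prediction and the contributions that shift it can be drawn as points and arrows in a plane.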
Abstract
Originating in game theory, Shapley values are widely used for explaining a machine learning model's prediction by quantifying the contribution of each feature's value to the prediction. This requires a scalar prediction as in binary classification, whereas a multiclass probabilistic prediction is a discrete probability distribution, living on a multidimensional simplex. In such a multiclass setting the Shapley values are typically computed separately on each class in a one-vs-rest manner, ignoring the compositional nature of the output distribution. In this paper, we introduce Shapley compositions as a well-founded way to properly explain a multiclass probabilistic prediction, using the Aitchison geometry from compositional data analysis. We prove that the Shapley composition is the unique quantity satisfying linearity, symmetry and efficiency on the Aitchison simplex, extending the corresponding axiomatic properties of the standard Shapley value. We demonstrate this proper multiclass treatment in a range of scenarios.
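As a rough illustration of the idea in the abstract, the sketch below computes Shapley compositions by exact enumeration of feature coalitions for a toy three-class softmax model, then checks the efficiency property numerically. Several parts are assumptions made for illustration and are not taken from the paper: the toy weights `W`, the input `x`, the `baseline` vector, and the value function `value_fn`, which replaces absent features by a fixed baseline instead of taking an expectation over data. The standard Shapley formula is applied in centred log-ratio (clr) coordinates, where Aitchison perturbation and powering become ordinary addition and scaling, and the result is mapped back to the simplex.

```python
from itertools import combinations
from math import factorial

import numpy as np

def closure(x):
    """Rescale a positive vector so that it sums to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def clr(p):
    """Centred log-ratio transform (linear with respect to Aitchison operations)."""
    logp = np.log(np.asarray(p, dtype=float))
    return logp - logp.mean(axis=-1, keepdims=True)

def clr_inv(z):
    """Map clr coordinates back to the simplex."""
    return closure(np.exp(z))

def shapley_compositions(value_fn, n_features):
    """Exact Shapley compositions; cost grows as 2**n_features.
    value_fn maps a frozenset of present feature indices to a probability vector."""
    cache = {}

    def v(S):
        # Evaluate each coalition once, in clr coordinates.
        if S not in cache:
            cache[S] = clr(value_fn(S))
        return cache[S]

    n = n_features
    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = np.zeros_like(v(frozenset()))
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for S in combinations(others, k):
                S = frozenset(S)
                phi = phi + weight * (v(S | {i}) - v(S))
        phis.append(clr_inv(phi))
    return np.array(phis)  # one composition (row) per feature

# Hypothetical toy model: 3 classes, 4 features, softmax over linear logits.
W = np.array([[ 1.0, -0.5,  0.3,  0.0],
              [-0.2,  0.8, -0.4,  0.5],
              [ 0.1,  0.1,  0.9, -0.7]])
x = np.array([1.5, -0.3, 0.8, 2.0])   # instance to explain
baseline = np.zeros(4)                 # assumed reference values for absent features

def value_fn(S):
    mask = np.array([j in S for j in range(len(x))])
    logits = W @ np.where(mask, x, baseline)
    return closure(np.exp(logits - logits.max()))

phis = shapley_compositions(value_fn, n_features=4)
base = value_fn(frozenset())            # prediction with no feature present
full = value_fn(frozenset(range(4)))    # prediction with all features present

# Efficiency: perturbing the base prediction by every Shapley composition
# recovers the full prediction.
recomposed = closure(base * np.prod(phis, axis=0))
print("full prediction:", full)
print("recomposed     :", recomposed)
```

Each row of `phis` is itself a probability vector: a row close to uniform means the feature barely moves the prediction, while a row concentrated on one class pushes the prediction toward that class. Exact enumeration is only practical for a handful of features, which is one reason an estimation algorithm is needed in practice.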
