Explainability as statistical inference

Hugo Henri Joseph Senetaire, Damien Garreau, Jes Frellsen, Pierre-Alexandre Mattei

TL;DR

This work reframes model explainability as statistical inference by introducing LEX, a modular probabilistic framework that jointly models a predictor $p_\theta$, a stochastic feature selector $p_\gamma(z|x)$, and an imputation model $p_\iota$, yielding an interpretable, instance-wise feature mask. Explanations are obtained by marginalizing over masks and imputations, with a maximum-likelihood objective $\mathcal{L}(\theta,\gamma)$ and regularization to control sparsity, enabling both In-Situ and Post-Hoc scenarios within a unified theory. The authors demonstrate that several existing methods (e.g., L2X, Invase, REAL-X) are special cases of this framework, and they introduce ground-truth datasets for evaluating feature importance maps, showing that multiple imputation leads to more plausible explanations than constant imputations. Across synthetic, Switching Panels, and CelebA Smile experiments, LEX with multiple imputation achieves competitive predictive accuracy while yielding sharper, more reliable feature selections, suggesting a practical path toward robust, uncertainty-aware explanations in diverse modalities.
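
As a reading aid, the marginalization described above can be written out as follows. This is a sketch under assumed notation: the dataset $(x_n, y_n)_{n=1}^{N}$ and the conditioning of $p_\iota$ on both the input and the mask are assumptions taken from the Figure 1 caption, and the sparsity regularizer is left out because its exact form is not given in this summary.

```latex
\mathcal{L}(\theta,\gamma)
  = \sum_{n=1}^{N} \log \,
    \mathbb{E}_{z \sim p_\gamma(\cdot \mid x_n)} \,
    \mathbb{E}_{\tilde{x} \sim p_\iota(\cdot \mid x_n, z)}
    \left[ p_\theta(y_n \mid \tilde{x}) \right]
```

The outer expectation averages over masks drawn from the selector, and the inner expectation averages over imputations of the features that the mask removes.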

Abstract

A wide variety of model explanation approaches have been proposed in recent years, all guided by very different rationales and heuristics. In this paper, we take a new route and cast interpretability as a statistical inference problem. We propose a general deep probabilistic model designed to produce interpretable predictions. The model parameters can be learned via maximum likelihood, and the method can be adapted to any predictor network architecture and any type of prediction problem. Our method is a case of amortized interpretability models, where a neural network is used as a selector to allow for fast interpretation at inference time. Several popular interpretability methods are shown to be particular cases of regularised maximum likelihood for our general model. We propose new datasets with ground-truth selection, which allow for the evaluation of the feature importance map. Using these datasets, we show experimentally that using multiple imputation provides more reasonable interpretations.

Paper Structure (55 sections, 33 equations, 23 figures)


Figures (23)

  • Figure 1: The LEX pipeline allows us to transform any prediction model into an explainable one. In supervised learning, a standard approach uses a function $f_\theta$ (usually a neural network) to parameterize a prediction distribution $p_\theta$. In that framework, we would feed the input data directly to the neural network $f_\theta$. Within the LEX pipeline, we obtain a distribution of masks $p_\gamma$ parameterized by a neural network $g_\gamma$ from the input data. Samples from this mask distribution are applied to the original image $x$ to produce incomplete samples $x_z$. We implicitly remove features by sampling imputed samples $\tilde{x}$ given the masked image using a generative model $p_{\iota}$ conditioned on both the mask and the original image. These samples are then fed to a classifier $f_\theta$ to obtain a prediction. As opposed to previous methods, multiple imputation allows us to minimise the encoding happening in the mask and to get a more faithful selection.
  • Figure 2: Left panel: graphical model of a standard predictive model. We propose to embed this model in a latent explainer model using the construction of the right panel.
  • Figure 3: Performance of LEX with different imputations. $0$ imputation (solid orange line) corresponds to the imputation method of Invase/L2X; surrogate $0$ imputation (blue dashed line) is the imputation method of REAL-X. The standard Gaussian (green dotted line) is the true conditional imputation method from the model. We also conducted experiments on self-explainable neural networks (SENN), shown as a dark green solid line. The reported accuracy is obtained using all features. Columns correspond to the three synthetic datasets (S1, S2, S3) and rows correspond to the different measures of model quality (Accuracy, FDR, TPR). We report the mean and the standard deviation over $5$ folds/generated datasets.
  • Figure 4: Performance of LEX (mean/std over 5 datasets) with varying constant imputation (orange solid line) and surrogate constant imputation (blue dashed line) on S3 using the true selection rate. Though Invase/L2X (resp. REAL-X) uses constant imputation (resp. surrogate constant imputation), all of these methods used only 0 as the constant.
  • Figure 5: Performance of LEX on the Switching Panel MNIST dataset with different imputations. Surrogate $0$ imputation corresponds to the parametrization of REAL-X; $0$ imputation corresponds to the parametrization of Invase/L2X. We report the mean and standard deviation over $5$ folds/generated datasets. We also report results on Post-Hoc methods (LIME, SHAP, FastSHAP) and self-explainable neural networks (SENN). For the Post-Hoc methods, the predictor trained without any selection module and with full data has an accuracy of 0.97 on average over the 5 folds/generated datasets, which is similar to the result obtained with the method trained In-Situ.
  • ...and 18 more figures
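
The pipeline described in the Figure 1 caption (selector, mask sampling, imputation, predictor) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `lex_predict`, the linear selector and predictor weights, the independent Bernoulli mask distribution, and the independent Gaussian imputation are all hypothetical choices made for the example.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def softmax(a):
    a = a - a.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=-1, keepdims=True)


def lex_predict(x, selector_w, predictor_w, n_masks=8, n_imputations=4,
                noise_std=1.0, rng=None):
    """Monte Carlo estimate of p(y | x) for a single input x of shape (d,).

    Assumptions (hypothetical, for illustration only):
      - the selector is linear and outputs independent Bernoulli probabilities;
      - unselected features are imputed from N(0, noise_std^2), independently;
      - the predictor is a linear softmax classifier.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Selector: per-feature probability of keeping each feature.
    pi = sigmoid(x @ selector_w)                      # shape (d,)
    probs = np.zeros(predictor_w.shape[1])
    for _ in range(n_masks):
        z = rng.random(x.shape) < pi                  # sample a binary mask
        for _ in range(n_imputations):
            # Multiple imputation: keep selected features, resample the rest.
            filler = rng.normal(0.0, noise_std, size=x.shape)
            x_tilde = np.where(z, x, filler)
            probs += softmax(x_tilde @ predictor_w)   # predictor on imputed input
    # Average over masks and imputations to approximate the marginal.
    return probs / (n_masks * n_imputations), pi


rng = np.random.default_rng(0)
d, k = 5, 3
x = rng.normal(size=d)
p_y, pi = lex_predict(x, rng.normal(size=(d, d)), rng.normal(size=(d, k)), rng=rng)
print("class probabilities:", p_y)
print("selection probabilities:", pi)
```

Setting `n_imputations=1` and replacing `filler` with a vector of zeros would give the constant $0$ imputation variant that the captions associate with Invase/L2X. The selection probabilities `pi` play the role of the feature importance map.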