Explainability as statistical inference
Hugo Henri Joseph Senetaire, Damien Garreau, Jes Frellsen, Pierre-Alexandre Mattei
TL;DR
This work reframes model explainability as statistical inference by introducing LEX, a modular probabilistic framework with three parts. It jointly models a predictor $p_\theta$, a stochastic feature selector $p_\gamma(z \mid x)$ that produces an interpretable, instance-wise feature mask $z$, and an imputation model $p_\iota$ that fills in the features the mask discards. Predictions are obtained by marginalizing over masks and imputations. The parameters are fit by maximizing the likelihood $\mathcal{L}(\theta,\gamma)$, with a regularizer that controls sparsity, so both the In-Situ and Post-Hoc scenarios are covered by a single theory. The authors show that several existing methods (e.g., L2X, INVASE, REAL-X) are special cases of this framework. They also introduce datasets with ground-truth feature selections for evaluating feature importance maps, on which multiple imputation yields more plausible explanations than constant imputation. Across synthetic, Switching Panels, and CelebA Smile experiments, LEX with multiple imputation achieves competitive predictive accuracy while producing sharper, more reliable feature selections. This suggests a practical path toward robust, uncertainty-aware explanations across data modalities.
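Written out, the model summarized above can be sketched as follows. Several pieces of notation are ours rather than the source's: $x_z$ (the features kept by the mask $z$), $\tilde{x}$ (the imputed input), the penalty weight $\lambda$ and the sparsity regularizer $R(\gamma)$. The particular choice $R(\gamma) = \sum_{n} \mathbb{E}_{p_\gamma(z \mid x_n)} \lVert z \rVert_1$ is only an illustrative assumption:

$$
p_{\theta,\gamma}(y \mid x) = \mathbb{E}_{z \sim p_\gamma(z \mid x)} \, \mathbb{E}_{\tilde{x} \sim p_\iota(\tilde{x} \mid x_z)} \big[ p_\theta(y \mid \tilde{x}) \big],
\qquad
\mathcal{L}(\theta,\gamma) = \sum_{n=1}^{N} \log p_{\theta,\gamma}(y_n \mid x_n),
\qquad
\max_{\theta,\gamma} \; \mathcal{L}(\theta,\gamma) - \lambda \, R(\gamma).
$$

Under this reading, the In-Situ scenario optimizes $\theta$ and $\gamma$ jointly. The Post-Hoc scenario keeps a pretrained predictor $p_\theta$ fixed and optimizes only $\gamma$.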
Abstract
A wide variety of model explanation approaches have been proposed in recent years, all guided by very different rationales and heuristics. In this paper, we take a new route and cast interpretability as a statistical inference problem. We propose a general deep probabilistic model designed to produce interpretable predictions. The model parameters can be learned via maximum likelihood, and the method can be adapted to any predictor network architecture and any type of prediction problem. Our method belongs to the family of amortized interpretability models, in which a neural network is used as a selector so that interpretations can be computed quickly at inference time. Several popular interpretability methods are shown to be particular cases of regularised maximum likelihood for our general model. We propose new datasets with ground-truth feature selections, which allow feature importance maps to be evaluated. Using these datasets, we show experimentally that multiple imputation provides more reasonable interpretations than constant imputation.
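A minimal NumPy sketch can illustrate the maximum-likelihood objective and the role of imputation described above. It computes a Monte Carlo estimate of $\log p(y \mid x)$ for a single instance. Everything in it is a hypothetical stand-in, not the authors' code:

- The fixed logistic-regression `predictor` plays the role of $p_\theta$.
- A vector of Bernoulli probabilities `selector_probs` plays the role of the selector output $p_\gamma(z \mid x)$.
- `constant_imputer` fills unselected features with zeros.
- `gaussian_imputer` draws unselected features from an assumed standard-normal marginal, as a crude stand-in for a learned multiple-imputation model $p_\iota$.

```python
import numpy as np

# Hypothetical predictor p_theta: a fixed logistic-regression classifier
# returning class probabilities [p(y=0|x), p(y=1|x)].
W = np.array([2.0, -1.0, 0.5])

def predictor(x):
    p1 = 1.0 / (1.0 + np.exp(-(x @ W)))
    return np.array([1.0 - p1, p1])

# Constant imputation: every unselected feature is replaced by 0.
# (The mask z is part of the assumed imputer interface but unused here.)
def constant_imputer(x, z, rng):
    return np.zeros_like(x)

# Simplified multiple imputation (assumption): unselected features are drawn
# from a standard-normal marginal, with a fresh draw on every call.
def gaussian_imputer(x, z, rng):
    return rng.standard_normal(x.shape)

def lex_log_likelihood(x, y, selector_probs, imputer,
                       n_masks=16, n_imputations=4, rng=None):
    """Monte Carlo estimate of log E_z E_{x_tilde}[p_theta(y | x_tilde)]."""
    # A fixed seed by default keeps the sketch reproducible.
    rng = np.random.default_rng(0) if rng is None else rng
    likelihoods = []
    for _ in range(n_masks):
        # Sample an instance-wise mask z ~ Bernoulli(selector_probs); True = keep.
        z = rng.random(x.shape) < selector_probs
        for _ in range(n_imputations):
            # Keep the selected features and fill the rest from the imputer.
            x_tilde = np.where(z, x, imputer(x, z, rng))
            likelihoods.append(predictor(x_tilde)[y])
    # Log of the Monte Carlo average over masks and imputations.
    return float(np.log(np.mean(likelihoods)))

x = np.array([1.0, 0.5, -2.0])
mask_probs = np.array([0.9, 0.1, 0.1])  # hypothetical selector output
print(lex_log_likelihood(x, 1, mask_probs, constant_imputer))
print(lex_log_likelihood(x, 1, mask_probs, gaussian_imputer))
```

By Jensen's inequality, the log of a Monte Carlo average is biased downward in expectation. Averaging over more masks and imputations inside the log reduces that bias. Two edge cases help check the sketch:

- With `selector_probs` set to all ones, every feature is kept and the estimate reduces to $\log p_\theta(y \mid x)$.
- With all zeros and constant imputation, it reduces to the prediction on the all-zero input.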
