Identifying Drivers of Predictive Aleatoric Uncertainty

Pascal Iversen; Simon Witzke; Katharina Baum; Bernhard Y. Renard

TL;DR

This paper addresses the explanation of predictive aleatoric uncertainty by extending a regression model to output a Gaussian predictive distribution $y|\mathbf{x} \sim \mathcal{N}(\hat{\mu}_{\mathbf{x}}, \hat{\sigma}^2_{\mathbf{x}})$ and applying post-hoc explainers to the variance output to identify the drivers of irreducible noise. It introduces several flavors of Variance Feature Attribution (VFA), e.g., VFA-SHAP, VFA-IG, VFA-LRP, and VFA-DeepSHAP, and benchmarks them against CLUE and InfoSHAP on synthetic and real tabular data and on the image dataset MNIST+U, using XAI metrics adapted to uncertainty explanations. The evaluation shows that VFA methods, particularly VFA-SHAP, generally outperform the alternatives in identifying uncertainty drivers, and the MNIST+U results confirm that variance-focused attributions align with ground-truth uncertainty masks. The work provides a scalable, minimally invasive approach to explaining uncertainty in regression, together with a benchmark suite and evaluation protocol for uncertainty explanation research, with potential impact on risk-aware decision-making in real-world AI systems.

Abstract

Explainability and uncertainty quantification are key to trustworthy artificial intelligence. However, the reasoning behind uncertainty estimates is generally left unexplained. Identifying the drivers of uncertainty complements explanations of point predictions in recognizing model limitations and enhancing transparent decision-making. So far, explanations of uncertainties have rarely been studied. The few exceptions rely on Bayesian neural networks or technically intricate approaches, such as auxiliary generative models, thereby hindering their broad adoption. We propose a straightforward approach to explain predictive aleatoric uncertainties. We estimate uncertainty in regression as predictive variance by adapting a neural network with a Gaussian output distribution. Subsequently, we apply out-of-the-box explainers to the model's variance output. This approach can explain uncertainty influences more reliably than complex published approaches, which we demonstrate in a synthetic setting with a known data-generating process. We substantiate our findings with a nuanced, quantitative benchmark including synthetic and real, tabular and image datasets. For this, we adapt metrics from conventional XAI research to uncertainty explanations. Overall, the proposed method explains uncertainty estimates with few modifications to the model architecture and outperforms more intricate methods in most settings.
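
To make the abstract's idea of "adapting a neural network with a Gaussian output distribution" concrete, the following is a minimal sketch, not the authors' implementation. It widens a pre-trained scalar output layer to two outputs (mean and variance) and evaluates the Gaussian negative log-likelihood. The helper names (`extend_head`, `predict`, `gaussian_nll`), the softplus link for the variance, and the zero initialization of the new variance column are all assumptions made for illustration.

```python
import math


def softplus(z):
    # Numerically stable softplus; keeps the variance output strictly positive.
    # Using softplus as the variance link is an assumption of this sketch.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def extend_head(w_old, b_old, var_bias=0.0):
    """Turn a d x 1 output layer into a d x 2 one (hypothetical helper).

    Column 0 keeps the pre-trained mean weights unchanged; column 1 is a
    new variance column, zero-initialized here (an assumption).
    """
    w_new = [[w, 0.0] for w in w_old]
    b_new = [b_old, var_bias]
    return w_new, b_new


def predict(h, w_new, b_new):
    # h is the last hidden representation (length d).
    mu = sum(hi * w[0] for hi, w in zip(h, w_new)) + b_new[0]
    raw_var = sum(hi * w[1] for hi, w in zip(h, w_new)) + b_new[1]
    return mu, softplus(raw_var)


def gaussian_nll(y, mu, var):
    # Negative log-likelihood of y under N(mu, var).
    return 0.5 * (math.log(2.0 * math.pi * var) + (y - mu) ** 2 / var)


# Usage: the mean prediction of the original head is preserved.
w_new, b_new = extend_head([1.0, 2.0], 0.5)
mu, var = predict([1.0, 1.0], w_new, b_new)
loss = gaussian_nll(3.0, mu, var)
```

Once the model exposes the variance as a second output, any feature attribution method that explains a scalar model output can be pointed at that output instead of the mean, which is the step the abstract describes as applying out-of-the-box explainers.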

Paper Structure (16 sections, 3 equations, 6 figures, 2 tables)

This paper contains 16 sections, 3 equations, 6 figures, 2 tables.

Introduction
Related Work
Methods
Deep Heteroscedastic Regression and Extension of Pre-trained Models
Post-hoc Explanation of Predictive Variance
Uncertainty Explanation Evaluation Metrics
Benchmark on Tabular Data
Synthetic Data Generation
Tabular Real World Datasets
Tabular Benchmarking Setup
Benchmark on Image Data: MNIST+U
Results
Benchmarking the Detection of Uncertainty Drivers using Synthetic Datasets
Local Accuracies, Faithfulness, and Robustness
Benchmark on MNIST+U Image Data
...and 1 more section

Figures (6)

Figure 1: Overview of the variance feature attribution pipeline. (A) A point prediction model with an output layer with weight matrix ${\bm{W}}_{old} \in \mathbb{R}^{d\times1}$ and a scalar bias. We equip this model with a Gaussian distribution resulting in (B), a model with output weight matrix ${\bm{W}}_{new} \in \mathbb{R}^{d\times2}$ and bias ${\bm{b}}_{new} \in \mathbb{R}^2$. The two outputs are the mean $\hat{\mu}$ and the variance $\hat{\sigma}^2$ of the predictive distribution. (C) From there, we can explain the variance using any suitable explainability method, resulting in attributions to the input features that can be used to understand the drivers of the model's aleatoric uncertainty.
Figure 2: Explanations for uncertainty and mean predictions for the synthetic dataset using VFA-SHAP. We display SHAP summaries for the 10 most important features of (A) model uncertainty or (B) mean prediction ordered by the mean of their absolute estimated Shapley values. VFA-SHAP identifies all noise features driving the model's aleatoric uncertainty. Explaining the mean output offers complementary information but does not detect uncertainty features.
Figure 3: Top 15 global importance features with GRA and GMA for each uncertainty explainer. First column: From 1,500 test samples, we explain the 200 instances with the highest predicted uncertainty. VFA flavors highlight the ground truth noise features (red), while InfoSHAP and CLUE are less accurate. Second and third columns: For 200 random or low uncertainty instances, VFA remains accurate, while CLUE becomes unreliable. InfoSHAP maintains adequate performance but consistently detects only three noise features.
Figure 4: Local Lipschitz continuity estimates for 200 randomly chosen test set instances for all methods and datasets. Lower values indicate higher robustness. VFA-SHAP and VFA-IG have the lowest median Lipschitz estimates for most datasets and are generally the more robust explainers.
Figure 5: RMA and RRA for each uncertainty explainer and LRP mean explanations. We compare the attribution of pixels in the ground truth mask of the mean or the noise. We expect most of the relevance to be contained in the uncertainty mask. We show the mean and standard deviation over all samples in the test set. (*) Note: for CLUE, we only use 40% of the test samples due to its high runtime.
...and 1 more figure
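
The mask-based scores named in the Figure 5 caption (RMA and RRA) can be illustrated with a short sketch. It assumes the common definitions from the XAI evaluation literature: relevance mass accuracy as the share of total attribution mass that falls inside the ground-truth mask, and relevance rank accuracy as the share of the top-K attributed pixels that lie inside the mask, where K is the mask size. Taking attributions in absolute value and the function names are assumptions of this sketch, and the paper's exact protocol may differ.

```python
def relevance_mass_accuracy(attr, mask):
    # Share of the total (absolute) attribution mass that lies inside the mask.
    mags = [abs(a) for a in attr]
    total = sum(mags)
    if total == 0.0:
        return 0.0
    inside = sum(m for m, in_mask in zip(mags, mask) if in_mask)
    return inside / total


def relevance_rank_accuracy(attr, mask):
    # Share of the K highest-attributed positions that lie inside the mask,
    # where K is the number of positions in the mask.
    k = sum(1 for in_mask in mask if in_mask)
    if k == 0:
        return 0.0
    order = sorted(range(len(attr)), key=lambda i: abs(attr[i]), reverse=True)
    hits = sum(1 for i in order[:k] if mask[i])
    return hits / k


# Usage on a flattened toy "image" with four pixels, two of them in the mask.
attr = [0.5, -0.3, 0.1, 0.1]
mask = [True, False, True, False]
rma = relevance_mass_accuracy(attr, mask)
rra = relevance_rank_accuracy(attr, mask)
```

Under this reading, an uncertainty explainer that works well on MNIST+U should score high when the mask marks the noise-generating pixels and lower when it marks the pixels that determine the mean.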
