Explaining Bayesian Neural Networks

Kirill Bykov, Marina M.-C. Höhne, Adelaida Creosteanu, Klaus-Robert Müller, Frederick Klauschen, Shinichi Nakajima, Marius Kloft

TL;DR

This paper addresses the gap in Explainable AI for Bayesian Neural Networks by treating local explanations as distributions induced by the posterior over weights, $p(W|\mathcal{D}_{tr})$, and sampling multiple explanation maps to quantify uncertainty in explanations. It introduces a method-agnostic framework (UAI) that computes mean explanations and constructs Union/Intersection aggregations, plus an uncertainty-aware variant, UAI$^+$, and clustering to reveal multi-modal explanation strategies. A key theoretical result shows that, for linear attribution operators, the explanation of the predictive mean equals the mean of explanations, so the average behavior can be summarized efficiently. Empirical results on Custom MNIST (CMNIST), ImageNet, and a pathology use case demonstrate that incorporating explanation uncertainty improves interpretability, highlights diverse reasoning modes, and aids in detecting spurious cues (Clever Hans). The work is limited by the chosen posterior approximation and by the lack of dedicated metrics for explanation distributions. Overall, the framework provides a practical, scalable path to uncertainty-aware XAI, with potential impact for safety-critical deployment and more nuanced model debugging in real-world tasks.

Abstract

To advance the transparency of learning machines such as Deep Neural Networks (DNNs), the field of Explainable AI (XAI) was established to provide interpretations of DNNs' predictions. While different explanation techniques exist, a popular approach is given in the form of attribution maps, which illustrate, given a particular data point, the relevant patterns the model has used for making its prediction. Although Bayesian models such as Bayesian Neural Networks (BNNs) have a limited form of transparency built-in through their prior weight distribution, they lack explanations of their predictions for given instances. In this work, we take a step toward combining these two perspectives by examining how local attributions can be extended to BNNs. Within the Bayesian framework, network weights follow a probability distribution; hence, the standard point explanation extends naturally to an explanation distribution. Viewing explanations probabilistically, we aggregate and analyze multiple local attributions drawn from an approximate posterior to explore variability in explanation patterns. The diversity of explanations offers a way to further explore how predictive rationales may vary across posterior samples. Quantitative and qualitative experiments on toy and benchmark data, as well as on a real-world pathology dataset, illustrate that our framework enriches standard explanations with uncertainty information and may support the visualization of explanation stability.

Paper Structure

This paper contains 32 sections, 2 theorems, 18 equations, 8 figures, 2 tables.

Key Result

Lemma 3.1

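For any explanation method that can be formalized as in the explanation-operator equation (eq:ExplanationOperator) with a linear operator $\mathcal{T}_{x, W} = \mathcal{T}_{x}$ that does not depend on $W$, it holds that the explanation of the predictive mean equals the mean of the explanations.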
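
The equality of "explanation of the predictive mean" and "mean of explanations" for a weight-independent linear operator can be checked numerically. The sketch below is only an illustration under stated assumptions: the scalar model `model`, the finite-difference input gradient `input_gradient` (standing in for a linear attribution operator), and the Gaussian weight samples (standing in for posterior draws) are all hypothetical and not taken from the paper.

```python
import math
import random


def model(w, x):
    # Hypothetical toy model f_W(x) = tanh(w . x); not the paper's network.
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)))


def input_gradient(f, x, eps=1e-5):
    # Central finite-difference gradient with respect to the input.
    # The operator is linear in f and does not depend on the weights.
    grad = []
    for i in range(len(x)):
        hi = x[:i] + [x[i] + eps] + x[i + 1:]
        lo = x[:i] + [x[i] - eps] + x[i + 1:]
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad


rng = random.Random(0)
x = [0.3, -1.2, 0.7]
# Assumption: Gaussian draws stand in for samples from an approximate posterior.
weight_samples = [[rng.gauss(0, 1) for _ in x] for _ in range(50)]

# Mean of the per-sample explanations.
per_sample = [input_gradient(lambda z, w=w: model(w, z), x) for w in weight_samples]
mean_of_explanations = [sum(col) / len(col) for col in zip(*per_sample)]


def predictive_mean(z):
    # Average prediction over the sampled weights.
    return sum(model(w, z) for w in weight_samples) / len(weight_samples)


# Explanation of the predictive mean.
explanation_of_mean = input_gradient(predictive_mean, x)

max_gap = max(abs(a - b) for a, b in zip(mean_of_explanations, explanation_of_mean))
print(max_gap)  # differs from zero only by floating-point rounding
```

The same check would not be expected to pass for attribution operators that depend on the weights or are nonlinear in the model output, which is the case the lemma's conditions exclude.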

Figures (8)

  • Figure 1: Illustrating the practical advantages of BNNs over standard DNNs. From left to right: a whole-slide image from a cancer patient, divided into patches with highlighted cancerous regions; DNN prediction; BNN uncertainty estimation; and patches with the highest uncertainty. Unlike a standard DNN, the BNN provides uncertainty estimates for each patch. Blue indicates high certainty, while red indicates low certainty. The rightmost panel shows patches where the model's prediction is uncertain and fluctuates between classes.
  • Figure 2: Schematic illustration of the proposed methods for explaining Bayesian Neural Networks. Given a particular input (here, a cat image), we sample models from the posterior distribution and collect a local explanation from each sampled model. These explanations are then aggregated using the Union and Intersection (UAI) method: the Union explanation gives a global overview of the features learned by the BNN by combining the various modes, whereas the Intersection explanation shows the strategy shared across samples, that is, the features the BNN relies on with the highest certainty. In addition, the local explanations can be clustered to illustrate the different modes of the decision-making strategies.
  • Figure 3: Multi-modality of gradient explanations of a BNN (here, a LeNet network trained with dropout), shown for an example image of class "Trousers" from the Fashion MNIST dataset. The explanations were clustered by the SpRAy algorithm into 7 clusters, labeled at the top. The first row shows the mean explanation for each cluster, with the shape of the trousers overlaid on the explanation. The second row shows the t-SNE visualization of the distribution of explanations, with the points of the respective cluster highlighted. The mean cluster explanations reveal the variability in the decision-making process of the BNN: each mode illustrates one decision-making pattern, and the number of elements in each cluster indicates the importance of that cluster to the prediction.
  • Figure 4: Illustrative explanations of a Bayesian Neural Network. The BNN was trained with Dropout on the Custom MNIST dataset (bykov2021noisegrad). The input, an MNIST digit zero on a random CIFAR background, is shown on the left and was correctly classified as zero by the BNN. Explanations of the BNN decision are given as Intersection ($\alpha = 5$), Average, Union ($\alpha = 95$), and UAI$^+$ explanations using LRP-$\varepsilon$ (first row) and Integrated Gradients (IG) (second row). All explanations correctly assign relevance to the digit. However, the proposed Union and Intersection approach, whose information is bundled in the UAI$^+$ explanation, conveys more about the role of the features in the model prediction by specifying the model's (un)certainty that a feature contributed to the prediction.
  • Figure 5: Illustration of UAI$^+$ method explanations based on the Absolute Gradient explanation method for three different Bayesian scenarios: Deep Ensemble, MC Dropout, and Laplace approximation. From the explanations, we can observe the main features that each of the Bayesian Networks used.
  • ...and 3 more figures
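
The Intersection, Average, and Union aggregations named in the captions of Figures 2 and 4 can be sketched on toy data. This is a minimal illustration, assuming that $\alpha$ denotes a per-feature percentile over the sampled explanations (suggested by $\alpha = 5$ and $\alpha = 95$ in the Figure 4 caption, but not confirmed by this excerpt). The helper names `percentile` and `aggregate`, the nearest-rank percentile rule, and the toy explanation values are hypothetical.

```python
import math


def percentile(values, alpha):
    # Nearest-rank percentile for alpha in (0, 100]; an assumed convention.
    ordered = sorted(values)
    rank = max(1, math.ceil(alpha / 100 * len(ordered)))
    return ordered[rank - 1]


def aggregate(explanations, alpha_low=5, alpha_high=95):
    # explanations: K flattened attribution maps of equal length,
    # one per model sampled from the (approximate) posterior.
    per_feature = list(zip(*explanations))
    return {
        "intersection": [percentile(v, alpha_low) for v in per_feature],
        "average": [sum(v) / len(v) for v in per_feature],
        "union": [percentile(v, alpha_high) for v in per_feature],
    }


# Toy data with three features: relevant in every sample,
# relevant in half of the samples, and never relevant.
explanations = [[1.0, 1.0 if k % 2 == 0 else 0.0, 0.0] for k in range(20)]
summary = aggregate(explanations)

for name, values in summary.items():
    print(name, values)
```

In this toy setting the second feature appears in the Union map but not in the Intersection map, which mirrors the captions' reading of Union as a combination of modes and Intersection as the features used with the highest certainty.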

Theorems & Definitions (4)

  • Definition 2.1: Relevance Attribution operator
  • Lemma 3.1
  • Theorem 3.1
  • proof