Uncovering the Structure of Explanation Quality with Spectral Analysis

Johannes Maeß, Grégoire Montavon, Shinichi Nakajima, Klaus-Robert Müller, Thomas Schnake

TL;DR

The paper introduces a spectral-analysis framework for explanation quality in Explainable AI (XAI). Attributions are encoded in a redistribution matrix $R_{\cdot|\cdot}$, and its singular values $(\sigma_k)_{k=1}^K$ are analyzed to separate stability from target sensitivity. The paper defines the Stability-Sensitivity Metric $\mathrm{SSM} = \frac{1}{\sigma_1} \cdot \| (\sigma_k)_{k=1}^K \|_2$ and shows that common metrics such as pixel-flipping and entropy partly reflect these two factors. The results are validated on MNIST and ImageNet with methods including LRP, SmoothGrad, IG, and Shapley. Qualitative and quantitative analyses show how hyperparameters such as LRP's $\gamma$ and the amount of smoothing can move explanations toward a sweet spot that is both stable and discriminative, and how spectral decomposition can be used to interpret heatmaps. Overall, the framework provides a theoretical lens and practical guidance for designing more reliable XAI evaluations and explanation techniques.
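As a minimal sketch of the SSM formula above, the snippet below computes $\frac{1}{\sigma_1} \| (\sigma_k)_{k=1}^K \|_2$ with NumPy for an arbitrary matrix. The function name `ssm` and the toy inputs are illustrative assumptions; how the paper actually constructs $R_{\cdot|\cdot}$ from attributions is not reproduced here.

```python
import numpy as np


def ssm(R: np.ndarray) -> float:
    """Return ||(sigma_k)_k||_2 / sigma_1 for the singular values of R.

    R stands in for a redistribution matrix; its construction from
    attributions is an assumption left outside this sketch.
    """
    # np.linalg.svd returns singular values sorted in descending order,
    # so sigma[0] is the largest singular value sigma_1.
    sigma = np.linalg.svd(R, compute_uv=False)
    if sigma[0] == 0.0:
        raise ValueError("SSM is undefined for an all-zero matrix")
    return float(np.linalg.norm(sigma) / sigma[0])


# Toy inputs (hypothetical, not taken from the paper).
rank_one = np.outer([1.0, 2.0], [3.0, 4.0])  # a single non-zero singular value
identity = np.eye(4)                         # four equal singular values

print(ssm(rank_one))
print(ssm(identity))
```

Because the 2-norm of the singular values equals the Frobenius norm of the matrix and $\sigma_1$ is its spectral norm, this quantity lies between 1 (a rank-one matrix) and $\sqrt{K}$ (all $K$ singular values equal).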

Abstract

As machine learning models are increasingly considered for high-stakes domains, effective explanation methods are crucial to ensure that their prediction strategies are transparent to the user. Over the years, numerous metrics have been proposed to assess the quality of explanations. However, their practical applicability remains unclear, in particular due to a limited understanding of which specific aspects each metric rewards. In this paper, we propose a new framework based on spectral analysis of explanation outcomes to systematically capture the multifaceted properties of different explanation techniques. Our analysis uncovers two distinct factors of explanation quality, stability and target sensitivity, that can be directly observed through spectral decomposition. Experiments on both MNIST and ImageNet show that popular evaluation techniques (e.g., pixel-flipping, entropy) partially capture the trade-offs between these factors. Overall, our framework provides a foundational basis for understanding explanation quality, guiding the development of more reliable techniques for evaluating explanations.

Paper Structure

This paper contains 24 sections, 15 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Cartoon depiction of the two factors of explanation quality that can be derived from our spectral analysis and the SSM metric (Eq. \ref{eq:cartoon}) that aggregates them. We postulate the existence of a 'sweet spot' where both explanation stability and sensitivity can be achieved. This can be reached by a subtle adjustment of explanation parameters such as LRP's $\gamma$ and SmoothGrad's standard deviation (as later shown in Figures 3 and 4).
  • Figure 2: Examples of explanations produced by the LRP explanation technique using the rule LRP-$\gamma$ with different values of the parameter $\gamma$. An increase in $\gamma$ is associated with an increase in explanation stability (visible here as the vanishing noise pattern in the explanation). On the other hand, choosing too large a value for $\gamma$ results in a decrease in target sensitivity.
  • Figure 3: Quantitative analysis of evaluation metrics for explanation methods on an ImageNet-trained model using 100 images. The top row shows exemplary heat maps for the class 'sulphur-crested cockatoo', one for each method and parameter choice marked by a dashed vertical line in the main plot. Below, we present evaluation metrics (top to bottom): stability & sensitivity (ours), SSM (ours), PF-AUC, and entropy. Explanation methods (left to right) include LRP, SmoothGrad, and IG. For LRP, results for 11 $\gamma$ values are shown, and for SmoothGrad, 4 different noise levels. PF-AUCs are calculated after deleting 5% of the image. Thick lines indicate the median, with shaded areas showing variability: 5% for SSM and PF-AUC, 25% for stability, sensitivity, and entropy. The star indicates where explanation quality under the given metric is maximized.
  • Figure 4: Quantitative analysis of evaluation metrics for explanation methods on an MNIST-trained model using 100 images. The top row shows exemplary heat maps for the class '8', one for each method and parameter choice marked by a dashed vertical line in the main plot. The structure of this figure follows Figure 3, with one difference: the rightmost column includes results for the Shapley methods. The LRP curve was obtained using 80 different $\gamma$ parameters.
  • Figure 5: LRP explanations for the class 'paddle' decomposed into contributions of different singular values (cf. Eq. \ref{eq:expanded}). Heat maps visualize how bins of singular values, namely the ranges $(1,1)$, $(2,10)$, $(11,100)$, and $(101,1000)$, contribute, with the norm of each partial result (as a percentage of the norm of the full heat map) given in brackets. The top row of heat maps corresponds to LRP with $\gamma=0.04$ and the bottom row to $\gamma=0.11$. The plots on the right show how the norm of the heat map grows when it is computed from approximations of $R_{\cdot| \cdot}$ of increasing rank $k$: $\|\sum_{i=1}^{k} \mathcal{E}(y; \sigma_i)\|_2 \cdot \| \mathcal{E}(y) \|_2^{-1}$. Depending on the choice of $\gamma$, small singular values contribute little to heat maps and their norm, indicating that explanations are sensitive to only a small number of patterns in the data.
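
The norm ratio in the Figure 5 caption can be illustrated with a short sketch. It rests on an assumption that is not confirmed by this summary: that the heat map is the matrix-vector product $\mathcal{E}(y) = R_{\cdot|\cdot}\, y$ for some target vector $y$, so that the contribution of the $i$-th singular value is $\mathcal{E}(y; \sigma_i) = \sigma_i u_i (v_i^\top y)$. The paper's Eq. \ref{eq:expanded} may define these terms differently; the function name and toy data are likewise hypothetical.

```python
import numpy as np


def cumulative_norm_fraction(R: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Norm of the rank-k partial heat map relative to the full one, for k = 1..K.

    ASSUMPTION: the heat map is E(y) = R @ y, and the i-th spectral
    component is sigma_i * u_i * (v_i^T y). This is an illustrative
    reading, not necessarily the paper's definition.
    """
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    coeffs = s * (Vt @ y)                    # sigma_i * (v_i^T y), shape (K,)
    components = U * coeffs                  # column i holds the i-th component
    partial = np.cumsum(components, axis=1)  # column k-1 holds the rank-k sum
    full_norm = np.linalg.norm(R @ y)
    return np.linalg.norm(partial, axis=0) / full_norm


# Toy example (hypothetical data): a random matrix with decaying spectrum.
rng = np.random.default_rng(0)
R = rng.normal(size=(50, 10)) @ np.diag(0.5 ** np.arange(10))
y = rng.normal(size=10)
fractions = cumulative_norm_fraction(R, y)
print(fractions[0], fractions[-1])  # the last entry is 1 up to rounding
```

Since the left singular vectors are orthonormal, the fractions are non-decreasing in $k$ under this assumption; a curve that saturates early corresponds to the caption's observation that small singular values contribute little to the heat map.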