Uncovering the Structure of Explanation Quality with Spectral Analysis

Johannes Maeß; Grégoire Montavon; Shinichi Nakajima; Klaus-Robert Müller; Thomas Schnake

TL;DR

The paper introduces a spectral-analysis framework for explanation quality in Explainable AI: attributions are encoded into a redistribution matrix $R_{\cdot|\cdot}$, whose singular values are analyzed to separate stability from target sensitivity. It defines the Stability-Sensitivity Metric $SSM = \frac{1}{\sigma_1} \cdot \| (\sigma_k)_{k=1}^K \|_2$ and shows that common metrics such as pixel-flipping and entropy partly reflect these two factors, with validation on MNIST and ImageNet using methods including LRP, SmoothGrad, IG, and Shapley. Through qualitative and quantitative analyses, the work shows how hyperparameters such as $\gamma$ and smoothing can move explanations toward a sweet spot that is both stable and discriminative, and uses spectral decomposition to interpret heatmaps. Overall, the framework provides a theoretical lens and practical guidance for designing more reliable XAI evaluations and explanation techniques.
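As a rough illustration of the SSM formula above, the following sketch computes the metric from the singular values of a given matrix. It is not the authors' implementation: how $R_{\cdot|\cdot}$ is built from attributions is not specified here, so the function simply accepts any 2-D array. The function name `stability_sensitivity_metric` and the choice of $K$ as the full number of singular values are assumptions made for this example.

```python
import numpy as np


def stability_sensitivity_metric(R):
    """Return ||(sigma_k)_{k=1..K}||_2 / sigma_1 for a 2-D array R.

    Assumption: K is the full number of singular values, min(R.shape).
    """
    R = np.asarray(R, dtype=float)
    # Singular values are returned in descending order, so sigma[0] is sigma_1.
    sigma = np.linalg.svd(R, compute_uv=False)
    if sigma[0] == 0.0:
        raise ValueError("SSM is undefined for an all-zero matrix")
    return float(np.linalg.norm(sigma) / sigma[0])


# Toy inputs (hypothetical, not taken from the paper):
# a rank-one matrix has a single non-zero singular value,
# while the identity has K equal singular values.
rank_one = np.outer(np.ones(4), np.ones(3))
identity = np.eye(4)

ssm_rank_one = stability_sensitivity_metric(rank_one)
ssm_identity = stability_sensitivity_metric(identity)
print(ssm_rank_one, ssm_identity)
```

By construction the value lies between 1 (only one non-zero singular value) and $\sqrt{K}$ (all singular values equal). Since the Euclidean norm of the singular values equals the Frobenius norm of the matrix, the same quantity can also be read as the ratio of the Frobenius norm to the spectral norm.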

Abstract

As machine learning models are increasingly considered for high-stakes domains, effective explanation methods are crucial to ensure that their prediction strategies are transparent to the user. Over the years, numerous metrics have been proposed to assess the quality of explanations. However, their practical applicability remains unclear, in particular due to a limited understanding of which specific aspects each metric rewards. In this paper, we propose a new framework based on spectral analysis of explanation outcomes to systematically capture the multifaceted properties of different explanation techniques. Our analysis uncovers two distinct factors of explanation quality, stability and target sensitivity, that can be directly observed through spectral decomposition. Experiments on both MNIST and ImageNet show that popular evaluation techniques (e.g., pixel-flipping, entropy) partially capture the trade-offs between these factors. Overall, our framework provides a foundational basis for understanding explanation quality, guiding the development of more reliable techniques for evaluating explanations.
