A Fresh Look at Sanity Checks for Saliency Maps

Anna Hedström; Leander Weber; Sebastian Lapuschkin; Marina Höhne

TL;DR

This paper reevaluates the Model Parameter Randomisation Test (MPRT) for saliency-map explanations and identifies key methodological weaknesses related to preprocessing, layer-order, and similarity measures. It introduces two enhancements, Smooth MPRT (sMPRT) and Efficient MPRT (eMPRT), which respectively denoise attribution signals via input perturbations and quantify explanation faithfulness through increases in complexity measured by histogram entropy, thereby avoiding biased similarity metrics. Through extensive experiments on ImageNet, MNIST, and fMNIST with multiple architectures and attribution methods, the study demonstrates that both sMPRT and eMPRT improve metric reliability over the original MPRT, though no metric achieves perfect reliability and rankings can vary across tasks. The work provides a practical, publicly available toolkit for robust XAI evaluation and highlights the importance of using multiple, complementary metrics to assess attribution quality in real-world applications.

Abstract

The Model Parameter Randomisation Test (MPRT) is highly recognised in the eXplainable Artificial Intelligence (XAI) community due to its fundamental evaluative criterion: explanations should be sensitive to the parameters of the model they seek to explain. However, recent studies have raised several methodological concerns for the empirical interpretation of MPRT. In response, we propose two modifications to the original test: Smooth MPRT and Efficient MPRT. The former reduces the impact of noise on evaluation outcomes via sampling, while the latter avoids the need for biased similarity measurements by re-interpreting the test through the increase in explanation complexity after full model randomisation. Our experiments show that these modifications enhance the metric reliability, facilitating a more trustworthy deployment of explanation methods.

Paper Structure (35 sections, 3 theorems, 7 equations, 13 figures, 1 table)


Introduction
Preliminaries
Model Parameter Randomisation Test (MPRT)
Methodological Caveats
Methods
Smooth Model Parameter Randomisation Test (sMPRT)
Efficient Model Parameter Randomisation Test (eMPRT)
Experimental Results
Analysing sMPRT
Denoising Attributions with sMPRT.
Effect of the Number of Perturbed Samples.
Analysing eMPRT
Benchmarking Explanation Methods.
Comparing Evaluation Outcomes with MPRT.
Meta-Evaluation
...and 20 more sections

Key Result

Theorem (MPRT)

Let $\Psi_{\tau}^{\text{MPRT}}: \mathcal{E} \times \mathcal{F} \times \mathcal{X} \times \mathcal{Y} \mapsto \mathbb{R}$ be an evaluation function that computes a quality estimate $\hat{q} \in \mathbb{R}$ measuring the similarity between the original explanation ${\bm{e}}_l$ and the explanation of the randomised model, with $\rho: \mathbb{R}^{D} \times \mathbb{R}^{D} \mapsto \mathbb{R}$ as the similarity function.
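The procedure behind this definition can be illustrated with a minimal sketch. It is not the authors' implementation: the toy ReLU network, the plain gradient attribution, the Gaussian re-initialisation, and the helper names `forward`, `gradient_explanation`, `spearman` and `mprt_curve` are all assumptions made for illustration. Layers are randomised cumulatively, and after each step the explanation of the randomised model is compared with the original one through a similarity function $\rho$ (here a simple rank correlation).

```python
import numpy as np

def forward(weights, x):
    """Toy ReLU MLP (assumed for illustration): scalar output plus ReLU masks."""
    h, masks = x, []
    for W in weights[:-1]:
        z = W @ h
        masks.append(z > 0)
        h = np.maximum(z, 0.0)
    return (weights[-1] @ h).item(), masks

def gradient_explanation(weights, x):
    """Gradient of the scalar output w.r.t. the input, computed by hand."""
    _, masks = forward(weights, x)
    g = weights[-1].ravel()  # gradient w.r.t. the last hidden layer
    for W, m in zip(reversed(weights[:-1]), reversed(masks)):
        g = (g * m) @ W      # back through the ReLU, then the linear map
    return g

def spearman(a, b):
    """Rank correlation used as the similarity rho (ties are not handled)."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return float(np.corrcoef(ra, rb)[0, 1])

def mprt_curve(weights, x, rho, rng, top_down=True):
    """Similarity between the original explanation and the explanation
    after each cumulative layer randomisation."""
    e = gradient_explanation(weights, x)
    randomised = [W.copy() for W in weights]
    n = len(weights)
    order = range(n - 1, -1, -1) if top_down else range(n)
    scores = []
    for l in order:
        # Assumption: "randomising" means redrawing weights from N(0, 1).
        randomised[l] = rng.standard_normal(weights[l].shape)
        e_hat = gradient_explanation(randomised, x)
        scores.append(rho(e, e_hat))
    return scores

rng = np.random.default_rng(0)
weights = [rng.standard_normal((16, 8)),
           rng.standard_normal((16, 16)),
           rng.standard_normal((1, 16))]
x = rng.standard_normal(8)
scores = mprt_curve(weights, x, spearman, rng)
print(scores)  # one similarity value per randomisation step
```

Under the test's reading, an explanation method that is sensitive to the model parameters should show similarity dropping as more layers are randomised. The `top_down` flag is included only because the layer order is one of the caveats discussed in the paper.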

Figures (13)

Figure 1: Schematic visualisation of the original MPRT [adebayo2018] (top), shortcomings identified by recent literature (middle) and our proposed solutions (bottom). Solid arrows in the visualisation indicate shortcomings directly addressed by our proposed metrics. Dashed arrows show those resolved through incorporating methods suggested by existing research [sundararajan2018bindershort]. (a) The MPRT assesses the reliability of explanation methods by randomising a model's parameters layer by layer and comparing explanation similarity, i.e., $\rho({\bm{e}}, \hat{{\bm{e}}})$ between the original model $f$ and the randomised version $\hat{f}$. This is done by examining the explanations from each layer. (b) Pre-processing: normalisation and using absolute values can affect MPRT results by stripping information related to feature importance, especially the sign. (c) Layer-order: randomising layers from top to bottom can retain properties of the original lower layers and not yield fully random outputs, skewing the evaluation. (d) Similarity measures: the pairwise similarity metrics used in the original MPRT are sensitive to noise (e.g. from gradient shattering), potentially affecting how the test ranks explanation methods. (e) sMPRT introduces a "denoising" pre-processing step that averages explanations over $N$ perturbed inputs, reducing noise. (f) eMPRT reinterprets MPRT by comparing the complexity, measured by discrete entropy $\xi({\bm{e}})$, of explanations before and after full model randomisation.
Figure 2: Difference between sMPRT ($N=50$) and MPRT (corresponding to sMPRT with $N=1$). SSIM performance can degrade when attributions are denoised by sMPRT, and this effect occurs most strongly with gradient-based methods such as Gradient or SmoothGrad. The unrandomised and fully randomised model states are denoted as orig and final, respectively.
Figure 3: Effect of number of perturbed samples $N$ on sMPRT results for VGG16 (left) and ResNet18 (right) on ImageNet data. The plots indicate how the area under the mean sMPRT curve (AUC) changes with $N$ for different explanation methods (i.e. the area under the curves as shown in Figure \ref{fig:smprt}). Up to $N=50$, there seems to be a steep change in AUC, especially for gradient-based methods. After that, the AUC curves flatten out, indicating a converged estimate of the denoised samples.
Figure 4: Panels (a)-(d) illustrate entropy curves, representing the increase in complexity $\xi(\hat{e})$ with progressive bottom-up layer randomisation, denoted as $\hat{f}_l^b$. Panel (e) presents aggregated eMPRT scores, serving as a comparative benchmark for ten different XAI methods. No definitive superiority is observed across the tested datasets and models.
Figure 5: Relative ranking of attribution methods using MPRT and eMPRT across different datasets and models. This figure illustrates the categorical rankings for ten attribution methods plus a random attribution across ImageNet (VGG16, ResNet18) and MNIST (LeNet) datasets. The evaluation reveals how rankings fluctuate strongly between MPRT and eMPRT, consistent with findings from hedstrom2023metaquantus.
...and 8 more figures
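
The denoising step described in panel (e) of Figure 1 can be sketched as follows. This is a hedged illustration rather than the paper's code: the function name `smooth_explanation`, the Gaussian noise model, the default `noise_std`, and the toy `explain` function are assumptions. The idea shown is only the averaging of explanations over $N$ perturbed copies of the input.

```python
import numpy as np

def smooth_explanation(explain, x, n_samples=50, noise_std=0.1, rng=None):
    """Average an explanation function over n_samples noisy copies of x.

    Assumption: perturbations are additive Gaussian noise with
    standard deviation noise_std.
    """
    rng = np.random.default_rng() if rng is None else rng
    total = np.zeros_like(x, dtype=float)
    for _ in range(n_samples):
        noisy_x = x + noise_std * rng.standard_normal(x.shape)
        total += explain(noisy_x)
    return total / n_samples

# Toy "shattered" attribution (hypothetical): a smooth trend plus a
# high-frequency component that averaging should largely cancel.
def explain(v):
    return np.cos(40.0 * v) + v

x = np.linspace(-1.0, 1.0, 8)
raw = explain(x)
smoothed = smooth_explanation(explain, x, n_samples=50,
                              rng=np.random.default_rng(0))
print(np.abs(raw - x).max(), np.abs(smoothed - x).max())
```

With `n_samples=1` and `noise_std=0.0` the function returns the unperturbed explanation. In an sMPRT-style evaluation the same averaging would presumably be applied to both the original and the randomised model before their explanations are compared.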

Theorems & Definitions (3)

Theorem (MPRT)
Theorem (sMPRT)
Theorem (eMPRT)
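
For eMPRT, the source describes comparing the discrete (histogram) entropy of explanations before and after full model randomisation. A minimal sketch of that comparison is given below. The number of bins, the natural logarithm, and the normalisation as a relative increase are assumptions, since this overview does not state them, and the names `histogram_entropy` and `emprt_score` are hypothetical.

```python
import numpy as np

def histogram_entropy(e, n_bins=100):
    """Discrete entropy of the attribution values, estimated from a histogram.

    Assumption: a fixed number of equal-width bins and the natural log.
    """
    counts, _ = np.histogram(np.ravel(e), bins=n_bins)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())

def emprt_score(e_orig, e_rand, n_bins=100):
    """Relative rise in complexity from the original explanation to the
    explanation of the fully randomised model (assumed normalisation)."""
    xi_orig = histogram_entropy(e_orig, n_bins)
    xi_rand = histogram_entropy(e_rand, n_bins)
    if xi_orig == 0:
        raise ValueError("original explanation has zero entropy")
    return (xi_rand - xi_orig) / xi_orig

rng = np.random.default_rng(0)

# Hypothetical explanations: a sparse, structured map for the trained
# model and an unstructured, noise-like map after full randomisation.
e_orig = np.zeros(1000)
e_orig[:20] = rng.uniform(0.5, 1.0, size=20)
e_rand = rng.uniform(-1.0, 1.0, size=1000)

score = emprt_score(e_orig, e_rand)
print(score)  # positive when randomisation makes the explanation more complex
```

Because only two explanations are needed per input (original and fully randomised), this formulation avoids both the layer-by-layer loop and the pairwise similarity measure.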
