Saliency strikes back: How filtering out high frequencies improves white-box explanations

Sabine Muzellec; Thomas Fel; Victor Boutin; Léo Andéol; Rufin VanRullen; Thomas Serre

TL;DR

This work identifies a prevalent flaw in gradient-based white-box explanations: high-frequency artifacts in the gradient hinder faithfulness. It introduces FORGrad, a Fourier-based repair that applies an architecture- and method-specific low-pass filter to the gradient $\nabla_{\bm{x}} \bm{f}(\bm{x})$, with the cutoff $\sigma^{\star}$ chosen by maximizing the faithfulness metric over a validation set via $\sigma^{\star} = \arg\max_{\sigma} \mathbb{E}_{\bm{x} \sim \mathcal{V}} F(\bm{\varphi}_{\sigma}(\bm{x}))$, where $\mathcal{V}$ contains 1,280 ImageNet validation images. The authors demonstrate that white-box attributions exhibit more high-frequency content than black-box methods, largely due to max-pooling and downsampling; by filtering these frequencies, FORGrad substantially improves faithfulness, stability, and ranking of white-box methods, bridging the gap with black-box approaches while preserving computational efficiency. The findings suggest architectural factors like pooling contribute to gradient artifacts and motivate future design changes, including pooling strategies and transformer-related analyses. Overall, FORGrad enables simpler, efficient white-box explanations to compete with heavier black-box methods on XAI benchmarks, with potential implications for training-time filtering and robustness.
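The filtering and cutoff-selection steps summarized above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the hard circular mask, the function names `low_pass_filter` and `select_cutoff`, and the `faithfulness` callable (a stand-in for the metric $F$) are all hypothetical, and the paper may use a different filter shape.

```python
import numpy as np

def low_pass_filter(grad, sigma):
    """Keep only Fourier components of a 2-D map with radial frequency <= sigma.

    Assumption: a hard circular mask; the actual filter shape may differ.
    """
    h, w = grad.shape
    fy = np.fft.fftfreq(h) * h  # signed integer frequency indices along rows
    fx = np.fft.fftfreq(w) * w  # signed integer frequency indices along columns
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    mask = radius <= sigma
    spectrum = np.fft.fft2(grad)
    # The mask is symmetric, so the inverse transform of a real map stays real
    # up to floating-point error.
    return np.real(np.fft.ifft2(spectrum * mask))

def select_cutoff(gradients, faithfulness, sigmas):
    """Return the candidate sigma with the highest mean faithfulness.

    `faithfulness` is a hypothetical callable mapping a filtered map to a score;
    `gradients` plays the role of the validation set V.
    """
    scores = [
        np.mean([faithfulness(low_pass_filter(g, s)) for g in gradients])
        for s in sigmas
    ]
    return sigmas[int(np.argmax(scores))]
```

On a $224 \times 224$ map, `sigma=224` exceeds every radial frequency and leaves the gradient unchanged, while `sigma=0` keeps only the mean, which matches the "unfiltered" and "fully filtered" extremes described for the cut-off range.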

Abstract

Attribution methods are a class of explainability (XAI) methods that aim to assess how individual inputs contribute to a model's decision-making process. We have identified a significant limitation in one type of attribution method, known as "white-box" methods. Although highly efficient, as we will show, these methods rely on a gradient signal that is often contaminated by high-frequency artifacts. To overcome this limitation, we introduce a new approach called "FORGrad". This simple method filters out these high-frequency artifacts using optimal cut-off frequencies tailored to the unique characteristics of each model architecture. Our findings show that FORGrad consistently enhances the performance of existing white-box methods, enabling them to compete with more accurate yet computationally demanding "black-box" methods. We anticipate that our research will foster broader adoption of simpler and more efficient white-box methods for explainability, offering a better balance between faithfulness and computational efficiency.

Paper Structure (28 sections, 20 figures, 7 tables)

Introduction
Related Work
Notations, Metrics, and Networks
White-box Methods are Contaminated by High-Frequencies from the Gradient
Comparison between White-Box and Black-Box Methods
White-box methods are contaminated with high-frequency signal:
A Low-Pass Filtered Gradient Approximates Well the Original Gradient
High-Frequency Artifacts Stem from Max-Pooling Operations
FORGrad: FOurier Reparation of the Gradients
Conclusion and Perspectives
Appendix
Additional results
Frequency power per category.
Gradient approximation with ConvNeXt and ViT
Investigating the impact of MaxPooling on high-frequency content
...and 13 more sections

Figures (20)

Figure 1: The FORGrad method. A: FORGrad estimates an optimal cut-off frequency ($\sigma^{\star}$) for individual combinations of attribution method and network architecture. It is then used to low-pass filter the gradient signal. FORGrad leads to quantitatively better attribution maps. B: FORGrad is applicable to all white-box methods (reddish data points), and it consistently improves their faithfulness (see reddish crosses). The x-axis represents the execution time in seconds for each method computed on $100$ images. We refer the reader to section \ref{sec:metrics} for more details on the faithfulness metric.
Figure 2: Fourier signature and power/frequency slope. The Fourier signature is computed using a circular average of the Fourier amplitudes (averaged over all $\theta$ values) at each frequency (i.e., each $R$). The power/frequency slope summarizes the Fourier signature in a single scalar.
Figure 3: High-frequency power in attribution methods. A: Fourier signature of white-box and black-box attribution methods. White-box methods produce attribution maps with increased power in the high frequencies. B: Faithfulness and power/frequency slope for several attribution methods. The numbers below each bar correspond to the method's computational time, measured using a set of 100 images (ImageNet) with an Nvidia T4. White-box methods show lower faithfulness but significantly better computational complexity compared to black-box methods. See section \ref{sec:metrics} for more details on how faithfulness, the power/frequency slope, and the Fourier signature were computed.
Figure 4: Gradients are contaminated with high frequencies that stem from MaxPooling operations. A: We evaluate the importance of high-frequency content in the gradient using a first-order approximation of the model, i.e., $\bm{f}(\bm{x}+\bm{\varepsilon})\approx \bm{f}(\bm{x})+\bm{\varepsilon}\nabla_{\bm{x}} \bm{f}(\bm{x})$, and we compute the $\ell_2$ approximation error when we remove high frequencies up to $\sigma$, $\zeta({\bm{x}}, \sigma)$, relative to the error when we do not filter, $\zeta({\bm{x}}, \sigma_{\text{max}})$. Gradients in which we remove high-frequency content (dark blue) produce an error closer to the baseline (green) compared to the control conditions (pink). B: Example of gradients in a ResNet architecture after a pooling operation, along with their Fourier signature. MaxPooling operations elicit high-frequency power in the gradient.
Figure 5: The FORGrad method. A: Visual examples in which the gradient is low-pass filtered at various cut-off frequencies. Without any filtering ($\sigma=224$), the attribution maps are similar to those obtained with original white-box methods (i.e., Gradient-Input here). FORGrad finds an optimal cut-off frequency at $\sigma=10$. B: Sensitivity of the faithfulness to changes in the cut-off frequency. The faithfulness is evaluated on 1,280 images from the ImageNet validation set (see section \ref{sec:metrics} for more details). The frequencies range from $224$, indicative of an unfiltered gradient, to $0$, representing a fully filtered gradient. C: Side-by-side comparison of attribution maps generated using the SmoothGrad method (top row) versus those refined with the FORGrad method (bottom row). D: Evolution of the optimal $\sigma^{\star}$ values across different ResNetV2 architectures and attribution methods. The variability highlights the crucial role of a tailored cut-off frequency.
...and 15 more figures
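
The Fourier signature and power/frequency slope described in the Figure 2 caption can be sketched as follows. This is a rough illustration, not the authors' code: the function names are hypothetical, radii are binned by integer truncation, the signature is cut at the largest full ring, and the slope is assumed to be a least-squares fit of log-amplitude against log-frequency, which the captions do not confirm.

```python
import numpy as np

def fourier_signature(attribution):
    """Circularly averaged Fourier amplitude at each integer radius R."""
    h, w = attribution.shape
    # Shift the zero frequency to index (h // 2, w // 2).
    amplitude = np.abs(np.fft.fftshift(np.fft.fft2(attribution)))
    y, x = np.indices((h, w))
    radius = np.hypot(y - h // 2, x - w // 2).astype(int)
    total = np.bincount(radius.ravel(), weights=amplitude.ravel())
    count = np.bincount(radius.ravel())
    # Average over all angles within each ring; keep only full rings.
    return (total / np.maximum(count, 1))[: min(h, w) // 2 + 1]

def power_frequency_slope(signature):
    """Slope of a line fit to log-amplitude versus log-frequency (DC excluded).

    Assumption: a log-log least-squares fit; the paper may define it differently.
    """
    freqs = np.arange(1, len(signature))
    slope, _ = np.polyfit(np.log(freqs), np.log(signature[1:] + 1e-12), 1)
    return slope
```

Under this definition, a map dominated by low frequencies has a strongly negative slope, whereas a map with substantial high-frequency power has a slope closer to zero.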
