The (Un)reliability of saliency methods

Pieter-Jan Kindermans, Sara Hooker, Julius Adebayo, Maximilian Alber, Kristof T. Schütt, Sven Dähne, Dumitru Erhan, Been Kim

TL;DR

The paper critically evaluates saliency methods by testing their sensitivity to simple input transformations that do not affect model predictions. It introduces input invariance as a key reliability criterion and shows that many methods fail this test, especially attribution schemes that depend on a reference point. Gradient and signal-based methods are invariant in the experiments shown, while attribution methods such as Gradient x Input and some variants of Integrated Gradients (IG) and Deep Taylor Decomposition (DTD) can be misled unless the reference point is chosen carefully or normalization is applied. The work highlights a need for principled reference-point selection and further research to guarantee reliable explanations across transformations and modalities.

Abstract

Saliency methods aim to explain the predictions of deep neural networks. These methods lack reliability when the explanation is sensitive to factors that do not contribute to the model prediction. We use a simple and common pre-processing step (adding a constant shift to the input data) to show that a transformation with no effect on the model can cause numerous methods to incorrectly attribute. In order to guarantee reliability, we posit that methods should fulfill input invariance, the requirement that a saliency method mirror the sensitivity of the model with respect to transformations of the input. We show, through several examples, that saliency methods that do not satisfy input invariance result in misleading attribution.
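
The constant-shift argument in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper's code: a one-unit model `tanh(w.x + b)`, the helper `make_model`, and the specific numbers. The second model absorbs the shift `m` into its bias, so both models give the same output and the same gradient, while Gradient x Input changes.

```python
import numpy as np

# Hypothetical toy model (not from the paper): f(x) = tanh(w.x + bias)
w = np.array([1.0, -2.0, 0.5])
b = 0.1
m = np.array([-1.0, -1.0, -1.0])  # constant shift, e.g. [0,1] -> [-1,0] encoding


def make_model(bias):
    """Return the model and its analytic input gradient for a given bias."""
    def f(x):
        return np.tanh(w @ x + bias)

    def grad(x):
        # d/dx tanh(w.x + bias) = (1 - tanh^2(w.x + bias)) * w
        return (1.0 - np.tanh(w @ x + bias) ** 2) * w

    return f, grad


f1, grad1 = make_model(b)
# The bias of the second model cancels the shift, so f2(x + m) == f1(x).
f2, grad2 = make_model(b - w @ m)

x1 = np.array([0.2, 0.7, 0.9])  # input in the original encoding
x2 = x1 + m                     # same input after the constant shift

same_output = np.isclose(f1(x1), f2(x2))
same_gradient = np.allclose(grad1(x1), grad2(x2))
same_grad_x_input = np.allclose(grad1(x1) * x1, grad2(x2) * x2)

print("same output:          ", same_output)
print("same gradient:        ", same_gradient)
print("same gradient x input:", same_grad_x_input)
```

In this sketch the two Gradient x Input attributions differ by exactly the gradient times `m`, which is the kind of dependence on a prediction-irrelevant shift that input invariance is meant to rule out.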

Paper Structure

This paper contains 14 sections, 19 equations, 6 figures.

Figures (6)

  • Figure 1: Integrated Gradients and Deep Taylor Decomposition determine input attribution relative to a chosen reference point. This choice determines the vantage point for all subsequent attribution. Using two example reference points for each method, we demonstrate that changing the reference causes the attribution to diverge. The attributions are visualized in a manner consistent with the IG paper (Mukund2017). Visualisations were made using ImageNet data (Imagenet2015) and the VGG16 architecture (Simonyan2014).
  • Figure 2: Evaluating the sensitivity of gradient and signal methods using MNIST with a [0,1] encoding for network $f_1$ and a [-1,0] encoding for network $f_2$. Both raw gradients and signal methods satisfy input invariance by producing identical saliency heatmaps for both networks.
  • Figure 3: Evaluation of attribution method sensitivity using MNIST with a [0,1] encoding for network $f_1$ and a [-1,0] encoding for network $f_2$. Gradient x Input, IG and DTD with a zero reference point, which is equivalent to LRP (Bach2015, Montavon2017), do not satisfy input invariance and produce different attributions for each network. IG with a black image reference point and DTD with a PA reference point are not sensitive to a mean shift in input.
  • Figure 4: Evaluation of attribution method sensitivity using MNIST. Gradient x Input, all IG reference points and DTD with an LRP reference point do not satisfy input invariance and produce different attributions for each network. DTD with a PA reference point is not sensitive to the transformation of the input.
  • Figure 5: SmoothGrad (SG) inherits the sensitivity of the underlying attribution method. SG is not sensitive to the input transformation for gradient and signal methods (SG-PA and SG-GB). SG does not satisfy input invariance for Integrated Gradients (SG-Zero) and Deep Taylor Decomposition (SG-LRP) when a zero vector reference point is used. SG is invariant to the constant input shift when PatternAttribution (SG-PA) or a black image (SG-Black) are used. SG is not input invariant for Gradient x Input.
  • ...and 1 more figures
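
The reference-point dependence described in the captions for Figures 1 and 3 can be sketched numerically. This is a hedged illustration under stated assumptions, not the paper's implementation: the one-unit model `tanh(w.x + bias)`, the helper names `make_grad` and `integrated_gradients`, the midpoint-rule approximation of the path integral, and all numbers are hypothetical. A "black image" is assumed to be the all-zeros vector in the [0,1] encoding, and therefore the shifted vector `m` in the [-1,0] encoding.

```python
import numpy as np

# Hypothetical toy model (not from the paper): f(x) = tanh(w.x + bias)
w = np.array([1.0, -2.0, 0.5])
b = 0.1
m = np.array([-1.0, -1.0, -1.0])  # constant shift between the two encodings


def make_grad(bias):
    """Analytic input gradient of tanh(w.x + bias)."""
    def grad(x):
        return (1.0 - np.tanh(w @ x + bias) ** 2) * w
    return grad


def integrated_gradients(grad, x, ref, steps=64):
    """Approximate IG with a midpoint Riemann sum along the straight path ref -> x."""
    alphas = (np.arange(steps) + 0.5) / steps
    avg_grad = np.mean([grad(ref + a * (x - ref)) for a in alphas], axis=0)
    return (x - ref) * avg_grad


grad1 = make_grad(b)          # network for the original encoding
grad2 = make_grad(b - w @ m)  # network for the shifted encoding, same predictions

x1 = np.array([0.2, 0.7, 0.9])
x2 = x1 + m
zero = np.zeros_like(x1)

ig_f1 = integrated_gradients(grad1, x1, zero)           # zero == black in [0,1]
ig_f2_zero = integrated_gradients(grad2, x2, zero)      # zero-vector reference
ig_f2_black = integrated_gradients(grad2, x2, zero + m) # black image in [-1,0]

print("zero reference invariant: ", np.allclose(ig_f1, ig_f2_zero))
print("black reference invariant:", np.allclose(ig_f1, ig_f2_black))
```

In this sketch, shifting the reference together with the data keeps the integration path identical from the model's point of view, which is why the black-image reference gives matching attributions while the fixed zero vector does not.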