The (Un)reliability of saliency methods
Pieter-Jan Kindermans, Sara Hooker, Julius Adebayo, Maximilian Alber, Kristof T. Schütt, Sven Dähne, Dumitru Erhan, Been Kim
TL;DR
The paper critically evaluates saliency methods by testing their sensitivity to simple input transformations that do not affect model predictions. It introduces input invariance as a key reliability criterion and shows that many methods fail this test, especially attribution methods that depend on a reference point. Gradient-based and signal-based methods tend to be invariant. Attribution methods such as Gradient × Input (GI) and some variants of Integrated Gradients (IG) and Deep Taylor Decomposition (DTD) can be misled unless the reference point is chosen carefully or normalization is applied. The work highlights the need for principled reference-point selection and for further research to guarantee reliable explanations across transformations and modalities.
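The reference-point issue can be illustrated with a minimal sketch in plain Python. This is not the authors' code, and every weight, input and name below (`absorb_shift`, `integrated_gradients`, the shift `m`) is a hypothetical choice made only for illustration. The sketch assumes a one-hidden-layer ReLU network. Feeding x + m to a copy whose first-layer bias is b - W m reproduces the original output exactly. IG is approximated with a 50-step midpoint Riemann sum, and the step count is an arbitrary assumption. With a zero reference in both networks the attributions differ. Moving the reference by the same shift m makes them agree.

```python
# Minimal sketch, not the authors' code: all weights, the input x and the
# shift m are hypothetical values chosen only for illustration.

def pre_activations(x, W, b):
    # Hidden-layer pre-activations W x + b.
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def forward(x, W, b, v):
    # Scalar output v . relu(W x + b).
    return sum(vi * max(0.0, p) for vi, p in zip(v, pre_activations(x, W, b)))

def gradient(x, W, b, v):
    # df/dx_j: sum of v_i * W_ij over hidden units i that are active at x.
    pre = pre_activations(x, W, b)
    return [sum(vi * row[j] for vi, row, p in zip(v, W, pre) if p > 0)
            for j in range(len(x))]

def absorb_shift(W, b, m):
    # Bias b2 = b - W m, so that W (x + m) + b2 equals W x + b for every x.
    return [bi - sum(w * mj for w, mj in zip(row, m)) for row, bi in zip(W, b)]

def integrated_gradients(x, ref, W, b, v, steps=50):
    # Midpoint Riemann sum for IG: (x - ref) times the average gradient
    # along the straight-line path from the reference to x.
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [r + alpha * (xi - r) for xi, r in zip(x, ref)]
        total = [t + g for t, g in zip(total, gradient(point, W, b, v))]
    return [(xi - r) * t / steps for xi, r, t in zip(x, ref, total)]

W = [[1.0, -2.0], [0.5, 1.5], [-1.0, 0.25]]
b = [0.5, -0.25, 1.0]
v = [1.0, -0.5, 2.0]
x = [1.0, 0.5]
m = [1.0, 1.0]                                # hypothetical constant shift
x2 = [xi + mi for xi, mi in zip(x, m)]        # shifted input
b2 = absorb_shift(W, b, m)                    # network that receives x2
zero = [0.0, 0.0]

ig_zero = integrated_gradients(x, zero, W, b, v)            # original, zero reference
ig_zero_shifted = integrated_gradients(x2, zero, W, b2, v)  # shifted, zero reference
ig_ref_shifted = integrated_gradients(x2, m, W, b2, v)      # shifted, reference 0 + m

print("outputs:", forward(x, W, b, v), forward(x2, W, b2, v))
print("IG, zero reference:", ig_zero, "vs", ig_zero_shifted)
print("IG, reference moved by m:", ig_zero, "vs", ig_ref_shifted)
```

The two networks agree on every input, yet the zero-reference attributions disagree. In the shifted network, the path from 0 to x + m corresponds to a path from -m to x in the original network. Shifting the reference along with the input keeps the two integration paths aligned.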
Abstract
Saliency methods aim to explain the predictions of deep neural networks. These methods lack reliability when the explanation is sensitive to factors that do not contribute to the model prediction. We use a simple and common pre-processing step (adding a constant shift to the input data) to show that a transformation with no effect on the model can cause numerous methods to incorrectly attribute. In order to guarantee reliability, we posit that methods should fulfill input invariance, the requirement that a saliency method mirror the sensitivity of the model with respect to transformations of the input. We show, through several examples, that saliency methods that do not satisfy input invariance result in misleading attribution.
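The constant-shift argument can be checked numerically with a small sketch. It uses plain Python with hypothetical weights, input and shift, and does not reproduce the paper's experimental setup. It builds a shifted network whose first-layer bias absorbs the shift and confirms that both networks give the same output. It then tests whether a saliency method returns the same attribution for both. The helper names `absorb_shift` and `input_invariant` are illustrative and are not an API from the paper.

```python
# Minimal sketch, not the authors' implementation. The network weights,
# the input x and the shift m are hypothetical values chosen for illustration.

def pre_activations(x, W, b):
    # Hidden-layer pre-activations W x + b.
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def forward(x, W, b, v):
    # Scalar output v . relu(W x + b).
    return sum(vi * max(0.0, p) for vi, p in zip(v, pre_activations(x, W, b)))

def gradient(x, W, b, v):
    # Plain gradient saliency: df/dx_j summed over the active hidden units.
    pre = pre_activations(x, W, b)
    return [sum(vi * row[j] for vi, row, p in zip(v, W, pre) if p > 0)
            for j in range(len(x))]

def gradient_times_input(x, W, b, v):
    # Gradient x Input (GI): elementwise product of the input and the gradient.
    return [xi * g for xi, g in zip(x, gradient(x, W, b, v))]

def absorb_shift(W, b, m):
    # First-layer bias for a network fed x + m: b2 = b - W m.
    return [bi - sum(w * mj for w, mj in zip(row, m)) for row, bi in zip(W, b)]

def input_invariant(saliency, x, m, W, b, v, tol=1e-9):
    # True if the shifted network's saliency on x + m matches the original
    # network's saliency on x. The two networks compute the same function.
    x2 = [xi + mi for xi, mi in zip(x, m)]
    b2 = absorb_shift(W, b, m)
    if abs(forward(x, W, b, v) - forward(x2, W, b2, v)) > tol:
        raise ValueError("the shift changed the model output")
    s1 = saliency(x, W, b, v)
    s2 = saliency(x2, W, b2, v)
    return all(abs(a - c) < tol for a, c in zip(s1, s2))

W = [[1.0, -2.0], [0.5, 1.5], [-1.0, 0.25]]
b = [0.5, -0.25, 1.0]
v = [1.0, -0.5, 2.0]
x = [1.0, 0.5]
m = [1.0, 1.0]  # hypothetical constant shift

print(input_invariant(gradient, x, m, W, b, v))              # True
print(input_invariant(gradient_times_input, x, m, W, b, v))  # False
```

The plain gradient passes because the shifted network sees identical pre-activations, so its gradient is unchanged. Gradient × Input fails because it multiplies that unchanged gradient by the shifted input.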
