Robustness of Visual Explanations to Common Data Augmentation
Lenka Tětková, Lars Kai Hansen
TL;DR
This paper addresses the reliability of post-hoc visual explanations under naturally occurring data augmentations. By partitioning augmentations into invariant and equivariant groups and evaluating multiple explanation methods across CNN architectures on ImageNet, it introduces a stability metric $S(\text{correlation}, \text{probability})$ and a pixel-flipping faithfulness score to quantify robustness and fidelity. The findings show explanations are generally less robust than model predictions, with invariant transformations yielding more stable attributions than equivariant ones; among methods, LRP composites and Guided Backpropagation provide the best stability, while Gradients-based methods are the least robust. Training with augmented data does not fully resolve instability, underscoring the need for more robust explanation techniques before deploying them in real-world vision tasks.
Abstract
As the use of deep neural networks continues to grow, understanding their behaviour has become more crucial than ever. Post-hoc explainability methods are a potential solution, but their reliability is being called into question. Our research investigates the response of post-hoc visual explanations to naturally occurring transformations, often referred to as augmentations. We expect explanations to be invariant under certain transformations, such as changes to the colour map, while responding in an equivariant manner to transformations like translation, object scaling, and rotation. We have found remarkable differences in robustness depending on the type of transformation, with some explainability methods (such as LRP composites and Guided Backprop) being more stable than others. We also explore the role of training with data augmentation. We provide evidence that explanations are typically less robust to augmentation than classification performance, regardless of whether data augmentation is used in training or not.
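The distinction between invariant and equivariant transformations can be made concrete with a small sketch. For an invariant transformation, the explanation of the augmented image is compared with the explanation of the original image; for an equivariant one, it is compared with the transformed explanation of the original. The code below is illustrative only and is not the stability metric used in the paper: the helper names (`pearson`, `explanation_agreement`, `toy_explain`) are hypothetical, plain Pearson correlation is assumed as the similarity measure, and the "explainer" is a toy stand-in for a real attribution method.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation between two heatmaps, flattened to vectors."""
    a = a.ravel().astype(float)
    b = b.ravel().astype(float)
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def explanation_agreement(explain, image, transform, equivariant):
    """Correlate the explanation of the augmented image with its expected target.

    Invariant case:   target is explain(image).
    Equivariant case: target is transform(explain(image)).
    """
    target = explain(image)
    if equivariant:
        target = transform(target)
    return pearson(explain(transform(image)), target)

def toy_explain(img):
    # Hypothetical stand-in for an attribution method:
    # absolute deviation of each pixel from the image mean.
    return np.abs(img - img.mean())

def shift(x):
    # Circular translation by two pixels (expected to be equivariant).
    return np.roll(x, 2, axis=1)

def brighten(x):
    # Constant brightness offset (expected to be invariant for the toy explainer).
    return x + 0.1

rng = np.random.default_rng(0)
image = rng.random((8, 8))

agree_shift = explanation_agreement(toy_explain, image, shift, equivariant=True)
agree_bright = explanation_agreement(toy_explain, image, brighten, equivariant=False)
# Treating the translation as if it were invariant gives a much lower score.
agree_wrong = explanation_agreement(toy_explain, image, shift, equivariant=False)
```

With a real model, `toy_explain` would be replaced by an attribution method, and the agreement score would typically fall below 1 even when the transformation is assigned to the correct group, which is the kind of instability the abstract describes.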
