Reliable Evaluation of Attribution Maps in CNNs: A Perturbation-Based Approach

Lars Nieradzik, Henrike Stephani, Janis Keuper

TL;DR

The method replaces pixel modifications with adversarial perturbations, which yields a more robust evaluation framework; the resulting metric is more consistent across 15 dataset-architecture combinations.

Abstract

In this paper, we present an approach for evaluating attribution maps, which play a central role in interpreting the predictions of convolutional neural networks (CNNs). We show that the widely used insertion/deletion metrics are susceptible to distribution shifts that affect the reliability of the ranking. We propose replacing pixel modifications with adversarial perturbations, which provides a more robust evaluation framework. Using smoothness and monotonicity measures, we illustrate the effectiveness of our approach in correcting distribution shifts. In addition, we conduct the most comprehensive quantitative and qualitative assessment of attribution maps to date. Introducing baseline attribution maps as sanity checks, we find that our metric is the only contender to pass all checks. Using Kendall's $\tau$ rank correlation coefficient, we show the increased consistency of our metric across 15 dataset-architecture combinations. Of the 16 attribution maps tested, our results clearly show SmoothGrad to be the best map currently available. This research makes an important contribution to the development of attribution maps by providing a reliable and consistent evaluation framework. To ensure reproducibility, we will provide the code along with our results.
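Kendall's $\tau$ measures how similarly two rankings order the same items: with $C$ concordant and $D$ discordant pairs among $n$ items, $\tau = (C - D) / \binom{n}{2}$, so identical rankings give $1$ and reversed rankings give $-1$. The sketch below is a minimal, dependency-free version assuming rankings without ties (tau-a). The map names are hypothetical placeholders, not results from the paper, and averaging pairwise $\tau$ over dataset-architecture combinations (`mean_pairwise_tau`) is only one plausible way to summarize consistency, not necessarily the aggregation the authors use.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a between two rankings of the same items (no ties)."""
    # Position of every item in each ranking (0 = best).
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    if set(pos_a) != set(pos_b) or len(pos_a) < 2:
        raise ValueError("rankings must hold the same >= 2 distinct items")
    concordant = discordant = 0
    for x, y in combinations(pos_a, 2):
        # Positive product: both rankings order x and y the same way.
        agreement = (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y])
        if agreement > 0:
            concordant += 1
        else:
            discordant += 1
    n = len(pos_a)
    return (concordant - discordant) / (n * (n - 1) // 2)


def mean_pairwise_tau(rankings):
    """Hypothetical consistency summary: average tau over all pairs of rankings.

    Needs at least two rankings.
    """
    pairs = list(combinations(rankings, 2))
    return sum(kendall_tau(a, b) for a, b in pairs) / len(pairs)


# Hypothetical rankings of four maps from two dataset-architecture combinations.
ranking_1 = ["map_A", "map_B", "map_C", "map_D"]
ranking_2 = ["map_A", "map_C", "map_B", "map_D"]
print(kendall_tau(ranking_1, ranking_2))  # (5 - 1) / 6, about 0.667
```

Only the pair (map_B, map_C) is ordered differently, so one of the six pairs is discordant. A metric whose rankings stay similar across combinations yields a mean pairwise $\tau$ close to $1$.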

Paper Structure

This paper contains 11 sections, 6 equations, 12 figures, 5 tables.

Figures (12)

  • Figure 1: Visualization of four different attribution maps (AMs) and two evaluation methods on the same input image and the same model (EfficientNet-B0 DBLP:journals/corr/abs-1905-11946). Visually, it is impossible to objectively determine which result gives the "best" estimate of the image regions with the highest impact on the model decision. Current evaluation methods also disagree when ranking these AMs: the Deletion method (lower is better) DBLP:journals/corr/FongV17, DBLP:journals/corr/abs-1806-07421 ranks Guided Integrated Gradients (IG) DBLP:journals/corr/abs-2106-09788 first, while the Insertion method (higher is better) points towards Blur IG DBLP:journals/corr/abs-2004-03383. Refer also to \ref{fig:corr} for a comparison across more images.
  • Figure 2: This plot illustrates the degree of similarity among all attribution maps. The matrix was computed by averaging the individual correlation results across all attribution maps in the ImageNet dataset with ResNet-50.
  • Figure 3: The series of images illustrates the Deletion method applied to the original image. The first image overlays the saliency map on the original image. The subsequent images (2nd to 4th) depict three stages of deletion, in which progressively less important regions are zeroed out, as indicated by the colors in the first image.
  • Figure 4: Insertion, Insertion Blur, and our Perturbation method applied to one image. (a) Shows the Insertion and Insertion Blur methods. (b) Displays our Perturbation method for different values of $\epsilon$. Probability refers to the neural network's confidence for the selected class. The order in which perturbed pixels are removed (b) and pixels are inserted (a) is determined by the saliency map: pixels with the highest saliency values are changed first. An AUC value is computed for each saliency map, condensing the outcome into a single scalar. Refer to \ref{tab:monosmooth} for numerical results using more than one image.
  • Figure 5: Blue is the original image, orange is the perturbed image. There is only a small effect on the image distribution. Increasing the brightness by $1$ would have a similar effect on the distribution (shifting it to the right). Only the red channel of the violin image from \ref{fig:deletionprocess} is considered here.
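The Insertion/Deletion procedure in the Figure 3 and Figure 4 captions changes pixels in order of decreasing saliency, records the class probability after each step, and condenses the curve into an AUC. The sketch below shows that loop under simplifying assumptions: images are flat lists of floats, the "model" is a hypothetical linear scorer rather than a CNN, deletion replaces pixels with zeros, and insertion starts from an all-zero image (starting from a blurred image would correspond to Insertion Blur). The helper names `saliency_curve` and `trapezoid_auc` are illustrative, not the authors' code.

```python
def trapezoid_auc(ys):
    """Area under ys sampled at evenly spaced x positions on [0, 1]."""
    n = len(ys) - 1
    return sum((ys[i] + ys[i + 1]) / 2 for i in range(n)) / n


def saliency_curve(start, target, saliency, model, steps):
    """Move pixels from `start` to `target`, most salient first.

    Returns the model score before any change and after each of `steps` steps.
    """
    order = sorted(range(len(saliency)), key=lambda i: saliency[i], reverse=True)
    current = list(start)
    scores = [model(current)]
    n = len(order)
    for s in range(1, steps + 1):
        # Pixels ranked in this step's slice take their target value.
        for i in order[(s - 1) * n // steps : s * n // steps]:
            current[i] = target[i]
        scores.append(model(current))
    return scores


# Hypothetical toy setup: a 4-pixel "image" and a linear scorer.
weights = [0.4, 0.3, 0.2, 0.1]
image = [1.0, 1.0, 1.0, 1.0]
blank = [0.0] * len(image)


def model(pixels):
    return sum(w * p for w, p in zip(weights, pixels))


good_map = weights                 # ranks pixels by their true importance
bad_map = list(reversed(weights))  # ranks them backwards

# Deletion: start from the image, set pixels to zero (lower AUC is better).
del_good = trapezoid_auc(saliency_curve(image, blank, good_map, model, steps=4))
del_bad = trapezoid_auc(saliency_curve(image, blank, bad_map, model, steps=4))
# Insertion: start from zeros, restore image pixels (higher AUC is better).
ins_good = trapezoid_auc(saliency_curve(blank, image, good_map, model, steps=4))
ins_bad = trapezoid_auc(saliency_curve(blank, image, bad_map, model, steps=4))
print(del_good, del_bad, ins_good, ins_bad)
```

With the correct map, removing the most important pixels first drops the score quickly (low Deletion AUC) and inserting them first raises it quickly (high Insertion AUC); the reversed map does the opposite. The same helper could also start from a perturbed image and restore the original pixels in saliency order, which is one reading of panel (b) in Figure 4; how the paper computes the adversarial perturbation and scores that curve is not reproduced here.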