Towards Better Understanding Attribution Methods

Sukrut Rao, Moritz Böhle, Bernt Schiele

TL;DR

This work tackles the challenge of evaluating post-hoc attribution methods for deep networks by introducing three complementary evaluation schemes: DiFull (controlled influence of input regions), ML-Att (consistent layers across methods), and AggAtt (holistic qualitative visualization). It provides a thorough quantitative and qualitative assessment across GridPG, DiFull, and DiPart on ImageNet and CIFAR-10, analyzes correlations between attribution methods, and demonstrates a Gaussian-smoothing post-processing that substantially improves localization for several methods. The results reveal that fair, cross-layer comparisons are feasible and that smoothing can align input-layer explanations with final-layer Grad-CAM, enhancing both fidelity and interpretability. The work also rigorously compares computational costs with existing approaches like SmoothGrad, offering practical guidance for deploying attribution methods in real-world settings. Overall, the proposed framework enables fairer, more systematic evaluation and highlights smoothing as a robust tool to improve attribution quality across architectures and datasets.

Abstract

Deep neural networks are very successful on many vision tasks, but hard to interpret due to their black box nature. To overcome this, various post-hoc attribution methods have been proposed to identify image regions most influential to the models' decisions. Evaluating such methods is challenging since no ground truth attributions exist. We thus propose three novel evaluation schemes to more reliably measure the faithfulness of those methods, to make comparisons between them more fair, and to make visual inspection more systematic. To address faithfulness, we propose a novel evaluation setting (DiFull) in which we carefully control which parts of the input can influence the output in order to distinguish possible from impossible attributions. To address fairness, we note that different methods are applied at different layers, which skews any comparison, and so evaluate all methods on the same layers (ML-Att) and discuss how this impacts their performance on quantitative metrics. For more systematic visualizations, we propose a scheme (AggAtt) to qualitatively evaluate the methods on complete datasets. We use these evaluation schemes to study strengths and shortcomings of some widely used attribution methods. Finally, we propose a post-processing smoothing step that significantly improves the performance of some attribution methods, and discuss its applicability.
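The post-processing smoothing step mentioned above can be illustrated with a minimal sketch, assuming a Gaussian kernel (as the TL;DR states) applied to a single 2-D attribution map. The function names, the default `kernel_size` and `sigma`, and the border handling (edge values are repeated) are illustrative assumptions, not the authors' implementation or settings.

```python
import math


def gaussian_kernel_1d(kernel_size, sigma):
    """Return a normalized 1-D Gaussian kernel of odd length kernel_size."""
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd")
    half = kernel_size // 2
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def _blur_rows(grid, kernel):
    """Filter every row with the kernel, repeating edge values at the borders."""
    half = len(kernel) // 2
    offsets = range(-half, half + 1)
    blurred = []
    for row in grid:
        n = len(row)
        blurred.append([
            sum(k * row[min(max(x + d, 0), n - 1)]
                for d, k in zip(offsets, kernel))
            for x in range(n)
        ])
    return blurred


def smooth_attribution(attribution, kernel_size=3, sigma=1.0):
    """Separable Gaussian blur of a 2-D attribution map (a list of rows).

    kernel_size and sigma are hypothetical defaults chosen for illustration.
    """
    kernel = gaussian_kernel_1d(kernel_size, sigma)
    horizontal = _blur_rows(attribution, kernel)
    # Transpose, blur the former columns as rows, then transpose back.
    transposed = [list(col) for col in zip(*horizontal)]
    vertical = _blur_rows(transposed, kernel)
    return [list(col) for col in zip(*vertical)]


# A single-pixel attribution "spike" gets spread over its neighborhood.
spike = [[0.0] * 5 for _ in range(5)]
spike[2][2] = 1.0
smoothed = smooth_attribution(spike, kernel_size=3, sigma=1.0)
```

Smoothing of this kind trades pixel-level detail for spatially coherent maps, which is the property the abstract credits for the improved performance of some methods.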

Paper Structure (24 sections, 1 equation, 25 figures, 2 tables)

Figures (25)

  • Figure 1: Spearman rank correlation coefficients between Grad-CAM at the final layer and S-IntGrad at the input layer on GridPG for varying degrees of smoothing. The first column shows the correlation with the original unsmoothed version. We observe that the correlation improves significantly for both VGG11 and Resnet18 when smoothing with large kernel sizes.
  • Figure 2: Spearman rank correlation coefficients between Grad-CAM at the final layer and S-IxG at the input layer on GridPG for varying degrees of smoothing. The first column shows the correlation with the original unsmoothed version. We observe that the correlation improves for VGG11, but does not significantly improve for Resnet18.
  • Figure 9: Quantitative Results on ImageNet. We evaluate the localization scores of each attribution method at the input (Inp), middle (Mid), and final (Fin) convolutional layers, on each of GridPG, DiFull, and DiPart using VGG11 (left) and Resnet18 (right). Top: Backpropagation-based methods. Middle: Activation-based methods. Bottom: Perturbation-based methods. The two horizontal dotted lines mark localization scores of $1.0$ and $0.25$, which correspond to perfect and random localization, respectively. We use the "*" symbol to show boxes that collapse to a single point, for better readability.
  • Figure 10: Examples from each AggAtt bin for each method at the input layer on GridPG using VGG11. From each bin, the image and its attribution at the median position are shown.
  • Figure 11: Examples from each AggAtt bin for each method at the final layer on GridPG using VGG11. From each bin, the image and its attribution at the median position are shown.
  • ...and 20 more figures
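
The reference values of $1.0$ and $0.25$ in the Figure 9 caption can be reproduced with a small sketch. It assumes a 2x2 grid of sub-images and defines the localization score as the fraction of positive attribution that falls inside the target cell. That definition, the function name, and the handling of maps without positive attribution are assumptions made for illustration; this page does not spell them out.

```python
def localization_score(attribution, target_cell, grid=2):
    """Fraction of positive attribution inside one cell of a grid x grid layout.

    attribution: 2-D list of floats whose height and width are assumed to be
                 divisible by `grid`.
    target_cell: cell index in row-major order (0 is the top-left cell).
    """
    height, width = len(attribution), len(attribution[0])
    cell_h, cell_w = height // grid, width // grid
    row0 = (target_cell // grid) * cell_h
    col0 = (target_cell % grid) * cell_w

    inside = 0.0
    total = 0.0
    for y, row in enumerate(attribution):
        for x, value in enumerate(row):
            positive = max(value, 0.0)  # negative attributions are ignored
            total += positive
            if row0 <= y < row0 + cell_h and col0 <= x < col0 + cell_w:
                inside += positive

    if total == 0.0:
        return 0.0  # assumption: no positive attribution counts as a score of 0
    return inside / total


# Uniform attribution spreads evenly over the four cells (random localization).
uniform = [[1.0] * 4 for _ in range(4)]
random_score = localization_score(uniform, target_cell=0)

# Attribution confined to the top-left cell (perfect localization).
focused = [[1.0, 1.0, 0.0, 0.0],
           [1.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 0.0, 0.0],
           [0.0, 0.0, 0.0, 0.0]]
perfect_score = localization_score(focused, target_cell=0)
```

With four equally sized cells, an attribution map that carries no information about the target places a quarter of its positive mass in each cell, which is why $0.25$ serves as the random baseline.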