Now you see me! Attribution Distributions Reveal What is Truly Important for a Prediction

Nils Philipp Walter, Jilles Vreeken, Jonas Fischer

TL;DR

This work addresses the opacity of neural network explanations by showing that standard attributions computed on a single class logit miss information the model uses. It introduces VAR, a training-free refinement that computes per-pixel distributions over a subset of class logits via a local softmax, then refines the target attribution by weighting it with the dominance of competing classes. Across CNN and Vision Transformer architectures, VAR consistently improves localization, results on randomization-based sanity checks, and insertion-based measures, revealing both discriminative and shared features that were hidden in traditional attribution pipelines. The approach is architecture-agnostic and method-agnostic, and is designed to be readily applied to existing attribution methods, enhancing interpretability in high-stakes domains and beyond.

Abstract

Neural networks are regularly employed in high-stakes decision-making, where understanding and transparency are key. Attribution methods have been developed to reveal which input features a neural network uses for a specific prediction. Although widely used in computer vision, these methods often produce unspecific saliency maps that fail to identify the relevant information that led to a decision, a shortcoming supported by various benchmark results. Here, we revisit the common attribution pipeline and identify one cause of this lack of specificity: attributions are computed for isolated logits. Instead, we suggest combining attributions of multiple class logits, in analogy to how the softmax combines information across logits. By computing probability distributions of attributions over classes for each spatial location in the image, we unleash the true capabilities of existing attribution methods, revealing better object- and instance-specificity and uncovering discriminative as well as shared features between classes. On common benchmarks, including the grid-pointing game and randomization-based sanity checks, we show that this reconsideration of how and where we compute attributions across the network improves established attribution methods while staying agnostic to model architectures. We make the code publicly available: https://github.com/nilspwalter/var.
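
The core idea described above, a softmax over per-class attribution values at each spatial location, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions and not the authors' implementation (the linked repository is the reference for that): the function names, the `temperature` parameter, and the final weighting rule in `refine_target` are all hypothetical choices made for this example.

```python
import numpy as np

def attribution_distribution(attr_maps, temperature=1.0):
    """Per-pixel softmax over classes.

    attr_maps: array of shape (K, H, W), one attribution map per class logit.
    Returns an array of the same shape whose values sum to 1 over the class
    axis at every spatial location. `temperature` is an assumed knob.
    """
    z = np.asarray(attr_maps, dtype=float) / temperature
    z = z - z.max(axis=0, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)

def refine_target(attr_maps, target, temperature=1.0):
    """Weight the target class's attribution by its per-pixel class probability.

    Assumption: multiplying by the target's probability is one simple way to
    down-weight pixels where competing classes dominate; the exact rule used
    by VAR may differ.
    """
    probs = attribution_distribution(attr_maps, temperature)
    return np.asarray(attr_maps, dtype=float)[target] * probs[target]
```

In this sketch, a pixel that receives similar attribution for every class gets a probability near 1/K for the target and is damped, while a pixel where the target class clearly dominates keeps almost all of its original attribution.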

Paper Structure

This paper contains 22 sections, 11 equations, 20 figures, 3 tables.

Figures (20)

  • Figure 1: Reconsidering how to apply attributions. For image classification, typically done with a softmax classification head on top of an image encoder (top left), the standard approach to generating attribution maps as explanations of the decision is to consider only the logit of the predicted class (bottom left), ignoring that the softmax incorporates the logits of all classes into the final prediction. We suggest computing distributions of attributions across classes by taking the softmax of attribution values across all logits, reflecting the network's decision-making (right). Network parts considered for the attribution computation are colored in orange.
  • Figure 2: Contrastive attributions across architectures and methods. For each baseline (top), our refinement (bottom) sharpens class-specific regions (keyboard, laptop, monitor, mouse). In ResNet-50 the effect is strongest, revealing clear class-specific signals often assumed absent. In ViT-base-16, attributions already cover relevant areas but remain diffuse; our method reduces this blur and highlights the important regions more cleanly.
  • Figure 3: VAR on the Grid Pointing Game. We show examples from the grid pointing game for the methods most affected by our framework (as columns: Integrated Gradients, Guided Backpropagation, Input$\times$Gradient) on ResNet-50. Input images are shown on the left; for each, we provide vanilla attributions (top rows) and attributions augmented with VAR (bottom rows). Within each method, the attributions for the four different classes in the grid are shown as columns.
  • Figure 4: Qualitative example of the ablation study. For Guided Backpropagation (GBP, top) and GBP with VAR (bottom), we provide examples from the insertion/deletion ablation. For each, we show the original image with class softmax scores for two classes, the attribution map for each class, and the attribution-based intervention mask for each class with the resulting changes in class softmax scores.
  • Figure 5: Sanity check by network randomization. We show the similarity between attributions before and after randomizing x% of the network layers, for standard attributions (dashed) and for attributions augmented with VAR (solid). Lower is better. Layers are randomized from the back to the front of the network, following the strategy of Adebayo et al. (2018).
  • ...and 15 more figures
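
To make the grid pointing game from Figure 3 more concrete, the sketch below scores one attribution map on a grid image. It assumes a common formulation in which the score is the fraction of positive attribution that falls inside the grid cell containing the target class; the paper's exact metric may differ, and the function name and arguments are hypothetical.

```python
import numpy as np

def grid_localization_score(attr, cell, grid=(2, 2)):
    """Fraction of positive attribution that lies inside the target grid cell.

    attr: (H, W) attribution map computed for one class on the grid image.
    cell: (row, col) index of the cell that contains that class.
    grid: number of (rows, cols) in the image grid; 2x2 is assumed by default.
    """
    rows, cols = grid
    h, w = attr.shape
    r, c = cell
    pos = np.clip(attr, 0.0, None)  # ignore negative attribution
    total = pos.sum()
    if total == 0:
        return 0.0  # no positive attribution anywhere
    r0, r1 = r * h // rows, (r + 1) * h // rows
    c0, c1 = c * w // cols, (c + 1) * w // cols
    return float(pos[r0:r1, c0:c1].sum() / total)
```

Under this formulation, an attribution map spread uniformly over a 2x2 grid scores 0.25, and a map concentrated entirely on the correct cell scores 1.0.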