Now you see me! Attribution Distributions Reveal What is Truly Important for a Prediction
Nils Philipp Walter, Jilles Vreeken, Jonas Fischer
TL;DR
This work addresses the opacity of neural network explanations by showing that standard attributions, computed on a single class logit, miss information used by the model. It introduces VAR, a training-free refinement that computes per-pixel distributions over a subset of class logits via a local softmax, then refines the target attributions by weighting them according to the dominance of competing classes. Across CNN and Vision Transformer architectures, VAR consistently improves localization, performance on sanity checks, and insertion-based measures, revealing both discriminative and shared features that were hidden in traditional attribution pipelines. The approach is architecture-agnostic and method-agnostic, and it is designed to be readily applied to existing attribution methods, enhancing interpretability in high-stakes domains and beyond.
Abstract
Neural networks are regularly employed in high-stakes decision-making, where understanding and transparency are key. Attribution methods have been developed to gain insight into which input features neural networks use for a specific prediction. Although widely used in computer vision, these methods often produce unspecific saliency maps that fail to identify the relevant information that led to a decision, as shown by various benchmark results. Here, we revisit the common attribution pipeline and identify one cause of this lack of specificity: attributions are computed for isolated logits. Instead, we suggest combining the attributions of multiple class logits, in analogy to how the softmax combines information across logits. By computing probability distributions of attributions over classes for each spatial location in the image, we unleash the true capabilities of existing attribution methods, revealing better object- and instance-specificity and uncovering discriminative as well as shared features between classes. On common benchmarks, including the grid-pointing game and randomization-based sanity checks, we show that this reconsideration of how and where we compute attributions across the network improves established attribution methods while staying agnostic to model architectures. We make the code publicly available: https://github.com/nilspwalter/var.
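
To make the idea of a per-location distribution over class attributions concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes attribution maps for K class logits have already been computed by some existing method and stacked into an array of shape (K, H, W). The function names, the `temperature` parameter, and the choice to weight the target map by its own per-pixel softmax probability are illustrative assumptions; the exact weighting used by VAR is defined in the paper and the linked repository.

```python
import numpy as np


def per_pixel_class_softmax(attributions, temperature=1.0):
    """Softmax over the class axis, computed independently at each pixel.

    attributions: array of shape (K, H, W), one attribution map per class logit.
    temperature: hypothetical scaling parameter (an assumption of this sketch).
    Returns an array of the same shape whose entries sum to 1 over axis 0.
    """
    scaled = np.asarray(attributions, dtype=float) / temperature
    # Subtract the per-pixel maximum for numerical stability.
    scaled = scaled - scaled.max(axis=0, keepdims=True)
    weights = np.exp(scaled)
    return weights / weights.sum(axis=0, keepdims=True)


def refine_target_attribution(attributions, target_index, temperature=1.0):
    """Weight the target map by how strongly the target dominates each pixel.

    Assumed weighting for illustration only: the target attribution is
    multiplied by the target's per-pixel softmax probability.
    """
    attributions = np.asarray(attributions, dtype=float)
    probs = per_pixel_class_softmax(attributions, temperature)
    return attributions[target_index] * probs[target_index]


# Toy input: 3 classes on a 1x2 "image".
# Pixel 0 is attributed only to the target class (discriminative),
# pixel 1 is attributed equally to all classes (shared).
toy = np.array([
    [[5.0, 5.0]],  # target class
    [[0.0, 5.0]],  # competing class 1
    [[0.0, 5.0]],  # competing class 2
])

probs = per_pixel_class_softmax(toy)
refined = refine_target_attribution(toy, target_index=0)
```

In this toy case both pixels have the same raw target attribution, so a single-logit map cannot tell them apart. The per-pixel distribution is uniform at the shared pixel, which shrinks its refined value to one third of the original, while the discriminative pixel keeps nearly all of its value. Under these assumptions, a near-uniform `probs` column flags a shared feature and a peaked one flags a discriminative feature.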
