DAVE: Distribution-aware Attribution via ViT Gradient Decomposition

Adam Wróbel; Siddhartha Gairola; Jacek Tabor; Bernt Schiele; Bartosz Zieliński; Dawid Rymarczyk

TL;DR

DAVE addresses the challenge of producing stable, high-resolution attributions for Vision Transformers by performing a structured gradient decomposition that isolates the distribution-aware effective transformation $L(\boldsymbol{X})$ from input-dependent operator variation. By applying Reynolds-style equivariant averaging across local spatial transformations and low-pass smoothing, DAVE extracts a locally equivariant component that suppresses architecture-induced artifacts and yields sharper, class-consistent explanations across supervised and self-supervised ViTs as well as B-cos variants. Evaluations on ImageNet with multiple ViT backbones show improved localization and faithful attribution curves compared with strong baselines, accompanied by qualitative evidence of object-centric maps. The method offers a principled, scalable pathway to trustworthy ViT explanations and supports safer deployment of high-capacity models in critical applications, with ongoing opportunities to improve efficiency and automatically adapt the transformation group.
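The decomposition summarized above can be sketched with the product rule. This is a minimal illustration, assuming a layer can be written as an input-dependent linear operator acting on its input, $\boldsymbol{Y} = L(\boldsymbol{X})\,\boldsymbol{X}$. The symbols $\boldsymbol{Y}$, $G$, $T_g$, and $\bar{L}$ are introduced here for illustration and may differ from the paper's notation.

$$
\mathrm{d}\boldsymbol{Y} = \underbrace{L(\boldsymbol{X})\,\mathrm{d}\boldsymbol{X}}_{\text{effective transformation}} + \underbrace{\big(\mathrm{d}L(\boldsymbol{X})\big)\,\boldsymbol{X}}_{\text{operator variation}}
$$

Keeping only the first term and averaging it over a small set $G$ of local spatial transformations $T_g$ gives a Reynolds-style estimate, here read as an input-shaped map for the explained output and applied element-wise to the input:

$$
\bar{L}(\boldsymbol{X}) = \frac{1}{|G|}\sum_{g \in G} T_g^{-1}\!\left[L(T_g\boldsymbol{X})\right], \qquad \text{attribution} \approx \boldsymbol{X} \odot \bar{L}(\boldsymbol{X})
$$

The low-pass smoothing step is omitted from this sketch.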

Abstract

Vision Transformers (ViTs) have become a dominant architecture in computer vision, yet producing stable and high-resolution attribution maps for these models remains challenging. Architectural components such as patch embeddings and attention routing often introduce structured artifacts in pixel-level explanations, causing many existing methods to rely on coarse patch-level attributions. We introduce DAVE (Distribution-aware Attribution via ViT Gradient Decomposition), a mathematically grounded attribution method for ViTs based on a structured decomposition of the input gradient. By exploiting architectural properties of ViTs, DAVE isolates locally equivariant and stable components of the effective input–output mapping. It separates these from architecture-induced artifacts and other sources of instability.
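To make the idea of separating a stable component from an unstable one concrete, here is a scalar toy in Python. It is only a sketch under stated assumptions: the gain `operator(x)` and the constants `eps` and `k` are hypothetical and chosen for illustration, not taken from the paper. The layer is `y = a(x) * x`, so its derivative splits into the gain itself (a stand-in for the effective transformation) and a term from the gain changing with the input (a stand-in for operator variation).

```python
import math

# Hypothetical constants: a tiny ripple (eps) that oscillates quickly (k).
EPS = 0.01
K = 500.0


def operator(x, eps=EPS, k=K):
    """Input-dependent gain a(x): almost constant, with a small fast ripple."""
    return 1.0 + eps * math.sin(k * x)


def layer(x, eps=EPS, k=K):
    """Toy layer y = a(x) * x."""
    return operator(x, eps, k) * x


def effective_term(x, eps=EPS, k=K):
    """First product-rule term: the gain itself, a(x)."""
    return operator(x, eps, k)


def variation_term(x, eps=EPS, k=K):
    """Second product-rule term: a'(x) * x, where a'(x) = eps * k * cos(k x)."""
    return eps * k * math.cos(k * x) * x


def full_derivative(x, eps=EPS, k=K):
    """dy/dx = a(x) + a'(x) * x (sum of both terms)."""
    return effective_term(x, eps, k) + variation_term(x, eps, k)


def numeric_derivative(f, x, h=1e-6):
    """Central finite difference, used only to check the analytic split."""
    return (f(x + h) - f(x - h)) / (2.0 * h)
```

With these constants the gain never leaves the interval [0.99, 1.01], while the variation term can reach magnitude `eps * k * |x|`, which is 10 at `x = 2.0`. The full derivative is therefore dominated by the term that carries no information about the stable input-output action.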

Paper Structure (81 sections, 45 equations, 19 figures, 3 tables, 2 algorithms)

Introduction
Related works
Architecture-agnostic Attribution Methods.
ViT-Specific Methods.
Inherently interpretable Architectures.
DAVE
Extracting the effective transformation
Layer structure
Layer definition.
Layer realisation.
Layer derivative.
Effective transformation
Effective transformation representation.
Extracting an equivariant transformation
Equivariance criterion.
...and 66 more sections

Figures (19)

Figure 1: DAVE provides fine-grained, pixel-level attributions that capture class-specific structural patterns present in each object. ImageNet-1k [imagenet] samples and corresponding DAVE attributions for DeiT III-B-16/224 [touvron2022deit].
Figure 2: Attribution consistency. Under small augmentations (5° rotation, 20px horizontal and 8px vertical shift), DAVE highlights consistent features across the original and augmented images, while AttnLRP and LeGrad show inconsistent attributions (white markers), on a DeiT-III-B-16/224 model.
Figure 3: Overview of the DAVE attribution pipeline for Vision Transformers. Given an input image, DAVE samples small spatial transformations and Gaussian perturbations, computes the effective input–output transformation of the ViT for each sample, and filters out the operator-variation term. The resulting attribution operators are inverse-transformed, averaged, and applied element-wise to the input to produce the final DAVE attribution map.
Figure 4: Toy example illustrating operator variation dominating the layer derivative despite a stable effective transformation (see the layer-derivative equation). Top: full layer derivative (sum), dominated by operator variation. Middle: effective transformation with a small perturbation. Bottom: operator-variation term, where the perturbation is amplified.
Figure 5: Progressive construction of DAVE attribution. The effective transformation captures the direct input–output action of the network while discarding operator-variation terms. Equivariant aggregation suppresses architecture-induced artifacts by enforcing local transformation consistency. DAVE further applies low-pass stabilization, yielding a stable and interpretable attribution map. Columns show input, input$\times$gradient, input$\times$effective transformation, input$\times$equivariant transformation, and final DAVE attribution, respectively.
...and 14 more figures
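
The aggregation steps named in the Figure 3 caption (sample a transformation, compute a per-sample operator, inverse-transform, average, multiply element-wise with the input) can be sketched in plain Python. This is an illustration under loud assumptions, not the authors' implementation: `toy_operator` is a hypothetical stand-in for the ViT's effective transformation, circular shifts stand in for the small spatial transformations, `PATCH` is an invented grid size, and the low-pass step is left out.

```python
import random

PATCH = 4  # hypothetical patch size that defines the toy grid artifact


def shift(img, dy, dx):
    """Circularly shift a 2D list so content moves by (dy, dx)."""
    h, w = len(img), len(img[0])
    return [[img[(r - dy) % h][(c - dx) % w] for c in range(w)] for r in range(h)]


def toy_operator(img):
    """HYPOTHETICAL per-sample operator: a part that follows the image
    content plus an artifact locked to the patch grid."""
    h, w = len(img), len(img[0])
    return [
        [img[r][c] + (1.0 if r % PATCH == 0 or c % PATCH == 0 else 0.0)
         for c in range(w)]
        for r in range(h)
    ]


def dave_like_attribution(img, operator, shifts, sigma=0.0, seed=0):
    """Average inverse-transformed operators over the given shifts, then
    multiply element-wise with the input. Returns (mean_op, attribution)."""
    rng = random.Random(seed)
    h, w = len(img), len(img[0])
    acc = [[0.0] * w for _ in range(h)]
    for dy, dx in shifts:
        moved = shift(img, dy, dx)
        if sigma > 0.0:
            # Optional Gaussian perturbation of the transformed input.
            moved = [[v + rng.gauss(0.0, sigma) for v in row] for row in moved]
        op = operator(moved)
        back = shift(op, -dy, -dx)  # undo the spatial transformation
        for r in range(h):
            for c in range(w):
                acc[r][c] += back[r][c]
    n = len(shifts)
    mean_op = [[v / n for v in row] for row in acc]
    attribution = [[img[r][c] * mean_op[r][c] for c in range(w)] for r in range(h)]
    return mean_op, attribution
```

In this toy, averaging over every shift within one patch period turns the grid-locked artifact into a constant offset, while the content-following part is realigned by the inverse shift and survives unchanged. That mirrors, in a very reduced form, the caption's claim that equivariant aggregation suppresses architecture-induced artifacts.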
