DAVE: Distribution-aware Attribution via ViT Gradient Decomposition
Adam Wróbel, Siddhartha Gairola, Jacek Tabor, Bernt Schiele, Bartosz Zieliński, Dawid Rymarczyk
TL;DR
DAVE addresses the challenge of producing stable, high-resolution attributions for Vision Transformers by performing a structured gradient decomposition that isolates the distribution-aware effective transformation $L(\boldsymbol{X})$ from input-dependent operator variation. By applying Reynolds-style equivariant averaging over local spatial transformations, together with low-pass smoothing, DAVE extracts a locally equivariant component that suppresses architecture-induced artifacts and yields sharper, class-consistent explanations for supervised and self-supervised ViTs as well as B-cos variants. Evaluations on ImageNet with multiple ViT backbones show improved localization and more faithful attribution curves than strong baselines, together with qualitative evidence of object-centric maps. The method offers a principled, scalable path to trustworthy ViT explanations and supports safer deployment of high-capacity models in critical applications; open directions include improving efficiency and adapting the transformation group automatically.
Abstract
Vision Transformers (ViTs) have become a dominant architecture in computer vision, yet producing stable, high-resolution attribution maps for these models remains challenging. Architectural components such as patch embeddings and attention routing often introduce structured artifacts into pixel-level explanations, so many existing methods fall back on coarse patch-level attributions. We introduce DAVE (Distribution-aware Attribution via ViT Gradient Decomposition), a mathematically grounded attribution method for ViTs based on a structured decomposition of the input gradient. By exploiting architectural properties of ViTs, DAVE isolates locally equivariant, stable components of the effective input-output mapping and separates them from architecture-induced artifacts and other sources of instability.
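The abstract describes separating locally equivariant components of the input gradient from architecture-induced artifacts, and the TL;DR names Reynolds-style averaging over local spatial transformations plus low-pass smoothing as the mechanism. The NumPy sketch below illustrates only that generic idea and is not the authors' implementation. Its assumptions are all hypothetical: the transformations are circular pixel shifts covering one patch period per axis, a toy quadratic score with a fixed patch-boundary `grid` stands in for a ViT and its artifacts, the smoothing is a plain box filter applied after the averaging, and names such as `reynolds_average`, `grad_fn` and `box_smooth` are illustrative.

```python
import numpy as np


def reynolds_average(grad_fn, x, shifts):
    """Illustrative sketch (not DAVE's code): mean over s of T_s^{-1}(grad_fn(T_s x)).

    T_s is a circular 2-D pixel shift (np.roll); T_s^{-1} is the opposite roll.
    """
    acc = np.zeros(x.shape, dtype=float)
    for dy, dx in shifts:
        x_shifted = np.roll(x, shift=(dy, dx), axis=(0, 1))   # T_s x
        g = grad_fn(x_shifted)                                # gradient at the transformed input
        acc += np.roll(g, shift=(-dy, -dx), axis=(0, 1))      # map back with T_s^{-1}
    return acc / len(shifts)


def box_smooth(a, radius=1):
    """Assumed low-pass step: mean over a (2r+1) x (2r+1) window, circular boundary."""
    out = np.zeros(a.shape, dtype=float)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            out += np.roll(a, shift=(dy, dx), axis=(0, 1))
    return out / (2 * radius + 1) ** 2


# Hypothetical toy setup (not from the paper): 16x16 "image", patch size 4.
H, W, P = 16, 16, 4
rows, cols = np.indices((H, W))
# Artifact locked to patch boundaries: it does NOT move when the input moves.
grid = 0.5 * ((rows % P == 0) | (cols % P == 0)).astype(float)


def grad_fn(z):
    # Gradient of the toy score f(z) = 0.5 * sum((1 + grid) * z**2).
    return z * (1.0 + grid)


x = np.zeros((H, W))
x[4:12, 4:12] = 1.0                                        # the "object"

raw = grad_fn(x)                                           # object plus grid lines
shifts = [(dy, dx) for dy in range(P) for dx in range(P)]  # one patch period per axis
equivariant = reynolds_average(grad_fn, x, shifts)         # grid replaced by its per-period mean
residual = raw - equivariant                               # part tied to fixed pixel positions
smoothed = box_smooth(equivariant, radius=1)

obj = (slice(4, 12), slice(4, 12))
print(raw[obj].max() - raw[obj].min(), equivariant[obj].max() - equivariant[obj].min())
# prints: 0.5 0.0
```

Because the object moves with the input while the toy artifact stays at fixed pixel positions, shifting, differentiating and shifting back keeps the object sharp and turns the grid into a constant offset; `residual` is what the averaging removed. Circular shifts are used only to sidestep boundary handling in this sketch.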
