Benchmarking the Attribution Quality of Vision Models
Robin Hesse, Simone Schaub-Meyer, Stefan Roth
TL;DR
The work tackles the challenge of evaluating attribution quality in vision models by introducing the In-Domain Single-Deletion Score (IDSDS), which aligns the train and test domains and enables inter-model comparisons on natural images. It systematically assesses 23 attribution methods across ImageNet backbones, finding that intrinsically explainable models substantially outperform standard ones and that, under IDSDS, raw attribution values can outperform their absolute counterparts. The study also shows that network design choices (e.g., depth, width, removing batch normalization (BN) or bias terms, and pre- vs. post-softmax outputs) consistently influence attribution quality, and it confirms an accidental yet observable accuracy-explainability trade-off in large-scale models. Overall, IDSDS provides a principled framework for fair, cross-model attribution evaluation and offers practical guidance for designing more explainable vision architectures.
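As its name suggests, IDSDS builds on single deletions: each image region is removed on its own rather than cumulatively. The sketch below is a minimal illustration that assumes the score is the Spearman rank correlation between two quantities: the output drop caused by deleting each patch individually, and the summed attribution inside that patch. The in-domain part of the protocol is omitted entirely; that part makes deleted images in-domain for the model, for example by training on such images. The toy linear model, the gradient × input attribution, the zero baseline, and the helper names (`single_deletion_score`, `patch_sums`, `spearman`) are all hypothetical. The sketch also shows why raw attributions can beat absolute ones here: deleting a patch can raise the output as well as lower it, and only signed attributions can track that.

```python
import numpy as np


def spearman(a, b):
    """Spearman rank correlation; assumes no ties (true almost surely for continuous data)."""
    rank_a = np.argsort(np.argsort(a)).astype(float)
    rank_b = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(rank_a, rank_b)[0, 1])


def patch_sums(arr, p):
    """Sum an (H, W) array over non-overlapping p x p patches, in row-major order.
    Assumes H and W are divisible by p."""
    h, w = arr.shape
    return arr.reshape(h // p, p, w // p, p).sum(axis=(1, 3)).ravel()


def single_deletion_score(model, x, attribution, p, baseline=0.0):
    """Hypothetical single-deletion score: delete each p x p patch on its own,
    record the output drop, and rank-correlate the drops with per-patch attribution sums."""
    h, w = x.shape
    full_output = model(x)
    drops = []
    for i in range(0, h, p):          # patch rows
        for j in range(0, w, p):      # patch columns (same order as patch_sums)
            x_del = x.copy()
            x_del[i:i + p, j:j + p] = baseline
            drops.append(full_output - model(x_del))
    return spearman(patch_sums(attribution, p), np.array(drops))


# Hypothetical toy setup: a linear "logit" on an 8x8 single-channel image.
rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 8))
x = rng.normal(size=(8, 8))


def model(img):
    return float((weights * img).sum())


raw_attr = weights * x           # gradient x input, exact for a linear model
abs_attr = np.abs(raw_attr)      # absolute variant discards the sign

print(single_deletion_score(model, x, raw_attr, p=2))  # 1.0 up to floating-point error
print(single_deletion_score(model, x, abs_attr, p=2))  # lower: signs of the drops are lost
```

For a linear model with a zero baseline, each output drop equals the patch's raw attribution sum, so the raw score is perfect by construction. Real networks are nonlinear, and there the score measures how well an attribution method approximates this behaviour.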
Abstract
Attribution maps are one of the most established tools to explain the functioning of computer vision models. They assign importance scores to input features, indicating how relevant each feature is for the prediction of a deep neural network. While much research has gone into proposing new attribution methods, their proper evaluation remains a difficult challenge. In this work, we propose a novel evaluation protocol that overcomes two fundamental limitations of the widely used incremental-deletion protocol, namely the out-of-domain issue and the lack of inter-model comparisons. This allows us to evaluate 23 attribution methods and how different design choices of popular vision backbones affect their attribution quality. We find that intrinsically explainable models outperform standard models and that raw attribution values exhibit a higher attribution quality than previous work suggests. Further, we show consistent changes in attribution quality when varying the network design, indicating that some standard design choices promote attribution quality.
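For contrast, the incremental-deletion protocol can be sketched as follows. Pixels are removed cumulatively in order of decreasing attribution while the model output is tracked, and a faster drop (a smaller area under the curve) is read as a better attribution. This is a minimal sketch under illustrative assumptions: the zero baseline, the chunking into `steps`, the trapezoidal area, the random-ranking reference, the toy linear model, and the helper names are all hypothetical, not the paper's setup. The sketch does show where the out-of-domain issue comes from. After a few steps the input consists mostly of baseline values, unlike the images a real network was trained on, so the measured drop can partly reflect that distribution shift. The linear toy model is immune to this shift and only illustrates the mechanics.

```python
import numpy as np


def incremental_deletion_curve(model, x, attribution, steps, baseline=0.0):
    """Delete pixels in order of decreasing attribution, in `steps` chunks,
    and record the output after each chunk (index 0 = unmodified input)."""
    order = np.argsort(-attribution.ravel())          # most important pixel first
    chunk = int(np.ceil(order.size / steps))
    x_flat = x.ravel().copy()
    outputs = [model(x_flat.reshape(x.shape))]
    for k in range(steps):
        x_flat[order[k * chunk:(k + 1) * chunk]] = baseline   # cumulative deletion
        outputs.append(model(x_flat.reshape(x.shape)))
    return np.array(outputs)


def area_under_curve(outputs):
    """Trapezoidal area with the deleted fraction on a uniform [0, 1] x-axis."""
    return float(((outputs[:-1] + outputs[1:]) / 2).mean())


# Hypothetical toy setup: a linear "logit" on an 8x8 single-channel image.
rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 8))
x = rng.normal(size=(8, 8))


def model(img):
    return float((weights * img).sum())


attr_curve = incremental_deletion_curve(model, x, weights * x, steps=8)
random_curve = incremental_deletion_curve(model, x, rng.random((8, 8)), steps=8)
print(area_under_curve(attr_curve), area_under_curve(random_curve))  # lower = faster drop
```

Because each deletion here is cumulative, the later inputs drift far from natural images. Deleting patches one at a time, as in the single-deletion setting, keeps every perturbed input close to the original.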
