Benchmarking the Attribution Quality of Vision Models

Robin Hesse, Simone Schaub-Meyer, Stefan Roth

TL;DR

The work tackles the challenge of evaluating attribution quality in vision models by introducing the In-Domain Single-Deletion Score (IDSDS), which aligns train and test domains and enables inter-model comparisons on natural images. It systematically assesses 23 attribution methods across ImageNet backbones, finding that intrinsically explainable models substantially outperform standard ones and that raw attribution values can outperform their absolute counterparts under IDSDS. The study also shows that network design choices (e.g., depth, width, removal of batch norm (BN) layers and bias terms, and pre- vs. post-softmax attribution) consistently influence attribution quality and confirms an accidental yet observable accuracy-explainability trade-off in large-scale models. Overall, IDSDS provides a principled framework for fair, cross-model attribution evaluation and offers practical guidance for designing more explainable vision architectures.

Abstract

Attribution maps are one of the most established tools to explain the functioning of computer vision models. They assign importance scores to input features, indicating how relevant each feature is for the prediction of a deep neural network. While much research has gone into proposing new attribution methods, their proper evaluation remains a difficult challenge. In this work, we propose a novel evaluation protocol that overcomes two fundamental limitations of the widely used incremental-deletion protocol, i.e., the out-of-domain issue and lacking inter-model comparisons. This allows us to evaluate 23 attribution methods and how different design choices of popular vision backbones affect their attribution quality. We find that intrinsically explainable models outperform standard models and that raw attribution values exhibit a higher attribution quality than what is known from previous work. Further, we show consistent changes in the attribution quality when varying the network design, indicating that some standard design choices promote attribution quality.

Paper Structure (16 sections, 1 equation, 10 figures, 7 tables)

Figures (10)

  • Figure 1: Illustration of our in-domain single-deletion score (IDSDS) for evaluating the correctness of attribution maps. We obtain "ground-truth" importance scores for each non-overlapping image patch by feeding images with deleted patches (shown in white) through the model under inspection and measuring the drop in the model’s logit for the target class. The larger the drop (denoted by numbers and arrow width), the more important the patch is for the model. Next, we divide the attribution map into the corresponding patches and measure the attribution sum per patch. Finally, we obtain our IDSDS by computing the Spearman rank-order correlation between the output drops and the corresponding patch-attribution sums. To ensure that all image interventions are in-domain, we fine-tune the model under inspection on images with deleted patches before the evaluation.
  • Figure 2: (a) IDSDS on ImageNet for attribution methods using a ResNet-50 and the considered intrinsically explainable models. Please refer to the paper's ranking section for an interpretation of the results. (b) Comparison of the incremental-deletion score (IDS) when computing a fixed attribution for the original input (top) versus when updating the attribution in each deletion step (bottom). The raw attributions for I$\times$G, IG, and IG-U perform better for the second setup. (c) Comparison of existing evaluation protocols. We compare IDSDS to the incremental-deletion protocol (IDS; Samek et al., 2017), the OOD single-deletion protocol (SDS; Selvaraju et al., 2017), and FunnyBirds (Hesse et al., 2023). Notably, the change between SDS and IDSDS indicates that aligning the training and testing domains is important; IDS is the only protocol strictly preferring absolute over raw attributions, and the best baseline image changes between real images and synthetic images from FunnyBirds. For better readability, we provide numerical values in the paper's appendix.
  • Figure 3: (a) Comparison of model architectures: VGG-16 (Simonyan and Zisserman, 2015), ResNet-50 (He et al., 2016), and ViT-B/16 (Dosovitskiy et al., 2021). Compared to ResNet-50, attribution methods achieve a higher IDSDS on VGG-16 and a lower IDSDS on ViT-B/16. (b) Comparison of network depths. The IDSDS decreases with increasing depth.
  • Figure 4: (a) Comparison of different widths. Almost all attribution methods achieve a lower IDSDS for the wide (W) ResNet-50 (Zagoruyko and Komodakis, 2016), indicating that the increased width impedes attribution correctness. (b) Comparison of pre- and post-softmax attribution maps. Computing the attribution for a ResNet-50 after the final softmax layer reduces the IDSDS of almost every attribution method.
  • Figure 5: (a) Comparison of batch norm (BN) and the bias term. Removing the BN layers and all bias terms positively affects the IDSDS. (b) IDSDS over accuracy. We plot the best IDSDS of each model over the top-1 ImageNet accuracy. The mark indicates the respective best attribution method.
  • ...and 5 more figures
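
The scoring step described in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration for a single image, not the authors' implementation: the names `idsds`, `spearman`, and `average_ranks`, the list-of-lists image format, and the `fill` value used to delete a patch are all assumptions made for the example. The fine-tuning on images with deleted patches, which the caption says precedes the evaluation, is omitted here.

```python
from typing import Callable, List

Image = List[List[float]]  # hypothetical single-channel image as nested lists


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman rank-order correlation (Pearson correlation of the ranks)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one ranking is constant
    return cov / (vx * vy) ** 0.5


def idsds(logit_fn: Callable[[Image], float], image: Image,
          attribution: Image, patch: int, fill: float = 0.0) -> float:
    """Correlate per-patch logit drops with per-patch attribution sums."""
    height, width = len(image), len(image[0])
    base_logit = logit_fn(image)
    drops, attr_sums = [], []
    for r0 in range(0, height, patch):          # non-overlapping patches
        for c0 in range(0, width, patch):
            deleted = [row[:] for row in image]  # copy, then delete one patch
            attr_sum = 0.0
            for r in range(r0, min(r0 + patch, height)):
                for c in range(c0, min(c0 + patch, width)):
                    deleted[r][c] = fill
                    attr_sum += attribution[r][c]
            drops.append(base_logit - logit_fn(deleted))
            attr_sums.append(attr_sum)
    return spearman(drops, attr_sums)


# Toy usage: a linear "model" whose target logit is sum(w * x).
weights = [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
image = [[1.0] * 4 for _ in range(4)]


def toy_logit(img: Image) -> float:
    return sum(w * x for wr, xr in zip(weights, img) for w, x in zip(wr, xr))


# Input-times-gradient attribution of a linear model is w * x.
attribution = [[w * x for w, x in zip(wr, xr)] for wr, xr in zip(weights, image)]
score = idsds(toy_logit, image, attribution, patch=2)
```

In this toy setup the attribution ranks the four patches in the same order as their logit drops, so `score` is 1.0, and negating the attribution flips it to -1.0. Because the score depends only on the ranking of patches, it is insensitive to the scale of the attribution values but sensitive to their sign, which is one way to see why raw and absolute attributions can score differently.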

Theorems & Definitions (1)

  • proof