On the Faithfulness of Vision Transformer Explanations

Junyi Wu, Weitai Kang, Hao Tang, Yuan Hong, Yan Yan

TL;DR

This work introduces the Salience-guided Faithfulness Coefficient (SaCo), a novel evaluation metric that leverages essential information in the salience distribution, and demonstrates that gradients and multi-layer aggregation can markedly enhance the faithfulness of attention-based explanations, pointing to potential paths for advancing Vision Transformer explainability.

Abstract

To interpret Vision Transformers, post-hoc explanations assign salience scores to input pixels, providing human-understandable heatmaps. However, whether these interpretations reflect the true rationales behind the model's output is still underexplored. To address this gap, we study the faithfulness criterion of explanations: the assigned salience scores should represent the influence of the corresponding input pixels on the model's predictions. To evaluate faithfulness, we introduce the Salience-guided Faithfulness Coefficient (SaCo), a novel evaluation metric leveraging essential information of the salience distribution. Specifically, we conduct pair-wise comparisons among distinct pixel groups and then aggregate the differences in their salience scores, resulting in a coefficient that indicates the explanation's degree of faithfulness. Our explorations reveal that current metrics struggle to differentiate between advanced explanation methods and Random Attribution, thereby failing to capture the faithfulness property. In contrast, our proposed SaCo offers a reliable faithfulness measurement, establishing a robust metric for interpretations. Furthermore, SaCo demonstrates that the use of gradients and multi-layer aggregation can markedly enhance the faithfulness of attention-based explanations, shedding light on potential paths for advancing Vision Transformer explainability.
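The pair-wise comparison and aggregation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `saco_sketch`, its inputs (one summed salience score and one confidence drop per pixel group), the sign rule, and the normalization by total absolute weight are all assumptions made for illustration.

```python
from typing import Sequence


def saco_sketch(group_salience: Sequence[float],
                confidence_drop: Sequence[float]) -> float:
    """Illustrative faithfulness coefficient in [-1, 1] (assumed formulation).

    group_salience[k]  -- summed salience of pixel group k (hypothetical input)
    confidence_drop[k] -- drop in model confidence when only group k is perturbed
    """
    if len(group_salience) != len(confidence_drop):
        raise ValueError("one confidence drop is needed per pixel group")

    # Order the groups from most to least salient.
    order = sorted(range(len(group_salience)),
                   key=lambda k: group_salience[k], reverse=True)
    s = [group_salience[k] for k in order]
    d = [confidence_drop[k] for k in order]

    score = 0.0
    total = 0.0
    for i in range(len(s) - 1):
        for j in range(i + 1, len(s)):
            gap = s[i] - s[j]  # non-negative because of the ordering
            # Reward the pair if the more salient group also has the larger
            # influence on the prediction; otherwise penalize it by the same gap.
            score += gap if d[i] >= d[j] else -gap
            total += abs(gap)

    # Assumed convention: return 0 when all groups have equal salience.
    return score / total if total > 0 else 0.0
```

Under these assumptions, an explanation whose salience ordering matches the ordering of confidence drops scores 1, a fully reversed ordering scores -1, and violations between groups with a large salience gap cost more than violations between groups with similar salience.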

Paper Structure (16 sections, 3 equations, 4 figures, 2 tables, 1 algorithm)

Figures (4)

  • Figure 1: Explanation result and illustration of two perturbation manners: cumulative perturbation and our SaCo perturbation. Previous metrics perturb the pixel subsets cumulatively. In contrast, SaCo perturbs them individually to directly compare their influences.
  • Figure 2: Correlations between sample rankings w.r.t. our SaCo and existing metrics.
  • Figure 3: Illustration of three explanations for the predicted class 'linnet', salience score distributions, changes in model's confidence caused by perturbation, and final SaCo and AOPC scores.
  • Figure 4: Evaluation results for advanced explanation methods and Random Attribution (red). Three graphs present results on CIFAR-10 (left), CIFAR-100 (Krizhevsky, 2009) (middle), and ImageNet (Russakovsky et al., 2015) (right), respectively. The values on each axis have been rescaled so that a larger distance from the center consistently signifies superior performance. Enlarged graphs are provided in the supplementary material for better clarity.
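
The two perturbation manners contrasted in the Figure 1 caption can be sketched as index sets of pixels to perturb at each step. This is a hedged illustration only: the helper names (`split_into_groups`, `cumulative_masks`, `individual_masks`), the salience-ordered equal-size grouping, and the flat pixel indexing are assumptions, and the actual perturbation of pixel values is left out.

```python
from typing import List, Sequence


def split_into_groups(salience: Sequence[float], num_groups: int) -> List[List[int]]:
    """Order pixel indices by descending salience and cut them into equal groups.

    Assumption: trailing pixels that do not fill a whole group are dropped.
    """
    order = sorted(range(len(salience)), key=lambda p: salience[p], reverse=True)
    size = len(order) // num_groups
    return [order[g * size:(g + 1) * size] for g in range(num_groups)]


def cumulative_masks(groups: List[List[int]]) -> List[List[int]]:
    """Step k perturbs groups 1..k together (the manner of previous metrics)."""
    masks: List[List[int]] = []
    seen: List[int] = []
    for group in groups:
        seen = seen + group
        masks.append(sorted(seen))
    return masks


def individual_masks(groups: List[List[int]]) -> List[List[int]]:
    """Step k perturbs only group k, so each group's influence is isolated."""
    return [sorted(group) for group in groups]


# Toy example with four pixels and two groups.
toy_groups = split_into_groups([0.1, 0.9, 0.4, 0.6], num_groups=2)
toy_cumulative = cumulative_masks(toy_groups)
toy_individual = individual_masks(toy_groups)
```

With cumulative masks, the effect measured at a later step mixes the influence of every earlier group; with individual masks, each measurement can be attributed to a single group, which is what makes direct pair-wise comparison possible.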