Faithful Attention Explainer: Verbalizing Decisions Based on Discriminative Features

Yao Rong, David Scheerer, Enkelejda Kasneci

TL;DR

The paper addresses how to produce faithful textual explanations for an image classifier's decisions. It proposes the Faithful Attention Explainer (FAE), which grounds the generated language in the features the classifier attends to. An alignment mechanism built on a BiLSTM, together with an Attention Enforcement (AE) module, lets FAE also verbalize extrinsic attention maps such as GradCAM or human gaze. On the CUB-200-2011 and ACT-X datasets, FAE achieves strong language metrics and a high Faithful Explanation Rate (FER); AE further improves faithfulness and enables the interpretation of extrinsic attention. The results indicate that gaze-based human–AI interaction is feasible, and they show both the promise and the limitations of using external attention maps and language models for explanations.

Abstract

In recent years, model explanation methods have been designed to interpret model decisions faithfully and intuitively so that users can easily understand them. In this paper, we propose a framework, Faithful Attention Explainer (FAE), capable of generating faithful textual explanations regarding the attended-to features. Towards this goal, we deploy an attention module that takes the visual feature maps from the classifier for sentence generation. Furthermore, our method successfully learns the association between features and words, which allows a novel attention enforcement module for attention explanation. Our model achieves promising performance in caption quality metrics and a faithful decision-relevance metric on two datasets (CUB and ACT-X). In addition, we show that FAE can interpret gaze-based human attention, as human gaze indicates the discriminative features that humans use for decision-making, demonstrating the potential of deploying human gaze for advanced human-AI interaction.

Paper Structure (10 sections, 5 equations, 4 figures, 1 table)

Figures (4)

  • Figure 1: FAE generates faithful explanations (Top). Using attention enforcement, FAE generates a sentence further explaining the attended-to area in GradCAM (Bottom).
  • Figure 2: Overview of Faithful Attention Explainer. The encoder is omitted for simplicity but the output features $V^f$ and $V^i$ from the encoder are denoted. The embedding layer is used to transform words into embeddings. Left: the attention model and decoder are illustrated. The attention model produces attention $\alpha$ based on the previous sequence. Right: the attention alignment is used to produce $\hat{\alpha}$ based on the generated sequence $\hat{y}_{t:T}$, which tries to align $\alpha$ with $\hat{\alpha}$.
  • Figure 3: Illustration of using attention enforcement on CUB and MPII-ANO. Left: Images and extrinsic saliency maps are shown. Middle: Frames denote the step where enforcement is activated. Right: Sentences generated by FAE with and without attention enforcement. The top two examples use GradCAM from the classifier as extrinsic attention maps, while the bottom one uses human gaze maps.
  • Figure 4: Comparison of our method and GPT-4 in generating textual explanations.
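
The attention step named in the Figure 2 caption (an attention model producing weights $\alpha$ over the encoder features) can be illustrated with a minimal sketch. It is not the authors' implementation: the dot-product scoring, the function names `attend`, `normalize_map`, and `context_from`, and the idea of substituting a normalized extrinsic map for $\alpha$ are all assumptions made here for illustration, and the summary does not specify how FAE scores regions or how attention enforcement is applied.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_from(weights, features):
    """Weighted sum of region feature vectors under the given weights."""
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


def attend(features, query):
    """Soft attention over N region vectors (assumed dot-product scoring).

    features: list of N vectors of length D (stand-in for encoder features).
    query:    vector of length D (stand-in for the decoder state).
    Returns the attention weights alpha and the resulting context vector.
    """
    scores = [sum(f_d * q_d for f_d, q_d in zip(f, query)) for f in features]
    alpha = softmax(scores)
    return alpha, context_from(alpha, features)


def normalize_map(saliency):
    """Scale a non-negative extrinsic map (e.g. GradCAM or gaze) to sum to 1."""
    total = sum(saliency)
    if total <= 0:
        raise ValueError("saliency map must have positive mass")
    return [v / total for v in saliency]


# Hypothetical toy input: three image regions with 2-d features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
alpha, context = attend(features, query=[1.0, 0.0])

# Assumed form of enforcement: use an external map in place of alpha.
enforced = normalize_map([2.0, 1.0, 1.0])
enforced_context = context_from(enforced, features)
```

In this toy setup the decoder would receive `context` under the model's own attention and `enforced_context` when an external map is imposed. That mirrors the with/without comparison described for Figure 3 only at the level of intuition.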