Explanation Bottleneck Models

Shin'ya Yamaguchi, Kosuke Nishida

TL;DR

This paper proposes explanation bottleneck models (XBMs), interpretable deep neural networks that generate a text explanation from the input without pre-defined concepts and then predict the final task label from that explanation, leveraging pre-trained vision-language encoder-decoder models.

Abstract

Recent concept-based interpretable models have succeeded in providing meaningful explanations through pre-defined concept sets. However, the dependency on pre-defined concepts restricts their application because the number of concepts available for explanations is limited. This paper proposes a novel interpretable deep neural network called explanation bottleneck models (XBMs). XBMs generate a text explanation from the input without pre-defined concepts and then predict the final task label based on the generated explanation by leveraging pre-trained vision-language encoder-decoder models. To achieve both target task performance and explanation quality, we train XBMs with the target task loss and a regularization term that penalizes the explanation decoder via distillation from the frozen pre-trained decoder. Our experiments, including a comparison to state-of-the-art concept bottleneck models, confirm that XBMs provide accurate and fluent natural language explanations without pre-defined concept sets. Code is available at https://github.com/yshinya6/xbm/.

Paper Structure (35 sections, 5 equations, 4 figures, 11 tables, 1 algorithm)

Figures (4)

  • Figure 1: Explanation bottleneck models (XBMs). We propose an interpretable model that generates text explanations for the input embedding with respect to target tasks and then predicts final task labels from the explanations.
  • Figure 2: Training of XBMs. An XBM is optimized by the target task loss with explanation distillation. Explanation distillation leverages a reference explanation $\bm{e}_\mathrm{p}$ generated from a pre-trained text decoder $g_{\phi_\mathrm{p}}$ for penalizing the output distribution of an explanation decoder $g_\phi$ to maintain the interpretable text generation capability of $g_\phi$.
  • Figure 3: Explanation styles provided by XBMs. XBMs can output (i) a text explanation generated directly by the explanation decoder, (ii) concept phrases with self-attention scores in the classifier, and (iii) cross-attention heatmaps for the entire text explanation and for each concept phrase. Concept phrases are constructed by a natural language parser, and the self-attention scores are computed in a middle layer of the classifier with respect to the [CLS] token for each concept phrase. Cross-attention heatmaps visualize the cross-attention scores between input text tokens and image embedding tokens in the middle layer of the multi-modal classifier (redder indicates a higher score).
  • Figure 4: Transition of XBM's explanation outputs during training (please zoom in).
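
The training objective summarized in the abstract and in Figure 2 (a target task loss plus an explanation distillation penalty) can be illustrated with a small sketch. This is not the authors' implementation: it assumes the task loss is a softmax cross-entropy, that the distillation penalty is the mean token-level cross-entropy of the explanation decoder's output distribution against the tokens of the reference explanation $\bm{e}_\mathrm{p}$, and that the two terms are combined with a weight `lam`. The function names, the weight, and the toy inputs are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-probability of the target index under softmax(logits)."""
    return -math.log(softmax(logits)[target_index])


def xbm_style_loss(class_logits, label, decoder_logits, reference_tokens, lam=1.0):
    """Toy combined objective: task loss + lam * distillation penalty.

    class_logits:     classifier logits computed from the generated explanation
    label:            index of the ground-truth class
    decoder_logits:   one logit vector per explanation token, from the
                      trainable explanation decoder
    reference_tokens: token ids of the reference explanation produced by the
                      frozen pre-trained decoder
    lam:              hypothetical weight on the penalty (an assumption)
    """
    if len(decoder_logits) != len(reference_tokens):
        raise ValueError("one logit vector is required per reference token")
    task = cross_entropy(class_logits, label)
    # Assumed form of the penalty: mean token-level cross-entropy that pulls
    # the trainable decoder's distribution toward the reference tokens.
    distill = sum(
        cross_entropy(step, tok)
        for step, tok in zip(decoder_logits, reference_tokens)
    ) / len(reference_tokens)
    return task + lam * distill, task, distill


# Toy usage: 2 classes, a 4-token vocabulary, a 3-token reference explanation.
total, task, distill = xbm_style_loss(
    class_logits=[0.0, 0.0],
    label=0,
    decoder_logits=[[0.0, 0.0, 0.0, 0.0]] * 3,
    reference_tokens=[1, 2, 3],
    lam=0.5,
)
```

With uniform logits, the task term equals log 2 and the penalty equals log 4, so the toy total is log 2 + 0.5 · log 4. In a real model the penalty would be computed on the decoder's actual token distributions, and the reference explanation would come from the frozen pre-trained decoder $g_{\phi_\mathrm{p}}$.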