On the Concept Trustworthiness in Concept Bottleneck Models

Qihan Huang, Jie Song, Jingwen Hu, Haofei Zhang, Yong Wang, Mingli Song

TL;DR

A new metric, the concept trustworthiness score, is proposed to gauge whether the concepts in CBMs are derived from relevant image regions. An enhanced CBM is also introduced that predicts each concept from a distinct part of the feature map, which makes it possible to inspect the regions each concept relies on.

Abstract

Concept Bottleneck Models (CBMs), which break down the reasoning process into an input-to-concept mapping and a concept-to-label prediction, have attracted significant attention because the concept bottleneck makes their predictions interpretable. However, although the concept-to-label prediction is transparent, the mapping from the input to the intermediate concepts remains a black box, which raises concerns about the trustworthiness of the learned concepts (i.e., these concepts may be predicted from spurious cues). Concept untrustworthiness greatly hampers the interpretability of CBMs and thereby hinders their further advancement. To analyze this issue comprehensively, in this study we establish a benchmark to assess the trustworthiness of concepts in CBMs. A pioneering metric, referred to as the concept trustworthiness score, is proposed to gauge whether the concepts are derived from relevant regions. Additionally, an enhanced CBM is introduced in which concept predictions are made from distinct parts of the feature map, which facilitates the exploration of their related regions. We also introduce three modules, namely the cross-layer alignment (CLA) module, the cross-image alignment (CIA) module, and the prediction alignment (PA) module, to further enhance concept trustworthiness within the enhanced CBM. Experiments on five datasets across ten architectures demonstrate that, without using any concept localization annotations during training, our model improves concept trustworthiness by a large margin while achieving higher accuracy than state-of-the-art methods. Our code is available at https://github.com/hqhQAQ/ProtoCBM.
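To make the idea of "concepts derived from relevant regions" concrete, the following is a minimal sketch of one way such a check could be scored. It is not the paper's definition of the concept trustworthiness score: the peak-in-box criterion, the box format, and the names `peak_location`, `peak_in_box`, and `trust_fraction` are all assumptions made for illustration.

```python
from typing import List, Tuple

Heatmap = List[List[float]]      # H x W localization map for one concept
Box = Tuple[int, int, int, int]  # (row_min, col_min, row_max, col_max), inclusive


def peak_location(heatmap: Heatmap) -> Tuple[int, int]:
    """Return the (row, col) of the largest value in the map."""
    best, best_val = (0, 0), float("-inf")
    for r, row in enumerate(heatmap):
        for c, value in enumerate(row):
            if value > best_val:
                best_val, best = value, (r, c)
    return best


def peak_in_box(heatmap: Heatmap, box: Box) -> bool:
    """Assumed criterion: the map's peak must lie inside the annotated region."""
    r, c = peak_location(heatmap)
    r0, c0, r1, c1 = box
    return r0 <= r <= r1 and c0 <= c <= c1


def trust_fraction(samples: List[Tuple[Heatmap, Box]]) -> float:
    """Fraction of (map, region) pairs that satisfy the assumed criterion."""
    if not samples:
        return 0.0
    return sum(peak_in_box(h, b) for h, b in samples) / len(samples)


# Toy data: a 3x3 grid where the "head" occupies the top-left 2x2 cells.
head_box = (0, 0, 1, 1)
map_on_head = [[0.9, 0.2, 0.0], [0.1, 0.1, 0.0], [0.0, 0.0, 0.1]]
map_on_belly = [[0.0, 0.1, 0.0], [0.0, 0.1, 0.0], [0.1, 0.8, 0.2]]
score = trust_fraction([(map_on_head, head_box), (map_on_belly, head_box)])
```

In this toy example one of the two maps peaks inside the head region, so `score` is 0.5. A map that peaks on the belly for a forehead concept is the kind of spurious cue the abstract describes.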

Paper Structure (24 sections, 7 equations, 6 figures, 2 tables)

Figures (6)

  • Figure 1: Untrustworthiness of a concept named "forehead::gray" in the vanilla CBM. (a) The input image. (b) The localization map of this concept (generated by Grad-CAM) concentrates on the underparts of the bird rather than the forehead. (c) After the head is removed from the image, the prediction probability for this concept changes only slightly.
  • Figure 2: The calculation of concept trustworthiness score of a concept $c$ about the forehead of the bird.
  • Figure 3: Overview of our proposed model (only three concepts are presented for brevity). Given the input image $x$, the "Backbone" extracts different layers of features for $x$. The CLA module spatially aligns the last feature map $f(x)$ (also denoted as $z_d$) with the input image according to the shallow feature map $z_s$ in a multi-scale manner. Meanwhile, the CIA module spatially aligns $f(x)$ with $x$ by aligning $f(x)$ with the feature map of another augmented image. Next, $f(x)$ is aggregated with multiple prototypes, generating the localization maps $l_{\boldsymbol{\mathrm{p}}}(x)$ and activation values $a_{\boldsymbol{\mathrm{p}}}(x)$ of prototypes. Finally, the activation values $a_{\boldsymbol{\mathrm{p}}}(x)$ are fed into "Concept predictor" and "Category Predictor" for concept prediction and category prediction, respectively. The PA loss is used to supervise the localization maps of the learned concepts. Note that the loss back propagation of $\mathcal{L}_{\rm concept}$ and $\mathcal{L}_{\rm task}$ is omitted for simplicity in the figure.
  • Figure 4: The concept prediction accuracy of our model is similar to that of the base model before the related regions are dropped, but it decreases significantly after they are dropped ("Ours (Drop)").
  • Figure 5: The localization maps of two concepts ("forehead::black" and "belly::white") generated by our model, which accurately locate the related regions of the concepts.
  • ...and 1 more figure
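The Figure 3 caption says that $f(x)$ is aggregated with prototypes to produce a localization map $l_{\boldsymbol{\mathrm{p}}}(x)$ and an activation value $a_{\boldsymbol{\mathrm{p}}}(x)$ per prototype, but it does not state the aggregation rule. The sketch below assumes a common prototype-network choice (cosine similarity at each spatial location, followed by a spatial maximum); this choice and the helper names are assumptions, not the authors' confirmed implementation.

```python
import math
from typing import List

Vector = List[float]
FeatureMap = List[List[Vector]]  # H x W grid of D-dimensional feature vectors


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def localization_map(fmap: FeatureMap, prototype: Vector) -> List[List[float]]:
    """Assumed l_p(x): similarity between the prototype and each location of f(x)."""
    return [[cosine(vec, prototype) for vec in row] for row in fmap]


def activation(loc_map: List[List[float]]) -> float:
    """Assumed a_p(x): the spatial maximum of the localization map."""
    return max(max(row) for row in loc_map)


# Toy 2x2 feature map with 2-dimensional features and one prototype.
fmap = [[[1.0, 0.0], [0.0, 1.0]],
        [[1.0, 1.0], [0.0, 0.0]]]
prototype = [1.0, 0.0]
loc = localization_map(fmap, prototype)
act = activation(loc)
```

Under these assumptions the localization map shows where in the image a prototype fires, and the activation value is the scalar passed on to the concept and category predictors.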