Do Concept Bottleneck Models Learn as Intended?

Andrei Margeloiu, Matthew Ashman, Umang Bhatt, Yanzhi Chen, Mateja Jamnik, Adrian Weller

TL;DR

This paper investigates whether constraining prediction to pass through a predefined concept layer, $f(g(\boldsymbol{x}))$, where $g$ maps inputs to concepts and $f$ maps concepts to targets, yields genuine interpretability, predictability, and intervenability. The authors compare independent, sequential, and joint training regimes and assess them with post hoc interpretability methods on the Osteoarthritis Initiative (OAI) and Caltech-UCSD Birds (CUB) datasets. They find that the joint objective often allows the model to use information about the target beyond the bottleneck, and that the learned concepts do not map to semantically meaningful regions of input space, whereas the independent variant may satisfy the desiderata under the current analysis. The work challenges the practical utility of CBMs in their current form and motivates redesigned concept representations and validation techniques that ensure concepts capture input-relevant structure.
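To make the three training regimes concrete, here is a minimal NumPy sketch of the losses each one minimizes. It is an illustration under stated assumptions, not the authors' implementation: the linear-sigmoid forms of `g` and `f`, the binary cross-entropy losses, the weight names `W_g` and `w_f`, and the trade-off weight `lam` are all hypothetical choices made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(p, t, eps=1e-12):
    """Mean binary cross-entropy between probabilities p and 0/1 targets t."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(t * np.log(p) + (1.0 - t) * np.log(1.0 - p)))

def g(x, W_g):
    """Concept predictor: inputs -> concept probabilities (toy linear-sigmoid, assumed)."""
    return sigmoid(x @ W_g)

def f(c, w_f):
    """Target predictor: concepts -> target probability (toy linear-sigmoid, assumed)."""
    return sigmoid(c @ w_f)

def concept_loss(x, c, W_g):
    """Loss used to fit g on annotated concepts c (shared by all three regimes)."""
    return bce(g(x, W_g), c)

def independent_target_loss(c, y, w_f):
    """Independent: f is fit on the ground-truth concepts, never seeing g's output."""
    return bce(f(c, w_f), y)

def sequential_target_loss(x, y, W_g, w_f):
    """Sequential: g is fit first and frozen; f is then fit on the predicted concepts g(x)."""
    return bce(f(g(x, W_g), w_f), y)

def joint_loss(x, c, y, W_g, w_f, lam=1.0):
    """Joint: f and g are optimized together on target loss plus lam * concept loss."""
    return bce(f(g(x, W_g), w_f), y) + lam * concept_loss(x, c, W_g)

# Tiny synthetic example: 8 inputs, 5 features, 3 binary concepts, 1 binary target.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 5))
c = (rng.random((8, 3)) > 0.5).astype(float)
y = (rng.random(8) > 0.5).astype(float)
W_g = rng.normal(size=(5, 3))
w_f = rng.normal(size=3)

print("independent:", independent_target_loss(c, y, w_f))
print("sequential: ", sequential_target_loss(x, y, W_g, w_f))
print("joint:      ", joint_loss(x, c, y, W_g, w_f, lam=1.0))
```

In this sketch only the joint loss sends a target gradient into `W_g`, which is one way to see how target information can enter the concept layer under joint training.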

Abstract

Concept bottleneck models map from raw inputs to concepts, and then from concepts to targets. Such models aim to incorporate pre-specified, high-level concepts into the learning procedure, and have been motivated to meet three desiderata: interpretability, predictability, and intervenability. However, we find that concept bottleneck models struggle to meet these goals. Using post hoc interpretability methods, we demonstrate that concepts do not correspond to anything semantically meaningful in input space, thus calling into question the usefulness of concept bottleneck models in their current form.

Paper Structure

This paper contains 9 sections, 8 figures, 1 table.

Figures (8)

  • Figure 1: Intervenability on joint and sequential CBMs is better than that of the independent CBM (reproduced from Figure 4 of Koh et al., 2020).
  • Figure 2: Post hoc comparison between the joint and independent CBMs for the concept "wing pattern." Columns 2-4 show the saliency map for each discrete value of the concept: solid, spotted, striped. The last column shows the saliency for the entire "wing pattern" concept, obtained by averaging the saliency maps from columns 2-4. Both CBMs attend to the entire bird rather than just the wing. The saliency maps are computed using Integrated Gradients with a Gaussian noise baseline.
  • Figure 3: Joint model "leg color." The bird's leg is not attended to by any saliency method.
  • Figure 4: Independent model "leg color." The bird's leg is not attended to by any saliency method.
  • Figure 5: Independent model "bill shape" concept.
  • ...and 3 more figures
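
The Figure 2 caption refers to Integrated Gradients with a Gaussian noise baseline. The following is a minimal sketch of that attribution method on a toy differentiable function, assuming a midpoint Riemann-sum approximation of the path integral. The function `F`, its weights `w`, the unit-variance baseline, and the step count are hypothetical stand-ins for a CBM concept output and the paper's actual settings, which are not given here.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate Integrated Gradients along the straight path baseline -> x.

    grad_fn(z) must return dF/dz at z; the path integral is approximated
    with a midpoint Riemann sum over `steps` interpolation points.
    """
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x, dtype=float)
    for a in alphas:
        total += grad_fn(baseline + a * (x - baseline))
    return (x - baseline) * total / steps

# Toy stand-in for a concept output: F(x) = sum(w * x**2), with analytic gradient.
w = np.array([0.5, -1.0, 2.0, 0.0])

def F(x):
    return float(np.sum(w * x ** 2))

def grad_F(x):
    return 2.0 * w * x

x = np.array([1.0, 2.0, -1.0, 3.0])
rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, size=x.shape)  # Gaussian noise baseline (assumed sigma = 1)

attr = integrated_gradients(grad_F, x, baseline)

# Completeness check: attributions should sum to F(x) - F(baseline).
print("sum of attributions:", attr.sum())
print("F(x) - F(baseline): ", F(x) - F(baseline))
```

For an image model, `x` would be the pixel array and the attributions would be rendered as a saliency map. The feature with zero weight receives zero attribution here, which is the kind of localization the figures test for in the concept predictors.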