There Was Never a Bottleneck in Concept Bottleneck Models

Antonio Almudévar, José Miguel Hernández-Lobato, Alfonso Ortega

TL;DR

This work argues that Concept Bottleneck Models (CBMs) do not enforce a true bottleneck: their latent components $z_j$ can carry information beyond the predefined concepts $c_j$, which compromises interpretability and intervention validity. To address this, the authors introduce Minimal Concept Bottleneck Models (MCBMs), which impose an Information Bottleneck on each $z_j$ via a variational objective that minimizes $I(Z_j;X|C_j)$ and which, under Gaussian assumptions, reduces to a mean-squared error between predicted and concept-conditioned targets. Empirically, MCBMs consistently reduce nuisance information (higher Uncertainty Reduction Ratios) and yield more interpretable representations (higher CKA, better Disentanglement and OIS) while enabling more reliable concept interventions, though stronger bottlenecks can modestly reduce task performance. The paper further analyzes fundamental theoretical flaws in CBM interventions (e.g., One-vs-Rest and sigmoid-inverse approximations) and outlines future directions, including modeling task-related nuisances with additional latent variables and exploring priors over $p(z_j|c_j)$ to improve evaluation metrics.
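
The step from the information-theoretic objective to a mean-squared error can be sketched as follows. This is a reconstruction under stated assumptions, not necessarily the paper's exact derivation: it assumes that $z_j$ depends on $c_j$ only through $x$, that $q_\phi(z_j|c_j)$ is a variational approximation to the true conditional $p(z_j|c_j)$, and that both $p_\theta(z_j|x)$ and $q_\phi(z_j|c_j)$ are Gaussians sharing a fixed variance $\sigma^2$. The mean functions $\mu_{\theta,j}$ and $\mu_{\phi,j}$ are notation introduced here for illustration.

$$
I(Z_j;X|C_j) = \mathbb{E}_{p(x,c_j)}\left[ D_{\mathrm{KL}}\big(p_\theta(z_j|x)\,\|\,p(z_j|c_j)\big) \right] \le \mathbb{E}_{p(x,c_j)}\left[ D_{\mathrm{KL}}\big(p_\theta(z_j|x)\,\|\,q_\phi(z_j|c_j)\big) \right]
$$

$$
p_\theta(z_j|x) = \mathcal{N}\big(\mu_{\theta,j}(x), \sigma^2\big),\quad q_\phi(z_j|c_j) = \mathcal{N}\big(\mu_{\phi,j}(c_j), \sigma^2\big) \;\Longrightarrow\; D_{\mathrm{KL}}\big(p_\theta(z_j|x)\,\|\,q_\phi(z_j|c_j)\big) = \frac{\big(\mu_{\theta,j}(x) - \mu_{\phi,j}(c_j)\big)^2}{2\sigma^2}
$$

The inequality holds because the gap between the two expectations is itself an expected KL divergence, $\mathbb{E}_{p(c_j)}\left[ D_{\mathrm{KL}}\big(p(z_j|c_j)\,\|\,q_\phi(z_j|c_j)\big) \right] \ge 0$. Under these assumptions, minimizing the bound amounts to minimizing a squared error between the encoder output and a target predicted from the concept alone.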

Abstract

Deep learning representations are often difficult to interpret, which can hinder their deployment in sensitive applications. Concept Bottleneck Models (CBMs) have emerged as a promising approach to mitigate this issue by learning representations that support target task performance while ensuring that each component predicts a concrete concept from a predefined set. In this work, we argue that CBMs do not impose a true bottleneck: the fact that a component can predict a concept does not guarantee that it encodes only information about that concept. This shortcoming raises concerns regarding interpretability and the validity of intervention procedures. To overcome this limitation, we propose Minimal Concept Bottleneck Models (MCBMs), which incorporate an Information Bottleneck (IB) objective to constrain each representation component to retain only the information relevant to its corresponding concept. This IB is implemented via a variational regularization term added to the training loss. As a result, MCBMs yield more interpretable representations, support principled concept-level interventions, and remain consistent with probability-theoretic foundations.

Paper Structure

This paper contains 42 sections, 23 equations, 10 figures, 8 tables, 1 algorithm.

Figures (10)

  • Figure 1: In Vanilla Models, concepts and nuisances may be arbitrarily entangled in the representation, and a variable $z_j$ may capture only part of a concept (depicted as paler colors). In CBMs, each $z_j$ encodes all information about its corresponding concept $c_j$, but may also capture some information about nuisances (e.g., $z_1$) or other concepts (e.g., $z_2$). In contrast, MCBMs enforce that each representation variable $z_j$ encodes all—and only—the information about its corresponding concept.
  • Figure 2: Graphical models of the different systems described for two concepts and two-dimensional representations. Appendix \ref{app:complete_gm} shows the analogous figure for $m$ concepts and $m$-dimensional representations. Inputs $\bm{x}$ are defined by some concepts $\{c_j\}_{j=1}^m$ and nuisances $\bm{n}$; and targets $\bm{y}$ are defined by $\bm{x}$ (gray arrows). Vanilla models obtain the representations $\{z_j\}_{j=1}^m$ from $\bm{x}$ through the encoder $p_\theta(\bm{z}|\bm{x})$ (cyan arrows) and solve the task $\hat{\bm{y}}$ sequentially through the task head $q_\phi(\hat{\bm{y}}|\bm{z})$ (green arrows). Concept Bottleneck Models make a prediction $\hat{c}_j$ of each concept $c_j$ from one representation $z_j$ through the concept head $q_\phi(\hat{c}_j|z_j)$ (blue arrows). Minimal CBMs make a prediction $\hat{z}_j$ of each representation $z_j$ from one concept $c_j$ through the representation head $q_\phi(\hat{z}_j|c_j)$ (red arrows).
  • Figure 3: Some nuisances $\bm{n}_y \in \bm{n}$ affect the task $\bm{y}$ while others $\bm{n}_{\bar{y}} \in \bm{n}$ do not. None of them should affect the representation $\bm{z}$ since it must be fully described by the concepts $\bm{c}$.
  • Figure 4: CKA (left, ↑), Disentanglement (middle, ↑) and OIS (right, ↓).
  • Figure 5: Error (y-axis) versus percentage of concepts intervened (x-axis) across different models.
  • ...and 5 more figures
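
The four components named in the Figure 2 caption (encoder, task head, concept head, representation head) can be illustrated with a minimal NumPy sketch of how their losses might be combined. Everything concrete here is an assumption made for illustration and is not taken from the paper: the linear encoder, the per-component scalar heads, binary concepts and targets, the weight `beta`, and the names `init_params` and `mcbm_losses` are all hypothetical.

```python
import numpy as np


def init_params(d, m, seed=0):
    """Hypothetical parameters for a toy model with d inputs and m concepts."""
    rng = np.random.default_rng(seed)
    encoder = (rng.normal(scale=0.1, size=(d, m)), np.zeros(m))  # z = x W + b
    task_head = (rng.normal(scale=0.1, size=m), 0.0)             # y_hat from all of z
    concept_head = (np.ones(m), np.zeros(m))                     # c_hat_j from z_j only
    repr_head = (np.ones(m), np.zeros(m))                        # z_hat_j from c_j only
    return encoder, task_head, concept_head, repr_head


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def binary_cross_entropy(p, t, eps=1e-7):
    return -np.mean(t * np.log(p + eps) + (1 - t) * np.log(1 - p + eps))


def mcbm_losses(x, c, y, params, beta=1.0):
    """Return (total, task, concept, bottleneck) losses for one batch.

    x: (batch, d) inputs, c: (batch, m) binary concepts, y: (batch,) binary targets.
    beta is an assumed weight on the bottleneck term.
    """
    (W_enc, b_enc), (w_task, b_task), (a, b), (u, v) = params
    z = x @ W_enc + b_enc                  # encoder: one component z_j per concept
    y_hat = sigmoid(z @ w_task + b_task)   # task head reads the whole representation
    c_hat = sigmoid(a * z + b)             # concept head: elementwise, so c_hat_j sees only z_j
    z_hat = u * c + v                      # representation head: z_hat_j sees only c_j

    task = binary_cross_entropy(y_hat, y)
    concept = binary_cross_entropy(c_hat, c)
    bottleneck = np.mean((z - z_hat) ** 2)  # MSE between z_j and its concept-conditioned target
    return task + concept + beta * bottleneck, task, concept, bottleneck


# Usage on random data (shapes only; the values carry no meaning).
rng = np.random.default_rng(1)
x = rng.normal(size=(8, 5))
c = rng.integers(0, 2, size=(8, 3)).astype(float)
y = rng.integers(0, 2, size=8).astype(float)
total, task, concept, bottleneck = mcbm_losses(x, c, y, init_params(5, 3), beta=1.0)
```

Setting `beta=0` in this sketch removes the bottleneck term and leaves only the task and concept losses, which corresponds to the CBM setting described in the caption; larger values pull each $z_j$ toward a value that is determined by $c_j$ alone.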