There Was Never a Bottleneck in Concept Bottleneck Models
Antonio Almudévar, José Miguel Hernández-Lobato, Alfonso Ortega
TL;DR
This work argues that Concept Bottleneck Models (CBMs) do not enforce a true bottleneck: their latent components $z_j$ can carry information beyond the predefined concepts $c_j$, which compromises interpretability and the validity of interventions. To address this, the authors introduce Minimal Concept Bottleneck Models (MCBMs), which impose an Information Bottleneck on each $z_j$ via a variational objective that minimizes $I(Z_j;X|C_j)$; under Gaussian assumptions, this objective reduces to a mean-squared error between predicted and concept-conditioned targets. Empirically, MCBMs consistently reduce nuisance information (higher Uncertainty Reduction Ratios), yield more interpretable representations (higher CKA, better Disentanglement and OIS), and enable more reliable concept interventions, although stronger bottlenecks can modestly reduce task performance. The paper further analyzes fundamental theoretical flaws in CBM interventions (e.g., One-vs-Rest and sigmoid-inverse approximations) and outlines future directions, including modeling task-related nuisances with additional latent variables and exploring priors over $p(z_j|c_j)$ to improve evaluation metrics.
Abstract
Deep learning representations are often difficult to interpret, which can hinder their deployment in sensitive applications. Concept Bottleneck Models (CBMs) have emerged as a promising approach to mitigate this issue by learning representations that support target task performance while ensuring that each component predicts a concrete concept from a predefined set. In this work, we argue that CBMs do not impose a true bottleneck: the fact that a component can predict a concept does not guarantee that it encodes only information about that concept. This shortcoming raises concerns regarding interpretability and the validity of intervention procedures. To overcome this limitation, we propose Minimal Concept Bottleneck Models (MCBMs), which incorporate an Information Bottleneck (IB) objective to constrain each representation component to retain only the information relevant to its corresponding concept. This IB is implemented via a variational regularization term added to the training loss. As a result, MCBMs yield more interpretable representations, support principled concept-level interventions, and remain consistent with probability-theoretic foundations.
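As a rough illustration of why a Gaussian variational bottleneck term can reduce to a mean-squared error, as the TL;DR states, the sketch below uses a standard variational argument rather than the authors' exact objective. It assumes that $z_j$ depends on $c_j$ only through $x$, so that $I(Z_j;X|C_j) \le \mathbb{E}[\mathrm{KL}(p(z_j|x) \,\|\, q(z_j|c_j))]$ for any variational $q$. It further assumes that both distributions are Gaussians with a shared, fixed variance $\sigma^2$. The function names, argument names, and example values are hypothetical.

```python
import numpy as np


def equal_variance_gaussian_kl(mu_p, mu_q, sigma=1.0):
    """Per-sample KL( N(mu_p, sigma^2 I) || N(mu_q, sigma^2 I) ).

    With a shared, fixed variance (an assumption of this sketch), the KL
    is a scaled squared error: ||mu_p - mu_q||^2 / (2 * sigma^2).
    """
    diff = np.asarray(mu_p, dtype=float) - np.asarray(mu_q, dtype=float)
    return np.sum(diff ** 2, axis=-1) / (2.0 * sigma ** 2)


def bottleneck_penalty(z_mean, z_target, sigma=1.0):
    """Batch-averaged penalty for one latent component z_j.

    z_mean   : (batch, dim) means of the encoder distribution p(z_j | x)
    z_target : (batch, dim) means of a variational q(z_j | c_j)
    Both names are hypothetical and chosen for illustration only.
    """
    return float(np.mean(equal_variance_gaussian_kl(z_mean, z_target, sigma)))


# Toy values: the first sample deviates from its concept-conditioned target,
# the second matches it exactly and therefore contributes no penalty.
z_mean = np.array([[1.0, 2.0], [0.0, 0.0]])
z_target = np.array([[0.0, 0.0], [0.0, 0.0]])
penalty = bottleneck_penalty(z_mean, z_target)
```

In a training loop, a term of this kind would be added to the task and concept losses with a weighting coefficient. The choice of that weight and of $\sigma$ is left open here, since the summary above does not specify them.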
