Concepts' Information Bottleneck Models
Karim Galliamov, Syed M Ahsan Kazmi, Adil Khan, Adín Ramírez Rivera
TL;DR
This work addresses the fidelity and interpretability gaps in Concept Bottleneck Models (CBMs) by introducing Concepts' Information Bottleneck Models (CIBMs), which apply an Information Bottleneck regularizer to the concept layer. By explicitly minimizing $I(X;C)$ while preserving $I(C;Y)$, the approach yields minimal-sufficient concepts and reduces concept leakage without architectural changes. It comes in two practical implementations: a bound-based method $\text{IB}_B$ and an estimator-based method $\text{IB}_E$. The authors provide theoretical justification, including a PAC-Bayes generalization bound, and validate the methods across six CBM variants and three datasets, showing improved end-to-end accuracy, stronger interventions, and lower leakage, supported by information-plane analyses. Overall, CIBMs offer a theoretically grounded, generalizable path to more faithful, intervenable CBMs, with practical impact for explanations and debugging in real-world systems.
Abstract
Concept Bottleneck Models (CBMs) aim to deliver interpretable predictions by routing decisions through a human-understandable concept layer, yet they often suffer from reduced accuracy and from concept leakage that undermines faithfulness. We introduce an explicit Information Bottleneck regularizer on the concept layer that penalizes $I(X;C)$ while preserving task-relevant information in $I(C;Y)$, encouraging minimal-sufficient concept representations. We derive two practical variants (a variational objective and an entropy-based surrogate) and integrate them into standard CBM training without architectural changes or additional supervision. Evaluated across six CBM families and three benchmarks, the IB-regularized models consistently outperform their vanilla counterparts. Information-plane analyses further corroborate the intended behavior. These results indicate that enforcing a minimal-sufficient concept bottleneck improves both predictive performance and the reliability of concept-level interventions. The proposed regularizer offers a theoretically grounded, architecture-agnostic path to more faithful and intervenable CBMs, resolving prior evaluation inconsistencies by aligning training protocols and demonstrating robust gains across model families and datasets.
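As a hedged illustration of the regularizer described above, the generic Information Bottleneck trade-off on the concept layer can be written as follows. The trade-off weight $\beta$, the concept encoder $q(c \mid x)$, and the reference marginal $r(c)$ are notation assumed here for illustration only. The exact objectives used by the two variants may differ in form.

$$
\max_{q(c \mid x)} \; I(C;Y) \;-\; \beta\, I(X;C), \qquad \beta > 0,
$$

$$
I(X;C) \;=\; \mathbb{E}_{p(x)}\!\left[\mathrm{KL}\big(q(c \mid x)\,\|\,q(c)\big)\right] \;\le\; \mathbb{E}_{p(x)}\!\left[\mathrm{KL}\big(q(c \mid x)\,\|\,r(c)\big)\right].
$$

A larger $\beta$ compresses the concepts more strongly. The inequality is the standard variational relaxation: it holds for any choice of $r(c)$ because the gap equals $\mathrm{KL}\big(q(c)\,\|\,r(c)\big) \ge 0$, which is what makes a bound-based penalty on $I(X;C)$ tractable in practice.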
