Energy-Based Concept Bottleneck Models: Unifying Prediction, Concept Intervention, and Probabilistic Interpretations

Xinyue Xu, Yi Qin, Lu Mi, Hao Wang, Xiaomeng Li

TL;DR

The paper tackles the interpretability gap in concept bottleneck models by introducing Energy-Based Concept Bottleneck Models (ECBMs), which define a joint energy over input, concepts, and class labels using three neural energy networks. This structure enables unified probabilistic prediction, concept correction, and rich conditional interpretations such as $p(\mathbf{c}|\mathbf{y})$, $p(c_k|\mathbf{y},c_{k'})$, and $p(\mathbf{c}_{-k},\mathbf{y}|\mathbf{x},c_k)$. ECBMs are trained with a composite loss $\mathcal{L}_{total}^{all}=\mathcal{L}_{class}+\lambda_c\mathcal{L}_{concept}+\lambda_g\mathcal{L}_{global}$ and perform inference by optimizing the joint energy, with test-time interventions capable of propagating corrections to correlated concepts. Empirical results on real-world datasets show ECBMs outperforming state-of-the-art baselines in concept-related metrics and providing near-ground-truth alignments for conditional concept importance, underscoring their potential for more interpretable and robust decision-making. The work offers a concrete path toward interpretable models that preserve predictive power while delivering deeper insights into concept–concept–label dynamics.
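As a rough illustration of how three energy terms might combine into one joint energy, the sketch below uses random lookup tables in place of the paper's neural energy networks. It assumes the joint energy is weighted the same way as the loss, with $\lambda_c$ and $\lambda_g$; the tables, sizes, weights, and function names are hypothetical, and exhaustive search stands in for the optimization-based inference the paper describes.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
K, M = 3, 4                      # toy sizes: 3 binary concepts, 4 classes (hypothetical)
LAMBDA_C, LAMBDA_G = 0.5, 0.5    # assumed weights, mirroring lambda_c and lambda_g

# Random tables stand in for the three energy networks, for one fixed input x.
E_class = rng.normal(size=M)                   # E_class(x, y)
E_concept = rng.normal(size=(K, 2))            # E_concept(x, c_k) for c_k in {0, 1}
E_global = rng.normal(size=(2,) * K + (M,))    # E_global(c, y)

def joint_energy(c, y):
    """Weighted sum of the class, concept, and global energies (assumed form)."""
    concept_term = sum(E_concept[k, c[k]] for k in range(K))
    return E_class[y] + LAMBDA_C * concept_term + LAMBDA_G * E_global[tuple(c) + (y,)]

def predict():
    """Return the (concepts, class) pair with the lowest joint energy."""
    candidates = itertools.product(itertools.product([0, 1], repeat=K), range(M))
    return min(candidates, key=lambda cy: joint_energy(cy[0], cy[1]))

best_c, best_y = predict()
```

Enumeration is only feasible here because the toy has $2^K \cdot M = 32$ candidates; with realistic numbers of concepts, a gradient-based search over relaxed probabilities would be needed instead.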

Abstract

Existing methods, such as concept bottleneck models (CBMs), have been successful in providing concept-based interpretations for black-box deep learning models. They typically work by predicting concepts given the input and then predicting the final class label given the predicted concepts. However, (1) they often fail to capture the high-order, nonlinear interaction between concepts, e.g., correcting a predicted concept (e.g., "yellow breast") does not help correct highly correlated concepts (e.g., "yellow belly"), leading to suboptimal final accuracy; (2) they cannot naturally quantify the complex conditional dependencies between different concepts and class labels (e.g., for an image with the class label "Kentucky Warbler" and a concept "black bill", what is the probability that the model correctly predicts another concept "black crown"), therefore failing to provide deeper insight into how a black-box model works. In response to these limitations, we propose Energy-based Concept Bottleneck Models (ECBMs). Our ECBMs use a set of neural networks to define the joint energy of candidate (input, concept, class) tuples. With such a unified interface, prediction, concept correction, and conditional dependency quantification are then represented as conditional probabilities, which are generated by composing different energy functions. Our ECBMs address both limitations of existing CBMs, providing higher accuracy and richer concept interpretations. Empirical results show that our approach outperforms the state-of-the-art on real-world datasets.

Paper Structure

This paper contains 23 sections, 12 theorems, 40 equations, 12 figures, 6 tables, 1 algorithm.

Key Result

Proposition 3.0

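Given the ground-truth values of concepts $[c_{k}]_{k=1}^{K-s}$, the joint probability of the remaining concepts $[c_{k}]_{k=K-s+1}^{K}$ and the class label ${\bm{y}}$ can be computed as follows:

$$p([c_{k}]_{k=K-s+1}^{K}, {\bm{y}} \mid {\bm{x}}, [c_{k}]_{k=1}^{K-s}) = \frac{e^{-E^{joint}_{{\bm{\theta}}}({\bm{x}},{\bm{c}},{\bm{y}})}}{\sum_{m=1}^{M} \sum_{[c_{k}]_{k=K-s+1}^{K} \in \{0,1\}^{s}} e^{-E^{joint}_{{\bm{\theta}}}({\bm{x}},{\bm{c}},{\bm{y}}_m)}},$$

where $E^{joint}_{{\bm{\theta}}}({\bm{x}},{\bm{c}},{\bm{y}})$ is the joint energy defined in Eqn. \ref{eq:joint_E} and $M$ is the number of classes.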
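Under an energy-based model, a conditional of this kind is a Boltzmann weight $e^{-E}$ normalized over every completion of the unobserved variables. The sketch below computes it by brute force for a tiny example; `conditional_joint` and `toy_energy` are hypothetical names, and the hand-written energy is only a placeholder for a trained joint energy network.

```python
import itertools
import math

def conditional_joint(energy, known, s, num_classes):
    """p(remaining concepts, y | x, known concepts) by exhaustive enumeration.

    energy(c, y) returns the joint energy of a full concept tuple c and class
    index y (the input x is held fixed inside `energy`). `known` holds the
    first K - s concept values; the last s concepts are unobserved.
    """
    weights = {}
    for rest in itertools.product([0, 1], repeat=s):
        for y in range(num_classes):
            weights[(rest, y)] = math.exp(-energy(tuple(known) + rest, y))
    z = sum(weights.values())    # normalizer over all 2^s * M completions
    return {key: w / z for key, w in weights.items()}

# Hypothetical toy energy: low when both concepts agree and y equals their sum.
def toy_energy(c, y):
    return abs(c[0] - c[1]) + abs(sum(c) - y)

probs = conditional_joint(toy_energy, known=(1,), s=1, num_classes=3)
most_likely = max(probs, key=probs.get)   # (remaining concepts, class index)
```

The returned dictionary is keyed by `(remaining_concepts, class_index)`, so marginals over either part can be obtained by summing the matching entries.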

Figures (12)

  • Figure 1: Overview of our ECBM. Top: During training, ECBM learns positive concept embeddings ${\bm{v}}_k^{(+)}$ (in black), negative concept embeddings ${\bm{v}}_k^{(-)}$ (in white), the class embeddings ${\bm{u}}_m$ (in black), and the three energy networks by minimizing the three energy functions, $E_{{\bm{\theta}}}^{class}({\bm{x}}, {\bm{y}})$, $E_{{\bm{\theta}}}^{concept}({\bm{x}}, {\bm{c}})$, and $E_{{\bm{\theta}}}^{global}({\bm{c}}, {\bm{y}})$ using Eqn. \ref{eq:loss_total}. The concept ${\bm{c}}$ and class label ${\bm{y}}$ are treated as constants. Bottom: During inference, we (1) freeze all concept and class embeddings as well as all networks, and (2) update the predicted concept probabilities $\widehat{{\bm{c}}}$ and class probabilities $\widehat{{\bm{y}}}$ by minimizing the three energy functions using Eqn. \ref{eq:loss_total}.
  • Figure 2: Performance with different ratios of intervened concepts on three datasets (with error bars). The intervention ratio denotes the proportion of provided correct concepts. We use CEM with RandInt. CelebA and AWA2 do not have grouped concepts; thus we adopt individual intervention.
  • Figure 3: Marginal concept importance ($p(c_k=1 | {\bm{y}})$) for the top $3$ concepts of $4$ different classes, computed using Proposition \ref{prop:marginalclass}. ECBM's estimation (Ours) is very close to the ground truth (Oracle).
  • Figure 4: We selected the class "Black and White Warbler" in CUB for illustration. (a) Joint class-specific concept importance $p(c_{k^{\prime}}=1, c_k=1 | {\bm{y}})$ for ECBM's prediction and ground truth derived from Proposition \ref{prop:jointclass}. (b) Class-specific conditional probability among concepts $p(c_k=1 | c_{k^{\prime}}=1, {\bm{y}})$ for ECBM's prediction and ground truth derived from Proposition \ref{prop:correctconcept}. (c) Class-agnostic conditional probability among concepts $p(c_{k}=1 | c_{k^{\prime}}=1)$ for ECBM's prediction and ground truth derived from Proposition \ref{prop:conditionalconcepts}.
  • Figure 5: Heatmap of joint class-specific concept importance ($p(c_{k^{\prime}}=1, c_k=1 | {\bm{y}})$) for the ground truth and ECBM's prediction, derived from Proposition \ref{prop:jointclass}.
  • ...and 7 more figures
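The inference procedure in the Figure 1 caption (freeze the embeddings and networks, then update $\widehat{{\bm{c}}}$ and $\widehat{{\bm{y}}}$ by minimizing energy) can be sketched roughly as follows. This is a toy under stated assumptions, not the authors' implementation: fixed random vectors stand in for the frozen energy networks, the weights and sizes are hypothetical, the relaxed energy is a simple expected-energy form chosen for illustration, and finite differences replace automatic differentiation.

```python
import numpy as np

rng = np.random.default_rng(0)
K, M = 4, 3                       # toy sizes (hypothetical)
LAMBDA_C, LAMBDA_G = 0.5, 0.5     # assumed energy weights

# Frozen stand-ins for the trained energy networks, for one fixed input x.
e_class = rng.normal(size=M)      # energy of each class
e_pos = rng.normal(size=K)        # energy of each concept being present
e_neg = rng.normal(size=K)        # energy of each concept being absent
W = rng.normal(size=(K, M))       # concept-class interaction energy

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax(b):
    z = np.exp(b - b.max())
    return z / z.sum()

def energy(params):
    """Toy joint energy under relaxed concept and class probabilities."""
    c_hat, y_hat = sigmoid(params[:K]), softmax(params[K:])
    concept = c_hat @ e_pos + (1.0 - c_hat) @ e_neg
    return y_hat @ e_class + LAMBDA_C * concept + LAMBDA_G * (c_hat @ W @ y_hat)

def numerical_grad(f, p, eps=1e-5):
    """Central finite differences; a stand-in for autodiff in this toy."""
    grad = np.zeros_like(p)
    for i in range(p.size):
        step = np.zeros_like(p)
        step[i] = eps
        grad[i] = (f(p + step) - f(p - step)) / (2 * eps)
    return grad

params = np.zeros(K + M)          # logits for c_hat and y_hat; "networks" stay frozen
start = energy(params)
for _ in range(500):
    params -= 0.1 * numerical_grad(energy, params)
end = energy(params)
c_hat, y_hat = sigmoid(params[:K]), softmax(params[K:])
```

Only the logits behind `c_hat` and `y_hat` change during the loop, which mirrors the split the caption describes between frozen parameters and updated predictions.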

Theorems & Definitions (19)

  • Proposition 3.0: Joint Missing Concept and Class Probability
  • Proposition 3.0: Marginal Class-Specific Concept Importance
  • Proposition 3.0: Joint Class-Specific Concept Importance
  • Proposition 3.0: Class-Specific Conditional Probability among Concepts
  • Proposition 3.0: Class-Agnostic Conditional Probability among Concepts
  • Proposition A.0: Joint Missing Concept and Class Probability
  • proof
  • Proposition A.0: Joint Class-Specific Concept Importance
  • proof
  • Proposition A.0: Marginal Class-Specific Concept Importance
  • ...and 9 more