Counterfactual Concept Bottleneck Models

Gabriele Dominici, Pietro Barbiero, Francesco Giannini, Martin Gjoreski, Giuseppe Marra, Marc Langheinrich

TL;DR

CF-CBMs are a latent-variable extension of Concept Bottleneck Models that jointly generate concept-level counterfactuals and support test-time interventions. By incorporating two latent spaces and a counterfactual path, the model answers "What?", "How?", and "Why not?" queries within a single unified objective, achieving competitive classification accuracy while delivering interpretable, minimal-change counterfactuals and enabling task-driven interventions. Empirical results across five datasets show that CF-CBMs outperform post-hoc baselines in counterfactual validity and stability, and that joint training reduces reliance on less informative concepts, which makes explanations simpler and more robust. The approach thus advances reliable AI by coupling high-level interpretability with actionable, concept-based counterfactual reasoning, suitable for real-time or high-concept domains such as medical imaging and autonomous systems.

Abstract

Current deep learning models are not designed to simultaneously address three fundamental questions: predict class labels to solve a given classification task (the "What?"), simulate changes in the situation to evaluate how this impacts class predictions (the "How?"), and imagine how the scenario should change to result in different class predictions (the "Why not?"). The inability to answer these questions represents a crucial gap in deploying reliable AI agents, calibrating human trust, and improving human-machine interaction. To bridge this gap, we introduce CounterFactual Concept Bottleneck Models (CF-CBMs), a class of models designed to efficiently address the above queries all at once without the need to run post-hoc searches. Our experimental results demonstrate that CF-CBMs: achieve classification accuracy comparable to black-box models and existing CBMs ("What?"), rely on fewer important concepts leading to simpler explanations ("How?"), and produce interpretable, concept-based counterfactuals ("Why not?"). Additionally, we show that training the counterfactual generator jointly with the CBM leads to two key improvements: (i) it alters the model's decision-making process, making the model rely on fewer important concepts (leading to simpler explanations), and (ii) it significantly increases the causal effect of concept interventions on class predictions, making the model more responsive to these changes.
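
To make the three queries concrete, here is a minimal toy sketch in Python. It is not the CF-CBM architecture: the linear task predictor, its weights, and the function names are all hypothetical, and the "Why not?" step uses a brute-force search over concept flips purely as a stand-in, whereas CF-CBMs are described as generating counterfactuals without post-hoc search.

```python
from itertools import combinations

# Hypothetical toy task predictor: a linear score over binary concepts.
# The weights and bias are invented for illustration only.
WEIGHTS = [2.0, -1.0, 1.5]
BIAS = -0.5


def predict_label(concepts):
    """'What?': map a binary concept vector to a class label (0 or 1)."""
    score = BIAS + sum(w * c for w, c in zip(WEIGHTS, concepts))
    return 1 if score > 0 else 0


def intervene(concepts, fixes):
    """'How?': overwrite selected concepts with user-provided values."""
    updated = list(concepts)
    for index, value in fixes.items():
        updated[index] = value
    return updated


def minimal_counterfactual(concepts, target):
    """'Why not?': fewest concept flips that yield the target label.

    Brute-force stand-in, NOT the learned generator described in the paper.
    """
    n = len(concepts)
    for k in range(n + 1):  # try 0 flips, then 1 flip, then 2, ...
        for flipped in combinations(range(n), k):
            candidate = list(concepts)
            for i in flipped:
                candidate[i] = 1 - candidate[i]
            if predict_label(candidate) == target:
                return candidate
    return None  # no concept assignment reaches the target label


concepts = [0, 1, 0]
label = predict_label(concepts)                         # "What?"
label_after = predict_label(intervene(concepts, {0: 1}))  # "How?"
counterfactual = minimal_counterfactual(concepts, 1)    # "Why not?"
```

In this toy setting, the counterfactual is itself a concept vector, so a user can read off which concepts would have to change for the prediction to differ.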

Paper Structure

This paper contains 23 sections, 10 equations, 8 figures, and 10 tables.

Figures (8)

  • Figure 1: CF-CBMs generate counterfactuals at the concept level rather than at the input level, because concept-level changes are more interpretable. In contrast, identifying changes at the input level requires significantly more effort from the user. The images are sourced from the SIIM Pneumothorax dataset (SIIM-ACR).
  • Figure 2: Counterfactual CBM. For a given input sample, the task predictor answers "what?" queries by predicting class labels. The concept predictor answers "how?" queries by simulating changes in the scenario through interventions. When taking the counterfactual latent distribution $z'$, the concept predictor answers "why not?" queries via concept-based counterfactuals, which are easier to understand than counterfactuals at the input level.
  • Figure 3: CF-CBMs balance the trade-off between counterfactuals' reliability (proximity) and user effort ($\Delta$-Sparsity). The arrow $(\rightarrow)$ points towards optimal values. Pareto-optimal models form the frontier of the shaded region, whereas dominated solutions are located within the shaded region.
  • Figure 4: CF-CBMs calibrate the trade-off between counterfactuals' diversity (variability) and plausibility (IoU). The arrow $(\rightarrow)$ points towards optimal values. Pareto-optimal models form the frontier of the shaded region, whereas dominated solutions are located within the shaded region. "CF-CBM $n$" samples $n$ counterfactuals from the latent distribution as described in App. \ref{app:multiverse}.
  • Figure 5: CF-CBMs generate reliable (proximity) and accurate interventions (ROC AUC Int.). The arrow $(\rightarrow)$ points towards optimal values. Pareto-optimal models form the frontier of the shaded region, whereas dominated solutions are located within the shaded region.
  • ...and 3 more figures
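
The captions of Figures 3 to 5 distinguish Pareto-optimal models from dominated ones. As a rough illustration of that distinction, the Python sketch below filters a set of two-metric scores down to its non-dominated points. The function name, the example scores, and the convention that lower is better on both axes are assumptions made for this sketch; they are not taken from the paper's results.

```python
def pareto_front(points):
    """Return the points that no other point dominates.

    Assumption for this sketch: lower is better on both axes.
    A point q dominates p if q is no worse on both axes and
    strictly better on at least one.
    """
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


# Hypothetical (metric_a, metric_b) scores for four models.
scores = [(1, 5), (2, 2), (3, 3), (5, 1)]
front = pareto_front(scores)
```

Here `(3, 3)` is dominated by `(2, 2)`, so it would sit inside the shaded region, while the other three points form the frontier.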