Causally Reliable Concept Bottleneck Models
Giovanni De Felice, Arianna Casanova Flores, Francesco De Santis, Silvia Santini, Johannes Schneider, Pietro Barbiero, Alberto Termine
TL;DR
The paper addresses the fragility of purely associative concept bottleneck models by introducing Causally reliable Concept Bottleneck Models (C$^2$BMs), which impose a causal bottleneck over concepts guided by a structural causal model. It presents a fully automated pipeline to discover concepts and causal graphs from data and background knowledge, and to train an exogenous-embedding encoder and a hypernetwork that parameterizes causal mechanisms. Empirical results across synthetic and real datasets show that C$^2$BMs maintain competitive task accuracy while achieving improved causal reliability, better responsiveness to interventions, and enhanced debiasing and fairness properties. This work offers a principled path toward interpretable, causally grounded AI that supports interventions and fairness in complex decision tasks.
Abstract
Concept-based models are an emerging paradigm in deep learning that constrains the inference process to operate through human-interpretable variables, facilitating explainability and human interaction. However, these architectures, like popular opaque neural models, fail to account for the true causal mechanisms underlying the target phenomena represented in the data. This hampers their ability to support causal reasoning tasks, limits out-of-distribution generalization, and hinders the implementation of fairness constraints. To overcome these issues, we propose Causally reliable Concept Bottleneck Models (C$^2$BMs), a class of concept-based architectures that enforce reasoning through a bottleneck of concepts structured according to a model of the real-world causal mechanisms. We also introduce a pipeline to automatically learn this structure from observational data and unstructured background knowledge (e.g., scientific literature). Experimental evidence suggests that, compared with standard opaque and concept-based models, C$^2$BMs are more interpretable and causally reliable and respond better to interventions, while matching their accuracy.
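To make the idea of a causal bottleneck more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a hypothetical three-node graph (`smoking`, `tar`, `cancer`), hand-picked weights standing in for learned causal mechanisms, and scalar exogenous inputs in place of the encoder's exogenous embeddings; none of these names or values come from the paper. It only shows how values can be propagated along a causal graph in topological order and how a do-intervention on one concept changes its descendants.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical toy graph: each concept maps to its parents.
# Keys are listed in topological order; the last node plays the role of the task.
PARENTS = {
    "smoking": [],
    "tar": ["smoking"],
    "cancer": ["tar"],
}

# Assumed fixed weights standing in for learned causal mechanisms
# (in the paper these are parameterized by a hypernetwork; here they are constants).
WEIGHTS = {"tar": np.array([4.0]), "cancer": np.array([3.0])}
BIAS = {"tar": -2.0, "cancer": -1.5}


def forward(exogenous, do=None):
    """Propagate values through the graph; `do` fixes the listed concepts."""
    do = do or {}
    values = {}
    for node, parents in PARENTS.items():  # dicts preserve insertion order
        if node in do:
            # Intervention: ignore the node's mechanism and clamp its value.
            values[node] = float(do[node])
        elif not parents:
            # Root concept: determined by its exogenous input alone.
            values[node] = float(sigmoid(exogenous[node]))
        else:
            parent_vals = np.array([values[p] for p in parents])
            logit = WEIGHTS[node] @ parent_vals + BIAS[node] + exogenous[node]
            values[node] = float(sigmoid(logit))
    return values


# Scalar stand-ins for what an input encoder would produce.
u = {"smoking": 2.0, "tar": 0.0, "cancer": 0.0}

observed = forward(u)
intervened = forward(u, do={"smoking": 0.0})
print("observed:  ", observed)
print("do(smoking=0):", intervened)
```

In this toy setting, clamping `smoking` to 0 lowers the downstream `tar` and `cancer` values, whereas intervening on `tar` would leave `smoking` untouched. That asymmetry is what distinguishes a causally structured bottleneck from a purely associative one.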
