V-CEM: Bridging Performance and Intervenability in Concept-based Models
Francesco De Santis, Gabriele Ciravegna, Philippe Bich, Danilo Giordano, Tania Cerquitelli
TL;DR
V-CEM addresses the tension between interpretability and performance in concept-based models by combining concept embeddings with variational inference. It introduces a probabilistic framework in which concept embeddings \mathbf{c} are generated via an approximate posterior q(\mathbf{c}|x,c), conditioned on both the input x and the concept prediction c, while a prior p(\mathbf{c}|c) enforces concept-specific clustering. The resulting ELBO-based objective, L = (1/k)L_c + \lambda_t L_t + \lambda_p L_p, balances concept accuracy, task accuracy, and embedding regularization. The model supports targeted concept-embedding interventions that can override input-dependent effects, which improves out-of-distribution (OOD) intervenability while preserving in-distribution (ID) performance comparable to that of black-box models and CEMs. Experiments on vision and NLP tasks show that V-CEM learns a cohesive embedding space, as measured by Concept Representation Cohesiveness (CRC), responds robustly to OOD interventions, and reaches competitive ID accuracy, suggesting practical benefits for reliable, human-guided AI systems. The work also introduces metrics for evaluating concept representations and discusses avenues for extending V-CEM to multimodal data and causal modeling to further enhance interpretability and robustness.
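To make the shape of the objective L = (1/k)L_c + \lambda_t L_t + \lambda_p L_p concrete, here is a minimal NumPy sketch. The summary does not spell out the individual terms, so everything below is an assumption for illustration: k is taken to be the number of concepts, L_c a summed binary cross-entropy over concepts, L_t a softmax cross-entropy on the task label, and L_p a KL divergence between diagonal-Gaussian posterior and prior, summed over concepts. The function names (`vcem_style_loss` and its helpers) are hypothetical and do not come from the authors' code.

```python
import numpy as np

def bce(probs, targets, eps=1e-7):
    # Element-wise binary cross-entropy on concept probabilities.
    probs = np.clip(probs, eps, 1.0 - eps)
    return -(targets * np.log(probs) + (1.0 - targets) * np.log(1.0 - probs))

def cross_entropy(logits, label):
    # Softmax cross-entropy for a single example (numerically stabilised).
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]

def gaussian_kl(mu_q, var_q, mu_p, var_p):
    # KL( N(mu_q, diag(var_q)) || N(mu_p, diag(var_p)) ), closed form.
    return 0.5 * np.sum(
        np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )

def vcem_style_loss(concept_probs, concept_targets, task_logits, task_label,
                    mu_q, var_q, mu_p, var_p, lambda_t=1.0, lambda_p=1.0):
    # Assumed shapes: concept_probs, concept_targets -> (k,)
    #                 mu_q, var_q, mu_p, var_p       -> (k, d)
    k = concept_probs.shape[0]
    L_c = bce(concept_probs, concept_targets).sum()    # concept term (assumed BCE)
    L_t = cross_entropy(task_logits, task_label)       # task term (assumed CE)
    L_p = sum(gaussian_kl(mu_q[i], var_q[i], mu_p[i], var_p[i])
              for i in range(k))                       # prior-matching term (assumed KL)
    return L_c / k + lambda_t * L_t + lambda_p * L_p
```

Under these assumptions, the \lambda_p term vanishes when the posterior over each embedding coincides with its concept-conditioned prior, which is the sense in which the prior pulls embeddings toward concept-specific clusters.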
Abstract
Concept-based eXplainable AI (C-XAI) is a rapidly growing research field that improves AI model interpretability by leveraging intermediate, human-understandable concepts. This approach not only increases model transparency but also enables human intervention, allowing users to interact with these concepts to refine and improve the model's performance. Concept Bottleneck Models (CBMs) explicitly predict concepts before making final decisions, enabling interventions to correct misclassified concepts. While CBMs remain effective in Out-Of-Distribution (OOD) settings with intervention, they struggle to match the performance of black-box models. Concept Embedding Models (CEMs) address this by learning concept embeddings from both concept predictions and input data, which improves In-Distribution (ID) accuracy but reduces the effectiveness of interventions, especially in OOD scenarios. In this work, we propose the Variational Concept Embedding Model (V-CEM), which leverages variational inference to improve intervention responsiveness in CEMs. We evaluate our model on various textual and visual datasets in terms of ID performance, intervention responsiveness in both ID and OOD settings, and Concept Representation Cohesiveness (CRC), a metric we propose to assess the quality of the concept embedding representations. The results demonstrate that V-CEM retains CEM-level ID performance while achieving intervention effectiveness similar to that of CBMs in OOD settings, effectively reducing the gap between interpretability (intervention) and generalization (performance).
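The notion of a concept intervention in a CBM can be illustrated with a toy sketch. This is not the authors' implementation: the linear task head, the weights, and the helper names `intervene` and `task_head` are hypothetical, chosen only to show how replacing a mispredicted concept with its expert-provided value can change the final decision.

```python
import numpy as np

def intervene(concept_probs, true_concepts, mask):
    # Replace the predicted concept values with ground-truth values
    # wherever the boolean mask is True; leave the rest untouched.
    return np.where(mask, true_concepts, concept_probs)

def task_head(concepts, W, b):
    # Hypothetical linear task predictor operating on the concept bottleneck.
    return int(np.argmax(W @ concepts + b))

# Toy setup: 2 concepts, 2 classes (all values are made up for illustration).
W = np.array([[2.0, -1.0],
              [-1.0, 2.0]])
b = np.zeros(2)

predicted = np.array([0.2, 0.9])   # concept predictor output (both concepts wrong)
truth = np.array([1.0, 0.0])       # values an expert would supply

before = task_head(predicted, W, b)
after_one = task_head(intervene(predicted, truth, np.array([True, False])), W, b)
after_all = task_head(intervene(predicted, truth, np.array([True, True])), W, b)
```

In a plain CBM the task head sees only the concept values, so a correction propagates directly to the prediction. In embedding-based models the task head also receives input-dependent information, which is why the responsiveness of interventions becomes a property worth measuring.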
