AnyCBMs: How to Turn Any Black Box into a Concept Bottleneck Model
Gabriele Dominici, Pietro Barbiero, Francesco Giannini, Martin Gjoreski, Marc Langheinrich
TL;DR
AnyCBMs address the challenge of adding interpretability to powerful black-box neural networks by introducing a lightweight external mapping from black-box embeddings to supervised concepts and back to embeddings, effectively creating a Concept Bottleneck Model without retraining the original model. The method relies on a commuting relationship between the black-box transformation and the two auxiliary mappings, enabling concept-based explanations and interventions that align with the black-box decision process. Empirical results on MNIST and CUB show that AnyCBMs match the concept and downstream task performance of standard CBMs and the original black box, and that concept interventions are similarly effective, even when the AnyCBM is trained on a different dataset. This work highlights a practical, resource-efficient path to augment large pre-trained models with interpretable latent spaces and human-in-the-loop control over predictions.
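As a rough illustration of the data flow described above, the following toy sketch wires a frozen task head to two external maps (embedding to concepts, concepts back to embedding) and applies a concept intervention. Everything here is an assumption made for illustration: the function names, the two-dimensional embedding, the hand-picked sigmoid/logit pair, and the decision rule are hypothetical and not taken from the paper, where the two auxiliary maps would be learned from concept supervision rather than fixed by hand.

```python
import math

def black_box_head(h):
    """Frozen task head of the black box: embedding -> class label (toy rule)."""
    return 1 if h[0] + h[1] > 0 else 0

def concept_encoder(h):
    """Hypothetical external map: embedding -> concept probabilities."""
    return [1.0 / (1.0 + math.exp(-v)) for v in h]

def concept_decoder(c, eps=1e-6):
    """Hypothetical external map: concepts -> embedding.

    The logit inverts the sigmoid above, so encoding then decoding
    approximately recovers the original embedding in this toy setup.
    """
    clipped = [min(max(v, eps), 1.0 - eps) for v in c]
    return [math.log(v / (1.0 - v)) for v in clipped]

def anycbm_predict(h, interventions=None):
    """Route the embedding through the concept layer, then the frozen head.

    `interventions` maps a concept index to a value supplied by a human,
    which overrides the predicted concept before decoding.
    """
    c = concept_encoder(h)
    for idx, value in (interventions or {}).items():
        c[idx] = value
    return c, black_box_head(concept_decoder(c))

h = [2.0, -0.5]                       # embedding produced by the black box
y_bb = black_box_head(h)              # original black-box prediction
concepts, y_cbm = anycbm_predict(h)   # prediction through the concept layer
_, y_int = anycbm_predict(h, {0: 0.0})  # human sets concept 0 to "absent"

print(y_bb, y_cbm, y_int)
```

In this sketch the prediction routed through the concept layer agrees with the direct black-box prediction because the decoder is chosen as the inverse of the encoder, which is one simple way to satisfy a commuting relationship. Overriding a concept changes the decoded embedding and therefore the output of the untouched head.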
Abstract
Interpretable deep learning aims to develop neural architectures whose decision-making processes can be understood by their users. Among these techniques, Concept Bottleneck Models enhance the interpretability of neural networks by integrating a layer of human-understandable concepts. These models, however, require training a new model from scratch, which consumes significant resources and fails to make use of large models that have already been trained. To address this issue, we introduce "AnyCBM", a method that transforms any existing trained model into a Concept Bottleneck Model with minimal impact on computational resources. We provide both theoretical and experimental insights showing the effectiveness of AnyCBMs in terms of classification performance and the effectiveness of concept-based interventions on downstream tasks.
