AnyCBMs: How to Turn Any Black Box into a Concept Bottleneck Model

Gabriele Dominici; Pietro Barbiero; Francesco Giannini; Martin Gjoreski; Marc Langheinrich

TL;DR

AnyCBMs address the challenge of adding interpretability to powerful black-box neural networks by introducing a lightweight external mapping from black-box embeddings to supervised concepts and back to embeddings, effectively creating a Concept Bottleneck Model without retraining the original model. The method relies on a commuting relationship between the black-box transformation and the two auxiliary mappings, enabling concept-based explanations and interventions that align with the black-box decision process. Empirical results on MNIST and CUB show that AnyCBMs match the concept and downstream task performance of standard CBMs and the original black box, and that concept interventions are similarly effective, even when the AnyCBM is trained on a different dataset. This work highlights a practical, resource-efficient path to augment large pre-trained models with interpretable latent spaces and human-in-the-loop control over predictions.
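The pipeline described above (embedding to concepts, concepts back to embedding, then the frozen black-box head) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the class `AnyCBMSketch`, the single affine layers used for both mappings, the sigmoid concept activations, and the `head` callable standing in for the remaining black-box layers. The weights are random and training is omitted; the paper's actual architectures and losses may differ.

```python
import math
import random


def linear(weights, bias, x):
    """Affine map: `weights` is a list of rows, `x` a list of floats."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class AnyCBMSketch:
    """Hypothetical wrapper: embedding -> concepts -> embedding -> frozen head."""

    def __init__(self, emb_dim, n_concepts, head, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            # Random, untrained weights (illustration only).
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        # Assumed linear embedding-to-concept mapping.
        self.enc_w, self.enc_b = init(n_concepts, emb_dim), [0.0] * n_concepts
        # Assumed linear concept-to-embedding mapping.
        self.dec_w, self.dec_b = init(emb_dim, n_concepts), [0.0] * emb_dim
        # Frozen black-box task head; this sketch never modifies it.
        self.head = head

    def concepts(self, h):
        """Predict concept activations in (0, 1) from a black-box embedding."""
        return [sigmoid(z) for z in linear(self.enc_w, self.enc_b, h)]

    def predict(self, h, interventions=None):
        """Return (task output, concepts); `interventions` maps index -> value."""
        c = self.concepts(h)
        for idx, value in (interventions or {}).items():
            c[idx] = value  # a human overrides a predicted concept
        h_rec = linear(self.dec_w, self.dec_b, c)
        return self.head(h_rec), c
```

For example, `AnyCBMSketch(4, 3, head=sum).predict([0.1, 0.2, 0.3, 0.4], {0: 1.0})` fixes the first concept to 1.0 before the embedding is reconstructed, so the frozen head sees an embedding consistent with the corrected concept.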

Abstract

Interpretable deep learning aims at developing neural architectures whose decision-making processes can be understood by their users. Among these techniques, Concept Bottleneck Models enhance the interpretability of neural networks by integrating a layer of human-understandable concepts. These models, however, require training a new model from scratch, which consumes significant resources and fails to make use of large models that are already trained. To address this issue, we introduce "AnyCBM", a method that transforms any existing trained model into a Concept Bottleneck Model with minimal impact on computational resources. We provide both theoretical and experimental insights showing the effectiveness of AnyCBMs in terms of classification performance and the effectiveness of concept-based interventions on downstream tasks.

Paper Structure (18 sections, 2 theorems, 4 equations, 2 figures, 2 tables)

Introduction
Background
AnyCBM: Turning Black Boxes into Concept Bottleneck Models
Case 1: $\phi$ is the identity function on $H$
Case 2: independent training
Experiments
Data & task setup
Evaluation
Baselines
Key findings
AnyCBMs match black-box and CBM performance in terms of classification accuracy on concepts and downstream tasks (Table \ref{tab:task_accuracy})
AnyCBM interventions are as effective as in Concept Bottleneck Models (Figure \ref{fig:interventions})
AnyCBM can be trained with a different dataset from the one used to train the black-box model (Table \ref{tab:ood})
Discussion
Advantages
...and 3 more sections

Key Result

Theorem 3.2

If $\phi$ is the identity function on $H$, then $\psi_y$ is injective:

Figures (2)

Figure 1: Any Concept Bottleneck Models (AnyCBMs) transform any black-box neural architecture into an interpretable CBM by mapping black-box embeddings into a set of supervised concepts and then mapping the predicted concepts back to black-box embeddings. This allows AnyCBMs to be applied to any layer of a trained black box and to support concept-based interventions as in standard CBMs.
Figure 2: Task accuracy of AnyCBMs compared to CBMs after intervening on an increasing number of concept families on the MNIST and CUB datasets.
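
An intervention curve of the kind Figure 2 reports could be computed roughly as follows. This is a sketch under stated assumptions, not the paper's evaluation code: `accuracy_under_interventions`, the `predict(h, interventions)` callable, and the sample layout `(embedding, true_concepts, true_label)` are all hypothetical names introduced here, and concept families are intervened on in list order.

```python
def accuracy_under_interventions(predict, samples, concept_groups):
    """Task accuracy after fixing 0, 1, ..., all concept groups to ground truth.

    predict(h, interventions) -> predicted label, where `interventions`
        maps a concept index to the value it is forced to take.
    samples: list of (embedding, true_concepts, true_label) tuples.
    concept_groups: list of lists of concept indices, one list per family.
    """
    curve = []
    for k in range(len(concept_groups) + 1):
        # Indices of every concept in the first k families.
        fixed = [i for group in concept_groups[:k] for i in group]
        correct = 0
        for h, true_c, y in samples:
            # Replace the selected concepts with their ground-truth values.
            interventions = {i: true_c[i] for i in fixed}
            correct += int(predict(h, interventions) == y)
        curve.append(correct / len(samples))
    return curve
```

The returned list has one entry per number of intervened families, starting with the no-intervention accuracy, so it can be plotted directly against the number of families.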

Theorems & Definitions (5)

Definition 3.1: AnyCBM
Theorem 3.2
proof
Theorem 3.3
proof
