AnyCBMs: How to Turn Any Black Box into a Concept Bottleneck Model

Gabriele Dominici, Pietro Barbiero, Francesco Giannini, Martin Gjoreski, Marc Langheinrich

TL;DR

AnyCBMs address the challenge of adding interpretability to powerful black-box neural networks by introducing a lightweight external mapping from black-box embeddings to supervised concepts and back to embeddings, effectively creating a Concept Bottleneck Model without retraining the original model. The method relies on a commuting relationship between the black-box transformation and the two auxiliary mappings, enabling concept-based explanations and interventions that align with the black-box decision process. Empirical results on MNIST and CUB show that AnyCBMs match the concept and downstream task performance of standard CBMs and the original black box, and that concept interventions are similarly effective, even when the AnyCBM is trained on a different dataset. This work highlights a practical, resource-efficient path to augment large pre-trained models with interpretable latent spaces and human-in-the-loop control over predictions.
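The data flow described above (embedding to concepts, concepts back to embedding, then the frozen black-box head) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: the names `psi_c`, `psi_y`, `black_box_head`, and `anycbm_predict`, the two-dimensional embedding, and all weights are hypothetical and hand-set, whereas in practice the two mappings would be learned from concept-annotated data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, v):
    # Plain matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

# Hypothetical frozen black-box head: embedding -> class index.
# Its weights are never modified by the AnyCBM.
HEAD_W = [[1.0, -1.0], [-1.0, 1.0]]

def black_box_head(h):
    scores = matvec(HEAD_W, h)
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical AnyCBM mappings (hand-set here, learned in practice).
PSI_C_W = [[4.0, 0.0], [0.0, 4.0]]  # embedding -> concept logits
PSI_Y_W = [[1.0, 0.0], [0.0, 1.0]]  # concepts -> embedding

def psi_c(h):
    # Map a black-box embedding to concept probabilities.
    return [sigmoid(z) for z in matvec(PSI_C_W, h)]

def psi_y(c):
    # Map concept probabilities back into the embedding space.
    return matvec(PSI_Y_W, c)

def anycbm_predict(h, interventions=None):
    # Predict concepts, optionally overwrite some with user-supplied
    # values, then route the result through the frozen head.
    c = psi_c(h)
    for idx, value in (interventions or {}).items():
        c[idx] = value
    return c, black_box_head(psi_y(c))

h = [0.2, 0.5]
concepts, label = anycbm_predict(h)
_, label_after = anycbm_predict(h, interventions={1: 0.0})
print(concepts, label, label_after)
```

In this toy setting the round trip through the concept space leaves the head's prediction unchanged for the example embedding, while overriding the second concept flips it. That is the kind of human-in-the-loop intervention the summary refers to.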

Abstract

Interpretable deep learning aims at developing neural architectures whose decision-making processes can be understood by their users. Among these techniques, Concept Bottleneck Models enhance the interpretability of neural networks by integrating a layer of human-understandable concepts. These models, however, require training a new model from scratch, consuming significant resources and failing to utilize already trained large models. To address this issue, we introduce "AnyCBM", a method that transforms any existing trained model into a Concept Bottleneck Model with minimal impact on computational resources. We provide both theoretical and experimental insights showing the effectiveness of AnyCBMs in terms of classification performance and the effectiveness of concept-based interventions on downstream tasks.

Paper Structure (18 sections, 2 theorems, 4 equations, 2 figures, 2 tables)

Key Result

Theorem 3.2

If $\phi$ is the identity function on $H$, then $\psi_y$ is injective:

Figures (2)

  • Figure 1: Any Concept Bottleneck Models (AnyCBMs) transform any black-box neural architecture into an interpretable CBM by mapping black-box embeddings into a set of supervised concepts and then mapping the predicted concepts back to black-box embeddings. This allows AnyCBMs to be applied to any layer of a trained black box and to perform concept-based interventions as in standard CBMs.
  • Figure 2: Task accuracy of AnyCBMs compared to CBMs after intervening on an increasing number of concept families on the MNIST and CUB datasets.

Theorems & Definitions (5)

  • Definition 3.1: AnyCBM
  • Theorem 3.2
  • proof
  • Theorem 3.3
  • proof