Concept Bottleneck Models Without Predefined Concepts

Simon Schrodi, Julian Schur, Max Argus, Thomas Brox

TL;DR

This work addresses interpretability in concept bottleneck models by removing the need for predefined or human-annotated concepts. It introduces Unsupervised Concept Bottleneck Models (UCBMs), which learn a compact concept space via dictionary learning from a pretrained model's activations and pair it with an interpretable, sparse classifier that uses an input-dependent gating mechanism to limit concept usage across all classes. The approach yields competitive performance with dramatically higher sparsity (e.g., ~0.7% of concepts per input on ImageNet) and provides explainable decisions at the concept level; it also demonstrates how large vision-language models can guide weight edits to fix misclassifications. Overall, UCBMs offer a scalable, interpretable alternative to black-box models and open the door to programmable model editing via external multimodal guidance.
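The dictionary-learning step can be illustrated with a small sketch. It uses non-negative matrix factorization with multiplicative updates as a stand-in; the summary above only says "dictionary learning", so the choice of factorization, the function name `discover_concepts`, the non-negativity of the activations, and all sizes are assumptions made for illustration.

```python
import numpy as np

def discover_concepts(activations, n_concepts, n_iters=200, seed=0, eps=1e-9):
    """Factorize non-negative activations A ~= U @ V (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n, d = activations.shape
    U = rng.random((n, n_concepts)) + eps   # per-input concept coefficients
    V = rng.random((n_concepts, d)) + eps   # dictionary, one row per concept
    for _ in range(n_iters):
        # Multiplicative updates keep both factors non-negative.
        V *= (U.T @ activations) / (U.T @ U @ V + eps)
        U *= (activations @ V.T) / (U @ V @ V.T + eps)
    # Rescale each dictionary row to unit length to obtain concept vectors.
    concepts = V / (np.linalg.norm(V, axis=1, keepdims=True) + eps)
    return concepts, U, V

# Toy non-negative "activations" (assumed, e.g. post-ReLU features).
rng = np.random.default_rng(1)
acts = np.abs(rng.normal(size=(64, 32)))
concepts, U, V = discover_concepts(acts, n_concepts=5)
rel_error = np.linalg.norm(acts - U @ V) / np.linalg.norm(acts)
```

Each row of `concepts` is a unit-length direction in activation space, and `rel_error` gives a rough measure of how much of the activations the dictionary reconstructs.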

Abstract

There has been considerable recent interest in interpretable concept-based models such as Concept Bottleneck Models (CBMs), which first predict human-interpretable concepts and then map them to output classes. To reduce reliance on human-annotated concepts, recent works have converted pretrained black-box models into interpretable CBMs post-hoc. However, these approaches predefine a set of concepts, assuming which concepts a black-box model encodes in its representations. In this work, we eliminate this assumption by leveraging unsupervised concept discovery to automatically extract concepts without human annotations or a predefined set of concepts. We further introduce an input-dependent concept selection mechanism that ensures only a small subset of concepts is used across all classes. We show that our approach improves downstream performance and narrows the performance gap to black-box models, while using significantly fewer concepts in the classification. Finally, we demonstrate how large vision-language models can intervene on the final model weights to correct model errors.
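As a rough illustration of the pipeline described in the abstract, the sketch below projects activations onto concept vectors, applies an input-dependent gate, and feeds the result to a linear layer. The thresholded gate, the function name `ucbm_forward`, and all shapes and values are assumptions made for this example, not the paper's confirmed formulation.

```python
import numpy as np

def ucbm_forward(activations, concepts, offsets, weights, bias):
    """Hypothetical forward pass: project, gate, classify."""
    # Normalize both sides so the projection is a cosine similarity.
    c = concepts / np.linalg.norm(concepts, axis=1, keepdims=True)
    a = activations / np.linalg.norm(activations, axis=1, keepdims=True)
    sims = a @ c.T                          # (n_inputs, n_concepts)
    # Assumed gate: a concept is used only if its similarity exceeds
    # a per-concept offset, so the active set depends on the input.
    gated = np.maximum(sims - offsets, 0.0)
    # Linear layer on the gated concepts (sparse after training in practice).
    logits = gated @ weights.T + bias       # (n_inputs, n_classes)
    return sims, gated, logits

rng = np.random.default_rng(0)
acts = rng.normal(size=(4, 16))        # 4 inputs, 16-dim activations
concepts = rng.normal(size=(8, 16))    # 8 concept vectors
offsets = np.full(8, 0.2)              # illustrative gate thresholds
weights = rng.normal(size=(3, 8))      # 3 classes
bias = np.zeros(3)
sims, gated, logits = ucbm_forward(acts, concepts, offsets, weights, bias)
```

Because the gate zeroes out concepts per input, the explanation for a single prediction only involves the concepts whose entries in `gated` are non-zero.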

Paper Structure

This paper contains 31 sections, 4 equations, 11 figures, and 2 tables.

Figures (11)

  • Figure 1: Overview of UCBM. In the first step (top), we extract concepts (represented as normalized vectors) using an unsupervised concept discovery method. Given an input sample, we project the activations on the space spanned by these concept vectors (middle at the bottom). Finally, we train an interpretable classifier consisting of an input-dependent concept selection mechanism and sparse linear layer (middle to right at the bottom).
  • Figure 2: Concepts discovered in an unsupervised manner exhibit faithful behavior. We remove image parts and observe the change in activation-concept cosine similarities for a chainsaw and an ostrich (one panel each).
  • Figure 3: UCBM Pareto-dominates the baselines. We varied the number of available concepts. As expected, we found that the more available concepts the better the downstream performance. Importantly, UCBM Pareto-dominates the baseline methods.
  • Figure 4: The lower performance of UCBM relative to UCBM without concept selection & dropout (reported in the downstream performance table) is due to the increased sparsity. We plot the mean number of active concepts per input as we make $\lambda_\pi$ smaller and reduce the dropout rate (we only plot the Pareto-optimal points). The dotted black lines serve only as visual guides. We find that UCBM closes the gap to UCBM without concept selection & dropout as we move towards less sparsity.
  • Figure 5: Sensitivity analysis over $\lambda_w$, $\lambda_\pi$, and the dropout rate for ImageNet (one panel each). Stronger regularization strengths ($\lambda_w$, $\lambda_\pi$) lead to worse downstream task performance. For dropout there exists a sweet spot. Results for the other datasets are provided in a separate sensitivity analysis section.
  • ...and 6 more figures
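
The sparsity measure referred to in the Figure 4 caption (mean number of active concepts per input) can be computed as in the minimal sketch below. It assumes a concept counts as "active" when its gated value is non-zero; the helper name `mean_active_concepts` and the toy matrix are hypothetical.

```python
import numpy as np

def mean_active_concepts(gated):
    """Return the mean count and mean fraction of active concepts per input."""
    active = gated != 0                 # assumed definition of "active"
    per_input = active.sum(axis=1)      # number of active concepts per input
    mean_count = per_input.mean()
    return mean_count, mean_count / gated.shape[1]

# Toy gated concept values: 3 inputs, 4 concepts.
gated = np.array([[0.0, 0.3, 0.0, 0.0],
                  [0.1, 0.0, 0.0, 0.2],
                  [0.0, 0.0, 0.0, 0.0]])
mean_count, mean_fraction = mean_active_concepts(gated)
# mean_count is 1.0 (counts 1, 2, 0) and mean_fraction is 0.25
```

A figure such as the roughly 0.7% of concepts per input quoted for ImageNet corresponds to `mean_fraction` expressed as a percentage.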