Flexible Concept Bottleneck Model

Xingbo Du, Qiantong Dou, Lei Fan, Rui Zhang

TL;DR

FCBM addresses rigidity in VLM-based CBMs by enabling dynamic concept pools through a hypernetwork that predicts class weights from concept text features, and a learnable-temperature sparsemax to select salient concepts. The method decouples concept embeddings from their contributions and aligns training/inference distributions to support zero-shot generalization to unseen concepts. On five benchmarks with ResNet50 and ViT-L/14, FCBM achieves competitive accuracy with a similar number of effective concepts and demonstrates rapid adaptation to new concept pools with only one epoch of fine-tuning. This flexibility makes FCBM well-suited for real-world, rapidly evolving domains where concept sets change over time.

Abstract

Concept bottleneck models (CBMs) improve neural network interpretability by introducing an intermediate layer that maps human-understandable concepts to predictions. Recent work has explored the use of vision-language models (VLMs) to automate concept selection and annotation. However, existing VLM-based CBMs typically require full model retraining when new concepts are involved, which limits their adaptability and flexibility in real-world scenarios, especially considering the rapid evolution of vision-language foundation models. To address these issues, we propose Flexible Concept Bottleneck Model (FCBM), which supports dynamic concept adaptation, including complete replacement of the original concept set. Specifically, we design a hypernetwork that generates prediction weights based on concept embeddings, allowing seamless integration of new concepts without retraining the entire model. In addition, we introduce a modified sparsemax module with a learnable temperature parameter that dynamically selects the most relevant concepts, enabling the model to focus on the most informative features. Extensive experiments on five public benchmarks demonstrate that our method achieves accuracy comparable to state-of-the-art baselines with a similar number of effective concepts. Moreover, the model generalizes well to unseen concepts with just a single epoch of fine-tuning, demonstrating its strong adaptability and flexibility.
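
To make the two mechanisms described in the abstract concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes a single linear hypernetwork (the hypothetical `hyper_matrix`), applies a standard sparsemax over the concepts of each class, and uses a fixed `temperature` where FCBM learns it. The paper's actual hypernetwork architecture and its modified sparsemax are not specified in this summary, so all function and parameter names here are illustrative.

```python
def sparsemax(scores, temperature=1.0):
    """Standard sparsemax of scores / temperature (projection onto the simplex).

    Assumption: the temperature is a fixed positive float here; in FCBM it is
    described as a learnable parameter.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    z = [s / temperature for s in scores]
    z_sorted = sorted(z, reverse=True)
    cumsum, tau = 0.0, 0.0
    for k, zk in enumerate(z_sorted, start=1):
        cumsum += zk
        if 1.0 + k * zk > cumsum:
            tau = (cumsum - 1.0) / k  # threshold for a support of size k
        else:
            break  # the condition fails for every larger k as well
    return [max(v - tau, 0.0) for v in z]


def predict_class_weights(concept_text_features, hyper_matrix, temperature=1.0):
    """Hypothetical linear hypernetwork: one weight per (class, concept) pair.

    The weights are computed from concept text features, so the number of
    concepts can change without changing the hypernetwork parameters.
    """
    weights = []
    for class_row in hyper_matrix:
        raw = [sum(h * t for h, t in zip(class_row, feat))
               for feat in concept_text_features]
        weights.append(sparsemax(raw, temperature))  # sparsify per class
    return weights


def class_logits(concept_activations, class_weights):
    """Class scores as a weighted sum of concept activations."""
    return [sum(w * a for w, a in zip(row, concept_activations))
            for row in class_weights]


# Toy data: 2 classes, 2-dimensional text features, 3 concepts.
hyper_matrix = [[1.0, 0.0], [0.0, 1.0]]
concepts = [[3.0, 0.0], [0.0, 3.0], [1.0, 1.0]]
weights = predict_class_weights(concepts, hyper_matrix)

# Adding a concept only adds a row of text features; hyper_matrix is unchanged.
extended = concepts + [[2.0, 2.0]]
new_weights = predict_class_weights(extended, hyper_matrix)
```

In this sketch, lowering `temperature` sharpens the scores so that fewer concepts keep a non-zero weight, while raising it spreads weight over more concepts. Because the weights are a function of the text features, swapping or extending the concept pool changes only the input to `predict_class_weights`.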

Paper Structure

This paper contains 26 sections, 15 equations, 6 figures, 3 tables, and 1 algorithm.

Figures (6)

  • Figure 1: Difference between the vanilla CBM and FCBM.
  • Figure 2: The pipeline of FCBM consists of four key components: a) A two-stage learning framework; b) Concept sets generated by LLMs, which are used to form CLIP-derived features; c) A hypernetwork that generates weights based on text features; and d) A tailored sparsemax module that enforces sparsity in the weights during both training and inference.
  • Figure 3: Adaptability of different concept pools across five datasets using ResNet50 and ViT-L/14 backbones. For each subfigure, we test the accuracy of FCBM using three types of concepts: the trained concepts, the LLM-generated concepts without training (zero-shot), and the LLM-generated concepts with only one epoch of fine-tuning (finetuned). DeepSeek-V3 (a-e) and GPT-4o (f-j) are employed as the LLM backbones, respectively.
  • Figure 4: The adaptability of FCBM across different concept pools on the Places365 dataset. The left histograms illustrate the predictions made using the trained concepts, while the right histograms show the predictions based on the DeepSeek-V3-generated concepts. The left image belongs to the ground-truth class, i.e., 'campus'.
  • Figure 5: Sparsity analysis. We test the accuracy of FCBM with NEC = 30, 50, 100, and full concepts across five datasets using ResNet50 and ViT-L/14 backbones.
  • ...and 1 more figure