Towards Faithful Multimodal Concept Bottleneck Models

Pierre Moreau, Emeline Pineau Ferrand, Yann Choho, Benjamin Wong, Annabelle Blangero, Milan Bhan

Abstract

Concept Bottleneck Models (CBMs) are interpretable models that route predictions through a layer of human-interpretable concepts. While widely studied in vision and, more recently, in NLP, CBMs remain largely unexplored in multimodal settings. For their explanations to be faithful, CBMs must satisfy two conditions: concepts must be properly detected, and concept representations must encode only their intended semantics. Otherwise, extraneous task-relevant or inter-concept information can be smuggled into the final predictions, a phenomenon known as leakage. Existing approaches treat concept detection and leakage mitigation as separate problems and typically improve one at the expense of predictive accuracy. In this work, we introduce f-CBM, a faithful multimodal CBM framework built on a vision-language backbone that targets both aspects jointly through two complementary strategies: a differentiable leakage loss to mitigate leakage, and a Kolmogorov-Arnold Network (KAN) prediction head that provides sufficient expressiveness to improve concept detection. Experiments demonstrate that f-CBM achieves the best trade-off among task accuracy, concept detection, and leakage reduction, and applies seamlessly to both image-text and text-only datasets, making it versatile across modalities.
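The pipeline outlined in the abstract (backbone embeddings, then a concept layer, then a KAN-style prediction head, trained with an additional leakage term) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the embeddings `z` stand in for the output of a vision-language encoder, the KAN edge functions use Gaussian radial basis functions instead of the B-splines of the original KAN formulation, and `leakage_penalty` is a hypothetical proxy (squared correlation between concept residuals and task labels), since the exact form of the f-CBM leakage loss is not given here. The loss weights `lam_c` and `lam_leak` are illustrative.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def silu(x):
    return x * sigmoid(x)


def concept_layer(z, W_c, b_c):
    """Map backbone embeddings z (batch, d) to concept activations in (0, 1)."""
    return sigmoid(z @ W_c + b_c)


class KANStyleHead:
    """Simplified KAN-style layer: one learnable univariate function per edge.

    Each edge function phi_{o,i}(c_i) is a SiLU base term plus a weighted sum of
    Gaussian RBFs (a stand-in for B-splines); logits sum the edges over inputs i.
    """

    def __init__(self, n_in, n_out, n_basis=8, seed=0):
        rng = np.random.default_rng(seed)
        self.centers = np.linspace(0.0, 1.0, n_basis)  # grid over the concept range
        self.width = self.centers[1] - self.centers[0]
        self.base_w = rng.normal(scale=0.1, size=(n_out, n_in))
        self.coef = rng.normal(scale=0.1, size=(n_out, n_in, n_basis))

    def __call__(self, c):
        # rbf[b, i, m] = basis function m evaluated at concept i of sample b
        rbf = np.exp(-(((c[..., None] - self.centers) / self.width) ** 2))
        spline_part = np.einsum("bim,oim->bo", rbf, self.coef)
        base_part = silu(c) @ self.base_w.T
        return base_part + spline_part  # (batch, n_out) task logits


def leakage_penalty(c_pred, c_true, y_onehot, eps=1e-8):
    """HYPOTHETICAL leakage proxy: mean squared correlation between concept
    residuals (c_pred - c_true) and one-hot task labels. Residual information
    that predicts the label is treated as concept-task leakage."""
    r = c_pred - c_true
    r = (r - r.mean(0)) / (r.std(0) + eps)
    y = (y_onehot - y_onehot.mean(0)) / (y_onehot.std(0) + eps)
    corr = r.T @ y / len(r)  # (n_concepts, n_classes)
    return float(np.mean(corr**2))


def cross_entropy(logits, labels):
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def binary_cross_entropy(p, t, eps=1e-7):
    p = np.clip(p, eps, 1 - eps)
    return float(-(t * np.log(p) + (1 - t) * np.log(1 - p)).mean())


def cbm_loss(z, labels, c_true, W_c, b_c, head, lam_c=1.0, lam_leak=0.1):
    """Task loss + concept supervision + weighted leakage penalty (illustrative weights)."""
    c_pred = concept_layer(z, W_c, b_c)
    logits = head(c_pred)
    y_onehot = np.eye(logits.shape[1])[labels]
    return (cross_entropy(logits, labels)
            + lam_c * binary_cross_entropy(c_pred, c_true)
            + lam_leak * leakage_penalty(c_pred, c_true, y_onehot))
```

For training, the same expressions would be written in an autodiff framework so that the penalty is actually differentiated; NumPy is used here only to keep the sketch dependency-light. The RBF basis is one common simplification of KAN layers; a spline-based edge function would follow the same per-edge structure.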

Paper Structure

This paper contains 31 sections, 7 equations, 7 figures, and 2 tables.

Figures (7)

  • Figure 1: Pareto frontier of concept detection performance versus aggregate leakage. The x-axis shows the average of task-related and inter-concept leakage, as introduced in the background and related-work section; the y-axis shows concept detection error (RMSE).
  • Figure 2: Leakage analysis in multimodal CBMs
  • Figure 3: Overview of f-CBM, illustrated on an instance from the N24News dataset belonging to the Sport category.
  • Figure 4: Ablation study of f-CBM: effect of the KAN layer and the leakage loss on task accuracy, concept RMSE, and leakage (N24News dataset, CLIP-base backbone).
  • Figure 5: Effect of the leakage loss on concept activation distributions. Early in training (left), activations separate by predicted class, revealing concept-task leakage. Later in training (right), concept detection has improved and the leakage loss has reduced the class information encoded in the activations, mitigating leakage.
  • ...and 2 more figures
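The quantities plotted in Figure 1 can be illustrated with a small sketch. Following the caption, a model's aggregate leakage is the average of its task-related and inter-concept leakage, and concept detection is measured by RMSE. Both are lower-is-better, so a model lies on the Pareto frontier when no other model is at least as good on both axes and strictly better on one. The sketch assumes the two leakage scores are already computed; the function names and any example values are hypothetical placeholders, not results from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def concept_rmse(pred: Sequence[float], true: Sequence[float]) -> float:
    """Root-mean-square error between predicted and ground-truth concept scores."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred))


def aggregate_leakage(task_leak: float, inter_concept_leak: float) -> float:
    """Average of task-related and inter-concept leakage (the x-axis of Figure 1)."""
    return (task_leak + inter_concept_leak) / 2.0


def pareto_front(points: Dict[str, Tuple[float, float]]) -> List[str]:
    """Names of non-dominated models when both coordinates
    (aggregate leakage, concept RMSE) are minimized, sorted by leakage."""
    front = []
    for name, (x, y) in points.items():
        dominated = any(
            (ox <= x and oy <= y) and (ox < x or oy < y)
            for other, (ox, oy) in points.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front, key=lambda n: points[n][0])
```

For example, with placeholder points `{"A": (0.1, 0.30), "B": (0.2, 0.20), "C": (0.3, 0.25)}`, model C is dominated by B (lower leakage and lower RMSE), so `pareto_front` returns `["A", "B"]`.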