Towards Robust and Reliable Concept Representations: Reliability-Enhanced Concept Embedding Model
Yuxuan Cai, Xiyu Wang, Satoshi Tsutsui, Winnie Pang, Bihan Wen
TL;DR
This work tackles reliability gaps in concept bottleneck models (CBMs) by addressing two core weaknesses: sensitivity to concept-irrelevant background features and semantic inconsistency of the same concept across samples. It introduces RECEM, a framework with two components: Concept-Level Disentanglement, which uses a Gradient Reversal Layer and HSIC regularization, and Concept Mixup, which aligns concept semantics across samples. The model is optimized with a joint loss that includes reconstruction and alignment terms. Empirical results on CUB, TravelingBirds, CelebA, and AwA2 show that RECEM surpasses strong baselines in concept accuracy, task accuracy, and Concept Alignment Score (CAS), particularly under background shifts and incomplete annotations, while enabling more faithful human interventions. The findings underscore the value of disentangling nuisance information and aligning concept semantics to improve both interpretability and robustness in CBMs for real-world deployment.
Abstract
Concept Bottleneck Models (CBMs) aim to enhance interpretability by predicting human-understandable concepts as intermediates for decision-making. However, these models often struggle to produce reliable concept representations, and the resulting errors can propagate to downstream tasks and undermine robustness, especially under distribution shifts. Two inherent issues contribute to concept unreliability: sensitivity to concept-irrelevant features (e.g., background variations) and lack of semantic consistency for the same concept across different samples. To address these limitations, we propose the Reliability-Enhanced Concept Embedding Model (RECEM), which introduces a two-fold strategy: Concept-Level Disentanglement to separate irrelevant features from concept-relevant information and a Concept Mixup mechanism to ensure semantic alignment across samples. These mechanisms work together to improve concept reliability, enabling the model to focus on meaningful object attributes and generate faithful concept representations. Experimental results demonstrate that RECEM consistently outperforms existing baselines across multiple datasets, showing superior performance under background and domain shifts. These findings highlight the effectiveness of disentanglement and alignment strategies in enhancing both reliability and robustness in CBMs.
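The TL;DR names HSIC (Hilbert-Schmidt Independence Criterion) regularization as part of Concept-Level Disentanglement. As a rough illustration of what such a penalty measures, the sketch below computes the standard biased empirical HSIC estimate, trace(K H L H) / (n - 1)^2, between two batches of vectors. The summary does not state which kernel, bandwidth, or estimator RECEM uses, nor exactly which pair of representations is penalized, so the Gaussian kernel, the `sigma` parameter, and the function names here are all illustrative assumptions, not the authors' implementation.

```python
import math


def rbf_kernel_matrix(xs, sigma=1.0):
    """Gaussian (RBF) kernel matrix for a list of equal-length vectors.

    The kernel choice and bandwidth are assumptions made for this sketch.
    """
    n = len(xs)
    k = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(xs[i], xs[j]))
            k[i][j] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return k


def center(k):
    """Double-center a kernel matrix: H K H with H = I - (1/n) 1 1^T."""
    n = len(k)
    row_means = [sum(row) / n for row in k]
    col_means = [sum(k[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_means) / n
    return [
        [k[i][j] - row_means[i] - col_means[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def hsic(xs, ys, sigma=1.0):
    """Biased empirical HSIC between paired samples xs[i] and ys[i]."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must be paired and contain at least 2 samples")
    k_centered = center(rbf_kernel_matrix(xs, sigma))
    l = rbf_kernel_matrix(ys, sigma)
    # trace(K H L H) equals trace((H K H) L) by cyclicity of the trace.
    trace = sum(k_centered[i][j] * l[j][i] for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2


if __name__ == "__main__":
    # Hypothetical toy batches standing in for two kinds of embeddings.
    concept_part = [[0.0], [1.0], [2.0], [3.0]]
    constant_part = [[5.0], [5.0], [5.0], [5.0]]  # carries no information
    print("HSIC with a copy of itself:", hsic(concept_part, concept_part))
    print("HSIC with a constant batch:", hsic(concept_part, constant_part))
```

Under these assumptions, the estimate is clearly positive when the second batch is a copy of the first and approximately zero when the second batch is constant. Used as a regularizer, a term of this kind would be minimized so that one representation carries as little statistical dependence on the other as possible.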
