Improving Intervention Efficacy via Concept Realignment in Concept Bottleneck Models

Nishad Singhi, Jae Myung Kim, Karsten Roth, Zeynep Akata

TL;DR

This work tackles the practical inefficiency of test-time interventions in concept-based models by introducing a lightweight Concept Intervention Realignment Module (CIRM) that leverages inter-concept relationships to update non-intervened concepts after each intervention. The CIRM, combined with an intervention policy, can be trained post hoc or end-to-end and is compatible with CBMs and Concept Embedding Models (CEMs), including intervention-aware variants such as IntCEMs. Across CUB, CelebA, and AwA2, the CIRM substantially improves concept attribution and final classification accuracy with far fewer interventions (often reducing the number of required interventions by over 70%), highlighting its potential to reduce human-in-the-loop costs in real-world deployments. Extensive ablations and analyses of policy trajectories show that the approach is robust across architectures and training schemes, which makes it practical for resource-constrained settings. Code is available at the repository linked in the abstract.
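The "fewer interventions" claim compares how many concept corrections each model needs before it reaches a target accuracy. A minimal sketch of that bookkeeping, assuming an accuracy curve indexed by the number of intervened concepts; the function names `interventions_to_reach` and `relative_reduction` and both curves are hypothetical illustrations, not numbers from the paper (the toy curves happen to give a 60% reduction):

```python
from typing import Optional, Sequence


def interventions_to_reach(accuracy_curve: Sequence[float], target: float) -> Optional[int]:
    """Smallest t with accuracy_curve[t] >= target, where index t is the number
    of intervened concepts (t = 0 means no interventions). None if never reached."""
    for t, acc in enumerate(accuracy_curve):
        if acc >= target:
            return t
    return None


def relative_reduction(baseline: int, realigned: int) -> float:
    """Fraction of interventions saved by realignment: 1 - realigned / baseline."""
    return 1.0 - realigned / baseline


# Hypothetical accuracy curves, for illustration only (not results from the paper).
baseline_curve = [0.60, 0.64, 0.68, 0.71, 0.74, 0.77, 0.80, 0.83, 0.86, 0.88, 0.90]
realigned_curve = [0.60, 0.75, 0.84, 0.89, 0.91, 0.93, 0.94, 0.95, 0.95, 0.96, 0.96]

n_base = interventions_to_reach(baseline_curve, target=0.90)
n_real = interventions_to_reach(realigned_curve, target=0.90)
print(n_base, n_real, f"{relative_reduction(n_base, n_real):.0%}")  # 10 4 60%
```

Measuring against a fixed target, rather than comparing accuracies at a fixed budget, matches the framing of "interventions needed to reach a target performance".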

Abstract

Concept Bottleneck Models (CBMs) ground image classification on human-understandable concepts to allow for interpretable model decisions. Crucially, the CBM design inherently allows for human interventions, in which expert users can modify potentially misaligned concept predictions to influence the model's decision behavior in an interpretable fashion. However, existing approaches often require numerous human interventions per image to achieve strong performance, which is impractical in scenarios where human feedback is expensive. In this paper, we find that this is driven in large part by the independent treatment of concepts during intervention: changing one concept does not influence how the model uses the other concepts in its final decision. To address this issue, we introduce a trainable concept intervention realignment module, which leverages concept relations to realign concept assignments post-intervention. Across standard, real-world benchmarks, we find that concept realignment can significantly improve intervention efficacy, substantially reducing the number of interventions needed to reach a target classification performance or concept prediction accuracy. In addition, it easily integrates into existing concept-based architectures without requiring changes to the models themselves. This reduced cost of human-model collaboration is crucial for enhancing the feasibility of CBMs in resource-constrained environments. Our code is available at: https://github.com/ExplainableML/concept_realignment.
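A minimal sketch of a single realignment step, assuming one linear layer with a sigmoid as a stand-in for the trainable module (the figure captions mention MLP and LSTM variants, which this does not reproduce). The weights `W` and bias `b` are hypothetical and hand-picked so that concept 1 tends to co-occur with concept 0 while concept 2 tends to exclude it; in practice they would be learned. After the module runs, the intervened entries are reset to the user's values so realignment never overwrites a correction:

```python
import math
from typing import Dict, List, Sequence


def realign(c_tilde: Sequence[float],
            weights: Sequence[Sequence[float]],
            bias: Sequence[float],
            intervened: Dict[int, float]) -> List[float]:
    """Hypothetical realignment u(c_tilde): sigmoid(W @ c_tilde + b), followed by
    re-imposing the ground-truth values the user supplied for intervened concepts."""
    n = len(c_tilde)
    realigned = []
    for i in range(n):
        z = sum(weights[i][j] * c_tilde[j] for j in range(n)) + bias[i]
        realigned.append(1.0 / (1.0 + math.exp(-z)))
    for i, value in intervened.items():
        realigned[i] = value  # retain the user's correction
    return realigned


c_hat = [0.3, 0.4, 0.5]   # hypothetical predicted concept probabilities g(x)
intervened = {0: 1.0}     # user confirms that concept 0 is present
c_tilde = list(c_hat)
for i, value in intervened.items():
    c_tilde[i] = value    # intervention: overwrite with the ground-truth value

W = [[1.0, 0.0, 0.0],     # hypothetical weights, not learned
     [4.0, 1.0, 0.0],
     [-4.0, 0.0, 1.0]]
b = [0.0, -2.0, 2.0]

u_c = realign(c_tilde, W, b, intervened)
print([round(v, 3) for v in u_c])  # [1.0, 0.917, 0.182]
```

Without realignment, concept 1 would stay at its predicted 0.4 even though the user just confirmed concept 0; the realigned vector moves it towards 1 and concept 2 towards 0.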

Paper Structure (30 sections, 5 equations, 11 figures, 4 tables, 2 algorithms)

Figures (11)

  • Figure 1: Concept-based classification models allow for human intervention, where a human expert can correct specifically assigned concepts. However, to achieve satisfactory performance, concept-based classification models often require a large number of interventions, where each additional intervention requires costly human interaction.
  • Figure 2: Illustration of the concept intervention realignment module. Given the concept encoding $g(x)$, we intervene on the concept $i$ selected by a concept selection policy $\pi$. This concept is replaced with a ground-truth (GT) value ($\in \{0,1\}$ depending on whether it is present in a given image or not) to obtain $\tilde{c}_t$ (representing intervention step $t \in \{1, ..., T\}$). This intervened concept representation is then passed into the concept realignment module (leveraging e.g. an MLP or LSTM reweighting mode), which outputs the realigned $u(\tilde{c}_t)$. To ensure that the ground-truth values provided by the user are not overwritten during realignment, $u(\tilde{c}_t)$ retains ground-truth corrections. The final concept vector is then passed into a concept-based classifier $f$.
  • Figure 3: Concept prediction loss vs. the number of intervened concepts with and without concept realignment. Concept realignment consistently improves concept predictions.
  • Figure 4: Classification accuracy vs. the number of intervened concepts with and without concept realignment. Realignment consistently improves classification accuracy.
  • Figure 5: Concept Intervention Realignment in intervention-aware CEMs. (a) Concept prediction loss and (b) classification accuracy with jointly and post hoc trained CIRMs. In both cases, realignment yields significant benefits, especially for correct concept attribution after intervention.
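The procedure in Figure 2 can be sketched as a loop: select a concept with the policy $\pi$, replace it with its ground-truth value, realign, re-impose all corrections, and record the concept prediction loss after each step (the quantity plotted in Figure 3). Everything below is an assumption for illustration: `uncertainty_policy` (pick the concept whose probability is closest to 0.5) is a hypothetical stand-in for the policy, and `toy_realigner` hard-codes one co-occurrence rule instead of a learned module. Passing an identity function as `u` gives the no-realignment baseline.

```python
import math
from typing import Callable, List, Sequence, Set

Vector = List[float]
Realigner = Callable[[Vector, Set[int]], Vector]


def concept_loss(pred: Sequence[float], target: Sequence[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between concept probabilities and ground truth."""
    total = 0.0
    for p, y in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(pred)


def uncertainty_policy(c: Sequence[float], done: Set[int]) -> int:
    """Hypothetical policy: the not-yet-intervened concept closest to 0.5."""
    candidates = [i for i in range(len(c)) if i not in done]
    return min(candidates, key=lambda i: abs(c[i] - 0.5))


def toy_realigner(c: Vector, done: Set[int]) -> Vector:
    """Hypothetical u: assumes concepts 0 and 1 always co-occur, so a corrected
    value is copied to its partner if the partner was not intervened on."""
    out = list(c)
    for src, dst in ((0, 1), (1, 0)):
        if src in done and dst not in done:
            out[dst] = out[src]
    return out


def run_interventions(g_x: Sequence[float], c_true: Sequence[float],
                      T: int, u: Realigner) -> List[float]:
    """Return the concept loss after t = 0, 1, ..., T interventions."""
    c = list(g_x)
    done: Set[int] = set()
    losses = [concept_loss(c, c_true)]
    for _ in range(min(T, len(c))):
        i = uncertainty_policy(c, done)
        done.add(i)
        c[i] = c_true[i]   # intervention: replace with the ground-truth value
        c = u(c, done)     # realign the remaining concepts
        for j in done:     # retain every user correction
            c[j] = c_true[j]
        losses.append(concept_loss(c, c_true))
    return losses


g_x = [0.45, 0.2, 0.3, 0.1]     # hypothetical predicted concept probabilities
c_true = [1.0, 1.0, 0.0, 0.0]   # hypothetical ground-truth concepts
with_realign = run_interventions(g_x, c_true, T=3, u=toy_realigner)
without = run_interventions(g_x, c_true, T=3, u=lambda c, done: list(c))
print([round(v, 3) for v in with_realign])
print([round(v, 3) for v in without])
```

In this toy setup, the first intervention on concept 0 also fixes concept 1 under realignment, so the loss drops faster than in the baseline, mirroring the qualitative trend the figures describe.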