Learning to Intervene on Concept Bottlenecks

David Steinmann, Wolfgang Stammer, Felix Friedrich, Kristian Kersting

TL;DR

This work extends concept bottleneck models (CBMs) into concept bottleneck memory models (CB2Ms), which add a two-fold memory that stores past mistakes and the interventions that corrected them. The memory lets the model automatically generalize prior interventions to unseen data and detect likely bottleneck errors, so that human feedback can be requested where it is most needed. On tasks with distribution shifts and confounded data, CB2Ms improve concept and task accuracy and robustly detect model mistakes. The approach offers a practical route to continual, data-efficient, interactive correction of concept bottlenecks.

Abstract

While deep learning models often lack interpretability, concept bottleneck models (CBMs) provide inherent explanations via their concept representations. Moreover, they allow users to perform interventional interactions on these concepts by updating the concept values and thus correcting the predictive output of the model. Up to this point, these interventions were typically applied to the model just once and then discarded. To rectify this, we present concept bottleneck memory models (CB2Ms), which keep a memory of past interventions. Specifically, CB2Ms leverage a two-fold memory to generalize interventions to appropriate novel situations, enabling the model to identify errors and reapply previous interventions. This way, a CB2M learns to automatically improve model performance from a few initially obtained interventions. If no prior human interventions are available, a CB2M can detect potential mistakes of the CBM bottleneck and request targeted interventions. Our experimental evaluations on challenging scenarios like handling distribution shifts and confounded data demonstrate that CB2Ms are able to successfully generalize interventions to unseen data and can indeed identify wrongly inferred concepts. Hence, CB2Ms are a valuable tool for users to provide interactive feedback on CBMs, by guiding a user's interaction and requiring fewer interventions.
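
To make the notion of an intervention concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy bottleneck and predictor with hand-written rules; the function names `bottleneck`, `predictor`, and `intervene`, as well as the thresholds, are hypothetical and chosen only for illustration.

```python
# Toy concept bottleneck model. All names, rules, and values are hypothetical.

def bottleneck(x):
    """Map raw input features to binary concept values (toy threshold rule)."""
    return [1.0 if v > 0.5 else 0.0 for v in x]

def predictor(concepts):
    """Predict the class from the concepts only (toy rule: class 1 if at least 2 concepts are active)."""
    return 1 if sum(concepts) >= 2 else 0

def intervene(concepts, corrections):
    """Overwrite selected concept values with user-provided ones; returns a new list."""
    fixed = list(concepts)
    for idx, value in corrections.items():
        fixed[idx] = value
    return fixed

x = [0.9, 0.4, 0.1]                 # assume the second concept should be active
c = bottleneck(x)                   # the bottleneck misses the second concept
y_before = predictor(c)             # prediction based on the wrong concepts
c_fixed = intervene(c, {1: 1.0})    # a user corrects concept 1
y_after = predictor(c_fixed)        # prediction based on the corrected concepts
```

In this sketch the correction changes the final prediction without retraining anything. In a plain CBM such a correction applies only to the single sample at hand, which is the limitation the memory in CB2M is meant to address.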

Paper Structure

This paper contains 17 sections, 4 equations, 5 figures, and 13 tables.

Figures (5)

  • Figure 1: Reusing a CBM intervention can correct model mistakes for multiple examples. Top: CBMs generate a human interpretable concept representation via bottleneck ($g$) to solve the final task with a predictor ($f$). Human users can correct these concept predictions via targeted interventions (blue) influencing the final prediction. Bottom: Human interventions hold valuable information reusable in the right situations to automatically correct model errors without further human interactions.
  • Figure 2: Overview of CB2M to detect mistakes or generalize interventions. A vanilla CBM (grey), consisting of bottleneck ($g$) and predictor ($f$), is extended with a two-fold memory (orange and green). The memory compares encodings of new samples to known mistakes to (i) detect model errors or (ii) automatically correct the model via reuse of interventions.
  • Figure 3: Less is enough: Intervening on a subset of all concepts already yields large improvements. CB2Ms can be combined with methods that select subsets of concepts for interventions (here ECTP; Shin et al.). Mean and standard deviation over 5 runs.
  • Figure 4: CB2M also proves effective with fewer interventions in the memory. This ablation evaluates the effect of the validation size on the concept and class accuracy on the full set. The CB2M was provided with 25%, 50%, 75% or 100% of the validation set mistakes as interventions. We present the baseline CBM results (gray) for comparison.
  • Figure 5: Ablation on the effect of the memory size on the performance of CB2M. Specifically, the performance on the identified instances is shown. The CB2M was provided with 25%, 50%, 75% or 100% of the validation set mistakes as interventions. We present the baseline CBM results (gray) for comparison. Overall, CB2M's performance is not affected much by the memory size and vastly surpasses the base CBM performance.
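
The two-fold memory described in the Figure 2 caption can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: the class name `TwoFoldMemory`, the use of Euclidean distance, the single nearest neighbor, and the `threshold` parameter are all hypothetical choices made for this example.

```python
import math

class TwoFoldMemory:
    """Hypothetical sketch: encodings of known mistakes plus their interventions."""

    def __init__(self, threshold):
        self.threshold = threshold   # assumed distance cutoff for "similar enough"
        self.mistakes = []           # encodings of samples the model got wrong
        self.interventions = []      # concept corrections, aligned with self.mistakes

    def add(self, encoding, intervention):
        """Store a known mistake together with the intervention that fixed it."""
        self.mistakes.append(list(encoding))
        self.interventions.append(dict(intervention))

    def nearest(self, encoding):
        """Return (index, distance) of the closest stored mistake, or (None, inf)."""
        best_idx, best_dist = None, math.inf
        for i, stored in enumerate(self.mistakes):
            d = math.dist(encoding, stored)  # Euclidean distance (an assumption)
            if d < best_dist:
                best_idx, best_dist = i, d
        return best_idx, best_dist

    def detect(self, encoding):
        """(i) Flag a sample as a potential mistake if it is close to a known one."""
        _, d = self.nearest(encoding)
        return d <= self.threshold

    def reuse(self, encoding, concepts):
        """(ii) Reapply the stored intervention of the closest known mistake, if any."""
        idx, d = self.nearest(encoding)
        fixed = list(concepts)
        if idx is not None and d <= self.threshold:
            for c_idx, value in self.interventions[idx].items():
                fixed[c_idx] = value
        return fixed

memory = TwoFoldMemory(threshold=0.5)
memory.add([0.9, 0.4], {1: 1.0})     # one known mistake and its correction

near, far = [1.0, 0.5], [5.0, 5.0]   # toy encodings of two new samples
flag_near = memory.detect(near)
flag_far = memory.detect(far)
reused = memory.reuse(near, [1.0, 0.0, 0.0])
untouched = memory.reuse(far, [1.0, 0.0, 0.0])
```

Only the sample whose encoding lies close to a stored mistake is flagged and corrected; the distant sample keeps its original concepts. The threshold therefore controls the trade-off between reusing interventions broadly and applying them where they do not fit.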