Learning to Receive Help: Intervention-Aware Concept Embedding Models

Mateo Espinosa Zarlenga, Katherine M. Collins, Krishnamurthy Dvijotham, Adrian Weller, Zohreh Shams, Mateja Jamnik

TL;DR

This paper proposes Intervention-aware Concept Embedding models (IntCEMs), a novel CBM-based architecture and training paradigm that improves a model's receptiveness to test-time interventions. IntCEMs significantly outperform state-of-the-art concept-interpretable models when provided with test-time concept interventions.

Abstract

Concept Bottleneck Models (CBMs) tackle the opacity of neural architectures by constructing and explaining their predictions using a set of high-level concepts. A special property of these models is that they permit concept interventions, wherein users can correct mispredicted concepts and thus improve the model's performance. Recent work, however, has shown that intervention efficacy can be highly dependent on the order in which concepts are intervened on and on the model's architecture and training hyperparameters. We argue that this is rooted in a CBM's lack of train-time incentives for the model to be appropriately receptive to concept interventions. To address this, we propose Intervention-aware Concept Embedding models (IntCEMs), a novel CBM-based architecture and training paradigm that improves a model's receptiveness to test-time interventions. Our model learns a concept intervention policy in an end-to-end fashion, from which it can sample meaningful intervention trajectories at train-time. This conditions IntCEMs to effectively select and receive concept interventions when deployed at test-time. Our experiments show that IntCEMs significantly outperform state-of-the-art concept-interpretable models when provided with test-time concept interventions, demonstrating the effectiveness of our approach.
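
To make the notion of a concept intervention concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy bottleneck with three scalar concepts and a hypothetical linear label predictor (`W`, `b`); the names `intervene` and `label_predictor` are illustrative only. An expert overwrites the masked concept with its true value, and the label is recomputed from the corrected bottleneck.

```python
import numpy as np

def intervene(c_hat, c_true, mask):
    """Overwrite predicted concepts with expert values wherever mask == 1."""
    return mask * c_true + (1 - mask) * c_hat

def label_predictor(c, W, b):
    """Hypothetical linear map from concept activations to class logits."""
    return c @ W + b

# Toy setup (all values are made up for illustration): 3 concepts, 2 classes.
W = np.array([[2.0, -2.0], [0.0, 0.0], [-2.0, 2.0]])
b = np.zeros(2)
c_hat = np.array([0.2, 0.9, 0.7])   # predicted concepts; concept 0 is wrong
c_true = np.array([1.0, 1.0, 0.0])  # expert-provided ground truth
mask = np.array([1.0, 0.0, 0.0])    # intervene on concept 0 only

c_fixed = intervene(c_hat, c_true, mask)
before = int(label_predictor(c_hat, W, b).argmax())
after = int(label_predictor(c_fixed, W, b).argmax())
print(before, after)
```

In this toy case a single corrected concept is enough to change the predicted class. How much a real model benefits from such a correction is exactly the receptiveness that the abstract argues standard CBM training does not incentivise.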

Paper Structure

This paper contains 63 sections, 8 equations, 13 figures, and 8 tables.

Figures (13)

  • Figure 1: When intervening on a CBM, a human expert analyses the predicted concepts and corrects mispredicted values (e.g., the mispredicted concept "legs"), allowing the CBM to update its prediction.
  • Figure 2: Given concept embeddings and probabilities $\zeta(\mathbf{x}) = (\{(\hat{\mathbf{c}}^{+}_i, \hat{\mathbf{c}}^{-}_i)\}_{i=1}^{k}, \hat{\mathbf{p}})$, an IntCEM training step (1) samples an intervention mask $\mathbf{\mu}^{(0)} \sim p(\mathbf{\mu})$ and horizon $T \sim p(T)$, (2) generates an intervention trajectory $\{(\mathbf{\mu}^{(t-1)}, \mathbf{\eta}^{(t)})\}_{t=1}^T$ from the learnable intervention policy $\psi$, and (3) predicts the task label at the start and end of the trajectory. Our loss incentivises (i) good initial concept predictions ($\mathcal{L}_\text{concept}$), (ii) performance-boosting intervention trajectories ($\mathcal{L}_\text{roll}$), and (iii) low task loss before and after interventions ($\mathcal{L}_\text{pred}$), with a heavier penalty $\gamma^T > 1$ for mispredicting at the end of the trajectory. In this figure, dashed orange arrows indicate recursive steps in our sampling process.
  • Figure 3: Task accuracy of all baseline models after receiving a varying number of randomly selected interventions. For our binary MNIST-based tasks, we show task AUC rather than accuracy. Here and elsewhere, we show the means and standard deviations (which can be too small to be visible) over five random seeds.
  • Figure 4: Task accuracy of IntCEMs and CEMs on CUB and CelebA when intervening with different test-time policies. We show similar improvements of IntCEMs over CBMs in Appendix \ref{appendix:intervention_policies}.
  • Figure 5: Task performance when intervening on IntCEMs following test-time policies $\psi$, CooP, Random, and BC-Skyline. Our baseline "IntCEM no $\psi$" is an IntCEM whose test and train interventions are sampled from a Random policy rather than from $\psi$ (i.e., a policy is not learnt in this baseline).
  • ...and 8 more figures
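
The training step described in the Figure 2 caption can be sketched roughly as follows. This is a hedged illustration only: the caption does not give the exact form of the loss, so the normalised weighting in `weighted_task_loss` is an assumption, as are the uniform sampling distributions in `sample_start` and the greedy (argmax) selection in `rollout`. The function `policy_scores` is a hypothetical stand-in for the learnable policy $\psi$.

```python
import numpy as np

def sample_start(rng, k, max_horizon):
    """Sample an initial mask mu^(0) and a horizon T (uniform; an assumption)."""
    mask0 = rng.integers(0, 2, size=k)
    horizon = int(rng.integers(1, max_horizon + 1))
    return mask0, horizon

def rollout(mask0, horizon, policy_scores):
    """Extend the mask for up to `horizon` steps, recording (mask, eta) pairs.

    policy_scores maps the current mask to one score per concept; the
    highest-scoring concept not yet intervened on is chosen next.
    """
    mask = mask0.copy()
    trajectory = []
    for _ in range(horizon):
        if mask.all():  # nothing left to intervene on
            break
        scores = np.where(mask == 1, -np.inf, policy_scores(mask))
        eta = int(np.argmax(scores))
        trajectory.append((mask.copy(), eta))
        mask[eta] = 1
    return mask, trajectory

def weighted_task_loss(loss_start, loss_end, gamma, horizon):
    """Weight the end-of-trajectory loss by gamma**T (assumed normalisation)."""
    w = gamma ** horizon
    return (loss_start + w * loss_end) / (1 + w)

# Deterministic toy example with 4 concepts and a fixed scoring function.
fixed_scores = lambda mask: np.array([0.1, 0.9, 0.5, 0.3])
final_mask, traj = rollout(np.array([0, 1, 0, 0]), 2, fixed_scores)
chosen = [eta for _, eta in traj]
loss = weighted_task_loss(loss_start=1.0, loss_end=0.5, gamma=1.1, horizon=2)
print(final_mask, chosen, loss)
```

With $\gamma > 1$, the end-of-trajectory loss receives the larger weight, which matches the caption's description of a heavier penalty for mispredicting after interventions. A full training step would add the concept and rollout losses, which are omitted here because their definitions are not given in this excerpt.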