Adaptive Test-Time Intervention for Concept Bottleneck Models

Matthew Shen, Aliyah Hsu, Abhineet Agarwal, Bin Yu

TL;DR

This work tackles the interpretability-performance trade-off in concept bottleneck models (CBMs) by distilling their nonlinear concept-to-target component into FIGS-BD, an interpretable, binary-augmented sum-of-trees model. FIGS-BD enables adaptive test-time intervention (ATTI), which ranks concept interactions so that humans can validate them per example. Across computer vision (CV) and natural language processing (NLP) datasets, FIGS-BD preserves most of the teacher's predictive power while providing a compact, interpretable representation and effective, targeted interventions, with strong improvements even when only a few interventions are allowed. The approach is promising for deploying CBMs in real-world, high-stakes settings where expert intervention capacity is limited.

Abstract

Concept bottleneck models (CBMs) aim to improve model interpretability by predicting human-level "concepts" in a bottleneck within a deep learning architecture. However, the way the predicted concepts are used to predict the target either remains a black box or is simplified to maintain interpretability at the cost of prediction performance. We propose Binary Distillation (BD) with Fast Interpretable Greedy Sum-Trees (FIGS). This new method, called FIGS-BD, distills a binary-augmented concept-to-target portion of the CBM into an interpretable tree-based model while maintaining the competitive prediction performance of the CBM teacher. FIGS-BD can be used in downstream tasks to explain and decompose CBM predictions into interpretable binary-concept-interaction attributions and to guide adaptive test-time intervention. Across four datasets, we demonstrate that our adaptive test-time intervention identifies key concepts that significantly improve performance in realistic human-in-the-loop settings that allow only a limited number of concept interventions.
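
The abstract does not spell out how the binary augmentation or the distillation is carried out. The sketch below illustrates the general idea under loudly stated assumptions: predicted concept probabilities are thresholded at a hypothetical cutoff of 0.5, and the student is a greedy sum of depth-1 stumps fit to the residuals of a toy teacher. This is a simplified stand-in, not FIGS itself (FIGS grows several deeper trees at once, which is what lets it capture concept interactions), and every name here (`binarize`, `fit_stump`, `distill`, `attributions`, `teacher`, `THRESHOLD`) is hypothetical.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

THRESHOLD = 0.5  # hypothetical cutoff for turning a concept probability into 0/1
Stump = Tuple[str, float, float]  # (concept, output if concept == 0, output if concept == 1)


def binarize(probs: Dict[str, float]) -> Dict[str, int]:
    """Threshold predicted concept probabilities into binary concept indicators."""
    return {name: int(p >= THRESHOLD) for name, p in probs.items()}


def fit_stump(X: Sequence[Dict[str, int]], residuals: Sequence[float]) -> Optional[Stump]:
    """Choose the single binary-concept split that minimizes squared error on the residuals."""
    best: Optional[Stump] = None
    best_sse = float("inf")
    for c in sorted(X[0]):
        left = [r for x, r in zip(X, residuals) if x[c] == 0]
        right = [r for x, r in zip(X, residuals) if x[c] == 1]
        if not left or not right:
            continue  # this concept does not split the data
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if sse < best_sse:
            best, best_sse = (c, lv, rv), sse
    return best


def distill(X: Sequence[Dict[str, int]], teacher_out: Sequence[float], n_stumps: int = 5) -> List[Stump]:
    """Greedily add stumps, each fit to the part of the teacher output the student still misses."""
    stumps: List[Stump] = []
    residuals = list(teacher_out)
    for _ in range(n_stumps):
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        stumps.append(stump)
        c, lv, rv = stump
        residuals = [r - (rv if x[c] else lv) for x, r in zip(X, residuals)]
    return stumps


def attributions(stumps: List[Stump], x: Dict[str, int]) -> List[Tuple[str, float]]:
    """Per-stump contributions for one example; they sum to the student's prediction."""
    return [(c, rv if x[c] else lv) for c, lv, rv in stumps]


def teacher(p: Dict[str, float]) -> float:
    """Hypothetical nonlinear concept-to-target head standing in for the CBM teacher."""
    return 2.0 * p["long_legs"] * p["webbed_feet"] - p["small_beak"]


names = ["long_legs", "small_beak", "webbed_feet"]
probs = [dict(zip(names, v)) for v in product([0.1, 0.9], repeat=3)]
X = [binarize(p) for p in probs]  # student inputs: binarized concepts
y = [teacher(p) for p in probs]   # distillation targets: teacher outputs
stumps = distill(X, y)

for concept, contribution in attributions(stumps, X[-1]):
    print(f"{concept}: {contribution:+.3f}")
```

Because the student is additive, its prediction for any example decomposes exactly into per-component contributions, which is the property that makes attribution straightforward; with deeper trees, each contribution would correspond to an interaction of several binary concepts rather than a single one.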

Paper Structure

This paper contains 17 sections, 4 figures, 2 tables, and 1 algorithm.

Figures (4)

  • Figure 1: The CBM incorrectly identifies "long legs" in the image, perhaps because of spurious correlations between water and long-legged birds such as seagulls. FIGS adaptive test-time intervention (ATTI) recommends only two concepts to intervene on, selected from a binarization of the predicted concepts (including "long legs"); intervening on them yields the correct prediction.
  • Figure 2: Effectiveness of adaptive test-time interventions for different concept-to-target models. The $x$-axis gives the number of interactions (each involving at most 3 concepts) intervened on.
  • Figure 3: Performance of CBM linear with adaptive test-time interventions on concepts suggested by different concept-to-target (CTT) models. FIGS ATTI greatly outperforms Linear ATTI.
  • Figure 4: Left: number of uncorrectable samples for each intervention method. Right: number of intervention iterations each method needs to flip a wrong prediction into a correct one.
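
Figures 1 and 4 describe intervening on a few recommended concepts and counting how many intervention rounds it takes to flip a wrong prediction. A minimal sketch of that loop is below, assuming a hand-written sum of small binary trees as the concept-to-target model. The ranking rule (order trees by the absolute size of their contribution for this example, then intervene on the concepts along each tree's decision path) is an assumption made for illustration; the source does not state the paper's exact ranking criterion. The tree values, concept values, target class, and all function names (`evaluate`, `path_concepts`, `logit`, `rank_interactions`, `iterations_to_correct`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class Leaf:
    value: float  # additive contribution to the target logit


@dataclass
class Split:
    concept: str        # binary concept tested at this node
    if_absent: "Tree"   # subtree followed when the concept is 0
    if_present: "Tree"  # subtree followed when the concept is 1


Tree = Union[Leaf, Split]


def evaluate(tree: Tree, c: Dict[str, int]) -> float:
    """Follow the decision path for concept vector c and return the leaf value."""
    while isinstance(tree, Split):
        tree = tree.if_present if c[tree.concept] else tree.if_absent
    return tree.value


def path_concepts(tree: Tree, c: Dict[str, int]) -> List[str]:
    """Concepts tested along the path c takes: the interaction this tree uses for c."""
    used = []
    while isinstance(tree, Split):
        used.append(tree.concept)
        tree = tree.if_present if c[tree.concept] else tree.if_absent
    return used


def logit(trees: List[Tree], c: Dict[str, int]) -> float:
    """Sum-of-trees prediction; a positive value means the (hypothetical) positive class."""
    return sum(evaluate(t, c) for t in trees)


def rank_interactions(trees: List[Tree], c: Dict[str, int]) -> List[List[str]]:
    """Assumed ranking rule: trees with the largest absolute contribution come first."""
    order = sorted(trees, key=lambda t: abs(evaluate(t, c)), reverse=True)
    return [path_concepts(t, c) for t in order]


def iterations_to_correct(trees, predicted, truth, label: bool, budget: int) -> Optional[int]:
    """Intervene on one ranked interaction per round; None means uncorrectable within budget."""
    current = dict(predicted)
    if (logit(trees, current) > 0) == label:
        return 0
    for step, group in enumerate(rank_interactions(trees, predicted)[:budget], start=1):
        for concept in group:
            current[concept] = truth[concept]  # the human replaces the predicted value with the true one
        if (logit(trees, current) > 0) == label:
            return step
    return None


trees = [
    Split("long_legs", Leaf(0.8), Split("webbed_feet", Leaf(-2.0), Leaf(0.5))),
    Split("on_water", Leaf(-0.5), Leaf(0.7)),
    Split("hooked_beak", Leaf(0.1), Leaf(-0.3)),
]
predicted = {"long_legs": 1, "webbed_feet": 0, "on_water": 1, "hooked_beak": 0}
truth = {"long_legs": 0, "webbed_feet": 1, "on_water": 1, "hooked_beak": 0}
print(rank_interactions(trees, predicted)[0])  # ['long_legs', 'webbed_feet']
print(iterations_to_correct(trees, predicted, truth, label=True, budget=2))  # 1
```

In this toy setup, correcting the two concepts on the top-ranked path flips the logit from negative to positive in a single round, echoing the two-concept intervention scenario of Figure 1. A return value of `None` corresponds to an example that stays uncorrectable within the intervention budget, the quantity counted in the left panel of Figure 4.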