Adaptive Test-Time Intervention for Concept Bottleneck Models
Matthew Shen, Aliyah Hsu, Abhineet Agarwal, Bin Yu
TL;DR
This work tackles the interpretability-performance trade-off in concept bottleneck models (CBMs) by distilling the nonlinear concept-to-target component into FIGS-BD, an interpretable, binary-augmented sum-of-trees model. The distilled model enables adaptive test-time intervention (ATTI), which ranks concept interactions for each example so that a human can validate the most influential ones first. Across CV and NLP datasets, FIGS-BD preserves most of the teacher's predictive power while offering a compact, interpretable representation and targeted interventions that yield strong improvements even when only a few are made. The approach holds practical promise for deploying CBMs in real-world, high-stakes settings where the capacity for expert intervention is limited.
Abstract
Concept bottleneck models (CBMs) aim to improve model interpretability by predicting human-level "concepts" in a bottleneck within a deep learning model architecture. However, the way the predicted concepts are used to predict the target either remains a black box or is simplified to maintain interpretability at the cost of prediction performance. We propose to use Fast Interpretable Greedy Sum-Trees (FIGS) to obtain Binary Distillation (BD). This new method, called FIGS-BD, distills a binary-augmented concept-to-target portion of the CBM into an interpretable tree-based model while maintaining the competitive prediction performance of the CBM teacher. FIGS-BD can be used in downstream tasks to explain and decompose CBM predictions into interpretable binary-concept-interaction attributions and to guide adaptive test-time intervention. Across four datasets, we demonstrate that our adaptive test-time intervention identifies key concepts that significantly improve performance in realistic human-in-the-loop settings that allow only a limited number of concept interventions.
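To make the idea of ranking binary concept interactions for intervention concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical toy student model written as a sum of weighted interaction terms over binary concepts, standing in for a distilled sum-of-trees model. The names `TERMS`, `term_contributions`, `predict`, and `intervene`, the concept names, and the weights are all illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical toy student: each term is (concepts that must all equal 1, weight).
# It stands in for a distilled sum-of-trees model; it is NOT FIGS-BD itself.
Term = Tuple[Tuple[str, ...], float]

TERMS: List[Term] = [
    (("has_wings", "has_beak"), 2.0),
    (("has_fur",), -1.5),
    (("has_beak",), 0.5),
]


def term_contributions(concepts: Dict[str, int], terms: List[Term]):
    """Return (concept names, contribution) for every term on one example."""
    return [
        (names, weight if all(concepts[n] == 1 for n in names) else 0.0)
        for names, weight in terms
    ]


def predict(concepts: Dict[str, int], terms: List[Term]) -> float:
    """Additive prediction: the sum of all term contributions."""
    return sum(contribution for _, contribution in term_contributions(concepts, terms))


def intervene(predicted: Dict[str, int], truth: Dict[str, int],
              terms: List[Term], k: int) -> Dict[str, int]:
    """Let a (simulated) human validate the concepts in the top-k interactions.

    Interactions are ranked by absolute contribution for this example
    (an assumed ranking rule); each concept in a selected interaction
    is replaced by its true value.
    """
    ranked = sorted(term_contributions(predicted, terms),
                    key=lambda item: abs(item[1]), reverse=True)
    corrected = dict(predicted)
    for names, _ in ranked[:k]:
        for name in names:
            corrected[name] = truth[name]
    return corrected


# The concept predictor wrongly reports has_fur = 1 for this example.
predicted = {"has_wings": 1, "has_beak": 1, "has_fur": 1}
truth = {"has_wings": 1, "has_beak": 1, "has_fur": 0}

before = predict(predicted, TERMS)
after_one = predict(intervene(predicted, truth, TERMS, k=1), TERMS)
after_two = predict(intervene(predicted, truth, TERMS, k=2), TERMS)
print(before, after_one, after_two)
```

In this toy example, validating only the top-ranked interaction leaves the score unchanged because its concepts were already correct, while validating the top two reaches the mispredicted concept and changes the score. The ranking here is per example, which is the sense in which such an intervention is adaptive.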
