Diverse Concept Proposals for Concept Bottleneck Models
Katrina Brown, Marton Havasi, Finale Doshi-Velez
TL;DR
The paper tackles interpretability in concept bottleneck models by generating multiple predictive concept proposals, so that an expert can choose among diverse explanations. It draws samples from the posterior $p(\mathbf{c},\theta,\phi | \mathbf{x},\mathbf{y})$ via Hamiltonian MCMC, filters the proposals by an accuracy threshold $t_{acc}$, and then selects a small, diverse subset using either greedy selection or clustering under several similarity metrics. It also supports conditioning on already-selected concepts to augment the explanations. Experiments on Hexagon and MIMIC-III show that the approach recovers multiple ground-truth concepts (e.g., 4 of the 5 pre-defined concepts on MIMIC-III) and that greedy methods often yield stronger coverage, supporting interpretability and recourse in healthcare.
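The filter-then-select step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes binary concept activations, a single dissimilarity metric (the fraction of samples on which two concepts disagree, treating a concept and its negation as the same concept), and a greedy farthest-point rule seeded with the most accurate proposal. The function name `select_diverse_proposals` and the parameters `t_acc` and `k` are hypothetical names chosen for illustration.

```python
import numpy as np

def select_diverse_proposals(concepts, accuracies, t_acc=0.8, k=3):
    """Filter proposals by accuracy, then greedily pick a diverse subset.

    concepts:   (n_proposals, n_samples) array of binary concept values
    accuracies: (n_proposals,) predictive accuracy of each proposal
    Returns indices into the original proposal list.
    """
    concepts = np.asarray(concepts, dtype=float)
    accuracies = np.asarray(accuracies, dtype=float)

    # Step 1: keep only proposals that meet the accuracy threshold.
    kept = np.flatnonzero(accuracies >= t_acc)
    if kept.size == 0:
        return []

    # Pairwise dissimilarity (an assumed metric): fraction of samples where
    # two concepts disagree, made invariant to negating a concept.
    X = concepts[kept]
    disagree = np.abs(X[:, None, :] - X[None, :, :]).mean(axis=2)
    dist = np.minimum(disagree, 1.0 - disagree)

    # Step 2: greedy farthest-point selection, seeded with the most
    # accurate surviving proposal.
    chosen = [int(np.argmax(accuracies[kept]))]
    while len(chosen) < min(k, kept.size):
        min_dist = dist[:, chosen].min(axis=1)
        min_dist[chosen] = -1.0  # never re-select a chosen proposal
        chosen.append(int(np.argmax(min_dist)))
    return [int(kept[i]) for i in chosen]

# Toy data: proposal 1 nearly duplicates proposal 0, proposal 4 is the
# negation of proposal 0, and proposal 3 falls below the threshold.
proposals = [
    [1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 0],
    [0, 0, 1, 1, 0, 1],
]
acc = [0.90, 0.85, 0.95, 0.50, 0.88]
picked = select_diverse_proposals(proposals, acc, t_acc=0.8, k=2)
```

In this sketch a different similarity metric only changes how `dist` is built; the greedy loop stays the same. A clustering-based variant would instead group the surviving proposals and keep one representative per cluster.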
Abstract
Concept bottleneck models are interpretable predictive models that are often used in domains where model trust is a key priority, such as healthcare. They identify a small number of human-interpretable concepts in the data, which they then use to make predictions. Learning relevant concepts from data is a challenging task: the most predictive concepts may not align with expert intuition, in which case the model fails to be interpretable and offers no recourse. Our proposed approach identifies a number of predictive concepts that explain the data. By offering multiple alternative explanations, we allow the human expert to choose the one that best aligns with their expectations. To demonstrate our method, we show that it is able to discover all possible concept representations on a synthetic dataset. On electronic health record (EHR) data, our model was able to identify 4 out of the 5 pre-defined concepts without supervision.
