Diverse Concept Proposals for Concept Bottleneck Models

Katrina Brown, Marton Havasi, Finale Doshi-Velez

TL;DR

The paper tackles interpretability in concept bottleneck models by generating multiple predictive concept proposals, so that an expert can choose among diverse explanations. It draws samples from the posterior $p(\mathbf{c},\theta,\phi | \mathbf{x},\mathbf{y})$ via Hamiltonian MCMC, filters the proposals by an accuracy threshold $t_{acc}$, and then selects a small, diverse subset by greedy selection or clustering under several similarity metrics. It also allows conditioning on already-selected concepts to augment explanations. Experiments on Hexagon and MIMIC-III show that the approach recovers multiple ground-truth concepts (e.g., 4/5 on MIMIC-III) and that greedy methods often yield stronger coverage, supporting interpretability and recourse in healthcare.
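
The filter-then-diversify step summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each proposal is a single binary concept vector over the data points, that an accuracy has already been computed for each proposal, and it uses a flip-invariant agreement score as a stand-in for the paper's similarity metrics. The names `filter_by_accuracy`, `agreement`, and `greedy_diverse_subset` are hypothetical.

```python
import numpy as np

def filter_by_accuracy(proposals, accuracies, t_acc):
    """Keep only proposals whose accuracy is at least t_acc."""
    return [p for p, a in zip(proposals, accuracies) if a >= t_acc]

def agreement(a, b):
    """Assumed similarity: fraction of points on which two binary concepts
    agree, made invariant to swapping the 0/1 labels of one concept."""
    match = float(np.mean(a == b))
    return max(match, 1.0 - match)

def greedy_diverse_subset(proposals, k):
    """Greedily pick up to k proposal indices. Starting from the first
    proposal, repeatedly add the one whose maximum similarity to the
    already-chosen proposals is smallest."""
    if not proposals:
        return []
    chosen = [0]
    while len(chosen) < min(k, len(proposals)):
        best, best_score = None, None
        for i in range(len(proposals)):
            if i in chosen:
                continue
            score = max(agreement(proposals[i], proposals[j]) for j in chosen)
            if best_score is None or score < best_score:
                best, best_score = i, score
        chosen.append(best)
    return chosen

# Toy data (made up for illustration): four candidate concepts on 8 points.
p0 = np.array([0, 0, 1, 1, 0, 0, 1, 1])
p1 = np.array([0, 0, 1, 1, 0, 0, 1, 0])  # near-duplicate of p0
p2 = np.array([0, 1, 0, 1, 0, 1, 0, 1])  # a genuinely different split
p3 = 1 - p0                              # p0 with its labels flipped
accuracies = [0.90, 0.88, 0.85, 0.50]

kept = filter_by_accuracy([p0, p1, p2, p3], accuracies, t_acc=0.8)
chosen = greedy_diverse_subset(kept, k=2)
```

In this toy run the low-accuracy proposal `p3` is filtered out, and the greedy step prefers `p2` over the near-duplicate `p1` because `p2` agrees with `p0` on only half of the points.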

Abstract

Concept bottleneck models are interpretable predictive models that are often used in domains where model trust is a key priority, such as healthcare. They identify a small number of human-interpretable concepts in the data, which they then use to make predictions. Learning relevant concepts from data proves to be a challenging task. The most predictive concepts may not align with expert intuition, in which case the model fails to be interpretable and offers no recourse. Our proposed approach identifies a number of predictive concepts that explain the data. By offering multiple alternative explanations, we allow the human expert to choose the one that best aligns with their expectation. To demonstrate our method, we show that it is able to discover all possible concept representations on a synthetic dataset. On EHR data, our model was able to identify 4 out of the 5 pre-defined concepts without supervision.

Paper Structure

This paper contains 19 sections, 1 equation, 2 figures, 6 tables.

Figures (2)

  • Figure 1: Graphical model of a Concept Bottleneck Model (CBM). The inputs $x$ are used to predict the distribution of concepts $p_\theta(c|x)$ (parameterized by $\theta$). Then, the concepts are used to predict the final label $p_\phi(y|c)$ (parameterized by $\phi$).
  • Figure 2: The synthetic hexagon dataset. Each of the 6 clusters contains 200 points. The 15 possible concept decision boundaries between the clusters are denoted by straight lines.
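
The two-stage factorization in the Figure 1 caption, $p_\theta(c|x)$ followed by $p_\phi(y|c)$, can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: both stages are taken to be logistic models, the concept probabilities are passed on directly rather than sampled, and the shapes and random parameters are invented for the example. The function names `concept_probs` and `label_probs` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def concept_probs(x, theta_w, theta_b):
    """p_theta(c | x): one independent Bernoulli probability per concept
    (assumed logistic parameterization)."""
    return sigmoid(x @ theta_w + theta_b)

def label_probs(c, phi_w, phi_b):
    """p_phi(y | c): the label is predicted from the concepts only,
    never directly from x (assumed logistic parameterization)."""
    return sigmoid(c @ phi_w + phi_b)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))        # 5 inputs with 2 features (made up)
theta_w = rng.normal(size=(2, 3))  # maps 2 features to 3 concepts
theta_b = np.zeros(3)
phi_w = rng.normal(size=3)         # maps 3 concepts to 1 binary label
phi_b = 0.0

c = concept_probs(x, theta_w, theta_b)  # shape (5, 3)
y = label_probs(c, phi_w, phi_b)        # shape (5,)
```

The point of the bottleneck is visible in the signatures: `label_probs` receives only `c`, so any explanation of a prediction has to go through the concepts.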