Bayesian Concept Bottleneck Models with LLM Priors

Jean Feng, Avni Kothari, Luke Zier, Chandan Singh, Yan Shuo Tan

TL;DR

This work introduces BC-LLM, a Bayesian concept bottleneck framework that leverages LLM priors to iteratively discover interpretable concepts for predictive tasks. By wrapping LLMs in a split-sample Metropolis-within-Gibbs sampler, the method balances efficient concept proposals with principled posterior inference, enabling uncertainty quantification even when the concept space is effectively infinite. Empirical results across image, text, and tabular domains show BC-LLM often outperforms fully interpretable baselines and can match or exceed some black-box models, while providing interpretable, actionable concepts and robust OOD handling. The approach offers a scalable path to safer, auditable AI in high-stakes settings such as healthcare and biology, with clear avenues for future improvement in speed and scalability.

Abstract

Concept Bottleneck Models (CBMs) have been proposed as a compromise between white-box and black-box models, aiming to achieve interpretability without sacrificing accuracy. The standard training procedure for CBMs is to predefine a candidate set of human-interpretable concepts, extract their values from the training data, and identify a sparse subset as inputs to a transparent prediction model. However, such approaches are often hampered by the tradeoff between exploring a sufficiently large set of concepts versus controlling the cost of obtaining concept extractions, resulting in a large interpretability-accuracy tradeoff. This work investigates a novel approach that sidesteps these challenges: BC-LLM iteratively searches over a potentially infinite set of concepts within a Bayesian framework, in which Large Language Models (LLMs) serve as both a concept extraction mechanism and prior. Even though LLMs can be miscalibrated and hallucinate, we prove that BC-LLM can provide rigorous statistical inference and uncertainty quantification. Across image, text, and tabular datasets, BC-LLM outperforms interpretable baselines and even black-box models in certain settings, converges more rapidly towards relevant concepts, and is more robust to out-of-distribution samples.
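
The iterative search described above can be illustrated with a toy sampler. This is a minimal sketch under strong assumptions, not the authors' implementation: the LLM is replaced by a hypothetical fixed keyword pool (`CONCEPT_POOL`), concept extraction is a keyword check (`annotate`), the prediction model is a smoothed lookup table rather than the paper's model, and the accept step uses a simple held-out likelihood ratio as a stand-in for the split-sample Metropolis-within-Gibbs update. All function and variable names are illustrative.

```python
import math
import random
from collections import Counter

# Hypothetical stand-in for the LLM: a fixed pool of candidate concepts.
CONCEPT_POOL = ["fever", "cough", "fatigue", "rash", "nausea", "headache"]


def annotate(note, concept):
    """Toy concept extraction: 1 if the keyword appears in the note, else 0."""
    return int(concept in note)


def heldout_loglik(concepts, train, test):
    """Fit a smoothed label table on `train`; return log p(y | concepts) on `test`."""
    counts = {}
    for note, y in train:
        key = tuple(annotate(note, c) for c in concepts)
        pos, tot = counts.get(key, (0, 0))
        counts[key] = (pos + y, tot + 1)
    ll = 0.0
    for note, y in test:
        key = tuple(annotate(note, c) for c in concepts)
        pos, tot = counts.get(key, (0, 0))
        p = (pos + 1) / (tot + 2)  # Laplace smoothing keeps p strictly in (0, 1)
        ll += math.log(p if y == 1 else 1.0 - p)
    return ll


def sample_concepts(data, k=2, n_iters=200, seed=0):
    """Cycle through the k concept slots, proposing one replacement per step."""
    rng = random.Random(seed)
    half = len(data) // 2
    concepts = rng.sample(CONCEPT_POOL, k)  # initial hypothesis
    samples = []
    for it in range(n_iters):
        shuffled = data[:]
        rng.shuffle(shuffled)
        train, test = shuffled[:half], shuffled[half:]  # fresh random split
        j = it % k  # drop the concept in slot j
        unused = [c for c in CONCEPT_POOL if c not in concepts]
        candidate = rng.choice(unused)  # propose a replacement (LLM stand-in)
        proposal = concepts[:j] + [candidate] + concepts[j + 1:]
        # Annotate with both concept sets and compare held-out log-likelihoods.
        log_ratio = (heldout_loglik(proposal, train, test)
                     - heldout_loglik(concepts, train, test))
        if rng.random() < math.exp(min(0.0, log_ratio)):  # Metropolis-style accept
            concepts = proposal
        samples.append(tuple(sorted(concepts)))
    return samples


def make_toy_data(n=200, seed=1):
    """Synthetic notes whose label is 1 only when both 'fever' and 'cough' appear."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        words = [c for c in CONCEPT_POOL if rng.random() < 0.5]
        data.append((" ".join(words), int("fever" in words and "cough" in words)))
    return data


if __name__ == "__main__":
    samples = sample_concepts(make_toy_data())
    # Share of later samples containing each concept set (a rough posterior summary).
    for concept_set, count in Counter(samples[100:]).most_common(3):
        print(concept_set, count / 100)
```

The fraction of samples that contain a given concept plays the role of the posterior proportions reported in parentheses in the paper's dendrograms. In this toy setting the proposal is uniform over a finite pool; in BC-LLM the LLM supplies the proposals, so the concept space need not be enumerated in advance.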

Paper Structure

This paper contains 34 sections, 5 theorems, 42 equations, 6 figures, 4 tables, 3 algorithms.

Key Result

Theorem 3.1

Suppose the data is IID. Let $L(\vec{c}) \coloneqq \max_{\vec{\theta}}\mathbb{E}\lbrace \log p(Y|X,\vec{\theta},\vec{c})\rbrace$ and $\mathcal{C}^* \coloneqq \operatorname{argmax}_{\vec{c}}L(\vec{c}).$ For sample size $n$, let $\Pi_n$ denote the set of stationary distributions of the Markov chain de

Figures (6)

  • Figure 1: BC-LLM is initialized by having the LLM hypothesize the top concepts based on keyphrases extracted from each observation (Step 0). The concepts are then iteratively refined by dropping a concept (Step 1), querying the LLM for candidate replacements (Step 2), annotating each observation with the candidate concepts using the LLM (Step 3), and determining which, if any, of the candidate concepts to accept (Step 4).
  • Figure 2: Example of BC-LLM classifying Bunting birds. (Left) Left and right dendrograms are learned concepts when given 1/3 versus 3/3 of the training data, respectively. Labels are shortened concept questions, generally of the form "Does the image depict...?" Proportion of posterior samples with the concept are shown in parentheses. Highlighted labels are distinguishing bird features. (Right) Application of BC-LLM to an actual bunting bird (top) versus a dog pretending to be one (bottom).
  • Figure 3: MIMIC results: (Left) Comparison of BC-LLM and existing methods in terms of performance and recovery of true concepts with 95% CI. (Right) Dendrograms of concepts learned by BC-LLM with 100 and 800 observations (left and right, respectively). Labels are shortened concept questions generally of the form "Does the note mention the patient...?" Highlighted labels correspond to the true color-coded concepts in Section \ref{sec:mimic}.
  • Figure 4: Learning to augment a readmission risk prediction model for heart failure patients. Dendrogram labels are shortened questions of the format "Does the note mention the patient having...?". Highlighted concepts received scores from clinicians as being highly predictive (scores 2.5+). Average clinician ratings for concepts/features from the different methods are shown on the right.
  • Figure 5: Additional results running BC-LLM and comparator methods on the MIMIC dataset, evaluated in terms of performance and recovery of true concepts. Error bars indicate the 95% CI.
  • ...and 1 more figure

Theorems & Definitions (12)

  • Theorem 3.1
  • Proposition A.1
  • proof
  • Remark F.1
  • Remark F.2
  • proof: Proof of Theorem \ref{thm:no_prior}
  • Lemma G.2
  • proof
  • Proposition G.3: Concentration of split-sample posterior
  • proof
  • ...and 2 more