Deferring Concept Bottleneck Models: Learning to Defer Interventions to Inaccurate Experts

Andrea Pugnana, Riccardo Massidda, Francesco Giannini, Pietro Barbiero, Mateo Espinosa Zarlenga, Roberto Pellungrini, Gabriele Dominici, Fosca Giannotti, Davide Bacciu

TL;DR

Deferring Concept Bottleneck Models (DCBMs) integrate Learning to Defer with Concept Bottleneck Models to enable deferrals not only for final decisions but also for intermediate concepts. The framework derives a consistent deferral-aware loss via maximum likelihood on a Bayesian network over inputs, concepts, tasks, and human labels, enabling independent training of concept and task predictors while preserving interpretability. Empirical results show that deferring can boost predictive performance and mitigate concept incompleteness, at the cost of increased human involvement governed by the defer cost λ, and DCBMs provide interpretable explanations for deferral choices. The work provides theoretical consistency guarantees and practical training guidelines for human-in-the-loop CBM deployments in realistic settings with imperfect human experts.

Abstract

Concept Bottleneck Models (CBMs) are machine learning models that improve interpretability by grounding their predictions on human-understandable concepts, allowing for targeted interventions in their decision-making process. However, when intervened on, CBMs assume the availability of humans who can identify the need to intervene and always provide correct interventions. Both assumptions are unrealistic and impractical, considering labor costs and human error-proneness. In contrast, Learning to Defer (L2D) extends supervised learning by allowing machine learning models to identify cases where a human is more likely to be correct than the model, thus leading to deferring systems with improved performance. In this work, we take inspiration from L2D and propose Deferring CBMs (DCBMs), a novel framework that allows CBMs to learn when an intervention is needed. To this end, we model DCBMs as a composition of deferring systems and derive a consistent L2D loss to train them. Moreover, by relying on a CBM architecture, DCBMs can explain why a deferral occurs on the final task. Our results show that DCBMs achieve high predictive performance and interpretability at the cost of deferring more to humans.

Paper Structure

This paper contains 26 sections, 4 theorems, 25 equations, 7 figures, 25 tables.

Key Result

Proposition 3.1

Let $\bm{\theta}$ be the parameters of a DCBM. Then, we can obtain the most likely parameters $\hat{\bm{\theta}}$ given observations on the inputs $\bm{x}$, the concepts $\bm{c}$, the human $\bm{h}$, and the task $\bm{y}$, by minimizing the following loss function: where $q(\,\cdot\,;\theta_V)\colon\mathcal{D}(\bm{Z}_V)\to\mathbb{R}^{K_V+1}$ returns the logits of the model $M_V$ g

Figures (7)

  • Figure 1: A DCBM: Given an input, the concept predictors $M_{\bm{C}}$ output either a concept's value or defer its prediction to a human (i.e., they predict $\bot$). Next, the deferring system $\Delta_{\bm{C}}$ outputs the human labels only on the deferred concepts, returning the system's predictions otherwise. The same applies to the final task, where the task classifier $M_Y$ is an input of a dedicated deferring system $\Delta_Y$. DCBMs can be trained by considering the cost of deferring, thus regulating the expected number of human deferrals.
  • Figure 2: A DCBM is a Bayesian Network where inputs $\bm{X}$, concepts $\bm{C}$, tasks $\bm{Y}$, and human labels $\bm{H}$ are observed variables (in gray). As represented by the plate notation (Koller and Friedman, 2009), we assign a human expert and a latent model to each variable. We incorporate the deferral decision in the model through a dedicated output, denoted as $M=\bot$. Here, we learn each model $M_V$'s parameters $\theta_V$ via maximum likelihood.
  • Figure 3: Results on completeness when human experts have perfect concept and task accuracy (i.e., they are oracles). We report each metric's average and standard deviation as we increase the defer cost $\lambda$. The black box and the CBM baselines are constant as they are independent of the defer cost.
  • Figure 4: Results on cifar10-h when human experts have perfect accuracy on the final task but not on the concepts. We report each metric's average and standard deviation as we increase the defer cost $\lambda$. The black box and the CBM baselines are constant as they are independent of the defer cost.
  • Figure 5: Results on the CUB dataset when human experts have perfect accuracy on the concepts but not on the final task. We report each metric's average and standard deviation as we increase the defer cost $\lambda$. The black box and the CBM baselines are constant as they are independent of the defer cost.
  • ...and 2 more figures
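
The deferral mechanism described in the Figure 1 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each predictor emits $K+1$ logits (as in Proposition 3.1) and, as a convention chosen only for this example, that the last logit corresponds to the defer output $\bot$. The names `DEFER`, `predict_or_defer`, and `deferring_system` are hypothetical.

```python
from typing import List, Optional, Sequence

# Hypothetical stand-in for the deferral symbol ⊥ (an assumption of this sketch).
DEFER = None


def predict_or_defer(logits: Sequence[float]) -> Optional[int]:
    """Return the argmax class, or DEFER if the last (defer) logit is largest.

    Assumes K+1 logits where index K is the defer option.
    """
    best = max(range(len(logits)), key=lambda i: logits[i])
    return DEFER if best == len(logits) - 1 else best


def deferring_system(model_outputs: Sequence[Optional[int]],
                     human_labels: Sequence[int]) -> List[int]:
    """Use the human label only where the model deferred; keep the model's output otherwise."""
    return [h if m is DEFER else m for m, h in zip(model_outputs, human_labels)]


# Three binary concepts, each with K+1 = 3 logits (class 0, class 1, defer).
concept_logits = [
    [2.0, 0.1, 0.3],  # model predicts class 0
    [0.2, 0.1, 1.5],  # model defers
    [0.0, 3.0, 0.5],  # model predicts class 1
]
human_concepts = [1, 1, 0]  # possibly inaccurate expert labels

model_concepts = [predict_or_defer(row) for row in concept_logits]
final_concepts = deferring_system(model_concepts, human_concepts)
defer_rate = sum(m is DEFER for m in model_concepts) / len(model_concepts)

print(model_concepts)  # [0, None, 1]
print(final_concepts)  # [0, 1, 1]
```

The same two functions would apply unchanged to the task level: the task classifier receives `final_concepts` as input, and its own output passes through a second deferring system. Note that the human label is used even if it is wrong, which is why training with a defer cost matters when experts are inaccurate.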

Theorems & Definitions (10)

  • Proposition 3.1: Maximum Likelihood of DCBM
  • proof
  • Lemma 3.2
  • proof
  • Theorem 3.3
  • proof
  • Lemma 1.1
  • proof
  • proof
  • proof