To Ask or Not to Ask: Learning to Require Human Feedback

Andrea Pugnana, Giovanni De Toni, Cesare Barbera, Roberto Pellungrini, Bruno Lepri, Andrea Passerini

TL;DR

This work addresses the limitations of Learning to Defer by introducing Learning to Ask (LtA), a framework where an ML model learns not only when to defer to a human but also how to incorporate expert input through an enriched predictor. LtA employs a two-component architecture (f and g) and a budgeted selector s, optimizing the $L^{ask}$ loss with a deferral constraint; it provides a theoretical optimality result (Theorem 1) and a realizable-consistent surrogate for joint training (Theorem 2). The authors present two practical training paradigms, LtA-Seq and LtA-Joint, and validate them on synthetic data and a real X-ray dataset, showing that LtA can outperform traditional LtD, especially with richer forms of expert feedback. This framework advances human-AI collaboration by enabling more flexible, budget-aware querying and integration of expert insights into predictive models.

Abstract

Developing decision-support systems that complement human performance in classification tasks remains an open challenge. A popular approach, Learning to Defer (LtD), allows a Machine Learning (ML) model to pass difficult cases to a human expert. However, LtD treats humans and ML models as mutually exclusive decision-makers, restricting the expert contribution to mere predictions. To address this limitation, we propose Learning to Ask (LtA), a new framework that handles both when and how to incorporate expert input in an ML model. LtA is based on a two-part architecture: a standard ML model and an enriched model trained with additional expert human feedback, with a formally optimal strategy for selecting when to query the enriched model. We provide two practical implementations of LtA: a sequential approach, which trains the models in stages, and a joint approach, which optimises them simultaneously. For the latter, we design surrogate losses with realisable-consistency guarantees. Our experiments with synthetic and real expert data demonstrate that LtA provides a more flexible and powerful foundation for effective human-AI collaboration.

Paper Structure

This paper contains 28 sections, 2 theorems, 39 equations, 4 figures, 1 table.

Key Result

Theorem 1

Let $f\in{\mathcal{F}}$ be a standard predictor, and let $g\in{\mathcal{G}}$ be a fixed enriched predictor. Define $\Delta\mathbb{E}(\ell^f, \ell^g) = \mathbb{E}_{y\mid {\mathbf{x}}}[\ell^f\left(f({\mathbf{x}}),y\right)]-\mathbb{E}_{y,h\mid{\mathbf{x}}}[\ell^g\left(g({\mathbf{x}},h), y\right)]$ as t where $\tau_\beta^* = \inf_{\lambda}\{\lambda\geq0: P(\mathbb{E}_{y\mid {\mathbf{x}}} \left[ \ell^
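The theorem statement above is cut off, so the following is only a minimal sketch of one plausible reading: the selector asks for expert input when the estimated expected-loss gap $\Delta\mathbb{E}(\ell^f, \ell^g)$ exceeds a threshold, and the threshold is the smallest non-negative value that keeps the fraction of queries within the budget $\beta$. The function names `budget_threshold` and `ask_mask`, and the use of an empirical sample of gap estimates, are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def budget_threshold(gap, beta):
    """Empirical analogue of tau_beta (assumed form): the smallest lambda >= 0
    such that the fraction of samples with gap > lambda is at most beta."""
    gap = np.asarray(gap, dtype=float)
    # The empirical fraction only changes at observed gap values,
    # so it is enough to test 0 and the positive gaps in increasing order.
    candidates = np.concatenate(([0.0], np.sort(gap[gap > 0])))
    for lam in candidates:
        if np.mean(gap > lam) <= beta:
            return float(lam)
    # At the largest candidate no gap is strictly larger, so the loop
    # always returns for beta >= 0; this line is a defensive fallback.
    return float(candidates[-1])

def ask_mask(gap, beta):
    """Boolean mask: True where the expert would be queried under budget beta."""
    tau = budget_threshold(gap, beta)
    return np.asarray(gap, dtype=float) > tau

if __name__ == "__main__":
    # Hypothetical per-instance estimates of E[loss of f] - E[loss of g].
    estimated_gap = [0.5, -0.1, 0.3, 0.0, 0.2]
    print(budget_threshold(estimated_gap, beta=0.5))
    print(ask_mask(estimated_gap, beta=0.5))
```

In this sketch, instances where the enriched predictor is not expected to help (gap at or below zero) are never sent to the expert, even if the budget would allow it.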

Figures (4)

  • Figure 1: From Learning to Defer (LtD) to Learning to Ask (LtA). (Left) In classical LtD tasks, given an instance ${\mathbf{x}} \in {\mathcal{X}}$, a selection strategy $s({\mathbf{x}})$ defers the final prediction to either a human expert or an ML predictor. However, the expert and the predictor might each have access to informative features that the other lacks (e.g., complex medical signals and the oral medical history of the patient). (Right) Our proposed LtA framework (Section \ref{sec:LtAformal}) instead asks when to request human input, and how to incorporate such complementary information $h \in {\mathcal{H}}$ within an enriched predictor $g: {\mathcal{X}} \times {\mathcal{H}} \rightarrow {\mathcal{Y}}$.
  • Figure 2: Learning to Defer (LtD) can be suboptimal. (Top) Synthetic classification task with two binary features $x_1,x_2 \in \{0,1\}$ and four classes $Y \in \{0,1,2,3\}$. We vary the informativeness of each feature, i.e., how discriminative $x_1$ and $x_2$ are for predicting the true label, creating three scenarios that differ in the empirical test-set accuracies of the expert ($Acc_H$) and the machine ($Acc_M$). (Bottom) For each scenario, neither the machine predictor $f(x_1)$ (blue, #0173B2) nor the human expert predictor $f'(x_2)$ (orange, #DE8F05) can perfectly recover $Y$ alone given their single feature. The oracle deferral strategy LtD* (green, #029E73), which always chooses the correct prediction when possible, remains limited by this partial information. Only a strategy that integrates both signals, LtA* (red-orange, #D55E00), can achieve perfect classification. For each scenario, we report the standard deviation over five runs.
  • Figure 3: Empirical accuracy ($Acc$) at various coverage levels ($1-\beta$), on both synthetic and real classification tasks, with different forms of expert feedback. (\ref{fig:acc_SynthZeroCost}) Results for Synth; (\ref{fig:acc_CHXLtDZeroCost}) Results for X-Rays when using standard LtD feedback; (\ref{fig:acc_CHXUncZeroCost}) Results for X-Rays when using uncertainty feedback to train LtA methods. In Fig. \ref{fig:acc_CHXUncZeroCost}, we also include the LtD baseline. The empirical accuracies of the expert and machine predictions alone are shown as dotted lines, and the standard deviation over five runs as shaded areas.
  • Figure 4: Empirical accuracy ($Acc$) on X-Rays at various coverage levels ($1-\beta$), using different defer costs $\delta \in \{0.0, \ldots, 1.0\}$. (\ref{fig:deltaLtD}) Results for LtD; (\ref{fig:deltaSeq}, \ref{fig:deltaJoint}) Results for LtA-Seq and LtA-Joint, respectively, with uncertainty feedback. We report the standard deviation over five runs as shaded areas.
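
The point made in the Figure 2 caption, that even an oracle deferral strategy is capped when the machine and the expert each see only part of the signal, can be illustrated with a small deterministic toy. This is a hypothetical construction, not the authors' data-generating process: it assumes the label is $Y = 2x_1 + x_2$, that the four feature combinations are equally likely, that the machine sees only $x_1$, and that the expert sees only $x_2$. All function names are illustrative.

```python
from itertools import product

def machine(x1):
    # Machine sees only x1; it guesses the lower of the two compatible labels.
    return 2 * x1

def expert(x2):
    # Expert sees only x2; it likewise guesses the lower compatible label.
    return x2

def enriched(x1, h):
    # Enriched predictor combines x1 with expert feedback h (here h carries x2).
    return 2 * x1 + h

# Enumerate the four equally likely instances of the assumed task Y = 2*x1 + x2.
rows = [(x1, x2, 2 * x1 + x2) for x1, x2 in product((0, 1), repeat=2)]

def accuracy(correct_flags):
    return sum(correct_flags) / len(correct_flags)

acc_machine = accuracy([machine(x1) == y for x1, _, y in rows])
acc_expert = accuracy([expert(x2) == y for _, x2, y in rows])
# Oracle deferral: correct whenever at least one of the two predictions is correct.
acc_oracle_defer = accuracy(
    [machine(x1) == y or expert(x2) == y for x1, x2, y in rows]
)
# Integration: the enriched predictor uses both pieces of information.
acc_enriched = accuracy([enriched(x1, x2) == y for x1, x2, y in rows])

print(acc_machine, acc_expert, acc_oracle_defer, acc_enriched)
# prints: 0.5 0.5 0.75 1.0
```

Under these assumptions, the instance $(x_1, x_2) = (1, 1)$ is misclassified by both single-feature predictors, so no deferral rule can recover it, while combining the two features classifies every instance correctly.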

Theorems & Definitions (3)

  • Theorem 1
  • Theorem 2
  • proof