Optimizing Resources for On-the-Fly Label Estimation with Multiple Unknown Medical Experts

Tim Bary, Tiffanie Godelaine, Axel Abels, Benoît Macq

TL;DR

This work tackles real-time ground truth estimation in medical screening when expert pools are unknown and data arrive as a stream. It introduces a modular backbone that alternates between ranking experts by Bayesian trust, inferring a coalitional label with a calculated confidence, and updating expert parameters via EM, stopping when $c_*^n \ge \tau$. The approach achieves comparable accuracy to non-adaptive baselines while reducing the number of expert queries by up to 50% across three multi-annotator datasets, demonstrating practical gains for continuous screening workflows. The framework supports on-the-fly labeling, cold-start capability, and adaptive allocation of experts to difficult cases, offering a scalable path toward tighter integration of human expertise in medical AI pipelines.

Abstract

Accurate ground truth estimation in medical screening programs often relies on coalitions of experts and peer second opinions. Algorithms that efficiently aggregate noisy annotations can enhance screening workflows, particularly when data arrive continuously and expert proficiency is initially unknown. However, existing algorithms do not meet the requirements for seamless integration into screening pipelines. We therefore propose an adaptive approach for real-time annotation that (I) supports on-the-fly labeling of incoming data, (II) operates without prior knowledge of medical experts or pre-labeled data, and (III) dynamically queries additional experts based on the latent difficulty of each instance. The method incrementally gathers expert opinions until a confidence threshold is met, providing accurate labels with reduced annotation overhead. We evaluate our approach on three multi-annotator classification datasets across different modalities. Results show that our adaptive querying strategy reduces the number of expert queries by up to 50% while achieving accuracy comparable to a non-adaptive baseline. Our code is available at https://github.com/tbary/MEDICS
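
The query-until-confident loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (see the linked repository for that): labels are binary, each expert is summarized by a single estimated accuracy, experts are ranked greedily by that accuracy, and the parameter update is a simple agreement count standing in for the EM step. The names `Expert`, `posterior`, and `label_instance` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Expert:
    """Hypothetical expert record; accuracy is tracked with Beta-style pseudo-counts."""
    name: str
    annotate: Callable[[object], int]  # returns a label in {0, 1}
    agree: float = 2.0     # pseudo-count of agreements with the coalitional label
    disagree: float = 1.0  # pseudo-count of disagreements

    @property
    def accuracy(self) -> float:
        return self.agree / (self.agree + self.disagree)


def posterior(votes: List[Tuple[int, float]]) -> List[float]:
    """Naive-Bayes posterior over {0, 1} from (label, accuracy) votes, uniform prior."""
    weights = [1.0, 1.0]
    for label, acc in votes:
        for z in (0, 1):
            weights[z] *= acc if label == z else 1.0 - acc
    total = sum(weights)
    return [w / total for w in weights]


def label_instance(x, experts: List[Expert], tau: float):
    """Query experts one at a time until the confidence reaches tau."""
    # Assumption: greedy ranking by current estimated accuracy.
    ranked = sorted(experts, key=lambda e: e.accuracy, reverse=True)
    votes, queried = [], []
    label, confidence = 0, 0.5
    for expert in ranked:
        z = expert.annotate(x)
        votes.append((z, expert.accuracy))
        queried.append((expert, z))
        post = posterior(votes)
        label = max((0, 1), key=lambda c: post[c])
        confidence = post[label]
        if confidence >= tau:  # stopping rule: confidence >= tau
            break
    # Simplified update (NOT EM): use the coalitional label as the reference.
    for expert, z in queried:
        if z == label:
            expert.agree += 1.0
        else:
            expert.disagree += 1.0
    return label, confidence, len(queried)
```

With three unanimous experts starting from the same prior accuracy of 2/3 and `tau=0.75`, the loop stops after two queries, since the confidence goes from 2/3 to 0.8. Raising `tau` forces more queries on the same instance, which is the accuracy versus annotation-cost trade-off that the threshold controls.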

Paper Structure

This paper contains 28 sections, 3 equations, 3 figures, 3 tables, 1 algorithm.

Figures (3)

  • Figure 1: Illustration of the problem setup. Given a data point $x^n$, the algorithm must (A) select $Q^n$ experts $\{a_q\}_{q=1}^{Q^n}$ from a coalition of size $K$ to annotate $x^n$, (B) infer a coalitional label $\hat{z}_*^n$ based on the experts' estimated labels $\{\hat{z}^n_q\}_{q=1}^{Q^n}$, and (C) update the estimated parameters $\{\hat{\theta}_{(k)}\}_{k=1}^K$ and trusts $\{r_{(k)}\}_{k=1}^K$ of the experts. In this example, $Q^n = 3$.
  • Figure 2: Comparison of the accuracies of the adaptive and baseline algorithms across different expert sampling strategies for the (1) Glioma Classification, (2) Weather Sentiment, and (3) Music Genre datasets, as a function of the number of queried experts. Points on the curves represent average performance for a given threshold over 100 bootstrap repetitions. The scale values for $\tau$ and $Q$ are given in Section \ref{sec:eval}. The shaded areas represent the 95% confidence intervals.
  • Figure 3: Average number of queried experts through time for the adaptive algorithm with AUER expert sampling over 100 bootstrap repetitions. Similar curves are observed with the Greedy and Random expert samplings. The shaded areas represent the 95% confidence intervals.
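
The captions for Figures 2 and 3 report averages over 100 bootstrap repetitions with 95% confidence intervals but do not say how the intervals are computed. One common choice, shown here purely as an assumption, is a percentile interval over the bootstrap means. The function name `bootstrap_mean_ci` and its parameters are hypothetical.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_boot=100, alpha=0.05, seed=0):
    """Mean of bootstrap resample means, with a percentile (1 - alpha) interval."""
    rng = random.Random(seed)
    n = len(values)
    # Each repetition resamples the data with replacement and records its mean.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lower = means[int((alpha / 2) * (n_boot - 1))]
    upper = means[int((1 - alpha / 2) * (n_boot - 1))]
    return statistics.fmean(means), lower, upper
```

Here `values` could be per-instance correctness indicators (for an accuracy curve) or per-instance query counts (for the number of queried experts); the returned triple gives the plotted point and the edges of its shaded band.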