Learning from Disagreement: A Group Decision Simulation Framework for Robust Medical Image Segmentation

Chen Zhong, Yuxuan Yang, Xinyue Zhang, Ruohan Ma, Yong Guo, Gang Li, Jupeng Li

TL;DR

This work tackles inter-rater variability in medical image segmentation by reframing disagreement as an informative signal rather than noise. It introduces a group decision simulation (GDS) framework consisting of an Expert Signature Generator (ESG) and a Simulated Consultation Module (SCM) that jointly disentangle annotator biases from image content and synthesize final segmentations via adaptive sampling. Empirical results show state-of-the-art Dice scores on challenging CBCT and MRI datasets and strong performance in both high-consensus and highly uncertain regions, demonstrating improved robustness and clinically relevant uncertainty modeling. The approach offers a principled path toward trustworthy, uncertainty-aware computer-aided diagnosis.

Abstract

Medical image segmentation annotation suffers from inter-rater variability (IRV) due to differences in annotators' expertise and the inherent blurriness of medical images. Standard approaches that simply average expert labels are flawed, as they discard the valuable clinical uncertainty revealed in disagreements. We introduce a fundamentally new approach with our group decision simulation framework, which works by mimicking the collaborative decision-making process of a clinical panel. Under this framework, an Expert Signature Generator (ESG) learns to represent individual annotator styles in a unique latent space. A Simulated Consultation Module (SCM) then intelligently generates the final segmentation by sampling from this space. This method achieved state-of-the-art results on challenging CBCT and MRI datasets (92.11% and 90.72% Dice scores). By treating expert disagreement as a useful signal instead of noise, our work provides a clear path toward more robust and trustworthy AI systems for healthcare.
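
To make the reported metric and the criticism of label averaging concrete, the following is a minimal sketch, not the authors' implementation. It computes the Dice score, 2|A ∩ B| / (|A| + |B|), for binary masks and contrasts a majority-vote consensus with a per-pixel agreement map. Majority voting is used here as a stand-in for "simply averaging expert labels"; the function names (`dice_score`, `majority_vote`, `agreement_map`) and the toy 3x3 masks are hypothetical and chosen only for illustration.

```python
from typing import List

Mask = List[List[int]]  # binary mask: 1 = foreground, 0 = background


def dice_score(pred: Mask, target: Mask) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|); defined as 1.0 when both masks are empty."""
    inter = sum(p & t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    total = sum(map(sum, pred)) + sum(map(sum, target))
    return 1.0 if total == 0 else 2.0 * inter / total


def majority_vote(masks: List[Mask]) -> Mask:
    """Fuse rater masks into one label map (assumed stand-in for label averaging)."""
    n = len(masks)
    rows, cols = len(masks[0]), len(masks[0][0])
    return [
        [1 if 2 * sum(m[r][c] for m in masks) > n else 0 for c in range(cols)]
        for r in range(rows)
    ]


def agreement_map(masks: List[Mask]) -> List[List[float]]:
    """Fraction of raters marking each pixel as foreground.

    Values strictly between 0 and 1 mark pixels where raters disagree;
    this is the information a single fused mask no longer carries.
    """
    n = len(masks)
    rows, cols = len(masks[0]), len(masks[0][0])
    return [
        [sum(m[r][c] for m in masks) / n for c in range(cols)]
        for r in range(rows)
    ]


# Toy example: three hypothetical raters who disagree on two boundary pixels.
rater_a = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
rater_b = [[0, 1, 1], [0, 1, 0], [0, 0, 0]]
rater_c = [[0, 1, 1], [1, 1, 1], [0, 0, 0]]
raters = [rater_a, rater_b, rater_c]

consensus = majority_vote(raters)
agreement = agreement_map(raters)
disputed = [
    (r, c)
    for r, row in enumerate(agreement)
    for c, frac in enumerate(row)
    if 0.0 < frac < 1.0
]

for name, mask in zip("abc", raters):
    print(f"Dice(consensus, rater_{name}) = {dice_score(consensus, mask):.3f}")
print("disputed pixels:", disputed)
```

In this toy case the fused mask coincides with one rater's annotation, while the agreement map still shows which pixels were disputed. A framework that models each annotator separately, as the abstract describes, keeps that per-rater information instead of collapsing it into a single label.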

Paper Structure

This paper contains 11 sections, 1 equation, 2 figures, 4 tables.

Figures (2)

  • Figure 1: The overall structure of our proposed framework.
  • Figure 2: Diversified segmentation results of our proposed framework on the public QUBIQ (left four columns) and private JCL (right two columns) datasets. Different colors denote different delineations, showing that our model generates diverse and plausible predictions only in the inherently blurred areas of the medical image (white arrow).