Towards Collective Intelligence: Uncertainty-aware SAM Adaptation for Ambiguous Medical Image Segmentation

Mingzhou Jiang, Jiaying Zhou, Junde Wu, Tianyang Wang, Yueming Jin, Min Xu

TL;DR

This paper addresses the challenge of ambiguous medical image segmentation where multiple experts provide valid yet differing interpretations. It introduces UA-SAM, an uncertainty-aware adapter that learns a multi-expert latent distribution via a conditional variational autoencoder and aligns uncertainty with adapter positions and prompts to generate diverse, clinically plausible segmentations. The approach enables a one-to-many mapping from input images to plausible segmentation masks, capturing expert variability while maintaining efficiency through parameter-efficient adapter design, including Position-conditioned Attention and Prompt Channel Attention. Across seven multi-expert benchmarks, UA-SAM achieves state-of-the-art consensus performance and favorable distributional alignment (GED), demonstrating potential to enhance clinical reliability and interpretability in ambiguous segmentation tasks.
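The distributional alignment metric mentioned above, the generalized energy distance (GED), can be sketched as follows. This is a minimal illustration, assuming the commonly used squared form with 1 - IoU as the distance between masks; the summary does not state the exact variant the paper uses, and the function names here are hypothetical.

```python
import numpy as np

def iou_distance(a, b):
    """Return 1 - IoU for two binary masks (two empty masks count as identical)."""
    a, b = np.asarray(a).astype(bool), np.asarray(b).astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return 1.0 - np.logical_and(a, b).sum() / union

def mean_pairwise_distance(xs, ys):
    """Average distance over every pair (x, y) with x from xs and y from ys."""
    return float(np.mean([iou_distance(x, y) for x in xs for y in ys]))

def squared_ged(samples, annotations):
    """Squared GED: 2*E[d(S, Y)] - E[d(S, S')] - E[d(Y, Y')].

    samples:     masks predicted by the model for one image
    annotations: masks drawn by the experts for the same image
    Assumption: d is 1 - IoU; other distances are possible.
    """
    cross = mean_pairwise_distance(samples, annotations)
    within_samples = mean_pairwise_distance(samples, samples)
    within_annotations = mean_pairwise_distance(annotations, annotations)
    return 2.0 * cross - within_samples - within_annotations
```

A lower value means the set of sampled masks is closer to the set of expert annotations; the value is zero when the two sets are identical.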

Abstract

Collective intelligence from multiple medical experts consistently surpasses individual expertise in clinical diagnosis, particularly for ambiguous medical image segmentation tasks involving unclear tissue boundaries or pathological variations. The Segment Anything Model (SAM), a powerful vision foundation model originally designed for natural image segmentation, has shown remarkable potential when adapted to medical image segmentation tasks. However, existing SAM adaptation methods follow a single-expert paradigm, developing models based on individual expert annotations to predict deterministic masks. These methods systematically ignore the inherent uncertainty and variability in expert annotations, which fundamentally contradicts clinical practice, where multiple specialists provide different yet equally valid interpretations that collectively enhance diagnostic confidence. We propose an Uncertainty-aware Adapter, the first SAM adaptation framework designed to move from a single-expert mindset to a collective-intelligence representation. Our approach integrates stochastic uncertainty sampling from a Conditional Variational Autoencoder into the adapters, enabling diverse prediction generation that captures expert knowledge distributions rather than individual expert annotations. We employ a novel position-conditioned control mechanism to integrate multi-expert knowledge, ensuring that the output distribution closely aligns with the multi-annotation distribution. Comprehensive evaluations across seven medical segmentation benchmarks demonstrate that our collective intelligence-based adaptation achieves superior performance while maintaining computational efficiency, establishing a new adaptation framework for reliable clinical implementation.
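The stochastic sampling step described in the abstract can be illustrated with the standard reparameterization trick used in variational autoencoders. This is only a sketch under stated assumptions: it assumes a diagonal Gaussian latent parameterized by a mean and a log-variance, and the name `sample_latents` is hypothetical. How the paper's adapters consume the samples is not shown here.

```python
import numpy as np

def sample_latents(mu, log_var, n, rng):
    """Draw n latent codes z = mu + sigma * eps with eps ~ N(0, I).

    mu, log_var: 1-D arrays describing an (assumed) diagonal Gaussian
                 produced by an encoder conditioned on the input image.
    n:           number of latent codes, one per desired segmentation.
    rng:         a numpy random Generator, passed in for reproducibility.
    """
    mu = np.asarray(mu, dtype=float)
    std = np.exp(0.5 * np.asarray(log_var, dtype=float))
    eps = rng.standard_normal((n,) + mu.shape)
    return mu + std * eps

# Each row is one latent code; feeding different rows to the same
# decoder is what yields a one-to-many mapping from image to masks.
rng = np.random.default_rng(0)
z = sample_latents(mu=np.zeros(3), log_var=np.zeros(3), n=4, rng=rng)
```

Because the noise is drawn outside the network and only scaled and shifted by the predicted parameters, gradients can flow through `mu` and `log_var` during training.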

Paper Structure (20 sections, 9 equations, 8 figures, 7 tables)

Figures (8)

  • Figure 1: An example of previous SAM adaptation methods (based on a single-expert strategy) and our proposed Uncertainty-aware Adapter, based on multi-expert collective intelligence, for clinical scenarios.
  • Figure 2: The impact of aggregating different numbers of expert annotations for adapting SAM to optic cup segmentation.
  • Figure 3: Overview of the UA-SAM framework architecture. We freeze the parameters of SAM and update only the Uncertainty-aware Adapter parameters. The prompt encoder is not shown.
  • Figure 4: Visual comparison of segmentation results. UA-SAM produces more prominent uncertainty boundaries while maintaining shapes that are closer to the ground truth. Here, the Ensemble U-Net is implemented by training U-Net with three different random seeds. Because the predicted shapes differ substantially, we use majority voting to aggregate the results from the multiple U-Nets.
  • Figure 5: The impact of the number of sampling runs. For each image, the model predicts $n$ times, and the results are aggregated into the final mask via majority voting.
  • ...and 3 more figures
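
The majority-voting aggregation mentioned in the captions of Figures 4 and 5 can be sketched as below. This is a minimal illustration, not the authors' implementation: the function name is hypothetical, and the tie rule (a pixel needs strictly more than half of the votes, so ties with an even $n$ fall to background) is an assumption.

```python
import numpy as np

def majority_vote(masks):
    """Combine n binary masks of equal shape into one mask, pixel by pixel."""
    stacked = np.stack([np.asarray(m).astype(bool) for m in masks], axis=0)
    votes = stacked.sum(axis=0)  # number of foreground votes per pixel
    # Assumption: strict majority; ties resolve to background.
    return votes * 2 > stacked.shape[0]

# Three sampled predictions for a 2x2 image.
predictions = [
    np.array([[1, 1], [0, 0]]),
    np.array([[1, 0], [0, 0]]),
    np.array([[1, 0], [1, 0]]),
]
final_mask = majority_vote(predictions)
```

Only the top-left pixel is foreground in at least two of the three predictions, so it is the only pixel set in `final_mask`.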