Multi-rater Prompting for Ambiguous Medical Image Segmentation

Jinhong Wang; Yi Cheng; Jintai Chen; Hongxia Xu; Danny Chen; Jian Wu

TL;DR

This work tackles ambiguous medical image segmentation with multi-rater annotations by introducing PU-Net, a U-Net-based architecture that uses rater-aware prompts and Implantable Transformer Blocks to model both individual rater opinions and inter-rater consensus. The backbone is kept frozen during fine-tuning, with only the prompts and task heads updated, which substantially reduces the computation needed for adaptation. The method formalizes the data of rater $r_j$ as $D^{r_j} = \{x_k, r_j, y_k^{r_j}\}$ and uses $R+1$ prompts for $R$ raters, including an aggregation prompt, guided by a mix-training objective $L = L_{task}(x_k, r; \theta_t, \theta_p)$. Experiments on the RIGA dataset show that PU-Net achieves calibrated, competitive segmentation across different ground-truth (GT) annotations while updating approximately $0.3\%$ of parameters, which makes it practical for scalable, domain-adaptive medical image analysis.
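
The formalization $D^{r_j} = \{x_k, r_j, y_k^{r_j}\}$ can be made concrete with a small sketch. The code below is an illustration only, not the authors' implementation: the function names, the use of nested lists as masks, and the key `"c"` for the consensus set are assumptions. It also assumes that the aggregation prompt is paired with pixel-wise majority-voting labels, which is suggested by the Figure 3 caption but not spelled out in this summary.

```python
from typing import Dict, List, Tuple, Union

Mask = List[List[int]]            # toy binary mask as nested lists
RaterId = Union[int, str]         # 0..R-1 for raters, "c" for consensus (assumed key)
Sample = Tuple[str, RaterId, Mask]


def majority_vote(masks: List[Mask]) -> Mask:
    """Pixel-wise majority vote over R binary masks (ties resolve to 0)."""
    n = len(masks)
    rows, cols = len(masks[0]), len(masks[0][0])
    return [
        [1 if 2 * sum(m[i][j] for m in masks) > n else 0 for j in range(cols)]
        for i in range(rows)
    ]


def build_rater_datasets(
    images: List[str], annotations: List[List[Mask]]
) -> Dict[RaterId, List[Sample]]:
    """Build one dataset per rater plus a consensus dataset.

    annotations[k][j] is the mask that rater j drew for image k.
    Each sample is a triple (x_k, r_j, y_k^{r_j}), mirroring D^{r_j}.
    """
    num_raters = len(annotations[0])
    datasets: Dict[RaterId, List[Sample]] = {j: [] for j in range(num_raters)}
    datasets["c"] = []  # consensus set, assumed to feed the aggregation prompt
    for x_k, masks in zip(images, annotations):
        for j, y in enumerate(masks):
            datasets[j].append((x_k, j, y))
        datasets["c"].append((x_k, "c", majority_vote(masks)))
    return datasets
```

With $R$ raters this yields $R+1$ datasets, one per prompt, so each training sample carries the rater identity that selects which prompt is active.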

Abstract

Multi-rater annotations commonly occur when medical images are independently annotated by multiple experts (raters). In this paper, we tackle two challenges that arise in multi-rater annotations for medical image segmentation (called ambiguous medical image segmentation): (1) how to train a deep learning model when a group of raters produces a set of diverse but plausible annotations, and (2) how to fine-tune the model efficiently when computation resources are not available for re-training the entire model on a different dataset domain. We propose a multi-rater prompt-based approach that addresses both challenges together. Specifically, we introduce a series of rater-aware prompts that can be plugged into the U-Net model for uncertainty estimation to handle multi-annotation cases. During the prompt-based fine-tuning process, only 0.3% of the learnable parameters need to be updated, compared to training the entire model. Further, to integrate expert consensus and disagreement, we explore different multi-rater incorporation strategies and design a mix-training strategy for comprehensive insight learning. Extensive experiments on two public datasets verify the effectiveness of our new approach for ambiguous medical image segmentation while alleviating the heavy burden of model re-training.
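
The 0.3% figure is a ratio of updated parameters to all parameters. The sketch below shows only that arithmetic. Every count in it is a placeholder chosen so the ratio lands near 0.3%; none of the numbers come from the paper, and the `ParamGroup` type is hypothetical. The only grounded detail is that six raters give $R+1 = 7$ prompts.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ParamGroup:
    name: str
    count: int        # number of scalar parameters in the group
    trainable: bool   # False means the group stays frozen during fine-tuning


def trainable_fraction(groups: List[ParamGroup]) -> float:
    """Share of all parameters that receive gradient updates."""
    total = sum(g.count for g in groups)
    trainable = sum(g.count for g in groups if g.trainable)
    return trainable / total


# Placeholder counts (NOT taken from the paper), for illustration only.
groups = [
    ParamGroup("backbone (frozen)", 10_000_000, False),
    ParamGroup("rater prompts (7 prompts, 512 values each)", 7 * 512, True),
    ParamGroup("task heads", 26_416, True),
]

print(f"{100 * trainable_fraction(groups):.2f}% of parameters updated")
```

In a real framework the same ratio would be computed over the model's actual parameter tensors after the backbone has been frozen.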

Paper Structure (11 sections, 4 equations, 3 figures, 1 table)

Introduction
Method
  Problem Definition
  The Overall Framework
  Prompt Learning with Rater-aware Prompts
  Implantable Transformer for Prompt Learning
Experiments
  Datasets
  Experimental Setup
  Experimental Results
Discussions and Conclusions

Figures (3)

Figure 1: (a) A disagreement example when multiple raters (experts) produce diverse but plausible annotations on an image. (b) In our proposed PU-Net, we adopt prompt learning to consider all individual opinions from different raters and adapt the pre-trained model for downstream tasks to avoid fine-tuning the entire model with heavy computation resources/costs.
Figure 2: An overview of our PU-Net architecture. The network is U-shaped and the backbone is a CNN module. We assign $R + 1$ prompts for $R$ rater cases and apply prompt learning by inserting an Implantable Transformer Block (ITB) between every two adjacent downsampling/upsampling layers. Channel-wise FCs are used to align the dimension and distribution of the prompts with image features.
Figure 3: Visualization of calibrated segmentation mask predictions of the SOTA method CM-Net and our proposed PU-Net. Raters 1--6 correspond to the annotations of 6 different annotators, and MV corresponds to the majority-voting annotations. In the bottom row, the purple $P_{r_j}$ or $P_{c}$ in each box represents the prompt for the corresponding Rater $j$ or MV.
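
The mechanism in the Figure 2 caption (a prompt aligned to the feature channels by a fully connected layer, then mixed with image features inside a Transformer block) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the paper's Implantable Transformer Block: it uses single-head self-attention with identity query/key/value projections, a residual connection, and no normalization or feed-forward layer. The name `itb_forward` and all shapes are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def linear(weight: List[Vector], bias: Vector, x: Vector) -> Vector:
    """Fully connected layer mapping a prompt of size d_p to C channels."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head self-attention with identity Q/K/V (a simplification)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * t[c] for w, t in zip(weights, tokens)) for c in range(d)])
    return out


def itb_forward(
    feature_map: List[List[Vector]],   # H x W x C image features
    prompt: Vector,                    # rater prompt of size d_p
    fc_weight: List[Vector],           # C x d_p alignment weights
    fc_bias: Vector,                   # C alignment biases
) -> List[List[Vector]]:
    """Inject one rater prompt into a feature map and return an H x W x C map."""
    h, w = len(feature_map), len(feature_map[0])
    pixel_tokens = [feature_map[i][j] for i in range(h) for j in range(w)]
    prompt_token = linear(fc_weight, fc_bias, prompt)   # align prompt to C channels
    mixed = self_attention([prompt_token] + pixel_tokens)
    # Residual update of the pixel tokens; the prompt token itself is dropped.
    updated = [
        [old + new for old, new in zip(p, m)]
        for p, m in zip(pixel_tokens, mixed[1:])
    ]
    return [updated[i * w:(i + 1) * w] for i in range(h)]
```

Because the output keeps the input's spatial shape and channel count, a block of this kind can sit between two adjacent convolutional stages without changing them, and swapping the prompt vector changes the features the decoder sees.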
