Diversified and Personalized Multi-rater Medical Image Segmentation

Yicheng Wu; Xiangde Luo; Zhe Xu; Xiaoqing Guo; Lie Ju; Zongyuan Ge; Wenjun Liao; Jianfei Cai

TL;DR

This work tackles annotation ambiguity in medical image segmentation by introducing D-Persona, a two-stage framework that first learns a shared latent space to capture diverse expert opinions and then derives per-expert prompts via attention-based projections for personalized segmentation. Stage I employs a bound-constrained loss with a probabilistic U-Net backbone to broaden segmentation diversity, while Stage II uses multiple projection heads and cross-attention against a prior bank to deliver expert-specific outputs without retraining the core model. Evaluations on NPC-170 and LIDC-IDRI show state-of-the-art performance in both diversification and personalization metrics, underscoring the method's ability to provide multiple plausible segmentations alongside personalized predictions. The approach offers practical benefits for clinical workflows by enabling diverse opinions and individualized analysis within a single framework, with code to be released for reproducibility.

Abstract

Annotation ambiguity, caused by inherent data uncertainties such as blurred boundaries in medical scans and by differences in observer expertise and preferences, has become a major obstacle to training deep-learning-based medical image segmentation models. The common remedy is to gather multiple annotations from different experts, which leads to the setting of multi-rater medical image segmentation. Existing works aim to merge the different annotations into a single "ground truth" (which is often unattainable in medical contexts), to generate diverse results, or to produce personalized results corresponding to individual expert raters. Here, we set a more ambitious goal for multi-rater medical image segmentation: obtaining both diversified and personalized results. Specifically, we propose a two-stage framework named D-Persona (first Diversification and then Personalization). In Stage I, we exploit the multiple given annotations to train a Probabilistic U-Net model with a bound-constrained loss that improves prediction diversity. This constructs a common latent space in which different latent codes denote diversified expert opinions. In Stage II, we design multiple attention-based projection heads that adaptively query the corresponding expert prompts from the shared latent space and then perform personalized medical image segmentation. We evaluated the proposed model on our in-house Nasopharyngeal Carcinoma dataset and on the public lung nodule dataset LIDC-IDRI. Extensive experiments demonstrate that D-Persona provides diversified and personalized results at the same time, achieving new SOTA performance for multi-rater medical image segmentation. Our code will be released at https://github.com/ycwu1997/D-Persona.
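
To make the Stage II idea more concrete, the following is a minimal sketch of how one attention-based head might query an expert prompt from a bank of latent codes. It is not the authors' implementation: the names `query_expert_prompt`, `expert_query`, and `prior_bank` are hypothetical, the latent codes are used directly as both keys and values (a real head would likely apply learned projections), and the scaled dot-product form is an assumption.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def query_expert_prompt(expert_query, prior_bank):
    """Attend over a bank of latent codes with one expert-specific query.

    expert_query: list of floats, a (hypothetical) learned vector for one expert.
    prior_bank:   list of latent codes, each a list of floats of the same length.
    Returns the attention-weighted prompt and the attention weights.
    """
    dim = len(expert_query)
    # Scaled dot-product score between the query and every latent code.
    scores = [
        sum(q * k for q, k in zip(expert_query, code)) / math.sqrt(dim)
        for code in prior_bank
    ]
    weights = softmax(scores)
    # The prompt is the weighted average of the latent codes.
    prompt = [
        sum(w * code[i] for w, code in zip(weights, prior_bank))
        for i in range(dim)
    ]
    return prompt, weights


# Two toy latent codes; this query is aligned with the first one.
bank = [[1.0, 0.0], [0.0, 1.0]]
prompt, weights = query_expert_prompt([10.0, 0.0], bank)
```

In this toy setting each expert would own a separate query vector, so several experts can share the same bank while receiving different prompts.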

Paper Structure (21 sections, 11 equations, 5 figures, 4 tables)

Introduction
Related work
Crowdsourcing-based Segmentation
Generation-based Segmentation
One-stage Personalization Segmentation
Noisy Learning
Methods
Stage I: Diversified Segmentation
Stage II: Personalized Segmentation
Experiments and Results
Dataset
Implementation Details
Evaluation Metrics
Performance on NPC-170
Performance on LIDC-IDRI
...and 6 more sections

Figures (5)

Figure 1: Overview of scheme designs in multi-rater medical image segmentation. Top: problem setting and expected objectives (i.e., meta, implicit, or explicit experts); Middle: existing methods including crowdsourcing, generation, and one-stage personalization; Bottom: our proposed two-stage framework, providing both diversified and specifically personalized segmentation simultaneously.
Figure 2: Pipeline of our proposed D-Persona framework for multi-rater medical image segmentation. Left: Stage I is designed to construct a common latent space where different latent codes lead to diversified segmentation results; Right: Stage II performs the personalized segmentation by individual projection heads to mimic the corresponding expert raters.
Figure 3: Exemplar explanation of $Dice_{max}$ and $Dice_{match}$ in a given $4 \times 6$ Dice matrix. $Dice_{max}$ averages the maximum scores of individual columns and $Dice_{match}$ further constrains a one-to-one matching between the prediction and annotation sets.
Figure 4: Diversified segmentation results of Stage I in our proposed D-Persona framework on the NPC-170 (Left) and LIDC-IDRI (Right) datasets. Different colors denote different delineations, which shows that our model can generate diverse and plausible predictions.
Figure 5: Personalized segmentation results of Stage II in our proposed D-Persona framework on the NPC-170 (Left) and LIDC-IDRI (Right) datasets. Compared to expert annotations (Red), our model can generate corresponding personalized segmentation results (Yellow). Particularly, our model successfully captures the underlying annotation preferences, i.e., from conservative to aggressive styles, as shown in the four LIDC-IDRI results on the right.
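
The two set-level metrics described in the Figure 3 caption can be sketched as follows. This is an illustrative reading of the caption, not the authors' evaluation code: `dice_max` averages the column-wise maxima of a Dice matrix, and `dice_match` searches for the best one-to-one assignment by brute force. How a non-square matrix is handled is an assumption here (the smaller side is fully matched and the score is averaged over the matched pairs). Brute force is only practical for small matrices; a Hungarian-algorithm solver would be the usual choice at scale.

```python
from itertools import permutations


def dice_max(dice):
    """Average of the column-wise maxima of a Dice matrix (list of rows)."""
    n_cols = len(dice[0])
    col_max = [max(row[j] for row in dice) for j in range(n_cols)]
    return sum(col_max) / n_cols


def dice_match(dice):
    """Best one-to-one matching score, averaged over the matched pairs.

    Assumption: for a rectangular matrix, every entry of the smaller side
    is matched to a distinct entry of the larger side.
    """
    n_rows, n_cols = len(dice), len(dice[0])
    if n_rows > n_cols:
        # Transpose so that rows always index the smaller side.
        dice = [list(col) for col in zip(*dice)]
        n_rows, n_cols = n_cols, n_rows
    # Try every assignment of distinct columns to the rows and keep the best.
    best = max(
        sum(dice[i][j] for i, j in enumerate(cols))
        for cols in permutations(range(n_cols), n_rows)
    )
    return best / n_rows


# Toy 2 x 2 Dice matrix (values are made up for illustration).
toy = [[0.9, 0.8],
       [0.7, 0.2]]
score_max = dice_max(toy)
score_match = dice_match(toy)
```

In the toy matrix both column maxima fall in the first row, so the one-to-one constraint forces a less favorable pairing and the matched score is lower than the unconstrained one.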
