Probabilistic Modeling of Multi-rater Medical Image Segmentation for Diversity and Personalization

Ke Liu, Shangde Gao, Yichao Fu, Shangqi Gao, Chunhua Shen

TL;DR

ProSeg introduces a probabilistic model with two latent variables for multi-rater medical image segmentation, aiming to achieve diversity and personalization simultaneously. The latent variable $\tau$ captures annotator preferences and $Z$ models boundary ambiguity; both are learned via variational inference, so that diverse, expert-aligned segmentations can be sampled. On the NPC and LIDC-IDRI datasets, ProSeg achieves state-of-the-art performance on diversity and personalization metrics, outperforming both generation and personalization baselines. An ablation study indicates that both latent spaces are necessary, which supports the practicality of a unified probabilistic framework for broader medical image segmentation tasks.

Abstract

Medical image segmentation is inherently influenced by data uncertainty, arising from ambiguous boundaries in medical scans and inter-observer variability in diagnosis. To address this challenge, previous works formulated the multi-rater medical image segmentation task, where multiple experts provide separate annotations for each image. However, existing models are typically constrained either to generate diverse segmentations that lack expert specificity or to produce personalized outputs that merely replicate individual annotators. We propose Probabilistic modeling of multi-rater medical image Segmentation (ProSeg), which enables diversification and personalization simultaneously. Specifically, we introduce two latent variables to model expert annotation preferences and image boundary ambiguity. Their conditional probability distributions are then obtained through variational inference, allowing segmentation outputs to be generated by sampling from these distributions. Extensive experiments on both the nasopharyngeal carcinoma dataset (NPC) and the lung nodule dataset (LIDC-IDRI) demonstrate that ProSeg achieves new state-of-the-art performance, providing segmentation results that are both diverse and expert-personalized. Code is available at https://github.com/AI4MOL/ProSeg.
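To make the sampling idea concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes Gaussian distributions for both latent variables, and it replaces the learned networks with random linear maps. Every name in it (`expert_mu`, `W_enc`, `W_dec`, `sample_segmentation`) and every size is hypothetical. It only illustrates how one expert-specific latent and one image-specific latent could be sampled and decoded jointly into a mask.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes, chosen only for illustration (assumptions, not from the paper).
H, W = 32, 32
NUM_EXPERTS = 4
D_TAU, D_Z = 2, 8

# Hypothetical stand-ins for learned parameters; random here on purpose.
expert_mu = rng.normal(size=(NUM_EXPERTS, D_TAU))        # mean of q(tau | r)
expert_log_sigma = np.full((NUM_EXPERTS, D_TAU), -1.0)   # log std of q(tau | r)
W_enc = rng.normal(scale=0.05, size=(H * W, 2 * D_Z))    # toy encoder for p(z | x)
W_dec = rng.normal(scale=0.5, size=(D_TAU + D_Z, H * W)) # toy predictor p(y | tau, z)


def sample_gaussian(mu, log_sigma):
    """Reparameterized draw: mu + sigma * eps, with eps ~ N(0, I)."""
    return mu + np.exp(log_sigma) * rng.standard_normal(mu.shape)


def sample_segmentation(image, expert_id):
    """Draw one binary mask for a given image and expert index."""
    # Expert preference latent: depends only on the annotator.
    tau = sample_gaussian(expert_mu[expert_id], expert_log_sigma[expert_id])
    # Ambiguity latent: depends only on the image.
    stats = image.reshape(-1) @ W_enc
    mu_z, log_sigma_z = stats[:D_Z], np.tanh(stats[D_Z:])
    z = sample_gaussian(mu_z, log_sigma_z)
    # Decode both latents into per-pixel foreground probabilities.
    logits = np.concatenate([tau, z]) @ W_dec
    probs = 1.0 / (1.0 + np.exp(-logits))
    return (probs > 0.5).reshape(H, W)


image = rng.random((H, W))
# Repeated draws for the same expert give different masks.
masks = [sample_segmentation(image, expert_id=1) for _ in range(3)]
```

In this sketch, fixing `expert_id` while resampling corresponds to the personalization side, and the randomness in both latents corresponds to the diversity side. How ProSeg actually parameterizes and trains these distributions is described in the paper and its code repository.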

Paper Structure

This paper contains 48 sections, 17 equations, 11 figures, 9 tables.

Figures (11)

  • Figure 1: Distance distribution between two random experts. A greater distance indicates higher diversity and a more similar distribution with the Gold standard indicates better personalization.
  • Figure 2: Probabilistic graphical models (PGMs) of methods for multi-rater segmentation. $X$, $\mathcal{R}$, and $Y$ denote the images, expert annotators, and annotations, respectively. The latent variable $Z$ denotes the ambiguity in medical scans. In our probabilistic model, a latent variable $\tau$ is introduced to model the subjective variation among expert annotators. The green rectangular box represents a set of variables.
  • Figure 3: Model architecture of deep variational inference for multi-rater segmentation. ProSeg consists of image decoders $p(\textit{x}|\textit{z}_i)$, image encoders $p(\textit{z}_i|\textit{x})$, class embedding $q(\tau|\mathcal{R})$, classifier $p(\mathcal{R}|\tau)$, and the segmentation predictor $p(\textit{y}_{r_i}|\tau_i,z_i)$.
  • Figure 4: Expert annotator rank distribution of the training (first row) and test (second row) datasets, where each expert's rank is determined by annotation area.
  • Figure 6: Class embedding distribution
  • ...and 6 more figures
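The caption of Figure 1 refers to the distribution of distances between the annotations of two randomly chosen experts. The sketch below shows one way such a distribution could be computed. The distance used here, 1 minus intersection-over-union, is an assumption, since this outline does not state which distance the paper uses. The function names `iou_distance` and `random_pair_distances` are hypothetical.

```python
import numpy as np


def iou_distance(mask_a, mask_b):
    """Return 1 - IoU between two binary masks (assumed distance)."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Two empty masks are treated as identical.
        return 0.0
    intersection = np.logical_and(a, b).sum()
    return 1.0 - intersection / union


def random_pair_distances(masks, num_pairs, rng):
    """Sample pairs of distinct annotations and collect their distances."""
    distances = []
    for _ in range(num_pairs):
        i, j = rng.choice(len(masks), size=2, replace=False)
        distances.append(iou_distance(masks[i], masks[j]))
    return np.array(distances)


# Toy annotations of one image: nested squares of growing size.
toy_masks = []
for size in (2, 3, 4):
    m = np.zeros((8, 8), dtype=bool)
    m[:size, :size] = True
    toy_masks.append(m)

pair_distances = random_pair_distances(toy_masks, num_pairs=20,
                                       rng=np.random.default_rng(0))
```

A histogram of `pair_distances` computed on model samples could then be compared with the same histogram computed on the expert annotations, which is the kind of comparison the caption describes.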

Theorems & Definitions (7)

  • Definition 3.1: Multi-rater medical image segmentation
  • Remark 3.2: Crowdsourcing method
  • Remark 3.3: Generation method
  • Remark 3.4: Personalization method
  • Definition 3.5: Probabilistic modeling of multi-rater medical image segmentation
  • Definition 3.6: Diversity
  • Definition 3.7: Personalization