ProSona: Prompt-Guided Personalization for Multi-Expert Medical Image Segmentation

Aya Elgebaly, Nikolaos Delopoulos, Juliane Hörner-Rieber, Carolin Rippke, Sebastian Klüter, Luca Boldrini, Lorenzo Placidi, Riccardo Dal Bello, Nicolaus Andratschke, Michael Baumgartl, Claus Belka, Christopher Kurz, Guillaume Landry, Shadi Albarqouni

TL;DR

ProSona addresses inter-observer variability in medical image segmentation by learning a continuous latent space of annotator styles and enabling text-guided personalization. It combines a Probabilistic U-Net backbone, which captures multiple plausible segmentations, with a prompt-guided mechanism that maps natural-language prompts into the latent space and produces personalized predictions via a softmax-weighted combination of latent codes. A multi-level contrastive objective aligns textual and visual representations to disentangle and stabilize annotator styles. Across LIDC-IDRI and a multi-institutional prostate MRI dataset, ProSona reduces the Generalized Energy Distance (GED) by 17% and gains about 1 point in mean Dice over DPersona, while providing intuitive prompt-based control and smooth transitions between styles. This contributes to more flexible, interpretable, and clinically faithful segmentation that respects expert variability.
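
The softmax-weighted combination of latent codes described above can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration: the function name `personalize_latent`, the cosine-similarity scoring, the `temperature` parameter, and the toy embeddings are hypothetical and not taken from the ProSona implementation, where the prompt embedding would come from a text encoder rather than hand-written vectors.

```python
import numpy as np


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a 1-D array of scores."""
    z = np.asarray(scores, dtype=float) / temperature
    z = z - z.max()  # subtract the max to avoid overflow in exp
    e = np.exp(z)
    return e / e.sum()


def personalize_latent(prompt_emb, style_embs, latent_codes, temperature=0.1):
    """Mix per-annotator latent codes using prompt-to-style similarity.

    prompt_emb:   (d,)   embedding of the text prompt (hypothetical input)
    style_embs:   (k, d) one reference embedding per annotator style
    latent_codes: (k, m) one latent code per annotator style
    Returns the mixed latent code (m,) and the mixing weights (k,).
    """
    p = prompt_emb / np.linalg.norm(prompt_emb)
    s = style_embs / np.linalg.norm(style_embs, axis=1, keepdims=True)
    sims = s @ p                          # cosine similarity per style
    weights = softmax(sims, temperature)  # non-negative, sums to 1
    return weights @ latent_codes, weights


# Toy setup: three styles with orthogonal embeddings and 2-D latent codes.
style_embs = np.eye(3)
latent_codes = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

# A prompt that matches style 0 exactly puts almost all weight on it.
z_single, w_single = personalize_latent(style_embs[0], style_embs, latent_codes)

# A prompt halfway between styles 0 and 1 weights both equally.
z_mix, w_mix = personalize_latent(style_embs[0] + style_embs[1],
                                  style_embs, latent_codes)
```

Because the weights vary continuously with the prompt embedding, the mixed code moves smoothly between the per-annotator codes, which is one way to picture the smooth transitions between styles.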

Abstract

Automated medical image segmentation suffers from high inter-observer variability, particularly in tasks such as lung nodule delineation, where experts often disagree. Existing approaches either collapse this variability into a consensus mask or rely on separate model branches for each annotator. We introduce ProSona, a two-stage framework that learns a continuous latent space of annotation styles, enabling controllable personalization via natural-language prompts. A Probabilistic U-Net backbone captures diverse expert hypotheses, while a prompt-guided projection mechanism navigates this latent space to generate personalized segmentations. A multi-level contrastive objective aligns textual and visual representations, promoting disentangled and interpretable expert styles. Across the LIDC-IDRI lung nodule and multi-institutional prostate MRI datasets, ProSona reduces the Generalized Energy Distance by 17% and improves mean Dice by more than one point compared with DPersona. These results demonstrate that natural-language prompts can provide flexible, accurate, and interpretable control over personalized medical image segmentation. Our implementation is available online.
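
For readers unfamiliar with the Generalized Energy Distance, the sketch below computes it for binary masks, assuming the definition commonly used for multi-annotator segmentation: GED² = 2·E[d(S, Y)] − E[d(S, S′)] − E[d(Y, Y′)], with d = 1 − IoU, where S are model samples and Y are expert annotations. The function names and the convention that two empty masks have distance 0 are assumptions of this sketch; the exact evaluation protocol used for ProSona is not specified here.

```python
import numpy as np


def iou_distance(a, b):
    """1 - IoU between two binary masks; two empty masks count as identical."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return 1.0 - np.logical_and(a, b).sum() / union


def mean_pairwise(xs, ys):
    """Average distance over all pairs (x, y), including identical pairs."""
    return float(np.mean([iou_distance(x, y) for x in xs for y in ys]))


def generalized_energy_distance(samples, annotations):
    """Squared GED between predicted samples and expert annotations."""
    cross = mean_pairwise(samples, annotations)
    return (2.0 * cross
            - mean_pairwise(samples, samples)
            - mean_pairwise(annotations, annotations))


# Toy 2x2 masks: two experts who disagree on one pixel.
expert_a = np.array([[1, 1], [0, 0]])
expert_b = np.array([[1, 0], [0, 0]])
disjoint = np.array([[0, 0], [0, 1]])

ged_match = generalized_energy_distance([expert_a, expert_b],
                                        [expert_a, expert_b])
ged_wrong = generalized_energy_distance([disjoint], [expert_a])
```

Lower is better: the value is zero when the set of predictions reproduces the set of annotations, and it grows when predictions either miss the experts' masks or fail to cover their diversity.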

Paper Structure

This paper contains 13 sections, 5 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Information loss in consensus fusion for lung nodule. Left: expert delineations showing variability in boundary interpretation. Right: majority-vote mask (green) and regions omitted by consensus (red), corresponding to subtle infiltrative extensions potentially relevant for treatment planning.
  • Figure 2: Overview of our ProSona: (a) Stage 2: Personalization - enabling text-guided segmentation through the prior bank and prompt-based latent space navigation; (c) Prompt processing pipeline showing CLIP encoding and similarity-based latent selection; (d) Multi-level contrastive learning with text-pairwise and similarity-pairwise matrices for better disentanglement of annotation styles.
  • Figure 3: Qualitative examples of ProSona on LIDC-IDRI (top) and prostate MRI (bottom). Top: input CT slice with expert annotations (A1, A2, ...) and their union/intersection regions. Bottom: predictions (S1, S2, ...) guided by prompts describing annotator styles.
  • Figure 4: Smooth interpolation between annotator styles.
  • Figure 5: Ablation study on the contrastive hyperparameters $\alpha$ and $\beta$.