ProSona: Prompt-Guided Personalization for Multi-Expert Medical Image Segmentation
Aya Elgebaly, Nikolaos Delopoulos, Juliane Hörner-Rieber, Carolin Rippke, Sebastian Klüter, Luca Boldrini, Lorenzo Placidi, Riccardo Dal Bello, Nicolaus Andratschke, Michael Baumgartl, Claus Belka, Christopher Kurz, Guillaume Landry, Shadi Albarqouni
TL;DR
ProSona addresses inter-observer variability in medical image segmentation by learning a continuous latent space of annotator styles and enabling text-guided personalization. It combines a Probabilistic U-Net backbone, which captures multiple plausible segmentations, with a prompt-guided mechanism that maps natural-language prompts into the latent space and produces personalized predictions via a softmax-weighted combination of latent codes. A multi-level contrastive objective aligns textual and visual representations to disentangle and stabilize annotator styles. Across the LIDC-IDRI lung nodule dataset and a multi-institutional prostate MRI dataset, ProSona reduces the Generalized Energy Distance (GED) by 17% and improves mean Dice by more than one point over DPersona, while providing intuitive prompt-based control and smooth transitions between styles. The result is more flexible, interpretable, and clinically faithful segmentation that respects expert variability.
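The softmax-weighted combination of latent codes mentioned above can be illustrated with a minimal sketch. The summary does not specify how prompts are scored against annotator styles, so the dot-product similarity, the temperature parameter, and the names `style_keys`, `style_codes`, and `personalized_latent` are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a 1-D array of scores."""
    z = np.asarray(scores, dtype=float) / temperature
    z = z - z.max()  # subtract the max to avoid overflow in exp
    e = np.exp(z)
    return e / e.sum()


def personalized_latent(prompt_embedding, style_keys, style_codes, temperature=1.0):
    """Blend K annotator latent codes according to a prompt embedding.

    prompt_embedding: (D,) embedding of the text prompt (hypothetical).
    style_keys:       (K, D) one text-space key per annotator style (hypothetical).
    style_codes:      (K, L) one latent code per annotator style.
    Returns the blended latent code (L,) and the mixing weights (K,).
    """
    # Assumption: similarity is a plain dot product between prompt and keys.
    scores = style_keys @ prompt_embedding
    weights = softmax(scores, temperature)
    # Convex combination of the latent codes.
    return weights @ style_codes, weights


if __name__ == "__main__":
    keys = np.eye(3)                                   # three toy annotator keys
    codes = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    z, w = personalized_latent(keys[0], keys, codes, temperature=0.01)
    print("weights:", w)
    print("latent:", z)
```

Because the weights are non-negative and sum to one, the blended code always lies in the convex hull of the annotator codes, which is one way to obtain smooth transitions between styles as the prompt changes.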
Abstract
Automated medical image segmentation suffers from high inter-observer variability, particularly in tasks such as lung nodule delineation, where experts often disagree. Existing approaches either collapse this variability into a consensus mask or rely on separate model branches for each annotator. We introduce ProSona, a two-stage framework that learns a continuous latent space of annotation styles, enabling controllable personalization via natural language prompts. A Probabilistic U-Net backbone captures diverse expert hypotheses, while a prompt-guided projection mechanism navigates this latent space to generate personalized segmentations. A multi-level contrastive objective aligns textual and visual representations, promoting disentangled and interpretable expert styles. Across the LIDC-IDRI lung nodule and multi-institutional prostate MRI datasets, ProSona reduces the Generalized Energy Distance by 17% and improves mean Dice by more than one point compared with DPersona. These results demonstrate that natural-language prompts can provide flexible, accurate, and interpretable control over personalized medical image segmentation. Our implementation is available online.
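For readers unfamiliar with the Generalized Energy Distance, the sketch below computes the squared GED as it is commonly defined for multi-annotator segmentation, with distance d = 1 - IoU between binary masks: GED^2 = 2 E[d(S, Y)] - E[d(S, S')] - E[d(Y, Y')], where S, S' are model samples and Y, Y' are expert annotations. Whether ProSona uses exactly this variant (for example, the same handling of empty masks) is an assumption; the function names are illustrative.

```python
import numpy as np


def iou_distance(a, b):
    """Return 1 - IoU for two binary masks; two empty masks have distance 0."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0  # convention assumed here: empty vs. empty is a perfect match
    return 1.0 - np.logical_and(a, b).sum() / union


def ged_squared(samples, annotations):
    """Squared Generalized Energy Distance between two sets of binary masks."""
    cross = np.mean([iou_distance(s, y) for s in samples for y in annotations])
    within_samples = np.mean([iou_distance(s, t) for s in samples for t in samples])
    within_annots = np.mean([iou_distance(y, v) for y in annotations for v in annotations])
    return 2.0 * cross - within_samples - within_annots


if __name__ == "__main__":
    top = np.array([[1, 1], [0, 0]])
    bottom = np.array([[0, 0], [1, 1]])
    # Identical sets of masks give a distance of zero.
    print(ged_squared([top, bottom], [top, bottom]))
    # A single disjoint prediction against a single annotation gives the maximum.
    print(ged_squared([top], [bottom]))
```

Lower values indicate that the distribution of predicted masks matches the distribution of expert annotations more closely, both in accuracy (the cross term) and in diversity (the two within-set terms).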
