PULASki: Learning inter-rater variability using statistical distances to improve probabilistic segmentation

Soumick Chatterjee; Franziska Gaidzik; Alessandro Sciarra; Hendrik Mattern; Gábor Janiga; Oliver Speck; Andreas Nürnberger; Sahani Pathiraja

TL;DR

PULASki addresses inter-rater variability and data scarcity in medical image segmentation by reframing the Probabilistic U-Net loss with distribution-based distances. By replacing the reconstruction term with $D(p(y|x), p_\theta(y|x,z))$ using metrics such as the Hausdorff divergence, FID, or de-biased Sinkhorn, the method better captures plausible segmentations, especially under severe class imbalance. Evaluations on intracranial vessel and MS lesion tasks in 2D and 3D demonstrate improved representation of uncertainty and higher anatomical plausibility, with the Hausdorff-based loss often delivering the strongest gains and 3D patches providing the most coherent segmentations. The work highlights the practical viability of distribution-distance losses for uncertainty-aware segmentation and provides insights on 2D versus 3D training, convergence behaviour, and computational trade-offs for clinical applications.
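Two of the candidate distances for $D$ named above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the de-biased Sinkhorn divergence $S_\varepsilon(\alpha,\beta)=\mathrm{OT}_\varepsilon(\alpha,\beta)-\tfrac12\mathrm{OT}_\varepsilon(\alpha,\alpha)-\tfrac12\mathrm{OT}_\varepsilon(\beta,\beta)$ is computed between uniformly weighted point clouds with cost $\tfrac12\|x-y\|^2$, and the Fréchet distance compares Gaussians fitted to feature vectors. How segmentations are turned into point clouds or features, the function names, and the parameters `eps` and `n_iter` are hypothetical choices made for this sketch.

```python
import numpy as np


def _logsumexp(a, axis):
    # numerically stable log(sum(exp(a))) along one axis
    m = a.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.exp(a - m).sum(axis=axis))


def entropic_ot(x, y, eps=0.5, n_iter=500):
    """Entropy-regularised OT cost between uniform point clouds x (n, d) and y (m, d).

    Log-domain Sinkhorn iterations with cost C = |x - y|^2 / 2 and a KL
    regulariser relative to the product of the two uniform weight vectors.
    (Hypothetical helper for illustration.)
    """
    n, m = len(x), len(y)
    log_a, log_b = np.full(n, -np.log(n)), np.full(m, -np.log(m))
    C = 0.5 * ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    f, g = np.zeros(n), np.zeros(m)
    for _ in range(n_iter):
        f = -eps * _logsumexp(log_b[None, :] + (g[None, :] - C) / eps, axis=1)
        g = -eps * _logsumexp(log_a[:, None] + (f[:, None] - C) / eps, axis=0)
    # after the g-update the plan has unit mass, so the dual value reduces to <a, f> + <b, g>
    return float(np.exp(log_a) @ f + np.exp(log_b) @ g)


def sinkhorn_divergence(x, y, eps=0.5, n_iter=500):
    """De-biased Sinkhorn divergence: subtracting the self-transport terms gives S(x, x) = 0."""
    return (entropic_ot(x, y, eps, n_iter)
            - 0.5 * entropic_ot(x, x, eps, n_iter)
            - 0.5 * entropic_ot(y, y, eps, n_iter))


def frechet_distance(feat_a, feat_b):
    """Fréchet distance between Gaussians fitted to feature sets of shape (n, d), d >= 2.

    The feature extractor (e.g. a pretrained network for FID) is assumed and not shown.
    """
    mu_a, mu_b = feat_a.mean(axis=0), feat_b.mean(axis=0)
    cov_a, cov_b = np.cov(feat_a, rowvar=False), np.cov(feat_b, rowvar=False)
    # Tr((cov_a cov_b)^{1/2}) equals the sum of square roots of the eigenvalues
    # of the symmetric matrix cov_a^{1/2} cov_b cov_a^{1/2}
    w, v = np.linalg.eigh(cov_a)
    sqrt_a = (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T
    tr_sqrt = np.sqrt(np.clip(np.linalg.eigvalsh(sqrt_a @ cov_b @ sqrt_a), 0.0, None)).sum()
    return float(((mu_a - mu_b) ** 2).sum() + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)
```

Subtracting the two self-transport terms removes the entropic bias, so $S_\varepsilon(\alpha,\alpha)=0$; with the quadratic cost, a point cloud and a copy translated by $t$ give $S_\varepsilon=\tfrac12\|t\|^2$, while two Gaussians with equal covariance and means differing by $t$ have Fréchet distance $\|t\|^2$.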

Abstract

In the domain of medical imaging, many supervised-learning-based segmentation methods face several challenges, such as high variability in annotations from multiple experts, a paucity of labelled data and class-imbalanced datasets. These issues may result in segmentations that lack the precision required for clinical analysis and that can be misleadingly overconfident without associated uncertainty quantification. This work proposes the PULASki method as a computationally efficient generative tool for biomedical image segmentation that accurately captures variability in expert annotations, even in small datasets. The approach uses an improved loss function based on statistical distances within a conditional variational autoencoder structure (Probabilistic UNet), which improves learning of the conditional decoder compared to the standard cross-entropy, particularly in class-imbalanced problems. The proposed method was analysed on two structurally different segmentation tasks (intracranial vessel and multiple sclerosis (MS) lesion) and compared with four well-established baselines in terms of quantitative metrics and qualitative output. These experiments involve class-imbalanced datasets characterised by challenging features, including suboptimal signal-to-noise ratios and high ambiguity. Empirical results demonstrate that the PULASki method outperforms all baselines at the 5% significance level. Our experiments are also among the first to present a comparative study of the computationally feasible segmentation of complex geometries using 3D patches versus the traditional use of 2D slices. The segmentations generated from 3D patches are shown to be much more anatomically plausible than those from 2D slices, particularly for the vessel task.
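The change to the loss can be written against the standard Probabilistic U-Net objective. The sketch below assumes the usual notation of a posterior encoder $q_\phi(z\mid x,y)$, a prior encoder $p_\psi(z\mid x)$ and a KL weight $\beta$; these symbols are assumptions for illustration, and the paper's exact formulation may differ. Assuming the KL regulariser is left unchanged, PULASki swaps the cross-entropy reconstruction term for a statistical distance $D$:

```latex
\begin{aligned}
\mathcal{L}_{\text{ProbU-Net}} &= \mathbb{E}_{z\sim q_\phi(z\mid x,y)}\!\left[-\log p_\theta(y\mid x,z)\right]
  + \beta\,\mathrm{KL}\!\left(q_\phi(z\mid x,y)\,\|\,p_\psi(z\mid x)\right)\\
\mathcal{L}_{\text{PULASki}} &= D\!\left(p(y\mid x),\,p_\theta(y\mid x,z)\right)
  + \beta\,\mathrm{KL}\!\left(q_\phi(z\mid x,y)\,\|\,p_\psi(z\mid x)\right)
\end{aligned}
```

Because $D$ compares whole distributions of segmentations rather than scoring each voxel independently, a few foreground voxels are not swamped by the background, which is the motivation given for class-imbalanced tasks.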

Paper Structure (29 sections, 42 equations, 12 figures, 6 tables, 1 algorithm)

Introduction
Contributions
Background
Notation
U-Net
Probabilistic U-Net
Diversified and Personalised Multi-rater Segmentation (D-Persona)
Stage I: Diversified Segmentation
Stage II: Personalised Segmentation
Collectively Intelligent Medical Diffusion (CIMD)
Multi-head Variational Inference U-Net (MH VI U-Net)
Stochastic Segmentation Networks (SSN)
Monte Carlo Dropout
PULASki - Probabilistic Unet Loss Assessed through Statistical distances
Fréchet Inception Distance (FID)
...and 14 more sections

Figures (12)

Figure 1: Schematic of baseline methods: (A): Monte Carlo Dropout (MC-DO); (B): Probabilistic U-Net (ProbU-Net); (C): Stochastic Segmentation Network (SSN); (D): Multi-head Variational Inference U-Net (MH VI U-Net)
Figure 2: Schematic of the proposed PULASki method.
Figure 3: Rate of occurrence (RoO) of labels across multiple annotations. The brighter (yellow) the voxel appears, the more often it is labelled as vessel (A) or lesion (B) across the available annotations. (A): Ten segmentations per image from one subject in OpenNeuro’s ‘StudyForrest’ dataset were generated using the Frangi filter. A vessel cross-section at the location of the red line is displayed within the red square. (B): Multiple sclerosis lesion segmentations from 7 expert annotators for one subject.
Figure 4: Quantitative assessment of the distribution of generated segmentations per image compared to the available data. Boxplots show the variation in Generalised Energy Distance (GED) scores per image in the test set for all baselines and for PULASki with different statistical distances. Results for the vessel and MS lesion segmentation tasks are shown in the top and bottom rows, respectively.
Figure 5: Rate of occurrence (RoO) of labels across generated segmentations for a given subject. The brighter (yellow) the voxel appears, the more often it is labelled as vessel (A,C) or lesion (B,D). The annotated data are displayed in the dashed rectangle for vessel segmentation (A) and MS lesion segmentation (B). (A) RoO for the 2D vessel segmentation for the PULASki method with different loss functions (FID, Sinkhorn and Hausdorff divergence), shown in the top row, and for the baselines (Probabilistic U-Net, MCDO, VIMH, SSN), shown in the bottom row. All methods were trained on 10 plausible labels per image. A detailed view of a specific region, indicated by the red rectangle, is provided below the larger volume. (B) RoO for the 2D multiple sclerosis segmentation for the PULASki method with different loss functions (Sinkhorn and Hausdorff) and selected baselines. All methods were trained on 7 plausible labels per image. (C) and (D) RoO for the 3D implementation in vessel segmentation and MS lesion segmentation, respectively.
...and 7 more figures
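The rate-of-occurrence (RoO) maps in Figures 3 and 5 and the Generalised Energy Distance (GED) in Figure 4 can both be computed from stacks of binary masks. A minimal NumPy sketch, assuming RoO is the per-voxel fraction of masks that label a voxel as foreground and GED uses the common distance $d(a,b)=1-\mathrm{IoU}(a,b)$, with two empty masks treated as identical; the function names and the empty-mask convention are assumptions, not taken from the paper.

```python
import numpy as np


def rate_of_occurrence(masks):
    """Per-voxel fraction of masks labelling the voxel as foreground.

    masks: array of shape (n_masks, *spatial) with binary values {0, 1}.
    """
    return np.asarray(masks, dtype=float).mean(axis=0)


def iou_distance(a, b):
    """d(a, b) = 1 - IoU(a, b); two empty masks are treated as identical (d = 0)."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return 1.0 - np.logical_and(a, b).sum() / union


def mean_pairwise(xs, ys):
    # average distance over all ordered pairs, including each mask with itself
    return float(np.mean([iou_distance(x, y) for x in xs for y in ys]))


def generalised_energy_distance(samples, annotations):
    """Squared GED: 2 E[d(S, Y)] - E[d(S, S')] - E[d(Y, Y')]."""
    return (2 * mean_pairwise(samples, annotations)
            - mean_pairwise(samples, samples)
            - mean_pairwise(annotations, annotations))
```

Lower GED means the set of generated segmentations $S$ better matches the set of annotations $Y$, both in individual overlap and in diversity; a model that reproduces the annotation set exactly scores 0.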