PULASki: Learning inter-rater variability using statistical distances to improve probabilistic segmentation
Soumick Chatterjee, Franziska Gaidzik, Alessandro Sciarra, Hendrik Mattern, Gábor Janiga, Oliver Speck, Andreas Nürnberger, Sahani Pathiraja
TL;DR
PULASki addresses inter-rater variability and data scarcity in medical image segmentation by reformulating the Probabilistic UNet loss with distribution-based distances. It replaces the reconstruction term with $D(p(y|x), p_\theta(y|x,z))$, where $D$ is a statistical distance such as the Hausdorff divergence, the Fréchet Inception Distance (FID) or the de-biased Sinkhorn divergence. With this change the method better captures the range of plausible segmentations, especially under severe class imbalance. Evaluations on intracranial vessel and MS lesion segmentation in 2D and 3D show improved representation of uncertainty and higher anatomical plausibility. The Hausdorff-based loss often delivers the strongest gains, and 3D patches produce the most coherent segmentations. The work highlights the practical viability of distribution-distance losses for uncertainty-aware segmentation and offers insights into 2D versus 3D training, convergence behavior, and computational trade-offs for clinical applications.
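The de-biased Sinkhorn divergence named above can be illustrated with a minimal NumPy sketch. It assumes two uniformly weighted point clouds and the ground cost $C(x,y)=\tfrac{1}{2}\|x-y\|^2$. It follows the common formulation $S_\varepsilon(\alpha,\beta)=\mathrm{OT}_\varepsilon(\alpha,\beta)-\tfrac12\mathrm{OT}_\varepsilon(\alpha,\alpha)-\tfrac12\mathrm{OT}_\varepsilon(\beta,\beta)$, in which the two self-transport terms remove the entropic bias so that $S_\varepsilon(\alpha,\alpha)=0$. The function names (`entropic_ot`, `sinkhorn_divergence`), the cost, and the parameters `eps` and `n_iter` are illustrative assumptions. This is not the authors' implementation, and how PULASki turns segmentation samples into point clouds is not specified here.

```python
# Illustrative sketch only (not the PULASki implementation): de-biased Sinkhorn
# divergence between two uniformly weighted point clouds, computed in NumPy.
import numpy as np


def _logsumexp(m: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log(sum(exp(m))) along `axis`."""
    m_max = np.max(m, axis=axis, keepdims=True)
    out = m_max + np.log(np.sum(np.exp(m - m_max), axis=axis, keepdims=True))
    return np.squeeze(out, axis=axis)


def entropic_ot(x: np.ndarray, y: np.ndarray, eps: float = 0.1, n_iter: int = 500) -> float:
    """Entropy-regularised OT cost between two uniformly weighted point clouds.

    x has shape (n, d), y has shape (m, d). The ground cost is half the squared
    Euclidean distance (an assumption of this sketch). Returns the dual value
    <a, f> + <b, g> reached by log-domain Sinkhorn iterations.
    """
    n, m = x.shape[0], y.shape[0]
    log_a = np.full(n, -np.log(n))  # log of uniform weights on x
    log_b = np.full(m, -np.log(m))  # log of uniform weights on y
    cost = 0.5 * np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    f = np.zeros(n)
    g = np.zeros(m)
    for _ in range(n_iter):
        # f_i = -eps * log sum_j b_j * exp((g_j - C_ij) / eps)
        f = -eps * _logsumexp(log_b[None, :] + (g[None, :] - cost) / eps, axis=1)
        # g_j = -eps * log sum_i a_i * exp((f_i - C_ij) / eps)
        g = -eps * _logsumexp(log_a[:, None] + (f[:, None] - cost) / eps, axis=0)
    return float(np.exp(log_a) @ f + np.exp(log_b) @ g)


def sinkhorn_divergence(x: np.ndarray, y: np.ndarray, eps: float = 0.1, n_iter: int = 500) -> float:
    """De-biased Sinkhorn divergence: OT(x, y) - OT(x, x)/2 - OT(y, y)/2."""
    return (
        entropic_ot(x, y, eps, n_iter)
        - 0.5 * entropic_ot(x, x, eps, n_iter)
        - 0.5 * entropic_ot(y, y, eps, n_iter)
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(40, 2))   # synthetic stand-in for one set of samples
    y = x + np.array([3.0, 0.0])   # the same cloud shifted by t = (3, 0)
    print(sinkhorn_divergence(x, x))  # self-divergence is zero by construction
    print(sinkhorn_divergence(x, y))  # should approach 0.5 * |t|^2 = 4.5 as Sinkhorn converges
```

Because the cost is half the squared Euclidean distance, translating a cloud by a vector $t$ gives a divergence of exactly $\tfrac12\|t\|^2$ at convergence, for any $\varepsilon$. This provides a simple sanity check for the sketch. The self-transport terms are what make the divergence vanish for identical inputs. The raw entropic cost $\mathrm{OT}_\varepsilon$ would not vanish in that case.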
Abstract
In medical imaging, many supervised-learning-based segmentation methods face several challenges. These include high variability in annotations from multiple experts, a paucity of labelled data, and class-imbalanced datasets. These issues may result in segmentations that lack the precision required for clinical analysis and that can be misleadingly overconfident when no associated uncertainty quantification is provided. This work proposes the PULASki method as a computationally efficient generative tool for biomedical image segmentation that accurately captures variability in expert annotations, even in small datasets. The approach uses an improved loss function based on statistical distances within a conditional variational autoencoder structure (Probabilistic UNet). Compared with the standard cross-entropy, this loss improves learning of the conditional decoder, particularly in class-imbalanced problems. We analysed the proposed method on two structurally different segmentation tasks (intracranial vessel and multiple sclerosis (MS) lesion) and compared our results with four well-established baselines in terms of quantitative metrics and qualitative output. These experiments involve class-imbalanced datasets characterised by challenging features, including suboptimal signal-to-noise ratios and high ambiguity. Empirical results demonstrate that the PULASki method outperforms all baselines at the 5% significance level. Our experiments are also among the first to present a comparative study of computationally feasible segmentation of complex geometries using 3D patches versus the traditional 2D slices. The segmentations generated from 3D patches are shown to be much more anatomically plausible than those from 2D slices, particularly for the vessel task.
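The change to the training objective can be written out as follows. This is a sketch under stated assumptions. The first line is the standard Probabilistic UNet objective, with a cross-entropy reconstruction term (the negative log-likelihood of a pixel-wise categorical decoder) and a $\beta$-weighted KL term between the posterior $q_\phi(z \mid x, y)$ and the prior $p_\psi(z \mid x)$. The second line assumes that PULASki keeps this KL term unchanged and swaps only the reconstruction term for a statistical distance $D$. The symbols $\phi$, $\psi$ and $\beta$ are notation introduced here, not taken from this section.

$$\mathcal{L}_{\text{PUNet}} = \mathbb{E}_{z \sim q_\phi(z \mid x, y)}\left[-\log p_\theta(y \mid x, z)\right] + \beta \, D_{\mathrm{KL}}\left(q_\phi(z \mid x, y) \,\|\, p_\psi(z \mid x)\right)$$

$$\mathcal{L}_{\text{PULASki}} = D\left(p(y \mid x),\, p_\theta(y \mid x, z)\right) + \beta \, D_{\mathrm{KL}}\left(q_\phi(z \mid x, y) \,\|\, p_\psi(z \mid x)\right)$$

Here $p(y \mid x)$ is read as the distribution of expert annotations for image $x$. Under that reading, a distance between distributions can reflect the spread across raters, whereas a per-pixel likelihood term cannot.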
