Estimation and Analysis of Slice Propagation Uncertainty in 3D Anatomy Segmentation
Rachaell Nihalaani, Tushar Kataria, Jadie Adams, Shireen Y. Elhabian
TL;DR
The paper addresses the challenge of 3D anatomy segmentation under limited annotations by employing self-supervised slice propagation and integrating calibrated epistemic uncertainty quantification (UQ) to assess reliability. It adapts and benchmarks five UQ methods across two slice-propagation models, Sli2Vol and Vol2Flow, on three abdominal datasets to evaluate both segmentation accuracy and uncertainty calibration. Key findings show that UQ can improve both accuracy and trustworthiness, with concrete dropout delivering strong segmentation and uncertainty estimates, while SWAG offers better calibration at some cost to accuracy, highlighting trade-offs between methods. The work provides open-source code and a comprehensive benchmark, underscoring the practical value of calibrated UQ for safe, annotation-efficient medical image segmentation and outlining avenues for future domain-aware UQ enhancements.
Abstract
Supervised methods for 3D anatomy segmentation demonstrate superior performance but are often limited by the availability of annotated data. This limitation, together with the abundance of available unannotated data, has led to growing interest in self-supervised approaches. Slice propagation has emerged as one such approach: it uses slice registration as a self-supervised task to achieve full anatomy segmentation with minimal supervision. This significantly reduces the domain expertise, time, and cost associated with building the fully annotated datasets required for training segmentation networks. However, this shift toward reduced supervision via deterministic networks raises concerns about the trustworthiness and reliability of predictions, especially when compared with more accurate supervised approaches. To address this concern, we propose the integration of calibrated uncertainty quantification (UQ) into slice propagation methods, providing insights into the model's predictive reliability and confidence levels. Incorporating uncertainty measures enhances user confidence in self-supervised approaches, thereby improving their practical applicability. We conducted experiments on three datasets for 3D abdominal segmentation using five UQ methods. The results illustrate that incorporating UQ improves not only model trustworthiness but also segmentation accuracy. Furthermore, our analysis reveals various failure modes of slice propagation methods that might not be immediately apparent to end users. This study opens up new research avenues to improve the accuracy and trustworthiness of slice propagation methods.
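To make the idea of attaching an uncertainty estimate to a propagated slice concrete, the following is a minimal sketch, not the paper's implementation. It assumes that a sampling-based UQ method has already produced several stochastic foreground-probability maps for one slice, and it summarizes them as a mean prediction plus a per-pixel variance. The function name `summarize_samples`, the `threshold` parameter, and the toy values are all hypothetical.

```python
import numpy as np

def summarize_samples(prob_samples, threshold=0.5):
    """Summarize T stochastic predictions for a single slice.

    prob_samples: array of shape (T, H, W) holding foreground probabilities
    from T stochastic forward passes (hypothetical input, assumed here).
    Returns the thresholded mask, the mean probability map, and the
    per-pixel variance across samples as a simple uncertainty proxy.
    """
    prob_samples = np.asarray(prob_samples, dtype=float)
    mean_prob = prob_samples.mean(axis=0)      # averaged prediction
    mask = mean_prob >= threshold              # binary segmentation
    uncertainty = prob_samples.var(axis=0)     # disagreement between samples
    return mask, mean_prob, uncertainty

# Toy 2x2 slice with T = 2 samples: the samples agree on the top row
# and disagree on the bottom-left pixel.
samples = np.array([
    [[0.9, 0.1], [0.8, 0.5]],
    [[0.9, 0.1], [0.2, 0.5]],
])
mask, mean_prob, uncertainty = summarize_samples(samples)
```

In this toy case the variance is zero wherever the samples agree and largest at the pixel where they disagree, which is the kind of signal that could flag an unreliable region to an end user. Whether such a signal is well calibrated is a separate question that this sketch does not address.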
