Assessing Uncertainty Estimation Methods for 3D Image Segmentation under Distribution Shifts
Masoumeh Javanbakhat, Md Tasnimul Hasan, Christoph Lippert
TL;DR
The paper addresses reliable uncertainty estimation for 3D medical image segmentation under distribution shifts. It compares a method capable of capturing a multimodal posterior, cyclical Stochastic Gradient Hamiltonian Monte Carlo (cSGHMC), with non-Bayesian and single-mode approaches, Monte Carlo Dropout (MCD) and Deep Ensemble (DE), across covariate, modality, and corruption shifts on three 3D datasets. Calibration and predictive uncertainty are evaluated with negative log-likelihood (NLL), Brier score (BS), and expected calibration error (ECE), as well as entropy-based measures. Results indicate that methods capturing multiple posterior modes yield more trustworthy uncertainty estimates: cSGHMC often provides better calibration and higher, more informative uncertainty under shift, while Deep Ensemble shows limited diversity and inconsistent reliability. The study offers practical guidance for deploying medical AI with uncertainty thresholds, so that dangerous decisions can be avoided when a model faces distribution shifts, and it underscores the importance of posterior diversity for robust uncertainty quantification.
Abstract
In recent years, machine learning has witnessed extensive adoption across various sectors, yet its application in medical image-based disease detection and diagnosis remains challenging due to distribution shifts in real-world data. In practical settings, and especially in the health domain, deployed models encounter samples that differ significantly from the training dataset, which can degrade performance. This limitation hinders the expressiveness and reliability of deep learning models in health applications. It is therefore crucial to identify methods that produce reliable uncertainty estimates under distribution shifts in the health sector. In this paper, we explore the feasibility of using cutting-edge Bayesian and non-Bayesian methods to detect distributionally shifted samples, aiming to achieve reliable and trustworthy diagnostic predictions in segmentation tasks. Specifically, we compare three distinct uncertainty estimation methods, each designed to capture either unimodal or multimodal aspects of the posterior distribution. Our findings demonstrate that methods capable of addressing multimodal characteristics of the posterior distribution offer more dependable uncertainty estimates. This research contributes to enhancing the utility of deep learning in healthcare, making diagnostic predictions more robust and trustworthy.
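To make the evaluation measures named in the TL;DR concrete, the following is a minimal NumPy sketch of how NLL, Brier score, ECE, and predictive entropy could be computed from several stochastic predictions (posterior samples, dropout passes, or ensemble members). The function name `predictive_metrics`, the array layout `(S, N, C)`, the equal-width binning with `n_bins=10`, and the summed-over-classes form of the Brier score are all assumptions made for illustration; the paper's exact definitions may differ.

```python
import numpy as np

def predictive_metrics(probs, labels, n_bins=10):
    """Hypothetical helper (not from the paper).

    probs:  (S, N, C) class probabilities from S stochastic predictions
            for N voxels and C classes.
    labels: (N,) integer class ids.
    """
    eps = 1e-12                                  # avoids log(0)
    mean_p = probs.mean(axis=0)                  # predictive mean, shape (N, C)
    n, c = mean_p.shape

    # Negative log-likelihood of the true class under the predictive mean.
    nll = -np.mean(np.log(mean_p[np.arange(n), labels] + eps))

    # Brier score: squared error to the one-hot label, summed over classes.
    onehot = np.eye(c)[labels]
    brier = np.mean(np.sum((mean_p - onehot) ** 2, axis=1))

    # Expected calibration error with equal-width confidence bins.
    conf = mean_p.max(axis=1)
    correct = (mean_p.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - conf[mask].mean())
            ece += mask.mean() * gap             # weight by bin occupancy

    # Per-voxel predictive entropy (in nats) of the predictive mean.
    entropy = -np.sum(mean_p * np.log(mean_p + eps), axis=1)

    return {"nll": nll, "brier": brier, "ece": ece, "entropy": entropy}
```

In such a setup, the per-voxel `entropy` map is the quantity one would compare between in-distribution and shifted inputs, and it is also the natural candidate for the uncertainty thresholds mentioned in the TL;DR.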
