Bayesian Uncertainty Estimation by Hamiltonian Monte Carlo: Applications to Cardiac MRI Segmentation

Yidong Zhao, Joao Tourais, Iain Pierce, Christian Nitsche, Thomas A. Treibel, Sebastian Weingärtner, Artur M. Schweidtmann, Qian Tao

TL;DR

This work tackles the challenge of unreliable uncertainty estimation in DL-based cardiac MRI segmentation by introducing HMC-CP, a scalable Bayesian framework that uses Hamiltonian Monte Carlo with cold posterior tempering and cyclical annealing to sample diverse posterior solutions. By aggregating voxel-wise uncertainties across posterior samples, the method produces calibrated voxel-level estimates and an image-level failure score, achieving improved calibration and segmentation accuracy on in-domain cine data and robust performance under substantial domain shifts to quantitative MRI. The study demonstrates that diversity in the functional space, captured via multi-modal HMC samples, correlates with better uncertainty estimates and that image-level failure detection achieves high AUC (up to about 91%) across datasets. Overall, HMC-CP provides a principled, efficient path toward trustworthy DL in clinical cardiac imaging, addressing silent failures and enabling practical failure detection through an aggregated confidence score.

Abstract

Deep learning (DL)-based methods have achieved state-of-the-art performance for many medical image segmentation tasks. Nevertheless, recent studies show that deep neural networks (DNNs) can be miscalibrated and overconfident, leading to "silent failures" that are risky for clinical applications. Bayesian DL provides an intuitive approach to DL failure detection, based on posterior probability estimation. However, the posterior is intractable for large medical image segmentation DNNs. To tackle this challenge, we propose a Bayesian learning framework using Hamiltonian Monte Carlo (HMC), tempered by cold posterior (CP) to accommodate medical data augmentation, named HMC-CP. For HMC computation, we further propose a cyclical annealing strategy, capturing both local and global geometries of the posterior distribution, enabling highly efficient Bayesian DNN training with the same computational budget as training a single DNN. The resulting Bayesian DNN outputs an ensemble segmentation along with the segmentation uncertainty. We evaluate the proposed HMC-CP extensively on cardiac magnetic resonance image (MRI) segmentation, using in-domain steady-state free precession (SSFP) cine images as well as out-of-domain datasets of quantitative T1 and T2 mapping. Our results show that the proposed method improves both segmentation accuracy and uncertainty estimation for in- and out-of-domain data, compared with well-established baseline methods such as Monte Carlo Dropout and Deep Ensembles. Additionally, we establish a conceptual link between HMC and the commonly known stochastic gradient descent (SGD) and provide general insight into the uncertainty of DL. This uncertainty is implicitly encoded in the training dynamics but often overlooked. With reliable uncertainty estimation, our method provides a promising direction toward trustworthy DL in clinical applications.
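
The sampling idea described in the abstract (a momentum-driven chain, tempered noise, and a step size that restarts in cycles) can be illustrated with a minimal sketch. This is not the authors' implementation: the update rule below is a generic stochastic-gradient HMC step, the cosine restart schedule is one common choice for cyclical annealing, and every name and default (`cyclical_step_size`, `sghmc_step`, `run_chain`, `friction`, `temperature`, `save_every`, `samples_per_cycle`) is a hypothetical introduced for illustration. Setting `temperature` below 1 corresponds to a cold posterior.

```python
import math
import numpy as np


def cyclical_step_size(k, total_steps, n_cycles, eta0):
    """Cosine step size that restarts at eta0 at the start of every cycle (assumed schedule)."""
    cycle_len = math.ceil(total_steps / n_cycles)
    pos = (k % cycle_len) / cycle_len  # position inside the current cycle, in [0, 1)
    return 0.5 * eta0 * (math.cos(math.pi * pos) + 1.0)


def sghmc_step(theta, v, grad_u, eta, friction, temperature, rng):
    """One generic SGHMC-style update; temperature < 1 shrinks the injected noise."""
    noise_std = math.sqrt(2.0 * friction * eta * temperature)
    noise = noise_std * rng.standard_normal(theta.shape)
    v = (1.0 - friction) * v - eta * grad_u(theta) + noise  # momentum accumulates gradients
    return theta + v, v


def run_chain(grad_u, theta0, total_steps=3000, n_cycles=3, eta0=1e-2,
              friction=0.1, temperature=0.1, save_every=50,
              samples_per_cycle=4, seed=0):
    """Simulate the chain and checkpoint weights near the end of each cycle."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    v = np.zeros_like(theta)
    cycle_len = math.ceil(total_steps / n_cycles)
    first_saved = cycle_len - samples_per_cycle * save_every
    samples = []
    for k in range(total_steps):
        eta = cyclical_step_size(k, total_steps, n_cycles, eta0)
        theta, v = sghmc_step(theta, v, grad_u, eta, friction, temperature, rng)
        pos = k % cycle_len
        # Large early steps explore; only the small-step tail of a cycle is sampled.
        if pos >= first_saved and pos % save_every == 0:
            samples.append((k // cycle_len, theta.copy()))
    return samples
```

As a toy usage, `run_chain(lambda th: th, theta0=np.ones(5))` samples from a quadratic energy and returns `(cycle_index, weights)` pairs. In a real setting `grad_u` would be a minibatch gradient of the negative log posterior, and the saved checkpoints would form the ensemble.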

Paper Structure (27 sections, 21 equations, 15 figures, 3 tables)

Figures (15)

  • Figure 1: With a limited amount of training data, the network admits infinitely many weight solutions that can explain the training set. The posterior of the weights models the probability density over this solution space, which is characterized by multiple local optima. The HMC chain (black line) is guided by the momentum (red arrow), which accumulates the gradient (purple arrow) to approach the local optima. The noise (green arrow) encourages exploration of the low-loss surface. The annealing strategy allows the chain to visit multiple local optima. Sampling in weight space closely resembles training the network using SGD with momentum. In practice, checkpoints saved during the chain simulation serve as posterior samples and form ensembles for function-space marginalization.
  • Figure 2: Uncertainty maps indicate possible over- and under-segmentation. We estimate the true foreground (TF), false foreground (FF), and false background (FB) using the estimated uncertainty and aggregate them into the final image-level score.
  • Figure 3: Representative images of ACDC, M&M and QMRI datasets. ACDC and M&M are SSFP cine images and the contrast variation is relatively minor. QMRI baseline images have a larger contrast change compared to the training set (ACDC).
  • Figure 4: Loss landscape and chain trajectory during training: (a) The loss landscape around a MAP solution. (b) Applying cyclical training promotes the diversity of solutions. The triangular marks indicate the three modes of solutions on the loss surface in three training cycles. (c) The t-SNE map of the collected weight samples illustrates three clusters of local weight samples. (d) Cosine similarity of weight samples collected in three cycles suggests that weights drawn from a single cycle (mode) of the chain correlate with each other, while weight modes from different cycles are diverse.
  • Figure 5: Confusion matrices that show the diversity of functions of PHi-Seg (a), MC-Dropout (b), Deep Ensembles (c), and our proposed SGHMC variants, SGHMC-Single (d) and SGHMC-Multi (e). The ensemble of all function instances is denoted as E, at the lower-right corner of the matrices. Each entry in the confusion matrix represents the mutual distance in the function space of two functions, defined in Sec. \ref{sec:func_div}. (f) sums up the functional distance values from (a) to (e) and illustrates the mean of rows in the confusion matrices.
  • ...and 10 more figures
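
The cosine-similarity comparison of weight samples mentioned in the Figure 4(d) caption can be sketched as follows. This is an illustrative assumption rather than the paper's procedure: each weight sample is simply flattened into one vector, and the function name `cosine_similarity_matrix` is hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(weight_samples):
    """Pairwise cosine similarity between weight samples, each flattened to a vector."""
    w = np.stack([np.ravel(s) for s in weight_samples]).astype(float)
    norms = np.linalg.norm(w, axis=1, keepdims=True)
    w = w / np.clip(norms, 1e-12, None)  # guard against division by zero
    return w @ w.T
```

If the samples are ordered by cycle, a block-diagonal pattern in the returned matrix would indicate high similarity within a cycle and lower similarity across cycles, which is the behavior the caption describes.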