Sparse Bayesian Networks: Efficient Uncertainty Quantification in Medical Image Analysis

Zeinab Abboud, Herve Lombaert, Samuel Kadoury

TL;DR

This study introduces a training procedure for a sparse (partial) Bayesian network that makes only a subset of parameters Bayesian, selected by their deterministic saliency as measured through gradient sensitivity analysis. The approach achieves competitive performance and predictive uncertainty estimation while reducing the number of Bayesian parameters by over 95%.

Abstract

Efficiently quantifying predictive uncertainty in medical images remains a challenge. While Bayesian neural networks (BNNs) offer predictive uncertainty, they require substantial computational resources to train. Although Bayesian approximations such as ensembles have shown promise, they still suffer from high training and inference costs. Existing approaches mainly address the costs of BNN inference post-training, with little focus on improving training efficiency and reducing parameter complexity. This study introduces a training procedure for a sparse (partial) Bayesian network. Our method selectively assigns a subset of parameters as Bayesian by assessing their deterministic saliency through gradient sensitivity analysis. The resulting network combines deterministic and Bayesian parameters, exploiting the advantages of both representations to achieve high task-specific performance and minimize predictive uncertainty. Demonstrated on multi-label ChestMNIST for classification and on ISIC and LIDC-IDRI for segmentation, our approach achieves competitive performance and predictive uncertainty estimation while reducing Bayesian parameters by over 95%, significantly reducing computational expenses compared to fully Bayesian and ensemble methods.
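The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that saliency is the absolute value of each parameter's gradient, that the ranking is global across layers rather than per layer, and that the gradients have already been computed from a trained deterministic model. The function name `select_bayesian_indices`, the layer names, and the gradient values are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def select_bayesian_indices(
    grads: Dict[str, List[float]], r_bayes: float
) -> Set[Tuple[str, int]]:
    """Return (layer, index) pairs for the top r_bayes fraction by |gradient|.

    Assumption: saliency = |gradient| and ranking is global over all layers.
    """
    if not 0.0 < r_bayes <= 1.0:
        raise ValueError("r_bayes must be in (0, 1]")

    # Flatten every parameter into (saliency, layer name, position in layer).
    scored = [
        (abs(g), name, i)
        for name, values in grads.items()
        for i, g in enumerate(values)
    ]

    # Number of parameters to treat as Bayesian (at least one).
    k = max(1, round(r_bayes * len(scored)))

    # Highest saliency first; ties broken by name and index for determinism.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return {(name, i) for _, name, i in scored[:k]}


# Toy gradients for a hypothetical two-layer model (10 parameters in total).
grads = {
    "conv1": [0.02, -0.90, 0.10, 0.00],
    "fc": [0.50, -0.03, 0.01, 0.04, -0.70, 0.06],
}
selected = select_bayesian_indices(grads, r_bayes=0.2)
```

With `r_bayes=0.2`, two of the ten toy parameters are marked as Bayesian and the other eight stay as point estimates. A reduction of over 95% in Bayesian parameters corresponds to `r_bayes` below 0.05 in this sketch.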

Paper Structure (15 sections, 1 equation, 4 figures, 1 table)

Figures (4)

  • Figure 1: Model implementations: deterministic (a), Bayesian (b), and partial Bayesian (c, d), where black connections are deterministic and red connections are probabilistic (Bayesian). Partial Bayesian models can be implemented in two ways: a layer-wise approach (c), where one or more layers are set as Bayesian, or a sparse approach (d), where a selected number of connections are Bayesian. (b $\rightarrow$ d are arranged from the highest to the lowest number of Bayesian parameters.)
  • Figure 2: Our proposed training of a sparse (partial) Bayesian network. Step 1: Train a deterministic model by minimizing the negative log likelihood $\mathcal{L}(y, y_{gt})$, where the parameters are represented as point estimates. Step 2: Perform a gradient-based sensitivity analysis, denoted as $\nabla \theta$, and identify the top-$k$ connections, i.e. those with the highest gradients (in red). Step 3: Train a sparse (partial) Bayesian model with the top-$k$ connections as Bayesian parameters and the remaining network as deterministic by minimizing the Evidence Lower Bound (ELBO) loss $\mathcal{L}(y, y_{gt})+\beta \cdot KL\left(p_b(\theta), q_b(\theta)\right)$, where $p_b(\theta)$ and $q_b(\theta)$ are the prior and posterior distributions for the Bayesian parameters $\theta_b$.
  • Figure 3: Segmentation samples for the 5-member ensemble, 1% partial Bayesian, and fully Bayesian models, with the input image on the far left. Prediction mask overlays show true positives (green), false positives (blue), and false negatives (red). The uncertainty map is the entropy of the output probability, showing regions of high (red) and low (blue) uncertainty. (b) LIDC-IDRI includes inter-rater variability (2nd column). Our 1% partial Bayesian model is on par with ensembles at a lower cost. (Zoom in for a better view of the details.)
  • Figure 4: Performance comparison of partial Bayesian models with $r_{bayes}=$ (1%, 5%, 10%, 20%, 40%, 80%), 5-member ensembles, and fully Bayesian models for classification and segmentation tasks. The middle line is the median metric value. The shaded area indicates the interquartile range (25th to 75th percentile). Test Error is computed as $(1-\text{accuracy/Dice/IoU})$.