Estimating Epistemic and Aleatoric Uncertainty with a Single Model

Matthew A. Chan, Maria J. Molina, Christopher A. Metzler

TL;DR

This work introduces a new approach to ensembling, hyper-diffusion models (HyperDM), which allows one to accurately estimate both epistemic and aleatoric uncertainty with a single model and offers prediction accuracy on par with, and in some cases superior to, multi-model ensembles.

Abstract

Estimating and disentangling epistemic uncertainty, uncertainty that is reducible with more training data, and aleatoric uncertainty, uncertainty that is inherent to the task at hand, is critically important when applying machine learning to high-stakes applications such as medical imaging and weather forecasting. Conditional diffusion models' breakthrough ability to accurately and efficiently sample from the posterior distribution of a dataset now makes uncertainty estimation conceptually straightforward: One need only train and sample from a large ensemble of diffusion models. Unfortunately, training such an ensemble becomes computationally intractable as the complexity of the model architecture grows. In this work we introduce a new approach to ensembling, hyper-diffusion models (HyperDM), which allows one to accurately estimate both epistemic and aleatoric uncertainty with a single model. Unlike existing single-model uncertainty methods like Monte-Carlo dropout and Bayesian neural networks, HyperDM offers prediction accuracy on par with, and in some cases superior to, multi-model ensembles. Furthermore, our proposed approach scales to modern network architectures such as Attention U-Net and yields more accurate uncertainty estimates compared to existing methods. We validate our method on two distinct real-world tasks: x-ray computed tomography reconstruction and weather temperature forecasting.

Paper Structure

This paper contains 23 sections, 15 equations, 11 figures, and 4 tables.

Figures (11)

  • Figure 1: General framework of HyperDM. (a) A Bayesian hyper-network is optimized to generate diffusion model weights from randomly sampled noise. This process is repeated $M$ times to obtain an ensemble of $M$ weights. (b) A diffusion model accepts fixed weights from the hyper-network to stochastically generate a prediction. This process is repeated $N$ times for each set of weights, yielding a total of $M\times N$ predictions. (c) The ensemble predictions are aggregated to produce a final prediction and an epistemic / aleatoric uncertainty map.
  • Figure 2: Accurate uncertainty estimation using HyperDM. (a) HyperDM is trained on four 1D datasets with aleatoric uncertainty determined by noise variance $\sigma_\eta^2$. Variances across diffusion model predictions are visualized as one distribution per training dataset. Aleatoric estimates (i.e., the mean of each distribution) accurately predict $\sigma_\eta^2$. (b) HyperDM is trained on four datasets with epistemic uncertainty determined by dataset size $|\mathcal{D}|$. Prediction means are visualized as one distribution per training dataset. Epistemic estimates (i.e., the variance of each distribution) grow inversely with $|\mathcal{D}|$.
  • Figure 3: Weather forecasting on out-of-distribution data. (a) An out-of-distribution measurement is formed by synthetically inserting a hot spot in the northeastern part of Canada. (b) Epistemic and aleatoric uncertainty maps are produced by each method on the provided measurement. Compared to other methods, HyperDM is best able to isolate the abnormal feature in its epistemic estimate.
  • Figure 4: CT reconstruction on out-of-distribution data. (a) An out-of-distribution CT measurement formed by synthetically inserting metal implants along the spine. (b) Epistemic and aleatoric uncertainty maps are produced by each method on the out-of-distribution measurement. Both DPS-UQ and HyperDM are able to distinguish the abnormal feature in their epistemic prediction.
  • Figure 5: Aggregation of ensemble predictions. Ensemble predictions are aggregated using conventional methods (e.g., mean, median, mode). Mean and median aggregation results are similar, while mode aggregation results are noticeably noisier.
  • ...and 6 more figures
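
The aggregation step described in the captions of Figures 1(c) and 2 can be illustrated with a small NumPy sketch. It assumes the $M\times N$ predictions are already stacked into one array whose first axis indexes the $M$ sampled weight sets and whose second axis indexes the $N$ diffusion samples per weight set. Following the Figure 2 caption, the aleatoric estimate is the mean of the per-weight-set variances and the epistemic estimate is the variance of the per-weight-set means. The function name `decompose_uncertainty`, the array layout, the use of mean aggregation for the final prediction, and the population variance (`ddof=0`) are illustrative assumptions, not details confirmed by the paper.

```python
import numpy as np


def decompose_uncertainty(preds):
    """Split ensemble predictions into a point estimate and two uncertainty maps.

    preds: array of shape (M, N, ...), where M is the number of sampled weight
    sets and N is the number of stochastic predictions per weight set
    (hypothetical layout, chosen for this sketch).
    """
    preds = np.asarray(preds, dtype=float)

    # Statistics over the N samples drawn with each fixed weight set.
    per_weight_mean = preds.mean(axis=1)  # shape (M, ...)
    per_weight_var = preds.var(axis=1)    # shape (M, ...), population variance

    # Final prediction: mean aggregation over the whole ensemble (assumption).
    prediction = per_weight_mean.mean(axis=0)
    # Aleatoric: average spread of predictions when the weights are held fixed.
    aleatoric = per_weight_var.mean(axis=0)
    # Epistemic: spread of the mean prediction across different weight sets.
    epistemic = per_weight_mean.var(axis=0)
    return prediction, aleatoric, epistemic


# Tiny scalar example with M = 2 weight sets and N = 2 samples each.
toy = np.array([[1.0, 3.0],
                [5.0, 7.0]])
prediction, aleatoric, epistemic = decompose_uncertainty(toy)
print(prediction, aleatoric, epistemic)  # 4.0 1.0 4.0
```

With equal $N$ per weight set and population variances, the two estimates sum to the variance of all $M\times N$ predictions (here $1 + 4 = 5$), which is the law of total variance. Median aggregation, mentioned in the Figure 5 caption, would replace the final mean with `np.median` over the first two axes.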