From Risk to Uncertainty: Generating Predictive Uncertainty Measures via Bayesian Estimation

Nikita Kotelevskii, Vladimir Kondratyev, Martin Takáč, Éric Moulines, Maxim Panov

TL;DR

This paper presents a unified risk-based framework to quantify predictive uncertainty by decomposing pointwise risk into aleatoric and epistemic components using strictly proper scoring rules and Bayesian estimation. By expressing $R_{Tot}$ as $R_{Bayes}+R_{Exc}$ and leveraging Bayesian predictions, the authors show how well-known uncertainty measures (e.g., Mutual Information, EPKL) arise as special cases under different approximations, and they connect the framework to energy-based models. Through extensive experiments on CIFAR10/100 and TinyImageNet, they show that Log-score-based measures are generally effective for OOD detection, while Bayes and Total risks tend to excel at misclassification detection, with Excess risk offering advantages in soft-OOD scenarios. The work provides practical guidance on selecting uncertainty measures based on task (OOD vs misclassification) and data regime (soft- vs hard-OOD), and it establishes a theoretical link between diverse uncertainty metrics within a single Bayesian risk framework.
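
As a rough illustration of the decomposition described above, the following minimal sketch computes Log-score-based uncertainty measures from a set of posterior samples (for example, ensemble members). The function name `log_score_uncertainties`, the `(M, K)` input layout, and the `eps` smoothing constant are assumptions made for this example, not the authors' implementation. The pairing of each quantity with a risk component follows the common ensemble convention and may differ from the specific approximations used in the paper.

```python
import numpy as np

def log_score_uncertainties(probs, eps=1e-12):
    """Log-score uncertainty measures from M sampled predictive distributions.

    probs: array-like of shape (M, K); each row is a categorical distribution
           over K classes predicted by one posterior sample (hypothetical layout).
    eps:   small constant to avoid log(0) (an assumption of this sketch).
    """
    probs = np.asarray(probs, dtype=float)
    log_p = np.log(probs + eps)

    # Posterior-averaged (Bayesian) prediction.
    mean_p = probs.mean(axis=0)

    # Total-type measure: entropy of the averaged prediction.
    total = -np.sum(mean_p * np.log(mean_p + eps))

    # Bayes-type (aleatoric) measure: average entropy of individual members.
    bayes = -np.mean(np.sum(probs * log_p, axis=1))

    # Excess-type (epistemic) measure: Mutual Information = total - bayes.
    excess_mi = total - bayes

    # EPKL: expected pairwise KL divergence between members.
    # kl[i, j] = sum_k p_i[k] * (log p_i[k] - log p_j[k]); the mean runs over
    # all M x M pairs, including the zero-valued diagonal.
    kl = np.sum(probs[:, None, :] * (log_p[:, None, :] - log_p[None, :, :]), axis=2)
    epkl = kl.mean()

    return {"total": total, "bayes": bayes, "excess_mi": excess_mi, "epkl": epkl}


# Two members that disagree completely: almost all uncertainty is epistemic.
print(log_score_uncertainties([[1.0, 0.0], [0.0, 1.0]]))
```

In this particular instance the three quantities satisfy total = Bayes + excess by construction, mirroring $R_{Tot} = R_{Bayes} + R_{Exc}$; EPKL is an alternative excess-type measure that does not share this additive identity with the other two.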

Abstract

There are various measures of predictive uncertainty in the literature, but their relationships to each other remain unclear. This paper uses a decomposition of statistical pointwise risk into components, associated with different sources of predictive uncertainty, namely aleatoric uncertainty (inherent data variability) and epistemic uncertainty (model-related uncertainty). Together with Bayesian methods, applied as an approximation, we build a framework that allows one to generate different predictive uncertainty measures. We validate our method on image datasets by evaluating its performance in detecting out-of-distribution and misclassified instances using the AUROC metric. The experimental results confirm that the measures derived from our framework are useful for the considered downstream tasks.
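
The evaluation protocol mentioned above can be sketched as follows: an uncertainty score is treated as a detector, and AUROC measures how often a "positive" input (out-of-distribution or misclassified) receives a higher score than a "negative" one. The helper name `auroc` and the toy scores are hypothetical and serve only to illustrate the metric; this is not the authors' evaluation code.

```python
import numpy as np

def auroc(scores_pos, scores_neg):
    """AUROC as the probability that a random positive outscores a random negative.

    scores_pos: uncertainty scores of positives (e.g. OOD or misclassified inputs).
    scores_neg: uncertainty scores of negatives (e.g. in-distribution, correct inputs).
    Ties contribute 1/2, which matches the area under the ROC curve.
    """
    pos = np.asarray(scores_pos, dtype=float)[:, None]
    neg = np.asarray(scores_neg, dtype=float)[None, :]
    # Compare every positive with every negative (O(n_pos * n_neg) memory).
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))


# Toy example: OOD inputs mostly receive higher uncertainty than in-distribution ones.
ood_scores = [0.9, 0.7, 0.4]
in_dist_scores = [0.1, 0.3, 0.5]
print(auroc(ood_scores, in_dist_scores))
```

The pairwise formulation is chosen for readability; for large evaluation sets a rank-based or library implementation would be more memory-efficient.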

Paper Structure (53 sections, 97 equations, 6 figures, 13 tables)


Figures (6)

  • Figure 1: Different examples of input objects in a binary classification problem (cats vs dogs). A limitation of our approach is that $\eta(x) = P_{tr}(Y \mid X=x)$ should be defined even for objects with tiny mass under $P_{tr}$ (see the discussion of limitations in the Appendix).
  • Figure 2: Different situations for risk estimates. Risks typed in black above the axis are the true ones; risks typed in color below the axis are estimates. Double-headed arrows show Excess risks. Top: $\Tilde{\text{R}}_{\text{Tot}}$ underestimates $\text{R}_{\text{Tot}}$, $\Tilde{\text{R}}_{\text{Bayes}}^{(1)}$ better estimates $\text{R}_{\text{Bayes}}$, and $\Tilde{\text{R}}_{\text{Exc}}^{(1)}$ better estimates $\text{R}_{\text{Exc}}$. Bottom: $\Tilde{\text{R}}_{\text{Tot}}$ overestimates $\text{R}_{\text{Tot}}$, $\Tilde{\text{R}}_{\text{Bayes}}^{(1)}$ better estimates $\text{R}_{\text{Bayes}}$, and $\Tilde{\text{R}}_{\text{Exc}}^{(2)}$ better estimates $\text{R}_{\text{Exc}}$. Thus, different estimates of $\text{R}_{\text{Tot}}$ lead to different best approximations of $\text{R}_{\text{Exc}}$. See the discussion of limitations in the Appendix.
  • Figure 3: Violin plots for different training loss functions and different metrics for ResNet18. Left: CIFAR10; Middle: CIFAR100; Right: TinyImageNet.
  • Figure 4: Different shapes of the posterior distributions.
  • Figure 5: Epistemic uncertainty metrics under prior misspecification and for different sample sizes.
  • ...and 1 more figure