From Risk to Uncertainty: Generating Predictive Uncertainty Measures via Bayesian Estimation

Nikita Kotelevskii; Vladimir Kondratyev; Martin Takáč; Éric Moulines; Maxim Panov

TL;DR

This paper presents a unified risk-based framework for quantifying predictive uncertainty. It decomposes pointwise risk into aleatoric and epistemic components using strictly proper scoring rules and Bayesian estimation. By writing the total risk as $\text{R}_{\text{Tot}} = \text{R}_{\text{Bayes}} + \text{R}_{\text{Exc}}$ and approximating each term with Bayesian predictions, the authors show how well-known uncertainty measures (e.g., Mutual Information, EPKL) arise as special cases under different approximations, and they connect the framework to energy-based models. In experiments on CIFAR10, CIFAR100, and TinyImageNet, they find that Log-score-based measures are generally effective for OOD detection, that Bayes and Total risks tend to excel at misclassification detection, and that Excess risk offers advantages in soft-OOD scenarios. The work gives practical guidance on selecting uncertainty measures by task (OOD vs. misclassification detection) and data regime (soft- vs. hard-OOD), and it establishes a theoretical link between diverse uncertainty metrics within a single Bayesian risk framework.
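As a rough illustration of how such measures can be computed for the Log score, the sketch below takes class probabilities from several posterior samples (for example, ensemble members) for a single input and returns common estimates of the three risks. The function name `log_score_uncertainties`, the array layout, and the particular pairing of approximations are assumptions made for this example; the paper considers several approximation choices, and this is only one of them.

```python
import numpy as np


def log_score_uncertainties(probs, eps=1e-12):
    """Hypothetical helper: Log-score uncertainty estimates for one input.

    probs: array of shape (M, K) holding class probabilities from M posterior
    samples (e.g., ensemble members) over K classes. Assumed layout.
    """
    probs = np.asarray(probs, dtype=float)
    log_p = np.log(probs + eps)
    mean_p = probs.mean(axis=0)  # posterior predictive distribution

    # Aleatoric proxy (Bayes risk estimate): expected entropy of the members.
    bayes = float(-(probs * log_p).sum(axis=1).mean())

    # Total risk proxy: entropy of the posterior predictive.
    total = float(-(mean_p * np.log(mean_p + eps)).sum())

    # Epistemic proxy 1: Mutual Information = total entropy - expected entropy.
    mutual_information = total - bayes

    # Epistemic proxy 2: Expected Pairwise KL, averaged over all member pairs.
    # cross[i, j] is the cross-entropy between member i and member j.
    cross = -(probs @ log_p.T)
    epkl = float(cross.mean()) - bayes

    return {
        "bayes": bayes,
        "total": total,
        "mutual_information": mutual_information,
        "epkl": epkl,
    }


if __name__ == "__main__":
    # Two members that disagree strongly: high epistemic uncertainty.
    print(log_score_uncertainties([[0.9, 0.1], [0.1, 0.9]]))
```

When all members agree, both epistemic estimates are close to zero; when they disagree, EPKL is at least as large as Mutual Information.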

Abstract

There are various measures of predictive uncertainty in the literature, but their relationships to each other remain unclear. This paper uses a decomposition of statistical pointwise risk into components associated with different sources of predictive uncertainty, namely aleatoric uncertainty (inherent data variability) and epistemic uncertainty (model-related uncertainty). Combining this decomposition with Bayesian methods, applied as an approximation, we build a framework that allows one to generate different predictive uncertainty measures. We validate our method on image datasets by evaluating its performance in detecting out-of-distribution and misclassified instances using the AUROC metric. The experimental results confirm that the measures derived from our framework are useful for the considered downstream tasks.
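The AUROC evaluation mentioned above can be sketched as follows: an uncertainty score is computed for every test input, and AUROC measures how often a positive instance (out-of-distribution or misclassified) receives a higher score than a negative one. The function name `auroc` and the pairwise formulation are assumptions for this example, not the authors' evaluation code; the pairwise form is quadratic in memory and suited only to small samples.

```python
import numpy as np


def auroc(scores_pos, scores_neg):
    """Hypothetical helper: AUROC as the probability that a positive instance
    gets a higher uncertainty score than a negative one (ties count half)."""
    pos = np.asarray(scores_pos, dtype=float)[:, None]  # shape (P, 1)
    neg = np.asarray(scores_neg, dtype=float)[None, :]  # shape (1, N)
    wins = (pos > neg).mean()   # fraction of pairs ranked correctly
    ties = (pos == neg).mean()  # fraction of tied pairs
    return float(wins + 0.5 * ties)


if __name__ == "__main__":
    # Uncertainty scores for OOD inputs (positives) and in-distribution inputs.
    print(auroc([0.9, 0.8, 0.4], [0.1, 0.3, 0.5]))
```

A value of 0.5 corresponds to a score that does not separate the two groups, and 1.0 to perfect separation.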

Paper Structure (53 sections, 97 equations, 6 figures, 13 tables)

Introduction
Predictive uncertainty quantification via risks
Pointwise Risk as a Measure of Uncertainty
Aleatoric and Epistemic Uncertainties via Risks
Risks for Strictly Proper Scoring Rules
Bayesian Risk Estimation
Best risk approximation choice
Connection to energy-based models
Related Work
Experiments
Is there a best function plug-in choice?
Is Excess Risk better than Bayes Risk for Out-of-Distribution Detection?
Is Total risk always better than Excess risk for Misclassification detection?
Which energy estimate is better?
Conclusion
...and 38 more sections

Figures (6)

Figure 1: Examples of input objects in a binary classification problem (cats vs. dogs). The limitation of our approach is that $\eta(x) = P_{tr}(Y \mid X=x)$ should be defined even for objects with tiny mass under $P_{tr}$ (see the discussion in Appendix \ref{sec:limitations}).
Figure 2: Different situations for risk estimates. True risks are typed in black above the axis; estimates are typed in color below it. Double-headed arrows show Excess risks. Top: $\Tilde{\text{R}}_{\text{Tot}}$ underestimates $\text{R}_{\text{Tot}}$, $\Tilde{\text{R}}_{\text{Bayes}}^{(1)}$ better estimates $\text{R}_{\text{Bayes}}$, and $\Tilde{\text{R}}_{\text{Exc}}^{(1)}$ better estimates $\text{R}_{\text{Exc}}$. Bottom: $\Tilde{\text{R}}_{\text{Tot}}$ overestimates $\text{R}_{\text{Tot}}$, $\Tilde{\text{R}}_{\text{Bayes}}^{(1)}$ better estimates $\text{R}_{\text{Bayes}}$, and $\Tilde{\text{R}}_{\text{Exc}}^{(2)}$ better estimates $\text{R}_{\text{Exc}}$. We see that different estimates of $\text{R}_{\text{Tot}}$ lead to different best approximations of $\text{R}_{\text{Exc}}$. See the discussion in Appendix \ref{sec:limitations}.
Figure 3: Violin plots for different training loss functions and different metrics for ResNet18. Left: CIFAR10; Middle: CIFAR100; Right: TinyImageNet.
Figure 4: Different shapes of the posterior distributions.
Figure 5: Epistemic uncertainty metrics, given prior misspecification and different sample sizes.
...and 1 more figure
