Provable Uncertainty Decomposition via Higher-Order Calibration

Gustaf Ahdritz, Aravind Gollakota, Parikshit Gopalan, Charlotte Peale, Udi Wieder

TL;DR

This paper introduces higher-order calibration as a principled framework for decomposing predictive uncertainty into aleatoric and epistemic components whose meaning is tied to the data-generating process. It defines higher-order predictors $f:\mathcal{X}\to\Delta\Delta\mathcal{Y}$, which output a mixture over label distributions at each point, and the Bayes mixture $f^*([x])$ over each level set $[x]$. It proves that, under perfect higher-order calibration, the predicted aleatoric uncertainty matches the true aleatoric uncertainty averaged over the level set, and the predicted epistemic uncertainty equals the average dispersion of the true label distributions in that level set. To make the theory practical, the authors introduce $k$-th order calibration, a tractable relaxation that can be verified with $k$-snapshots. They bound the rate at which it converges to full higher-order calibration and prove moment-recovery guarantees for the first $k$ moments. They give two ways to achieve $k$-th order calibration, learning directly from snapshots and a post-hoc calibration routine, each with finite-sample guarantees. Experiments on image classification (e.g., CIFAR-10H) show that larger $k$ improves aleatoric uncertainty estimates and yields meaningful uncertainty decompositions, while the framework stays distribution-free and applies to Bayesian and ensemble predictors. The result is a calibration-based approach to uncertainty quantification whose estimates have explicit semantics and provable guarantees.
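As a concrete reading of the decomposition, a higher-order prediction given as a finite mixture (for instance, the members of an ensemble) can be split using Shannon entropy: total uncertainty $H(\overline{p})$ of the mean prediction, aleatoric uncertainty $\mathbb{E}_{\bm{p}\sim\pi}[H(\bm{p})]$, and epistemic uncertainty as their difference. The sketch below assumes Shannon entropy and a finite mixture; the functions `entropy` and `decompose` are hypothetical names, and this is not the authors' code.

```python
import numpy as np


def entropy(p):
    """Shannon entropy (in nats) along the last axis; 0 * log 0 is treated as 0."""
    p = np.asarray(p, dtype=float)
    safe = np.where(p > 0, p, 1.0)  # log(1) = 0, so zero entries contribute nothing
    return -np.sum(p * np.log(safe), axis=-1)


def decompose(components, weights=None):
    """Return (total, aleatoric, epistemic) uncertainty of a finite mixture.

    components: shape (m, num_labels); row i is a label distribution p_i.
    weights: mixture weights of shape (m,); uniform (equal-weight ensemble) if None.
    """
    components = np.asarray(components, dtype=float)
    m = components.shape[0]
    w = np.full(m, 1.0 / m) if weights is None else np.asarray(weights, dtype=float)
    mean = w @ components                       # mean prediction \bar{p}
    total = float(entropy(mean))                # H(\bar{p})
    aleatoric = float(w @ entropy(components))  # E_{p ~ pi}[H(p)]
    epistemic = total - aleatoric               # equals E_{p ~ pi}[KL(p || \bar{p})]
    return total, aleatoric, epistemic


# Two confident members that disagree: the uncertainty is mostly epistemic.
print(decompose([[0.99, 0.01], [0.01, 0.99]]))
# Two members that agree on a 50/50 split: the uncertainty is purely aleatoric.
print(decompose([[0.5, 0.5], [0.5, 0.5]]))
```

Both mixtures have the same mean prediction, so ordinary calibration cannot tell them apart; only the mixture structure separates the two sources of uncertainty.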

Abstract

We give a principled method for decomposing the predictive uncertainty of a model into aleatoric and epistemic components with explicit semantics relating them to the real-world data distribution. While many works in the literature have proposed such decompositions, they lack the type of formal guarantees we provide. Our method is based on the new notion of higher-order calibration, which generalizes ordinary calibration to the setting of higher-order predictors that predict mixtures over label distributions at every point. We show how to measure as well as achieve higher-order calibration using access to $k$-snapshots, namely examples where each point has $k$ independent conditional labels. Under higher-order calibration, the estimated aleatoric uncertainty at a point is guaranteed to match the real-world aleatoric uncertainty averaged over all points where the prediction is made. To our knowledge, this is the first formal guarantee of this type that places no assumptions whatsoever on the real-world data distribution. Importantly, higher-order calibration is also applicable to existing higher-order predictors such as Bayesian and ensemble models and provides a natural evaluation metric for such models. We demonstrate through experiments that our method produces meaningful uncertainty decompositions for image classification.
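The role of $k$-snapshots can be illustrated for binary labels. This is a minimal sketch under stated assumptions, not the paper's definitions: each snapshot is summarized by its number of positive labels, and a prediction is checked by comparing its predicted distribution of that count with the empirical one, using total variation distance (our choice). A predictor that outputs only the level set's average label distribution looks fine with $k=1$ but is exposed with $k=3$. The level set, its probabilities, and all function names are hypothetical.

```python
import numpy as np
from math import comb


def projection(ps, weights, k):
    """Predicted distribution of the positive-label count in a k-snapshot.

    The mixture puts weight weights[i] on Bernoulli(ps[i]); given p, the k labels
    are i.i.d., so the count is Binomial(k, p).
    """
    ps = np.asarray(ps, dtype=float)[:, None]
    weights = np.asarray(weights, dtype=float)
    counts = np.arange(k + 1)
    binom = np.array([comb(k, c) for c in counts], dtype=float)
    pmf = binom * ps**counts * (1 - ps) ** (k - counts)  # shape (m, k + 1)
    return weights @ pmf


def sample_snapshot_counts(true_ps, k, n, rng):
    """Draw n k-snapshots: pick a point of the level set uniformly, then k labels there."""
    true_ps = np.asarray(true_ps, dtype=float)
    idx = rng.integers(len(true_ps), size=n)
    return rng.binomial(k, true_ps[idx])


def total_variation(a, b):
    return 0.5 * float(np.abs(np.asarray(a) - np.asarray(b)).sum())


rng = np.random.default_rng(0)
true_ps = [0.05, 0.5, 0.95]  # hypothetical level set: three equally likely points
for k in (1, 3):
    counts = sample_snapshot_counts(true_ps, k, 100_000, rng)
    empirical = np.bincount(counts, minlength=k + 1) / len(counts)
    bayes = projection(true_ps, [1 / 3] * 3, k)  # predicts the Bayes mixture
    mean_only = projection([0.5], [1.0], k)      # predicts only the mean label distribution
    print(k, total_variation(empirical, bayes), total_variation(empirical, mean_only))
```

With $k=1$ both predictions assign probability $0.5$ to a positive label; with $k=3$ the mean-only prediction is far from the observed snapshot distribution while the Bayes mixture stays close.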

Paper Structure

This paper contains 61 sections, 29 theorems, 110 equations, 6 figures, 2 tables.

Key Result

Theorem 1.2

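Suppose $f$ is higher-order calibrated. Let $\pi^* = f^*([x])$ be the Bayes mixture over the level set $[x]$, and let $\overline{p}^* = \mathbb{E}_{\bm{p}^* \sim \pi^*}[\bm{p}^*]$. Then the aleatoric uncertainty predicted by $f(x)$ equals the average aleatoric uncertainty of $\bm{p}^* \sim \pi^*$, and the epistemic uncertainty predicted by $f(x)$ equals the average dispersion of $\bm{p}^*$ around $\overline{p}^*$.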
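Theorem 1.2 relates epistemic uncertainty to how far the distributions $\bm{p}^* \sim \pi^*$ spread around their mean $\overline{p}^*$. When uncertainty is measured with Shannon entropy $H$ (an assumption here; Figure 4 shows the paper also considers Brier entropy), the gap between total and aleatoric uncertainty is exactly the average KL divergence from the mean, by a standard identity:

$$
\begin{aligned}
H(\overline{p}^*) - \mathbb{E}_{\bm{p}^*\sim\pi^*}\big[H(\bm{p}^*)\big]
&= \mathbb{E}_{\bm{p}^*\sim\pi^*}\Big[\sum_y p^*_y \log p^*_y\Big] - \sum_y \overline{p}^*_y \log \overline{p}^*_y \\
&= \mathbb{E}_{\bm{p}^*\sim\pi^*}\Big[\sum_y p^*_y \big(\log p^*_y - \log \overline{p}^*_y\big)\Big] \\
&= \mathbb{E}_{\bm{p}^*\sim\pi^*}\big[D_{\mathrm{KL}}(\bm{p}^* \,\|\, \overline{p}^*)\big].
\end{aligned}
$$

The second equality uses $\mathbb{E}_{\bm{p}^*\sim\pi^*}[p^*_y] = \overline{p}^*_y$ for every label $y$.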

Figures (6)

  • Figure 1: An illustration of higher-order calibration using the X-ray classification example. We depict scenarios 1 and 2 on the top and bottom respectively. On the left, we have instances grouped together into one level set $[x]$ by the predictor. By learning from snapshots drawn from the level set in either case, we are able to predict mixtures that match the true Bayes mixture $f^*([x])$.
  • Figure 2: Calibrating models with $k$-snapshots yields increasingly accurate estimates of aleatoric uncertainty. Top: Average aleatoric uncertainty estimation error of CIFAR-10 models calibrated using snapshots of increasing size. Bottom: For three of the highest-entropy equivalence classes, we depict the distribution of entropies over the components of the predicted mixture (gray) and of the Bayes mixture (green). The distributions, and in particular their means, are similar.
  • Figure 3: Qualitatively, accurate estimates of aleatoric uncertainty help separate unusual, poorly learned images (mostly epistemic) from genuinely ambiguous ones (mostly aleatoric). Top: CIFAR-10H images with the highest ratio of epistemic to aleatoric uncertainty (shown by colored bars), as estimated by a model with good higher-order calibration. Bottom: The most aleatoric images according to the same model.
  • Figure 4: Comparison of Shannon and Brier binary entropy functions. Here we use the scaled version of Brier entropy, namely $G_{\text{Brier}}(p) = 4p(1-p)$, for a better comparison.
  • Figure 5: The simple binary regression task from Johnson et al. (2024). (Positive) inputs are drawn from a normal distribution (left), and outputs are determined by a fixed function $p(y \mid x)$ (right) with low- and high-frequency components; our simple predictor ($k=1$) fails to learn the high-frequency component completely.
  • ...and 1 more figure
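Figure 4's Brier entropy connects directly to snapshots. With the unscaled Brier entropy $G(p) = 1 - \sum_y p_y^2$ (for binary labels this is $2p(1-p)$, half of the plotted $4p(1-p)$), the probability that two independent labels drawn at the same point disagree is exactly $G(p)$. The average Brier aleatoric uncertainty over a level set can therefore be estimated from 2-snapshots by their disagreement rate. The sketch below checks this on a hypothetical three-class level set; it illustrates a standard identity rather than the paper's estimator, and the function names are hypothetical.

```python
import numpy as np


def brier_entropy(p):
    """Unscaled Brier entropy 1 - sum_y p_y^2 along the last axis."""
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum(p**2, axis=-1)


def sample_two_snapshots(true_ps, n, rng):
    """Draw n 2-snapshots: a uniformly random point, then two independent labels there."""
    idx = rng.integers(len(true_ps), size=n)
    cdf = np.cumsum(true_ps[idx], axis=1)
    cdf[:, -1] = 1.0  # guard against rounding so every uniform draw maps to a label
    u = rng.random((n, 2))
    # Inverse-CDF sampling: the label is the number of CDF entries strictly below u.
    return (u[:, :, None] > cdf[:, None, :]).sum(axis=2)


rng = np.random.default_rng(1)
# Hypothetical level set of three equally likely points with 3-class label distributions.
true_ps = np.array([[0.8, 0.1, 0.1], [0.4, 0.4, 0.2], [1 / 3, 1 / 3, 1 / 3]])
pairs = sample_two_snapshots(true_ps, 200_000, rng)
estimate = float(np.mean(pairs[:, 0] != pairs[:, 1]))  # disagreement rate of the pairs
target = float(brier_entropy(true_ps).mean())          # average Brier aleatoric uncertainty
print(estimate, target)
```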

Theorems & Definitions (58)

  • Definition 1.1: Higher-order predictors and calibration
  • Theorem 1.2: Uncertainty decomposition under higher-order calibration
  • Definition 1.3: Perfect $k^\text{th}$-order calibration
  • Definition 3.1: Approximate higher-order calibration
  • Definition 3.2: $k$-snapshots and $k^\text{th}$-order projections
  • Definition 3.3: Approximate $k^\text{th}$-order calibration
  • Theorem 3.4: $k^\text{th}$-order calibration implies higher-order calibration.
  • Theorem 3.5: Moment estimates from $k^\text{th}$-order calibration
  • Theorem 3.6: Empirical estimate of $k^\text{th}$-order projection guarantee
  • Definition 4.1
  • ...and 48 more
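Theorem 3.5 concerns the first $k$ moments. One way to see why $k$-snapshots carry exactly this information: for binary labels with $c$ positives in a $k$-snapshot, $c \sim \mathrm{Binomial}(k, p)$ gives $\mathbb{E}\big[\binom{c}{j}\big] = \binom{k}{j} p^j$ for $j \le k$, so averaging $\binom{c}{j}/\binom{k}{j}$ over snapshots drawn from a level set is an unbiased estimate of $\mathbb{E}_{\bm{p}^*\sim\pi^*}[p^j]$. The sketch below applies this standard estimator to a hypothetical level set; it is an illustration of the idea, not the theorem's proof or the paper's algorithm, and `moment_estimates` is a hypothetical name.

```python
import numpy as np


def moment_estimates(counts, k):
    """Unbiased estimates of E[p^j] for j = 1..k from binary k-snapshots.

    counts[i] is the number of positive labels in the i-th k-snapshot. Uses
    C(c, j) / C(k, j) = c(c-1)...(c-j+1) / (k(k-1)...(k-j+1)), whose mean
    under c ~ Binomial(k, p) is p^j.
    """
    counts = np.asarray(counts, dtype=float)
    falling = np.ones_like(counts)  # c (c-1) ... (c-j+1), one factor per step
    denom = 1.0                     # k (k-1) ... (k-j+1)
    estimates = []
    for j in range(1, k + 1):
        falling = falling * (counts - (j - 1))
        denom *= k - (j - 1)
        estimates.append(float(falling.mean()) / denom)
    return np.array(estimates)


rng = np.random.default_rng(2)
true_ps = np.array([0.05, 0.5, 0.95])  # hypothetical level set, equally likely points
k, n = 3, 200_000
counts = rng.binomial(k, true_ps[rng.integers(len(true_ps), size=n)])
print(moment_estimates(counts, k))                            # estimated E[p], E[p^2], E[p^3]
print([float(np.mean(true_ps**j)) for j in range(1, k + 1)])  # true moments of the level set
```

A snapshot with fewer than $j$ positives contributes zero to the $j$-th estimate, which is why moments beyond the $k$-th cannot be recovered from $k$-snapshots this way.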