Provable Uncertainty Decomposition via Higher-Order Calibration
Gustaf Ahdritz, Aravind Gollakota, Parikshit Gopalan, Charlotte Peale, Udi Wieder
TL;DR
This paper introduces higher-order calibration as a principled framework for decomposing predictive uncertainty into aleatoric and epistemic components whose semantics are tied explicitly to the data-generating process. It defines higher-order predictors $f:\mathcal{X}\to\Delta\Delta\mathcal{Y}$ and Bayes mixtures over level sets $[x]$, where the level set $[x]$ collects the points that receive the same prediction as $x$. The paper proves that, under perfect higher-order calibration, the predicted aleatoric uncertainty matches the true aleatoric uncertainty averaged over the level set, and the predicted epistemic uncertainty corresponds to the average dispersion of the true label distributions within that level set. To make the theory practical, the authors introduce $k$-th order calibration, a tractable relaxation that can be verified with $k$-snapshots, with a quantified rate of convergence to full higher-order calibration and recovery guarantees for the first $k$ moments. They give two routes to $k$-th order calibration, learning directly from snapshots and a post-hoc calibration routine, each supported by finite-sample guarantees. Experiments on image classification (e.g., CIFAR-10H) show that larger $k$ improves the quality of aleatoric uncertainty estimates and yields meaningful uncertainty decompositions on real-world tasks, while the framework remains distribution-free and applies to existing Bayesian and ensemble predictors. The result is a calibration-grounded method for uncertainty quantification with explicit semantics and formal guarantees.
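To make the decomposition concrete, the following minimal sketch computes total, aleatoric, and epistemic uncertainty for a single higher-order prediction, represented here as a finite mixture over label distributions. It assumes Shannon entropy as the uncertainty measure and uses the common entropy-based split (total = entropy of the mixture mean, aleatoric = average entropy of the components, epistemic = the difference); the summary above does not fix a particular measure, so this choice is an assumption. The names `shannon_entropy` and `decompose_uncertainty` are hypothetical and used only for illustration.

```python
import numpy as np


def shannon_entropy(p):
    """Shannon entropy (in nats) along the last axis."""
    p = np.asarray(p, dtype=float)
    # Replace zeros by 1 inside the log so they contribute exactly 0.
    safe = np.where(p > 0, p, 1.0)
    return -np.sum(p * np.log(safe), axis=-1)


def decompose_uncertainty(components, weights=None):
    """Split the uncertainty of a mixture over label distributions.

    components: array of shape (n, num_classes); each row is one label
                distribution in the mixture (assumed representation).
    weights:    mixture weights of shape (n,); uniform if omitted.
    Returns (total, aleatoric, epistemic) as Python floats.
    """
    components = np.asarray(components, dtype=float)
    n = components.shape[0]
    if weights is None:
        weights = np.full(n, 1.0 / n)
    weights = np.asarray(weights, dtype=float)

    mean_distribution = weights @ components          # mixture mean
    total = float(shannon_entropy(mean_distribution))
    aleatoric = float(weights @ shannon_entropy(components))
    epistemic = total - aleatoric                     # dispersion of the mixture
    return total, aleatoric, epistemic


if __name__ == "__main__":
    # Two confident but disagreeing components: all uncertainty is epistemic.
    print(decompose_uncertainty([[1.0, 0.0], [0.0, 1.0]]))
    # Two identical uniform components: all uncertainty is aleatoric.
    print(decompose_uncertainty([[0.5, 0.5], [0.5, 0.5]]))
```

In the first call the components disagree completely, so the aleatoric term is zero and the epistemic term equals $\log 2$; in the second the components coincide, so the epistemic term is zero. Under this entropy-based choice the epistemic term is the mutual information between the label and the mixture component, which is one way to read "dispersion" of the mixture.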
Abstract
We give a principled method for decomposing the predictive uncertainty of a model into aleatoric and epistemic components with explicit semantics relating them to the real-world data distribution. While many works in the literature have proposed such decompositions, they lack the type of formal guarantees we provide. Our method is based on the new notion of higher-order calibration, which generalizes ordinary calibration to the setting of higher-order predictors that predict mixtures over label distributions at every point. We show how to measure as well as achieve higher-order calibration using access to $k$-snapshots, namely examples where each point has $k$ independent conditional labels. Under higher-order calibration, the estimated aleatoric uncertainty at a point is guaranteed to match the real-world aleatoric uncertainty averaged over all points where the prediction is made. To our knowledge, this is the first formal guarantee of this type that places no assumptions whatsoever on the real-world data distribution. Importantly, higher-order calibration is also applicable to existing higher-order predictors such as Bayesian and ensemble models and provides a natural evaluation metric for such models. We demonstrate through experiments that our method produces meaningful uncertainty decompositions for image classification.
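As an illustration of the $k$-snapshot data format described above, the sketch below draws $k$ independent labels for a single point from a known conditional label distribution and summarizes them as empirical label frequencies. In practice the true conditional distribution is unknown and the labels would come from, for example, multiple independent annotators; the distribution `p_star` and the function names `sample_k_snapshot` and `snapshot_frequencies` are hypothetical and serve only this simulation.

```python
import numpy as np


def sample_k_snapshot(p_star, k, rng):
    """Draw k independent labels from the conditional label distribution p_star.

    p_star: 1-D array of class probabilities at one point (assumed known
            here purely for simulation).
    Returns an integer array of shape (k,) with class indices.
    """
    p_star = np.asarray(p_star, dtype=float)
    return rng.choice(len(p_star), size=k, replace=True, p=p_star)


def snapshot_frequencies(snapshot, num_classes):
    """Empirical label frequencies of one k-snapshot."""
    counts = np.bincount(snapshot, minlength=num_classes)
    return counts / counts.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p_star = [0.7, 0.2, 0.1]          # hypothetical label distribution at x
    snapshot = sample_k_snapshot(p_star, k=10, rng=rng)
    print(snapshot)
    print(snapshot_frequencies(snapshot, num_classes=3))
```

A single snapshot gives only a coarse view of the label distribution at its point, and that view sharpens as $k$ grows.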
