Unified Uncertainty Calibration

Kamalika Chaudhuri, David Lopez-Paz

TL;DR

This work tackles uncertainty estimation for robust AI by criticizing the conventional reject-or-classify (RC) approach and introducing Unified Uncertainty Calibration (U2C), a framework that jointly calibrates aleatoric and epistemic uncertainties into a single extended prediction over c+1 classes. By relabeling a small fraction of in-domain data and training a nonlinear epistemic calibrator, U2C lets the two uncertainty sources communicate and yields calibrated predictions in which abstention is a natural outcome. The authors provide a theoretical comparison showing when RC and U2C differ in error and likelihood, and demonstrate empirically across ImageNet-based benchmarks that U2C improves both classification accuracy and calibration (ECE) in most settings while remaining stable across several epistemic estimators. Overall, U2C offers a principled, scalable path to safer, more reliable predictions under distribution shift, with practical implications for real-world AI systems.

Abstract

To build robust, fair, and safe AI systems, we would like our classifiers to say "I don't know" when facing test examples that are difficult or fall outside of the training classes. The ubiquitous strategy to predict under uncertainty is the simplistic reject-or-classify rule: abstain from prediction if epistemic uncertainty is high, classify otherwise. Unfortunately, this recipe does not allow different sources of uncertainty to communicate with each other, produces miscalibrated predictions, and does not allow us to correct for misspecifications in our uncertainty estimates. To address these three issues, we introduce unified uncertainty calibration (U2C), a holistic framework to combine aleatoric and epistemic uncertainties. U2C enables a clean learning-theoretical analysis of uncertainty estimation, and outperforms reject-or-classify across a variety of ImageNet benchmarks. Our code is available at: https://github.com/facebookresearch/UnifiedUncertaintyCalibration
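
To make the contrast concrete, the following is a minimal sketch, not the authors' implementation (see the repository linked above for that). It assumes a classifier that outputs c logits and a scalar epistemic score u(x) per example. The function names, the threshold `theta`, and the hand-picked affine `calibrator` are all hypothetical; in U2C the calibrator is nonlinear and learned from relabeled in-domain data, which this sketch does not attempt.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def reject_or_classify(logits, u, theta):
    # Reject-or-classify: abstain (index c) whenever the epistemic score
    # u reaches the threshold theta, otherwise return the argmax class.
    c = logits.shape[-1]
    pred = np.argmax(logits, axis=-1)
    return np.where(u >= theta, c, pred)

def unified_prediction(logits, u, calibrator):
    # Assumed form of the extended prediction: the calibrated epistemic
    # score is appended as a (c+1)-th logit and normalised jointly with
    # the class logits, so abstention competes with the c classes.
    extra = calibrator(u)[..., None]
    return softmax(np.concatenate([logits, extra], axis=-1))

# Two toy examples with the same epistemic score: one confident, one ambiguous.
logits = np.array([[4.0, 0.5, 0.1],
                   [0.4, 0.3, 0.2]])
u = np.array([0.9, 0.9])

rc = reject_or_classify(logits, u, theta=0.5)
# Hypothetical stand-in for a trained calibrator.
probs = unified_prediction(logits, u, calibrator=lambda v: 3.0 * v - 1.0)

print(rc)                    # decisions under the hard threshold
print(probs.argmax(axis=1))  # decisions under the joint c+1 prediction
```

With these toy numbers, the hard threshold abstains on both examples because it only looks at u(x), whereas the joint prediction still classifies the example with a dominant class logit and abstains only on the ambiguous one. This is the sense in which the two uncertainty sources "communicate".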

Paper Structure (19 sections, 4 theorems, 18 equations, 1 figure, 4 tables)

This paper contains 19 sections, 4 theorems, 18 equations, 1 figure, 4 tables.

Key Result

Lemma 5.1

The difference of errors between RC and U2C based on a network $f_{\tau}$ is:

Figures (1)

  • Figure 1: Panel (a) shows the acceptance/rejection regions of RC and U2C, serving as visual support for our theoretical analysis. Panel (b) shows examples of IID images according to their epistemic uncertainty ($u(x)$, horizontal axis), aleatoric uncertainty ($\pi_f(x)$, vertical axis), and correctness of classification (border color). Panel (c) illustrates OOD images similarly. The last two panels illustrate how U2C covers all possible aleatoric-epistemic combinations, in a way that correlates appropriately with (mis)classification, both IID and OOD.

Theorems & Definitions (8)

  • Lemma 5.1
  • Lemma 5.2
  • proof
  • proof
  • Lemma A.1
  • proof
  • Lemma A.2
  • proof