Uncertainty Quantification for Regression using Proper Scoring Rules

Alexander Fishkov, Kajetan Schweighofer, Mykyta Ielanskyi, Nikita Kotelevskii, Mohsen Guizani, Maxim Panov

TL;DR

The paper extends uncertainty quantification to regression by grounding measures in proper scoring rules, enabling a principled decomposition into aleatoric and epistemic uncertainty. It develops a unified framework that yields total, Bayes, and excess risks and provides closed-form or Gaussian-surrogate estimators under ensemble assumptions. The approach recovers and generalizes existing variance- and entropy-based regression UQ methods while offering new measures based on CRPS and quadratic scores. Extensive experiments on synthetic and real data demonstrate robust, task-aligned behavior across selective prediction, OOD detection, and active learning, guiding practitioners on which uncertainty measures to use in practice.
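The aleatoric/epistemic split mentioned above can be illustrated with the familiar variance-based decomposition for an ensemble. This is a minimal sketch, assuming each ensemble member outputs a predictive mean and variance for a single test input and that members are weighted uniformly; the function name `decompose_variance` and the numbers are hypothetical, and the paper's own estimators may differ in detail.

```python
from statistics import fmean, pvariance

def decompose_variance(means, variances):
    """Split the variance of a uniform mixture of ensemble members.

    means[i] and variances[i] are the predictive mean and variance of
    member i for one test input (illustrative inputs, not the paper's notation).
    """
    aleatoric = fmean(variances)   # average within-member variance
    epistemic = pvariance(means)   # disagreement between member means
    total = aleatoric + epistemic  # law of total variance for the mixture
    return total, aleatoric, epistemic

# Hypothetical four-member ensemble evaluated at one input.
total, aleatoric, epistemic = decompose_variance(
    means=[1.0, 1.2, 0.8, 1.0],
    variances=[0.25, 0.30, 0.20, 0.25],
)
print(total, aleatoric, epistemic)
```

The population variance (`pvariance`) is used so that the two components add up exactly to the variance of the uniform mixture.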

Abstract

Quantifying uncertainty of machine learning model predictions is essential for reliable decision-making, especially in safety-critical applications. Recently, uncertainty quantification (UQ) theory has advanced significantly, building on a firm basis of learning with proper scoring rules. However, these advances were focused on classification, while extending these ideas to regression remains challenging. In this work, we introduce a unified UQ framework for regression based on proper scoring rules, such as CRPS, logarithmic, squared error, and quadratic scores. We derive closed-form expressions for the resulting uncertainty measures under practical parametric assumptions and show how to estimate them using ensembles of models. In particular, the derived uncertainty measures naturally decompose into aleatoric and epistemic components. The framework recovers popular regression UQ measures based on predictive variance and differential entropy. Our broad evaluation on synthetic and real-world regression datasets provides guidance for selecting reliable UQ measures.
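As an illustration of one of the scoring rules named above, the CRPS of a Gaussian forecast has a well-known closed form. The sketch below assumes a univariate forecast $\mathcal{N}(\mu, \sigma^2)$ and the loss orientation (lower is better); the function name `crps_gaussian` is hypothetical, and the code is not taken from the paper.

```python
import math

def crps_gaussian(y, mu, sigma):
    """Closed-form CRPS of a N(mu, sigma^2) forecast at outcome y (lower is better)."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal density at z
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF at z
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))

# Outcome at the forecast mean versus an outcome four standard deviations away.
at_mean = crps_gaussian(1.0, mu=1.0, sigma=0.5)
far_off = crps_gaussian(3.0, mu=1.0, sigma=0.5)
# A sharper forecast centered on the outcome scores better.
sharper = crps_gaussian(1.0, mu=1.0, sigma=0.1)
print(at_mean, far_off, sharper)
```

Unlike the logarithmic score, the CRPS is expressed in the units of the target, which makes it directly comparable to absolute error.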

Paper Structure

This paper contains 56 sections, 4 theorems, 102 equations, 10 figures, 10 tables.

Key Result

Theorem 4

Every scoring function that is consistent for the mean functional $T(P) = \mathbb{E}_P[Y]$ admits a representation as a Bregman divergence: $S(x, y) = \varphi(y) - \varphi(x) - \varphi'(x)(y - x)$, where $\varphi$ is a convex function and $\varphi'$ is its subgradient. Such $S(x,y)$ are also called Bregman functions.
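A small numerical check can make the statement concrete. The sketch below assumes the standard Bregman form $S(x, y) = \varphi(y) - \varphi(x) - \varphi'(x)(y - x)$ with forecast $x$ and outcome $y$, and uses a hypothetical four-point sample and a grid search; it illustrates consistency for the mean and is not a proof. The helper names (`bregman_score`, `expected_score`) are introduced only for this example.

```python
import math

def bregman_score(x, y, phi, dphi):
    """Assumed Bregman form: phi(y) - phi(x) - dphi(x) * (y - x)."""
    return phi(y) - phi(x) - dphi(x) * (y - x)

def squared(x, y):
    # phi(t) = t^2 recovers squared error: y^2 - x^2 - 2x(y - x) = (y - x)^2
    return bregman_score(x, y, lambda t: t * t, lambda t: 2.0 * t)

def exponential(x, y):
    # phi(t) = exp(t) is another convex choice; its derivative is exp(t) as well
    return bregman_score(x, y, math.exp, math.exp)

sample = [0.0, 1.0, 1.0, 4.0]            # hypothetical outcomes
sample_mean = sum(sample) / len(sample)  # 1.5

def expected_score(score, x):
    """Average score of the point forecast x over the sample."""
    return sum(score(x, y) for y in sample) / len(sample)

# Search candidate forecasts on a grid over [0, 4] with step 0.01.
grid = [i / 100 for i in range(401)]
best_squared = min(grid, key=lambda x: expected_score(squared, x))
best_exponential = min(grid, key=lambda x: expected_score(exponential, x))
print(sample_mean, best_squared, best_exponential)
```

Both scores are minimized at the sample mean even though they penalize errors very differently, which is the sense in which every Bregman function elicits the mean.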

Figures (10)

  • Figure 1: Behavior of uncertainty measures under location and scale shifts of the posterior distribution. Arrows indicate whether a measure increased or decreased due to the shift, gray bars indicate changes < 1%, and missing entries indicate that the measure is not computable or is constantly zero.
  • Figure 2: Uncertainty with the logarithmic scoring rule. Each panel plots test inputs at their ensemble-averaged predictive mean; color indicates the corresponding uncertainty. Left: Total risk. Middle: Bayes risk (aleatoric). Right: Excess risk (epistemic proxy).
  • Figure 3: Kendall's $\tau_b$ rank correlation between different risk approximations for the considered proper scoring rules (top row) and between different scoring rules for the considered risk approximations (bottom rows). Correlations are averaged over all considered datasets.
  • Figure 4: Training and validation loss when minimizing the Gaussian NLL on a synthetic regression example. Left: Training with the standard parameterization (equation \ref{eq:standard_nll}). Right: Training with the natural parameterization (equation \ref{eq:natural_nll}).
  • Figure 5: Best model when minimizing the Gaussian NLL on a synthetic regression example. Left: Predicting with the standard parameterization (equation \ref{eq:standard_nll}). Right: Predicting with the natural parameterization (equation \ref{eq:natural_nll}).
  • ...and 5 more figures

Theorems & Definitions (9)

  • Definition 1
  • Definition 2
  • proof
  • Definition 3: Definition 2.1 from Gneiting2009MakingAE
  • Theorem 4: Savage Savage1971ElicitationOP, 1971
  • Theorem 5: Theorem 2.2 from Gneiting2009MakingAE
  • Theorem 6: Theorem 2.3 from Gneiting2009MakingAE
  • Definition 7
  • Theorem 8: Theorem 1 from Gneiting2007StrictlyPS, notation from Theorem 12 in waghmare2025properscoringrulesestimation