
Uncertainty Quantification for Regression: A Unified Framework based on kernel scores

Christopher Bülte, Yusuf Sale, Gitta Kutyniok, Eyke Hüllermeier

TL;DR

This work addresses uncertainty quantification in regression by unifying total, aleatoric, and epistemic uncertainty within a framework of strictly proper scoring rules built from kernel scores. By formalizing both Bayesian-model-average and pairwise-estimator schemes, and by linking kernel properties to downstream behavior, the authors provide concrete design guidelines for task-specific uncertainty measures. The framework recovers existing measures (e.g., variance, entropy, energy distance, MMD) and yields robust, translation-invariant, and potentially task-adapted measures through choices such as the energy score or the Gaussian kernel score. Experiments on weather prediction, UCI benchmarks, and active learning demonstrate robustness and the value of adapting the measure to the task, and they expose clear trade-offs between robustness, OOD responsiveness, and computational cost. Overall, the framework offers a principled way to tailor regression uncertainty to application requirements and paves the way for more flexible uncertainty representations in regression.
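The Bayesian-model-average (BMA) decomposition mentioned above can be illustrated with a small Monte Carlo sketch. It assumes each ensemble member is represented by an equal number of samples and uses the standard kernel-score form S_k(P, y) = ½ E k(X, X') − E k(X, y) + ½ k(y, y). Total uncertainty (TU) is taken as the generalized entropy of the equally weighted mixture, aleatoric uncertainty (AU) as the average member entropy, and epistemic uncertainty (EU) as their difference. The function names, the Gaussian bandwidth parametrization, and the V-statistic estimator are illustrative assumptions, not the authors' implementation.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# A kernel maps two scalar outcomes to a real number.
Kernel = Callable[[float, float], float]


def gaussian_kernel(gamma: float) -> Kernel:
    """Gaussian kernel exp(-(x - y)^2 / (2 gamma^2)); this bandwidth parametrization is an assumption."""
    return lambda x, y: math.exp(-((x - y) ** 2) / (2.0 * gamma ** 2))


def energy_kernel(x: float, y: float) -> float:
    """k(x, y) = -|x - y|; with this kernel the kernel score becomes the energy score."""
    return -abs(x - y)


def mean_kernel(xs: Sequence[float], ys: Sequence[float], k: Kernel) -> float:
    """V-statistic estimate of E[k(X, Y)] over all sample pairs (diagonal included)."""
    return sum(k(x, y) for x in xs for y in ys) / (len(xs) * len(ys))


def kernel_entropy(samples: Sequence[float], k: Kernel) -> float:
    """Generalized entropy H(P) = E_P[S_k(P, Y)] = 1/2 E[k(Y, Y)] - 1/2 E[k(X, X')]."""
    self_term = sum(k(y, y) for y in samples) / len(samples)
    return 0.5 * self_term - 0.5 * mean_kernel(samples, samples, k)


def decompose(members: List[List[float]], k: Kernel) -> Tuple[float, float, float]:
    """BMA-style split TU = AU + EU; assumes equally weighted members with equal sample sizes."""
    pooled = [y for s in members for y in s]  # samples from the equal-weight mixture (BMA)
    tu = kernel_entropy(pooled, k)  # total: entropy of the mixture
    au = sum(kernel_entropy(s, k) for s in members) / len(members)  # aleatoric: mean member entropy
    return tu, au, tu - au  # epistemic: remainder (mean 1/2 MMD^2 to the mixture)


if __name__ == "__main__":
    rng = random.Random(0)
    agree = [[rng.gauss(0.0, 1.0) for _ in range(100)] for _ in range(5)]
    disagree = [[rng.gauss(mu, 1.0) for _ in range(100)] for mu in (-2.0, -1.0, 0.0, 1.0, 2.0)]
    for name, k in (("energy", energy_kernel), ("gaussian", gaussian_kernel(1.0))):
        print(name, "agree", decompose(agree, k), "disagree", decompose(disagree, k))
```

Swapping the kernel is the only change needed to move between the energy-score and Gaussian-kernel-score instantiations. With V-statistics the EU term equals the average of ½ MMD² between each member and the mixture, so it is non-negative and vanishes when all members coincide.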

Abstract

Regression tasks, notably in safety-critical domains, require proper uncertainty quantification, yet the literature remains largely classification-focused. In this light, we introduce a family of measures for total, aleatoric, and epistemic uncertainty based on proper scoring rules, with a particular emphasis on kernel scores. The framework unifies several well-known measures and provides a principled recipe for designing new ones whose behavior, such as tail sensitivity, robustness, and out-of-distribution responsiveness, is governed by the choice of kernel. We prove explicit correspondences between kernel-score characteristics and downstream behavior, yielding concrete design guidelines for task-specific measures. Extensive experiments demonstrate that these measures are effective in downstream tasks and reveal clear trade-offs among instantiations, including robustness and out-of-distribution detection performance.

Paper Structure

This paper contains 32 sections, 5 theorems, 53 equations, 8 figures, 3 tables.

Key Result

Proposition 5.1

For any proper scoring rule $S$, it holds that

Figures (8)

  • Figure 1: Illustration of epistemic uncertainty for a two-member Gaussian ensemble with shared variances. As the component variances shrink, the variance-based measure ($S_\mathrm{SE}$) stays constant, the entropy-based measure ($S_\mathrm{log}$) diverges, while our proposed energy-score-based measure ($S_\mathrm{ES}$) converges to half the Euclidean distance between component means.
  • Figure 2: Aleatoric (AU) and epistemic (EU) uncertainty for the different uncertainty measures, averaged over a test set of 365 days. For visualization purposes, epistemic uncertainty is shown on a log scale.
  • Figure 3: Task losses (one per plot), with instances sorted from highest to lowest total uncertainty according to each uncertainty measure, for models trained on the T2M prediction task. For visualization purposes, the values shown are moving averages over windows of size 50.
  • Figure 4: Continuous ranked probability score (CRPS) as the number of training instances increases, for model runs using the uncertainty measure specified by $\gamma$, averaged over three runs. The left panel shows the full run; the right panel shows a close-up.
  • Figure 5: The figure shows the spatial domain used for the distributional regression networks, as well as the corresponding land-sea mask and orography.
  • ...and 3 more figures
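The behavior described in the Figure 1 caption can be checked in closed form. The sketch below assumes a pairwise estimator that averages a score divergence over all ordered pairs of ensemble members (including i = j), and uses the standard divergences of the three scores for Gaussians with a shared standard deviation σ: (μ_i − μ_j)² for the squared-error score, KL = (μ_i − μ_j)²/(2σ²) for the log score, and E|X − Y| − ½E|X − X'| − ½E|Y − Y'| for the energy score. The estimator and helper names are hypothetical; the paper's exact estimator may differ.

```python
import math
from typing import Callable, List, Tuple

Gaussian = Tuple[float, float]  # (mean, standard deviation)
Divergence = Callable[[Gaussian, Gaussian], float]


def folded_normal_mean(m: float, s: float) -> float:
    """E|Z| for Z ~ N(m, s^2), with s > 0."""
    return s * math.sqrt(2.0 / math.pi) * math.exp(-m * m / (2.0 * s * s)) + m * math.erf(m / (s * math.sqrt(2.0)))


def se_divergence(p: Gaussian, q: Gaussian) -> float:
    """Squared-error score divergence: depends only on the means."""
    return (p[0] - q[0]) ** 2


def log_divergence(p: Gaussian, q: Gaussian) -> float:
    """Log-score divergence KL(p || q) for Gaussians with a shared standard deviation."""
    return (p[0] - q[0]) ** 2 / (2.0 * p[1] ** 2)


def energy_divergence(p: Gaussian, q: Gaussian) -> float:
    """Energy-score divergence E|X-Y| - 1/2 E|X-X'| - 1/2 E|Y-Y'| for a shared standard deviation."""
    s = math.sqrt(2.0) * p[1]  # X - Y and X - X' both have standard deviation sqrt(2) * sigma
    return folded_normal_mean(p[0] - q[0], s) - folded_normal_mean(0.0, s)


def pairwise_eu(members: List[Gaussian], div: Divergence) -> float:
    """Hypothetical pairwise estimator: mean divergence over all ordered member pairs (i == j included)."""
    m = len(members)
    return sum(div(p, q) for p in members for q in members) / (m * m)


if __name__ == "__main__":
    for sigma in (1.0, 0.1, 0.01, 1e-4):
        ensemble = [(0.0, sigma), (2.0, sigma)]
        print(sigma, [pairwise_eu(ensemble, d) for d in (se_divergence, log_divergence, energy_divergence)])
```

For means (0, 2), the squared-error measure stays at 2.0 for every σ, the log-score measure grows as 1/σ², and the energy-score measure approaches 1.0, half the distance between the means, as σ shrinks.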

Theorems & Definitions (14)

  • Definition 4.1: Kernel score
  • Proposition 5.1
  • Proposition 5.2
  • Proposition 5.3
  • proof: Proof of the proposition on epistemic uncertainty (prop:eu)
  • proof: Proof of the proposition on aleatoric uncertainty (prop:au)
  • proof: Proof of the proposition on robustness (prop:robustness)
  • Proposition A.1
  • proof
  • Proposition A.2
  • ...and 4 more
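Definition 4.1 is listed here by name only. As a reference point, the standard kernel score for a (conditionally) positive definite kernel $k$, the divergence it induces, and its energy-score special case read as follows; the normalization used in the paper itself may differ.

```latex
S_k(P, y) = \tfrac{1}{2}\,\mathbb{E}_{X, X' \sim P}\big[k(X, X')\big] - \mathbb{E}_{X \sim P}\big[k(X, y)\big] + \tfrac{1}{2}\,k(y, y)

\mathbb{E}_{Y \sim Q}\big[S_k(P, Y)\big] - \mathbb{E}_{Y \sim Q}\big[S_k(Q, Y)\big]
  = \tfrac{1}{2}\Big(\mathbb{E}\,k(X, X') - 2\,\mathbb{E}\,k(X, Y) + \mathbb{E}\,k(Y, Y')\Big)
  = \tfrac{1}{2}\,\mathrm{MMD}_k^2(P, Q), \qquad X, X' \sim P,\; Y, Y' \sim Q

k(x, y) = -\lVert x - y \rVert \;\Rightarrow\; S_k(P, y) = \mathbb{E}\lVert X - y \rVert - \tfrac{1}{2}\,\mathbb{E}\lVert X - X' \rVert \quad \text{(energy score)}
```

With $k(x, y) = -\lVert x - y \rVert$, the induced divergence $\tfrac{1}{2}\mathrm{MMD}_k^2$ is half the energy distance between $P$ and $Q$.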