Nonparametric Distribution Regression Re-calibration

Ádám Jung, Domokos M. Kelen, András A. Benczúr

TL;DR

This work proposes a novel nonparametric re-calibration algorithm capable of correcting calibration error without restrictive modeling assumptions, and introduces a novel characteristic kernel over distributions that can be evaluated in $\mathcal{O}(n \log n)$ time for empirical distributions of size $n$.

Abstract

A key challenge in probabilistic regression is ensuring that predictive distributions accurately reflect true empirical uncertainty. Minimizing overall prediction error often encourages models to prioritize informativeness over calibration, producing narrow but overconfident predictions. However, in safety-critical settings, trustworthy uncertainty estimates are often more valuable than narrow intervals. Recognizing this problem, several recent works have focused on post-hoc corrections; yet existing methods either rely on weak notions of calibration (such as PIT uniformity) or impose restrictive parametric assumptions on the nature of the error. To address these limitations, we propose a novel nonparametric re-calibration algorithm based on conditional kernel mean embeddings, capable of correcting calibration error without restrictive modeling assumptions. For efficient inference with real-valued targets, we introduce a novel characteristic kernel over distributions that can be evaluated in $\mathcal{O}(n \log n)$ time for empirical distributions of size $n$. We demonstrate that our method consistently outperforms prior re-calibration approaches across a diverse set of regression benchmarks and model classes.
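
To illustrate why PIT uniformity is considered a weak notion of calibration, the following is a minimal sketch on a toy problem that is not taken from the paper. It assumes $X \sim \mathcal{N}(0, 1)$ and $Y \mid X \sim \mathcal{N}(X, 1)$, and compares a model that ignores the input (always predicting the marginal $\mathcal{N}(0, 2)$) with the true conditional model. The function names `gaussian_cdf`, `ks_uniform_statistic`, and `simulate_pit` are hypothetical helpers introduced for this example only.

```python
import math
import random


def gaussian_cdf(y, mean, std):
    """CDF of a normal distribution N(mean, std^2) evaluated at y."""
    return 0.5 * (1.0 + math.erf((y - mean) / (std * math.sqrt(2.0))))


def ks_uniform_statistic(values):
    """Kolmogorov-Smirnov distance between a sample and Uniform(0, 1)."""
    u = sorted(values)
    n = len(u)
    # Largest gap between the empirical CDF (just before and just after
    # each sorted point) and the uniform CDF, which is the identity on [0, 1].
    return max(max((i + 1) / n - v, v - i / n) for i, v in enumerate(u))


def simulate_pit(n=20000, seed=0):
    """Return PIT values for an input-ignoring model and the true model.

    Toy data (an assumption of this sketch): X ~ N(0, 1), Y | X ~ N(X, 1).
    """
    rng = random.Random(seed)
    pit_marginal, pit_conditional = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = x + rng.gauss(0.0, 1.0)
        # Model A ignores x and predicts the marginal law of Y, N(0, 2).
        pit_marginal.append(gaussian_cdf(y, 0.0, math.sqrt(2.0)))
        # Model B predicts the true conditional law, N(x, 1).
        pit_conditional.append(gaussian_cdf(y, x, 1.0))
    return pit_marginal, pit_conditional
```

Calling `ks_uniform_statistic` on either list returned by `simulate_pit()` should give a small value, since both sets of PIT values are approximately uniform. The input-ignoring model passes the PIT check even though its predictions carry no information about $X$, which is the kind of gap that stronger calibration notions are meant to close.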

Paper Structure (29 sections, 5 theorems, 40 equations, 7 figures, 1 table)

Key Result

Lemma 4.1

The sum of calibration error and lack of sharpness is equal to the divergence from perfect predictions, i.e., the expected error score $\mathbb{E}\left[S(Q, Y)\right]$ is equal to

Figures (7)

  • Figure 1: Fraction of random train-test splits where the hypothesis of auto-calibration was accepted by SKCE at $\alpha = 5\%$. The numbers after the dataset name indicate the size of the test set $|\mathcal{D}_{test}|$, allowing the power of the hypothesis test to be assessed. See \ref{ax:benchmark_detailed_results} for detailed results.
  • Figure 2: CRPS loss relative to the base model trained only on the test set ($\mathrm{None(T)}$). See \ref{ax:benchmark_detailed_results} for detailed results.
  • Figure 3: Fraction of splits where the hypothesis of PIT-calibration was accepted at $\alpha = 5\%$. The numbers after the dataset name indicate the size of the test set $|\mathcal{D}_{test}|$, allowing the power of the hypothesis test to be assessed. See \ref{ax:benchmark_detailed_results} for detailed results.
  • Figure 4: Detailed benchmark results for base model $\mathrm{GDN}$.
  • Figure 5: Detailed benchmark results for base model $\mathrm{MDN}$.
  • ...and 2 more figures
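
The CRPS reported in Figure 2 can be computed exactly for an empirical predictive distribution. The sketch below is not the paper's kernel or evaluation code; it only shows how sorting brings a pairwise-distance quantity on the real line down to $\mathcal{O}(n \log n)$ time. It uses the standard identity $\mathrm{CRPS}(F, y) = \mathbb{E}|X - y| - \tfrac{1}{2}\mathbb{E}|X - X'|$ with $X, X'$ drawn independently from the empirical distribution. The function names `crps_empirical` and `crps_empirical_bruteforce` are hypothetical.

```python
def crps_empirical(samples, y):
    """CRPS of the empirical distribution of `samples` at observation `y`.

    Runs in O(n log n) time: the pairwise term is computed from the
    sorted samples instead of looping over all n^2 pairs.
    """
    xs = sorted(samples)
    n = len(xs)
    # First term: E|X - y| under the empirical distribution.
    abs_err = sum(abs(x - y) for x in xs) / n
    # Sum over i < j of (x_(j) - x_(i)): the k-th sorted value (0-indexed)
    # is added k times and subtracted (n - 1 - k) times.
    half_pair_sum = sum((2 * k - n + 1) * x for k, x in enumerate(xs))
    # Second term: 0.5 * E|X - X'| = (2 * half_pair_sum) / (2 * n^2).
    return abs_err - half_pair_sum / (n * n)


def crps_empirical_bruteforce(samples, y):
    """O(n^2) reference implementation of the same quantity."""
    n = len(samples)
    abs_err = sum(abs(x - y) for x in samples) / n
    pair_sum = sum(abs(a - b) for a in samples for b in samples)
    return abs_err - pair_sum / (2 * n * n)
```

For example, `crps_empirical([0.0, 1.0], 0.5)` evaluates to `0.25`, and the two functions agree up to floating-point error on arbitrary inputs.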

Theorems & Definitions (15)

  • Definition 3.1
  • Lemma 4.1
  • proof
  • Definition 5.1
  • Proposition 5.2
  • proof
  • proof
  • proof
  • Definition A.1
  • Proposition A.2
  • ...and 5 more