Nonparametric Distribution Regression Re-calibration

Ádám Jung; Domokos M. Kelen; András A. Benczúr

TL;DR

This work proposes a novel nonparametric re-calibration algorithm capable of correcting calibration error without restrictive modeling assumptions, and introduces a novel characteristic kernel over distributions that can be evaluated in $\mathcal{O}(n \log n)$ time for empirical distributions of size $n$.

Abstract

A key challenge in probabilistic regression is ensuring that predictive distributions accurately reflect true empirical uncertainty. Minimizing overall prediction error often encourages models to prioritize informativeness over calibration, producing narrow but overconfident predictions. However, in safety-critical settings, trustworthy uncertainty estimates are often more valuable than narrow intervals. Recognizing this problem, several recent works have focused on post-hoc corrections; however, existing methods either rely on weak notions of calibration (such as PIT uniformity) or impose restrictive parametric assumptions on the nature of the error. To address these limitations, we propose a novel nonparametric re-calibration algorithm based on conditional kernel mean embeddings, capable of correcting calibration error without restrictive modeling assumptions. For efficient inference with real-valued targets, we introduce a novel characteristic kernel over distributions that can be evaluated in $\mathcal{O}(n \log n)$ time for empirical distributions of size $n$. We demonstrate that our method consistently outperforms prior re-calibration approaches across a diverse set of regression benchmarks and model classes.
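To give a feel for where an $\mathcal{O}(n \log n)$ cost can come from, the sketch below computes the standard squared energy distance between two one-dimensional empirical samples using sorting and prefix sums instead of all pairwise differences. This is only an illustration under stated assumptions: the section title "The Energy Distance Kernel (EDK)" suggests the proposed kernel builds on the energy distance, but its exact construction is not given in this summary, and the function names here are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def mean_abs_within(xs):
    """Mean of |a - b| over all ordered pairs (a, b) drawn from xs.

    After sorting, the element at index k is the larger value in k pairs
    and the smaller value in n - 1 - k pairs, so one pass suffices.
    """
    s = sorted(xs)
    n = len(s)
    half = sum((2 * k - n + 1) * v for k, v in enumerate(s))
    return 2.0 * half / (n * n)


def mean_abs_cross(xs, ys):
    """Mean of |x - y| over all pairs with x from xs and y from ys."""
    s = sorted(ys)
    m = len(s)
    prefix = [0.0] + list(accumulate(s))  # prefix[c] = sum of the c smallest ys
    total = prefix[-1]
    acc = 0.0
    for x in xs:
        c = bisect_right(s, x)  # number of ys that are <= x
        acc += (x * c - prefix[c]) + ((total - prefix[c]) - x * (m - c))
    return acc / (len(xs) * m)


def energy_distance_sq(xs, ys):
    """Squared energy distance (V-statistic form) between two 1-D samples."""
    return 2.0 * mean_abs_cross(xs, ys) - mean_abs_within(xs) - mean_abs_within(ys)
```

Sorting dominates the cost; every other step is a linear pass or a binary search, so the total work is $\mathcal{O}((n + m) \log (n + m))$ for samples of sizes $n$ and $m$, compared with $\mathcal{O}(nm)$ for the naive double loop.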

Paper Structure (29 sections, 5 theorems, 40 equations, 7 figures, 1 table)

Introduction
Related Work
Background
Proper Scoring Rules
Kernel Mean Embedding of distributions
Distance of mean embeddings
Conditional Kernel Mean Embedding
Notions of calibration
PIT calibration
Auto-calibration
Hypothesis testing calibration
Calibration vs. Sharpness principle
Re-calibration
Non-parametric calibration map estimation
The Energy Distance Kernel (EDK)
...and 14 more sections

Key Result

Lemma 4.1

The sum of calibration error and lack of sharpness is equal to the divergence from perfect predictions, i.e., the expected error score $\mathbb{E}\left[S(Q, Y)\right]$ is equal to

Figures (7)

Figure 1: Fraction of random train-test splits where the hypothesis of auto-calibration was accepted by SKCE at $\alpha = 5\%$. The numbers after the dataset name indicate the size of the test set $|\mathcal{D}_{test}|$, allowing the power of the hypothesis test to be assessed. See \ref{ax:benchmark_detailed_results} for detailed results.
Figure 2: CRPS loss relative to the base model trained only on the test set ($\mathrm{None(T)}$). See \ref{ax:benchmark_detailed_results} for detailed results.
Figure 3: Ratio of splits where the hypothesis of PIT-calibration was accepted at $\alpha = 5\%$. The numbers after the dataset name indicate the size of the test set $|\mathcal{D}_{test}|$, allowing the power of the hypothesis test to be assessed. See \ref{ax:benchmark_detailed_results} for detailed results.
Figure 4: Detailed benchmark results for base model $\mathrm{GDN}$.
Figure 5: Detailed benchmark results for base model $\mathrm{MDN}$.
...and 2 more figures

Theorems & Definitions (15)

Definition 3.1
Lemma 4.1
proof
Definition 5.1
Proposition 5.2
proof
proof
proof
Definition A.1
Proposition A.2
...and 5 more
