A Variational Estimator for $L_p$ Calibration Errors

Eugène Berta; Sacha Braun; David Holzmüller; Francis Bach; Michael I. Jordan

TL;DR

This work shows how to extend a recent variational framework for estimating calibration errors beyond divergences induced by proper losses, to cover a broad class of calibration errors induced by $L_p$ divergences.

Abstract

Calibration, the problem of ensuring that predicted probabilities align with observed class frequencies, is a basic desideratum for reliable prediction with machine learning systems. Calibration error is traditionally assessed via a divergence function, using the expected divergence between predictions and empirical frequencies. Accurately estimating this quantity is challenging, especially in the multiclass setting. Here, we show how to extend a recent variational framework for estimating calibration errors beyond divergences induced by proper losses, to cover a broad class of calibration errors induced by $L_p$ divergences. Our method can separate over- and under-confidence and, unlike non-variational approaches, avoids overestimation. We provide extensive experiments and integrate our code in the open-source package probmetrics (https://github.com/dholzmueller/probmetrics) for evaluating calibration errors.
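To make the overestimation point concrete, here is a minimal NumPy sketch for the binary case with $p = 1$, measuring $\mathbb{E}|\mathbb{E}[Y \mid f(X)] - f(X)|$ on the positive-class probability (the two-class $L_1$ norm is twice this). It rests on our own short derivation, not on the paper's code: since $|\cdot|$ is convex, $|C - f| = \sup_z \mathrm{sign}(z - f)(C - f)$, so for any recalibration map $g$ the tower property gives $\mathbb{E}[\mathrm{sign}(g(f(X)) - f(X))(Y - f(X))] = \mathbb{E}[\mathrm{sign}(g(f(X)) - f(X))(C - f(X))] \le \mathbb{E}|C - f(X)|$ with $C = \mathbb{E}[Y \mid f(X)]$. Fitting $g$ on other folds and averaging on held-out points therefore estimates a lower bound. The paper's experiments use isotonic regression for $g$; to stay NumPy-only, this sketch swaps in a simple histogram-binning recalibrator. All function names here are hypothetical and are not the probmetrics API.

```python
import numpy as np


def _bin_index(f, n_bins):
    # equal-width bins on [0, 1]; f == 1.0 goes into the last bin
    return np.minimum((f * n_bins).astype(int), n_bins - 1)


def binned_l1_ce(f, y, n_bins=15):
    """Plug-in binned estimate of E|E[Y | f(X)] - f(X)| (binary, positive class)."""
    bins = _bin_index(f, n_bins)
    total = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            # bin weight times |empirical frequency - mean prediction|
            total += mask.mean() * abs(y[mask].mean() - f[mask].mean())
    return total


def fit_histogram_recalibrator(f, y, n_bins=15):
    """Hypothetical recalibration map g: mean training label in the bin of f."""
    bins = _bin_index(f, n_bins)
    means = np.full(n_bins, np.nan)
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            means[b] = y[mask].mean()

    def g(f_new):
        out = means[_bin_index(f_new, n_bins)]
        # bins that were empty during fitting fall back to the identity map
        return np.where(np.isnan(out), f_new, out)

    return g


def variational_l1_ce(f, y, n_folds=5, n_bins=15, seed=0):
    """Cross-fitted estimate of E[sign(g(f) - f) * (Y - f)], a lower bound on the L1 CE."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(f)) % n_folds
    terms = np.empty(len(f))
    for k in range(n_folds):
        train, test = folds != k, folds == k
        g = fit_histogram_recalibrator(f[train], y[train], n_bins)
        # the variational objective is evaluated only on held-out points
        terms[test] = np.sign(g(f[test]) - f[test]) * (y[test] - f[test])
    return terms.mean()


rng = np.random.default_rng(0)
f = rng.uniform(0.0, 1.0, size=20_000)
y = (rng.uniform(size=f.shape) < f).astype(float)  # calibrated: E[Y | f] = f
shifted = np.clip(f + 0.1, 0.0, 1.0)  # mis-calibrated predictions for the same labels

print("binned, calibrated:", binned_l1_ce(f, y))  # strictly positive, although the true CE is 0
print("variational, calibrated:", variational_l1_ce(f, y))
print("variational, shifted:", variational_l1_ce(shifted, y))
```

On calibrated predictions the binned plug-in estimate is a weighted sum of absolute deviations and so stays strictly positive, whereas the cross-fitted variational estimate should fluctuate around zero (it can even be slightly negative). For the shifted predictions the true value is $0.9 \cdot 0.1 + 0.1 \cdot 0.05 = 0.095$, and the variational estimate should land close to it.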

Paper Structure

This paper contains 30 sections, 2 theorems, 23 equations, 3 figures, 2 tables, and 1 algorithm.

INTRODUCTION
Contributions.
ESTIMATING PROPER CALIBRATION ERRORS
Proper calibration errors.
A variational estimator.
Obtaining a lower bound with cross-validation.
ESTIMATING $L_p$ CALIBRATION ERRORS
EXPERIMENTS
Obtaining a lower bound with cross-validation.
Approaching the true calibration error with better classifiers.
ALGORITHMIC PROCEDURE
ESTIMATING GENERAL DISTANCES
ESTIMATING OVER- AND UNDER-CONFIDENCE
Binary case.
Multiclass case.
...and 15 more sections

Key Result

Proposition 1

For $p \geq 1, z \in \Delta_k, Y \in \mathcal{Y}$, define where $\nabla_z \| z - f(X) \|_p = \mathrm{sign}(z-f(X)) \odot \frac{| z - f(X) |^{p-1}}{\| z - f(X) \|_p^{p-1}}$, with element-wise power and $\mathrm{sign}$ in the numerator (for $p=1$, $\nabla_z \| z - f(X) \|_p =\mathrm{sign}(z-f(X))$). Then,
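The gradient formula in Proposition 1 can be sanity-checked numerically. The short NumPy sketch below (function names are our own, hypothetical choices) implements $\mathrm{sign}(z-f(X)) \odot |z-f(X)|^{p-1} / \|z-f(X)\|_p^{p-1}$, with the stated $p=1$ special case, and compares it with central finite differences of $z \mapsto \|z - f(X)\|_p$ at a point where no coordinate of $z - f(X)$ vanishes, so that the norm is differentiable there.

```python
import numpy as np


def lp_norm_grad(z, f, p):
    """Gradient of z -> ||z - f||_p as stated in Proposition 1 (assumes z != f)."""
    d = z - f
    if p == 1:
        return np.sign(d)
    norm = np.linalg.norm(d, ord=p)
    # element-wise sign and power in the numerator, scalar power of the norm below
    return np.sign(d) * np.abs(d) ** (p - 1) / norm ** (p - 1)


def finite_difference_grad(z, f, p, eps=1e-6):
    """Central finite differences of z -> ||z - f||_p, one coordinate at a time."""
    grad = np.zeros_like(z)
    for i in range(z.size):
        step = np.zeros_like(z)
        step[i] = eps
        plus = np.linalg.norm(z + step - f, ord=p)
        minus = np.linalg.norm(z - step - f, ord=p)
        grad[i] = (plus - minus) / (2 * eps)
    return grad


# two points of the simplex Delta_3 (entries sum to 1)
z = np.array([0.5, 0.3, 0.2])
f = np.array([0.2, 0.5, 0.3])
for p in (1, 1.5, 2, 3):
    print(p, lp_norm_grad(z, f, p), finite_difference_grad(z, f, p))
```

For $p = 2$ the formula reduces to $(z - f(X)) / \|z - f(X)\|_2$, a unit vector, while for $p = 1$ it is simply the coordinate-wise sign of $z - f(X)$.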

Figures (3)

Figure 1: Estimated $\mathrm{CE}_{|\cdot|}$ by number of samples when the predictions are calibrated (top), over-confident (middle), or shifted by a small parameter (bottom).
Figure 2: Different simulated mis-calibration scenarios. Predictions are either over-confident (left), under-confident (middle) or a mix of both (right).
Figure 3: Estimation of $\mathrm{CE}_{\|\cdot\|_2}$ with binning, isotonic regression with over-fitting, and cross-validated isotonic regression on synthetic multiclass datasets with $3$ classes (left) and $10$ classes (right).

Theorems & Definitions (7)

Proposition 1
proof
Remark 1
Remark 2
Remark 3
Proposition 2: Multiclass extension of Proposition 3.1 in braun2025conditional
proof
