A Variational Estimator for $L_p$ Calibration Errors

Eugène Berta, Sacha Braun, David Holzmüller, Francis Bach, Michael I. Jordan

TL;DR

This work extends a recent variational framework for estimating calibration errors beyond divergences induced by proper losses, to cover a broad class of calibration errors induced by $L_p$ divergences.

Abstract

Calibration, the problem of ensuring that predicted probabilities align with observed class frequencies, is a basic desideratum for reliable prediction with machine learning systems. Calibration error is traditionally assessed via a divergence function, using the expected divergence between predictions and empirical frequencies. Accurately estimating this quantity is challenging, especially in the multiclass setting. Here, we show how to extend a recent variational framework for estimating calibration errors beyond divergences induced by proper losses, to cover a broad class of calibration errors induced by $L_p$ divergences. Our method can separate over- and under-confidence and, unlike non-variational approaches, avoids overestimation. We provide extensive experiments and integrate our code in the open-source package probmetrics (https://github.com/dholzmueller/probmetrics) for evaluating calibration errors.

Paper Structure

This paper contains 30 sections, 2 theorems, 23 equations, 3 figures, 2 tables, 1 algorithm.

Key Result

Proposition 1

For $p \geq 1, z \in \Delta_k, Y \in \mathcal{Y}$, define where $\nabla_z \| z - f(X) \|_p = \mathrm{sign}(z-f(X)) \odot \frac{| z - f(X) |^{p-1}}{\| z - f(X) \|_p^{p-1}}$, with element-wise power and $\mathrm{sign}$ in the numerator (for $p=1$, $\nabla_z \| z - f(X) \|_p =\mathrm{sign}(z-f(X))$). Then,
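The gradient expression quoted in Proposition 1 can be sanity-checked numerically. The sketch below is not taken from the paper or from probmetrics: the helper names `lp_norm`, `lp_norm_grad`, and `finite_diff_grad` are hypothetical, and returning the zero vector when $z = f(X)$ (where the norm is not differentiable) is our own choice of subgradient. It implements the closed form $\mathrm{sign}(z-f(X)) \odot |z-f(X)|^{p-1} / \|z-f(X)\|_p^{p-1}$ and compares it with a central finite difference.

```python
import math


def lp_norm(v, p):
    """L_p norm of a vector given as a list of floats, for p >= 1."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)


def lp_norm_grad(z, fx, p):
    """Closed-form gradient of z -> ||z - fx||_p (hypothetical helper)."""
    d = [zi - fi for zi, fi in zip(z, fx)]
    sign = [math.copysign(1.0, x) if x != 0 else 0.0 for x in d]
    if p == 1:
        # For p = 1 the expression reduces to the element-wise sign.
        return sign
    norm = lp_norm(d, p)
    if norm == 0.0:
        # Assumption: the norm is not differentiable at z == fx;
        # we return the zero vector, which is a valid subgradient.
        return [0.0] * len(d)
    return [s * abs(x) ** (p - 1) / norm ** (p - 1) for s, x in zip(sign, d)]


def finite_diff_grad(z, fx, p, eps=1e-6):
    """Central finite-difference approximation of the same gradient."""
    grad = []
    for i in range(len(z)):
        hi = [zj + (eps if j == i else 0.0) for j, zj in enumerate(z)]
        lo = [zj - (eps if j == i else 0.0) for j, zj in enumerate(z)]
        up = lp_norm([a - b for a, b in zip(hi, fx)], p)
        down = lp_norm([a - b for a, b in zip(lo, fx)], p)
        grad.append((up - down) / (2 * eps))
    return grad


if __name__ == "__main__":
    z = [0.2, 0.5, 0.3]   # a point on the simplex (illustrative values)
    fx = [0.6, 0.3, 0.1]  # a predicted probability vector (illustrative values)
    for p in (1, 2, 3):
        print(p, lp_norm_grad(z, fx, p), finite_diff_grad(z, fx, p))
```

The two gradients should agree up to finite-difference error whenever no coordinate of $z - f(X)$ is exactly zero; for $p = 1$ the comparison breaks down at such coordinates because the absolute value has a kink there.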

Figures (3)

  • Figure 1: Estimated $\mathrm{CE}_{|\cdot|}$ by number of samples when the predictions are calibrated (top), over-confident (middle), or shifted by a small parameter (bottom).
  • Figure 2: Different simulated miscalibration scenarios. Predictions are either over-confident (left), under-confident (middle), or a mix of both (right).
  • Figure 3: Estimation of $\mathrm{CE}_{\|\cdot\|_2}$ with binning, isotonic regression with over-fitting, and cross-validated isotonic regression on synthetic multiclass datasets with $3$ classes (left) and $10$ classes (right).

Theorems & Definitions (7)

  • Proposition 1
  • proof
  • Remark 1
  • Remark 2
  • Remark 3
  • Proposition 2: Multiclass extension of Proposition 3.1 in braun2025conditional
  • proof