Calibration Error for Decision Making

Lunjia Hu, Yifan Wu

TL;DR

A new efficient algorithm for online calibration that achieves near-optimal expected Calibration Decision Loss, bypassing the $\Omega(T^{-0.472})$ lower bound for ECE by Qiao and Valiant (2021).

Abstract

Calibration allows predictions to be reliably interpreted as probabilities by decision makers. We propose a decision-theoretic calibration error, the Calibration Decision Loss (CDL), defined as the maximum improvement in decision payoff obtained by calibrating the predictions, where the maximum is over all payoff-bounded decision tasks. Vanishing CDL guarantees the payoff loss from miscalibration vanishes simultaneously for all downstream decision tasks. We show separations between CDL and existing calibration error metrics, including the most well-studied metric Expected Calibration Error (ECE). Our main technical contribution is a new efficient algorithm for online calibration that achieves near-optimal $O(\frac{\log T}{\sqrt{T}})$ expected CDL, bypassing the $\Omega(T^{-0.472})$ lower bound for ECE by Qiao and Valiant (2021).
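
As a rough illustration of how ECE and the payoff loss from miscalibration can differ, the following sketch computes both on toy data. The data, the two payoff tables, and all function names are assumptions made for this example and are not taken from the paper. The sketch evaluates the gain from calibration for one decision task at a time, whereas CDL takes the maximum over all payoff-bounded tasks.

```python
from collections import defaultdict


def group_by_prediction(preds, outcomes):
    """Map each distinct prediction to (count, empirical frequency of outcome 1)."""
    buckets = defaultdict(list)
    for p, y in zip(preds, outcomes):
        buckets[p].append(y)
    return {p: (len(ys), sum(ys) / len(ys)) for p, ys in buckets.items()}


def ece(preds, outcomes):
    """Average absolute gap between each prediction and its empirical frequency."""
    groups = group_by_prediction(preds, outcomes)
    return sum(n * abs(p - freq) for p, (n, freq) in groups.items()) / len(preds)


def expected_payoff(payoff, action, belief):
    """payoff[action] = (payoff in state 0, payoff in state 1)."""
    pay0, pay1 = payoff[action]
    return (1 - belief) * pay0 + belief * pay1


def best_action(payoff, belief):
    """Action maximizing expected payoff when the state is 1 with probability `belief`."""
    return max(payoff, key=lambda a: expected_payoff(payoff, a, belief))


def calibration_gain(preds, outcomes, payoff):
    """Average payoff gained by best-responding to the empirical frequency
    instead of the raw prediction, for a single decision task."""
    groups = group_by_prediction(preds, outcomes)
    total = 0.0
    for p, (n, freq) in groups.items():
        raw = best_action(payoff, p)     # trusts the raw prediction
        cal = best_action(payoff, freq)  # trusts the calibrated prediction
        total += n * (expected_payoff(payoff, cal, freq)
                      - expected_payoff(payoff, raw, freq))
    return total / len(preds)


# Toy data (assumed): prediction 0.4 has frequency 0.2, prediction 0.6 has frequency 0.8.
preds = [0.4] * 5 + [0.6] * 5
outcomes = [1, 0, 0, 0, 0] + [1, 1, 1, 1, 0]

# Two hypothetical tasks with payoffs in [0, 1]; the action switches at belief 1/2 and 0.3.
threshold_half = {"low": (1.0, 0.0), "high": (0.0, 1.0)}
threshold_03 = {"safe": (0.3, 0.3), "risky": (0.0, 1.0)}

print("ECE:", ece(preds, outcomes))
print("gain, threshold 1/2:", calibration_gain(preds, outcomes, threshold_half))
print("gain, threshold 0.3:", calibration_gain(preds, outcomes, threshold_03))
```

On this data the ECE is about 0.2, yet the first task gains nothing from calibration because no prediction crosses its threshold; only the second task, whose threshold lies between 0.2 and 0.4, sees a positive gain.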

Paper Structure (32 sections, 24 theorems, 90 equations, 3 figures, 1 algorithm)

Key Result

Lemma 1.3

Let $T,m$ be positive integers satisfying $m = \Theta (\sqrt T)$. Define $Q = \{q_1,\ldots,q_m\}\subseteq [0,1]$ where $q_i = i/m$ for every $i = 1,\ldots,m$. Given a sequence of predictions $\bm{p} = (p_1,\ldots,p_T)\in Q^T$ and realized states $\bm{\theta} = (\theta_1,\ldots,\theta_T)\in \{0,1\}^T$ ...

Figures (3)

  • Figure 1: In this example, $\textsc{ECE}$ overestimates the decision loss from miscalibration for a specific decision task. The plot visualizes the predictions in $[0, 1]$. The best-response decision rule changes action at threshold $1/2$ (red). When the miscalibrated predictor predicts $0.4$ (blue), the actual empirical frequency is $0.2$; and when $0.6$ (blue) is predicted, the empirical frequency is $0.8$. Miscalibration induces no loss to the decision maker, since in both cases the prediction and the corresponding empirical frequency lie on the same side of the threshold, recommending the same action.
  • Figure 2: A graphical explanation of the connection between proper scoring rules and Bregman divergence. The thick convex curve plots the convex utility function $u(p)$ for a proper scoring rule. For a fixed report $p$, the score $S(p, \theta) = u(p) + \nabla u(p) (\theta - p)$ is given by the extreme points of the gradient hyperplane passing through $u(p)$ (the thin line). Given the empirical distribution $\widehat{p}$, the Bregman divergence $\textsc{Breg}(p, \widehat{p})$ is the loss from reporting $p$ instead of $\widehat{p}$.
  • Figure 3: The thick black line plots the special convex function $u$ for the scoring rule. This convex utility function is V-shaped, consisting of two linear pieces intersecting at $\mu$. Once the prediction $p$ is fixed, the score $S(p, \theta)= u(p) + \nabla u(p)\cdot(\theta - p)$ is linear in the state $\theta$. The scoring rule offers two sets of scores for the prediction to select from, one for each linear piece. When $p\leq q_0$, the prediction selects scores $\{S(0, 0), S(0, 1)\}$. Otherwise, the prediction selects $\{S(1, 0), S(1, 1)\}$.
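
The relationship described in the captions of Figures 2 and 3 can be checked numerically. The sketch below assumes the V-shaped utility $u(p) = |p - \mu|$ (one possible normalization, not necessarily the paper's), picks the left slope at the kink, and uses hypothetical function names. It verifies that the expected score lost by reporting $p$ when the empirical frequency is $\widehat{p}$ equals $u(\widehat{p}) - u(p) - \nabla u(p)(\widehat{p} - p)$, using the prediction/frequency pairs quoted for Figure 1.

```python
def u(p, mu):
    """Assumed V-shaped convex utility with its kink at mu."""
    return abs(p - mu)


def grad_u(p, mu):
    """A subgradient of u; the left slope is chosen at the kink (an assumption)."""
    return -1.0 if p <= mu else 1.0


def score(p, theta, mu):
    """S(p, theta) = u(p) + grad u(p) * (theta - p), linear in the state theta."""
    return u(p, mu) + grad_u(p, mu) * (theta - p)


def expected_score(report, freq, mu):
    """Expected score of `report` when the state is 1 with probability `freq`."""
    return (1 - freq) * score(report, 0, mu) + freq * score(report, 1, mu)


def bregman(report, freq, mu):
    """Gap between u(freq) and the tangent line at `report`, evaluated at `freq`."""
    return u(freq, mu) - (u(report, mu) + grad_u(report, mu) * (freq - report))


# (prediction, empirical frequency) pairs as quoted for Figure 1
pairs = [(0.4, 0.2), (0.6, 0.8)]

for mu in (0.5, 0.3):
    for report, freq in pairs:
        # Loss from reporting `report` rather than the truthful `freq`
        loss = expected_score(freq, freq, mu) - expected_score(report, freq, mu)
        assert abs(loss - bregman(report, freq, mu)) < 1e-12
        print(f"mu={mu}, report={report}, freq={freq}, loss={loss:.3f}")
```

With the kink at $1/2$ both losses vanish, matching the observation that miscalibration costs nothing when the prediction and the frequency lie on the same side of the threshold. Moving the kink to $0.3$ separates $0.4$ from $0.2$ and yields a positive loss for that pair.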

Theorems & Definitions (64)

  • Example 1.1
  • Example 1.2
  • Lemma 1.3: Informal special case of \ref{lm:attribute}
  • Definition 2.1
  • Definition 2.2
  • Definition 2.3: $\textsc{ECE}$
  • Definition 2.4: $K_{2}$ calibration error
  • Definition 2.5: Smooth Calibration Error, smooth
  • Definition 2.6: Distance to Calibration, utc
  • Lemma 2.7: utc
  • ...and 54 more