Smooth Calibration and Decision Making

Jason Hartline, Yifan Wu, Yunran Yang

TL;DR

The paper addresses the mismatch between ML calibration and decision-making calibration, showing that post-processing a predictor with distance-to-calibration $\epsilon$ can achieve $\text{ECE}$ and $\text{CDL}$ bounds of $O(\sqrt{\epsilon})$. It develops a privacy-based post-processing framework that adds truncated noise to predictions, yielding provable bounds in both batch and online settings and establishing asymptotic optimality via matching lower bounds. Two concrete noise schemes (truncated Laplace and truncated Gaussian) are analyzed, with explicit bounds and tightness results, and online discretization is shown to preserve comparable guarantees with an additional discretization term. The findings clarify the trade-offs between ML calibration and decision-making calibration and relate the approach to omniprediction and online calibration literature. Overall, the work provides a principled method to render ML predictors trustworthy for decision-makers while delineating fundamental limits of post-processing versus direct optimization.

Abstract

Calibration requires predictor outputs to be consistent with their Bayesian posteriors. For machine learning predictors that do not distinguish between small perturbations, calibration errors are continuous in predictions, e.g., smooth calibration error (Foster and Hart, 2018) and Distance to Calibration (Błasiok et al., 2023a). In contrast, decision-makers who use predictions make optimal decisions discontinuously in probabilistic space, and so experience loss from miscalibration discontinuously. Calibration errors for decision-making are thus discontinuous, e.g., Expected Calibration Error (Foster and Vohra, 1997) and Calibration Decision Loss (Hu and Wu, 2024). As a result, predictors with a low calibration error for machine learning may suffer a high calibration error for decision-making, i.e., they may not be trustworthy for decision-makers who optimize assuming the predictions are correct. It is natural to ask whether post-processing a predictor with a low calibration error for machine learning can achieve a low calibration error for decision-making without loss. In our paper, we show that post-processing an online predictor with $\epsilon$ distance to calibration achieves $O(\sqrt{\epsilon})$ ECE and CDL, which is asymptotically optimal. The post-processing algorithm adds noise to make predictions differentially private. However, even this optimal bound for post-processing predictors with low distance to calibration is worse than the guarantees of existing online calibration algorithms that directly optimize for ECE and CDL.
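To make the gap between the two kinds of calibration error concrete, here is a minimal Python sketch, not the paper's algorithm. It uses a standard toy predictor that reports $1/2 - \epsilon$ when the outcome is 1 and $1/2 + \epsilon$ when it is 0. Moving every prediction to $1/2$ calibrates it, so its distance to calibration is at most $\epsilon$, yet its ECE is about $1/2$. The sketch then adds truncated Laplace noise and rounds to a grid. The function names, the noise scale (set to $\sqrt{\epsilon}$ purely for illustration), the truncation bound, the clipping to $[0, 1]$, and the grid width are all assumptions made for this example, and the code makes no claim of matching the paper's differential privacy parameters.

```python
import random
from collections import defaultdict


def truncated_laplace(scale, bound, rng):
    """Laplace(0, scale) noise conditioned on |noise| <= bound (rejection sampling)."""
    while True:
        noise = rng.choice((-1.0, 1.0)) * rng.expovariate(1.0 / scale)
        if abs(noise) <= bound:
            return noise


def post_process(predictions, scale, bound, grid, rng):
    """Add truncated noise, clip to [0, 1], round to a grid (all assumed choices)."""
    out = []
    for p in predictions:
        q = min(1.0, max(0.0, p + truncated_laplace(scale, bound, rng)))
        out.append(round(round(q / grid) * grid, 10))
    return out


def empirical_ece(predictions, outcomes):
    """Sample-weighted average of |v - mean outcome given v| over distinct values v."""
    groups = defaultdict(list)
    for p, y in zip(predictions, outcomes):
        groups[p].append(y)
    n = len(predictions)
    return sum(len(ys) / n * abs(v - sum(ys) / len(ys)) for v, ys in groups.items())


rng = random.Random(0)
eps = 0.01
outcomes = [i % 2 for i in range(4000)]
# Toy predictor: 1/2 - eps when the outcome is 1, 1/2 + eps when it is 0.
raw = [0.5 - eps if y == 1 else 0.5 + eps for y in outcomes]
noisy = post_process(raw, scale=0.1, bound=0.3, grid=0.05, rng=rng)

ece_raw = empirical_ece(raw, outcomes)      # large despite small distance to calibration
ece_noisy = empirical_ece(noisy, outcomes)  # noticeably smaller after adding noise
print(ece_raw, ece_noisy)
```

In this toy case the noise blends the two prediction values, so each post-processed value carries a mix of both outcomes and its conditional outcome frequency moves toward the reported value. The cost is that each prediction moves by up to the truncation bound plus half a grid cell.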

Paper Structure

This paper contains 22 sections, 22 theorems, 110 equations, 1 table.

Key Result

Proposition 2.10

Given a decision problem with proper scoring rule $S$ and a predictor $P$, the mapping $\sigma^*(p) = \Pr[\theta = 1 \mid p]$ is the swap-regret-maximizing mapping, i.e., the swap regret equals the payoff improvement from calibrating the predictor.
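As a small numerical illustration of this proposition, the following Python sketch uses the quadratic (Brier) score as one example of a proper scoring rule and empirical frequencies as a stand-in for $\Pr[\theta = 1 \mid p]$. The data, the helper names, and the choice of scoring rule are assumptions made for this example and are not taken from the paper. The sketch computes the payoff gain from the calibrating map and checks by brute force over a grid that no other remapping of the two prediction values gains more.

```python
from collections import defaultdict


def brier_score(q, theta):
    """Quadratic proper scoring rule written as a payoff (higher is better)."""
    return 1.0 - (q - theta) ** 2


def calibrating_map(predictions, outcomes):
    """Empirical sigma*(p): frequency of theta = 1 among samples with prediction p."""
    groups = defaultdict(list)
    for p, y in zip(predictions, outcomes):
        groups[p].append(y)
    return {p: sum(ys) / len(ys) for p, ys in groups.items()}


def payoff_gain(predictions, outcomes, sigma):
    """Average payoff improvement from reporting sigma[p] instead of p."""
    n = len(predictions)
    return sum(brier_score(sigma[p], y) - brier_score(p, y)
               for p, y in zip(predictions, outcomes)) / n


# Hypothetical data: prediction 0.2 is right half the time, 0.8 always.
predictions = [0.2, 0.2, 0.2, 0.2, 0.8, 0.8, 0.8, 0.8]
outcomes = [0, 0, 1, 1, 1, 1, 1, 1]

sigma_star = calibrating_map(predictions, outcomes)   # {0.2: 0.5, 0.8: 1.0}
swap_regret = payoff_gain(predictions, outcomes, sigma_star)

# Brute-force check: no remapping on a grid gains more than sigma*.
grid = [i / 10 for i in range(11)]
best_on_grid = max(payoff_gain(predictions, outcomes, {0.2: a, 0.8: b})
                   for a in grid for b in grid)
print(swap_regret, best_on_grid)
```

For the quadratic score, the gain within each group of identical predictions is the squared gap between the prediction and the group's outcome frequency, which is why the calibrating map is the best remapping here.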

Theorems & Definitions (48)

  • Definition 2.1: Differential Privacy
  • Definition 2.2: Scoring Rule from Decision
  • Definition 2.3: Proper Score
  • Claim 2.4: Kleinberg et al. (2023); Hu and Wu (2024)
  • Definition 2.5: Decision Loss
  • Definition 2.6: Omniprediction
  • Definition 2.7: Perfect Calibration
  • Definition 2.8: Expected Calibration Error, $\text{ECE}$
  • Definition 2.9: Swap Regret
  • Proposition 2.10
  • ...and 38 more