Smooth Calibration and Decision Making
Jason Hartline, Yifan Wu, Yunran Yang
TL;DR
The paper addresses the mismatch between ML calibration and decision-making calibration, showing that post-processing a predictor with distance-to-calibration $\epsilon$ can achieve $\text{ECE}$ and $\text{CDL}$ bounds of $O(\sqrt{\epsilon})$. It develops a privacy-based post-processing framework that adds truncated noise to predictions, yielding provable bounds in both batch and online settings and establishing asymptotic optimality via matching lower bounds. Two concrete noise schemes (truncated Laplace and truncated Gaussian) are analyzed, with explicit bounds and tightness results, and online discretization is shown to preserve comparable guarantees with an additional discretization term. The findings clarify the trade-offs between ML calibration and decision-making calibration and relate the approach to omniprediction and online calibration literature. Overall, the work provides a principled method to render ML predictors trustworthy for decision-makers while delineating fundamental limits of post-processing versus direct optimization.
Abstract
Calibration requires predictor outputs to be consistent with their Bayesian posteriors. For machine learning predictors that do not distinguish between small perturbations, calibration errors are continuous in predictions, e.g., smooth calibration error (Foster and Hart, 2018) and distance to calibration (Blasiok et al., 2023a). In contrast, decision-makers who use predictions choose optimal actions that change discontinuously in the predicted probability, and so experience loss from miscalibration discontinuously. Calibration errors for decision-making are thus discontinuous, e.g., Expected Calibration Error (ECE; Foster and Vohra, 1997) and Calibration Decision Loss (CDL; Hu and Wu, 2024). Consequently, predictors with a low calibration error for machine learning may suffer a high calibration error for decision-making, i.e., they may not be trustworthy for decision-makers who optimize under the assumption that the predictions are correct. It is natural to ask whether a predictor with a low calibration error for machine learning can be post-processed, without loss, to achieve a low calibration error for decision-making. We show that post-processing an online predictor with distance to calibration $\epsilon$ achieves $O(\sqrt{\epsilon})$ ECE and CDL, which is asymptotically optimal. The post-processing algorithm adds noise to make predictions differentially private. The optimal bound obtained by post-processing predictors with low distance to calibration is nevertheless suboptimal compared with existing online calibration algorithms that directly optimize for ECE and CDL.
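The gap between the two kinds of calibration error, and the effect of noise-based post-processing, can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the paper's algorithm: the function names (`truncated_laplace`, `binned_ece`), the toy predictor, the choice of noise scale $\sqrt{\epsilon}$, the truncation bound, and the use of a 10-bin ECE estimate are all illustrative choices that do not come from the source. The toy predictor outputs $1/2 + \epsilon$ when the outcome is 1 and $1/2 - \epsilon$ when it is 0, so it is within $\epsilon$ of the calibrated constant predictor $1/2$, yet its ECE is close to $1/2$.

```python
import math
import random


def truncated_laplace(rng, scale, bound):
    """Draw Laplace(0, scale) noise conditioned on |noise| <= bound (inverse CDF)."""
    u = rng.random()
    # Magnitude follows an exponential distribution truncated to [0, bound].
    magnitude = -scale * math.log(1.0 - u * (1.0 - math.exp(-bound / scale)))
    magnitude = min(magnitude, bound)  # guard against floating-point overshoot
    return magnitude if rng.random() < 0.5 else -magnitude


def binned_ece(preds, outcomes, n_bins=10):
    """Binned ECE estimate: sum over bins of |sum(pred) - sum(outcome)| / n."""
    pred_sum = [0.0] * n_bins
    out_sum = [0.0] * n_bins
    for p, y in zip(preds, outcomes):
        b = min(int(p * n_bins), n_bins - 1)
        pred_sum[b] += p
        out_sum[b] += y
    return sum(abs(ps - ys) for ps, ys in zip(pred_sum, out_sum)) / len(preds)


# Toy data (assumption): half the outcomes are 1, half are 0.
eps = 0.01
n = 20000
outcomes = [1] * (n // 2) + [0] * (n // 2)
raw_preds = [0.5 + eps if y == 1 else 0.5 - eps for y in outcomes]

# Post-process by adding truncated Laplace noise, then clipping to [0, 1].
# Scale sqrt(eps) and bound 3 * scale are illustrative choices only.
rng = random.Random(0)
scale = math.sqrt(eps)
bound = 3 * scale
noisy_preds = [
    min(1.0, max(0.0, p + truncated_laplace(rng, scale, bound))) for p in raw_preds
]

raw_ece = binned_ece(raw_preds, outcomes)
noisy_ece = binned_ece(noisy_preds, outcomes)
print(f"binned ECE before noise: {raw_ece:.3f}")
print(f"binned ECE after noise:  {noisy_ece:.3f}")
```

In this sketch the raw predictor reveals the outcome exactly through which side of $1/2$ it falls on, which is what drives its ECE toward $1/2$. Once the noise scale is large relative to $\epsilon$, a noisy prediction carries little information about which of the two raw values produced it, so the remaining error is governed by the noise magnitude rather than by the original discontinuity.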
