Uniform convergence of the smooth calibration error and its relationship with functional gradient
Futoshi Futami, Atsushi Nitanda
TL;DR
The paper studies when a learning algorithm can achieve both high accuracy and good calibration, using the smooth calibration error (smCE) in a finite-sample setting. It proves a uniform convergence bound for smCE and relates the training smCE to the norm of the empirical functional gradient of the loss, which gives a principled route to improving calibration without sacrificing accuracy. For three case studies (gradient boosting trees, kernel boosting, and two-layer neural networks), the authors derive concrete conditions under which calibration and misclassification performance are guaranteed simultaneously, and they highlight trade-offs between the number of iterations and model complexity. The results offer theoretical guidance for designing probabilistic predictors with provable calibration guarantees, and the framework also clarifies the relationship between smCE and traditional calibration metrics.
Abstract
Calibration is a critical requirement for reliable probabilistic prediction, especially in high-risk applications. However, the theoretical understanding of which learning algorithms can simultaneously achieve high accuracy and good calibration remains limited, and many existing studies provide only empirical validation or theoretical guarantees in restrictive settings. To address this issue, we focus on the smooth calibration error (CE) and provide a uniform convergence bound, showing that the smooth CE is bounded by the sum of the smooth CE over the training dataset and a generalization gap. We further prove that the functional gradient of the loss function can effectively control the training smooth CE. Based on this framework, we analyze three representative algorithms: gradient boosting trees, kernel boosting, and two-layer neural networks. For each, we derive conditions under which both classification and calibration performance are simultaneously guaranteed. Our results offer new theoretical insights and practical guidance for designing reliable probabilistic models with provable calibration guarantees.
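
For orientation, the quantities named in the abstract can be written out roughly as follows. This is a sketch under stated assumptions, not the paper's exact statements: it uses the standard definition of the smooth calibration error for binary labels from the calibration literature, and the symbols $f$, $g$, $h$, $\mathcal{F}$, $\mathrm{smCE}_n$, and $\mathrm{gap}_n$ are notation chosen here for illustration. The last two lines assume a sigmoid link $f = \sigma(g)$ with cross-entropy loss, as one concrete instance of how a functional gradient can control the training smooth CE.

```latex
\begin{align*}
% Population smooth CE of a predictor f : X -> [0,1] with labels Y in {0,1};
% the supremum runs over 1-Lipschitz test functions h : [0,1] -> [-1,1].
\mathrm{smCE}(f)
  &= \sup_{h \in \mathrm{Lip}_1([0,1],[-1,1])}
     \mathbb{E}\big[\, h(f(X))\,(Y - f(X)) \,\big], \\
% Training (empirical) smooth CE on a sample (x_1, y_1), ..., (x_n, y_n).
\mathrm{smCE}_n(f)
  &= \sup_{h \in \mathrm{Lip}_1([0,1],[-1,1])}
     \frac{1}{n} \sum_{i=1}^{n} h(f(x_i))\,(y_i - f(x_i)), \\
% Shape of a uniform convergence bound: it holds for every f in the class F,
% with gap_n(F) a placeholder for the generalization gap.
\mathrm{smCE}(f)
  &\le \mathrm{smCE}_n(f) + \mathrm{gap}_n(\mathcal{F})
  \qquad \text{for all } f \in \mathcal{F}, \\
% Assumed setting: f = sigma(g) and cross-entropy loss ell. The derivative of
% the loss with respect to the logit g(x_i) is the residual f(x_i) - y_i.
\frac{\partial\, \ell(g(x_i), y_i)}{\partial\, g(x_i)}
  &= f(x_i) - y_i, \\
% Since |h| <= 1, Cauchy--Schwarz bounds the training smooth CE by the
% empirical L2 norm of these per-sample gradients.
\mathrm{smCE}_n(f)
  &\le \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - f(x_i) \rvert
   \le \Big( \frac{1}{n} \sum_{i=1}^{n} (f(x_i) - y_i)^2 \Big)^{1/2}.
\end{align*}
```

Read this way, driving the empirical gradient norm toward zero during training forces the training smooth CE toward zero as well, and the uniform bound then transfers that control to the population quantity up to the generalization gap.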
