Information-theoretic Generalization Analysis for Expected Calibration Error
Futoshi Futami, Masahiro Fujisawa
TL;DR
This work addresses the limited understanding of the estimation bias of binning-based calibration errors by giving a unified information-theoretic treatment of uniform width binning (UWB) and uniform mass binning (UMB). It derives upper bounds on the total bias of the binned expected calibration error (ECE) with an improved convergence rate, identifies the bin count B = O(n_te^{1/3}) that minimizes this bias, and shows that the resulting bias scales as O(n_te^{-1/3}), where n_te denotes the number of test samples. Extending the analysis to generalization, the authors develop information-theoretic (IT) bounds on the ECE and TCE gaps via eCMI and fCMI, relate these quantities to metric entropy, and analyze the effect of reusing data for recalibration. Experiments on synthetic and real datasets confirm that the bounds are nonvacuous and illustrate practical guidance on the number of bins, including the potential benefit of reusing training data for recalibration when calibration generalizes well.
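To see why a cube-root bin count goes together with a cube-root rate, consider the following schematic calculation. The form of the bound is an assumption made here for illustration and is not the paper's statement: C_1 and C_2 are hypothetical constants, the first term stands for the bias introduced by binning, and the second for the statistical error of estimating B bin averages from n_te samples.

```latex
% Schematic only: assumed form of the bound, hypothetical constants C_1, C_2 > 0.
\[
  \mathrm{Bias}(B) \;\lesssim\; g(B) := \frac{C_1}{B} + C_2 \sqrt{\frac{B}{n_{\mathrm{te}}}} .
\]
% Minimize g over B by setting its derivative to zero.
\[
  g'(B) = -\frac{C_1}{B^{2}} + \frac{C_2}{2\sqrt{B\, n_{\mathrm{te}}}} = 0
  \;\Longrightarrow\;
  B^{\star} = \left(\frac{2 C_1}{C_2}\right)^{2/3} n_{\mathrm{te}}^{1/3} .
\]
% Both terms are of order n_te^{-1/3} at B = B^*.
\[
  g(B^{\star}) = O\!\left(n_{\mathrm{te}}^{-1/3}\right) .
\]
```

Under this assumed form, fewer bins inflate the first term and more bins inflate the second, so the two terms balance at B of order n_te^{1/3}.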
Abstract
While the expected calibration error (ECE), which employs binning, is widely adopted to evaluate the calibration performance of machine learning models, theoretical understanding of its estimation bias is limited. In this paper, we present the first comprehensive analysis of the estimation bias in the two common binning strategies, uniform mass and uniform width binning. Our analysis establishes upper bounds on the bias, achieving an improved convergence rate. Moreover, our bounds reveal, for the first time, the optimal number of bins to minimize the estimation bias. We further extend our bias analysis to generalization error analysis based on the information-theoretic approach, deriving upper bounds that enable the numerical evaluation of how small the ECE is for unknown data. Experiments using deep learning models show that our bounds are nonvacuous thanks to this information-theoretic generalization analysis approach.
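The two binning strategies can be made concrete with a minimal sketch of a binned ECE estimator for binary classification. This is an illustration under stated assumptions, not the authors' implementation: the function names `binned_ece` and `cube_root_bins` are hypothetical, the estimator is the commonly used weighted sum of per-bin gaps between mean label and mean predicted probability, and the constant factor 1 in the cube-root bin count is an arbitrary choice.

```python
import numpy as np


def cube_root_bins(n_te):
    """Bin count of order n_te^(1/3). The constant factor 1 is an assumption."""
    return max(1, int(round(n_te ** (1.0 / 3.0))))


def binned_ece(probs, labels, n_bins, strategy="width"):
    """Binned ECE for binary labels in {0, 1} and predicted probabilities in [0, 1].

    strategy="width": bins are equal-length subintervals of [0, 1] (UWB).
    strategy="mass":  bins hold (nearly) equal numbers of samples (UMB).
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    n = probs.size

    if strategy == "width":
        # Map each probability to its subinterval; p = 1.0 goes to the last bin.
        bin_ids = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    elif strategy == "mass":
        # Rank the samples by predicted probability and split the ranks evenly.
        order = np.argsort(probs, kind="stable")
        bin_ids = np.empty(n, dtype=int)
        bin_ids[order] = (np.arange(n) * n_bins) // n
    else:
        raise ValueError("strategy must be 'width' or 'mass'")

    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            # Weight of the bin times |mean label - mean predicted probability|.
            gap = abs(labels[mask].mean() - probs[mask].mean())
            ece += mask.sum() / n * gap
    return ece


if __name__ == "__main__":
    # Synthetic, perfectly calibrated predictor: labels ~ Bernoulli(probs).
    rng = np.random.default_rng(0)
    n_te = 1000
    probs = rng.uniform(0.0, 1.0, size=n_te)
    labels = (rng.uniform(0.0, 1.0, size=n_te) < probs).astype(float)

    n_bins = cube_root_bins(n_te)
    print("bins:", n_bins)
    print("ECE (uniform width):", binned_ece(probs, labels, n_bins, "width"))
    print("ECE (uniform mass): ", binned_ece(probs, labels, n_bins, "mass"))
```

Because the synthetic predictor is calibrated by construction, any nonzero value printed here reflects estimation error of the binned estimator rather than genuine miscalibration, which is the kind of bias the analysis targets.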
