Information-theoretic Generalization Analysis for Expected Calibration Error

Futoshi Futami; Masahiro Fujisawa

TL;DR

This work addresses the gap in understanding the estimation bias of binning-based calibration errors by providing a unified information-theoretic treatment of both uniform width binning (UWB) and uniform mass binning (UMB). It derives sharp upper bounds on the total bias of the binned ECE, identifies the optimal bin count $B = O(n_{\mathrm{te}}^{1/3})$ that minimizes this bias, where $n_{\mathrm{te}}$ is the number of test samples, and shows that the resulting bias scales as $O(n_{\mathrm{te}}^{-1/3})$. Extending this to generalization analysis, the authors develop information-theoretic bounds for the ECE and TCE gaps via eCMI and fCMI, relate these bounds to metric entropy, and analyze the impact of reusing training data for recalibration. Experiments on synthetic and real datasets confirm that the bounds are nonvacuous and give practical guidance on choosing the number of bins, including the potential benefit of reusing training data for recalibration when calibration generalizes well.
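To see where the $n_{\mathrm{te}}^{1/3}$ bin count and the $n_{\mathrm{te}}^{-1/3}$ bias rate can come from, here is a back-of-the-envelope derivation. It assumes, purely for illustration, that the statistical bias of the binned estimator scales like $\sqrt{B/n_{\mathrm{te}}}$ (each of the $B$ bins holds roughly $n_{\mathrm{te}}/B$ test points) and that the binning bias scales like $1/B$ (as it would under a Lipschitz-type smoothness assumption). The constants $C_1, C_2$ are hypothetical placeholders; the paper's actual bounds may carry different constants or logarithmic factors.

```latex
% Illustrative assumption, not the paper's exact bound:
% statistical bias <= C_1 sqrt(B / n_te),  binning bias <= C_2 / B
\mathrm{Bias}(B) \;\lesssim\; C_1\sqrt{\frac{B}{n_{\mathrm{te}}}} + \frac{C_2}{B},
\qquad
\frac{d}{dB}\Bigl(C_1 B^{1/2} n_{\mathrm{te}}^{-1/2} + C_2 B^{-1}\Bigr)
  = \frac{C_1}{2} B^{-1/2} n_{\mathrm{te}}^{-1/2} - C_2 B^{-2} = 0
\;\Longrightarrow\;
B^{\ast} = \Bigl(\frac{2C_2}{C_1}\Bigr)^{2/3} n_{\mathrm{te}}^{1/3},
\qquad
\mathrm{Bias}(B^{\ast}) = O\bigl(n_{\mathrm{te}}^{-1/3}\bigr).
```

At $B^{\ast}$ both terms decay at the same $n_{\mathrm{te}}^{-1/3}$ rate: more bins shrink the binning bias but leave fewer samples per bin, so the statistical bias grows.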

Abstract

While the expected calibration error (ECE), which employs binning, is widely adopted to evaluate the calibration performance of machine learning models, theoretical understanding of its estimation bias is limited. In this paper, we present the first comprehensive analysis of the estimation bias in the two common binning strategies, uniform mass binning and uniform width binning. Our analysis establishes upper bounds on the bias that achieve an improved convergence rate. Moreover, our bounds reveal, for the first time, the optimal number of bins to minimize the estimation bias. We further extend our bias analysis to generalization error analysis based on the information-theoretic approach, deriving upper bounds that enable numerical evaluation of how small the ECE is on unseen data. Experiments using deep learning models show that our bounds are nonvacuous thanks to this information-theoretic generalization analysis approach.
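The two binning strategies can be made concrete with a short sketch of the binned ECE estimator for binary classification. This is a minimal illustration under stated assumptions, not the authors' code: `conf` is assumed to hold the predicted probability of class 1, `labels` holds 0/1 outcomes, and the helper names (`uniform_width_bins`, `uniform_mass_bins`, `binned_ece`, `default_num_bins`) are hypothetical. UWB splits $[0,1]$ into $B$ equal-width intervals, while UMB sorts the predictions and cuts them into $B$ groups of (nearly) equal size; the default bin count follows the $B = \lfloor n^{1/3} \rfloor$ setting used in the experiments.

```python
from typing import List, Sequence


def uniform_width_bins(conf: Sequence[float], num_bins: int) -> List[List[int]]:
    """UWB: assign each index to one of num_bins equal-width intervals of [0, 1]."""
    bins: List[List[int]] = [[] for _ in range(num_bins)]
    for i, c in enumerate(conf):
        # min(...) keeps c == 1.0 in the last bin instead of an out-of-range one
        bins[min(int(c * num_bins), num_bins - 1)].append(i)
    return bins


def uniform_mass_bins(conf: Sequence[float], num_bins: int) -> List[List[int]]:
    """UMB: sort indices by confidence and cut them into num_bins near-equal groups."""
    order = sorted(range(len(conf)), key=lambda i: conf[i])
    n = len(order)
    return [order[k * n // num_bins:(k + 1) * n // num_bins] for k in range(num_bins)]


def binned_ece(conf: Sequence[float], labels: Sequence[int], bins: List[List[int]]) -> float:
    """Sum over non-empty bins of (|bin| / n) * |mean label - mean confidence|."""
    n = len(conf)
    total = 0.0
    for idx in bins:
        if not idx:
            continue  # empty bins contribute nothing
        mean_conf = sum(conf[i] for i in idx) / len(idx)
        mean_label = sum(labels[i] for i in idx) / len(idx)
        total += len(idx) / n * abs(mean_label - mean_conf)
    return total


def default_num_bins(n: int) -> int:
    """B = floor(n^(1/3)), corrected with integer checks to avoid float round-off."""
    b = max(1, round(n ** (1 / 3)))
    while b > 1 and b ** 3 > n:
        b -= 1
    while (b + 1) ** 3 <= n:
        b += 1
    return b


conf = [0.05, 0.1, 0.15, 0.9]
labels = [0, 0, 1, 1]
B = 2
print(binned_ece(conf, labels, uniform_width_bins(conf, B)))  # UWB estimate
print(binned_ece(conf, labels, uniform_mass_bins(conf, B)))   # UMB estimate
```

On this toy input the two strategies already disagree (about 0.2 for UWB versus about 0.275 for UMB), which is one reason the bias of each binning scheme is worth analyzing separately. The integer correction in `default_num_bins` matters because `1000 ** (1 / 3)` evaluates to slightly less than 10 in floating point.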

Paper Structure (61 sections, 17 theorems, 162 equations, 6 figures, 4 tables)

This paper contains 61 sections, 17 theorems, 162 equations, 6 figures, 4 tables.

Introduction
Preliminaries
Calibration error and its estimator
Biases of ECE and limitation of existing work
Information-theoretic generalization error analysis
Proposed analysis of total bias in binned ECE
Generalization error analysis in calibration error
Information-theoretic analysis of generalization error in ECE and TCE
On the behavior of eCMI and the order of total bias on metric entropy
Generalization error analysis on recalibration and bias due to reuse of training data
Related work
Experiments
Verification of our bounds
Experiments on synthetic datasets
Experiments on image datasets
...and 46 more sections

Key Result

Theorem 1

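Under the CMI setting, the expected generalization gap of the loss $l$ is bounded in terms of $\mathrm{eCMI}(l)$, where $\mathrm{eCMI}(l)\coloneqq I(l(\mathcal{A}(\tilde{Z}_U,R),\tilde{Z});U|\tilde{Z})$ and $l(\mathcal{A}(\tilde{Z}_U,R),\tilde{Z})$ is an $n \times 2$ loss matrix obtained by applying $l(\mathcal{A}(\tilde{Z}_U,R),\cdot)$ elementwise to $\tilde{Z}$. Here $\tilde{Z}$ is the $n \times 2$ supersample, $U \in \{0,1\}^n$ is a uniformly random selector whose $i$-th entry picks the training point from row $i$, $\tilde{Z}_U$ is the resulting training sample, and $R$ is the algorithm's internal randomness.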
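The objects in the eCMI definition can be spelled out with a small sketch. It assumes scalar data points, a hypothetical toy learner `train_mean` standing in for $\mathcal{A}$ (it ignores any algorithmic randomness $R$), and a loss clipped to $[0,1]$; all helper names are illustrative only. The supersample $\tilde{Z}$ is stored as $n$ rows of two candidates, $U_i$ picks the training entry of row $i$, and the loss matrix records the trained model's loss on all $2n$ entries. What the sketch makes explicit is that the empirical gap between held-out and training loss is a function of the loss matrix and $U$ alone, which is the intuition for why the mutual information between them, conditioned on $\tilde{Z}$, can control the gap.

```python
import random
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[float, float]  # one row of the n x 2 supersample or loss matrix


def sample_mask(n: int, rng: random.Random) -> List[int]:
    # U ~ Uniform({0,1}^n): U_i selects which column of row i is a training point
    return [rng.randint(0, 1) for _ in range(n)]


def select_train(supersample: Sequence[Pair], u: Sequence[int]) -> List[float]:
    # Z~_U: one entry per row; the other entry plays the role of held-out data
    return [row[ui] for row, ui in zip(supersample, u)]


def loss_matrix(loss: Callable[[float], float], supersample: Sequence[Pair]) -> List[Pair]:
    # n x 2 matrix l(A(Z~_U, R), Z~): the trained model's loss on every entry
    return [(loss(a), loss(b)) for a, b in supersample]


def gap_from_loss_matrix(lmat: Sequence[Pair], u: Sequence[int]) -> float:
    # Held-out minus training loss, averaged over rows; uses only (lmat, U)
    return sum(row[1 - ui] - row[ui] for row, ui in zip(lmat, u)) / len(u)


def train_mean(train: List[float]) -> Callable[[float], float]:
    # Hypothetical toy algorithm A: fit the training mean, squared loss clipped to [0, 1]
    mu = sum(train) / len(train)
    return lambda z: min(1.0, (z - mu) ** 2)


rng = random.Random(0)
supersample = [(rng.random(), rng.random()) for _ in range(8)]
u = sample_mask(len(supersample), rng)
model_loss = train_mean(select_train(supersample, u))
print(gap_from_loss_matrix(loss_matrix(model_loss, supersample), u))
```

Computing $\mathrm{eCMI}(l)$ itself would additionally require estimating the mutual information between the loss matrix and $U$ across many draws of $U$ (and $R$), which this sketch does not attempt.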

Figures (6)

Figure 1: Behavior of the upper bound in Eq. \ref{eq_test_data_use_total_bias} as $n$ increases when UWB is used. The settings labeled less calibrated and better calibrated correspond to $\beta = (0.5, -1.5)$ and $\beta = (0.2, -1.9)$, respectively; the former setting yields a larger (worse) value of the TCE estimator.
Figure 2: Behavior of the upper bound in Eq. \ref{eq:bias_bound} for various $B$ as $n$ increases (mean $\pm$ std.). For clarity, only the results using UMB are shown. The ECE gap is shown only for $B = \lfloor n^{1/3} \rfloor$ because changing $B$ did not change it significantly. See Figure \ref{fig:boundplot_logscale} in Appendix \ref{app:bound_plot_various} for a detailed comparison of (log-scaled) ECE gap values and bound values across bin settings.
Figure 3: Behavior of the upper bound in Eq. \ref{eq:bias_bound} for various $B$ as $n$ increases (mean $\pm$ std.). For clarity, only the results using UWB are shown. The ECE gap is evaluated by estimating $\mathbb{E}_{R,S_{\mathrm{tr}},S_{\mathrm{te}}}[|\mathrm{ECE}(f_W,S_{\mathrm{te}})-\mathrm{ECE}(f_W, S_{\mathrm{tr}})|]$. The ECE gap is shown only for $B = \lfloor n^{1/3} \rfloor$ because changing $B$ did not change it significantly.
Figure 4: Behavior of the upper bound in Eq. \ref{eq:tight_bound_thm7} as $n$ increases for different numbers of bins (mean $\pm$ std.) when UMB is used after recalibration.
Figure 5: Behavior of the upper bound in Eq. \ref{eq:bias_bound} for various $B$ as $n$ increases (mean $\pm$ std.; log scale) when UMB is used. The ECE gap is evaluated by estimating $\mathbb{E}_{R,S_{\mathrm{tr}},S_{\mathrm{te}}}[|\mathrm{ECE}(f_W,S_{\mathrm{te}})-\mathrm{ECE}(f_W, S_{\mathrm{tr}})|]$. These results show that the ECE gap has a large variance under non-optimal settings of $B$, whereas it is stable when $B$ is set to the optimal value.
...and 1 more figure

Theorems & Definitions (33)

Theorem 1: Theorem 6.7 in Steinke and Zakynthinou (2020)
Theorem 2: Statistical bias analysis
Proof sketch
Theorem 3: Binning bias analysis
Proof sketch
Corollary 1
Theorem 4: Generalization error bound of the ECE
Proof sketch
Theorem 5: Generalization error bound of the TCE
Theorem 6: Metric entropy
...and 23 more