Parametric $\rho$-Norm Scaling Calibration

Siyuan Zhang, Linbo Xie

TL;DR

The paper tackles unreliable uncertainty calibration in modern high-capacity models by proposing a post-hoc calibrator based on Parametric $\rho$-Norm Scaling, which regulates output magnitude to mitigate overconfidence without sacrificing accuracy. It introduces a multi-level optimization objective that combines bin-level Square Calibration Error with an instance-level KL-divergence regularization to preserve distributional properties of the pre-calibration outputs. The key contributions include (1) a new $\rho$-Norm Scaling calibration model with theoretical properties like decision invariance, (2) a joint bin- and instance-level objective for calibrator optimization, and (3) extensive empirical validation across multiple datasets showing state-of-the-art calibration performance in post-processing settings. This approach yields more reliable confidence estimates while maintaining classifier performance, with practical impact for deploying calibrated models in real-world decision-making tasks. Mathematical formulations such as $g_c(z) = \frac{e^{r_c}}{\sum_j e^{r_j}}$ and $r_j(z) = \frac{z_j}{\gamma \|z\|_\rho + \beta}$ are central to controlling output magnitude and shaping the calibrated distribution.
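The two formulas above can be read as a softmax applied to logits that have been divided by an affine function of their own $\rho$-norm. The following is a minimal sketch of that mapping, not the authors' implementation: the function names, the positivity check on the scale, and the example parameter values are assumptions made for illustration.

```python
import math

def rho_norm(z, rho):
    """Return the rho-norm (sum_j |z_j|^rho)^(1/rho) of a logit vector."""
    return sum(abs(v) ** rho for v in z) ** (1.0 / rho)

def rho_norm_scaling(z, gamma, beta, rho):
    """Map logits z to probabilities g(z) = softmax(r(z)),
    with r_j(z) = z_j / (gamma * ||z||_rho + beta)."""
    scale = gamma * rho_norm(z, rho) + beta
    if scale <= 0:
        # Assumption: a positive scale keeps the ordering of the logits,
        # so the predicted class is unchanged.
        raise ValueError("gamma * ||z||_rho + beta must be positive")
    r = [v / scale for v in z]
    r_max = max(r)  # subtract the maximum for numerical stability
    exps = [math.exp(v - r_max) for v in r]
    total = sum(exps)
    return [e / total for e in exps]

logits = [8.0, 2.0, -1.0]
# gamma = 0, beta = 1 reduces to the ordinary softmax.
plain = rho_norm_scaling(logits, gamma=0.0, beta=1.0, rho=2.0)
# Illustrative (not fitted) parameters: large-magnitude logits are damped.
scaled = rho_norm_scaling(logits, gamma=0.5, beta=0.1, rho=2.0)
```

With `gamma = 0` the mapping reduces to temperature scaling with temperature `beta`. In this example the top-class probability drops under the scaled mapping while the predicted class stays the same, because every logit is divided by the same positive scalar.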

Abstract

Output uncertainty indicates whether the probabilistic properties reflect objective characteristics of the model output. Unlike most loss functions and metrics in machine learning, uncertainty pertains to individual samples, but validating it on individual samples is infeasible. When validated collectively, it cannot fully represent individual sample properties, posing a challenge in calibrating model confidence on a limited data set. Hence, it is crucial to consider confidence calibration characteristics. To counter the adverse effects of the gradual amplification of the classifier output amplitude in supervised learning, we introduce a post-processing parametric calibration method, $\rho$-Norm Scaling, which expands the calibrator expression and mitigates overconfidence due to excessive amplitude while preserving accuracy. Moreover, bin-level objective-based calibrator optimization often results in the loss of significant instance-level information. Therefore, we include probability distribution regularization, which incorporates the specific prior information that the instance-level uncertainty distribution after calibration should resemble the distribution before calibration. Experimental results demonstrate the substantial enhancement in the post-processing calibrator for uncertainty calibration with our proposed method.
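The combination of a bin-level objective with an instance-level distribution regularizer can be sketched as follows. This is an illustration under stated assumptions, not the paper's exact loss: equal-width confidence bins, a sample-weighted squared gap between mean confidence and accuracy per bin, the KL direction (pre-calibration relative to post-calibration), the weight `lam`, and all function names are hypothetical choices.

```python
import math

def bin_level_squared_error(confidences, correct, n_bins=10):
    """Sample-weighted squared gap between mean confidence and accuracy per bin."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes to the last bin
        bins[idx].append((conf, hit))
    error = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        error += (len(members) / n) * (mean_conf - accuracy) ** 2
    return error

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def multi_level_objective(pre_probs, post_probs, labels, lam=0.1, n_bins=10):
    """Bin-level calibration term plus lam times the mean instance-level KL term."""
    confidences = [max(p) for p in post_probs]
    correct = [1.0 if p.index(max(p)) == y else 0.0
               for p, y in zip(post_probs, labels)]
    bin_term = bin_level_squared_error(confidences, correct, n_bins)
    kl_term = sum(kl_divergence(p, q)
                  for p, q in zip(pre_probs, post_probs)) / len(labels)
    return bin_term + lam * kl_term
```

The KL term is zero when the calibrated distribution equals the original one, so it penalizes only the calibrator's departure from the pre-calibration distribution, while the bin term pulls average confidence toward accuracy.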

Paper Structure

This paper contains 13 sections, 3 theorems, 21 equations, 4 figures, 5 tables, 1 algorithm.

Key Result

Proposition 1

For any model output $z$ and the probability mapping $g_c = \frac{e^{r_c}}{\sum_{j=1}^{m} e^{r_j}}$, where $r_j(z) = \frac{z_j}{\gamma \|z\|_\rho}$, the following inequalities hold.

Figures (4)

  • Figure 1: Overview of our proposed post-hoc calibrator structure and optimization objective, applied after the classifier optimization pipeline: (1) To address output magnitude amplification during supervised learning, we introduce $\rho$-Norm Scaling calibration within the post-calibration framework. (2) Uncertainty is a statistical property of the entire dataset, so optimizing it is prone to losing sample-level information. To address this, we incorporate the probabilistic similarity between pre-calibration and post-calibration outputs as an instance-level loss, combined with a bin-level loss.
  • Figure 2: Amplitude changes in classifier optimization. In these figures, the overall output magnitude of all samples is defined as $\frac{1}{Nm}\sum_{i=1}^{N} \|z^i\|_2$. During the supervised learning of the classifier, the output magnitude follows a specific pattern. (a) illustrates that in the absence of weight decay, the output amplitude steadily increases throughout the optimization process. Although this trend is alleviated in the presence of weight decay, as depicted in (b), the final magnitudes exhibit a positive correlation with the overall confidence distribution, shown in (c) and (d).
  • Figure 3: Confidence histograms and reliability diagrams for different post-hoc calibration methods with ResNet35 on CIFAR-100. Confidence histograms display the sample count within each bin, whereas reliability diagrams illustrate the difference between the average confidence (marked in red) and the accuracy (indicated in blue) in each bin.
  • Figure 4: Confidence distribution under different optimization objectives in Vector Scaling. In (a), samples are categorized into bins based on confidence levels through Softmax. Each sample in (b) and (c) belongs to the same bin as in (a). Using bin-level SCE alone in post-calibration results in a significant deviation from the original distribution. This challenge is mitigated by the instance-level KL.
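The overall output magnitude defined in the Figure 2 caption, $\frac{1}{Nm}\sum_{i=1}^{N} \|z^i\|_2$, can be computed directly from a batch of logits. The snippet below is a minimal sketch: the function name and the toy batch are illustrative assumptions, with $N$ taken as the number of samples and $m$ as the number of classes.

```python
import math

def mean_output_magnitude(logits):
    """Compute (1 / (N * m)) * sum_i ||z^i||_2 for N logit vectors of length m."""
    n = len(logits)
    m = len(logits[0])
    total = sum(math.sqrt(sum(v * v for v in z)) for z in logits)
    return total / (n * m)

# Toy batch with N = 2 samples and m = 2 classes; the 2-norms are 5 and 10.
batch = [[3.0, 4.0], [6.0, 8.0]]
magnitude = mean_output_magnitude(batch)  # (5 + 10) / (2 * 2) = 3.75
```

Tracking this quantity over training epochs is one way to reproduce the kind of amplitude curves the caption describes.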

Theorems & Definitions (3)

  • Proposition 1
  • Proposition 2
  • Proposition 3