Uncertainty Weighted Gradients for Model Calibration

Jinxu Lin, Linwei Tao, Minjing Dong, Chang Xu

TL;DR

This work tackles the miscalibration of deep classifiers by reframing uncertainty-aware training as gradient-weighted optimization. It shows that directly weighting gradients by sample uncertainty, measured with a generalized Brier Score, yields more reliable probability estimates than traditional loss-weighting approaches such as Focal Loss. The authors introduce BSCE-GRA, a gradient-based loss that aligns optimization with uncertainty, and provide theoretical and empirical evidence of improved calibration across multiple datasets, models, and metrics. The approach achieves state-of-the-art calibration with minimal post-processing and applies to a broad range of architectures and scales.

Abstract

Model calibration is essential for ensuring that the predictions of deep neural networks accurately reflect true probabilities in real-world classification tasks. However, deep networks often produce over-confident or under-confident predictions, leading to miscalibration. Various methods have been proposed to address this issue by designing loss functions for calibration, such as focal loss. In this paper, we analyze the effectiveness of focal loss and provide a unified loss framework covering focal loss and its variants, in which we mainly attribute their superiority in model calibration to the loss weighting factor that estimates sample-wise uncertainty. Based on our analysis, existing loss functions fail to achieve optimal calibration performance because of two main issues: misalignment during optimization and insufficient precision in uncertainty estimation. Specifically, focal loss cannot align sample uncertainty with gradient scaling, and a single logit cannot indicate the uncertainty. To address these issues, we reformulate the optimization from the perspective of gradients so that it focuses on uncertain samples. We also propose using the Brier Score as the loss weight factor, which provides a more accurate uncertainty estimate by using all the logits. Extensive experiments on various models and datasets demonstrate that our method achieves state-of-the-art (SOTA) performance.
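The idea of scaling each sample's gradient by a Brier Score uncertainty estimate can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' BSCE-GRA implementation: it assumes the weight is treated as a constant with respect to the logits (a stop-gradient), so that it scales the standard softmax cross-entropy gradient $p - y$ directly. The function names `softmax`, `brier_uncertainty`, and `weighted_ce_grad` are hypothetical.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def brier_uncertainty(probs, onehot):
    # Per-sample Brier Score: squared distance between the full predicted
    # distribution and the one-hot label. It uses every class probability
    # and lies in [0, 2].
    return ((probs - onehot) ** 2).sum(axis=-1)

def weighted_ce_grad(logits, labels):
    # Assumption: the uncertainty weight is held constant (stop-gradient),
    # so it simply rescales the cross-entropy gradient for each sample.
    probs = softmax(logits)
    onehot = np.eye(logits.shape[-1])[labels]
    ce_grad = probs - onehot              # d(CE)/d(logits) for softmax
    u = brier_uncertainty(probs, onehot)  # per-sample uncertainty weight
    return u[:, None] * ce_grad, u

logits = np.array([[4.0, 0.0, 0.0, 0.0],   # confident and correct
                   [0.5, 0.4, 0.3, 0.2]])  # nearly uniform, uncertain
labels = np.array([0, 0])
grad, u = weighted_ce_grad(logits, labels)
```

Under these assumptions the uncertain sample receives both a larger weight and a larger gradient norm than the confident one, which is the alignment between uncertainty and gradient scaling that the abstract describes.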

Paper Structure

This paper contains 27 sections, 20 equations, 8 figures, 8 tables.

Figures (8)

  • Figure 1: $g(p,\gamma)$ of Focal Loss vs predicted confidence $\hat{p}_c$.
  • Figure 2: An illustration of the values of the gradient weight functions on a 4-class classification task. $u_{\text{\tiny FL}}$ varies only along the $p_i$ axis, and $u_{\text{\tiny DFL}}$ varies along the $p_i$ and $p_j$ axes. $u_{\text{\tiny BS}}$ responds to changes along all axes, providing a more complete uncertainty evaluation.
  • Figure 3: Comparison of different ECE metrics. The first three plots show the uncertainty for CIFAR-10 using ResNet-50, while the remaining plots represent ResNet-110 on CIFAR-10.
  • Figure 4: Evolution of gradient norm distributions across different training epochs for various loss functions. The scatter plots show the relationship between gradient norm and Brier Score for different loss functions (Focal Loss, Dual Focal Loss, BSCE-GRA).
  • Figure 5: The ECE-over-epochs plot presents the evolution of ECE throughout the training process, demonstrating that our method rapidly converges to the best result by epoch 250. The subsequent plots depict the gradient magnitudes of various methods between epochs 150 and 250.
  • ...and 3 more figures
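
The contrast drawn in the Figure 2 caption can be checked numerically. The sketch below is illustrative only: it assumes the focal-style weight is the standard modulating factor $(1 - p_i)^\gamma$ on the true-class probability $p_i$ and that the Brier Score weight is the squared distance to the one-hot label. The paper's exact definitions of $u_{\text{\tiny FL}}$ and $u_{\text{\tiny BS}}$ may differ, and `GAMMA`, `focal_weight`, and `brier_weight` are hypothetical names.

```python
import numpy as np

GAMMA = 3.0  # hypothetical focusing parameter, chosen only for illustration

def focal_weight(probs, label, gamma=GAMMA):
    # Depends only on the true-class probability p_i.
    return (1.0 - probs[label]) ** gamma

def brier_weight(probs, label):
    # Depends on every class probability through the squared error.
    onehot = np.zeros_like(probs)
    onehot[label] = 1.0
    return float(((probs - onehot) ** 2).sum())

# Two 4-class predictions with the same true-class probability (0.4),
# but with the remaining mass spread out versus piled on one wrong class.
spread = np.array([0.4, 0.2, 0.2, 0.2])
peaked = np.array([0.4, 0.6, 0.0, 0.0])

fl_spread, fl_peaked = focal_weight(spread, 0), focal_weight(peaked, 0)
bs_spread, bs_peaked = brier_weight(spread, 0), brier_weight(peaked, 0)
```

With these inputs the focal-style weight is identical for both predictions, while the Brier Score weight is larger for the prediction that concentrates its mass on a single wrong class, since it reacts to probabilities along every axis.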