CLUE: Neural Networks Calibration via Learning Uncertainty-Error alignment

Pedro Mendes, Paolo Romano, David Garlan

TL;DR

CLUE tackles unreliable uncertainty estimates by introducing a differentiable, domain-agnostic calibration loss that directly aligns predicted uncertainty with observed model error, avoiding binning or distributional comparisons. The method augments the standard loss with a calibration term, giving the combined objective $L(y,\hat{y}) = \alpha \cdot L_e(y,\hat{y}) + (1-\alpha) \cdot (L_e(y,\hat{y}) - u(\hat{y}))^2$, where $L_e$ is the task loss (the observed error), $u(\hat{y})$ is the predicted uncertainty, and $\alpha$ weights predictive performance against calibration. MC Dropout is used to estimate uncertainty during training and inference. Evaluations on vision, regression, and language tasks show superior calibration quality and competitive predictive performance with minimal computational overhead, along with robust uncertainty estimates and improved out-of-distribution (OOD) detection. Overall, CLUE provides a scalable, general framework for uncertainty calibration that integrates smoothly with modern architectures and real-world pipelines.
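To make the objective concrete, here is a minimal sketch in plain Python that evaluates the loss above on one batch. It is illustrative only and is not the authors' implementation. Two choices are assumptions: the batch means of the error and of the uncertainty stand in for $L_e$ and $u$ (the abstract mentions summary statistics, but the exact statistic is not given here), and the MC Dropout uncertainty is taken as the variance across stochastic forward passes. The names `mc_uncertainty` and `clue_loss` are hypothetical.

```python
from statistics import mean, pvariance


def mc_uncertainty(stochastic_preds):
    """Per-sample uncertainty from MC Dropout style sampling.

    stochastic_preds[i] holds the predictions for sample i from several
    forward passes with dropout left active. Using the variance across
    passes as the uncertainty is an assumption of this sketch.
    """
    return [pvariance(preds) for preds in stochastic_preds]


def clue_loss(errors, uncertainties, alpha=0.5):
    """alpha * L_e + (1 - alpha) * (L_e - u)^2, evaluated on one batch.

    errors: per-sample task losses (the observed error L_e).
    uncertainties: per-sample predicted uncertainties u.
    Batch means are used as the summary statistics (an assumption).
    """
    mean_err = mean(errors)
    mean_unc = mean(uncertainties)
    task_term = alpha * mean_err
    # Penalizes any gap between average uncertainty and average error.
    calib_term = (1.0 - alpha) * (mean_err - mean_unc) ** 2
    return task_term + calib_term


# Uncertainty matches error on average, so only the task term remains.
aligned = clue_loss([0.2, 0.4], [0.3, 0.3], alpha=0.5)
# The model reports zero uncertainty despite a large error.
overconfident = clue_loss([1.0, 1.0], [0.0, 0.0], alpha=0.5)
```

In this sketch the calibration term vanishes when the average uncertainty equals the average error, and it grows quadratically as the two diverge. In an actual training loop the same arithmetic would be written with the tensor operations of an automatic-differentiation framework so that gradients flow through both terms.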

Abstract

Reliable uncertainty estimation is critical for deploying neural networks (NNs) in real-world applications. While existing calibration techniques often rely on post-hoc adjustments or coarse-grained binning methods, they remain limited in scalability, differentiability, and generalization across domains. In this work, we introduce CLUE (Calibration via Learning Uncertainty-Error Alignment), a novel approach that explicitly aligns predicted uncertainty with observed error during training, grounded in the principle that well-calibrated models should produce uncertainty estimates that match their empirical loss. CLUE adopts a novel loss function that jointly optimizes predictive performance and calibration, using summary statistics of uncertainty and loss as proxies. The proposed method is fully differentiable, domain-agnostic, and compatible with standard training pipelines. Through extensive experiments on vision, regression, and language modeling tasks, including out-of-distribution and domain-shift scenarios, we demonstrate that CLUE achieves superior calibration quality and competitive predictive performance with respect to state-of-the-art approaches without imposing significant computational overhead.

Paper Structure

This paper contains 11 sections, 9 equations, 6 tables.