MaxEnt Loss: Constrained Maximum Entropy for Calibration under Out-of-Distribution Shift

Dexter Neo, Stefan Winkler, Tsuhan Chen

TL;DR

This work introduces MaxEnt Loss, a constrained maximum-entropy regularizer for calibration under out-of-distribution (OOD) shift. The method maximizes entropy subject to mean and/or variance constraints derived from the training data, incorporates these constraints through Lagrange multipliers, and yields three forms that can be trained end to end: Mean (M), Variance (V), and Mean+Variance (M+V). All three improve calibration without sacrificing accuracy. The paper explicitly links MaxEnt Loss to Focal loss and shows, on extensive synthetic and real-world OOD benchmarks, that it achieves state-of-the-art calibration across diverse datasets while remaining compatible with training-time and post-hoc calibration techniques such as label smoothing and temperature scaling. It also analyzes the role of local constraints and feature-norm ordering, and discusses limitations and future directions, including adaptive multiplier schemes for greater robustness to unseen shifts.
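The TL;DR describes the loss only at a high level. The sketch below is an illustrative, penalty-style approximation and not the paper's exact objective. It assumes that the constrained quantities are moments of the class index under the predicted distribution (in line with Figure 2, where a higher $\mu$ shifts mass toward larger classes). It also replaces exact Lagrangian terms with squared deviations and uses hypothetical fixed weights `beta`, `lam_mu`, and `lam_var`.

```python
import math
from typing import Optional, Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over one sample's logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def maxent_loss_sketch(
    logits: Sequence[float],
    target: int,
    mu: Optional[float] = None,   # hypothetical: global mean of the training labels
    var: Optional[float] = None,  # hypothetical: global variance of the training labels
    lam_mu: float = 1.0,          # hypothetical fixed multiplier for the mean term
    lam_var: float = 1.0,         # hypothetical fixed multiplier for the variance term
    beta: float = 1.0,            # hypothetical entropy weight
) -> float:
    """Cross-entropy minus weighted entropy plus constraint-deviation penalties.

    Passing only `mu` gives an M-style loss, only `var` a V-style loss,
    and both an M+V-style loss. This is an assumed simplification.
    """
    p = softmax(logits)
    eps = 1e-12
    ce = -math.log(p[target] + eps)
    entropy = -sum(pi * math.log(pi + eps) for pi in p)
    # Assumption: class indices x_i = 0..K-1 are the values whose moments are constrained.
    mean = sum(i * pi for i, pi in enumerate(p))
    variance = sum((i - mean) ** 2 * pi for i, pi in enumerate(p))
    loss = ce - beta * entropy
    if mu is not None:
        loss += lam_mu * (mean - mu) ** 2
    if var is not None:
        loss += lam_var * (variance - var) ** 2
    return loss
```

For example, `maxent_loss_sketch([2.0, 0.5, -1.0], 0, mu=1.0, var=2/3)` adds both penalties to the entropy-regularized cross-entropy. Because the penalties are non-negative, they only grow as the prediction's moments drift away from the training statistics. This makes the sketch easy to reason about, but it differs from a true Lagrangian, whose multiplier terms can carry either sign.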

Abstract

We present a new loss function that addresses the out-of-distribution (OOD) calibration problem. While many objective functions have been proposed to effectively calibrate models in-distribution, our findings show that they do not always fare well OOD. Based on the Principle of Maximum Entropy, we incorporate helpful statistical constraints observed during training, delivering better model calibration without sacrificing accuracy. We provide theoretical analysis and show empirically that our method works well in practice, achieving state-of-the-art calibration on both synthetic and real-world benchmarks.

Paper Structure

This paper contains 30 sections, 14 equations, 8 figures, and 5 tables.

Figures (8)

  • Figure 1: Uncalibrated models (left) make overconfident OOD misdiagnoses, resulting in dire consequences. Well-calibrated models (right) exhibit lower confidence, reflecting their uncertainty for OOD samples.
  • Figure 2: In the face of unknown OOD shifts, we argue that predictions should not deviate too far from the observed global Gibbs distribution. For different values of the constraints, higher $\mu$ values shift the distribution toward larger classes, while a higher $\sigma^2$ yields more spread-out distributions.
  • Figure 3: Test and calibration error curves comparing different loss functions on CIFAR/CIFAR-C. As shift severity increases from 0 to 5, all methods converge to similar test errors, while our method remains well calibrated.
  • Figure 4: Samples from training and augmented validation/test sets for CIFAR10, CIFAR100 and TinyImageNet respectively (synthetic OOD, top 3 rows). Samples from Camelyon17-Wilds, iWildCam-Wilds and FMoW-Wilds are shown in the bottom 3 rows.
  • Figure 5: Bin-strength densities (top) and reliability diagrams (bottom) computed using $B=10$ bins for different loss functions, evaluated on CIFAR100-C. MaxEnt Loss delivers a more uniform spread of probability densities and a reliability bar plot that better matches the ideal diagonal.
  • ...and 3 more figures
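Figure 5 reports reliability diagrams computed with $B=10$ bins. For reference, the following is a minimal sketch of the standard expected calibration error (ECE) over $B$ equal-width confidence bins. The binning convention (half-open intervals, with a confidence of exactly 1.0 assigned to the top bin) is an assumption, and the paper's evaluation code may differ.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    num_bins: int = 10,
) -> float:
    """ECE = sum_b (|B_b| / N) * |acc(B_b) - conf(B_b)| over equal-width bins on [0, 1]."""
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("confidences and correct must be non-empty and of equal length")
    bin_count = [0] * num_bins
    bin_conf = [0.0] * num_bins
    bin_acc = [0.0] * num_bins
    for c, ok in zip(confidences, correct):
        # Map each confidence to a bin; c == 1.0 falls into the last bin.
        b = min(int(c * num_bins), num_bins - 1)
        bin_count[b] += 1
        bin_conf[b] += c
        bin_acc[b] += 1.0 if ok else 0.0
    ece = 0.0
    for b in range(num_bins):
        if bin_count[b] == 0:
            continue  # empty bins contribute nothing
        avg_conf = bin_conf[b] / bin_count[b]
        avg_acc = bin_acc[b] / bin_count[b]
        ece += (bin_count[b] / n) * abs(avg_acc - avg_conf)
    return ece
```

A model that predicts with 0.9 confidence on ten samples but is right on only five of them lands entirely in the top bin and gets an ECE of 0.4. This is the kind of overconfident OOD behavior illustrated in Figure 1.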

Theorems & Definitions (3)

  • Definition 1: Mean Constraint
  • Definition 2: Variance Constraint
  • Definition 3: Mean and Variance Constraints
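Definition 1 constrains the mean of the predicted distribution. Consider the following illustration, which assumes that the constrained variable is the class index $x_i = i$, consistent with Figure 2's remark that higher $\mu$ values tend toward larger classes. Under that assumption, the classical maximum-entropy solution with a single mean constraint is a Gibbs distribution $p_i \propto \exp(\lambda x_i)$, where the Lagrange multiplier $\lambda$ is chosen so that $\sum_i p_i x_i = \mu$. The sketch below finds $\lambda$ by bisection. The function names are hypothetical, and the paper may parameterize or solve for its multipliers differently.

```python
import math


def gibbs(lam: float, num_classes: int) -> list[float]:
    """p_i proportional to exp(lam * x_i), with x_i = i (assumed class-index variable)."""
    exponents = [lam * i for i in range(num_classes)]
    m = max(exponents)  # subtract the max exponent for numerical stability
    weights = [math.exp(e - m) for e in exponents]
    z = sum(weights)
    return [w / z for w in weights]


def mean_index(p: list[float]) -> float:
    """Expected class index under distribution p."""
    return sum(i * pi for i, pi in enumerate(p))


def maxent_with_mean(mu: float, num_classes: int, tol: float = 1e-10) -> list[float]:
    """Maximum-entropy distribution over class indices subject to E[x] = mu.

    The Gibbs mean increases monotonically in lam (its derivative is Var[x] > 0),
    so bisection on lam recovers the multiplier that satisfies the constraint.
    """
    if not 0.0 < mu < num_classes - 1:
        raise ValueError("mu must lie strictly between 0 and num_classes - 1")
    lo, hi = -50.0, 50.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_index(gibbs(mid, num_classes)) < mu:
            lo = mid
        else:
            hi = mid
    return gibbs(0.5 * (lo + hi), num_classes)
```

With ten classes, `maxent_with_mean(4.5, 10)` recovers the uniform distribution ($\lambda = 0$). Raising $\mu$ to 6.0 moves mass toward the higher-indexed classes, mirroring the trend described in Figure 2. Adding a variance constraint (Definition 2) extends the exponent with a quadratic term, $p_i \propto \exp(\lambda_1 x_i + \lambda_2 x_i^2)$, which requires solving for two multipliers jointly. That case is omitted here.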