Domain-adaptive and Subgroup-specific Cascaded Temperature Regression for Out-of-distribution Calibration

Jiexin Wang; Jiahao Chen; Bing Su

TL;DR

This work tackles the challenge of calibrating neural network confidence under out-of-distribution shifts. It introduces a meta-set-based cascaded temperature regression framework that learns domain-adaptive, subgroup-specific scaling by augmenting the validation set to form diverse meta-sets and training two regression heads for category-wise and confidence-level-wise calibration. The method builds subgroup representations based on predicted categories and confidence levels, learns temperatures via two regression networks, and optimizes a combined loss using ECE as the calibration metric. Empirical results on MNIST, CIFAR-10, and TinyImageNet show substantial improvements in calibration errors, especially under strong domain shifts, highlighting the approach's robustness and practical impact for reliable, post-hoc uncertainty estimation in real-world OOD scenarios.
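For readers unfamiliar with the metric named above, ECE is commonly estimated by grouping predictions into confidence bins and averaging the gap between accuracy and mean confidence in each bin, weighted by bin size. The sketch below is a minimal NumPy illustration of that standard estimator, not the authors' code: the function name `expected_calibration_error`, the equal-width binning, and the default of 15 bins are assumptions made here for illustration.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    """Estimate ECE with equal-width confidence bins (assumed binning scheme)."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)

    # Interior bin edges; bin b covers (b / n_bins, (b + 1) / n_bins].
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(confidences, edges[1:-1], right=True)

    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            # Gap between accuracy and mean confidence inside this bin.
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            # Weight the gap by the fraction of samples in the bin.
            ece += mask.mean() * gap
    return ece
```

Here `confidences` holds the maximum softmax probability per sample and `correct` is 1 where the prediction matches the label and 0 otherwise.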

Abstract

Although deep neural networks yield high classification accuracy given sufficient training data, their predictions are typically overconfident or under-confident, i.e., the prediction confidences cannot truly reflect the accuracy. Post-hoc calibration tackles this problem by calibrating the prediction confidences without re-training the classification model. However, current approaches assume congruence between test and validation data distributions, limiting their applicability to out-of-distribution scenarios. To this end, we propose a novel meta-set-based cascaded temperature regression method for post-hoc calibration. Our method tailors fine-grained scaling functions to distinct test sets by simulating various domain shifts through data augmentation on the validation set. We partition each meta-set into subgroups based on predicted category and confidence level, capturing diverse uncertainties. A regression network is then trained to derive category-specific and confidence-level-specific scaling, achieving calibration across meta-sets. Extensive experimental results on MNIST, CIFAR-10, and TinyImageNet demonstrate the effectiveness of the proposed method.
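To make the idea of category-specific and confidence-level-specific scaling concrete, the following is a rough sketch of how two sets of temperatures could be applied in sequence to a batch of logits. It is an illustration under stated assumptions, not the paper's implementation: in the paper the temperatures are produced by a regression network from subgroup statistics, whereas here they are passed in as plain arrays (`class_temps`, `level_temps`). The function name, the category-then-confidence ordering, and the equal-width confidence levels are all hypothetical choices made for this example.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cascaded_temperature_scaling(logits, class_temps, level_temps):
    """Apply per-category, then per-confidence-level temperatures (assumed order)."""
    logits = np.asarray(logits, dtype=float)
    class_temps = np.asarray(class_temps, dtype=float)
    level_temps = np.asarray(level_temps, dtype=float)

    # Stage 1: each sample is scaled by the temperature of its predicted category.
    pred = logits.argmax(axis=1)
    stage1 = logits / class_temps[pred][:, None]

    # Stage 2: assign each sample to an equal-width confidence level
    # (an assumption) and scale by that level's temperature.
    conf = softmax(stage1).max(axis=1)
    n_levels = len(level_temps)
    level = np.minimum((conf * n_levels).astype(int), n_levels - 1)
    stage2 = stage1 / level_temps[level][:, None]

    return softmax(stage2)
```

Because every temperature is a positive scalar applied to a whole logit vector, the predicted class is unchanged and only the confidence moves, which is what makes this a post-hoc calibration step rather than a change to the classifier.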

Paper Structure (10 sections, 10 equations, 6 figures, 2 tables)

Introduction
Method
Problem definition
Subgroup-induced representation
Cascaded temperature regression mechanism
Experiments
Datasets and baselines
Experimental results
Ablation study
Conclusion

Figures (6)

Figure 1: Comparison of calibration methods. (a): Most existing methods only re-scale the confidence scores using the instance features (blue lines) or the logits (green lines). (b): Our cascaded calibration successively re-scales the logits based on the statistics of instance confidence scores from subgroups of different predicted categories and confidence levels.
Figure 2: Reliability diagrams on CIFAR-10.1: (left) the gap between accuracy and average confidence under different confidence levels and (right) the gap under different predicted categories. ECE is the expected calibration error (Naeini et al., 2015).
Figure 3: Overview of the proposed method. Blue and purple lines indicate the pipelines of the category-wise and confidence-level-wise calibration, respectively. Red lines indicate the loss computation and backpropagation.
Figure 4: Comparison of different methods. SCE(%) is reported.
Figure 5: Reliability diagram on USPS using different methods. Baselines are trained on the perturbed validation set.
...and 1 more figure
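The per-category view described in the Figure 2 caption (the gap between average confidence and accuracy within each predicted category) can be computed along the following lines. This is a hedged sketch for illustration only: the function name `gap_by_predicted_category` and the sign convention (positive means overconfident) are assumptions, not taken from the paper.

```python
import numpy as np

def gap_by_predicted_category(probs, labels):
    """Return {predicted class: mean confidence minus accuracy} (assumed sign convention)."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)

    pred = probs.argmax(axis=1)            # predicted category per sample
    conf = probs.max(axis=1)               # confidence of that prediction
    correct = (pred == labels).astype(float)

    gaps = {}
    for c in np.unique(pred):
        mask = pred == c
        # Positive gap: overconfident on samples predicted as class c.
        gaps[int(c)] = float(conf[mask].mean() - correct[mask].mean())
    return gaps
```

Categories that never appear as a prediction are omitted from the result, since no gap can be estimated for them.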
