Domain-adaptive and Subgroup-specific Cascaded Temperature Regression for Out-of-distribution Calibration
Jiexin Wang, Jiahao Chen, Bing Su
TL;DR
This work addresses the calibration of neural network confidence under out-of-distribution (OOD) shift. It introduces a meta-set-based cascaded temperature regression framework: the validation set is augmented to form diverse meta-sets that simulate domain shifts, each meta-set is partitioned into subgroups by predicted category and confidence level, and two regression networks map the resulting subgroup representations to category-wise and confidence-level-wise temperatures. Training optimizes a combined loss, with expected calibration error (ECE) as the calibration metric. Experiments on MNIST, CIFAR-10, and TinyImageNet show substantially lower calibration error, especially under strong domain shifts, which supports the method's use for post-hoc uncertainty estimation in OOD settings.
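Since ECE is the calibration metric referred to above, a minimal sketch of the standard equal-width-bin estimator may help fix ideas. The function name, the choice of 10 bins, and the input layout (one top-class confidence and one correctness flag per sample) are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: sample-weighted mean of |accuracy - confidence| per bin.

    confidences: top-class probability per sample, in [0, 1].
    correct: 1 if the prediction was right, 0 otherwise.
    n_bins is an illustrative default, not a value from the paper.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)

    # Interior bin edges; sample i falls in bin k when edge[k] < conf <= edge[k+1].
    interior_edges = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    bin_ids = np.digitize(confidences, interior_edges, right=True)

    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight by the fraction of samples in the bin
    return ece
```

For example, four predictions made with confidence 0.95 of which three are correct give an accuracy of 0.75 against a confidence of 0.95, so the estimator reports a gap of about 0.2.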
Abstract
Although deep neural networks yield high classification accuracy given sufficient training data, their predictions are typically overconfident or underconfident, i.e., the prediction confidences do not reflect the actual accuracy. Post-hoc calibration tackles this problem by calibrating the prediction confidences without re-training the classification model. However, current approaches assume that the test and validation data follow the same distribution, which limits their applicability to out-of-distribution scenarios. To address this limitation, we propose a novel meta-set-based cascaded temperature regression method for post-hoc calibration. Our method tailors fine-grained scaling functions to distinct test sets by simulating various domain shifts through data augmentation on the validation set. We partition each meta-set into subgroups based on predicted category and confidence level, capturing diverse uncertainties. A regression network is then trained to derive category-specific and confidence-level-specific scaling, achieving calibration across meta-sets. Extensive experimental results on MNIST, CIFAR-10, and TinyImageNet demonstrate the effectiveness of the proposed method.
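To make the idea of category-specific and confidence-level-specific scaling concrete, the following is a rough sketch of how two sets of temperatures could be applied in cascade at inference time. It is not the authors' implementation: in the paper the temperatures are produced by a trained regression network, whereas here they are simply passed in as arrays. The function names, the order of the two stages, and the equal-width confidence levels are all assumptions made for illustration.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cascaded_temperature_scaling(logits, class_temps, level_temps):
    """Apply a per-category temperature, then a per-confidence-level temperature.

    logits: array of shape (n_samples, n_classes).
    class_temps: one positive temperature per predicted category (hypothetical input).
    level_temps: one positive temperature per confidence level (hypothetical input);
                 levels are equal-width bins over [0, 1], an assumption of this sketch.
    """
    logits = np.asarray(logits, dtype=float)
    class_temps = np.asarray(class_temps, dtype=float)
    level_temps = np.asarray(level_temps, dtype=float)

    # Stage 1: subgroup each sample by its predicted category.
    predicted = logits.argmax(axis=1)
    stage1 = logits / class_temps[predicted][:, None]

    # Stage 2: subgroup each sample by its confidence level after stage 1.
    confidence = softmax(stage1).max(axis=1)
    interior_edges = np.linspace(0.0, 1.0, len(level_temps) + 1)[1:-1]
    level = np.digitize(confidence, interior_edges, right=True)
    stage2 = stage1 / level_temps[level][:, None]

    return softmax(stage2)
```

Because every temperature is positive, the predicted class of each sample is unchanged and only the confidence is adjusted, which is what makes this kind of scaling a post-hoc step that leaves accuracy intact.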
