Consistency-Guided Temperature Scaling Using Style and Content Information for Out-of-Domain Calibration

Wonjeong Choi; Jungwuk Park; Dong-Jun Han; Younghyun Park; Jaekyun Moon

TL;DR

This work addresses the calibration of neural network confidence under out-of-domain (OOD) shifts, where no validation data from the target domain are available. It introduces Consistency-Guided Temperature Scaling (CTS), a post-hoc temperature scaling method that enforces prediction consistency under style and content variations through two auxiliary losses, $L_{style}$ and $L_{content}$, without altering model parameters. CTS improves OOD calibration over existing methods on multi-domain benchmarks (e.g., PACS, Office-Home, Digits-DG, VLCS) while preserving accuracy, and it is compatible with existing domain-generalization strategies. Because it uses only source-domain data to achieve domain-invariant calibration, the approach is practical for trustworthy AI in real-world, multi-domain settings.

Abstract

Research interest in the robustness of deep neural networks against domain shifts has grown rapidly in recent years. Most existing works, however, focus on improving the accuracy of the model rather than its calibration performance, which is another important requirement for trustworthy AI systems. Temperature scaling (TS), an accuracy-preserving post-hoc calibration method, has been shown to be effective in in-domain settings but not in out-of-domain (OOD) settings, because a validation set for the unseen domain is difficult to obtain beforehand. In this paper, we propose consistency-guided temperature scaling (CTS), a new temperature scaling strategy that can significantly enhance OOD calibration performance by providing mutual supervision among data samples in the source domains. Motivated by our observation that over-confidence stemming from inconsistent sample predictions is the main obstacle to OOD calibration, we propose to guide the scaling process by taking consistency into account in two different aspects, style and content, which are the key components that represent data samples well in multi-domain settings. Experimental results demonstrate that our proposed strategy outperforms existing works, achieving superior OOD calibration performance on various datasets. This is accomplished using only the source domains and without compromising accuracy, making our scheme directly applicable to various trustworthy AI systems.
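
To make the baseline concrete, the following is a minimal sketch of plain temperature scaling, not of CTS itself. It assumes the standard formulation in which logits are divided by a single scalar temperature fitted by minimizing the negative log-likelihood on a labeled validation set. The function names, the grid search, and the toy logits are illustrative assumptions; practical implementations usually fit the temperature with a gradient-based optimizer.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax of logits divided by a scalar temperature (numerically stable)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nll(logits_batch, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    loss = 0.0
    for logits, y in zip(logits_batch, labels):
        loss -= math.log(softmax(logits, temperature)[y])
    return loss / len(labels)

def fit_temperature(logits_batch, labels, lo=0.05, hi=20.0, steps=400):
    """Pick the temperature on a log-spaced grid that minimizes validation NLL.

    The grid search is a simplification chosen for this sketch.
    """
    best_t, best_loss = 1.0, float("inf")
    for i in range(steps + 1):
        t = lo * (hi / lo) ** (i / steps)
        loss = nll(logits_batch, labels, t)
        if loss < best_loss:
            best_t, best_loss = t, loss
    return best_t

# Hypothetical over-confident validation set: large logit margins, one error.
val_logits = [[8.0, 0.0, 0.0], [0.0, 8.0, 0.0], [8.0, 0.0, 0.0], [0.0, 0.0, 8.0]]
val_labels = [0, 1, 1, 2]  # the third sample is misclassified

T = fit_temperature(val_logits, val_labels)
calibrated = [softmax(z, T) for z in val_logits]
```

Dividing by a positive scalar never changes which logit is largest, so the predicted classes (and hence accuracy) stay the same while the confidences shrink. The difficulty the abstract points to is that this fit needs labeled validation data, which is unavailable for an unseen target domain.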

Paper Structure (32 sections, 5 equations, 3 figures, 3 tables)

Introduction
Main contributions.
Related Works
Calibration methods.
Calibration for out-of-domain (OOD) scenarios.
Domain generalization (DG).
Background
Temperature Scaling (TS)
Style Shifting
Consistency-Guided Temperature Scaling
Problem Setup
Key Insights: Correlations between Consistency and OOD Calibration
Setup.
Key observations.
Causality analysis.
...and 17 more sections

Figures (3)

Figure 1: (a-b) Correlations between the variance of predictions and OOD calibration performance for test samples from different target domains. Samples with high variance are more likely to show poor OOD calibration performance under both style and content shifts on the PACS dataset. (c-d) Reliability diagrams comparing calibration tendency depending on the variance of predictions under style and content variations. The poor calibration performance of high-variance samples (pink line) arises from over-confident predictions. The size of each point indicates the relative count of samples in the corresponding confidence interval.
Figure 2: Overview of our consistency-guided temperature scaling (CTS). Samples from the same class on the validation set are fed into the model in a pair-wise manner, and three different intermediate features (original, style shifted, content shifted) are generated. Then, style/content shifted logits $\mathbb{P}(y|f^{(s_j, c_i)})$/$\mathbb{P}(y|f^{(s_i, c_j)})$ are created, and TS is performed with consistency losses described in Section \ref{overallloss}.
Figure 3: Compatibility with augmentation-based DG methods on the PACS dataset. Each scheme is combined with either MixStyle \cite{zhou2021mixstyle} or DSU \cite{li2022uncertainty}.
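
The reliability diagrams in Figure 1 group predictions by confidence interval and compare confidence with accuracy in each group. The sketch below shows one common way to compute such bins, together with the expected calibration error (ECE) as a scalar summary. The equal-width binning, the choice of ECE, and the toy data are assumptions for illustration; this page does not state which binning or metric the paper uses.

```python
def reliability_bins(confidences, correct, n_bins=10):
    """Group predictions into equal-width confidence bins.

    Returns (mean_confidence, accuracy, count) for each non-empty bin,
    ordered from the lowest to the highest confidence interval.
    """
    stats = [[0.0, 0, 0] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        stats[idx][0] += conf
        stats[idx][1] += int(ok)
        stats[idx][2] += 1
    return [(s / n, k / n, n) for s, k, n in stats if n > 0]

def expected_calibration_error(confidences, correct, n_bins=10):
    """Count-weighted average gap between accuracy and confidence over bins."""
    total = len(confidences)
    bins = reliability_bins(confidences, correct, n_bins)
    return sum(n / total * abs(acc - conf) for conf, acc, n in bins)

# Hypothetical predictions: confidence of the predicted class, and whether it was right.
confidences = [0.95, 0.95, 0.95, 0.95, 0.55, 0.55]
correct = [1, 1, 0, 0, 1, 0]

bins = reliability_bins(confidences, correct)
ece = expected_calibration_error(confidences, correct)
```

A bin whose mean confidence exceeds its accuracy is over-confident, which is the pattern the caption attributes to high-variance samples. The per-bin count corresponds to the point sizes in panels (c-d).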
