TEA: Test-time Energy Adaptation

Yige Yuan; Bingbing Xu; Liang Hou; Fei Sun; Huawei Shen; Xueqi Cheng

TL;DR

TEA reframes test-time adaptation as an energy-based problem by converting a trained classifier into an energy-based model with the energy function $E_\theta(\boldsymbol{x}) = -\log \sum_y \exp(f_\theta(\boldsymbol{x})[y])$. It then combines Contrastive Divergence with Langevin dynamics to align the model's distribution with the unseen test distribution, updating only the normalization layers for efficiency. Across CIFAR-10/100, TinyImageNet, and PACS, TEA outperforms state-of-the-art TTA methods on both image corruption and domain generalization benchmarks, while also improving confidence calibration. The results reveal a strong link between energy reduction and improved generalization, and show that TEA gains a richer perception of the test distribution without access to training data or processes, suggesting a practical route to robust generalization under distribution shift.
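The three ingredients named above (the log-sum-exp energy, Langevin sampling of negative samples, and the Contrastive Divergence objective) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a toy linear classifier so that the energy gradient has a closed form, and the function names, step size, and noise scale are hypothetical choices made for illustration. Updating the normalization-layer parameters from the loss is not shown.

```python
import math
import random

def logits(W, b, x):
    """Toy linear classifier (an assumption): f(x)[y] = W[y] . x + b[y]."""
    return [sum(w * xi for w, xi in zip(row, x)) + by for row, by in zip(W, b)]

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def energy(W, b, x):
    """E(x) = -log sum_y exp(f(x)[y]), using a numerically stable log-sum-exp."""
    z = logits(W, b, x)
    m = max(z)
    return -(m + math.log(sum(math.exp(v - m) for v in z)))

def energy_grad_x(W, b, x):
    """For the linear toy model, dE/dx = -sum_y softmax(f(x))[y] * W[y]."""
    p = softmax(logits(W, b, x))
    return [-sum(p[y] * W[y][i] for y in range(len(W))) for i in range(len(x))]

def langevin_sample(W, b, x0, steps=20, step_size=0.1, noise_std=0.01, rng=random):
    """Noisy gradient descent on the energy, starting from x0.

    step_size and noise_std are decoupled here, which is an illustrative choice.
    """
    x = list(x0)
    for _ in range(steps):
        g = energy_grad_x(W, b, x)
        x = [xi - step_size * gi + noise_std * rng.gauss(0.0, 1.0)
             for xi, gi in zip(x, g)]
    return x

def cd_loss(W, b, test_batch, neg_batch):
    """Contrastive Divergence objective: mean E(test) minus mean E(negatives)."""
    e_pos = sum(energy(W, b, x) for x in test_batch) / len(test_batch)
    e_neg = sum(energy(W, b, x) for x in neg_batch) / len(neg_batch)
    return e_pos - e_neg
```

Minimizing `cd_loss` with respect to the model parameters lowers the energy of test samples and raises the energy of the sampled negatives. In a real network the gradient with respect to the input would come from automatic differentiation rather than the closed form used here.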

Abstract

Test-time adaptation (TTA) aims to improve model generalizability when test data diverges from the training distribution. Its distinct advantage is that it requires no access to training data or processes, which is especially valuable for large pre-trained models. However, current TTA methods fail to address the fundamental issue, covariate shift: the decreased generalizability can be attributed to the model's reliance on the marginal distribution of the training data, which may impair model calibration and introduce confirmation bias. To address this, we propose a novel energy-based perspective that enhances the model's perception of target data distributions without requiring access to training data or processes. Building on this perspective, we introduce $\textbf{T}$est-time $\textbf{E}$nergy $\textbf{A}$daptation ($\textbf{TEA}$), which transforms the trained classifier into an energy-based model and aligns the model's distribution with that of the test data, enhancing its ability to perceive test distributions and thus improving overall generalizability. Extensive experiments across multiple tasks, benchmarks, and architectures demonstrate TEA's superior generalization performance over state-of-the-art methods. Further in-depth analyses reveal that TEA equips the model with a comprehensive perception of the test distribution, ultimately paving the way toward improved generalization and calibration.
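To make "aligning the model's distribution with the test data's" concrete, the standard energy-based-model identities are sketched below. They are textbook results rather than a quotation of the paper's equations, and the symbols $Z_\theta$ (the normalizing constant) and $q$ (the test data distribution) are notation assumed here for illustration.

$$p_\theta(\boldsymbol{x}) = \frac{\exp(-E_\theta(\boldsymbol{x}))}{Z_\theta}, \qquad Z_\theta = \int \exp(-E_\theta(\boldsymbol{x}))\, d\boldsymbol{x}$$

$$\nabla_\theta\, \mathbb{E}_{\boldsymbol{x} \sim q}\left[-\log p_\theta(\boldsymbol{x})\right] = \mathbb{E}_{\boldsymbol{x} \sim q}\left[\nabla_\theta E_\theta(\boldsymbol{x})\right] - \mathbb{E}_{\boldsymbol{x}' \sim p_\theta}\left[\nabla_\theta E_\theta(\boldsymbol{x}')\right]$$

The second term arises because $\nabla_\theta \log Z_\theta = -\mathbb{E}_{\boldsymbol{x}' \sim p_\theta}[\nabla_\theta E_\theta(\boldsymbol{x}')]$. Maximizing the likelihood of test data therefore lowers the energy of test samples and raises the energy of samples drawn from the model, and in practice the model samples are approximated with a sampler such as Langevin dynamics.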

Paper Structure (38 sections, 10 equations, 8 figures, 5 tables, 1 algorithm)

Introduction
Related Work
Test Time Adaptation
Energy Based Model
Method
Notation and Overall Architecture
Energy Adaptation for Test Distribution
Modulation Parameters
Discussion
Experiment
Experimental Setup
Datasets and Metrics
Backbones and Baselines
Implementation
Adaptation Results
...and 23 more sections

Figures (8)

Figure 1: Performance vs. energy for a model trained on the original distribution and tested across various shifted distributions. Upper: change in error rate across energy-score groups. Lower: loss variation with energy score, each point denoting a distribution. Marker styles and opacity reflect distribution types and divergence.
Figure 2: Overview of Test-time Energy Adaptation (TEA). Given a trained model (classifier) and incoming test data, TEA integrates the test data distribution directly into the trained classifier by fine-tuning its normalization layers through energy-based training. TEA constructs an energy-based model from the classifier by reinterpreting the negative log-sum-exp of the logits as an energy function, and uses Contrastive Divergence as the adaptation objective, which decreases the energy of test samples while increasing the energy of negative samples generated by Langevin dynamics. This adaptation increases the likelihood of test samples under the classifier's distribution, enabling a gradual alignment between the distributions of the trained classifier and the test data, thereby enhancing generalizability.
Figure 3: Energy reduction and generalizability enhancement achieved by TEA on CIFAR-10-C, CIFAR-100-C, and TinyImageNet-200-C, displayed from left to right. The upper graphs trace the evolution of the energy score and the corresponding loss and accuracy as the number of TEA adaptation steps increases. The lower graphs show the extent of energy reduction and the consequent performance improvement before and after TEA adaptation, under different levels of distribution shift.
Figure 4: Test distribution perception visualization for identical training and testing distributions on MNIST and CIFAR-10.
Figure 6: Calibration comparison between TEA and baselines on the CIFAR-10 dataset. Under ideal calibration, the blue bars align with the diagonal line, so a smaller grey gap area is better. Quantitative measures are provided via the ECE and MCE metrics, where lower values indicate better calibration.
...and 3 more figures
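The ECE and MCE metrics named in the Figure 6 caption can be sketched as follows. This is a minimal illustration of the usual binned definitions, not the paper's evaluation code: the function name, the choice of 10 equal-width bins, and the bin-edge convention are assumptions.

```python
def calibration_errors(confidences, correct, n_bins=10):
    """Return (ECE, MCE) from per-sample confidences and correctness flags.

    Samples are grouped into equal-width confidence bins (an assumed convention).
    ECE is the sample-weighted mean of |accuracy - confidence| over bins;
    MCE is the largest such gap over non-empty bins.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes to the last bin
        bins[idx].append((conf, ok))
    ece, mce = 0.0, 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        gap = abs(accuracy - avg_conf)
        ece += (len(members) / n) * gap
        mce = max(mce, gap)
    return ece, mce

# Two samples at confidence 0.95 (both correct) and two at 0.65 (one correct).
ece, mce = calibration_errors([0.95, 0.95, 0.65, 0.65], [True, True, True, False])
```

In this example the high-confidence bin has a gap of 0.05 and the other bin a gap of 0.15, so ECE is approximately 0.10 and MCE approximately 0.15.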
