Energy-based Preference Optimization for Test-time Adaptation

Yewon Han; Seoyun Yang; Taesup Kim

TL;DR

This paper tackles test-time adaptation (TTA) under distribution shift when only unlabeled target data are available. It introduces EPOTTA, a sampling-free energy-based framework that reformulates the target marginal as $p_{\theta}(x)=(1/Z)\,q_{\phi}(x)\exp(-\tilde{E}_{\theta}(x)/\beta)$, where $q_{\phi}$ is the pretrained model and $\tilde{E}_{\theta}$ is a residual energy function. From this form it derives an objective in the spirit of Direct Preference Optimization (DPO) that avoids both the normalization constant and SGLD sampling. Empirically, EPOTTA achieves strong accuracy and calibration on CIFAR10-C, CIFAR100-C, and TinyImageNet-C while substantially reducing computational cost compared with SGLD-based energy methods; it also remains robust under non-IID test streams and with small replay buffers. The approach offers a practical, scalable alternative for real-world TTA: it adapts reliably without depending on uncertain pseudo-labels or expensive sampling, which narrows the gap between adaptation performance and deployment constraints in dynamic environments.
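The reason this parameterization can sidestep the normalization constant follows from rearranging the formula above. The short derivation below is a sketch based only on that formula; the sample names $x_a$ and $x_b$ are illustrative and not taken from the paper, and the paper's actual objective may differ in its details.

$$
\log p_{\theta}(x) = \log q_{\phi}(x) - \frac{\tilde{E}_{\theta}(x)}{\beta} - \log Z
\quad\Longrightarrow\quad
\tilde{E}_{\theta}(x) = -\beta \log \frac{p_{\theta}(x)}{q_{\phi}(x)} - \beta \log Z
$$

$$
\tilde{E}_{\theta}(x_a) - \tilde{E}_{\theta}(x_b)
= -\beta \left[ \log \frac{p_{\theta}(x_a)}{q_{\phi}(x_a)} - \log \frac{p_{\theta}(x_b)}{q_{\phi}(x_b)} \right]
$$

Because $Z$ does not depend on $x$, it cancels in any difference of residual energies between two inputs. This is the same cancellation that DPO exploits for the implicit reward $\beta \log(\pi_{\theta}/\pi_{\mathrm{ref}})$, with the residual energy playing the role of a negative reward.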

Abstract

Test-Time Adaptation (TTA) enhances model robustness by enabling adaptation to target distributions that differ from the training distribution, improving real-world generalizability. Existing TTA approaches focus on adjusting the conditional distribution; however, because label information is unavailable, these methods often depend on uncertain predictions, which leads to unreliable performance. Energy-based frameworks offer a promising alternative that addresses distribution shift without relying on uncertain predictions, instead modeling the marginal distribution of the target data. However, they face a critical challenge: they require extensive SGLD sampling, which is impractical for test-time scenarios that demand immediate adaptation. In this work, we propose Energy-based Preference Optimization for Test-time Adaptation (EPOTTA), which is based on a sampling-free strategy. We first parameterize the target model using a pretrained model and a residual energy function, enabling marginal likelihood maximization on target data without sampling. Building on the observation that this parameterization is mathematically equivalent to the DPO objective, we then directly adapt the model to the target distribution without explicitly training the residual. Our experiments verify that EPOTTA is well calibrated and performant while remaining computationally efficient.
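As a numerical illustration of the parameterization with a pretrained model and a residual energy, the toy sketch below checks that pairwise differences of the residual energy can be recovered from log-density ratios alone, without knowing the normalization constant. Everything in it is an assumption made for illustration: the five-point discrete support, the random residual energies, the value of `beta`, and the helper name `residual_energy_gap` are hypothetical and not taken from the paper, and it is not the EPOTTA training objective.

```python
import numpy as np


def residual_energy_gap(log_p, log_q, beta):
    """Pairwise residual-energy differences E~(x_i) - E~(x_j).

    Uses E~(x) = -beta * (log p(x) - log q(x)) - beta * log Z; the
    constant -beta * log Z cancels in every pairwise difference.
    """
    e_up_to_const = -beta * (log_p - log_q)
    return e_up_to_const[:, None] - e_up_to_const[None, :]


rng = np.random.default_rng(0)
beta = 0.5

# Hypothetical "pretrained" marginal q over a 5-point support.
q = rng.dirichlet(np.ones(5))
# Hypothetical ground-truth residual energies on the same support.
e_tilde = rng.normal(size=5)

# Target marginal p(x) = (1/Z) q(x) exp(-E~(x) / beta).
unnormalized = q * np.exp(-e_tilde / beta)
Z = unnormalized.sum()
p = unnormalized / Z

# Gaps recovered from log-ratios match the true gaps; Z is never used.
gap_from_ratio = residual_energy_gap(np.log(p), np.log(q), beta)
gap_true = e_tilde[:, None] - e_tilde[None, :]
print(np.allclose(gap_from_ratio, gap_true))
```

In this toy setting both distributions are normalized and tractable, which is what makes the check exact; with real networks the densities are only available up to their own constants, so the sketch shows the cancellation argument rather than a usable adaptation procedure.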
