Protected Test-Time Adaptation via Online Entropy Matching: A Betting Approach

Yarin Bar; Shalev Shaer; Yaniv Romano

TL;DR

POEM addresses test-time adaptation under distribution shift in two ways. It detects drift in the classifier's entropy values with betting martingales, and it aligns the test-time entropy distribution with the source distribution through an entropy-matching loss inspired by optimal transport (OT). It replaces entropy minimization with a distribution-matching objective built on the transport map $\tilde{Z}_j=F_s^{-1}(Q(u_j))$, where $u_j=F_s(Z^t_j)$, $F_s$ is the CDF of the source entropy values, and $Q$ is a transform derived from the betting likelihood ratio. The method updates only the normalization layers and adapts the betting parameter online with scale-free online gradient descent (SF-OGD). As a result, it does no harm in distribution and improves accuracy under shift. Empirical results on ImageNet-C, CIFAR-C, and OfficeHome show competitive or superior accuracy, with controlled adaptation and preserved calibration.
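The transport map in the TL;DR can be made concrete with a short numpy sketch. It rests on assumptions that this page does not confirm: $F_s$ is estimated by the empirical CDF of held-out source entropies, $Q$ is taken to be the CDF of the betting density $1+\epsilon(u-\tfrac12)$ on $[0,1]$, and the loss is a squared error with the targets held fixed. The names `q_transform`, `entropy_matching_targets`, and `matching_loss` are hypothetical and are not the authors' code.

```python
import numpy as np


def softmax_entropy(logits: np.ndarray) -> np.ndarray:
    """Shannon entropy (in nats) of the softmax of each row of `logits`."""
    z = logits - logits.max(axis=1, keepdims=True)  # shift for numerical stability
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -(np.exp(log_p) * log_p).sum(axis=1)


def empirical_cdf(source_entropies: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Empirical F_s(z): fraction of source entropy values that are <= z."""
    s = np.sort(source_entropies)
    return np.searchsorted(s, z, side="right") / len(s)


def q_transform(u: np.ndarray, eps: float) -> np.ndarray:
    """ASSUMED Q: CDF of the density 1 + eps * (u - 1/2) on [0, 1].

    Monotone with Q(0) = 0 and Q(1) = 1 whenever |eps| <= 2.
    For eps > 0 it maps u below itself, i.e., toward lower source entropies.
    """
    return np.clip(u + 0.5 * eps * (u ** 2 - u), 0.0, 1.0)


def entropy_matching_targets(test_entropies, source_entropies, eps):
    """Targets Z~_j = F_s^{-1}(Q(u_j)) with u_j = F_s(Z^t_j)."""
    u = empirical_cdf(source_entropies, test_entropies)
    return np.quantile(source_entropies, q_transform(u, eps))


def matching_loss(test_entropies, targets):
    """HYPOTHETICAL squared-error loss; targets are held fixed (no gradient)."""
    return float(np.mean((test_entropies - targets) ** 2))


# Usage on synthetic logits: flatter test logits give higher entropies.
rng = np.random.default_rng(0)
source = softmax_entropy(3.0 * rng.normal(size=(1000, 10)))
test = softmax_entropy(rng.normal(size=(64, 10)))
targets = entropy_matching_targets(test, source, eps=1.5)
print(matching_loss(test, targets))
```

With $\epsilon>0$ (evidence that test entropies sit high in the source distribution), the assumed $Q$ pulls each quantile down, so the targets ask the adapted model for lower entropies. With $\epsilon=0$ the map reduces to the identity on quantiles and the targets leave the entropies essentially where they are, which mirrors the no-harm behavior described above.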

Abstract

We present a novel approach for test-time adaptation via online self-training, consisting of two components. First, we introduce a statistical framework that detects distribution shifts in the classifier's entropy values obtained on a stream of unlabeled samples. Second, we devise an online adaptation mechanism that uses the evidence of distribution shift captured by the detection tool to dynamically update the classifier's parameters. The resulting adaptation process drives the distribution of test entropy values obtained from the self-trained classifier to match that of the source domain, building invariance to distribution shifts. This approach departs from the conventional self-training method, which focuses on minimizing the classifier's entropy. Our approach combines concepts from betting martingales and online learning to form a detection tool capable of reacting quickly to distribution shifts. We then reveal a tight relation between our adaptation scheme and optimal transport, which forms the basis of our novel self-supervised loss. Experimental results demonstrate that our approach improves test-time accuracy under distribution shifts while maintaining accuracy and calibration in their absence, outperforming leading entropy-minimization methods across various scenarios.
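The detection component can be illustrated with a small testing-by-betting loop on $u_j=F_s(Z^t_j)$, which is uniform on $[0,1]$ when the test entropies follow a continuous source distribution. This is a sketch under stated assumptions, not the paper's implementation: the bet is linear, $1+\epsilon(u-\tfrac12)$; $\epsilon$ is clipped to a hypothetical range $[-1.5, 1.5]$; and $\epsilon$ is updated by a scale-free OGD step on the log-wealth loss with a hypothetical learning rate `lr`. The class name `BettingDriftDetector` is illustrative only.

```python
import numpy as np


class BettingDriftDetector:
    """Minimal testing-by-betting sketch on u_j = F_s(Z^t_j).

    ASSUMPTIONS (not taken from the paper's code): linear bet 1 + eps*(u - 1/2),
    eps clipped to [-eps_max, eps_max], and a scale-free OGD step on the
    log-wealth loss with a hypothetical learning rate `lr`.
    """

    def __init__(self, alpha=0.05, eps_max=1.5, lr=0.5):
        self.alpha = alpha
        self.eps_max = eps_max   # < 2 keeps every bet factor strictly positive
        self.lr = lr
        self.eps = 0.0           # betting parameter, fixed before u_t is seen
        self.log_wealth = 0.0    # log M_t, with M_0 = 1
        self.grad_sq_sum = 0.0   # running sum of squared gradients for SF-OGD

    def update(self, u: float) -> bool:
        """Bet on u in [0, 1]; return True once M_t >= 1/alpha (Ville's inequality)."""
        payoff = u - 0.5                      # zero mean when u ~ Uniform(0, 1)
        factor = 1.0 + self.eps * payoff
        self.log_wealth += np.log(factor)
        grad = -payoff / factor               # d/d(eps) of -log(1 + eps * payoff)
        self.grad_sq_sum += grad ** 2
        if self.grad_sq_sum > 0:
            step = self.lr * grad / np.sqrt(self.grad_sq_sum)  # scale-free step size
            self.eps = float(np.clip(self.eps - step, -self.eps_max, self.eps_max))
        return self.log_wealth >= np.log(1.0 / self.alpha)


# Usage: uniform u (no shift) versus u skewed toward high source quantiles.
rng = np.random.default_rng(1)
in_dist = BettingDriftDetector()
alarms_in = [in_dist.update(u) for u in rng.uniform(size=2000)]
shifted = BettingDriftDetector()
alarms_shift = [shifted.update(u) for u in rng.beta(3, 1, size=2000)]
print(any(alarms_in), any(alarms_shift), shifted.eps)
```

When no shift is present, the wealth crosses $1/\alpha$ with probability at most $\alpha$ over the whole stream, so false alarms are rare. The Beta(3, 1) stream concentrates on high entropy quantiles, which pushes $\epsilon$ toward its upper bound and grows the wealth past the threshold. A large positive $\epsilon$ is exactly the kind of evidence the adaptation step can reuse.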

Paper Structure

This paper contains 52 sections, 4 theorems, 23 equations, 10 figures, 4 tables, and 2 algorithms.

Introduction
Preliminaries
Problem setup
Related work: test-time adaptation via self-training
Testing by betting
Proposed method: protected online entropy matching (POEM)
Preview of our method
Motivating example: entropy minimization vs. entropy matching
Online drift detection
Online model adaptation
Putting it all together
Experiments
Continual shifts
Single shift
In distribution
...and 37 more sections

Key Result

Proposition 1

The random process defined in Eq. (unif_test_martingale) is a valid test martingale for the null hypothesis $\mathcal{H}_0$ in Eq. (null).
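The exact forms of eq. (unif_test_martingale) and of $\mathcal{H}_0$ in eq. (null) are not reproduced on this page. Assuming a linear bet on $u_j=F_s(Z^t_j)$ and an i.i.d. uniform null, the three properties behind a statement like Proposition 1 can be checked in a few lines. This is an illustrative derivation under those assumptions, not the paper's proof.

```latex
% Assumed (illustrative) forms:
%   M_0 = 1,  M_t = \prod_{j=1}^{t} \bigl(1 + \epsilon_j (u_j - \tfrac{1}{2})\bigr),
%   \epsilon_j \in [-2, 2] chosen from u_1, \dots, u_{j-1};
%   \mathcal{H}_0 : u_j \overset{\text{iid}}{\sim} \mathrm{Unif}[0, 1].
\begin{aligned}
\text{Nonnegativity: } & \bigl|\epsilon_j (u_j - \tfrac12)\bigr| \le 2 \cdot \tfrac12 = 1
  \;\Rightarrow\; M_t \ge 0, \\
\text{Martingale step: } & \mathbb{E}_{\mathcal{H}_0}[M_t \mid \mathcal{F}_{t-1}]
  = M_{t-1}\Bigl(1 + \epsilon_t \, \mathbb{E}_{\mathcal{H}_0}\bigl[u_t - \tfrac12 \mid \mathcal{F}_{t-1}\bigr]\Bigr)
  = M_{t-1}, \\
\text{Ville's inequality: } & \mathbb{P}_{\mathcal{H}_0}\bigl(\exists\, t \ge 1 : M_t \ge 1/\alpha\bigr) \le \alpha .
\end{aligned}
```

The martingale step uses only that $\epsilon_t$ is fixed before $u_t$ is observed and that $\mathbb{E}[u_t]=\tfrac12$ under the null. The last line turns the martingale into a detector: raising an alarm when $M_t \ge 1/\alpha$ bounds the false-alarm probability by $\alpha$ uniformly over time.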

Figures (10)

Figure 1: Demonstration of the advantage of entropy matching on a toy binary classification problem with Gaussian data. The top panel shows an in-distribution setup in which $P_{XY}^t=P_{XY}^s$. The bottom panel shows an out-of-distribution setup obtained by shifting the two Gaussians. The entropy-matching (red) and entropy-minimization (black) risks are plotted as a function of $\omega$. The dashed yellow line marks the decision boundary of the pre-trained classifier, and the stars mark the decision boundaries of the adapted classifiers.
Figure 2: Continual test-time adaptation on ImageNet-C with a ViT model. Top: per-corruption accuracy with a corruption segment size of 1,000 examples. Results are obtained over 10 independent trials; error bars are tiny. Bottom left: severity shift from low (1) to high (5) and back to low. Bottom center: severity shift from high (5) to low (1) and back to high. For readability, these two graphs show only POEM, the best-performing baseline (EATA), and the no-adapt approach. Bottom right: mean accuracy under continual corruptions as a function of the corruption segment size.
Figure 3: In-distribution experiment on ImageNet (left panel): expected calibration error (ECE; guo2017calibration) versus $\|\omega\|_F^2$, a measure of how far the classifier's parameters deviate from the original ViT model. Lower values on both axes are better. Results are averaged across 10 independent trials; standard errors and the accuracy of each method are reported in the appendix table on in-distribution performance (tab:in_dist_performance). Behavior of the betting parameter (right panel): $\epsilon$ is plotted as a function of time for both in- and out-of-distribution experiments (a single shift at two severity levels).
Figure 4: Martingale behavior with and without adaptation, and on in-distribution data. Three scenarios are visualized, all with ResNet50: (1) out-of-distribution data (ImageNet-C, brightness level 1) without adaptation, (2) the same out-of-distribution data with online adaptation, and (3) in-distribution data (ImageNet). The top panel shows the martingale value, that is, the accumulated capital (in $\log$ scale) over time; the bottom panel shows the corresponding betting variable $\epsilon$.
Figure 5: Continual test-time adaptation on ImageNet-C with a ResNet model. Top: per-corruption accuracy with a corruption segment size of 1,000 examples. Results are obtained over 10 independent trials; error bars are tiny. Bottom left: severity shift from low (1) to high (5) and back to low. Bottom center: severity shift from high (5) to low (1) and back to high. Bottom right: mean accuracy under continual corruptions as a function of the corruption segment size.
...and 5 more figures
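Figure 3 evaluates in-distribution behavior with ECE and with $\|\omega\|_F^2$. A minimal numpy sketch of both quantities follows. The ECE uses the common equal-width binning with 15 bins, a usual default in the style of guo2017calibration; the paper's exact binning is not stated on this page. `squared_frobenius_deviation` is a hypothetical helper that assumes $\omega$ is the list of differences between adapted and original (normalization) parameters, which the caption does not spell out.

```python
import numpy as np


def expected_calibration_error(probs: np.ndarray, labels: np.ndarray, n_bins: int = 15) -> float:
    """ECE with equal-width confidence bins: sum_b (|B_b|/n) * |acc(B_b) - conf(B_b)|."""
    confidences = probs.max(axis=1)                 # top-class probability
    predictions = probs.argmax(axis=1)
    correct = (predictions == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for left, right in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > left) & (confidences <= right)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap              # weight by fraction of samples in bin
    return float(ece)


def squared_frobenius_deviation(adapted, original) -> float:
    """HYPOTHETICAL: sum of squared differences over matching parameter arrays."""
    return float(sum(np.sum((a - o) ** 2) for a, o in zip(adapted, original)))


# Usage: four predictions of class 0 at confidence 0.9, only half of them correct.
probs = np.tile([0.9, 0.1], (4, 1))
labels = np.array([0, 0, 1, 1])
print(expected_calibration_error(probs, labels))  # gap |0.5 - 0.9|, about 0.4
```

A method that leaves the model untouched in distribution keeps $\|\omega\|_F^2$ at zero and its ECE equal to that of the source model, which is why the figure treats lower values on both axes as better.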

Theorems & Definitions (9)

Definition 1: Test Martingale
Proposition 1
Proposition 2
Lemma 1
Theorem 1
Proof
Proof
Proof
Proof
