Improving Entropy-Based Test-Time Adaptation from a Clustering View

Guoliang Lin, Hanjiang Lai, Yan Pan, Jian Yin

TL;DR

This work reinterprets entropy-based test-time adaptation (EBTTA) as a clustering process, where the forward pass corresponds to label assignment and the backward pass to center updating. Building on this perspective, the authors propose Test-Time Clustering (TTC), which enhances EBTTA via robust label assignment, a similarity-preserving constraint, sample selection, and gradient accumulation. Empirical results on CIFAR-10-C, CIFAR-100-C, and ImageNet-C show consistent improvements over baselines like TENT, with notable gains at small batch sizes and across diverse models. The clustering view clarifies when entropy minimization helps, and the proposed components translate into practical gains for robust, scalable test-time adaptation.

Abstract

Domain shift is a common problem in the real world, where training data and test data follow different distributions. To deal with this problem, fully test-time adaptation (TTA) leverages the unlabeled data encountered at test time to adapt the model. In particular, entropy-based TTA (EBTTA) methods, which minimize the entropy of predictions on test samples, have shown great success. In this paper, we introduce a new clustering perspective on EBTTA, viewing it as an iterative algorithm: 1) in the assignment step, the forward pass of the EBTTA model assigns labels to the test samples, and 2) in the updating step, the backward pass updates the model using the assigned samples. This perspective allows us to explore how entropy minimization influences test-time adaptation, and it guides our improvements to EBTTA. We improve both the assignment step and the updating step, proposing robust label assignment, a similarity-preserving constraint, sample selection, and gradient accumulation to explicitly utilize more information. Experimental results demonstrate that our method achieves consistent improvements on various datasets. Code is provided in the supplementary material.
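
The two-step reading of EBTTA described in the abstract can be sketched in a few lines of NumPy. This is only an illustration under simplifying assumptions, not the authors' implementation: the model is a hypothetical linear softmax classifier with weights `W`, the test batch `X` is random, and all parameters are updated (practical EBTTA methods typically adapt only a subset of parameters). The function names `assignment_step` and `updating_step` are chosen here for illustration.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def assignment_step(W, X):
    # Forward pass: class probabilities and the hard label each sample is assigned to.
    log_p = log_softmax(X @ W.T)
    return np.exp(log_p), log_p, log_p.argmax(axis=1)

def mean_entropy(W, X):
    p, log_p, _ = assignment_step(W, X)
    return float(-(p * log_p).sum(axis=1).mean())

def updating_step(W, X, lr=0.1):
    # Backward pass: one gradient-descent step on the mean prediction entropy.
    p, log_p, _ = assignment_step(W, X)
    H = -(p * log_p).sum(axis=1, keepdims=True)
    grad_logits = -p * (log_p + H)          # dH/dz for each sample
    grad_W = grad_logits.T @ X / len(X)     # chain rule through z = X W^T
    return W - lr * grad_W

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 5))   # hypothetical unlabeled test batch
W = rng.normal(size=(3, 5))    # hypothetical classifier weights (3 classes)

_, _, labels = assignment_step(W, X)
entropy_before = mean_entropy(W, X)
for _ in range(10):
    W = updating_step(W, X)
entropy_after = mean_entropy(W, X)
```

Alternating the two functions mirrors the assign-then-update loop of a clustering algorithm: the forward pass decides which class each sample currently belongs to, and the backward pass moves the model so that those assignments become more confident.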

Paper Structure

This paper contains 28 sections, 1 theorem, 22 equations, 3 figures, 7 tables, 1 algorithm.

Key Result

Lemma 1

For a sample $x$, minimizing the entropy loss $H(x)$ increases the probability of the class with the largest value and decreases the sum of the probabilities of the other classes.
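
A small numerical check can make the lemma concrete. The sketch below assumes the probabilities come from a softmax over logits and applies plain gradient descent to those logits. The starting logits and the step size are arbitrary illustrative values, and this parameterization is an assumption made here rather than the setting of the paper's proof.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def entropy_grad_wrt_logits(z):
    # For p = softmax(z) and H = -sum_i p_i log p_i, dH/dz_j = -p_j (log p_j + H).
    p = softmax(z)
    log_p = np.log(p)
    H = -(p * log_p).sum()
    return -p * (log_p + H)

logits = np.array([2.0, 1.0, 0.5, 0.0])   # arbitrary starting point, K = 4 classes
top_class = int(np.argmax(logits))
top_prob_history = [softmax(logits)[top_class]]

for _ in range(20):
    logits = logits - 0.5 * entropy_grad_wrt_logits(logits)  # gradient descent on H
    top_prob_history.append(softmax(logits)[top_class])

# Probability mass left for all the other classes after each step.
rest_sum_history = [1.0 - p for p in top_prob_history]
```

In this run the initially largest class keeps its lead while `top_prob_history` rises and `rest_sum_history` falls, which is the behavior the lemma describes.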

Figures (3)

  • Figure 1: The probabilities of classes when applying entropy minimization on probabilities via gradient descent, where $K$ is the number of classes. The largest probability will get larger after each iteration of gradient descent.
  • Figure 2: Accuracy (%) on various datasets with different batch sizes. Gradient accumulation (GA) can boost performance significantly when the batch size is small.
  • Figure 3: Density plots of the test-time feature distribution on CIFAR-10-C with impulse noise. The feature distribution of supervised learning, used as the reference, is shown in yellow, that of TENT in blue, and that of TTC in red. Each horizontal axis represents a channel. More overlap means better alignment, so TTC is more closely aligned with supervised learning.
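
The gradient accumulation mentioned in the Figure 2 caption can be illustrated with a minimal NumPy sketch. It assumes a hypothetical linear softmax classifier, equal-sized micro-batches, and parameters held fixed while gradients are accumulated. The paper's exact accumulation schedule is not given in this summary, so this shows the general technique rather than the authors' procedure.

```python
import numpy as np

def entropy_grad(W, X):
    # Gradient of the mean prediction entropy w.r.t. W for a linear softmax model.
    z = X @ W.T
    z = z - z.max(axis=1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    p = np.exp(log_p)
    H = -(p * log_p).sum(axis=1, keepdims=True)
    return (-p * (log_p + H)).T @ X / len(X)

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 5))   # 16 hypothetical test samples
W = rng.normal(size=(3, 5))    # hypothetical classifier weights (3 classes)

# Process four micro-batches of size 4 and accumulate their gradients
# before any parameter update takes place.
micro_batches = np.split(X, 4)
accumulated = np.zeros_like(W)
for xb in micro_batches:
    accumulated += entropy_grad(W, xb)
accumulated /= len(micro_batches)

# Reference: the gradient computed on all 16 samples at once.
full_batch = entropy_grad(W, X)
```

Under these assumptions the averaged accumulated gradient matches the full-batch gradient, so a model restricted to small batches can still take update steps based on more samples. In networks whose normalization layers compute statistics per batch, the match would only be approximate.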

Theorems & Definitions (1)

  • Lemma 1