Effortless Active Labeling for Long-Term Test-Time Adaptation

Guowei Wang, Changxing Ding

TL;DR

This work addresses the growing labeling burden in long-term test-time adaptation by introducing EATTA, which restricts annotation to at most one sample per batch. It identifies samples on the border between the source and target domains by how strongly feature perturbations change the confidence of their pseudo-labels, and annotates them to guide single-step learning, while a gradient-norm-based debiasing mechanism balances the supervised and unsupervised objectives with EMA refinement. The approach outperforms state-of-the-art methods on ImageNet-C/R/K/A and PACS under both the CTTA and FTTA settings, with significantly lower annotation costs than prior ATTA methods. The findings suggest that careful sample selection focused on learnability, combined with dynamic loss balancing, enables robust and efficient long-term adaptation in dynamic deployment scenarios.

Abstract

Long-term test-time adaptation (TTA) is a challenging task due to error accumulation. Recent approaches tackle this issue by actively labeling a small proportion of samples in each batch, yet the annotation burden quickly grows as the batch number increases. In this paper, we investigate how to achieve effortless active labeling so that a maximum of one sample is selected for annotation in each batch. First, we annotate the most valuable sample in each batch based on the single-step optimization perspective in the TTA context. In this scenario, the samples that border between the source- and target-domain data distributions are considered the most feasible for the model to learn in one iteration. Then, we introduce an efficient strategy to identify these samples using feature perturbation. Second, we discover that the gradient magnitudes produced by the annotated and unannotated samples have significant variations. Therefore, we propose balancing their impact on model optimization using two dynamic weights. Extensive experiments on the popular ImageNet-C, -R, -K, -A and PACS databases demonstrate that our approach consistently outperforms state-of-the-art methods with significantly lower annotation costs.
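As a rough illustration of the selection step described in the abstract, the following Python sketch scores each sample in a batch by how much the confidence on its pseudo-label moves when its features are perturbed, then picks the most sensitive sample for annotation. The linear classifier head, the helper names (`classify`, `perturbation_sensitivity`, `select_sample`), the mean absolute confidence change used as the score, and the hand-written perturbation vectors are all assumptions made for illustration; the abstract does not specify the paper's actual perturbation or scoring rule.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(feature, weights):
    """Hypothetical linear head: one weight row per class, no bias."""
    logits = [sum(w * x for w, x in zip(row, feature)) for row in weights]
    return softmax(logits)


def perturbation_sensitivity(feature, weights, perturbations):
    """Mean absolute change in pseudo-label confidence under perturbation.

    Assumed score: the paper's exact criterion may differ.
    """
    probs = classify(feature, weights)
    pseudo_label = max(range(len(probs)), key=probs.__getitem__)
    changes = []
    for delta in perturbations:
        noisy = [x + d for x, d in zip(feature, delta)]
        noisy_conf = classify(noisy, weights)[pseudo_label]
        changes.append(abs(noisy_conf - probs[pseudo_label]))
    return sum(changes) / len(changes)


def select_sample(batch_features, weights, perturbations):
    """Return the index of the most perturbation-sensitive sample and all scores."""
    scores = [
        perturbation_sensitivity(f, weights, perturbations) for f in batch_features
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy batch: two confidently classified samples and one near the decision boundary.
weights = [[1.0, 0.0], [0.0, 1.0]]
batch = [[5.0, 0.0], [0.1, 0.0], [0.0, 5.0]]
perturbations = [[-0.5, 0.5], [0.5, -0.5]]
index, scores = select_sample(batch, weights, perturbations)
print(index)  # prints 1: the near-boundary sample is the most sensitive
```

In this toy setup the two confident samples barely react to the perturbation, while the sample near the decision boundary changes its confidence substantially, which is the intuition behind treating sensitivity as a proxy for lying between the two distributions.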

Paper Structure

This paper contains 20 sections, 6 equations, 6 figures, 12 tables.

Figures (6)

  • Figure 1: Differences between (a) existing ATTA methods and (b) the proposed EATTA. EATTA requires labeling only one sample per batch, or per several batches, and does not rely on sample buffers. The solid and dashed black lines in the figure represent forward and backward propagation, respectively.
  • Figure 2: Overview of our EATTA approach. It aims to select from each batch of data at most one sample that is both informative and feasible to learn in a single optimization step. We regard this sample as lying on the border between the source- and target-domain data distributions, and identify it by observing its sensitivity to feature perturbations. Moreover, EATTA adopts a gradient norm-based debiasing strategy to adaptively combine the training objectives on the labeled and unlabeled data.
  • Figure 3: (a) Distribution of the prediction entropy of the samples selected by each of three criteria. (b) Distribution of the change in prediction confidence on the original pseudo-label of each selected sample after feature perturbation. The three sample selection criteria are maximum prediction entropy [wang2014new], incremental clustering [gui2024active], and ours. The experiments are conducted on the ImageNet-C database with Gaussian noise at severity level 5.
  • Figure 4: (a) The gradient magnitudes of the supervised and unsupervised loss terms in Eq. (2). (b) We repeat the former experiment while replacing the entropy loss on the unannotated samples with a cross-entropy loss. To employ this loss, we estimate pseudo-labels for the unannotated samples. Both experiments are conducted on the ImageNet-C database with Gaussian noise at severity level 5.
  • Figure 5: Comparison of the true positive rates achieved by different ATTA methods on each category of the ImageNet-C database.
  • ...and 1 more figure
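The gradient norm-based debiasing mentioned in the Figure 2 and Figure 4 captions can be pictured with a small Python sketch. It assumes one plausible scheme: each loss term is weighted by the other term's share of the total gradient norm, so that both weighted gradients end up with comparable magnitude, and the weights are smoothed with an exponential moving average (EMA). The class name `GradNormBalancer`, the weighting formula, and the `momentum` default are hypothetical; the paper's actual two dynamic weights are not given in this excerpt.

```python
import math


def l2_norm(grad):
    """Euclidean norm of a flat gradient vector."""
    return math.sqrt(sum(g * g for g in grad))


class GradNormBalancer:
    """Assumed scheme: inverse-norm weights smoothed by an EMA."""

    def __init__(self, momentum=0.9, eps=1e-12):
        self.momentum = momentum
        self.eps = eps
        self.w_sup = 0.5    # weight on the supervised (annotated) loss
        self.w_unsup = 0.5  # weight on the unsupervised (unannotated) loss

    def update(self, grad_sup, grad_unsup):
        """Refresh both weights from the current gradient norms."""
        n_sup = l2_norm(grad_sup)
        n_unsup = l2_norm(grad_unsup)
        total = n_sup + n_unsup + self.eps
        # The term with the larger gradient receives the smaller weight.
        target_sup = n_unsup / total
        target_unsup = n_sup / total
        m = self.momentum
        self.w_sup = m * self.w_sup + (1.0 - m) * target_sup
        self.w_unsup = m * self.w_unsup + (1.0 - m) * target_unsup
        return self.w_sup, self.w_unsup

    def combine(self, grad_sup, grad_unsup):
        """Weighted sum of the two gradients using the current weights."""
        return [
            self.w_sup * a + self.w_unsup * b for a, b in zip(grad_sup, grad_unsup)
        ]


# Toy gradients: the supervised gradient is five times larger in norm.
balancer = GradNormBalancer(momentum=0.0)
w_sup, w_unsup = balancer.update([3.0, 4.0], [0.6, 0.8])
print(w_sup * l2_norm([3.0, 4.0]), w_unsup * l2_norm([0.6, 0.8]))
```

With `momentum=0.0` the weights jump straight to their targets and the two printed magnitudes coincide; a larger momentum makes the weights drift toward that balance over several batches instead.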