Active Test-Time Adaptation: Theoretical Analyses and An Algorithm

Shurui Gui; Xiner Li; Shuiwang Ji

TL;DR

Active Test-Time Adaptation (ATTA) integrates a limited number of labeled test-time samples into fully online adaptation to handle substantial distribution shifts without access to source data. The authors derive learning-theoretic bounds, based on the $\mathcal{H}\Delta\mathcal{H}$-distance and a weighted empirical risk $\hat{\epsilon}_{\bm{w}}(h(t))$, showing that labeled test instances tighten the bound on test-domain error, and they mitigate catastrophic forgetting through balanced entropy minimization. They introduce SimATTA, a lightweight algorithm that partitions incoming data by prediction entropy, maintains informative anchors via incremental clustering, and trains on a mix of pseudo-labeled source-like anchors and actively labeled anchors. Across PACS, VLCS, Office-Home, and Tiny-ImageNet-C, ATTA yields substantial gains over traditional TTA and performance competitive with active domain adaptation (ADA), validating both the theoretical guarantees and the practical viability of budgeted, real-time adaptation.
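The entropy-based partition of incoming data can be pictured with a minimal sketch. It assumes entropy is computed from the model's softmax output and that two thresholds, the hypothetical parameters `low_thresh` and `high_thresh`, separate confident (source-like) samples from uncertain ones; SimATTA's actual criteria may differ.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    # Subtract the row-wise max for numerical stability.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def prediction_entropy(logits: np.ndarray) -> np.ndarray:
    # Shannon entropy (in nats) of each sample's predicted class distribution.
    p = softmax(logits)
    return -(p * np.log(np.clip(p, 1e-12, 1.0))).sum(axis=1)

def partition_by_entropy(logits: np.ndarray, low_thresh: float, high_thresh: float):
    """Hypothetical split of a test batch into three index groups.

    low_thresh and high_thresh are assumed hyperparameters of this sketch.
    """
    ent = prediction_entropy(logits)
    source_like = np.where(ent < low_thresh)[0]   # confident: pseudo-label candidates
    uncertain = np.where(ent > high_thresh)[0]    # uncertain: active-labeling candidates
    middle = np.where((ent >= low_thresh) & (ent <= high_thresh))[0]
    return source_like, uncertain, middle
```

In this sketch, low-entropy samples would receive pseudo-labels from `logits.argmax(axis=1)`, while high-entropy samples are natural candidates for the limited labeling budget.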

Abstract

Test-time adaptation (TTA) addresses distribution shifts for streaming test data in unsupervised settings. Currently, most TTA methods can only deal with minor shifts and rely heavily on heuristic and empirical studies. To advance TTA under domain shifts, we propose the novel problem setting of active test-time adaptation (ATTA), which integrates active learning within the fully TTA setting. We provide a learning theory analysis demonstrating, with a theoretical guarantee, that incorporating a limited number of labeled test instances improves overall performance across test domains. We also present a sample entropy balancing technique for implementing ATTA while avoiding catastrophic forgetting (CF). We introduce a simple yet effective ATTA algorithm, SimATTA, that uses real-time sample selection techniques. Extensive experimental results are consistent with our theoretical analyses and show that the proposed ATTA method substantially outperforms TTA methods while remaining efficient, and achieves effectiveness comparable to the more demanding active domain adaptation (ADA) methods. Our code is available at https://github.com/divelab/ATTA
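The balance between source-like and labeled test samples can be expressed through the weighted empirical risk $\hat{\epsilon}_{\bm{w}}$ named in the TL;DR. The sketch below assumes the standard form of that risk, a convex combination of per-group average losses, uses cross-entropy as the loss, and treats each group as a (logits, labels) pair; the function names are hypothetical.

```python
import numpy as np

def cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    # Numerically stable log-softmax, then mean negative log-likelihood of the labels.
    z = logits - logits.max(axis=1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_p[np.arange(len(labels)), labels].mean())

def weighted_empirical_risk(groups, weights) -> float:
    """Assumed form: sum_i w_i * (average loss on group i).

    groups: list of (logits, labels) pairs, e.g. pseudo-labeled source-like
    anchors and actively labeled test anchors. weights must sum to 1.
    """
    weights = np.asarray(weights, dtype=float)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights are assumed to sum to 1")
    return float(sum(w * cross_entropy(lg, lb) for w, (lg, lb) in zip(weights, groups)))
```

With two groups, a larger weight on the pseudo-labeled source-like group roughly corresponds to guarding against forgetting, while a larger weight on the labeled test group emphasizes adaptation to the new domain.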

Paper Structure (39 sections, 14 theorems, 73 equations, 13 figures, 31 tables, 2 algorithms)

Introduction
The Active Test-Time Adaptation Formulation
Theoretical Studies
Alleviating Distribution Shifts through Active Test-Time Adaptation
Mitigating Catastrophic Forgetting with Balanced Entropy Minimization
An ATTA Algorithm
Algorithm Overview
Incremental Clustering
Experimental Studies
The failure of Test-Time Adaptation
Efficiency & Enhanced TTA Setting Comparisons
Comparisons to a Stronger Setting: Active Domain Adaptation
Conclusion and Discussion
Broader Impacts
FAQ & Discussions
...and 24 more sections

Key Result

Theorem 1

Let $H$ be a hypothesis class of VC-dimension $d$. At time step $t$, for ATTA data domains ${D}_S,U_{te}(1),\cdots, U_{te}(t),\cdots$, $S_i$ are unlabeled samples of size $m$ sampled from each of the $t+1$ domains respectively. The total number of samples in ${D}_{tr}(t)$ is $N$ and the ratio of sam[…] where $C=\sqrt{ \left(\sum_{i=0}^{t}\frac{w_i^2}{\lambda_i}\right)\left(\frac{ d\log(2N) - \log(\de[…]
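The visible part of $C$ contains the factor $\sum_{i=0}^{t} w_i^2/\lambda_i$. Assuming both $\bm{w}$ and $\bm{\lambda}$ sum to 1 (the statement is truncated here, so this is an assumption), Cauchy-Schwarz gives $(\sum_i w_i)^2 \le (\sum_i w_i^2/\lambda_i)(\sum_i \lambda_i)$, so the factor is at least 1, with equality exactly when $\bm{w}=\bm{\lambda}$. The sketch below checks this numerically for two components; it covers only this factor, so the other terms of the bound can move the true optimum, consistent with Figure 1(a), where the best $w_0$ lies slightly below $\lambda_0$. Function names are hypothetical.

```python
import numpy as np

def weight_factor(w, lam) -> float:
    """sum_i w_i^2 / lambda_i, the first factor under the square root in C."""
    w = np.asarray(w, dtype=float)
    lam = np.asarray(lam, dtype=float)
    return float(np.sum(w ** 2 / lam))

def best_w0_two_domains(lam0: float, grid_size: int = 99) -> float:
    """Grid-search w0 in (0, 1) with w = (w0, 1 - w0) at a fixed lambda0."""
    grid = np.linspace(0.01, 0.99, grid_size)
    values = [weight_factor([w0, 1.0 - w0], [lam0, 1.0 - lam0]) for w0 in grid]
    return float(grid[int(np.argmin(values))])

print(round(best_w0_two_domains(0.7), 2))  # prints 0.7
```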

Figures (13)

Figure 1: (a) Empirical validation of Thm. 1. We train a series of models on $N=2000$ samples from the PACS (Li et al., 2017) dataset given different $\lambda_0$ and $w_0$ and display the test-domain loss of each model. Red points are the test loss minima for each fixed $\lambda_0$. The orange line is the reference where $w_0=\lambda_0$. We observe that the loss-minimizing values of $w_0$ lie close to the orange line but slightly below $\lambda_0$, which validates our findings in Eq. (\ref{Eq:optimal_w}). (b) Empirical analysis with uncertainty balancing. Given source-pretrained models, we fine-tune the models on 500 samples with different $\lambda_0$ and $w_0$ and display the combined error surface of the test and source errors. Although a small $\lambda_0$ is good for test-domain error, it can cause a non-trivial increase in source error. Therefore, the global loss minimum (green X) lies in a relatively high-$\lambda_0$ region.
Figure 2: Overview of the SimATTA framework.
Figure 3: Initial incremental clustering (IC) step: normal clustering. Left: Clustering results. Right: Selecting new anchors.
Figure 4: The first IC step. Left: Weighted clustering results. Right: Selecting new anchors.
Figure 5: The second IC step. Left: Weighted clustering results. Right: Selecting new anchors.
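Figures 3 to 5 describe incremental clustering (IC) only at the caption level: an initial normal clustering, then weighted clustering steps, each followed by selecting new anchors. The sketch below is one hypothetical reading of those captions, not the authors' implementation: previously selected anchors are clustered together with new samples using weighted k-means (old anchors carry an accumulated weight, new samples weight 1), and for each cluster containing no old anchor, the new sample nearest the centroid is proposed as a new anchor. The function names, weighting scheme, and selection rule are all assumptions.

```python
import numpy as np

def weighted_kmeans(x, weights, k, n_iter=50, seed=0):
    # Lloyd iterations where each centroid is the weighted mean of its members.
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            mask = assign == j
            if mask.any():
                # Heavier points (old anchors) pull the centroid harder.
                centers[j] = np.average(x[mask], axis=0, weights=weights[mask])
    return centers, assign

def ic_step(anchors, anchor_weights, new_samples, k):
    """Hypothetical IC step. anchors: (n_old, dim), possibly n_old == 0.

    Returns indices into new_samples proposed as new anchors.
    """
    n_old = len(anchors)
    x = np.vstack([anchors, new_samples])
    w = np.concatenate([anchor_weights, np.ones(len(new_samples))])
    centers, assign = weighted_kmeans(x, w, k)
    new_anchor_idx = []
    for j in range(k):
        members = np.where(assign == j)[0]
        if len(members) == 0 or (members < n_old).any():
            continue  # empty cluster, or already represented by an old anchor
        dists = ((x[members] - centers[j]) ** 2).sum(axis=1)
        new_anchor_idx.append(int(members[dists.argmin()] - n_old))
    return new_anchor_idx
```

Giving old anchors larger weights keeps centroids near regions that are already covered, so in this sketch new anchors tend to appear only where incoming data is not yet represented.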
...and 8 more figures

Theorems & Definitions (23)

Definition 1: The ATTA problem
Theorem 1
Theorem 2
Corollary 3
Corollary 4
Definition 2: $\mathcal{H}$-divergence
Definition 3: $\mathcal{H}\Delta\mathcal{H}$-distance
Theorem 5
Lemma 6
Lemma 7
...and 13 more
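For reference, Definitions 2 and 3 name the $\mathcal{H}$-divergence and the $\mathcal{H}\Delta\mathcal{H}$-distance. Their standard forms in the domain-adaptation literature (Ben-David et al., 2010), which we assume these definitions follow, are:

```latex
d_{\mathcal{H}}(\mathcal{D},\mathcal{D}') = 2\sup_{h\in\mathcal{H}}
  \left|\Pr_{x\sim\mathcal{D}}[h(x)=1]-\Pr_{x\sim\mathcal{D}'}[h(x)=1]\right|

\mathcal{H}\Delta\mathcal{H} = \{\, g(x)=h(x)\oplus h'(x) \;:\; h,h'\in\mathcal{H} \,\}

d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D},\mathcal{D}') = 2\sup_{h,h'\in\mathcal{H}}
  \left|\Pr_{x\sim\mathcal{D}}[h(x)\neq h'(x)]-\Pr_{x\sim\mathcal{D}'}[h(x)\neq h'(x)]\right|
```

Here $\oplus$ denotes XOR, so $g(x)=1$ exactly where $h$ and $h'$ disagree.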
