Exploring Human-in-the-Loop Test-Time Adaptation by Synergizing Active Learning and Model Selection

Yushu Li; Yongyi Su; Xulei Yang; Kui Jia; Xun Xu

TL;DR

The paper addresses hyper-parameter sensitivity and continual distribution shifts in test-time adaptation by introducing Human-In-The-Loop Test-Time Adaptation (HILTTA). HILTTA jointly performs active-learning-based sample labeling and model selection over a discrete hyper-parameter pool $\Omega$. The labeled samples serve as a validation set for choosing among the candidate models $\theta^*(\omega)$, $\omega \in \Omega$, and are also used for supervised updates. Key components include anchor deviation regularization, EMA smoothing of the validation objective, and K-Margin sampling, which balances uncertainty and diversity. Across five TTA benchmarks, HILTTA consistently improves over unsupervised TTA, outperforms existing active TTA baselines, and remains compatible with off-the-shelf TTA methods; the code is publicly available.
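The summary names K-Margin sampling but does not spell out its mechanics. As an illustration only, the sketch below assumes one plausible reading of the name: cluster the batch's feature vectors into K groups (K being the per-batch annotation budget) for diversity, then pick from each group the sample whose top-1 and top-2 predicted probabilities are closest (smallest margin) for uncertainty. The helper names (`margin`, `kmeans`, `k_margin_select`) and the deterministic farthest-point seeding are hypothetical choices of ours, not taken from the paper or its repository.

```python
import numpy as np


def margin(probs: np.ndarray) -> np.ndarray:
    """Top-1 minus top-2 predicted probability per sample (small = uncertain)."""
    top2 = np.sort(probs, axis=1)[:, -2:]
    return top2[:, 1] - top2[:, 0]


def farthest_point_init(x: np.ndarray, k: int) -> np.ndarray:
    """Deterministic seeding: start at x[0], then repeatedly add the farthest point."""
    centers = [x[0]]
    d = ((x - x[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        nxt = int(d.argmax())
        centers.append(x[nxt])
        d = np.minimum(d, ((x - x[nxt]) ** 2).sum(axis=1))
    return np.array(centers, dtype=float)


def kmeans(x: np.ndarray, k: int, iters: int = 20) -> np.ndarray:
    """Plain Lloyd's k-means; returns a cluster id for each row of x."""
    centers = farthest_point_init(x, k)
    for _ in range(iters):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):  # keep the old center if a cluster empties
                centers[c] = x[labels == c].mean(axis=0)
    return labels


def k_margin_select(features: np.ndarray, probs: np.ndarray, budget: int) -> list:
    """Hypothetical K-Margin: one lowest-margin sample per k-means cluster."""
    labels = kmeans(features, budget)
    m = margin(probs)
    picked = []
    for c in range(budget):
        idx = np.flatnonzero(labels == c)
        if idx.size:
            picked.append(int(idx[m[idx].argmin()]))
    # If a cluster came out empty, top up with the most uncertain unpicked samples.
    remaining = [int(i) for i in np.argsort(m) if int(i) not in picked]
    picked += remaining[: budget - len(picked)]
    return sorted(picked)


if __name__ == "__main__":
    # A 9-sample batch whose features form three well-separated groups.
    feats = np.array([[0, 0], [0.1, 0], [0, 0.1],
                      [10, 10], [10.1, 10], [10, 10.1],
                      [-10, 10], [-10.1, 10], [-10, 10.1]], dtype=float)
    probs = np.array([[0.9, 0.05, 0.05], [0.5, 0.4, 0.1], [0.7, 0.2, 0.1],
                      [0.8, 0.1, 0.1], [0.6, 0.3, 0.1], [0.45, 0.44, 0.11],
                      [0.34, 0.33, 0.33], [0.9, 0.05, 0.05], [0.6, 0.2, 0.2]])
    print(k_margin_select(feats, probs, budget=3))  # [1, 5, 6]
```

Clustering alone would favor diverse but possibly easy samples, and margin alone would tend to pick several near-duplicate uncertain samples; taking the lowest-margin sample per cluster is one simple way to trade the two off.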

Abstract

Existing test-time adaptation (TTA) approaches typically adapt models using an unlabeled stream of test data. A recent attempt relaxed this assumption by introducing limited human annotation, a setting we refer to in this study as Human-In-the-Loop Test-Time Adaptation (HILTTA). Existing HILTTA studies focus on selecting the most informative samples to label, a.k.a. active learning. In this work, we are motivated by a pitfall of TTA, i.e., sensitivity to hyper-parameters, and propose to approach HILTTA by synergizing active learning and model selection. Specifically, we first select samples for human annotation (active learning) and then use the labeled data to select optimal hyper-parameters (model selection). To prevent the model selection process from overfitting to local distributions, multiple regularization techniques are employed to complement the validation objective. A sample selection strategy is further tailored to balance the needs of active learning and model selection. We demonstrate on 5 TTA datasets that the proposed HILTTA approach is compatible with off-the-shelf TTA methods and that such combinations substantially outperform the state-of-the-art HILTTA methods. Importantly, our method never selects the worst hyper-parameters for any of the off-the-shelf TTA methods evaluated. The source code is available at https://github.com/Yushu-Li/HILTTA.
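The model-selection step described in the abstract can be pictured as scoring every candidate hyper-parameter in the pool on the labeled samples and keeping the one with the lowest score. The sketch below is a hypothetical rendering, not the paper's exact objective: it assumes the score is the cross-entropy on the annotated samples plus a weighted KL divergence from an anchor model's predictions on the current batch (our stand-in for anchor deviation regularization; which model serves as the anchor is left open), smoothed across batches with an exponential moving average. `ModelSelector`, `lam`, and `beta` are illustrative names and defaults.

```python
import numpy as np


def cross_entropy(probs: np.ndarray, labels: np.ndarray, eps: float = 1e-12) -> float:
    """Mean negative log-likelihood of the annotated labels."""
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels] + eps)))


def kl_div(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """Mean KL(p || q) over rows; used here as a stand-in anchor-deviation term."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)))


class ModelSelector:
    """Hypothetical EMA-smoothed model selection over a discrete hyper-parameter pool."""

    def __init__(self, pool, lam: float = 1.0, beta: float = 0.9):
        self.pool = list(pool)      # candidate hyper-parameters, e.g. learning rates
        self.lam = lam              # weight of the anchor-deviation term (assumed)
        self.beta = beta            # EMA momentum (assumed)
        self.ema = {w: None for w in self.pool}

    def step(self, labeled_probs, labels, batch_probs, anchor_probs):
        """Score every candidate on this batch and return the current best one.

        labeled_probs[w]: candidate w's predictions on the annotated samples.
        batch_probs[w]:   candidate w's predictions on the whole current batch.
        anchor_probs:     the anchor model's predictions on the same batch.
        """
        for w in self.pool:
            score = cross_entropy(labeled_probs[w], labels)
            score += self.lam * kl_div(anchor_probs, batch_probs[w])
            prev = self.ema[w]
            self.ema[w] = score if prev is None else self.beta * prev + (1 - self.beta) * score
        return min(self.pool, key=lambda w: self.ema[w])


if __name__ == "__main__":
    sel = ModelSelector(pool=[1e-3, 1e-4])
    labels = np.array([0, 1])
    strong = np.array([[0.9, 0.1], [0.2, 0.8]])
    weak = np.array([[0.6, 0.4], [0.5, 0.5]])
    batch = np.array([[0.7, 0.3], [0.4, 0.6]])
    same = {1e-3: batch, 1e-4: batch}  # zero anchor deviation for both candidates
    print(sel.step({1e-3: strong, 1e-4: weak}, labels, same, batch))  # 0.001
    # One batch on which 1e-4 looks better does not flip the smoothed choice.
    print(sel.step({1e-3: weak, 1e-4: strong}, labels, same, batch))  # 0.001
```

With `beta=0.9`, a single batch on which a normally weaker candidate happens to score better does not flip the choice, which matches the stated motivation of keeping model selection from overfitting to local distributions in a continual stream.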

Paper Structure (29 sections, 8 equations, 10 figures, 15 tables, 1 algorithm)

Introduction
Related Work
Test-Time Adaptation
Active Learning
Methodology
Human-in-the-Loop Test-Time Adaptation Protocol
Model Selection with Sparse Annotation
Design of Validation Objective
Sample Selection for HILTTA
Overall Algorithm for HILTTA
Experiments
Experimental Setting
Evaluations on HILTTA
Ablation & Additional Study
Conclusion

Figures (10)

Figure 1: An illustration of Human-in-the-Loop Test-Time Adaptation framework. Upon collecting a batch of testing data $\{x_i\}_{i=1\cdots N_b}$, we train multiple candidate models $\theta^*(\omega_m)$ using candidate hyper-parameters $\{\omega_m\}$. We further select candidate samples for annotation through K-Margin. Model selection is achieved by choosing the hyper-parameter minimizing validation objective. We repeat until the testing data stream concludes, following a continual TTA framework.
Figure 2: Comparison of active learning strategies under HILTTA, using TENT [wang2020tent] as the unsupervised TTA model adaptation strategy. Average classification error is reported; lower is better.
Figure 3: Performance on the 3D point cloud classification task in ModelNet40-C dataset. Average classification error is reported.
Figure 4: Performance of TENT wang2020tent with and without HIL. Bold lines represent the performance with different hyper-parameter values, while dashed lines indicate the performance with model selection.
Figure 5: (Left) Average selected learning rate per corruption for TENT on ImageNet-C. (Right) Average error rate per corruption for TENT on ImageNet-C.
