When and Where to Reset Matters for Long-Term Test-Time Adaptation

Taejun Lim; Joong-Won Hwang; Kibok Lee

TL;DR

The paper proposes three components: an Adaptive and Selective Reset (ASR) scheme that dynamically determines when and where to reset, an importance-aware regularizer that recovers essential knowledge lost to resets, and an on-the-fly adaptation adjustment scheme that enhances adaptability under challenging domain shifts.

Abstract

When continual test-time adaptation (TTA) persists over the long term, errors accumulate in the model and further cause it to predict only a few classes for all inputs, a phenomenon known as model collapse. Recent studies have explored reset strategies that completely erase these accumulated errors. However, their periodic resets lead to suboptimal adaptation, as they occur independently of the actual risk of collapse. Moreover, their full resets cause catastrophic loss of knowledge acquired over time, even though such knowledge could be beneficial in the future. To this end, we propose (1) an Adaptive and Selective Reset (ASR) scheme that dynamically determines when and where to reset, (2) an importance-aware regularizer to recover essential knowledge lost due to reset, and (3) an on-the-fly adaptation adjustment scheme to enhance adaptability under challenging domain shifts. Extensive experiments across long-term TTA benchmarks demonstrate the effectiveness of our approach, particularly under challenging conditions. Our code is available at https://github.com/YonseiML/asr.

Paper Structure (41 sections, 10 equations, 15 figures, 37 tables, 1 algorithm)

Introduction
Related Work
Method
Problem Definition
Motivation
Adaptive and Selective Reset
Importance-Aware Knowledge Recovery
On-the-Fly Adaptation Adjustment
Experiments
Setup
Main Results
Ablation Study
Analysis
Conclusion
Acknowledgments
...and 26 more sections

Figures (15)

Figure 1: Illustrative comparison between a naive reset approach (RDumb; press2024rdumb) and our Adaptive and Selective Reset (ASR) based on the same model (ETA; niu2022efficient). RDumb fully resets parameters at fixed intervals (e.g., every 1000 steps), whereas ASR dynamically decides when and where to reset, achieving more stable (i.e., less abrupt performance drop at each reset) and higher performance. Dotted vertical lines indicate when resets occur.
Figure 2: Overview of our Adaptive and Selective Reset (ASR) scheme, which compares prediction concentration $\mathcal{C}_t$ with its cumulative counterpart $\bar{\mathcal{C}}_{t-1}$ for each test batch from a long domain stream, triggers a reset when $\mathcal{C}_t > \bar{\mathcal{C}}_{t-1}$, indicating that the model is corrupted severely enough to collapse, and determines layers to reset based on $\mathcal{C}_t - \bar{\mathcal{C}}_{t-1}$, which reflects how severely the model is corrupted. On the upper side, icons inside dashed boxes, labeled with numbers, denote class labels. White icons represent correct predictions, while black icons represent incorrect predictions.
Figure 3: Correlation between $\mathcal{C}_t$ and accuracy.
Figure 4: $\sigma(\mu)$ vs. $\mu(\sigma)$.
Figure 5: Robust to Var($|$logit$|$).
...and 10 more figures
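
The reset rule described in the Figure 2 caption (trigger a reset when $\mathcal{C}_t > \bar{\mathcal{C}}_{t-1}$, and choose how much to reset from $\mathcal{C}_t - \bar{\mathcal{C}}_{t-1}$) can be sketched roughly as follows. This is an illustrative sketch under several assumptions, not the authors' implementation: the text above does not define prediction concentration, so it is approximated here as the share of a batch assigned to the most frequent predicted class; the cumulative value is assumed to be an exponential moving average; and the mapping from the gap to a number of layers is assumed to be linear. The names `prediction_concentration`, `num_layers_to_reset`, `ResetController`, and `momentum` are hypothetical.

```python
import math
from collections import Counter


def prediction_concentration(pred_labels):
    """Assumed stand-in for C_t: share of the batch that falls on the
    single most frequent predicted class (1.0 means total collapse)."""
    counts = Counter(pred_labels)
    return max(counts.values()) / len(pred_labels)


def num_layers_to_reset(gap, num_layers):
    """Assumed linear mapping from the gap C_t - C_bar_{t-1} to a layer
    count: a larger gap resets more layers, clamped to [1, num_layers]."""
    return max(1, min(num_layers, math.ceil(gap * num_layers)))


class ResetController:
    """Tracks the cumulative concentration and decides, per batch,
    how many layers to reset (0 means no reset)."""

    def __init__(self, num_layers, momentum=0.9):
        self.num_layers = num_layers
        self.momentum = momentum  # hypothetical EMA coefficient
        self.cumulative = None    # plays the role of C_bar_{t-1}

    def step(self, pred_labels):
        c_t = prediction_concentration(pred_labels)
        if self.cumulative is None:
            # First batch: nothing to compare against yet.
            self.cumulative = c_t
            return 0
        gap = c_t - self.cumulative
        # Reset only when C_t exceeds its cumulative counterpart.
        n_reset = num_layers_to_reset(gap, self.num_layers) if gap > 0 else 0
        # Assumed update rule for the cumulative concentration.
        self.cumulative = (
            self.momentum * self.cumulative + (1 - self.momentum) * c_t
        )
        return n_reset


if __name__ == "__main__":
    controller = ResetController(num_layers=4)
    stream = [[0, 1, 2, 3], [0, 1, 2, 3], [0, 0, 0, 1], [0, 0, 0, 0]]
    for t, batch in enumerate(stream):
        print(t, controller.step(batch))
```

In contrast to a fixed schedule such as the every-1000-steps reset attributed to RDumb in the Figure 1 caption, this controller stays idle while predictions remain spread across classes and resets more layers as predictions concentrate. Which specific layers are reset is not stated above, so the sketch returns only a count.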
