Parameter-Selective Continual Test-Time Adaptation

Jiaxu Tian; Fan Lyu

TL;DR

The paper addresses continual test-time adaptation (CTTA) under persistent domain shifts, where conventional mean-teacher (MT) methods update all parameters, leading to error accumulation and forgetting. It proposes Parameter-Selective Mean Teacher (PSMT), which uses Fisher information (FI) to selectively update crucial parameters, combining a selective distillation strategy for the student with a selective exponential moving average (EMA) update for the teacher. The framework introduces a quadratic FI-based regularizer $\mathcal{L}_{\text{stu}} = \sum_i F_i (\theta_{i,t} - \theta_{i,t-1})^2$ and a masked EMA update $\theta'_{t+1} = \mathbf{m} \odot \theta'_t + (1 - \mathbf{m}) \odot \bigl(\alpha \theta'_t + (1 - \alpha) \theta_{t+1}\bigr)$ guided by FI thresholds, where $\theta$ and $\theta'$ are the student and teacher parameters, $\mathbf{m}$ is the FI-derived mask, and $\alpha$ is the EMA smoothing coefficient. Together these enable learning of new information while preserving past knowledge. Experiments on CIFAR10C, CIFAR100C, and ImageNet-C show state-of-the-art performance across online CTTA, gradual test-time adaptation (GTTA), and forgetting evaluations, demonstrating the practical impact of FI-guided parameter selection in continual test-time learning.
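
As a rough illustration of the two formulas in the TL;DR, the following pure-Python sketch evaluates the FI-weighted quadratic penalty and the masked EMA step on flat parameter lists. The function names, the smoothing coefficient `alpha`, and the convention that a mask entry of 1 means "keep the teacher value" are assumptions made for illustration; they are not taken from the authors' code.

```python
def fisher_penalty(fisher, theta_now, theta_prev):
    """Quadratic penalty: sum_i F_i * (theta_{i,t} - theta_{i,t-1})^2."""
    return sum(f * (a - b) ** 2 for f, a, b in zip(fisher, theta_now, theta_prev))


def masked_ema(teacher, student, mask, alpha=0.99):
    """Masked EMA (assumed convention): mask 1 keeps the teacher value,
    mask 0 takes an ordinary EMA step toward the student value."""
    updated = []
    for t, s, m in zip(teacher, student, mask):
        ema = alpha * t + (1.0 - alpha) * s  # plain EMA step
        updated.append(m * t + (1 - m) * ema)
    return updated


# Toy values chosen so the arithmetic is easy to follow by hand.
fisher = [4.0, 0.0, 1.0]
theta_prev = [1.0, 2.0, 3.0]
theta_now = [1.5, 0.0, 2.0]
penalty = fisher_penalty(fisher, theta_now, theta_prev)
# 4 * 0.25 + 0 * 4 + 1 * 1 = 2.0

teacher = [1.0, 2.0, 3.0]
student = [3.0, 4.0, 5.0]
mask = [1, 0, 0]
new_teacher = masked_ema(teacher, student, mask, alpha=0.5)
# first entry is preserved; the other two move halfway to the student
```

Note that the second parameter changes the most but contributes nothing to the penalty because its Fisher value is zero, which is the sense in which the regularizer protects only the parameters deemed important.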

Abstract

Continual Test-Time Adaptation (CTTA) aims to adapt a pretrained model to ever-changing environments at test time under continuous domain shifts. Most existing CTTA approaches are based on the Mean Teacher (MT) structure, which contains a student and a teacher model: the student is updated using pseudo-labels from the teacher, and the teacher is then updated by an exponential moving average (EMA) strategy. However, these methods update all parameters of the MT model indiscriminately. As a result, critical parameters that encode knowledge shared across domains may be erased, intensifying error accumulation and catastrophic forgetting. In this paper, we introduce the Parameter-Selective Mean Teacher (PSMT) method, which effectively updates the critical parameters within the MT network under domain shifts. First, we introduce a selective distillation mechanism in the student model, which uses past knowledge to regularize novel knowledge, thereby mitigating the impact of error accumulation. Second, to avoid catastrophic forgetting in the teacher model, we create a mask from Fisher information to selectively update parameters via EMA, with preservation measures applied to crucial parameters. Extensive experimental results verify that PSMT outperforms state-of-the-art methods across multiple benchmark datasets. Our code is available at https://github.com/JiaxuTian/PSMT.
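
The abstract mentions a mask built from Fisher information. One common way to approximate diagonal Fisher information is to average squared per-sample gradients; the sketch below does that and thresholds the result. The helper names, the fixed cutoff value, and the choice to mark high-FI entries with 1 are hypothetical, and the paper's actual estimator and selection rule may differ.

```python
def diagonal_fisher(per_sample_grads):
    """Average squared gradients over samples, one value per parameter."""
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    return [sum(g[i] ** 2 for g in per_sample_grads) / n for i in range(dim)]


def fisher_mask(fisher, threshold):
    """Mark parameters whose Fisher value reaches the (assumed) threshold with 1."""
    return [1 if f >= threshold else 0 for f in fisher]


# Two samples, three parameters; gradients are made-up toy numbers.
grads = [[1.0, 0.0, -2.0], [3.0, 0.0, 2.0]]
fisher = diagonal_fisher(grads)
mask = fisher_mask(fisher, threshold=4.5)
```

Here only the first parameter exceeds the cutoff, so under the assumed convention it would be the one treated as crucial.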

Paper Structure (15 sections, 9 equations, 4 figures, 9 tables, 1 algorithm)

Introduction
Related Work
  Continual Test-Time Adaptation
  Parameter Selection in Neural Network
Method
  Overview
  Student Update using Selective Distillation
  Teacher Update using Selective Exponential Moving Average
  Overall Update
Experiment
  Experimental Setting
  Major Results
  Results on Gradual Test-Time Adaptation
  Ablation Study and Analysis
Conclusion

Figures (4)

Figure 1: (a) Traditional MT-based methods update all parameters, which is problematic for CTTA tasks because it can lead to error accumulation. (b) Our method instead handles crucial parameters selectively, restoring them rather than updating them indiscriminately. The student model focuses on acquiring new knowledge efficiently, while the teacher model is dedicated to reinforcing previous knowledge.
Figure 2: PSMT framework. Test samples are the inputs to the student model and augmented samples are the inputs to the teacher model. PSMT enhances the conventional student model by regularizing newly acquired knowledge with past knowledge, and improves the traditional EMA method by selecting crucial parameters based on Fisher information.
Figure 3: Average error rate on ImageNet-to-ImageNet-C for all corruption types over 10 varied sequences.
Figure 4: Performance of Fisher information in the CoTTA method and in selective EMA (SEMA) during domain shifts on the CIFAR10-to-CIFAR10C dataset.
