Parameter-Selective Continual Test-Time Adaptation
Jiaxu Tian, Fan Lyu
TL;DR
The paper addresses continual test-time adaptation (CTTA) under persistent domain shifts, where conventional mean-teacher (MT) methods update all parameters and therefore suffer from error accumulation and forgetting. It proposes Parameter-Selective Mean Teacher (PSMT), which uses Fisher information (FI) to selectively update crucial parameters, combining a selective distillation strategy for the student with a selective EMA update for the teacher. The framework introduces a quadratic FI-based regularizer $\mathcal{L}_{\text{stu}}=\sum_i F_i(\theta_{i,t}-\theta_{i,t-1})^2$ and a masked EMA update $\theta'_{t+1}=\mathbf{m}\theta'_t+(1-\mathbf{m})\bigl(\alpha\theta'_t+(1-\alpha)\theta_{t+1}\bigr)$, where $\theta$ and $\theta'$ are the student and teacher parameters, $\alpha$ is the EMA smoothing coefficient, and the mask $\mathbf{m}$ is obtained by thresholding the FI. Together these allow the model to learn new information while preserving past knowledge. Experiments on CIFAR10-C, CIFAR100-C, and ImageNet-C show state-of-the-art performance across online CTTA, GTTA, and forgetting evaluations, demonstrating the practical impact of FI-guided parameter selection in continual test-time learning.
Abstract
Continual Test-Time Adaptation (CTTA) aims to adapt a pretrained model to ever-changing environments at test time under continuous domain shifts. Most existing CTTA approaches are based on the Mean Teacher (MT) structure, which contains a student model and a teacher model: the student is updated using pseudo-labels from the teacher, and the teacher is then updated by an exponential moving average (EMA) of the student. However, these methods update all parameters of the MT model indiscriminately. As a result, critical parameters that encode knowledge shared across domains may be overwritten, intensifying error accumulation and catastrophic forgetting. In this paper, we introduce the Parameter-Selective Mean Teacher (PSMT) method, which effectively updates the critical parameters within the MT network under domain shifts. First, we introduce a selective distillation mechanism in the student model, which uses past knowledge to regularize novel knowledge, thereby mitigating the impact of error accumulation. Second, to avoid catastrophic forgetting, we create a mask from Fisher information in the teacher model and use it to selectively update parameters via EMA, with preservation measures applied to crucial parameters. Extensive experimental results verify that PSMT outperforms state-of-the-art methods across multiple benchmark datasets. Our code is available at https://github.com/JiaxuTian/PSMT.
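The two selective mechanisms can be illustrated with a minimal sketch on flat parameter lists. This is not the authors' implementation: the function names, the threshold value, the smoothing coefficient `alpha`, and the convention that a mask entry of 1 marks a high-Fisher (crucial) parameter that the teacher keeps unchanged are all assumptions made for illustration.

```python
def fisher_penalty(fisher, theta_now, theta_prev):
    """Quadratic penalty: sum_i F_i * (theta_now_i - theta_prev_i)^2."""
    return sum(f * (a - b) ** 2 for f, a, b in zip(fisher, theta_now, theta_prev))


def fisher_mask(fisher, threshold):
    """Assumed convention: 1.0 marks a crucial parameter (kept), 0.0 a free one."""
    return [1.0 if f >= threshold else 0.0 for f in fisher]


def masked_ema(teacher, student, mask, alpha=0.99):
    """Masked EMA: masked entries keep the old teacher value,
    unmasked entries follow the usual EMA of the student."""
    return [
        m * t + (1.0 - m) * (alpha * t + (1.0 - alpha) * s)
        for t, s, m in zip(teacher, student, mask)
    ]


# Toy values, chosen only to make the arithmetic easy to follow.
fisher = [0.9, 0.1, 0.5]   # hypothetical per-parameter Fisher information
teacher = [1.0, 2.0, 3.0]  # teacher parameters at step t
student = [1.5, 1.0, 3.0]  # student parameters at step t+1

mask = fisher_mask(fisher, threshold=0.5)
new_teacher = masked_ema(teacher, student, mask, alpha=0.9)
penalty = fisher_penalty(fisher, student, teacher)
```

In this toy run only the second parameter falls below the threshold, so it is the only teacher entry that moves toward the student. The penalty grows fastest for changes to parameters with large Fisher information, which is what discourages the student from drifting on them.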
