PALM: Pushing Adaptive Learning Rate Mechanisms for Continual Test-Time Adaptation
Sarthak Kumar Maharana, Baoming Zhang, Yunhui Guo
TL;DR
PALM tackles continual test-time adaptation by removing reliance on noisy pseudo-labels and instead using gradient-based prediction uncertainty to select which layers to adapt. It then modulates the learning rates of parameters in the selected layers according to a domain-shift sensitivity measure that blends current and past task information, while freezing unselected layers to mitigate catastrophic forgetting. The approach combines a KL-divergence-based uncertainty objective with a sensitivity-driven learning-rate update and a dual-term optimization objective (entropy plus consistency), enabling selective, efficient adaptation. Empirical results on CIFAR-10C, CIFAR-100C, and ImageNet-C show PALM achieving state-of-the-art performance in continual and gradual test-time adaptation (CTTA/GTTA) with far fewer trainable parameters and lower computational demands than full-model methods.
Abstract
Real-world vision models in dynamic environments face rapid shifts in domain distributions, leading to decreased recognition performance. Using unlabeled test data, continual test-time adaptation (CTTA) directly adjusts a pre-trained source discriminative model to these changing domains. A highly effective CTTA method applies layer-wise adaptive learning rates to selectively adapt pre-trained layers. However, it suffers from poor estimation of domain shift and from inaccuracies arising from pseudo-labels. This work aims to overcome these limitations by identifying layers for adaptation through a measure of model prediction uncertainty that does not rely on pseudo-labels. We use the magnitude of gradients as the metric, calculated by backpropagating the KL divergence between the softmax output and a uniform distribution, to select layers for further adaptation. Subsequently, for the parameters belonging exclusively to these selected layers, with the remaining ones frozen, we evaluate their sensitivity to approximate the domain shift and adjust their learning rates accordingly. We conduct extensive image classification experiments on CIFAR-10C, CIFAR-100C, and ImageNet-C, demonstrating the superior efficacy of our method compared to prior approaches.
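
The layer-selection step described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. Everything here is an assumption made for illustration: a toy two-layer ReLU network without biases, the KL direction KL(u || softmax(z)), the L1 norm as the gradient magnitude, and a hypothetical rule that selects layers whose norm is at or below a hypothetical `threshold` (the abstract does not state the norm, the KL direction, or the selection direction). The helper names `kl_uniform_grad_norms` and `select_layers` are likewise hypothetical.

```python
import math

def softmax(z):
    # Numerically stable softmax over a list of logits.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def matvec(W, x):
    # Matrix-vector product for a list-of-rows matrix.
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def kl_uniform_grad_norms(W1, W2, x):
    """Per-layer L1 gradient norms of KL(u || softmax(z)) for a toy MLP.

    Assumed model: z = W2 @ relu(W1 @ x), with u the uniform distribution.
    """
    pre = matvec(W1, x)
    h = [max(0.0, v) for v in pre]
    p = softmax(matvec(W2, h))
    u = 1.0 / len(p)
    # For KL(u || p) with p = softmax(z), the gradient w.r.t. z is p - u.
    dz = [pi - u for pi in p]
    # dL/dW2 = outer(dz, h); sum of absolute entries gives its L1 norm.
    g2 = sum(abs(d * hv) for d in dz for hv in h)
    # Backpropagate through W2 and the ReLU.
    dh = [sum(W2[i][j] * dz[i] for i in range(len(dz))) for j in range(len(h))]
    dpre = [d if v > 0 else 0.0 for d, v in zip(dh, pre)]
    # dL/dW1 = outer(dpre, x).
    g1 = sum(abs(d * xv) for d in dpre for xv in x)
    return {"layer1": g1, "layer2": g2}

def select_layers(norms, threshold):
    # Hypothetical rule: adapt layers whose norm is at or below the
    # threshold; every other layer would stay frozen.
    return [name for name, n in norms.items() if n <= threshold]

W1 = [[1.0, 0.0], [0.0, 1.0]]
x = [1.0, 2.0]
confident = kl_uniform_grad_norms(W1, [[2.0, 0.0], [0.0, 0.0]], x)
uncertain = kl_uniform_grad_norms(W1, [[0.0, 0.0], [0.0, 0.0]], x)
```

In this toy setting, a perfectly uniform prediction yields zero gradient in every layer, while a more peaked prediction yields larger norms, which is why the gradient magnitude can serve as a pseudo-label-free signal of prediction uncertainty. In the sketch, the learning rates of the selected layers would then be adjusted separately; that sensitivity-based step is not modeled here.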
