
PALM: Pushing Adaptive Learning Rate Mechanisms for Continual Test-Time Adaptation

Sarthak Kumar Maharana, Baoming Zhang, Yunhui Guo

TL;DR

PALM tackles continual test-time adaptation by removing reliance on noisy pseudo-labels and instead using gradient-based prediction uncertainty to select which layers to adapt. It then modulates learning rates for parameters in the selected layers according to a domain-shift sensitivity measure that blends current and past task information, while freezing unselected layers to mitigate catastrophic forgetting. The approach combines a KL-divergence-based uncertainty objective with a sensitivity-driven learning-rate update and a dual-term optimization (entropy plus consistency), enabling selective, efficient adaptation. Empirical results on CIFAR-10C, CIFAR-100C, and ImageNet-C show PALM achieving state-of-the-art performance in continual and gradual test-time adaptation (CTTA/GTTA) with far fewer trainable parameters and lower computational demands than full-model methods.
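The layer-selection step can be illustrated with a small, self-contained sketch. It is not the authors' implementation. It assumes a toy two-layer network (`logits = W2 @ relu(W1 @ x)`), the divergence direction KL(uniform || softmax(logits / T)), an L1 gradient norm, and a "select if norm <= threshold" rule; the function names, the threshold value, and the selection direction are all hypothetical choices made for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_uniform_grad(logits, temperature=1.0):
    """Gradient of KL(uniform || softmax(logits / T)) w.r.t. the logits.

    For C classes this is (p_k - 1/C) / T. No labels or pseudo-labels are used.
    """
    probs = softmax(logits, temperature)
    c = len(logits)
    return [(p - 1.0 / c) / temperature for p in probs]

def matvec(w, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]

def layer_grad_norms(w1, w2, x, temperature=1.0):
    """L1 norms of dL/dW1 and dL/dW2 for the assumed toy network."""
    pre = matvec(w1, x)
    hidden = [max(0.0, a) for a in pre]
    logits = matvec(w2, hidden)
    g_logits = kl_uniform_grad(logits, temperature)
    # dL/dW2[k][j] = g_logits[k] * hidden[j]
    norm_w2 = sum(abs(g * h) for g in g_logits for h in hidden)
    # Backpropagate through W2, then through the ReLU.
    g_hidden = [sum(w2[k][j] * g_logits[k] for k in range(len(w2)))
                for j in range(len(hidden))]
    g_pre = [g if a > 0 else 0.0 for g, a in zip(g_hidden, pre)]
    # dL/dW1[j][i] = g_pre[j] * x[i]
    norm_w1 = sum(abs(g * xi) for g in g_pre for xi in x)
    return {"layer1": norm_w1, "layer2": norm_w2}

def select_layers(norms, threshold):
    """Assumed rule: adapt layers whose gradient norm is at most the threshold."""
    return [name for name, n in norms.items() if n <= threshold]

# Hypothetical weights and input, chosen only to make the numbers easy to follow.
w1 = [[1.0, 0.0], [0.0, 1.0]]
w2 = [[2.0, 0.0], [0.0, 2.0]]
x = [1.0, 0.5]

norms = layer_grad_norms(w1, w2, x)
selected = select_layers(norms, threshold=1.0)
print(selected)  # ['layer2']
```

Layers not returned by `select_layers` would stay frozen in this sketch. In a real framework the per-layer norms would come from automatic differentiation rather than hand-written backpropagation.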

Abstract

Real-world vision models in dynamic environments face rapid shifts in domain distributions, leading to decreased recognition performance. Using unlabeled test data, continual test-time adaptation (CTTA) directly adjusts a pre-trained source discriminative model to these changing domains. A highly effective CTTA method involves applying layer-wise adaptive learning rates for selectively adapting pre-trained layers. However, it suffers from poor estimation of domain shift and from inaccuracies arising from pseudo-labels. This work aims to overcome these limitations by identifying layers for adaptation via quantifying model prediction uncertainty without relying on pseudo-labels. We use the magnitude of gradients as a metric, calculated by backpropagating the KL divergence between the softmax output and a uniform distribution, to select layers for further adaptation. Subsequently, for the parameters exclusively belonging to these selected layers, with the remaining ones frozen, we evaluate their sensitivity to approximate the domain shift and adjust their learning rates accordingly. We conduct extensive image classification experiments on CIFAR-10C, CIFAR-100C, and ImageNet-C, demonstrating the superior efficacy of our method compared to prior approaches.
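The sensitivity-driven learning-rate step can be sketched in the same spirit. This summary does not give the exact formulas, so everything below is an assumption for illustration: sensitivity is taken as the first-order proxy |sum_i theta_i * g_i|, past and current values are blended with an exponential moving average weighted by a smoothing factor `alpha`, and each selected layer's learning rate is `base_lr` scaled by its share of the largest smoothed sensitivity. The helper names and all numbers are hypothetical.

```python
def sensitivity(params, grads):
    """Assumed first-order sensitivity proxy: |sum_i theta_i * g_i|."""
    return abs(sum(p * g for p, g in zip(params, grads)))

def ema_update(previous, current, alpha):
    """Blend the past estimate with the current one; alpha weights the past."""
    return alpha * previous + (1.0 - alpha) * current

def scaled_learning_rates(importance, base_lr):
    """Assumed rule: scale base_lr by each layer's share of the top importance."""
    top = max(importance.values())
    if top == 0.0:
        return {name: 0.0 for name in importance}
    return {name: base_lr * value / top for name, value in importance.items()}

# Hypothetical parameters of two selected layers (kept fixed here for brevity).
params = {"layer1": [0.5, -1.0], "layer2": [1.0, 2.0]}

# Hypothetical gradients observed on two consecutive test batches.
grads_per_batch = [
    {"layer1": [0.4, 0.1], "layer2": [0.1, 0.2]},
    {"layer1": [0.8, 0.1], "layer2": [0.3, 0.0]},
]

alpha = 0.5
base_lr = 1e-3
# Starting the moving average at zero is a simplification of this sketch.
smoothed = {name: 0.0 for name in params}

for grads in grads_per_batch:
    for name in params:
        current = sensitivity(params[name], grads[name])
        smoothed[name] = ema_update(smoothed[name], current, alpha)

learning_rates = scaled_learning_rates(smoothed, base_lr)
```

Only layers that pass the selection step would appear in `params`; frozen layers receive no update. Whether larger sensitivity should raise or lower a layer's learning rate is a design choice that this sketch fixes arbitrarily.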

Paper Structure

This paper contains 11 sections, 11 equations, 3 figures, and 8 tables.

Figures (3)

  • Figure 1: The framework of our proposed method PALM.
  • Figure 2: Illustrations of LR importance $i_{j,n}^t$ across different convolutional blocks (B)/stages (S)/layers (L) during CTTA. [Top row] - Variations for the domain "glass_blur" across datasets. [Bottom row] - Variations for the domain "snow".
  • Figure 3: Ablation results on the smoothing factor $\alpha$ and the temperature $T$.