
FoCTTA: Low-Memory Continual Test-Time Adaptation with Focus

Youbing Hu, Yun Cheng, Zimu Zhou, Anqi Lu, Zhiqiang Cao, Zhijun Li

TL;DR

FoCTTA tackles memory bottlenecks in continual test-time adaptation by shifting adaptation away from Batch Normalization (BN) affine parameters to a small set of adaptation-critical representation layers, identified via a gradient-based metric computed during a warm-up phase. At test time it updates only the top-$K$ representation layers, which enables effective adaptation with small batch sizes and reduced activation storage, and it uses an entropy-based objective with a regularization term to prevent forgetting. Empirically, FoCTTA outperforms state-of-the-art CTTA methods on CIFAR10-C, CIFAR100-C, and ImageNet-C under the same memory constraints, reduces memory usage roughly threefold, and adapts faster. The approach is well suited to memory-limited IoT devices and edge deployments, where both computation and storage are constrained.
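The layer-selection idea above can be illustrated with a small sketch. It assumes the per-layer score is the L2 norm of the gradients accumulated during warm-up; the summary only says the metric is gradient-based, so the exact scoring rule, the layer names, and the gradient values here are hypothetical and chosen purely for illustration.

```python
import math


def layer_sensitivity(grads_per_layer):
    """Score each layer by the L2 norm of its warm-up gradient.

    grads_per_layer maps a layer name to a flat list of gradient values.
    This input format is an assumption made for the sketch.
    """
    return {
        name: math.sqrt(sum(g * g for g in grads))
        for name, grads in grads_per_layer.items()
    }


def select_top_k(scores, k):
    """Return the k layer names with the largest scores, highest first."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


# Toy warm-up gradients for four hypothetical representation layers.
warmup_grads = {
    "block1.conv": [0.30, -0.40],
    "block2.conv": [0.05, 0.00],
    "block3.conv": [0.60, 0.80],
    "block4.conv": [0.01, 0.02],
}

scores = layer_sensitivity(warmup_grads)
trainable = select_top_k(scores, k=2)
print(trainable)
```

In a real model, the layers returned by `select_top_k` would be the only ones left trainable at test time, with every other layer frozen.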

Abstract

Continual adaptation to domain shifts at test time (CTTA) is crucial for enhancing the intelligence of deep-learning-enabled IoT applications. However, prevailing TTA methods, which typically update all batch normalization (BN) layers, exhibit two memory inefficiencies. First, the reliance on BN layers for adaptation necessitates large batch sizes, leading to high memory usage. Second, updating all BN layers requires storing the activations of all BN layers for backpropagation, exacerbating the memory demand. Both factors lead to substantial memory costs, making existing solutions impractical for IoT devices. In this paper, we present FoCTTA, a low-memory CTTA strategy. The key is to automatically identify and adapt a few drift-sensitive representation layers, rather than blindly updating all BN layers. The shift from BN to representation layers eliminates the need for large batch sizes. Also, by updating only adaptation-critical layers, FoCTTA avoids storing excessive activations. This focused adaptation makes FoCTTA memory-efficient while maintaining effective adaptation. Evaluations show that FoCTTA improves the adaptation accuracy over state-of-the-art methods by 4.5%, 4.9%, and 14.8% on CIFAR10-C, CIFAR100-C, and ImageNet-C, respectively, under the same memory constraints. Across various batch sizes, FoCTTA reduces the memory usage threefold on average, while improving the accuracy by 8.1%, 3.6%, and 0.2%, respectively, on the three datasets.
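The two memory factors named in the abstract (batch size and the number of layers whose activations must be kept for backpropagation) can be made concrete with a rough first-order estimate. The sketch below assumes the cached activation memory is simply batch size times the per-sample activation elements of the updated layers times the bytes per element. It ignores weights, optimizer state, and framework overhead, and all layer sizes are hypothetical rather than taken from the paper.

```python
def activation_bytes(batch_size, per_sample_elems, bytes_per_elem=4):
    """Rough bytes needed to cache activations of the updated layers.

    per_sample_elems lists the activation element count per sample for
    each layer being updated (hypothetical values in this sketch).
    """
    return batch_size * sum(per_sample_elems) * bytes_per_elem


# Hypothetical per-sample activation sizes (in elements) for six layers.
all_layers = [16384, 16384, 8192, 8192, 4096, 4096]
focused_layers = all_layers[2:4]  # pretend only two layers are adapted

# Same batch size, fewer updated layers.
full_cost = activation_bytes(batch_size=64, per_sample_elems=all_layers)
fewer_layers_cost = activation_bytes(batch_size=64, per_sample_elems=focused_layers)

# Fewer updated layers and a smaller batch.
small_batch_cost = activation_bytes(batch_size=8, per_sample_elems=focused_layers)

print(full_cost, fewer_layers_cost, small_batch_cost)
```

Under this simplified model the two factors multiply, which is why reducing both the batch size and the number of updated layers compounds the saving. The numbers produced here are illustrative only and do not reproduce the paper's measurements.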

Paper Structure

This paper contains 17 sections, 6 equations, 4 figures, 10 tables.

Figures (4)

  • Figure 1: Evaluation of TENT memory cost and performance across various batch sizes. (a) TENT memory costs at different batch sizes. (b) TENT performance at different batch sizes. The red dashed line represents the performance of the pre-trained model, i.e., without using any CTTA method.
  • Figure 2: We evaluate the performance of selecting the top-$K$ layers of the adaptation model using various metrics on three commonly used CTTA benchmarks. We also show the feature representation of CTTA during adaptation using various metrics on CIFAR10-C with WideResNet-28. All results are on a logarithmic scale and are normalized by a linear transformation with the maximum value set to 0. A larger block index corresponds to a deeper layer. (a) Gradient norm of different blocks. (b) $\ell_1$ norm of different blocks. (c) Weight norm of different blocks. (d) Performance when only the top-$K$ layers of the adaptation model, selected by each metric, are optimized on the different datasets while the other layers are kept frozen. The red dashed line indicates the performance of the original model.
  • Figure 3: The pipeline of our FoCTTA framework. (a) Pre-training, which is agnostic to architecture and pre-training method: any pre-trained model can be used as initialization. (b) Warm-up training, which uses augmented data $x^\prime$ derived from the source data to simulate distribution shifts and computes the sensitivity of each layer in the feature extractor $g_s$ to domain shifts. (c) At test time $t$, continuously changing target data $x^t$ is used as input, and only the adaptation-critical representation layers are optimized.
  • Figure 4: Visualization of the discriminative power of the sample features of different methods on CIFAR10-C. Colors represent sample classes.
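The test-time step in Figure 3(c), combined with the entropy-based objective and regularization term mentioned in the TL;DR, can be sketched as a loss function. This is a minimal illustration, not the paper's exact formulation: the regularizer is assumed to be a squared L2 distance between the current and the source weights of the adapted layers, and the names `adaptation_loss` and `lam` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(logits):
    """Shannon entropy (in nats) of the softmax prediction."""
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0.0)


def adaptation_loss(batch_logits, weights, source_weights, lam=0.1):
    """Mean prediction entropy plus a penalty on drift from source weights.

    The squared-L2 drift penalty is an assumed stand-in for the
    regularization term; the summary does not specify its form.
    """
    mean_entropy = sum(entropy(z) for z in batch_logits) / len(batch_logits)
    drift = sum((w - w0) ** 2 for w, w0 in zip(weights, source_weights))
    return mean_entropy + lam * drift


# A confident prediction has lower entropy than a uniform one.
uniform = entropy([0.0, 0.0])
confident = entropy([5.0, 0.0])

# With unchanged weights the loss reduces to the mean entropy.
loss_no_drift = adaptation_loss([[0.0, 0.0]], [1.0], [1.0])
loss_with_drift = adaptation_loss([[0.0, 0.0]], [1.5], [1.0])
print(uniform, confident, loss_no_drift, loss_with_drift)
```

Minimizing the entropy term pushes predictions on unlabeled target data toward confident outputs, while the drift term discourages the adapted layers from moving far from their source values, which is one common way to limit forgetting.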