SURGEON: Memory-Adaptive Fully Test-Time Adaptation via Dynamic Activation Sparsity

Ke Ma, Jiaqi Tang, Bin Guo, Fan Dang, Sicong Liu, Zhui Zhu, Lei Wu, Cheng Fang, Ying-Cong Chen, Zhiwen Yu, Yunhao Liu

TL;DR

SURGEON addresses the memory bottleneck of fully test-time adaptation (FTTA) on resource-constrained devices by introducing dynamic activation sparsity, which prunes activations at per-layer dynamic ratios during adaptation. It defines two layer-level metrics, Gradient Importance and Layer Activation Memory, combines them into an integrated indicator I_i, and converts this indicator into layer-specific pruning ratios p_i^t that balance learning capacity against memory usage. The approach is architecture-agnostic and does not require modifications to the original training procedure, achieving state-of-the-art accuracy-memory trade-offs across CIFAR-C, ImageNet-C, and ACDC benchmarks, for both CNN and transformer backbones. Empirically, SURGEON outperforms baselines such as BN-stat, TENT, CoTTA, EcoTTA, and MECTA in memory efficiency while maintaining or improving accuracy, and it remains effective when deployed on real edge hardware under distribution shifts.

Abstract

Despite the growing integration of deep models into mobile terminals, the accuracy of these models declines significantly due to various deployment interferences. Test-time adaptation (TTA) has emerged to improve the performance of deep models by adapting them to unlabeled target data online. Yet, the significant memory cost, particularly in resource-constrained terminals, impedes the effective deployment of most backward-propagation-based TTA methods. To tackle memory constraints, we introduce SURGEON, a method that substantially reduces memory cost while preserving comparable accuracy improvements during fully test-time adaptation (FTTA) without relying on specific network architectures or modifications to the original training procedure. Specifically, we propose a novel dynamic activation sparsity strategy that directly prunes activations at layer-specific dynamic ratios during adaptation, allowing for flexible control of learning ability and memory cost in a data-sensitive manner. Within this strategy, two metrics, Gradient Importance and Layer Activation Memory, are considered to determine the layer-wise pruning ratios, reflecting accuracy contribution and memory efficiency, respectively. Experimentally, our method surpasses the baselines by not only reducing memory usage but also achieving superior accuracy, delivering SOTA performance across diverse datasets, architectures, and tasks.
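
To make the idea of layer-wise pruning ratios concrete, the following minimal Python sketch combines a per-layer gradient score and a per-layer activation memory size into one indicator and maps it to a pruning ratio. This summary does not reproduce the paper's formulas, so everything here is an illustrative assumption: the function names `normalize` and `pruning_ratios`, the use of inverse memory as a proxy for memory efficiency, the product used to combine the two scores, and the mapping from indicator to ratio are hypothetical and should not be read as the authors' definitions.

```python
def normalize(values):
    """Scale positive values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def pruning_ratios(grad_importance, act_memory):
    """Map two per-layer scores to per-layer pruning ratios in [0, 1].

    Assumptions (not taken from the paper):
      * all inputs are strictly positive,
      * memory efficiency is modeled as the inverse of activation size,
      * the two normalized scores are combined by a simple product,
      * the layer with the highest indicator is not pruned at all.
    """
    g = normalize(grad_importance)
    m = normalize([1.0 / a for a in act_memory])
    indicator = [gi * mi for gi, mi in zip(g, m)]
    top = max(indicator)
    # Higher indicator -> lower pruning ratio (more activations kept).
    return [1.0 - i / top for i in indicator]


if __name__ == "__main__":
    # Two hypothetical layers: the second has larger gradients and a
    # smaller activation footprint, so it is pruned less.
    print(pruning_ratios([1.0, 2.0], [4.0, 1.0]))
```

Under these assumptions, a layer that contributes little to the gradient and caches large activations receives a high pruning ratio, which matches the qualitative trade-off described in the abstract.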

Paper Structure

This paper contains 34 sections, 7 equations, 4 figures, 5 tables, 1 algorithm.

Figures (4)

  • Figure 1: The problem of memory-efficient TTA methods. (a) EcoTTA introduces meta networks to adapt the frozen backbone but requires modifications to the original training procedure to warm up these additional blocks. (b) MECTA establishes the updating criterion based on BN layers. (c) Our method, Surgeon, prunes activations at layer-specific dynamic ratios without relying on specific architectures or modifications to the training procedure.
  • Figure 2: Surgeon prunes activations at layer-specific dynamic ratios in a data-sensitive manner during adaptation. In forward propagation, it prunes activations ($A_{i} \rightarrow \dot{A_i}$) before caching them into memory. In backward propagation, these sparse activations are used to calculate the weight gradients $\Delta W_i$ (see the paper's backward-propagation equation). By employing dynamic activation sparsity, Surgeon substantially reduces the memory cost of adaptation while maintaining comparable accuracy in dynamic FTTA scenarios.
  • Figure 3: Mean online error (%) under different global static pruning ratios for TTA on three convolutional networks. The pruning ratio of Surgeon refers to a global static pruning ratio that yields an equivalent cache size.
  • Figure 4: Visualization of normalized importance metrics. Experiments are implemented using WideResNet-28 on CIFAR10-C.
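
The mechanism described in the Figure 2 caption (prune an activation before caching it, then reuse the sparse copy to compute the weight gradient) can be sketched for a single linear layer $y = W a$ in plain Python. This is a sketch under stated assumptions, not the authors' implementation: the helper names `prune_activation` and `weight_gradient` are hypothetical, and keeping the largest-magnitude entries is an assumed selection rule that this summary does not confirm.

```python
def prune_activation(activation, ratio):
    """Return a sparse {index: value} cache of the activation.

    Assumption: entries with the largest magnitude are kept; a fraction
    `ratio` of the entries (rounded down) is dropped.
    """
    n = len(activation)
    keep = n - int(ratio * n)
    order = sorted(range(n), key=lambda j: abs(activation[j]), reverse=True)
    return {j: activation[j] for j in sorted(order[:keep])}


def weight_gradient(grad_output, sparse_activation, in_features):
    """Approximate dW for y = W a using only the cached sparse activation.

    dW[o][j] = grad_output[o] * a[j]; columns whose activation was pruned
    stay at zero.
    """
    grad_w = [[0.0] * in_features for _ in grad_output]
    for o, g in enumerate(grad_output):
        for j, a in sparse_activation.items():
            grad_w[o][j] = g * a
    return grad_w


if __name__ == "__main__":
    a = [0.1, -2.0, 0.5, 3.0]           # dense activation from the forward pass
    cached = prune_activation(a, 0.5)   # only half of the entries are stored
    dW = weight_gradient([1.0, -1.0], cached, len(a))
    print(cached)
    print(dW)
```

Only the sparse cache has to stay in memory between the forward and backward passes, which is where the memory saving comes from; the cost is that the weight gradient becomes an approximation of the dense one.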