SURGEON: Memory-Adaptive Fully Test-Time Adaptation via Dynamic Activation Sparsity
Ke Ma, Jiaqi Tang, Bin Guo, Fan Dang, Sicong Liu, Zhui Zhu, Lei Wu, Cheng Fang, Ying-Cong Chen, Zhiwen Yu, Yunhao Liu
TL;DR
SURGEON addresses the memory bottleneck of fully test-time adaptation (FTTA) on resource-constrained devices by introducing dynamic activation sparsity, which prunes activations at per-layer dynamic ratios during adaptation. It defines two layer-level importance metrics, Gradient Importance and Layer Activation Memory, combines them into an integrated indicator I_i, and converts that indicator into layer-specific pruning ratios p_i^t to balance learning capacity against memory usage. The approach is architecture-agnostic and requires no modification to the original training procedure. It achieves state-of-the-art accuracy-memory trade-offs on the CIFAR-C, ImageNet-C, and ACDC benchmarks, for both CNN and transformer backbones. Empirically, SURGEON uses less memory than baselines such as BN-stat, TENT, CoTTA, EcoTTA, and MECTA while maintaining or improving accuracy, and it remains effective in real-world deployments on edge hardware under distribution shift.
Abstract
Despite the growing integration of deep models into mobile terminals, the accuracy of these models declines significantly due to various deployment interferences. Test-time adaptation (TTA) has emerged to improve the performance of deep models by adapting them to unlabeled target data online. Yet the significant memory cost, particularly on resource-constrained terminals, impedes the effective deployment of most backward-propagation-based TTA methods. To tackle memory constraints, we introduce SURGEON, a method that substantially reduces memory cost while preserving comparable accuracy improvements during fully test-time adaptation (FTTA), without relying on specific network architectures or modifications to the original training procedure. Specifically, we propose a novel dynamic activation sparsity strategy that directly prunes activations at layer-specific dynamic ratios during adaptation, allowing flexible control of learning ability and memory cost in a data-sensitive manner. Within this strategy, two metrics, Gradient Importance and Layer Activation Memory, determine the layer-wise pruning ratios, reflecting accuracy contribution and memory efficiency, respectively. Experimentally, our method surpasses the baselines by not only reducing memory usage but also achieving superior accuracy, delivering state-of-the-art (SOTA) performance across diverse datasets, architectures, and tasks.
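To make the idea of layer-specific pruning ratios concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not state the formulas, so everything here is an assumption made for illustration: the function names `pruning_ratios` and `prune_activations`, the max-normalization of gradient importance, the memory score `min(memory) / memory_i`, the product used as the integrated indicator, the mapping `1 - I_i / max(I)`, and the choice to drop the smallest-magnitude activations.

```python
def pruning_ratios(grad_importance, activation_memory):
    """Map per-layer scores to per-layer pruning ratios in [0, 1).

    Assumed (hypothetical) scheme: a layer with high gradient importance
    and a small activation footprint gets a low pruning ratio.
    """
    if min(grad_importance) <= 0 or min(activation_memory) <= 0:
        raise ValueError("all scores must be positive in this sketch")
    g_max = max(grad_importance)
    m_min = min(activation_memory)
    # Integrated indicator: normalized gradient importance times an
    # assumed memory-efficiency score (smaller footprint scores higher).
    indicator = [(g / g_max) * (m_min / m)
                 for g, m in zip(grad_importance, activation_memory)]
    i_max = max(indicator)
    # The layer with the highest indicator is not pruned (ratio 0).
    return [1.0 - i / i_max for i in indicator]


def prune_activations(activations, ratio):
    """Zero out the `ratio` fraction of entries with the smallest magnitude."""
    n_prune = int(len(activations) * ratio)
    # Indices in ascending order of magnitude; the first n_prune are dropped.
    order = sorted(range(len(activations)), key=lambda k: abs(activations[k]))
    dropped = set(order[:n_prune])
    return [0.0 if k in dropped else a for k, a in enumerate(activations)]


# Toy example with three layers (values chosen for illustration only).
ratios = pruning_ratios([0.5, 1.0, 0.25], [4.0, 2.0, 8.0])
# ratios == [0.75, 0.0, 0.9375]
sparse = prune_activations([0.1, -2.0, 0.3, 1.5], 0.5)
# sparse == [0.0, -2.0, 0.0, 1.5]
```

In this toy setting, the second layer has the largest gradient importance and the smallest activation footprint, so it keeps all of its activations, while the third layer is pruned most aggressively. In an actual adaptation loop the ratios would be recomputed as the data changes, and the memory saving would come from storing only the retained activations for the backward pass, which this sketch does not model.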
