Test-Time Model Adaptation with Only Forward Passes
Shuaicheng Niu, Chunyan Miao, Guohao Chen, Pengcheng Wu, Peilin Zhao
TL;DR
This paper tackles test-time adaptation under distribution shift on resource-constrained devices, where backpropagation is impractical and models may be quantized or hard-coded. It introduces Forward-Optimization Adaptation (FOA), which learns a small input prompt with the Covariance Matrix Adaptation (CMA) evolution strategy in a fully online, unsupervised setting, without modifying model weights. A novel fitness function that combines prediction entropy with activation-discrepancy statistics guides prompt learning, and a forward-only Activation Shifting module further aligns test activations with the source domain. Empirically, FOA delivers competitive or superior accuracy and calibration on ImageNet-C, ImageNet-R, ImageNet-V2, and ImageNet-Sketch, with substantial memory reductions, and it applies to 8-bit and 6-bit quantized ViT models. Overall, FOA extends the practical reach of test-time adaptation to edge devices and quantized models, offering a memory-efficient route to robust out-of-distribution generalization.
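To make the fitness function concrete, the sketch below scores one batch by adding the average prediction entropy to a discrepancy between the batch's activation statistics and stored source statistics. It is an illustrative reading of the description above, not the authors' implementation: the function names, the use of the per-dimension mean and standard deviation, the L2 distance, and the weight `lam` are all assumptions. A lower score would indicate a better candidate prompt.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one logit vector."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of one probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_std(features):
    """Per-dimension mean and population standard deviation over a batch."""
    n = len(features)
    dim = len(features[0])
    mean = [sum(f[d] for f in features) / n for d in range(dim)]
    std = [
        math.sqrt(sum((f[d] - mean[d]) ** 2 for f in features) / n)
        for d in range(dim)
    ]
    return mean, std


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fitness(batch_logits, batch_features, source_mean, source_std, lam=1.0):
    """Hypothetical fitness: mean entropy + lam * statistic discrepancy.

    `lam` is an assumed trade-off weight, not a value taken from the paper.
    """
    avg_entropy = sum(entropy(softmax(l)) for l in batch_logits) / len(batch_logits)
    mean, std = mean_std(batch_features)
    discrepancy = l2(mean, source_mean) + l2(std, source_std)
    return avg_entropy + lam * discrepancy
```

Because the score depends only on model outputs and activations, it can be computed from forward passes alone, which is what allows a derivative-free optimizer to rank candidate prompts without gradients.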
Abstract
Test-time adaptation has proven effective in adapting a given trained model to unseen test samples with potential distribution shifts. However, in real-world scenarios, models are usually deployed on resource-limited devices, e.g., FPGAs, and are often quantized and hard-coded with non-modifiable parameters for acceleration. In light of this, existing methods are often infeasible, since they depend heavily on computation-intensive backpropagation for model updating, which may not be supported. To address this, we propose a test-time Forward-Optimization Adaptation (FOA) method. In FOA, we seek to learn solely a newly added prompt (as the model's input) via a derivative-free covariance matrix adaptation evolution strategy. To make this strategy work stably under our online unsupervised setting, we devise a novel fitness function that measures the test-training statistic discrepancy and the model prediction entropy. Moreover, we design an activation shifting scheme that directly tunes the model activations for shifted test samples, aligning them with the source training domain and thereby further enhancing adaptation performance. Without using any backpropagation or altering model weights, FOA running on a quantized 8-bit ViT outperforms gradient-based TENT running on a full-precision 32-bit ViT, while achieving up to a 24-fold memory reduction on ImageNet-C.
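The activation shifting idea can be illustrated with a small forward-only sketch: estimate the mean of the test activations, then translate each activation toward the stored source mean. The abstract does not specify these details, so everything here is an assumption made for illustration, including the exponential moving average used to track the test mean, the step size `gamma`, and the function names.

```python
def update_running_mean(running_mean, batch_features, momentum=0.1):
    """Track the test-domain activation mean with an exponential moving average.

    The moving average and `momentum` are assumed; the first batch
    initializes the estimate directly.
    """
    n = len(batch_features)
    dim = len(batch_features[0])
    batch_mean = [sum(f[d] for f in batch_features) / n for d in range(dim)]
    if running_mean is None:
        return batch_mean
    return [
        (1 - momentum) * r + momentum * b
        for r, b in zip(running_mean, batch_mean)
    ]


def shift_activations(batch_features, source_mean, test_mean, gamma=1.0):
    """Translate each activation along (source_mean - test_mean).

    `gamma` is a hypothetical step size; gamma=1.0 moves the estimated
    test mean exactly onto the source mean.
    """
    direction = [s - t for s, t in zip(source_mean, test_mean)]
    return [[x + gamma * d for x, d in zip(f, direction)] for f in batch_features]


# Usage: two 2-dimensional test activations and a zero source mean.
features = [[3.0, 4.0], [5.0, 6.0]]
test_mean = update_running_mean(None, features)
shifted = shift_activations(features, [0.0, 0.0], test_mean)
```

The shift is a pure translation, so it needs no gradients and leaves the model weights untouched, which is consistent with the forward-only constraint described above.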
