Test-Time Model Adaptation with Only Forward Passes

Shuaicheng Niu, Chunyan Miao, Guohao Chen, Pengcheng Wu, Peilin Zhao

TL;DR

This paper tackles test-time adaptation under distribution shift on resource-constrained devices, where backpropagation is impractical and models may be quantized or hard-coded. It introduces Forward-Optimization Adaptation (FOA), which optimizes a small input prompt with the Covariance Matrix Adaptation Evolution Strategy (CMA) in a fully online, unsupervised setting, without modifying model weights. A novel fitness function that combines prediction entropy with activation-discrepancy statistics guides prompt learning. It is complemented by a forward-only Activation Shifting module that aligns test activations with the source domain. Empirically, FOA achieves competitive or superior accuracy and calibration on ImageNet-C, ImageNet-R, ImageNet-V2, and ImageNet-Sketch, with substantial memory savings, and it applies to 8-bit and 6-bit quantized ViT models. Overall, FOA extends test-time adaptation to edge devices and quantized models, offering a memory-efficient path to robust out-of-distribution generalization.
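The key property of the prompt optimization is that it needs only fitness values from forward passes, with one optimizer update per incoming test batch. The sketch below illustrates that ask/tell loop under stated assumptions. It uses a simplified isotropic (mu/mu_w, lambda) evolution strategy as a stand-in for CMA: full CMA also adapts a covariance matrix and the step size, and neither is done here. The prompt is treated as a flat vector of floats. `SimpleES`, `adapt_online`, and `fitness_fn` are hypothetical names. `fitness_fn` stands in for "run the frozen model on the batch with this prompt and score it".

```python
import math
import random
from typing import Callable, Iterable, List

Vector = List[float]


class SimpleES:
    """Isotropic (mu/mu_w, lambda) evolution strategy: a simplified stand-in for CMA.

    Samples candidates from N(mean, sigma^2 * I) and moves the mean to a weighted
    average of the best half. Covariance and step-size adaptation are omitted.
    """

    def __init__(self, dim: int, sigma: float = 0.1, popsize: int = 8, seed: int = 0):
        self.mean: Vector = [0.0] * dim
        self.sigma = sigma
        self.popsize = popsize
        self.mu = popsize // 2
        # Log-decreasing recombination weights (the usual CMA-ES choice).
        raw = [math.log(self.mu + 0.5) - math.log(i + 1) for i in range(self.mu)]
        total = sum(raw)
        self.weights = [w / total for w in raw]
        self.rng = random.Random(seed)

    def ask(self) -> List[Vector]:
        """Sample `popsize` candidate prompts around the current mean."""
        return [[m + self.sigma * self.rng.gauss(0.0, 1.0) for m in self.mean]
                for _ in range(self.popsize)]

    def tell(self, candidates: List[Vector], fitnesses: List[float]) -> None:
        """Recombine the lowest-fitness (best) candidates into the new mean."""
        order = sorted(range(len(candidates)), key=lambda i: fitnesses[i])
        elite = [candidates[i] for i in order[: self.mu]]
        self.mean = [sum(w * x[j] for w, x in zip(self.weights, elite))
                     for j in range(len(self.mean))]


def adapt_online(es: SimpleES, batches: Iterable,
                 fitness_fn: Callable[[Vector, object], float]) -> Vector:
    """One ES generation per incoming test batch; no gradients are ever computed."""
    for batch in batches:
        prompts = es.ask()
        scores = [fitness_fn(p, batch) for p in prompts]  # forward evaluations only
        es.tell(prompts, scores)
    return es.mean
```

As a toy usage, a quadratic fitness that ignores the batch can replace the model. In that case, `adapt_online(SimpleES(dim=4), range(200), fitness_fn)` drives the mean toward the quadratic's minimizer. Because the optimizer only ranks candidates, the same loop would work unchanged if each fitness call ran a quantized or hard-coded model.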

Abstract

Test-time adaptation has proven effective in adapting a given trained model to unseen test samples with potential distribution shifts. However, in real-world scenarios, models are usually deployed on resource-limited devices, e.g., FPGAs, and are often quantized and hard-coded with non-modifiable parameters for acceleration. In light of this, existing methods are often infeasible, since they heavily depend on computation-intensive backpropagation for model updating, which may not be supported. To address this, we propose a test-time Forward-Optimization Adaptation (FOA) method. In FOA, we learn only a newly added prompt (as the model's input) via a derivative-free covariance matrix adaptation evolution strategy. To make this strategy work stably in our online unsupervised setting, we devise a novel fitness function that measures the test-training statistic discrepancy and the model's prediction entropy. Moreover, we design an activation shifting scheme that directly tunes the model activations for shifted test samples, aligning them with the source training domain and thereby further enhancing adaptation performance. Without any backpropagation or modification of model weights, FOA running on a quantized 8-bit ViT outperforms gradient-based TENT on a full-precision 32-bit ViT, while achieving up to a 24-fold memory reduction on ImageNet-C.
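The fitness function scores a candidate prompt using only quantities available from a forward pass. A minimal sketch follows. It assumes the fitness is the prediction entropy summed over the batch plus a weighted sum, over layers, of L2 distances between the test batch's CLS-feature means and standard deviations and source statistics computed once in advance. The weight `lam` (default 1.0 here) and all function names are hypothetical, and lower fitness is better.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_std(feats: Sequence[Sequence[float]]) -> Tuple[Vector, Vector]:
    """Per-dimension mean and (population) std of a batch of CLS features."""
    n, d = len(feats), len(feats[0])
    mean = [sum(f[j] for f in feats) / n for j in range(d)]
    std = [math.sqrt(sum((f[j] - mean[j]) ** 2 for f in feats) / n) for j in range(d)]
    return mean, std


def fitness(batch_logits, cls_per_layer, source_stats, lam: float = 1.0) -> float:
    """Batch entropy + lam * per-layer CLS statistic discrepancy (lower is better).

    cls_per_layer[i] holds the batch's CLS features at layer i;
    source_stats[i] = (mu_S, sigma_S) is assumed precomputed once on source data.
    """
    ent = sum(entropy(softmax(z)) for z in batch_logits)
    disc = 0.0
    for feats, (mu_s, sigma_s) in zip(cls_per_layer, source_stats):
        mu_t, sigma_t = mean_std(feats)
        disc += math.dist(mu_t, mu_s) + math.dist(sigma_t, sigma_s)
    return ent + lam * disc
```

The entropy term alone can be minimized by collapsing onto confident but wrong predictions. The discrepancy term counteracts this by penalizing prompts that push the test activations away from the source statistics, which is one plausible reading of why the abstract says the combination makes the search stable.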

Paper Structure (17 sections, 8 equations, 4 figures, 17 tables, 1 algorithm)

Figures (4)

  • Figure 1: (a) An illustration of our proposed FOA. For each batch of online incoming test samples, we feed them alongside prompts ${\bf p}$ into the TTA model, and calculate a fitness value that serves as a learning signal, aiding the covariance matrix adaptation (CMA) optimizer in learning the prompts ${\bf p}$. This fitness function is derived from both the prediction entropy and the distribution discrepancy between the testing CLS activations and source CLS activations (calculated once). (b) We further boost the adaptation performance by directly adjusting the activations (before the final MLP head), guiding them from the testing distribution towards the source distribution.
  • Figure 2: Parameter sensitivity analyses of our FOA. Experiments are conducted on ImageNet-C (Gaussian Noise, level 5) with ViT-Base.
  • Figure 3: Visualizations of images in ImageNet and ImageNet-C/V2/R/Sketch, which are directly taken from their original papers.
  • Figure 4: Online accuracy comparison with MEMO [zhang2021memo] on ViT and ImageNet-C (Gaussian noise, severity level 5).
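Figure 1(b) describes shifting the activations before the final MLP head from the test distribution toward the source distribution, using forward computation only. A minimal sketch follows. It assumes the shift is the difference between a precomputed source mean of those features and an exponential moving average of test-batch means, scaled by a step size. `ActivationShifter`, `gamma`, `alpha`, and initializing the moving average from the first batch are hypothetical choices, not details confirmed by the text above.

```python
from typing import List, Optional, Sequence

Vector = List[float]


class ActivationShifter:
    """Shift pre-head features by gamma * (source mean - running test mean)."""

    def __init__(self, source_mean: Sequence[float], gamma: float = 1.0, alpha: float = 0.1):
        self.source_mean = list(source_mean)  # computed once on source data (assumed)
        self.gamma = gamma                    # shifting step size (hypothetical)
        self.alpha = alpha                    # EMA momentum for test statistics (hypothetical)
        self.test_mean: Optional[Vector] = None

    def __call__(self, feats: Sequence[Sequence[float]]) -> List[Vector]:
        n, d = len(feats), len(self.source_mean)
        batch_mean = [sum(f[j] for f in feats) / n for j in range(d)]
        if self.test_mean is None:
            # Assumption: initialize the running test mean from the first batch.
            self.test_mean = batch_mean
        else:
            self.test_mean = [self.alpha * b + (1.0 - self.alpha) * t
                              for b, t in zip(batch_mean, self.test_mean)]
        shift = [self.gamma * (s - t) for s, t in zip(self.source_mean, self.test_mean)]
        # Same shift for every sample; the shifted features then go to the MLP head.
        return [[x + dx for x, dx in zip(f, shift)] for f in feats]
```

Because the shift is a single vector added to the features, it needs no gradients and leaves the weights untouched. This is consistent with the forward-only constraint on quantized or hard-coded models.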