
Subspace Optimization for Backpropagation-Free Continual Test-Time Adaptation

Damian Sójka, Sebastian Cygert, Marc Masana

Abstract

We introduce PACE, a backpropagation-free continual test-time adaptation system that directly optimizes the affine parameters of normalization layers. Existing derivative-free approaches struggle to balance runtime efficiency with learning capacity, as they either restrict updates to input prompts or require continuous, resource-intensive adaptation regardless of domain stability. To address these limitations, PACE leverages the Covariance Matrix Adaptation Evolution Strategy with the Fastfood projection to optimize high-dimensional affine parameters within a low-dimensional subspace, leading to superior adaptive performance. Furthermore, we enhance the runtime efficiency by incorporating an adaptation stopping criterion and a domain-specialized vector bank to eliminate redundant computation. Our framework achieves state-of-the-art accuracy across multiple benchmarks under continual distribution shifts, reducing runtime by over 50% compared to existing backpropagation-free methods.
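The core mechanism described above can be sketched in a few lines: a small learnable vector is mapped through a fixed random projection into high-dimensional offsets that are added to the frozen affine parameters of the normalization layers. The paper uses the Fastfood transform for this projection; the dense Gaussian projection below is a simplified stand-in, and all dimensions are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 1536   # hypothetical total count of normalization affine parameters
d = 16     # subspace dimensionality (assumed small, per the paper's premise)

# Fixed random projection P: R^d -> R^D.
# Stand-in for the structured Fastfood transform used in the paper.
P = rng.standard_normal((D, d)) / np.sqrt(d)

theta0 = np.ones(D)          # original normalization weights, kept frozen
z = rng.standard_normal(d)   # low-dimensional vector optimized at test time

# Adapted weights: frozen parameters plus the projected subspace offset.
theta = theta0 + P @ z
```

Only `z` is ever optimized, so the search space has dimension `d` rather than `D`, which is what makes derivative-free optimizers such as CMA-ES tractable here.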

Paper Structure

This paper contains 17 sections, 10 equations, 7 figures, 7 tables.

Figures (7)

  • Figure 1: Accuracy versus runtime trade-off on the ImageNet-C benchmark using a ViT-B model, across various adaptation stopping thresholds $\epsilon$ (star marks the default setting $\epsilon\!=\!0.045$). The horizontal dotted line represents the NoAdapt baseline accuracy. Existing BP-free methods typically face a trade-off: they either achieve high accuracy at the expense of computational efficiency or reduce runtime by sacrificing precision. Our approach not only outperforms current baselines but also introduces a tunable mechanism to balance the accuracy-runtime trade-off, preventing the inefficient use of resources for diminishing returns.
  • Figure 2: Observed performance gap. A comparison of updating the affine parameters of normalization layers (Norm.) versus three input prompts (Prompt) for a ViT-B model during test-time adaptation on ImageNet-C. We adapt both using ground-truth labels and an SGD optimizer with varying learning rates. Updating the normalization layers allows the model to more effectively 'correct' the covariate shift at each network depth for all reported learning rate values.
  • Figure 3: Intrinsic dimensionality of continual TTA gradients. The affine parameters of normalization layers from a ViT-B model are optimized via SGD with the loss function from FOA (Niu et al., 2024). The concatenated gradients from all ImageNet-C domains reveal that only 566 components explain 90% of the variance. This highlights the low-dimensional nature of the adaptation space. Note that only the most significant components are shown for clarity. The analysis is based on 11,729 gradient batches.
  • Figure 4: Marginal accuracy gain per unit time across adaptation budgets on ImageNet-C with ViT-B model. For each consecutive pair of adaptation step budgets, we compute the increase in mean accuracy across domains and divide it by the additional estimated wall-clock time required for that interval using our Subspace Adaptation with CMA-ES algorithm. The mean of CMA-ES distribution $\bm{m}$ is used as the evaluated model update. Higher bars indicate more efficient use of adaptation time, while decreasing bars indicate diminishing returns from further adaptation.
  • Figure 5: Diagram of PACE. 1) Subspace Adaptation: we adapt the model by adding a high-dimensional random projection of a small, learnable vector to the model's normalization layer weights. We use the CMA-ES strategy to iteratively evolve a population of these vectors, selecting the one that minimizes the loss on current test samples. 2) Adaptation Stopping: for efficiency, we stop the adaptation when the mean of the distribution optimized by CMA-ES falls below a threshold. Together with the Domain-Specialized Vector Bank, these components form an effective and efficient TTA system.
  • ...and 2 more figures
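The adaptation loop pictured in Figure 5 can be sketched as below. A plain elite-recombination evolution strategy with an isotropic Gaussian stands in for full CMA-ES, and a quadratic toy loss stands in for the test-time loss on current samples; the names (`adapt_domain`, `vector_bank`), the hyperparameters, and the loss are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def adapt_domain(loss_fn, d=8, popsize=16, elite=4, eps=0.045,
                 max_steps=500, seed=0):
    """Evolve a subspace vector; stop once the norm of the search
    distribution's mean drops below eps, mirroring the paper's
    adaptation-stopping criterion (default threshold 0.045)."""
    rng = np.random.default_rng(seed)
    m = rng.standard_normal(d)  # mean of the search distribution
    for _ in range(max_steps):
        # Step size tied to the current mean's norm (a simplification;
        # CMA-ES adapts a full covariance matrix instead).
        sigma = 0.3 * max(np.linalg.norm(m), 1e-2)
        pop = m + sigma * rng.standard_normal((popsize, d))
        losses = np.array([loss_fn(z) for z in pop])
        m = pop[np.argsort(losses)[:elite]].mean(axis=0)  # elite recombination
        if np.linalg.norm(m) < eps:  # adaptation stopping
            break
    return m

# Domain-specialized vector bank: when a previously seen domain recurs,
# its converged vector is reused instead of re-running the adaptation.
vector_bank = {}
for domain in ["gaussian_noise", "fog", "gaussian_noise"]:
    if domain not in vector_bank:
        vector_bank[domain] = adapt_domain(lambda z: float(z @ z))
```

The stopping criterion is what produces the tunable accuracy-runtime trade-off shown in Figure 1: a larger `eps` halts adaptation earlier, trading accuracy for speed.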