Efficient Edge Test-Time Adaptation via Latent Feature Coordinate Correction

Xinyu Luo, Jie Liu, Kecheng Chen, Junyi Yang, Bo Ding, Arindam Basu, Haoliang Li

TL;DR

This paper presents TED, a forward-only, gradient-free test-time adaptation method tailored for edge devices. TED updates a compact latent-coordinate vector within the source latent subspace spanned by the top-$k$ principal components. It uses CMA-ES to minimize prediction entropy and align the test latent with the source distribution without altering model parameters. The approach achieves state-of-the-art performance in single-instance TTA across image classification and keyword spotting while reducing computational complexity by up to 63 times and lowering memory use. Its feasibility is demonstrated through deployment on a ZYNQ-7020 edge platform. These results suggest a practical, scalable path toward robust edge AI under real-world distribution shifts.
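Here is a minimal sketch of the latent coordinate correction described above. It assumes latents are plain vectors. The basis $\mathbf{V}_k$ is taken as the top-$k$ right singular vectors of the centered source latents (i.e., PCA), and the adapted latent is parameterized as $z' = z + \mathbf{V}_k c$ with a $k$-dimensional coordinate vector $c$. This parameterization and the helper names `fit_latent_pc_basis` and `correct_latent` are hypothetical illustrations, not the authors' code. The synthetic random data stands in for real model latents.

```python
import numpy as np

def fit_latent_pc_basis(source_latents: np.ndarray, k: int):
    """Return the source latent mean and the top-k principal directions V_k (shape d x k)."""
    mu = source_latents.mean(axis=0)
    centered = source_latents - mu
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mu, vt[:k].T

def correct_latent(z: np.ndarray, v_k: np.ndarray, coords: np.ndarray) -> np.ndarray:
    """Shift latent z only along the principal subspace: z' = z + V_k c (hypothetical parameterization)."""
    return z + v_k @ coords

rng = np.random.default_rng(0)
source = rng.normal(size=(256, 32))      # 256 synthetic source latents of dimension 32
mu, v_k = fit_latent_pc_basis(source, k=4)
z = rng.normal(size=32)                  # one synthetic test latent
z_new = correct_latent(z, v_k, np.array([0.5, -0.2, 0.0, 1.0]))
print(v_k.shape)                         # (32, 4)
```

Only the $k$ entries of `coords` would be searched at test time. The optimization state therefore stays small regardless of the latent dimension.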

Abstract

Edge devices face significant challenges due to limited computational resources and distribution shifts, making efficient and adaptable machine learning essential. Existing test-time adaptation (TTA) methods often rely on gradient-based optimization or batch processing. Because they require backpropagation and incur high computational demands, they are ill-suited to resource-constrained edge scenarios. Gradient-free alternatives address these issues but often suffer from limited learning capacity, lack flexibility, or impose architectural constraints. To overcome these limitations, we propose a novel single-instance TTA method tailored for edge devices (TED). TED performs forward-only coordinate optimization in the principal subspace of the latent representation using the covariance matrix adaptation evolution strategy (CMA-ES). By updating a compact low-dimensional vector, TED not only increases output confidence but also aligns the latent representation more closely with the source latent distribution within the latent principal subspace. This requires no backpropagation and keeps the model parameters frozen, enabling efficient, forgetting-free adaptation with minimal memory and computational overhead. We evaluate TED on image classification and keyword spotting tasks across the ImageNet and Google Speech Commands series of datasets. The experiments show that TED achieves state-of-the-art performance while reducing computational complexity by up to 63 times, offering a practical and scalable solution for real-world edge applications. Furthermore, we successfully deployed TED on the ZYNQ-7020 platform, demonstrating its feasibility and effectiveness for resource-constrained edge devices in real-world deployments.
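The forward-only search can be illustrated with a toy objective. This sketch assumes the loss is the softmax entropy of a frozen linear head plus a weighted squared distance to the source mean, measured in principal-component coordinates. The weight `lam`, the linear head, and the exact form of the alignment term are all assumptions, not the paper's definitions. For brevity the sketch also swaps CMA-ES for a simpler (μ/μ, λ) evolution strategy with isotropic noise and a fixed step-size decay. It therefore shows only the forward-only, population-based pattern, not CMA-ES's covariance and step-size adaptation. The names `tta_loss` and `simple_es` are hypothetical.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max()   # subtract the max for numerical stability
    e = np.exp(shifted)
    return e / e.sum()

def tta_loss(coords, z, v_k, mu, head, lam=0.1):
    """Prediction entropy plus a hypothetical alignment term in PC coordinates."""
    z_new = z + v_k @ coords
    p = softmax(head @ z_new)
    entropy = -np.sum(p * np.log(p + 1e-12))
    pc_offset = v_k.T @ (z_new - mu)  # offset from the source mean, in PC coordinates
    return entropy + lam * np.sum(pc_offset ** 2)

def simple_es(loss_fn, k, iters=30, pop=8, elite=4, sigma=0.5, seed=0):
    """Forward-only (mu/mu, lambda) evolution strategy with isotropic Gaussian noise.
    A simplified stand-in: real CMA-ES also adapts a full covariance matrix and the step size."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(k)
    best_c, best_loss = mean.copy(), loss_fn(mean)
    for _ in range(iters):
        cands = mean + sigma * rng.standard_normal((pop, k))
        losses = np.array([loss_fn(c) for c in cands])  # forward evaluations only
        order = np.argsort(losses)
        mean = cands[order[:elite]].mean(axis=0)        # recombine the elite candidates
        if losses[order[0]] < best_loss:
            best_c, best_loss = cands[order[0]].copy(), losses[order[0]]
        sigma *= 0.95                                   # fixed decay, not adaptive
    return best_c, best_loss

rng = np.random.default_rng(1)
d, k, n_classes = 32, 4, 10
source = rng.normal(size=(256, d))
mu = source.mean(axis=0)
_, _, vt = np.linalg.svd(source - mu, full_matrices=False)
v_k = vt[:k].T
head = rng.normal(size=(n_classes, d))  # stand-in for a frozen classifier head
z = rng.normal(size=d) + 2.0            # a shifted, synthetic "OOD" latent

loss0 = tta_loss(np.zeros(k), z, v_k, mu, head)
c_best, loss_best = simple_es(lambda c: tta_loss(c, z, v_k, mu, head), k)
print(loss0, loss_best)
```

Every candidate is scored with forward computations only, and `head` is never modified. A faithful implementation would replace `simple_es` with a real CMA-ES optimizer.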

Paper Structure

This paper contains 16 sections, 9 equations, 5 figures, 11 tables, 1 algorithm.

Figures (5)

  • Figure 1: Accuracy, computation, and memory comparison of various TTA methods under a single-instance setting on ImageNet-C with ViT-Base.
  • Figure 2: An overview of our proposed TTA method for edge devices (TED). Source samples are used to compute the latent principal-component (PC) basis $\mathbf{V}_k$ during the preparation phase. For a single out-of-distribution (OOD) sample, its latent is updated within the source latent principal subspace to increase prediction confidence and bring it closer to the source latent distribution. This is achieved using a forward-only CMA-ES optimizer, enabling efficient and hardware-friendly TTA.
  • Figure 3: Performance comparison on ImageNet-V2/R/Sketch with ViT-Base regarding Accuracy (%). GF stands for gradient-free. The bold number indicates the best result.
  • Figure 4: Visualization of latent feature alignment. Comparisons of corrupted, original, and TED-adapted latent features in terms of (a) feature mean, (b) feature standard deviation, (c) feature distribution, (d) PCA-projected latent space, (e) Euclidean distance to original features, and (f) cosine similarity to original features. TED effectively aligns OOD latent features with the source domain.
  • Figure 5: GFLOPs, memory usage and running time per sample comparison on ImageNet-C with ViT-Base. GF stands for gradient-free. The bold number indicates the best result.
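Panels (e) and (f) of Figure 4 compare latent features by their Euclidean distance and cosine similarity to the original features. The following is a minimal per-vector sketch of those two measures. The three feature vectors are made-up values for illustration, not data from the paper, and how the paper aggregates over samples is not assumed here.

```python
import numpy as np

def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.linalg.norm(a - b))

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical feature vectors, for illustration only.
original = np.array([1.0, 0.0, 2.0])
corrupted = np.array([1.0, 1.0, 2.0])
adapted = np.array([1.0, 0.5, 2.0])

for name, feats in [("corrupted", corrupted), ("adapted", adapted)]:
    print(name, euclidean_distance(feats, original), cosine_similarity(feats, original))
```

In this toy case the adapted vector is closer to the original by both measures (distance 0.5 vs. 1.0, higher cosine similarity). That is the pattern the figure uses to illustrate alignment.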