LeanTTA: A Backpropagation-Free and Stateless Approach to Quantized Test-Time Adaptation on Edge Devices

Cynthia Dong, Hong Jia, Young D. Kwon, Georgios Rizos, Cecilia Mascolo

TL;DR

LeanTTA addresses the challenge of robust test-time adaptation on resource-constrained edge devices by introducing a backpropagation-free, stateless framework that dynamically updates quantized normalization statistics per data point. It combines per-sample statistic stabilization, Mahalanobis-distance-based balancing, and a partial fusion strategy to enable fast, memory-efficient adaptation without historical data, suitable for 8-bit models. Key contributions include a four-step statistic processing pipeline, a per-sample reset mechanism to prevent model collapse, and demonstrated gains across CIFAR10/100-C and real-world audio datasets with edge hardware constraints, achieving up to $15.7\%$ error reduction and peak memory as low as $11.2$ MB. This approach substantially lowers the barrier to reliable on-device learning under abrupt and gradual domain shifts, offering practical impact for real-time, low-power applications in diverse sensor environments.

Abstract

While there are many advantages to deploying machine learning models on edge devices, the resource constraints of mobile platforms, the dynamic nature of the environment, and differences between the distributions of training and in-the-wild data make such deployments challenging. Current test-time adaptation methods are often memory-intensive and not designed to be quantization-compatible or deployed on low-resource devices. To address these challenges, we present LeanTTA, a novel backpropagation-free and stateless framework for quantized test-time adaptation tailored to edge devices. Our approach minimizes computational costs by dynamically updating normalization statistics without backpropagation, which frees LeanTTA from the common pitfall of relying on large batches and historical data, making our method robust to realistic deployment scenarios. Our approach is the first to enable further computational gains by combining partial adaptation with quantized module fusion. We validate our framework across sensor modalities, demonstrating significant improvements over state-of-the-art TTA methods, including a 15.7% error reduction, peak memory usage of only 11.2MB for ResNet18, and fast adaptation within an order of magnitude of normal inference speeds on-device. LeanTTA provides a robust solution for achieving the right trade-offs between accuracy and system efficiency in edge deployments, addressing the unique challenges posed by limited data and varied operational conditions.

Paper Structure

This paper contains 40 sections, 9 equations, 8 figures, 7 tables, 1 algorithm.

Figures (8)

  • Figure 1: LeanTTA comparison against SOTA TTA methods under abruptly changing distribution shifts. Circle size is scaled to the maximum memory required for adaptation. Labels indicate batch size. For methods where models collapsed at batch size one, results were omitted. Results below the baseline indicate that adaptation worsened accuracy. The data consist of shuffled images sampled across all distributions; the sampling method is introduced in the appendix subsection on datasets.
  • Figure 2: Proposed updated normalization layer. For each intermediate activation, target statistics are recorded and stabilized using source statistics. The Mahalanobis distance, $d$, estimates how far out-of-distribution the stabilized statistics have moved at each layer. The source and target statistics are then recombined according to the scaled distance, and used for normalization.
  • Figure 3: Mean (top) and variance (bottom) recorded from a single layer of ResNet18 on the shift-free CIFAR10 dataset, for a running average with momentum=0.9 (orange), with statistics calculated at each image (blue), and with stabilized statistics (red).
  • Figure 4: Layerwise ablation analysis in two directions: (1) progressively removing adaptation from shallow layers (blue solid line) and (2) progressively adding adaptation to deeper layers (orange dotted line). Results are from full-precision (fp32) ResNet18 and MobileNetV2 evaluated on the Abrupt CIFAR10 dataset.
  • Figure 5: Fused, half-fused, and unfused quantized time-per-iteration for LeanTTA applied to ResNet18 (left) and MobileNetV2 (right) with a 500-image subset of CIFAR10-C and CIFAR100-C. For "Half", batch normalization and convolutional layers are fused in the half of the model closest to the outputs.
  • ...and 3 more figures
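
The normalization layer described in the Figure 2 caption can be illustrated with a minimal sketch. This is not the authors' implementation. The blending rule for stabilization, the diagonal form of the Mahalanobis distance, the exponential mapping from distance to mixing weight, and the names `adapt_and_normalize`, `stabilize`, and `scale` are all assumptions made for illustration. The function keeps no state between calls, which mirrors the stateless, per-sample behavior the abstract describes.

```python
import math


def channel_stats(channel):
    """Mean and biased variance of one channel's activations for a single sample."""
    n = len(channel)
    mean = sum(channel) / n
    var = sum((x - mean) ** 2 for x in channel) / n
    return mean, var


def adapt_and_normalize(activation, src_mean, src_var,
                        stabilize=0.5, scale=1.0, eps=1e-5):
    """Normalize one sample's activation with source/target-mixed statistics.

    activation: list of channels, each a flat list of floats (one sample).
    src_mean, src_var: per-channel source (training-time) statistics.
    stabilize, scale: hypothetical hyperparameters, not taken from the paper.
    """
    # Step 1: record target statistics from this single sample.
    target = [channel_stats(c) for c in activation]

    # Step 2: stabilize the noisy per-sample statistics by pulling them
    # toward the source statistics (assumed convex combination).
    stab_mean = [stabilize * sm + (1 - stabilize) * tm
                 for (tm, _), sm in zip(target, src_mean)]
    stab_var = [stabilize * sv + (1 - stabilize) * tv
                for (_, tv), sv in zip(target, src_var)]

    # Step 3: diagonal Mahalanobis distance of the stabilized mean from the
    # source distribution (assumes independent channels).
    d = math.sqrt(sum((m - sm) ** 2 / (sv + eps)
                      for m, sm, sv in zip(stab_mean, src_mean, src_var)))

    # Step 4: map the scaled distance to a mixing weight in [0, 1) and
    # recombine source and stabilized target statistics.
    w = 1.0 - math.exp(-scale * d)
    mix_mean = [(1 - w) * sm + w * m for sm, m in zip(src_mean, stab_mean)]
    mix_var = [(1 - w) * sv + w * v for sv, v in zip(src_var, stab_var)]

    # Normalize with the recombined statistics.
    out = [[(x - m) / math.sqrt(v + eps) for x in c]
           for c, m, v in zip(activation, mix_mean, mix_var)]
    return out, d, w
```

Under these assumptions, a sample whose statistics match the source yields a distance of zero and falls back to the source statistics, while a shifted sample yields a larger distance and leans more heavily on its own stabilized statistics.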