Memory-efficient Energy-adaptive Inference of Pre-Trained Models on Batteryless Embedded Systems

Pietro Farina, Subrata Biswas, Eren Yıldız, Khakim Akhunov, Saad Ahmed, Bashima Islam, Kasım Sinan Yıldırım

TL;DR

FreeML tackles the problem of deploying pre-trained DNNs on batteryless, memory-constrained devices by introducing two complementary techniques: SparseComp, for sparsity-imposed, runtime-aware compression and layer separation, and gNet, a single global early exit that enables anytime outputs without altering the baseline network. The two-phase pipeline first reduces the model size to fit stringent device memory, then augments the model with gNet and auto-generates portable C code for intermittent execution on microcontrollers. Across multiple datasets, FreeML achieves up to $95\times$ compression with substantial memory, time, and energy benefits and negligible accuracy loss, outperforming prior approaches that rely on multiple models or architecturally modified networks. The framework is released as open source to facilitate practical deployment on batteryless platforms and could extend to deeper networks and varied hardware through planned enhancements. Overall, FreeML demonstrates a practical path to memory-efficient, energy-adaptive on-device inference under intermittent power.

Abstract

Batteryless systems frequently face power failures, so they need extra runtime buffers to preserve inference progress, which leaves only limited memory space for storing ultra-tiny deep neural networks (DNNs). In addition, making these models responsive to stochastic energy-harvesting dynamics during inference requires balancing inference accuracy, latency, and energy overhead. Recent work on compression mostly focuses on time and memory, but often ignores energy dynamics or significantly reduces the accuracy of pre-trained DNNs. Existing energy-adaptive inference approaches modify the architecture of pre-trained models and incur significant memory overhead. Thus, energy-adaptive and accurate inference of pre-trained DNNs is more challenging on batteryless devices with extreme memory constraints than on traditional microcontrollers. We address these issues with FreeML, a framework that optimizes pre-trained DNN models for memory-efficient and energy-adaptive inference on batteryless systems. FreeML comprises (1) a novel compression technique that reduces the model footprint and runtime memory requirements simultaneously, making models executable on extremely memory-constrained batteryless platforms; and (2) the first early exit mechanism that uses a single exit branch for all exit points to terminate inference at any time, making models energy-adaptive with minimal memory overhead. Our experiments show that FreeML reduces model sizes by up to $95\times$, supports adaptive inference with $2.03\times$ to $19.65\times$ less memory overhead, and provides significant time and energy benefits with only a negligible accuracy drop compared to the state of the art.

Paper Structure

This paper contains 44 sections, 2 equations, 12 figures, 5 tables, 2 algorithms.

Figures (12)

  • Figure 1: The SparseComp compression scheme. Smaller weights are pruned after imposing the sparsity constraint. The pruned weights can appear in the next epoch again during re-training. After several iterations, the layer is compressed with a minimal drop in the model accuracy.
  • Figure 2: General overview of gNet. In this example, early exit occurs at layer 2; hence $F_1$ and $F_2$ are available, and the later features are zero-padded.
  • Figure 3: Accuracy of compression.
  • Figure 4: Effect of training dataset percentage on SparseComp compression.
  • Figure 5: Number of parameters for gNet and NRT-eP. gNet requires fewer parameters than NRT-eP, reducing memory overhead.
  • ...and 7 more figures
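
The captions for Figures 1 and 2 describe two mechanisms that a short sketch may make more concrete: pruning the smallest weights of a layer, and building the input of a single exit branch by zero-padding the features of layers that have not run yet. The Python below is only an illustration under stated assumptions, not the authors' implementation. The function names `prune_smallest` and `gnet_input`, the flat-list feature layout, and the fixed `feature_sizes` are all hypothetical, and plain magnitude pruning stands in for SparseComp's sparsity-constrained training, whose details are not given in this excerpt.

```python
def prune_smallest(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude.

    Stand-in for one pruning step; the sparsity-constrained re-training
    described in Figure 1 is not modelled here.
    """
    k = int(len(weights) * sparsity)  # number of weights to prune
    # Indices ordered by increasing magnitude; the first k are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:k])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


def gnet_input(features, feature_sizes):
    """Concatenate the available layer features and zero-pad the rest.

    features: feature vectors of the layers executed before the exit.
    feature_sizes: assumed length of every layer's feature vector.
    """
    padded = []
    for layer, size in enumerate(feature_sizes):
        if layer < len(features):
            if len(features[layer]) != size:
                raise ValueError(f"layer {layer}: expected {size} values")
            padded.extend(features[layer])
        else:
            # This layer has not been computed yet: fill with zeros.
            padded.extend([0.0] * size)
    return padded


# One pruning step on a toy layer with 50% target sparsity.
weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.02]
pruned_weights = prune_smallest(weights, sparsity=0.5)

# Early exit after layer 2 of a 3-layer toy model: F_1 and F_2 exist,
# the third layer's features are zero-padded.
f1, f2 = [0.3, 0.8], [0.5, 0.1, 0.9]
exit_input = gnet_input([f1, f2], feature_sizes=[2, 3, 4])
```

Because `prune_smallest` returns new values instead of a permanent mask, a later re-training step is free to make a pruned weight non-zero again, which matches the behaviour described in the Figure 1 caption. The fixed-length output of `gnet_input` is what would let one exit branch serve every exit point in this sketch.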