Memory-efficient Energy-adaptive Inference of Pre-Trained Models on Batteryless Embedded Systems

Pietro Farina; Subrata Biswas; Eren Yıldız; Khakim Akhunov; Saad Ahmed; Bashima Islam; Kasım Sinan Yıldırım

TL;DR

FreeML tackles the problem of deploying pre-trained DNNs on batteryless, memory-constrained devices by introducing two complementary techniques: SparseComp for sparsity-imposed, runtime-aware compression and layer separation, and gNet, a single global early exit that enables anytime outputs without altering the baseline network. The two-phase pipeline first reduces model size to fit stringent device memory, then augments the model with gNet and auto-generates portable C code for intermittent execution on microcontrollers. Across multiple datasets, FreeML achieves up to 95x compression with substantial memory-time-energy benefits and negligible accuracy loss, outperforming prior approaches that rely on multiple models or architecturally modified networks. The framework is released as open-source to facilitate practical deployment on batteryless platforms and could extend to deeper networks and varied hardware through planned enhancements. Overall, FreeML demonstrates a practical path to memory- and energy-efficient, energy-adaptive on-device AI for intermittent power environments.

Abstract

Batteryless systems frequently face power failures, requiring extra runtime buffers to maintain inference progress and leaving only limited memory space for storing ultra-tiny deep neural networks (DNNs). Moreover, making these models responsive to stochastic energy harvesting dynamics during inference requires a balance between inference accuracy, latency, and energy overhead. Recent works on compression mostly focus on time and memory, but often ignore energy dynamics or significantly reduce the accuracy of pre-trained DNNs. Existing energy-adaptive inference works modify the architecture of pre-trained models and incur significant memory overhead. Thus, energy-adaptive and accurate inference of pre-trained DNNs is more challenging on batteryless devices with extreme memory constraints than on traditional microcontrollers. We address these issues by proposing FreeML, a framework to optimize pre-trained DNN models for memory-efficient and energy-adaptive inference on batteryless systems. FreeML comprises (1) a novel compression technique that reduces the model footprint and runtime memory requirements simultaneously, making models executable on extremely memory-constrained batteryless platforms; and (2) the first early exit mechanism that uses a single exit branch for all exit points to terminate inference at any time, making models energy-adaptive with minimal memory overhead. Our experiments show that FreeML reduces model sizes by up to $95\times$, supports adaptive inference with $2.03\times$ to $19.65\times$ less memory overhead, and provides significant time and energy benefits with only a negligible accuracy drop compared to the state-of-the-art.
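To make the compression idea concrete, the following is a minimal sketch of iterative magnitude pruning with retraining between rounds. It is an illustration under assumptions, not FreeML's SparseComp implementation: the function names `prune_to_sparsity` and `iterative_prune`, the linear sparsity schedule, and the `retrain` callback are all hypothetical, and a layer is simplified to a flat list of weights.

```python
def prune_to_sparsity(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights.

    Returns the pruned weights and a keep-mask (True = weight kept).
    """
    k = int(len(weights) * sparsity)  # number of weights to remove
    # Indices sorted by ascending magnitude; the first k are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:k])
    mask = [i not in pruned for i in range(len(weights))]
    return [w if keep else 0.0 for w, keep in zip(weights, mask)], mask


def iterative_prune(weights, target_sparsity, steps, retrain):
    """Raise sparsity step by step, retraining after every pruning round.

    `retrain` is a hypothetical callback standing in for one or more
    training epochs. It receives dense weights, so previously pruned
    weights may become non-zero again before the next round.
    """
    for step in range(1, steps + 1):
        sparsity = target_sparsity * step / steps  # assumed linear schedule
        weights, _ = prune_to_sparsity(weights, sparsity)
        weights = retrain(weights)
    # Final pruning pass so the result meets the target sparsity exactly.
    return prune_to_sparsity(weights, target_sparsity)


if __name__ == "__main__":
    layer = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2, -0.03, 0.6]
    final, mask = iterative_prune(layer, 0.75, steps=3, retrain=lambda w: list(w))
    print(final, mask)
```

The mask is recomputed in every round rather than frozen, which is one simple way to let a pruned weight return if retraining makes it important again.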

Paper Structure (44 sections, 2 equations, 12 figures, 5 tables, 2 algorithms)

Introduction
DNN Inference on Intermittent Power
SOTA: Deploying Pre-trained DNN Models
Unique Features of FreeML
FreeML for DNN Intermittent Inference
Sparsity-imposed Compression of Models
Overview.
Runtime-Aware Separation of Layers
Iterative Unstructured Pruning
Global Early Exit for Pre-Trained Networks
Overview of gNet
Augmentation with Zero-Padding
Concatenation with Pooling
Classification with a Fully-Connected Layer
Agile Training of gNet
...and 29 more sections

Figures (12)

Figure 1: The SparseComp compression scheme. Smaller weights are pruned after imposing the sparsity constraint. The pruned weights can appear in the next epoch again during re-training. After several iterations, the layer is compressed with a minimal drop in the model accuracy.
Figure 2: General overview of gNet. In this example, early exit occurs at layer $2$; hence $F_1$ and $F_2$ are available, and the later features are zero-padded.
Figure 3: Accuracy of compression.
Figure 4: Effect of training dataset percentage on SparseComp compression.
Figure 5: No. of parameters for gNet and NRT-eP. gNet requires fewer parameters than NRT-eP, reducing memory overhead.
...and 7 more figures
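
The Figure 2 caption and the gNet section titles (zero-padding, concatenation with pooling, classification with a fully-connected layer) suggest how one exit branch can serve every exit point. The sketch below illustrates that idea only; it is not the authors' code. The function names, the choice of global average pooling, and the toy weights are assumptions made for the example.

```python
def global_avg_pool(feature_map):
    """Reduce a feature map (a list of channels) to one value per channel."""
    return [sum(channel) / len(channel) for channel in feature_map]


def build_exit_input(features, channels_per_layer):
    """Build the fixed-size input of the single exit branch.

    `features` holds the feature maps of the layers executed so far.
    Layers that have not run yet contribute zeros, so the input length
    is the same no matter where inference stops.
    """
    vec = []
    for i, n_channels in enumerate(channels_per_layer):
        if i < len(features):
            pooled = global_avg_pool(features[i])
            if len(pooled) != n_channels:
                raise ValueError(f"layer {i}: expected {n_channels} channels")
            vec.extend(pooled)
        else:
            vec.extend([0.0] * n_channels)  # zero-pad missing features
    return vec


def fc_classify(vec, weights, bias):
    """Single fully-connected layer followed by argmax over the logits."""
    logits = [sum(w * x for w, x in zip(row, vec)) + b
              for row, b in zip(weights, bias)]
    return max(range(len(logits)), key=logits.__getitem__)


if __name__ == "__main__":
    channels = [2, 1, 2]  # hypothetical three-layer network
    # Exit after layer 2: F_1 and F_2 are available, F_3 is zero-padded.
    f1 = [[1.0, 3.0], [2.0, 2.0]]
    f2 = [[4.0, 6.0]]
    x = build_exit_input([f1, f2], channels)
    W = [[1, 0, 0, 0, 0], [0, 0, 1, 0, 0]]  # toy weights, two classes
    print(x, fc_classify(x, W, [0.0, 0.0]))
```

Because the classifier input has a fixed length, the same fully-connected weights can be reused at every exit point, which is what keeps the memory overhead of a single exit branch small.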
