Memory-efficient Energy-adaptive Inference of Pre-Trained Models on Batteryless Embedded Systems
Pietro Farina, Subrata Biswas, Eren Yıldız, Khakim Akhunov, Saad Ahmed, Bashima Islam, Kasım Sinan Yıldırım
TL;DR
FreeML tackles the problem of deploying pre-trained DNNs on batteryless, memory-constrained devices by introducing two complementary techniques: SparseComp, for sparsity-imposed, runtime-aware compression and layer separation, and gNet, a single global early exit that enables anytime outputs without altering the baseline network. The two-phase pipeline first reduces model size to fit stringent device memory, then augments the model with gNet and auto-generates portable C code for intermittent execution on microcontrollers. Across multiple datasets, FreeML achieves up to 95x compression with substantial memory, time, and energy benefits and negligible accuracy loss, outperforming prior approaches that rely on multiple models or architecturally modified networks. The framework is released as open source to facilitate practical deployment on batteryless platforms and could extend to deeper networks and varied hardware through planned enhancements. Overall, FreeML demonstrates a practical path to memory-efficient, energy-adaptive on-device AI for intermittent power environments.
Abstract
Batteryless systems frequently face power failures, requiring extra runtime buffers to maintain inference progress and leaving only a small memory space for storing ultra-tiny deep neural networks (DNNs). Moreover, making these models responsive to stochastic energy harvesting dynamics during inference requires balancing inference accuracy, latency, and energy overhead. Recent works on compression mostly focus on time and memory, but often ignore energy dynamics or significantly reduce the accuracy of pre-trained DNNs. Existing energy-adaptive inference works modify the architecture of pre-trained models and incur significant memory overhead. Thus, energy-adaptive and accurate inference of pre-trained DNNs is more challenging on batteryless devices with extreme memory constraints than on traditional microcontrollers. We address these issues by proposing FreeML, a framework to optimize pre-trained DNN models for memory-efficient and energy-adaptive inference on batteryless systems. FreeML comprises (1) a novel compression technique that reduces the model footprint and runtime memory requirements simultaneously, making models executable on extremely memory-constrained batteryless platforms; and (2) the first early exit mechanism that uses a single exit branch for all exit points to terminate inference at any time, making models energy-adaptive with minimal memory overhead. Our experiments show that FreeML reduces model sizes by up to $95\times$, supports adaptive inference with $2.03$-$19.65\times$ less memory overhead, and provides significant time and energy benefits with only a negligible accuracy drop compared to the state of the art.
