TinyDéjàVu: Smaller Memory Footprint & Faster Inference on Sensor Data Streams with Always-On Microcontrollers
Zhaolan Huang, Emmanuel Baccelli
TL;DR
TinyDéjàVu introduces a State-Space Model–based framework to drastically reduce RAM and compute for time-series inference on memory-constrained MCUs. By reformulating temporal operators as SSMs and partitioning networks at a Global Temporal Aggregator boundary, it enables streaming inference with two-stage preheat/streaming execution and aggressive reuse of overlapping window computations. The approach includes a global pooling optimization, circular-buffer SSMs, and optional BF16 precision, achieving up to ~99% RAM reduction and up to ~200x speedups in streaming scenarios, with negligible accuracy loss. Open-source implementations and reproducible hardware benchmarks demonstrate broad applicability across diverse temporal models. These results suggest a practical path to energy-efficient, always-on sensor analytics on ultra-low-resource devices.
Abstract
Always-on sensors are increasingly expected to host a variety of tiny neural networks and to continuously perform inference on the time series of data they sense. To meet lifetime and energy consumption requirements when operating on battery, such hardware uses microcontrollers (MCUs) with a tiny memory budget, e.g., 128 kB of RAM. In this context, optimizing data flows across neural network layers becomes crucial. In this paper, we introduce TinyDéjàVu, a new framework and novel algorithms we designed to drastically reduce the RAM footprint required for inference with various tiny ML models on sensor data time series, on typical microcontroller hardware. We publish the implementation of TinyDéjàVu as open source, and we perform reproducible benchmarks on hardware. We show that TinyDéjàVu can save more than 60% of RAM usage and eliminate up to 90% of redundant compute on overlapping sliding window inputs.
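To illustrate where the redundant compute on overlapping sliding windows comes from, the following minimal Python sketch contrasts full-window recomputation with a streaming variant that keeps state in small circular buffers. It is not the TinyDéjàVu implementation or its API: the names `naive_window_inference` and `StreamingConvPool`, the single convolution followed by global average pooling, and the stride of one sample per step are all assumptions chosen for illustration.

```python
from collections import deque


def naive_window_inference(window, kernel):
    """Recompute a valid 1D convolution plus global average pooling on a full window."""
    k = len(kernel)
    feats = [
        sum(kernel[j] * window[i + j] for j in range(k))
        for i in range(len(window) - k + 1)
    ]
    return sum(feats) / len(feats)


class StreamingConvPool:
    """Hypothetical streaming equivalent: one new convolution output per sample."""

    def __init__(self, kernel, window_len):
        self.kernel = list(kernel)
        self.n_feats = window_len - len(self.kernel) + 1
        # Circular buffers: deque(maxlen=...) drops the oldest item when full.
        self.taps = deque(maxlen=len(self.kernel))   # last k input samples
        self.feats = deque(maxlen=self.n_feats)      # conv outputs in the window
        self.running_sum = 0.0                       # state of the global pooling

    def push(self, sample):
        """Consume one sample; return the pooled output, or None while preheating."""
        self.taps.append(sample)
        if len(self.taps) < len(self.kernel):
            return None  # preheat: not enough samples for one convolution output
        new_feat = sum(w * x for w, x in zip(self.kernel, self.taps))
        if len(self.feats) == self.n_feats:
            # The oldest feature is about to be evicted: remove it from the sum.
            self.running_sum -= self.feats[0]
        self.feats.append(new_feat)
        self.running_sum += new_feat
        if len(self.feats) < self.n_feats:
            return None  # preheat: first window not complete yet
        return self.running_sum / self.n_feats


if __name__ == "__main__":
    stream = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
    kernel = [1, 0, -1]
    model = StreamingConvPool(kernel, window_len=8)
    for t, sample in enumerate(stream):
        out = model.push(sample)
        if out is not None:
            ref = naive_window_inference(stream[t - 7 : t + 1], kernel)
            print(t, out, ref)
```

In this sketch, each new sample costs one kernel evaluation and a constant-time pooling update, and the state held between steps is the kernel-sized tap buffer plus the buffer of in-window features, whereas the naive version recomputes every convolution output of the window at each step.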
