TinyDéjàVu: Smaller Memory Footprint & Faster Inference on Sensor Data Streams with Always-On Microcontrollers

Zhaolan Huang, Emmanuel Baccelli

TL;DR

TinyDéjàVu introduces a State-Space Model–based framework to drastically reduce RAM and compute for time-series inference on memory-constrained MCUs. By reformulating temporal operators as SSMs and partitioning networks at a Global Temporal Aggregator boundary, it enables streaming inference with two-stage preheat/streaming execution and aggressive reuse of overlapping window computations. The approach includes a global pooling optimization, circular-buffer SSMs, and optional BF16 precision, achieving up to ~99% RAM reduction and up to ~200x speedups in streaming scenarios, with negligible accuracy loss. Open-source implementations and reproducible hardware benchmarks demonstrate broad applicability across diverse temporal models. These results suggest a practical path to energy-efficient, always-on sensor analytics on ultra-low-resource devices.

Abstract

Always-on sensors are increasingly expected to host a variety of tiny neural networks and to continuously perform inference on time series of the data they sense. To meet lifetime and energy consumption requirements when operating on battery, such hardware uses microcontrollers (MCUs) with a tiny memory budget, e.g., 128 kB of RAM. In this context, optimizing data flows across neural network layers becomes crucial. In this paper, we introduce TinyDéjàVu, a new framework and novel algorithms we designed to drastically reduce the RAM footprint required for inference with various tiny ML models for sensor data time series on typical microcontroller hardware. We publish the implementation of TinyDéjàVu as open source, and we perform reproducible benchmarks on hardware. We show that TinyDéjàVu can save more than 60% of RAM usage and eliminate up to 90% of redundant compute on overlapping sliding window inputs.
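
To make the notion of redundant compute on overlapping sliding windows concrete, the sketch below compares a naive global average pooling that re-reads every window with a streaming variant that keeps a running sum in a circular buffer. It is an illustration only and not taken from the TinyDéjàVu code base: the function names, the use of `collections.deque`, and the sample values are all assumptions made for this example.

```python
from collections import deque


def overlap_fraction(window, stride):
    """Fraction of a window's samples shared with the previous window."""
    return max(window - stride, 0) / window


def windowed_means_naive(samples, window, stride):
    """Recompute the mean of every window from scratch."""
    return [
        sum(samples[i:i + window]) / window
        for i in range(0, len(samples) - window + 1, stride)
    ]


def windowed_means_streaming(samples, window, stride):
    """Same outputs, but each sample is added once and removed once."""
    buf = deque(maxlen=window)  # circular buffer holding the current window
    running = 0.0
    out = []
    for n, x in enumerate(samples, start=1):
        if len(buf) == window:
            running -= buf[0]  # sample about to be overwritten
        buf.append(x)
        running += x
        # Emit once the first window is full, then every `stride` samples.
        if n >= window and (n - window) % stride == 0:
            out.append(running / window)
    return out


samples = [float(i) for i in range(10)]
naive = windowed_means_naive(samples, window=4, stride=2)
streaming = windowed_means_streaming(samples, window=4, stride=2)
shared = overlap_fraction(4, 2)
```

With a window of 4 and a stride of 2, half of each window repeats the previous one, so the naive version reads every shared sample twice while the streaming version touches each sample a constant number of times and only stores one window's worth of state.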

Paper Structure

This paper contains 23 sections, 5 equations, 5 figures, 3 tables, 1 algorithm.

Figures (5)

  • Figure 1: Temporal operators can be expressed as SSMs, which can drastically reduce peak RAM usage. In this example, $k$ and $C$ represent the kernel size and the number of channels, respectively. $P(x)$ denotes the pooling function; $W$ is the kernel weight; $A = \begin{bmatrix} 0 & I \\ 0 & 0 \end{bmatrix}$; $B = [0 \ 0 \ \cdots \ 1]^T$.
  • Figure 2: Graph transformation of TinyDéjàVu. $w_1$ and $w_2$ denote two consecutive, overlapping sliding windows. GTA: Global Temporal Aggregator, Op: Operator.
  • Figure 3: Equivalent SSM of Global Pooling. $N$: window size, $s$: window stride.
  • Figure 4: RAM Usage in kB: Vanilla vs. TinyDéjàVu.
  • Figure 5: Compute latency during the Streaming stage, measured on an stm32f746g-disco board with different overlap rates of sliding windows. All results are normalized to the baseline (preheat) latency reported in the latency baseline table.
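
The shift-register form described in the Figure 1 caption can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the authors' implementation: it uses a single channel ($C = 1$), omits the pooling function $P(x)$, and the names `ConvSSM` and `conv1d_full` are hypothetical. The state holds the last $k$ samples; appending to a bounded `deque` plays the role of $x \leftarrow A x + B u$ (shift by one, insert the new sample at the end), and the output is the product of $W$ with the state.

```python
from collections import deque


def conv1d_full(signal, weights):
    """Reference: valid 1-D correlation over a fully buffered window."""
    k = len(weights)
    return [
        sum(weights[j] * signal[t + j] for j in range(k))
        for t in range(len(signal) - k + 1)
    ]


class ConvSSM:
    """Single-channel temporal convolution as a shift-register SSM.

    Only k samples are kept in memory, regardless of the window length.
    """

    def __init__(self, weights):
        self.weights = list(weights)
        k = len(self.weights)
        self.state = deque([0.0] * k, maxlen=k)  # circular buffer of size k
        self.seen = 0

    def step(self, u):
        # A shifts the state by one slot, B writes u into the last slot.
        self.state.append(u)
        self.seen += 1
        if self.seen < len(self.weights):
            return None  # state not yet filled with real samples
        # Output equation: y = W x
        return sum(w * x for w, x in zip(self.weights, self.state))


signal = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
weights = [0.5, -1.0, 2.0]

ssm = ConvSSM(weights)
streamed = [y for y in (ssm.step(u) for u in signal) if y is not None]
full = conv1d_full(signal, weights)
```

Both paths produce the same outputs, but `conv1d_full` needs the whole window in RAM before it can run, whereas `ConvSSM` consumes one sample at a time and stores only $k$ values. This is the intuition behind the peak RAM reduction mentioned in the caption; the multi-channel case would store $k \times C$ values instead.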