MicroFlow: An Efficient Rust-Based Inference Engine for TinyML

Matteo Carnelos, Francesco Pasti, Nicola Bellotto

TL;DR

MicroFlow targets robust TinyML inference on highly constrained MCUs through a compiler-based approach in Rust that enables static memory allocation, paging, and memory-safety guarantees. A host-side compiler generates lean, static code, and the runtime executes a small set of quantized operators without relying on the standard library. Experiments on several 8-bit and 32-bit MCUs show accuracy on par with state-of-the-art engines, a substantially smaller memory footprint than traditional C/C++ engines, and inference that is faster on medium-size networks and comparable on larger ones, demonstrating practical viability for on-device intelligent tasks. The framework is open source and modular, with clear avenues for adding operators and integrating hardware accelerators, advancing the feasibility of robust TinyML on ultra-resource-constrained devices.
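
The TL;DR mentions statically allocated, quantized operators. As a rough illustration only, the sketch below shows an 8-bit fully connected layer in Rust whose buffer sizes are fixed at compile time through const generics, so no heap allocation is needed. It assumes a TFLite-style affine quantization scheme (real = scale * (q - zero_point)) and uses an `f32` rescaling factor for readability; `QParams` and `fully_connected` are hypothetical names, not MicroFlow's API, and the sketch is compiled against `std` (for `f32::round`), unlike a `no_std` runtime.

```rust
// Hypothetical quantization parameters. Assumed TFLite-style affine scheme:
// real_value = scale * (quantized_value - zero_point).
#[derive(Clone, Copy, Debug)]
struct QParams {
    scale: f32,
    zero_point: i32,
}

/// Quantized fully connected layer. Input/output sizes are const generics, so
/// every buffer has a size known at compile time and nothing is heap-allocated.
fn fully_connected<const IN: usize, const OUT: usize>(
    input: &[i8; IN],
    weights: &[[i8; IN]; OUT],
    bias: &[i32; OUT], // assumed quantized with scale = in.scale * w.scale, zero point 0
    in_q: QParams,
    w_q: QParams,
    out_q: QParams,
) -> [i8; OUT] {
    // Rescaling factor from the i32 accumulator domain to the output domain.
    // Integer-only engines use a fixed-point multiplier; f32 keeps this readable.
    let multiplier = in_q.scale * w_q.scale / out_q.scale;
    let mut output = [0i8; OUT];
    for (out, (row, b)) in output.iter_mut().zip(weights.iter().zip(bias.iter())) {
        // Accumulate in i32 so that i8 x i8 products cannot overflow.
        let mut acc: i32 = *b;
        for (x, w) in input.iter().zip(row.iter()) {
            acc += (*x as i32 - in_q.zero_point) * (*w as i32 - w_q.zero_point);
        }
        // Requantize to the output scale, then saturate to the i8 range.
        let q = (acc as f32 * multiplier).round() as i32 + out_q.zero_point;
        *out = q.clamp(i8::MIN as i32, i8::MAX as i32) as i8;
    }
    output
}

fn main() {
    let q = QParams { scale: 0.5, zero_point: 0 };
    let input = [2i8, 4, -2]; // real values [1.0, 2.0, -1.0]
    let weights = [[1i8, 1, 1], [2, 0, -1]];
    let out = fully_connected(&input, &weights, &[0, 0], q, q, q);
    println!("{:?}", out); // real outputs 1.0 and 1.5 -> quantized [2, 3]
}
```

Accumulating in `i32` and saturating only at the end is the usual way to keep 8-bit arithmetic exact until the final requantization step.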

Abstract

In recent years, there has been significant interest in developing machine learning algorithms for embedded systems. This is particularly relevant for bare-metal devices in the Internet of Things, robotics, and industrial applications, which face limited memory, processing power, and storage, and which require extreme robustness. To address these constraints, we present MicroFlow, an open-source TinyML framework for deploying Neural Networks (NNs) on embedded systems using the Rust programming language. MicroFlow's compiler-based inference engine, coupled with Rust's memory safety, makes it suitable for TinyML applications in critical environments. The framework enables the deployment of NNs on highly resource-constrained devices, including bare-metal 8-bit microcontrollers with only 2 kB of RAM. Furthermore, MicroFlow uses less Flash and RAM than other state-of-the-art solutions when deploying NN reference models (i.e., wake-word and person detection), achieving equally accurate but faster inference than existing engines on medium-size NNs, and similar performance on larger ones. The experimental results demonstrate the efficiency and suitability of MicroFlow for deploying TinyML models in critical environments where resources are particularly limited.
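
To make the compiler-based idea concrete, here is a hand-written sketch of the kind of code a model compiler could emit for a toy 2-4-2 network: weights become constant `static` arrays, shapes are encoded in types, and inference is a fixed sequence of calls. Everything here (the weights, `dense_relu`, `predict`, `ACTIVATION_BYTES`) is invented for illustration and does not reproduce MicroFlow's generated code; it also skips requantization to stay short.

```rust
// Hypothetical example of code a model compiler could emit for a tiny 2-4-2
// network. Weights and names are invented; this is not MicroFlow's real output.
use core::mem::size_of;

// Constant model data: on a microcontroller, `static` items like these are
// typically placed in Flash rather than RAM.
static W1: [[i8; 2]; 4] = [[1, -1], [2, 0], [0, 3], [-1, -1]];
static B1: [i32; 4] = [0, 1, -2, 0];
static W2: [[i8; 4]; 2] = [[1, 0, 1, 0], [0, 1, 0, 1]];
static B2: [i32; 2] = [0, 0];

// Every shape is fixed, so activation memory can be bounded at compile time
// (input + hidden + output buffers, assuming all are alive at once).
const ACTIVATION_BYTES: usize =
    size_of::<[i32; 2]>() + size_of::<[i32; 4]>() + size_of::<[i32; 2]>();

/// Integer dense layer followed by ReLU (no requantization, for brevity).
fn dense_relu<const IN: usize, const OUT: usize>(
    x: &[i32; IN],
    w: &[[i8; IN]; OUT],
    b: &[i32; OUT],
) -> [i32; OUT] {
    let mut y = *b; // start from a copy of the bias
    for (yo, row) in y.iter_mut().zip(w.iter()) {
        for (xi, wi) in x.iter().zip(row.iter()) {
            *yo += xi * (*wi as i32);
        }
        *yo = (*yo).max(0); // ReLU
    }
    y
}

/// Straight-line inference: the layer sequence was resolved ahead of time,
/// so no graph interpreter runs on the target.
fn predict(input: [i32; 2]) -> [i32; 2] {
    let hidden = dense_relu(&input, &W1, &B1);
    dense_relu(&hidden, &W2, &B2)
}

fn main() {
    println!("{:?}, {} bytes", predict([1, 2]), ACTIVATION_BYTES);
}
```

Because the layer sequence and buffer sizes are fixed before the binary is built, RAM usage can be bounded statically instead of being discovered at run time.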

Paper Structure

This paper contains 46 sections, 30 equations, 11 figures, 6 tables, and 1 algorithm.

Figures (11)

  • Figure 1: Overview of the MicroFlow framework. Given a Neural Network, the host machine generates the Source Code and the network encoded Weights using the MicroFlow Compiler. The target embedded system executes the model using the MicroFlow Runtime module.
  • Figure 2: MicroFlow's compilation steps. The MicroFlow Compiler generates the Source Code that is eventually built by the Rust Compiler, together with the MicroFlow Runtime and the User Code, to produce the Target Binary.
  • Figure 3: Expansion of the macro. The input tokens are expanded by the procedural macro according to the model.
  • Figure 4: Parsing example. The input file is deserialized and parsed to build the internal representation.
  • Figure 5: Example of ownership propagation during the execution of an operator. The input tensor is transferred to the operator, which receives ownership and releases the tensor after execution.
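
The ownership propagation in Figure 5 maps naturally onto Rust's move semantics. A minimal sketch, assuming a hypothetical `Tensor` type and two hypothetical operators (`relu`, `reduce_max`) that are not taken from MicroFlow: each operator takes its input by value, so the compiler rejects any later use of a tensor already handed to an operator, and the buffer is released when its owner goes out of scope.

```rust
/// Hypothetical statically sized tensor. It is deliberately neither `Clone`
/// nor `Copy`, so each tensor has exactly one owner at any time.
struct Tensor<const N: usize> {
    data: [i8; N],
    zero_point: i8,
}

/// ReLU takes ownership of its input and hands the same buffer back,
/// updated in place, so no second buffer is needed.
fn relu<const N: usize>(mut input: Tensor<N>) -> Tensor<N> {
    let zp = input.zero_point; // quantized representation of real 0.0
    for v in input.data.iter_mut() {
        if *v < zp {
            *v = zp;
        }
    }
    input
}

/// A shape-changing operator: it consumes the input, which is dropped
/// (released) when the function returns.
fn reduce_max<const N: usize>(input: Tensor<N>) -> Tensor<1> {
    let largest = input.data.iter().copied().max().unwrap_or(input.zero_point);
    Tensor { data: [largest], zero_point: input.zero_point }
}

fn main() {
    let t = Tensor { data: [-3, 5, 0, 7], zero_point: 0 };
    let t = relu(t); // ownership moves into `relu` and back out
    let m = reduce_max(t); // `t` is moved in and released inside `reduce_max`
    // println!("{:?}", t.data); // would not compile: use of moved value `t`
    println!("{:?}", m.data); // prints [7]
}
```

Since ownership is checked at compile time, this discipline costs nothing at run time, which is what makes it attractive on memory-constrained targets.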