Ariel-ML: Computing Parallelization with Embedded Rust for Neural Networks on Heterogeneous Multi-core Microcontrollers

Zhaolan Huang, Kaspar Schleiser, Gyungmin Myung, Emmanuel Baccelli

TL;DR

The paper tackles the challenge of running neural networks on low-power, multi-core microcontrollers by introducing Ariel-ML, a Rust-based toolkit that combines a TinyML pipeline with an embedded runtime and IREE-based cross-compilation to exploit parallelism. It presents a full end-to-end design including a host build system, a Rust-based device runtime, and a greedy multicore scheduler, all integrated with IREE and Ariel OS. Open-source experiments across Arm Cortex-M, RISC-V, and ESP32 show reduced inference latency compared with C/C++ baselines, while incurring some RAM/Flash overhead due to the IREE runtime. The work provides a practical foundation for TinyML practitioners and Rust developers to evaluate and deploy neural models on heterogeneous MCUs in production settings, with reproducible benchmarks and clear future directions for optimization and security.

Abstract

Low-power microcontroller (MCU) hardware is evolving from single-core to predominantly multi-core architectures. In parallel, new embedded software building blocks are increasingly written in Rust, while the dominance of C/C++ fades in this domain. Meanwhile, small artificial neural networks (ANNs) of various kinds are increasingly used in edge AI use cases, where they are deployed and executed directly on low-power MCUs. In this context, both incremental improvements and novel services will have to be continuously retrofitted, using ANN execution in software embedded on sensing/actuating systems already deployed in the field. However, there has so far been no embedded Rust software platform that automates parallelization of inference computation on multi-core MCUs executing arbitrary TinyML models. This paper fills this gap by introducing Ariel-ML, a novel toolkit that combines a generic TinyML pipeline with an embedded Rust software platform and can take full advantage of the multi-core capabilities of various 32-bit microcontroller families (Arm Cortex-M, RISC-V, ESP32). We published the full open-source code of its implementation, which we used to benchmark its capabilities on a zoo of TinyML models. We show that Ariel-ML outperforms prior art in terms of inference latency, as expected, and that it achieves memory footprints comparable to those of pre-existing toolkits using embedded C/C++. Ariel-ML thus provides a useful basis for TinyML practitioners and resource-constrained embedded Rust developers.

Paper Structure

This paper contains 13 sections, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Build pipeline of Ariel-ML. The circled numbers denote the order of execution.
  • Figure 2: Architecture of Ariel-ML on device.
  • Figure 3: Execution model during model inference. The circled numbers denote the order of execution. Wait: synchronization primitive that instructs the executor to suspend until all previously dispatched tasks have finished execution. It: Work Item.
  • Figure 4: Greedy multicore scheduler in Ariel-ML. Each row–vector multiplication in the matrix-vector multiply operation can be decomposed into conflict-free work items and scheduled across different cores. The circled numbers denote the order of execution. It: Work Item.
  • Figure 5: Relative RAM and Flash usage of Ariel-ML with subsystem breakdown on RP2040 (Raspberry Pi Pico 1).
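
The idea in the captions of Figures 3 and 4 (row-wise work items, greedy placement on cores, and a wait step before results are used) can be illustrated with a minimal host-side sketch. This is not Ariel-ML's implementation. The names `WorkItem`, `greedy_assign`, and `matvec_parallel` are hypothetical, the cost model (row length) is an assumption, and `std::thread` stands in for MCU cores, which a `no_std` device runtime would not have available.

```rust
use std::thread;

/// Hypothetical work item: the dot product of one matrix row with the input vector.
/// `cost` is an assumed cost estimate (here simply the row length).
#[derive(Debug, Clone, Copy)]
struct WorkItem {
    row: usize,
    cost: usize,
}

/// Greedy placement: each work item goes to the currently least-loaded core.
fn greedy_assign(items: &[WorkItem], cores: usize) -> Vec<Vec<WorkItem>> {
    let mut queues = vec![Vec::new(); cores];
    let mut load = vec![0usize; cores];
    for &item in items {
        // Ties are resolved in favour of the lowest core index.
        let target = (0..cores)
            .min_by_key(|&c| load[c])
            .expect("at least one core is required");
        load[target] += item.cost;
        queues[target].push(item);
    }
    queues
}

/// Matrix-vector multiply in which every row is an independent work item.
fn matvec_parallel(matrix: &[Vec<f32>], x: &[f32], cores: usize) -> Vec<f32> {
    let items: Vec<WorkItem> = matrix
        .iter()
        .enumerate()
        .map(|(row, r)| WorkItem { row, cost: r.len() })
        .collect();
    let queues = greedy_assign(&items, cores);

    // Each worker returns (row, value) pairs. Rows are disjoint across workers,
    // so no two work items write the same output element.
    let partials: Vec<Vec<(usize, f32)>> = thread::scope(|s| {
        let handles: Vec<_> = queues
            .iter()
            .map(|queue| {
                s.spawn(move || {
                    queue
                        .iter()
                        .map(|item| {
                            let dot = matrix[item.row]
                                .iter()
                                .zip(x)
                                .map(|(a, b)| a * b)
                                .sum::<f32>();
                            (item.row, dot)
                        })
                        .collect::<Vec<_>>()
                })
            })
            .collect();
        // Joining every handle plays the role of the "Wait" step:
        // execution continues only once all dispatched items are done.
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .collect()
    });

    let mut y = vec![0.0f32; matrix.len()];
    for (row, value) in partials.into_iter().flatten() {
        y[row] = value;
    }
    y
}
```

Calling `matvec_parallel(&matrix, &x, 2)` on a three-row matrix places rows 0 and 2 on the first worker and row 1 on the second under this cost model. Collecting per-worker results and merging them after the join keeps the sketch free of shared mutable state. A real runtime would more likely write directly into disjoint slices of the output buffer.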