Ariel-ML: Computing Parallelization with Embedded Rust for Neural Networks on Heterogeneous Multi-core Microcontrollers
Zhaolan Huang, Kaspar Schleiser, Gyungmin Myung, Emmanuel Baccelli
TL;DR
The paper tackles the challenge of running neural networks on low-power, multi-core microcontrollers by introducing Ariel-ML, a Rust-based toolkit that combines a TinyML pipeline with an embedded runtime and IREE-based cross-compilation to exploit multi-core parallelism. It presents a full end-to-end design, including a host build system, a Rust-based device runtime, and a greedy multicore scheduler, all integrated with IREE and Ariel OS. Experiments with the open-source implementation on Arm Cortex-M, RISC-V, and ESP32 MCUs show reduced inference latency compared with C/C++ baselines, at the cost of some RAM/Flash overhead due to the IREE runtime. The work provides a practical foundation for TinyML practitioners and Rust developers to evaluate and deploy neural models on heterogeneous MCUs in production settings, with reproducible benchmarks and clear future directions for optimization and security.
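The summary above names a greedy multicore scheduler but does not spell out its policy. As an illustration only, the Rust sketch below shows one common greedy heuristic, longest-processing-time-first (LPT) list scheduling. It visits work items in decreasing order of estimated cost and assigns each one to the core with the smallest accumulated load. The function names (`greedy_assign`, `makespan`) and the per-task cost estimates are hypothetical. This is not necessarily the exact policy Ariel-ML implements.

```rust
/// Hypothetical greedy scheduler (LPT heuristic), not Ariel-ML's actual code.
/// Assigns each task, given its estimated cost, to one of `num_cores` cores
/// and returns, per core, the indices of the tasks it will run.
fn greedy_assign(costs: &[u32], num_cores: usize) -> Vec<Vec<usize>> {
    assert!(num_cores > 0, "need at least one core");
    let mut order: Vec<usize> = (0..costs.len()).collect();
    // Largest tasks first; ties broken by original index for determinism.
    order.sort_by(|&a, &b| costs[b].cmp(&costs[a]).then(a.cmp(&b)));

    let mut loads = vec![0u32; num_cores];
    let mut plan = vec![Vec::new(); num_cores];
    for task in order {
        // Pick the currently least-loaded core (lowest index on ties).
        let core = (0..num_cores).min_by_key(|&c| (loads[c], c)).unwrap();
        loads[core] += costs[task];
        plan[core].push(task);
    }
    plan
}

/// Total cost on the busiest core, i.e. the schedule's completion time.
fn makespan(costs: &[u32], plan: &[Vec<usize>]) -> u32 {
    plan.iter()
        .map(|tasks| tasks.iter().map(|&t| costs[t]).sum::<u32>())
        .max()
        .unwrap_or(0)
}

fn main() {
    // Hypothetical per-kernel cost estimates (arbitrary units).
    let costs = [7, 5, 4, 3, 3, 2];
    let plan = greedy_assign(&costs, 2);
    for (core, tasks) in plan.iter().enumerate() {
        println!("core {core}: tasks {tasks:?}");
    }
    println!("makespan = {}", makespan(&costs, &plan));
}
```

With the example costs on two cores, the plan is core 0: tasks [0, 3, 5] and core 1: tasks [1, 2, 4]. Both cores end up with a load of 12, which is half the total work. Greedy list scheduling is cheap enough to run at build time or on-device, and it gives good, though not always optimal, load balance.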
Abstract
Low-power microcontroller (MCU) hardware is currently evolving from single-core architectures to predominantly multi-core architectures. In parallel, new embedded software building blocks are increasingly written in Rust, while the dominance of C/C++ in this domain fades. Meanwhile, small artificial neural networks (ANNs) of various kinds are increasingly deployed in edge AI use cases, where they are executed directly on low-power MCUs. In this context, both incremental improvements and novel services will have to be continuously retrofitted onto sensing/actuating systems already deployed in the field, via ANNs executed in their embedded software. However, until now no embedded Rust software platform has automated the parallelization of inference computation on multi-core MCUs executing arbitrary TinyML models. This paper fills this gap by introducing Ariel-ML, a novel toolkit we designed that combines a generic TinyML pipeline with an embedded Rust software platform able to take full advantage of the multi-core capabilities of various 32-bit microcontroller families (Arm Cortex-M, RISC-V, ESP32). We have published the full open-source code of its implementation, which we used to benchmark its capabilities on a zoo of TinyML models. We show that Ariel-ML outperforms prior art in terms of inference latency, as expected, and that, compared to pre-existing toolkits using embedded C/C++, it achieves comparable memory footprints. Ariel-ML thus provides a useful basis for TinyML practitioners and resource-constrained embedded Rust developers.
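To make "parallelizing inference computation" concrete, here is a minimal data-parallel sketch of a single dense (fully connected) layer. The layer's output rows are split into contiguous chunks, one per worker. It assumes a hosted Rust environment in which `std::thread::scope` stands in for the multicore execution primitives of an embedded OS such as Ariel OS, whose actual API is not shown here. The function name `dense_parallel` and the row-major weight layout are illustrative assumptions, and this is not code from Ariel-ML.

```rust
use std::thread;

/// Hypothetical data-parallel dense layer: y = W·x + b.
/// `weights` is row-major with shape [out_len][in_len]; the output rows are
/// split into contiguous chunks, each computed on its own scoped thread.
fn dense_parallel(weights: &[f32], bias: &[f32], x: &[f32], workers: usize) -> Vec<f32> {
    let in_len = x.len();
    let out_len = bias.len();
    assert_eq!(weights.len(), out_len * in_len, "weight shape mismatch");
    assert!(workers > 0, "need at least one worker");

    let mut y = vec![0.0f32; out_len];
    // Ceiling division so every row is covered when out_len % workers != 0.
    let rows_per_worker = (out_len + workers - 1) / workers;
    if rows_per_worker == 0 {
        return y; // empty layer: chunks_mut(0) would panic
    }

    thread::scope(|s| {
        // Each thread receives a disjoint &mut slice of `y`, so no locking is needed.
        for (chunk_idx, y_chunk) in y.chunks_mut(rows_per_worker).enumerate() {
            let first_row = chunk_idx * rows_per_worker;
            s.spawn(move || {
                for (i, out) in y_chunk.iter_mut().enumerate() {
                    let row = first_row + i;
                    let w_row = &weights[row * in_len..(row + 1) * in_len];
                    let dot: f32 = w_row.iter().zip(x).map(|(w, v)| w * v).sum();
                    *out = dot + bias[row];
                }
            });
        }
    }); // all scoped threads are joined here
    y
}

fn main() {
    // 3 outputs x 2 inputs, row-major.
    let weights = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    let bias = [0.5, 0.5, 0.5];
    let x = [1.0, 1.0];
    let y = dense_parallel(&weights, &bias, &x, 2);
    println!("{y:?}");
}
```

Splitting by output rows means each worker writes a disjoint part of the result. `chunks_mut` lets the borrow checker verify that disjointness at compile time, which is one reason Rust is attractive for this kind of multicore code.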
