MLonMCU: TinyML Benchmarking with Fast Retargeting
Philipp van Kempen, Rafael Stahl, Daniel Mueller-Gritschneder, Ulf Schlichtmann
TL;DR
MLonMCU is a framework-independent, extensible benchmarking flow for TinyML deployments on microcontrollers that enables fast retargeting across frameworks and targets. Its Python-based architecture uses modular components to automate model loading, building, tuning, and execution, and it produces detailed reports and visualizations. The evaluation shows that TVM can achieve lower inference latency with auto-generated kernels, while TFLite Micro often has a smaller memory footprint for complex models. AutoTVM and USMP further optimize RAM usage, though the number of tuning iterations and hardware constraints pose practical challenges. Overall, MLonMCU enables rapid cross-framework comparisons and yields practical insights for deploying TinyML at the edge. Future work could integrate NAS-like model search and power-performance analyses.
Abstract
While there are many ways to deploy machine learning models on microcontrollers, choosing the optimal combination of frameworks and targets for a given application is non-trivial. Automating the end-to-end benchmarking flow is therefore highly relevant. This paper proposes a tool called MLonMCU and demonstrates it by benchmarking the state-of-the-art TinyML frameworks TFLite for Microcontrollers and TVM across a large number of configurations with little effort and in a short amount of time.
