HTVM: Efficient Neural Network Deployment On Heterogeneous TinyML Platforms
Josse Van Delm, Maarten Vandersteegen, Alessio Burrello, Giuseppe Maria Sarda, Francesco Conti, Daniele Jahier Pagliari, Luca Benini, Marian Verhelst
TL;DR
HTVM tackles the deployment of DNNs on heterogeneous, memory-constrained TinyML SoCs by combining TVM's flexible code generation with DORY's memory-aware tiling in an accelerator-aware, ahead-of-time flow. It uses accelerator-aware pattern matching to dispatch work to the digital and analog accelerators of the DIANA SoC, and relies on a BYOC backend to generate optimized accelerator code while managing data movement. On MLPerf Tiny benchmarks run on this real heterogeneous platform, the approach delivers a 120x performance improvement over plain TVM deployment, along with substantial binary-size reductions and near-peak accelerator performance. This work provides an open-source, scalable path for deploying diverse neural networks on mixed-architecture edge devices without online autotuning.
Abstract
Optimal deployment of deep neural networks (DNNs) on state-of-the-art Systems-on-Chip (SoCs) is crucial for tiny machine learning (TinyML) at the edge. The complexity of these SoCs makes deployment non-trivial, as they typically contain multiple heterogeneous compute cores with limited, programmer-managed memory to optimize latency and energy efficiency. We propose HTVM, a compiler that merges TVM with DORY to maximize the utilization of heterogeneous accelerators and minimize data movement. HTVM enables deployment of the MLPerf(TM) Tiny suite on DIANA, an SoC with a RISC-V CPU plus digital and analog compute-in-memory AI accelerators, with a 120x performance improvement over plain TVM deployment.
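
To make the idea of memory-aware tiling concrete, the following is a minimal Python sketch, not the solver used by HTVM or DORY. It assumes a stride-1, unpadded convolution with 8-bit data whose weights stay resident in a programmer-managed scratchpad, and it searches exhaustively for the largest output tile that fits a given byte budget. The names ConvLayer, tile_bytes, and largest_fitting_tile are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ConvLayer:
    """Hypothetical stride-1, unpadded 2D convolution with 1-byte elements."""
    in_channels: int
    out_channels: int
    out_height: int
    out_width: int
    kernel: int  # square kernel size


def tile_bytes(layer: ConvLayer, tile_h: int, tile_w: int) -> int:
    """Scratchpad bytes for one output tile, its input patch, and all weights."""
    # A tile of tile_h x tile_w outputs needs (kernel - 1) extra input rows/columns.
    in_h = tile_h + layer.kernel - 1
    in_w = tile_w + layer.kernel - 1
    input_bytes = layer.in_channels * in_h * in_w
    output_bytes = layer.out_channels * tile_h * tile_w
    # Assumption: the full weight tensor is kept in the scratchpad.
    weight_bytes = layer.out_channels * layer.in_channels * layer.kernel ** 2
    return input_bytes + output_bytes + weight_bytes


def largest_fitting_tile(
    layer: ConvLayer, budget_bytes: int
) -> Optional[Tuple[int, int]]:
    """Return the (height, width) output tile with the most pixels that fits."""
    best = None
    best_area = 0
    for tile_h in range(1, layer.out_height + 1):
        for tile_w in range(1, layer.out_width + 1):
            if tile_bytes(layer, tile_h, tile_w) > budget_bytes:
                break  # wider tiles of the same height only need more memory
            area = tile_h * tile_w
            if area > best_area:
                best, best_area = (tile_h, tile_w), area
    return best  # None if even a 1x1 tile exceeds the budget
```

For example, calling largest_fitting_tile(ConvLayer(16, 32, 32, 32, 3), 16 * 1024) returns a tile smaller than the full 32x32 output, so the layer would have to be processed in several passes with data moved in and out of the scratchpad between them. A production flow would also weigh transfer cost and accelerator utilization, which this sketch ignores.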
