MATCH: Model-Aware TVM-based Compilation for Heterogeneous Edge Devices

Mohamed Amine Hamdi, Francesco Daghero, Giuseppe Maria Sarda, Josse Van Delm, Arne Symons, Luca Benini, Marian Verhelst, Daniele Jahier Pagliari, Alessio Burrello

TL;DR

MATCH addresses the challenge of deploying DNNs on heterogeneous edge MCUs by integrating a model-aware DNN compiler with TVM, using a design space exploration (DSE) engine (ZigZag/LOMA) and accelerator-aware pattern matching. It introduces a layer-template code generation flow and a target-API abstraction that decouples hardware specifics from compilation, so that new HW targets can be added quickly. On the MLPerf Tiny models, MATCH reduces latency by up to 60.88 times compared with plain TVM on DIANA, and it matches or outperforms HW-specific toolchains (16.94% lower latency than HTVM on DIANA, 2.15 times better latency than DORY on GAP9) while being much easier to extend. The approach lowers delivery risk for new edge accelerators and shows strong potential for practical TinyML deployments on diverse SoCs.

Abstract

Streamlining the deployment of Deep Neural Networks (DNNs) on heterogeneous edge platforms, which couple instruction processors and hardware accelerators for tensor computations within the same microcontroller unit (MCU), is becoming one of the crucial challenges of the TinyML field. The best-performing DNN compilation toolchains are usually deeply customized for a single MCU family, and porting them to a different heterogeneous MCU family implies labor-intensive redevelopment of almost the entire compiler. Conversely, retargetable toolchains, such as TVM, fail to exploit the capabilities of custom accelerators and therefore generate general but unoptimized code. To overcome this dichotomy, we introduce MATCH, a novel TVM-based DNN deployment framework designed for easy, agile retargeting across different MCU processors and accelerators, thanks to a customizable model-based hardware abstraction. We show that a general and retargetable mapping framework enhanced with hardware cost models can compete with and even outperform custom toolchains on diverse targets, while requiring only the definition of an abstract hardware model and a SoC-specific API. We tested MATCH on two state-of-the-art heterogeneous MCUs, GAP9 and DIANA. On the four DNN models of the MLPerf Tiny suite, MATCH reduces inference latency by up to 60.88 times on DIANA compared to plain TVM, thanks to the exploitation of the on-board HW accelerator. Compared to HTVM, a fully customized toolchain for DIANA, we still reduce the latency by 16.94%. On GAP9, using the same benchmarks, we improve the latency by 2.15 times compared to the dedicated DORY compiler, thanks to our heterogeneous DNN mapping approach that synergistically exploits the DNN accelerator and the eight-core cluster available on board.
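
The abstract reports its results in two different units: speedup factors ("60.88 times", "2.15 times") and a percentage reduction ("16.94%"). The short sketch below puts them on a common scale. It assumes that "reduces latency by N times" means the ratio of baseline latency to MATCH latency; the function names are illustrative and do not come from the paper or from MATCH itself. Under that assumption, a 16.94% reduction corresponds to a speedup of roughly 1.20 times, while speedups of 60.88 times and 2.15 times correspond to latency reductions of roughly 98.4% and 53.5%.

```python
def speedup_from_reduction(percent: float) -> float:
    """Convert a latency reduction in percent into a speedup factor.

    Assumes speedup = baseline_latency / new_latency.
    """
    if not 0.0 <= percent < 100.0:
        raise ValueError("percent must be in [0, 100)")
    return 1.0 / (1.0 - percent / 100.0)


def reduction_from_speedup(factor: float) -> float:
    """Convert a speedup factor into a latency reduction in percent."""
    if factor <= 0.0:
        raise ValueError("factor must be positive")
    return 100.0 * (1.0 - 1.0 / factor)


if __name__ == "__main__":
    # Figures quoted in the abstract (hypothetical labels for readability).
    print(f"HTVM on DIANA: {speedup_from_reduction(16.94):.2f}x speedup")
    print(f"plain TVM on DIANA: {reduction_from_speedup(60.88):.1f}% lower latency")
    print(f"DORY on GAP9: {reduction_from_speedup(2.15):.1f}% lower latency")
```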

Paper Structure

This paper contains 26 sections, 11 figures, 4 tables.

Figures (11)

  • Figure 1: Overview of the components that are part of an AI compiler.
  • Figure 2: MATCH flow. TVM default components are colored in dark blue.
  • Figure 3: Overview of the layer template and its conversion to target-specific code.
  • Figure 4: Example of a generic MatchTarget, detailing how an HW execution module is composed.
  • Figure 5: DIANA architecture. We report the computational units in orange and the memory units in light green.
  • ...and 6 more figures