vMCU: Coordinated Memory Management and Kernel Optimization for DNN Inference on MCUs

Size Zheng; Renze Chen; Meng Li; Zihao Ye; Luis Ceze; Yun Liang

TL;DR

vMCU introduces segment-level memory management to coordinate memory resources with kernel execution for DNN inference on resource-constrained MCUs. By virtualizing MCU memory into a circular pool of segments and coupling segmentation with two-level tiling in kernels, it enables partial tensor overlapping that reduces RAM usage and energy without retraining. The approach yields up to $49.5\%$ RAM reduction and $53.0\%$ energy savings on single layers, and a $61.5\%$ reduction in the memory bottleneck for end-to-end networks, expanding the set of deployable models on low-end MCUs. The work also provides a Python-based compiler interface and a practical MCU library for FC, Conv, and inverted-bottleneck modules, with no accuracy loss and potential to enlarge NAS search spaces.

Abstract

IoT devices based on microcontroller units (MCUs) provide ultra-low power consumption and ubiquitous computation for near-sensor deep neural network (DNN) models. However, MCU memory is usually 2-3 orders of magnitude smaller than that of mobile devices, which makes it challenging to map DNNs onto MCUs. Previous work separates memory management from kernel implementation and relies on coarse-grained memory management techniques, such as in-place update, to reduce memory consumption. In this paper, we propose to coordinate memory management and kernel optimization for DNN inference on MCUs to enable fine-grained memory management. The key idea is to virtualize the limited memory of an MCU as a large memory pool. Each kernel divides the memory pool into kernel-specific segments and handles segment loads and stores while computing DNN layers. With this fine-grained, segment-level memory control, the memory footprints of different tensors can overlap, because the tensors no longer need to be fully materialized at the same time, which reduces memory consumption. Following this idea, we implement vMCU for DNN inference on MCUs. Single-layer evaluation on ARM Cortex-M4 and Cortex-M7 processors shows that vMCU reduces RAM usage by $12.0\%$ to $49.5\%$ and energy consumption by $20.6\%$ to $53.0\%$ compared to state-of-the-art work. For full DNN evaluation, vMCU reduces the memory bottleneck by $61.5\%$, enabling more models to be deployed on low-end MCUs.
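To make the overlap idea concrete, here is a minimal Python sketch of a fully connected layer whose input and output share one circular pool. It is an illustration under simplifying assumptions, not the vMCU kernel: one pool slot stands for one segment, each input row is copied into a small working buffer before its slots are released, and the names `fc_in_shared_pool`, `pool_size`, and `out_base` are hypothetical.

```python
def fc_in_shared_pool(x, w, pool_size, out_base=0):
    """Compute y = x @ w while input and output share one circular pool.

    Simplified model (an assumption, not the paper's kernel): one slot is one
    segment, and input row i is released as soon as it has been loaded.
    """
    m, k, n = len(x), len(w), len(w[0])
    gap = pool_size - m * k  # distance from the output start to the input start
    if gap < 0:
        raise ValueError("pool is smaller than the input tensor")

    pool = [None] * pool_size  # None marks a free slot

    def in_addr(i, j):
        return (out_base + gap + i * k + j) % pool_size

    def out_addr(i, j):
        return (out_base + i * n + j) % pool_size

    # Place the input so that it ends exactly where the output begins (mod pool_size).
    for i in range(m):
        for j in range(k):
            pool[in_addr(i, j)] = ("in", x[i][j])

    for i in range(m):
        # Load one input row into a working buffer, then free its slots.
        row = [pool[in_addr(i, j)][1] for j in range(k)]
        for j in range(k):
            pool[in_addr(i, j)] = None
        # Store the output row; it may land on slots freed by earlier input rows.
        for j in range(n):
            addr = out_addr(i, j)
            if pool[addr] is not None:
                raise MemoryError("output would overwrite live data")
            pool[addr] = ("out", sum(row[p] * w[p][j] for p in range(k)))

    return [[pool[out_addr(i, j)][1] for j in range(n)] for i in range(m)]


x = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]   # 4 x 3 input
w = [[1, 0, 2, 1, 0], [0, 1, 1, 0, 2], [1, 1, 0, 2, 1]]  # 3 x 5 weights
y = fc_in_shared_pool(x, w, pool_size=20, out_base=17)
```

In this toy setting the input holds 12 values and the output 20, so keeping both tensors fully resident would take 32 slots, whereas the shared pool gets by with 20 (plus the 3-element working buffer). Shrinking the pool to 19 slots makes the sketch raise `MemoryError`, because the last output row would wrap around onto the first one.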

Paper Structure (26 sections, 10 equations, 12 figures, 3 tables)

Introduction
Background and Motivation
Architecture Features of Microcontrollers
Reduce DNN Size With NAS
Tensor-level Memory Management on MCU
Motivational Example
Overview of vMCU
Segment-level Memory Management
Segment-aware Kernel Design
Kernel Design for Single Layer
Kernel Design for Multiple Layers
Segment Size Selection
vMCU Compiler Support
Vector Intrinsic Support
Library Generation
...and 11 more sections

Figures (12)

Figure 1: (a) and (b): Comparison of tensor-level and segment-level memory management. (c): Motivational example.
Figure 2: Overview of vMCU.
Figure 3: Problem formulation for GEMM example.
Figure 4: Pseudocode for the kernel of a fully connected layer.
Figure 5: Pseudocode for the kernel of a 2D convolution layer.
...and 7 more figures
