CMSIS-NN: Efficient Neural Network Kernels for Arm Cortex-M CPUs

Liangzhen Lai, Naveen Suda, Vikas Chandra

TL;DR

CMSIS-NN delivers optimized fixed-point neural network kernels for Arm Cortex-M CPUs, enabling efficient edge inference on resource-limited devices. By using fixed-point quantization, specialized 2x2 matmul tiling, partial im2col convolutions, in situ pooling, and SWAR/lookup-based activations, the suite achieves substantial runtime and energy improvements over baselines. The presented CIFAR-10 CNN evaluation demonstrates practical feasibility on Cortex-M7 with low memory footprint and competitive accuracy, highlighting the approach's suitability for real-time edge AI. Available as open-source primitives, these kernels can accelerate deployment of NN models on microcontrollers and facilitate integration with ML frameworks.

Abstract

Deep Neural Networks are becoming increasingly popular in always-on IoT edge devices performing data analytics right at the source, reducing latency as well as energy consumption for data communication. This paper presents CMSIS-NN, efficient kernels developed to maximize the performance and minimize the memory footprint of neural network (NN) applications on Arm Cortex-M processors targeted for intelligent IoT edge devices. Neural network inference based on CMSIS-NN kernels achieves 4.6X improvement in runtime/throughput and 4.9X improvement in energy efficiency.

Paper Structure

This paper contains 14 sections, 12 figures, 2 tables.

Figures (12)

  • Figure 1: Structure of a typical deep neural network.
  • Figure 2: Overview of the neural network kernel structure.
  • Figure 3: Illustration and pseudo code of the data transform from $q7\_t$ to $q15\_t$ in CMSIS $arm\_q7\_to\_q15$ function (assuming big-endian data format).
  • Figure 4: Illustration and pseudo code for data transformation from $q7\_t$ to $q15\_t$ without reordering. Output and input data are ordered differently.
  • Figure 5: The inner loop of matrix multiplication with a $2 \times 2$ kernel. Each loop iteration computes the dot-product results of $2$ columns and $2$ rows, i.e., $4$ outputs.
  • ...and 7 more figures
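
The captions for Figures 3 to 5 describe two building blocks: widening $q7\_t$ data to $q15\_t$, and a $2 \times 2$ matrix-multiplication inner loop that produces $4$ outputs per pass. The following is a minimal scalar sketch of both ideas in portable C, not the optimized kernels from the paper. The function names `q7_to_q15_ref` and `dot_2x2_ref` are hypothetical, the typedefs are declared locally so the block is self-contained, and the data layout (two rows and two columns passed as contiguous vectors of equal length) is an assumption made for illustration. The sketch keeps elements in their original order and does not model the reordering variant shown in Figure 4 or any SIMD instructions.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the fixed-point types named in the figure captions. */
typedef int8_t  q7_t;
typedef int16_t q15_t;
typedef int32_t q31_t;

/* Hypothetical reference helper: widen each q7 value to q15 by sign
 * extension, keeping the elements in their original order. */
static void q7_to_q15_ref(const q7_t *src, q15_t *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        dst[i] = (q15_t)src[i];
    }
}

/* Hypothetical reference helper: one 2x2 output tile.
 * row_a0/row_a1 are two rows of the left matrix, col_b0/col_b1 are two
 * columns of the right matrix, each stored as k contiguous q15 values.
 * Every loaded operand is reused for two products, so one pass over k
 * yields four dot products:
 *   out[0] = a0.b0, out[1] = a0.b1, out[2] = a1.b0, out[3] = a1.b1 */
static void dot_2x2_ref(const q15_t *row_a0, const q15_t *row_a1,
                        const q15_t *col_b0, const q15_t *col_b1,
                        size_t k, q31_t out[4])
{
    q31_t s00 = 0, s01 = 0, s10 = 0, s11 = 0;

    for (size_t i = 0; i < k; ++i) {
        /* Load each operand once, then reuse it for two accumulators. */
        q31_t a0 = row_a0[i];
        q31_t a1 = row_a1[i];
        q31_t b0 = col_b0[i];
        q31_t b1 = col_b1[i];

        s00 += a0 * b0;
        s01 += a0 * b1;
        s10 += a1 * b0;
        s11 += a1 * b1;
    }

    out[0] = s00;
    out[1] = s01;
    out[2] = s10;
    out[3] = s11;
}
```

The point of the $2 \times 2$ tiling in this sketch is operand reuse: four loads feed four multiply-accumulates per iteration, whereas computing each output separately would need two loads per multiply-accumulate. Accumulating in 32 bits leaves headroom for sums of 16-bit products over short vectors; longer vectors or saturation behaviour would need separate handling that this sketch does not attempt.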