Scalable and RISC-V Programmable Near-Memory Computing Architectures for Edge Nodes

Michele Caon, Clément Choné, Pasquale Davide Schiavone, Alexandre Levisse, Guido Masera, Maurizio Martina, David Atienza

TL;DR

This paper addresses the energy bottleneck of edge AI by introducing two software-friendly near-memory computing architectures, NM-Caesar and NM-Carus. Both are designed as drop-in SRAM replacements with programmable compute logic placed next to the memory. NM-Caesar emphasizes area efficiency and executes SIMD operations issued by the host, while NM-Carus computes autonomously through a RISC-V-based controller and a scalable vector processing unit (VPU), enabling higher throughput on data-parallel workloads. Post-layout simulations show substantial gains over a baseline RV32IMC CPU: up to 28.0× (NM-Caesar) and 53.9× (NM-Carus) lower execution time, depending on kernel and data width, and a peak energy efficiency of 306.7 GOPS/W for NM-Carus on 8-bit matrix multiplication. The work also demonstrates end-to-end TinyML performance improvements and compares NM-Caesar and NM-Carus favorably against state-of-the-art CIM/NMC designs, underscoring the practicality of software-friendly near-memory accelerators for edge devices.
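To make "SIMD" and the dependence on data width concrete, here is a generic packed-SIMD (SWAR) sketch: one word holds several narrow lanes, and each lane is added independently so carries never cross lane boundaries. This is an illustration of the general technique only, not NM-Caesar's actual datapath or instruction set; the 32-bit word size and the names `packed_add` and `lanes_per_word` are assumptions chosen for the example.

```python
def packed_add(a: int, b: int, width: int = 8, word_bits: int = 32) -> int:
    """Lane-wise add of two packed words; carries never cross lane boundaries.

    Generic SWAR illustration (assumed 32-bit word), not NM-Caesar's datapath.
    """
    if width <= 0 or word_bits % width != 0:
        raise ValueError("width must evenly divide word_bits")
    mask = (1 << width) - 1
    result = 0
    for lane in range(word_bits // width):
        shift = lane * width
        x = (a >> shift) & mask
        y = (b >> shift) & mask
        # Modular (wrap-around) addition confined to this lane.
        result |= ((x + y) & mask) << shift
    return result


def lanes_per_word(width: int, word_bits: int = 32) -> int:
    """Number of independent lanes processed per word at a given data width."""
    return word_bits // width


if __name__ == "__main__":
    print(hex(packed_add(0x01020304, 0x10203040)))  # 0x11223344
    print(hex(packed_add(0x000000FF, 0x00000001)))  # 0x0: lane wraps, no carry out
    for w in (8, 16, 32):
        print(f"{w}-bit data: {lanes_per_word(w)} lanes per word")
```

Narrower data yields more lanes per word (4 at 8 bits, 1 at 32 bits under the assumed word size), which is one plausible reason the reported speedups vary with data width.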

Abstract

The widespread adoption of data-centric algorithms, particularly Artificial Intelligence (AI) and Machine Learning (ML), has exposed the limitations of centralized processing infrastructures, driving a shift towards edge computing. This shift imposes stringent energy-efficiency constraints that traditional von Neumann architectures struggle to meet. The Compute-In-Memory (CIM) paradigm has emerged as a superior candidate because it efficiently exploits the available memory bandwidth. However, existing CIM solutions require high implementation effort and lack flexibility from a software-integration standpoint. This work proposes a novel, software-friendly, general-purpose, and low-integration-effort Near-Memory Computing (NMC) approach, paving the way for the adoption of CIM-based systems in the next generation of edge computing nodes. Two architectural variants, NM-Caesar and NM-Carus, are proposed and characterized to target different trade-offs in area efficiency, performance, and flexibility, covering a wide range of embedded microcontrollers. Post-layout simulations show up to $28.0\times$ and $53.9\times$ lower execution time and $25.0\times$ and $35.6\times$ higher energy efficiency at the system level for NM-Caesar and NM-Carus, respectively, compared to executing the same tasks on a state-of-the-art RISC-V CPU (RV32IMC). NM-Carus achieves a peak energy efficiency of $306.7$ GOPS/W in 8-bit matrix multiplications, surpassing recent state-of-the-art in- and near-memory circuits.
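The efficiency figures can be read more intuitively as energy per operation and as a fraction of baseline energy. Since 1 GOPS/W equals $10^9$ operations per joule, 306.7 GOPS/W corresponds to roughly 3.26 pJ per operation. The sketch below assumes "operation" is counted exactly as in the authors' GOPS figure (whether a multiply-accumulate counts as one or two operations is not stated here), and treats the $25.0\times$ and $35.6\times$ gains as best-case ("up to") values; the helper names are hypothetical.

```python
def pj_per_op(gops_per_watt: float) -> float:
    """Convert an efficiency in GOPS/W to energy per operation in picojoules."""
    # 1 GOPS/W = 1e9 operations per joule, so energy/op = 1 / (gops_per_watt * 1e9) J.
    return 1e12 / (gops_per_watt * 1e9)


def relative_energy(efficiency_gain: float) -> float:
    """Fraction of baseline energy for the same work, given an N-times efficiency gain."""
    return 1.0 / efficiency_gain


if __name__ == "__main__":
    print(f"NM-Carus peak: {pj_per_op(306.7):.2f} pJ/op")
    # Best-case system-level gains quoted in the abstract, relative to RV32IMC.
    for name, gain in (("NM-Caesar", 25.0), ("NM-Carus", 35.6)):
        print(f"{name}: {100 * relative_energy(gain):.1f}% of RV32IMC energy")
```

Under these assumptions, the best-case gains mean the same task consumes about 4.0% (NM-Caesar) and 2.8% (NM-Carus) of the baseline CPU's energy.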

Paper Structure (24 sections, 13 figures, 8 tables)

Figures (13)

  • Figure 1: Top-level block diagram of an NMC-enhanced microcontroller unit (MCU) hosting NM-Caesar (area-critical implementations) or NM-Carus (performance-oriented applications) as part of its memory subsystem.
  • Figure 2: Top-level block diagram of NM-Caesar.
  • Figure 3: Example timing diagram of NM-Caesar running two normal write operations ($S0$ and $S1$) and three instructions.
  • Figure 4: Top-level block diagram of NM-Carus.
  • Figure 5: Scalar and vector instruction execution in NM-Carus.
  • ...and 8 more figures