CIMR-V: An End-to-End SRAM-based CIM Accelerator with RISC-V for AI Edge Device

Yan-Cheng Guo; Tian-Sheuan Chang; Chih-Sheng Lin; Bo-Cheng Chiou; Chih-Ming Lai; Shyh-Shyuan Sheu; Wei-Chung Lo; Shih-Chieh Chang

TL;DR

This work targets the data-movement bottleneck in SRAM-based computing-in-memory (CIM) accelerators by integrating layer fusion, weight fusion, and a CIM-enabled convolution/pooling pipeline into an end-to-end on-chip inference engine. CIMR-V features a high-density SRAM CIM macro with X-mode and Y-mode, a modified RISC-V core with CIM instructions, and a full-stack flow that maps high-level AI models to on-chip CIM operations. On a keyword spotting task, the three techniques together reduce end-to-end latency by 85.14%. Implemented in a 28 nm process, the design reaches an energy efficiency of 3707.84 TOPS/W and a throughput of 26.21 TOPS at 50 MHz. The result is a practical, programmable CIM design for edge devices that combines high energy efficiency with end-to-end model inference on chip.

Abstract

Computing-in-memory (CIM) is renowned in deep learning for its high energy efficiency, which results from highly parallel computing with minimal data movement. However, current SRAM-based CIM designs suffer from long latency when loading weights or feature maps from DRAM for large AI models. Moreover, previous SRAM-based CIM architectures lack end-to-end model inference. To address these issues, this paper proposes CIMR-V, an end-to-end CIM accelerator with RISC-V that incorporates CIM layer fusion, a convolution/max pooling pipeline, and weight fusion, resulting in an 85.14% reduction in latency for the keyword spotting model. Furthermore, the proposed CIM-type instructions facilitate end-to-end AI model inference and a full-stack flow, effectively combining the high energy efficiency of CIM with the high programmability of RISC-V. Implemented using TSMC 28 nm technology, the proposed design achieves an energy efficiency of 3707.84 TOPS/W and a throughput of 26.21 TOPS at 50 MHz.
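To put the headline numbers in perspective, the short Python sketch below converts the reported 85.14% latency reduction into an equivalent speedup factor and estimates the power implied by the reported throughput and energy efficiency. The function names are illustrative and not from the paper, and the power estimate assumes that the 26.21 TOPS and 3707.84 TOPS/W figures refer to the same operating point, which the abstract does not state.

```python
def speedup_from_reduction(reduction_pct: float) -> float:
    """Speedup factor implied by a percentage latency reduction."""
    remaining_fraction = 1.0 - reduction_pct / 100.0
    return 1.0 / remaining_fraction


def implied_power_mw(tops: float, tops_per_watt: float) -> float:
    """Power in milliwatts implied by throughput / efficiency.

    Assumption: both figures describe the same operating point.
    """
    return tops / tops_per_watt * 1000.0


# Figures reported in the abstract.
speedup = speedup_from_reduction(85.14)
power_mw = implied_power_mw(26.21, 3707.84)

print(f"Implied speedup: {speedup:.2f}x")
print(f"Implied power:   {power_mw:.2f} mW")
```

Under these assumptions, the latency reduction corresponds to a speedup of roughly 6.7x, and the implied power is on the order of a few milliwatts. Both are back-of-the-envelope estimates, not results reported by the authors.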
