RISC-V R-Extension: Advancing Efficiency with Rented-Pipeline for Edge DNN Processing
Won Hyeok Kim, Hyeong Jin Kim, Tae Hee Han
TL;DR
Edge devices require energy-efficient DNN inference within tight power and area constraints, making traditional NPUs impractical for small form factors. The paper proposes the RISC-V R-extension, which combines rented-pipeline execution and architectural pipeline registers (APR) with two new instructions, rfmac.s and rfsmac.s, to accelerate multiply-accumulate (MAC) operations on CPU cores. Across LeNet, ResNet-20, and MobileNet-V1, RV64R improves IPC by up to 29% over RV64F and reduces memory accesses by up to 34%, with runtime gains of around 50% versus RV64F and about 32% versus Baseline, while incurring only modest hardware overhead in FPGA implementations. These results indicate a viable, low-overhead CPU-based path for edge AI that can scale with future vector extensions, enabling more responsive and power-efficient edge applications.
Abstract
The proliferation of edge devices necessitates efficient computational architectures for lightweight tasks, particularly deep neural network (DNN) inference. Traditional NPUs, though effective for such operations, face challenges in power, cost, and area when integrated into lightweight edge devices. The RISC-V architecture, known for its modularity and open-source nature, offers a viable alternative. This paper introduces the RISC-V R-extension, a novel approach to enhancing DNN processing efficiency on edge devices. The extension features rented-pipeline stages and architectural pipeline registers (APR), which optimize the execution of critical operations, thereby reducing latency and memory access frequency. The extension also includes new custom instructions to support these architectural improvements. Through comprehensive analysis, this study demonstrates the performance gains that the R-extension brings to edge device processing, setting the stage for more responsive and intelligent edge applications.
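For orientation, the multiply-accumulate (MAC) pattern that dominates DNN inference can be sketched in plain C. This is an illustrative sketch only, not the paper's implementation: the function name `dot_mac` is hypothetical, and the code does not model rfmac.s, rfsmac.s, or the APR, whose exact semantics are not given in this summary. It shows only the single-precision accumulate loop that such extensions aim to speed up.

```c
#include <stddef.h>

/* Hypothetical helper: single-precision dot product, the basic MAC
 * kernel behind fully connected and convolution layers.
 * The running sum is kept in a local variable so that a compiler can
 * hold it in a register across iterations instead of storing and
 * reloading it from memory on every step. */
float dot_mac(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;               /* accumulator */
    for (size_t i = 0; i < n; ++i) {
        acc += a[i] * b[i];         /* one multiply-accumulate per element */
    }
    return acc;
}
```

As a usage example, calling `dot_mac` on the vectors {1, 2, 3} and {4, 5, 6} with `n = 3` accumulates 4 + 10 + 18.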
