A Precision-Scalable RISC-V DNN Processor with On-Device Learning Capability at the Extreme Edge

Longwei Huang; Chao Fang; Qiong Li; Jun Lin; Zhongfeng Wang

TL;DR

This work proposes a precision-scalable RISC-V DNN processor with on-device learning capability. It supports fixed-point DNN inference at precision levels from 2-bit to 16-bit and enhances on-device learning through improved support for FP16 operations.

Abstract

Extreme edge platforms, such as in-vehicle smart devices, require efficient deployment of quantized deep neural networks (DNNs) to enable intelligent applications with limited energy, memory, and computing resources. However, many edge devices struggle to boost the inference throughput of quantized DNNs because quantization levels vary, and they lack the floating-point (FP) support needed for on-device learning, which prevents them from improving model accuracy while preserving data privacy. To tackle these challenges, we propose a precision-scalable RISC-V DNN processor with on-device learning capability. It supports fixed-point DNN inference at precision levels from 2-bit to 16-bit and enhances on-device learning through improved support for FP16 operations. Moreover, we employ FP16 multiplier reuse and multi-precision integer multiplier reuse, along with balanced mapping of FPGA resources, to significantly improve hardware resource utilization. Experimental results on the Xilinx ZCU102 FPGA show that, compared to the prior art XpulpNN, our processor improves inference throughput by 1.6$\sim$14.6$\times$ and energy efficiency by 1.1$\sim$14.6$\times$ across various DNNs. Additionally, our processor achieves a 16.5$\times$ higher FP throughput for on-device learning.
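The idea of multi-precision integer multiplier reuse mentioned in the abstract can be illustrated with a small software model. The sketch below is only an assumption-laden illustration, not the processor's actual datapath: it handles unsigned operands only, and the function names `mul4`, `mul8_from_mul4`, and `four_lane_mul4` are hypothetical. It shows how four narrow multipliers can either be combined into one wider product or used as independent low-precision lanes.

```python
def mul4(a: int, b: int) -> int:
    """Model of one unsigned 4-bit x 4-bit multiplier (hypothetical building block)."""
    assert 0 <= a < 16 and 0 <= b < 16
    return a * b


def mul8_from_mul4(a: int, b: int) -> int:
    """Unsigned 8-bit product assembled from four 4-bit partial products."""
    assert 0 <= a < 256 and 0 <= b < 256
    # Split each operand into a high and a low 4-bit nibble.
    a_hi, a_lo = a >> 4, a & 0xF
    b_hi, b_lo = b >> 4, b & 0xF
    # a*b = (a_hi*b_hi << 8) + ((a_hi*b_lo + a_lo*b_hi) << 4) + a_lo*b_lo
    return (
        (mul4(a_hi, b_hi) << 8)
        + ((mul4(a_hi, b_lo) + mul4(a_lo, b_hi)) << 4)
        + mul4(a_lo, b_lo)
    )


def four_lane_mul4(a_lanes, b_lanes):
    """Low-precision mode: the same four multipliers produce four independent products."""
    assert len(a_lanes) == len(b_lanes) == 4
    return [mul4(a, b) for a, b in zip(a_lanes, b_lanes)]
```

In this model, the same four `mul4` units yield either one 8-bit product or four 4-bit products per step, which is one intuition for why throughput can grow as precision drops. Signed operands, the FP16 mantissa path, and the FPGA mapping described in the paper are outside the scope of this sketch.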

Paper Structure (13 sections, 8 figures, 1 table)

Introduction
Related Works
The Proposed RISC-V DNN Processor
Features of Our DNN Processor
Customized RISC-V Instruction-Driven Mapping
Precision-Scalable Processing Element
Balancing LUT and DSP Mapping
Experimental Results
Experimental Setup
FPGA Resource Utilization Analysis
Throughput and Energy Efficiency Comparison
Comparison to CPU, GPU, and FPGA-based Prior Arts
Conclusion

Figures (8)

Figure 1: Comparison between the architecture of (a) XpulpNN (Garofalo et al., 2021) and (b) our proposed DNN processor.
Figure 2: The instruction and computation flow of our processor and XpulpNN to perform an INT8 matrix multiplication operator. (a) shows the 4$\times$4 matmul operator; (b) and (c) show the computational and instruction flows of our processor's SA and XpulpNN's dotp units, respectively.
Figure 3: Data arrangement method for different precision levels.
Figure 4: Architecture of the precision-scalable multiplier with highly-reused 16-bit mantissa multiplier and 8-bit multiplier trees. Only half of the 4-bit multiplier trees and 2-bit multipliers of one 8-bit multiplier tree are reused to ensure the output bit-width remains the same at different precision levels.
Figure 5: Architecture of the precision-scalable adder.
...and 3 more figures
