DSLR-CNN: Efficient CNN Acceleration using Digit-Serial Left-to-Right Arithmetic
Malik Zohaib Nisar, Muhammad Sohail Ibrahim, Saeid Gorgin, Muhammad Usman, Jeong-A Lee
TL;DR
DSLR-CNN introduces a left-to-right (LR) online arithmetic framework for CNN acceleration. It uses digit-serial, most-significant-digit-first (MSDF) computation to enable tight digit-level pipelining and reduce interconnects. The architecture uses a weight-stationary dataflow with tiling across a large PE array (9 PEs per row, 64 PEs per column; $T_n=16$, $T_m=8$, $T_r=T_c=8$) and an LR-SPM multiplier with online delay $\delta=2$. Relative to a conventional bit-serial baseline, it reduces latency and improves peak TOPS/W and GOPS/mm$^2$ on AlexNet, VGG-16, and ResNet-18. Synthesized on GSCL 45nm at 500 MHz, DSLR-CNN delivers up to $4.47$ TOPS peak performance (AlexNet) and up to $3.57$ TOPS/W energy efficiency, with lower latency (e.g., 0.94 ms vs 1.54 ms for the baseline on AlexNet). These gains in performance, energy efficiency, and area efficiency support LR-based digit-serial CNN acceleration and motivate future work on unified online algorithms and on other networks (e.g., GoogLeNet, YOLO, transformers).
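To make the tiling parameters concrete, the following minimal sketch counts how many tiles a convolution layer would be split into. It rests on assumptions not stated in the summary: $T_n$ is taken to tile input channels, $T_m$ output channels, and $T_r$, $T_c$ the output rows and columns. The function name `count_tiles` and the layer shapes are illustrative only.

```python
# Tile sizes quoted in the summary. The meaning attached to each symbol
# (input channels, output channels, output rows, output columns) is an assumption.
T_N, T_M, T_R, T_C = 16, 8, 8, 8


def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def count_tiles(in_ch, out_ch, out_rows, out_cols,
                t_n=T_N, t_m=T_M, t_r=T_R, t_c=T_C):
    """Number of tiles needed to cover one convolution layer (hypothetical helper)."""
    return (ceil_div(in_ch, t_n) * ceil_div(out_ch, t_m)
            * ceil_div(out_rows, t_r) * ceil_div(out_cols, t_c))


# Illustrative layer shapes, not taken from the paper's tables.
tiles_even = count_tiles(in_ch=64, out_ch=128, out_rows=56, out_cols=56)
tiles_ragged = count_tiles(in_ch=3, out_ch=96, out_rows=55, out_cols=55)
print(tiles_even, tiles_ragged)
```

Dimensions that are not multiples of the tile size round up, so the last tile along that dimension is only partly filled.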
Abstract
Digit-serial arithmetic has emerged as a viable approach for designing hardware accelerators, reducing interconnections, area utilization, and power consumption. However, conventional methods suffer from performance and latency issues. To address these challenges, we propose an accelerator design using left-to-right (LR) arithmetic, which performs computations in a most-significant-digit-first (MSDF) manner, enabling digit-level pipelining. This leads to substantial performance improvements and reduced latency. The processing engine is designed for convolutional neural networks (CNNs) and includes low-latency LR multipliers and adders for digit-level parallelism. The proposed DSLR-CNN is implemented in Verilog and synthesized with Synopsys Design Compiler using GSCL 45nm technology. The accelerator was evaluated on the AlexNet, VGG-16, and ResNet-18 networks. Results show significant improvements across key performance evaluation metrics, including response time, peak performance, power consumption, operational intensity, area efficiency, and energy efficiency. The peak performance of the proposed design, measured in GOPS, is 4.37x to 569.11x higher than that of contemporary designs, and its peak energy efficiency (TOPS/W) is 3.58x to 44.75x higher, outperforming conventional bit-serial designs.
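The MSDF idea can be illustrated with a small sketch. It assumes a radix-2 signed-digit fraction with digits in {-1, 0, 1}, a representation commonly used in online arithmetic; the abstract does not specify the digit set, and the function name `msdf_prefixes` is hypothetical. The point shown is that each most-significant digit already yields a usable estimate, which is what lets a downstream unit start working before the full operand has arrived.

```python
def msdf_prefixes(digits):
    """Running value of a radix-2 signed-digit fraction, consumed MSD first.

    digits[0] has weight 2^-1, digits[1] has weight 2^-2, and so on.
    """
    value = 0.0
    prefixes = []
    for i, d in enumerate(digits, start=1):
        if d not in (-1, 0, 1):
            raise ValueError("digits must be in {-1, 0, 1}")
        value += d * 2.0 ** -i
        prefixes.append(value)
    return prefixes


# Illustrative digit stream, not taken from the paper.
digits = [1, 0, -1, 1]
prefixes = msdf_prefixes(digits)
final = prefixes[-1]

for k, estimate in enumerate(prefixes, start=1):
    # After k digits, the digits still to come can shift the value by at most 2^-k.
    assert abs(final - estimate) <= 2.0 ** -k
    print(k, estimate)
```

The estimate after $k$ digits is within $2^{-k}$ of the final value, so precision grows by one digit per step.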
