DSLR-CNN: Efficient CNN Acceleration using Digit-Serial Left-to-Right Arithmetic

Malik Zohaib Nisar; Muhammad Sohail Ibrahim; Saeid Gorgin; Muhammad Usman; Jeong-A Lee

TL;DR

DSLR-CNN introduces a left-to-right online arithmetic framework for CNN acceleration, leveraging digit-serial, MSDF computations to enable tight digit-level pipelining and reduced interconnects. The architecture uses a weight-stationary dataflow with tiling across a large PE array (9 PEs per row, 64 PEs per column; $T_n=16$, $T_m=8$, $T_r=T_c=8$) and an LR-SPM multiplier with online delay $\delta=2$, achieving substantial latency reductions and improved peak TOPS/W and GOPS/$mm^2$ across AlexNet, VGG-16, and ResNet-18 relative to a conventional bit-serial baseline. Synthesis on GSCL 45nm at 500 MHz shows DSLR-CNN delivering up to $4.47$ TOPS peak performance (AlexNet) and up to $3.57$ TOPS/W energy efficiency, coupled with latency reductions (e.g., AlexNet 0.94 ms vs 1.54 ms baseline). The results demonstrate strong performance, energy, and area efficiency gains, validating LR-based digit-serial CNN acceleration and motivating future exploration of unified online algorithms and broader network applicability (e.g., GoogleNet, YOLO, transformers).
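To make the tiling parameters concrete, here is a minimal sketch that counts how many tiles a single convolution layer would be split into. It assumes the common convention that $T_n$ tiles input channels, $T_m$ tiles output channels, and $T_r \times T_c$ tiles the output feature map; the summary does not define these symbols, so that mapping, the function name `tile_count`, and the example layer shape are all assumptions for illustration.

```python
from math import ceil

# Tile sizes quoted in the TL;DR. Their meaning (which dimension each one
# tiles) is an assumption, not something the summary states.
T_N, T_M, T_R, T_C = 16, 8, 8, 8


def tile_count(in_channels, out_channels, out_rows, out_cols):
    """Number of tiles needed to cover one convolution layer (hypothetical helper)."""
    return (
        ceil(in_channels / T_N)
        * ceil(out_channels / T_M)
        * ceil(out_rows / T_R)
        * ceil(out_cols / T_C)
    )


# Hypothetical layer: 96 input channels, 256 output channels, 27x27 output map.
print(tile_count(96, 256, 27, 27))
```

Dimensions that are not multiples of the tile size round up, so partially filled edge tiles still cost a full tile iteration in this simple model.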

Abstract

Digit-serial arithmetic has emerged as a viable approach for designing hardware accelerators, reducing interconnections, area utilization, and power consumption. However, conventional methods suffer from performance and latency issues. To address these challenges, we propose an accelerator design using left-to-right (LR) arithmetic, which performs computations in a most-significant-digit-first (MSDF) manner, enabling digit-level pipelining. This leads to substantial performance improvements and reduced latency. The processing engine is designed for convolutional neural networks (CNNs) and includes low-latency LR multipliers and adders for digit-level parallelism. The proposed DSLR-CNN is implemented in Verilog and synthesized with Synopsys Design Compiler using GSCL 45nm technology, and the accelerator is evaluated on the AlexNet, VGG-16, and ResNet-18 networks. Results show significant improvements across key performance evaluation metrics, including response time, peak performance, power consumption, operational intensity, area efficiency, and energy efficiency. The peak performance of the proposed design, measured in GOPS, is 4.37x to 569.11x higher than that of contemporary designs, and it achieves 3.58x to 44.75x higher peak energy efficiency (TOPS/W), outperforming conventional bit-serial designs.
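The MSDF idea can be illustrated in software. The sketch below follows a textbook-style online serial-parallel multiplication recurrence with online delay $\delta=2$ (the value quoted in the TL;DR): one operand arrives as radix-2 signed digits, most significant first, the other is a parallel constant, and one product digit is emitted per step once the first $\delta$ input digits have been absorbed. It is not the paper's LR-SPM hardware; the function names, the exact-arithmetic residual, and the selection thresholds at $\pm 1/2$ are assumptions chosen for clarity.

```python
from fractions import Fraction


def digits_to_value(digits):
    """Value of a radix-2 signed-digit fraction 0.d1 d2 ... (digits in {-1, 0, 1})."""
    return sum(Fraction(d, 2 ** (i + 1)) for i, d in enumerate(digits))


def online_serial_parallel_multiply(x_digits, y, delta=2):
    """MSDF product of a digit-serial x and a parallel y with |y| < 1.

    Illustrative sketch only: the residual is kept exactly, whereas real
    hardware would select digits from a truncated estimate.
    """
    y = Fraction(y)
    n = len(x_digits)
    padded = list(x_digits) + [0] * delta  # zeros stand in for digits past the end
    scale = Fraction(1, 2 ** delta)

    w = Fraction(0)
    # Initialization: absorb the first delta input digits, emit nothing yet.
    for i in range(delta):
        w = 2 * w + padded[i] * y * scale

    out = []
    for j in range(n):
        v = 2 * w + padded[j + delta] * y * scale
        # Output digit selection by rounding (assumed thresholds at +/- 1/2).
        if v >= Fraction(1, 2):
            p = 1
        elif v < Fraction(-1, 2):
            p = -1
        else:
            p = 0
        w = v - p  # updated residual
        out.append(p)
    return out


x = [1, 0, -1, 1]  # x = 1/2 - 1/8 + 1/16
y = Fraction(3, 4)
p = online_serial_parallel_multiply(x, y)
print(p, digits_to_value(p), digits_to_value(x) * y)
```

Because each output digit depends only on input digits seen so far plus $\delta$ more, a downstream operator can start consuming the product before the multiplication has finished, which is the property that digit-level pipelining relies on. In this sketch, the value of the first $n$ output digits differs from the exact product by less than $2^{-n}$.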

Paper Structure (24 sections, 9 equations, 12 figures, 5 tables, 1 algorithm)

Introduction
Background
Convolutional Neural Network
Left-to-Right Arithmetic
Proposed Design: DSLR-CNN Design
Overall Design
Processing Element
Control Unit
Input and Kernel Buffer
Results and Discussion
Performance Evaluation
Power Utilization and Energy Efficiency
Area and Area Efficiency
Throughput or Performance
Latency
...and 9 more sections

Figures (12)

Figure 1: Illustration of a Convolution Operation.
Figure 2: Comparative Timing Analysis of Conventional Arithmetic vs. Online Arithmetic for Sequential Interdependent Operations.
Figure 3: LR Serial-Parallel Multiplier [usman2023low].
Figure 4: Radix-2 LR Adder [ercegovac2004digital].
Figure 5: Tile of the DSLR-CNN Architecture and its Processing Element.
...and 7 more figures
