Energy Efficient Exact and Approximate Systolic Array Architecture for Matrix Multiplication

Pragun Jaswal, L. Hemanth Krishna, B. Srinivasu

TL;DR

The paper addresses the energy cost of matrix-multiplication cores in DNN and vision workloads by introducing a systolic-array design with exact and approximate processing elements (PEs) built from PPC and NPPC blocks to reduce area, power, and delay. The approach yields an energy reduction of about 22% for the exact PE and 32% for the approximate PE, validated on an 8×8 array with applications to DCT and edge detection that achieve PSNRs of about 38 dB and 30 dB, respectively. Key contributions include an exact 8-bit PE, an approximate 8-bit PE with quantified error, and validation across DCT, edge detection, and image sharpening tasks, demonstrating hardware-efficiency gains across systolic array (SA) sizes. The work targets edge devices and error-resilient vision tasks, trading a bounded loss of accuracy for lower energy by using PPC/NPPC-based MACs where full precision is not required.

Abstract

Deep Neural Networks (DNNs) require highly efficient matrix multiplication engines for complex computations. This paper presents a systolic array architecture incorporating novel exact and approximate processing elements (PEs), designed using energy-efficient positive partial product and negative partial product cells, termed PPC and NPPC, respectively. The proposed 8-bit exact and approximate PE designs are employed in an 8×8 systolic array, which achieves energy savings of 22% and 32%, respectively, compared to the existing design. To demonstrate their effectiveness, the proposed PEs are integrated into a systolic array (SA) for Discrete Cosine Transform (DCT) computation, achieving high output quality with a PSNR of 38.21 dB. Furthermore, in an edge detection application using convolution, the approximate PE achieves a PSNR of 30.45 dB. These results highlight the potential of the proposed design to deliver significant energy efficiency while maintaining competitive output quality, making it well-suited for error-resilient image and vision processing applications.
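
To make the two ideas in the abstract concrete (a systolic array computing a matrix product, and PSNR as the quality metric), the following is a minimal Python sketch, not the authors' hardware design. It simulates an output-stationary array cycle by cycle with a pluggable multiplier. The function names (systolic_matmul, truncated_mul, psnr) are hypothetical, the output-stationary dataflow is an assumption, and truncated_mul is only a stand-in approximation that drops low-order product bits; it does not model the paper's PPC/NPPC cells.

```python
import math

def systolic_matmul(A, B, mul=lambda a, b: a * b):
    """Simulate an n x n output-stationary systolic array computing A @ B.

    Row i of A enters from the left delayed by i cycles; column j of B
    enters from the top delayed by j cycles. Each PE multiplies the two
    operands passing through it and accumulates the result locally.
    """
    n = len(A)
    acc = [[0] * n for _ in range(n)]
    a_reg = [[0] * n for _ in range(n)]  # operands moving left to right
    b_reg = [[0] * n for _ in range(n)]  # operands moving top to bottom
    for t in range(3 * n - 2):           # cycles until the last PE finishes
        # Shift operands by one PE (reverse order avoids overwriting).
        for i in range(n):
            for j in range(n - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
        for j in range(n):
            for i in range(n - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
        # Feed the skewed inputs at the left and top boundaries.
        for i in range(n):
            k = t - i
            a_reg[i][0] = A[i][k] if 0 <= k < n else 0
        for j in range(n):
            k = t - j
            b_reg[0][j] = B[k][j] if 0 <= k < n else 0
        # Every PE performs one multiply-accumulate per cycle.
        for i in range(n):
            for j in range(n):
                acc[i][j] += mul(a_reg[i][j], b_reg[i][j])
    return acc

def truncated_mul(a, b, drop=3):
    """Hypothetical approximate multiplier: zero the low `drop` product bits.
    This is an illustrative stand-in, NOT the paper's PPC/NPPC-based PE."""
    return ((a * b) >> drop) << drop

def psnr(ref, test, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized 2-D lists."""
    count = sum(len(row) for row in ref)
    mse = sum((x - y) ** 2
              for r, s in zip(ref, test)
              for x, y in zip(r, s)) / count
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)

exact = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
approx = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], mul=truncated_mul)
print(exact)  # [[19, 22], [43, 50]]
print(psnr(exact, approx))
```

Passing the multiplier as a parameter keeps the dataflow identical for the exact and approximate cases, so any difference in the output comes only from the PE arithmetic, which mirrors how the paper compares its two PE variants within the same array.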

Paper Structure

This paper contains 13 sections, 15 figures, 7 tables.

Figures (15)

  • Figure 1: Architecture of $3 \times 3$ Systolic Array for Matrix Multiplication [Kung1982WhySA].
  • Figure 2: Existing Design of Exact 4-Bit Signed PE [Lombardi2015]
  • Figure 3: Conventional approach of Exact (a) PPC and (b) NPPC
  • Figure 4: Proposed Exact 4-Bit Unsigned PE Design
  • Figure 5: Proposed Exact 4-Bit Signed PE Design
  • ...and 10 more figures