A Low-Power Sparse Deep Learning Accelerator with Optimized Data Reuse

Kai-Chieh Hsu, Tian-Sheuan Chang

TL;DR

This work tackles the irregular data distribution in sparse deep learning accelerators (DLAs) by reducing SRAM traffic through two techniques: Effective Index Matching (EIM) and Shared Index Data Reuse (SIDR). EIM regularizes data access by mapping the indexes of non-zero data onto their positions in the compressed data sequence, deriving consistent effective input and weight indexes that can be matched simultaneously and efficiently. SIDR coordinates data sharing across a 2-D PE array, consolidating repeated data accesses into shared registers and broadcast paths to maximize reuse. Together, these techniques yield an $86\%$ reduction in SRAM reads compared with SparTen and a $2.5\times$ improvement in power efficiency over state-of-the-art methods. The results are validated on a 16×16 PE array using a MobileNet V2 pointwise (PW) layer and random matrix multiplications, with notable PE utilization and low MAPM values. The approach combines the data regularity of dense DLAs with the computational efficiency of sparse processing, offering a practical path to high-performance, low-power sparse DLAs.
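To give a feel for why broadcast-based sharing across a 2-D PE array cuts SRAM traffic, the following is a minimal toy model, not the paper's SIDR dataflow. It assumes a dense workload in which each PE at (row, column) accumulates one output over `depth` multiply-accumulate steps, and it compares every PE fetching its own operands against one input read per PE row and one weight read per PE column that is then broadcast. The function names and the counting model are illustrative assumptions, and the resulting ratio is not meant to reproduce the reported $86\%$ figure, which is measured against SparTen on sparse workloads.

```python
def sram_reads_private(rows, cols, depth):
    """Toy count: every PE reads its own input and weight at each MAC step."""
    return 2 * rows * cols * depth


def sram_reads_shared(rows, cols, depth):
    """Toy count: one input read per PE row and one weight read per PE column
    at each step, with the value broadcast to the PEs that need it."""
    return (rows + cols) * depth


def reduction(rows, cols, depth):
    """Fraction of SRAM reads removed by sharing in this toy model."""
    private = sram_reads_private(rows, cols, depth)
    shared = sram_reads_shared(rows, cols, depth)
    return 1 - shared / private


if __name__ == "__main__":
    # Hypothetical 16x16 array with an accumulation depth of 64.
    print(sram_reads_private(16, 16, 64), sram_reads_shared(16, 16, 64))
    print(reduction(16, 16, 64))
```

In this simplified setting the saving grows with the array size, since each fetched operand serves a whole row or column of PEs. With irregular sparse data the PEs no longer want the same operand at the same step, which is the coordination problem SIDR is described as addressing.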

Abstract

Sparse deep learning has reduced computation significantly, but its irregular non-zero data distribution complicates the dataflow and hinders data reuse, increasing on-chip SRAM access and thus the power consumption of the chip. This paper addresses these issues by maximizing data reuse to reduce SRAM access through two approaches. First, we propose Effective Index Matching (EIM), which efficiently searches and arranges non-zero operations from compressed data. Second, we propose Shared Index Data Reuse (SIDR), which coordinates the operations between Processing Elements (PEs) and regularizes their SRAM data access, thereby enabling all data to be reused efficiently. Our approach reduces SRAM buffer accesses by 86\% compared to the previous design, SparTen. As a result, our design achieves a 2.5$\times$ improvement in power efficiency compared to state-of-the-art methods while maintaining a simpler dataflow.
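The idea of finding non-zero operations in bitmap-compressed data can be illustrated with a small software sketch. This is a generic bitmap index-matching routine under stated assumptions, not the EIM hardware described in the paper: `compress`, `matched_pairs`, and `sparse_dot` are hypothetical helper names, and the running offsets stand in for what the paper calls effective indexes into the compressed sequences.

```python
def compress(dense):
    """Return (bitmap, values): bitmap[i] is 1 where dense[i] is non-zero,
    and values holds only the non-zero entries in their original order."""
    bitmap = [1 if x != 0 else 0 for x in dense]
    values = [x for x in dense if x != 0]
    return bitmap, values


def matched_pairs(in_bitmap, w_bitmap):
    """List (input_offset, weight_offset) for each position where both the
    input and the weight are non-zero. Each offset is the count of non-zeros
    seen so far, i.e. the position inside the compressed value list."""
    pairs = []
    in_off = w_off = 0
    for ib, wb in zip(in_bitmap, w_bitmap):
        if ib and wb:
            pairs.append((in_off, w_off))
        in_off += ib
        w_off += wb
    return pairs


def sparse_dot(inputs, weights):
    """Dot product that multiplies only the matched non-zero pairs."""
    in_bitmap, in_vals = compress(inputs)
    w_bitmap, w_vals = compress(weights)
    return sum(in_vals[i] * w_vals[j]
               for i, j in matched_pairs(in_bitmap, w_bitmap))


if __name__ == "__main__":
    inputs = [0, 3, 0, 5, 2, 0, 0, 1]
    weights = [4, 0, 0, 2, 6, 0, 7, 0]
    print(matched_pairs(compress(inputs)[0], compress(weights)[0]))
    print(sparse_dot(inputs, weights))
```

Here only two of the eight positions hold a non-zero input and a non-zero weight at the same time, so only two multiplications are performed. A hardware design would compute the matches and offsets in parallel (for example with a bitwise AND and prefix sums) rather than in a sequential loop.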

Paper Structure

This paper contains 11 sections, 9 figures, 1 table, 1 algorithm.

Figures (9)

  • Figure 1: An example of data compression using a bitmap. The blue and green numbers represent the original indexes of non-zero inputs and weights, respectively, while the gray zeros represent the zero values.
  • Figure 2: A simple example for typical sparse DLAs [gondimalla2019sparten] and the proposed SIDR.
  • Figure 3: The DLA architecture and the dataflow of SIDR.
  • Figure 4: The process and example of EIM.
  • Figure 5: A demonstration of performing the bitmap example in Fig. 1 using SIDR.
  • ...and 4 more figures