Enabling Unstructured Sparse Acceleration on Structured Sparse Accelerators

Geonhwa Jeong; Po-An Tsai; Abhimanyu R. Bambhaniya; Stephen W. Keckler; Tushar Krishna

TL;DR

This work addresses the mismatch between the unstructured sparsity found in DNNs and the structured sparsity that practical accelerators support. It introduces TASD (Tensor Approximation via Structured Decomposition), which approximates any sparse tensor as a sum of structured sparse tensors. The TASDER framework automatically selects a TASD configuration per layer, applying it to weights (TASD-W) and to activations (TASD-A), and requires no fine-tuning. Combined with a flexible structured sparse HW design (TTC), the approach accelerates both dense and sparse networks, including those with activation sparsity, at modest area overhead. On benchmarks such as ResNet50 and BERT, the reported gains reach up to 83% EDP improvement and up to 39% speedup on a real system.

Abstract

Exploiting sparsity in deep neural networks (DNNs) has been a promising area for meeting the growing computation requirements. To minimize the overhead of sparse acceleration, hardware designers have proposed structured sparsity support, but it provides limited flexibility and requires extra model fine-tuning. Moreover, any sparse model fine-tuned for certain structured sparse HW cannot be accelerated by other structured hardware. To enable acceleration using unstructured sparsity of DNNs on structured sparse hardware, we propose an approximation method leveraging the distributive property in linear algebra to turn any sparse tensor into a series of structured sparse tensors. We also develop a software framework, TASDER, to apply high-quality structured approximation on weights and activations of DNNs. Our method accelerates dense and sparse DNNs without fine-tuning and improves energy-delay-product (EDP) by up to 83% and 74%. It achieves up to 39% speed-up on a real system.
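
The abstract does not spell out the decomposition procedure, so the following is only a minimal sketch of the idea under stated assumptions. It assumes each structured term is obtained greedily by keeping the N largest-magnitude entries in every group of M consecutive row elements of the current residual, and it relies on the distributive property, $AB \approx (\sum_i A_i) B = \sum_i A_i B$. The function names (`nm_project`, `decompose`, `approx_matmul`), the pattern list, and the example matrices are hypothetical and not taken from the paper.

```python
def nm_project(matrix, n, m):
    """Keep the n largest-magnitude entries in each group of m consecutive
    row elements and zero the rest (an N:M structured sparse tensor)."""
    out = []
    for row in matrix:
        assert len(row) % m == 0, "row length must be a multiple of m"
        new_row = [0.0] * len(row)
        for start in range(0, len(row), m):
            block = range(start, start + m)
            keep = sorted(block, key=lambda j: abs(row[j]), reverse=True)[:n]
            for j in keep:
                new_row[j] = row[j]
        out.append(new_row)
    return out


def subtract(a, b):
    """Element-wise a - b for two equally shaped matrices."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def matmul(a, b):
    """Plain dense matrix product of list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def decompose(matrix, patterns):
    """Assumed greedy scheme: each term is the N:M projection of the
    residual left over by the previous terms."""
    terms, residual = [], matrix
    for n, m in patterns:
        term = nm_project(residual, n, m)
        terms.append(term)
        residual = subtract(residual, term)
    return terms, residual


def approx_matmul(terms, b):
    """Approximate A @ B as the sum of (A_i @ B) over the structured terms."""
    total = None
    for term in terms:
        part = matmul(term, b)
        if total is None:
            total = part
        else:
            total = [[x + y for x, y in zip(r1, r2)]
                     for r1, r2 in zip(total, part)]
    return total


# Hypothetical 2x8 matrix with unstructured sparsity, and an 8x2 operand.
A = [[3, 0, 0, 1, 0, 0, 0, 2],
     [0, 5, 4, 2, 0, 0, 0, 0]]
B = [[1, 0], [0, 1], [1, 1], [2, 0], [0, 2], [1, 0], [0, 1], [1, 1]]

terms, residual = decompose(A, [(2, 4), (1, 4)])
one_term = approx_matmul(terms[:1], B)   # lossy: row 1 drops one nonzero
two_terms = approx_matmul(terms, B)      # exact for this particular A
```

In this example the second row of `A` has three nonzeros in one group of four, so a single 2:4 term drops the smallest of them and `one_term` differs from the dense product. Adding a 1:4 term on the residual recovers it, and `two_terms` matches `matmul(A, B)`. In general the residual need not reach zero, which is why the method is described as an approximation.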

Paper Structure (30 sections, 3 equations, 20 figures, 4 tables)

Introduction
Background
Terminology
DNN SW: Inducing sparsity in DNNs
DNN HW: Exploiting sparsity in DNNs
Tension between sparse DNN SW and HW
TASD: Tensor Approximation via Structured Decomposition
Overview
Using TASD for matrix multiplication
HW/SW Co-Design with TASD
System architecture overview
TASD-W: Applying TASD on weights
TASD-A: Applying TASD on activations
Structured sparse HW for TASD
Evaluation
...and 15 more sections

Figures (20)

Figure 1: Different flows to exploit sparsity in DNNs.
Figure 2: Different sparsity patterns and views.
Figure 3: TASD Interface.
Figure 4: TASD example using a 2$\times$8 matrix $A$.
Figure 5: System overview with TASDER.
...and 15 more figures
