Enabling Unstructured Sparse Acceleration on Structured Sparse Accelerators

Geonhwa Jeong, Po-An Tsai, Abhimanyu R. Bambhaniya, Stephen W. Keckler, Tushar Krishna

TL;DR

This work addresses the mismatch between the unstructured sparsity found in DNNs and the structured sparsity that practical accelerators support. It introduces TASD, which expresses any sparse tensor as a sum of structured sparse tensors. The TASDER framework automatically selects a TASD configuration per layer for weights (TASD-W) and activations (TASD-A), reducing energy-delay product (EDP) and delivering real-system speedups without fine-tuning. Combined with a flexible structured sparse hardware design (TTC), the approach accelerates both dense and sparse networks, including those with activation sparsity, at modest area overhead. On benchmarks such as ResNet50 and BERT, the results include up to 83% EDP improvement and up to 39% real-system speedup, which points to practical impact on mainstream hardware.

Abstract

Exploiting sparsity in deep neural networks (DNNs) has been a promising area for meeting the growing computation requirements. To minimize the overhead of sparse acceleration, hardware designers have proposed structured sparsity support, but it provides limited flexibility and requires extra model fine-tuning. Moreover, any sparse model fine-tuned for certain structured sparse HW cannot be accelerated by other structured hardware. To enable acceleration using unstructured sparsity of DNNs on structured sparse hardware, we propose an approximation method leveraging the distributive property in linear algebra to turn any sparse tensor into a series of structured sparse tensors. We also develop a software framework, TASDER, to apply high-quality structured approximation on weights and activations of DNNs. Our method accelerates dense and sparse DNNs without fine-tuning and improves energy-delay-product (EDP) by up to 83% and 74%. It achieves up to 39% speed-up on a real system.
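
The decomposition idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a greedy, magnitude-based N:M selection (keep the N largest-magnitude values in each block of M elements), and the helper names (`nm_project`, `decompose`, `matmul`, `add`) and the matrix values are hypothetical. Each term is the structured projection of whatever the earlier terms left behind, and the distributive property lets the product be computed term by term.

```python
def nm_project(row, n, m):
    """Keep the n largest-magnitude entries in every block of m consecutive elements."""
    out = [0] * len(row)
    for start in range(0, len(row), m):
        block = range(start, min(start + m, len(row)))
        keep = sorted(block, key=lambda i: abs(row[i]), reverse=True)[:n]
        for i in keep:
            out[i] = row[i]
    return out


def decompose(matrix, configs):
    """Greedy series: each term is the N:M projection of the residual left so far."""
    residual = [list(r) for r in matrix]
    terms = []
    for n, m in configs:
        term = [nm_project(r, n, m) for r in residual]
        residual = [[x - t for x, t in zip(r, tr)] for r, tr in zip(residual, term)]
        terms.append(term)
    return terms, residual


def matmul(a, b):
    """Plain dense matrix product on nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Illustrative 2x8 sparse matrix (made-up values) and an 8x2 dense operand.
A = [[3, 0, 2, 1, 0, 0, 0, 4],
     [0, 5, 0, 0, 1, 2, 3, 0]]
B = [[i + j for j in range(2)] for i in range(8)]

# Two-term series: a 2:4 term followed by a 2:8 term on the residual.
terms, residual = decompose(A, [(2, 4), (2, 8)])

# Distributive property: (A1 + A2) B = A1 B + A2 B.
approx_product = add(matmul(terms[0], B), matmul(terms[1], B))
exact_product = matmul(A, B)
```

For this particular matrix the two terms capture every nonzero, so the residual is zero and `approx_product` equals `exact_product`. With a single `(2, 4)` term the residual is nonzero and the product becomes an approximation, which is the trade-off that a per-layer configuration choice has to manage.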

Paper Structure (30 sections, 3 equations, 20 figures, 4 tables)

Figures (20)

  • Figure 1: Different flows to exploit sparsity in DNNs.
  • Figure 2: Different sparsity patterns and views.
  • Figure 3: TASD Interface.
  • Figure 4: TASD example using a $2 \times 8$ matrix $A$.
  • Figure 5: System overview with TASDER.
  • ...and 15 more figures