Enabling Unstructured Sparse Acceleration on Structured Sparse Accelerators
Geonhwa Jeong, Po-An Tsai, Abhimanyu R. Bambhaniya, Stephen W. Keckler, Tushar Krishna
TL;DR
This work tackles the mismatch between the unstructured sparsity found in DNNs and the structured sparsity that practical sparse accelerators support. It introduces TASD, which represents any sparse tensor as a sum of structured sparse tensors. The TASDER framework automatically selects a TASD configuration for each layer, enabling TASD-W (applied to weights) and TASD-A (applied to activations). It achieves significant energy-delay product (EDP) reductions and real-system speedups without fine-tuning. Paired with a flexible structured sparse hardware design (TTC), the approach accelerates both dense and sparse networks, including their activation sparsity, with modest area overhead. On benchmarks such as ResNet50 and BERT, it improves EDP by up to 83% and delivers up to 39% speedup on a real system, highlighting its practical value for mainstream structured sparse hardware.
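As a rough illustration of writing a sparse tensor as a sum of structured sparse tensors, the Python sketch below greedily peels off one N:M structured term at a time from the remaining residual. The greedy magnitude-based extraction, the grouping along rows, and all function names (`nm_prune`, `tasd_decompose`, `sum_terms`, `is_nm_sparse`) are hypothetical choices for illustration, not the authors' TASDER implementation.

```python
from typing import List, Tuple

# Hypothetical sketch: greedy, magnitude-based TASD-style decomposition.
# Not the authors' TASDER code; pure Python lists stand in for tensors.
Matrix = List[List[float]]


def nm_prune(rows: Matrix, n: int, m: int) -> Matrix:
    """Keep the n largest-magnitude entries in each group of m consecutive
    elements of every row and zero the rest, giving an N:M structured tensor."""
    out: Matrix = []
    for row in rows:
        if len(row) % m != 0:
            raise ValueError("row length must be a multiple of m")
        new_row = [0.0] * len(row)
        for start in range(0, len(row), m):
            # Rank positions in this group by |value|; sorted() is stable,
            # so ties keep the lower index.
            ranked = sorted(range(start, start + m), key=lambda i: -abs(row[i]))
            for i in ranked[:n]:
                new_row[i] = row[i]
        out.append(new_row)
    return out


def is_nm_sparse(rows: Matrix, n: int, m: int) -> bool:
    """Check that every group of m consecutive elements has at most n nonzeros."""
    return all(
        sum(1 for x in row[s:s + m] if x != 0) <= n
        for row in rows
        for s in range(0, len(row), m)
    )


def sum_terms(terms: List[Matrix]) -> Matrix:
    """Element-wise sum of equally shaped matrices."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*terms)]


def tasd_decompose(rows: Matrix, config: List[Tuple[int, int]]) -> List[Matrix]:
    """Greedily split `rows` into one structured sparse term per (n, m) pattern
    in `config`; each term is extracted from what earlier terms left over."""
    terms: List[Matrix] = []
    residual = [list(r) for r in rows]
    for n, m in config:
        term = nm_prune(residual, n, m)
        terms.append(term)
        residual = [[r - t for r, t in zip(rr, tr)] for rr, tr in zip(residual, term)]
    return terms


if __name__ == "__main__":
    # One row with 3 nonzeros in its first group of 4: 2:4 alone is lossy,
    # but a 2:4 + 1:4 series reproduces it exactly.
    W = [[0.9, 0.0, -0.5, 0.1, 0.0, 0.3, 0.0, -0.2]]
    terms = tasd_decompose(W, [(2, 4), (1, 4)])
    print(terms[0])  # [[0.9, 0.0, -0.5, 0.0, 0.0, 0.3, 0.0, -0.2]]
    print(terms[1])  # [[0.0, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0, 0.0]]
    print(sum_terms(terms) == W)  # True
```

In this sketch, a single 2:4 term drops the 0.1 entry, while the 2:4 + 1:4 series captures it; choosing how many terms to use, and which patterns, trades accuracy against the work each term adds.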
Abstract
Exploiting sparsity in deep neural networks (DNNs) is a promising way to meet their growing computation requirements. To minimize the overhead of sparse acceleration, hardware designers have proposed support for structured sparsity, but it offers limited flexibility and requires extra model fine-tuning. Moreover, a sparse model fine-tuned for one structured sparse hardware design cannot be accelerated by hardware that supports a different structured sparsity pattern. To accelerate DNNs with unstructured sparsity on structured sparse hardware, we propose an approximation method that leverages the distributive property of linear algebra to turn any sparse tensor into a series of structured sparse tensors. We also develop a software framework, TASDER, that applies high-quality structured approximations to the weights and activations of DNNs. Our method accelerates dense and sparse DNNs without fine-tuning and improves energy-delay product (EDP) by up to 83% and 74%, respectively. It achieves up to 39% speedup on a real system.
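The distributive property mentioned above can be written out as follows. The notation is hypothetical and chosen for illustration: $W$ is a layer's weight (or activation) tensor, $X$ is the operand it multiplies, and $W_1, \dots, W_k$ are structured sparse terms, each following an $N_i{:}M_i$ pattern that the hardware supports.

$$
W \approx \sum_{i=1}^{k} W_i
\quad\Longrightarrow\quad
W X \approx \Big(\sum_{i=1}^{k} W_i\Big) X = \sum_{i=1}^{k} W_i X
$$

Each product $W_i X$ is a structured sparse operation that such hardware can accelerate, and the approximation becomes exact whenever the terms together capture every nonzero of $W$, i.e. when the residual $W - \sum_{i=1}^{k} W_i$ is zero.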
