Prosperity: Accelerating Spiking Neural Networks via Product Sparsity
Chiyue Wei, Cong Guo, Feng Cheng, Shiyu Li, Hao "Frank" Yang, Hai "Helen" Li, Yiran Chen
TL;DR
The paper argues that existing hardware for Spiking Neural Networks (SNNs) exploits only bit sparsity, and introduces Product Sparsity (ProSparsity), which reuses inner-product results shared between spike rows whose spike patterns are combinatorially similar. It then designs Prosperity, a dedicated hardware architecture with a ProSparsity Processing Unit (PPU), TCAM-based detectors, and a ProSparsity Forest-driven pipeline to achieve overhead-free sparse processing and linear-time complexity. Across spiking CNNs and transformers, Prosperity achieves average speedups of $7.4\times$ over the prior SNN accelerator PTB and $1.8\times$ over the A100 GPU, with energy efficiency improvements of $8.0\times$ and $193\times$, respectively. On SpikeBERT, Product Sparsity lowers density from $13.19\%$ under bit sparsity to $1.23\%$, an $11\times$ reduction in computation. These results indicate that ProSparsity enables substantial, hardware-efficient improvements for a broad class of SNNs, including emerging spiking transformer models, while remaining algorithm-agnostic and compatible with other DNN compression techniques.
Abstract
Spiking Neural Networks (SNNs) are highly efficient due to their spike-based activation, which inherently produces bit-sparse computation patterns. Existing hardware implementations of SNNs leverage this sparsity pattern to avoid wasteful zero-value computations, yet this approach fails to fully capitalize on the potential efficiency of SNNs. This study introduces a novel sparsity paradigm called Product Sparsity, which leverages combinatorial similarities within matrix multiplication operations to reuse the inner product result and reduce redundant computations. Compared to traditional bit sparsity methods, Product Sparsity significantly enhances sparsity in SNNs without compromising the original computation results. For instance, in the SpikeBERT SNN model, Product Sparsity achieves a density of only $1.23\%$ and reduces computation by $11\times$, compared to bit sparsity, which has a density of $13.19\%$. To efficiently implement Product Sparsity, we propose Prosperity, an architecture that addresses the challenges of identifying and eliminating redundant computations in real time. Compared to the prior SNN accelerator PTB and the A100 GPU, Prosperity achieves an average speedup of $7.4\times$ and $1.8\times$, respectively, along with energy efficiency improvements of $8.0\times$ and $193\times$, respectively. The code for Prosperity is available at https://github.com/dubcyfor3/Prosperity.
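The reuse idea behind Product Sparsity can be illustrated with a small software sketch. It is not the Prosperity hardware pipeline and is not taken from the released code; it only assumes one plausible reading of "combinatorial similarity": if the spike set of one row is a subset of the spike set of another row, the second row can start from the first row's finished output and accumulate only the remaining weight rows. The function name `product_sparse_matmul`, the popcount-ordered processing, and the "largest subset wins" rule are all illustrative assumptions.

```python
def popcount(x: int) -> int:
    """Number of set bits, i.e. spikes in a row mask."""
    return bin(x).count("1")


def product_sparse_matmul(spikes, weights):
    """Multiply a binary spike matrix (M x K) by a weight matrix (K x N).

    Hypothetical sketch: each row reuses the output of an already processed
    row whose spike set is a subset of its own, then adds only the weight
    rows for the leftover spikes. Returns (output, bit_ops, prod_ops), where
    the op counts are weight-row accumulations under bit sparsity and under
    this reuse scheme, respectively.
    """
    n_cols = len(weights[0])
    # Encode each spike row as an integer bitmask (bit k set <=> spike at k).
    masks = [sum(bit << k for k, bit in enumerate(row)) for row in spikes]
    # Process rows with fewer spikes first so candidate subsets are ready.
    order = sorted(range(len(masks)), key=lambda i: (popcount(masks[i]), i))

    out = [None] * len(masks)
    done = []
    bit_ops = 0
    prod_ops = 0
    for i in order:
        # Assumed rule: pick the processed row with the largest spike set
        # that is fully contained in row i.
        best = None
        for j in done:
            is_subset = masks[j] != 0 and (masks[j] & masks[i]) == masks[j]
            if is_subset and (best is None or popcount(masks[j]) > popcount(masks[best])):
                best = j
        if best is None:
            residual, acc = masks[i], [0] * n_cols
        else:
            residual, acc = masks[i] & ~masks[best], list(out[best])
        # Accumulate only the weight rows not covered by the reused result.
        for k in range(len(weights)):
            if (residual >> k) & 1:
                acc = [a + w for a, w in zip(acc, weights[k])]
        out[i] = acc
        bit_ops += popcount(masks[i])
        prod_ops += popcount(residual)
        done.append(i)
    return out, bit_ops, prod_ops


spikes = [
    [1, 0, 1, 0],
    [1, 0, 1, 1],  # superset of row 0: reuses its result, adds one weight row
    [1, 0, 1, 0],  # identical to row 0: fully reused
    [0, 1, 0, 0],
]
weights = [[1, 2], [3, 4], [5, 6], [7, 8]]

out, bit_ops, prod_ops = product_sparse_matmul(spikes, weights)
print(out, bit_ops, prod_ops)
```

In this toy input the output matches an ordinary matrix product, while the number of weight-row accumulations falls from 8 (one per spike) to 4. Copying a reused result is not counted as an operation here, which is a simplification of this sketch.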
