PULSE: Parametric Hardware Units for Low-power Sparsity-Aware Convolution Engine
Ilkin Aliyev, Tosiron Adegbija
TL;DR
The paper tackles the challenge of unstructured, layer-wise sparsity in Spiking Neural Networks by introducing PULSE, a design-time parametric hardware generator that adapts hardware resources to per-layer workloads. The architecture comprises an Event Control Unit and a Neural Core that uses Priority Encoder-based spike compression. Together they enable event-driven, layer-wise resource partitioning that matches processing elements to each layer's sparsity. Empirical results on a Kintex FPGA with MNIST, FMNIST, and SVHN show substantial gains: a $3.14\times$ improvement in FPS/W over sparsity-oblivious designs and $1.72\times$ over the most recent sparsity-aware work, albeit with some latency overhead because whole layers are processed at a time. Overall, PULSE demonstrates that explicit workload balancing and design-time configurability can significantly boost energy efficiency for sparsity-rich SNN accelerators, with room for further improvement through power gating and finer-grained partitioning.
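To make Priority Encoder-based spike compression concrete, here is a minimal Python sketch. It assumes a binary spike vector per timestep and a lowest-index-first priority; the function name `priority_encode` is hypothetical and not taken from the PULSE design. Each loop iteration models one encoder cycle: the highest-priority active input is reported as an address and then masked off.

```python
from typing import List


def priority_encode(spikes: List[int]) -> List[int]:
    """Compress a binary spike vector into the addresses of its active neurons.

    Hypothetical software model of a priority encoder: each "cycle" reports the
    lowest-index active input, then clears it, until no spikes remain. The number
    of cycles equals the number of spikes, not the vector length.
    """
    # Pack the spike vector into an integer bitmask (bit i = neuron i).
    mask = 0
    for i, s in enumerate(spikes):
        if s:
            mask |= 1 << i

    addresses = []
    while mask:
        # Isolate the lowest set bit (treated as highest priority in this sketch).
        lowest = mask & -mask
        addresses.append(lowest.bit_length() - 1)
        # Clear that bit so the next cycle sees the next active neuron.
        mask ^= lowest
    return addresses


spikes = [0, 1, 0, 0, 1, 1, 0, 0]
print(priority_encode(spikes))  # [1, 4, 5]
```

For this eight-neuron vector only three addresses leave the encoder, so downstream event-driven work scales with the spike count rather than with the layer width.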
Abstract
Spiking Neural Networks (SNNs) have become popular for their more bio-realistic behavior than Artificial Neural Networks (ANNs). However, effectively leveraging the intrinsic, unstructured sparsity of SNNs in hardware is challenging, especially because sparsity varies across network layers. This variability depends on several factors, including the input dataset, encoding scheme, and neuron model. Most existing SNN accelerators fail to account for the layer-specific workloads of an application (model + dataset), leading to high energy consumption. To address this, we propose a design-time parametric hardware generator that takes layer-wise sparsity and the number of processing elements as inputs and synthesizes the corresponding hardware. The proposed design compresses sparse spike trains using a priority encoder and efficiently shifts the activations across the network's layers. We demonstrate the robustness of our approach by first profiling a given application's characteristics and then performing efficient resource allocation. Results on a Xilinx Kintex FPGA (Field-Programmable Gate Array) using the MNIST, FashionMNIST, and SVHN datasets show a 3.14x improvement in accelerator efficiency (FPS/W) compared to a sparsity-oblivious systolic array-based accelerator. Compared to the most recent sparsity-aware work, our solution improves efficiency by 1.72x.
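The abstract describes a generator whose inputs are per-layer sparsity and a processing-element (PE) budget. The sketch below shows one plausible allocation rule under stated assumptions: sparsity is the fraction of zero spikes in a layer, a layer's workload is its dense synaptic operation count scaled by (1 - sparsity), and PEs are split proportionally with largest-remainder rounding. The function `allocate_pes` and this proportional rule are hypothetical illustrations, not the paper's published allocation method.

```python
from typing import List


def allocate_pes(dense_ops: List[int], sparsity: List[float], total_pes: int) -> List[int]:
    """Split a PE budget across layers in proportion to expected spike workload.

    Hypothetical rule for illustration: workload = dense_ops * (1 - sparsity).
    """
    n = len(dense_ops)
    if len(sparsity) != n:
        raise ValueError("dense_ops and sparsity must have one entry per layer")
    if total_pes < n:
        raise ValueError("need at least one PE per layer")

    # Event-driven work: only the non-zero (spiking) fraction triggers operations.
    work = [ops * (1.0 - s) for ops, s in zip(dense_ops, sparsity)]
    total_work = sum(work)

    # Reserve one PE per layer, then share the remaining budget by workload.
    spare = total_pes - n
    if total_work > 0:
        shares = [spare * w / total_work for w in work]
    else:
        shares = [spare / n] * n  # no spikes at all: split evenly
    alloc = [1 + int(sh) for sh in shares]

    # Largest-remainder rounding: leftover PEs go to the largest fractional parts.
    leftover = total_pes - sum(alloc)
    order = sorted(range(n), key=lambda i: shares[i] - int(shares[i]), reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


# Three layers: dense ops and sparsity per layer, with a 16-PE budget.
print(allocate_pes([800, 1600, 1600], [0.75, 0.5, 0.875], 16))  # [3, 10, 3]
```

Here the least sparse middle layer receives most of the PEs. Reserving one PE per layer reflects the assumption that, in a layer-partitioned design, every layer needs at least one processing unit to make progress.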
