PULSE: Parametric Hardware Units for Low-power Sparsity-Aware Convolution Engine

Ilkin Aliyev, Tosiron Adegbija

TL;DR

The paper addresses unstructured sparsity that varies from layer to layer in Spiking Neural Networks (SNNs) by introducing PULSE, a design-time parametric hardware generator that sizes hardware resources to each layer's workload. The architecture comprises an Event Control Unit and a Neural Core, with Priority Encoder-based spike compression. Together these enable event-driven, layer-wise resource partitioning that matches the number of processing elements to each layer's sparsity pattern. Results on a Xilinx Kintex FPGA with MNIST, FashionMNIST, and SVHN show a $3.14\times$ improvement in FPS/W over a sparsity-oblivious design and $1.72\times$ over the most recent sparsity-aware work, at the cost of some latency overhead from processing whole layers. Overall, PULSE demonstrates that explicit workload balancing and design-time configurability can significantly improve the energy efficiency of sparsity-rich SNN accelerators, with potential for further gains through power gating and finer-grained partitioning.
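To make the idea of layer-wise resource partitioning concrete, the sketch below splits a fixed budget of processing elements (PEs) across layers in proportion to profiled spike counts. It is a hypothetical heuristic for illustration only: the function name `partition_pes`, the one-PE-per-layer floor, and the largest-remainder rounding are assumptions, not the allocation procedure described in the paper.

```python
def partition_pes(spike_counts, total_pes):
    """Split total_pes across layers in proportion to integer spike counts.

    Hypothetical heuristic: every layer gets at least one PE, and the rest
    are shared proportionally using largest-remainder rounding.
    """
    n = len(spike_counts)
    total = sum(spike_counts)
    if total_pes < n:
        raise ValueError("need at least one PE per layer")
    if total <= 0:
        raise ValueError("spike counts must sum to a positive value")

    spare = total_pes - n  # PEs left after the one-per-layer floor
    # Integer quotient and remainder of each layer's proportional share.
    parts = [divmod(count * spare, total) for count in spike_counts]
    alloc = [1 + quotient for quotient, _ in parts]

    # Hand out PEs lost to rounding down, largest remainder first.
    leftover = total_pes - sum(alloc)
    order = sorted(range(n), key=lambda i: parts[i][1], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


# Example with made-up profiling data: three layers, 13 PEs in total.
print(partition_pes([600, 300, 100], 13))
```

The layer with the most spikes receives the most PEs, which is the intuition behind aligning hardware with per-layer workload. The spike counts in the example are invented, not measurements from the paper.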

Abstract

Spiking Neural Networks (SNNs) have become popular for their more bio-realistic behavior than Artificial Neural Networks (ANNs). However, effectively leveraging the intrinsic, unstructured sparsity of SNNs in hardware is challenging, especially because sparsity varies across network layers. This variability depends on several factors, including the input dataset, encoding scheme, and neuron model. Most existing SNN accelerators fail to account for the layer-specific workloads of an application (model + dataset), leading to high energy consumption. To address this, we propose a design-time parametric hardware generator that takes layer-wise sparsity and the number of processing elements as inputs and synthesizes the corresponding hardware. The proposed design compresses sparse spike trains using a priority encoder and efficiently shifts the activations across the network's layers. We demonstrate the robustness of our approach by first profiling a given application's characteristics and then allocating resources accordingly. Results on a Xilinx Kintex field-programmable gate array (FPGA) using the MNIST, FashionMNIST, and SVHN datasets show a $3.14\times$ improvement in accelerator efficiency (FPS/W) compared to a sparsity-oblivious systolic array-based accelerator. Compared to the most recent sparsity-aware work, our solution improves efficiency by $1.72\times$.
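The priority-encoder compression mentioned in the abstract can be illustrated in software. The minimal sketch below assumes a spike train is held as a bit-vector (a Python integer) and that the encoder gives priority to the lowest-index set bit. Both choices, and the function names, are assumptions for illustration; the actual hardware may order or store spikes differently.

```python
def priority_encode(bits):
    """Return the index of the lowest set bit, or None if no bit is set."""
    if bits == 0:
        return None
    # bits & -bits isolates the lowest set bit as a power of two.
    return (bits & -bits).bit_length() - 1


def compress_spikes(bits):
    """Convert a sparse spike bit-vector into a list of spike addresses."""
    addresses = []
    while bits:
        addresses.append(priority_encode(bits))
        bits &= bits - 1  # clear the bit that was just encoded
    return addresses


# Eight neurons, of which neurons 1, 4, and 7 fired.
print(compress_spikes(0b10010010))
```

The loop runs once per spike rather than once per neuron, so the work scales with activity rather than with layer size. That is why this style of compression suits sparse spike trains.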

Paper Structure

This paper contains 8 sections, 3 equations, 3 figures, 1 table, 1 algorithm.

Figures (3)

  • Figure 1: Illustration of spike-based convolution operation flow. The "membrane potential" of neurons with non-zero activation values is updated along with the surrounding neurons determined by the filter weights.
  • Figure 2: The proposed layer architecture for a CONV/FC layer hardware. PENC stands for Priority Encoder routine. F. weight stands for Filter weights.
  • Figure 3: Per-layer workload distribution for each application (model+dataset). The superscripts represent the network used for the dataset, shown below Table 1.
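The spike-based convolution flow described in the Figure 1 caption can be sketched as a scatter operation: each spike adds the filter weights to the membrane potentials of the neurons around it, and positions without spikes cost nothing. The code below is a simplified software illustration under stated assumptions (a single channel, a square odd-sized filter, zero padding, and no leak, threshold, or reset). The function name `event_driven_conv` is hypothetical and the code does not model the paper's hardware.

```python
def event_driven_conv(spikes, weights, height, width):
    """Accumulate membrane potentials from a list of (row, col) spike events.

    The result equals a zero-padded cross-correlation of a binary spike map
    with `weights`, but only the positions that spiked are visited.
    """
    k = len(weights)
    pad = k // 2
    potential = [[0.0] * width for _ in range(height)]
    for r, c in spikes:
        for i in range(k):
            for j in range(k):
                # Output neuron whose receptive field sees this spike
                # through filter tap (i, j).
                y, x = r - i + pad, c - j + pad
                if 0 <= y < height and 0 <= x < width:
                    potential[y][x] += weights[i][j]
    return potential


# One spike at the center of a 3x3 feature map, with a 3x3 filter.
filter_weights = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
for row in event_driven_conv([(1, 1)], filter_weights, 3, 3):
    print(row)
```

With a single central spike, the potentials come out as the filter flipped in both axes, since each neighboring neuron sees the spike through a different filter tap.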