A High-Throughput Spiking Neural Network Processor Enabling Synaptic Delay Emulation

Faquan Chen, Qingyang Tian, Ziren Wu, Rendong Ying, Fei Wen, Peilin Liu

TL;DR

Real-time temporal processing in SNNs on edge devices requires efficient handling of synaptic delays, which are costly to implement on resource-constrained hardware. The authors propose a multicore SNN processor that emulates synaptic delays by converting delayed computations into non-delayed equivalents using a Spiking Ring Buffer, implemented as a four-core SoC on a PYNQ Z2 FPGA. The delayed LIF update is $u^{l}_i[t] = \lambda u^{l}_i[t-1] + \sum_j w_{ij}s_j^{l-1}[t-d_{ij}] - v_{th}s_i^{l}[t-1]$, with $s_i^l[t] = H(u_i^l[t]-v_{th})$, where $H(\cdot)$ is the Heaviside step function, $v_{th}$ is the firing threshold, and $d_{ij}$ is the synaptic delay of the connection from neuron $j$ to neuron $i$. On the SHD benchmark with a 3-layer network (140-256-256-20), the deployed model reaches 93.4% accuracy after 8-bit quantization; the system processes 104 samples/s at 125 MHz with an average power of 282 mW and a total SoC power of 1.71 W. Overall, the work demonstrates a cost-effective, edge-ready hardware platform for synaptic-delay-based temporal processing in SNNs on reconfigurable devices.
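The delayed LIF update above can be made concrete with a short reference sketch. This is not the authors' implementation: it is a plain-Python illustration that evaluates the equation directly by indexing a full spike history. The function name `lif_delayed_step`, the argument layout, the default `leak` and `v_th` values, the choice to treat spikes before $t=0$ as zero, and the convention $H(0)=1$ are all assumptions made for the example.

```python
def lif_delayed_step(u_prev, s_prev, spike_history, weights, delays, t,
                     leak=0.9, v_th=1.0):
    """One timestep of the delayed LIF update for a single layer.

    u_prev[i]           : membrane potential u_i[t-1]
    s_prev[i]           : output spike s_i[t-1] (0 or 1)
    spike_history[k][j] : presynaptic spike s_j[k] from the previous layer
    weights[i][j]       : synaptic weight w_ij
    delays[i][j]        : integer synaptic delay d_ij, in timesteps
    """
    u, s = [], []
    for i in range(len(u_prev)):
        syn = 0.0
        for j in range(len(weights[i])):
            k = t - delays[i][j]          # timestep the delayed spike came from
            if k >= 0:                    # assumption: no spikes before t = 0
                syn += weights[i][j] * spike_history[k][j]
        # Leak, integrate delayed input, and subtract the threshold on reset.
        u_i = leak * u_prev[i] + syn - v_th * s_prev[i]
        u.append(u_i)
        s.append(1 if u_i >= v_th else 0)  # assumption: H(0) = 1
    return u, s


# One neuron with two inputs: input 0 has no delay, input 1 is delayed by 2 steps.
history = [[0, 1], [0, 0], [1, 0]]        # spikes at t = 0, 1, 2
u, s = lif_delayed_step([0.5], [0], history,
                        weights=[[1.0, 0.5]], delays=[[0, 2]],
                        t=2, leak=0.5, v_th=1.0)
```

In this usage example both synapses contribute at $t=2$: input 0 through its undelayed spike at $t=2$, and input 1 through its spike at $t=0$ that arrives two steps later. Storing the whole history is what the non-delayed reformulation is meant to avoid.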

Abstract

Synaptic delay has attracted significant attention in neural network dynamics for integrating and processing complex spatiotemporal information. This paper introduces a high-throughput Spiking Neural Network (SNN) processor that supports synaptic delay-based emulation for edge applications. The processor leverages a multicore pipelined architecture with parallel compute engines, capable of real-time processing of the computational load associated with synaptic delays. We develop an SoC prototype of the proposed processor on the PYNQ Z2 FPGA platform and evaluate its performance using the Spiking Heidelberg Digits (SHD) benchmark for low-power keyword spotting tasks. The processor achieves 93.4% accuracy in deployment and an average throughput of 104 samples/sec at a typical operating frequency of 125 MHz and 282 mW power consumption.

Paper Structure

This paper contains 5 sections, 2 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: Comparison of synaptic delay implementations: (a) conventional delayed model; (b) proposed non-delayed equivalent.
  • Figure 2: The overall architecture. During computation, the head pointer advances to store new presynaptic spikes at the current timestep. Synaptic weights/delays are fetched, enabling spike pointer calculation (via delay + head pointer) to access historical spikes for current-timestep processing.
  • Figure 3: Placement of the compute cores and the processing system (PS) on the FPGA.
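
The pointer arithmetic described in the Figure 2 caption (spike pointer = delay + head pointer) can be illustrated with a small software model of a spike ring buffer. This is a hedged sketch rather than the processor's actual design: the class name `SpikeRingBuffer`, the method names, the buffer depth, and in particular the choice to move the head pointer backwards each timestep (so that adding the delay reaches older slots) are assumptions made for the example.

```python
class SpikeRingBuffer:
    """Fixed-depth circular store of recent presynaptic spike vectors."""

    def __init__(self, depth, num_inputs):
        self.depth = depth                      # must exceed the largest delay
        self.slots = [[0] * num_inputs for _ in range(depth)]
        self.head = 0                           # slot holding the current timestep

    def push(self, spikes):
        """Advance the head and store the spikes of the current timestep."""
        # Assumption: the head steps backwards, so head + d addresses the
        # slot written d timesteps ago.
        self.head = (self.head - 1) % self.depth
        self.slots[self.head] = list(spikes)

    def read(self, j, delay):
        """Return the spike of input j from `delay` timesteps ago."""
        if not 0 <= delay < self.depth:
            raise ValueError("delay must lie in [0, depth)")
        pointer = (self.head + delay) % self.depth   # spike pointer
        return self.slots[pointer][j]


def synaptic_input(buffer, weights_i, delays_i):
    """Accumulate delayed, weighted input for one postsynaptic neuron."""
    return sum(w * buffer.read(j, d)
               for j, (w, d) in enumerate(zip(weights_i, delays_i)))


buf = SpikeRingBuffer(depth=4, num_inputs=2)
for spikes in ([0, 1], [0, 0], [1, 0]):          # timesteps 0, 1, 2
    buf.push(spikes)
total = synaptic_input(buf, weights_i=[1.0, 0.5], delays_i=[0, 2])
```

With this layout each synapse needs only one modular addition to find its delayed spike, and memory is bounded by the buffer depth instead of the full sequence length. Slots that have not been written yet read as zero, which matches treating spikes before the first timestep as absent.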