
Fast Algorithms for Spiking Neural Network Simulation with FPGAs

Björn A. Lindqvist, Artur Podobas

TL;DR

This paper tackles the challenge of efficiently simulating large-scale spiking neural networks on energy-constrained hardware. It uses OpenCL-based high-level synthesis to implement the Potjans-Diesmann cortical microcircuit on a single Intel Agilex FPGA, exploring twelve simulator variants that combine push, just-in-time, and horizon-based spike transfer with memory-aware data structures. The results show real-time or faster performance with competitive or superior energy efficiency (below 21 nJ per synaptic event in the best cases) while preserving accuracy relative to a reference NEST simulation, demonstrating that FPGAs are viable for high-fidelity SNN simulation. The work also offers broadly applicable design insights, such as on-chip memory-centric spike transfer, disjoint synapse partitioning, and horizon-based transfer, that can be adapted to other hardware, pointing to a practical path toward energy-efficient, scalable brain-inspired modeling on heterogeneous HPC systems.
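The TL;DR lists push-based spike transfer among the explored strategies. As a minimal sketch of the general idea, assuming the conventional formulation used by many CPU simulators (when a neuron spikes, its outgoing synaptic weights are immediately added into delay-indexed ring buffers belonging to the target neurons), push delivery might look like the code below. The class `PushRingBuffer` and its methods are hypothetical illustrations, not the authors' FPGA kernels, and the just-in-time and horizon-based variants are not modeled.

```python
class PushRingBuffer:
    """Hypothetical sketch: push-based spike delivery into per-neuron ring buffers."""

    def __init__(self, n_neurons, max_delay):
        self.n = n_neurons
        self.max_delay = max_delay
        # One slot for the current step plus one per possible delay (1..max_delay).
        self.slots = [[0.0] * n_neurons for _ in range(max_delay + 1)]
        self.outgoing = [[] for _ in range(n_neurons)]  # src -> [(dst, weight, delay)]
        self.t = 0

    def connect(self, src, dst, weight, delay):
        if not 1 <= delay <= self.max_delay:
            raise ValueError("delay must be in 1..max_delay time steps")
        self.outgoing[src].append((dst, weight, delay))

    def collect(self):
        """Return (and clear) the summed synaptic input arriving at the current step."""
        i = self.t % len(self.slots)
        arriving = self.slots[i]
        self.slots[i] = [0.0] * self.n
        return arriving

    def push(self, spiking):
        """Push: every spiking source writes its weights into its targets' future slots."""
        for src in spiking:
            for dst, w, d in self.outgoing[src]:
                self.slots[(self.t + d) % len(self.slots)][dst] += w

    def tick(self):
        self.t += 1


# Usage: neuron 0 spikes at step 0 and reaches neurons 1 and 2 after different delays.
buf = PushRingBuffer(n_neurons=3, max_delay=2)
buf.connect(0, 1, 0.5, delay=1)
buf.connect(0, 2, 0.25, delay=2)
buf.collect(); buf.push([0]); buf.tick()   # step 0
print(buf.collect())                       # step 1: [0.0, 0.5, 0.0]
buf.push([]); buf.tick()
print(buf.collect())                       # step 2: [0.0, 0.0, 0.25]
```

Sizing the buffer at `max_delay + 1` slots keeps the slot read at step t distinct from every slot a new spike can target (t + 1 through t + max_delay), so reads and writes within one step never collide.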

Abstract

Using OpenCL-based high-level synthesis, we create a number of spiking neural network (SNN) simulators for the Potjans-Diesmann cortical microcircuit on a high-end Field-Programmable Gate Array (FPGA). Our best simulators simulate the circuit 25% faster than real-time, require less than 21 nJ per synaptic event, and are bottlenecked by the device's on-chip memory. Speed-wise, they compare favorably to state-of-the-art GPU-based simulators, and their energy usage is lower than any other published result. This is the first result for simulating the circuit on a single hardware accelerator. We also extensively analyze the techniques and algorithms with which we implement our simulators, many of which can be realized on other types of hardware. Thus, this article is of interest to any researcher or practitioner interested in efficient SNN simulation, whether they target FPGAs or not.
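The two headline metrics in the abstract can be made concrete with their usual definitions: a real-time factor (wall-clock seconds per second of simulated biological time, where values below 1 mean faster than real time) and average energy per synaptic event (average power times wall-clock time, divided by the number of synaptic events processed). The helper names and the input numbers below are hypothetical and are not measurements from the paper.

```python
def real_time_factor(wall_seconds, model_seconds):
    """Wall-clock seconds needed per second of simulated time; < 1 is faster than real time."""
    return wall_seconds / model_seconds


def energy_per_synaptic_event_nj(avg_power_watts, wall_seconds, synaptic_events):
    """Average energy per synaptic event in nanojoules: E = P * t / N (J -> nJ via 1e9)."""
    return avg_power_watts * wall_seconds * 1e9 / synaptic_events


# Hypothetical run: 10 s of model time simulated in 9 s at an average of 40 W,
# during which 3e10 synaptic events were processed.
print(real_time_factor(9.0, 10.0))                    # 0.9
print(energy_per_synaptic_event_nj(40.0, 9.0, 3e10))  # 12.0
```

Because energy scales with wall-clock time, a faster simulator at the same power draw also lowers the energy per synaptic event, which is why speed and energy results tend to move together.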

Paper Structure

This paper contains 30 sections, 11 equations, 29 figures, and 5 tables.

Figures (29)

  • Figure 1: A fully-connected neural network with two hidden layers, two input neurons, and three output neurons
  • Figure 2: A LUT for computing any two-variable boolean function, implemented as one 4:1 multiplexer and four memory bits. The bit values determine which function the LUT computes.
  • Figure 3: Simplified schematic of the Agilex 7 ALM. The control signals of the multiplexers (trapezoids; signals not shown) and the contents of the LUT configure the ALM. Depending on its configuration, the ALM can serve as, among other things, a four-bit adder, a four-bit memory, or six-input combinational logic.
  • Figure 4: Single and multiple work item OpenCL kernels for element-wise vector product
  • Figure 5: A loop-nest that could benefit from pipeline parallelism. The functions f, g, and h are assumed to be short and inlineable.
  • ...and 24 more figures
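Figure 2's caption describes a LUT as four memory bits selected by a 4:1 multiplexer. A minimal software model of that idea follows; the function names and the bit ordering (input `a` as the high select bit) are assumptions for illustration, since real devices define their own ordering.

```python
def lut2(config, a, b):
    """Evaluate a hypothetical 2-input LUT.

    `config` holds the four memory bits; the inputs (a, b) act as the select
    lines of a 4:1 multiplexer that picks one of them.
    """
    select = (a << 1) | b          # 0..3 for a, b in {0, 1}
    return (config >> select) & 1  # read the selected memory bit


def config_for(fn):
    """Derive the 4-bit LUT contents that make lut2 compute fn(a, b)."""
    return sum((fn(a, b) & 1) << ((a << 1) | b) for a in (0, 1) for b in (0, 1))


AND = config_for(lambda a, b: a & b)   # 0b1000
XOR = config_for(lambda a, b: a ^ b)   # 0b0110
print([lut2(AND, a, b) for a in (0, 1) for b in (0, 1)])  # [0, 0, 0, 1]
print([lut2(XOR, a, b) for a in (0, 1) for b in (0, 1)])  # [0, 1, 1, 0]
```

Four bits have 2^4 = 16 possible settings, so the same multiplexer-plus-memory structure realizes all 16 two-variable boolean functions; only the stored bits change.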