FeNN-DMA: A RISC-V SoC for SNN acceleration

Zainab Aizaz, James C. Knight, Thomas Nowotny

TL;DR

This work tackles the memory-bound nature of Spiking Neural Networks and the need for programmable accelerators by introducing FeNN-DMA, a fully-programmable RISC-V SoC for SNN acceleration on FPGA. It combines a scalar CV32E40x core, a wide 32-lane vector core (VEC), and a bespoke DMA to stream weights from DDR4 into URAM, all orchestrated by PyFeNN, enabling complex neuron models, delays, and sparse connectivity. The approach achieves state-of-the-art accuracy on SHD (90.32%) and Neuromorphic-MNIST (98.46%) while supporting much larger networks than fixed-function FPGA accelerators, and runs on a Kria KV260 with single- and dual-core configurations. The work demonstrates that wide-vector, programmable FPGA accelerators can deliver competitive performance with manageable power, and outlines clear paths toward inter-core scalability and training via EventProp for broader neuromorphic workloads.

Abstract

Spiking Neural Networks (SNNs) are a promising, energy-efficient alternative to standard Artificial Neural Networks (ANNs) and are particularly well-suited to spatio-temporal tasks such as keyword spotting and video classification. However, SNNs have a much lower arithmetic intensity than ANNs and are therefore not well-matched to standard accelerators like GPUs and TPUs. Field Programmable Gate Arrays (FPGAs) are designed for such memory-bound workloads, and here we develop a novel, fully-programmable RISC-V-based system-on-chip (FeNN-DMA), tailored to simulating SNNs on modern UltraScale+ FPGAs. We show that FeNN-DMA has comparable resource usage and energy requirements to state-of-the-art fixed-function SNN accelerators, yet it is capable of simulating much larger and more complex models. Using this functionality, we demonstrate state-of-the-art classification accuracy on the Spiking Heidelberg Digits and Neuromorphic MNIST tasks.

Paper Structure

This paper contains 18 sections, 1 equation, 3 figures, 5 tables.

Figures (3)

  • Figure 1: (A) Block diagram of a single-core FeNN System-on-Chip. (B) Execution of VMUL instruction in one vector lane.
  • Figure 2: Lane-local memory data structures for spike propagation. Snaking lines indicate parallelism across vector lanes. Colours are used to differentiate inputs to different neurons. (A) Compressed connectivity with 256 target neurons. $I_i$ indicates the input to neuron $i$. (B) Delayed connectivity with 64 target neurons and $N_\text{delay}=4$ delay slots. $I_{i,d}$ indicates the input to neuron $i$ in delay slot $d$.
  • Figure 3: (A) Example raster plot showing activity of neurons in a balanced random network with 2048 excitatory and 512 inhibitory neurons. (B) Simulation time of a balanced random network with 90% sparsity running on a single FeNN core. Points represent measured simulation times and dashed lines in corresponding colours show the time predicted by our performance model based on the number of neurons and number of SOPs. Horizontal dashed line represents real-time performance. (C) Effective throughput of balanced random network with 16000 neurons. "Measured" is based on total simulation time, "Measured synapses" is calculated using performance counters around the event propagation process group and "Theoretical" is calculated as described in §\ref{sec:snn_implement}.