SpikeStream: Accelerating Spiking Neural Network Inference on RISC-V Clusters with Sparse Computation Extensions

Simone Manoni, Paul Scheffler, Luca Zanatta, Andrea Acquaviva, Luca Benini, Andrea Bartolini

TL;DR

SpikeStream tackles the sparsity-driven bottlenecks of Spiking Neural Network (SNN) inference with a software-based acceleration approach running on a multicore RISC-V streaming cluster. It combines CSR-based tensor compression, task and data parallelism, tiling with double buffering, and Stream Registers (SRs) to convert irregular sparse accesses into hardware-assisted streaming operations. On the S-VGG11 CIFAR10-like benchmark, SpikeStream delivers up to $7.29\times$ speedup with $52.3\%$ FPU utilization in FP16 and $5.68\times$ higher energy efficiency in FP8 relative to a non-streaming baseline, while also outperforming or rivaling several state-of-the-art (SoA) neuromorphic accelerators in end-to-end metrics. These results indicate that flexible, general-purpose (GP) SNN acceleration via SR-enhanced streaming can narrow the gap between neuromorphic efficiency and general-purpose compute, enabling broader adoption of SNNs on conventional hardware.

Abstract

Spiking Neural Network (SNN) inference has a clear potential for high energy efficiency as computation is triggered by events. However, the inherent sparsity of events poses challenges for conventional computing systems, driving the development of specialized neuromorphic processors, which come with high silicon area costs and lack the flexibility needed for running other computational kernels, limiting widespread adoption. In this paper, we explore the low-level software design, parallelization, and acceleration of SNNs on general-purpose multicore clusters with a low-overhead RISC-V ISA extension for streaming sparse computations. We propose SpikeStream, an optimization technique that maps weight accesses to affine and indirect register-mapped memory streams to enhance performance, utilization, and efficiency. Our results on the end-to-end Spiking-VGG11 model demonstrate a significant 4.39x speedup and an increase in utilization from 9.28% to 52.3% compared to a non-streaming parallel baseline. Additionally, we achieve an energy efficiency gain of 3.46x over LSMCore and a performance gain of 2.38x over Loihi.
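
To make the idea of index-driven weight accesses more concrete, the following is a minimal Python sketch, not the paper's kernel. It assumes a 2D binary spike map and a dense weight matrix; the function names `compress_spikes_csr` and `sparse_accumulate` are hypothetical and chosen only for illustration. Because spikes are binary, only CSR-style index arrays are stored, and each output is formed by gathering and summing the weight rows selected by those indices. On the hardware described above, such a gather would presumably be served by an indirect memory stream rather than by explicit loads; here NumPy indexing stands in for it.

```python
import numpy as np

def compress_spikes_csr(spikes):
    """Compress a 2D binary spike map into CSR-style index arrays.

    Spikes are 0/1, so no value array is kept: only row pointers and
    the column indices of the active (spiking) entries.
    """
    row_ptr = [0]
    col_idx = []
    for row in spikes:
        col_idx.extend(np.flatnonzero(row).tolist())
        row_ptr.append(len(col_idx))
    return (np.asarray(row_ptr, dtype=np.int32),
            np.asarray(col_idx, dtype=np.int32))

def sparse_accumulate(weights, row_ptr, col_idx):
    """For each compressed row, sum the weight rows selected by its indices.

    The indexed read weights[idx] emulates an indirect (index-driven)
    stream; the sum is the accumulation step. No multiplications are
    needed because every stored spike has value 1.
    """
    n_rows = len(row_ptr) - 1
    out = np.zeros((n_rows, weights.shape[1]), dtype=weights.dtype)
    for r in range(n_rows):
        idx = col_idx[row_ptr[r]:row_ptr[r + 1]]
        out[r] = weights[idx].sum(axis=0)
    return out

# Toy data (assumed for illustration): 3 rows of spikes over 4 inputs.
spikes = np.array([[0, 1, 0, 1],
                   [0, 0, 0, 0],
                   [1, 0, 0, 0]])
weights = np.arange(8, dtype=np.float32).reshape(4, 2)

row_ptr, col_idx = compress_spikes_csr(spikes)
out = sparse_accumulate(weights, row_ptr, col_idx)
```

In this toy case the result matches the dense product `spikes @ weights`, while only the three active entries trigger weight reads. That reduction in work is the property that makes sparse, event-driven processing attractive, and the irregular index-driven reads are the access pattern that streaming hardware is meant to absorb.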

Paper Structure

This paper contains 18 sections, 1 equation, 7 figures.

Figures (7)

  • Figure 1: Pseudocode of the spatial iterations on the RF.
  • Figure 2: SpVA loop RISC-V assembly.
  • Figure 3: SpikeStream SpVA pseudocode.
  • Figure 4: SpikeStream convolutional layer dataflow.
  • Figure 5: Performance and memory footprint evaluation on S-VGG11, showing average values and standard deviations for a batch size of 128 input frames.
  • ...and 2 more figures