
L-SPINE: A Low-Precision SIMD Spiking Neural Compute Engine for Resource-efficient Edge Inference

Sonu Kumar, Mukul Lokhande, Santosh Kumar Vishvakarma

Abstract

Spiking Neural Networks (SNNs) offer a promising solution for energy-efficient edge intelligence; however, their hardware deployment is constrained by memory overhead, inefficient scaling operations, and limited parallelism. This work proposes L-SPINE, a low-precision SIMD-enabled spiking neural compute engine for efficient edge inference. The architecture features a unified multi-precision datapath supporting 2-bit, 4-bit, and 8-bit operations, leveraging a multiplier-less shift-add model for neuron dynamics and synaptic accumulation. Implemented on an AMD VC707 FPGA, the proposed neuron requires only 459 LUTs and 408 FFs, achieving a critical delay of 0.39 ns and 4.2 mW power. At the system level, L-SPINE achieves 46.37K LUTs, 30.4K FFs, 2.38 ms latency, and 0.54 W power. Compared to CPU and GPU platforms, it reduces inference latency from seconds to milliseconds, achieving up to a three-orders-of-magnitude improvement in energy efficiency. Quantisation analysis shows that INT2/INT4 configurations significantly reduce memory footprint with minimal accuracy loss. These results establish L-SPINE as a scalable and efficient solution for real-time edge SNN deployment.
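The abstract's "multiplier-less shift-add model for neuron dynamics and synaptic accumulation" can be illustrated with a minimal sketch of a leaky integrate-and-fire (LIF) update. The leak is applied as a right shift (decay factor 1 - 2^-k), and binary input spikes gate weight accumulation, so no multiplier is needed. The parameter names and values (`LEAK_SHIFT`, `THRESHOLD`) are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch of a multiplier-less shift-add LIF neuron update.
# LEAK_SHIFT and THRESHOLD are assumed values, not from the paper.

THRESHOLD = 128   # firing threshold (illustrative)
LEAK_SHIFT = 4    # leak: v *= (1 - 2**-4), realised as a right shift

def lif_step(v, spikes_in, weights):
    """One LIF timestep: leak, integrate binary spikes, fire-and-reset."""
    # Leak without a multiplier: subtract a right-shifted copy of v.
    v = v - (v >> LEAK_SHIFT)
    # Synaptic accumulation: binary spikes gate weight addition (no multiply).
    v += sum(w for s, w in zip(spikes_in, weights) if s)
    # Fire-and-reset on threshold crossing.
    if v >= THRESHOLD:
        return 0, 1   # reset potential, emit spike
    return v, 0
```

In hardware, the shift is free wiring and the gated accumulation is an adder tree, which is what keeps the neuron down to a few hundred LUTs.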

Paper Structure

This paper contains 10 sections, 5 figures, 2 tables.

Figures (5)

  • Figure 1: System-level architecture of the proposed L-SPINE accelerator integrating pico-rv32 RISC-V, spike encoding modules, and a 2D SIMD-enabled neuron processing array for efficient SNN inference.
  • Figure 2: Detailed datapath for Proposed SIMD-enabled multi-precision compute engine supporting configurable 16x 2-bit, 4x 4-bit, and 1x 8-bit operations using a reconfigurable shift-add logic integrated in LIF neuron computation.
  • Figure 3: Design and evaluation flow of the proposed L-SPINE architecture, including SNN training, quantization, hardware mapping, and FPGA-based validation.
  • Figure 4: Comparison of accuracy and memory footprint with state-of-the-art SNN quantisation methods: STBP, ADMM, TruncQuant, and MAC.
  • Figure 5: Impact of precision scaling on SNN accuracy across INT2, INT4, INT8, and FP32 configurations.
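The configurable datapath in Figure 2 (16x 2-bit, 4x 4-bit, or 1x 8-bit lanes) relies on subword packing, i.e. placing several low-precision operands into one wider word. The sketch below shows generic LSB-first pack/unpack helpers; the function names and the choice of word width are assumptions for illustration, not details from the paper.

```python
# Illustrative subword packing for a multi-precision SIMD datapath.
# Lane counts follow Figure 2; helper names are assumptions.

def pack(values, bits):
    """Pack unsigned sub-words into one integer word, LSB-first."""
    word = 0
    for i, v in enumerate(values):
        assert 0 <= v < (1 << bits), "value exceeds lane precision"
        word |= v << (i * bits)
    return word

def unpack(word, bits, count):
    """Extract `count` unsigned sub-words of width `bits` from a word."""
    mask = (1 << bits) - 1
    return [(word >> (i * bits)) & mask for i in range(count)]
```

A reconfigurable engine then operates on all lanes of the packed word at once, selecting the lane width (2, 4, or 8 bits) at runtime, which is what makes a single shift-add datapath serve INT2, INT4, and INT8 inference.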