
Compression and Inference of Spiking Neural Networks on Resource-Constrained Hardware

Karol C. Jurzec, Tomasz Szydlo, Maciej Wielgosz

TL;DR

This paper presents a C-based runtime for executing SNNTorch-trained spiking neural networks on resource-constrained hardware, emphasizing static memory allocation, cache-friendly data layouts, and a JSON-to-C model importer. It introduces spike-driven pruning to remove inactive neurons and filters, achieving substantial speedups (approximately 11× on desktop, and up to 21× with pruning) and memory reductions that enable microcontroller deployment. Evaluations on N-MNIST and ST-MNIST show functional parity with Python baselines while enabling real-time or near-real-time inference on devices such as the Arduino Portenta H7. The findings suggest that a well-engineered runtime combined with spike-driven compression can extend SNN deployment from neuromorphic chips to conventional embedded platforms, broadening the applicability of edge AI.
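The TL;DR highlights static memory allocation. One common way to obtain it, sketched here under our own assumptions rather than taken from the authors' runtime, is a fixed-size static arena from which every layer buffer is carved once at start-up, so the inference loop never calls `malloc`. `ARENA_BYTES`, `arena_alloc`, `arena_reset`, `lif_buffers` and `lif_buffers_init` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical arena size; a real build would size it offline from the model. */
#define ARENA_BYTES 4096

/* One statically allocated, 16-byte-aligned region for all runtime buffers. */
static _Alignas(16) uint8_t arena_buf[ARENA_BYTES];
static size_t arena_used = 0;

/* Bump allocator: returns `size` bytes aligned to `align` (a power of two,
   at most 16), or NULL if the arena is exhausted. Nothing is freed individually. */
static void *arena_alloc(size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0 && align <= 16);
    size_t start = (arena_used + align - 1) & ~(align - 1);
    if (start > ARENA_BYTES || size > ARENA_BYTES - start) return NULL;
    arena_used = start + size;
    return &arena_buf[start];
}

/* Release everything at once, e.g. before loading a different model. */
static void arena_reset(void) { arena_used = 0; }

/* Hypothetical per-layer state for n spiking neurons, carved from the arena at init. */
typedef struct {
    float *mem;    /* membrane potentials */
    uint8_t *spk;  /* spike outputs of the current time step */
    size_t n;
} lif_buffers;

/* Returns 1 on success, 0 if the arena cannot hold this layer. */
static int lif_buffers_init(lif_buffers *b, size_t n) {
    b->mem = arena_alloc(n * sizeof(float), _Alignof(float));
    b->spk = arena_alloc(n, 1);
    b->n = n;
    return b->mem != NULL && b->spk != NULL;
}
```

Because allocation only moves a cursor forward, the memory footprint is fixed by `ARENA_BYTES` at build time, and a model that does not fit fails at initialization with `NULL` rather than in the middle of inference.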

Abstract

Spiking neural networks (SNNs) communicate via discrete spikes in time rather than continuous activations. Their event-driven nature offers advantages for temporal processing and energy efficiency on resource-constrained hardware, but training and deployment remain challenging. We present a lightweight C-based runtime for SNN inference on edge devices, together with optimizations that reduce latency and memory without sacrificing accuracy. Trained models exported from SNNTorch are translated to a compact C representation; static, cache-friendly data layouts and preallocation avoid interpreter and allocation overheads. We further exploit sparse spiking activity to prune inactive neurons and synapses, shrinking computation in upstream convolutional layers. Experiments on N-MNIST and ST-MNIST show functional parity with the Python baseline while achieving ~10× speedups on a desktop CPU, with additional gains from pruning, together with large memory reductions that enable microcontroller deployment (Arduino Portenta H7). Results indicate that SNNs can be executed efficiently on conventional embedded platforms when paired with an optimized runtime and spike-driven model compression. Code: https://github.com/karol-jurzec/snn-generator/
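The combination of preallocated state, cache-friendly layout and event-driven execution described in the abstract can be illustrated with one dense layer feeding leaky integrate-and-fire (LIF) neurons. This is a minimal sketch, not the runtime's code: the weights, `BETA`, `THRESHOLD` and all function names are hypothetical, and the update rule (decay the membrane, add the input current, subtract the threshold if the membrane was above threshold on the previous step, fire when the membrane exceeds the threshold) is our reading of SNNTorch's `Leaky` neuron with subtractive reset. Weights are stored input-major so that each spiking input touches one contiguous row, and silent inputs are skipped entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N_IN  4
#define N_OUT 2

/* Hypothetical weights, stored input-major (W_T[i][o]) so the weights fanning
   out of one spiking input form a single contiguous row in memory. */
static const float W_T[N_IN][N_OUT] = {
    {0.50f, 0.00f},
    {0.50f, 0.00f},
    {0.00f, 0.25f},
    {0.00f, 0.25f},
};
static const float BETA = 0.5f;       /* membrane decay (assumed value) */
static const float THRESHOLD = 1.0f;  /* firing threshold (assumed value) */

/* Preallocated state: no heap allocation on the inference path. */
static float mem[N_OUT];
static float cur[N_OUT];

static void lif_reset_state(void) { memset(mem, 0, sizeof mem); }

/* One time step: dense layer followed by LIF neurons. */
static void lif_fc_step(const uint8_t in_spk[N_IN], uint8_t out_spk[N_OUT]) {
    memset(cur, 0, sizeof cur);
    for (int i = 0; i < N_IN; ++i) {
        if (!in_spk[i]) continue;  /* event-driven: skip silent inputs */
        for (int o = 0; o < N_OUT; ++o) cur[o] += W_T[i][o];
    }
    for (int o = 0; o < N_OUT; ++o) {
        /* Subtractive reset based on the previous step's membrane (assumed SNNTorch-like). */
        float reset = (mem[o] > THRESHOLD) ? THRESHOLD : 0.0f;
        mem[o] = BETA * mem[o] + cur[o] - reset;
        out_spk[o] = (uint8_t)(mem[o] > THRESHOLD);
    }
}

/* Run t_steps with the same input frame and accumulate per-neuron spike counts. */
static void run_constant_input(const uint8_t in_spk[N_IN], int t_steps, int counts[N_OUT]) {
    uint8_t out[N_OUT];
    assert(t_steps > 0);
    lif_reset_state();
    memset(counts, 0, N_OUT * sizeof counts[0]);
    for (int t = 0; t < t_steps; ++t) {
        lif_fc_step(in_spk, out);
        for (int o = 0; o < N_OUT; ++o) counts[o] += out[o];
    }
}
```

With all four inputs spiking for 10 steps, neuron 0 fires on every other step (5 spikes), while neuron 1 settles just below threshold and never fires; a permanently silent neuron like this is what spike-driven pruning targets.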

Paper Structure

This paper contains 12 sections, 7 figures, 1 table.

Figures (7)

  • Figure 1: Overview of the C-based SNN runtime architecture and implementation flow. The runtime loads event-based data and a JSON-exported SNN model, then executes the network forward pass over time. Optimizations such as multi-threading (if available) and network pruning are applied to improve performance.
  • Figure 2: Example spiking raster plots for ST-MNIST and N-MNIST datasets. Each plot shows spike events over time for one representative spiking layer. Only a subset of neurons emit spikes during the entire stimulus, illustrating the sparse activity characteristic of SNNs.
  • Figure 3: Illustration of the relationship between a convolutional layer and a subsequent spiking layer, used for pruning. In this toy example, a conv2d layer with 4 output channels (red, green, blue, gray) feeds into a spiking layer of 16 neurons. Each group of 4 spiking neurons (right) corresponds to one convolutional filter (color-coded). The numbers indicate spike counts per neuron over a given input.
  • Figure 4: Effect of pruning on the conv2d and spiking layers. Continuing the example from Figure 3, the red and blue filters (and their connected spiking neurons) are removed because their neurons showed no activity (all spike counts zero). This halves the computation of this convolutional layer. The pruned network retains only the active filters (green and gray) and their associated neurons.
  • Figure 5: Arduino Portenta H7 microcontroller used as the deployment target.
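The pruning rule in Figures 3 and 4 can be sketched in C as follows. This is an illustration under stated assumptions, not the authors' implementation: the spiking layer is assumed to store neurons channel-major, so the neurons fed by filter `f` occupy one contiguous group of `group` entries; each filter's weights are assumed to form one contiguous block of `per_filter` floats; and spike counts are assumed to come from running calibration inputs through the network. `spike_prune_mask` and `compact_filters` are hypothetical names. A complete implementation would presumably also drop the matching input channels from the next layer's weights; that step is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Filter f owns neurons [f*group, (f+1)*group) of the following spiking layer.
   A filter is kept if any of its neurons spiked at least once.
   Writes a 0/1 mask to keep[] and returns the number of kept filters. */
static int spike_prune_mask(const int *spike_counts, int n_filters, int group, int *keep) {
    assert(n_filters > 0 && group > 0);
    int kept = 0;
    for (int f = 0; f < n_filters; ++f) {
        int active = 0;
        for (int j = 0; j < group; ++j) {
            if (spike_counts[f * group + j] > 0) { active = 1; break; }
        }
        keep[f] = active;
        kept += active;
    }
    return kept;
}

/* Compact conv weights and biases in place: surviving filters' weight blocks
   (per_filter floats each, e.g. in_channels*k*k) are shifted to the front.
   Returns the new number of filters. */
static int compact_filters(float *weights, float *bias, int n_filters,
                           size_t per_filter, const int *keep) {
    int dst = 0;
    for (int f = 0; f < n_filters; ++f) {
        if (!keep[f]) continue;
        if (dst != f) {
            memmove(weights + (size_t)dst * per_filter,
                    weights + (size_t)f * per_filter,
                    per_filter * sizeof *weights);
            bias[dst] = bias[f];
        }
        ++dst;
    }
    return dst;
}
```

For invented spike counts in which the first and third groups (red and blue in Figure 3) are all zero, `spike_prune_mask` keeps 2 of 4 filters and `compact_filters` moves the green and gray weights to the front, which matches the 50% reduction described in Figure 4.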