Compression and Inference of Spiking Neural Networks on Resource-Constrained Hardware
Karol C. Jurzec, Tomasz Szydlo, Maciej Wielgosz
TL;DR
This paper presents a C-based runtime for executing SNNTorch-trained spiking neural networks on resource-constrained hardware, emphasizing static memory allocation, cache-friendly data layouts, and a JSON-to-C model importer. It introduces spike-driven pruning to remove inactive neurons and filters, achieving substantial speedups (approximately 11× on desktop and up to 21× with pruning) and memory reductions that enable microcontroller deployment. Evaluations on N-MNIST and ST-MNIST show functional parity with Python baselines while enabling real-time or near-real-time inference on devices like the Arduino Portenta H7. The findings suggest that well-engineered software co-design and spike-driven compression can extend SNN deployment from neuromorphic chips to conventional embedded platforms, broadening edge AI applicability.
Abstract
Spiking neural networks (SNNs) communicate via discrete spikes in time rather than continuous activations. Their event-driven nature offers advantages for temporal processing and energy efficiency on resource-constrained hardware, but training and deployment remain challenging. We present a lightweight C-based runtime for SNN inference on edge devices, together with optimizations that reduce latency and memory without sacrificing accuracy. Trained models exported from SNNTorch are translated to a compact C representation; static, cache-friendly data layouts and preallocation avoid interpreter and allocation overheads. We further exploit sparse spiking activity to prune inactive neurons and synapses, shrinking computation in upstream convolutional layers. Experiments on N-MNIST and ST-MNIST show functional parity with the Python baseline while achieving ~10× speedups on a desktop CPU and additional gains with pruning, together with large memory reductions that enable microcontroller deployment (Arduino Portenta H7). Results indicate that SNNs can be executed efficiently on conventional embedded platforms when paired with an optimized runtime and spike-driven model compression. Code: https://github.com/karol-jurzec/snn-generator/
