Sparse by Rule: Probability-Based N:M Pruning for Spiking Neural Networks
Shuhan Ye, Yi Yu, Qixin Zhang, Chenqi Kong, Qiangqiang Wu, Xudong Jiang, Dacheng Tao
TL;DR
SpikeNM tackles the challenge of pruning deep Spiking Neural Networks (SNNs) by introducing a probabilistic $N{:}M$ semi-structured pruning framework learned from scratch. It uses an $M$-way basis-logit parameterization with a differentiable top-$k$ sampler to linearize per-block search to $\mathcal{O}(M)$ and couples mask learning to spiking dynamics via Eligibility-Inspired Distillation (EID). The method achieves state-of-the-art or competitive accuracy at $N{:}M$ sparsities such as $2{:}4$ and $2{:}8$ on CIFAR10/100 and neuromorphic datasets, while producing hardware-friendly sparsity patterns and preserving energy efficiency. This approach enables scalable, edge-friendly deployment of sparse SNNs and bridges the gap between unstructured and structured pruning by combining flexibility with accelerator-friendly structure.
Abstract
Brain-inspired spiking neural networks (SNNs) promise energy-efficient intelligence via event-driven, sparse computation, but deeper architectures inflate parameter counts and computational cost, hindering edge deployment. Recent progress in SNN pruning helps alleviate this burden, yet existing efforts fall into only two families: \emph{unstructured} pruning, which attains high sparsity but is difficult to accelerate on general hardware, and \emph{structured} pruning, which eases deployment but lacks flexibility and often degrades accuracy at matched sparsity. In this work, we introduce \textbf{SpikeNM}, the first SNN-oriented \emph{semi-structured} \(N{:}M\) pruning framework that learns sparse SNNs \emph{from scratch}, enforcing \emph{at most \(N\)} non-zeros per \(M\)-weight block. To avoid the combinatorial space complexity \(\sum_{k=1}^{N}\binom{M}{k}\) growing exponentially with \(M\), SpikeNM adopts an \(M\)-way basis-logit parameterization with a differentiable top-\(k\) sampler, \emph{linearizing} per-block complexity to \(\mathcal O(M)\) and enabling more aggressive sparsification. Further inspired by neuroscience, we propose \emph{eligibility-inspired distillation} (EID), which converts temporally accumulated credits into block-wise soft targets to align mask probabilities with spiking dynamics, reducing sampling variance and stabilizing search under high sparsity. Experiments show that at \(2{:}4\) sparsity, SpikeNM maintains, and in some cases improves, accuracy across mainstream datasets, while yielding hardware-amenable patterns that complement intrinsic spike sparsity.
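
To make the \(N{:}M\) constraint and the block-level search space concrete, the following is a minimal Python sketch, not the SpikeNM implementation. The function names `num_block_masks` and `topn_block_mask` are hypothetical, and the mask here is chosen by weight magnitude as a simple stand-in for the learned, sampled mask described above. It counts the candidate masks \(\sum_{k=1}^{N}\binom{M}{k}\) for one block and shows what a mask with at most \(N\) non-zeros per \(M\)-weight block looks like.

```python
from math import comb


def num_block_masks(n, m):
    """Number of masks with between 1 and n non-zeros in a block of m weights."""
    return sum(comb(m, k) for k in range(1, n + 1))


def topn_block_mask(weights, n, m):
    """Keep the n largest-magnitude weights in each consecutive block of m.

    Assumption: magnitude selection is only an illustrative stand-in for a
    learned N:M mask; it is not the sampler used by SpikeNM.
    """
    if len(weights) % m != 0:
        raise ValueError("number of weights must be a multiple of m")
    mask = []
    for start in range(0, len(weights), m):
        block = weights[start:start + m]
        # Indices of the n largest |w| in this block (ties go to the lower index).
        keep = set(sorted(range(m), key=lambda i: -abs(block[i]))[:n])
        mask.extend(i in keep for i in range(m))
    return mask


# Candidate masks per block versus the M logits of an M-way parameterization.
for n, m in [(2, 4), (2, 8), (4, 8)]:
    print(f"{n}:{m} -> {num_block_masks(n, m)} candidate masks, {m} logits")

weights = [0.1, -0.9, 0.3, 0.05, 0.7, 0.2, -0.6, 0.0]
print(topn_block_mask(weights, n=2, m=4))
```

Under these assumptions, a \(2{:}4\) block has 10 candidate masks and a \(2{:}8\) block has 36, whereas an \(M\)-way parameterization keeps only \(M\) logits per block regardless of \(N\).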
