Sparse by Rule: Probability-Based N:M Pruning for Spiking Neural Networks

Shuhan Ye, Yi Yu, Qixin Zhang, Chenqi Kong, Qiangqiang Wu, Xudong Jiang, Dacheng Tao

TL;DR

SpikeNM tackles the challenge of pruning deep Spiking Neural Networks (SNNs) by introducing a probabilistic $N{:}M$ semi-structured pruning framework learned from scratch. It uses an $M$-way basis-logit parameterization with a differentiable top-$k$ sampler to linearize per-block search to $\mathcal{O}(M)$ and couples mask learning to spiking dynamics via Eligibility-Inspired Distillation (EID). The method achieves state-of-the-art or competitive accuracy at $N{:}M$ sparsities such as $2{:}4$ and $2{:}8$ on CIFAR10/100 and neuromorphic datasets, while producing hardware-friendly sparsity patterns and preserving energy efficiency. This approach enables scalable, edge-friendly deployment of sparse SNNs and bridges the gap between unstructured and structured pruning by combining flexibility with accelerator-friendly structure.

Abstract

Brain-inspired spiking neural networks (SNNs) promise energy-efficient intelligence via event-driven, sparse computation, but deeper architectures inflate parameters and computational cost, hindering their edge deployment. Recent progress in SNN pruning helps alleviate this burden, yet existing efforts fall into only two families: \emph{unstructured} pruning, which attains high sparsity but is difficult to accelerate on general hardware, and \emph{structured} pruning, which eases deployment but lacks flexibility and often degrades accuracy at matched sparsity. In this work, we introduce \textbf{SpikeNM}, the first SNN-oriented \emph{semi-structured} \(N{:}M\) pruning framework that learns sparse SNNs \emph{from scratch}, enforcing \emph{at most \(N\)} non-zeros per \(M\)-weight block. To avoid the combinatorial space complexity \(\sum_{k=1}^{N}\binom{M}{k}\), which grows exponentially with \(M\), SpikeNM adopts an \(M\)-way basis-logit parameterization with a differentiable top-\(k\) sampler, \emph{linearizing} per-block complexity to \(\mathcal O(M)\) and enabling more aggressive sparsification. Further inspired by neuroscience, we propose \emph{eligibility-inspired distillation} (EID), which converts temporally accumulated credits into block-wise soft targets to align mask probabilities with spiking dynamics, reducing sampling variance and stabilizing search under high sparsity. Experiments show that at \(2{:}4\) sparsity, SpikeNM maintains or even improves accuracy across mainstream datasets, while yielding hardware-amenable patterns that complement intrinsic spike sparsity.
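
To make the complexity argument concrete, the short sketch below counts the candidate masks \(\sum_{k=1}^{N}\binom{M}{k}\) that an enumeration-based approach would have to score per block, and compares that with the \(M\) logits of the basis parameterization. The function name `num_candidate_masks` and the 4:16 configuration are illustrative assumptions, not taken from the paper.

```python
from math import comb


def num_candidate_masks(n: int, m: int) -> int:
    """Count masks with between 1 and n non-zeros in a block of m weights."""
    return sum(comb(m, k) for k in range(1, n + 1))


# Compare enumeration size with the number of per-block logits (which is m).
for n, m in [(2, 4), (2, 8), (4, 16)]:
    candidates = num_candidate_masks(n, m)
    print(f"{n}:{m}  enumerated candidates = {candidates}  basis logits = {m}")
```

For 2:4 this gives 10 candidates against 4 logits, and for 2:8 it gives 36 against 8; the gap widens quickly as \(M\) and \(N\) grow.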

Paper Structure

This paper contains 12 sections, 1 theorem, 22 equations, 2 figures, 3 tables, and 1 algorithm.

Key Result

Theorem 4.1

where $\mathbf{e}_{j}$ is the $j$-th basis vector in $\mathbb{R}^{M}$. $\bigoplus_{k=1}^{N}\mathbf a_k$ has exactly $N$ ones if the $\mathbf a_k$ choose distinct basis vectors.
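
The composition in Theorem 4.1 can be illustrated with a small sketch. It assumes that $\bigoplus$ denotes an element-wise logical OR over one-hot vectors, which is an interpretation of the statement above rather than something the source spells out; the helper names `one_hot` and `or_compose` are hypothetical.

```python
def one_hot(j: int, m: int) -> list[int]:
    """Return the j-th basis vector of length m as a list of 0/1 entries."""
    if not 0 <= j < m:
        raise ValueError("index out of range for block size")
    return [1 if i == j else 0 for i in range(m)]


def or_compose(indices: list[int], m: int) -> list[int]:
    """Combine one-hot vectors with element-wise OR (assumed meaning of the big-oplus)."""
    mask = [0] * m
    for j in indices:
        mask = [a | b for a, b in zip(mask, one_hot(j, m))]
    return mask


# Distinct choices give exactly N ones; a repeated choice gives fewer.
print(or_compose([0, 2], 4))  # two distinct basis vectors
print(or_compose([1, 1], 4))  # the same basis vector chosen twice
```

Under this reading, repeated choices collapse onto the same position, which is why the resulting mask has at most $N$ ones in general and exactly $N$ when the choices are distinct.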

Figures (2)

  • Figure 1: Overview of SpikeNM. For each block of $M$ weights, we learn logits $\boldsymbol{\pi}$ and sample $N$ one-hot basis vectors without replacement. These operations yield an at-most-$N$ binary mask $\mathcal{M}$, avoiding enumeration of all $\sum_{k\le N}\binom{M}{k}$ candidates used in traditional $N{:}M$ methods (for ease of illustration, the figure uses 2:4 sparsity). The mask gates parameters $\mathcal{W}$ during training on inputs of $T$ time steps, optimized by the classification loss $\mathcal{L}_{\text{cls}}$ together with the eligibility-inspired regularizer $\mathcal{L}_{\text{EID}}$. At test time, the learned discrete masks are fixed, producing $N{:}M$ semi-structured sparsity.
  • Figure 2: Ablation with CIFAR10-DVS. (a) Ablation on different search epochs with a fixed total budget of 320 epochs ($\text{search}+\text{finetune}=320$). The dashed line marks the unpruned baseline (82.4%). (b) Ablation on $\lambda_{\mathrm{EID}}$. Bars show accuracies for non-zero $\lambda$. The dashed line denotes $\lambda=0$, where no EID is used.
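
The per-block sampling step described in the Figure 1 caption can be sketched as follows. This is a minimal forward-pass illustration only: it swaps in the standard Gumbel top-$k$ trick to draw $N$ positions without replacement from the block logits, which may differ from the paper's differentiable sampler, and it omits gradients, $\mathcal{L}_{\text{cls}}$, and $\mathcal{L}_{\text{EID}}$ entirely. The function names and the flat-list weight layout are assumptions made for the example.

```python
import math
import random


def sample_block_mask(logits: list[float], n: int, rng: random.Random) -> list[int]:
    """Draw n distinct positions via Gumbel top-k and return a 0/1 mask."""
    perturbed = []
    for logit in logits:
        u = max(rng.random(), 1e-12)  # clamp away from 0 to avoid log(0)
        perturbed.append(logit - math.log(-math.log(u)))  # add Gumbel(0, 1) noise
    order = sorted(range(len(logits)), key=lambda i: perturbed[i], reverse=True)
    mask = [0] * len(logits)
    for i in order[:n]:
        mask[i] = 1
    return mask


def apply_nm_mask(weights: list[float], block_logits: list[list[float]],
                  n: int, rng: random.Random) -> list[float]:
    """Gate a flat weight list block by block; each block has len(block_logits[b]) weights."""
    masked = []
    start = 0
    for logits in block_logits:
        mask = sample_block_mask(logits, n, rng)
        block = weights[start:start + len(logits)]
        masked.extend(w * g for w, g in zip(block, mask))
        start += len(logits)
    return masked


rng = random.Random(0)
weights = [0.5, -1.2, 0.3, 0.9, -0.7, 0.1, 1.1, -0.4]  # two blocks of M = 4
block_logits = [[2.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0]]
print(apply_nm_mask(weights, block_logits, n=2, rng=rng))
```

Each block keeps exactly two weights here because top-$k$ selection never repeats a position; positions with larger logits are kept more often across repeated draws.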

Theorems & Definitions (1)

  • Theorem 4.1: Representation of $N{:}M$ sparsity