Toward Efficient Spiking Transformers: Synapse Pruning Meets Synergistic Learning-Based Compensation

Hongze Sun, Wuque Cai, Duo Chen, Quan Tang, Shifeng Mao, Jiayi He, Zhenxing Wang, Yan Cui, Dezhong Yao, Daqing Guo

TL;DR

This work tackles the efficiency gap in spiking Transformer (ST) models by combining two pruning strategies (unstructured $\mathrm{L_{1}P}$ and structured DSP) with a plug-and-play, synergistic learning-based compensation built on a novel sLIF neuron. The framework reduces parameter count and computation while preserving accuracy, and is supported by theoretical analyses of gradient restoration and plasticity-driven response realignment. Experiments on static and neuromorphic datasets (including ImageNet, CIFAR, CIFAR10-DVS, and ADE20K) show strong compression with competitive performance, faster convergence during fine-tuning, and savings in inference time and energy. The results suggest a practical path for deploying ST-based models on edge devices and neuromorphic hardware, with dynamic sparsity and broader architectural extensions left for future work.

Abstract

As a foundational architecture of artificial intelligence models, the Transformer has recently been adapted to spiking neural networks with promising performance across various tasks. However, existing spiking Transformer (ST)-based models require a substantial number of parameters and incur high computational costs, thus limiting their deployment in resource-constrained environments. To address these challenges, we propose combining synapse pruning with a synergistic learning-based compensation strategy to derive lightweight ST-based models. Specifically, two types of tailored pruning strategies are introduced to reduce redundancy in the weight matrices of ST blocks: an unstructured $\mathrm{L_{1}P}$ method to induce sparse representations, and a structured DSP method to induce low-rank representations. In addition, we propose an enhanced spiking neuron model, termed the synergistic leaky integrate-and-fire (sLIF) neuron, to effectively compensate for model pruning through synergistic learning between synaptic and intrinsic plasticity mechanisms. Extensive experiments on benchmark datasets demonstrate that the proposed methods significantly reduce model size and computational overhead while maintaining competitive performance. These results validate the effectiveness of the proposed pruning and compensation strategies in constructing efficient and high-performing ST-based models.
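
To make the idea of unstructured pruning concrete, the following is a minimal sketch of magnitude-based pruning, which zeroes the entries of a weight matrix with the smallest absolute value. It assumes that $\mathrm{L_{1}P}$ ranks individual weights by magnitude; the abstract does not spell out the criterion, so the function name `l1_prune`, the `sparsity` parameter, and the tie-breaking rule are illustrative choices rather than the authors' implementation. The structured DSP method is not sketched here.

```python
def l1_prune(weights, sparsity):
    """Zero the `sparsity` fraction of entries with the smallest |w|.

    `weights` is a list of rows (a plain 2-D list). This is an assumed,
    simplified criterion and not necessarily the paper's L1P procedure.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    magnitudes = sorted(abs(w) for row in weights for w in row)
    k = int(sparsity * len(magnitudes))  # number of weights to remove
    if k == 0:
        return [row[:] for row in weights]
    threshold = magnitudes[k - 1]  # largest magnitude that gets pruned
    pruned, removed = [], 0
    for row in weights:
        new_row = []
        for w in row:
            # Ties at the threshold are pruned in scan order until k are removed.
            if abs(w) <= threshold and removed < k:
                new_row.append(0.0)
                removed += 1
            else:
                new_row.append(w)
        pruned.append(new_row)
    return pruned


W = [[0.5, -0.1, 0.3],
     [-0.05, 0.8, 0.2]]
print(l1_prune(W, 0.5))  # [[0.5, 0.0, 0.3], [0.0, 0.8, 0.0]]
```

The result is a sparse matrix of the same shape. Pruning alone weakens the input that downstream neurons receive, which is the loss that the compensation strategy in the abstract is meant to recover.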

Paper Structure

This paper contains 28 sections, 19 equations, 12 figures, 7 tables, 1 algorithm.

Figures (12)

  • Figure 1: Comparison of model parameters and classification accuracy between our lightweight models (denoted as 'ST[ours]') and existing ST-based models on the CIFAR10-DVS dataset.
  • Figure 2: Overview of the proposed lightweight ST-based models and the corresponding pruning strategies. In the original ST-based encoder, input embeddings are sequentially processed by the SSA and MLP modules. The primary parameter overhead resides in matrices $\mathbf{U}_{\mathrm{q}}$, $\mathbf{U}_{\mathrm{k}}$, $\mathbf{U}_{\mathrm{v}}$, and $\mathbf{M}_{0}$ of the SSA module, as well as $\mathbf{M}_{1}$ and $\mathbf{M}_{2}$ of the MLP module. To address this, two lightweight strategies are introduced: $\mathrm{L_{1}P}$, which yields sparse matrices, and DSP, which produces low-rank matrices.
  • Figure 3: Illustration of the overall pipeline of the proposed method, including the sLIF neuron model and the synergistic learning mechanism.
  • Figure 4: Response re-alignment via synergistic plasticity of $u_{th}$ and $\tau$. Decreasing $u_{th}$ performs a horizontal shift of the f-I curve to the left, enabling the neuron to respond to weak inputs induced by pruning. Varying $\tau$ modulates the curvature and sensitivity of the response. The joint optimization of both parameters allows the sLIF neuron to maintain robust firing rates and information flow even under extreme pruning rates.
  • Figure 5: Performance comparison of the proposed lightweight models (sLIF) and LIF-compensated models (LIF) under varying pruning sparsity on the CIFAR10 and CIFAR10-DVS datasets. The parameter counts (Param) and pruned accuracies (Pruned) of the baseline models are also provided to demonstrate the effectiveness of the proposed method.
  • ...and 7 more figures
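
The response re-alignment described in the Figure 4 caption can be illustrated with the textbook f-I curve of a leaky integrate-and-fire neuron. The sketch below assumes a continuous-time model $\tau\,\dot{u} = -u + I$ with reset to zero and no refractory period, for which the steady-state firing rate is $f = 1 / \left(\tau \ln\frac{I}{I - u_{th}}\right)$ when $I > u_{th}$ and zero otherwise. The function name `lif_rate` and the parameter values are illustrative, and the paper's sLIF neuron may use a different (for example, discrete-time) formulation.

```python
import math


def lif_rate(current, u_th=1.0, tau=10.0):
    """Steady-state firing rate of a LIF neuron under constant input.

    Assumed model: tau * du/dt = -u + current, reset to 0 when u >= u_th,
    no refractory period. Returns 0.0 when the input never reaches threshold.
    """
    if current <= u_th:
        return 0.0
    # Time for u(t) = current * (1 - exp(-t / tau)) to reach u_th.
    period = tau * math.log(current / (current - u_th))
    return 1.0 / period


weak_input = 0.8  # e.g. an input weakened by pruning
print(lif_rate(weak_input, u_th=1.0))  # 0.0, the neuron stays silent
print(lif_rate(weak_input, u_th=0.5))  # positive, lower threshold restores firing
print(lif_rate(2.0, tau=5.0) / lif_rate(2.0, tau=10.0))  # halving tau raises the rate
```

Lowering `u_th` moves the onset of firing to smaller inputs, which matches the leftward shift in the caption. In this simplified model `tau` only rescales the rate, so it captures the sensitivity effect but not necessarily the curvature change the caption attributes to $\tau$.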