Efficient Training of Spiking Neural Networks by Spike-aware Data Pruning

Chenxiang Ma, Xinyi Chen, Yujie Wu, Kay Chen Tan, Jibin Wu

TL;DR

This work addresses the high training cost of spiking neural networks by introducing spike-aware data pruning (SADP). SADP optimizes data usage by selecting examples with probabilities proportional to an upper-bound proxy of their gradient norms, called the spike-aware importance score, and adds smoothing and a dynamic pruning schedule to stabilize training. The approach yields substantial training speedups while preserving accuracy across diverse datasets and architectures, including large-scale ImageNet experiments, and proves compatible with online, local, and efficient-inference settings. By reducing gradient variance and avoiding expensive per-example gradient computations, SADP offers a data-centric route to scaling SNNs to bigger models and datasets with practical efficiency gains.

Abstract

Spiking neural networks (SNNs), recognized as an energy-efficient alternative to traditional artificial neural networks (ANNs), have advanced rapidly through the scaling of models and datasets. However, such scaling incurs considerable training overhead, posing challenges for researchers with limited computational resources and hindering the sustained development of SNNs. Data pruning is a promising strategy for accelerating training by retaining the most informative examples and discarding redundant ones, but it remains largely unexplored in SNNs. Directly applying ANN-based data pruning methods to SNNs fails to capture the intrinsic importance of examples and suffers from high gradient variance. To address these challenges, we propose a novel spike-aware data pruning (SADP) method. SADP reduces gradient variance by determining each example's selection probability to be proportional to its gradient norm, while avoiding the high cost of direct gradient computation through an efficient upper bound, termed spike-aware importance score. This score accounts for the influence of all-or-nothing spikes on the gradient norm and can be computed with negligible overhead. Extensive experiments across diverse datasets and architectures demonstrate that SADP consistently outperforms data pruning baselines and achieves training speedups close to the theoretical maxima at different pruning ratios. Notably, SADP reduces training time by 35% on ImageNet while maintaining accuracy comparable to that of full-data training. This work, therefore, establishes a data-centric paradigm for efficient SNN training and paves the way for scaling SNNs to larger models and datasets. The source code will be released publicly after the review process.
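The sampling step described in the abstract can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' implementation: the function names, the convex-mixing form of the smoothing, and the default `smoothing=0.3` are assumptions made for the example, and the importance scores are taken as given rather than computed from spikes.

```python
import random


def selection_probabilities(scores, keep_ratio, smoothing=0.3):
    """Map nonnegative importance scores to per-example keep probabilities.

    Assumption: the raw probability is proportional to the score and capped
    at 1, then mixed with the uniform probability `keep_ratio` so that no
    example gets an extremely small probability. The paper's exact smoothing
    rule may differ.
    """
    n = len(scores)
    total = sum(scores)
    if total <= 0:
        # No usable signal: fall back to uniform sampling.
        return [keep_ratio] * n
    budget = keep_ratio * n  # expected number of kept examples before capping
    raw = [min(1.0, budget * s / total) for s in scores]
    return [(1.0 - smoothing) * p + smoothing * keep_ratio for p in raw]


def sample_subset(probs, rng):
    """Draw one independent Bernoulli variable per example.

    Returns (index, weight) pairs for the kept examples. The weight 1 / p
    rescales each kept gradient so the reweighted sum stays unbiased.
    """
    return [(i, 1.0 / p) for i, p in enumerate(probs) if rng.random() < p]


# Hypothetical scores for six examples, keeping about half of them per epoch.
scores = [0.1, 0.5, 2.0, 0.05, 1.2, 0.8]
probs = selection_probabilities(scores, keep_ratio=0.5)
kept = sample_subset(probs, random.Random(0))
print(probs)
print(kept)
```

Because the raw probabilities are capped at 1 in a single pass, the expected subset size can fall slightly below the budget; a threshold search that restores the exact budget is a natural refinement.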

Paper Structure

This paper contains 33 sections, 2 theorems, 31 equations, 6 figures, 11 tables, 1 algorithm.

Key Result

Proposition 1

For the Bernoulli sampling gradient estimator in Eq. (eq:est_w_grad), the optimal selection probabilities $\mathbf{p}^*\!=\!(p_1^*, \ldots, p_N^*)$ that minimize the gradient variance $\mathop{\mathrm{Var}}\nolimits[\hat{\nabla}_{\bm{W}}\mathcal{L}]$ are where the clipping threshold $\alpha$ is defined as and $\|\nabla_{\bm{W}}\ell(\bm{x})\|_{(1)}\!\leq\!\cdots\!\leq\!\|\nabla_{\bm{W}} \ell(\bm{
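The closed-form expressions of Proposition 1 are cut off in the text above, so the sketch below does not reproduce them. It only illustrates a common clipped-proportional construction that fits the description: probabilities of the form min(1, norm / alpha), with the threshold alpha chosen so that the probabilities sum to a target budget. The function name, the `budget` parameter, and the requirement of strictly positive norms are assumptions of this example.

```python
def clipped_probabilities(norms, budget):
    """Return (probs, alpha) with probs[i] = min(1, norms[i] / alpha) and
    sum(probs) equal to `budget` up to rounding.

    Assumes every norm is strictly positive and 0 < budget <= len(norms).
    """
    n = len(norms)
    if n == 0 or not 0 < budget <= n or min(norms) <= 0:
        raise ValueError("need positive norms and 0 < budget <= len(norms)")
    order = sorted(norms, reverse=True)
    # suffix[m] = sum of order[m:], accumulated from the smallest value upward.
    suffix = [0.0] * (n + 1)
    for m in range(n - 1, -1, -1):
        suffix[m] = suffix[m + 1] + order[m]
    for m in range(n):
        # Try clipping the m largest norms to probability 1.
        remaining = budget - m  # budget left for the unclipped examples
        if remaining <= 0:
            break
        alpha = suffix[m] / remaining
        if order[m] <= alpha:
            # The largest unclipped norm already yields a probability <= 1.
            return [min(1.0, g / alpha) for g in norms], alpha
    raise ValueError("no feasible threshold found")


# Hypothetical per-example gradient norms; keep two examples in expectation.
probs, alpha = clipped_probabilities([10.0, 1.0, 1.0, 1.0, 1.0], budget=2)
print(alpha, probs)
```

In this toy input the dominant example is always kept, and the remaining budget of one expected example is spread evenly over the other four.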

Figures (6)

  • Figure 1: Comparison of SADP with existing data pruning methods. (a) Pearson correlation between the per-example gradient norm and different importance scores. Our spike-aware score maintains a significantly higher correlation than the loss score, particularly under high spike sparsity. (b) SADP reduces gradient variance throughout training. (c) This variance reduction leads to faster convergence. (d) SADP consistently achieves higher accuracy across different pruning ratios, with accuracy gains becoming more pronounced at higher ratios. Experiments are conducted with ResNet18 on CIFAR-10.
  • Figure 2: Illustration of SADP. At the start of each training epoch, selection probabilities are computed for all examples using the proposed spike-aware importance score, which provides an efficient and accurate approximation of the per-example gradient norm for variance minimization. To enhance training stability, the probabilities are smoothed to avoid extremely small values, after which a subset of examples is probabilistically sampled for training the SNN. Note that the spike-aware importance score explicitly captures the effect of sparse binary spikes on the per-example gradient norm, and its computation requires only quantities already available from forward and backward passes, thereby achieving both high efficiency and strong effectiveness.
  • Figure 3: Comparison of training efficiency on CIFAR-100. The y-axis represents the training time ratio relative to the full-data training. The theoretical maximum training time ratio is equal to the specified pruning ratio.
  • Figure 4: Comparison of selection probabilities derived from (a) the spike-aware importance score and (b) the loss score, against target probabilities based on the per-example gradient norm. The CIFAR-10 dataset with a 50% pruning ratio and a smoothing constant of 0.3 is adopted. Pearson correlation coefficients are provided in the legend.
  • Figure 5: Comparison of gradient variance. Results are shown on CIFAR-10 at a 50% pruning ratio.
  • ...and 1 more figure
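The variance comparisons in Figures 1(b) and 5 can be made concrete with a small worked example. The sketch assumes an estimator of the form $\hat{g} = \frac{1}{N}\sum_i z_i\, g_i / p_i$ with independent $z_i \sim \mathrm{Bernoulli}(p_i)$, whose total variance is $\frac{1}{N^2}\sum_i (1/p_i - 1)\,\|g_i\|^2$. The paper's exact estimator is not shown on this page, so this form, the function name, and the toy gradient norms are assumptions.

```python
def bernoulli_estimator_variance(norms, probs):
    """Total variance (trace of the covariance) of the reweighted Bernoulli
    estimator (1/N) * sum_i z_i * g_i / p_i with independent z_i.

    Each term uses Var(z_i / p_i) = 1 / p_i - 1.
    """
    n = len(norms)
    return sum((1.0 / p - 1.0) * g * g for g, p in zip(norms, probs)) / (n * n)


# Toy gradient norms: one dominant example and four small ones.
norms = [4.0, 1.0, 1.0, 1.0, 1.0]

# Both schemes keep two examples in expectation.
uniform = [0.4] * 5
norm_proportional = [1.0, 0.25, 0.25, 0.25, 0.25]

var_uniform = bernoulli_estimator_variance(norms, uniform)
var_proportional = bernoulli_estimator_variance(norms, norm_proportional)
print(var_uniform, var_proportional)
```

With the same expected subset size, always keeping the dominant example removes its contribution to the variance, which is the intuition behind sampling in proportion to gradient norm.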

Theorems & Definitions (4)

  • Proposition 1
  • proof
  • Proposition 2
  • proof