Learning from Dense Events: Towards Fast Spiking Neural Networks Training via Event Dataset Distillation

Shuhan Ye, Yi Yu, Qixin Zhang, Chenqi Kong, Qiangqiang Wu, Kun Wang, Xudong Jiang

TL;DR

This work tackles the costly training of spiking neural networks (SNNs) on temporally dense event streams by introducing PACE, a dataset distillation framework tailored for event data. PACE comprises two modules: Spatial-Temporal Densified Spike Matching (ST-DSM), which densifies and aligns spike patterns in space and time, and PEQ-N, a straight-through probabilistic event quantizer that preserves gradient flow while producing integer event frames. Across DVS-Gesture, CIFAR10-DVS, and N-MNIST, PACE outperforms existing coreset and distillation baselines, with particularly large gains on dynamic streams and at low to moderate images per class (IPC); for example, on N-MNIST with IPC=$1$, it achieves $84.4\%$ accuracy, about $85\%$ of the full training performance, while reducing training time by $>50\times$ and storage by $>6000\times$. The distilled surrogates transfer to other SNN backbones and enable minute-scale training, supporting efficient edge deployment and indicating a practical path toward scalable neuromorphic vision systems.
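
A quick sanity check on the N-MNIST figures, under two assumptions that are not stated above: that the N-MNIST training split holds 60,000 samples over 10 classes (as in the MNIST set it is derived from), and that storage scales with the number of stored samples. With IPC=$1$ the distilled set keeps one sample per class, so:

\[
\frac{60{,}000}{10 \times 1} = 6000, \qquad \frac{84.4\%}{0.85} \approx 99.3\%
\]

The first ratio is in line with the reported storage reduction. The second is the full-dataset accuracy implied by the "about $85\%$" figure; it is a back-of-the-envelope estimate, not a number reported in the summary.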

Abstract

Event cameras sense brightness changes and output binary asynchronous event streams, and they are attracting increasing attention. Their bio-inspired dynamics align well with spiking neural networks (SNNs), offering a promising energy-efficient alternative to conventional vision systems. However, SNNs remain costly to train due to temporal coding, which limits their practical deployment. To alleviate the high training cost of SNNs, we introduce PACE (Phase-Aligned Condensation for Events), the first dataset distillation framework for SNNs and event-based vision. PACE distills a large training dataset into a compact synthetic one that enables fast SNN training, which is achieved by two core modules: ST-DSM and PEQ-N. ST-DSM uses residual membrane potentials to densify spike-based features (SDR) and to perform fine-grained spatiotemporal matching of amplitude and phase (ST-SM), while PEQ-N provides a plug-and-play straight-through probabilistic integer quantizer compatible with standard event-frame pipelines. Across the DVS-Gesture, CIFAR10-DVS, and N-MNIST datasets, PACE outperforms existing coreset selection and dataset distillation baselines, with particularly strong gains on dynamic event streams and at low or moderate IPC. Specifically, on N-MNIST, it achieves \(84.4\%\) accuracy, about \(85\%\) of the full training set performance, while reducing training time by more than \(50\times\) and storage cost by \(6000\times\), yielding compact surrogates that enable minute-scale SNN training and efficient edge deployment.
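
To make the idea of a straight-through probabilistic integer quantizer more concrete, here is a minimal sketch in plain Python. It is not the paper's PEQ-N implementation: the abstract does not give the formula, so the sketch assumes stochastic rounding (round up with probability equal to the fractional part) and an identity surrogate gradient inside the clip range. The names `stochastic_round`, `quantize_frame`, `straight_through_grad`, and the cap `max_count` are hypothetical, and the abstract does not say what the "N" in PEQ-N denotes.

```python
import math
import random


def stochastic_round(value, max_count, rng):
    """Round a float to an integer event count in [0, max_count].

    Assumed scheme: round up with probability equal to the fractional
    part, so the expected output equals the clipped input.
    """
    clipped = min(max(value, 0.0), float(max_count))
    lower = math.floor(clipped)
    frac = clipped - lower
    return int(lower) + (1 if rng.random() < frac else 0)


def quantize_frame(frame, max_count, seed=0):
    """Quantize a 2D list of floats (one event frame) to integer counts."""
    rng = random.Random(seed)
    return [[stochastic_round(v, max_count, rng) for v in row] for row in frame]


def straight_through_grad(frame, max_count):
    """Surrogate gradient of the quantizer with respect to its input.

    Straight-through assumption: treat rounding as the identity, so the
    gradient is 1 inside the clip range and 0 where clipping is active.
    """
    return [[1.0 if 0.0 <= v <= max_count else 0.0 for v in row] for row in frame]


if __name__ == "__main__":
    synthetic = [[0.0, 1.3, 2.0], [3.6, 5.7, -0.4]]
    print(quantize_frame(synthetic, max_count=4))
    print(straight_through_grad(synthetic, max_count=4))
```

In an autodiff framework the same effect is usually obtained by adding the detached rounding residual to the input, so the forward pass sees integers while the backward pass sees the identity. The sketch separates the two passes only to stay dependency-free.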

Paper Structure

This paper contains 15 sections, 16 equations, 2 figures, 6 tables.

Figures (2)

  • Figure 1: The framework of our PACE (Phase-Aligned Condensation for Events). Floating-point synthetic data is quantized into integer-driven events via our PEQ-N module. Then, both real and synthetic event streams are fed into unified SNN teacher and student models. The core of our approach is the Spatial-Temporal Densified Spike Matching module, which trains the synthetic data by forcing its resulting spike patterns to closely mimic those generated by the real data across both space and time.
  • Figure 2: Visualization of original real data (top), distilled binary event data (middle), and distilled integer event data (bottom) for (a) DVS-Gesture "right hand wave", (b) N-MNIST "0", and (c) CIFAR10-DVS "airplane". Each subfigure shows representative voxelized event maps: red denotes positive (ON) events; blue denotes negative (OFF) events; black marks pixels where both polarities occur within the same bin.