Learning Long Sequences in Spiking Neural Networks
Matei Ioan Stan, Oliver Rhodes
TL;DR
This work brings state space models (SSMs) to spiking neural networks (SNNs) for long-range sequence modelling, targeting both efficiency and accuracy on neuromorphic hardware. The authors introduce a Binary SSM (Binary S4D) and the Gated Spiking Unit (GSU), which together enable efficient, addition-based feature mixing with non-differentiable spikes while mitigating vanishing gradients. Empirically, SSM-based SNNs outperform Transformers on the Long Range Arena and achieve state-of-the-art SNN performance on sequential MNIST. Binarisation leaves a gap to non-binarised baselines, which the GSU and non-saturating activations bridge. The findings suggest that saturating spiking activations are a key limitation for scaling SNNs to long sequences, and that non-binary, non-saturating forward operations can preserve energy efficiency while maintaining strong performance, paving the way for neuromorphic deployment of large-scale SSMs.
Abstract
Spiking neural networks (SNNs) take inspiration from the brain to enable energy-efficient computations. Since the advent of Transformers, SNNs have struggled to compete with artificial networks on modern sequential tasks, as they inherit limitations from recurrent neural networks (RNNs), with the added challenge of training with non-differentiable binary spiking activations. However, a recent renewed interest in efficient alternatives to Transformers has given rise to state-of-the-art recurrent architectures named state space models (SSMs). This work systematically investigates, for the first time, the intersection of state-of-the-art SSMs with SNNs for long-range sequence modelling. Results suggest that SSM-based SNNs can outperform the Transformer on all tasks of a well-established long-range sequence modelling benchmark. It is also shown that SSM-based SNNs can outperform current state-of-the-art SNNs with fewer parameters on sequential image classification. Finally, a novel feature mixing layer is introduced, improving SNN accuracy while challenging assumptions about the role of binary activations in SNNs. This work paves the way for deploying powerful SSM-based architectures, such as large language models, to neuromorphic hardware for energy-efficient long-range sequence modelling.
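
To illustrate why binary spiking activations are associated with energy-efficient computation, the minimal NumPy sketch below shows that mixing features from a binary spike vector reduces to summing selected weight columns, with no multiplications. The function names, the threshold, and the layer sizes are hypothetical choices made for illustration only; this is not the paper's Binary S4D or GSU implementation.

```python
import numpy as np


def heaviside_spikes(pre_activation, threshold=0.0):
    """Binarise pre-activations: 1.0 where above the threshold, else 0.0.

    The threshold value is an assumption made for this sketch.
    """
    return (pre_activation > threshold).astype(np.float32)


def mix_by_addition(weights, spikes):
    """Feature mixing for binary inputs using additions only.

    weights: array of shape (out_features, in_features)
    spikes:  array of shape (in_features,) with entries in {0, 1}

    Sums the weight columns selected by the active spikes, which equals
    the dense product weights @ spikes without any multiplications.
    """
    active = np.flatnonzero(spikes)
    return weights[:, active].sum(axis=1)


# Hypothetical sizes: 8 input features mixed into 4 output features.
rng = np.random.default_rng(0)
weights = rng.standard_normal((4, 8)).astype(np.float32)
pre_activation = rng.standard_normal(8).astype(np.float32)

spikes = heaviside_spikes(pre_activation)
mixed = mix_by_addition(weights, spikes)  # accumulate-only path
dense = weights @ spikes                  # reference multiply-accumulate path
```

Both paths give the same result up to floating-point rounding. With real-valued (non-binary) activations this shortcut does not apply directly, since each weight would have to be scaled rather than simply accumulated.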
