P-SpikeSSM: Harnessing Probabilistic Spiking State Space Models for Long-Range Dependency Tasks
Malyaban Bal, Abhronil Sengupta
TL;DR
This work tackles long-range dependency tasks with spiking neural networks by introducing P-SpikeSSM, a probabilistic spiking state-space model that uses an $n$-dimensional hidden state and stochastic spike generation. The architecture couples a SpikeSampler layer for parallel spike generation with SpikeMixer and ClampFuse layers, which provide inter-neuron communication and residual-like aggregation and allow the SNN to scale in depth. In experiments on the Long Range Arena benchmark, psMNIST, and the Speech Command dataset, the approach attains state-of-the-art performance among SNNs while exhibiting sparse spiking; a detailed energy analysis supports substantial reductions in computation compared with non-spiking baselines. Overall, P-SpikeSSM demonstrates that probabilistic spiking dynamics can deliver competitive accuracy on long-context tasks with hardware-friendly parallelism, paving the way for deployment on neuromorphic hardware and edge devices.
Abstract
Spiking neural networks (SNNs) are posited as a computationally efficient and biologically plausible alternative to conventional neural architectures, with their core computational framework primarily using the leaky integrate-and-fire (LIF) neuron model. However, the limited hidden state representation of LIF neurons, characterized by a scalar membrane potential, and their sequential spike generation process pose challenges for developing scalable spiking models that address long-range dependencies in sequence learning tasks. In this study, we develop a scalable probabilistic spiking learning framework for long-range dependency tasks, leveraging the fundamentals of state space models (SSMs). Unlike LIF neurons, which rely on the deterministic Heaviside function and generate spikes sequentially, we introduce a SpikeSampler layer that samples spikes stochastically based on an SSM-based neuronal model while allowing parallel computation. To address the non-differentiability of the spiking operation and enable effective training, we also propose a surrogate function tailored to the stochastic nature of the SpikeSampler layer. To enhance inter-neuron communication, we introduce the SpikeMixer block, which integrates spikes from neuron populations in each layer. This is followed by a ClampFuse layer, which incorporates a residual connection to capture complex dependencies and enables the model to scale. Our models attain state-of-the-art performance among SNN models across diverse long-range dependency tasks, encompassing the Long Range Arena benchmark, permuted sequential MNIST, and the Speech Command dataset, and exhibit sparse spiking patterns that highlight their computational efficiency.
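To make the idea of parallel stochastic spike generation concrete, the following is a minimal sketch and not the authors' implementation. It assumes a diagonal linear SSM with an $n$-dimensional hidden state, a zero initial state, and a sigmoid mapping from the SSM output to a firing probability. The function names, the parameters `a`, `b`, `c`, and the sigmoid are all illustrative assumptions; the abstract does not specify the actual SpikeSampler equations or the surrogate function, so neither is reproduced here.

```python
import math
import random


def ssm_outputs_recurrent(u, a, b, c):
    """Step-by-step form: x_t = a * x_{t-1} + b * u_t, y_t = c . x_t (x starts at 0)."""
    n = len(a)
    x = [0.0] * n
    ys = []
    for u_t in u:
        x = [a[i] * x[i] + b[i] * u_t for i in range(n)]
        ys.append(sum(c[i] * x[i] for i in range(n)))
    return ys


def ssm_outputs_convolution(u, a, b, c):
    """Equivalent form using the kernel K_k = sum_i c_i * a_i**k * b_i.

    Each y_t depends only on the inputs, not on earlier outputs,
    so all time steps could be computed in parallel.
    """
    n, T = len(a), len(u)
    kernel = [sum(c[i] * (a[i] ** k) * b[i] for i in range(n)) for k in range(T)]
    return [sum(kernel[k] * u[t - k] for k in range(t + 1)) for t in range(T)]


def sample_spikes(ys, rng):
    """Assumed sampler: sigmoid firing probability, then one Bernoulli draw per step."""
    probs = [1.0 / (1.0 + math.exp(-y)) for y in ys]
    spikes = [1 if rng.random() < p else 0 for p in probs]
    return probs, spikes


# Hypothetical toy input: a binary spike train and a 3-dimensional hidden state.
u = [1, 0, 0, 1, 1, 0, 1, 0]
a = [0.9, 0.5, -0.3]   # per-dimension state decay (assumed values)
b = [1.0, 0.5, 0.2]    # input weights (assumed values)
c = [0.4, -0.6, 1.0]   # readout weights (assumed values)

y_rec = ssm_outputs_recurrent(u, a, b, c)
y_conv = ssm_outputs_convolution(u, a, b, c)
probs, spikes = sample_spikes(y_conv, random.Random(0))
print(spikes)
```

The two output functions agree up to floating-point error. This illustrates why a linear SSM neuron, unlike an LIF neuron whose membrane potential is reset after each spike, does not need to wait for the previous time step's spike before computing the next firing probability.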
