
Scalable Event-by-event Processing of Neuromorphic Sensory Signals With Deep State-Space Models

Mark Schöne, Neeraj Mohan Sushma, Jingyue Zhuge, Christian Mayr, Anand Subramoney, David Kappel

TL;DR

The paper addresses scalable, event-by-event processing of neuromorphic sensory data by introducing Event-SSM, a deep state-space modeling framework built on the S5 architecture that handles asynchronous, irregular event streams. By leveraging a linear-time-invariant formulation with a left-half-plane spectrum and a novel asynchronous discretization, the method achieves long-range dependency modeling and parallelizable computation, complemented by event-pooling to manage millions of events. Empirically, Event-SSM attains state-of-the-art results on Spiking Heidelberg Digits (SHD) and Spiking Speech Commands (SSC), and competitive performance on DVS128 Gestures, significantly advancing fully event-based processing with recurrent networks. This work demonstrates the practicality of scalable, fully event-based temporal modeling for neuromorphic benchmarks, with potential impact on real-time neuromorphic systems and energy-efficient AI.
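The TL;DR names two ingredients: a spectrum kept in the left half-plane and a discretization evaluated at irregular event times. Both can be illustrated for a diagonal state matrix. The sketch below is minimal and rests on two assumptions. First, it uses an S4D-style parameterization Re(λ) = -exp(p), which is an assumption, because the summary does not give Event-SSM's exact parameterization. Second, the function names are hypothetical. The sketch shows that every per-event transition factor exp(λΔ) has magnitude below 1 for any gap Δ > 0, however irregular the gaps are.

```python
import cmath
import math
from typing import List, Sequence


def left_half_plane_eigenvalues(log_neg_real: Sequence[float],
                                imag: Sequence[float]) -> List[complex]:
    # Assumed S4D-style parameterization: Re(lambda) = -exp(p) < 0 for every
    # real p, so gradient updates on p cannot move an eigenvalue into the
    # right half-plane (which would make the state blow up).
    return [complex(-math.exp(p), w) for p, w in zip(log_neg_real, imag)]


def transition_factors(eigs: Sequence[complex], delta: float) -> List[complex]:
    # Exact discretization of the input-free dynamics over a gap `delta`:
    # x(t + delta) = exp(lambda * delta) * x(t), one factor per diagonal mode.
    # `delta` may differ from event to event (asynchronous time steps).
    return [cmath.exp(lam * delta) for lam in eigs]


if __name__ == "__main__":
    eigs = left_half_plane_eigenvalues([-3.0, 0.0, 2.0], [0.0, 10.0, -5.0])
    # Irregular inter-event gaps, as an event sensor would produce them.
    for delta in (1e-6, 0.01, 1.0, 100.0):
        print(delta, [round(abs(f), 6) for f in transition_factors(eigs, delta)])
```

Since |exp(λΔ)| = exp(Re(λ)Δ), the constraint Re(λ) < 0 alone guarantees decay. The bound therefore holds whether consecutive events are microseconds or seconds apart.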

Abstract

Event-based sensors are well suited for real-time processing due to their fast response times and encoding of the sensory data as successive temporal differences. These and other valuable properties, such as a high dynamic range, are suppressed when the data is converted to a frame-based format. However, most current methods either collapse events into frames or cannot scale up when processing the event data directly event-by-event. In this work, we address the key challenges of scaling up event-by-event modeling of the long event streams emitted by such sensors, which is a particularly relevant problem for neuromorphic computing. While prior methods can process up to a few thousand time steps, our model, based on modern recurrent deep state-space models, scales to event streams of millions of events for both training and inference. We leverage their stable parameterization for learning long-range dependencies, parallelizability along the sequence dimension, and their ability to integrate asynchronous events effectively to scale them up to long event streams. We further augment these with novel event-centric techniques enabling our model to match or beat the state-of-the-art performance on several event stream benchmarks. In the Spiking Speech Commands task, we improve the state of the art by a large margin of 7.7 percentage points, to 88.4%. On the DVS128-Gestures dataset, we achieve competitive results without using frames or convolutional neural networks. Our work demonstrates, for the first time, that it is possible to use fully event-based processing with purely recurrent networks to achieve state-of-the-art task performance in several event-based benchmarks.
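The abstract credits the model's scalability partly to parallelizability along the sequence dimension. That property comes from the shape of the recurrence. For one diagonal mode, the state update x_i = a_i x_{i-1} + b_i is an affine map. Composing two affine maps is associative, so all prefix states can be computed with a parallel scan instead of a strictly sequential loop.

The sketch below is a minimal pure-Python version with hypothetical names. It assumes a_i = exp(λΔ_i) and treats b_i as the already-projected event input. S5-style code in JAX would typically call `jax.lax.associative_scan` instead. The sketch checks a Hillis-Steele scan against the plain sequential loop.

```python
import cmath
from typing import List, Tuple

# (a, b) represents the affine map x -> a * x + b for one diagonal mode.
Elem = Tuple[complex, complex]


def combine(first: Elem, second: Elem) -> Elem:
    # Compose "apply first, then second": a2 * (a1 * x + b1) + b2.
    a1, b1 = first
    a2, b2 = second
    return (a2 * a1, a2 * b1 + b2)


def hillis_steele_scan(elems: List[Elem]) -> List[Elem]:
    # Inclusive prefix scan in ceil(log2 n) rounds. Each round builds a new
    # list from the previous round only, so its updates are independent
    # and could run in parallel on an accelerator.
    out = list(elems)
    offset = 1
    while offset < len(out):
        out = [combine(out[i - offset], out[i]) if i >= offset else out[i]
               for i in range(len(out))]
        offset *= 2
    return out


def sequential(elems: List[Elem]) -> List[complex]:
    # Reference: step the recurrence one event at a time from x = 0.
    x, states = 0j, []
    for a, b in elems:
        x = a * x + b
        states.append(x)
    return states


if __name__ == "__main__":
    lam = complex(-0.5, 2.0)
    gaps = [0.0, 0.1, 0.7, 0.05, 1.3]        # irregular Delta_i
    inputs = [1.0, -0.5, 2.0, 0.3, -1.0]      # projected event inputs b_i
    elems = [(cmath.exp(lam * d), complex(u)) for d, u in zip(gaps, inputs)]
    # With x_{-1} = 0, the state after event i is the b-part of prefix i.
    parallel_states = [b for _, b in hillis_steele_scan(elems)]
    print(all(abs(p - s) < 1e-9
              for p, s in zip(parallel_states, sequential(elems))))
```

The scan performs more total work than the loop. However, its depth grows only logarithmically with the number of events, which is what matters for streams of hundreds of thousands to millions of events.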

Paper Structure (10 sections, 14 equations, 3 figures, 5 tables)

Figures (3)

  • Figure 1: In real time, our model evolves the state of a linear time-invariant state-space model in continuous time and integrates the delta-coded event stream along the way. The strength of our model stems from its duality with a linear time-variant recurrence relation discretized over the event times. This allows the simulation to leverage the associative scan primitive to parallelize the dynamical system over time.
  • Figure 2: Our modified simplified state-space layer consists of an SSM followed by a non-linear multiplicative transformation. A skip connection and a normalization layer complete the block. The information about event timings is passed to the model via the differences $\Delta_i=t_i-t_{i-1}$.
  • Figure 3: Distribution of the number of events per class in the DVS128-Gesture dataset. The median number of events per sample is about 300,000, and the maximum number of events per sample is about 1.5 million.
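Figure 1 describes a duality between two views of the same system. In one view, the LTI system evolves in continuous time. In the other, a time-variant recurrence is stepped at the event times, using the gaps Δ_i = t_i - t_{i-1} from Figure 2. This duality can be checked numerically.

The sketch below makes the following assumptions:

- It covers a single diagonal mode with eigenvalue `lam` and input weight `b`.
- Each event acts as a Dirac impulse that adds `b * u_i` to the state. This input model is an assumption for illustration, not necessarily the paper's exact discretization.
- All function names are hypothetical.

```python
import cmath
from typing import List, Sequence


def deltas(times: Sequence[float]) -> List[float]:
    # Delta_i = t_i - t_{i-1}; the first event gets 0 by convention here.
    return [0.0] + [t1 - t0 for t0, t1 in zip(times, times[1:])]


def recurrent_state(times, inputs, lam: complex, b: complex) -> complex:
    # Time-variant recurrence over event times: decay by exp(lam * Delta_i),
    # then add the impulse contribution of the current event.
    x = 0j
    for d, u in zip(deltas(times), inputs):
        x = cmath.exp(lam * d) * x + b * u
    return x


def continuous_state(times, inputs, lam: complex, b: complex,
                     t_query: float) -> complex:
    # Closed-form LTI solution with impulse inputs at any time t_query:
    # x(t) = sum over events t_i <= t of exp(lam * (t - t_i)) * b * u_i.
    return sum(cmath.exp(lam * (t_query - t)) * b * u
               for t, u in zip(times, inputs) if t <= t_query)


def readout_between_events(times, inputs, lam: complex, b: complex,
                           t_query: float) -> complex:
    # Run the recurrence up to the last event before t_query, then evolve
    # that state forward in continuous time with no further input.
    past = [(t, u) for t, u in zip(times, inputs) if t <= t_query]
    if not past:
        return 0j
    ts, us = zip(*past)
    return cmath.exp(lam * (t_query - ts[-1])) * recurrent_state(ts, us, lam, b)


if __name__ == "__main__":
    lam, b = complex(-0.8, 3.0), complex(0.5, -0.2)
    times = [0.0, 0.013, 0.02, 0.4, 1.1]
    inputs = [1.0, -2.0, 0.5, 1.5, -0.7]
    print(abs(recurrent_state(times, inputs, lam, b)
              - continuous_state(times, inputs, lam, b, times[-1])) < 1e-9)
    print(abs(readout_between_events(times, inputs, lam, b, 0.7)
              - continuous_state(times, inputs, lam, b, 0.7)) < 1e-9)
```

In this sketch, both paths give the same state. Training can therefore use the event-time recurrence, which can be parallelized with an associative scan, while the underlying state remains defined at every instant between events.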