Event2Vec: Processing Neuromorphic Events Directly by Representations in Vector Space

Wei Fang, Priyadarshini Panda

TL;DR

The paper tackles the challenge of applying dense deep learning to asynchronous, sparse neuromorphic events from AER sensors. It introduces event2vec, a vector-space embedding that decomposes each event into a parametric spatial component and a learned temporal component, yielding $\mathbf{v}=\mathbf{v}_s+\mathbf{v}_t$. Through experiments on DVS Gesture, ASL-DVS, and DVS-Lip, it demonstrates competitive or state-of-the-art accuracy with markedly fewer parameters, while delivering high throughput and low latency and preserving sparsity and temporal fidelity. The approach bridges sparse neuromorphic data with Transformer-based architectures and points toward future directions such as edge-device deployment and event-language modeling of streams.
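To make the decomposition $\mathbf{v}=\mathbf{v}_s+\mathbf{v}_t$ concrete, here is a minimal sketch of turning a short event stream into one vector per event. It is not the authors' implementation: the embedding width `DIM`, the sensor size `WIDTH` x `HEIGHT`, the linear maps `W_s` and `W_t`, the `(x, y, t, p)` tuple layout, and the function name `event2vec_sketch` are all assumptions made for illustration, and the randomly initialized weights stand in for parameters that would be trained.

```python
import random

DIM = 8                    # embedding width (hypothetical)
WIDTH, HEIGHT = 128, 128   # sensor resolution (assumed for this sketch)

rng = random.Random(0)


def init_linear(n_out, n_in):
    """Random weights standing in for trained parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def linear(weights, feats):
    """Plain matrix-vector product: one output value per weight row."""
    return [sum(w * f for w, f in zip(row, feats)) for row in weights]


# Assumed stand-ins for the spatial and temporal components.
W_s = init_linear(DIM, 4)  # acts on (x, y, polarity, bias)
W_t = init_linear(DIM, 2)  # acts on (timestamp, bias)


def spatial_component(x, y, p):
    """v_s: a function of the pixel coordinates and polarity."""
    return linear(W_s, [x / WIDTH, y / HEIGHT, float(p), 1.0])


def temporal_component(t, t_max):
    """v_t: a function of the normalized timestamp."""
    return linear(W_t, [t / t_max, 1.0])


def event2vec_sketch(event, t_max):
    """Embed one event (x, y, t, p) as v = v_s + v_t."""
    x, y, t, p = event
    v_s = spatial_component(x, y, p)
    v_t = temporal_component(t, t_max)
    return [a + b for a, b in zip(v_s, v_t)]


# Three events: (x, y, timestamp, polarity).
events = [(10, 20, 0, 1), (11, 20, 150, 0), (64, 64, 900, 1)]
tokens = [event2vec_sketch(e, t_max=1000) for e in events]
```

Under these assumptions, `tokens` holds one `DIM`-dimensional vector per event, so the sequence length equals the number of events and no frame conversion is involved. A sequence of this shape is what a Transformer would consume as input tokens.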

Abstract

Neuromorphic event cameras possess superior temporal resolution, power efficiency, and dynamic range compared to traditional cameras. However, their asynchronous and sparse data format poses a significant challenge for conventional deep learning methods. Existing methods either convert the events into dense synchronous frame representations for processing by powerful CNNs or Transformers, but lose the asynchronous, sparse and high temporal resolution characteristics of events during the conversion process; or adopt irregular models such as sparse convolution, spiking neural networks, or graph neural networks to process the irregular event representations but fail to take full advantage of GPU acceleration. Inspired by word-to-vector models, we draw an analogy between words and events to introduce event2vec, a novel representation that allows neural networks to process events directly. This approach is fully compatible with the parallel processing capabilities of Transformers. We demonstrate the effectiveness of event2vec on the DVS Gesture, ASL-DVS, and DVS-Lip benchmarks, showing that event2vec is remarkably parameter-efficient, features high throughput and low latency, and achieves high accuracy even with an extremely low number of events or low spatial resolutions. Event2vec introduces a novel paradigm by demonstrating for the first time that sparse, irregular event data can be directly integrated into high-throughput Transformer architectures. This breakthrough resolves the long-standing conflict between maintaining data sparsity and maximizing GPU efficiency, offering a promising balance for real-time, low-latency neuromorphic vision tasks. The code is provided in https://github.com/Intelligent-Computing-Lab-Panda/event2vec.

Paper Structure

This paper contains 26 sections, 8 equations, 9 figures, 8 tables.

Figures (9)

  • Figure 1: Conceptual analogy between words and events.
  • Figure 2: Accuracy vs. number of events compared to sampling techniques from [11147953] on the DVS Gesture dataset.
  • Figure 3: Accuracy vs. spatial resolution: Comparison with the SOTA Max-Former [fang2025spiking] on DVS Gesture.
  • Figure 4: Visual comparison of the learned spatial embeddings.
  • Figure 5: Event-level attention maps on samples from DVS Gesture (Row 1), ASL-DVS (Row 2), and DVS-Lip (Row 3).
  • ...and 4 more figures