I2E: Real-Time Image-to-Event Conversion for High-Performance Spiking Neural Networks

Ruichen Ma, Liwei Meng, Guanchao Qiao, Ning Ning, Yang Liu, Shaogang Hu

TL;DR

I2E addresses a critical bottleneck in neuromorphic computing by converting static images into high-fidelity event streams in real time, enabling on-the-fly data augmentation for spiking neural networks (SNNs). The method achieves real-time performance via a three-stage pipeline: a parallelized intensity map, a spatio-temporal convolution that simulates microsaccades, and adaptive firing with a dynamic threshold. Empirically, I2E enables state-of-the-art training on synthetic event datasets (e.g., I2E-ImageNet at 60.50% and CIFAR-scale results) and establishes a sim-to-real paradigm: pre-training on synthetic data and fine-tuning on real CIFAR10-DVS reaches 92.5% accuracy. The work demonstrates that synthetic event data can serve as a high-fidelity proxy for real sensor data, offering a scalable foundation for developing high-performance, energy-efficient neuromorphic systems. The code and generated datasets are open source.

Abstract

Spiking neural networks (SNNs) promise highly energy-efficient computing, but their adoption is hindered by a critical scarcity of event-stream data. This work introduces I2E, an algorithmic framework that resolves this bottleneck by converting static images into high-fidelity event streams. By simulating microsaccadic eye movements with a highly parallelized convolution, I2E achieves a conversion speed over 300x faster than prior methods, uniquely enabling on-the-fly data augmentation for SNN training. The framework's effectiveness is demonstrated on large-scale benchmarks. An SNN trained on the generated I2E-ImageNet dataset achieves a state-of-the-art accuracy of 60.50%. Critically, this work establishes a powerful sim-to-real paradigm where pre-training on synthetic I2E data and fine-tuning on the real-world CIFAR10-DVS dataset yields an unprecedented accuracy of 92.5%. This result validates that synthetic event data can serve as a high-fidelity proxy for real sensor data, bridging a long-standing gap in neuromorphic engineering. By providing a scalable solution to the data problem, I2E offers a foundational toolkit for developing high-performance neuromorphic systems. The open-source algorithm and all generated datasets are provided to accelerate research in the field.

Paper Structure

This paper contains 42 sections, 18 equations, 9 figures, 9 tables, 1 algorithm.

Figures (9)

  • Figure 1: The I2E image-to-event conversion process. A single static RGB image is transformed into an eight-timestep event stream by simulating microsaccadic eye movements. The process effectively captures fine-grained details and salient object contours, producing a sparse data format well-suited for efficient, event-driven processing by SNNs.
  • Figure 2: The I2E algorithm simulates microsaccadic eye movements using a source image (point E) and its eight one-pixel-shifted versions (the other points), represented by a $3\times3$ grid. The intensity change $\Delta V$ is calculated by differencing pairs of these images. As shown by the arrows, these differences are classified into eight directional groups, each of which generates the data for one of the eight timesteps in the final event stream.
  • Figure 3: The subtraction of two shifted images is computationally equivalent to a 2D convolution with a sparse kernel.
  • Figure 4: Event rate statistics on ImageNet. $S_{th_0} = 0.12$ is selected to achieve a mean event rate of approximately 5%.
  • Figure 5: Trade-off between timesteps, accuracy, and data compression on ImageNet. Using more timesteps improves accuracy at the cost of a lower data compression ratio.
  • ...and 4 more figures
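
The equivalence stated in the Figure 3 caption can be checked numerically. The following is a minimal sketch, not the authors' implementation: the helper names (`shift`, `sparse_kernel`, `correlate3x3`), the zero-padding at the image border, and the example shift pair are all assumptions made for illustration. It uses the deep-learning convention in which "convolution" means cross-correlation (no kernel flip).

```python
import numpy as np

def shift(img, dy, dx):
    """Shift a 2D image by (dy, dx) pixels, dy and dx in {-1, 0, 1}.

    Border pixels are filled with zeros (an assumption; the paper may
    handle borders differently). Result: out[y, x] = img[y - dy, x - dx].
    """
    h, w = img.shape
    padded = np.pad(img, 1)  # one-pixel zero border on every side
    return padded[1 - dy : 1 - dy + h, 1 - dx : 1 - dx + w]

def sparse_kernel(a, b):
    """3x3 kernel with +1 for shift a and -1 for shift b (hypothetical helper)."""
    k = np.zeros((3, 3))
    k[1 - a[0], 1 - a[1]] += 1.0
    k[1 - b[0], 1 - b[1]] -= 1.0
    return k

def correlate3x3(img, k):
    """Same-size cross-correlation of img with a 3x3 kernel, zero padding."""
    h, w = img.shape
    padded = np.pad(img, 1)
    out = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            out += k[i, j] * padded[i : i + h, j : j + w]
    return out

# Example shift pair, chosen arbitrarily for the demonstration.
a, b = (0, 1), (1, 0)
img = np.random.default_rng(0).random((6, 7))

direct = shift(img, *a) - shift(img, *b)           # difference of shifted images
via_kernel = correlate3x3(img, sparse_kernel(a, b))  # single sparse-kernel pass

assert np.allclose(direct, via_kernel)
```

Because the kernel has only two non-zero taps, each directional difference costs one convolution pass, and differences for several directions can be stacked as output channels of a single convolution layer. This is one plausible reading of why the conversion parallelizes well.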