SpikePoint: An Efficient Point-based Spiking Neural Network for Event Cameras Action Recognition

Hongwei Ren, Yue Zhou, Yulong Huang, Haotian Fu, Xiaopeng Lin, Jie Song, Bojun Cheng

TL;DR

SpikePoint tackles event-based action recognition by processing sparse event clouds directly with a single-stage, point-based Spiking Neural Network (SNN). It introduces an event-cloud representation built with FPS/KNN sampling (farthest point sampling and k-nearest neighbors) and rate-encoded 3D point coordinates, combined with local and global feature extractors and a residual-inspired SNN design trained with surrogate gradients. The method achieves state-of-the-art performance on multiple event-based datasets while using a small fraction of the parameters and power of conventional ANNs, and it does so with only 16 timesteps. This end-to-end, sparsity-preserving approach paves the way for ultra-low-power neuromorphic processing in action recognition and potentially in other event-based tasks such as SLAM and multimodal sensing.

Abstract

Event cameras are bio-inspired sensors that respond to local changes in light intensity and feature low latency, high energy efficiency, and high dynamic range. Meanwhile, Spiking Neural Networks (SNNs) have gained significant attention due to their remarkable efficiency and fault tolerance. Combining the energy efficiency of event cameras with the spike-based processing of SNNs could enable ultra-low-power application scenarios, such as action recognition. However, existing approaches often convert asynchronous events into conventional frames, which requires additional data mapping and loses sparsity, contradicting the design concept of both SNNs and event cameras. To address this challenge, we propose SpikePoint, a novel end-to-end point-based SNN architecture. SpikePoint excels at processing sparse event cloud data, effectively extracting both global and local features through a single-stage structure. Trained with the surrogate gradient method, SpikePoint achieves high accuracy with few parameters and maintains low power consumption, employing the same identity-mapping feature extractor across diverse datasets. SpikePoint achieves state-of-the-art (SOTA) performance on four event-based action recognition datasets using only 16 timesteps, surpassing other SNN methods. Moreover, it achieves SOTA performance across all methods on three datasets while using approximately 0.3% of the parameters and 0.5% of the power consumption of artificial neural networks (ANNs). These results emphasize the significance of the point cloud representation and pave the way for many ultra-low-power event-based data processing applications.
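To make the "surrogate training" idea in the abstract concrete, the following is a minimal NumPy sketch of a leaky integrate-and-fire (LIF) layer whose non-differentiable spike function is paired with a smooth stand-in for its derivative. It is an illustration under stated assumptions, not SpikePoint's implementation: the sigmoid surrogate, the hard reset, and the names `lif_forward`, `tau`, `v_threshold`, `v_reset`, and `alpha` are all choices made here and are not taken from the paper.

```python
import numpy as np

def heaviside(v):
    """Forward spike function: 1 where v >= 0, else 0."""
    return (v >= 0).astype(np.float64)

def sigmoid_surrogate_grad(v, alpha=4.0):
    """Derivative of sigmoid(alpha * v), used in place of the Heaviside
    derivative during backpropagation (an assumed surrogate shape)."""
    s = 1.0 / (1.0 + np.exp(-alpha * v))
    return alpha * s * (1.0 - s)

def lif_forward(inputs, tau=2.0, v_threshold=1.0, v_reset=0.0):
    """Run N LIF neurons over T timesteps.

    inputs: array of shape (T, N) holding the input current per step.
    Returns (spikes, grads), both of shape (T, N); grads holds the
    surrogate derivative evaluated at each step's membrane potential.
    """
    T, N = inputs.shape
    v = np.full(N, v_reset, dtype=np.float64)
    spikes = np.zeros((T, N))
    grads = np.zeros((T, N))
    for t in range(T):
        # Leaky integration: decay toward v_reset while charging from the input.
        v = v + (inputs[t] - (v - v_reset)) / tau
        spikes[t] = heaviside(v - v_threshold)
        grads[t] = sigmoid_surrogate_grad(v - v_threshold)
        # Hard reset for neurons that fired on this step.
        v = np.where(spikes[t] > 0, v_reset, v)
    return spikes, grads

# 16 timesteps, matching the timestep count quoted in the abstract.
spikes, grads = lif_forward(np.full((16, 3), 1.5))
```

In a training framework, the forward pass would emit `spikes` while the backward pass would multiply upstream gradients by `grads`, which is what lets gradient descent pass through the thresholding step.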

Paper Structure (35 sections, 27 equations, 8 figures, 11 tables, 1 algorithm)


Figures (8)

  • Figure 1: The overall architecture of SpikePoint. The raw event cloud is segmented by a sliding window. The global point cloud is then transformed into $M$ groups by grouping and sampling. The coordinates are converted into spikes by rate coding, and the action recognition results are obtained by passing them through the local feature extractor, the global feature extractor, and the classifier in turn.
  • Figure 2: Visualization of our grouping method. (a) The different spatial positions of $[x_c,y_c,z_c]$, $[x_{min}, y_{min}, z_{min}]$ and $[x,y,z]$. (b) The transformation of the distribution after taking the absolute value.
  • Figure 3: $ResF$ ablation experiment (a) and its results (b) on the DVS Action dataset.
  • Figure 4: Structural ablation experiment (a) and its results (b) on the DVS Action dataset.
  • Figure 5: Visualization of the four datasets. (a) DVS Action. (b) Daily DVS. (c) IBM Gesture. (d) HMDB51-DVS.
  • ...and 3 more figures
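
The pipeline described in the Figure 1 caption (sample centroids, group neighbors, rate-code coordinates) can be sketched in NumPy as follows. This is a rough illustration under assumptions rather than the authors' code: the synthetic events, the group sizes `m=32` and `k=16`, the Bernoulli form of rate coding, and the helper names `farthest_point_sampling`, `knn_group`, and `rate_encode` are all hypothetical. Using the absolute offset from each group centroid is only loosely modelled on the Figure 2 caption; the paper's exact normalization is not reproduced here.

```python
import numpy as np

def farthest_point_sampling(points, m, start=0):
    """Pick m well-spread indices: each new index is the point farthest
    from everything chosen so far."""
    n = points.shape[0]
    chosen = np.zeros(m, dtype=np.int64)
    chosen[0] = start
    min_dist = np.full(n, np.inf)
    for i in range(1, m):
        # Squared distance from every point to the most recently chosen one.
        d = np.sum((points - points[chosen[i - 1]]) ** 2, axis=1)
        min_dist = np.minimum(min_dist, d)
        chosen[i] = int(np.argmax(min_dist))
    return chosen

def knn_group(points, centroid_idx, k):
    """Return, for each centroid, the indices of its k nearest points."""
    diff = points[centroid_idx][:, None, :] - points[None, :, :]
    dist = np.sum(diff ** 2, axis=2)            # shape (M, N)
    return np.argsort(dist, axis=1)[:, :k]      # shape (M, k)

def rate_encode(values, timesteps, rng):
    """Bernoulli rate coding: each value in [0, 1] is treated as the
    probability of emitting a spike at every timestep."""
    draws = rng.random((timesteps,) + values.shape)
    return (draws < values).astype(np.float32)

rng = np.random.default_rng(0)
events = rng.random((1024, 3))                  # synthetic [x, y, z] events in [0, 1)

centroid_idx = farthest_point_sampling(events, m=32)
groups = knn_group(events, centroid_idx, k=16)  # (32, 16) point indices

# Absolute offset of each grouped point from its centroid, rescaled to [0, 1].
offsets = np.abs(events[groups] - events[centroid_idx][:, None, :])
scaled = offsets / (offsets.max() + 1e-9)

spikes = rate_encode(scaled, timesteps=16, rng=rng)  # (16, 32, 16, 3)
```

The resulting `spikes` tensor has one binary frame per timestep for every coordinate of every grouped point, which is the kind of input a spiking feature extractor would consume.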