Frequency-aware Event Cloud Network

Hongwei Ren, Fei Ma, Xiaopeng Lin, Yuetong Fang, Hongxiang Huang, Yulong Huang, Yue Zhou, Haotian Fu, Ziyi Yang, Fei Richard Yu, Bojun Cheng

TL;DR

Event cameras produce asynchronous 2S-1T-1P events (two spatial coordinates, one timestamp, one polarity), but conventional frame and voxel pipelines incur costly transformations and lose fine-grained temporal detail, while Point Cloud methods ignore polarity and struggle with long-term features. This work introduces FECNet, a frequency-aware backbone that operates directly on the 2S-1T-1P Event Cloud. It combines an Event-based Group and Sampling (G&S) module (D-FPS, EF-KNN, CES) with Spatial-FA and Temporal-FA modules that perform FFT-based feature extraction with a learnable filter $V$. These components reduce Multiply Accumulate Operations (MACs) while capturing global spatial-temporal context, which makes long event sequences efficient to process. Experiments on nine datasets covering object classification, action recognition, and human pose estimation show state-of-the-art or competitive accuracy at substantially lower computational cost and with real-time throughput, suggesting FECNet as a practical backbone for edge-oriented event-based vision.

Abstract

Event cameras are biologically inspired sensors that emit events asynchronously with remarkable temporal resolution, garnering significant attention from both industry and academia. Mainstream methods favor frame and voxel representations, which reach satisfactory performance but introduce time-consuming transformations, require bulky models, and sacrifice fine-grained temporal information. Alternatively, the Point Cloud representation shows promise in addressing these weaknesses, but it ignores polarity information, and its models have limited ability to abstract features from long-term events. In this paper, we propose a frequency-aware network named FECNet that leverages Event Cloud representations. FECNet fully utilizes the 2S-1T-1P Event Cloud through a new event-based Group and Sampling module. To accommodate the long event sequences in the Event Cloud, FECNet performs feature extraction in the frequency domain via the Fourier transform. This approach substantially curbs the explosion of Multiply Accumulate Operations (MACs) while effectively abstracting spatial-temporal features. We conducted extensive experiments on event-based object classification, action recognition, and human pose estimation tasks, and the results substantiate the effectiveness and efficiency of FECNet.
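
The frequency-domain idea described above can be illustrated with a minimal sketch. This is not FECNet's actual Spatial-FA or Temporal-FA module; it only assumes the general pattern the summary names (Fourier transform along the event axis, elementwise multiplication by a filter, inverse transform). The function name `frequency_filter`, the array shapes, and the two hand-built filters are hypothetical stand-ins for a filter that would be learned during training.

```python
import numpy as np


def frequency_filter(features, filt):
    """Mix features along the event axis by filtering in the frequency domain.

    features: real array of shape (n_events, channels).
    filt:     complex array of shape (n_events // 2 + 1, channels); a stand-in
              for a learnable filter (hypothetical, fixed by hand here).
    """
    n_events = features.shape[0]
    # Real-input FFT along the event axis: (n_events // 2 + 1, channels).
    spectrum = np.fft.rfft(features, axis=0)
    # Elementwise product in frequency space equals a circular convolution
    # over all events, so every output row sees the whole sequence.
    return np.fft.irfft(spectrum * filt, n=n_events, axis=0)


rng = np.random.default_rng(0)
n_events, channels = 1024, 32
features = rng.standard_normal((n_events, channels))
n_freq = n_events // 2 + 1

# An all-ones filter leaves the features unchanged.
identity = np.ones((n_freq, channels), dtype=np.complex128)
same = frequency_filter(features, identity)

# Keeping only the 16 lowest frequencies smooths the sequence.
low_pass = np.zeros((n_freq, channels), dtype=np.complex128)
low_pass[:16] = 1.0
smooth = frequency_filter(features, low_pass)
```

For a sequence of N events, the FFT costs on the order of N log N operations per channel, whereas mixing every pair of events directly costs on the order of N². This is one plausible reading of why frequency-domain extraction keeps MACs manageable for long event sequences.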

Paper Structure

This paper contains 21 sections, 11 equations, 4 figures, 8 tables, 1 algorithm.

Figures (4)

  • Figure 1: Gain of FECNet on DVS128 Gesture. The horizontal axis shows the natural logarithm of the MACs, the vertical axis shows accuracy, the marker shapes denote the representations, and the number beside each marker is the throughput (FPS).
  • Figure 2: Visualization of different representations of the yin-yang sample from the N-Caltech101 dataset.
  • Figure 3: FECNet's architecture. It handles three tasks by passing the Event Cloud through a sequence of distinct modules: embedding, hierarchy structure, and prediction head. The hierarchy structure contains five modules: the Event-based G&S module captures local neighborhood relationships, Spatial-FA abstracts explicit spatial and implicit temporal features, AGG aggregates the features within a group, Temporal-FA captures global explicit temporal features, and RES handles feature abstraction and high-level representation. The final results are obtained from the max-pooled and average-pooled features.
  • Figure 4: Visualization of the Event Cloud's coordinates and feature dimensions through the hierarchy structure. The closer to the end of the loop, the fewer the events and the larger the feature dimension.