Frequency-aware Event Cloud Network
Hongwei Ren, Fei Ma, Xiaopeng Lin, Yuetong Fang, Hongxiang Huang, Yulong Huang, Yue Zhou, Haotian Fu, Ziyi Yang, Fei Richard Yu, Bojun Cheng
TL;DR
Event cameras produce asynchronous 2S-1T-1P events, but conventional frame and voxel pipelines incur costly transformations and lose fine-grained temporal detail, while Point Cloud methods ignore polarity and struggle with long-term features. This work introduces FECNet, a frequency-aware backbone that operates directly on the 2S-1T-1P Event Cloud. It combines a novel event-based Group and Sampling (G&S) module (D-FPS, EF-KNN, CES) with Spatial-FA and Temporal-FA modules that perform FFT-based feature extraction with a learnable filter $V$. These components substantially reduce MACs while capturing global spatial-temporal context, enabling efficient processing of long event sequences. Experiments on nine datasets covering object classification, action recognition, and human pose estimation show state-of-the-art or competitive accuracy at substantially lower computational cost and with real-time throughput, suggesting FECNet as a practical backbone for edge-oriented event-based vision.
Abstract
Event cameras are biologically inspired sensors that emit events asynchronously with remarkable temporal resolution, garnering significant attention from both industry and academia. Mainstream methods favor frame and voxel representations, which achieve satisfactory performance but require time-consuming transformations, yield bulky models, and sacrifice fine-grained temporal information. Alternatively, the Point Cloud representation shows promise in addressing these weaknesses, but it ignores polarity information, and its models have limited ability to abstract features from long event sequences. In this paper, we propose a frequency-aware network named FECNet that leverages the Event Cloud representation. FECNet fully utilizes the 2S-1T-1P Event Cloud through a newly designed event-based Group and Sampling module. To accommodate the long event sequences of the Event Cloud, FECNet performs feature extraction in the frequency domain via the Fourier transform. This approach substantially curbs the growth in Multiply-Accumulate operations (MACs) while effectively abstracting spatial-temporal features. We conducted extensive experiments on event-based object classification, action recognition, and human pose estimation, and the results substantiate the effectiveness and efficiency of FECNet.
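To make the idea of frequency-domain feature extraction concrete, the following is a minimal NumPy sketch of a generic global filter: features are transformed with an FFT along the event axis, multiplied element-wise by a complex filter, and transformed back. It is an illustration under stated assumptions, not the FECNet implementation. The function name `frequency_filter`, the array shapes, and the use of a real-input FFT are hypothetical choices made here; only the general pattern (Fourier transform plus a learnable filter, called $V$ above) comes from the text.

```python
import numpy as np


def frequency_filter(x: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Mix features globally along the event axis in the frequency domain.

    x: real array of shape (N, C), one C-dimensional feature per event.
    V: complex array of shape (N // 2 + 1, C), standing in for a learnable
       filter (hypothetical shape; in a trained model V would be a parameter).
    """
    n_events = x.shape[0]
    X = np.fft.rfft(x, axis=0)   # real FFT over the N events: (N // 2 + 1, C)
    Y = X * V                    # element-wise filtering per frequency and channel
    return np.fft.irfft(Y, n=n_events, axis=0)  # back to (N, C)


rng = np.random.default_rng(0)
N, C = 1024, 8
x = rng.standard_normal((N, C))

# An all-ones filter leaves the features unchanged.
V_identity = np.ones((N // 2 + 1, C), dtype=np.complex128)
y_identity = frequency_filter(x, V_identity)

# Keeping only the zero-frequency bin replaces every event's feature
# with the per-channel mean over all events.
V_dc = np.zeros((N // 2 + 1, C), dtype=np.complex128)
V_dc[0] = 1.0
y_dc = frequency_filter(x, V_dc)
```

Because every output position depends on every input position through the transform, a single filter multiplication provides global context at O(N log N) cost per channel, whereas explicit pairwise mixing of N events scales as O(N^2). This is one plausible reading of why frequency-domain processing helps with long event sequences.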
