Fast Window-Based Event Denoising with Spatiotemporal Correlation Enhancement

Huachen Fang, Jinjian Wu, Qibin Hou, Weisheng Dong, Guangming Shi

TL;DR

This work tackles the noise-prone outputs of event-based cameras by introducing a window-based denoising framework that processes stacks of events, enabling real-time performance with high interpretability. It combines a probabilistic temporal analysis (Temporal Window) and a learned spatial prior (Soft Spatial Feature Embedding) within a multi-scale architecture (WedNet) that employs hierarchical spatial feature learning and a bone-event check to preserve object structure. The method formulates denoising as a MAP problem and solves the spatial component via learned convolutional sparse coding, achieving robust denoising across multiple datasets (DVSCLEAN, DVSNOISE20, ED-KoGTL) and significantly faster runtimes than existing DL-based approaches. The practical impact lies in reliable, fast event denoising that improves downstream vision tasks in dynamic, noisy environments, enabling real-time neuromorphic perception systems.
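
As a reading aid, the link between a MAP formulation and convolutional sparse coding can be written in its generic textbook form. This is a sketch under assumed notation, not the paper's own equations: $y$ (observed noisy event representation), $x$ (clean signal), $d_k$ (convolutional filters), $z_k$ (sparse feature maps), and $\lambda$ (sparsity weight) are all symbols introduced here for illustration.

```latex
% Generic MAP estimate: maximize the posterior of the clean signal x given y.
\hat{x} = \arg\max_{x} \; p(x \mid y)
        = \arg\max_{x} \; \bigl[ \log p(y \mid x) + \log p(x) \bigr]

% With a Gaussian likelihood and a Laplacian (sparsity) prior on the codes z_k,
% and x = \sum_k d_k * z_k, the MAP problem becomes convolutional sparse coding:
\min_{\{z_k\}} \; \frac{1}{2} \Bigl\| y - \sum_{k} d_k * z_k \Bigr\|_2^2
              + \lambda \sum_{k} \| z_k \|_1
```

In "learned" variants of this problem, the filters and the thresholds of an iterative solver are typically trained from data rather than fixed by hand; how the paper does this exactly is not stated in this summary.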

Abstract

Previous deep learning-based event denoising methods mostly suffer from poor interpretability and difficulty in real-time processing due to their complex architecture designs. In this paper, we propose window-based event denoising, which processes a stack of events simultaneously, whereas existing element-based denoising handles one event at a time. We also give a theoretical analysis based on probability distributions in both the temporal and spatial domains to improve interpretability. In the temporal domain, we use the timestamp deviations between the events being processed and the central event to judge temporal correlation and filter out temporally irrelevant events. In the spatial domain, we use maximum a posteriori (MAP) estimation to discriminate real-world events from noise, and use learned convolutional sparse coding to optimize the objective function. Based on this theoretical analysis, we build a Temporal Window (TW) module and a Soft Spatial Feature Embedding (SSFE) module to process temporal and spatial information separately, and construct a novel multi-scale window-based event denoising network, named MSDNet. The high denoising accuracy and fast running speed of MSDNet enable real-time denoising in complex scenes. Extensive experimental results verify the effectiveness and robustness of MSDNet. Our algorithm removes event noise effectively and efficiently and improves the performance of downstream tasks.
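
The temporal filtering idea described in the abstract can be illustrated with a minimal sketch. It assumes a hard threshold on the absolute timestamp deviation from the central event; the function name `temporal_window_mask`, the parameter `t_lim`, and the hard cutoff itself are assumptions for illustration, and the paper's TW module may use a different (for example probabilistic) criterion.

```python
def temporal_window_mask(timestamps, center_index, t_lim):
    """Flag events that are temporally close to the central event.

    timestamps   : list of event timestamps (any consistent unit)
    center_index : index of the central event in the stack (assumed given)
    t_lim        : maximum allowed absolute timestamp deviation (hypothetical)
    """
    t_center = timestamps[center_index]
    # An event is kept when its deviation from the central timestamp is small.
    return [abs(t - t_center) <= t_lim for t in timestamps]


# Toy stack: three events close in time, two much later.
ts = [0.0, 1.0, 2.0, 10.0, 11.0]
mask = temporal_window_mask(ts, center_index=1, t_lim=2.0)
kept = [t for t, keep in zip(ts, mask) if keep]
```

Because the mask is computed for the whole stack at once, every event in the window is judged in a single pass, which is the property that distinguishes window-based from element-based processing.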

Paper Structure (18 sections, 20 equations, 10 figures, 6 tables)

This paper contains 18 sections, 20 equations, 10 figures, 6 tables.

Figures (10)

  • Figure 1: Top: Element-based event denoising samples the neighborhoods of the current event and processes the event stream event by event. Bottom: Window-based event denoising method samples a stack of events and labels the event stack at one processing period.
  • Figure 2: Comparison of SNR score and running time on the DVSCLEAN dataset. Algorithms with a higher SNR score and a lower running time perform better.
  • Figure 3: Framework of WedNet. WedNet processes a stack of events simultaneously, which significantly improves running speed. We first use the temporal window to divide the event stream into stacks and then use the BEC module to check for bone events in each stack. The HSFL module, which consists of four extraction levels and four propagation levels, learns the latent spatial features. Finally, a fully connected layer produces the event labels.
  • Figure 4: Structure of our SSFE module.
  • Figure 5: Bone Events Check module. A stack of events within $t_{lim}$ is first compressed into a frame. Then, the CDL algorithm (4-neighborhood) labels the event stack. An event whose connected domain exceeds the threshold $\tau$ is considered a bone event.
  • ...and 5 more figures
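
The Bone Events Check described in the Figure 5 caption can be sketched as follows. This is an illustrative reading of the caption, not the authors' implementation: the function name `bone_event_flags`, the `(x, y)` event layout, the breadth-first labeling, and the choice to measure a connected domain by its pixel count are all assumptions.

```python
from collections import deque


def bone_event_flags(events, height, width, tau):
    """Return one bool per event: True if its pixel lies in a 4-connected
    component of the compressed frame with more than `tau` pixels.

    events : list of (x, y) pixel coordinates from one temporal window
    tau    : component-size threshold (size in pixels is an assumption)
    """
    # Step 1: compress the event stack into a binary frame.
    frame = [[False] * width for _ in range(height)]
    for x, y in events:
        frame[y][x] = True

    # Step 2: label 4-connected components with a breadth-first search.
    labels = [[0] * width for _ in range(height)]
    sizes = {}
    next_label = 0
    for y0 in range(height):
        for x0 in range(width):
            if not frame[y0][x0] or labels[y0][x0]:
                continue
            next_label += 1
            labels[y0][x0] = next_label
            queue = deque([(x0, y0)])
            size = 0
            while queue:
                x, y = queue.popleft()
                size += 1
                # 4-neighborhood: right, left, down, up (no diagonals).
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nx, ny = x + dx, y + dy
                    if (0 <= nx < width and 0 <= ny < height
                            and frame[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = next_label
                        queue.append((nx, ny))
            sizes[next_label] = size

    # Step 3: an event is a bone event if its component exceeds the threshold.
    return [sizes[labels[y][x]] > tau for x, y in events]


# Toy example: a 3-pixel horizontal segment plus one isolated event.
flags = bone_event_flags([(0, 0), (1, 0), (2, 0), (4, 4)],
                         height=5, width=5, tau=2)
```

With a 4-neighborhood, diagonally adjacent events do not join the same component, so isolated or diagonal-only noise events tend to fall below the threshold while events along an object's structure are retained.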