Implicit Event-RGBD Neural SLAM

Delin Qu, Chi Yan, Dong Wang, Jie Yin, Dan Xu, Bin Zhao, Xuelong Li

TL;DR

EN-SLAM is a first-of-its-kind framework that fuses event cameras with RGBD sensing in an implicit neural SLAM pipeline. It introduces differentiable CRF (camera response function) rendering, which maps a shared radiance field to both RGB and event data, and an Event Temporal Aggregating (ETA) optimization that exploits the temporal differences encoded by events for robust tracking and global bundle adjustment (BA). The approach is validated on the synthetic DEV-Indoors and real DEV-Reals datasets, outperforming state-of-the-art methods in tracking accuracy (ATE) and mapping quality (ACC, Depth) while running in real time (≈17 FPS). The work advances robust dense 3D reconstruction in challenging indoor environments and contributes two benchmark datasets for evaluating NeRF-based SLAM under non-ideal conditions.
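The CRF decomposition can be pictured as two response curves applied to one shared radiance value. The sketch below is a minimal illustration under stated assumptions, not EN-SLAM's code: the paper learns its CRF mappers, whereas here `crf_rgb` is a hypothetical fixed gamma curve with clipping (standing in for the RGB sensor's limited dynamic range) and `crf_event` is a hypothetical log-luminance map using Rec. 601 luma weights; `RGB_GAMMA`, `EPS` and `LUMA` are assumed constants.

```python
import numpy as np

# Hypothetical fixed CRFs; EN-SLAM learns its CRF mappers instead.
RGB_GAMMA = 2.2   # assumed gamma of the RGB tone curve
EPS = 1e-3        # assumed offset keeping log() finite at zero radiance
LUMA = np.array([0.299, 0.587, 0.114])  # Rec. 601 luma weights (assumption)


def crf_rgb(radiance, exposure=1.0, gamma=RGB_GAMMA):
    """Map non-negative HDR radiance (N, 3) to LDR RGB in [0, 1].

    The clip models sensor saturation, so gradients vanish for saturated rays.
    """
    return np.clip((exposure * np.asarray(radiance)) ** (1.0 / gamma), 0.0, 1.0)


def crf_event(radiance, eps=EPS):
    """Map non-negative HDR radiance (N, 3) to log luminance (N,)."""
    luminance = np.asarray(radiance) @ LUMA
    return np.log(luminance + eps)


# Usage: one dim ray and two very bright rays read from the same field.
radiance = np.array([[0.2, 0.2, 0.2], [5.0, 5.0, 5.0], [50.0, 50.0, 50.0]])
rgb = crf_rgb(radiance)      # rows 1 and 2 both saturate to 1.0
log_l = crf_event(radiance)  # stays strictly increasing across all three rays
```

In this toy setting the two bright rays become indistinguishable in RGB but keep distinct log luminances, which is the high-dynamic-range benefit the event branch is meant to add; because both outputs read the same radiance, no cross-modal alignment step appears.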

Abstract

Implicit neural SLAM has achieved remarkable progress recently. Nevertheless, existing methods face significant challenges in non-ideal scenarios, such as motion blur or lighting variation, which often lead to convergence failures, localization drift, and distorted maps. To address these challenges, we propose EN-SLAM, the first event-RGBD implicit neural SLAM framework, which leverages the high temporal resolution and high dynamic range of event data for tracking and mapping. Specifically, EN-SLAM introduces a differentiable CRF (Camera Response Function) rendering technique that generates distinct RGB and event camera data from a shared radiance field, which is optimized by learning a unified implicit representation under both event and RGBD supervision. Moreover, building on the temporal-difference property of events, we propose a temporal aggregating optimization strategy for joint event tracking and global bundle adjustment that exploits the consecutive difference constraints provided by events, significantly improving tracking accuracy and robustness. Finally, we construct the simulated dataset DEV-Indoors and the real captured dataset DEV-Reals, containing 6 scenes and 17 sequences with practical motion blur and lighting changes, for evaluation. Experimental results show that our method outperforms state-of-the-art (SOTA) methods in both tracking ATE and mapping ACC while running in real time at 17 FPS in various challenging environments. Project page: https://delinqu.github.io/EN-SLAM.
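The temporal-difference property rests on the standard event generation model: a pixel fires an event of polarity ±1 each time its log luminance moves by the contrast threshold $C$ from its last reference level, so summing polarity × $C$ over an interval approximates the log-luminance change across that interval. The Python sketch below is a hypothetical illustration under simplifying assumptions (a noise-free single-pixel simulator, a fixed $C$, an L2 penalty, and no modeling of ETA's frame-index bookkeeping); the names `simulate_events`, `accumulate` and `temporal_event_loss` are not from EN-SLAM, and the paper's loss may differ in weighting and normalization.

```python
import numpy as np

C = 0.2  # assumed contrast threshold; real sensors have per-pixel, noisy thresholds


def simulate_events(times, log_l, c=C):
    """Noise-free event generation model for a single pixel.

    Emits (t, polarity) each time the log luminance moves at least c away
    from the current reference level; the reference then shifts by c.
    """
    events = []
    ref = float(log_l[0])
    for t, value in zip(times[1:], log_l[1:]):
        while value - ref >= c:
            ref += c
            events.append((float(t), +1))
        while ref - value >= c:
            ref -= c
            events.append((float(t), -1))
    return events


def accumulate(events, t_start, t_end, c=C):
    """Sum polarity * c over events with t_start < t <= t_end."""
    return c * sum(p for t, p in events if t_start < t <= t_end)


def temporal_event_loss(rendered_prev, rendered_curr, accumulated):
    """Mean squared error between rendered and event-measured log-luminance change.

    rendered_prev / rendered_curr: log luminance rendered for the same pixels
    from the previous and current camera poses; accumulated: per-pixel event sums.
    """
    diff = np.asarray(rendered_curr) - np.asarray(rendered_prev)
    return float(np.mean((diff - np.asarray(accumulated)) ** 2))


# Usage: brightness ramps from 1 to 3 between frames at t = 0 and t = 1.
times = np.linspace(0.0, 1.0, 11)
log_l = np.log(np.linspace(1.0, 3.0, 11))
events = simulate_events(times, log_l)
measured = accumulate(events, 0.0, 1.0)  # quantized estimate of log(3) - log(1)
loss = temporal_event_loss([log_l[0]], [log_l[-1]], [measured])
```

Here the "rendered" values are the true log luminances, so the remaining loss comes only from event quantization, which this model bounds below $C$ per pixel.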

Paper Structure

This paper contains 26 sections, 15 equations, 19 figures, 12 tables, 1 algorithm.

Figures (19)

  • Figure 1: Illustration of the proposed implicit event-RGBD neural SLAM system EN-SLAM in non-ideal environments. RGB sensors have a relatively low dynamic range and suffer from motion blur. Event cameras, by contrast, show great potential in non-ideal environments thanks to their high dynamic range and low latency. Our method samples rays from independent RGBD and event cameras to jointly train a single implicit neural field with both modalities. This shared hybrid mechanism provides a natural fusion approach that avoids alignment issues and leverages the advantages of both modalities, resulting in dense reconstructions that are more robust and of higher quality.
  • Figure 2: Illustration of the Event Generation Model (EGM). An event is triggered at a pixel when the change in its logarithmic luminance exceeds a threshold $C$.
  • Figure 3: Overview of EN-SLAM. EN-SLAM decodes the scene encoding into a shared geometry and radiance representation and decomposes the radiance into RGB color $\mathbf{c}(\mathbf{x})$ and event luminance $\mathbf{l}(\mathbf{x})$ via differentiable CRF mappers. The pose and scene representation are optimized iteratively by minimizing the losses during tracking and global BA, using the event temporal aggregating technique in Algorithm 1.
  • Figure 4: Illustration of the event temporal aggregating optimization strategy. In the tracking and global BA stages, EN-SLAM adaptively queries the previous frame through a previous-frame index table and samples rays from different views for joint optimization under the event loss.
  • Figure 5: Illustration of the proposed probability-weighted sampling strategy. We utilize the loss of the RGBD plane (left) to guide ray sampling in the event plane (right).
  • ...and 14 more figures
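The loss-guided sampling in Figure 5 can be sketched as drawing event-plane pixels with probability proportional to the per-pixel RGBD loss. This is a minimal sketch under assumptions, not the paper's implementation: it presumes the RGBD loss map has already been brought onto the event camera's pixel grid (EN-SLAM's two cameras are independent, and that correspondence step is not shown), and the function name `loss_weighted_pixel_sampling` and its `floor` parameter are hypothetical.

```python
import numpy as np


def loss_weighted_pixel_sampling(rgbd_loss_map, num_rays, floor=1e-3, seed=None):
    """Sample event-plane pixels with probability proportional to the RGBD loss.

    rgbd_loss_map: (H, W) non-negative per-pixel loss, assumed to be already
    resampled onto the event camera's pixel grid.
    floor: assumed small constant so zero-loss pixels keep a nonzero chance.
    Returns a (num_rays, 2) array of (row, col) indices, drawn without replacement.
    """
    weights = np.asarray(rgbd_loss_map, dtype=np.float64) + floor
    probs = weights.ravel() / weights.sum()
    rng = np.random.default_rng(seed)
    flat = rng.choice(probs.size, size=num_rays, replace=False, p=probs)
    rows, cols = np.unravel_index(flat, weights.shape)
    return np.stack([rows, cols], axis=1)


# Usage: a 4x4 loss map with a single high-error pixel.
loss_map = np.zeros((4, 4))
loss_map[0, 0] = 100.0
rays = loss_weighted_pixel_sampling(loss_map, num_rays=8, seed=0)
```

The floor keeps sampling from collapsing entirely onto high-loss regions, so well-reconstructed areas still receive occasional event supervision.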