Implicit Event-RGBD Neural SLAM
Delin Qu, Chi Yan, Dong Wang, Jie Yin, Dan Xu, Bin Zhao, Xuelong Li
TL;DR
EN-SLAM is the first framework to fuse event cameras with RGBD input in an implicit neural SLAM pipeline. It introduces differentiable CRF (Camera Response Function) rendering, which maps a shared radiance field to both RGB and event data, and an Event Temporal Aggregating (ETA) optimization, which exploits the temporal differences in events for robust tracking and global bundle adjustment (BA). The approach is validated on the synthetic DEV-Indoors and real DEV-Reals datasets, where it shows better tracking accuracy (ATE) and mapping quality (ACC, Depth) than prior methods at real-time speed (≈17 FPS). The work advances robust dense 3D reconstruction in challenging indoor environments and provides two benchmark datasets for evaluating NeRF-based SLAM under non-ideal conditions.
Abstract
Implicit neural SLAM has achieved remarkable progress recently. Nevertheless, existing methods face significant challenges in non-ideal scenarios, such as motion blur or lighting variation, which often lead to issues like convergence failures, localization drifts, and distorted mapping. To address these challenges, we propose EN-SLAM, the first event-RGBD implicit neural SLAM framework, which effectively leverages the high rate and high dynamic range advantages of event data for tracking and mapping. Specifically, EN-SLAM proposes a differentiable CRF (Camera Response Function) rendering technique to generate distinct RGB and event camera data via a shared radiance field, which is optimized by learning a unified implicit representation with the captured event and RGBD supervision. Moreover, based on the temporal difference property of events, we propose a temporal aggregating optimization strategy for event joint tracking and global bundle adjustment, which capitalizes on the consecutive difference constraints of events and significantly enhances tracking accuracy and robustness. Finally, we construct the simulated dataset DEV-Indoors and the real-captured dataset DEV-Reals, containing 6 scenes and 17 sequences with practical motion blur and lighting changes, for evaluation. Experimental results show that our method outperforms the SOTA methods in both tracking ATE and mapping ACC while running in real time at 17 FPS in various challenging environments. Project page: https://delinqu.github.io/EN-SLAM.
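As a rough illustration of the temporal difference property mentioned above, the sketch below uses the generic event-camera model, in which the events accumulated between two timestamps approximate the change in log brightness at each pixel. The fixed gamma-style `crf`, the function names, the contrast threshold, and the toy radiance values are all assumptions made for illustration; EN-SLAM learns its CRF inside a differentiable renderer, and this is not the authors' implementation.

```python
import numpy as np

def crf(radiance, gamma=2.2):
    """Hypothetical fixed CRF: maps linear radiance to brightness in [0, 1]."""
    return np.clip(radiance, 0.0, 1.0) ** (1.0 / gamma)

def log_brightness_change(radiance_t0, radiance_t1, eps=1e-3):
    """Predicted event signal: log-brightness difference between two renders."""
    return np.log(crf(radiance_t1) + eps) - np.log(crf(radiance_t0) + eps)

def event_loss(radiance_t0, radiance_t1, accumulated_events, contrast_threshold=0.2):
    """Mean squared error between predicted and measured log-brightness change.

    accumulated_events holds the signed per-pixel event counts in (t0, t1];
    contrast_threshold is an assumed sensor constant.
    """
    predicted = log_brightness_change(radiance_t0, radiance_t1)
    measured = contrast_threshold * accumulated_events
    return float(np.mean((predicted - measured) ** 2))

# Toy 2x2 renders of the same radiance field at two timestamps.
radiance_t0 = np.full((2, 2), 0.2)
radiance_t1 = np.array([[0.2, 0.4], [0.1, 0.2]])
delta = log_brightness_change(radiance_t0, radiance_t1)
```

In this toy case, pixels whose radiance is unchanged give a zero predicted signal, a brightening pixel gives a positive one, and a darkening pixel gives a negative one. A loss of this shape ties two rendered timestamps to the events measured between them, which is the kind of consecutive difference constraint the abstract refers to.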
