DVN-SLAM: Dynamic Visual Neural SLAM Based on Local-Global Encoding
Wenhua Wu, Guangming Wang, Ting Deng, Sebastian Aegidius, Stuart Shanks, Valerio Modugno, Dimitrios Kanoulas, Hesheng Wang
TL;DR
DVN-SLAM targets robustness to dynamic objects in dense NeRF-based SLAM. It introduces a local-global fusion neural implicit representation that combines a global One-Blob encoding with local axis-aligned feature planes. Attention-based feature fusion and a result fusion scheme are used to predict RGB and TSDF values, and an information concentration loss based on depth variance addresses rendering uncertainty. The approach achieves competitive static localization and mapping while remaining robust in highly dynamic indoor scenes across Replica and TUM-RGBD, outperforming several baselines and running in real time on an A100 GPU. This work advances dense SLAM by enabling plausible reconstruction of unobserved regions and robust operation under object motion, with potential impact on autonomous systems and AR/VR in dynamic environments.
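To make the two ingredients of the representation concrete, the following is a minimal NumPy sketch of a global One-Blob encoding and a local lookup on three axis-aligned feature planes. It is an illustration under stated assumptions, not the authors' implementation: the function names, the bin count, the Gaussian-at-bin-centers simplification of One-Blob, and the plane resolution are all hypothetical, and the attention-based fusion is deliberately left out.

```python
import numpy as np

def one_blob(x, n_bins=16):
    """Encode coordinates in [0, 1] with a Gaussian bump per dimension.

    Assumption: the kernel is evaluated at bin centers with sigma = 1 / n_bins,
    a common simplification of One-Blob encoding.
    x: (N, D) array. Returns (N, D * n_bins).
    """
    x = np.asarray(x, dtype=np.float64)
    centers = (np.arange(n_bins) + 0.5) / n_bins      # (n_bins,)
    sigma = 1.0 / n_bins
    diff = x[..., None] - centers                     # (N, D, n_bins)
    enc = np.exp(-0.5 * (diff / sigma) ** 2)
    return enc.reshape(x.shape[0], -1)

def bilinear(plane, uv):
    """Bilinearly interpolate a feature plane.

    plane: (R, R, C) with R >= 2; uv: (N, 2) in [0, 1]. Returns (N, C).
    """
    res = plane.shape[0]
    p = np.clip(uv, 0.0, 1.0) * (res - 1)
    i0 = np.minimum(np.floor(p).astype(int), res - 2)  # lower cell corner
    f = p - i0                                         # offset within cell
    u0, v0 = i0[:, 0], i0[:, 1]
    fu, fv = f[:, :1], f[:, 1:]
    return ((1 - fu) * (1 - fv) * plane[u0, v0]
            + fu * (1 - fv) * plane[u0 + 1, v0]
            + (1 - fu) * fv * plane[u0, v0 + 1]
            + fu * fv * plane[u0 + 1, v0 + 1])

def local_global_features(xyz, planes, n_bins=16):
    """Return (global, local) encodings for points xyz in [0, 1]^3.

    planes: dict with hypothetical keys 'xy', 'xz', 'yz', each (R, R, C).
    The local feature here is a plain sum over the three planes; how the
    two encodings are fused is left to the downstream network.
    """
    xyz = np.asarray(xyz, dtype=np.float64)
    g = one_blob(xyz, n_bins)
    l = (bilinear(planes["xy"], xyz[:, [0, 1]])
         + bilinear(planes["xz"], xyz[:, [0, 2]])
         + bilinear(planes["yz"], xyz[:, [1, 2]]))
    return g, l
```

The global encoding is smooth and has no learnable parameters, while the planes hold learnable features at a fixed resolution, so in a sketch like this the former carries coarse structure and the latter carry fine detail.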
Abstract
Recent research on Simultaneous Localization and Mapping (SLAM) based on implicit representation has shown promising results in indoor environments. However, there are still some challenges: the limited scene representation capability of implicit encodings, the uncertainty in the rendering process from implicit representations, and the disruption of consistency by dynamic objects. To address these challenges, we propose a real-time dynamic visual SLAM system based on local-global fusion neural implicit representation, named DVN-SLAM. To improve the scene representation capability, we introduce a local-global fusion neural implicit representation that enables the construction of an implicit map while considering both global structure and local details. To tackle uncertainties arising from the rendering process, we design an information concentration loss for optimization, aiming to concentrate scene information on object surfaces. The proposed DVN-SLAM achieves competitive performance in localization and mapping across multiple datasets. More importantly, DVN-SLAM demonstrates robustness in dynamic scenes, a trait that sets it apart from other NeRF-based methods.
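The idea of concentrating scene information on object surfaces can be sketched as a penalty on the variance of depth along each rendered ray. The snippet below is a rough illustration only: the abstract does not give the loss formula, so the variance definition, the weight normalization, and the name depth_variance_loss are assumptions rather than the paper's formulation.

```python
import numpy as np

def depth_variance_loss(t, w, eps=1e-8):
    """Penalize spread of rendering weights along each ray.

    t: (R, S) sample depths along R rays.
    w: (R, S) non-negative rendering weights (normalized here per ray).
    Returns (rendered_depth of shape (R,), scalar mean depth variance).
    Assumption: variance is the weight-averaged squared deviation from
    the rendered depth.
    """
    t = np.asarray(t, dtype=np.float64)
    w = np.asarray(w, dtype=np.float64)
    w = w / (w.sum(axis=-1, keepdims=True) + eps)
    depth = (w * t).sum(axis=-1)                        # expected depth per ray
    var = (w * (t - depth[:, None]) ** 2).sum(axis=-1)  # spread around it
    return depth, var.mean()

# A ray whose weight sits on one sample has near-zero variance;
# a ray with uniform weights is penalized.
t = np.array([[1.0, 2.0, 3.0]])
_, sharp = depth_variance_loss(t, np.array([[0.0, 1.0, 0.0]]))
_, flat = depth_variance_loss(t, np.array([[1.0, 1.0, 1.0]]))
```

Minimizing a term of this kind pushes the weights toward a single peak at the surface, which is one plausible reading of "information concentration".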
