TivNe-SLAM: Dynamic Mapping and Tracking via Time-Varying Neural Radiance Fields
Chengyao Duan, Zhiliu Yang
TL;DR
TivNe-SLAM tackles dynamic SLAM with a time-varying implicit representation: it augments 3D positions with time to form 4D space–time coordinates and uses a deformation field to map them to a canonical field at $t=0$. Colors and SDF values are regressed by a pair of MLPs conditioned on time, using a TriLerp-based embedding and a two-stage optimization that jointly tracks camera poses and maps dynamic objects. A novel overlap-based keyframe selection strategy maximizes view coverage, enabling more complete dynamic reconstructions while maintaining real-time performance and avoiding pre-trained models. Evaluations on the synthetic Room4 and ToyCar3 datasets and the real-world Teddy dataset show competitive tracking accuracy and superior dynamic-object reconstruction, with substantially faster training than RoDynRF. These results highlight the practical value of 4D neural implicit representations for robust, real-time dynamic SLAM in real-world environments.
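As a rough sketch of the representation described above, the two stages can be written as follows. The symbols $\Psi$, $\gamma$, $F_c$, and $F_s$ are notation assumed here for illustration, not taken from the paper, and the exact inputs to each network may differ.

$$
\mathbf{x}_c = \Psi(\mathbf{x}, t), \qquad \mathbf{c} = F_c\big(\gamma(\mathbf{x}_c), t\big), \qquad s = F_s\big(\gamma(\mathbf{x}_c), t\big)
$$

Here $\mathbf{x}$ is a 3D sample position at time $t$, $\Psi$ is the deformation field mapping it to a position $\mathbf{x}_c$ in the canonical field at $t=0$, $\gamma$ is the TriLerp-based embedding, and $F_c$ and $F_s$ are the two time-conditioned MLPs that output the color $\mathbf{c}$ and the SDF value $s$.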
Abstract
Previous attempts to integrate Neural Radiance Fields (NeRF) into the Simultaneous Localization and Mapping (SLAM) framework either assume static scenes or require ground-truth camera poses, which impedes their application in real-world scenarios. This paper proposes a time-varying representation to track and reconstruct dynamic scenes. Firstly, our framework maintains two processes simultaneously: a tracking process and a mapping process. In the tracking process, all input images are uniformly sampled and then progressively trained in a self-supervised paradigm. In the mapping process, we leverage motion masks to distinguish dynamic objects from the static background and sample more pixels from dynamic areas. Secondly, the parameter optimization for both processes consists of two stages. The first stage associates time with 3D positions to convert the deformation field to the canonical field. The second stage associates time with the embeddings of the canonical field to obtain colors and a Signed Distance Function (SDF). Lastly, we propose a novel keyframe selection strategy based on the overlapping rate. Our approach is evaluated on two synthetic datasets and one real-world dataset, and the experiments show that our method achieves competitive results in both tracking and mapping compared with existing state-of-the-art NeRF-based dynamic SLAM systems.
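The mask-guided pixel sampling used in the mapping process could look roughly like the sketch below. It is a minimal illustration only: the function name `sample_pixels`, the `dynamic_fraction` parameter, and its default value are assumptions made here, and the paper's actual sampling ratio and implementation are not stated in this summary.

```python
import random


def sample_pixels(mask, n_samples, dynamic_fraction=0.7, rng=None):
    """Sample (row, col) pixels with a fixed share taken from dynamic regions.

    mask: 2D list of 0/1 values, where 1 marks a dynamic pixel.
    dynamic_fraction: hypothetical share of samples drawn from dynamic pixels.
    Sampling is done with replacement.
    """
    rng = rng or random.Random()
    dynamic = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    static = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if not v]

    # Fall back to uniform sampling when only one kind of pixel is present.
    if not dynamic or not static:
        pool = dynamic or static
        if not pool:
            raise ValueError("mask contains no pixels")
        return [rng.choice(pool) for _ in range(n_samples)]

    n_dynamic = round(n_samples * dynamic_fraction)
    picks = [rng.choice(dynamic) for _ in range(n_dynamic)]
    picks += [rng.choice(static) for _ in range(n_samples - n_dynamic)]
    return picks


# Usage: a 4x4 motion mask whose top-left 2x2 block is dynamic.
mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
samples = sample_pixels(mask, n_samples=10, dynamic_fraction=0.7, rng=random.Random(0))
n_in_dynamic = sum(mask[r][c] for r, c in samples)
```

With these settings, 7 of the 10 samples land in the dynamic block even though it covers only a quarter of the image, which is the intended bias toward moving regions. A fixed split is only one way to realize "more pixels from dynamic areas"; a weighted draw over all pixels would be an equally plausible reading.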
