Neural Implicit Representation for Highly Dynamic LiDAR Mapping and Odometry

Qi Zhang, He Wang, Ru Li, Wenbin Li

TL;DR

This work targets robust 3D reconstruction and odometry in highly dynamic outdoor LiDAR scenes by extending NeRF-LOAM with dynamic foreground/background separation, a multi-resolution octree, and Fourier feature encoding. A moving-object detection thread produces 3D bounding boxes to mask dynamic regions, while ground points are regenerated within foreground areas to maintain a coherent background map. The neural scene representation uses multi-resolution octree embeddings (with $H=3$) and Fourier-encoded query coordinates to predict the SDF via $\Psi(p)=f(\gamma(p),F_{\alpha}^{s})$, and is optimized with a combined loss $L_{total}=\lambda_s L_s + \lambda_f L_f + \lambda_e L_e + \lambda_d L_d$. Experiments on MOT19, MaiCity, and Newer College show improved map completeness and competitive odometry compared to NeRF-LOAM, Pin-SLAM, and other baselines, highlighting the method's practicality for dynamic outdoor SLAM.
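The two formulas in the summary above can be illustrated with a minimal sketch. It assumes a NeRF-style sine/cosine encoding for $\gamma(p)$ with power-of-two frequencies, which the text here does not specify, and the function names, `num_freqs`, and the weight values in the usage note are hypothetical. The loss keys `s`, `f`, `e`, `d` simply mirror the subscripts in $L_{total}$.

```python
import math

def fourier_encode(p, num_freqs=4):
    """Encode a 3D point as sin/cos pairs at frequencies 2^k * pi (assumed form of gamma)."""
    encoded = []
    for k in range(num_freqs):
        freq = (2.0 ** k) * math.pi
        for x in p:
            encoded.append(math.sin(freq * x))
            encoded.append(math.cos(freq * x))
    return encoded

def total_loss(losses, weights):
    """Weighted sum of loss terms: sum over names of weights[name] * losses[name]."""
    return sum(weights[name] * losses[name] for name in losses)
```

For example, `fourier_encode((0.1, 0.2, 0.3), num_freqs=4)` yields a 24-dimensional vector (3 coordinates × 4 frequencies × 2 functions), and `total_loss({"s": 1.0, "f": 2.0, "e": 3.0, "d": 4.0}, {"s": 1.0, "f": 0.5, "e": 0.1, "d": 0.0})` combines four placeholder loss values with placeholder weights.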

Abstract

Recent advancements in Simultaneous Localization and Mapping (SLAM) have increasingly highlighted the robustness of LiDAR-based techniques. At the same time, Neural Radiance Fields (NeRF) have introduced new possibilities for 3D scene reconstruction, as exemplified by NeRF-based SLAM systems. Among these, NeRF-LOAM has shown notable performance. However, despite their strengths, these systems often encounter difficulties in dynamic outdoor environments because they inherently assume a static scene. To address this limitation, this paper proposes a novel method designed to improve reconstruction in highly dynamic outdoor scenes. Based on NeRF-LOAM, the proposed approach consists of two primary components. First, we separate the scene into static background and dynamic foreground. By identifying and excluding dynamic elements from the mapping process, this segmentation enables the creation of a dense 3D map that accurately represents the static background only. The second component extends the octree structure to support multi-resolution representation. This extension not only enhances reconstruction quality but also aids in the removal of dynamic objects identified by the first module. Additionally, Fourier feature encoding is applied to the sampled points, capturing high-frequency information and leading to more complete reconstruction results. Evaluations on various datasets demonstrate that our method achieves competitive results compared to current state-of-the-art approaches.
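The first component, excluding dynamic elements from mapping, can be sketched as a point-in-box test. This is only an illustration under stated assumptions: each detected 3D box is taken to be a hypothetical `(center, size, yaw)` tuple rotated about the vertical axis, which may differ from the paper's own definition of the box $B$, and the function names are invented for this example.

```python
import math

def in_box(point, center, size, yaw):
    """Return True if point lies inside a box rotated by yaw about the z axis."""
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    dz = point[2] - center[2]
    # Rotate the offset by -yaw to express it in the box's own frame.
    c, s = math.cos(yaw), math.sin(yaw)
    local_x = c * dx + s * dy
    local_y = -s * dx + c * dy
    return (abs(local_x) <= size[0] / 2
            and abs(local_y) <= size[1] / 2
            and abs(dz) <= size[2] / 2)

def split_static_dynamic(points, boxes):
    """Split points into (static, dynamic) according to the detected boxes."""
    static, dynamic = [], []
    for p in points:
        if any(in_box(p, *box) for box in boxes):
            dynamic.append(p)
        else:
            static.append(p)
    return static, dynamic
```

Only the `static` list would then be passed on to mapping. The regeneration of ground points inside masked regions, mentioned in the TL;DR, is not covered by this sketch.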

Paper Structure (17 sections, 9 equations, 5 figures, 3 tables)

This paper contains 17 sections, 9 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: 3D reconstruction of KITTI MOT19 using our proposed method and NeRF-LOAM [deng2023nerf].
  • Figure 2: Overview of the system. The left part of the image illustrates the separation of background and foreground. We remove the dynamic points from the foreground and generate a foreground mask (pink), resulting in a purely static scene. The right part of the image shows the training process of the neural SDF module. We interpolate the features at the query point at each resolution level, combine them with the Fourier feature positional encoding of the point, and feed the result into the MLP to predict the SDF value.
  • Figure 3: Definition of the 3D box $B$.
  • Figure 4: 3D reconstruction results of the proposed method, NeRF-LOAM [deng2023nerf], and Pin-SLAM [pan2024pin]. (a) and (b): Original images from the datasets MOT19 and MOT26. The red ovals highlight dynamic pedestrians in the scene. (c) to (h): Results of the proposed method, NeRF-LOAM [deng2023nerf], and Pin-SLAM [pan2024pin] on the MOT19 dataset. The grayscale images are reconstructions, and the red ovals indicate the areas of focus for comparison.
  • Figure 5: Qualitative visualization of the map quality on the MaiCity dataset (Mai) (top) and the Newer College dataset (NCD) (bottom). The areas highlighted by the red ellipses emphasize the contributions of our proposed method.
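The per-level feature interpolation described in the Figure 2 caption can be sketched as follows. This is a simplified stand-in, not the paper's octree: features are scalars supplied by a hypothetical `corner_feature` callback rather than learned embedding vectors, the voxel size is assumed to halve at each finer level, and `levels=3` mirrors $H=3$ on the assumption that $H$ counts resolution levels.

```python
import math
from itertools import product

def trilinear(p, voxel_size, corner_feature):
    """Trilinearly interpolate a scalar feature at p from the 8 corners of its voxel."""
    base = [math.floor(x / voxel_size) for x in p]
    frac = [x / voxel_size - b for x, b in zip(p, base)]
    value = 0.0
    for offset in product((0, 1), repeat=3):
        weight = 1.0
        for f, o in zip(frac, offset):
            weight *= f if o else (1.0 - f)
        corner = tuple(b + o for b, o in zip(base, offset))
        value += weight * corner_feature(corner, voxel_size)
    return value

def multires_features(p, corner_feature, base_voxel=1.0, levels=3):
    """Return one interpolated feature per level, coarsest first."""
    return [trilinear(p, base_voxel / (2 ** h), corner_feature)
            for h in range(levels)]
```

In a full pipeline, the per-level features would be combined with the Fourier encoding of the query point before the MLP. How the levels are merged (for example by concatenation or summation) is not stated in this excerpt, so the sketch simply returns them as a list.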