PIN-SLAM: LiDAR SLAM Using a Point-Based Implicit Neural Representation for Achieving Global Map Consistency

Yue Pan, Xingguang Zhong, Louis Wiesmann, Thorbjörn Posewsky, Jens Behley, Cyrill Stachniss

TL;DR

PIN-SLAM addresses the challenge of global map consistency in LiDAR SLAM by introducing a point-based implicit neural map (PIN map) built from elastic neural points. The system alternates online local map learning with correspondence-free pose estimation, and uses loop closures to elastically adjust both poses and neural points, enabling consistent large-scale maps and accurate mesh reconstruction. It combines a voxel-hashing data structure, local SDF supervision, and second-order optimization to run online at the sensor frame rate on a moderate GPU, while supporting extensions to RGB-D and semantic mapping. Across diverse datasets, PIN-SLAM demonstrates competitive localization accuracy, improved loop-closure recall, and superior map consistency with compact implicit representations, illustrating its practical impact for real-time, large-scale SLAM.

Abstract

Accurate and robust localization and mapping are essential components for most autonomous robots. In this paper, we propose a SLAM system for building globally consistent maps, called PIN-SLAM, that is based on an elastic and compact point-based implicit neural map representation. Taking range measurements as input, our approach alternates between incremental learning of the local implicit signed distance field and pose estimation given the current local map, using a correspondence-free, point-to-implicit model registration. Our implicit map is based on sparse optimizable neural points, which are inherently elastic and deform with the global pose adjustment when closing a loop. Loops are also detected using the neural point features. Extensive experiments validate that PIN-SLAM is robust to various environments and versatile across different range sensors such as LiDAR and RGB-D cameras. PIN-SLAM achieves pose estimation accuracy better than or on par with state-of-the-art LiDAR odometry and SLAM systems, and outperforms recent neural implicit SLAM approaches while maintaining a more consistent and highly compact implicit map that can be reconstructed as accurate and complete meshes. Finally, thanks to voxel hashing for efficient neural point indexing and fast implicit map-based registration without closest point association, PIN-SLAM can run at the sensor frame rate on a moderate GPU. Code will be available at: https://github.com/PRBonn/PIN_SLAM.
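
The abstract credits voxel hashing for efficient neural point indexing. The sketch below illustrates one common way such an index can be built; it is not taken from the PIN-SLAM code. The voxel size, table size, the three hashing primes, the one-point-per-voxel policy, and the class name NeuralPointIndex are all assumptions made for illustration, and hash collisions are deliberately left unhandled.

```python
import numpy as np

# Hypothetical parameters, chosen only for this illustration.
VOXEL_SIZE = 0.4                      # voxel edge length in metres
TABLE_SIZE = 1 << 20                  # number of hash-table slots
PRIMES = np.array([73856093, 19349669, 83492791], dtype=np.int64)


def voxel_hash(points: np.ndarray) -> np.ndarray:
    """Map (N, 3) points to hash-table slots via integer voxel coordinates."""
    grid = np.floor(points / VOXEL_SIZE).astype(np.int64)
    prod = grid * PRIMES
    # XOR the per-axis products, then wrap into the table (result is >= 0).
    mixed = prod[:, 0] ^ prod[:, 1] ^ prod[:, 2]
    return np.mod(mixed, TABLE_SIZE)


class NeuralPointIndex:
    """Stores at most one point per hash slot (assumed policy)."""

    def __init__(self) -> None:
        self.table = np.full(TABLE_SIZE, -1, dtype=np.int64)  # slot -> point id
        self.positions = np.empty((0, 3))

    def insert(self, points: np.ndarray) -> int:
        """Add points that fall into empty slots; return how many were added."""
        slots = voxel_hash(points)
        # Keep the first point per slot within this batch.
        _, first = np.unique(slots, return_index=True)
        # Drop candidates whose slot is already occupied.
        first = first[self.table[slots[first]] == -1]
        new_ids = len(self.positions) + np.arange(len(first))
        self.table[slots[first]] = new_ids
        self.positions = np.vstack([self.positions, points[first]])
        return len(first)

    def lookup(self, queries: np.ndarray) -> np.ndarray:
        """Return the stored point id for each query's slot, or -1 if empty."""
        return self.table[voxel_hash(queries)]


index = NeuralPointIndex()
scan = np.array([[0.1, 0.1, 0.1], [0.2, 0.2, 0.2], [0.9, 0.1, 0.1]])
added = index.insert(scan)  # the first two points share a voxel
print(added, index.lookup(np.array([[0.3, 0.3, 0.3]])))
```

Insertion and lookup are both constant-time array operations per point, which is the property that makes this kind of structure attractive for online mapping. A production index would also need to resolve collisions between distinct voxels that hash to the same slot.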

Paper Structure (39 sections, 27 equations, 14 figures, 14 tables)

This paper contains 39 sections, 27 equations, 14 figures, 14 tables.

Figures (14)

  • Figure 1: We present PIN-SLAM, a novel LiDAR SLAM system using an elastic point-based implicit neural map representation. Depicted in the middle, we show a large-scale globally consistent neural point map built with our approach using about 20,000 LiDAR scans recorded with a car without using any information from a GNSS, IMU or wheel odometry. We can query the SDF value at an arbitrary position from the neural point map and reconstruct surface meshes. The point colors represent the neural point feature after online optimization. On the left, we show the consistent neural points (top) and mesh (bottom) of a region traversed by the car multiple times indicated by the dashed orange box. The colors of the neural points (top) represent timesteps when the point was added to the map. On the right, we show the high-fidelity mesh (bottom) of a building reconstructed from the neural point map (top) of the region indicated by a dashed blue box.
  • Figure 2: Pipeline overview of PIN-SLAM. Starting from a point cloud $\mathcal{P}$ scanned at timestep $t$, (1) a point cloud for registration $\mathcal{P}_r$ and a point cloud for mapping $\mathcal{P}_m$ are voxel-downsampled. (2) We align $\mathcal{P}_r$ to the implicit SDF of current local map $\mathcal{M}_l$ to estimate the global pose of current frame $\mathsf{T}_{WC_t}$. (3) $\mathsf{T}_{WC_t}$ is then used to transform $\mathcal{P}_m$ into the map coordinate system. With the transformed $\mathcal{P}_m$, we update the point-based implicit neural (PIN) map $\mathcal{M}$ and optimize the neural point features in the local map $\mathcal{M}_l$ by online incremental learning. (4) We generate polar context descriptor $\mathbf{U}_{t}$ using the current local map $\mathcal{M}_l$ and search for loop closures by comparing $\mathbf{U}_{t}$ to descriptors generated for previous frames. Once a loop between frame $C_{t}$ and $C_{k}$ is detected, we add the transformation $\mathsf{T}_{C_k C_t}$ as a loop edge of the pose graph and then (5) conduct the pose graph optimization. The position and orientation of the neural points in $\mathcal{M}$ are transformed along with their associated frames after the pose graph optimization, leading to a globally consistent map. With the PIN map, we can query the SDF value at an arbitrary position during or after the SLAM task for path planning and mesh reconstruction.
  • Figure 3: Diagram of SDF querying in our point-based implicit neural map, simplified in 2D. (a) The point in gray is the query position $\boldsymbol{p}$ while the other points are the neighboring neural points. Each neural point predicts the SDF value $s_i$ at the query position by feeding the neural point feature $\boldsymbol{f}^g_i$ and the query point's position $\boldsymbol{d}_i$, expressed in the neural point's coordinate system, through a globally shared decoder $D_\theta^g$. The predictions are then combined into the final prediction $s$, weighted according to the distances from the neural points to the query position. (b) The orientation of each neural point defines its local coordinate system, ensuring that the relative coordinate $\boldsymbol{d}_i$, and thus the SDF query, is invariant to rigid-body transformations.
  • Figure 4: An example during the operation of PIN-SLAM at a timestep: (a) shows the point cloud for mapping $\mathcal{P}_m$. A moving bus is highlighted in the green circle. (b) shows the sparser point cloud for registration $\mathcal{P}_r$, colorized according to the point-wise registration weight from black to red. The points in blue are filtered out; they mainly lie on dynamic objects, rough vegetation, and newly observed regions. (c) shows points from the training sample pool $\mathcal{D}_p$ for map optimization, colorized according to their SDF target values from blue to red.
  • Figure 5: Comparison of (a) the locally consistent mesh with duplicated structures reconstructed by PIN LiDAR odometry, and (b) the globally consistent mesh reconstructed by PIN-SLAM built on KITTI sequence 00 after loop closure corrections. The estimated trajectories are overlaid on the map and colorized according to the timestamp. Details of two revisited regions are highlighted in the boxes.
  • ...and 9 more figures
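
The SDF query described in the Figure 3 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the decoder is a hypothetical stand-in (a tiny fixed random MLP), the inverse-squared-distance weighting is an assumed choice (the caption only says predictions are weighted by distance), orientations are stored as rotation matrices, and the names query_sdf, decoder, and random_rotation are invented here. The demo checks the invariance claim in part (b): moving the neural points, their orientations, and the query by the same rigid-body transform leaves the predicted SDF value unchanged.

```python
import numpy as np

FEATURE_DIM = 8  # hypothetical latent feature size

# Hypothetical stand-in for the shared decoder: a small fixed random MLP.
_rng = np.random.default_rng(0)
W1 = _rng.normal(size=(FEATURE_DIM + 3, 16))
W2 = _rng.normal(size=(16, 1))


def decoder(features: np.ndarray, rel_coords: np.ndarray) -> np.ndarray:
    """Map (K, F) features and (K, 3) relative coordinates to K SDF values."""
    hidden = np.tanh(np.concatenate([features, rel_coords], axis=1) @ W1)
    return (hidden @ W2)[:, 0]


def query_sdf(p, positions, rotations, features) -> float:
    """Blend per-point SDF predictions at query position p.

    positions: (K, 3) neural point positions in the world frame
    rotations: (K, 3, 3) neural point orientations (local-to-world)
    features:  (K, F) neural point features
    """
    offsets = p - positions  # world-frame offsets to the query
    # d_i = R_i^T (p - x_i): the query expressed in each point's local frame.
    d = np.einsum("kji,kj->ki", rotations, offsets)
    s = decoder(features, d)
    # Assumed weighting: inverse squared distance, normalised to sum to one.
    w = 1.0 / (np.sum(offsets**2, axis=1) + 1e-9)
    return float((w / w.sum()) @ s)


def random_rotation(rng: np.random.Generator) -> np.ndarray:
    """Draw a proper rotation matrix from the QR factor of a random matrix."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1.0  # flip one axis to make the determinant +1
    return q


rng = np.random.default_rng(1)
K = 4
positions = rng.normal(size=(K, 3))
rotations = np.stack([random_rotation(rng) for _ in range(K)])
features = rng.normal(size=(K, FEATURE_DIM))
p = np.array([0.2, -0.1, 0.3])

# Apply one rigid-body transform (R, t) to the map and to the query.
R, t = random_rotation(rng), np.array([5.0, -2.0, 1.0])
before = query_sdf(p, positions, rotations, features)
after = query_sdf(R @ p + t, positions @ R.T + t, R @ rotations, features)
print(np.isclose(before, after))
```

The invariance follows because the transformed relative coordinate is (R R_i)^T R (p - x_i) = R_i^T (p - x_i), and distances are preserved by rigid motion. This is what lets neural points be moved with their associated frames after pose graph optimization without retraining their features.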