Hierarchical Pose Estimation and Mapping with Multi-Scale Neural Feature Fields
Evgenii Kruzhkov, Alena Savinykh, Sven Behnke
TL;DR
This work addresses large-scale SLAM with unknown sensor poses by introducing a probabilistic implicit-mapping framework built on octree-based neural fields and hierarchical pose optimization. It learns a signed distance function (SDF) map from sequential LiDAR scans, activating multi-level MLPs from coarse to fine so that the map captures both overall geometry and fine detail, while propagating pose gradients through the measurements. The approach shows strong localization on KITTI and better mapping on MaiCity than sequential baselines, without requiring ground-truth poses. With near-real-time performance on standard GPUs, the method offers a practical, scalable option for open-world robotic perception and navigation with neural implicit representations.
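The coarse-to-fine activation mentioned above can be illustrated with a minimal sketch. The step-based schedule, the names `active_level_mask` and `hierarchical_sdf`, and the choice to sum per-level outputs are all assumptions made for illustration; the summary does not specify how the levels are scheduled or combined.

```python
import numpy as np

def active_level_mask(step, num_levels, steps_per_level):
    """Return a 0/1 mask over levels (hypothetical schedule).

    Level 0 (coarsest) is always active; one finer level is
    switched on every `steps_per_level` optimization steps.
    """
    n_active = min(num_levels, 1 + step // steps_per_level)
    mask = np.zeros(num_levels)
    mask[:n_active] = 1.0
    return mask

def hierarchical_sdf(level_outputs, mask):
    """Combine per-level SDF contributions (assumed here to be a sum),
    ignoring levels that are not yet active."""
    return float(np.dot(level_outputs, mask))

# Toy per-level outputs for one query point: coarse value plus finer corrections.
level_outputs = np.array([0.5, -0.1, 0.02])
early = hierarchical_sdf(level_outputs, active_level_mask(0, 3, 100))
late = hierarchical_sdf(level_outputs, active_level_mask(1000, 3, 100))
```

Early in optimization only the coarse level contributes, which keeps the objective smooth while poses are still uncertain; the finer corrections enter once later levels are activated.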
Abstract
Robotic applications require a comprehensive understanding of the scene. In recent years, neural-field-based approaches that parameterize the entire environment have become popular. These approaches are promising due to their continuous nature and their ability to learn scene priors. However, the use of neural fields in robotics becomes challenging when dealing with unknown sensor poses and sequential measurements. This paper focuses on the problem of sensor pose estimation for large-scale neural implicit SLAM. We investigate implicit mapping from a probabilistic perspective and propose hierarchical pose estimation with a corresponding neural network architecture. Our method is well suited to large-scale implicit map representations. The proposed approach operates on consecutive outdoor LiDAR scans and achieves accurate pose estimation while maintaining stable mapping quality for both short and long trajectories. We build our method on a structured and sparse implicit representation suitable for large-scale reconstruction and evaluate it on the KITTI and MaiCity datasets. Our approach outperforms the baseline in terms of mapping with unknown poses and achieves state-of-the-art localization accuracy.
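To make the idea of estimating a sensor pose against an implicit SDF map concrete, the following toy sketch recovers a 2D translation by gradient descent on the squared SDF values of scan endpoints. It is not the method proposed in the paper: the analytic circle SDF stands in for a learned neural map, the pose is reduced to a translation, and the function names, learning rate, and step count are assumptions chosen for illustration.

```python
import numpy as np

def circle_sdf(points, radius=1.0):
    """Signed distance to a circle centred at the origin (toy stand-in for a learned map)."""
    return np.linalg.norm(points, axis=1) - radius

def circle_sdf_grad(points):
    """Spatial gradient of the circle SDF: unit vectors pointing away from the centre."""
    return points / np.linalg.norm(points, axis=1, keepdims=True)

def estimate_translation(scan, t_init, lr=0.5, steps=200):
    """Gradient descent on mean(sdf(scan + t)^2) with respect to the translation t."""
    t = np.asarray(t_init, dtype=float).copy()
    for _ in range(steps):
        world = scan + t                       # scan endpoints moved into the map frame
        residual = circle_sdf(world)           # endpoints should lie on the surface (sdf = 0)
        # Chain rule: d/dt sdf(p + t)^2 = 2 * sdf(p + t) * grad_sdf(p + t)
        grad_t = 2.0 * np.mean(residual[:, None] * circle_sdf_grad(world), axis=0)
        t -= lr * grad_t
    return t

# Simulate a scan: surface points observed from a sensor displaced by t_true.
angles = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
surface = np.stack([np.cos(angles), np.sin(angles)], axis=1)
t_true = np.array([0.3, -0.2])
scan = surface - t_true
t_est = estimate_translation(scan, t_init=[0.0, 0.0])
```

In this toy setting the loss reaches zero exactly when the estimated translation places every endpoint back on the surface, so `t_est` approaches `t_true`. A full system would also estimate rotation and would obtain the SDF gradient by differentiating through the learned map.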
