Diffusion Based Robust LiDAR Place Recognition
Benjamin Krummenacher, Jonas Frey, Turcan Tuna, Olga Vysotska, Marco Hutter
TL;DR
The paper tackles robust global LiDAR-based place recognition on construction sites, addressing perceptual aliasing and the kidnapped robot problem by learning a multi-hypothesis pose distribution from synthetic LiDAR data generated inside an accurate mesh. It introduces a diffusion regression model with a PointNet++ backbone that outputs $N$ candidate poses from a single scan, followed by fast global registration (FGR) to verify and refine the best match. Key contributions include the diffusion-based place recognition module, synthetic dataset generation from reality-capture meshes, evaluation on five real-world datasets, and ablation and localizability analyses. The results show competitive accuracy and the ability to represent multi-modal pose distributions, which enables robust re-localization in complex, multi-floor construction environments and benefits downstream global registration.
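To make the multi-hypothesis idea concrete, the sketch below groups $N$ sampled position candidates into spatial modes and ranks them by how many samples support each one. This is an illustrative assumption, not the authors' pipeline: the function name `group_candidates`, the greedy grouping rule, and the `radius` parameter are all hypothetical, and the paper's verification step uses FGR rather than sample counts.

```python
import math


def group_candidates(candidates, radius=2.0):
    """Greedily group 3D position candidates into modes (hypothetical helper).

    A candidate joins the first mode whose running centroid lies within
    `radius` metres; otherwise it starts a new mode. Returns a list of
    (centroid, count) pairs sorted by count, largest first.
    """
    modes = []  # each entry: [sum_x, sum_y, sum_z, count]
    for x, y, z in candidates:
        for m in modes:
            centroid = (m[0] / m[3], m[1] / m[3], m[2] / m[3])
            if math.dist((x, y, z), centroid) <= radius:
                m[0] += x
                m[1] += y
                m[2] += z
                m[3] += 1
                break
        else:
            # No existing mode is close enough: start a new one.
            modes.append([x, y, z, 1])
    ranked = sorted(modes, key=lambda m: m[3], reverse=True)
    return [((m[0] / m[3], m[1] / m[3], m[2] / m[3]), m[3]) for m in ranked]


# Two look-alike rooms on different floors produce two clusters of samples.
samples = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.2, 0.1, 0.0),
           (0.0, 0.0, 3.0), (0.1, 0.0, 3.1)]
modes = group_candidates(samples, radius=1.0)
```

Each resulting mode could then be handed to a registration step as an initial guess, so that an aliased location is rejected by geometry rather than by sample count alone.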
Abstract
Mobile robots on construction sites require accurate pose estimation to perform autonomous surveying and inspection missions. Localization on construction sites is particularly challenging because of repetitive features such as flat plastered walls, and because of perceptual aliasing caused by apartments with similar layouts both within and across floors. In this paper, we focus on the global re-positioning of a robot with respect to an accurate scanned mesh of the building using only LiDAR data. In our approach, a neural network is trained on synthetic LiDAR point clouds generated by simulating a LiDAR inside an accurate, large-scale, real-world mesh. We train a diffusion model with a PointNet++ backbone, which allows us to model multiple position candidates from a single LiDAR point cloud. The resulting model can successfully predict the global position of the LiDAR in confined and complex sites despite the adverse effects of perceptual aliasing. The learned distribution over potential global positions is multi-modal, so several plausible locations can be represented at once. We evaluate our approach on five real-world datasets and show an average place recognition accuracy of 77% within ±2 m, while outperforming baselines by a factor of 2 in mean error.
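The two reported quantities, accuracy at a 2 m tolerance and mean error, can be computed as in the minimal sketch below. It assumes that the accuracy figure means the share of predictions whose Euclidean distance to the ground-truth position is at most 2 m. The function name `evaluate` and the toy data are hypothetical and are not taken from the paper.

```python
import math


def evaluate(predicted, ground_truth, threshold_m=2.0):
    """Return (accuracy, mean_error) for paired 3D positions.

    accuracy   : fraction of predictions within `threshold_m` of ground truth
    mean_error : average Euclidean error in metres
    """
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    accuracy = sum(e <= threshold_m for e in errors) / len(errors)
    mean_error = sum(errors) / len(errors)
    return accuracy, mean_error


# Toy data (made up): per-scan errors are 1 m, 0 m, 5 m and 3 m.
pred = [(0, 0, 0), (1, 0, 0), (5, 0, 0), (0, 0, 0)]
truth = [(0, 0, 1), (1, 0, 0), (0, 0, 0), (0, 3, 0)]
accuracy, mean_error = evaluate(pred, truth)
```

Reporting both numbers is useful because a few predictions that land in an aliased location far away inflate the mean error while changing the thresholded accuracy only slightly.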
