
SOLVR: Submap Oriented LiDAR-Visual Re-Localisation

Joshua Knights, Sebastián Barbas Laina, Peyman Moghadam, Stefan Leutenegger

TL;DR

SOLVR is a unified pipeline for learning-based LiDAR-Visual re-localisation that performs place recognition and 6-DoF registration across sensor modalities. It replaces RANSAC with a registration function that weights a simple least-squares fit by the estimated inlier likelihood of sparse keypoint correspondences.

Abstract

This paper proposes SOLVR, a unified pipeline for learning-based LiDAR-Visual re-localisation which performs place recognition and 6-DoF registration across sensor modalities. We propose a strategy to align the input sensor modalities by leveraging stereo image streams to produce metric depth predictions with pose information, then fusing multiple scene views from a local window using a probabilistic occupancy framework to expand the limited field-of-view of the camera. Additionally, SOLVR adopts a flexible definition of what constitutes a positive example for different training losses, allowing us to simultaneously optimise place recognition and registration performance. Furthermore, we replace RANSAC with a registration function that weights a simple least-squares fit by the estimated inlier likelihood of sparse keypoint correspondences, improving performance in scenarios with a low inlier ratio between the query and retrieved place. Our experiments on the KITTI and KITTI360 datasets show that SOLVR achieves state-of-the-art performance for LiDAR-Visual place recognition and registration, particularly improving registration accuracy over larger distances between the query and retrieved place.
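
The weighted least-squares registration mentioned in the abstract can be illustrated with a weighted Kabsch-style closed-form fit. This is a minimal sketch and not the authors' implementation: the function name `weighted_rigid_fit` is hypothetical, and the per-correspondence `weights` are assumed to come from some upstream network that scores inlier likelihood.

```python
import numpy as np

def weighted_rigid_fit(src, dst, weights):
    """Weighted least-squares rigid transform (R, t) such that dst ~ R @ src + t.

    src, dst: (N, 3) arrays of corresponding keypoints.
    weights:  (N,) non-negative scores, e.g. estimated inlier likelihoods
              (assumed to be supplied by an upstream predictor).
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                      # normalise so the weights sum to 1

    # Weighted centroids of both point sets.
    src_mean = w @ src
    dst_mean = w @ dst
    src_c = src - src_mean
    dst_c = dst - dst_mean

    # Weighted 3x3 cross-covariance matrix.
    H = (src_c * w[:, None]).T @ dst_c

    # Closed-form rotation via SVD, with a reflection guard.
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_mean - R @ src_mean
    return R, t
```

Because every correspondence contributes in proportion to its weight, likely outliers are suppressed smoothly rather than being rejected by random sampling, which is the intuition behind replacing RANSAC when the inlier ratio is low.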

Paper Structure (21 sections, 7 equations, 4 figures, 7 tables)

This paper contains 21 sections, 7 equations, 4 figures, 7 tables.

Figures (4)

  • Figure 1: An illustration of our approach to the task of LiDAR-Visual re-localisation. SOLVR constructs 3D submaps from a stream of input camera images in order to align the camera and LiDAR sensor modalities. The submap is used to retrieve a corresponding place from a database of LiDAR scans, at which point the submap and scan are registered in order to estimate the current sensor pose.
  • Figure 2: Partial submap generation pipeline. For each pair of stereo images, we create a 3D depth projection by predicting the metric depth using a neural network and projecting each pixel into 3D using the camera intrinsics. We then align the depth projections from a local window using the camera trajectory and fuse them using a Bayesian occupancy update to construct our partial submap. We note that the noise in zoomed-in region (c) is significantly reduced in (d), due to the Bayesian occupancy update performed when integrating individual depth projections into the partial submap.
  • Figure 3: 6-DoF LiDAR-Visual Registration accuracy versus distance between query and candidate pairs on sequences KITTI-09 and 10.
  • Figure 4: Recall@1 at 5m threshold vs. 6-DoF Registration accuracy for LiDAR-Visual place recognition and registration for shared and mixed training thresholds on KITTI-00.
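
The submap generation described in the Figure 2 caption (back-projecting predicted depth with the camera intrinsics, aligning views with the camera trajectory, and fusing them with an occupancy update) can be sketched roughly as follows. This is an illustrative simplification under stated assumptions, not the paper's pipeline: all function names are hypothetical, the log-odds increment and threshold are arbitrary example values, and only "hit" updates are applied, whereas a full occupancy framework would also update free space along each ray.

```python
import numpy as np

def backproject_depth(depth, K):
    """Lift an (H, W) metric depth map to (H*W, 3) points in the camera frame."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

def transform_points(points, T_world_cam):
    """Apply a 4x4 camera-to-world pose to (N, 3) points."""
    return points @ T_world_cam[:3, :3].T + T_world_cam[:3, 3]

def fuse_log_odds(log_odds, points_world, voxel_size, hit_log_odds=0.85):
    """Add a 'hit' increment to each voxel that contains at least one point.

    log_odds is a dict mapping integer voxel indices (i, j, k) to log-odds.
    hit_log_odds=0.85 is an assumed example value, not taken from the paper.
    """
    idx = np.unique(np.floor(points_world / voxel_size).astype(int), axis=0)
    for key in map(tuple, idx.tolist()):
        log_odds[key] = log_odds.get(key, 0.0) + hit_log_odds
    return log_odds

def occupied_voxels(log_odds, threshold=1.5):
    """Voxels whose accumulated evidence exceeds an (assumed) threshold."""
    return {k for k, v in log_odds.items() if v > threshold}
```

In this sketch a voxel is only reported as occupied after it has been observed in several views, so spurious points from a single noisy depth prediction tend to be filtered out, which is consistent with the noise reduction noted between panels (c) and (d).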