Increasing SLAM Pose Accuracy by Ground-to-Satellite Image Registration

Yanhao Zhang, Yujiao Shi, Shan Wang, Ankit Vora, Akhil Perincherry, Yongbo Chen, Hongdong Li

TL;DR

The paper addresses long-term drift in vision-based SLAM for autonomous driving by introducing a fusion framework with ground-to-satellite (G2S) image registration. It combines a deep-learning-based G2S registration (BoostG2SLoc) with a coarse-to-fine G2S pose selection and a scaled pose-graph optimization that estimates per-frame scale factors $s_k$, producing drift-corrected trajectories. The method demonstrates improved translation and rotation accuracy on KITTI and FordAV datasets, with iterative trajectory refinement enhancing robustness. This work offers a practical path toward GPS-independent, globally-consistent localization suitable for real-world autonomous driving scenarios.

Abstract

Vision-based localization for autonomous driving has attracted considerable research interest. When a pre-built 3D map is not available, visual simultaneous localization and mapping (SLAM) is typically adopted. Because errors accumulate, visual SLAM (vSLAM) usually suffers from long-term drift. This paper proposes a framework that increases localization accuracy by fusing vSLAM with a deep-learning-based ground-to-satellite (G2S) image registration method. In this framework, a coarse-to-fine method (a spatial correlation bound check followed by a visual odometry consistency check) is designed to select the valid G2S predictions. The selected predictions are then fused with the SLAM measurements by solving a scaled pose graph problem. To further increase the localization accuracy, we provide an iterative trajectory fusion pipeline. The proposed framework is evaluated on two well-known autonomous driving datasets, and the results demonstrate its accuracy and robustness in vehicle localization.
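The coarse-to-fine selection mentioned in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the function name `select_g2s`, the per-prediction confidence `scores`, the thresholds `score_bound` and `vo_tol`, and the reduction to 2D positions in a shared frame are all assumptions made for illustration. The coarse step is approximated by a simple score bound, and the fine step keeps a prediction only if its displacement to a neighbouring candidate agrees with the displacement reported by SLAM/VO.

```python
import numpy as np

def select_g2s(slam_xy, g2s_xy, scores, score_bound=0.5, vo_tol=1.0):
    """Return indices of G2S predictions that pass both checks (illustrative only).

    slam_xy, g2s_xy: (N, 2) positions, assumed to be expressed in the same frame.
    scores: hypothetical per-prediction confidence values.
    """
    slam_xy = np.asarray(slam_xy, dtype=float)
    g2s_xy = np.asarray(g2s_xy, dtype=float)

    # Coarse check (assumed form): discard predictions below a score bound.
    coarse = [k for k, s in enumerate(scores) if s >= score_bound]

    def consistent(i, j):
        # Fine check (assumed form): the G2S displacement between frames i and j
        # must match the SLAM/VO displacement within a tolerance.
        d_slam = slam_xy[j] - slam_xy[i]
        d_g2s = g2s_xy[j] - g2s_xy[i]
        return np.linalg.norm(d_slam - d_g2s) <= vo_tol

    keep = []
    for pos, k in enumerate(coarse):
        # Compare against the previous and next coarse candidates, if any.
        neighbours = coarse[max(pos - 1, 0):pos] + coarse[pos + 1:pos + 2]
        if any(consistent(k, j) for j in neighbours):
            keep.append(k)
    return keep
```

In this sketch a prediction with a large lateral jump is rejected by the fine check even when its score is high, because its displacement to every neighbouring candidate disagrees with the SLAM motion.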

Paper Structure

This paper contains 26 sections, 7 equations, 6 figures, 3 tables, 1 algorithm.

Figures (6)

  • Figure 1: The proposed framework fuses vSLAM with G2S registration and estimates the camera trajectory with high accuracy. The inputs are poses from stereo SLAM, ground-view images, and satellite images; the output is an updated vehicle trajectory. The example shows the localization error using the colour map (unit: m).
  • Figure 2: A flowchart showing the main processes of the proposed framework. For each vehicle pose, we calculate the G2S prediction $\breve{\mathbf{T}}$ using the ground-view and the corresponding satellite images. The valid predictions are selected via a coarse-to-fine procedure, and are fused with the relative poses (from the original SLAM trajectory) by solving a scaled pose graph optimization. The localization error is shown using the colour map (unit: m).
  • Figure 3: Error distribution (unit: m) of the raw G2S predictions and that of the selected predictions. Here, we use seven sequences ('04'-'10') from KITTI Odometry Benchmark for evaluation. We do not update the trajectory to avoid the effect from other modules.
  • Figure 4: A comparison of localization error distribution. The RMSE is shown in Table \ref{tab_compare_slam}. The 1st-2nd rows are the rotation (unit: $^\circ$) and translation (unit: m) error distribution of each sequence, where $\mathbf{S}$ is by trajectory origin. The 3rd row reports the histograms of all rotation and translation errors, where $\mathbf{S}$ is by multiple ground truth. Overall, the error by the proposed framework is lower and more concentrated.
  • Figure 5: Examples of the estimated trajectories on KITTI. The first panel shows a scenario without loop closure. In the second and third panels, the SLAM trajectories include loop closure. In all cases, our estimated trajectories (red) are very close to the ground truth (green).
  • ...and 1 more figure
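The fusion step described in the Figure 2 caption (selected G2S predictions combined with relative SLAM poses in a scaled pose graph) can be illustrated with a deliberately simplified toy. It is not the paper's formulation, which operates on full vehicle poses: here positions are 1D, each relative SLAM displacement gets its own scale factor $s_k$, and a weak prior pulls each $s_k$ towards 1. The function name `fuse_1d` and the weight `w_scale` are hypothetical. Under these assumptions the problem is linear and can be solved with ordinary least squares.

```python
import numpy as np

def fuse_1d(odo, g2s, w_scale=0.1):
    """Toy 1D scaled pose graph (illustrative assumption, not the paper's solver).

    odo: list of n relative displacements from SLAM.
    g2s: dict mapping frame index -> absolute position from a selected G2S prediction.
    Returns (positions x_0..x_n, scales s_0..s_{n-1}).
    """
    n = len(odo)
    n_unknowns = (n + 1) + n  # positions first, then scales
    rows, rhs = [], []

    # Relative-pose terms: x_{k+1} - x_k - s_k * odo_k = 0
    for k, d in enumerate(odo):
        r = np.zeros(n_unknowns)
        r[k + 1], r[k], r[n + 1 + k] = 1.0, -1.0, -d
        rows.append(r)
        rhs.append(0.0)

    # Absolute terms from the selected G2S predictions: x_k = g
    for k, g in g2s.items():
        r = np.zeros(n_unknowns)
        r[k] = 1.0
        rows.append(r)
        rhs.append(g)

    # Weak prior keeping each scale near 1: w_scale * (s_k - 1) = 0
    for k in range(n):
        r = np.zeros(n_unknowns)
        r[n + 1 + k] = w_scale
        rows.append(r)
        rhs.append(w_scale)

    sol, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return sol[: n + 1], sol[n + 1:]
```

For example, with four unit displacements from SLAM and G2S anchors at positions 0.0 and 4.4 on the first and last frames, the recovered scales move towards 1.1, stretching the trajectory to agree with the anchors. A real system would replace the linear solve with a nonlinear optimizer over rotations and translations.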