
Evaluating Global Geo-alignment for Precision Learned Autonomous Vehicle Localization using Aerial Data

Yi Yang, Xuran Zhao, H. Charles Zhao, Shumin Yuan, Samuel M. Bateman, Tiffany A. Huang, Chris Beall, Will Maddern

TL;DR

This work tackles sub-meter global localization for autonomous vehicles by leveraging aerial data and two training-time data-alignment strategies. It shows that aligning aerial maps to vehicle data (or vice versa) before learning significantly improves localization accuracy, and it introduces a two-branch cross-modal localization model trained with alignment-derived ground truth. On a large-scale 1600 km SF Bay Area dataset, vehicle-to-map alignment with DSM and RGB imagery yields sub-meter position error and sub-degree yaw error, while even RGB-only data can achieve competitive medians when trained with proper alignment. The findings underscore the practical potential of low-cost aerial data for precise, scalable autonomous-vehicle localization and guide future improvements in cross-modal geo-localization pipelines.
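The TL;DR summarizes results as position and yaw errors, including medians. The following is a minimal sketch of how such per-frame errors could be computed. It assumes planar poses given as hypothetical (x_m, y_m, yaw_deg) tuples in a shared metric frame, and the helper names are illustrative. The paper's exact evaluation protocol is not described in this summary.

```python
import math
import statistics

def wrap_angle_deg(a):
    """Wrap an angle difference into [-180, 180) degrees."""
    return (a + 180.0) % 360.0 - 180.0

def pose_errors(estimates, ground_truth):
    """Per-frame planar position error (m) and absolute yaw error (deg).

    Each pose is an assumed (x_m, y_m, yaw_deg) tuple; both lists are
    matched frame by frame.
    """
    pos_err, yaw_err = [], []
    for (xe, ye, te), (xg, yg, tg) in zip(estimates, ground_truth):
        pos_err.append(math.hypot(xe - xg, ye - yg))
        # wrap so that e.g. 359.8 deg vs 0.0 deg counts as a 0.2 deg error
        yaw_err.append(abs(wrap_angle_deg(te - tg)))
    return pos_err, yaw_err

def median_errors(pos_err, yaw_err):
    """Median position error (m) and median yaw error (deg)."""
    return statistics.median(pos_err), statistics.median(yaw_err)

est = [(0.1, 0.2, 359.8), (1.0, 1.0, 10.0), (5.0, 5.0, 0.3)]
gt = [(0.0, 0.0, 0.0), (1.0, 1.3, 10.4), (5.0, 5.0, 0.0)]
med_pos, med_yaw = median_errors(*pose_errors(est, gt))
```

Wrapping the yaw difference matters: without it, a heading estimate just below 360° compared against a ground truth near 0° would register as an error of almost a full turn.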

Abstract

Recently there has been growing interest in the use of aerial and satellite map data for autonomous vehicles, primarily due to its potential for significant cost reduction and enhanced scalability. Despite these advantages, aerial data also poses challenges such as a sensor-modality gap and a viewpoint gap. Learned localization methods have shown promise for overcoming these challenges to provide precise metric localization for autonomous vehicles. Most learned localization methods rely on coarsely aligned ground truth or on implicit consistency-based methods to learn the localization task; in this paper, however, we find that improving the alignment between aerial data and autonomous vehicle sensor data at training time is critical to the performance of a learning-based localization system. We compare two data alignment methods using a factor graph framework and then use them to evaluate, through ablation studies, the effect of closely aligned ground truth on learned localization accuracy. Finally, we evaluate a learned localization system trained with these data alignment methods on a comprehensive (1600 km) autonomous vehicle dataset and demonstrate localization errors below 0.3 m and 0.5°, sufficient for autonomous vehicle applications.
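Both alignment workflows estimate a rigid correction between vehicle data and aerial data. The paper does this in a factor graph framework whose factors are not detailed in this summary. As a much-simplified, hypothetical stand-in, consider a graph with a single SE(2) variable and only point-correspondence factors with equal isotropic noise. In that case the maximum-likelihood estimate reduces to the closed-form 2D Procrustes (Kabsch) solution sketched below. The LiDAR-to-DSM point correspondences are assumed to be given, and the function names are illustrative.

```python
import math

def align_rigid_2d(src, dst):
    """Least-squares SE(2) transform (theta, tx, ty) mapping src onto dst.

    src, dst: equal-length lists of matched (x, y) points. Minimizes
    sum_i ||R(theta) p_i + t - q_i||^2 in closed form.
    """
    n = len(src)
    if n < 2 or n != len(dst):
        raise ValueError("need at least two matched point pairs")
    # centroids of both point sets
    px = sum(p[0] for p in src) / n
    py = sum(p[1] for p in src) / n
    qx = sum(q[0] for q in dst) / n
    qy = sum(q[1] for q in dst) / n
    # dot and cross terms of the centered correspondences
    s_dot = s_cross = 0.0
    for (ax, ay), (bx, by) in zip(src, dst):
        ax, ay, bx, by = ax - px, ay - py, bx - qx, by - qy
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    # the rotation maximizing sum_i q_i . R p_i
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    # the translation maps the rotated source centroid onto the target centroid
    tx = qx - (c * px - s * py)
    ty = qy - (s * px + c * py)
    return theta, tx, ty

def apply_se2(theta, tx, ty, pts):
    """Apply an SE(2) transform to a list of (x, y) points."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in pts]

vehicle_pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
aerial_pts = apply_se2(0.3, 2.0, -1.0, vehicle_pts)
theta, tx, ty = align_rigid_2d(vehicle_pts, aerial_pts)
```

A real factor graph would add robust noise models, many pose variables along the vehicle trajectory, and correspondence search. This closed form only shows the core least-squares step for one rigid offset.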

Paper Structure

This paper contains 18 sections, 5 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: We show that explicit alignment between vehicle LiDAR and aerial DSM at training time (top) is key to unlocking sub-meter-accurate learned global localization using aerial imagery on urban roads (bottom). In this paper we compare approaches for map data geo-alignment (blue box) at training time for a learning-based autonomous vehicle localization system. The resulting localization solutions are significantly improved at run-time to an accuracy sufficient for autonomous vehicle applications.
  • Figure 2: We compare two classes of alignment approach, where (a) shows the workflow to align aerial map data to vehicle data and (b) shows the workflow to align vehicle data to aerial map data.
  • Figure 3: Localization model architecture inspired by Barsan et al. (2018) and OrienterNet (Sarlin et al., 2023). The model consists of two components: an online encoder (top), which consumes onboard LiDAR spins, and a geospatial encoder (bottom), which consumes aerial DSM and/or imagery. The embedding images produced by these two components are aligned with each other by computing the cross-correlation over a search window of possible x, y, and θ offsets.
  • Figure 4: Overlay images of USGS DSM data and the vehicle LiDAR map with (left) no alignment and (right) vehicle-to-map alignment. Vehicle LiDAR data is shown in white-red, and USGS DSM data in green-blue. Note that the LiDAR artifacts in the highlighted region are resolved by the alignment process.
  • Figure 5: Localization results from models trained with no alignment vs. vehicle-to-map alignment. The first row shows, from left to right, USGS DSM, online LiDAR, and USDA aerial imagery; the subsequent rows show models trained without alignment and with vehicle-to-map alignment, respectively. Note the improved contrast of the online embedding (center) and the reduced uncertainty (right) with improved training alignment.
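The cross-correlation step in the Figure 3 caption can be illustrated with a brute-force sketch. It assumes single-channel square embedding grids, nearest-neighbor rotation, and an unnormalized dot-product score. The real embeddings are learned, multi-channel feature maps. The function names are hypothetical. A practical implementation would more likely batch this search on a GPU, for example with convolutions or FFTs, and could normalize the scores into a probability volume over poses.

```python
import math

def rotate_nearest(img, theta):
    """Rotate a square grid by theta radians about its center.

    Uses nearest-neighbor inverse mapping; output cells whose source falls
    outside the grid are set to 0.0.
    """
    n = len(img)
    c = (n - 1) / 2.0
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for col in range(n):
            # map each output cell back to a source cell
            x, y = col - c, r - c
            sx = cos_t * x + sin_t * y + c
            sy = -sin_t * x + cos_t * y + c
            si, sj = round(sy), round(sx)
            if 0 <= si < n and 0 <= sj < n:
                out[r][col] = img[si][sj]
    return out

def correlate(online, aerial, thetas):
    """Exhaustive cross-correlation over (dx, dy, theta).

    online: n x n grid; aerial: m x m grid with m >= n.
    Returns (best_score, dx, dy, theta), where (dx, dy) is the top-left
    offset of the rotated online patch inside the aerial grid.
    """
    n, m = len(online), len(aerial)
    best = None
    for theta in thetas:
        rotated = rotate_nearest(online, theta)
        for dy in range(m - n + 1):
            for dx in range(m - n + 1):
                # dot product between the rotated patch and the aerial window
                score = sum(rotated[r][c] * aerial[dy + r][dx + c]
                            for r in range(n) for c in range(n))
                if best is None or score > best[0]:
                    best = (score, dx, dy, theta)
    return best

# toy aerial embedding with a small asymmetric blob
aerial = [[0.0] * 8 for _ in range(8)]
aerial[3][4], aerial[3][5], aerial[4][4] = 1.0, 2.0, 3.0
# online patch cropped at dx=3, dy=2 with no rotation
online = [row[3:6] for row in aerial[2:5]]
best = correlate(online, aerial, [0.0, math.pi / 2, math.pi])
```

In this toy case, the search recovers the crop offset and the zero rotation. The blob is asymmetric, so no rotated or shifted window matches the patch as well as the true pose does.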