AGL-NET: Aerial-Ground Cross-Modal Global Localization with Varying Scales

Tianrui Guan, Ruiqi Xian, Xijun Wang, Xiyang Wu, Mohamed Elnoor, Daeun Song, Dinesh Manocha

TL;DR

A novel scale and skeleton loss function guides the network toward learning scale-invariant feature representations, eliminating the need to pre-process satellite maps and significantly improving real-world applicability in scenarios where the map scale is unknown.

Abstract

We present AGL-NET, a novel learning-based method for global localization using LiDAR point clouds and satellite maps. AGL-NET tackles two critical challenges: bridging the representation gap between the image and point-cloud modalities for robust feature matching, and handling the inherent scale discrepancies between the global (aerial) view and the local (ground) view. To address these challenges, AGL-NET uses a unified network architecture with a novel two-stage matching design. The first stage extracts informative neural features directly from raw sensor data and performs initial feature matching. The second stage refines this matching by extracting skeleton features and adding a novel scale alignment step that rectifies scale variations between the LiDAR and map data. Furthermore, a novel scale and skeleton loss function guides the network toward learning scale-invariant feature representations, eliminating the need to pre-process satellite maps. This significantly improves real-world applicability in scenarios where the map scale is unknown. To enable rigorous performance evaluation, we introduce a dataset built in the CARLA simulator specifically for metric localization training and assessment. The code and data can be accessed at https://github.com/rayguan97/AGL-Net.
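The abstract does not spell out how the first-stage matching scores candidate poses. A common approach, assumed here and not confirmed by the source, is to slide the local (ground) feature map over the aerial feature map at every translation and a set of rotations, score each placement by feature correlation, and normalize the scores into a pose likelihood like the one visualized in Figure 4. This sketch uses plain Python lists and restricts rotations to multiples of 90° to avoid interpolation; the names `rotate90`, `correlate`, and `match_poses` are hypothetical.

```python
import math
from typing import List, Tuple

# A feature map is a list of channels; each channel is an H x W grid of floats.
FeatureMap = List[List[List[float]]]
# Hypothetical pose encoding: (row, column, rotation in degrees).
Pose = Tuple[int, int, int]


def rotate90(fmap: FeatureMap, k: int) -> FeatureMap:
    """Rotate every channel counter-clockwise by k * 90 degrees."""
    out = fmap
    for _ in range(k % 4):
        # Transpose, then reverse the row order: one counter-clockwise turn.
        out = [[list(row) for row in zip(*ch)][::-1] for ch in out]
    return out


def correlate(aerial: FeatureMap, local: FeatureMap, top: int, left: int) -> float:
    """Sum of per-pixel feature dot products with `local` placed at (top, left)."""
    h, w = len(local[0]), len(local[0][0])
    score = 0.0
    for a_ch, l_ch in zip(aerial, local):
        for i in range(h):
            for j in range(w):
                score += a_ch[top + i][left + j] * l_ch[i][j]
    return score


def match_poses(aerial: FeatureMap, local: FeatureMap) -> Tuple[Pose, List[Tuple[Pose, float]]]:
    """Score every (row, col, rotation) placement; return the best pose and a likelihood."""
    H, W = len(aerial[0]), len(aerial[0][0])
    scores: List[Tuple[Pose, float]] = []
    for k in range(4):
        rotated = rotate90(local, k)
        h, w = len(rotated[0]), len(rotated[0][0])
        for top in range(H - h + 1):
            for left in range(W - w + 1):
                scores.append(((top, left, 90 * k), correlate(aerial, rotated, top, left)))
    # Softmax (shifted by the peak for numerical stability) gives a distribution over poses.
    peak = max(s for _, s in scores)
    weights = [math.exp(s - peak) for _, s in scores]
    total = sum(weights)
    likelihood = [(pose, wt / total) for (pose, _), wt in zip(scores, weights)]
    best_pose = max(likelihood, key=lambda item: item[1])[0]
    return best_pose, likelihood


if __name__ == "__main__":
    aerial = [[[0, 0, 0, 0], [0, 1, 2, 0], [0, 0, 0, 0], [0, 0, 0, 0]]]
    local = [[[1, 2], [0, 0]]]
    best, likelihood = match_poses(aerial, local)
    print(best)  # (1, 1, 0)
```

On this toy input the exact patch placement scores highest. A learned system would replace the hand-made grids with encoder features, use finer rotation bins, and batch the correlation as a tensor operation.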

Paper Structure (18 sections, 8 equations, 4 figures, 3 tables)

Figures (4)

  • Figure 1: Overview of global localization and the proposed AGL-Net: Given a local ground LiDAR scan and an aerial-view map, our goal is to identify the position and orientation of the ground observation relative to the map. This task presents two significant challenges: cross-modality matching and the varying scale of the map. To address these, our method employs a unified network designed to process both point and image modalities while explicitly managing the scale discrepancies between ground and aerial views. Our assumption of an unknown scale not only distinguishes our method from previous approaches [Tang2021GetTT, Tang2020RSLNetLI] but also makes the task more challenging.
  • Figure 2: Data diversity from the CARLA simulator for global localization: We show the overhead image with the LiDAR points overlaid in red at their corresponding location. In each pair, we show images of the same area at different scales (a, b, d, e), orientations (c, f), and lighting conditions (f). Since the ground and aerial data may be collected at different times, some LiDAR points, such as those on dynamic objects (cars, etc.), have no counterpart in the aerial view, while static objects (buildings, etc.) match the ground scan well.
  • Figure 3: Architecture of our proposed network AGL-Net: AGL-Net processes LiDAR point clouds and aerial maps through separate encoders to generate neural feature representations. These features then undergo a two-stage matching process: initial matching for general correspondence, and skeleton-based matching with a predicted scale adjustment to account for potential scale discrepancies. Finally, AGL-Net fuses the results from both stages to generate a robust final estimation score for accurate camera pose determination.
  • Figure 4: AGL-Net output visualization in CARLA simulation: We use a green arrow for the ground truth and a blue arrow for the predicted pose. We highlight and enlarge the likelihood region near the ground truth in the red circle. Even with a larger location error (top left), the pose likelihood distribution has higher values along the road lane.
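Figure 3 describes a second stage that resizes the local skeleton features by a predicted scale before matching, then fuses the scores of both stages. The caption gives neither the resampling method nor the fusion rule, so this sketch assumes both. If the local BEV grid has $r_l$ meters per cell and the aerial map has $r_m$ meters per pixel, a local patch of $n$ cells spans $n r_l / r_m$ map pixels, so the local grid must be enlarged by $s = r_l / r_m$; here that is done with nearest-neighbour resampling, and $s$ is taken as given rather than predicted by a network. The per-pose likelihoods of the two stages are then combined as a weighted sum of log-probabilities with a hypothetical weight `alpha`. The names `rescale_nearest` and `fuse_scores` are illustrative only.

```python
import math
from typing import Dict, List, Tuple

Grid = List[List[float]]
# Hypothetical pose encoding: (row, column, rotation in degrees).
Pose = Tuple[int, int, int]


def rescale_nearest(grid: Grid, scale: float) -> Grid:
    """Resample a local BEV grid so its size becomes roughly `scale` times the input.

    `scale` is assumed to be r_l / r_m (local meters per cell over map meters per pixel).
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    h, w = len(grid), len(grid[0])
    new_h = max(1, round(h * scale))
    new_w = max(1, round(w * scale))
    # Each output cell copies the nearest source cell, clamped to the grid bounds.
    return [
        [grid[min(h - 1, int(i / scale))][min(w - 1, int(j / scale))] for j in range(new_w)]
        for i in range(new_h)
    ]


def fuse_scores(stage1: Dict[Pose, float], stage2: Dict[Pose, float], alpha: float = 0.5) -> Pose:
    """Return the pose with the highest weighted sum of log-likelihoods from both stages."""
    eps = 1e-12  # keeps log() finite when a stage assigns zero probability
    fused = {
        pose: alpha * math.log(p1 + eps) + (1.0 - alpha) * math.log(stage2.get(pose, 0.0) + eps)
        for pose, p1 in stage1.items()
    }
    return max(fused, key=fused.get)


if __name__ == "__main__":
    print(rescale_nearest([[1, 2], [3, 4]], 2.0))
    stage1 = {(0, 0, 0): 0.6, (1, 1, 0): 0.4}
    stage2 = {(0, 0, 0): 0.1, (1, 1, 0): 0.9}
    print(fuse_scores(stage1, stage2))  # (1, 1, 0)
```

In the usage example the first stage slightly prefers one pose, but the second stage is confident about another, and the fused score follows the second stage. Log-space fusion is one simple choice; a learned fusion layer would be an equally plausible alternative.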