OrthoLoC: UAV 6-DoF Localization and Calibration Using Orthographic Geodata

Oussema Dhaouadi, Riccardo Marin, Johannes Meier, Jacques Kaiser, Daniel Cremers

TL;DR

OrthoLoC introduces a UAV localization paradigm that leverages orthographic geodata, namely digital orthophotos (DOP) and digital surface models (DSM) carrying elevation, to enable 6-DoF pose estimation in GNSS-denied environments. It provides a large-scale, multi-modal dataset with precise ground-truth poses and proposes AdHoP, a geometry-driven refinement that warps the DOP with a homography to reduce perspective disparities before re-estimating the pose. The approach is backbone-agnostic and shows that 2.5D geodata can yield accurate localization, with the best results obtained by dense matchers combined with AdHoP. Camera calibration remains challenging because of the ambiguity between focal length and translation, although higher data resolution and greater covisibility improve it. The work also offers a benchmarking framework for fair cross-domain evaluation of localization and calibration methods against orthographic references, which could accelerate practical deployment in resource-constrained UAV missions.
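Why focal length and translation are hard to separate can be illustrated with a standard pinhole-model argument (our own illustration, not a derivation taken from the paper). Assume a nadir-looking camera with focal length f and principal point c_x at height Z_0 above terrain with local relief δ, where |δ| is much smaller than Z_0, and let X be a ground point's offset from the optical axis. Scaling focal length and height together by a factor k changes the projection u to u':

```latex
\begin{aligned}
u - c_x  &= \frac{f X}{Z_0 + \delta}
          = \frac{f X}{Z_0}\left(1 - \frac{\delta}{Z_0}\right)
            + \frac{f X}{Z_0}\,O\!\left(\frac{\delta^2}{Z_0^2}\right) \\
u' - c_x &= \frac{k f X}{k Z_0 + \delta}
          = \frac{f X}{Z_0}\left(1 - \frac{\delta}{k Z_0}\right)
            + \frac{f X}{Z_0}\,O\!\left(\frac{\delta^2}{Z_0^2}\right) \\
u' - u   &= \frac{f X}{Z_0}\cdot\frac{\delta}{Z_0}\left(1 - \frac{1}{k}\right)
            + \frac{f X}{Z_0}\,O\!\left(\frac{\delta^2}{Z_0^2}\right)
\end{aligned}
```

For flat terrain (δ = 0) the two images coincide for every k, so focal length and camera height cannot be told apart; only the relief term, of relative order δ/Z_0, breaks the tie.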

Abstract

Accurate visual localization from aerial views is a fundamental problem with applications in mapping, large-area inspection, and search-and-rescue operations. In many scenarios, these systems require high-precision localization while operating with limited resources (e.g., no internet connection or GNSS/GPS support), making large image databases or heavy 3D models impractical. Surprisingly, little attention has been given to leveraging orthographic geodata as an alternative paradigm, which is lightweight and increasingly available through free releases by governmental authorities (e.g., the European Union). To fill this gap, we propose OrthoLoC, the first large-scale dataset comprising 16,425 UAV images from Germany and the United States with multiple modalities. The dataset addresses domain shifts between UAV imagery and geospatial data. Its paired structure enables fair benchmarking of existing solutions by decoupling image retrieval from feature matching, allowing isolated evaluation of localization and calibration performance. Through comprehensive evaluation, we examine the impact of domain shifts, data resolutions, and covisibility on localization accuracy. Finally, we introduce a refinement technique called AdHoP, which can be integrated with any feature matcher, improving matching by up to 95% and reducing translation error by up to 63%. The dataset and code are available at: https://deepscenario.github.io/OrthoLoC.
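Localization accuracy is scored by comparing estimated and ground-truth poses. The sketch below shows one common convention, assuming world-to-camera poses [R|t]: the rotation error is the geodesic angle between the two rotations, and the translation error is the distance between the camera centres C = -R^T t. The paper's exact metric definitions may differ, and all function names here are our own.

```python
import numpy as np


def rotation_error_deg(R_est, R_gt):
    """Geodesic angle (degrees) between two 3x3 rotation matrices."""
    cos_angle = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    # Clip guards against values slightly outside [-1, 1] caused by round-off.
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))


def translation_error(R_est, t_est, R_gt, t_gt):
    """Distance between camera centres C = -R^T t of world-to-camera poses [R|t]."""
    c_est = -R_est.T @ t_est
    c_gt = -R_gt.T @ t_gt
    return float(np.linalg.norm(c_est - c_gt))


def relative_reduction(before, after):
    """Percentage by which an error shrinks, e.g. without vs. with a refinement."""
    return 100.0 * (before - after) / before
```

For example, a hypothetical translation error that drops from 10.0 m to 3.7 m gives relative_reduction(10.0, 3.7) ≈ 63, i.e. a 63% reduction. These numbers are illustrative only.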

Paper Structure

This paper contains 61 sections, 29 equations, 16 figures, 7 tables.

Figures (16)

  • Figure 1: Georeferenced UAV Localization / Calibration with Orthographic Geodata. Our framework bridges the aerial-to-orthographic domain gap. It enables precise 6-DoF localization and calibration using only DOP and DSM geodata. This approach works even in GNSS-denied environments without requiring expensive 3D models or image databases.
  • Figure 2: Data Modalities in the Dataset. Each sample includes a query image, a point map (represented as a depth map), a local mesh, visible 3D keypoints, and photogrammetrically reconstructed DOP/DSM. The dataset also includes an augmented version of DOP/DSM derived from secondary sources, introducing domain gaps for increased variability.
  • Figure 3: Dataset Creation Pipeline. First, (A) data acquisition collects UAV imagery. Combined with georeferencing techniques such as ground control points (GCP) and real-time kinematic positioning (RTK), this imagery is used to reconstruct a georeferenced, textured 3D mesh. Geodata is then derived through rasterization and orthographic rendering. Next, (B) data pairing identifies a region of interest for each query image via raycasting. These regions are randomly expanded, and the geometric elements are then cropped to form samples. Finally, (C) the data is augmented with geodata from external sources, and their spatial alignment is verified.
  • Figure 4: UAV 6-DoF Localization and Calibration with AdHoP: (A) Initial Localization / Calibration: We match features between the query image and DOP (1), lift the correspondences to 3D using the DSM (2), and compute an initial pose and optional intrinsics (3). (B) AdHoP Refinement: Using the initial 2D-2D correspondences, we estimate a homography to warp the DOP (4), thereby reducing perspective differences. This enables enhanced feature matching on the warped orthophoto (5). The new correspondences are then mapped back to the original unwarped coordinate space (6), lifted to 3D using the DSM (7), and used to compute refined camera parameters (8). The refinement is accepted only when it reduces the reprojection error (9).
  • Figure 5: Localization Without and With AdHoP. Matching results of XFeat* [xfeat_24], showing 3D keypoint projections in green (using the ground-truth pose) and red (using the estimated pose). Blue lines indicate the discrepancies between the estimated and ground-truth projections.
  • ...and 11 more figures
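The geometric steps of the AdHoP loop in Figure 4 can be sketched with plain NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The following are all our own choices: the north-up georeferencing (hypothetical `origin` and `gsd` parameters), the nearest-neighbour DSM lookup, the unnormalized DLT homography fit, and the mean-reprojection-error acceptance test. Every function name is hypothetical. Feature matching (steps 1 and 5), image warping (step 4), and PnP (steps 3 and 8) are left to existing tools.

```python
import numpy as np


def lift_to_3d(uv, dsm, origin, gsd):
    """Lift DOP pixel coordinates (u = column, v = row) to 3D world points.

    Hypothetical georeferencing: a north-up raster whose top-left corner sits at
    world coordinates origin = (x0, y0), with square pixels of gsd metres.
    Heights come from the co-registered DSM by nearest-neighbour lookup.
    """
    uv = np.asarray(uv, dtype=float)
    cols = np.clip(np.rint(uv[:, 0]).astype(int), 0, dsm.shape[1] - 1)
    rows = np.clip(np.rint(uv[:, 1]).astype(int), 0, dsm.shape[0] - 1)
    x = origin[0] + uv[:, 0] * gsd  # easting grows with the column index
    y = origin[1] - uv[:, 1] * gsd  # northing shrinks with the row index
    z = dsm[rows, cols]
    return np.stack([x, y, z], axis=1)


def fit_homography(src, dst):
    """Direct linear transform: H with dst ~ H @ src (needs >= 4 points)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)  # right singular vector of the smallest singular value
    return H / H[2, 2]


def apply_homography(H, pts):
    """Map Nx2 points through a 3x3 homography, with perspective division."""
    pts = np.asarray(pts, dtype=float)
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return ph[:, :2] / ph[:, 2:3]


def reprojection_error(K, R, t, X, uv):
    """Mean pixel distance between observations uv and projections of X under [R|t]."""
    cam = X @ R.T + t  # world -> camera coordinates
    proj = cam @ K.T   # apply intrinsics
    proj = proj[:, :2] / proj[:, 2:3]
    return float(np.mean(np.linalg.norm(proj - uv, axis=1)))


def select_pose(initial, refined):
    """Keep the refined (pose, error) pair only if it lowers the error (step 9)."""
    return refined if refined[1] < initial[1] else initial
```

In a full pipeline, one would fit `H` on the initial DOP-to-query matches and warp the DOP with it (for example with OpenCV's `cv2.warpPerspective`). After matching again, the new DOP-side keypoints are mapped back with `apply_homography(np.linalg.inv(H), pts)` and lifted with `lift_to_3d`. PnP is then re-run (for example with `cv2.solvePnPRansac`), and the result is kept via `select_pose`. For real pixel coordinates, the bare DLT shown here should be replaced by a robust estimator such as `cv2.findHomography` with RANSAC, or at least preceded by Hartley-style coordinate normalization before the SVD.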