Scale-Aware UAV-to-Satellite Cross-View Geo-Localization: A Semantic Geometric Approach

Yibin Ye, Shuo Chen, Kun Wang, Xiaokai Song, Jisheng Dang, Qifeng Yu, Xichao Teng, Zhang Li

TL;DR

This paper proposes a geometric framework that recovers the absolute metric scale of monocular UAV images from semantic anchors. The framework also shows strong potential for downstream applications such as passive UAV altitude estimation and 3D model scale recovery.

Abstract

Cross-View Geo-Localization (CVGL) between UAV imagery and satellite images plays a crucial role in target localization and UAV self-positioning. However, most existing methods rely on the idealized assumption of scale consistency between UAV queries and satellite galleries, overlooking the severe scale ambiguity commonly encountered in real-world scenarios. This discrepancy leads to field-of-view misalignment and feature mismatch, significantly degrading CVGL robustness. To address this issue, we propose a geometric framework that recovers the absolute metric scale from monocular UAV images using semantic anchors. Specifically, small vehicles (SVs), characterized by relatively stable prior size distributions and high detectability, are exploited as metric references. A Decoupled Stereoscopic Projection Model is introduced to estimate the absolute image scale from these semantic targets. By decomposing vehicle dimensions into radial and tangential components, the model compensates for perspective distortions in 2D detections of 3D vehicles, enabling more accurate scale estimation. To further reduce intra-class size variation and detection noise, a dual-dimension fusion strategy with Interquartile Range (IQR)-based robust aggregation is employed. The estimated global scale is then used as a physical constraint for scale-adaptive satellite image cropping, improving UAV-to-satellite feature alignment. Experiments on augmented DenseUAV and UAV-VisLoc datasets demonstrate that the proposed method significantly improves CVGL robustness under unknown UAV image scales. Additionally, the framework shows strong potential for downstream applications such as passive UAV altitude estimation and 3D model scale recovery.
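
The IQR-based robust aggregation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it ignores the Decoupled Stereoscopic Projection Model entirely and assumes near-nadir detections, where a vehicle's pixel length and width map directly to ground distance. The prior dimensions (`PRIOR_LENGTH_M`, `PRIOR_WIDTH_M`), the function names, and the choice to pool length-based and width-based estimates before taking a median are all illustrative assumptions.

```python
import numpy as np

# Assumed prior dimensions of a small vehicle in metres.
# Illustrative values only; they are not taken from the paper.
PRIOR_LENGTH_M = 4.5
PRIOR_WIDTH_M = 1.8


def iqr_filter(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    mask = (values >= q1 - k * iqr) & (values <= q3 + k * iqr)
    return values[mask]


def estimate_gsd(lengths_px, widths_px):
    """Estimate a global ground sampling distance in metres per pixel.

    Each detection contributes two per-instance estimates, one from its
    pixel length and one from its pixel width. The pooled estimates are
    filtered with the IQR rule and summarised by their median.
    """
    lengths_px = np.asarray(lengths_px, dtype=float)
    widths_px = np.asarray(widths_px, dtype=float)
    gsd_from_length = PRIOR_LENGTH_M / lengths_px
    gsd_from_width = PRIOR_WIDTH_M / widths_px
    candidates = np.concatenate([gsd_from_length, gsd_from_width])
    inliers = iqr_filter(candidates)
    return float(np.median(inliers))


if __name__ == "__main__":
    # Five hypothetical detections; the last length is a gross outlier
    # (for example, a truck mistaken for a small vehicle).
    lengths = [45.0, 44.0, 46.0, 45.0, 90.0]
    widths = [18.0, 18.0, 17.5, 18.5, 18.0]
    print(f"estimated GSD: {estimate_gsd(lengths, widths):.4f} m/px")
```

The median after IQR filtering is one simple robust summary; a trimmed mean or a weighted scheme based on detection confidence would fit the same slot.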

Paper Structure

This paper contains 21 sections, 18 equations, 8 figures, 6 tables, and 1 algorithm.

Figures (8)

  • Figure 1: Scale comparison in UAV-to-Satellite CVGL: (a) For existing datasets (DenseUAV, GTA-UAV, SUES-200, University-1652), UAV image scales (marked by arrows) are consistent with satellite images (purple arrow, normalized to 1.0), with maximum difference not exceeding a factor of 2; (b) Under unknown scale, imprecise satellite cropping causes huge scale/FOV discrepancies between UAV images and satellite crops.
  • Figure 2: Overview of the proposed Scale-aware CVGL framework. The method first utilizes oriented bounding boxes of small vehicles as semantic anchors. Notably, off-center targets (orange frame) exhibit significant stereoscopic effects, necessitating the incorporation of vehicle height into the modeling. To address this, we employ a decoupled stereoscopic projection model combined with statistical dimension distributions to estimate the single-instance absolute scale. Finally, robust global scale estimation is conducted to guide the scale-adaptive cropping of satellite imagery, ensuring robust CVGL.
  • Figure 3: DOTA dataset object category analysis. (a) Normalized length (blue) and width (orange) distributions. (b) Average instances per image frequency.
  • Figure 4: Overview of the refined datasets. (a) DenseUAV+: expanded satellite maps enabling scale-adaptive cropping at different altitudes. The blue dots denote the UAV trajectories. (b) UAV-VisLoc+: SfM-based refinement producing ortho-map/DSM and refined poses for accurate relative altitude.
  • Figure 5: Sensitivity to scale mismatch. Success Rate (SR, %) under different relative altitude ratios $\delta$ in $\tilde{H}=H(1+\delta)$ (equivalently, relative scale mismatch). Each subfigure contains two plots: left reports results on queries where our scale estimator is applicable (with sufficient semantic anchors, e.g., $\tau_{conf}\!\ge\!0.5$ and $N\!\ge\!5$), while right reports results on all queries (including those without valid scale estimates). The green vertical line indicates the ground-truth scale ($\delta=0$). The orange band denotes the distribution of our estimated scale errors, visualized as $\mu\pm\sigma$ of $\delta$ (computed on the estimable subset), showing that most estimates fall into the stable regime where CVGL performance is less sensitive to scale.
  • ...and 3 more figures
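
The Figure 5 caption treats an altitude error $\tilde{H}=H(1+\delta)$ as equivalent to a relative scale mismatch. A minimal sketch of why that holds, under assumptions not stated in the source: a nadir-pointing pinhole camera over flat ground, so the ground footprint grows linearly with altitude. The function names, the horizontal field of view, and the satellite resolution below are hypothetical.

```python
import math


def ground_footprint_m(altitude_m, hfov_deg):
    """Ground width covered by a nadir pinhole camera over flat terrain."""
    return 2.0 * altitude_m * math.tan(math.radians(hfov_deg) / 2.0)


def crop_side_px(altitude_m, hfov_deg, sat_gsd_m_per_px):
    """Side length, in satellite pixels, of the crop matching the UAV footprint."""
    return ground_footprint_m(altitude_m, hfov_deg) / sat_gsd_m_per_px


def perturbed_altitude(true_altitude_m, delta):
    """Apply the relative altitude error from the caption: H~ = H * (1 + delta)."""
    return true_altitude_m * (1.0 + delta)


if __name__ == "__main__":
    H = 100.0       # hypothetical true relative altitude in metres
    HFOV = 90.0     # hypothetical horizontal field of view in degrees
    SAT_GSD = 0.5   # hypothetical satellite resolution in metres per pixel

    reference = crop_side_px(H, HFOV, SAT_GSD)
    for delta in (-0.5, 0.0, 0.5, 1.0):
        side = crop_side_px(perturbed_altitude(H, delta), HFOV, SAT_GSD)
        # Under this model the crop side scales by exactly (1 + delta).
        print(f"delta={delta:+.1f}  crop={side:.1f} px  ratio={side / reference:.2f}")
```

Because the footprint is linear in altitude under this model, the ratio between the mismatched and the correct crop side is $1+\delta$ regardless of the field of view or the satellite resolution.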