Scale-adaptive UAV Geo-localization via Height-aware Partition Learning

Quan Chen, Tingyu Wang, Rongfeng Lu, Yu Liu, Bolun Zheng, Zhedong Zheng

TL;DR

This work tackles scale mismatch in UAV-to-satellite geo-localization by introducing SaLPN, a scale-adaptive partition framework. Its height-aware adjustment strategy (HAAS) uses a height-derived factor $\theta$ to adjust drone-view partitions, and a saliency-guided refinement strategy (SGRS) produces robust global, salient, and background descriptors. A square partition strategy (SPS) captures fine-grained and global information simultaneously, and three classifier branches supervise the part-level features in a shared embedding space. Experiments on University-1652 and SUES-200 show state-of-the-art performance and strong robustness to cross-view scale variations, with ablations confirming the effectiveness of SPS, HAAS, and SGRS across ResNet-50 and ViT backbones. The approach advances GNSS-denied UAV geo-localization by enabling explicit semantic alignment across views under varying drone heights, with practical impact for reliable cross-view retrieval in real-world deployment.

Abstract

UAV Geo-Localization faces significant challenges due to the drastic appearance discrepancy between drone-captured images and satellite views. Existing methods typically assume a consistent scaling factor across views and rely on predefined partition alignment to extract viewpoint-invariant representations through part-level feature construction. However, this scaling assumption often fails in real-world scenarios, where variations in drone flight states lead to scale mismatches between cross-view images, resulting in severe performance degradation. To address this issue, we propose a scale-adaptive partition learning framework that leverages known drone flight height to predict scale factors and dynamically adjust feature extraction. Our key contribution is a height-aware adjustment strategy, which calculates the relative height ratio between drone and satellite views, dynamically adjusting partition sizes to explicitly align semantic information between partition pairs. This strategy is integrated into a Scale-adaptive Local Partition Network (SaLPN), building upon an existing square partition strategy to extract both fine-grained and global features. Additionally, we propose a saliency-guided refinement strategy to enhance part-level features, further improving retrieval accuracy. Extensive experiments validate that our height-aware, scale-adaptive approach achieves state-of-the-art geo-localization accuracy in various scale-inconsistent scenarios and exhibits strong robustness against scale variations. The code will be made publicly available.
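
To make the idea of scale-adjusted square partitions concrete, the following is a minimal sketch, not the authors' implementation. It assigns every cell of a square feature map to one of $N$ concentric square rings and scales the ring boundaries by a factor `theta`. The boundary rule (multiplying uniform boundaries by `theta` and letting the outermost part absorb the remaining border) and the names `ring_index_map` and `part_sizes` are assumptions made for illustration; the abstract does not specify how the scale factor is derived from the height ratio or how it is applied.

```python
def ring_index_map(size, n_parts, theta=1.0):
    """Assign each cell of a size x size map to a square-ring part index.

    theta = 1.0 gives uniform square rings. theta < 1 shrinks the inner
    rings and theta > 1 enlarges them. This scaling rule is a hypothetical
    stand-in for the paper's height-aware adjustment.
    """
    if size <= 0 or n_parts <= 0 or theta <= 0:
        raise ValueError("size, n_parts and theta must be positive")
    half = size / 2.0
    # Outer boundary of each ring, as a Chebyshev distance from the centre.
    bounds = [min(theta * (i + 1) * half / n_parts, half) for i in range(n_parts)]
    bounds[-1] = half  # the outermost part always reaches the border
    center = (size - 1) / 2.0
    index_map = []
    for r in range(size):
        row = []
        for c in range(size):
            d = max(abs(r - center), abs(c - center))
            # First ring whose outer boundary lies beyond this cell.
            row.append(next(i for i, b in enumerate(bounds) if d < b))
        index_map.append(row)
    return index_map


def part_sizes(index_map, n_parts):
    """Count how many cells fall into each part."""
    counts = [0] * n_parts
    for row in index_map:
        for part in row:
            counts[part] += 1
    return counts


if __name__ == "__main__":
    print(part_sizes(ring_index_map(12, 3), 3))        # uniform rings
    print(part_sizes(ring_index_map(12, 3, 0.5), 3))   # shrunken inner rings
```

In this sketch, the satellite branch would use `theta=1.0` while the drone branch would use the height-derived factor. With a large `theta` the clipped outer rings can become empty, which a real implementation would need to handle, for example by padding the feature map.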

Paper Structure

This paper contains 14 sections, 15 equations, 10 figures, 8 tables.

Figures (10)

  • Figure 1: A simplified diagram of our research motivation. Given drone-view images captured from different heights (high: yellow box, middle: green box, low: blue box) and the corresponding satellite-view image (right), we propose a scale-adaptive partition learning strategy. This strategy dynamically adjusts to different altitudes by eliminating redundant background and expanding the field of view (FoV) to enhance spatial alignment. As shown, the extended/indented partition pairs exhibit more semantically consistent content, facilitating subsequent representation learning.
  • Figure 2: The proposed height-aware adjustment strategy ($N=3$ for illustration). Case (II): Assuming the drone and satellite have similar viewpoint heights, we show the typical square partition process, i.e., the uniform partition. However, this assumption does not always hold, so we consider a general form of partition. Case (I): When the drone flies higher and captures extra background, the partition areas should be decreased based on the calculated scale factor $\theta_{1}$. Case (III): When the drone flies lower and has a limited FoV, the partition areas should be increased based on the calculated scale factor $\theta_{2}$.
  • Figure 3: Comparison of our method with typical partition strategies, including (a) the soft-partition strategy FSRA dai2021transformer, (b) the hard-partition strategy LPN wang2021each, and (c) our height-aware method. The first and third rows are drone-view images captured at higher and lower heights, while the second row is the matched satellite image with a single scale. For each partition strategy, the corresponding part-level image representations are placed in the same column. We observe that the proposed method yields more consistent inter-part alignment at the same part level (i.e., within every column).
  • Figure 4: Overview of the SaLPN framework, which comprises three phases: feature extraction, scale-adaptive partition learning, and classification supervision. In the feature extraction phase, visual features are extracted by backbones that share weights between the two platforms. In the scale-adaptive partition learning phase, the visual features from the two branches are sliced into part-level features with semantically consistent content. Next, each part-level feature is refined into finer-grained feature descriptors, including global, salient, and background representations, via a saliency-guided refinement strategy. In the classification supervision phase, the classifier module predicts the geo-tag of every feature descriptor. The network is optimized by minimizing the sum of the cross-entropy losses over all parts. In the testing phase, the part-level image representation is extracted before the classification layer in the classifier module, and similarity is measured by Euclidean distance. + denotes element-wise addition; $\delta$ denotes threshold-based binarization; Ⓢ denotes feature separation.
  • Figure 5: The data augmentation strategy that visually extends the shooting-height range of drone views. (a) Making the shooting height seem higher: given a drone image, mirror a strip $\Delta P$ pixels wide around the input image and resize the result to the initial resolution; (b) Making the shooting height seem lower: given a drone image, crop out a square ring $\Delta P$ pixels wide around the input image and resize the result to the initial resolution.
  • ...and 5 more figures
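
The two augmentations described in the Figure 5 caption can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: images are plain 2D lists of pixel values, mirroring uses reflection that does not repeat the edge pixel, and resizing uses nearest-neighbour sampling. The caption specifies neither the mirroring convention nor the interpolation method, and the function names here are hypothetical.

```python
def mirror_pad(img, dp):
    """Surround a 2D image with a mirrored strip dp pixels wide."""
    h, w = len(img), len(img[0])
    if not 0 <= dp < min(h, w):
        raise ValueError("dp must satisfy 0 <= dp < min(height, width)")

    def reflect(i, n):
        # Map an out-of-range index back inside [0, n) by reflection.
        if i < 0:
            return -i
        if i >= n:
            return 2 * (n - 1) - i
        return i

    return [[img[reflect(r, h)][reflect(c, w)] for c in range(-dp, w + dp)]
            for r in range(-dp, h + dp)]


def crop_ring(img, dp):
    """Remove a border ring dp pixels wide, keeping the centre region."""
    h, w = len(img), len(img[0])
    if dp < 0 or 2 * dp >= min(h, w):
        raise ValueError("dp must leave a non-empty centre region")
    return [row[dp:w - dp] for row in img[dp:h - dp]]


def resize_nearest(img, out_h, out_w):
    """Resize with nearest-neighbour sampling (an assumed interpolation)."""
    h, w = len(img), len(img[0])
    return [[img[r * h // out_h][c * w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def simulate_height(img, dp, higher):
    """Make a drone image look as if taken from a higher or lower altitude."""
    h, w = len(img), len(img[0])
    adjusted = mirror_pad(img, dp) if higher else crop_ring(img, dp)
    return resize_nearest(adjusted, h, w)  # back to the initial resolution


if __name__ == "__main__":
    image = [[r * 4 + c for c in range(4)] for r in range(4)]
    for row in simulate_height(image, 1, higher=False):
        print(row)
```

For real photographs, an image library's padding and resizing routines with smoother interpolation would replace these list-based helpers; the logic of pad-then-resize and crop-then-resize stays the same.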