(MGS)$^2$-Net: Unifying Micro-Geometric Scale and Macro-Geometric Structure for Cross-View Geo-Localization
Minglei Li, Mengfan He, Chao Chen, Ziyang Meng
TL;DR
This paper addresses cross-view geo-localization under severe 3D geometric misalignment by introducing a geometry-grounded framework, (MGS)$^2$, that couples Macro-Geometric Structure Filtering (MGSF) with Micro-Geometric Scale Adaptation (MGSA) and a Geometric-Appearance Contrastive Distillation (GACD) loss. MGSA uses depth priors to dynamically fuse multi-scale features, while MGSF uses depth-derived macro-gradients and normal clustering to suppress view-dependent vertical facades and emphasize horizontal planes. GACD enforces a ranking based on geometric priors that rewards roof-centric activations over facade distractions; the training objective combines $L_{\text{triplet}}$ with $L_{\text{GACD}}$. Experiments on University-1652 and SUES-200 show state-of-the-art Recall@1 and strong cross-dataset generalization, and qualitative visualizations confirm that vertical artifacts are suppressed. Overall, the approach demonstrates that explicit 3D structure priors substantially improve cross-view localization in urban environments and offers a path toward robust, geometry-aware CVGL systems in GNSS-denied scenarios.
Abstract
Cross-view geo-localization (CVGL) is pivotal for GNSS-denied UAV navigation but remains brittle under the drastic geometric misalignment between oblique aerial views and orthographic satellite references. Existing methods predominantly operate within a 2D manifold, neglecting the underlying 3D geometry where view-dependent vertical facades (macro-structure) and scale variations (micro-scale) severely corrupt feature alignment. To bridge this gap, we propose (MGS)$^2$, a geometry-grounded framework. Its core is the Macro-Geometric Structure Filtering (MGSF) module. Unlike pixel-wise matching, which is sensitive to noise, MGSF leverages dilated geometric gradients to physically filter out high-frequency facade artifacts while enhancing the view-invariant horizontal plane, directly addressing the domain shift. To guarantee robust input for this structural filtering, we explicitly incorporate a Micro-Geometric Scale Adaptation (MGSA) module, which utilizes depth priors to dynamically rectify scale discrepancies via multi-branch feature fusion. In addition, a Geometric-Appearance Contrastive Distillation (GACD) loss is designed to strictly discriminate against oblique occlusions. Extensive experiments demonstrate that (MGS)$^2$ achieves state-of-the-art performance, recording a Recall@1 of 97.5\% on University-1652 and 97.02\% on SUES-200. The framework also exhibits superior cross-dataset generalization against geometric ambiguity. The code is available at: \href{https://github.com/GabrielLi1473/MGS-Net}{https://github.com/GabrielLi1473/MGS-Net}.
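To make the structural-filtering idea concrete, the following is a minimal sketch of gradient-based down-weighting, not the authors' MGSF implementation. It assumes a single-channel depth map in which flat horizontal surfaces have near-zero depth gradient and facade boundaries appear as large depth jumps. The function names, the `dilation` step, and the temperature `tau` are all hypothetical choices made for illustration.

```python
import math


def depth_gradient_magnitude(depth, dilation=1):
    """Finite-difference gradient magnitude of a 2D depth map (list of rows).

    `dilation` is the pixel offset to each neighbour (an assumed stand-in
    for a dilated gradient); borders are handled by clamping indices.
    """
    h, w = len(depth), len(depth[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            x0, x1 = max(x - dilation, 0), min(x + dilation, w - 1)
            y0, y1 = max(y - dilation, 0), min(y + dilation, h - 1)
            # Divide by the actual neighbour spacing (shorter at borders).
            gx = (depth[y][x1] - depth[y][x0]) / max(x1 - x0, 1)
            gy = (depth[y1][x] - depth[y0][x]) / max(y1 - y0, 1)
            mag[y][x] = math.hypot(gx, gy)
    return mag


def horizontal_plane_weights(depth, tau=1.0, dilation=1):
    """Soft weights in (0, 1]: 1 on flat regions, close to 0 at depth jumps."""
    grads = depth_gradient_magnitude(depth, dilation)
    return [[math.exp(-g / tau) for g in row] for row in grads]


def filter_features(features, weights):
    """Element-wise reweighting of a single-channel feature map."""
    return [[f * w for f, w in zip(f_row, w_row)]
            for f_row, w_row in zip(features, weights)]


# Toy scene: the depth changes abruptly between columns 1 and 2.
depth = [[10.0, 10.0, 2.0, 2.0] for _ in range(4)]
features = [[1.0] * 4 for _ in range(4)]
weights = horizontal_plane_weights(depth)
filtered = filter_features(features, weights)
```

In this toy case the responses in the two columns next to the depth jump are strongly attenuated, while the flat columns pass through unchanged. A learned module would operate on multi-channel features and could also use clustered surface normals, as the abstract describes; the sketch only illustrates the gradient-based suppression principle.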
