Anchor-free Cross-view Object Geo-localization with Gaussian Position Encoding and Cross-view Association
Xingtao Ling, Chenlin Fu, Yingying Zhu
TL;DR
The paper tackles cross-view object geo-localization under large viewpoint gaps and positional uncertainty by introducing AFGeo, an anchor-free framework that forgoes predefined anchors in favor of direct pixel-wise localization. It couples Gaussian Position Encoding (GPE), which models the query click point as a learnable 2D Gaussian, with a Cross-view Object Association Module (CVOAM) that aligns semantically consistent context across views, all within a lightweight architecture. The method features an FCOS-inspired anchor-free localization head that decouples classification and regression and employs FCOS-style targets and a multi-term loss including focal, BCE, and GIoU components. AFGeo achieves state-of-the-art results on the CVOGL and G2D benchmarks, demonstrating strong localization accuracy with minimal parameter overhead and enabling deployment in resource-constrained scenarios.
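As a rough illustration of the idea behind GPE, the sketch below renders a click point as a 2D Gaussian map over the query image. It is not the authors' implementation: the function name `gaussian_position_map`, the axis-aligned form, and the fixed `sigma_xy` values are assumptions made here for illustration. In the paper the Gaussian is described as learnable, so its spread would be a trained parameter rather than a constant.

```python
import math


def gaussian_position_map(height, width, click_xy, sigma_xy):
    """Return a height x width grid (list of rows) peaking at the click point.

    Hypothetical helper: an axis-aligned 2D Gaussian with a fixed spread.
    In a learnable variant, sigma_xy would be a trainable parameter.
    """
    cx, cy = click_xy
    sx, sy = sigma_xy
    if sx <= 0 or sy <= 0:
        raise ValueError("sigma values must be positive")
    return [
        [
            math.exp(-0.5 * (((x - cx) / sx) ** 2 + ((y - cy) / sy) ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]


# A 5 x 7 map for a click at column 3, row 2.
position_map = gaussian_position_map(5, 7, click_xy=(3, 2), sigma_xy=(1.5, 1.5))
```

Compared with a single "hot" pixel, the map assigns smoothly decaying weight to the neighbourhood of the click, which is one way to tolerate a click that does not land exactly on the object centre.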
Abstract
Most existing cross-view object geo-localization approaches adopt an anchor-based paradigm. Although effective, such methods are inherently constrained by predefined anchors. To eliminate this dependency, we first propose an anchor-free formulation for cross-view object geo-localization, termed AFGeo. For each pixel, AFGeo directly predicts the four directional offsets (left, right, top, bottom) to the boundaries of the ground-truth box, thereby localizing the object without any predefined anchors. To obtain a more robust spatial prior, AFGeo incorporates Gaussian Position Encoding (GPE) to model the click point in the query image, mitigating the positional uncertainty that hampers object localization in cross-view scenarios. In addition, AFGeo incorporates a Cross-view Object Association Module (CVOAM) that relates the same object and its surrounding context across viewpoints, enabling reliable localization under large cross-view appearance gaps. Because the anchor-free localization paradigm integrates GPE and CVOAM with minimal parameter overhead, our model is both lightweight and computationally efficient, and it achieves state-of-the-art performance on benchmark datasets.
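The per-pixel offset formulation described above can be sketched in a few lines. This is a minimal illustration in the FCOS style, not the authors' code: the helper names (`lrtb_targets`, `decode_box`, `is_inside`) and the `(x0, y0, x1, y1)` box convention are assumptions introduced here. Offsets follow the (left, right, top, bottom) order used in the abstract.

```python
def lrtb_targets(px, py, box):
    """Offsets from pixel (px, py) to the four sides of box = (x0, y0, x1, y1).

    Returned in (left, right, top, bottom) order.
    """
    x0, y0, x1, y1 = box
    return (px - x0, x1 - px, py - y0, y1 - py)


def decode_box(px, py, offsets):
    """Invert lrtb_targets: recover (x0, y0, x1, y1) from a pixel and its offsets."""
    left, right, top, bottom = offsets
    return (px - left, py - top, px + right, py + bottom)


def is_inside(offsets):
    """A pixel lies strictly inside the box only if all four offsets are positive."""
    return min(offsets) > 0


box = (10, 20, 50, 80)
offsets = lrtb_targets(30, 40, box)      # pixel inside the box
recovered = decode_box(30, 40, offsets)  # same box, rebuilt from the offsets
```

Because every interior pixel carries enough information to reconstruct the full box, no anchor shapes or aspect ratios need to be chosen in advance. Pixels outside the box produce at least one non-positive offset and would typically be treated as negatives during training.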
