Cross-view geo-localization, Image retrieval, Multiscale geometric modeling, Frequency domain enhancement

Hongying Zhang, ShuaiShuai Ma

TL;DR

The Spatial and Frequency Domain Enhancement Network (SFDE) leverages complementary representations from the spatial and frequency domains to characterize cross-domain consistency from the perspectives of scene topology, multiscale structural patterns, and frequency invariance.

Abstract

Cross-view geo-localization (CVGL) aims to establish spatial correspondences between images captured from significantly different viewpoints and is a fundamental technique for visual localization in GNSS-denied environments. CVGL remains challenging because of severe geometric asymmetry, texture inconsistency across imaging domains, and the progressive degradation of discriminative local information. Existing methods rely predominantly on spatial-domain feature alignment, which is inherently sensitive to large-scale viewpoint variations and local disturbances. To alleviate these limitations, this paper proposes the Spatial and Frequency Domain Enhancement Network (SFDE), which leverages complementary representations from the spatial and frequency domains. SFDE adopts a three-branch parallel architecture to model global semantic context, local geometric structure, and statistical stability in the frequency domain, respectively, thereby characterizing cross-domain consistency from the perspectives of scene topology, multiscale structural patterns, and frequency invariance. The resulting complementary features are jointly optimized in a unified embedding space via progressive enhancement and coupled constraints, enabling the learning of cross-view representations that are consistent across multiple granularities. Comprehensive experiments show that SFDE achieves competitive performance, in many cases surpassing state-of-the-art methods, while maintaining a lightweight and computationally efficient design. Our code is available at https://github.com/Mashuaishuai669/SFDE

Paper Structure (20 sections, 24 equations, 8 figures, 10 tables)


Figures (8)

  • Figure 1: Overall architecture of the proposed SFDE network. The network adopts a three-branch parallel design in which the GSCB, LGSB, and FSAB capture features from complementary perspectives. The left part depicts the shared backbone for feature extraction and the subsequent multi-branch processing, while the right part illustrates the complete inference workflow.
  • Figure 2: Overview of the LGSB. This branch captures spatial relationships ranging from local textures to mid-range geometric configurations via multiscale dilated convolutions, and integrates interactive attention between local and global features with adaptive spatial pyramid pooling to achieve multi-granularity, geometry-sensitive modeling.
  • Figure 3: Overview of the FSAB. The branch transforms spatial features into the frequency domain and decomposes them into amplitude and phase components. Adaptive frequency reweighting and modulation are applied to the amplitude spectrum, while phase structures are preserved to maintain spatial coherence. The enhanced spectral representations are then projected back to the spatial domain through the inverse Fourier transform, producing frequency-complementary features.
  • Figure 4: Comparison of computational cost (Params, FLOPs) and performance between DAC and SFDE.
  • Figure 5: Visualization of feature embeddings in a 2D feature space. We select 40 geo-locations from the test set; samples with the same color correspond to the same location, and the star marker denotes the center of the corresponding location.
  • ...and 3 more figures
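
The frequency-domain pipeline described in the Figure 3 caption (Fourier transform, amplitude/phase decomposition, amplitude reweighting with the phase kept intact, inverse transform) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `frequency_enhance` and the `gain` argument are hypothetical, and the gain is a fixed array here, whereas an adaptive scheme of the kind the caption describes would presumably predict it from the features.

```python
import numpy as np


def frequency_enhance(feat, gain):
    """Reweight the amplitude spectrum of a feature map while keeping its phase.

    feat: real array of shape (C, H, W), a stand-in for a backbone feature map.
    gain: non-negative array broadcastable to (C, H, W // 2 + 1); a hypothetical
          placeholder for learned, adaptive frequency weights.
    """
    # Spatial domain -> frequency domain (real-input 2D FFT over H and W).
    spec = np.fft.rfft2(feat, axes=(-2, -1))

    # Decompose the spectrum into amplitude and phase.
    amplitude = np.abs(spec)
    phase = np.angle(spec)

    # Modulate the amplitude only; the phase is left untouched.
    amplitude = amplitude * gain

    # Recombine and project back to the spatial domain.
    enhanced = amplitude * np.exp(1j * phase)
    return np.fft.irfft2(enhanced, s=feat.shape[-2:], axes=(-2, -1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 8, 8))           # toy feature map (C=4, H=W=8)
    per_channel = np.array([0.5, 1.0, 2.0, 3.0]).reshape(4, 1, 1)
    y = frequency_enhance(x, per_channel)
    print(y.shape)
```

With a gain of all ones the function returns its input up to floating-point error, which is a convenient sanity check before substituting learned weights. A gain that varies across frequency bins must respect the conjugate symmetry of the half-spectrum if the output's spectrum is to equal the reweighted one exactly.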