Table of Contents
Fetching ...

ViewBridge:Revisiting Cross-View Localization from Image Matching

Panwang Xia, Qiong Wu, Lei Yu, Yi Liu, Mingtao Xiong, Xudong Lu, Yi Liu, Haoyu Guo, Yongxiang Yao, Junjian Zhang, Xiangyuan Cai, Hongwei Hu, Zhi Zheng, Yongjun Zhang, Yi Wan

TL;DR

This work revisits cross-view localization from the perspective of image matching and proposes a unified framework that enhances both matching and localization and introduces a Surface Model and SimRefiner that adaptively refines similarity distributions to enhance match reliability.

Abstract

Cross-view localization aims to estimate the 3-DoF pose of a ground-view image by aligning it with aerial or satellite imagery. Existing methods typically address this task through direct regression or feature alignment in a shared bird's-eye view (BEV) space. Although effective for coarse alignment, these methods fail to establish fine-grained and geometrically reliable correspondences under large viewpoint variations, thereby limiting both the accuracy and interpretability of localization results. Consequently, we revisit cross-view localization from the perspective of image matching and propose a unified framework that enhances both matching and localization. Specifically, we introduce a Surface Model that constrains BEV feature projection to physically valid regions for geometric consistency, and a SimRefiner that adaptively refines similarity distributions to enhance match reliability. To further support research in this area, we present CVFM, the first benchmark with 32,509 cross-view image pairs annotated with pixel-level correspondences. Extensive experiments demonstrate that our approach achieves geometry-consistent and fine-grained correspondences across extreme viewpoints and further improves the accuracy and stability of cross-view localization.

ViewBridge:Revisiting Cross-View Localization from Image Matching

TL;DR

This work revisits cross-view localization from the perspective of image matching and proposes a unified framework that enhances both matching and localization and introduces a Surface Model and SimRefiner that adaptively refines similarity distributions to enhance match reliability.

Abstract

Cross-view localization aims to estimate the 3-DoF pose of a ground-view image by aligning it with aerial or satellite imagery. Existing methods typically address this task through direct regression or feature alignment in a shared bird's-eye view (BEV) space. Although effective for coarse alignment, these methods fail to establish fine-grained and geometrically reliable correspondences under large viewpoint variations, thereby limiting both the accuracy and interpretability of localization results. Consequently, we revisit cross-view localization from the perspective of image matching and propose a unified framework that enhances both matching and localization. Specifically, we introduce a Surface Model that constrains BEV feature projection to physically valid regions for geometric consistency, and a SimRefiner that adaptively refines similarity distributions to enhance match reliability. To further support research in this area, we present CVFM, the first benchmark with 32,509 cross-view image pairs annotated with pixel-level correspondences. Extensive experiments demonstrate that our approach achieves geometry-consistent and fine-grained correspondences across extreme viewpoints and further improves the accuracy and stability of cross-view localization.