Loc$^2$: Interpretable Cross-View Localization via Depth-Lifted Local Feature Matching

Zimin Xia; Chenghao Xu; Alexandre Alahi

TL;DR

An accurate and interpretable fine-grained cross-view localization method that estimates the 3 Degrees of Freedom (DoF) pose of a ground-level image by matching its local features with a reference aerial image. The method directly learns ground-aerial image-plane correspondences using weak supervision from camera poses.

Abstract

We propose an accurate and interpretable fine-grained cross-view localization method that estimates the 3 Degrees of Freedom (DoF) pose of a ground-level image by matching its local features with a reference aerial image. Unlike prior approaches that rely on global descriptors or bird's-eye-view (BEV) transformations, our method directly learns ground-aerial image-plane correspondences using weak supervision from camera poses. The matched ground points are lifted into BEV space with monocular depth predictions, and scale-aware Procrustes alignment is then applied to estimate camera rotation, translation, and optionally the scale between relative depth and the aerial metric space. This formulation is lightweight, end-to-end trainable, and requires no pixel-level annotations. Experiments show state-of-the-art accuracy in challenging scenarios such as cross-area testing and unknown orientation. Furthermore, our method offers strong interpretability: correspondence quality directly reflects localization accuracy and enables outlier rejection via RANSAC, while overlaying the re-scaled ground layout on the aerial image provides an intuitive visual cue of localization performance.
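The alignment step described in the abstract can be illustrated with a minimal sketch. It assumes the matched ground points have already been lifted to 2D BEV coordinates and paired with their aerial counterparts, and it uses the closed-form 2D similarity solution rather than whatever solver the authors implement. The function name `procrustes_2d`, its arguments, and the unweighted least-squares objective are hypothetical choices made for illustration only.

```python
import math

def procrustes_2d(src, dst, estimate_scale=True):
    """Least-squares fit of dst ~ scale * R(theta) @ src + t for 2D point pairs.

    src: BEV points lifted from the ground image (assumed input), list of (x, y).
    dst: matched points in the aerial image plane, list of (x, y).
    Returns (scale, theta, (tx, ty)); theta is in radians, counter-clockwise.
    """
    n = len(src)
    # Centroids of both point sets.
    mx = sum(p[0] for p in src) / n
    my = sum(p[1] for p in src) / n
    nx = sum(q[0] for q in dst) / n
    ny = sum(q[1] for q in dst) / n

    a = 0.0    # sum of dot products of centered pairs
    b = 0.0    # sum of 2D cross products of centered pairs
    var = 0.0  # sum of squared norms of centered src points
    for (px, py), (qx, qy) in zip(src, dst):
        px, py, qx, qy = px - mx, py - my, qx - nx, qy - ny
        a += px * qx + py * qy
        b += px * qy - py * qx
        var += px * px + py * py

    # The optimal rotation maximizes a*cos(theta) + b*sin(theta).
    theta = math.atan2(b, a)
    # Optimal scale for that rotation; fixed to 1 when depth is already metric.
    scale = math.hypot(a, b) / var if estimate_scale else 1.0

    c, s = math.cos(theta), math.sin(theta)
    # Translation maps the rotated, scaled src centroid onto the dst centroid.
    tx = nx - scale * (c * mx - s * my)
    ty = ny - scale * (s * mx + c * my)
    return scale, theta, (tx, ty)
```

In this sketch, `estimate_scale=False` corresponds to the case where the depth is already metric and only rotation and translation are needed. Because the fit is closed-form, it could be run on random minimal subsets of correspondences inside a RANSAC loop to reject outliers, as the abstract suggests.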
