Deep Homography Estimation for Visual Place Recognition
Feng Lu, Shuting Dong, Lijun Zhang, Bingxi Liu, Xiangyuan Lan, Dongmei Jiang, Chun Yuan
TL;DR
This paper tackles visual place recognition (VPR) by replacing RANSAC-based geometric verification with a differentiable deep homography estimation (DHE) network in a two-stage VPR framework. The DHE network regresses a homography $\mathbf{H}_{qc}$ from a dense local feature map and uses it to identify inliers. It is trained with a re-projection error of inliers (REI) loss $L_r$, which enables end-to-end training with the backbone through the objective $L = L_g + \lambda L_r$. Key contributions include an architecture that jointly learns global retrieval and differentiable geometric verification, the REI loss that supplies supervision without explicit homography labels, and results on benchmark datasets that outperform several state-of-the-art methods while running more than an order of magnitude faster than mainstream RANSAC-based hierarchical methods. The approach reduces re-ranking time and improves robustness against perceptual aliasing, making it well suited for real-time, large-scale VPR tasks.
Abstract
Visual place recognition (VPR) is a fundamental task for many applications such as robot localization and augmented reality. Recently, hierarchical VPR methods have received considerable attention because they balance accuracy and efficiency. They usually first use global features to retrieve candidate images, then verify the spatial consistency of matched local features for re-ranking. However, the latter step typically relies on the RANSAC algorithm to fit a homography, which is time-consuming and non-differentiable. As a result, existing methods can train the network only for global feature extraction. Here, we propose a transformer-based deep homography estimation (DHE) network that takes the dense feature map extracted by a backbone network as input and fits a homography for fast and learnable geometric verification. Moreover, we design a re-projection error of inliers loss to train the DHE network without additional homography labels. The DHE network can also be trained jointly with the backbone network to help it extract features that are more suitable for local matching. Extensive experiments on benchmark datasets show that our method outperforms several state-of-the-art methods and is more than one order of magnitude faster than mainstream hierarchical VPR methods that use RANSAC. The code is released at https://github.com/Lu-Feng/DHE-VPR.
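To make the re-ranking step concrete, the following is a minimal sketch of homography-based geometric verification: given a homography for each candidate (however it was obtained, by RANSAC or by a learned estimator), matched points are re-projected and candidates are ordered by how many matches fall within a pixel threshold. This is an illustration under stated assumptions, not the released DHE-VPR code. The function names `project`, `count_inliers`, and `rerank`, the plain Euclidean error, and the threshold value are all hypothetical choices made for this example.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Match = Tuple[Point, Point]               # (query point, candidate point)
Homography = Sequence[Sequence[float]]    # 3x3 matrix, row-major


def project(H: Homography, p: Point) -> Point:
    """Map a 2D point through a 3x3 homography in homogeneous coordinates."""
    x, y = p
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0.0:
        # Point maps to infinity; it can never be counted as an inlier.
        return (math.inf, math.inf)
    return (u / w, v / w)


def count_inliers(H: Homography, matches: Sequence[Match],
                  threshold: float = 1.5) -> int:
    """Count matches whose re-projection error is below `threshold`.

    The threshold value is an assumption of this sketch.
    """
    count = 0
    for q, c in matches:
        px, py = project(H, q)
        if math.hypot(px - c[0], py - c[1]) < threshold:
            count += 1
    return count


def rerank(candidates: Sequence[Tuple[str, Homography, Sequence[Match]]],
           threshold: float = 1.5) -> List[str]:
    """Return candidate ids sorted by inlier count, highest first."""
    scored = [(count_inliers(H, m, threshold), cid) for cid, H, m in candidates]
    # sorted() is stable, so ties keep the initial retrieval order.
    scored = sorted(scored, key=lambda s: -s[0])
    return [cid for _, cid in scored]
```

In a hierarchical pipeline of the kind described above, `candidates` would hold the top results of global-feature retrieval, and the returned order would replace the initial ranking.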
