Geometry-Aware Feature Matching for Large-Scale Structure from Motion
Gonglin Chen, Jinsen Wu, Haiwei Chen, Wenbin Teng, Zhiyuan Gao, Andrew Feng, Rongjun Qin, Yajie Zhao
TL;DR
The paper tackles the problem of obtaining dense, consistent correspondences across image sequences with large baselines for Structure from Motion (SfM). It introduces a geometry-aware optimization module that fuses sparse anchors from detector-based matching with dense features from detector-free matching, enforcing epipolar geometry via the Sampson distance to refine correspondences. The method iteratively reweights and reassigns matches to produce geometrically consistent, denser correspondences, leading to improved camera pose accuracy and more complete 3D reconstructions. Evaluations on IMC 2021, ScanNet, and challenging air-to-ground datasets demonstrate state-of-the-art pose accuracy and denser point clouds, with an acknowledged tradeoff in computational efficiency. The approach integrates with COLMAP and enhances SfM in extreme large-scale settings where traditional methods struggle.
Abstract
Establishing consistent and dense correspondences across multiple images is crucial for Structure from Motion (SfM) systems. Significant view changes, such as air-to-ground with very sparse view overlap, pose an even greater challenge to correspondence solvers. We present a novel optimization-based approach that significantly enhances existing feature matching methods by introducing geometric cues in addition to color cues. This helps fill gaps when there is less overlap in large-scale scenarios. Our method formulates geometric verification as an optimization problem, guiding feature matching within detector-free methods and using sparse correspondences from detector-based methods as anchor points. By enforcing geometric constraints via the Sampson distance, our approach ensures that the denser correspondences from detector-free methods are geometrically consistent and more accurate. This hybrid strategy significantly improves correspondence density and accuracy, mitigates multi-view inconsistencies, and leads to notable advancements in camera pose accuracy and point cloud density. It outperforms state-of-the-art feature matching methods on benchmark datasets and enables feature matching in challenging extreme large-scale settings.
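To make the geometric constraint concrete, the following is a minimal sketch of the standard Sampson distance for a set of candidate matches under a given fundamental matrix. It is not the paper's optimization module: the function names `sampson_distance` and `keep_consistent`, the hard threshold `thresh`, and the toy fundamental matrix are assumptions introduced for illustration only, and a plain threshold stands in for the iterative reweighting the paper describes.

```python
import numpy as np

def sampson_distance(F, x1, x2, eps=1e-12):
    """Sampson distance (first-order geometric error) for x2^T F x1 = 0.

    F  : (3, 3) fundamental matrix mapping image-1 points to image-2 lines.
    x1 : (N, 2) pixel coordinates in image 1.
    x2 : (N, 2) pixel coordinates in image 2.
    Returns an (N,) array of squared distances.
    """
    ones = np.ones((x1.shape[0], 1))
    p1 = np.hstack([x1, ones])        # homogeneous points, image 1
    p2 = np.hstack([x2, ones])        # homogeneous points, image 2

    Fp1 = p1 @ F.T                    # row i is F @ p1_i (epipolar line in image 2)
    Ftp2 = p2 @ F                     # row i is F^T @ p2_i (epipolar line in image 1)

    # Squared algebraic epipolar error (x2^T F x1)^2.
    num = np.sum(p2 * Fp1, axis=1) ** 2
    # Normalization by the first two components of both epipolar lines.
    den = Fp1[:, 0] ** 2 + Fp1[:, 1] ** 2 + Ftp2[:, 0] ** 2 + Ftp2[:, 1] ** 2
    return num / np.maximum(den, eps)

def keep_consistent(F, x1, x2, thresh=1.0):
    """Hypothetical helper: boolean mask of matches below a Sampson threshold."""
    return sampson_distance(F, x1, x2) < thresh

if __name__ == "__main__":
    # Toy case (assumption): pure horizontal camera translation, so
    # epipolar lines are horizontal and a valid match keeps its y coordinate.
    F = np.array([[0.0, 0.0, 0.0],
                  [0.0, 0.0, -1.0],
                  [0.0, 1.0, 0.0]])
    x1 = np.array([[10.0, 20.0], [10.0, 20.0]])
    x2 = np.array([[15.0, 20.0], [15.0, 22.0]])  # second match is off by 2 px in y
    print(sampson_distance(F, x1, x2))
    print(keep_consistent(F, x1, x2))
```

In a hybrid pipeline of the kind summarized above, a fundamental matrix estimated from the sparse detector-based anchors could be used to score the dense detector-free matches in this way. How the scores feed back into matching is specific to the paper and is not reproduced here.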
