Geometry-Aware Feature Matching for Large-Scale Structure from Motion

Gonglin Chen, Jinsen Wu, Haiwei Chen, Wenbin Teng, Zhiyuan Gao, Andrew Feng, Rongjun Qin, Yajie Zhao

TL;DR

The paper tackles the problem of obtaining dense, consistent correspondences across image sequences with large baselines for Structure from Motion (SfM). It introduces a geometry-aware optimization module that fuses sparse anchors from detector-based matching with dense features from detector-free matching, enforcing epipolar geometry via the Sampson distance to refine correspondences. The method iteratively reweights and reassigns matches to produce denser, geometrically consistent correspondences, which improves camera pose accuracy and yields more complete 3D reconstructions. Evaluations on IMC 2021, ScanNet, and challenging air-to-ground datasets show state-of-the-art pose accuracy and denser point clouds, at an acknowledged cost in computational efficiency. The approach integrates with COLMAP and improves SfM in extreme large-scale settings where traditional methods struggle.
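
For reference, the textbook (squared) Sampson distance for a correspondence $x \leftrightarrow x'$ in homogeneous coordinates under a fundamental matrix $F$ is given below. It is a first-order approximation of the geometric error with respect to the epipolar constraint $x'^{\top} F x = 0$. This summary does not reproduce the paper's own equations, so how the method weights or thresholds this quantity is not shown; some authors report the square root of this expression instead.

$$
d_{\mathrm{S}}^{2}(x, x'; F) = \frac{\left(x'^{\top} F x\right)^{2}}{(F x)_{1}^{2} + (F x)_{2}^{2} + (F^{\top} x')_{1}^{2} + (F^{\top} x')_{2}^{2}}
$$

Here $(v)_i$ denotes the $i$-th component of the vector $v$.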

Abstract

Establishing consistent and dense correspondences across multiple images is crucial for Structure from Motion (SfM) systems. Significant view changes, such as air-to-ground image pairs with very sparse view overlap, pose an even greater challenge to correspondence solvers. We present a novel optimization-based approach that significantly enhances existing feature matching methods by introducing geometry cues in addition to color cues. This helps fill gaps where view overlap is limited in large-scale scenarios. Our method formulates geometric verification as an optimization problem, guiding feature matching within detector-free methods and using sparse correspondences from detector-based methods as anchor points. By enforcing geometric constraints via the Sampson distance, our approach ensures that the denser correspondences from detector-free methods are geometrically consistent and more accurate. This hybrid strategy significantly improves correspondence density and accuracy, mitigates multi-view inconsistencies, and leads to notable advancements in camera pose accuracy and point cloud density. It outperforms state-of-the-art feature matching methods on benchmark datasets and enables feature matching in challenging extreme large-scale settings.
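
The idea of treating detector-based matches as trusted anchors while geometrically verifying detector-free matches can be pictured as a small iteratively reweighted least-squares (IRLS) loop. This is a minimal sketch under our own assumptions, not the paper's optimization module: the function names (`anchor_guided_fit`, `weighted_eight_point`, `hartley`, `to_h`), the Cauchy-style weight, the scale `sigma`, and fixing anchor weights at 1 are all hypothetical. Each iteration re-estimates $F$ with a weighted, Hartley-normalized eight-point solve and then down-weights every dense match by its Sampson distance to that estimate; the "reassign" step described in the TL;DR is not modeled.

```python
import numpy as np

def to_h(p):
    """Append a homogeneous coordinate of 1 to N x 2 pixel coordinates."""
    return np.column_stack([p, np.ones(len(p))])

def sampson_sq(F, x1, x2):
    """Squared Sampson distance of homogeneous N x 3 points under x2^T F x1 = 0."""
    Fx1 = x1 @ F.T    # row i is (F x1_i)^T
    Ftx2 = x2 @ F     # row i is (F^T x2_i)^T
    num = np.einsum("ij,ij->i", x2, Fx1) ** 2
    den = Fx1[:, 0] ** 2 + Fx1[:, 1] ** 2 + Ftx2[:, 0] ** 2 + Ftx2[:, 1] ** 2
    return num / den

def hartley(p):
    """Similarity moving the centroid to 0 and the mean distance to sqrt(2)."""
    c = p.mean(axis=0)
    s = np.sqrt(2) / np.linalg.norm(p - c, axis=1).mean()
    return np.array([[s, 0.0, -s * c[0]], [0.0, s, -s * c[1]], [0.0, 0.0, 1.0]])

def weighted_eight_point(p1, p2, w):
    """Weighted normalized eight-point estimate of F (rank 2, unit Frobenius norm)."""
    T1, T2 = hartley(p1), hartley(p2)
    x1, x2 = to_h(p1) @ T1.T, to_h(p2) @ T2.T
    # Row i holds the coefficients of row-major vec(F) in x2_i^T F x1_i.
    A = np.column_stack([x2[:, [k]] * x1 for k in range(3)])
    _, _, Vt = np.linalg.svd(A * np.sqrt(w)[:, None])
    F = Vt[-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt  # enforce rank 2
    F = T2.T @ F @ T1                          # undo the normalization
    return F / np.linalg.norm(F)

def anchor_guided_fit(anchors1, anchors2, dense1, dense2, iters=5, sigma=2.0):
    """Hypothetical IRLS loop: anchors keep weight 1, dense matches are reweighted."""
    p1 = np.vstack([anchors1, dense1])
    p2 = np.vstack([anchors2, dense2])
    n_a = len(anchors1)
    w = np.ones(len(p1))
    w[n_a:] = 0.0  # the first solve uses the anchors only (needs >= 8 anchors)
    for _ in range(iters):
        F = weighted_eight_point(p1, p2, w)
        d2 = sampson_sq(F, to_h(dense1), to_h(dense2))
        w[n_a:] = 1.0 / (1.0 + d2 / sigma ** 2)  # Cauchy-style weights, in pixels
    return F, w[n_a:]
```

Dense matches whose weight ends near zero would be natural candidates for rejection or reassignment; the paper's actual update rule is not described in this summary.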

Paper Structure

This paper contains 21 sections, 6 equations, 5 figures, and 8 tables.

Figures (5)

  • Figure 2: A challenging case for detector-based and detector-free feature matching methods on cross-view images.
  • Figure 3: Matches from detector-free methods suffer from short track lengths (the track breaks at $I_2$): the location of a match in $I_2$ depends on the image it is paired with.
  • Figure 4: An overview of our pipeline for SfM reconstruction. 1. The pipeline runs image retrieval based on global embeddings generated by DINOv2 oquab2024dinov2. 2. A backbone module takes image pairs as input; each pair is processed by a detector-free backbone and a detector-based backbone. 3. A geometry-aware optimization module iteratively optimizes the fundamental matrix and the matches, using anchor points from the detector-based method. 4. The final matched coarse points are refined by a correlation-based refinement block. 5. The refined matches are then fed into COLMAP schoenberger2016sfm, schoenberger2016mvs for SfM.
  • Figure 5: Qualitative results. Our method is compared qualitatively with ALIKED zhao2023aliked + LightGlue (LG) lindenberger2023lightglue on multiple scenes. Green cameras have an absolute pose error below 3°, while red cameras have an error above 3°. More results can be found in the supplementary material.
  • Figure 6: Qualitative results. Our method is compared qualitatively with other feature matching methods on the collected air-to-ground datasets. Red cameras denote recovered poses.
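
The short-track problem in Figure 3 can be made concrete with the usual way SfM systems chain pairwise matches into multi-view tracks: a union-find over (image, keypoint) nodes. The sketch below is illustrative only; treating each rounded pixel location as one observation node, and the names `UnionFind` and `build_tracks`, are assumptions rather than the paper's code. When a detector-free matcher places the same physical point at slightly different pixels in $I_2$ depending on the partner image, the nodes differ and one track splits in two.

```python
from collections import defaultdict

class UnionFind:
    """Disjoint-set forest over hashable nodes, with path halving."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def build_tracks(pair_matches):
    """pair_matches: {(img_a, img_b): [((xa, ya), (xb, yb)), ...]} in pixels.
    Each (image, rounded pixel) is treated as one observation node (an assumption)."""
    uf = UnionFind()
    for (ia, ib), matches in pair_matches.items():
        for pa, pb in matches:
            uf.union((ia, round(pa[0]), round(pa[1])), (ib, round(pb[0]), round(pb[1])))
    tracks = defaultdict(list)
    for node in list(uf.parent):
        tracks[uf.find(node)].append(node)
    return [sorted(t) for t in tracks.values()]

# Consistent keypoint in I2: one track spanning three images.
consistent = {("I1", "I2"): [((10.0, 10.0), (50.0, 50.0))],
              ("I2", "I3"): [((50.0, 50.0), (90.0, 90.0))]}
# Pair-dependent location in I2: the track breaks into two 2-view tracks.
broken = {("I1", "I2"): [((10.0, 10.0), (50.0, 50.0))],
          ("I2", "I3"): [((53.0, 48.0), (90.0, 90.0))]}
print([len(t) for t in build_tracks(consistent)], [len(t) for t in build_tracks(broken)])
```

Anchoring dense matches to repeatable keypoint locations is one way to keep such nodes identical across pairs, so that tracks stay connected.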