LoMa: Local Feature Matching Revisited

David Nordström, Johan Edstedt, Georg Bökman, Jonathan Astermark, Anders Heyden, Viktor Larsson, Mårten Wadenbäck, Michael Felsberg, Fredrik Kahl

Abstract

Local feature matching has long been a fundamental component of 3D vision systems such as Structure-from-Motion (SfM), yet progress has lagged behind the rapid advances of modern data-driven approaches. The newer approaches, such as feed-forward reconstruction models, have benefited extensively from scaling dataset sizes, whereas local feature matching models are still trained on only a few mid-sized datasets. In this paper, we revisit local feature matching from a data-driven perspective. In our approach, which we call LoMa, we combine large and diverse data mixtures, modern training recipes, scaled model capacity, and scaled compute, resulting in remarkable gains in performance. Since current standard benchmarks mainly rely on collecting sparse views from successful 3D reconstructions, the evaluation of progress in feature matching has been limited to relatively easy image pairs. To address the resulting saturation of benchmarks, we collect 1000 highly challenging image pairs from internet data into a new dataset called HardMatch. Ground truth correspondences for HardMatch are obtained via manual annotation by the authors. In our extensive benchmarking suite, we find that LoMa makes outstanding progress across the board, outperforming the state-of-the-art method ALIKED+LightGlue by +18.6 mAA on HardMatch, +29.5 mAA on WxBS, +21.4 (1m, 10$^\circ$) on InLoc, +24.2 AUC on RUBIK, and +12.4 mAA on IMC 2022. We release our code and models publicly at https://github.com/davnords/LoMa.
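The headline numbers above, and the per-group results in the figures below, are reported as mAA over pixel thresholds (e.g. mAA@10px in Figure 3). For orientation, here is a minimal Python sketch of one common way such a metric is computed: per-pair match accuracy at each integer threshold up to 10 px, averaged over thresholds and then over pairs. The function name and the exact averaging protocol are illustrative assumptions; the paper's released evaluation code may differ.

```python
import numpy as np

def maa_at_px(errors_per_pair, max_px=10):
    """Mean Average Accuracy over integer pixel thresholds 1..max_px.

    errors_per_pair: list of 1-D arrays, one per image pair, holding the
    end-point errors (in pixels) of that pair's predicted correspondences.
    """
    thresholds = np.arange(1, max_px + 1)
    scores = []
    for errs in errors_per_pair:
        if errs.size == 0:  # a pair with no matches scores zero
            scores.append(0.0)
            continue
        # fraction of matches under each threshold, averaged over thresholds
        scores.append(np.mean([(errs < t).mean() for t in thresholds]))
    return float(np.mean(scores))

# toy usage with synthetic end-point errors for two pairs
print(maa_at_px([np.array([0.5, 3.0, 12.0]), np.array([1.5, 2.0])]))
```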


Figures (14)

  • Figure 1: Revisiting local feature matching. We introduce HardMatch, a challenging hand-annotated matching benchmark, and LoMa, a fast and accurate family of local feature-based models. (a) LoMa successfully matches pairs from HardMatch where LightGlue fails, (b) HardMatch is significantly harder than previous benchmarks.
  • Figure 2: The LoMa pipeline. By replacing ALIKED [Zhao2023ALIKED] with DaD [edstedt2025dad] + DeDoDe [edstedt2024dedode] and training the descriptor and matcher on a large collection of datasets, we achieve SotA results, even surpassing dense matchers on some tasks (HardMatch); see the sketch after this list for the general detect-describe-match pattern.
  • Figure 3: HardMatch groups. The dataset contains image pairs from a wide range of challenging scenarios, organized into 9 groups. (a) Example pairs illustrating each group. (b) HardMatch mAA@10px performance per group.
  • Figure 4: HardMatch accuracy at different thresholds. LoMa performs slightly better than the best dense matchers and significantly outperforms LightGlue.
  • Figure 5: Ablations. Performance on the validation set of HardMatch (HM).
  • ...and 9 more figures
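
The pipeline in Figure 2 follows the standard sparse-matching pattern: detect keypoints, describe them, then match descriptors across images. Below is a minimal, hypothetical sketch of the matching stage using mutual nearest neighbors on L2-normalized descriptors. The descriptor arrays stand in for DaD/DeDoDe-style outputs; this is not LoMa's actual API, and LoMa's learned matcher replaces the naive nearest-neighbor step shown here.

```python
import numpy as np

def mutual_nearest_neighbors(desc_a, desc_b):
    """Match two sets of L2-normalized descriptors, shapes (N, D) and (M, D),
    keeping only pairs that are each other's nearest neighbor."""
    sim = desc_a @ desc_b.T            # (N, M) cosine similarity matrix
    nn_ab = sim.argmax(axis=1)         # best match in B for each point in A
    nn_ba = sim.argmax(axis=0)         # best match in A for each point in B
    idx_a = np.arange(len(desc_a))
    mutual = nn_ba[nn_ab] == idx_a     # keep only mutual agreements
    return np.stack([idx_a[mutual], nn_ab[mutual]], axis=1)  # (K, 2) indices

# toy usage with random unit-norm descriptors standing in for real outputs
rng = np.random.default_rng(0)
desc_a = rng.normal(size=(100, 256))
desc_a /= np.linalg.norm(desc_a, axis=1, keepdims=True)
desc_b = rng.normal(size=(120, 256))
desc_b /= np.linalg.norm(desc_b, axis=1, keepdims=True)
matches = mutual_nearest_neighbors(desc_a, desc_b)
```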