RoMa v2: Harder Better Faster Denser Feature Matching

Johan Edstedt; David Nordström; Yushan Zhang; Georg Bökman; Jonathan Astermark; Viktor Larsson; Anders Heyden; Fredrik Kahl; Mårten Wadenbäck; Michael Felsberg

TL;DR

RoMa v2 is a dense feature matcher built as a two-stage pipeline that couples a fast coarse matcher with lightweight refiners. It introduces a new matching objective with an auxiliary negative log-likelihood (NLL) term and predicts a per-pixel covariance that quantifies uncertainty during refinement. The model uses frozen DINOv3 features in a multi-view Transformer, is trained on a diverse mix of wide- and small-baseline datasets, and employs an EMA bias remedy to stabilize subpixel refinement. Empirically, RoMa v2 achieves state-of-the-art accuracy across benchmarks with favorable runtime and memory trade-offs, and the predicted covariance improves downstream pose estimation and RANSAC-based refinement.
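To make the idea of a per-pixel covariance concrete, here is a minimal sketch of a Gaussian negative log-likelihood for a single 2D match residual. It is a generic textbook formulation, not the loss used in the paper (this summary does not give the exact form), and the function name `gaussian_nll_2d` and its arguments are hypothetical.

```python
import math

def gaussian_nll_2d(residual, cov):
    """NLL of a 2D residual under a zero-mean Gaussian with a 2x2 covariance.

    Illustrative only: `residual` is (predicted - true) match position in
    pixels, `cov` is a symmetric positive-definite matrix ((a, b), (c, d)).
    """
    rx, ry = residual
    (a, b), (c, d) = cov
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("covariance must be positive definite")
    # Mahalanobis term r^T cov^{-1} r, using the closed-form 2x2 inverse.
    mahalanobis = (d * rx * rx - (b + c) * rx * ry + a * ry * ry) / det
    # 0.5 * (r^T cov^{-1} r + log det cov) + log(2 * pi)
    return 0.5 * (mahalanobis + math.log(det)) + math.log(2.0 * math.pi)

# A large residual is penalized less when the predicted covariance is large,
# which is how such a term lets a model express uncertainty.
confident = gaussian_nll_2d((3.0, 0.0), ((1.0, 0.0), (0.0, 1.0)))
uncertain = gaussian_nll_2d((3.0, 0.0), ((9.0, 0.0), (0.0, 9.0)))
print(uncertain < confident)
```

Under this formulation, the log-determinant term discourages the model from inflating the covariance everywhere, while the Mahalanobis term discourages overconfidence on poorly localized matches.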

Abstract

Dense feature matching aims to estimate all correspondences between two images of a 3D scene and has recently been established as the gold standard due to its high accuracy and robustness. However, existing dense matchers still fail or perform poorly in many hard real-world scenarios, and high-precision models are often slow, limiting their applicability. In this paper, we attack these weaknesses on a wide front through a series of systematic improvements that together yield a significantly better model. In particular, we construct a novel matching architecture and loss, which, combined with a curated diverse training distribution, enables our model to solve many complex matching tasks. We further make training faster through a decoupled two-stage matching-then-refinement pipeline, and at the same time, significantly reduce refinement memory usage through a custom CUDA kernel. Finally, we leverage the recent DINOv3 foundation model along with multiple other insights to make the model more robust and unbiased. In our extensive set of experiments, we show that the resulting novel matcher sets a new state of the art, being significantly more accurate than its predecessors. Code is available at https://github.com/Parskatt/romav2
