OmniGlue: Generalizable Feature Matching with Foundation Model Guidance
Hanwen Jiang, Arjun Karpur, Bingyi Cao, Qixing Huang, Andre Araujo
TL;DR
OmniGlue addresses the limited generalization of learnable image matchers by introducing foundation-model guidance from DINOv2 and a position-disentangled attention mechanism that separates spatial from appearance information during feature propagation. The method builds intra- and inter-image graphs, prunes cross-image connections with DINOv2 similarities, and refines descriptors through attention blocks that incorporate positional context without embedding it into the final descriptors. Across seven diverse datasets, OmniGlue delivers strong gains on unseen domains ($20.9\%$ relative over a directly comparable reference model and $9.5\%$ relative over LightGlue) while maintaining competitive in-domain performance and enabling effective few-shot adaptation. This highlights a practical path toward robust, domain-agnostic image matching suitable for real-world pose estimation and registration tasks.
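The pruning of cross-image connections can be illustrated with a minimal sketch. It assumes each keypoint already carries a DINOv2 feature vector, and that each keypoint in the first image keeps only its most similar keypoints in the second image. The function name `prune_inter_image_edges`, the cosine-similarity criterion, and the `keep_ratio` parameter are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def prune_inter_image_edges(feats_a, feats_b, keep_ratio=0.5):
    """Return a boolean mask of shape (N_a, N_b); True marks a kept cross-image edge.

    feats_a, feats_b: per-keypoint foundation-model features, shapes (N_a, D) and (N_b, D).
    keep_ratio: hypothetical fraction of image-B keypoints each image-A keypoint keeps.
    """
    eps = 1e-8  # guards against division by zero for all-zero feature vectors
    a = feats_a / (np.linalg.norm(feats_a, axis=1, keepdims=True) + eps)
    b = feats_b / (np.linalg.norm(feats_b, axis=1, keepdims=True) + eps)
    sim = a @ b.T  # cosine similarity between every cross-image keypoint pair

    # Number of edges kept per image-A keypoint (at least one).
    k = max(1, int(round(keep_ratio * sim.shape[1])))
    # Column indices of the k most similar image-B keypoints for each row.
    top_idx = np.argpartition(-sim, k - 1, axis=1)[:, :k]

    mask = np.zeros(sim.shape, dtype=bool)
    np.put_along_axis(mask, top_idx, True, axis=1)
    return mask

# Usage with random stand-in features (real inputs would come from DINOv2).
rng = np.random.default_rng(0)
mask = prune_inter_image_edges(rng.normal(size=(4, 16)), rng.normal(size=(6, 16)))
```

The resulting mask could restrict which keypoint pairs exchange information during cross-image attention. How the selection is actually thresholded in OmniGlue is not stated here.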
Abstract
The image matching field has been witnessing a continuous emergence of novel learnable feature matching techniques, with ever-improving performance on conventional benchmarks. However, our investigation shows that despite these gains, their potential for real-world applications is restricted by their limited generalization capabilities to novel image domains. In this paper, we introduce OmniGlue, the first learnable image matcher that is designed with generalization as a core principle. OmniGlue leverages broad knowledge from a vision foundation model to guide the feature matching process, boosting generalization to domains not seen at training time. Additionally, we propose a novel keypoint position-guided attention mechanism which disentangles spatial and appearance information, leading to enhanced matching descriptors. We perform comprehensive experiments on a suite of $7$ datasets with varied image domains, including scene-level, object-centric and aerial images. OmniGlue's novel components lead to relative gains on unseen domains of $20.9\%$ with respect to a directly comparable reference model, while also outperforming the recent LightGlue method by $9.5\%$ relatively. Code and model can be found at https://hwjiang1510.github.io/OmniGlue
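One way to read the position-guided attention described above is sketched below. It assumes that positional encodings influence only the attention weights (through the queries and keys), while the values that update the descriptors are computed from appearance alone. The single-head form, the residual update, and all names (`position_guided_attention`, `pos_enc`, `w_q`, `w_k`, `w_v`) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def position_guided_attention(desc, pos_enc, w_q, w_k, w_v):
    """Single-head self-attention over the keypoints of one image.

    desc:    (N, D) appearance descriptors.
    pos_enc: (N, D) positional encodings of the keypoint locations (assumed given).
    w_q, w_k, w_v: (D, D) projection matrices (learned in practice, random here).
    """
    # Queries and keys see position, so spatial layout shapes the attention weights.
    q = (desc + pos_enc) @ w_q
    k = (desc + pos_enc) @ w_k
    # Values use appearance only, so no positional term is added to the output.
    v = desc @ w_v

    weights = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=1)
    return desc + weights @ v  # residual update of the descriptors

# Usage with random stand-in inputs.
rng = np.random.default_rng(0)
n, d = 5, 8
desc, pos_enc = rng.normal(size=(n, d)), rng.normal(size=(n, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
refined = position_guided_attention(desc, pos_enc, w_q, w_k, w_v)
```

In this sketch, position affects which keypoints attend to each other but is never summed into the refined descriptors, which is the sense of disentanglement assumed here.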
