LightGlue: Local Feature Matching at Light Speed
Philipp Lindenberger, Paul-Edouard Sarlin, Marc Pollefeys
TL;DR
LightGlue tackles the efficiency gap in deep sparse feature matching with an adaptive Transformer-based matcher that can halt computation early on easy image pairs. It replaces the heavy Sinkhorn-based optimization of SuperGlue with a lightweight, per-layer correspondence head and a confidence-driven exit mechanism, while using relative positional encodings and bidirectional attention to maintain accuracy. With synthetic homography pretraining followed by MegaDepth finetuning, LightGlue achieves state-of-the-art or competitive results at substantially reduced runtime, and ablations highlight the role of matchability, adaptivity, and deep supervision. The method is evaluated on HPatches, MegaDepth, and Aachen Day-Night, covering homography estimation, relative pose estimation, and visual localization, and the code is publicly available.
Abstract
We introduce LightGlue, a deep neural network that learns to match local features across images. We revisit multiple design decisions of SuperGlue, the state of the art in sparse matching, and derive simple but effective improvements. Cumulatively, they make LightGlue more efficient - in terms of both memory and computation, more accurate, and much easier to train. One key property is that LightGlue is adaptive to the difficulty of the problem: the inference is much faster on image pairs that are intuitively easy to match, for example because of a larger visual overlap or limited appearance change. This opens up exciting prospects for deploying deep matchers in latency-sensitive applications like 3D reconstruction. The code and trained models are publicly available at https://github.com/cvg/LightGlue.
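The adaptivity described in the abstract, which the TL;DR attributes to a confidence-driven exit, can be illustrated with a toy sketch. This is not the authors' implementation: the function name `run_with_early_exit`, the thresholds `conf_threshold` and `exit_ratio`, the layer count, and the toy `refine` layer are all assumptions chosen for illustration. A real matcher would predict confidences with a learned head rather than read them off the state.

```python
from typing import Callable, List, Sequence, Tuple

State = List[float]  # toy stand-in: one score per keypoint


def run_with_early_exit(
    layers: Sequence[Callable[[State], State]],
    confidence_fn: Callable[[State], List[float]],
    state: State,
    conf_threshold: float = 0.9,  # hypothetical per-point confidence cutoff
    exit_ratio: float = 0.95,     # hypothetical fraction of confident points needed to stop
) -> Tuple[State, int]:
    """Apply layers in order, stopping once enough points look confident."""
    depth = 0
    for depth, layer in enumerate(layers, start=1):
        state = layer(state)
        confidences = confidence_fn(state)
        confident = sum(c > conf_threshold for c in confidences)
        if confident >= exit_ratio * len(confidences):
            break  # early exit: the remaining layers are skipped
    return state, depth


def refine(state: State) -> State:
    """Toy layer (assumption): move every score halfway toward 1.0."""
    return [s + 0.5 * (1.0 - s) for s in state]


def identity_confidence(state: State) -> List[float]:
    """Toy confidence (assumption): reuse the scores as confidences."""
    return list(state)


layers = [refine] * 9  # hypothetical depth of the toy network

# An "easy" pair starts close to confident, a "hard" pair starts far from it.
_, easy_depth = run_with_early_exit(layers, identity_confidence, [0.85] * 4)
_, hard_depth = run_with_early_exit(layers, identity_confidence, [0.10] * 4)
print(easy_depth, hard_depth)
```

In this toy setting the easy pair stops after 1 layer and the hard pair after 4, so the compute spent scales with the difficulty of the input rather than with the full network depth.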
