Learning to Make Keypoints Sub-Pixel Accurate
Shinjeong Kim, Marc Pollefeys, Daniel Barath
TL;DR
This work tackles sub-pixel keypoint localization in learned detectors by introducing a detector-agnostic Keypoint Refinement module that learns per-keypoint offsets to achieve sub-pixel precision. The refinement uses patch-based CNNs and a differentiable SoftArgMax to produce displacement vectors, trained with a calibrated epipolar-loss objective that emphasizes multi-view geometric consistency. Across multiple datasets and detectors, the method yields consistent improvements in relative pose and fundamental matrix estimation, as well as keypoint localization, with only around 7 ms additional latency. The approach demonstrates broad generalizability across detectors and matchers and remains computationally lightweight, making it a practical drop-in enhancement for modern feature pipelines.
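To make the SoftArgMax step concrete, the following is a minimal sketch of how a differentiable soft-argmax can turn a small score patch centered on a detected keypoint into a sub-pixel displacement. It is not the authors' implementation: the function name `soft_argmax_offset`, the 3x3 patch, the hand-written scores, and the `temperature` parameter are all assumptions made for illustration, and the CNN that would produce the score patch is omitted.

```python
import math

def soft_argmax_offset(scores, temperature=1.0):
    """Expected (dx, dy) pixel offset under softmax(scores / temperature).

    `scores` is a square list-of-lists patch centered on the detected
    keypoint (hypothetical input; a CNN would normally produce it).
    """
    size = len(scores)
    half = (size - 1) / 2.0  # offset of the patch center from index 0
    flat = [s / temperature for row in scores for s in row]
    peak = max(flat)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in flat]
    total = sum(weights)
    dx = dy = 0.0
    for idx, w in enumerate(weights):
        row, col = divmod(idx, size)
        dx += (col - half) * w / total
        dy += (row - half) * w / total
    return dx, dy

# Illustrative patch: the response is shared by the center and its right neighbor,
# so the expected offset leans right by a fraction of a pixel.
patch = [
    [0.0, 0.0, 0.0],
    [0.0, 2.0, 2.0],
    [0.0, 0.0, 0.0],
]
dx, dy = soft_argmax_offset(patch)
flat_dx, flat_dy = soft_argmax_offset([[1.0] * 3 for _ in range(3)])
```

Because the offset is a softmax-weighted average rather than a hard argmax, it varies smoothly with the scores, which is what allows gradients from a geometric loss to reach the network that predicts them.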
Abstract
This work addresses the challenge of sub-pixel accuracy in detecting 2D local features, a cornerstone problem in computer vision. Despite the advancements brought by neural network-based methods like SuperPoint and ALIKED, these modern approaches lag behind classical ones such as SIFT in keypoint localization accuracy due to their lack of sub-pixel precision. We propose a novel network that enhances any detector with sub-pixel precision by learning an offset vector for detected features, thereby eliminating the need for designing specialized sub-pixel accurate detectors. The offsets are learned by directly minimizing test-time evaluation metrics such as relative pose error. Through extensive testing with both nearest-neighbor matching and the recent LightGlue matcher across various real-world datasets, our method consistently outperforms existing methods in accuracy. Moreover, it adds only around 7 ms to the runtime of a given detector. The code is available at https://github.com/KimSinjeong/keypt2subpx.
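As a rough illustration of the kind of geometric supervision the TL;DR refers to, the sketch below computes a symmetric epipolar distance for a matched keypoint pair before and after a small offset is applied. The exact loss used in the paper is not given in this section, so the symmetric point-to-line form, the function names, the toy matrix `F` (pure sideways translation with identity intrinsics, so the essential and fundamental matrices coincide), and the coordinates are all assumptions.

```python
import math

def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]

def transpose(m):
    """Transpose a 3x3 matrix."""
    return [[m[c][r] for c in range(3)] for r in range(3)]

def point_line_distance(line, point):
    """Distance from a 2D point to the line a*x + b*y + c = 0."""
    a, b, c = line
    return abs(a * point[0] + b * point[1] + c) / math.hypot(a, b)

def symmetric_epipolar_distance(F, p1, p2):
    """Sum of the distances of p1 and p2 to their epipolar lines."""
    line_in_2 = mat_vec(F, [p1[0], p1[1], 1.0])             # F @ x1
    line_in_1 = mat_vec(transpose(F), [p2[0], p2[1], 1.0])  # F^T @ x2
    return point_line_distance(line_in_1, p1) + point_line_distance(line_in_2, p2)

# Toy geometry (assumed): camera translated along x, so matches share a row.
F = [[0.0, 0.0, 0.0],
     [0.0, 0.0, -1.0],
     [0.0, 1.0, 0.0]]

p1 = (10.0, 20.0)
detected = (14.0, 21.0)   # pixel-accurate detection, one row off
offset = (0.2, -0.7)      # hypothetical predicted sub-pixel offset
refined = (detected[0] + offset[0], detected[1] + offset[1])

error_before = symmetric_epipolar_distance(F, p1, detected)
error_after = symmetric_epipolar_distance(F, p1, refined)
```

In this toy setup the offset moves the second keypoint closer to the epipolar line of the first, so the error drops. A training objective of this kind would favor offsets that reduce such errors across many matched pairs.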
