
Learning to Make Keypoints Sub-Pixel Accurate

Shinjeong Kim, Marc Pollefeys, Daniel Barath

TL;DR

This work tackles sub-pixel keypoint localization in learned detectors by introducing a detector-agnostic Keypoint Refinement module that learns per-keypoint offsets to achieve sub-pixel precision. The refinement uses patch-based CNNs and a differentiable SoftArgMax to produce displacement vectors, trained with a calibrated epipolar-loss objective that emphasizes multi-view geometric consistency. Across multiple datasets and detectors, the method yields consistent improvements in relative pose and fundamental matrix estimation, as well as keypoint localization, with only around 7 ms additional latency. The approach demonstrates broad generalizability across detectors and matchers and remains computationally lightweight, making it a practical drop-in enhancement for modern feature pipelines.
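The TL;DR names an epipolar loss but this page does not give its formula. As an assumption, the sketch below uses the symmetric epipolar distance, a standard way to score how well refined matches satisfy the two-view constraint $x_2^\top F x_1 = 0$, with $F$ built from known intrinsics and a ground-truth pose. It is a NumPy illustration only: training would need the same computation in a differentiable framework, and the paper's actual loss may differ. All function names are hypothetical.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def fundamental_from_pose(K1, K2, R, t):
    """F = K2^{-T} [t]_x R K1^{-1}, for the pose X2 = R @ X1 + t."""
    E = skew(t) @ R  # essential matrix
    return np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)

def project(K, X):
    """Pinhole projection of N x 3 camera-frame points to N x 2 pixels."""
    p = X @ K.T
    return p[:, :2] / p[:, 2:3]

def symmetric_epipolar_distance(F, pts1, pts2):
    """Mean symmetric point-to-epipolar-line distance (pixels) over N matches."""
    ones = np.ones((len(pts1), 1))
    x1 = np.hstack([pts1, ones])  # N x 3 homogeneous points, image 1
    x2 = np.hstack([pts2, ones])  # N x 3 homogeneous points, image 2
    l2 = x1 @ F.T                 # epipolar lines F x1 in image 2
    l1 = x2 @ F                   # epipolar lines F^T x2 in image 1
    residual = np.abs(np.sum(x2 * l2, axis=1))  # |x2^T F x1|
    d2 = residual / np.linalg.norm(l2[:, :2], axis=1)
    d1 = residual / np.linalg.norm(l1[:, :2], axis=1)
    return float(np.mean(0.5 * (d1 + d2)))

# Synthetic two-view setup with exact correspondences.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
a = 0.1  # small rotation about the y axis, in radians
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([1.0, 0.0, 0.1])
rng = np.random.default_rng(0)
X1 = rng.uniform([-2.0, -2.0, 4.0], [2.0, 2.0, 8.0], size=(20, 3))
pts1, pts2 = project(K, X1), project(K, X1 @ R.T + t)
F = fundamental_from_pose(K, K, R, t)
print(symmetric_epipolar_distance(F, pts1, pts2))  # close to 0 for exact matches
```

Shifting `pts2` by a couple of pixels perpendicular to the epipolar lines raises the distance accordingly, which is the signal a refinement network trained on such a loss would reduce.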

Abstract

This work addresses the challenge of sub-pixel accuracy in detecting 2D local features, a cornerstone problem in computer vision. Despite the advances brought by neural network-based methods such as SuperPoint and ALIKED, these modern approaches lag behind classical ones such as SIFT in keypoint localization accuracy because they lack sub-pixel precision. We propose a novel network that enhances any detector with sub-pixel precision by learning an offset vector for each detected feature, thereby eliminating the need to design specialized sub-pixel accurate detectors. The offsets are optimized to directly minimize test-time evaluation metrics such as relative pose error. Through extensive testing with both nearest-neighbor matching and the recent LightGlue matcher across various real-world datasets, our method consistently outperforms existing methods in accuracy. Moreover, it adds only around 7 ms to the runtime of the underlying detector. The code is available at https://github.com/KimSinjeong/keypt2subpx .
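The abstract evaluates with relative pose error. A common convention in two-view benchmarks, assumed here rather than taken from this paper, measures the angular rotation error and the angular error of the translation direction (scale is unobservable from two views) and reports their maximum; the `min(err, 180 - err)` step for the translation sign follows widely used evaluation scripts. A minimal sketch with hypothetical function names:

```python
import numpy as np

def rotation_error_deg(R_gt, R_est):
    """Geodesic angle in degrees between two rotation matrices."""
    cos = (np.trace(R_gt.T @ R_est) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def translation_error_deg(t_gt, t_est):
    """Angle in degrees between translation directions. Two-view pose fixes t
    only up to scale (and, by this convention, sign), so min(err, 180 - err)."""
    t_gt, t_est = np.asarray(t_gt, float), np.asarray(t_est, float)
    cos = t_gt @ t_est / (np.linalg.norm(t_gt) * np.linalg.norm(t_est))
    err = float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
    return min(err, 180.0 - err)

def relative_pose_error_deg(R_gt, t_gt, R_est, t_est):
    """Pose error as the maximum of rotation and translation angular errors."""
    return max(rotation_error_deg(R_gt, R_est), translation_error_deg(t_gt, t_est))

# A 5 degree rotation error about z combined with a 45 degree translation error.
a = np.radians(5.0)
R_est = np.array([[np.cos(a), -np.sin(a), 0.0],
                  [np.sin(a), np.cos(a), 0.0],
                  [0.0, 0.0, 1.0]])
print(relative_pose_error_deg(np.eye(3), [1.0, 0.0, 0.0], R_est, [1.0, 1.0, 0.0]))  # about 45
```

Taking the maximum means a pose only counts as accurate when both rotation and translation direction are accurate, so better-localized keypoints must improve both to move this metric.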

Paper Structure (20 sections, 8 equations, 7 figures, 9 tables)

Figures (7)

  • Figure 1: An overview of the proposed sub-pixel refinement method. Given a pair of images, local features are detected, described, and matched to find correspondences. For each match, image patches centered at the keypoints are extracted. Our proposed Keypoint Refinement module (Fig. 2) takes the patches and descriptors and refines the keypoint locations. During training, the refined keypoint matches are used to compute the loss that optimizes the Keypoint Refinement module. During evaluation, the relative pose between the two views is estimated with robust estimators.
  • Figure 2: Visualization of how our Keypoint Refinement module works. For detectors that produce dense score maps as an intermediate representation, patches of the score map are concatenated to the image patches. A feature map is extracted from each patch with a small convolutional neural network (CNN), and its dot product with the average of the matched descriptor pair yields a score map. Applying SoftArgMax to this score map gives the sub-pixel displacement of each keypoint. Note that the two CNNs of the Keypoint Refinement module share weights.
  • Figure 3: Histograms of the length and orientation of offset vectors predicted for SuperPoint across datasets. The top row shows the distribution of offset lengths: displacements are largest on MegaDepth, suggesting that initial keypoint localization is least accurate there and that sub-pixel refinement has the most potential to improve accuracy. Lengths decrease from left to right, with minimal adjustments required on ScanNet. The bottom row shows the directional histogram of predicted offsets, which is roughly uniform but with a tendency toward vertical alignment and a rightward bias, likely reflecting dataset-specific characteristics. Bins for offsets larger than 2 px are omitted because they are negligible.
  • Figure 4: Image pairs from the KITTI (top), ScanNet (bottom left), and MegaDepth (bottom right) datasets showing inliers among the refined matches. For each pair, the upper row shows the matches and the lower row shows the image patches fed to our Keypoint Refinement module, with initial points in blue and refined points in red. The refinements are larger on the MegaDepth pair, consistent with the analysis in Fig. 3. Nevertheless, on both KITTI and ScanNet the refined features appear visually better localized.
  • Figure 5: Geometric inlier correspondences on the MegaDepth-1500 benchmark [sun2021loftr] for, from left to right, our method, Reinforced SuperPoint, and SuperPoint. The first and second rows show successful cases, and the last row shows a failure case. Our method consistently increases the number of inliers, but this does not always lead to more accurate relative poses.
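Figures 1 and 2 describe the refinement head: patches centered on matched keypoints, per-pixel features scored against the average of the two matched descriptors, and SoftArgMax to turn the score map into a sub-pixel offset. The NumPy sketch below mirrors that data flow under simplifying assumptions: the small shared CNN is replaced by a precomputed dense feature map, and the patch size of 11 and temperature of 0.1 are hypothetical defaults, not values from the paper. The helper names `extract_patch`, `soft_argmax_offset`, and `refine_match` are illustrative.

```python
import numpy as np

def extract_patch(arr, kpt, size):
    """Crop a size x size window (size odd) centred on the rounded keypoint
    (x, y). Image borders are handled by edge-replication padding."""
    r = size // 2
    x, y = int(round(kpt[0])), int(round(kpt[1]))
    pad = [(r, r), (r, r)] + [(0, 0)] * (arr.ndim - 2)
    padded = np.pad(arr, pad, mode="edge")
    # After padding by r, original pixel (y, x) sits at (y + r, x + r),
    # so rows y..y+size-1 and columns x..x+size-1 are centred on it.
    return padded[y:y + size, x:x + size]

def soft_argmax_offset(score, temperature=1.0):
    """Differentiable SoftArgMax: expected (dx, dy) under softmax(score / T),
    measured in pixels from the patch centre."""
    h, w = score.shape
    s = score / temperature
    p = np.exp(s - s.max())  # subtract max for numerical stability
    p /= p.sum()
    ys, xs = np.mgrid[0:h, 0:w]
    dx = np.sum(p * xs) - (w - 1) / 2.0
    dy = np.sum(p * ys) - (h - 1) / 2.0
    return np.array([dx, dy])

def refine_match(feat1, feat2, kpt1, kpt2, desc1, desc2, size=11, temperature=0.1):
    """Refine one match. feat1/feat2 are H x W x C dense feature maps standing
    in for the per-patch CNN output (a simplification assumed here)."""
    d = 0.5 * (desc1 + desc2)  # average of the matched descriptor pair
    refined = []
    for feat, kpt in ((feat1, kpt1), (feat2, kpt2)):
        patch = extract_patch(feat, kpt, size)  # size x size x C
        score = patch @ d                       # per-pixel dot product: size x size
        refined.append(np.round(kpt) + soft_argmax_offset(score, temperature))
    return refined[0], refined[1]

# Toy check: one feature "peak" per image at a known offset from each keypoint.
d = np.array([1.0, 0.0, 0.0, 0.0])
feat1 = np.zeros((32, 32, 4))
feat1[12, 21] = d  # row y=12, column x=21
feat2 = np.zeros((32, 32, 4))
feat2[5, 7] = d    # row y=5, column x=7
r1, r2 = refine_match(feat1, feat2, np.array([20.0, 10.0]), np.array([8.0, 6.0]),
                      d, d, temperature=0.01)
print(r1, r2)  # approximately [21. 12.] and [7. 5.]
```

Because SoftArgMax is an expectation over pixel coordinates rather than a hard argmax, the offset varies smoothly with the scores, which is what allows a loss on the refined keypoints to back-propagate into the feature extractor. A lower temperature sharpens the distribution toward the hard argmax.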