RaCo: Ranking and Covariance for Practical Learned Keypoints

Abhiram Shenoi, Philipp Lindenberger, Paul-Edouard Sarlin, Marc Pollefeys

TL;DR

This paper introduces RaCo, a lightweight neural network designed to learn robust and versatile keypoints suitable for a variety of 3D computer vision tasks, and demonstrates state-of-the-art performance in keypoint repeatability and two-view matching, particularly under large in-plane rotations.

Abstract

This paper introduces RaCo, a lightweight neural network designed to learn robust and versatile keypoints suitable for a variety of 3D computer vision tasks. The model integrates three key components: a repeatable keypoint detector, a differentiable ranker to maximize matches with a limited number of keypoints, and a covariance estimator to quantify spatial uncertainty in metric scale. Trained on perspective image crops only, RaCo operates without the need for covisible image pairs. It achieves strong rotational robustness through extensive data augmentation, even without the use of computationally expensive equivariant network architectures. The method is evaluated on several challenging datasets, where it demonstrates state-of-the-art performance in keypoint repeatability and two-view matching, particularly under large in-plane rotations. Ultimately, RaCo provides an effective and simple strategy to independently estimate keypoint ranking and metric covariance without additional labels, detecting interpretable and repeatable interest points. The code is available at https://github.com/cvg/RaCo.

Paper Structure (65 sections, 7 equations, 16 figures, 4 tables)

Figures (16)

  • Figure 1: Practical interest point detection. RaCo detects repeatable and interpretable corners (left), learned from perspective image crops. A dedicated ranking head (middle) maximizes the downstream accuracy-speed trade-off by ranking matchable points higher. The estimated 2D metric covariances (right) describe the keypoints' spatial uncertainty in pixels (colored by the angle of the first eigenvector, and whitened where the variance is large).
  • Figure 2: Overview. Our method consists of three branches: i) a detector head that produces a scoremap with repeatable keypoints, ii) a covariance head that outputs the 2D spatial uncertainty in pixels, and iii) a ranker module that outputs soft keypoint scores which maximize the repeatability at different keypoint budgets. The detector and covariance heads share a lightweight backbone.
  • Figure 3: Keypoint ranking. Inconsistent keypoint ranking between images (left) results in excessive match filtering when the number of keypoints per image is restricted (small budget, middle). Our ranking module keeps repeatable points at the top of the list and yields similar ranks for corresponding points (right).
  • Figure 4: Covariance supervision. We train our covariance estimator by maximizing the log-likelihood of the reprojection error between corresponding keypoints. For corresponding keypoints $\mathbf{x}^i_A$ in view $A$ and $\mathbf{x}^i_B$ in view $B$, the reprojection error $\mathbf{e}^i_{B\rightarrow A} = \mathbf{H}_{B\rightarrow A}(\mathbf{x}^i_B) - \mathbf{x}^i_A$ is modeled as a zero-mean Gaussian: $\mathbf{e}^i_{B\rightarrow A} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}^i_A + \mathbf{J}^i_{B\rightarrow A} \boldsymbol{\Sigma}^i_B (\mathbf{J}^i_{B\rightarrow A})^\top)$, where $\mathbf{J}^i_{B\rightarrow A}$ is the Jacobian of the homography evaluated at $\mathbf{x}^i_B$. The resulting covariance estimates (right) are strongly anisotropic (colored by the angle of the covariance's first eigenvector) and are large in areas with low texture (illustrated by opacity).
  • Figure 5: Rotation evaluation on HPatches (Balntas et al., 2017). We plot the repeatability@2px over the rotation angle between image pairs. SIFT (Lowe, 2004) is more robust than any learned keypoint detector, but our improved rotation augmentations result in state-of-the-art rotational robustness without requiring specialized model architectures.
  • ...and 11 more figures
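
The covariance supervision described in the Figure 4 caption can be illustrated with a small NumPy sketch. It evaluates the negative log-likelihood of one reprojection error under the zero-mean Gaussian model from the caption, with the covariance of view B propagated through the Jacobian of the homography. This is only a sketch under stated assumptions, not the authors' implementation: the function names (`apply_homography`, `homography_jacobian`, `reprojection_nll`) are hypothetical, a single correspondence is handled at a time, and the actual training loss in RaCo may differ in normalization or weighting.

```python
import numpy as np


def apply_homography(H, x):
    """Map a 2D point x through a 3x3 homography H (hypothetical helper)."""
    p = H @ np.array([x[0], x[1], 1.0])
    return p[:2] / p[2]


def homography_jacobian(H, x):
    """2x2 Jacobian of the mapping x -> H(x), evaluated at x."""
    u, v, w = H @ np.array([x[0], x[1], 1.0])
    # Quotient rule: d(u/w)/dx = (du/dx * w - u * dw/dx) / w^2, same for v.
    return np.array([
        H[0, :2] * w - u * H[2, :2],
        H[1, :2] * w - v * H[2, :2],
    ]) / w**2


def reprojection_nll(H_BA, x_A, x_B, cov_A, cov_B):
    """Gaussian negative log-likelihood of the reprojection error B -> A."""
    e = apply_homography(H_BA, x_B) - x_A       # e = H(x_B) - x_A
    J = homography_jacobian(H_BA, x_B)
    S = cov_A + J @ cov_B @ J.T                 # Sigma_A + J Sigma_B J^T
    maha = e @ np.linalg.solve(S, e)            # squared Mahalanobis distance
    _, logdet = np.linalg.slogdet(S)
    return 0.5 * (maha + logdet + 2.0 * np.log(2.0 * np.pi))


# Usage: identical points under an identity homography, isotropic covariances.
nll = reprojection_nll(np.eye(3), np.zeros(2), np.zeros(2),
                       0.5 * np.eye(2), 0.5 * np.eye(2))
```

Minimizing this quantity over predicted covariances is equivalent to maximizing the log-likelihood mentioned in the caption: a large error pushes the predicted covariance up, while the log-determinant term penalizes covariances that are larger than the errors warrant.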