R2D2: Repeatable and Reliable Detector and Descriptor

Jerome Revaud; Philippe Weinzaepfel; César De Souza; Noe Pion; Gabriela Csurka; Yohann Cabon; Martin Humenberger

TL;DR

R2D2 addresses the mismatch between repeatability and discriminativeness in local features: a salient region can be detected repeatably without being discriminative enough to match. It jointly learns a detector, a descriptor, and a reliability predictor. The network outputs a dense per-pixel descriptor together with two confidence maps, repeatability S and reliability R, and is trained with self-supervised losses (cosine similarity and peakiness for repeatability; AP-based ranking with κ-weighted reliability for descriptors). The method achieves state-of-the-art performance on HPatches and Aachen Day-Night, with robust matching under viewpoint and illumination changes, while keeping a compact descriptor size. Extensive ablations and a localization pipeline validate the approach and show its practical value for visual localization and 3D reconstruction.
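
To make the "κ-weighted reliability" term concrete, here is a minimal sketch in plain Python. It assumes the per-pixel loss has the form 1 - [AP · R + κ · (1 - R)], which is how we read the paper's formulation. The function name, the flat-list inputs, and the default κ = 0.5 are illustrative assumptions, not the authors' code.

```python
def reliability_weighted_ap_loss(ap, reliability, kappa=0.5):
    """Mean over pixels of 1 - [AP * R + kappa * (1 - R)].

    ap          -- per-pixel average precision values in [0, 1]
    reliability -- per-pixel predicted reliability R in [0, 1]
    kappa       -- AP level that separates "reliable" from "unreliable"
                   pixels (default is an assumption made for this sketch)
    """
    if len(ap) != len(reliability):
        raise ValueError("ap and reliability must have the same length")
    if not ap:
        raise ValueError("inputs must be non-empty")
    total = 0.0
    for a, r in zip(ap, reliability):
        # R = 1: the pixel pays the full AP loss (1 - AP).
        # R = 0: the pixel pays a fixed cost (1 - kappa), whatever its AP.
        total += 1.0 - (a * r + kappa * (1.0 - r))
    return total / len(ap)


if __name__ == "__main__":
    # A hard-to-match pixel (AP = 0.2) is cheaper when declared unreliable.
    print(reliability_weighted_ap_loss([0.2], [1.0]))
    print(reliability_weighted_ap_loss([0.2], [0.0]))
```

Under this form, lowering R reduces the loss only where AP falls below κ, so the network is pushed to mark ambiguous regions as unreliable rather than to force a discriminative descriptor there.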

Abstract

Interest point detection and local feature description are fundamental steps in many computer vision applications. Classical methods for these tasks are based on a detect-then-describe paradigm where separate handcrafted methods are used to first identify repeatable keypoints and then represent them with a local descriptor. Neural networks trained with metric learning losses have recently caught up with these techniques, focusing on learning repeatable saliency maps for keypoint detection and learning descriptors at the detected keypoint locations. In this work, we argue that salient regions are not necessarily discriminative, and therefore can harm the performance of the description. Furthermore, we claim that descriptors should be learned only in regions for which matching can be performed with high confidence. We thus propose to jointly learn keypoint detection and description together with a predictor of the local descriptor discriminativeness. This allows us to avoid ambiguous areas and leads to reliable keypoint detections and descriptions. Our detection-and-description approach, trained with self-supervision, can simultaneously output sparse, repeatable and reliable keypoints that outperform state-of-the-art detectors and descriptors on the HPatches dataset. It also establishes a record on the recently released Aachen Day-Night localization dataset.
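
As an illustration of how sparse keypoints might be read off two dense confidence maps, the sketch below keeps 3x3 local maxima of a repeatability map that also pass a reliability threshold, and ranks them by the product of the two values. The function name, the nested-list inputs, the threshold defaults, and the 3x3 neighbourhood are all assumptions made for this example rather than details stated in the abstract.

```python
def select_keypoints(repeatability, reliability, rep_thr=0.7, rel_thr=0.7, top_k=None):
    """Return [(score, row, col)] sorted by decreasing score.

    repeatability, reliability -- 2D lists of equal shape with values in [0, 1]
    rep_thr, rel_thr           -- minimum values to keep a pixel (assumed defaults)
    top_k                      -- optional cap on the number of keypoints
    """
    h = len(repeatability)
    w = len(repeatability[0]) if h else 0
    keypoints = []
    for y in range(h):
        for x in range(w):
            s = repeatability[y][x]
            r = reliability[y][x]
            if s < rep_thr or r < rel_thr:
                continue  # not repeatable enough, or descriptor not trusted here
            # 3x3 local-maximum test; border pixels use the neighbours that exist.
            is_max = all(
                repeatability[ny][nx] <= s
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            )
            if is_max:
                keypoints.append((s * r, y, x))
    keypoints.sort(reverse=True)
    return keypoints if top_k is None else keypoints[:top_k]
```

Note that tied neighbours on a plateau all count as maxima in this sketch; a real pipeline would likely add tie-breaking or non-maximum suppression over a larger window.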
