DaD: Distilled Reinforcement Learning for Diverse Keypoint Detection
Johan Edstedt, Georg Bökman, Mårten Wadenbäck, Michael Felsberg
TL;DR
DaD tackles descriptor-free keypoint detection for Structure-from-Motion by training a keypoint detector with reinforcement learning and a balanced top-K sampling strategy. Training yields two emergent detectors, one that fires on light keypoints and one that fires on dark keypoints. DaD distills the pointwise maximum of the two into a single diverse, descriptor-free detector. Across MegaDepth1500, ScanNet1500, and HPatches, DaD achieves state-of-the-art performance, especially in few-keypoint scenarios, without relying on SfM tracks or descriptors. The method offers a scalable, self-supervised solution that strengthens two-view reconstruction pipelines while addressing the bias of single-type detectors.
Abstract
Keypoints are what enable Structure-from-Motion (SfM) systems to scale to thousands of images. However, designing a keypoint detection objective is a non-trivial task, as SfM is non-differentiable. Typically, an auxiliary objective involving a descriptor is optimized. This, however, induces a dependency on the descriptor, which is undesirable. In this paper we propose a fully self-supervised and descriptor-free objective for keypoint detection, through reinforcement learning. To ensure training does not degenerate, we leverage a balanced top-K sampling strategy. While this already produces competitive models, we find that two qualitatively different types of detectors emerge, which are only able to detect light and dark keypoints respectively. To remedy this, we train a third detector, DaD, that minimizes the Kullback-Leibler divergence to the pointwise maximum of the light and dark detectors. Our approach significantly improves upon the state of the art (SotA) across a range of benchmarks. Code and model weights are publicly available at https://github.com/parskatt/dad
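To make the distillation step concrete, the following is a minimal NumPy sketch of a Kullback-Leibler loss against the pointwise maximum of two teacher detectors. It is an illustration under stated assumptions, not the released implementation: it assumes each detector outputs an (H, W) logit map, that the maximum is taken over per-image softmax probabilities, and that the result is renormalized to sum to one. The function names (`log_softmax_2d`, `max_teacher_target`, `distillation_kl`) are hypothetical and do not come from the DaD codebase.

```python
import numpy as np


def log_softmax_2d(logits):
    """Log-softmax over all pixels of an (H, W) score map."""
    flat = logits.reshape(-1)
    flat = flat - flat.max()  # shift for numerical stability
    log_probs = flat - np.log(np.exp(flat).sum())
    return log_probs.reshape(logits.shape)


def max_teacher_target(light_logits, dark_logits):
    """Pointwise maximum of the two teacher distributions.

    Assumption: the maximum is taken over probabilities and then
    renormalized so the target is a valid distribution over pixels.
    """
    p_light = np.exp(log_softmax_2d(light_logits))
    p_dark = np.exp(log_softmax_2d(dark_logits))
    target = np.maximum(p_light, p_dark)
    return target / target.sum()


def distillation_kl(student_logits, light_logits, dark_logits, eps=1e-12):
    """KL(target || student), summed over all pixels of one image."""
    target = max_teacher_target(light_logits, dark_logits)
    student_log_probs = log_softmax_2d(student_logits)
    return float(np.sum(target * (np.log(target + eps) - student_log_probs)))


# Toy usage with random score maps standing in for detector outputs.
rng = np.random.default_rng(0)
light = rng.normal(size=(8, 8))
dark = rng.normal(size=(8, 8))
student = rng.normal(size=(8, 8))
loss = distillation_kl(student, light, dark)
```

Under these assumptions, the loss is non-negative and approaches zero as the student's distribution matches the fused teacher target, so a student trained this way is pushed to respond wherever either teacher does.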
