SuperPoint: Self-Supervised Interest Point Detection and Description
Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich
TL;DR
The paper tackles the need for robust, repeatable interest-point detection and description in multi-view geometry by introducing SuperPoint, a fully-convolutional network that jointly detects keypoints and computes 256-d descriptors in a single forward pass. It bootstraps from synthetic data (a base detector, MagicPoint, trained on Synthetic Shapes) and uses Homographic Adaptation to self-label unlabeled real images, enabling strong synthetic-to-real transfer. Key contributions include a two-headed architecture with a shared encoder, a self-supervised training pipeline, and state-of-the-art homography estimation on HPatches at real-time speed, with particularly strong results under illumination changes. The work paves the way for learned, end-to-end feature extraction for SLAM, SfM, and image matching in diverse environments.
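The detector half of the two-headed design can be illustrated by its decoding step. As described in the SuperPoint paper, the detector head predicts 65 logits for each 8x8 cell of the downsampled feature map: 64 for the pixel positions in the cell plus one "no interest point" dustbin. A channel-wise softmax, dropping the dustbin, and a depth-to-space reshape then yield a full-resolution keypoint heatmap. The sketch below is illustrative only: it assumes row-major ordering of the 64 channels within a cell, uses plain nested lists instead of tensors, and `decode_detector_head` is a hypothetical name.

```python
import math
from typing import List


def decode_detector_head(logits: List[List[List[float]]], cell: int = 8) -> List[List[float]]:
    """Turn (Hc, Wc, cell*cell + 1) detector logits into an (Hc*cell, Wc*cell) heatmap.

    Assumptions (illustrative, not the authors' code): the last channel is the
    "no interest point" dustbin, and the first cell*cell channels map to the
    cell's pixels in row-major order.
    """
    hc, wc = len(logits), len(logits[0])
    heat = [[0.0] * (wc * cell) for _ in range(hc * cell)]
    for i in range(hc):
        for j in range(wc):
            z = logits[i][j]
            if len(z) != cell * cell + 1:
                raise ValueError("expected cell*cell + 1 channels per cell")
            # Softmax over all channels, including the dustbin;
            # subtracting the max keeps exp() numerically stable.
            m = max(z)
            exps = [math.exp(v - m) for v in z]
            total = sum(exps)
            # Drop the dustbin and scatter the remaining probabilities
            # back into the cell's pixels (depth-to-space).
            for k in range(cell * cell):
                r, c = divmod(k, cell)
                heat[i * cell + r][j * cell + c] = exps[k] / total
    return heat
```

For example, with `cell=2` a single cell has 5 logits; all-zero logits give each of its 4 pixels probability 0.2, and the remaining 0.2 goes to the dustbin. In a full pipeline the heatmap would then be thresholded and non-maximum suppressed, which this sketch omits.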
Abstract
This paper presents a self-supervised framework for training interest point detectors and descriptors suitable for a large number of multiple-view geometry problems in computer vision. As opposed to patch-based neural networks, our fully-convolutional model operates on full-sized images and jointly computes pixel-level interest point locations and associated descriptors in one forward pass. We introduce Homographic Adaptation, a multi-scale, multi-homography approach for boosting interest point detection repeatability and performing cross-domain adaptation (e.g., synthetic-to-real). Our model, when trained on the MS-COCO generic image dataset using Homographic Adaptation, is able to repeatedly detect a much richer set of interest points than the initial pre-adapted deep model and any other traditional corner detector. The final system gives rise to state-of-the-art homography estimation results on HPatches when compared to LIFT, SIFT and ORB.
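Homographic Adaptation can be sketched as an average of detector responses over warped copies of the same image. The paper aggregates them as F_hat(I) = (1/N) * sum_i H_i^{-1}( f(H_i(I)) ). The code below follows that shape: it warps the image by each homography H_i, runs a detector f, warps the response back with H_i^{-1}, and averages. The toy detector, nearest-neighbour resampling, zero fill outside the image, and the function names (`inv3`, `warp`, `homographic_adaptation`) are illustrative assumptions. The paper's random sampling of homographies and its later thresholding of the aggregated heatmap are not reproduced.

```python
from typing import Callable, List, Sequence, Tuple

Grid = List[List[float]]
Mat3 = Sequence[Sequence[float]]


def inv3(m: Mat3) -> List[List[float]]:
    """Invert a 3x3 matrix via its adjugate (assumes it is non-singular)."""
    (a, b, c), (d, e, f), (g, h, k) = m
    det = a * (e * k - f * h) - b * (d * k - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("homography is singular")
    adj = [[e * k - f * h, c * h - b * k, b * f - c * e],
           [f * g - d * k, a * k - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[v / det for v in row] for row in adj]


def apply_h(hom: Mat3, x: float, y: float) -> Tuple[float, float]:
    """Map pixel (x, y) through a homography in homogeneous coordinates."""
    u = hom[0][0] * x + hom[0][1] * y + hom[0][2]
    v = hom[1][0] * x + hom[1][1] * y + hom[1][2]
    w = hom[2][0] * x + hom[2][1] * y + hom[2][2]
    return u / w, v / w


def warp(img: Grid, hom: Mat3) -> Grid:
    """Warp img by hom: output pixel p takes img at hom^{-1}(p) (nearest neighbour).

    Pixels whose source falls outside the image are set to 0 (an assumption).
    """
    rows, cols = len(img), len(img[0])
    back = inv3(hom)
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            sx, sy = apply_h(back, x, y)
            ix, iy = round(sx), round(sy)
            if 0 <= ix < cols and 0 <= iy < rows:
                out[y][x] = img[iy][ix]
    return out


def homographic_adaptation(img: Grid, detector: Callable[[Grid], Grid],
                           homographies: Sequence[Mat3]) -> Grid:
    """Average H_i^{-1}(detector(H_i(img))) over all given homographies."""
    rows, cols = len(img), len(img[0])
    acc = [[0.0] * cols for _ in range(rows)]
    for hom in homographies:
        response = detector(warp(img, hom))       # f(H_i(I))
        unwarped = warp(response, inv3(hom))      # H_i^{-1}(f(H_i(I)))
        for y in range(rows):
            for x in range(cols):
                acc[y][x] += unwarped[y][x]
    n = len(homographies)
    return [[v / n for v in row] for row in acc]
```

Dividing by N, rather than by the number of warps that actually cover each pixel, follows the formula literally, so pixels near the border are biased toward zero. An implementation might instead track a validity mask and normalize per pixel.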
