SELC: Self-Supervised Efficient Local Correspondence Learning for Low Quality Images
Yuqing Wang, Yan Wang, Hailiang Tang, Xiaoji Niu
TL;DR
SELC addresses the need for accurate yet efficient feature matching in SLAM by proposing a lightweight, patch-based CNN that learns dense local descriptors without manual annotations. It integrates traditional tracking signals via a hybrid self-supervision paradigm and enforces both intra-frame and inter-frame consistency through a combination of keypoint, heat-map, and dense descriptor losses, plus single-frame and multi-frame consistency losses. The approach yields strong short-term accuracy and mitigates long-term feature drift while remaining efficient: it matches the computational efficiency of state-of-the-art learned methods at low resolutions and, through pyramid inference, is 2-10x more efficient at high resolutions. Evaluations on MegaDepth, KITTI, HPatches, and EuRoC show competitive repeatability and the largest efficiency gains on high-resolution imagery, making the method well-suited for resource-constrained visual localization and SLAM pipelines.
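As a rough illustration of how the five loss terms named above might be combined into one training objective, the sketch below uses a plain weighted sum. This is an assumption made for illustration only: the summary does not say how the terms are balanced, and the names `LossWeights` and `total_loss`, the dictionary keys, and the default weights of 1.0 are all hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class LossWeights:
    # Hypothetical weights: the source does not state how the terms are balanced.
    keypoint: float = 1.0
    heatmap: float = 1.0
    descriptor: float = 1.0
    single_frame: float = 1.0
    multi_frame: float = 1.0


def total_loss(terms: Mapping[str, float], w: LossWeights = LossWeights()) -> float:
    """Weighted sum of the five loss terms (assumed form, not the paper's)."""
    return (
        w.keypoint * terms["keypoint"]            # keypoint loss
        + w.heatmap * terms["heatmap"]            # heat-map loss
        + w.descriptor * terms["descriptor"]      # dense descriptor loss
        + w.single_frame * terms["single_frame"]  # single-frame consistency loss
        + w.multi_frame * terms["multi_frame"]    # multi-frame consistency loss
    )


# Example with placeholder per-term values (not real training numbers).
example_terms = {
    "keypoint": 1.0,
    "heatmap": 2.0,
    "descriptor": 3.0,
    "single_frame": 4.0,
    "multi_frame": 5.0,
}
example_total = total_loss(example_terms)
```

Raising the hypothetical `multi_frame` weight would put more emphasis on consistency across frames, which is the term most directly tied to drift in long-term tracking.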
Abstract
Accurate and stable feature matching is critical for computer vision tasks, particularly in applications such as Simultaneous Localization and Mapping (SLAM). While recent learning-based feature matching methods have demonstrated promising performance in challenging spatiotemporal scenarios, they still face inherent trade-offs between accuracy and computational efficiency in specific settings. In this paper, we propose a lightweight feature matching network designed to establish sparse, stable, and consistent correspondences across multiple frames. The proposed method eliminates the dependency on manual annotations during training and mitigates feature drift through a hybrid self-supervised paradigm. Extensive experiments validate three key advantages: (1) Our method requires no external prior knowledge, and its hybrid training mechanism applies directly to the original datasets. (2) Compared with state-of-the-art deep learning-based methods, our approach matches their computational efficiency at low resolutions and achieves a 2-10x improvement in computational efficiency on high-resolution inputs. (3) Comparative evaluations demonstrate that the proposed hybrid self-supervised scheme effectively mitigates feature drift in long-term tracking while maintaining a consistent representation across image sequences.
