DK-SLAM: Monocular Visual SLAM with Deep Keypoint Learning, Tracking and Loop-Closing

Hao Qu; Lilian Zhang; Jun Mao; Junbo Tie; Xiaofeng He; Xiaoping Hu; Yifei Shi; Changhao Chen

TL;DR

DK-SLAM addresses the fragility of handcrafted features in monocular SLAM by introducing a deep keypoint extractor trained with Model-Agnostic Meta-Learning (MAML) to improve generalization across scenes. It couples a coarse-to-fine two-stage tracking pipeline with an online binary Bag-of-Words loop-closure module, enabling accurate pose estimation and scalable loop detection. The approach yields substantial gains over traditional and learning-based baselines on the KITTI and EuRoC datasets, notably improving translation and rotation accuracy and mapping quality. These results demonstrate that meta-learned deep keypoints, together with online adaptation and efficient loop closure, can deliver robust, real-world SLAM performance across diverse environments.
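The TL;DR names MAML but not its update rule. As a rough illustration only, the sketch below shows the generic MAML inner/outer loop on a toy family of linear-regression tasks. The task family (`sample_task`, `W_MEAN`), the step sizes `alpha` and `beta`, and the single inner gradient step are assumptions; DK-SLAM's keypoint network and its self-supervised loss are not reproduced. Because the toy loss is quadratic, the second-order factor of the meta-gradient, `I - alpha * H`, can be computed exactly.

```python
import numpy as np

W_MEAN = np.array([1.0, -2.0, 0.5])  # hypothetical centre of the toy task family

def loss_and_grad(w, X, y):
    """Mean squared error of a linear model and its gradient."""
    r = X @ w - y
    return float(r @ r) / len(y), 2.0 * X.T @ r / len(y)

def hessian(X):
    """Hessian of the MSE above (constant for a linear model)."""
    return 2.0 * X.T @ X / X.shape[0]

def sample_task(rng, n=20):
    """Hypothetical 'environment': weights perturbed around W_MEAN, split into support/query sets."""
    w_task = W_MEAN + 0.3 * rng.standard_normal(W_MEAN.size)
    X = rng.standard_normal((2 * n, W_MEAN.size))
    y = X @ w_task
    return (X[:n], y[:n]), (X[n:], y[n:])

def maml_step(w, tasks, alpha=0.05, beta=0.1):
    """One meta-update: adapt on each support set, then differentiate the query loss through the adaptation."""
    meta_grad = np.zeros_like(w)
    for (Xs, ys), (Xq, yq) in tasks:
        _, g_s = loss_and_grad(w, Xs, ys)
        w_adapted = w - alpha * g_s                   # inner loop: one gradient step on the support set
        _, g_q = loss_and_grad(w_adapted, Xq, yq)     # evaluate the adapted weights on the query set
        jac = np.eye(w.size) - alpha * hessian(Xs)    # exact d(w_adapted)/dw for a quadratic loss
        meta_grad += jac.T @ g_q
    return w - beta * meta_grad / len(tasks)          # outer loop: update the shared initialization

rng = np.random.default_rng(0)
w = np.zeros(3)
for _ in range(200):
    w = maml_step(w, [sample_task(rng) for _ in range(4)])
```

With these settings the meta-learned `w` should settle near `W_MEAN`, the initialization from which one gradient step adapts best to a new task on average. The abstract describes using MAML for the analogous purpose on the keypoint extraction network: finding weights that adapt well to unseen environments.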

Abstract

The performance of visual SLAM in complex, real-world scenarios is often compromised by unreliable feature extraction and matching when handcrafted features are used. Although deep learning-based local features excel at capturing high-level information and perform well on matching benchmarks, they generalize poorly to continuous-motion scenes, which degrades loop detection accuracy. To address this, we propose DK-SLAM, a monocular visual SLAM system whose keypoint extraction network is trained with a Model-Agnostic Meta-Learning (MAML) strategy to improve its adaptability to diverse environments. Additionally, we introduce a coarse-to-fine tracking mechanism for learned keypoints: it first uses a direct method to approximate the relative pose between consecutive frames, then refines the pose estimate by feature matching. To mitigate cumulative positioning errors, DK-SLAM incorporates a novel online learning module that uses binary features for loop closure detection. This module dynamically identifies loop nodes within a sequence, ensuring accurate and efficient localization. Experimental evaluations on publicly available datasets demonstrate that DK-SLAM outperforms leading traditional and learning-based SLAM systems, such as ORB-SLAM3 and LIFT-SLAM. These results underscore the efficacy and robustness of DK-SLAM in varied and challenging real-world environments.
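To make the two-stage tracking idea concrete, here is a toy analogue under heavy simplifying assumptions. The relative pose is reduced to a 2D image translation. The direct stage is an exhaustive integer search over a patch photometric error rather than an iterative optimizer over a full camera pose. The refinement stage uses gated nearest-neighbour association of keypoint positions plus least squares in place of descriptor matching and 3D-2D pose optimization. All names and parameters (`blob_image`, `patch_cost`, `search`, `gate`, the synthetic images) are hypothetical.

```python
import numpy as np

def blob_image(shape, centers, shift=(0.0, 0.0), sigma=3.0):
    """Synthetic smooth image: Gaussian blobs at `centers`, translated by `shift` = (dx, dy)."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]].astype(float)
    img = np.zeros(shape)
    for cx, cy in centers:
        img += np.exp(-((xs - cx - shift[0]) ** 2 + (ys - cy - shift[1]) ** 2) / (2 * sigma ** 2))
    return img

def patch_cost(img_a, img_b, kps, dx, dy, r=4):
    """Photometric error: SSD between (2r+1)x(2r+1) patches at p in A and at p + (dx, dy) in B."""
    cost = 0.0
    for x, y in np.round(kps).astype(int):
        pa = img_a[y - r:y + r + 1, x - r:x + r + 1]
        pb = img_b[y + dy - r:y + dy + r + 1, x + dx - r:x + dx + r + 1]
        cost += float(np.sum((pa - pb) ** 2))
    return cost

def coarse_track(img_a, img_b, kps, search=5):
    """Stage 1 (direct): integer shift that minimises the patch photometric error."""
    _, dx, dy = min((patch_cost(img_a, img_b, kps, dx, dy), dx, dy)
                    for dx in range(-search, search + 1)
                    for dy in range(-search, search + 1))
    return np.array([dx, dy], dtype=float)

def fine_track(kps_a, kps_b, t_coarse, gate=2.0):
    """Stage 2 (feature-based): associate keypoints near the coarse prediction, then solve least squares."""
    disp = []
    for p in kps_a:
        d = np.linalg.norm(kps_b - (p + t_coarse), axis=1)
        j = int(np.argmin(d))
        if d[j] < gate:                    # accept only matches inside the search gate
            disp.append(kps_b[j] - p)
    return np.mean(disp, axis=0)           # least-squares translation = mean displacement

centers = np.array([[20, 20], [60, 25], [30, 55], [55, 60], [40, 40]], dtype=float)
t_true = np.array([3.4, 2.7])              # hypothetical ground-truth motion between frames
img_a = blob_image((80, 80), centers)
img_b = blob_image((80, 80), centers, shift=t_true)
t_coarse = coarse_track(img_a, img_b, centers)
t_fine = fine_track(centers, centers + t_true, t_coarse)  # keypoints "detected" in frame B
```

The coarse estimate is only accurate to about a pixel, but it confines the association step to a small gate around each predicted keypoint. This lets the second stage match features cheaply and recover the sub-pixel motion.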

Paper Structure (27 sections, 9 equations, 9 figures, 5 tables, 1 algorithm)

Introduction
Related Works
Learning based Feature Extractor
Learning based Visual SLAM
Loop Closure Detection
Deep Keypoint based Monocular SLAM
Deep Keypoint Meta Learning
Feature Extractor Network
Self-Supervised Keypoint Learning
MAML-Based Visual Keypoint Meta Learning
The Distribution Strategy of Deep Keypoints
Coarse-to-Fine Keypoint Tracking
Semi-direct Coarse Keypoint Tracking
Coarse-to-Fine Keypoint Tracking
Deep Keypoint Based Loop Closing
...and 12 more sections

Figures (9)

Figure 1: An overview of our proposed DK-SLAM framework with deep keypoint meta learning, two-stage coarse-to-fine keypoint tracking and online learning based binary BoW for loop-closing.
Figure 2: Diagram of our proposed coarse-to-fine two-stage keypoint tracking strategy. This process begins with relative pose estimation through patch photometric loss optimization, followed by refinement using the 3D-2D keypoint relationship for enhanced accuracy.
Figure 3: Illustration of our proposed online-learning-based binary BoW. The BoW is constructed incrementally: matched descriptors from the keyframe database are stored in the same leaf node, and when the current keyframe contains unmatched descriptors, a new leaf node is created.
Figure 4: Trajectories generated by our proposed DK-SLAM on Sequences 00, 02, 05, 07, 09 and 10 of the KITTI dataset, compared with LDSO and ORB-SLAM3.
Figure 5: Mapping results generated by our proposed DK-SLAM system.
...and 4 more figures
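Figure 3 describes how the vocabulary grows. A minimal sketch of that insertion rule might look as follows, assuming 256-bit descriptors packed into 32 `uint8` bytes, a flat list of leaf nodes instead of a vocabulary tree, and a hypothetical Hamming threshold `max_hamming`. The loop-candidate scoring (counting shared leaf nodes and skipping the most recent `min_gap` keyframes) is also an illustrative assumption, not the paper's scoring rule.

```python
import numpy as np
from collections import Counter

class OnlineBinaryBoW:
    """Incrementally grown binary vocabulary (flat list of leaf nodes; a simplification)."""

    def __init__(self, max_hamming=40):
        self.max_hamming = max_hamming   # hypothetical match threshold, in bits
        self.words = []                  # one representative uint8 descriptor per leaf node
        self.inverted = []               # leaf index -> set of keyframe ids stored in that leaf

    def _nearest(self, desc):
        """Index of, and Hamming distance to, the closest existing leaf (linear scan)."""
        xor = np.bitwise_xor(np.asarray(self.words), desc)
        dists = np.unpackbits(xor, axis=1).sum(axis=1)
        j = int(np.argmin(dists))
        return j, int(dists[j])

    def add_keyframe(self, kf_id, descriptors):
        """Store matched descriptors under their leaf; open a new leaf for unmatched ones."""
        leaves = set()
        for d in descriptors:
            j, dist = self._nearest(d) if self.words else (-1, None)
            if j < 0 or dist > self.max_hamming:
                self.words.append(np.array(d, dtype=np.uint8))
                self.inverted.append(set())
                j = len(self.words) - 1
            self.inverted[j].add(kf_id)
            leaves.add(j)
        return leaves

    def loop_candidates(self, kf_id, leaves, min_gap=10, min_shared=5):
        """Older keyframes sharing at least `min_shared` leaves with the current keyframe."""
        votes = Counter()
        for j in leaves:
            for other in self.inverted[j]:
                if kf_id - other >= min_gap:   # skip the current and recent keyframes
                    votes[other] += 1
        return [(k, n) for k, n in votes.most_common() if n >= min_shared]

rng = np.random.default_rng(1)
bow = OnlineBinaryBoW()
place0 = rng.integers(0, 256, size=(50, 32), dtype=np.uint8)  # 50 random 256-bit descriptors
bow.add_keyframe(0, place0)
later = []
for k in range(1, 15):                                        # keyframes from unrelated places
    leaves = bow.add_keyframe(k, rng.integers(0, 256, size=(50, 32), dtype=np.uint8))
    later.append(bow.loop_candidates(k, leaves))
revisit = place0.copy()
revisit[:, 0] ^= 0b00010101                                   # same place, 3 bits flipped per descriptor
candidates = bow.loop_candidates(15, bow.add_keyframe(15, revisit))
```

In this toy run the revisiting keyframe lands in the 50 leaf nodes opened by keyframe 0, so keyframe 0 is returned as the only loop candidate, while the unrelated keyframes only open new leaves. Growing the vocabulary online avoids training a fixed vocabulary offline; a real system would replace the linear scan with a tree search.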
