SuperPoint-SLAM3: Augmenting ORB-SLAM3 with Deep Features, Adaptive NMS, and Learning-Based Loop Closure

Shahram Najam Syed, Ishir Roongta, Kavin Ravie, Gangadhar Nageswar

TL;DR

This paper addresses the robustness gap in visual SLAM under challenging conditions by upgrading ORB-SLAM3 with deep SuperPoint features, adaptive non-maximal suppression (ANMS) for uniform keypoint distribution, and a learning-based loop-closure pathway. The proposed SuperPoint-SLAM3 replaces ORB descriptors with 256-d SuperPoint descriptors, applies ANMS to spread keypoints evenly across the image, and adds a learning-based loop-closure head, improving localization accuracy on KITTI and EuRoC while maintaining real-time performance. On KITTI, mean translational error $E_T$ drops from $4.15\%$ to $0.34\%$ and mean rotational error $E_R$ from $0.0027$ deg/m to $0.0010$ deg/m; on EuRoC, errors are roughly halved across sequences. These results support integrating deep features with learnable place recognition. The work also identifies areas that need further development before deployment in real-world systems, such as loop-closure compatibility and computational load.
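
To put the KITTI figures in relative terms, the reductions can be worked out directly from the reported means (simple arithmetic on the numbers quoted here, not an additional result from the paper):

$$
\frac{4.15 - 0.34}{4.15} \approx 0.918, \qquad \frac{0.0027 - 0.0010}{0.0027} \approx 0.630
$$

That is roughly a 92% reduction in $E_T$ and a 63% reduction in $E_R$.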

Abstract

Visual simultaneous localization and mapping (SLAM) must remain accurate under extreme viewpoint, scale and illumination variations. The widely adopted ORB-SLAM3 falters in these regimes because it relies on hand-crafted ORB keypoints. We introduce SuperPoint-SLAM3, a drop-in upgrade that (i) replaces ORB with the self-supervised SuperPoint detector-descriptor, (ii) enforces spatially uniform keypoints via adaptive non-maximal suppression (ANMS), and (iii) integrates a lightweight NetVLAD place-recognition head for learning-based loop closure. On the KITTI Odometry benchmark SuperPoint-SLAM3 reduces mean translational error from 4.15% to 0.34% and mean rotational error from 0.0027 deg/m to 0.0010 deg/m. On the EuRoC MAV dataset it roughly halves both errors across every sequence (e.g., V2_03: 1.58% to 0.79%). These gains confirm that fusing modern deep features with a learned loop-closure module markedly improves ORB-SLAM3 accuracy while preserving its real-time operation. Implementation, pretrained weights and reproducibility scripts are available at https://github.com/shahram95/SuperPointSLAM3.
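
As a rough illustration of step (ii), the sketch below implements the classic suppression-radius form of ANMS in NumPy. It is a minimal example under stated assumptions, not the authors' implementation: the function name `anms`, the parameters `num_keep` and `c_robust`, and the default value 0.9 are all illustrative, and the paper may use a different (for example, faster) ANMS variant. Each keypoint gets a radius equal to its distance to the nearest sufficiently stronger keypoint, and the keypoints with the largest radii are kept.

```python
import numpy as np


def anms(points, scores, num_keep, c_robust=0.9):
    """Return indices of keypoints kept by suppression-radius ANMS.

    points   : (N, 2) array of keypoint coordinates
    scores   : (N,) array of detector responses
    num_keep : number of keypoints to retain (hypothetical parameter name)
    c_robust : a neighbour j suppresses i only if c_robust * score_j > score_i
    """
    points = np.asarray(points, dtype=float)
    scores = np.asarray(scores, dtype=float)
    n = len(points)
    if n <= num_keep:
        return np.arange(n)

    # Pairwise squared distances, O(N^2) memory; fine for a sketch.
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.einsum("ijk,ijk->ij", diff, diff)

    # stronger[i, j] is True when keypoint j is sufficiently stronger than i.
    stronger = c_robust * scores[None, :] > scores[:, None]

    # Suppression radius: distance to the nearest sufficiently stronger point.
    # Keypoints with no stronger neighbour get an infinite radius.
    radii2 = np.where(stronger, dist2, np.inf).min(axis=1)

    # Keep the keypoints with the largest radii (stable sort for ties).
    order = np.argsort(-radii2, kind="stable")
    return order[:num_keep]
```

Selecting by radius rather than by raw score is what spreads the retained keypoints out: a weak but isolated keypoint can outrank a strong one that sits next to an even stronger neighbour.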

Paper Structure

This paper contains 49 sections, 3 figures, 2 tables, 3 algorithms.

Figures (3)

  • Figure 1: System architecture of SuperPoint-SLAM3 integrating SuperPoint and ANMS into the ORB-SLAM3 pipeline.
  • Figure 2: Comparative analysis of 2D trajectories (XZ plane) for sequences 00 to 10 using ORB-SLAM3, SuperPoint, and SuperPoint + ANMS.
  • Figure 3: Comparative analysis of 6D pose estimation for ORB-SLAM3, SuperPoint, and SuperPoint + ANMS.