ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual-Inertial and Multi-Map SLAM

Carlos Campos, Richard Elvira, Juan J. Gómez Rodríguez, José M. M. Montiel, Juan D. Tardós

TL;DR

ORB-SLAM3 addresses the need for robust, accurate SLAM across visual, visual-inertial, and multi-map scenarios. It introduces a MAP-estimation framework that remains effective during IMU initialization, a high-recall place recognition mechanism, and the Atlas multi-map system for seamless cross-session map merging. Experimental results on EuRoC and TUM-VI demonstrate state-of-the-art accuracy and robustness across configurations, including centimeter-scale errors in stereo-inertial setups and strong multi-session performance. The open-source release of ORB-SLAM3 enables widespread adoption and further advancement of SLAM capabilities in real-world environments.

Abstract

This paper presents ORB-SLAM3, the first system able to perform visual, visual-inertial and multi-map SLAM with monocular, stereo and RGB-D cameras, using pin-hole and fisheye lens models. The first main novelty is a feature-based tightly-integrated visual-inertial SLAM system that fully relies on Maximum-a-Posteriori (MAP) estimation, even during the IMU initialization phase. The result is a system that operates robustly in real time, in small and large, indoor and outdoor environments, and is 2 to 5 times more accurate than previous approaches. The second main novelty is a multiple map system that relies on a new place recognition method with improved recall. Thanks to it, ORB-SLAM3 is able to survive long periods of poor visual information: when it gets lost, it starts a new map that will be seamlessly merged with previous maps when revisiting mapped areas. Compared with visual odometry systems that only use information from the last few seconds, ORB-SLAM3 is the first system able to reuse all previous information in all the algorithm stages. This allows bundle adjustment to include co-visible keyframes that provide high-parallax observations, boosting accuracy, even if they are widely separated in time or come from a previous mapping session. Our experiments show that, in all sensor configurations, ORB-SLAM3 is as robust as the best systems available in the literature, and significantly more accurate. Notably, our stereo-inertial SLAM achieves an average accuracy of 3.6 cm on the EuRoC drone and 9 mm under quick hand-held motions in the room of the TUM-VI dataset, a setting representative of AR/VR scenarios. For the benefit of the community we make the source code public.
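The accuracy figures quoted in the abstract are trajectory errors; the Figure 4 caption names the metric as RMS ATE (root-mean-square absolute trajectory error). As a rough illustration of how such a number is typically obtained, the sketch below aligns an estimated trajectory to ground truth with a rigid transform and then takes the RMS of the remaining position errors. It is not the authors' evaluation code. The function names `align_rigid` and `rms_ate` are hypothetical, the trajectories are assumed to be already time-associated N×3 position arrays, and only rotation and translation are estimated, which suits configurations where scale is observable (a monocular-only evaluation would normally also estimate scale).

```python
import numpy as np


def align_rigid(est, gt):
    """Least-squares rotation R and translation t mapping est onto gt (Kabsch method).

    est, gt: (N, 3) arrays of corresponding positions (hypothetical inputs).
    """
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    H = (est - mu_e).T @ (gt - mu_g)           # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_g - R @ mu_e
    return R, t


def rms_ate(est, gt):
    """RMS absolute trajectory error after rigid alignment of est to gt."""
    R, t = align_rigid(est, gt)
    aligned = est @ R.T + t                    # apply R and t to every row
    err = np.linalg.norm(aligned - gt, axis=1) # per-pose position error
    return float(np.sqrt(np.mean(err ** 2)))


if __name__ == "__main__":
    # Synthetic data only: a random "ground truth" and a rotated, shifted,
    # slightly noisy copy standing in for an estimated trajectory.
    rng = np.random.default_rng(0)
    gt = rng.normal(size=(100, 3))
    theta = 0.3
    Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
    est = gt @ Rz.T + np.array([1.0, 2.0, 3.0]) + rng.normal(scale=0.01, size=gt.shape)
    print("RMS ATE:", rms_ate(est, gt))
```

Because the alignment removes any global rotation and translation, the reported error reflects only the drift and noise within the trajectory, not the arbitrary choice of world frame.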

Paper Structure

This paper contains 25 sections, 4 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: Main system components of ORB-SLAM3.
  • Figure 2: Factor graph representation for different optimizations throughout the system.
  • Figure 3: Factor graph representation for the welding BA, with reprojection error terms (blue squares), IMU preintegration terms (yellow squares) and bias random walk (purple squares).
  • Figure 4: Colored squares represent the RMS ATE for ten different executions in each sequence of the EuRoC dataset.
  • Figure 5: Multi-session stereo-inertial result with several sequences from the TUM-VI dataset (front, side and top views).
  • ...and 1 more figures