EC3R-SLAM: Efficient and Consistent Monocular Dense SLAM with Feed-Forward 3D Reconstruction

Lingxiang Hu, Naima Ait Oufroukh, Fabien Bonardi, Raymond Ghandour

TL;DR

EC3R-SLAM addresses the latency and GPU-memory bottlenecks of monocular dense SLAM and operates without camera calibration by tightly coupling lightweight tracking with a feed-forward 3D reconstruction backend that also estimates camera intrinsics. Local and global loop closures enforce multi-view consistency, enabling real-time, memory-efficient dense mapping. The approach achieves competitive accuracy on standard benchmarks (TUM-RGBD, 7-Scenes, Replica) with significantly lower VRAM usage and runs robustly on resource-constrained hardware. The combination of local sparse tracking, submap-level feed-forward reconstruction, and local and global loop closures yields strong generalization and practical applicability to real-world robotics and AR/VR tasks.

Abstract

The application of monocular dense Simultaneous Localization and Mapping (SLAM) is often hindered by high latency, large GPU memory consumption, and reliance on camera calibration. To relax these constraints, we propose EC3R-SLAM, a novel calibration-free monocular dense SLAM framework that jointly achieves high localization and mapping accuracy, low latency, and low GPU memory consumption. The framework achieves its efficiency by coupling a tracking module, which maintains a sparse map of feature points, with a mapping module based on a feed-forward 3D reconstruction model that also estimates camera intrinsics. In addition, both local and global loop closures are incorporated to ensure mid-term and long-term data association, enforcing multi-view consistency and thereby enhancing the overall accuracy and robustness of the system. Experiments across multiple benchmarks show that EC3R-SLAM performs competitively with state-of-the-art methods while being faster and more memory-efficient. Moreover, it runs effectively even on resource-constrained platforms such as laptops and the Jetson Orin NX, highlighting its potential for real-world robotics applications.
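According to the system overview in Figure 2, keyframes verified by the tracking module are held in a keyframe buffer and handed to the mapping module once enough have accumulated, so the feed-forward model reconstructs one submap at a time. The sketch below is a minimal, hypothetical illustration of that hand-off, not the authors' implementation: the class name `KeyframeBuffer`, the `submap_size` and `overlap` parameters, and the choice to share keyframes between consecutive submaps are all assumptions, and the `on_submap` callback merely stands in for the reconstruction call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class KeyframeBuffer:
    """Hypothetical buffer between tracking and mapping.

    Keyframe IDs accepted by the tracker are collected; whenever
    `submap_size` of them are available, they are passed to `on_submap`
    (a stand-in for the feed-forward reconstruction call) as one submap.
    """

    submap_size: int  # assumed number of keyframes per submap
    overlap: int = 1  # assumed number of keyframes shared with the next submap
    on_submap: Callable[[List[int]], None] = print
    frames: List[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 0 <= self.overlap < self.submap_size:
            raise ValueError("require 0 <= overlap < submap_size")

    def push(self, keyframe_id: int) -> None:
        """Add one verified keyframe; flush a submap when the buffer is full."""
        self.frames.append(keyframe_id)
        if len(self.frames) >= self.submap_size:
            self.on_submap(list(self.frames))
            # Keep the last `overlap` keyframes so consecutive submaps share
            # views that could later be used to align them (assumption).
            self.frames = self.frames[len(self.frames) - self.overlap:]


if __name__ == "__main__":
    submaps: List[List[int]] = []
    buf = KeyframeBuffer(submap_size=3, overlap=1, on_submap=submaps.append)
    for kf in range(7):
        buf.push(kf)
    print(submaps)  # [[0, 1, 2], [2, 3, 4], [4, 5, 6]]
```

Batching keyframes this way bounds how many images the reconstruction model sees at once, which is one plausible reading of how a submap-level design keeps GPU memory roughly constant as the sequence grows.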

Paper Structure

This paper contains 41 sections, 7 equations, 9 figures, 8 tables.

Figures (9)

  • Figure 1: (a) Our method achieves real-time, multi-view consistent 3D reconstruction from an uncalibrated RGB sequence. (b) Benchmark results show fast inference and low GPU memory use with competitive accuracy, highlighting its efficiency.
  • Figure 2: System overview. The RGB images are first processed in the tracking module, where keyframes are selected and used for local loop closure to identify similar frames. The verified keyframes are stored in the keyframe buffer, and once enough keyframes have accumulated, they are passed to the mapping module to generate reconstruction information, which is stored in the database. At the same time, the global loop closure module retrieves features from the database for loop detection and performs pose graph optimization.
  • Figure 3: Illustration of point correction. (a) Before correction. (b) After point correction.
  • Figure 4: We perform loop detection by computing a similarity matrix and filtering it with global and local thresholds, followed by homography-based verification.
  • Figure 5: EC3R-SLAM can generalize to new datasets. We show results from Tanks and Temples knapitsch2017tanks, ScanNet dai2017scannet, EuRoC burri2016euroc, Waymo Open sun2020scalability, ETH3D schops2019bad, and DL3DV dl3dv.
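Figure 4 describes loop detection as thresholding a similarity matrix with global and local thresholds and then verifying candidates with a homography. A minimal sketch of that flow follows, under assumptions the caption does not settle: each keyframe has a global descriptor vector, similarity is cosine similarity, the "global" threshold is an absolute minimum score, the "local" threshold keeps only candidates close to the best score in each row, and temporally adjacent keyframes (within `min_gap`) are excluded. The homography check is left to a caller-supplied `verify` function; counting RANSAC inliers from OpenCV's `cv2.findHomography` would be one option. All function and parameter names here are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def cosine_similarity_matrix(desc: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise cosine similarity between per-keyframe global descriptors."""
    norms = [math.sqrt(sum(x * x for x in d)) or 1.0 for d in desc]
    return [
        [sum(a * b for a, b in zip(di, dj)) / (ni * nj) for dj, nj in zip(desc, norms)]
        for di, ni in zip(desc, norms)
    ]


def detect_loops(
    desc: Sequence[Sequence[float]],
    global_thresh: float = 0.8,  # assumed absolute similarity floor
    local_ratio: float = 0.95,   # assumed fraction of the row's best score
    min_gap: int = 10,           # assumed minimum keyframe-index gap
    verify: Callable[[int, int], bool] = lambda query, cand: True,
) -> List[Tuple[int, int, float]]:
    """Return (candidate, query, similarity) triples that pass both thresholds
    and the geometric check supplied by `verify` (homography stand-in)."""
    sim = cosine_similarity_matrix(desc)
    loops = []
    for i in range(len(desc)):
        # Only compare against keyframes at least `min_gap` steps in the past.
        past = [(sim[i][j], j) for j in range(i - min_gap + 1)]
        if not past:
            continue
        best = max(s for s, _ in past)
        for s, j in past:
            # Global test: absolute floor; local test: near this row's best match.
            if s >= global_thresh and s >= local_ratio * best and verify(i, j):
                loops.append((j, i, s))
    return loops


if __name__ == "__main__":
    kf_desc = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.6, 0.8]]
    print(detect_loops(kf_desc, global_thresh=0.75, min_gap=2))
```

Keeping the verification step pluggable separates the cheap appearance test from the more expensive geometric one, so only candidates that survive both thresholds pay for homography estimation.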