Multi-Agent Monocular Dense SLAM With 3D Reconstruction Priors

Yuchen Zhou, Haihang Wu

TL;DR

The paper tackles scalable dense monocular SLAM for multi-robot teams by using a strong 3D reconstruction prior (MASt3R) on each agent and fusing local submaps on a centralized server through loop-closure constraints. Each agent independently performs MASt3R-based local tracking, mapping, and loop closure, producing submaps that the server integrates into a globally consistent map via loop-closure-driven global graph optimization. The results show centimeter-level tracking accuracy in real-world multi-agent scenarios and competitive dense reconstruction quality, along with a substantial runtime advantage over RGB-D baselines (approximately 11.81 FPS), particularly when using calibrated intrinsics. The work demonstrates that monocular dense SLAM can scale to multiple agents while maintaining accuracy and running significantly faster, which supports practical deployment in collaborative robotics tasks.

Abstract

Monocular Simultaneous Localization and Mapping (SLAM) aims to estimate a robot's pose while simultaneously reconstructing an unknown 3D scene using a single camera. While existing monocular SLAM systems generate detailed 3D geometry through dense scene representations, they are computationally expensive due to the need for iterative optimization. To address this challenge, MASt3R-SLAM utilizes learned 3D reconstruction priors, enabling more efficient and accurate estimation of both 3D structures and camera poses. However, MASt3R-SLAM is limited to single-agent operation. In this paper, we extend MASt3R-SLAM to introduce the first multi-agent monocular dense SLAM system. Each agent performs local SLAM using a 3D reconstruction prior, and their individual maps are fused into a globally consistent map through a loop-closure-based map fusion mechanism. Our approach improves computational efficiency compared to state-of-the-art methods, while maintaining similar mapping accuracy when evaluated on real-world datasets.

Paper Structure

This paper contains 11 sections, 8 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: System overview. Agent Side: Each agent processes its own RGBD stream, performing tasks including tracking, mapping, and local graph optimization. Upon completion, the agent sends its submap and trajectory to the central server. Server Side: The server performs both intra-agent and inter-agent loop closure detection, followed by global graph optimization. This process merges the submaps into a unified global map and updates the agent poses accordingly.
  • Figure 2: Reconstruction of Office 0 in the Replica dataset, where each agent independently reconstructs the scene. The individual maps are then merged into a global map using our global map fusion algorithm.
  • Figure 3: Reconstruction of Room 0 in the Aria dataset, where each agent independently reconstructs the scene, and their maps are subsequently fused into a global map using our global map fusion algorithm.
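
The server-side merging step described in Figure 1 can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a single inter-agent loop closure, rigid SE(3) poses stored as 4x4 camera-to-map matrices, and a closed-form alignment instead of the global graph optimization over many constraints that the paper describes. A monocular system would typically also need to estimate a relative scale, which is omitted here. All function and variable names (`make_pose`, `frame_alignment`, `fuse_submap`) are hypothetical.

```python
import numpy as np


def make_pose(yaw, t):
    """Build a 4x4 rigid transform from a yaw angle (radians) and a translation."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = t
    return T


def frame_alignment(T_a_i, T_b_j, T_i_j):
    """Return the transform that maps agent B's map frame into agent A's.

    T_a_i: pose of A's keyframe i in A's map frame (camera-to-map).
    T_b_j: pose of B's keyframe j in B's map frame (camera-to-map).
    T_i_j: loop-closure measurement, i.e. the pose of B's keyframe j
           expressed in the camera frame of A's keyframe i.
    """
    # A point seen from keyframe j must land at the same place in A's map
    # whether it goes through B's map or through the loop closure:
    #   T_a_b @ T_b_j == T_a_i @ T_i_j
    return T_a_i @ T_i_j @ np.linalg.inv(T_b_j)


def fuse_submap(T_a_b, poses_b, points_b):
    """Express agent B's keyframe poses and (N, 3) map points in A's frame."""
    poses = [T_a_b @ T for T in poses_b]
    points_h = np.hstack([points_b, np.ones((len(points_b), 1))])
    points = (T_a_b @ points_h.T).T[:, :3]
    return poses, points
```

With several loop closures, each one would instead contribute a relative-pose edge to a pose graph that is optimized jointly, which is closer to the global graph optimization named in the caption.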