Multi-Agent Monocular Dense SLAM With 3D Reconstruction Priors
Yuchen Zhou, Haihang Wu
TL;DR
The paper tackles scalable dense monocular SLAM for multi-robot teams by using a strong 3D reconstruction prior (MASt3R) on each agent and fusing local submaps on a centralized server through loop-closure constraints. Each agent independently performs MASt3R-based local tracking, mapping, and loop closure, producing submaps that the server integrates into a globally consistent map via loop-closure-driven global graph optimization. The results show cm-level tracking accuracy in real-world multi-agent scenarios and competitive dense reconstruction quality, along with a substantial runtime advantage (approximately 11.81 FPS) over RGB-D baselines, particularly when calibrated intrinsics are used. The work demonstrates that monocular dense SLAM can scale to multiple agents while maintaining accuracy and running significantly faster, which makes it practical for collaborative robotics tasks.
Abstract
Monocular Simultaneous Localization and Mapping (SLAM) aims to estimate a robot's pose while simultaneously reconstructing an unknown 3D scene using a single camera. While existing monocular SLAM systems generate detailed 3D geometry through dense scene representations, they are computationally expensive due to the need for iterative optimization. To address this challenge, MASt3R-SLAM utilizes learned 3D reconstruction priors, enabling more efficient and accurate estimation of both 3D structures and camera poses. However, MASt3R-SLAM is limited to single-agent operation. In this paper, we extend MASt3R-SLAM to introduce the first multi-agent monocular dense SLAM system. Each agent performs local SLAM using a 3D reconstruction prior, and their individual maps are fused into a globally consistent map through a loop-closure-based map fusion mechanism. Our approach improves computational efficiency compared to state-of-the-art methods, while maintaining similar mapping accuracy when evaluated on real-world datasets.
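To make the loop-closure-based fusion step more concrete, the sketch below aligns one agent's submap to another's frame from matched 3D points, using the closed-form Umeyama similarity (scale, rotation, translation) estimate. This is an illustrative assumption, not the paper's implementation: the abstract does not specify how inter-agent constraints are computed, and the function names `align_submaps_sim3` and `apply_sim3` are hypothetical. A similarity transform is used here because monocular maps are in general only defined up to scale.

```python
import numpy as np


def align_submaps_sim3(src_pts, dst_pts):
    """Estimate (scale, R, t) such that dst ~= scale * R @ src + t.

    src_pts, dst_pts: (N, 3) arrays of matched 3D points, e.g. points
    observed by two agents at a loop closure (hypothetical input).
    Closed-form Umeyama alignment; not the paper's actual method.
    """
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    n = src.shape[0]

    # Center both point sets.
    mu_src = src.mean(axis=0)
    mu_dst = dst.mean(axis=0)
    src_c = src - mu_src
    dst_c = dst - mu_dst

    # Cross-covariance between the target and source points.
    cov = dst_c.T @ src_c / n
    U, D, Vt = np.linalg.svd(cov)

    # Guard against a reflection so that R is a proper rotation.
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0

    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / n
    scale = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - scale * R @ mu_src
    return scale, R, t


def apply_sim3(scale, R, t, pts):
    """Map (N, 3) points from the source submap into the target frame."""
    return scale * np.asarray(pts, dtype=float) @ R.T + t
```

In a full system, a relative transform like this would typically serve only as one constraint (an edge) in the global graph optimization, which then refines all submap poses jointly rather than trusting any single pairwise alignment.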
