MCN-SLAM: Multi-Agent Collaborative Neural SLAM with Hybrid Implicit Neural Scene Representation

Tianchen Deng, Guole Shen, Xun Chen, Shenghai Yuan, Hongming Shen, Guohao Peng, Zhenyu Wu, Jingchuan Wang, Lihua Xie, Danwei Wang, Hesheng Wang, Weidong Chen

TL;DR

This paper addresses scalable, multi-agent neural SLAM in large, real-world environments under communication limits by introducing MCN-SLAM, a distributed framework that fuses a hybrid implicit scene representation with distributed tracking, intra-to-inter loop closures, and online submap distillation. The approach achieves global map consistency across agents through both local self-correction and cross-agent fusion, while maintaining bandwidth efficiency via parameter- and descriptor-based exchanges. A real-world Dense SLAM (DES) dataset with continuous trajectories and high-accuracy 3D meshes is released to catalyze development in SLAM and visual foundation models. Experiments show state-of-the-art performance in mapping, tracking, and rendering across indoor and outdoor scenarios, validating the method’s scalability and practicality.

Abstract

Neural implicit scene representations have recently shown promising results in dense visual SLAM. However, existing implicit SLAM algorithms are constrained to single-agent scenarios and struggle with large-scale scenes and long sequences. Existing NeRF-based multi-agent SLAM frameworks cannot meet communication bandwidth constraints. To this end, we propose the first distributed multi-agent collaborative neural SLAM framework with hybrid scene representation, distributed camera tracking, intra-to-inter loop closure, and online distillation for multiple submap fusion. A novel triplane-grid joint scene representation method is proposed to improve scene reconstruction. A novel intra-to-inter loop closure method is designed to achieve local (single-agent) and global (multi-agent) consistency. We also design a novel online distillation method that fuses the information of different submaps to achieve global consistency. Furthermore, to the best of our knowledge, there is no real-world dataset for NeRF-based/GS-based SLAM that provides both ground-truth continuous-time trajectories and ground-truth high-accuracy 3D meshes. To this end, we propose the first real-world Dense SLAM (DES) dataset covering both single-agent and multi-agent scenarios, ranging from small rooms to large-scale outdoor scenes, with high-accuracy ground truth for both the 3D mesh and the continuous-time camera trajectory. This dataset can advance research in SLAM, 3D reconstruction, and visual foundation models. Experiments on various datasets demonstrate the superiority of the proposed method in mapping, tracking, and communication. The dataset and code will be open-sourced at https://github.com/dtc111111/mcnslam.
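To make the idea of a triplane-grid joint representation more concrete, the following is a minimal NumPy sketch of how a 3D point could be mapped to a feature vector by querying three axis-aligned feature planes and one dense feature grid. It is an illustration under stated assumptions, not the authors' implementation: the function names (`bilinear`, `trilinear`, `hybrid_feature`), the resolutions, the summation of the three plane features, and the concatenation with the grid feature are all hypothetical choices, since the abstract does not specify how the two representations are combined or decoded.

```python
import numpy as np


def bilinear(plane, u, v):
    """Bilinearly interpolate a (R, R, C) feature plane at (u, v) in [0, 1]^2."""
    res = plane.shape[0]
    x, y = u * (res - 1), v * (res - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, res - 1), min(y0 + 1, res - 1)
    tx, ty = x - x0, y - y0
    top = (1 - tx) * plane[x0, y0] + tx * plane[x1, y0]
    bot = (1 - tx) * plane[x0, y1] + tx * plane[x1, y1]
    return (1 - ty) * top + ty * bot


def trilinear(grid, p):
    """Trilinearly interpolate a (R, R, R, C) feature grid at p in [0, 1]^3."""
    res = grid.shape[0]
    x = np.asarray(p, dtype=float) * (res - 1)
    lo = np.floor(x).astype(int)
    hi = np.minimum(lo + 1, res - 1)
    t = x - lo
    out = np.zeros(grid.shape[-1])
    # Accumulate the weighted contribution of the 8 surrounding voxels.
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                idx = (hi[0] if dx else lo[0],
                       hi[1] if dy else lo[1],
                       hi[2] if dz else lo[2])
                w = ((t[0] if dx else 1 - t[0])
                     * (t[1] if dy else 1 - t[1])
                     * (t[2] if dz else 1 - t[2]))
                out += w * grid[idx]
    return out


def hybrid_feature(p, planes, grid):
    """Assumed fusion: sum the three plane features, then concatenate the grid feature."""
    xy, xz, yz = planes
    tri = (bilinear(xy, p[0], p[1])
           + bilinear(xz, p[0], p[2])
           + bilinear(yz, p[1], p[2]))
    return np.concatenate([tri, trilinear(grid, p)])


# Hypothetical sizes: 8x8 planes with 4 channels, a 4^3 grid with 2 channels.
rng = np.random.default_rng(0)
planes = [rng.normal(size=(8, 8, 4)) for _ in range(3)]
grid = rng.normal(size=(4, 4, 4, 2))
feature = hybrid_feature(np.array([0.3, 0.5, 0.9]), planes, grid)
print(feature.shape)
```

In a full system a small decoder network would turn such a feature into geometry and color values; that part is omitted here because this section gives no details about it.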

Paper Structure

This paper contains 14 sections, 21 equations, 16 figures, 13 tables.

Figures (16)

  • Figure 1: We visualize the outdoor scenes of our DES dataset ($\approx$276,000 $m^2$). The 3D map was collected by more than 200 industrial laser scanners. We show the scene reconstruction results of some localized regions in different colors.
  • Figure 2: We present MCN-SLAM, the first distributed multi-agent collaborative SLAM system with distributed mapping and camera tracking, hybrid implicit scene representation, intra-to-inter loop closure, and multiple submap fusion. In the middle, we present the scene reconstruction results of four agents, demonstrating scene reconstruction performance in real-world, large-scale long-corridor scenes ($\approx$1200 $m^2$). This scene was collected with various industrial laser scanners. We present the rendered depth and color images of different types of agents around the corridor. The trajectory of each agent is marked in a unique color for clarity.
  • Figure 3: System Overview. Our system is a multi-agent collaborative SLAM system that consists of hybrid scene representation, distributed tracking, intra-to-inter loop closure, and submap fusion. In the distributed optimization module, each agent takes color images and depth images as input. In addition, each agent exchanges network weights with its peers. We design a consistency loss with color, depth, and SDF terms for inter-loop closure. Each agent successively performs individual scene mapping and collaborative mapping and tracking to generate the final neural implicit map with submap fusion.
  • Figure 4: (a) presents the reconstruction of the multi-agent pose graph with intra-to-inter loop closure. (b) and (c) present the covisibility matrices of intra-loop closure and inter-loop closure. In (b), the horizontal and vertical axes both represent the keyframes of the local (single-agent) system, while in (c) they represent the keyframes of the local and peer agents.
  • Figure 5: Multi-implicit-submap fusion for the multi-agent SLAM system. Two submaps $M_1$, $M_2$ with their subvolumes and keyframes are shown. We demonstrate the fusion process of two submaps, where pose and submap bundle adjustment (BA) optimization is performed through loop detection and inter-loop closure between agents, ultimately registering the submaps generated by different agents.
  • ...and 11 more figures
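
The keyframe-by-keyframe matrices described in the Figure 4 caption can be illustrated with a short sketch. The code below builds a similarity matrix between keyframe descriptors, either within one agent (intra) or between a local and a peer agent (inter), and lists candidate loop pairs. It is only an illustration under assumptions: this section does not say how covisibility is computed, so cosine similarity between per-keyframe descriptor vectors is used as a stand-in, and the names `similarity_matrix` and `loop_candidates`, the threshold, and the `min_gap` parameter are hypothetical.

```python
import numpy as np


def similarity_matrix(desc_a, desc_b):
    """Cosine similarity between every keyframe descriptor in desc_a and desc_b.

    desc_a: (Na, D) array, desc_b: (Nb, D) array. Returns an (Na, Nb) matrix.
    """
    a = desc_a / np.linalg.norm(desc_a, axis=1, keepdims=True)
    b = desc_b / np.linalg.norm(desc_b, axis=1, keepdims=True)
    return a @ b.T


def loop_candidates(sim, threshold=0.9, min_gap=0):
    """Return (row, col) keyframe pairs whose similarity reaches the threshold.

    min_gap > 0 is meant for the intra case, where a keyframe trivially
    matches itself and its temporal neighbours; use min_gap=0 for the
    inter case, where rows and columns index different agents.
    """
    pairs = []
    for i, j in zip(*np.nonzero(sim >= threshold)):
        if abs(int(i) - int(j)) >= min_gap:
            pairs.append((int(i), int(j)))
    return pairs


# Hypothetical 2-D descriptors for three local keyframes and one peer keyframe.
local = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.01]])
peer = np.array([[0.0, 2.0]])

intra = loop_candidates(similarity_matrix(local, local), min_gap=2)
inter = loop_candidates(similarity_matrix(local, peer))
print(intra, inter)
```

Exchanging compact descriptors like these, rather than raw images, is one way a system could keep inter-agent loop detection within a limited communication budget, in the spirit of the descriptor-based exchanges mentioned in the TL;DR.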