
MR-COGraphs: Communication-efficient Multi-Robot Open-vocabulary Mapping System via 3D Scene Graphs

Qiuyi Gu, Zhaocheng Ye, Jincheng Yu, Jiahao Tang, Tinghao Yi, Yuhan Dong, Jian Wang, Jinqiang Cui, Xinlei Chen, Yu Wang

TL;DR

MR-COGraphs is a graph-based open-vocabulary mapping framework for multi-robot systems built on COGraphs, 3D scene graphs whose object nodes carry semantic features. Before transmission, an encoder compresses each node's semantic feature from 512-D to 3-D, substantially reducing data volume while preserving mapping and query performance. On the receiving side, a decoder recovers the high-dimensional features for open-vocabulary queries, and local graphs are merged into a global map through feature-based place recognition and translation estimation. Evaluations on Replica, Isaac Sim, and real-world experiments show data-volume reductions of roughly 89-95% with negligible impact on object retrieval accuracy and map quality. This work enables scalable collaborative perception in communication-limited environments and provides open-source datasets and benchmarks to advance open-vocabulary, graph-based mapping.
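
The 512-D to 3-D compression ratio can be sanity-checked with simple arithmetic. The sketch below counts only the bytes of per-node semantic features. It assumes each value is sent as a 32-bit float and uses a hypothetical local COGraph of 200 object nodes; neither the storage format nor the node count comes from the paper.

```python
FLOAT32_BYTES = 4  # assumption: each feature value is sent as a 32-bit float


def feature_payload_bytes(num_nodes: int, dim: int, bytes_per_value: int = FLOAT32_BYTES) -> int:
    """Bytes needed to send one semantic feature vector per object node."""
    return num_nodes * dim * bytes_per_value


def reduction_percent(original: int, compressed: int) -> float:
    """Percentage of the original payload saved by compression."""
    return 100.0 * (1.0 - compressed / original)


num_nodes = 200  # hypothetical number of object nodes in one local COGraph
raw = feature_payload_bytes(num_nodes, 512)  # uncompressed 512-D features
enc = feature_payload_bytes(num_nodes, 3)    # encoded 3-D features
print(raw, enc, round(reduction_percent(raw, enc), 2))  # prints: 409600 2400 99.41
```

The ratio does not depend on the node count: the feature payload shrinks to 3/512 of its original size, a saving of about 99.4%. That is more than the 89-95% reported end to end. Presumably the gap comes from the rest of each transmitted COGraph, such as node geometry, edges, and other metadata, which this sketch assumes are sent uncompressed. That explanation is our reading, not a breakdown given by the authors.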

Abstract

Collaborative perception in unknown environments is crucial for multi-robot systems. With the emergence of foundation models, robots can now not only perceive geometric information but also achieve open-vocabulary scene understanding. However, existing map representations that support open-vocabulary queries often involve large data volumes, which become a bottleneck for multi-robot transmission in communication-limited environments. To address this challenge, we develop a method to construct a graph-structured 3D representation called COGraph, where nodes represent objects with semantic features and edges capture their spatial adjacency relationships. Before transmission, a data-driven feature encoder compresses the feature dimensions of the COGraph. Upon receiving COGraphs from other robots, a decoder recovers the semantic features of each node. We also propose a feature-based approach for place recognition and translation estimation, which enables local COGraphs to be merged into a unified global map. We validate our framework on two realistic datasets and in a real-world environment. The results demonstrate that, compared to existing baselines for open-vocabulary map construction, our framework reduces the data volume by over 80% without degrading mapping or query performance. For more details, please visit our project repository at https://github.com/efc-robot/MR-COGraphs.
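
The merging step (feature-based place recognition followed by translation estimation) can be illustrated with a small stdlib-only Python sketch. Everything here is our own assumption rather than the authors' implementation. This includes the `Node`/`COGraph` classes, matching nodes by mutual nearest neighbours on the cosine similarity of their decoded features, the 0.9 similarity threshold, and the three-match minimum for declaring a place recognised. It also assumes the two robots' frames differ only by a translation, which is estimated as the per-axis median of matched position offsets. The one-hot vectors in the example are toy stand-ins for real 512-D features.

```python
import math
import statistics
from dataclasses import dataclass, field

Vec = tuple[float, ...]


@dataclass
class Node:
    position: tuple[float, float, float]  # object centroid in the robot's local frame
    feature: Vec  # (decoded) semantic feature vector


@dataclass
class COGraph:
    nodes: list[Node] = field(default_factory=list)
    edges: set[tuple[int, int]] = field(default_factory=set)  # spatial adjacency (i < j)


def cosine(a: Vec, b: Vec) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_nodes(g1: COGraph, g2: COGraph, min_sim: float = 0.9) -> list[tuple[int, int]]:
    """Hypothetical place-recognition stand-in: mutual nearest neighbours in feature space."""
    if not g1.nodes or not g2.nodes:
        return []
    sim = [[cosine(a.feature, b.feature) for b in g2.nodes] for a in g1.nodes]
    best12 = [max(range(len(g2.nodes)), key=lambda j: row[j]) for row in sim]
    best21 = [max(range(len(g1.nodes)), key=lambda i: sim[i][j]) for j in range(len(g2.nodes))]
    return [(i, j) for i, j in enumerate(best12) if best21[j] == i and sim[i][j] >= min_sim]


def estimate_translation(g1: COGraph, g2: COGraph, matches) -> tuple[float, ...]:
    """Per-axis median offset, assuming the frames differ only by a translation."""
    return tuple(
        statistics.median(g1.nodes[i].position[k] - g2.nodes[j].position[k] for i, j in matches)
        for k in range(3)
    )


def merge(g1: COGraph, g2: COGraph, min_matches: int = 3):
    """Express g2 in g1's frame and fuse it; returns (merged, t), or None if unrecognised."""
    matches = match_nodes(g1, g2)
    if len(matches) < min_matches:
        return None  # place not recognised: too few consistent matches
    t = estimate_translation(g1, g2, matches)
    merged = COGraph(list(g1.nodes), set(g1.edges))
    index_map = {j: i for i, j in matches}  # matched objects reuse g1's node
    for j, node in enumerate(g2.nodes):
        if j not in index_map:  # unseen object: shift into g1's frame and append
            index_map[j] = len(merged.nodes)
            shifted = tuple(p + d for p, d in zip(node.position, t))
            merged.nodes.append(Node(shifted, node.feature))
    for a, b in g2.edges:  # carry over g2's adjacency with remapped indices
        u, v = index_map[a], index_map[b]
        if u != v:
            merged.edges.add((min(u, v), max(u, v)))
    return merged, t


def onehot(k: int, dim: int = 5) -> Vec:
    return tuple(1.0 if i == k else 0.0 for i in range(dim))


g1 = COGraph(
    [Node((0.0, 0.0, 0.0), onehot(0)), Node((1.0, 0.0, 0.0), onehot(1)),
     Node((0.0, 1.0, 0.0), onehot(2)), Node((2.0, 2.0, 0.0), onehot(3))],
    {(0, 1), (0, 2), (2, 3)},
)
g2 = COGraph(  # robot 2's frame: the same three objects shifted, plus one new object
    [Node((-1.0, 2.0, 0.0), onehot(0)), Node((0.0, 2.0, 0.0), onehot(1)),
     Node((-1.0, 3.0, 0.0), onehot(2)), Node((3.0, 3.0, 0.0), onehot(4))],
    {(0, 1), (1, 3)},
)
merged, t = merge(g1, g2)
print(t, len(merged.nodes), sorted(merged.edges))
# prints: (1.0, -2.0, 0.0) 5 [(0, 1), (0, 2), (1, 4), (2, 3)]
```

The median keeps a single wrong match from skewing the offset. A real system would also need to handle rotation and more systematic outlier rejection (for example with RANSAC), and this sketch omits both.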

Paper Structure

This paper contains 28 sections, 2 equations, 7 figures, and 4 tables.

Figures (7)

  • Figure 1: Overview of the MR-COGraphs Framework.
  • Figure 2: The Generation Process of COGraphs.
  • Figure 3: Comparison of the original and decoded features when the encoder and decoder are trained on household-related images from ImageNet.
  • Figure 4: COGraphs Merging.
  • Figure 5: Feature Compression Evaluation.
  • ...and 2 more figures