MA3DSG: Multi-Agent 3D Scene Graph Generation for Large-Scale Indoor Environments

Yirum Kim; Jaewoo Kim; Ue-Hwan Kim

TL;DR

This work tackles the scalability gap in 3D scene graph generation by introducing MA3DSG, a training-free, multi-agent framework in which each agent incrementally builds a local graph and a lightweight graph alignment and update mechanism fuses these local graphs into a global 3D semantic scene graph. It also presents MA3DSG-Bench, a benchmark that simulates diverse agent configurations, domain sizes, and dynamic conditions to assess performance and scalability in large-scale indoor environments. Empirical results show MA3DSG achieving competitive accuracy while running up to 4x faster than a single-agent system and using up to ~98x less data traffic than multi-agent baselines, especially in dynamic LDCP scenarios. The work lays a foundation for scalable, real-world multi-agent 3DSGG systems and provides a practical benchmark for future research.

Abstract

Current 3D scene graph generation (3DSGG) approaches rely heavily on a single-agent assumption and small-scale environments, which limits their scalability to real-world scenarios. In this work, we introduce the Multi-Agent 3D Scene Graph Generation (MA3DSG) model, the first framework designed to tackle this scalability challenge using multiple agents. We develop a training-free graph alignment algorithm that efficiently merges partial query graphs from individual agents into a unified global scene graph. Leveraging extensive analysis and empirical insights, our approach enables conventional single-agent systems to operate collaboratively without requiring any learnable parameters. To rigorously evaluate 3DSGG performance, we propose MA3DSG-Bench, a benchmark that supports diverse agent configurations, domain sizes, and environmental conditions, providing a more general and extensible evaluation framework. This work lays a solid foundation for scalable, multi-agent 3DSGG research.
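The abstract does not spell out the alignment algorithm, so the following is only an illustrative sketch of what a training-free merge of a partial graph into a global graph could look like. It assumes each node carries a feature vector and matches nodes greedily by cosine similarity against a fixed threshold; the function name `align_and_merge`, the threshold value, the feature averaging, and the data layout are all hypothetical and are not taken from MA3DSG.

```python
import numpy as np


def align_and_merge(global_nodes, global_edges, local_nodes, local_edges, threshold=0.8):
    """Merge a local graph into a global graph in place (hypothetical sketch).

    global_nodes: dict[int, np.ndarray]   global node id -> feature vector
    global_edges: set[(int, str, int)]    (source, relation, target) triples
    local_nodes:  dict[str, np.ndarray]   local node id -> feature vector
    local_edges:  set[(str, str, str)]    triples over local node ids
    Returns the mapping from local node ids to global node ids.
    """
    mapping = {}
    used = set()  # global nodes already claimed by a node of this local graph
    for lid, lfeat in local_nodes.items():
        best_id, best_sim = None, threshold
        for gid, gfeat in global_nodes.items():
            if gid in used:
                continue
            denom = np.linalg.norm(lfeat) * np.linalg.norm(gfeat) + 1e-12
            sim = float(lfeat @ gfeat) / denom
            if sim >= best_sim:
                best_id, best_sim = gid, sim
        if best_id is None:
            # No sufficiently similar global node: register a new one.
            best_id = max(global_nodes, default=-1) + 1
            global_nodes[best_id] = lfeat.copy()
        else:
            # Matched: fuse features by simple averaging (an assumption).
            global_nodes[best_id] = (global_nodes[best_id] + lfeat) / 2.0
        used.add(best_id)
        mapping[lid] = best_id
    # Re-express local edges in global ids and add them to the global graph.
    for src, rel, dst in local_edges:
        global_edges.add((mapping[src], rel, mapping[dst]))
    return mapping


# Toy example with made-up features and relations.
g_nodes = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0])}
g_edges = {(0, "next to", 1)}
l_nodes = {"a": np.array([0.9, 0.1]), "b": np.array([-1.0, -1.0])}
l_edges = {("a", "left of", "b")}
node_map = align_and_merge(g_nodes, g_edges, l_nodes, l_edges)
```

In this toy run, local node `a` is matched to an existing global node, while `b` has no similar counterpart and is added as a new node. Only matched ids and new nodes need to be exchanged in such a scheme, which is one plausible reason a graph-level merge can keep data traffic low.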

Paper Structure (34 sections, 2 equations, 4 figures, 3 tables, 1 algorithm)

Introduction
Related Work
3D Scene Graph Generation
Hierarchical 3D Scene Graphs
3D Semantic Scene Graphs
Multi-Agent System
Graph Alignment
Methodology
Problem Formulation
Overview
3D Semantic Scene Graph Generation
3D Global Segmentation Map (3D GSM)
Feature Graph
3D Semantic Scene Graph Alignment
Graph Alignment
...and 19 more sections

Figures (4)

Figure 1: Comparison of Runtime and Data Traffic. Our MA3DSG (14.8 min, 3.7 MB) runs $4\times$ faster than the single-agent system (SGFN, 61.8 min) and uses $98\times$ less data traffic than the multi-agent system (SGFN + SG-PGM, 364.1 MB) in extremely large-scale environments. Unlike the single-agent baselines and MA3DSG, which were executed only on CPUs, the multi-agent baselines used GPUs on the backend due to their model complexity.
Figure 2: The overall architecture of the proposed MA3DSG. Each agent incrementally generates 3D semantic scene graphs in a large-scale environment. The framework consists of multi-agent exploration, 3D semantic scene graph generation, and graph alignment, where agents collaboratively construct and integrate local scene graphs into a unified global representation.
Figure 3: Unified domain evaluation. (a) Prior works treat each explored scene separately. (b) A newly annotated final 3D scene graph reflects temporal changes from randomly ordered visits for the LDCP scenario.
Figure 4: Qualitative results of SGFN and MA3DSG. We visualize (a) incrementally scanned point clouds, (b) ground truth instance segmentation, (c) ground truth 3D Semantic Scene Graph, (d) SGFN-generated, and (e) MA3DSG-generated 3D Semantic Scene Graphs. For the same room, the upper row shows SCP results and the lower row shows LDCP results.
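
The $4\times$ and $98\times$ factors quoted in the Figure 1 caption follow directly from the runtimes and traffic volumes given there. A short check, using only the numbers from the caption (the variable names are illustrative):

```python
# Values quoted in the Figure 1 caption.
ma3dsg_runtime_min = 14.8   # MA3DSG runtime
sgfn_runtime_min = 61.8     # single-agent SGFN runtime
ma3dsg_traffic_mb = 3.7     # MA3DSG data traffic
baseline_traffic_mb = 364.1  # multi-agent SGFN + SG-PGM data traffic

speedup = sgfn_runtime_min / ma3dsg_runtime_min
traffic_reduction = baseline_traffic_mb / ma3dsg_traffic_mb

print(f"speedup over single-agent SGFN: {speedup:.1f}x")
print(f"traffic reduction vs SGFN + SG-PGM: {traffic_reduction:.1f}x")
```

The ratios come out at roughly 4.2 and 98.4, consistent with the rounded factors in the caption. Note that the speedup is measured against the single-agent baseline, while the traffic reduction is measured against the multi-agent baseline.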
