MA3DSG: Multi-Agent 3D Scene Graph Generation for Large-Scale Indoor Environments
Yirum Kim, Jaewoo Kim, Ue-Hwan Kim
TL;DR
This work tackles the scalability gap in 3D scene graph generation by introducing MA3DSG, a training-free, multi-agent framework that incrementally builds local graphs and fuses them into a global 3D semantic scene graph through a lightweight graph alignment and update mechanism. It also presents MA3DSG-Bench, a comprehensive benchmark that simulates diverse agent configurations, domain sizes, and dynamic conditions to assess performance and scalability in large-scale indoor environments. Empirical results show MA3DSG achieving competitive accuracy with substantial gains in speed (up to 4x faster) and dramatic reductions in data traffic (up to ~98x) compared to multi-agent baselines, especially in dynamic LDCP scenarios. The work lays a foundation for scalable, real-world multi-agent 3DSGG systems and provides a practical benchmark for future research.
Abstract
Current 3D scene graph generation (3DSGG) approaches rely heavily on a single-agent assumption and small-scale environments, which limits their scalability to real-world scenarios. In this work, we introduce the Multi-Agent 3D Scene Graph Generation (MA3DSG) model, the first framework designed to tackle this scalability challenge using multiple agents. We develop a training-free graph alignment algorithm that efficiently merges partial query graphs from individual agents into a unified global scene graph. Leveraging extensive analysis and empirical insights, our approach enables conventional single-agent systems to operate collaboratively without requiring any learnable parameters. To rigorously evaluate 3DSGG performance, we propose MA3DSG-Bench, a benchmark that supports diverse agent configurations, domain sizes, and environmental conditions, providing a more general and extensible evaluation framework. This work lays a solid foundation for scalable, multi-agent 3DSGG research.
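To make the idea of merging partial graphs into a global scene graph concrete, here is a minimal Python sketch. It is not the MA3DSG alignment algorithm, whose details the abstract does not give. It assumes a simple stand-in rule: a local node is matched to a global node when both carry the same semantic label and their centroids lie within a fixed radius in a shared world frame. The names `SceneGraph`, `merge_local_graph`, and `radius` are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    # node id -> (semantic label, 3D centroid in a shared world frame)
    nodes: dict = field(default_factory=dict)
    # (subject id, object id) -> relation label
    edges: dict = field(default_factory=dict)


def merge_local_graph(global_g, local_g, radius=0.5):
    """Fuse one agent's local graph into the global graph, in place.

    Assumed matching rule (illustrative only): same label and centroid
    distance within `radius`. Returns the local-to-global node id mapping.
    """
    mapping = {}
    for lid, (label, pos) in local_g.nodes.items():
        best, best_d = None, radius
        for gid, (glabel, gpos) in global_g.nodes.items():
            d = math.dist(pos, gpos)
            # Each global node absorbs at most one local node per merge.
            if glabel == label and d <= best_d and gid not in mapping.values():
                best, best_d = gid, d
        if best is None:
            # No match found: register the object as a new global node.
            best = len(global_g.nodes)
            global_g.nodes[best] = (label, pos)
        mapping[lid] = best
    # Re-index the local relations into global node ids.
    for (s, o), rel in local_g.edges.items():
        global_g.edges[(mapping[s], mapping[o])] = rel
    return mapping


# Two agents observe overlapping parts of the same room.
agent_a = SceneGraph(
    nodes={0: ("chair", (0.0, 0.0, 0.0)), 1: ("table", (1.0, 0.0, 0.0))},
    edges={(0, 1): "next to"},
)
agent_b = SceneGraph(
    nodes={0: ("table", (1.1, 0.0, 0.0)), 1: ("lamp", (1.1, 0.0, 1.0))},
    edges={(1, 0): "standing on"},
)

global_graph = SceneGraph()
map_a = merge_local_graph(global_graph, agent_a)
map_b = merge_local_graph(global_graph, agent_b)
```

In this toy run both agents see the table, so the fused graph holds three nodes rather than four, and the lamp relation from the second agent is attached to the shared table node. Because the matching rule involves no learned parameters, the sketch is training-free in the same loose sense as the abstract describes, although a practical system would need a more robust matching criterion.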
