Generation of Uncertainty-Aware Emergent Concepts in Factorized 3D Scene Graphs via Graph Neural Networks
Jose Andres Millan-Romera, Muhammad Shaheer, Miguel Fernandez-Cortizas, Martin R. Oswald, Holger Voos, Jose Luis Sanchez-Lopez
TL;DR
The paper addresses how to autonomously derive emergent spatial concepts, such as rooms, from primitive plane observations within a Factorized 3D Scene Graph (F3DSG) and to encode these concepts as uncertainty-aware optimization factors in a SLAM backend. It introduces a three-stage learning-based pipeline: Sem-GAT for semantic grouping of planes into emergent concepts, Met-GNN for predicting centroids of these concepts, and learned factors with learned covariances for joint optimization in SLAM. The authors demonstrate that these learned components can improve concept detection, trajectory estimation, and map reconstruction across simulated and real indoor environments, with notable robustness in complex, non-Manhattan layouts. The work provides a foundation for scalable, generalizable spatial understanding in robotics by replacing manual, concept-specific heuristics with a unified, uncertainty-aware learning framework integrated into F3DSG-based SLAM.
Abstract
Enabling robots to autonomously discover emergent spatial concepts (e.g., rooms) from primitive geometric observations (e.g., planar surfaces) within 3D Scene Graphs is essential for robust indoor navigation and mapping. These graphs provide a hierarchical metric-semantic representation in which such concepts are organized. To further enhance graph-SLAM performance, Factorized 3D Scene Graphs incorporate these concepts as optimization factors that constrain relative geometry and enforce global consistency. However, both stages of this process remain largely manual: concepts are typically derived using hand-crafted, concept-specific heuristics, and the factors and their covariances are likewise designed by hand. This reliance on manual specification limits generalization across diverse environments and scalability to new concept classes. This paper presents, for the first time, a learning-based method that generates emergent spatial concepts online as optimizable factors within a SLAM backend, reducing the need to handcraft both concept generation and the definition of the corresponding factors and covariances. Across simulated and real indoor scenarios, our approach improves complex concept detection by 20.7% and 5.3% and map reconstruction by 12.3% and 3.8%, respectively, and improves trajectory estimation by 19.2%, highlighting the benefits of this integration for robust and adaptive spatial understanding.
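To make the notion of an uncertainty-aware factor concrete, the following minimal sketch shows how a covariance typically weights a factor's residual in a graph-SLAM cost through the squared Mahalanobis norm. It is an illustration under stated assumptions, not the paper's implementation: the function name `mahalanobis_cost`, the 2D residual, and the covariance values are all hypothetical, and in the described method the covariance would be produced by a learned model rather than fixed by hand.

```python
import numpy as np


def mahalanobis_cost(residual, covariance):
    """Squared Mahalanobis norm r^T Sigma^{-1} r of a factor residual.

    Hypothetical helper for illustration; not taken from the paper.
    """
    r = np.asarray(residual, dtype=float)
    sigma = np.asarray(covariance, dtype=float)
    # Solve Sigma x = r rather than forming the explicit inverse.
    return float(r @ np.linalg.solve(sigma, r))


# Assumed residual: offset between a predicted concept centroid (e.g., a room
# center) and the current estimate in the optimizer, in meters.
residual = np.array([0.2, -0.1])

# Two assumed covariances: a confident prediction and an uncertain one.
confident = 0.01 * np.eye(2)
uncertain = 1.0 * np.eye(2)

cost_confident = mahalanobis_cost(residual, confident)
cost_uncertain = mahalanobis_cost(residual, uncertain)

print(f"confident factor cost: {cost_confident:.3f}")
print(f"uncertain factor cost: {cost_uncertain:.3f}")
```

With the same residual, the confident factor contributes a much larger cost than the uncertain one, so the optimizer is pulled more strongly toward predictions that carry a small covariance.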
