
Generation of Uncertainty-Aware Emergent Concepts in Factorized 3D Scene Graphs via Graph Neural Networks

Jose Andres Millan-Romera, Muhammad Shaheer, Miguel Fernandez-Cortizas, Martin R. Oswald, Holger Voos, Jose Luis Sanchez-Lopez

TL;DR

The paper addresses how to autonomously derive emergent spatial concepts, such as rooms, from primitive plane observations within a Factorized 3D Scene Graph (F3DSG) and to encode these concepts as uncertainty-aware optimization factors in a SLAM backend. It introduces a three-stage learning-based pipeline: Sem-GAT for semantic grouping of planes into emergent concepts, Met-GNN for predicting centroids of these concepts, and learned factors with learned covariances for joint optimization in SLAM. The authors demonstrate that these learned components can improve concept detection, trajectory estimation, and map reconstruction across simulated and real indoor environments, with notable robustness in complex, non-Manhattan layouts. The work provides a foundation for scalable, generalizable spatial understanding in robotics by replacing manual, concept-specific heuristics with a unified, uncertainty-aware learning framework integrated into F3DSG-based SLAM.
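The grouping stage can be pictured with a small sketch. It assumes the Sem-GAT has already labelled each proximity edge; the label strings `same_room` / `same_wall` and the helper `cluster_planes` are hypothetical. Planes joined by edges of one label are merged into candidate concepts with a union-find structure. This is plain connected-components clustering, a deliberate simplification: the paper's room clustering additionally leverages cycles and only creates nodes that are consistent with previous observations.

```python
from typing import Dict, List, Tuple

# Hypothetical edge labels standing in for Sem-GAT edge classifications.
SAME_ROOM = "same_room"
SAME_WALL = "same_wall"


class UnionFind:
    """Disjoint-set forest with path compression and union by size."""

    def __init__(self, nodes: List[int]) -> None:
        self.parent = {n: n for n in nodes}
        self.size = {n: 1 for n in nodes}

    def find(self, x: int) -> int:
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: re-point every node on the path at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def cluster_planes(
    plane_ids: List[int], labeled_edges: List[Tuple[int, int, str]], label: str
) -> List[List[int]]:
    """Hypothetical helper: merge planes joined by edges carrying `label`.

    Plain connected components (a simplification of the paper's clustering).
    Returns clusters of two or more planes, each sorted by plane id.
    """
    uf = UnionFind(plane_ids)
    for a, b, edge_label in labeled_edges:
        if edge_label == label:
            uf.union(a, b)
    groups: Dict[int, List[int]] = {}
    for p in plane_ids:
        groups.setdefault(uf.find(p), []).append(p)
    return sorted(sorted(g) for g in groups.values() if len(g) >= 2)


planes = [0, 1, 2, 3, 4, 5]
edges = [(0, 1, SAME_ROOM), (1, 2, SAME_ROOM), (2, 3, SAME_ROOM), (3, 0, SAME_ROOM),
         (4, 5, SAME_WALL), (3, 4, "none")]
print(cluster_planes(planes, edges, SAME_ROOM))  # [[0, 1, 2, 3]]
print(cluster_planes(planes, edges, SAME_WALL))  # [[4, 5]]
```

Here four planes linked in a ring of same-room edges collapse into one room candidate, while the single same-wall edge yields one wall candidate; the unlabelled edge between planes 3 and 4 merges nothing.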

Abstract

Enabling robots to autonomously discover emergent spatial concepts (e.g., rooms) from primitive geometric observations (e.g., planar surfaces) within 3D Scene Graphs is essential for robust indoor navigation and mapping. These graphs provide a hierarchical metric-semantic representation in which such concepts are organized. To further enhance graph-SLAM performance, Factorized 3D Scene Graphs incorporate these concepts as optimization factors that constrain relative geometry and enforce global consistency. However, both stages of this process remain largely manual: concepts are typically derived using hand-crafted, concept-specific heuristics, while factors and their covariances are likewise manually designed. This reliance on manual specification limits generalization across diverse environments and scalability to new concept classes. This paper presents, for the first time, a learning-based method that generates emergent spatial concepts online as optimizable factors within a SLAM backend, reducing the need to handcraft both concept generation and the definition of the corresponding factors and covariances. In both simulated and real indoor scenarios, our approach improves complex concept detection by 20.7% and 5.3%, trajectory estimation by 19.2%, and map reconstruction by 12.3% and 3.8%, respectively, highlighting the benefits of this integration for robust and adaptive spatial understanding.
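To make the role of a learned covariance concrete, here is a minimal sketch, not the paper's factor definition: a generic Gaussian factor that penalizes the difference between a concept node's estimated centroid and a network-predicted centroid, weighted by a diagonal covariance whose log-variances are assumed to be predicted alongside the centroid. The function names, the 2D centroid, and the diagonal parameterization are all assumptions for illustration.

```python
import math
from typing import Sequence


def mahalanobis_cost(
    estimate: Sequence[float], predicted: Sequence[float], log_var: Sequence[float]
) -> float:
    """0.5 * r^T Sigma^-1 r with r = estimate - predicted, Sigma = diag(exp(log_var)).

    Hypothetical factor cost: `predicted` stands in for a network-predicted
    centroid and `log_var` for network-predicted log-variances.
    """
    if not (len(estimate) == len(predicted) == len(log_var)):
        raise ValueError("estimate, predicted and log_var must have equal length")
    cost = 0.0
    for e, p, lv in zip(estimate, predicted, log_var):
        r = e - p
        cost += 0.5 * r * r / math.exp(lv)  # each axis weighted by 1 / variance
    return cost


def gaussian_nll(
    estimate: Sequence[float], predicted: Sequence[float], log_var: Sequence[float]
) -> float:
    """Gaussian negative log-likelihood up to a constant: cost + 0.5 * log det Sigma."""
    return mahalanobis_cost(estimate, predicted, log_var) + 0.5 * sum(log_var)


# Same unit error along x, under a confident vs. an uncertain prediction.
room_estimate, room_predicted = (2.0, 1.0), (1.0, 1.0)
print(mahalanobis_cost(room_estimate, room_predicted, (0.0, 0.0)))  # 0.5
print(mahalanobis_cost(room_estimate, room_predicted, (math.log(4.0), 0.0)))
```

A smaller predicted variance makes the same residual cost more, so the optimizer leans harder on confident predictions. The extra 0.5 log det Σ term in `gaussian_nll` is what would stop a network trained with this loss from simply inflating every variance.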

Paper Structure

This paper contains 14 sections, 9 equations, 8 figures, and 5 tables.

Figures (8)

  • Figure 1: System Overview. An initial proximity graph is built from the plane nodes inside a factorized 3D scene graph (F3DSG). A graph neural network (GNN) classifies its edges into same room or same wall. The classified edges are clustered, and a room or wall semantic node is generated for each cluster. Each new node then receives a geometric definition from a concept-specific GNN. The metric-semantic nodes are incorporated into the F3DSG along with their factors and covariances.
  • Figure 2: System architecture. After the plane layer is received from the F3DSG of the SLAM backend, every plane node is connected to its K nearest neighbours, building the initial graph by proximity. This graph is fed to the Sem-GAT, which classifies the edges into same room or same wall. The two edge types are clustered separately, leveraging cycles for same-room edges, and a room or wall semantic node is generated for each cluster if it is consistent with previous observations. Afterwards, the geometric origin of each new node is defined by the Met-GNN corresponding to its concept. A new factor, along with its covariance, is created for every new node and incorporated into the F3DSG for use by the SLAM backend.
  • Figure 3: Edge classification training. From the synthetic dataset (top left), only plane nodes are extracted, linked by proximity (top right), and fed to the Sem-GAT, which infers the edge type (bottom right). The loss is computed against the ground truth in the synthetic dataset (bottom left, red and orange lines).
  • Figure 4: Origin inference training. From the synthetic dataset (top left), subgraphs containing room or wall nodes and their adjacent planes are extracted (top right) and fed independently to the corresponding Met-GNN. The loss is computed against the ground-truth origins (bottom left, red and orange squares).
  • Figure 5: 3DSG generation from LiDAR data. Top-down RViz views of the generated 3DSGs. Complex rooms fully detected by our method are highlighted in color.
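Both the system overview and the architecture start from an initial graph built by proximity, with every plane node connected to its K neighbours. A minimal sketch of that step follows, assuming each plane is summarized by a 2D position and that proximity means Euclidean distance; the excerpt specifies neither the plane representation nor the metric, and `knn_proximity_graph` is a hypothetical helper name.

```python
import math
from typing import Dict, Set, Tuple

Point = Tuple[float, float]


def knn_proximity_graph(positions: Dict[int, Point], k: int) -> Set[Tuple[int, int]]:
    """Connect every node to its k nearest neighbours (hypothetical helper).

    Edges are undirected and stored as (smaller_id, larger_id) pairs.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    ids = list(positions)
    edges: Set[Tuple[int, int]] = set()
    for i in ids:
        xi, yi = positions[i]
        # Rank all other nodes by Euclidean distance, breaking ties by id.
        ranked = sorted(
            (j for j in ids if j != i),
            key=lambda j: (math.hypot(positions[j][0] - xi, positions[j][1] - yi), j),
        )
        for j in ranked[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


# Toy plane positions (assumed: projected plane centres on the floor plan).
plane_positions = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.0, 1.0), 3: (5.0, 6.0)}
print(sorted(knn_proximity_graph(plane_positions, k=1)))  # [(0, 1), (0, 2), (2, 3)]
```

Because edges are stored undirected, a node can end up with more than K incident edges when it is also among the nearest neighbours of other nodes, as node 0 does here.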