Learning High-level Semantic-Relational Concepts for SLAM

Jose Andres Millan-Romera, Hriday Bavle, Muhammad Shaheer, Martin R. Oswald, Holger Voos, Jose Luis Sanchez-Lopez

TL;DR

This work proposes an algorithm based on Graph Neural Networks for learning high-level semantic-relational concepts that can be inferred from the low-level factor graph, and integrates it into the S-Graphs+ algorithm providing improved pose and map accuracy compared to the baseline while further enhancing the scene representation.

Abstract

Recent works on SLAM extend their pose graphs with higher-level semantic concepts such as Rooms and exploit the relationships between them, not only to provide a richer representation of the situation/environment but also to improve the accuracy of its estimation. Concretely, our previous work, Situational Graphs (S-Graphs+), a pioneer in jointly leveraging semantic relationships in the factor optimization process, relies on semantic entities such as Planes and Rooms, whose relationship is mathematically defined. Nevertheless, there is no unique approach to finding all the hidden patterns in lower-level factor graphs that correspond to high-level concepts of different natures. This problem is currently tackled with ad-hoc algorithms, which limits graph expressiveness. To overcome this limitation, we propose an algorithm based on Graph Neural Networks for learning high-level semantic-relational concepts that can be inferred from the low-level factor graph. Given a set of mapped Planes, our algorithm is capable of inferring Room entities and their relationships to the Planes. Additionally, to demonstrate the versatility of our method, our algorithm can infer a further semantic-relational concept, i.e. Wall, and its relationship with its Planes. We validate our method on both simulated and real datasets, demonstrating improved performance over two baseline approaches. Furthermore, we integrate our method into the S-Graphs+ algorithm, providing improved pose and map accuracy compared to the baseline while further enhancing the scene representation.

Paper Structure

This paper contains 14 sections, 5 equations, 5 figures, and 5 tables.

Figures (5)

  • Figure 1: System Overview. We learn how to generate high-level semantic concepts, such as Rooms and Walls, from low-level observed entities, such as Planes. In this example, the Plane information is retrieved from the low-level layer of S-Graphs+ and transformed into a proximity graph in which a GNN classifies each edge as same Room or same Wall. The newly inferred edges are further clustered leveraging the existence of cycles, obtaining a new Room or Wall node for each cluster. By augmenting S-Graphs+ with the new nodes and edges, we improve the quality of the map and the estimated camera trajectory.
  • Figure 2: System Architecture. This illustrates the entire process, from the reception of geometric entities to the inclusion of new higher-level entities in S-Graphs+. First, the GNN is trained off-line to update the embeddings of nodes and edges and to classify the edges. During the SLAM process, the raw low-level nodes are retrieved from S-Graphs+ and connected in a proximity graph. The GNN classifies each edge as "same Room", "same Wall", or neither. "same Room" edges are clustered leveraging cycles. A subgraph is generated for each cluster or standalone edge and finally included in S-Graphs+.
  • Figure 3: a) Learned Patterns. Example of the distribution of Planes that belong to the same high-level concept. b) Initial Embeddings. The node features of each Plane are composed of the width (w) and the normal (n) from which it was observed. The edge features are defined by the relative position of the centroids (c) and the closest distance (cd). c) Neural Architecture. The classification of both "same Room" and "same Wall" relations is accomplished by two independent neural networks with similar architecture.
  • Figure 4: Graph Expressiveness. Measured by precision and recall in Room detection, with Hydra RS (HRS) and S-Graphs+ RS (SRS) as baselines. These are compared with our Conservative (C) and Greedy (G) approaches in five simulated and four real scenes. For each approach, experiments are aggregated and the metrics are presented by mean, standard deviation, maximum, and minimum.
  • Figure 5: Graph Expressiveness. Qualitative comparison of Hydra RS, S-Graphs+ RS, and Ours C on two example datasets, SE3 (simulated) and RC2F2 (real). Room generation is presented in the first three columns, annotated with precision (P) and recall (R) ratios. The fourth column presents a 3D perspective showing the Room and Wall generation by Ours C.
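
The captions of Figures 1 to 3 describe a pipeline of initial embeddings, a proximity graph, edge classification, and cycle-based clustering. The Python sketch below illustrates only the non-learned steps, under several assumptions that are not taken from the paper: Planes are simplified to 2D top-down segments, the `Plane` class and all function names are hypothetical, the proximity threshold is arbitrary, and the cycle-based clustering is approximated by pruning nodes with fewer than two neighbours and taking connected components. The GNN itself is not reproduced; the "same Room" edges are passed in as if they were its output.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Plane:
    """Hypothetical 2D (top-down) plane: centroid, unit normal, width."""
    cx: float
    cy: float
    nx: float
    ny: float
    width: float


def endpoints(p):
    # The plane extends along the direction perpendicular to its normal.
    dx, dy = -p.ny, p.nx
    h = p.width / 2.0
    return (p.cx - dx * h, p.cy - dy * h), (p.cx + dx * h, p.cy + dy * h)


def point_segment_distance(pt, a, b):
    (px, py), (ax, ay), (bx, by) = pt, a, b
    vx, vy = bx - ax, by - ay
    denom = vx * vx + vy * vy
    t = 0.0 if denom == 0 else max(0.0, min(1.0, ((px - ax) * vx + (py - ay) * vy) / denom))
    return math.hypot(px - (ax + t * vx), py - (ay + t * vy))


def closest_distance(p, q):
    # Assumption: segments do not cross, so the minimum is reached at an endpoint.
    a1, b1 = endpoints(p)
    a2, b2 = endpoints(q)
    return min(point_segment_distance(a1, a2, b2), point_segment_distance(b1, a2, b2),
               point_segment_distance(a2, a1, b1), point_segment_distance(b2, a1, b1))


def node_features(p):
    # Width (w) and normal (n), as in the Figure 3b caption.
    return [p.width, p.nx, p.ny]


def edge_features(p, q):
    # Relative centroid position (c) and closest distance (cd).
    return [q.cx - p.cx, q.cy - p.cy, closest_distance(p, q)]


def proximity_graph(planes, max_dist):
    # Connect every pair of planes closer than an (arbitrary) threshold.
    return [(i, j) for i in range(len(planes)) for j in range(i + 1, len(planes))
            if closest_distance(planes[i], planes[j]) <= max_dist]


def cluster_by_cycles(num_nodes, same_room_edges):
    # Simplified stand-in: keep only nodes that survive degree-<2 pruning
    # (the 2-core), then return the connected components that remain.
    adj = {i: set() for i in range(num_nodes)}
    for i, j in same_room_edges:
        adj[i].add(j)
        adj[j].add(i)
    changed = True
    while changed:
        changed = False
        for n in list(adj):
            if len(adj[n]) < 2:
                for m in adj[n]:
                    adj[m].discard(n)
                del adj[n]
                changed = True
    clusters, seen = [], set()
    for start in sorted(adj):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            n = stack.pop()
            component.append(n)
            for m in adj[n]:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        clusters.append(sorted(component))
    return clusters


# Four walls of a 4 m x 4 m room plus one distant plane.
planes = [Plane(0, 2, 0, -1, 4), Plane(0, -2, 0, 1, 4),
          Plane(2, 0, -1, 0, 4), Plane(-2, 0, 1, 0, 4),
          Plane(10, 0, -1, 0, 4)]
edges = proximity_graph(planes, max_dist=0.5)
rooms = cluster_by_cycles(len(planes), edges)  # edges stand in for GNN output
```

In this toy scene the four touching walls form a single cycle in the proximity graph, so they end up in one Room candidate, while the distant plane is left unassigned. A learned classifier would replace the step that treats every proximity edge as "same Room".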