Graph2Nav: 3D Object-Relation Graph Generation to Robot Navigation

Tixiao Shan, Abhinav Rajvanshi, Niluthpol Mithun, Han-Pang Chiu

TL;DR

Graph2Nav tackles the problem of real-time 3D object-relation graph generation for robot navigation by fusing 2D panoptic scene graphs with 3D SLAM to build a global, layered 3D scene graph that includes rich object relations. It leverages PSGFormer-based 2D panoptic scene graphs to generate 3D semantics without relying on 3D-only training data, and integrates with an LLM-enabled planner (SayNav) to drive object-search tasks in real-world indoor and outdoor environments. The approach demonstrates improved 3D object localization and relation labeling accuracy, and shows navigation efficiency gains when object-relations are used to inform planning. This work broadens the applicability of 3D scene graphs to real-world outdoor scenarios and real-time robotic navigation, with potential extensions to grounding relations in LLMs for more complex manipulation tasks.

Abstract

We propose Graph2Nav, a real-time 3D object-relation graph generation framework for autonomous navigation in the real world. Our framework generates and exploits both 3D objects and a rich set of semantic relationships among objects in a 3D layered scene graph, which is applicable to both indoor and outdoor scenes. It learns to generate 3D semantic relations among objects by lifting state-of-the-art 2D panoptic scene graph methods into the 3D world via 3D semantic mapping techniques. This approach avoids the training data constraints that previous methods face when learning 3D scene graphs directly from 3D data. We conduct experiments to validate the accuracy of locating 3D objects and labeling object relations in our 3D scene graphs. We also evaluate the impact of Graph2Nav by integrating it with SayNav, a state-of-the-art planner based on large language models, on an unmanned ground robot performing object search tasks in real environments. Our results demonstrate that modeling object relations in our scene graphs improves search efficiency in these navigation tasks.

Paper Structure

This paper contains 15 sections, 1 equation, 4 figures, 3 tables.

Figures (4)

  • Figure 1: 3D scene graphs constructed using Graph2Nav for outdoor (left) and indoor (right) scenes. The graph includes a hierarchy (from top to bottom): a test site, regions, and objects. The figure also shows examples of 2D input images and the 3D point clouds generated by Graph2Nav. Note we omit text labels of objects and object-relations in 3D scene graphs for better visualization.
  • Figure 2: The process flow diagram for Graph2Nav. A pose graph-based SLAM system provides real-time pose estimates for the received image and point cloud data. Semantic objects and relations are then extracted from this data via a panoptic scene graph generation network. Finally, a consistent global 3D scene graph is generated by continuously merging newly observed objects and relations.
  • Figure 3: Two examples of object relations ("beside" and "on top of") from portions of our generated 3D scene graphs (top) with their corresponding 2D images (bottom). Note that we only show large objects with their relations for better visualization.
  • Figure 4: An example of the impact of object relations on the search plan (yellow trajectory in the bottom-left and bottom-right pictures) executed by our robot (top): without object relations (bottom-left) and with object relations (bottom-right).
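
The captions for Figures 1 and 2 describe a layered graph (site, regions, objects) that grows by continuously merging newly observed objects and relations. The sketch below is one minimal way to picture that bookkeeping; it is not the authors' implementation. All names (`SceneGraph`, `merge_object`, `merge_relation`), the same-label nearest-distance association rule, the 0.5 m `merge_radius`, and the majority vote over relation labels are assumptions made for illustration. Object positions are assumed to be already expressed in the SLAM map frame.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ObjectNode:
    node_id: int
    label: str
    position: tuple  # (x, y, z) in the map frame (assumed)
    region: str
    observations: int = 1


@dataclass
class SceneGraph:
    site: str                                       # top layer: the test site
    merge_radius: float = 0.5                       # assumed association threshold, metres
    objects: dict = field(default_factory=dict)     # node_id -> ObjectNode
    regions: dict = field(default_factory=dict)     # region name -> set of node ids
    relations: dict = field(default_factory=dict)   # (subject_id, object_id) -> Counter

    def merge_object(self, label, position, region):
        """Fuse an observation into a nearby same-label node, or add a new node."""
        for node in self.objects.values():
            close = math.dist(node.position, position) <= self.merge_radius
            if node.label == label and close:
                n = node.observations
                # Running mean of the observed positions.
                node.position = tuple(
                    (p * n + q) / (n + 1) for p, q in zip(node.position, position)
                )
                node.observations = n + 1
                return node.node_id
        node_id = len(self.objects)
        self.objects[node_id] = ObjectNode(node_id, label, tuple(position), region)
        self.regions.setdefault(region, set()).add(node_id)
        return node_id

    def merge_relation(self, subject_id, predicate, object_id):
        """Record one vote for a predicate between two object nodes."""
        votes = self.relations.setdefault((subject_id, object_id), Counter())
        votes[predicate] += 1

    def relation(self, subject_id, object_id):
        """Return the most frequently observed predicate, or None."""
        votes = self.relations.get((subject_id, object_id))
        return votes.most_common(1)[0][0] if votes else None


# Usage: two views of the same cup are fused into one node.
graph = SceneGraph(site="test_site")
cup = graph.merge_object("cup", (1.0, 2.0, 0.8), "kitchen")
table = graph.merge_object("table", (1.0, 2.0, 0.4), "kitchen")
cup_again = graph.merge_object("cup", (1.2, 2.0, 0.8), "kitchen")
graph.merge_relation(cup, "on", table)
```

A planner could then query `graph.relation(cup, table)` to learn that the cup was observed on the table. A real system would likely need a more robust association step (for example, one that uses object extent or appearance) than this distance threshold.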