Graph2Nav: 3D Object-Relation Graph Generation to Robot Navigation
Tixiao Shan, Abhinav Rajvanshi, Niluthpol Mithun, Han-Pang Chiu
TL;DR
Graph2Nav addresses real-time 3D object-relation graph generation for robot navigation by fusing 2D panoptic scene graphs with 3D SLAM to build a global, layered 3D scene graph with rich object relations. It uses PSGFormer-based 2D panoptic scene graphs to generate 3D semantics without relying on 3D-only training data, and integrates with an LLM-enabled planner (SayNav) to drive object-search tasks in real-world indoor and outdoor environments. Experiments validate the accuracy of 3D object localization and relation labeling, and show navigation efficiency gains when object relations inform planning. The work extends 3D scene graphs to real-world outdoor scenarios and real-time robotic navigation, with potential extensions to grounding relations in LLMs for more complex manipulation tasks.
Abstract
We propose Graph2Nav, a real-time 3D object-relation graph generation framework for autonomous navigation in the real world. Our framework generates and exploits both 3D objects and a rich set of semantic relationships among them in a layered 3D scene graph, which is applicable to both indoor and outdoor scenes. It learns to generate 3D semantic relations among objects by leveraging state-of-the-art 2D panoptic scene graph methods and extending them into the 3D world via 3D semantic mapping techniques. This approach avoids the training-data constraints that earlier methods face when learning 3D scene graphs directly from 3D data. We conduct experiments to validate the accuracy of locating 3D objects and labeling object relations in our 3D scene graphs. We also evaluate the impact of Graph2Nav by integrating it with SayNav, a state-of-the-art planner based on large language models, on an unmanned ground robot performing object search tasks in real environments. Our results demonstrate that modeling object relations in our scene graphs improves search efficiency in these navigation tasks.
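To make the idea of carrying 2D relations into a 3D graph concrete, the following is a minimal sketch and not the authors' implementation. It assumes a pinhole camera model, one depth value per detected object, and a known camera pose (for example from SLAM). All names here (`back_project`, `to_world`, `SceneGraph3D`, `merge_radius`) are hypothetical and chosen only for illustration, and the merge rule (same label within a fixed radius) is a simple stand-in for whatever association the real system uses.

```python
import math
from dataclasses import dataclass, field


def back_project(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) at a given depth into the camera frame."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def to_world(p_cam, R, t):
    """Apply a camera-to-world pose: R is a 3x3 rotation (nested lists), t a translation."""
    return tuple(sum(R[i][j] * p_cam[j] for j in range(3)) + t[i] for i in range(3))


@dataclass
class Node3D:
    label: str
    position: tuple  # (x, y, z) in the world frame
    count: int = 1   # number of observations merged into this node


@dataclass
class SceneGraph3D:
    merge_radius: float = 0.5                   # assumed association threshold, in meters
    nodes: list = field(default_factory=list)
    edges: dict = field(default_factory=dict)   # (subj_idx, predicate, obj_idx) -> count

    def add_object(self, label, position):
        """Merge with an existing same-label node within merge_radius, else create one."""
        for idx, node in enumerate(self.nodes):
            if node.label == label and math.dist(node.position, position) <= self.merge_radius:
                n = node.count
                # Running mean of the observed positions.
                node.position = tuple(
                    (p * n + q) / (n + 1) for p, q in zip(node.position, position)
                )
                node.count = n + 1
                return idx
        self.nodes.append(Node3D(label, tuple(position)))
        return len(self.nodes) - 1

    def add_relation(self, subj_idx, predicate, obj_idx):
        """Count how often a 2D relation triple was observed between two 3D nodes."""
        key = (subj_idx, predicate, obj_idx)
        self.edges[key] = self.edges.get(key, 0) + 1


# Example: one 2D triple ("cup", "on", "table") lifted into the 3D graph.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
graph = SceneGraph3D()
cup = graph.add_object(
    "cup", to_world(back_project(320, 240, 2.0, 500.0, 500.0, 320.0, 240.0), IDENTITY, (1.0, 0.0, 0.0))
)
table = graph.add_object(
    "table", to_world(back_project(320, 340, 2.0, 500.0, 500.0, 320.0, 240.0), IDENTITY, (1.0, 0.0, 0.0))
)
graph.add_relation(cup, "on", table)
```

Repeated observations of the same object from later frames would fall within `merge_radius` and update the existing node, while the edge count gives a simple way to keep only relations that are seen consistently.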
