Open Scene Graphs for Open World Object-Goal Navigation
Joel Loo, Zhanxin Wu, David Hsu
TL;DR
The paper introduces Open Scene Graphs (OSGs) as a configurable topo-semantic memory for open-world object-goal navigation and presents OpenSearch, a system that composes foundation models to perform zero-shot, open-vocabulary searches across diverse environments and robot embodiments. OSGs structure open-set scene information into a layered, hierarchical graph that can be instantiated per environment type, enabling LLM-based reasoning to plan, explore, and navigate toward target objects specified in natural language. The approach is validated through simulation and real-world experiments, showing improved reasoning and generalisation over existing LLM-based methods and demonstrating zero-shot robustness to new environments, robots, and instructions. Limitations include computational cost and the lack of explicit uncertainty handling; future work points to inferring OSG specifications online and to making model inference more efficient for broader real-world applicability.
Abstract
How can we build robots for open-world semantic navigation tasks, like searching for target objects in novel scenes? While foundation models have the rich knowledge and generalisation needed for these tasks, a suitable scene representation is needed to connect them into a complete robot system. We address this with Open Scene Graphs (OSGs), a topo-semantic representation that retains and organises open-set scene information for these models, and has a structure that can be configured for different environment types. We integrate foundation models and OSGs into the OpenSearch system for Open World Object-Goal Navigation, which is capable of searching for open-set objects specified in natural language, while generalising zero-shot across diverse environments and embodiments. Our OSGs enhance reasoning with Large Language Models (LLMs), enabling robust object-goal navigation that outperforms existing LLM approaches. Through simulation and real-world experiments, we validate OpenSearch's generalisation across varied environments, robots and novel instructions.
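To make the idea of a configurable, layered scene graph more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that an environment-specific specification can be reduced to an ordered list of layer names, and that the graph can be flattened to indented text for an LLM prompt. The names `OSGSpec`, `Node`, `OpenSceneGraph`, `add` and `to_text`, as well as the example layers `floor`, `room` and `object`, are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class OSGSpec:
    """Hypothetical spec: ordered layer names, coarsest first."""
    layers: list[str]

    def child_layer(self, layer: str) -> str | None:
        # Return the layer directly below `layer`, or None at the finest layer.
        i = self.layers.index(layer)
        return self.layers[i + 1] if i + 1 < len(self.layers) else None


@dataclass
class Node:
    label: str  # free-form, open-vocabulary label (assumed to come from a perception model)
    layer: str
    children: list[Node] = field(default_factory=list)


class OpenSceneGraph:
    """Illustrative hierarchical graph whose shape is constrained by a spec."""

    def __init__(self, spec: OSGSpec, root_label: str):
        self.spec = spec
        self.root = Node(root_label, spec.layers[0])

    def add(self, parent: Node, label: str) -> Node:
        # A child always belongs to the layer directly below its parent.
        layer = self.spec.child_layer(parent.layer)
        if layer is None:
            raise ValueError(f"{parent.layer!r} is the finest layer; it cannot have children")
        child = Node(label, layer)
        parent.children.append(child)
        return child

    def to_text(self) -> str:
        # Flatten the graph to indented text, e.g. for inclusion in an LLM prompt.
        lines: list[str] = []

        def walk(node: Node, depth: int) -> None:
            lines.append("  " * depth + f"{node.layer}: {node.label}")
            for child in node.children:
                walk(child, depth + 1)

        walk(self.root, 0)
        return "\n".join(lines)


# The same class is instantiated for a home-like environment via its spec.
home = OpenSceneGraph(OSGSpec(["floor", "room", "object"]), "ground floor")
kitchen = home.add(home.root, "kitchen")
home.add(kitchen, "red kettle")
print(home.to_text())
```

Under these assumptions, supporting a different environment type would only require a different layer list (for example, one with an aisle layer for a shop), while the graph code stays unchanged.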
