Optimal Scene Graph Planning with Large Language Model Guidance
Zhirui Dai, Arash Asgharivaskasi, Thai Duong, Shusen Lin, Maria-Elizabeth Tzes, George Pappas, Nikolay Atanasov
TL;DR
The paper addresses how to translate natural-language missions into verifiable task specifications over hierarchical scene graphs and how to plan optimally within that structure. It introduces a four-level hierarchical planning domain derived from a scene graph and uses AMRA* with two heuristics, a provably consistent LTL heuristic and an LLM-guided heuristic, to accelerate search while preserving optimality. An LLM maps natural language to co-safe LTL formulas and provides semantic guidance; this is combined with a formal automaton-based check via $\mathcal{M}_{\phi_{\mu}}$ and a multi-heuristic framework that guarantees correctness. Experiments on complex indoor scenes show that LLM guidance significantly speeds up planning and that applying guidance across multiple hierarchy levels yields the best performance, enabling efficient and scalable natural-language task planning in semantic maps.
Abstract
Recent advances in metric, semantic, and topological mapping have equipped autonomous robots with semantic concept grounding capabilities to interpret natural language tasks. This work aims to leverage these new capabilities with an efficient task planning algorithm for hierarchical metric-semantic models. We consider a scene graph representation of the environment and utilize a large language model (LLM) to convert a natural language task into a linear temporal logic (LTL) automaton. Our main contribution is to enable optimal hierarchical LTL planning with LLM guidance over scene graphs. To achieve efficiency, we construct a hierarchical planning domain that captures the attributes and connectivity of the scene graph and the task automaton, and provide semantic guidance via an LLM heuristic function. To guarantee optimality, we design an LTL heuristic function that is provably consistent and supplements the potentially inadmissible LLM guidance in multi-heuristic planning. We demonstrate efficient planning of complex natural language tasks in scene graphs of virtualized real environments.
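To make the idea of planning jointly over a scene graph and a task automaton concrete, the following is a minimal sketch, not the paper's AMRA* implementation. It runs plain A* over the product of a toy room graph and a hand-written automaton for the co-safe task "visit the kitchen, then the bedroom". The room names, edge costs, `dfa_step`, `automaton_heuristic`, and `plan` are all hypothetical, and the LLM heuristic is omitted. The heuristic shown is consistent only under the stated assumption that every edge costs at least `MIN_EDGE_COST` and each move advances the automaton by at most one state.

```python
import heapq

# Hypothetical toy scene: room-level connectivity with traversal costs.
EDGES = {
    "hall": {"kitchen": 2.0, "bedroom": 3.0},
    "kitchen": {"hall": 2.0, "bedroom": 6.0},
    "bedroom": {"hall": 3.0, "kitchen": 6.0},
}
MIN_EDGE_COST = 2.0  # smallest edge cost in EDGES
ACCEPTING = 2        # accepting automaton state


def dfa_step(q, room):
    """Hand-written automaton for 'visit kitchen, then bedroom'.

    States: 0 = nothing done, 1 = kitchen visited, 2 = task satisfied.
    """
    if q == 0 and room == "kitchen":
        return 1
    if q == 1 and room == "bedroom":
        return 2
    return q


def automaton_heuristic(room, q):
    """Lower bound on cost-to-go: each remaining automaton transition
    needs at least one move, and every move costs >= MIN_EDGE_COST."""
    return MIN_EDGE_COST * (ACCEPTING - q)


def plan(start, heuristic=automaton_heuristic):
    """A* over product states (room, automaton state)."""
    q0 = dfa_step(0, start)
    frontier = [(heuristic(start, q0), 0.0, start, q0, [start])]
    best = {(start, q0): 0.0}
    while frontier:
        _, g, room, q, path = heapq.heappop(frontier)
        if q == ACCEPTING:
            return g, path
        if g > best.get((room, q), float("inf")):
            continue  # stale queue entry
        for nxt, cost in EDGES[room].items():
            nq = dfa_step(q, nxt)
            ng = g + cost
            if ng < best.get((nxt, nq), float("inf")):
                best[(nxt, nq)] = ng
                f = ng + heuristic(nxt, nq)
                heapq.heappush(frontier, (f, ng, nxt, nq, path + [nxt]))
    return None


if __name__ == "__main__":
    print(plan("hall"))
```

In this toy scene the cheapest satisfying path returns through the hall rather than taking the direct kitchen-to-bedroom edge, which illustrates why the search must track the automaton state alongside the robot's location. A multi-heuristic planner such as the one described above would additionally consult a second, possibly inadmissible heuristic while keeping a consistent one as the anchor.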
