Optimal Scene Graph Planning with Large Language Model Guidance

Zhirui Dai; Arash Asgharivaskasi; Thai Duong; Shusen Lin; Maria-Elizabeth Tzes; George Pappas; Nikolay Atanasov

TL;DR

The paper addresses translating natural-language missions into verifiable task specifications on hierarchical scene graphs and planning optimally within that structure. It introduces a four-level hierarchical planning domain derived from a scene graph and uses AMRA* with a provably consistent LTL heuristic and an LLM-guided heuristic to accelerate search while maintaining optimality. The approach integrates an LLM to map natural language to co-safe LTL formulas and to provide semantic guidance, together with a formal automaton-based check via $\mathcal{M}_{\phi_{\mu}}$ and a multi-heuristic framework that guarantees correctness. Experimental results on complex indoor scenes show that LLM guidance significantly speeds up planning and that using guidance across multiple hierarchy levels yields the best performance, enabling efficient and scalable natural-language task planning in semantic maps.

Abstract

Recent advances in metric, semantic, and topological mapping have equipped autonomous robots with semantic concept grounding capabilities to interpret natural language tasks. This work aims to leverage these new capabilities with an efficient task planning algorithm for hierarchical metric-semantic models. We consider a scene graph representation of the environment and utilize a large language model (LLM) to convert a natural language task into a linear temporal logic (LTL) automaton. Our main contribution is to enable optimal hierarchical LTL planning with LLM guidance over scene graphs. To achieve efficiency, we construct a hierarchical planning domain that captures the attributes and connectivity of the scene graph and the task automaton, and provide semantic guidance via an LLM heuristic function. To guarantee optimality, we design an LTL heuristic function that is provably consistent and supplements the potentially inadmissible LLM guidance in multi-heuristic planning. We demonstrate efficient planning of complex natural language tasks in scene graphs of virtualized real environments.

Paper Structure (10 sections, 1 theorem, 7 equations, 7 figures, 3 tables)

INTRODUCTION
PROBLEM STATEMENT
NATURAL LANGUAGE TO TEMPORAL LOGIC
OPTIMAL SCENE GRAPH PLANNING
AMRA* Planning
Hierarchical Planning Domain Description
LTL Heuristic
LLM Heuristic
EVALUATION
CONCLUSION

Key Result

Proposition 1

The heuristic function $h_{\textsc{LTL}}: \mathcal{V} \times \mathcal{Q} \rightarrow \mathbb{R}$ defined below is consistent:
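The definition of $h_{\textsc{LTL}}$ itself is not reproduced in this summary. As a reminder of what the proposition asserts, the standard notion of consistency for a heuristic $h$ on the product space $\mathcal{V} \times \mathcal{Q}$ can be written as below. The transition cost $c$ and the set of accepting automaton states $\mathcal{F}$ are notation assumed here for illustration, not taken from the paper.

$$
h(v, q) \le c\big((v, q), (v', q')\big) + h(v', q') \quad \text{for every transition } (v, q) \to (v', q'),
\qquad h(v, q) = 0 \quad \text{for all } q \in \mathcal{F}.
$$

A heuristic that satisfies both conditions is also admissible, meaning it never overestimates the remaining cost, which is the property that lets a heuristic search return optimal paths.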

Figures (7)

Figure 1: Planning a natural language mission, $\mu: \text{"Reach the oven in the kitchen"}$, in a scene graph $\mathcal{G}$ of the Gibson environment Benevolence [xiazamirhe2018gibsonenv] with object, room, and floor attributes. The terms "oven" and "kitchen" in $\mu$ belong to the object and room attributes of the scene graph, respectively. The scene graph $\mathcal{G}$ is described to the LLM using the connectivity of its attributes (attribute hierarchy), and the LLM is used to translate $\mu$ into an LTL formula $\phi_{\mu}$ and an associated automaton $\mathcal{M}_{\phi}$. We construct a hierarchical planning domain from the scene graph and use multi-resolution multi-heuristic planning [saxena2022amra] to plan the mission execution. In addition to mission translation, the LLM provides heuristic guidance to accelerate planning, while an LTL heuristic is used to guarantee optimality.
Figure 2: Natural language to LTL translation. (a) Attribute hierarchy $\Bar{\mathcal{G}}$. The unique IDs are shown in parentheses and the room connections inside red brackets. (b) Unique ID extraction from the natural language mission $\mu$. (c) LTL formula generation from the natural language specification. (d) Syntax and co-safety check of the generated LTL formula $\phi_{\mu}$. (e) Automaton construction.
Figure 3: Four-level hierarchical planning domain for Benevolence.
Figure 4: ChatGPT prompt requesting a scene graph path.
Figure 5: The automaton graph $T$ for the mission "go to the bedroom 2, then visit the kitchen 3, reach the oven 11, and always avoid the TV 9" with an initial node $q_1 = 4$ and an accepting node $0$.
...and 2 more figures

Theorems & Definitions (5)

Definition 1
Definition 2
Definition 3
Proposition 1
proof
