Hierarchical Planning for Complex Tasks with Knowledge Graph-RAG and Symbolic Verification

Cristina Cornelio, Flavio Petruzzellis, Pietro Lio

TL;DR

This work tackles long-horizon robotic planning with LLMs by introducing HVR, a neuro-symbolic framework that fuses Hierarchical planning, Knowledge Graph-based Retrieval-Augmented Generation (KG-RAG), and Symbolic Verification. The system uses an Ontology $O$ and a dynamic Knowledge Graph $G$ to ground plans, extracts task-relevant subgraphs $G'$ via KG-RAG, and generates macro actions (MAs) expanded into atomic action blocks (AAs) through LLM-driven policies with an action mapper. A Symbolic Validator checks preconditions and outcomes in a PDDL-like formalism, enabling plan correction and serving as a real-time failure detector by aligning ideal world states with observed states. Macro actions are stored in a reusable library to support knowledge transfer across agents. Experiments in an AI2Thor kitchen environment show that HVR outperforms baselines across LLM sizes and task complexities, with RAG aiding smaller models, hierarchical planning helping larger models, and symbolic verification improving plan correctness, albeit with some overhead and simulator limitations. This approach offers a robust blueprint for reliable, scalable, and transferable planning in embodied AI systems, particularly in knowledge-rich, long-horizon tasks.

Abstract

Large Language Models (LLMs) have shown promise as robotic planners but often struggle with long-horizon and complex tasks, especially in specialized environments requiring external knowledge. While hierarchical planning and Retrieval-Augmented Generation (RAG) address some of these challenges, they remain insufficient on their own, and a deeper integration is required to achieve more reliable systems. To this end, we propose a neuro-symbolic approach that enhances LLM-based planners with Knowledge Graph-based RAG for hierarchical plan generation. This method decomposes complex tasks into manageable subtasks, which are further expanded into executable atomic action sequences. To ensure formal correctness and proper decomposition, we integrate a Symbolic Validator, which also functions as a failure detector by aligning expected and observed world states. Our evaluation against baseline methods demonstrates the consistent and significant advantages of integrating hierarchical planning, symbolic verification, and RAG across tasks of varying complexity and different LLMs. Additionally, our experimental setup and novel metrics not only validate our approach for complex planning but also serve as a tool for assessing LLMs' reasoning and compositional capabilities.
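
The symbolic check described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a STRIPS-style encoding in which a world state is a set of ground facts and each action carries preconditions, add effects, and delete effects. The `Action` class, the `validate_plan` and `detect_failure` helpers, and the fact strings such as `at(bottle)` are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical PDDL-like action: facts required, added, and removed."""
    name: str
    preconditions: frozenset
    add_effects: frozenset
    del_effects: frozenset = frozenset()


def validate_plan(initial_state, plan):
    """Simulate the plan symbolically.

    Returns (ok, index_of_first_invalid_action, ideal_state_reached).
    """
    state = set(initial_state)
    for index, action in enumerate(plan):
        # An action is applicable only if all its preconditions hold.
        if not action.preconditions <= state:
            return False, index, state
        # Apply effects to obtain the next "ideal" state.
        state = (state - action.del_effects) | action.add_effects
    return True, None, state


def detect_failure(expected_state, observed_state):
    """Facts the ideal state predicts but the observation does not contain."""
    return set(expected_state) - set(observed_state)


# Assumed toy domain, loosely based on the "pick up the bottle" example.
navigate = Action("navigate_to_obj(bottle)", frozenset(),
                  frozenset({"at(bottle)"}))
pick_up = Action("pick_up(bottle)",
                 frozenset({"at(bottle)", "hand_empty"}),
                 frozenset({"holding(bottle)"}),
                 frozenset({"hand_empty"}))

ok, bad_index, ideal_state = validate_plan({"hand_empty"}, [navigate, pick_up])
broken_ok, broken_index, _ = validate_plan({"hand_empty"}, [pick_up])

# Suppose the grasp silently failed: the observation lacks holding(bottle).
missing = detect_failure(ideal_state, {"at(bottle)", "hand_empty"})
```

In this toy run the two-step plan validates, the plan that skips navigation is rejected at its first action, and comparing the ideal state with the observed one flags `holding(bottle)` as the missing fact, which is the kind of mismatch a failure detector would report.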

Paper Structure

This paper contains 27 sections, 3 figures, 16 tables.

Figures (3)

  • Figure 1: Overview of HVR: (1) Given a natural language task description, a pre-trained frozen LLM generates a macro-plan (policy $\varphi$), which is expanded into an AA-block (policy $\pi$). A retrieval-augmented generation (RAG) method retrieves relevant context from the agent knowledge graph (initialized with the environment ontology) to support plan generation, while a Plan Validator detects potential errors and triggers their correction. (2) Once the plan is finalized, the agent executes the atomic actions (AAs) from each AA-block within the environment (see Figure 2), while recording the execution details in the knowledge graph. (3) After each action, a Symbolic Validator verifies the alignment between the "ideal plan" and the actual environment state, potentially detecting failures. The system then reports the performance of the LLM-based planner using a set of novel metrics.
  • Figure 2: An agent executes an expanded plan in a kitchen environment following its sequence of atomic actions $aa_j$. For each action, the robot interACTs with the environment and OBSERVEs the resulting state. Visual observations are captured as a scene graph, representing object relative positions and states, while auditory feedback is processed through classification. This multi-modal outcome corresponds to the Agent's action outcome in Fig. 1.
  • Figure 3: The task "Serve wine" is divided into multiple macro actions (MAs) $(ma^1, \ldots, ma^m)$ such as "Pick up the bottle of wine" and "Pour wine into the cup". Each macro action is decomposed into an AA-block of atomic actions (AAs) $(aa_1^i, \ldots, aa_{k_i}^i)$ such as navigate_to_obj, pick_up, and pour. The final expanded plan (E-plan) is the concatenation of all the AAs from each AA-block.
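
The decomposition in Figure 3 can be sketched in a few lines. This is an illustrative example only: the macro actions and the atomic action names navigate_to_obj, pick_up, and pour come from the caption, while the argument tuples, the `aa_blocks` dictionary, and the `expand_plan` helper are assumptions made here. In HVR the AA-blocks are produced by an LLM policy, not a fixed lookup table.

```python
from itertools import chain

# Macro-plan (ma^1, ..., ma^m) for the task "Serve wine".
macro_plan = [
    "Pick up the bottle of wine",
    "Pour wine into the cup",
]

# Hypothetical AA-blocks: one list of atomic actions per macro action.
# The action arguments are assumed for illustration.
aa_blocks = {
    "Pick up the bottle of wine": [
        ("navigate_to_obj", "bottle"),
        ("pick_up", "bottle"),
    ],
    "Pour wine into the cup": [
        ("navigate_to_obj", "cup"),
        ("pour", "bottle", "cup"),
    ],
}


def expand_plan(macro_plan, aa_blocks):
    """Build the E-plan by concatenating AA-blocks in macro-plan order."""
    return list(chain.from_iterable(aa_blocks[ma] for ma in macro_plan))


e_plan = expand_plan(macro_plan, aa_blocks)
for step, atomic_action in enumerate(e_plan, start=1):
    print(step, atomic_action)
```

Keeping the macro-plan and the AA-blocks as separate structures mirrors the two-level hierarchy: a block can be regenerated or corrected for a single macro action without touching the rest of the plan.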