Hierarchical Planning for Complex Tasks with Knowledge Graph-RAG and Symbolic Verification
Cristina Cornelio, Flavio Petruzzellis, Pietro Lio
TL;DR
This work tackles long-horizon robotic planning with LLMs by introducing HVR, a neuro-symbolic framework that fuses Hierarchical planning, Knowledge Graph-based Retrieval-Augmented Generation (KG-RAG), and Symbolic Verification. The system uses an Ontology $O$ and a dynamic Knowledge Graph $G$ to ground plans, extracts task-relevant subgraphs $G'$ via KG-RAG, and generates macro actions (MAs) expanded into atomic action blocks (AAs) through LLM-driven policies with an action mapper. A Symbolic Validator checks preconditions and outcomes in a PDDL-like formalism, enabling plan correction and serving as a real-time failure detector by aligning ideal world states with observed states. Macro actions are stored in a reusable library to support knowledge transfer across agents. Experiments in an AI2Thor kitchen environment show that HVR outperforms baselines across LLM sizes and task complexities, with RAG aiding smaller models, hierarchical planning helping larger models, and symbolic verification improving plan correctness, albeit with some overhead and simulator limitations. This approach offers a robust blueprint for reliable, scalable, and transferable planning in embodied AI systems, particularly in knowledge-rich, long-horizon tasks.
Abstract
Large Language Models (LLMs) have shown promise as robotic planners but often struggle with long-horizon and complex tasks, especially in specialized environments requiring external knowledge. While hierarchical planning and Retrieval-Augmented Generation (RAG) address some of these challenges, they remain insufficient on their own, and deeper integration is required to achieve more reliable systems. To this end, we propose a neuro-symbolic approach that enhances LLM-based planners with Knowledge Graph-based RAG for hierarchical plan generation. This method decomposes complex tasks into manageable subtasks, which are further expanded into executable atomic action sequences. To ensure formal correctness and proper decomposition, we integrate a Symbolic Validator, which also functions as a failure detector by aligning expected and observed world states. Our evaluation against baseline methods demonstrates consistent and significant advantages of integrating hierarchical planning, symbolic verification, and RAG across tasks of varying complexity and different LLMs. Additionally, our experimental setup and novel metrics not only validate our approach for complex planning but also serve as a tool for assessing LLMs' reasoning and compositional capabilities.
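To make the role of the Symbolic Validator more concrete, the following is a minimal sketch of precondition/effect checking and state-based failure detection. It is not the authors' implementation: the names `Action`, `validate_plan`, and `detect_failure`, the kitchen facts, and the representation of world-state facts as plain strings are all assumptions made for illustration, and a real PDDL-like formalism would use typed, parameterized predicates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical atomic action in a simplified PDDL-like style."""
    name: str
    preconditions: frozenset  # facts that must hold before execution
    add_effects: frozenset    # facts that become true afterwards
    del_effects: frozenset    # facts that stop being true afterwards


def validate_plan(initial_state, plan):
    """Symbolically simulate a plan.

    Returns (ok, failing_index, missing_preconditions, resulting_state).
    On failure, resulting_state is the state just before the failing action.
    """
    state = set(initial_state)
    for index, action in enumerate(plan):
        missing = action.preconditions - state
        if missing:
            return False, index, missing, state
        state = (state - action.del_effects) | action.add_effects
    return True, None, frozenset(), state


def detect_failure(expected_state, observed_state):
    """Return the facts the ideal state predicts but the observation lacks."""
    return set(expected_state) - set(observed_state)


# Illustrative kitchen facts and actions (assumed, not taken from the paper).
open_fridge = Action(
    name="open_fridge",
    preconditions=frozenset({"closed(fridge)"}),
    add_effects=frozenset({"open(fridge)"}),
    del_effects=frozenset({"closed(fridge)"}),
)
pick_egg = Action(
    name="pick_egg",
    preconditions=frozenset({"open(fridge)", "in(egg,fridge)", "hand_empty"}),
    add_effects=frozenset({"holding(egg)"}),
    del_effects=frozenset({"in(egg,fridge)", "hand_empty"}),
)
initial = {"closed(fridge)", "in(egg,fridge)", "hand_empty"}

# A well-formed plan passes; a plan that skips opening the fridge does not.
good = validate_plan(initial, [open_fridge, pick_egg])
bad = validate_plan(initial, [pick_egg])

# Failure detection: the agent observes that the egg was never picked up.
observed = {"open(fridge)", "in(egg,fridge)", "hand_empty"}
mismatch = detect_failure(good[3], observed)

print("good plan valid:", good[0])
print("bad plan missing:", sorted(bad[2]))
print("unmet expectations:", sorted(mismatch))
```

In this sketch, the missing preconditions returned for an invalid plan are the kind of feedback that could be passed back to the planner for correction, while the mismatch between expected and observed facts plays the role of the runtime failure signal.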
