An End-to-end Planning Framework with Agentic LLMs and PDDL
Emanuele La Malfa, Ping Zhu, Samuele Marro, Sara Bernardini, Michael Wooldridge
TL;DR
The paper addresses the challenge of turning ambiguous, natural-language specifications into correct, optimized planning solutions by coupling LLM-driven orchestration with symbolic planning and verification. It introduces an end-to-end agentic framework that dynamically creates and refines multi-agent workflows to produce PDDL domains and problems, which are solved by external planners and checked by validators, with the final plan back-translated into natural language. The authors report accuracy gains on the Google NaturalPlan benchmark, PlanBench, Blocksworld, and the Tower of Hanoi, and the framework supports cost-aware refinements; an open-source implementation and a LangGraph-based parallel variant are also provided. The paper further discusses the limitations of current benchmarks and the brittleness of agentic systems, and outlines directions for scaling, multimodal inputs, and real-world deployment.
Abstract
We present an end-to-end framework for planning supported by verifiers. An orchestrator receives a human specification written in natural language and converts it into a PDDL (Planning Domain Definition Language) model, where the domain and problem are iteratively refined by sub-modules (agents) to address common planning requirements, such as time constraints and optimality, as well as ambiguities and contradictions that may exist in the human specification. The validated domain and problem are then passed to an external planning engine to generate a plan. The orchestrator and agents are powered by Large Language Models (LLMs) and require no human intervention at any stage of the process. Finally, a module translates the final plan back into natural language to improve human readability while maintaining the correctness of each step. We demonstrate the flexibility and effectiveness of our framework across various domains and tasks, including the Google NaturalPlan benchmark and PlanBench, as well as planning problems like Blocksworld and the Tower of Hanoi (where LLMs are known to struggle even with small instances). Our framework can be integrated with any PDDL planning engine and validator (such as Fast Downward, LPG, POPF, VAL, and uVAL, which we have tested) and represents a significant step toward end-to-end planning aided by LLMs.
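The control flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `PddlModel`, `plan_end_to_end`, the callable signatures, and the `max_rounds` retry budget are all hypothetical names introduced here. In a real system the `draft`, `agents`, and `translate` callables would wrap LLM calls, while `planner` and `validator` would wrap external tools such as Fast Downward and VAL.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class PddlModel:
    """A PDDL domain/problem pair held as plain text (hypothetical container)."""
    domain: str
    problem: str


# Hypothetical signatures for the pluggable components.
Draft = Callable[[str], PddlModel]                  # spec -> first PDDL model
Agent = Callable[[str, PddlModel], PddlModel]       # spec, model -> refined model
Planner = Callable[[PddlModel], Optional[List[str]]]  # model -> plan, or None
Validator = Callable[[PddlModel, List[str]], bool]  # model, plan -> is it valid?
Translate = Callable[[List[str]], str]              # plan -> natural language


def plan_end_to_end(
    spec: str,
    draft: Draft,
    agents: List[Agent],
    planner: Planner,
    validator: Validator,
    translate: Translate,
    max_rounds: int = 3,  # assumed retry budget, not taken from the paper
) -> Optional[str]:
    """Draft a PDDL model, refine it, plan, validate, and back-translate."""
    model = draft(spec)
    for _ in range(max_rounds):
        # Each agent addresses one concern (e.g. time constraints, optimality).
        for agent in agents:
            model = agent(spec, model)
        plan = planner(model)
        # Only a plan accepted by the validator is translated and returned.
        if plan is not None and validator(model, plan):
            return translate(plan)
    return None  # no validated plan within the budget
```

Keeping every component behind a plain callable mirrors the abstract's claim that any PDDL planner or validator can be plugged in; the loop itself never depends on a specific tool.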
