An End-to-end Planning Framework with Agentic LLMs and PDDL

Emanuele La Malfa, Ping Zhu, Samuele Marro, Sara Bernardini, Michael Wooldridge

TL;DR

The paper addresses the challenge of turning ambiguous, natural-language specifications into correct, optimized planning solutions by coupling LLM-driven orchestration with symbolic planning and verification. It introduces an end-to-end agentic framework that dynamically creates and refines multi-agent workflows to produce PDDL domains and problems, which are solved by external planners and checked by validators, with the final plan back-translated into natural language. The approach yields significant accuracy gains across benchmarks (Google Natural Plan, PlanBench, Blocksworld, Tower of Hanoi) and provides cost-aware refinements, while also offering an open-source implementation and a LangGraph-based parallel variant. It also discusses benchmarking limitations and brittleness of agentic systems, outlining directions for scaling, multimodal inputs, and real-world deployment.

Abstract

We present an end-to-end framework for planning supported by verifiers. An orchestrator receives a human specification written in natural language and converts it into a PDDL (Planning Domain Definition Language) model, where the domain and problem are iteratively refined by sub-modules (agents) to address common planning requirements, such as time constraints and optimality, as well as ambiguities and contradictions that may exist in the human specification. The validated domain and problem are then passed to an external planning engine to generate a plan. The orchestrator and agents are powered by Large Language Models (LLMs) and require no human intervention at any stage of the process. Finally, a module translates the final plan back into natural language to improve human readability while maintaining the correctness of each step. We demonstrate the flexibility and effectiveness of our framework across various domains and tasks, including the Google NaturalPlan benchmark and PlanBench, as well as planning problems like Blocksworld and the Tower of Hanoi (where LLMs are known to struggle even with small instances). Our framework can be integrated with any PDDL planning engine and validator (such as Fast Downward, LPG, POPF, VAL, and uVAL, which we have tested) and represents a significant step toward end-to-end planning aided by LLMs.
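The generate, solve, validate, and refine cycle described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: every name here (`PDDLModel`, `Feedback`, `plan_with_refinement`, and the `toy_*` stubs) is hypothetical, and the stubs stand in for the LLM-backed agents and for an external planner and validator such as Fast Downward and VAL.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PDDLModel:
    domain: str   # PDDL domain text
    problem: str  # PDDL problem text


@dataclass
class Feedback:
    plan: Optional[str]  # plan text, or None if the planner found no plan
    valid: bool          # verdict of the validator
    message: str         # diagnostic handed to the refining agent


def plan_with_refinement(
    spec: str,
    generate: Callable[[str], PDDLModel],
    solve_and_validate: Callable[[PDDLModel], Feedback],
    refine: Callable[[PDDLModel, Feedback], PDDLModel],
    explain: Callable[[str], str],
    max_rounds: int = 3,
) -> Optional[str]:
    """Return a natural-language plan, or None if no valid plan is found."""
    model = generate(spec)
    for _ in range(max_rounds):
        feedback = solve_and_validate(model)
        if feedback.plan is not None and feedback.valid:
            # Back-translate the validated plan for human readers.
            return explain(feedback.plan)
        # Otherwise let an agent repair the model using the diagnostics.
        model = refine(model, feedback)
    return None


# Toy stand-ins (assumptions): a real system would call an LLM and a planner.
def toy_generate(spec: str) -> PDDLModel:
    return PDDLModel("(define (domain toy))", "(define (problem p))")


def toy_solve_and_validate(model: PDDLModel) -> Feedback:
    if "(:goal" not in model.problem:
        return Feedback(None, False, "problem has no goal")
    return Feedback("(move a b)", True, "ok")


def toy_refine(model: PDDLModel, feedback: Feedback) -> PDDLModel:
    # Insert a goal before the final closing parenthesis of the problem.
    return PDDLModel(model.domain, model.problem[:-1] + " (:goal (on a b)))")


def toy_explain(plan: str) -> str:
    return "Steps: " + plan


result = plan_with_refinement(
    "stack a on b", toy_generate, toy_solve_and_validate, toy_refine, toy_explain
)
print(result)
```

Passing the agents in as plain callables keeps the loop independent of any particular LLM, planner, or validator, which mirrors the abstract's claim that the framework can be integrated with any PDDL planning engine and validator. The `max_rounds` bound is an assumption added here so that the loop always terminates.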

Paper Structure

This paper contains 22 sections, 10 figures, 1 table.

Figures (10)

  • Figure 1: Overview of our end-to-end planning framework graph that generates a plan, backed up by a planner, from a human specification (left). The framework includes the initial JSON and PDDL plan generator and the agentic sub-component, which refines an existing plan (right).
  • Figure 2: Overview of our planning framework illustrating how the agentic components interact with each other. The "PDDL Generator" produces the JSON representation of the problem and the first PDDL domain, problem, and plan. The "Refiner" solves the outstanding issues in the domain, problem, and plan, informed by the output of the PDDL solver and verifier. The "Clarifier" block, which requests human intervention if any part of the specification is unclear, can be removed seamlessly and appears only in the LangGraph implementation (all the experiments were carried out without this module or any human intervention).
  • Figure 3: Results for GPT-5-mini on the Google Natural Plan Benchmark and PlanBench, for $30$ problems from each benchmark.
  • Figure 4: Results for GPT-5-mini on increasingly difficult Blocksworld and Tower of Hanoi problems. For Blocksworld, "Easy", "Medium", and "Hard" correspond to problems where the optimal solution, which we enforce, consists of $2$ to $4$, $6$ to $8$, and $10$ to $12$ actions, respectively.
  • Figure 5: Frequency of each agent for the Google Natural Plan Benchmark and PlanBench.
  • ...and 5 more figures