LLMs as Planning Formalizers: A Survey for Leveraging Large Language Models to Construct Automated Planning Models

Marcus Tantakoun, Xiaodan Zhu, Christian Muise

TL;DR

Large Language Models struggle with long-horizon planning when used end-to-end, motivating a neuro-symbolic approach in which LLMs are used to build planning models rather than directly producing executable plans. The paper surveys the landscape and proposes a three-part Model Generation taxonomy (Task, Domain, Hybrid) plus Model Editing and Model Benchmarks, centering on LLMs-as-Formalizers. It analyzes roughly 80 works and introduces L2P, an open-source library that standardizes NL-to-PDDL extraction, refinement, and validation to enable planner execution. The work emphasizes explainability, verification, and modular pipelines that combine LLMs with classical AP tools, outlining concrete benchmarks and future directions to make LLM-assisted planning robust and scalable across domains.

Abstract

Large Language Models (LLMs) excel in various natural language tasks but often struggle with long-horizon planning problems requiring structured reasoning. This limitation has drawn interest in integrating neuro-symbolic approaches within the Automated Planning (AP) and Natural Language Processing (NLP) communities. However, identifying optimal AP deployment frameworks can be daunting and introduces new challenges. This paper aims to provide a timely survey of the current research with an in-depth analysis, positioning LLMs as tools for formalizing and refining planning specifications to support reliable off-the-shelf AP planners. By systematically reviewing the current state of research, we highlight methodologies and identify critical challenges and future directions, hoping to contribute to the joint research on NLP and Automated Planning.

Paper Structure

This paper contains 24 sections, 8 figures.

Figures (8)

  • Figure 1: Distinction of planning using LLMs: (a) LLM-as-Planner uses LLMs for direct I/O planning; (b) LLM-as-Formalizer generates planning specifications (e.g., PDDL) for existing task planning methods.
  • Figure 2: Taxonomy of research in LLM Planning Model Specification
  • Figure 3: A shortened L2P reconstruction of the 'action-by-action algorithm' (Guan et al., 2023), which iteratively generates PDDL actions while updating a dynamic predicate list. The output is shown in a separate figure (label: fig:aba-output).
  • Figure 4: Exp. = Explicit PDDL info. Feedback/Human Intervention provided at the level of the LLM-generated PDDL spec. itself. $^{\star}$Papers reconstructed by L2P.
  • Figure 5: L2P usage - generating simple PDDL predicates
  • ...and 3 more figures
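
The action-by-action loop named in the Figure 3 caption can be illustrated with a minimal sketch. This is not the L2P API and not the authors' implementation: `ActionDraft`, `stub_formalizer`, `action_by_action`, and the Blocksworld-style strings are all hypothetical names introduced here, and the LLM call is replaced by a canned lookup so the example runs offline. It shows only the control flow the caption describes: formalize one action at a time, pass the predicates known so far into each call, and merge any newly proposed predicates into a growing list.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ActionDraft:
    """One formalized action plus the predicates it relies on (hypothetical type)."""
    pddl: str
    new_predicates: list[str] = field(default_factory=list)


# Natural-language action descriptions (illustrative, not from the paper).
DESCRIPTIONS = {
    "pick-up": "The arm picks up a clear block while its hand is empty.",
    "put-down": "The arm puts the held block down on the table.",
}


def stub_formalizer(name: str, description: str, known: list[str]) -> ActionDraft:
    """Stand-in for an LLM call (assumption). A real prompt would include the
    description and the known predicates; here we return canned output."""
    canned = {
        "pick-up": ActionDraft(
            pddl="(:action pick-up :parameters (?b) "
                 ":precondition (and (clear ?b) (handempty)) "
                 ":effect (and (holding ?b) (not (handempty))))",
            new_predicates=["(clear ?b)", "(handempty)", "(holding ?b)"],
        ),
        "put-down": ActionDraft(
            pddl="(:action put-down :parameters (?b) "
                 ":precondition (holding ?b) "
                 ":effect (and (ontable ?b) (clear ?b) (handempty) "
                 "(not (holding ?b))))",
            new_predicates=["(holding ?b)", "(ontable ?b)", "(clear ?b)", "(handempty)"],
        ),
    }
    return canned[name]


def action_by_action(
    descriptions: dict[str, str],
    formalize: Callable[[str, str, list[str]], ActionDraft],
) -> tuple[list[str], list[str]]:
    """Generate PDDL actions one by one while growing a shared predicate list."""
    predicates: list[str] = []
    pddl_actions: list[str] = []
    for name, description in descriptions.items():
        # Pass a copy so the formalizer cannot mutate the shared list.
        draft = formalize(name, description, list(predicates))
        pddl_actions.append(draft.pddl)
        for predicate in draft.new_predicates:
            if predicate not in predicates:  # keep the list duplicate-free
                predicates.append(predicate)
    return pddl_actions, predicates


if __name__ == "__main__":
    actions, predicates = action_by_action(DESCRIPTIONS, stub_formalizer)
    print("\n".join(actions))
    print(predicates)
```

In a real pipeline the stub would be swapped for a model call, and the resulting domain would typically be checked by a PDDL validator before being handed to a planner; neither step is shown here.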