
Generating consistent PDDL domains with Large Language Models

Pavel Smirnov, Frank Joublin, Antonello Ceravola, Michael Gienger

TL;DR

The paper addresses the challenge of generating runnable PDDL domains from natural language by embedding an automated feedback loop that combines textual plan generation, JSON-based domain markup, syntactic and semantic consistency checks, and reachability analysis. The approach mitigates common LLM errors through iterative back-and-forth correction and plan-based validation, although absolute correctness is not guaranteed. Experiments across Gripper, Pizza, Logistics, Household, and Tyreworld show that the method reduces errors and eases human-in-the-loop correction: simpler domains more reliably yield executable models, while complex domains remain challenging. This work advances practical LLM-assisted task planning by providing a structured pipeline that steers LLM output toward executable planning domains and makes human verification more efficient.
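To make the idea of JSON-based domain markup with consistency checks concrete, here is a minimal sketch. The JSON schema (predicate arities, parameter lists, add/delete effect lists) is a hypothetical format chosen for illustration, not the one used in the paper, and `check_domain` covers only a few structural checks: every predicate is declared, arities match, and every variable is bound by the action's parameters. The `drop` action contains seeded errors of the kind an LLM might produce.

```python
import json

# Hypothetical JSON markup for a Gripper-like domain (assumed schema, not the
# paper's format). The "drop" action contains deliberately seeded errors.
DOMAIN_JSON = """
{
  "predicates": {"at-robby": 1, "at": 2, "free": 1, "carry": 2},
  "actions": [
    {"name": "pick",
     "parameters": ["?obj", "?room", "?gripper"],
     "preconditions": [["at", "?obj", "?room"], ["at-robby", "?room"], ["free", "?gripper"]],
     "add_effects": [["carry", "?obj", "?gripper"]],
     "del_effects": [["at", "?obj", "?room"], ["free", "?gripper"]]},
    {"name": "drop",
     "parameters": ["?obj", "?gripper"],
     "preconditions": [["holding", "?obj", "?gripper"], ["at-robby", "?room"]],
     "add_effects": [["at", "?obj", "?room"], ["free", "?gripper"]],
     "del_effects": [["carry", "?obj"]]}
  ]
}
"""

SECTIONS = ("preconditions", "add_effects", "del_effects")


def check_domain(domain: dict) -> list[str]:
    """Return human-readable errors that could be fed back to the LLM."""
    predicates = domain["predicates"]
    errors = []
    for action in domain["actions"]:
        name, params = action["name"], set(action["parameters"])
        for section in SECTIONS:
            for atom in action.get(section, []):
                pred, args = atom[0], atom[1:]
                # Every predicate must be declared in the domain header.
                if pred not in predicates:
                    errors.append(f"{name}: {section} uses undeclared predicate '{pred}'")
                    continue
                # The number of arguments must match the declared arity.
                if len(args) != predicates[pred]:
                    errors.append(f"{name}: '{pred}' expects {predicates[pred]} argument(s), got {len(args)}")
                # Variables must be bound by the action's parameter list.
                for arg in args:
                    if arg.startswith("?") and arg not in params:
                        errors.append(f"{name}: variable '{arg}' in {section} is not a parameter")
    return errors


for err in check_domain(json.loads(DOMAIN_JSON)):
    print(err)
```

Here `pick` passes, while `drop` yields four errors: an undeclared predicate `holding`, an unbound `?room` in its preconditions and in its add effects, and a `carry` atom with the wrong arity. Emitting errors as plain sentences keeps them directly usable as feedback in a follow-up prompt.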

Abstract

Large Language Models (LLMs) are capable of transforming natural language domain descriptions into plausible-looking PDDL markup. However, ensuring that actions are consistent within domains remains a challenging task. In this paper we present a novel concept to significantly improve the quality of LLM-generated PDDL models by performing automated consistency checking during the generation process. Although the proposed consistency checking strategies cannot guarantee the absolute correctness of generated models, they can serve as a valuable source of feedback, reducing the correction effort expected from a human in the loop. We demonstrate the capabilities of our error detection approach on a number of classical and custom planning domains (logistics, gripper, tyreworld, household, pizza).
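The generate-check-correct cycle described above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `llm` is any callable mapping a prompt to text, `check` returns a list of error messages, and `paren_check` and `fake_llm` are hypothetical stand-ins for a real syntactic checker and a real model. Errors that survive the retry budget are returned rather than hidden, reflecting the point that automated checking reduces, but does not replace, human correction.

```python
from typing import Callable


def generate_with_feedback(
    prompt: str,
    llm: Callable[[str], str],
    check: Callable[[str], list[str]],
    max_rounds: int = 3,
) -> tuple[str, list[str]]:
    """Query the model, then re-prompt with detected errors until the checker
    reports none or the retry budget runs out. Remaining errors are returned
    so a human in the loop knows where to look."""
    candidate = llm(prompt)
    errors = check(candidate)
    for _ in range(max_rounds):
        if not errors:
            break
        feedback = "\n".join(f"- {e}" for e in errors)
        candidate = llm(
            f"{prompt}\n\nPrevious answer:\n{candidate}\n\nFix these errors:\n{feedback}"
        )
        errors = check(candidate)
    return candidate, errors


def paren_check(text: str) -> list[str]:
    """Toy syntactic check (hypothetical): parentheses must balance."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return ["unmatched ')'"]
    return [f"{depth} unclosed '('"] if depth else []


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model: the first answer is truncated, the retry is fixed."""
    if "Fix these errors" in prompt:
        return "(define (domain gripper))"
    return "(define (domain gripper)"


domain, remaining = generate_with_feedback("Write a Gripper domain in PDDL.", fake_llm, paren_check)
print(domain, remaining)
```

With the stub model, the first answer leaves one parenthesis open, the checker reports it, and the second answer passes, so `remaining` is empty. In a real pipeline, `check` would chain several checks (syntax, structural consistency, reachability) and concatenate their messages.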

Paper Structure

This paper contains 11 sections, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Main generation pipeline
  • Figure 2: Reachability analysis pipeline (step 4)
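Figure 2 describes reachability analysis as step 4 of the pipeline. Its details are not reproduced here, so the following is a minimal sketch of one standard way to perform such a check, a delete-relaxed fixpoint over ground actions; it is not necessarily the authors' procedure. `GroundAction`, `relaxed_reachability` and the toy atoms are hypothetical. Because ignoring delete effects can only enlarge the set of reachable facts, any goal atom or action precondition that stays unreachable in the relaxed problem is unreachable in the real one too, which makes it a reliable signal of a defect such as a forgotten effect.

```python
from typing import NamedTuple


class GroundAction(NamedTuple):
    name: str
    pre: frozenset  # ground precondition atoms
    add: frozenset  # ground add effects (delete effects are ignored)


def relaxed_reachability(init, actions, goal):
    """Delete-relaxed fixpoint: keep firing every action whose preconditions
    are already reachable. Returns (unreachable goal atoms, never-firing actions)."""
    reached = set(init)
    fired = set()
    changed = True
    while changed:
        changed = False
        for a in actions:
            if a.name not in fired and a.pre <= reached:
                fired.add(a.name)
                reached |= a.add
                changed = True
    unreachable_goals = set(goal) - reached
    dead_actions = {a.name for a in actions} - fired
    return unreachable_goals, dead_actions


# Toy, hand-written fragment loosely modelled on Tyreworld (hypothetical atoms).
# The "generated" domain forgot an action that achieves jack-up.
actions = [
    GroundAction("open-boot", frozenset({"boot-closed"}), frozenset({"boot-open"})),
    GroundAction("fetch-wrench", frozenset({"boot-open"}), frozenset({"have-wrench"})),
    GroundAction("loosen-nut", frozenset({"have-wrench", "jack-up"}), frozenset({"nut-loose"})),
]
print(relaxed_reachability({"boot-closed"}, actions, {"nut-loose"}))
```

The check reports `nut-loose` as unreachable and `loosen-nut` as an action that can never fire, pointing directly at the missing `jack-up` effect. The converse does not hold: passing the relaxed check does not prove that a real plan exists, which is consistent with the caveat that absolute correctness is not guaranteed.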