Generating consistent PDDL domains with Large Language Models
Pavel Smirnov, Frank Joublin, Antonello Ceravola, Michael Gienger
TL;DR
The paper addresses the challenge of generating runnable PDDL domains from natural language by embedding an automated feedback loop that combines textual plan generation, JSON-based domain markup, syntactic and semantic consistency checks, and reachability analysis. The approach mitigates common LLM errors through iterative correction and plan-based validation, though absolute correctness is not guaranteed. Experiments across Gripper, Pizza, Logistics, Household, and Tyreworld show that the method reduces errors and eases human-in-the-loop correction; simpler domains are more reliably executable, while complex domains remain challenging. This work advances practical LLM-assisted task planning by providing a structured pipeline that makes LLM output more likely to be executable by a planner and cheaper for a human to verify.
Abstract
Large Language Models (LLMs) are capable of transforming natural language domain descriptions into plausible-looking PDDL markup. However, ensuring that actions are consistent within a domain remains a challenging task. In this paper, we present a novel concept to significantly improve the quality of LLM-generated PDDL models by performing automated consistency checking during the generation process. Although the proposed consistency checking strategies cannot guarantee absolute correctness of the generated models, they can serve as a valuable source of feedback, reducing the amount of correction effort expected from a human in the loop. We demonstrate the capabilities of our error detection approach on a number of classical and custom planning domains (logistics, gripper, tyreworld, household, pizza).
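To make the idea of automated consistency checking concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical JSON-style domain markup with the fields `predicates`, `pre`, `add`, and `del`, and it works on predicate names only (no parameters or grounding). It illustrates two kinds of checks: whether every predicate an action uses is declared, and whether every action's preconditions can be reached from an initial set of predicates when delete effects are ignored.

```python
from typing import Dict, Set

# Hypothetical JSON-style markup of a Gripper-like domain.
# The field names ("predicates", "pre", "add", "del") are assumptions.
domain = {
    "predicates": ["at-robby", "carry", "free", "at"],
    "actions": {
        "move": {"pre": ["at-robby"], "add": ["at-robby"], "del": ["at-robby"]},
        "pick": {"pre": ["at", "at-robby", "free"], "add": ["carry"], "del": ["at", "free"]},
        "drop": {"pre": ["carry", "at-robby"], "add": ["at", "free"], "del": ["carry"]},
    },
}


def undeclared_predicates(domain: dict) -> Dict[str, Set[str]]:
    """Map each action name to the predicates it uses but the domain never declares."""
    declared = set(domain["predicates"])
    issues: Dict[str, Set[str]] = {}
    for name, action in domain["actions"].items():
        used = set(action["pre"]) | set(action["add"]) | set(action["del"])
        missing = used - declared
        if missing:
            issues[name] = missing
    return issues


def reachable_predicates(domain: dict, initial: Set[str]) -> Set[str]:
    """Fixpoint over predicate names, ignoring delete effects (a relaxation)."""
    reached = set(initial)
    changed = True
    while changed:
        changed = False
        for action in domain["actions"].values():
            adds = set(action["add"])
            if set(action["pre"]) <= reached and not adds <= reached:
                reached |= adds
                changed = True
    return reached


def unreachable_actions(domain: dict, initial: Set[str]) -> list:
    """Actions whose preconditions are never satisfied under the relaxation."""
    reached = reachable_predicates(domain, initial)
    return sorted(
        name
        for name, action in domain["actions"].items()
        if not set(action["pre"]) <= reached
    )


# Feedback that could be returned to the LLM for another correction round.
print(undeclared_predicates(domain))
print(unreachable_actions(domain, {"at-robby", "free", "at"}))
```

Because the reachability check ignores delete effects and parameters, an action it reports as unreachable really cannot fire from the given initial predicates, but an action it accepts may still be unusable in a concrete problem. That matches the abstract's caveat that such checks reduce correction effort without guaranteeing correctness.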
