PROC2PDDL: Open-Domain Planning Representations from Texts
Tianyi Zhang, Li Zhang, Zhaoyi Hou, Ziyu Wang, Yuling Gu, Peter Clark, Chris Callison-Burch, Niket Tandon
TL;DR
This work presents Proc2PDDL, the first open-domain dataset that pairs procedural natural-language texts with PDDL representations, enabling evaluation of text-to-planning in diverse domains. It formulates action modeling as predicting a domain file $\mathbb{DF}$ from a text $\mathbb{T}$ and a header $H$, and evaluates predictions both intrinsically (accuracy of the generated domain definition) and extrinsically (plan solvability, checked with a BFS-based PDDL planner). A Zone of Proximal Development (ZPD) prompting strategy, which breaks the task into Extraction, Inference, and Translation, improves LM performance, yet GPT-4-level models still struggle to generate domain actions accurately or to reliably solve the problems given in the problem files ($\mathbb{PF}$s). The dataset uses wikiHow procedures to test open-domain transfer and highlights both syntactic and semantic errors as key bottlenecks for current LMs in symbolic planning. Overall, Proc2PDDL and the ZPD methodology offer a path toward integrating language understanding with formal planning, motivating further research in LM-driven open-domain planning.
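To make the extrinsic evaluation concrete, the following is a minimal sketch of breadth-first planning over actions defined by preconditions and effects. It is not the planner used in this work: it assumes already-grounded, propositional actions rather than parameterized PDDL, and the `Action` type, the `bfs_plan` function, and the toy campfire domain are all hypothetical names chosen for illustration.

```python
from collections import deque
from typing import NamedTuple


class Action(NamedTuple):
    """A grounded STRIPS-style action (hypothetical simplification of PDDL)."""
    name: str
    preconditions: frozenset  # facts that must hold before the action applies
    add_effects: frozenset    # facts the action makes true
    del_effects: frozenset    # facts the action makes false


def bfs_plan(initial, goal, actions):
    """Return a shortest list of action names that reaches `goal`, or None."""
    start = frozenset(initial)
    goal = frozenset(goal)
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        state, plan = queue.popleft()
        if goal <= state:  # every goal fact holds in this state
            return plan
        for action in actions:
            if action.preconditions <= state:
                next_state = (state - action.del_effects) | action.add_effects
                if next_state not in visited:
                    visited.add(next_state)
                    queue.append((next_state, plan + [action.name]))
    return None  # search space exhausted: the problem is unsolvable


# Toy domain, invented for illustration only.
actions = [
    Action("gather_wood", frozenset(), frozenset({"has_wood"}), frozenset()),
    Action("light_fire", frozenset({"has_wood"}),
           frozenset({"fire_lit"}), frozenset({"has_wood"})),
]
plan = bfs_plan({"at_camp"}, {"fire_lit"}, actions)
print(plan)
```

Under this simplification, a predicted domain whose actions have wrong or missing preconditions and effects can leave the goal unreachable, in which case the search returns `None`; that is the kind of failure a solvability check detects.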
Abstract
Planning in a text-based environment continues to be a major challenge for AI systems. Recent approaches have used language models to predict a planning domain definition (e.g., PDDL) but have only been evaluated in closed-domain simulated environments. To address this, we present Proc2PDDL, the first dataset containing open-domain procedural texts paired with expert-annotated PDDL representations. Using this dataset, we evaluate state-of-the-art models on defining the preconditions and effects of actions. We show that Proc2PDDL is highly challenging, with GPT-3.5's success rate close to 0% and GPT-4's around 35%. Our analysis shows both syntactic and semantic errors, indicating LMs' deficiency in both generating domain-specific programs and reasoning about events. We hope this analysis and dataset help future progress towards integrating the best of LMs and formal planning.
