PDDLEGO: Iterative Planning in Textual Environments
Li Zhang, Peter Jansen, Tianyi Zhang, Peter Clark, Chris Callison-Burch, Niket Tandon
TL;DR
The paper tackles planning under partial observability in textual environments, where end-to-end LLM planning struggles because the agent's knowledge of the environment is incomplete. It introduces PDDLEGO, a neurosymbolic framework that iteratively builds a PDDL representation during exploration using two LLM-based modes. PDDL-gen generates a complete problem file (PF). PDDL-edit applies constrained edits to the current PF. When the end-goal cannot yet be planned for, planning is guided by sub-goals. Empirically, PDDLEGO improves planning efficiency and success rate over end-to-end action generation on two text-game benchmarks (Coin Collector and Cooking World). It achieves a 43% efficiency gain on Coin Collector and up to 98% success on the easy setting of Cooking World, and it remains substantially robust on the harder variants. The approach also improves interpretability and correctability, because it reduces planning to a PF that a symbolic planner solves deterministically. The costs are slower PDDL generation and the need for annotated domain files and sub-goal structures.
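The difference between regenerating a PF (PDDL-gen) and editing it (PDDL-edit) can be illustrated with a minimal sketch. It assumes the PF is held as structured fields and that an edit may only add objects and add or delete `:init` facts, leaving the domain and goal untouched. `ProblemFile`, `pddl_edit`, the domain name `coin-collector`, and the predicates `at` and `connected` are all hypothetical and are not taken from PDDLEGO's code or domain files. In this sketch, PDDL-gen would correspond to building a fresh `ProblemFile` from scratch at every step.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class ProblemFile:
    """Hypothetical structured view of a PDDL problem file (PF)."""
    name: str
    domain: str
    objects: dict[str, str] = field(default_factory=dict)  # object name -> PDDL type
    init: set[str] = field(default_factory=set)            # ground facts, e.g. "(at kitchen)"
    goal: str = "(and)"

    def to_pddl(self) -> str:
        """Render the PF as PDDL text that a symbolic planner could consume."""
        objs = " ".join(f"{o} - {t}" for o, t in sorted(self.objects.items()))
        facts = "\n    ".join(sorted(self.init))
        return (
            f"(define (problem {self.name})\n"
            f"  (:domain {self.domain})\n"
            f"  (:objects {objs})\n"
            f"  (:init\n    {facts})\n"
            f"  (:goal {self.goal}))"
        )


def pddl_edit(
    pf: ProblemFile,
    add_objects: Optional[dict[str, str]] = None,
    add_facts: Iterable[str] = (),
    del_facts: Iterable[str] = (),
) -> ProblemFile:
    """Constrained edit (assumed scope): only :objects and :init change.
    Domain and goal are copied unchanged; the input PF is not mutated."""
    objects = {**pf.objects, **(add_objects or {})}
    init = (pf.init - set(del_facts)) | set(add_facts)
    return ProblemFile(pf.name, pf.domain, objects, init, pf.goal)


# Hypothetical episode: the agent starts in the kitchen...
pf0 = ProblemFile("coin", "coin-collector", {"kitchen": "location"}, {"(at kitchen)"}, "(at hallway)")
# ...observes an exit to a hallway (the edit adds one object and one fact)...
pf1 = pddl_edit(pf0, add_objects={"hallway": "location"}, add_facts=["(connected kitchen hallway)"])
# ...then moves there (the edit swaps the agent's location fact).
pf2 = pddl_edit(pf1, add_facts=["(at hallway)"], del_facts=["(at kitchen)"])
print(pf2.to_pddl())
```

Restricting edits to a small set of additions and deletions means each update only has to state what the latest observation changed. In contrast, a full regeneration must reproduce every previously known fact correctly.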
Abstract
Planning in textual environments has been shown to be a long-standing challenge even for current models. A recent, promising line of work uses LLMs to generate a formal representation of the environment that can be solved by a symbolic planner. However, existing methods rely on a fully-observed environment where all entity states are initially known, so a one-off representation can be constructed, leading to a complete plan. In contrast, we tackle partially-observed environments where there is initially insufficient information to plan for the end-goal. We propose PDDLEGO, which iteratively constructs a planning representation that can lead to a partial plan for a given sub-goal. By accomplishing the sub-goal, more information is acquired to augment the representation, eventually achieving the end-goal. We show that plans produced by few-shot PDDLEGO are 43% more efficient than generating plans end-to-end on the Coin Collector simulation, with strong performance (98%) on the more complex Cooking World simulation where end-to-end LLMs fail to generate coherent plans (4%).
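The iterative loop described above can be sketched as a toy program. The agent plans for the end-goal if the current representation allows it, and otherwise plans for a sub-goal. It then executes that partial plan and augments the representation with what it observes. The sketch makes several loud simplifications. A breadth-first search stands in for the symbolic planner, and direct map updates stand in for LLM-produced PDDL. The world is a hypothetical room graph, not the real Coin Collector, with a named target room instead of a coin. The sub-goal "reach the nearest unvisited room" is likewise an assumption, and all function names are illustrative.

```python
from collections import deque

# Hypothetical toy world (not the real Coin Collector): room -> exits.
# The agent only learns a room's exits after entering it.
WORLD = {
    "kitchen": ["hallway", "pantry"],
    "pantry": ["kitchen"],
    "hallway": ["kitchen", "bedroom"],
    "bedroom": ["hallway", "garden"],
    "garden": ["bedroom"],
}


def plan(known_map, start, is_goal):
    """Stand-in for a symbolic planner: BFS over the *known* map only.
    Returns the rooms to move through, or None if no plan exists."""
    frontier, seen = deque([[start]]), {start}
    while frontier:
        path = frontier.popleft()
        if is_goal(path[-1]):
            return path[1:]
        for nxt in sorted(known_map.get(path[-1], [])):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(path + [nxt])
    return None


def iterative_planning(start, target, max_rounds=10):
    known_map, visited, trace = {}, set(), [start]

    def observe(room):
        # Augment the representation with what entering `room` reveals.
        visited.add(room)
        known_map[room] = list(WORLD[room])
        for exit_room in WORLD[room]:
            known_map.setdefault(exit_room, [])

    observe(start)
    pos = start
    for _ in range(max_rounds):
        if pos == target:
            return trace
        # Try the end-goal first; if it is unsolvable, fall back to a sub-goal.
        steps = plan(known_map, pos, lambda r: r == target)
        if steps is None:
            steps = plan(known_map, pos, lambda r: r not in visited)
        if steps is None:
            return None  # nothing reachable is left to explore
        # Execute the (partial) plan, observing at every step.
        for room in steps:
            pos = room
            trace.append(room)
            observe(room)
    return None


print(iterative_planning("kitchen", "garden"))
# prints ['kitchen', 'hallway', 'bedroom', 'garden']
```

The end-goal is tried before exploring, so as soon as `garden` appears in the known map, the agent heads there directly and never detours into the pantry.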
