Leveraging Environment Interaction for Automated PDDL Translation and Planning with Large Language Models

Sadegh Mahdavi; Raquel Aoki; Keyi Tang; Yanshuai Cao

TL;DR

This work enables the automated modeling of planning environments using LLMs and environment feedback, eliminating the need for human intervention in the PDDL translation process and paving the way for more reliable LLM agents in challenging problems.

Abstract

Large Language Models (LLMs) have shown remarkable performance in various natural language tasks, but they often struggle with planning problems that require structured reasoning. To address this limitation, the conversion of planning problems into the Planning Domain Definition Language (PDDL) has been proposed as a potential solution, enabling the use of automated planners. However, generating accurate PDDL files typically requires human input or correction, which can be time-consuming and costly. In this paper, we propose a novel approach that leverages LLMs and environment feedback to automatically generate PDDL domain and problem description files without the need for human intervention. Our method introduces an iterative refinement process that generates multiple problem PDDL candidates and progressively refines the domain PDDL based on feedback obtained from interacting with the environment. To guide the refinement process, we develop an Exploration Walk (EW) metric, which provides rich feedback signals for LLMs to update the PDDL file. We evaluate our approach on 10 PDDL environments and achieve an average task solve rate of 66%, compared to 29% for GPT-4's intrinsic planning with chain-of-thought prompting. Our work enables the automated modeling of planning environments using LLMs and environment feedback, eliminating the need for human intervention in the PDDL translation process and paving the way for more reliable LLM agents in challenging problems. Our code is available at https://github.com/BorealisAI/llm-pddl-planning
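The abstract describes the refinement process only at a high level. Its control flow can be sketched as a small loop, assuming greedy acceptance of domain edits and selection of the best-scoring problem candidate. The function names, signatures, and acceptance rule below are hypothetical, not the authors' Algorithm 1; in the paper the LLM receives richer EW feedback than the single number passed here.

```python
def refine_pddl(problem_candidates, draft_domain, refine_domain, score, max_iters=10):
    """Greedy refine-and-select loop (hypothetical sketch, not the paper's algorithm).

    problem_candidates: several problem translations produced for the same task.
    draft_domain(problem) -> domain: initial domain proposal (an LLM call in practice).
    refine_domain(domain, problem, s) -> domain: proposes an edit given feedback score s.
    score(domain, problem) -> float in [0, 1]: environment feedback, e.g. an EW score.
    Returns (best_score, best_domain, best_problem).
    """
    best = None
    for problem in problem_candidates:
        domain = draft_domain(problem)
        s = score(domain, problem)
        for _ in range(max_iters):
            if s >= 1.0:  # full agreement with the environment: stop refining
                break
            candidate = refine_domain(domain, problem, s)
            cand_s = score(candidate, problem)
            if cand_s >= s:  # assumption: keep edits that do not lower the score
                domain, s = candidate, cand_s
        if best is None or s > best[0]:
            best = (s, domain, problem)
    return best


# Stub usage: integers stand in for domain files; only the "p2" translation can improve.
result = refine_pddl(
    ["p1", "p2"],
    draft_domain=lambda p: 0,
    refine_domain=lambda d, p, s: d + (1 if p == "p2" else 0),
    score=lambda d, p: min(d, 5) / 5,
)
# result == (1.0, 5, "p2")
```

Keeping a candidate edit only when the score does not drop makes the loop monotone in the feedback signal, which is one simple way to use a graded metric such as EW instead of a binary plan-found signal.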

Paper Structure (19 sections, 2 equations, 6 figures, 3 tables, 1 algorithm)

Introduction
Related Work
Notation and Background
Method
Setup
Difficulty of domain PDDL generation
Domain alignment measure via Exploration Walk metrics
Leveraging LLMs to generate PDDL files
Experiments
Conclusion
Dataset
Dataset Details.
Criticality of predicate design.
Implementation Details
One-shot prompting
...and 4 more sections

Figures (6)

Figure 1: Snippets of PDDL domain, problem, and plan.
Figure 2: (a) Effect of the number of removed terms on plan search failure. Each gray line shows the $\text{PNF}_k$ (Plan-Not-Found) metric for one environment; the red line is the average over all 15 environments. (b) Correlation between average exploration walk (EW) score and average domain difference. The $x$-axis shows the number of terms in which each pair of domains differs; the $y$-axis shows the EW score averaged over such pairs. In all domains, the average EW score decreases monotonically as the term difference grows.
Figure 3: Overview of our method. Right: natural language descriptions are first translated into problem PDDL by the LLM (red arrows). A domain is then generated and refined through iterative cycles of exploration walks in the environment, interaction with a classical planner, and feedback from the LLM (blue/black arrows). Left: each iterative refinement run depicted on the right corresponds to a single path in the structures shown on the left. Each node represents a state in the refinement process; red arrows indicate problem translation and blue arrows indicate domain refinement.
Figure 4: Correlation between average exploration walk score and average domain difference
Figure 5: Histogram of the average number of lines per domain in pddl_github.
...and 1 more figure
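The $\text{PNF}_k$ metric of Figure 2(a) can be illustrated with a toy experiment. The sketch below assumes our reading of the caption: $\text{PNF}_k$ is the fraction of domains, obtained by deleting $k$ random terms (precondition or effect literals) from a ground-truth domain, for which a planner finds no plan. The tuple-based action encoding, the breadth-first planner (a stand-in for a real PDDL planner), and all function names are hypothetical.

```python
import random
from collections import deque


def bfs_plan(domain, init, goal):
    """Breadth-first search for a plan; returns a list of action names or None.

    `domain` maps an action name to a (pre, add, delete) triple of frozensets.
    """
    start = frozenset(init)
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:
            return plan
        for name in sorted(domain):
            pre, add, delete = domain[name]
            if pre <= state:
                nxt = (state - delete) | add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, plan + [name]))
    return None


def remove_terms(domain, k, rng):
    """Copy of `domain` with k randomly chosen precondition/effect literals deleted."""
    terms = [(name, slot, lit)
             for name in sorted(domain)
             for slot in range(3)
             for lit in sorted(domain[name][slot])]
    parts = {name: [set(s) for s in domain[name]] for name in domain}
    for name, slot, lit in rng.sample(terms, k):
        parts[name][slot].discard(lit)
    return {name: tuple(frozenset(s) for s in p) for name, p in parts.items()}


def pnf_k(domain, init, goal, k, trials=200, seed=0):
    """Estimated fraction of k-term perturbations for which no plan is found."""
    rng = random.Random(seed)
    failures = sum(
        bfs_plan(remove_terms(domain, k, rng), init, goal) is None
        for _ in range(trials)
    )
    return failures / trials


# Toy two-step domain: action a turns p into q, then action b turns q into g.
f = frozenset
chain = {"a": (f({"p"}), f({"q"}), f()), "b": (f({"q"}), f({"g"}), f())}
```

In this toy chain, removing either add effect ($q$ or $g$) makes the goal unreachable, while removing a precondition does not, so `pnf_k(chain, {"p"}, f({"g"}), 1)` should come out close to 0.5.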

Theorems & Definitions (1)

Definition 1: EW Metrics
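Definition 1 is not reproduced on this page. As a rough illustration of the idea behind an exploration walk score, the sketch below samples random action sequences in one domain and measures how much of each sequence the other domain can execute. It assumes both domains are grounded STRIPS-style action sets sharing the same action names and initial state, scores a walk by its executable prefix, and symmetrizes by a plain mean of both directions; these choices, the `Action` class, and all function names are hypothetical, and the paper's definition may aggregate differently.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical grounded STRIPS-style action: preconditions, add and delete lists."""
    pre: frozenset
    add: frozenset
    delete: frozenset


def apply(action, state):
    """Successor state after executing `action` in `state`."""
    return (state - action.delete) | action.add


def random_walk(domain, init, length, rng):
    """Sample up to `length` actions, each chosen uniformly among those applicable."""
    state, walk = frozenset(init), []
    for _ in range(length):
        options = sorted(n for n, a in domain.items() if a.pre <= state)
        if not options:  # dead end: stop the walk early
            break
        name = rng.choice(options)
        walk.append(name)
        state = apply(domain[name], state)
    return walk


def executable_fraction(domain, init, walk):
    """Fraction of `walk` that `domain` executes before its first inapplicable action."""
    if not walk:
        return 1.0
    state = frozenset(init)
    for i, name in enumerate(walk):
        if name not in domain or not domain[name].pre <= state:
            return i / len(walk)
        state = apply(domain[name], state)
    return 1.0


def ew_directional(src, dst, init, n_walks=100, length=10, seed=0):
    """Average executability in `dst` of random walks sampled from `src`."""
    rng = random.Random(seed)
    walks = [random_walk(src, init, length, rng) for _ in range(n_walks)]
    return sum(executable_fraction(dst, init, w) for w in walks) / n_walks


def ew_score(d1, d2, init, **kwargs):
    """Symmetrized score (assumption: plain mean of the two directions)."""
    return 0.5 * (ew_directional(d1, d2, init, **kwargs)
                  + ew_directional(d2, d1, init, **kwargs))


# Toy example: d2 differs from d1 by one term (switch_on no longer deletes "dark").
f = frozenset
d1 = {
    "switch_on": Action(pre=f({"dark"}), add=f({"lit"}), delete=f({"dark"})),
    "switch_off": Action(pre=f({"lit"}), add=f({"dark"}), delete=f({"lit"})),
}
d2 = {**d1, "switch_on": Action(pre=f({"dark"}), add=f({"lit"}), delete=f())}
```

Here every walk sampled in `d1` remains executable in `d2`, but `d2` allows repeating `switch_on`, which `d1` rejects, so `ew_score(d1, d2, {"dark"})` falls below the value of 1.0 obtained for identical domains. A graded score of this kind gives the LLM a signal even when the planner finds no plan at all.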
