CoPAL: Corrective Planning of Robot Actions with Large Language Models

Frank Joublin, Antonello Ceravola, Pavel Smirnov, Felix Ocker, Joerg Deigmoeller, Anna Belardinelli, Chao Wang, Stephan Hasler, Daniel Tanneberg, Michael Gienger

TL;DR

CoPAL tackles open-world robotic planning by integrating large language models into a four-layer corrective planning stack that uses multi-level feedback to recover from planning and execution errors. The approach grounds high-level reasoning in low-level motion through a closed loop, employing backprompting and a hierarchy of planners to adapt to environmental changes. Experiments across barman, blocks world, and pizza scenarios demonstrate improved executability, reduced runtimes when using mid-level planning, and emergent adaptive behaviors in real robots. The work highlights the potential of LLM-driven robots in dynamic settings and points to avenues for explainability, prompt design, and latency mitigation to enable practical deployment.

Abstract

In the pursuit of fully autonomous robotic systems capable of taking over tasks traditionally performed by humans, the complexity of open-world environments poses a considerable challenge. Addressing this imperative, this study contributes to the field of Large Language Models (LLMs) applied to task and motion planning for robots. We propose a system architecture that orchestrates a seamless interplay between multiple cognitive levels, encompassing reasoning, planning, and motion generation. At its core lies a novel replanning strategy that handles physically grounded, logical, and semantic errors in the generated plans. We demonstrate the efficacy of the proposed feedback architecture, particularly its impact on executability, correctness, and time complexity via empirical evaluation in the context of a simulation and two intricate real-world scenarios: blocks world, barman and pizza preparation.
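The corrective loop described above, in which a planner proposes a plan, lower layers report errors, and the errors are fed back into the next planning attempt, can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Feedback`, `corrective_plan`, `toy_planner`, and `toy_checker` are hypothetical, the planner is a hard-coded stub standing in for an LLM call, and the single checker stands in for the multi-level feedback sources.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Feedback:
    level: str    # layer that reported the problem (hypothetical labels, e.g. "mid", "low")
    message: str  # error text handed back to the planner on the next attempt


# A planner maps (goal, feedback history) to a list of action strings.
Planner = Callable[[str, List[Feedback]], List[str]]
# A checker returns None if the plan is acceptable, otherwise a Feedback.
Checker = Callable[[List[str]], Optional[Feedback]]


def corrective_plan(
    goal: str,
    planner: Planner,
    checkers: List[Checker],
    max_replans: int = 5,
) -> Tuple[List[str], List[Feedback]]:
    """Replan until every checker accepts the plan or the budget is exhausted."""
    history: List[Feedback] = []
    for _ in range(max_replans + 1):  # one initial attempt plus max_replans retries
        plan = planner(goal, history)
        # Take the first error reported by any checker, if there is one.
        feedback = next(
            (fb for fb in (check(plan) for check in checkers) if fb is not None),
            None,
        )
        if feedback is None:
            return plan, history
        history.append(feedback)  # the "backprompt": errors accumulate as context
    raise RuntimeError(f"no executable plan found after {max_replans} replans")


def toy_planner(goal: str, history: List[Feedback]) -> List[str]:
    # Stub for an LLM: the first attempt is wrong, and it is corrected once feedback exists.
    if history:
        return ["grasp cup", "place cup"]
    return ["place cup", "grasp cup"]


def toy_checker(plan: List[str]) -> Optional[Feedback]:
    # Stub precondition check: an object must be grasped before it is placed.
    if plan.index("place cup") < plan.index("grasp cup"):
        return Feedback("mid", "cannot place cup before grasping it")
    return None


if __name__ == "__main__":
    plan, history = corrective_plan("move the cup", toy_planner, [toy_checker])
    print(plan)
    print([fb.message for fb in history])
```

In this toy run the first plan is rejected, the error message is appended to the history, and the second attempt passes. A real system would replace the stubs with an LLM call and with checks grounded in the robot's world model and motion layer.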

Paper Structure (13 sections, 1 equation, 5 figures, 5 tables)

Figures (5)

  • Figure 1: Snapshot of a pizza domain task execution on the real robot alongside an illustrative description of the proposed framework.
  • Figure 2: Left: System architecture. Computational modules are drawn in blue, top-down instructions in green, bottom-up feedback in orange, and real-world interaction modules in dark green. Right: Execution flow for two exemplary human requests (step 1 and 3).
  • Figure 3: Amount of high-level (HLP, left) and mid-level (MLP, right) replanning by setup.
  • Figure 4: Comparison of the architecture variations regarding executability and runtimes: architectures replanning both on a mid- and high-level achieve the best trade-off between runtime and executability.
  • Figure 5: Extract of an experiment with collisions and low-level feedback.