Self-Corrective Task Planning by Inverse Prompting with Large Language Models

Jiho Lee, Hayun Lee, Jonghyeon Kim, Kyungjae Lee, Eunwoo Kim

TL;DR

This work tackles the problem of LLM-based robot task planners producing plans that look plausible but are infeasible. It introduces InversePrompt, a self-corrective method that uses inverse prompting to generate inverse actions and verify state reversibility, enabling multi-step reasoning and interpretable feedback. The approach translates natural-language goals into PDDL, then iteratively refines plans with three-step inverse prompting, improving plan feasibility and justification. Empirical results on the Ballmoving, Blocksworld, and Cooking benchmarks and in real-world robot experiments show substantial improvements in success rate and fewer correction attempts, outperforming both external validators and standard self-correction. The work promises more reliable, explainable LLM-based planning in complex, long-horizon robotic tasks.

Abstract

In robot task planning, large language models (LLMs) have shown significant promise in generating complex and long-horizon action sequences. However, LLMs often produce responses that sound plausible but are not accurate. To address this problem, existing methods typically employ predefined error sets or external knowledge sources, which require human effort and computational resources. Recently, self-correction approaches have emerged, in which an LLM generates and refines plans, identifying errors by itself. Despite their effectiveness, these approaches are more prone to correction failures because of insufficient reasoning. In this paper, we introduce InversePrompt, a novel self-corrective task planning approach that leverages inverse prompting to enhance interpretability. Our method incorporates reasoning steps to provide clear, interpretable feedback. It generates inverse actions corresponding to the initially generated actions and verifies whether these inverse actions can restore the system to its original state, explicitly validating the logical coherence of the generated plans. The results on benchmark datasets show an average 16.3% higher success rate over existing LLM-based task planning methods. Our approach offers clearer justifications for feedback in real-world environments, resulting in more successful task completion than existing self-correction approaches across various scenarios.
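The reversibility check described in the abstract can be illustrated with a small symbolic sketch. This is not the authors' implementation: in the paper an LLM is prompted to produce the inverse action and to reason about the restored state, whereas here hand-written rules stand in for those prompts. The toy domain is only loosely modeled on Ballmoving, and the names `State`, `apply_action`, `inverse_of`, and `check_reversible` are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple

Action = Tuple[str, ...]  # e.g. ("move", "room1", "room2")


class State(NamedTuple):
    """Toy world state (assumed for illustration, not the benchmark's PDDL)."""
    robot_at: str
    ball_at: Tuple[Tuple[str, str], ...]  # sorted (ball, room) pairs for balls on the floor
    holding: Optional[str]


def apply_action(state: State, action: Action) -> State:
    """Return the successor state, or raise ValueError if a precondition fails."""
    name = action[0]
    balls = dict(state.ball_at)
    if name == "move":
        _, src, dst = action
        if state.robot_at != src:
            raise ValueError(f"robot is in {state.robot_at}, not {src}")
        return state._replace(robot_at=dst)
    if name == "pick":
        _, ball, room = action
        if state.robot_at != room:
            raise ValueError(f"robot is not in {room}")
        if balls.get(ball) != room:
            raise ValueError(f"{ball} is not in {room}")
        if state.holding is not None:
            raise ValueError("gripper is already full")
        del balls[ball]
        return State(state.robot_at, tuple(sorted(balls.items())), ball)
    if name == "drop":
        _, ball, room = action
        if state.robot_at != room:
            raise ValueError(f"robot is not in {room}")
        if state.holding != ball:
            raise ValueError(f"robot is not holding {ball}")
        balls[ball] = room
        return State(state.robot_at, tuple(sorted(balls.items())), None)
    raise ValueError(f"unknown action {name}")


def inverse_of(action: Action) -> Action:
    """Hand-written inverse rules; the paper obtains inverses by prompting an LLM."""
    name = action[0]
    if name == "move":
        _, src, dst = action
        return ("move", dst, src)
    if name == "pick":
        return ("drop",) + action[1:]
    if name == "drop":
        return ("pick",) + action[1:]
    raise ValueError(f"unknown action {name}")


def check_reversible(state: State, action: Action) -> Tuple[bool, str]:
    """Apply the action, then its inverse, and compare with the original state."""
    try:
        after = apply_action(state, action)
    except ValueError as err:
        return False, f"{action} is not applicable: {err}"
    try:
        restored = apply_action(after, inverse_of(action))
    except ValueError as err:
        return False, f"inverse of {action} is not applicable: {err}"
    if restored != state:
        return False, f"inverse of {action} does not restore the original state"
    return True, "action is consistent with the current state"


start = State("room1", (("ball1", "room1"),), None)
print(check_reversible(start, ("pick", "ball1", "room1")))
print(check_reversible(start, ("pick", "ball1", "room2")))
```

The returned message plays the role of the interpretable feedback: when a check fails, the string names the violated condition, which a planner could use to revise the offending action.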

Paper Structure

This paper contains 16 sections, 1 equation, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Given a goal, the LLM planner generates an action sequence. Then, (a) a robot executes it without any validation. (b) The generated action sequence is validated by an external validator, which is constructed using rule-based methods or by retrieving knowledge from external sources. (c) The LLM planner validates and refines the generated action sequence through a self-correction process, providing feedback on its output and using it to correct actions. (d) The proposed method further enhances the self-correction process from (c) with inverse prompting.
  • Figure 2: Examples of the feedback generated with standard prompting and with the proposed inverse prompting in the Ballmoving domain, given the question in (a). The text underlined in red indicates the groundings for the final answer, while the text underlined in black highlights the reasoning steps in our approach.
  • Figure 3: An overview of the proposed overall process.
  • Figure 4: The number of attempts for correction.
  • Figure 5: Comparison of execution sequences in real-world scenarios between self-corrective task planning with the proposed inverse prompting (Ours) and without it (Self-corr. w/o IP). indicates that the given action was determined to be correct in the validation process and was subsequently executed. Best viewed in color ($\times$2).
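
The generate, validate, and refine loop described in Figure 1 (c) and (d), together with the attempt count reported in Figure 4, can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's code: `plan_with_self_correction`, `toy_planner`, and `toy_validator` are hypothetical names, and the two toy functions are stubs standing in for LLM calls.

```python
from typing import Callable, List, Optional, Tuple

Plan = List[str]
Planner = Callable[[str, Optional[str]], Plan]  # (goal, feedback) -> plan
Validator = Callable[[Plan], Tuple[bool, str]]  # plan -> (is_valid, feedback)


def plan_with_self_correction(
    goal: str,
    planner: Planner,
    validator: Validator,
    max_attempts: int = 3,
) -> Tuple[Optional[Plan], int]:
    """Regenerate the plan with validator feedback until it passes or attempts run out.

    Returns the accepted plan (or None) and the number of attempts used.
    """
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        plan = planner(goal, feedback)
        is_valid, feedback = validator(plan)
        if is_valid:
            return plan, attempt
    return None, max_attempts


def toy_planner(goal: str, feedback: Optional[str]) -> Plan:
    """Stub for an LLM planner: the first draft forgets to pick up the ball."""
    if feedback is None:
        return ["move room1 room2", "drop ball1"]
    return ["pick ball1", "move room1 room2", "drop ball1"]


def toy_validator(plan: Plan) -> Tuple[bool, str]:
    """Stub for the validation step: a drop must be preceded by a pick."""
    if "drop ball1" in plan and "pick ball1" not in plan:
        return False, "drop ball1 requires holding ball1; pick it up first"
    return True, ""


plan, attempts = plan_with_self_correction(
    "bring ball1 to room2", toy_planner, toy_validator
)
print(plan, attempts)
```

Passing the validator as a parameter keeps the loop identical across the settings in Figure 1: an external rule-based checker, a plain self-correction prompt, or an inverse-prompting check could each be supplied as `validator`.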