Self-Corrective Task Planning by Inverse Prompting with Large Language Models
Jiho Lee, Hayun Lee, Jonghyeon Kim, Kyungjae Lee, Eunwoo Kim
TL;DR
This work addresses a failure mode of LLM-based robot task planning: plans that sound plausible but are infeasible. It introduces InversePrompt, a self-corrective method that uses inverse prompting to generate inverse actions and verify state reversibility, enabling multi-step reasoning and interpretable feedback. The approach translates natural-language goals into PDDL, then iteratively refines plans with three-step inverse prompting, improving plan feasibility and justification. Empirical results on the Ballmoving, Blocksworld, and Cooking benchmarks and in real-world robot experiments show substantial improvements in success rates and fewer correction attempts, outperforming both external validators and standard self-correction. The work promises more reliable, explainable LLM-based planning in complex, long-horizon robotic tasks.
Abstract
In robot task planning, large language models (LLMs) have shown significant promise in generating complex, long-horizon action sequences. However, LLMs often produce responses that sound plausible but are not accurate. To address this problem, existing methods typically employ predefined error sets or external knowledge sources, which require human effort and computational resources. Recently, self-correction approaches have emerged, in which the LLM generates and refines plans and identifies errors by itself. Despite their effectiveness, these approaches are more prone to correction failures because of insufficient reasoning. In this paper, we introduce InversePrompt, a novel self-corrective task planning approach that leverages inverse prompting to enhance interpretability. Our method incorporates reasoning steps to provide clear, interpretable feedback. It generates inverse actions corresponding to the initially generated actions and verifies whether these inverse actions can restore the system to its original state, explicitly validating the logical coherence of the generated plans. Results on benchmark datasets show an average 16.3% higher success rate than existing LLM-based task planning methods. Our approach also offers clearer justifications for feedback in real-world environments, resulting in more successful task completion than existing self-correction approaches across various scenarios.
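
The reversibility check described above can be illustrated with a small symbolic sketch. This is not the authors' implementation: in InversePrompt the inverse action is generated by the LLM and the check is carried out through prompting, whereas here the candidate inverse is written by hand and checked with STRIPS-style set operations. The names `Action`, `apply`, and `check_reversible`, and the Blocksworld predicates, are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical STRIPS-style action over ground predicates (strings)."""
    name: str
    preconditions: frozenset
    add_effects: frozenset
    del_effects: frozenset


def apply(state, action):
    """Return the successor state, or None if a precondition is unmet."""
    if not action.preconditions <= state:
        return None
    return (state - action.del_effects) | action.add_effects


def check_reversible(state, action, inverse):
    """Apply `action`, then the candidate `inverse`, and report whether the
    original state is restored. Returns (ok, feedback message)."""
    after = apply(state, action)
    if after is None:
        missing = sorted(action.preconditions - state)
        return False, f"{action.name} is not applicable: missing {missing}"
    restored = apply(after, inverse)
    if restored is None:
        missing = sorted(inverse.preconditions - after)
        return False, f"{inverse.name} is not applicable: missing {missing}"
    if restored != state:
        return False, f"state not restored; differing facts: {sorted(restored ^ state)}"
    return True, "state restored"


# Assumed Blocksworld-like facts, chosen only for this example.
free_hand = frozenset({"on-table a", "clear a", "hand-empty"})
pickup = Action(
    name="pickup a",
    preconditions=free_hand,
    add_effects=frozenset({"holding a"}),
    del_effects=free_hand,
)
# Hand-written stand-in for the inverse action an LLM would propose.
putdown = Action(
    name="putdown a",
    preconditions=frozenset({"holding a"}),
    add_effects=free_hand,
    del_effects=frozenset({"holding a"}),
)

ok, msg = check_reversible(free_hand, pickup, putdown)

# A state in which the gripper is busy, so the planned action is infeasible.
busy_hand = frozenset({"on-table a", "clear a", "holding b"})
ok_busy, msg_busy = check_reversible(busy_hand, pickup, putdown)
```

In this sketch the first check succeeds because applying `putdown a` after `pickup a` returns the original set of facts. The second check fails and its message names the unmet precondition, which is the kind of explicit justification that could be fed back to the planner for a correction attempt.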
