
Consolidating Trees of Robotic Plans Generated Using Large Language Models to Improve Reliability

Md Sadman Sakib, Yu Sun

TL;DR

This work addresses the instability of LLM-generated robotic task plans by converting natural language goals into executable PDDL plans through multiple high-level task trees. It generates several task trees with GPT-4, combines them into a unified network, and uses graph search with cost-based retrieval to select a reliable, low-cost plan; integrating FOON as external knowledge improves the result further. The selected tree is transformed into a low-level PDDL plan via GPT-4 and Fast-Downward, enabling robot execution. Experiments on cooking tasks show improved planning accuracy and efficiency, with potential to generalize to other domains. The approach also offers a practical human-in-the-loop path for visualizing and correcting plans, which boosts reliability for real-world robotic task planning.

Abstract

The inherent probabilistic nature of Large Language Models (LLMs) introduces an element of unpredictability, raising concerns about potential discrepancies in their output. This paper introduces an approach that aims to generate correct and optimal robotic task plans for diverse real-world demands and scenarios. LLMs have been used to generate task plans, but the resulting plans are unreliable and may contain wrong, questionable, or high-cost steps. The proposed approach uses an LLM to generate a number of task plans as trees and amalgamates them into a graph, removing questionable paths in the process. An optimal task tree can then be retrieved that circumvents questionable and high-cost nodes, thereby improving planning accuracy and execution efficiency. The approach is further improved by incorporating a large knowledge network. Leveraging GPT-4 further, the high-level task plan is converted into a low-level Planning Domain Definition Language (PDDL) plan executable by a robot. Evaluation results show that our approach is more accurate and efficient than previous task planning methods.
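The consolidation and retrieval steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `FunctionalUnit` class, the `merge_trees` and `retrieve` helpers, and the two toy trees are hypothetical names and data introduced only for illustration. The motion costs are the ones quoted in the Figure 4 caption (scooping 0.4, pouring 0.1, mixing 0.2). The sketch also assumes that each task tree is a list of functional units (input objects, a motion, output objects) and that merging is a plain set union; the step that removes questionable paths is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionalUnit:
    """Hypothetical stand-in for one step: inputs --motion--> outputs."""
    inputs: frozenset
    motion: str
    outputs: frozenset


def unit(inputs, motion, outputs):
    return FunctionalUnit(frozenset(inputs), motion, frozenset(outputs))


def merge_trees(trees):
    """Union all trees into one network; identical units collapse into one."""
    network = set()
    for tree in trees:
        network.update(tree)
    return network


def retrieve(goal, network, available, motion_cost, _visiting=frozenset()):
    """Return (cost, steps) for the cheapest way to obtain `goal`, or None.

    Simplification: a sub-step shared by two inputs would be counted twice.
    """
    if goal in available:
        return 0.0, []
    if goal in _visiting:  # guard against cyclic dependencies
        return None
    best = None
    for candidate in network:
        if goal not in candidate.outputs:
            continue
        cost, steps, feasible = motion_cost[candidate.motion], [], True
        for item in sorted(candidate.inputs):
            sub = retrieve(item, network, available, motion_cost,
                           _visiting | {goal})
            if sub is None:
                feasible = False
                break
            cost += sub[0]
            steps += sub[1]
        if feasible and (best is None or cost < best[0]):
            best = (cost, steps + [candidate])
    return best


# Motion costs as quoted in the Figure 4 caption.
MOTION_COST = {"scoop": 0.4, "pour": 0.1, "mix": 0.2}

# Two toy trees (assumed data) that differ only in how the sugar is added.
mix_step = unit({"bowl with sugar", "milk"}, "mix", {"sweet milk"})
tree_a = [unit({"sugar", "bowl"}, "scoop", {"bowl with sugar"}), mix_step]
tree_b = [unit({"sugar", "bowl"}, "pour", {"bowl with sugar"}), mix_step]

network = merge_trees([tree_a, tree_b])
available = {"sugar", "bowl", "milk"}
cost, plan = retrieve("sweet milk", network, available, MOTION_COST)
print(round(cost, 2), [step.motion for step in plan])
```

In this toy case the merged network holds three distinct units, because the shared mixing step appears only once, and the retrieved tree uses pouring rather than scooping for a total cost of 0.3 instead of 0.6. An exhaustive recursive search is enough for a handful of units; a larger network would call for memoization or a proper graph-search algorithm.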

Paper Structure

This paper contains 24 sections, 10 figures, and 1 table.

Figures (10)

  • Figure 1: Overview of our task planning approach (using a cooking task as an example). The process begins with a meal specification query, resulting in the creation of an optimal task tree. This tree is then converted into a PDDL plan, facilitating robot task execution.
  • Figure 2: (a) Activities from a cooking video and (b) its corresponding functional units.
  • Figure 3: An example of task tree generation from user command using prompt engineering with GPT-4.
  • Figure 4: Illustration of cost optimization: A comparison between task trees obtained from (a) GPT-4 and (b) the unified network. The assigned costs for scooping, pouring, and mixing are 0.4, 0.1, and 0.2, respectively.
  • Figure 5: Top 10 most frequently seen (a) motions, (b) ingredients, (c) states and (d) utensils in FOON.
  • ...and 5 more figures