RoboGPT: an intelligent agent of making embodied long-term decisions for daily instruction tasks

Yaran Chen, Wenbo Cui, Yuanwen Chen, Mining Tan, Xinyao Zhang, Dongbin Zhao, He Wang

TL;DR

The proposed RoboGPT agent outperforms SOTA methods on the ALFRED daily tasks. Its planner exceeds SOTA LLM-based planners such as ChatGPT in task-planning rationality on hundreds of unseen daily tasks, and even on tasks from other domains, while retaining the large model's original broad applicability and generality.

Abstract

Robotic agents must master common sense and long-term sequential decision-making to solve daily tasks given as natural language instructions. Recent progress in Large Language Models (LLMs) for natural language processing has inspired efforts to use LLMs for complex robot planning. Although LLMs generalize well and comprehend instruction tasks, LLM-generated task plans sometimes lack feasibility and correctness. To address this problem, we propose the RoboGPT agent (our code and dataset will be released soon) for making embodied long-term decisions for daily tasks. It has two modules: 1) LLM-based planning with re-planning, which breaks a task into multiple sub-goals; 2) RoboSkill, designed individually for each sub-goal to learn better navigation and manipulation skills. The LLM-based planner, called RoboGPT, is enhanced with a new robotic dataset and a Re-Plan module. The new robotic dataset of 67k daily instruction tasks is gathered to fine-tune the Llama model and obtain RoboGPT. With its strong generalization, the RoboGPT planner can plan hundreds of daily instruction tasks. Additionally, a computationally cheap Re-Plan module lets plans adapt flexibly to the environment, thereby addressing the challenge of nomenclature diversity, where the planner and the environment use different names for the same kind of object. The proposed RoboGPT agent outperforms SOTA methods on the ALFRED daily tasks. Moreover, the RoboGPT planner exceeds SOTA LLM-based planners such as ChatGPT in task-planning rationality on hundreds of unseen daily tasks, and even on tasks from other domains, while retaining the large model's original broad applicability and generality.
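
The interplay between planning and Re-Plan described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `Subgoal` type, the `ALTERNATIVES` lookup table, and the `replan` and `run` functions are hypothetical, and the hand-written table merely stands in for whatever similarity measure the actual Re-Plan module uses. The sketch only shows the control flow: when a subgoal names an object the environment does not contain, it is swapped for a similar object that is present.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subgoal:
    action: str  # e.g. "find", "pickup", "put"
    target: str  # object name as written by the planner


# Hypothetical table of interchangeable object names. It is a stand-in
# for a learned or embedding-based similarity measure (an assumption).
ALTERNATIVES = {"SideTable": ["Desk", "DiningTable"]}


def replan(subgoal, visible_objects):
    """Return a subgoal that targets a similar visible object, or None."""
    for alternative in ALTERNATIVES.get(subgoal.target, []):
        if alternative in visible_objects:
            return Subgoal(subgoal.action, alternative)
    return None


def run(plan, visible_objects):
    """Walk the plan in order, re-planning any subgoal whose target is absent."""
    executed = []
    for subgoal in plan:
        if subgoal.target not in visible_objects:
            replacement = replan(subgoal, visible_objects)
            if replacement is None:
                raise LookupError(f"no alternative for {subgoal.target}")
            subgoal = replacement
        executed.append(subgoal)  # a real agent would invoke a skill here
    return executed


plan = [
    Subgoal("find", "Book"),
    Subgoal("pickup", "Book"),
    Subgoal("find", "SideTable"),
    Subgoal("put", "SideTable"),
]
visible = {"Book", "Desk", "Bed"}
executed = run(plan, visible)
print([f"{s.action} {s.target}" for s in executed])
```

With no 'SideTable' in the scene, both subgoals that mention it are redirected to 'Desk', while the 'Book' subgoals pass through unchanged.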

Paper Structure (19 sections, 7 equations, 5 figures, 2 tables)


Figures (5)

  • Figure 1: The architecture of RoboGPT. RoboPlanner decomposes an instruction into logical subgoals. RoboSkill then performs navigation, manipulation, and interaction with the environment sequentially, based on those subgoals. If a subgoal fails, Re-Plan receives feedback and generates a new plan based on environmental information, e.g., replacing 'SideTable' with 'Desk'.
  • Figure 2: The framework of template feedback-based self-instruction data generation and RoboPlanner training process.
  • Figure 3: Effectiveness of Re-Plan. Re-Plan finds a similar alternative object if the subgoal object cannot be noun-aligned with any object present in the environment.
  • Figure 4: Planning for a task whose target object is hidden inside a container. RoboPlanner can comprehend object relationships, which lets it complete tasks in which the target object is inside an enclosed space.
  • Figure 5: Verification system. The instruction task is 'Slice a tomato, put the knife in the sink, and put the sliced tomato in the fridge'. The planning of RoboGPT is 'Find a knife, pick up the knife, find a tomato, slice the tomato, find a sink, put the knife in the sink, find the sliced tomato, pick up the sliced tomato, find a fridge, open the fridge, put the sliced tomato in the fridge, close the fridge'.
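
To make the subgoal sequence quoted in Figure 5 concrete, the following sketch parses that plan text into (action, object, receptacle) triples and runs a simple ordering check. This is not the paper's verification system, whose internals are not described here. The `parse_step` and `found_before_use` helpers, the `ACTIONS` verb list, and the rule that every object must be found before it is used are all assumptions made for illustration.

```python
import re

# Plan text as quoted in the Figure 5 caption.
PLAN = (
    "Find a knife, pick up the knife, find a tomato, slice the tomato, "
    "find a sink, put the knife in the sink, find the sliced tomato, "
    "pick up the sliced tomato, find a fridge, open the fridge, "
    "put the sliced tomato in the fridge, close the fridge"
)

# Verbs that occur in the plan above (illustrative, not an official action set).
ACTIONS = ("pick up", "find", "slice", "put", "open", "close")


def parse_step(step):
    """Split one plan step into (action, object, receptacle or None)."""
    step = step.strip().lower()
    for action in ACTIONS:
        if step.startswith(action + " "):
            rest = step[len(action):].strip()
            obj, _, receptacle = rest.partition(" in the ")
            obj = re.sub(r"^(a|the)\s+", "", obj)  # drop the leading article
            return action, obj, receptacle or None
    raise ValueError(f"unrecognized step: {step!r}")


def found_before_use(subgoals):
    """Check that every object or receptacle is found before it is used."""
    found = set()
    for action, obj, receptacle in subgoals:
        if action == "find":
            found.add(obj)
            continue
        needed = {obj} | ({receptacle} if receptacle else set())
        if not needed <= found:
            return False
    return True


subgoals = [parse_step(step) for step in PLAN.split(",")]
for subgoal in subgoals:
    print(subgoal)
print("ordering consistent:", found_before_use(subgoals))
```

Representing each step as a triple keeps the receptacle separate from the manipulated object, so 'put the knife in the sink' can be checked against both the knife and the sink.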