Large Language Models for Orchestrating Bimanual Robots

Kun Chu, Xufeng Zhao, Cornelius Weber, Mengdi Li, Wenhao Lu, Stefan Wermter

TL;DR

LAnguage-model-based Bimanual ORchestration (LABOR) is an agent that uses an LLM to analyze task configurations and devise coordination control policies for long-horizon bimanual tasks. In simulated experiments, it outperforms the baseline in terms of success rate.

Abstract

Although there has been rapid progress in endowing robots with the ability to solve complex manipulation tasks, generating control policies for bimanual robots to solve tasks involving two hands is still challenging because of the difficulties in effective temporal and spatial coordination. With emergent abilities in terms of step-by-step reasoning and in-context learning, Large Language Models (LLMs) have demonstrated promising potential in a variety of robotic tasks. However, the nature of language communication via a single sequence of discrete symbols makes LLM-based coordination in continuous space a particular challenge for bimanual tasks. To tackle this challenge, we present LAnguage-model-based Bimanual ORchestration (LABOR), an agent utilizing an LLM to analyze task configurations and devise coordination control policies for addressing long-horizon bimanual tasks. We evaluate our method through simulated experiments involving two classes of long-horizon tasks using the NICOL humanoid robot. Our results demonstrate that our method outperforms the baseline in terms of success rate. Additionally, we thoroughly analyze failure cases, offering insights into LLM-based approaches in bimanual robotic control and revealing future research trends. The project website can be found at http://labor-agent.github.io.

Paper Structure (16 sections, 7 figures)

Figures (7)

  • Figure 1: Illustration of the LABOR agent. During task execution, guided by the prompt, the LLM coordinator first decomposes the task and then generates a step-wise action plan that includes control commands for the left and right hands. The bimanual robot executor then performs these actions in the environment, and the results are fed back to the LLM to inform the next action. This loop repeats until the task ends.
  • Figure 2: Spatio-temporal control adopted by the LABOR agent.
  • Figure 3: Prompts for the LLM to orchestrate a bimanual robot to accomplish the task. The general prompting template, indicated in standard black text, remains consistent across all tasks, while task-specific details such as environment setting and task description are shown in gray.
  • Figure 4: Example of a LABOR agent's reasoning. With the guiding prompt, the LLM decomposes the task into multiple stages, i.e., uncoordinated and coordinated stages, and then generates action plans with skill primitives for both hands until the task is accomplished.
  • Figure 5: NICOL workspace with daily objects in real and simulated worlds, from left to right: apple, banana, cup (orange or blue), bowl, scissors, and cup (red or yellow). In this work, experiments are designed and completed in the simulated environment, leaving real-world exploration for future work.
  • ...and 2 more figures
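
The closed loop described in the Figure 1 caption (plan a step for both hands, execute it, feed the result back) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `StepPlan`, `run_episode`, `scripted_coordinator`, `echo_executor`, and the skill-primitive strings are all hypothetical names, and the LLM is replaced by a fixed script so the example runs without any model or simulator.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StepPlan:
    """One step of a bimanual plan (hypothetical structure)."""
    left: str           # control command for the left hand
    right: str          # control command for the right hand
    done: bool = False  # True when the coordinator considers the task finished


# A coordinator maps (task description, feedback so far) to the next step plan.
Coordinator = Callable[[str, List[str]], StepPlan]
# An executor applies a step plan and returns textual feedback.
Executor = Callable[[StepPlan], str]


def run_episode(task: str, coordinator: Coordinator, executor: Executor,
                max_steps: int = 10) -> List[str]:
    """Alternate planning and execution until the task ends or steps run out."""
    history: List[str] = []
    for _ in range(max_steps):
        plan = coordinator(task, history)
        if plan.done:
            break
        history.append(executor(plan))
    return history


def scripted_coordinator(task: str, history: List[str]) -> StepPlan:
    """Stand-in for the LLM: replays a fixed two-step script (assumption)."""
    script = [
        StepPlan("move_to(bowl)", "move_to(apple)"),
        StepPlan("hold(bowl)", "place(apple, bowl)"),
    ]
    if len(history) < len(script):
        return script[len(history)]
    return StepPlan("idle", "idle", done=True)


def echo_executor(plan: StepPlan) -> str:
    """Stand-in for the robot: reports the commands it received as successful."""
    return f"left={plan.left}; right={plan.right}: ok"


if __name__ == "__main__":
    for line in run_episode("put the apple in the bowl",
                            scripted_coordinator, echo_executor):
        print(line)
```

In a real system the coordinator would wrap an LLM call that receives the guiding prompt plus the feedback history, and the executor would drive the simulated robot. Keeping both behind plain callables makes the loop itself easy to test in isolation.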