Autonomous Behavior Planning For Humanoid Loco-manipulation Through Grounded Language Model

Jin Wang, Arturo Laurenzi, Nikos Tsagarakis

TL;DR

This work tackles autonomous humanoid loco-manipulation in unstructured environments by integrating a large language model with a modular behavior library to generate executable task graphs. The LLM outputs a hierarchical behavior tree from human instructions and a predefined set of action and perceptual behaviors, enabling whole-body control of the CENTAURO robot through an XML-based task graph. A multimodal failure detection and recovery loop, leveraging a visual language model and proprioceptive cues, provides robustness during long-horizon tasks. Empirical validation in both simulation and real-world CENTAURO experiments demonstrates high task-planning and execution success, with notable gains from incorporating failure detection and recovery, and shows the approach can operate without additional training. The framework offers a practical path toward rapid deployment of autonomous humanoid loco-manipulation systems in dynamic, real-world settings.

Abstract

Enabling humanoid robots to autonomously perform loco-manipulation in unstructured environments is crucial and highly challenging for achieving embodied intelligence. This requires robots to plan their actions and behaviors over long-horizon tasks while using multi-modal perception to detect deviations between task execution and the high-level plan. Recently, large language models (LLMs) have demonstrated powerful planning and reasoning capabilities for understanding and processing semantic information in robot control tasks, as well as analytical judgment and decision-making over multi-modal inputs. To leverage the power of LLMs for humanoid loco-manipulation, we propose a novel language-model-based framework that enables robots to autonomously plan behaviors and low-level execution from textual instructions, while observing and correcting failures that may occur during task execution. To systematically evaluate this framework for grounding LLMs, we created a library of robot 'action' and 'sensing' behaviors for task planning, conducted mobile manipulation experiments in both simulated and real environments using the CENTAURO robot, and verified the effectiveness and applicability of this approach in robotic tasks with autonomous behavior planning.
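
As a rough illustration of how a tagged behavior library could feed an XML-based task graph, the sketch below turns a nested plan (the kind of structure an LLM planner might return) into XML. Everything here is an assumption made for illustration: the behavior names, the tags, the `Sequence` control node, and the XML element names are hypothetical and are not taken from the paper's implementation.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Behavior:
    name: str
    kind: str    # "action" or "sensing"
    tags: tuple  # short semantic descriptors shown to the planner


# Hypothetical library entries; the real behavior lib is not listed in this summary.
BEHAVIOR_LIB = {
    b.name: b
    for b in (
        Behavior("move_to", "action", ("navigate", "base")),
        Behavior("grasp", "action", ("pick", "gripper")),
        Behavior("detect_object", "sensing", ("vision", "locate")),
    )
}


def build_task_graph(plan):
    """Convert a nested plan into an XML tree.

    A plan node is either a behavior name (leaf) or a
    (control_node_name, [children]) tuple.
    """
    def to_xml(node):
        if isinstance(node, str):
            # Raises KeyError if the planner names a behavior outside the library.
            behavior = BEHAVIOR_LIB[node]
            return ET.Element(behavior.kind.capitalize(), name=behavior.name)
        control, children = node
        elem = ET.Element(control)
        for child in children:
            elem.append(to_xml(child))
        return elem

    root = ET.Element("TaskGraph")
    root.append(to_xml(plan))
    return root


# Example plan an LLM planner might produce for a pick task (assumed).
plan = ("Sequence", ["detect_object", "move_to", "grasp"])
xml_text = ET.tostring(build_task_graph(plan), encoding="unicode")
```

Looking each leaf up in the library means a plan that names an unknown behavior fails at build time rather than during execution, which is one simple way to keep generated plans grounded in what the robot can actually do.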

Paper Structure

This paper contains 16 sections, 1 equation, 7 figures, 4 tables, 1 algorithm.

Figures (7)

  • Figure 1: The humanoid robot CENTAURO picks objects by following the task graphs generated by the LLM. The 'behavior lib' consists of various action and sensing behaviors, with 'tags' describing the semantic content of each behavior.
  • Figure 2: Overview of the framework. (a) The Behavior Planner takes the human instruction as input; given the behavior lib and prompts, the LLM generates a hierarchically structured behavior tree, which together with the behavior code forms the task graph. (b) The CENTAURO robot executes the low-level action commands and feeds back its current state. The entire process does not require any additional training.
  • Figure 3: Behavior Planner Grounding LLM
  • Figure 4: Failure detection using a combination of perception behaviors. By querying the VLM, the visual Q&A behavior can reason about the state of the task, while the grip force behavior uses the torque sensor to return the torque on the gripper.
  • Figure 5: Experiment setup
  • ...and 2 more figures
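
The combination of perception behaviors described for Figure 4 can be sketched as a check that requires both a visual answer and a proprioceptive reading to agree before a grasp is treated as successful. This is a minimal sketch under stated assumptions: `ask_vlm` stands in for whatever VLM query the system uses, and the question wording, the torque threshold, and the retry policy are all hypothetical rather than values from the paper.

```python
from typing import Callable


def detect_grasp_failure(
    ask_vlm: Callable[[str], str],
    grip_torque_nm: float,
    min_torque_nm: float = 0.5,  # hypothetical threshold, not from the paper
) -> bool:
    """Return True if either the visual or the torque cue indicates a failed grasp."""
    answer = ask_vlm("Is the object held in the gripper? Answer yes or no.")
    visually_held = answer.strip().lower().startswith("yes")
    torque_ok = grip_torque_nm >= min_torque_nm
    return not (visually_held and torque_ok)


def execute_with_recovery(
    action: Callable[[], None],
    check_failed: Callable[[], bool],
    max_retries: int = 2,
) -> bool:
    """Run the action, re-running it while the failure check reports a problem.

    Returns True once an attempt passes the check, False if all attempts fail.
    """
    for _ in range(max_retries + 1):
        action()
        if not check_failed():
            return True
    return False
```

Requiring both cues to agree is a conservative choice: a confident visual answer cannot mask an empty gripper, and a high torque reading cannot mask an object that was grasped but is visibly in the wrong state.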