Autonomous Behavior Planning For Humanoid Loco-manipulation Through Grounded Language Model
Jin Wang, Arturo Laurenzi, Nikos Tsagarakis
TL;DR
This work tackles autonomous humanoid loco-manipulation in unstructured environments by integrating a large language model with a modular behavior library to generate executable task graphs. The LLM outputs a hierarchical behavior tree from human instructions and a predefined set of action and perceptual behaviors, enabling whole-body control of the CENTAURO robot through an XML-based task graph. A multimodal failure detection and recovery loop, leveraging a visual language model and proprioceptive cues, provides robustness during long-horizon tasks. Empirical validation in both simulation and real-world CENTAURO experiments demonstrates high task-planning and execution success, with notable gains from incorporating failure detection and recovery, and shows the approach can operate without additional training. The framework offers a practical path toward rapid deployment of autonomous humanoid loco-manipulation systems in dynamic, real-world settings.
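To make the idea of grounding concrete, the following is a minimal sketch of how an XML task graph produced by an LLM might be checked against a predefined behavior library before execution. The tag names (`Sequence`, `Fallback`, `Action`, `Sensing`), the behavior names, and the function `unknown_behaviors` are all assumptions for illustration; they are not taken from the paper or from the CENTAURO software stack.

```python
import xml.etree.ElementTree as ET

# Hypothetical behavior library; the names are illustrative only.
ACTION_BEHAVIORS = {"walk_to", "grasp", "open_door"}
SENSING_BEHAVIORS = {"detect_object", "check_grasp"}
# Assumed control-flow node types of the behavior tree.
CONTROL_NODES = {"Sequence", "Fallback"}


def unknown_behaviors(xml_text):
    """Return a sorted list of behaviors in the tree that the library lacks."""
    root = ET.fromstring(xml_text)
    known = ACTION_BEHAVIORS | SENSING_BEHAVIORS
    missing = set()
    for node in root.iter():  # visits the root and every descendant
        if node.tag in CONTROL_NODES:
            continue
        if node.tag in ("Action", "Sensing"):
            name = node.get("name", "<unnamed>")
            if name not in known:
                missing.add(name)
        else:
            # Any other tag is outside the assumed schema.
            missing.add("<" + node.tag + ">")
    return sorted(missing)
```

An empty result would mean every leaf in the generated tree maps to a behavior the robot can actually execute; a non-empty result could be fed back to the LLM as a request to replan.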
Abstract
Enabling humanoid robots to autonomously perform loco-manipulation in unstructured environments is crucial yet highly challenging for achieving embodied intelligence. It requires robots to plan their actions and behaviors over long-horizon tasks while using multi-modal perception to detect deviations between task execution and the high-level plan. Recently, large language models (LLMs) have demonstrated powerful planning and reasoning capabilities for understanding and processing semantic information in robot control tasks, as well as the ability to make analytical judgments and decisions from multi-modal inputs. To leverage the power of LLMs for humanoid loco-manipulation, we propose a novel language-model-based framework that enables robots to autonomously plan behaviors and low-level execution from given textual instructions, while observing and correcting failures that may occur during task execution. To systematically evaluate this framework in grounding LLMs, we created a library of robot 'action' and 'sensing' behaviors for task planning and conducted mobile manipulation experiments in both simulated and real environments using the CENTAURO robot, verifying the effectiveness and applicability of this approach in robotic tasks with autonomous behavior planning.
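The observe-and-correct loop described above can be sketched roughly as follows. This is an illustrative outline only, assuming a fixed retry budget and failure detectors modeled as simple callables standing in for visual and proprioceptive checks; the names `run_with_recovery`, `Behavior`, and `Detector` are hypothetical, and the paper's actual recovery strategy may differ.

```python
from typing import Callable, List, Tuple

# A behavior is a (name, execute) pair; a detector returns True on failure.
Behavior = Tuple[str, Callable[[], None]]
Detector = Callable[[str], bool]


def run_with_recovery(
    plan: List[Behavior],
    detectors: List[Detector],
    max_retries: int = 2,
) -> Tuple[bool, List[str]]:
    """Execute behaviors in order, retrying any behavior flagged as failed."""
    log: List[str] = []
    for name, execute in plan:
        for _attempt in range(max_retries + 1):
            execute()
            # A failure is reported if any modality flags one.
            failed = any(detect(name) for detect in detectors)
            log.append(f"{name}:{'fail' if failed else 'ok'}")
            if not failed:
                break
        else:
            # Retry budget exhausted: abort the task.
            return False, log
    return True, log
```

Retrying the same behavior is the simplest possible recovery policy; a fuller system could instead pass the failure description back to the planner to generate an alternative behavior sequence.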
